Understanding the Statistical Significance of Study Results

Reports from RCTs such as the WHI study frequently include relative risk as a summary measure of differences between the treatment and placebo groups (Table 8-1). To arrive at the relative risk, the researcher first measures the incidence rate of an outcome in each of the two study groups (i.e., treatment and placebo). The incidence rate for each group is a ratio of the number of new outcome events, such as CHD events, divided by the number of patients at risk for the outcome in that group over a specific period. In multiyear studies, the average annual incidence rate is often reported as a summary measure. In a placebo-controlled RCT, the relative risk is then calculated as a ratio of the incidence rate for the treatment group divided by the incidence rate for the placebo group (Table 8-2).

Table 8-2 Examples of Summary Rates from the Women's Health Initiative (WHI) Study

The following equations show how to take a summary rate commonly reported in published studies (i.e., relative risk) and calculate a summary measure (e.g., number needed to treat, number needed to harm) that may be more useful in describing the results to clinicians and patients. The example considers the average annual incidence rates and relative risk for coronary heart disease (CHD) events in the WHI study on the effects of hormone replacement therapy (HRT):

Average annual incidence among HRT-treated women = 37 CHD events per 10,000 women per year

Average annual incidence among placebo-treated women = 30 CHD events per 10,000 women per year

$$\text{Relative risk of CHD} = \frac{37\ \text{CHD events} / 10{,}000\ \text{women}}{30\ \text{CHD events} / 10{,}000\ \text{women}}$$

The relative risk describes a relative 29% increase in CHD events, as reported for the study (the simple ratio of the rounded rates shown here, 37/30, works out to about 1.23). It may be more useful to consider the absolute difference in incidence rates between the two groups to understand the magnitude of the potential risk for a given patient:

$$\text{Attributable risk (AR)} = \frac{37\ \text{CHD events} - 30\ \text{CHD events}}{10{,}000\ \text{women}} = \frac{7\ \text{additional CHD events}}{10{,}000\ \text{women}}$$

The number needed to harm (NNH) can be calculated to describe, on average, how many women must be treated for 1 year to cause one additional CHD event attributable to HRT:


$$\text{NNH} = \frac{1}{\text{AR}} = \frac{1}{7\ \text{CHD events} / 10{,}000\ \text{women}} \approx 1{,}429\ \text{women}$$

Data from Ebell MH, Messimer SR, Barry HC. Putting computer-based evidence in the hands of clinicians. JAMA 1999;281:1171-1172.
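The arithmetic in Table 8-2 can be sketched in a few lines of Python. This is a minimal illustration that uses only the two average annual incidence rates quoted above; the function and variable names are hypothetical, and the relative risk computed here is the simple ratio of the rounded rates, which may differ from an adjusted estimate reported by a study.

```python
def summarize_risk(treated_events, control_events, per_population):
    """Return (relative risk, attributable risk, number needed to harm).

    Inputs are annual event counts per `per_population` people in the
    treatment and control (placebo) groups.
    """
    treated_rate = treated_events / per_population
    control_rate = control_events / per_population

    relative_risk = treated_rate / control_rate
    attributable_risk = treated_rate - control_rate  # absolute difference in rates
    if attributable_risk <= 0:
        raise ValueError("NNH is only defined when the treated rate exceeds the control rate")
    # NNH is the reciprocal of the absolute risk increase.
    number_needed_to_harm = 1 / attributable_risk
    return relative_risk, attributable_risk, number_needed_to_harm


# Rates quoted in the text: 37 vs. 30 CHD events per 10,000 women per year
rr, ar, nnh = summarize_risk(37, 30, 10_000)
print(f"Relative risk (ratio of rounded rates): {rr:.2f}")
print(f"Attributable risk: {ar * 10_000:.0f} additional CHD events per 10,000 women per year")
print(f"Number needed to harm: about {nnh:.0f} women treated for 1 year")
```

With these inputs the sketch gives an absolute excess of 7 events per 10,000 women per year and an NNH of roughly 1,429 women.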

How can a physician determine whether the reported relative risk from a study is significant enough to influence clinical decisions? Typically, the statistical significance of the summary measure, in this case relative risk, is reported. Published studies usually summarize statistical significance with a p value for a given summary measure. The p value describes the probability that a difference at least as large as the one observed between the groups could have happened by chance alone if there were truly no difference. A p value of less than 0.05 is the arbitrary cutoff most often used for "statistical significance." A "p <0.05" means that there is less than a 1 in 20 (5%) probability that a difference as large as that observed would have occurred by chance alone; a p = 0.04 means a 1 in 25 (4%) probability; a p = 0.06 means a probability of about 1 in 17 (6%).
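The "1 in N" reading of a p value is just its reciprocal. The short Python sketch below makes that conversion explicit; the helper name `one_in_n` is hypothetical, and results are rounded to the nearest whole number, so p = 0.06 comes out as roughly 1 in 17.

```python
def one_in_n(p_value):
    """Express a p value as an approximate '1 in N' chance (N rounded)."""
    if not 0 < p_value <= 1:
        raise ValueError("p value must be greater than 0 and at most 1")
    return round(1 / p_value)


for p in (0.05, 0.04, 0.06):
    print(f"p = {p:.2f}: about 1 in {one_in_n(p)} ({p:.0%})")
```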

Although frequently used, p values provide only limited information: the probability that a difference as large as the one found could be caused by chance, or random error. A p value alone gives no indication of the clinical significance of a finding and provides no information regarding the likelihood that a finding of "no difference" is caused by chance, or random error.

Confidence intervals are much more informative than p values. When relative risk is reported as the summary result of a study, the 95% confidence interval (CI) is often used to give an indication of the precision of the estimated relative risk. The 95% CI describes the range within which there is a 95% probability that the true relative risk (RR) lies. An RR of 1.0 indicates no difference. For example, if a study reported an RR of 2.5 with a 95% CI of 2.3 to 2.7, we could be reasonably certain (95% certain) that the true RR was no less than 2.3 and no greater than 2.7. Our conclusion would be that the estimated RR of 2.5 is fairly precise. However, if the RR was reported as 2.5 with a 95% CI of 1.1 to 5.0, the true RR could be as low as 1.1 (almost no difference) or as high as 5.0 (a fivefold difference), an obviously imprecise estimate of the relative risk.
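To show where such an interval can come from, the Python sketch below computes an approximate 95% CI for a relative risk from raw counts using the common log-based (Katz) method. The text does not say how the intervals in its examples were derived, so the method is an assumption here, and the event counts and group sizes are hypothetical values chosen only to give an RR of 2.5 at two study sizes.

```python
import math


def relative_risk_ci(events_treated, n_treated, events_control, n_control, z=1.96):
    """Relative risk with an approximate 95% CI (log, or Katz, method)."""
    rr = (events_treated / n_treated) / (events_control / n_control)
    # Standard error of ln(RR) for two independent proportions.
    se_log_rr = math.sqrt(
        1 / events_treated - 1 / n_treated + 1 / events_control - 1 / n_control
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical counts: identical event proportions, two different study sizes
for scale in (1, 20):
    rr, lower, upper = relative_risk_ci(25 * scale, 500 * scale, 10 * scale, 500 * scale)
    print(f"n per group = {500 * scale}: RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

Both hypothetical studies estimate the same RR, but the larger one yields a much narrower interval, which is the sense in which a CI conveys precision.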

Confidence intervals also provide a better measure than p values of the precision for concluding that there is no difference in a relative risk. Any 95% CI that includes RR = 1.0 indicates that there may be "no difference." However, an RR of 1.1 with a 95% CI of 0.99 to 1.11 is almost certainly a finding of no difference (i.e., a narrow confidence interval), whereas an estimated RR = 1.4 with a 95% CI of 0.99 to 1.7 is much less precise (i.e., a wide confidence interval). Even though the 95% CI contains 1.0, there may still be a true difference, just not detected in this study.
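The two checks described here (does the interval include 1.0, and how wide is it) can be sketched as a small Python helper applied to the intervals quoted in the text. The function name is hypothetical, and the upper-to-lower ratio is used only as a rough, illustrative gauge of width rather than a standard statistic.

```python
def describe_interval(lower, upper, null_value=1.0):
    """Summarize a 95% CI for a relative risk.

    Returns (includes_null, width_ratio), where width_ratio = upper / lower
    is an illustrative measure of how wide the interval is.
    """
    if not 0 < lower <= upper:
        raise ValueError("expected 0 < lower <= upper")
    includes_null = lower <= null_value <= upper
    width_ratio = upper / lower
    return includes_null, width_ratio


# Intervals quoted in the text
examples = {
    "RR 2.5 (2.3 to 2.7)": (2.3, 2.7),
    "RR 2.5 (1.1 to 5.0)": (1.1, 5.0),
    "RR 1.1 (0.99 to 1.11)": (0.99, 1.11),
    "RR 1.4 (0.99 to 1.7)": (0.99, 1.7),
}
for label, (lower, upper) in examples.items():
    includes_null, width_ratio = describe_interval(lower, upper)
    print(f"{label}: includes 1.0 = {includes_null}, upper/lower = {width_ratio:.2f}")
```

Both intervals that start at 0.99 include 1.0, but the second is considerably wider, so it leaves more room for a true difference that the study did not detect.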


