This article outlines the major types of experimental and observational study designs and discusses their relative advantages and disadvantages in evaluating novel interventions, such as medications. The discussion is illustrated using an example in which two different study designs investigated the same intervention and gave different answers. The possible reasons for this discrepancy are explained by highlighting the inherent strengths and weaknesses of each study’s design.
Study designs can be classified into two categories: 1) experimental studies and 2) observational studies.6-9 In experimental studies the researcher intervenes by manipulating the variable of interest.6,7 The best known example is the RCT. In observational studies the researcher does not intervene and reports observed differences between subjects that already differ in the variable of interest.6-9 The two main observational methodologies for evaluating outcomes are cohort and case-control studies. Descriptions of the main trial methodologies are provided below. The specific strengths and weaknesses of each method are summarised in the corresponding tables but detailed discussion is beyond the scope of this essay (for review see references 6-12). The remainder of the discussion will focus on the underlying methodological arguments for the strengths and weaknesses of observational and RCT data.
Randomised controlled trial
The RCT is the best known example of an experimental trial (fig 1). Properly conducted, it provides the highest level of research evidence for efficacy.1,2,5-9 Participants are randomly allocated into groups which receive different interventions but all other variables are standardised.6,7 True randomisation of participants avoids selection bias, and all potential confounding variables are equally distributed between the groups, negating their effects on the results. Double blinding is also commonly used, whereby neither the participants nor the researchers are aware of the intervention that the participant is receiving. This avoids observational bias in the collection of data. The prospective design and standardisation of conditions allow the assessment of cause and effect, thereby demonstrating the efficacy of an intervention.
Cohort studies follow a group of people to ascertain if the development of the outcome of interest differs between groups with different exposures to risk factors.6,8,9 Cohort studies can be divided into two types, prospective and retrospective (fig 1), depending upon when the data were collected.9 Prospective studies identify a group of people (cohort) who do not have the outcome of interest and follow them forward in time, measuring various variables relevant to the development of the condition. It can then be seen if these variables differ between those who develop the condition and those who do not (controls). Retrospective cohort studies use the same methods but rely on the analysis of data already collected for another reason. Cohort studies are the best method for determining incidence and the natural history of a condition9 but are expensive, and the major criticism is that confounding factors cannot always be accounted for in the analysis.6-7 Retrospective cohort studies are cheaper because the data have already been collected. They also lack bias (on behalf of the researchers) because the data were collected for a reason other than the current outcome of interest. However, this also raises questions about the rigour and validity of the data collected because the research design was constructed for another purpose.9
Why RCTs and observational studies can produce different answers to the same research question.
There are four possible explanations for the findings of a research study: chance, bias, confounding, and a real effect.8 Appropriate study design (including consideration of sample size, power and statistical analysis) allows the likelihood of a chance finding to be evaluated in both experimental and observational studies.8 Bias tends to be greater in observational studies8 but can be minimised through improvements in study design and research techniques6,14 (eg by using blinding and objective outcomes). Confounding factors (fig 2) are also more likely in observational studies than RCTs.6 A RCT avoids the influence of confounding through randomisation.10 The appropriate use of randomisation (with an adequate sample size) results in all confounding factors (both known and unknown) being evenly distributed between the groups in the study nullifying any impact confounding may have had on the results.15 The impact of confounders on observational studies can be reduced if they are known and they can be measured.6,8 Confounding can then be reduced by rigorous research design to prevent their effects, and statistical techniques to adjust for their effects on the results. However, incomplete understanding and the inability to measure all confounding factors means it is unlikely that all confounding factors can be controlled for in observational studies.6-8 Confounding is therefore considered to be the major limitation of observational studies and is the reason why RCTs are regarded as a higher quality of study design. It is also thought to explain many of the discrepancies in the results of RCTs and observational studies.
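The balancing effect of randomisation described above can be illustrated with a small simulation. The following is a minimal sketch in Python rather than an analysis of any real study: the confounder prevalence (40%), the self-selection probabilities (70% vs 30%) and the function name `simulate_imbalance` are all assumptions invented for illustration.

```python
import random


def prevalence(flags):
    """Proportion of True values in a sequence of booleans."""
    flags = list(flags)
    return sum(flags) / len(flags)


def simulate_imbalance(n=20000, seed=1):
    """Return the confounder imbalance (treated minus untreated prevalence)
    under random allocation and under self-selection.

    All probabilities below are hypothetical.
    """
    rng = random.Random(seed)

    # Hypothetical binary confounder, e.g. being "health conscious".
    confounder = [rng.random() < 0.4 for _ in range(n)]

    # Randomised allocation: a coin flip that ignores the confounder.
    randomised = [rng.random() < 0.5 for _ in range(n)]

    # Self-selection: people with the confounder are assumed to be more
    # likely to choose the treatment (70% vs 30%).
    self_selected = [rng.random() < (0.7 if c else 0.3) for c in confounder]

    def imbalance(treated):
        in_treated = prevalence(c for c, t in zip(confounder, treated) if t)
        in_untreated = prevalence(c for c, t in zip(confounder, treated) if not t)
        return in_treated - in_untreated

    return imbalance(randomised), imbalance(self_selected)


if __name__ == "__main__":
    randomised_gap, self_selected_gap = simulate_imbalance()
    print(f"Imbalance with randomisation:  {randomised_gap:+.3f}")
    print(f"Imbalance with self-selection: {self_selected_gap:+.3f}")
```

Under these assumptions the confounder is almost equally common in both randomised groups, whereas it is markedly over-represented among people who chose the treatment themselves. The same logic applies to confounders that nobody has thought to measure, which is why randomisation protects against unknown as well as known confounding.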
HRT: An example of a RCT and observational study giving different answers.
The Nurses Health Study (NHS)16 is a large cohort study and the Heart and Estrogen/progestin Replacement Study (HERS)17 is a RCT. Both were large, well funded, and high quality investigations of the use of hormone replacement therapy (HRT) for postmenopausal women. The outcome measures from these studies demonstrated good agreement apart from the risk of coronary heart disease (CHD). The observational study reported a decreased risk of CHD with HRT whereas the RCT reported an increased CHD risk with HRT. This important discrepancy stimulated debate as to the likely cause of these contrasting findings.18-21 The difference in the ages of the samples (HERS17 mean age 66.7 years vs. NHS16 mean age 61.6 years) and the time from menopause until the initiation of HRT have been suggested as contributing factors to the discrepancy in the results.20 However, the supporting evidence for this argument was weak, based on one non-significant subgroup analysis of the HERS17 data (which was interpreted as suggesting that the beneficial effects of HRT on CHD risk reduced with increasing time between menopause and the initiation of HRT) and oestrogen experiments based on animal models.18,20 As a consequence, further suggestions for the discrepancy in the results have revolved around the weaknesses of observational trials in accounting for confounding influences and preventing bias.
Confounding as a reason for the discrepancy
The NHS16 adjusted for several potential confounders, but could not adjust for confounding factors which were either unknown or impossible to measure. For example, adult socioeconomic status (SES) was accounted for in the analysis, but SES is not stable across the life course and early life SES was found to be independently associated with the likelihood of using HRT.19 Inclusion of early life SES in addition to the other adjustments for potential confounding factors in the observational data produced results of slightly increased CHD risk, in agreement with the RCT evidence.19 This suggests that a single measure of SES in an observational trial may be inadequate to account for all of its effects.19,21
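How adjusting for a confounder can reverse the apparent direction of an effect can be shown with a stratified analysis. The sketch below uses a Mantel-Haenszel pooled risk ratio, which is one common adjustment technique and not necessarily the method used in the analyses cited above. The counts are entirely invented, chosen so that HRT users are concentrated in the lower risk (higher SES) stratum; they are not taken from the NHS or any other study.

```python
def risk_ratio(events_exposed, n_exposed, events_unexposed, n_unexposed):
    """Risk in the exposed group divided by risk in the unexposed group."""
    return (events_exposed / n_exposed) / (events_unexposed / n_unexposed)


def mantel_haenszel_rr(strata):
    """Mantel-Haenszel pooled risk ratio across strata of a confounder.

    Each stratum is a tuple:
    (events_exposed, n_exposed, events_unexposed, n_unexposed).
    """
    numerator = 0.0
    denominator = 0.0
    for events_exposed, n_exposed, events_unexposed, n_unexposed in strata:
        total = n_exposed + n_unexposed
        numerator += events_exposed * n_unexposed / total
        denominator += events_unexposed * n_exposed / total
    return numerator / denominator


# Hypothetical counts: (CHD events in users, users, CHD events in non-users, non-users)
STRATA = [
    (48, 800, 10, 200),   # higher SES: low baseline risk, most women use HRT
    (36, 200, 120, 800),  # lower SES: high baseline risk, few women use HRT
]

# Crude analysis: collapse the strata and ignore SES altogether.
crude_rr = risk_ratio(
    sum(s[0] for s in STRATA), sum(s[1] for s in STRATA),
    sum(s[2] for s in STRATA), sum(s[3] for s in STRATA),
)
adjusted_rr = mantel_haenszel_rr(STRATA)

if __name__ == "__main__":
    print(f"Crude risk ratio:        {crude_rr:.2f}")
    print(f"SES-adjusted risk ratio: {adjusted_rr:.2f}")
```

With these invented counts the crude risk ratio is below 1 (apparently protective) although the risk ratio within each SES stratum is above 1. If SES were measured poorly or only once, some of this confounding would remain after adjustment.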
Bias as a reason for the discrepancy
Selection bias was also suggested as a reason for the discrepancy between the results of the NHS and the HERS. The self selection of the women taking the HRT in the NHS may have caused a “healthy user effect.” The women choosing HRT were likely to be more health conscious with better cardiovascular risk profiles than controls which would have biased the results towards finding HRT to be protective against CHD.18 If this was the cause of the discrepancy then a decrease in other cardiovascular diseases with the same modifiable risk factors as CHD (such as stroke) would also be expected. However, outcomes for risk of stroke and other cardiovascular diseases were concordant between the studies suggesting that another explanation accounted for the discrepancy in the results.18,20
Differences in diagnostic criteria between the studies could have created a reporting bias in the observational study. The HERS17 included silent myocardial infarctions in its analysis and the NHS16 did not.16,17 Both participants and researchers were aware of exposure status in the NHS16, thus subjects who believed HRT to be protective may have been less likely to attribute chest pain to CHD. In addition, although the NHS reported blinding researchers, the practicalities of this, especially in the case of death, where the exposure status was not expunged from the medical records, make it unlikely that blinding was effective.18 Researchers who thought HRT to be protective may have been less likely to report CHD as cause of death. Unblinded randomised trials have been found to overestimate the treatment effect22 and it is plausible that the influence on unblinded observational studies would be even larger18 (this is a topic that deserves further empirical review). A sensitivity analysis18 modelling the effect of excluding silent myocardial infarctions and reporting bias found the effects to be synergistic. A 20% misclassification of CHD deaths and differentially recognising 20% of nonfatal MI would reverse the apparent direction of risk from harmful to protective (1.29 to 0.99).18 An even smaller magnitude of bias would be required to reverse the direction of the risk estimate if confounding from SES was also considered. Thus the combination of bias introduced through limitations in the study design and inadequately controlling for confounding factors such as SES in the NHS could explain the discrepancy between the observational study and the RCT investigating the benefits of HRT for postmenopausal women.
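The general mechanism by which differential under-recording of events can pull a risk ratio towards or below 1 can be sketched in a few lines of Python. This is a deliberately simplified illustration and does not attempt to reproduce the model or the 0.99 estimate from the sensitivity analysis cited above. The cohort counts are hypothetical, chosen only so that the true risk ratio is 1.29, and the function name and parameters are assumptions.

```python
def observed_risk_ratio(events_exposed, n_exposed, events_unexposed, n_unexposed,
                        missed_exposed=0.0, missed_unexposed=0.0):
    """Risk ratio after a fraction of true events goes unrecorded in each group."""
    recorded_exposed = events_exposed * (1 - missed_exposed)
    recorded_unexposed = events_unexposed * (1 - missed_unexposed)
    return (recorded_exposed / n_exposed) / (recorded_unexposed / n_unexposed)


# Hypothetical cohort with a true risk ratio of 1.29 (129 vs 100 events per 1000).
TRUE_COUNTS = dict(events_exposed=129, n_exposed=1000,
                   events_unexposed=100, n_unexposed=1000)

if __name__ == "__main__":
    # Differential misclassification: events are missed in HRT users only.
    for missed in (0.0, 0.1, 0.2, 0.3):
        rr = observed_risk_ratio(**TRUE_COUNTS, missed_exposed=missed)
        print(f"{missed:.0%} of events missed in users: observed RR = {rr:.2f}")
```

In this simplified setting the observed risk ratio is the true risk ratio multiplied by the proportion of events still recorded in users, so a harmful effect of this size would appear null or protective once roughly a fifth to a quarter of events in users went unrecorded. If events were missed equally in both groups the ratio would be unchanged, which is why it is the differential nature of the bias that matters.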
If RCTs provide the highest level of evidence, why do we need to use other types of research methodologies?
The need for both experimental and observational studies can be demonstrated using the example of evaluating medications. Pre-marketing research relies predominantly on experimental design and a RCT is essential to provide the highest level of evidence for effectiveness of the medication.5 In contrast, post-marketing research relies almost completely on observational studies, because the limitations of the RCT design prohibit its use. The sample sizes and the long follow up periods required for a RCT to be used for post-marketing research of rare outcomes or outcomes with long latent periods would be unfeasible and would delay the introduction of beneficial medications. Continued evaluation even after a drug is approved for use is important to identify unknown effects of a drug (both beneficial and adverse) and is most effectively and efficiently done using observational methodology. This example of novel medication evaluation demonstrates how these two kinds of research designs can be used in synergy to answer the specific questions most appropriate for their particular strengths and weaknesses.5 Consideration of the weaknesses of the RCT (table 2) also suggests other situations in which an observational trial may be the only feasible option (eg ethical constraints) or represents the best use of resources to answer a research question (eg constraints due to finance, time or rarity of outcome). A combination of research methodologies may also be appropriate: an observational study can be used to generate hypotheses or for the proper planning of a subsequent RCT to ensure the most efficient use of resources.9,21 The most appropriate study design therefore depends upon the research question to be answered and consideration of the relative strengths and weaknesses of the various study designs.
The RCT is less susceptible to bias and confounding and therefore provides the highest level of research evidence for the efficacy of an intervention, but it is wrong to rely solely on experimental design.5 Observational trials are cheaper and quicker than RCTs and may offer the only feasible option of answering a research question due to ethical or resource limitations which prohibit a RCT.8 Rigorously conducted observational studies provide valuable research evidence in good agreement with RCT evidence.13 Discrepancies in research findings are often explained by observational methodology being more open to the influence of bias and confounding.6,8,9 These possible influences should be carefully considered when appraising research evidence but should not prohibit the use of observational evidence. In conclusion, research questions should be matched to the most appropriate research design by carefully considering the type of question to be answered, the resources available and the relative strengths and weaknesses of the various study designs.
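The claim that RCTs of rare outcomes require unfeasible sample sizes can be made concrete with a standard approximate sample size formula for comparing two proportions. The following Python sketch assumes a two-sided significance level of 5% and 80% power, and the event risks are hypothetical values chosen for illustration; none of these figures comes from the studies discussed in this article.

```python
import math
from statistics import NormalDist


def n_per_group(p_control, p_treated, alpha=0.05, power=0.80):
    """Approximate participants needed per arm to detect a difference
    between two proportions (normal approximation, equal group sizes)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    effect = p_control - p_treated
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


if __name__ == "__main__":
    # Hypothetical common outcome: risk doubles from 10% to 20%.
    print("Common outcome:", n_per_group(0.10, 0.20), "per group")
    # Hypothetical rare adverse event: risk doubles from 0.1% to 0.2%.
    print("Rare outcome:  ", n_per_group(0.001, 0.002), "per group")
```

Under these assumptions, detecting a doubling of risk needs on the order of two hundred participants per group when the outcome is common but more than twenty thousand per group when it is rare, before any allowance for drop-out or long follow up.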
- University of Hull, NHS Centre for Reviews and Dissemination. Undertaking systematic reviews of research on effectiveness: CRD guidelines for those carrying out or commissioning reviews. CRD report number 4. 2nd ed. London: Department of Health; 2001.
- NICE Evidence hierarchy [online]. 2008 [accessed 2008 June 7]; [1 screen]. Available from: http://www.nice.org.uk/aboutnice/howwework/developingniceclinicalguidelines/developing_nice_clinical_guidelines.jsp
- Petticrew M, Roberts H. Evidence, hierarchies, and typologies: horses for courses. J Epidemiol Community Health. 2003;57:527–529.
- Rychetnik L, Frommer M, Hawe P. Criteria for evaluating evidence on public health interventions. J Epidemiol Community Health 2002;56:119–27.
- Ray WA. Clinical studies of drug effects in humans. Clinical chemistry. 1996;42(8):1306-1311.
- Gosall NK, Gosall GS. The doctor’s guide to critical appraisal. Knutsford, UK: PasTest; 2006.
- Petrie A, Sabin C. Medical statistics at a glance. 2nd ed. Oxford: Blackwell; 2005.
- Jepsen P, Johnsen SP, Gillman MW, Sørensen HT. Interpretation of observational studies. Heart. 2004;90:956-960.
- Mann CJ. Observational research methods. Research design II: cohort, cross sectional and case-control studies. Emerg Med J. 2003;20:54-60.
- Altman DG, Schulz KF, Moher D, et al. for the CONSORT Group. The revised CONSORT statement for reporting randomised trials: explanation and elaboration. Ann Intern Med. 2001;134:663-694.
- von Elm E, Altman DG, Egger M, Pocock SJ, Gotzsche PC, Vandenbroucke JP, et al. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. Lancet. 2007;370(9596):1453-1457.
- McMahon AD, MacDonald TM. Design issues from drug epidemiology. Br J Clin Pharmacol. 2000; 50:419-425
- Concato J, Horwitz RI. Beyond randomised versus observational studies. The Lancet 2004; 363(9422):1660-1661.
- Gluud LL. Bias in clinical intervention research. American journal of epidemiology. 2006;163(6):493-501.
- Schulz KF, Grimes DA. Generation of allocation sequences in randomized trials: chance, not choice. Lancet. 2002;359:515-519.
- Grodstein F, Stampfer MJ, Manson JE, Colditz GA, Willett WC, Rosner B, et al. Postmenopausal estrogen and progestin use and the risk of cardiovascular disease. N Engl J Med. 1996;335:453-61.
- Hulley S, Grady D, Bush T, Furberg C, Herrington D, Riggs B, et al. Randomized trial of estrogen plus progestin for secondary prevention of coronary heart disease in postmenopausal women. Heart and Estrogen/progestin Replacement Study (HERS) Research Group. JAMA. 1998;280:605-613.
- Col NF, Pauker SG. The Discrepancy between Observational studies and randomized trials of menopausal Hormone Therapy: Did expectations shape experience? Ann Intern Med. 2003;139:923-929.
- Lawlor DA, Davey Smith G, Ebrahim S. Socioeconomic position and hormone replacement therapy use: explaining the discrepancy in evidence from observational and randomised controlled trials. Am J Public Health. 2004;94:2149–2154.
- Grodstein F, Manson JE, Stampfer MJ, Willett WC. The Discrepancy between Observational Studies and Randomized Trials of Menopausal Hormone Therapy. Annals of internal medicine. 2004;140(9):764-765.
- Lawlor DA, Smith GD, Bruckdorfer KR, Kundu D, Ebrahim S. Those confounded vitamins: what can we learn from the differences between observational versus randomised trial evidence? The Lancet 2004; 363(9422):1724-1727.
- Kunz R, Vist G, Oxman AD. Randomisation to protect against selection bias in healthcare trials. Cochrane database of systematic reviews 2007, issue 2.