The pesty P value


Editorial

Statistics can be considered as the study of quantified information. A fundamental application of statistics is making inferences from data. This process involves hypothesis testing. How do we go about testing a hypothesis from the data? For example, say we have a sample of intraocular pressure (IOP) measurements from the right eyes of 100 glaucoma patients who were randomized evenly to receive either drop X or drop Y. Our end-point results indicate that the mean IOP on drop X is 15 mmHg and the mean IOP on drop Y is 17 mmHg. In terms of its ocular hypotensive action, is drop X better than drop Y, or was the difference caused by chance? One method to answer this question is to perform a statistical test to determine a P value. In fact, this practice has become so entrenched in the medical literature that it is generally accepted without question. And the notion of whether or not the results were ‘significant’ has assumed a place in our lexicon, and influences whether or not a study is published.

But there are dirty little secrets underpinning this methodology. The school of probability theory that is used to construct this form of hypothesis testing is known as frequentist. In this paradigm, the probability of an event is conceptualized as the long-run frequency of the event. If we tossed a fair coin a thousand times, we would get about 500 heads. This seems reasonable; this notion of long-run frequency is applied to statistical hypothesis testing. If the results are reduced to the measure of interest, say the difference in sample means, then the P value is defined as the long-run frequency of obtaining the observed results, or a more extreme result, assuming the null hypothesis is true. Note that if P = 0.01, this does not mean that there is a 1% chance that the null hypothesis is true, a common misunderstanding. This interpretation is illogical because the P value is calculated on the assumption that the null hypothesis is true. Nor should the P value be considered as the false-positive rate (the alpha value). The alpha value is a predetermined percentage designated prior to the data collection; the P value is evaluated after the data collection. In traditional hypothesis testing, if the P value is less than the alpha value then the results are considered ‘statistically significant’. In perusing recent issues of Clinical and Experimental Ophthalmology, the ‘statistical significance’ of the results is frequently the first point of discussion in published papers.1–3
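The long-run definition of the P value given above can be made concrete with a small simulation. The following is only a sketch: the editorial reports the two group means (15 and 17 mmHg) but no raw data, so the individual measurements, the standard deviation of 3 mmHg, the random seeds and the function name `permutation_p_value` are all assumptions made for illustration. A permutation test is used here as one possible statistical test; the editorial does not name a specific one.

```python
import random
import statistics


def permutation_p_value(group_x, group_y, n_permutations=5000, seed=0):
    """Two-sided permutation P value for a difference in sample means.

    Under the null hypothesis the group labels are exchangeable, so the
    labels are reshuffled many times and we count how often the shuffled
    difference is at least as extreme as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(statistics.fmean(group_x) - statistics.fmean(group_y))
    pooled = list(group_x) + list(group_y)
    n_x = len(group_x)
    as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:n_x]) - statistics.fmean(pooled[n_x:]))
        if diff >= observed:
            as_extreme += 1
    # Add-one correction: the observed labelling counts as one permutation,
    # which keeps the estimate away from an impossible P = 0.
    return (as_extreme + 1) / (n_permutations + 1)


# Hypothetical data: 50 eyes per arm, SD of 3 mmHg assumed (not from the editorial).
data_rng = random.Random(42)
drop_x = [data_rng.gauss(15, 3) for _ in range(50)]
drop_y = [data_rng.gauss(17, 3) for _ in range(50)]

p_value = permutation_p_value(drop_x, drop_y)
print(f"Permutation P value: {p_value:.4f}")
```

Every reshuffle is generated on the assumption that the null hypothesis is true, which is why the resulting number cannot be read as the probability that the null hypothesis itself is true.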

There are a number of issues that make the P value a pest: (i) P values only make sense in terms of probabilistic reductio ad absurdum rejection of a null hypothesis; (ii) P values are interpreted in terms of long-run frequency despite only conducting the study once; (iii) P values are influenced by possibilities that never actually occurred; (iv) under frequentist hypothesis testing, it is perfectly possible to obtain different P values from the same data by changing the way that the study question is asked; and (v) if we had the same P value in our IOP study from a sample of 50 rather than 100 patients, would we have more or less evidence against the null hypothesis? This issue involves the effect size of a study, a critical issue in medicine (I believe we would have more evidence from the smaller study, but others would disagree).
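Points (iii) and (iv) can be illustrated with a well-known textbook coin example, which is not taken from this editorial. Suppose 9 heads and 3 tails are observed and the null hypothesis is a fair coin. The one-sided P value depends on whether the plan was to toss 12 times, or to keep tossing until the third tail appeared, even though the recorded data are identical. The sketch below assumes these hypothetical counts and these two designs.

```python
from math import comb

heads, tails = 9, 3  # hypothetical observed counts
n = heads + tails

# Design A: the number of tosses was fixed at 12 in advance.
# P value = P(at least 9 heads in 12 tosses of a fair coin).
p_fixed_n = sum(comb(n, k) for k in range(heads, n + 1)) / 2**n

# Design B: tossing continued until the third tail appeared.
# At least 9 heads precede the third tail exactly when the first
# 11 tosses contain at most 2 tails.
m = n - 1
p_fixed_tails = sum(comb(m, k) for k in range(tails)) / 2**m

print(f"Fixed number of tosses: P = {p_fixed_n:.4f}")      # 299/4096
print(f"Fixed number of tails:  P = {p_fixed_tails:.4f}")  # 67/2048
```

With an alpha value of 0.05, the first design is 'not significant' (P of about 0.073) and the second is 'significant' (P of about 0.033). The two designs differ only in which unobserved outcomes count as 'more extreme'.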

Abandoning P values and retreating to the use of confidence intervals is one proposed solution to the pesty P value, but does not help the logical conundrums inherent in frequentist hypothesis testing. Confidence intervals in frequentist statistics are also unnatural when they are exposed for what they really are. It is commonly believed that the 95% confidence interval about the parameter estimate indicates that the true value of the parameter in the population has a 95% probability of being within that range. That interpretation is understandable, but is a misconception. Under the frequentist probability theory, the true value of the parameter in the population is a fixed value. For example, the mean IOP in the right eyes of an entire target population has an exact value with no error. This value is either contained within the confidence interval with a probability of 1 or it is not, with a probability of 0. The correct interpretation of the 95% confidence interval is: the interval that would capture the true mean 95% of the time in the long run, if the study were replicated over and over again. Despite the illogicalities inherent in the frequentist approach to statistical hypothesis testing this methodology remains dominant in the medical literature. In this issue, Thomas et al. provide an insightful introduction to an alternative approach: Bayesian statistics.4 In this paper, they introduce the reader to the concept of the Bayesian paradigm and illustrate its usefulness in understanding clinical applicability of investigations.4
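The long-run reading of the 95% confidence interval described above can be checked by replicating a study many times on a computer. The sketch below is illustrative only: the population mean of 16 mmHg, the standard deviation of 3 mmHg, the sample size of 50 and the number of replications are all assumed values. The interval uses a large-sample normal approximation, so the observed coverage may fall slightly short of 95%.

```python
import random
import statistics

TRUE_MEAN = 16.0    # hypothetical fixed population mean IOP (mmHg)
TRUE_SD = 3.0       # hypothetical population standard deviation (mmHg)
SAMPLE_SIZE = 50    # eyes per replicated study
N_STUDIES = 2000    # number of replicated studies

z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96
rng = random.Random(1)

captured = 0
for _ in range(N_STUDIES):
    sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(SAMPLE_SIZE)]
    mean = statistics.fmean(sample)
    half_width = z * statistics.stdev(sample) / SAMPLE_SIZE ** 0.5
    # The true mean never moves; only the interval varies between studies.
    if mean - half_width <= TRUE_MEAN <= mean + half_width:
        captured += 1

coverage = captured / N_STUDIES
print(f"Intervals capturing the true mean: {coverage:.1%}")
```

Each individual interval either contains the fixed true mean or it does not. The figure of roughly 95% is a property of the procedure across replications, not of any single interval.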

Bayesians view probability in a different light to frequentists. To a Bayesian there is a certain evidence-based ‘degree of belief’ attributed to the probability of an outcome, which is modified by new evidence as it arises. In my opinion, this is how clinicians actually practice medicine and how scientists conduct science.
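The idea of a degree of belief that is modified by new evidence can be sketched with Bayes' theorem applied to a diagnostic test. The numbers below (a prior probability of disease of 2%, and a test with 90% sensitivity and 90% specificity) are hypothetical, as is the function name `posterior_probability`. Treating two successive test results as conditionally independent is a further simplifying assumption.

```python
def posterior_probability(prior, sensitivity, specificity, test_positive=True):
    """Update the probability of disease after one test result (Bayes' theorem)."""
    if test_positive:
        likelihood_disease = sensitivity
        likelihood_no_disease = 1 - specificity
    else:
        likelihood_disease = 1 - sensitivity
        likelihood_no_disease = specificity
    numerator = likelihood_disease * prior
    return numerator / (numerator + likelihood_no_disease * (1 - prior))


prior = 0.02  # hypothetical belief before testing
after_first = posterior_probability(prior, sensitivity=0.9, specificity=0.9)
# The posterior from the first test becomes the prior for the second.
after_second = posterior_probability(after_first, sensitivity=0.9, specificity=0.9)

print(f"Before testing:        {prior:.3f}")
print(f"After one positive:    {after_first:.3f}")
print(f"After two positives:   {after_second:.3f}")
```

Under these assumptions the belief rises from 2% to about 16% after one positive result and to about 62% after two, each result revising the belief left by the previous one.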

Only a few medical journals currently permit a Bayesian statistical analysis of data in published papers: Clinical and Experimental Ophthalmology is one of them.

Robert J Casson DPhil FRANZCO
South Australian Institute of Ophthalmology and Discipline of Ophthalmology and Visual Sciences, University of Adelaide, South Australia, Australia

Conflict/competing interest: No stated conflict of interest.

Clinical and Experimental Ophthalmology 2011; 39: 849–850 doi: 10.1111/j.1442-9071.2011.02707.x

© 2011 The Authors
Clinical and Experimental Ophthalmology © 2011 Royal Australian and New Zealand College of Ophthalmologists

REFERENCES

1. De Moraes CG, Furlanetto RL, Reis AS, Vegini F, Cavalcanti NF, Susanna R Jr. Agreement between stress intraocular pressure and long-term intraocular pressure measurements in primary open angle glaucoma. Clin Experiment Ophthalmol 2009; 37: 270–4.

2. Landers J, Henderson T, Craig J. Prevalence and associations of refractive error in indigenous Australians within central Australia: the Central Australian Ocular Health Study. Clin Experiment Ophthalmol 2010; 38: 381–6.

3. Grammenandi E, Detorakis ET, Pallikaris IG, Tsilimbaris MK. Differences between Goldmann applanation tonometry and dynamic contour tonometry in pseudoexfoliation syndrome. Clin Experiment Ophthalmol 2010; 38: 444–8.

4. Thomas R, Mengersen K, Parikh RS, Walland MJ, Muliyil J. Enter the reverend: introduction to and application of Bayes’ theorem in clinical ophthalmology. Clin Experiment Ophthalmol 2011; 39: 865–70.
