What’s a P-Value?
In simple terms, the p-value expresses how surprised you should be by the data, assuming there is no effect. The lower the p-value, the less compatible the data are with your model (i.e., the assumption that there is no effect).
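Suppose treatment A is compared with treatment B. You start from the assumption that there is no effect, or no difference between the treatments: the null hypothesis. You perform the test and get a p-value of 0.02. This means that, if the treatments truly did not differ, data at least as extreme as yours would turn up only about 2% of the time. In other words, the data you gathered are pretty surprising, given that you assumed the groups would not differ.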
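To make "how surprised" concrete, here is a minimal sketch of one way to compute a p-value: a two-sided permutation test, which assumes that under the null hypothesis the treatment labels are interchangeable. The pain scores and the helper name `permutation_p_value` are hypothetical; the article does not prescribe a particular test.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=1):
    """Two-sided permutation test for a difference in means.

    If the null hypothesis is true, the treatment labels are arbitrary, so we
    shuffle them many times and count how often the shuffled difference is at
    least as extreme as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign patients to "A" and "B"
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Adding 1 to numerator and denominator keeps the estimate above zero
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical pain scores (0-10) after treatment A and treatment B
treatment_a = [3, 4, 2, 5, 3, 4, 2, 3]
treatment_b = [5, 6, 4, 6, 5, 7, 4, 5]
p = permutation_p_value(treatment_a, treatment_b)
print(f"p = {p:.4f}")
```

The p-value here is literally the share of "no-difference" worlds (random relabelings) that produce a gap between the groups at least as large as the one you observed.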
The p-value exists to protect you from being fooled by randomness. In any study, there is a chance that the effects you see are just random, or data noise, as we call it. That’s why you might see noticeable differences in the mean values between groups, but no statistically significant effect. It can go the other way around as well: a study might show a non-significant result even though there is a true effect, perhaps because the sample size is just too small.
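A quick simulation can illustrate the point about noise. As a minimal sketch, it assumes normally distributed outcomes with a known standard deviation, so a simple z-test (rather than the t-test you would normally use) gives exact p-values. The population values and the helper `z_test_p_value` are made up for illustration.

```python
import random
from statistics import NormalDist, mean


def z_test_p_value(a, b, sigma):
    """Two-sided p-value for a difference in means, assuming a known SD `sigma`."""
    se = sigma * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


rng = random.Random(42)
sigma, n, n_studies = 10.0, 10, 5_000
diffs, significant = [], 0
for _ in range(n_studies):
    # Both groups come from the SAME population: there is no true effect
    a = [rng.gauss(50, sigma) for _ in range(n)]
    b = [rng.gauss(50, sigma) for _ in range(n)]
    diffs.append(abs(mean(a) - mean(b)))
    if z_test_p_value(a, b, sigma) < 0.05:
        significant += 1

print(f"largest mean difference seen by chance: {max(diffs):.1f} points")
print(f"share of studies with p < 0.05: {significant / n_studies:.3f}")
```

Because both groups share one population, every "significant" result here is a false positive. The share should land near 5%, while individual small studies still show mean differences of several points purely by chance.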
What influences the p-value?
P-values are influenced by a few different factors: sample size, effect size, and the type of test with its assumptions.
- Sample size: the bigger the groups, the more easily even small differences become statistically significant, and vice versa.
- Effect size: the bigger the effect, the more easily it becomes statistically significant, even with smaller groups, and vice versa.
- Type of test: how sensitive a test is to differences depends on its assumptions (for example about the data distribution, independence of measurements, and homoscedasticity) and on its design (one-sided vs two-sided, between-group vs within-group, etc.).
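The interplay of sample size, effect size, and sidedness can be sketched with a few lines of arithmetic. This assumes two equal-sized groups, a known SD, and an effect expressed as a standardized mean difference (Cohen's d); the helper `p_value_from_effect` is hypothetical and uses a z-approximation rather than any specific test from the article.

```python
from statistics import NormalDist


def p_value_from_effect(d, n_per_group, two_sided=True):
    """p-value for an observed standardized mean difference `d` between two
    groups of equal size, assuming a known SD (z-approximation)."""
    # With a known SD, the standard error of the standardized difference is sqrt(2/n)
    z = d / (2 / n_per_group) ** 0.5
    one_tail = 1 - NormalDist().cdf(abs(z))
    return 2 * one_tail if two_sided else one_tail


for d in (0.2, 0.5):
    for n in (20, 200):
        print(f"d={d}, n={n} per group: p={p_value_from_effect(d, n):.4f}")
```

With d = 0.2, going from 20 to 200 patients per group moves p from about 0.53 to about 0.05; a bigger effect (d = 0.5) reaches a lower p-value at the same sample size. Setting `two_sided=False` halves the p-value, which is why one-sided tests reach significance more easily.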
A huge study can find statistically significant results for even the smallest of effects, and such effects might not mean a thing to patients. This is where clinical significance comes into play: whether an effect is large enough to matter in practice. Conversely, a large effect can be convincing even in a tiny sample: the original penicillin study (Fleming, 1929) used a tiny sample, yet showed huge effects on eliminating bacteria.
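One way to keep statistical and clinical significance apart is to report the estimated effect with a confidence interval and compare it with a threshold for clinical relevance, in the spirit of the estimation approach in the Elkins et al. (2022) reference. The sketch below assumes a known common SD and equal groups; the trial numbers, the helper `mean_diff_summary`, and the `MCID` (minimal clinically important difference) value are all hypothetical.

```python
from statistics import NormalDist


def mean_diff_summary(diff, sd, n_per_group, alpha=0.05):
    """Two-sided p-value and (1 - alpha) confidence interval for a mean
    difference between two equal-sized groups, assuming a known common SD."""
    se = sd * (2 / n_per_group) ** 0.5
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    p = 2 * (1 - NormalDist().cdf(abs(diff) / se))
    return p, (diff - z_crit * se, diff + z_crit * se)


# Hypothetical huge trial: 0.4-point difference on a 0-100 pain scale (SD 20)
p, (low, high) = mean_diff_summary(diff=0.4, sd=20, n_per_group=50_000)
MCID = 10  # hypothetical minimal clinically important difference, in points
print(f"p = {p:.4f}, 95% CI {low:.2f} to {high:.2f}")
print("statistically significant:", p < 0.05)
print("CI excludes effects as large as the MCID:", high < MCID)
```

The result is statistically significant (p is about 0.002), yet the whole confidence interval lies far below the hypothetical 10-point MCID, so the effect is clinically irrelevant.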
P-value <0.05 threshold
The threshold for statistical significance that most researchers use (i.e., p < 0.05) is essentially arbitrary. All things considered, it should change based on your study setup. If you really do not want false positive results (e.g., a decision to undergo a life-threatening operation), you need a lower threshold. If you really don’t want false negatives (e.g., when diagnosing aggressive tumors), you need a high-powered study, which may also mean accepting a higher p-value threshold. This illustrates the trade-off between Type I (α) and Type II (β) errors.
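The α/β trade-off can be sketched with an approximate power calculation. This assumes a two-sided two-sample z-test with a known SD and a true standardized effect d; the helper `approx_power` and the study numbers are hypothetical, and the negligible opposite rejection tail is ignored.

```python
from statistics import NormalDist


def approx_power(d, n_per_group, alpha):
    """Approximate power of a two-sided two-sample z-test to detect a true
    standardized difference `d` (ignores the negligible opposite tail)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # significance threshold on the z scale
    noncentrality = d / (2 / n_per_group) ** 0.5   # expected z if the effect is real
    return NormalDist().cdf(noncentrality - z_crit)


# Hypothetical study: true effect d = 0.5, 30 patients per group
for alpha in (0.01, 0.05, 0.10):
    power = approx_power(0.5, 30, alpha)
    print(f"alpha={alpha:.2f}: power={power:.2f}, beta={1 - power:.2f}")
```

A stricter threshold (α = 0.01) means fewer false positives but a larger β (more missed true effects); a looser one buys power at the cost of more Type I errors. Increasing the sample size is the other way to raise power without loosening α.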
Do note that the p-value describes the data (under the null hypothesis), not your theory. You cannot ‘prove’ your theory with a statistically significant effect. The only thing you can do is try to refute your theory with different studies: if it holds, your theory stands, for now. This is falsification (Popper, 1963).
Misconceptions around the p-value
Some common misconceptions about the p-value in medical research include:
- A significant p-value means that the effect or association is large or clinically meaningful. Reality: The p-value only indicates the probability of obtaining a result at least as extreme as the observed one if the null hypothesis were true. It does not provide information about the size or clinical significance of the effect or association.
- A non-significant p-value means that there is no effect or association. Reality: A non-significant p-value only means that the observed result is not statistically significant; it does not necessarily mean that there is no effect or association. It may be due to low statistical power or other factors such as measurement error or confounding.
- A p-value of 0.05 is a universal threshold for statistical significance. Reality: The choice of significance level depends on the context and should be based on factors such as the study design, sample size, and the consequences of making a Type I error. A lower significance level may be appropriate in some situations, such as in studies with multiple comparisons or high stakes.
- A significant p-value proves causation. Reality: Statistical significance only indicates the probability of obtaining a result at least as extreme as the observed one if the null hypothesis were true. It does not establish causality, which requires additional evidence from study design, biological plausibility, and other factors.
- A large sample size always leads to a significant p-value. Reality: A large sample size increases the power to detect an effect or association, but it does not guarantee a significant p-value. The effect size, variability, and other factors also play a role in determining statistical significance.
References
Elkins, M. R., Pinto, R. Z., Verhagen, A., Grygorowicz, M., Söderlund, A., Guemann, M., Gómez-Conesa, A., Blanton, S., Brismée, J. M., Agarwal, S., Jette, A., Karstens, S., Harms, M., Verheyden, G., & Sheikh, U. (2022). Statistical inference through estimation: recommendations from the International Society of Physiotherapy Journal Editors. The Journal of manual & manipulative therapy, 30(3), 133–138.
Neyman, J., & Pearson, E. S. (1928). On the use and interpretation of certain test criteria for purposes of statistical inference. Biometrika, 20A, 175–240.
Greenland, S., Senn, S. J., Rothman, K. J., Carlin, J. B., Poole, C., Goodman, S. N., & Altman, D. G. (2016). Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations. European journal of epidemiology, 31(4), 337–350.
Kamper, S. J. (2019). Interpreting outcomes 2: statistical significance and clinical meaningfulness: linking evidence to practice. The Journal of orthopaedic and sports physical therapy, 49(7), 559–560.
Popper, K. (1963). Conjectures and refutations (pp. 33–39). London: Routledge and Kegan Paul. Reprinted in T. Schick (Ed.), Readings in the philosophy of science (2000, pp. 9–13). Mountain View, CA: Mayfield Publishing Company.
Christley, R. M. (2010). Power and error: increased risk of false positive results in underpowered studies. The Open Epidemiology Journal, 3, 16–19.
Fleming, A. (1929). On the antibacterial action of cultures of a Penicillium, with special reference to their use in the isolation of B. influenzæ. British Journal of Experimental Pathology, 10(3), 226–236. PMCID: PMC2048009.
Erickson, R. A., & Rattner, B. A. (2020). Moving Beyond p < 0.05 in Ecotoxicology: A Guide for Practitioners. Environmental toxicology and chemistry, 39(9), 1657–1669.