# What's the Multiple Comparison Problem? | Statistics

The multiple comparison problem arises when several statistical tests are performed on the same sample: the more tests you run, the more likely it becomes that at least one of them turns out significant purely by chance. An example illustrates this.

Let’s say that a prospective study looks at risk factors for running injuries in 5000 novice runners. Ten different variables are tested, since we do not yet know which ones increase the risk. Examples are: running volume, navicular drop, Q-angle, quad and glute strength, heel vs forefoot strike pattern, minimalist vs maximalist shoe, and ankle dorsiflexion ROM.

### False positives with multiple comparison

Most researchers accept a 5% false positive rate for a single test; this is the alpha, or significance level. For a given variable such as quadriceps strength, it means that if there is truly no effect and the study were conducted one hundred times, about 5 of those studies would still show a statistically significant result, i.e. a false positive.

However, the researchers are looking at ten variables within the same sample, not just quad strength. This poses a problem.

The researchers, unaware of this problem, conduct the study. Two years later the data come in, showing a heel strike pattern and glute strength to be risk factors for a running injury. Great! That’s the conclusion and the paper gets published.

As noted before, the 5% significance level applies to each test on its own. Because ten variables are being tested, the chance that at least one of them produces a false positive is much higher than 5%. By running the study with ten variables, the researchers implicitly accepted a much greater risk of false positive results.

The family-wise error rate, the probability of at least one false positive across all tests, quantifies this. With a fairly simple calculation, and assuming the ten tests are independent, it comes out at about 40%! The formula is shown below.
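To get a feel for that number, here is a minimal simulation sketch. It assumes that none of the ten variables has a real effect and that the tests are independent, so each p-value can simply be drawn from a uniform distribution. The function name, the number of simulated studies, and the seed are arbitrary choices for illustration, not part of the study described above.

```python
import random


def simulate_any_false_positive(n_tests=10, alpha=0.05, n_studies=20_000, seed=1):
    """Estimate how often at least one of n_tests true-null tests is 'significant'.

    Assumption: the tests are independent and every null hypothesis is true,
    so each p-value is uniformly distributed on [0, 1] and can be drawn directly.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        p_values = [rng.random() for _ in range(n_tests)]
        # The study "finds something" if its smallest p-value is below alpha.
        if min(p_values) < alpha:
            hits += 1
    return hits / n_studies


rate = simulate_any_false_positive()
print(f"Share of simulated studies with at least one false positive: {rate:.3f}")
```

With ten tests per study the estimated share should land close to the family-wise error rate, while `n_tests=1` should give a value close to alpha itself.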

### Solutions to the multiple comparison problem

I think we can agree that this is a problem. So what are we going to do about it? Researchers can counteract this alpha inflation by applying a correction such as Bonferroni or Holm. Both tighten the threshold that each individual p-value has to meet, so that the family-wise error rate stays at or below alpha. This is discussed in “Type 1 error rate control”.
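As a rough sketch of how the two corrections work: Bonferroni compares every p-value with alpha divided by the number of tests, while Holm sorts the p-values and compares them step by step with alpha/m, alpha/(m-1), and so on, stopping at the first one that fails. The p-values below are made up for illustration and do not come from any real study.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H0 for each test whose p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm(p_values, alpha=0.05):
    """Holm step-down procedure.

    Sorted p-values are compared with alpha/m, alpha/(m-1), ...; the procedure
    stops at the first p-value that fails, and all larger ones are not rejected.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break
    return reject


# Hypothetical p-values for ten risk factors (invented numbers).
p_values = [0.004, 0.0055, 0.03, 0.20, 0.45, 0.51, 0.62, 0.70, 0.88, 0.95]

print("Uncorrected:", [p < 0.05 for p in p_values])
print("Bonferroni: ", bonferroni(p_values))
print("Holm:       ", holm(p_values))
```

In this made-up example three variables look significant without correction, one survives Bonferroni, and two survive Holm. Holm never rejects fewer hypotheses than Bonferroni, which is why it is often preferred.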

Family-wise error rate formula:

$$\text{FWER} = 1 - (1 - \alpha)^{x}$$

$\alpha$: alpha or significance level as a decimal

$x$: number of tests
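Plugging numbers into the formula is a one-liner. The sketch below assumes independent tests, as the formula does, and the function name is our own choice for illustration.

```python
def family_wise_error_rate(alpha, n_tests):
    """Probability of at least one false positive across n_tests independent tests."""
    return 1 - (1 - alpha) ** n_tests


for n_tests in (1, 5, 10, 20):
    fwer = family_wise_error_rate(0.05, n_tests)
    print(f"{n_tests:>2} tests: FWER = {fwer:.3f}")
```

For the ten variables in the running example this gives roughly 0.40, the 40% mentioned earlier, and the rate keeps climbing as more tests are added.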

### Type II Errors

However, making the significance level of each individual test stricter increases the probability of Type II errors (false negatives). A more stringent threshold reduces the power of each test to detect a true effect or relationship, so a real effect may be missed in some tests.

To limit false negatives while still controlling false positives, we can use techniques such as pre-registration of hypotheses, replication studies, or alternative statistical approaches such as Bayesian inference. Additionally, it is important to design the study and its hypotheses carefully, so that the number of tests is kept to a minimum and each test is meaningful and relevant to the research question.
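The loss of power can be made concrete with a small sketch. It uses the textbook power approximation for a two-sided one-sample z-test. The effect size of 0.3 standard deviations and the sample size of 100 are hypothetical values picked for illustration, not numbers from the running example.

```python
from statistics import NormalDist


def z_test_power(effect_size, n, alpha):
    """Approximate power of a two-sided one-sample z-test.

    effect_size: true mean shift in standard-deviation units (hypothetical).
    n: sample size.
    alpha: significance level used for this single test.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect_size * n ** 0.5
    # Probability that the test statistic falls in either rejection region.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


power_uncorrected = z_test_power(effect_size=0.3, n=100, alpha=0.05)
power_bonferroni = z_test_power(effect_size=0.3, n=100, alpha=0.05 / 10)

print(f"Power at alpha = 0.05:  {power_uncorrected:.2f}")
print(f"Power at alpha = 0.005: {power_bonferroni:.2f}")
```

Under these assumptions the Bonferroni-adjusted threshold for ten tests gives noticeably lower power than the uncorrected one, which is the trade-off described above.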
