The Multiple Comparison Problem Explained

What’s the Multiple Comparison Problem?

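The multiple comparison problem arises when multiple statistical tests are performed on the same sample: the more tests are run, the greater the chance that at least one of them comes out significant purely by chance. An example will illustrate this.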

Example

Let’s say that a study looks at prospective risk factors for running injuries in 5000 novice runners. Different variables are tested, since we do not yet know which ones will increase the risk. Examples are: running volume, navicular drop, q-angle, quad and glute strength, heel vs forefoot strike pattern, minimalist vs maximalist shoe, and ankle dorsiflexion ROM.

False positives with multiple comparison

Most researchers accept a 5% false positive rate, known as the alpha or significance level. This applies to a single variable, such as quadriceps strength. It means that if quadriceps strength truly had no effect on injury risk and the study were conducted one hundred times, about 5 of those studies would still show a significant (false positive) result.

However, the researchers are looking at ten variables within the same sample, not just quad strength. This poses a problem.

The researchers, unaware of this problem, conduct the study. Two years later the data come in, showing a heel strike pattern and glute strength to be risk factors for running injuries. Great! That’s the conclusion and the paper gets published.

As noted before, a significance level of 5% per test does not mean that the study as a whole has a 5% false positive rate, because many different variables are being tested. By looking at ten variables, the researchers implicitly accepted a much greater risk that at least one of their findings is a false positive.
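A small simulation can make this inflation concrete. The sketch below is illustrative only: it assumes that none of the variables truly affects injury risk and that the tests are independent, so each p-value is simply a uniform random number between 0 and 1. The function name and the number of simulated studies are hypothetical choices, not part of the study described above.

```python
import random


def simulate_family_wise_rate(num_tests, alpha=0.05, num_studies=20_000, seed=1):
    """Fraction of simulated studies with at least one false positive.

    Assumption: every null hypothesis is true and the tests are independent,
    so each p-value is uniformly distributed on [0, 1].
    """
    rng = random.Random(seed)
    studies_with_false_positive = 0
    for _ in range(num_studies):
        p_values = [rng.random() for _ in range(num_tests)]
        # A study "finds" something if any single test is significant.
        if min(p_values) < alpha:
            studies_with_false_positive += 1
    return studies_with_false_positive / num_studies


one_variable = simulate_family_wise_rate(num_tests=1)
ten_variables = simulate_family_wise_rate(num_tests=10)
print(f"1 variable tested:   {one_variable:.1%} of studies show a false positive")
print(f"10 variables tested: {ten_variables:.1%} of studies show a false positive")
```

With one variable the simulated rate should sit close to the 5% the researchers had in mind, while with ten variables it is several times higher.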

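The family-wise error rate, the probability of at least one false positive across all tests, makes this concrete. Assuming the ten tests are independent, a simple calculation gives 1 – (1 – 0.05)^10 ≈ 0.40, so the chance of at least one false positive is about 40%! The formula is shown below.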

Solutions to the multiple comparison problem

I think we can agree that this is a problem. So what are we going to do about it? There is a solution: researchers can counteract this alpha inflation by applying a correction such as the Bonferroni or Holm correction. This is discussed in “Type 1 error rate control”.

Family-wise error rate formula:

1 – (1 – ɑ)^x

ɑ: alpha or significance level in decimals

x: number of tests
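The formula is easy to evaluate directly. The short sketch below computes it for a few numbers of tests at ɑ = 0.05; the function name is a hypothetical choice, and the formula itself assumes the tests are independent.

```python
def family_wise_error_rate(alpha, num_tests):
    """Probability of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** num_tests


for x in (1, 5, 10, 20):
    print(f"{x:>2} tests: FWER = {family_wise_error_rate(0.05, x):.3f}")
# Prints:
#  1 tests: FWER = 0.050
#  5 tests: FWER = 0.226
# 10 tests: FWER = 0.401
# 20 tests: FWER = 0.642
```

The ten-variable running study corresponds to the third line: roughly a 40% chance of at least one false positive.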

Type II Errors

Adjusting the significance level of each individual test comes at a cost: it increases the probability of making a Type II error (false negative). This is because the more stringent significance level reduces the power of each individual test to detect a true effect or relationship. Consequently, a real effect may be missed in some tests, leading to false negative results.

To limit false negatives when correcting for multiple comparisons, we can use techniques such as pre-registration of hypotheses, replication studies, or alternative statistical approaches such as Bayesian inference. Additionally, it is important to design the study and its hypotheses carefully, so that the number of tests is kept to a minimum and each test is meaningful and relevant to the research question.
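The loss of power can be illustrated with a rough calculation. The sketch below assumes a simple two-sided z-test and a hypothetical true effect whose expected test statistic is 2.8 (an arbitrary value, chosen because it gives about 80% power at ɑ = 0.05). It then compares the power at the uncorrected level with the power at the Bonferroni-corrected level for ten tests.

```python
from statistics import NormalDist


def z_test_power(expected_z, alpha):
    """Power of a two-sided z-test when the true effect shifts the
    test statistic to an expected value of `expected_z`."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Probability that the statistic lands beyond either critical value.
    upper_tail = 1 - std_normal.cdf(z_crit - expected_z)
    lower_tail = std_normal.cdf(-z_crit - expected_z)
    return upper_tail + lower_tail


expected_z = 2.8  # hypothetical effect size, for illustration only
power_uncorrected = z_test_power(expected_z, alpha=0.05)
power_bonferroni = z_test_power(expected_z, alpha=0.05 / 10)
print(f"Power at alpha = 0.05:  {power_uncorrected:.2f}")
print(f"Power at alpha = 0.005: {power_bonferroni:.2f}")
```

Under these assumptions the power drops from about 0.80 to about 0.50, meaning the same true effect would be missed roughly half of the time after correction unless the sample size is increased.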