Heijboer et al. (2022)

Inter-examiner reliability of the Doha agreement meeting classification system of groin pain in male athletes

This study examined the inter-rater agreement of the Doha classification system for groin injuries

Reliability varies by the method of classification used

The generalizability of these results to common practice may be limited as this study took place in a tertiary care setting

Introduction

In 2014, an expert panel at the Doha agreement meeting developed a clinical classification system for groin pain in athletes. The classification system was published by Weir et al. (2015) and has since found its way to clinicians working with athletes and groin injuries all over the world. Because the classification system for groin injuries is used so frequently, its reliability needs to be established. That is what this paper set out to investigate.

Methods

The Doha agreement panel defined 4 clinical entities of groin pain: adductor-related, iliopsoas-related, inguinal-related, and pubic-related. In addition to these, hip-related causes of groin pain and other causes were defined.

classification system for groin injuries
From: Heijboer et al., Scand J Med Sci Sports (2022)

This study was set up to examine the inter-rater reliability of the classification system. A surgeon and a physiotherapist independently examined adult male athletes with groin pain that either had a gradual onset and worsened with exercise, or had a sudden onset and persisted beyond 6 weeks.

The patient’s symptoms and injury history were explored in a semi-structured interview. The questions were based on the Doha agreement classification, but the clinicians were allowed to ask other questions as well. In addition to the interview, the participants completed the Arabic version of the Copenhagen Hip and Groin Outcome Score (HAGOS). This questionnaire is designed to measure symptoms, pain, function in daily living, function in sport and recreation, participation in physical activities, and hip and/or groin-related quality of life. Scores range from 0 to 100, where 0 represents extreme hip and/or groin symptoms.

After the symptoms had been recorded, the clinical examination was performed. It consisted of pain provocation tests (palpation, resistance testing, stretching), hip range of motion tests, and hip impingement tests (flexion-adduction-internal rotation (FADIR) and flexion-abduction-external rotation (FABER)). Using these findings together with the information obtained in the interview, the groin pain was classified according to the Doha agreement. Examiners could classify multiple clinical entities at their own discretion. When multiple causes for the groin pain were identified, the entities were ranked.

The inter-examiner reliability was studied using Cohen’s Kappa statistic. The interpretation of the Kappa values was as follows:

  1. almost perfect (κ = 0.81–1.00),
  2. substantial (κ = 0.61–0.80),
  3. moderate (κ = 0.41–0.60),
  4. fair (κ = 0.21–0.40),
  5. slight (κ = 0–0.20),
  6. and poor (κ < 0).
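
To make the statistic concrete, here is a minimal Python sketch of unweighted Cohen’s Kappa together with the interpretation bands listed above. The function names and the example ratings are hypothetical and are not taken from the study, and the authors’ own analysis may have been carried out differently.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equally long lists of labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of cases where both raters gave the same label.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, derived from each rater's own label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_expected) / (1 - p_expected)


def interpret_kappa(kappa):
    """Map a kappa value onto the bands listed in the text."""
    value = round(kappa, 2)  # the bands are defined to two decimals
    if value < 0:
        return "poor"
    if value <= 0.20:
        return "slight"
    if value <= 0.40:
        return "fair"
    if value <= 0.60:
        return "moderate"
    if value <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical ratings for 12 sides (1 = entity present, 0 = entity absent).
examiner_1 = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
examiner_2 = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

kappa = cohens_kappa(examiner_1, examiner_2)
print(round(kappa, 2), interpret_kappa(kappa))
```

In this made-up example the examiners agree on 10 of 12 sides and chance agreement is 0.5, so Kappa comes out at about 0.67, which falls in the substantial band.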

Results

Forty-eight males with groin pain were included in this study. Eighteen of them had bilateral symptoms and thus 66 sides were examined in total. For the 4 clinical entities of groin pain, inter-examiner reliability was found to be fair for adductor-related, moderate for iliopsoas-related and inguinal-related, and slight for pubic-related groin pain (Kappa according to the dichotomous scale interpretation).

classification system for groin injuries
From: Heijboer et al., Scand J Med Sci Sports (2022)

When multiple causes of groin pain were identified, the clinical entities were ranked in descending order of perceived clinical importance. For these rankings, the Kappa values indicated substantial reliability for adductor-related and iliopsoas-related, moderate reliability for inguinal-related, and slight reliability for pubic-related groin pain. This is the interpretation of the Kappa values on the ordinal scale.

In seven of the 48 participants, only 1 clinical entity was diagnosed. Here, the agreement between the blinded examiners was 100%. However, the majority of participants were classified as having more than 1 clinical entity causing the groin pain, and the inter-examiner agreement was much lower in these cases. Overall, the examiners agreed on the same classification or combination of classifications in 29% of participants and 23% of sides.

Questions and thoughts

There appears to be much variation in the diagnosis of the groin injuries between the 2 examiners. Could it have been influenced by their different professions (surgeon versus physiotherapist)? It appears that using the Doha classification system for groin injuries does not lead to uniformity in the diagnosis between different examiners. This may be partly explained by the fact that it was possible to diagnose multiple clinical entities that caused the groin injuries, and by the fact that the investigators were asked to rank these entities according to their perceived clinical importance, from most to least important. These ranks were analyzed as an ordinal variable, meaning that the order matters. When the clinical classification was ranked as such, the examiners agreed to a greater extent.
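
One common way to let the order of ranks count is a weighted Kappa, where near-misses are penalized less than large disagreements. The sketch below uses linear weights. This is an assumption for illustration, since the weighting scheme used in the paper is not described here. The function name, the 0/1/2 coding, and the ratings are all hypothetical.

```python
def linear_weighted_kappa(rater_a, rater_b, categories):
    """Linearly weighted kappa for ordinal ratings; `categories` lists the levels in order."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    k = len(categories)
    if k < 2:
        raise ValueError("at least two ordered categories are required")
    index = {category: position for position, category in enumerate(categories)}
    n = len(rater_a)

    # Cross-tabulate the two raters as proportions.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1 / n
    row_totals = [sum(observed[i]) for i in range(k)]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    # Disagreement weight grows linearly with the distance between the two ranks.
    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)
            observed_disagreement += weight * observed[i][j]
            expected_disagreement += weight * row_totals[i] * col_totals[j]
    if expected_disagreement == 0:
        raise ValueError("weighted kappa is undefined when expected disagreement is 0")
    return 1 - observed_disagreement / expected_disagreement


# Hypothetical coding: 0 = entity not diagnosed, 1 = diagnosed but ranked lower,
# 2 = ranked as the most important entity.
examiner_1 = [2, 2, 1, 0, 0, 1, 2, 0]
examiner_2 = [2, 1, 1, 0, 1, 1, 2, 0]

weighted = linear_weighted_kappa(examiner_1, examiner_2, categories=[0, 1, 2])
print(round(weighted, 2))
```

With these made-up ratings the two examiners never differ by more than one rank, so the weighted Kappa (about 0.71) is higher than the unweighted Kappa for the same data (about 0.64). That is one plausible reason why agreement can look better on an ordinal scale.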

Table 1 reveals that the examination by the second examiner was not performed on the same day in one-third of the participants. In 13% it was performed after 1-2 days, in 15% after 3-5 days, and in 6% after 6-7 days. This could have had pros and cons. A delay in the second examination could have influenced the inter-examiner agreement as the symptoms could have changed. On the other hand, avoiding a repeat examination on the same day could possibly have limited provocation and worsening of the symptoms during the second examination.

In the article, the following was stated: “Both blind examiners agreed on the same classification/combination of classifications in 14/48 (29%) of participants and 15/66 (23%) sides”. Thus, in less than one-third of cases, the examiners agreed on the cause of the groin injury. When only 1 clinical entity of groin pain was identified, the agreement was 100%, but only 7 out of 48 participants had unilateral symptoms and only one clinical entity. It seems obvious that agreement is much higher for a clear clinical picture than when the groin injury is thought to result from several problems. But I wonder how a very detailed classification system can leave so much overlap. The authors explain that the examiner could classify the injury even when not all criteria were present. I can hear you questioning the usefulness of the classification, and rightly so. When only the injuries that met all criteria of the classification system were analyzed, the inter-rater agreement improved.

classification system for groin injuries
From: Heijboer et al., Scand J Med Sci Sports (2022)

So why didn’t they stick to the ‘rules’ of the classification system? The Doha classification leaves room for interpretation, as the authors describe: “For example, the definition for iliopsoas-related groin pain (‘iliopsoas tenderness and more likely if there is pain on resisted hip flexion and/or pain on hip flexor stretching’) allows a considerable amount of individual examiner interpretation. If an athlete has mild secondary symptoms reproduced during an iliopsoas palpation test, but not during stretch or resistance tests, one examiner may classify this as iliopsoas-related groin pain while the other may not.” This may have led to different interpretations and subsequently to lower agreement. On the other hand, I encourage you to remain critical and avoid simply checking boxes in your clinical examination. Clinical reasoning remains the most important part of your diagnostic workup.

An Arabic translation of the HAGOS score was used, but this version still needs to be validated. This is not much of a problem, as the score was only used to describe the baseline characteristics of the participants.

Talk nerdy to me

An important point when interpreting these results is that both examiners were part of the expert panel that developed the Doha classification system for groin injuries used in this study. They have considerable clinical expertise in this area, which may limit the generalizability of these results to less experienced assessors. It may also have introduced bias in how the results are presented. We see this, for example, when the authors say that reliability ranges from slight to substantial. This is true for the ordinal data (when the different clinical entities were ranked according to their clinical importance). For the nominal data, however (when the different causes of groin pain within 1 patient were not ranked by importance), the reliability between the examiners ranges from slight to moderate. This is an example of how results are sometimes worded a little more favorably. The authors were involved in the development of this classification and obviously want a good result. It would have been better to have this study conducted by independent examiners not involved in the expert panel, or by less experienced examiners. But of course, this could still happen in the future.

Table 2 reveals that the prevalence of pubic-related, hip-related, and other causes was relatively low. The kappa value, however, is influenced by the prevalence of the condition: when one category is rare, kappa can be low even if the raters agree in most cases. Therefore, the kappa values for pubic-related, hip-related, and other causes of groin pain should be interpreted with caution. The bias index that was calculated indicates the extent to which the raters disagree on the proportion of positive or negative cases. A high bias index means that the raters disagree more on this proportion, which can result in an overestimation of the kappa value.
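
The effect of prevalence can be illustrated with a small Python sketch based on the prevalence index, bias index, and prevalence-adjusted bias-adjusted kappa (PABAK) described by Sim and Wright (2005). The function name and all counts below are invented for illustration. They are not the values from Table 2 of the paper.

```python
def agreement_indices(both_pos, a_only, b_only, both_neg):
    """Kappa and related indices for a 2x2 table of two raters' yes/no decisions."""
    n = both_pos + a_only + b_only + both_neg
    if n == 0:
        raise ValueError("the table must contain at least one case")
    p_observed = (both_pos + both_neg) / n
    a_pos = both_pos + a_only  # cases rater A called positive
    b_pos = both_pos + b_only  # cases rater B called positive
    p_expected = (a_pos * b_pos + (n - a_pos) * (n - b_pos)) / (n * n)
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return {
        "observed_agreement": p_observed,
        "kappa": (p_observed - p_expected) / (1 - p_expected),
        # How unbalanced the agreed-positive and agreed-negative cells are.
        "prevalence_index": abs(both_pos - both_neg) / n,
        # How much the raters differ in how often they say "positive".
        "bias_index": abs(a_only - b_only) / n,
        "pabak": 2 * p_observed - 1,
    }


# Two hypothetical tables with the same observed agreement (92 of 100 cases).
rare_finding = agreement_indices(both_pos=3, a_only=4, b_only=4, both_neg=89)
balanced_finding = agreement_indices(both_pos=46, a_only=4, b_only=4, both_neg=46)

for name, result in [("rare", rare_finding), ("balanced", balanced_finding)]:
    print(name, {key: round(value, 2) for key, value in result.items()})
```

In this made-up example both tables show 92% observed agreement, yet kappa is about 0.39 when the finding is rare (prevalence index 0.86) and 0.84 when it is balanced (prevalence index 0). This is why a low kappa for a rarely diagnosed entity does not necessarily mean the examiners disagreed often.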

Take home messages

This study examined the inter-rater reliability of the Doha classification system for groin injuries. The results indicate that the agreement between both examiners was good when only 1 cause of groin pain was identified. When multiple clinical entities were present, reliability was best when the entities were ranked according to their perceived clinical importance. This held for adductor-, inguinal-, and iliopsoas-related groin pain, but not for pubic-related, hip-related, and other causes of groin pain. You could say that even the experts did not always agree, even when they strictly used the clinical criteria as proposed in the Doha Agreement. So I suggest you familiarize yourself with the criteria before using them. It is also wise to document your findings clearly, so that you can compare your decisions with those of a colleague and better justify your diagnosis.

References

Heijboer WMP, Weir A, Vuckovic Z, Fullam K, Tol JL, Delahunt E, Serner A. Inter-examiner reliability of the Doha agreement meeting classification system of groin pain in male athletes. Scand J Med Sci Sports. 2022 Oct 18. doi: 10.1111/sms.14248. Epub ahead of print. PMID: 36259124.

Weir, A., Brukner, P., Delahunt, E., Ekstrand, J., Griffin, D., Khan, K. M., … & Hölmich, P. (2015). Doha agreement meeting on terminology and definitions in groin pain in athletes. British Journal of Sports Medicine, 49(12), 768-774.

Additional reference

Sim J, Wright CC. The kappa statistic in reliability studies: use, interpretation, and sample size requirements. Phys Ther. 2005 Mar;85(3):257-68. PMID: 15733050.
