Two of the most widely used instruments for assessing depression severity in clinical trials are the Montgomery-Åsberg Depression Rating Scale (MADRS) and the Hamilton Depression Rating Scale (HAM-D). Both scales are complex to administer, so careful attention to rater training is needed to reduce errors and capture high-quality data for informed clinical trial decision-making.
Cogstate Clinician Network Associate Director Kim Baldwin recently presented on MADRS and HAM-D rater training best practices and some of the most common rater errors. Baldwin is a seasoned clinical research professional with years of experience administering the MADRS and HAM-D and is known for her expertise in clinical assessment, training, and trial operations.
The Need for Rater Training
Clinical outcomes, particularly complex clinician-rated scales like the MADRS and HAM-D, are subject to variability and error stemming from lack of standardization, rater drift, inconsistent administration, and scoring mistakes. For a study to yield reliable results with the statistical power to detect true treatment effects, robust rater training and certification protocols are advisable and are recommended by regulatory bodies.
“I understand the pain of seemingly never-ending requirements for training and onboarding, study after study,” said Baldwin. “It can feel understandably burdensome, even redundant, to work through all the training requirements for each new study. However, there are steps to take that can reduce burden and preserve quality.”
These steps can include training programs tailored to a rater’s experience as well as the use of central raters.
3 Common MADRS and HAM-D Rater Errors
Training programs for MADRS and HAM-D raters should address the errors most commonly seen when these scales are administered. Drawing on many years in the industry, Baldwin described the following common rater pitfalls*:
- HALO EFFECT – The tendency to let the rater’s impression of one item or symptom’s severity color the ratings of all the remaining symptoms. For example, a rater may incorrectly assume that a participant who reports fairly severe depressed mood within the prior week must also experience a significant loss of interest, or “inability to feel.” Each item on both scales should be rated independently.
- EUTHYMIC BASELINE – Errors in establishing and using the participant’s last well period as the point of comparison. Accurate and consistent assessment here requires training and calibration.
- FOLLOW-UP and SCORING
- When rating the MADRS, anchor descriptors are provided only for even-numbered scores; odd-numbered scores fall between two anchors and rely on rater judgment. That, along with variability in how much follow-up questioning different raters employ, can create significant opportunity for rater variance.
- For the HAM-D, each item has anchor descriptors to distinguish between the scores, but the anchors place the key scoring criteria in parentheses. If raters are not trained on the proper use of these parenthetical criteria, scoring can diverge across raters (a simple illustration of the two anchor structures follows this list).
*Please see the gold-standard structured interview guides developed by Dr. Janet Williams (HAM-D) and by Dr. Williams and Dr. Kenneth Kobak (MADRS) for further guidance.
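To make the contrast above concrete, here is a minimal, purely illustrative sketch of the two anchor structures. The item and anchor text are placeholders rather than the published scale wording; the snippet is only meant to show where rater judgment enters and why calibration matters.

```python
# Purely illustrative sketch of the two anchor structures; the anchor text
# below is placeholder wording, not the published scale content.

# MADRS: each item is scored 0-6, but anchor descriptors are supplied only
# for the even-numbered scores. Odd scores sit between two anchors and rely
# on rater judgment, which is one source of rater-to-rater variance.
madrs_item_anchors = {
    0: "placeholder anchor for score 0",
    2: "placeholder anchor for score 2",
    4: "placeholder anchor for score 4",
    6: "placeholder anchor for score 6",
    # scores 1, 3, and 5 have no anchor text of their own
}

# HAM-D: every score on an item has an anchor descriptor, with the key
# scoring criteria given in parentheses. Raters must be trained to apply
# those parenthetical criteria consistently, or scoring will diverge.
hamd_item_anchors = {
    0: "placeholder anchor (key criterion in parentheses)",
    1: "placeholder anchor (key criterion in parentheses)",
    2: "placeholder anchor (key criterion in parentheses)",
}
```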
MADRS and HAM-D Rater Training Process Best Practices
To reliably train and calibrate a cohort of raters, the training program for clinical research scales such as the MADRS and HAM-D must reflect how the scales are used in the trial.
For example, an enhanced training approach is often advisable when the HAM-D or MADRS serves as an inclusion/exclusion measure or a key endpoint, warranting a process in which raters must demonstrate competency before being certified. If the scales are used as secondary or exploratory measures for a non-MDD indication, didactic training and use of the structured interview guide may suffice and save some cost.
At a minimum, a comprehensive online learning module should be provided to each MADRS/HAM-D rater.
“Particularly for novice raters, a demonstration of proper administration can be important,” said Baldwin. “I find that one of the more challenging parts of both MADRS and HAM-D interviews is working from the start of the first item through establishing the euthymic baseline. A good demonstration of this part of the scale can go a long way.”
Another option is a video scoring exercise, which can be particularly effective across a larger or geographically diverse group of raters. It has the added benefit of pairing demonstrations of proper administration with correct scoring.
Finally, having an experienced trainer confirm a rater’s readiness and calibration before the rater begins scoring in a study can be a game changer for capturing quality data.
Wrap-Up
If you are interested in learning more, please contact our team and/or access the free webinar for more details.
Kim Baldwin | Associate Director, Clinician Network
Kim Baldwin is a seasoned clinical research professional known for her extensive expertise in clinical assessment, central rating, quality oversight, training, and trial operations. She has a versatile therapeutic background spanning many populations and has worked in individual, family, group, and community mental health settings.
Kim has an MA in Clinical Psychology from Pepperdine University and a BA in Psychology from The University of Texas at Austin.