The ADAS-cog in Alzheimer's disease clinical trials: psychometric evaluation of the sum and its parts
  1. Stefan J Cano1,
  2. Holly B Posner2,
  3. Margaret L Moline3,
  4. Stephen W Hurt3,4,
  5. Jina Swartz5,
  6. Tim Hsu3,
  7. Jeremy C Hobart1
  1. 1Clinical Neurology Research Group, Peninsula College of Medicine and Dentistry, Plymouth, UK
  2. 2Pfizer Inc, New York, USA (formerly of Eisai Medical Research Inc)
  3. 3Eisai Neuroscience Product Creation Unit, Woodcliff Lake, New Jersey, USA
  4. 4Weill Medical College of Cornell University, New York, USA
  5. 5Eisai Neuroscience Product Creation Unit, Hatfield, Hertfordshire, UK
  1. Correspondence to Jeremy Hobart, Department of Clinical Neuroscience, Peninsula College of Medicine and Dentistry Room N16 ITTC Building, Tamar Science Park, Davy Road, Plymouth, Devon PL6 8BX, UK; jeremy.hobart{at}pms.ac.uk

Abstract

Background The Alzheimer's Disease Assessment Scale Cognitive Behavior Section (ADAS-cog), a measure of cognitive performance, has been used widely in Alzheimer's disease trials. Its key role in clinical trials should be supported by evidence that it is both clinically meaningful and scientifically sound. Its conceptual and neuropsychological underpinnings are well-considered, but its performance as an instrument of measurement has received less attention.

Objective To examine the traditional psychometric properties of the ADAS-cog in a large sample of people with Alzheimer's disease.

Methods Data from three clinical trials of donepezil (Aricept) in mild-to-moderate Alzheimer's disease (n=1421; MMSE 10–26) were analysed at both the scale and component level. Five psychometric properties were examined using traditional psychometric methods. These methods of examination underpin upcoming Food and Drug Administration recommendations for patient rating scale evaluation.

Results At the scale level, criteria tested for data completeness, scaling assumptions (eg, component-total correlations: 0.39–0.67), targeting (no floor or ceiling effects), reliability (eg, Cronbach's α: 0.84; test-retest intraclass correlations: 0.93) and validity (correlation with MMSE: −0.63) were satisfied. At the component level, 7 of 11 ADAS-cog components had substantial ceiling effects (range 40–64%).

Conclusions Performance was satisfactory at the scale level, but most ADAS-cog components were too easy for many patients in this sample and did not reflect the expected depth and range of cognitive performance. The clinical implication of this finding is that the ADAS-cog's estimate of cognitive ability, and its potential ability to detect differences in cognitive performance under treatment, could be improved. However, because of the limitations of traditional psychometric methods, further evaluations would be desirable using additional rating scale analysis techniques to pinpoint specific improvements.

  • Alzheimer's disease
  • clinical trials
  • rating scales
  • reliability
  • validity

Introduction

Alzheimer's disease is a terminal dementing neurodegenerative disease that impacts on cognition and behaviour.1 It is the most common form of dementia, affecting approximately 27 million people worldwide,2 and incidence rates are expected to quadruple by the middle of this century.3 Considerable interest and resources have been targeted at slowing Alzheimer's disease progression as reflected in the growing number of clinical trials in Alzheimer's disease.4

The most widely used primary outcome measure in these clinical trials has been the Alzheimer's Disease Assessment Scale-Cognitive Behavior Section (ADAS-cog).5 6 It was developed in the early 1980s in response to the then perceived lack of appropriate instruments available to test the efficacy of Alzheimer's disease drug treatments,5 6 to assess the ‘severity of dysfunction and research in patients with Alzheimer's disease’.6 Since its inception, the ADAS-cog has been used in over 127 Alzheimer's disease clinical trials and, although developed specifically for Alzheimer's disease, it has frequently been used in non-Alzheimer's disease populations, including mild cognitive impairment,7 vascular dementia8 and Parkinson's disease.9 Of particular relevance to the present study is that clinical trials are increasingly focusing on people earlier in the disease process and with less severe Alzheimer's disease. As awareness increases, diagnoses are likely to be made much earlier than they were 25 years ago.

If the ADAS-cog is to be considered fit for future measurement of all severities of Alzheimer's disease, including milder forms, it should satisfy stringent criteria as a reliable and valid measure of cognitive performance. International regulatory agencies now widely recognise this issue in relation to the use of patient rating scales. The ADAS-cog was developed with sound consideration of the relevant neuropsychological consequences of Alzheimer's disease, but without being subjected to rigorous psychometric techniques of rating scale construction. Although we are unsure of the precise reasons why the ADAS-cog was developed in this way, the absence of standard rating scale construction methods may have resulted from a lack of awareness of them: although these methods have existed for decades, they have rarely been applied in clinical rating scale research.

At its introduction, data on the ADAS-cog were provided on inter-rater and test-retest reliability in small samples of Alzheimer's disease patients (n=27) and elderly people without Alzheimer's disease (n=28).6 Since then, it has undergone additional scale-level psychometric evaluations12–14 with some authors suggesting possible key limitations.15 16 Explaining why psychometric evaluation of rating scales before their use is important requires a brief overview of the key issues surrounding the use of rating scales as outcome measures.

Measurement requires the construction of an instrument for carrying out the practical process of measuring. Some variables, like height, can be measured directly and by relatively straightforward means. Other variables, like cognitive performance, need to be approached indirectly through quantifying their manifestations. It is important to note that evaluating the ADAS-cog with rating scale testing methods is appropriate for its role as a clinical assessment tool, but is less crucial there than for its role as a measurement instrument in clinical research. This is because clinical assessment and measurement are different processes with different requirements. We have previously summarised these,11 but the key issue is that measurement has a specific meaning with respect to the quantification of attributes. By contrast, clinical assessment is frequently a qualitative process. For variables that must be approached indirectly, instrument development is not straightforward and requires the construction of tools that transform numerically graded manifestations into measurements of underlying variables. Indirectly measured variables are often called latent (hidden) variables to emphasise this fact.

Rating scales are constructed to measure latent variables. It is customary for a rating scale to consist of a set of items each of which represents a different manifestation. In relation to the ADAS-cog we have referred to these as components, as the eleven questions used are more detailed, time consuming and involved than traditional rating scale items. Every item is scored and item scores are combined to give a total score for each person. This value is a measure of the variable quantified by the set of items.

Whether a rating scale generates clinically meaningful and scientifically sound measurements depends on decisions during its construction and its performance during testing. The decisions concern the components selected to form the set, their clinical grading and numerical scoring, and how components are combined to give a single value. Performance is tested against a number of predefined measurement (psychometric) criteria.

The original ADAS-cog measures cognitive performance by combining ratings of 11 components (word recall, word recognition, constructional praxis, orientation, naming objects and fingers, commands, ideational praxis, remembering test instruction, spoken language, word finding, comprehension) representing six broad areas of cognition: memory, language, ability to orientate oneself to time, place and person, construction of simple designs and planning, and performing simple behaviors in pursuit of a basic, predefined goal.5 6 Seven of the eleven ADAS-cog components are scored as the ‘number incorrect’. For example, the commands component is scored as the number of five commands performed incorrectly (none, 1, 2, 3, 4 or all 5). The remaining four ADAS-cog components are scored from 0 (no limitations) to 5 (maximum limitations) according to the examining clinician's perception of remembering test instructions, spoken language ability, word finding and comprehension. Scores for the 11 components are summed, without weighting, into a total ADAS-cog score. Low total scores indicate better cognitive performance. Online supplementary material appendix 1 shows the component structure of the ADAS-cog. Note that the 11 components have different score ranges.
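The scoring rule described above can be sketched in a few lines of Python. This is an illustration only. The 0–5 ranges for commands and for the four clinician-rated components follow the text; the remaining maxima are assumed from the standard 11-component ADAS-cog (total range 0–70) and should be checked against appendix 1. The names `COMPONENT_MAX` and `adas_cog_total` are hypothetical.

```python
# Maximum score per component. Commands and the four clinician-rated
# components (0-5) follow the text; the other maxima are ASSUMED from the
# standard 11-component ADAS-cog and should be verified against appendix 1.
COMPONENT_MAX = {
    "word_recall": 10,
    "word_recognition": 12,
    "constructional_praxis": 5,
    "orientation": 8,
    "naming_objects_and_fingers": 5,
    "commands": 5,
    "ideational_praxis": 5,
    "remembering_test_instructions": 5,
    "spoken_language": 5,
    "word_finding": 5,
    "comprehension": 5,
}


def adas_cog_total(scores):
    """Return the unweighted sum of the 11 component scores.

    Raises ValueError if a component is missing or out of range, so that a
    total is only reported for complete, valid records.
    """
    total = 0
    for name, max_score in COMPONENT_MAX.items():
        if name not in scores:
            raise ValueError(f"missing component: {name}")
        value = scores[name]
        if not 0 <= value <= max_score:
            raise ValueError(f"{name} out of range: {value}")
        total += value
    return total


# Hypothetical patient: error-free on every component except word recall.
example = {name: 0 for name in COMPONENT_MAX}
example["word_recall"] = 4
print(adas_cog_total(example))  # 4
```

Because the sum is unweighted, components with wider score ranges (here, word recall and word recognition) can contribute more points to the total than the others.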

This process appears clinically appropriate but requires empirical proof that it ‘works’. This means that evidence is needed to support the choice of items forming the set, scoring of the individual items and appropriateness of combining item scores into a single score. Also evidence should be available demonstrating that the single score is a reliable and valid measure of cognitive performance. Psychometric methods provide formal frameworks for gathering this evidence.

There are two main types of psychometric method: traditional and modern.11 Traditional methods are the most widely used analytic strategy for determining rating scale reliability and validity and will be reported here.17 These are the psychometric methods best understood by clinicians and clinical researchers and underpin the forthcoming Food and Drug Administration (FDA) guidelines for rating scales.10 The aim of this study was to provide clinicians and researchers with a traditional psychometric evaluation of the ADAS-cog, which goes beyond the existing published examinations in type (ie, detailed evaluations of data quality, scaling assumptions, targeting, reliability, validity) and kind (ie, the inclusion of scale level and, importantly, component-level analyses).

Methods

Setting and participants

Anonymised screening and baseline data from three large clinical trials of donepezil18–20 in people with Alzheimer's disease were pooled for analysis. The inclusion criteria were healthy ambulatory people aged ≥50 years with a diagnosis of probable Alzheimer's disease, of mild to moderate severity (Clinical Dementia Rating 1 or 2), with a Mini-Mental State Examination (MMSE) score between 10 and 26 and uncomplicated by stroke.

Data analysis

Many clinicians are familiar with reliability and validity testing, but a more thorough traditional psychometric evaluation involves the assessment of six properties: data completeness, scaling assumptions, targeting, reliability, validity and responsiveness. Data completeness concerns the extent to which a scale's components are completed in the target sample and the per cent of people for whom it is possible to report a single score. Tests of scaling assumptions examine whether it is appropriate statistically to sum the 11 components to generate a single scale score. Targeting assesses the match between the range of cognitive performance measured by the ADAS-cog and the range of cognitive performance in the sample. Reliability describes the extent to which scale scores are free from random error. Validity refers to the extent to which the ADAS-cog measures cognitive performance. Responsiveness is the ability to detect accurately true change in cognitive performance when it has occurred. We examined five of these six psychometric properties (see online supplementary material appendix 2), which are extensively documented elsewhere.21–23

Results

Sample

Altogether, 1418 of 1421 patients tested provided sufficiently complete ADAS-cog component scores. The sample is characterised in table 1. The main analyses were undertaken in the total sample. Additional targeting, reliability and validity analyses were conducted in MMSE subgroups (10–14 moderately severe; 15–20 moderate; 21–26 mild) to examine the impact of cognitive impairment on the psychometric properties of the ADAS-cog. The outcomes of the original clinical trials and further specification of the study populations are provided elsewhere.18–20

Table 1

Respondent characteristics (N=1421)

Psychometric properties

Data completeness

Data completeness was high (tables 2 and 3). The proportion of component-level missing data was low (≤0.02%). ADAS-cog total scores could be calculated for 99.8% of the sample (1418/1421).

Table 2

ADAS-cog scale level analyses: data completeness, scaling assumptions, targeting, reliability, validity (N=1421)*

Table 3

ADAS-cog component level analyses: data completeness, scaling assumptions, targeting, reliability* (N=1421)

Scaling assumptions

The ADAS-cog satisfied most criteria for scaling assumptions (tables 2 and 3). For example, component-total correlations (corrected for overlap) for the 11 ADAS-cog components ranged from 0.39 to 0.67, satisfying the recommended criteria. This supported the scale components as measures of a common underlying construct and indicated that components contained a similar proportion of information about that construct.
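A component-total correlation "corrected for overlap" correlates each component with the sum of the remaining components, so that a component is not correlated with itself. The following is a minimal sketch of that calculation, not the analysis code used in the study; the data are invented and the function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def corrected_component_total(rows):
    """Correlate each component with the sum of the OTHER components.

    rows: one list of component scores per person.
    Returns one correlation per component.
    """
    n_components = len(rows[0])
    results = []
    for j in range(n_components):
        component = [row[j] for row in rows]
        rest_of_scale = [sum(row) - row[j] for row in rows]
        results.append(pearson(component, rest_of_scale))
    return results


# Invented scores for six people on three components (illustration only).
rows = [
    [0, 1, 0],
    [1, 1, 2],
    [2, 3, 1],
    [3, 2, 3],
    [4, 4, 3],
    [5, 4, 5],
]
correlations = corrected_component_total(rows)
print([round(r, 2) for r in correlations])
```

Each value would then be compared with the prespecified criterion (see online supplementary material appendix 2).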

However, table 3 shows that ADAS-cog component mean scores and variances were not especially similar. While this implies some criteria for scaling assumptions were not satisfied, it is important to note that ADAS-cog components have different numbers of response categories. Thus, mean scores and variances were similar for components with the same/similar numbers of response categories providing evidence that these criteria were fulfilled.

Targeting (tables 2 and 3; appendix 3)

The ADAS-cog total scores spanned approximately 83% of the entire scale range, with no significant floor and ceiling effects, and were not notably skewed (tables 2 and 3 and appendix 3 of the online supplementary material). This was also found for the word recall, word recognition, and orientation components. However, 7/11 components (naming objects and fingers, commands, ideational praxis, remembering test instruction, spoken language, word finding, comprehension) had significant floor/ceiling effects (40%–64%) and were notably skewed (+1.0 to +2.0). These findings indicate adequate scale-to-sample targeting but potentially poor component-to-sample targeting. They indicate that the range of cognitive performance measured by these seven components is poorly matched to the range of cognitive performance in this sample.
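The targeting indicators reported here (percentage of the score range spanned, percentage of people at either extreme of the range, and skewness) can be computed as in the sketch below. It is an illustration under stated assumptions: the scores are invented, `targeting_summary` is a hypothetical name, and skewness is taken as the standardised third moment, which may differ slightly from the estimator used in the study. Because ADAS-cog components count errors or limitations, a large percentage at the minimum score (0) means the component is too easy for many people in the sample.

```python
def targeting_summary(scores, min_score, max_score):
    """Summarise how well a score distribution matches the possible range."""
    n = len(scores)
    mean = sum(scores) / n
    # Second and third central moments (population form).
    m2 = sum((s - mean) ** 2 for s in scores) / n
    m3 = sum((s - mean) ** 3 for s in scores) / n
    skewness = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    return {
        "pct_at_min_score": 100 * sum(1 for s in scores if s == min_score) / n,
        "pct_at_max_score": 100 * sum(1 for s in scores if s == max_score) / n,
        "pct_of_range_spanned": 100 * (max(scores) - min(scores))
        / (max_score - min_score),
        "skewness": skewness,
    }


# Invented scores for ten people on a component scored 0-5.
component_scores = [0, 0, 0, 0, 0, 0, 1, 1, 2, 5]
summary = targeting_summary(component_scores, min_score=0, max_score=5)
print(summary["pct_at_min_score"])  # 60.0
print(summary["pct_at_max_score"])  # 10.0
print(summary["skewness"] > 0)      # True
```

In this invented example the scores span the full 0–5 range, yet 60% of people sit at the minimum score, which shows why the range spanned alone does not establish good targeting.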

Reliability

Cronbach's α and the test-retest intraclass correlation coefficient (ICC) for the ADAS-cog scale were high (0.84 and 0.93), supporting its reliability. Component-level ICCs (range 0.75–0.83) were also well above the suggested minimum of 0.50 (tables 2 and 3).
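For readers less familiar with Cronbach's α, the standard formula is α = k/(k−1) × (1 − Σ component variances / variance of the total score), where k is the number of components. The sketch below implements that formula on invented data; the function names are hypothetical and this is not the software used in the study.

```python
def sample_variance(values):
    """Unbiased sample variance (divides by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of per-person component score lists."""
    k = len(rows[0])
    component_variances = [
        sample_variance([row[j] for row in rows]) for j in range(k)
    ]
    total_variance = sample_variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(component_variances) / total_variance)


# Invented scores for six people on three components (illustration only).
rows = [
    [0, 1, 0],
    [1, 1, 2],
    [2, 3, 1],
    [3, 2, 3],
    [4, 4, 3],
    [5, 4, 5],
]
alpha = cronbach_alpha(rows)
print(round(alpha, 2))
```

Because α depends on the variance of the total score, it tends to fall when the same scale is examined in a more homogeneous subgroup.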

Validity

Correlations between the ADAS-cog and MMSE were near our prediction at both screening (−0.63) and baseline (−0.74). Correlations between the ADAS-cog at baseline and sociodemographic variables (age and sex) were −0.01 and −0.07, respectively, indicating that ADAS-cog scores were not biased by these variables. These findings provided evidence for convergent and discriminant construct validity (table 2).

MMSE subgroups (10–14 moderately severe; 15–20 moderate; 21–26 mild; table 4)

Targeting analyses revealed that ADAS-cog component-level ceiling effects progressively increased as the severity of Alzheimer's disease, measured by the MMSE, decreased (range: moderately severe 0–32%; moderate 0–59%; mild 0–82%; table 4). Reliability, as assessed by Cronbach's α and test-retest ICCs, was low (range: 0.62–0.75 and 0.71–0.77, respectively). Finally, the examination of group differences validity revealed a stepwise decrease in ADAS-cog score as the MMSE score increased. The mean scores for the three groups were significantly different, in line with prediction (F=404.22; p<0.0001). However, correlations between ADAS-cog and MMSE scores within each group were low to moderate (0.17–0.49), much lower than both the predicted association between these two measures of cognitive performance and the association found in the total sample.

Table 4

ADAS-cog psychometric analyses by MMSE subgroups (score ranges 10–14, 15–20 and 21–26)*

Discussion

At the scale level, the ADAS-cog met most traditional psychometric criteria in this large dataset of people with mild and mild-to-moderate Alzheimer's disease, supporting the findings of previous research.5 6 12 13 However, a closer examination of the component level findings, a form of analysis rarely undertaken in previous ADAS-cog research,14 revealed suboptimal component-to-sample targeting. The key issue here is that we would expect patients in this study to have a range of cognitive abilities. Despite this, over half the ADAS-cog components have substantial percentages of people (often >75%) scoring either 0 or 1, implying few or no problems in cognitive performance. As there is likely to be more clinical heterogeneity in patients' abilities than these components imply, this indicates a targeting problem, or mismatch, between the components' difficulties and patients' abilities in this sample. This is important because the limited component-level targeting will impact on the overall ability of the ADAS-cog to detect cognitive differences between people and groups, and may make it less sensitive to the effects of interventions, as reflected in the findings of others.14

Our findings demonstrate the importance of targeting rating scales to the individuals within a study sample. Specifically, the range of cognitive performance measured by the ADAS-cog should be well-matched to the range of cognitive performance present in the study sample so that the scale has the ability to detect variability among and within individuals. Poorly targeted scales most likely underestimate changes over time and differences between groups, which is particularly relevant for future Alzheimer's disease clinical trials that are tending to recruit people with milder Alzheimer's disease. The issue became more evident when targeting was examined in Alzheimer's disease severity subgroups: component-level ceiling effects progressively increased as the severity of Alzheimer's disease decreased. This underscores the importance of examining component level targeting and demonstrates a misleading aspect of scale-level results.

The problem of targeting could be improved by developing the components of the ADAS-cog so that they span a wider and more appropriate range of measurement. Although component-level floor and ceiling effects will almost always exist to some extent, they should be minimised if the potential of the ADAS-cog to detect change is to be maximised. However, although demonstrating these issues, the information provided by the traditional psychometric analyses used here does not provide specific guidance on how the ADAS-cog items might be improved. Alternative approaches are needed to elaborate upon these findings and propose an evidence-based strategy for restructuring and expanding the existing ADAS-cog components.

Results from this study may have important implications for clinical research. Developments in our understanding of Alzheimer's disease have led to attempts to produce treatments aimed at slowing or altering disease progression. Appropriate evaluation of these treatments is dependent on rigorous measurement of clinically meaningful outcomes. Although the ADAS-cog offers clinicians a method of quantifying cognitive performance in people with Alzheimer's disease, our findings highlight important limitations. This research emphasises the importance of fully testing measures before clinicians and researchers apply them in clinical practice and treatment trials. In particular, it highlights the value of the component-level analyses, not typically undertaken, that identified problems with the ADAS-cog that were not detected by standard tests of scale reliability and validity.

Our study has three key limitations. First, the dataset was formed from baseline and screening data from proprietary clinical trial data. It would be valuable to repeat these analyses in non-proprietary data, in other large datasets, to ensure generalisability of our findings to the wider mild-to-moderate Alzheimer's disease population. Second, the current dataset did not allow for analyses of responsiveness to clinical change of cognitive performance over time. Although examinations of responsiveness will be useful to elaborate on and substantiate our present findings, they should not detract from addressing the component level targeting problems identified. Validity testing was also limited. In particular, we were restricted in the extent to which we could examine aspects of construct validity. Essentially, we were limited to using the MMSE as an external measure; a less detailed and comprehensive measure of cognitive performance. Thus, further examinations would be beneficial, including head-to-head comparisons with other more comprehensive neuropsychological measures of cognitive performance.

A third limitation is that traditional psychometric analyses, although currently the dominant paradigm for rating scale testing, have many clinically important limitations, which we have outlined in detail elsewhere.11 24 In relation to the current study there are two key issues.

First, these methods are sample and scale dependent. This is clearly seen when we compare the performance of the ADAS-cog in terms of scaling assumptions (range of item-total correlations), reliability (Cronbach's α, test-retest ICCs) and validity (correlations between the ADAS-cog and MMSE) in the three Alzheimer's disease severity subgroups (as described in the results above and presented in table 4). These results, if taken at face value, imply that the measurement performance of the ADAS-cog is Alzheimer's disease severity dependent. However, the variability in results can be explained by the limited variance of the estimates in each subgroup. This is because traditional psychometric methods are largely based on correlational analyses and correlations are strongly influenced by variability in the entities correlated. Unfortunately, traditional psychometric methods do not enable us to determine if the differences detected are real (ie, scale performance is dependent on Alzheimer's disease severity) or simply an artifact of the data distributions. More sophisticated psychometric approaches (for example, an analysis of differential item functioning using Rasch analysis) are required to make that distinction.
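The influence of restricted variability on correlations can be illustrated with a small simulation. The sketch below is purely hypothetical and uses no study data: it generates a latent ability, two noisy measures of it (one scored in the opposite direction, loosely mimicking the ADAS-cog and MMSE), and compares the correlation in the full simulated sample with the correlations inside three equal-sized bands of the second measure. The noise level, sample size and variable names are all assumptions chosen for illustration.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


rng = random.Random(0)  # fixed seed so the illustration is reproducible
n = 3000

# Latent cognitive ability and two noisy measures of it (all simulated).
ability = [rng.gauss(0, 1) for _ in range(n)]
mmse_like = [a + rng.gauss(0, 0.7) for a in ability]   # higher = better
adas_like = [-a + rng.gauss(0, 0.7) for a in ability]  # lower = better

full_r = pearson(adas_like, mmse_like)

# Split people into three equal-sized bands on the MMSE-like measure,
# which restricts the variability of that measure within each band.
order = sorted(range(n), key=lambda i: mmse_like[i])
third = n // 3
bands = [order[:third], order[third:2 * third], order[2 * third:]]
band_rs = [
    pearson([adas_like[i] for i in band], [mmse_like[i] for i in band])
    for band in bands
]

print("full sample:", round(full_r, 2))
print("within bands:", [round(r, 2) for r in band_rs])
```

In this simulation the two measures are related to ability in exactly the same way for everyone, yet the within-band correlations are weaker than the full-sample correlation. Lower subgroup correlations therefore do not, on their own, show that a scale performs differently across severity levels.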

The second key issue relating to traditional psychometric analyses is that they provide limited information at the component level; particularly about the adequacy of the response options. Importantly, there are concerns over the use of traditional analyses in scales (for discussion see Hobart et al11), such as the ADAS-cog, that combine components with differing number and type of response categories. Therefore, once again, further examinations are required using newer sophisticated rating scale analysis techniques that overcome these limitations, such as Rasch measurement methods,25 26 to better diagnose the specific issues surrounding the performance of the ADAS-cog.11

In this study, the ADAS-cog showed the potential to be a scientifically strong measurement instrument. However, our study also suggests that the ADAS-cog has limited ability to detect cognitive performance differences between people, changes over time and the impact of treatment in mild Alzheimer's disease. Our analyses of the ADAS-cog by MMSE subgroup (table 4) indicate that these limitations are more pronounced in the milder forms of Alzheimer's disease. The natural extrapolation of these findings is that the situation may be more problematic in people with mild cognitive impairment. Thus, in order for this scale to be a valuable cognitive performance measure in these patient groups, these limitations may need to be addressed.

Overall, although the ADAS-cog's psychometric performance was found to be satisfactory, more than half of its components may underestimate differences in cognitive performance in people with mild and moderate Alzheimer's disease. The limited distributions indicate widespread targeting issues, which may lead to problems in detecting clinical change when it occurs. This has important implications for the inferences of present and future clinical trials of Alzheimer's disease using the ADAS-cog. Given the limitations of traditional psychometric methods, further evaluations would be desirable using more sophisticated modern rating scale analysis techniques to pinpoint the specific improvements that are required to maximise the ADAS-cog as a measure of cognitive performance in people with Alzheimer's disease.

References

  1. 1.
  2. 2.
  3. 3.
  4. 4.
  5. 5.
  6. 6.
  7. 7.
  8. 8.
  9. 9.
  10. 10.
  11. 11.
  12. 12.
  13. 13.
  14. 14.
  15. 15.
  16. 16.
  17. 17.
  18. 18.
  19. 19.
  20. 20.
  21. 21.
  22. 22.
  23. 23.
  24. 24.
  25. 25.
  26. 26.

Supplementary materials

Footnotes

  • Funding Eisai Medical Research, Inc.

  • Competing interests TH and MM are employees of Eisai Medical Research. JS is an employee of Eisai Global Clinical Development. HP is an employee of Pfizer (previously an employee of Eisai Medical Research). SH was retained as a consultant to Eisai Medical Research. JH and SC were supported in part through a grant from Eisai Medical Research.

  • Provenance and peer review Not commissioned; externally peer reviewed.