Abstract
Mild cognitive impairment (MCI) is recognized as a transitional phase in the progression toward more severe forms of dementia and is an early precursor to Alzheimer's disease. Previous neuroimaging studies reveal that MCI is associated with aberrant sensory–perceptual processing in cortical brain regions subserving auditory and language function. However, whether the pathophysiology of MCI extends to speech processing before conscious awareness (brainstem) is unknown. Using a novel electrophysiological approach, we recorded both brainstem and cortical speech-evoked brain event-related potentials (ERPs) in older, hearing-matched human listeners who did and did not present with subtle cognitive impairment revealed through behavioral neuropsychological testing. We found that MCI was associated with changes in neural speech processing characterized by hypersensitive (larger) brainstem and cortical speech encoding in MCI compared with controls in the absence of any perceptual speech deficits. Group differences also interacted with age differentially across the auditory pathway; brainstem responses became larger and cortical ERPs smaller with advancing age. Multivariate classification revealed that dual brainstem–cortical speech activity correctly identified MCI listeners with 80% accuracy, suggesting its application as a biomarker of early cognitive decline. Brainstem responses were also a more robust predictor of individuals' MCI severity than cortical activity. Our findings suggest that MCI is associated with poorer encoding and transfer of speech signals between functional levels of the auditory system and advance the pathophysiological understanding of cognitive aging by identifying subcortical deficits in auditory sensory processing mere milliseconds (<10 ms) after sound onset and before the emergence of perceptual speech deficits.
SIGNIFICANCE STATEMENT Mild cognitive impairment (MCI) is a precursor to dementia marked by declines in communication skills. Whether MCI pathophysiology extends below cerebral cortex to affect speech processing before conscious awareness (brainstem) is unknown. By recording neuroelectric brain activity to speech from brainstem and cortex, we show that MCI hypersensitizes the normal encoding of speech information across the hearing brain. Deficient neural responses to speech (particularly those generated from the brainstem) predicted the presence of MCI with high accuracy and before behavioral deficits. Our findings advance the neurological understanding of MCI by identifying a subcortical biomarker in auditory–sensory processing before conscious awareness, which may be a precursor to declines in speech understanding.
- auditory evoked potentials
- brainstem frequency-following response (FFR)
- cognitive aging
- dementia biomarkers
- event-related brain potentials (ERPs)
- speech processing
Introduction
Mild cognitive impairment (MCI) is a prodromal stage of dementia recognized as an intermediate transitional phase in the progression of cognitive aging. Individuals with MCI show a high rate (>40%) of progression to dementia (Roberts and Knopman, 2013) and MCI is a key risk factor (×6) for conversion to Alzheimer's disease (Roberts and Knopman, 2013). Despite progress in identifying individuals with MCI using (neuro)psychological assessment, current clinical screens for cognitive impairments often suffer from high variability and poor test–retest reliability (<50%) (Wollman and Prohovnik, 2003; Spencer et al., 2013). To date, clinical interventions at early phases of cognitive decline (MCI state) have produced mixed results (Petersen et al., 2005). Advancing diagnostics and interventions for early cognitive impairment requires that relevant biomarkers of cognitive deficits are identified to detect individuals with early MCI before conversion to more severe dementia (Golob et al., 2007) and also distinguish them from normal aging adults.
MCI-related changes in neuroelectric auditory cortical activity have been reported in several event-related potential (ERP) studies (Golob and Starr, 2000; Golob et al., 2001; Golob et al., 2007). However, cortical dysfunction might be anticipated given the known atrophy (i.e., volumetric abnormalities) of cerebral cortex that occurs in MCI and early forms of dementia (Kantarci et al., 2009; Roberts and Knopman, 2013). To date, ERP studies have also used relatively simple stimuli (e.g., clicks and tones) to evaluate changes in auditory processing resulting from cognitive decline (Golob et al., 2001; Irimajiri et al., 2005; Golob et al., 2007). Arguably, these stimuli do not tax auditory system function—and potentially reveal neurological deficits—in the same way as more ecologically relevant sounds (e.g., speech; Song et al., 2006; Rocha-Muniz et al., 2014). Moreover, aging has been associated with declines in listening skills necessary for communication (Gordon-Salant and Fitzgibbons, 1993; Hutka et al., 2013; Bidelman et al., 2014a) that are exacerbated in cases of MCI (Johnson and Lin, 2014; Petersen, 2014). Investigating speech processing from brainstem to auditory cortex in adults with MCI could reveal the earliest level of the brain at which cognitive decline impairs neural function related to human communication. Moreover, whereas disruptions to cortical function are well documented (Golob and Starr, 2000; Golob et al., 2001; Wollman and Prohovnik, 2003; Golob et al., 2007; Bajo et al., 2010), the possibility that MCI produces functional changes in subcortical (brainstem) structures has yet to be established.
In the current study, we recorded neuroelectric activity elicited by speech sounds from the auditory brainstem (i.e., frequency-following responses, FFR) and cortex (ERPs) of normal-hearing older adults who did or did not present with subtle MCI confirmed via behavioral neuropsychological testing (Nasreddine et al., 2005). This approach has been helpful in delineating the hierarchy of neural events during speech perception (Bidelman et al., 2013) and has revealed, for example, that normal aging produces differential deficits in speech processing between subcortical and cortical brain regions (Bidelman et al., 2014a; Bidelman and Alain, 2015). Based on prior studies examining auditory ERPs in MCI (Golob et al., 2001; Irimajiri et al., 2005; Golob et al., 2007), we expected cortical speech responses to be weakened and delayed in individuals showing cognitive impairments. In addition, based on our prior work examining concurrent brainstem–cortical ERPs in normal aging adults (Bidelman et al., 2014a; Bidelman and Alain, 2015), we predicted that MCI would be characterized by aberrant brainstem speech representations beyond those exhibited purely with advancing age. This novel approach also allowed us the unique opportunity to evaluate for the first time the relative power of brainstem versus cortical biomarkers in predicting the severity of MCI as measured behaviorally. Our findings provide novel evidence that MCI affects the neural capture and transfer of speech signals between functional levels of the auditory system and extend the pathophysiological understanding of cognitive aging by identifying subcortical speech processing deficits with MCI 7–10 ms after sounds enter the ear.
Materials and Methods
Participants.
Twenty-three older adults (age range: 52–86 years; mean ± SD: 70.2 ± 7.2 years) were recruited from the University of Toronto and the greater Toronto area to participate in our ongoing studies of aging and the auditory system (Hutka et al., 2013; Alain et al., 2014; Bidelman et al., 2014a; Bidelman and Alain, 2015). Recruitment was aided by contacting individuals listed in the Rotman Research Institute's volunteer research participant database, which contains ∼10,000 participants, half of whom are over age 60. Participants were screened for cognitive function using normative data from the Montreal Cognitive Assessment (MOCA) (Nasreddine et al., 2005). The MOCA was administered in a quiet room (the same room used for EEG recordings) by a trained research assistant familiar with neuropsychological testing. Fifteen older adults (8 male, 7 female) achieved normal MOCA scores (≥26 points; 27.6 ± 1.18; range: 26–30) (hereafter referred to as the control group). Eight participants were identified as having putative MCI (4 male, 4 female) via scores of <26 (23.0 ± 1.85; range: 20–25) on the MOCA battery (hereafter referred to as the MCI group). Note that these scores are well within the normative range for MCI (19–25.2), but still well above the MOCA cutoff for Alzheimer's disease or more severe dementia (11.4–21) (Nasreddine et al., 2005). As expected, MOCA scores were significantly lower in the MCI compared with control group (t(21) = −6.95, p < 0.001). Because only behavioral measures were used to identify MCI individuals, underlying etiology was unknown.
All participants were strongly right-handed (Oldfield, 1971), reported no known history of neurological or psychiatric illnesses, and were matched in total years of formal education (MCI: 14.6 ± 3.2 years, controls: 17.4 ± 3.75 years; t(21) = −1.68, p = 0.11). Musical training is known to enhance and partially counteract age-related declines in the neural encoding of speech at the brainstem and cortical levels (Bidelman and Alain, 2015). Importantly, groups did not differ in the extent of their formal musical training (MCI: 1.25 ± 3.2 years, controls: 6.8 ± 7.8 years; t(21) = 1.91, p = 0.07).
Age-related hearing loss is also known to alter brainstem and cortical auditory evoked potentials (Bidelman et al., 2014a). Audiometric testing showed that hearing thresholds did not differ between groups at octave frequencies between 250 and 4000 Hz (all p > 0.05), well beyond the bandwidth of our stimuli. Moreover, in both groups, thresholds were clinically normal (≤25 dB HL) based on pure-tone average (PTA) thresholds (i.e., average of 500, 1000, and 2000 Hz) bilaterally and did not differ between groups (t(21) = 1.80, p = 0.09). No ear differences were observed for either the MCI (t(14) = 0.25, p = 0.73) or the control group (t(28) = 0.30, p = 0.77), indicating symmetrical hearing. Therefore, hearing was well matched both within participants and between groups. Nevertheless, we found it prudent to use listeners' PTA as a covariate in subsequent analysis to control for differences in hearing acuity that may affect ERP/FFR analysis. The MCI group was on average ∼7 years older than the control group (MCI: 74.6 ± 3.3 years, range: 69–77; controls: 67.5 ± 8.2 years, range: 52–86; t(21) = 2.34, p = 0.03). Therefore, age was also used as a covariate in subsequent analyses. Participants were compensated for their time and gave written informed consent in compliance with a protocol approved by the Baycrest Centre research ethics committee.
Stimuli.
A synthetic five-step vowel continuum (hereafter “vw1–5”) was constructed so that each 100 ms token would differ minimally acoustically, yet be perceived categorically (Pisoni, 1973; Bidelman et al., 2013). First formant (F1) frequency was varied parametrically over five equal steps between 430 and 730 Hz, resulting in a stimulus set that spanned a perceptual phonetic continuum from /u/ to /a/. All other stimulus attributes (e.g., fundamental frequency, higher formants) were identical between tokens (for further stimulus details, see Bidelman et al., 2013). Categorical perception requires listeners to properly encode and compare sensory representations to an internalized memory template for speech and thus is likely to tax the auditory–cognitive system in ways that other speech listening tasks may not (e.g., paired discrimination).
Electrophysiological recordings.
Data acquisition and response evaluation were similar to our previous reports (Bidelman et al., 2013, 2014a; Bidelman and Alain, 2015). The task and EEG recordings were conducted in an electroacoustically shielded chamber (Industrial Acoustics). Briefly, stimuli were delivered to both ears at an intensity of 83 dB SPL through insert earphones (ER-3A; Etymotic Research). Extended acoustic tubing (50 cm) was used to eliminate electromagnetic stimulus artifact from contaminating neurophysiological responses (Aiken and Picton, 2008; Campbell et al., 2012).
In one block of 200 active trials, we used a forced-choice procedure in which participants indicated whether they heard "u" or "a" via a button press on the keyboard. After participants' behavioral response, the interstimulus interval (ISI) was jittered randomly between 400 and 600 ms (20 ms steps, rectangular distribution) to avoid α entrainment of the EEG (Luck, 2005) and listeners anticipating their behavioral response. An additional block of 2000 passive trials (ISI = 150 ms) was then collected to detect the submicrovolt brainstem FFR (Bidelman, 2014). Brainstem activity is approximately an order of magnitude smaller than cortical activity, so a larger number of sweeps is necessary to achieve a signal-to-noise ratio for FFRs comparable to that of the ERPs (Bidelman, 2015a). Brainstem FFRs show robust repeatability within and across test sessions (Song et al., 2011; Bidelman et al., 2017) and are unaffected by attention (Picton et al., 1971; Hillyard and Picton, 1979; Galbraith and Kane, 1993). Therefore, participants watched a self-selected movie with subtitles during blocks of brainstem recordings to facilitate a calm yet wakeful state. In total, the protocol lasted ∼2 h.
Continuous EEGs were recorded differentially between electrodes placed on the high forehead at the hairline and linked mastoids. This montage (∼Fpz-A1/A2) is optimal for recording evoked responses of both subcortical and cortical origin (Musacchia et al., 2008; Bidelman et al., 2013; Bidelman, 2015a). Contact impedances were maintained below 3 kΩ throughout the duration of the experiment. Neuroelectric brain activity was digitized at 20 kHz and band-pass filtered online between 0.05 and 3500 Hz (SynAmps2; Neuroscan). Traces were then segmented (cortical ERP: −100–600 ms; brainstem FFR: −40–210 ms), baselined to the prestimulus interval, and subsequently averaged in the time domain to obtain waveforms for each condition (Delorme and Makeig, 2004). Trials exceeding a ±50 μV threshold were rejected as blinks before averaging. Grand average evoked responses were then band-pass filtered in different frequency bands to emphasize brainstem (80–2500 Hz) and cortical (1–30 Hz) neural activity, respectively (Musacchia et al., 2008; Bidelman et al., 2013; Bidelman and Alain, 2015; Bidelman, 2015a).
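For illustration, the trial rejection, averaging, and dual-band filtering described above can be sketched as follows (a minimal MATLAB sketch, not the authors' pipeline; the variable `epochs`, the Butterworth filter design, and the filter order are assumptions introduced here):

```matlab
% Minimal sketch of the dual-band ERP/FFR derivation (assumed variable names).
% `epochs` is a [trials x samples] matrix of baselined single-trial segments
% cut from the continuous EEG; amplitudes are in microvolts.
fs = 20000;                                   % sampling rate (Hz)

keep    = max(abs(epochs), [], 2) <= 50;      % reject trials exceeding +/-50 uV (blinks)
avgResp = mean(epochs(keep, :), 1);           % time-domain average for one condition

% Band-pass filters emphasizing brainstem (80-2500 Hz) vs cortical (1-30 Hz) activity
[bB, aB] = butter(2, [80 2500] / (fs/2), 'bandpass');
[bC, aC] = butter(2, [1 30]    / (fs/2), 'bandpass');
ffr = filtfilt(bB, aB, avgResp);              % brainstem frequency-following response
erp = filtfilt(bC, aC, avgResp);              % cortical event-related potential
```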
Electrophysiological data analysis.
Age-related changes in brainstem and cortical responses are typically largest for amplitude rather than latency characteristics of the ERPs (Alain and Woods, 1999; Golob et al., 2007; Alain et al., 2012; Bidelman et al., 2014a). For data reduction purposes and to provide comparable measures of response amplitude between brainstem and cortical ERP classes, for each stimulus, we measured the RMS amplitude of the steady-state portion of the brainstem FFR (50–150 ms window) and the P1–N1–P2 magnitude of the cortical ERPs. P1 was taken as the peak positivity between 45 and 65 ms, N1 as the peak negativity between 70 and 120 ms, and P2 as the peak positivity between 150 and 250 ms (Irimajiri et al., 2005; Bidelman et al., 2014a). The overall magnitude of the N1–P2 complex, computed as the voltage difference between the two individual waves, was used as a singular index of the total cortical activation to each vowel stimulus. Whereas other FFR/ERP measures are available (Luck, 2005; Skoe and Kraus, 2010), FFR RMS and cortical N1–P2 metrics are advantageous because they both provide a description of the overall amplitude of the evoked response at brainstem and cortical levels using an isomorphic metric; their comparable unit of scale (microvolts) also ensures that multivariate discriminant analysis of neural responses was not artificially skewed by using unrelated signal properties across levels (e.g., FFR spectral vs ERP latency measures).
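A minimal sketch of these amplitude measures, assuming `ffr` and `erp` are the band-limited average waveforms from the previous sketch and `t` is the corresponding time vector in milliseconds (all names are hypothetical):

```matlab
% FFR magnitude: RMS over the steady-state portion of the response (50-150 ms)
ffrRMS = sqrt(mean(ffr(t >= 50 & t <= 150).^2));

% Cortical ERP deflections: peak amplitudes within the search windows given above
P1 = max(erp(t >= 45  & t <= 65));            % peak positivity, 45-65 ms
N1 = min(erp(t >= 70  & t <= 120));           % peak negativity, 70-120 ms
P2 = max(erp(t >= 150 & t <= 250));           % peak positivity, 150-250 ms
N1P2 = P2 - N1;                               % overall cortical activation (uV)
```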
Behavioral data analysis.
Individual vowel identification scores were fit with a two-parameter sigmoid function. We used a standard logistic function: p = 1/(1 + e^(−β1(x − β0))), where p is the proportion of trials identified as a given vowel, x is the step number along the stimulus continuum, and β0 and β1 are the location and slope of the logistic fit estimated using nonlinear least-squares regression, respectively. Comparing the β1 slope parameter between groups allowed us to assess possible differences in the "steepness" (i.e., rate of change) of the categorical speech boundary as a function of cognitive status. We have shown previously that normal aging weakens speech categorization skills, as reflected in shallower psychometric functions (Bidelman et al., 2014a; Bidelman and Alain, 2015). Behavioral reaction times (RTs) were computed separately for each participant as the mean response latency across trials for a given speech token. Following our previous reports on categorical speech perception (Bidelman et al., 2013; Bidelman and Alain, 2015; Bidelman and Walker, 2017), RTs shorter than 200 ms or exceeding 5500 ms were discarded as implausibly fast responses and lapses of attention, respectively. These trials (<5% of trials) were excluded from analysis. In addition to mean RTs, we computed the coefficient of variation (CV) of RTs (i.e., CV = SD/mean) for each participant per vowel condition. Previous studies have shown that, even in the absence of significant differences in central tendency, RTs can differ in terms of their intrasubject variability (dispersion measured via CVs) (Bernstein et al., 2014). Intraparticipant variability in behavioral RTs is also thought to index "neurological integrity" (Strauss et al., 2002) and has been shown to distinguish older adults with and without dementia (Duchek et al., 2009).
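These behavioral measures can be sketched as follows (an illustrative MATLAB example under assumed variable names, not the authors' code): `pid` holds the proportion of /a/ responses at each continuum step and `rt` the single-trial response times in milliseconds for one participant.

```matlab
% Psychometric function p = 1/(1 + e^(-b1*(x - b0))) fit by nonlinear least squares
x    = (1:5)';                                         % continuum step (vw1-5)
sig  = @(b, x) 1 ./ (1 + exp(-b(2) .* (x - b(1))));    % b(1) = location, b(2) = slope
beta = nlinfit(x, pid, sig, [3 1]);                    % Statistics Toolbox; [3 1] = starting values

% Reaction times: trim implausible trials, then compute mean and coefficient of variation
rt   = rt(rt >= 200 & rt <= 5500);                     % discard <200 ms and >5500 ms trials
rtM  = mean(rt);                                       % mean RT
rtCV = std(rt) / rtM;                                  % intrasubject variability (CV = SD/mean)
```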
Discriminant function analysis: predicting MCI via scalp ERPs.
In addition to identifying potential changes in auditory processing within each group and level of the auditory neuroaxis, we aimed to determine whether brainstem and cortical activity could correctly classify listeners into their respective group membership and thus predict MCI. To this end, a linear discriminant analysis (LDA) was conducted to determine whether a weighted combination of neural predictors (FFR RMS and N1–P2 amplitudes) could discriminate control from MCI listeners based on their ERPs alone. The LDA was developed at the group level with a linear equation that was designed to classify listeners' neural responses into one of two mutually exclusive groups on the basis of their brainstem and cortical ERP measures. Two neural measures per listener and speech stimulus were input to the LDA (i.e., FFR RMS, N1–P2 amplitudes). Because it is recommended that there be >40–50 observations for LDA (Zavorka and Perrett, 2014), we used all 5 vowel responses for each listener, resulting in a total of 115 observations (= 5 vowels × 23 listeners) submitted to the classifier. However, we note that classification results did not differ depending on whether we collapsed across vowels (see Results). Equality of the covariance matrices between predictors was confirmed using the Bartlett test (p = 1.0) (Box, 1949), indicating appropriateness of a linear (rather than quadratic) discriminant function. LDA was implemented in MATLAB 2015b using the "fitcdiscr" function with prior probabilities of p = 0.50 (i.e., chance level for two group classes).
Classification performance of the resulting discriminant function was determined by comparing the predicted group for each listener (based solely on their ERP measures) against their actual (ground-truth) group identity based on their MOCA scores. This resulted in both a classification accuracy and error rate that were then used to construct a "neural confusion matrix." Confusion matrices quantitatively indicate the degree (percentage correct) to which the combined neural brainstem and cortical ERP measures correctly classified listeners into the correct group (i.e., "control" vs "MCI"). We further assessed generalizability of the LDA predictions using k-fold cross-validation (where k = 10 folds). In this procedure, the dataset is partitioned into k roughly equal-sized subsamples (folds). A single subsample of the k folds is retained as the validation data for model testing; the remaining k − 1 subsamples are used as training data. Because the number of to-be-classified observations was 115, each test fold contained 11–12 observations and the remaining 103–104 observations served as training data. k-fold cross-validation is advantageous because all observations are used for both the training and validation of the classifier model. For each subsample, the LDA function was recomputed and the results averaged across all k folds to arrive at a cross-validated estimate of accuracy. Although cross-validated performance is typically lower than unvalidated model performance, it helps to prevent overfitting.
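A minimal MATLAB sketch of this classification and cross-validation procedure (assumed variable names: `X` is the 115 × 2 matrix of FFR RMS and N1–P2 amplitudes and `grp` the corresponding group labels):

```matlab
% Linear discriminant analysis with equal (chance-level) priors
mdl = fitcdiscr(X, grp, 'DiscrimType', 'linear', 'Prior', [0.5 0.5]);

pred    = predict(mdl, X);                    % predicted group for each observation
confMat = confusionmat(grp, pred);            % "neural confusion matrix" (counts)

% 10-fold cross-validation of the classifier
cvmdl = crossval(mdl, 'KFold', 10);
cvAcc = 1 - kfoldLoss(cvmdl);                 % cross-validated classification accuracy
```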
Brain–behavior relationships.
Pairwise Pearson's correlations were used to investigate correspondences between subcortical and cortical speech representations (brainstem: RMS amplitude; cortical: N1–P2 magnitude) and MOCA scores. This analysis allowed us to determine the degree to which brainstem and/or cortical speech encoding predicted severity of MCI as measured via the MOCA.
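For example (a sketch under assumed variable names, with one value per listener after averaging across vowels):

```matlab
% Pearson correlations between neural amplitudes and MOCA scores (23 listeners)
[rB, pB] = corr(ffrAmp,  moca, 'type', 'Pearson');   % brainstem FFR RMS vs MOCA
[rC, pC] = corr(n1p2Amp, moca, 'type', 'Pearson');   % cortical N1-P2 vs MOCA
```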
Statistical analysis.
Initial diagnostics confirmed that all dependent variables satisfied the assumptions of parametric analysis (i.e., homogeneity of variance, normality). Two-way, mixed-model ANOVAs were first conducted on all dependent variables using SAS version 9.4 software. Group (2 levels: MCI, control) functioned as the between-subjects factor; vowel stimulus (5 levels: vw 1–5) as the within-subjects factor; participants nested within group served as a random factor. Initial analyses indicated neither a main effect of vowel stimulus nor a group × vowel interaction in neural measures (all p > 0.05). Therefore, we pooled this factor (i.e., averaged across tokens) to reduce the dimensionality of the ANOVAs and assess group effects directly on ERPs. We used age and hearing (PTA) as covariates in the ANOVA models. Bonferroni corrections were used to control type I error inflation for multiple comparisons. Significance level was set at α = 0.05. All tests are two-tailed.
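The ANOVAs were run in SAS; as a rough analogue only (an illustration under hypothetical variable names, not the authors' model code), a comparable mixed model with age and hearing as covariates could be specified in MATLAB as:

```matlab
% `tbl` is a (hypothetical) long-format table with one row per listener x vowel:
% columns Amplitude, Group, Vowel, Age, PTA, and Subject (Subject as categorical).
lme = fitlme(tbl, 'Amplitude ~ Group*Vowel + Age + PTA + (1|Subject)');
anova(lme)   % F-tests for the fixed effects (Group, Vowel, Group:Vowel, covariates)
```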
Results
Behavioral speech processing
While recording electrical brain activity, participants labeled sounds drawn randomly from a set of five vowels that differed only in first-formant frequency (Bidelman et al., 2013, 2014a). Speech identification functions and the variability of labeling speeds (i.e., RT CV) are shown in Figure 1, a and b, respectively. Despite the continuous change in the acoustic signal, listeners heard a clear perceptual shift in the phonetic category (/u/ vs /a/) near the midpoint of the vowel continuum (vw3). The overall location (β0 parameter: t(21) = 0.65, p = 0.52) and slope (β1 parameter: t(21) = 0.68, p = 0.50) (Fig. 1a, inset) of the psychometric functions did not differ between groups. However, these results might be expected given that all listeners were native English speakers.
An ANOVA on RTs showed a sole main effect of stimulus token (F(4,84) = 23.35, p < 0.001, ηp2 = 0.53) with no group effect (F(1,18) = 0.00, p = 0.98, ηp2 = 0.00002) or group × vowel interaction (F(4,84) = 0.88, p = 0.48, ηp2 = 0.04) (data not shown). Planned contrasts [i.e., mean(vw1, vw2, vw4, vw5) vs vw3] revealed that participants in both groups were slower at classifying speech tokens near the categorical boundary (vw3) relative to other tokens in the continuum (MCI: t(84) = 4.85, p < 0.001; controls: t(84) = 8.93, p < 0.001). The slowing of RT for ambiguous speech tokens is consistent with previous reports examining speeded vowel classification and is a hallmark of categorical perception (Pisoni and Tash, 1974; Bidelman et al., 2013). As with the central tendency of RTs (i.e., mean RT), we did not find a group difference in the CV (dispersion) of RTs, a measure of the intrasubject variability of the behavioral decision process (F(1,18) = 0.06, p = 0.80, ηp2 = 0.003) (Fig. 1b). However, RTs were more variable in both groups near the categorical boundary (vw 3) relative to other stimuli in the continuum (MCI: t(84) = 3.45, p < 0.002; controls: t(84) = 4.80, p < 0.001). Collectively, these findings suggest that both MCI and control listeners were able to adequately categorize speech sounds and did so similarly in terms of their speed, accuracy, and variability.
Neural correlates of speech representation
We recorded brainstem and cortical evoked potentials to speech sounds using a novel electrophysiological paradigm that we recently developed (Fig. 2) (Bidelman et al., 2013, 2014b; Bidelman and Alain, 2015). Selective filtering techniques (Musacchia et al., 2008; Bidelman et al., 2013, 2014a, 2014b) were used to extract ERPs generated at both the subcortical and early cortical levels of the auditory pathway for subsequent analysis and comparison with behavior and MCI severity.
Brainstem FFRs
Speech-evoked brainstem FFRs and response spectra are shown in Figure 3, a and b, for a subset of the speech stimuli. FFR waveforms were marked by a strong phase-locked neurophonic reflecting the periodicity of speech. Across all stimuli, average onset latencies of FFRs (first peak in the 8–15 ms time window) were 11.5 ± 1.3 ms (MCI) and 11.2 ± 0.9 ms (controls), consistent with generators in the upper midbrain (Sohmer et al., 1977; Bidelman, 2015b). Brainstem latencies did not differ between groups (t(21) = 0.59, p = 0.56). As corroborated by lesion data (Sohmer et al., 1977; Kiren et al., 1994), the latency (∼7–10 ms) (King et al., 2016) and high-frequency energy seen in FFR spectra (Bidelman, 2015b) (e.g., components up to ∼1000 Hz; Fig. 3b) rule out the possibility of a cochlear microphonic or cortical contribution in FFRs; the preneural cochlear microphonic has 0 ms latency and is coincident with the stimulus onset (Batra et al., 1986; Skoe and Kraus, 2010), whereas cortical evoked responses show much later onset latencies (>30 ms) (Liégeois-Chauvel et al., 1994) and neurons in the Sylvian fissure are highly restrictive (low-pass) in their upper phase-locking limit (<80–90 Hz) (Joris et al., 2004). We found that MCI listeners showed larger FFRs than controls (Fig. 3a). MCI-related changes in subcortical activity were particularly evident in response spectra, which revealed hypersensitivity across the response bandwidth (Fig. 3b).
These observations were confirmed via quantitative analysis. FFR RMS amplitudes were larger for the low-MOCA (MCI) compared with high-MOCA (control) group (F(1,16) = 10.38, p = 0.005, ηp2 = 0.40) after accounting for age and hearing (Fig. 4a). However, a group × age interaction (F(1,16) = 10.96, p = 0.004, ηp2 = 0.41) revealed that the magnitude of this group effect varied with age. Post hoc contrasts showed that brainstem responses diverged between low- and high-MOCA scorers; FFRs were similar at age 60 and 70 but differed at 80 years (Fig. 4e). These findings indicate increased subcortical responses to speech in low-MOCA performers, particularly in adults ≥80 years of age.
Recent studies suggest that FFRs to low-frequency (∼100 Hz) sounds may receive contributions from cortical generators (Coffey et al., 2016). To confirm that FFR differences in the MCI group were of brainstem origin, we averaged amplitudes of the third through sixth harmonics (H3–H6; Fig. 3b) to evaluate neural speech coding at ≥300 Hz. These harmonics are several hundred Hertz above the known phase-locking limit of cortical neurons (Joris et al., 2004). Furthermore, FFRs from human depth electrode recordings in Heschl's gyrus are rarely observed above ∼100 Hz (Brugge et al., 2009). As with FFR RMS amplitudes, we found robust group differences at these higher frequencies (F(1,16) = 15.38, p = 0.0012, ηp2 = 0.34); MCI listeners showed enhanced phase locking to the upper harmonics (H3–H6) of speech sounds relative to controls, with a group × age interaction (F(1,16) = 15.07, p = 0.0013, ηp2 = 0.33) similar to that observed for RMS amplitudes. Given that cortical FFRs have not been observed above 200 Hz (Brugge et al., 2009), the MCI group's aberrant responses at these higher speech frequencies rule out the possibility that their FFR effects were due to cortical involvement and confirm a processing deficiency of brainstem origin.
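A minimal sketch of this harmonic measure (assumed names; the ∼100 Hz voice fundamental is inferred from the text, and `ffr`/`t` are the FFR waveform and its time axis in milliseconds from the earlier sketches):

```matlab
% Average FFT amplitude at harmonics 3-6 (~300-600 Hz) of the voice fundamental
fs   = 20000;  f0 = 100;                      % sampling rate (Hz); assumed f0 (Hz)
seg  = ffr(t >= 50 & t <= 150);               % steady-state portion of the FFR
nfft = numel(seg);
spec = abs(fft(seg)) / nfft;                  % amplitude spectrum (scaling kept simple)
f    = (0:nfft-1) * fs / nfft;                % frequency axis (Hz)

H = zeros(1, 4);
for k = 3:6
    [~, bin] = min(abs(f - k*f0));            % nearest FFT bin to each harmonic
    H(k-2)   = spec(bin);
end
H3H6 = mean(H);                               % composite upper-harmonic amplitude
```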
Cortical ERPs
Using nonspeech stimuli (e.g., tone pips), previous studies have suggested that MCI is associated with increases in the P1 amplitude (∼50 ms) relative to the normal effects of aging (Golob et al., 2007). However, using speech sounds, we found no effect of group on P1 amplitudes (F(1,16) = 0.31, p = 0.59, ηp2 = 0.07) after accounting for age and hearing thresholds. P1 amplitude is often poorly defined (Godey et al., 2001), is strongly dependent on neural synchrony, and is optimally elicited by transient stimuli with abrupt onsets. This could account for the relatively weaker P1 and lack of group differences that we found here for speech, which has a more gradual onset, compared with previous studies (e.g., tone pips; Golob et al., 2007).
Paralleling brainstem FFRs, we found larger cortical N1–P2 amplitudes in MCI listeners, particularly in the 100–200 ms time window marked by the N1 and P2 deflections (Fig. 3c). Analysis of N1–P2 amplitudes confirmed larger responses for the low-MOCA (MCI) compared with high-MOCA (control) group (F(1,16) = 10.96, p = 0.004, ηp2 = 0.34), even after adjusting for age and hearing (Fig. 4b). However, the group × age interaction was significant (F(1,16) = 10.41, p = 0.005, ηp2 = 0.33), meaning that the magnitude of this group effect was again age dependent. Whereas brainstem responses diverged between low- and high-MOCA performers with advancing age, cortical responses converged (Fig. 4f). Bonferroni-adjusted group contrasts at ages 60, 70, and 80 years revealed that this interaction was attributable to larger N1–P2 responses in the younger divisions of our cohort (MOCAhigh > MOCAlow; age 60–70 years), whereas cortical responses were similar in the oldest listeners (i.e., age 80: MOCAhigh = MOCAlow). These findings suggest that increased cortical responsiveness to speech in low-MOCA scorers is more prevalent in adults <70 years of age. A lengthening in the interpeak latency between N1 and P2 may be indicative of a deficit at the cortical level beyond general, age-related slowing. However, we found no group differences (nor interactions) in interpeak latency between the N1 and P2 waves (F(1,16) = 0.00, p = 0.97, ηp2 = 0.0001).
Importantly, these brainstem and cortical ERP effects were absent when we compared a median split of the control group by age (Fig. 5). Differences in brainstem responses between the youngest (n = 8) and oldest (n = 7) halves of the control group were more muted than the MCI-related changes (Fig. 4) and did not reach significance (F(1,8) = 4.56, p = 0.07, ηp2 = 0.36) (Fig. 5a). Similarly, cortical N1–P2 amplitudes were indistinguishable between the youngest and oldest cognitively normal listeners of our control sample (Fig. 5b) (F(1,8) = 0.14, p = 0.72, ηp2 = 0.015). Null age effects were observed using both parametric and Friedman nonparametric statistical tests (i.e., two-way ANOVAs conducted on ranked variables). The absence of any age effects in either brainstem FFRs or cortical ERPs further corroborates our covariate analyses and suggests that MCI-related changes in speech processing are not due to age alone.
Auditory biomarkers of MCI
Across speech tokens, it was evident that the control and MCI groups were linearly separable based on either their brainstem or cortical ERP responses; little overlap was observed in the distributions for each group for either of these two neural measures (Fig. 4c,d). High separability suggests that brainstem and cortical ERPs to speech might be used as a biomarker to distinguish low- from high-MOCA performers and thus identify listeners with putative MCI based solely on their brain activity. To this end, we conducted an LDA to evaluate whether group membership (control vs MCI) could be predicted by a linear combination of listeners' neural responses. For this analysis, we used all five vowel responses for each listener, resulting in a total of 115 observations (5 vowels × 23 listeners) submitted to the classifier.
Figure 6 shows the outcome of the LDA classifier in which each observation represents individuals' brainstem/cortical response amplitude for a single speech token. Overall, group membership was correctly predicted based on listeners' FFRs/ERPs with 80% accuracy (20% misclassification error, cross-validated), which is markedly above chance level (50% of classifications were expected to be correct by chance alone). Although the MOCA has good sensitivity/specificity (90/87%) (Nasreddine et al., 2005), our neural classification performance is considerably higher than that of other clinical diagnostics for cognitive impairment (e.g., MMSE: <50%; Wollman and Prohovnik, 2003). On average, 68% of MCI listeners were accurately classified as low-MOCA performers; similarly, 88% of controls were correctly identified as high-MOCA performers. Similar performance was observed even after first collapsing (averaging) across the five vowel responses per listener and classifying the 23 participants into one of two groups (74% accuracy, cross-validated). Therefore, robust classification of MCI from control listeners was achieved using either the individual speech responses or a more global speech-processing measure, pooling responses across the vowel continuum.
Group classification was also well above chance when considering FFRs and ERPs alone in separate (single predictor variable) LDAs. However, cross-validated accuracy was notably higher in predicting group membership using brainstem responses (75.7%) compared with cortical ERPs (65.2%). Collectively, our LDA results suggest that either brainstem or cortical ERPs can distinguish normal from MCI listeners above chance levels (the former being a better predictor), but the combination of both neural measures yields up to ∼15% better classification.
Brain–behavior relations in predicting MOCA scores
Having established that combined brainstem and cortical speech-evoked responses can distinguish MCI from controls, we next aimed to determine whether neural measures were related to MOCA scores (a continuous variable) and, if so, which level of auditory processing (i.e., brainstem or cortex) was most predictive of MCI severity. Correlations between listeners' neural (brainstem FFR, cortical ERP) responses and MOCA scores are shown in Figure 7. We found a significant correlation between brainstem FFRs and MOCA (r = −0.52, p = 0.01). In contrast, we did not observe a relation between the cortical ERPs and MOCA scores (r = −0.28, p = 0.19). These findings corroborate the LDA analysis and indicate that, whereas both brainstem and cortical speech coding distinguish MCI from control individuals, subcortical activity seems to more strongly predict mild cognitive decline than cortical speech processing.
Discussion
Our findings indicate that MCI coincides with changes to the neural representation of speech at multiple levels of the nervous system. Notably, we demonstrate, for the first time, that MCI is associated with subcortical brainstem deficits in auditory processing. Our results provide new insight into the neuropathology of MCI by revealing subtle changes in how the nervous system captures the fine spectral details of sound that enable robust speech identification. Collectively, our data provide evidence that, in individuals with MCI, the early sensory transcription of speech is altered at a subcortical level and there is a disproportionate increase in cortical speech-evoked responses compared with normal aging adults.
Brainstem and cortical speech-evoked responses as a function of age and MCI
The MCI-related increase in brainstem and cortical speech-evoked responses interacted with age, suggesting that the influence of MCI on subcortical and cortical auditory–sensory processing is age dependent. Interestingly, the pattern of these age-dependent effects was reversed between brainstem and cortex. Brainstem speech responses were abnormally large in MCI participants compared with controls, an effect that became even more prominent with advancing age (i.e., a diverging effect). Overresponsivity was also observable in MCI participants' cortical evoked responses (N1–P2); however, this effect became smaller (i.e., converged) with advancing age. Nevertheless, the MCI-related increase in amplitude of both subcortical and cortical responses suggests diffuse changes in auditory processing that occur concomitantly at multiple levels of the nervous system.
Both peripheral hearing loss and reduced cognitive flexibility may contribute to the speech processing deficits that emerge late in life (Humes et al., 2012). Although both groups had normal audiometric thresholds and were matched for hearing acuity, MCI-related changes in hierarchical processing are more difficult to dissociate from those of normal aging because, despite accounting for age in our analyses, MCI listeners were older than their control peers. Comparing the present findings alongside our previous studies on normal aging helps to further distinguish the effects of MCI (present study) from pure aging (Bidelman et al., 2014a). We have shown previously that normal aging produces differential changes in the hierarchy of speech representations between brainstem and cortex. However, in individuals with normal cognitive function, aging in and of itself has a surprisingly small effect on brainstem speech FFRs, whereas age-related hearing loss (i.e., presbycusis) actually weakens subcortical neural responses (Bidelman et al., 2014a). This contrasts with the MCI-related effects observed in the current study, which showed the opposite pattern: stronger brainstem and cortical responses in cognitively impaired listeners across the board, along with a differential pattern (i.e., age × MCI interaction) between the FFRs and ERPs. This was further substantiated when our data were split by median age, which confirmed that MCI-related changes in speech processing were not accounted for by age alone.
Although MCI listeners tended to show weaker perceptual speech identification, we found no group difference at the behavioral level in terms of either speed or accuracy; there was only a tendency for higher intrasubject variability in MCI listeners' speech-labeling speeds (Fig. 1b). This suggests that MCI is associated with significant neurophysiological changes in the auditory system that precede behavioral deficits and loss of communication skills (Johnson and Lin, 2014; Petersen, 2014). Communication deficits associated with MCI tend to involve complex expressive and receptive language processing (e.g., verbal fluency, reading comprehension; Johnson and Lin, 2014). The lack of a group effect observed here in categorical speech perception could be due to the simplistic nature of our identification task. Nevertheless, our behavioral data underscore the limitation of using perceptual measures alone in diagnostics. In this regard, the dual brainstem–cortical ERP approach used here might provide a "preclinical" biomarker of MCI before it manifests in traditional clinical and neuropsychological assessments.
Our results can be interpreted in the context of the decline compensation hypothesis (Wong et al., 2009) or the related compensation-related utilization of neural circuits hypothesis (Reuter-Lorenz and Cappell, 2008). Both frameworks posit that, even at low levels of task complexity, brain regions are overrecruited in cognitive aging, reflecting neural processing inefficiencies farther downstream. Under these interpretations, the minimal differences we find in behavioral performance (Fig. 1) could result from an overactivation of the speech network (Figs. 3, 4) as a means to compensate for the pathophysiological declines of MCI and maintain a computational output similar to that of cognitively normal individuals. Our results also align with dedifferentiation frameworks (Li and Lindenberger, 1999), which suggest that age-related impairments in cognitive function stem from reductions in the fidelity of neural representations. In the present study, we find lower fidelity neural speech representations in MCI individuals at both the subcortical and cortical levels.
Relative power of subcortical versus cortical ERPs in predicting MCI severity
Previous studies showed normal or slightly larger responses of the early cortical ERPs (e.g., P1) in patients with mild-to-moderate Alzheimer's disease relative to age-matched controls (Golob and Starr, 2000). Moreover, recordings of the mismatch negativity (MMN), a component reflecting early auditory change detection, reveal greater MMN attenuation with slower stimulus presentation rates in patients with dementia than in controls, indicative of a faster decay of auditory sensory memory (Pekkonen et al., 1994). These results, along with our present findings, indicate an overresponsivity to auditory stimuli. The pervasive hypersensitivity in neural coding suggests that MCI might be characterized by early disinhibition resulting from prefrontal dysfunction via efferent connections or abnormalities within the ascending auditory pathways. Indeed, prefrontal dysfunction (e.g., lesions) has been shown to enhance the amplitude of middle latency auditory evoked potentials, which reflect early corticothalamic registration of sound (Knight et al., 1989). We extend previous studies (Golob et al., 2007) by demonstrating that hypersensitivity in auditory coding extends below auditory cortex, as low as the auditory brainstem. Presumably, impaired transfer of information between brainstem and cortex might be a precursor to declines in speech reception and language skills often observed in MCI (Johnson and Lin, 2014; Petersen, 2014).
To our knowledge, this is the first study to reveal that MCI is associated with changes in subcortical auditory function. Previous electrophysiological studies investigating brainstem (click-ABR) and middle latency responses in cases of MCI have reported responses comparable to those of age-matched controls (Irimajiri et al., 2005). However, MCI-related changes have been observed in the latency/amplitude characteristics of the P1 and later waves of the ERPs, leading to the view that primary auditory cortex was the earliest level of the auditory system susceptible to MCI-related neuropathology (Irimajiri et al., 2005). Our results challenge this notion by showing clear evidence of MCI-related changes in the brainstem's transcription of speech sounds. This was evident in FFRs by the increased (hypersensitive) coding across the response's spectrum. The FFR is thought to reflect a neurophonic potential generated in the inferior colliculus (Sohmer et al., 1977; Bidelman, 2015b), the onset of which corresponds to the transmission delay from the ear to midbrain (i.e., ∼7–10 ms; King et al., 2016). Our data indicate that MCI is associated with speech coding deficits as early as 10 ms after sounds enter the ear, well before a listener becomes consciously aware of the speech signal. Our findings bolster the notion that speech-evoked brainstem FFRs might provide a sensitive index of nascent changes in auditory neural processing that accompany early MCI.
Current in vivo neuroimaging diagnostics (e.g., MRI, PET) are promising in defining the neurology of MCI, but are costly, have numerous contraindications (e.g., metal implants, claustrophobia), and lack portability. Most also characterize anatomical, volumetric, and metabolic abnormalities of the brain rather than the functional changes underlying cognition, which have proven more effective in characterizing neurocognitive disorders (Sporns et al., 2004; Hoeft et al., 2007). Auditory FFRs/ERPs provide a direct measure of neuronal function and might provide a cost-effective alternative for assessing subtle changes in brain function in nascent forms of dementia. Nevertheless, a limitation of our study is that only behavioral measures were used to identify MCI individuals, so the underlying etiology was unknown. It would be important in future work to establish links between our preliminary brainstem–cortical electrophysiological indicators and other prognostic biomarkers of dementia (e.g., hippocampal volume, β amyloid proteins) (Cordell et al., 2013). Such studies could also address several interesting questions regarding MCI's time course; for example, whether the brainstem dysfunctions observed here precede hippocampal damage and/or the development of amyloid deposits.
Nevertheless, we found that dual brainstem–cortical ERPs offered an overall accuracy of 80% in detecting MCI. This suggests that early changes in auditory sensory coding, as indexed by FFRs/ERPs, might complement behavioral measures by providing an objective measure for detecting early MCI and help to discriminate early dementia from normal aging (Wollman and Prohovnik, 2003; Spencer et al., 2013). Although we found that both brainstem and cortical speech coding distinguished control from MCI individuals (Fig. 6), subcortical activity more strongly predicted mild cognitive decline than cortical speech processing (Fig. 7).
Previous work suggests that the earliest components of the cortical evoked potentials (e.g., P1 at ∼50 ms) might be especially sensitive to MCI severity and predict the conversion to more severe forms of dementia within a few years; that is, older adults showing the largest cortical response amplitudes may have a greater risk of converting to Alzheimer's disease (Golob et al., 2007). Our findings suggest that subcortical responses may provide an even more sensitive biomarker for predicting MCI severity than cortical ERP components, which showed comparably weaker changes (N1–P2) or no reliable changes (P1) at the early MCI stage examined here. Nonetheless, future studies are needed to determine whether the brainstem and/or cortical auditory electrophysiological markers observed here can detect MCI in a larger sample and broader demographic. This latter point is important in light of the fact that normative MOCA scores can vary quite dramatically depending on listeners' educational achievements and ethnicity (Nasreddine et al., 2012; Narazaki et al., 2013). Future longitudinal studies are also needed to determine whether brainstem speech FFRs also predict conversion to Alzheimer's or more severe dementia from the MCI stages examined here. This is needed to extend our cross-sectional results and demonstrate a causal connection between MCI and the declining brainstem–cortical speech processing that is suggested by these correlational data. Indeed, the involvement of sensory regions extending as low as the brainstem may represent early progression from an isolated episodic memory disorder (amnesic MCI) to one affecting multiple perceptual–cognitive systems (Golob et al., 2007), which impairs even preattentive sound processing in the brainstem.
Footnotes
This work was supported by the GRAMMY Foundation (G.M.B.), the Canadian Institute of Health Research (Grant MOP 106619 to C.A.), the FedEx Institute of Technology, and the Center for Technologies and Research in Alzheimer's Care at the University of Memphis (S.H.T.).
The authors declare no competing financial interests.
- Correspondence should be addressed to Gavin M. Bidelman, Ph.D., School of Communication Sciences and Disorders, University of Memphis, 4055 North Park Loop, Memphis, TN 38152. g.bidelman@memphis.edu.