Abstract
Recognizing speech in difficult listening conditions requires considerable focus of attention that is often demonstrated by elevated activity in putative attention systems, including the cingulo-opercular network. We tested the prediction that elevated cingulo-opercular activity provides word-recognition benefit on a subsequent trial. Eighteen healthy, normal-hearing adults (10 females; aged 20–38 years) performed word recognition (120 trials) in multi-talker babble at +3 and +10 dB signal-to-noise ratios during a sparse sampling functional magnetic resonance imaging (fMRI) experiment. Blood oxygen level-dependent (BOLD) contrast was elevated in the anterior cingulate cortex, anterior insula, and frontal operculum in response to poorer speech intelligibility and response errors. These brain regions exhibited significantly greater correlated activity during word recognition compared with rest, supporting the premise that word-recognition demands increased the coherence of cingulo-opercular network activity. Consistent with an adaptive control network explanation, general linear mixed model analyses demonstrated that increased magnitude and extent of cingulo-opercular network activity was significantly associated with correct word recognition on subsequent trials. These results indicate that elevated cingulo-opercular network activity is not simply a reflection of poor performance or error but also supports word recognition in difficult listening conditions.
Introduction
Speech recognition requires considerable focused attention, particularly in adverse listening conditions. Indeed, the engagement of putative attention systems is observed in neuroimaging studies during speech recognition when intelligibility is reduced by adding background noise (Wong et al., 2008; Adank et al., 2012), bandpass filtering (Eckert et al., 2009; Harris et al., 2009), time compression (Poldrack et al., 2001; Adank and Devlin, 2010) and noise vocoding (Wild et al., 2012; Erb et al., 2013). Specifically, increased activity is observed in a cingulo-opercular pattern of brain regions that include medial frontal cortex, anterior insula, and frontal operculum when speech recognition becomes more effortful (Wild et al., 2012). However, there is limited understanding about the degree to which cingulo-opercular activity provides benefit during speech recognition in difficult listening conditions.
A cingulo-opercular pattern of performance-related effects is not limited to speech and language studies. Elevated activity in cingulo-opercular cortex has been observed across perceptual and cognitive tasks (Duncan and Owen, 2000; Braver et al., 2001; Dosenbach et al., 2006), particularly when people make errors or are uncertain about their response. Moreover, participants exhibit overlapping cingulo-opercular activity when they perform both auditory and visuospatial tasks (Eckert et al., 2009), thereby suggesting domain general function(s) for cingulo-opercular cortex.
The sensitivity of cingulo-opercular cortex to task difficulty and performance across a variety of tasks, as well as the shared pattern of activity over time of these regions, has guided the hypothesis that cingulo-opercular cortex comprises a network that can monitor and adjust performance throughout a task (Dosenbach et al., 2007). Because the cingulo-opercular network is proposed to optimize performance in response to task demands and performance, cingulo-opercular activity is also expected to relate to future task performance. This premise is supported by evidence that cingulo-opercular activity is predictive of behavior on subsequent trials during visuospatial tasks [i.e., response latencies and percentage correct (Carter et al., 2000; Kerns et al., 2004; Weissman et al., 2006; Eichele et al., 2008)].
Based on the extant literature, we hypothesized that the relative engagement of cingulo-opercular cortex supports speech recognition in difficult listening conditions, in addition to signaling increased error risk. A functional magnetic resonance imaging (fMRI) experiment with a word recognition in noise task was used to test the prediction that elevated cingulo-opercular activity increases the likelihood of correct word recognition on the next trial. Trial-level functional connectivity analyses were also performed to test the prediction that a broader extent of network activity occurs before correct word recognition than incorrect word recognition, in difficult listening conditions.
Materials and Methods
Participants.
Eighteen healthy adults participated in this study (10 females; aged 20–38 years; 29.2 ± 5.8, mean ± SD). The participants were native English speakers and had a mean Edinburgh handedness questionnaire score of 68.3 ± 50.3 from a possible range of −100 (strongly left-handed) to 100 (strongly right-handed; Oldfield, 1971). The participants had an average of 16.1 ± 2.2 years of education, an average socioeconomic status of 53.9 ± 9.9 with a possible range of 8–66 (Hollingshead, 1983), and reported no history of neurological or psychiatric events. Participants were selected for clinically normal hearing with mean pure-tone thresholds from the better ear <9.2 dB hearing level from 250 to 8000 Hz (Madsen OB922 audiometer and TDH-39 headphones). Mean pure-tone thresholds did not differ between right and left ears by >5 dB, and all participants had normal immittance measures. Informed consent was obtained in compliance with the Institutional Review Board at the Medical University of South Carolina, and experiments were conducted in accordance with the Declaration of Helsinki.
Stimuli.
The word-recognition task included 120 monosyllabic consonant–vowel–consonant words recorded by a male speaker (Dirks et al., 2001) that were presented through Sensimetrics piezoelectric insert ear phones. Words were not repeated across intelligibility conditions to avoid interactions between word intelligibility and priming or memory effects. The signal-to-noise ratio (SNR) of the words was manipulated by presenting a continuous multi-talker babble recording at a constant level of 82 dB SPL and words at either 92 dB SPL (+10 dB SNR) or 85 dB SPL (+3 dB SNR). The multi-talker babble recording was originally prepared as part of the SPIN test (Kalikow et al., 1977) and consists of 12 talkers, which results in energetic masking equivalent to steady-state noise at the same SNR (Miller, 1947; Carhart et al., 1969; Wilson et al., 2012). The level of target words and multi-talker babble were calibrated separately in the scanner control room before each scanning session. Mean word recognition was not related to mean pure-tone thresholds from the best ear (r = −0.36, p = 0.14), indicating that stimulus audibility did not contribute substantially to performance differences across participants.
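As a minimal illustration of the level arithmetic described above (not the authors' calibration procedure), the SNR in decibels is the word level minus the babble level when both are expressed in dB SPL. The function name `snr_db` is hypothetical.

```python
def snr_db(signal_db_spl: float, noise_db_spl: float) -> float:
    """SNR in dB is the difference between two levels given in dB SPL."""
    return signal_db_spl - noise_db_spl


BABBLE_DB_SPL = 82.0  # constant babble level reported in the text

for word_level in (92.0, 85.0):  # word levels reported in the text
    snr = snr_db(word_level, BABBLE_DB_SPL)
    print(f"words at {word_level:.0f} dB SPL -> {snr:+.0f} dB SNR")
```

With the reported levels, this reproduces the +10 and +3 dB SNR conditions.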
Experimental procedure.
During each word-recognition epoch (60 trials each), participants performed a word-recognition task in multi-talker babble that was presented continuously with each word at +3 or +10 dB SNR during a sparse sampling imaging acquisition (Fig. 1). Words were presented in each SNR condition for four to six consecutive trials to limit predictability for the onset of the next SNR condition block. Participants were instructed to respond vocally, repeating each word aloud or to say “nope” if the word was not recognized. No performance feedback was provided during the experiment. A visual prompt (“get ready”) was displayed to cue participants to the start of word-recognition epochs. In addition, a crosshair changed color from white to red to cue participants about when to respond. Participants viewed the projector screen through a periscope mirror. The experimental design consisted of two word-recognition epochs and three rest intervals at the beginning, middle, and end of the experiment (Vaden et al., 2012).
Word-recognition scores.
Responses were scored as correct only if the word was repeated exactly as it was presented. Participant responses were transcribed by two raters with agreement for 96.07% of the trials. Recordings from an MRI-compatible microphone (Magnetic Resonance Technologies) were used to clarify scoring discrepancies between raters. Unintelligible or missing responses were omitted from analyses, whereas “nope” responses were scored as incorrect. For the word-recognition results presented below, a general linear mixed model (GLMM) was used to assess the main effect of SNR on word recognition (correct or incorrect) for each trial using the R statistics software (R version 2.15.0 with R-packages: lme4, version 0.999375.42). Model testing confirmed that inclusion of individual differences in age and mean pure-tone threshold did not improve model fit (χ2 < 3.32, p > 0.07).
Image acquisition.
Structural and functional images were collected using a 32-channel head coil on a Siemens 3 T scanner. The T1 images were acquired in 160 slices with a 256 × 256 matrix, TR of 8.13 ms, TE of 3.7 ms, flip angle of 8°, slice thickness of 1.0 mm, and no slice gap. One hundred eighty whole-brain functional images were acquired using a T2*-weighted, single-shot echo-planar imaging sequence (36 slices with a 64 × 64 matrix; TR, 8.6 s; TE, 35 ms; acquisition time, 1647 ms; slice thickness, 3.00 mm; gap, 0; sequential order; GRAPPA-parallel imaging with acceleration factor of 2). Voxel dimensions for the functional images were 3 × 3 × 3 mm.
Preprocessing neuroimaging data.
Structural T1-weighted images were processed with the Advanced Normalization Tools (ANTS version 1.9; www.picsl.upenn.edu/ANTS) to create a study-specific template (Avants and Gee, 2004; Vaden et al., 2012). Preprocessed functional images were coregistered to native space structural images and then spatially transformed into group-defined space using parameters generated by ANTS. Voxel coordinates with significant peak effects were converted into Montreal Neurological Institute (MNI) space by normalizing the study-specific template to the MNI template with ANTS and applying the resultant deformation parameters to the peak voxel coordinates in study-specific space.
Functional blood oxygen level-dependent (BOLD) images were preprocessed using SPM8 (www.fil.ion.ucl.ac.uk/spm). Functional images were realigned, unwarped, and coregistered to the corresponding structural scan from each individual before normalization, and the spatially normalized images were smoothed with an 8 mm Gaussian kernel. The linear model of the global signal method (Macey et al., 2004) was used to detrend the global mean signal fluctuations from these preprocessed images. An algorithm detailed by Vaden et al. (2010) was used to identify functional images with voxel or volume intensities that exceeded 2.5 SDs from the mean time series intensity. Across the 18 participants, the signal outlier vectors identified 4.48% of the functional images as containing extreme noise, which were entered as binary nuisance regressors in the individual level general linear model (GLM). Before performing the GLMM analysis, the fMRI data were residualized to eliminate correlations with the nuisance regressors.
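The idea of a binary signal outlier vector can be sketched as follows. This is a simplified illustration that assumes a single series of volume mean intensities and the 2.5 SD criterion stated above; it is not the algorithm of Vaden et al. (2010), whose details are not given here. The function name `flag_outlier_volumes` and the simulated series are hypothetical.

```python
from statistics import mean, stdev


def flag_outlier_volumes(volume_means, n_sd=2.5):
    """Return a 0/1 vector marking volumes whose mean intensity deviates
    from the time-series mean by more than n_sd standard deviations."""
    m = mean(volume_means)
    s = stdev(volume_means)
    return [1 if abs(v - m) > n_sd * s else 0 for v in volume_means]


# Hypothetical series of 180 volume means with two simulated intensity spikes.
series = [100.0 + ((i * 7) % 5 - 2) * 0.5 for i in range(180)]
series[20] += 15.0
series[75] += 15.0

regressor = flag_outlier_volumes(series)
flagged = [i for i, flag in enumerate(regressor) if flag]
print(flagged)  # [20, 75]
print(f"{100.0 * sum(regressor) / len(regressor):.2f}% of volumes flagged")
```

A vector of this kind can serve as a binary nuisance regressor, so that flagged volumes do not drive the estimated task effects.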
GLM fMRI analyses.
A GLM analysis was performed to test the predictions that cingulo-opercular activity increased during more effortful word recognition (i.e., +3 > +10 dB SNR) and errors (i.e., incorrect > correct) and that this activity was distinct from responses to salient transitions (i.e., first trial of rest, babble, word-recognition blocks > others). Words presented in babble were modeled as separate event types in the +3 and +10 dB SNR conditions, each with a word-recognition parametric modulator (1 for correct responses, 0 for incorrect). The model also included discrete event types for trials when babble was presented in the absence of a word-recognition trial, as well as transition trials that initiated each block of continuous silent rest, babble, and words in babble. The resultant GLM contained four modeled event types that were convolved with the hemodynamic response function and two word-recognition parametric modulators. Six additional nuisance regressors were used: four vectors measuring head position and motion (Kuchinsky et al., 2012; Wilke, 2012) and two signal outlier vectors. Group-level analyses were performed using the individual contrast maps to identify significant effects related to the intelligibility manipulation (+3 dB SNR words compared with +10 dB SNR), word recognition, and salient transitions between blocks of trials. Main effects were submitted to an uncorrected (UNC) voxel statistic threshold of Z = 3.09, pUNC = 0.001 and were familywise error corrected (FWE) with a cluster extent threshold ≥20 voxels, pFWE = 0.05 (Friston et al., 1994).
GLMM fMRI analysis.
A logistic GLMM analysis was performed to test the prediction that cingulo-opercular activity was associated with word recognition on the subsequent trial, using the R statistics software (R version 2.15.0 with R-packages: lme4, version 0.999375.42; AnalyzeFMRI, version 1.1.14). The dependent variable was binary word recognition (W) for each trial (t), excluding trials that immediately followed a different SNR condition or rest (i.e., first trial of each block). The GLMM analysis was performed for each voxel with the following independent variables: (1) BOLD contrast measured before each word-recognition trial (t − 1), (2) SNR condition, (3) SNR × BOLD interaction, and (4) random subject effects (SUB), which can be expressed as follows: W_t = SNR_t + BOLD_{t−1} + (1|SUB) + error. Only voxels with time series missing Vaden et al., 2012)] were included in the GLMM analyses. To ensure that BOLD variability did not reflect participant or SNR differences, BOLD time series from each participant were centered and scaled (mean ± SD, 0 ± 1) within SNR conditions. To determine whether error-related BOLD variability was necessary for activity to predict next trial word recognition, a control GLMM analysis was performed, which only included trials that followed correct responses (1258 post-correct trials of 1633 in the main GLMM analysis).
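The trial bookkeeping for this model can be sketched for one participant and one voxel. This is a hedged illustration, not the authors' code: it assumes that BOLD values are centered and scaled within SNR condition over all trials of the toy series, and that a trial is dropped whenever the preceding trial belongs to a different block. The helper names (`zscore_within`, `lagged_rows`) and the toy data are hypothetical.

```python
from statistics import mean, stdev


def zscore_within(values, groups):
    """Center and scale values (mean 0, SD 1) separately within each group label."""
    out = [0.0] * len(values)
    for label in set(groups):
        idx = [i for i, g in enumerate(groups) if g == label]
        subset = [values[i] for i in idx]
        m, s = mean(subset), stdev(subset)
        for i in idx:
            out[i] = (values[i] - m) / s
    return out


def lagged_rows(bold, snr, block, correct):
    """Pair word recognition on trial t with scaled BOLD on trial t - 1,
    skipping the first trial of each block."""
    z = zscore_within(bold, snr)
    rows = []
    for t in range(1, len(bold)):
        if block[t] != block[t - 1]:
            continue  # previous trial came from a different block
        rows.append({"W": correct[t], "SNR": snr[t], "BOLD_prev": z[t - 1]})
    return rows


# Toy data: two blocks of four trials (+3 dB, then +10 dB SNR).
bold = [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0]
snr = [3, 3, 3, 3, 10, 10, 10, 10]
block = [0, 0, 0, 0, 1, 1, 1, 1]
correct = [1, 0, 1, 1, 1, 1, 0, 1]

rows = lagged_rows(bold, snr, block, correct)
print(len(rows))  # 6
```

Rows of this form, stacked across participants with a subject identifier, could then be passed to a mixed-effects logistic regression routine such as `glmer` in the lme4 package named above.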
Model testing (Hofmann, 1997) was performed in each voxel before significance testing to determine whether the fit of the GLMM significantly decreased after removing the SNR × BOLD interaction. This interaction term was excluded from the GLMM to avoid losing sensitivity to main effects, when the interaction did not significantly affect model fit (χ2 test, p > 0.05). Z-scores were saved for each voxel before applying an UNC voxel statistic threshold of Z > 3.09, pUNC = 0.001, and correcting for FWE with a cluster extent threshold ≥26 voxels, pFWE < 0.05, to identify clusters with activity that significantly predicted performance on the next trial as well as SNR × BOLD interactions.
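The statistics used in this model-testing step can be checked with a short sketch. It assumes that the comparison of nested models is a likelihood-ratio test with 1 degree of freedom (one term removed) and that pUNC is a one-tailed normal probability; neither assumption is stated explicitly in the text. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def lrt_statistic(loglik_full: float, loglik_reduced: float) -> float:
    """Likelihood-ratio chi-square statistic for two nested models."""
    return 2.0 * (loglik_full - loglik_reduced)


def lrt_p_value_df1(chi2_stat: float) -> float:
    """P-value of a chi-square statistic with 1 degree of freedom.
    For df = 1, the survival function equals erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(chi2_stat / 2.0))


def one_tailed_p(z: float) -> float:
    """Upper-tail probability of a standard normal Z-score."""
    return 1.0 - NormalDist().cdf(z)


print(round(lrt_p_value_df1(1.77), 2))  # chi-square of 1.77 -> 0.18
print(round(one_tailed_p(3.09), 3))     # Z of 3.09 -> 0.001
```

Under these assumptions, χ2 = 1.77 corresponds to p = 0.18, the value reported for the SNR × PROI interaction in the connectivity methods, and Z = 3.09 corresponds to the voxel threshold of pUNC = 0.001.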
Functional connectivity analyses.
Connectivity analyses were performed to demonstrate the extent to which cingulo-opercular regions could be operationally defined as a functional network and to test the prediction that connectivity increased during the word-recognition task compared with rest trials. Regions of interest (ROIs) were functionally defined using each significant cluster that predicted word-recognition performance. Mean BOLD time series were computed for each participant in the six ROIs. As with the GLMM analysis above, the first trial of each block was excluded to eliminate transition-related effects (Sridharan et al., 2007), and the time series were centered and scaled within SNR conditions or silent rest (Fig. 2B,C). The strength of connectivity between cingulo-opercular network regions was tested using partial correlations that controlled for random subject effects (1692 word-recognition trials; 432 silent rest trials). Because 15 unique ROI combinations were tested, the significance of each partial correlation was Bonferroni corrected by adjusting the critical α for multiple comparisons. To assess the increase in connectivity throughout the cingulo-opercular network when participants were engaged in the task, a paired-sample test was performed to compare partial correlations between cingulo-opercular network regions during word-recognition epochs and rest epochs. Before the paired-sample test, a Fisher Z′ transform was applied to the partial correlations to ensure a normal distribution.
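The pair count, the Bonferroni adjustment, and the Fisher transform can be illustrated briefly. The sketch assumes a family-wise α of 0.05, which is not stated in the text, and uses placeholder ROI labels. The correlation values passed to the transform are the example values reported in the Results.

```python
import math
from itertools import combinations

rois = [f"ROI{i}" for i in range(1, 7)]  # placeholder labels for the six ROIs
pairs = list(combinations(rois, 2))      # every unique ROI pairing
print(len(pairs))  # 15

FAMILYWISE_ALPHA = 0.05                  # assumed; not given in the text
alpha_per_test = FAMILYWISE_ALPHA / len(pairs)
print(round(alpha_per_test, 5))


def fisher_z(r: float) -> float:
    """Fisher transform of a correlation coefficient (inverse hyperbolic tangent)."""
    return math.atanh(r)


# Example partial correlations reported for word recognition and rest.
print(round(fisher_z(0.48), 3), round(fisher_z(0.40), 3))
```

The transform stretches correlations near ±1, so that differences between task and rest can be compared on an approximately normal scale.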
We then tested the prediction that elevated and coherent activity across multiple cingulo-opercular network regions was predictive of word recognition on the subsequent trial. In other words, we examined whether the extent of cingulo-opercular network engagement provided word-recognition benefit on subsequent trials. This approach differs from a traditional connectivity analysis in which a single metric of connectivity is obtained across an experiment and is correlated with performance across subjects. Instead, this analysis examined the extent to which coordinated and elevated activity across cingulo-opercular regions on one trial predicted word recognition on the next trial within subjects.
The extent of cingulo-opercular network engagement was estimated for each trial by counting the number of ROIs with activity that was elevated compared with the mean activity for each ROI across word-recognition trials. Each participant's cingulo-opercular ROIs were defined based on the GLMM results (above) for the other 17 participants. This leave-one-out procedure ensured that the ROI time series used in the trial-by-trial connectivity analyses were independent from the data that were used to define the ROIs, because each participant's ROIs were based on significant clusters in the other participants. The control procedure for defining each participant's ROIs used the same voxel statistic (pUNC = 0.001) and cluster extent (pFWE = 0.05) thresholds as in the previous group analyses. The significant cingulo-opercular clusters varied spatially across leave-one-out GLMM results, such that the number of ROIs was 4.22 ± 1.31. For that reason, the proportion of cingulo-opercular network regions in each participant (PROI) with elevated activity and word recognition on the following trial were used in a trial-level GLMM logistic regression analysis: W_t = SNR_t + PROI_{t−1} + (1|SUB) + error. This model excluded the SNR × PROI interaction term, because the interaction term did not improve model fit (χ2 = 1.77, p = 0.18). Thus, this trial-level connectivity analysis was designed to provide evidence for coherent network activation effects on subsequent word recognition.
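The PROI predictor described above can be sketched for one participant. This is an illustrative reading of the description, assuming that "elevated" means strictly greater than the ROI's own mean across the trials supplied; the function name `proportion_elevated` and the toy values are hypothetical.

```python
def proportion_elevated(roi_series):
    """For each trial, return the proportion of ROIs whose activity exceeds
    that ROI's own mean across trials.

    roi_series: one list of per-trial BOLD values for each ROI.
    """
    means = [sum(series) / len(series) for series in roi_series]
    n_rois = len(roi_series)
    n_trials = len(roi_series[0])
    return [
        sum(series[t] > m for series, m in zip(roi_series, means)) / n_rois
        for t in range(n_trials)
    ]


# Toy example: four ROIs measured on four trials.
toy = [
    [1.0, 2.0, 3.0, 4.0],
    [4.0, 3.0, 2.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [2.0, 2.0, 2.0, 6.0],
]
p_roi = proportion_elevated(toy)
print(p_roi)  # [0.25, 0.25, 0.5, 0.75]
```

Using a proportion rather than a raw count keeps the predictor comparable across participants whose leave-one-out procedure yielded different numbers of ROIs. In the model, the value from trial t − 1 would be paired with word recognition on trial t.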
Results
Task errors and difficulty
As expected, word recognition was significantly lower in the +3 dB SNR (66.1 ± 7.6%) compared with the +10 dB SNR (90.9 ± 3.9%) condition (Z = 11.34, p < 0.001). Word-recognition errors and reduced SNR were uniquely associated with elevated activity in cingulo-opercular voxels using a GLM (Fig. 2A,B). These effects were also independent of right temporo-parietal junction and right fronto-opercular regions that responded to salient transitions between experiment epochs (rest, babble, word recognition; Fig. 2C; Table 1), which have been related to task performance (Lie et al., 2006; Weissman et al., 2006). Together, the results demonstrate unique responses of cingulo-opercular cortex to performance and relative difficulty that are also distinct from a temporo-parietal orienting response.
Cingulo-opercular activity and word recognition on the next trial
A voxelwise GLMM analysis demonstrated that correct word recognition was significantly more likely for trials that followed elevated activity in cingulo-opercular regions than for trials that followed lowered activity (Fig. 3A,B; Table 2, top). Although there was a 13.1% word-recognition benefit from elevated cingulo-opercular activity in the +3 dB SNR condition compared with a 4.9% word-recognition benefit in the +10 dB SNR condition, there were no significant interactions between BOLD contrast and SNR in predicting word recognition on the next trial. In addition, there were no regions with significant negative associations between BOLD contrast and word recognition on the next trial. Similar to the studies by Kerns et al. (2004) and Weissman et al. (2006), elevated activity in regions sensitive to task difficulty and errors was associated with improved word recognition in the next trial. We also examined the extent to which there was word-recognition benefit from elevated cingulo-opercular activity when the analysis only included trials that were preceded by correct responses. A nearly identical set of cingulo-opercular network regions were significantly predictive of word recognition on the next trial regardless of whether post-error trials were included in the analysis (Table 2, top) or not (Table 2, bottom).
Connectivity and word recognition on the next trial
Cingulo-opercular regions exhibited significantly correlated activity across rest and across word-recognition trials (Fig. 3C,D). Importantly, a paired-sample test demonstrated significantly increased partial correlations (Z′) between cingulo-opercular network regions during word-recognition epochs compared with rest epochs (Z = 4.93, p < 0.001). For example, the partial r for the right anterior insula/frontal operculum and dorsal paracingulate time series was significantly greater (Z = 2.01, p = 0.02) during word recognition (partial r = 0.48) than during rest (partial r = 0.40). The increased connectivity across the cingulo-opercular regions from rest to word-recognition epochs demonstrates that the coherence of activity within the cingulo-opercular network increased during task performance.
The trial-level connectivity analysis demonstrated that word recognition was significantly more likely after trials in which greater numbers of cingulo-opercular network regions exhibited elevated activity relative to the mean of the time series for each ROI (Z = 2.03, p = 0.04). Consistent with the voxelwise analyses above, the association between the extent of cingulo-opercular network activation and subsequent word recognition was also significant for trials that followed only correct responses (Z = 2.61, p = 0.005). Together, the voxel-level and connectivity results demonstrate that the magnitude and coherence of elevated cingulo-opercular activity facilitated subsequent word recognition, independent of error and task difficulty (SNR condition).
Discussion
A strikingly consistent pattern of functional imaging results from speech-recognition studies is elevated cingulo-opercular activity, particularly when speech recognition is difficult (Sharp et al., 2006; Eckert et al., 2009; Harris et al., 2009; Adank, 2012; Wild et al., 2012; Erb et al., 2013). Our findings demonstrate that cingulo-opercular engagement increases the likelihood of correct word recognition on the next trial, which is consistent with similar findings in visuospatial studies (Kerns et al., 2004; Weissman et al., 2006). Moreover, using a trial-level connectivity analysis, we observed that correct word recognition increased when more regions within the cingulo-opercular network exhibited elevated activity on the previous trial. Thus, our findings are consistent with the premise that the cingulo-opercular network is important for adaptive control, including during word recognition.
Attention findings in language mapping studies
Speech intelligibility manipulations have traditionally been used to test predictions about the organization of language function (Scott et al., 2000; Davis and Johnsrude, 2003; Okada et al., 2010). Elevated activity in response to speech-recognition tasks is often observed within the cingulate, anterior insulae, and frontal opercula, which has been interpreted as speech-simulation or premotor processes (Hervais-Adelman et al., 2012), decision-making (Binder et al., 2004; Dehaene-Lambertz et al., 2005), semantic processing (Friederici et al., 2003), top-down selection processes (Rodd et al., 2005; but see Davis et al., 2011), conflict-related processing (Burton et al., 2000; de Zubicaray et al., 2001), and attention allocation or redirection (Hagoort, 2005). In the context of our results and the extant attention literature, elevated cingulo-opercular activity during language tasks may support all of these functions as a domain-general adaptive control network.
One prediction from the adaptive control theoretical framework is that the cingulo-opercular network would exhibit an inverted-U pattern of activity that is related to task difficulty (“convex responses” in the study by Poldrack et al., 2001). For example, Wild et al. (2012) demonstrated that the magnitude of activity in frontal regions during speech recognition depended on attention to the stimuli and was most pronounced when intelligibility was degraded with noise vocoding. Similarly, Zekveld et al. (2006) presented sentences in varying SNRs and observed a nonlinear response in frontal cortex with relatively reduced activity in the lowest and highest SNR conditions. Therefore, the engagement of the cingulo-opercular network during word recognition appears to depend on whether participants can perform the task and require attentional support.
Performance monitoring and adaptive control
A large body of electrophysiology and functional imaging literature demonstrates elevated frontal activity in difficult task conditions (Tregellas et al., 2006; Klein et al., 2007), negative behavioral outcomes or loss of reward (Bush et al., 2002; Nieuwenhuis et al., 2004), and response selection uncertainty (Ullsperger and von Cramon, 2004), which occurs regardless of the sensory stimuli used or type of behavioral response (Duncan and Owen, 2000; Dosenbach et al., 2006; Eckert et al., 2009). These findings are consistent with evidence from animal studies that medial frontal and lateral frontal cortices encode utility to guide adaptive changes in goal-oriented behavior (Rushworth and Behrens, 2008). This premise is also supported by human fMRI evidence that error-related activity is followed by behavioral adjustments such as post-error slowing (Carter et al., 2000; Kerns et al., 2004; Klein et al., 2007; Danielmeier et al., 2011), which has been interpreted as a mechanism to guide performance on the next trial, in part through increased response caution (Danielmeier and Ullsperger, 2011). Importantly, the effects observed in our word-recognition experiment are consistent with findings from visuospatial studies using traditional attention paradigms that implicate cingulo-opercular activity in adaptive control to enhance performance.
To control for a potential trial-order confound (i.e., hard words that elicit increased activity followed by easy words) and SNR effects, we demonstrated that cingulo-opercular activity predicted performance on the next trial (1) when analyses were restricted to trials that followed correct responses and (2) when activity on each trial was normalized within SNR condition. Participants did not receive feedback and were not asked to report their expected performance on each trial. This is important because participants were likely to be uncertain or to have varied confidence in their performance across trials, including when they responded correctly. Uncertainty has been associated with increased cingulo-opercular activity (Grinband et al., 2006) and could therefore explain variation in activity after controlling for performance and SNR.
Medial frontal cortex has been a focus of studies on error, reward, and uncertainty (Dehaene et al., 1994; Carter et al., 2000; Bush et al., 2002; Fiehler et al., 2004), in part because of strong electrophysiological evidence linking performance-related responses to medial frontal cortex (Cohen et al., 2008). Adaptive control results in the current word-recognition experiment and previous visuospatial studies (Kerns et al., 2004; Weissman et al., 2006; Eichele et al., 2008) support the premise that additional frontal regions contribute to adaptive control (Dosenbach et al., 2007). For example, response inhibition in a stop-signal task was impaired by damage to the right inferior frontal gyrus (Aron et al., 2003). Our voxel-based results include multiple frontal regions in which the magnitude of activity was significantly predictive of subsequent word recognition. Furthermore, the trial-level connectivity results showed that word recognition improved after trials in which more cingulo-opercular regions exhibited elevated activity. Although it is possible that medial frontal cortex drives activity throughout the cingulo-opercular network (Ridderinkhof et al., 2004), our results demonstrate that engagement of the entire cingulo-opercular network was associated with optimal word recognition.
Cingulo-opercular network mechanisms for word-recognition benefit
The results of previous studies suggest that the cingulo-opercular network can modulate activity in task-relevant cortex, while suppressing distracting intrinsic or extrinsic information processing. One proposed explanation for the positive effects of cingulo-opercular activity on subsequent word recognition is redirection of attention to a task after a lapse of attention (Weissman et al., 2006; Eichele et al., 2008). This is consistent with the hypothesis that cingulo-opercular network activity is critical for monitoring performance throughout a task (Dosenbach et al., 2006), perhaps in part through suppression of default mode network (DMN) activity (Eichele et al., 2008; Kelly et al., 2008; Sadaghiani et al., 2009). Weissman et al. (2006) and Eichele et al. (2008) observed that the DMN was deactivated to a lesser extent before errors when people made visual congruency judgments. Trial-level DMN activity was not predictive of word recognition in our study, which could reflect the selection of SNR conditions to prevent floor effects. Attention may have been more likely to drift in the study by Weissman et al. (2006), for example, which included a task that appears to have been relatively easier (96.7%) compared with our word-recognition task (+3 dB SNR: 66.1%; +10 dB SNR: 90.9% correct). Word-recognition studies with floor, ceiling, or perhaps fatigue effects may be more likely to demonstrate a relationship between DMN activity and word recognition in healthy young adults.
A related explanation for positive cingulo-opercular network influences on word recognition is the enhancement of auditory cortex sensitivity to target speech stimuli, while suppressing responses to irrelevant information (Weissman et al., 2005; Sadaghiani et al., 2009; Danielmeier et al., 2011), perhaps through interactions with a frontal–parietal attention network (Menon et al., 2001; King et al., 2010). For example, Danielmeier et al. (2011) observed that error-related medial frontal activity was associated with elevated activity within trial-relevant color-object-sensitive cortex and reduced post-error activity within trial-irrelevant motion-sensitive cortex. Although we were limited in the ability to measure cingulo-opercular network and auditory cortex interactions from one trial to the next because of our sparse sampling design (8.6 s TR), as well as the effect of continuous multi-talker babble on auditory cortex activity, increased functional connectivity between medial frontal cortex and auditory cortex has been observed during speech recognition (Obleser et al., 2007). The elevated auditory cortex response throughout the continuous multi-talker babble (Table 1) may have also limited univariate sensitivity to increased superior temporal activity for more intelligible speech (Scott et al., 2000; Obleser et al., 2008; Harris et al., 2009; Kuchinsky et al., 2012). We predict that trial-to-trial changes in cingulo-opercular network engagement increase auditory cortex sensitivity to target speech stimuli, thereby increasing the likelihood of separating and detecting words from noise.
Conclusion
Trial-by-trial increases in the magnitude and extent of cingulo-opercular activity were associated with an increased likelihood of correct word recognition on the next trial, even after correcting for variability attributable to SNR condition and error. These distinct activity patterns related to SNR, error, and outcome on a subsequent trial are consistent with electrophysiological evidence from rodents for a functional mosaic of medial prefrontal cortex neurons that differentially respond to behavioral choice, outcome, and prospective outcome (Horst and Laubach, 2012). Although cingulo-opercular activity related to error may contribute to post-error adjustments (Weissman et al., 2006; Eichele et al., 2008), our findings suggest that there is a unique pattern of cingulo-opercular activity that provides performance benefit for trials that occur at least 8.6 s later in time.
In summary, engagement of the cingulo-opercular network was not necessary for word recognition in difficult listening conditions but was associated with optimal performance across participants. Our results indicate that the amplitude and the extent of cingulo-opercular network-wide activity can be used to predict when someone is likely to experience speech-recognition difficulty. Thus, cingulo-opercular activity has broad significance for speech recognition in challenging conditions and may partially account for why and when people experience speech-recognition impairments.
Footnotes
This work was supported in part by National Institutes of Health (NIH)/National Institute on Deafness and Other Communication Disorders Grant P50 DC 00422, Medical University of South Carolina Center for Advanced Imaging Research, South Carolina Clinical and Translational Research Institute, NIH/National Center for Research Resources Grant UL1 RR029882. This investigation was conducted in a facility constructed with support from Research Facilities Improvement Program Grant C06 RR14516 from the National Center for Research Resources, NIH. We thank the study participants for assistance with data collection.
The authors declare no competing financial interests.
Correspondence should be addressed to Mark A. Eckert or Kenneth I. Vaden Jr., Medical University of South Carolina, Department of Otolaryngology–Head and Neck Surgery, Hearing Research Program, 135 Rutledge Avenue, MSC 550, Charleston, SC 29425-5500. eckert@musc.edu or vaden@musc.edu