Abstract
Working memory enables the temporary storage of relevant information in the service of behavior. Neuroimaging studies have suggested that sensory cortex is involved in maintaining contents in working memory. This raised the question of how sensory regions maintain memory representations during exposure to distracting stimuli. Multivariate pattern analysis of fMRI signals in visual cortex has shown that the contents of visual working memory could be decoded concurrently with passively viewed distractors. The present fMRI study tested whether this finding extends to auditory working memory and to active distractor processing. We asked participants to memorize the pitch of a target sound and to compare it with a probe sound presented after a 13 s delay period. In separate conditions, we compared a blank delay phase (no distraction) with either passive listening to, or active processing of, an auditory distractor presented throughout the memory delay. Consistent with previous reports, pitch-specific memory information could be decoded in auditory cortex during the delay in trials without distraction. In contrast, decoding of target sounds in early auditory cortex dropped to chance level during both passive and active distraction. This was paralleled by memory performance decrements under distraction. Extending the analyses beyond sensory cortex yielded some evidence for memory content-specific activity in inferior frontal and superior parietal cortex during active distraction. In summary, while our findings question the involvement of early auditory cortex in the maintenance of distractor-resistant working memory contents, further research should elucidate the role of hierarchically higher regions.
SIGNIFICANCE STATEMENT Information about sensory features held in working memory can be read out from hemodynamic activity recorded in human sensory cortices. Moreover, visual cortex can in parallel store visual content and process newly incoming, task-irrelevant visual input. The present study investigated the role of auditory cortex for working memory maintenance under distraction. While memorized sound frequencies could be decoded in auditory cortex in the absence of distraction, auditory distraction during the delay phase impaired memory performance and prevented decoding of information stored in working memory. Apparently, early auditory cortex is not sufficient to represent working memory contents under distraction that impairs performance. However, exploratory analyses indicated that, under distraction, higher-order frontal and parietal regions might contribute to content-specific working memory storage.
Introduction
Working memory enables the temporary storage of a limited amount of information to guide current thought and action. Studies using noninvasive neuroimaging methods, such as fMRI in humans, have identified brain regions subserving the storage of task-relevant information in working memory. Previous fMRI studies have consistently observed sustained activity increases in prefrontal and parietal cortex during the delay phases of working memory tasks (Curtis and Sprague, 2021). Multivariate analysis methods (Kriegeskorte et al., 2006; Haynes, 2015) revealed that activity in sensory cortex codes content-specific memory signals even in the absence of sustained elevated activity (for overview, see Christophel et al., 2017). Specifically, a multitude of low-level visual features could be decoded from delay-phase activity in visual cortex (e.g., Harrison and Tong, 2009; Serences et al., 2009; Christophel et al., 2012; Emrich et al., 2013; Peters et al., 2015). Likewise, memory information about auditory features could be decoded from the auditory cortex (Linke et al., 2011; Kumar et al., 2016; Uluç et al., 2018; Czoschke et al., 2021). Moreover, content-specific memory signals were also observed in higher-order parietal and frontal brain regions (e.g., Christophel et al., 2012; Ester et al., 2015; Peters et al., 2015; Czoschke et al., 2021).
Considering our constant exposure to sensory input, the question arose as to which brain regions store task-relevant contents during distracting stimulation. For instance, Bettencourt and Xu (2016) could decode visual orientation-specific memory information only in the intraparietal sulcus (IPS) during passive processing of different task-irrelevant distractors, suggesting a crucial role of IPS in working memory storage. In contrast, Rademaker et al. (2019) reconstructed orientations maintained under task-irrelevant distraction from activity in visual cortex in addition to parietal regions. Apparently, visual cortex can store task-relevant contents and process task-irrelevant visual information concurrently. While these studies used passive distraction, Hallenbeck et al. (2021) investigated the effects of a distracting visual motion discrimination task on the decoding of spatial locations maintained in working memory. The reconstruction fidelity of memorized positions in visual and parietal cortex was substantially reduced during active distractor processing but recovered after the offset of the distractor. These findings corresponded well to data by Kiyonaga et al. (2017), who found that the decoding of memorized visual categories dropped to baseline under a difficult distractor condition.
Notably, regardless of the effects of distraction on decoding performance, subjects were still able to remember the target feature above chance in all of these neuroimaging studies. However, distraction often reduced memory performance, particularly when the distractors were processed actively. This is in line with behavioral studies showing, for example, that letter categorization during a delay phase impaired memory for orientations (Bae and Luck, 2019). Similarly, Peters et al. (2018, 2019) found that the recall precision for Gabor stimuli dropped steeply after an active distractor task, whereas it was reduced only slightly after a passive one. These findings indicate that active distractor processing during a delay period affects memory performance considerably, regardless of whether or not distractors and memory targets share the same perceptual features.
Together, evidence from neuroimaging studies has shown that mnemonic information is decodable from early visual cortex even when task-irrelevant distractors are presented concurrently. However, it remains unclear whether these findings extend to other sensory modalities. Thus, the first aim of our study was to test whether sound features held in working memory can be decoded from activity in the auditory cortex during the presentation of distracting auditory input. Second, we compared the effects of passive versus active distractor processing on the decodability of auditory memory information in auditory cortex. Finally, we also investigated whether hierarchically higher auditory-related brain regions, including the temporal pole, superior parietal and inferior frontal areas, code distractor-resistant auditory features during the working memory delay.
Materials and Methods
Participants
Fifty-seven adults (33 female, 24 male, 0 diverse; age 18-31 years, mean = 21.64 years, SD = 3 years) participated in a behavioral session, which had two purposes: (1) to test whether the participants' performance in the auditory working memory task met the inclusion criterion for the fMRI experiment; and (2) to practice the working memory task before fMRI scanning. We defined an inclusion criterion to ensure that only subjects who were able to perform clearly above chance in the most difficult condition of our auditory working memory experiment (i.e., the active distraction condition) participated in the five sessions of the fMRI experiment (∼8 h in total). The inclusion criterion was based on the following simulation: we generated a probability distribution of accuracies based on 1 million iterations for a hypothetical subject who does not memorize any information (i.e., who is guessing). In ∼10% of the iterations, the hypothetical subject randomly achieved an accuracy of ≥68.75%. As we considered an α error of 10% as acceptable, we set the inclusion criterion at 68.75% correct responses in those trials of the active distraction condition in which memory target item and probe were maximally dissimilar to each other. We selected the easiest trials within the active distraction condition because we wanted to make sure that subjects were able to remember a minimum amount of information about the auditory feature during active distraction. Twenty-nine participants passed this inclusion criterion. Six of these participants dropped out before the fMRI sessions. During the fMRI sessions, 4 other individuals discontinued their participation. After data acquisition, we had to exclude 3 further participants because of technical problems.
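The guessing simulation above can be reproduced in a few lines. This is an illustrative sketch, not the authors' original code; in particular, the count of 16 maximally dissimilar trials is an assumption, chosen because it recovers the reported ~10% α error (68.75% correct = 11/16).

```python
# Illustrative sketch of the inclusion criterion simulation (not the authors'
# original code). The count of 16 maximally dissimilar trials is an assumption
# that reproduces the reported ~10% alpha error (68.75% correct = 11/16).
import math
import random

N_TRIALS = 16        # assumed number of maximally dissimilar trials
CRITERION = 0.6875   # inclusion criterion: 68.75% correct

k_min = math.ceil(CRITERION * N_TRIALS)  # 11 hits needed

# Exact binomial probability that a guessing subject reaches the criterion
p_exact = sum(math.comb(N_TRIALS, k) for k in range(k_min, N_TRIALS + 1)) / 2 ** N_TRIALS

# Monte Carlo analog of the 1-million-iteration simulation (fewer iterations here)
random.seed(0)
n_iter = 100_000
hits = sum(
    sum(random.random() < 0.5 for _ in range(N_TRIALS)) >= k_min
    for _ in range(n_iter)
)
p_sim = hits / n_iter

print(round(p_exact, 3))  # 0.105, i.e., an alpha error of roughly 10%
```

Under this assumed trial count, the exact binomial calculation and the simulation agree with the reported ~10% chance rate.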
The remaining 16 healthy volunteers (11 female, 5 male, 0 diverse; age 18-28 years, mean = 21.19 years, SD = 3.08 years) took part in the fMRI sessions: 15 subjects completed 5 sessions, 1 subject completed 4 sessions. Each fMRI session included one run of an auditory perception experiment and six runs of an auditory working memory experiment. All participants provided their written informed consent and received a remuneration of €10 per hour, except for 2 participants (a coauthor and a research student who had helped with data acquisition). The ethics committee of the Goethe University medical faculty approved the study.
Stimuli and procedure
The stimuli were complex sounds consisting of three harmonics with varying spectral frequencies. This was true for the stimuli in the perception experiment and for both target and distractor sounds in the working memory experiment. Specifically, the stimuli were composed of a combination of three band-passed noises (a fundamental frequency and two harmonics, each bandpass filtered to a bandwidth of 1/10 octave). The fundamental frequency was chosen from 4 stimulus bins with two bins from the small octave (Cis and A) and two bins from the two-line octave (Cis′′ and A′′). The centers of the stimulus bins were 138.591, 220, 554.365, and 880 Hz, respectively. To prevent encoding into long-term memory, the precise frequency of the stimuli varied slightly around the central frequency of the 4 stimulus bins. Specifically, the frequency randomly ranged from a semitone below to a semitone above the bin center in 5 equidistant, logarithmic steps. For example, the exact stimulus frequencies of the Cis bin were as follows: 130.81, 134.65, 138.59, 142.65, and 146.83 Hz. Thus, 20 (4 stimulus bins × 5 steps) unique frequencies characterized the stimuli in the perception experiment, as well as the target and distractor sounds in the working memory experiment.
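The frequency grid described above can be sketched as follows (an illustration of the described construction, not the original MATLAB stimulus code): the five steps span one semitone below to one semitone above each bin center in half-semitone increments on a logarithmic scale.

```python
# Sketch of the stimulus frequency grid (an illustration of the construction
# described in the text, not the original MATLAB code). Each bin spans one
# semitone below to one semitone above its center in half-semitone steps
# (a semitone = a factor of 2**(1/12)).
BIN_CENTERS = [138.591, 220.0, 554.365, 880.0]  # Cis, A, Cis'', A'' (Hz)

frequencies = {
    center: [center * 2 ** (s / 24) for s in range(-2, 3)]  # log-equidistant steps
    for center in BIN_CENTERS
}

# The Cis bin reproduces the example frequencies listed in the text:
print([round(f, 2) for f in frequencies[138.591]])
# [130.81, 134.65, 138.59, 142.65, 146.83]
```

This yields the 20 (4 bins × 5 steps) unique stimulus frequencies used in both experiments.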
All auditory stimuli were processed with an external soundcard (Fireface UC, 192 kHz sampling rate, RME) and presented at a comfortable intensity via MRI-compatible, active noise cancellation headphones (OptoActive, Optoacoustics). Stimulus construction and timing were controlled with MATLAB R2012b (The MathWorks) and the Psychophysics Toolbox (Brainard, 1997). Visual instructions were projected onto a screen outside the scanner. The participants could see the screen through a tilted mirror in a dimly lit room. Participants were instructed to maintain central fixation throughout the experimental runs.
Perception experiment
In each trial, participants had to detect volume reductions within a sequence of pulsating sounds with a fixed frequency composition (see Stimuli and procedure) for 6 s. Each stimulus consisted of 10 pulses of 0.3 s duration with increasing and decreasing volume during the first and last 50 ms, followed by a 0.3 s silent period. The volume level of a standard pulse in each trial was constant, but 1, 2, or 3 target pulses could be reduced in volume. The number and temporal position of target pulses were randomly selected. To ensure attentive stimulus engagement, participants had to respond to volume changes by pressing a button. A response was considered correct if the participant responded within 1 s after the onset of a target pulse. After a 0.5 s delay, performance feedback was provided for 0.5 s in the form of up to three horizontally arranged dots. The order and the color of the dots informed the subjects about their performance on each of the volume reductions: the leftmost dot referred to the first volume change, its neighbor to the second of up to three volume changes, and so on. Green and red dots symbolized hits and misses, respectively. False alarm and correct rejection performance was not fed back to the participants. Trials were separated by an intertrial interval of 2, 4, or 7 s.
We chose a Bayesian psychometric staircase procedure (Quest) to adapt the difficulty of detecting the reduced volume level (Watson and Pelli, 1983). For every subject, Quest estimated the maximum-likelihood volume reduction threshold corresponding to a specified performance criterion. We set the target hit rate at 75% to keep the effort constant between participants and to obtain a challenging but motivating task. The SD of the prior threshold distribution was set at 0.5, the steepness of the Weibull function (β) was 3.5, the false alarm rate (γ) was set at 0, and the lapse rate (δ) at 0.05. The Quest algorithm takes the response pattern of all previous trials into account and updates the volume reduction threshold on a trial-by-trial basis. The staircase procedure was initialized at the first fMRI session and ran continuously for every participant. The average hit rate across all trials was 76.82% (SD = 4.24%), which did not differ significantly from the targeted 75% (t(15) = 1.72, p = 0.11).
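The logic of such a Bayesian staircase can be illustrated with a minimal sketch. This is a simplification of the idea behind Quest, not the Psychophysics Toolbox implementation; the threshold grid, the placement-at-posterior-mean rule, and the simulated true threshold of 0.3 are assumptions made purely for the demonstration.

```python
# Minimal sketch of the idea behind Quest (Watson and Pelli, 1983); this is an
# illustration, NOT the Psychophysics Toolbox implementation. Intensities are
# in arbitrary log units; the psychometric parameters follow the text
# (beta = 3.5, gamma = 0, delta = 0.05). The grid, the placement rule, and the
# simulated threshold of 0.3 are assumptions for the demonstration.
import math
import random

def p_correct(x, threshold, beta=3.5, gamma=0.0, delta=0.05):
    """Weibull probability of a hit at intensity x given a candidate threshold."""
    p_weibull = 1 - (1 - gamma) * math.exp(-(10 ** (beta * (x - threshold))))
    return delta * gamma + (1 - delta) * p_weibull

# Gaussian prior over candidate thresholds with SD = 0.5
grid = [i / 100 for i in range(-200, 201)]
posterior = [math.exp(-0.5 * (t / 0.5) ** 2) for t in grid]

random.seed(1)
true_threshold = 0.3  # hypothetical subject
for _ in range(200):
    # place the next trial at the current posterior mean
    total = sum(posterior)
    x = sum(t * w for t, w in zip(grid, posterior)) / total
    hit = random.random() < p_correct(x, true_threshold)  # simulated response
    # trial-by-trial Bayesian update, as described in the text
    posterior = [
        w * (p_correct(x, t) if hit else 1 - p_correct(x, t))
        for t, w in zip(grid, posterior)
    ]

total = sum(posterior)
estimate = sum(t * w for t, w in zip(grid, posterior)) / total  # near 0.3
```

Because every candidate threshold is reweighted by the likelihood of each observed response, the posterior concentrates near the subject's true threshold as trials accumulate.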
Every fMRI session included one run of the perception experiment with 60 trials. We counterbalanced and randomized the stimulus bins, the stimulus steps, and the number of volume reductions within runs. This procedure resulted in 60 unique trials, each occurring once per run.
Working memory experiment
Participants had to memorize the pitch of a single 0.3 s target sound in anticipation of an upcoming two-alternative forced-choice task (Fig. 1a). After a delay period of 12.8 s, a probe sound was presented for 0.3 s. The fundamental frequency of the probe was manipulated in four steps of ±0.007, ±0.015, ±0.025, or ±0.05 log10(Hz) relative to the frequency of the to-be-probed target sound. The sign and the exact difference between target and probe sound were chosen randomly. Participants had to indicate whether the frequency of the probe sound was higher or lower than that of the target sound by rotating a trackball to the right or to the left, respectively. The duration of the response period was 2.9 s. Afterward, participants received visual feedback for 0.5 s via a colored dot located to the right of the fixation circle. Green and red dots indicated correct and incorrect responses, respectively. Trials were separated by an intertrial interval of 2, 4, or 7 s.
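The probe construction can be sketched as follows (illustrative; the helper `make_probe` is hypothetical and not part of the original code): the probe is offset from the target in log10(Hz) space by one of the four step magnitudes with random sign.

```python
# Sketch of the probe construction: the probe frequency is offset from the
# target frequency in log10(Hz) space by one of four magnitudes with random
# sign. `make_probe` is a hypothetical helper, not part of the original code.
import math
import random

STEPS = [0.007, 0.015, 0.025, 0.05]  # log10(Hz) step sizes from the text

def make_probe(target_hz, rng=random):
    step = rng.choice(STEPS) * rng.choice([-1, 1])
    return 10 ** (math.log10(target_hz) + step)

# e.g., the largest upward step applied to a 220 Hz target:
print(round(10 ** (math.log10(220) + 0.05), 2))  # 246.84
```

Working in log-frequency space makes the step sizes perceptually comparable across the low and high octaves.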
Crucially, we manipulated whether distractors were presented during the delay period or not. When distractor sounds were presented, participants either could ignore them (passive distraction) or had to detect possible variations in their volume level (active distraction). The no distraction condition was characterized by a blank delay period without distractor sounds. In the passive distraction condition, participants were instructed to ignore the pulsating distractor sound presented during the whole delay period (for a similar approach, see Deutsch, 1970). The distractor sound was similar to the stimulus in the perception experiment described above; that is, a pulsating sound sequence with a fixed frequency composition was presented for 10.8 s starting 1 s after the target presentation. The fundamental frequencies of the distractor and target sounds were drawn from the stimulus pool described above (see Stimuli and procedure) and combined in an orthogonal way. Every combination of target and distractor frequency bin was presented once during each run, whereas across the whole experiment any combination of the exact target and distractor frequencies described above could occur. In the active distraction condition, physically identical pulsating distractor sounds were presented as in the passive condition; however, participants were instructed to detect pulses with a reduced volume level. To ensure that participants actively engaged with the distractor throughout this period, they had to respond to 1, 2, or 3 possible volume reductions via button press. Moreover, we implemented the same Quest procedure with the identical initial parameters as in the perception experiment. The staircase procedure of the active distraction task was initialized at the first run and ran continuously across all fMRI sessions. The average hit rate across all trials (mean = 76.9%, SD = 4.12%) was slightly higher than the targeted hit rate of 75% (t(15) = 1.85, p = 0.084).
The small SD indicated constant task difficulty between participants. For this active distraction task, participants received additional feedback analogously to the feedback provided during the perception task. It consisted of up to three dots, arranged horizontally to the left of the fixation circle, with green and red dots indicating detected and missed volume changes, respectively.
In every fMRI session, participants completed six experimental runs of the working memory experiment comprising 16 trials per run. As all trials within a run were from the same condition, participants completed two runs each of the no distraction, passive distraction, and active distraction conditions per fMRI session. The serial position of the conditions was counterbalanced such that each condition had to occur once during the first three and last three runs of the session. The stimulus bins of the target sound and the distractor sound were counterbalanced within runs. Therefore, each of the resulting 16 target-distractor combinations occurred once per run. The stimulus steps and the ITI durations were counterbalanced and randomized within a session.
Data acquisition
We recorded structural and functional MRI data using a 3-Tesla Magnetom Prisma MR scanner (Siemens) with a 64-channel head coil, located at the Brain Imaging Center Frankfurt. We collected functional scans during each run of the perception and working memory experiments and one structural image per session. Functional data were whole-brain EPI (72 slices, 2 × 2 × 2 mm resolution, 0 mm gap, FOV = 20.8 cm, TR = 1 s, TE = 35 ms, flip angle = 56°). Furthermore, we recorded 5 EPI volumes before each experimental run with the same parameters but reversed phase encoding direction to correct for distortions. Structural data were high-resolution T1-weighted images (176 sagittal slices, 1 × 1 × 1 mm resolution, FOV = 25.6 × 25.6 cm, TR = 2.8 s, TE = 2.12 ms).
MRI preprocessing
All imaging data were preprocessed using FMRIB Software Library (FSL) (https://fsl.fmrib.ox.ac.uk) and SPM12 (https://www.fil.ion.ucl.ac.uk/spm/software/spm12). Functional images within a run were motion-corrected via FSL MCFLIRT with a final sinc interpolation stage. We then applied FSL's topup procedure to correct for distortions using the reversed phase-encoding volumes collected just before each functional run. Next, realignment was performed in two steps using SPM. First, we aligned all functional data with the first volume of the functional run that directly followed the structural scan. Second, all runs were realigned onto a functional mean image. Finally, the anatomic image was coregistered to the functional mean image.
Anatomical ROIs
Based on our recent work (Czoschke et al., 2021) and previous literature (Linke et al., 2011; Kumar et al., 2016; Uluç et al., 2018), we defined the auditory cortex as our main anatomic ROI comprising Heschl's gyrus, planum temporale, and the posterior division of the superior temporal gyrus. To investigate the auditory cortex ROI in greater detail, we also defined Heschl's gyrus, planum temporale, and the posterior division of the superior temporal gyrus as separate ROIs. In an exploratory analysis, we also tested the possibility that distractor-resistant information was stored in higher-order regions known to belong to a putative ventral processing stream including a temporal pole (TP) ROI (Arnott et al., 2005) and an inferior frontal gyrus (IFG) ROI (Czoschke et al., 2021). The TP ROI comprised the anterior divisions of the superior temporal and middle temporal gyrus and the TP. The IFG ROI included the pars triangularis and pars opercularis of the IFG. We also included the superior parietal lobule (SPL) ROI that includes IPS, as this region was suggested to be involved in working memory processing, including auditory working memory (Czoschke et al., 2021). It has been reported repeatedly to represent distractor-resistant visual working memory information (Bettencourt and Xu, 2016; Lorenc et al., 2018; Rademaker et al., 2019).
Anatomical probability maps were taken from the FSL Harvard-Oxford cortical and subcortical structural atlases (Desikan et al., 2006). We transformed anatomic ROIs from standard into native space using SPM segmentation and normalization functions. Specifically, segmentation was applied to generate a forward transformation matrix containing information to warp anatomic images from native space to standard space. Using the inverse of the matrix, we transformed ROIs from standard to native space.
BOLD signal time course
For data visualization, Figure 1c shows blood-oxygen-level-dependent (BOLD) signal time courses time-locked to the onset of stimulation, both for the perception and the working memory experiment, in the auditory cortex ROI. The BOLD signal was calculated as the percent signal change relative to the average of the two TRs preceding each trial. For every participant, we selected the number of voxels determined by the nested cross-validation procedure applied to the classifiers trained on perception data (see Multivariate analyses) and averaged the BOLD signal time courses across the selected voxels from Trials 2-16 of each run, separately for each distraction condition.
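The percent-signal-change computation can be sketched like this (an illustrative helper, not the original code; a 1 s TR with trial onsets given in TR units is assumed):

```python
# Sketch of the percent-signal-change computation (illustrative helper, not the
# original code): each trial's time course is expressed relative to the mean of
# the two TRs preceding trial onset (TR = 1 s; onsets given in TR units).
import numpy as np

def percent_signal_change(run_ts, onsets, n_tr=21):
    """run_ts: 1D BOLD time course; onsets: trial onset indices (in TRs)."""
    trials = []
    for t in onsets:
        baseline = run_ts[t - 2:t].mean()  # average of the two preceding TRs
        trials.append((run_ts[t:t + n_tr] - baseline) / baseline * 100)
    return np.stack(trials)
```

The resulting trial-wise time courses can then be averaged across the selected voxels and trials per condition.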
Univariate analyses: voxel selection
Voxels responsive to auditory stimulation were selected with a univariate approach in SPM12. First, we set up a GLM including all runs of the perception experiment. The GLM included one regressor for the stimulation period of all trials, six motion regressors per run, and constant regressors to model the run means. Next, the stimulation period regressor was convolved with a canonical HRF. Then, we contrasted the stimulation period regressor with the implicit baseline and multiplied the resulting whole-brain t maps with each binary ROI map to obtain separate t maps for every ROI and participant. After that, specific voxel sets were selected based on the highest positive t values from the ROI- and participant-specific t maps, reflecting the n most auditory-responsive voxels. Based on this procedure, we created 30 sets of n voxels for every ROI and participant, with n varying from 100 to a maximum of 3000 in steps of 100 voxels. The maximum number of voxels in an ROI was determined by the participant with the lowest number of auditory-responsive voxels in that ROI. Finally, multivariate analyses were computed for every voxel set separately before entering a nested cross-validation scheme (see Multivariate analyses).
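The voxel-selection step can be sketched as follows (illustrative; `nested_voxel_sets` is a hypothetical helper operating on flattened t maps): voxels within an ROI are ranked by their stimulation-vs-baseline t value, and nested sets of the n most responsive voxels are built.

```python
# Sketch of the voxel-selection step (illustrative; `nested_voxel_sets` is a
# hypothetical helper): voxels within an ROI are ranked by their t value, and
# nested sets of the n most auditory-responsive voxels are built
# (n = 100, 200, ..., up to the ROI-specific maximum).
import numpy as np

def nested_voxel_sets(t_map, roi_mask, step=100, n_max=3000):
    roi_idx = np.flatnonzero(roi_mask)
    ranked = roi_idx[np.argsort(t_map[roi_idx])[::-1]]  # highest t values first
    n_max = min(n_max, ranked.size)
    return [ranked[:n] for n in range(step, n_max + 1, step)]
```

Because the ranking is fixed per ROI and participant, the smaller voxel sets are strict subsets of the larger ones.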
Multivariate analyses
Multivariate analyses were performed on trial-specific t maps derived from the functional data recorded during runs of the perception and working memory experiments. To create these maps, we set up finite impulse response (FIR) models for every run and participant, including a 1/128 Hz high-pass filter and a first-order autoregressive error structure, using SPM12 (https://www.fil.ion.ucl.ac.uk/spm/software/spm12). All models included six motion regressors and a constant regressor to model the run mean. For the perception experiment, we computed t maps in two steps. First, we set up first-level FIR models with nine regressors per trial to model the 9 s from stimulation onset. BOLD signals in each voxel were then fitted with this set of FIR regressors separately for every run. Then, in a second step, we computed trial-specific t maps (i.e., perception t maps) by contrasting trial-specific regressors of FIR bins from 3 to 8 s after stimulation onset against the implicit baseline (i.e., the t maps contained trial-specific data averaged across the stimulation-related TRs). We deliberately chose this time window to account for the hemodynamic delay.
Runs in the working memory experiment were processed in two steps. First, we set up first-level FIR models using 21 regressors to model the signal of 21 time points from target presentation onward. BOLD activity in each voxel was then fitted with this set of FIR regressors separately for every run. Second, trial-specific t maps (i.e., working memory t maps) were computed by contrasting trial-specific regressors of 8-14 s after target presentation against the implicit baseline to cover working memory retention-related activity that was not contaminated by stimulus encoding and retrieval. Again, the t maps contained voxel- and trial-specific data but averaged across the delay-related TRs.
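The FIR design described above can be sketched as a stick-function design matrix (illustrative; SPM12 constructs these matrices internally): one delta regressor per TR and trial, covering 21 time points from target onset at a 1 s TR.

```python
# Sketch of an FIR design for a working memory run (illustrative; SPM12 builds
# these design matrices internally): 21 stick regressors per trial, one per TR
# from target onset (TR = 1 s).
import numpy as np

def fir_design(n_scans, onsets, n_bins=21):
    X = np.zeros((n_scans, len(onsets) * n_bins))
    for i, onset in enumerate(onsets):
        for b in range(n_bins):
            if onset + b < n_scans:
                X[onset + b, i * n_bins + b] = 1.0  # one stick per trial and bin
    return X
```

The trial-specific t maps then contrast the bins covering 8-14 s after target onset against the implicit baseline.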
The t maps from the perception and working memory experiments formed the input for ROI-based multivariate pattern analyses. For this purpose, we grouped trials with target sounds taken from the Cis′′ and A′′ bins (high octave) and trials with target sounds taken from the Cis and A bins (low octave). Then, we implemented support vector machines to classify whether participants memorized a high or low octave stimulus in a given trial. To ask whether working memory representations are stored in a format similar to perceptual coding, we trained classifiers on the independent perception t maps and tested them on the working memory t maps. This procedure followed previous work on the effects of sensory distractors on working memory representations (Rademaker et al., 2019). A somewhat similar approach was taken in a recent auditory working memory study in which classifiers were trained on the probe phase and tested on the maintenance phase (Lim et al., 2022). We repeated the decoding procedure for every participant, ROI, distraction condition, and each of the up to 30 voxel counts (described above, see Univariate analyses: voxel selection). The resulting classification accuracies were then entered into a nested cross-validation procedure to select optimal ROI sizes while avoiding overfitting (Christophel et al., 2018).
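The cross-classification scheme can be sketched with synthetic data. Here, scikit-learn and the simulated voxel patterns are assumptions made for the illustration; the original analyses used the trial-specific t maps as features.

```python
# Sketch of the cross-classification scheme with synthetic data (scikit-learn
# and the simulated voxel patterns are assumptions; the original analysis used
# trial-specific t maps as features): a linear SVM is trained on perception
# trials and tested on memory-delay trials (high vs. low octave).
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n_voxels = 200
octave_pattern = rng.normal(size=n_voxels)  # shared octave-specific pattern

def make_trials(n, noise):
    y = np.repeat([0, 1], n // 2)  # 0 = low octave, 1 = high octave
    X = np.outer(2 * y - 1, octave_pattern) + rng.normal(scale=noise, size=(n, n_voxels))
    return X, y

X_perc, y_perc = make_trials(120, noise=4.0)  # perception runs (training set)
X_mem, y_mem = make_trials(96, noise=6.0)     # memory delay (test set)

accuracy = LinearSVC().fit(X_perc, y_perc).score(X_mem, y_mem)
```

Above-chance test accuracy in this scheme indicates that the memory-delay patterns share a format with the perceptual response patterns.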
Previous studies have found that content-specific memory information can also be decoded from activity patterns in the visual cortex (e.g., Harrison and Tong, 2009; Serences et al., 2009; Christophel et al., 2012; Emrich et al., 2013; Peters et al., 2015) and in the auditory cortex (Linke et al., 2011; Kumar et al., 2016; Uluç et al., 2018; Czoschke et al., 2021) when the classifiers were not trained on data from a separate perception experiment but trained (and tested) on data from the delay period of the memory experiment itself using a leave-one-out cross-validation approach. While an analysis approach using perception-trained classifiers captures memory information that is similar to the activity measured in the perception task, the analysis approach that uses memory-delay data itself to train a classifier capitalizes on any signal differentiating remembered sounds during the delay (Iamshchinina et al., 2021). Thus, we trained and tested classifiers on separate data from the working memory experiment by using a leave-one-session-out approach (Czoschke et al., 2021). Specifically, the classifiers were trained on working memory t maps from all but one session and tested on working memory t maps from the remaining session. The procedure was iterated until every session had been used for testing. After that, the decoding accuracies of the iterations were averaged. This approach has the advantage that the training and test sets are temporally independent, which reduces the risk of overfitting.
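The leave-one-session-out scheme corresponds to grouped cross-validation, sketched here with synthetic data (scikit-learn assumed; session labels act as the groups, so training and test trials are always from different sessions):

```python
# Sketch of the leave-one-session-out scheme as grouped cross-validation with
# synthetic data (scikit-learn assumed; session labels act as groups, so the
# training and test sets always come from different sessions).
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.svm import LinearSVC

rng = np.random.default_rng(1)
n_voxels = 150
sessions = np.repeat(np.arange(5), 24)   # 5 sessions x 24 trials
y = np.tile([0, 1], sessions.size // 2)  # high vs. low octave labels
pattern = rng.normal(size=n_voxels)
X = np.outer(2 * y - 1, pattern) + rng.normal(scale=5.0, size=(y.size, n_voxels))

# train on 4 sessions, test on the held-out one; iterate and average
scores = cross_val_score(LinearSVC(), X, y, groups=sessions, cv=LeaveOneGroupOut())
mean_accuracy = scores.mean()
```

Holding out whole sessions rather than single trials keeps the folds temporally independent, which is the advantage noted in the text.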
The previous analyses were based on a priori defined ROIs. However, it is possible that distraction-resistant memory information is coded in fMRI activity patterns outside these regions. To this end, we used a whole-brain searchlight analysis (Kriegeskorte et al., 2006; Allefeld and Haynes, 2014; Christophel et al., 2017; Erhart et al., 2021). Specifically, we applied a support vector machine classification procedure, which was trained on perception data and tested on memory data, to a set of voxels within a spherical mask with a radius of 5 voxels (i.e., 515 voxels). The resulting decoding accuracy was then attributed to the central voxel of the sphere, and this procedure was repeated for every voxel in the brain. Decoding accuracy maps for each participant and distraction condition were then normalized using unified segmentation and smoothed with a Gaussian kernel of 4 mm FWHM. Finally, group-level t maps for each condition were calculated using SPM's built-in procedure on the decoding accuracy maps after subtraction of the chance level (i.e., 50%).
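The sphere size quoted above can be verified directly: the integer voxel offsets within a radius of 5 voxels of a central voxel number exactly 515.

```python
# Sanity check of the searchlight geometry: the integer voxel offsets within a
# radius of 5 voxels around a central voxel number exactly 515.
offsets = [
    (x, y, z)
    for x in range(-5, 6)
    for y in range(-5, 6)
    for z in range(-5, 6)
    if x * x + y * y + z * z <= 25
]
print(len(offsets))  # 515
```

In a searchlight analysis, this offset list is shifted to every brain voxel to extract the local feature set for classification.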
Statistical analyses
For the analysis of the behavioral data, trials without a valid response were removed, accounting for 0.35% of all trials (no distraction: 0.47%; passive distraction: 0.47%; active distraction: 0.12%). Then, each participant's mean accuracy was calculated separately for the distraction conditions. To test whether distractor presence influenced behavioral accuracy, we computed a repeated-measures ANOVA with the factor distraction condition. Post hoc paired t tests were calculated for significant ANOVA effects to reveal the underlying differences between the individual conditions.
For all fMRI decoding analyses, we tested for above-chance decoding performance by calculating sign permutation tests for every ROI, distraction condition, and decoding approach: the null distribution of group means (10,000 iterations) was generated by subtracting the chance level (0.5) from the individual accuracies. Then, we randomly inverted the sign of the resulting differences and computed the group means. Finally, we calculated the proportion of simulated group means equal to or larger than the empirical group mean of the accuracies and reported this proportion as the p value. An α level of 0.05 was set with no correction for multiple comparisons across ROIs.
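The sign permutation test can be sketched as follows (an illustrative NumPy implementation of the described procedure):

```python
# Illustrative NumPy implementation of the described sign permutation test:
# chance level is subtracted from the individual accuracies, signs are flipped
# at random, and the p value is the proportion of permuted group means that
# are equal to or larger than the empirical group mean.
import numpy as np

def sign_permutation_p(accuracies, chance=0.5, n_iter=10_000, seed=0):
    rng = np.random.default_rng(seed)
    diffs = np.asarray(accuracies) - chance
    empirical = diffs.mean()
    signs = rng.choice([-1, 1], size=(n_iter, diffs.size))
    null_means = (signs * diffs).mean(axis=1)
    return np.mean(null_means >= empirical)
```

For example, a group of 16 subjects all decoding at 60% yields a very small p value, whereas a group at exactly chance yields p = 1.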
In addition to the permutation analyses, we computed one-tailed Bayesian t tests (Rouder et al., 2009) against the chance decoding level of 0.5. We estimated Bayes factors (BF10) for every ROI, distraction condition, and decoding approach. The Bayes factor quantifies how much more likely the data are under the one-tailed informed alternative hypothesis than under the null hypothesis: BF10 > 1 favors the alternative hypothesis, whereas BF10 < 1 favors the null hypothesis. We interpreted BF10 > 3 as evidence for the alternative hypothesis, BF10 < 1/3 as evidence for the null hypothesis, and 1/3 < BF10 < 3 as inconclusive evidence (Kass and Raftery, 1995). All Bayesian t tests were performed in JASP (JASP Team, 2022) with default settings, including a zero-centered Cauchy prior with a width of r = 0.707.
To directly test whether decoding accuracies differed between distraction conditions, we computed repeated-measures ANOVAs with the factor distraction condition. Following up on significant main effects, post hoc paired t tests as well as paired Bayesian t tests were calculated to assess differences between individual conditions. These analyses were performed for every ROI and decoding approach separately.
For the whole-brain searchlight analysis, group-level t maps for each condition were corrected for multiple comparisons using the cluster-size thresholding method (Forman et al., 1995; Goebel et al., 2006) implemented in BrainVoyager. Specifically, for each statistical map, the voxel-level threshold was set at p < 0.001 (uncorrected), and the map was then submitted to a whole-brain correction criterion based on the estimate of its spatial smoothness and on Monte Carlo simulations for estimating cluster-level false-positive rates. After 1000 iterations, the minimum cluster size that yielded a cluster-level false-positive rate of 5% was used to threshold the statistical map, thus resulting in a final map with p < 0.05, corrected for multiple comparisons.
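The final thresholding step can be sketched as follows (a simplified illustration using SciPy; the original analysis used BrainVoyager's implementation, and the Monte Carlo-derived minimum cluster size is taken as given here):

```python
# Simplified sketch of the final thresholding step (SciPy assumed; the
# original analysis used BrainVoyager, and the Monte Carlo-derived minimum
# cluster size is taken as given): suprathreshold voxels are grouped into
# contiguous clusters, and clusters below the minimum size are discarded.
import numpy as np
from scipy import ndimage

def cluster_threshold(t_map, t_crit, min_cluster_size):
    supra = t_map > t_crit                  # voxelwise threshold (p < 0.001)
    labels, n = ndimage.label(supra)        # contiguous clusters
    sizes = ndimage.sum(supra, labels, index=np.arange(1, n + 1))
    keep_labels = np.flatnonzero(sizes >= min_cluster_size) + 1
    return np.where(np.isin(labels, keep_labels), t_map, 0.0)
```

Only clusters large enough to be unlikely under spatially smooth noise survive, which is what yields the corrected map at p < 0.05.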
Results
Behavioral data
During the fMRI scanning sessions, all included participants responded correctly in 77.78% of the working memory trials (SD = 7.47%, range = 65.63%-92.50%). Importantly, every participant performed above chance both in trials without (mean = 83.18%, SD = 8.06%, range = 66.87%-95.63%) and with distractors (passive distraction: mean = 76.80%, SD = 8.02%, range = 64.33%-92.50%; active distraction: mean = 74.20%, SD = 7.85%, range = 63.75%-89.38%), demonstrating that participants were able to maintain some information even under distraction (Fig. 1b). However, the presentation of a distractor during the delay period had a detrimental effect on the participants' accuracy in the working memory experiment (F(2,30) = 24.26, p < 0.0001). Participants responded more accurately in trials without distraction compared with trials with passive (t(15) = 5.65, p < 0.0001) or active distraction (t(15) = 6.37, p < 0.0001). Active distractor processing impaired working memory performance even more than passive distraction (t(15) = 2.20, p = 0.04).
Auditory working memory task, behavioral data, and auditory cortex blood-oxygen-level-dependent (BOLD) responses. a, Participants were asked to memorize the pitch of a target sound for 12.8 s. During the delay, participants either encountered no sound (no distraction), listened to a task-irrelevant pulsating sound in which some pulses contained a volume change (passive distraction), or had to detect these volume changes by pressing a mouse button (active distraction). At the end of a trial, participants judged whether the pitch of a similar probe sound was higher or lower than that of the memorized sound. b, Behavioral data of the working memory experiment showed better memory performance for the no distraction condition than for the passive and active distraction conditions. Performance also differed between the active and passive distraction conditions. Colored dots and light gray horizontal lines indicate the values of individual participants. Black dots represent the condition mean. Vertical gray bars represent the interquartile range. Shading represents the density trace. Dashed gray line indicates the guess rate. Significant differences between conditions: *p < 0.05; ***p < 0.001 (post hoc paired t tests). c, BOLD responses in the auditory cortex ROI averaged across all trials in the working memory experiment, separately for the no, passive, and active distraction conditions (left) and in the perception experiment (right). Bold lines indicate the BOLD responses averaged across subjects. Shaded error areas represent ±1 within-subject SEM. Gray rectangles represent the periods of sample, distraction, and probe in the working memory task, and the stimulus presentation in the perception experiment.
Decoding of sensory information from auditory cortex
First, we used fMRI signals in response to sensory stimulation of the independent perception experiment to train classifiers distinguishing between high- and low-octave sounds. We then tested these classifiers on data from the sensory distractor stimulation during the delay period in the working memory experiment. We found that the distractor sound frequency could be decoded from fMRI activity patterns in the auditory cortex ROI with high accuracy in trials with both passive distraction (mean = 88.68%, SD = 6.24%, p < 0.0001, BF10 = 7.48 × 1010) and active distraction (mean = 89.19%, SD = 4.72%, p < 0.0001, BF10 = 4.17 × 1012). This cross-decoding finding indicates that our classifiers were capable of reliably discriminating between activity patterns in the auditory cortex resulting from stimulation with sounds of different frequencies.
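The cross-decoding logic described above (train on perception-experiment patterns, test on delay-period patterns) can be illustrated with a minimal sketch on synthetic data. The voxel templates, trial counts, and noise level below are hypothetical stand-ins for real multivoxel fMRI patterns, and the linear SVM is one plausible classifier choice, not necessarily the one used in the study.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(1)
n_voxels = 200
# Hypothetical multivoxel templates for high- vs low-octave sounds
template = {"high": rng.standard_normal(n_voxels),
            "low": rng.standard_normal(n_voxels)}

def trials(n, octave, noise=1.0):
    """Simulate n noisy single-trial activity patterns."""
    return template[octave] + noise * rng.standard_normal((n, n_voxels))

# Train the classifier on (simulated) perception-experiment data ...
X_train = np.vstack([trials(40, "high"), trials(40, "low")])
y_train = np.repeat([1, 0], 40)
clf = LinearSVC(dual=False).fit(X_train, y_train)

# ... and cross-decode the sound category from (simulated) delay-period data
X_delay = np.vstack([trials(20, "high"), trials(20, "low")])
y_delay = np.repeat([1, 0], 20)
accuracy = clf.score(X_delay, y_delay)
```

Above-chance accuracy on the held-out delay-period data indicates that the patterns distinguishing the two sound categories generalize from perception to the delay period.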
Decoding of working memory information from auditory cortex
Next, we tested whether multivoxel fMRI activity patterns in the auditory cortex ROI carried sound-specific working memory information. For this aim, we used the same perception-trained classification approach and tested it on memory data from the delay period in the absence of distracting input. Figure 2 depicts the results of this and all remaining working memory decoding analyses. Table 1 shows descriptive and inferential statistics for the decoding results that we describe below but are not stated in the text. In line with previous findings (Linke et al., 2011; Kumar et al., 2016; Uluç et al., 2018; Czoschke et al., 2021), we observed above-chance classifier performance, indicating that the auditory cortex carries stimulus-specific working memory information in the absence of distraction.
Working memory decoding results across ROIs and distraction conditions
Decoding accuracy of a sample sound retained in auditory working memory. Decoding accuracies are shown for the pitch of a memorized sound during the last 6 s of the delay phase (for details, see Materials and Methods). The left column contains a rendered representation of the human brain depicting the ROI corresponding to the adjacent result plots. The remaining columns represent decoding results when classifiers were trained on data from the perception experiment (middle column) or the working memory experiment (right column). We found significant decoding in the auditory cortex and TP ROI in the absence of distraction. Exploratory analyses yielded significant above-chance decoding in the inferior frontal and superior parietal ROIs under active distraction. Colored dots represent the values of individual participants. Black and dark gray dots represent the group mean. Vertical gray bars represent the interquartile range. Shading represents the density trace. Dashed gray lines indicate the chance level of the classification process. Above-chance decoding as supported by sign permutation tests: *p < 0.05; **p < 0.01.
Regarding our first research question, that is, whether sound features held in working memory can also be decoded from activity in the auditory cortex during the presentation of distracting auditory input, we tested the perception-trained classifiers on memory data from the delay periods of the passive and active distraction conditions. Decoding accuracies dropped to chance level in trials with distractors, indicating that the auditory cortex did not carry memory information during distraction. Bayesian analyses supported the null finding in the active distraction condition, but did not yield conclusive support for the null result in the passive distraction condition. Moreover, decoding accuracy in both distraction conditions was significantly reduced compared with the no distraction condition, but Bayesian analyses only conclusively supported such a reduction during active distraction (F(2,30) = 8.14, p = 0.002; no distraction vs passive distraction: t(15) = 2.47, p = 0.026, BF10 = 2.52; no distraction vs active distraction: t(15) = 3.56, p = 0.003, BF10 = 15.58).
Although we found no positive evidence for memory coding in either distraction condition, we nevertheless tested our second research question, that is, whether decoding of auditory memory information in auditory cortex differed between passive and active distractor processing. The comparison of decoding accuracies between the passive and active distraction conditions showed no difference (t(15) = 1.62, p = 0.125). However, the Bayesian analysis did not yield conclusive evidence for this null finding (BF10 = 0.76).
Decoding of working memory information from auditory cortex using alternative procedures
Applying perception-trained classifiers to the data from the working memory experiment revealed that auditory cortex carried sound information during the memory delay in the absence of distracting auditory input. To evaluate whether our findings depended on our a priori planned analysis approach with perception-trained classifiers, we retested our two research questions using classifiers trained on data from the working memory experiment itself in a leave-one-out cross-validation approach. Again, we observed above-chance decoding from fMRI activity patterns in the auditory cortex ROI in trials without distraction, but decoding dropped in the presence of distractors (F(2,30) = 10.05, p = 0.0005; no distraction vs passive distraction: t(15) = 3.48, p = 0.003, BF10 = 13.7; no distraction vs active distraction: t(15) = 3.93, p = 0.001, BF10 = 29.9). Decoding accuracies in trials with passive and active distractor processing were indistinguishable from chance level and did not differ from each other (t(15) = 0.2, p = 0.84, BF10 = 0.26).
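A leave-one-out scheme of this kind, where classifiers are trained and tested within the same experiment while keeping training and test data independent, is commonly implemented by holding out one scanning run at a time. The sketch below illustrates this on synthetic data; the run and trial counts and the choice of a linear SVM are illustrative assumptions, not details taken from the study's analysis code.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.svm import LinearSVC

rng = np.random.default_rng(2)
n_runs, n_trials, n_voxels = 5, 16, 100  # hypothetical design
template = {0: rng.standard_normal(n_voxels),
            1: rng.standard_normal(n_voxels)}

# Simulate noisy trial patterns, tagged with the run they came from
X, y, run_id = [], [], []
for run in range(n_runs):
    for trial in range(n_trials):
        label = trial % 2
        X.append(template[label] + rng.standard_normal(n_voxels))
        y.append(label)
        run_id.append(run)
X, y, run_id = map(np.asarray, (X, y, run_id))

# Train on all runs but one, test on the held-out run, then rotate
scores = cross_val_score(LinearSVC(dual=False), X, y,
                         groups=run_id, cv=LeaveOneGroupOut())
mean_accuracy = scores.mean()  # one accuracy per held-out run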
Our auditory cortex ROI consisted of several anatomic areas that are thought to constitute different levels of auditory processing (Hall et al., 2003; Altmann et al., 2010). We thus asked whether the anatomic subdivisions of our auditory cortex ROI, including Heschl's gyrus, planum temporale, and the posterior division of the superior temporal gyrus carried stimulus-specific memory information that might have been masked when analyzing the auditory cortex ROI.
When classifiers were trained on data from the independent perception task, Heschl's gyrus and the posterior division of the superior temporal gyrus showed above-chance memory decoding in the absence of distraction. However, decoding under distraction was significantly reduced both in Heschl's gyrus (F(2,30) = 5.77, p = 0.008; no distraction vs passive distraction: t(15) = 2.44, p = 0.03; no distraction vs active distraction: t(15) = 2.62, p = 0.02) and in the posterior division of the superior temporal gyrus (F(2,30) = 9.92, p = 0.0005; no distraction vs passive distraction: t(15) = 3.2, p = 0.006; no distraction vs active distraction: t(15) = 4.18, p = 0.0008). In both sub-ROIs, decoding accuracies dropped to chance level under both passive and active distraction. Decoding from fMRI activity patterns in the planum temporale did not yield conclusive above-chance decoding in any distraction condition. Decoding accuracies did not conclusively differ between the passive and active distraction condition in any of the ROIs (all t(15) < 1.53, all p > 0.14, all BF10 < 0.68).
Training classifiers on memory-delay data yielded above-chance decoding in the planum temporale in the absence of distraction. There was a trend for a reduction of decoding in trials with distraction (F(2,30) = 3.07, p = 0.06; no distraction vs passive distraction: t(15) = 2.5, p = 0.02; no distraction vs active distraction: t(15) = 1.54, p = 0.14) with both distraction conditions not exceeding the chance level conclusively. Decoding accuracies did not differ between trials with passive and active distraction, either (t(15) = 0.77, p = 0.45, BF10 = 0.33). Finally, we did not observe conclusive above-chance decoding in any distraction condition from fMRI activity patterns in Heschl's gyrus and the posterior division of the superior temporal gyrus.
Decoding of memory information from higher-level brain regions
As we found no significant decoding under distraction within the auditory cortex, we also tested the possibility that distractor-resistant information was stored in higher-order regions, including a TP ROI, an IFG ROI, and an SPL ROI. For this exploratory analysis, we applied the same decoding approaches with classifier training on sensory data and training on memory data to these ROIs.
Temporal pole ROI
Classifiers trained on independent sensory data yielded strong support for above-chance decoding in the TP ROI in the absence of distraction. Decoding in trials with passive and active distraction did not conclusively exceed chance level, and decoding accuracies did not differ significantly between conditions (F(2,30) = 2.4, p = 0.11). When we trained classifiers on memory-delay data, we did not observe above-chance decoding in any distraction condition.
Inferior frontal gyrus ROI
In the IFG ROI, classifiers trained on independent data from the perception task revealed above-chance working memory decoding in trials with active distraction. In trials without or with passive distraction, decoding accuracies did not exceed chance-level decoding. The decoding accuracy for the active distraction condition was significantly higher compared with both remaining conditions (F(2,30) = 9.85, p = 0.0005; active distraction vs no distraction: t(15) = 3.46, p = 0.003, BF10 = 13.08; active distraction vs passive distraction: t(15) = 5.26, p = 0.0001, BF10 = 295.28). The decoding accuracies of classifiers trained on data from the memory experiment pointed in a similar direction. We observed above-chance decoding in trials with active distraction, but not in trials with passive distraction or without distraction. However, decoding accuracies did not significantly differ between distraction conditions (F(2,30) = 2.03, p = 0.15).
Superior parietal lobule ROI
We did not observe above-chance decoding from SPL fMRI activity patterns in trials without distraction or passive distraction. Under active distraction, stimulus-specific information was decoded with above-chance accuracy, but Bayesian analyses suggested an inconclusive result. However, the decoding accuracy for the active distraction condition was significantly higher compared with the passive distraction condition (F(2,30) = 5.08, p = 0.013; active distraction vs passive distraction: t(15) = 3.28, p = 0.005, BF10 = 9.66) but did not differ from decoding in trials without distraction (t(15) = 1.35, p = 0.2, BF10 = 0.551). Classifiers trained on memory-delay data did not yield above-chance decoding in the SPL ROI in any distraction condition.
To summarize, we found multiple regions representing auditory working memory information in the absence of distraction, including the auditory cortex, its subdivisions, and the TP. However, none of these regions showed above-chance decoding under distraction. In contrast, the IFG and, to some extent, the superior parietal lobule, showed above-chance decoding in the active distraction condition, but not in trials without or with passive distraction.
Decoding of memory information with a whole-brain searchlight approach
We also investigated the possibility of memory storage in brain regions beyond the a priori defined ROIs using a whole-brain searchlight analysis. In contrast to the ROI approach, a searchlight analysis does not depend on prior assumptions regarding brain regions involved in memory storage. However, it has limited sensitivity for detecting above-chance decoding performance because of the limited size of the searchlight sphere, the absence of functional voxel selection, and the need for rigorous correction for multiple comparisons. Nevertheless, the searchlight analysis revealed above-chance decoding in the left STG (p < 0.05, corrected; cluster size: 136 voxels; peak MNI coordinates: −60, −39, 18) in the absence of distraction. This result corresponds well to our ROI-based findings, showing that voxels in the auditory cortex ROI coded memory content in the absence of distraction. There was also a cluster in the right STG, which, however, did not reach significance after the correction for multiple comparisons. Moreover, when subjects were presented with passive or active distraction, we did not find evidence for above-chance memory decoding in any brain region.
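The core mechanics of a searchlight analysis can be sketched as follows: a sphere is centered on every voxel in turn, and the cross-validated decoding accuracy from that sphere's voxels is written back to the center voxel, yielding a whole-brain accuracy map. This is a minimal single-subject illustration on synthetic data; the sphere radius, cross-validation scheme, and classifier are assumptions for the example, not the study's actual parameters.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

def searchlight(data, labels, mask, radius=2.0):
    """Minimal searchlight: for every voxel in `mask`, decode from the
    sphere of in-mask voxels within `radius` and store the mean
    cross-validated accuracy at the sphere's center.
    data: (n_trials, x, y, z) array; mask: boolean (x, y, z) array."""
    coords = np.argwhere(mask)
    acc_map = np.full(mask.shape, np.nan)
    for center in coords:
        dist = np.linalg.norm(coords - center, axis=1)
        sphere = coords[dist <= radius]  # voxels inside the sphere
        X = data[:, sphere[:, 0], sphere[:, 1], sphere[:, 2]]
        acc_map[tuple(center)] = cross_val_score(
            LinearSVC(dual=False), X, labels, cv=3).mean()
    return acc_map
```

The resulting per-subject accuracy maps would then be entered into group-level t tests and submitted to the cluster-size correction described in Materials and Methods.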
Discussion
The present study used fMRI multivariate pattern analysis to assess the impact of distraction on the working memory processing of abstract harmonic sounds in auditory cortex. In separate conditions, there was either no distraction during the delay phase, or acoustically similar distractor stimuli were presented that either could be ignored or had to be processed actively. Distractors led to accuracy decrements that were more pronounced under active than passive distraction. Hemodynamic signal pattern analysis showed that the contents of auditory working memory were decodable in auditory cortex in the absence of distraction. In contrast, both passive and active distraction prevented the decoding of memorized sounds. Exploratory analyses in selected regions outside the auditory cortex showed working memory decoding during the active distraction condition in IFG and SPL for classifiers trained on sensory data. For training on memory data, this finding held up in IFG only. A whole-brain searchlight analysis did not reveal distractor-resistant coding in any further brain regions. This suggests that, while signal patterns in early auditory regions are dominated by sensory interference, there are hints of a representation of working memory contents in regions further along the auditory processing hierarchy.
The decodability of memory contents in trials without distraction supported an involvement of auditory cortex in storing pitch memoranda (Linke et al., 2011; Kumar et al., 2016; Uluç et al., 2018; Czoschke et al., 2021). This finding thus adds to the growing body of research favoring the sensory recruitment hypothesis (Pasternak and Greenlee, 2005; Sreenivasan et al., 2014; D'Esposito and Postle, 2015; Serences, 2016). A common neuronal substrate underlying both perception and working memory was further supported by studies showing that visual cortex coded memory content during concurrent visual stimulation that was processed either passively (Bettencourt and Xu, 2016, their Experiment 3; Lorenc et al., 2018; Rademaker et al., 2019; but see Bettencourt and Xu, 2016) or actively (Derrfuss et al., 2017; Kiyonaga et al., 2017; Hallenbeck et al., 2021). In contrast, we could not decode the memorized content in auditory cortex during distraction. This null effect was consistently observed across different analyses, including Bayesian ones. These results suggest that, in contrast to visual cortex, auditory cortex may not represent sensory and mnemonic information concurrently.
The present auditory distraction impaired working memory performance in both active and passive conditions. Notably, almost half of the initial participants did not meet the a priori defined performance criterion in the behavioral screening sessions with active distractor processing. This means that even high performers showed a clear reduction in their memory performance during both distraction conditions in parallel with the decoding impairment. This finding is compatible with studies in which visual distractors produced detrimental effects on the behavioral level, particularly when they were similar to target stimuli (Duncan and Humphreys, 1989; Folk et al., 1992; Kiyonaga and Egner, 2016; see also Wöstmann et al., 2022). Similarly, the presence of distractors can result in reduced, biased, or baseline-level working memory decoding (Derrfuss et al., 2017; Lorenc et al., 2018; Rademaker et al., 2019). For example, Hallenbeck et al. (2021) found that active distractor processing affected both reaction times and accuracy measures alongside reduced fidelity of working memory reconstructions from fMRI patterns in visual cortices. Together, these results indicate a close relationship between distraction-related memory performance and decodability of mnemonic information in sensory cortex: The decodability of a memory representation can vanish when new input presented during the memory delay successfully distracts attention from the memoranda. A strong effect of an attentionally demanding distractor task has also been demonstrated by Kiyonaga et al. (2017). They found a drop in the decoding accuracy of the memory content, although visual stimuli for the distracting and the memory task came from different visual categories and although memory performance in the framing working memory task remained unaffected even under the most demanding distractor condition.
Our results of decoding memory information in auditory cortex in situations without distraction and its failure in situations with permanent distraction are in keeping with previous single-cell recordings in the nonhuman primate brain. This work has shown that spiking activity in the auditory cortex during sample sound presentation continued during the delay period, thus bridging the temporal gap between the sample and test. However, this activity dropped when distracting stimuli appeared (Scott et al., 2014). Similarly, monkeys also showed a strong reduction in auditory working memory performance as soon as distractors were presented during the memory delay (Artchakov et al., 2009; Scott et al., 2014). Interestingly, Artchakov et al. (2009) found that some neurons in monkey PFC seemed to compensate for the adverse effects of distraction in an auditory spatial working memory task by generating spatially tuned neuronal activity only when the task became more demanding as a consequence of distraction. In human participants, Koelsch et al. (2009) reported an increased activation of the inferior frontal cortex under articulatory suppression during an auditory working memory task, possibly reflecting the recruitment of additional storage components when verbal rehearsal was not available. Here we found above-chance decoding of auditory stimuli in IFG during the most demanding condition with active distractor processing. Inferior frontal cortex has been previously described as part of the putative auditory ventral stream specialized in the processing of sound patterns in humans (Arnott et al., 2005) and has been implicated in the maintenance of pitch information in working memory (Kumar et al., 2016; Czoschke et al., 2021). In addition to the IFG, we also explored whether parietal regions carried memory information about target sounds under distraction.
This is because the IPS has been shown to be resilient to distraction in visual working memory studies (Bettencourt and Xu, 2016; Lorenc et al., 2018; Rademaker et al., 2019). Moreover, we have previously decoded auditory working memory content from activity in an SPL ROI including the IPS (Czoschke et al., 2021). For SPL, we found above-chance decoding during active distraction. However, this was only true for classifiers trained on sensory data. Our observation of distractor-resistant coding in SPL and IFG fits well with previous work suggesting that working memory storage is distributed across the brain (Christophel et al., 2017), with more abstract and action-oriented formats in hierarchically higher brain regions, and that parallel coding supports distractor-resistant working memory (Lorenc et al., 2021). However, given that the present findings were obtained from exploratory analyses without correction for multiple comparisons, we would only very cautiously interpret these hierarchically higher regions as auditory working memory storage regions that are resilient to distraction. Further research seems warranted to assess the robustness of the present results, all the more so as the above-chance decoding was restricted to the active distraction condition.
We would also interpret our “null finding” concerning distractor-resistant activity in the auditory cortex with caution. As the absence of evidence is not evidence of absence, we cannot exclude the possibility that this finding resulted from insufficient power or from our choice of stimuli and distractors. This caveat holds even though the present sample size (n = 16) and trial number (5 separate fMRI sessions with 96 trials each) were higher than in previous working memory decoding studies, and even though we kept our stimuli and classification procedure as simple as possible, discriminating only between two pitch categories, which resulted in near-perfect decodability of distractor stimuli in auditory cortex. Additionally, it remains unclear whether the lack of working memory decoding in the auditory cortex is because of the suppression of activity patterns by the distractor, or whether distraction results in a permanent disruption of working memory representations. Previous research by Hallenbeck et al. (2021) has shown that working memory information can be regained from visual cortex after the offset of a distractor. Further studies could explore whether a similar pattern can be observed in the auditory domain.
Another possible interpretation is that our “null finding” indicates that memory contents under distraction are stored in the auditory cortex in an “activity-silent” or “offline” state (Stokes, 2015) that is not detectable with fMRI. An activity-silent storage might protect working memory representations from distraction (Lorenc et al., 2021). The decrease in the recall precision of working memory items in behavioral studies could result from a qualitative change of their memory state after distraction (Peters et al., 2018, 2019). Similarly, memory contents might be protected from distraction via rotational remapping into a different representational subspace (Yu et al., 2020; Lorenc et al., 2021). It is possible that our binary decoding approach could not capture such remapping of auditory representations.
In conclusion, distractors with a high similarity to memorized sounds presented throughout the delay phase of a working memory experiment led to a substantial decrease in behavioral performance. While participants were still able to perform the working memory task under distraction, memorandum-specific signals were absent in auditory cortex. Exploring auditory working memory regions beyond early auditory cortex yielded tentative evidence for memory content-specific signals in inferior frontal and superior parietal cortex. This calls for further research to assess the role of these regions for distractor-resilient storage in auditory working memory.
Footnotes
This work was supported by Deutsche Forschungsgemeinschaft Grant BL 931/4-1 to C.B. and Grant KA 1493/7-1 to J.K. We thank Shirin Hagner, Max Dosch, and Hannah Schröder for help in data acquisition.
The authors declare no competing financial interests.
- Correspondence should be addressed to Christoph Bledowski at bledowski@em.uni-frankfurt.de