The brain basis for auditory working memory, the process of actively maintaining sounds in memory over short periods of time, is controversial. Using functional magnetic resonance imaging in human participants, we demonstrate that the maintenance of single tones in memory is associated with activation in auditory cortex. In addition, sustained activation was observed in hippocampus and inferior frontal gyrus. Multivoxel pattern analysis showed that patterns of activity in auditory cortex and left inferior frontal gyrus distinguished the tone that was maintained in memory. Functional connectivity during maintenance was demonstrated between auditory cortex and both the hippocampus and inferior frontal cortex. The data support a system for auditory working memory based on the maintenance of sound-specific representations in auditory cortex by projections from higher-order areas, including the hippocampus and frontal cortex.
SIGNIFICANCE STATEMENT In this work, we demonstrate a system for maintaining sound in working memory based on activity in auditory cortex, hippocampus, and frontal cortex, and functional connectivity among them. Specifically, our work makes three advances over previous work. First, we robustly demonstrate hippocampal involvement in all phases of auditory working memory (encoding, maintenance, and retrieval); the role of hippocampus in working memory is controversial. Second, using a pattern classification technique, we show that activity in the auditory cortex and inferior frontal gyrus is specific to the maintained tones in working memory. Third, we show long-range connectivity of auditory cortex to hippocampus and frontal cortex, which may be responsible for keeping such representations active during working memory maintenance.
This work concerns the neural bases for human auditory working memory (AWM), the process of keeping sounds in mind for short periods of time when the sounds are no longer present in the environment.
A first key question is the role of early auditory areas and the nature of the brain activity that supports AWM maintenance. In contrast to a number of human neuroimaging studies in the visual domain showing the involvement of early visual areas in working memory (Pasternak and Greenlee, 2005; Postle, 2006; for review, see D'Esposito and Postle, 2015), evidence for the involvement of auditory cortex in AWM is both limited and conflicting. In a study that required subjects to maintain two tones in AWM, Linke et al. (2011) observed suppression of fMRI activity in auditory cortex. However, in a more recent study Linke and Cusack (2015) showed increased fMRI activity when subjects maintained a single complex sound in memory. Two other studies (Zatorre et al., 1994; Gaab et al., 2003) examined brain activity corresponding to the comparison of two tones in a sequence with interfering sounds in between. While one study using fMRI (Gaab et al., 2003) showed increased activity in auditory cortex, the other study using PET (Zatorre et al., 1994) showed decreased regional cerebral blood flow in auditory cortex. Furthermore, the paradigm used in those two studies does not permit a differentiation of the encoding, maintenance, or retrieval phases of WM. The current study assessed the fundamental bases for encoding, maintenance, and retrieval of single tones in AWM.
A second question relates to the nature and specificity of representations in AWM. Conventionally, sustained activity observed in a given brain area during the maintenance period was regarded as a neural basis of WM. However, a number of recent studies (Harrison and Tong, 2009; Serences et al., 2009; Riggall and Postle, 2012) have shown that although a brain region may not show elevated activity during the delay period, patterns of subthreshold activity within the same area may nevertheless contain information about the remembered stimulus. Moreover, sustained activity does not imply that the activity is related to WM per se (e.g., it may be related to sustained attention). In this study, we assessed whether representations during AWM are specific to the maintained tone, both in auditory cortex and in higher-order areas, including frontal cortex.
A third question concerns how representations in auditory cortex are kept active during the maintenance period. A consistent observation in WM studies in other sensory modalities has been sustained activation during maintenance in multiple brain areas, including frontal and parietal regions (Salazar et al., 2012). Interaction between sensory cortex and these areas is thought to support active sensory representations. Although some evidence for this hypothesis for humans is available in the visual domain (Gazzaley et al., 2004), we are not aware of any such evidence in the auditory domain. In this study, we measured functional connectivity between auditory cortex and other areas to delineate the functional network underlying AWM.
Last, we were interested in the controversial role of the hippocampus in WM suggested by certain visual studies (Ranganath and D'Esposito, 2001; Axmacher et al., 2007, 2010a,b), as opposed to its better established role in long-term memory (LTM). In this study, in addition to showing robust involvement of hippocampus in all phases (encoding, maintenance, and retrieval) of AWM, we go beyond the simple demonstration of hippocampal activity and evaluate the relationship between this activity and behavior; the results support a model of hippocampal involvement in WM based on an overlap with LTM.
To address these questions, we measured the fMRI BOLD response while subjects, after listening to a pair of tones, were then cued to maintain either a low or a high tone for 16 s. Using univariate analysis on the fMRI data, we show sustained maintenance activity in the auditory cortex, hippocampus, and frontal areas, including the inferior frontal gyrus (IFG). Using multivoxel pattern analysis (MVPA) on the fMRI data (Chadwick et al., 2012; Kumar et al., 2014b), we show that activity patterns in auditory cortex and the left IFG (LIFG) are specific to information kept in WM. Using functional connectivity analysis (Whitfield-Gabrieli and Nieto-Castanon, 2012), we demonstrate long-range connectivity between auditory cortex and both the hippocampus and IFG. The data support a scheme based on content-specific representations in auditory cortex during AWM that are kept active during the maintenance period by long-range functional connections from the hippocampus and IFG.
Materials and Methods
Seventeen healthy adults (8 females; mean age, 29.5 years; age range, 19–52 years) participated in this study after providing written informed consent to undergo procedures approved by the local ethics committee. Participants were paid for their participation and were selected based on the following criteria: normal hearing and no musical training. Data from one subject could not be used because of technical problems in the sound delivery in the MRI scanner.
Stimuli consisted of pure tones, logarithmically sampled at random either from a low range of 200–300 Hz or a high range of 2500–3000 Hz. A set of three tones was chosen randomly in each range. A different set of frequencies was produced for each session. Auditory stimuli were generated at a sampling rate of 44.1 kHz in Matlab version R2013b (MathWorks) and presented using Cogent (http://www.vislab.ucl.ac.uk). Sounds were delivered binaurally through MRI-compatible in-ear headphones [model S14, Sensimetrics (http://www.sens.com/s14/)] at an intensity of ∼70 dBA.
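The stimulus construction described above can be illustrated with a minimal Python sketch. It is not the authors' Matlab code: it assumes that "logarithmically sampled at random" means a uniform draw on a log-frequency axis, it omits any onset/offset ramps or level calibration (neither is described in the text), and the function names are hypothetical.

```python
import math
import random

SAMPLE_RATE = 44100   # Hz, as stated in the text
TONE_DURATION = 0.5   # s, stimulus duration reported in the procedure

def sample_log_frequency(low_hz, high_hz, rng):
    """Draw a frequency uniformly on a logarithmic axis (assumed sampling rule)."""
    return math.exp(rng.uniform(math.log(low_hz), math.log(high_hz)))

def pure_tone(freq_hz, duration_s=TONE_DURATION, rate=SAMPLE_RATE):
    """Return a sine wave as a list of samples in [-1, 1]."""
    n_samples = int(round(duration_s * rate))
    return [math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n_samples)]

rng = random.Random(0)  # fixed seed only so the sketch is reproducible
low_set = [sample_log_frequency(200, 300, rng) for _ in range(3)]     # low range
high_set = [sample_log_frequency(2500, 3000, rng) for _ in range(3)]  # high range
tone = pure_tone(low_set[0])
```

Drawing a fresh `low_set` and `high_set` per session would correspond to the statement that a different set of frequencies was produced for each session.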
Procedure and experimental design.
Outside the scanner, subjects performed a pitch discrimination task (one block of 16 trials) to ensure that their pitch discrimination thresholds were within the normal range. This was followed by a practice block of the WM task (Fig. 1), which was then performed inside the scanner.
Inside the scanner, subjects were scanned in four runs each consisting of 24 trials. The start of a given trial of each memory block was indicated by text instructions appearing onscreen (“Sounds to start soon”). This message was presented for 1 s and was followed by the presentation of two pure tones picked randomly from each of the two categories (encoding). There was always a low tone and a high tone contained within each set of sounds, and the order in which they appeared was randomized. Therefore, the low tone could be presented at the first or second serial position and vice versa. The stimulus duration was 0.5 s with an interstimulus interval of 1 s. Next, 0.6 s after the second tone, another message appeared on the screen for 2.5 s, indicating which tone subjects had to actively maintain in mind (Cue). Whether tone 1 or tone 2 had to be held in mind was randomized. In each block of 24 trials, half of them required maintenance of a low tone. After subjects knew which tone to maintain, they had to actively keep this particular tone in mind for 16 s, while maintaining gaze on a fixation cross at the center of the screen. Finally, a probe tone was presented for 0.5 s. Participants had to decide whether this tone was the same or different from the one held in mind. The message “Same or different?” appeared onscreen for 1 s followed by a button press response. Subjects were instructed to make a response as quickly as possible without making mistakes within a time window of 3 s. Each trial was followed by a resting period of variable length, jittered at 8–12 s. For trials with a different probe than the test tone, a ±10% change in frequency was made. Equal numbers of trials with and without change were presented in a block.
Functional imaging/MRI data collection.
All imaging data were collected on a Siemens 3 tesla Quattro head-only MRI scanner (http://www.siemens.com) at the Wellcome Trust Functional Imaging Centre. MRI images were acquired continuously (TR, 1.1 s; TE, 15.85 ms; flip angle, 15°; 3D sequence; whole brain acquisition; nominal flip angle; isotropic voxel size, 2 mm; matrix size, 96 × 96). A total of 725 volumes were acquired per run. After the fMRI runs, a high-resolution (1 × 1 × 1 mm) T1-weighted structural MRI scan was acquired for each subject.
MRI data were analyzed using SPM8 (http://www.fil.ion.ucl.ac.uk/spm). At the preprocessing stage, images were realigned to the first volume, then normalized to stereotactic space and finally smoothed with a 3D Gaussian kernel with full-width at half-maximum of 6 mm. After preprocessing, a general linear model (GLM) was used for statistical analysis. The design matrix consisted of boxcar functions encoding the onsets and durations of different events convolved with a hemodynamic response function.
The following events were included in the design matrix and modeled as three regressors: (1) an encoding period modeled as a single block of 3 s (1 s alert plus 2 s sound); (2) a maintenance period with onset starting from the onset of cue and of duration 18.5 s (2.5 s Cue period plus 16 s); and (3) a retrieval period modeled as a block of 3 s after the maintenance period. The design matrix also included physiological and motion regressors of no interest. A high-pass filter with a cutoff frequency of 1/128 Hz was applied to remove low-frequency fluctuations in the BOLD signal. Once the GLM for each subject was estimated, the contrasts of parameter estimates for each individual subject were entered into second-level t tests to form statistical parametric maps, and a whole-brain random-effects analysis was implemented. The correlation of brain activity with behavioral performance at group level was performed using regression analysis with age of the subjects as a regressor of no interest. Given prior hypotheses about the involvement of auditory cortex and hippocampus in WM, small-volume correction (using anatomically defined volumes) was used for auditory cortex and hippocampus.
One concern in WM studies using fMRI is that the maintenance phase immediately follows the encoding phase and, because of the delay intrinsic to the BOLD signal, the regressors for the maintenance and encoding phases may be correlated. The activity observed in the maintenance phase may thus be contaminated with activity during the encoding phase. In our study, because the maintenance duration was much longer than the encoding phase, the correlation coefficient between the encoding and maintenance regressors in the design matrix was very small (0.04). Activity during the maintenance phase is therefore not explained by correlation between the regressors in this design. We also repeated the univariate analysis by leaving 4 s on either side of the maintenance phase [i.e., modeling only the middle portion (10.5 s) of the maintenance phase; Zarahn et al., 1997] to avoid any possibility of “lagged” blood flow changes during encoding or retrieval affecting the maintenance measurements. The results of this analysis were very similar to those obtained using all of the maintenance period.
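The regressor construction and the correlation check described above can be sketched as follows. This is an illustration under stated assumptions, not the SPM8 implementation: the double-gamma response function only approximates a canonical HRF, the 36 s trial spacing and six-trial run are hypothetical, and the resulting coefficient will depend on those choices, so it should not be expected to reproduce the reported value.

```python
import math

DT = 0.1  # s, fine time grid used to build regressors (assumption)
TR = 1.1  # s, repetition time reported in the text

def gamma_pdf(t, shape):
    """Gamma density with unit scale."""
    return t ** (shape - 1) * math.exp(-t) / math.gamma(shape) if t > 0 else 0.0

def double_gamma_hrf(length_s=32.0):
    """Assumed HRF shape: an early positive response minus a small late undershoot."""
    n = int(round(length_s / DT))
    return [gamma_pdf(i * DT, 6) - gamma_pdf(i * DT, 16) / 6 for i in range(n)]

def boxcar(onsets, duration, total_s):
    """1 inside each [onset, onset + duration) window, 0 elsewhere."""
    n = int(round(total_s / DT))
    return [1.0 if any(o <= i * DT < o + duration for o in onsets) else 0.0
            for i in range(n)]

def convolve(signal, kernel):
    """Causal discrete convolution, truncated to the length of the signal."""
    out = [0.0] * len(signal)
    for i, s in enumerate(signal):
        if s:
            for j, k in enumerate(kernel):
                if i + j < len(out):
                    out[i + j] += s * k * DT
    return out

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

trial_starts = [k * 36.0 for k in range(6)]  # hypothetical trial spacing
total_s = 6 * 36.0
hrf = double_gamma_hrf()
# Encoding: 3 s block from the alert; maintenance: 18.5 s block from cue onset (3.6 s).
encoding = convolve(boxcar(trial_starts, 3.0, total_s), hrf)
maintenance = convolve(boxcar([t + 3.6 for t in trial_starts], 18.5, total_s), hrf)
step = int(round(TR / DT))  # sample the fine grid once per scan
r = pearson(encoding[::step], maintenance[::step])
```

Changing the maintenance duration from 18.5 to 10.5 s and its onset from 3.6 to 7.6 s would give the trimmed variant of the regressor mentioned in the text.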
Time series were extracted by first dividing the continuous time series into individual trials, with presentation of the visual alert marking the beginning of a trial, and 33 s (30 scans) after the alert cue marking the end of the trial. Trial activity was then referenced with respect to the first scan of that trial before computing the average across all trials. To compute the time series for a given region of interest (ROI), an average across all voxels within that ROI was computed.
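The epoching procedure can be sketched as below. The text says only that trial activity was "referenced with respect to the first scan"; this sketch assumes that means subtracting the first scan of each trial (a percent-change normalization would be another reading), and the function names are hypothetical.

```python
SCANS_PER_TRIAL = 30  # 33 s at TR = 1.1 s, as stated in the text

def roi_series(voxel_series):
    """Average across voxels at each scan; voxel_series is a list of per-voxel lists."""
    return [sum(vals) / len(vals) for vals in zip(*voxel_series)]

def trial_average(series, alert_scans, n_scans=SCANS_PER_TRIAL):
    """Cut one epoch per trial, subtract its first scan, and average across trials."""
    epochs = []
    for start in alert_scans:
        epoch = series[start:start + n_scans]
        if len(epoch) < n_scans:
            continue  # skip a trial truncated by the end of the run
        epochs.append([v - epoch[0] for v in epoch])
    return [sum(col) / len(epochs) for col in zip(*epochs)]

# Toy data: three voxels, 120 scans, with a repeating 40-scan ramp.
voxels = [[float(t % 40 + v) for t in range(120)] for v in range(3)]
roi = roi_series(voxels)
avg = trial_average(roi, alert_scans=[0, 40, 80])
```

In this toy case every epoch is the same ramp, so the trial average starts at zero and rises by one unit per scan.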
Multivoxel pattern analysis.
The principal aim of MVPA is to determine whether patterns of activity in a given area can distinguish between two or more experimental conditions (Haynes and Rees, 2006; Norman et al., 2006). We used a linear support vector machine (SVM) classifier to determine whether patterns of activity from different brain areas could be decoded when participants held a low or high tone in WM. Activity during the maintenance period was summarized by computing β values for each trial on the unsmoothed data. In our experiment, the probe tone presented at the retrieval phase was of the same category that was cued for the maintenance phase. It is, therefore, likely that the probe tone could aid in the classification of tones during the maintenance phase (because of the common variance captured by maintenance and recall phase regressors). To ensure that this was not so, only the middle 10.5 s of the maintenance period was modeled, leaving 4 s on either side based on recommendations by Zarahn et al. (1997). This was done by entering each trial as a separate regressor in the GLM analysis. Regressors for the encoding and retrieval phases, and regressors of no interest (motion and physiological regressors) were also added in the design matrix. For each subject, the following five ROIs were selected: Heschl's gyrus (HG), planum temporale (PT), hippocampus, LIFG Brodmann areas (BAs) 44 and 45, and right IFG (RIFG; BAs 44 and 45). For HG, the ROI was defined based on cytoarchitectonic maps as defined in the study by Morosan et al. (2001), a MNI template of which is available in the Anatomy toolbox (Eickhoff et al., 2005); for PT, the probabilistic map (thresholded at 30%) as defined in the study by Westbury et al. (1999); for hippocampus, the map as defined in the study by Amunts et al. (2005); and for LIFG and RIFG, we used ROIs as defined in the WFU PickAtlas toolbox (Maldjian et al., 2003). To reduce noise in classification, only trials with correct responses were used. 
Three subjects who did not perform the task well [they had d-prime (detection) values of 0.89, 0.99, and 1.16, respectively] were dropped from the analysis. We did not drop these subjects from the univariate analysis because we wanted to analyze the correlation of brain activity with behavioral performance.
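For readers unfamiliar with the measure, d-prime can be computed from response counts as sketched below. The text does not state how it was computed in this study, so the details here are assumptions: "different" trials are treated as the signal, and a log-linear correction (adding 0.5 to each count) keeps the measure finite when a rate is 0 or 1.

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """d' = z(hit rate) - z(false-alarm rate), with a log-linear correction (assumed)."""
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)

# Hypothetical counts for a subject with 48 "different" and 48 "same" trials.
example = d_prime(hits=40, misses=8, false_alarms=8, correct_rejections=40)
```

A subject who responds identically on "same" and "different" trials obtains a d-prime of zero under this definition.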
The SVM classifier, as implemented in the LIBSVM toolbox (Chang and Lin, 2011), was used with the following leave-one-session-out cross-validation strategy: the classifier was trained on data from three of the four sessions and tested on the left-out session. For training and testing, β values at all voxels within a defined ROI were used as the feature vector. The feature vector was normalized to unit norm before inputting to the classifier. Classifier accuracy values for each brain region were compared with chance, which in our case was 50%. Given that we were interested in whether results were significantly above chance, one-tailed t tests were used for testing the statistical significance of classification.
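The cross-validation scheme can be sketched in plain Python as follows. To stay self-contained, the sketch swaps in a nearest-centroid classifier as a stand-in; it is not the LIBSVM linear SVM used in the study, and the toy patterns, labels, and function names are hypothetical. What it does illustrate is the unit-norm scaling of each feature vector and the leave-one-session-out split.

```python
import math

def unit_norm(vec):
    """Scale a feature vector to Euclidean length 1 (left unchanged if all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else list(vec)

def centroid(rows):
    """Mean pattern across a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def nearest_centroid_predict(train_x, train_y, test_x):
    """Stand-in classifier (NOT the study's linear SVM): assign each test pattern
    to the class whose mean training pattern is closest."""
    cents = {lab: centroid([x for x, y in zip(train_x, train_y) if y == lab])
             for lab in set(train_y)}
    return [min(cents, key=lambda lab: math.dist(x, cents[lab])) for x in test_x]

def leave_one_session_out(features, labels, sessions):
    """Train on all sessions but one, test on the held-out one; return mean accuracy."""
    feats = [unit_norm(f) for f in features]
    accuracies = []
    for held_out in sorted(set(sessions)):
        train = [i for i, s in enumerate(sessions) if s != held_out]
        test = [i for i, s in enumerate(sessions) if s == held_out]
        preds = nearest_centroid_predict([feats[i] for i in train],
                                         [labels[i] for i in train],
                                         [feats[i] for i in test])
        hits = sum(p == labels[i] for p, i in zip(preds, test))
        accuracies.append(hits / len(test))
    return sum(accuracies) / len(accuracies)

# Toy data: four sessions, two clearly separable three-voxel patterns per class.
features, labels, sessions = [], [], []
for s in range(4):
    for lab, base in (("low", [1.0, 0.2, 0.1]), ("high", [0.1, 0.2, 1.0])):
        for k in range(2):
            features.append([b + 0.01 * (s + k) for b in base])
            labels.append(lab)
            sessions.append(s)
accuracy = leave_one_session_out(features, labels, sessions)
```

Splitting by session rather than by trial keeps training and test data from the same run apart, which is the point of the leave-one-session-out design.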
Connectivity analysis was conducted using psychophysiological interactions (PPIs) as implemented in the CONN toolbox (Whitfield-Gabrieli and Nieto-Castanon, 2012). The analysis was limited to the following five ROIs in each hemisphere: HG, PT, IFG, anterior hippocampus, and posterior hippocampus. After the data were preprocessed using SPM, the artifact detection (ART) toolbox (Mazaika et al., 2007) was used to detect outliers. Time points in the series were marked as outliers if the global signal exceeded 3 SDs and/or the movement exceeded 1 mm. The effect of movement and physiological parameters on the BOLD signal was reduced by regressing out motion and physiological artifacts, along with their first-order temporal derivatives, by running whole-brain voxelwise regression. Additionally, five covariates generated using the aCompCor method (Behzadi et al., 2007), which applies principal component analysis to the signal within each individual subject's segmented white matter and CSF masks, were used. The data were then high-pass filtered with a cutoff frequency of 0.008 Hz. PPI analysis was performed for each ROI to every other ROI (ROI-to-ROI analysis) for every subject. We compared the encoding, maintenance, and retrieval conditions against each other instead of comparing each of these against baseline. This is because physiological variables such as heart rate during cognitive tasks (here, working memory) are different from the rest condition (Middleton et al., 1999), and changes in physiological variables are known to confound functional connectivity in fMRI (Birn, 2012). Because physiological parameters should be more similar among the encoding, maintenance, and retrieval conditions than between any of these conditions and baseline, we report comparisons of connectivity among the three conditions to minimize the effect of physiological noise on the connectivity analysis.
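The outlier rule can be illustrated with a short sketch. It is not the ART toolbox: how ART defines the global-signal and movement measures is not reproduced here, so the sketch simply assumes one global-signal value and one movement value (in mm) per scan, and the function name and toy data are hypothetical.

```python
from statistics import mean, pstdev

def flag_outliers(global_signal, movement_mm, z_limit=3.0, move_limit_mm=1.0):
    """Return indices of scans whose global signal lies more than z_limit SDs
    from the run mean, or whose movement exceeds move_limit_mm."""
    mu, sd = mean(global_signal), pstdev(global_signal)
    flagged = []
    for i, (g, m) in enumerate(zip(global_signal, movement_mm)):
        z = abs(g - mu) / sd if sd else 0.0
        if z > z_limit or m > move_limit_mm:
            flagged.append(i)
    return flagged

# Toy run of 50 scans: one signal spike at the last scan, one movement spike at scan 10.
global_signal = [100.0] * 49 + [150.0]
movement = [0.1] * 50
movement[10] = 1.5
outliers = flag_outliers(global_signal, movement)
```

Flagged scans would then typically be entered as nuisance regressors or excluded, depending on the pipeline.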
The contrasts computed at the first levels were then submitted to second-level analysis. The group-level connectivity between the regions was corrected for multiple comparisons using the false discovery rate method.
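The text does not name the specific false discovery rate procedure. As an illustration only, the sketch below assumes the widely used Benjamini-Hochberg step-up rule; the p values in the example are hypothetical and are not results from this study.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return booleans marking which tests survive FDR control at level q
    under the Benjamini-Hochberg step-up rule (assumed procedure)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Largest rank whose p value falls under its rank-scaled threshold.
        if p_values[idx] <= q * rank / m:
            cutoff_rank = rank
    keep = [False] * m
    for idx in order[:cutoff_rank]:
        keep[idx] = True
    return keep

# Hypothetical p values for five ROI-to-ROI connections.
survivors = benjamini_hochberg([0.001, 0.045, 0.025, 0.20, 0.008])
```

Here three of the five hypothetical connections survive, because the fourth-smallest p value (0.045) exceeds its threshold of 0.04.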
Behavioral data analysis
The overall accuracy scores of participants varied from 67% to 98% (mean, 83.5%; SD, 10%). Accuracy was better when subjects were cued to retain the first tone compared with the second tone (first tone, 85.7%; second tone, 81.3%; t(15) = 2.68, p = 0.017).
Univariate fMRI analysis
Individual voxel GLM analysis was performed for the whole brain. Based on previous studies, we hypothesized sustained activity in the auditory cortex, hippocampus, and inferior frontal cortex during the maintenance period. The activity in these areas is described in detail below. The rest of the areas activated are summarized in Table 1.
Figure 2 shows activity in auditory cortex during the encoding, maintenance, and retrieval phases of the WM task. During encoding (Fig. 2A, first row), significant activity was observed all along the mediolateral extent of HG and PT bilaterally [peak at MNI coordinates (x, y, z) 52, −32, 14; t(15) = 12.63]. During maintenance (Fig. 2A, middle row), significant activity in the auditory cortex was mostly confined to PT bilaterally: of all the voxels that survived threshold, 84.42% were located in the PT. The percentages in the medial, middle, and lateral part of HG were 3.19%, 5.59%, and 6.79%, respectively. The peak of activity during maintenance was observed at coordinates −58, −44, 16 (t(15) = 12.34). In the retrieval phase (Fig. 2A, bottom row), bilateral activity in HG and PT was observed (peak at coordinates 70, −26, 10; t(15) = 10.83). No suppression of activity was observed in auditory cortex during any of the three phases of the task.
Region-of-interest analysis that averages activity across an anatomically defined region was performed for both HG and PT. The HG was partitioned into the following three divisions: medial, middle, and lateral, based on cytoarchitectonic probabilistic maps (Morosan et al., 2001). Figure 2B shows the results of ROI analysis during encoding, maintenance, and retrieval. In the maintenance phase, activity was significant in all the divisions of HG in both hemispheres [left HG: medial (t(15) = 4.36, p < 0.001), middle (t(15) = 4.37, p < 0.001), lateral (t(15) = 4.76, p < 0.001); right HG: medial (t(15) = 3.55, p = 0.003), middle (t(15) = 3.67, p = 0.002), lateral (t(15) = 4.36, p < 0.001)] and PT (left: t(15) = 7.76, p < 0.001; right: t(15) = 6.53, p < 0.001). A two-way repeated-measures ANOVA for maintenance data with hemisphere (left and right) and ROI (medial HG, middle HG, lateral HG and PT) as factors showed a main effect of ROI (F(3,45) = 15.70, p < 0.001), no main effect of hemisphere (F(1,15) = 3.96, p = 0.065), and no interaction (F(3,45) = 1.16, p = 0.33). Post hoc comparison showed that activity in lateral HG was greater than in medial (p = 0.002) and middle (p < 0.001) parts of HG. Activity in PT was also greater than that in medial HG (p < 0.001) and middle HG (p < 0.05). Activity in lateral HG was not significantly different from activity in PT.
To further confirm the activation of auditory cortex during the maintenance phase, we extracted BOLD time series from each of the ROIs (for details, see Materials and Methods). The average time series (across all voxels within an ROI) in PT and in each of the three divisions of HG for both hemispheres are plotted in Figure 2C. The figure shows positive BOLD activity with respect to the rest baseline condition throughout the maintenance phase in all of the defined ROIs of auditory cortex.
Figure 3A shows activity in hippocampus (coronal slices from anterior to posterior) during all phases of the task. Significant activity was observed during encoding (peak at MNI coordinates 18, −30, −6; t(15) = 11.65), maintenance (peak at MNI coordinates −10, −34, −8; t(15) = 7.62), and retrieval (peak at MNI coordinates −18, −26, −10; t(15) = 10.85). On comparing the activity during encoding and retrieval, we observed greater activity for encoding in the anterior hippocampus (encoding > retrieval; peak at MNI coordinates 26, −10, −16; t(15) = 3.23) and greater activity for retrieval in the posterior hippocampus (retrieval > encoding; peak at −16, −26, −12; t(15) = 7.51).
In light of the debate about gradations of functions along the anterior–posterior axis of hippocampus (Lepage et al., 1998; Greicius et al., 2003; Poppenk et al., 2013; Strange et al., 2014), we tested for systematic variations of activity along this axis during the three phases of the WM task. For this purpose, the average activity of all voxels within hippocampus was calculated at all y-coordinates ranging from anterior to posterior (y = −8 to −38) with a 2 mm resolution. Plots of this activity for encoding, maintenance, and retrieval for both left and right hippocampi are shown in Figure 3B. There was a gradual increase in activity from approximately the middle of the axis (y = −20) to the posterior end of the axis during retrieval in both hippocampi. A similar increase (more in the right than in the left hippocampus) was also observed during the encoding phase. During the maintenance phase, however, activity was almost constant throughout the anterior–posterior axis.
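The slice-wise averaging along the anterior–posterior axis can be sketched as below. The input format (a list of voxel coordinates in MNI millimetres with one activity value each) and the function name are assumptions made for illustration; the example values are not data from the study.

```python
from collections import defaultdict

def profile_along_y(voxels, y_min=-38, y_max=-8):
    """voxels: iterable of (x, y, z, value) in MNI mm. Returns {y: mean value}
    for each coronal position in [y_min, y_max], ordered anterior to posterior."""
    by_y = defaultdict(list)
    for _x, y, _z, value in voxels:
        if y_min <= y <= y_max:
            by_y[y].append(value)
    return {y: sum(v) / len(v) for y, v in sorted(by_y.items(), reverse=True)}

# Hypothetical voxels; the last one lies outside the analysed range and is ignored.
voxels = [(-20, -8, -20, 1.0), (-22, -8, -18, 3.0),
          (-18, -38, -4, 2.0), (-18, -40, -4, 9.0)]
profile = profile_along_y(voxels)
```

With 2 mm voxels, the keys of the returned dictionary step in 2 mm increments from y = −8 to y = −38, matching the resolution described in the text.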
Figure 3C shows time series plots of BOLD activity from peak voxels in the anterior (left: −14, −12, −20; right: 28, −10, −28) and posterior (left: −10, −34, −8; right: 14, −30, −8) hippocampus in both hemispheres.
Inferior frontal gyrus
Figure 4Ai shows significant activity in the left IFG (peak at MNI coordinates −50, 6, 0; t(15) = 15.11) and the right IFG (peak at MNI coordinates 52, 8, 0; t(15) = 10.16) during the maintenance phase. Plots of β values (Fig. 4Aii) and time series (Fig. 4Aiii) at the peak coordinates show that both left and right IFG respond not only in the maintenance phase but also to the encoding and retrieval phases of the task. However, a part of the right IFG that is more dorsal and anterior (Fig. 4Bi) responds only during the retrieval phase (peak at MNI coordinates 46, 26, 6; t(15) = 7.46) but does not respond during encoding or maintenance. Parameter estimates at this peak coordinate are shown in Figure 4Bii, and the corresponding time series is shown in Figure 4Biii.
Correlation of univariate data with behavior
Behavioral accuracy performance of subjects varied considerably from 67% to 98%. We, therefore, assessed the correlation between performance and brain activity in all three stages of the task. Significant correlation between the behavioral performance and BOLD activity was observed in a number of brain areas (Fig. 5).
During encoding, a positive correlation was observed in the right temporoparietal junction (rTPJ; peak at MNI coordinates 64, −38, 30; t(15) = 4.85; Fig. 5Ai), left anterior superior temporal gyrus (STG)/superior temporal sulcus (STS; peak at MNI coordinates −48, −14, −8; t(15) = 4.47, Fig. 5Aii), and right anterior STS (peak at MNI coordinates 66, −8, −8; t(15) = 4.58).
During the maintenance phase, a positive correlation in parietal cortex (peak at MNI coordinates 52, −44, 54; t(15) = 4.77; Fig. 5Bi) was observed. In addition, negative correlation was observed in ventromedial prefrontal cortex (peak at MNI coordinates 8, 36, −10; t(15) = 6.26; Fig. 5Bii) and posterior cingulate, which collectively form part of what is known as the default mode network (DMN; Buckner et al., 2008). Negative correlation of neural activity with behavioral performance was also obtained in left posterior hippocampus (peak at MNI coordinates −24, −24, −10; t(15) = 7.16; Fig. 5Biii), right anterior hippocampus (peak at MNI coordinates 30, −10, −24; t(15) = 5.59), and right posterior hippocampus (peak at MNI coordinates 18, −32, −4; t(15) = 6.05). In the frontal region, activity in the right inferior frontal junction (IFJ) was negatively correlated with performance (peak at MNI coordinates 48, 14, 24; t(15) = 13.42; Fig. 5Biv).
During the retrieval phase, a negative correlation was observed in retrosplenial cortex (peak at MNI coordinates −2, −52, 2; t(15) = 7.57; Fig. 5Ci).
Multivoxel pattern analysis
In the task, subjects were cued to maintain one of the two tones in memory, which could be a low or high tone. Using standard univariate (GLM) analysis, no difference in activity during the maintenance period was found for retention of a low versus a high tone. We used MVPA to test whether patterns of activity during the maintenance period could distinguish between low and high tones. Five ROIs (HG, PT, hippocampus, left IFG, and right IFG) were chosen. For HG, PT, and hippocampus, the ROIs were bilateral, whereas for IFG we divided the ROI into left and right ROIs because (1) the right, but not the left, has been implicated in AWM before (Zatorre et al., 1994); and (2) the left, but not the right, has been shown to be involved in rehearsal during maintenance periods of AWM (Koelsch et al., 2009). For the left and right IFG, standard templates of BAs 44 and 45 defined the ROIs.
Performance of the classifier is shown in Figure 6. Only two regions, HG and left IFG, showed an above-chance level classification at the p < 0.05 level (HG: accuracy = 55.61%, t(12) = 3.82, p = 0.001; PT: accuracy = 49.08%, t(12) = −0.51, p = 0.69; hippocampus: accuracy = 49.26%, t(12) = −0.38, p = 0.64; left IFG: accuracy = 54.25%, t(12) = 2.41, p = 0.016; right IFG: accuracy = 48.87%, t(12) = −0.51, p = 0.69). For HG, 9 of 13 subjects performed above the chance level, 3 performed at chance level, and the performance of 1 subject was below the chance level. For the left IFG, eight subjects were above the chance level and five subjects performed below the chance level.
To make sure that the statistical significance of our MVPA results was not biased by the assumptions of the parametric t test, we also evaluated the statistical significance using a nonparametric test for MVPA as proposed in the study by Stelzer et al. (2013). Briefly, we first randomized the target labels and ran the MVPA for each subject 200 times. Then, for each subject, one accuracy value (of the 200 above) was picked at random, and the picked values were averaged across subjects. This procedure was repeated 100,000 times to obtain 100,000 accuracy values, which represent the null distribution for group accuracy. The p value for the group accuracy with correct labels was then determined based on this null distribution. Using this procedure, the p values obtained for HG and left IFG were 0.0013 and 0.015, respectively. These values are almost identical to the p values using the t test.
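The group-level permutation procedure can be sketched as follows. The per-subject permuted accuracies and the observed group accuracy below are synthetic placeholders, not data from the study; the number of draws is reduced from 100,000 to keep the sketch fast; and the p value is taken as the simple proportion of null values at least as large as the observed accuracy, which is an assumption about the exact formula.

```python
import random

def group_null_distribution(perm_accuracies, n_draws, rng):
    """perm_accuracies[s] holds one subject's accuracies under shuffled labels.
    Each draw picks one value per subject at random and averages across subjects."""
    n_subjects = len(perm_accuracies)
    return [sum(rng.choice(subj) for subj in perm_accuracies) / n_subjects
            for _ in range(n_draws)]

def permutation_p_value(observed, null):
    """Proportion of null group accuracies at least as large as the observed one."""
    return sum(a >= observed for a in null) / len(null)

rng = random.Random(1)  # fixed seed only so the sketch is reproducible
# Synthetic placeholder: 13 subjects x 200 permuted accuracies centred on chance.
perm_accuracies = [[rng.gauss(0.5, 0.05) for _ in range(200)] for _ in range(13)]
null = group_null_distribution(perm_accuracies, n_draws=10000, rng=rng)
p_value = permutation_p_value(0.55, null)  # 0.55 is a hypothetical group accuracy
```

Because the null is built from group means, it is much narrower than any single subject's permutation distribution, which is what allows a modest group accuracy to reach significance.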
Functional connectivity analysis
To test how specific representations are kept active in auditory cortex during the maintenance phase, we analyzed functional connectivity using psychophysiological interactions with hippocampus and frontal areas. The auditory cortex was partitioned into two ROIs (HG and PT). Also, given the debate about the differential function of anterior and posterior hippocampus (Lepage et al., 1998; Greicius et al., 2003), we divided the hippocampus ROI into anterior and posterior parts. From the frontal region, we chose IFG. The following (bilateral) regions were therefore included in the analysis: HG, PT, anterior hippocampus, posterior hippocampus, and IFG. A seed was placed in each of these regions, and functional connectivity to the remaining regions was computed. Figure 7 shows the results of the connectivity analyses, which are summarized below.
Connectivity during encoding compared with maintenance
(1) There is no long-range connectivity of the auditory cortex; both HG and PT in each hemisphere are connected to each other, but there is no connectivity of these regions outside the auditory cortex (Fig. 7i).
Connectivity during maintenance compared with encoding
(2) Auditory cortex is strongly connected to the right hippocampus (right HG and right PT are connected to the right posterior hippocampus, and left PT is connected to the right anterior hippocampus). (3) Auditory cortex is strongly connected to left IFG; both HG and PT in each hemisphere are connected to left IFG. (4) Right auditory cortex is connected to right IFG; HG and PT in the right hemisphere only are connected to the right IFG (Fig. 7i).
Connectivity during retrieval compared with maintenance
(5) There was no significant difference (in either direction) in the connectivity during the maintenance phase compared with the retrieval phase.
Connectivity during retrieval compared with encoding
(6) For retrieval > encoding, auditory cortex is strongly connected to both IFG and hippocampus; both HG and PT receive stronger connections from IFG and hippocampi in both hemispheres (Fig. 7ii). (7) For encoding > retrieval contrast, no significant difference was observed.
We observed sustained activation of the auditory cortex during the maintenance period, which is in contrast to findings from Linke et al. (2011), who observed suppression during maintenance. This might be attributable to differences in the task in the two studies. While Linke et al. (2011) required subjects to maintain two tones from different categories without any cue before or after the tones, participants in the present study were explicitly instructed to maintain one of the two tones after the tones were presented (retro-cue). Behavioral studies (Matsukura et al., 2007; Pertzov et al., 2013) show that, without any cue, items maintained in memory tend to be forgotten rapidly, but a selective retro-cue leads to protection of the cued item from temporal decay during the maintenance period. It is also known that maintaining multiple items, without a selective cue, leads to diffuse attention (Makovski and Jiang, 2007) and competition for memory resources among the items, which suppress each other's representation (Bahcall and Kowler, 1999). The suppression in activity during the maintenance period observed by Linke et al. (2011) could therefore be driven by such a competition between the representations of the tones. The later study of Linke and Cusack (2015), which required maintenance of a single sound in WM, showed activation during maintenance, like the present study.
The specificity of auditory cortex activity during the maintenance period was examined using MVPA, which showed that patterns of activity in HG, but not in PT, reliably encoded whether subjects were maintaining a low tone or a high tone during the delay period. Possible explanations for this finding include the idea that the representation in PT is in the form of high-level symbolic representations or “templates” (Griffiths and Warren, 2002) as opposed to activity in HG that might more closely match the sensory pattern of the stimulus. Templates in PT might be more removed from the stimulus sensory structure and harder to disambiguate based on blood flow patterns. In any event, the work is congruent with visual studies (Lebedev et al., 2004; Serences et al., 2009; Riggall and Postle, 2012) showing subthreshold activity during WM that is content specific.
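The logic of such a pattern analysis can be illustrated with a minimal sketch. It is not the classifier or pipeline used in this study: the nearest-centroid rule, the leave-one-run-out scheme, the function names, and the simulated voxel patterns are all assumptions chosen for illustration. The sketch asks whether the pattern for a held-out run lies closer to the mean "low tone" or the mean "high tone" pattern estimated from the remaining runs.

```python
import random


def euclidean_sq(a, b):
    """Squared Euclidean distance between two voxel patterns."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(patterns):
    """Voxel-wise mean of a list of patterns."""
    n = len(patterns)
    return [sum(col) / n for col in zip(*patterns)]


def leave_one_run_out_accuracy(patterns, labels, runs):
    """Nearest-centroid classification, holding out one run at a time."""
    correct = 0
    for held_out in sorted(set(runs)):
        train = [(p, l) for p, l, r in zip(patterns, labels, runs) if r != held_out]
        test = [(p, l) for p, l, r in zip(patterns, labels, runs) if r == held_out]
        # One centroid per condition, estimated from training runs only.
        centroids = {
            lab: centroid([p for p, l in train if l == lab])
            for lab in sorted(set(l for _, l in train))
        }
        for p, l in test:
            predicted = min(centroids, key=lambda lab: euclidean_sq(p, centroids[lab]))
            correct += predicted == l
    return correct / len(patterns)


def simulate_roi(n_runs=6, n_voxels=20, noise_sd=0.5, seed=0):
    """Hypothetical ROI data: one delay-period pattern per tone and run."""
    rng = random.Random(seed)
    half = n_voxels // 2
    # Invented tone-specific templates; real patterns are far subtler.
    templates = {
        "low": [1.0] * half + [0.0] * (n_voxels - half),
        "high": [0.0] * half + [1.0] * (n_voxels - half),
    }
    patterns, labels, runs = [], [], []
    for run in range(n_runs):
        for lab, tmpl in templates.items():
            patterns.append([v + rng.gauss(0.0, noise_sd) for v in tmpl])
            labels.append(lab)
            runs.append(run)
    return patterns, labels, runs


patterns, labels, runs = simulate_roi()
accuracy = leave_one_run_out_accuracy(patterns, labels, runs)
print(f"cross-validated accuracy: {accuracy:.2f} (chance = 0.50)")
```

Accuracy reliably above the 0.50 chance level indicates that the delay-period pattern carries information about the maintained tone. Holding out whole runs keeps the training and test patterns independent.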
The role of hippocampus in WM is controversial (Ranganath and Blumenfeld, 2005; Graham et al., 2010; Jeneson and Squire, 2011). While one school argues for a fundamental role of hippocampus in WM (Graham et al., 2010), the other school (Jeneson and Squire, 2011) maintains that the primary role of hippocampus is in LTM alone and argues that activity in hippocampus during WM is observed only when either novel stimuli are maintained for long periods of time or the number of items maintained exceeds WM capacity. In these conditions, according to this model, it is difficult to focus attention on the item or items to be remembered, and the deviation in attention causes loss of the item or items from the current focus of attention. To re-engage attention on the item to be remembered, the item is then recalled from LTM. Models of WM (Cowan, 1995; Oberauer, 2009) also posit two separate components for items that are under the focus of attention and items that are active but outside the focus of attention. The latter component constitutes an “activated LTM” component of WM (Cowan, 1995). In our study, we used tones that changed across trials and were maintained for long intervals of time (16 s). It is, therefore, likely that subjects relied on LTM to perform the task, activating hippocampus in the process. Activity in hippocampus in that case would form part of the activated LTM component of WM.
To further examine the role of hippocampus in WM, we evaluated the correlation between behavioral performance and hippocampus activity during the maintenance period. In light of the proposal by Jeneson and Squire (2011), we predicted that subjects who could not maintain a sustained focus of attention on the tone to be maintained would rely more on LTM, and, therefore, hippocampus would be activated to a greater extent in these subjects. These subjects, however, would perform poorly on the task because recalling the tone from LTM would entail proactive interference from tones that were presented on previous trials, which are also held in LTM; as per the model of Cowan (1995), these tones are held in the activated part of LTM. A negative correlation between hippocampus activity and behavioral performance is thus expected. This is exactly what we observed: activity in both left and right hippocampi increased with poorer performance. Furthermore, we also observed a negative correlation with behavior during recall in retrosplenial cortex, which has been known to be involved in retrieval of long-term memories (Vann et al., 2009; Rugg and Vilberg, 2013; Kumar et al., 2014a). This is also consistent with retrieval of LTM during task performance.
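The predicted relationship can be made concrete with a short sketch of an across-subject correlation. The per-subject values below are invented for illustration and are not data from this study; the variable names and the use of a plain Pearson coefficient are likewise assumptions, not a description of the statistical model actually fitted.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values, one entry per subject (NOT data from the study):
# mean hippocampal response during maintenance and task accuracy.
hippocampal_response = [0.10, 0.25, 0.30, 0.45, 0.50, 0.65, 0.70, 0.90]
task_accuracy = [0.95, 0.92, 0.88, 0.85, 0.80, 0.78, 0.70, 0.66]

r = pearson_r(hippocampal_response, task_accuracy)
print(f"r = {r:.2f}")
```

A negative coefficient corresponds to the pattern described above, in which subjects with greater hippocampal activity during maintenance perform worse on the task.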
We examined the variation of activity along the anterior–posterior axis of hippocampus (Poppenk et al., 2013; Strange et al., 2014). While there is almost constant activity all along the axis during the maintenance period, the activity for encoding and recall is dominant on the posterior end of the axis. The activity starts building from y = −20 mm, and then there is a striking increase in activation as the far posterior end is approached. The starting point of buildup closely agrees with the anatomical boundary between anterior and posterior hippocampus suggested by other studies (Poppenk et al., 2013). In our study, auditory stimuli were presented both during the encoding and retrieval phases of the task. One possible interpretation is that the posterior hippocampus, compared with the anterior hippocampus, is more involved in the analysis of auditory stimuli in real time, during perception. This in turn predicts greater functional and structural connectivity between auditory cortex and the posterior hippocampus. Although structural connectivity between auditory cortex and hippocampus in humans is not completely understood, tract-tracing studies in primates suggest (Munoz-Lopez et al., 2010) that auditory cortex projects directly to parahippocampal cortex, which is known to project preferentially to posterior hippocampus (Aggleton, 2012).
Inferior frontal gyrus
Significant activity was observed in the LIFG, which lasted throughout the maintenance period. The LIFG has been implicated in a number of WM studies that required maintenance of phonological information (Paulesu et al., 1993; Awh et al., 1996; Strand et al., 2008). One role of LIFG in these studies is thought to be covert articulatory rehearsal, which keeps representations in the active state. However, this role of LIFG is not restricted to phonological WM alone but also extends to rehearsal of pitch (Koelsch et al., 2009). Although we did not explicitly instruct the subjects to follow any particular strategy in our study, the pitch values of the low tone were within the vocal range and could therefore have been rehearsed, which may underlie the LIFG activity. Activation of motor areas during maintenance (Table 1) also supports the possibility of rehearsal being used during the maintenance period. It should, however, be noted that the role of motor areas in working memory is still controversial, as studies using nonrehearsable stimuli also show activity in motor areas during WM (Liao et al., 2014).
In the RIFG, we observed two foci of activation, one more ventral and posterior (vpRIFG), and the other more dorsal and anterior (daRIFG). The vpRIFG showed significant activity in all phases of the WM task, whereas the daRIFG was active only during the retrieval phase. Although RIFG has been shown to be involved in a number of WM studies (Zatorre et al., 1994; Shivde and Thompson-Schill, 2004), a dominant view of the functioning of RIFG is that it actively inhibits the motor response to prepotent stimuli (Aron et al., 2014). Although a complete reconciliation of RIFG involvement in both inhibition and WM has yet to be made, one suggestion is that processes involved in inhibition, such as resistance to distractions, are useful in WM (Roberts and Pennington, 1996). Consistent with this idea, evidence from a study (McNab et al., 2008) using both tasks suggests a common focus of activation in RIFG for both inhibition and WM. This focus of activation matches the vpRIFG focus in our study. Furthermore, patients with lesions in prefrontal cortex are also known to be impaired in ignoring distractors during the maintenance period (Chao and Knight, 1998). Regarding daRIFG, which is active only during the retrieval phase, we cannot determine whether the activity is related to the motor response or to retrieval of WM information.
Is the sustained activity in LIFG and RIFG specific to the information held in WM? Application of MVPA to these ROIs showed that, while patterns of activity in the LIFG could distinguish which of the two tones was held in WM, the RIFG could not. LIFG is well known to be involved in speech production. In a recent study in which subjects were asked to repeat a spoken word, Flinker et al. (2015) showed that Broca's area transforms sensory representation of the word in the auditory cortex to articulatory representations, which are then passed on to motor cortex. The content-specific representations demonstrated during the maintenance period in our study might reflect distinct articulatory representations of tones.
Correlation with behavior
During the encoding phase, positive correlations between behavioral performance accuracy and activity in rTPJ and left STS were observed. The rTPJ is a part of the ventral attentional network that reorients attention toward task-relevant objects (Corbetta and Shulman, 2002) or contextually updates the internal model in the event of behaviorally relevant perceptual input (Geng and Vossel, 2013). One likely interpretation of the positive correlation in rTPJ, therefore, is that subjects who paid greater attention to the stimuli during encoding or updated the representations of the stimuli (tones changed across trials and sessions) performed better than those who did not. Left STS activity has been shown in the categorical perception of speech sounds (Liebenthal et al., 2005) and musical intervals (Klein and Zatorre, 2015). Greater activity in the left STS for subjects who performed better in the task may therefore reflect that these subjects encoded the two tones as distinct categories more reliably than subjects with a lower performance.
During the maintenance period, we observed a positive correlation between behavioral performance and activity in a part of parietal cortex that has been shown to be involved in maintaining sustained attention (Foucher et al., 2004). Subjects who could sustain attention during the maintenance period performed better on the task. A negative correlation was observed between behavioral performance and activity in a set of areas that together comprise the DMN. The DMN is known to be active during mind wandering (Mason et al., 2007). Greater activation in this network therefore implies more mind wandering during the maintenance period and, consequently, poorer performance. A negative correlation was also observed in the right IFJ. Evidence (Roth and Courtney, 2007) shows that IFJ is involved in updating the contents of WM from LTM. The subjects who used LTM to perform the task activated the IFJ more but (as explained above in the case of hippocampus) performed poorly, which explains the negative correlation observed in IFJ.
During the maintenance period, we observed long-range connectivity of the auditory cortex to hippocampus and IFG, which, putatively, keeps representations active in the auditory cortex. Specifically, connectivity of right hippocampus (both anterior and posterior) to auditory cortex is consistent with its role in keeping representations active in auditory cortex by recalling from the activated LTM in the event of deviation from the focus of attention (see above). Connectivity of LIFG with the auditory cortex during the maintenance period is consistent with the role of LIFG in keeping the representations active in auditory cortex by subvocal rehearsal of the tones. Interestingly, both LIFG and HG have content-specific representations. Our data, therefore, suggest that HG and LIFG, possibly via PT, form a closed loop where tone-specific representations in the LIFG initiate subvocal rehearsal, which activates the auditory cortex. Functional connectivity of auditory cortex to the RIFG was also observed during the maintenance period. The importance of both structural and functional connectivity between auditory cortex and RIFG for the normal perception of a sequence of tones has been shown in studies of disorders of music perception (compare with amusia; Albouy et al., 2013). Since listening to a sequence of tones requires a memory component capable of storing pitch information so as to integrate pitch across the sequence, the results of our study combined with those of other studies (Zatorre et al., 1994; Koelsch et al., 2009; Albouy et al., 2013) clearly point toward a role of RIFG in the maintenance of auditory information. Consistent with this, auditory WM training has been shown to increase the efficiency of neural processes in RIFG (Schneiders et al., 2012).
In summary, our data point to a system for WM in which content-specific representations in the auditory cortex are kept active by its remote connectivity with hippocampus and frontal areas.
This work was supported by Wellcome Trust Grant WT091681MA.
The authors declare no competing financial interests.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution and reproduction in any medium provided that the original work is properly attributed.
Correspondence should be addressed to Sukhbinder Kumar, Auditory Group, Institute of Neuroscience, Medical School, Newcastle University, Newcastle upon Tyne NE2 4HH, UK.
This article is freely available online through the J Neurosci Author Open Choice option.