Abstract
Short-term memory (STM), the brief maintenance of information in the absence of external stimulation, is central to higher-level cognition. Behavioral and neural data indicate that information maintained in STM can be represented in qualitatively distinct states. These states include a single chunk held in the focus of attention available for immediate processing (the “focus”), a capacity-limited set of additional actively maintained items that the focus can access (the “active state”), and passively maintained items (the “passive state”). Little is known about how information is shifted among these states. Here, we used fMRI in humans to examine the neural correlates of shifting information among representational states of STM. We used a paradigm that has demonstrated dissociable performance costs associated with shifting the focus among active items and switching sets of items between active and passive states. Behavioral results confirmed distinct behavioral costs associated with different representational states. Neural results indicated that the caudal superior frontal sulcus (cSFS), in the vicinity of the frontal eye fields, was associated with shifting the focus, consistent with the role of this region in internal and external attention. By contrast, the ventral premotor cortex (PMv) was associated with shifting between active and passive states. Increased cSFS-medial temporal lobe (MTL) connectivity was associated with shifting the focus, while cSFS-MTL connectivity was disrupted when the active state was changed. By contrast, PMv–MTL connectivity increased when the active state was switched. These data indicate that dissociable frontal–MTL interactions mediate shifts of information among different representational states in STM.
Introduction
Short-term memory (STM) refers to the brief maintenance of information in the absence of external stimulation for use in ongoing cognition. STM forms the workspace for higher-level cognition, providing the representational input for comprehension, problem solving, and reasoning. This is evidenced by the strong relationship between the amount of information that can be held in STM and prowess in various higher-level cognitive skills (Carpenter et al., 1990; Just and Carpenter, 1992; Jaeggi et al., 2008; Fukuda et al., 2010). Thus, understanding STM has widespread implications for cognition.
Recent data have provided evidence that STM is composed of multiple distinct states. Although several items can be actively held in STM, attention can be drawn to particular items to improve their saliency. Focusing attention on items in STM enhances fMRI signal in areas known to represent those items (Lepsien and Nobre, 2007; Higo et al., 2011; Lewis-Peacock et al., 2012; LaRocque et al., 2013) while facilitating decisions based upon attended items (Garavan, 1998). Such attentional focusing in STM has been associated with dorsal frontal and parietal areas (Garavan et al., 2000; Bledowski et al., 2009) that are also engaged in attending to external stimuli (Tamber-Rosenau et al., 2011). Furthermore, actively maintained items that are not in the focus of attention can be distinguished from items that are passively maintained (Oberauer, 2002, 2005). Recognition decisions based on actively maintained items involve the medial temporal lobe (MTL), whereas recognition decisions based on passively maintained items involve ventrolateral prefrontal cortex (PFC) (Nee and Jonides, 2011, 2013a). Together, these data indicate that at least three representational states exist in STM: focused, active, and passive, each with distinct neural signatures.
Although there is mounting evidence for multiple states in STM, the underlying neural mechanisms remain unclear. In particular, little is known about how information is dynamically transitioned among states to perform complex cognitive tasks. To explore this matter, we adapted a task that has provided behavioral signatures of different states of information representation (Oberauer, 2002, 2005). The task uses cues to direct shifts of the focus of attention among items in STM, as well as cues that direct which sets of items are held actively versus passively (Fig. 1). We anticipated that dorsal frontal areas would mediate shifts of the focus of attention, consistent with their role in attention to both external and internal information (Bledowski et al., 2009; Tamber-Rosenau et al., 2011). Furthermore, we hypothesized that dorsal frontal areas would interact with the MTL. We have previously shown MTL involvement during the retrieval of information held in the active state of STM (Nee and Jonides, 2008, 2011, 2013a), which we have suggested reflects the bindings of items to contexts (Diana et al., 2007; Eichenbaum et al., 2007). So, contextual cues that direct the focus of attention through dorsal frontal mechanisms were hypothesized to retrieve context-associated items through their MTL-mediated bindings (Nee and Jonides, 2013b). Regions likely to mediate shifts of active states were less clear and were an exploratory matter.
Materials and Methods
Participants.
We report data from 26 right-handed participants (age 18–23 years; mean 19.9 years; 13 female). Informed consent was obtained for all subjects in accordance with the Institutional Review Board at the University of Michigan. Subjects received $20/h as well as a bonus for fast and accurate performance.
Materials and procedure.
The task (Fig. 1A) was designed to engage shifts among different representational states in STM. On each trial, participants were presented with two sets of digits. Each digit was presented in a colored frame arranged hexagonally around a central fixation cross. Frames served as a context for the digits to be held in STM. All frames were placed equidistantly from fixation. One set of digits was presented in red frames and the other in blue frames. The number of digits in each set was orthogonally varied between two and three items. During the encoding phase, participants committed all of the digits to STM. Encoding was followed by a retention interval, during which time the digits were removed from view and the frames were colored in black. Thereafter, a set cue indicated that one of the sets would be operated upon (set cue 1). The set cue consisted of restoring the color of the cued set (e.g., from black to red). The set cue was followed by an operation (operation 1.1) presented in one of the colored frames (e.g., “+2”). Participants were instructed to apply the operation to the digit that corresponded to the frame, respond with the solution, and update their STM with the result. This was followed by a second operation (operation 1.2). The entire postencoding sequence was then repeated (i.e., retention, set cue 2, operation 2.1, operation 2.2). Finally, participants were instructed to recall both sets of digits. Recall of each digit was prompted by a cursor placed in each frame in turn.
Responses were made via a custom-built MR-compatible number pad (www.natatech.com). The keypad consisted of the numbers 0–9 arranged in a layout identical to a standard keyboard number pad. The results of operations were restricted to the numbers 1–9 so that all responses involved a single key press. Participants were instructed to use the “0” key if they could not recall the digit that corresponded to a cued frame. Participants were asked to respond as quickly and accurately as possible. Reaction time and accuracy data were recorded.
The task included two kinds of shifts: (1) A focus shift when the second operation of a series (e.g., operation 1.2) was performed on a different digit than the first (e.g., operation 1.1; focus switch). This was compared with cases in which both operations were performed on the same item (focus repeat). (2) A set shift when set cue 2 indicated a different set than set cue 1 (active switch). This was compared with cases in which set cue 2 indicated the same set as set cue 1 (active repeat).
To ensure that switch events were not confounded with eye movements, participants were instructed to maintain central fixation before and after each operation event. Because participants had difficulty distinguishing the digits and operations with peripheral attention, we allowed participants to move their eyes during encoding and operation events. Because each frame was situated equidistantly from central fixation, returning to central fixation before and after each operation event ensured that the same saccade distance was used for each operation event, so that activations could not be attributed to saccade magnitude. Eye-movement data were collected to confirm adherence to instructions.
The timing of events was as follows: the encoding period lasted 1 s per digit (i.e., 4–6 s). The retention interval was pseudo-randomly jittered between 4 and 6 s in equal steps of 1 s. The set cue was presented for 2 s and separated from the first operation by a 4–6 s interval, pseudo-randomly jittered in equal steps of 1 s (the set cue and the interval that followed are depicted as a single event in Fig. 1, but they were modeled separately as indicated below). Each operation was presented for 2 s separated from each other by 1 s. The recall period lasted 2 s per item. Finally, there was a 4 s intertrial interval. The task was divided into 6 runs of 8 trials each.
Within a week before scanning, participants completed a full session of the task outside of the scanner as practice. This procedure ensured that the participants understood the instructions and could make responses using a number pad without looking at the keys. Four participants were excluded from the fMRI session because of inability to perform the task. All data reported are from the 26 participants who completed both the practice and fMRI sessions. Only data from the fMRI session are reported because the prior session served as practice on the task.
Behavioral analysis.
Based on previous behavioral work with a related paradigm (Oberauer, 2002, 2005), we hypothesized that the task would involve three representational states (Fig. 1B). First, the cued set is hypothesized to be held in an active state in that any member of that set could be a candidate for a future operation. Second, the uncued set is hypothesized to be held in a passive state. Third, items that are the objects of operation are hypothesized to be the focus of attention. These hypotheses were tested via the following behavioral comparisons: (1) Based on evidence that scanning in STM is slowed by the number of items maintained (Sternberg, 1966), we expected that the size of the active set should influence reaction times, but the size of the passive set should not. These set-size effects were measured on all operation events. (2) We further hypothesized that switching the active set (active switch) should incur a cost relative to repeating the same active set (active repeat). This active switch cost was measured on operation 2.1 (i.e., the operation that followed an active switch/repeat). We also examined whether the active switch cost persisted into operation 2.2. (3) We hypothesized that switching the focus of attention (focus switch) should incur a cost relative to repeated processing of the same item (focus repeat). This cost was measured on the second operation of each series (i.e., operation 1.2 and operation 2.2). (4) Finally, we hypothesized that the focus switch-cost would interact with the size of the active set, but not the passive set. This is because only items in the active set should compete for the focus of attention.
Data of main interest were reaction times. Because of technical issues with the keypad, error data were not always a faithful reflection of STM. This was largely the result of the propensity for keys to “stick” so that the response device continued to send key press events even after the participants had stopped pressing a key. As a result, the number of errors was inflated. Nevertheless, to avoid potential contamination from true errors, reaction times were analyzed only for correct responses. Furthermore, reaction times for the second operation of a series (i.e., operation 1.2 and operation 2.2) were analyzed only on trials in which the first operation was correct.
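The trial-selection rule described above can be illustrated with a minimal sketch. It assumes a hypothetical per-trial record layout (the keys `condition`, `first_ok`, `second_ok`, and `second_rt` are illustrative and do not come from the original analysis code) and computes a focus switch cost from second-operation reaction times, keeping only series in which both operations were correct.

```python
from statistics import mean


def focus_switch_cost(trials):
    """Return mean RT(focus switch) minus mean RT(focus repeat), in ms.

    `trials` is a list of dicts with hypothetical keys:
      'condition' : 'switch' or 'repeat' (second operation vs. first)
      'first_ok'  : True if the first operation of the series was correct
      'second_ok' : True if the second operation was correct
      'second_rt' : reaction time (ms) of the second operation
    """
    def valid_rts(condition):
        # Keep correct second operations that followed a correct first operation.
        return [t['second_rt'] for t in trials
                if t['condition'] == condition and t['first_ok'] and t['second_ok']]

    return mean(valid_rts('switch')) - mean(valid_rts('repeat'))


# Made-up illustrative data (not from the study).
example = [
    {'condition': 'switch', 'first_ok': True,  'second_ok': True,  'second_rt': 1200.0},
    {'condition': 'switch', 'first_ok': False, 'second_ok': True,  'second_rt': 3000.0},  # excluded
    {'condition': 'switch', 'first_ok': True,  'second_ok': True,  'second_rt': 1300.0},
    {'condition': 'repeat', 'first_ok': True,  'second_ok': True,  'second_rt': 1100.0},
    {'condition': 'repeat', 'first_ok': True,  'second_ok': False, 'second_rt': 900.0},   # excluded
    {'condition': 'repeat', 'first_ok': True,  'second_ok': True,  'second_rt': 1150.0},
]
cost = focus_switch_cost(example)
```

With these made-up values, the two excluded trials do not contribute, so the cost reflects only series in which both operations were answered correctly.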
Saccade latencies and magnitudes were calculated using the GazeAlyze toolbox (Berger et al., 2012) implemented in MATLAB (MathWorks). Preprocessing was performed using the ILAB toolbox (Gitelman, 2002). Blinks were removed from the data, and the data were smoothed using a 4-point running average. Saccades produced faster than 90 ms or >500 ms after stimulus onset, and saccades to regions outside of the stimulus field of view were ignored. Different censoring procedures led to similar results as those reported here. Data corruption precluded the analysis of saccade data for 2 subjects.
Image acquisition and preprocessing.
Images were acquired on a GE Signa 3T scanner equipped with a 4-channel head coil. Head movement was minimized using foam padding and a cloth restraint strapped across participants' foreheads. Experimental tasks were presented using E-Prime software version 2.0 (Psychology Software Tools). Eye tracking was performed using ViewPoint (Arrington Research). Because of technical issues, eye tracking could not be performed on 2 subjects.
Functional T2*-weighted images were acquired using a spiral sequence with 43 contiguous slices with 3.44 × 3.44 × 3 mm voxels (repetition time, or TR = 2000 ms; echo time, or TE = 30 ms; flip angle = 90°; field of view, or FOV = 220 mm²). A T1-weighted gradient-echo anatomical overlay was acquired using the same FOV and slices (TR = 250 ms, TE = 5.7 ms, flip angle = 90°) to improve coregistration between the high-resolution anatomical image and functional images. Additionally, a 124-slice high-resolution T1-weighted anatomical image was collected using spoiled-gradient-recalled acquisition in steady-state imaging (TR = 9 ms, TE = 1.8 ms, flip angle = 15°, FOV = 250–260 mm², slice thickness = 1.2 mm).
Functional data were spike-corrected to reduce the impact of artifacts using AFNI's 3dDespike (http://afni.nimh.nih.gov/afni). Subsequent processing and analyses were done using SPM5 (http://www.fil.ion.ucl.ac.uk/spm/). Functional images were corrected for differences in slice timing using sinc-interpolation and head movement using a least-squares approach and a 6-parameter rigid body spatial transformation. Structural data were coregistered to the functional data and segmented into gray and white-matter probability maps (Ashburner and Friston, 1997). These segmented images were used to calculate spatial normalization parameters to the MNI template; these were subsequently applied to the functional data. As part of spatial normalization, the data were resampled to 2 × 2 × 2 mm³. Eight-millimeter full-width/half-maximum isotropic Gaussian smoothing was applied to all functional images before analysis using SPM5. All analyses included a temporal high-pass filter (128 s), correction for temporal autocorrelation using an autoregressive AR(1) model, and each image was scaled to have a global mean intensity of 100. For participants demonstrating >3 mm/degrees of motion over the course of the session or a single movement of >0.5 mm/degrees in-between TRs, 24 motion regressors were included reflecting total displacement, squared total displacement, differential (TR-to-TR) displacement, and squared differential displacement to capture signal artifacts related to motion (Lund et al., 2005; Satterthwaite et al., 2013). Excluding these participants altogether yielded similar results.
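As an illustration of the 24-regressor motion expansion described above, the following sketch expands six realignment parameters per volume into displacement, squared displacement, TR-to-TR differential displacement, and squared differential displacement. The function names are hypothetical. Two points are assumptions rather than details from the text: the first volume's differential is set to zero, and "motion over the course of the session" is read as the range of each parameter.

```python
def expand_motion(params):
    """Expand T x 6 realignment parameters into T x 24 nuisance regressors.

    Column order per row: 6 displacements, 6 squared displacements,
    6 TR-to-TR differences, 6 squared differences.
    """
    expanded = []
    previous = None
    for row in params:
        # Differential displacement; assumed to be zero for the first volume.
        if previous is None:
            diff = [0.0] * len(row)
        else:
            diff = [cur - prev for cur, prev in zip(row, previous)]
        expanded.append(list(row) + [v * v for v in row] + diff + [d * d for d in diff])
        previous = row
    return expanded


def exceeds_motion_criteria(params, total_limit=3.0, step_limit=0.5):
    """True if any parameter's range exceeds `total_limit` (assumed reading of
    session-wide motion) or any TR-to-TR change exceeds `step_limit`."""
    columns = list(zip(*params))
    total = max(max(col) - min(col) for col in columns)
    step = max((abs(b - a) for col in columns for a, b in zip(col, col[1:])),
               default=0.0)
    return total > total_limit or step > step_limit


# Three volumes of made-up motion in the first translation parameter only.
params = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.25, 0.0, 0.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
]
regressors = expand_motion(params)
flagged = exceeds_motion_criteria(params)
```

In this toy example the jump from 0.25 to 1.0 between consecutive volumes exceeds the 0.5 criterion, so the participant would receive the expanded regressor set.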
Image analysis.
Our analyses were centered on four events of interest described above: focus switch, focus repeat, active switch, and active repeat. Each of these events was treated as an impulse and convolved with SPM's canonical hemodynamic response function. Given the complex design, numerous other regressors were included to capture signal associated with processes that were not of interest. These included the encoding phase, retention interval, set cue 1, recall phase, and the intervals in between set cues and the first operation events of a series. The first operation event of each series was not explicitly modeled because it was temporally separated from the second operation event by a fixed 1 s interval making the hemodynamic signal for the first and second operation events highly correlated. This short interval was found necessary during piloting to preserve the focus switch cost. Presumably, longer intervals allow attention to meander, thereby removing costs associated with switching the focus. Additionally, we included several modulators of the regressors described above. Encoding and retention intervals were modulated by the total number of items held in STM. Focus switch/repeat and active switch/repeat events included separate modulators for the active set size and passive set size. We also included modulators to capture signal associated with incorrect responses. As indicated above, many errors resulted from technical issues with the response device rather than true errors of STM. So, we chose to use a modulator to capture these events rather than to model errors separately to avoid inappropriately discarding trials and losing power. However, it should be noted that a model that did separately model errors produced qualitatively similar results to those reported here. Finally, modulators were included for retention intervals based on the proportion of items correctly recalled at the end of the trial.
We assessed two contrasts of interest: (1) focus switch > focus repeat and (2) active switch > active repeat. Contrasts were performed at the subject level, and contrast estimates for each subject were then submitted to a group analysis that treated subject as a random effect. These analyses were performed as one-sample t tests. Group whole-brain analyses were thresholded at p < 0.001 at the voxel level, with a 75-voxel cluster extent providing family-wise error correction according to simulations performed with AlphaSim. Targeted searches within the MTL were performed within masks, including the bilateral hippocampi and parahippocampal gyri as defined by the automated anatomical labeling atlas (Tzourio-Mazoyer et al., 2002). These searches were performed at p < 0.05 at the voxel level, with a 190-voxel cluster extent providing family-wise error correction according to AlphaSim.
Follow-up analyses were performed within unbiased ROIs created using a leave-1-subject-out procedure. For each subject, the second-level one-sample t tests (focus switch > focus repeat, active switch > active repeat) were reestimated with that subject held out; 6 mm spheres were then drawn around peak activations in the right caudal superior frontal sulcus for the focus switch > focus repeat contrast, and left ventral premotor cortex for the active switch > active repeat contrast. The left caudal superior frontal sulcus and right ventral premotor cortex were identified by flipping the sign of the x-coordinate of the ROIs described above. For each ROI, data from the held-out subject were extracted and the procedure was repeated for each subject. Contrasts of focus switch − focus repeat and active switch − active repeat were computed within each ROI with contrast estimates averaged across all voxels of the ROI. Similar results were obtained using caudal superior frontal sulcus ROIs based upon previous studies examining attention shifting in STM. The leave-1-subject-out procedure was chosen because of the lack of previous related literature localizing the ventral premotor cortex during shifts between active and passive states of STM.
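The logic of the leave-1-subject-out procedure can be sketched as follows. This is a simplified illustration rather than the analysis pipeline itself. It assumes that each subject's contrast map is a dictionary from MNI coordinates to contrast estimates, it takes the peak over all supplied voxels rather than within an anatomical region, and it omits the construction of the 6 mm sphere. The function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(values):
    """One-sample t statistic against zero."""
    return mean(values) / (stdev(values) / sqrt(len(values)))


def loso_peaks(contrasts):
    """For each subject, find the peak of the group t map estimated WITHOUT
    that subject, so the ROI is independent of the held-out subject's data.

    contrasts: {subject: {(x, y, z): contrast estimate}}
    returns:   {subject: (x, y, z) of the peak t value}
    """
    peaks = {}
    for held_out in contrasts:
        others = [c for s, c in contrasts.items() if s != held_out]
        t_map = {v: one_sample_t([c[v] for c in others]) for v in others[0]}
        peaks[held_out] = max(t_map, key=t_map.get)
    return peaks


def mirror_x(coord):
    """Flip the sign of the x-coordinate to obtain the homologous ROI center."""
    x, y, z = coord
    return (-x, y, z)


# Made-up contrast estimates at two voxels for four subjects.
A, B = (30, -6, 48), (40, 0, 40)
contrasts = {
    's1': {A: 2.0, B: 0.5},
    's2': {A: 1.5, B: 0.2},
    's3': {A: 2.5, B: -0.1},
    's4': {A: 1.0, B: 0.4},
}
peaks = loso_peaks(contrasts)
left_center = mirror_x(peaks['s1'])
```

Because the peak used for a given subject never depends on that subject's own data, contrast estimates extracted from the resulting ROI are not biased by the selection step.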
β series analysis.
Interactions between functional connectivity and task conditions were assessed using the β series method (Rissman et al., 2004). Each event of each trial was modeled with a separate regressor resulting in a separate parameter estimate (β) for each event. The model was based upon the model used for the univariate analyses with the following changes: parametric modulators were omitted and the duration of the regressor capturing the retention interval was shortened to decorrelate the regressor from the active switch/repeat events. Once again, the events of interest were focus switch, focus repeat, active switch, and active repeat. In this case, each event was associated with a series of β's rather than a single β.
Seeds were placed in frontal areas associated with focus switching (right caudal superior frontal sulcus: 30 −6 48) and active switching (left ventral premotor cortex: −60 4 30). Each seed consisted of a 6 mm sphere centered around the coordinate of maximal activation in the univariate analyses. For each condition, correlations were computed between the average activation across all voxels in the seed region and every other voxel in the brain. This resulted in one correlation map per condition per seed per subject. Correlation maps were transformed using an arc-hyperbolic tangent function to approximate a normal distribution. These transformed correlation maps were then used to calculate switch × state interactions (i.e., [focus switch − focus repeat] − [active switch − active repeat] and [active switch − active repeat] − [focus switch − focus repeat]). Contrasts were submitted to a group-level one-sample t test. Group whole-brain analyses were thresholded at p < 0.001 at the voxel level, with a 75-voxel cluster extent providing family-wise error correction according to simulations performed with AlphaSim. Targeted searches within the MTL were performed within masks, including the bilateral hippocampi and parahippocampal gyri as defined by the automated anatomical labeling atlas (Tzourio-Mazoyer et al., 2002). These searches were performed at p < 0.05 at the voxel level, with a 190-voxel cluster extent providing family-wise error correction according to AlphaSim.
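For a single seed-target pair, the β series computation described above reduces to a Pearson correlation across trial-wise β estimates, an arc-hyperbolic tangent (Fisher z) transform, and a difference of differences. The sketch below illustrates these steps with hypothetical function names and made-up β values; it is not the original implementation.

```python
from math import atanh, sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fisher_z(seed_betas, target_betas):
    """Correlate two beta series and apply the arc-hyperbolic tangent."""
    return atanh(pearson(seed_betas, target_betas))


def switch_by_state(z):
    """[focus switch - focus repeat] - [active switch - active repeat]."""
    return ((z['focus_switch'] - z['focus_repeat'])
            - (z['active_switch'] - z['active_repeat']))


# Made-up trial-wise betas for one condition (seed and one target voxel).
seed = [1.0, 2.0, 3.0]
target = [1.0, 2.0, 4.0]
r = pearson(seed, target)
z_value = fisher_z(seed, target)

# Made-up z values per condition for one subject.
interaction = switch_by_state({
    'focus_switch': 0.5, 'focus_repeat': 0.2,
    'active_switch': 0.1, 'active_repeat': 0.3,
})
```

The reversed interaction ([active switch − active repeat] − [focus switch − focus repeat]) is simply the negative of the value returned by `switch_by_state`.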
Further exploration of functional connectivity interactions was performed within ROIs based upon previous studies that have documented evidence for distinct representational states in STM. The right anterior hippocampus ROI was formed using a 6 mm sphere centered around 24 −10 −16 based upon the hippocampal area jointly active in Nee and Jonides (2011) and Nee and Jonides (2013a). The left posterior hippocampus ROI was formed using a 6 mm sphere centered around −27 −33 −3 based upon the hippocampal area active in Oztekin et al. (2010). These ROIs provided unbiased estimates of functional connectivity interactions suitable for depiction and exploratory analysis.
To uncover regions that could provide gateways between the frontal seeds and hippocampus, we calculated additional β series correlations using the unbiased hippocampal ROIs described above as seeds. Then, we performed conjunction analyses (Nichols et al., 2005) to find areas that showed common connectivity interactions with both the frontal cortex and hippocampus. Separate conjunction analyses were performed for the right caudal superior frontal sulcus with right anterior hippocampus and the left ventral premotor cortex with the left posterior hippocampus. The former looked for areas showing a common switch × state interaction ([focus switch − focus repeat] − [active switch − active repeat]) with the right caudal superior frontal sulcus and right anterior hippocampus. Because functional connectivity interactions between the left ventral premotor cortex and left posterior hippocampus were driven entirely by the active switch/repeat conditions, conjunctions were performed for the simpler contrast of active switch − active repeat. Because valid conjunction analyses are conservative, we improved detection sensitivity by searching for conjunctions within a restricted mask consisting of regions that are likely to mediate the connectivity between the frontal areas of interest and the hippocampus. Candidate regions were based upon known anatomical connectivity in monkeys, which were uncovered through systematic searches of the CoCoMac database (Stephan et al., 2001). The mask consisted of the middle frontal gyrus, inferior frontal gyrus, pars triangularis and pars orbitalis, anterior cingulate cortex, and angular gyrus as defined by the automated anatomical labeling atlas (Tzourio-Mazoyer et al., 2002). Only right hemisphere regions were used for the conjunction of the right caudal superior frontal sulcus and right anterior hippocampus, and only left hemisphere regions were used for the conjunction of the left ventral premotor cortex and left posterior hippocampus. 
Results for each contrast (i.e., for each seed region) were thresholded at p < 0.05 at the voxel-level with a 315-voxel cluster extent providing family-wise error correction according to AlphaSim. Conjunctions of >50 voxels are reported.
To check for consistency, we also performed a simpler, model-free connectivity analysis. In this analysis, raw (preprocessed) data from the right caudal superior frontal sulcus and MTL were extracted, high-pass filtered, and de-meaned on a run-by-run basis. For each event of interest, individual trial estimates of activation were obtained by averaging data from TRs occurring 4 and 6 s after stimulus onset. These trial estimates were then used to examine condition-wise changes in connectivity using similar logic to the β series analyses described above. The model-free analysis produced qualitatively similar results to the β series analysis (switch × state interaction across methods: r = 0.498, p < 0.01) but did not reach significance. The model-free method was noticeably noisier than the β series approach (approximately twice the between-subject variability), which may be the result of the lack of hemodynamic shape constraint and failure to take into account variance from surrounding cognitive events. As a result, we focus on the β series analyses hereafter.
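The trial-estimate step of the model-free analysis can be sketched as follows. The sketch assumes a TR of 2 s (as in the acquisition parameters), assumes that event onsets are rounded to the nearest volume, and omits the high-pass filtering step. The function names are hypothetical.

```python
TR = 2.0  # seconds per volume, taken from the acquisition parameters


def demean(run):
    """Subtract the run mean from every volume of an ROI time series."""
    run_mean = sum(run) / len(run)
    return [v - run_mean for v in run]


def trial_estimate(timeseries, onset_s, lags_s=(4.0, 6.0), tr=TR):
    """Average the signal at the volumes acquired `lags_s` seconds after onset.

    Onset-plus-lag times are rounded to the nearest volume index (an
    assumption made for this sketch).
    """
    indices = [int(round((onset_s + lag) / tr)) for lag in lags_s]
    return sum(timeseries[i] for i in indices) / len(indices)


# Made-up ROI time series for one run, with an event at 2 s.
run = demean([10.0, 12.0, 14.0, 16.0, 18.0, 20.0])
estimate = trial_estimate(run, onset_s=2.0)
```

Series of such trial estimates, one per condition and region, can then be correlated across trials in the same way as β series.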
Results
The task involved simple arithmetic on digits held in STM (Fig. 1). On each trial, participants committed two sets of digits to memory. The size of each set was orthogonally varied between two and three digits. Each digit was presented within a frame that served as a context for the digit. Thereafter, a cue indicated that the digits of one of the sets would be candidates for arithmetic. We hypothesized that the cued set would be held in an active state (active set), whereas the uncued set would be held in a passive state (passive set). Following the cue, two arithmetic operations appeared sequentially within frames of the active set. Participants applied the arithmetic operation to the digit that corresponded to the frame, responded with the solution via a key press, and updated their STM with the result. We hypothesized that arithmetic operations required that the focus of attention was fixated on the appropriate item. Hence, when the second operation was performed on a different item than the first, we anticipated a reaction time cost that reflected the need to switch the focus of attention among items in the active set (focus switch). This cost would be borne out through comparison to conditions where arithmetic operations were applied to the same item twice in a row (focus repeat). Following the second operation, a second set cue appeared that could indicate the same active set as before (active repeat) or that the previously passive set should now be the active set (active switch). The cue was once again followed by two arithmetic operations. Finally, at the end of the trial, participants recalled all of the digits in STM.
Behavioral results
The data of main interest for behavioral analyses were reaction times. Reaction time data were analyzed for correct responses only.
We began by testing the assumption that the active set could be distinguished from the passive set. Following Oberauer (2002, 2005), we hypothesized that a behavioral signature that the active set is held in an active state would be a set-size effect. That is, the duration of operations should grow with the size of the active set, consistent with the well-documented finding that scanning time in STM is a linear function of set size (Sternberg, 1966). By contrast, we hypothesized that there would be no effect of the passive set size because passive items should not be candidates for memory searches. A 2 × 2 ANOVA on reaction times with factors of active set size and passive set size revealed a significant effect of active set size (F(1,25) = 37.03, p < 0.0001), no effect of passive set size (F(1,25) = 0.59, p > 0.4), and no interaction (F(1,25) = 0.06, p > 0.8). Responses were 56 ms slower on average when the active set size was three relative to two (Fig. 2A). The same patterns were observed in error rates (active set size: F(1,25) = 8.51, p < 0.01; passive set size: F(1,25) = 0.96, p > 0.3; interaction: F(1,25) = 0.03, p > 0.85). These data confirm that scanning rate was affected by the active set size, but not passive set size, consistent with the hypothesis that each set was held in a different representational state in STM.
Although the above data indicate that subjects maintained the active and passive sets in different states, the need to recall all of the items at the end of the trial precluded subjects from discarding the passive set entirely. To confirm that the passive set was maintained, we examined the recall data. For recall, reaction times were not analyzed because the sequential recall procedure allowed strong anticipatory responses, rendering the meaning of the response times ambiguous. Recall data were split into four conditions: active–active, active–passive, passive–active, and passive–passive, where the first half of a pair indicates an item's role during the first half of a trial and the second half of a pair indicates an item's role during the second half of a trial. A one-way ANOVA on recall accuracy revealed a significant effect of condition (F(1,25) = 3.22, p < 0.05). This effect was driven by poorer recall accuracy for the passive–active condition (86.5%) relative to other conditions (active–active, 89.2%; active–passive, 88.7%; passive–passive, 90.3%; all pairwise t(25) > 2.05, p ≤ 0.05). No other pairwise comparisons approached significance (all pairwise t(25) < 1.14, p > 0.25). Thus, recall was not systematically worse for items held in the passive state, consistent with the idea that even passive items were faithfully maintained in STM.
Next, we examined behavioral signatures of shifting representational states. A paired t test on reaction times comparing focus switch and focus repeat revealed a significant cost in shifting the focus of attention (t(25) = 5.08, p < 0.0001; 110.1 ms; Fig. 2B). A similar cost was observed in error rates (t(25) = 2.55, p < 0.05). To examine the effect of switching the active set, we contrasted reaction times to operation events following an active switch to those following an active repeat. Because two operations followed each active switch/repeat, operation number was included as a factor in a 2 × 2 ANOVA. This analysis revealed a significant effect of active switching (F(1,25) = 7.22, p < 0.05), a significant effect of operation number (F(1,25) = 6.77, p < 0.05), and a borderline interaction (F(1,25) = 4.19, p = 0.05; Fig. 2C). The interaction was driven by a significant active switch cost for the first operation (t(25) = 3.33, p < 0.005; 65.6 ms), but not for the second operation (t(25) = 0.06, p > 0.95; 1.2 ms). These results indicate that switching the active set slowed the first operation immediately following a switch but did not affect the second operation. A comparable ANOVA on error rates revealed no significant main effects or interaction (all p > 0.15). Collectively, the data support the hypothesis that switching both the focus and active set incurs behavioral costs.
We next examined the interaction between set size and switch costs. We hypothesized that the focus switch cost would interact with the size of the active, but not the passive set. This is because, when switching the focus of attention, a larger active set should introduce greater competition for shifting. This effect was found (focus switch cost with active set size two: 83.0 ms; focus switch cost with active set size three: 141.3 ms; t(25) = 1.89, p < 0.05, one-tailed; Fig. 2D). No such interaction was observed with the passive set size (t(25) = 0.19, p > 0.8). Furthermore, active switch costs did not interact with either the active (t(25) = −1.08, p > 0.25) or passive set sizes (t(25) = −0.16, p > 0.85). These data bolster the idea that the focus of attention is shifted only among items in the active set.
Finally, we examined whether eye movements differed as a function of switching the focus of attention. Neither saccade latency (t(21) = −1.50, p > 0.1) nor saccade magnitude (t(21) = 1.27, p > 0.2) differed between focus switch and focus repeat events. Hence, the focus switch cost was not the result of eye movements but was instead attributable to the need to shift attention within STM.
Univariate fMRI results
We assessed areas involved in shifting the focus of attention in STM by comparing focus switch events with focus repeat events. Based on previous literature (Garavan et al., 2000; Bledowski et al., 2009; Tamber-Rosenau et al., 2011), we anticipated that shifting the focus of attention in STM would involve dorsal frontal areas implicated in both internal and external attention. Consistent with this idea, we found significantly greater activation for focus switches than focus repeats in the right caudal superior frontal sulcus (cSFS; MNI peak: 30 −6 48) in the vicinity of the frontal eye fields (Fig. 3A). A similar peak was observed in the left hemisphere, albeit slightly more laterally (MNI peak: −36 −8 54). Significant shifting-related activation was also observed in dorsal medial frontal cortex (including the anterior cingulate and presupplementary motor area), right dorsal premotor cortex, left ventrolateral PFC (including the inferior frontal junction, inferior frontal gyrus, and anterior insula), and inferior occipital cortex (for complete descriptions, see Table 1). A small-volume search of the MTL did not reveal any significant results.
Although the involvement of dorsal frontal areas is consistent with the idea that switching the focus engaged areas involved in shifting attention, attention shifting is also known to robustly engage the superior parietal lobule (Yantis et al., 2002; Yantis and Serences, 2003). Moreover, the superior parietal lobule has also been implicated in shifting internal attention (Garavan et al., 2000; Bledowski et al., 2009; Tamber-Rosenau et al., 2011; Nee et al., 2013). To examine whether the superior parietal lobule was similarly engaged here, ROIs were placed around peak activations reported in previous related work (Yantis et al., 2002; Tamber-Rosenau et al., 2011; Nee et al., 2013). Although an ROI based upon Yantis et al., 2002 (14 −56 62) did not demonstrate an effect of switching the focus (t(25) = 0.14, p > 0.8), significant focus-switching effects were observed in an ROI based upon Tamber-Rosenau et al., 2011 (14 −64 52; t(25) = 2.16, p < 0.05), as well as an ROI based upon a meta-analysis of attention-shifting in working memory (14 −66 60; t(25) = 1.91, p < 0.05, one-tailed). These results suggest that focus switching-related activations in the present data were limited to posterior aspects of the superior parietal lobule.
The reverse contrast (focus repeat − focus switch) revealed bilateral activation in the temporal–parietal junction, which was stronger in the right hemisphere. Notably, we and others have observed similar activations in previous studies when recognition probes match the focus of attention (Oztekin et al., 2010; Nee and Jonides, 2011, 2013a). In the past, we have attributed such activations to a pop-out effect that occurs when external stimuli match the focus of attention (Nee and Jonides, 2013b). This is consistent with the role of the temporal–parietal junction in bottom-up attention (Corbetta and Shulman, 2002; Cabeza et al., 2008; Cabeza et al., 2012). Activations were also observed in the right middle and superior temporal gyri. A small-volume search of the MTL did not reveal any significant results.
Next, we examined the contrast of active switch − active repeat to detect regions involved in shifting information between active and passive states. This contrast revealed activation in the left ventral premotor cortex (PMv; MNI peak: −60 4 30) and left occipital cortex (Fig. 3B; Table 1). The activations in left PMv were slightly posterior to activations we have previously observed in the inferior frontal gyrus during retrieval from the passive state of STM (Nee and Jonides, 2011, 2013a). The left PMv is often coactive with the left inferior frontal gyrus during the rehearsal of verbal information in STM and these regions are thought to work together to enable subvocalization (Paulesu et al., 1993; Smith and Jonides, 1997; Smith et al., 1998). Thus, these activations may reflect the rehearsal and reactivation of the passive set to transition the set to the active state of STM. A small-volume search of the MTL did not reveal any significant results. No significant activations were found for the reverse contrast.
Finally, we compared focus switching and active switching activations within the cSFS and PMv (Fig. 3C). ROIs were created using a leave-1-subject-out procedure (see Materials and Methods) to provide unbiased estimates of effect sizes and allow direct comparisons. Consistent with the whole-brain analyses reported above, the left cSFS demonstrated a significant focus switch effect (t(25) = 2.72, p < 0.05) but no active switch effect (t(25) = 0.73, p > 0.45). However, a switch × state interaction was not observed (F(1,25) = 0.26, p > 0.6). Similar effects were observed in the right cSFS (focus switch effect: t(25) = 3.80, p < 0.001; active switch effect: t(25) = 1.27, p > 0.2; switch × state interaction: F(1,25) = 0.66, p > 0.4). By contrast, the left PMv demonstrated no focus switch effect (t(25) = 1.07, p > 0.25) but a significant active switch effect (t(25) = 4.32, p < 0.0005). However, the switch × state interaction was not significant (F(1,25) = 2.38, p > 0.1). Similar effects were observed in the right PMv (focus switch effect: t(25) = 1.07, p > 0.25; active switch effect: t(25) = 2.30, p < 0.05; switch × state interaction: F(1,25) = 0.55, p > 0.45). To directly compare effects observed in the cSFS and PMv, we performed a 2 × 2 × 2 ANOVA with factors of switch (switch, repeat), state (focus, active), and region (cSFS, PMv). For this analysis, activations across hemispheres were combined. This analysis revealed a borderline switch × state × region interaction (F(1,25) = 4.14, p = 0.05). Decomposing this interaction revealed that it was primarily driven by a significant switch × state × region interaction across the right cSFS and left PMv (F(1,25) = 4.72, p < 0.05) with nonsignificant trends across other comparisons. Although these results do not demonstrate a strict double dissociation, they do indicate preferential roles of the cSFS in focus switching and PMv in active switching.
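The logic of a leave-1-subject-out ROI can be sketched as follows. The exact procedure used here is given in Materials and Methods, which is not reproduced in this section, so the sketch rests on assumptions: each subject contributes a small vector of voxelwise contrast values, the group map is a simple mean, and the ROI is the top `n_voxels` voxels of that map. The function names and data are hypothetical.

```python
def loso_roi(contrast_maps, held_out, n_voxels):
    """Select ROI voxels from the group map of every subject except `held_out`."""
    others = [m for s, m in enumerate(contrast_maps) if s != held_out]
    n_total = len(others[0])
    group_mean = [sum(m[v] for m in others) / len(others) for v in range(n_total)]
    # Rank voxels by the group mean and keep the strongest n_voxels.
    ranked = sorted(range(n_total), key=lambda v: group_mean[v], reverse=True)
    return sorted(ranked[:n_voxels])

def loso_estimates(contrast_maps, n_voxels):
    """Average each subject's contrast within an ROI defined without that subject."""
    estimates = []
    for s, subject_map in enumerate(contrast_maps):
        roi = loso_roi(contrast_maps, s, n_voxels)
        estimates.append(sum(subject_map[v] for v in roi) / len(roi))
    return estimates

# Hypothetical contrast values: 4 subjects x 4 voxels (illustrative only).
contrast_maps = [
    [0.1, 2.0, 1.6, 0.0],
    [0.2, 1.5, 2.2, 0.1],
    [0.0, 1.9, 2.1, -0.1],
    [0.3, 1.2, 1.4, 0.2],
]
roi_estimates = loso_estimates(contrast_maps, n_voxels=2)
print(roi_estimates)
```

Because the held-out subject never contributes to the voxel selection, the extracted effect for that subject is independent of the selection step, which is what permits unbiased effect size estimates and direct comparisons across regions.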
Functional connectivity interactions
The univariate analyses revealed that the cSFS was preferentially involved in shifting the focus of attention whereas the PMv was preferentially involved in shifting between active and passive states. To further understand the nature of state shifts, we examined areas that interact with the cSFS and PMv during shifting. To do so, we examined changes in functional connectivity using the β series method (Rissman et al., 2004).
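The core of a β series correlation can be sketched in a few lines. The sketch assumes that trial-wise β estimates for one condition have already been extracted from a seed and a target region, correlates the two series, and applies a Fisher z (arc-hyperbolic tangent) transform so that values can be compared across conditions. The function names and all values are hypothetical, and details such as the trial-wise GLM are omitted.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def beta_series_connectivity(seed_betas, target_betas):
    """Fisher z-transformed correlation of trial-wise betas for one condition."""
    return math.atanh(pearson_r(seed_betas, target_betas))

# Hypothetical trial-wise betas for one subject (illustrative values only).
seed_betas = [1.0, 2.0, 3.0, 4.0]
target_switch = [1.1, 1.9, 3.2, 3.8]   # target region, switch trials
target_repeat = [2.0, 1.0, 4.0, 3.0]   # target region, repeat trials

z_switch = beta_series_connectivity(seed_betas, target_switch)
z_repeat = beta_series_connectivity(seed_betas, target_repeat)
print(f"z(switch) = {z_switch:.3f}, z(repeat) = {z_repeat:.3f}")
```

In this toy example, a larger z for switch than for repeat trials would correspond to increased seed-target connectivity during switching.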
Based on previous research, we hypothesized that the MTL maintains item-context bindings that support the active state (for review, see Nee and Jonides, 2013b). In the present task, cues that direct the focus of attention do so by presenting operations within frames that serve as contexts. Retrieval of the items themselves is therefore hypothesized to rely on the item-context bindings maintained by the MTL. This account predicts coordination between attention, mediated by the cSFS, and recollection, mediated by the MTL. When cues direct attention shifts, coordination between the cSFS and the MTL enables the activation of an item-context pair that becomes the focus of attention in STM. A second prediction is that switching the active set should disrupt established coordination between the cSFS and the MTL. If attention is focused on a particular item-context binding, switching the active item-context bindings will disrupt that focus. To make this idea intuitive, consider reading this sentence (focus) on this page (active set). If a colleague were to flip the page on you (switch active), the coupling between your attention and the page you were reading would be disrupted. Similarly, switching the active item-context bindings is predicted to disrupt focus-binding synchrony. Together, this account predicts a switch × state interaction between the cSFS and MTL. There should be greater coordination between the cSFS and MTL during focus switches (relative to focus repeats) and reduced coordination between the cSFS and MTL during active switches (relative to active repeats).
To test this idea, we placed a seed in the right cSFS and looked for areas showing a switch × state interaction in functional connectivity. Although no areas were found in a whole-brain search, a small-volume search of the MTL revealed a significant functional connectivity interaction in the right MTL, including the anterior hippocampus and parahippocampal gyrus (Fig. 4A; Table 2). To examine the nature of this interaction, an unbiased ROI was placed in the right anterior hippocampus based upon previous literature (see Materials and Methods). This analysis revealed that the interaction was driven both by increased connectivity between the cSFS and MTL during focus switches compared with focus repeats (t(25) = 1.76, p < 0.05, one-tailed) and a nonsignificant trend toward reduced connectivity between the cSFS and MTL during active switches compared with active repeats (t(25) = −1.60, p = 0.06, one-tailed) resulting in a significant switch × state interaction (F(1,25) = 6.78, p < 0.05).
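For a 2 × 2 within-subject design, the switch × state interaction can be tested with a one-sample t test on the per-subject difference of differences, and the corresponding F(1, n − 1) equals the square of that t. The sketch below illustrates this equivalence with hypothetical Fisher z connectivity values; the function name and data are assumptions made for illustration and are not the study's values.

```python
import math
import statistics

def interaction_test(focus_switch, focus_repeat, active_switch, active_repeat):
    """Test [focus switch - focus repeat] - [active switch - active repeat].

    Returns (t, F, df), where F = t ** 2 has (1, df) degrees of freedom.
    """
    contrast = [
        (fs - fr) - (asw - arp)
        for fs, fr, asw, arp in zip(focus_switch, focus_repeat,
                                    active_switch, active_repeat)
    ]
    n = len(contrast)
    se = statistics.stdev(contrast) / math.sqrt(n)
    t = statistics.mean(contrast) / se
    return t, t ** 2, n - 1

# Hypothetical seed-to-ROI connectivity (Fisher z) for 5 subjects.
focus_switch = [0.60, 0.55, 0.70, 0.65, 0.50]
focus_repeat = [0.50, 0.50, 0.55, 0.60, 0.45]
active_switch = [0.40, 0.45, 0.50, 0.45, 0.40]
active_repeat = [0.50, 0.50, 0.60, 0.50, 0.50]

t_int, f_int, df_int = interaction_test(
    focus_switch, focus_repeat, active_switch, active_repeat
)
print(f"t({df_int}) = {t_int:.2f}, F(1,{df_int}) = {f_int:.2f}")
```

Reversing the order of subtraction gives the reverse interaction ([active switch − active repeat] − [focus switch − focus repeat]); this changes the sign of t but leaves F unchanged.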
Next, we performed a comparable analysis using the left PMv as a seed. This time, we looked for areas showing the reverse interaction (i.e., [active switch − active repeat] − [focus switch − focus repeat]). Once again, no areas were found in a whole-brain search, but a small volume search of the MTL revealed a significant functional connectivity interaction with the left MTL, including the posterior hippocampus and parahippocampal gyrus (Fig. 4B; Table 2). To examine the nature of this interaction, an unbiased ROI was placed in the left posterior hippocampus based upon previous literature (see Materials and Methods). This analysis revealed that the interaction was primarily driven by increased connectivity between the PMv and MTL during active switches compared with active repeats (t(25) = 2.12, p < 0.05) with no difference between focus switches and focus repeats (t(25) = −0.57, p > 0.55), resulting in a significant switch × state interaction (F(1,25) = 4.71, p < 0.05).
Collectively, these data indicate that shifts among representational states in STM are mediated by frontal–MTL interactions. cSFS-anterior MTL interactions are associated with shifts of the focus of attention, whereas PMv-posterior MTL interactions are associated with shifts of the active state.
Gateways between frontal cortex and the MTL
Although changes in functional connectivity indicated correlated activations between the frontal cortex and MTL, it is unclear how this coordination occurs. In particular, we are unaware of compelling evidence that the cSFS and PMv have direct anatomical connections to the MTL. However, tract-tracing studies in monkeys have demonstrated direct MTL connections with the lateral PFC (Pandya et al., 1981; Goldman-Rakic et al., 1984; Suzuki and Amaral, 1994; Morris et al., 1999), anterior cingulate cortex (Pandya et al., 1981; Insausti et al., 1987; Vogt and Pandya, 1987; Arikuni et al., 1994; Morris et al., 1999), and posterior parietal cortex (PPC) (Seltzer and Pandya, 1976; Seltzer and Van Hoesen, 1979; Cavada and Goldman-Rakic, 1989; Andersen et al., 1990; Suzuki and Amaral, 1994; Rockland and Van Hoesen, 1999). These areas also have direct connections to the cSFS (referred to as the frontal eye fields or area 8Ad in monkeys) (Barbas and Pandya, 1987; Barbas, 1988; Selemon and Goldman-Rakic, 1988; Felleman and Van Essen, 1991; Petrides and Pandya, 1994, 1999) and PMv (referred to as PMv, F5, or 6VR in monkeys) (von Bonin and Bailey, 1947; Pandya et al., 1981; Barbas and Pandya, 1987, 1989; Bullier et al., 1996; Luppino et al., 1999; Takada et al., 2004). Hence, it is possible that connections between the frontal–MTL areas observed above run through the lateral PFC, anterior cingulate cortex, or PPC.
To explore this issue, we performed a new set of β series correlations using areas in the MTL as seeds. We reasoned that areas that mediate the interaction between the cSFS/PMv and MTL should show functional connectivity interactions with both the cSFS/PMv and MTL. We began by searching for areas that show common functional connectivity interactions with the cSFS and anterior hippocampus. We computed the switch × state interaction contrast ([focus switch − focus repeat] − [active switch − active repeat]) for both the cSFS seed and anterior hippocampus seed. Then, we looked for areas showing a significant interaction with both seeds by performing a conjunction analysis (Nichols et al., 2005). This was done within a small-volume mask consisting of the lateral PFC, anterior cingulate cortex, and PPC based upon known anatomical connectivity in monkeys (see Materials and Methods). The conjunction revealed a single region in the right PPC (center 50 −58 32; area 39, 282 voxels; Fig. 5A), suggesting that the PPC acts as a gateway between the cSFS and MTL.
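The minimum-statistic conjunction (Nichols et al., 2005) can be sketched as follows: a voxel is retained only if the smaller of its two statistics exceeds the threshold, so both effects must be present at that voxel. The maps, the threshold, and the function name below are hypothetical, and the small-volume masking and multiple-comparison correction applied in the actual analysis are omitted.

```python
def conjunction_min_stat(map_a, map_b, threshold):
    """Return indices of voxels whose minimum statistic exceeds `threshold`."""
    if len(map_a) != len(map_b):
        raise ValueError("statistic maps must cover the same voxels")
    min_stat = [min(a, b) for a, b in zip(map_a, map_b)]
    return [v for v, s in enumerate(min_stat) if s > threshold]

# Hypothetical voxelwise t values for the interaction contrast from each seed.
csfs_seed_t = [3.1, 0.5, 2.9, 4.0]
hipp_seed_t = [2.8, 3.5, 1.0, 3.3]

conj_voxels = conjunction_min_stat(csfs_seed_t, hipp_seed_t, threshold=2.5)
print(conj_voxels)
```

Voxels that exceed the threshold for only one seed are excluded, which is why the conjunction is more conservative than either map alone.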
Next, we performed a similar conjunction analysis upon the PMv and posterior hippocampus. In this case, because the previously observed PMv–MTL interaction was driven entirely by the active switch > active repeat contrast, we looked for conjunctions using this simpler contrast. This analysis revealed three areas, all in the PFC (Fig. 5B). The first was situated in the lateral frontal polar cortex (center −32 48 8; areas 10 and 46; 152 voxels), the second in ventrolateral PFC (center −50 34 8; area 45; 50 voxels), and the third in dorsal PFC (center −26 14 54; area 8; 232 voxels). These results suggest that the PFC acts as a gateway between the PMv and MTL.
Discussion
We examined how information is transitioned among representational states in STM. Behavioral data indicated distinct signatures of representational states. Information held in an active state demonstrated a set-size effect such that memory scanning times increased with the number of actively maintained items (Sternberg, 1966). No such effect was observed for passively maintained items. Nevertheless, passive items could be recalled with similar accuracy to active items demonstrating that they were still faithfully maintained in STM. Moreover, consecutive operations performed on the same item were accomplished more quickly than consecutive operations performed on different items indicating a cost in switching the focus of attention among items in STM. There were similar costs in switching the active and passive sets. These different forms of switching were associated with different neural correlates. Switching the focus of attention was associated with the cSFS, whereas switching the active set was associated with the PMv. Functional connectivity analyses revealed that the cSFS and MTL show functional interactions during switching. The cSFS and MTL were more correlated when the focus of attention was switched relative to when it was not. Conversely, correlations between the cSFS and MTL were disrupted when the active set was switched. These same functional connectivity interactions were observed between the cSFS and PPC, on the one hand, and MTL and PPC, on the other, suggesting that the PPC may be an intermediary between the cSFS and MTL. The PMv and MTL also showed switch-related interactions in functional connectivity such that these areas were more correlated when the active set was switched relative to when it was not. Similar effects were observed between the PMv and PFC, as well as the MTL and PFC, suggesting that the PFC may act as a gateway between the PMv and MTL. 
Collectively, these data demonstrate that frontal–MTL interactions mediate shifts of representational states in STM (Fig. 6).
Previous research has demonstrated a close commonality between the focus of attention in STM and attention to the external environment. Shifts of both internal and external attention elicit activations in dorsal frontal and parietal areas (Garavan et al., 2000; Bledowski et al., 2009; Tamber-Rosenau et al., 2011). Moreover, external stimuli that match the contents of the focus of attention in STM capture attention (Downing, 2000), but attention is not captured if information is held in a passive state (Downing and Dodds, 2004; Houtkamp and Roelfsema, 2006; Olivers et al., 2011). These results suggest that the attentional template that guides visual search (Desimone and Duncan, 1995) is synonymous with the focus of attention in STM (Nee and Jonides, 2013b). Consistent with this idea, while dorsal frontal and parietal areas are engaged during top-down search, ventral–parietal and temporal–parietal areas are engaged when the object of search is found (Corbetta and Shulman, 2002). Similarly, we observed dorsal frontal involvement during “searches” of STM evoked by shifts of the focus of attention but temporal–parietal activation when an item matched the focus of attention. These results bolster the link between internal and external attention.
There has been some evidence for interactions between attention and memory mediated by dorsal frontal areas and the MTL. In a pair of studies, subjects showed improved performance as they learned to search for a target hidden in a cluttered scene (Summerfield et al., 2006; Stokes et al., 2012). In these cases, subjects could use their memory of target-context bindings to guide their attention. Compared with unlearned scenes and visually cued scenes, memory-cued scenes produced greater activations in the hippocampus (Summerfield et al., 2006) and cSFS (Stokes et al., 2012). Although it stands to reason that memory guided attention in these cases, a direct link between the MTL and cSFS was not established. More recently, it has been shown that attention and memory networks that include the cSFS and MTL, respectively, interact during immediate free recall of 24-item lists (Kragel and Polyn, 2013). It is likely that free recall of long lists involves numerous state transitions in a dynamic interplay between attention and memory. Here, we have demonstrated increased correlations between the cSFS and MTL when attention and memory are coordinated to shift the focus of attention among actively maintained items. Furthermore, our results suggest that this link is mediated by the PPC.
We found that the PMv showed greater activation when the active and passive states were swapped compared with when they remained constant. Although this result was not entirely expected, the PMv is hypothesized to work together with the ventrolateral PFC during the rehearsal of verbal content in STM (Paulesu et al., 1993; Smith and Jonides, 1997; Smith et al., 1998). Rehearsal serves to refresh and keep active phonological representations for their continued involvement in STM (Baddeley, 1986). Hence, one key feature that likely distinguishes the active state from the passive state is that items in the former are explicitly rehearsed whereas items in the latter are not (Nee and Jonides, 2013b). Although the PMv is generally considered a motor structure, it does have a more general role in sequencing cognitive events (Fiebach and Schubotz, 2006). The preferential involvement of the PMv in switching here may be the result of the need to initiate a new rehearsal sequence when the active set is switched.
Shifting among the active and passive states involved correlated activation in the PMv and MTL. Our data suggest that this correlation was coordinated by the PFC. Such coordination could facilitate the establishment of new item-context bindings in the MTL. The PFC, through its widespread anatomical connections, has a massive propensity for integration (Miller and Cohen, 2001). Greater PFC involvement has been observed when spatial and verbal content is bound compared with when each form of content is held in mind separately (Prabhakaran et al., 2000). Here, a similar spatial-verbal pairing (i.e., context-digit) needed to be established to perform the task. While the PMv may have been responsible for activating a new active set, the PFC may have been responsible for linking the active set to its corresponding context through coordination with the MTL. Once paired, the item-context bindings could continue to be maintained through MTL-mediated synchrony (Cashdollar et al., 2009; Nee and Jonides, 2013b).
It is interesting to note that distinct areas of the MTL were involved in different forms of representational transitions. Shifting the focus of attention was associated with the anterior MTL, whereas shifting the active set was associated with the posterior MTL. Recent theories propose that anterior regions of the MTL preferentially process item content, whereas posterior regions of the MTL preferentially process contexts (Diana et al., 2007; Eichenbaum et al., 2007; Ranganath and Ritchey, 2012). While such theories are supported by data from the LTM literature, our results map well onto this dichotomy as the focus of attention identifies an object within the larger context of the active set. However, this result should be taken with some caution because previous research demonstrating MTL involvement for retrieval of information outside of the focus of attention has identified various areas all along the anterior–posterior axis of the MTL (Nee and Jonides, 2008; Oztekin et al., 2009; Oztekin et al., 2010; Nee and Jonides, 2011, 2013a). So, whether the item-context dichotomy that has been observed for LTM maps onto distinct states of STM remains to be determined.
Previously, we and others have referred to the passive state with the term “activated LTM” (e.g., Nee and Jonides, 2013b). This terminology arose from the notion that the passive state forms a continuum with LTM such that those items considered to be in the passive state were simply “activated” beyond a certain level. Recently, it has been suggested that what distinguishes the passive state from information in LTM may not be activity in the sense of neural firing (Lewis-Peacock et al., 2012; LaRocque et al., 2013; LaRocque et al., 2014). Instead, short-term plasticity lasting on the order of a minute may be the mechanism by which items in the passive state remain accessible (Mongillo et al., 2008; Erickson et al., 2010). In this way, both passive items and LTM rely on synaptic mechanisms. However, different forms of synaptic mechanism underlie the passive state of STM (short-term plasticity) and LTM (long-term plasticity).
Our results indicate a dynamic interplay between the frontal cortex and MTL during a complex STM task. Although traditional accounts have associated these regions with STM and LTM, respectively, it has become increasingly evident that both areas are engaged by diverse demands (Ranganath and Blumenfeld, 2005; Ranganath, 2006; Blumenfeld and Ranganath, 2007; Jonides et al., 2008; Nee et al., 2008). Here we have shown interactions between the frontal cortex and MTL when STM is challenged by the need to organize and prioritize information for use in ongoing cognition. Similar interplays are likely to arise during higher-level cognitive tasks, such as problem solving and reasoning, for which STM is considered central. Hence, frontal–MTL interactions that contribute to representational state shifts are likely to be of substantial importance for higher-level cognition more generally.
Footnotes
This work was supported by National Science Foundation Grant BCS 0822748 to J.J. and National Institute of Neurological Disorders and Stroke Grant NS082069 to D.E.N.
The authors declare no competing financial interests.
Correspondence should be addressed to Dr. Derek Evan Nee, University of California, 132 Barker Hall, #3190, Berkeley, CA 94720-3190. denee@berkeley.edu