Abstract
The ability to discriminate between stimuli relies on a chain of neural operations associated with perception, memory and decision-making. Accumulating studies show learning-dependent plasticity in perception or decision-making, yet whether perceptual learning modifies mnemonic processing remains unclear. Here, we trained human participants of both sexes in an orientation discrimination task, while using functional magnetic resonance imaging (fMRI) and transcranial magnetic stimulation (TMS) to separately examine training-induced changes in working memory (WM) representation. fMRI decoding revealed orientation-specific neural patterns during the delay period in primary visual cortex (V1) before, but not after, training, whereas neurodisruption of V1 during the delay period led to behavioral deficits in both phases. In contrast, both fMRI decoding and disruptive effect of TMS showed that intraparietal sulcus (IPS) represented WM content after, but not before, training. These results suggest that training does not affect the necessity of sensory area in representing WM information, consistent with the sensory recruitment hypothesis in WM, but likely alters the coding format of the stored stimulus in this region. On the other hand, training can render WM content to be maintained in higher-order parietal areas, complementing sensory area to support more robust maintenance of information.
SIGNIFICANCE STATEMENT There has been accumulating progress regarding experience-dependent plasticity in perception or decision-making, yet how perceptual experience moulds mnemonic processing of visual information remains less explored. Here, we provide novel findings that learning-dependent improvement of discriminability accompanies altered WM representation at different cortical levels. Critically, we suggest a role of training in modulating the cortical locus of WM representation, providing a plausible explanation to reconcile the discrepant findings between human and animal studies regarding the recruitment of sensory or higher-order areas in WM.
Introduction
The ability to differentiate between similar features is essential for visual recognition in complex environments. For instance, predators must learn to discriminate prey from their surroundings to ensure survival. Learning and experience are known to improve discrimination ability even in adulthood by reorganizing brain functions and connections (Sagi and Tanne, 1994; Kourtzi and DiCarlo, 2006; Gilbert et al., 2009; Watanabe and Sasaki, 2015; Dosher and Lu, 2017; Hooks and Chen, 2020). Previous studies have focused on how training alters perceptual encoding of the stimuli (Schoups et al., 2001; Schwartz et al., 2002; Furmanski et al., 2004; Yang and Maunsell, 2004; Yotsumoto et al., 2008; Jehee et al., 2012; Yan et al., 2014; Chen et al., 2015) or the decision-making process (Law and Gold, 2008; Kahnt et al., 2011; Kuai et al., 2013; Dosher and Lu, 2017). However, mnemonic processing also matters for discrimination judgments where the to-be-compared stimuli are often sequentially presented. In these tasks, participants are required to encode a sample item and hold it in working memory (WM) for later comparison with a test item. Yet whether and how training on these tasks modifies mnemonic processing of stimuli remains largely unclear.
The view that perceptual learning may change mnemonic processing of stimuli has received support from findings on the relationship between WM and discrimination ability (Cornette et al., 2001; Brady et al., 2013; Ester et al., 2014; Zhang et al., 2016). In particular, variability of neuronal activity during WM retention is proposed as a potential indicator of the discrimination performance (Hussar and Pasternak, 2010; Qi and Constantinidis, 2015). The amount of information carried by the activity patterns during WM delay correlates with mnemonic precision (Ester et al., 2013) and performance changes as a function of WM load (Emrich et al., 2013). These findings point to the assumption that learning-dependent improvement of discriminability may be accompanied by modified WM representation of the stimuli. It has been established that multiple levels of cortical areas are recruited for representing WM information (D'Esposito, 2007; Christophel et al., 2017; Dotson et al., 2018). In particular, the intraparietal sulcus (IPS) is identified as a candidate region for mnemonic processing of the stimuli (Song and Jiang, 2006; Bettencourt and Xu, 2016; Weber et al., 2016; Lorenc et al., 2018), while the sensory recruitment account of WM suggests that primary visual area (V1) is also engaged for temporary maintenance of WM content (Pasternak and Greenlee, 2005; Ester et al., 2009; Harrison and Tong, 2009; Serences et al., 2009). Representing WM content at multiple areas could play complementary roles such that sensory areas encode precise sensory information and higher-order areas provide abstract and robust representation (D'Esposito, 2007; Christophel et al., 2017). Here, we particularly focus on these two regions of interest (ROIs) to examine learning-dependent alterations of WM representation.
To this end, we trained participants on a two-interval forced-choice (2IFC) orientation discrimination task that required temporary maintenance of the sample stimulus during a delay period. In Experiment 1, we combined functional magnetic resonance imaging (fMRI) with multivariate pattern analysis (MVPA) to investigate learning-dependent changes of WM representation in V1 and IPS. In Experiment 2, we used online repetitive transcranial magnetic stimulation (rTMS) to test the effect of training on the causal role of these two areas during WM processing. We found orientation-specific patterns during WM delay in V1 before, but not after, training, whereas V1 stimulation during the delay period impaired behavioral performance in both phases. In contrast, both fMRI decoding and TMS effect indicated that IPS represented WM content after, but not before, training. These findings suggest that perceptual learning modified mnemonic processing at different cortical levels.
Materials and Methods
Experiment 1: fMRI
Participants
Sixteen participants (nine females; age range: 18–26 years) took part in this study. The sample size is comparable to those reported in previous work on perceptual learning (Zhang et al., 2010) or fMRI decoding of WM content using discrimination tasks (Ester et al., 2009; Gosseries et al., 2018; Lawrence et al., 2018). All participants had normal or corrected-to-normal vision, and reported being right-handed. They were naive to the aim of the study and received payment on completion of the experiment. All participants gave written informed consent and the study protocol was approved by the local ethics committee.
Stimulus and apparatus
We presented Gabor patches (Gaussian windowed sinusoidal gratings) in either the upper-left or lower-right visual field with an eccentricity of 6.5° against a gray background (∼35 cd/m2). The Gabor stimuli of random phase had a fixed diameter of 4°, a contrast of 0.8, and a spatial frequency of 1.5 cycles/°. The Gabor stimuli were tilted clockwise or counterclockwise from the base orientations (55° or 145°).
The stimuli were generated using Psychtoolbox 3.0 (Brainard, 1997; Pelli, 1997) for MATLAB (MathWorks). In the behavioral lab, the stimuli were presented on a Dell cathode ray tube (CRT) monitor with the size of 40 × 30 cm2, resolution of 1024 × 768 and a refresh rate of 60 Hz. Gamma correction was applied to the monitor. We used a chin-rest to stabilize participants' head position and maintain the viewing distance at 90 cm. Participants were asked to make responses using a keyboard. Inside the MRI scanner, the stimuli were back-projected onto a translucent screen located inside the scanner bore (resolution, 1024 × 768; refresh rate, 60 Hz). Participants viewed the stimuli at a distance of 90 cm through a mirror placed above their eyes. An MRI-compatible response box was used for making responses.
Experimental design and statistical analysis
All participants completed four phases in this experiment, each consisting of multiple sessions (Fig. 1A): (1) a 2-d pretest, (2) a 6-d training, (3) a 2-d posttest I, and (4) a 2-d posttest II. The posttest I and posttest II phases were separated by around 10 d to assess the stability of the training effect. Each test phase comprised a behavioral test session (first day) and a scanning session (second day) on two consecutive days.
Behavioral tasks
We used a 2IFC orientation discrimination task throughout the experiment. Two types of tasks, a short-delay and a long-delay task, varying in the length of the delay period between the stimuli, were included (Fig. 1B,C). Similar to the conventional learning regimen, we used a short delay of 0.6 s to measure behavioral performance during the training and behavioral test sessions. To isolate memory-specific activity from the fMRI signal (Harrison and Tong, 2009; Serences et al., 2009), we used a long delay of 11.8 s during the behavioral tests and scanning sessions. Note that our WM task required holding one orientation in WM, which differed from tasks that manipulated WM load. This design was chosen as it is commonly used in studies on perceptual learning (Schoups et al., 2001; Jehee et al., 2012) and fMRI decoding of WM content (Harrison and Tong, 2009; Serences et al., 2009; Bettencourt and Xu, 2016).
In the short-delay task, each trial began with a central fixation of a black dot shown for 0.6 s. In the long-delay task, each trial began with a central fixation dot that was white for 0.2 s and then turned black for 3.8 s. The change of color was designed to remind participants of the trial onset. Participants were instructed to press a button once they saw the white dot. In both tasks, the sample and test Gabor stimuli were then sequentially presented for 0.2 s each, separated by a delay period (short-delay task: 0.6 s; long-delay task: 11.8 s). Participants were asked to report whether the test Gabor was tilted clockwise or counterclockwise relative to the sample stimulus. A uniformly distributed jitter (±5°) was added to the base orientations (i.e., 55° or 145°) to encourage perceptual comparison between two Gabors in each trial, rather than direct retrieval of a constant stimulus template.
Staircase procedure
To equate task difficulty across different conditions throughout the experiment, we used an adaptive staircase method (3-down-1-up, 15 reversals, step size of 0.5°) that converges to 79.4% accuracy in the orientation discrimination tasks. We adjusted the angle difference between the sample and test stimuli independently for each condition. The threshold in each run was determined by the mean angle difference of the last eight reversals.
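The staircase rule described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the class name `Staircase`, the lower bound `floor`, and the Weibull-shaped toy observer in `simulate_run` are all assumptions introduced here. Only the 3-down-1-up rule, the 0.5° step, the 15-reversal stopping criterion, the 8° starting value, and the last-eight-reversals threshold come from the text.

```python
import math
import random

# A 3-down-1-up rule targets the accuracy p where p**3 = 0.5, i.e. about 79.4%.
TARGET_ACCURACY = 0.5 ** (1 / 3)


class Staircase:
    """3-down-1-up staircase on the sample-test angle difference (degrees)."""

    def __init__(self, start=8.0, step=0.5, n_down=3, max_reversals=15, floor=0.5):
        self.level = start
        self.step = step
        self.n_down = n_down
        self.max_reversals = max_reversals
        self.floor = floor           # assumed lower bound, not stated in the text
        self.correct_streak = 0
        self.last_direction = 0      # -1 = harder, +1 = easier, 0 = no move yet
        self.reversals = []          # levels at which the direction flipped

    @property
    def done(self):
        return len(self.reversals) >= self.max_reversals

    def update(self, correct):
        """Register one response and move the level if the rule fires."""
        direction = 0
        if correct:
            self.correct_streak += 1
            if self.correct_streak == self.n_down:
                self.correct_streak = 0
                direction = -1
        else:
            self.correct_streak = 0
            direction = +1
        if direction == 0:
            return
        if self.last_direction != 0 and direction != self.last_direction:
            self.reversals.append(self.level)
        self.last_direction = direction
        self.level = max(self.floor, self.level + direction * self.step)

    def threshold(self, n_last=8):
        """Mean level over the last n_last reversals."""
        last = self.reversals[-n_last:]
        return sum(last) / len(last)


def simulate_run(scale=3.0, seed=0):
    """Run one staircase against a toy observer (hypothetical accuracy curve)."""
    rng = random.Random(seed)
    sc = Staircase()
    while not sc.done:
        p_correct = 0.5 + 0.5 * (1.0 - math.exp(-(sc.level / scale) ** 2))
        sc.update(rng.random() < p_correct)
    return sc
```

Calling `simulate_run().threshold()` returns the estimated threshold for one simulated run; in the experiment, the starting value of each subsequent run would be taken from the preceding run's threshold.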
Behavioral test sessions (first day of pretest, posttest I, and posttest II)
This session included both the short-delay and long-delay orientation discrimination tasks. The short-delay task consisted of four experimental conditions (2 stimulus orientations × 2 stimulus locations) to assess the effect of learning and the learning specificity for orientation and location. Participants started with 16 practice trials (4 trials per condition) using a fixed angle difference (10°) and then completed 12 staircase runs (∼65 trials per run, three staircases per condition in random order). For the first run of each condition, the starting angle difference was 8° with a step size of 0.5°. For the subsequent staircase runs, the starting value was the threshold of corresponding condition in the preceding run. Participants' performance in each session was quantified using the averaged thresholds across three staircases for each condition.
To keep consistency with the trial sequence in the scanning session, the long-delay task consisted of two stimulus conditions (i.e., ∼55° or ∼145°) shown only at the trained location. Participants began with 20 practice trials (10 trials per condition, fixed angle difference: 10°) and then completed one run of randomly interleaved staircases (∼65 trials for each condition). The starting angle difference was 8° with a step size of 0.5°. We quantified the performance using the threshold for each condition. No feedback on correctness was provided in any of these test sessions.
Scanning sessions (second day of the pretest, posttest I, and posttest II)
Participants completed six runs of a long-delay task (16 trials per run, eight trials for each orientation in randomized order). Each run began with an 8-s fixation. Trials were separated by a 10- or 12-s interval to allow fMRI signals to return to baseline. We measured performance with a staircase procedure. The starting value was the threshold inherited from the preceding behavioral session in the corresponding test phase. In addition to the discrimination task, each participant completed a retinotopic mapping scan (6 min 20 s), a localizer scan (5 min 36 s) and an anatomic image scan (see ROI definition for details). No feedback on correctness was provided in the scanning sessions.
Training sessions
Participants were trained on an orientation discrimination task with Gabors presented at the same orientation and location throughout training. In each session, participants performed 16 runs of the short-delay task. We measured performance with a staircase procedure. For the first run of the first session, the starting angle difference was 8° with a step size of 0.5°. For the subsequent staircase runs, the starting value was the threshold from the preceding run. Training locations (i.e., upper-left or lower-right) and orientations (i.e., 55° or 145°) were counterbalanced across participants. In addition, we provided auditory feedback on incorrect trials. We trained participants for 6 d, resulting in a total of ∼6200 trials.
Behavioral data analysis
To validate the training effect, we used a paired t test to compare the discrimination thresholds between the first and last sessions of the training phase. To examine the effect of training on discrimination performance in the test phases, we calculated a mean percent improvement [MPI = (pretest threshold – posttest threshold)/pretest threshold × 100%; Xiao et al., 2008], separately for each posttest phase. For the short-delay task, we applied a three-way repeated-measures ANOVA (2 stimulus orientations × 2 stimulus locations × 2 posttest phases) on MPI. For the long-delay task, we applied a two-way repeated-measures ANOVA (2 stimulus orientations × 2 posttest phases) on MPI because the stimulus was presented solely at the trained location.
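As a worked illustration of the MPI formula, the sketch below computes the index for one condition. The function name and the threshold values are hypothetical and are not data from the study.

```python
def mean_percent_improvement(pretest_threshold, posttest_threshold):
    """MPI = (pretest - posttest) / pretest x 100%."""
    return (pretest_threshold - posttest_threshold) / pretest_threshold * 100.0


# Hypothetical thresholds in degrees: a drop from 4.0 to 3.0 is a 25% improvement.
example_mpi = mean_percent_improvement(4.0, 3.0)
```

Positive values indicate that the threshold decreased after training; a value of zero indicates no change.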
MRI data acquisition and preprocessing
Imaging data were acquired on a Siemens 3T Prisma scanner located at Peking University. All imaging data were acquired with a 20-channel head coil. For each participant, anatomic images were acquired using MPRAGE T1-weighted sequence (TR = 2530 ms, TE = 2.98 ms, FOV = 256 × 224 mm2, flip angle: 7°, resolution 0.5 × 0.5 × 1 mm3, number of slices: 192, slice thickness: 1 mm, slice orientation: sagittal). Functional scans were acquired using echo planar imaging (EPI) sequence (TR = 2000 ms, TE = 30 ms, FOV = 224 × 224 mm2, flip angle: 90°, matrix: 64 × 64, resolution 3.5 × 3.5 × 3.5 mm3, gap = 0.7 mm, number of slices: 33).
Each participant's anatomic image was segmented into gray and white matter using FreeSurfer (http://surfer.nmr.mgh.harvard.edu/). We performed the cortical reconstruction of the segmented images in BrainVoyager QX software (Brain Innovation). For the functional images, we discarded the first four volumes at the beginning of each run to ensure that the longitudinal magnetization reached steady state. The functional data were processed with slice-timing correction, head motion correction, temporal filtering (three cycles), and removal of linear trends in BrainVoyager QX. Within each scanning session, the functional data were aligned to the first volume of the first run and co-registered to the anatomic image obtained in the same session. Between scanning sessions, all anatomic images were aligned to the participant's own anatomic data acquired in their first session and transformed to the Talairach space. The functional data in the Talairach space were resampled into 3 × 3 × 3 mm3 resolution.
ROI definition and fMRI data analysis
Definition of V1
Participants viewed rotating wedges that created traveling waves of neural activity. We identified V1 boundaries using the standard phase-encoded method (Sereno et al., 1995; Engel et al., 1997). In a separate localizer run, we mapped two location-specific areas in V1, corresponding to the stimulus locations from the orientation discrimination task (i.e., upper-left and lower-right). In each trial, a Gabor patch (55° or 145°) was presented at one of the locations for 2 s. The intertrial interval (ITI) was either 2 or 4 s. The location was randomized across 32 trials. Participants were asked to detect a subtle change of orientation. For each participant and each functional localizer, we computed each voxel's response using a general linear model (GLM) comprising two regressors, one for each stimulus location. Contrasts comparing stimulus in one location to the other led to positive responses in the V1 ROI contralateral to the stimulus location. We selected 40 voxels with top-ranked β estimates for each stimulus location; the exact number of voxels was determined by the minimal number of voxels across participants and V1 ROIs. This voxel selection regime controlled for potential biases in classification accuracy because of varying numbers of voxels across locations and participants.
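The voxel selection rule (a common voxel count set by the smallest ROI, then the top-ranked β estimates within each ROI) might be sketched as below. The function names and the toy β values are hypothetical; the actual selection was performed on the localizer GLM output.

```python
def common_voxel_count(roi_betas):
    """Smallest ROI size across all participants and ROIs."""
    return min(len(betas) for betas in roi_betas.values())


def top_k_voxels(betas, k):
    """Indices of the k voxels with the largest beta estimates."""
    ranked = sorted(range(len(betas)), key=lambda i: betas[i], reverse=True)
    return ranked[:k]


# Toy example: two ROIs of unequal size (keys are hypothetical labels).
roi_betas = {
    ("sub01", "V1_upper_left"): [0.1, 0.9, 0.5, 0.7],
    ("sub01", "V1_lower_right"): [0.3, 0.2, 0.8],
}
k = common_voxel_count(roi_betas)
selected = {name: top_k_voxels(betas, k) for name, betas in roi_betas.items()}
```

Fixing `k` across ROIs keeps the number of classifier features constant, which is the bias control that the paragraph above describes.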
Definition of IPS
We selected IPS ROIs that were functionally defined by the delay period activity, within the anatomically constrained regions. The delay period activity was primarily assumed to reflect WM storage, while it may also relate to other control-related processes (Christophel et al., 2017; Sreenivasan and D'Esposito, 2019). In particular, after applying anatomic segmentation in FreeSurfer, we used the automated ROI labels from the Destrieux atlas (Destrieux et al., 2010) to transform the identified IPS into Talairach space. For each participant, we conducted a GLM analysis that modeled the WM-related activity after the sample stimulus (i.e., delay period) and the baseline activity after the test stimulus (i.e., ITI). The resulting β estimates that indicated statistically significant increases of delay period activity (p < 0.05) were used for voxel selection within anatomically defined ROIs (Xu, 2007). Because of the more prominent delay period activity in left IPS, we defined IPS based on hemisphere (i.e., left and right IPS) to accord with previous studies that had similar observations of left-lateralized delay activity in IPS (Christophel et al., 2012; Albers et al., 2013; Ester et al., 2015). We selected 250 voxels with top-ranked β estimates in each hemisphere for further analysis. The exact number of voxels was determined by the minimal number of voxels across participants and ROIs.
Univariate analysis
We assessed whether training changes the overall BOLD response during WM delay. For each participant and each run, we first extracted the z-normalized response amplitude of each voxel in the predefined ROIs (V1 and IPS). Then, we took the trial-averaged BOLD response between 0 and 26 s time locked to trial onset, separately for each experimental condition during the test phase. Because of our primary focus on the sustained activity during the delay period, we averaged the responses at 12 and 14 s after the trial onset (7th and 8th TRs), which were uncontaminated by the test stimulus presentation (16 s after the trial onset, 9th TR). For each of the ROIs, we then applied a two-way repeated-measures ANOVA (2 stimulus orientations × 3 test phases) on the delay activity to assess how training influenced WM-related activity.
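The correspondence between time points and TR numbers follows from the 2-s TR and the long-delay trial timeline. The short sketch below reproduces that arithmetic in integer milliseconds, assuming (for illustration only) that volume acquisition is aligned to trial onset and that TRs are counted from 1; the constant and function names are hypothetical.

```python
TR_MS = 2000  # repetition time of the functional scans

# Long-delay trial timeline in milliseconds, taken from the task description.
FIXATION_MS = 200 + 3800   # white dot, then black dot
SAMPLE_MS = 200
DELAY_MS = 11800


def tr_number(t_ms, tr_ms=TR_MS):
    """1-based number of the volume that starts at t_ms after trial onset."""
    return t_ms // tr_ms + 1


test_onset_ms = FIXATION_MS + SAMPLE_MS + DELAY_MS      # 16000 ms
delay_window = [tr_number(t) for t in (12000, 14000)]   # [7, 8]
iti_window = [tr_number(t) for t in (24000, 26000)]     # [13, 14]
test_tr = tr_number(test_onset_ms)                      # 9
```

The delay window therefore ends one TR before the volume that coincides with the test stimulus.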
MVPA
We used MVPA to decode the stimulus orientation during the delay period in V1 and IPS. For each participant and each test phase, we extracted z-normalized BOLD responses between 12 and 14 s after the trial onset (7th and 8th TRs) and used the average of the two data points in each trial to represent the delay period activity. By training the classifier to discriminate between two orientations using LIBSVM (http://www.csie.ntu.edu.tw/~cjlin/libsvm/), we calculated the classification accuracy with a leave-one-run-out cross-validation scheme that divided the data set into training (five runs) and testing data (one run). This procedure was repeated six times until each run was tested once (Kamitani and Tong, 2006). The classification accuracy was averaged across the folds, separately for each test phase. To evaluate whether the classification accuracy exceeded the chance level, we performed a permutation test (see below, Permutation test). To assess how training influenced WM representation, we performed a one-way repeated-measures ANOVA (3 test phases) on the classification accuracy for each brain area.
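The leave-one-run-out scheme can be illustrated with a small, dependency-free sketch. Note that the classifier here is a nearest-centroid rule used purely as a stand-in for the LIBSVM classifier named in the text, and the data are synthetic; `toy_dataset`, `fit_predict`, and the other names are hypothetical.

```python
import random


def leave_one_run_out(runs):
    """Yield (held_out_run, train_idx, test_idx), holding each run out once."""
    for held_out in sorted(set(runs)):
        train = [i for i, r in enumerate(runs) if r != held_out]
        test = [i for i, r in enumerate(runs) if r == held_out]
        yield held_out, train, test


def centroid(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_predict(X_train, y_train, X_test):
    """Nearest-centroid classifier (stand-in for the SVM used in the study)."""
    classes = sorted(set(y_train))
    cents = {c: centroid([x for x, y in zip(X_train, y_train) if y == c])
             for c in classes}
    return [min(classes, key=lambda c: sq_dist(x, cents[c])) for x in X_test]


def cv_accuracy(X, y, runs):
    """Mean accuracy across leave-one-run-out folds."""
    fold_acc = []
    for _, train, test in leave_one_run_out(runs):
        pred = fit_predict([X[i] for i in train], [y[i] for i in train],
                           [X[i] for i in test])
        fold_acc.append(sum(p == y[i] for p, i in zip(pred, test)) / len(test))
    return sum(fold_acc) / len(fold_acc)


def toy_dataset(n_runs=6, trials_per_run=16, n_voxels=40, seed=0):
    """Synthetic delay-period patterns: 8 trials per orientation in each run."""
    rng = random.Random(seed)
    X, y, runs = [], [], []
    for run in range(n_runs):
        for trial in range(trials_per_run):
            label = 55 if trial % 2 == 0 else 145
            shift = 0.5 if label == 55 else -0.5
            X.append([shift + rng.gauss(0.0, 1.0) for _ in range(n_voxels)])
            y.append(label)
            runs.append(run)
    return X, y, runs


X, y, runs = toy_dataset()
accuracy = cv_accuracy(X, y, runs)
```

Splitting by run rather than by trial keeps training and test data from sharing within-run temporal dependencies, so each fold trains on 80 trials and tests on the 16 trials of the held-out run.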
To validate that the orientation decoding in V1 reflected WM content, rather than residual sensory information, we conducted control analyses by applying MVPA on the neural activity during ITI (24 and 26 s after the trial onset, 13th and 14th TRs). The selected time window for ITI was supposed to contain the same amount of sensory information as that for the delay period (i.e., 8 and 10 s after the onset of the sample and test stimulus, respectively). To compare the classification accuracy between these two time periods (delay vs ITI), we conducted a three-way repeated-measures ANOVA (2 V1 ROIs × 2 time periods × 3 test phases). Further, to directly compare learning-dependent changes of decoding performance between V1 and IPS, we calculated the difference of classification accuracy between the pretest and two posttest phases (i.e., posttest I – pretest, posttest II – pretest), separately for V1 and IPS. A two-way repeated-measures ANOVA (2 ROIs × 2 posttest phases) was conducted to assess the training effect between brain areas.
Permutation test
We evaluated the statistical significance of MVPA results using permutation tests (Stelzer et al., 2013; Allefeld et al., 2016). In particular, for each scanning session (pretest, posttest I and posttest II) and each brain area, we took the leave-one-run-out cross-validation approach, in which we shuffled the trial labels in the training data and calculated the classification accuracy on the test data. We obtained the classification accuracy averaged across folds for each participant. Then, we averaged the accuracies across participants to obtain a mean value. This procedure was repeated 5000 times to compute a group-level null distribution, consistent with the method used in previous studies (Chen et al., 2011; Cocchi et al., 2017; Roth et al., 2018; Henderson and Serences, 2019). We obtained the p values by calculating the proportion of random samples that exceeded the observed value (i.e., mean classification accuracy from real data). We applied the false discovery rate (FDR) method (Benjamini and Hochberg, 1995) to correct p values for multiple comparisons across predefined ROIs and test phases.
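The last two steps, turning a null distribution into a p value and applying the Benjamini-Hochberg correction, can be sketched as follows. The function names are hypothetical, and counting ties as exceeding the observed value is an assumption made here, since the text says only "exceeded". Building the null distribution itself (shuffling training labels, cross-validating, and averaging across participants 5000 times) is omitted.

```python
def permutation_p(observed, null_values):
    """Proportion of null samples at or above the observed value.

    Ties are counted as exceeding (an assumption for this sketch).
    """
    n_extreme = sum(v >= observed for v in null_values)
    return n_extreme / len(null_values)


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # walk from the largest p to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy numbers, not results from the study.
p_example = permutation_p(0.60, [0.50, 0.55, 0.60, 0.65])
adjusted_example = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```

An effect is then declared significant when its adjusted p value falls below the chosen FDR level (for example 0.05), with one p value per ROI and test phase entering the correction.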
Experiment 2: TMS
Participants
Twenty-three participants (12 females; age range: 19–26 years) were recruited for this study. Three of them did not participate in the TMS sessions after fMRI scanning because of the lack of elevated delay period activity in IPS (for details, see below, Definition of ROIs). The sample size was comparable to those reported in previous WM-related TMS studies (Zanto et al., 2014; Zokaei et al., 2014). All participants were neurologically intact, had normal or corrected-to-normal vision, and reported being right-handed. All participants gave informed consent and the study protocol was approved by the local ethics committee.
Stimulus and apparatus
The stimuli were identical to those used in Experiment 1. The stimuli were presented against a gray background (∼15 cd/m2) on a CRT monitor (refresh rate: 60 Hz) for both behavioral and fMRI experiments. In the TMS lab, the stimuli were displayed on a gray background (∼19 cd/m2) on a CRT monitor (refresh rate: 100 Hz).
Experimental design and statistical analysis
The TMS experiment consisted of four phases (Fig. 2A): (1) a scan for defining ROIs (V1 and IPS) per participant, (2) a 2-d pretest, (3) a 6-d training, (4) a 2-d posttest. Pretest and posttest were completed 1 d before and after the training phase, respectively. The pretest and posttest phases consisted of a behavioral session (first day) and a TMS session (second day).
Scanning session
To guide precise stimulation of the target regions in TMS sessions, each participant completed a V1 localizer scan (two runs) and an IPS localizer scan (one run), in addition to an anatomic image scan and a retinotopic mapping scan (for details, see below, Definition of ROIs).
Behavioral test sessions (first day of the pretest and posttest)
Participants performed a long-delay orientation discrimination task that was similar to that used in Experiment 1. In brief, the total duration of each trial was fixed to 7 s. Each trial began with a central fixation of a black dot shown for 0.6 s on a gray background. The sample and test Gabor stimuli were then sequentially presented for 0.2 s each, separated by a 4-s delay. A blank screen was then shown for 2 s. Participants were asked to report, within 1.5 s, whether the test Gabor was tilted clockwise or counterclockwise relative to the sample stimulus. Several changes were made to the task design for practical reasons. First, we shortened the delay period from 11.8 to 4 s during online TMS stimulation. In Experiment 1, the sluggish BOLD signals required a long delay to isolate WM-related activity, which is not necessary for assessing the neurodisruptive effects of TMS. Second, taking into consideration the stimulation site on the left hemisphere (see rTMS protocol), we presented the stimulus at one of the stimulus locations used in Experiment 1 (i.e., lower-right visual field) that corresponds to the left hemisphere. Third, we used one orientation (55°) as we aimed to compare the TMS effects on discrimination performance between test phases (i.e., pretest vs posttest) and between stimulation conditions (i.e., V1, IPS, and sham), rather than between two orientations (i.e., the trained and untrained orientations). Participants started with 40 practice trials (a fixed angle difference: 10°) and then completed two to three runs of the main task using the staircase procedure (∼65 trials per run), identical to that used in the behavioral test sessions of Experiment 1. No feedback on correctness was provided in these sessions except in practice trials.
Training sessions
Participants were trained to discriminate the orientation around 55° presented at the lower-right visual field. On each session, they performed 16 staircase runs, using the same protocol as that in the training sessions of Experiment 1. We provided auditory feedback on incorrect trials. We trained participants for 6 d, resulting in a total of ∼6200 trials.
TMS sessions (second day of the pretest and posttest)
In an orientation discrimination task (Fig. 2B), we used a fixed angle difference determined by the threshold from the behavioral test session in corresponding phase for each participant. Participants started with 40 practice trials and then completed three runs (80 trials per run). In separate runs, the magnetic stimulation was delivered to one of the three stimulation conditions (V1, IPS, and sham) during the delay period. The order of the stimulation conditions was counterbalanced across participants. No feedback on correctness was provided in the TMS sessions.
TMS and MRI parameters
MRI data acquisition and preprocessing
Imaging data were acquired on a 3T GE MEDICAL SYSTEMS scanner located at Peking University using an eight-channel head coil. For each participant, anatomic images were acquired using a T1-weighted sequence (TR = 6.656 ms, TE = 2.92 ms, FOV = 256 × 256 mm2, flip angle: 90°, resolution 1 × 1 × 1 mm3, number of slices: 192, slice thickness: 1 mm, slice orientation: sagittal). Functional scans were acquired using an EPI sequence (TR = 2000 ms, TE = 30 ms, FOV = 224 × 224 mm2, flip angle: 90°, matrix: 64 × 64, resolution 3.5 × 3.5 × 3.5 mm3, gap = 0.7 mm, number of slices: 33). The same preprocessing procedure was applied to fMRI data as that in Experiment 1.
Definition of ROIs
For each participant, the functional data were aligned to the anatomic data in native space. As in Experiment 1, we defined V1 using retinotopic mapping and the standard phase-encoded method. We then applied a GLM on data from two localizer runs to estimate each voxel's response in V1 (i.e., β estimate), allowing us to define the exact stimulus location (β: lower-right > upper-left). Further, we selected IPS voxels that showed elevated delay period activity (β: delay period > ITI) and were located within an anatomically defined IPS ROI. In addition, we included a control condition with sham TMS over the vertex. The vertex was defined as the midpoint between inion and nasion that was equidistant from the left and right intertragal notches. The coil was centered at the vertex with its face rotated 90° away from the scalp during stimulation. Thus, no cortical stimulation should be received during sham TMS.
rTMS protocol
To investigate the causal role of sensory and parietal areas during WM retention in discrimination tasks along with training, online rTMS was applied over V1 and IPS during the delay period. Online 10-Hz rTMS (five pulses, starting 1500 ms after the offset of the sample stimulus) was delivered at each stimulation site. This TMS protocol was shown to induce interference effects (Mevorach et al., 2010; Romei et al., 2010; Chang et al., 2014) and disrupt BOLD signals in the stimulated area in a concurrent fMRI-TMS study (Sack et al., 2007). We used a fixed intensity of 60% of the stimulator's maximum output for all participants, which was comparable to previous studies on visual and parietal stimulation (Mevorach et al., 2010; Romei et al., 2010; Chang et al., 2014). Note that we did not use motor threshold to determine stimulation intensity for individual participants because it is not necessarily a reliable index of excitability in non-motor areas of the brain (Stewart et al., 2001; Robertson et al., 2003). Moreover, previous work showed that 10-Hz rTMS induced disruptive effects in both occipital and parietal areas (Romei et al., 2010), which were short-lived and observed only by the end of the TMS trains. Meanwhile, a concurrent EEG-TMS study found that this protocol led to progressively enhanced alpha activity during stimulation, which lasted for ∼100–150 ms after the last pulse of the TMS train (Thut et al., 2011). Therefore, to avoid the disruptive effect of TMS on the processing of the test stimulus following the delay period, we added a 2-s delay after the end of the TMS train. The TMS coil was air-cooled for >10 min after each run to prevent overheating of the coil during the experiment.
In particular, we included two target sites (i.e., left V1 and left IPS) for the following reasons. First, TMS effects on V1 for peripherally presented stimuli (>3°) are mainly restricted to the lower visual field (Kastner et al., 1998). Considering that we presented peripheral stimuli at an eccentricity of 6.5° in two stimulus locations (i.e., upper-left vs lower-right visual field) in Experiment 1, we expected it to be more effective to stimulate left V1, which responds to the lower-right visual field. Second, our IPS findings in Experiment 1 (see Results) gave us an a priori reason to stimulate the left IPS. In addition, we included a sham condition to account for nonspecific TMS effects related to variations in general behavioral state (e.g., noise, vigilance).
rTMS pulses were delivered through a MagStim Super Rapid2 stimulator (The MagStim Company) in combination with a 70-mm figure-of-eight coil. Using fMRI-guided Visor Navigation System (Visor2; Advanced Neuro Technology), we separately overlaid V1 and IPS on the anatomic MR image for each participant with their centroid serving as the target site. The center of the coil was placed tangentially over these sites and a mechanical arm was used to keep the coil steady on the scalp. During V1 stimulation, the coil was held with the handle pointing right and parallel to the ground. During IPS stimulation, the coil was held with the handle pointing away ∼45° along the midline (Capotosto et al., 2012; Morgan et al., 2013). The coil position in different sites was chosen based on the literature (Janssen et al., 2015) and was in real-time monitored using Visor2 throughout each session.
Behavioral analysis
Behavioral performance was quantified using the discrimination accuracy and reaction time (RT). To validate the effect of training, we used a paired t test to compare the discrimination thresholds between the first and the last sessions of the training phase. We applied two separate two-way repeated-measures ANOVAs (2 TMS conditions: active vs sham × 2 test phases: pretest and posttest) on the discrimination accuracy and RT. To parallel the comparison of fMRI decoding between two brain areas in Experiment 1, we performed a two-way repeated-measures ANOVA (2 stimulation sites: V1 vs IPS × 2 test phases: pretest and posttest) on the sham-normalized discrimination accuracy (i.e., V1 – sham, IPS – sham).
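The sham normalization and the 2 × 2 repeated-measures ANOVA can be illustrated with a minimal sketch. It relies on a standard property of fully within-subject 2 × 2 designs: each effect has one numerator degree of freedom, so its F value equals the squared one-sample t statistic of the corresponding per-participant contrast. The function names and the synthetic accuracies below are hypothetical and are not the study's data or analysis code.

```python
import numpy as np


def sham_normalize(active, sham):
    """Subtract sham accuracy from active-site accuracy (per participant, per phase)."""
    return np.asarray(active, dtype=float) - np.asarray(sham, dtype=float)


def contrast_F(scores):
    """F(1, n-1) for one within-subject contrast: the squared one-sample t."""
    scores = np.asarray(scores, dtype=float)
    n = scores.size
    t = scores.mean() / (scores.std(ddof=1) / np.sqrt(n))
    return t ** 2, (1, n - 1)


def rm_anova_2x2(v1_norm, ips_norm):
    """Inputs have shape (n_participants, 2); columns are (pretest, posttest)."""
    site = v1_norm.mean(axis=1) - ips_norm.mean(axis=1)
    phase = (v1_norm[:, 1] + ips_norm[:, 1]) - (v1_norm[:, 0] + ips_norm[:, 0])
    interaction = (v1_norm[:, 1] - v1_norm[:, 0]) - (ips_norm[:, 1] - ips_norm[:, 0])
    return {
        "site": contrast_F(site),
        "phase": contrast_F(phase),
        "site_x_phase": contrast_F(interaction),
    }


# Synthetic accuracies for illustration only (20 hypothetical participants).
rng = np.random.default_rng(0)
n = 20
sham = rng.normal(0.80, 0.05, size=(n, 2))
v1 = sham + rng.normal(-0.04, 0.03, size=(n, 2))
ips = sham + rng.normal(-0.02, 0.03, size=(n, 2))

results = rm_anova_2x2(sham_normalize(v1, sham), sham_normalize(ips, sham))
for effect, (F, df) in results.items():
    print(f"{effect}: F{df} = {F:.2f}")
```

Scaling a contrast by a constant leaves its F value unchanged, which is why the phase contrast is not divided by two here.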
Results
Experiment 1: fMRI
Perceptual learning improves performance in short and long delay tasks
Perceptual learning improved participants' discrimination performance, as revealed by the decreased threshold from the first session (mean = 3.00°, SD = 0.81°) to the last session (mean = 2.11°, SD = 0.51°) of the training phase (paired t test: t(15) = 6.40, p < 0.001, Cohen's d = 1.600; Fig. 3A).
To assess the specificity of learning effect, we applied a three-way repeated-measures ANOVA (2 stimulus orientations × 2 stimulus locations × 2 posttest phases) on MPI in the short-delay task (Fig. 3B, left panel). The results showed a main effect of stimulus orientation (F(1,15) = 6.06, p = 0.026,
Perceptual learning does not change BOLD amplitude in V1 and IPS
To examine whether perceptual learning changes the response amplitude in V1 and IPS during the delay period, we used an event-related analysis that compared the BOLD response between the trained and untrained orientations. Figure 4 shows an example of the averaged temporal dynamics of BOLD responses in V1 and IPS from posttest I. A two-way repeated-measures ANOVA (2 stimulus orientations × 3 test phases) revealed no significant effects on the delay activity in either V1 (ps > 0.31) or IPS (ps > 0.12). Our results suggest that training did not alter the overall BOLD response during the WM delay in sensory and higher-order areas.
Perceptual learning alters feature-specific WM representation in V1 and IPS
We next examined whether the feature-specific information was contained in the distributed pattern activity during the delay period and how training modulated such representation. Using MVPA (Kamitani and Tong, 2006), we decoded the stimulus orientation in V1 during the delay period, separately for each test phase (Fig. 5, left). Repeated-measures ANOVAs on the classification accuracy in V1 revealed a main effect of test phases (pretest, posttest I, and posttest II) in the contralateral (F(2,30) = 5.72, p = 0.008,
To ensure that the selected time window for MVPA reflected WM retention and was not spuriously induced by residual effects of sensory processing, we applied the same MVPA analysis to the data from the ITI, a period following the test stimulus (i.e., 8–10 s after the onset of the test stimulus). This period presumably contained comparable sensory information to that during the WM delay following the sample stimulus (i.e., 8–10 s after the onset of the sample stimulus) but without the demand of WM maintenance. This control analysis on the ITI revealed chance-level decoding performance in all of the test phases and V1 subregions (ps > 0.524) and no significant difference across test phases (repeated-measures ANOVAs: contralateral: F(2,30) = 0.02, p = 0.984,
To address whether high-order areas related to WM processing contained feature-specific information and how training influenced such representation, we trained the classifier to distinguish between two orientations in left and right IPS (Fig. 5, right). Repeated-measures ANOVAs revealed a main effect of test phase (pretest, posttest I, and posttest II) in left IPS (F(2,30) = 3.56, p = 0.041,
Given that multivariate fMRI results demonstrate distinct profiles of learning-dependent changes in classification performance between V1 and left IPS, we directly compared the change of decoding performance between these two brain areas. We first collapsed the results in two V1 ROIs (contralateral and ipsilateral) to obtain a single estimate, because a two-way repeated-measures ANOVA (2 V1 ROIs × 3 test phases) on classification accuracies revealed no main effect of V1 ROIs (F(1,15) = 0.12, p = 0.731,
Experiment 2: TMS
Replication of the learning effect on behavior in Experiment 1
Participants were trained on a short-delay orientation discrimination task and showed a decreased threshold over the training sessions (first session: mean = 2.99°, SD = 0.93°; last session: mean = 2.22°, SD = 0.51°; paired t test: t(19) = 5.15, p < 0.001, Cohen's d = 1.152). A similar learning effect was also observed in the long-delay task (pretest: mean = 3.46°, SD = 1.11°; posttest: mean = 2.53°, SD = 0.55°; paired t test: t(19) = 3.62, p = 0.002, Cohen's d = 0.809). These results replicated the behavioral effects observed in Experiment 1, showing that training enhanced discrimination performance in the long-delay task.
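The reported effect sizes can be cross-checked against the reported t statistics. The sketch below assumes that Cohen's d was computed as the mean paired difference divided by the SD of the differences, in which case d = t / √n with n = df + 1. The text does not state the formula used, so this is an assumption, although the reported values in both experiments are consistent with it.

```python
import math


def cohens_d_paired(t_value, df):
    """Cohen's d for a paired design recovered from t and its degrees of freedom.

    Assumption: d = mean difference / SD of differences, so d = t / sqrt(n).
    """
    n = df + 1
    return t_value / math.sqrt(n)


# (label, t, df, reported d), taken from the paired t tests reported in the text.
reported = [
    ("Experiment 1, training", 6.40, 15, 1.600),
    ("Experiment 2, short delay", 5.15, 19, 1.152),
    ("Experiment 2, long delay", 3.62, 19, 0.809),
]

recomputed = []
for label, t, df, d_reported in reported:
    d = cohens_d_paired(t, df)
    recomputed.append(d)
    print(f"{label}: d = {d:.3f} (reported {d_reported:.3f})")
```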
The causal role of V1 and IPS in mnemonic processing over training
While our multivariate analyses in Experiment 1 suggest that perceptual learning changed the engagement of V1 and left IPS during WM retention, we further took advantage of TMS to infer the causal relation between neural activity and behavior and to examine how this relation was altered after training.
The results from Experiment 1 showed reliable orientation decoding in V1 during WM delay before, but not after, training. These results seemingly suggest that V1 became unnecessary for WM after training. We thus disrupted V1 activity during WM delay and compared the change of performance to that in the sham condition (Fig. 6A). A two-way repeated-measures ANOVA (2 test phases: pretest vs posttest × 2 stimulation conditions: V1 vs sham) on discrimination accuracy revealed a significant main effect of the stimulation condition (F(1,19) = 11.05, p = 0.004,
In contrast, the fMRI results showed reliable orientation decoding in IPS during WM delay after, but not before, training. These results predicted a learning-dependent involvement of IPS in WM maintenance. With this rationale, we disrupted IPS activity during WM delay and compared the change of performance to that in the sham condition (Fig. 6A). A two-way repeated-measures ANOVA (2 test phases: pretest vs posttest × 2 stimulation conditions: IPS vs sham) on discrimination accuracy revealed a main effect of stimulation condition (F(1,19) = 8.38, p = 0.009,
To parallel the cross-region comparison of the training effect on fMRI decoding, we performed a two-way repeated-measures ANOVA (2 stimulation sites: V1 vs IPS × 2 test phases: pretest vs posttest) on the sham-normalized discrimination accuracy (see Materials and Methods). This analysis revealed neither a main effect of stimulation sites (F(1,19) = 1.29, p = 0.270,
To rule out the alternative possibility that the disruptive effect of TMS on the accuracy reflected the speed-accuracy trade-off, we performed the same ANOVA tests on RT (Fig. 6B). The results showed main effects of test phases (2 test phases: pretest vs posttest × 2 stimulation conditions: V1 vs sham: F(1,19) = 18.11, p < 0.001,
Discussion
In the present study, we provide evidence that training alters mnemonic representation of simple visual features (i.e., orientation) in a discrimination task. We focused on V1 and IPS, which have been associated with WM retention for visual features (Harrison and Tong, 2009; Serences et al., 2009; Bettencourt and Xu, 2016; Weber et al., 2016). In particular, combining fMRI decoding and TMS techniques, we found that orientation-specific information during WM delay was decodable in V1 before, but not after, training, whereas V1 stimulation led to decreased behavioral performance both before and after training. In contrast, both fMRI decoding and TMS results showed that IPS represented WM content after, but not before, training. These findings thus point to learning-related changes in mnemonic representation of visual features at different cortical levels, complementing prior studies that mainly addressed learning-dependent alterations in sensory and decision-making processes.
Previous neurophysiological and neuroimaging studies have shown training-induced changes in visual cortex that presumably occurred at an early stage of encoding processes (Schoups et al., 2001; Schwartz et al., 2002; Furmanski et al., 2004; Yang and Maunsell, 2004; Yotsumoto et al., 2008; Jehee et al., 2012; Yan et al., 2014; Chen et al., 2015). Here, we examined the contribution of visual cortex to mnemonic representation. Converging pretraining results from fMRI decoding and TMS provided direct support for the theoretical hypothesis of “sensory recruitment of WM” (Pasternak and Greenlee, 2005; Harrison and Tong, 2009; Serences et al., 2009). Extending beyond these findings, we further tested the effects of training on the sensory engagement for WM. Unexpectedly, while the posttraining TMS results suggest a learning-independent role of V1 during WM maintenance, we did not find decodable WM information in V1 after training. Of note, the null result in V1 decoding after training was paired with positive results of V1 decoding before training, using the same sets of voxels and analytical approach. It is thus unlikely that this null effect reflected a lack of sensitivity in voxel selection for decoding. Therefore, we speculate that training moulds the coding format in this region into one to which decoding is insensitive.
Although our observation of learning-independent behavioral impairment after V1 stimulation seems at odds with the inability to decode WM content in V1 specifically after training, this discrepancy may be reconciled by a time-varying attentional modulation of sensory processing, given the close relationship between WM and attention (Awh and Jonides, 2001; Gazzaley and Nobre, 2012). Itthipuripat et al. (2017) found a dominance of attentional gain modulation on sensory activity early in training, which was abolished in the late phase of training. This reported change of gain modulation (i.e., early vs late phase of training) may correspond to our observation of changes in pattern difference between two features (i.e., before vs after training) in V1. In particular, the lack of gain modulation after extensive training may correspond to the absence of decodable WM content. The inability to decode memorized features, however, does not necessarily mean the absence of information. Instead of a gain mechanism, training was proposed to improve performance via noise reduction (Itthipuripat et al., 2017), to which the MVPA of fMRI data might be insensitive.
An alternative interpretation for the lack of decodable WM information in V1 after training is that training induced synaptic changes in WM storage (Mongillo et al., 2008; Christophel et al., 2017; Masse et al., 2020). Previous studies using computational modeling offered a plausible mechanism of activity-silent short-term retention of WM, where the feature-specific information can be retained in the pattern of synaptic weights even in the absence of persistent delay activity (Mongillo et al., 2008; Masse et al., 2020). We thus speculate that such synaptic facilitation might be sufficient for mnemonic processing of features after training without relying on pattern-level differences. Nevertheless, our current study was not sufficient to discriminate between these two possibilities (i.e., noise reduction vs synaptic changes). Further investigations with advanced neurobiological techniques are needed to clarify this issue.
Another governing assumption in WM research has been that the retention of WM information is supported by the delay period activity in the frontoparietal network (D'Esposito and Postle, 2015; Sreenivasan and D'Esposito, 2019). Here, we also showed elevated delay period activity in IPS, which was not influenced by training. Although the pattern activity during the delay period did not contain feature information (i.e., orientation) before training, as also reported in other studies (Linden et al., 2012; Riggall and Postle, 2012), combined evidence from fMRI decoding and TMS suggests that after training, IPS became engaged in feature-specific mnemonic processing. Such learning-dependent changes in IPS may contribute to the formation of a more stabilized WM representation in higher-order areas. Combined with the findings in V1, we suggest that discrimination training may alter feature-specific WM coding at different cortical levels for complementary roles (Bettencourt and Xu, 2016; Lorenc et al., 2018). It is worth noting that some previous studies showed decodable WM content in IPS without training (Bettencourt and Xu, 2016; Cai et al., 2019; Rademaker et al., 2019), differing from the lack of IPS decoding before training in our data. This apparent discrepancy may be ascribed to the different parietal subregions defined across studies: we used the delay-period activity to define IPS, whereas other studies used a retinotopic mapping procedure (Rademaker et al., 2019) or tasks measuring WM capacity (Bettencourt and Xu, 2016). This possibility is supported by the study of Bettencourt and Xu (2016), who found decodable WM content in IPS regions sensitive to WM load, but not in IPS defined by topographic or anatomic features. In addition, the differences in the presented stimulus may also contribute to differences in fMRI decoding: we used a small, peripheral stimulus, in contrast to the large and centrally presented stimulus used in their studies (Bettencourt and Xu, 2016; Rademaker et al., 2019).
We should note that IPS may not be the only region that shows robust delay period activity, as prefrontal cortex (PFC) also exhibits elevated delay activity that can be selective for different stimuli (Miller et al., 1996; Stokes et al., 2013; Ester et al., 2015). However, many neuroimaging studies using MVPA failed to decode WM content in this region (Christophel et al., 2012; Riggall and Postle, 2012; Lee et al., 2013; Ester et al., 2015; LaRocque et al., 2017). The absence of decodable WM content may reflect the limitations of applying MVPA techniques to PFC, as suggested by a meta-analysis (Bhandari et al., 2018) and other studies showing stimulus-specific responses in PFC by means of inverted encoding models (Ester et al., 2015). Another possibility is that PFC activity primarily represents higher-order information, such as task rules and abstract representations of categories (Lee et al., 2013; Ester et al., 2015). Future studies may use other analytical approaches and neurophysiological methods to investigate learning-dependent modification of the WM representation in PFC.
Previous studies have produced mixed results regarding the involvement of V1 and IPS during WM delay, especially between humans and non-human primates. Here, we point to the role of training that may explain this discrepancy. While humans need only a few trials to become familiar with the task, monkeys have to go through extensive training before their neural activity is recorded, analogous to the measurement of posttraining performance in human learning studies. In this regard, the decodable WM content found in human V1 (Harrison and Tong, 2009; Serences et al., 2009) versus weak (or absent) WM information reported in sensory areas from neurophysiological studies (Zaksas and Pasternak, 2006; Mendoza-Halliday et al., 2014) may reflect the key difference in training. Similarly, the decodable WM information contained in elevated delay period activity from neurophysiological studies (Baeg et al., 2003; Averbeck and Lee, 2007) versus the mixed results of WM decoding in parietal cortex from human neuroimaging studies (Linden et al., 2012; Riggall and Postle, 2012) may also relate to the difference in training. Thus, we provide a plausible account that can reconcile the discrepant findings across species. That is, the neural locus of WM representation depends critically on training.
Interestingly, a previous study that used a feature discrimination task showed a lack of TMS effect when stimulating IPS both before and after training (Chang et al., 2014). However, in that study, two features were shown to the participants concurrently, rather than sequentially. This design did not require short-term memory retention of a particular item and thus involvement of IPS was not necessary for performing the task. The contrast between our results and the findings of Chang et al. (2014) supports the critical dependence on parietal cortex to afford stable WM representation after training, specifically when the task includes mnemonic processing of the stimuli (Postle, 2006; Xu, 2017).
In summary, discrimination training influenced the mnemonic processing of visual features in sensory and higher-order areas. Although the sensory engagement for WM is relatively independent of training, training may alter the coding format of WM content in this region. In contrast, the recruitment of higher-order parietal areas for WM representation depends on training, potentially contributing to a more stabilized representation along with improved discriminability. The feature-specific WM representations in different cortical areas may serve complementary roles to support learning-related brain plasticity across multiple levels of the cortical hierarchy (Watanabe and Sasaki, 2015; Dosher and Lu, 2017; Maniglia and Seitz, 2018).
Footnotes
This work was supported by the National Key R&D Program of China Grant 2017YFB1002503 and by National Natural Science Foundation of China Grants 31271081, 31230029, 31800910, and 32000784.
The authors declare no competing financial interests.
Correspondence should be addressed to Sheng Li at sli@pku.edu.cn