Abstract
Although practice has long been known to improve perceptual performance, the neural basis of this improvement in humans remains unclear. Using fMRI in conjunction with a novel signal detection-based analysis, we show that extensive practice selectively enhances the neural representation of trained orientations in the human visual cortex. Twelve observers practiced discriminating small changes in the orientation of a laterally presented grating over 20 or more daily 1 h training sessions. Training on average led to a twofold improvement in discrimination sensitivity, specific to the trained orientation and the trained location, with minimal improvement found for untrained orthogonal orientations or for orientations presented in the untrained hemifield. We measured the strength of orientation-selective responses in individual voxels in early visual areas (V1–V4) using signal detection measures, both before and after training. Although the overall amplitude of the BOLD response was no greater after training, practice nonetheless specifically enhanced the neural representation of the trained orientation at the trained location. This training-specific enhancement of orientation-selective responses was observed in the primary visual cortex (V1) as well as higher extrastriate visual areas V2–V4, and moreover, reliably predicted individual differences in the behavioral effects of perceptual learning. These results demonstrate that extensive training can lead to targeted functional reorganization of the human visual cortex, refining the cortical representation of behaviorally relevant information.
Introduction
Plasticity is an essential quality of the brain. Whether we are learning to better recognize a face or to discriminate the finer details of an image, our day-to-day interactions with the environment require an adaptive neural architecture. One particularly well-studied form of cortical plasticity is that induced by perceptual learning, which can lead to highly specific improvements in behavioral performance for trained visual features presented at a trained retinotopic location (Fiorentini and Berardi, 1980; Gilbert et al., 2001; R. W. Li et al., 2004; W. Li et al., 2008). Such behavioral specificity is often interpreted as implicating the early visual cortex as the site of learning-induced neural plasticity, because of the small receptive fields and precise feature tuning found in these areas. However, direct evidence for the involvement of these areas in perceptual learning has been mixed. While some neurophysiological studies have reported training-related changes in neural response in primary visual cortex (V1) (Schoups et al., 2001) and area V4 (Adab and Vogels, 2011), others have found only minimal effects in early visual areas (Ghose et al., 2002; Chowdhury and DeAngelis, 2008; Law and Gold, 2008). Human neuroimaging studies have shown increased activation in visual areas after training (Schwartz et al., 2002; Furmanski et al., 2004; Yotsumoto et al., 2008; Kourtzi, 2010), but this effect appears to vanish after extensive practice, despite the persistence of behavioral improvements (Yotsumoto et al., 2008). Might extensive training lead to changes in neural response that go undetected with conventional measures? Relatively subtle changes in neural tuning, of the kind observed in previous neurophysiological studies, may leave the overall amplitude of the BOLD response unchanged, yet the resulting shifts in the neuronal population code could still be detectable in the distributed patterns of fMRI activity.
In the present study, we used fMRI to determine whether orientation-selective responses in early human visual areas might be enhanced by extensive training on an orientation discrimination task. Of particular interest was whether cortical changes induced by perceptual learning would prove to be specific to the orientation and location of the stimuli viewed during training, consistent with the specific improvements in behavioral performance that mark low-level perceptual learning. We were also interested in investigating the potential role of attention in perceptual learning. Observers practiced discriminating small changes in the orientation of a peripherally presented grating over a series of 20 or more training sessions, and underwent fMRI scanning before and shortly after this training regimen. We developed a signal detection-based measure to estimate the mean strength of orientation-selective responses in individual voxels in early visual areas, based on previous work demonstrating the presence of reliable orientation signals at this spatial scale (Kamitani and Tong, 2005; Swisher et al., 2010). Although extensive practice did not lead to changes in the gross BOLD response, we found that training nonetheless improved the neural representation of the trained orientation at the trained location in V1 and higher extrastriate visual areas (V2–V4).
Materials and Methods
Observers
Twelve healthy adult volunteers (aged 25–34 years, six females) with normal or corrected-to-normal vision participated in the experiment. All participants gave informed written consent. The study was approved by the Vanderbilt University Institutional Review Board.
Apparatus
The stimuli were generated on luminance-calibrated displays, using Matlab and the Psychophysics Toolbox (Brainard, 1997; Pelli, 1997). During fMRI scans, stimuli were displayed on a rear-projection screen using an Eiki LC-X60 LCD projector with a Navitar zoom lens. During training sessions and the pretraining and posttraining behavioral threshold measurements, stimuli were displayed on a calibrated CRT monitor. To minimize head movements, participants used a bite-bar system in the scanner and a chin-and-head rest in behavioral sessions.
Scanning was performed using a Philips 3-Tesla Intera Achieva MRI scanner with an eight-channel head coil located at the Vanderbilt University Institute for Imaging Science. We used standard gradient-echo echoplanar T2*-weighted imaging to obtain functional images of the occipital lobe as well as posterior parietal and temporal cortex (TR, 2000 ms; TE, 35 ms; flip angle, 80°; FOV, 192 × 192; slice thickness, 3 mm, no gap; in-plane resolution, 3 × 3 mm; 28 slices oriented perpendicular to the calcarine sulcus).
Stimuli and design
Behavioral training.
Throughout the experiment, observers were instructed to maintain fixation on a small, central bull's-eye target. Stimuli consisted of counterphasing full-contrast sinusoidal gratings, oriented at approximately 55° or 145°, presented 5° to the left or right of fixation against a uniform gray background (grating radius, 3.5°; spatial frequency, 1.0 cycles/deg with randomized spatial phase; temporal frequency, 2 Hz sinusoidal contrast modulation). Spatial frequency changed slightly on each trial (randomly drawn from a Gaussian distribution; mean, 1.0 cycles/deg; SD, 0.01 cycles/deg). Contrast decreased linearly to zero over the outer 0.5° radius of the grating. Each observer was assigned a trained stimulus, counterbalanced across participants, which had a fixed base orientation (∼55 or ∼145°) and a fixed spatial location (left or right visual field) throughout the training sessions. Each trial started with the presentation of a central cue (250 ms on, 250 ms off), followed by a first grating (1000 ms), a brief interstimulus interval (500 ms), a second grating (1000 ms), and then a 1000 ms response period. On each trial, subjects performed a two-interval forced-choice orientation discrimination task, reporting via a button press whether the second grating was rotated clockwise or counterclockwise relative to the first. Correct responses were immediately followed by a brief auditory feedback tone. The base orientation was varied slightly on every trial (randomly drawn from a Gaussian distribution; mean, 55 or 145°; SD, 2°) to ensure that the stimuli presented during each interval had to be actively compared to each other, rather than to remembered information about the average base orientation. The additional Gaussian jitter added to the base orientation on each trial ensured that the variance of the presented orientations did not differ between the trained and untrained orientations (t(22) = 0.71, p = 0.48) or before and after training (t(22) = 0.17, p = 0.86). The change in orientation between gratings on each trial, and subsequent orientation discrimination threshold estimates, were determined using an adaptive staircase procedure targeting 75% accuracy (Watson and Pelli, 1983). Subjects extensively practiced this discrimination task, with training occurring in daily 1 h sessions across 20–23 d (∼10,000 trials in total). To help motivate observers, additional monetary rewards of up to $5 per day were given based on the orientation discrimination thresholds attained by the observer on that day of training.
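For illustration only, the following Python sketch shows one way the two grating orientations for a single trial could be generated under this design. It is a hypothetical reconstruction of the described procedure rather than the Psychophysics Toolbox code actually used, and the adaptive staircase that sets the orientation change (delta) is omitted.

```python
import numpy as np

rng = np.random.default_rng()

def make_trial(base_orientation=55.0, jitter_sd=2.0, delta=1.0):
    """Generate the two grating orientations (deg) for one 2IFC trial.

    base_orientation : trained base orientation (about 55 or 145 deg)
    jitter_sd        : SD of the per-trial Gaussian jitter on the base orientation
    delta            : orientation change (deg) currently suggested by the staircase
    """
    # Jittering the base orientation forces observers to compare the two
    # intervals directly rather than rely on a remembered average orientation.
    base = rng.normal(base_orientation, jitter_sd)
    direction = rng.choice([-1, 1])   # rotation direction (sign convention arbitrary here)
    first, second = base, base + direction * delta
    return first, second, direction   # the observer reports the sign of the change
```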
Pretraining and posttraining psychophysical test sessions.
We measured orientation discrimination thresholds for both trained and untrained orientations and locations, before the start and after the conclusion of the training sessions. Observers performed the same orientation discrimination task as during training, but for both the trained and the orthogonal untrained base orientations (∼55 and ∼145°) and at two locations (5° to the left or right of fixation). The magnitude of orientation change on each trial was determined by separate adaptive staircases for each base orientation and location. Subjects received no trial-by-trial feedback on the correctness of their responses during these test sessions. The trial structure was otherwise identical to that of the training sessions.
Pretraining and posttraining fMRI.
BOLD activity was measured before and after training, while subjects performed the orientation discrimination task on each of four possible base orientations (∼10, ∼55, ∼100, or ∼145°) and at both the trained and untrained stimulus locations. Trial structure and stimuli were as in the pretraining and posttraining behavioral sessions, except that gratings were displayed in both locations simultaneously, allowing us to compare training-related changes in BOLD activity for attended versus unattended gratings. Before each trial, observers were shown a compound white/black cue that straddled the fixation point (±0.5°). The color of the cue indicated with 100% validity the location of the upcoming task-relevant stimulus, with the relevant cue color counterbalanced across subjects. The design of this cue ensured balanced visual stimulation in the two hemifields. A single run of fMRI scanning consisted of an initial fixation block followed by eight stimulus blocks and a final fixation block. All blocks were 16 s in duration, except for the final fixation block, which lasted 24 s. Each stimulus block consisted of four trials in which gratings were presented at the same base orientation. The grating orientation for each block was determined independently for the trained and untrained locations. The small orientation variations in each trial were determined by separate adaptive staircases for each base orientation and location, and orientation variations of equal magnitude were applied to both attended and unattended orientations. The task-relevant location alternated every stimulus block. Participants completed 20–28 orientation discrimination runs in each scan session.
Spatially selective visual regions were identified using two visual localizer runs, in which subjects viewed flickering checkerboard stimuli presented in the same locations as the lateral gratings (checker size, 0.5°; display rate, 10 images/s; edge, 0.5° linear contrast ramp). The checkerboard stimulus was presented alternately in the left and right hemifields in 12 s blocks, interleaved with blocks of fixation (run duration, 300 s). Visual areas were mapped during separate scan sessions using conventional retinotopic mapping procedures (Sereno et al., 1995; Engel et al., 1997).
Functional imaging data were initially motion corrected using FSL's MCFLIRT (Jenkinson et al., 2002). Brain Voyager QX (version 1.8, Brain Innovation) was used for subsequent preprocessing, including slice scan timing correction and linear trend removal. No spatial or temporal smoothing was performed. The functional data were aligned to a previously collected anatomical reference scan and resliced into a common volumetric space before subsequent analysis.
Eye position was successfully monitored in the fMRI scanner for five of our subjects, using an MR-compatible Applied Science Laboratories EYE-TRAC 6 eye-tracking system (60 Hz). Data were corrected for blinks and slow linear drift. Analysis of the data confirmed that subjects maintained stable fixation throughout the recording sessions. Mean eye position deviated by <0.05° of visual angle between stimulus blocks, and the stability of the eye position did not differ between any of the orientation conditions (all p > 0.6).
Regions of interest.
Regions of interest (ROIs) were defined on the reconstructed cortical surface for V1 and for extrastriate areas V2, V3, V3A, and hV4 combined, separately for each hemisphere (Sereno et al., 1995; Engel et al., 1997). Within area V1, all voxels that responded to the contralateral localizer stimulus at a lenient threshold (p < 0.05 uncorrected, one-tailed t test) were selected and used as the ROI for subsequent analysis. Within areas V2–V4 combined, we selected the voxels most significantly activated by the contralateral localizer stimulus, matched in number to the V1 ROI. For the BOLD amplitude analysis, the time series of each voxel was normalized to percent signal change units, after which we fitted the time series with a general linear model (assuming a temporally shifted gamma function for the hemodynamic response) and spatially averaged the resulting response estimates across all voxels in the ROI to produce a single measure of response amplitude. For the signal detection analysis, activation patterns for each block were defined by averaging together the 16 s of data for that block, after adding a 4 s temporal shift to account for hemodynamic delay, and were normalized to percent signal change units, defined relative to the average activation level across all task blocks. Separate ROIs were defined for each hemisphere, and the activity patterns in each hemisphere were used to determine the discriminability of the orientation stimuli presented in the contralateral visual field.
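As an illustration of the block-wise pattern extraction described above, a minimal Python/NumPy sketch might proceed as follows; the function name, array layout, and handling of run edges are our own assumptions rather than details of the original preprocessing pipeline.

```python
import numpy as np

def block_patterns(timeseries, block_onsets, tr=2.0, block_dur=16.0, hrf_shift=4.0):
    """Average a single run of fMRI data into one activation pattern per block.

    timeseries   : voxels x timepoints array for one run
    block_onsets : block onset times in seconds (assumes enough volumes remain
                   after the final block to accommodate the hemodynamic shift)
    """
    shift = int(hrf_shift / tr)        # 4 s hemodynamic delay = 2 volumes at TR = 2 s
    n_vols = int(block_dur / tr)       # 16 s block = 8 volumes
    patterns = []
    for onset in block_onsets:
        start = int(onset / tr) + shift
        patterns.append(timeseries[:, start:start + n_vols].mean(axis=1))
    patterns = np.array(patterns)      # blocks x voxels
    # Percent signal change relative to each voxel's mean level across task blocks.
    mean_level = patterns.mean(axis=0)
    return 100.0 * (patterns - mean_level) / mean_level
```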
Signal detection analysis of voxel orientation selectivity.
Previous studies have shown that linear classifiers can accurately predict viewed orientations when applied to fMRI activity patterns from the human visual cortex, due to the presence of modest orientation biases in the response of individual voxels (Kamitani and Tong, 2005; Serences et al., 2009; Swisher et al., 2010). In the present study, we developed a signal detection-based measure (d′) to estimate the strength of orientation-selective responses in individual voxels, based on the rationale that this continuous measure may be more sensitive to detecting subtle changes in orientation selectivity across experimental conditions or sessions. Linear classifiers, such as support vector machines, are ideally suited for predicting the category of a novel test pattern given a set of training data, but the resulting measures of classification accuracy provide a fairly coarse, discretized estimate of the amount of information contained in the data set, especially when the number of data samples is small.
For the signal detection analysis, we measured how well a voxel could discriminate between two different orientations (i.e., orientation pairs) by computing the discriminability (d′) of its responses to the two stimuli across multiple stimulus blocks. The discriminability d′ of two signal distributions is determined by their means and variances:

d′_ijk = (μ_ij − μ_ik) / √[(σ_ij² + σ_ik²) / 2],

where μ_ij is the mean activity and σ_ij is the SD of voxel i when orientation j was presented, and μ_ik is the mean activity and σ_ik is the SD of that same voxel when orientation k was presented. Because we were interested in an unbiased measure of d′, for which the noise distribution is centered around 0, we computed d′_ijk for individual voxels and for all possible pairs of orientations using a cross-validation procedure. First, we used the data from all but two runs to determine the sign α of d′_ijk for a given voxel and a given pair of orientations (α can be either 1 for an increase in activity or −1 for a decrease in activity). Then, for the same voxel and pair of orientations, we used the data from the two remaining independent test runs to calculate d′_ijk, and multiplied this value by α. We repeated the cross-validation procedure until each run had served in the test data set twice, and then calculated the average of α·d′_ijk across all test runs. Finally, we averaged d′ across all voxels in the ROI and across all relevant pairs of orientations to arrive at a single d′ value for each condition and region of interest.
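To make the procedure concrete, the following Python/NumPy sketch shows one way the cross-validated, sign-corrected d′ could be computed for a single voxel and orientation pair. It is an illustrative reimplementation of the description above, not the original analysis code; the function names, array layout, and cyclic run-pairing scheme (one scheme under which each run serves in the test set exactly twice) are assumptions.

```python
import numpy as np

def dprime(a, b):
    """d' between two sets of block responses of a single voxel."""
    return (a.mean() - b.mean()) / np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2.0)

def cross_validated_dprime(responses, run_labels, block_orientations, j, k):
    """Cross-validated, sign-corrected d' for one voxel and one orientation pair (j, k).

    responses          : block-averaged responses of a single voxel (1-D array)
    run_labels         : run index of each block
    block_orientations : base orientation presented in each block
    """
    runs = np.unique(run_labels)
    n = len(runs)
    values = []
    # Pair consecutive runs cyclically so that each run serves in the test set twice.
    for i in range(n):
        test = np.isin(run_labels, [runs[i], runs[(i + 1) % n]])
        train = ~test
        # Sign of the voxel's orientation preference, estimated from training runs only.
        alpha = np.sign(dprime(responses[train & (block_orientations == j)],
                               responses[train & (block_orientations == k)]))
        # Unbiased d' from the held-out runs, multiplied by the training-set sign.
        values.append(alpha * dprime(responses[test & (block_orientations == j)],
                                     responses[test & (block_orientations == k)]))
    return np.mean(values)
```

Averaging the returned values across voxels in an ROI and across the relevant orientation pairs would then yield the condition-level d′ described above.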
With respect to other pattern analysis approaches, our d′ metric is related to the Mahalanobis distance, a distance measure for multivariate distributions. A key difference is that d′ calculates the scaled distance (in SD units) between the two orientation distributions along each individual voxel dimension and then averages these distance values across dimensions, whereas the Mahalanobis distance takes into account the full covariance structure of the data to arrive at a multivariate distance. Full covariance matrices attempt to account for correlations between separate feature dimensions, which can be difficult or impossible to estimate accurately when the number of data samples is limited and the number of dimensions is large. When only a limited number of data samples are available, excluding the off-diagonal values of the covariance matrix (e.g., a naive Bayes classifier with a diagonal covariance matrix) can lead to a more robust estimator; our d′ metric is closely related to this “diagonal” solution for calculating multivariate distances.
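The relationship can be illustrated with a brief sketch (again hypothetical Python/NumPy, not part of the original analysis): the d′-style measure uses only the diagonal of the covariance matrix, one voxel at a time, whereas the Mahalanobis distance inverts the full matrix.

```python
import numpy as np

def mean_diagonal_distance(X_j, X_k):
    """Average per-voxel scaled distance (the 'diagonal', d'-style measure).

    X_j, X_k : blocks x voxels response matrices for orientations j and k.
    For simplicity, this sketch omits the sign correction and cross-validation
    used in the main analysis.
    """
    pooled_var = (X_j.var(axis=0, ddof=1) + X_k.var(axis=0, ddof=1)) / 2.0
    d_per_voxel = (X_j.mean(axis=0) - X_k.mean(axis=0)) / np.sqrt(pooled_var)
    return np.abs(d_per_voxel).mean()

def mahalanobis_distance(X_j, X_k):
    """Multivariate distance using the full pooled covariance matrix."""
    diff = X_j.mean(axis=0) - X_k.mean(axis=0)
    pooled_cov = (np.cov(X_j, rowvar=False) + np.cov(X_k, rowvar=False)) / 2.0
    # With few blocks and many voxels the full covariance estimate is unstable
    # (hence the pseudoinverse), which motivates the diagonal alternative above.
    return float(np.sqrt(diff @ np.linalg.pinv(pooled_cov) @ diff))
```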
Learning modulation index.
We defined learning modulation index (LMI) measures so as to be sensitive to relative changes in activity or performance between orientations, independent of nonspecific changes in performance across scanning sessions. LMI measures for BOLD amplitude and d′ were defined as (posttraining − pretraining value for the trained orientation) − (posttraining − pretraining value for the untrained orthogonal orientation), so that positive values represent a greater increase in activation or discriminability with training for the trained orientation than for the orthogonal untrained orientation. By contrasting changes for the trained orientation with those for the untrained orientation, the LMI isolates effects specific to the trained task and distinguishes them from general effects of practice or from common sources of variance within scanning sessions, such as subject motion.
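As a minimal worked example (a hypothetical helper function, not taken from the original analysis), the LMI reduces to a difference of differences, with a sign flip for threshold-like measures where lower values indicate better performance (see Results).

```python
def learning_modulation_index(pre_trained, post_trained,
                              pre_untrained, post_untrained,
                              lower_is_better=False):
    """LMI = (post - pre, trained) - (post - pre, untrained orthogonal).

    Set lower_is_better=True for threshold-like measures (where smaller values
    mean better performance) so that positive LMIs always indicate a
    trained-specific improvement.
    """
    lmi = (post_trained - pre_trained) - (post_untrained - pre_untrained)
    return -lmi if lower_is_better else lmi

# Hypothetical example: thresholds drop from 4.0 to 2.0 deg for the trained
# orientation but barely change for the untrained orientation.
print(learning_modulation_index(4.0, 2.0, 4.0, 3.9, lower_is_better=True))  # 1.9
```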
Results
Twelve participants practiced discriminating small changes in orientation across successive presentations of a peripherally viewed sinusoidal grating, which throughout each participant's training was displayed at the same location in the visual field and with the same base orientation. Participants practiced this orientation discrimination task in daily 1 h sessions held over 20–23 d, for a total of ∼10,000 trials.
Perceptual performance improved substantially over the course of training, with the smallest discriminable change in the practiced orientation dropping on average to approximately half of that seen before training (Fig. 1a). To assess the specificity of this improvement, we measured orientation discrimination thresholds for both trained and untrained orientations, in separate behavioral test sessions performed before and after the training regime. Specifically, we measured orientation discrimination thresholds for stimuli presented at the trained location and at the opposite location in the visual field, evaluating performance for both the trained orientation and the untrained orthogonal orientation. We summarized the featural specificity of the improvement following training by calculating an LMI, defined as (posttraining − pretraining performance for the trained orientation) − (posttraining − pretraining performance for the untrained orientation). Because lower behavioral threshold values indicate superior performance, we multiplied all behavioral LMI values by −1, so that positive LMI measures indicate greater training-related improvements in performance for the trained orientation than for the untrained orientation (Fig. 1b). The behavioral LMI was significantly higher at the trained location than at the untrained location, as indicated by a within-subjects ANOVA performed on the behavioral threshold values with factors of time of measurement, orientation, and visual field location (interaction between time, orientation, and location, F(1,11) = 9.29, p = 0.01). Additional analysis revealed that the training-related decrease in orientation discrimination threshold was statistically significant for the trained orientation at the trained location (t(11) = 11.11, p < 0.001), but not for the untrained orientation or at the untrained location (all p > 0.18). Thus, practice specifically enhanced behavioral performance for the trained orientation at the trained location.
Effects of learning on behavioral performance for trained versus untrained orientations. a, Orientation discrimination thresholds plotted over time (N = 12). Mean thresholds for discriminating changes about the trained base orientation at the trained visual field location decreased substantially over training sessions. The benefits of training on a particular stimulus did not transfer to untrained stimuli. Because observers trained for a variable number of days (20–23), only the first 20 training days are shown. In this and subsequent figures, shaded area and error bars correspond to ±1 SEM. Points are jittered along the x-axis to aid in data visualization. b, To quantify the extent and specificity of improvements in behavioral performance with perceptual learning, changes in behavioral thresholds were plotted as an LMI [(pretrained − posttrained orientation threshold) − (preuntrained − postuntrained orthogonal orientation threshold)]. Positive LMIs correspond to orientation-specific improvements after training, which were evident only for the trained base orientation at the trained location (trained orientation at the trained location, t(11) = 11.11, p < 0.001; other conditions, all p > 0.18).
What changes in cortical processing underlie this marked improvement for the trained visual orientation? Using fMRI, we first assessed whether there were any training-related changes in overall BOLD activity in the early visual cortices, which are known to encode basic visual features such as orientation. We measured BOLD activation in separate functional imaging sessions before and after the training regime, while subjects performed the orientation discrimination task on each of four possible base orientations (i.e., the trained orientation, the orthogonal untrained orientation, and two flanking orientations at ±45°) and at both the trained and untrained stimulus locations. Gratings were presented in both visual hemifields simultaneously, with a cue near fixation instructing observers to perform the discrimination task on only one location at a time. This manipulation allowed us to assess the effects of visuospatial attention on cortical responses evoked by trained and untrained gratings.
A functional localizer was used to define ROIs in visual areas V1–V4 that corresponded to the retinotopic representation of the stimuli. To facilitate comparison across measurements, and to account for between-session variability that could arise from nonspecific factors unrelated to training, such as subject arousal and head motion, we summarized orientation-specific increases in overall BOLD amplitude within these ROIs (Fig. 2a,b) as LMI values (Fig. 2c). Similar to the behavioral LMIs, positive values for amplitude LMIs indicate greater increases in mean BOLD amplitude after training for the trained orientation, compared to the orthogonal untrained orientation.
a–f, Effects of learning on gross BOLD amplitude for the attended (a–c) and unattended (d–f) conditions. a, b, Average fMRI responses for trained and untrained orientations at both trained and untrained locations, with attention. Data are shown for V1 (a) and V2–V4 (b). Because the pattern of BOLD activity was qualitatively similar across the orthogonal and flanking orientations, data were collapsed across all untrained orientations in a, b, d, and e. Training induced no orientation- or location-specific change in gross BOLD amplitude (F(3,33) = 0.59, p = 0.62). c, LMIs for gross BOLD amplitudes in the attended condition. d, e, Average fMRI responses for trained and untrained orientations presented at both trained and untrained locations while attention was directed away from the stimuli. Data are shown for V1 (d) and V2–V4 (e). Training induced no orientation- or location-specific change in overall BOLD response (F(3,33) = 1.67, p = 0.19). f, LMIs for gross BOLD amplitudes in the unattended condition.
Our analysis of mean BOLD responses to attended visual stimuli indicated no reliable changes in early visual areas due to training. A within-subjects ANOVA performed on mean BOLD amplitudes with factors of ROI, time of scanning, orientation, and visual field location indicated no greater effect of training on gross BOLD amplitude for the trained orientation at the trained location than for the untrained orientation or location (interaction between time, orientation, and location; F(3,33) = 0.59, p = 0.62). Replicating previous findings (Brefczynski and DeYoe, 1999; Gandhi et al., 1999; Somers et al., 1999), we did observe a strong effect of covert spatial attention in early visual areas, with greater BOLD responses observed at attended than unattended stimulus locations (five-factor ANOVA with attention included as an additional factor; main effect of attention, F(1,11) = 85.22, p < 0.001). However, this increase was not specific to the trained orientation at the trained visual-field location. Much like the attended condition, the BOLD amplitudes did not significantly change due to training when the contralateral stimulus was unattended (Fig. 2d–f; four-factor ANOVA, interaction between time, orientation, and location; F(3,33) = 1.67, p = 0.19).
Previous human neuroimaging studies have reported increased BOLD activation in early visual areas after training (Schwartz et al., 2002; Furmanski et al., 2004; Yotsumoto et al., 2008; Kourtzi, 2010), but at least in some circumstances this effect has been found to disappear with more extensive practice, despite the persistence of behavioral improvements (Yotsumoto et al., 2008). Although we found no reliable change in the overall mean BOLD response after 20 d of perceptual training, it is possible that finer-scale changes in the patterns of cortical responses might have occurred as a result of extensive training with a specific visual orientation. Relatively subtle changes in neuronal tuning and response variability have been reported in single-unit recording studies of perceptual learning (Schoups et al., 2001; Ghose et al., 2002; Adab and Vogels, 2011). At a population level, such changes might be expected to lead to subtle shifts in the patterns of feature-selective responses across the visual cortex, even if the overall response is no greater after extensive training. Indeed, training has been shown to increase the discriminability of neural responses to trained orientations, in the absence of an increase in overall neural response (Adab and Vogels, 2011).
Although extensive training did not change overall BOLD activity, could the observed behavioral improvements have arisen from subtler shifts in the pattern of BOLD activity? To address this question, we conducted a voxel-based analysis to explore whether individual voxels in a given region of interest tended to show more differentiated responses to the trained orientation after learning. During the fMRI sessions, subjects viewed four base orientations that differed by 45° increments (i.e., the trained orientation, the orthogonal untrained orientation, and two flanking orientations at ±45°). We measured the separability of these orientation responses by calculating the discriminability (d′) of each voxel's responses to every possible pair of orientations, across repeated presentations, and asked whether the discriminability between the trained orientation and the three untrained orientations was greater after training. Critically, this measure of discriminability (i.e., a signal-to-noise ratio; for details, see Materials and Methods) does not presume that a voxel's response should necessarily increase for the trained orientation. For example, if a voxel prefers an untrained orientation and after training responds even more weakly to the trained orientation, the discriminability between these two orientations will likewise be enhanced.
We compared the strength of cortical orientation-selective responses for each orientation and location before and after the training regimen (Fig. 3a,b). To assess the orientation specificity of the effects of training, we calculated LMI values for our d′ measure of orientation discriminability, where positive LMIs indicate greater training-related improvements in d′ for the trained orientation compared to the orthogonal untrained orientation (Fig. 3c). For our first set of analyses, we focused on attended visual stimuli. A within-subjects ANOVA performed on discriminability values with factors of ROI, time of scanning, orientation, and visual field location indicated a significant three-way interaction between time, orientation, and location (F(3,33) = 3.49, p = 0.02). Tests of the two-way interactions between time and orientation and between time and location did not approach significance (F(3,33) = 0.75, p = 0.53 and F(3,33) = 0.56, p = 0.47, respectively). Moreover, comparing LMIs between trained and untrained locations indicated a significantly greater improvement due to training for the trained orientation at the trained location (V1, t(11) = 2.55, p = 0.03; V2–V4, t(11) = 3.48, p = 0.005). These analyses reveal a highly specific effect of visual training: extensive training results in both orientation- and location-specific improvements in orientation discriminability, similar to those observed in the behavioral results. Interestingly, a breakdown of d′ into contributions from either signal or noise revealed that this improvement in orientation representation from training arose from a combination of both greater values of signal and decreased levels of noise; neither of these components on their own was sufficient to explain the improvement (interaction between time, orientation, and location; signal, F(3,33) = 1.97, p = 0.13; noise, F(3,33) = 2.4, p = 0.08).
a–f, Effects of learning on voxel-based orientation discriminability (d′) for the attended (a–c) and unattended (d–f) conditions. a, b, Average d′ for areas V1 (a) and V2–V4 (b) with attention. Because the pattern of d′ values was qualitatively similar for the orthogonal and flanking orientations, data were collapsed across all untrained orientations in a, b, d, and e. We found a significant three-way interaction between time of scanning, orientation, and visual field location (F(3,33) = 3.49, p = 0.02), whereas neither the two-way interaction between time and orientation nor that between time and location was significant (F(3,33) = 0.75, p = 0.53 and F(3,33) = 0.56, p = 0.47, respectively). c, In both ROIs, LMIs were significantly greater at the trained location than at the untrained location (V1, t(11) = 2.55, p = 0.03; V2–V4, t(11) = 3.48, p = 0.005). d, e, Average d′ for areas V1 (d) and V2–V4 (e) without attention. Training induced no orientation- or location-specific change in d′ when the orientation stimuli were ignored (F(3,33) = 0.89, p = 0.46). f, LMIs for d′ in the unattended condition.
In the unattended condition, we found that orientation-selective responses for the trained stimulus at the trained location were not reliably stronger than those found for the untrained orientation or location following training (four-factor ANOVA; interaction between time, orientation, and location; F(3,33) = 0.89, p = 0.46; Fig. 3d–f). However, the difference between the attended and unattended training-related changes in discriminability, as measured in a five-factor ANOVA with attention included as an additional factor, did not reach statistical significance (interaction between attention, time, orientation, and location; F(3,33) = 1.42, p = 0.25). Consistent with previous reports (Saproo and Serences, 2010; Jehee et al., 2011), we did observe a main effect of covert attention, with significantly stronger orientation-selective responses at the attended stimulus location compared to the unattended location (F(1,11) = 6.76, p = 0.025).
We next examined whether these cortical improvements in orientation discriminability due to training were predictive of the training-related improvement in behavioral performance. If the neural benefits of training are linked to behavior, then we would expect intersubject variability in neural LMIs to correlate with intersubject variability in behavioral LMIs. Indeed, we found a positive correlation between individual participants' training-based improvements in behavioral thresholds and their corresponding improvements in neural discriminability of the trained orientation in the attended condition (Fig. 4). Interestingly, this correlation was strongest within area V1 (r = 0.58, p = 0.04) and appeared weaker in higher visual areas (V2–V4, r = 0.38, p = 0.22). By contrast, there was little to no correlation between neural LMIs and behavior at the untrained location (V1, r = 0.09, p = 0.75; V2–V4, r = 0.06, p = 0.84).
Correlation between voxel-based LMIs for orientation-selective responses and behavioral LMIs for the attended condition. Within V1, there was a significant positive correlation (r = 0.58, p < 0.05) between participants' training-based improvements in behavioral thresholds and corresponding improvements in their neural discriminability. Correlations between behavior and neural activity were not significant in any other condition (all p > 0.2).
Discussion
Our results reveal that on the scale of fMRI activity patterns, perceptual learning gives rise to a more distinctive representation of trained stimuli in the early visual cortex, even in the absence of gross changes in the magnitude of BOLD activation. These refinements in the neural representation were found to be specific to the trained orientation and the trained location in the visual field, in agreement with the visual specificity of perceptual learning in this task. These findings provide compelling evidence that perceptual learning can lead to highly specific and localized changes in early visual areas. In addition, we found a strong correlation between the behavioral effects of learning in individual participants and corresponding improvements in the neural representation of trained orientations in area V1. These results suggest that the functional plasticity of early visual areas is important for realizing the benefits of extended perceptual training.
The improved reliability of orientation-selective activity found here may reflect the enhanced gain or sharpening of orientation-tuned responses at the population level, as such changes would be expected to lead to more distinctive patterns of fMRI activity at the trained orientation. Selective enhancement of orientation responses could lead to improved fidelity of orientation coding, while leaving gross BOLD amplitude largely unaffected. A previous study of alert monkeys found that prolonged training on an orientation discrimination task led to steeper orientation tuning functions in V1, affecting specifically those neurons most sensitive to changes in the trained orientation (Schoups et al., 2001). Interestingly, the improvement in cortical discrimination of the trained orientation relied on some voxels showing increased activity for the trained orientation and others showing decreased activity—a pattern of results that is consistent with sharpening of the population response. A reduction in neuronal noise correlations might also have contributed to our results. A single fMRI voxel indirectly measures activity from a large population of neurons, but because these neurons share noise correlations, pooling over many cells is less effective at improving signal fidelity. If perceptual learning were to reduce the degree of correlated noise at the trained orientation, this could potentially reduce the variability of fMRI responses at the voxel level (Averbeck et al., 2006). Indeed, a previous single-unit study reported a reduction in noise variability following perceptual learning (Gu et al., 2011), and computational modeling has suggested that learning may both sharpen the width of orientation tuning and reduce noise correlations at the population level (Bejjanki et al., 2011). Our results are also in agreement with a previous study that reported both a decrease in the Fano factor and an increase in the separability of trained orientation representations, all in the absence of overall increases in neural response (Adab and Vogels, 2011).
Although we found a significant effect of training on orientation discriminability only in the attended condition, a comparison of the effects of perceptual learning between the attended and unattended conditions did not indicate a statistically reliable difference. This aspect of our findings, although not conclusive, is generally consistent with the proposal that the effects of perceptual learning may be better revealed in the presence of attention (Crist et al., 2001; Gilbert et al., 2001; Ahissar and Hochstein, 2004). Prior electrophysiological studies in awake, behaving macaques found that the neural effects of learning could be observed only when the animal actively performed the trained task (Crist et al., 2001; W. Li et al., 2004). The fact that top-down attention appears important for enhancing the representations of highly trained stimuli might help to explain why other neurophysiological studies have reported weak or negligible effects of learning in early visual areas. In many of these studies, the trained animal was either anesthetized or attending away from the trained stimulus (Schoups et al., 2001; Ghose et al., 2002; Law and Gold, 2008; W. Li et al., 2008). In a direct test of this hypothesis, W. Li et al. (2008) found that although training modulated responses in monkey V1 during task performance, both the behavioral and neuronal effects of perceptual learning were completely abolished when the same animals were tested under anesthesia.
The proposal that attention is important for the effects of perceptual learning should not be confused with one in which the effects of training are largely explained as a difference in attentional effort between trained and untrained conditions. This latter explanation holds that long training periods might motivate subjects to do well on the task and, as a consequence, simply pay more attention to the trained stimulus than to any of the untrained conditions. Although visual attention can have large effects on the fidelity of orientation representations in early visual areas (Jehee et al., 2011; Scolari et al., 2012), there are several reasons why differences in attentional allocation across stimulus conditions are unlikely to explain our results. First, we used an adaptive staircase procedure to equate task-difficulty levels both before and after training, and across orientations and locations, holding effort constant across all conditions. Second, we found no fMRI evidence to suggest increased effort when participants had to attend to trained stimuli. Previous work has shown a link between overall BOLD amplitude and the amount of effort put into a visual task, with higher BOLD amplitudes for increasing levels of effort (Ress et al., 2000). We found no such increase in gross BOLD amplitude for the trained orientation at the trained location when compared with untrained conditions, suggesting that higher levels of effort due to training cannot explain our results.
An interesting aspect of our findings was the enhancement of orientation-selective activity patterns that we found specifically for the trained orientation, despite the fact that overall BOLD amplitudes were no greater for stimuli presented at the trained orientation. The lack of an observed increase in BOLD response amplitude is consistent with a two-stage model of perceptual learning proposed by Yotsumoto et al. (2008). In this model, initial training leads to synaptic proliferation and increased BOLD activity, while more extensive practice is accompanied by synaptic pruning and reduction of BOLD activation to levels near the pretraining baseline. Our subjects showed gradual improvements in behavioral performance after a period of rapid initial gains, suggesting that the posttraining scanning session fell within this later stage of learning.
It is important to note that a previous neuroimaging study reported substantial increases in overall BOLD amplitude in visual cortex for trained visual orientations after prolonged training over a period comparable to that of the present study (Furmanski et al., 2004). A notable difference, however, is that participants in that study were trained on a visual detection task, whereas our participants practiced a near-threshold discrimination task. Optimal visual coding models predict differing effects of training in these two cases. Improved population coding for detection requires increasing the number or sensitivity of the relevant detectors (Jazayeri and Movshon, 2006), which might be expected to lead to increased BOLD activation for trained orientations. Consistent with this notion, a previous neurophysiological study found that cats trained to detect low-contrast gratings presented to one eye exhibited relatively higher V1 neuronal contrast gain for the trained eye than for the untrained eye (Hua et al., 2010). Optimal coding for discrimination tasks, as used here, instead requires producing more distinctive patterns of neural activity (Jazayeri and Movshon, 2006; Scolari and Serences, 2010), which need not lead to gross increases in firing rates or BOLD amplitude. Both the neural and behavioral components of perceptual learning may thus be exquisitely sensitive to the trained visual stimulus as well as to the precise demands of the practiced task.
Footnotes
This work was supported by a Rubicon grant from the Netherlands Organization for Scientific Research (J.F.M.J.), National Eye Institute (NEI) Grant R01 EY017082 (F.T.), National Research Service Award (NRSA) fellowship F32 EY019802 (S.L.), NRSA fellowship F32 EY019448 (J.D.S.), and EU FP7-PEOPLE-2009-RG Grant 256456 (J.F.M.J.). We thank Ben Wolfe and Elizabeth Counterman for technical assistance, and the Vanderbilt University Institute of Imaging Science for MRI support. We are also thankful for administrative support from the Vanderbilt Vision Research Center, supported by NEI Grant P30-EY008126.
Correspondence should be addressed to Janneke F. M. Jehee, Donders Institute for Brain, Cognition and Behavior, Center for Cognitive Neuroimaging, Kapittelweg 29, 6525 EN Nijmegen, The Netherlands. E-mail: janneke.jehee@donders.ru.nl