Abstract
Learning is known to facilitate our ability to detect targets in clutter and optimize brain processes for successful visual recognition. Previous brain-imaging studies have focused on identifying spatial patterns (i.e., brain areas) that change with learning, implicating occipitotemporal and frontoparietal areas. However, little is known about the interactions within this network that mediate learning-dependent improvement in complex perceptual tasks (i.e., discrimination of visual forms in clutter). Here we take advantage of the complementary high spatial and temporal resolution of simultaneous EEG-fMRI to identify the learning-dependent changes in spatiotemporal brain patterns that mediate enhanced behavioral sensitivity in the discrimination of global forms after training. We measured the observers' choices when discriminating between concentric and radial patterns presented in noise before and after training. Similarly, we measured the choices of a pattern classifier when predicting each stimulus from EEG-fMRI signals. By comparing the performance of human observers and classifiers, we demonstrated that learning alters sensitivity to visual forms and EEG-fMRI activation patterns related to distinct visual recognition processes. In particular, behavioral improvement after training was associated with changes in (1) early processes involved in the integration of global forms in higher occipitotemporal and parietal areas, and (2) later processes related to categorical judgments in frontal circuits. Thus, our findings provide evidence that learning acts on distinct visual recognition processes and shapes feedforward interactions across brain areas to support performance in complex perceptual tasks.
Introduction
Successful visual recognition relies on our ability to extract structure from noisy sensory inputs and integrate local features into global forms. Learning plays a key role in facilitating performance in these tasks and optimizing visual recognition processes in the primate brain. In particular, as previous work has shown, learning facilitates the detection and recognition of targets in clutter (Dosher and Lu, 1998; Goldstone, 1998; Schyns et al., 1998; Gold et al., 1999; Kovacs et al., 1999; Sigman and Gilbert, 2000; Gilbert et al., 2001; Brady and Kersten, 2003) by enhancing the integration of relevant features and their segmentation from noisy backgrounds.
The evidence on the neural mechanisms that support perceptual improvements due to training remains controversial. Some studies argue that learning alters early sensory processing (Adini et al., 2002; Teich and Qian, 2003), while others propose that learning alters later decision-related processes (Dosher and Lu, 1999; Li et al., 2004; Law and Gold, 2008; Jacobs, 2009). Previous fMRI studies have implicated both occipitotemporal and frontoparietal circuits in shape learning (Dolan et al., 1997; Gauthier et al., 1999; Grill-Spector et al., 2000; Chao et al., 2002; Kourtzi et al., 2005; Op de Beeck et al., 2006). However, little is known about how learning shapes interactions between these circuits and mediates perceptual improvements in the discrimination of global forms (Scott et al., 2006, 2008; Rossion et al., 2007). Previous EEG studies have focused on the temporal processes that mediate visual feature (e.g., orientation, motion) (Fahle and Skrandies, 1994; Skrandies et al., 2001; Ding et al., 2003; Shoji and Skrandies, 2006; Song et al., 2007; Pourtois et al., 2008; Bao et al., 2010) rather than global form learning. Here, we seek to identify the learning-dependent mechanisms that support distinct processes for visual form learning in clutter, ranging from the extraction of stimulus features from noise to the categorization of global forms.
Using fMRI alone would make it difficult to identify cortical circuits related to different temporal processes involved in visual form learning due to the low temporal resolution of the technique. We exploit the complementary high temporal and spatial resolution of simultaneous EEG-fMRI recordings to determine learning-dependent changes in the discrimination of global forms in clutter. In particular, we trained observers to discriminate global form patterns (concentric vs radial) embedded in parametrically manipulated background noise (see Fig. 1A). Using EEG-informed fMRI and pattern classification analysis methods, we tested for learning-dependent changes in EEG-fMRI activation patterns that related to the observers' enhanced sensitivity in discriminating global forms after training.
Our findings demonstrate that learning acts on distinct visual recognition processes and shapes feedforward interactions between visual and frontal areas to support complex perceptual tasks (e.g., discrimination of visual forms in cluttered backgrounds). In particular, training improved observers' sensitivity in discriminating global forms in noise. This behavioral improvement was associated with neural changes in (1) early processes involved in the integration of global forms engaging occipitotemporal and posterior parietal areas, and (2) later processes related to categorical judgments engaging frontal circuits.
Materials and Methods
Observers
Ten observers (six male, four female; mean age, 21.4 years) participated in the experiment. All observers were from the University of Birmingham, had normal or corrected-to-normal vision, and gave written informed consent. The study was approved by the local ethics committee.
Stimuli
We used Glass pattern stimuli defined by white dot pairs (dipoles) displayed within a square aperture (7.7° × 7.7°) on a black background (100% contrast). For all stimulus patterns, the dot density was 3% and the size of each dot was 2.3 × 2.3 arc min. These parameters were chosen based on pilot psychophysical studies and in accordance with previous studies (Li et al., 2009; Mayhew et al., 2010a) showing that coherent form patterns are reliably perceived for these parameters. We generated radial (0° spiral angle) and concentric (90° spiral angle) Glass patterns by placing dipoles tangentially (concentric stimuli) or orthogonally (radial stimuli) to the circumference of a circle centered on the fixation dot. Each stimulus comprised dot dipoles that were aligned according to the specified spiral angle (signal dipoles) for a given stimulus, and noise dipoles for which the spiral angle was randomly selected. Stimuli were embedded in varying levels of noise by randomizing the orientation of a chosen percentage (0–100%) of dot dipoles (see Fig. 1A).
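The dipole placement rule described above can be sketched in a few lines. This is a minimal illustration, not the authors' stimulus code: the function name `glass_pattern`, the explicit dipole count `n_dipoles` (the text specifies dot density, not a count), and the use of uniformly random dipole centres are assumptions, and dot size and contrast are not modelled.

```python
import math
import random

def glass_pattern(n_dipoles, spiral_deg, signal_frac, shift, half_width, rng):
    """Return a list of dot pairs ((x1, y1), (x2, y2)) in degrees of visual angle.

    spiral_deg:  0 aligns dipoles with the radial direction, 90 makes them tangential.
    signal_frac: fraction of dipoles that follow the spiral angle; the rest
                 get a random orientation (noise dipoles).
    shift:       distance between the two dots of a pair.
    half_width:  half the side of the square aperture (dipole centres fall
                 inside it; a dot may land up to shift/2 outside).
    """
    n_signal = round(signal_frac * n_dipoles)
    dipoles = []
    for i in range(n_dipoles):
        cx = rng.uniform(-half_width, half_width)
        cy = rng.uniform(-half_width, half_width)
        radial_dir = math.atan2(cy, cx)  # direction from fixation to the centre
        if i < n_signal:
            theta = radial_dir + math.radians(spiral_deg)
        else:
            theta = rng.uniform(0.0, 2.0 * math.pi)
        dx = 0.5 * shift * math.cos(theta)
        dy = 0.5 * shift * math.sin(theta)
        dipoles.append(((cx - dx, cy - dy), (cx + dx, cy + dy)))
    rng.shuffle(dipoles)  # avoid any ordering of signal before noise dipoles
    return dipoles

# Example: a concentric pattern at 70% signal in a 7.7 x 7.7 degree aperture,
# with a dot separation of 30 arc min (0.5 degrees). The count of 200 is arbitrary.
example = glass_pattern(200, 90.0, 0.7, 0.5, 3.85, random.Random(1))
```

Exemplar jitter of the spiral angle could be added by perturbing `spiral_deg` per stimulus before calling the function.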
To control for stimulus-specific training effects, and to ensure generalization of learning, we used the following procedures. We trained observers using stimuli with Glass shift (i.e., distance between the two dots in a pair) of 25 arc min, but tested (pretraining and post-training test), and scanned, using stimuli with Glass shift of 30 arc min. Further, to control for local adaptation due to stimulus repetition, we generated different stimulus exemplars by randomly jittering (±5°) the spiral angle for each stimulus. These procedures ensured that learning could not be due to similar local cues between the stimuli used for training, tests, and scanning, but rather global features (i.e., spiral angle) used by the observers for stimulus categorization.
Design
Observers were trained to perform a categorization task (concentric vs radial) and tested in two EEG-fMRI sessions. The first imaging session was preceded by a pretraining psychophysical test session (480 trials). The first scanning session was conducted a maximum of 5 d after the behavioral pretraining test session, depending on the availability of the observers. The second imaging session was preceded by three sessions of psychophysical training outside the scanner, each comprising between five and eight runs (256 trials per run). At the end of this training, observers were tested on a post-training psychophysical test session (480 trials). All three training sessions were completed on consecutive days. The second scanning session was conducted on the following day after the post-training test session.
Psychophysical training
Familiarization phase.
Observers were familiarized with the task and stimuli in a short practice session. Observers were shown 100% signal Glass patterns and categorized the presented stimuli as either radial (0° spiral angle) or concentric (90° spiral angle) patterns.
Training and test.
Two test runs were performed where observers were presented with Glass patterns ranging in signal strength from 0 to 100% (steps: 0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 85, 100%) and performed the categorization task without feedback. Sixteen stimuli were used for each signal level (eight radial, eight concentric) totaling 240 trials per run. This pretraining test allowed us to assess each observer's initial categorization performance before the first imaging session and training. Following the first imaging session, observers were presented with stimuli (range, 5–75%; signal levels in steps of 5%) and were trained (three sessions per observer, self-paced procedure with auditory error feedback) to categorize between radial and concentric patterns. Each training session comprised multiple runs (ranging from five to eight runs) with 256 trials per run. For each trial during training, the stimulus was presented for 300 ms. A white fixation square (7.7 × 7.7 arc min) was presented at the center of each stimulus. Observers were instructed to indicate which category the stimulus belonged to by pressing one of two keys. Observers were trained with auditory error feedback until their performance reached a stable level. That is, training was completed when observers reached 80% performance across all trials twice during the training sessions and 80% performance across all trials during the post-training test. After training, observers were tested on stimuli ranging from 0 to 75% (steps of 5%) in two post-training runs (240 trials per run) without feedback.
FMRI measurements
All observers participated in two scanning sessions during which they performed the categorization task on the Glass pattern stimuli. For each observer, we collected data from seven or eight event-related runs in each session. The order of trials was matched for history (one trial back) such that each trial was equally likely to be preceded by any of the conditions. The order of the trials differed across runs and observers. Eight conditions (seven stimulus conditions and one fixation condition during which only the fixation point was displayed at the center of the screen) with 16 trials per condition were presented in each run. Each run comprised 129 trials (128 trials across conditions and 1 initial trial for balancing the history of the second trial) and two 9 s fixation periods (one at the beginning and one at the end of the run).
The stimulus conditions comprised Glass patterns of 0° ± 1.5° or 90° ± 1.5° spiral angle at 0, 25, 35, 50, 70, 85, and 100% signal level. Each trial lasted 3 s. For fixation trials, the fixation square was displayed for 3 s. For experimental trials, each trial started with a 200 ms stimulus presentation followed by a 1300 ms delay, during which a white fixation square was displayed at the center of the screen. After this fixed delay, the fixation square changed color to either green or red. This change in fixation color served as a cue for the motor response using one of two buttons. If the color cue was green, observers indicated concentric versus radial by pressing the left versus right key, respectively. If the color cue was red, the key assignment was reversed (i.e., concentric = right key, radial = left key). The fixation color was changed back to white 300 ms before the next trial onset. This procedure dissociated the motor response (button press) from the stimulus categories. Observers were familiarized with this procedure before scanning.
Data acquisition
FMRI scanning.
Experiments were conducted at the Birmingham University Imaging Center (3T Achieva scanner, Philips). EPI and T1-weighted anatomical (1 × 1 × 1 mm) data were collected with an eight-channel sensitivity encoding (SENSE) head coil. EPI data (gradient echo-pulse sequences) were acquired from 24 slices (whole-brain coverage; TR, 1500 ms; TE, 35 ms; flip-angle, 73°; 2.5 × 2.5 × 4 mm resolution).
EEG recordings.
We recorded EEG and fMRI signals simultaneously during scanning. EEG data were acquired from 64 electrodes using an MRI-compatible cap and amplifiers (BrainProducts) with current-limiting safety resistors of 5 kΩ at the amplifier input and in each electrode. The EEG cap comprised 62 scalp electrodes positioned according to the 10–20 system. To identify trials contaminated by eye blinks, we recorded the electro-oculogram from an electrode placed over the mid-lower eyelid. To correct for ballistocardiographic (BCG) artifacts, we recorded the electrocardiogram (ECG) from an electrode attached to the observer's chest, below the left collarbone. Data were sampled at 5000 Hz, with a low-pass hardware filter at 250 Hz. Electrode impedances were always kept <20 kΩ. The EEG system clock was synchronized with the MRI scanner clock using SyncBox (BrainProducts). A custom-made photo sensor was used to measure the precise timing of stimulus onset on the screen inside the scanner. The detected stimulus onsets and the MRI volume triggers were saved as markers, together with the recorded EEG signals.
Data analysis
Behavioral data analysis.
We fitted psychometric (proportion concentric) data collected in the laboratory with a cumulative Gaussian function using a procedure that implements a maximum-likelihood method (Wichmann and Hill, 2001). Confidence intervals were calculated on the fits from 2000 bootstrap iterations of the data. Using this procedure for each individual observer's behavioral data, we identified the threshold (i.e., signal level at 78% correct) for each observer.
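As an illustration of the fitting step, the sketch below fits a cumulative Gaussian to proportion-concentric data by maximizing the binomial likelihood and then reads out the signal level at a target proportion. It is a simplified stand-in, not the Wichmann and Hill (2001) procedure: it uses a coarse grid search rather than a numerical optimizer, omits lapse parameters and the bootstrap confidence intervals, and assumes a signed signal axis (negative for radial, positive for concentric). All function names are hypothetical.

```python
import math
from statistics import NormalDist

def neg_log_likelihood(mu, sigma, levels, n_yes, n_total):
    """Binomial negative log-likelihood of 'concentric' counts under a cumulative Gaussian."""
    nd = NormalDist(mu, sigma)
    nll = 0.0
    for x, k, n in zip(levels, n_yes, n_total):
        p = min(max(nd.cdf(x), 1e-6), 1.0 - 1e-6)  # keep log() finite
        nll -= k * math.log(p) + (n - k) * math.log(1.0 - p)
    return nll

def fit_cumulative_gaussian(levels, n_yes, n_total, mus, sigmas):
    """Grid-search maximum-likelihood fit; returns the best (mu, sigma) pair."""
    candidates = ((m, s) for m in mus for s in sigmas)
    return min(candidates,
               key=lambda ms: neg_log_likelihood(ms[0], ms[1], levels, n_yes, n_total))

def level_at(p_target, mu, sigma):
    """Signal level at which the fitted function reaches p_target (e.g., 0.78)."""
    return NormalDist(mu, sigma).inv_cdf(p_target)
```

Confidence intervals in the spirit of the text could be obtained by resampling the counts at each level from the fitted probabilities and refitting on every resample.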
FMRI data processing.
MRI data were processed using BrainVoyager QX (Brain Innovation). Anatomical data were used for 3D cortex reconstruction, inflation, and flattening. Preprocessing of functional data included slice-scan time correction, head movement correction, temporal high-pass filtering (three cycles), and removal of linear trends. Trials with head motion larger than 1 mm of translation or 1° of rotation were excluded from the analysis. Spatial smoothing (Gaussian filter; full-width at half-maximum, 6 mm) was performed only for the group random-effects analysis and not for the data used in the multivoxel pattern classification analysis. The functional images were aligned to anatomical data, and the complete data were transformed into Talairach space. For each observer, the functional imaging data from the two sessions were coaligned by registering all volumes of each observer to the first functional volume of the first run and session. This procedure ensured careful registration across sessions. To avoid confounds from any remaining registration errors, we compared fMRI signals between stimulus conditions within each session rather than across sessions. A gray-matter mask was generated for each observer in Talairach space from the anatomical data to select only gray-matter voxels for further analyses.
EEG data processing.
We focused our EEG analysis on robust event-related signals in the 1–40 Hz frequency range. These signals have previously been shown to reflect visual form processing (Ohla et al., 2005; Pei et al., 2005). The MRI volume triggers were used to identify the onset of each gradient artifact to create an artifact template. MRI gradient artifacts were then removed using average artifact subtraction (Allen et al., 2000) in BrainVision Analyzer (BrainProducts). EEG data were downsampled to 500 Hz, and BCG artifacts were removed using the optimal basis set method (Niazy et al., 2005) available as a plug-in to EEGLAB (Delorme and Makeig, 2004). For each imaging session, EEG data from all experimental runs were concatenated and EEG signals were band-pass filtered between 0.1 and 40 Hz. The filtered data were then analyzed with a FastICA (Hyvarinen and Oja, 1997) algorithm to generate 62 independent components (ICs). In each session, ICs containing the transient eye-blink artifact were removed from the data (Jung et al., 2000). These ICs were identified from (1) plots of trial amplitude or ERP images (Jung et al., 2000) that showed a distinctive pattern of transient deviations with large amplitude occurring at unpredictable latencies relative to the stimulus, and (2) scalp maps of electric-field distribution with an obvious frontal weighting. These measurements are quite distinct from ERP images and scalp maps from stimulus-related signals. Further, components whose time course significantly correlated with the recorded ECG signal were rejected as residual BCG artifacts (Srivastava et al., 2005; Debener et al., 2007). The remaining ICs were used to reconstruct the EEG signal for further analysis. Single-trial EEG epochs were extracted using a window of 0.7 s (from 200 ms prestimulus to 500 ms poststimulus) based on the stimulus-onset markers provided by the photo sensor. 
For each epoch, a baseline correction was performed by subtracting the average of the prestimulus (200 ms) data. Single trials with maximum amplitude difference >100 μV were excluded from further analysis.
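The epoching, baseline correction, and trial rejection steps can be sketched as follows for a single channel. The sketch assumes the downsampled 500 Hz signal is held as a plain list of amplitudes in μV, that stimulus onsets are given as sample indices, and that "maximum amplitude difference" means the peak-to-peak range within the epoch; the function names are hypothetical.

```python
FS = 500                   # sampling rate in Hz after downsampling
PRE = FS * 200 // 1000     # 100 samples = 200 ms prestimulus
POST = FS * 500 // 1000    # 250 samples = 500 ms poststimulus

def extract_epoch(signal, onset):
    """Cut the 0.7 s window around a stimulus-onset sample index."""
    return signal[onset - PRE:onset + POST]

def baseline_correct(epoch):
    """Subtract the mean of the prestimulus samples from the whole epoch."""
    base = sum(epoch[:PRE]) / PRE
    return [v - base for v in epoch]

def is_clean(epoch, max_range_uv=100.0):
    """Keep the epoch only if its peak-to-peak range is at most max_range_uv."""
    return max(epoch) - min(epoch) <= max_range_uv
```

A typical use is `[e for e in (baseline_correct(extract_epoch(sig, t)) for t in onsets) if is_clean(e)]`, where `sig` and `onsets` stand for one channel's data and the photo-sensor onset markers.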
Mutual information estimation for EEG signals.
As the mean amplitude of EEG signals did not allow us to discriminate between radial and concentric patterns (see Fig. 2B), we used information theory (Shannon, 1948; Cover and Thomas, 1991) to estimate mutual information (MI) between stimulus conditions and EEG responses (Montemurro et al., 2008). This measure is driven by the distribution of the EEG signal amplitudes and therefore is more sensitive than the mean ERP signals in identifying informative EEG components related to stimulus conditions. That is, MI between EEG amplitude and a given stimulus condition is a measure of the statistical dependence between these two variables. High MI values suggest that the distributions of the two variables share common information.
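We estimated the mutual information I(S;R) between stimulus conditions and EEG responses for each observer, session, and EEG channel as follows:
$$I(S;R) = \sum_{s}\sum_{r} P(s)\,P(r \mid s)\,\log_2 \frac{P(r \mid s)}{P(r)} \qquad (1)$$
where s is the stimulus condition (N = 13), r is the amplitude of the EEG response, and P is the probability: P(s) is the probability of stimulus condition s, P(r | s) is the probability of response r given s, and P(r) is the probability of response r across all conditions.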
MI was calculated across all stimulus conditions (i.e., signal levels). For each channel and trial, the EEG time series was smoothed by averaging the signal within a 10 ms window around each time point. We then estimated the distribution of the signal amplitudes using 30 response bins (that is, Ns/R̄ ≥ 4, where Ns is the number of trials per condition and R̄ is the number of bins). For each session, we set the upper and lower bounds of the amplitude distribution to the limits of its two-tailed 95% confidence interval. Amplitude values above the upper bound or below the lower bound were set to the respective bound. Note that this amplitude correction was applied only for this mutual information analysis; all subsequent analyses used the preprocessed EEG signal without this correction. This amplitude correction was performed to preserve the sensitivity of the information measurement given the number of bins necessary for estimating the mutual information (Panzeri et al., 2007).
We estimated the MI for each EEG channel based on Equation 1. We also shuffled the condition labels 500 times and estimated the shuffled MI to create a baseline measurement. Due to the limited number of trials, we corrected the estimated MI by applying a Bayesian procedure (Panzeri and Treves, 1996) and subtracting the shuffled MI. After applying this correction separately at each time point, we tested across all observers and sessions whether MI values differed significantly from chance. Using this procedure, we computed MI separately for each session (pretraining, post-training) and observer. No significant differences were observed between the two sessions in either the component latency (Component 1, F(1,9) = 0.95, p = 0.38; Component 2, F(1,9) = 0.31, p = 0.62) or amplitude (Component 1, F(1,9) = 0.85, p = 0.4; Component 2, F(1,9) = 1.21, p = 0.22). Thus, to ensure sufficient signal power for robust and independent estimation of informative EEG components, we calculated the MI per temporal bin (30 ms) across all channels, stimulus conditions, and EEG single trials, pooling both sessions and all observers. In particular, we used the maximum amplitude peaks of the MI time course (averaged across all EEG channels and observers) to identify the temporal components that contained discriminative information across stimulus conditions.
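The binned estimate and its shuffle baseline can be illustrated with the sketch below, which operates on the amplitudes at a single time point of a single channel. It is a plug-in estimator under stated assumptions: amplitudes are clipped at the 2.5th and 97.5th percentiles (nearest-rank approximation), bins are of equal width, and the Bayesian bias correction of Panzeri and Treves (1996) is not implemented. Function names are hypothetical.

```python
import math
import random
from collections import Counter

def clip_to_percentiles(values, lo=2.5, hi=97.5):
    """Set values outside the [lo, hi] percentile range to the nearest bound."""
    s = sorted(values)
    lo_v = s[int(round(lo / 100 * (len(s) - 1)))]
    hi_v = s[int(round(hi / 100 * (len(s) - 1)))]
    return [min(max(v, lo_v), hi_v) for v in values]

def bin_amplitudes(values, n_bins):
    """Assign each amplitude to one of n_bins equal-width bins (0 .. n_bins - 1)."""
    lo_v, hi_v = min(values), max(values)
    width = (hi_v - lo_v) / n_bins or 1.0  # guard against a constant signal
    return [min(int((v - lo_v) / width), n_bins - 1) for v in values]

def mutual_information(stimuli, responses):
    """Plug-in MI in bits: sum over (s, r) of P(s, r) * log2(P(s, r) / (P(s) P(r)))."""
    n = len(stimuli)
    n_s, n_r = Counter(stimuli), Counter(responses)
    n_sr = Counter(zip(stimuli, responses))
    return sum((c / n) * math.log2(c * n / (n_s[s] * n_r[r]))
               for (s, r), c in n_sr.items())

def shuffle_corrected_mi(stimuli, responses, n_shuffles, rng):
    """Raw MI minus the mean MI obtained after shuffling the condition labels."""
    raw = mutual_information(stimuli, responses)
    labels = list(stimuli)
    total = 0.0
    for _ in range(n_shuffles):
        rng.shuffle(labels)
        total += mutual_information(labels, responses)
    return raw - total / n_shuffles
```

With single-trial amplitudes `amps` and condition labels `conds`, one time point would be processed as `shuffle_corrected_mi(conds, bin_amplitudes(clip_to_percentiles(amps), 30), 500, random.Random(0))`; the names `amps` and `conds` are placeholders.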
EEG channel selection
To select EEG channels that contained information for discriminating between radial and concentric patterns, we first computed the scalp topographies based on average ERP signals. However, comparing the topographies for radial versus concentric stimuli did not reveal any significant differences in EEG amplitude for either Component 1 or 2. As this analysis was not sensitive enough to reveal channels containing information useful for discriminating between stimulus conditions, we used a receiver operating characteristic (ROC) analysis on the response amplitude of each channel across single trials. We performed this analysis on the data within a 10 ms window around the peak of each of the two components and measured the area under the curve, which indicated the discriminability of EEG signals related to radial (0°) and concentric (90°) trials. We calculated significance values using a bootstrap procedure; that is, we shuffled the stimulus labels and calculated the ROC value for each channel 1000 times. We then ranked all channels by ROC significance value and selected the top 20% of channels across the whole scalp. This procedure selected channels from across the scalp: posterior channels were selected more frequently for Component 1 (parietal, 27.5%; occipital, 8.3%) than for Component 2 (parietal, 17.5%; occipital, 4.6%), whereas frontal channels were selected more frequently for Component 2 (31.3%) than for Component 1 (26.7%). This is consistent with previous studies that implicate posterior areas in visual processing and higher frontal circuits in later perceptual judgments. We then averaged the time course of the selected channels to generate a mean EEG time course for each component and observer.
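The channel-ranking logic can be sketched as below. The area under the ROC curve is computed from pairwise comparisons of single-trial amplitudes, significance is estimated by shuffling trial labels, and the best-ranked fraction of channels is retained. The two-sided test statistic (distance of the area from 0.5) and the function names are assumptions made for illustration.

```python
import random

def roc_auc(pos, neg):
    """Area under the ROC curve: P(pos > neg) + 0.5 * P(pos == neg)."""
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))

def permutation_p_value(pos, neg, n_perm, rng):
    """Share of label shuffles whose |AUC - 0.5| is at least the observed value."""
    observed = abs(roc_auc(pos, neg) - 0.5)
    pooled = list(pos) + list(neg)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        a, b = pooled[:len(pos)], pooled[len(pos):]
        if abs(roc_auc(a, b) - 0.5) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)

def select_top_channels(p_values, fraction=0.2):
    """Indices of the channels with the smallest p values (top `fraction`)."""
    n_keep = max(1, round(fraction * len(p_values)))
    order = sorted(range(len(p_values)), key=lambda ch: p_values[ch])
    return order[:n_keep]
```

Here `pos` and `neg` would hold one channel's amplitudes for concentric and radial trials; which category is treated as "positive" does not affect the two-sided statistic.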
EEG-informed fMRI mapping
To identify brain regions associated with different stages of shape discrimination in noise, we used an EEG-informed GLM analysis (Debener et al., 2005; Eichele et al., 2005; Philiastides and Sajda, 2007). We generated separate regressors for each of the two EEG temporal components and tested for fMRI responses that correlated with the amplitude of each EEG component across trials. For each individual observer, a separate regressor for each EEG temporal component was generated based on the single-trial variability in the EEG amplitude at the respective component latency. The regressor amplitude at each trial was calculated by averaging the amplitude of the selected channel time course within a 10 ms window, centered at the component peak latency. The two EEG regressors were decorrelated using Gram–Schmidt orthogonalization, which removed the variance common to the two regressors from either the first or the second regressor (Eichele et al., 2005). In particular, correlations between regressors (mean across observers and sessions, r = 0.12; SD, 0.01) were eliminated (r = 0.00) after removing any common variance from the second or first component regressor. This procedure ensured that fMRI activations were specific to each component rather than a general feature of the visual evoked response. Both regressors were then convolved with a canonical double-γ hemodynamic response function. These regressors were used to form a GLM along with six other regressors derived from the motion correction parameters.
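For two regressors, the Gram–Schmidt step reduces to subtracting from one regressor its projection onto the other. The sketch below shows this for mean-centred single-trial amplitude vectors and verifies that the correlation drops to zero. The function names are hypothetical, and convolution with the hemodynamic response function is not shown.

```python
def mean_center(x):
    m = sum(x) / len(x)
    return [v - m for v in x]

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

def orthogonalize(target, reference):
    """Remove from `target` the component explained by `reference`.

    Both vectors are mean-centred first, so the result is uncorrelated
    with `reference` (Pearson r = 0 up to rounding error).
    """
    t, r = mean_center(target), mean_center(reference)
    beta = dot(t, r) / dot(r, r)
    return [tv - beta * rv for tv, rv in zip(t, r)]

def pearson_r(a, b):
    a, b = mean_center(a), mean_center(b)
    return dot(a, b) / (dot(a, a) * dot(b, b)) ** 0.5

# Example with made-up single-trial amplitudes for the two components.
comp1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
comp2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.5]
comp2_orth = orthogonalize(comp2, comp1)  # comp2 with shared variance removed
```

Swapping the arguments removes the shared variance from the first component instead, which corresponds to the alternative ordering mentioned in the text.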
We performed group random-effects analysis and identified regions for which the amplitude of each of the two EEG components correlated significantly (p < 0.05, cluster threshold correction) with the BOLD signal. Performing this EEG-informed GLM analysis separately on pretraining and post-training data and comparing the activation maps (t test) between sessions did not show any significant (p < 0.05) differences between sessions for either component. Therefore, to identify regions of interest that correlated with each of the two EEG components, we pooled the data across the two sessions. We identified the same regions in individual observers using fixed-effects analysis, and labeled these regions based on the overlap of functional activations and anatomical landmarks.
Multivoxel pattern fMRI analysis
To test which brain regions showed learning-dependent changes, we conducted multivoxel pattern analysis (MVPA) on the activation patterns of the regions identified based on the EEG-informed fMRI analysis. This approach has been shown to be more sensitive than conventional statistical analyses of fMRI signals in revealing learning-dependent differences in the discrimination of visual forms (Li et al., 2009). This approach is supported by an analysis comparing the functional signal change between radial and concentric stimuli for each session that showed no significant differences between the BOLD responses to radial or concentric stimuli for either the pretraining or post-training session. A three-way repeated-measures ANOVA (ROI × session × stimulus) showed no significant effect of session (F(1,9) = 0.88, p = 0.52) or stimulus (F(1,9) = 0.55, p = 0.47) and no significant interaction between session and stimulus (F(1,9) = 1.06, p = 0.34).
In particular, for each observer, we selected voxels in each ROI that were significantly activated in the corresponding EEG-fMRI statistical map (p < 0.05, uncorrected). We ordered these voxels based on their t value (in descending order) for radial versus concentric stimuli (i.e., we compared activations for radial and concentric stimulus trials across conditions). Following this procedure, we selected up to 100 voxels for each ROI and observer for the analysis, as prediction accuracy had saturated at this pattern size across areas, resulting in a dimensionality compatible with previous studies (Haynes and Rees, 2005; Kamitani and Tong, 2005; Li et al., 2007). As the regions of interest were defined by pooling data across both the pretraining and post-training sessions, a common set of voxels was selected for pattern classification of the data from each session. Careful alignment of the functional data across sessions ensured that the 100 voxels selected for MVPA were the same across sessions. Each voxel time course was z-score normalized for each experimental run separately. The data pattern for each trial was generated by shifting the fMRI time series by 3 volumes (4.5 s) to account for the hemodynamic delay.
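The construction of single-trial patterns can be sketched as follows: each voxel time course is z-scored within a run, the series is read 3 volumes after trial onset, and the two volumes belonging to a 3 s trial (TR, 1.5 s) are averaged. The data layout (one list per voxel), the use of the population standard deviation, and the function names are assumptions for illustration.

```python
from statistics import mean, pstdev

def zscore(series):
    """Z-score one voxel time course within a run (constant series map to 0)."""
    m, s = mean(series), pstdev(series)
    return [(v - m) / s if s > 0 else 0.0 for v in series]

def trial_patterns(run, onsets, shift=3, n_avg=2):
    """Build one multivoxel pattern per trial.

    run:    list of voxel time courses for one run (one list per voxel).
    onsets: volume index at which each trial starts; every onset must leave
            at least shift + n_avg volumes before the end of the run.
    shift:  hemodynamic delay in volumes (3 volumes = 4.5 s at TR = 1.5 s).
    n_avg:  number of consecutive volumes averaged per trial.
    """
    z = [zscore(voxel) for voxel in run]
    patterns = []
    for t in onsets:
        start = t + shift
        patterns.append([mean(v[start:start + n_avg]) for v in z])
    return patterns
```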
Finally, we used a linear support vector machine (SVM) for the pattern classification. We trained the classifier to associate fMRI signals with a label (radial vs concentric) related to the stimulus condition. We averaged the two volumes from each trial (trial duration = 3 s; TR = 1.5 s) to generate one training pattern per trial. We then tested whether the classifier predicted the stimulus condition (radial vs concentric) using an independent dataset. To ensure generalization of the classification, we used a leave-one-run-out cross-validation procedure. That is, for each cross-validation, we left one run out as an independent test dataset and used the data from the remaining runs as the training set (112 patterns per run). For each observer, we calculated the mean performance of the classifier (proportion of trials classified correctly) in predicting whether each stimulus was radial or concentric across cross-validations. We then calculated fMR-metric functions (Li et al., 2009) per ROI by averaging the performance of the classifier for each stimulus condition across cross-validations and fitting the data using a cumulative Gaussian. It is important to note that the classification comparisons were independent of the voxel selection procedure: the voxel selection was conducted using only the training dataset for the extreme stimulus conditions (excluding the test dataset for each cross-validation).
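The cross-validation scheme can be illustrated with the sketch below. To keep the example dependency-free, a nearest-class-mean classifier is swapped in for the linear SVM; both produce a linear decision boundary, but they are not equivalent, so this shows only the leave-one-run-out logic and not the authors' classifier. All names are hypothetical.

```python
def leave_one_run_out(run_ids):
    """Yield (train_indices, test_indices), holding out each run once."""
    for held_out in sorted(set(run_ids)):
        train = [i for i, r in enumerate(run_ids) if r != held_out]
        test = [i for i, r in enumerate(run_ids) if r == held_out]
        yield train, test

def class_means(xs, ys):
    """Mean pattern per label, computed from training data only."""
    means = {}
    for label in set(ys):
        rows = [x for x, y in zip(xs, ys) if y == label]
        means[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return means

def predict(means, x):
    """Label whose mean pattern is closest to x (squared Euclidean distance)."""
    def sq_dist(m):
        return sum((a - b) ** 2 for a, b in zip(x, m))
    return min(means, key=lambda label: sq_dist(means[label]))

def cross_validated_accuracy(patterns, labels, run_ids):
    """Mean proportion of correctly classified test trials across folds."""
    fold_acc = []
    for train, test in leave_one_run_out(run_ids):
        means = class_means([patterns[i] for i in train],
                            [labels[i] for i in train])
        hits = sum(predict(means, patterns[i]) == labels[i] for i in test)
        fold_acc.append(hits / len(test))
    return sum(fold_acc) / len(fold_acc)
```

Any step that uses the labels, such as voxel selection, would have to be carried out inside the loop on the training indices only, in line with the independence requirement stated in the text.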
Multivariate pattern analysis for EEG data.
To test which temporal processes associated with each of the two EEG components showed learning-dependent changes, we performed pattern classification on the EEG data (112 trials across conditions per run) for each of the two components. As described above, we used data from 20% of channels selected based on an ROC analysis. For each session, we trained a linear SVM to classify single-trial EEG signals associated with the two different stimulus categories (concentric vs radial) and tested the classifier's accuracy using an independent dataset. For each EEG trial, we averaged the signal from a 30 ms window centered at the peak of each of the two EEG components. For each cross-validation, 10% of the data were left out as an independent test dataset and the remaining 90% of the data were used as the training set. We calculated the classifier performance for each condition across 100 cross-validations and observers and fitted the data using a cumulative Gaussian.
Eye movement analysis.
We recorded eye movements from eight observers while they performed the categorization task in the scanner, in both the pretraining and post-training sessions. Eye movements were recorded using the ASL 6000 eye-tracker (Applied Science Laboratories) with 60 Hz temporal resolution. Eye-tracking data were preprocessed using Eyenal software (Applied Science Laboratories) and analyzed using custom Matlab (Mathworks) software. We computed horizontal eye position, vertical eye position, proportion of saccades for each condition at different saccade amplitude ranges, and number of saccades per trial per condition.
Results
Behavioral results
We tested the observers' ability to categorize global form patterns as radial or concentric in noise (Fig. 1A) and plotted their performance (proportion correct) as a function of stimulus signal level (i.e., the percentage of dot dipoles aligned to the global form pattern). Comparing the psychometric functions fitted to data averaged across trials for each session (pretraining vs post-training) showed that training increased the observers' task sensitivity. In particular, the 78% performance threshold improved from 82.2% signal level (±22.6%) before training to 43.4% signal level (±7.3%) after training (Fig. 1B). Similar learning-dependent changes in performance threshold were observed when we estimated the 78% performance threshold separately for radial (81.4% signal level before training; 45.1% signal level after training) and concentric stimuli (83.2% signal level before training; 42.6% signal level after training). This improvement in sensitivity was confirmed by a significant increase in the slope of the psychometric function (estimated from cumulative Gaussian fits on individual observer data) after training (t(9) = 4.84; p < 0.001).
Figure 1. Stimuli and behavioral data. A, Example radial (top row) and concentric (bottom row) Glass pattern stimuli at signal levels of 100, 70, 35, and 0%. B, Behavioral data collected in the lab (circles) and the scanner (squares) for pretraining (gray dotted line) and post-training (black solid line) sessions.
EEG-informed fMRI mapping of regions of interest
We exploited the high temporal resolution of EEG to identify temporal components that correspond to distinct processes related to global form discrimination. We used information theory (Shannon, 1948; Cover and Thomas, 1991; Montemurro et al., 2008) to identify informative components (i.e., temporal components that contain stimulus- or task-related information) in the EEG signal related to stimulus categories (radial, concentric) based on MI. This method provides a sensitive tool for identifying task-relevant temporal components of the EEG signal (Fig. 2A) that may be difficult to discriminate from comparison of standard ERP waveforms between stimulus conditions (Fig. 2B). We identified two time intervals that showed significant MI values (compared with chance levels), as follows: (1) 86–119 ms (p < 0.05) and (2) 229–249 ms (p < 0.05) after stimulus onset. High MI values suggest that the amplitude of the EEG signal at a given latency varies according to the stimulus condition when considered across all trials and channels. We identified peak time points with the highest MI value within these significant time intervals that corresponded to early and later EEG components. The average peak for the first component was at 105 ± 16.1 ms poststimulus, and for the second component was at 242 ± 19.2 ms poststimulus. We concentrated on these two components for further analysis, as previous studies suggest that they reflect distinct processes (Johnson and Olshausen, 2003; Ohla et al., 2005; Pei et al., 2005; Tanskanen et al., 2008; Das et al., 2010). In particular, previous studies showing differential responses to global forms at later rather than early latencies suggest that latencies around the first component relate to visual form integration, while latencies around the second component relate to perceptual classification judgments.
Finally, previous studies (Philiastides et al., 2006) have discriminated between components related to task difficulty (∼220 ms) and decision-related events (later than 300 ms). Following these studies, we explored a third component with average peak latency of 376 ms. However, analysis of peak latencies around the second (mean latency of 242 ms) and third (mean latency of 376 ms) component did not show any significant differences across stimulus difficulty levels (F(1,9) = 0.44, p = 0.51) or a significant interaction between component and stimulus difficulty (F(1,9) = 1.05, p = 0.33). These results suggest that the second component could not be discriminated from the third one on the basis of task difficulty. This is consistent with recent work showing that ERP signals at latencies ∼220 ms reflect sensory processing of stimuli embedded in noise rather than task-difficulty (Banko et al., 2011). Thus, we focus on the first two temporal components for the rest of the analyses.
EEG temporal components. A, Mutual information time course of radial versus concentric stimulus conditions was estimated using all single trials across all channels of each observer from 200 ms prestimulus to 400 ms poststimulus. Information was smoothed using a 30 ms window and averaged across all channels, observers, and sessions (pre- and post-training). Information values (bits) are shown after subtraction of the mutual information estimated using shuffled condition labels (500 iterations) and normalization to prestimulus baseline. The latencies of the two temporal peaks that showed the highest significant mutual information values (MI significantly different from zero across observers; p < 0.05, paired t test) are indicated by the gray-shaded portions. B, Group average visual-evoked potential in response to concentric and radial Glass patterns. The waveforms are averaged across trials, EEG channels, and observers. This analysis showed similar peaks but lacked sufficient sensitivity to discriminate between signals related to concentric and radial patterns, in contrast to the MI-based analysis that enabled us to identify components that discriminated significantly between these signals.
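The shuffle correction described in this legend can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used in the study: the equal-population binning into four amplitude bins, the function names, and the simulated amplitudes are all hypothetical. Only the subtraction of the information obtained with shuffled condition labels (500 iterations) follows the text, and the sketch treats a single channel and latency rather than all channels.

```python
import math
import random
from collections import Counter


def mutual_information(labels, responses):
    """Plug-in estimate of I(S;R) in bits for two discrete sequences."""
    n = len(labels)
    n_s = Counter(labels)
    n_r = Counter(responses)
    n_sr = Counter(zip(labels, responses))
    mi = 0.0
    for (s, r), count in n_sr.items():
        # p(s,r) * log2(p(s,r) / (p(s) * p(r))), written with raw counts
        mi += (count / n) * math.log2(count * n / (n_s[s] * n_r[r]))
    return mi


def bin_amplitudes(amplitudes, n_bins=4):
    """Assign amplitudes to equally populated bins (assumed binning scheme)."""
    order = sorted(range(len(amplitudes)), key=lambda i: amplitudes[i])
    bins = [0] * len(amplitudes)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(amplitudes)
    return bins


def shuffle_corrected_mi(labels, amplitudes, n_shuffles=500, seed=0):
    """MI at one latency minus the mean MI obtained with shuffled labels."""
    rng = random.Random(seed)
    binned = bin_amplitudes(amplitudes)
    raw = mutual_information(labels, binned)
    shuffled = list(labels)
    bias = 0.0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        bias += mutual_information(shuffled, binned)
    return raw - bias / n_shuffles


# Simulated single-trial amplitudes at one latency (hypothetical values).
rng = random.Random(1)
labels = ["radial"] * 100 + ["concentric"] * 100
informative = [rng.gauss(1.0 if s == "radial" else -1.0, 1.0) for s in labels]
uninformative = [rng.gauss(0.0, 1.0) for _ in labels]

mi_informative = shuffle_corrected_mi(labels, informative)
mi_uninformative = shuffle_corrected_mi(labels, uninformative)
```

In this toy case the corrected estimate stays near zero when amplitude carries no category information, which is the purpose of subtracting the shuffled-label estimate: the plug-in estimator is positively biased for finite numbers of trials.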
To identify brain regions involved in the different temporal processes related to the above EEG components, we conducted an EEG-informed fMRI analysis (Fig. 3), as described in previous studies (Debener et al., 2005; Eichele et al., 2005; Philiastides and Sajda, 2007). This analysis showed activations (p < 0.05, cluster threshold corrected) in V3/V3B [left hemisphere (LH)], lateral occipital (LO) (LH), intraparietal sulcus (IPS), postcentral sulcus (PostCS), posterior cingulate (PCC), and dorsal premotor cortex (PMd) (LH) that correlated significantly with the amplitude of the first EEG component. Significant correlations with the amplitude of the second EEG component were found in the supplementary eye field (SEF) and superior frontal gyrus (SFG). These results demonstrate two distinct cortical networks engaged in shape discrimination in noise. First, occipitotemporal, parietal, and motor regions were engaged early (first component) in processing. This is consistent with the role of occipitotemporal regions in the processing of visual forms (Ostwald et al., 2008), and parietal and motor regions in perceptual categorization (Freedman and Assad, 2006) and stimulus–response association processes (Toni et al., 2001). Second, processes related to perceptual judgments (i.e., associated with the second component) engaged prefrontal regions, consistent with the role of prefrontal cortex in categorization and adaptive cognitive processes (Miller, 2000; Duncan, 2001).
Single-trial EEG-informed fMRI analysis. Random effects GLM analysis (data grouped across all observers and sessions) using EEG-defined regressors corresponding to the two temporal components. Activation maps are shown using regressors orthogonalized by removing the common variance from the second component regressor. Activations that correlated significantly (p < 0.05, cluster threshold corrected) with the first component (orange/yellow) and second component (blue/green) are shown. t-statistic maps are superimposed on flattened cortical surfaces of both hemispheres (Table 1: Talairach coordinates). Sulci are shown in dark gray. Gyri are shown in light gray.
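The orthogonalization mentioned in this legend can be illustrated with a simple projection. The sketch below assumes a Gram-Schmidt step on mean-centered single-trial amplitudes. The study's exact procedure is not specified here, and the function names and amplitude values are hypothetical. Convolution with a hemodynamic response function, which a full GLM would require, is omitted.

```python
def demean(values):
    """Subtract the mean so that a zero dot product means zero correlation."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def orthogonalize(target, reference):
    """Remove from `target` the variance it shares with `reference`."""
    t = demean(target)
    r = demean(reference)
    beta = dot(t, r) / dot(r, r)  # regression weight of target on reference
    return [ti - beta * ri for ti, ri in zip(t, r)]


# Hypothetical single-trial amplitudes at the two component latencies.
first_component = [2.1, 0.4, 1.7, 3.0, 0.9, 2.6]
second_component = [1.8, 0.9, 1.1, 2.7, 1.4, 2.0]

second_orth = orthogonalize(second_component, first_component)
residual_overlap = dot(second_orth, demean(first_component))
```

After this step the second-component regressor carries only variance that is not shared with the first, so any activation it explains cannot be attributed to the first component.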
Talairach coordinates (mean, standard deviation) of all ROIs that showed significant activations across observers for the EEG-informed fMRI GLM
In interpreting these results, it is important to take into account the possible limitations of the EEG-fMRI methodology. First, the EEG-informed GLM analysis relies on differences in the amplitude rather than the latency of the regressors, as latencies related to the different EEG components overlap in the fMRI time course. Despite this limitation, this approach has been successful in linking fMRI activations to specific temporal components that differ in their response amplitude across trials (Debener et al., 2005; Eichele et al., 2005; Philiastides and Sajda, 2007). Second, activations associated with the selected channels may reflect processing across brain regions due to the low spatial resolution of the EEG. Our selection of the most informative channels across the whole scalp ensured unbiased use of EEG information from different scalp locations to identify regions across the whole brain associated with distinct temporal processes. Third, EEG-fMRI enables us to identify cortical areas that are more strongly rather than causally related to one of the processes associated with different temporal components (e.g., form integration vs perceptual classification). However, it is possible that additional interactions across areas are engaged at a finer resolution than can be measured by EEG-fMRI. That is, recurrent interactions between occipitotemporal and parietal areas may support fast categorization at early processing stages. Finally, despite recent advances in data acquisition and artifact correction techniques that have greatly improved the signal-to-noise ratio of EEG signals recorded during fMRI (Laufs et al., 2008), small residual artifacts may remain in the EEG and compromise activation maps resulting from EEG-informed fMRI analyses. However, the activations we observed using EEG-based GLMs corresponded closely to activation patterns in our previous fMRI studies on shape discrimination (Li et al., 2009).
This was confirmed by an additional analysis using searchlight multivoxel pattern classification analysis to compare fMRI activations between stimulus categories. Thus, activations revealed by the EEG-informed fMRI analysis cover the network of regions engaged in visual form processing. The advantage of EEG-informed fMRI is that it allows us to identify the cortical areas associated with the distinct temporal processes that mediate the categorization of global forms (i.e., early form integration vs later categorical judgments).
Learning-dependent changes: fMR-metric functions
We tested which brain regions identified by the EEG-informed fMRI analysis showed learning-dependent changes in their activation patterns. In particular, we tested whether activation patterns in these regions after training corresponded to the changes in sensitivity that we observed in behavioral performance after training. As described above, univariate analyses of fMRI signals (statistical comparison of activation maps, or ROI-based analysis of BOLD signals for radial vs concentric patterns before and after training) did not show any significant differences in activations between sessions. Therefore, we used multivariate methods (i.e., multivoxel pattern classification) for the analysis of fMRI data that have been shown to be more sensitive in revealing voxel preferences.
fMR-metric functions in occipitotemporal (V3/V3B, LO), intraparietal (IPS), and somatosensory (PostCS) areas related to the first EEG component showed training-induced increases in sensitivity (Fig. 4). In particular, the slope (estimated from cumulative Gaussian fits on individual observer data) of the fMR-metric functions increased significantly (F(1,9) = 14.4, p < 0.01) after training, and there was a significant interaction between session (pre-, post-training) and ROI (F(1,9) = 5.6, p < 0.05). Specifically, significant differences between sessions were observed in higher occipitotemporal areas (V3/V3B, F(1,9) = 14.9, p < 0.01; LO, F(1,9) = 4.6, p < 0.05), and parietal areas (IPS, F(1,9) = 15.4, p < 0.01; PostCS, F(1,9) = 18.3, p < 0.01). In contrast, no significant differences were observed in frontal regions (PCC, F(1,9) = 1.3, p = 0.28; PMd, F(1,9) = 0.94, p = 0.37). Further, fMR-metric functions in frontal areas related to the second EEG component showed training-induced increases in sensitivity, as indicated by a significant increase in the slope of the fMR-metric function for SFG (F(1,9) = 14.2, p < 0.01) after training (only seven of the participants showed activation in SEF, resulting in functions that were not significantly fitted). These findings suggest that learning modulates a feedforward network of areas associated with distinct processes related to global form discrimination (i.e., early form integration vs later categorical discrimination). In particular, learning modulates early processing in higher occipitotemporal and parietal regions associated with global form integration, as well as later processing in prefrontal regions engaged in categorical judgments.
FMR-metric functions. A, B, FMR-metric curves based on the classification of radial versus concentric stimuli across conditions for regions significantly correlated with the first EEG component (A) and the second EEG component (B). The classifier performance (proportion correct) at each condition was averaged across observers and fitted with cumulative Gaussian functions for each session. Gray dotted lines indicate pretraining sessions. Black solid lines indicate post-training sessions. The table below indicates the goodness of the fit (R and p values) for ROIs with nonsignificantly fitted fMR-metric functions in at least one of the two scanning sessions. All ROIs showed significantly fitted fMR-metric functions with the exception of PMd (Component 1, r = 0.33, p = 0.47; Component 2, r = 0.57, p = 0.18) and SEF (Component 1, r = 0.17, p = 0.68; Component 2, r = 0.53, p = 0.20).
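To make the slope comparison concrete, the following sketch fits a cumulative Gaussian to classifier accuracy as a function of signal level and compares the fitted slopes between two sessions. It rests on several assumptions that are not taken from the study: the function is parameterized to rise from chance (0.5) toward 1, the slope is summarized as 1/sigma, the fit is a coarse least-squares grid search, and the signal levels and accuracies are simulated.

```python
import math


def cumulative_gaussian(x, mu, sigma, chance=0.5):
    """Proportion correct rising from `chance` toward 1 with signal level x."""
    phi = 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
    return chance + (1.0 - chance) * phi


def fit_cumulative_gaussian(levels, accuracy):
    """Least-squares grid search over integer mu and sigma (in % signal)."""
    best = None
    for mu in range(0, 101):
        for sigma in range(5, 101):
            sse = sum((cumulative_gaussian(x, mu, sigma) - y) ** 2
                      for x, y in zip(levels, accuracy))
            if best is None or sse < best[0]:
                best = (sse, mu, sigma)
    return best[1], best[2]


# Hypothetical signal levels (%) and noise-free simulated accuracies.
levels = [0, 20, 40, 60, 80, 100]
pre = [cumulative_gaussian(x, 60, 40) for x in levels]
post = [cumulative_gaussian(x, 45, 20) for x in levels]

pre_mu, pre_sigma = fit_cumulative_gaussian(levels, pre)
post_mu, post_sigma = fit_cumulative_gaussian(levels, post)

# A smaller sigma means a steeper function, i.e., higher sensitivity.
pre_slope, post_slope = 1.0 / pre_sigma, 1.0 / post_sigma
```

With real data, a gradient-based optimizer would replace the grid search, and the slopes estimated per observer would then enter the repeated-measures comparison between sessions.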
Learning-dependent changes: EEG-metric functions
As described above, univariate analyses of EEG signals did not show any significant differences in latency or amplitude between sessions for either of the two components. Therefore, similar to the analysis of the fMRI data, we used sensitive multivariate methods (i.e., pattern classification) for comparing EEG data before and after training. Similar to the fMR-metric functions, we generated EEG-metric functions (Philiastides and Sajda, 2006; Das et al., 2010) for each of the two EEG components (Fig. 5A). We tested whether decoding radial versus concentric patterns from single-trial EEG data improved after training. Our results showed that learning shapes early processes related to global form perception (i.e., detection and integration) as well as later processes related to perceptual judgments (i.e., categorization). In particular, comparing EEG-metric functions before and after training showed significant learning-dependent changes in both EEG components. We observed a significant increase in the slope of the EEG-metric functions after training for both components (F(1,9) = 16.6, p < 0.01) and no significant interaction between component and session (F(1,9) = 0.43, p = 0.50).
EEG-metric functions. A, EEG-metric curves for the first and second components. The classifier performance at each condition was averaged across observers and fitted with cumulative Gaussian functions for each session. Gray dotted lines indicate pretraining. Black solid lines indicate post-training. EEG-metric functions were significantly fitted for both the first component (pretraining, r = 0.85, p = 0.02; post-training, r = 0.81, p = 0.03) and second component (pretraining, r = 0.89, p < 0.01; post-training, r = 0.9, p < 0.01). B, Correlating psychometric and EEG-metric functions. As with the fMR-metric functions, we scaled the cumulative Gaussian model obtained from the psychophysical data to fit the classifier predictions based on single-trial EEG data. EEG-metric functions were significantly fitted for both the first component (pretraining, r = 0.80, p = 0.03; post-training, r = 0.84, p = 0.02) and second component (pretraining, r = 0.82, p = 0.03; post-training, r = 0.81, p = 0.03).
Control analyses
We performed the following additional analyses to control for possible confounding factors. In particular, to control for the possibility that our results were due to random correlations in the data, we computed the fMR-metric and EEG-metric functions from randomly permuted signal patterns (i.e., we randomized the correspondence between the data and training labels and estimated the classifier prediction for each stimulus condition). The lack of significant correlations in these control analyses supports our interpretation of a link between task-relevant behavioral performance and neural preferences. Supporting evidence for this link comes from an additional analysis. In particular, fitting the fMRI (Fig. 6) and EEG (Fig. 5B) data using a scaled version of the psychometric function showed similar learning-dependent changes.
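The logic of the permutation control can be sketched as follows. This is an illustration only: a nearest-centroid classifier stands in for the linear classifiers used in the study, and the simulated voxel patterns, trial counts, and function names are hypothetical. The point is that a classifier trained after the correspondence between patterns and labels has been randomized should predict held-out trials at chance on average.

```python
import random


def centroid(rows):
    """Mean pattern across a list of equally long feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def train(patterns, labels):
    """Return one mean pattern per class (nearest-centroid model)."""
    return {c: centroid([p for p, l in zip(patterns, labels) if l == c])
            for c in sorted(set(labels))}


def predict(model, pattern):
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(model, key=lambda c: sq_dist(model[c], pattern))


def accuracy(model, patterns, labels):
    hits = sum(predict(model, p) == l for p, l in zip(patterns, labels))
    return hits / len(labels)


def simulate(n_trials, n_voxels, rng):
    """Toy patterns: the two classes differ by a small mean shift per voxel."""
    labels = ["radial", "concentric"] * (n_trials // 2)
    patterns = [[rng.gauss(0.5 if l == "radial" else -0.5, 1.0)
                 for _ in range(n_voxels)] for l in labels]
    return patterns, labels


rng = random.Random(0)
train_x, train_y = simulate(100, 10, rng)
test_x, test_y = simulate(100, 10, rng)

true_acc = accuracy(train(train_x, train_y), test_x, test_y)

# Control: randomize the correspondence between training data and labels.
perm_accs = []
for _ in range(300):
    shuffled = list(train_y)
    rng.shuffle(shuffled)
    perm_accs.append(accuracy(train(train_x, shuffled), test_x, test_y))
mean_perm_acc = sum(perm_accs) / len(perm_accs)
```

Individual permutations can yield accuracies well above or below 0.5, which is why the control is summarized across many permutations rather than from a single shuffle.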
Correlating psychometric and fMR-metric functions. Similar to previous studies (Zenger-Landolt and Heeger, 2003; Chandrasekaran et al., 2007), we scaled the cumulative Gaussian model obtained from the psychophysical data to fit the fMRI data (i.e., the predictions of the pattern classifier) according to the following equation: y = B + (S/[1 + exp(β − αx)]), where B is the baseline, S is the scale of the fitting, β is the intercept, and α is the slope of the cumulative Gaussian model. A, B, Data are shown for regions significantly correlated with the first EEG component (A) and the second EEG component (B) for each session. Gray dotted lines indicate pretraining. Black solid lines indicate post-training. All ROIs showed significantly fitted fMR-metric functions with the exception of PMd (Component 1, r = 0.38, p = 0.38; Component 2, r = 0.56, p = 0.19) and SEF (Component 1, r = 0.29, p = 0.59; Component 2, r = 0.48, p = 0.25).
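One way to read the scaling in this legend is that α and β are fixed from the psychophysical fit, and only the baseline B and scale S are estimated from the classifier predictions. Under that assumption (which the legend does not state explicitly), the equation is linear in B and S and can be fitted by ordinary least squares. The parameter values, signal levels, and function names below are hypothetical.

```python
import math


def psychometric_shape(x, alpha, beta):
    """Shape term 1 / (1 + exp(beta - alpha * x)) from the legend's equation."""
    return 1.0 / (1.0 + math.exp(beta - alpha * x))


def fit_baseline_and_scale(levels, accuracy, alpha, beta):
    """Least-squares estimates of B and S in y = B + S * g(x), alpha and beta fixed."""
    g = [psychometric_shape(x, alpha, beta) for x in levels]
    n = len(g)
    g_mean = sum(g) / n
    y_mean = sum(accuracy) / n
    s_gy = sum((gi - g_mean) * (yi - y_mean) for gi, yi in zip(g, accuracy))
    s_gg = sum((gi - g_mean) ** 2 for gi in g)
    scale = s_gy / s_gg
    baseline = y_mean - scale * g_mean
    return baseline, scale


# Hypothetical psychometric parameters and simulated classifier accuracies.
alpha, beta = 0.1, 5.0
levels = [0, 20, 40, 60, 80, 100]
classifier_accuracy = [0.5 + 0.3 * psychometric_shape(x, alpha, beta)
                       for x in levels]

baseline, scale = fit_baseline_and_scale(levels, classifier_accuracy, alpha, beta)
```

The correlation between the scaled psychometric function and the classifier predictions then indicates how well the behavioral model accounts for the neural data in each ROI and session.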
The design of our study allowed us to rule out a number of less likely interpretations of our results. First, it is unlikely that the learning-induced changes we observed resulted from learning specific category exemplars or stimulus–response associations, as the stimuli tested during scanning differed in their visual properties (i.e., signal level) from the stimuli presented during training. Further, by randomizing the motor responses based on the cue in the main experiment, we controlled for the possibility that the results could be due to memorized stimulus–response associations. Second, the learning-dependent changes we observed could not be due to differences in task difficulty across conditions, as the classification analysis compared trials associated with different stimuli (radial vs concentric) rather than conditions. Further, analysis of the fMRI responses (percentage of signal change) across areas did not show any significant differences between the two fMRI sessions (F(1,9) = 1.31, p = 0.28) or interaction between ROI and session (F(1,9) = 0.31, p = 0.46). This result suggests that the learning-dependent fMRI changes we observed could not be accounted for by differences in attentional allocation between the two sessions (i.e., training may result in enhanced target salience and increased fMRI responses, or familiarity with the task may decrease fMRI responses). Thus, our experimental design and additional analyses control for the possibility that nonspecific effects rather than form-specific learning contribute to our findings. This is supported by additional ongoing behavioral studies showing lack of improvement in Glass pattern discrimination without training or transfer to nontrained tasks (e.g., contrast discrimination).
The cued-delay paradigm we used controlled for differences in the observers' response time. That is, observers made their decision during the delay after stimulus offset and waited for the cue before they could select the correct motor response, resulting in similar response times across stimulus conditions. As the stimulus–response association was randomized across trials, the motor response could not be anticipated on a given trial. Further, a searchlight-based classification (Fig. 7) on the button press used by the observers to indicate their behavioral choice showed significant accuracies in motor regions but not in occipitotemporal, parietal, or prefrontal regions, suggesting that results in these areas cannot be simply explained on the basis of motor responses.
Searchlight analysis related to motor responses. A, B, Using the searchlight method (Kriegeskorte et al., 2006) with a leave-one-run-out cross-validation, we trained a linear SVM to classify the finger used by the observers for indicating their behavioral choice based on fMRI data from the first volume (A) and the second volume (B) for each trial. For each observer, the classification accuracy was obtained by averaging the accuracy across cross-validations. We then performed a second-level analysis (t test on the accuracies across observers and the two sessions) and identified the voxels showing significantly higher accuracy than chance (p < 0.001, with cluster threshold). The t-statistic maps are superimposed on flattened cortical surfaces of both hemispheres. The analysis on the first volume of each trial, during which the stimulus was presented, showed significant activations in the central sulcus (CS) and premotor ventral cortex (PMv), possibly related to motor response preparation. The same analysis on the second volume, during which the motor response was executed, showed significant CS activation, consistent with the role of this area in motor execution. Sulci are shown in dark gray. Gyri are shown in light gray.
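Two steps of this legend lend themselves to a short sketch: the leave-one-run-out split and the second-level test of accuracy against chance. The code below is illustrative only. The classifier itself is omitted, the run assignment and per-observer accuracies are hypothetical, and the t statistic is computed across observers only, whereas the analysis described above also pooled the two sessions.

```python
import math


def leave_one_run_out(run_ids):
    """Yield (held_out_run, train_indices, test_indices) for each run."""
    for held_out in sorted(set(run_ids)):
        train = [i for i, r in enumerate(run_ids) if r != held_out]
        test = [i for i, r in enumerate(run_ids) if r == held_out]
        yield held_out, train, test


def one_sample_t(values, chance=0.5):
    """t statistic for mean(values) against `chance` (df = n - 1)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return (mean - chance) / math.sqrt(var / n)


# Hypothetical run label for each of six trials.
run_ids = [1, 1, 2, 2, 3, 3]
folds = list(leave_one_run_out(run_ids))

# Hypothetical cross-validated accuracies at one searchlight location,
# one value per observer (already averaged across folds).
observer_accuracy = [0.58, 0.61, 0.55, 0.63, 0.57, 0.60, 0.59, 0.62, 0.56, 0.64]
t_value = one_sample_t(observer_accuracy)
```

Holding out whole runs keeps training and test trials from sharing run-specific noise, and repeating the t test at every searchlight location yields the t-statistic map that is then thresholded.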
Finally, eye-movement recordings during scanning showed no significant differences in the eye position, number, or amplitude of saccades across stimulus conditions and sessions. In particular, a repeated-measures ANOVA (Greenhouse–Geisser corrected) indicated that there was no significant difference between stimulus conditions on mean horizontal eye position [pretraining, F(1.6,3.1) = 0.67, p = 0.54; post-training, F(1.2,2.5) = 0.73, p = 0.50], mean vertical eye position [pretraining, F(1.9,3.8) = 3.19, p = 0.15; post-training: F(1.3,2.6) = 3.26, p = 0.19], mean saccade amplitude [pretraining, F(1.8,3.6) = 1.60, p = 0.31; post-training, F(1.4,2.7) = 0.60, p = 0.55], or the number of saccades per trial per condition [pretraining, F(1.5,3.1) = 0.80, p = 0.49; post-training, F(1.6,3.2) = 0.54, p = 0.59]. Further, no significant interaction was observed between session and stimulus conditions on horizontal eye position [F(1.8,3.7) = 1.19, p = 0.39], vertical eye position [F(1.1,2.2) = 5.20, p = 0.14], saccade amplitude [F(1.4,2.9) = 2.02, p = 0.27], and number of saccades [F(1.6,3.2) = 0.09, p = 0.88]. These analyses suggest that it is unlikely that our results were significantly confounded by eye movements.
Discussion
By combining behavioral measurements and simultaneous EEG-fMRI recordings, we provide evidence for distinct brain mechanisms that mediate learning when sensory uncertainty (i.e., noise) challenges perceptual judgments. Our work advances our understanding of the processes that mediate adaptive shape recognition beyond previous studies in the following main respects.
First, previous functional imaging studies have implicated occipitotemporal and frontoparietal circuits in shape learning (Dolan et al., 1997; Gauthier et al., 1999; Grill-Spector et al., 2000; Chao et al., 2002; Kourtzi et al., 2005; Op de Beeck et al., 2006). However, the indirect, slow hemodynamic response of fMRI limits our understanding of the spatiotemporal brain dynamics that mediate visual form learning. Here, using simultaneous EEG-fMRI recordings, we demonstrate that learning enhances observers' sensitivity to discriminate visual forms in noise by shaping a circuit of feedforward interactions among higher occipitotemporal, parietal, and frontal areas. In particular, enhanced visual sensitivity is mediated by neural changes at (1) early processing stages in occipitotemporal and parietal regions known to be involved in the detection and integration of global visual forms (Ostwald et al., 2008), and (2) later decision stages in prefrontal regions thought to accumulate sensory evidence for perceptual judgments (Newsome et al., 1989; Kim and Shadlen, 1999; Shadlen and Newsome, 2001; Heekeren et al., 2004; Grinband et al., 2006).
Second, the learning-dependent changes we observed in later frontal processes are consistent with previous imaging studies implicating frontal regions in category and rule learning (for review, see Keri, 2003; Ashby and Maddox, 2005; Poldrack and Foerde, 2008; Seger and Miller, 2010). In particular, improved sensitivity in visual categorization in noise is related to learning-dependent changes in dorsolateral prefrontal regions (SFG) known to contribute to the accumulation of sensory information toward a decision (Newsome et al., 1989; Kim and Shadlen, 1999; Shadlen and Newsome, 2001; Heekeren et al., 2004; Grinband et al., 2006). However, our findings demonstrate that improved sensitivity in the discrimination of visual forms involves not only later but also earlier processes in higher occipital regions (i.e., V3/V3B) known to mediate perceptual integration (Ostwald et al., 2008). This is consistent with previous findings showing learning-dependent changes early in processing (Fahle and Skrandies, 1994; Skrandies et al., 2001; Ding et al., 2003; Shoji and Skrandies, 2006; Song et al., 2007; Pourtois et al., 2008; Bao et al., 2010). However, these previous studies have concentrated on the detection of low-level visual features (e.g., orientation, motion) rather than the discrimination of complex global forms. Our findings extend beyond this previous work by showing that learning to discriminate visual forms in noise alters early processes specific to global form perception in higher occipitotemporal areas.
The role of learning in modifying early sensory processing remains highly debated (Adini et al., 2002; Teich and Qian, 2003). Recent studies suggest that learning alters later decision-related processes thought to reweight the contributions of early sensory representations (Dosher and Lu, 1999; Li et al., 2004; Law and Gold, 2008; Jacobs, 2009). Our findings suggest that learning modifies early recurrent processing (Roelfsema and van Ooyen, 2005; Roelfsema, 2006) within higher visual and parietal areas engaged in the integration (Ostwald et al., 2008) and categorization (Freedman and Assad, 2006) of global visual forms. Specifically, our findings showing that learning modulates shape processing in higher occipitotemporal regions (LO) shed light on the contested role of temporal cortex in visual learning. LO has been suggested to contribute to the comparison of sensory evidence during decision making by accumulating information to the time of recognition (Ploran et al., 2007), and supporting the persistence of a percept (Philiastides and Sajda, 2007). Here we show that learning to discriminate visual forms in clutter modulates early sensory processing in LO, suggesting that visual detection and integration in occipitotemporal areas are modulated by early recurrent mechanisms. In contrast to previous physiology (Schoups et al., 2001; Li et al., 2004) and imaging studies (Schwartz et al., 2002; Furmanski et al., 2004; Kourtzi et al., 2005; Sigman et al., 2005; Mukai et al., 2007; Yotsumoto et al., 2008; Bao et al., 2010), we did not observe learning-dependent changes in primary visual cortex. This finding could be due to our stimulus choice (global form rather than local orientation features) and may relate to previous electrophysiological results that demonstrate enhanced perceptual learning effects in higher compared with primary visual areas (Yang and Maunsell, 2004; Raiguel et al., 2006).
Third, our work provides novel methodological advances by combining simultaneous EEG-fMRI with pattern classification and applying this methodology for the first time to the study of visual form learning. Combining the high temporal resolution of EEG with the high spatial resolution of fMRI allows us to investigate the processing dynamics between cortical circuits involved in perceptual judgments. Although previous studies (Philiastides and Sajda, 2007) have recorded EEG and fMRI data in different sessions, simultaneous recordings avoid differences across sessions (e.g., alertness, adaptation, familiarity) that could confound learning effects. Further, the EEG-informed fMRI analysis bypasses the source localization limitations of EEG related to the infinite number of possible source configurations that may give rise to a given scalp distribution. Finally, comparing the choices of linear classifiers (EEG/fMR-metric functions) with the observers' choices (psychometric functions) provides us with a sensitive tool for directly comparing brain activity and behavior and determining the link between adaptive human choices and learning-dependent brain plasticity (Pessoa and Padmala, 2007; Li et al., 2009).
Using this methodology, we provide novel evidence for learning mechanisms that modify processing across distinct visual recognition processes. It is important to note that EEG-fMRI signals reflect processing at the level of large neural populations and do not allow us to discern whether learning reflects changes in the selectivity of single neurons or correlations across neural populations. Further, correlations between EEG and fMRI signals do not necessarily imply that the signals have the same underlying physiological source. However, recent work shows that trial-by-trial EEG analysis has the potential to identify correlated fMRI activity, thus providing information about the cortical network engaged in specific temporal processes (Debener et al., 2005; Eichele et al., 2005; Mayhew et al., 2010b). Despite these potential limitations, our findings make predictions that can be further tested by physiology. In particular, we suggest that improved sensitivity in the discrimination of global forms in clutter may relate to changes in neural sensitivity (i.e., tuning of neural populations that show weak preferences to stimuli in clutter before training), as indicated by learning-dependent changes in visual form areas at early stages of processing. In sum, our findings point to distinct functional brain plasticity mechanisms that support behavioral improvements and mediate our ability to make successful perceptual judgments in the face of sensory uncertainty.
Footnotes
This work was supported by grants to Z.K. from the Biotechnology and Biological Sciences Research Council (D52199X, E027436) and to S.L. from the National Natural Science Foundation of China (31070896). We thank A. P. Bagshaw for help setting up equipment.
Correspondence should be addressed to Zoe Kourtzi, University of Birmingham, School of Psychology, Edgbaston, Birmingham B15 2TT, UK. z.kourtzi@bham.ac.uk