Abstract
Primary visual cortex (V1) forms the initial cortical representation of objects and events in our visual environment, and it distributes information about that representation to higher cortical areas within the visual hierarchy. Decades of work have established tight linkages between neural activity occurring in V1 and features comprising the retinal image, but it remains unclear how that activity relates to perceptual decisions. An actively debated question is the extent to which V1 responses determine, on a trial-by-trial basis, perceptual choices made by observers. By inspecting the population activity of V1 from human observers engaged in a difficult visual discrimination task, we tested one essential prediction of the deterministic view: choice-related activity, if it exists in V1, and stimulus-related activity should occur in the same ensemble of neurons at the same time. Our findings do not support this prediction: while cortical activity signifying the variability in choice behavior was indeed found in V1, that activity was dissociated from activity representing stimulus differences relevant to the task, being advanced in time and carried by a different neural ensemble. The spatiotemporal dynamics of population responses suggest that short-term priors, perhaps formed in higher cortical areas involved in perceptual inference, act to modulate V1 activity prior to stimulus onset without modifying subsequent activity that actually represents stimulus features within V1.
Introduction
An enduring challenge in visual neuroscience has been to understand how neural activity in visual cortex relates to what we see. During the decades immediately following Hubel and Wiesel's seminal discoveries (Hubel and Wiesel, 1962), research focused on predicting neural responses to simple visual features in the primary visual cortex (V1), where the first cortical representations of visual information are formed. More recently, the challenge has expanded to neural activity in response to dynamic visual stimuli embedded in more complex contexts, leading to neural models of visual cortex that incorporate nonlinear neural operations, such as gain control and normalization (Carandini and Heeger, 2012). Moreover, the scope of work on this challenge has expanded to human brain imaging studies that seek to identify (Kay et al., 2008) and even reconstruct (Nishimoto et al., 2011) natural scenes by decoding cortical responses.
While establishing a tight linkage between V1 activity and stimulus conditions setting off those neural chain reactions, research has also explored V1 activity's impact on the final outcome of neural processes ensuing from it. A question at the core of this exploration is whether or how V1 neurons' responses contribute to perceptual judgments. A fruitful strategy for addressing this question is to compute a trial-to-trial correlation between single neurons' responses to physically identical stimuli and perceptual choices made by observers performing a difficult perceptual decision task on those stimuli, dubbed “choice probability” (CP). Whereas above-chance-level CPs have been consistently found in high-tier sensory areas (for reviews, see Nienborg and Cumming, 2010; Nienborg et al., 2012), non–sensory associative areas (Hernández et al., 2010), and subcortical areas (Liu et al., 2013), the presence of CPs in V1 remains controversial (Grunewald et al., 2002; Nienborg and Cumming, 2006; Palmer et al., 2007). Moreover, the mere demonstration of statistically significant CPs does not necessarily support a causal role of V1 in settling perceptual choices. Significant CPs in given neurons may arise when a fraction of activity of those neurons is modulated by other neurons that actually cause choices, either via feedback (Nienborg and Cumming, 2009) or via horizontal connections (Cohen and Newsome, 2009; Law and Gold, 2009; Nienborg and Cumming, 2010). Another caveat to previous attempts at relating V1 activity to its perceptual consequences is that CPs have not been estimated for population responses, despite growing evidence for the importance of population activity in neural representation of sensory signals (Hol and Treue, 2001; Chen et al., 2006; Jazayeri and Movshon, 2007; Graf et al., 2011; Berens et al., 2012).
By acquiring fMRI measurements of V1 population activity while human observers performed a difficult ring-size discrimination task (Fig. 1), we identified choice-correlated responses and compared them with stimulus-correlated responses. We reasoned that, if V1 activity causally contributes to choices, choice-related V1 responses should match stimulus-related responses both in timing and in neural origin. Our fMRI measurements of V1 population activity, however, run counter to this prediction: stimulus- and choice-related components arise at different points in time and in different cortical subpopulations. Moreover, realizing that small deviations in fixational eye movements could affect our fMRI measurements, we tested and confirmed that this pattern of results cannot be attributed to fixational eye movements.
Materials and Methods
Observers
Nineteen individuals (nine females; 20–30 years old; normal or corrected-to-normal vision) participated in the main fMRI experiment, and 23 (11 females; 18–36 years old; normal or corrected-to-normal vision; 2 of whom also participated in the main experiment) participated in the eye-tracking experiment, with informed consent in accordance with the guidelines and approval of the Institutional Review Board at Seoul National University. All except one observer (the first author, who participated in the fMRI experiment) were naive to the purpose of the study.
Main fMRI experiment
Experimental setup.
MR data were collected using a 3 Tesla Siemens Tim Trio scanner equipped with a 12-channel Head Matrix coil at the Seoul National University Brain Imaging Center. Stimuli were generated using MATLAB (MathWorks) in conjunction with MGL (http://justingardner.net/mgl) on a Macintosh computer. Observers looked through an angled mirror attached to the head coil to view stimuli displayed via an LCD projector (Canon XEED SX60) onto a back-projection screen at the end of the magnet bore at a viewing distance of 87 cm, yielding a field of view of 22 × 17°.
Behavioral protocol.
Observers participated in one fMRI session of retinotopy-mapping runs, wherein V1 boundaries, a population eccentricity-tuning map, and a hemodynamic impulse response function (HIRF) were defined, and one session of main experimental runs, wherein observers performed a ring-size discrimination task (Fig. 1). On each trial of this task, the observer initially viewed a small fixation dot (diameter, 0.12°; luminance, 321 cd/m2) appearing at the center of a dark (luminance, 38 cd/m2) screen. A small but foveally visible increase in the size of the fixation dot (from 0.12° to 0.18° in diameter) forewarned the observer of an upcoming presentation of the test stimulus. That test stimulus consisted of the brief (300 ms) presentation of a thin (full-width at half-maximum of a Gaussian envelope, 0.17°), white (321 cd/m2), dashed (radial frequency, 32 cycles/360°) ring that counter-phase-flickered at 10 Hz. After each brief ring presentation, observers reported the ring's size (“small” or “large”) using a left-hand or right-hand key, guessing if necessary. Observers were instructed to maintain strict fixation on the central dot, for otherwise they would be unable to detect the change in the fixation dot signaling a forthcoming brief target stimulus and would invariably hamper their performance on the ring-size discrimination task.
Inside the scanner but without being scanned, observers performed 54 practice trials and then 180 threshold-estimation trials before the main experimental scan runs. On each of the threshold-estimation trials, which were performed with an intertrial interval of 2.7 s, one of 20 different-sized rings was presented according to a multiple random staircase procedure (four randomly interleaved 1-up-2-down staircases, two starting from the easiest stimulus and the other two starting from the hardest one) with trial-to-trial feedback. A Weibull function was fit to the psychometric data obtained from the threshold-estimation trials using a maximum-likelihood procedure. From the fitted Weibull function, the size contrast (SC) associated with 70.7% correct was estimated to determine the radii (rM, rS, and rL) of the three ring stimuli (S, small; M, medium; L, large; Fig. 1B) used in the main scan runs: rM = 2.84°; rS = (1 − SC) * rM; rL = (1 + SC) * rM. In the main experimental scan runs, observers performed 156 trials in total on these three different-sized rings while being scanned over six functional scan runs of 343.2 s each. The rings were presented with an intertrial interval of 13.2 s (Fig. 1A, trial structure) in the order defined by an m-sequence (base = 3, power = 3; nine S-rings, nine L-rings, and eight M-rings were presented; all scan runs started with two M-rings) (Buracas and Boynton, 2002) to null the autocorrelation between stimuli. Before participating in the fMRI experiments, each observer practiced the task intensively (∼6000 trials with a short, 2.7 s, intertrial interval over 6 sessions) outside the scanner.
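The radius rule above, and the reason a 1-up-2-down staircase converges on 70.7% correct, can be sketched in a few lines. This is only an illustration, not the authors' MATLAB code; the function names and the example SC value of 0.05 are assumptions made here.

```python
R_M = 2.84  # radius of the medium (M) ring in degrees, as given in the text


def ring_radii(sc, r_m=R_M):
    """Return (r_S, r_M, r_L) in degrees for a given size contrast `sc`."""
    return (1.0 - sc) * r_m, r_m, (1.0 + sc) * r_m


def staircase_target(n_down=2):
    """Proportion correct at which a 1-up-n-down staircase settles.

    The stimulus level is stable when the chance of `n_down` consecutive
    correct responses equals 0.5, i.e. p ** n_down = 0.5.
    """
    return 0.5 ** (1.0 / n_down)


# Hypothetical size contrast of 5% (illustrative value only).
r_s, r_m, r_l = ring_radii(0.05)
print(round(r_s, 3), r_m, round(r_l, 3))
print(round(staircase_target(), 3))
```

With two consecutive correct responses required for a step down, the stable point is the square root of 0.5, which is where the 70.7% criterion comes from.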
MRI data acquisition and preprocessing.
For each observer's brain, two 3D, T1-weighted, high-resolution (1 × 1 × 1 mm) anatomical scans were acquired with an optimized protocol (MPRAGE; field of view (FOV), 256 mm; repetition time (TR), 1.9 s; time for inversion, 700 ms; time to echo (TE), 2.36 ms; flip angle (FA), 9°), averaged to improve image fidelity, and segmented/flattened to be aligned with the data from the retinotopy-mapping and main experimental scan sessions using FreeSurfer (http://surfer.nmr.mgh.harvard.edu) (Dale et al., 1999).
T2*-weighted functional images were obtained with a gradient EPI pulse sequence for the retinotopy-mapping and main experimental scans. The parameters for these two scan types, which differed slightly, were as follows (retinotopy followed by experimental): TR, 2.7 s, 2.2 s; TE, 40 ms, 40 ms; FA, 77°, 73°; FOV, 208 mm, 207 mm; image matrix, 104 × 104, 90 × 90; slice thickness, 1.8 mm with 11% gap, 2 mm with 15% gap; slices, 30, 22 oblique transverse slices; bandwidth, 858 Hz/px, 750 Hz/px; effective voxel size, 2.0 × 2.0 × 1.998 mm, 2.3 × 2.3 × 2.3 mm. At the beginning of each functional session, a high-resolution (1.078 × 1.078 × 2.0 mm, 1.083 × 1.083 × 2.3 mm) T1-weighted inplane image was acquired with the same slice prescription as the functional images (MPRAGE; TR, 1.5 s; inversion time (TI), 700 ms; TE, 2.79 ms; FA, 9°) for the image-based registration.
All functional EPI images were motion-corrected using SPM8 (http://www.fil.ion.ucl.ac.uk/spm) (Friston et al., 1996; Jenkinson et al., 2002) and then coregistered to the high-resolution reference anatomical volume of the same observer's brain via the high-resolution inplane image (Nestares and Heeger, 2000). After coregistration, the images of the retinotopy-mapping scan were resliced, but not spatially smoothed, in alignment with the spatial dimensions of the main experimental scans. Area V1 was manually defined on the flattened gray-matter cortical surface mainly based on the meridian representations, resulting in 825.4 ± 140.7 (mean ± SD across observers) voxels. The individual voxels' time-series were divided by their means to convert them from arbitrary intensity units to percentage modulations and were linearly detrended and high-pass filtered (Smith et al., 1999) using custom scripts in MATLAB (MathWorks). The cutoff frequency was 0.0185 Hz for the retinotopy-mapping session and 0.0076 Hz for the main session. The first 10 frames (of 90; the length of one stimulus cycle) of each retinotopy-mapping scan and the first 6 frames (of 156; the length of one trial) of each main scan were discarded to minimize the effect of transient magnetic saturation and allow the hemodynamic response to reach steady state. The “blood-vessel-clamping” voxels, which show unusually high variances of fMRI responses, were discarded (Olman et al., 2007; Shmuel et al., 2007); a voxel was classified as “blood-vessel-clamping” if its variance exceeded 10 times the median variance across all voxels.
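The per-voxel normalization, linear detrending, and variance-based exclusion described above can be illustrated as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the percentage modulation is assumed to be expressed relative to the mean (mean maps to 0%), and the high-pass filtering step is omitted.

```python
from statistics import mean, median, pvariance


def to_percent_modulation(ts):
    """Divide a voxel time-series by its mean and express it as % change."""
    m = mean(ts)
    return [100.0 * (x / m - 1.0) for x in ts]


def linear_detrend(ts):
    """Remove the least-squares straight line (including the mean)."""
    n = len(ts)
    t_bar = (n - 1) / 2.0
    y_bar = mean(ts)
    sxx = sum((t - t_bar) ** 2 for t in range(n))
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in enumerate(ts))
    slope = sxy / sxx
    return [y - y_bar - slope * (t - t_bar) for t, y in enumerate(ts)]


def drop_high_variance_voxels(voxels, factor=10.0):
    """Keep voxels whose variance is at most `factor` x the median variance."""
    variances = [pvariance(v) for v in voxels]
    cutoff = factor * median(variances)
    return [v for v, var in zip(voxels, variances) if var <= cutoff]
```

Using the median rather than the mean as the reference keeps the cutoff itself from being inflated by the very voxels the rule is meant to remove.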
Retinotopy-mapping scans.
Standard traveling wave methods (Engel et al., 1994; Sereno et al., 1995) were used to define V1 (Fig. 2A), to estimate each observer's hemodynamic impulse response function (HIRF) of V1 (Fig. 2D), and to estimate V1 voxels' receptive field center and width (Fig. 2B,C). High-contrast and flickering (1.33 Hz) dartboard patterns were presented either as 0.89°-thick expanding or contracting rings in two scan runs, as 40°-width clockwise or counterclockwise rotating wedges in four runs, or in one run as four stationary, 15°-wide wedges forming two bowties centered on the vertical and horizontal meridians. Each scan run consisted of 9 repetitions of 27 s period of stimulation. The fixation behavior during the scans was assured by monitoring observers' performance on a fixation task, in which they had to detect any reversal in direction of a small dot rotating around the fixation.
HIRF estimation
For each observer, the data from the bowtie scan provided the estimate of the HIRF. The procedure of HIRF estimation was as follows. First, a group of voxels that were driven by the bowtie stimuli was defined by identifying the ones whose signal-to-noise ratio (SNR; the ratio of Fourier power at the stimulus frequency, 0.037 Hz, to that at frequencies above the third harmonic, >0.111 Hz) was >3. Second, the time-series from those voxels (204.6 ± 50.8 and 136.9 ± 30.9 voxels locked to the vertical and horizontal meridians, respectively; mean ± SD across observers) were each aligned to the stimulus onset and then pooled and averaged across voxels to enhance SNR, resulting in a single representative time-series. Third, the HIRF was parameterized using a difference of two γ functions (Friston et al., 1998; Glover, 1999) by fitting the predicted fMRI time-series to the representative time-series using a least-squares procedure, which was implemented with the ga function of the Global Optimization Toolbox (for initial estimation) in conjunction with the fminsearch function in MATLAB (MathWorks). The model explained a large fraction of the total variance in the representative time-series (91.8 ± 3.7%; mean ± SD across observers).
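A difference-of-two-gamma response function of the kind named above can be sketched as below. The exact parameterization fitted in the study is not given in the text, so the shape, scale, and ratio defaults here are assumptions chosen to resemble commonly used canonical values, and the function names are hypothetical.

```python
import math


def gamma_pdf(t, shape, scale):
    """Gamma probability density at time t (seconds); zero for t <= 0."""
    if t <= 0:
        return 0.0
    return (t ** (shape - 1) * math.exp(-t / scale)
            / (math.gamma(shape) * scale ** shape))


def hirf(t, peak_shape=6.0, under_shape=16.0, scale=1.0, ratio=1.0 / 6.0):
    """Difference of two gamma densities: early peak minus late undershoot."""
    return (gamma_pdf(t, peak_shape, scale)
            - ratio * gamma_pdf(t, under_shape, scale))


def sample_hirf(tr=2.2, duration=26.4, **params):
    """Sample the response function once per repetition time `tr`."""
    n = int(round(duration / tr))
    return [hirf(i * tr, **params) for i in range(n)]
```

In a fit such as the one described, the free parameters (here the two shapes, the scale, and the ratio) would be adjusted to minimize the squared error between the predicted and the representative time-series.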
Estimation of population eccentricity-tuning curves.
The map of population eccentricity-tuning curves (Fig. 2C) was defined by fitting a one-dimensional Gaussian function simultaneously to the time-series of fMRI responses to the expanding and contracting ring stimuli, which were also used for definition of V1. Details of this procedure are as follows.
First, as in the HIRF estimation, the fMRI time-series were extracted only from a relevant group of voxels with SNR >3 in both of the ring scan runs. Second, an eccentricity-tuning curve (in other words, gain over eccentricity) of a single voxel, g(ε), was modeled by a Gaussian as a function of the eccentricity in a visuotopic space, ε, and it was parameterized by a peak eccentricity, e, and a tuning width, σ:
$$g(\varepsilon) = \exp\left[-\frac{(\varepsilon - e)^2}{2\sigma^2}\right] \tag{1}$$
Third, the collective responses of neurons within that voxel with a particular g(ε) at a given time frame t, n(t), were predicted by multiplying g(ε) by the spatial layout of stimulus input at that time frame, s(ε,t), and summing over eccentricity:
$$n(t) = \sum_{\varepsilon} g(\varepsilon)\, s(\varepsilon, t) \tag{2}$$
Fourth, the predicted time-series of fMRI responses of that voxel, fMRIp(t), were generated by convolving n(t) with a scaled (by β) copy of the HIRF acquired from the meridian scans (as described above), h(t)β, plus a baseline response, b:
$$\mathrm{fMRI}_p(t) = n(t) * \beta h(t) + b \tag{3}$$
Fifth, the model parameters (e, σ, β, b) were found by fitting fMRIp(t) to the observed time-series of fMRI responses to actual stimulation, fMRIo(t), by minimizing the residual sum of squared errors between fMRIp(t) and fMRIo(t) over all time frames, RSS:
$$\mathrm{RSS} = \sum_{t} \left[\mathrm{fMRI}_o(t) - \mathrm{fMRI}_p(t)\right]^2 \tag{4}$$
Sixth, a valid group of voxels was further refined by discarding voxels with goodness of fit, estimated by R2, the squared correlation between fMRIp(t) and fMRIo(t), below a criterion, which itself was established by a bootstrap procedure. The bootstrap distribution of R2 was created by computing R2 values based on the fits of fMRIp(t) to bootstrap sample time-series of fMRIo, which was in turn obtained by superposing the 8 repetitions of 10 normally distributed random values (mean 0; SD √3) onto 80 normally distributed random values (mean 0; SD 1). The bootstrapped 99% confidence criterion for R2 was 0.4.
Seventh, the estimated parameters, e and σ, were approximately consistent with previous fMRI studies (Dumoulin and Wandell, 2008; Kay et al., 2008; Harvey and Dumoulin, 2011), but we noticed two exceptions with respect to the value of σ: its sudden increase and decrease in the foveal and peripheral regions and a large gap around σ = 2°. Therefore, the voxels whose values of e were either <1.26° or >5.65° or whose values of σ were >2° were discarded from further analysis. Overall, the number of valid voxels was 191.5 ± 54.5 (mean ± SD across observers), resulting in a selection rate of 23.5 ± 6.8%. Our conservative rule of voxel selection was supported by the well-known linear regression of σ by e with a power of 1.1 (Duncan and Boynton, 2003):
where the estimated c and d are 0.0952 and 0.5953, respectively. Based on this relationship, we constructed the matrix of population eccentricity-tuning curves by assigning σ values to e values following Equation 5. The pattern of results we found, incidentally, remained unchanged when we reanalyzed our data applying a less conservative voxel selection criterion (R2 = 0.2) that increased the selection rate (56.1 ± 7.8%).
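The forward model used in the fitting steps above (Gaussian tuning curve, stimulus weighting, convolution with a scaled HIRF, and the squared-error cost) can be sketched as follows. This is an illustrative reading of the verbal description, not the authors' implementation: the unit peak gain of the Gaussian, the discrete sum over eccentricity samples, and all function names are assumptions.

```python
import math


def tuning_gain(ecc, peak, width):
    """Gaussian eccentricity-tuning curve with unit peak gain (assumed)."""
    return math.exp(-((ecc - peak) ** 2) / (2.0 * width ** 2))


def neural_response(stimulus, eccs, peak, width):
    """n(t): tuning curve times stimulus layout, summed over eccentricity.

    `stimulus[t][k]` is the stimulus strength at eccentricity `eccs[k]`.
    """
    gains = [tuning_gain(e, peak, width) for e in eccs]
    return [sum(g * s for g, s in zip(gains, frame)) for frame in stimulus]


def predict_fmri(neural, hirf, beta, baseline):
    """Causal convolution of n(t) with beta * h(t), plus a baseline."""
    out = []
    for t in range(len(neural)):
        acc = sum(neural[t - k] * hirf[k]
                  for k in range(min(t + 1, len(hirf))))
        out.append(beta * acc + baseline)
    return out


def rss(predicted, observed):
    """Residual sum of squared errors over all time frames."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))
```

A fit would search over (peak, width, beta, baseline) for the combination that minimizes `rss` between the predicted and observed time-series.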
Model prediction of fMRI population responses to ring stimuli.
The model prediction of fMRI population responses to the different-sized rings based on the map of population eccentricity-tuning curves (Fig. 2E) was generated in the following steps. First, a vector of stimulus events was defined by an m-sequence of the S-, M-, and L-rings (80 trials with base of 3 and power of 4) with an interstimulus interval of 13.2 s (6 time bins with 2.2 s repetition time, representing −1.1, 1.1, 3.3, 5.5, 7.7, and 9.9 s after stimulus onset, respectively), which replicated the temporal structure of stimulus events in the main experiment except for the sequence length. Second, for each simulation trial, a profile of responses of the 21 eccentricity bins (for how these bins were defined, see Definition of eccentricity bins) to a given ring (the across-observer averaged SC was used to determine the sizes of the S- and L-rings) was defined by the set of gains of the population eccentricity-tuning curves at the eccentricity of the stimulus across the bins (Fig. 2C, blue and red bell-shaped curves with dotted lines at center, plotted on the right-hand vertical axis). The generation of the response profiles across all the simulation trials resulted in a spatiotemporal matrix of neural responses to the sequence of ring stimuli (21 eccentricity bins × 480 time frames). For ease of comparison with the experimental data, the sum of squares of the matrix was scaled to match the size of the matrix (21 × 480). Third, the convolution of this matrix with the observer-averaged V1 HIRF (Fig. 2D) predicted a matrix of noise-free fMRI responses to the sequence of rings. Finally, the 2D matrix of differential responses to the L- and S-rings was obtained by subtracting the trial-locked average of predicted fMRI responses to the S-ring trials from that to the L-ring trials (Fig. 2E).
Definition of eccentricity bins.
The 21 eccentricity bins and their fMRI responses were defined by the following steps. First, the estimated eccentricity values (e), which were defined in visuotopic scale, were converted into values in units of “relative cortical distance” (ercd), which scales positions relative to the cortical region representing rM, based on the canonical cortical magnification factor for human visual cortex (Horton and Hoyt, 1991):
Second, the ercd values were split into 21 equal-sized bins, such that the central (11th) bin represents rM (2.84°) whereas the foveal and peripheral ends represent 1.26° and 5.65°, respectively. Because of differences in cortical coverage, the number of voxels in single bins monotonically increased as a function of the preferred eccentricity. Third, the fMRI responses from V1 voxels (Fig. 3A) were transformed into those from the 21 eccentricity bins by applying Gaussian kernels, each centered on a bin center and with a full-width at half-maximum of two bin widths (Fig. 3B). Fourth, the binned fMRI responses of each scan run from each observer were scaled to match the size of the matrix (21 eccentricity bins × 150 time frames; Fig. 3C), as was done for the simulated time-series.
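The binning and scaling steps above can be illustrated with a short sketch. It assumes voxel positions are already expressed in relative cortical distance, that the Gaussian kernel is applied as a normalized weighted average, and that "scaled to match the size of the matrix" means the sum of squares equals the number of cells; the function names are hypothetical.

```python
import math


def bin_centers(lo, hi, n_bins=21):
    """Centers and common width of `n_bins` equal-width bins on [lo, hi]."""
    width = (hi - lo) / n_bins
    return [lo + (i + 0.5) * width for i in range(n_bins)], width


def kernel_bin(values, positions, centers, bin_width, fwhm_bins=2.0):
    """Gaussian-kernel weighted average of voxel values at each bin center."""
    # Convert the full-width at half-maximum into a Gaussian SD.
    sigma = fwhm_bins * bin_width / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    binned = []
    for c in centers:
        w = [math.exp(-((p - c) ** 2) / (2.0 * sigma ** 2)) for p in positions]
        binned.append(sum(wi * v for wi, v in zip(w, values)) / sum(w))
    return binned


def scale_to_size(matrix):
    """Rescale so the sum of squares equals the number of cells."""
    n_cells = sum(len(row) for row in matrix)
    ss = sum(x * x for row in matrix for x in row)
    k = math.sqrt(n_cells / ss)
    return [[k * x for x in row] for row in matrix]
```

Applying `kernel_bin` once per time frame would turn a voxel-by-time matrix into a bin-by-time matrix, which `scale_to_size` then puts on a common scale with the simulated responses.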
Computation of stimulus and choice probabilities at local cortical sites.
To quantify the ability of an ideal observer to predict from the matrix of fMRI responses whether the stimulus presented was an S-ring or an L-ring, we computed stimulus probabilities (SPs) in the following way. For each observer, trials were sorted into 6 classes jointly defined by a stimulus and by a choice: stimulus|choice = S|S, S|L, M|S, M|L, L|S, and L|L (Fig. 4A). By contrasting the stimulus factor conditional on the choice factor (‘S|S vs L|S’ or ‘S|L vs L|L’; Fig. 4B, top), receiver operating characteristic (ROC) curves (Fig. 4C) were constructed by defining α and β, two integrals of the conditional response probability densities p[r|S] and p[r|L] as a function of c, a classification threshold:
$$\alpha(c) = \int_{c}^{\infty} p[r|S]\, dr, \qquad \beta(c) = \int_{c}^{\infty} p[r|L]\, dr \tag{7}$$
where r is a response at a cell within the trial-related matrix of fMRI responses (Fig. 3D). Then, because the ROC curve is β plotted as a function of α, the probability of the correct classification for a given ‘S versus L’ contrast pair, P[correct], is equivalent to the area under the ROC curve, which can be computed by integrating β over all values of α:
$$P[\mathrm{correct}] = \int_{0}^{1} \beta \, d\alpha \tag{8}$$
In a similar manner, we quantified the ability of an ideal observer to predict an observer's choice by computing CPs from the fMRI responses at the same local cortical sites used to compute SPs. This entailed contrasting the choice factor conditional on the stimulus factor (S|S vs S|L, M|S vs M|L, and L|S vs L|L; Fig. 4B, bottom).
When the procedure above is applied to the two “stimulus-contrast” pairs and the three “choice-contrast” pairs, the resulting P[correct]s are the two individual SPs (SPchoice=S and SPchoice=L; Fig. 4A, horizontal brackets) and the three CPs (CPstimulus=S, CPstimulus=M, and CPstimulus=L; Fig. 4A, vertical brackets), respectively. The grand SP and CP (Fig. 4D) were the averages of those individual SPs and CPs. Our definitions of SPs and CPs are different from the definition used in single-cell neurophysiological studies. Specifically, our definition means that values of SP >0.5 denote larger fMRI responses on the L-ring trials than on the S-ring trials, regardless of eccentricity bin preference (and conversely that SP values <0.5 indicate larger fMRI responses on S-ring trials relative to L-ring trials). Likewise, by our definitions, CP values >0.5 denote larger fMRI responses on the L-choice trials than on the S-choice trials (and conversely CP values <0.5 indicate larger fMRI responses on the S-choice trials than on the L-choice trials). The proportions of S- or L-choice trials within the S- or L-ring trials could be unbalanced, simply because of observers' above-chance-level performance. However, even the most unbalanced trial group across all observers contained 48 versus 5 trials, and in earlier studies datasets containing at least five trials of each alternative were considered valid (Nienborg and Cumming, 2006).
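The area under the ROC curve for one contrast pair, and the averaging into a grand SP or CP, can be computed directly from two lists of single-cell responses. The sketch below uses the rank-based form of the area (the probability that a response drawn from the second class exceeds one drawn from the first, with ties counted as one half), which is equivalent to integrating the ROC curve; the function names are hypothetical.

```python
def roc_area(resp_first, resp_second):
    """Area under the ROC curve for 'second class larger than first class'.

    For an SP, pass the S-ring responses first and the L-ring responses
    second; for a CP, pass the S-choice responses first and the L-choice
    responses second. Values > 0.5 then mean larger responses in the
    second (L) class, matching the sign convention in the text.
    """
    wins = 0.0
    for b in resp_second:
        for a in resp_first:
            if b > a:
                wins += 1.0
            elif b == a:
                wins += 0.5
    return wins / (len(resp_first) * len(resp_second))


def grand_probability(pairs):
    """Average the ROC areas of several (first, second) contrast pairs."""
    areas = [roc_area(first, second) for first, second in pairs]
    return sum(areas) / len(areas)
```

For a grand SP the pairs would be (S|S, L|S) and (S|L, L|L); for a grand CP they would be (S|S, S|L), (M|S, M|L), and (L|S, L|L).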
The spatiotemporal cells with significant (corrected p < 0.05) SP or CP values were identified with the threshold-free cluster-enhancement method (TFCE) (Smith and Nichols, 2009): at each of 2000 permutations, the maximum TFCE value (computed with dh = 0.1, H = 2, and E = 2/3) out of all 126 spatiotemporal cells was taken to build up the null distribution, against which the observed TFCE value of each cell was compared.
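The cluster-enhancement step can be illustrated on a one-dimensional array of non-negative statistics. This is a simplified sketch, not the analysis used in the study: the study applied TFCE to a two-dimensional grid of 21 eccentricity bins by 6 time frames, whereas this version handles a single dimension and positive values only. The function name is hypothetical; the defaults follow the dh, H, and E values quoted above.

```python
def tfce_1d(stat, dh=0.1, h_power=2.0, e_power=2.0 / 3.0):
    """Threshold-free cluster enhancement on a 1-D list of statistics.

    For each threshold height h, every run of contiguous cells at or above
    h adds extent ** E * h ** H * dh to the score of each cell in the run.
    """
    scores = [0.0] * len(stat)
    top = max(stat) if stat else 0.0
    n_steps = int(top / dh) if top > 0 else 0
    for step in range(1, n_steps + 1):
        h = step * dh
        i = 0
        while i < len(stat):
            if stat[i] >= h:
                j = i
                while j < len(stat) and stat[j] >= h:
                    j += 1
                gain = ((j - i) ** e_power) * (h ** h_power) * dh
                for k in range(i, j):
                    scores[k] += gain
                i = j
            else:
                i += 1
    return scores
```

In a permutation test of the kind described, the same function would be applied to each permuted dataset and the maximum score retained, so that the observed scores are compared against a null distribution of maxima.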
Removal of nonspecific component from raw responses.
At each time frame, t, and at each eccentricity bin, i, the average of raw responses (RRs) across the ne (21) eccentricity bins was subtracted from RRi to derive the tuned response (TRi) (Fig. 5A):
$$TR_i(t) = RR_i(t) - \frac{1}{n_e} \sum_{j=1}^{n_e} RR_j(t) \tag{9}$$
Computation of population-level stimulus and choice probabilities.
The procedure for computing the population SPs and CPs was identical to that for computing the SPs and CPs at the individual local cells, except that “r” in Equation 7 was replaced with “pr,” the weighted (w) sum of population RRs over the eccentricity bins (i = [1, ne]):
$$pr(t) = \sum_{i=1}^{n_e} w_i \, RR_i(t) \tag{10}$$
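Both operations, removing the nonspecific component and forming a weighted population readout, are simple enough to show on a single time frame. The function names are hypothetical, and the example weights of -1, 0, and +1 are an assumption used only for illustration, not the weights defined in the study.

```python
def tuned_responses(raw):
    """Subtract the across-bin mean from each bin's raw response."""
    m = sum(raw) / len(raw)
    return [r - m for r in raw]


def population_response(raw, weights):
    """Weighted sum of raw responses over the eccentricity bins."""
    return sum(w * r for w, r in zip(weights, raw))


# Toy profile over three bins (inner, central, outer); weights are
# hypothetical and favor the bins outside the central one.
raw = [1.0, 2.0, 3.0]
print(tuned_responses(raw))
print(population_response(raw, [-1.0, 0.0, 1.0]))
```

Note that when the weights sum to zero, as in this toy example, the readout is unchanged by the mean subtraction, because a component shared by all bins is multiplied by the sum of the weights.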
The three different weighting profiles, each representing the contributions of the individual eccentricity bins assessed by the three different schemes (the uniform, the discriminability, and the log-likelihood ratio schemes) were defined as follows. The uniform scheme assigned three discrete values to the eccentricity bins depending on which flanking side of the M-size ring their preferred eccentricities (e) belonged to:
The discriminability scheme (Fig. 7B) defined weights in proportion to the differential responses of given eccentricity bins to the L-size and the S-size rings, which were derived from the eccentricity-tuning curves defined from the retinotopy-mapping session:
where ge is the eccentricity-tuning curve of the eccentricity bin with preferred eccentricity, e, as defined by Equation 1, and the baseline offset, δ, is as follows:
The log-likelihood ratio scheme (Fig. 7C) defined weights by taking the differences between the log-likelihoods of obtaining a given response if the stimulus were the L-ring stimulus, logLL, and if the stimulus were the S-ring stimulus, logLS. Because the eccentricity-tuning curves were assumed to be described by a Gaussian function, the log-likelihood ratio weights at preferred eccentricity, e, can be simplified to the following formula:
where σL and σS are the tuning widths derived from Equation 5 with rL and rS, and the baseline offset, δ, is as follows:
Eye-tracking behavioral experiment
Experimental setup.
In a dimly lit room, observers viewed stimuli at 90 cm distance on a gamma-linearized 22-inch Totoku CV921X CRT monitor (800 × 600 pixels, 180 Hz vertical refresh) while their binocular eye positions were sampled at 500 Hz by an EyeLink 1000 Desktop Mount (SR Research), a video-based eye-tracker (instrument noise, 0.01° RMS). To minimize eye-tracking errors due to head motion, the observer adjusted the height of the chair and the table on which the monitor was mounted, positioned his/her head on a cushion-padded chin-rest (HeadSpot, UHCOTech), and fastened a head strap around the head to the chin-rest posts. We individually calibrated the eye-tracker before each session using the built-in five-point routine (HV5). Observers took as many breaks during a session as they wished, disengaging the head from the chin-rest and moisturizing their eyes, which often became dry under infrared illumination, with artificial tears. After each break, we recalibrated the eye-tracker before resuming the task.
Behavioral protocol.
Each observer participated in a total of three daily sessions: one for practice (315 short-interval trials), one for threshold SC estimation (315 short-interval trials plus four runs of the main task, 108 trials), and the other for six runs of the ring-size discrimination trials with eye position being monitored (162 trials). The stimuli and procedure matched those of the main fMRI experiment except for the following. First, although the luminance contrast between the stimuli and the background remained comparable with that in the fMRI experiment, their absolute luminance values were changed to 30 cd/m2 and 3 cd/m2, respectively. Second, the threshold SC value was determined using an even larger number of trials (315 instead of 180). Third, one more M-ring trial was added to a given run, resulting in slightly more M-ring trials (48 trials in total per observer) available for data analysis. Last, the number of practice trials was smaller compared with that in the fMRI experiment.
In addition, a visually guided saccade task (Tse et al., 2010) taking ∼4 min was conducted at the beginning of each session to measure the sensitivity of the eye-tracker. For this task, observers tracked a dot (diameter 0.12°) that either stayed in the center of the monitor screen (fixation block; 16 s) or jumped either 0.12° leftward or rightward from the center at pseudorandom times (saccade block; 31 s). In the saccade block, the location of the dot at each second was determined by an m-sequence (31 trials with base of 2 and power of 5), and each block was alternated five times after the initial fixation block.
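A binary m-sequence of length 31 (base 2, power 5), like the one used to schedule dot positions in the saccade block, can be generated with a linear-feedback shift register. The sketch below is an illustration only: the recurrence a[n] = a[n-2] XOR a[n-5] is one valid choice with the maximal period of 31, not necessarily the one used in the study, and the seed and function name are assumptions.

```python
def m_sequence(taps=(5, 2), seed=(1, 0, 0, 0, 0)):
    """Binary maximal-length sequence from a linear-feedback shift register.

    Each new element is the XOR (sum modulo 2) of the elements `t` steps
    back for every `t` in `taps`. With taps (5, 2) the period is 2**5 - 1.
    """
    degree = max(taps)
    seq = list(seed)
    while len(seq) < 2 ** degree - 1:
        seq.append(sum(seq[-t] for t in taps) % 2)
    return seq


sequence = m_sequence()
print(len(sequence), sum(sequence))
```

Every nonzero run of five consecutive elements appears exactly once per period, which is what keeps the autocorrelation of such a sequence nearly flat.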
Eye-tracking data analysis
Preprocessing.
First, the raw time-series of eye position were corrected for eye blinks (“deblinked” as it is called). As done in the previous studies using the same eye-tracker we used (Troncoso et al., 2008; Otero-Millan et al., 2012), eye blinks were identified by detecting time periods in which either pupil information was missing or pupil area measurements fluctuated abruptly with a large amplitude (>50 units per sample). The eye position samples centered on those eye blinks (±200 ms) were deemed to be “blink-confounded” and were excluded from the subsequent eye-position-based analyses. With this correction, the percentage of blink-confounded samples was 9.3 ± 7.3% (mean ± SD across observers). Then, the raw data were corrected for measurement errors associated with changes in pupil size, which are known to distort position estimates acquired by some video-based eye-trackers (Wyatt, 2010; Kimmel et al., 2012). The relationship between pupil size and eye position errors is likely to be affected by multiple factors, such as eye geometry or eyelid position, which are idiosyncratic among individuals. Thus, for each observer, we first found a linear trend that best captured the relationship between pupil size and eye position and then corrected the eye-blink-free data by regressing out the measurement errors associated with the best-fitting trend line. The detailed correction procedure and its validation will be described in a separate paper. The pupil size regressor accounted for 52.7 ± 24.8% (mean ± SD across observers) and 26.5 ± 22.4% of the total variance of the left and right horizontal eye positions, respectively, and 27.3 ± 18.7% and 28.7 ± 18.2% of the left and right vertical eye positions, respectively. The correction for pupil size was performed only after confirming that pupil size changes were not correlated with choices made by observers (see Results below; Fig. 9D).
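The two preprocessing steps described above, padding detected blinks and regressing out a linear pupil-size trend, can be sketched as follows. The detailed correction procedure is not given in the text, so this is a minimal illustration under assumptions: blinks are padded sample by sample, the trend is a single least-squares line per eye and axis, and the function names are hypothetical.

```python
def blink_mask(is_blink, rate_hz=500, pad_ms=200):
    """Mark samples within +/- pad_ms of any blink sample as confounded."""
    pad = int(round(pad_ms * rate_hz / 1000.0))
    n = len(is_blink)
    mask = [False] * n
    for i, blink in enumerate(is_blink):
        if blink:
            for k in range(max(0, i - pad), min(n, i + pad + 1)):
                mask[k] = True
    return mask


def regress_out(position, pupil):
    """Remove the best-fitting linear trend of eye position on pupil size.

    The mean eye position is preserved; only the component that covaries
    linearly with pupil size is subtracted.
    """
    n = len(position)
    mp = sum(pupil) / n
    my = sum(position) / n
    sxx = sum((p - mp) ** 2 for p in pupil)
    sxy = sum((p - mp) * (y - my) for p, y in zip(pupil, position))
    slope = sxy / sxx
    return [y - slope * (p - mp) for p, y in zip(pupil, position)]
```

In practice the regression would be fitted on blink-free samples only, that is, after applying the mask from `blink_mask`.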
Sensitivity analysis.
The accuracy and precision of eye-tracking data were evaluated based on the visually guided saccade task. To perform these analyses, the 500 Hz eye position data, both horizontal and vertical, were down-sampled to 1 Hz by averaging the middle 167 eye-blink-free samples of each 1 s interval. Each 1 Hz sample was assigned to one of the three fixation conditions (−0.12°, 0°, and 0.12°) based on where the dot was shown but was discarded if its averaging duration contained >50 blink-confounded samples. The mean horizontal and vertical positions of the fixation conditions in each observer were calculated by averaging the 1 Hz samples. The accuracy was defined as the average deviation of the observed means from the true positions, merged across the fixation conditions and observers. The precision was defined within observer as the SD of the 1 Hz samples, merged across the fixation conditions after correcting for the true positions.
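The down-sampling rule can be sketched as below. The text leaves some details open, so this is one reading of it rather than the authors' procedure: the "middle 167 samples" are taken as a window centered within each 1 s interval, blink-confounded samples inside that window are skipped, and the interval is dropped (returned as `None`) when the window holds more than 50 confounded samples. The function name is hypothetical.

```python
def downsample_1hz(samples, confounded, rate_hz=500, n_mid=167, max_bad=50):
    """Average the middle `n_mid` samples of each 1 s interval.

    Returns one value per complete interval, or None where the averaging
    window contains more than `max_bad` blink-confounded samples.
    """
    out = []
    for start in range(0, len(samples) - rate_hz + 1, rate_hz):
        lo = start + (rate_hz - n_mid) // 2
        window = range(lo, lo + n_mid)
        n_bad = sum(1 for i in window if confounded[i])
        if n_bad > max_bad:
            out.append(None)
            continue
        good = [samples[i] for i in window if not confounded[i]]
        out.append(sum(good) / len(good))
    return out
```

Averaging only the middle third of each second keeps the estimate away from the saccade that follows each dot jump.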
In addition, a computer simulation method was used to assess whether our eye-tracking setup could reliably distinguish eye position shifts as small as 0.1°. For each observer, two distributions of Gaussian random numbers, the means of which differed by 0.1°, were generated from the SDs of the 1 Hz samples and the number of actual S- and L-choices in M-ring trials (23.9 ± 5.3 and 24 ± 5.3, respectively; mean ± SD across observers). For each bootstrap set (n = 10,000), we tested whether there was a significant difference between the synthetic eye positions (paired t test across observers with p < 0.01). This analysis revealed that 99.9% of the bootstrap sets using the horizontal SDs were significantly different, whereas only 49.4% of the bootstrap sets using the vertical SDs were significantly different. Therefore, only the horizontal eye measurements were used in the subsequent analyses.
Analysis of microsaccades.
We defined microsaccades conservatively by designating them as events when the position measurements from both eyes met the following set of criteria, which have been routinely used in previous studies (Engbert and Kliegl, 2003; Engbert and Mergenthaler, 2006): median velocity threshold, λ = 6; minimum duration of 6 ms; minimum intersaccadic interval of 20 ms; maximum amplitude of 2°. However, only the horizontal positions were used for detecting those binocular microsaccades, both because the vertical eye positions were often noisy owing to pupil occlusion in some observers and because most microsaccades are known to occur along the horizontal meridian (Tse et al., 2004). The distribution of microsaccade amplitudes (n = 72,466, merged across observers; median, 0.19°; γ parameters, α = 1.66 and β = 0.14) was comparable with those reported in previous studies using video-based eye-trackers (Tse et al., 2010; Kimmel et al., 2012; Otero-Millan et al., 2012; Hafed, 2013).
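A simplified, single-eye sketch of the velocity-threshold detection is given below. It follows the general form of the Engbert and Kliegl (2003) algorithm (five-point velocity estimate, median-based velocity SD, threshold at λ times that SD) applied to horizontal position only, and enforces the minimum duration and maximum amplitude criteria. The binocular-overlap requirement and the minimum intersaccadic interval are omitted for brevity, and the function name `detect_microsaccades` is hypothetical.

```python
import numpy as np

def detect_microsaccades(x, fs=500, lam=6.0, min_dur_ms=6, max_amp=2.0):
    """Return (start_index, end_index, amplitude) for each candidate event.

    `x` is horizontal eye position (deg) from one eye, sampled at `fs` Hz.
    """
    x = np.asarray(x, dtype=float)
    dt = 1.0 / fs
    # Five-point moving-window velocity estimate (deg/s).
    v = np.zeros_like(x)
    v[2:-2] = (x[4:] + x[3:-1] - x[1:-3] - x[:-4]) / (6 * dt)
    # Median-based estimate of the velocity SD, robust to saccades.
    sigma = np.sqrt(np.median(v ** 2) - np.median(v) ** 2)
    above = np.abs(v) > lam * sigma
    min_len = int(np.ceil(min_dur_ms * fs / 1000))
    events, start = [], None
    # A trailing False closes any run that reaches the end of the record.
    for i, flag in enumerate(np.append(above, False)):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            amp = abs(x[i - 1] - x[start])
            if i - start >= min_len and amp <= max_amp:
                events.append((start, i - 1, amp))
            start = None
    return events
```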
Analysis of pupil size.
The raw pupil size data varied substantially across individuals (2135 ± 574 arbitrary size units; deblinked mean ± SD across observers). To make data comparable across different runs and observers, we normalized the raw pupil size by dividing it by its deblinked mean within each run and expressing the result as a percentage of that mean. The pupil size CPs were calculated using the same procedure described above, except that “r” in Equation 7 was replaced with the mean pupil size in each 2.2 s interval, matched to the duration associated with acquisition of an fMRI volume in the main experiment. Our definition means that values of CP >0.5 denote larger pupil size on the L-choice trials than on the S-choice trials (and conversely that CP values <0.5 denote larger pupil size on S-choice trials relative to L-choice trials). In calculating group statistics for CPs (Fig. 9D, bottom, diamonds), the values from six observers whose datasets contained fewer than five incorrect trials in either the S- or L-ring condition were excluded; none was excluded in calculating CPstimulus=M (Fig. 9D, bottom, circles).
Analysis of eye position and vergence angle.
Eye positions were sampled binocularly, so the final estimates of eye position were defined by taking the average of the position measurements from both eyes. In addition, we estimated relative vergence angles, deviations from the default vergence angle determined by the distance between the two eyes and the fixation point on the display: the positive and negative angle values indicate convergence and divergence, respectively. The eye position and vergence CPs (Fig. 9E, bottom and Fig. 9F, bottom, respectively) were calculated using the procedure described above, except that “r” in Equation 7 was replaced with the mean horizontal eye position and the mean vergence angle, respectively, in each 2.2 s interval of each trial. Our definition means that values of CP >0.5 denote more rightward eye position or convergent eye movement, respectively, on the L-choice trials than on the S-choice trials (and conversely that CP values <0.5 denote more rightward eye position or convergent eye movement, respectively, on S-choice trials relative to L-choice trials).
Results
Fine ring-size classification task
We devised a difficult one-interval two-alternative forced-choice task wherein observers viewed one of three different-sized rings that were always symmetrically centered around a central fixation mark and classified the ring as either “small” or “large” (Fig. 1A). To detect cortical signatures of stimulus and choice simultaneously in trial-to-trial population fMRI activity of V1, we optimized the task and stimulus parameters as follows.
Task, stimuli, and behavioral performance. A, An example sequence of trials and phases constituting a trial. Eight exemplar trials are shown, each belonging to one of the six possible classes (labeled by letter symbols at the top, “stimulus|choice”). The gray rectangles represent brief periods during which observers were warned of stimulus onset, viewed a ring stimulus (colored thick vertical bars), and made a choice at a particular time point (colored thin vertical bars). B, Examples of the three different-sized rings. The luminance polarity is reversed here for illustrative purpose. C, Distribution of threshold SC values (on the horizontal axis) and actual performances in the main fMRI experiment (on the vertical axis). The small circles represent individual observers, and the large circle and error bars represent their population average and standard deviation (SD) respectively.
As stimuli whose subtle differences are best resolvable with population fMRI measurements, we opted for concentric rings whose size was the feature dimension of relevance for perceptual decision (Fig. 1B). Because of its configuration, a concentric ring engages a large ensemble of neurons whose peak activities within the V1 retinotopic map will vary with the eccentricity of the ring's circumference (i.e., ring size). Because of this feature of concentric ring stimuli, we could exploit the fact that the retinotopic architecture of V1 is resolvable with a mesoscopic-scale analysis of fMRI (Lee et al., 2005; Dumoulin and Wandell, 2008; Kay et al., 2008; Park et al., 2013), allowing us to take advantage of population coding of subtle stimulus differences (Paradiso, 1988; Pouget et al., 2000; Jazayeri and Movshon, 2006). Also, these ring stimuli provide the additional benefit of encouraging observers to maintain central fixation, for this ensures optimal retinal stimulation for performance of the task: shifting fixation toward any selected portion of a ring inevitably images the remaining portions of the ring at even more eccentric areas of the retina with poorer spatial resolution.
We optimized the spatiotemporal parameters of the ring stimuli to generate an optimal level of uncertainty in perceived size of the ring, so as to observe cortical representations of choice information. The SC (Fig. 1B) between the rings was calibrated to be at a threshold level for each individual, based on the performance in prescan practice trials performed in the scanner (0.020 ± 0.009; mean ± SD across observers; see Materials and Methods). In addition, we created trials in which observers' choices would not correlate with stimuli by introducing a middle-sized ring (M-ring) whose radius (rM = 2.84°) was halfway between the radii of the smallest (S-ring) and largest (L-ring) rings (Fig. 1B). Observers were not told there would be three different-sized rings; they were only told to classify each ring as “small” or “large.” The ring was shown for 0.3 s (Fig. 1A, “stimulus” period), a duration sufficiently long to produce reliable fMRI responses in V1 yet sufficiently brief to contribute to a degree of uncertainty in perceived size of the ring. This tailor-made calibration of stimulus size and duration succeeded at holding observers' performances during the fMRI scan sessions within a threshold range (73.7 ± 5.7%; mean ± SD across observers; Fig. 1C).
We adopted a sparse event-related design (Fig. 1A) to individuate trial-to-trial fluctuations of fMRI responses to repeated presentations of the rings. To encapsulate neural events associated with a single trial of perceptual decision-making within a short period of time, we forced observers to make a perceptual choice within 1.2 s after stimulus onset, which resulted in actual response times with the mean of 0.66 s and SD of 0.13 s (2825 trials, pooled across observers). To minimize carryover effects in fMRI signal between consecutive trials resulting from hemodynamic delay, individual trials were separated by 13.2 s. To stabilize eye position and to regulate cortical and cognitive states during this intertrial period, we required observers to maintain their gaze on the fixation dot (diameter 0.12°) and signaled an upcoming trial by increasing the size of the fixation dot slightly (diameter 0.18°) 2.2 s before stimulus onset. It is worth noting that stable, central fixation is essential for optimizing psychophysical performance, as mentioned above, and for successful measurement of high-resolution fMRI responses to the ring stimuli. To prevent unwanted feedback-related events from contaminating trial-locked fMRI measurements, we did not provide trial-by-trial feedback. Instead, observers were updated about their overall performance at the end of each scan run containing 26 trials.
Definition of eccentricity-tuning curves for individual voxels
While observers performed the ring-size classification task with parameters optimized as described above, we acquired time-series of fMRI measurements from a population of unit gray-matter volumes (voxels) in the V1 cortical surface whose width (0.5°–7.5°; a region marked by color spectrum in Fig. 2A) was larger than the site directly stimulated by the rings (2.72°–2.96°, i.e., the smallest and largest rings, respectively, used in the experiment across observers; dotted circle in the inset of Fig. 2A). To inspect trial-to-trial patterns of population responses in a feature dimension relevant to the perceptual decision task, we first mapped the coordinates of those individual voxels in visual eccentricity space.
Eccentricity-tuning curves in V1. A, Eccentricity map of V1 from Observer S08 shown on the flattened left occipital cortex. The white dot, dashed curve, and solid curve represent the V1 cortical sites representing the fovea, the upper vertical meridian, and the lower vertical meridian, respectively, in visuotopic space. The colors represent the eccentricities of the voxels with high goodness of fit by the tuning-curve model (R2 > 0.4; see Materials and Methods). The black dotted circle in the color legend represents the eccentricity of the M-ring stimulus. B, Relationship between preferred eccentricity and tuning width. The gray lines plot tuning widths (the vertical axis) as a function of preferred eccentricity (the horizontal axis) for individual observers, and the black line is a pseudo linear regression of the eccentricity to the tuning width (see Materials and Methods), which was used to estimate the eccentricity-tuning curves for the 21 cortical bins (the black dots) shown in C. C, Population-averaged eccentricity-tuning curves. The horizontal axis specifies stimulus eccentricity, the vertical axis the estimated preferred eccentricity of a cortical bin, and the intensity corresponds to the normalized cortical activity. The solid blue and red vertical lines indicate the population averages of the eccentricities of the S- and L-rings, respectively. The dotted blue and red horizontal lines indicate the two cortical sites that show maximum responses to the S- and L-rings, respectively. The blue and red curves on the right are the predicted population responses to the S- and L-rings, respectively. D, Individual (thin gray) and averaged (thick black) HIRFs. E, Predicted fMRI responses to ring stimuli. In all three panels, the horizontal and vertical axes specify time relative to stimulus onset (indicated by the colored dots) and the preferred eccentricity of a cortical bin, respectively. 
The L- and S-ring panels represent responses to the L- and S-ring stimuli, respectively, which were predicted by convolving the red and blue curves in C with the averaged V1 HIRF in D. The L-S panel represents the differences between the L- and S-ring panels, with hue and saturation representing the sign and magnitude of the differential responses, respectively.
By applying the model-based population receptive field estimation method (Dumoulin and Wandell, 2008) to fMRI time-series responses to an expanding/contracting annulus, we defined the eccentricity-tuning curves with a Gaussian function for individual voxels in each observer's V1 (Fig. 2A; see Materials and Methods). The range of estimated widths of the tuning curves (1.08 ± 0.51°; mean ± SD across 6379 V1 voxels with R2 > 0.4, pooled across observers) and their positive correlation with preferred eccentricity (Pearson's r = 0.32 ± 0.08, 10−17 < p < 0.001; mean ± SD across observers; Fig. 2B) were consistent with previous studies (Dumoulin and Wandell, 2008; Kay et al., 2008; Harvey and Dumoulin, 2011; Park et al., 2013), supporting the validity of our estimation procedure. These tuning estimates from individual observers (Fig. 2B, thin gray lines) were merged and summarized by fitting a power function (Fig. 2B, black line; see Eq. 5) (Duncan and Boynton, 2003) to obtain a reference eccentricity map (Fig. 2C) with 21 eccentricity bins (whose centers are marked by the filled circles in Fig. 2B; see Materials and Methods).
The eccentricity map (Fig. 2C) allowed us to preview the potential population fMRI responses to the different-sized rings. This map predicts that the S- and L-rings (whose eccentricities are marked by the vertical blue and red solid lines, respectively, in Fig. 2C; the group average values, rS = 2.78° and rL = 2.90°, were used) produce profiles of activity that are broad across-eccentricity but that are nonetheless slightly offset with respect to one another (Fig. 2C, blue and red bell-shape curves with dotted lines at center, plotted on the right-hand vertical axis). These spatial profiles of predicted cortical responses were then convolved with the V1 HIRFs (Fig. 2D), which were estimated from the retinotopy-mapping scan runs (see Materials and Methods), to produce matrices of noise-free fMRI responses to the L-ring and S-ring stimuli (Fig. 2E, L-ring and S-ring panels, respectively; see Materials and Methods). By subtracting the matrix predicting responses to the S-ring from that predicting responses to the L-ring, we obtained the matrix predicting differential fMRI responses to the L- and S-rings (Fig. 2E, L-S panel). The predicted differential responses peaked in time at 3.3–5.5 s after stimulus onset due to hemodynamic delay, and in space at two flanking banks of eccentricity bins (Fig. 2E, blue and red pixels in L-S panel), representing the foveal and the peripheral sides of the rings.
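The logic of this prediction can be illustrated with a toy calculation. The sketch below is not the model fitted in this study: it assumes 21 linearly spaced eccentricity bins, a constant Gaussian tuning width of 1.08°, a gamma-shaped HIRF peaking near 5 s, and frame times centered at 1.1, 3.3, ..., 12.1 s after stimulus onset. Because the stimulus is brief (0.3 s), convolution with the HIRF is approximated here by scaling the HIRF itself. The function name `predicted_responses` is hypothetical.

```python
import numpy as np

def predicted_responses(ring_ecc, pref_ecc, widths, frame_times):
    """Noise-free response matrix (eccentricity bins x time frames)."""
    # Spatial profile: each bin's Gaussian tuning curve evaluated at the ring.
    spatial = np.exp(-0.5 * ((ring_ecc - pref_ecc) / widths) ** 2)
    # Assumed gamma-shaped HIRF (peak near 5 s), normalized to unit peak.
    t = np.asarray(frame_times, dtype=float)
    hirf = t ** 5 * np.exp(-t)
    hirf = hirf / hirf.max()
    return np.outer(spatial, hirf)

pref_ecc = np.linspace(0.5, 7.5, 21)   # assumed bin centers (deg)
widths = np.full(21, 1.08)             # assumed constant tuning width (deg)
frames = 1.1 + 2.2 * np.arange(6)      # assumed frame times (s)

resp_L = predicted_responses(2.90, pref_ecc, widths, frames)
resp_S = predicted_responses(2.78, pref_ecc, widths, frames)
diff_LS = resp_L - resp_S              # analogue of the L-S panel
```

Under these assumptions, `diff_LS` is negative for bins on the foveal side of the rings and positive for bins on the peripheral side, with the largest magnitudes at bins offset from the ring eccentricity by roughly one tuning width.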
Definition of trial-related matrices of population responses
With eccentricity-tuning curves defined for individual voxels, we expressed the V1 population responses of individual observers performing the ring-size classification task in a matrix with two dimensions: one defined by voxels' preferred eccentricities and the other defined by time frames at which fMRI measurements were acquired (Fig. 3A). To increase SNR at the cost of resolution and to be able to merge data across individuals, fMRI responses of neighboring voxels were summed with windows of weights centered at discrete-step eccentricity values (Fig. 3B; see Materials and Methods), as done similarly in previous fMRI studies (Brefczynski-Lewis et al., 2009; Kolster et al., 2010; Park et al., 2013). This smoothing procedure resulted in a 21 (the number of eccentricity bins)-by-150 (the number of time frames per scan run) matrix of responses for each scan run (Fig. 3C). As a final step of preparatory analysis of fMRI measurements, we dissected the response matrix for each scan run into “trial-related” matrices of responses with 21 rows representing the eccentricity bins and six columns representing the time frames defined relative to stimulus onset (Fig. 3C, example shown by the matrix demarcated by the dotted box). Then, with these trial-related matrices (Fig. 3D), we searched for cortical signatures of stimulus and choice by examining whether responses in each of those 126 (=21 eccentricity bins × 6 time frames) individual spatiotemporal cells covary with stimuli shown or choices made over trials.
Definition of trial-related matrices of population responses. A, Eccentricity-sorted individual voxels' time-series of fMRI measurements during a single scan run from Observer S08. The horizontal and vertical axes specify the time bin of measurement and the peak eccentricity of voxels, respectively, with image intensity corresponding to level of fMRI activity. The letters at the top represent the stimulus shown and the choice made by the observer in a given trial (stimulus|choice). B, Kernels used for spatial smoothing over eccentricity. The horizontal axis specifies the preferred eccentricity of a target eccentricity bin in cortical scale. The vertical axis specifies the center of a given voxel's eccentricity-tuning curve in visuotopic scale. The image intensities correspond to the weights of the smoothing kernels, whose widths were constant over the eccentricity in cortical scale. C, Responses at eccentricity bins. The format is identical to the one in A, except that the vertical axis specifying the preferred eccentricity is scaled in cortical distance. The blue, magenta, and red dots represent the spatiotemporal locations of S-, M-, and L-rings, respectively, presented over trials. D, An example matrix of trial-related population responses. Each trial-related matrix spanned 13.2 s (2.2 s per frame) in time and ∼5.5 degrees in space. The responses within the dashed black box in C are replotted here, with time axis magnified. The shaded rectangle and the bars within it represent the same events depicted in Fig. 1A. The dashed vertical and horizontal lines indicate the stimulus onset and the eccentricity of the M-ring stimulus, respectively.
Neither stimuli nor choices significantly correlated with raw responses
For each of the spatiotemporal cells of the trial-related matrix, we computed SPs (Tolhurst et al., 1983; Newsome et al., 1989) and CPs (Celebrini and Newsome, 1994; Britten et al., 1996) by assessing how well the trial-to-trial distributions of the raw responses predicted the stimulus actually presented and the choices that were made by individual observers. We sorted the individual trials into the six possible classes jointly defined by a stimulus shown and a choice made in a given trial: S|S, S|L, M|S, M|L, L|S, or L|L (Fig. 4A, stimulus|choice). The SPs were estimated by comparing the distributions of the raw responses belonging to two stimulus-contrast pairs (S-ring vs L-ring; Fig. 4A, horizontal brackets) of these classes, wherein the choice factor was held constant, S|S versus L|S (SPchoice=S) and S|L vs L|L (SPchoice=L). By varying the location of the discrimination criterion over those distributions (Fig. 4B, top panel), we constructed an ROC curve (Fig. 4C, green curve) and computed an SP by summing the area under the ROC curve. We defined the grand SP by taking the average of the two SPs associated with different choices, SPchoice=S and SPchoice=L (as indicated by the operations in Fig. 4A, bottom). The CPs were estimated similarly, first computing the three individual CPs for the three choice-contrast pairs (Fig. 4A, vertical brackets) of the distributions (S|S vs S|L, CPstimulus=S; M|S vs M|L, CPstimulus=M; and L|S vs L|L, CPstimulus=L) and then averaging those three CPs. As a reminder, our definition of CPs is different from that used in single-cell studies (see Materials and Methods).
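For concreteness, the ROC-based computation can be sketched as below. This is an illustrative implementation under the convention stated in the text (values >0.5 indicate larger responses for the L-ring or L-choice), not the code used in the study; the names `roc_area` and `grand_sp_cp` and the dictionary layout keyed by “stimulus|choice” labels are hypothetical.

```python
import numpy as np

def roc_area(resp_small, resp_large):
    """Area under the ROC curve obtained by sliding a criterion.

    Values > 0.5 mean that responses in `resp_large` tend to exceed
    those in `resp_small`.
    """
    a = np.asarray(resp_small, dtype=float)
    b = np.asarray(resp_large, dtype=float)
    crit = np.unique(np.concatenate([a, b]))
    # Slide the criterion from +inf down to -inf so the curve runs (0,0)->(1,1).
    crit = np.concatenate([[np.inf], crit[::-1], [-np.inf]])
    hit = np.array([(b > c).mean() for c in crit])
    fa = np.array([(a > c).mean() for c in crit])
    # Trapezoidal integration of hit rate over false-alarm rate.
    return float(np.sum(np.diff(fa) * (hit[1:] + hit[:-1]) / 2))

def grand_sp_cp(resp):
    """`resp` maps 'stimulus|choice' labels to arrays of single-cell responses."""
    # SP: S-ring vs L-ring with choice held constant, averaged over choices.
    sp = np.mean([roc_area(resp["S|S"], resp["L|S"]),
                  roc_area(resp["S|L"], resp["L|L"])])
    # CP: S-choice vs L-choice with stimulus held constant, averaged over stimuli.
    cp = np.mean([roc_area(resp[s + "|S"], resp[s + "|L"]) for s in "SML"])
    return float(sp), float(cp)
```

CPstimulus=M corresponds to the single term `roc_area(resp["M|S"], resp["M|L"])`.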
Computation of SPs and CPs. A, Classification of trials and definition of stimulus- and choice-contrast pairs associated with SPs and CPs. A trial-related matrix for each trial was classified according to the stimulus|choice class. SPs and CPs were computed by averaging the stimulus-contrast (horizontal brackets) and choice-contrast (vertical brackets) pairs of the stimulus|choice classes, as indicated (Materials and Methods). B, Example distributions of responses at a representative spatiotemporal bin (black squares in D) from Observer S08. Top, Contrast of the histograms of raw fMRI responses between the L-ring|S-choice trials (open) and the S-ring|S-choice trials (filled). Bottom, Contrast of the histograms of raw fMRI responses between the M-ring|L-choice trials (open) and the M-ring|S-choice trials (filled). The dashed vertical line indicates a classification criterion that is slid to generate ROC curves. C, Example ROC curves. The horizontal and vertical axes specify the false alarm and hit rates, respectively (see Materials and Methods for definitions of α and β). The green and orange curves were derived from the top and bottom distributions, respectively, in B. D, Across-observer averages of SP, CP, and CPstimulus=M computed for raw fMRI responses. The format for axes is identical to that in Fig. 3D. Hue and saturation represent stimulus or choice probability values, as indicated.
We estimated the SPs and CPs exhaustively over the entire trial-related matrix of the raw responses (Fig. 4D), but none of those values reached statistical significance (minimum TFCE-corrected p = 0.41, 0.87, and 0.60 among 126 spatiotemporal bins, respectively, for SP, CP, and CPstimulus=M). Only the overall pattern of the across-observer averages of SPs (Fig. 4D, SP panel) exhibited a somewhat systematic distribution, which appeared similar to the pattern of the model-predicted BOLD differential responses (Fig. 2E, L-S panel).
We wondered whether these weak probability values in the raw responses might have been caused by interference from large nonspecific fluctuations in background cortical activity. Recent optical imaging and fMRI studies, wherein large-size population neural activities were monitored simultaneously in early visual cortex, observed large-scale cofluctuations over an entire population of neurons under observation regardless of whether or not individual neurons' stimulus preferences match incoming visual input (Sharon and Grinvald, 2002; Fiser et al., 2004; Chen et al., 2006; Jack et al., 2006; Donner et al., 2008; Sirotin and Das, 2009; Sirotin et al., 2012). In line with these findings, the raw responses in our study waxed and waned in synchrony over the entire array of eccentricity bins, which is readily appreciated by visual inspection of the sample matrix of the raw population responses (Fig. 5A, top). The presence of these so-called “untuned responses” was supported by significant widespread correlations among the eccentricity bins (Pearson's r = 0.62 (mean) ± 0.17 (SD across observers), the mean correlation of nonoverlapping eccentricity bin pairs, which are demarcated with the dotted boundary in Figure 5B). These significant positive correlations were not confined to the pairs of nearby eccentricity bins (Fig. 5C) but were also found between ones representing the directly stimulated visual region and the ones representing either the foveal or peripheral regions (e.g., Pearson's r = 0.54 ± 0.17, 0.67 ± 0.13; the mean ± SD correlations of the foveal and peripheral pairs marked by the gray boxes in Fig. 5B, respectively) and even between the ones representing the foveal and peripheral regions (e.g., Pearson's r = 0.57 ± 0.16 for the pair marked by the black box in Fig. 5B).
Filtering out a nonspecific component from raw responses. A, Definition of TRs via averaging and subtracting operations on RRs. Top, Image in Fig. 3C is replotted. Middle, Averages of RRs over the eccentricity bins at individual time frames. Bottom, TRs were obtained by subtracting those averages from the RRs. B, A matrix of population average correlations in RRs among eccentricity bins. The dotted line boundary indicates the bin pairs whose fMRI measurements are not blended via spatial smoothing. The squares represent the values of correlations for the three possible pairs from the time-series of RRs at the three eccentricity bins, representing 1.26°, 2.84°, and 5.65°, respectively. C, Correlations in RRs between all the eccentricity bins and the seed (2.84°) bin. The dashed vertical line indicates the preferred eccentricity of the seed bin, the visuotopic representation of which covers the locations of the ring stimuli. The gray thin lines indicate the correlations from individual observers, and the black thick line indicates their population average.
Having confirmed the nonspecific nature of the moment-to-moment background fluctuations, we filtered out those correlated responses throughout the entire region of V1 under observation by averaging the raw responses across the entire set of eccentricity bins (Fig. 5A, middle) and subtracting that average from the raw responses at each time frame (Fig. 5A, bottom). These averaging and subtracting operations were validated by the additive nature of “tuned” and “untuned” responses (Bianciardi et al., 2009; Cardoso et al., 2012; Schölvinck et al., 2012) and have been routinely used in previous studies using optical imaging (Shtoyerman et al., 2000; Sharon and Grinvald, 2002; Benucci et al., 2009) and fMRI (Fox et al., 2006; Larsson et al., 2006; Donner et al., 2008, 2013; Pestilli et al., 2011; Schölvinck et al., 2012). Hereafter we will refer to the unfiltered raw responses as RRs and the filtered responses as tuned responses (TRs).
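The averaging and subtracting operations amount to removing, at each time frame, the mean across eccentricity bins. A minimal sketch, assuming the RRs are stored as a bins-by-frames array (the function name `tuned_responses` is hypothetical):

```python
import numpy as np

def tuned_responses(rr):
    """Split raw responses (bins x frames) into tuned and untuned parts.

    The untuned component is the across-bin average at each time frame;
    the tuned responses are what remains after subtracting it.
    """
    rr = np.asarray(rr, dtype=float)
    untuned = rr.mean(axis=0, keepdims=True)  # one value per time frame
    tr = rr - untuned
    return tr, untuned.ravel()
```

By construction, any fluctuation that is added equally to all eccentricity bins at a given time frame leaves the TRs unchanged.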
Dissociated signatures of stimulus and choice in tuned responses
The subtraction of the untuned component from the RRs revealed clear signatures of stimulus and choice. To compute the SPs and CPs for the TRs, we followed the same procedure used for the RRs. Unlike the RRs, the TRs exhibited significant SPs and CPs, respectively, at different sets of spatiotemporal cells of the trial-related matrix [significant cells (corrected p < 0.05) are marked in Fig. 6A (*); corrected for multiple comparisons across the 126 spatiotemporal cells using the TFCE method (Smith and Nichols, 2009)].
Stimulus and choice probabilities in tuned responses. A, Across-observer averages of SP, CP, and CPstimulus=M in tuned responses. The format is identical to that in Fig. 4D. The white asterisks indicate the significant bins (TFCE-corrected p < 0.05; Materials and Methods). B, SP values at 5.5 s after stimulus onset from individual observers. The SPs are plotted against the eccentricity bins. Gray circles represent individual observers, and the blue- and red-filled circles represent the SPs averaged across observers at eccentricity bins, at which they were significant, as indicated by the white asterisks in the SP panel in A. C, CPstimulus=M values at 1.1 s after stimulus onset from individual observers. The axis format and symbols are identical to those in B, except for the empty blue and red circles, which represent the CPs averaged across observers at eccentricity bins, at which they were significant, as indicated by the white asterisks in the CPstimulus=M panel in A. D, Significant positive correlation between the model-predicted differential responses (Fig. 2E, L-S panel) and the observed SPs (SP panel in A). Gray circles represent individual spatiotemporal cells, and the colored circles represent the cells wherein either significant SPs (as indicated by the corresponding filled circles in B) or significant CPs (as indicated by the corresponding open circles in C) were found. E, No correlation between the CPstimulus=Ms (CPstimulus=M panel in A) and the SPs (SP panel in A). The representation of spatiotemporal cells by the symbols is the same as in D. The filled and open symbols are located far away from one another, illustrating the spatiotemporal dissociation between the significant SPs and CPs.
The SPs in the TRs (Fig. 6A, SP panel) were signed properly and clustered systematically both in space and time. At the time frames matched to the typical hemodynamic delay (3.3 s and 5.5 s) from stimulus onset, the responses to the S-ring were greater than those to the L-ring within the cortical subregion representing the side of the rings nearer to the fovea (Fig. 6A, blue pixels with SP < 0.5 in the SP panel), and the opposite was true within the cortical subregion representing the peripheral side of the rings (Fig. 6A, red pixels with SP > 0.5 in the SP panel). This emergence of the sinusoidal-shape spatial profile of SPs centered around the stimulation site at the time points a few seconds delayed from stimulus onset (Fig. 6B) is exactly what was previewed by our model prediction of differential cortical responses based on the eccentricity-tuning curves (Fig. 2E): the cortical sites that generate the largest differential responses to the ring stimuli with subtle differences are those with eccentricity preferences slightly deviated from the eccentricity of the stimuli. This strong resemblance between the spatiotemporal maps of the SPs and the model prediction of differential responses, as evidenced by the high cell-to-cell correlation between them (Fig. 6D; Pearson's r = 0.78, p < 10−9, across 126 spatiotemporal cells), assures that our fMRI measurements, once corrected for nonspecific background fluctuations, are reliable enough to delineate the cortical sites that encode the fine stimulus differences with high fidelity on a trial-to-trial basis.
Having characterized the V1 signature of stimulus by specifying which cortical sites carry that signature and when that signature is formed in relation to the stimulus onset, we set out to characterize the signature of choice in the same manner with the aim of examining whether the two signatures originate from the same neural population at the same time. This examination puts to a critical test the deterministic view of sensory neurons' role in perceptual decision, which posits that perceptual judgments on otherwise identical stimuli are caused by trial-to-trial fluctuations in responses of the neuronal ensemble that participates in encoding sensory features of relevance to a given perceptual task (Newsome et al., 1989; Salzman et al., 1990; Celebrini and Newsome, 1994; Britten et al., 1996; Shadlen et al., 1996). Hence, we reasoned, as have previous single-cell studies (Celebrini and Newsome, 1994; Britten et al., 1996; Parker et al., 2002; Romo et al., 2002; Uka and DeAngelis, 2004; Purushothaman and Bradley, 2005; Gu et al., 2007; Gu et al., 2008; Law and Gold, 2008; Ghose and Harrison, 2009; Law and Gold, 2009; Price and Born, 2010; Smith et al., 2011; Liu et al., 2013), that, if the causal view is correct, significant CPs should be found in the vicinity of the spatiotemporal cells at which the significant SPs were identified.
The observed pattern of CPs, however, was inconsistent with this prediction in four important ways. First, we failed to observe significant CPs at any of the seven cells that housed significant SPs (Fig. 6A, SP panel, *; Fig. 6B, blue or red filled circles) in the trial-related matrix of the TRs (CP = 0.51 ± 0.06, 0.49 ± 0.05, 0.49 ± 0.05, 0.50 ± 0.05, 0.51 ± 0.06, 0.51 ± 0.06, and 0.50 ± 0.05; CPstimulus=M = 0.51 ± 0.06, 0.52 ± 0.08, 0.51 ± 0.08, 0.51 ± 0.11, 0.52 ± 0.10, 0.51 ± 0.07, and 0.51 ± 0.06; from fovea to periphery, respectively; mean ± SD across observers; Fig. 6A, CP and CPstimulus=M panels). Second, the significant CPs were instead found at cells where the SPs were not significant. The cells with the significant CPs were quite advanced in time and far away in space from the directly stimulated site (Fig. 6A, CP and CPstimulus=M panels, *; TFCE-corrected p < 0.05). Only 1.1 s after the stimulus onset (a time at which fMRI activity probably reflects neural activity occurring before the stimulus onset), the responses at the cortical site representing a region very close to the fovea (1.51°; Fig. 6A, CP and CPstimulus=M panels, blue pixels with *; Fig. 6C, blue open circle) were greater in the S-choice trials than in the L-choice trials, whereas the responses at the cortical site representing the far periphery (5.65°; Fig. 6A, CPstimulus=M panel, red pixel with *; Fig. 6C, red open circle) were greater in the L-choice trials than in the S-choice trials. Third, in search of any hint of a meaningful relationship between the SPs and the CPs, we also considered the possibility that, despite the mismatch in statistical significance, the SPs and the CPs might have been weakly correlated with one another. However, the correlation analysis, which was conducted over the entire ensemble of cells constituting the trial-related matrix of the TRs, showed that the spatiotemporal distribution of CPs was either uncorrelated (Fig. 6E; Pearson's r = −0.03, p = 0.74, for CPstimulus=Ms) or even anticorrelated (Pearson's r = −0.38, p < 10^−9, for CPs) with that of SPs. Fourth and finally, we further checked the possible involvement of the stimulus-encoding cortical sites in representing choice-associated information by comparing the spatiotemporal pattern of the CPs with that of the model prediction of differential responses to the stimuli (Fig. 2E, L-S panel). Again, we failed to observe any significant correlations between the CPs and the model predictions (Pearson's r = −0.15 and 0.02, p = 0.09 and 0.84, for CPs and CPstimulus=Ms, respectively).
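As a minimal sketch of the two quantities used in this comparison, the Python snippet below computes a probability of the CP or SP type for a single spatiotemporal cell and a Pearson correlation between two maps. It assumes the conventional ROC-area definition (the probability that a response drawn from one group of trials exceeds a response drawn from the other, with ties counted as one half); the exact procedure is the one given in Materials and Methods. The function names `roc_area` and `pearson_r`, the sign convention (L-choice versus S-choice), and the response values are hypothetical.

```python
import math

def roc_area(group_a, group_b):
    """Area under the ROC curve: P(a > b) + 0.5 * P(a == b) over all trial pairs."""
    wins = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(group_a) * len(group_b))

def pearson_r(x, y):
    """Pearson correlation between two equally long sequences (e.g., SP and CP maps)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical responses of one cell on M-ring trials, sorted by choice (arbitrary units).
l_choice = [0.9, 1.2, 0.7, 1.1]
s_choice = [0.8, 1.0, 0.6, 1.3]
cp = roc_area(l_choice, s_choice)  # 0.5 means the response carries no choice information
```

Under this convention, values above 0.5 would indicate larger responses on L-choice trials and values below 0.5 larger responses on S-choice trials.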
Decoding stimulus and choice information from raw responses with population read-out weights
So far, reliable signatures of stimulus or choice were available only in the TRs derived by removing nonspecific background fluctuations from the RRs in which those TRs were embedded. Does this imply that the large-scale, moment-to-moment fluctuations in “untuned” activity impose a fundamental limitation on V1's capacity to carry stimulus- or choice-related information? That implication is not necessarily correct. Instead, the failure to find reliable signatures in the RRs could reflect the limitation of our “local coding” strategy, which evaluated the stimulus- or choice-related variability in neural responses confined to local cortical sites separately. Indeed, if a decision stage in the brain relies on population coding to interpret sensory signals within V1 (Paradiso, 1988; Pouget et al., 2000), fluctuations in untuned activity, a substantial fraction of which is shared by an entire population of encoding neurons, can be efficiently canceled at the decision stage regardless of how large those fluctuations are.
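The cancellation argument can be made explicit with a simple additive sketch. The notation is assumed here for illustration and is not taken from Materials and Methods: suppose that, on a given trial, the raw response of eccentricity bin i is the sum of a tuned component t_i and a fluctuation g shared by all bins, and that the decision stage computes a weighted sum d with read-out weights w_i.

```latex
d \;=\; \sum_i w_i \,(t_i + g)
  \;=\; \sum_i w_i\, t_i \;+\; g \sum_i w_i ,
\qquad
\sum_i w_i = 0 \;\Longrightarrow\; d = \sum_i w_i\, t_i .
```

Under these assumptions, any read-out whose weights sum to zero (for example, opposite-signed foveal and peripheral pools of equal size) is unaffected by g, however large it is.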
To test this hypothesis, we revisited the RRs, this time decoding stimulus signals and choice signals from the RRs over the entire extent of the eccentricity matrix and computing the stimulus probabilities and choice probabilities by comparing the trial-to-trial distributions of those decoded signals at the population level. We will refer to these probabilities as population SPs and population CPs to distinguish them from the probabilities estimated at the individual cells of the trial-related matrix. For population decoding, we developed three different read-out weight profiles, implementing the three major decoding schemes proposed by previous studies (Gold and Shadlen, 2001; Jazayeri and Movshon, 2006; Graf et al., 2011; Berens et al., 2012; Haefner et al., 2013). The simplest form of read-out weights was a uniform read-out, in which the eccentricity bins are divided into either the “fovea” pool or the “periphery” pool, with uniform weights assigned to the bins within each pool (Fig. 7A). In the remaining two weight profiles, the eccentricity bins had nonuniform weights that were derived from the eccentricity-tuning curves according to two different task-optimal decoding schemes. In one scheme, the read-out weights were proportional to stimulus discriminability of given cortical sites (Fig. 7B), which are similar to the profile of model-predicted differential responses at the time frame with hemodynamic peak. The other scheme determined read-out weights by evaluating the contributions of given cortical sites to probabilistic inference of differences between the S-ring and L-ring stimuli (Fig. 7C), estimated by log-likelihood ratios between tuning responses to those two stimuli (for detailed definitions of the three weight profiles, see Materials and Methods).
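The three read-out schemes can be sketched in Python as follows. This is only an illustration under stated assumptions, not the definitions given in Materials and Methods: the tuning profiles are assumed to be Gaussian, the bin spacing, tuning width, baseline, and S- and L-ring centers are hypothetical, the discriminability weights assume equal response variability in every bin, and the log-likelihood-ratio weights take the form that follows from Poisson-like variability. All names (`tuning`, `decode`, `uniform_w`, `discrim_w`, `llr_w`) are introduced for this example.

```python
import math

# Hypothetical eccentricity bins (deg): 10 foveal, the M-ring bin, 10 peripheral.
M_RING = 2.84   # M-ring eccentricity reported in the text
STEP = 0.25     # assumed bin spacing, for illustration only
ecc_bins = [M_RING + STEP * k for k in range(-10, 11)]

def tuning(center, width=0.8, baseline=0.2):
    """Assumed Gaussian response profile across eccentricity bins (arbitrary units)."""
    return [baseline + math.exp(-0.5 * ((e - center) / width) ** 2) for e in ecc_bins]

f_small = tuning(M_RING - 0.06)  # hypothetical S-ring profile
f_large = tuning(M_RING + 0.06)  # hypothetical L-ring profile

# 1) Uniform read-out: equal weights within the "fovea" and "periphery" pools.
uniform_w = [-1.0] * 10 + [0.0] + [1.0] * 10
# 2) Discriminability read-out: proportional to the tuning difference
#    (assumes equal response variability in every bin).
discrim_w = [fl - fs for fl, fs in zip(f_large, f_small)]
# 3) Log-likelihood-ratio read-out: log ratio of the two tuning responses
#    (the form implied by Poisson-like variability; an assumption here).
llr_w = [math.log(fl / fs) for fl, fs in zip(f_large, f_small)]

def decode(weights, response):
    """Population signal: weighted sum of responses across eccentricity bins."""
    return sum(w * r for w, r in zip(weights, response))
```

With these toy profiles, each read-out yields a larger decoded value for the L-ring profile than for the S-ring profile. Population SPs and CPs would then be obtained by comparing the trial-to-trial distributions of `decode(...)` values between stimulus or choice conditions.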
Population decoding of stimulus and choice information in raw responses. A–C, Weight profiles defined by three different population decoding schemes. Individual symbols represent arbitrary-unit weight values assigned to eccentricity bins. D, Time courses of across-observer averages of population SPs (green line and symbols) and population CPstimulus=Ms (orange line and symbols). The circles, triangles, and squares represent the uniform (A), discriminability (B), and log-likelihood ratio (C) weights, respectively. The salient open and filled symbols represent the probability values significant (uncorrected) at p < 0.05 and p < 0.005, respectively. Error bars indicate SEM across observers. E, Comparison of population and individual SP values for the RRs. The blue open circles represent the SP values from the 10 foveal bins, which were adjusted for preference by (1 − SP). The red open circles represent the SP values from the 10 peripheral bins. The eccentricity bin corresponding to the M-ring (2.84°) is not shown. The pale green line indicates the population SPs with the discriminability weight. F, Comparison of population and individual CPstimulus=M values for the RRs. Blue open circles represent the CP values from the 10 foveal bins, which were adjusted for preference by (1 − CP). Red open circles represent the CP values from the 10 peripheral bins. The pale orange line indicates the population CPs with the log-likelihood ratio weight.
All three decoding schemes resulted in similar outcomes, each revealing clear-cut signatures of stimulus and choice in the RRs at the time points where the significant SPs and CPs were found for the TRs (Fig. 7D). The population SPs were significant at 3.3 s and 5.5 s after stimulus onset (Fig. 7D, green-filled markers; uncorrected p < 0.005). These signatures were seen using all three read-out schemes, but they were most conspicuous in the “discriminability” read-out (Fig. 7D, green-filled triangles). In contrast, the population CPs were significant only 1.1 s after stimulus onset and were strongest when derived using the log-likelihood ratio scheme (Fig. 7D, yellow-filled squares; uncorrected p < 0.005). This unmistakable dissociation between the population SPs and CPs estimated in the RRs, both in time and in profile shape, neatly dovetails with the results from the local SPs and CPs estimated in the TRs, further corroborating our conclusion that V1 carries stimulus and choice signatures that are embodied in different neural ensembles at different points in time.
The advantage of the population coding strategy over the local coding strategy was substantial in RRs: the best population probabilities (Fig. 7E,F, filled markers) surpassed all of the individual probabilities estimated at the local cells of the trial-related matrix of the RRs (Fig. 7E,F, open circles; for the 10 bins located in the foveal bank, the individual probabilities were adjusted for preference by taking 1 − SP or 1 − CP, so that they could be directly compared with the population SPs and CPs). This analysis verifies that the RRs, despite including a substantial untuned component, retain sufficient information for supporting perceptual judgments at a subsequent decision stage.
Given the advantageous effect of population coding in the RR signals (Fig. 7E,F), why does population coding not do better when applied to TRs (Fig. 8A,B)? Why, in other words, are population SPs and population CPs no larger than the best SPs and CPs exhibited by local cells of the trial-related matrix? One obvious possibility is that the beneficial effect of pooling signals from neurons with similar preferences is limited when those neurons' responses are highly correlated across trials (Averbeck et al., 2006). To check that possibility in the case of our TRs, we calculated pairwise temporal correlations among those TRs and found that the responses from nearby eccentricity bins are indeed highly correlated even after removal of global fluctuations (Fig. 8C). These correlations might reflect moment-to-moment cofluctuations among neurons with similar stimulus preferences, but they might also have arisen because of our method for combining voxel signals (Fig. 3B) and/or the spatially correlated nature of the fMRI signal.
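Why correlated responses limit the benefit of pooling can be seen in a textbook idealization, which is an assumption made here for illustration and not a model fitted to our data: let N pooled responses r_i each have variance σ² and let every pair share the same correlation ρ ≥ 0. The variance of their average is then

```latex
\operatorname{Var}\!\left(\frac{1}{N}\sum_{i=1}^{N} r_i\right)
  \;=\; \frac{\sigma^{2}}{N}\,\bigl[\,1 + (N-1)\,\rho\,\bigr]
  \;\xrightarrow[\;N\to\infty\;]{}\; \rho\,\sigma^{2} .
```

With independent responses (ρ = 0) the variance falls as 1/N, whereas with positively correlated responses it cannot fall below ρσ², so adding more similarly tuned, highly correlated bins yields little further gain.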
Population versus individual probabilities in tuned responses. A, Comparison of population and individual SP values for the TRs. The format is identical to that in Fig. 7E. The pale green line indicates the population SPs with the discriminability weight applied on the TRs. This line is identical to the line in Fig. 7E because those weights have removed the global fluctuations that distinguish TRs from RRs. B, Comparison of population and individual CPstimulus=M values for the TRs. The format is identical to that in Fig. 7F, and again the population values are identical to those in Fig. 7F for the same reason mentioned above. The black X indicates the average (0.52) of the individual CP values at 1.1 s after stimulus onset. C, A matrix of across-observer average correlations in TRs between the eccentricity bins.
Eye movements as a potential origin of choice signature in V1?
Our primary goal was to examine the “causal” hypothesis regarding the role of V1 activity in trial-to-trial variability of perceptual choice. Although our results are inconsistent with the causal hypothesis, we were puzzled about why the choice-related cortical activity, which does not match stimulus-related responses either in timing or in neural origin, appeared in V1. One possible explanation for that puzzle attributes the seemingly errant activity to eye movements. Accordingly, we conducted an eye-tracking experiment to test whether eye movements could be the origin of the V1 choice signature.
Although the observers in the fMRI experiment were explicitly instructed to maintain their gaze on the fixation dot throughout an entire scan run, their eyes may well have moved unintentionally (Ratliff and Riggs, 1950; Martinez-Conde et al., 2004). Moreover, it is known that tiny movements, such as drifts, tremors, and microsaccades, can affect V1 neural activity (Gur et al., 1997; Martinez-Conde et al., 2000; Snodderly et al., 2001; Tse et al., 2010). Thus, we first considered these involuntary fixational (“miniature”) eye movements as a possible origin for the observed CPs in V1. Given the advanced temporal locus of the significant CPs (0–2.2 s after the onset of the ring stimulus; Fig. 6A, CP and CPstimulus=M panels) and the hemodynamic delay of the fMRI signal in the current study (4.4–6.6 s), which can be estimated from the locus of the significant SPs (Fig. 6A, SP panel), we were particularly interested in whether there were any differential eye movements associated with perceptual choices at the temporal bin spanning 4.4–2.2 s before the ring stimulus onset (Fig. 9, shaded rectangular areas), when no stimulus was presented other than the small fixation dot. This absence of a ring stimulus at the moment of the CPs in our study makes fixational eye movements unlikely to be the cause of the CPs because microsaccades, which occur with the greatest amplitude among the major fixational eye movement types, cause no changes in V1 neural activity in the absence of a stimulus other than a fixation mark (Martinez-Conde et al., 2000, 2002).
However, because the impact of various fixational eye movements, including microsaccades, on fMRI measurements in V1 in the absence of stimulation has never been measured directly and because fixational eye movements might alter perception, possibly via attention (Hafed et al., 2011) or gain modulation (Hafed, 2013), we explored the possibility that fixational eye movements, which were not monitored during the fMRI experiment, might have generated the choice signature in V1. This possibility can be tested behaviorally because one necessary (but not sufficient) condition for eye movements to generate the CPs observed in the fMRI experiment is that eye movements occurring at ∼4.4–2.2 s before stimulus onset should predict observers' choices yet to be made after stimulus presentation. We searched for eye movements that satisfy this necessary condition by repeating the same experiment on a new batch of observers outside the scanner, but now with their eye movements being tracked throughout the entire experiment. The stimuli, procedure, and number of observers (23) in this eye-tracking experiment were otherwise identical to those in the original fMRI experiment (see Materials and Methods). As expected, the SC and resulting performance of this new batch of observers (0.023 ± 0.006 and 81.6 ± 5.1%, respectively; mean ± SD across observers) were comparable with those of the observers who participated in the fMRI experiment.
Results of the eye-tracking experiment. A, Accuracy and precision of gaze position measurements in the visually guided saccade task. The horizontal and vertical gaze measurements obtained under three different fixation positions (cross-shape polygons; different symbols) are plotted against each other for individual observers (small gray symbols). The black symbols with error bars represent the group means with 0.5*SDs pooled over observers for each fixation mark. B–F, Eye movements during the ring-size discrimination task. The mean and variability statistics for five different aspects of eye movements are plotted against the time points relative to the stimulus onset, which matched the time axis of the trial-related fMRI matrix after being adjusted for hemodynamic delay. The dotted and dashed vertical lines indicate the onset of the “ready” cue and ring stimulus, respectively. The shaded rectangular areas represent the time bin at which the significant CPs were found in the fMRI experiment. B, Eye blinks. Top, Across-observer average (black line) time course of percentage blink rate with SEM (shaded area). Bottom, Plot of the differences in blink rate between the M-ring|S-choice and M-ring|L-choice trials, with error bars indicating SD/2 across observers. C, Microsaccades. Top and bottom, Time course of microsaccade frequency and the ratio of rightward microsaccades, respectively, at each 2.2 s bin. Different symbols represent the trial types. The dashed curve indicates the average of the S- and L-ring trials and serves as a baseline. Error bars indicate SD/2 across observers. D, Changes in pupil size. Top, Time course of pupil size changes around its mean. Bottom, CPs computed for the entire trials (diamonds) and for M-ring trials (circles). Error bars indicate SD/2 across observers. E, Gaze positions. The format is identical to that in D. The vertical black bar represents the width of the fixation mark. F, Changes in vergence angle. Filled symbols in the bottom panel represent the statistically significant CPs (uncorrected p < 0.05).
Before conducting the eye-tracking experiment, we assessed the accuracy and reliability (precision) of eye movement measurements obtained with our video-based eye-tracker (for its model and specifications, see Materials and Methods) by computing the mean and variance, respectively, of visually guided saccades made to three different fixation targets (Fig. 9A, cross-shape polygons). Here, the fixation targets were spatially separated by only 0.12°, closely matched to the average radius difference between the small and large rings, which was quite small but big enough to produce differential fMRI responses (SPs) in V1. The eye position measurements in the horizontal axis were both accurate, as indicated by the small deviations of the mean positions from the true position (0.02 ± 0.11°; mean ± SD across observers and conditions), and precise, as indicated by the small SDs across trials (0.23 ± 0.09°; mean ± SD across observers), thus providing a spatial resolution that can reliably distinguish eye position differences as small as 0.1° (see Materials and Methods). In contrast, the measurements along the vertical direction were quite noisy, as indicated by the large SDs across trials (0.50 ± 0.44°; mean ± SD across observers; compare the horizontal and vertical error bars in Fig. 9A), and thus could not reliably detect an eye position difference of 0.1°. This relatively poor resolution along the vertical axis was expected: vertical eye position measurements in video-based eye-trackers are notoriously unreliable because the eyelid often occludes the pupil. The incidence of pupil occlusion varies substantially across individuals and over time during prolonged fixation due to fluctuations in cognitive states, such as arousal or fatigue. For these reasons, we only used eye movement data along the horizontal axis to detect microsaccades (Fig. 9C) and to estimate gaze positions (Fig. 9E,F).
First, we checked whether choices correlated with the frequency of eye blinks, which are known to affect fMRI activity in human V1 in the absence (Bristow et al., 2005) and in the presence (Tse et al., 2010) of retinal stimulation. Although the overall blink frequency (Fig. 9B, top, black line) decreased around the time of stimulus onset (Fig. 9B, dashed vertical lines) and increased afterward, its time course did not differ at any time bins between the two choices either for the M-ring trials only (Fig. 9B, bottom; paired t test across observers, uncorrected 0.11 < p < 0.99) or for the entire set of trials (0.11 < p < 0.65). Next, we checked whether choices correlated with the frequency or direction of microsaccades because a microsaccade may affect the response gain of neurons representing visual regions around its target (Hamker et al., 2008; Hafed, 2013), or the direction of a microsaccade may interact with covert spatial attentional shifts (Hafed and Clark, 2002; Engbert and Kliegl, 2003; Hafed et al., 2011). Consistent with a phenomenon known as microsaccadic inhibition (Rolfs et al., 2008; Hafed and Ignashchenkova, 2013), the overall frequency (Fig. 9C, top, dashed line) decreased after onset of the ready signal (Fig. 9C, dotted vertical lines) and recovered to the baseline level after the choice period. However, neither the frequency (Fig. 9C, top) nor the direction (Fig. 9C, bottom) differed at any time bins between the two groups of choice-sorted M-ring trials (paired t test across observers, uncorrected 0.20 < p < 0.97 and 0.21 < p < 0.96, respectively).
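The choice-sorted comparisons in this section rest on paired t tests across observers. The Python sketch below shows the t statistic computed for a single time bin, assuming one value per observer for each choice; the function name `paired_t` and the numbers are hypothetical, and the p value (not computed here) would be read from a t distribution with the returned degrees of freedom.

```python
import math
from statistics import mean, stdev

def paired_t(a, b):
    """Paired t statistic and degrees of freedom for per-observer values a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))  # stdev uses the n - 1 denominator
    return t, n - 1

# Hypothetical per-observer measures in one 2.2 s bin (arbitrary units),
# for M-ring trials sorted by the choice made afterward.
s_choice = [1.0, 2.0, 3.0, 4.0]
l_choice = [0.0, 2.0, 1.0, 3.0]
t_value, dof = paired_t(s_choice, l_choice)
```

The same function would be applied separately at every time bin, which is why the p values above are reported as uncorrected ranges.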
Next, we checked whether choices correlated with pupil diameter because changes in pupil size are known to accompany changes in arousal (Hess and Polt, 1960; Bradshaw, 1967; Henson and Emuh, 2010), changes in perceptual interpretation (Einhäuser et al., 2008, 2010), and task-related factors (Hess and Polt, 1964; Kahneman and Beatty, 1966; Nassar et al., 2012). These kinds of uncontrolled changes in pupil size in turn would produce variations in the retinal image (Campbell and Gubisch, 1966) that could induce changes in V1 activity. As suggested by its close relationship with task structure, the overall pupil size (Fig. 9D, top, black line) fluctuated substantially around the stimulus onset in a manner similar to the blink rate (Fig. 9B, top) and the microsaccade frequency (Fig. 9C, top). However, the pupil size failed to predict the choices made by observers, as indicated by the absence of a significant CP at any time bin (Fig. 9D, bottom), including the one spanning 4.4–2.2 s before stimulus onset, either for the M-ring trials (Student's t test across observers, uncorrected 0.52 < p < 0.85) or for the entire set of trials (0.17 < p < 0.99).
Next, we checked whether choices correlated with gaze position, which may affect V1 activity via gain modulation (Trotter and Celebrini, 1999; Rosenbluth and Allman, 2002; Sharma et al., 2003; Merriam et al., 2013). Unlike the previous measurements (blink rate, microsaccade frequency, and pupil size), the overall gaze position (Fig. 9E, top, black line trace) remained virtually stationary, showing only negligible amounts of fluctuation inside the fixation mark (Fig. 9E, right, thick black bar). In agreement with the previous measurements, however, the gaze position did not carry choice information at all (Fig. 9E, bottom; Student's t test across observers, uncorrected 0.13 < p < 0.80 and 0.39 < p < 0.85; for CP and CPstimulus=M, respectively).
Last, we checked whether choices correlated with changes in vergence angle because vergence eye movements are known to affect the perceived size of an object (Mon-Williams et al., 1997; Sperandio et al., 2013), which in turn could affect V1 activity (Murray et al., 2006). The vergence angle (Fig. 9F, top, black line trace) was not biased toward either the near or the far side before and during the stimulus presentation. However, significant correlations were found at the two consecutive time bins after stimulus onset (Fig. 9F, bottom, black diamonds; CP = 0.46 ± 0.06 and 0.46 ± 0.07, respectively; mean ± SD across observers; Student's t test, uncorrected p = 0.01 and 0.04, respectively), but not at the time bin (4.4–2.2 s before stimulus onset) associated with the significant CPs found in the V1 fMRI activity (p = 0.63). This temporal mismatch disqualifies vergence angle as a potential origin for the choice signature in V1. Instead, the delayed changes in vergence angle are likely to reflect the vergence control system's automatic reaction to the perceived size of a ring stimulus. This interpretation is supported by the fact that divergence was greater after the ring stimulus was judged to be large than when it was judged to be small (vergence angle difference at 0–2.2 s = −0.05 ± 0.12°; mean ± SD across observers).
In conclusion, our eye movement measurements were accurate and reliable enough to reveal the previously known subtle changes associated with task structure. However, none of the five aspects of eye movements in the time bin at which the significant CPs were found in the fMRI experiment was related to choice behavior. Hence, there is no reason to attribute the choice-related V1 activity found in our fMRI experiment to eye movements. In the following section, we consider other possible reasons for that activity.
Discussion
In this study, human observers made speeded perceptual judgments about the size of ring stimuli while, at the same time, neural responses were being measured from their primary visual cortex by fMRI. Ring stimuli highly similar in size were briefly presented, creating a difficult discrimination task that induced variability in choice behavior across trials, thereby allowing us to identify when and where stimulus-correlated and choice-correlated responses arose relative to stimulus onset. Raw fMRI measurements failed to exhibit reliable neural signatures selective either for the particular stimulus presented or for the choice made by the observer. When the omnibus, untuned component was filtered from the raw responses, however, both “stimulus” and “choice” signatures were revealed within two separate constellations of activation, located far apart from one another in both space and time. The spatial and temporal loci of the “stimulus” signature precisely matched those derived from the eccentricity-tuning curves and those predicted by hemodynamic delay of sensory responses, respectively. Compared with the “stimulus” signature, the “choice” signature was quite advanced in time, appearing before onset of the stimulus, and located farther away from the stimulation site in space.
Besides our study, there is another human fMRI study, by Ress and Heeger (2003), that also found significant correlations between V1 fMRI responses and observers' perceptual choices. Unlike us, however, Ress and Heeger (2003) found those correlations within the same voxels as those associated with fMRI responses triggered by stimulus presentation. We can envision several possible considerations that reconcile their findings with ours. For one thing, we used a difficult stimulus discrimination task that probably relies on information carried by neurons that are not maximally responsive to the presented stimulus. Ress and Heeger (2003), on the other hand, used a contrast detection task that relies on the overall magnitude of responses associated with the presence or absence of a weak stimulus, responses likely to arise in neurons maximally responsive to the stimulus. For another thing, we purposefully tailored our task (difficult size discrimination), stimuli (thin, highly localized rings), and fMRI protocol (sparse event-related) to achieve very high spatial resolution. Ress and Heeger (2003) used a detection task in concert with an fMRI protocol (dense event-related) that was not optimized to uncover possible dissociations in the spatial and temporal origins of SPs and CPs.
What might our results say about neural coding of ring size? The most reliable stimulus signals associated with ring size were not those appearing within voxels maximally responsive to a given-sized ring but, instead, were signals arising at locations to either side of the sites registering the precise peak response to the given-sized ring (Fig. 6A, SP panel). This finding comports with the idea that the information capacity of cortical neurons in primary sensory cortex is not always governed by their maximal responses to sensory stimuli. By definition, the preferred stimulus of a visual neuron is the one producing the strongest responses in that neuron (i.e., the peak of its tuning curve), but when it comes to discriminating subtle, near-threshold differences between stimuli (e.g., ring size) maximally responsive neurons may not provide the optimal information for that discrimination (Regan and Beverley, 1985; Jazayeri and Movshon, 2006). This idea has been corroborated by results from single-cell recordings in V1 (Graf et al., 2011; Berens et al., 2012) and extrastriate cortex (Purushothaman and Bradley, 2005), and now we have evidence supporting this principle within human V1.
Turning to the choice-related activity revealed by our study, our results are noteworthy in two respects: (1) significant CPs were not located at neural sites where significant SPs arose; and (2) those CPs appeared much earlier in time than did the SPs. The absence of significant CPs within the spatiotemporal window in which significant SPs were found is consistent with previous single-cell studies reporting the absence of significant CPs in the responses of neurons within primary sensory cortex that participate in encoding sensory inputs relevant to given tasks (Grunewald et al., 2002; de Lafuente and Romo, 2005; Nienborg and Cumming, 2006; Hernández et al., 2010). And in the one neurophysiological study that did find significant choice-related responses in V1 (Palmer et al., 2007), those responses did not arise within the neurons maximally sensitive to the evoking stimulus, again consistent with what we observed in our fMRI results.
The choice signature observed in our study originates 3–4 s before the brief appearance of the task-related stimulus. This is remarkable because the only stimulus present at that time is the small fixation dot seen against an otherwise dark background (Fig. 1A). So, what is responsible for producing this prestimulus choice signature? A few previous studies found a tendency for eye movements in the absence of visual stimulation to produce spiking (Kagan et al., 2008) or fMRI activity (Sylvester et al., 2005) in V1. Thus, we considered the possibility that eye movements made 3–4 s before the stimulus onset might be correlated with the choices made by the observers. This possibility was tested exhaustively by searching for any choice-related changes in five different aspects of fixational eye movements: blink rate, frequency and direction of microsaccades, pupil size, gaze position, and vergence angle. However, none of those measurements at the moment of the choice signature in V1 was significantly correlated with the choices made by the observers (Fig. 9B–F, bottom). This lack of correlation between prestimulus eye movements and choices cannot be attributed to either inaccuracy or unreliability of our measurements because the eye-tracking signals in the current study were accurate and reliable enough to resolve a difference as small as 0.1° and to exhibit previously known task-locked changes in overall microsaccades and pupil diameter (Fig. 9C,D, top).
Given this absence of prestimulus, choice-related eye movements, we speculate about top-down expectation as an alternative candidate for the origin of the CPs found in V1. The choice signature's specific location in space and in time, remote from the retinotopic site activated by the stimuli and advanced in time relative to appearance of those stimuli, suggests that this signature could arise from pretrial expectations (“short-term” priors, in other words) originating from high-tier areas involved in perceptual inference. Given the high degree of uncertainty imposed by the subtle ring-size differences and by time pressure to make judgments in our study, perhaps the brain builds up “trial-to-trial priors” in advance of trial onset, with those priors based on nonsensory sources of information (e.g., frequencies of choices made in recent trials). The trial-to-trial priors may be implemented by feedback projections carrying neural signals anticipating the forthcoming visual stimulus. Those feedback signals (Fig. 10A, pale-red and pale-blue curves) are unlikely to be as precise as stimulus-evoked feedforward signals (Fig. 10A, dark-red and dark-blue curves), thus shifting the cortical sites with the maximum differences away from the boundary (Fig. 10B, gray triangles). At the same time, the dissociated signatures of stimulus and choice suggest that the prior signals preceding the stimulus in V1 do not interfere with V1 neural activity that encodes stimulus features, as would be expected if sensory-evoked activity quenches preexisting neural variability (Churchland et al., 2010).
Conceptual implementation of trial-to-trial prior. A, Hypothetical feedforward and top-down signals in V1. The horizontal axis specifies the preferred eccentricity. The dotted blue and red vertical lines indicate the two cortical sites that respond maximally to the S- and L-rings, respectively. The solid blue and red curves are population responses evoked by the S- and L-ring stimuli, respectively, representing feedforward signals. The pale-blue and pale-red curves are hypothetical population responses reflecting spatially blurred top-down signals. B, Cortical sites with the maximum response difference. The solid black curve specifies the difference of two solid curves in A, and the gray curve the difference of two pale-colored curves in A. The black and gray triangles represent the cortical sites with the maximum response differences between the solid curves and between the pale-colored curves, respectively.
What do our results say about V1's role in choice behavior? We reasoned that, if sensory neurons in V1 played a causal role in determining perceptual judgments, trial-to-trial variability in choice behavior should be linked to the same neural activity that signifies differences in stimuli relevant to the task. A recent computational study provided a formal proof that this relationship should hold if neuronal populations are read out in a (sub)optimal manner (see Eq. 6) (Haefner et al., 2013, their Fig. 6). Our observation of a clear-cut dissociation between the signatures of stimulus and choice undermines the “causal” view as it pertains to V1. Of course, one can never definitively rule out the possibility that fMRI measurements simply failed to detect choice-related responses from a small subset of neurons within the stimulus-related voxels. We are satisfied, however, that our measurement techniques are sufficiently sensitive to distinguish both the stimulus- and choice-related responses with good spatial and temporal resolution.
At this stage, we are cautious about generalizing our finding of a dissociation between stimulus and choice signatures in V1 activity to other perceptual tasks, such as simple detection (Ress and Heeger, 2003; Palmer et al., 2007) and coarse discrimination (Britten et al., 1996), and, for that matter, to other forms of top-down modulation, such as working memory (Harrison and Tong, 2009) and perceptual learning (Li et al., 2008; Law and Gold, 2009). It will be interesting to look for dissociations between stimulus- and choice-associated population responses in other contexts and in extrastriate visual areas, including V2 and V3, to learn whether the dissociation is unique to primary sensory cortex. The stimuli and task used in our study were optimized to exploit the high spatial resolution of the V1 retinotopic map, so they may not work as well when targeting extrastriate visual areas because of the relatively larger receptive field sizes (and, hence, poorer spatial resolution) of neurons in those higher-tier areas. In any event, our findings make a unique contribution to the elucidation of the roles of V1 in perceptual decision-making by directly comparing choice-related responses and stimulus-encoding responses at the level of population activity.
Footnotes
This work was supported by the National Research Foundation of Korea Grants 2008-2005752 and NRF-2013R1A2A2A03017022 and United States Air Force Office of Scientific Research, Asian Office of Aerospace R&D Grant AOARD-12-4090. We thank Justin L. Gardner and Daeyeol Lee for helpful comments on an earlier version of this paper; the Seoul National University Brain Imaging Center for providing an excellent research environment; Yunseo Choi and Minju Kim for technical assistance; and Minsuk Kang, Choongkil Lee, and Kayeon Kim for eye-tracking consultation.
The authors declare no competing financial interests.
Correspondence should be addressed to Dr. Sang-Hun Lee, Department of Brain and Cognitive Sciences, Seoul National University, Seoul 151-742, Korea. visionsl@snu.ac.kr
This article is freely available online through the J Neurosci Author Open Choice option.