To successfully interact with objects in the environment, sensory evidence must be continuously acquired, interpreted, and used to guide appropriate motor responses. For example, when driving, a red light should motivate a motor command to depress the brake pedal. Single-unit recording studies have established that simple sensorimotor transformations are mediated by the same neurons that ultimately guide the behavioral response. However, it is also possible that these sensorimotor regions are the recipients of a modality-independent decision signal that is computed elsewhere. Here, we used functional magnetic resonance imaging and human observers to show that the time course of activation in a subregion of the right insula is consistent with a role in accumulating sensory evidence independently from the required motor response modality (saccade vs manual). Furthermore, a combination of computational modeling and simulations of the blood oxygenation level-dependent response suggests that this region is not simply recruited by general arousal or by the tonic maintenance of attention during the decision process. Our data thus raise the possibility that a modality-independent representation of sensory evidence may guide activity in effector-specific cortical areas before the initiation of a behavioral response.
On a moment-to-moment basis, the brain must infer the most likely state of the world given a variable amount of sensory evidence, a process referred to as “perceptual decision making” (Newsome et al., 1989; Salzman and Newsome, 1994; Gold and Shadlen, 2001; Shadlen and Newsome, 2001). In a prototypical laboratory experiment, observers view a noisy field of moving dots drifting to the left or to the right [a random-dot pattern (RDP)] and indicate the direction with a saccade in the appropriate direction. The firing rate of motion-selective neurons in the middle temporal area (MT) monotonically tracks the quality of the available sensory evidence, which is systematically manipulated by varying the percentage of dots moving in a common direction (termed “motion coherence”) (Newsome et al., 1989; Salzman et al., 1992; Britten et al., 1996; Shadlen et al., 1996; Gold and Shadlen, 2001, 2007; Shadlen and Newsome, 2001; Ditterich et al., 2003; Mazurek et al., 2003). This sensory information is then thought to be temporally integrated by spatially selective oculomotor neurons in areas such as the lateral intraparietal area (LIP), frontal eye fields (FEFs), dorsal lateral prefrontal cortex (DLPFC), and superior colliculus (SC) until a threshold level of activity is reached and an appropriate eye movement response is triggered (Hanes and Schall, 1996; Kim and Shadlen, 1999; Gold and Shadlen, 2001, 2007; Schall, 2001; Shadlen and Newsome, 2001; Roitman and Shadlen, 2002; Ditterich et al., 2003; Huk and Shadlen, 2005; Churchland et al., 2008; Kiani et al., 2008). Microstimulating oculomotor neurons within some of these regions can also bias the response outcome, implying a causal role in perceptual decision making (Gold and Shadlen, 2000, 2003; Horwitz et al., 2004; Hanks et al., 2006).
The strong coupling between neural activity and behavior suggests that decision making is performed by the same neurons that ultimately initiate the appropriate motor response (here termed the “modality-dependent” hypothesis). For example, oculomotor regions mediate simple decisions requiring saccadic responses, and somatosensory cortex (S1) mediates vibrotactile decisions (Romo and Salinas, 1999, 2003; Romo et al., 2002; Tegenthoff et al., 2005; Preuschhof et al., 2006). Decisions about complex stimuli, such as images of faces or places, are also mediated by motor-specific cortical areas depending on the response output modality that is required by the task (Tosoni et al., 2008).
Although these studies leave no doubt that specialized motor areas play an important role in translating sensory information into a behavioral response, it is also possible that a separate mechanism computes a more abstract supramodal representation of sensory evidence and sends a continuous input signal to motor effector-specific sensorimotor areas during the course of the decision process (termed the “modality-independent” hypothesis). Here, we show that a region of right insula exhibits an activation profile consistent with the accumulation of sensory evidence during decision making, independent of response modality (saccade vs manual). This finding raises the possibility that a modality-independent mechanism guides activity in motor-specific regions before movement initiation.
Materials and Methods
Twelve right-handed subjects (nine females) were recruited from the University of California, Irvine (UCI) (Irvine, CA) community, and one right-handed subject (male) was recruited from the University of California, San Diego (UCSD) (La Jolla, CA) community. Data from one subject (female) were discarded because the manual and saccadic responses were not recorded correctly during the scanning session. All had normal or corrected-to-normal vision. Each subject gave written informed consent per Institutional Review Board requirements at either UCI or UCSD and completed two 1 h training sessions outside the scanner and one 1.5 h session in the scanner. Compensation for participation was $10.00/h for training and $20.00/h for scanning.
Stimuli and task.
Visual stimuli were generated using the Psychophysics Toolbox (Brainard, 1997; Pelli, 1997) for Matlab (version 7.1; Mathworks), presented at a frame rate of 60 Hz, and projected onto a screen at the back of the scanner bore that subjects viewed through a mirror. Button-press responses were made on a functional magnetic resonance imaging (fMRI)-compatible response box using the fingers of the right hand.
Subjects viewed a display consisting of two overlapping centrally presented RDPs—one composed of 100 red dots and the other made up of 100 blue dots—against a light gray background (Fig. 1). Each small dot subtended 0.1° of visual angle, and the circular stimulus aperture subtended 4° of visual angle (radius) with a small circular cutout around fixation (1° radius). On every trial, the coherence level for each RDP was determined by the proportion of dots moving in one of four possible directions—either to the upper left, upper right, lower left, or lower right—while the direction of each remaining dot was selected from a uniform distribution (across 360°). Each RDP moved in a different direction (pseudorandomly determined) and contained a motion coherence level of either 40 or 80% so that the total motion signal in the display was equated on every trial (e.g., if the red RDP had 40% coherent motion, the blue RDP would contain 80% and vice versa). Additionally, there were four small black circles (subtending 1° and centered 11.3° from fixation) arrayed at each corner of the screen that served as saccade targets.
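The coherence manipulation described above can be sketched in a few lines. This is an illustration only, not the stimulus code used in the experiment: the function name `dot_directions` and the convention of expressing directions in degrees are assumptions made here.

```python
import random

def dot_directions(n_dots, coherence, signal_direction_deg, rng):
    """Assign a motion direction (in degrees) to every dot in one RDP.

    A proportion `coherence` of the dots moves in the signal direction;
    each remaining dot gets a direction drawn uniformly from 0-360 degrees.
    """
    n_signal = int(round(n_dots * coherence))
    directions = [signal_direction_deg] * n_signal
    directions += [rng.uniform(0.0, 360.0) for _ in range(n_dots - n_signal)]
    rng.shuffle(directions)  # avoid any systematic ordering of signal dots
    return directions

# One trial: red RDP at 40% coherence, blue RDP at 80% coherence.
rng = random.Random(1)
red = dot_directions(100, 0.4, 45.0, rng)    # e.g., upper right
blue = dot_directions(100, 0.8, 135.0, rng)  # e.g., upper left
```

Because the two coherence levels always sum to 120%, the total number of signal dots in the display is the same on every trial regardless of which RDP is cued.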
At the start of each trial, a cue was presented for 750 ms in the form of a colored fixation cross (either red or blue), indicating which of the two RDPs subjects should monitor. This colored fixation cross remained onscreen throughout the stimulus display. Subjects were asked to judge the direction of the coherent motion of the RDP to which they were cued. If the cued RDP contained 80% coherent motion, then the trial was termed “easy,” and if the cued RDP contained only 40% coherent motion, the trial was termed “hard.” The stimulus remained onscreen for 1500 ms, after which only a white fixation cross was displayed for the remainder of the trial. Each trial lasted 5250 ms and subjects were allowed to respond any time after stimulus onset and up until the termination of the trial. Each run in the scanner consisted of 32 task trials randomly interleaved with 10 null trials (which were the same duration as a normal trial but only required passively viewing the fixation cross: no RDPs were presented). The color of the cue and the cued motion direction were randomized and counterbalanced within each block and each run ended with 10 s of passive fixation.
Response modality was alternated on a run-by-run basis and subjects were informed beforehand whether they were to make their responses via saccades or manual button presses. When making saccadic responses, subjects were instructed to keep their eyes on the fixation cross and then to make one clean eye movement to one of the four peripheral black circles before redirecting their gaze back to the central cross in preparation for the start of the next trial. When responding with button presses, subjects were instructed to keep their eyes on the fixation cross throughout the entire trial and to press one of four buttons spatially arrayed to correspond to the four possible target directions.
Eye movement data acquisition and analysis.
At UCI, eye movements were monitored using an infrared video eye tracker (Applied Science Laboratories; long range optics system); at UCSD, an Avotech SV-7021 infrared eye tracker was used. The position of the right eye was sampled at 60 Hz, and before each run, the eye tracker was recalibrated. Preprocessing and saccade extraction were performed using the ILAB toolbox for Matlab (http://www.brain.northwestern.edu/ilab/) (Gitelman, 2002). The raw data were first binned into temporal epochs corresponding to each trial, and then blinks (periods when the pupil disappeared), as well as five samples on either side of each blink, were marked and removed from the epoched data. The following parameters were used to identify saccades: an initial velocity threshold of 30° per second, a minimum saccade duration of 35 ms, and a minimum fixation duration of 100 ms at the endpoint of the saccade. Response times (RTs) on saccade trials were defined as the time between the onset of the stimulus and the first saccadic eye movement that deviated >3° from fixation in the direction of one of the four peripheral targets (data were scored by hand on a trial-by-trial basis to ensure accuracy).
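A velocity-threshold criterion of the kind listed above can be sketched as follows. This is a simplified illustration and not the ILAB implementation: the function name `detect_saccades` is hypothetical, velocity is estimated from adjacent samples only, and the 100 ms minimum-fixation criterion is omitted for brevity.

```python
import math

def detect_saccades(x_deg, y_deg, sample_rate_hz=60.0,
                    velocity_threshold=30.0, min_duration_ms=35.0):
    """Return (start_sample, end_sample) pairs for candidate saccades.

    x_deg, y_deg: gaze position in degrees of visual angle, one per sample.
    An inter-sample speed above velocity_threshold (deg/s) marks a candidate
    interval; runs of candidate intervals shorter than min_duration_ms
    are discarded.
    """
    dt = 1.0 / sample_rate_hz
    speeds = [math.hypot(x_deg[i + 1] - x_deg[i], y_deg[i + 1] - y_deg[i]) / dt
              for i in range(len(x_deg) - 1)]
    saccades, start = [], None
    for i, speed in enumerate(speeds + [0.0]):  # sentinel closes a trailing run
        if speed > velocity_threshold and start is None:
            start = i
        elif speed <= velocity_threshold and start is not None:
            duration_ms = (i - start) * dt * 1000.0
            if duration_ms >= min_duration_ms:
                saccades.append((start, i))
            start = None
    return saccades
```

At a 60 Hz sampling rate each interval lasts about 16.7 ms, so the 35 ms minimum duration corresponds to at least three consecutive supra-threshold intervals in this sketch.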
fMRI data acquisition and analysis.
For 11 of the subjects, MRI scanning was performed on a Philips Intera 3 tesla scanner equipped with an eight-channel head coil at the John Tu and Thomas Yuen Center for Functional Onco Imaging (UCI). Anatomical images were acquired using a magnetization-prepared rapid-acquisition gradient echo (MPRAGE) T1-weighted sequence that yielded images with a 1 × 1 × 1 mm resolution. Whole-brain echoplanar functional images (EPIs) were acquired in 35 transverse slices [repetition time (TR), 2000 ms; echo time (TE), 30 ms; flip angle, 70°; image matrix, 64 × 64; field of view, 240 mm; slice thickness, 3 mm; 1 mm gap; SENSE factor, 1.5]. For the remaining subject, scanning was performed on a General Electric 3 T scanner equipped with an eight-channel head coil at the W. M. Keck Center for Functional MRI (UCSD). Anatomical images were acquired using an MPRAGE T1-weighted sequence that yielded images with a 1 × 1 × 1 mm resolution. Whole-brain EPIs were acquired in 33 transverse slices (TR, 2000 ms; TE, 30 ms; flip angle, 90°; image matrix, 64 × 64; field of view, 240 mm; slice thickness, 3 mm; 1 mm gap).
Data analysis was performed using BrainVoyager QX (version 1.91; Brain Innovation) and custom time series analysis routines written in Matlab. Data from the main experiment were collected in 8 or 10 runs per subject (i.e., either 4 or 5 runs per response modality, respectively), with each run lasting 230 s. EPI images were slice time corrected, motion corrected (both within and between scans), high-pass filtered (3 cycles/run) to remove low-frequency temporal components from the time series, and spatially smoothed with a 4 mm full width at half-maximum kernel. The motion parameters were used to estimate and remove motion-induced artifacts in the time series of each voxel using a general linear model (GLM). The time series from each voxel in each observer was then z-transformed on a run-by-run basis to ensure that the time series had a mean of zero. All anatomical and EPI images were transformed into the atlas space of Talairach and Tournoux (1988) before group analyses were performed.
Linear ballistic accumulator model.
Behavioral data were modeled using the linear ballistic accumulator (LBA), a simplified version of the ballistic accumulator (Brown and Heathcote, 2005), which was in turn a simplified version of the leaky competing accumulator of Usher and McClelland (2001). The simplifications included in the LBA allow it to keep the essential predictive qualities of Usher and McClelland's original model, but with much improved analytic tractability. The simplifying assumptions used in the LBA are similar to those in some other neurally inspired models of decision making, most notably the LATER model of Reddi and Carpenter (2000) and the random ray model of Reeves et al. (2005).
In the LBA, each of the four response alternatives (motion directions) is represented by an independent linear accumulator, illustrated in Figure 2. On each trial, each accumulator begins with a random activation level that is independently drawn from a uniform distribution on [0,A]. During decision making, activity in each accumulator increases linearly, and a response is triggered as soon as the first accumulator crosses a response threshold (b). The predicted response time is simply the time taken to reach the threshold, plus a constant offset time t0. The rate at which activation increases in each accumulator is termed the “drift rate” for that accumulator. These drift rates are drawn from independent normal distributions for the four accumulators. To simplify matters, we always assumed that these normal distributions share a common SD (s). The means of the normal distributions reflect the perceptual input: when the motion direction of the cued RDP closely matches the response assigned to a particular accumulator, that accumulator will have a large mean drift rate, and vice versa. We estimated a parameter for the mean drift rate of the accumulator corresponding to the correct response (dc) and assumed that the other three accumulators had equal mean drift rates (1 − dc)/3, keeping the total of all four drift rates fixed at 1. We also performed a more detailed analysis with different mean drift rates for the accumulators corresponding to incorrect responses. That analysis showed obvious differences (e.g., the mean drift rate for the response opposite the correct response was ∼10% smaller than the mean drift rates for responses that were orthogonal to the correct response) but all of the substantive results were unchanged.
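The race described above can be illustrated with a short simulation. This is a minimal sketch rather than the fitting code used in the study: the function names are hypothetical, the parameter values in the usage lines are the group estimates reported in Results for saccadic responses, and the handling of trials on which every sampled drift rate is non-positive (no response) is an assumption made here.

```python
import random

def simulate_lba_trial(mean_drifts, b, A, s, t0, rng):
    """Simulate one LBA trial; return (winning accumulator index, RT in s)."""
    best_index, best_time = None, float("inf")
    for index, mean_drift in enumerate(mean_drifts):
        start = rng.uniform(0.0, A)        # start point ~ U[0, A]
        drift = rng.gauss(mean_drift, s)   # drift rate ~ N(mean, s)
        if drift <= 0.0:
            continue                       # never reaches the threshold
        time_to_threshold = (b - start) / drift
        if time_to_threshold < best_time:
            best_index, best_time = index, time_to_threshold
    if best_index is None:
        return None, float("inf")          # assumption: treated as no response
    return best_index, best_time + t0

def simulate_condition(dc, b, A, s, t0, n_trials, rng):
    """Return (accuracy, mean RT); accumulator 0 is the correct response."""
    mean_drifts = [dc] + [(1.0 - dc) / 3.0] * 3   # mean drifts sum to 1
    trials = [simulate_lba_trial(mean_drifts, b, A, s, t0, rng)
              for _ in range(n_trials)]
    responded = [(w, rt) for w, rt in trials if w is not None]
    accuracy = sum(1 for w, _ in responded if w == 0) / len(responded)
    mean_rt = sum(rt for _, rt in responded) / len(responded)
    return accuracy, mean_rt

rng = random.Random(0)
easy = simulate_condition(0.739, 1.212, 0.849, 0.227, 0.053, 2000, rng)
hard = simulate_condition(0.517, 1.212, 0.849, 0.227, 0.053, 2000, rng)
```

With the drift rate as the only difference between conditions, the simulated easy trials are both faster and more accurate than the hard trials, which is the qualitative pattern the model is meant to capture.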
Brown and Heathcote (2008) showed that the LBA accommodates all the benchmark phenomena observed in choice RT paradigms. The LBA is also sufficiently simple in that there are closed form solutions for the densities of predicted RT distributions, making it easy to apply to data such as ours. These solutions were used to calculate likelihood values when fitting the model to data. We assessed the goodness-of-fit between the observed RT distributions and those predicted by the LBA model using the quantile maximum product statistic (Heathcote et al., 2002; Heathcote and Brown, 2004). The parameters of the model were adjusted to maximize the goodness of fit using the simplex algorithm (Nelder and Mead, 1965; Brown and Heathcote, 2008).
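For readers who want to see the closed-form solutions concretely, the sketch below implements the single-accumulator distribution and density functions given by Brown and Heathcote (2008), together with the density for one accumulator winning the race. The function names are hypothetical, and this is an illustration rather than the code used for the fits reported here; in particular, it evaluates raw densities and does not implement the quantile maximum product statistic.

```python
import math

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def lba_cdf(t, v, b, A, s):
    """P(a single accumulator has reached threshold b by decision time t)."""
    z_far = (b - t * v) / (t * s)        # start point at 0
    z_near = (b - A - t * v) / (t * s)   # start point at A
    return (1.0
            + (b - A - t * v) / A * normal_cdf(z_near)
            - (b - t * v) / A * normal_cdf(z_far)
            + (t * s) / A * normal_pdf(z_near)
            - (t * s) / A * normal_pdf(z_far))

def lba_pdf(t, v, b, A, s):
    """Density of the threshold-crossing time for a single accumulator."""
    z_far = (b - t * v) / (t * s)
    z_near = (b - A - t * v) / (t * s)
    return (v * (normal_cdf(z_far) - normal_cdf(z_near))
            + s * (normal_pdf(z_near) - normal_pdf(z_far))) / A

def lba_defective_density(t, winner, mean_drifts, b, A, s, t0=0.0):
    """Density of accumulator `winner` finishing first at response time t."""
    decision_time = t - t0
    if decision_time <= 0.0:
        return 0.0
    density = lba_pdf(decision_time, mean_drifts[winner], b, A, s)
    for index, v in enumerate(mean_drifts):
        if index != winner:
            # all other accumulators must not yet have reached threshold
            density *= 1.0 - lba_cdf(decision_time, v, b, A, s)
    return density
```

Summing the logarithm of `lba_defective_density` over trials, using the observed response as `winner`, gives a log-likelihood that an optimizer such as the simplex algorithm can maximize.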
Predicting blood oxygenation level-dependent responses based on the rate of evidence accumulation.
Assuming a different rate of evidence accumulation on easy and hard trials, we generated predictions of the blood oxygenation level-dependent (BOLD) response profile within regions involved in accumulating sensory evidence during perceptual decision making. The model is primarily motivated by the work of Shadlen and coworkers, who have shown that the firing rates of neurons in areas such as LIP monotonically increase until a response threshold is achieved and a response is executed. In our simulation, we assumed that the estimated drift rate on easy and hard trials is a proxy for neural activity (Fig. 3A); we then convolved this ramping activity profile with a canonical model of the BOLD response (a difference of two gamma functions; time to peak, 5 s; undershoot ratio, 6; time to undershoot peak, 15 s). Assuming that the firing rate of “accumulator” neurons in areas like LIP falls off after a response is made (Shadlen and Newsome, 2001; Roitman and Shadlen, 2002), the simulation predicts that lower drift rates will produce larger and temporally extended BOLD responses because the response is proportional to the integrated amount of neural activity during the decision process (Fig. 3A,B). However, this same effect—larger and temporally extended BOLD responses on hard trials—might also be expected in a region involved in maintaining selective attention to relevant aspects of the stimulus display during decision making, or in a region that more generally participates in sustaining a task set or an aroused state. Thus, it is not possible to distinguish areas involved in accumulating sensory evidence based solely on an increased response associated with perceptual difficulty. Fortunately, the simulation also predicts that the BOLD response should rise more slowly on hard trials compared with easy trials, because hard trials are associated with a more gradual ramping of neural activity (Fig. 3B, shaded region). 
In contrast, regions that are involved in general attentional processes should be uniformly engaged for the duration of the decision process, resulting in a similar main effect of perceptual difficulty without the accompanying shift in response latency. Two variants of such attentional accounts—along with the predicted BOLD response profiles—are shown for comparison in Figure 3C–F.
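The logic of the simulation can be reproduced with a few lines of code. The sketch below is an approximation under stated assumptions, not the simulation code used to produce Figure 3: neural activity is assumed to ramp from zero at the drift rate until it reaches a threshold and then to drop to zero; the attentional alternative is modeled as a boxcar of the same duration; the hemodynamic response is a difference of two gamma densities with unit scale whose shapes (6 and 16) place the modes at 5 and 15 s; and a single illustrative threshold of 1.212 is used. All function names are hypothetical.

```python
import math

def gamma_pdf(t, shape):
    """Gamma density with unit scale."""
    if t <= 0.0:
        return 0.0
    return t ** (shape - 1.0) * math.exp(-t) / math.gamma(shape)

def double_gamma_hrf(t, peak_s=5.0, undershoot_s=15.0, ratio=6.0):
    # For unit scale the mode of a gamma density is shape - 1.
    return gamma_pdf(t, peak_s + 1.0) - gamma_pdf(t, undershoot_s + 1.0) / ratio

def predicted_bold(drift, threshold, profile="ramp", dt=0.1, duration_s=20.0):
    """Convolve an assumed neural activity profile with the HRF."""
    decision_time = threshold / drift
    n = int(round(duration_s / dt))
    neural = []
    for i in range(n):
        t = i * dt
        if t >= decision_time:
            neural.append(0.0)          # activity falls off after the response
        elif profile == "ramp":
            neural.append(drift * t)    # evidence accumulation
        else:
            neural.append(1.0)          # sustained attention (boxcar)
    hrf = [double_gamma_hrf(i * dt) for i in range(n)]
    return [sum(neural[j] * hrf[i - j] for j in range(i + 1)) * dt
            for i in range(n)]

easy_ramp = predicted_bold(0.739, 1.212)
hard_ramp = predicted_bold(0.517, 1.212)
easy_box = predicted_bold(0.739, 1.212, profile="boxcar")
hard_box = predicted_bold(0.517, 1.212, profile="boxcar")
```

Under these assumptions both profiles yield a larger peak on hard trials, but only the ramp profile yields a shallower initial rise on hard trials; the two boxcar responses are identical until the easy-trial decision ends.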
Identifying supramodal mechanisms of information accumulation.
The main goal of the analysis was to use a two-step inferential process to define regions that (1) exhibit a larger and temporally extended response on hard trials compared with easy trials and (2) exhibit a temporally delayed BOLD response on hard compared with easy trials (Fig. 3A,B). These properties define regions that are likely performing evidence accumulation, rather than some other role in the decision process.
To identify regions of interest (ROIs) that respond more on hard compared with easy trials (step 1 in the analysis), the hemodynamic response function for each event type (easy saccade, hard saccade, easy manual, hard manual) was estimated using a GLM and a finite impulse response model that included separate regressors to estimate the BOLD response at the time of event onset and at each of the next eight time points after that event (times 0–16 s after stimulus) (Dale and Buckner, 1997). Using this approach, the rows of the GLM design matrix correspond to the time points in a scanning session and the columns correspond to the relative temporal position of each model regressor with respect to the time of event onset. Each of the nine time points was modeled with a “1” in the appropriate row and column of the GLM design matrix, yielding scaled fit coefficients (β weights) at each time point for each event type. Additional regressors-of-no-interest were included to model the mean response across the nine time points after incorrect trials, collapsed across trial type. A three-way repeated-measures ANOVA with response modality (saccade vs manual), perceptual difficulty (easy vs hard), and time (0–16 s after stimulus, in nine intervals) as factors was then performed on the estimated β weights; ROIs were defined based on the interaction between perceptual difficulty and time, collapsed across response modality. All statistical maps were thresholded at p < 0.05, after correcting for multiple comparisons using the false discovery rate algorithm implemented in BrainVoyager.
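The structure of the finite impulse response design matrix for a single event type can be sketched as follows. This is an illustration only: the function name `fir_design_matrix` is hypothetical, and assigning each event onset to the nearest scan is an assumption made here (trial onsets in this design do not fall on exact multiples of the TR).

```python
def fir_design_matrix(onsets_s, n_scans, tr_s=2.0, n_lags=9):
    """Build an FIR design matrix for one event type.

    Rows are scans (time points in the run); columns are post-onset lags
    0..n_lags-1, i.e., 0-16 s after the event in 2 s steps.
    """
    design = [[0] * n_lags for _ in range(n_scans)]
    for onset in onsets_s:
        onset_scan = int(round(onset / tr_s))  # assumption: nearest scan
        for lag in range(n_lags):
            row = onset_scan + lag
            if row < n_scans:
                design[row][lag] = 1
    return design

# Two events of the same type in a 115-scan run (230 s at TR = 2 s).
X = fir_design_matrix([0.0, 10.5], n_scans=115)
```

Concatenating one such block of nine columns per event type and solving the GLM by least squares yields one β weight per event type and time point, with no assumption about the shape of the hemodynamic response.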
Having identified ROIs in which the response is larger and temporally extended on hard compared with easy trials (step 1 of the analysis), we next tested for latency differences in the onset of the BOLD response in each ROI (step 2 in the analysis) by evaluating the interaction between perceptual difficulty and time across only the first two time points (0–2 s) of the event-related BOLD responses. A significant interaction across this temporal window indicates a differential slope during the rising phase of the responses, which is consistent with the accumulation of sensory evidence and inconsistent with the maintenance of sustained attention or general arousal.
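With only two levels of each factor, the difficulty-by-time interaction reduces to a one-sample test on a difference of differences, which makes the logic of the onset test easy to see. The sketch below relies on that standard equivalence (the interaction F with 1 and n − 1 degrees of freedom equals the squared t statistic); the function name is hypothetical and the code is not taken from the original analysis.

```python
import math

def onset_interaction_f(easy_t0, easy_t1, hard_t0, hard_t1):
    """Difficulty x time interaction over two time points.

    Each argument holds one value per subject (e.g., the estimated response
    at the first or second time point). Returns (F, (df1, df2)).
    """
    # Per subject: rise on easy trials minus rise on hard trials.
    contrasts = [(e1 - e0) - (h1 - h0)
                 for e0, e1, h0, h1 in zip(easy_t0, easy_t1, hard_t0, hard_t1)]
    n = len(contrasts)
    mean = sum(contrasts) / n
    variance = sum((c - mean) ** 2 for c in contrasts) / (n - 1)
    t_stat = mean / math.sqrt(variance / n)
    return t_stat ** 2, (1, n - 1)
```

A positive mean contrast indicates a steeper initial rise on easy than on hard trials, which is the pattern predicted for a region that accumulates evidence.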
Because both analytical steps involved evaluating the interaction between perceptual difficulty and time (albeit across different temporal windows), we performed a “leave-one-out” cross-validation procedure to ensure that the selection of voxels to include in a ROI during step 1 (larger and temporally extended response on hard compared with easy trials) did not bias the outcome of the statistical test in step 2 (difference in onset latency). Using this procedure, ROIs that exhibited a significant interaction between perceptual difficulty and time (from 0 to 16 s after stimulus) were identified using data from 11 of 12 subjects, and then the data from the remaining subject were extracted from each ROI and used for statistical tests (see Tables 3, 4) and for generating time series plots (see Figs. 6–8, 10). This procedure was repeated 12 times across all permutations of leaving one subject out, generating 12 sets of ROIs (see Table 2). In addition to protecting against bias when evaluating differences in response latency, this procedure also ensured that the time courses are not biased by the inclusion of noise that is favorable to our conclusions (Kriegeskorte et al., 2009; Vul and Kanwisher, 2009; Vul et al., 2009). All analyses of the BOLD response used this leave-one-out procedure, with the exception of the results reported from the human MT region (hMT+); however, hMT+ was identified using independent localizers, so bias of this sort was not an issue (see below).
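The bookkeeping of the leave-one-out procedure can be summarized in a short sketch. The callables `define_rois` and `evaluate` are hypothetical placeholders for the group-level ROI definition and the held-out-subject measurement, respectively; they stand in for analysis steps that are not reproduced here.

```python
def leave_one_subject_out(subject_ids):
    """Yield (training_subjects, held_out_subject) for every subject in turn."""
    for held_out in subject_ids:
        training = [s for s in subject_ids if s != held_out]
        yield training, held_out

def cross_validated_results(subject_ids, define_rois, evaluate):
    """Define ROIs on n - 1 subjects, then measure them in the held-out one."""
    results = {}
    for training, held_out in leave_one_subject_out(subject_ids):
        rois = define_rois(training)           # step 1: uses training data only
        results[held_out] = evaluate(rois, held_out)  # step 2: independent data
    return results
```

Because the held-out subject never contributes to the ROI definition, any statistic computed from `results` is independent of the voxel-selection criterion.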
hMT+ functional localizer.
To identify motion-responsive voxels in hMT+, we presented alternating 10 s trials of 100% coherent motion moving in one of four directions with 10 s trials in which the position of each dot was randomly replotted within the circular aperture on every video frame (resembling “snow” on a television set). The size of the stimulus aperture was the same as the one used in the main experiment. The subject's task was to press a button whenever the speed of the stimulus slowed briefly for 500 ms; these target events occurred at three randomly determined intervals in each 10 s trial. A GLM that contained a regressor corresponding to each stimulus type was used to identify hMT+ as the contiguous cluster of voxels lateral to the parietal–occipital sulcus that responded more during epochs of coherent motion than to the random-dot stimulus (single voxel threshold was set to p < 0.05, corrected for multiple comparisons using the false discovery rate algorithm implemented in BrainVoyager). Bilateral regions of hMT+ were identified in 10 of 12 subjects; only left hMT+ was identified in one of the remaining subjects, and only right hMT+ was identified in the other.
Figure 1 shows a schematic of the four-alternative forced-choice (4AFC) behavioral task subjects performed while in the scanner. This task was relatively easy when subjects were cued to report the direction of the high-coherence dot field, and relatively hard when they were cued to report the direction of the low-coherence dot field (termed easy and hard trials, respectively). Importantly, high and low coherence dot fields were simultaneously present on every trial, so the sensory properties of the display were fixed with respect to the total amount of coherent motion. This feature of the design was introduced to avoid simultaneously manipulating sensory factors (i.e., the motion coherence level) and perceptual difficulty. The subject was free to make a response at any point during the trial to indicate the direction of the currently relevant dot field; a saccadic response was required on one-half of the runs, and a manual button press response was required on the remaining runs. On saccadic-response runs, subjects were required to maintain central fixation until the response was executed; on manual-response runs, central fixation was maintained throughout the trial.
By requiring subjects to use different output response modalities, we were able to search for supramodal signals related to decision making; the observation of this type of signal would support the existence of modality-independent decision variables (Heekeren et al., 2006). To identify such regions, we used the LBA model (see Materials and Methods) to make inferences from the behavioral data about how manipulations of perceptual difficulty should influence the BOLD signal originating from areas that play a role in accumulating sensory evidence during decision making. Importantly, these modeling efforts also dissociated cortical regions involved in perceptual decision making from those more generally involved in attentional processes (i.e., general arousal, task demands, etc.).
Separate two-way repeated-measures ANOVAs with response modality (saccade vs manual) and perceptual difficulty (easy vs hard) were used to assess the accuracy and RT data collected during the scanning session (for a summary of the group data, see Table 1). Subjects were slightly more accurate when making manual compared with saccadic responses (F(1,11) = 6.2; p = 0.03), and there was a robust main effect of perceptual difficulty on accuracy, indicating that deciphering the direction of a low-coherence stimulus on a hard trial was more challenging than deciphering a high-coherence stimulus on an easy trial (F(1,11) = 69.4; p < 0.001). Finally, there was no interaction between response modality and discrimination difficulty, indicating that manipulations of perceptual difficulty had a similar influence on both saccade- and manual-response accuracy (F(1,11) = 0.13; p = 0.73).
RTs were shorter on saccade trials compared with manual trials, but this effect did not reach significance (F(1,11) = 2.7; p = 0.13). RTs were reliably shorter on easy trials compared with hard trials (F(1,11) = 64.6; p < 0.001), and there was no interaction between perceptual difficulty and response modality (F(1,11) = 0.14; p = 0.72).
Linear ballistic accumulator model of behavioral data
Before analyzing the BOLD fMRI data, we fit our behavioral data using the LBA model (Brown and Heathcote, 2008). The goal was to investigate how manipulations of perceptual difficulty and response modality affected RT distributions. For instance, RTs might be faster on easy compared with hard trials because of (1) a change in the rate with which sensory evidence from the display was accumulated (termed the “drift rate” in the model) or (2) a change in the amount of evidence required to make a decision (termed the “response threshold”) or (3) both. Analysis using a cognitive model allows us to tease apart these separate influences and to estimate parameters associated with each. By establishing which parameters changed with experimental manipulations, we can then estimate the pattern of BOLD responses expected from a region that is involved in accumulating sensory evidence during the decision process.
We report here fits to data averaged over participants, for simplicity of exposition. However, we repeated the same analyses separately for each individual participant and obtained broadly similar results (see below). The data were split into four within-subject conditions, defined by two factors: response modality (saccade vs manual) and stimulus coherence (easy vs hard). For simplicity, we collapsed across motion direction (upper left, upper right, lower right, lower left); however, we obtained qualitatively similar results if we included the four motion directions in the analysis to bring the total number of within-subject conditions to 16.
For a single decision condition, the LBA model as described above has five free parameters: t0, A, b, s, and dc, but it is not reasonable that all five of these should be estimated separately for all four conditions (easy vs hard, saccade vs manual). Instead, we fit the LBA model to the data 28 times, using different designs for constraining the parameters. Each design reflects a particular set of psychological assumptions regarding the way our experimental manipulations influenced cognitive processing. For example, the simplest model used a single set of five parameter estimates for all conditions, reflecting the assumption that the data were completely unaffected by the experimental manipulations. Other designs allowed drift rates (dc) to be different for easy versus hard stimuli, or for manual versus saccadic responses, and so on. We compared the adequacy of all possible designs using the Bayesian information criterion (BIC) (Schwarz, 1978). The best design, with BIC = 18,784.26, used constant values of s = 0.227/s (SD) and A = 0.849 (start point parameter) across all conditions. However, the design used higher mean drift rates on easy versus hard trials (dc,easy = 0.739/s, dc,hard = 0.517/s, with equal drift rates across modalities), and smaller nondecision times for saccadic responses (t0,s = 0.053 s) compared with manual responses (t0,m = 0.134 s), likely reflecting the modestly faster movement execution times for saccades. The model also assumed that the response threshold was slightly lower (that is, less cautious) for saccadic than manual responses (bsaccade = 1.212; bmanual = 1.278). Figure 4 illustrates the observed RT distributions (histograms) along with the predictions from the LBA model (solid lines). The top row shows distributions from high coherence conditions, and the bottom row shows distributions from low coherence conditions. The first two columns show data from trials with saccadic responses, and the next two show data from trials with manual responses.
The same y-axis scale was used for all histograms, so the heights of the distributions illustrate the relative probabilities of the responses (e.g., there are many more correct than incorrect responses for high coherence trials, so both the observed and predicted distributions are much taller for correct responses). The distributions predicted by the LBA are those corresponding to the best-BIC design described above.
We obtained similar results when we repeated the above analyses separately for each individual subject, although results were more variable because of the smaller sample sizes involved. Most importantly, the best-BIC model from the group data analysis performed well across the individual subjects. That model had the third best mean BIC score of the 28 models we tested; the model with the best mean BIC score was identical except with the added constraint that even t0 should be constant across all conditions (not a surprising outcome given that the BIC tends to favor simpler models for smaller sample sizes). Patterns observed in the mean parameter estimates across individual participants also closely matched those obtained from the group data: s = 0.182/s, A = 0.791, dc,easy = 0.688/s, dc,hard = 0.521/s, t0,s = 0.120 s, t0,m = 0.203 s, bsaccade = 0.993 and bmanual = 1.078.
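For reference, the BIC trades goodness of fit against the number of free parameters: BIC = k ln(n) − 2 ln(L), where k is the number of free parameters, n is the number of observations, and L is the maximized likelihood. The sketch below uses this standard definition; the function name and the numerical values in the usage lines are hypothetical and are not taken from the fits reported here.

```python
import math

def bic(log_likelihood, n_params, n_observations):
    """Bayesian information criterion; lower values indicate a better design."""
    return n_params * math.log(n_observations) - 2.0 * log_likelihood

# Hypothetical comparison of two designs fit to the same 1000 observations.
constrained = bic(log_likelihood=-100.0, n_params=8, n_observations=1000)
flexible = bic(log_likelihood=-99.0, n_params=12, n_observations=1000)
```

In this made-up example the more flexible design fits slightly better but is penalized for its four extra parameters, so the constrained design has the lower BIC.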
Supramodal mechanisms of information accumulation
To identify “candidate” regions that might be involved in perceptual decision making, we first performed a random effects analysis on data from 11 of 12 subjects to identify cortical areas exhibiting a two-way interaction between perceptual difficulty (easy vs hard) and time (0–16 s in 2 s intervals); this interaction was used to target areas that had a larger and temporally extended response on hard trials compared with easy trials (as in Fig. 3B). The time series of the response on hard and easy trials was then computed from each ROI in the 12th subject; this leave-one-subject-out procedure was then repeated so that each subject was left out in turn (the permutation analysis was performed to avoid biasing a subsequent evaluation of response onset latency) (see below and Materials and Methods). We collapsed across response modality because estimated drift rates did not vary between saccade and manual response conditions, and therefore our simulation predicted an identical BOLD response profile on hard compared with easy trials for both response modalities (Fig. 3A,B). This analysis identified regions in the right insula, bilateral intraparietal sulcus (IPS), bilateral FEFs, a region of medial frontal cortex (MFC), right inferior frontal gyrus (IFG) (just anterior to the insula), right superior frontal gyrus (SFG), and left temporal parietal junction (TPJ) (Fig. 5, Tables 2, 3). We also identified a ROI in left superior frontal sulcus (SFS) on 10 of 12 permutations of leaving one subject out; however, the interaction between perceptual difficulty and time did not reach significance in this region when evaluated in the left-out subjects (Table 3). In all of the regions identified, the interaction between perceptual difficulty and time was driven by a larger and temporally extended response on hard trials compared with easy trials, with the exception of the left SFS and the left TPJ (a description of these regions is presented in Discussion and Fig. 10).
Although a larger and temporally extended response on hard compared with easy trials is consistent with the accumulation of sensory evidence during perceptual decision making, similar effects of perceptual difficulty would also arise from areas involved in maintaining sustained attention or arousal during the decision process (Fig. 3C–F). Therefore, we next used data from only the “left-out” subjects to evaluate the latency of the BOLD response on easy and hard trials in each ROI; a delayed onset on hard trials is a distinguishing characteristic of a neural accumulator (Fig. 3B, shaded region). To test for latency differences, we performed a two-way repeated-measures ANOVA with perceptual difficulty and time as factors, but this time we only included data from the first two time points of the BOLD response (0–2 s after stimulus). Note that the use of a leave-one-out procedure ensures that this second interaction test is independent from the criterion used to define each ROI. A subregion of the right insula was the only area in which the onset of the BOLD response was delayed on hard trials for both response modalities (Table 4, Fig. 6), making it a candidate for computing a supramodal decision variable that might mediate activity in effector-specific regions of sensorimotor cortex. Moreover, three-way repeated-measures ANOVA with perceptual difficulty, time (0–2 s), and ROI as factors revealed that the difference in the slope of the BOLD response on hard compared with easy trials was larger in right insula than in any of the other regions (all values of F(1,11) > 5.0; all values of p < 0.05; excluding data from the left SFS and left TPJ). 
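When only two time points and two difficulty levels are entered, the difficulty-by-time interaction reduces to a test on each subject's difference in onset slopes, and the resulting F(1, n − 1) equals the square of a one-sample t statistic. The sketch below illustrates that equivalence with made-up numbers; the function name and the data are assumptions, not values from the study.

```python
import math
from statistics import mean, stdev

def onset_interaction_F(easy_t0, easy_t1, hard_t0, hard_t1):
    """F statistic and degrees of freedom for a 2 x 2 repeated-measures interaction.

    Each argument holds one BOLD value per subject. The interaction F equals
    the squared one-sample t statistic of the per-subject difference of
    differences (easy slope minus hard slope).
    """
    diffs = [(e1 - e0) - (h1 - h0)
             for e0, e1, h0, h1 in zip(easy_t0, easy_t1, hard_t0, hard_t1)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t * t, (1, n - 1)

# Hypothetical data for four subjects: the response rises by 2 s on easy
# trials but is still flat on hard trials (a delayed onset).
F, df = onset_interaction_F(
    easy_t0=[0.0, 0.0, 0.0, 0.0], easy_t1=[1.0, 2.0, 3.0, 2.0],
    hard_t0=[0.0, 0.0, 0.0, 0.0], hard_t1=[0.0, 0.0, 0.0, 0.0],
)
```

A positive mean difference indicates a shallower early slope on hard trials, which is the signature of a delayed onset.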
Finally, to further explore the relationship between perceptual difficulty and BOLD response latency in the right insula, we divided RTs into three bins (collapsed across easy and hard trials) and found that the slope of the BOLD response across the first two time points decreased systematically with increasing RT (two-way repeated-measures ANOVA with RT-bin and time as factors, F(2,22) = 5.11, p = 0.015) (supplemental Fig. 1, available at www.jneurosci.org as supplemental material).
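The binning step can be illustrated as follows. The text does not state how bin boundaries were chosen, so equal-count tertiles are an assumption here, as are the per-trial `(rt, onset_slope)` pairs and the function names.

```python
from statistics import mean

def tertile_bins(trials):
    """Split trials into three equal-count bins, ordered from fastest to slowest RT.

    Each trial is a hypothetical (rt_seconds, onset_slope) pair.
    """
    ordered = sorted(trials, key=lambda trial: trial[0])
    n = len(ordered)
    cuts = [0, n // 3, 2 * n // 3, n]
    return [ordered[cuts[k]:cuts[k + 1]] for k in range(3)]

def mean_slope_per_bin(trials):
    """Mean onset slope within each RT bin (fast, medium, slow)."""
    return [mean(slope for _, slope in rt_bin) for rt_bin in tertile_bins(trials)]

# Made-up trials in which slower responses have shallower onset slopes.
trials = [(0.6, 0.9), (0.7, 0.8), (0.9, 0.7), (1.1, 0.6), (1.3, 0.5),
          (1.5, 0.4), (1.8, 0.3), (2.0, 0.2), (2.4, 0.1)]
slopes = mean_slope_per_bin(trials)
```

In this toy data set the mean slope falls from the fastest to the slowest bin, mirroring the direction of the effect reported above.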
In addition, the onset of the BOLD response was delayed on hard trials in bilateral regions of IPS on saccadic response trials (but not on manual response trials), as predicted by previous single-unit recording studies (Shadlen and Newsome, 2001; Roitman and Shadlen, 2002) (differential effect of perceptual difficulty over the first two time points, F(1,11) = 12.3, p = 0.005, collapsed across right and left IPS) (for data from each hemisphere, see Table 4 and Fig. 7). However, no effect of perceptual difficulty on response latency was observed in the FEF on saccade response trials (Table 4, Fig. 7).
Modality-dependent accumulator region for manual responses
Although the BOLD response in IPS was temporally delayed on hard saccade trials (Table 4, Fig. 7), no corresponding modality-dependent accumulator region was found on manual response trials. Therefore, based on previous reports (Meier et al., 2008), we used a two-way ANOVA and a leave-one-out procedure to identify a cluster of voxels in the superior aspect of the left central sulcus that responded more robustly on manual response trials than on saccade response trials (interaction between response modality and time, F(8,88) = 14.2, p < 0.001; mean Talairach coordinates, −35, −23, 54; ±1 SEM across permutations, 0.6, 0.8, 0.5; mean volume, 5.2 ml; ±1 SD, 0.637 ml) (Fig. 8). This region showed a larger and temporally extended response on hard manual trials compared with easy manual trials (F(8,88) = 2.9; p < 0.01) (Fig. 8). Moreover, the onset of the BOLD response was delayed on hard trials when manual responses were required, meeting the second requirement for a modality-specific neural accumulator (differential effect of perceptual difficulty across the first two time points when only considering manual response trials, F(1,11) = 8.0, p < 0.025). No such effects were found on saccade response trials (interaction between perceptual difficulty and time across all time points: F(8,88) = 1.0, NS; interaction between perceptual difficulty and time across only the first two time points: F(1,11) = 1.1, NS).
Activation profile in motion-selective area hMT+
Single-unit recording studies have demonstrated that neurons within stimulus-specific regions in early visual cortex, such as area MT for motion, signal the amount of sensory evidence present in the visual field (Newsome et al., 1989; Salzman et al., 1992; Britten et al., 1993, 1996; Ditterich et al., 2003). However, such regions do not integrate sensory evidence over time, suggesting that they primarily function to provide input to sensorimotor regions that are more directly involved in decision making (Roitman and Shadlen, 2002; Romo and Salinas, 2003; Huk and Shadlen, 2005; Hanks et al., 2006; Gold and Shadlen, 2007). If this account applies to hMT+ as well, then we predict a larger and temporally extended BOLD response on hard compared with easy trials because the sensory evidence on hard trials must be represented for a longer period of time. However, no shift in the latency of activation onset is predicted because the underlying neural activity should be relatively constant for the duration of the stimulus presentation epoch (as opposed to ramping activity, as shown in Fig. 3A). We tested this prediction by examining the BOLD activation profile within independently localized regions of hMT+ (see Materials and Methods). There was a significant interaction between perceptual difficulty and time (from 0 to 16 s), indicating a larger and temporally extended response on hard trials (F(8,88) = 3.8; p < 0.005; collapsed across right and left MT). However, there was no interaction between perceptual difficulty and time over the first two time points of the responses, suggesting that onset latency was similar on hard and easy trials (F(1,11) = 0.2, NS). These results are consistent with the notion that hMT+ primarily plays a role in relaying information about sensory properties of the display to higher order accumulation centers (for a graphical depiction of the BOLD time courses from left and right hMT+, see supplemental Fig. 2, available at www.jneurosci.org as supplemental material).
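The prediction for a region with sustained, non-ramping activity can be illustrated with a toy simulation. The sketch below convolves a constant (boxcar) neural signal with a single-gamma hemodynamic response; the HRF shape, the 1 s and 2 s durations, and all names are assumptions chosen for illustration and are not taken from the simulations in the study.

```python
import math

DT, N = 0.1, 200   # 0.1 s steps, 20 s of simulated time (assumed)

def hrf(t):
    """Single-gamma hemodynamic response (shape 6, scale 1 s); a simplification."""
    return t ** 5 * math.exp(-t) / math.gamma(6) if t > 0 else 0.0

KERNEL = [hrf(i * DT) for i in range(N)]

def bold(neural):
    """Discrete convolution of a neural time course with the HRF kernel."""
    return [DT * sum(neural[j] * KERNEL[i - j] for j in range(i + 1))
            for i in range(N)]

def boxcar(duration, amplitude=1.0):
    """Constant activity from stimulus onset until `duration` seconds."""
    end = round(duration / DT)
    return [amplitude if i < end else 0.0 for i in range(N)]

# Hypothetical stimulus-processing durations: 1 s (easy) and 2 s (hard).
easy, hard = bold(boxcar(1.0)), bold(boxcar(2.0))
```

Under these assumptions the two simulated responses are identical during the first second and the hard response never lags the easy one, yet the hard response reaches a higher peak: a larger, extended response without an onset delay.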
Discussion
Here, we examined the neural mechanisms of perceptual decision making using a simple 2AFC task that controlled for sensory factors and a model that allowed us to predict the BOLD activation profile expected from cortical areas that accumulate sensory evidence (Figs. 2, 3). Although the BOLD response in many regions increased with increasing perceptual difficulty, only a subset of these regions exhibited the latency offset predicted for a region involved in accumulating sensory evidence. Of these, only the right insula displayed this characteristic response profile for both tested response modalities. This finding raises the possibility that perceptual decisions are not solely computed by the same neural mechanisms that mediate the ultimate motor response. Instead, the ramping-up of neural activity in sensorimotor regions such as the LIP may also reflect input from upstream regions that compute an abstract decision variable. Note that this account still allows for a causal influence of sensorimotor areas on decision making (Romo et al., 2002; Hanks et al., 2006). However, such regions may not be the actual site of the decision process, but instead might serve as “relay stations” that translate abstract decision signals into an appropriate motor response (Table 4, Figs. 7, 8). As would be the case with any correlational method, the evidence we provide here in support of this hypothesis is tentative; additional work using converging methodologies will be required to clarify the role of the modality-independent signals that we observed in the right insula.
An alternative account of the temporally delayed onset of the BOLD response in the right insula holds that neural activity might briefly pulse (an impulse response) at a slightly later time on hard compared with easy trials, perhaps signaling the termination of the decision process. For example, de Lafuente and Romo (2005) demonstrated that neurons in the medial premotor cortex signal the production of a “yes” response in an all-or-none manner, such that the amplitude of the response does not correlate with the difficulty of the perceptual decision (in the context of a detection task). However, our data are inconsistent with this type of all-or-none termination signal: if the two temporally shifted impulse responses were equal in amplitude (Fig. 9A,B), then we should not see the larger and temporally extended BOLD response on hard compared with easy trials that we in fact observe (Table 3, Figs. 6–8). If, on the other hand, the impulse response on hard trials is both temporally delayed and larger (Fig. 9C,D), then we would expect to see a BOLD response pattern that is similar to the ramping accumulator model shown in Figure 3, A and B. This second hypothesis is not suggested by any data that we are aware of, but one ad hoc account is that the amplitude of the impulse response is somehow tied to the height of the decision boundary. The LBA model we used, however, estimated that the decision boundary was similar on easy and hard trials (i.e., primarily drift rate differed), arguing against this account. In any case, the pattern of activity depicted in Figure 9, C and D, also implies an important functional role for the right insula because it indicates sensitivity to both the difficulty and the timing of a perceptual decision.
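The contrast between these accounts can be made concrete with a toy simulation. The sketch below is not the simulation used in the study: the single-gamma HRF, the 1 s and 2 s decision times, the pulse width, and all function names are assumptions chosen only to illustrate the qualitative predictions.

```python
import math

DT, N = 0.1, 200   # 0.1 s steps, 20 s of simulated time (assumed)

def hrf(t):
    """Single-gamma hemodynamic response (shape 6, scale 1 s); a simplification."""
    return t ** 5 * math.exp(-t) / math.gamma(6) if t > 0 else 0.0

KERNEL = [hrf(i * DT) for i in range(N)]

def bold(neural):
    """Discrete convolution of a neural time course with the HRF kernel."""
    return [DT * sum(neural[j] * KERNEL[i - j] for j in range(i + 1))
            for i in range(N)]

def ramp(decision_time, bound=1.0):
    """Linear rise to `bound` at `decision_time`, then reset to zero."""
    end = round(decision_time / DT)
    return [bound * i / end if i <= end else 0.0 for i in range(N)]

def impulse(at_time, amplitude=1.0, width=0.2):
    """Brief pulse of fixed width starting at `at_time`."""
    start = round(at_time / DT)
    stop = start + round(width / DT)
    return [amplitude if start <= i < stop else 0.0 for i in range(N)]

def peak(series):
    """Return (peak amplitude, peak time in seconds)."""
    value = max(series)
    return value, series.index(value) * DT

EASY_T, HARD_T = 1.0, 2.0   # hypothetical decision times in seconds

ramp_easy, ramp_hard = bold(ramp(EASY_T)), bold(ramp(HARD_T))
pulse_easy, pulse_hard = bold(impulse(EASY_T)), bold(impulse(HARD_T))
big_pulse_hard = bold(impulse(HARD_T, amplitude=2.0))
```

With these settings, the equal-amplitude pulses yield BOLD responses of identical height that differ only by a 1 s shift, whereas both the ramping accumulator and the delayed, larger pulse yield a response on hard trials that starts lower and peaks higher than on easy trials.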
In contrast to the predictions generated by our simulation (Fig. 3), at least two previous studies asserted that the magnitude of the BOLD response should be higher on easy trials compared with hard trials because more sensory evidence is present on easy trials. Based on this criterion, Heekeren et al. (2004, 2006) highlighted a region of posterior left SFS/DLPFC as being important for perceptual decision making. Although we identified a region of the left SFS in 10 of 12 permutations of leaving one subject out that tended to respond more on easy than on hard trials, the effect was not significant (Table 3, Fig. 10). In addition, we also identified a region of the left TPJ—similar to an inferior parietal lobe activation reported by Heekeren et al. (2006)—that responded more on easy trials compared with hard trials (Table 3, Fig. 10). Interestingly, both the left SFS and the left TPJ showed negative response profiles in our experiment (compared with the fixation baseline), with relatively smaller negative deflections on easy compared with hard trials (Fig. 10). Thus, in our study at least, the left SFS and TPJ regions do not appear to follow an activation profile that is consistent with the active accumulation of sensory evidence (i.e., the pattern shown in Fig. 3B). A similar pattern of deactivations was also reported by Tosoni et al. (2008), and we (along with Tosoni et al.) speculate that these regions are functionally related to the “default” network that is actively suppressed during the performance of a demanding task; this suppression should be longer on hard trials because subjects spend more time trying to discriminate the direction of the target (Greicius et al., 2003; Shulman et al., 2003; Raichle and Snyder, 2007; Buckner et al., 2008) (for additional discussion of this point, see also Tosoni et al., 2008).
Tosoni et al. (2008) also proposed that activation levels in putative accumulator areas should increase with increasing sensory evidence, contrary to our model simulations. In their study, the primary focus was on identifying regions of parietal and frontal cortex that mediate modality-dependent responses (saccade and pointing movements) to arbitrary images (faces and houses); they found that modality-sensitive subregions of parietal cortex responded more strongly on easy trials. At first glance, this observation appears at odds with the data we present here, which show larger responses on hard trials when the sensory evidence is weaker. However, because Tosoni et al. (2008) wanted to separate “sensory” from “motor” contributions to the BOLD signal, they had subjects delay their decision for 10.5 s after the presentation of the stimulus while awaiting a “go” signal. Since this delay interval is longer than required by the decision process, it is possible that subjects were storing a modality-dependent representation of their planned response for much of the trial. Given that the computation of the response occurs more quickly when ample sensory evidence is present, the process of storing the prepared motor response for a longer period of time might have contributed to increases in activation on easy trials. In contrast, our subjects were required to make speeded perceptual decisions and thus had little time to engage in cognitive processes not directly related to perceptual decision making. Clearly, more work needs to be done to resolve this issue, perhaps by combining the methods of Tosoni et al. for precisely mapping manual- and saccade-sensitive regions with a task that constrains the cognitive operations subjects engage in during the “decision-making” stage of the task.
Even though we focus on the role of right insula in perceptual decision making, we cannot rule out the possibility that other regions are also involved in accumulating sensory evidence across multiple response modalities. Indeed, the interpretation of activation patterns in other areas is difficult: a larger response on hard compared with easy trials in the absence of a latency shift is equally consistent with a role in general attentional control or a lack of statistical sensitivity to detect a true difference in onset latencies. Therefore, we withhold speculation about other regions in anticipation of future studies that will selectively target candidate areas with converging methodologies to further delineate their role in perceptual decision making.
Similar regions of insula have been previously implicated in different aspects of perceptual decision making. Trial-by-trial fluctuations in the left insula predict decisions about near-threshold fearful and nonfearful faces (Pessoa and Padmala, 2005, 2007), even when the sensory evidence is ambiguous and thus equated (Thielscher and Pessoa, 2007). Activation levels in bilateral regions of the anterior insula scale with the amount of differential sensory evidence during vibrotactile decision making (Pleger et al., 2006), increase at the moment of a perceptual decision in an image recognition task (Ploran et al., 2007), and correlate with a non-monotonic RT function during an auditory discrimination task, implying a role in the decision process as opposed to sensory processing (Binder et al., 2004). Finally, activation levels in insular regions also scale with the amount of “uncertainty” a subject experiences while discriminating a stimulus, suggesting a role in the process of comparing sensory evidence to a decision criterion (Grinband et al., 2006).
In contrast, other investigators have suggested that insular regions participate in attentional control precisely because more activation is observed on hard compared with easy tasks (Heekeren et al., 2006, 2008; Philiastides et al., 2006; Philiastides and Sajda, 2007; Tosoni et al., 2008). However, our simulation (Fig. 3A,B) predicts a qualitatively distinct activation profile in decision-making areas compared with “attention” areas, and the profile we observe in the right insula is more consistent with the former. We therefore argue that the present results support the hypothesis that the right insula is involved in coding an abstract decision variable capable of guiding the buildup of activity in effector-specific regions of sensorimotor cortex.
Ultimately, the extent to which regions outside of sensorimotor cortex participate directly in computing perceptual decisions may turn out to depend on the amount of training and the complexity of the task. For example, most single-unit recording studies employ 2AFC paradigms that involve highly stereotyped stimulus–response pairings that are practiced many thousands of times over many months [but see Churchland et al. (2008) for a more complex 4AFC task]. In these tasks, making a perceptual decision is tantamount to selecting a motor response, so it is perhaps not surprising that the empirical evidence is consistent with the hypothesis that perceptual decisions are directly computed by sensorimotor neurons. However, in many everyday situations, a combination of motor responses must be issued in response to a single stimulus. For example, when driving, a red light should motivate both a saccade toward the car immediately in front of you as well as a signal to depress the brake pedal. If perceptual decisions are solely computed and executed by the same mechanisms that mediate the motor response(s), then multiple systems—one for each response modality—must accumulate sensory evidence, translate the evidence into a decision based on current behavioral goals, and then generate two distinct motor responses. An alternative account, and one that is consistent with the present results, holds that a single modality-independent representation of the decision variable is computed and that this representation can then be used to efficiently guide multiple motor responses.
This work was supported by start-up funds provided by University of California, Irvine, and University of California, San Diego (J.T.S.); National Institutes of Health Grant R21-MH083902 (J.T.S.); and Australian Research Council Discovery Project DP0878858 (S.B.). We thank N. Kriegeskorte and C. I. Baker for useful discussions.
Correspondence should be addressed to John T. Serences, Department of Psychology, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0109.
- Binder et al., 2004.
- Brainard, 1997.
- Britten et al., 1993.
- Britten et al., 1996.
- Brown and Heathcote, 2005.
- Brown and Heathcote, 2008.
- Buckner et al., 2008.
- Churchland et al., 2008.
- Dale and Buckner, 1997.
- de Lafuente and Romo, 2005.
- Ditterich et al., 2003.
- Gitelman, 2002.
- Gold and Shadlen, 2000.
- Gold and Shadlen, 2001.
- Gold and Shadlen, 2003.
- Gold and Shadlen, 2007.
- Greicius et al., 2003.
- Grinband et al., 2006.
- Hanes and Schall, 1996.
- Hanks et al., 2006.
- Heathcote and Brown, 2004.
- Heathcote et al., 2002.
- Heekeren et al., 2004.
- Heekeren et al., 2006.
- Heekeren et al., 2008.
- Horwitz et al., 2004.
- Huk and Shadlen, 2005.
- Kiani et al., 2008.
- Kim and Shadlen, 1999.
- Kriegeskorte et al., 2009.
- Mazurek et al., 2003.
- Meier et al., 2008.
- Nelder, 1965.
- Newsome et al., 1989.
- Pelli et al., 1997.
- Pessoa and Padmala, 2005.
- Pessoa and Padmala, 2007.
- Philiastides and Sajda, 2007.
- Philiastides et al., 2006.
- Pleger et al., 2006.
- Ploran et al., 2007.
- Preuschhof et al., 2006.
- Raichle and Snyder, 2007.
- Reddi and Carpenter, 2000.
- Reeves et al., 2005.
- Roitman and Shadlen, 2002.
- Romo and Salinas, 1999.
- Romo and Salinas, 2003.
- Romo et al., 2002.
- Salzman and Newsome, 1994.
- Salzman et al., 1992.
- Schall, 2001.
- Schwarz, 1978.
- Shadlen and Newsome, 2001.
- Shadlen et al., 1996.
- Shulman et al., 2003.
- Talairach and Tournoux, 1988.
- Tegenthoff et al., 2005.
- Thielscher and Pessoa, 2007.
- Tosoni et al., 2008.
- Usher and McClelland, 2001.
- Vul and Kanwisher, 2009.
- Vul et al., 2009.