Decision making can be conceptualized as the culmination of an integrative process in which evidence supporting different response options accumulates gradually over time. We used functional magnetic resonance imaging to investigate brain activity leading up to and during decisions about perceptual object identity. Pictures were revealed gradually and subjects signaled the time of recognition (TR) with a button press. We examined the time course of TR-dependent activity to determine how brain regions tracked the timing of recognition. In several occipital regions, activity increased primarily as stimulus information increased, suggesting a role in lower-level sensory processing. In inferior temporal, frontal, and parietal regions, a gradual buildup in activity peaking in correspondence with TR suggested that these regions participated in the accumulation of evidence supporting object identity. In medial frontal cortex, anterior insula/frontal operculum, and thalamus, activity remained near baseline until TR, suggesting a relation to the moment of recognition or the decision itself. The findings dissociate neural processes that function in concert during perceptual recognition decisions.
Decision making involves the formation of a set of response options, the gathering and synthesis of information, and the selection of a response. Accumulator models (Audley and Pike, 1965; Link and Heath, 1975; Ratcliff, 1978; Ratcliff and McKoon, 1982; Smith and Vickers, 1988; Usher and McClelland, 2001) describe evidence gathering as a gradual process in which evidence supporting different choices accrues over time. Incoming information is evaluated and assigned to a response option, and a decision is made when the evidence exceeds a response threshold. Accumulator models have proven highly effective in describing human performance in recognition memory, lexical decisions, economic decisions, and sensory discriminations (Ratcliff, 1978; Busemeyer, 1985; Bundesen, 1990; Nosofsky and Palmeri, 1997; Ratcliff and Rouder, 1998; Logan and Gordon, 2001; Ratcliff et al., 2004; Gold and Shadlen, 2007). Such models also appear to describe the evolution of neural spiking rates in macaque frontal eye fields during initiation of eye movements (Hanes and Schall, 1996), the lateral intraparietal area (Shadlen and Newsome, 2001) and dorsolateral prefrontal cortex (dlPFC) (Kim and Shadlen, 1999) during motion detection, and superior colliculus during distance judgments (Ratcliff et al., 2003). Importantly, the neurophysiological data provide a link between empirically derived patterns of activity and evidence accumulation. The relationship between neuronal activity and choice outcome indicates that neural processors compute behavioral decision variables by integrating afferent inputs over time (Hanes and Schall, 1996; Kim and Shadlen, 1999; Platt and Glimcher, 1999; Shadlen and Newsome, 2001; Cook and Maunsell, 2002; Roitman and Shadlen, 2002; Hanks et al., 2006; Gold and Shadlen, 2007).
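As an illustration of the accumulator idea described above, the following Python sketch sums noisy evidence samples until a response threshold is crossed. It is a generic single-accumulator random walk, not an implementation of any of the cited models; the function names and the drift, noise, and threshold values are assumptions chosen only for illustration.

```python
import random


def simulate_accumulator(drift, noise_sd, threshold, max_steps=10_000, seed=None):
    """Sum noisy evidence samples until the total reaches the threshold.

    Returns the number of samples needed (the decision time in steps),
    or None if the threshold is not reached within max_steps.
    """
    rng = random.Random(seed)
    evidence = 0.0
    for step in range(1, max_steps + 1):
        evidence += rng.gauss(drift, noise_sd)  # one new piece of evidence
        if evidence >= threshold:               # decision is made
            return step
    return None


def mean_decision_time(drift, n_trials=500):
    """Average decision time over simulated trials (hypothetical parameters)."""
    times = [
        simulate_accumulator(drift, noise_sd=1.0, threshold=20.0, seed=s)
        for s in range(n_trials)
    ]
    times = [t for t in times if t is not None]
    return sum(times) / len(times)


weak_evidence = mean_decision_time(drift=0.2)    # slow accumulation
strong_evidence = mean_decision_time(drift=1.0)  # fast accumulation
print(weak_evidence, strong_evidence)
```

In this toy setting, weaker evidence per sample leads to later threshold crossings, which is the qualitative behavior that motivates examining activity as a function of the time of recognition.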
Although such work has been conducted extensively in nonhuman primates, very little has been done to identify the neural mechanisms of evidence accumulation in human decision making (Heekeren et al., 2004). We used functional magnetic resonance imaging (fMRI) to study accumulation and decision making in an extended perceptual recognition task (James et al., 2000; Carlson et al., 2006). Perceptual recognition can occur rapidly. To ensure that the timing of recognition responses varied sufficiently to overcome the limited temporal resolution of fMRI, noise-occluded pictures were revealed gradually over 16 s. Subjects indicated when they recognized the stimulus' identity. This response provided an estimate of the time of recognition (TR). A second response at the end of the trial, when the stimulus was fully revealed, verified the accuracy (VoA) of earlier recognition. Because we were primarily interested in studying decision processes, VoA served as one method to control for motor activity occurring at TR.
In an accumulator model framework, perceptual recognition follows a period of evidence gathering and maintenance. To identify neural substrates of these time-dependent processes, we measured blood oxygen level-dependent (BOLD) responses during, and in the period leading up to, the time of recognition. We hypothesized that fMRI activity related to accumulation and recognition processes would vary as a function of TR, with longer TRs associated with later BOLD responses. We present data from two experiments demonstrating that perceptual recognition is accomplished in dissociable sets of brain regions that process sensory inputs, accrue evidence, and signal recognition.
Materials and Methods
Two experiments were conducted on two separate groups of participants. Experiments 1 (Exp1) and 2 (Exp2) were nearly identical, except that we used within-trial jitter in Exp1 and added trials in Exp2. Experiment 1 was designed to identify regions of interest (ROIs) that contribute to perceptual recognition. Experiment 2 was designed to evaluate the time course of TR-dependent activity in those regions. We chose this two-experiment approach for several reasons. First, comparing TR and VoA events in Exp1 reduced the probability that ROIs would be related to motor processing. Second, because ROIs were defined using data from one set of subjects (Exp1) and applied to data from a different set of subjects (Exp2), the ROI-based time course data from Exp2 were not biased to show specific effects. Third, whereas within-trial jitter was needed in Exp1 to separate BOLD responses associated with recognition and VoA events, in Exp2 we eliminated the within-trial jitter to reduce the complexity of the time courses. In all cases below, anatomical labels and Brodmann's areas (BAs) are approximate.
Participants were 31 right-handed, native English speakers with normal or corrected-to-normal vision (18 female, age 19–29 years). Four participants were excluded from analysis because of excessive movement, and two were excluded because of data loss. Of the remaining 25 participants, 12 were run in Exp1 and 13 in Exp2. Some runs from four of the participants were excluded because of excessive movement. Informed consent was obtained in a manner approved by the Institutional Review Board of the University of Pittsburgh, and participants received $75.
The pictures were 233 grayscale images (Rossion and Pourtois, 2004) reformatted into a standard 284 × 284 pixel image with a white background. Five images were reserved for the practice session; five lists of 12 (Exp 1) or 20 (Exp 2) pictures were randomly selected out of the remaining 228 pictures for task presentation. Each subject received his or her own randomly selected list set, ensuring that no participant received exactly the same set and order of images as another. The displayed images subtended an average of 10.3° of the visual field and were presented against a black background.
Testing consisted of five runs of a perceptual recognition task using picture stimuli, with 12 and 20 trials per run in Exp1 and Exp2, respectively. Runs were randomly intermixed with five runs from a related task, using word stimuli, which is not included in the current report. In each trial, stimulus revelation occurred over eight discrete steps, each corresponding with acquisition of a whole-brain image (Fig. 1a). In Exp1, the steps of revelation were randomly intermixed with six 2 s jitter periods, resulting in an average step duration of 3.5 s. Subjectively, jitter produced a pause of 2, 4, or 6 s between steps of revelation. In Exp2, the revelation steps occurred every 2 s without within-trial jitter. Between-trial jitter of 2, 4, or 6 s [mean intertrial interval (ITI) = 4 s] was included in both experiments to allow event-related analysis of individual trials.
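The timing figures above follow from simple arithmetic, sketched here in Python. The constant names are hypothetical, and the mean ITI calculation assumes the three ITI values were equally likely, which the text does not state explicitly.

```python
STEP_S = 2.0        # one revelation step = one whole-brain image (2 s)
N_STEPS = 8         # revelation steps per trial
N_JITTER_EXP1 = 6   # 2 s jitter periods intermixed within each Exp1 trial

# Exp1: eight steps plus six jitter periods, all 2 s long.
exp1_trial_s = (N_STEPS + N_JITTER_EXP1) * STEP_S   # 28 s per trial
exp1_mean_step_s = exp1_trial_s / N_STEPS           # 3.5 s per step on average

# Exp2: no within-trial jitter.
exp2_trial_s = N_STEPS * STEP_S                     # 16 s per trial

# Between-trial jitter (assumed equally likely values).
itis_s = (2.0, 4.0, 6.0)
mean_iti_s = sum(itis_s) / len(itis_s)              # 4 s

print(exp1_trial_s, exp1_mean_step_s, exp2_trial_s, mean_iti_s)
```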
At trial onset, pictures were covered by a black mask. The mask partially dissolved at each successive 2 s interval (i.e., revelation step) until pictures were completely revealed (Fig. 1a). Participants were instructed to press a button when they could identify the picture with a reasonable degree of confidence (TR). Neither speed nor accuracy was emphasized in the TR response, and participants were not specifically encouraged to respond before full revelation. When stimuli were fully revealed, participants pressed the same button again only if their earlier recognition had been correct (VoA). We chose gradual stimulus revelation over other unmasking procedures (e.g., procedures in which mask degradation remains constant but the areas revealed change from step to step) because we could readily map the quantity of stimulus input onto neural activity as a linear increase across the trial (Carlson et al., 2006). To help factor out basic lateralized motor signals in group analyses, response hand was counterbalanced across participants (Thielscher and Pessoa, 2007). PsyScope X was used for stimulus presentation and data collection (Cohen et al., 1993) (http://psy.ck.sissa.it).
Images were obtained using a Siemens Allegra 3T scanner. Visual stimuli were generated on an Apple iBook G4 with PsyScope X and projected onto a screen positioned at the head of the magnet bore using a Sharp PG-M20X digital multimedia projector via a mirror attached to the head coil. Earplugs dampened scanner noise. Responses were made using a fiber optic button stick connected to the computer via an interface unit (Current Designs, Philadelphia, PA).
Anatomic images were collected using a magnetization-prepared rapid-acquisition gradient echo sequence [repetition time (TR) = 1540 ms, echo time (TE) = 3.04 ms, flip angle = 8°, inversion time = 800 ms, delay time = 0 ms]. A series of whole-brain spin-echo echo-planar T2*-weighted functional images sensitive to the BOLD contrast (TR = 2000 ms, TE = 30 ms, flip angle = 79°, 2.2 × 2.2 in-plane resolution) were collected during testing. The first three image acquisitions were discarded to allow net magnetization to reach steady state.
Imaging data from each subject were preprocessed to remove noise and artifacts, including: (1) correction for movement within and across runs using a rigid-body rotation and translation algorithm (Snyder, 1996), (2) whole-brain normalization to a common mode of 1000 to allow for comparisons across subjects (Ojemann et al., 1997), and (3) temporal realignment using sinc interpolation of all slices to the temporal midpoint of the first slice, accounting for differences in the acquisition time of each individual slice. Functional data were then resampled into 2 mm isotropic voxels and transformed into stereotaxic atlas space (Talairach and Tournoux, 1988). Atlas registration involved aligning each subject's T1-weighted image to a custom atlas-transformed (Lancaster et al., 1995) target T1-weighted template using a series of affine transforms (Michelon et al., 2003; Fox et al., 2005).
Preprocessed data were analyzed at the voxel level using a general linear model (GLM) approach (Friston et al., 1994; Miezin et al., 2000). Details of this procedure are described by Ollinger et al. (2001). Briefly, the model treats the data at each time point $i$ (in each voxel) as the sum of all effects present at that time point. Effects can be produced by events in the model ($b$) and by error ($e$). Thus, the equation for a given time point $i$ is $y_i = a_{i,0} b_0 + a_{i,1} b_1 + \dots + a_{i,M-1} b_{M-1} + e_i$, where $a_{i,m}$ is a coefficient relating effect $m$ to the data at time $i$, and $M$ is the number of modeled effects. In matrix form, this becomes $Y = Ab + e$, where $A$ is the design matrix relating event types with time, $b$ is a vector of events being modeled, and $e$ is a vector of noise. Estimates of the time course of effects were derived from the model for each response category by coding time points as a set of δ functions immediately after onset of the coded event (Ollinger et al., 2001). The numbers of time points in Exp1 and Exp2 were 9 and 16, respectively. Over each run, a trend term accounted for linear changes in signal, and a constant term modeled the baseline signal. Event-related effects are described in terms of percentage signal change, defined as signal magnitude divided by a constant term. This approach makes no assumptions about the shape of the BOLD response but does assume that all events included in a category (e.g., accurate TR7) are associated with the same BOLD response (Ollinger et al., 2001). Thus, we could extract time courses without placing constraints on their shape. Image processing and analyses were performed using in-house software written in IDL (Research Systems, Boulder, CO).
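To make the δ-function coding concrete, the following Python sketch builds a small design matrix of the kind described above: one column per event type and post-onset time point, plus a linear trend and a constant. It is a minimal illustration under stated assumptions, not the in-house IDL software; the function names, the toy onsets, and the scaling of the trend column are hypothetical, and estimation of b (e.g., by least squares) is omitted.

```python
def delta_design_matrix(n_scans, onsets_by_event, n_timepoints):
    """Build a delta-function design matrix A (list of rows).

    n_scans: number of acquired frames in the run.
    onsets_by_event: dict mapping event name -> list of onset frame indices.
    n_timepoints: number of post-onset frames modeled for each event.
    Each event contributes n_timepoints columns; the last two columns
    are a linear trend and a constant (baseline) term.
    """
    names = []
    for event in onsets_by_event:
        names += [f"{event}_t{k}" for k in range(n_timepoints)]
    names += ["trend", "constant"]

    A = [[0.0] * len(names) for _ in range(n_scans)]
    first_col = 0
    for onsets in onsets_by_event.values():
        for onset in onsets:
            for k in range(n_timepoints):
                if onset + k < n_scans:
                    A[onset + k][first_col + k] += 1.0  # delta at lag k
        first_col += n_timepoints
    for i in range(n_scans):
        A[i][-2] = i / (n_scans - 1)  # linear trend, scaled 0..1 (assumption)
        A[i][-1] = 1.0                # baseline
    return A, names


def predict(A, b):
    """Forward model without noise: y_i = sum over m of a_{i,m} * b_m."""
    return [sum(a * w for a, w in zip(row, b)) for row in A]


def percent_signal_change(effect, constant):
    """Effect magnitude expressed relative to the constant term."""
    return 100.0 * effect / constant


# Toy run: 40 frames, two event types, nine time points each (as in Exp1).
A, names = delta_design_matrix(40, {"TR": [2, 20], "VoA": [10, 28]}, 9)
print(len(A), len(names))
```

Because each lag has its own column, overlapping events (here the TR event at frame 2 and the VoA event at frame 10) simply add in the same row, and no response shape is imposed on the estimated time course.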
Group z-statistical maps were derived from the GLM using voxelwise repeated-measures ANOVA with time as a repeated factor (Winer et al., 1991). The ANOVA implementation produces a set of main effect and interaction images determined by the factors in the design (Schlaggar et al., 2002). The main effect of time image identifies voxels in which the temporal profile over the analyzed time period is not flat (i.e., a change in signal). Interaction by time images identify voxels in which activity differs across levels of factors as a function of time. Following is a description of data analysis procedures specific to each experiment.
Experiment 1: region selection.
The goal of this analysis was to define regions that showed differential activity at the time of recognition (TR), independently of when it occurred in the trial, compared with activity elicited at the time of verification (VoA). A subset of these regions could be specifically related to the time of recognition. Events were coded into the voxel-level GLM as follows. Recognition responses occurring before VoA were collapsed into a single TR condition. Trials with single responses occurring at the VoA stage (“end-trial recognition”) were coded separately in the GLM but were not included in statistical analyses reported here. Trials were sorted by self-reported accuracy (earlier recognition response correct, incorrect) and coded separately according to accuracy. A manual VoA response only occurred on trials that were scored as correct. Trial events were modeled over nine time points (18 s) beginning at the time of response. Overall, six regressors were coded in each participant's GLM: TR, VoA correct, VoA incorrect, end-trial recognition, trend, and baseline.
We wanted to identify regions associated with the recognition decision, while at the same time minimizing the extent to which the observed accumulation effects were related to action planning and initiation. To this end, we identified voxels in which activity differed at TR and VoA by entering the TR and VoA events into a repeated-measures ANOVA (Winer et al., 1991) with event type (TR, VoA) and time (nine time points) as factors. This analysis produced a number of main effect and interaction images. The interaction of event type and time image identified voxels in which activity related to TR and VoA differed over time. We used this image to derive regions of interest that were (1) likely to be involved in the recognition process and (2) unlikely to be directly related to motor planning or execution. Functional ROI volumes were defined by growing regions around peak voxels using algorithms developed by Abraham Snyder (Wheeler et al., 2006). This procedure resulted in 73 ROIs (see Tables 1–4).
Experiment 2: time course analysis.
In Exp2, we removed within-trial jitter from the revelation paradigm to evaluate the evolution of the BOLD response over a period of regularly increasing stimulus information. The aim was to find differential timing of activity related to different times of recognition.
Events were coded into each participant's GLM as follows. Recognition responses were binned into seven categories according to the step of revelation (TR1–7) in which they occurred. These categories were further subdivided according to recognition accuracy as denoted by the verification response (correct, incorrect). Trials with single responses occurring in the VoA stage (“end-trial recognition”) were not further categorized and were coded separately in the GLM. As in Exp1, trend and baseline terms were also modeled, resulting in 17 possible regressors (TR1–7 correct, TR1–7 incorrect, end-trial recognition, trend, constant) for each subject. Although trials were 16 s in length, each event was modeled over 32 s (16 time points) to account for the lagged hemodynamic response.
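The regressor count can be checked with a short enumeration. The names below are hypothetical labels, and the column total assumes that each event regressor is expanded into one column per modeled time point, as in the δ-function coding described earlier.

```python
N_TIMEPOINTS = 16  # frames modeled per event in Exp2 (32 s)

steps = range(1, 8)  # revelation steps 1-7
event_regressors = (
    [f"TR{s}_correct" for s in steps]
    + [f"TR{s}_incorrect" for s in steps]
    + ["end_trial_recognition"]
)
nuisance_regressors = ["trend", "constant"]
regressors = event_regressors + nuisance_regressors

# Assumed expansion: one column per time point for each event regressor.
n_columns = len(event_regressors) * N_TIMEPOINTS + len(nuisance_regressors)
print(len(regressors), n_columns)
```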
Using ROIs defined in Exp1, we next extracted time courses for a subset of Exp2 conditions. Behavioral data indicated that most correct recognition responses occurred in steps TR4–7 (Fig. 1c, shaded bars). To maximize power, imaging analyses focused on correct TR4–7 trials. Recognition decisions that were judged to be incorrect were also analyzed, but the data are not included in this report. The primary focus is on evaluating the influence of the timing of TR on the shape of the hemodynamic response, including timing of onset, peak, and width waveform components.
Hierarchical cluster analysis.
There are several ways in which time courses might differ between TR and VoA. Our hope was to define regions fitting the different profiles in Figure 2. To objectively distinguish such areas from those with other response profiles, we used hierarchical cluster analysis (Cordes et al., 2002; Salvador et al., 2005; Dosenbach et al., 2007) to classify the time course profiles in the 73 Exp1 regions of interest (ROIs). Four time courses, each consisting of 16 time points, were extracted from each ROI relating to recognition times 4–7. The four time courses were concatenated, resulting in a 1 × 64 vector of time points. A 73 × 64 matrix containing each vector from the 73 predefined ROIs was then formed. Correlation coefficients were obtained from the relationship between each region's vector and all other vectors in the matrix. A “1 − r” calculation was then performed as a means of attaining a distance measure between the regions.
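The distance computation described above can be sketched in Python as follows. The helper names and the three toy ROIs are hypothetical; the real analysis used 73 ROIs, each with four 16-point time courses.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def concatenate(time_courses):
    """Join per-condition time courses (e.g., TR4-7) into one vector."""
    return [value for tc in time_courses for value in tc]


def correlation_distance_matrix(vectors):
    """Pairwise 1 - r distances between ROI vectors."""
    n = len(vectors)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = 1.0 - pearson_r(vectors[i], vectors[j])
    return dist


# Toy data: four 16-point time courses per ROI (made-up values).
base = [[float(t + shift) for t in range(16)] for shift in range(4)]
roi_a = concatenate(base)
roi_b = [2.0 * v + 1.0 for v in roi_a]   # same shape, different scale
roi_c = [-v for v in roi_a]              # opposite sign
dist = correlation_distance_matrix([roi_a, roi_b, roi_c])
print(len(roi_a), dist[0][1], dist[0][2])
```

With this measure, ROIs whose time courses have the same shape sit at a distance near 0 regardless of amplitude, whereas ROIs with inverted time courses sit near the maximum distance of 2.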
From these values, a dendrogram (cluster tree) depicting the region by region relationship was constructed. The method used to build the dendrogram was the commonly chosen unweighted pair group method with arithmetic mean (UPGMA) (Handl et al., 2005), which is included in the Statistics and Bioinformatics Toolboxes available in Matlab 7.2 (The MathWorks; Natick, MA). Two additional hierarchical clustering algorithms, referred to as single and complete linkage, are also available for depicting quantifiable relationships within a dataset. Single linkage defines the distance between two clusters as the minimum distance between any two points within the clusters. This method, however, is susceptible to chaining, which typically fails to produce functionally dissociable clusters of data. In contrast, the complete linkage algorithm defines the distance between two clusters as the maximum distance between any two data points within the clusters. This rule often makes the cluster analysis sensitive to noise and outliers. As a means of accommodating more diverse patterns of data, UPGMA was developed (Eisen et al., 1998). The UPGMA algorithm defines the distance between two clusters as the mean distance of all possible pairs of data points between the two clusters.
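The three linkage rules differ only in how they summarize the between-cluster pairwise distances, as the following Python sketch shows. The function name and the toy points are hypothetical, and this is not the Matlab toolbox implementation.

```python
def cluster_distance(dist, cluster_a, cluster_b, method="average"):
    """Distance between two clusters given a full pairwise distance matrix.

    cluster_a, cluster_b: lists of item indices.
    method: "single" (minimum), "complete" (maximum),
            or "average" (mean of all between-cluster pairs, as in UPGMA).
    """
    pair_distances = [dist[i][j] for i in cluster_a for j in cluster_b]
    if method == "single":
        return min(pair_distances)
    if method == "complete":
        return max(pair_distances)
    if method == "average":
        return sum(pair_distances) / len(pair_distances)
    raise ValueError(f"unknown method: {method}")


# Toy example: four points on a line at positions 0, 1, 5, and 7.
positions = [0.0, 1.0, 5.0, 7.0]
dist = [[abs(p - q) for q in positions] for p in positions]

single = cluster_distance(dist, [0, 1], [2, 3], "single")      # closest pair
complete = cluster_distance(dist, [0, 1], [2, 3], "complete")  # farthest pair
average = cluster_distance(dist, [0, 1], [2, 3], "average")    # mean of pairs
print(single, complete, average)
```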
To validate the clusters created by the dendrogram, a cophenetic correlation coefficient (cophenetic r) was calculated. This value, which ranges from 0 to 1, is a measure of how accurately the dendrogram represents the original pairwise distances between each data point in the original distance matrix. A key element of the UPGMA algorithm is that it is designed to maximally preserve the data in the original, unmodeled correlation matrix. Thus, the cophenetic r when using UPGMA will always be greater than or equal to the cophenetic r when using single or complete linkage (Handl et al., 2005). This is indeed the case for our dataset, where UPGMA gives a larger cophenetic r (0.8234) than either single linkage (0.6584) or complete linkage (0.7781) approaches.
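The cophenetic correlation can be illustrated with a small, self-contained Python sketch: clusters are merged step by step, the merge height is recorded for every pair of items that first join at that step, and those heights are correlated with the original distances. The function names and toy data are hypothetical, and this naive implementation is meant only to show the definition, not to reproduce the Matlab routines.

```python
from itertools import combinations
from math import sqrt


def linkage_distance(dist, a, b, method):
    pair_d = [dist[i][j] for i in a for j in b]
    if method == "single":
        return min(pair_d)
    if method == "complete":
        return max(pair_d)
    return sum(pair_d) / len(pair_d)  # "average" (UPGMA)


def cophenetic_matrix(dist, method="average"):
    """Height at which each pair of items first falls in the same cluster."""
    n = len(dist)
    clusters = [[i] for i in range(n)]
    coph = [[0.0] * n for _ in range(n)]
    while len(clusters) > 1:
        # Closest pair of clusters under the chosen linkage rule.
        height, a, b = min(
            (linkage_distance(dist, clusters[p], clusters[q], method), p, q)
            for p, q in combinations(range(len(clusters)), 2)
        )
        for i in clusters[a]:
            for j in clusters[b]:
                coph[i][j] = coph[j][i] = height
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return coph


def cophenetic_r(dist, method="average"):
    """Pearson correlation between original and cophenetic distances."""
    coph = cophenetic_matrix(dist, method)
    pairs = list(combinations(range(len(dist)), 2))
    x = [dist[i][j] for i, j in pairs]
    y = [coph[i][j] for i, j in pairs]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Toy example: five points on a line (made-up positions).
positions = [0.0, 1.0, 5.0, 7.0, 20.0]
dist = [[abs(p - q) for q in positions] for p in positions]
for method in ("single", "complete", "average"):
    print(method, round(cophenetic_r(dist, method), 4))
```

A value close to 1 indicates that the tree heights reproduce the ordering and spacing of the original pairwise distances well.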
Once appropriate candidate regions were identified using cluster analysis, the next step involved quantifying aspects of the time courses related to the hypothesized time courses of Figure 2. Namely, we used linear interpolation to extract response onset (onset), the time at which the response reached its greatest magnitude (peak), and the overall length of responses, measured as the full width at half maximum (FWHM; width) of the BOLD response. Group average time courses for each TR (4–7) were used. We interpolated the data for several reasons. First, the time course profiles differed substantially across ROIs and in many cases were atypical because of the gradual revelation procedure. For example, in sensory processing and accumulator ROIs, the shape was not well represented by a gamma function (Boynton et al., 1996). Second, signal onsets and the rising edge of the BOLD response occurred quite rapidly in some ROIs, and the 2 s sampling rate was sometimes insufficient to obtain clear estimates of the timing of signal onsets. Linear interpolation provided a straightforward procedure to quantify time course parameters across a variety of different types of response profile, while making the simple assumption that the data values between any two time points are adequately approximated by a linear fit between those two points.
We first generated 1000 points between each time point, connecting each pair of the 16 time points with a straight line. This procedure transformed each time course into a time series of 15,000 time points. The peak was defined as the time point at which the peak magnitude occurred. Peak values always occurred at multiples of the repetition time (2 s). Width (FWHM) was defined as the distance between the two points at which the time course was 50% of the peak magnitude, centered around the peak. The interpolated onset was defined by stepping backward from the peak and identifying the time point at which activity exceeded a threshold percentage of peak activity. Because the choice of an onset threshold is not as objectively defined as peak and width, we examined four different threshold values (10, 15, 20, and 25% of peak). The results did not change markedly across the four values, so we used the mean of the four values as the onset point.
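The interpolation procedure can be sketched in Python as follows. This is a minimal reading of the description above under stated assumptions: the function names are hypothetical, the response is assumed to be positive-going relative to a baseline of zero, the final sample is appended to the interpolated grid (giving one more point than the 15,000 described), and the toy time course is made up.

```python
def interpolate(values, points_per_interval=1000, frame_s=2.0):
    """Linearly interpolate between successive samples of a time course."""
    times, dense = [], []
    for i in range(len(values) - 1):
        for k in range(points_per_interval):
            frac = k / points_per_interval
            times.append((i + frac) * frame_s)
            dense.append(values[i] + frac * (values[i + 1] - values[i]))
    times.append((len(values) - 1) * frame_s)  # keep the final sample
    dense.append(values[-1])
    return times, dense


def step_back(dense, start, level):
    """Step backward from start while the signal stays at or above level."""
    i = start
    while i > 0 and dense[i - 1] >= level:
        i -= 1
    return i


def step_forward(dense, start, level):
    """Step forward from start while the signal stays at or above level."""
    i = start
    while i < len(dense) - 1 and dense[i + 1] >= level:
        i += 1
    return i


def time_course_parameters(values, thresholds=(0.10, 0.15, 0.20, 0.25)):
    """Return peak time, FWHM, and mean onset time (all in seconds)."""
    times, dense = interpolate(values)
    peak_idx = max(range(len(dense)), key=dense.__getitem__)
    peak = dense[peak_idx]
    left = step_back(dense, peak_idx, 0.5 * peak)
    right = step_forward(dense, peak_idx, 0.5 * peak)
    onsets = [times[step_back(dense, peak_idx, th * peak)] for th in thresholds]
    return {
        "peak": times[peak_idx],
        "width": times[right] - times[left],
        "onset": sum(onsets) / len(onsets),  # mean over the four thresholds
    }


# Toy 16-point time course (percent signal change, made-up values).
toy = [0, 0, 0, 0, 1, 2, 3, 4, 3, 2, 1, 0, 0, 0, 0, 0]
params = time_course_parameters([float(v) for v in toy])
print(params)
```

For this triangular toy response, the peak falls on an acquired frame, the half-maximum points lie symmetrically around it, and the four onset thresholds yield slightly different crossing times that are then averaged.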
A separate, supplemental analysis targeted ROIs in sensorimotor cortex to examine time courses presumably related to overt manual responses, and thus likely showing similar activity at TR and VoA. The aim of this analysis was to characterize the pattern of BOLD response across TRs that were related to motor production so we could determine the extent to which our procedure factored out motor processing in the decision process. ROIs were defined using data from Exp1, and time course data from Exp2 were then extracted from those ROIs. More specifically, participants were divided into two groups according to response hand (right = 6, left = 6). An ANOVA on Exp1 TR data, with response hand (left, right) as a between-group factor, identified a set of regions in which activity differed as a function of response hand. We identified two ROIs encompassing precentral and postcentral gyri (left peak = −41, −26, +58; right = +39, −20, +57) (supplemental Figure 3, available at www.jneurosci.org as supplemental material). Time courses from the two ROIs were highly similar, so to increase power we merged the ROIs before extracting time courses for TR4–7.
In the group analysis (n = 12), 620 of 704 picture trials received a recognition response during steps 1–7 (TR1–7) of revelation. Of these, 564 (91.0%) received a VoA response and were thus judged to be accurate. The 84 recognition responses occurring in step 8 (VoA) were not scored because the pictures were by then fully revealed, and recognition and VoA occurred simultaneously. The distribution of responses was examined by binning recognition times (TRs) on correct trials at 2 s intervals, time-locked to the acquisition of a whole-brain fMRI volume. Binning produced seven categories of response, each associated with a step of revelation (TR1–7). As shown in Figure 1b, most correct responses occurred in TR4–7.
In Exp2 (n = 13), 985 of 1148 trials received a recognition response before VoA. Of these, 852 (86.5%) received a later VoA response. The distribution of binned correct trials was quite similar to the distribution from Exp1 (Fig. 1c), with 157, 156, 247, and 204 in TR4–7, respectively. Data from the two studies demonstrate that gradual revelation produced significant spread in recognition times.
To identify regions involved in perceptual recognition, but not motor execution, TR and VoA events were compared using a repeated-measures ANOVA (see Materials and Methods). This analysis revealed significant differences (p < 0.0001, uncorrected) between recognition and VoA activity in many regions including bilateral calcarine sulcus, cuneus (posterior occipital), precuneus, inferior temporal cortex (IT), posterior parietal lobes (PPLs), anterior insula near the frontal operculum (aI/fO), striatum, dorsal anterior cingulate cortex (ACC), medial frontal gyrus near the presupplementary motor area (meFG/pre-SMA), and dorsolateral prefrontal cortex (Tables 1–4). From this map, we derived regions of interest around peak voxels (10 mm radius, 10 mm consolidation distance between peaks) and masked out voxels that failed to pass multiple comparison and sphericity corrections (see Materials and Methods). This procedure produced 73 ROIs (see Fig. 4, middle; Tables 1–4), which were then used in Exp2 to examine the evolution of TR-dependent BOLD activity during nonjittered revelation.
Based on basic principles of linear systems, including scaling (output magnitude is proportional to input magnitude) and superposition (total response to multiple inputs equals the sum of the responses), we hypothesized that various cognitive events could influence the shape of the evolving BOLD signal. These events include sensory processing, evidence accumulation, recognition decisions, verification decisions, and overt behavior. Because stimuli were revealed gradually from under a black mask, the amount of stimulus information increased at regular intervals throughout the trial. Accordingly, in visual processing areas that process basic stimulus features, the BOLD signal should begin to increase early in the trial and continue to increase as the stimulus is revealed. Thus, as shown in Figure 2a, the width of the time course profile in sensory processing areas should correspond to trial duration. In contrast, the neurophysiological findings (Kim and Shadlen, 1999; Shadlen and Newsome, 2001) predict that activity in accumulation regions will begin early in the trial and continue to increase at a TR-dependent rate. For instance, recognition at TR4 should be associated with a more rapid increase in activity than recognition at TR7 (Fig. 2b). Based on the findings reported in the literature, we may find accumulating patterns of activity in parietal and frontal areas. Given the task demands, we may also find task-specific ROIs that are important in visual object processing. BOLD responses associated with processes engaged at the moment of recognition should also vary according to TR. Because the moment of recognition is a discrete event, responses should occur transiently at TR (Fig. 2c). A large body of literature on decision making (Bush et al., 2002; Carlson et al., 2006; Grinband et al., 2006; Hampton and O'Doherty, 2007; Thielscher and Pessoa, 2007) implicates anterior cingulate and frontal opercular regions in this type of processing.
Accordingly, we hypothesize that these regions will be recruited at the moment of recognition. Although the time course profile predictions are derived from theoretical accounts and empirical findings, we note that BOLD responses need not necessarily show these specific time course patterns because some regions may show combinations of such responses (i.e., both accumulator and moment of recognition patterns, which would preclude dissociation), or show responses not otherwise considered a priori.
According to these expectations, the pattern of activity associated with accumulation and the moment of recognition should have some similarity. For example, because accumulation and recognition are inherently associated with TR, the time to peak activity for both should shift in time as a function of TR. However, as illustrated in Figure 2, b and c, they should differ in two important ways. First, because of a hypothetical role in integrating information over time, BOLD responses associated with accumulation should increase as soon as information related to a decision becomes available (i.e., earlier onset) (Fig. 2b). In contrast, the onset of activity associated with the moment of recognition should shift later in time depending on TR (Fig. 2c). Second, because evidence gathering is a prolonged process, time courses associated with accumulation should be more extended in time (i.e., greater width), albeit showing narrower widths than regions related to visual processing. Together, these two predictions imply that in accumulation (but not moment of recognition) ROIs the slope of the leading edge of the BOLD response will decrease as TR increases. It is important to emphasize that, despite the preceding hypotheses describing event-dependent time course shapes, we did not explicitly model the shape of the hemodynamic response.
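The qualitative predictions can be illustrated with a toy linear-systems simulation in Python. This sketch is not part of the reported analysis, which did not model the shape of the hemodynamic response: the gamma-shaped impulse response and its parameters, the three input profiles, and all function names are assumptions made only to show how scaling and superposition produce TR-dependent time courses.

```python
from math import exp, factorial

FRAME_S = 2.0   # one frame per revelation step
N_FRAMES = 16   # frames modeled per trial (32 s)
N_STEPS = 8     # revelation steps per trial


def gamma_response(n=3, tau=1.25):
    """Gamma-shaped impulse response sampled once per frame (assumed)."""
    return [
        (k * FRAME_S / tau) ** (n - 1) * exp(-k * FRAME_S / tau)
        / (tau * factorial(n - 1))
        for k in range(N_FRAMES)
    ]


def convolve(inputs, kernel):
    """Superposition: add a scaled, shifted kernel for every input sample."""
    out = [0.0] * len(inputs)
    for i, amplitude in enumerate(inputs):
        for k, h in enumerate(kernel):
            if i + k < len(out):
                out[i + k] += amplitude * h
    return out


def sensory_input():
    # Input grows with the amount of stimulus revealed, regardless of TR.
    return [(f + 1) / N_STEPS if f < N_STEPS else 0.0 for f in range(N_FRAMES)]


def accumulator_input(tr_step):
    # Evidence ramps from trial onset up to the recognition step.
    return [(f + 1) / tr_step if f < tr_step else 0.0 for f in range(N_FRAMES)]


def recognition_input(tr_step):
    # A single transient event at the recognition step.
    return [1.0 if f == tr_step - 1 else 0.0 for f in range(N_FRAMES)]


def peak_frame(tc):
    return max(range(len(tc)), key=tc.__getitem__)


def onset_frame(tc, fraction=0.1):
    level = fraction * max(tc)
    return next(f for f, v in enumerate(tc) if v >= level)


kernel = gamma_response()
sensory = convolve(sensory_input(), kernel)
accumulator = {s: convolve(accumulator_input(s), kernel) for s in (4, 7)}
recognition = {s: convolve(recognition_input(s), kernel) for s in (4, 7)}
for s in (4, 7):
    print(s, onset_frame(accumulator[s]), peak_frame(accumulator[s]),
          onset_frame(recognition[s]), peak_frame(recognition[s]))
```

Under these assumptions, the simulated accumulator response begins early and peaks later for later recognition, whereas both the onset and the peak of the simulated recognition response shift with the recognition step.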
Hierarchical cluster analysis.
From the 73 Exp1 ROIs, we extracted TR-dependent time courses from the Exp2 trials in which recognition occurred during revelation steps 4–7. We wanted to characterize the similarities and differences in patterns of time courses across a large set of ROIs, so we began sorting the data using a hierarchical cluster analysis (see Materials and Methods). Figure 3a displays the relationships among the 73 ROIs using a cluster tree (dendrogram). Regions with similar patterns of time course are clustered more closely than regions with different patterns. Pruning the cluster tree at 1 − r = 0.8 produced four clusters, each associated with a distinct pattern of time course when averaged over all ROIs in a cluster (Fig. 3b–e; Tables 1–4). A large cluster of regions located in medial parietal (precuneus) cortex, superior temporal gyrus, posterior insula, medial frontal cortex, and lateral parietal lobes exhibited negative time courses that tended to peak in step with TR (Fig. 3b, Table 1). A cluster of seven ROIs located in bilateral posterior occipital cortex, lingual gyrus, and left parahippocampal gyrus displayed an initial decrease in activity, followed by a prominent increase near the end of the time series (late positive) (Fig. 3c, Table 2). A third cluster included ROIs near left middle frontal gyrus (∼BA 6), PPL, and middle temporal gyrus, and a subset of the ROIs found in bilateral ACC (∼BA 32, 24). These regions displayed a bimodal response, with an initial positive response that corresponded with TR and a secondary positive response near the end of the time series (Fig. 3d, Table 3). The fourth and largest cluster displayed positive responses that, as a group, suggested increased activity at a rate corresponding with TR (Fig. 3e, Table 4).
These ROIs were located in bilateral dlPFC (∼BA 47, 46), ACC (BA 32), IT (including fusiform gyrus; BA 37, 20), PPL (BA 40, 7), cuneus/posterior occipital (BA 18), striatum, thalamus and cerebellum, and represent the target of our further analyses.
Figure 4 shows representative time courses from six ROIs: (1) left ventral cuneus near BA 17 (Talairach atlas x = −19, y = −99, z = 2), (2) left IT near BA 37 (−42, −63, −9), (3) middle occipital gyrus near BA 18 (−1, −83, +25), (4) precuneus near BA 7 (2, −60, 37), (5) meFG/pre-SMA near BA 32 (−1, 14, 51), and (6) right aI/fO (33, 22, −2). The plotted time series extend over 16 frames of image acquisition (32 s) and are shaded according to TR. Even in individual ROIs, there were clear time course patterns evident in the large positive cluster (Table 4). For example, early visual processing areas displayed an early onset of BOLD signal change, followed by a gradual increase in activity that extended to the end of the trial (e.g., left ventral cuneus near BA 17) (Fig. 4a). In these regions, activity corresponded mostly with the amount of visual information on the screen. This type of response is consistent with a “sensory processor” (Fig. 2a) that processes basic sensory information but may not contribute directly to higher order recognition analysis. In other regions with an early onset, the peak in activity shifted with TR. This set included ROIs in bilateral PPL, precuneus, middle occipital gyrus, left IT (Fig. 4b), and bilateral dlPFC (Table 4). This pattern of accumulation is consistent with the predicted behavior of an evidence accumulator (Fig. 2b) that integrates information over time. In contrast, BOLD responses in bilateral aI/fO (Fig. 4f), dorsal ACC, meFG/pre-SMA (Fig. 4e), and thalamus were markedly more transient, with onsets and peaks that appeared to correlate positively with TR. Activity in these regions appeared to be most directly related to processes engaged at the time of recognition. As noted earlier, the cluster analysis identified two patterns of negative response. For example, ROIs near the lingual gyrus and cuneus (∼BA 19) (Fig. 4c) showed an initial decrease in activity followed by a marked increase near the end of the trial. 
In other regions near the angular gyrus and the precuneus (Fig. 4d), activity appeared to decrease at a rate corresponding with TR. Because of space considerations, we will focus on the positive responses.
Linear interpolation of BOLD responses.
We next investigated time course patterns from the positive-going cluster (Fig. 3e) by quantifying onset, peak, and FWHM for each of the 32 ROIs. Our predictions across levels of TR for sensory processors, accumulators, and moment of recognition ROIs are displayed in Figure 2. To objectively classify ROI behavior based on these attributes, we entered the onset, peak, and FWHM values into a second hierarchical cluster analysis. Thus, in contrast to the previous cluster analysis (Fig. 3), in which we correlated the time courses between ROIs, the goal of the second cluster analysis was to correlate interpolation parameters in the 32 positive ROIs. Interpolation data for each ROI took the form of a 1 × 12 vector representing the four onset points (TR4–7), four peaks (TR4–7), and four width values (TR4–7). For reference, we also included the idealized values for each predicted response type displayed in Figure 2. The results of the second cluster analysis are displayed in Figure 5, with idealized parameters labeled “accumulator,” “sensory,” and “recognition.” The 32 ROIs clustered into three distinct groups containing 14, 5, and 13 ROIs (not including the three idealized value sets). Within each group, some ROIs clustered quite closely with the idealized parameters. For example, regions in medial occipital cortex (∼BA 17/18) clustered closely with the idealized sensory processing parameters (Fig. 5a, red). Responses in occipital and fusiform ROIs clustered quite closely with the idealized accumulator parameters (Fig. 5a, blue). Regions in meFG (∼BA 6), ACC (∼BA 8, 32), right inferior parietal cortex (∼BA 40), right inferior frontal gyrus (∼BA 47), and right inferior precentral gyrus (∼BA 44) clustered tightly with the idealized moment of recognition parameters (Fig. 5a, green). A number of other ROIs, however, were associated with parameter differences that led to greater cluster distances. 
For example, ROIs in frontal and parietal cortex displayed response parameters similar to the idealized accumulator, but tended to have larger FWHM values than our ideal predictions. Despite these differences, the values in moment of recognition ROIs were clearly distinct from the accumulator and sensory processing ROIs. The parameter values for onset, peak, and width are plotted in Figure 5b–d as a function of TR, averaged over all ROIs in each of the three clusters.
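The parameter-based clustering described above can be illustrated with a minimal sketch. It assumes a distance of 1 minus the Pearson correlation between 1 × 12 parameter vectors and average linkage; neither choice is confirmed by this section. All ROI names and parameter values below, including the "ideal" profiles, are hypothetical and are not the values used in the study.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length parameter vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_linkage(vectors, n_clusters):
    """Agglomerative clustering with 1 - r distance and average linkage (assumed)."""
    dist = {frozenset((a, b)): 1.0 - pearson_r(vectors[a], vectors[b])
            for a, b in combinations(vectors, 2)}
    clusters = [frozenset([name]) for name in vectors]

    def linkage(c1, c2):
        # Mean pairwise distance between the members of two clusters.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return clusters


# Hypothetical vectors: 4 onsets, 4 peaks, 4 widths (s) for TR4-7.
profiles = {
    "ideal_sensory":     [4, 4, 4, 4, 18, 18, 18, 18, 14, 14, 14, 14],
    "ideal_accumulator": [4, 4, 4, 4, 12, 14, 16, 18, 8, 10, 12, 14],
    "ideal_recognition": [8, 10, 12, 14, 12, 14, 16, 18, 4, 4, 4, 4],
    "medial_occipital":  [4.3, 3.8, 4.1, 4.0, 17.8, 18.2, 18.1, 17.9, 14.2, 13.9, 14.0, 14.1],
    "fusiform":          [4.2, 3.9, 4.1, 4.0, 12.1, 13.8, 16.2, 18.1, 8.1, 9.8, 12.2, 13.9],
    "pre_sma":           [8.2, 9.9, 12.1, 13.8, 11.9, 14.2, 15.9, 18.1, 4.1, 3.9, 4.0, 4.2],
}

clusters = average_linkage(profiles, n_clusters=3)
for members in clusters:
    print(sorted(members))
```

In this toy example each hypothetical ROI joins the idealized profile it most resembles, which mirrors the logic of including the idealized values as reference points in the cluster analysis.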
Figures 6–8 (b–d) show the interpolation values for each ROI in the three main interpolation clusters, as well as the average values represented by a thicker black line for reference (which are also shown in Fig. 5b–d). Onset times in sensory ROIs (Fig. 6b) changed little across TR4–7 (mean = 5.5, 5.2, 4.8, 5.2 s, respectively, across ROIs within the cluster), whereas onset times in accumulator ROIs (Fig. 7b) increased slightly (7.1, 7.0, 8.2, 8.6 s). In recognition ROIs, onset times (Fig. 8b) increased markedly as TR increased (8.5, 12.0, 13.3, 14.3 s). Peak times tended to increase in all three region types (Figs. 6–8c), with steeper increases in accumulator (14.8, 16.5, 18.3, 20.0 s) and recognition (14.2, 16.3, 18.3, 20.2 s) ROIs than sensory (15.6, 17.6, 17.6, 18.8 s) ROIs. Sensory ROIs were associated with the widest response profiles across levels of TR (Fig. 6d) (15.7, 15.2, 14.6, 15.3 s), followed by accumulator (Fig. 7d) (11.3, 11.0, 10.2, 9.2 s) and recognition (Fig. 8d) (7.1, 4.8, 5.7, 5.4 s) ROIs. All in all, the regions that make up each category deviated little from the average line.
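As an illustration of how onset, peak, and FWHM values of this kind might be read from a sampled time course, the following sketch uses linear interpolation between frames. The 2 s frame spacing follows from the 16 frames spanning 32 s. The onset criterion (10% of peak amplitude), the zero baseline, the use of the sample maximum as the peak, and the example time course are all assumptions for illustration rather than the procedure used in the study.

```python
def crossing_time(times, values, level, rising=True):
    """First time the series crosses `level`, linearly interpolated between frames."""
    points = list(zip(times, values))
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        crosses = (v0 < level <= v1) if rising else (v0 >= level > v1)
        if crosses:
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    return None  # the level is never crossed in this segment


def response_parameters(values, frame_s=2.0, onset_fraction=0.1):
    """Onset, peak time, and FWHM (s) of a baseline-zero time course.

    onset_fraction is a hypothetical criterion: onset is the time at which
    the leading edge first reaches this fraction of the peak amplitude.
    """
    times = [i * frame_s for i in range(len(values))]
    peak_idx = max(range(len(values)), key=values.__getitem__)
    peak_val = values[peak_idx]
    rise_t, rise_v = times[:peak_idx + 1], values[:peak_idx + 1]
    fall_t, fall_v = times[peak_idx:], values[peak_idx:]
    onset = crossing_time(rise_t, rise_v, onset_fraction * peak_val)
    half_up = crossing_time(rise_t, rise_v, 0.5 * peak_val)
    half_down = crossing_time(fall_t, fall_v, 0.5 * peak_val, rising=False)
    fwhm = None if half_up is None or half_down is None else half_down - half_up
    return {"onset": onset, "peak": times[peak_idx], "fwhm": fwhm}


# Hypothetical percentage-signal-change time course, one value per 2 s frame.
course = [0.0, 0.0, 1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0, 0.0]
params = response_parameters(course)
print(params)
```

Applying such a function to the TR4–7 time courses of one ROI would yield the four onsets, four peaks, and four widths that make up its 1 × 12 parameter vector.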
To test our hypotheses about differences in onset times, peak times, and response width between the three main clusters, we entered the corresponding values for each ROI into three sets of repeated-measures ANOVAs (using SPSS; SPSS, Chicago, IL) with four levels of the repeated-measure TR (4, 5, 6, 7). In the first set, we tested for differences in onset, peak, and width across region grouping by including three levels of the between factor region type (sensory, accumulator, recognition). In the second set, when warranted by significant effects in the first, we investigated the source of those effects by directly contrasting region groups: we included only two levels of the between factor region type and tested each unique combination (sensory, accumulator; sensory, recognition; and accumulator, recognition) in three separate ANOVAs. In the third set, we tested the degree to which onset and peak values followed a linear trend across TR by computing three 1 × 4 repeated-measures ANOVAs and including a polynomial trend analysis. In these analyses, the main effect of TR determines the reliability with which interpolated values differ across levels of TR, whereas the trend analysis determines whether that change is linear. All F and p values of repeated measures incorporate the Greenhouse-Geisser sphericity correction to adjust the degrees of freedom.
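For readers unfamiliar with the Greenhouse-Geisser correction, the sketch below computes a single-factor (1 × 4) repeated-measures F ratio and the epsilon that rescales its degrees of freedom. It is a simplified stand-in for the SPSS analyses: it omits the between factor region type and the p value, and the onset values are hypothetical.

```python
def rm_anova_gg(data):
    """One-way repeated-measures ANOVA with Greenhouse-Geisser epsilon.

    data: one row per ROI, one column per TR level.
    Returns (F, epsilon, corrected df for TR, corrected df for error).
    """
    n, k = len(data), len(data[0])
    grand = sum(map(sum, data)) / (n * k)
    level_means = [sum(row[j] for row in data) / n for j in range(k)]
    roi_means = [sum(row) / k for row in data]

    ss_level = n * sum((m - grand) ** 2 for m in level_means)
    ss_error = sum((data[i][j] - level_means[j] - roi_means[i] + grand) ** 2
                   for i in range(n) for j in range(k))
    df_level, df_error = k - 1, (k - 1) * (n - 1)
    f_value = (ss_level / df_level) / (ss_error / df_error)

    # Sample covariance matrix of the k levels, then double-center it.
    cov = [[sum((row[a] - level_means[a]) * (row[b] - level_means[b])
                for row in data) / (n - 1) for b in range(k)] for a in range(k)]
    row_means = [sum(r) / k for r in cov]
    cov_mean = sum(row_means) / k
    centered = [[cov[a][b] - row_means[a] - row_means[b] + cov_mean
                 for b in range(k)] for a in range(k)]
    trace = sum(centered[a][a] for a in range(k))
    sum_sq = sum(v * v for r in centered for v in r)
    epsilon = trace ** 2 / ((k - 1) * sum_sq)
    return f_value, epsilon, epsilon * df_level, epsilon * df_error


# Hypothetical onset times (s) for five ROIs at TR4-7.
onsets = [
    [8.1, 11.6, 13.0, 14.5],
    [8.9, 12.4, 13.1, 14.0],
    [8.4, 12.1, 13.9, 14.6],
    [8.6, 11.8, 13.2, 13.9],
    [8.5, 12.2, 13.4, 14.4],
]
f_value, epsilon, df1, df2 = rm_anova_gg(onsets)
print(f"F({df1:.1f},{df2:.1f}) = {f_value:.2f}, epsilon = {epsilon:.2f}")
```

Epsilon lies between 1/(k − 1) and 1, which is why corrected degrees of freedom in reports of this kind are often fractional. A p value could then be obtained from an F distribution with the corrected degrees of freedom, for example with `scipy.stats.f.sf`.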
The 3 × 4 ANOVA of onset times with the three region groups revealed a significant main effect of TR (F(2.5,73.4) = 14.47, p < 0.0001) and a significant interaction of TR with region type (F(5.1,73.4) = 10.91, p < 0.0001), indicating that across regions the onset times differed over levels of TR, and this effect was modified by region type. The main effect of the between-factor region type was also significant (F(2,29) = 31.91, p < 0.0001). To identify the source of the interaction, we next computed three 2 × 4 ANOVAs on each unique combination of the three ROI types. All three ANOVAs revealed an interaction of region type with TR (all p < 0.05), indicating that the observed rates of increasing onset values across TR4–7 (recognition > accumulator > sensory) differed significantly between the three region types.
Next, we computed a separate 1 × 4 repeated-measures ANOVA on each region type to determine whether the change in onset times followed a linear trend across levels of TR. According to these analyses, onset values at different TRs did not differ in sensory ROIs (F(1.3,5.4) = 1.67, p = 0.26) and did not reliably follow a linear trend across TR4–7 (F(1,4) = 1.04, p = 0.37). In contrast, onset values differed significantly across levels of TR in accumulator (F(2.3,28.1) = 5.93, p = 0.01) and recognition (F(2.0,26.1) = 33.51, p < 0.0001) ROIs. Both of the latter region types also showed significant linear trends (accumulator: F(1,12) = 13.39, p < 0.0001; recognition: F(1,13) = 75.37, p < 0.0001), indicating that the onset values increased significantly and linearly as TR increased.
Combined, the three sets of analyses indicate that onset times increased linearly in accumulator and recognition, but not sensory, ROIs. Furthermore, in accumulator and recognition ROI types, the increase in onset times across levels of TR differed reliably. Thus, the increase in onset values across levels of TR was greater in recognition ROIs than in accumulator ROIs.
To determine whether peak values changed reliably as a function of TR, we next performed the same set of analyses on interpolated peak times. The 3 × 4 ANOVA with three levels of region type revealed a significant main effect of TR (F(2.8,81.0) = 225.62, p < 0.0001) and an interaction of TR with region type (F(5.6,81.0) = 8.06, p < 0.0001), indicating that peak times increased significantly as a function of TR and that this effect was modified by region type. To explore the source of these differences we next computed three 2 × 4 ANOVAs, each with two levels of region type. In these analyses, interactions of TR with region type indicated that the change in peak values over TR4–7 in sensory ROIs differed from accumulator (F(2.6,41.1) = 9.64, p < 0.0001) and recognition (F(2.2,38.0) = 17.59, p < 0.0001) ROIs. However, there was no difference in peak values over TR4–7 between accumulator and recognition ROIs (F(2.7,67.6) = 1.46, p = 0.23). Thus, increases in time to peak values were greater in accumulator and recognition ROIs than in sensory ROIs. In all three ROI types, the main effect of TR in the 1 × 4 ANOVAs was significant (all p < 0.01), and all showed a significant linear trend (all p < 0.01). Peak times increased as a function of TR in all three region types. The 2 × 4 analyses indicated that the rate of increase in the sensory ROI group was significantly less than in accumulator and recognition ROI groups.
BOLD response width
To test the hypothesis that response widths would be greatest in sensory ROIs, least in recognition ROIs, and intermediate in accumulator ROIs, we first entered the FWHM values into a 3 × 4 ANOVA (described above). A significant main effect of TR (F(2.0,58.4) = 4.50, p < 0.05) and a nonsignificant interaction of TR with region type (F(4.0,58.4) = 1.96, p = 0.11) indicated that width values differed across levels of TR but not as a function of region type. However, a significant main effect of the between-factor region type (F(2,29) = 90.48, p < 0.0001) revealed a highly reliable difference in widths between regions. Because the interaction was not significant, we did not perform the 2 × 4 regionwise analyses that were conducted on the onset and peak data. Finally, the 1 × 4 ANOVAs revealed no significant main effect of TR in the sensory ROI group (F(1.1,4.3) = 0.32, p = 0.62), and a significant main effect in the accumulator (F(1.6,19.0) = 7.00, p < 0.01) and recognition (F(1.5,19.0) = 4.86, p < 0.05) ROI groups. Only the accumulator group showed a significant linear trend in width values across levels of TR (F(1,12) = 11.41, p < 0.01). The principal result from this analysis was that response widths, which were greatest for sensory ROIs, least for recognition ROIs, and intermediate for accumulator ROIs, differed significantly across region type.
We also used the onset and peak times to determine whether the slopes of the leading edge of the BOLD response changed as a function of TR. Leading edge slopes were computed as follows: (SC_P − SC_ON)/(T_P − T_ON), where SC = percentage signal change, T = time (s), and the subscripts P and ON denote peak and onset, respectively. Specifically, we hypothesized that in accumulators the slope of the leading edge would decrease as decision time increased. To test this hypothesis, we entered slope values from accumulator ROIs (TR4–7 means: 0.13, 0.11, 0.10, 0.10) into a single factor repeated-measures (TR4–7) ANOVA and included a linear trend analysis to determine whether slope values changed linearly across levels of TR. This analysis revealed a main effect of TR (F(2.1,25.7) = 6.15, p < 0.01) and a significant linear trend (F(1,12) = 8.91, p < 0.05). The same analysis on data from recognition ROIs (TR4–7 means: 0.10, 0.13, 0.10, 0.13) revealed neither a main effect of TR (F(1.9,24.6) = 1.28, p = 0.29) nor a significant linear trend (F(1,13) = 0.57, p = 0.46).
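The slope formula and the direction of the trend can be made concrete with a short sketch. The function names are hypothetical, as is the example response rising from 0% to 1% signal change. The contrast is applied here only to the reported group means, whereas the trend analysis in the text was computed on per-ROI values, so the numbers indicate direction and not statistical reliability.

```python
def leading_edge_slope(sc_onset, sc_peak, t_onset, t_peak):
    """Slope of the rising edge, in percentage signal change per second."""
    return (sc_peak - sc_onset) / (t_peak - t_onset)


def linear_contrast(values, weights=(-3, -1, 1, 3)):
    """Linear polynomial contrast over four equally spaced levels (TR4-7)."""
    return sum(w * v for w, v in zip(weights, values))


# Hypothetical response: 0% at onset (7.1 s) rising to 1% at peak (14.8 s).
example_slope = leading_edge_slope(0.0, 1.0, 7.1, 14.8)

# Group mean slopes reported in the text for TR4-7.
accumulator_slopes = [0.13, 0.11, 0.10, 0.10]
recognition_slopes = [0.10, 0.13, 0.10, 0.13]

print(round(example_slope, 3))
print(round(linear_contrast(accumulator_slopes), 2))  # negative: slopes decline with TR
print(round(linear_contrast(recognition_slopes), 2))
```

A negative contrast for the accumulator means is in line with the hypothesis that the leading edge becomes shallower as decision time increases.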
Although the ROIs clustered into three distinct categories that were broadly consistent with our predicted profiles, some ROIs did not cluster as tightly with the idealized response as others. For example, in two “accumulator” ROIs located in left cerebellum (supplemental Fig. 1a, top, available at www.jneurosci.org as supplemental material), activity increased rapidly and independently of TR, but then displayed a more gradual increase that was TR dependent (supplemental Fig. 1b, available at www.jneurosci.org as supplemental material). In ROIs located near the right supramarginal gyrus (+34, −57, +47), right posterior parietal lobe (+49, −48, +47), right inferior precentral gyrus near frontal operculum (+51, +15, +07), and right posterior inferior frontal gyrus (+45, +14, −03) (supplemental Fig. 1a, bottom panel, available at www.jneurosci.org as supplemental material), activity increases occurred relatively late in the trial (supplemental Fig. 1c, available at www.jneurosci.org as supplemental material). The late response suggests that these regions may be more involved in processing occurring at VoA than at TR. In addition, ROIs in or near bilateral striatum (−11, +07, +05; +12, +06, +03), near the head of the caudate nucleus, clustered broadly into the recognition group. However, their time course parameters were least similar to other ROIs in the group (supplemental Fig. 2, available at www.jneurosci.org as supplemental material).
To determine the shape of the BOLD response associated with motor events, we extracted time courses from two sensorimotor regions encompassing precentral and postcentral gyri. Results are shown in supplemental Figure 3, available at www.jneurosci.org as supplemental material. The time courses associated with TR4–5 were clearly bimodal: the initial peak was associated with the motor response at TR and the second peak with the motor response at VoA. The time course profiles for TR6–7 were wider, most likely because perceptual recognition decisions and VoA occurred closely enough in time that the BOLD responses summated. This pattern of activity differed substantially from the patterns observed in the positive waveform cluster. We conclude from these data that the procedure used to identify ROIs by comparing TR with VoA factored out motor processing to a reasonable degree.
The decision process involves analysis of sensory input, gathering of evidence toward behavioral options, and a later process that allows the selection of a contextually relevant behavior and the evaluation of its appropriateness. We present data indicating that neural mechanisms supporting perceptual recognition decisions can be dissociated using fMRI. These include (1) sensory processors in which activity reflects the quantity of stimulus information entering the system, (2) accumulators that may reflect the gathering of information used in making the decision, and (3) a set of processors that are clearly engaged at, but not before, the time of recognition. Overall, the results help define a hierarchy of neural mechanisms involved in sorting inputs, gathering evidence, and deciding and monitoring an appropriate course of action. In addition, they demonstrate that accumulation processes can be identified using fMRI as dynamically evolving signals.
During stimulus revelation, activity in posterior occipital regions initially increased monotonically as a function of the amount of visual information entering the system. Activity in some sensory processors also appeared to vary modestly with TR, as evidenced by peak latencies that shifted positively with TR (Figs. 4a, 5c). It appears that TR may have been influenced somewhat by bottom-up stimulus-specific differences in the amount or type of stimulus information available throughout the revelation process. That is, some objects may have been identified earlier in the trial because more critically identifiable features were unmasked earlier. For example, certain visible elements (e.g., geons) provide a higher degree of object level information than straight lines (Biederman, 1987). If true, then TR-dependent activity in late visual processing areas could have depended on differences in lower-level feature extraction in early visual areas. The Exp2 behavioral data support this alternative, because recognition times for some items tended to be consistent across subjects (see supplemental text, available at www.jneurosci.org as supplemental material).
Regions demonstrating accumulation may compute decision variables, such that the quantity of accrued activity is associated with decision outcome. We found accumulator patterns of response in ROIs in bilateral occipital lobes, fusiform gyrus near lateral occipital complex (Kourtzi and Kanwisher, 2001), dlPFC, and PPL (Fig. 7). The pattern of TR-dependent accumulation of BOLD activity observed in these regions is consistent with the buildup of neural activity found in studies of nonhuman primates making perceptual decisions (Hanes and Schall, 1996; Kim and Shadlen, 1999; Platt and Glimcher, 1999; Shadlen and Newsome, 2001; Ratcliff et al., 2003). Interestingly, in an fMRI study with human participants, Kleinschmidt et al. (2002) used a perceptual task capable of inducing hysteresis and found right lateralized frontal and parietal, and bilateral occipital/temporal areas in which activity increases were related to a perceptual pop-out effect of letter stimuli. The data are also reminiscent of a diffusion process. For example, in Ratcliff's two-choice diffusion model (Ratcliff, 1978; Ratcliff et al., 2004), evidence accumulates at a rate set by a drift parameter, and a decision is generated when the accumulated evidence reaches a response boundary. The level of activity in the accumulator regions may thus reflect neuronal processing relevant to the recognition decision. For instance, when recognition occurred early in the trial, the leading edge of activity followed a steep slope. As recognition time increased, however, the slope became shallower, suggesting a longer diffusion process. Ultimately, perceptual recognition may occur when activity in one or more accumulators surpasses a response threshold. Because we could not perform trial-level analyses, the findings are too broad to test the veracity of different accumulator models. We note the relationship, instead, to build a conceptual link between the current findings and theoretical accounts of decision making.
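The conceptual link between drift rate, decision time, and leading-edge slope can be illustrated with a toy accumulator. This is a deliberately simplified single-boundary sketch, not Ratcliff's two-boundary model and not a model fitted to the present data. The function name, drift rates, boundary, noise level, and time step are all hypothetical.

```python
import random
from math import sqrt


def first_passage_time(drift, boundary=1.0, noise=0.0, dt=0.01,
                       max_time=60.0, seed=0):
    """Time (s) at which accumulated evidence first reaches the boundary.

    Evidence starts at 0 and on each step gains drift * dt plus Gaussian
    noise scaled by sqrt(dt). Returns None if the boundary is not reached
    within max_time.
    """
    rng = random.Random(seed)
    evidence, steps = 0.0, 0
    max_steps = int(max_time / dt)
    while steps < max_steps:
        evidence += drift * dt + noise * sqrt(dt) * rng.gauss(0.0, 1.0)
        steps += 1
        if evidence >= boundary:
            return steps * dt
    return None


# Noise-free case: a lower drift rate reaches the same boundary later,
# so the average slope (boundary / time) of the buildup is shallower.
fast = first_passage_time(drift=0.5)
slow = first_passage_time(drift=0.25)
print(fast, 1.0 / fast)
print(slow, 1.0 / slow)

# With noise, crossing times vary from trial to trial (seed chosen arbitrarily).
noisy = first_passage_time(drift=0.5, noise=0.3, seed=1)
print(noisy)
```

In the noise-free case the crossing time is approximately the boundary divided by the drift rate, so later decisions go together with shallower buildup, which is the qualitative pattern described for the accumulator ROIs.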
It seems plausible that component processes related to analysis of visual features and semantic knowledge could contribute to a pattern of accumulation. Thus, the degree to which the TR-dependent buildup in activity reflects “evidence accumulation” per se is not clear from the data. For example, rather than supporting the presence of an integrative mechanism, it is possible that accumulating activity is a mere byproduct of information processing. Accordingly, the level of activity could be a consequence of the timing by which neurons are recruited during the task. In our view, however, the tight coupling of the rate of activity buildup and TR in our data and in single-unit studies in nonhuman primates (Kim and Shadlen, 1999; Shadlen and Newsome, 2001) suggests that a purely epiphenomenal account is unlikely. However, further experiments are needed to test the content and task specificity of accumulation to the decision-making process.
It is worth noting that some parietal regions displaying accumulation responses are near parietal “retrieval success” regions identified in episodic memory studies using old/new recognition and source memory tasks (Habib and Lepage, 1999; Konishi et al., 2000; McDermott et al., 2000; Donaldson et al., 2001a,b; Wheeler and Buckner, 2003, 2004). For example, a region near the intraparietal sulcus (IPS) (e.g., −26, −68, 38) demonstrated an accumulator pattern of response. Several studies of recognition memory have reported that IPS and precuneus are most active when participants respond “old” and least active when they respond “new,” independently of accuracy (Wheeler and Buckner, 2003; Kahn et al., 2004). The association between the decision outcome and level of BOLD signal suggests that the memory decision may have been based on a simple threshold mechanism. This finding has raised the possibility that in memory tasks the function of such regions is to accumulate and maintain relevant mnemonic information over time. The current data show that activity in IPS gradually accumulates until the moment of object identification, and thus suggest a more domain-general integrative mechanism supporting episodic recognition and perceptual identification.
Activity in a large number of regions, including bilateral thalamus, meFG/pre-SMA, dorsal ACC, and aI/fO, was tightly coupled to the time of recognition (Fig. 8). The precise role of meFG/pre-SMA is unclear, although its function does not appear to be directly related to action planning or execution (Picard and Strick, 2001). The absence of a clear response at VoA supports this view, because the motor demand at VoA did not produce a marked change in signal in meFG/pre-SMA. In contrast, in motor ROIs we observed a transient change in BOLD signal at both recognition and VoA (supplemental Figure 3, available at www.jneurosci.org as supplemental material). The ACC, and more recently the aI/fO, have both been associated with aspects of decision making, including choice (Hampton and O'Doherty, 2007; Thielscher and Pessoa, 2007), error detection (Dehaene et al., 1994), error likelihood (Brown and Braver, 2005), reward evaluation (Rogers et al., 1999; Sanfey et al., 2003), attention for action (Posner et al., 1988; Posner and Petersen, 1990; Bush et al., 1998), response conflict/competition (Bush et al., 1998; Botvinick et al., 1999; Botvinick et al., 2001), confidence (Fleck et al., 2006), and uncertainty (Grinband et al., 2006). The current results do not directly differentiate among these possibilities but do temporally dissociate processes occurring at the moment of recognition and the subsequent verification of that recognition.
The current work extends previous research findings by demonstrating that regions in pre-SMA, ACC, aI/fO, and thalamus show strong activity to cognitive events that arise at, but not before, the moment of recognition. However, they do not appear to be explicitly linked to the decision process. Aside from the recognition decision, a verification decision was made at the VoA response. If these ROIs were obligatorily involved in decision making, then they should display a bimodal time course, the first peak time-locked with TR and the second associated with VoA (Fig. 3d, supplemental Fig. 3, available at www.jneurosci.org as supplemental material). Instead, it appears that activity in areas recruited at the moment of recognition reflects contingencies regarding their recruitment; these regions were clearly more involved at the time of recognition than at the time of verification. Further, it has been hypothesized that several of these regions play a role in error detection (Dehaene et al., 1994). Error trials (not presented) were associated with significantly greater TR-dependent activity than correct trials in the moment of recognition ROIs. However, a pure error detection hypothesis is not supported by the current data because TR-dependent BOLD responses were clearly evident on correct trials. Instead, errors appear to modulate other processing occurring in these areas.
The overall pattern of data suggests a hierarchical framework of neural mechanisms that is recruited during decision making. In this framework, information processing proceeds through sensory processing and evidence accumulation to decision mechanisms, and culminates in a behavior. Presumably, this process begins with the task-level setting of decision criteria and includes postdecision monitoring. Despite the findings presented here, the precise functional relationship between accumulators and moment-of-recognition areas is unknown. For example, it is unclear whether decisions arise from information processed in accumulators or in regions active at the moment of recognition (or elsewhere). Future research on task specificity may be informative in determining whether and how accumulating information is interpreted by decision mechanisms, whether they are task-specific or task-general, and how decisions are reached and enacted.
Randy Buckner, Nico Dosenbach, Ronny Dosenbach, Damien Fair, Anthony Jack, Francis Miezin, and Erik Reichle provided helpful comments and suggestions. Abraham Snyder and Mark McAvoy provided imaging analysis software and support, Kwan-Jin Jung provided magnetic resonance technical assistance, and Kate Fissel assisted with image processing.
Correspondence should be addressed to Mark E. Wheeler, University of Pittsburgh, 608 Learning Research and Development Center, 3939 O'Hara Street, Pittsburgh, PA 15260.