Abstract
The ability to make accurate and timely decisions, such as judging when it is safe to cross the road, is the foundation of adaptive behavior. While the computational and neural processes supporting simple decisions on isolated stimuli have been well characterized, decision-making in the real world often requires integration of discrete sensory events over time and space. Most previous experimental work on perceptual decision-making has focused on tasks that involve only a single, task-relevant source of sensory input. It remains unclear, therefore, how such integrative decisions are regulated computationally. Here we used psychophysics, electroencephalography, and computational modeling to understand how the human brain combines visual motion signals across space in the service of a single, integrated decision. To that end, we presented two random-dot kinematograms in the left and the right visual hemifields. Coherent motion signals were shown briefly and concurrently in each location, and healthy adult human participants of both sexes reported the average of the two motion signals. We directly tested competing predictions arising from influential serial and parallel accounts of visual processing. Using a biologically plausible model of motion filtering, we found evidence in favor of parallel integration as the fundamental computational mechanism regulating integrated perceptual decisions.
Significance Statement
Many decisions require integration of discrete sensory input over time and space, raising the question of how distinct input sources are integrated in the service of a single, integrated decision. Complementing previous research showing that temporal integration exhibits dynamics that are absent from simple decisions, here we characterized the neural processes that support spatial integration. Using computational modeling of electroencephalography data to independently characterize neural responses to discrete sensory input across visual hemifields, in combination with simulations of neural activity under different integration architectures, we tested whether evidence accumulation in support of integrated decisions is serial or parallel.
Introduction
Decision-making has been widely studied in the laboratory by psychologists (Summerfield and Tsetsos, 2015; Ratcliff et al., 2016), economists (Woodford, 2020; Prat-Carrabin and Woodford, 2022), and neuroscientists (Shadlen and Kiani, 2013; O’Connell et al., 2018) over several decades. Significant progress has been made in understanding the computational and neural processes involved, particularly in the realm of simple perceptual decisions (Forstmann et al., 2016; Ratcliff et al., 2016; Hanks and Summerfield, 2017). Although most decision-making paradigms have used just a single task-relevant stimulus—a patch of moving dots (Loughnane et al., 2016), a static grating (McIntyre et al., 2022), or a face (Philiastides et al., 2011)—real-world choices typically require integration of discrete sensory events over time and space. Much less is known about these more complex, “integrated” decisions (Wyart et al., 2012, 2015; Rangelov and Mattingley, 2020). In previous work, we have demonstrated that integrating discrete sensory input over time exhibits dynamics that are absent from simple decisions (Rangelov et al., 2021; McIntyre et al., 2022). Here, we focused on spatial integration and used psychophysics combined with brain imaging and computational modeling to understand how human observers combine spatially discrete sensory inputs in the service of integrated perceptual decisions.
Evidence-accumulation models of perceptual decision-making (Heathcote et al., 2019) posit that sensory evidence is accumulated over time into an abstract decision variable and a choice is made when a threshold is reached. Electrophysiological recordings in animals (Shadlen and Kiani, 2013; Licata et al., 2017; Waskom et al., 2018) and brain imaging in humans (Heekeren et al., 2008; O’Connell et al., 2012; Twomey et al., 2016) have uncovered the neural signatures of evidence accumulation, lending neurobiological support to these models. Recently, multivariate computational modeling of whole-scalp electroencephalography (EEG) has permitted researchers to characterize neural responses in a feature-specific manner, indexing how well the brain represents sensory inputs in near real time, as decisions unfold (Tang et al., 2018; Smout et al., 2019). These feature-specific responses have a gradual onset and increase monotonically toward an asymptotic threshold (Rangelov and Mattingley, 2020; Rangelov et al., 2021; McIntyre et al., 2022). Unlike other neural correlates of evidence accumulation, these analyses can isolate neural responses to several concurrently presented stimuli. In recent work, we found significant motion tuning to a task-relevant random-dot kinematogram (RDK), but not to a spatially overlapping, task-irrelevant RDK of equal salience (Rangelov and Mattingley, 2020; Rangelov et al., 2021). The fact that such feature-specific analyses capture neural responses only to decision-relevant inputs suggests that feature-tuning analyses can reveal the temporal dynamics of evidence accumulation during decision-making. Here we used feature-tuning analyses of human EEG to characterize evidence accumulation in a novel decision-making task.
To investigate decisions that require integration of spatially discrete inputs, we presented two RDKs in the left and the right visual hemifields. Coherent motion signals were shown concurrently in each RDK, and participants reported the average of the two motion signals. Hypothetically, sensory evidence from different locations in the visual field could be sampled and accumulated into a decision variable either serially, one at a time, or in parallel, all at once. Both serial and parallel processing architectures have been identified in the literature, albeit for different cognitive processes (Treisman, 1999; Pashler, 2000). Studies on evidence accumulation from two discrete sensory sources in the service of two decisions have yielded mixed findings, suggesting both parallel (Wyart et al., 2012) and serial mechanisms (Kang et al., 2021). It remains unclear, therefore, which mechanism supports integrated decisions of the kind we are interested in here. To determine which mechanism supports integrated decisions, we simulated the accumulation of inputs by motion-tuned sensory neurons using a biologically plausible model of motion filtering (Adelson and Bergen, 1985; Waskom et al., 2018) and compared the simulated data with participants’ observed brain activity. The simulation analyses revealed that serial evidence accumulation and integration predicts negative correlations between processing of the individual RDKs, whereas parallel processing predicts null correlations.
Materials and Methods
Participants
Forty-two healthy human adults (22 female; mean age = 23 years, SD = 5) took part in the study. They were compensated for their participation at a rate of 20 AUD/h. Participants had normal or corrected-to-normal vision, and all were right-handed as assessed by the Edinburgh Handedness Inventory (Oldfield, 1971). Sample size was determined in advance in accordance with our recent studies (Rangelov and Mattingley, 2020; Rangelov et al., 2021; McIntyre et al., 2022), which used a similar data-analytic approach, while accounting for a projected participant attrition rate of 10–15%. No a priori exclusion criteria were applied. Six participants were excluded due to data corruption, and a further three participants were excluded as outliers on the basis of brain imaging data (see below), leaving 33 complete datasets (15 female; mean age = 24 years, SD = 5). The study was approved by the Human Research Ethics Committee of The University of Queensland (Approval 2016001247) and was conducted in accordance with the Human Subjects Guidelines of the Declaration of Helsinki. Written informed consent was obtained from all participants prior to experimental testing.
Stimuli and procedure
Each trial (Fig. 1A) started with a central, red fixation cross (height/width, 0.25 degrees of visual angle, i.e., dva, RGB = [255, 0, 0]) shown against a black background (RGB = [0, 0, 0]). After 2–3 s, two circular patches of moving white dots (diameter, 8 dva) appeared, situated 8 dva to the left and right from the fixation cross. Each patch comprised 150 dots (diameter, 0.16 dva; lifetime, 50 ms) moving at 8 dva/s. The dots moved randomly for 0.5–1 s, after which 80% of dots in each patch started moving coherently for 0.8 s, followed by a blank screen for 0.2 s. Coherent motion events were displayed well above the normal perceptual threshold for motion at fixation (Scase et al., 1996) because the RDKs were presented in the periphery and the task required averaging of the two signals rather than discrimination of either component-motion alone. The motion direction of the left patch was randomly and uniformly sampled on every trial from directions spanning the full circle (0–2π radians). The motion direction of the right patch was randomly and uniformly sampled from directions shifted ±(0.25π–0.75π) radians relative to the motion of the left patch. Participants were instructed to monitor the motion directions of both patches while maintaining fixation.
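The direction-sampling scheme above can be sketched in a few lines (a minimal illustration, not the experiment code; function names are ours, and the vector-sum averaging rule is a standard choice for computing the mean of two directions):

```python
import numpy as np

def sample_trial_directions(rng):
    """Sample left/right coherent-motion directions as described above.

    The left direction is uniform on the full circle; the right direction is
    offset by a random angle of magnitude 0.25*pi to 0.75*pi radians, with a
    random sign (an assumption consistent with the +/- range in the text).
    """
    left = rng.uniform(0.0, 2.0 * np.pi)
    offset = rng.uniform(0.25 * np.pi, 0.75 * np.pi) * rng.choice([-1.0, 1.0])
    right = (left + offset) % (2.0 * np.pi)
    return left, right

def circular_average(a, b):
    """Average two directions on the circle via their summed unit vectors."""
    return float(np.angle(np.exp(1j * a) + np.exp(1j * b))) % (2.0 * np.pi)
```

Capping the offset at 0.75π keeps the two directions from being nearly opposite, for which the circular average would be ill-defined.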
A response display appeared at the end of each trial and comprised a central, gray circle (diameter, 8 dva; RGB = [125, 125, 125]) together with a blue dial (length, 4 dva; RGB = [0, 0, 255]). Participants were instructed to reproduce the average motion direction of the left and the right coherent motion events by adjusting the orientation of the dial using a computer mouse. They had 3 s to respond, after which a new trial started.
Prior to the main testing session, participants completed a training session comprising one block of 85 trials. The trial structure during training was identical to that of the main session, with the exception that a feedback display was added at the end of every trial for 1 s, showing the correct answer. In the main testing session, participants completed three blocks of 96 trials (i.e., 288 trials) separated by short rest breaks. The whole testing session took ∼45 min to complete. As participants performed the task, their brain activity was recorded using EEG. Participants also had inactive electrodes attached to their scalp for a transcranial electrical stimulation (TES) protocol, the results of which are not reported here. The active TES and sham sessions were completed on different days, and the TES-to-session assignment was randomized across participants using a Latin square. Experimenters were aware of the TES condition (active vs sham) per session, while participants were not informed about the nature of the TES.
Apparatus
The stimuli were presented on a 24 inch LED monitor (screen resolution, 1,920 × 1,080 pixels; refresh rate, 60 Hz). Participants were seated ∼57 cm in front of the monitor. The stimuli were generated using Matlab with Psychtoolbox on a Dell Precision T1700 computer with Quadro K4000 graphics card, running Microsoft Windows 7 Enterprise, 64 bit. Brain activity was recorded using a 24 bit BioSemi ActiveTwo EEG system comprising 128 Ag/AgCl scalp electrodes positioned in the 10–20 layout with a resolution of 31.25 nV (±262 mV recording range) at a sampling rate of 1,024 Hz.
Behavioral analyses
For every trial, error magnitudes were computed as the angular difference between the reproduced average motion direction and the true average of the two coherent motion signals on that trial.
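The error computation can be sketched as follows (our naming; directions are assumed to be in radians, and errors are wrapped to the conventional half-open interval [−π, π)):

```python
import numpy as np

def angular_error(reported, target):
    """Signed angular difference, wrapped to [-pi, pi).

    An error of exactly pi maps to -pi; zero means a perfect reproduction.
    """
    return (reported - target + np.pi) % (2.0 * np.pi) - np.pi
```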
EEG analyses
The MNE-Python library was used for EEG data analyses (Gramfort et al., 2013). Continuous recordings were preprocessed off-line using the automated FASTER pipeline (Nolan et al., 2010). Briefly, the EEG channels were rereferenced using the average of all electrodes. Using a fourth-order IIR Butterworth filter, the data were bandpass filtered in the range 0.1–40 Hz and notch filtered at 50 Hz to remove electrical line noise. Bad channels were identified and interpolated using adjacent electrodes (9 out of 128 channels on average across participants; SD = 2). All channels were then rereferenced again to the average electrode. Independent component analysis (ICA) was used to identify and remove eye-movement and other artifacts. The ICA yielded 70 components on average across participants (SD = 12), of which nine components on average (SD = 2) were identified as artifactual and removed. The data were segmented into 1 s epochs time locked to the onset of coherent motion, down-sampled to 256 Hz, baseline-corrected using the average amplitude in the −0.05 to 0.05 s time window, and linearly detrended. Since the coherent motion signals were preceded by a variable buffer period (0.5–1 s) containing random motion, the time window for baseline correction was selected to minimize the influence of neural responses to the random motion signals. There was no reason to believe that including 50 ms poststimulus would interfere with the ability to decode motion signals, since our previously published work using a similar paradigm showed that significant motion decoding does not arise until ∼200 ms after the onset of coherent motion signals. The segmented data were spatially filtered using a surface Laplacian and temporally smoothed using a Gaussian window (SD = 16 ms). Bad epochs were identified and excluded from further analyses (14 epochs per participant on average; SD = 6), leaving 274 epochs on average across participants for further analyses.
The final step of the FASTER processing pipeline involved z-scoring of the ERPs over all conditions per participant relative to other participants. Participants with an absolute z-score higher than 3 (i.e., >3 standard deviations from the sample mean) were considered as outliers and removed from further analyses. Three participants were identified as outliers on the basis of the EEG data and excluded from further analyses.
Motion tuning analyses
Analyses of the segmented EEG data focused on characterizing motion-specific patterns (Wolff et al., 2017) of responses across all electrodes. When analyzing responses to the left motion direction, for example, for every trial n (i.e., test trial), the remaining trials (i.e., training trials) were sorted into 16 equidistant bins spanning the full circle (i.e., bin centers ∈ {−π, …, −0.6π, …, −0.2π, …, 0.2π, …, 0.6π, …, π}), relative to the angular difference between the left motion presented in the test and the training trials. For each bin b and time sample t, a multivariate Mahalanobis distance was computed between the activity pattern on the test trial and the average pattern of the training trials in that bin, using the covariance matrix estimated from the training data; smaller distances indicate greater similarity between the test pattern and that bin.
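A sketch of the distance computation for one test trial and one time sample (our naming; the diagonal shrinkage of the covariance is an illustrative regularization choice, as the exact estimator is not specified here):

```python
import numpy as np

def mahalanobis_to_bins(test_pattern, train_patterns, train_bins, n_bins=16):
    """Mahalanobis distance from one test-trial pattern to each bin average.

    `train_patterns` is (n_trials, n_channels); `train_bins` holds each
    training trial's bin index. Covariance is estimated from the training
    patterns with diagonal shrinkage (illustrative regularization).
    """
    cov = np.cov(train_patterns, rowvar=False)
    shrink = 0.1  # illustrative shrinkage amount
    cov = (1 - shrink) * cov + shrink * np.diag(np.diag(cov))
    icov = np.linalg.pinv(cov)
    dists = np.full(n_bins, np.nan)
    for b in range(n_bins):
        sel = train_bins == b
        if not sel.any():
            continue
        delta = test_pattern - train_patterns[sel].mean(axis=0)
        dists[b] = np.sqrt(delta @ icov @ delta)
    return dists
```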
Simulation analyses
To investigate neural mechanisms of sensory integration, responses of hypothetical motion-tuned neurons were simulated using a biologically plausible model of motion filtering (Waskom et al., 2018). Briefly, this model simulates responses of motion-tuned neurons which respond preferentially to a specific motion direction (e.g., 12 o’clock) within a specific region of visual field (e.g., around fixation). Visual displays comprising dot motion stimuli were generated using the same code as used in the experiment proper. Only the 0.8 s periods of the left- and right-sided coherent motion events were generated, together with 0.2 s of blank display at the end. In addition, the left and right patches were algorithmically shifted to the center of the display, mimicking pooling of sensory input across space by a single, integrative neuron. Since the stimuli were presented around fixation, all simulated neurons were preferentially tuned to the central area of the visual field. The sensory input was simulated in two different ways (Fig. 4A). Parallel sampling involved presenting the left and the right patches together (i.e., overlapping) on every frame. Serial sampling, in contrast, involved presenting either the left patch alone or the right patch alone on every frame. Two sampling frequencies were used: 10 Hz sampling involved presenting, e.g., the left patch for 100 ms, then randomly choosing which patch (i.e., either the left again or the right) to present for the next 100 ms, and so on. For 5 Hz sampling, the same procedure was used, the only difference being that a patch was presented for 200 ms. One hundred trials were simulated, and each trial had three different sampling versions (parallel sampling, 5 Hz serial, and 10 Hz serial). For every trial, 16 neurons were simulated, differing in their preferred motion directions. The preferred motion directions spanned a full circle, ranging from completely opposite to the motion signal shown in the display (e.g., offset by π radians) to perfectly aligned with it.
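The three sampling regimes can be sketched as a per-frame schedule (a minimal illustration with hypothetical function and label names, assuming a 60 Hz display as in the experiment):

```python
import numpy as np

def sampling_schedule(n_frames, mode, frame_rate=60, rate_hz=10, rng=None):
    """Which patch feeds the integrator on each display frame.

    Returns per-frame labels: "both" on every frame for parallel sampling,
    or "left"/"right" chosen anew every 1/rate_hz s for serial sampling
    (10 Hz -> 100 ms blocks; 5 Hz -> 200 ms blocks at 60 frames/s).
    """
    rng = rng or np.random.default_rng()
    if mode == "parallel":
        return np.array(["both"] * n_frames)
    block = int(round(frame_rate / rate_hz))  # frames per sampling block
    labels = []
    while len(labels) < n_frames:
        labels.extend([rng.choice(["left", "right"])] * block)
    return np.array(labels[:n_frames])
```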
Similar to the multivariate feature-specific analyses, the simulation analyses yielded a matrix of simulated accumulated responses across neurons with different preferred directions and time samples, which could be analyzed in the same way as the empirical tuning profiles.
Noise correlations
To further investigate whether signals in the left and right patches were integrated in a parallel or serial manner, the correlations between the instantaneous tuning noise for the left and the right motion patches were analyzed. If, for example, only the left motion patch is sampled at any given moment, then tuning to the left motion events should be robust and substantial. Since in this example the right motion stimulus is not sampled at all, tuning to the right motion event should only reflect noise in the sensory input, predicting a negative correlation between the two. In contrast, the parallel model predicts no correlation as the tuning noise is independent across different motion signals. To compute the noise correlations, the tuning time-traces across trials were demeaned relative to the average of all trials. Next, the correlation coefficient was computed (Spearman's ρ) between the residual tuning to the left and the right motion events across trials and per time sample. Finally, the median correlation coefficient across time samples was computed as the overall estimate of noise correlations. To estimate distributions of the noise correlations, the correlations for 100 simulated trials were bootstrapped by random and uniform sampling with replacement 1,000 times. In addition to estimating noise correlations between tuning to the left and the right motion signals, noise correlations for the average motion representations were also examined. These analyses were performed separately for three different sampling algorithms (5 Hz serial, 10 Hz serial and parallel sampling).
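The noise-correlation computation can be sketched as follows (our naming; the demeaning, per-sample Spearman correlation, median over time, and trial-level bootstrap follow the description above):

```python
import numpy as np
from scipy.stats import spearmanr

def noise_correlation(tuning_left, tuning_right, rng=None, n_boot=1000):
    """Median-over-time Spearman correlation of tuning residuals.

    `tuning_left`/`tuning_right` are (n_trials, n_times) tuning-strength
    traces. Traces are demeaned across trials, correlated across trials at
    each time sample, and summarized by the median over time; a bootstrap
    over trials estimates the sampling distribution.
    """
    rng = rng or np.random.default_rng()
    res_l = tuning_left - tuning_left.mean(axis=0, keepdims=True)
    res_r = tuning_right - tuning_right.mean(axis=0, keepdims=True)

    def median_rho(idx):
        rhos = [spearmanr(res_l[idx, t], res_r[idx, t])[0]
                for t in range(res_l.shape[1])]
        return np.median(rhos)

    n = res_l.shape[0]
    point = median_rho(np.arange(n))
    boot = np.array([median_rho(rng.integers(0, n, n)) for _ in range(n_boot)])
    return point, boot
```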
While the analyses of noise correlations in the simulated neural responses would demonstrate that different sampling algorithms predict different patterns of correlations, it is unclear which sampling algorithm is implemented by the brain. The final analysis characterized the noise correlations between tuning strengths estimated using the recorded EEG activity. As with the analyses described above, the tuning time traces were first demeaned. Next, the demeaned time traces across all trials per participant were used to compute correlation coefficients between tuning to different analyzed motion directions, separately per time sample. Finally, the median correlation across time samples was computed as the estimate of the overall correlation coefficient. The distributions of correlation coefficients across participants were qualitatively compared with the predictions of the serial and parallel models generated on the basis of the simulated neural responses.
Experimental design and statistical analyses
A within-participants experimental design was used with one factor (motion direction) and three levels (the left and right component motions and their average). One-sample, two-tailed t tests and one-way repeated-measures ANOVAs, corrected for multiple comparisons using the FDR procedure (Benjamini and Hochberg, 1995), were used for statistical analyses. All tests were evaluated at α = 0.05.
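For reference, the Benjamini–Hochberg FDR step can be sketched in a few lines (a standard implementation of the cited procedure, not the study's code):

```python
import numpy as np

def fdr_bh(pvals, q=0.05):
    """Benjamini-Hochberg procedure: which p-values survive FDR control.

    Sort the m p-values, find the largest k with p_(k) <= (k / m) * q,
    and reject the hypotheses corresponding to the k smallest p-values.
    Returns a boolean rejection mask in the original order.
    """
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresh = (np.arange(1, m + 1) / m) * q
    below = p[order] <= thresh
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])
        reject[order[: k + 1]] = True
    return reject
```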
Data and code accessibility
The collected data together with the code used for analyses and reporting are available at the eSpace repository of the University of Queensland under a “reuse with acknowledgment” license (https://doi.org/10.48610/5485667).
Results
Behavioral analyses
Participants’ responses were quantified in terms of error magnitude (i.e., the reproduced average motion direction, relative to the true average of the two motion signals for that trial). The distributions of error magnitudes were unimodal (Fig. 1C), centered on zero, and fairly narrow, indicating that observers were able to make decisions on the average direction of motion. To independently characterize random guessing and noisy target responses, observers’ error magnitudes were fitted using mixture distribution modeling (Bays and Husain, 2008; Zhang and Luck, 2008). We modeled the noisy target responses as a combination of veridical and reversed motion percepts (e.g., left veridical/right reversed, etc.). Participants guessed in 8% of trials (SEM = 2%) and experienced motion reversals in 9% of trials (SEM = 3%). The estimated precision of target responses was fairly high (K, M/SEM = 9.89/1.26), equivalent to 55° full width at half maximum on average. For reference, the average tuning width of direction-selective neurons in macaque area MT is 60° (Treue et al., 2000), suggesting that participants performed the task well.
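The mixture model can be illustrated with a simplified density (our reconstruction of the standard mixture approach; for simplicity the reversal component is centered π away from the target, whereas in the study reversals correspond to averages computed with one component direction flipped):

```python
import numpy as np
from scipy.stats import vonmises

def mixture_pdf(err, kappa, p_guess, p_reversal):
    """Density of response errors under a simplified three-part mixture.

    Target responses: von Mises centered on the true average (zero error);
    reversals: von Mises centered pi away (a simplification, see text);
    guesses: uniform on the circle. `err` is in radians.
    """
    p_target = 1.0 - p_guess - p_reversal
    return (p_target * vonmises.pdf(err, kappa, loc=0.0)
            + p_reversal * vonmises.pdf(err, kappa, loc=np.pi)
            + p_guess / (2.0 * np.pi))
```

In practice the parameters (κ, guess rate, reversal rate) would be fitted per participant by maximizing the likelihood of the observed errors under this density.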
By experimental design, the angular distance between the two component motion signals varied randomly between trials. We asked whether participants’ judgments of the average motion direction on each trial might be more accurate for cases in which the two component motion signals were more similar in their directions, compared with trials in which the difference between component-motion directions was larger. To characterize the effect of angular distance, trials were binned using a median split, and mixture distribution models were refit separately for small angular distances (67° on average) and large angular distances (113° on average). The effect of angular distance was negligible for all fitted parameters as revealed by one-sample two-tailed t tests (guesses: 13.5 and 8.9% for small and large distance, respectively; reversals: 7.2 and 9.8%; precision of target responses: 10.84 and 10.59, all p > 0.10). These results suggest that the angular distance between component motion signals had a minimal effect on the accuracy of participants’ integrated decisions. Since behavioral analyses revealed no effects of angular distance, these analyses were not performed over the neural data.
Mass univariate EEG analyses
Continuous EEG data were preprocessed off-line using the FASTER pipeline (Nolan et al., 2010) and segmented into 1 s epochs, time locked to the onset of coherent motion. In the first step, to determine whether individual electrodes carried any feature-selective information about the motion signals, we regressed time-resolved voltage at individual electrodes on the left and right component motions and the average motion direction using the same methods as in our previously published work (Smout et al., 2019). With the exception of a few brief, phasic responses following signal onset, neural activity recorded from individual electrodes was not selective for either of the component motion signals (left, right) or their computed average. Importantly, there was no evidence for selective responses over frontal electrodes, which would have been apparent had participants made eye movements to track the motion signals.
Population tuning EEG analyses
We chose to measure neural responses using EEG because this method affords millisecond-level temporal resolution, and because we and others have shown that time-resolved direction-selective responses to brief dot motion stimuli can be reliably decoded from EEG recordings using multivariate decoding to characterize patterns of brain activity across all electrodes (Brouwer and Heeger, 2009; Garcia et al., 2013; Rangelov and Mattingley, 2020; Rangelov et al., 2021; McIntyre et al., 2022). The decoded signals (Fig. 2A, Materials and Methods) correspond to the aggregated activity of hypothetical neural populations responding preferentially to different motion directions. The profile of the decoded signals resembles a bell-shaped curve defined by the match between the preferred and analyzed motion directions. We used this curve to quantify population-level motion tuning to the left and right component motion signals, as well as to the derived (but never shown), average motion direction. Comparing the tuning time-courses across different motion directions allowed us to compare neural representations of the different component motion signals and their average and thus to characterize the neural mechanisms underlying integrated decision-making.
All three analyzed motion directions (the left and the right component motions and their average) were concurrently and robustly represented in patterns of brain activity (Fig. 2B). To characterize the overall time course of direction-specific neural activity, an aggregate index of tuning strength was computed as a weighted sum of similarities across bins (Fig. 2B and Materials and Methods). Direction-specific responses started at ∼0.2 s after motion onset and remained strong for the duration of the epoch (Fig. 3A). This onset time is too late (Di Russo et al., 2002) for the direction-specific activity to reflect early sensory responses (e.g., the N1/P1), but it closely matches other well-documented correlates of decision-making such as the central-parietal positivity, i.e., the CPP (O’Connell et al., 2012; Twomey et al., 2016; Rangelov and Mattingley, 2020; Rangelov et al., 2021; McIntyre et al., 2022). Interestingly, tuning to the derived, average motion direction—which was never actually presented in the displays—was stronger than tuning to either of the two component motion stimuli displayed in the left and right hemifields.
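The aggregate tuning-strength index can be sketched as a cosine-weighted sum of bin similarities (cosine weighting is a common convention in this literature, e.g., Wolff et al., 2017; the study's exact weights are not given here):

```python
import numpy as np

def tuning_strength(similarities, bin_centers):
    """Aggregate tuning strength: cosine-weighted mean of bin similarities.

    `similarities` has shape (..., n_bins) and `bin_centers` gives each
    bin's angular distance in radians. The index is positive for
    bell-shaped tuning centered on zero distance and near zero for a
    flat (untuned) profile.
    """
    w = np.cos(bin_centers)
    return (similarities * w).sum(axis=-1) / w.size
```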
The literature on perceptual decision-making has made a conceptual distinction between two mechanisms by which sensory information is passed to evidence accumulation processes. Either noisy samples of sensory input are passed directly to an accumulation stage or transfer is mediated by a short-term memory buffer. Conceptually, decisions about motion direction should require some form of storage, as directional information can only be derived from location changes over time, lending support to the latter idea (Smith and Lilburn, 2020). Further, research on the interplay between selective attention and evidence accumulation has revealed an important role for memory in decision-making (Smith and Ratcliff, 2009). Finally, research on changes of mind has shown that sensory input stored in a memory buffer can influence postdecisional processing (Stone et al., 2022). Thus, there is an emerging view that information transfer between perception and action during decision-making is mediated by memory. Returning to the results of the current study, it is possible that the population tuning traces we report reflect a gradual increase in the precision of decision-relevant sensory representations stored in a memory buffer, which in turn provides input to response selection processes. Interestingly, encoding sensory input into a memory buffer and response selection can both be described as an accumulation process, but with conceptually different outcomes: whereas encoding addresses the “what” question, response selection addresses the “how” question. Of note, most neural correlates of decision-making considered in the literature (e.g., ramping activity of individual neurons in area LIP in macaque, and the CPP measured in humans via EEG; O’Connell et al., 2018) appear to reflect the encoding processes (i.e., the “what”), rather than the response selection processes (i.e., the “how”). 
To the extent that our population tuning analyses capture a gradual increase in observers’ beliefs about the “what” question, the implication is that our population tuning measure is broadly comparable to these other, more familiar, correlates of decision-making.
Next, we considered two ways in which the brain could process component motion signals and their average. On the one hand, it might represent the component motion signals independently of the average motion direction. On the other, it might represent the average motion direction only, with tuning to the component motion directions merely reflecting their correlations with the average motion direction. To address this issue, we quantified the mean profiles of bin similarities for the two component motion signals (Fig. 3C, teal and red lines) and their average (violet) per participant by averaging bin similarities (Fig. 2B) across all trials and all time samples within a time window of 0.2–1.0 s after the onset of coherent motion signals (during which motion tuning was statistically significant, as shown in Fig. 3A). To quantify the separation between neural tuning profiles, we subtracted bin similarities across different bin distances and different tuning profiles (see Materials and Methods for details). For the left motion direction, for example, the bin similarity in the tuning profile for the left motion at a bin distance of zero (Fig. 3C, upright teal triangle) was subtracted from the bin similarity in the tuning profile for the average direction at the bin equal to the angular distance between the left and the average motion direction (Fig. 3C, inverted violet triangle) on that trial. This quantity reflects the trial-specific separation between tuning profiles for the left motion direction and the average motion direction, which we call a “decision function”, or DfL. To facilitate comparisons between participants, participant-specific DfL scores were z-scored across trials.
If the brain represents only the left motion direction, the DfL distribution should be centered on zero. If, in contrast, the brain represents the left motion direction separately from the average direction, the DfL distribution should be shifted away from zero. As shown in the top panel of Figure 3D, the grand-average histogram showing DfL scores (teal bars) exhibits a clear negative shift. An analogous decision function was computed for the average motion direction (DfA). This involved subtracting the bin similarity in the tuning profile for the left component at the bin equal to the angular distance between the left motion direction and the average (Fig. 3C, teal square) from the bin similarity in the tuning profile for the average motion direction at zero distance (Fig. 3C, violet circle). The participant-specific DfA scores were again z-scored across trials. If the brain represents only the average motion direction, then the DfA distribution should be centered on zero. If, in contrast, the brain represents the average direction separately from the left motion direction, the DfA distribution should be shifted away from zero. The top panel of Figure 3D clearly shows that the estimated DfA distribution was shifted positively. We used the same approach to compare tuning profiles for the right component motion direction and the average (Fig. 3D, bottom panel) and obtained a very similar pattern of negative and positive shifts in the decision functions.
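The per-trial decision function for the left component can be sketched as follows (our naming; we assume bin index 0 corresponds to zero angular distance):

```python
import numpy as np

def decision_function_left(sim_left, sim_avg, delta_bins):
    """Per-trial separation between left and average tuning profiles (DfL).

    `sim_left`/`sim_avg` are (n_trials, n_bins) bin-similarity profiles with
    column 0 meaning zero angular distance; `delta_bins[i]` is the bin index
    matching the left-to-average angular distance on trial i. DfL is the
    average profile's similarity at that distance minus the left profile's
    similarity at zero distance, z-scored across trials.
    """
    n = sim_left.shape[0]
    df = sim_avg[np.arange(n), delta_bins] - sim_left[:, 0]
    return (df - df.mean()) / df.std()
```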
Taken together, these results support the notion of distinct neural representations for all three motion directions (Fig. 3D). To quantify the degree of separation between distributions, we derived ROC curves per participant (McNicol, 2005). Figure 3E shows the grand-average ROC curves for discriminating between the average and the left component motion direction (teal curve) and between the average and the right component motion direction (red curve). Both ROC curves clearly deviate from chance (dashed diagonal line), indicating a reliable separation between neural representations of each component motion signal and the average. From the ROC curves, we estimated area-under-the-curve scores, which were significantly above chance (AUC M/SEM = 0.69/0.06 and 0.69/0.07 for the left and right component motion signals, relative to 0.50 for random guessing; both one-sample t ≥ 2.93, both two-tailed p < 0.01).
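AUC scores can be computed directly from two score distributions via the rank-sum (Mann–Whitney) identity (a standard equivalence, not necessarily the study's implementation):

```python
import numpy as np
from scipy.stats import rankdata

def auc(pos, neg):
    """Area under the ROC curve via the rank-sum identity.

    Equals the probability that a random draw from `pos` exceeds a random
    draw from `neg` (ties counted as 0.5); 0.5 corresponds to chance.
    """
    pos, neg = np.asarray(pos), np.asarray(neg)
    ranks = rankdata(np.concatenate([pos, neg]))
    n_pos, n_neg = pos.size, neg.size
    u = ranks[:n_pos].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)
```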
Simulations of neural population tuning
To investigate whether the neural representation of the derived average direction relied upon parallel or serial spatial integration of the component motion signals, a biologically plausible model of motion filtering (Adelson and Bergen, 1985; Kiani et al., 2008; Waskom et al., 2018) was used to simulate responses of motion-tuned integrator neurons to different types of sensory input (Fig. 4A). The model assumes that motion-tuned neurons operate as spatiotemporal filters such that they preferentially respond to a specific motion direction (i.e., the preferred direction) within a specific area of the visual field. Similar to the multivariate decoding analyses described above, characterizing response profiles of a bank of neurons tuned to different motion directions in the same area can reveal population tuning strength to different motion directions.
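The idea of a bank of direction-tuned units can be illustrated with a toy population; here a von Mises tuning curve stands in for the full spatiotemporal filters (the tuning width kappa and the 16-unit bank are arbitrary illustration choices, not model parameters from the study):

```python
import numpy as np

def population_response(stim_dir, preferred_dirs, kappa=2.0):
    """Response of a bank of direction-tuned units to a stimulus moving
    in stim_dir (radians). Each unit's response peaks (at 1.0) when the
    stimulus matches its preferred direction and falls off with angular
    distance, following a von Mises profile."""
    return np.exp(kappa * (np.cos(preferred_dirs - stim_dir) - 1.0))

# 16 units with preferred directions evenly spanning the circle
prefs = np.linspace(0, 2 * np.pi, 16, endpoint=False)
resp = population_response(np.pi / 2, prefs)
```

Reading out the profile of such responses across the bank yields the population tuning strength for a given motion direction.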
Using the code that generated the visual displays shown to participants, different motion stimuli were created to mimic either serial or parallel integration of component motion signals (Fig. 4A, left). The left and right RDKs were algorithmically shifted to the center of the display so that they overlapped. For serial sampling, only the left or the right RDK was shown on any frame (see Materials and Methods). The sampling was simulated at 10 and 5 Hz, which broadly overlap with the frequencies at which perceptual and attentional systems are thought to sample sensory input (VanRullen, 2016). For parallel sampling, the two RDKs were shown together on every frame. Simulated visual displays served as input to a bank of neurons that differed in their preferred motion direction. To simulate accumulation of sensory evidence, the estimated neural activity was summed along the time axis, which yielded a robust monotonic increase in activity for neurons tuned to the analyzed motion direction and a flat profile for neurons tuned to the opposite direction (Fig. 4A, right). Simulated motion energy (Fig. 4B; see Materials and Methods) was estimated separately for the three possible sampling mechanisms (parallel, 5 Hz serial, 10 Hz serial) and for the three different motion directions (left, right, average). In all scenarios, the time course of accumulated motion energy closely mimicked the expected temporal dynamics of evidence accumulation, with a gradual onset and rise toward an asymptotic threshold. Further, the rate of accumulation was higher for the average motion direction than for the component motion signals, independently of the integration mechanism. Finally, the simulated motion energy for the average motion direction reached a higher asymptote than either of the two component motion directions.
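The contrast between the sampling schemes can be caricatured with a toy accumulator. This is not the Adelson–Bergen filter model itself, only the sampling logic, with per-frame alternation standing in for the 5 and 10 Hz serial schedules (all names and numbers are hypothetical):

```python
import numpy as np

def accumulate_energy(frame_energy, schedule):
    """Accumulate per-frame motion energy under a sampling schedule.

    frame_energy: (n_frames, 2) energy for the left and right patches.
    schedule: (n_frames, 2) 0/1 gain mask; parallel sampling = all
    ones, serial sampling = alternating one-hot rows.
    """
    sampled = frame_energy * schedule
    # evidence accumulation: running sum along the time axis
    return np.cumsum(sampled.sum(axis=1))

n = 60                                   # number of display frames
energy = np.ones((n, 2))                 # constant energy in both patches
parallel = np.ones((n, 2))               # both patches sampled every frame
serial = np.zeros((n, 2))
serial[np.arange(n), np.arange(n) % 2] = 1.0  # alternate patches per frame
trace_par = accumulate_energy(energy, parallel)
trace_ser = accumulate_energy(energy, serial)
```

Both schedules yield monotonically rising traces; in this caricature the parallel trace simply rises twice as fast, whereas in the full filter model the differences between schemes are more subtle, which is why the time-course analyses alone could not discriminate them.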
Broadly speaking, simulating accumulation of neural responses to different motion directions revealed that all three hypothetical integration mechanisms were consistent with the EEG results.
To test these observations, the jack-knifed time traces were modeled using three parameters: the onset, the rate, and the maximum amplitude (see Materials and Methods). Repeated-measures ANOVAs for the onset parameter, conducted separately for each integration type, revealed mixed results. For the 10 Hz serial model, there was a statistically significant main effect of motion direction (Fc = 3.44; p = 0.034), with the slowest onset for the right component motion (82 ms) relative to the left (40 ms) and average motion (60 ms) signals, which did not differ from each other (p > 0.10). For the 5 Hz serial model, in contrast, the effect of motion direction on the onset parameter was not statistically significant (p = 0.347). Finally, for the parallel model, the onset was slower for the two component motion signals (63 ms on average) than for the average motion (60 ms; Fc = 9.22; p < 0.001). Repeated-measures ANOVAs for the rate parameter, in contrast, were consistent across scenarios: there was a statistically significant main effect of motion direction (all Fc > 7.68; all p < 0.001), reflecting a higher accumulation rate for the average motion direction (2.31 a.u./s) relative to the component motions (1.98 a.u./s on average), which were not statistically different from each other (all p > 0.244). Finally, repeated-measures ANOVAs for the maximum parameter revealed a statistically significant effect of motion direction (all Fc > 38.13; all p < 0.001), reflecting a higher maximum for the average motion direction (1.86 a.u. on average) relative to the component motion signals (1.58 a.u. on average), which were not different from each other (all p > 0.10). Taken together, the analyses of the fitted parameters revealed that simulated evidence accumulation for the average motion direction was superior to that for the component motions.
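The three-parameter description of the accumulation traces can be sketched as a piecewise-linear ramp to an asymptote, fitted here by a coarse grid search (a simple stand-in for the actual jack-knifed fitting procedure; the parameter grids and function names are hypothetical):

```python
import numpy as np

def ramp(t, onset, rate, maximum):
    """Piecewise-linear accumulation trace: flat at zero until `onset`
    (in seconds), then rising at `rate` until it saturates at `maximum`."""
    return np.clip(rate * (t - onset), 0.0, maximum)

def fit_ramp(t, y, onsets, rates, maxima):
    """Least-squares fit of the three parameters over a coarse grid."""
    best, best_err = None, np.inf
    for o in onsets:
        for r in rates:
            for m in maxima:
                err = np.sum((ramp(t, o, r, m) - y) ** 2)
                if err < best_err:
                    best, best_err = (o, r, m), err
    return best
```

Comparing the fitted onset, rate, and maximum across motion directions, as in the ANOVAs above, then quantifies how the average direction's accumulation differs from that of the components.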
Critically, the three simulated models were comparable, showing that both serial and parallel integration models could account for the observed EEG data.
Analyses of noise correlations
To determine which mechanism provides a better fit to the obtained data, in a final set of analyses we focused on noise correlations between tuning strengths for different motion directions. Under serial sampling, whenever one motion signal is sampled (e.g., left), the other motion signal (e.g., right), by definition, is ignored. Strong tuning to the sampled motion should therefore coincide with absent or weak tuning for the ignored motion stimulus, thus predicting a negative correlation between the two. Under parallel sampling, in contrast, the two component motion events are sampled concurrently. Their respective tuning strengths should be independent, thus predicting no correlation. The parallel model is also consistent with positive correlations between component motion signals, as random fluctuations in attentional state would affect processing of both. Figure 4C shows pairwise correlations between the simulated accumulated tuning for all three motions across the three accumulation types. As hypothesized, there was a strong negative correlation between the tuning noise for the left and the right component motion signals under the serial assumption and no correlation between component motion signals under the parallel assumption. In addition, the serial model predicts only a weak positive correlation between tuning to the average motion and tuning to either the left or the right component motion signal, whereas the parallel model predicts strong positive correlations.
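The qualitative prediction can be reproduced with a toy simulation in which serial sampling forces the two patches to trade off, whereas parallel sampling draws them independently (trial counts, noise levels, and variable names are arbitrary illustration choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n_trials = 5000

# serial sampling: the fraction of samples devoted to the left patch
# is exactly what the right patch loses, plus measurement noise
gain = rng.random(n_trials)
left_serial = gain + 0.05 * rng.standard_normal(n_trials)
right_serial = (1.0 - gain) + 0.05 * rng.standard_normal(n_trials)

# parallel sampling: the two patches are sampled independently
left_par = rng.random(n_trials)
right_par = rng.random(n_trials)

r_serial = np.corrcoef(left_serial, right_serial)[0, 1]
r_par = np.corrcoef(left_par, right_par)[0, 1]
```

In this toy version, the serial trade-off yields a strongly negative trial-by-trial correlation between the two tuning strengths, while the parallel scheme yields a correlation near zero, mirroring the signatures tested in Figure 4C.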
To test these predictions, pairwise tuning-noise correlations were computed for the observed EEG data (Fig. 4C, rightmost panel). Mimicking the simulated parallel model, these analyses revealed no correlations between tuning to the left and right component motion signals and strong positive correlations between tuning to the average and the two component motion signals. In contrast, the noise correlations in patterns of brain activity were inconsistent with the predictions of the two serial models (5 and 10 Hz).
Discussion
We characterized the neural mechanisms that support integration of spatially discrete motion signals in the service of a single, integrated decision in human observers. In contrast to typical perceptual decision-making tasks, our task used a large range of motion directions and required a precise estimate of the average of two spatially discrete signals shown concurrently. To avoid floor effects for the averaging task, we used high motion coherence stimuli in the peripheral RDKs. The behavioral data showed that observers performed the task well. Previous psychophysical and brain imaging studies in humans (Anderson and Burr, 1987; Huk et al., 2002; Born and Bradley, 2005) have found that the extent of the visual field over which motion signals are integrated in the sensory visual cortex is relatively small, subtending 8–10 dva. As our peripheral stimuli were positioned 16 dva apart, it is unlikely that observers’ behavioral choices relied on a single population of sensory neurons that aggregated input from both patches.
Functional brain activity during integrative decision-making was characterized by robust feature-specific responses to both component motion signals and their average. The onset of the feature-specific responses was comparable with other neural correlates of decision-making, such as the CPP (O’Connell et al., 2012; Twomey et al., 2016; Rangelov and Mattingley, 2020), suggesting that the population tuning strength reflects the dynamics of evidence accumulation. This onset was sufficiently early (∼200 ms) to eliminate the possibility that motion tuning reflected response preparation. Moreover, motor preparation predicts a monotonic increase in tuning until response onset (McIntyre et al., 2022), contrary to what we observed. Population tuning to the average unfolded concurrently with tuning to the component motion signals and at a similar rate, but tuning to the average reached a significantly higher asymptote. As lower variability of motion directions per bin may have yielded higher tuning strengths independently of the analyzed motion, it is possible that the stronger tuning observed for the average than for the component motion signals was a consequence of the lower variability of the average than of the component motions (circular SD = 0.31 vs 0.43 π). The simulation analyses, which were not impacted by the variability per bin, also revealed a higher asymptote for the average than for the component signals, showing that different mechanisms are consistent with our findings. Further studies are needed to adjudicate between these alternatives.
Hypothetically, tuning to the average motion direction might result from a passive pooling of motion signals across the whole visual field. This seems unlikely, however, because we have previously shown that when two overlapping RDKs are presented at fixation and only one is task relevant, the brain represents only the task-relevant motion stimulus, as if the task-irrelevant stimulus was not present in the display (Rangelov and Mattingley, 2020; Rangelov et al., 2021). These results strongly suggest that population tuning analyses primarily track task-relevant representations rather than passive pooling of sensory input across the visual field.
Due to the high motion coherence we employed, evidence accumulation for component motion signals might have unfolded serially and finished within the first 200 ms of signal onset. Such a rapid, serial accumulation process, however, seems unlikely for two reasons. First, we presented two concurrent patches in the periphery, thus rendering evidence accumulation more difficult for participants. Second, our participants had to reproduce the average motion direction which demanded a precise representation of component motions and, arguably, prolonged evidence accumulation. Moreover, if computation of the average motion direction was completed within the first 200 ms, it is unclear why the brain should maintain separate representations of the component motion signals and their average throughout the course of the trial, as our results show. Finally, a rapid serial accumulation would still predict negative correlations between decoding of component motion signals, which is inconsistent with the results of our analyses. In the context of the present study and the data-analytical framework we used, our findings render a fast, serial evidence accumulation process unlikely.
While the limited spatial resolution of EEG is not optimal for spatially restricted tuning analyses, its superior temporal resolution was essential for characterizing the time course of integrated decision-making, which was the focus of the present study. Future studies using functional magnetic resonance imaging or magnetoencephalography, which have superior spatial resolution, could focus instead on revealing which specific neural areas are involved in representing raw, unfiltered sensory evidence and which areas represent integrated perceptual decisions. Hypothetically, integration could occur in motion-tuned sensory cortex, e.g., V5/hMT+ (Newsome et al., 1989), in associative brain areas, e.g., parietal cortex (Shadlen and Kiani, 2013), or even in motor areas, e.g., primary and supplementary motor cortex (Purcell et al., 2010). Given the characteristics of our displays, and considering the typical receptive field size of motion-sensitive visual neurons in the human brain (Anderson and Burr, 1987), it is unlikely that integration occurred in sensory areas, at least not during the initial feedforward sweep of visual information. Integration in lateralized sensory cortices would predict a lag between representations of the component signals (left, right) and the integrated representation (average), reflecting the lag involved in transmitting signals between the cerebral hemispheres (Saron and Davidson, 1989). We found no such lag in our EEG recordings, suggesting that representations of the component motion signals might be integrated at the level of associative and/or motor brain regions.
A defining feature of our task is that participants had to sample and integrate evidence from two discrete locations in space. The absence of negative correlations for neural processing of different component motion events suggests that participants sampled sensory input in parallel between visual hemifields. Existing research suggests that parallel sampling across hemifields is possible. For example, monitoring discrete signals is easier between visual hemifields than within hemifields (Strong and Alvarez, 2018; Bland et al., 2020). The integration of discrete signals that are sampled in parallel could take place in at least two different ways. In one, the sampled evidence could be accumulated into two different decision variables, one for each RDK, until a decision about the direction of each component motion is reached. On this account, the average motion direction would be computed only after both component motion signals have been processed, predicting a delay for the average relative to the component representations. Our findings are inconsistent with this prediction. Alternatively, the sampled evidence could be accumulated in parallel into a single decision variable as if the two RDKs were virtually shifted so that they overlapped in space. This account predicts that representations of the two component motion signals and their average should overlap in time, as we observed (Fig. 3A). Overall, our findings are consistent with the parallel integration model.
A recent study addressed a related but distinct aspect of perceptual decision-making. Kang et al. (2021) had their participants judge different attributes of a single RDK stimulus, such as its color or motion direction. In contrast to the results presented here, they found that evidence accumulation for these different attributes takes place serially. Crucially, however, in their task the same stimulus feature (e.g., blue) was mapped to different behavioral responses, raising the possibility of a response selection bottleneck (Pashler, 2000), rather than true serial evidence accumulation. In contrast, here we held stimulus–response mapping constant throughout and allowed observers ample time to rotate the response dial to its desired position. More broadly, it is possible that the computational mechanisms that regulate integration of sensory evidence within attributes (e.g., across two motion signals) are different from those that control integration between attributes (e.g., across motion and color). This is likely to be an important distinction for future studies.
Analyses of behavioral data revealed a negligible impact of angular distance between component motion signals on participants’ representation of the average motion direction. It is likely that the high level of motion coherence used in the present study rendered the task relatively easy even at large angular distances. Hypothetically, the preferred integration mechanism might change as signal strength decreases. To alleviate performance costs, for example, participants might switch to serial integration instead of using parallel processing. While future work could focus on characterizing integration mechanisms at lower signal strengths, the current study demonstrates that parallel evidence accumulation is possible and indeed preferred, provided signal strength is sufficiently high.
Taken together, our results suggest that sensory inputs from the two hemifields are sampled in parallel and accumulated into a single decision variable. This is a striking feature of decision-making mechanisms because previous studies have shown that, when integration is not necessary (Wyart et al., 2015; Rangelov and Mattingley, 2020; Rangelov et al., 2021), evidence accumulation can successfully disregard concurrent but task-irrelevant signals. The neural mechanisms that support dynamic adjustments in contributions of discrete inputs to evidence accumulation probably rely on dynamic reconfigurations of synaptic weights (Duncan, 2001; Duncan and Miller, 2002) connecting different brain regions. Future studies should focus on characterizing the role of distinct functional neural pathways in evidence accumulation processes.
Footnotes
This work was supported by the Australian Research Council (DP220104008) and the National Health and Medical Research Council (APP1186955).
The authors declare no competing financial interests.
Correspondence should be addressed to Dragan Rangelov at d.rangelov{at}uq.edu.au.