Abstract
Categorization is an essential cognitive and perceptual process for decision-making and recognition. The posterior parietal cortex, particularly the lateral intraparietal (LIP) area, has been suggested to transform visual feature encoding into abstract categorical representations. By contrast, areas closer to sensory input, such as the middle temporal (MT) area, encode stimulus features but not more abstract categorical information during categorization tasks. Here, we compare the contributions of the medial superior temporal (MST) and LIP areas to category computation by recording neuronal activity in both areas from two male rhesus macaques trained to perform a visual motion categorization task. MST is a core motion-processing region interconnected with MT and is often considered an intermediate processing stage between MT and LIP. We show that MST exhibits robust decision-correlated motion category encoding and working memory encoding similar to LIP, suggesting that MST plays a substantial role in cognitive computation, extending beyond its widely recognized role in visual motion processing.
SIGNIFICANCE STATEMENT Categorization requires assigning incoming sensory stimuli to behaviorally relevant groups. Previous work found that parietal area LIP shows strong encoding of the learned category membership of visual motion stimuli, while visual area MT shows strong direction tuning but not category tuning during a motion direction categorization task. Here we show that the medial superior temporal (MST) area, a visual motion-processing region interconnected with both LIP and MT, shows strong visual category encoding similar to that observed in LIP. This suggests that MST plays a greater role in abstract cognitive functions, extending beyond its well-known role in visual motion processing.
Introduction
Assigning incoming sensory stimuli into behaviorally relevant categories is essential for recognizing the significance of sensory information and generating task-appropriate behavioral responses. Previous studies, particularly those based on delayed match-to-category (DMC) paradigms (Freedman et al., 2001; Freedman and Assad, 2006), have shown that several cortical areas, including the prefrontal cortex (PFC; Freedman et al., 2001; Cromer et al., 2010; Swaminathan and Freedman, 2012; Roy et al., 2014; Sarma et al., 2016), posterior parietal cortex (PPC; Freedman and Assad, 2006; Fitzgerald et al., 2011; Swaminathan and Freedman, 2012; Swaminathan et al., 2013; Zhou et al., 2021), and inferior temporal cortex (Freedman et al., 2003; Meyers et al., 2008), are involved in the visual categorization process. Recently, the lateral intraparietal (LIP) area (a subdivision of PPC) was shown to play a causal role in perceptual and categorical decisions about visual motion stimuli (Zhou and Freedman, 2019). LIP also shows categorical encoding that is stronger, shorter in latency, and more strongly decision correlated than PFC activity (Swaminathan and Freedman, 2012). A previous study found that the middle temporal (MT) area, a visual motion-processing area upstream from LIP, shows strong direction encoding, but not abstract categorical encoding, of visual motion during the same visual motion categorization task (Freedman and Assad, 2006). How motion direction encoding in MT is transformed into more cognitive, categorical encoding in downstream areas, such as LIP, remains unclear. One possibility is that this transformation is achieved by plasticity in the synaptic connections between MT and LIP neurons as a result of categorical learning. Alternatively, other brain areas may play a role in mediating this transformation. 
Here we address this question by directly comparing the roles of LIP and MST, an important parietal motion-processing area that is reciprocally connected with both LIP and MT (Andersen et al., 1990; Blatt et al., 1990), in visual motion categorization to understand how sensory encoding of visual motion is transformed into flexible, learning-dependent, and task-related categorical representations.
MST has been identified as an important motion-processing area within the dorsal visual pathway. MST is involved in the perception of both 2D and 3D visual motion patterns, and MST neurons typically have large receptive fields (RFs) and are selective both for simple motion stimuli and for complex motion patterns, such as the optic flow generated by one's own motion through the visual world (Saito et al., 1986; Sakata et al., 1986; Duffy and Wurtz, 1991; Celebrini and Newsome, 1994; Graziano et al., 1994; Bradley et al., 1996; Geesaman and Andersen, 1996; Britten, 2008). It has also been suggested that MST is involved in transforming spatial information between different reference frames during self-motion (Andersen et al., 1997; Fetsch et al., 2007). Moreover, dorsal MST (MSTd) integrates visual and vestibular signals (Gu et al., 2007, 2008, 2010, 2012; Yu and Gu, 2018). However, the contributions of MST to cognitive functions have been less studied than those of other PPC areas such as LIP and 7a (Andersen et al., 1997; Williams et al., 2003; Maimon and Assad, 2006). One recent study found striking motion direction selectivity in MST during the delay period of a delayed match-to-sample task (Mendoza-Halliday et al., 2014), whereas selective delay period activity was not observed in MT spiking activity in that study or in recordings from MT during a visual motion categorization task (Freedman and Assad, 2006). Furthermore, MST neurons have been shown to remain active during pursuit eye movements when the tracked target is briefly blanked, indicating extraretinal influences on MST activity (Newsome et al., 1988). 
A previous study that compared neural activity between MST and LIP using a delayed match-to-sample task with ambiguous motion stimuli found that MST encoded extraretinal information reflecting the monkeys' trial-by-trial reports about the ambiguous stimuli, but this encoding was weaker than in LIP (Williams et al., 2003).
Here we directly compared neural activity between MST and LIP while monkeys performed a visual motion DMC task in which they needed to categorize a sample stimulus, maintain category information in short-term memory, and determine whether a subsequent test stimulus was a categorical match to the sample. We found that MST neurons showed significant motion category encoding during the stimulus presentation and memory delay periods of the DMC task, qualitatively similar to that observed in LIP. Also similar to LIP, MST category encoding was correlated with the trial-by-trial categorical decisions of monkeys, as revealed by comparing category selectivity on correct versus error trials. However, our analysis of the timing of category encoding and how that encoding correlated with the categorical decisions of monkeys suggests that LIP is more closely involved in the categorical decision process than MST. In summary, our results show that MST encodes abstract task-related factors such as categorical decisions and working memory, going beyond its traditionally recognized role in visual motion processing. This also gives insight into the functional roles of hierarchically interconnected PPC subregions in perceptual and cognitive functions.
Materials and Methods
Behavioral task and stimulus display
The DMC task is similar to the tasks reported previously, except that two near-boundary directions were added to each category. In this task, monkeys were trained to release a touch-bar when the categories of sequentially presented sample and test stimuli matched, or to hold the touch-bar when the sample and test categories did not match. Stimuli consisted of 10 motion directions (15°, 35°, 55°, 75°, 135°, 195°, 215°, 235°, 255°, 315°) grouped into two categories separated by a learned category boundary oriented at 45° (Fig. 1b). Six directions were evenly spaced (60° apart) and either 30° or 90° from the boundary (the “far” condition), and four additional directions (two per category) were 10° from the boundary (the “near” condition). Trials were initiated by the monkey holding the touch-bar and maintaining central fixation. Monkeys needed to maintain fixation within a 2.5° radius of a fixation point throughout the trial. After 500 ms of fixation, a sample stimulus was presented for 650 ms, followed by a 1000 ms memory delay and a 650 ms test stimulus. If the categories of the sample and test stimuli matched, monkeys needed to release the touch-bar during the test period to receive a juice reward. Otherwise, monkeys needed to hold the touch-bar through the test period and a second delay period (150 ms), wait for the second test stimulus, which was always a match, and then release the touch-bar. Therefore, monkeys concluded all trials with the same motor response (touch-bar release). The motion stimuli were full-contrast, 9°-diameter random-dot movies composed of 190 dots/frame that moved at 12°/s with 100% coherence. Task stimuli were displayed on a 21-inch color CRT monitor (resolution, 1280 × 1024; refresh rate, 75 Hz; viewing distance, 57 cm). Identical stimuli, timing, and rewards were used for both monkeys in all LIP and MST recordings. All 10 motion directions were used as sample stimuli in all recording sessions. 
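The geometry of the stimulus set can be checked with a few lines of arithmetic. The sketch below, assuming only the 10 directions and the 45° boundary stated above, computes each direction's angular distance from the boundary axis (which passes through both 45° and 225°) and which side of it the direction falls on. The side labels "A" and "B" are placeholders, since the text does not state which side is called category 1.

```python
import numpy as np

# The 10 sample directions (degrees) and the 45-degree category boundary from the task description.
DIRECTIONS = [15, 35, 55, 75, 135, 195, 215, 235, 255, 315]
BOUNDARY_DEG = 45.0

def distance_to_boundary(theta_deg, boundary_deg=BOUNDARY_DEG):
    """Smallest angle (deg) between a motion direction and the boundary axis.

    The boundary is an axis, so 45 and 225 deg both lie on it; folding the
    difference into [-90, 90) handles both ends of the axis at once.
    """
    return abs((theta_deg - boundary_deg + 90.0) % 180.0 - 90.0)

def side_of_boundary(theta_deg, boundary_deg=BOUNDARY_DEG):
    """'A' for directions counterclockwise from 45 deg (between 45 and 225 deg), else 'B'.

    'A'/'B' are placeholder labels, not the paper's category numbering.
    """
    return "A" if np.sin(np.deg2rad(theta_deg - boundary_deg)) > 0 else "B"

for d in DIRECTIONS:
    dist = distance_to_boundary(d)
    print(d, side_of_boundary(d), dist, "near" if dist <= 10 else "far")
```

Running it labels 35°, 55°, 215°, and 235° as 10° from the boundary (near) and the remaining six directions as 30° or 90° from it (far), with five directions on each side.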
Rather than using all 10 sample directions as test stimuli, for Monkey Q we used a reduced set of test directions that excluded the four directions closest to the category boundary; this reduced set was used to facilitate training the animal on the task. By contrast, for Monkey M the same 10 motion directions were used as both sample and test stimuli. Eye positions were monitored by an EyeLink 1000 optical eye tracker (SR Research) at a sampling rate of 1 kHz and stored for offline analysis. Stimulus presentation, task events, rewards, and behavioral data acquisition were controlled by an Intel-based PC running MonkeyLogic software in MATLAB (http://www.monkeylogic.net).
Electrophysiological recording
Two male monkeys (Macaca mulatta; weight range, 8–12 kg) were implanted with a head post and recording chambers positioned over PPC. Stereotaxic coordinates for chamber placement were determined from magnetic resonance imaging (MRI) scans obtained before chamber implantation. We accessed LIP and MST from the same PPC chamber, which was positioned over the intraparietal sulcus. For MST recordings, we aimed to target MSTd rather than lateral MST (MSTl), since MSTd is more likely than MSTl to be involved in cognitive functions. Using a stereotax, the chamber for Monkey M was centered 10 mm lateral to the midsagittal line and 3 mm posterior to the interaural line, while the chamber for Monkey Q was positioned more laterally (centered 13 mm lateral to the midsagittal line and 2.5 mm posterior to the interaural line) to gain more access to MSTd. Both chambers were oriented perpendicular to the horizontal plane. All experimental and surgical procedures were performed in accordance with the University of Chicago Animal Care and Use Committee and National Institutes of Health guidelines. Monkeys were housed in individual cages under a 12 h light/dark cycle. Behavioral training and experimental recordings were conducted during the light portion of the cycle.
LIP was located according to the patterns of neuronal activity and selectivity in each monkey, particularly during the memory-guided saccade (MGS) task (e.g., spatially selective visual responses and persistent activity during the delay period). All neurons included in the analysis were recorded from the same grid holes and similar depths where we encountered spatial selectivity in the MGS task (mostly 4–8 mm below the surface of the cortex). We also identified LIP neurons based on anatomic criteria, such as the location of each electrode track relative to that expected from the MRI scans (Fig. 1e), and the pattern of gray matter–white matter transitions (activity-silence-activity patterns) encountered on each electrode penetration.
MST was located based on anatomical criteria taken from presurgical MRI images (Fig. 1e), the patterns of gray matter–white matter transitions and sulci encountered during electrode penetrations, and the visual responsiveness of neurons to visual motion stimuli presented during passive fixation. In both monkeys, we first identified LIP using the criteria described in the preceding paragraph. MST was located 2–6 mm lateral and 0–4 mm posterior to the center of LIP. MST neurons were recorded from a region that was 12–17 mm (Monkey M, 14–17 mm; Monkey Q, 12–16 mm) lateral to the midline and 2–6 mm posterior to the interaural plane. The depth of MST below the cortical surface (the depth at which we first encountered spiking activity) spanned a range of 2–10 mm in both monkeys (mean depth: Monkey M, 5.5 mm; Monkey Q, 5.9 mm). These coordinates are consistent with those reported in previous macaque MST studies (Gu et al., 2006, 2007).
MST was also identified based on the patterns of gray matter–white matter transitions along electrode penetrations. MST recordings focused on the second gray matter layer (i.e., patch of obvious neural activity) encountered after passing through the dura mater. This second layer was the first patch of neurons that was clearly visually responsive and direction selective to visual motion stimuli presented while the monkey passively fixated; the first patch encountered was consistent with surface cortex just beneath the dura (potentially parietal area 7a).
Although we did not collect quantitative data on receptive field properties, during 1–2 weeks of mapping before starting the main experiment we assessed neurons for visual responsiveness by listening to amplified neural activity on an audio speaker and viewing rasters of isolated neurons at different locations within the recording chamber. During these mapping sessions, while we lowered electrodes into the superior temporal sulcus to search for MST, monkeys passively fixated a central spot for 3.1 s while visual motion stimuli (100%-coherence random-dot motion patterns with four directions and sizes ranging from 3° to 10°) were shown at a range of locations on the display (including the contralateral and ipsilateral visual fields). We also tested large (whole-screen) expanding or contracting visual motion stimuli. Consistent with criteria used in other MST studies (Geesaman and Andersen, 1996; Gu et al., 2006), MST neurons were identified during mapping sessions as those in the anatomical locations described above that satisfied the following criteria, or that were close to (i.e., within the same stretch of gray matter as) neurons that did: responsive to large (∼10°) visual motion stimuli in the contralateral and ipsilateral visual fields, responsive to whole-screen expansion or contraction, and showing little or no modulation during the memory-guided saccade task. Because we did not quantitatively map MST RFs, we cannot determine whether neurons were located in MSTd or MSTl.
On some mapping sessions, we lowered the electrode beyond MST, encountered 1–2 mm of silence, and then reached an additional patch of gray matter that was strongly responsive to visual motion stimuli (likely MT). These neurons showed strong responses to small (3°) visual motion stimuli in the contralateral visual field, responded less to whole-screen expanding or contracting motion during passive viewing, and tended to respond to stimuli within a smaller region (e.g., within one quadrant of the display) than neurons at MST recording sites.
Blocks of LIP and MST recording sessions were alternated in each monkey to reduce the influence of time and/or training on the neuronal responses and behavior of the monkeys. In Monkey M, 52 LIP recording sessions were followed by 90 MST sessions and an additional 15 LIP sessions. In Monkey Q, 24 LIP sessions were followed by 25 MST sessions, then 8 LIP sessions were followed by 21 MST sessions, and finally 6 LIP sessions were followed by 7 MST sessions.
The recording equipment and procedures were the same as in previous studies (Swaminathan and Freedman, 2012; Swaminathan et al., 2013; Zhou and Freedman, 2019). All recordings from Monkey M were conducted using single 75 μm tungsten microelectrodes (FHC), while most recordings from Monkey Q were conducted using 16-channel linear V-Probes (Plexon) after the locations of brain areas had been identified with single-channel recordings. Neurophysiological signals were amplified, digitized, and stored for offline spike sorting (Plexon) to verify the quality and stability of neuronal isolation.
Receptive field mapping and stimulus placement
Most LIP neurons and some MST neurons were tested with an MGS task before the DMC task. During single-channel recordings, LIP neurons were identified by spatially selective visual or persistent activity during the MGS task. During multichannel recordings, we included all neurons recorded from the same grid locations and similar depths at which we had recorded neurons with spatially selective persistent activity. Before recording during the DMC task, MST neurons were tested with the MGS task (eight target locations; typical eccentricity, 8–10°) and with a set of visual motion stimuli (whole-screen expansion–contraction and the linear motion stimuli used in the DMC task). Most MST neurons recorded with single electrodes, and some recorded with multichannel linear arrays, were responsive to visual motion patterns and showed little or no modulation during the MGS task. For single-channel recordings in MST, we recorded only from neurons that showed activity modulation during the DMC task (but were not necessarily direction selective). For multichannel recordings, we included all neurons recorded from locations identified as MST.
The placement of motion stimuli differed slightly between single-channel and multichannel recording sessions. For single-channel recordings, motion stimuli for the DMC task were always placed in the RFs of LIP neurons, or at locations in the contralateral visual field that evoked the maximum visual response of MST neurons to the motion stimuli. The typical eccentricity of stimulus placement was ∼6.0–10.0°. For each multichannel recording session, we first identified one stimulus- or task-responsive neuron according to the above criteria and then placed the motion stimuli according to the RF of that neuron.
Data analysis
Behavioral performance quantification.
For both LIP and MST recording sessions, we computed the behavioral accuracy of each monkey for each sample direction, as shown in Figure 1c. We did this by computing the proportion correct in each session for each of the 10 motion directions and then computing the mean and SD across sessions for MST (Monkey Q, 53 sessions; Monkey M, 82 sessions) and LIP (Monkey Q, 38 sessions; Monkey M, 62 sessions). To compare accuracy between the near and far conditions, we pooled the six nonboundary directions and, separately, the four boundary directions in each session to calculate the mean accuracy for the far and near conditions in that session. We then compared near versus far accuracies across sessions separately for each monkey and separately according to which brain area was targeted for recording in each session.
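The per-session pooling of directions into near and far conditions can be sketched as below. It assumes the per-direction proportion correct has already been computed for every session; the input values in the usage example are made up for illustration, and the use of the sample SD (ddof=1) is an assumption, since the text does not specify it.

```python
import numpy as np

NEAR = [35, 55, 215, 235]              # 10 deg from the category boundary
FAR = [15, 75, 135, 195, 255, 315]     # 30 or 90 deg from the category boundary

def near_far_accuracy(prop_correct):
    """prop_correct: dict mapping sample direction -> array of per-session proportion correct.

    Returns (far, near): per-session mean accuracy over the six far directions and the
    four near directions, i.e., the paired values compared across sessions.
    """
    far = np.mean([prop_correct[d] for d in FAR], axis=0)
    near = np.mean([prop_correct[d] for d in NEAR], axis=0)
    return far, near

def direction_summary(prop_correct):
    """Mean and SD across sessions for each sample direction (SD with ddof=1 is an assumption)."""
    return {d: (float(np.mean(v)), float(np.std(v, ddof=1))) for d, v in prop_correct.items()}

# Hypothetical data for three sessions (not the study's values).
prop = {d: np.array([0.95, 0.97, 0.99]) for d in FAR}
prop.update({d: np.array([0.75, 0.80, 0.85]) for d in NEAR})
far, near = near_far_accuracy(prop)
print(far, near)
```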
Pre-analysis neuron screening.
We identified well-isolated single units in both LIP and MST that showed task-related activity in the DMC task, using the following criteria: (1) the maximum mean firing rate across the four task periods (sample period, early delay period, late delay period, and test period) had to be at least 1 spike/s; and (2) the activity had to exhibit at least one kind of task-related modulation (e.g., sample category selectivity, test category selectivity, sample direction selectivity, or test direction selectivity; two-way nested ANOVA, p < 0.01) during one of the four task periods, or the mean activity during at least one of the four task periods had to differ significantly from baseline activity (fixation period). In total, 326 LIP neurons and 571 MST neurons were included for further analysis. All peristimulus time histograms of example neurons were smoothed using a causal filter.
ROC-based category tuning index.
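We used the receiver operating characteristic (ROC)-based category tuning index (rCTI) to quantify category selectivity, as described in detail in our previous work (Swaminathan et al., 2013). The rCTI is the difference between the between-category difference (BCD) and the within-category difference (WCD). Each is a weighted average of |ROC(a, b) − 0.5| over pairs of sample directions a and b (in degrees), where ROC(a, b) is the area under the ROC curve comparing a neuron's firing rates on trials with sample directions a and b:

$$\mathrm{rCTI} = \mathrm{BCD} - \mathrm{WCD},$$

$$\begin{aligned}
\mathrm{WCD} = \tfrac{1}{16}\big(&2\,|\mathrm{ROC}(75,195)-0.5| + |\mathrm{ROC}(135,195)-0.5| + |\mathrm{ROC}(75,135)-0.5| \\
&+ 2\,|\mathrm{ROC}(255,15)-0.5| + |\mathrm{ROC}(315,15)-0.5| + |\mathrm{ROC}(255,315)-0.5| \\
&+ 2\,|\mathrm{ROC}(55,215)-0.5| + |\mathrm{ROC}(55,75)-0.5| + |\mathrm{ROC}(195,215)-0.5| \\
&+ 2\,|\mathrm{ROC}(35,235)-0.5| + |\mathrm{ROC}(35,15)-0.5| + |\mathrm{ROC}(255,235)-0.5|\big),
\end{aligned}$$

$$\begin{aligned}
\mathrm{BCD} = \tfrac{1}{16}\big(&2\,|\mathrm{ROC}(75,15)-0.5| + |\mathrm{ROC}(75,315)-0.5| + |\mathrm{ROC}(135,255)-0.5| + |\mathrm{ROC}(135,15)-0.5| \\
&+ 2\,|\mathrm{ROC}(195,255)-0.5| + |\mathrm{ROC}(195,315)-0.5| + 2\,|\mathrm{ROC}(55,35)-0.5| + |\mathrm{ROC}(55,255)-0.5| \\
&+ |\mathrm{ROC}(75,235)-0.5| + 2\,|\mathrm{ROC}(215,235)-0.5| + |\mathrm{ROC}(195,35)-0.5| + |\mathrm{ROC}(215,15)-0.5|\big).
\end{aligned}$$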
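The rCTI formula above can be computed directly from single-trial firing rates. The sketch below transcribes the weighted direction pairs from the WCD and BCD definitions and computes each ROC area with the Mann–Whitney formulation, P(X > Y) + 0.5·P(X = Y), which equals the area under the empirical ROC curve. The function names are illustrative, not the authors' code.

```python
import numpy as np

# (direction a, direction b) -> weight, transcribed from the WCD and BCD definitions.
WCD_PAIRS = {(75, 195): 2, (135, 195): 1, (75, 135): 1, (255, 15): 2, (315, 15): 1, (255, 315): 1,
             (55, 215): 2, (55, 75): 1, (195, 215): 1, (35, 235): 2, (35, 15): 1, (255, 235): 1}
BCD_PAIRS = {(75, 15): 2, (75, 315): 1, (135, 255): 1, (135, 15): 1, (195, 255): 2, (195, 315): 1,
             (55, 35): 2, (55, 255): 1, (75, 235): 1, (215, 235): 2, (195, 35): 1, (215, 15): 1}

def roc_area(x, y):
    """Area under the ROC curve separating samples x from samples y.

    Uses the Mann-Whitney identity: AUC = P(X > Y) + 0.5 * P(X == Y).
    """
    x = np.asarray(x, dtype=float)[:, None]
    y = np.asarray(y, dtype=float)[None, :]
    return float(np.mean(x > y) + 0.5 * np.mean(x == y))

def weighted_distance(rates, pairs):
    """Weighted mean of |ROC(a, b) - 0.5| over the listed direction pairs."""
    total = sum(w * abs(roc_area(rates[a], rates[b]) - 0.5) for (a, b), w in pairs.items())
    return total / sum(pairs.values())   # the weights sum to 16 for both WCD and BCD

def rcti(rates):
    """rates: dict mapping sample direction (deg) -> 1D array of single-trial firing rates."""
    return weighted_distance(rates, BCD_PAIRS) - weighted_distance(rates, WCD_PAIRS)
```

For an idealized neuron that fires at one rate for every direction on one side of the boundary and at another rate for every direction on the other side, every within-category ROC area is 0.5 and every between-category area is 0 or 1, so the index reaches its maximum of 0.5. A neuron with identical responses to all directions gives 0.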
Identifying category-selective neurons.
We performed a shuffle analysis to determine whether each neuron showed significant category selectivity, that is, whether its rCTI value was significantly above chance. To obtain a null distribution (chance level), we shuffled the direction labels of trials within each session, recalculated the rCTI, and repeated this procedure 500 times. An rCTI value was considered statistically significant if it exceeded 99% of the values in the null distribution. We applied this method to the mean activity within each time bin from 50 ms after sample onset to 200 ms after test onset (bin size, 300 ms; six time bins in total). Neurons were identified as category selective if their rCTI value exceeded the significance threshold in at least one time bin.
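A minimal sketch of this shuffle (permutation) test follows. To keep it self-contained, it uses a simple stand-in statistic, the absolute difference between the mean rates for the two sides of the boundary, in place of the rCTI; any function with the same `(rates, directions)` signature, such as an rCTI implementation, could be passed as `statistic`. The seed and the helper names are assumptions for illustration.

```python
import numpy as np

CATEGORY_A = {55, 75, 135, 195, 215}   # directions on one side of the 45-deg boundary

def category_difference(rates, directions):
    """Stand-in selectivity statistic (NOT the rCTI): |mean rate, side A - mean rate, side B|."""
    in_a = np.isin(directions, list(CATEGORY_A))
    return abs(rates[in_a].mean() - rates[~in_a].mean())

def shuffle_test(rates, directions, statistic=category_difference, n_shuffles=500,
                 percentile=99.0, seed=0):
    """Shuffle direction labels across trials to build a null distribution of the statistic.

    Returns (observed, null, significant); significant means the observed value
    exceeds `percentile` percent of the null values.
    """
    rng = np.random.default_rng(seed)
    rates = np.asarray(rates, dtype=float)
    directions = np.asarray(directions)
    observed = statistic(rates, directions)
    null = np.array([statistic(rates, rng.permutation(directions)) for _ in range(n_shuffles)])
    significant = np.mean(null < observed) * 100.0 > percentile
    return observed, null, bool(significant)

def category_selective(rates_by_bin, directions, **kwargs):
    """A neuron counts as category selective if any time bin passes the shuffle test."""
    return any(shuffle_test(r, directions, **kwargs)[2] for r in rates_by_bin)
```

Here `rates_by_bin` holds one array of single-trial mean rates per time bin (six 300 ms bins in the text).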
Classification of “pure direction-selective” neurons.
We identified pure direction-selective neurons using the following criteria: (1) the neuron was not identified as category selective by the above criteria; and (2) its activity differed significantly across motion directions (one-way ANOVA, p < 0.01).
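The two criteria combine as a simple rule. The sketch below takes the category-selectivity result as a boolean input and uses SciPy's one-way ANOVA (`scipy.stats.f_oneway`) for the direction test; the function name is hypothetical.

```python
import numpy as np
from scipy.stats import f_oneway

def is_pure_direction_selective(rates, directions, category_selective, alpha=0.01):
    """Pure direction-selective: not category selective, but activity differs
    significantly across sample directions (one-way ANOVA, p < alpha)."""
    if category_selective:
        return False
    rates = np.asarray(rates, dtype=float)
    directions = np.asarray(directions)
    groups = [rates[directions == d] for d in np.unique(directions)]   # one group per direction
    return bool(f_oneway(*groups).pvalue < alpha)
```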
Determining the latency of category selectivity.
For each neuron, we defined a threshold for significant category selectivity based on the rCTI: the mean of the baseline rCTI values (calculated during the fixation period using a 100 ms bin stepped by 5 ms) plus three SDs. The latency of category selectivity was defined as the midpoint of the first time bin at which the rCTI exceeded this threshold for at least two consecutive time bins.
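The latency rule can be sketched as follows, assuming the rCTI has already been computed in sliding windows for both the fixation (baseline) period and the task period. Whether the SD uses ddof=0 or ddof=1 is not stated in the text; the sketch assumes ddof=0.

```python
import numpy as np

def selectivity_latency(rcti_baseline, rcti_task, bin_centers_ms, n_sd=3.0, min_consecutive=2):
    """Latency of category selectivity from sliding-window rCTI values.

    rcti_baseline:  rCTI in fixation-period windows (100 ms wide, 5 ms steps in the text).
    rcti_task:      rCTI in task-period windows; bin_centers_ms gives each window's midpoint.
    Threshold = mean(baseline) + n_sd * SD(baseline); SD uses ddof=0 here (an assumption).
    Returns the midpoint of the first window that starts a run of at least
    `min_consecutive` supra-threshold windows, or None if there is no such run.
    """
    baseline = np.asarray(rcti_baseline, dtype=float)
    threshold = baseline.mean() + n_sd * baseline.std()
    above = np.asarray(rcti_task, dtype=float) > threshold
    for i in range(len(above) - min_consecutive + 1):
        if above[i:i + min_consecutive].all():
            return float(bin_centers_ms[i])
    return None
```

Requiring two consecutive supra-threshold windows means an isolated single-window excursion is ignored.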
Support vector machine decoding.
Similar to previous studies (Swaminathan et al., 2013; Sarma et al., 2016), we used linear support vector machine (SVM) classifiers to separately decode sample direction and category from a pseudopopulation of neurons in each cortical area. Activity from different neurons in each area was treated as if it had been recorded simultaneously, although most neurons were recorded separately. Training an SVM classifier determined the hyperplane(s) that best separated trials belonging to different classes: for the category classifiers, each class corresponded to one category (with training trials drawn from four directions, two per category); for the direction classifiers, each of the five classes corresponded to one motion direction. As in previous studies (Sarma et al., 2016), we aimed to eliminate the contribution of direction selectivity to the performance of the category classifiers, and the contribution of category selectivity to the performance of the direction classifiers.
Therefore, to train and test the category classifiers, we separated trials into four groups based on motion direction. For example, neural responses to two directions from category 1 and two directions from category 2 were used to train a classifier, which was then used to test the category membership of four other directions that were far from the training directions; the two directions that were close to the training directions were left out. A second category classifier was constructed by switching the training and testing directions. Similarly, the third and fourth category classifiers were constructed using different sets of directions (right panel). The performance of these four classifiers was then averaged. This minimized the contribution of direction selectivity to the performance of the category classifier. For the direction classifiers, we trained and tested two classifiers separately using the directions within each category and then averaged their performance. This eliminated the contribution of category selectivity to the decoding performance of the direction classifiers.
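The cross-direction scheme for the category classifiers can be sketched with scikit-learn's `LinearSVC`. The direction split in the usage example (train on 135°, 195°, 255°, 315°; test on 15°, 35°, 55°, 75°; leave out 215° and 235°) is illustrative only and is not claimed to be one of the splits used in the study. The regularization setting `C=1.0` is also an assumption. Because training and testing use disjoint directions, above-chance accuracy cannot arise from direction tuning alone.

```python
import numpy as np
from sklearn.svm import LinearSVC

CATEGORY_A = {55, 75, 135, 195, 215}   # directions on one side of the 45-deg boundary

def cross_direction_category_accuracy(X, directions, train_dirs, test_dirs):
    """Train a linear SVM on trials from `train_dirs`; test category decoding on `test_dirs`.

    X: trials x neurons pseudopopulation matrix; directions: sample direction of each trial.
    """
    directions = np.asarray(directions)
    y = np.isin(directions, list(CATEGORY_A)).astype(int)   # 1 = side A, 0 = side B
    train = np.isin(directions, train_dirs)
    test = np.isin(directions, test_dirs)
    clf = LinearSVC(C=1.0, max_iter=10000).fit(X[train], y[train])
    return float(np.mean(clf.predict(X[test]) == y[test]))

def swapped_pair_accuracy(X, directions, dirs_1, dirs_2):
    """Average of the classifier trained on dirs_1 / tested on dirs_2 and the swapped one."""
    return 0.5 * (cross_direction_category_accuracy(X, directions, dirs_1, dirs_2)
                  + cross_direction_category_accuracy(X, directions, dirs_2, dirs_1))

# Illustrative split (hypothetical): 215 and 235 deg are left out as near the training directions.
SPLIT_1 = [135, 195, 255, 315]
SPLIT_2 = [15, 35, 55, 75]
```

The full analysis would average two such swapped pairs (four classifiers) built from different direction sets.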
Decoding was applied to the mean firing rates of neurons within a 100 ms sliding window (10 ms step). For each neuron, we randomly selected 66% of trials to train the classifier and left the other 34% of trials for testing. We then randomly sampled, with replacement, 160 trials from the training set and 80 trials from the testing set for bootstrapping. For each iteration of the bootstrap, we randomly selected 100 neurons with replacement from the neuronal population in each area to perform the analysis. To reduce the potential confound caused by uneven numbers of trials of different motion directions, a minimum number of trials for each motion direction was required for random sampling (10 and 5 trials, respectively, for training and testing data of each direction). We bootstrapped all decoding analyses 200 times.
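One bootstrap iteration of pseudopopulation construction might look like the sketch below. Several details are assumptions not stated in the text: the 66/34 split is done separately within each direction (stratified), the 160 training and 80 testing trials are spread evenly over the 10 directions (16 and 8 per direction), and a neuron that falls below the minimum trial counts simply raises an error. `build_pseudopopulation` is a hypothetical helper.

```python
import numpy as np

def build_pseudopopulation(neurons, n_neurons=100, n_train_per_dir=16, n_test_per_dir=8,
                           train_frac=0.66, min_train=10, min_test=5, rng=None):
    """One bootstrap iteration of pseudopopulation construction (hypothetical helper).

    neurons: list of (rates, directions) pairs, one per neuron, one entry per trial.
    Neurons are drawn with replacement; each neuron's trials are split 66/34 (within each
    direction, an assumption) and then resampled with replacement, so that all neurons
    contribute the same number of trials per direction, aligned by direction label.
    """
    rng = np.random.default_rng() if rng is None else rng
    dirs = np.unique(neurons[0][1])
    train_cols, test_cols = [], []
    for idx in rng.integers(0, len(neurons), size=n_neurons):   # neurons with replacement
        rates = np.asarray(neurons[idx][0], dtype=float)
        directions = np.asarray(neurons[idx][1])
        train_col, test_col = [], []
        for d in dirs:
            r = rates[directions == d]
            perm = rng.permutation(len(r))
            n_tr = int(round(train_frac * len(r)))              # 66% of this direction's trials
            tr, te = r[perm[:n_tr]], r[perm[n_tr:]]
            if len(tr) < min_train or len(te) < min_test:
                raise ValueError(f"neuron {idx} has too few trials for direction {d}")
            train_col.append(rng.choice(tr, n_train_per_dir, replace=True))
            test_col.append(rng.choice(te, n_test_per_dir, replace=True))
        train_cols.append(np.concatenate(train_col))
        test_cols.append(np.concatenate(test_col))
    y_train = np.repeat(dirs, n_train_per_dir)   # trial labels, same order for every neuron
    y_test = np.repeat(dirs, n_test_per_dir)
    return np.column_stack(train_cols), y_train, np.column_stack(test_cols), y_test
```

Repeating this 200 times, and decoding each resulting trials × neurons matrix within each sliding window, gives a bootstrap distribution of decoding accuracy.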
Partial correlation analysis.
A partial correlation analysis was performed similar to that in previous studies (Zhou and Freedman, 2019). For each trial of the DMC task, we obtained three variables: the category of the sample stimulus, the neuronal activity, and the monkey's categorical choice (inferred from its behavior). Stimulus categories were coded as 1 and −1 for category 1 and category 2, respectively, and categorical choices were coded as 1 for choosing category 1 and −1 for choosing category 2. We then calculated two measures: r-stimulus = r(neuronal activity, stimulus category | categorical choice), the partial correlation between neuronal activity and stimulus category given the monkey's categorical choice; and r-categorical decision = r(neuronal activity, categorical choice | stimulus category), the partial correlation between neuronal activity and the monkey's categorical choice given the stimulus category. The partial correlation analysis was applied to the average firing rate of each neuron in a 100 ms window advanced in 10 ms steps (see Fig. 6). For this analysis, we included only trials in which the sample direction was close (10°) to the category boundary but the test direction was far from it, because these trials yielded enough errors, and errors on these trials were most likely due to miscategorizing the near-boundary sample. We used the absolute values of r-stimulus and r-categorical decision to track categorical encoding regardless of each neuron's preferred category across the trial, since some individual neurons preferred different categories in different task epochs.
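The two partial correlations can be computed by regressing each variable on the controlling variable (with an intercept) and correlating the residuals. The sketch below does this for one neuron in one time window. The choice-inference helper encodes the assumption stated above, namely that errors on near-boundary samples with far test stimuli reflect miscategorizing the sample; the function names are hypothetical.

```python
import numpy as np

def partial_corr(x, y, z):
    """Partial correlation r(x, y | z): correlate the residuals of x and y after
    regressing each on z with an intercept."""
    x, y, z = (np.asarray(a, dtype=float) for a in (x, y, z))
    Z = np.column_stack([np.ones_like(z), z])
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return float(np.corrcoef(rx, ry)[0, 1])

def inferred_choice(sample_category, correct):
    """Categorical choice (+1/-1) inferred from behavior, assuming an error means the
    near-boundary sample was assigned to the other category."""
    sample_category = np.asarray(sample_category)
    return np.where(np.asarray(correct, dtype=bool), sample_category, -sample_category)

def stimulus_and_decision_correlations(activity, sample_category, choice):
    """|r-stimulus| and |r-categorical decision| for one neuron in one time window."""
    r_stim = partial_corr(activity, sample_category, choice)
    r_dec = partial_corr(activity, choice, sample_category)
    return abs(r_stim), abs(r_dec)
```

For a simulated neuron whose activity follows the choice rather than the stimulus, |r-categorical decision| is large and |r-stimulus| is near zero. For a single controlling variable, the residual method agrees with the closed-form first-order partial correlation formula.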
Local field potential analysis.
All recording channels on which at least one task-related neuron was recorded were included in the analysis. We excluded trials or sessions with artifacts in the local field potential (LFP) (<1% of trials), such as those with many time points (≥50; sampling rate, 1000 Hz) at which the LFP amplitude was clipped because of amplifier saturation or a mismatch in dynamic range between the amplifier and the recording system. The LFP signal was first filtered with a band-stop Butterworth filter (59–61 Hz) to remove power-line noise and then z-scored for each recording channel in each recording session. We then used a MATLAB-based multitaper analysis toolbox (Chronux; Mitra and Bokil, 2008) to estimate power spectra. Spectrograms were estimated from the LFPs within a 300 ms time window stepped by 10 ms.
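The preprocessing steps can be sketched in Python with SciPy, although the study itself used MATLAB. The filter order (2) and zero-phase forward–backward filtering (`sosfiltfilt`) are assumptions, since the text specifies only the filter type and the 59–61 Hz stop band. The clipping criterion (samples at or beyond a fixed level) is likewise a hypothetical stand-in for the amplifier-saturation check.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 1000.0  # LFP sampling rate (Hz)

def count_clipped(lfp, clip_level):
    """Number of samples at or beyond a clipping level (hypothetical criterion; trials
    with >= 50 such samples would be excluded)."""
    return int(np.sum(np.abs(lfp) >= clip_level))

def preprocess_lfp(lfp, fs=FS, order=2):
    """Remove line noise with a 59-61 Hz Butterworth band-stop filter, then z-score.

    Apply to one channel's data from one session. Filter order and zero-phase
    filtering are assumptions, not taken from the study.
    """
    sos = butter(order, [59.0, 61.0], btype="bandstop", fs=fs, output="sos")
    filtered = sosfiltfilt(sos, np.asarray(lfp, dtype=float))
    return (filtered - filtered.mean()) / filtered.std()
```

Second-order sections (`output="sos"`) are used because a narrow stop band in transfer-function form can be numerically fragile.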
To quantify the category selectivity in LFP activity, we first computed the single-trial power spectra of LFPs for each channel. We then averaged power spectra across trials based on the sample category and calculated the difference of power spectra between two sample categories for each channel. We defined the sample categories for which most of the recording channels exhibited higher or lower beta power during the delay period as the preferred or nonpreferred categories for each monkey, respectively. To test whether beta band LFP oscillations were correlated with the categorical decisions of the monkeys, we compared category-selective beta activity between correct and error trials (see Fig. 8). In this analysis, we only included trials in which the test directions were far from the category boundary (for which errors were most likely because of misclassifying the sample). Moreover, we only included electrode channels from the recording sessions during which monkeys made enough errors (Monkey M: more than five trials; LIP, 25 of 55 channels; MST, 50 of 69 channels; Monkey Q: LIP, 159 of 195 channels; MST, 298 of 375 channels) in each sample category condition.
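Once single-trial power spectra are available, the category-selectivity measure reduces to averaging band power within each sample category and taking the difference. The beta band limits (15–30 Hz) used below are an assumption, because the text does not state them. The helper names are hypothetical, and categories are coded +1/−1.

```python
import numpy as np

BETA_BAND = (15.0, 30.0)   # assumed beta range (Hz); not specified in the text

def band_power(power, freqs, band=BETA_BAND):
    """Mean power within a frequency band. power: trials x freqs for one channel and window."""
    freqs = np.asarray(freqs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    return np.asarray(power)[:, in_band].mean(axis=1)

def category_beta_difference(power, freqs, sample_category, preferred=1):
    """Beta power for the preferred minus the nonpreferred sample category."""
    beta = band_power(power, freqs)
    cat = np.asarray(sample_category)
    return beta[cat == preferred].mean() - beta[cat == -preferred].mean()

def preferred_category(channel_differences):
    """Preferred category for a monkey: the category with higher delay-period beta power
    on most channels. channel_differences: per-channel (category +1 minus category -1)."""
    d = np.asarray(channel_differences)
    return 1 if np.sum(d > 0) >= np.sum(d < 0) else -1
```

Comparing correct and error trials then amounts to calling `category_beta_difference` on each subset of trials separately for each channel.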
Statistical methods.
To reduce the likelihood of false-positive results, we applied multilevel analyses for hypothesis testing on data with a nested structure (Aarts et al., 2014). Specifically, in Figure 2 (see also Fig. 6), we compared neural activity between LIP and MST by pooling data from the two monkeys. Because neurons from the same monkey could be more similar to each other than to neurons from the other monkey, we applied a nested ANOVA to the distributions of neuronal activity, using brain region and monkey as the main and nominal factors, respectively. For all other analyses, we conducted statistical tests on the behavioral results or neuronal activity from each monkey individually. The observations in these datasets were independent and continuous, approximately normally distributed (behavioral data) or right skewed (neural activity), had reasonably large sample sizes, and had similar variances across conditions, thereby meeting the assumptions of either the Wilcoxon rank-sum test or the t test. We therefore used either the Wilcoxon rank-sum test or the t test for hypothesis testing in these analyses. Furthermore, we found that distributions of neuronal activity were moderately right skewed. As the skewness was approximately two to three times the SD of the distribution, we transformed the distributions of neuronal activity toward normal distributions using the square root transformation (taking the square root of each data value) before applying ANOVA or t tests.
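The effect of the square root transformation on right-skewed data can be illustrated with synthetic values. The gamma-distributed "firing rates" below are made up, and the skewness estimator is the moment-based sample skewness g1 = m3 / m2^1.5. This is an illustration, not the study's data or code.

```python
import numpy as np

def sample_skewness(x):
    """Moment-based sample skewness, g1 = m3 / m2**1.5."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float(np.mean(d ** 3) / np.mean(d ** 2) ** 1.5)

rng = np.random.default_rng(0)
rates = rng.gamma(shape=2.0, scale=5.0, size=2000)   # synthetic right-skewed "firing rates"
print(sample_skewness(rates), sample_skewness(np.sqrt(rates)))
```

For this gamma distribution (theoretical skewness about 1.4), the square root brings the skewness down to roughly 0.4, moving the data closer to the normality assumed by ANOVA and t tests.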
To show the neural data in detail, we performed multiple comparisons across time/frequency in Figure 2 (see also Figs. 4, 5, 6, 8). The statistics in these figures were not corrected for multiple comparisons, as the statistical significance reported on those figures is meant as a visual indication of the granularity and fine time scale of the effects. However, the main conclusions in the Results section are based on comparisons of data from a small number of wider time windows, rather than the many individual comparisons across each time point or frequency.
Results
Task and behavioral performance
We trained two male monkeys to perform a visual motion DMC task, in which they needed to decide whether two sequentially presented motion stimuli were from matching or nonmatching categories, which they reported by releasing (match) or holding (nonmatch) a manual touch-bar (Fig. 1a). To solve the task, monkeys needed to first categorize the sample and remember it during the delay to determine whether the upcoming test stimulus was from the same category. The stimulus set consisted of 10 directions of 100% coherence random-dot movies that were assigned to two categories (5 motion directions per category) by an arbitrary category boundary (Fig. 1b). Both monkeys performed the DMC task with high accuracy during both MST and LIP recording sessions (Fig. 1c; Monkey M: MST, 90.9 ± 4.0%; LIP, 87.6 ± 4.3%; Monkey Q: MST, 89.5 ± 2.7%; LIP, 86.7 ± 3.8%), with >70% correct for the near-boundary directions. Accuracy was greater for sample directions farther (≥30°) from the boundary than for near-boundary directions in both MST and LIP recording sessions (Fig. 1d; near vs far behavioral accuracies: Monkey M: MST, 79.4% vs 98.9%; LIP, 77.7% vs 94.2%; Monkey Q: MST, 80.0% vs 95.9%; LIP, 75.2% vs 94.4%).
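The near/far split depends only on each direction's angular distance from the category boundary, which is an axis through the origin. The plain-Python sketch below shows one way to compute that distance, assign a category, and tabulate near versus far accuracy. The boundary angle (45°), the helper names, and the trial list are hypothetical, since the actual direction values are not listed here.

```python
def angular_distance_to_boundary(direction_deg: float, boundary_deg: float) -> float:
    """Smallest angle (0-90 deg) between a motion direction and the boundary axis.

    The boundary is an axis through the origin, so it passes through both
    boundary_deg and boundary_deg + 180.
    """
    d = (direction_deg - boundary_deg) % 180.0
    return min(d, 180.0 - d)


def category(direction_deg: float, boundary_deg: float) -> int:
    """Category 1 if the direction lies within 180 deg counterclockwise of boundary_deg, else 2."""
    return 1 if (direction_deg - boundary_deg) % 360.0 < 180.0 else 2


def near_far_accuracy(trials, boundary_deg, far_threshold_deg=30.0):
    """trials: iterable of (sample_direction_deg, correct) pairs.

    Returns (near_accuracy, far_accuracy); "far" means >= far_threshold_deg from the boundary.
    """
    near, far = [], []
    for direction, correct in trials:
        dist = angular_distance_to_boundary(direction, boundary_deg)
        (far if dist >= far_threshold_deg else near).append(bool(correct))

    def accuracy(outcomes):
        return sum(outcomes) / len(outcomes) if outcomes else float("nan")

    return accuracy(near), accuracy(far)


# Hypothetical boundary at 45 deg and a handful of hypothetical trials.
BOUNDARY = 45.0
trials = [(60.0, True), (60.0, False), (135.0, True), (240.0, True)]
print(near_far_accuracy(trials, BOUNDARY))
```

Treating the boundary as an axis (modulo 180°) means that a direction just past the opposite end of the boundary, such as 240° here, also counts as near.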
Task, behavioral performance, and recording positions. a, Sequence of the DMC task. Monkeys needed to release a touch-bar when the categories of the sample and test stimuli matched, or to hold the bar and wait for a second test stimulus when they did not match. The yellow dashed circle indicates the position of the receptive field of a neuron. b, Monkeys needed to group 10 motion directions into two categories (red and blue arrows) separated by a learned category boundary (black dashed line). c, Average performance (accuracy) of the two monkeys for the 10 sample directions, shown separately for MST and LIP recording sessions. Error bars denote ±SD across sessions. d, Mean accuracies in the near and far conditions. Data from each recording session (connected by a line) are shown separately for each monkey and for each targeted brain area. e, Three example coronal MRI images for each of Monkey Q and Monkey M, with each slice at a different anterior–posterior (A–P) position relative to the interaural plane. MRIs were acquired with a resolution between adjacent coronal slices of 0.7 mm (Monkey Q) and 1.0 mm (Monkey M). The overlaid purple and green marks indicate the locations of recorded MST (purple) and LIP (green) neurons at each A–P location.
Category selectivity in MST and LIP
We recorded neuronal spiking activity and LFP signals from MST and LIP (targeting one brain area per session) while monkeys performed the DMC task (Fig. 1e; see Materials and Methods). We recorded 648 single units from MST (Monkey M, 140 units; Monkey Q, 508 units) and 361 from LIP (Monkey M, 89 units; Monkey Q, 272 units). A total of 571 of 648 (88.1%) MST neurons (Monkey M, 105 neurons; Monkey Q, 466 neurons) and 326 of 361 (90.3%) LIP neurons (Monkey M, 78 neurons; Monkey Q, 248 neurons) showed task-modulated responses in the DMC task (see criterion in Materials and Methods) and were therefore included in further analyses. A large proportion of neurons in both areas showed significant category encoding in the DMC task [MST: 407 of 571 neurons (71%); Monkey M, 68 neurons; Monkey Q, 339 neurons; LIP: 203 of 326 neurons (62%); Monkey M, 58 neurons; Monkey Q, 145 neurons; see criterion in Materials and Methods]. Figure 2a–c shows three example MST neurons. The first (Fig. 2a) showed binary-like responses to the sample category during the sample, delay, and test periods of the DMC task, whereas the second (Fig. 2b) showed significant category encoding mainly during the late sample and delay periods. The third MST neuron (Fig. 2c) showed relatively weak category selectivity during the sample period, which was maintained into the delay. Figure 2d–f shows three example LIP neurons that encoded the sample category during the sample, delay, and test periods.
Sample category selectivity in MST and LIP during the DMC task. a–c, Three example sample category-selective MST neurons. Average activity for each sample direction is plotted as a function of time. Different colors represent different sample categories, and different shades indicate the angular distance from the boundary with the brightest color indicating the center direction of each category. The first and second dashed vertical lines denote the time for sample stimulus onset and offset, respectively, while the third dashed vertical line indicates the time of test stimulus onset. d–f, Three example neurons from LIP are shown in the same format as a, b, c. g, A category tuning index (rCTI) shows the magnitude and time course of sample category selectivity. Shaded area denotes ±SEM. The black stars indicate the time points for which there was a significant difference between MST and LIP (nested ANOVA, p < 0.05). h, The distributions of category selectivity latencies for category-selective neurons in MST and LIP. Note that the values in the distribution represent the latency at which each neuron first showed category selectivity. The inset above shows the cumulative distributions of category selectivity latencies only for neurons that were category selective during the sample period. i, Population-level sample category selectivity in MST (pink) and LIP (green) is measured using an SVM classifier. Time course of sample category classification accuracy in both brain areas is shown as a function of time. Shaded area denotes ±SD.
To characterize the strength and time course of sample category encoding across the MST and LIP populations, we calculated for each neuron in both areas an rCTI, an index we have used previously (Swaminathan et al., 2013). The rCTI quantifies the category selectivity of a neuron by comparing its discriminability between pairs of directions in the same versus different categories (see Materials and Methods). Index values range from −0.5 to +0.5, with more positive values indicating stronger and more reliable category selectivity. Although individual category-selective neurons in both LIP and MST could show large rCTI values (e.g., the neurons in Fig. 2b,d showed rCTI values > 0.3 during the sample or delay period), the mean rCTI values across all task-modulated neurons were shifted only modestly, but significantly, toward positive values during the sample, delay, and test epochs (Fig. 2g). This indicates that both MST and LIP showed significant category encoding during all periods of the DMC task. Comparing the two areas, LIP showed greater rCTI values during the early sample period (Fig. 2g; from 125 to 325 ms after sample onset; LIP vs MST: median, 0.0081 vs 0.0054; mean, 0.0257 vs 0.0178; p = 1.2 × 10−5, nested ANOVA; main factor, brain region; nominal factor, monkey; see Materials and Methods), whereas MST showed significantly greater category encoding during the late sample and early delay periods (Fig. 2g; −165 to 200 ms relative to sample offset; LIP vs MST: median, 0.0068 vs 0.0103; mean, 0.0185 vs 0.0305; p = 0.0075, nested ANOVA).
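To make the logic of the index concrete, here is a minimal sketch of a between- versus within-category discriminability index. It assumes that rCTI is ROC-based, with discriminability for each direction pair measured as |AUC − 0.5|, which keeps the index within the stated range of −0.5 to +0.5. The published index may restrict which direction pairs are compared (for example, by matching angular separations); this sketch ignores that detail. The function names and example neurons are hypothetical.

```python
import itertools
import numpy as np

def roc_auc(a, b):
    """Probability that a random trial from a exceeds one from b (ties count 0.5)."""
    a = np.asarray(a, dtype=float)[:, None]
    b = np.asarray(b, dtype=float)[None, :]
    return float(np.mean((a > b) + 0.5 * (a == b)))

def rcti(rates_by_direction, category_of):
    """Sketch of a ROC-based category tuning index.

    rates_by_direction: dict mapping direction -> 1-D array of single-trial rates.
    category_of: dict mapping direction -> category label.
    Returns mean between-category minus mean within-category discriminability,
    where discriminability is |AUC - 0.5| (so the index lies in [-0.5, 0.5]).
    """
    between, within = [], []
    for d1, d2 in itertools.combinations(sorted(rates_by_direction), 2):
        disc = abs(roc_auc(rates_by_direction[d1], rates_by_direction[d2]) - 0.5)
        (between if category_of[d1] != category_of[d2] else within).append(disc)
    return float(np.mean(between) - np.mean(within))

cats = {0: "A", 1: "A", 2: "B", 3: "B"}
# Hypothetical neuron with binary category responses.
categorical = {0: [10, 11, 12], 1: [10, 11, 12], 2: [20, 21, 22], 3: [20, 21, 22]}
# Hypothetical neuron with graded direction tuning and no category step.
graded = {0: [10, 11, 12], 1: [20, 21, 22], 2: [30, 31, 32], 3: [40, 41, 42]}
print(rcti(categorical, cats), rcti(graded, cats))
```

The graded neuron discriminates every direction pair equally well, so its index is zero even though it is strongly direction tuned. This is why an index of this kind separates category selectivity from simple direction selectivity.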
The latency of category selectivity for each neuron was estimated from the time course of its rCTI. It was defined as the midpoint of the first time bin at which the rCTI exceeded the baseline rCTI by more than 3 SDs for two consecutive time bins (see Materials and Methods). The distributions of category selectivity latencies across all category-selective neurons overlapped between the two areas (Fig. 2h; p = 0.81, nested ANOVA). To examine whether either brain area plays a leading role in categorizing the sample stimulus, we compared the latencies of neurons that were category selective during the sample period (0–650 ms after sample onset). Category encoding tended to emerge with a shorter latency in LIP than in MST (Fig. 2h; median: LIP vs MST, 175 vs 223 ms; p = 0.0216, nested ANOVA). Furthermore, a significantly higher proportion of LIP than MST neurons first showed significant category selectivity in the early sample period (50–350 ms after sample onset; LIP, 140 of 164 neurons; MST, 212 of 306 neurons; p = 1.2 × 10−4, χ2 = 14.9, χ2 test), and this trend reversed during the late sample period, when a greater fraction of MST than LIP neurons first showed category selectivity (351–650 ms after sample onset: LIP, 24 of 164 neurons; MST, 94 of 306 neurons).
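The latency criterion above can be sketched as follows: estimate a threshold from baseline bins, then return the centre of the first bin that begins a run of consecutive suprathreshold bins. The bin width, the baseline window, the `start_ms` search start, and the simulated trace are hypothetical choices for illustration only.

```python
import numpy as np

def selectivity_latency(rcti_trace, bin_centers_ms, baseline_mask,
                        n_sd=3.0, n_consecutive=2, start_ms=0.0):
    """Centre time of the first bin that starts a run of `n_consecutive` bins whose
    rCTI exceeds baseline mean + n_sd * baseline SD; None if there is no such run.
    Only bins centred at or after `start_ms` are searched."""
    trace = np.asarray(rcti_trace, dtype=float)
    centers = np.asarray(bin_centers_ms, dtype=float)
    baseline = trace[np.asarray(baseline_mask, dtype=bool)]
    threshold = baseline.mean() + n_sd * baseline.std(ddof=1)
    above = trace > threshold
    for i in range(len(trace) - n_consecutive + 1):
        if centers[i] >= start_ms and above[i:i + n_consecutive].all():
            return float(centers[i])
    return None

# Hypothetical 50-ms bins from -300 ms (fixation) to 650 ms after sample onset.
centers = np.arange(-300, 700, 50)
trace = np.zeros(centers.size)
trace[centers < 0] = [0.0, 0.01, -0.01, 0.0, 0.01, -0.01]  # baseline fluctuations
trace[centers == 200] = 0.2   # isolated crossing: ignored (only one bin)
trace[centers >= 300] = 0.2   # sustained crossing
print(selectivity_latency(trace, centers, centers < 0))  # 300.0
```

Requiring two consecutive bins guards against single-bin noise crossings, such as the isolated value at 200 ms here.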
We next assessed category information at the population level using linear classifiers (support vector machines [SVMs] applied to neuronal pseudopopulations and trained to decode the sample category; see Materials and Methods). To reduce the contribution of category-independent direction tuning to classifier performance, we trained and tested the classifier on different groups of directions in each category, separated by similar angular distances, as in previous studies from our group (Swaminathan et al., 2013; Sarma et al., 2016). As shown in Figure 2i, sample category could be decoded significantly from both MST and LIP in the sample, delay, and test periods. To visualize how sample category information transitioned into and was maintained during the delay period, we evaluated the stability of category encoding across all DMC task periods by training and testing the category classifier on neural data from different time points in the trial. For example, if a classifier trained on data from one task period (e.g., the sample period) performed well when tested on data from another period (e.g., the delay period), this would indicate a stable pattern of neuronal encoding across those periods. Figure 3, a and b, shows the results for MST and LIP, respectively.
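The train-at-one-time, test-at-another procedure, combined with held-out directions, can be sketched as below. To keep the example dependency-free beyond NumPy, a difference-of-means linear decoder stands in for the SVM used in the study. The simulated population, with four directions, a category pattern that switches on at bin 2, and category-independent direction tuning, is entirely hypothetical. All simulated neurons share trials here, whereas real pseudopopulations combine trials across sessions.

```python
import numpy as np

def fit_mean_difference_decoder(X, y):
    """Linear decoder: weights = class-1 mean minus class-0 mean, bias at their midpoint.
    (A stand-in for the linear SVM used in the study.)"""
    mu1, mu0 = X[y == 1].mean(axis=0), X[y == 0].mean(axis=0)
    w = mu1 - mu0
    b = -w @ (mu1 + mu0) / 2.0
    return w, b

def cross_temporal_accuracy(X_train, y_train, X_test, y_test):
    """X_*: arrays of shape (trials, neurons, time bins).
    Returns acc[i, j]: accuracy when training at bin i and testing at bin j."""
    n_bins = X_train.shape[2]
    acc = np.zeros((n_bins, n_bins))
    for i in range(n_bins):
        w, b = fit_mean_difference_decoder(X_train[:, :, i], y_train)
        for j in range(n_bins):
            pred = (X_test[:, :, j] @ w + b > 0).astype(int)
            acc[i, j] = np.mean(pred == y_test)
    return acc

# Hypothetical simulated population: 4 directions (2 per category), a category
# pattern that switches on at bin 2, plus category-independent direction tuning.
rng = np.random.default_rng(1)
n_neurons, n_bins, n_per_dir = 40, 6, 30
category_pattern = rng.normal(size=n_neurons)
direction_tuning = rng.normal(scale=0.5, size=(4, n_neurons))
category_of_direction = np.array([0, 0, 1, 1])

def simulate(directions):
    X, y = [], []
    for d in directions:
        cat = category_of_direction[d]
        for _ in range(n_per_dir):
            trial = rng.normal(size=(n_neurons, n_bins)) + direction_tuning[d][:, None]
            trial[:, 2:] += (1.0 if cat == 1 else -1.0) * category_pattern[:, None]
            X.append(trial)
            y.append(cat)
    return np.array(X), np.array(y)

# Train on one direction per category, test on the other (held-out) directions.
X_train, y_train = simulate([0, 2])
X_test, y_test = simulate([1, 3])
acc = cross_temporal_accuracy(X_train, y_train, X_test, y_test)
print(np.round(acc, 2))
```

Because the decoder never sees the test directions during training, high accuracy in the lower-right block of `acc` reflects a shared category pattern rather than memorized direction tuning. Off-diagonal accuracy within that block indicates that the pattern is stable over time.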
Stability of category selectivity in MST and LIP. a, The stability of MST sample category encoding was determined by training (y-axis) and testing (x-axis) the classifier at different time points in the trial. Classification accuracy is indicated by the color at each x–y coordinate. b, The stability of sample category encoding in LIP.
We identified 68 MST neurons and 56 LIP neurons that were direction tuned but did not show obvious sample category selectivity (see criterion in Materials and Methods; Fig. 4a–d, example neurons). To assess category-independent motion direction encoding (i.e., the ability to discriminate between directions within the same category) at the population level in MST and LIP, we trained SVM classifiers to decode the sample direction within each category (see Materials and Methods). Among task-modulated neurons (LIP, 326 neurons; MST, 571 neurons), this revealed substantial category-independent direction decoding in both areas, primarily during the sample period (Fig. 4e), and this sample direction selectivity did not differ significantly between LIP and MST (bootstrap, p > 0.3). Many category-selective neurons also showed direction selectivity that appeared unrelated to the learned categories, but some direction-selective neurons showed no obvious category selectivity. We therefore assessed direction selectivity among the neurons that were not category selective (LIP, 123 neurons; MST, 164 neurons; see Materials and Methods). These neurons showed significantly stronger motion direction selectivity in MST than in LIP during the sample period of the DMC task (Fig. 4f; 50–650 ms after sample onset, p < 0.01, bootstrap).
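The exact bootstrap scheme is not described in this section, so the sketch below shows only one common form, with hypothetical names and data. It resamples units (neurons) with replacement within each area, recomputes a population statistic, and reports the fraction of resamples in which area A does not exceed area B. In the study, the statistic would be the decoding accuracy of a classifier built from the resampled neurons; here `np.mean` over hypothetical per-neuron selectivity scores is a simple placeholder.

```python
import numpy as np

def bootstrap_p_greater(stat_fn, units_a, units_b, n_boot=2000, seed=0):
    """One-sided bootstrap p-value for stat_fn(A) > stat_fn(B).

    Units (e.g., neurons) are resampled with replacement within each group; the
    p-value is the fraction of resamples in which the difference is <= 0."""
    rng = np.random.default_rng(seed)
    a, b = np.asarray(units_a), np.asarray(units_b)
    diffs = np.empty(n_boot)
    for k in range(n_boot):
        resample_a = a[rng.integers(0, len(a), size=len(a))]
        resample_b = b[rng.integers(0, len(b), size=len(b))]
        diffs[k] = stat_fn(resample_a) - stat_fn(resample_b)
    return float(np.mean(diffs <= 0))

# Hypothetical per-neuron direction-selectivity scores for two areas.
area_a = np.linspace(0.9, 1.1, 50)
area_b = np.linspace(-0.1, 0.1, 50)
print(bootstrap_p_greater(np.mean, area_a, area_b))
```

Resampling neurons, rather than trials, treats the recorded population as a sample from a larger population of neurons in each area.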
Motion direction encoding in MST and LIP. a, b, Two example MST neurons that showed direction selectivity, but not obvious category selectivity, in the DMC task. Average activity for each sample direction is plotted as a function of time. Different colors represent different sample categories, and different shades indicate the angular distance from the boundary, with the brightest color indicating the center direction of each category. The first and second dashed vertical lines denote sample stimulus onset and offset, and the third dashed vertical line indicates test stimulus onset. c, d, Two example LIP neurons shown in the same format as in a and b. e, f, Population-level sample direction selectivity in MST (pink) and LIP (green) in the DMC task, measured as the time course of sample direction classification accuracy of an SVM classifier. e, Classification accuracy of sample direction using all task-modulated neurons. The shaded area denotes ±SD. f, Classification accuracy of sample direction using task-modulated neurons that were not category selective. These neurons showed greater direction selectivity in MST than in LIP during the sample period. The black stars denote the time points at which LIP and MST differed significantly (bootstrap, p < 0.05).
Decision-correlated neural activity in MST and LIP
To test whether sample category encoding in MST and LIP was correlated with the monkeys' trial-by-trial categorical decisions, we compared neuronal activity between correct and error trials. One possibility is that neuronal selectivity would be similar on correct and error trials, indicating that stimulus tuning was fixed rather than varying with the monkeys' decisions. A second possibility is that stimulus selectivity would differ significantly between correct and error trials, indicating neuronal encoding that was closely and dynamically coupled to the monkeys' categorical decisions. Of particular interest is category selectivity that reverses sign between correct and error trials (i.e., a stronger response to one category on correct trials but to the other category on error trials), which would indicate that neural category selectivity varied closely with the monkeys' trial-by-trial decisions. Figure 5, a and b, shows the activity of example MST and LIP neurons (the same neurons as in Fig. 2a,d) on correct and error trials. The example neurons from both areas showed significantly different category encoding between correct and error trials, and their category preferences were reversed on error compared with correct trials throughout most of the trial.
Sample category encoding in MST and LIP reflects monkeys' correct versus incorrect categorical decisions. a, The activity of an example MST neuron on correct (solid) and error (dashed) trials. Different colors represent different sample categories, and shaded areas denote ±SEM. b, An example neuron from LIP. c, d, The magnitude and time course of sample category selectivity on error trials in MST (c) and LIP (d) were determined by a category SVM classifier. The SVM classifier was trained using neural activity on correct trials and was tested with neural activity on error trials. The shaded area denotes ±SD. The black stars indicate the time points for which the decoding performance was significantly below chance (bootstrap, p < 0.05), indicating reversed category preferences on error versus correct trials.
To quantify decision-correlated neural activity at the population level, we first trained sample category SVM classifiers on activity from correct trials and then tested their decoding performance on error trials. We included only trials in which the test stimulus direction was far (≥30°) from the boundary, because errors on these trials most likely resulted from miscategorizing the sample rather than the test stimulus. If sample category encoding were reversed between correct and error trials, decoding accuracy on error trials would be expected to fall significantly below chance (0.5). Indeed, decoding performance dropped below chance shortly after sample onset and remained below chance throughout all task periods in both MST and LIP (Fig. 5c,d), indicating that population activity in both areas covaried with the monkeys' correct versus erroneous decisions about sample category.
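Why below-chance accuracy signals decision-correlated encoding can be seen in a small simulation. A linear decoder trained on correct trials is scored against the true sample category on error trials under two hypothetical scenarios: activity that follows the monkey's (wrong) choice, and activity that follows the stimulus regardless of choice. A difference-of-means decoder stands in for the SVM, and all data are simulated and hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n_neurons = 30
category_pattern = rng.normal(size=n_neurons)  # hypothetical category axis of the population

def population_trials(encoded_category, n):
    """Simulated single-trial population vectors that encode `encoded_category` (0 or 1)."""
    sign = 1.0 if encoded_category == 1 else -1.0
    return sign * category_pattern + rng.normal(size=(n, n_neurons))

# Train a linear (difference-of-means) decoder on correct trials, where choice = sample category.
X_correct = np.vstack([population_trials(0, 100), population_trials(1, 100)])
y_correct = np.repeat([0, 1], 100)
mu1, mu0 = X_correct[y_correct == 1].mean(axis=0), X_correct[y_correct == 0].mean(axis=0)
w, b = mu1 - mu0, -(mu1 - mu0) @ (mu1 + mu0) / 2.0

def accuracy_vs_sample(X, sample_category):
    """Decoder accuracy scored against the true sample category."""
    return float(np.mean((X @ w + b > 0).astype(int) == sample_category))

y_error_sample = np.repeat([0, 1], 20)  # true sample category on error trials
# Scenario 1: activity follows the (wrong) choice -> accuracy falls below chance.
X_error_choice = np.vstack([population_trials(1, 20), population_trials(0, 20)])
# Scenario 2: activity follows the stimulus regardless of choice -> accuracy stays high.
X_error_stimulus = np.vstack([population_trials(0, 20), population_trials(1, 20)])
print(accuracy_vs_sample(X_error_choice, y_error_sample),
      accuracy_vs_sample(X_error_stimulus, y_error_sample))
```

The two scenarios correspond to the two possibilities described earlier, decision-coupled versus fixed stimulus encoding. Only the first produces decoding values below 0.5.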
We separately examined the stimulus-related (i.e., the category to which a stimulus actually belonged) and decision-related (i.e., the monkey's report of the stimulus category on each trial, whether correct or incorrect) components of neuronal selectivity using a partial correlation analysis (Zaidel et al., 2017; Zhou and Freedman, 2019). For each neuron, we calculated r-stimulus (the partial correlation between neuronal activity and stimulus category, given the monkey's choice) and r-categorical-decision (the partial correlation between neuronal activity and the monkey's categorical choice, given the stimulus category) using both correct and error trials (see Materials and Methods). These values represent the distinct contributions of the stimulus category and of the monkey's trial-by-trial choice, respectively, to the neuronal response, with higher values indicating greater effects on neural activity. Only trials in which the sample direction was near the boundary and the test direction was far from the boundary were included in this analysis, because these trials contained sufficient numbers of errors, and those errors most likely resulted from miscategorizing the difficult (near-boundary) sample stimulus. Neuronal activity correlated significantly with both the actual category membership of the stimuli and the monkeys' categorical decisions in all three periods of the DMC task and in both cortical areas (Fig. 6a,b). However, when comparing the magnitudes of r-stimulus and r-categorical-decision within each brain area, we found that neural activity in both MST and LIP was more closely correlated with the monkeys' categorical decisions than with the actual category of the stimuli, as indicated by overall higher r-categorical-decision than r-stimulus values (p < 0.05, nested ANOVA).
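A partial correlation can be computed by regressing the conditioning variable out of both quantities and correlating the residuals. The sketch below applies this to a hypothetical simulated neuron whose rate depends only on the reported category. It illustrates the general technique and may differ in detail from the implementation cited above (Zaidel et al., 2017); the 75% choice-consistency rate and all data are hypothetical.

```python
import numpy as np

def partial_corr(x, y, z):
    """Pearson correlation of x and y after linearly regressing z (plus intercept) out of both."""
    x, y, z = (np.asarray(v, dtype=float) for v in (x, y, z))
    Z = np.column_stack([np.ones_like(z), z])
    x_res = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    y_res = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return float(np.corrcoef(x_res, y_res)[0, 1])

# Hypothetical near-boundary trials: the monkey reports the sample category on 75% of trials,
# and the simulated neuron's rate depends only on the reported category.
rng = np.random.default_rng(3)
n = 400
stimulus = rng.integers(0, 2, size=n)
choice = np.where(rng.random(n) < 0.75, stimulus, 1 - stimulus)
rate = 10.0 + 5.0 * choice + rng.normal(scale=2.0, size=n)

r_stimulus = partial_corr(rate, stimulus, choice)  # stimulus effect given choice
r_decision = partial_corr(rate, choice, stimulus)  # choice effect given stimulus
print(round(r_stimulus, 3), round(r_decision, 3))
```

Error trials are what make the two variables separable. If choice always equalled stimulus, both residuals would vanish and neither partial correlation would be defined.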
In Figure 6 (compare a, b), LIP activity was more decision correlated than MST activity during the early-to-mid sample period (r-categorical-decision; 150–350 ms after sample onset: mean, LIP vs MST, 0.1137 vs 0.0975; p = 0.029, nested ANOVA). LIP activity was also more decision correlated than MST activity during the test period (r-categorical-decision; mean: LIP vs MST, 0.1297 vs 0.1175; p = 0.0040, nested ANOVA). Repeating this analysis using all trials (not only near-boundary trials) produced qualitatively similar and statistically significant results (results not shown).
Stimulus-related and choice-related components of category selectivity in MST and LIP examined with a partial correlation analysis. a, Values of r-stimulus (the partial correlation between neuronal activity and stimulus category, given the monkeys' choices) and r-categorical-decision (the partial correlation between neuronal activity and the monkeys' choices, given the stimulus category) for MST neurons, plotted across time. The black stars indicate the time points at which r-categorical-decision and r-stimulus differed significantly (nested ANOVA, p < 0.05). The shaded area denotes ±SEM. b, Values of r-stimulus and r-categorical-decision for LIP neurons, shown in the same format as in a.
Together, the decoding and partial correlation analyses indicate that sample category encoding was correlated with categorical decisions of monkeys in both MST and LIP, with significantly stronger correlations observed in LIP than MST during the early sample period.
Decision and working memory encoding in the local field potential
Although previous studies of visual categorization in the motion DMC task focused primarily on category encoding in neuronal spiking activity, other studies have identified categorization task-related LFP activity (Antzoulatos and Miller, 2016; Stanley et al., 2018). Recent evidence suggests that LFP oscillations in PFC, such as those within the beta band, might play a role in visual categorization. LFP synchrony within PFC, and between PFC and other brain areas such as the anterior intraparietal cortex (AIP) and striatum, has been shown to represent task-related information in categorization tasks (Antzoulatos and Miller, 2014, 2016). However, whether oscillatory activity in PPC correlates with categorical decisions has not been closely examined, except in one study of LFP oscillations in the AIP area of PPC during spatial categorization (Antzoulatos and Miller, 2016). For both MST and LIP, we computed single-trial LFP power spectra during the DMC task for channels on which there was at least one task-modulated neuron (see criterion in Materials and Methods), and then averaged them across trials according to sample category (see Materials and Methods). This revealed significant differences in LFP power during multiple epochs of the DMC task (relative to the fixation period; p < 0.01, t test) and between the two motion categories, particularly in the beta frequency range (12–30 Hz). Beta power in MST and LIP LFPs decreased during the sample and early delay periods and then recovered to baseline during the mid-to-late delay. Interestingly, beta power was significantly modulated by sample category in both MST and LIP shortly before and during the delay in both monkeys (Fig. 7a–d), with most recording channels in a given monkey showing stronger power for one sample category than for the other.
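As a sketch of the category-selectivity measure on a single channel, the code below computes beta-band (12–30 Hz) power from single-trial snippets with a Hann-windowed periodogram and takes the difference of the trial-averaged power between categories. The spectral estimator is an assumption, and the z-scoring and 1/f normalization described in the Figure 7 legend are not reproduced. The sampling rate, snippet length, and signals are hypothetical.

```python
import numpy as np

def band_power(signal, fs, band=(12.0, 30.0)):
    """Mean Hann-windowed periodogram power (per Hz) of a 1-D signal within `band` (Hz)."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    window = np.hanning(x.size)
    spectrum = np.abs(np.fft.rfft(x * window)) ** 2 / (fs * np.sum(window ** 2))
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    return float(spectrum[in_band].mean())

def beta_category_selectivity(trials_cat1, trials_cat2, fs, band=(12.0, 30.0)):
    """Trial-averaged beta power for category 1 minus category 2 on one channel."""
    p1 = np.mean([band_power(t, fs, band) for t in trials_cat1])
    p2 = np.mean([band_power(t, fs, band) for t in trials_cat2])
    return float(p1 - p2)

# Hypothetical 500-ms single-trial LFP snippets sampled at 1 kHz.
fs = 1000.0
t = np.arange(500) / fs
rng = np.random.default_rng(4)
cat1_trials = [np.sin(2 * np.pi * 20 * t) + 0.1 * rng.normal(size=t.size) for _ in range(5)]
cat2_trials = [0.3 * np.sin(2 * np.pi * 20 * t) + 0.1 * rng.normal(size=t.size) for _ in range(5)]
print(beta_category_selectivity(cat1_trials, cat2_trials, fs))
```

A 500-ms window gives 2-Hz frequency resolution, which is adequate to resolve the 12–30 Hz band. Tracking how this difference evolves over the trial would require repeating the calculation in sliding windows.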
Category-selective beta band LFPs were observed both on electrode channels on which we simultaneously recorded single neurons with significant category-selective spiking activity and on electrodes on which no category-selective single neurons were observed in the session. In each monkey, the most often preferred sample category was the same in MST and LIP, but the preferred sample category differed between the two monkeys. For Monkey M, beta band LFP power was greater for category 1 on 44 of 55 LIP recording channels and 50 of 69 MST recording channels. For Monkey Q, in contrast, beta band LFP power was greater for category 2 on 179 of 195 LIP recording channels and 373 of 375 MST recording channels. This is likely related to previous reports of biased neuronal categorical representations (Fitzgerald et al., 2013). Furthermore, the magnitude of beta modulation differed between MST and LIP across task periods (Fig. 7). MST showed significantly stronger category-selective beta activity (difference between preferred and nonpreferred categories) than LIP during the late-sample to early-delay period (−200 to 200 ms relative to sample offset; Monkey M: 16–28 Hz, p = 0.029, df = 122, tstat = −2.2; Monkey Q: 10–20 Hz, p = 0.0096, df = 568, tstat = −2.6; unpaired t test), whereas LIP showed stronger category-selective beta activity during the mid-to-late delay period of the DMC task (201–1000 ms after sample offset; Monkey M: 10–22 Hz, p = 0.026, df = 122, tstat = −2.3; Monkey Q: 10–22 Hz, p = 3.7 × 10−10, df = 568, tstat = 6.4; unpaired t test).
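The reported degrees of freedom match a pooled-variance (Student's) unpaired t test with recording channels as the units: 55 + 69 − 2 = 122 for Monkey M and 195 + 375 − 2 = 568 for Monkey Q. The plain-Python sketch below computes the t statistic and degrees of freedom for such a test. A p-value would additionally require the CDF of the t distribution (available, for example, in SciPy), which is omitted here to keep the example dependency-free. The toy inputs are hypothetical.

```python
import math

def pooled_t_test(a, b):
    """Student's unpaired two-sample t statistic with pooled variance, and its df."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ss_a = sum((x - ma) ** 2 for x in a)
    ss_b = sum((x - mb) ** 2 for x in b)
    df = na + nb - 2
    pooled_var = (ss_a + ss_b) / df
    t = (ma - mb) / math.sqrt(pooled_var * (1.0 / na + 1.0 / nb))
    return t, df

# Channel counts reported in the text -> degrees of freedom of the unpaired test.
channels = {"Monkey M": (55, 69), "Monkey Q": (195, 375)}  # (LIP, MST)
for monkey, (n_lip, n_mst) in channels.items():
    print(monkey, "df =", n_lip + n_mst - 2)

# Hypothetical toy comparison.
t, df = pooled_t_test([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(round(t, 3), df)
```

The sign of the t statistic depends on which group is entered first, so the order of comparison (LIP vs MST or MST vs LIP) should be fixed when signs are compared across tests.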
Beta band LFPs in MST and LIP encode sample category information during the delay period. a, b, Category-selective LFP activity (difference between preferred and nonpreferred categories) for Monkey M. a, Average sample category-selective LFP power recorded from MST electrodes in time–frequency space. The first and second dashed vertical lines denote sample stimulus onset and offset, and the third dashed vertical line indicates test stimulus onset. LFP activity was z-scored for each recording channel in each recording session. Power is normalized to 1/f. The purple and orange bars mark the late-sample to early-delay and the mid-to-late delay periods used in further analysis. b, Average sample category-selective LFP power recorded from LIP electrodes. Strong category-selective LFP activity is seen in the beta band during the mid-to-late delay period of the DMC task. c, d, Sample category-selective LFP activity in MST (c) and LIP (d) electrodes for Monkey Q.
To further test whether beta band LFPs were correlated with the monkeys' categorical decisions, we compared category-selective beta activity between correct and error trials. As in the analysis of spiking activity, we included only trials in which the test direction was far from the category boundary (for which errors were most likely caused by misclassifying the sample). Category-selective beta oscillations on error trials in both MST and LIP were either reversed in sign relative to correct trials (Fig. 8a–d, Monkey M) or greatly reduced in strength (Fig. 8e–h, Monkey Q). These results indicate that beta oscillations in MST and LIP LFPs did not merely reflect the features of the sample stimuli, but were also correlated with working memory and decision processes.
Beta band LFP activity in MST and LIP reflects monkeys' correct versus incorrect categorical decisions. a, b, Category-selective LFP activity in MST (a) and LIP (b) on error trials for Monkey M. Only trials in which the test direction was far from the category boundary (for which errors were most likely caused by misclassifying the sample) are included. Furthermore, only data from recording sessions in which the monkey made enough errors (at least five error trials for each sample category condition) were included in the analysis (see Materials and Methods). The gray bar marks the time period used for further analysis. The category-selective LFP power in the beta frequency range is reversed on error trials compared with correct trials (Fig. 7a,b). c, d, Comparisons of category-selective LFP activity in MST (c) and LIP (d) between correct and error trials for Monkey M. The shaded area denotes ±SEM. The black dots mark the frequencies at which correct and error trials differed significantly (p < 0.05, paired t test). e, f, Sample category-selective LFP activity in MST (e) and LIP (f) on error trials for Monkey Q. g, h, Comparisons of category-selective LFP activity in MST (g) and LIP (h) between correct and error trials for Monkey Q.
Discussion
We directly compared neuronal encoding between cortical areas MST and LIP during a visual motion categorization task to understand how visual feature encoding is transformed into abstract categorical representations and decisions. First, we found that MST exhibits flexible, task-related encoding during the DMC task, with MST neurons showing abstract and decision-correlated encoding of visual motion categories with a strength similar to that in LIP. This suggests that MST plays a greater role in cognitive functions than is widely assumed, going beyond its well established role in visual motion processing. Second, we found substantial encoding of extraretinal, task-relevant information in MST, in both spiking and LFP activity, during the working-memory delay period of the DMC task. This is consistent with a recent report of delay period direction encoding in MST during a motion direction matching task, highlighting the engagement of MST with frontal-parietal areas associated with working memory, and it extends the role of MST toward more flexible or abstract decision-making. Furthermore, LIP and MST differed in the time courses of their category selectivity, with decision-correlated category selectivity appearing with a shorter latency in LIP.
Circuit mechanisms underlying motion categorization
Previous studies have given insight into the neural mechanisms underlying the transformation from visual feature encoding to more abstract categorical representations (Freedman et al., 2001; Freedman and Assad, 2006, 2011, 2016). In particular, a line of studies from our group has focused on visual motion categorization across a hierarchy of visual, parietal, and frontal cortical areas. We showed that neural activity in PFC, LIP, and MIP all encodes learned categories during the DMC task, with evidence suggesting that LIP is more closely involved in the motion categorization process than MIP and PFC (Swaminathan and Freedman, 2012; Swaminathan et al., 2013; Zhou et al., 2021). In contrast, neural activity in MT, an upstream motion-processing area that provides input to LIP (and MST), showed strong direction encoding but no obvious encoding of the learned categories and no altered direction tuning as a result of categorization training (Freedman and Assad, 2006).
In the present study, we show that neurons in MST, an important motion-processing area within PPC, showed significant category encoding during both the stimulus presentation and memory delay periods, which was qualitatively similar to that in LIP. In addition, we compared neuronal category encoding between correct and error trials in both MST and LIP, and found that such encoding in both areas was closely correlated with the trial-by-trial decisions of monkeys. These results are consistent with MST participating in transforming upstream motion direction encoding (i.e., in MT) into abstract and task-related categorical representations and maintaining task-relevant category information in working memory. This indicates that multiple interconnected PPC subregions, including LIP and MST, and perhaps others, are involved in the categorization process.
Although category encoding was broadly similar between LIP and MST, several pieces of evidence suggest that LIP might be more involved than MST in the rapid categorization of visual stimuli. First, category encoding initially emerged with a shorter latency in LIP than in MST (Fig. 2g,h; LIP vs MST, 229 vs 273 ms). Second, during the early sample period, category selectivity in LIP correlated more closely with the monkeys' trial-by-trial categorical decisions than that in MST (Fig. 6a,b; compare r-categorical-decision values). These results are consistent with recent work from our group showing that LIP plays a causal role in motion-based categorical and perceptual decisions (Zhou and Freedman, 2019). Category encoding in MST might arise from feedback from higher decision-related brain areas, such as LIP and PFC. Thus, it will be important for future studies to test whether MST also plays a causal role in the motion categorization process beyond its known role in visual motion processing.
It has been suggested that LIP plays a general role in encoding abstract or categorical information about visual stimuli (Fitzgerald et al., 2011; Freedman and Assad, 2011, 2016). A previous study showed that LIP neurons can encode both learned motion categories and learned pairings between associated shapes, suggesting that category-related or task-related encoding in LIP generalizes across multiple visual feature domains (Fitzgerald et al., 2011). Studies from other groups have also shown that LIP represents a wide range of higher-order factors beyond categorization, such as task rules (Stoet and Snyder, 2004), numerosity (Nieder and Dehaene, 2009), priority (Bisley and Goldberg, 2010), sensorimotor transformation (Zhou et al., 2020), motor error for corrective saccades (Zhou et al., 2016), and saccade timing (Zhou et al., 2018). Together, this evidence supports the view that LIP is generally involved in mediating abstract cognitive computations, beyond its role in visuospatial functions such as attention and saccade planning. However, to our knowledge, MST has not been reported to encode visual stimulus features other than visual motion, although it also encodes vestibular stimuli and contributes to smooth pursuit eye movements (Komatsu and Wurtz, 1989; Heide et al., 1996; Britten, 2008; Gu et al., 2012). Therefore, it will be important to test whether abstract encoding in MST is specialized for motion-based tasks or, as in LIP, reflects more generalized cognitive encoding during tasks based on other visual features. It will also be important to test whether the differences in neural encoding between LIP and MST generalize to tasks based on other visual features. In the current study, we used the same behavioral task and visual motion stimulus parameters as previous studies of LIP to facilitate direct comparisons between areas and across studies.
It will be important in future work to understand how matching stimulus properties even more precisely to the recorded MST neurons (e.g., stimulus size relative to the receptive field, RF) affects neuronal activity during the categorization task.
Cognitive functions of MST
MST is reciprocally connected with several subregions of parietal cortex, including LIP, VIP, and 7a (Andersen et al., 1990). MST is understood to be an important motion-processing stage, but it has been less implicated in higher cognitive functions. In particular, MST has been suggested to contribute to the perception of complex motion patterns and to the integration of visual and vestibular signals for heading direction perception during self-motion (Bradley et al., 1996; Geesaman and Andersen, 1996; Andersen et al., 1997; Gu et al., 2007, 2008, 2012; Britten, 2008). During motion-based decision tasks, MST has been suggested to function as an intermediate processing stage between primary motion processing in MT and more cognitive processing in LIP (Celebrini and Newsome, 1994; Parker and Newsome, 1998). Here we show that MST activity was correlated with the monkeys' categorical decisions in a manner similar to LIP, raising the possibility that MST is also causally involved in mediating such decisions. Such abstract category encoding is distinct from the sensory feature encoding (motion direction tuning but not category tuning) previously observed in MT during the same motion categorization task (Freedman and Assad, 2006). However, this comes with the caveat that the current study was performed in different monkeys, with different training histories, from those in the original comparison of MT and LIP in this task. Furthermore, previous studies from other groups have also shown that MT did not strongly encode extraretinal or task-related factors in several cognitively demanding tasks (Britten et al., 1996; Cook and Maunsell, 2002; Williams et al., 2003; Maimon and Assad, 2006; Mendoza-Halliday et al., 2014). Notably, one study reported that MST neuronal activity in a heading direction perception task was dominated by a stimulus-correlated signal and was less choice correlated than activity in another parietal area, the ventral intraparietal area (VIP; Zaidel et al., 2017).
In contrast, we found that category encoding in MST was more strongly correlated with the monkeys' trial-by-trial categorical decisions than with the physical motion of the stimuli (r-categorical-decision greater than r-stimulus). The discrepancy between these two studies could reflect differences in the cognitive demands of the two tasks and/or in the training histories required for the animals to learn them.
We also show that MST activity persistently encodes decision-correlated category information during the delay period of the DMC task, unlike upstream visual areas such as MT during the same task (Freedman and Assad, 2006). This is consistent with a previous study that found robust persistent direction-selective activity in MST but not MT in a delayed motion-matching task (Mendoza-Halliday et al., 2014), but it shows that delay-period encoding in MST extends to more abstract cognitive variables beyond basic stimulus features. This also suggests that MST is more closely linked than MT to frontoparietal circuits involved in working memory and task-related encoding. Previous work also found neural encoding in MST that appeared intermediate between MT and LIP, for example, strong direction encoding as in MT, with a modest influence of extraretinal or cognitive factors compared with that observed in LIP (Eskandar and Assad, 2002; Williams et al., 2003; Maimon and Assad, 2006). The more cognitive encoding observed in the present study could reflect the specific demands of the categorization task compared with the tasks used in previous MST studies. Another difference lies in the method of neuron sampling (e.g., how neurons were selected and/or prescreened before recording during the main task). In our study, we recorded from all task-related neurons that we encountered (see Materials and Methods), whereas previous studies that did not observe very strong cognitive encoding in MST (Williams et al., 2003) appear to have focused on direction-selective MST neurons. The dorsal and lateral parts of MST (MSTd and MSTl) are recognized as two functionally distinct regions (Saito et al., 1986; Britten, 2008; Gu et al., 2008, 2012).
Because our study cannot provide a direct comparison between MSTd and MSTl neurons, it will be important for future studies to further test the differential roles of MSTd and MSTl in visual categorization and other cognitive functions.
Beta band LFP activity during categorization and working memory
Frontoparietal oscillatory synchrony has been suggested to play a role in cognitive functions such as attention (Buschman and Miller, 2007), working memory (Salazar et al., 2012), and decision-making (Pesaran et al., 2008; Haegens et al., 2011). Specifically, the beta oscillation (12–30 Hz), often linked to motor functions, has been hypothesized to reflect the current sensorimotor or cognitive state via top-down selection of relevant neural ensembles (Engel and Fries, 2010; Antzoulatos and Miller, 2016). There is increasing evidence that beta band synchrony may also play a role in categorization and working memory. Previous studies have shown that beta band LFP oscillatory coherence in both PFC and PPC is category selective and may emphasize the encoding of task-relevant categories (Antzoulatos and Miller, 2016; Stanley et al., 2018). Interestingly, only category-selective PFC neurons, and not nonselective neurons, were synchronized with PPC beta oscillations, suggesting that long-range beta band synchrony could act as a filter supporting task-relevant encoding (Antzoulatos and Miller, 2016). Category-selective beta band synchrony has also been shown to develop between PFC and striatum in parallel with category learning (Antzoulatos and Miller, 2014). Furthermore, beta band synchrony between PFC and PPC, as well as within PFC, has been shown to encode stimulus information held in working memory (Salazar et al., 2012). Consistent with these studies, we found that beta band LFP activity in both MST and LIP showed decision-related category representation shortly before and during the working memory delay, suggesting that PPC beta band LFP activity correlates with categorical decision and working memory processes. We also found that the magnitude of such decision-correlated LFP activity was higher in MST than in LIP during the late-sample and early-delay periods, but lower in MST than in LIP during the mid-to-late delay.
This might also suggest differential involvement of PPC subregions during categorization and working memory. A previous study showed that beta band LFP oscillations in PFC and AIP were especially prominent during the late-sample and early-delay periods of a rule-based spatial categorization task (Antzoulatos and Miller, 2016), whereas we found decreased beta band LFP oscillations in LIP and MST during this period of the DMC task. Moreover, that study suggested that PFC circuits might compute the spatial categories and relay that information back to parietal cortex, whereas our previous study showed that PPC is more likely than PFC to lead the motion categorization process (Swaminathan and Freedman, 2012). These differences could reflect the different tasks (motion vs spatial categorization) and the different PPC subregions (LIP, MST, and AIP) studied. However, it remains unclear precisely what roles LFP activity and LFP-spiking interactions play in mediating categorical decisions. One possibility is that category-selective LFP activity is related to category-based top-down attention (i.e., the attentional template of the to-be-matched stimulus category in the DMC task), since previous studies have shown that beta band oscillations are associated with top-down attentional modulation (Buschman and Miller, 2007; Saalmann et al., 2007, 2012). Further causal experiments, such as those that inactivate or stimulate one area while recording in another (Ni et al., 2016), are likely to help clarify the functional significance of frequency-dependent LFP activity and LFP-spiking interactions during cognitive functions such as categorical decisions, attention, and working memory.
Overall, this study suggests that MST plays a nontrivial role in higher cognitive functions such as categorical decisions and working memory, extending beyond its traditionally recognized role in the sensory processing of visual motion. It will be important to build on this work to examine whether MST plays a more general role in visual categorization and working memory in paradigms beyond the DMC task and for visual stimuli other than motion. Furthermore, it will be important to assess, using techniques such as reversible cortical inactivation and microstimulation, whether the category-correlated neuronal activity in MST plays a causal role in the animals' categorical decisions. Future studies should also examine the differential roles of the wider network of PPC subregions in flexible cognition by simultaneously comparing neuronal population activity across areas during task performance.
Footnotes
This study is supported by National Institutes of Health Grant R01-EY-019041. We thank Dr. Kenneth Latimer, Dr. Pantea Moghimi, Barbara Peysakhovich, and Alessandra Silva for constructive and helpful comments during the manuscript preparation.
The authors declare no competing financial interests.
- Correspondence should be addressed to David J. Freedman at dfreedman{at}uchicago.edu or Yang Zhou at yangzhou1{at}pku.edu.cn