Abstract
The dorsomedial posterior parietal cortex (dmPPC) is part of a higher-cognition network implicated in elaborate processes underpinning memory formation, recollection, episode reconstruction, and temporal information processing. Neural coding for complex episodic processing is, however, under-documented. Here, we recorded extracellular neural activity from three male rhesus macaques (Macaca mulatta) and revealed a set of “neuroethogram” neural codes in the primate parietal cortex. Analyzing neural responses in macaque dmPPC to naturalistic videos, we discovered several groups of neurons that are sensitive to different categories of ethogram items, low-level sensory features, and saccadic eye movements. We also discovered that the processing of category and feature information by these neurons is sustained by the accumulation of temporal information over a long timescale of up to 30 s, corroborating the region’s reported long temporal receptive windows. We performed an additional behavioral experiment with two additional male rhesus macaques and found that saccade-related activity could not account for the mixed neuronal responses elicited by the video stimuli. We further observed that monkeys’ scan paths and gaze consistency are modulated by video content. Taken together, these neural findings explain how the dmPPC weaves the fabric of ongoing experience together in real time. The high dimensionality of these neural representations should motivate a shift of attention from neurons with pure selectivity to neurons with mixed selectivity, especially in increasingly complex naturalistic task designs.
- dorsomedial posterior parietal cortex
- information accumulation
- mixed selective representation
- neuroethology
- scan path and gaze consistency
- temporal receptive window
Significance Statement
Our study employs multiunit electrophysiology on macaques to investigate the neural coding of behaviorally relevant features in naturalistic contexts, focusing on the often-overlooked dorsomedial posterior parietal neurons. We unveil their real-time multiplex representation of video content, challenging conventional views by highlighting the role of mixed selectivity neurons in constructing a “neuroethogram” of cinematic material. This research prompts a reevaluation of neural selectivity in naturalistic paradigms, offering insights into how the primate brain deciphers complex real-world experiences.
Highlights
Neural codes for “neuroethogram” in macaque dorsomedial parietal cortex.
Parietal neural codes exhibit mixed selectivity of event features.
Dorsomedial PPC neurons support a long temporal receptive window for episodes.
Saccadic movement could not explain away mixed neuronal responses.
Consistency in scan path and gaze is shown across viewing repetitions.
Introduction
In an ever-changing environment, massive amounts of multidimensional information embedded in continuous events rush into the cognitive system. The neural system has to extract pieces of meaningful information, integrate them, and encode them into memory systems in time for future needs. The dorsomedial posterior parietal cortex (dmPPC), which consists of the paracentral Area 7 and precuneus (Cavanna and Trimble, 2006), is part of the posteromedial memory system (Ranganath and Ritchey, 2012) and has strong and widespread anatomical connections with its adjacent structures, including the early visual cortex, sensorimotor regions, medial temporal areas, and prefrontal cortex in both macaque monkeys and humans (Morecraft et al., 2004; Cavanna and Trimble, 2006; Kravitz et al., 2011).
A wealth of studies has demonstrated that the dmPPC plays critical roles in multifaceted cognitive processes, including visuospatial attention and locomotion in egocentric environments (Ghaem et al., 1997; Bartels et al., 2008); sensorimotor transformation processes, such as object manipulation (Gardner et al., 2007) and the execution and observation of reaching-to-grasp behaviors (Evangeliou et al., 2009; Diomedi et al., 2020); and representations of enumeration (Harvey et al., 2013), self-related processing (Cavanna and Trimble, 2006), and episodic memory formation and retrieval (Brodt et al., 2016, 2018). The dmPPC is part of an integral hub for extracting and scaffolding information in real time from the environment (Reagh and Ranganath, 2021) and from other agents (Kravitz et al., 2011; Freedman and Ibos, 2018).
Given the region’s myriad functions, the conventional logic of stimulus–response models might not be adequate for studying neuronal responses to complex stimuli and their interactions. For example, neurons in high-order brain areas such as the prefrontal cortex (Rigotti et al., 2013) and the parietal cortex show mixed selectivity for different stimuli (Fusi et al., 2016; Wallach et al., 2021) and processes such as decision-making (Erlich et al., 2015) and visuomotor coordination (Diomedi et al., 2020). Indeed, Platt and colleagues recently showed that neurons in the PFC and orbitofrontal cortex were engaged in valuing social information with a measure known as the “neuroethogram” (Adams et al., 2021), obtained by fitting neural activity to an ethogram, a structured annotation of species-typical behavior. Considering that the primate Area 7 is part of a social interaction network (Sliwa and Freiwald, 2017) and a posteromedial memory network (Ranganath and Ritchey, 2012), we predicted that dmPPC neurons process information embedded within complex, behaviorally meaningful events. Since the capacity of single neurons to integrate multiple variables flexibly should enhance the organism’s ability to perform nonlinear integration of multiple information sources (Vaccari et al., 2022), we reasoned that this capacity should be especially important for dealing with complex episodic information such as video content.
In addition to leveraging the multidimensional features contained in naturalistic videos, we were mindful that temporal information is another fundamental aspect of events (Clewett et al., 2019). Information is carried over distinct timescales, and the capacity for information accumulation increases from primary sensory cortex to higher-order cortex (Hasson et al., 2008, 2015). These studies propose that brain areas are organized in a hierarchy of temporal receptive windows (TRW), such that regions with long TRWs accumulate transient sensory signals from short-TRW regions for further processing. The human precuneus plays an essential role in integrating the temporal information of movies over windows of up to 12 s (Hasson et al., 2008; Andric et al., 2016). It remains unclear how neurons in the dmPPC might dynamically assemble such temporal details to support episode processing.
To address these two issues, multiplex content processing and the passage of time, we combined an ethogram methodology and dynamic cinematic material with multiunit extracellular electrophysiology in awake macaque monkeys to elucidate how dmPPC neurons form mixed selectivity representations of naturalistic content over time. In an additional eye movement experiment, we found that gaze behaviors and saccade-related activity could not fully account for the mixed neuronal responses elicited by the video stimuli.
Materials and Methods
Experimental model and subject details
Subjects
Five male rhesus macaques (Macaca mulatta; 8.66 ± 0.59 kg; mean age, 6.6 years) served as subjects in this study. Among them, we recorded extracellular electrophysiological activity from two monkeys (monkey Jupiter: 6 years, 8.3 kg; monkey Mercury: 6 years, 8.6 kg) and both neural activity and eye movements from a third monkey (monkey Galen: 8 years, 9.2 kg). In addition, we performed an eye movement experiment on two further monkeys (monkey K: 7 years, 9.3 kg; monkey P: 6 years, 7.9 kg; see composition of neural and eye-tracking data for monkeys: http://dqqvn.ecyv.cn/0b).
All monkeys were single-housed on a 12:12 light/dark cycle (lights on 7:00 A.M. to 7:00 P.M.) at 18–23°C and 60–80% humidity. The animals were fed twice a day (8:30 A.M. and 4:00 P.M.), each portion comprising at least 180 g of monkey chow plus pieces of apple. Water was restricted on recording days. All animal care, experimental and surgical procedures, and pre-/postsurgical care were approved by the Institutional Animal Care and Use Committee (permission codes M020150902 and M020150902-2018) at East China Normal University.
Before this study, monkey Jupiter and monkey Mercury participated in a temporal order judgment behavioral experiment (Wang et al., 2020; Zuo et al., 2020). Monkeys Galen, K, and P were trained on fixation and saccadic tasks and participated in several oculomotor studies with neuronal activities recorded from the right medial temporal and medial superior temporal areas (Jia et al., 2021).
Method details
Experimental procedure and overview
During this study, the monkeys sat in a custom-manufactured Plexiglas monkey chair (29.4 cm × 30.8 cm × 55 cm) with their heads fixed (see Surgery) in front of a 19-inch screen (An-190W01CM, Shenzhen Anmite Technology) mounted on a stainless-steel platform. The monkeys’ eyes were about 60 cm and 62 cm from the top and bottom edges of the screen, respectively. Water was delivered by a distributor (5-RLD-D1, Crist Instrument) as a reward.
In each session, the monkeys watched three different 30 s videos presented with PsychoPy (version 3.1.2; Fig. 1A; see lists of videos: http://dqs2g.ecyv.cn/b0; and link to the videos: http://dqq6h.ecyv.cn/2f), each for 30 repetitions arranged in 6 blocks. The same list was presented on two consecutive days (12 days in total), so each video was watched 60 times overall. In addition, 1 ml of water was delivered at the beginning of each video, and another 1.8 ml was delivered following a 6 s blank period at the end of the video. The monkeys took 5 min breaks between blocks.
Experimental procedure, recording sites, and feature selection with LASSO. A, Example video (a primate video) used in the study. Each day, the monkeys watched three different 30 s videos, each for 30 repetitions in 6 blocks. B, Reconstruction of recording sites (circled in red) overlaid on T1 images. C, D, We fitted a LASSO regression model with spike counts in 40 ms time bins as the dependent variable and 52 ethogram items and 4 low-level features as regressors. The algorithm penalizes the coefficients of less important variables toward zero as the parameter log(λ) gradually increases; a variable is filtered out of the model when its coefficient is shrunk to zero. A 10-fold cross-validation procedure was used to determine the value of λ at which the model produced the minimal mean squared error (MSE). For the example neuron #PC0087, the algorithm yielded an optimal model with 21 nonzero coefficient variables at log(λ) = –3.43. The red dashed lines represent the largest λ at which the MSE is within one standard error of the minimal MSE. Solid curves in D refer to the coefficient paths of the variables. The numbers on top indicate the number of nonzero coefficient variables in the optimal model. E, The set of nonzero coefficient variables produced by the model at minimal MSE. F, For validation of the optimal model, a regression model built on an 80% training dataset and tested on the remaining 20% showed significant predictive ability (F(1, 450) = 121.5, R2 = 0.213, slope = 0.200). G, Ethogram descriptions are presented for each video, with cells colored based on event frequency, denoting the number of frames corresponding to each specific event.
Experimental stimuli
The stimuli used in this study were downloaded from YouTube. We used VideoStudio X8 (Corel Corporation) to edit these videos into 720p segments at 25 frames per second. In total, we prepared eighteen 30 s video segments classified into three categories: (1) primate content, depicting the activities of monkeys; (2) nonprimate content, depicting the activities of other species, including deer, lions, hippopotamuses, hyenas, storks, rhinoceroses, ostriches, penguins, and giraffes; and (3) scenery content, depicting dynamic naturalistic scenes without any animals.
Ethogram analysis
An ethogram describes a set of archetypal naturalistic behaviors of a species using descriptive terms and phrases. For our collection of videos (see Stimuli), we constructed an inventory of behaviors by adapting the ethogram framework of Adams et al. (2021), by far the most comprehensive ethogram analysis to date. Each video contains only a subset of ethogram features, and no single video contains the full 52-item list (see Description and scoring of the ethogram: http://dqqvh.ecyv.cn/20). Different elements of the ethogram were presented across different videos and across days (Fig. 1G). We did not fully control the presence of all high-level features in the videos. By contrast, low-level visual features are by definition present in all videos. The behaviors in each video were manually registered with a custom program, Tinbergen Alpha (Adams, 2014), producing a complete binary time series of observable events.
Low-level feature extraction
Low-level video features were extracted using Python and MATLAB for further modeling. The OpenCV package (Bradski and Kaehler, 2000) was used in Python to compute luminance, contrast, and saturation. The luminance of each video frame was the mean of the pixel-wise luminosity, computed as l_pixel = 0.299 × R + 0.587 × G + 0.114 × B (Jack, 2008). The contrast of each frame was the standard deviation of the pixel-wise intensity distribution of the grayscale frame (Perfetto et al., 2020). Saturation was the mean pixel-wise S value in HSV (hue, saturation, value) color space, converted from RGB color space (Jack, 2008). Motion was quantified as the mean velocity magnitude of the optical flow, computed with the built-in Horn–Schunck algorithm in MATLAB (Bartels et al., 2008; Sliwa and Freiwald, 2017).
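As an illustration, the per-frame measures above can be computed with OpenCV roughly as follows. This is a minimal sketch, not the authors’ exact pipeline: the paper computed optical flow with MATLAB’s Horn–Schunck implementation, for which OpenCV’s Farneback algorithm is substituted here, and the function name frame_features is ours.

```python
import cv2
import numpy as np

def frame_features(prev_gray, frame_bgr):
    """Luminance, contrast, saturation, and motion for a single frame."""
    b, g, r = cv2.split(frame_bgr.astype(np.float64))
    luminance = np.mean(0.299 * r + 0.587 * g + 0.114 * b)  # mean pixel-wise luminosity
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    contrast = gray.astype(np.float64).std()                # SD of grayscale intensities
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    saturation = hsv[..., 1].mean()                         # mean S channel in HSV space
    # Dense optical flow (Farneback, as an OpenCV stand-in for Horn-Schunck)
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    motion = np.linalg.norm(flow, axis=2).mean()            # mean flow velocity magnitude
    return luminance, contrast, saturation, motion, gray
```

Looping frame_features over frames read with cv2.VideoCapture would yield the four per-frame regressors entered into the LASSO models described below.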
Eye tracker and eye movement experiment
An infrared EyeLink 1000 Plus acquisition device (SR Research) was used to track eye position at a sampling rate of 1,000 Hz. The illuminator module and the camera were positioned above the monkeys’ heads, and an angled infrared mirror was used to capture and re-coordinate the monkeys’ eye positions. Each trial was initiated with a self-paced 2 s fixation (range, 1.85 to 2.15 s) on a white dot (50 × 50 pixels) centered on the screen.
Electrophysiological recording and spike sorting
We recorded multiunit activity from the right hemisphere using chronically implanted glass-coated electrodes (SC32, Gray Matter Research) in monkeys Jupiter and Mercury and a single-shank tungsten microelectrode with 24 probes (LMA Single Shank, Microprobes) in monkey Galen. In each recording session, the monkeys sat in chairs with their heads fixed. The headstage of the multichannel utility was connected to the SmartBox (NeuroNexus Technologies) acquisition system via an Intan amplifier adapter (RHD2000, Intan Technologies) with 32 unipolar inputs. Microelectrode impedance of each channel was in the range of 0.5–2.5 MΩ and was measured at the beginning of each session. Spike waveforms above a set threshold were identified with a 1,000 Hz online high-pass filter. Electrophysiological data were bandpass filtered from 0.1 to 5,500 Hz and digitized at 30 kHz. Data from different sessions were treated as separate datasets. Single units and their spikes were then identified based on peak amplitude, principal components, autocorrelation, and spike width using Offline Sorter (Plexon). Units with an overall mean firing rate of less than 1 Hz across video presentations were excluded from further analysis.
Before recording, for monkeys Jupiter and Mercury, channels without spikes were manually advanced anticlockwise in search of promising spike waveforms. On any given day, an individual electrode was advanced at most eight turns (1 mm), in steps of one-eighth to one full turn (15.625–125 μm). For monkey Galen, a custom-designed recording grid (Delrin, 56 mm × 33.5 mm, 5 mm thick) with interlaced holes (0.8 mm in diameter and 0.8 mm apart) was fixed on the plastic chips in the headpost. A fitted guide tube then led the 24-probe tungsten microelectrode (LMA Single Shank, Microprobes), with 0.1 mm spacing between adjacent probes, through the skull and dura. A hydraulic microdrive (Frameless Hardware Company) was used to drive the microelectrode into the target cortex, which was determined from MRI T1 images. By the end of the study, Jupiter and Mercury were scanned with CT, and the location of each electrode was confirmed by coregistering the CT image to the MRI T1 images. Histological recording sites were reconstructed based on the penetration depth of each electrode, together with the chamber coordinates and angles to the transverse plane (Fig. 1B).
Surgical procedure for headpost and electrode implantation
For monkeys Jupiter and Mercury, the surgeries consisted of two stages: headpost installation and electrode implantation. Each stage was followed by a recovery period during which one dose of an analgesic (Tolfedine, Vetoquinol) and an antibiotic (Baytril, Bayer HealthCare Pharmaceuticals) was given daily via intramuscular injection, dosed by body weight, for one week. All medical operations and pre-/postsurgical healthcare complied with the Institutional Animal Care and Use Committee guidelines at East China Normal University.
Headpost installation
Food and water were withheld for the 12 h before surgery. Forty-five minutes before the surgery, one dose of atropine sulfate (Shanghai Pharmaceuticals) was injected to reduce saliva secretion during the operation. Ten minutes later, one dose of Zoletil (Virbac) was injected for anesthesia before the monkeys were transferred to the preparation room, where the head was shaved. Once the skin was prepared, the monkeys were placed in a stereotaxic apparatus mounted on the operating table. A mixture of oxygen and isoflurane was delivered via a ventilator. Dexamethasone (0.5 mg/kg, Jilin Huamu Animal Health Products) was administered by intravenous transfusion with a 5% glucose–saline (Sake Biotechnology) injection at the beginning of the surgery to reduce intracranial pressure and prevent bone inflammation during or after the surgery. Respiration, heart rate, blood pressure, expired CO2, and oxygen saturation were monitored throughout the surgical procedure. Body temperature was maintained at 37°C with a constant-temperature heater under the operating table. After opening the epidermis and removing the subcutaneous tissue, an MRI-compatible polyether ether ketone (PEEK, Gray Matter Research) headpost was cemented with acrylate cement (Refine Bright, Yamahachi Dental) and anchored with ceramic bone screws (Gray Matter Research) distributed over the anterior part of the skull. Sterilized saline was dripped on to rapidly cool the hardening cement and to clear crumbs from around the wound. Analgesics and anti-inflammatories were injected as required once the ventilator was turned off and the intravenous line was withdrawn. MRI anatomical scans were acquired 4 months later to aid the subsequent implantation of the recording chambers.
Recording chamber implantation
Preoperative preparations were identical to those of the first stage. After opening the epidermis and removing the subcutaneous tissue, a craniotomy (5/8-inch diameter) was manually drilled over the right hemisphere, with the center of the chamber predetermined by simulation in 3D Slicer (Kikinis et al., 2014). Next, the surface around the craniotomy was polished into a plane, and the medial wall was smoothed to accommodate only the chamber of the acquisition system. Then, 12 ceramic screws were placed for chamber fixation, and 2 stainless-steel screws were anchored for grounding. After that, the areas surrounding the screws were tightly sealed with Super Bond (Sun Medical), and the chamber was fixed with Palacos (Heraeus Medical) and acrylate cement.
Immediately afterward, the monkey was transferred into an MRI scanner and imaged with a fiducial filled with gadopentetate dimeglumine (Shanghai Xudong Haipu Pharmaceutical) diluted 750 times. The center of the chamber was re-registered based on the fiducial-marker model in 3D Slicer [monkey Jupiter: anteroposterior (AP), −16.4 mm; mediolateral (ML), 5.8 mm lateral to medial, 28° angle to the right and 14° angle to the posterior of the transverse plane; monkey Mercury: AP, −15.422 mm; ML, 7.549 mm, 25° angle to the right and 9.1° angle to the posterior], covering the paracentral part of Area 7a. Twenty-four hours later, the sterilized electrode assembly was fitted into the chamber while the monkeys were awake, and each electrode was gradually lowered anticlockwise by 44 turns (5.5 mm) to penetrate the dura and pia.
For monkey Galen, T1 images were acquired before surgery. After exposing the skull and removing the hypodermis, a lightweight acrylic cap was anchored by six titanium screws with acrylate cement for head fixation. The cavity of the chamber was filled with a layer of cement, and two custom-designed plastic chips (10 mm × 56 mm, 5 mm thick) were stabilized over the hardened cement to restrict the recording grid. At the end of the surgery, an acrylic resin cap was placed over the chamber, and the monkey was allowed to rest and recover.
Quantification and statistical analysis
Data analysis was performed using custom software written in R, Python, and MATLAB.
Feature selection with least absolute shrinkage and selection operator regression
We employed the “glmnet” package (Friedman et al., 2017) in R to build a linear model with the least absolute shrinkage and selection operator (LASSO) feature selection algorithm under elastic net regularization (Tibshirani, 1996). In contrast to the commonly used general linear model, LASSO regression has advantages for the present study. Annotation of the ethograms produced a schematic binary time series with 52 dimensions embedded within all the videos. An important feature of the ethogram is that some items are linearly correlated and not mutually exclusive, implying that we might not be able to disentangle the internal structures and relationships among the 52 ethogram items. For example, whenever the animal count is 2 or more, it is necessarily 1 or more, so the two indicators co-occur. A large number of regressors and the multicollinearity of simultaneous happenings tend to cause overfitting, which inflates the cost function and reduces the explanatory power of the model. LASSO regression scales all variables and shrinks the coefficients of less important predictors to zero, filtering these redundant items out of the model. In short, the LASSO algorithm selects features with nonzero coefficients by minimizing the prediction error of the model (Tibshirani, 1996; Zou and Hastie, 2005; Muthukrishnan and Rohini, 2016), which allowed us to determine which selected features modulate neural activity.
In the ethogram analysis, there is an unbalanced frequency of variables in the pool of stimuli (Fig. 1G). We acknowledge that there are a number of limitations related to this. One of the limitations of the LASSO regression, particularly in the context of sparse variables, is its tendency to arbitrarily select one variable over another when they are highly correlated. This can lead to instability in the model and difficulty in interpreting the results. When dealing with sparse variables, LASSO tends to enforce sparsity in the estimated coefficients by shrinking some coefficients to exactly zero, effectively excluding certain variables from the model. While this can be advantageous in reducing overfitting and simplifying the model, it can also lead to potential information loss, as important variables might be excluded even if they have some predictive power. Another limitation is that LASSO selects variables based on their individual contributions to the model, ignoring potential interactions between variables. This can also lead to oversimplified models that fail to capture the complex relationships between predictors, especially in cases where interactions are crucial for understanding the underlying phenomenon. It is thus essential to be aware of these limitations when using LASSO regression, especially in the context of sparse variables (Wang et al., 2007; Fonti and Belitser, 2017).
For each neuron, we concatenated the sequences of spike counts in 40 ms time bins, averaged over the 30 repetitions, together with the time series of ethogram labels, in the order of the nonprimate, primate, and scenery videos. A LASSO regression was constructed to model neuronal activity as a function of the 52 ethogram items and 4 low-level features. With increasing λ, the algorithm iteratively penalized the coefficients of all items, gradually shrinking them to zero (Fig. 1C,D). An optimal λ was obtained by a built-in cross-validation procedure at the point where the LASSO algorithm reached the minimum residual sum of squares. At the optimal λ, the features with nonzero coefficients were selected into the model (Fig. 1E), implying that neural activity was effectively modulated by these selected features. For model validation, we performed a cross-validation in which a model with the selected features was fit on a random 80% of the sample data and used to predict the remaining 20% (Fig. 1F).
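To make the feature selection step concrete, below is a minimal sketch using scikit-learn’s LassoCV in place of the R glmnet package used in the paper. The simulated X and y arrays and all variable names are placeholders for illustration, not the authors’ data.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split

# Hypothetical inputs: X is an (n_bins, 56) matrix of regressors
# (52 binary ethogram items + 4 low-level features) concatenated across
# the three videos (3 x 750 bins of 40 ms); y is the trial-averaged
# spike count per 40 ms bin.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(2250, 56)).astype(float)
y = X @ rng.normal(0, 0.2, 56) + rng.normal(0, 1, 2250)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# 10-fold cross-validation picks the lambda (alpha) minimizing MSE,
# mirroring glmnet's cv.glmnet; standardization of predictors is left
# to the analyst here.
lasso = LassoCV(cv=10, random_state=0).fit(X_train, y_train)
selected = np.flatnonzero(lasso.coef_)  # features with nonzero coefficients
print(f"optimal alpha = {lasso.alpha_:.4f}, "
      f"{selected.size} features retained, "
      f"test R^2 = {lasso.score(X_test, y_test):.3f}")
```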
Eye movement analysis
Raw eye movement data were converted from .edf to .asc format. The following analyses were conducted with custom R programs.
Saccade identification and saccadic modeling
We defined the duration of a saccadic event as the time elapsed from when the eye velocity exceeds 15° s−1 to when it returns below that threshold (Goffart et al., 2017). In the present study, saccades were identified by the EyeLink 1000 Plus acquisition system with a “SACC” marker during recording. In total, 41,646 saccades were identified, with a mean amplitude of 11.20° and a mean duration of 70.35 ms.
For each trial, the eye movement timeline was aligned with the onset of the movie stimulus. The start and end times of each saccade were then divided by 0.04 s (40 ms, the duration of one frame) to localize the start frame and end frame of that saccade. If the start and end points fell within the same frame, that frame was assigned a value of 1; if they fell in separate frames, all frames from the beginning to the end of the saccade were assigned a value of 1. The remaining frames, without saccades, were labeled 0. In this way, a frame-by-frame binary saccade time series was generated trial by trial and defined as the 57th feature. A LASSO feature selection algorithm was then performed for each neuron from monkey Galen, fitting the neural activity with the binary labeling of the 52 ethogram items, the 4 low-level visual features, and the saccade item.
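A minimal sketch of this frame labeling, assuming saccade onset and offset times (in seconds from movie onset) have already been parsed from the EyeLink “SACC” events; the function and variable names are ours.

```python
import numpy as np

FRAME_DUR = 0.04  # one video frame = 40 ms
N_FRAMES = 750    # 30 s video at 25 frames per second

def saccade_frames(saccades, n_frames=N_FRAMES):
    """Binary frame-by-frame saccade labels for one trial.

    `saccades` is a hypothetical list of (start_s, end_s) tuples aligned
    to movie onset.
    """
    labels = np.zeros(n_frames, dtype=int)
    for start_s, end_s in saccades:
        first = int(start_s // FRAME_DUR)            # frame containing saccade onset
        last = min(int(end_s // FRAME_DUR), n_frames - 1)  # frame containing offset
        labels[first:last + 1] = 1                   # mark every spanned frame
    return labels
```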
Grid segregation and scan-path similarity
Grid segregation serves the purpose of dimensionality reduction of the eye data. The recorded x and y coordinates of fixations and saccades are mapped to a sequence of grid indices by assigning each sample the index of the grid cell containing it, g = ⌈x/w⌉ + n_x(⌈y/h⌉ − 1), for a grid of n_x × n_y cells of width w and height h pixels.
Next, we calculated the x and y coordinates of the averaged eye position in each frame. The scan path is defined by mapping these coordinates into grids, producing frame-by-frame eye position trajectories throughout the viewing of each video. Coordinates during blinks were filled in by linear interpolation, using the 100 ms of coordinates before and after each blink as the baseline. Scan-path similarity across viewings was estimated using pairwise correlation. To compare the variation in scan paths over repeated viewings, the correlation coefficient r was converted to z using the Fisher z transformation (Silver and Dunlap, 1987).
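The pairwise similarity computation can be sketched as follows; the array layout and function name are assumptions for illustration.

```python
import numpy as np

def scanpath_similarity(paths):
    """Mean pairwise scan-path similarity across viewing repetitions.

    `paths` is a hypothetical (n_repetitions, n_frames) array holding the
    frame-by-frame grid indices of one video's viewings. Pearson
    correlations between repetitions are Fisher z-transformed before
    averaging, as described above.
    """
    n_rep = paths.shape[0]
    z_scores = []
    for i in range(n_rep):
        for j in range(i + 1, n_rep):
            r = np.corrcoef(paths[i], paths[j])[0, 1]
            r = np.clip(r, -0.9999, 0.9999)  # guard against |r| = 1
            z_scores.append(np.arctanh(r))   # Fisher z transformation
    return np.mean(z_scores)
```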
Decoding analysis
We used a support vector machine (SVM) decoding method in the present study.
Video content type discriminability
An SVM classifier with a leave-one-out cross-validation approach was applied to trial spike-count sequences binned in 1 s bins to quantify the representation of categorical natural episodes in neuronal activity, using the “e1071” package (Meyer et al., 2019) in R. For each neuron, a multiclass decoder was trained on 87 trials (29 trials of each video) and tested on the remaining 3 trials using a one-versus-all method. Overall decoding performance was estimated as the average accuracy over 30 repetitions. The decoding accuracy for each video was taken as the percentage of trials correctly predicted. To test the significance of decoding ability, we trained a multiclass classifier on spike sequences with randomly shuffled labels and tested it on the left-out shuffled-label trials over 1,000 repetitions. The statistical significance of real decoding performance was determined by comparison with the 95th percentile of the shuffled decoding accuracy.
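A minimal sketch of this decoding scheme, using scikit-learn’s SVC in place of the R e1071 package used in the paper; the data layout (90 trials × time bins) and the function name decode_content are our assumptions.

```python
import numpy as np
from sklearn.svm import SVC

def decode_content(X, y, reps, n_shuffles=1000, seed=0):
    """Leave-one-repetition-out content-type decoding for one neuron.

    X: (90, n_bins) spike counts (3 videos x 30 repetitions), y: content
    label per trial, reps: repetition index (0-29) per trial. Each fold
    trains a linear one-vs-rest SVM on 87 trials and tests on the 3
    trials of the held-out repetition.
    """
    rng = np.random.default_rng(seed)

    def accuracy(labels):
        correct = 0
        for r in range(30):
            test = reps == r
            clf = SVC(kernel="linear", decision_function_shape="ovr")
            clf.fit(X[~test], labels[~test])
            correct += (clf.predict(X[test]) == labels[test]).sum()
        return correct / len(labels)

    real = accuracy(y)  # average accuracy over the 30 folds
    # Permutation null: shuffle labels and repeat the whole procedure
    null = [accuracy(rng.permutation(y)) for _ in range(n_shuffles)]
    threshold = np.percentile(null, 95) if null else None
    return real, threshold
```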
Temporal accumulation
We iteratively trained the SVM decoder on spike counts in 1 s time bins, from the 1st to the 30th time bin, for the temporal accumulation decoding analysis to differentiate the video contents. A leave-one-out cross-validation training–testing SVM decoding approach was implemented at each accumulation time point. For each neuron at each time point, a multiclass decoder was trained on 87 trials (29 trials of each video) and tested on the remaining 3 trials using a one-versus-all method. For example, the spike count in the first 1 s time bin was used for the first accumulation time point, the spike-count sequence from the first to the second time bin was used for the second accumulation time point, the sequence from the first to the third time bin was used for the third time point, and so on. To confirm that the accumulation effect is an intrinsic neural function rather than a momentary response (spiking in individual 1 s time bins) to the stimuli, we performed a similar decoding analysis for each individual time point. For the estimation of statistical significance, a similar permutation SVM decoding procedure was applied to the corresponding accumulation and individual time points, respectively.
We defined three criteria to identify an accumulation neuron: (1) decoding accuracy on real firing sequences is significantly higher than on the corresponding label-shuffled firing sequences; (2) decoding accuracy on real (accumulated) firing sequences is significantly higher than the corresponding individual time point decoding performance; and (3) decoding performance on real firing sequences increases as a function of accumulated time points.
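A sketch of the cumulative versus momentary comparison, reusing the hypothetical decode_content() from the previous sketch; array names are again placeholders.

```python
import numpy as np

def accumulation_curve(X_bins, y, reps):
    """Decoding accuracy as a function of accumulated 1 s time bins.

    X_bins: (90, 30) array of 1 s binned spike counts per trial. Time
    point t is decoded from bins 1..t (cumulative) and from bin t alone
    (momentary) with decode_content() defined above.
    """
    cumulative, momentary = [], []
    for t in range(1, 31):
        acc_cum, _ = decode_content(X_bins[:, :t], y, reps, n_shuffles=0)
        acc_mom, _ = decode_content(X_bins[:, t - 1:t], y, reps, n_shuffles=0)
        cumulative.append(acc_cum)
        momentary.append(acc_mom)
    # An accumulation neuron: the cumulative curve exceeds its shuffled
    # threshold, exceeds the momentary curve, and has a positive slope.
    return np.array(cumulative), np.array(momentary)
```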
Chi-squared simulation
A chi-squared simulation procedure was used to determine the chance-level percentage of units modulated by a given selected feature. The chance level was determined to be 7.2% (27/375 neurons; Fig. 2B, blue dashed line).
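One plausible reading of this test, sketched below under the assumption that the simulation yields the stated 27/375 chance count: the observed number of selected neurons for a feature is compared against the chance expectation with a chi-squared goodness-of-fit test. The constants and function name are ours.

```python
from scipy.stats import chisquare

N_NEURONS = 375
CHANCE = 27 / N_NEURONS  # simulated chance level (7.2%)

def feature_above_chance(n_selected):
    """Chi-squared test of an observed selection count against chance."""
    observed = [n_selected, N_NEURONS - n_selected]
    expected = [CHANCE * N_NEURONS, (1 - CHANCE) * N_NEURONS]
    return chisquare(observed, expected)  # returns (statistic, p-value)
```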
Neuron classification. A–G, Raster plots (left panels) of reordered trials (left axes) overlaid by spike density histograms with 100 ms Gaussian kernel smoothing (right axes) and firing rate comparisons (right panels) of seven representative neurons responding to different video content types. All show a significantly higher firing rate during video viewing than before (pre) and after (post) video presentation (p < 0.05). In the raster plots, the x-axis indicates the time course of the video, and vertical lines represent the onset and offset of the video display; each row corresponds to a trial. Trials are re-ranked by video content type (yellow, scenery; blue, nonprimate; green, primate). Three example content-sensitive neurons showed significantly higher firing rates to primate (A, #PC0056, primate), nonprimate (B, #PC0040, nonprimate), and scenery (C, #PC0114, scenery) content types. Firing rates of three further content-sensitive neurons were lowest for primate (D, #PC0232, nonprimate–scenery), nonprimate (E, #PC0205, primate–scenery), and scenery (F, #PC0249, primate–nonprimate) content types. G, A content-insensitive example neuron (#PC0192) exhibited equal firing rates across video content types. ⊗: significant phase (pre, viewing, post) × category (primate, nonprimate, scenery) two-way interaction (p < 0.05). Colored rectangle: significantly higher firing rates during the viewing phase. Error bars: SEM. *p < 0.05, **p < 0.01, ***p < 0.001.
Results
In this study, we used five macaque monkeys. To study mixed selectivity coding, we had three monkeys view 18 different movies (categorized as primate, nonprimate, or scenery content) while we performed extracellular action potential recording targeting the dmPPC (with eye movements simultaneously recorded in one of them). On each day, the monkeys watched 3 different movies, each for 30 repetitions. These movies were custom-edited to contain both content and temporal information (Fig. 1A; see link to the videos: http://dqq6h.ecyv.cn/2f). In total, we recorded extracellular activity of 375 units (monkey Jupiter, 164; monkey Mercury, 157; monkey Galen, 54) (Fig. 1B; Methods). In addition, we performed an eye movement experiment on two further monkeys to examine saccadic events and gaze consistency during viewing.
Classification of neurons by their specificity to the video’s content type
To evaluate neural responses during video viewing, we conducted repeated-measures ANOVAs followed by simple main effect analyses, revealing a substantial rise in firing rates during the viewing phase compared to the pre- and post-video phases (post hoc paired comparisons, all p < 0.05; Fig. 2A–G). By comparing the neural spiking rates across the three video content types, we observed that 33.07% (124/375) of the neurons exhibited significant (post hoc paired comparisons, all p < 0.05) content-sensitive activity (Fig. 2A–F), while 66.93% (251/375) showed no difference in firing rate across video contents (content-insensitive, Fig. 2G). Among the content-sensitive neurons, 52.4% (65/124) had a higher mean firing rate for primate videos (primate), whereas 8.9% (11/124) and 16.9% (21/124) had a higher mean firing rate for nonprimate (nonprimate) and scenery (scenery) videos, respectively (Fig. 2A–C). In contrast, 10.5% (13/124) of the content-sensitive neurons discharged less to primate videos (nonprimate–scenery, Fig. 2D), and 3.2% (4/124) and 8.1% (10/124) discharged less to nonprimate (primate–scenery, Fig. 2E) and scenery contents (primate–nonprimate, Fig. 2F), respectively. Thus, dorsomedial parietal neurons respond differentially to different video content types, some preferring primate content, which is consistent with previous findings that a portion of the monkey medial parietal cortex is activated by the social interactions of conspecifics (Sliwa and Freiwald, 2017).
Multiplex representation of ethogram items and low-level features in dmPPC neurons
We analyzed the videos in detail by employing a semiautomatic frame-by-frame annotation of the ethogram schema, which contains a subset of binary time series of observable social and nonsocial events (Adams et al., 2021). To investigate how individual neurons encode the dimensions of this informative dynamic natural context, we fit a LASSO regression for each neuron, modeling its averaged activity with the binary labeling of the 52 ethogram items (see description and scoring of the ethogram: http://dqqvh.ecyv.cn/20) and four low-level visual features. The analysis produced a collection of nonzero coefficients. As shown for an example neuron, the model with the lowest mean squared error was chosen (Fig. 1C–E). We validated the chosen model by demonstrating a significant relationship with the predicted neural firing rates (F(1, 450) = 121.5, R2 = 0.213, p < 10−5; Fig. 1F). Applying this feature selection procedure to all neurons, our model indicated that the activity of a large percentage of neurons was either positively or negatively modulated by a number of ethogram items depicted in the videos (ranging from 3.73 to 74.13% of neurons per item; Fig. 3A,B).
dmPPC neurons respond to social and nonsocial events in videos. A, Effects of neuronal responses to 4 low-level features (dark red) and 52 ethogram items (7 ethogram categories organized by 7 different colors), obtained by LASSO regression analysis. Each row stands for one of the 56 items, and each column refers to one neuron. Blue dashed lines demarcate neurons acquired from the three monkeys (J = Jupiter, M = Mercury, G = Galen). B, Proportion of neurons responsive to each item. The blue dashed line indicates the chance level. C, LASSO coefficient for each item tested against zero. Error bars: SEM over neurons. *p < 0.05, **p < 0.01.
A large proportion of units responded to facial (visible face, 57.87%; side face, 47.47%; direct face, 74.13%; eye contact, 60.27%) and genital (visible genitals, 57.60%; prominent genitals, 53.87%; male genitals, 33.07%; female genitals, 22.4%) features (Fig. 3A,B). For prominent ethogram items, we ran one-sample t tests to test for consistent modulatory effects. The results revealed that eye contact (t(225) = 2.921, p = 0.004, Cohen’s D = 0.194), prominent genitals (t(201) = 2.249, p = 0.026, Cohen’s D = 0.158), holding food in mouth (t(132) = 2.298, p = 0.023, Cohen’s D = 0.199), allogroom (t(43) = 2.561, p = 0.014, Cohen’s D = 0.386), mounted threaten (t(48) = 2.911, p = 0.005, Cohen’s D = 0.416), and mutual aggression (t(13) = 2.643, p = 0.020, Cohen’s D = 0.706) significantly enhanced firing rates, whereas holding food (t(50) = −2.750, p = 0.008, Cohen’s D = 0.385) reduced neuronal activity (Fig. 3C). The proportion of neurons modulated by each of the features was higher than chance (see Methods, Chi-squared simulation) (Fig. 3B). For example, the category “camera movement,” which includes multiple camera motions, modulated the discharge of 83.20% (312/375) of all units (Fig. 6C). The category “count,” that is, the number of animals visible, influenced a substantial portion of all units (92.27%, 346/375; Fig. 6C). Factoring in a detailed classification of spiking selectivity to video content, we divided the neurons into seven types according to their responses to content types in accordance with the ANOVA results, and found a more refined pattern of corresponding results for these seven types (Fig. 4).
Quantitative effect of selected features on the seven types of content-related neurons. One-sample t tests were used to assess the consistency of modulation for each selected feature within each subgroup of neurons. A, Primate (P) units were positively modulated by optical flow (t(49) = 3.092, p < 0.01, Cohen’s D = 0.437), side face (t(31) = 4.067, p < 0.001, Cohen’s D = 0.719), prominent genitals (t(38) = 3.374, p < 0.01, Cohen’s D = 0.540), holding food in mouth (t(36) = 3.506, p < 0.01, Cohen’s D = 0.576), chew (t(17) = 2.159, p < 0.05, Cohen’s D = 0.509), allogroom (t(16) = 2.663, p < 0.05, Cohen’s D = 0.646), and grapple (t(7) = 4.101, p < 0.01, Cohen’s D = 1.450), and negatively modulated by camera tracking (t(48) = −2.758, p < 0.01, Cohen’s D = 0.394), visible face (t(36) = −3.669, p < 0.001, Cohen’s D = 0.603), and group foraging (t(17) = −2.381, p < 0.05, Cohen’s D = 0.561). B, Activity of nonprimate–scenery (Np&S) neurons was boosted by camera panning (t(9) = 3.410, p < 0.01, Cohen’s D = 1.078) and animal count >5 (t(10) = 2.382, p < 0.05, Cohen’s D = 0.718) but lowered by chewing behavior (t(1) = −15.016, p < 0.05, Cohen’s D = 10.618). C, Nonprimate (N) neurons responded more to video saturation (t(7) = 2.650, p < 0.05, Cohen’s D = 0.883) and animal count >1 (t(6) = 2.899, p < 0.05, Cohen’s D = 1.096) but responded less to optical flow (t(9) = −3.398, p < 0.01, Cohen’s D = 1.075), allogroom (t(1) = −285.532, p < 0.01, Cohen’s D = 201.902), and chase (t(1) = −104.730, p < 0.01, Cohen’s D = 74.055). D, Primate–scenery (P&S) units showed no consistent modulation by any video feature (p > 0.25). E, Responses of scenery (S) units were slightly suppressed by luminance (t(14) = −3.030, p < 0.01, Cohen’s D = 0.783). F, Animal count >1 (t(4) = 2.965, p < 0.05, Cohen’s D = 1.326) increased the activation of a subgroup of primate–nonprimate (P&Np) neurons. G, Eye contact (t(151) = 3.798, p < 0.001, Cohen’s D = 0.308), mounted threaten (t(30) = 2.180, p < 0.05, Cohen’s D = 0.392), and any aggression (t(86) = 2.142, p < 0.05, Cohen’s D = 0.230) positively evoked the firing of content-insensitive (CI) units, while foraging (t(36) = −2.695, p < 0.05, Cohen’s D = 0.443), holding food (t(29) = −2.409, p < 0.05, Cohen’s D = 0.440), and chase (t(30) = −2.396, p < 0.05, Cohen’s D = 0.430) significantly decreased CI responses. Colors (in A–G) refer to the item category labels shown on the left. Error bars: SEM. *p < 0.05, **p < 0.01, ***p < 0.001.
We next asked whether dmPPC neurons play a role in the processing of low-level features (cf. the monkey early visual cortex described by Russ and Leopold, 2015). In the same LASSO model, low-level features tuned a large proportion of dmPPC neuronal responses (“low-level features” in Fig. 3A,B, dark red bars). A large subset of neurons was tuned by luminance (70.40%, 264/375), contrast (73.07%, 274/375), saturation (72.00%, 270/375), and optical flow (73.60%, 276/375), respectively. To elucidate the separate contributions of high- versus low-level features, we performed a separate cross-validated LASSO regression to identify how low-level features contribute to neural activity. Taking neuron #PC0087 as an example, compared to our original full model (Fig. 1F), the R2, or explanatory power of the regression model, remained statistically significant once the low-level features were removed (F(1, 450) = 78.85, R2 = 0.147, slope = 0.156). This implies that we cannot exhaustively capture all the variance contributed by all low-level features (e.g., local spectral and spatiotemporal elements) to reach a 100% fit of the neural activity.
Previous studies suggested that neuronal response latencies to stimulus onset in the posterior parietal cortex range from 45.2 ms in the lateral intraparietal area (LIP) (Bisley et al., 2004) to 98 ms in Area 7a (Bushnell et al., 1981; Barash et al., 1991). To test whether latency had any effect on our results, we realigned the spike trains to the onset of the video presentation with either a 40 ms or a 100 ms latency and refit the LASSO feature selection algorithm. Compared with our original results, the proportions of most selected features were unaffected by the latency realignment in both cases (p > 0.223), except that more neurons responded to male genitals with the 100 ms realignment (0 ms vs 100 ms, χ2(1) = 32.988, p < 10−5; 40 ms vs 100 ms, χ2(1) = 38.429, p < 10−5), and fewer neurons were modulated by flee (0 ms vs 40 ms, χ2(1) = 15.991, p < 0.001; 0 ms vs 100 ms, χ2(1) = 25.126, p < 10−5) and any aggression (0 ms vs 40 ms, χ2(1) = 10.882, p < 0.001; 0 ms vs 100 ms, χ2(1) = 18.961, p < 0.001) at both latencies. The coefficients of the selected features showed no differences from the 0 ms alignment for either the 40 ms (p > 0.524) or the 100 ms (p > 0.362) realignment. Since these control analyses indicated that shifting the latency by one (40 ms) or about two (∼100 ms) frames did not change our main results, we used no latency shift for the remaining analyses.
It has been reported that neural activity in the PPC is involved in primate saccadic behavior (Andersen et al., 1990). To evaluate the extent to which neural activation was modulated by saccades during video viewing, we repeated the LASSO feature selection algorithm, now including saccades as a 57th feature alongside all other ethogram items on a trial-by-trial basis (see Methods, Eye movement analysis) for the data acquired from monkey Galen. The results demonstrated that the activity of 85.19% (46/54) of neurons was increased by saccadic behavior (t(45) = 3.271, p = 0.002; Fig. 5C). At the group level, however, the proportions (p > 0.541) and coefficients (p > 0.560) of neurons responding to each of the 56 key items showed no significant differences regardless of whether saccade data were included as a feature (Fig. 5). This suggests that even when potential contributions of eye movements were regressed out, the neuronal selectivity for key features and visual content remained.
Neural activity in dmPPC cannot be accounted for by saccadic eye movements. For monkey Galen, whose eye movements were monitored with the EyeLink 1000 Plus while neural activity was recorded simultaneously during free viewing, we identified trial-by-trial saccadic eye movements and set saccades as the 57th feature to evaluate neural modulation by eye movement. A, Effects of neuronal responses to saccadic eye movements and the 56-item ethogram obtained from trial-by-trial LASSO regression. B, Proportion of neurons responsive to each item. C, LASSO coefficient for each item tested against zero. The proportion and coefficient of neurons responding to each of the 56 key items showed no statistical difference from those obtained when no saccade data were included as a feature. Error bars: SEM across neurons. *p < 0.05, **p < 0.01, ***p < 0.001.
To illustrate the multiplex nature of these responses, we performed an intersection analysis (Bastian et al., 2009) and found that almost all units (94.4%; 354/375) showed mixed selectivity representations of at least three ethogram categories, 1.6% (6/375) were modulated by a combination of two ethogram categories, and only 4% (15/375) selectively responded to a single ethogram category (Fig. 6A–C).
dmPPC neurons demonstrate mixed selectivity representations. A, Distribution of neurons and their composition for mixed selectivity representations. Gray bars show the numbers of units exclusively modulated by combinations of mixed ethogram features, with their composition shown in the bottom panel. Color coding is the same as in Figures 2 and 3B. B, Demonstration of dmPPC cell ensembles and their mixed selectivity coding. Each small yellow dot denotes a neuron. The eight labeled circles refer to the eight feature categories (low-level features and seven ethogram categories), with size proportional to the number of neurons modulated by that category. The connecting lines indicate the relationships between neurons and feature categories. C, Number of neurons that responded to each ethogram category. For example, the category “camera movement,” which includes multiple camera motions, modulates the discharge of 83.20% (312/375) of all units; the category “count,” that is, the number of animals visible, influenced 92.27% (346/375) of all units.
Information coded for multiplex representation of features increases decodability for video content type
To examine whether and to what extent temporal spiking patterns during viewing can differentiate the representations of categorical content, we trained a linear multiclass SVM classifier on firing rates within 1 s time bins using a leave-one-out cross-validation approach for each neuron (see Methods, Video content type discriminability). Overall, 40.80% (153/375) of the neurons exhibited significant decoding ability when compared to a label-shuffled permutation statistical threshold (valid neurons; p < 0.05; see Methods, Decoding analysis), while 59.20% (222/375) did not show significant decoding performance (invalid neurons, p > 0.05). Planned comparisons revealed that valid neurons had a greater ability to discriminate the video contents than invalid units (t(373) = 23.787, p < 0.001, Cohen’s D = 2.499; Fig. 7B). Interestingly, the valid neurons showed higher decoding performance for primate videos than for nonprimate and scenery content types (F(2,304) = 36.025, p < 0.001, η2 = 0.192; primate vs nonprimate, t = 7.501, p < 0.001, Cohen’s D = 0.753; primate vs scenery, t = 7.191, p < 0.001, Cohen’s D = 0.722; nonprimate vs scenery, t = −0.310, p = 0.756, Cohen’s D = −0.031; Fig. 7C).
Relationship between mixed selectivity representation and individual neuronal decoding performance. A, A total of 153 valid neurons showed significant video content-type decoding ability. Bars show the averaged prediction accuracy across neurons; the numeral above each bar refers to the number of neurons with successful video content-type decoding ability in the corresponding neuron group (Figs. 2, 4). Labels of the x-axis for each neuron type are the same as in Figure 4. B, Neurons with valid decoding performance (above the significance threshold, valid neurons) demonstrated better decoding performance than invalid neurons (below the threshold). C, Valid neurons showed higher decoding performance for primate content than for nonprimate and scenery content types. D, Valid neurons implicated more features than invalid neurons. E, The number of selected features was significantly related to an individual neuron’s overall content-type discriminability across all valid neurons. F, This relationship is significant for primate video content (left panel) but not for nonprimate (middle panel) or scenery (right panel) video content types. Lines represent linear regressions over all valid neurons. Dots refer to valid neurons. Error bars: SEM across neurons. ***p < 0.001.
We then related the multiplex representation of ethogram items to the decoded discriminability of each video content type. Neurons with significant decoding accuracy implicated significantly more selected features than invalid neurons (t(373) = 6.125, p < 0.001, Cohen’s D = 0.644; Fig. 7D). A generalized linear model (GLM) regression revealed that the decoding accuracy of the valid decoder neurons was also significantly correlated with the number of selected features (R2 = 0.037, p < 0.017; Fig. 7E). At the individual neuron level, decoding ability increased with the number of selected features for the primate videos (R2 = 0.103, p < 5 × 10−5; Fig. 7F, left panel) but not for the nonprimate (R2 = 0.005, p = 0.374) or scenery videos (R2 = 0.002, p = 0.610; Fig. 7F). Specifically for primate videos, when we considered the eight regressors (seven ethogram item categories and one low-level category) separately, the relationship between decoding ability and the number of selected features was no longer present (F(8,144) = 1.471, R2 = 0.076, p = 0.173), implying a multiplex representation of features.
Long temporal receptive window sustained by dmPPC neurons
In light of the proposal that the parietal association cortex accumulates information over long timescales (Honey et al., 2012; Murray et al., 2014; Runyan et al., 2017), we hypothesized that dmPPC cells might help scaffold dynamic events temporally. To test this temporal accumulation hypothesis, we constructed a multiclass SVM classifier with stepwise-accumulated sequential spiking in 1 s time bins across the videos (Fig. 8A, light green dots/line). For an example neuron (#PC0221), accumulated 1 s epoch decoding produced significantly better decoding performance than shuffled data (t(29) = 11.013, p < 0.001, Cohen’s D = 2.011; Fig. 8A, dark green dashed line), with prediction accuracy increasing as a function of accumulated time points (R2 = 0.757, p < 10−5).
dmPPC neurons accumulate temporal information with long temporal receptive windows. A, The decoding performance of the example neuron (#PC0221) positively correlates with cumulative spiking sequences (light green) but not with momentary neural activity (yellow). We used two sets of SVM decoding exercises to verify this property. First, we used cumulative spikes in 1 s time bins for the 1st to 30th timepoint (accumulated sequence; light green) and compared it to the significant statistical threshold (dark green). Second, we used spikes in each individual time point (yellow) and compared them with the permuted significant statistical threshold (dark red). The four lines represent linear regression for these four SVMs for an example neuron. The dots refer to the decoding performance of each time point for cumulative and momentary conditions. B, A Sequence (Accumulative/Individual) × Approach (Real/Shuffle) two-way ANOVA revealed that the mean slope of population neurons (n = 57) was higher for real and cumulative sequences than for both shuffled control data and individual 1 s time binned spike data (p < 0.001). C, The decoding performance for each video content positively correlates with cumulative spiking sequences. D, One-way ANOVA and post hoc analysis revealed that neurons in dmPPC had fast accumulation speed for primate video contents. × indicates the sequence–approach two-way interaction. Error bars: SEM across neurons. ***p < 0.001. ns, not significant.
For statistical inference, an identical SVM decoding procedure was applied but using neural activity for each time bin independently (Fig. 8A, yellow dots/line). The prediction accuracy was not better than the permuted threshold, and its slope was not different from zero (R2 = 0.006, p = 0.689; Fig. 8A, brown dashed line). Planned paired t tests confirmed that evidence accumulation is inherent in the temporal sequences rather than the single moments during which neurons fire (t(29) = 12.154, p < 0.001, Cohen’s D = 2.219).
Using this approach, 57 neurons showed a statistically significant pattern of information accumulation. To assess the strength of the accumulation, we compared the slopes by crossing two factors, Sequence (Accumulative/Individual) × Approach (Real/Shuffle), and found a two-way interaction (F(1, 56) = 339.307, p < 10−5, η2 = 0.531). This interaction was driven by stronger effects for accumulated real firing sequences than for momentary neural firing (real accumulated vs real individual, t = 25.603, p < 10−5, Cohen’s D = 2.919; Fig. 8B, left panel), with no difference between the two shuffled conditions (t = 0.645, p = 0.520; Fig. 8B, right panel). Additional analysis revealed that these accumulation neurons accumulated conspecific-relevant information faster and more strongly than nonprimate and scenery information (F(2,112) = 21.056, p < 0.001, η2 = 0.273; primate vs nonprimate, t = 6.337, p < 0.001, Cohen’s D = 1.183; primate vs scenery, t = 4.378, p < 0.001, Cohen’s D = 0.817; Fig. 8C,D). These findings show that dmPPC neurons accumulate information about dynamic events in an additive manner over the course of video viewing, especially for conspecific-related events.
As a control analysis, we tested whether the effect was an artifact of growing firing sequences. We repeated the multiclass SVM decoding approach, now with smaller stepwise-accumulated 40 ms time bin spiking sequences across the videos (in place of 1 s time bins). 31.37% (48/153) of the neurons sustained significant information accumulation, which is not statistically different from the performance with 1 s time bins (χ2(1) = 0.9279, p = 0.335), indicating that dmPPC neurons exhibit this accumulation property irrespective of the length of the spiking sequences.
Scan paths over repetitions and gaze consistency modulated by video features
In the final set of analyses, we used data from an eye movement experiment and analyzed the gaze behavior of three monkeys (one of which had also participated in the electrophysiology experiment). Previous studies reported that nonhuman primates make anticipatory saccades to memorized events (Kano and Hirata, 2015), implying that primates exhibit consistency in gaze trajectory across repeated viewings. To test this, we calculated pairwise scan-path correlations across viewing repetitions and showed that monkeys exhibited significantly stronger similarity across viewing scan paths for videos with conspecific activities (primate vs nonprimate and primate vs scenery, p < 0.05; Fig. 9A,B). We then ran a GLM regression to assess the change in scan-path similarity as a function of repetition lag, measured as the time between any two paired viewings of the same video. We observed a lag effect whereby scan-path similarity decreased with increasing repetition lag, consistently for all three content types (primate, R2 = 0.782, p < 10−5; nonprimate, R2 = 0.668, p < 10−5; scenery, R2 = 0.530, p < 10−5; Fig. 9C).
Scan paths across viewings remain most stable for the primate video content type. A, Monkeys showed higher scan-path similarities across viewing repetitions for primate than for nonprimate and scenery videos (monkey Galen, F(2, 1302) = 588.808, p < 0.001, η² = 0.475; tPrimate–Nonprimate = 27.288, p < 0.001, Cohen’s D = 1.850; tPrimate–Scenery = 31.664, p < 0.001, Cohen’s D = 2.147; tNonprimate–Scenery = 4.376, p < 0.001, Cohen’s D = 0.297; monkey K, F(2, 1302) = 1228.672, p < 0.001, η² = 0.654; tPrimate–Nonprimate = 42.212, p < 0.001, Cohen’s D = 2.862; tPrimate–Scenery = 43.615, p < 0.001, Cohen’s D = 2.957; tNonprimate–Scenery = 1.403, p = 0.340, Cohen’s D = 0.095; monkey P, F(2, 1302) = 52.018, p < 0.001, η² = 0.074; tPrimate–Nonprimate = 2.620, p = 0.024, Cohen’s D = 0.178; tPrimate–Scenery = 9.847, p < 0.001, Cohen’s D = 0.688; tNonprimate–Scenery = 7.227, p < 0.001, Cohen’s D = 0.490). Error bars: SEM across repetitions. B, Heatmaps of averaged pairwise scan-path correlations for the three monkeys, showing significantly higher correlations for primate content. C, Scan-path similarity plotted as a function of repetition lag. Error bars in C refer to SEM across monkeys. ***p < 0.001; *p < 0.05; n.s., not significant.
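As an illustration of this analysis, the sketch below computes pairwise correlations between gaze traces across repeated viewings and regresses similarity on repetition lag. The gaze data are simulated and names such as gaze and n_reps are placeholders; the published pipeline may differ in preprocessing, and lag is approximated here by repetition-index difference rather than the time between viewings used in the text.

```python
# A simulated sketch of pairwise scan-path similarity and the lag regression.
# `gaze` is a placeholder for one monkey's gaze traces over 30 viewings of a
# video; lag is approximated here by repetition-index difference.
import numpy as np
from itertools import combinations
from scipy.stats import pearsonr, linregress

rng = np.random.default_rng(2)
n_reps, n_samples = 30, 600
# repetitions x gaze samples x (x, y) screen coordinates (random-walk stand-in)
gaze = rng.normal(size=(n_reps, n_samples, 2)).cumsum(axis=1)

sims, lags = [], []
for i, j in combinations(range(n_reps), 2):
    # scan-path similarity: correlation of the two viewings' gaze traces
    r, _ = pearsonr(gaze[i].ravel(), gaze[j].ravel())
    sims.append(r)
    lags.append(j - i)

# lag effect: regress similarity on repetition lag
fit = linregress(lags, sims)
print(fit.slope, fit.rvalue**2, fit.pvalue)
```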
To identify which features in the videos influence monkeys’ gaze consistency across viewings, we fitted a separate LASSO regression of gaze consistency on the ethogram items; gaze consistency measures the central tendency of the gaze distribution on each frame across the 30 viewings. The results are consistent with the findings of Adams et al. (2021): items such as animal count ≥ 1, visible and prominent genital cues, and hold food in mouth resulted in high gaze consistency, whereas items such as presentation of male genitals, manipulate food, and allogroom led to lower gaze consistency (Fig. 10).
Monkeys’ gaze consistency is modulated by video features. Bars denote the nonzero coefficients selected by the LASSO feature selection algorithm. Positive coefficients indicate items associated with high gaze consistency across viewings, whereas negative coefficients indicate items associated with low consistency. The three monkeys showed high agreement on the majority of items.
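A minimal sketch of this feature-selection step follows, using scikit-learn with a simulated frame-by-item design matrix; the item names, the binary coding, and the simulated gaze-consistency values are illustrative placeholders rather than the actual ethogram data.

```python
# A sketch of the LASSO feature-selection step with a simulated frame-by-item
# design matrix; item names and effect sizes are illustrative placeholders.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(3)
items = ["animal_count_ge1", "genital_cues", "hold_food_in_mouth",
         "male_genital_presentation", "manipulate_food", "allogroom"]
n_frames = 9000
# binary ethogram coding: whether each item is present on each frame
X = rng.integers(0, 2, size=(n_frames, len(items))).astype(float)
# stand-in for per-frame gaze consistency across the 30 viewings
y = X @ np.array([0.4, 0.3, 0.2, -0.3, -0.2, -0.2]) + rng.normal(0, 0.5, n_frames)

lasso = LassoCV(cv=5).fit(X, y)  # penalty chosen by cross-validation
for name, coef in zip(items, lasso.coef_):
    if coef != 0.0:  # nonzero coefficients are the selected items (cf. Fig. 10)
        print(f"{name}: {coef:+.3f}")
```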
Discussion
Our findings revealed that neurons in the dmPPC exhibit responses to an array of cinematic features. dmPPC neurons showed mixed selectivity in their responses to different categories of ethogram items and to low-level visual features. The amount of information embedded within neuronal spiking sequences was modulated by the convergence of multiple representations, which in turn contributes to the readout of different content types. The processing of category and feature information by these neurons is sustained by the accumulation of temporal information over a relatively long timescale.
According to dual-process models in social cognition, the medial posterior parietal cortex is part of the reflective system supporting controlled social cognitive processing (Satpute and Lieberman, 2006; Lieberman, 2007). Here, a large proportion of neurons responded to the presentation of foraging behaviors. This pattern might be related to evidence that neurons in the macaque Area 7 express intentions before actions (Snyder et al., 1997) and that Area 7 responds more strongly to conspecific social interactions than to nonsocial interactions between inanimate objects (Sliwa and Freiwald, 2017). Further to Klein’s observation that neuronal activity in the primate lateral intraparietal cortex signals the value of conspecific genital cues (Klein et al., 2008), such modulation is also observed in dmPPC neurons in the present study. Since neurons in dmPPC are consistently modulated by the observation of social interactions, such as within-group grooming behaviors (allogroom and scratch) and aggression (chase, strike, flee), our findings strengthen the notion that the posterior parietal cortex serves as an interface for the integration of multifaceted information in social cognition.
However, we are cautious about the specificity of our results to the dmPPC; the results could instead reflect general principles of the association cortex more broadly. Indeed, recent work showed that optogenetic silencing of PPC cells in mice had no significant impact on social interaction (Suzuki et al., 2022), prompting a reconsideration of the PPC’s role within the social interaction network. In the present study, the simultaneous occurrence of social interaction behaviors and of animal motion and movements (e.g., reaching, scratching) warrants caution in drawing direct implications about PPC involvement in social interactions. These observations instead point to the integration of multifaceted information, including movement, by medial PPC neurons (Diomedi et al., 2020; Vaccari et al., 2022). Future research should prioritize careful controls for social and nonsocial conditions to systematically examine the relationship between PPC neural activity and social interactions.
All in all, neurons in the dmPPC demonstrate mixed selectivity (Vaccari et al., 2022) for distinct features as well as combinations of features (e.g., aggressive behavior and allogroom), implying a role for the dmPPC in computing multiplex information in rich, fast-changing environments (Fusi et al., 2016; Murray et al., 2017; Johnston et al., 2020). In several control analyses, we further confirmed that saccadic movement could not explain away the mixed neuronal responses and that transmission latency did not affect the overall results. In a recent study, Russ et al. (2023) computed the mean and standard deviation of the firing rate prior to the stimulus, identified the response onset for each snippet from changes in spike rate, and determined the mean latency for each neuron by averaging latencies across all snippets; they found early and late responses to the same stimuli that code for different aspects of the stimulus. In contrast, our study aimed to investigate the mixed selectivity encoding strategy of dmPPC neurons for multidimensional features and employed a different set of methodologies. It is noteworthy that the great majority of the observed ethogram behaviors in our videos persist for at least 200 ms (minimally five consecutive frames); therefore, even in the presence of one or two frames of drift, the feature selection algorithm would adjust the coefficients by iteratively updating the weights.
Another key finding of our study is that neurons in the monkey dmPPC accumulated representations across the duration of the video to support the construction of episodes, providing evidence that the PPC sustains a long temporal receptive window (TRW). Indeed, the dmPPC has been proposed to code information over long timescales (Hasson et al., 2008; Runyan et al., 2017), and a human fMRI study reported that the medial posterior parietal cortex (precuneus) accumulates information from video stimuli for up to 12 s (Hasson et al., 2008). We argue that the long TRW allows the dmPPC to accumulate continuous, multifaceted representations from unimodal inputs or from the integration of cross-modal inputs (Gilissen and Arckens, 2021) to support the processing of streams of episodic information. This aligns with the higher temporal dynamics observed in the precuneus when remembering the unfolding of events that contain a high density of experience units (Jeunehomme et al., 2022). The primate dmPPC has dense connections with the hippocampal formation (Kravitz et al., 2011). Given the known importance of schema cells in the primate hippocampus (Baraduc et al., 2019) and the way such cells code spatial and nonspatial elements of the environment for both perceptual and mnemonic experiences (Gulli et al., 2020; Zhang et al., 2022), it is likely that the two structures support an information abstraction system driven by a broad range of behaviorally relevant inputs.
Functional MRI studies using dynamic movie stimuli reveal that the medial PPC is involved in motion information processing in both humans and monkeys (Bartels et al., 2008; Russ and Leopold, 2015). This is in line with our observation that neural activity in the dmPPC was modulated by motion features, including optical flow (Raffi and Siegel, 2007), motion caused by camera movement and animal appearance (count), aggressive behaviors (e.g., chase, strike, flee), and other features correlated with fluctuations in early visual areas (Russ and Leopold, 2015). Neuronal responses to the convergence of multiplex stimulus dimensions suggest that the dmPPC integrates inputs from intrinsic projections within the PPC (Andersen et al., 1987) and from the early visual cortex (Robinson et al., 1978). Moreover, in the eye movement experiment, we further demonstrated that scan paths are modulated by video content type across repetitions, and analysis of gaze consistency across viewings confirmed that specific ethogram items embedded in the videos yield different viewing patterns (Adams et al., 2021).
Compared with static stimuli, the use of dynamic naturalistic videos, which meets calls for ethological validity in research spanning sensory to social cognition (Mosher et al., 2014; Adams et al., 2021; Testard et al., 2021), helped us obtain the present findings. We acknowledge that the analyses and their conclusions rest on correlational measures between action potentials and cognitive indices. In future work, electrical microstimulation or targeted pharmacological intervention would be instrumental for a detailed elucidation of the complex cognitive roles carried out by primate dorsomedial posterior parietal neurons.
Resource availability
Lead contact. Further information and requests for resources should be directed to the Lead Contact, Sze Chai Kwok (sze-chai.kwok@st-hughs.oxon.org).
Data and code availability. Raw electrophysiological data, analysis code, and processed data supporting the conclusions of this study are available upon request.
Footnotes
This work was supported by the National Natural Science Foundation of China (32071060), the Open Research Fund of the State Key Laboratory of Cognitive Neuroscience and Learning (Beijing Normal University), the Jiangsu Provincial Department of Science and Technology (BK20221267), and internal funding from the School of Psychology and Cognitive Science (East China Normal University). We thank Edmund Rolls for teaching our team the fundamentals of electrophysiological recording in NHPs, and Emiliano Macaluso and Guohua Xu for their helpful comments on the manuscript.
The authors declare no competing financial interests.
Correspondence should be addressed to Sze Chai Kwok at sze-chai.kwok@st-hughs.oxon.org.