Abstract
Models of visual cognition generally assume that brain networks predict the contents of a stimulus to facilitate its subsequent categorization. However, understanding prediction and categorization at a network level has remained challenging, partly because we need to reverse engineer their information processing mechanisms from the dynamic neural signals. Here, we used connectivity measures that can isolate the communications of a specific content to reconstruct these network mechanisms in each individual participant (N = 11, both sexes). Each was cued to the spatial location (left vs right) and contents [low spatial frequency (LSF) vs high spatial frequency (HSF)] of a predicted Gabor stimulus that they then categorized. Using each participant's concurrently measured MEG, we reconstructed networks that predict and categorize LSF versus HSF contents for behavior. We found that predicted contents flexibly propagate top down from temporal to lateralized occipital cortex, depending on task demands, under supervisory control of prefrontal cortex. When they reach lateralized occipital cortex, predictions enhance the bottom-up LSF versus HSF representations of the stimulus, all the way from occipital-ventral-parietal to premotor cortex, in turn producing faster categorization behavior. Importantly, content communications are subsets (i.e., 55–75%) of the signal-to-signal communications typically measured between brain regions. Hence, our study isolates functional networks that process the information of cognitive functions.
SIGNIFICANCE STATEMENT An enduring cognitive hypothesis states that our perception is influenced not only by the bottom-up sensory input but also by top-down expectations. However, cognitive explanations of the dynamic brain network mechanisms that flexibly predict and categorize the visual input according to task demands remain elusive. We addressed these questions in a predictive experimental design by isolating the network communications of cognitive contents from all other communications. Our methods revealed a Prediction Network that flexibly communicates contents from temporal to lateralized occipital cortex, with explicit frontal control, and an occipital-ventral-parietal-frontal Categorization Network that represents more sharply the predicted contents from the shown stimulus, leading to faster behavior. Our framework and results therefore cast dynamic brain activity in a new light, as cognitive information processing.
Introduction
Since Helmholtz's unconscious inferences, vision scientists have worked with the hypothesis that what we visually perceive is influenced not only by the bottom-up sensory input but also by top-down expectations of what this input might be (Kinchla and Wolfe, 1979; De Lange et al., 2018). Expectations predict upcoming visual information contents (Yuille and Kersten, 2006; Friston, 2010; Clark, 2013), thereby facilitating their disambiguation from the noisy input (Gilbert and Sigman, 2007; Kok et al., 2012) to speed up categorization behavior (Bar et al., 2006).
Studies of the dynamic predictive brain have mainly focused on how predictions can top-down modulate neural signals. For example, predictions can induce patterns of local stimulus-specific activation in hippocampal, ventral temporal, and primary visual cortex (Kok et al., 2014, 2017; Hindy et al., 2016; Margalit et al., 2020), or enhance gamma and reduce low-alpha oscillations in visual and frontal cortex (Benedek et al., 2011; Haegens et al., 2011; Michalareas et al., 2016; Lobier et al., 2018). Predictions can also enhance high-alpha synchronization in the frontal-parietal-occipital network (Lobier et al., 2018). However, key to understanding the mechanisms that top-down predict visual contents to facilitate their bottom-up categorization is to reconstruct, from such neural signal modulations, the elusive networks that process (i.e., predict and categorize) specific information depending on the demands of the cognitive tasks. To address these points, we reverse engineered (1) the Prediction Network, which top-down communicates specific stimulus contents before the stimulus is shown to the expected contralateral occipital hemisphere, and (2) the Categorization Network, which bottom-up processes these predicted contents from the stimulus to speed up its categorization.
Specifically, our research addresses the following fundamental information processing questions pertaining to the prediction and categorization of visual contents (Fig. 1): (1) When, where, and how does a Prediction Network of brain regions flexibly represent and communicate the predicted contents of a stimulus? (2) When, where, and how does a Categorization Network represent and communicate these contents when they are presented in the stimulus for behavior? And (3) how do the predicted contents in (1) change the stimulus contents in (2) to speed up categorization behavior?
Materials and Methods
Participants
Eleven participants (18–35 years old, mean, 26.8; SD = 3.0, four males and seven females) took part in the experiment and provided informed consent. All had normal or corrected-to-normal vision and reported no history of any psychological, psychiatric, or neurologic condition that might affect visual or auditory perception. The University of Glasgow College of Science and Engineering Ethics Committee approved the experiment (Application number 300210118).
Stimuli
Stage 1 of the experimental design (Fig. 2) used two location cues (one for left-cued and one for right-cued trials). Stage 2 used three different sweeping sounds, serving as low spatial frequency (LSF), high spatial frequency (HSF), and neutral auditory cues. Stage 3 used two locations × two spatial frequencies × three orientations Gabor patches as stimuli (see below).
Stage 1 location cues
Participants sat at a 182 cm viewing distance from the screen. We presented a green dot of 1° of visual angle diameter for 100 ms to the left (vs right) of a fixation cross (2° of visual angle eccentricity).
Stage 2 SF cues
Three 250 ms sweeping sounds started at an auditory frequency of 196 Hz (cuing LSF), 2217 Hz (cuing HSF), or 622 Hz (no prediction), with a sweep rate of 0.5 rising octave/s.
Stage 3 Gabor stimuli
Left (vs right)-cued Gabor patches were presented (diameter, 7.5° visual; left and right eccentricity, 12.5° visual), with LSF (vs HSF) contents of 0.5 cycle/degree (vs 1.2 cycle/degree) shown at one of three randomly chosen orientations (−15°, 0°, +15°). Before the task, we calibrated the LSF and HSF Gabor contrast independently for each participant, using an adaptive staircase procedure (target accuracy set at 90%). On each calibration trial, a left (vs right) green dot presented for 500 ms predicted the upcoming left versus right location of the LSF (vs HSF) Gabor patch, itself presented for 100 ms. Participants responded LSF versus HSF versus Don't know without feedback. We adaptively adjusted the LSF versus HSF contrast as follows: Contrast = Contrast − (Correct vs Incorrect − target accuracy)/Shifting Count, where Shifting Count counts the number of direction changes (i.e., increasing to decreasing, or decreasing to increasing). The adaptive staircase stopped when the adjustment step was <0.01, setting each SF contrast for this participant's Gabor stimuli in the actual experiment.
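In sketch form, the staircase update and stop rule read as follows (a minimal Python sketch, not the authors' MATLAB code; the coding of Correct vs Incorrect as 1 vs 0 and the `respond` simulator are our assumptions):

```python
def staircase_step(contrast, correct, target_acc=0.9, shift_count=1):
    """One staircase update: Contrast <- Contrast - (Correct - target)/ShiftCount.
    correct is assumed coded as 1 (correct) vs 0 (incorrect)."""
    return contrast - (float(correct) - target_acc) / shift_count

def calibrate(respond, contrast=0.5, target_acc=0.9):
    """Run the staircase until the adjustment step falls below 0.01.
    respond(contrast) -> bool simulates a participant's correctness."""
    shift_count, last_dir = 1, 0
    while True:
        step = (float(respond(contrast)) - target_acc) / shift_count
        if abs(step) < 0.01:
            return contrast
        direction = 1 if step > 0 else -1
        if last_dir and direction != last_dir:
            shift_count += 1  # count direction changes to shrink later steps
        last_dir = direction
        contrast -= step
```

Under this rule, a correct response lowers the contrast slightly (making the task harder), an incorrect response raises it strongly, and the step size shrinks with each direction change until convergence.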
Procedure
Each three-stage trial started with a central fixation cross presented for 500 ms (Fig. 2A). For Stage 1, a green dot presented for 100 ms appeared to the left or right of the central fixation cross, predicting the left versus right location of the upcoming Gabor with a validity of one. This was followed by a jittered blank screen (1000–1500 ms).
In Stage 2, three sweeping sounds presented for 250 ms predicted the Gabor stimulus presented at Stage 3. On predictive trials, the 196 Hz (vs 2217 Hz) sound predicted the upcoming LSF (vs HSF) Gabor (both with 0.9 validity). The 622 Hz sound was a neutral cue without predictive value. This neutral cue was followed by LSF versus HSF Gabors with 0.5 probability on 33% of the trials (neutral trials).
In Stage 3, the LSF versus HSF Gabor stimulus appeared for 100 ms at one of the three randomly chosen orientations, at the left versus right screen location, followed by a jittered 750–1250 ms intertrial interval (ITI). We instructed participants to respond LSF versus HSF versus Don't know as quickly and as accurately as they possibly could. They did not receive feedback. We counterbalanced the use of the three keys (i.e., LSF, HSF, Don't know) across participants, which helped to minimize any effect from specific fingers.
The experiment comprised several blocks of 54 such trials (Table 1). Participants performed 10–14 blocks in a single day, with short breaks between blocks. They completed a total of 38–45 blocks over 3–4 d, with at least 499 trials in each condition (of left vs right presentation of LSF vs HSF Gabors). Participants learned the correct relationships between the auditory cues and predicted SF within about two blocks of trials, without explicit instructions. We therefore removed these first two blocks from all subsequent analyses.
Auditory localizer
Before the experiment, we ran an MEG localizer to model the bottom-up processing of each of the three auditory cues. For each cue, each localizer trial started with a blank screen for 500 ms, followed by the auditory tone for 250 ms, then a 1250 ms blank-screen ITI. In each block of 12 trials, 10 trials presented the same tone, and the other 2 presented catch tones. Participants had to press a key when the tone was a catch tone. Each participant completed 36 such blocks (i.e., 12 blocks per tone type), with a block order of low frequency, middle frequency, and high frequency repeated 12 times.
MEG data acquisition and preprocessing
We measured participants' MEG activity with a 248-magnetometer whole-head system (Magnes 3600WH, 4-D Neuroimaging) at a 508 Hz sampling rate. We performed the analysis according to recommended guidelines using the FieldTrip toolbox (Oostenveld et al., 2011) and in-house MATLAB code.
For each participant, we discarded the runs (i.e., blocks) with head movements >0.6 cm, measured by prerun versus postrun head position recordings. We then applied a 1 Hz high-pass filter [fifth-order two-pass Butterworth infinite impulse response (IIR) filter] to the remaining data and removed the line noise using discrete Fourier transform. We epoched the raw data into trial windows, separately for each stage: Stage 1, −200 ms predot onset to 1,000 ms postdot onset [hereafter (−200 ms 1,000 ms)]; Stage 2, (−200 ms 1,000 ms) around sweeping sound onset; Stage 3, (−200 ms 600 ms) around Gabor patch onset. We denoised the epoched data via a Principal Component Analysis (PCA) projection of the reference channels. We rejected noisy channels by visual selection and rejected jump and muscle artifacts with automatic detection (Oostenveld et al., 2011). We decomposed the output dataset with Independent Component Analysis (ICA) and identified and removed the independent components corresponding to artifacts (eye movements, heartbeat; i.e., two to four components per participant).
Source reconstruction
For each participant, we coregistered their anatomic MRI scan with their head shape recorded on the first session and normalized the volume data to standardized MNI coordinate space (Gross, 2019). Using brain surfaces segmented from individual warped MRI, we then prepared a realistic single-shell head model. We resampled each epoched dataset (i.e., each stage) at 512 Hz, low-pass filtered the data at 25 Hz (fifth-order Butterworth IIR filter), specified the time of interest between 0 and 500 ms (postcue at Stage 2; post-Gabor stimulus at Stage 3) and computed covariance across the entire epoch. We then computed the forward model with a 6 mm uniform grid warped to standardized MNI coordinate space and performed the linearly constrained minimum variance (LCMV) beamforming analysis to reconstruct the time series of each source, with the parameter lambda = 6%. Following the above steps, for each participant we obtained a single-trial time series of 4413 MEG cortical sources at a 512 Hz sampling rate between 0 and 500 ms that we used to analyze the dynamic information processing in the Prediction and Categorization Networks, that is, at Stages 2 and 3 (Fig. 1).
We applied the same preprocessing pipeline to the MEG localizer, using the epoched data (−200 ms 500 ms) around tone onset. We applied the LCMV analysis 0–500 ms post-tone to reconstruct the source representation of the MEG localizer data.
Analyses
Cuing improves behavior
At a group level, we discarded invalid predictive trials and applied a 2 (left vs right location cues) × 2 (valid predictive vs neutral cuing) × 2 (LSF vs HSF Gabor patches) ANOVA on the median RTs (excluding incorrect responses and outliers) and on the accuracy of all participants. We found a significant main effect of valid predictive versus neutral SF cuing on RTs, showing that valid predictive trials are significantly faster than neutral trials (F(1,10) = 20.8, p = 0.001), and a significant interaction between location cue and Gabor SF (F(1,10) = 17.4, p = 0.002). Further analysis showed that this predictive versus neutral cuing effect is significant (p < 0.05, after Bonferroni correction) for each of the four experimental conditions (left vs right locations × low vs high SFs), quantified by a paired-sample t test run independently for each condition. For categorization accuracy (ACC, the proportion of correct responses), the ANOVA showed a significant main effect of valid predictive versus neutral cuing, with ACC significantly higher in valid predictive than neutral trials (F(1,10) = 22.5, p = 0.0008), and a significant interaction between location cue and Gabor SF (F(1,10) = 13.8, p = 0.004). Further analysis showed that this effect of SF cue is significant (p < 0.05, Bonferroni correction) for all but the left LSF experimental condition (paired-sample t test run independently for each condition).
Stage 2: Prediction Network
Prediction representations
To understand the Stage 2 network of regions that propagates the LSF versus HSF auditory prediction before stimulus onset, we computed the representation of the cue across the whole brain, separately for left- and right-cued trials.
For each participant, we computed the single-trial mutual information (MI; <LSF vs HSF auditory cue; Stage 2 MEGt>), at each time point from 0 to 400 ms following Stage 2 auditory cue onset, on each source in occipital (lingual gyrus, cuneus, inferior occipital gyrus), temporal (fusiform gyrus, inferior temporal gyrus, middle temporal gyrus, superior temporal gyrus), parietal (superior parietal lobe, inferior parietal lobe, angular gyrus, supramarginal gyrus), premotor (precentral gyrus, postcentral gyrus), and frontal (orbitofrontal gyrus, inferior frontal gyrus, middle frontal gyrus, medial frontal gyrus, superior frontal gyrus) regions. We computed MI with the Gaussian Copula Mutual Information estimator (Ince et al., 2017), which supports multidimensional variables. This semiparametric estimator fits a Gaussian (maximum entropy) copula but makes no assumption about the marginal distributions of the variables.
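For intuition, the discrete-continuous case of this estimator can be sketched as follows (a simplified Python sketch of the Gaussian copula idea, not the published gcmi toolbox; the function names are ours):

```python
import numpy as np
from scipy.special import ndtri  # inverse standard-normal CDF

def copnorm(x):
    """Rank-based copula normalization: map the marginal to a standard normal."""
    ranks = np.argsort(np.argsort(x))
    return ndtri((ranks + 1) / (len(x) + 1))

def gcmi_discrete(meg, labels):
    """MI (bits) between a continuous MEG value and a discrete stimulus label:
    overall minus class-conditional Gaussian entropies of the copula-normalized
    data (1D sketch of the model-based discrete-continuous estimator)."""
    c = copnorm(meg)
    h = lambda v: 0.5 * np.log(2 * np.pi * np.e * np.var(v, ddof=1))  # 1D Gaussian entropy (nats)
    classes, counts = np.unique(labels, return_counts=True)
    p = counts / len(labels)
    h_cond = sum(pi * h(c[labels == k]) for pi, k in zip(p, classes))
    return max(0.0, (h(c) - h_cond) / np.log(2))
```

Because only the ranks of the data enter the copula step, the estimate is robust to the marginal distribution of the MEG signal, which is the property the paper relies on.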
Prediction periods clustering
To compute the number of space × time periods of prediction representations, we applied k-means clustering analysis on all 4413 (sources) × 204 (time points) dimensional trials as follows. In Step 1, peak time extraction, for each participant, and independently for left- and right-cued trials and each source, we extracted the peak time of MI (<LSF vs HSF auditory cue; Stage 2 MEGt>) between 0 and 400 ms postauditory cue onset.
For Step 2, matrix computation, across participants and cued conditions in each ROI (occipital, temporal, parietal, premotor, and frontal), we summed the numbers of sources that peak during each 10 ms step time window between 0 and 400 ms postauditory cue onset (i.e., 39 time windows), producing a 5 (ROIs) × 39 (time windows) matrix of MI peaks. This matrix represented the total brain volume of prediction representation dynamics over time.
In Step 3, clustering, we k-means clustered (k = 1-30, repeating 1000 times) the matrix from Step 2, using the 39 time windows as samples and selected k as the elbow of the within-cluster sums of point-to-centroid distances metric.
The results show that k = 4 is a good solution for Stage 2, starting with a null period before any prediction representation, followed by three distinct time periods with temporal, frontal, and occipital peaks of prediction representation.
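The clustering and elbow selection of Steps 1–3 can be sketched as follows (a Python sketch using SciPy's k-means; choosing k at the largest second difference of the distortion curve is one common elbow heuristic, and the function names are ours):

```python
import numpy as np
from scipy.cluster.vq import kmeans

def distortion_curve(matrix, k_max=10):
    """Run k-means for k = 1..k_max on the ROI x time-window peak-count matrix
    (time windows as samples); return the within-cluster distortion per k."""
    X = np.asarray(matrix, float)
    # scipy's kmeans restarts `iter` times and keeps the lowest-distortion fit
    return [kmeans(X, k, iter=20)[1] for k in range(1, k_max + 1)]

def pick_elbow(distortions):
    """Choose k at the elbow: the largest positive second difference
    (i.e., where the distortion drop slows down most sharply)."""
    d = np.asarray(distortions)
    return int(np.argmax(d[:-2] - 2 * d[1:-1] + d[2:])) + 2  # +2 maps index to k
```

On data with well-separated groupings, the distortion curve drops steeply until the true number of clusters and flattens afterward, which is what the elbow criterion exploits.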
Prediction Network nodes (supports Fig. 3A)
To reveal the dynamics of the MI (<LSF vs HSF auditory cue; Stage 2 MEG>) representation of the prediction, we localized the sources peaking in the 90–120 ms time window (start), the 120–200 ms window (midway), and >200 ms (end). We computed the group mean of these three source-localized peaks across participants (Fig. 3A, group mean). Further, we applied a 2 (left vs right-cued prediction) × 2 (left vs right hemisphere) ANOVA on the prediction representation on occipital sources to test the interaction (i.e., contralateral) effect.
Prediction Network reconstruction (supports Fig. 3B)
To reconstruct the Stage 2 Prediction Network, we computed Directed Feature Information (DFI, where F is the auditory cue predicting the upcoming LSF vs HSF Gabor) in each participant for each pair of identified network nodes (i.e., sender: temporal, receiver: frontal; sender: frontal, receiver: occipital) as follows. For Step 1, source selection, we selected the highest MI source for the sending and receiving regions in the time window of interest (temporal, 90–120 ms; frontal, 120–200 ms; occipital, >200 ms).
In Step 2, directed information (DI; i.e., event-related Transfer Entropy) quantifies all the information communicated from sending to receiving sources, removing information sent from the receiver itself. For the receiver at time x, with a communication delay y from the sender, DI is computed as the conditional mutual information between receiver activity RAx and sender activity SAx−y, conditioned on the receiver's own past RAx−y, as follows: DI = MI(<RAx; SAx−y> | RAx−y).
In Step 3, DI conditioned on feature (DI|F), DI|F removes from DI the information communicated about the predictive LSF versus HSF feature itself, that is, DI|F = MI(<RAx; SAx−y> | RAx−y, F). We computed DI|F for each receiving time × communication delay.
For Step 4, DFI, the difference between DI and DI|F isolates the information communicated about the predictive cue. We computed DFI as follows: DFI = DI − DI|F.
In Step 5, statistical significance, we repeated the DFI computation 200 times with shuffled feature labels (i.e., LSF vs HSF), using as the statistical threshold the 95th percentile of the distribution of 200 maxima [each taken across the DFI matrix of each shuffled repetition; family-wise error rate (FWER), p < 0.05, one tailed].
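This max-statistic permutation scheme, used throughout the paper for FWER control, can be sketched generically as follows (a Python sketch; `stat_fn` stands in for any of the statistic maps, e.g., DFI over receiving time × delay):

```python
import numpy as np

def fwer_max_threshold(stat_fn, labels, n_perm=200, rng=None):
    """Max-statistic permutation threshold (FWER p < 0.05, one tailed):
    recompute the statistic map with shuffled labels, keep each repetition's
    maximum across the whole map, and take the 95th percentile of the maxima."""
    rng = rng or np.random.default_rng(0)
    maxima = [np.max(stat_fn(rng.permutation(labels))) for _ in range(n_perm)]
    return np.percentile(maxima, 95)
```

Taking the maximum across the whole map under each shuffle is what controls the family-wise error rate over all time points and delays simultaneously.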
For Step 6, communication proportions, to compute the proportion of communications about a feature in total network communications between two regions, we computed ratio DFI/DI, at the maximum receiving-time × communication-delay of the DFI measure.
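Steps 2–4 and the DFI/DI ratio can be sketched for one receiving time and delay as follows (a Gaussian-copula Python sketch with 1D variables; the function names are ours, and the published implementation differs in detail):

```python
import numpy as np
from scipy.special import ndtri

def copnorm(x):
    """Rank-based copula normalization to a standard-normal marginal."""
    return ndtri((np.argsort(np.argsort(x)) + 1) / (len(x) + 1))

def gauss_cmi(x, y, z):
    """I(X;Y|Z) in bits under a Gaussian model (1D inputs):
    H(X,Z) + H(Y,Z) - H(Z) - H(X,Y,Z); additive constants cancel."""
    def h(cols):
        c = np.atleast_2d(np.cov(np.column_stack(cols), rowvar=False))
        return 0.5 * np.log(np.linalg.det(c))
    return (h([x, z]) + h([y, z]) - h([z]) - h([x, y, z])) / np.log(2)

def dfi(recv_t, send_past, recv_past, feature):
    """DFI at one receiving time x and delay y:
    DI   = I(R_x ; S_{x-y} | R_{x-y})        (all communicated information),
    DI|F = the same, averaged within feature classes (feature removed),
    DFI  = DI - DI|F                          (feature-specific communication)."""
    di = gauss_cmi(copnorm(recv_t), copnorm(send_past), copnorm(recv_past))
    di_f = 0.0
    for f in np.unique(feature):
        m = feature == f
        di_f += m.mean() * gauss_cmi(copnorm(recv_t[m]),
                                     copnorm(send_past[m]),
                                     copnorm(recv_past[m]))
    return di, di - di_f
```

The ratio of Step 6 is then simply `(di - di_f) / di`, quantifying what fraction of all sender-to-receiver communication carries the feature.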
We applied Steps 1–6 to reconstruct the Stage 2 Prediction Network of each individual participant (see Fig. 8, individual participant's DFI networks). Figure 3B shows the group average network. Note that here we established the same statistical significance test for each participant and reported a combination of frequentist and Bayesian estimation (Ince et al., 2021). The Bayesian approach contains a two-level analysis, where the first-level analysis involves null hypothesis significance testing within participants, and the second level is the Bayesian estimation of population prevalence.
Prediction Network mediation (supports Fig. 4)
We then tested whether frontal cortex is a necessary mediator of Stage 2 prediction communications between temporal and occipital cortex by isolating the role of the frontal region in these communications, comparing network communications with and without frontal mediation. The steps below detail how we computed frontal mediation in the Prediction Network of each participant.
In Step 1, frontal mediation DFI, for the selected temporal and occipital sources, for receiving time points between 0 and 400 ms postauditory cue onset and each delay between 0 and 300 ms, we computed the receiving-time × communication-delay matrices of temporal-to-frontal DFI and frontal-to-occipital DFI (each computed as above, Prediction Network reconstruction). This quantifies the mediating role of the frontal region in the communication of the predictive cue (Fig. 4B).
For Step 2, direct communication, DFI conditioned on frontal activity (DFI|Frontal), to isolate the role of frontal mediation we also computed temporal-to-occipital DFI conditioned on the frontal activity. Specifically, for each combination of (1) receiving time x between 0 and 400 ms postauditory cue onset, (2) communication delay y between 0 and 300 ms, and (3) mediation time z between sending time and receiving time (i.e., x−y and x), we computed DFI received by occipital at time x, sent by temporal at x−y, conditioned on frontal activity at time z. This produced a 3D receiving-time × communication-delay × mediation-time matrix of conditioned DFI. We took the minimum conditioned DFI across mediation times as the direct communication (i.e., without frontal mediation; Fig. 4A).
In Step 3, statistical significance, we recomputed Steps 1 and 2 and their difference, shuffling the LSF versus HSF labels, that is, 200 repetitions, using the 95th percentile of 200 maxima as the statistical threshold, each maximum taken across the DFI minus DFI|F matrix of each shuffled repetition, FWER, p < 0.05, one tailed. This isolated the receiving-time × communication-delays showing significant enhancement with versus without frontal mediation.
We applied Steps 1–3 to each participant. Figure 4, A and B, shows the results of a typical participant (see Fig. 9, all individual results). Figure 4C shows the group mean difference and its Bayesian prevalence.
Stage 3: Categorization Network
Stimulus representations
To reconstruct the Stage 3 Categorization Network, on predictive trials, we computed for each participant the dynamics of LSF versus HSF Gabor stimulus representation across the whole brain, separately for left- and right-cued trials, that is, MI (LSF vs HSF Gabor; Stage 3 MEGt), on each source in occipital, temporal, parietal, premotor, and frontal regions at each time point from 0 to 500 ms following Gabor onset.
Categorization periods clustering
To compute the number of space × time stimulus representation periods, we again applied k-means clustering analysis across trials on all 4413 sources × 256 time points as follows. In Step 1, peak time extraction, for each participant, and independently for left- and right-cued trials and each source, we extracted the peak LSF versus HSF representation MI in 50 10-ms time windows spanning 0–500 ms post-Gabor.
For Step 2, matrix computation, across participants and conditions, we counted the number of sources per ROI (occipital, temporal, parietal, premotor, and frontal) that peak in each time window, producing an ROI × time matrix of MI peaks.
In Step 3, clustering, we k-means clustered (k = 1–30, repeating 1000 times) the matrix from Step 2, using the 50 time windows as samples and selected k as the elbow of the within-cluster sums of point-to-centroid distances metric.
Stage 3 comprised k = 4 clusters: a first period with no LSF versus HSF stimulus representation, followed by occipital-ventral (150–250 ms, start), parietal (250–350 ms), and premotor-frontal (>350 ms) periods of stimulus representation.
Categorization Network nodes (supports Fig. 5A)
To reveal the dynamics of MI (LSF vs HSF Gabor; Stage 3 MEG), in each participant we localized the source peaking in each one of the three representational periods. We then computed the group mean of these three sources across participants. Figure 5A shows the group mean.
Categorization Network reconstruction (supports Fig. 5B)
To reconstruct the Stage 3 Categorization Network that communicates the Gabor SF across the occipital, parietal, and premotor regions identified earlier, we computed DFI communications of the LSF versus HSF stimulus information. That is, in each participant, for each pair of regions (i.e., sender: occipital, receiver: parietal; sender: parietal, receiver: premotor), we performed the following four steps.
In Step 1, source selection, we selected one sending and one receiving source with the highest Stage 3 MI representation of Gabor LSF versus HSF in the time window of interest (occipital, 150–250 ms; parietal, 250–350 ms; premotor, >350 ms).
For Step 2, DFI, for each receiving time point between 0 and 500 ms post-Gabor stimulus onset and each sender delay between 0 and 300 ms, we computed the receiving-time × communication-delay matrix of LSF versus HSF stimulus communication with DFI (see above, Prediction Network reconstruction).
In Step 3, statistical significance was established recomputing DFI with shuffled LSF versus HSF labels, that is, 200 repetitions, using as the statistical threshold the 95th percentile of 200 maxima, each taken across the DFI matrix of each shuffled repetition, FWER, p < 0.05, one tailed.
In Step 4, communication proportions, to compute the proportion of communications about a feature in total network communications between two regions, we computed the ratio DFI/DI at the maximum receiving-time × communication-delay of the DFI measure.
We applied Steps 1–4 in each participant, reconstructing the occipital-to-parietal and parietal-to-premotor network that communicates the LSF versus HSF Gabor contents (see Fig. 10, all individual results; Fig. 5B shows the group average).
Stages 2–3: influences of Prediction Network on Categorization Network
Prediction enhances stimulus representation (supports Fig. 6A)
To understand how Stage 2 predictions of LSF versus HSF facilitate their Stage 3 categorization when the stimulus is shown, we compared LSF versus HSF Gabor representations between Stage 3 valid predictive and neutral trials in each participant and Categorization Network region (i.e., contralateral occipital-ventral, parietal, premotor). Specifically, we computed MI as follows. In Step 1, source selection, we selected one Stage 3 source per region with the highest MI (LSF vs HSF Gabor; Stage 3 MEGt) during the time window of interest (occipital-ventral, 150–250 ms; parietal, 250–350 ms; premotor, >350 ms).
In Step 2, MI computation, for each selected source we computed source-by-time MI (LSF vs HSF Gabor; Stage 3 MEG), every 2 ms between 0 and 500 ms post-Gabor onset, separately for valid predictive and neutral trials. For this computation, we matched the number of valid predictive trials with that of neutral trials (random selection). We averaged the MI matrices for valid predictive trials across five such random trial selections.
For Step 3, statistical significance of the difference was established by recomputing the source-by-time MI with shuffled valid predictive and neutral trial labels (repeated 200 times), calculating the peak difference between the recomputed valid predictive and neutral MI in the time window of interest, and using as statistical threshold the 95th percentile of 200 maxima, each taken across the source-by-time difference of each shuffled repetition (FWER, p < 0.05, two tailed). We repeated the above Steps 1–3 for each participant. Figure 6A shows the group-level results.
Prediction modulates Categorization Network source activity and RT (supports Fig. 6B)
To demonstrate where and when valid predictions modulate premotor MEG activity to facilitate behavior, we compared the effect of valid predictive versus neutral cuing at Stage 2 on Stage 3 Categorization Network brain activity and behavioral RT.
In Step 1, we computed positive coinformation (Co-I; <predictive vs neutral; Stage 3 MEGt; RT>), an information theoretic redundancy, as follows: Co-I = MI (<predictive vs neutral; Stage 3 MEGt>) + MI (<predictive vs neutral; RT>) – MI (<predictive vs neutral; Stage 3 MEGt, RT>) on every source of the Categorization Network and at every 2 ms between 0 and 500 ms post-Gabor onset, producing a vector over Stage 3 time. In Step 2, joint MI computation, because this estimator supports multidimensional variables, we computed the joint information MI (predictive vs neutral; Stage 3 MEGt, RT) by combining the copula-normalized Stage 3 MEGt and RT variables into a 2D variable.
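The Co-I computation can be sketched as follows (a Gaussian-copula Python sketch; the function names are ours, and positive values indicate redundancy between MEG and RT about the cuing condition):

```python
import numpy as np
from scipy.special import ndtri

def copnorm(x):
    """Rank-based copula normalization to a standard-normal marginal."""
    return ndtri((np.argsort(np.argsort(x)) + 1) / (len(x) + 1))

def gcmi_gd(X, y):
    """MI (bits) between continuous column(s) X and a discrete label y,
    via overall minus class-conditional Gaussian entropies."""
    X = np.asarray(X, float)
    X = np.column_stack([copnorm(col) for col in np.atleast_2d(X.T)])
    def h(Z):
        c = np.atleast_2d(np.cov(Z, rowvar=False))
        return 0.5 * np.log(np.linalg.det(2 * np.pi * np.e * c))
    classes, counts = np.unique(y, return_counts=True)
    p = counts / len(y)
    return (h(X) - sum(pi * h(X[y == k]) for pi, k in zip(p, classes))) / np.log(2)

def coinfo(cond, meg, rt):
    """Co-I = MI(cond; MEG) + MI(cond; RT) - MI(cond; [MEG, RT]):
    positive = redundant information about the cuing condition."""
    return (gcmi_gd(meg, cond) + gcmi_gd(rt, cond)
            - gcmi_gd(np.column_stack([meg, rt]), cond))
```

When the MEG signal and RT reflect the same condition-driven variability, the joint MI is smaller than the sum of the individual MIs, so Co-I comes out positive.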
For Step 3, statistical significance was established by recomputing the Co-I with shuffled predictive versus neutral labels, 200 repetitions, using as the statistical threshold the 95th percentile of 200 maxima, each taken across the vector of each shuffled repetition, FWER, p < 0.05, one tailed. We applied Steps 1–3 to each participant. Figure 6B shows the group results.
Control analyses
Stage 1: dot representation
To check whether representation of the dot cue from Stage 1 remains present until representation of the auditory cue in occipital cortex at Stage 2, we computed the dot cue representation with MI (<left vs right dot; Stage 1 MEGt>) on each occipital source in lingual gyrus, cuneus, and inferior occipital gyrus at each time point (1) from 0 to 1000 ms following Stage 1 dot cue onset and (2) from −100 ms to 0 ms around auditory cue onset at Stage 2. We then averaged the time courses of dot representation across the sources. Figure 7A shows the results.
Stage 2: auditory decoding
We used classifiers trained on auditory localizer data to cross-decode the bottom-up processing of the auditory cues at Stage 2 as follows. For Step 1, training, we trained linear classifiers (MVPA-Light toolbox; Treder, 2020) to discriminate the LSF versus HSF auditory cue every 2 ms between 0 and 400 ms poststimulus, using MEG sensor responses from the auditory localizer as the training set.
In Step 2, testing, every 2 ms between 0 and 400 ms post-Stage 2 auditory cue, we computed the classifier decision value from the single-trial MEG sensor response. This produced a 2D (training time × testing time) matrix of decision values on each trial. To quantify decoding performance, across trials we computed for each combination of training time and testing time the MI between single-trial classification decision value and the true stimulus label (LSF vs HSF auditory cue). To establish statistical significance, we repeated the decoding procedure described above 1000 times with shuffled cue labels, applying threshold-free cluster enhancement (TFCE; Smith and Nichols, 2009; E = 0.5, H = 0.5), and using as the statistical threshold the 95th percentile of 1000 maximum values (each taken across all the time points per shuffle after TFCE; i.e., FWER, p < 0.05, one tailed). We took the maximum decoding performance across all training time points.
For Step 3, source representation reconstruction, at the time point of peak performance, for all 4413 sources we computed MI between single-trial decision value and single-trial source activity. We repeated Steps 1–3 to generate the performance curves and source representations of each participant. Figure 7B shows the averaged performance across participants.
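The cross-temporal (training time × testing time) decoding of Steps 1–2 can be sketched as follows (a Python sketch with a shrinkage-regularized LDA-style classifier; not the MVPA-Light implementation, and the function name is ours):

```python
import numpy as np

def cross_decode(train_X, train_y, test_X):
    """Temporal generalization sketch: fit a linear discriminant at each
    training time point and apply it at every testing time point.
    train_X, test_X: (trials, sensors, time) arrays; train_y: binary labels.
    Returns decision values of shape (test trials, train time, test time)."""
    n_sens, n_time = train_X.shape[1], train_X.shape[2]
    dv = np.zeros((test_X.shape[0], n_time, test_X.shape[2]))
    for t in range(n_time):
        X = train_X[:, :, t]
        m0, m1 = X[train_y == 0].mean(0), X[train_y == 1].mean(0)
        # regularized covariance (simple shrinkage toward the identity)
        cov = np.cov(X, rowvar=False) + 0.1 * np.eye(n_sens)
        w = np.linalg.solve(cov, m1 - m0)   # LDA-style discriminant weights
        b = -0.5 * w @ (m0 + m1)            # threshold at the class midpoint
        for u in range(test_X.shape[2]):
            dv[:, t, u] = test_X[:, :, u] @ w + b
    return dv
```

The sign of `dv` gives the predicted class; off-diagonal cells of the train-time × test-time matrix reveal whether a representation learned at one time generalizes to another.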
Results
Prediction speeds up behavior
Our three-stage cuing design is depicted in Figure 2A. On each trial, a location cue at Stage 1 (green dot), briefly displayed left versus right of a central fixation cross (Posner cuing; Posner and Petersen, 1990), predicted the visual hemifield location (left vs right) of an upcoming Gabor patch (hereafter referred to as Gabor; see above, Materials and Methods, Stimuli) with 100% validity, followed by a 1–1.5 s blank screen. Stage 1 introduced a left versus right hemisphere task demand that a flexible prediction pathway should accommodate. At Stage 2, all trials started with an auditory cue. On predictive trials (66% of total), a 250 ms sweeping tone (196 vs 2217 Hz) signaled the SF content (low vs high, with an equal split of trial numbers) of the upcoming Gabor stimulus with 90% validity. On neutral trials (33% of total), a 622 Hz tone had no association with the upcoming stimulus. The auditory cue was followed by another 1–1.5 s blank interval (prediction period). Figure 2B depicts the couplings between auditory cues and Gabors. Finally, at Stage 3, one of two (LSF vs HSF) Gabor stimuli appeared in the participant's left or right visual hemifield for 100 ms, with fixed brightness and contrast. Participants (N = 11, see above, Materials and Methods, Participants) categorized the Gabor SF as quickly and accurately as they possibly could without feedback [i.e., three-alternative forced choice (3-AFC), with responses LSF vs HSF vs Don't know; see above, Materials and Methods, Procedure].
As expected, SF prediction (in valid predictive trials) improved categorization accuracy (compared with neutral trials) on average by 2.58% (96.9 vs 94.3%), F(1,10) = 22.5, p = 0.0008, and sped up reaction times (RTs) on average by 87.7 ms (454.4 vs 542.1 ms), F(1,10) = 20.8, p = 0.001. Significant RT improvements applied to each Gabor location × SF presentation condition (Fig. 2C, Table 2; see above, Materials and Methods, Cuing improves behavior) and to individual participants, that is, Bayesian population prevalence (Ince et al., 2021, 2022) with a maximum a posteriori probability (MAP) estimate of the population prevalence of the effect of 11/11 = 1, 95% highest posterior density interval (HPDI) = [0.77 1]. That is, this experiment provides evidence that this within-participant result would generalize to most individuals who participated in the same experiment.
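These prevalence statistics can be reproduced numerically under the binomial model of Ince et al. (2021). The sketch below (function name ours) assumes a uniform prior on prevalence, a within-participant false-positive rate of α = 0.05, and perfect sensitivity:

```python
import numpy as np

def prevalence_map_hpdi(k, n, alpha=0.05, mass=0.95, grid=200001):
    """MAP estimate and highest posterior density interval for population
    prevalence, given k of n participants with a significant within-participant
    result at false-positive rate alpha (binomial model, uniform prior)."""
    gamma = np.linspace(0.0, 1.0, grid)                 # candidate prevalences
    # P(significant | prevalence): true effect (rate gamma) or false positive
    theta = np.clip(gamma + (1 - gamma) * alpha, 1e-12, 1 - 1e-12)
    log_post = k * np.log(theta) + (n - k) * np.log1p(-theta)
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    map_est = gamma[post.argmax()]
    # HPDI: smallest set of highest-density grid points holding `mass` posterior
    order = np.argsort(post)[::-1]
    keep = order[np.cumsum(post[order]) <= mass]
    return map_est, gamma[keep].min(), gamma[keep].max()
```

For 11 of 11 significant participants, this returns MAP = 1 with a 95% HPDI of approximately [0.77 1], matching the values reported above.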
This speeding up of categorization behavior following prediction should involve the information processing mechanisms of a flexible, task-demand-sensitive Prediction Network at Stage 2 and a Categorization Network at Stage 3. To understand where, when, and how the mechanisms of the networks led to faster RTs, we reconstructed and analyzed these networks in each participant (from 4413 MEG sources covering the whole brain; see above, Materials and Methods, MEG Data acquisition and preprocessing).
Prediction Network
To identify the brain regions that flexibly communicate the SF prediction over Stage 2 before stimulus onset (Fig. 1), we computed how strongly each MEG source dynamically represents the prediction, separately for left- and right-cued trials at Stage 1 (to reveal lateralization of prediction communication into occipital cortex at Stage 2; Flom et al., 1963).
Specifically, for left- and right-cued trials at Stage 1, we computed the Stage 2 spatial-temporal representation of the predictive SF auditory cue and MEG source activity using MI (Ince et al., 2017), that is, MI (LSF vs HSF auditory cue; Stage 2 MEGt) over 4413 MEG sources, every 2 ms between 0 and 400 ms post-Stage 2 cue onset (see above, Materials and Methods, Prediction representations). In each participant, this computation produced two source-by-time matrices (for left- and right-cued trials at Stage 1) whose MI values indicate the strength of SF prediction representation at Stage 2.
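This per-source, per-time MI computation can be sketched with a Gaussian-copula estimator in the spirit of Ince et al. (2017). The simplified version below handles one source time course at a time and is not the authors' exact implementation:

```python
import numpy as np
from scipy.stats import norm, rankdata

def copnorm(x):
    """Gaussian-copula normalization: map ranks to a standard normal."""
    return norm.ppf(rankdata(x) / (len(x) + 1))

def gcmi_discrete(x, y):
    """MI(x; y) in bits between a continuous 1D signal x (e.g., one MEG source
    at one time point, across trials) and a discrete label y (LSF vs HSF cue),
    from Gaussian entropies of the copula-normalized signal."""
    cx = copnorm(x)
    h_total = 0.5 * np.log(2 * np.pi * np.e * cx.var())
    h_cond = sum((y == c).mean() * 0.5 * np.log(2 * np.pi * np.e * cx[y == c].var())
                 for c in np.unique(y))
    return max(0.0, (h_total - h_cond) / np.log(2))
```

Applying such an estimator to every source and every 2 ms time point yields the source-by-time MI matrices described above.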
To reveal the spatial-temporal unfolding of prediction representation, we applied a data-driven clustering analysis to these MI matrices (see above, Materials and Methods, Prediction periods clustering). We found three distinct spatial-temporal periods (i.e., clusters) in both left- (Fig. 3A, first row) and right-cued trials (Fig. 3A, second row). Figure 3A summarizes their dynamics at group level by showing plots of the sources with maximal Stage 2 prediction representation (i.e., peak MI) in each color-coded period. These periods were replicated in each individual participant.
Specifically, Figure 3A shows that Stage 2 prediction representation dynamics start with an early temporal lobe (TL) peak (auditory cortex, blue, Period 1), move to prefrontal cortex [dorsal lateral PFC (dlPFC), 120–200 ms, magenta, Period 2], and finally reach the occipital cortex (OC) contralateral to the predicted location (>200 ms, orange, Period 3). Of note, Stage 2 prediction representations on occipital sources were contralateral to the Stage 1 cued spatial location, group-level ANOVA, two (left vs right prediction) by two (right vs left hemisphere occipital cortex), F(1,10) = 18.87, p = 0.0015, replicated in 10/11 participants, Bayesian population prevalence = 0.91 [0.64 0.99], MAP [95% HPDI].
The dynamics of prediction propagation suggest a functional network that specifically communicates the predictive cue. Reconstructing this network requires quantifying the communication of the predictive information separately from all other communications. We did this by computing DFI (Ince et al., 2015), which quantifies directed, time-lagged region-to-region communication about a specific feature (here, the predictive cue). We computed DFI [of LSF vs HSF prediction (P)] between pairings of the three sources identified earlier (i.e., one per color-coded period, that is, temporal, prefrontal and occipital) for each possible time lag and separately for left- and right-cued trials, that is, DFIP (regionAt1→regionBt2; see above, Materials and Methods, Prediction Network reconstruction).
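In spirit, DFI subtracts from the time-lagged sender-to-receiver communication the part that remains when the feature is held fixed; what is left is communication about the feature. The following is a simplified Gaussian-copula sketch (not the published estimator, which differs in detail, e.g., in within-class normalization):

```python
import numpy as np
from scipy.stats import norm, rankdata

def copnorm(x):
    """Gaussian-copula normalization: map ranks to a standard normal."""
    return norm.ppf(rankdata(x) / (len(x) + 1))

def gauss_cmi(x, y, z):
    """I(x; y | z) in bits for 1D copula-normalized variables (Gaussian model)."""
    def h(*v):  # joint Gaussian entropy; additive constants cancel in the CMI
        return 0.5 * np.log(np.linalg.det(np.atleast_2d(np.cov(np.vstack(v)))))
    return (h(x, z) + h(y, z) - h(z) - h(x, y, z)) / np.log(2)

def dfi(sender_t1, receiver_t2, receiver_t1, feature):
    """Directed feature information: the part of the time-lagged sender-to-
    receiver communication that is about the (discrete) feature."""
    s, r2, r1 = (copnorm(v) for v in (sender_t1, receiver_t2, receiver_t1))
    di = gauss_cmi(r2, s, r1)            # directed communication, any content
    di_no_feature = sum((feature == c).mean()
                        * gauss_cmi(r2[feature == c], s[feature == c],
                                    r1[feature == c])
                        for c in np.unique(feature))
    return di - di_no_feature            # communication specifically about feature
```

Conditioning the sender-receiver relation on the receiver's own past keeps the measure directed; subtracting the feature-fixed term keeps it content specific.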
Figure 3B shows these prediction communications as the cross-participant DFI matrix between sender (y-axis) and receiver sources (x-axis) across different time delays (FWER corrected, p < 0.05, one tailed) in right-cued trials. The PFC source (x-axis, left) receives the predictive cue at ∼160 ms, sent from the TL source ∼50 ms earlier (y-axis, right); PFC then flexibly sends the predictive cue (y-axis, right) contralaterally to left OC (lOC) sources on right-cued trials (x-axis, right), with a 100–200 ms delay. We replicated these communications in individual participants as follows (Fig. 3B, prevalence bar): TL to PFC, left-cued trials (unfilled) 11/11 and right-cued trials (filled) 11/11, Bayesian population prevalence = 1 [0.77 1], MAP [95% HPDI]; PFC to right OC (rOC), left-cued trials, 9/11, Bayesian population prevalence = 0.81 [0.53 0.96], MAP [95% HPDI]; PFC to lOC, right-cued trials, 10/11, Bayesian population prevalence = 0.91 [0.64 0.99], MAP [95% HPDI] (see Figure 7 for individual results).
Importantly, we found that SF communications in the Prediction Network constitute only a fraction of the total region-to-region communications [calculated with Directed Information; Massey, 1990; see above, Materials and Methods, Prediction Network reconstruction (Steps 3 and 6)]. These results emphasize the importance of isolating the communications of contents: across participants, SF communications were 74.2 ± 13.1% of total temporal-to-prefrontal communications for left-cued trials and 69.4 ± 19.0% for right-cued trials, and 60.3 ± 19.0% of total prefrontal-to-occipital communications for left-cued trials and 67.1 ± 22.0% for right-cued trials.
We now know that PFC flexibly communicates the prediction from TL to lateralized OC, depending on the task demand of the stimulus location cued at Stage 1. We also know that prefrontal cortex synchronizes with visual cortex (signal to signal) in top-down visual prediction tasks (Bar et al., 2006; Lobier et al., 2018). Here, we tested the hypothesis that PFC flexibly mediates the communication of prediction contents between TL and OC as a function of task demands. To directly test this mediation, Figure 4 contrasts the TL-to-OC communication of the prediction with (vs without) conditioning on PFC activity, that is, computing DFIP(TLt1→OCt3)|PFCt2 (Fig. 4A) versus DFIP(TLt1→OCt3) (Fig. 4B; see above, Materials and Methods, Prediction Network mediation). Figure 4 reveals that PFC does indeed flexibly mediate the predictive cue from TL to left versus right OC. That is, these communications are conditional on PFC source activity, replicated for left- and right-cued trials in ≥10/11 participants (Fig. 8), Bayesian population prevalence = 0.91 [0.64 0.99], MAP [95% HPDI]. Thus, PFC actively and flexibly mediates the network communications of the prediction from TL to lateralized OC.
Categorization Network
Next, we similarly reconstructed the Stage 3 Categorization Network that processes the presented Gabor SF stimulus for behavior. First, on predictive trials, we computed the Stage 3 dynamic representation of the stimulus to identify space-time regions that represent Gabor SF for categorization, that is, by computing MI (Gabor SF; Stage 3 MEGt) on each source and time point, separately for left- and right-cued trials. Clustering these space-time MI matrices again revealed three periods of Gabor stimulus representation (Fig. 5A; see above, Materials and Methods, Categorization periods clustering). Specifically, stimulus representation starts with an early lateralized occipital-ventral peak (150–250 ms, orange, Period 4), followed by a parietal lobe peak (250–350 ms, red, Period 5) and a premotor-frontal cortex peak (>350 ms, brown, Period 6), independently for left- and right-cued trials and replicated in all participants (Fig. 5A; see above, Materials and Methods, Categorization Network, Categorization periods clustering).
Then, we reconstructed in each participant the DFI Categorization Network that communicates the LSF versus HSF contents, computed as DFI (regionAt1→regionBt2; see above, Materials and Methods, Categorization Network reconstruction). Figure 5B shows that these group-averaged communications develop from contralateral occipital-ventral cortex to parietal and then to premotor cortex. We replicated these communications in individual participants as follows (Fig. 5B, prevalence bars): rOC to parietal lobe (PL), left-cued trials (unfilled) 10/11, Bayesian population prevalence = 0.91 [0.64 0.99], MAP [95% HPDI]; lOC to PL, right-cued trials (filled) 9/11, Bayesian population prevalence = 0.81 [0.53 0.96], MAP [95% HPDI]; PL to premotor cortex (PMC), left-cued (unfilled) and right-cued (filled) trials 9/11 participants, Bayesian population prevalence = 0.81 [0.53 0.96], MAP [95% HPDI] (Fig. 9, all individual results). Here again, SF feature communications were only a proportion of the total region-to-region network communications, that is, 56.0 ± 27.0% of total occipital-ventral to parietal communications for left-cued trials and 62.8 ± 17.7% for right-cued trials, and 59.1 ± 20.5% of total parietal-to-premotor communications for left-cued trials and 56.2 ± 23.2% for right-cued trials.
Stage 2 prediction influences Stage 3 stimulus SF representation for faster categorization
Here, we sought to understand the network mechanisms whereby Stage 2 SF predictions change Stage 3 SF representations from the shown stimulus, leading to faster categorization behavior. We proceeded in two steps; the first one addresses how prediction changes SF stimulus representation, and the second step isolates the Categorization Network components that speed up behavior because of prediction.
Step 1: does prediction enhance SF discrimination for categorization?
We analyzed how Stage 2 SF predictions change Stage 3 stimulus SF representation in each Categorization Network region and participant. Specifically, we computed the difference of SF stimulus representation with and without prediction, that is, the difference of MI (Gabor LSF vs HSF; MEGStage3), separately computed for valid predictive and neutral trials (see above, Materials and Methods, Prediction enhances stimulus representation). These representational differences are presented in the box plots of Figure 6A, in each color-coded space-time region and participant, that is, on the source that maximizes the difference in this region, against the null hypothesis of no difference (0, dashed line). Box plots show that valid predictions enhanced SF discriminations on occipital-ventral (150–250 ms), parietal (250–350 ms) and PMC (>350 ms) sources, FWER, p < 0.05, two tailed. Seven of 11 participants replicated these results in contralateral OC, for left- and right-cued trials, Bayesian population prevalence = 0.64 [0.33 0.85], MAP [95% HPDI]; 9/11 participants in parietal lobe and PMC, Bayesian population prevalence = 0.81 [0.53 0.96], MAP [95% HPDI] (Fig. 6A, insets, prevalence bars).
Step 2: where and when does prediction speed up behavioral RT in the Categorization Network?
Next, we identified the space-time regions of the Categorization Network that relate to faster Stage 3 RT following Stage 2 prediction. In each participant, we computed the Co-Information (valid predictive vs neutral cue trials; MEGStage3; RT) for each source in the three network regions, separately for left- versus right-cued trials (see above, Materials and Methods, Prediction modulates source activity and RT). Co-Information quantifies the influence of prediction that is shared, trial by trial, by MEG and RT. It therefore reveals prediction-related MEG source activity that directly relates to faster RT. The plots in Figure 6B show these results as the participant average of the source with maximal Co-Information at each time point. They reveal two peaks after ∼250 ms that maximally relate the prediction influence on source activity to faster RT in the Categorization Network at Stage 3. The small brain maps locate these peaks in the parietal lobe and PMC, replicated in all individual participants separately for left- and right-cued trials, Bayesian population prevalence = 1 [0.77 1], MAP [95% HPDI].
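Co-Information of this form can be written as the redundancy I(prediction; MEG) + I(prediction; RT) − I(prediction; MEG, RT): positive values indicate prediction-related information shared by source activity and reaction time. A simplified Gaussian-copula sketch (variable and function names ours, not the analysis code):

```python
import numpy as np
from scipy.stats import norm, rankdata

def copnorm(x):
    """Gaussian-copula normalization: map ranks to a standard normal."""
    return norm.ppf(rankdata(x) / (len(x) + 1))

def mi_discrete(xs, y):
    """MI in bits between continuous variables xs (d x trials, copula-
    normalized) and a discrete label y, under a Gaussian model."""
    def h(m):  # joint Gaussian entropy of the rows of m
        c = np.atleast_2d(np.cov(m))
        return 0.5 * np.log((2 * np.pi * np.e) ** c.shape[0] * np.linalg.det(c))
    return (h(xs) - sum((y == c).mean() * h(xs[:, y == c])
                        for c in np.unique(y))) / np.log(2)

def co_information(pred, meg, rt):
    """I(pred; MEG) + I(pred; RT) - I(pred; MEG, RT): prediction-related
    information shared between source activity and RT (positive = redundant)."""
    m, r = copnorm(meg), copnorm(rt)
    return (mi_discrete(m[None, :], pred) + mi_discrete(r[None, :], pred)
            - mi_discrete(np.vstack((m, r)), pred))
```

Computing this quantity per source and time point yields the Co-Information maps described above.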
In sum, Stage 2 SF predictions enhanced Stage 3 stimulus SF representations in all regions of the Categorization Network across the time course of processing, although only the parietal lobe and premotor cortex enhancements related to faster RTs.
Control analyses and individual results
At Stage 2, in addition to reconstructing the Prediction Network, we conducted analyses to control for potential carryover from the Stage 1 dot presentation into Stage 2 and for the bottom-up processing of the auditory cues. First, we demonstrate in Figure 7A that the Stage 1 dot representation ceases before the onset of the auditory cue, providing evidence that the contralateralization observed at Stage 2 is not a residual effect of Stage 1. Second, to control for the bottom-up processing of the auditory cues, we traced their representations at Stage 2 using a linear classifier (Treder, 2020) trained to discriminate LSF versus HSF auditory cues from localizer data. Figure 7B shows their decoding performance. We localized the sources contributing to the decoding peaks in each time window of the prediction dynamics and found that the source representation of the auditory cues remains within TL.
Importantly, we applied a new approach to statistics in which we sought to replicate each result above in each individual participant. We then estimated the Bayesian population prevalence of the results from the experimental sample of participants, thereby alleviating most problems of the replication crisis (Ince et al., 2021, 2022). Having reported the Bayesian population prevalence, we show the individual results in detail below.
We replicated TL to PFC to OC communications in the Stage 2 Prediction Network in individual participants as follows: TL to PFC, left-cued trials (unfilled) 11/11 and right-cued trials (filled) 11/11, Bayesian population prevalence = 1 [0.77 1], MAP [95% HPDI]; PFC to rOC, left-cued trials 9/11, Bayesian population prevalence = 0.81 [0.53 0.96], MAP [95% HPDI]; PFC to lOC, right-cued trials 10/11, Bayesian population prevalence = 0.91 [0.64 0.99], MAP [95% HPDI].
We replicated the result that TL to OC communications are conditional on PFC source activity for left- and right-cued trials in ≥ 10/11 participants, Bayesian population prevalence = 0.91 [0.64 0.99], MAP [95% HPDI].
We replicated OC to PL to PMC communications in Stage 3 Categorization Network in individual participants as follows (Fig. 5B, prevalence bar): rOC to PL, left-cued trials (unfilled) 10/11, Bayesian population prevalence = 0.91 [0.64 0.99], MAP [95% HPDI]; lOC to PL, right-cued trials (filled) 9/11, Bayesian population prevalence = 0.81 [0.53 0.96], MAP [95% HPDI]; PL to PMC, left-cued (unfilled) and right-cued (filled) trials 9/11 participants, Bayesian population prevalence = 0.81 [0.53 0.96], MAP [95% HPDI].
Discussion
We isolated the network mechanisms that dynamically predict specific visual contents. Then we examined specifically where, when, and how prediction changes stimulus representation to speed up categorization behavior. Our three-stage experimental design used a location cue at Stage 1 to predict the left versus right visual field location of an upcoming Gabor stimulus, enabling us to study the effects of prediction on stimulus representation, specifically in the occipital cortex contralateral to stimulus presentation, depending on task demands. The Stage 1 location cue was followed at Stage 2 by an auditory cue that predicted the LSF versus HSF contents of the Gabor stimulus that appeared on the screen at Stage 3. We reconstructed a Prediction Network that propagates the auditory predictive cue from temporal (90–120 ms post-Stage 2 cue) to occipital cortex (200–400 ms), via prefrontal cortex (120–200 ms), all prestimulus. We showed that prefrontal cortex (mainly dlPFC) mediates communication of the predictive cue from temporal to the left versus right occipital cortex, depending on the location cued at Stage 1, demonstrating that the prediction pathway is flexible to the demands of the task. When the Gabor stimulus is finally shown at Stage 3, we reconstructed poststimulus the Categorization Network that propagates the LSF versus HSF feature from occipital-ventral cortex (150–250 ms post-Stage 3 Gabor) to parietal lobe (250–350 ms) and premotor cortex (>350 ms). We then showed how predictions change the Categorization Network and found that they enhance LSF versus HSF representations of the shown stimulus from occipital cortex to premotor cortex, leading to faster behavior.
Together, our results quantitatively reveal cognitive network mechanisms that flexibly communicate top down the prediction of a specific content to occipital cortex, which enhances the bottom-up representation of these contents in the stimulus to speed up behavior.
Functional networks predict and then represent stimulus contents
Methodologically, we reconstructed a functional network that flexibly communicates a specific auditory prediction of visual contents (LSF vs HSF) from temporal to left versus right occipital cortex, mediated by PFC. That is, PFC is necessary to flexibly propagate the predictive cue. Such connectivity analyses involve individual MEG sources acting as sending and receiving network nodes. Importantly, DFI functional connectivity differs from other signal-to-signal connectivity analyses (such as Granger causality or transfer entropy) because DFI isolates what the communication is about (at Stage 2, the auditory prediction of LSF vs HSF) as a percentage of the full signal-to-signal connectivity (Ince et al., 2015). At this stage, the specific function of the communications between brain regions that are not about the stimulus features remains to be characterized. They could be about other stimulus features (e.g., orientation or contrast), other aspects of the task (e.g., task engagement), or related to dynamic state effects (such as attentional engagement or fatigue). Furthermore, the remaining communications could relate to the synchronization between sender and receiver nodes that is necessary to form a carrier network to convey the feature information (Ziemer and Tranter, 2006; Lobier et al., 2014; Sherblom, 2019).
A similar logic isolated the mediatory role of PFC. Thus, DFI addressed the first question schematized in Figure 1, of the functional network of regions that dynamically (and multimodally) propagate a prediction of visual information to the PFC that translates a prediction from auditory cortex into a predictive signal in occipital cortex that subsequently influences the representation of stimulus contents, when shown.
These Stage 2 effects were obtained from the contrast between the two auditory cues and therefore might reflect only bottom-up processing of these auditory signals (not the predicted visual contents). However, our demonstration that PFC mediates the propagation addresses this point by showing a high-level modulation distinct from the dynamic representation of the tone itself (tested with the localizer before the experiment; Fig. 7). Moreover, the visual specificity of the Stage 2 prediction is confirmed by its end point in the occipital cortex contralateral to the predicted location (compare Fig. 3). Thus, the propagation of the visual prediction at Stage 2 is distinct from that of the auditory input.
To address the question of how prediction influences Stage 3 processing of the stimulus, we compared Stage 3 stimulus representation with and without prediction. A key unresolved question about the role of predictions is whether they enhance versus dampen stimulus representation (De Lange et al., 2018). Evidence for one or the other typically relies on enhanced versus impaired decoding performance of the predicted stimulus in the regions of interest (Lee and Mumford, 2003; Kok et al., 2012; Blank and Davis, 2016; Kumar et al., 2017). Here, we showed that predictions enhance the representation of LSF versus HSF stimulus contents, locating these enhancements in source space and time. Most participants (7/11) showed that prediction enhances LSF versus HSF discrimination in occipital cortex, in parietal cortex (9/11), and in premotor cortex (9/11), the latter relating to faster behavioral categorization. Thus, our evidence supports the hypothesis that prediction enhances stimulus representation in the Categorization Network.
The mediation (i.e., control) role of prefrontal cortex
An interesting finding of our functional network is that prefrontal cortex mediates the temporal to occipital communication of the predictive cue. More precisely, we located the sources with highest representation of the predictive cue in the dorsolateral prefrontal cortex (Sanches et al., 2009), often related to working memory (D'Esposito et al., 1998; Rowe et al., 2000; Friedman and Robbins, 2022), selective attention (Goddard et al., 2022), and task performance (Collette et al., 2005). Prefrontal cortex could orchestrate the information of the auditory cue (i.e., upcoming LSF vs HSF) together with the memory of the upcoming stimulus location (i.e., left vs right visual field) and selectively prepare the contralateral occipital sources for the upcoming contents. Our results are compatible with this hypothesis, because representation of the prediction on occipital sources at Stage 2 is indeed contralateral to the predicted visual field where the stimulus will appear at Stage 3, that is, left occipital sources for a predicted right visual field stimulus and vice versa. Future work that fuses MEG and high-field fMRI will seek to resolve the specific cortical laminar layer that receives the prediction at Stage 2 (e.g., central laminar layer; Lawrence et al., 2019) and how this prediction then interacts with the cortical layer representation of the feedforward flow when the stimulus is shown at Stage 3 (e.g., in peripheral laminar layers; Lawrence et al., 2019).
Predictions and representations of face, object, body, and scene stimuli
We used DFI to reconstruct the dynamic Prediction and Categorization Networks. Our approach to the neuroimaging of cognitive tasks differs from most other approaches in several critical ways. First, our overarching goal is to reconstruct, for each individual participant, the network of MEG sources that communicate (i.e., send and receive) the information (e.g., an auditory cue; a visual feature) that is necessary to resolve the cognitive task under study (Schyns et al., 2009, 2022; Jaworska et al., 2022). These cognitive tasks play a critical role in shaping the communications of specific stimulus and memory information across the networks of the brain (Schyns, 1998; Smith et al., 2004; Jaworska et al., 2022; Schyns et al., 2022; Kay et al., 2023). Second, to do so, we use a new measure of functional connectivity (i.e., DFI; Ince et al., 2015, 2016) that differs from most other signal-to-signal measures of connectivity [e.g., Granger causality (Bressler and Seth, 2011) or transfer entropy (Lobier et al., 2014)]. DFI quantifies the communication of specific information between network nodes. For example, at Stage 2 of our experiment, nodes communicate the information about the auditory prediction of LSF versus HSF. DFI communication is expressed as a percentage of the full signal-to-signal connectivity between pairs of nodes. With DFI, we can uniquely interpret neural signal communications in terms of the specific information contents that the brain networks flexibly communicate to achieve a cognitive task. This isolation is important because we showed in our prediction experiment that communication of the predicted features is only ∼55−75% of all signal-to-signal communications between brain regions.
A direct consequence of DFI connectivity is that we can locate the network nodes where different information converges (i.e., the hubs; e.g., contralateral occipital representations of the left and right eyes of a face converge into the right fusiform gyrus hub; Schyns et al., 2007; Ince et al., 2016; Zhan et al., 2019; Jaworska et al., 2022). In turn, we can analyze whether hub nodes perform specific linear and nonlinear computations on their inputs (Jaworska et al., 2022). These analyses apply equally to bottom-up and top-down information flows in the network. Here, they revealed mechanisms that propagate top-down predictions of LSF versus HSF stimulus features from temporal to lateralized occipital cortex, depending on task demands. In turn, these predictions enhance bottom-up LSF versus HSF representations from occipital cortex to premotor cortex to speed up categorization behavior. Thus, our approach enables a unique mechanistic, algorithmic understanding of the information processing networks that realize a specific cognitive task, which is the ultimate explanatory goal of cognitive neuroimaging (Schyns et al., 2009, 2022; Jaworska et al., 2022).
Generalizing from Gabor stimuli to more naturalistic face, object, and scene categorization tasks will raise several challenges for studying the visual features that categorize faces, objects, and scenes (Schyns et al., 2009, 2020). A key challenge is that the stimulus features participants use to predict and then categorize can differ across behaviors and levels of expertise (e.g., categorizing the same picture as city vs New York; Gauthier et al., 1999; Malcolm et al., 2014). We therefore need to characterize these features per participant and task to then study their predictions and representations for behavior in functional networks (Jaworska et al., 2022; Schyns et al., 2022; Kay et al., 2023). In particular, a methodological challenge remains to understand the compositionality of visual predictions as they decompose from their integrated representation high in the visual hierarchy (e.g., right fusiform gyrus) to their contralateral components in occipital cortex, down to their simplest Gabor representation at the lower hierarchical levels. This would require the fusion of brain measures [e.g., high-field fMRI to finely tap into laminar layers (Gilbert and Li, 2013) and E/MEG (Ince et al., 2015) to trace the dynamics of these representations across layers in the occipito-ventral-dorsal streams].
Thus, to understand complex dynamic predictions and representations in the brain, we must understand the categorization task (e.g., city vs New York), the hierarchical composition of features that represent each category in the participant's memory, and trace their hierarchical predictions in the feedback flow (Yuille and Kersten, 2006) and their subsequent representation in the feedforward flow when the stimulus is shown. Once the compositionality of representations is understood, we could study how sensory hierarchies decompose predictions to facilitate stimulus processing and behavior.
Conclusions
We sought to isolate and understand the propagation of specific cognitive predictions in a Prediction Network and then how these predictions change the Categorization Network that processes the predicted contents in the stimulus. We showed that the Prediction Network dynamically propagates predictions of visual contents from temporal to occipital regions via the flexible mediating role of prefrontal regions. We then showed that predicted contents were more sharply represented in the Categorization Network when the stimulus was shown, from occipital-ventral to premotor cortex, via parietal cortex, leading to faster decision behavior. Our Prediction and Categorization Networks separate the communications about specific contents from overall signal-to-signal connectivity, in principle generalizing to other stimulus features and sensory modalities.
Footnotes
P.G.S. was supported by Wellcome Trust Grant 107802 and U.S. Department of Defense Multidisciplinary University Research Initiative Grant 172046-01. R.A.A.I. was supported by Wellcome Trust Grant 214120/Z/18/Z.
The authors declare no competing financial interests.
Correspondence should be addressed to Philippe G. Schyns at Philippe.Schyns@glasgow.ac.uk