Abstract
Learned associations between stimuli allow us to model the world and make predictions, crucial for efficient behavior (e.g., hearing a siren, we expect to see an ambulance and quickly make way). While there are theoretical and computational frameworks for prediction, the circuit and receptor-level mechanisms are unclear. Using high-density EEG, Bayesian modeling, and machine learning, we show that inferred “causal” relationships between stimuli and frontal alpha activity account for reaction times (a proxy for predictions) on a trial-by-trial basis in an audiovisual delayed match-to-sample task which elicited predictions. Predictive β feedback activated sensory representations in advance of predicted stimuli. Low-dose ketamine, an NMDAR blocker, but not the control drug dexmedetomidine, perturbed behavioral indices of predictions, their representation in higher-order cortex, feedback to posterior cortex, and pre-activation of sensory templates in higher-order sensory cortex. This study suggests that predictions depend on alpha activity in higher-order cortex, β feedback, and NMDARs, and ketamine blocks access to learned predictive information.
SIGNIFICANCE STATEMENT We learn the statistical regularities around us, creating associations between sensory stimuli. These associations can be exploited by generating predictions, which enable fast and efficient behavior. When predictions are perturbed, it can negatively influence perception and even contribute to psychiatric disorders, such as schizophrenia. Here we show that the frontal lobe generates predictions and sends them to posterior brain areas, to activate representations of predicted sensory stimuli before their appearance. Oscillations in neural activity (α and β waves) are vital for these predictive mechanisms. The drug ketamine blocks predictions and the underlying mechanisms. This suggests that the generation of predictions in the frontal lobe, and the feedback pre-activating sensory representations in advance of stimuli, depend on NMDARs.
Introduction
The classical view of sensory processing focuses on feedforward information transmission from the sensory organs to higher-order cortex, to generate representations of the world (Riesenhuber and Poggio, 1999; Serre et al., 2007). However, growing evidence of expectations strongly influencing perception and behavior (Körding and Wolpert, 2004; Summerfield et al., 2006; de Lange et al., 2018) suggests that the brain actively predicts incoming sensory information, a process that is not featured in the traditional framework. Predictive coding (PC) takes this process into account, wherein the brain uses generative models to make inferences about the world (Mumford, 1992; Dayan et al., 1995; Rao and Ballard, 1999; Friston, 2010; Spratling, 2017), possibly even to support conscious experience (Hobson and Friston, 2012, 2014; Seth et al., 2012). PC proposes that these models, based on prior sensory experiences, are represented at higher-order levels of a cortical hierarchy. The model predictions are transmitted from higher-order to lower-order cortex along feedback connections. Any mismatch between feedback predictions and observed sensory evidence generates an error signal, which is transmitted along feedforward connections, to update models in higher-order cortex (Egner et al., 2010; Gordon et al., 2017; Issa et al., 2018). The updated model is used in the next iteration to generate new predictions. This process of Bayesian updating aims to optimize beliefs about the sensory world. However, the neural representation of predictions is unclear.
N-methyl-D-aspartate receptors (NMDARs) may play a key role in PC. Theoretical work on PC (Friston, 2005) has proposed that higher levels of a cortical hierarchy transmit top-down predictions to lower levels through NMDAR-mediated signaling. Consistent with this proposal, NMDARs have been shown to modulate higher-order (frontal) cortical excitability (Homayoun and Moghaddam, 2007, 2008; Murray et al., 2014; Rosch et al., 2019), be enriched in superficial and deep cortical layers where feedback connections terminate (Rosier et al., 1993), and contribute to feedback activity (Self et al., 2012). However, there is a lack of experimental evidence linking NMDARs and prediction itself. Ketamine, an NMDAR blocker (Zorumski et al., 2016), can reduce prediction error signals, measured as auditory mismatch negativity (MMN) in oddball paradigms (Javitt et al., 1996; Schmidt et al., 2012; Corlett et al., 2016). But because the MMN reflects the mismatch between predictions and observed sensory evidence, it is difficult to determine whether ketamine's effect on the MMN arises from an influence on feedback predictions, on feedforward sensory evidence, or on error signaling directly, any of which can reduce the MMN. Further, prediction is only assumed in oddball paradigms; there is no behavioral measure of prediction. Hence, a paradigm that incorporates a behavioral readout of predictions and separates predictions from other PC mechanisms is required to probe a contribution of NMDARs to predictions and their neural representation.
To test circuit and receptor-level mechanisms of prediction, we recorded 256-channel EEG of subjects performing an audiovisual delayed match-to-sample task (see Fig. 1A). The task design temporally separates predictions (generated during the delay period) from error processing (after image onset), which cannot be done in oddball paradigms (Javitt et al., 1996; Schmidt et al., 2012; Corlett et al., 2016). Furthermore, having auditory stimuli carry predictive information about visual stimuli allows us to modulate separate feedforward (auditory to frontal) and feedback (frontal to visual) pathways. Subjects performed the task before, during, and after recovery from subhypnotic dosing of ketamine, targeted to concentrations that modulate NMDARs. In control experiments, we instead administered subhypnotic dexmedetomidine (DEX), an α2 adrenergic receptor agonist, selected to account for changes in arousal and modulation of hyperpolarization-activated cyclic nucleotide channels (HCN-1, which mediate ketamine's anesthetic effects) (Chen et al., 2009). We hypothesized that NMDARs contribute to predictive processes in higher-order cortex, sensory representations in advance of predicted stimuli, and consequent behavioral advantages. Thus, we expect ketamine, but not DEX, to disrupt neural representations of predictions and prevent faster reaction times (RTs) to predicted stimuli.
Materials and Methods
The University of Wisconsin-Madison Health Sciences and Social Sciences Institutional Review Boards (IRBs) approved the experiments. Twenty-nine participants (14 female) performed the psychophysics PC experiment. We excluded data from 4 subjects because their performance accuracy was <50%. Seventeen additional participants (6 female) for DEX and 15 participants (5 female) for ketamine took part in the pharmacology PC experiments (mean age = 22.35 years; SD = 2.82 years). Of the 15 participants who performed ketamine experiments, the first 13 had to participate in the DEX experiments first as per IRB requirements. To rule out an effect of drug order, we also ran our linear mixed effects models with “drug order” as a covariate (RT ∼ sounds' predictive value + drug condition + order + sounds' predictive value × drug condition). There was no significant effect of “drug order” on RT (F(1,13.54) = 1.58, p = 0.20). Both the main effect of prediction and the interaction effect of predictive value × drug condition remained significant. Three participants also took part in saline control experiments, which yielded results similar to the predrug baseline. Three participants (of the 17 DEX participants) were excluded for low accuracy (<50%). Participants who performed the psychophysics PC experiment did not take part in the pharmacology experiments. We obtained informed consent from all participants.
Stimuli
For our psychophysics experiments, we used biomorphic visual stimuli from Michael Tarr's laboratory (https://sites.google.com/andrew.cmu.edu/tarrlab/resources/tarrlab-stimuli). These are known as greebles. Figure 1A shows examples presented to participants. We used three grayscale greebles for each psychophysics session, and each greeble was personified with a name. We used novel sounds (trisyllabic nonsense words) for the greeble names (e.g., “Tilado,” “Paluti,” and “Kagotu”) from Saffran et al. (1996). The sounds were generated using the Damayanti voice in the “text to speech” platform of an Apple MacBook. To avoid differences in the salience of stimuli, greeble images have similar size (13 degrees of visual angle in height and 8 degrees in width), number of extensions and mean contrast, and greeble names have the same number of syllables and sound level (80 dB SPL).
For the pharmacology experiments, we generated three new triplets of greebles. To control for saliency, each participant rated greeble salience for each of four triplets (the three new triplets plus the one from the psychophysics experiment); that is, the participant identified whether any of the greebles in a triplet stood out compared with the other two greebles from the same triplet. We then used only triplets in which the participant rated all three greebles as equally salient. We then named each of these greebles with a new trisyllabic nonsense word.
Audiovisual delayed match-to-sample task
Each trial of the task involves the sequential presentation of a sound (trisyllabic nonsense word) followed by a greeble image. We refer to stimuli using the following notation: A1, A2, and A3 correspond to each of the three sounds used (A for auditory); and V1, V2, and V3 correspond to each of the greebles used (V for visual). Using this notation, audiovisual stimulus sequences containing the matching name and greeble are A1-V1, A2-V2, and A3-V3. Audiovisual stimulus sequences containing a nonmatching name and greeble are A1-V2, A1-V3, A2-V1, A2-V3, A3-V1, and A3-V2. We pseudo-randomized names for greebles (i.e., matching sounds and images) across subjects.
Learning phase
During the first phase of the task, participants learn the association between the sounds and images (i.e., names of the greebles) through trial-and-error, by performing a match/nonmatch (M/NM) task. This phase is called the “learning” phase. Each trial starts with a blank blue screen (R = 35, G = 117, B = 208; 200 ms duration; see Fig. 1A, gray). After that, a black fixation cross (size 1.16 degrees of visual angle; jittered 200-400 ms) is presented, followed by a sound, a greeble name voiced by the computer (600 ms duration). After a jittered delay period (900-1200 ms duration), a greeble image is presented on the monitor screen (until the participant responds or for 1100 ms, whichever is earlier), together with two symbols (√ and X) to the left and right of the greeble (9.3 degrees of visual angle from screen center). These symbols indicate the participant's two response options: match (√) or nonmatch (X). The symbol location, left or right of the greeble image, corresponds to the left or right response button, respectively: left and right arrow keys of a computer keyboard in the psychophysics experiments; and left and right buttons of a mouse in the pharmacology experiments. We randomly varied the symbols' locations relative to the greeble image to minimize motor preparation (i.e., on some trials, a match response required a left button press and, on other trials, a match response required a right button press). In the learning phase, each greeble name and image had a 33% probability of appearing in any given trial. This prevents subjects from developing differential predictions about the greebles based on greeble name or image frequency during the learning phase.
To address possible same-different biases (e.g., quicker RTs for match trials) (Lupyan and Thompson-Schill, 2012), we introduced a control called “inversion trials” in psychophysics experiments, to minimize the expectation of M/NM trials which, in itself, might otherwise contribute to participants' responses. In these inversion trials, participants had to respond whether the greeble image presented on screen was inverted (the appropriate response button, left/right, indicated on screen by the left/right location of a red arrow pointing down) or upright (yellow arrow pointing up). Participants did not know the type of trial in advance; the trial type was only revealed by the symbols to the left and right of screen center at the onset of the greeble image (i.e., √ and X signal M/NM trials, whereas downward red arrow and upward yellow arrow signal inversion trials); 50% of the total number of trials in the learning phase were inversion trials, and the rest were M/NM trials. Because inversion and M/NM trials were presented in random order, participants could not specifically prepare in advance for M/NM trials, so there should be minimal confounding of RTs with a bias toward match responses.
Testing phase
Once participants show >80% accuracy for the M/NM trials in the learning phase of the task, they move on to the “testing phase” (1000 trials for the psychophysics experiments; see Fig. 1A). During the testing phase, we manipulated predictions by changing the probability of a greeble appearing after its learnt name. This probability is different for each greeble name and image. That is, in the testing phase, when a participant hears A1, there is 85% chance of V1 being shown (highly predictive [HP]); when a participant hears A2, there is 50% chance of V2 being shown (moderately predictive [MP]); and when a participant hears A3, there is a 33% chance of V3 being shown (not-match predictive [NP]). This allows participants to make stronger predictions about the identity of the upcoming visual image after hearing A1, than after hearing A2 or A3, for instance.
Inversion trials also made up half of the total trials in the testing phase of the psychophysics experiments. Hence, as in the learning phase, there should be minimal confounding of RTs with a bias toward match responses. This is supported, first, by the finding that accuracy was greatest for A1V1 trials (HP trials; see Fig. 2B), which might not have been expected (considering possible speed-accuracy tradeoffs) if subjects were simply responding faster to match trials generally. Second, if subjects were showing a match bias, then one might expect the longest RTs to A1 (A1 > A2 > A3) in nonmatch trials, in addition to the shortest RTs to A1 (A1 < A2 < A3) in match trials. However, we found that nonmatch trials yielded similar RTs for each condition (A1, A2, and A3; see Fig. 2H), while the RTs for match trials still differed according to the predictive value of sounds. Since participants did not show a match bias in our psychophysics study (where we had more flexibility with the duration of the experiment and the number of trials), we conducted the pharmacology study without inversion trials, to prioritize sufficient M/NM trials for our drug condition analyses.
We randomly presented all the trial types (M/NM trials and inversion trials) to the participants. The task design involved creating the different predictive values for the three sounds in the testing phase, as well as balancing the stimulus numbers and M/NM trials as closely as possible. That is, the testing phase of the task had a similar proportion of each voiced name and each greeble image, to control for stimulus familiarity, as well as a similar proportion of match and nonmatch trials to avoid response bias. This resulted in 52% match trials and 48% nonmatch trials, to achieve the different predictive sound values. The subtle differences in trial numbers were accounted for in the causal power analysis, which incorporates the trial history of sound-image associations. Further, in the debrief after task performance, no subjects reported any stimulus having a different presentation frequency, nor any difference in M/NM trial numbers.
Causal strength and transitional probability
We quantified the relationship between a sound (name) and its paired image (greeble) using the strength of causal induction: given a candidate cause C (sound) how likely is the effect E (i.e., how likely is it followed by its paired image). We will represent variables C and E with uppercase letters, and their instantiations with lowercase letters. Hence, C = c+/E = e+ indicates that the cause/effect is present, and C = c–/E = e– indicates that the cause/effect is absent (for brevity, we will shorten variables equal to outcomes, such as C = c+ or C = c– as simply c+ or c–, respectively). The evidence for a relationship can be encoded as a 2 × 2 contingency table for each sound, as in Table 1, where N(c+,e+) represents the number of trials in which the effect occurs in the presence of the cause, N(c–,e+) represents the number of trials in which the effect occurs in the absence of the cause and so on. Applied to our study, for example, C could be hearing sound A1, and E viewing the paired greeble V1. For this case, N(c+,e+) would be the number of trials V1 follows A1, whereas N(c–,e+) would be the number of trials V1 follows A2 or A3. The full contingency table for the HP auditory cue A1 and its paired greeble V1 is shown in Table 1. There are analogous contingency tables for the other two auditory cues and their paired greebles.
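The tallying of such a contingency table from a trial history can be sketched as follows. This is a minimal illustration only: the function name `contingency_counts` and the encoding of trials as (sound, image) string pairs are assumptions made for the example, not the authors' code, and the trial list is hypothetical.

```python
from collections import Counter

def contingency_counts(trials, cause, effect):
    """Tally the 2 x 2 contingency table for one sound (cause) and its paired image (effect).

    trials: iterable of (sound, image) pairs, e.g. ("A1", "V1").
    Returns a dict keyed by (cause_present, effect_present) booleans.
    """
    counts = Counter()
    for sound, image in trials:
        counts[(sound == cause, image == effect)] += 1
    # Make all four cells explicit, even if a cell was never observed.
    return {(c, e): counts[(c, e)] for c in (True, False) for e in (True, False)}

# Hypothetical trial history, for illustration only.
trials = [("A1", "V1"), ("A1", "V1"), ("A1", "V2"),
          ("A2", "V1"), ("A2", "V2"), ("A3", "V3")]
table = contingency_counts(trials, cause="A1", effect="V1")
# table[(True, True)]  is N(c+,e+): trials in which V1 followed A1
# table[(False, True)] is N(c-,e+): trials in which V1 followed A2 or A3
```

Keying the table by booleans keeps the mapping to N(c+,e+), N(c+,e–), N(c–,e+), and N(c–,e–) explicit.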
Based on these contingency tables, we calculated three different measures of causal relationship (ΔP, causal power and causal support) for each trial. ΔP and causal power assume that C causes E. ΔP reflects how the probability of E changes as a consequence of the occurrence of the cause C. Causal power corresponds to the probability that an effect E happened because of cause C in the absence of all other causes. In contrast, causal support evaluates whether or not a causal relationship actually exists and calculates the strength of that relationship. To do this, causal support estimates the evidence for a graphical model with a link between C and E against one without a link. For example, let us consider the graphs denoted by Graph 0 and Graph 1 in Figure 2D (adapted from Griffiths and Tenenbaum, 2005). There are three variables in each graph: cause C, effect E, and background cause B. In Graph 0, B causes E, but C has no relationship to either B or E. In Graph 1, both B and C cause E. The calculation of ΔP and causal power assumes Graph 1, whereas causal support compares the structure of Graph 1 to that of Graph 0. Causal support is defined as the evidence provided from data D in favor of Graph 1, P(D | Graph 1), over Graph 0, P(D | Graph 0), which can be calculated by the following equation:
Causal support = log [P(D | Graph 1) / P(D | Graph 0)]
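We calculated causal support using freely available MATLAB code from Griffiths and Tenenbaum (2005). ΔP and causal power were calculated using the following formulas:
ΔP = P(e+ | c+) − P(e+ | c–)
Causal power = ΔP / [1 − P(e+ | c–)]
where P(e+ | c+) = N(c+,e+) / [N(c+,e+) + N(c+,e–)] and P(e+ | c–) = N(c–,e+) / [N(c–,e+) + N(c–,e–)].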
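We compared these three measures of causal relationship with the transitional probability (i.e., a comparison between causation and correlation). We calculated the transitional probability of each greeble (V) given prior presentation of its paired sound (A), using the following equation:
P(V | A) = N(A,V) / N(A)
where N(A,V) is the number of trials in which sound A was followed by its paired greeble V, and N(A) is the number of trials in which sound A was presented.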
For each subject, we used the causal relationship value of each condition (HP, MP, and NP) at the end of a testing phase as the starting values of the next testing phase. For example, the starting values of causal relationship for the “under drug” testing phase were equal to the causal relationship values at the end of the “pre/baseline” testing phase. Similarly, the starting values for the “after recovery” testing phase were equal to the causal relationship values at the end of the “under drug” phase (see Fig. 3A,B). We tested whether subjects (1) regained access to already learned and stored predictive information, after they recovered from ketamine dosing, or (2) relearned the predictive information. We mimicked Hypothesis 2 by forcing the starting values of causal relationship (causal power here) “after recovery” to be zero (see Fig. 3C) instead of starting values equal to the causal relationship at the end of the “under drug” phase (Hypothesis 1; see Fig. 3B).
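A trial-by-trial version of these measures, with carry-over between testing phases, can be sketched as below. This is a hedged illustration, not the authors' implementation: it uses the standard definitions ΔP = P(e+ | c+) − P(e+ | c–) and causal power = ΔP / [1 − P(e+ | c–)], and it assumes that carrying values over is done by carrying the contingency counts over. The function names, the count ordering, and the trial lists are hypothetical.

```python
def causal_measures(n11, n10, n01, n00):
    """Compute Delta-P, causal power, and transitional probability from counts.

    n11 = N(c+,e+), n10 = N(c+,e-), n01 = N(c-,e+), n00 = N(c-,e-).
    Empty rows are treated as probability 0 (an assumption of this sketch).
    """
    p_e_given_c = n11 / (n11 + n10) if (n11 + n10) > 0 else 0.0
    p_e_given_not_c = n01 / (n01 + n00) if (n01 + n00) > 0 else 0.0
    delta_p = p_e_given_c - p_e_given_not_c
    if p_e_given_not_c < 1.0:
        power = delta_p / (1.0 - p_e_given_not_c)
    else:
        power = float("nan")  # undefined when the effect always occurs without the cause
    return {"delta_p": delta_p,
            "causal_power": power,
            "transitional_prob": p_e_given_c}

def run_phase(trials, cause, effect, start_counts=None):
    """Update counts trial by trial; start_counts carries over a previous phase."""
    counts = list(start_counts) if start_counts else [0, 0, 0, 0]
    trajectory = []
    for sound, image in trials:
        c, e = sound == cause, image == effect
        index = (0 if c else 2) + (0 if e else 1)  # order: n11, n10, n01, n00
        counts[index] += 1
        trajectory.append(causal_measures(*counts))
    return tuple(counts), trajectory

# Hypothetical phases, for illustration only.
baseline = [("A1", "V1")] * 3 + [("A1", "V2")] + [("A2", "V1")] + [("A2", "V2")] * 3
under_drug = [("A1", "V1"), ("A2", "V2")]

counts_baseline, traj_baseline = run_phase(baseline, "A1", "V1")
# Hypothesis 1: start the next phase from the counts at the end of the previous one.
counts_drug, traj_drug = run_phase(under_drug, "A1", "V1", start_counts=counts_baseline)
# Hypothesis 2 (relearning): start the phase from zero counts instead.
counts_relearn, traj_relearn = run_phase(under_drug, "A1", "V1")
```

Passing `start_counts` corresponds to regaining access to stored information, whereas omitting it corresponds to starting from zero.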
Pharmacology experiments
To manipulate participants' predictions, we administered two drugs, ketamine and DEX, each on a separate day, with at least 1 month intervening. The fixed order of DEX in the first session, and ketamine in the second session, for the first 13 ketamine subjects was IRB-imposed in their consideration of safety profiles of the different medications (registered on NCT03284307); whereas the final 2 ketamine subjects were not administered DEX in a prior session. All subjects were healthy and between 18 and 40 years of age without contraindication to study drugs. We also acquired EEG data throughout pharmacology experiments (see EEG recording, below), to measure electrophysiological activity during PC. A typical pharmacology experiment consisted of three segments: (1) predrug baseline, (2) under drug influence (DEX or ketamine), and (3) after recovery. During the predrug baseline, participants performed the learning phase (200 trials), then the first testing phase (400 trials). Under drug influence, participants performed the second testing phase (200 trials). Under ketamine, participants also performed a third testing phase (200 trials; for details, see Ketamine dosing). After recovery, they performed the last testing phase (400 trials). As a drug control, we tested 3 participants during saline administration. The saline results were similar to the predrug baseline. Because of a protocol-limited maximum time under drug influence, and the need to acquire sufficient M/NM trials for EEG analyses, the pharmacology experiments did not include inversion trials. All other aspects of the task in pharmacology experiments were the same as that in psychophysics experiments. For each of the two pharmacology experiments involving a particular participant, we used three new greeble name and image pairs to rule out any possible contribution of long-term memory.
DEX dosing
We intravenously administered a 0.5 μg/kg bolus over 10 min, followed by 0.5 μg/kg/h infusion (MedFusion4000 pump at μg/kg/hr). Participants performed the testing phase (under drug influence) during this infusion time, corresponding to stable drug levels according to the pharmacokinetic model for DEX by Hannivoort et al. (2015). We targeted a plasma concentration of DEX that is associated with mild sedation (modified observer's assessment of alertness/sedation [OAA/S] of 4) (Colin et al., 2017) to control for nonspecific sedative effects, including HCN-1 effects (Yang et al., 2014). The actual sedation achieved was on average slightly deeper than anticipated (modified OAA/S median 3, IQR 2) with a mean plasma concentration of 0.8 (SD 0.33) ng/ml (Colin et al., 2017).
Ketamine dosing
In initial experiments, we tested two doses of ketamine to target the lowest plasma concentration that would modulate NMDARs in the relevant concentration range (<1 μM) (Morris et al., 2017). The first dose corresponded to intravenously administered 0.25 mg/kg ketamine over 10 min, followed by 30 mg/h infusion, corresponding to 0.4 μM (Rugloop software using a Harvard 22 pump). Before testing subjects, lack of nystagmus and visual disturbance was confirmed in all participants. Participants performed the testing phase (under drug influence) during this infusion time, corresponding to stable drug levels according to the pharmacokinetic model for ketamine by Domino et al. (1982, 1984). A second dose was tested with a second bolus of 0.25 mg/kg ketamine over 10 min, followed again by 30 mg/h infusion. Testing again was completed once stable plasma concentrations of ∼0.8 μM were achieved. Ketamine blocked predictions at this second level of ketamine dosing, equating to a minimum plasma concentration of 0.2 μg/ml (range tested: 0.2-0.3 μg/ml). As we found the effective ketamine dose to be 0.2 μg/ml in our first 7 subjects, we targeted that plasma concentration for our remaining subjects. We report data for the 0.2 μg/ml dosing in the manuscript. Three subjects were excluded from “after recovery” testing because of vomiting.
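As a worked check of how the molar and mass concentrations quoted above relate, micromolar values can be converted to μg/ml using ketamine's molar mass. The sketch below assumes the free-base molar mass (about 237.7 g/mol); the source does not state which molar mass was used, and the function name is illustrative.

```python
# Assumption: ketamine free base, C13H16ClNO, molar mass ~237.73 g/mol.
KETAMINE_MOLAR_MASS_G_PER_MOL = 237.73

def micromolar_to_ug_per_ml(conc_um, molar_mass=KETAMINE_MOLAR_MASS_G_PER_MOL):
    """Convert a concentration in micromolar to micrograms per milliliter.

    1 uM = 1 umol/L; multiplying by g/mol gives ug/L; dividing by 1000 gives ug/ml.
    """
    return conc_um * molar_mass / 1000.0

first_dose = micromolar_to_ug_per_ml(0.4)   # roughly 0.1 ug/ml
second_dose = micromolar_to_ug_per_ml(0.8)  # roughly 0.2 ug/ml
```

Under this assumption, the ∼0.8 μM target is consistent with the reported plasma concentration of about 0.2 μg/ml.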
Monitoring
Subjects were monitored during drug exposure according to the American Society of Anesthesiologists guidelines, including electrocardiogram, blood pressure, and oxygen saturation. We monitored arousal level using the modified OAA/S scale (Chernik et al., 1990). We also monitored subjects' experience of dissociation, using subjects' ratings on a dissociation scale ranging from 1 to 10. We asked subjects the question: “On a scale of 1-10, with 1 being completely normal and 10 being an out of body experience, how dissociated are you feeling right now?” The dissociation scores were similar for the predrug baseline (mean ± SD, 1.12 ± 0.35) and the low dose of ketamine (plasma concentration ∼0.2 μg/ml) used for the task (mean 2.65 ± 1.26); whereas dissociation scores were much higher (mean 8.58 ± 1.55) for a higher dose of ketamine (plasma concentration ∼0.4 μg/ml). After the end of the drug infusion, symptoms continued to be monitored. The “after recovery” testing occurred when nausea and nystagmus (resulting from the higher dosing) subsided sufficiently to perform the task. Subjects were generally able to perform the task within 2 h after stopping drug infusion.
EEG recording
We performed high-density EEG recordings using a 256 channel system (including NA 300 amplifier; Electrical Geodesics). After applying the EEG cap with conductive gel (ECI Electro-Gel), we adjusted electrodes so that the impedance of each electrode was within 0-50 kiloohms. We checked electrode impedance before the experiment started, and again before drug administration. Using Net Station, we sampled EEG signals at 250 Hz and, offline, bandpass filtered between 0.1 and 45 Hz.
EEG preprocessing
We combined predrug baseline data from both ketamine and DEX experiments (baseline RT data showed similar results for both drugs), but “under drug” analyses were performed separately for each drug. We performed offline preprocessing and analysis using EEGLAB (Delorme and Makeig, 2004). First, we extracted data epochs −1500 to 3000 ms relative to the onset of the sound and −3000 to 800 ms relative to the onset of the image, for each trial. We then visually inspected each epoch and excluded noisy trials (∼5% of the total trials). Next, we performed independent component analysis using built-in functions of EEGLAB (pop_runica.m) and removed noisy components through visual inspection. We excluded 3 DEX subjects from further analysis because of very noisy EEG data, which after cleaning left too few trials for analysis (conditions with <10 trials). Finally, we performed channel interpolation (EEGLAB function, eeg_interp.m, spherical interpolation) and rereferenced to the average reference.
Time-frequency decomposition
To investigate changes in EEG spectral content, we performed time-frequency decompositions of the preprocessed data in sliding windows of 550 ms using Morlet wavelets, whose frequency ranged from 5 to 45 Hz in 40 linearly spaced steps. Power for each time-frequency point is the absolute value of the resulting complex signal. We dB normalized power (dB power = 10*log10[power/baseline]) to the prestimulus baseline, starting 700 ms before sound onset (i.e., baseline calculated using sliding 550-ms-long wavelets, starting with the wavelet positioned from 700 to 150 ms before sound onset (centered on 425 ms before sound onset) and avoiding the sound-evoked response). For drift-diffusion model (DDM) analysis, we calculated power spectral density and performed divisive baseline correction for each trial.
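The two baseline corrections described above can be sketched as follows. This is a minimal illustration that assumes power has already been estimated and that the baseline has been reduced to a single mean value per frequency; the function names and the example values are hypothetical.

```python
import math

def db_normalize(power, baseline_power):
    """Decibel normalization: dB power = 10 * log10(power / baseline)."""
    return [10.0 * math.log10(p / baseline_power) for p in power]

def divisive_baseline(power, baseline_power):
    """Divisive baseline correction: power expressed as a ratio to baseline."""
    return [p / baseline_power for p in power]

# Hypothetical power values at three time points, with a baseline power of 10.
power = [1.0, 10.0, 100.0]
db_power = db_normalize(power, 10.0)        # tenfold changes map to steps of 10 dB
ratio_power = divisive_baseline(power, 10.0)
```

The dB form makes increases and decreases relative to baseline symmetric around zero, whereas the divisive form keeps power on a ratio scale.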
Electrode selection
We used a data-driven approach, orthogonal to the effect of interest, to select the electrodes of interest based on the task EEG data. In the first step, we averaged power across all electrodes (aligned to image onset) and all sounds/greeble names. This revealed increased alpha power (8-14 Hz) during the delay period compared with baseline (see Fig. 5B). In the second step, we selected the electrode clusters that showed significant change in alpha power during the −925 to −275 ms time window pre-image onset, compared with baseline. This time window ensured that our delay period did not overlap with the sound or image. After cluster-based multiple comparisons correction (6), four different clusters showed significant modulation in alpha power during the delay period [and we selected one electrode (and its four nearest neighbors) in each cluster that showed the greatest modulation, for consistency across clusters; see Fig. 6A]: (1) right frontal (RF electrodes 4, 214, 215, 223, 224); (2) left central (LC electrodes 65, 66, 71, 76, 77); (3) right central (RC electrodes 163, 164, 173, 181, 182); and (4) occipital (OC electrodes 117, 118, 119, 127, 129). Additionally, we found that only alpha power at the RF cluster significantly correlated with the predictive value of sounds.
Alpha power calculation
For each condition, HP, MP, and NP, we averaged alpha power over all five electrodes in a cluster, to calculate the average alpha power of each cluster. To best capture the delay period activity just before image onset, we calculated mean alpha power between −625 and −275 ms (as the wavelet window is centered around each time point, the power estimate before −625 ms and after −275 ms may contain auditory and visual stimulus-related responses, respectively) for each trial aligned to image onset. We also calculated mean delay period alpha power between 600 and 1225 ms for each trial aligned to sound onset. To link EEG power spectral density to behavior using our DDM analysis (see HDDM), we calculated single-trial baseline-corrected (divisive normalization) alpha band power, aligned to image onset.
Source space analysis
We used FieldTrip's beamforming technique to localize sources of the sensor level alpha activity (Oostenveld et al., 2011). This technique uses an adaptive spatial filter to estimate activity at a given location in the brain. We used a source model defined in MNI space for all subjects. Across all sound cues, we used the Dynamic Imaging of Coherent Sources (Gross et al., 2001) algorithm to beamform the delay period alpha activity (window 0-900 ms before image onset). We then calculated the average source power for each ROI of the AAL atlas (Tzourio-Mazoyer et al., 2002). We selected ROIs that showed significant change in source power during the delay period (before image onset) compared with baseline (p < 0.025, Bonferroni-corrected across all cortical AAL ROIs).
Granger causality
We performed nonparametric spectral Granger causality analyses at the source level. Because there were no significant frontal ROIs in the left hemisphere, we restricted our analyses to the right hemisphere only. We used a covariance window of 0-900 ms before image onset and the Linearly Constrained Minimum Variance algorithm (Van Veen et al., 1997) to generate virtual time series for significant frontal and temporal ROIs. We averaged across the significant frontal ROIs (superior frontal gyrus, medial; superior frontal gyrus, dorsolateral; middle frontal gyrus); and there was only one significant temporal lobe ROI (inferior temporal gyrus). We calculated nonparametric spectral Granger causality between the (average) frontal and temporal ROIs using the ft_connectivityanalysis function in FieldTrip for the stable window starting 900 ms before image onset (Wang et al., 2008; Seth et al., 2015). We found Granger causal influence from RF to temporal cortex correlated with the predictive value of sounds only in the β frequency (15-30 Hz) band.
Hierarchical DDM (HDDM)
We used a DDM (Ratcliff and McKoon, 2008) with two possible choices (correct/incorrect responses on match trials) in our PC task. According to this model, decision-making involves the accumulation of evidence (drift process) from a starting point to one of two (upper or lower) thresholds, representing the choices. The accumulation rate is known as the drift rate, v; and the starting point can be biased toward one of the choices (in our study, by the predictive value of the sound), reflected in a bias parameter, z. We used HDDM software (http://ski.clps.brown.edu/hddm_docs/) (Wiecki et al., 2013) for hierarchical Bayesian estimation of the parameters of the DDM. Particularly with fewer trials per condition, this method has been shown to provide more reliable estimates of parameters and is less susceptible to outliers (Ratcliff and McKoon, 2008) than more traditional approaches to DDMs (Vandekerckhove and Tuerlinckx, 2008; Voss and Voss, 2008).
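The role of the bias parameter can be illustrated with a plain forward simulation of the drift-diffusion process. This sketch is not the hierarchical Bayesian fit performed with HDDM; it is a simple Euler simulation, and the parameter values, function names, and the use of a unit noise scale are assumptions chosen for illustration.

```python
import random

def simulate_ddm(v, a, z, n_trials=500, dt=0.001, noise=1.0, max_t=5.0, seed=0):
    """Simulate drift-diffusion trials with Euler steps.

    v: drift rate; a: boundary separation; z: relative starting point in (0, 1).
    Returns (choice, rt) pairs; choice 1 = upper boundary, 0 = lower boundary.
    Trials that do not terminate within max_t seconds are dropped.
    """
    rng = random.Random(seed)
    step_sd = noise * dt ** 0.5
    results = []
    for _ in range(n_trials):
        x, t = z * a, 0.0
        while 0.0 < x < a and t < max_t:
            x += v * dt + rng.gauss(0.0, step_sd)
            t += dt
        if x >= a:
            results.append((1, t))
        elif x <= 0.0:
            results.append((0, t))
    return results

def p_upper(results):
    """Proportion of trials that ended at the upper boundary."""
    return sum(choice for choice, _ in results) / len(results)

def mean_rt_upper(results):
    """Mean decision time of trials that ended at the upper boundary."""
    rts = [rt for choice, rt in results if choice == 1]
    return sum(rts) / len(rts)

# Same drift rate and boundaries; only the starting point differs.
unbiased = simulate_ddm(v=1.0, a=1.5, z=0.5)
biased = simulate_ddm(v=1.0, a=1.5, z=0.7)
```

With the starting point shifted toward the upper threshold, upper-boundary responses become both more frequent and faster, which is the kind of behavioral signature the bias parameter z is meant to capture.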
To directly link the causal relationship between the sound and its paired greeble image to behavior and drift-diffusion parameters, we included the estimates of causal relationship and transitional probability as predictor variables of the bias, z, of the model. That is, we estimated posterior probability densities not only for basic model parameters, but also the degree to which these parameters are altered by variations in the psychophysical measures (ΔP, causal power, causal support, and transitional probability). In these regressions, the bias parameter is given by $z = \beta_0 + \beta_1 \cdot CP$, where CP is the trial-wise predictor (e.g., causal power), β0 is the intercept, and β1 is the regression coefficient.
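The effect of a starting-point bias on RT can be illustrated with a minimal simulation. The sketch below is not the HDDM fitting procedure; it simply simulates a two-boundary drift process and assumes, for illustration only, a linear mapping from a predictor CP to the bias. The function names and all parameter values (drift rate, boundary separation, beta0, beta1) are hypothetical.

```python
import random


def bias_from_predictor(cp, beta0=0.5, beta1=0.2):
    """Hypothetical linear mapping from a predictor (e.g., causal power) to bias z."""
    return beta0 + beta1 * cp


def simulate_ddm_trial(v, a, z, rng, dt=0.002, noise=1.0, max_t=5.0):
    """Simulate one trial. Boundaries are at 0 and a; the start point is z * a.

    Returns (hit_upper_boundary, decision_time_in_seconds).
    """
    x = z * a
    t = 0.0
    step_sd = noise * dt ** 0.5
    while 0.0 < x < a and t < max_t:
        x += v * dt + rng.gauss(0.0, step_sd)  # drift plus Gaussian noise
        t += dt
    return x >= a, t


def mean_correct_rt(z, v=1.0, a=2.0, n_trials=500, seed=0):
    """Mean decision time over trials that end at the upper (correct) boundary."""
    rng = random.Random(seed)
    times = []
    for _ in range(n_trials):
        hit_upper, t = simulate_ddm_trial(v, a, z, rng)
        if hit_upper:
            times.append(t)
    return sum(times) / len(times)


if __name__ == "__main__":
    for cp in (0.0, 1.0):
        z = bias_from_predictor(cp)
        print(f"CP={cp:.1f}  z={z:.2f}  mean correct RT={mean_correct_rt(z):.3f} s")
```

A start point closer to the correct boundary shortens the simulated decision time, which is the qualitative relationship the bias regression is meant to capture.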
We used Markov Chain Monte Carlo chains with 20,000 samples and 5000 burn-in samples for estimating the posterior distributions of the model parameters. We assessed chain convergence by visually inspecting the autocorrelation distribution, as well as by using the Gelman-Rubin statistic, which compares between-chain and within-chain variance. This statistic was near 1.0 for the parameters, indicating that our sampling was sufficient for proper convergence. We analyzed parameters of the best model (model with lowest DIC) using Bayesian hypothesis testing, where the percentage of samples drawn from the posterior fall within a certain region (e.g., >0). Posterior probabilities ≥95% were considered significant. This value is not equivalent to p values estimated by frequentist methods, but they can be coarsely interpreted in a similar manner.
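The two diagnostics described above can be sketched in a few lines. This is a minimal illustration using the classic potential scale reduction formula for equal-length chains and a simple sample-fraction estimate of a posterior probability; it is not the HDDM implementation, and the function names and synthetic chains are assumptions.

```python
import random
from statistics import mean, variance


def gelman_rubin(chains):
    """Potential scale reduction factor for equal-length chains of one parameter."""
    n = len(chains[0])
    within = mean(variance(c) for c in chains)        # mean within-chain variance
    between = n * variance([mean(c) for c in chains])  # between-chain variance
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5


def posterior_prob_positive(samples):
    """Fraction of posterior samples above zero, i.e., an estimate of P(beta > 0)."""
    return sum(s > 0 for s in samples) / len(samples)


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic stand-ins for post-burn-in MCMC samples of a regression coefficient.
    chains = [[rng.gauss(0.3, 0.1) for _ in range(2000)] for _ in range(3)]
    pooled_samples = [s for c in chains for s in c]
    print("R-hat:", round(gelman_rubin(chains), 3))
    print("P(beta > 0):", posterior_prob_positive(pooled_samples))
```

Values of the statistic close to 1.0 indicate that between-chain variance is small relative to within-chain variance.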
All the model comparisons were estimated on the psychophysics data as these had the greatest number of trials per condition. This ensures robust estimation of the best model. The best-fitting model was then used to analyze data from different conditions: predrug baseline, under DEX, under ketamine, and recovery from ketamine. To directly link EEG power spectral density to behavior and drift-diffusion parameters, we used the HDDM, but now included the RF cluster power estimate (aligned to image onset) in the alpha band as the predictor variable of the bias, z, of the model; that is, in the regression equation above, CP was now alpha power.
ERPs
We used ERPLAB (https://erpinfo.org/erplab) (Lopez-Calderon and Luck, 2014) to run ERP analysis. First, we cleaned epoched data aligned to the sound onset for auditory ERPs using the pop_artmwppth.m function of ERPLAB with a moving window of 200 ms (2%-3% of trials for each subject were excluded). We then averaged over trials to generate an average ERP for each subject. We used 200-0 ms before stimulus onset as baseline. Based on previous literature (Davies et al., 2010; Winkler et al., 2013), we chose channel 9 (Cz electrode) for exemplar auditory ERPs in figures.
EEG signal decoding
We used two different machine learning approaches in this study: (1) a support vector machine (SVM) model to classify HP, MP, and NP trials after sound onset, with EEG power spectral density as input; and (2) a recurrent neural network (RNN) to classify V1, V2, and V3 trials relative to image onset, with “raw” EEG time series data as input.
We used the power spectral density of EEG signals across four frequency bands, theta (5-7 Hz), α (8-14 Hz), β (15-30 Hz), and γ (31-40 Hz), for the first decoding analysis. We calculated the sum of squared absolute power in each frequency band for each electrode cluster (RF, LC, RC, and OC), thus generating 16 features for each trial as the input dataset. Using Scikit-learn (Pedregosa et al., 2011) implemented in Python, for each 20 ms time bin, we trained an SVM model to classify the EEG data into three classes: HP, MP, and NP. We denote
As we penalized the mapped weights of the classifier at each time bin, we used normalized absolute values of the weights as a measure to deduce each feature's contribution to classify the outputs.
Second, we used an RNN with many-to-many architecture to decode visual stimuli using the EEG time domain signals from the four clusters of electrodes (RF, RC, LC, OC) mentioned above (20 features in total) in successive 20 ms time bins. The proposed RNN consists of a bidirectional long short-term memory (BiLSTM) layer followed by an attention layer, a fully connected (FC) layer with dropout, and a softmax layer as output. Considering x(t) as the input at time t, the output of the BiLSTM forward path is calculated as follows:
Where
All BiLSTM, attention, and FC layer weights and biases are updated through backpropagation through time. Hyperparameters of the model, including the number of hidden units for the BiLSTM, attention, and FC layers, the dropout probability, and the learning rate of stochastic gradient descent, are optimized via a grid search. The decoding accuracy is calculated with 10-fold cross validation with 60% of trials as training set and 40% as test set, shuffled and stratified to avoid accuracy bias because of imbalanced classes.
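For reference, the forward pass of a standard LSTM cell is usually written as below. This is the textbook formulation with generic symbol names ($W$, $U$, $b$, $\sigma$ for the logistic function, $\odot$ for elementwise multiplication); these names are assumptions and may differ from the notation used for the network described above. In a bidirectional layer, the same recursion is also run backward in time and the two hidden states are concatenated.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) && \text{(forget gate)}\\
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) && \text{(input gate)}\\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) && \text{(output gate)}\\
\tilde{c}_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) && \text{(candidate cell state)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state)}\\
h_t &= o_t \odot \tanh\left(c_t\right) && \text{(hidden state)}
\end{aligned}
```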
To find out how well the model generalizes in time, we took the model trained on data up to time t and tested it on the data at all time points, using the stored parameters of the above RNN model. The results of this analysis are used to see whether information about the visual stimulus can be decoded before stimulus presentation.
To gain insight into how well the model decodes each visual stimulus during the delay period, we trained the model on data at 100 ms after visual stimulus onset and calculated the F score for each class (V1, V2, V3) under all conditions (before drug, under ketamine, and under DEX).
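A per-class F score can be computed directly from true and predicted labels. The sketch below assumes the F score is the usual F1 (harmonic mean of precision and recall); the function name and the toy labels are illustrative only.

```python
def per_class_f1(y_true, y_pred, labels):
    """Return {label: F1} computed one-vs-rest for each class label."""
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)  # true positives
        fp = sum(t != label and p == label for t, p in pairs)  # false positives
        fn = sum(t == label and p != label for t, p in pairs)  # false negatives
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


if __name__ == "__main__":
    y_true = ["V1", "V1", "V2", "V2", "V3", "V3"]
    y_pred = ["V1", "V1", "V2", "V1", "V3", "V2"]
    print(per_class_f1(y_true, y_pred, ["V1", "V2", "V3"]))
```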
Using the same model (trained on data 100 ms after visual stimulus onset), we looked at the proportion of each auditory stimulus (HP, MP, and NP) associated with the correctly decoded trials at 100 ms before visual stimulus onset.
Statistical analysis
We performed statistical analysis of trials from testing phases and used the learning phase only to confirm that the participants learned the correct associations. To only include the trials where the causal power of HP, MP, and NP trials had differentiated (see Fig. 1G), we excluded the first 50 trials of the testing phase in psychophysics experiments and the predrug baseline. We used all available trials for the testing phase of “under drug” and “after recovery,” as the causal power of HP, MP, and NP trials was already differentiated from the beginning (see Fig. 3A,B). For RT analysis, we excluded RTs more than 3 SDs above or below the mean for each subject, and we used correct match trials. For delay period EEG analyses, we used both correct match and nonmatch trials.
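The RT exclusion rule can be sketched as follows for a single subject. The use of the sample standard deviation is an assumption, as are the function name and the toy data.

```python
from statistics import mean, stdev


def exclude_rt_outliers(rts, n_sd=3.0):
    """Keep RTs within n_sd sample standard deviations of the subject's mean."""
    m = mean(rts)
    s = stdev(rts)
    return [rt for rt in rts if abs(rt - m) <= n_sd * s]


if __name__ == "__main__":
    # Twenty typical RTs (in seconds) plus one extreme value.
    rts = [0.45, 0.50, 0.55, 0.48, 0.52] * 4 + [5.0]
    kept = exclude_rt_outliers(rts)
    print(len(rts), "->", len(kept))
```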
Because we had clear, a priori predictions about the effects of the different conditions in this study (i.e., the effect of prediction and the effects of the drug conditions), we could apply contrast analysis using linear mixed effects (LMEs) models. Our study conformed to the guidelines set out by Abelson and Prentice (1997); that is, we included the contrast of interest along with a paired, orthogonal contrast and inferred significance only when our statistical tests showed that effects were significant for the contrasts of interest and not for the orthogonal contrasts.
EEG data were analyzed using contrast analysis with an LME where the predictive value of sound (HP, MP, NP; varying within subject) and drug condition (before ketamine, under ketamine, under DEX; varying within subject) served as independent variables. For prediction, we used a linear (−1, 0, 1) contrast as our contrast of interest, and a quadratic (−1, 2, −1) contrast as the orthogonal contrast in the analysis. For drug condition, we used a quadratic (−1, 2, −1) contrast as our contrast of interest, and a linear (−1, 0, 1) contrast as the orthogonal contrast in the analysis. Mathematically, our LME can be written as Equation 20 below:
Here, DV is the dependent variable (RF alpha power or average GC at beta frequency), PredictionC1 is the contrast of interest modeling sounds' predictive value, PredictionC2 is the orthogonal contrast for sounds' predictive value, DrugC1 is the contrast of interest modeling drug condition, and DrugC2 is the orthogonal contrast for drug condition. Terms after the summation notation in Equation 20 correspond to interactions between sounds' predictive value and drug condition (all possible interactions, i.e., permutations of contrasts, included).
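The contrast coding for the EEG model can be sketched as below. The code only builds the contrast-coded fixed-effect regressors (main effects and their products as interaction terms) and checks that the paired contrasts are orthogonal; it does not fit the LME, and the random-effects structure is not specified here. The function and variable names are illustrative assumptions.

```python
PREDICTION_LEVELS = ("HP", "MP", "NP")
DRUG_LEVELS = ("before ketamine", "under ketamine", "under DEX")

LINEAR = (-1, 0, 1)
QUADRATIC = (-1, 2, -1)


def dot(u, v):
    """Dot product; two contrasts are orthogonal when this is zero."""
    return sum(a * b for a, b in zip(u, v))


def design_row(prediction, drug):
    """Contrast-coded regressors for one trial's prediction and drug condition."""
    p = PREDICTION_LEVELS.index(prediction)
    d = DRUG_LEVELS.index(drug)
    row = {
        "PredictionC1": LINEAR[p],     # contrast of interest (linear)
        "PredictionC2": QUADRATIC[p],  # orthogonal contrast (quadratic)
        "DrugC1": QUADRATIC[d],        # contrast of interest (quadratic)
        "DrugC2": LINEAR[d],           # orthogonal contrast (linear)
    }
    # Interaction terms are products of a prediction and a drug contrast.
    for p_name in ("PredictionC1", "PredictionC2"):
        for d_name in ("DrugC1", "DrugC2"):
            row[f"{p_name}:{d_name}"] = row[p_name] * row[d_name]
    return row


if __name__ == "__main__":
    print("orthogonal:", dot(LINEAR, QUADRATIC) == 0)
    print(design_row("HP", "under ketamine"))
```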
We used a similar LME with prediction (HP, MP, NP; varying within subject) and drug condition (under ketamine, before drug, under DEX, after recovery; varying within subject) as independent variables to analyze RT data. We used a linear (HP, MP, NP: −1, 0, 1) contrast as our contrast of interest to model RT, and a quadratic (HP, MP, NP: −1, 2, −1) contrast as the orthogonal contrast in the analysis. For drug condition, we used a four-level (under ketamine, before drug, under DEX, after recovery: 3, −1, −1, −1) contrast as our contrast of interest, and two orthogonal contrasts (under ketamine, before drug, under DEX, after recovery: 0, 0, −1, −1 and 0, 2, −1, −1) in the analysis. Both main effects of sounds' predictive value and drug as well as their interaction were included. We used contrast analysis to model our hypotheses. Mathematically, our LME can be written as Equation 21 below.
Here, DV is the dependent variable (RT or accuracy), PredictionC1 is the contrast of interest modeling sounds' predictive value, PredictionC2 is the orthogonal contrast for sounds' predictive value, DrugC1 is the contrast of interest modeling drug condition, and DrugC2 and DrugC3 are the orthogonal contrasts for drug condition. Terms after the summation notation of Equation 21 correspond to interactions between sounds' predictive value and drug condition (all permutations of two-way interactions included).
To investigate auditory ERPs, we ran nonparametric permutation tests using the ft_timelockstatistics.m function from FieldTrip (Oostenveld et al., 2011). For before and during ketamine conditions, we looked for differences in ERP between sound cues across 50 to 300 ms latency from sound onset. We randomly shuffled condition labels 10,000 times. Alpha value was set to 0.05 for all comparisons.
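The logic of a label-shuffling test can be illustrated with a much simpler case than the cluster-based procedure used here. The sketch below is not what ft_timelockstatistics.m does; it is a plain sign-flip permutation test on paired, per-subject amplitudes at a single electrode and latency, with hypothetical data and function names.

```python
import random


def paired_permutation_p(cond_a, cond_b, n_perm=10000, seed=0):
    """Two-sided p value for a paired mean difference via random sign flips."""
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    observed = abs(sum(diffs) / len(diffs))
    count = 0
    for _ in range(n_perm):
        # Swapping condition labels within a subject flips the sign of the difference.
        perm = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(sum(perm) / len(perm)) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)


if __name__ == "__main__":
    hp = [1.0, 1.2, 0.9, 1.1, 1.3, 1.0, 1.2, 1.1]
    np_cue = [1.1, 1.0, 1.0, 0.9, 1.2, 1.2, 1.1, 1.2]
    print("p =", paired_permutation_p(hp, np_cue))
```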
We used repeated-measures ANOVA (Holm-Sidak–corrected p values) to test the significance of F scores for the three differentially predictive conditions (HP, MP, NP) as well as the significance of each feature's contribution in output classification.
To test the significance of the cross-temporal decoding results, we performed a paired t test at each pixel between the accuracy of the trained model (20-fold cross-validated and resampled 100 times) and the accuracy of a model with randomly permuted output labels at the same pixel. The resulting p values were then corrected for multiple comparisons (Holm-Sidak correction).
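The Holm-Sidak step-down adjustment can be sketched as follows. This is a generic implementation of the standard procedure, not the code used in the study; the function name is an assumption.

```python
def holm_sidak(p_values):
    """Return Holm-Sidak adjusted p values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Sidak adjustment for the (m - rank) hypotheses still under test.
        adj = 1.0 - (1.0 - p_values[idx]) ** (m - rank)
        running_max = max(running_max, adj)  # enforce monotonicity
        adjusted[idx] = min(running_max, 1.0)
    return adjusted


if __name__ == "__main__":
    print(holm_sidak([0.01, 0.04, 0.03]))
```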
Results
Predictions improved RTs
Subjects initially learned paired associations (A1-V1, A2-V2, A3-V3) between three sounds (A1, A2, A3) and three images (V1, V2, V3) through trial and error. During learning, each sound and image had equal probability (33%) of appearing in any given trial, preventing subjects from developing any differential predictions because of stimulus frequency. Thus, the presence of any given sound did not predict the occurrence of any future image during this learning phase. Following the presentation of both stimuli, subjects reported whether the sound and image were indeed paired (i.e., whether or not they matched). To manipulate subjects' predictions during subsequent testing, we varied the probability of an image appearing after its associated sound. This probability was different for each sound: 85% chance of V1 after A1; 50% chance of V2 after A2; and 33% chance of V3 after A3 (Fig. 1A). Thus, A1 was HP, A2 was MP, and A3 was NP.
We hypothesized that increasing the predictive value of the sound would allow subjects to better predict the upcoming image, enabling quicker responses (HP < MP < NP) in match trials. However, if predictions are mediated by NMDARs, then ketamine should disrupt predictions; that is, subjects administered a subanesthetic dose of ketamine should be unable to exploit the differential predictive value of each sound, thus preventing faster RTs. If these effects are specific to NMDAR manipulation, then the control drug DEX should still allow faster RTs to predicted stimuli. To investigate these hypotheses (and to restrict multiple tests on the same dataset), we ran an LME model. We used the sounds' predictive value (HP, MP, NP) and drug condition (before drug, under ketamine, under DEX, after recovery from drug) as independent variables and RT as our dependent variable (RT ∼ sounds' predictive value + drug condition + sounds' predictive value × drug condition). We applied a contrast analysis strategy to model our independent variables. Our study conformed to the guidelines set out by Abelson and Prentice (1997) with regard to contrast analysis (i.e., we included the contrast of interest along with paired, orthogonal contrasts). (Contrasts of interest only explain a part of the total variation between groups. We included orthogonal contrasts to explain the residual variance. According to Abelson and Prentice (1997), an analysis of the residual variance [i.e., orthogonal contrasts] is important since one may miss systematic patterns in the data if one only tests the contrast of interest. They suggested that finding a significant contrast of interest and a nonsignificant orthogonal contrast confirms the data support the hypothesis.)
To test our hypothesis that a greater predictive value of sounds allows subjects to respond faster, we used a linear contrast (HP, MP, NP: −1, 0, 1) as our contrast of interest, and a quadratic contrast (HP, MP, NP: −1, 2, −1) as the orthogonal contrast in the analysis. Further, to test our hypothesis that only ketamine prevents these faster responses, we used a four-level contrast (under ketamine, before drug, under DEX, after recovery: 3, −1, −1, −1) as our contrast of interest, and two orthogonal contrasts (under ketamine, before drug, under DEX, after recovery: 0, 0, −1, −1 and 0, 2, −1, −1) in the analysis. A significant main effect of prediction will confirm predictive sounds produce faster responses. A significant interaction effect of prediction and drug condition will confirm that ketamine disrupts subjects' ability to exploit predictive sounds to respond faster.
We found a significant main effect of sounds' predictive value in our LME model (ANOVA, F(1,21.07) = 14.14, p = 0.001; orthogonal contrasts nonsignificant). RTs were faster when sounds had greater predictive value (Fig. 1B). This result was further validated in parallel psychophysics experiments, where we controlled for possible match bias (Lupyan and Thompson-Schill, 2012) using randomly interleaved “inversion trials” (in which subjects simply indicated whether greebles were inverted) to minimize the expectation of match trials (ANOVA, F(1,21.92) = 18.71, p = 0.0001, orthogonal contrasts nonsignificant, Fig. 2A,G,H). Furthermore, we found effects on RTs could not be explained by speed-accuracy trade-offs, as subjects were most accurate for HP, followed by MP and NP sounds (ANOVA, F(1,21.66) = 14.42, p = 0.001, orthogonal contrasts nonsignificant, Fig. 2B).
Ketamine blocked fast RTs to predictive sounds
Importantly, we found a significant interaction of sounds' predictive value and drug condition (ANOVA, F(1,16.42) = 5.51, p = 0.03). The interaction effect confirmed that, under ketamine, the linear correlation between the predictive value of sounds and RT was diminished (Fig. 1C). This effect was specific to NMDAR manipulation, as DEX did not disrupt the ability of subjects to exploit the differential predictive value of the sounds. Under DEX, the linear correlation between the predictive value of sounds and RT was intact (Fig. 1E), similar to the predrug baseline condition. These pharmacological effects were not because of low accuracy as subjects' average accuracy was similar across all three conditions (77.8% under ketamine, 85.7% without ketamine, and 81.0% under DEX; ANOVA, F(1,23.932) = 1.36, p = 0.22). Neither were effects because of the level of sedation as subjects were more alert under ketamine than DEX (under ketamine, average modified OAA/S score of 4.85 compared with 3.33 under DEX [5, awake, 1, unresponsive]; unpaired t test, p = 0.003). Significant interaction effects in our LME model also confirmed that the linear correlation of RTs with predictive strength returns after recovery from ketamine (2-4 h after ending ketamine administration, depending on subject's recovery; Fig. 1D). Overall, our results demonstrate that subjects used predictive information to enhance behavioral performance and the NMDAR-blocker ketamine prevented this behavioral advantage.
Subjects based predictions on inferred “causal” relationships between stimuli
We next investigated what information in the trial history subjects base predictions on; in other words, how do subjects learn the predictive relationship between stimuli? We tested two possibilities: (1) did subjects generate predictions by keeping track of simple co-occurrences of each sound-image pair, that is, were they basing predictions on correlations? or (2) did subjects not only track the occurrence of each image with its paired sound but also unpaired sounds, that is, were they basing predictions on the learnt structure of all stimulus relationships? For option 1 above, we calculated the transitional probability (i.e., how often a particular image follows the sound only). For option 2, we calculated the causal power (Griffiths and Tenenbaum, 2005) of the sound-image association, which is the amount of evidence that a sound “causes” a particular image, as opposed to a random different “cause.” Here, “causal” is used in a statistical sense, capturing the relationship between the initial sound and the image that closely follows in time (Table 1 and methods show calculation details; we also calculated two other measures of “causation,” ΔP and causal support, and observed similar results). We updated the transitional probability and causal power estimates each trial, to account for the additional information available. The transitional probability/causal power value was the same for each stimulus at the start of the predrug baseline, but these values eventually systematically differed between stimuli as more trials were performed, reflecting the accumulating information from the trial history (Figs. 1F, 2E,F show the causal power differentiating all stimuli earlier than transitional probability).
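The difference between the two candidate measures can be made concrete with running counts from the trial history. The sketch below assumes the standard contingency definitions discussed by Griffiths and Tenenbaum (2005): ΔP = P(e|c) − P(e|¬c) and generative causal power = ΔP / (1 − P(e|¬c)), where c is the sound and e is the image. The exact calculation used in the study is given in its Table 1 and may differ; the function name and the counts are hypothetical.

```python
def contingency_measures(n_image_after_sound, n_sound,
                         n_image_after_other, n_other):
    """Return (transitional probability, delta P, causal power) from trial counts.

    n_image_after_sound: trials where the image followed its paired sound
    n_sound:             trials on which that sound was presented
    n_image_after_other: trials where the image followed a different sound
    n_other:             trials on which a different sound was presented
    """
    p_e_given_c = n_image_after_sound / n_sound      # transitional probability
    p_e_given_not_c = n_image_after_other / n_other  # image without the paired sound
    delta_p = p_e_given_c - p_e_given_not_c
    if p_e_given_not_c < 1.0:
        causal_power = delta_p / (1.0 - p_e_given_not_c)
    else:
        causal_power = float("nan")  # undefined when the image always occurs anyway
    return p_e_given_c, delta_p, causal_power


if __name__ == "__main__":
    # Hypothetical counts accumulated up to the current trial.
    print(contingency_measures(85, 100, 15, 200))
```

Unlike the transitional probability, ΔP and causal power depend on how often the image appears after the unpaired sounds, which is the distinction between options 1 and 2 above.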
We next determined whether transitional probability and/or causal power can account for the behavioral results (RTs). To this end, we modeled subjects' decision-making process using a DDM. Evidence accumulates (drift process) from a starting point to one of two boundaries. Here, the boundaries represent the two possible outcomes for match trials only (correct and incorrect). The drift process stops when it reaches a boundary, indicating the choice, and the time taken to reach the boundary represents the RT for the trial (Fig. 1G). The starting point of the drift process, here modeled from image onset, may be biased toward one of the boundaries, and it is determined by a bias parameter, z. This parameter represents the predictive value of the sound, which can be based on the transitional probability or causal power. A drift process that starts with a larger bias will reach the decision boundary quicker, resulting in a faster RT (i.e., more predictive sounds generate a larger bias and faster RT). Thus, whichever of transitional probability or causal power (through the bias parameter, z) yields better correspondence with subjects' RTs will be the better indicator of the information subjects used to generate predictions.
To test this, we used hierarchical Bayesian parameter estimation (HDDM) (Wiecki et al., 2013), which calculates the posterior probability density of the diffusion parameters generating the RTs for the entire group of subjects simultaneously, while allowing for individual differences. We estimated the regression coefficients to determine the relationship between trial-to-trial transitional probability/causal power and biases estimated from the posterior predictive distribution. In other words, we calculated the bias for each trial that best predicted the RT. But, for each trial, the bias was constrained to depend on the transitional probability or causal power (equation, Fig. 1G). Hence, for each trial, we calculated the relationship (regression coefficient β1) between the bias and transitional probability/causal power that best predicted RT. Specifically, we estimated the posterior probability density of the regression coefficient (β1; Fig. 1G) to determine the relationship between the bias and either transitional probability or causal power. We found causal power (DIC = −3197) predicted RTs better than transitional probabilities (DIC =−1795) (i.e., causal power better captured the basis of prediction generation; option 2 above). Further, bias was positively correlated with causal power (P{β1>0} = 0.04; Fig. 1H). This suggests that subjects based predictions on trial-by-trial updates of inferred “causal” relationships between sounds and images, rather than just correlations.
The HDDM also provides a framework to model drug effects. Thus, we repeated the above analysis of subjects' behavior under ketamine and under DEX. If ketamine prevents predictive information from conferring a behavioral advantage, all sounds will generate similar biases (i.e., there will be no correlation between the bias and the predictive value of sounds), so β1 will be zero. Indeed, under ketamine, β1 was not different from zero (P{β1>0} = 0.76; Fig. 1H). In contrast, under DEX, bias positively correlated with the predictive values of sounds (all sounds generating equal vs different biases, P{β1>0} < 0.00001; Fig. 1H) similar to baseline. After recovery from ketamine, once again β1 was >0 (P{β1>0} < 0.00001; Fig. 1H), confirming that subjects had again generated larger bias for more predictive sounds. The question is: (1) did subjects regain access to previously learned and stored predictive information (Fig. 3B) or (2) did they relearn the predictive value of each sound after recovery from ketamine (Fig. 3C)? To answer this, we used the HDDM to analyze the first 30 trials for each subject after recovery (translating to ∼10 trials for each sound cue, for every subject). We found that, only for the former (option 1), bias positively correlated with causal power (P{β1>0} = 0.03; Fig. 3D). This suggests that ketamine did not produce a loss of previously learned predictive information, but rather ketamine prevented access to the predictive information.
Strength of predictive information correlated with frontal alpha power
We next investigated the circuit-level mechanism of prediction. To show task-responsive EEG electrodes, we averaged the time-frequency response across all trials and all electrodes (this selection procedure does not bias toward particular predictions). This revealed a task-related increase in baseline-corrected alpha power (8-14 Hz; see Fig. 5B) before drug administration. Four clusters of electrodes (RF, RC, LC, and OC) showed significant modulation of delay period alpha power compared with baseline, regardless of sound (see Fig. 6A). Considering alpha power as an index of neural excitability (reduced α indicating reduced inhibition/increased excitability) (Jensen and Mazaheri, 2010; Lange et al., 2013), one might expect stronger predictions to be associated with lower alpha power, reflecting greater activation of prediction-encoding neurons. This was the case for the RF cluster. There was greater reduction of alpha power across the delay period after more predictive sounds (Fig. 4A–C). We hypothesized that this differential α modulation should characterize the predrug baseline and DEX conditions, but not ketamine as it prevented predictive information from conferring a behavioral advantage. To test this hypothesis, similar to the RT analysis (except that EEG was not recorded after recovery from drug), we ran an LME model. We regressed delay period alpha power at the RF electrode cluster on sounds' predictive value (HP, MP, NP) and drug condition (before drug, under ketamine, under DEX) [RF alpha power ∼ sounds' predictive value + drug condition + sounds' predictive value × drug condition]. To test the effect of sounds' predictive value, we used a linear contrast (HP, MP, NP: −1, 0, 1) as our contrast of interest, and a quadratic contrast (HP, MP, NP: −1, 2, −1) as the orthogonal contrast in the analysis.
To test our hypothesis that the RF alpha power modulation before drug will change under ketamine but not under DEX, we used a quadratic contrast (before drug, under ketamine, under DEX: −1, 2, −1) as our contrast of interest, and a linear contrast (before drug, under ketamine, under DEX: −1, 0, 1) as the orthogonal contrast in the analysis (see below). The significant main effect of sounds' predictive value confirmed stronger predictions correlated with lower delay period alpha power at the RF electrode cluster (ANOVA, F(1,26.24) = 4.82, p = 0.0003, orthogonal contrast nonsignificant; Figs. 4A–D and 5D–F). Hence, frontal α reflected predictions.
To rule out the possibility that our RF spectral results are because of an impact on lower-level feedforward sensory processing, we also investigated whether there were differences in the auditory ERPs of HP, MP, and NP sounds. We tested for an effect over the entire time course between 50 and 300 ms after sound onset. Cluster-based permutation tests (Oostenveld et al., 2011) revealed no significant difference between HP, MP, and NP sounds in any electrodes at any latency. This confirmed that the correlation between the strength of predictions and frontal alpha power was not because of feedforward sensory processing, as all three sounds generated similar auditory ERPs (Fig. 5A).
Ketamine disrupted the correlation between predictions and frontal alpha power
NMDAR blockade has been shown to increase frontal cortical excitability (Homayoun and Moghaddam, 2007, 2008; Murray et al., 2014; Rosch et al., 2019), reducing response selectivity and signal-to-noise ratio (SNR) (Skoblenick and Everling, 2012; Ma et al., 2015). We thus expect low-dose ketamine to increase frontal cortical excitability regardless of the predictive value of sounds. This would manifest as similarly low RF alpha power for all sounds. Using our LME model, we found a significant interaction of prediction and drug condition (ANOVA, F(1,19.04) = 2.81, p = 0.0007). The interaction effect confirmed that, under ketamine, delay period alpha power at the RF electrode cluster was similar across sounds (Fig. 4E), whereas, for DEX, RF alpha power still showed a linear trend (Fig. 4F) similar to the predrug baseline. This was not because of a more general drug-related change in alpha power, as power estimates were corrected based on the power before sound onset. Moreover, this was not because of a change in feedforward sensory processing under ketamine as cluster-based permutation tests (Oostenveld et al., 2011) revealed no significant difference between HP, MP, and NP sounds at any electrodes at any latency (i.e., all three sounds still generated similar auditory ERPs to that before ketamine). This suggests that ketamine affects mechanisms representing predictive information, and not basic sensorimotor mechanisms. That is, NMDARs mediate prediction strength through modulation of frontal excitability (reflected in alpha power).
Decodability of predictions reduced under ketamine
Increased frontal excitability does not necessarily translate to useful prediction if it reduces SNR (e.g., because of a more general change in excitability). We propose that increased frontal excitability (indicated by reduced alpha power) could facilitate better prediction, but this would not be the case if increased excitability reduced the SNR. To test this, we trained a decoder to measure whether this hyperexcitability, reflected in low RF alpha power, still allows differential representation of predictions. Greater classification accuracy of each sound's predictive value during the delay period would be consistent with a higher SNR of the neural representation of the prediction. Before ketamine, the classification F score for each sound separated at 560 ms after sound onset (ANOVA, p = 0.0018) and remained separate across the delay period until image presentation (Fig. 6B). Similarly, classification F score for DEX separated at 620 ms after sound onset (ANOVA, p = 0.0067) and remained separate (Fig. 6D). This shows that the more predictive the sound, the better the classification (HP > MP > NP). In contrast, under ketamine, there was no separation of classification F score for each sound, and classification accuracy overall was lower (ANOVA, p = 0.48; Fig. 6C). This is consistent with ketamine disrupting the α indexing of predictive value. The weighting of features contributing to classifier performance confirmed that, before ketamine, RF alpha power contributed most to classification accuracy (RF alpha power feature (WRFα) > other features (W∼RFα), ANOVA, p = 0.008; Fig. 6F). This was also true under DEX (WRFα > W∼RFα, ANOVA, p = 0.013; Fig. 6H). Under ketamine, RF alpha power contributed little to classification accuracy (WRFα > W∼RFα, ANOVA, p = 0.51; Fig. 6G). These results suggest that ketamine disrupts the expression of predictive value in the power of RF alpha activity, which may be because of decreased SNR.
Frontal alpha power correlated with RTs on trial-by-trial basis, but ketamine perturbed the correlation
Although frontal activity correlates with the predictive value of sounds, we need to show that subjects use it (i.e., that RF alpha power is linked to behavior). We again used the HDDM, but now to test whether RF alpha power predicts RT on a trial-by-trial basis. As before, the HDDM included two boundaries representing the possible choices in match trials (i.e., correct/incorrect). The starting point of the drift process, modeled from image onset, is determined by the RF alpha power in the delay period for each trial, which may be biased toward one of the boundaries, and is reflected in the bias parameter, z. We calculated the posterior probability density of the regression coefficient, β1α, which determines the relationship between the bias (z) and RF alpha power. Bias was inversely correlated with RF alpha power (P{β1α < 0} = 0.04, Fig. 6E). This suggests that, for more predictive sounds, lower RF alpha power creates a larger bias, and as a result, decisions are reached quicker (quicker RT). In contrast, ketamine blocked the correlation between bias and alpha power (regression coefficient, β1α, did not differ from zero, P{β1α < 0} = 0.51; Fig. 6E). This suggests that subjects used frontal activity to make predictions.
Predictive sounds activated sensory representations before visual stimuli, but ketamine perturbed these prestimulus activations
Prior work suggests that predictions can activate sensory templates before visual stimuli onset (Kok et al., 2017; Blom et al., 2020). To test which neural circuits and receptors contribute to this, we used a time-generalized deep RNN to classify visual stimuli (V1, V2, V3) at successive time bins (20 ms) across a trial. Visual representations are often activated as early as 50 ms after stimulus onset (Blom et al., 2020). To capture this early representation in our analysis, we trained our classifier with EEG time series data (not power) from the RF, RC, LC, and OC electrode clusters, as spectral data corresponding to the visual stimulus presentation may be contaminated with prestimulus or motor activity because of the wavelet window size. Figure 7A–C shows the cross-temporal decoding accuracy at different trained (y axis) and tested (x axis) time bins across the delay period and after visual stimulus onset. The white contour shows decoding accuracy significantly above chance (33%). We found significant classification accuracy above chance during the delay period well before visual stimulus onset for the “before drug” (Fig. 7A) and DEX (Fig. 7C) conditions. Crucially, when we trained the classifier on visual stimulus-evoked activity (e.g., at 100 ms after stimulus onset, around the peak visual response) and tested in the delay period, as highlighted with the black dashed rectangle, there was significant decoding earlier in the delay period for the “before drug” and DEX conditions (predrug baseline: starting at 492 ms before stimulus onset; t(20,1000) = 6.4, p = 0.008; Fig. 7A; and DEX: starting at 392 ms before stimulus onset; t(20,1000) = 4.3, p = 0.011; Fig. 7C), compared with ketamine (starting at 116 ms before stimulus onset; t(20,1000) = 2.23, p = 0.03; Fig. 7B). This confirmed activation of visual stimulus representations before stimulus onset; and that ketamine perturbed such prestimulus activations.
Next, we probed which factors drive such previsual stimulus activations. We hypothesized that predictive sounds activate these early visual representations. That is, when subjects hear a predictive sound cue, they pre-activate a representation of its paired image during the delay period. Moreover, we posited that the strength of pre-activation will depend on the predictive value of the sound cue. Hence, the HP sound will pre-activate V1's representation most strongly, followed by the MP sound moderately pre-activating V2's representation, and the NP sound leading to little/no pre-activation of V3's representation. To test our hypothesis, we measured how well the decoder classifies each of V1, V2, and V3, expecting better classification performance for V1 > V2 > V3, after training on visual-evoked activity (100 ms after stimulus onset reported here) and testing on the entire time across a trial. This relates to the row corresponding to the training time 100 ms after visual stimulus onset in Figure 7A–C (black dashed rectangle), but now we further calculated the F score for V1, V2, and V3 individually. During the delay period, we found the highest classification F score for V1 followed by V2 and then V3. For the “before drug” baseline, the classification F score for V1, V2, and V3 separated at 200 ms before visual stimulus onset (ANOVA, F(20) = 5.3, p = 0.003) and remained separate thereafter, as shown by the blue horizontal line in Figure 7D. Similarly, the classification F score for DEX separated 84 ms before visual stimulus onset (ANOVA, F(20) = 3.2, p = 0.011) and remained so (Fig. 7F). In contrast, for the classification F score under ketamine, there was no separation during the delay period, and the F score curves only separated 12 ms after visual stimulus onset (ANOVA, F(20) = 2.4, p = 0.024; Fig. 7E).
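As a minimal sketch of the per-image scoring described above, the following computes an F score separately for each class from true and decoded labels. It assumes the F score is the standard F1 (harmonic mean of precision and recall), which the text does not state explicitly; the labels are made-up examples, not study data.

```python
def per_class_f_scores(true_labels, decoded_labels, classes):
    """F1 score for each class, treating that class as 'positive' in turn."""
    scores = {}
    for c in classes:
        pairs = list(zip(true_labels, decoded_labels))
        tp = sum(t == c and d == c for t, d in pairs)  # correctly decoded as c
        fp = sum(t != c and d == c for t, d in pairs)  # wrongly decoded as c
        fn = sum(t == c and d != c for t, d in pairs)  # c decoded as something else
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores[c] = 2 * precision * recall / (precision + recall)
        else:
            scores[c] = 0.0
    return scores

# Hypothetical labels for nine trials at a single tested time bin.
true_labels    = ["V1", "V1", "V1", "V2", "V2", "V2", "V3", "V3", "V3"]
decoded_labels = ["V1", "V1", "V1", "V2", "V2", "V3", "V1", "V2", "V3"]

scores = per_class_f_scores(true_labels, decoded_labels, ["V1", "V2", "V3"])
print(scores)
```

Repeating this at every tested time bin would give one F score curve per image, which is the quantity compared across V1, V2, and V3 in Figure 7D–F.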
To further confirm that such differential pre-activation of visual representations is driven by the predictive value of their paired sound cues, we identified which sound cue (HP, MP, or NP) preceded the correctly classified V1, V2, or V3 stimulus, when trained at 100 ms after visual stimulus onset and tested at 100 ms before visual stimulus onset (Fig. 7D–F, vertical black dashed line, testing time). One might expect predominantly HP sounds to precede pre-activation of their predicted V1 image representation, predominantly MP sounds to precede pre-activation of their predicted V2, and little bias for V3. For both “before drug” and DEX conditions, we indeed found that: for trials classified as V1, the proportion of HP sound cues was significantly higher than that for MP or NP (ANOVA, Predrug: F(20) = 8.8, p = 0.0001, Fig. 7G; DEX: F(20) = 4.3, p = 0.001, Fig. 7I); for trials classified as V2, the proportion of MP sound cues was significantly higher than that for HP or NP (ANOVA, Predrug: F(20) = 6.1, p = 0.0007, Fig. 7G; DEX: F(20) = 3.2, p = 0.004, Fig. 7I); but for trials classified as V3, there was no significant difference in the proportion of HP, MP, and NP sound cues (ANOVA, Predrug: F(20) = 0.80, p = 0.07, Fig. 7G; DEX: F(20) = 0.66, p = 0.1, Fig. 7I). Conversely, under ketamine, there was no significant difference in the proportion of HP, MP, and NP sound cues for any image (ANOVA, F(20) = 0.33, p = 0.31; Fig. 7H).
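The tally behind this analysis can be sketched as follows: among trials in which an image was correctly decoded, count which sound cue came first. The trial tuples and their values are hypothetical; the function name and data layout are assumptions for illustration only.

```python
from collections import Counter

CUES = ("HP", "MP", "NP")

def cue_proportions(trials, image):
    """Proportion of each cue among trials where `image` was shown and decoded correctly.

    Each trial is a (cue, true_image, decoded_image) tuple.
    """
    cues = [cue for cue, true_img, decoded in trials
            if true_img == image and decoded == image]
    counts = Counter(cues)
    total = len(cues)
    if total == 0:
        return {cue: 0.0 for cue in CUES}
    return {cue: counts[cue] / total for cue in CUES}

# Hypothetical trials: decoding tested 100 ms before image onset.
trials = [
    ("HP", "V1", "V1"), ("HP", "V1", "V1"), ("HP", "V1", "V1"),
    ("MP", "V1", "V1"),
    ("NP", "V1", "V2"),                      # misclassified, so excluded
    ("MP", "V2", "V2"), ("MP", "V2", "V2"),
    ("HP", "V2", "V2"),
]

v1_props = cue_proportions(trials, "V1")
v2_props = cue_proportions(trials, "V2")
print(v1_props, v2_props)
```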
Previous fMRI (Gauthier et al., 1999) and EEG (Rossion et al., 2004; Moulson et al., 2011; Krusemark and Li, 2013) studies found visual-evoked responses to greebles in higher-order sensory cortex of the right hemisphere. One might expect pre-activation of predicted greeble representations in the same region. Accordingly, the RC electrode cluster contributed most to the classification accuracy for both the “before drug” (ANOVA, F(20) = 4.3, p = 0.002; Fig. 7J) and DEX (ANOVA, F(20) = 3.8, p = 0.01; Fig. 7L) conditions, when we trained on data 100 ms after visual stimulus onset and tested on the entire time across a trial. Crucially, the RC cluster significantly contributed to classification accuracy during the delay period for both “before drug” and DEX (476 ms before visual stimulus onset for “before drug” and 344 ms before stimulus onset for DEX). In contrast, under ketamine, the RC cluster did not significantly contribute to the classification accuracy until much later (52 ms before visual stimulus onset; ANOVA, F(20) = 2.9, p = 0.011; Fig. 7K). Together, this suggests that NMDARs contribute to the prestimulus sensory templates activated by predictions.
Strength of predictions correlated with feedback operating at β frequencies
To investigate how predictions generated frontally influence prestimulus sensory templates, we measured functional connectivity between these frontal and sensory sites. Specifically, we calculated nonparametric spectral Granger causality in both EEG sensor and source spaces. For both spaces, we found frontal influences on sensory cortex dependent on the predictive value of sounds. We focus on source space results here (because of potential mixing of source signals in sensor space complicating connectivity analyses). Since frontal alpha power reflected the predictive value of sounds, we first calculated sources of sensor-level, task-related increases in baseline-corrected alpha power. ROIs in Figure 8A (inset) showed significantly increased source power during the delay period compared with baseline (p < 0.025, Bonferroni-corrected across all cortical AAL ROIs, to ensure robust task-related sources). We found frontal cortical contributions to increased sensor-level alpha power in the right hemisphere only, as well as temporal cortical contributions consistent with previous greeble studies (Gauthier et al., 1999). Consequently, we restricted our source-level Granger causality analysis to these sources within the right hemisphere. Using nonparametric spectral Granger causality, we measured the source-level feedback from the superior and medial frontal gyrus (Fig. 8A, green) to inferior temporal gyrus (Fig. 8A, red). Theoretically, the audiovisual task should anatomically isolate predictions transmitted along feedback pathways to posterior visual areas during the delay period from the feedforward auditory signals. One might expect higher frontal excitability (from stronger predictions or effect of ketamine) to give rise to stronger feedback. To test this, we ran the same LME model as for our previous EEG analyses, but here we regressed source-level beta band (15–30 Hz) feedback from the superior and medial frontal gyrus (Fig. 8A, green) to inferior temporal gyrus on sounds' predictive value (HP, MP, NP) and drug condition (before drug, under ketamine, under DEX) [β feedback ∼ sounds' predictive value + drug condition + sounds' predictive value × drug condition]. To test the effect of sounds' predictive value, we used a linear contrast (HP, MP, NP: −1, 0, 1) as our contrast of interest, and a quadratic contrast (HP, MP, NP: −1, 2, −1) as the orthogonal contrast in the analysis. To test our hypothesis that the β feedback effect before drug will be perturbed under ketamine but not under DEX, we used a quadratic contrast (before drug, under ketamine, under DEX: −1, 2, −1) as our contrast of interest, and a linear contrast (before drug, under ketamine, under DEX: −1, 0, 1) as the orthogonal contrast in the analysis. A significant main effect of sounds' predictive value (ANOVA, F(1,30.46) = 11.47, p = 0.0001, orthogonal contrast nonsignificant) showed stronger predictions were associated with greater Granger causal influence of RF cortex on right inferior temporal cortex in the beta band during the delay period (Fig. 8A,B). Together, this suggests that predictions are disseminated along feedback connections down the cortical hierarchy before image onset.
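The contrast coding described above can be checked with a short sketch. It encodes the stated weights, verifies that each linear/quadratic pair is orthogonal and sums to zero, and builds the fixed-effect regressors for one observation, including the interaction term. Fitting the LME itself (with its random effects) is not shown; the dictionary keys and the `design_row` helper are hypothetical names introduced here.

```python
# Contrast weights as stated in the text.
SOUND_LINEAR    = {"HP": -1, "MP": 0, "NP": 1}    # contrast of interest for sounds
SOUND_QUADRATIC = {"HP": -1, "MP": 2, "NP": -1}   # orthogonal contrast for sounds
DRUG_QUADRATIC  = {"before": -1, "ketamine": 2, "DEX": -1}  # contrast of interest for drug
DRUG_LINEAR     = {"before": -1, "ketamine": 0, "DEX": 1}   # orthogonal contrast for drug

def dot(a, b):
    """Dot product of two contrasts defined over the same levels."""
    return sum(a[level] * b[level] for level in a)

def design_row(sound, drug):
    """Fixed-effect regressors of interest for one observation (hypothetical helper)."""
    s = SOUND_LINEAR[sound]
    d = DRUG_QUADRATIC[drug]
    return {"intercept": 1, "sound_linear": s, "drug_quadratic": d, "interaction": s * d}

# Each pair is orthogonal (dot product 0) and each contrast sums to zero.
print(dot(SOUND_LINEAR, SOUND_QUADRATIC), dot(DRUG_LINEAR, DRUG_QUADRATIC))
print(design_row("HP", "ketamine"))
```

Because the drug contrast of interest weights ketamine against the average of the other two conditions, the interaction regressor is largest in magnitude for HP and NP sounds under ketamine, which is where the hypothesized perturbation of the sound effect would show up.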
Ketamine perturbed the correlation between prediction strength and β feedback
Because NMDAR blockers have been reported to perturb feedback pathways in macaques (Self et al., 2012) and humans (Vlisides et al., 2017), we expected ketamine to alter feedback (carrying predictions) from frontal to inferior temporal cortex. We did not expect such modulations of feedback pathways under DEX, as DEX did not change the correlation of RT and frontal alpha power with prediction. We found a significant interaction of prediction and drug condition (ANOVA, F(1,24.98) = 2.61, p = 0.04). The significant interaction effect demonstrated that, under ketamine, there was no longer a correlation between the predictive value of sounds and beta band Granger causal influence of RF on right inferior temporal cortex (Fig. 8C; i.e., ketamine scrambled the feedback for each sound). In contrast, for DEX, the Granger causal influence in the beta band still differed between sounds (Fig. 8D). This suggests that the predictive feedback facilitating behavior depends on NMDARs.
Discussion
Our results show NMDAR-mediated, circuit-level mechanisms of prediction and its behavioral effects. Frontal cortex represented predictions and, starting before image onset, transmitted them to posterior cortex in the beta band, to activate a sensory representation of the predicted image. Stronger predictions enabled faster responses, and reflected causal power (i.e., inferred “causal” relationships between sounds and images). In contrast, ketamine prevented fast responses to predictive stimuli and prevented subjects from using the strength of causal power to generate predictions. At the circuit level, ketamine disrupted predictions by reducing frontal alpha power to the same low level before all images (likely indicating reduced SNR), leading to undifferentiated feedback and perturbed prestimulus sensory activations. Overall, this suggests that NMDARs normally sharpen representations of predictions in frontal and posterior cortex, to enable PC. The data are less supportive of the classical view of perception, with its emphasis on feedforward processing to reconstruct images, because under that view one might have expected little systematic difference in behavioral and neural measures for different predictive conditions.
The initial predictive auditory stimulus will activate auditory pathways, leading to the generation of a prediction of the subsequent visual image by a higher-order, multimodal area. Complex auditory stimuli, such as the trisyllabic greeble names in our task, are represented in auditory lateral belt and parabelt cortex (Kaas et al., 1999). Belt and parabelt regions are connected with a number of multimodal areas in superior temporal and prefrontal cortex (Romanski et al., 1999; Kaas and Hackett, 2000), where there are multisensory neurons responding to both vocalizations and images (Diehl and Romanski, 2014; Hwang and Romanski, 2015). Recent work (Cao et al., 2019) suggests that early sensory fusion occurs in temporal or parietal multimodal areas, whereas more flexible weighting and integration of sensory signals for adaptive behavior occur in frontal multimodal areas. Our results are consistent with this but go further by showing that the frontal multimodal areas are the source of predictive multimodal signals, which, in addition to the enhancement of sensorimotor processing shown here, may be useful for communication and language processing (Kuperberg and Jaeger, 2016). Interestingly, we found that both the frontal source of predictive information and its posterior target, which showed prestimulus sensory activations of the predicted greeble, were lateralized to the right hemisphere. This is consistent with previous studies suggesting a possible right hemispheric bias for the processing of greebles (Gauthier et al., 1999; Tarr and Gauthier, 2000; Brants et al., 2011), and possibly faces more generally (Tsao et al., 2008; Júnior et al., 2014).
There have been two broad approaches for understanding how we learn relationships between stimuli: associative and causal approaches. Classical theories, such as the Rescorla-Wagner model (Rescorla and Wagner, 1972; Bouton, 2004; Wagner et al., 2017), propose learning as the association between a cue and an outcome. Causal models of learning (Griffiths and Tenenbaum, 2005; Holyoak and Cheng, 2011; Gershman et al., 2015), on the other hand, propose that we learn relationships between latent unobservable “causes” and observable stimuli (both cues and outcomes). In other words, causal learning models have put forward the idea of “clustering,” where observations (related to both cue and outcome) are clustered together according to their hypothetical latent causes. Our task design incorporates pairs of artificial auditory and visual stimuli separated by a delay period. While this minimizes unwanted differences between pairs at the outset and temporally separates prediction from prediction error signals, the task design makes it difficult to determine to what degree subjects infer latent causes. Without losing sight of this caveat, it is interesting that, in line with previous work (Gershman et al., 2010; Gershman and Niv, 2012), we found that subjects' RTs were best predicted by a causal model (causal power) and not an associative model (transitional probability). Our causal power results have further implications for models of sequential learning. Previous work on causal learning focused on summarized data contingency (Cheng, 1997; Griffiths and Tenenbaum, 2005). Our findings support trial-by-trial learning from sequential data as proposed by a recent modeling study (Lu et al., 2008). Additionally, using HDDM, we found that both RF alpha power and causal power correlated with RT on a trial-by-trial basis. This points toward a neural readout of causal inference in humans.
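To make the distinction between the two measures concrete, here is a minimal sketch using the summary-contingency form of generative causal power from Cheng (1997), q = [P(e|c) − P(e|¬c)] / [1 − P(e|¬c)], alongside the transitional probability P(e|c). This is a point-estimate illustration only: the trial-by-trial Bayesian estimation used in the study is not reproduced, and the counts below are invented.

```python
def transitional_probability(n_cue_then_outcome, n_cue):
    """P(outcome | cue): how often the outcome followed the cue."""
    return n_cue_then_outcome / n_cue

def causal_power(p_outcome_given_cue, p_outcome_given_no_cue):
    """Generative causal power (Cheng, 1997) from two conditional probabilities."""
    if p_outcome_given_no_cue >= 1.0:
        raise ValueError("causal power is undefined when the outcome always occurs without the cue")
    return (p_outcome_given_cue - p_outcome_given_no_cue) / (1.0 - p_outcome_given_no_cue)

# Hypothetical counts: the image followed the cue on 17 of 20 trials.
p_cue = transitional_probability(17, 20)

# Same transitional probability, two different base rates of the image after other cues.
p_rare_otherwise = transitional_probability(10, 40)    # image is rare without the cue
p_common_otherwise = transitional_probability(28, 40)  # image is common without the cue

power_rare = causal_power(p_cue, p_rare_otherwise)
power_common = causal_power(p_cue, p_common_otherwise)
print(p_cue, power_rare, power_common)
```

The two hypothetical cues have identical transitional probabilities but different causal powers, because causal power discounts outcomes that would have occurred anyway. This is the kind of dissociation that allows the two models to make different predictions about RTs.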
Although β oscillations have been proposed to maintain the current brain state (Engel and Fries, 2010; Jenkinson and Brown, 2011), there is growing evidence of beta activity playing a more dynamic role (Spitzer and Haegens, 2017). It has been proposed that β oscillations are suitable for endogenous reactivation of cortical representations, to facilitate task-relevant activity patterns and cognitive demands (Spitzer and Haegens, 2017). In line with this, we found that beta band predictive feedback from RF to inferior temporal cortex reactivated greeble representations in posterior cortex before image onset. Spitzer and Haegens (2017) speculate that beta band activity is well suited to be a “transit” between α frequency (generally associated with cortical excitation/inhibition) (Klimesch, 1996; Klimesch et al., 2007) and γ frequency (generally associated with population spiking and active stimulus coding) (Whittingstall and Logothetis, 2009) activity. Our finding of frontal cortical alpha power coding for the predictive value of sound cues, followed by frontal influence on temporal areas (shown to have greeble representations) (Gauthier et al., 1999; Rossion et al., 2004; Krusemark and Li, 2013) at β frequencies before visual-evoked activity, supports β's role as a “transit” band.
Intracortical laminar recordings in animal studies of sensory and attentional processing suggest that feedforward signaling operates at γ frequencies, whereas feedback signaling operates at lower frequencies (Buffalo et al., 2011; van Kerkoerle et al., 2014; Bastos et al., 2015). Consistent with this, previous work on predictive processing using univariate measures has reported the involvement of various lower frequencies, including θ, α, and β bands (Snyder and Foxe, 2010; Arnal and Giraud, 2012; Mayer et al., 2016; Han and VanRullen, 2017; Samaha et al., 2018; Bastos et al., 2020). Although there are varying reports of the direction of alpha power changes in predictive processing, this may be because of “when” or “what” is being predicted, differences in task structure (e.g., analyzing the stimulus or delay period), and differences between brain regions (Arnal and Giraud, 2012; Klimesch, 2012). Further considering connectivity, a recent macaque study reported greater α and β feedback from prefrontal to visual cortex during the presentation of more predictive visual stimuli (Bastos et al., 2020). van Pelt et al. (2016) also found strongest top-down feedback connectivity in the beta band while subjects viewed videos of predictable events. Our finding of frontal cortex causally influencing posterior cortex in the beta band according to predictions extends this finding to the delay period in the absence of sensory stimulation, as well as provides support for PC models that incorporate a key role for oscillatory activity more generally (Arnal and Giraud, 2012; Bastos et al., 2012; Alamia and VanRullen, 2019).
NMDARs are located in both superficial and deep cortical layers (Rosier et al., 1993), potentially allowing NMDARs to modulate the representation of predictions in deep layers, as proposed in certain PC models (Bastos et al., 2012), and predictive feedback signaling to superficial and/or deep layers. NMDAR blockade in humans has been shown to modulate frontal cortical excitability (Murray et al., 2014; Rosch et al., 2019), and our decoding analyses suggest that this NMDAR-related change in excitability reduces the SNR of prediction representations. This is consistent with macaque experiments showing that NMDAR blockade reduces the SNR in frontal cortex during the working memory period of an antisaccade task (Ma et al., 2015). In our study, NMDAR blockade also perturbed predictive feedback, consistent with macaque experiments showing that NMDARs contribute to feedback signaling (Self et al., 2012). In addition to NMDARs, ketamine can influence AMPARs, HCN1 channels, and other transmitter pathways (Sleigh et al., 2014; Zorumski et al., 2016). To minimize the possibility of these influences, we used relatively low dosing of ketamine to exploit its higher affinity for NMDAR compared with other receptors. Further, we controlled for noradrenergic and HCN1-mediated effects using the control drug, DEX. Together, these results suggest that NMDARs influence both the representation of predictions in higher-order areas and predictive signaling to lower-order areas, which impacts the formation of prestimulus templates.
Ketamine at subhypnotic doses perturbed feedback connectivity from frontal to more posterior cortex, but not evoked activity in sensory cortices, and subjects could still perceive and accurately respond to audiovisual stimuli. This raises questions about the requirement of frontal feedback integrity for consciousness. Further, it has been proposed that generative models create virtual realities that support conscious experience (Hobson and Friston, 2012, 2014; Seth et al., 2012). That subjects' predictions in our study could be disrupted without impairing consciousness imposes constraints on PC as a theory of consciousness.
Footnotes
This work was supported by National Institutes of Health Grants R01MH110311 to Y.B.S., R01NS117901 to Y.B.S. and R.D.S., K23AG055700 to R.D.S., and R01AG063849 to R.D.S. We thank G. Lupyan, R.A. Pearce, and B.R. Postle for useful discussions.
The authors declare no competing financial interests.
Correspondence should be addressed to Yuri B. Saalmann at saalmann@wisc.edu or Robert D. Sanders at robert.sanders@sydney.edu.au