Temporally Dissociable Contributions of Human Medial Prefrontal Subregions to Reward-Guided Learning

In decision making, dorsal and ventral medial prefrontal cortex show a sensitivity to key decision variables, such as reward prediction errors. It is unclear whether these signals reflect parallel processing of a common synchronous input to both regions, for example from mesocortical dopamine, or separate and consecutive stages in reward processing. These two perspectives make distinct predictions about the relative timing of feedback-related activity in each of these regions, a question we address here. To reconstruct the unique temporal contribution of dorsomedial (dmPFC) and ventromedial prefrontal cortex (vmPFC) to simultaneously measured EEG activity in human subjects, we developed a novel trialwise fMRI-informed EEG analysis that allows dissociating correlated and overlapping sources. We show that vmPFC uniquely contributes a sustained activation profile shortly after outcome presentation, whereas dmPFC contributes a later and more peaked activation pattern. This temporal dissociation is expressed mainly in the alpha band for a vmPFC signal, which contrasts with a theta based dmPFC signal. Thus, our data show reward-related vmPFC and dmPFC responses have distinct time courses and unique spectral profiles, findings that support distinct functional roles in a reward-processing network. SIGNIFICANCE STATEMENT Multiple subregions of the medial prefrontal cortex are known to be involved in decision making and learning, and expose similar response patterns in fMRI. Here, we used a novel approach to analyzing simultaneous EEG-fMRI that allows to dissociate the individual time courses of brain regions. We find that vmPFC and dmPFC have distinguishable time courses and time-frequency patterns.


Introduction
The medial prefrontal wall is important for reward-guided decision making, including learning and evaluating the incentive value of cues and actions (Rushworth et al., 2007Schoenbaum et al., 2009;Wallis and Kennerley, 2010;Alexander and Brown, 2011;O'Doherty, 2011). Two key subregions are a ventral portion beneath the genu of the corpus callosum (vmPFC; in particular medial area 10) and a dorsal portion that encompasses dorsal anterior cingulate cortex (dmPFC; in particular area 24, sometimes referred to as dACC; Ongür and Price, 2000;Rushworth and Behrens, 2008;Haber and Behrens, 2014). Evidence that both regions are important in reward processing disguises potential unique and temporally dissociable contributions to reward-guided learning.
The anatomical connectivity and cytoarchitecture of these two regions show a clear demarcation (Ongür and Price, 2000;Haber and Behrens, 2014). However, unambiguous functional dissociation is more problematic, with conflicting accounts proposing overlapping functions (Rangel and Hare, 2010;. In human functional MRI (fMRI) studies, vmPFC and dmPFC are often coactivated, but show anticorrelated activation patterns (FitzGerald et al., 2009;Hare et al., 2011;Nicolle et al., 2012;Boorman et al., 2013;Hauser et al., 2014a). Indeed, the observation that coactivation of multiple areas is common in reward processing (Vickery et al., 2011) renders it difficult to ascribe a specific functional role to a single area. Coactivated brain regions can emerge from two opposing principles: Two brain regions may receive inputs from the same source (such as dopaminergic midbrain), and synchronously process the same information with similar temporal patterns. Alternatively, coactivation of multiple regions in human neuroimaging studies may reflect a temporal sluggishness of an fMRI signal. Resolving these possibilities would gain strength from a refinement in methods with better temporal resolution, such as EEG or MEG (Hunt et al., 2012Hauser et al., 2014b). These modalities are limited in detecting signals from deep sources, especially where multiple areas respond to produce overlapping signals (Fig. 1A). As a consequence, decision-making studies often restrict themselves to single event-related components, such as the feedback-related negativity (FRN; Holroyd and Coles, 2002;Walsh and Anderson, 2012) or frontal midline theta (Cohen, 2011;Cavanagh and Frank, 2014;Cavanagh and Shackman, 2015), both traditionally associated with the dmPFC. Signals from vmPFC, however, are only rarely reported using electrophysiological techniques (Hunt et al., , 2013Harris et al., 2013;Lipsman et al., 2014), and are more difficult to localize as they can be obscured by more prominent signals arising in closer spatial proximity to recording electrodes.
Here we exploit the trial-by-trial variability in simultaneously recorded EEG and fMRI signal (Fig. 1B) to determine the unique contributions of brain regions to the EEG signal, enabling us to establish the time course of each region's response profile. We first show in a simulation how overlapping, correlated, sources can be misattributed as a singular dorsal midline source. We then show our technique can resolve time courses of visual and motor areas. In our main analysis, we dissociate contributions from vmPFC and dmPFC and show that the reward-locked activation in vmPFC precedes that of dmPFC. Moreover, we find that distinct frequency bands are evoked in each region, specifically theta oscillations within dmPFC and alpha within vmPFC.

Participants and task
Twenty-five healthy, right-handed humans (16 females, 29.9 Ϯ 7.4 years) performed a probabilistic reversal learning task while simultaneous EEG-fMRI was acquired. The study was approved by the local ethics committee, and all participants gave written informed consent. Data from 17 of the subjects has been reported previously using different analysis methods (Hauser et al., 2014b. Subjects performed a probabilistic reversal learning task ( Fig. 1C; Hauser et al., 2014aHauser et al., ,b, 2015 with two stimuli that were assigned either 80% or 20% win probability (50 Swiss Centimes). After 6 -10 correct responses, the reward probabilities reversed (average number of reversals: 7.3 Ϯ 1.1). The task consisted of 120 trials and 40 null trials. Each trial lasted 9000 ms on average. The cue stimuli were simultaneously presented for 2500 ms and participants could respond within 1500 ms and the selected stimulus was then highlighted by a frame for at least 1000 ms. After a jittered interval (mean 2750 ms), feedback (framed or crossed coin indicating win or loss) was pre-sented for 1000 ms, again followed by a jittered intertrial interval (mean 2750 ms, range 2000 -4000 ms). Further details of the task are provided by Hauser et al. (2014b).

Trialwise fMRI-informed EEG analysis
To understand the time course of regions activated in the fMRI, we exploited trial-by-trial variability of the EEG and fMRI signal to test for consistency in its covariation across modalities. With this approach, we can determine at each peristimulus time point how much trial-by-trial BOLD fluctuation within a given region covaries with the EEG signal. This allows us to make inferences about the activation time course of a specific brain region. Importantly, allowing multiple correlated sources to compete for variance in explaining the EEG data renders it possible to disaggregate the unique contribution of each region separately.
The analysis proceeded in several stages ( Fig. 1B): fMRI-ROI definition and single-trial response estimation. To understand the temporal evolution of a given brain region, we first defined a region-of-interest (ROI) using an ordinary fMRI analysis (cf. fMRI acquisition and analysis, below). We first derived areas being activated for a specific contrast. For our initial visual and motor responses, we used a contrast for the presentation of the cues (GLM1, cf. fMRI acquisition and analysis, below), and the button presses (GLM2, cf. fMRI acquisition and analysis, below), respectively. For our main analysis of dmPFC and vmPFC, we analyzed the reward-prediction error signals during feedback (GLM1). This contrast activated dmPFC, vmPFC, posterior cingulate cortex (PCC), and bilateral anterior insula (AI; Table 1; Fig. 1D). For each of these areas, we then extracted the time series for the complete experiment. To do so, we extracted the first eigenvector of the activation clusters as the time series, a standard procedure within SPM ("volume-of-interest" method), which removes nuisance regressors (here: pulse, movement, slow drifts) from the time series. We then estimated a trial-by-trial hemodynamic response function (HRF) of each region (detailed description below).
Multiple linear regression of single-trial fMRI responses onto EEG data. The estimated HRF response was collapsed across every trial and ROI to create a design matrix X fmri with dimensions: (n ROIs ϩ intercept) ϫ n trials (cf. Results, Eq. 1). This was then used to predict the EEG-signal Y e eeg across all trials and time points (dimensions: n timepoints ϫ n trials ). This was done for every electrode resulting in a contrast estimate B e eeg for every time point and electrode (dimensions: n ROIsϩ1 ϫ n timepoints ). The analysis of the time-frequency data were performed similarly. The time-frequency-analysis contained one additional dimension (size: n frequencies ), but the principle of regression across trials was the same. With this linear regression analysis, we only account for linear relationships between an EEG and BOLD signal, as in previous EEG-informed fMRI analyses (Debener et al., 2005;Eichele et al., 2005;Baumeister et al., 2014;Becker et al., 2014;Boecker et al., 2014;Hauser et al., 2014a,b;Iannaccone et al., 2015).
Group-level random-effects analysis of sensor-level data. Based on these fixed-effects results for each subject (Eq. 1), we ran a random-effects analysis calculating the consistency of these beta estimates (B e eeg ) across subjects, analogous to the summary statistics approach standardly used in fMRI analysis (Holmes and Friston, 1998). We therefore converted each individuals' beta estimates into three-dimensional SPM images (2 spatial dimensions, 1 time dimension) and smoothed the images (image kernel: 10 mm ϫ 10 mm ϫ 20 ms; Kilner and Friston, 2010). We used each subject's individual electrode placement (as defined by the fiducials FPz, PO9, PO10; cf. EEG acquisition and analysis, below) relative to the head for optimal alignment across subjects. Random-effects analysis was then run within SPM and familywise error (FWE) correction was used for multiple-comparison correction.

Learning model
We used a Bayesian hierarchical learning model, the hierarchical Gaussian filter model (HGF; Mathys et al., 2011Mathys et al., , 2014, to fit subjects' learning. This model is a Bayesian variation of classical reinforcement learning models (Rescorla and Wagner, 1972), which accounts for environmental volatility and adapts a trialwise learning rate accordingly (Behrens et al., Figure 1. Contributions of multiple brain areas to the EEG signal and their parcellation using fMRI-informed EEG. A, fMRI studies of reward processing show a multitude of areas responsive to reward information, such as reward prediction errors (left). Hypothetically, each area has a specific time course and amplitude in response to a stimulus (middle). Electrical fields elicited by these individual activation patterns sum spatially and a mixture of these signals defines event-related potentials measured at scalp electrodes, such as frontocentral electrode FCz (right). The contribution of each area is governed by the strength of its activation and the spatial proximity to the electrode, such that the recorded signal is a weighted sum (Figure legend continues.) 2007; Mathys et al., 2011). This model has been shown to outperform standard reinforcement learning models in many instances (Iglesias et al., 2013;Diaconescu et al., 2014;Hauser et al., 2014a;Vossel et al., 2014). It is worth noting that the prediction error signals generated by this model are closely correlated with those generated by more standard reinforcement learning models. Unsurprisingly, the cortical regions of interest isolated by the HGF model-based analysis of prediction errors in fMRI data ( Fig. 1D) are similar to those identified using classic reinforcement learning models as shown in previous studies (Gläscher et al., 2009;Hauser et al., 2015).
The electrode positions of each subject were manually determined based on the T1 image, which was recorded while the participants were wearing the EEG cap. The electrode positions of FPz, PO9, and PO10 were then used as fiducials to coregister the electrode positions to the scalp.
Preprocessing was performed in Analyzer (v2.01, BrainProducts GmbH). We performed MR-artifact removal using sliding average subtraction (Allen et al., 2000), and in-built cardioballistic artifact removal with manual inspection. The data were resampled to 256 Hz and filtered (0.1-30 Hz bandpass, 50 Hz notch). Ocular movement artifacts, remain-ing MR and cardioballistic artifacts were removed using ICA. Trials exceeding 120 V were automatically excluded. Continuous EEG was re-referenced to average reference and subsequently exported to MAT-LAB (R2013B). Further analysis was performed using customized analysis scripts as well as fieldtrip (Oostenveld et al., 2011) and SPM M/EEG (www.fil.ion.ucl.ac.uk;Litvak et al., 2011) routines.
In this paper, we do not focus on the task-evoked potentials (ERPs), but on the effect of BOLD variability on the EEG signal. For completeness, however, we show the event-related EEG components for rewards and punishments in Figure 1F.
Time-frequency power-spectra analysis was performed in fieldtrip using "mtmconvol" with a Hanning taper between 0 and 1000 ms, and 2 and 20 Hz frequency range (5 cycles per time window). The initial trialwise segments consisted of 5000 ms long epochs (prefeedback 2000 ms to postfeedback 3000 ms) to allow an adequate TF-decomposition also in low frequencies. Power-spectra were baseline-corrected using a segment 1000 -0 ms before feedback presentation. To correct for multiple comparisons in the power spectrum, we used a cluster-based permutation test across the whole time-frequency spectrum using 1000 iterations and a cluster forming threshold of t Ͼ Ϯ2 (cf. Hunt et al., 2013).

fMRI acquisition and analysis
fMRI was recorded in a Philips Achieva 3T scanner and a 32-channel receive-only coil. The EPI sequence was optimized for minimal ventromedial signal dropout (TR: 1850 ms, TE: 20 ms, 15°rotation from AC-PC; cf. Hauser et al., 2014b for further details). Additionally, a T1 wholebrain structural image was acquired. fMRI processing and analysis was performed in SPM 8 (www.fil.ion.ucl.ac.uk). Raw data were realigned to mean EPI and coregistered to the individuals T1 image. Normalization was performed using the new segmentation method and deformation field normalization. Subsequently, the normalized data were spatial smoothed with a 6 mm FWHM kernel. A more detailed description of the preprocessing is provided by Hauser et al. (2014a).
The key aim of the fMRI analysis in this paper was not to test specific novel hypotheses concerning RPE processing during reversal learning, but instead to generate ROIs that would subsequently be used to inform EEG analysis. We computed two separate GLMs to extract our specific ROIs.
GLM1. This GLM served as the primary analysis and was used to extract the reward prediction error (RPE)-related fMRI activations for the main joint fMRI-EEG analysis, as well as extract visual areas for the visual analysis. To isolate functional decision making ROIs, we entered the trialwise RPEs (variable ␦ 1 from the HGF model, which denotes the difference of the posterior and prior mean) as parametric modulator at the onset of feedback presentation into the first-level GLM. Additionally, we entered the following nuisance regressors to improve model fit: movement parameters, choice value (variable 2 of chosen object; the prior mean of the chosen object, reflecting the chosen value in standard reinforcement learning schemes) at onset of the cue stimulus, and cardiac regressors (Glover et al., 2000;). The visual ROI was derived from the cue stimulus presentation regressor in this GLM.
GLM2. To investigate the hand motor area for our motor analysis, we computed an additional GLM that contained a regressor for all button presses and the following nuisance regressors: movement and pulse. For motor analysis, we entered the temporal and spatial derivatives. We derived the motor activity from an additional, separate GLM, because the subjects were allowed to choose freely after cue onset. Thus, the button press regressor was highly correlated with the stimulus onset regressor, which rendered it difficult to obtain a clear motor cortex activation.
For all fMRI-only analyses, we applied a peak-FWE threshold at p Ͻ 0.05, k Ͼ20 to correct for multiple comparisons. The decision making network (GLM1) thus consisted of vmPFC and PCC which were positively associated with RPEs. Additionally, the dmPFC and bilateral AI were negatively correlated with RPEs (Table 1). In the analysis of the button presses (GLM2), we found the left handmotor area to be significantly activated by the first temporal derivative of the motor responses. This cluster then served as ROI for our motor anal-4 (Figure legend continued.) (as indicated by hypothetical weights, w 1 -w 5 ) that overweighs sources close to the recording electrodes (here: dmPFC). B, Trial-by-trial fMRI-informed EEG analysis disentangles unique contributions of brain areas to the EEG signal. Using the hemodynamic response, we estimate fMRI activation of each brain region on every trial (left). Within a multiple-regression analysis, we then determine the peristimulus time when trial-by-trial fMRI activations best predict the EEG signal (middle). This enables us to recover the unique and individual time courses of these brain regions (right). However, a caveat is that the approach remains insensitive to temporal contributions that perfectly covary in the fMRI signal across trials. C, Participants engage in a probabilistic reversal-learning task where they learn which of two stimuli had the higher reward probability. One of the stimuli was assigned as the correct stimulus with a reward probability of 80%. The other stimulus had a reward probability of 20%. The reward was depicted by a framed 50 Swiss Centimes' coin, whereas a punishment was illustrated by a crossed coin. After 6 -10 correct responses, the reward probabilities reversed. The participants were informed about potential reversals but were not instructed on the timing of these reversals. D, fMRI activations for RPEs at feedback. Increasing RPEs were linked to engagement of vmPFC and PCC, whereas decreasing RPEs engendered activation in dmPFC and bilateral AI. E, Average absolute correlations between ROIs activated in the RPE-fMRI-contrast. F, Event-related potentials elicited over electrode FCz, separated for rewards (black) and punishments (red). dmPFC, Dorsomedial prefrontal cortex; lAI, left anterior insula; PCC, posterior cingulate cortex; rAI, right anterior insula; vmPFC, ventromedial prefrontal cortex. ysis. In the visual analysis (GLM1), we found (in addition to primary visual areas) an occipitotemporal area containing fusiform gyrus to be highly active.
To estimate the trial-by-trial response of a given ROI, we first extracted the BOLD time-series (first eigenvector of the ROI, standard procedure in SPM "volume of interest" extraction) across the whole experiment. Nuisance artifacts as captured by our regressors-of-no-interest plus slow drifts (Ͼ128 s) and mean session amplitudes were removed from the time series.
To estimate the single-trial response amplitude, we up-sampled the data by factor 10 to a TR of 185 ms (using cubic spline interpolation) and then time-locked the signals from 0 to 8 s poststimulus. The up-sampling enabled us to fit the canonical SPM hemodynamic response function (and its first derivative) to the single-trial up-sampled data using linear regression. Using this method, we could estimate the amplitude of the HRF on each trial for each ROI without any overlap with the next trial (mean trial duration 9 s; compare Fig. 1C), despite the sparse sampling of the fMRI signal (TR ϭ 1850 ms). It is worth noting that a valid alternative method (particularly important for experiments with shorter trial durations and overlapping hemodynamic events) would be to model all trials in a single general linear model with HRF-convolved stick functions for each trial. In practice for our experiment, this produced very similar parameter estimates to the approach outlined above.

Dipole simulations
We sought to test (via simulation) whether equally strong sources placed in vmPFC and dmPFC would lead to both sources being visible in a conventional EEG analysis. Dipole simulations (Fig. 2) were performed using dipole simulation methods in fieldtrip. We used the standard template provided in SPM to generate the conductor tissue. We then used the peaks of our vmPFC and dmPFC ROIs as the MNI coordinates of our dipoles. The dipole momentum was set to [0 0 1] for dmPFC and [0 0 Ϫ1] for vmPFC. For both separate dipole analyses ( Fig. 2 A, B), we used a fixed, equally strong signal and added 5% of noise. The same procedure was applied when simulating the two dipoles together (common simulation; Fig.  2C). For fitting the dipole in the common simulation (Fig. 2C, right), we used the dipole fitting method in fieldtrip using the same head model as for the dipole simulation. To fit the dipole, we fitted one dipole and performed a linear grid search across the whole volume and report the best fitting source. We found (Fig.  2C, right) that the common topography for the mixed simulated dipoles was in fact estimated to originate solely from dmPFC. Similar results were obtained when both simulated sources were assigned with the same dipole momentum (data not shown). Such a localization method has frequently been used in conventional EEG studies, e.g., in studies investigating feedback-related negativity (cf. Walsh and Anderson, 2012). It thus shows the difficulty of detecting different sources with correlated and overlapping activation time courses.

Proof of principle: activation time course of visual and motor areas
To test the validity of our method, we applied it to a robust activation in left occipitotemporal cortex ( Fig. 3A; containing fusiform gyrus, peak at MNI: x ϭ Ϫ36, y ϭ Ϫ61, z ϭ Ϫ2, t (24) ϭ 19.02). This area showed strong fMRI activation in response to the presentation of visual stimuli using a standard fMRI analysis of stimulus presentation. Our fMRI-informed EEG analysis revealed that the EEG signal was strongly related to the fMRI data extracted from this visual ROI ϳ190 ms after stimulus presentation (Fig. 3A  Overlapping activations of multiple brain areas bias inference and source localization in noninvasive neurophysiology. A simulated dipole from the dmPFC (A, right) shows a typical midfrontal topography (left), whereas a simulated vmPFC dipole (B, right) displays a similar and overlapping topography (left), albeit with a broader spatial distribution reflecting its greater distance from the sensors. C, The simulation of two, equally strong dipoles in vmPFC and dmPFC (same origins as A and B) result in a topography, which is strongly dominated by the dmPFC signal (left). Source estimation of this topography, thus, falsely locates a single dipole in the dmPFC (C, right) and is unable to capture a contribution of the vmPFC.
in proximity of our functional ROI is widely reported (Hillyard and Anllo-Vento, 1998;Brem et al., 2006). In addition, the evoked topography showed a marked left-lateralization (Fig. 3A), confirming an assumption that the left-lateralized aspects of ordinary N1-topographies are generated within a left-hemispheric area. It is important to note that no prior spatial information was entered into our fMRI-informed EEG analysis and that this EEG scalp topography emerged naturally from the data. Based on this resulting topography, we then performed a priorinformed source-localization (cf. Informed source localizations, below) and found the peak source probability to relocalize within the original functional ROI (Fig. 3A, right; peak MNI: x ϭ Ϫ22, y ϭ Ϫ86, z ϭ 12).
As further confirmation for our method, we reconstructed the time course of the motor activity from the handmotor area. Here, we analyzed the fMRI activation at button press and derived a functional ROI which included the prominent "⍀ sign" (Yousry et al., 1997), which defines handmotor area ( Fig. 3B; peak activation at MNI: x ϭ Ϫ36, y ϭ Ϫ30, z ϭ 58; z ϭ 7.08; p FWE Ͻ 0.001) and we determined its activity around the time of button presses in the task. We found that BOLD activation in this area significantly predicted EEG data between ϳ 80 ms before and 100 ms after button press (Fig. 3B, middle; cluster-extent FWE correction p Ͻ 0.001 for time window Ϫ500 to 500 ms peristimulus; voxel height threshold: t ϭ 3; peak at 20 ms, t (24) ϭ 7.70). The topography showed a clear midcentral distribution in proximity of the handmotor area. Source localization, again, confirmed the reasonability of our resulting topography (Fig. 3B, right; peak MNI: x ϭ Ϫ34, y ϭ Ϫ30, z ϭ 52).

Informed source localizations
To examine the face validity of our analysis, we also localized the topographies which we received from our fMRI-EEG integration. Importantly, no spatial information entered into fMRI-informed EEG analysis, and therefore the method was completely uninformed about the spatial topography that might emerge in sensor-level EEG analysis.
For the source localization of our results, we used the group-level random effects topographies from our fMRI-informed EEG analysis. For each ROI, we ran an informed source localization in SPM. We used the time window that was found to be significant in the main analysis for the given ROI. The group topographies of the t values during this time were then inverted. To generate the forward model, we used the mean of the normalized T1 images of all subjects. As group electrode positions, we used the mean of the subjects' individual fiducial positions which was then coregistered to the mean T1. The model inversion (minimum norm in combination with a smooth source covariance as provided in SPM: COH; PST Hanning window; bandpass filter 0 -48 Hz; Friston et al., 2008) was performed using a spatially informed, but unrestricted source localization using the fMRI-ROI as a prior. The prior we entered was the original ROI cluster, which was initially used for the fMRI-informed EEG analysis. This prior was adopted because we had strong assumptions on the origin of the topography and we wanted to evaluate whether the topography of our analysis confirmed or rejected our initial source localization. This approach leads to a relaxation of the shrinkage priors in this area, but does not restrict the source to be estimated at any area in the brain, i.e., it is less restrictive in the predefined region but allows for whole-brain sources. A striking example of this can be seen in Figure 4D, where the source localization for vmPFC also picks up a contribution from PCC, which is known to commonly coactivate with vmPFC in many studies of reward-guided learning and decision making (Clithero and Rangel, 2014).

Determining activation time courses of brain regions
Simultaneous acquisition of EEG and fMRI provides a means to concurrently gather temporal and spatial information related to cognitive processing (Rosa et al., 2010;Huster et al., 2012). Most analysis methods rely on trial-by-trial variability of EEG data to localize the source of ERPs (Debener et al., 2005;Eichele et al., 2005;Baumeister et al., 2014;Becker et al., 2014;Boecker et al., 2014;Hauser et al., 2014a,b;Iannaccone et al., 2015). Here, we exploit a trial-by-trial variability in a local fMRI signal to determine the temporal activation and oscillatory pattern of specific, predefined, brain regions in recorded EEG tracings. Underlying all simultaneous EEG-fMRI analyses is an established assumption that electrical potentials as recorded by EEG, and brain activity as measured by the BOLD signal, show an approximately linear relationship (for review, see Rosa et al., 2010). If a brain region has an increased fMRI signal at a given trial, its electrical field potential will therefore also be increased, and vice versa.
Unlike traditional EEG-informed fMRI analyses or EEG source analyses, the analysis we implement is not meant to reveal . Activity in this region was successfully associated with an EEG signal between 170 and 220 ms (shaded area depicts cluster-corrected significance) at occipitotemporal electrodes (middle), confirming the sensitivity of our approach (red indicates electrode used for time plot). Informed source localization confirmed sensitivity of topography (right). B, Second validation of method. Right-handed button presses elicited activation of the left handmotor area (left), which was found to be active around the time of the button press with a typical midcentral topography (middle). Informed source localization localized again into the motor area (right).
where in the brain specific EEG signals originate from. Instead, our fMRI-informed EEG analysis asks which features of the EEG data are linearly dependent upon trial-to-trial variability in BOLD responses. By including multiple brain regions simultaneously in the same regression model, we can ask a novel set of questions including whether certain EEG features are better explained by some region's activity than others.
We ran a mass-univariate (multiple) regression analysis (Fig.  1B) on EEG data from 25 healthy adults who performed a probabilistic reversal learning task (Fig. 1C) while fMRI was recorded simultaneously. First, we extracted the fMRI activation of a given ROI on every trial by deconvolving the local fMRI signal with the hemodynamic response function. Together with a constant term, these estimates formed the design matrix derived from the fMRI data, termed X fmri dimensions: (n ROIs ϩ intercept) ϫ n trials . ROIs were determined based on conventional fMRI contrasts from previously designed analyses (cf. Hauser et al., 2014a;visual, motor, reward prediction errors). Then, for each EEG electrode e, we ran a separate (multiple) linear regression analysis: (1) Y e eeg (dimensions: n timepoints ϫ n trials ) denotes the preprocessed EEG data for a given electrode e. The effect sizes B e eeg (dimensions: n ROIsϩ1 ϫ n timepoints ) and their errors E e eeg (dimensions: n timepoints ϫ n trials ) were estimated for each electrode separately using ordinary least-squares regression. The design matrix X fmri remained constant in the regression on each sensor.
The key idea is that at each EEG sensor, the estimated B matrix reflects the time course of the electrical contribution from each region of interest. Crucially, the approach controls for the electrical contributions of other regions of interest to the sensor, as all regions of interest compete for explained variance in the multiple linear regression. The trial-to-trial activation of correlated sources, such as the ones elicited by RPE processing (Fig. 1E), are sufficiently decorrelated to discern unique contributions to the EEG signal. However, it is important to bear in mind that this method is insensitive to common contributions from multiple regions that have identical temporal and trial-by-trial fMRI profiles. EEG parameter estimates were calculated at the first level by performing the regression separately for each subject. At the second level, we calculated the group time course for each ROI by entering each participant's beta estimates into one-sampled t tests, to test for significant deflections from zero across the population. We corrected for multiple comparisons using FWE correction.
To test the validity of our method, we applied it to two brain areas which elicit unique and characteristic temporal and topographical EEG signals, namely visual elicited activation from occipitotemporal cortex (including fusiform area) and motor activation from handmotor area (see Materials and methods for details). The occipitotemporal cortex elicited a strong negative deflection at left occipitotemporal electrodes, peaking at 190 ms after stimulus onset (Fig. 3A, middle). This is well in line with the visual N1/N170 component, often associated with activation in occipitotemporal regions (Hillyard and Anllo-Vento, 1998;Brem et al., 2006;Friston et al., 2008). The handmotor area showed a significant activation at the time of button press with a midcentral topography in proximity of the central sulcus (Fig. 3B). It is important to note that no information about the timing or topography is provided in our method. The specific time course and topography thus emerged from the data itself, by identifying which features of EEG activity covaried with trial-by-trial fluctuations in fMRI activity.

Dissociating medial prefrontal cortex activations: time courses of vmPFC and dmPFC
Having established the sensitivity of our method, we addressed our main question of interest, namely the temporal response profile of vmPFC and dmPFC during reward processing.
Participants played a probabilistic reversal learning task (Fig.  1C) where they had to learn reward probabilities based on the feedback. vmPFC and dmPFC were identified from a RPE contrast during feedback in a conventional fMRI analysis ( Fig. 1D; Table 1). vmPFC activity increased with increasing RPEs, Figure 4. Unique temporal contributions of vmPFC and dmPFC to EEG signal. vmPFC cluster (A) revealed a temporally elongated activity (B) becoming active at ϳ250 ms after feedback (shaded area depicts cluster-corrected significance) and uniquely processing information until ϳ400 ms after feedback with a midcentral topography (C; red indicates electrodes Fz and FCz used for time plot B). D, Source estimation localized center of topography into vmPFC again (peak at MNI: x ϭ Ϫ8, y ϭ 40, z ϭ Ϫ14). E, Time-frequency decomposition revealed that vmPFC mainly operates in the alpha band, from 8 to 12 Hz (thick black line; p Ͻ 0.05 using cluster permutation test). The colors indicate the effect of single-trial fMRI responses on the EEG power rather than pure EEG power signals. dmPFC (F) shows a short, but marked unique activation between 360 and 400 ms after feedback (G). Its topography is typical for dorsofrontal midline areas (H; green indicates electrode FCz). I, Informed source localization projects topography back to dmPFC (peak at MNI x ϭ 10, y ϭ 22, z ϭ 44). Time-frequency analysis shows that dmPFC elicits a significant activation in the theta band, from 5 to 8 Hz (J).
whereas dmPFC showed increased activation to decreasing RPEs. This confirms the antagonistic activation pattern of these areas, similar to previous studies (FitzGerald et al., 2009;Hare et al., 2011;Nicolle et al., 2012;Boorman et al., 2013). Moreover, we found coactivation of the PCC with the vmPFC, both of them known to frequently coactivate and as the core of the default mode network (Raichle et al., 2001). The dmPFC showed coactivations with bilateral AI, again known to coactivate as nodes of the saliency network (Seeley et al., 2007). To reliably determine the unique contributions of vmPFC and dmPFC (compare Fig.  1E), we controlled for the joint activation with PCC and AI by adding them to the EEG multiple regression model as covariates.
We found an unique contribution of vmPFC ( Fig. 4A-C) that extended across a prolonged period postfeedback presentation, starting as early as 250 ms after feedback onset and remaining active until ϳ400 ms poststimulus, peaking at 262 ms (cluster-FWE corrected: p ϭ 0.017; voxel-height threshold: t ϭ 3; correction time window: 0 -800 ms). The topography attributed to the vmPFC had a midcentral distribution and is similar to a simulated topography from the given area (Fig. 2B). This was also confirmed using an informed source localization, which localized the topography within the same area (Fig. 4D).
An activation uniquely attributed to dmPFC (Fig. 4F-H ) showed an onset at ϳ360 ms postfeedback onset, with a strong peak at 390 ms. This activation terminated at ϳ400 ms (cluster-FWE corrected: p ϭ 0.045; voxel-height threshold: t ϭ 3; time window: 300 -400 ms). The dmPFC showed a midcentral topography, similar to our simulated response ( Fig. 2A). The topography, was successfully reallocated to the dmPFC in the informed source localization (Fig. 4I ).
To examine further for timing differences between vmPFC and dmPFC, we ran a cross-correlation analysis between the fMRI-informed EEG-signals for both regions. The crosscorrelation examines the time-shift that maximizes the correlation between two signals and revealed that cross-correlation was maximal at a lag of 50 ms at electrode AFz. This supports the notion that the vmPFC precedes dmPFC in terms of activation.
It should be noted that our multiple-regression approach identifies signals that can be explained over and above the contribution of other brain regions and thus may be biased to preferentially reveal differences rather than common activations. We therefore conducted additional analyses which do not control for shared variance (Fig. 5). In these analyses, activation in both regions became stronger and more extended in time, implying a common extended component between the regions that may be masked by our multiple regression approach. vmPFC (Fig. 5A) activity peaked at 260 ms after feedback remained active until ϳ400 ms (t (24) ϭ 4.27, p ϭ 0.003, cluster-FWE corrected for multiple comparisons; time window: 0 -800 ms poststimulus). dmPFC (Fig. 5B) showed a sustained activation from ϳ300 -420 ms (t (24) ϭ 5.68, p ϭ 0.027, peak-FWE corrected for multiple comparisons; time window: 0 -800 ms poststimulus). Importantly, however a vmPFC response still preceded that seen in dmPFC supporting the notion that these regions make distinct temporal contributions to the reward learning process.

Time-frequency analysis of dmPFC and vmPFC activation
To further understand the neural dynamics of vmPFC and dmPFC, we extended our method and performed our fMRIinformed EEG analysis on the time-frequency decomposed EEG signal. We repeated the same analysis as described in equation 1, but the data vector Y e eeg was extended by the frequency component Y e,f eeg ; that is, regression is not only performed at every elec-trode and every time bin as before, but also at every frequency bin. We focus on the time-frequency induced effects of dmPFC and vmPFC in a subset of midfrontal electrodes which correspond to the electrodes that showed a strong effect of vmPFC, and dmPFC activity respectively (electrodes Fz, FCz, Cz, C1, C2, FC1, FC2 for dmPFC, electrodes AFz, Fz, FCz, FC1, FC2, Cz for vmPFC; Fig. 4 E, J, compare bottom right). Oscillatory correlates of the dmPFC have extensively been investigated in various cognitive domains with dmPFC often associated with theta activation over midfrontal electrodes (Cavanagh et al., 2011;Cohen, 2011;Cavanagh and Frank, 2014;Cavanagh and Shackman, 2015;Frank et al., 2015). There is by comparison little knowledge regarding vmPFC, though first studies indicate it processes information mainly in alpha and lower beta bands (Oya et al., 2005;Hunt et al., 2013;Lipsman et al., 2014).
At the time when dmPFC showed marked activation in the time domain (360 -400 ms), we found increased synchronization in the theta band ( Fig. 4J; p ϭ 0.046, corrected for multiple comparisons using cluster permutation). The vmPFC showed a clear and marked desynchronization mainly in the alpha band ( Fig. 4E; p ϭ 0.029, corrected for multiple comparisons using cluster permutation) extending around the same time as the unique activation of vmPFC was apparent.  To investigate the effect of dmPFC and vmPFC when not controlling for the contributions of the related areas, we analyzed the vmPFC and dmPFC separately. We found that both signals show sustained activations after feedback with a similar topography as in our main analysis. This supports an assumption that activation signals recorded during this time consist of a mixture of dmPFC and vmPFC signals and are not differentiable by using common EEG/MEG analyses. dmPFC (B) showed a sustained activation from ϳ300 to 420 ms (shaded area depicts cluster-corrected significance) and vmPFC (A) activity peaked at 260 ms after feedback and showed a long lasting activity again until ϳ400 ms.

Discussion
subjects' decision making behavior as well as in learning from reinforcement (Fellows andFarah, 2003, 2007). Single-unit recordings also provide evidence suggestive of a role in motivation and choice (Bouret and Richmond, 2010;Strait et al., 2014). dmPFC is likewise activated in fMRI studies of reinforcement learning and action selection (Behrens et al., 2007;Botvinick, 2007;Hare et al., 2011;Vickery et al., 2011;Kolling et al., 2012;Boorman et al., 2013;Economides et al., 2014). Combined with evidence from single-unit recording and lesion studies (Kennerley et al., 2006;Seo and Lee, 2009;Sheth et al., 2012), this has led to the idea that dmPFC subserves a role in learning from outcomes, as well as in deploying learnt values in action selection (Alexander and Brown, 2011;. Despite partially different functional connotations of vmPFC and dmPFC, both areas consistently tend to coactivate in fMRI studies of human decision making (FitzGerald et al., 2009;Hare et al., 2011;Nicolle et al., 2012;Boorman et al., 2013), which has rendered it difficult to understand the relation between these two regions within the decision making network.
Here in our fMRI analysis we replicated a common finding of anticorrelated activation profiles of dmPFC and vmPFC, and also found unique temporal activation profiles for each area. vmPFC became active as early as 250 ms following a reward outcome and maintained its activation until 400 ms, an activation also reflected in a marked alpha desynchronization in the time-frequency domain. The dmPFC, on the other hand, showed a much shorter, but temporally overlapping activity between 360 and 400 ms after feedback, an activation profile reflected in the theta band.
The medial prefrontal wall as a whole is densely innervated by mesencephalic dopamine inputs (Lindvall et al., 1974;Bates and Goldman-Rakic, 1993) rendering it plausible both areas process the same dopaminergic information, such as prediction errors (Holroyd and Coles, 2002;Schultz, 2002), based on this common input with (potentially) a similar time course. A different viewpoint assumes vmPFC and dmPFC makes distinct contributions to decision making (Schoenbaum et al., 2009;Grabenhorst and Rolls, 2011;Hare et al., 2011;O'Doherty, 2011). This is supported by evidence that vmPFC calculates option values (Hare et al., 2011) or stores value expectations (Schoenbaum et al., 2009) which are then transmitted to other regions including dmPFC (potentially via a dopaminergic relay; Schoenbaum et al., 2009). Such a decision-making hierarchy entails vmPFC and dmPFC should show temporally distinct activation patterns, with vmPFC activated at an earlier stage than dmPFC. Our findings lend support to this idea of a preceding vmPFC response that provides a value signal, which may then be propagated to dmPFC, which in turn integrates this information into action values for subsequent action selection.
Our findings demonstrate that vmPFC and dmPFC process unique information which is functionally dissociable from one another as well as other coactivated areas. Although we find coactivation of these regions in conventional fMRI analyses, it is nevertheless the case that the responsivity of these regions is dissociable in time. Both vmPFC and dmPFC process information separately, over and above any simultaneous processing of information that our method may be less sensitive too. This temporal separation speaks against the idea that these areas simultaneously process information purely as predicted by accounts where reward-related activation is driven by a common input. Rather, it supports the idea that there is a processing hierarchy instantiated within the medial prefrontal cortex itself whereby an earlier activation of vmPFC might be propagated forward to the dmPFC for further evaluation. However, it will be important to replicate these findings in animal studies with multisite recordings. Therein, one can measure region-specific neural activity directly without having to control for coactivated brain areas. We would therefore expect to see similar, but temporally extended activation patterns, as our approach ignores variance shared by multiple ROIs.
Previous EEG studies focused on the FRN (Miltner et al., 1997;Holroyd and Coles, 2002;Talmi et al., 2013;Sambrook and Goslin, 2014) as a component which reflects feedback processing in the dmPFC (Walsh and Anderson, 2012). Interestingly, the FRN often shows a different temporal activation pattern to our observed dmPFC signal, as it occurs between 250 and 350 ms after feedback. This discrepancy may be resolved when we understand that such ERP components are likely to represent a mixture signal caused by several areas which are active at approximately the same time (compare Fig. 1A). It is also likely that the FRN reflects a mixture of active areas, potentially even a combination of vmPFC and dmPFC activation. Given our finding that vmPFC is active at ϳ250 ms, a mixture of vmPFC and dmPFC activity would explain the earlier occurrence of the FRN compared with our dmPFC activation. Moreover, it also explains why previous source localizations often report more anterior and ventral FRN origins, even localizing it in proximity of the vmPFC (cf. Walsh and Anderson, 2012), rendering it likely that a mixture of vmPFC and dmPFC signal is located between the two sources. Evidence for this assumption comes from our analysis in which we do not control for the other regions where we showed a temporally more elongated signal for the dmPFC, well within the range of the FRN (Fig. 5). Given that this signal does not emerge when controlling for other areas, it is clear that this time course does not reflect activity from dmPFC alone. This also highlights that common EEG source and scalp analyses often struggle to differentiate temporally related signals. Critically, EEG analyses are less sensitive to deep sources such as the vmPFC, especially if such signals are superimposed upon by stronger signals from regions closer to the scalp. The key advantage of our approach is that we have a signal that we know originates from vmPFC, rather than inferring on the signal using an inverse solution. This enables us to clearly delineate the signals from vmPFC and dmPFC and therefore to define unique temporal contributions of these areas.
The finding that the dmPFC operates in the theta frequency band fits with a broad corpus of literature which posits frontal midline theta as being related to decision making and cognitive control (Cavanagh and Frank, 2014). One of the difficulties of the frontal midline theta, however, has been a lack of spatial specificity. Our results show that the dmPFC elicits feedback-related theta activity, even when controlling for coactivated areas such as the insular cortex or vmPFC. Much less is known about the temporal and oscillatory aspects of the vmPFC as these signals are often obscured by other signals with sources closer to the scalp electrodes. By controlling for the activity of the coactivated regions, we recovered a vmPFC signal that operates mainly in the alpha band. We note that the few studies that have investigated vmPFC activity using MEG and intracortical recordings (Oya et al., 2005;Hunt et al., , 2013Harris et al., 2013;Lipsman et al., 2014) have reported similar frequency bands as reported in our results.
In this study, we show that our fMRI-informed EEG approach can be of great value for answering questions about the timing of specific brain regions, and that it is robust against correlated sources and complementary scalp distributions. Our approach assumes a linear relationship between amplitude/power and BOLD signal strength. Although animal studies have suggested that the relation between BOLD and local field potentials is partly nonlinear and frequency-dependent (for review, see Rosa et al., 2010), a broad corpus of simultaneous EEG-fMRI studies demonstrate that despite simplifying assumptions, linear associations provide for sensitive and reliable results (Debener et al., 2006;Rosa et al., 2010). However, it should be noted that a directionality in coupling between EEG frequencies and BOLD is nontrivial (Scheeringa et al., 2011). This may explain why we observed a negative relationship between EEG alpha power and BOLD in the vmPFC, similar to previous EEG-fMRI reports ( Lüchinger et al., 2011;Scheeringa et al., 2011), but opposite to invasive recording findings (Oya et al., 2005). Moreover, it is noteworthy that our approach requires a relatively strong trial-by-trial variability of EEG-and fMRI-signals to determine the specific time courses. In our main analysis, we analyzed reward and punishment trials together. A separate analysis of rewards and punishments results in similar (albeit slightly weaker; data not shown) results for rewards, whereas punishment-only trials lack in power to show the same patterns. An increase in trial number of either kind would thus be beneficial for future studies.
In summary, we provide evidence for dissociable temporal patterns of the frontal midline areas dmPFC and vmPFC. Both areas express dissociable temporal signals with vmPFC preceding a dmPFC activation. This pattern supports the assumption of a functional architecture in the medial prefrontal cortex whereby vmPFC (operating in alpha bands) is involved primarily in valuation and value comparison and precedes dmPFC (operating in theta), which then uses this information in subsequent trials to select the most valuable actions.