Abstract
Recent neurophysiological accounts of predictive coding hypothesized that a mismatch of prediction and sensory evidence—a prediction error (PE)—should be signaled by increased gamma-band activity (GBA) in the cortical area where prediction and evidence are compared. This hypothesis contrasts with alternative accounts where violated predictions should lead to reduced neural responses. We tested these hypotheses by violating predictions about face orientation and illumination direction in a Mooney face-detection task, while recording magnetoencephalographic responses in a large sample of 48 human subjects. The investigated predictions, acquired via lifelong experience, are known to be processed at different time points and brain regions during face recognition.
Behavioral responses confirmed the induction of PEs by our task. Beamformer source analysis revealed an early PE signal for unexpected orientation in visual brain areas followed by a PE signal for unexpected illumination in areas involved in 3D shape from shading and spatial working memory. Both PE signals were reflected by increases in high-frequency (68–144 Hz) GBA. In high-frequency GBA we also observed a late interaction effect in visual brain areas, probably corresponding to a high-level PE signal. In addition, increased high-frequency GBA for expected illumination was observed in brain areas involved in attention to internal representations. Our results strongly support the hypothesis that increased GBA signals PEs. Additionally, GBA may represent attentional effects.
Introduction
The view of the brain as a “predictive machine” has gained considerable popularity in the last decade (Hawkins and Blakeslee, 2005; Clark, 2013; Hohwy, 2013). This notion implies that the brain relies on statistical regularities in the environment to construct internal predictions of its sensory inputs to facilitate perception. In many cases these statistical regularities are extracted from lifelong experience and form priors residing in implicit long-term memory. Yet, the mechanisms underlying the integration of experience-based information and sensory evidence during the perceptual process are still a matter of debate (Mumford, 1992; Rao and Ballard, 1999; Kersten et al., 2004; Friston, 2005, 2010; Grossberg, 2007, 2013; Spratling, 2008; Kay and Phillips, 2011). Opposing theories propose either signal suppression (Grossberg, 2007, 2013; Carpenter and Grossberg, 2010) or signal enhancement (Mumford, 1992; Rao and Ballard, 1999; Friston, 2005) in case of a mismatch of sensory evidence and information learned from previous experience.
According to predictive coding theory (Rao and Ballard, 1999) in particular, a mismatch between predictions based on priors from our experience and incoming information should result in a prediction error (PE), reflected by increased neural activity.
Anatomically, PEs are supposed to be propagated by feedforward connections (Rao and Ballard, 1999), originating in superficial cortical layers (Barone et al., 2000). As gamma-band activity (GBA) is prominent in the superficial layers of the cortical microcircuit (Buffalo et al., 2011; for review, see Wang, 2010), it has been suggested that the bottom-up propagation of PE signals is reflected in GBA (Arnal and Giraud, 2012; Bastos et al., 2012).
To test the hypothesis that PEs are reflected by increased neural activity versus alternative accounts that favor suppression of activity in case of violated predictions, we used MEG because of its high temporal and spatial resolution. This enabled the investigation of timing, anatomical location, and magnitude of PE signals at distinct hierarchical levels. Moreover, direct access to electrophysiological activity by MEG allowed us to specifically test whether GBA is the carrier of PE signals.
First evidence for PE signaling in GBA has been provided in recent MEG studies (Arnal et al., 2011; Todorovic et al., 2011; Bauer et al., 2014). The present study is, however, to our knowledge the first one to test this hypothesis for priors from lifelong experience while providing the spatial resolution to investigate PEs at different hierarchical levels and a high statistical power due to the large sample size of 48 subjects (for review on the problems caused by small sample sizes in neuroscience, see Button et al., 2013).
We induced PEs by a mismatch between the sensory input in a Mooney (Mooney and Ferguson, 1951) face-detection task and predictions based on priors from lifelong visual experience. The investigated priors “upright face orientation” and “illumination from the top” are supposed to be processed in different brain areas as well as at different time points during the face recognition process (Cavanagh, 1991), which we expect to be reflected by time-shifted PEs.
Materials and Methods
Experimental strategy
To investigate the neural correlates of PE signals we collected MEG responses while subjects performed a Mooney face-detection task (Mooney and Ferguson, 1951). Mooney stimuli cannot be recognized without relying on predictions based on priors from our lifelong experience (Moore and Cavanagh, 1998; Kemelmacher-Shlizerman et al., 2008). Here, we focused on two important priors for Mooney faces. First, faces normally appear in upright orientation (“orientation prior”; Yin, 1969; Valentine, 1988). Second, a scene is normally illuminated by a single light source from the top (“illumination prior”; Brewster, 1844; Sun and Perona, 1998; Adams, 2007; Gerardin et al., 2010).
To induce PEs, the presented stimuli were made incompatible with the orientation prior, the illumination prior, or both priors. To this end, we presented upright (UP) or inverted (IN) Mooney faces illuminated from the top (TP) or from the bottom (BT), which resulted in a 2 × 2 full factorial design (factors orientation and illumination) with four Mooney face conditions: UPTP, UPBT, INTP, and INBT. To counter a potential response bias, additional sham stimuli with matched image statistics were presented that did not contain a face.
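As a minimal sketch, the four face conditions follow from crossing the two factors. The variable names and the dictionary layout are illustrative assumptions and are not taken from the study's analysis code.

```python
from itertools import product

# Factor levels as abbreviated in the text.
orientation = ["UP", "IN"]    # upright, inverted
illumination = ["TP", "BT"]   # top, bottom

# Full 2 x 2 factorial crossing; a prior counts as violated for IN or BT.
conditions = {
    o + i: {"orientation_violated": o == "IN", "illumination_violated": i == "BT"}
    for o, i in product(orientation, illumination)
}

for name, flags in conditions.items():
    print(name, flags)
```

Only UPTP is compatible with both priors, and only INBT violates both.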
To formulate hypotheses about the expected timing of neural PE responses we draw on a behaviorally well validated process model for Mooney face recognition by Cavanagh (1991). This model suggests that the stimulus orientation should be processed before the illumination direction is evaluated. Hence, we assume that the PE response for the violation of the orientation prior should precede the PE response for the violation of the illumination prior.
Subjects
Fifty-nine subjects participated in the MEG experiment. Subjects had normal or corrected-to-normal visual acuity and were right-handed according to the Edinburgh Handedness Inventory scale (Oldfield, 1971). Each subject gave written informed consent before the beginning of the experiment. Subjects were paid 10€/h. Eleven subjects had to be excluded from further analysis; one subject was not able to tolerate the structural MRI scan, five subjects were excluded due to excessive movement or due to an insufficient amount of remaining trials after artifact rejection, and five more subjects were excluded based on their behavioral performance (see exclusion criteria below). Forty-eight participants (average age: 25.04 years, 22 males) remained and were considered for behavioral and neurophysiological analysis. The large sample size of 48 subjects was chosen to reduce the risk of false positives, as suggested by Button et al. (2013).
The local ethics committee (Johann Wolfgang Goethe University, Frankfurt, Germany) approved the experimental procedure.
Stimuli
Two-tone images, known as Mooney face stimuli (Mooney, 1957), were created by transforming all shades of gray in photographs of upright faces into either black or white. To investigate the violation of the orientation prior, Mooney face orientation was inverted. To investigate the violation of the illumination prior, the illumination source was set to light from the bottom, while light from the top corresponded to the expected illumination direction.
There was no significant difference in average local luminance between any of the four Mooney face conditions (p > 0.55). In addition, scrambled “No-Face” stimuli (SCR) were created from each of the Mooney face conditions by displacing white or black patches within the given background, so that all low-level information was maintained while the facial configuration was destroyed. The scrambled stimuli served as sham stimuli to avoid a response bias toward detecting faces. Examples of the stimuli can be seen in Figure 1.
All stimuli were resized to a resolution of 510 × 650 pixels. All stimulus manipulations were performed with the GNU Image Manipulation Program (GIMP 2.4; Free Software Foundation).
Stimulus presentation
A projector with a refresh rate of 60 Hz was used to display the stimuli at the center of a translucent screen (background set to gray, 145 cd/m²). Stimulus presentation was controlled using the Presentation software package (Version 9.90; Neurobehavioral Systems).
Stimuli were presented in a pseudorandomized order for a short time window of 0.2 s with a vertical visual angle of 20.8 and a horizontal visual angle of 16.2 degrees (white stimulus parts, 1140 cd/m²; black stimulus parts, 30 cd/m²). To avoid effects of fatigue, the overall experiment was divided into six blocks (134 stimuli per block) and subjects were allowed to take short breaks between blocks. In each block, 20 Mooney face stimuli of each face condition were presented together with No-Face stimuli in a 3:2 ratio (exact ratio 2.96:2) to counteract response bias, resulting in 80 Mooney face stimuli and 54 Scramble stimuli. The intertrial interval between stimulus presentations was randomly jittered from 3.5 to 4.5 s.
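The block composition stated above can be checked with a few lines of arithmetic. This is only a sketch; the variable names are illustrative, and the total across blocks is derived here rather than reported in the text.

```python
n_conditions = 4            # UPTP, UPBT, INTP, INBT
faces_per_condition = 20    # per block
scrambled_per_block = 54
n_blocks = 6

faces_per_block = n_conditions * faces_per_condition        # 80
stimuli_per_block = faces_per_block + scrambled_per_block   # 134
# Face : No-Face ratio expressed as x : 2.
ratio = 2 * faces_per_block / scrambled_per_block
total_stimuli = n_blocks * stimuli_per_block                # derived, not stated in the text

print(faces_per_block, stimuli_per_block, round(ratio, 2), total_stimuli)
```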
Task and instructions
Subjects performed a face-detection task on two-tone images and responded by pressing one of two buttons. The button assignment for a “Face” or “No-Face” response was counterbalanced across subjects (n = 24 right index finger for Face response). Subjects were instructed to respond only once and as precisely and quickly as possible. The subjects were informed about the ratio (3:2) of Faces to No-Faces in the presentation. Between stimulus presentations subjects were instructed to fixate a white cross on the center of the gray screen. Further, they were instructed to maintain fixation during the whole block. In addition, subjects were asked to suppress eye blinks during stimulus presentation and to avoid any movement during the acquisition session. Before data acquisition, subjects performed a test block of 2 min with stimuli not used during the actual task.
Data acquisition and exclusion criteria
MEG data acquisition was performed in line with recently published guidelines for MEG recordings (Gross et al., 2013). MEG signals were recorded using a whole-head system (Omega 2005; VSM MedTech) with 275 channels. The signals were recorded continuously at a sampling rate of 1200 Hz in a synthetic third-order gradiometer configuration and were filtered on-line with fourth-order Butterworth 300 Hz low-pass and 0.1 Hz high-pass filters.
Before and after each block the subject's head position relative to the gradiometer array was determined using three localization coils, one at the nasion and the other two located 1 cm anterior to the tragus of each ear on the nasion-tragus plane. Blocks with a head movement exceeding 5 mm were discarded from further MEG data analysis.
For artifact detection the horizontal and vertical EOG was recorded via four electrodes: two were placed distal to the outer canthi of the left and right eye (horizontal eye movements) and the other two were placed above and below the right eye (vertical eye movements and blinks). The impedance of each electrode was measured with an electrode impedance meter (Astro-Med) and was kept <15 kΩ.
Structural MR images were obtained with a 3 T Siemens Allegra or Trio scanner (Siemens Medical Solutions) using a standard T1 sequence (3D MPRAGE sequence, 176 slices, 1 × 1 × 1 mm voxel size). For the structural scans vitamin E pills were placed at the former positions of the MEG localization coils for coregistration of MEG data and MR images. Behavioral responses were recorded using a fiber optic response pad (Photon Control; LUMItouch Response System) in combination with Presentation software (Version 9.90; Neurobehavioral Systems). Participants were excluded from further analysis if a response bias was detected (5 of 59 subjects). For response bias detection we calculated the normalized c criterion (c(n); Green and Swets, 1966) from the performance of each participant. A mean response bias deviating more than 2 SDs from zero was chosen as the rejection criterion.
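For orientation, the following sketch computes the standard signal detection criterion c from a hit rate and a false alarm rate. The normalization that turns c into c(n) is not specified in the text, so only the plain criterion is shown; the function name `criterion_c` and the example rates are hypothetical.

```python
from statistics import NormalDist

def criterion_c(hit_rate, false_alarm_rate):
    """Signal detection criterion c = -(z(H) + z(FA)) / 2.

    Both rates must lie strictly between 0 and 1.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return -0.5 * (z(hit_rate) + z(false_alarm_rate))

# Hypothetical rates, for illustration only.
unbiased = criterion_c(0.80, 0.20)   # symmetric rates give c close to 0
liberal = criterion_c(0.95, 0.40)    # a tendency to answer "Face" gives c < 0

print(round(unbiased, 3), round(liberal, 3))
```

A value near zero indicates no preference for either response, which is why deviation from zero can serve as a rejection criterion.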
Statistical analysis of behavioral data
Responses were classified as correct or incorrect based on the subject's first answer. For the hit rate (HR) analysis, the accuracy for each condition was calculated. For the reaction time (RT) analysis only correct responses were considered.
HRs and RTs were subjected to separate 2 × 2 repeated-measurements permutation ANOVAs (Anderson and Ter Braak, 2003; Suckling and Bullmore, 2004). To test whether the standard F statistics obtained for the main effects and the interaction were likely to have occurred by chance, the condition labels of the original data were permuted across conditions. The F value of the original data was then tested against an empirical distribution of F values constructed from 5000 datasets with such randomly permuted condition labels. Each main effect and the interaction were tested separately. F values larger than the 95th percentile of the distribution of F values obtained for the permuted datasets were considered to be significant at an α-level of 0.05. For the main effects, condition labels were permuted between the two levels of the tested factor within each subject, but permutations were restricted to occur within the level of the other factor, e.g., the orientation effect labels for UPTP and INTP were considered to be exchangeable, but labels of UPTP and INBT were not exchangeable. By keeping the labels of the other factor fixed, we aimed to avoid any confounds due to the variability introduced by the factor not currently of interest. For calculation of the interaction effect, condition labels were permuted across levels of both factors within subjects. In contrast to standard F tests, nonparametric permutation tests avoid the assumption of normality and are therefore recommended when testing non-Gaussian data as they are frequently encountered in behavioral measurements.
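The restricted permutation scheme for a main effect can be sketched as follows. This is a simplified illustration, not the authors' implementation: it relies on the fact that in a 2 × 2 within-subject design the F statistic of a main effect (1 degree of freedom) equals the squared paired t statistic on the marginal means, and the data are randomly generated, hypothetical reaction times.

```python
import random
from statistics import mean, stdev

def orientation_f(data):
    """F statistic (1 df) of the orientation main effect, as a squared paired t."""
    diffs = [(s["INTP"] + s["INBT"]) / 2 - (s["UPTP"] + s["UPBT"]) / 2 for s in data]
    t = mean(diffs) / (stdev(diffs) / len(diffs) ** 0.5)
    return t * t

def permute_orientation(data, rng):
    """Swap UP/IN labels within each subject, separately for each illumination level."""
    permuted = []
    for s in data:
        p = dict(s)
        for illum in ("TP", "BT"):
            if rng.random() < 0.5:
                p["UP" + illum], p["IN" + illum] = s["IN" + illum], s["UP" + illum]
        permuted.append(p)
    return permuted

def permutation_p(data, n_perm=5000, seed=0):
    """Proportion of permuted F values at least as large as the observed one."""
    rng = random.Random(seed)
    observed = orientation_f(data)
    null = [orientation_f(permute_orientation(data, rng)) for _ in range(n_perm)]
    return sum(f >= observed for f in null) / n_perm

# Hypothetical reaction times (s) for 12 subjects with a built-in orientation effect.
gen = random.Random(1)
data = []
for _ in range(12):
    base = gen.gauss(0.65, 0.05)
    data.append({
        "UPTP": base + gen.gauss(0, 0.01),
        "UPBT": base + 0.02 + gen.gauss(0, 0.01),
        "INTP": base + 0.07 + gen.gauss(0, 0.01),
        "INBT": base + 0.09 + gen.gauss(0, 0.01),
    })

p_value = permutation_p(data, n_perm=1000)
print(p_value)
```

Because labels are never exchanged between illumination levels, variability caused by the illumination factor cannot leak into the null distribution of the orientation effect.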
For post hoc testing, a Wilcoxon signed rank test was performed for each simple effect and a sequential Bonferroni–Holm correction (Holm, 1979) was applied to account for multiple comparisons (uncorrected α-level = 0.05).
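A sequential Bonferroni–Holm correction can be sketched as below. The function name `holm_reject` is hypothetical, and treating the four hit-rate simple effects reported in the Results as one family of tests is an assumption made for illustration.

```python
def holm_reject(p_values, alpha=0.05):
    """Return one boolean per test: True where Holm's step-down procedure rejects."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p value (k = rank + 1) is compared with alpha / (m - k + 1).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # all remaining (larger) p values are not rejected either
    return reject

# p values of the four simple effects reported for hit rates.
hr_p = [6.6e-9, 1.63e-9, 0.046, 2.14e-8]
print(holm_reject(hr_p))
```

The largest p value is compared with the uncorrected α-level, which is why an effect with p = 0.046 can survive the correction when all smaller p values are rejected first.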
MEG data analysis
Preprocessing.
Data analysis was performed with MATLAB (RRID:nlx_153890; MATLAB 2008b; The MathWorks) and the open source MATLAB toolbox FieldTrip (RRID:nlx_143928; Oostenveld et al., 2011; Version 2012 01-05).
Trials were defined from 0.55 s before to 0.55 s after stimulus onset. The time point of the stimulus onset was adjusted to take the projector delay into account. Trials containing sensor jump artifacts, eye movement artifacts, or muscle artifacts were rejected using automatic FieldTrip artifact-rejection routines. In addition, EOG channels were checked manually for horizontal and vertical eye movements. Only trials with correct behavioral responses were taken into account for MEG data analysis. To avoid potential effects of button press-related motor activity, we analyzed only data up to 0.350 s after stimulus onset.
Spectral analysis at the sensor level.
A multitaper approach (Percival and Walden, 1993) based on Slepian sequences (Slepian, 1978) was used for time-frequency transformation. The transformation was applied in an interval from 2 to 150 Hz in 2 Hz steps and in a time window of 0.400 s–0.050 s before (baseline) and 0–0.350 s after stimulus onset (task).
For each frequency, we considered an adaptive sliding time window with a width of 7 divided by the frequency in Hertz and an adaptive frequency smoothing, with a factor of 0.2 times the frequency, resulting in two tapers for each frequency. Time-frequency representations (TFRs) for the combined face conditions (UPTP, UPBT, INTP, and INBT) were averaged over time to obtain an average frequency representation for the task and baseline period, respectively. To identify frequency bands for subsequent beamformer analysis, we compared the spectral power in the task interval for all subjects and the combined face conditions with the baseline spectral power using a dependent-sample permutation t test and a cluster-based correction method (Maris and Oostenveld, 2007) to account for multiple comparisons across frequency and sensors. Clusters were defined as (spatially and spectrally) adjacent samples whose t values exceeded a critical threshold corresponding to an uncorrected α-level of 0.05. Cluster sizes were defined by taking the sum of t values of a given cluster. During the randomization procedure labels of task and baseline data were randomly reassigned within each subject. Cluster sizes observed for the original dataset were then tested against the distribution of cluster sizes obtained from 1000 permuted datasets. Cluster values larger than the 95th percentile of the distribution of cluster sizes obtained for the permuted datasets were considered to be significant. We found a significant positive and a significant negative cluster (Fig. 3). To delineate frequency bands for these clusters, we identified the points of maximum curvature in the spectrum by visual inspection. 
Based on the points of maximum curvature (excluding the maximum turning points for positive values and minimum turning points for negative values), we determined four nonoverlapping frequency intervals for subsequent beamformer source analysis: (1) 14–28 Hz (beta), (2) 28–56 Hz (low gamma), (3) 56–68 Hz (mid gamma), and (4) 68–144 Hz (high gamma).
Note that current recommendations for best practice favor source-level statistics over statistics at the sensor level (Gross et al., 2013). Therefore, we only performed the minimally necessary statistics for a choice of frequency bands at the sensor level, while all other (orthogonal) statistical tests were performed at beamformer source level.
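The cluster statistic used in the sensor-level test can be illustrated in one dimension. The sketch below sums t values over runs of adjacent supra-threshold samples along a single axis; the actual analysis clusters across sensors and frequencies, and the threshold and t values used here are hypothetical.

```python
def cluster_masses(t_values, threshold):
    """Sum t values over runs of adjacent samples beyond +threshold or -threshold."""
    masses, current, sign = [], 0.0, 0
    for t in t_values:
        s = 1 if t > threshold else -1 if t < -threshold else 0
        if s != 0 and s == sign:
            current += t                  # extend the running cluster
        else:
            if sign != 0:
                masses.append(current)    # close the previous cluster
            current, sign = (t, s) if s != 0 else (0.0, 0)
    if sign != 0:
        masses.append(current)
    return masses

# Hypothetical t values along a frequency axis.
t_spectrum = [0.5, 2.5, 3.0, 1.0, -2.2, -2.8, -0.3, 2.1]
print(cluster_masses(t_spectrum, threshold=2.0))
```

In the permutation step, the same computation is repeated on data with shuffled task and baseline labels, and the observed cluster sizes are compared with the resulting distribution.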
Source grid creation.
To create individual source grids we transformed the anatomical MR images to a standard T1 template from the SPM8 toolbox (http://www.fil.ion.ucl.ac.uk/spm) in MNI space (Collins et al., 1994) obtaining an individual transformation matrix for each subject. We then warped a regular 3D dipole grid based on the standard T1 template (spacing 10 mm) with the inverse of the transformation matrix to obtain an individual dipole grid for each subject in subject space. This way each specific grid point was located at the same brain area for each subject, which allowed us to perform source analysis with individual head models as well as multisubject statistics for all grid locations. Lead fields at those grid locations were computed for the individual subjects with a realistic single shell forward model (Nolte, 2003).
Beamformer source power analysis.
Beamformer source analysis was performed using the dynamic imaging of coherent sources (DICS) algorithm, a frequency-domain beamformer (Gross et al., 2001) implemented in the FieldTrip toolbox. While the DICS algorithm was designed to compute source coherence estimates, we used real-valued filter coefficients only and thus restricted our analysis to the local source power (Grützner et al., 2010). The real part of the filters reflects the propagation of the magnetic fields from sources to sensors, as this process is supposed to happen instantaneously (Nunez and Srinivasan, 2006). Beamformer analysis uses an adaptive spatial filter to estimate the power at every specific location of the brain. The spatial filter is constructed from the individual lead fields and the cross-spectral density matrix for each subject. Cross-spectral density matrices were computed for the task period of 0–0.350 s after stimulus onset and the baseline period of 0.400–0.050 s before stimulus onset in four bands based on the statistical analysis of spectral power at the sensor level (spectral smoothing indicated in brackets): 21 Hz (±7 Hz), 42 Hz (±14 Hz), 62 Hz (±6 Hz), 106 Hz (±38 Hz). Cross-spectral density matrix calculation was performed using the FieldTrip toolbox with the multitaper method (Percival and Walden, 1993) using 3, 4, 9, or 26 Slepian tapers (Slepian, 1978), depending on the required spectral smoothing. We used a regularization of 5% (Brookes et al., 2008).
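The center frequencies and spectral smoothing values listed above follow directly from the band edges chosen at the sensor level. The short sketch below reproduces them; the function and variable names are illustrative.

```python
# Band edges in Hz as determined at the sensor level.
bands = {
    "beta": (14, 28),
    "low gamma": (28, 56),
    "mid gamma": (56, 68),
    "high gamma": (68, 144),
}

def center_and_smoothing(low, high):
    """Center frequency and half-bandwidth (Hz) that together cover [low, high]."""
    return (low + high) / 2, (high - low) / 2

settings = {name: center_and_smoothing(*edges) for name, edges in bands.items()}

for name, (center, smoothing) in settings.items():
    print(f"{name}: {center:g} Hz (±{smoothing:g} Hz)")
```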
Beamformer filters were computed as “common filters” based on the activation and baseline data across all conditions. Using common filters for activation, baseline, and all conditions allows for subsequent testing for differences between conditions, because it ensures that differences in source activity do not reflect differences between filters.
Spatial filtering of the sensor data for source statistics was then performed by projecting single trials through the common filter for each condition, task, and baseline separately.
Source statistics.
We used an equal number of trials for the beamformer analysis for each subject in all conditions to make sure that statistical differences were not caused by different numbers of trials. When the trial number differed across conditions for a subject, the minimal number of trials across conditions was selected randomly from the available trials in each condition.
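Equalizing trial counts by random subsampling can be sketched as follows. The function name `equalize_trials`, the fixed seed, and the trial counts are hypothetical and serve only to illustrate the procedure.

```python
import random

def equalize_trials(trials_by_condition, seed=0):
    """Randomly keep the same number of trials in every condition.

    The number kept is the smallest trial count found across conditions.
    """
    rng = random.Random(seed)
    n_min = min(len(trials) for trials in trials_by_condition.values())
    return {
        cond: sorted(rng.sample(trials, n_min))
        for cond, trials in trials_by_condition.items()
    }

# Hypothetical indices of artifact-free, correct trials for one subject.
trials = {
    "UPTP": list(range(110)),
    "UPBT": list(range(104)),
    "INTP": list(range(97)),
    "INBT": list(range(83)),
}
balanced = equalize_trials(trials)
print({cond: len(idx) for cond, idx in balanced.items()})
```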
Statistical testing was performed in two steps. At the first level, we computed a within-subject t test on the single trial data to obtain a test statistic for task versus baseline source activity for each condition (dual state beamformer; Huang et al., 2004). At the second level, the resulting t values for each grid point and condition across all subjects were subjected to a 2 × 2 repeated-measurements permutation ANOVA with factors stimulus orientation and illumination direction. Here, we aimed to identify the consistent effects of condition-dependent source-power changes across subjects. To account for multiple comparisons across voxels, a cluster-based correction method (Maris and Oostenveld, 2007) was used. Clusters were defined to be adjacent voxels whose F values exceeded a critical threshold corresponding to an uncorrected α-level of 0.05. Cluster sizes were defined the same way as for the sensor-level statistics and were then tested against the distribution of cluster sizes obtained from 5000 permuted datasets. Permutation strategies for main effects and the interaction were identical to the ones applied to the behavioral data. Cluster values larger than the 95th percentile of the distribution of cluster sizes obtained for the permuted datasets were considered to be significant. For illustration of the effects in bar charts, the t values of the significant voxels in each cluster were averaged for each condition and over all subjects.
Both the statistical procedure for the cluster-based analysis and the beamformer analysis parameters chosen for source power reconstruction were very similar to the approach applied by Grützner et al. (2010), who were able to show a close correspondence of the beamformer source locations recovered from MEG data and the locations revealed by fMRI in a Mooney faces task, supporting the validity of the method.
Post hoc source analysis.
To characterize the effects in more detail by examining the frequency and time ranges at which the conditions underlying the significant effects differed, a post hoc analysis was performed. For this purpose, the source time courses of all significant voxels obtained by the permutation ANOVA were extracted. To that end, raw data were filtered in a broad frequency range (8 Hz high pass, 150 Hz low pass). Then, we calculated a time-domain beamformer filter (linearly constrained minimum variance, LCMV; Van Veen et al., 1997) based on task and baseline intervals of all conditions (common filters; Nieuwenhuis et al., 2008). For each source location three orthogonal filters were computed (x, y, z direction). To obtain the source time courses, the broadly filtered raw data were projected through the LCMV filter. Subsequently, the 3D direction carrying the largest variance, indicating the dominant dipole orientation, was identified using a singular value decomposition.
For each source time course a time-frequency transformation was applied with the same parameters as for the sensor-level analysis but only in the relevant frequency range (high-gamma frequency range). Source time-frequency spectral power was transformed to relative change values by subtracting the average baseline power at each frequency and by subsequently dividing by it.
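The baseline normalization described above amounts to (power − baseline) / baseline at each frequency. A minimal sketch for a single frequency, with hypothetical power values and an illustrative function name:

```python
def relative_change(task_power, baseline_power):
    """Relative power change with respect to the average baseline power.

    Both arguments are sequences of power values over time at one frequency.
    """
    baseline = sum(baseline_power) / len(baseline_power)
    return [(p - baseline) / baseline for p in task_power]

# Hypothetical power values at one frequency.
print(relative_change([2.0, 3.0, 1.0], [2.0, 2.0, 2.0]))
```

A value of 0.5 corresponds to a 50% increase over baseline, and negative values indicate a decrease.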
To determine the time and frequency ranges of the differential activations underlying the main or interaction effects, time-frequency transformations were averaged across voxels within each significant cluster of the permutation ANOVA and subjected to a post hoc dependent-samples permutation t test. When investigating the main effects, we additionally averaged over the two levels of the other (i.e., currently not tested) factor. For example, for the main effect of orientation, we calculated the mean of inverted stimuli (INTP and INBT) and the mean of upright stimuli (UPTP and UPBT) across all voxels and contrasted the resulting TFRs with the permutation t test. Condition labels were randomly reassigned within each subject between the two levels of the tested factor during the randomization procedure. For the main effect of illumination, the mean of stimuli illuminated from the bottom (UPBT and INBT) and the mean of stimuli illuminated from the top (UPTP and INTP) were contrasted. For the interaction effect we first calculated the orientation difference for stimuli illuminated from the bottom (UPBT-INBT) and from the top (UPTP-INTP) and contrasted the resulting difference TFRs using the permutation t test. To account for multiple comparisons across frequency and time bins a cluster-based correction method (Maris and Oostenveld, 2007) was used. For one of the effects the post hoc test did not reach significance with the cluster-based correction method and only uncorrected t values are reported (Fig. 5C).
To obtain the time points of the strongest differential activation for each effect, the difference in the averaged TFR between the two levels of the tested factor (e.g., upright and inverted stimuli for the orientation effect) was further averaged over the relevant frequency range and plotted over time. Only the peaks in the significant time ranges identified by the post hoc tests are reported. For one of the effects, for which the post hoc test did not reach significance with the cluster-based correction method, both main peaks are reported.
Correlation of high-frequency GBA with reaction times
Pearson's correlations were calculated to assess the relationship between per subject mean reaction times and baseline corrected high-frequency GBA averaged over the significant cluster obtained for each effect.
Before correlation, RT and GBA for each subject were averaged over upright (UPTP and UPBT) and inverted (INTP and INBT) conditions for the orientation effect, conditions illuminated from the top (UPTP and INTP) and from the bottom (UPBT and INBT) for the illumination effect, and congruent (UPTP and INBT) and incongruent (UPBT and INTP) conditions for the interaction effect.
To focus on the effects of potential PEs, we subtracted each subject's mean GBA at the significant source locations across the four face conditions, as well as the subject's mean RT across the four face conditions, from the individual GBA and RT values, respectively. This subtraction corrects for individual differences in GBA (Hoogenboom et al., 2006) as well as in behavioral speed between subjects (Kanai and Rees, 2011), e.g., related to variations in the myelination of motor fibers.
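The subject-wise mean subtraction and the subsequent correlation can be sketched as follows. The helper names and all numbers are hypothetical, and how the centered values are pooled across subjects before correlation is an assumption, since the text does not spell it out.

```python
from statistics import mean

def center_within_subject(values_by_condition):
    """Subtract the subject's mean across the four face conditions."""
    m = mean(values_by_condition.values())
    return {cond: v - m for cond, v in values_by_condition.items()}

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical reaction times (s) of one subject.
rt = {"UPTP": 0.60, "UPBT": 0.62, "INTP": 0.68, "INBT": 0.70}
rt_centered = center_within_subject(rt)

# Hypothetical centered values pooled over subjects and factor levels.
rt_pooled = [-0.04, 0.04, -0.03, 0.03, -0.05, 0.05]
gba_pooled = [-0.10, 0.12, -0.02, 0.05, -0.08, 0.06]
print(round(pearson_r(rt_pooled, gba_pooled), 3))
```

Centering removes offsets that are constant within a subject, so the correlation reflects condition-related variation rather than overall differences between subjects.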
Results
Behavioral analysis
To assess the behavioral effects of the violation of the orientation and illumination prior, we analyzed the HRs and the RTs of correct responses by means of a permutation ANOVA (see Materials and Methods). Post hoc Wilcoxon signed rank tests were used to investigate the simple effects underlying the interactions for HR and RT (Fig. 2). Statistical results are summarized in Table 1.
Hit rates
Subjects made the fewest mistakes in detecting faces when both priors were met (avg. HR(UPTP) = 94.38%) and made the most mistakes when both priors were violated (avg. HR(INBT) = 68.84%), suggesting the induction of PEs by our task design.
The permutation ANOVA revealed a main effect of orientation (p = 0.0002) and illumination (p = 0.0002), as well as an interaction between the two factors (p = 0.0002). Higher HRs were found for the upright than for the inverted Mooney faces. Also, higher HRs were found for the Mooney faces illuminated from the top than for the Mooney faces illuminated from the bottom.
Post hoc tests revealed that violating the orientation prior led to a decrease in HRs for faces illuminated from the top (p = 6.6 × 10−9; avg. HR(UPTP) − avg. HR(INTP) = 13.3%) as well as for faces illuminated from the bottom (p = 1.63 × 10−9; avg. HR(UPBT) − avg. HR(INBT) = 24.1%). HR also decreased when the illumination prior was violated for upright (p = 0.046; avg. HR(UPTP) − avg. HR(UPBT) = 1.4%) and inverted Mooney faces (p = 2.14 × 10−8; avg. HR(INTP) − avg. HR(INBT) = 12.2%).
Reaction times
Subjects responded fastest when both priors were met (avg. RT(UPTP) = 0.614 s) and responded slowest when both were violated (avg. RT(INBT) = 0.723 s), which is also in line with the induction of PEs by our task design.
We found main effects of orientation and illumination for the RTs (p = 0.0002), as well as an interaction between the two factors (p = 0.0002). Shorter RTs were found for upright than for inverted Mooney faces. RTs were also shorter for Mooney faces illuminated from the top than for Mooney faces illuminated from the bottom.
Violating the orientation prior led to increases in RTs for faces illuminated from the top (p = 1.63 × 10−9; avg. RT(INTP) − avg. RT(UPTP) = 0.0710 s) and for faces illuminated from the bottom (p = 1.11 × 10−8; avg. RT(INBT) − avg. RT(UPBT) = 0.0899 s), as revealed by the post hoc Wilcoxon signed rank tests.
Further, an increase in RT was detected when the illumination prior was violated for the upright Mooney faces (p = 0.0035; avg. RT(UPBT) − avg. RT(UPTP) = 0.0190 s). The violation of the illumination prior had an even more severe effect on the detection of Mooney faces in inverted orientation (p = 2.22 × 10−7; avg. RT(INBT) − avg. RT(INTP) = 0.0379 s).
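The interaction can be read off the reported simple effects as a difference of differences. The sketch below uses only the RT differences stated above; the variable names are illustrative.

```python
# Simple-effect RT differences in seconds, as reported above.
orientation_cost = {"TP": 0.0710, "BT": 0.0899}    # inverted minus upright
illumination_cost = {"UP": 0.0190, "IN": 0.0379}   # bottom minus top

# In a 2 x 2 design both routes must yield the same interaction contrast.
interaction_from_orientation = orientation_cost["BT"] - orientation_cost["TP"]
interaction_from_illumination = illumination_cost["IN"] - illumination_cost["UP"]

print(round(interaction_from_orientation, 4), round(interaction_from_illumination, 4))
```

Both routes give about 0.019 s: violating one prior costs roughly 19 ms more when the other prior is violated as well.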
The orientation effect on RT as well as HR was stronger than the illumination effect (RT: p = 3.23 × 10−8; HR: p = 1.63 × 10−9).
Neural responses
We performed a time-resolved beamformer source analysis of MEG activity to assess the PE responses in source space that corresponded to the violations of illumination and orientation priors. To this end we first identified the relevant frequency bands for beamformer analysis by statistically comparing the sensor activity in the task interval (0–350 ms) for all face conditions and correct trials with the baseline activity. This analysis revealed a cluster with task-related increases in activity over occipital, parietal, and temporal sensors and a cluster with task-related decreases over frontal, parietal, and temporal sensors (Fig. 3). The spectral profile of the two clusters was used to determine four nonoverlapping frequency intervals for beamformer source analysis: (1) 14–28 Hz (beta), (2) 28–56 Hz (low gamma), (3) 56–68 Hz (mid gamma), (4) 68–144 Hz (high gamma).
Note that all later statistical comparisons were performed in source space, as this is strongly recommended in the recently published guidelines for MEG analyses (Gross et al., 2013). Moreover, we note that all subsequent statistical comparisons were orthogonal to the one used for identifying the frequency bands of interest, i.e., there is no double dipping (Kriegeskorte et al., 2009).
High-gamma frequency range (68–144 Hz)
Orientation effect.
In the high-gamma frequency range, we observed a main effect of orientation (cluster-based permutation ANOVA, p = 0.0154) at the occipital pole (V2), right superior occipital gyrus, left middle occipital gyrus, and left fusiform gyrus (Fig. 4A, left column; see Table 2 for MNI coordinates of peak voxels). At these areas, power compared with baseline was higher for inverted Mooney faces than for upright Mooney faces (Fig. 4A, right column). Post hoc analysis revealed two significant clusters, the first one peaking at 80 ms and the second one at 270 ms after stimulus onset (Fig. 5A, middle and bottom rows). The orientation effect involved the high-gamma frequency range from 76 to 120 Hz (Fig. 5A, middle row).
Illumination effect.
We found a main effect of illumination (cluster-based permutation ANOVA, p = 0.012) in a cluster located in right superior frontal gyrus/superior frontal sulcus (SFS), medial MFC, and anterior cingulate gyrus (ACG; Fig. 4B, left column; see Table 2 for MNI coordinates of peak voxels). At these locations power compared with baseline was higher for Mooney faces with illumination from the bottom than for Mooney faces with illumination from the top (Fig. 4B, right column). Post hoc analysis revealed a significant frequency range from 78 to 112 Hz and a peak time at ∼120 ms after stimulus onset (Fig. 5B, middle and bottom rows).
A second cluster for the main effect of illumination (cluster-based permutation ANOVA, p = 0.011) had a maximum located at right supramarginal gyrus (SMG) in the inferior parietal lobule, but extended also to the inferior temporal gyrus (Fig. 4C, left column; see Table 2 for MNI coordinates of peak voxels). At these locations, power compared with baseline was higher for Mooney faces with illumination from the top than for Mooney faces with illumination from the bottom (Fig. 4C, right column). This difference peaked at ∼135 and 310 ms after stimulus onset and was most pronounced between 75 and 144 Hz (Fig. 5C, middle and bottom rows).
Interaction effect.
In the high-gamma frequency range an interaction effect of the factors illumination and orientation (cluster-based permutation ANOVA, p = 0.002) was also observed. The cluster was located at left superior parietal lobe/precuneus, occipital pole (V2), right inferior occipital gyrus, right lingual gyrus, and the right cerebellum (Fig. 4D, left column; see Table 2 for MNI coordinates of peak voxels). Here, source power compared with baseline was higher for the UPBT and INTP conditions than for the INBT and UPTP conditions (Fig. 4D, right column). The interaction effect involved a significant frequency interval from approximately 68 to 96 Hz and had a peak at 210 ms after stimulus onset (Fig. 5D, middle and bottom rows).
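The direction of the main and interaction effects in this 2 × 2 design (orientation: UP vs IN; illumination: TP vs BT) can be made explicit as contrasts over the four condition means. The sketch below uses hypothetical baseline-normalized power values chosen only to show a pure interaction pattern; the function name and the scaling by one half are illustrative assumptions, not the study's analysis code.

```python
def factorial_contrasts(power):
    """Main-effect and interaction contrasts for the 2 x 2 design.
    `power` maps a condition label (UPTP, UPBT, INTP, INBT) to mean power."""
    up_tp, up_bt = power["UPTP"], power["UPBT"]
    in_tp, in_bt = power["INTP"], power["INBT"]
    return {
        # inverted minus upright, averaged over illumination
        "orientation": ((in_tp + in_bt) - (up_tp + up_bt)) / 2,
        # bottom minus top illumination, averaged over orientation
        "illumination": ((up_bt + in_bt) - (up_tp + in_tp)) / 2,
        # (UPBT, INTP) minus (UPTP, INBT): the pattern described in the text
        "interaction": ((up_bt + in_tp) - (up_tp + in_bt)) / 2,
    }

# Hypothetical power values (arbitrary units) showing a pure interaction.
example_power = {"UPTP": 1.0, "UPBT": 1.5, "INTP": 1.5, "INBT": 1.0}
contrasts = factorial_contrasts(example_power)
```

In this constructed example both main-effect contrasts are zero while the interaction contrast is positive, which mirrors higher power for UPBT and INTP than for INBT and UPTP.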
Beta (14–28 Hz), low (28–56 Hz), and mid gamma (56–68 Hz) frequency range
No significant main or interaction effects were found in the beta, low-gamma, and mid-gamma frequency ranges.
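The logic of the cluster-based permutation statistics used throughout this section can be illustrated with a deliberately simplified sketch. It is not the analysis performed in the study: instead of an ANOVA in source space, it assumes a paired design with a one-sample t value per time point, a single time axis, one-sided positive clusters, random sign flips across subjects, and a hypothetical cluster-forming threshold of 2.0. The data are constructed deterministically for illustration.

```python
import math
import random

def t_values(diffs):
    """One-sample t value per time point; diffs[s][t] is subject s's
    condition difference at time point t."""
    n = len(diffs)
    tvals = []
    for t in range(len(diffs[0])):
        col = [row[t] for row in diffs]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / (n - 1)
        tvals.append(mean / math.sqrt(var / n) if var > 0 else 0.0)
    return tvals

def max_cluster_mass(tvals, thresh):
    """Largest sum of t values over a run of adjacent points with t > thresh."""
    best = current = 0.0
    for t in tvals:
        current = current + t if t > thresh else 0.0
        best = max(best, current)
    return best

def cluster_permutation_p(diffs, thresh=2.0, n_perm=500, seed=0):
    """Compare the observed maximum cluster mass with its sign-flip null."""
    rng = random.Random(seed)
    observed = max_cluster_mass(t_values(diffs), thresh)
    exceed = 0
    for _ in range(n_perm):
        signs = [rng.choice((-1, 1)) for _ in diffs]  # one sign per subject
        flipped = [[s * v for v in row] for s, row in zip(signs, diffs)]
        if max_cluster_mass(t_values(flipped), thresh) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)

# Hypothetical data: 12 subjects, 10 time points, an effect at points 3 to 6.
diffs = [
    [(2.0 if 3 <= t <= 6 else 0.0) + 0.5 * ((7 * s + 3 * t) % 5 - 2)
     for t in range(10)]
    for s in range(12)
]
observed_mass, p_cluster = cluster_permutation_p(diffs)
```

Because only the maximum cluster mass of each permutation enters the null distribution, a single test covers all time points, which is the property that controls the family-wise error rate in this class of methods.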
Correlation of high-frequency gamma-band responses and RT
Correlating high-frequency gamma-band responses with RT revealed a significant positive correlation at the source locations of the orientation effect (r = 0.37, p = 0.00019; Fig. 6A) and the first illumination effect (r = 0.43, p = 8 × 10⁻⁶; Fig. 6B). A significant negative correlation was found at the source locations of the interaction effect (r = −0.32, p = 0.0011; Fig. 6D) and a tendency toward a negative correlation was found at the locations of the second illumination effect (r = −0.17, p = 0.09; Fig. 6C).
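As an illustration of the kind of quantity reported here, the following sketch computes a correlation coefficient between GBA and RT values and converts it to a t statistic. It assumes a Pearson correlation, which this section does not state explicitly, and the paired values are hypothetical rather than taken from the study.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_from_r(r, n):
    """t statistic with n - 2 degrees of freedom for a correlation r (|r| < 1)."""
    return r * math.sqrt((n - 2) / (1 - r * r))

# Hypothetical paired observations (illustration only).
gba_power = [0.1, 0.3, 0.2, 0.5, 0.4]    # relative power change
rt_seconds = [0.55, 0.60, 0.58, 0.66, 0.61]
r_value = pearson_r(gba_power, rt_seconds)
t_value = t_from_r(r_value, len(gba_power))
```

A positive coefficient in this setting means that larger gamma-band power goes along with slower responses.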
Discussion
We tested whether PEs are reflected by increased neural activity versus the alternative of reduced neural activity for violated predictions. Using MEG with its direct access to electrophysiological activity allowed testing specifically whether PEs are signaled in GBA. PEs were induced by the violation of two priors based on lifelong visual experience: upright face orientation and illumination from the top. Deviations from these priors were embedded in a Mooney face-detection task (Mooney, 1957).
Behavioral findings confirmed the successful induction of PEs by our task. In addition, neuronal activity at task-specific brain locations was increased when priors were violated, in line with the concept of PEs in predictive coding theory (Rao and Ballard, 1999). Importantly, this increase in neuronal activity was indeed observed in GBA (>68 Hz), the frequency range thought to be associated with the bottom-up propagation of PEs (Arnal and Giraud, 2012; Bastos et al., 2012). These findings strongly support the notion that increased (high-frequency) GBA reflects PEs. No PE signals were found in any of the lower frequency bands, suggesting that PEs are mainly represented in high-frequency GBA.
However, for the violation of the illumination prior we additionally found decreased GBA in posterior parietal brain areas, which may represent decreased attention to internal mnemonic representations (Wagner et al., 2005). Hence, we suggest that the high-frequency GBA not only signals PEs, but also attentional effects, in line with previous results (Fries et al., 2001).
Violations decrease accuracy and increase RTs
Behavioral responses were slower and less accurate when priors were violated. This is in line with other behavioral phenomena accounted for by predictive coding such as priming and global precedence (Friston, 2005) and validates that our task design successfully induced PEs.
Notably, the violation of the orientation prior had a higher impact on HRs and RTs than the violation of the illumination prior. This difference may be explained as follows. While a robust inversion effect is found in face perception (Yin, 1969 for photographic faces; Rodriguez et al., 1999 for Mooney faces), the illumination prior varies substantially between individuals (Adams, 2007) and can be altered with experience (Adams et al., 2004). Thus, the stronger behavioral effect of the violation of the orientation prior is in line with a precision-weighting of PEs (Friston and Kiebel, 2009; Adams et al., 2013) based on the higher precision of the orientation prior than the illumination prior.
Cortical source power changes in high-frequency GBA reflect PEs
For the violation of the orientation prior we expected that the neural correlate of a PE should arise before any illumination effect, and that it would be signaled by GBA increases. Indeed, at 80 ms after stimulus onset, before any effect of illumination, we observed the first of two significant clusters of increased high-frequency GBA for the violation of the orientation prior in early visual areas. These areas have been linked to contour integration (Kourtzi et al., 2003). The contour-integration role of these areas combined with the early latency of the orientation effect supports its interpretation as reflecting PEs arising for unexpected face orientations. This is because contour processing areas are suitable candidate locations for an orientation PE as an early (2D) contour match to internal templates was suggested as the first stage of Mooney face recognition (Cavanagh, 1991). Since the stimulus contour pattern of the inverted faces does not match the expected template contour pattern of upright faces, a specific PE in contour processing brain areas is supposed to arise for inverted stimuli at this early processing stage.
An orientation-related PE could also arise in areas tuned to specific, illumination-invariant, coarse-grained luminance contrasts in faces, because these seem to play a role in face processing (Ohayon et al., 2012). This specific tuning was reported in the macaque middle face patch (MFP), making its homolog, the fusiform face area, a candidate for orientation PEs. However, MFP cells seem to be preferentially active for contrasts matching environmental priors, additionally requiring embedding of the contrast in a face-like pattern. This latter condition is not well met in Mooney stimuli, potentially reducing any effects of changes in luminance contrasts with orientation in our study.
For the violation of the illumination prior we expected that the correlate of a PE should arise after the first orientation-related effect. Again, we expected this PE to be signaled by increased GBA. We observed increased high-frequency GBA for violation of the illumination prior at 120 ms after stimulus onset, and thus 40 ms after the first orientation effect. This effect was located in MFC, SFS, and ACG. Both timing and location of this effect support its interpretation as an illumination-related PE. This is because the illumination direction strongly influences the shading pattern of an image and shading cues are the only cues available in Mooney faces to reconstruct the 3D shape (Kemelmacher-Shlizerman et al., 2008). PEs are therefore likely to arise in areas involved in the processing of 3D shape from shading cues, such as the MFC (Taira et al., 2001). Additionally, SFS may be used to keep shading cues in working memory (Courtney et al., 1998) and ACG may support error detection (Botvinick et al., 2004). Thus, we interpret this illumination effect as a PE signal for the unexpected illumination.
We also observed an interaction effect with increased GBA for the UPBT and INTP conditions at precuneus, V2, and lingual gyrus, all three of which are involved in (global) shape processing (Fink et al., 1997; Hegdé and Van Essen, 2000; Tanskanen et al., 2008). This interaction effect occurred at 210 ms after stimulus onset. At this late time point, according to the process model of Cavanagh (1991), the shape of the sensory input is evaluated based on the interaction of light and 3D structure. The combination of these two properties of a scene can also be predicted based on prior experience. We expect upright face orientation to be combined with illumination from the top and, as it is probably more common to see photographs of inverted faces than actual inverted faces, we expect inverted face orientation to be combined with illumination from the bottom. This expected combination of orientation and illumination is violated in the INTP and UPBT conditions. Therefore, we interpret this late interaction effect at precuneus, V2, and lingual gyrus as a PE at a higher conceptual level.
GBA additionally reflects attentional effects
We observed a second illumination effect peaking at 135 and 310 ms after stimulus onset. For this illumination effect, we found a decrease of GBA for violation of the illumination prior mainly in SMG. Activity in this area may reflect deployment of attention to internal mnemonic representations, as stated in the attention to memory hypothesis (AtoM; Wagner et al., 2005). Accordingly, the SMG usually shows decreased BOLD fMRI activity for less familiar information (Wagner et al., 2005; Ciaramelli et al., 2008), potentially corresponding to unusual illumination conditions here. To link these fMRI findings to our MEG results, we draw on the well established positive correlation of the BOLD fMRI signal with GBA in MEG (Brookes et al., 2005). Taking this correlation into account, the observed decreased GBA for the stimuli with the less familiar illumination direction in the SMG may be an AtoM effect rather than a PE.
Thus, our results suggest that high-frequency GBA does not exclusively signal PEs, but also reflects attention. This attentional interpretation could be reconciled with an interpretation as a PE by the recent proposal that attention itself is implemented via gain modulation of PE units (Feldman and Friston, 2010). As our study was not designed to test this specific hypothesis, the interplay of attention, PEs, and GBA remains to be investigated.
Increased GBA for violations is associated with slower processing
High-frequency GBA at the locations of the orientation effect and the first illumination effect showed a positive correlation with RT. This relationship is compatible with longer RT reflecting the PE for violation of the orientation and illumination prior.
In contrast, the tendency toward a negative relationship of GBA and RT at the locations of the second illumination effect suggests that here increased GBA may speed up processing. This is in line with our interpretation that this effect does not represent a PE and also with a general negative correlation of GBA and RT from previous reports (Hoogenboom et al., 2010).
The above interpretation of the interaction effect as a high-level PE, however, is questioned by the negative correlation of GBA with RT at these locations. Nevertheless, it is possible that the consistency violation inducing the interaction is not performance relevant.
Conclusion
Our results strongly support the notion that PEs are signaled by increased high-frequency GBA (>68 Hz). This also holds for violation of priors from lifelong experience.
Footnotes
This work was supported by grants from the LOEWE NeFF. We thank William Phillips for comments on an earlier version of this manuscript.
The authors declare no competing financial interests.
Correspondence should be addressed to Michael Wibral, MEG Unit, Brain Imaging Center, Goethe Universität, Heinrich-Hoffmann Strasse 10, 60528 Frankfurt, Germany. wibral{at}bic.uni-frankfurt.de