Abstract
An image patch can be locally decomposed into sinusoidal waves of different orientations, spatial frequencies, amplitudes, and phases. The local phase information is essential for perception, because important visual features like edges emerge at locations of maximal local phase coherence. Detection of phase coherence requires integration of spatial frequency information across multiple spatial scales. Models of early visual processing suggest that the visual system should implement phase-sensitive pooling of spatial frequency information in the identification of broadband edges. We used functional magnetic resonance imaging (fMRI) adaptation to look for phase-sensitive neural responses in the human visual cortex. We found sensitivity to the phase difference between spatial frequency components in all studied visual areas, including the primary visual cortex (V1). Control experiments demonstrated that these results were not explained by differences in contrast or position. Next, we compared fMRI responses for broadband compound grating stimuli with congruent and random phase structures. All studied visual areas showed stronger responses for the stimuli with congruent phase structure. In addition, selectivity to phase congruency increased from V1 to higher-level visual areas along both the ventral and dorsal streams. We conclude that human V1 already shows phase-sensitive pooling of spatial frequencies, but only higher-level visual areas might be capable of pooling spatial frequency information across spatial scales typical for broadband natural images.
Introduction
The phase spectrum of natural images, rather than the amplitude spectrum, contains much of the perceptually important information in an image (Oppenheim and Lim, 1981; Wang and Simoncelli, 2004). Of particular interest here is the local phase spectrum, computed by Fourier analysis of small image regions or patches; in this paper, phase always means local phase unless otherwise mentioned. Important visual features are perceived at points of maximum local phase congruency (Morrone et al., 1986; Morrone and Owens, 1987; Morrone and Burr, 1988), where the phases of different spatial frequencies are aligned (Fig. 1A). Phase congruency is invariant to changes in image contrast and brightness, and therefore can be applied to broadband edge detection (Morrone and Burr, 1988; Kovesi, 1999). Figure 1B shows how edges in a natural image can be detected based on phase congruency (Kovesi, 1999).
Simple cells in the primary visual cortex (V1) may be approximated by oriented bandpass filters of different phase selectivity and size (Jones and Palmer, 1987; Ringach, 2002). Edges and lines, typical features of natural images, are broadband with a phase alignment that depends on the feature (Fig. 1A), and thus excite simple cells at a variety of different frequencies with appropriate phase relations. In the macaque V1, the distribution of preferred spatial phases seems to cluster into even- and odd-symmetric phase classes (Ringach, 2002). The local energy model of feature detection employs an even- and odd-symmetric spatial filter pair followed by a square-root operator applied to the sum of the squared filter responses (Morrone and Owens, 1987; Morrone and Burr, 1988). The peaks in the local energy function denote points of maximum phase congruency, and thus the perceptually important features. This model succeeds in detecting both real and illusory visual features (Morrone and Burr, 1988; Ross et al., 1989; Morrone et al., 1994).
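The local energy computation described above can be illustrated with a minimal one-dimensional sketch. It assumes idealized broadband filters, so that the even- and odd-symmetric outputs are simply the cosine and sine sums of the harmonic components; the function name, the sampling grid, and the "scrambled" phase values are illustrative assumptions and are not taken from the cited models.

```python
import math

def local_energy(x, amplitudes, harmonics, phases):
    """Local energy at position x (in radians of the fundamental).

    Assumes ideal quadrature filters: the even-symmetric output is the
    cosine sum and the odd-symmetric output is the sine sum of the components.
    """
    even = sum(a * math.cos(k * x + p)
               for a, k, p in zip(amplitudes, harmonics, phases))
    odd = sum(a * math.sin(k * x + p)
              for a, k, p in zip(amplitudes, harmonics, phases))
    # Square root of the sum of the squared filter responses
    return math.sqrt(even ** 2 + odd ** 2)

# First five components of a square wave (amplitudes 1, 1/3, 1/5, 1/7, 1/9)
harmonics = [1, 3, 5, 7, 9]
amplitudes = [1.0 / k for k in harmonics]

congruent = [0.0] * len(harmonics)        # all phases aligned at x = 0
scrambled = [0.0, 2.1, 4.0, 1.0, 5.5]     # arbitrary illustrative phases

n = 720
grid = [2 * math.pi * i / n for i in range(n)]
energy_congruent = [local_energy(x, amplitudes, harmonics, congruent) for x in grid]
energy_scrambled = [local_energy(x, amplitudes, harmonics, scrambled) for x in grid]

# Position of the energy peak for the congruent signal
peak_x = grid[max(range(n), key=lambda i: energy_congruent[i])]
# Energy can never exceed the sum of the component amplitudes
upper_bound = sum(amplitudes)
```

In this sketch the congruent signal reaches the upper bound exactly at the edge locations (x = 0 and x = π), whereas the scrambled-phase signal stays below it everywhere, which is the sense in which energy peaks mark points of maximal phase congruency.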
Although neurons even in the primary visual cortex may detect some phase congruencies (Morrone and Burr, 1988; Mechler et al., 2002), it is likely that such congruencies are mainly processed in the extrastriate cortex. A recent computational study on natural image statistics showed that pooling across multiple spatial frequencies is a statistically optimal way to analyze the output from V1 (Hyvärinen et al., 2005). Perna et al. (2008) showed sensitivity to phase congruency associated with broadband edges and lines in human V1, but more interestingly they showed that only higher-level areas in the ventral and dorsal stream were capable of identifying the phase type (edge or line).
Here we studied with functional magnetic resonance imaging (fMRI) how the human visual cortex encodes spatial phase alignments. We used fMRI adaptation to explore phase-sensitive pooling of spatial frequencies. Responses in several visual areas, including V1, showed an increase in the fMRI response as a function of the change in the cross-frequency phase difference relative to the adapting stimulus. Then, we explored selectivity to congruent phase alignments in broadband stimuli. All studied visual areas showed stronger responses for the stimuli with congruent phase structure, and this selectivity increased along the hierarchy of visual areas. The results suggest that phase-sensitive pooling of spatial frequencies takes place already in human V1, but only higher-level visual areas are capable of pooling across multiple spatial scales.
Materials and Methods
Subjects
Eight subjects participated in experiment 1 [subjects S1–S8, ages 26–43, 2 females (f)], seven subjects in experiment 2 (S1–S6, S9, ages 26–43, 1 f), two subjects in control experiment 1 (S1, S2, ages 27, 43, 1 f), six subjects in control experiment 2 (S1–S4, S7, S8, ages 26–43, 2 f), and six subjects in control experiment 3 (S1–S3, S8–S10, ages 23–43, 2 f). All subjects had normal or corrected-to-normal vision. They gave written informed consent before participating in the study. The ethical committee of the Hospital District of Helsinki and Uusimaa had evaluated and approved the research.
Visual stimuli and experimental design
All stimuli consisted of grating patches presented simultaneously in all four visual field quadrants at mean eccentricity of 7.6° (see Fig. 2C). The grating patches were constructed from one, two, or five sinusoidal component gratings windowed by a symmetric two-dimensional Gaussian function whose SD was 1.5°. Subjects performed a fixation task during all experiments to direct attention away from the stimuli. For most subjects, the task was to press one of two buttons when the letter “o” displayed at the point of fixation was changed into either letter “e” or “c” for 250 ms. These changes occurred at 3–10 s random intervals. In experiment 1, the mean correct rate for the fixation task was 88%, and there was no significant difference in the task performance between tasks coinciding with adaptation and test stimuli. In experiment 2, the mean correct rate for the fixation task was 71%, and there were no significant differences in the task performance between different stimulus blocks. Both subjects in control experiment 1, three subjects in control experiment 2, and one subject in control experiment 3 detected luminance change in fixation instead of letter identification. All stimuli were created with Matlab (MathWorks), and their timing was controlled with Presentation (Neurobehavioral Systems). The stimuli were projected with a three-micromirror Christie X3 (Christie Digital Systems) data projector to a semitransparent screen, which the subjects viewed at a 34 cm distance via a mirror. The screen was gamma-corrected using a Minolta LS-110 luminance meter (Minolta Camera).
Experiment 1: phase difference-tuned fMRI adaptation.
In experiment 1, the grating patches were constructed of a fundamental harmonic grating [spatial frequency f = 0.4 cycles per degree (cyc/deg)] and a third harmonic component grating (spatial frequency 3f) with a contrast ratio of 1:1/3, thus being the first two components of a square wave (see Fig. 2A). In the adapting stimulus, the phase difference between the component gratings was 0° for four of the eight subjects and 180° for the other four subjects. In the test stimuli, the fundamental harmonic grating was kept constant and the third harmonic was relocated 0°, 45°, 90°, 135°, or 180° (see Fig. 2B). All stimulus patches had the same root-mean-square (RMS) contrast (SD of luminance values divided by the mean luminance) of 14%. Michelson contrast (difference between the maximum and minimum luminance value divided by their sum) of the stimulus patches increased monotonically from 60% to 90% as a function of phase difference (see control experiment 2). The mean luminance of the stimuli was 22 cd/m².
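The relation between phase difference, RMS contrast, and Michelson contrast can be checked with a simplified one-dimensional sketch. It omits the two-dimensional Gaussian window of the actual patches, and the modulation depth of 0.2 is an arbitrary assumption, so the absolute contrast values will not reproduce the 14%, 60%, and 90% reported above; only the qualitative pattern is illustrated. Function names are hypothetical.

```python
import math

MEAN_LUMINANCE = 22.0  # cd/m², from the text

def compound_luminance(phase_deg, n=3600, depth=0.2):
    """One period of fundamental + third harmonic (amplitude ratio 1:1/3).

    depth is an assumed modulation depth; no Gaussian window is applied.
    """
    phi = math.radians(phase_deg)
    profile = []
    for i in range(n):
        x = 2 * math.pi * i / n
        modulation = math.sin(x) + math.sin(3 * x + phi) / 3.0
        profile.append(MEAN_LUMINANCE * (1.0 + depth * modulation))
    return profile

def rms_contrast(lum):
    """SD of luminance values divided by the mean luminance."""
    mean = sum(lum) / len(lum)
    sd = math.sqrt(sum((v - mean) ** 2 for v in lum) / len(lum))
    return sd / mean

def michelson_contrast(lum):
    """(max - min) / (max + min) of the luminance values."""
    return (max(lum) - min(lum)) / (max(lum) + min(lum))

phase_differences = [0, 45, 90, 135, 180]
rms_values = [rms_contrast(compound_luminance(p)) for p in phase_differences]
michelson_values = [michelson_contrast(compound_luminance(p)) for p in phase_differences]
```

Under these assumptions the RMS contrast is identical for all five phase differences, because it depends only on the component amplitudes, while the Michelson contrast grows monotonically from the 0° to the 180° stimulus.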
The timing of the stimuli was adopted from a recent orientation-tuned fMRI adaptation study (Fang et al., 2005). Each run began with a 30 s preadaptation, which was followed by alternating presentations of 5 s top-up adaptation and one of the five test stimuli, and ended with 18 s of adaptation stimulation (total run length: 408 s). All stimuli were presented with abrupt onset and offset, temporally modulated at 1 Hz (stimulus on for 0.5 s and off for 0.5 s). Each test stimulus was presented 12 times during one run, and eight runs were measured for each subject during an experiment. The order of the test stimuli was optimized for event-related fMRI (Wager and Nichols, 2003).
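The run timing above can be verified with simple arithmetic. The sketch below uses only numbers given in the text (the 1.5 s repetition time and the 20 discarded images are stated under Data acquisition and analysis); the duration of a single test stimulus is not stated explicitly and is inferred here under the assumption that the run contains nothing but preadaptation, top-up/test pairs, and the final adaptation period.

```python
TR = 1.5                 # repetition time in seconds
RUN_LENGTH = 408.0       # total run length in seconds
PREADAPTATION = 30.0     # initial adaptation period
FINAL_ADAPTATION = 18.0  # adaptation at the end of the run
TOP_UP = 5.0             # top-up adaptation before each test stimulus
N_TEST_TYPES = 5         # phase differences 0, 45, 90, 135, 180 degrees
REPEATS_PER_RUN = 12     # presentations of each test stimulus per run

n_trials = N_TEST_TYPES * REPEATS_PER_RUN
trial_time = (RUN_LENGTH - PREADAPTATION - FINAL_ADAPTATION) / n_trials
# Inferred, not stated in the text: time left for each test stimulus
test_duration = trial_time - TOP_UP

volumes_per_run = RUN_LENGTH / TR
# The first 20 images of each run were excluded from the analysis
discarded_seconds = 20 * TR
```

With these values each trial lasts 6 s, leaving 1 s (one on/off cycle at 1 Hz) for the test stimulus, and the 20 discarded images span exactly the 30 s preadaptation period.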
A functional localizer run with an eight-region multifocal stimulus (Vanni et al., 2005) (see Fig. 2D) was measured in the beginning of each experiment. The grating patches in the multifocal stimulus were centered at 45° off of the horizontal and vertical meridians at eccentricities 1.9° and 7.6°. This functional localizer ensured that the regions of interest (ROIs) included retinotopically correct cortical representations for the adaptation experiment stimuli, which typically did not evoke statistically significant activations in visual areas V1, V2, and V3/VP.
Experiment 2: representation of phase congruency.
In experiment 2, the grating patches were constructed of a fundamental harmonic grating (spatial frequency f = 0.4 cyc/deg) and four harmonic sinusoidal component gratings (spatial frequencies 3f, 5f, 7f, and 9f) with contrast ratios of 1:1/3:1/5:1/7:1/9, thus being the first five components of a square wave. The stimuli were divided into two stimulus categories based on phase alignments between the components in the compound grating patches. In the congruent category, the component gratings were summed with local phase coherence across spatial frequency components. In the random category, the phase differences between the component gratings were random. Ten stimuli were derived from the congruent category with congruence phases from 0° to 90° (see Fig. 6A) and 10 from the random category (see Fig. 6B). On average, the stimulus patches in the two categories had the same RMS contrast of 11% and Michelson contrast of 60%. The mean luminance of the stimuli was 22 cd/m².
The stimuli were presented in a block design with the stimuli from the two categories presented in different blocks. The stimulus layout was similar to Figure 2C, but now during one 15 s stimulus block, the grating patch and its orientation changed every 0.5 s independently in each visual field quadrant. All stimuli within one category were presented three times in random order during one stimulus block. Four blocks of both stimulus types were presented alternately during one run with 15 s rest in between (total run length: 255 s), and eight runs were measured for each subject during the experiment.
Control experiment 1: event-related responses for the compound gratings without adaptation.
As a reference for the adaptation results (experiment 1), we measured for two subjects (S1 and S2) the event-related responses for the stimuli shown in Figure 2 without the adaptation (control experiment 1). The experimental design, including the timing of the stimuli, was identical to experiment 1 with the exception that the adapting stimulus periods were replaced with fixation baseline and the preadaptation period was excluded. The data analysis was also identical to that in experiment 1.
Control experiment 2: can results in experiment 1 be explained by stimulus contrast instead of phase difference?
In experiment 1, the RMS contrast of the adaptation and test stimuli was equal, but the Michelson contrast increased monotonically as a function of phase difference. Control experiment 2 separated the effect of change in the Michelson contrast and change in the phase difference. The adapting stimulus of control experiment 2 comprised 90° phase difference between the component gratings, and in the test stimuli, the fundamental harmonic grating was kept constant and either the third harmonic was relocated or the Michelson contrast of the adapting grating was changed. The adapting stimulus was switched to a 90° stimulus, because we wanted to test in the same experiment the effect of increased and decreased contrast and be able to compare responses for test stimuli with either a change in the phase alignment or in the local contrast. The five test stimuli were as follows: (1) the adapting stimulus with 90° phase difference between the component gratings, (2) a compound grating with 0° phase difference between the component gratings, (3) a compound grating with 180° phase difference between the component gratings, (4) the adapting stimulus with Michelson contrast (60%) equal to the compound grating with 0° phase difference between the component gratings, and (5) the adapting stimulus with Michelson contrast (90%) equal to the compound grating with 180° phase difference between the component gratings. The cross sections of the stimuli are shown in supplemental Figure 1 (available at www.jneurosci.org as supplemental material). The stimulus layout and the timing of the stimuli were identical to experiment 1.
Control experiment 3: can results in experiment 1 be explained by change in position instead of phase difference?
In experiment 1, the responses were measured for the test stimuli as a function of phase difference between the constant fundamental and the mobile harmonic grating. Control experiment 3 controlled the possibility that the responses were due to change in the absolute phase (position) of the harmonic grating and not due to phase difference between the two gratings. The adapting stimulus in control experiment 3 was the third harmonic grating alone (spatial frequency f = 1.2 cyc/deg) (see Fig. 2A, right). In the test stimuli, the same grating was relocated 0°, 45°, 90°, 135°, or 180°, the last test stimulus being equivalent to a reversal of pattern contrast of the adapting stimulus. The stimulus layout and the timing of the stimuli were identical to experiment 1.
Data acquisition and analysis
Measurements were performed using a 3T GE Signa Excite scanner (General Electric Medical Systems) equipped with an eight-channel receiver head coil. Functional volumes were acquired with echo-planar imaging using a single-shot gradient-echo sequence with the following imaging parameters: repetition time 1.5 s, 26 slices with 2.8 mm slice thickness, field of view 18 cm, imaging matrix 64 × 64, echo time 30 ms, and flip angle 60°. Structural images with low resolution (voxel size: ∼1.8 mm × 1.8 mm × 1.5 mm) were acquired at the end of each measurement session with a spoiled gradient-echo sequence. The data were coregistered to high-resolution (voxel size: ∼1 mm × 1 mm × 1 mm) structural images, from which the white and gray matter borders were segmented and reconstructed using the Freesurfer software package (Dale et al., 1999; Fischl et al., 1999a).
Functional data were analyzed with SPM2 (Wellcome Department of Imaging Neuroscience) Matlab toolbox. In preprocessing, functional images were corrected for interleaved acquisition order and for head motion. All quantitative analyses were performed on spatially unsmoothed data. From experiment 1 and control experiments 2 and 3, the first 20 images from the beginning of each run were excluded from the analysis to reach stable adaptation. From control experiment 1 and experiment 2, the first four to six images from the beginning of each run were excluded to reach stable magnetization. In statistical analysis, the timing of the test stimuli in experiment 1 and in control experiments 1–3 and the onset and duration of the stimulus blocks in experiment 2 were entered as regressors of interest to the general linear model and convolved with the canonical hemodynamic response model. During the parameter estimation, the data were high-pass filtered with 128 s cutoff, and serial autocorrelations were estimated with restricted maximum likelihood algorithm using a first-order autoregressive model.
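As an illustration of how a block regressor is convolved with a hemodynamic response model, the following sketch builds one regressor for experiment 2. It is not the SPM2 implementation: the double-gamma parameters (response peak shape 6, undershoot shape 16, ratio 1/6) are a commonly used form assumed here, and the block order (rest first, congruent and random blocks alternating) is a hypothetical choice, since the text does not specify it.

```python
import math

TR = 1.5          # repetition time in seconds
N_VOLUMES = 170   # 255 s run length / 1.5 s
BLOCK = 15.0      # stimulus block duration in seconds

def double_gamma_hrf(t):
    """Assumed double-gamma response: peak term minus a scaled undershoot."""
    if t <= 0:
        return 0.0
    peak = t ** 5 * math.exp(-t) / math.gamma(6)
    undershoot = t ** 15 * math.exp(-t) / math.gamma(16)
    return peak - undershoot / 6.0

def boxcar(onsets, duration):
    """1 during stimulus blocks, 0 elsewhere, sampled once per volume."""
    return [1.0 if any(on <= i * TR < on + duration for on in onsets) else 0.0
            for i in range(N_VOLUMES)]

def convolve(signal, kernel):
    """Causal discrete convolution truncated to the length of the signal."""
    return [sum(signal[i - j] * kernel[j] for j in range(min(i + 1, len(kernel))))
            for i in range(len(signal))]

# Kernel sampled at the TR over roughly 32 s
hrf = [double_gamma_hrf(i * TR) for i in range(22)]

# Hypothetical onsets: 15 s rest, then congruent and random blocks alternate
congruent_onsets = [15.0 + 60.0 * b for b in range(4)]
random_onsets = [45.0 + 60.0 * b for b in range(4)]

congruent_box = boxcar(congruent_onsets, BLOCK)
congruent_regressor = convolve(congruent_box, hrf)
random_regressor = convolve(boxcar(random_onsets, BLOCK), hrf)

hrf_peak_time = hrf.index(max(hrf)) * TR
```

The two convolved time courses would then enter the design matrix as regressors of interest; high-pass filtering and the autoregressive noise model mentioned above are not part of this sketch.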
Identification of visual areas
Each subject participated in a separate retinotopic mapping experiment with multifocal fMRI (Vanni et al., 2005). The multifocal stimulus consisted of 24 concurrently stimulated regions in three rings and eight wedges. The retinotopic areas V1, V2, V3/VP, V3AB, and hV4 were identified for each subject on the inflated cortical surface based on the representations of vertical and horizontal visual field meridians.
All subjects participated in a V5+ and lateral occipital complex (LOC) localizer experiment. Location of visual area V5+ was defined as the voxels that responded more strongly to low-contrast (10%) concentric expanding and contracting (7°/s) stimulus than to the corresponding stationary stimulus and were located approximately at the known anatomical location of V5 (Watson et al., 1993). The notation V5+ indicates that this area corresponds to V5 complex, including not only V5 but also neighboring areas that are sensitive to motion (Huk et al., 2002). Location of LOC was defined as the voxels that responded more strongly to gray-level photographs of objects and faces than to scrambled versions of the same images, and were located approximately at the known anatomical locations of object-selective regions in the lateral (Malach et al., 1995) and ventral occipitotemporal cortex (Grill-Spector et al., 1999). Photographs were provided by two free photograph libraries and laser-scanned faces by the Max-Planck Institute for Biological Cybernetics in Tuebingen, Germany.
The compound grating stimuli in experiment 2 also activated brain regions along the medial bank of the intraparietal sulcus. Previous fMRI studies have revealed several distinct visual field maps located in this region (Swisher et al., 2007; Konen and Kastner, 2008), and these regions may be important in surface segmentation and brightness perception (Perna et al., 2005; Vinberg and Grill-Spector, 2008). We did not have separate functional localizer data to identify these regions, but based on the activation patterns in experiment 2, we identified two regions of interest (IPS1/2 and IPS3) located along the intraparietal sulcus (see Fig. 3). The mean (SD) Talairach coordinates of these ROIs were IPS1/2: ±26 (7), −74 (5), 36 (4) and IPS3: ±25 (5), −57 (3), 55 (3).
Region-of-interest analysis
The ROIs within each visual area were restricted to voxels with statistically significant response for functional data. In experiment 1 and in control experiments 1–3, the voxels that were included in the ROIs were defined from separate functional localizer data, because the stimuli in these experiments did not evoke statistically significant activation patterns across visual areas. Therefore, the ROIs in the retinotopic visual areas included only the voxels with statistically significant response (p < 0.001, uncorrected) in the multifocal localizer run measured in the beginning of each experiment. The multifocal localizer experiment failed to evoke statistically significant responses in visual area V5+ and LOC, which have less clear retinotopic organization. Therefore, the ROIs in area V5+ included only the voxels with statistically significant response (pFWE < 0.05) in the separate V5+ localizer experiment and the ROIs in areas LO and pFus were the active voxels (pFWE < 0.05) from the separate LOC localizer experiment. For experiment 1 and control experiment 1, responses were also analyzed in two ROIs along the intraparietal sulcus. These ROIs were defined based on the data from experiment 2 (see above, Identification of visual areas) and included only the voxels with statistically significant response (pFWE < 0.05) in experiment 2. For experiment 1, this analysis could be done for six subjects who participated both in experiment 1 and experiment 2.
In experiment 2, the stimuli evoked statistically significant (pFWE < 0.05) and consistent activation patterns across visual areas in each individual. Therefore, the ROIs in each area included the voxels with statistically significant (pFWE < 0.05) response across the two stimulus conditions compared with the fixation baseline across all experimental runs. As a reference, we analyzed the data from experiment 2 also using ROIs defined from separate functional localizer data (see supplemental Fig. 2, available at www.jneurosci.org as supplemental material). The fMRI signal changes for each stimulus condition within the ROIs were calculated from the parameter estimate images (Vanni et al., 2005).
Group-averaged visualizations
The spherical surface-based coordinate system (Fischl et al., 1999b) was used in the group-averaged data visualizations in Figures 3 and 8. An average cortical surface was created from all of the participants in the study, and the individual data were resampled to this averaged surface based on the cortical curvature information with Freesurfer. Nodes with data from fewer than three subjects were omitted from the visualizations shown in Figures 3 and 8.
Results
Experiment 1: cortical sensitivity to cross-frequency spatial phase differences
A compound grating composed of different spatial frequencies has a different appearance depending on the phase alignment between the components (Fig. 1A). Already two spatial frequency components evoke different percepts in the compound as a function of the phase difference (Fig. 2A,B). It is not clear how the visual system encodes the phase alignments in the broadband visual stimuli. In experiment 1, we studied the cortical sensitivity to cross-frequency spatial phase differences with fMRI adaptation.
The ROIs in the cortical visual areas were defined separately for each subject. Retinotopic visual areas V1, V2, V3/VP, hV4, and V3AB were mapped with a multifocal stimulus (Vanni et al., 2005). The multifocal stimulus was modified from the earlier study to give, in addition to V1, robust signals also in a subset of extrastriate areas. The LOC was localized based on its selective response to visual objects (Malach et al., 1995). Based on functional and anatomical criteria, LOC was further divided into a dorsal area termed the lateral occipital (LO) area (Malach et al., 1995) and a ventral region located around the posterior fusiform gyrus (pFus) (Grill-Spector et al., 1999). Visual area V5+ was localized based on its selectivity to visual motion (Watson et al., 1993). The notation V5+ indicates that this ROI corresponds to V5 complex, which includes not only V5 but also neighboring areas that are sensitive to motion (Huk et al., 2002). In addition, we identified two ROIs along the intraparietal sulcus (see Materials and Methods). Figure 3 illustrates the locations of the visual areas and ROIs on a cortical surface averaged from all our subjects (Fischl et al., 1999b).
In experiment 1, the compound grating stimulus comprised two spatial frequency components, and the phase difference between these components was varied. We assessed the responsiveness in several visual areas by first adapting the subject to a repeated stimulus and then examining responses to changes in the stimulus (Grill-Spector and Malach, 2001; Fang et al., 2005). Such a design enables detection of signals from particular neuronal populations within the spatial resolution elements of fMRI (voxels).
Subjects were adapted to a compound grating where the phase difference between the two spatial frequency components was either 0° or 180°. In the test stimuli, the phase difference between the components was changed by keeping the low-spatial-frequency grating constant and shifting the high-spatial-frequency grating. Figure 2B shows all the test stimuli with 0°, 45°, 90°, 135°, or 180° phase difference between the components. Subjects' attention was directed away from the stimuli by displaying the stimulus simultaneously in all visual field quadrants (Fig. 2C) and with a letter-identification task at the fixation point in the middle of the screen. Figure 4A shows activation patterns for two representative single subjects for the test stimulus with maximal difference from the adapting stimulus. Because the adaptation design has relatively weak detection power, here the data are displayed at a low statistical threshold to represent the relative weight of distinct areas in individual data. For both subjects, the change in the phase difference most strongly activated the extrastriate visual areas V3AB, hV4, LO, pFus, and the intraparietal sulcus. As a reference, Figure 4B shows activation patterns for the same test stimuli without the adaptation (control experiment 1). Without adaptation, the stimuli activated all mapped areas, including V1, V2, V3/VP, V3AB, hV4, V5+, LO, pFus, and areas along the intraparietal sulcus.
Figure 5A shows the mean fMRI response strengths for each of the test stimuli after adaptation, from ROIs in different visual areas. Responses for the subjects with two different adapting stimuli have been averaged so that each bar corresponds to test stimuli differing from the adaptor phase by the same amount (see supplemental Fig. 3, available at www.jneurosci.org as supplemental material, for results given separately for the two different adapting stimuli). All tested visual areas showed a monotonic increase in the signal from the 0° (adapting stimulus) to the 180° shift in the phase difference (maximum difference compared with the adapting stimulus). These trends were confirmed with the nonparametric Page's L test [V1: L = 407, p < 0.001; V2: L = 407, p < 0.001; V3/VP: L = 408, p < 0.001; hV4: L = 418, p < 0.001; V3AB: L = 398, p < 0.01; LO: L = 417, p < 0.001; pFus: L = 423, p < 0.001; V5+: L = 398, p < 0.01; IPS1/2: L = 311, p < 0.01 (N = 6); IPS3: L = 306, p < 0.01 (N = 6)]. Area V1 was the most adapted, whereas area hV4 showed the strongest response for the test stimulus with maximal difference compared with the adapting stimulus. Area hV4 responded positively also for the adapting stimulus, which could imply higher sensitivity for the stimuli or weaker adaptation. The slopes of the tuning curves in V1 and hV4 were not significantly different (Wilcoxon signed-rank test, p > 0.05). Supplemental Figure 4 (available at www.jneurosci.org as supplemental material) shows averaged time course profiles and fitted hemodynamic models for the adapting stimulus and for the test stimulus with maximal difference compared with the adapting stimulus for visual areas V1, hV4, and pFus. Figure 5B shows the mean fMRI response strengths for the event-related responses without the adapting stimulus (control experiment 1). Without adaptation, the fMRI responses were approximately equal regardless of the phase difference between the components.
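For readers unfamiliar with Page's L test, the statistic can be sketched as follows. Each subject's responses are ranked across the ordered conditions, the ranks are summed per condition, and the sums are weighted by the predicted order. The normal approximation in `page_z` is the standard large-sample form; the text does not say whether exact tables or this approximation were used to obtain the reported p values, and the example data are synthetic.

```python
import math

def average_ranks(values):
    """Ranks from 1..k within one subject; ties receive the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for m in range(i, j + 1):
            ranks[order[m]] = avg
        i = j + 1
    return ranks

def page_l(data):
    """data: one row per subject, columns in the predicted increasing order."""
    k = len(data[0])
    column_rank_sums = [0.0] * k
    for row in data:
        for j, r in enumerate(average_ranks(row)):
            column_rank_sums[j] += r
    # Weight each condition's rank sum by its predicted position (1..k)
    return sum((j + 1) * s for j, s in enumerate(column_rank_sums))

def page_z(l_stat, n, k):
    """Normal approximation: (L - E[L]) / sqrt(Var[L]) under the null."""
    mean = n * k * (k + 1) ** 2 / 4.0
    variance = n * k ** 2 * (k + 1) ** 2 * (k - 1) / 144.0
    return (l_stat - mean) / math.sqrt(variance)

# Synthetic example: 8 subjects, 5 conditions, perfectly monotonic responses
monotone = [[0.1 * j + 0.01 * s for j in range(5)] for s in range(8)]
l_max = page_l(monotone)
```

For eight subjects and five test stimuli the statistic ranges up to 440 with a null expectation of 360, so the reported values of 398 to 423 lie well toward the monotonic end of the range; for six subjects the corresponding maximum and expectation are 330 and 270.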
Although the only change from the adaptation to the test stimuli was the phase difference between the two spatial frequency components, this change entails diverse local contrasts in the test stimuli. That is, the RMS contrast energy of the stimuli shown in Figure 2B was equal, but the Michelson contrast increased monotonically from the stimulus with 0° phase difference between the components to the stimulus with 180° phase difference. Control experiment 2 separated the effects of phase difference and Michelson contrast. Subjects were adapted to a compound grating with 90° phase difference between the components. In the test stimuli, either the phase difference was changed by shifting the high-frequency component, or the contrast of the stimulus was changed by keeping the gratings constant but scaling the Michelson contrast to match either of the two other test stimuli. The adapting stimulus equaled the middle stimulus (90° phase difference between components) in Figure 2B and two of the test stimuli the leftmost (0° phase difference between components, i.e., −90° phase shift compared with the adapting stimulus) and rightmost (180° phase difference between components, i.e., +90° phase shift compared with the adapting stimulus) stimuli in Figure 2B.
The results from control experiment 2 are shown in Figure 5C. In extrastriate areas, including areas LO and pFus, the test stimulus with +90° change in the phase difference evoked significantly stronger response than the test stimulus with equally scaled Michelson contrast (Wilcoxon signed-rank test, p < 0.05). There were no significant differences in any visual area between the responses for the test stimulus with −90° change and equally contrast-scaled test stimulus. Correspondingly, the evoked responses were stronger for the test stimulus with 180° phase difference than for the 0° phase difference (Wilcoxon signed-rank test, p < 0.05, for each visual area), which could be explained by a larger neural population preferring 180° difference of phase across spatial frequencies. Interestingly, area hV4 responded positively also to the low-contrast test stimulus (Wilcoxon signed-rank test, p < 0.05). This finding is in agreement with a previous fMRI study, which showed that the contrast response function in hV4 differs from the functions in areas V1–V3 showing positive responses to both contrast increments and decrements (Gardner et al., 2005).
In experiment 1, the component grating with low spatial frequency (Fig. 2A, left) was kept constant and the grating with high spatial frequency (Fig. 2A, right) was shifted. Therefore, in addition to a change in the phase difference between the two components, the absolute phase (position) of the high-spatial-frequency grating was changed. Control experiment 3 checked that the change in the absolute phase cannot explain the results in Figure 5A. In this experiment, the stimulus was the high-frequency grating alone (Fig. 2A, right). Subjects were adapted to one absolute phase of the grating, and in the test stimuli, the absolute phase was shifted 0–180°. The results are shown in Figure 5D. There were no significant trends in the data as a function of the change in the absolute phase (Page's L test, for all areas L < 291, p > 0.05) or significant differences between the responses for different test stimuli (Friedman test, for all areas p > 0.05).
Experiment 2: cortical representation of spatial phase congruency
Spatial features in compound gratings are reinforced when more harmonic components are added with local phase coherence across spatial frequency bands. Typical edges in natural images are such broadband features with coherent phase alignments (Griffin et al., 2004). Because phase congruency has been shown to be useful in broadband edge-detection algorithms (Morrone and Owens, 1987; Morrone and Burr, 1988; Kovesi, 1999), human visual cortex could also encode phase congruency in the identification of broadband edges.
In experiment 2, we compared fMRI responses for broadband visual stimuli with congruent and random phase structures. We switched from the adaptation design to a blocked stimulus presentation, which included two different stimulus categories. Figure 6A shows the stimuli in the congruent category with congruence phases ranging from 0° to 90°. The stimuli in the random category (Fig. 6B) were composed of the same spatial frequency components, but with random phase differences between the components.
Figure 7 shows representative single-subject statistical activation maps during the two different stimulations and for the comparisons between responses for congruent and random stimuli. The activation maps suggest that the compound gratings with congruent phase alignments evoke stronger responses in extrastriate areas, especially in V3AB, hV4, LO, pFus, and in the intraparietal sulcus. None of the areas responded more strongly to the random stimuli.
Figure 8 shows the signal changes for the two stimulus categories. In each functional area, the responses were higher for the congruent than for the random stimuli (Wilcoxon signed-rank test, p < 0.05 for each visual area). Supplemental Figure 3 (available at www.jneurosci.org as supplemental material) shows averaged time course profiles and fitted hemodynamic models for the two stimulus categories for visual areas V1, hV4, and pFus. To compare responses between visual areas, we computed a selectivity index for each area from the responses to congruent stimuli (Rcong) and random stimuli (Rrand): the difference in the responses (Rcong − Rrand) was divided by the sum of the responses (Rcong + Rrand). The results are shown in Figure 8B. Visual area had a significant effect on the congruency selectivity index (Friedman test, p < 0.05). The index increased through the ventral stream areas V1, V2, V3, hV4, LO, and pFus (Page's L test, L = 608, p < 0.001). In addition, the congruency selectivity index was high in the regions along the intraparietal sulcus. Figure 8C illustrates the cortical distribution of the congruency selectivity index on the group-averaged data.
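The selectivity index described above is a simple contrast ratio, (Rcong − Rrand) / (Rcong + Rrand). The following sketch computes it for one area; the function name and the example response values are hypothetical and do not come from the measured data.

```python
def selectivity_index(r_cong: float, r_rand: float) -> float:
    """Congruency selectivity index: (Rcong - Rrand) / (Rcong + Rrand)."""
    total = r_cong + r_rand
    if total == 0:
        # The index is undefined when the two responses sum to zero.
        raise ValueError("responses sum to zero; index is undefined")
    return (r_cong - r_rand) / total


# Hypothetical percent signal changes for a single visual area.
example_index = selectivity_index(r_cong=1.2, r_rand=0.8)
print(f"selectivity index: {example_index:.2f}")
```

For positive responses the index lies between −1 and 1: it is 0 when the two categories evoke equal responses and positive when the congruent stimuli evoke the larger response. Normalizing by the sum makes the index less dependent on the overall response amplitude of an area, which is why it is suited to comparisons across areas.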
Discussion
This study explored phase-sensitive integration of spatial frequencies in the human visual cortex. The fMRI adaptation experiment showed that several visual areas, including V1, are sensitive to the spatial phase differences in compound grating stimuli. In addition, all studied visual areas showed selectivity for congruent phase structure. This selectivity increased along the ventral stream areas and was also high in areas along the intraparietal sulcus. Our results extend previous studies on phase congruency (Morrone and Burr, 1988; Mechler et al., 2002; Perna et al., 2008) by showing that phase-sensitive pooling of spatial frequencies is ubiquitous in the human visual cortex and that selectivity to phase congruency increases along the ventral stream hierarchy as well as along the intraparietal areas, and by suggesting that higher-level visual areas detect phase congruency across multiple spatial scales.
Importance of image phase structure for perception
In natural images, spatial phase information is essential for the recognition of visual objects (Oppenheim and Lim, 1981). The distinction between local and global phase (i.e., phase in a local and a global Fourier transform) is important here. Presumably, only local phase is computed in the perceptual system. However, strong perturbation of global phase typically also leads to strong disruption of local phase congruency, and thus to perceptual distortions. In fact, perturbations of global phase in natural images alter higher-order image statistics (Thomson et al., 2000) and impair object categorization performance more than a contrast reduction does (Bex and Makous, 2002; Wichmann et al., 2006). Furthermore, Wang and Simoncelli (2004) proposed that perturbations of local phase coherence can explain the perception of blur in natural images. Our results suggest that the perception of such phenomena might originate in the higher-level visual areas, which showed pronounced selectivity for congruent phase alignments.
Observers are not very sensitive to the phase difference between two spatial frequency components (Burr, 1980; Badcock, 1984), and it has been suggested that the human visual system is specialized only for edge-like (0° phase difference) and line-like (180° phase difference) features (Atkinson and Campbell, 1974). A phase discrimination task, however, involves confounding differences in local contrast (Badcock, 1984). Our results support the idea that phase alignments are encoded in the cortex and corroborate behavioral evidence (Tyler and Gorea, 1986) that phase sensitivity cannot be explained by local contrast.
Most previous studies on contour processing in the human visual cortex have concentrated on the integration of local edges across space, whereas phase congruency detection requires integration of responses across spatial scales. Given the importance of edge structures in a contour detection task, cross-frequency alignments must be processed either before or in parallel with contour integration (Dakin and Hess, 1999).
Neuroimaging studies on spatial phase processing in human visual cortex
In addition to impairing perception, phase scrambling of pictures effectively attenuates the fMRI responses in the object-processing areas (Malach et al., 1995). In contrast to the monotonic dependence of perception on phase coherence, a nonmonotonic fMRI tuning function for phase coherence in natural images has been reported in macaque visual cortex (Rainer et al., 2001). However, confounding differences in higher-order statistics may partly explain the decrease in the fMRI response for the intermediate blending of natural images with phase noise (Dakin et al., 2002). More recently, a monotonic increase in the fMRI response as a function of the amount of spatially correlated noise in natural images has been reported in humans, with a progressive increase in the slope of this function along the ventral stream areas (Tjan et al., 2006).
Perna et al. (2008) compared fMRI responses for edges, lines, and phase noise in a block design and reported higher responses for edges and lines than for noise in several visual areas, including V1, and higher responses for edges than for lines in areas around LOC and the intraparietal sulcus. In addition, they showed that the preference for the coherent stimuli is reduced in V1 when the contrast of the stimuli is increased, whereas in the higher-level areas the result is stable across contrast values. They suggested that a model that combines the local energy model (Morrone and Owens, 1987; Morrone and Burr, 1988) with a nonlinear contrast gain function can account for the results.
Our results show phase sensitivity in all visual areas, including V1, with fMRI adaptation. The adaptation design enables detection of phase sensitivity even in spatially overlapping neural populations within an fMRI voxel. To show that there is another population of neurons with a different phase preference within the same voxel, the response to one phase combination must be selectively attenuated. This is comparable to the case of orientation selectivity (Fang et al., 2005). Our experiment 2 complements the experiment performed by Perna et al. (2008). They used central wide-field stimuli, whereas our grating patches were peripheral, at a mean eccentricity of 7.6°. Psychophysical studies have shown differences in phase perception between central and peripheral viewing. Phase discrimination in central viewing may be mediated mainly by two classes of phase detectors specialized for edge-like and line-like features (Burr et al., 1989), whereas detectors tuned to intermediate phase values may contribute to phase discrimination in peripheral vision (Morrone et al., 1989).
Both the studies by Perna et al. (2005, 2008) and the present study support the conclusion that phase congruency is coded in the visual cortex. We contribute to the earlier studies (Perna et al., 2005, 2008) by showing complementary sensitivity to phase congruency, both across stimulus features and across visual areas, and by showing the increase in the selectivity for phase congruency as a function of anatomical hierarchy of visual areas.
Electrophysiological studies on spatial phase sensitivity and spatial frequency pooling
In the primary visual cortex, cells can be divided into simple and complex cells based on their selectivity to spatial phase. Simple cells are selective for the absolute phase of a grating, whereas complex cells are more invariant to the phase. Addition of another frequency to the cell's optimal spatial frequency has an inhibitory effect on the responses of most simple cells and on the responses of a fraction of complex cells, but only a narrow range of spatial frequencies around the optimal spatial frequency can affect the response (De Valois and Tootell, 1983). For simple cells, this effect can depend on the phase difference between the spatial frequency components (De Valois and Tootell, 1983). Mechler et al. (2002, 2007) directly compared responses in macaque V1 single cells for compound grating stimuli with different congruence phases. They reported that both simple and complex cells are able to code the congruence phase, but concluded that phase-specific pooling of V1 responses must be performed to account for the human performance.
In the simplest abstract models, the receptive field of a simple cell is described by a linear spatiotemporal filter whose output is half-wave rectified and squared, and the receptive field of a complex cell is described by a quadrature pair of linear spatiotemporal filters whose outputs are squared and summed (Carandini et al., 2005). Recent studies estimating such filters using reverse correlation methods suggest that macaque complex cells are better described with several additional filters (Rust et al., 2005; Chen et al., 2007). In addition, Felsen et al. (2005) showed that complex cells in cat V1 are tuned to the phase regularities of natural images. If the complex cells integrate simple cell responses sensitive to very different spatial frequencies, this model could account for the phase congruency detection. However, electrophysiological data suggest that phase-sensitive pooling across multiple spatial frequencies is performed outside V1 (Mechler et al., 2002, 2007).
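The difference between the two abstract cell models can be made concrete with a minimal sketch. It reduces the spatiotemporal filters to one-dimensional spatial Gabor filters and uses arbitrary parameter values, so the function names, filter shape, and numbers are all assumptions for illustration rather than a fitted model. The simple cell output is the half-wave rectified and squared response of a single linear filter; the complex cell output is the sum of the squared responses of an even and odd (quadrature) filter pair.

```python
import math


def gabor_pair(freq, sigma, dx=0.01, half_width=3.0):
    """Even (cosine) and odd (sine) Gabor filters sampled on a symmetric grid."""
    n = int(round(half_width / dx))
    xs = [i * dx for i in range(-n, n + 1)]
    envelope = [math.exp(-x * x / (2 * sigma ** 2)) for x in xs]
    even = [e * math.cos(2 * math.pi * freq * x) for e, x in zip(envelope, xs)]
    odd = [e * math.sin(2 * math.pi * freq * x) for e, x in zip(envelope, xs)]
    return xs, even, odd


def linear_response(rf, stimulus):
    """Output of a linear filter: dot product of receptive field and stimulus."""
    return sum(w * s for w, s in zip(rf, stimulus))


def simple_cell(rf, stimulus):
    """Half-wave rectified and squared linear response."""
    return max(linear_response(rf, stimulus), 0.0) ** 2


def complex_cell(even, odd, stimulus):
    """Energy model: squared responses of a quadrature pair, summed."""
    return linear_response(even, stimulus) ** 2 + linear_response(odd, stimulus) ** 2


# Arbitrary illustrative parameters: grating at the filters' preferred frequency.
FREQ, SIGMA = 2.0, 0.5
xs, even, odd = gabor_pair(FREQ, SIGMA)

phases = [k * math.pi / 4 for k in range(8)]   # absolute phases, 0 to 315 degrees
simple_out, complex_out = [], []
for phi in phases:
    grating = [math.cos(2 * math.pi * FREQ * x + phi) for x in xs]
    simple_out.append(simple_cell(even, grating))
    complex_out.append(complex_cell(even, odd, grating))

for phi, s, c in zip(phases, simple_out, complex_out):
    print(f"phase {math.degrees(phi):5.1f} deg  simple {s:10.2f}  complex {c:10.2f}")
```

Under these assumptions the simple cell response depends strongly on the absolute phase of the grating and is zero for the phase opposite to its preferred one, whereas the complex cell response is nearly constant across phases.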
Our fMRI results agree with the electrophysiological studies in that V1 already shows phase-specific pooling of spatial frequencies. In our adaptation experiment, the two spatial frequencies (f and 3f) in the compound grating stimuli fit within the spatial frequency bandwidth of V1 cells (De Valois et al., 1982; Foster et al., 1985), and the phase sensitivity could be explained, for example, by the local energy model (Morrone and Owens, 1987; Morrone and Burr, 1988). However, results from our experiment 2 support the idea that phase congruence across multiple spatial frequencies is identified in higher-level visual areas. The nonlinearity in the pooling over spatial frequency could be similar to the speed-tuning nonlinearity reported in macaque MT/V5 (Priebe et al., 2003). Linear summation of responses to individual gratings explains the preferred speed for compound grating stimuli in V1, but not in MT/V5, where responses are enhanced for stimuli consisting of multiple spatial frequencies moving at the same speed (Priebe et al., 2003, 2006).
Models of broadband feature detection in the visual cortex
Several models of early visual processing have suggested that the visual system should implement phase-sensitive pooling of spatial frequency information in the detection of broadband features (Morrone and Owens, 1987; Morrone and Burr, 1988; Perona and Malik, 1990; Lindeberg, 1998; Kovesi, 1999). Motivation for pooling over spatial frequency can also be derived from natural image statistics (Hyvärinen et al., 2005). Thus, computational modeling can guide experiments on the extrastriate cortex, whose function is still not well understood. We think that models based on the statistical structure of natural images are particularly promising in this respect, since they can provide computational predictions on how the visual input should be processed after V1 (Hyvärinen et al., 2009).
Footnotes
- This work was supported by Finnish Graduate School of Neuroscience, Finnish Cultural Foundation, and Academy of Finland (National Programme for Centres of Excellence 2006-2011 Grant 213464; Grants 105628, 210347, and 124698; and NEURO-program Grant 111817). We thank Marita Kattelus for help with the measurements, Petteri Räisänen and Ronny Schreiber for technical support, XtraVision consortium for valuable discussions, and Linda Stenbacka for comments on the manuscript.
- Correspondence should be addressed to Linda Henriksson, Low Temperature Laboratory and Advanced Magnetic Imaging Centre, Helsinki University of Technology, P.O. Box 3000, FI-02015 TKK, Espoo, Finland. henriksson{at}neuro.hut.fi