Abstract
Motion-induced blindness (MIB) is a visual phenomenon in which a salient static target spontaneously fluctuates in and out of visual awareness when surrounded by a moving mask pattern. It has been hypothesized that MIB reflects an antagonistic interplay between cortical representations of the static target and moving mask. Here, we report evidence for such antagonism between human ventral and dorsal visual cortex during MIB. Functional magnetic resonance imaging (fMRI) responses in ventral visual area V4 decreased with the subjective disappearance of the target. These response decreases were specific for the cortical subregion corresponding retinotopically to the target, occurred early in time with respect to the perceptual report, and could not be explained by shifts of attention in reaction to target disappearance. At the same time, responses increased in mask-specific subregions in dorsal visual areas in and around the intraparietal sulcus. These opposite responses in ventral and dorsal visual areas occurred only during subjective target disappearance, not when the target was physically removed. Perceptual reports of target disappearance were furthermore associated with a delayed “global” modulation of activity that was evident throughout early visual cortex, for both subjective target disappearance and physical target removal. We conclude that awareness of the target is tightly linked to the strength of its representation in ventral visual cortex, and that the mask representation in dorsal visual cortex plays a crucial role in the spontaneous suppression of the target representation during MIB.
Introduction
When surrounded by a moving visual pattern, a salient visual target disappears from visual awareness, as if briefly erased, only to reappear several seconds later, a phenomenon called “motion-induced blindness” (MIB) (Bonneh et al., 2001). MIB does not seem to be solely determined by low-level sensory suppression or adaptation (Bonneh et al., 2001). It has been hypothesized that MIB is caused by a competition between the neural representations of the static target and the moving mask at some level(s) of cortical visual processing (Bonneh et al., 2001; Graf et al., 2002; Keysers and Perrett, 2002), or by cortical mechanisms confined to the target representation, such as filling-in (Hsu et al., 2004, 2006).
MIB is similar to other perceptual phenomena, such as binocular rivalry (Blake and Logothetis, 2002; Tong et al., 2006), in which perception fluctuates spontaneously in the face of constant physical stimulation (Blake and Logothetis, 2002). Models of bistable perception postulate a competition between two populations of neurons representing the two alternative perceptual interpretations, at multiple levels of the visual cortical hierarchy (Blake and Logothetis, 2002). Alternative hypotheses posit that the spontaneous perceptual transitions are caused by local adaptation and noise within visual cortex (Lehky, 1988; Blake, 1989; Stollenwerk and Bode, 2003; Wilson, 2003; Moreno-Bote et al., 2007) or by an active selection mechanism akin to top–down attention (Kleinschmidt et al., 1998; Lumer et al., 1998; Leopold and Logothetis, 1999).
To gain additional insight into the neural basis of spontaneous perceptual transitions, we measured neural activity with functional magnetic resonance imaging (fMRI) in human visual cortex, while subjects reported the disappearance of a salient visual target during MIB (see Fig. 1A). The static target and the moving mask were (by design) processed by distinct, retinotopically organized neural populations in early visual cortex, and by separate visual pathways (ventral and dorsal, respectively) at higher levels of the human visual system (Ungerleider and Haxby, 1994). Together, retinotopic and functional specificity enabled us to isolate the cortical target and mask representations at multiple stages of cortical visual processing.
The results were consistent with the hypothesis that an antagonism between the mask representation in dorsal visual cortex and the target representation in ventral visual cortex underlies the spontaneous target disappearance during MIB. Responses decreased with target disappearance in the retinotopic subregion of visual area V4 that corresponded to the target. In contrast, responses increased with target disappearance in subregions of dorsal visual areas that corresponded to the mask. These opposite target- and mask-specific fMRI responses were evident early in time, coincident with the perceptual report, and they were specific to spontaneous target disappearance (i.e., absent when the target was physically removed). This antagonism might reflect direct suppression of the target representation in ventral cortex by the mask representation in dorsal cortex, or common input of opposite sign from outside of visual cortex. Some of these results have been published previously in preliminary form (http://www.journalofvision.org/8/6/538/).
Materials and Methods
Subjects
Data were acquired from six healthy subjects with normal or corrected-to-normal vision (one female; age range, 25–35 years). One subject was an author. All experiments were conducted with the written consent of each subject and in accordance with the safety guidelines for fMRI research, as approved by the University Committee on Activities Involving Human Subjects at New York University. Each subject participated in several scanning sessions: one to obtain a high-resolution anatomical volume for cortical surface extraction, one to define retinotopically organized cortical visual areas, one to identify the subregions of these areas corresponding to the target and mask locations, and three to five sessions to measure fMRI responses in the main experiments (MIB and physical “replay,” described next).
Stimulus, task, and procedure
Subjects reported the disappearance and reappearance of a salient target surrounded by a moving mask (see Fig. 1A). While fixating the central cross, subjects reported their perception of the target by depressing a button with the right middle finger (visible) or right index finger (invisible), switching between these two button states in response to the perceptual transitions. We exploited the fact that the elements of a perceptual group disappear and reappear conjointly during MIB (Bonneh et al., 2001). The target was a contour made up of multiple yellow collinear bars of maximum contrast, lying on an imaginary circle around the central fixation cross. A single bar subtended ∼7.5° of polar angle and ∼0.5° of eccentricity. The length of the target contour (i.e., the number of bars) was determined individually for each subject in psychophysical pilot experiments as the maximum number such that disappearance during MIB was >20% of the viewing time. Thus, the target was either two or four bars in length (corresponding to ∼1 or 2° of visual angle), subtending ∼15 or 30° of polar angle. It was always centered on one of the visual field diagonals; the quadrant in which it was placed was determined individually to maximize invisible time. The target was positioned at an eccentricity of 4.5° for five subjects but was placed at an eccentricity of 5.5° for the sixth subject to obtain sufficient periods of invisibility. The mask consisted of 200 blue moving dots confined to an aperture of 7° radius. The random dot pattern was displayed as if arranged on the surface of a sphere rotating around an oblique axis. The target was separated from the mask by a blank protection zone subtending ∼2° around the target. Target and mask stimuli were superimposed on a black background. Stimuli were projected onto a rear-projection screen in the bore of the magnet via an LCD projector (Eiki LC-XG100; Eiki) with a pixel resolution of 1024 × 768 and a 60 Hz refresh rate. Subjects were supine and viewed the screen through an angled mirror at a distance of 57 cm, yielding a field of view of 29 × 22°.
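The target dimensions quoted above follow from simple geometry. The sketch below, assuming a flat polar-coordinate approximation of the visual field (arc length = eccentricity × polar angle in radians), reproduces the ∼1 and ∼2° contour lengths for two and four bars at 4.5° eccentricity, and derives the approximate degrees per pixel from the stated display size. The function names are illustrative only.

```python
import math

# Display geometry stated in the text: 1024 x 768 pixels spanning ~29 x 22 deg.
SCREEN_PX = (1024, 768)
SCREEN_DEG = (29.0, 22.0)

def deg_per_pixel():
    """Approximate degrees of visual angle per pixel (horizontal, vertical)."""
    return (SCREEN_DEG[0] / SCREEN_PX[0], SCREEN_DEG[1] / SCREEN_PX[1])

def contour_extent(n_bars, eccentricity_deg, bar_polar_deg=7.5):
    """Polar-angle span and arc length (deg of visual angle) of an n-bar target.

    Assumes a flat approximation: arc length = eccentricity * span in radians.
    """
    polar_span = n_bars * bar_polar_deg
    arc_length = eccentricity_deg * math.radians(polar_span)
    return polar_span, arc_length

for n in (2, 4):
    span, arc = contour_extent(n, 4.5)
    print(f"{n} bars: {span:.0f} deg polar angle, {arc:.2f} deg arc length")
# 2 bars: 15 deg polar angle, 1.18 deg arc length
# 4 bars: 30 deg polar angle, 2.36 deg arc length
```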
The majority of MIB runs were interleaved with replay runs, during which the target was physically removed from the display according to the temporal sequence of the subject's perceptual reports in the preceding MIB run. Subjects reported target visibility/invisibility by depressing the same two buttons as during MIB. The fMRI responses during replay provided a reference for the interpretation of responses during MIB. Each subject performed 20–41 MIB runs and 19–27 replay runs (180 s duration each).
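A replay run can be thought of as a per-frame on/off schedule copied from the preceding MIB run's reports. The following is a minimal sketch under stated assumptions: reports are given as (time, visible) pairs in seconds, the display runs at the 60 Hz refresh rate and 180 s run length given above, and the target is assumed visible before the first report. The function `replay_schedule` is hypothetical, not the authors' stimulus code.

```python
import numpy as np

def replay_schedule(report_times_s, report_visible, run_duration_s=180.0, refresh_hz=60):
    """Boolean per-frame array: True where the target is physically shown.

    Hypothetical sketch: each frame copies the most recent perceptual report
    from the preceding MIB run; before the first report the target is shown.
    """
    n_frames = int(round(run_duration_s * refresh_hz))
    frame_times = np.arange(n_frames) / refresh_hz
    times = np.asarray(report_times_s, dtype=float)
    states = np.asarray(report_visible, dtype=bool)
    order = np.argsort(times)
    times, states = times[order], states[order]
    # Index of the latest report at or before each frame (-1 if none yet).
    idx = np.searchsorted(times, frame_times, side="right") - 1
    shown = np.ones(n_frames, dtype=bool)
    has_report = idx >= 0
    shown[has_report] = states[idx[has_report]]
    return shown

# Target reported invisible from 10 s to 13 s, then visible again.
schedule = replay_schedule([10.0, 13.0], [False, True])
```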
Eye movements were not recorded because the precision of our fMRI-compatible eye tracker was insufficient to rule out fixational eye movements (specifically microsaccades), which might have modulated neural responses to the small static target. Such fixational eye movements almost certainly occurred during the prolonged stimulus viewing (Martinez-Conde, 2006). For several reasons, however, it is unlikely that fixational eye movements caused the fMRI responses reported in this study (see Results, A global response component in early visual cortex, and Discussion). Moreover, target disappearance in MIB cannot be explained by retinal stabilization after a spontaneous reduction of microsaccades, as has been suggested for Troxler fading (Martinez-Conde, 2006). As opposed to Troxler fading, targets of high luminance contrast disappear more, and targets surrounded by a static mask disappear less (Bonneh et al., 2001; Hsu et al., 2004). We verified in two of the subjects, using the same stimulus configuration as in the fMRI experiments, that the motion of the dots was in fact critical for target disappearance.
Magnetic resonance imaging data acquisition
Magnetic resonance imaging (MRI) data were acquired on a 3 T Allegra scanner (Siemens Medical Systems) equipped with a transmit head coil (NM-011) and a four-channel phased-array receive surface coil (NMSC-021; both Nova Medical) positioned at the back of the head. We measured blood oxygenation level-dependent (BOLD) changes in MRI signal intensity using a standard echoplanar imaging sequence with the following parameters: repetition time (TR), 1.2 s; echo time (TE), 30 ms; flip angle, 72°; 64 × 64 matrix; voxel size, 3 × 3 × 3 mm; 22 slices oriented approximately perpendicular to the calcarine sulcus, covering the occipital lobe and part of the temporal and parietal lobes. In retinotopic mapping sessions, we used the same imaging parameters with the following exceptions: TR, 1.5 s; flip angle, 75°; 27 slices. At the beginning of each session, we acquired an anatomical T1-weighted MPRAGE (magnetization-prepared rapid gradient echo) volume in the same slices as the functional volumes, but with twice the in-plane resolution (voxel size, 1.5 × 1.5 × 3 mm).
Data analysis: preprocessing
Data from the beginning of each functional run were discarded (10 volumes from each run of MIB and replay; 14 volumes from each run of the retinotopic mapping and periodic stimulus alternation experiments) (see below, Retinotopic mapping and Definition of ROIs) to minimize the effect of transient magnetic saturation, and to allow the hemodynamic response to reach steady-state baseline. We compensated for head movements within and across scans with standard procedures (Jenkinson et al., 2002), converted the data from arbitrary intensity units to percentage modulation, and high-pass filtered (cutoff, 0.02 Hz) the time series to remove slow drift.
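The preprocessing steps after motion correction can be sketched as follows. This is a minimal illustration, assuming that percentage modulation is computed relative to each voxel's run mean and that the high-pass filter is implemented by regressing out low-frequency discrete cosine basis functions up to 0.02 Hz; the text does not specify the filter type, so this is one possible choice rather than the authors' implementation.

```python
import numpy as np

def preprocess_run(raw, tr=1.2, n_discard=10, cutoff_hz=0.02):
    """Discard initial volumes, convert to percent modulation, high-pass filter.

    raw: array of shape (n_volumes,) or (n_volumes, n_voxels).
    Assumption: the high-pass filter regresses out discrete cosine basis
    functions with frequencies at or below cutoff_hz (including the constant).
    """
    y = np.asarray(raw, dtype=float)[n_discard:]
    mean = y.mean(axis=0)
    pct = 100.0 * (y - mean) / mean          # percent modulation around the run mean
    n = pct.shape[0]
    t = np.arange(n)
    # DCT-II basis function k has frequency k / (2 * n * tr) Hz.
    k_max = int(np.floor(cutoff_hz * 2 * n * tr))
    ks = np.arange(k_max + 1)
    basis = np.cos(np.pi * np.outer(t + 0.5, ks) / n)   # shape (n, k_max + 1)
    beta, *_ = np.linalg.lstsq(basis, pct, rcond=None)
    return pct - basis @ beta
```

With 10 discarded volumes at TR = 1.2 s, the first 12 s of each MIB or replay run are excluded.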
The anatomical volume from each session was aligned to a high-resolution anatomical volume (acquired in a different scanning session) by an automated robust image registration algorithm (Nestares and Heeger, 2000). The resulting alignment parameters were used to resample the functional data from each scanning session to the image space of the high-resolution anatomy. Cortical surfaces were extracted from the high-resolution anatomy using SurfRelax software (http://www.cns.nyu.edu/∼jonas/software.html), enabling us to visualize and define regions of interest (ROIs) (see below, Retinotopic mapping and Definition of ROIs) on computationally flattened representations of the occipital cortex. The alignment parameters were also used to transform the ROIs from the high-resolution image space to the image space of each session. This enabled us to coregister the data and extract time series from corresponding ROIs across scanning sessions.
Event-related responses time-locked to perceptual reports
fMRI responses during MIB.
We performed a deconvolution analysis (Dale, 1999) to estimate the mean fMRI response time course for reported target disappearance and reappearance. This procedure is equivalent to selective averaging with correction for overlap between temporally adjacent responses (Dale, 1999), based on the assumption that hemodynamic responses superimpose linearly over time (Boynton et al., 1996). This was done separately for each subject and each of several ROIs in visual cortex [V1, V2, V3, V4, MT+, V3AB, V7, posterior intraparietal sulcus (pIPS)] (see below, Retinotopic mapping and Definition of ROIs). The preprocessed time series were averaged across gray matter voxels within each ROI. The resulting mean time series was up-sampled by a factor of two (that is, to 0.6 s resolution). We then computed ordinary least-squares estimates of the mean responses to the two switch event types (disappearance/reappearance reports) according to the following:
$$h = (X^{T}X)^{-1}X^{T}y \qquad (1)$$
where $y$ was the measured time series, $h = [h_1^{T}\; h_2^{T}]^{T}$ was a vertical concatenation of estimated responses to the two event types, the design matrix $X = [X_1\; X_2]$ was a horizontal concatenation of two convolution matrices corresponding to the two event types, and superscript $T$ indicates matrix transpose. Each $X_i$ had dimensions M × N, where M was the number of samples in $y$ and N was the number of time points in the estimated $h_i$. The first column of each $X_i$ contained 1's at the samples of the corresponding switch event and 0's elsewhere. Each of the N − 1 subsequent columns contained a copy of this event sequence, shifted by the corresponding lag. To create the discrete event sequences, we rounded subjects' disappearance and reappearance reports (button presses sampled at 1 ms resolution) to the nearest sample (600 ms resolution) of the fMRI time series. We estimated 16 parameters (i.e., time points spanning 0–9 s after the report) for both disappearance and reappearance responses. The random distribution of interswitch intervals (see Fig. 1B) made the response estimation from a rapid event sequence particularly efficient (Burock et al., 1998; Dale, 1999). To quantify the mean and variability of responses across subjects, we first concatenated the preprocessed measurements from each run, estimated the responses for each ROI from each subject using the deconvolution procedure, and finally computed the mean and SEM across subjects.
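The deconvolution of Equation 1 can be sketched in a few lines of NumPy. This sketch assumes event times in seconds, the 0.6 s sampling and 16 lags described above, and uses `numpy.linalg.lstsq` in place of the explicit inverse; both give the same ordinary least-squares solution when X has full column rank. The helper names are illustrative, not the authors' code.

```python
import numpy as np

def convolution_matrix(event_samples, n_samples, n_lags):
    """M x N matrix whose column j holds the event sequence delayed by j samples."""
    X = np.zeros((n_samples, n_lags))
    for s in event_samples:
        for lag in range(n_lags):
            row = s + lag
            if 0 <= row < n_samples:
                X[row, lag] += 1.0   # += so coincident events add up linearly
    return X

def deconvolve(y, event_times_s, dt=0.6, n_lags=16):
    """Least-squares estimate of Eq. 1, h = (X^T X)^{-1} X^T y.

    y:             mean ROI time series, sampled every dt seconds
    event_times_s: one array of report times (s) per event type,
                   e.g. [disappearance_times, reappearance_times]
    Returns an array of shape (n_event_types, n_lags); lag 0 is the report.
    """
    y = np.asarray(y, dtype=float)
    blocks = []
    for times in event_times_s:
        # Round each report to the nearest sample of the time series.
        samples = np.rint(np.asarray(times, dtype=float) / dt).astype(int)
        blocks.append(convolution_matrix(samples, len(y), n_lags))
    X = np.hstack(blocks)                      # X = [X_1 X_2 ...]
    # lstsq avoids forming (X^T X)^{-1} explicitly but yields the same estimate.
    h, *_ = np.linalg.lstsq(X, y, rcond=None)
    return h.reshape(len(event_times_s), n_lags)
```

Because overlapping responses are modeled jointly, closely spaced disappearance and reappearance reports are separated rather than simply averaged together.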
fMRI responses during replay.
fMRI responses time-locked to perceptual reports during replay were estimated as described above for MIB (Eq. 1), with the exception that only those perceptual reports (button presses) preceded by a stimulus alternation within a time window from 200 to 1000 ms were encoded as events in the design matrix. This ensured that only button presses after physical stimulus transitions, rather than purely subjective transitions, were used for analyzing the event-related responses during replay. Calculating the responses time-locked to the button presses allowed for direct comparison between the response time courses during MIB and replay. In a separate analysis, we also calculated replay responses time-locked to the physical stimulus transitions (supplemental Fig. 5, available at www.jneurosci.org as supplemental material).
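The event selection for replay runs amounts to keeping only button presses that follow a physical stimulus transition by 200–1000 ms. A minimal sketch, assuming press and transition times are available in seconds (the function name is hypothetical):

```python
import numpy as np

def valid_replay_presses(press_times_s, stim_change_times_s, lo=0.2, hi=1.0):
    """Keep presses preceded by a stimulus transition between lo and hi seconds earlier."""
    presses = np.asarray(press_times_s, dtype=float)
    changes = np.sort(np.asarray(stim_change_times_s, dtype=float))
    # Count transitions c with press - hi <= c <= press - lo for each press.
    n_in_window = (np.searchsorted(changes, presses - lo, side="right")
                   - np.searchsorted(changes, presses - hi, side="left"))
    return presses[n_in_window > 0]
```

The retained presses would then serve as the event times in the same deconvolution used for MIB.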
Isolation of target- and mask-specific responses
Neural response modulations that specifically reflected observers' perception of the target should have been spatially specific to the target representation in each visual cortical area. Likewise, neural responses potentially instigating the target suppression should have been spatially specific to the mask representation. We aimed to isolate such target- and mask-specific response components within each visual area. Therefore, we first removed from the time series in the target subregion any signal component shared by the corresponding mask subregion. Likewise, we removed from the time series in the mask subregion any signal component shared by the corresponding target subregion. In areas MT+, V3AB, V7, and pIPS, the target representation could not be delineated (see below, Definition of ROIs). To isolate mask-specific responses within each of these areas, we therefore removed from the ipsilateral mask subregion (i.e., representing the mask only) any signal component shared by the corresponding contralateral subregion (i.e., representing the target and the mask).
For each subregion of interest (“target” and “mask” subregions for V1–V4; mask subregions for MT+, V3AB, V7, pIPS), the “reference” time series to be removed was obtained by averaging voxel time series across the complementary subregion, separately for each subject. The reference time series was normalized to a unit vector, and a residual target-specific (or mask-specific) subregion time series was then computed with orthogonal projection:
$$y^{*} = y - r\,(r^{T}y) \qquad (2)$$
where y was the original time series of the target (or mask) subregion, r was the unit vector reference time series, and y* was the residual, target-specific (or mask-specific) time series. Having removed the variance accounted for by the reference time series, we calculated the mean responses time-locked to perceptual reports from the residual time series using deconvolution, that is, substituting y with y* in Equation 1. Removing r via orthogonal projection ensured that precisely the amount of r present in y was removed. This procedure isolated a target- or mask-specific response component within each cortical area, which was orthogonal to any response component expressed within the complementary subregion of that area. These residual responses were, therefore, conservative estimates of the target- and mask-specific components of the underlying neural activity. In the following, we refer to the original responses (y) as the “raw” responses and we refer to the residuals (y*) as the “target-specific” and “mask-specific” responses.
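Equation 2 is a one-line operation. The sketch below assumes `y` and `reference` are already preprocessed (mean-centered, high-pass filtered) time series of equal length, so that the projection is equivalent to regressing out the reference without an intercept. The residual is, by construction, orthogonal to the reference.

```python
import numpy as np

def remove_reference(y, reference):
    """Residual y* = y - r (r^T y), with r the reference normalized to unit length."""
    y = np.asarray(y, dtype=float)
    r = np.asarray(reference, dtype=float)
    r = r / np.linalg.norm(r)        # unit-vector reference time series
    return y - r * (r @ y)           # remove exactly the component of y along r
```

The residual `y*` would then replace `y` in the deconvolution of Equation 1.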
We focused on the target- and mask-specific responses in our primary analyses because the raw responses were contaminated by what appeared to be a nonselective and nonsensory “global” response component (see Results, A global response component in early visual cortex). For each cortical area, we removed the reference signal from the fMRI time series, before the calculation of event-related responses, as a proxy for removing this global response component.
Previous fMRI studies likewise removed reference response time courses from the cortical subregion of interest (Meng et al., 2005; Fox et al., 2006; Sylvester et al., 2007). Like simple subtraction of response time courses from different cortical subregions, the orthogonal projection applied here is based on the assumption that stimulus-specific components and global (i.e., spatially nonspecific) components of the cortical activity superimpose linearly. It has been shown that spontaneous fluctuations of cortical activity (Leopold et al., 2003) superimpose linearly with event-related responses (Arieli et al., 1996; Fox et al., 2006), and that their removal improves estimates of spatially specific event-related fMRI responses by eliminating correlated noise (Fox et al., 2006; Sylvester et al., 2007). Moreover, it has been reported that global and/or nonsensory fluctuations of activity in visual cortex can occur, not only spontaneously, but also time-locked to preparatory cues or behavioral reports in visual detection and discrimination tasks (Jack et al., 2006; Sylvester et al., 2007). Such fluctuations may thus camouflage neural responses that are specific to the perceived or attended stimulus (Sylvester et al., 2007). The procedure that we used effectively decorrelated the target- and mask-specific components of the cortical activity (i.e., spatially specific to the stimulus locations) from any superimposed nonsensory or global response component, thereby improving the estimates of spatially specific event-related cortical responses (Leopold et al., 2003; Fox et al., 2006; Sylvester et al., 2007). This procedure guaranteed the spatial specificity of the region-specific responses, but was statistically conservative in that the projection could not have introduced a statistical bias for larger event-related responses.
One might be concerned that the procedure might have artificially produced target-specific and mask-specific responses with opposite polarities. Two observations rule out this concern. First, all target- and mask-specific responses had the same sign as the corresponding raw responses [compare Fig. 3 with Fig. 8A, top row; Fig. 5 with supplemental Fig. 2, top panel (available at www.jneurosci.org as supplemental material)]. Second, we observed target- and mask-specific responses with opposite sign across different visual areas: during target disappearance, area V4 exhibited a target-specific response decrease (see Table 2), whereas pIPS and V3AB exhibited a mask-specific response increase (see Table 3). This dissociation across separate visual areas cannot simply be explained by our analysis procedure.
Statistical comparisons of event-related responses
Statistical tests were performed across subjects (treating individual differences between subjects as a “random effect”), ensuring that significant effects were robust against intersubject variability. We also repeated the statistical tests after combining responses across subjects, treating them as measurements from a single subject (“fixed effect”). The main results reported in this study were qualitatively identical for both fixed- and random-effect analyses.
For the statistical analysis of target- and mask-specific responses, we averaged the mean response time courses (estimated with deconvolution) (see above, Event-related responses time-locked to perceptual reports) across a time window from 0 to 1.8 s after the button press (four samples). Our primary purpose was to characterize changes of neural activity occurring during the spontaneous target disappearance. The onset of the hemodynamic response typically lags behind the onset of neural activity by ∼2 s (Heeger and Ress, 2002; Logothetis and Wandell, 2004). Thus, the 0–1.8 s time window likely corresponds to neural activity preceding the perceptual report. To test the target disappearance responses for significance, we used a simple t test across subjects, comparing the disappearance responses (averaged across the 0–1.8 s time window) against zero.
For the statistical analysis of the global response component evident in the raw response time courses of early visual cortex (see Results, A global response component in early visual cortex), we focused on a later time window (4.2–7.8 s after the button press). The global responses were delayed with respect to both the button press and the target-specific response modulations. The 4.2–7.8 s time window typically comprised the peaks/troughs of the global responses and was thus most sensitive. To test the global response component for significance, we performed a paired t test, comparing responses (averaged across the 4.2–7.8 s time window) after reappearance and disappearance.
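Both tests reduce to averaging each subject's deconvolved response within a window and computing a t statistic across subjects; a paired t test is a one-sample test on the within-subject differences. A minimal NumPy sketch, assuming responses are arranged as (subjects × lags) at 0.6 s spacing with lag 0 at the button press (p values could be obtained from `scipy.stats.ttest_1samp` or `scipy.stats.ttest_rel`):

```python
import numpy as np

DT = 0.6  # seconds per deconvolved time point

def window_mean(responses, t_start, t_end, dt=DT):
    """Average each subject's response over lags with t_start <= t <= t_end.

    responses: array (n_subjects, n_lags), lag 0 at the button press.
    """
    responses = np.asarray(responses, dtype=float)
    t = np.arange(responses.shape[1]) * dt
    sel = (t >= t_start - 1e-9) & (t <= t_end + 1e-9)   # tolerance for float lags
    return responses[:, sel].mean(axis=1)

def t_one_sample(x):
    """t statistic and degrees of freedom for testing mean(x) against zero."""
    x = np.asarray(x, dtype=float)
    n = x.size
    return x.mean() / (x.std(ddof=1) / np.sqrt(n)), n - 1

# Early window (0-1.8 s, four samples): disappearance responses against zero.
#   t, df = t_one_sample(window_mean(disappear, 0.0, 1.8))
# Late window (4.2-7.8 s, seven samples): reappearance vs. disappearance, paired.
#   t, df = t_one_sample(window_mean(reappear, 4.2, 7.8) - window_mean(disappear, 4.2, 7.8))
```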
To characterize the relative timing of the raw responses measured in the target and mask subregions of areas V1 through V4, we compared the peak/trough latencies using a Wilcoxon signed-rank test. To this end, we fitted smooth curves to the mean responses using cubic spline functions and identified the peak (for reappearance) or trough (for disappearance) of the best-fitting curve (time window, 2–9 s after the button press).
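The latency estimate can be sketched with an interpolating cubic spline evaluated on a fine grid. The text does not state the spline's boundary conditions; this sketch assumes a natural spline (zero second derivative at both ends), equivalent to `scipy.interpolate.CubicSpline(..., bc_type="natural")`, and implements it in NumPy so that it is self-contained. Per-subject latencies from the target and mask subregions could then be compared with a signed-rank test such as `scipy.stats.wilcoxon`.

```python
import numpy as np

def natural_cubic_spline(x, y, x_new):
    """Evaluate a natural cubic spline through (x, y) at x_new (x strictly increasing)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    h = np.diff(x)
    # Solve for knot second derivatives M with M[0] = M[-1] = 0 (natural ends).
    A = np.zeros((n, n))
    b = np.zeros(n)
    A[0, 0] = A[-1, -1] = 1.0
    for i in range(1, n - 1):
        A[i, i - 1] = h[i - 1]
        A[i, i] = 2.0 * (h[i - 1] + h[i])
        A[i, i + 1] = h[i]
        b[i] = 6.0 * ((y[i + 1] - y[i]) / h[i] - (y[i] - y[i - 1]) / h[i - 1])
    M = np.linalg.solve(A, b)
    x_new = np.asarray(x_new, dtype=float)
    k = np.clip(np.searchsorted(x, x_new) - 1, 0, n - 2)   # segment index
    dx0, dx1, hk = x_new - x[k], x[k + 1] - x_new, h[k]
    return (M[k] * dx1**3 / (6 * hk) + M[k + 1] * dx0**3 / (6 * hk)
            + (y[k] / hk - M[k] * hk / 6) * dx1
            + (y[k + 1] / hk - M[k + 1] * hk / 6) * dx0)

def extremum_latency(response, dt=0.6, window=(2.0, 9.0), kind="peak", step=0.01):
    """Latency (s) of the spline peak ("peak") or trough (any other kind) in the window."""
    t = np.arange(len(response)) * dt
    grid = np.arange(window[0], window[1] + step / 2, step)
    fit = natural_cubic_spline(t, response, grid)
    i = np.argmax(fit) if kind == "peak" else np.argmin(fit)
    return grid[i]
```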
Retinotopic mapping
A periodic “traveling-wave” stimulation protocol was used to measure retinotopic maps in visual cortex (Engel et al., 1994; Sereno et al., 1995; Wandell et al., 2007). In brief, we measured the cortical representation of polar angle with a slowly rotating wedge-shaped checkerboard stimulus (45° wide) in six scanning runs (three clockwise and three counterclockwise) and the representation of eccentricity with a slowly expanding or contracting annulus-shaped checkerboard stimulus (duty cycle of 25%) in four runs (two expanding and two contracting). For five subjects, the retinotopy stimuli were displayed via the LCD projector (see above, Stimulus, task, and procedure), with a maximum eccentricity of ∼10°. For the remaining subject, retinotopy stimuli were displayed on an LCD flat panel (NEC 2110; NEC) display located behind the scanner bore, with a maximum eccentricity of ∼6°.
A standard Fourier-based analysis was used to identify the borders of retinotopically organized visual areas. The amplitudes and phases of the response were extracted from the fMRI time series at each voxel. We then projected the maps of response phase onto the flattened cortical surface, and defined the area boundaries as the phase reversals of the polar angle maps, using the multiple representations of the fovea as an additional guide for identification of higher-tier extrastriate areas (Wandell et al., 2007). Six previously described visual cortical areas were thus defined in each subject: V1, V2, V3, V4, V3AB (V3A and V3B combined), and V7, which is also referred to as IPS0 (Swisher et al., 2007; Wandell et al., 2007). We note that there is controversy over the definition of area V4 (Hansen et al., 2007; Wandell et al., 2007). We defined area V4 following the convention of Wandell et al. (2007).
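The per-voxel Fourier analysis can be sketched as follows, assuming each run contains an integer number of stimulus cycles (`n_cycles`, a parameter not specified in the text for the retinotopy runs). Coherence is computed here as the amplitude at the stimulus frequency divided by the square root of the summed squared amplitudes at all non-DC frequencies, one common variant of the traveling-wave definition; the exact variant used by the authors is an assumption.

```python
import numpy as np

def fourier_phase_map(ts, n_cycles):
    """Amplitude, phase, and coherence at the stimulus frequency for each voxel.

    ts: array (n_timepoints, n_voxels) of preprocessed data; the stimulus
        completes n_cycles full cycles per run (assumed integer).
    """
    ts = np.asarray(ts, dtype=float)
    ts = ts - ts.mean(axis=0)
    F = np.fft.rfft(ts, axis=0)
    amp = np.abs(F)
    signal = amp[n_cycles]
    phase = np.angle(F[n_cycles])      # radians in (-pi, pi]; cos(wt + phi) -> phi
    # Stimulus-frequency amplitude relative to the total non-DC amplitude.
    coherence = signal / np.sqrt(np.sum(amp[1:] ** 2, axis=0))
    return signal, phase, coherence
```

The resulting phase maps, projected onto the flattened surface, are what reveal the polar-angle reversals used to draw area borders.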
Definition of ROIs
Cortical target subregions.
To identify the target subregions of V1–V4, subjects completed 8–10 runs of a periodic block-alternation (15 cycles of 16.8 s) between two complementary diagonal arrangements of probe stimuli (see Fig. 2A). Each probe was centered in one visual field quadrant at the same eccentricity as the MIB target. The probes had the same spatial configuration, contrast, and color as the MIB targets, but additionally flickered at ∼8 Hz to drive visual cortical neurons more strongly. We defined the target subregions in visual areas V1 through V4 as the ensemble of voxels: (1) located in the hemisphere contralateral to the target, (2) with responses that correlated with the stimulus alternations (threshold, r > 0.5), and (3) with responses that modulated with a phase that corresponded to blocks when the probe was presented (π to 2π). The results were qualitatively similar for a range of correlation and phase thresholds. Figure 2A shows a map of the responses evoked by the flickering probe in the localizer session for one example subject, along with the borders of visual areas V1 through V4. For this subject, the target was located at ∼4.5° eccentricity in the lower left visual field quadrant, and spanned ∼15° of polar angle. Accordingly, the probe evoked strong responses in the dorsal parts of V1, V2, and V3, and in ventral area V4.
To compensate for any possible coregistration errors, we repeated between two and four of these target localizer runs within each session of the main experiment and further restricted the target subregions as described above, now in the slices of the functional volumes. We used a more liberal correlation threshold for restricting the ROIs based on these within-session localizers (threshold: r > 0.2 for V1, V2, V3; r > 0.15 for V4). Responses in V4 tended to be weaker, so the lower threshold in V4 enabled us to identify a similar number of target-responsive voxels in areas V1, V2, V3, and V4 in each subject and scanning session (Table 1). Repeating the analysis using r > 0.15 in all four visual areas yielded qualitatively similar results.
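The three selection criteria combine as a voxelwise logical AND. A minimal sketch, assuming per-voxel correlation values, response phases (in radians, with the phase convention such that probe-on blocks fall in π to 2π), and a boolean contralateral-hemisphere flag have already been computed; the input names and the per-area threshold dictionary are hypothetical.

```python
import numpy as np

# Within-session thresholds stated in the text (the anatomical-session localizer used 0.5).
WITHIN_SESSION_R = {"V1": 0.2, "V2": 0.2, "V3": 0.2, "V4": 0.15}

def select_target_voxels(r, phase, in_contra_hemisphere, r_thresh=0.5,
                         phase_window=(np.pi, 2 * np.pi)):
    """Boolean mask of voxels meeting all three target-subregion criteria."""
    ph = np.mod(np.asarray(phase, dtype=float), 2 * np.pi)   # map phase to [0, 2*pi)
    lo, hi = phase_window
    in_window = (ph >= lo) & (ph < hi)
    return (np.asarray(in_contra_hemisphere, dtype=bool)
            & (np.asarray(r, dtype=float) > r_thresh)
            & in_window)

# Example: restricting V4 with the more liberal within-session threshold.
#   v4_target = select_target_voxels(r_v4, phase_v4, contra_v4, WITHIN_SESSION_R["V4"])
```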
Table 1. Size of target subregions in visual cortex
Cortical mask subregions.
To identify mask subregions of each cortical area, subjects completed two to four runs in which the mask pattern alternated periodically (15 cycles of 16.8 s) between moving and static dots. This allowed us to define two motion-sensitive areas of the dorsal pathway: the human MT+ complex, containing both MT and MST (Watson et al., 1993; Tootell et al., 1995; Huk et al., 2002), and a region of pIPS. It also allowed us to identify (threshold, r > 0.5) the subregions corresponding to the mask in each of the other visual cortical areas: V1–V4, V3AB, and V7. pIPS was located in the posterior segment of the horizontal ramus of the intraparietal sulcus, anterior and immediately adjacent to V7; it overlapped with areas IPS1 and IPS2 as defined topographically (Schluppeck et al., 2005; Silver et al., 2005; Swisher et al., 2007). For each of the areas V1, V2, V3, and V4, we excluded the target subregion (defined as described above) from the mask subregion. The representation of the target probe stimulus could not, however, be consistently identified in MT+, V3AB, V7, and pIPS. We therefore divided each of these mask-responsive areas into the subregion ipsilateral to the target stimulus (containing only the mask representation) and the subregion contralateral to the target stimulus (containing both the target and part of the mask representation). For each cortical area analyzed in this study, therefore, we defined two complementary subregions: target and mask subregions for areas V1–V4, and mask (ipsilateral to the target stimulus) and target plus mask (contralateral to the target stimulus) for areas MT+, V3AB, V7, and pIPS.
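The construction of complementary subregion pairs can be expressed as boolean set operations. This sketch assumes per-voxel correlations with the moving/static mask alternation, an optional target mask (available only for V1–V4), and per-voxel hemisphere labels; all argument names are hypothetical.

```python
import numpy as np

def complementary_subregions(mask_r, target_mask, hemi, target_hemi, target_mapped,
                             r_thresh=0.5):
    """Return the two complementary subregions used for one visual area.

    mask_r:        per-voxel correlation with the moving/static mask alternation
    target_mask:   boolean target subregion (ignored when target_mapped is False)
    hemi:          per-voxel hemisphere label, e.g. "L" or "R"
    target_hemi:   hemisphere contralateral to the target
    target_mapped: True for V1-V4, False for MT+, V3AB, V7, and pIPS
    """
    mask_resp = np.asarray(mask_r, dtype=float) > r_thresh
    hemi = np.asarray(hemi)
    if target_mapped:
        # V1-V4: target subregion vs. mask subregion with the target excluded.
        target = np.asarray(target_mask, dtype=bool)
        return target, mask_resp & ~target
    # Other areas: contralateral (target + mask) vs. ipsilateral (mask only).
    contra = mask_resp & (hemi == target_hemi)
    ipsi = mask_resp & (hemi != target_hemi)
    return contra, ipsi
```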
Control ROIs.
To characterize the spatial extent of the global response component observed in early visual cortex (see Results, A global response component in early visual cortex), we defined three additional ROIs in and around area V1: (1) the V1 subregion corresponding to the black stimulus background surrounding the moving mask, that is, anterior to the mask subregion in retinotopically defined V1; (2) a comparably large subregion of early visual cortex corresponding to the surround of the projection screen, that is, further anterior along the calcarine sulcus; (3) a bihemispheric region of occipital white matter in the vicinity of area V1. In what follows, we refer to these ROIs as “background,” “surround,” and “white matter.” To ensure that the background ROI did not contain any of the cortical mask representation, we made use of the negative BOLD fMRI responses commonly observed to surround stimulus-evoked responses in area V1 (Tootell et al., 1998; Shmuel et al., 2006). We selected those voxels in the periphery representation of V1 (eccentricity range, ∼7–10°) that modulated in anti-phase (i.e., were negatively correlated) with the cortical response to the mask (threshold, r < −0.2). This was the case for most voxels lying anterior to the mask subregion. It was important that this control analysis distinguish among subregions corresponding to the mask, the stimulus background, and the surround of the projection screen. Therefore, we restricted the analysis to the five subjects for whom V1 had been retinotopically mapped out to 10° of eccentricity (see above, Retinotopic mapping).
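The background ROI criterion can be sketched as follows, assuming each voxel's preferred eccentricity (from the eccentricity map) and its correlation with the cortical response to the mask have been computed beforehand; the inputs are hypothetical names for those quantities.

```python
import numpy as np

def background_voxels(in_v1, eccentricity_deg, r_mask, ecc_range=(7.0, 10.0), r_thresh=0.2):
    """V1 voxels in the peripheral representation that are anti-correlated with the mask response.

    in_v1:            boolean per-voxel membership in retinotopically defined V1
    eccentricity_deg: preferred eccentricity per voxel
    r_mask:           correlation of each voxel with the cortical response to the mask
    """
    ecc = np.asarray(eccentricity_deg, dtype=float)
    r = np.asarray(r_mask, dtype=float)
    peripheral = (ecc >= ecc_range[0]) & (ecc <= ecc_range[1])
    anti_phase = r < -r_thresh            # negatively correlated beyond the threshold
    return np.asarray(in_v1, dtype=bool) & peripheral & anti_phase
```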
Results
We measured fMRI responses in multiple visual cortical areas while subjects reported (by depressing one of two buttons) the visibility and invisibility of a salient yellow target contour during MIB (Fig. 1A) or while the target was briefly removed from the display. The target offsets and onsets in this physical replay condition followed the exact temporal sequence of perceptual states reported during the preceding MIB run. A histogram of these perceptual state durations is shown in Figure 1B. We found three distinct cortical response components associated with the subjects' perceptual disappearance reports: an early target-specific decrease in ventral area V4; an early mask-specific increase in dorsal, motion-selective areas; and a delayed global response decrease in early visual cortex.
MIB. A, Schematic illustration of an epoch from a typical MIB experiment. The top row shows the physical stimulus. A constant salient target (yellow) was surrounded by a moving dot pattern (blue), which appeared as a rotating sphere. The target was separated from the moving pattern by a blank zone subtending ∼2° of visual angle. The second row illustrates the subject's fluctuating perception of the target. B, Distributions of target visible (blue) and target invisible (red) periods during fMRI experiments. The thick lines indicate the mean, and the thin lines indicate the SEM across subjects (n = 6).
A target-specific response component in ventral visual cortex
To isolate target-specific cortical responses, we first identified the cortical subregions corresponding retinotopically to the target (Fig. 2A) and the subregions corresponding to the mask in each of visual areas V1 through V4. We then removed, within each area, the time series of the mask subregion from the time series of the target subregions by orthogonal projection, analogous to previous studies (see Materials and Methods, Isolation of target- and mask-specific responses) (Meng et al., 2005; Fox et al., 2006; Sylvester et al., 2007). In the following, we refer to the original responses (before removing the time series of the mask subregion) as the raw responses and we refer to the residuals as the target-specific responses. Finally, we averaged the target-specific responses, time-locked to subjects' perceptual reports (disappearance and reappearance) by means of deconvolution (see Materials and Methods, Event-related responses time-locked to perceptual reports). This procedure ensured the spatial specificity of the resulting responses without introducing a statistical bias toward larger responses or toward a particular response sign.
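The two analysis steps described above, removing the mask time series by orthogonal projection and estimating event-related responses by deconvolution, might be sketched as below. The deconvolution is written as a standard finite impulse response (FIR) least-squares model with one regressor per post-event lag and a constant baseline term; this is an assumed implementation, and the number of lags, the sampling interval, and the function names are hypothetical.

```python
import numpy as np

def remove_component(target_ts, mask_ts):
    """Residual of the target time series after orthogonal projection onto the mask time series."""
    y = target_ts - target_ts.mean()
    x = mask_ts - mask_ts.mean()
    beta = (x @ y) / (x @ x)   # least-squares weight of the mask component
    return y - beta * x        # residual is orthogonal to the (demeaned) mask time series

def fir_deconvolve(ts, onsets_by_type, n_lags):
    """Least-squares FIR estimate of event-related responses for several event types.

    ts             : (n_timepoints,) time series, one sample per volume
    onsets_by_type : dict mapping event type -> list of onset indices (in volumes)
    n_lags         : number of post-event samples to estimate per event type
    """
    n = ts.size
    types = list(onsets_by_type)
    X = np.zeros((n, n_lags * len(types)))
    for k, ev in enumerate(types):
        for onset in onsets_by_type[ev]:
            for lag in range(n_lags):
                if 0 <= onset + lag < n:
                    X[onset + lag, k * n_lags + lag] = 1.0   # one indicator column per lag
    X = np.column_stack([X, np.ones(n)])   # constant column absorbs the baseline
    coef, *_ = np.linalg.lstsq(X, ts, rcond=None)
    return {ev: coef[k * n_lags:(k + 1) * n_lags] for k, ev in enumerate(types)}

# Usage (hypothetical arrays): residual = remove_component(v4_target_ts, v4_mask_ts)
# responses = fir_deconvolve(residual, {"off": off_onsets, "on": on_onsets}, n_lags=14)
```

Because the FIR model estimates a separate amplitude at each lag, it does not assume a hemodynamic response shape, and overlapping responses to closely spaced reports are separated by the least-squares fit.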
Linking fMRI responses to MIB. A, Map of cortical responses from one example subject to a flickering probe stimulus at the target location. Colors represent correlation between measured activity and stimulus alternations (threshold, r > 0.5). The map is superimposed on a flattened representation of the subject's occipital lobe. The borders of visual areas V1, V2, V3, and V4 are indicated as white lines. The probe was presented in the lower left visual field quadrant, that is, in the same location as the target in the MIB experiment (see icon, gray-shaded region). Thus, it evoked responses in the corresponding dorsal subregions of right hemisphere visual areas V1–V3, and in ventral V4. B, Target-specific fMRI responses in area V4 (see icon, gray-shaded region) from an example subject during MIB: target disappearance (“off”) and target reappearance (“on”). Target-specific responses were isolated by removing the mask subregion time series from the target subregion time series via orthogonal projection (see Materials and Methods, Isolation of target- and mask-specific responses), calculating event-related responses from the residuals for each run, and averaging across runs. Error bars indicate SEM across runs (n = 23).
There was a strong and consistent target-specific response during the perceptual transitions in area V4. Figure 2B shows these target-specific responses in the target subregions of area V4 for an example subject. The responses reflected both types of perceptual transitions. Specifically, responses decreased below the mean baseline level when time-locked to target disappearance and increased when time-locked to target reappearance (Fig. 2B, red squares and blue circles). Figure 3 shows the group average target-specific response modulations for visual areas V1, V2, V3, and V4, and for both MIB and replay. Responses to target reappearance increased similarly for MIB and replay in all four visual areas (Fig. 3, compare blue circles and light blue stars). But there were notable differences between the disappearance responses during MIB and replay (Fig. 3, compare red squares and pink diamonds), as characterized in the following paragraphs. In an alternative version of the analysis, we isolated the target-specific responses by removing the average mask time series, collapsed across the mask subregions of all four visual areas (V1–V4). Removing this average mask time series from the time series of each individual target subregion yielded nearly identical results (supplemental Fig. 1, available at www.jneurosci.org as supplemental material).
Target-specific responses in visual areas V1–V4. Group average target-specific disappearance (off) and reappearance (on) responses during MIB and replay for the target subregions (see icon, gray-shaded region). Target-specific responses were isolated by removing the mask subregion time series from the target subregion time series via orthogonal projection (see Materials and Methods, Isolation of target- and mask-specific responses), calculating event-related responses from residuals for each subject, and averaging across subjects. Error bars indicate SEM across subjects (n = 6).
Subjective perceptual transitions during bistable stimulus viewing, like physical stimulus transients, might capture attention (Lee et al., 2007; Pastukhov and Braun, 2007), which is known to boost neural activity in visual cortex (Liu et al., 2005). An important concern is thus that neural responses associated with perceptual transitions might only reflect the secondary effect of attention capture, rather than the perceptual transition per se. However, neural responses reflecting attention capture and neural responses reflecting perception should behave differently during target disappearance in MIB. Stimulus onsets and offsets both capture attention (Theeuwes, 1991; Watson and Humphreys, 1995). Activity in the target subregion reflecting attention capture should thus increase during both target reappearance and disappearance. In contrast, activity in the target subregion reflecting target perception should decrease during target disappearance. We therefore focused on the disappearance responses to distinguish response decreases (which we interpret as reflecting target perception) from response increases (which are confounded with attention capture).
In fact, responses to the physical target removal during replay, particularly in V4, increased initially (Fig. 3, pink diamonds, 0–4.2 s after button press), as would be expected for attention capture. Similar transient response increases have been observed in human V4 after contrast decrements (Gardner et al., 2005). Later in time, the responses to the physical target removal fell below baseline in all four areas, particularly in V1, V3, and V4 (Fig. 3, pink diamonds, >4.8 s after the button press). These delayed response decreases during replay suggest that our measurements had sufficient sensitivity and retinotopic specificity to detect neural responses specific to the small target stimulus even in the presence of the moving mask (see also supplemental Fig. 4, available at www.jneurosci.org as supplemental material).
During MIB, disappearance responses in V4 decreased as early as the time of the button press (Fig. 3, red squares). Figure 4 shows the mean amplitudes of the target-specific disappearance responses, averaged across the time window 0–1.8 s after the button press. The disappearance responses in V4 were significantly below baseline during MIB, but not during replay (Table 2). The target-specific response decreases in V4 were robust and statistically significant in all time windows from 0 to 4.8 s (p < 0.05, two-tailed t test); we nevertheless focused on the initial (early) responses coincident with the perceived target disappearance. The same was true for the target-specific responses obtained in V4 by removing the mask time series averaged across V1–V4 instead of the V4-specific one (supplemental Fig. 1, available at www.jneurosci.org as supplemental material). The raw responses (before removing the time series of the mask subregion from that of the target subregion) also decreased significantly in the target subregion of V4 during this early time window after target disappearance during MIB, but not during replay (Table 2). Thus, the sign of the response (decrease) in the V4 target subregion did not depend on the specific data analysis procedure.
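The window-averaged amplitudes and the across-subject test against zero can be sketched as follows. To stay dependency-free, this sketch compares the t statistic with the two-tailed critical value for df = 5 (n = 6 subjects, α = 0.05) rather than computing an exact p value; the function names are hypothetical.

```python
import numpy as np

T_CRIT_DF5 = 2.5706  # two-tailed critical t for alpha = 0.05, df = 5 (n = 6 subjects)

def window_amplitude(responses, lag_times_s, t_start=0.0, t_end=1.8):
    """Average each subject's event-related response within [t_start, t_end] s.

    responses   : (n_subjects, n_lags) deconvolved responses
    lag_times_s : (n_lags,) time of each lag relative to the button press
    """
    in_window = (lag_times_s >= t_start) & (lag_times_s <= t_end)
    return responses[:, in_window].mean(axis=1)

def one_sample_t(values):
    """t statistic for the across-subject mean differing from zero."""
    values = np.asarray(values, dtype=float)
    sem = values.std(ddof=1) / np.sqrt(values.size)   # SEM across subjects
    return values.mean() / sem

# Usage (hypothetical arrays): amps = window_amplitude(v4_off_responses, lag_times)
# significant = abs(one_sample_t(amps)) > T_CRIT_DF5
```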
Early target disappearance responses in visual areas V1–V4. Group average target-specific response amplitudes immediately after the perceptual report during MIB and replay (time window, 0–1.8 s after button press) for the target subregions (see icon, gray-shaded region). Error bars indicate SEM across subjects (n = 6). The asterisks indicate significant difference from 0 (p < 0.05, two-tailed t test).
Early disappearance responses in target and mask subregions of visual areas V1–V4
The early response decreases observed in V4 during MIB must have occurred around the time of the subjective disappearance. The onset of the hemodynamic response typically lags behind the onset of neural activity by ∼2 s. The median latencies of disappearance button presses in the replay experiments ranged across subjects from 501 to 568 ms (upper quartile, 586–819 ms). Assuming similar button press latencies during MIB, the significant fMRI response decreases at t < 1.8 s indicate that the underlying neural responses in V4 likely preceded the behavioral report.
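The timing argument reduces to simple arithmetic, sketched here under the assumptions stated in the text: a hemodynamic delay of ∼2 s and report latencies equal to the range of subjects' median replay latencies (501–568 ms). The variable and function names are illustrative only.

```python
HEMODYNAMIC_DELAY_S = 2.0                 # approximate lag of BOLD onset behind neural onset
MEDIAN_REPORT_LATENCY_S = (0.501, 0.568)  # range of subjects' median replay latencies

def neural_time_rel_press(bold_time_rel_press_s, delay_s=HEMODYNAMIC_DELAY_S):
    """Shift an fMRI time point back by the hemodynamic delay."""
    return bold_time_rel_press_s - delay_s

# The end of the early window (1.8 s) maps to roughly -0.2 s, i.e., before the press.
latest_neural = neural_time_rel_press(1.8)
# The perceptual change itself is expected about one report latency before the press.
perceptual_change = [-lat for lat in MEDIAN_REPORT_LATENCY_S]
```

Under these assumptions, the neural change falls between the estimated perceptual change (about −0.5 s) and the button press (0 s).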
In none of the earlier visual areas were the target-specific disappearance responses statistically significant (Fig. 4, Table 2). There were no significant response decreases in V1–V3 in any of the time windows from 0 to 4.8 s. The same was true for the target-specific responses obtained in these areas by removing the mask time series averaged across V1–V4 (supplemental Fig. 1, available at www.jneurosci.org as supplemental material). Neither were the raw disappearance responses statistically significant in the target subregions of V1–V3 (Table 2). Thus, there was no evidence for any response modulation around the time of the subjective target disappearance in visual areas V1 through V3.
The mask-specific response modulations in areas V1 through V4 were statistically indistinguishable from zero during MIB (Table 2). Mask-specific responses were calculated by removing, separately for each visual area, the time series of the target subregion from the time series of the mask subregion (see Materials and Methods, Isolation of target- and mask-specific responses). There was no evidence of response modulation in mask subregions of V1 through V4, for any time window between 0 and 4.8 s after spontaneous target disappearance. This apparent lack of mask-specific response modulation in V1 through V4 contrasts with the robust target-specific responses in V4 (see above), and with the robust mask-specific responses in dorsal visual areas (presented next).
An opposite, mask-specific response component in dorsal visual cortex
Several cortical areas in the dorsal visual pathway responded strongly to the motion of the mask (see Materials and Methods, Definition of ROIs). These dorsal visual areas included the human MT+ complex (MT and MST combined), V3AB (V3A and V3B combined), V7, and the posterior IPS (pIPS). Activity in these areas has been reported to correlate with the perception of 3D structure from motion (Brouwer and van Ee, 2007), such as that conveyed by the moving mask, and with the control of top–down attention (Kastner and Ungerleider, 2000; Corbetta and Shulman, 2002; Silver et al., 2005).
We focused on the responses to the mask in the hemisphere ipsilateral to the target because the contralateral subregions reflected a mixture of responses to target and mask. The target representation could not be reliably identified in these dorsal areas, presumably because of larger response field size and scatter than in areas V1 through V4 (Wandell et al., 2007; Dumoulin and Wandell, 2008). We therefore analyzed these dorsal mask subregions separately for the hemisphere ipsilateral to the target (containing only the mask representation) and contralateral to the target (containing target and mask representations). To further isolate mask-specific cortical responses, we removed from the time series in the mask subregion the time series of the contralateral (target plus mask) subregion, before calculating the event-related responses (see Materials and Methods, Isolation of target- and mask-specific responses). We refer to the original responses (before removing the time series of the contralateral subregion) as the raw responses and we refer to the residuals as the mask-specific responses [supplemental Figs. 2 and 3 (available at www.jneurosci.org as supplemental material) show the raw responses of all four dorsal visual areas in both hemispheres]. As noted above, this procedure ensured the spatial specificity of the resulting responses without introducing a statistical bias toward larger responses or toward a particular response sign.
During MIB, the mask-specific responses in areas MT+, V3AB, and pIPS tended to increase above baseline when time-locked to target disappearance and to decrease when time-locked to target reappearance (Fig. 5, red squares and blue circles). In other words, the mask-specific responses in these dorsal visual areas were opposite to the target-specific responses in V4. The opposite responses during MIB were not expressed throughout intraparietal cortex, but were confined to a subset of intraparietal areas. In area V7, located directly between V3AB and pIPS, responses were more like those in V1–V4 and tended to decrease during target disappearance and increase during target reappearance (supplemental Fig. 3, available at www.jneurosci.org as supplemental material).
Mask-specific responses in dorsal visual areas. Group average mask-specific disappearance (off) and reappearance (on) responses during MIB and replay for the mask subregions of areas MT+, V3AB, and pIPS ipsilateral to the target stimulus (i.e., contralateral to the cortical representation of the target) (see icon, gray-shaded region). Mask-specific responses were isolated by removing the time series of the corresponding contralateral subregion containing the target representation (see Materials and Methods, Isolation of target- and mask-specific responses), calculating event-related responses, and averaging across subjects. Error bars indicate SEM across subjects (n = 6).
As with the target-specific responses in V4, it was important to consider whether the mask-specific responses in dorsal areas might have reflected attention capture by the phenomenal transient (Lee et al., 2007; Pastukhov and Braun, 2007). A reflexive attention shift to the transient change of target visibility would predict a response decrease, if any, in the cortical mask representation for both target reappearance and disappearance. Thus, the response increase observed in the dorsal mask subregions during target disappearance can hardly be explained by attention capture. We therefore again focused on the disappearance responses, now distinguishing response increases in the mask subregion (which might be involved in the target suppression) from response decreases (which are confounded with attention capture).
The early mask-specific response increases at target disappearance during MIB were significantly above zero in pIPS and V3AB, but not in MT+ and V7. Figure 6 shows the disappearance response amplitudes averaged across the time window from 0 to 1.8 s after the perceptual report, and Table 3 lists the statistics. The raw disappearance responses in V3AB and pIPS showed the same polarity as the mask-specific responses (supplemental Fig. 2, top panel, available at www.jneurosci.org as supplemental material). However, isolating the spatially specific component accentuated the early opposite-polarity responses of the mask subregions, particularly in V3AB (Table 3), indicating that these responses predominated in the hemisphere contralateral to the cortical target representation. Again, this observation is incompatible with attention capture by the target disappearance as the source of the responses in these dorsal areas. Such a reflexive attention shift to the target location would have predicted a larger response increase in the hemisphere containing the target representation.
Early target disappearance responses in dorsal visual areas. Group average mask-specific response amplitudes immediately after the perceptual report during MIB and replay (time window, 0–1.8 s after button press) for the mask subregions ipsilateral to the target stimulus (i.e., contralateral to the cortical representation of the target) (see icon, gray-shaded region). Error bars indicate SEM across subjects (n = 6). The asterisks indicate significant difference from 0 (p < 0.05, two-tailed t test).
Early disappearance responses in mask subregions of dorsal visual areas
If the dorsal regions in the IPS played a role in instigating the spontaneous target suppression during MIB, the opposite polarity responses should have been specific to MIB, and should not have occurred during the physical removal of the target. In fact, there was no evidence of response increases in V3AB and pIPS during the physical removal of the target; responses instead tended to decrease (Fig. 5, compare red and pink curves, and blue and light blue curves). The disappearance responses immediately after the perceptual report were significantly larger during MIB than during replay in both V3AB and pIPS, for both the raw and the mask-specific responses (p < 0.05, two-tailed paired t test). This difference was significant neither in MT+ (raw responses, p = 0.42; mask-specific responses, p = 0.45) nor in V7 (raw responses, p = 0.3; mask-specific responses, p = 0.11). The striking dissociation between mask-specific responses during spontaneous disappearance and physical removal of the target is consistent with the hypothesis that cortical representations of the mask in dorsal visual areas (i.e., V3AB and pIPS) play a crucial role in the spontaneous suppression of target representations in ventral visual areas (i.e., V4) during MIB.
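The comparison between MIB and replay is a paired test across subjects, because each subject contributes one amplitude per condition. A minimal sketch of the paired t statistic follows; as above, it compares against the two-tailed critical value for df = 5 instead of computing an exact p value, and the names are hypothetical.

```python
import numpy as np

T_CRIT_DF5 = 2.5706  # two-tailed critical t for alpha = 0.05, df = 5 (n = 6 subjects)

def paired_t(a, b):
    """Paired t statistic for per-subject amplitudes a (e.g., MIB) and b (e.g., replay)."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)   # within-subject differences
    return d.mean() / (d.std(ddof=1) / np.sqrt(d.size))

# Usage (hypothetical arrays): abs(paired_t(pips_mib_amps, pips_replay_amps)) > T_CRIT_DF5
```

Testing within-subject differences removes between-subject variability in overall response amplitude, which is why a paired test suits this MIB-versus-replay comparison.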
A global response component in early visual cortex
Superimposed on the target- and mask-specific responses, there was a global (i.e., spatially nonspecific) response component in early visual cortex. To preview the results, this global response component had the following characteristics. The global response decreased with target disappearance and increased with target reappearance. It was delayed with respect to the perceptual report and with respect to the target- and mask-specific response components. Unlike the target- and mask-specific responses discussed above, the global response was also present during physical replay. Together, these results suggest that the global response component did not instigate the spontaneous switch of target visibility during MIB. In the following characterization of the global response, we focused on a later time window (4.2–7.8 s after the perceptual report), because the peaks and troughs of the global response were typically within this time window.
The global response was evident throughout the entire representation of the visual field. Figure 7 shows the raw MIB and replay responses of three subregions of area V1, corresponding to the mask, the black stimulus background, and the surround of the projection screen. All three V1 subregions (mask, background, surround) exhibited a significant (4.2–7.8 s: p < 0.05, two-tailed paired t test) response modulation during both MIB and replay. In other words, the global response in V1 was not restricted to the cortical representation of the MIB stimulus, but was also evident in the representation of the far visual field periphery. The responses of a bihemispheric region in the occipital white matter in the vicinity of V1 are shown in Figure 7 as a control. There was no significant modulation in this white matter control ROI (4.2–7.8 s: MIB, p = 0.49; replay, p = 0.15; two-tailed paired t test), indicating that the global component was confined to the cortex.
Global response throughout area V1. Group average raw disappearance (off) and reappearance (on) responses during MIB and replay for four subregions in and around V1, corresponding to the mask, the black stimulus background (see icons, gray-shaded regions), the surround of the stimulus projection, and a bihemispheric region of occipital white matter in the vicinity of area V1. Data were averaged across the five subjects for whom the peripheral V1 portion corresponding to the stimulus background was retinotopically defined (see Materials and Methods, Definition of ROIs). Error bars indicate SEM across subjects (n = 5).
Because of the global response component, the raw response time courses were similar across target and mask subregions of early visual cortex. Figure 8A shows the raw response time courses during MIB and replay for the target and mask subregions in areas V1–V4. The raw responses decreased with target disappearance and increased with target reappearance, similarly in the target (top row) and mask subregions (bottom row). The raw responses during the 4.2–7.8 s time window were statistically significant in target and mask subregions of V1 through V3 during replay, although less consistently during MIB (Table 4). In contrast, the raw responses in V4 during this late time window were significant neither during MIB nor during replay (Table 4).
Global response throughout early visual cortex. A, Group average raw disappearance (off) and reappearance (on) responses during MIB and replay for retinotopic subregions corresponding to the target (top row) and surrounding mask (bottom row) of areas V1 through V4 (see icons, gray-shaded regions). Error bars indicate SEM across subjects (n = 6). B, Peak latencies of fMRI responses of the target subregion are plotted against those of the surrounding mask subregion in areas V1 through V4. Top row, MIB; bottom row, replay. Peak latencies in each subregion were estimated for reappearances (blue) and trough latencies were estimated for disappearances (red). The symbols represent subjects.
Late response modulation in target and mask subregions of visual areas V1–V4
There was one notable difference between the raw response time courses in target and mask subregions of early visual cortex: The response peaks in the mask subregions were consistently delayed with respect to those in the target subregions (Fig. 8B, top). This latency difference was highly significant (p < 0.01, Wilcoxon's sign rank test) in V1 and V2 and significant (p < 0.05) in V3 and V4. In contrast, peak latencies did not differ consistently between the same cortical subregions during replay (Fig. 8B, bottom) (V1, p = 0.73; V2, p = 0.11; V3, p = 0.52; V4, p = 0.68; Wilcoxon's sign rank test).
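Peak (or trough) latency estimation and the Wilcoxon signed-rank comparison between target and mask subregions could be sketched as below. The exact two-sided p value is obtained by enumerating all 2^n sign assignments of the ranked differences, which is practical for small samples; with n nonzero paired differences, the smallest attainable p is 2/2^n. Taking the latency as the time of the sampled extremum, without interpolation, is an assumption, and the function names are hypothetical.

```python
import itertools
import numpy as np

def peak_latency(response, lag_times_s, trough=False):
    """Time of the maximum (or of the minimum, for disappearance troughs) of a response."""
    idx = np.argmin(response) if trough else np.argmax(response)
    return lag_times_s[idx]

def _average_ranks(x):
    """1-based ranks of x, with tied values sharing their average rank."""
    order = np.argsort(x)
    sorted_x = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def wilcoxon_signed_rank_exact(x, y):
    """Two-sided exact p value of the Wilcoxon signed-rank test (zero differences dropped)."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    d = d[d != 0]
    ranks = _average_ranks(np.abs(d))
    w_obs = ranks[d > 0].sum()      # sum of ranks of positive differences
    center = ranks.sum() / 2        # expected value of W+ under the null hypothesis
    dev_obs = abs(w_obs - center)
    extreme = 0
    for signs in itertools.product((0, 1), repeat=d.size):   # all equally likely sign patterns
        w = ranks[np.array(signs, dtype=bool)].sum()
        if abs(w - center) >= dev_obs - 1e-12:
            extreme += 1
    return extreme / 2 ** d.size
```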
The global response component did not reflect cortical and/or hemodynamic point spread. First, the responses also occurred in subregions of V1 much further away from the target subregion (Fig. 7) than would be predicted by the point spread in V1 (Engel et al., 1997; Logothetis and Wandell, 2004). Second, the raw responses in the mask subregions were absent in a control experiment in which subjects passively viewed the same stimulus configuration (supplemental Fig. 4, available at www.jneurosci.org as supplemental material), but strong during the replay experiment (Fig. 8A, bottom row, blue stars and pink diamonds).
This dissociation between the mask subregion responses during replay and control experiment may have been caused by several factors specific to the replay: subjects reported the visibility of the target; the target appeared in an irregular (unpredictable) manner; the target was static, and thus less salient than the flickering probe in the control experiment. The same factors were also critical for driving analogous widespread cortical responses in previous studies (see Discussion). Additional experiments will be needed to determine whether any of these factors alone, or only their combination, is sufficient for driving the global response. Regardless, the dissociation between raw responses in the mask subregions during replay and control experiment suggests that the global response component during replay was nonsensory (i.e., not solely driven by the target stimulus).
The notion that the global response component was nonsensory is further supported by the observation that the raw responses in all subregions of V1 through V3 were also temporally more closely tied to perceptual report than to sensory stimulation. Averaging the raw replay responses time-locked to the stimulus transitions, rather than subjects' behavioral reports, revealed consistently smaller response amplitudes in all subregions of V1 through V3 (supplemental Fig. 5, available at www.jneurosci.org as supplemental material). Analogous responses time-locked to perceptual reports, with analogous shifts of response peak latencies between a target representation and the surrounding cortex, have been observed in early visual cortex during change detection (Moradi et al., 2007) and simple detection and discrimination tasks (Jack et al., 2006).
Subjects may have made small fixational eye movements during or after the (subjective or physical) changes in target perception, but it is unlikely that eye movements can explain the global response component. Such fixational eye movements cause shifts of the retinal stimulus image, which in turn drive lateral geniculate nucleus (LGN) and V1 activity (Martinez-Conde, 2006). We observed a particularly strong raw response in the unstimulated anterior subregion of V1 corresponding to the black stimulus background surrounding the mask (Fig. 7). Because this subregion received no patterned visual input, retinal image shifts could not have caused these responses (Martinez-Conde, 2006). It is also unlikely that eye movements caused the observed global response in subregions of cortex corresponding to the moving mask. Although fixational eye movements can modulate the responses of direction-selective neurons to moving patterns, it is unlikely that this would produce a net modulation of the population response because the modulation of the responses of each neuron depends on its preferred direction relative to the retinal image shift (Bair and O'Keefe, 1998).
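The argument about direction-selective neurons can be illustrated with a toy model, which is not taken from the cited work: assume evenly spaced preferred directions and a response change proportional to the cosine of the angle between each unit's preferred direction and the retinal image shift. Under these assumptions, increases and decreases cancel exactly in the population sum. The function name and the cosine form are hypothetical simplifications.

```python
import numpy as np

def net_population_modulation(n_units, shift_direction_rad, gain=1.0):
    """Toy model: summed modulation of direction-selective units by a retinal image shift.

    Each unit's response changes by gain * cos(preferred - shift direction): units
    preferring the shift direction increase, opposite-preferring units decrease.
    Preferred directions are assumed to be evenly spaced around the circle.
    """
    preferred = np.linspace(0.0, 2 * np.pi, n_units, endpoint=False)
    per_unit = gain * np.cos(preferred - shift_direction_rad)
    return per_unit.sum(), per_unit

total, per_unit = net_population_modulation(36, shift_direction_rad=0.7)
# Individual units change strongly, but the summed (population) change is ~0.
```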
To sum up, the global response component was particularly robust in early visual areas V1–V3, and in the cortical subregions corresponding to the periphery of the visual field well beyond the stimulus; it was most closely tied to perceptual reports; it was not caused by retinal image shifts attributable to eye movements. Importantly, the global response component occurred during both MIB and replay. Thus, most likely, it did not trigger the spontaneous perceptual transitions during MIB.
Discussion
The target subregion in area V4 exhibited a spatially specific response decrease during spontaneous disappearance of a salient visual target induced by a moving mask. This response decrease occurred early in time with respect to the perceptual report, despite the sluggishness of the hemodynamics. In contrast, responses in mask subregions of the dorsal pathway, particularly in areas V3AB and pIPS, increased equally early in time with the subjective target disappearance during MIB, but not during its physical removal. These opposite target- and mask-specific responses were superimposed on a delayed, apparently nonsensory, and spatially nonspecific (i.e., global) response component, which was primarily expressed in early visual cortex (V1–V3).
The current study depended on our ability to isolate fMRI responses in visual cortex specifically reflecting the target representation. This may have been compromised by a number of factors. First, given the spatial proximity of target and mask and the cortical and hemodynamic point spread (Engel et al., 1997; Logothetis and Wandell, 2004), the fMRI responses elicited by both stimulus components certainly overlapped. But we were clearly able to differentiate responses of the target subregions in V1–V4 from the surrounding activity, based on differences in response time course, amplitude, and latency (Figs. 3, 4, 8B; supplemental Fig. 4, available at www.jneurosci.org as supplemental material). Second, the removal of small targets surrounded by a moving pattern evokes response increases in some neurons and decreases in others (Wilke et al., 2006). If completely balanced, such a mixture of response polarities would cancel in the fMRI signal. However, we observed decreases of target-specific fMRI responses with subjective target disappearance. Thus, we infer that the underlying neural activity must have predominantly decreased with target disappearance. Third, the target disappearance during MIB may be followed by the phenomenal substitution of the target with the mask (Hsu et al., 2004, 2006). The neural activity associated with such perceptual filling-in (Meng et al., 2005; Komatsu, 2006) might counteract the response decreases associated with target disappearance. But, again, the measured target-specific population responses in V4 were inconsistent with a complete cancellation of response decreases associated with target disappearance and response increases associated with filling-in.
Spontaneous perceptual fading of low contrast targets in the visual field periphery has been suggested to be mediated by local adaptation after spontaneous reductions of fixational eye movements (Martinez-Conde, 2006). This account can explain neither MIB nor the concomitant responses in visual cortex. In MIB displays, targets surrounded by a static mask rarely disappear, and high-contrast targets disappear more often than low-contrast ones (Bonneh et al., 2001; Hsu et al., 2004). This and other psychophysical observations indicate that MIB is governed by mechanisms other than those determining peripheral fading. Furthermore, retinal stabilization would predict target-specific response decreases in V1 at target disappearance. In contrast, we observed robust target-specific response decreases in higher-tier ventral visual cortex, but not in V1. Retinal stabilization is also incompatible with the delayed global response decrease measured in the unstimulated subregion of V1. Finally, it is unlikely that retinal stabilization causes increases in cortical activity to moving patterns, as we observed during target disappearance in dorsal visual areas.
The nonsensory global response observed in early visual cortex during MIB and replay shared several features with spatially unspecific neural responses measured in previous studies. First, fMRI response increases with a similar topography occur in early visual cortex also during detection and discrimination tasks (Jack et al., 2006). Second, the widespread P300 component of the event-related potential likewise depends on temporal uncertainty about the occurrence of task-relevant sensory events (Hillyard et al., 1971; Squires et al., 1976; Bledowski et al., 2004). Specifically, reversals of the perception of ambiguous figures are followed by a strong P300 (Kornmeier and Bach, 2004, 2006). Third, the noradrenergic and cholinergic systems (which have widespread projections into visual cortex) also respond strongly to the unpredictable occurrence of task-relevant stimuli (Aston-Jones and Cohen, 2005; Bouret and Sara, 2005; Yu and Dayan, 2005).
However, the global response component observed here differed in one striking respect from previously reported global cortical responses, in particular the P300 (Klotz and Ansorge, 2007). In the present study, the global component reflected the sign of the illusory or physical change of the target, decreasing with target disappearance. Thus, it cannot simply be explained in terms of increased arousal triggered by the illusory or physical change (Huk et al., 2001). The global component in early visual cortex may play a more specific role in perceptual organization, such as stabilizing the newly selected percept (Einhäuser et al., 2008). Additional experiments are needed to determine the role of the global component, if any, in perceptual organization.
fMRI responses have been found to correlate with modulations of the cortical local field potential (LFP) and/or of spiking activity (Heeger and Ress, 2002; Logothetis and Wandell, 2004). Multiunit spiking activity and LFP power in the gamma frequency band modulate strongly with target suppression in V4 during “generalized flash suppression,” a perceptual phenomenon similar to MIB (Wilke et al., 2006). In contrast, LFP power in V1 correlates strongly with target disappearance only in lower (alpha and beta) frequency bands (Wilke et al., 2006). This dissociation between electrophysiological signal components might be related to the target-specific and global components we observed in the fMRI response during MIB. We speculate that the target-specific response component might reflect local modulations of spiking activity and/or the gamma-band LFP (Logothetis and Wandell, 2004; Liu and Newsome, 2006; Nir et al., 2007), and that the global component might reflect modulations in the lower frequency range of the LFP. To test this correspondence, it will be critical to characterize the topography of the electrophysiological signal components correlated with target suppression.
How do our results compare with those of microelectrode recordings in monkeys and fMRI in humans during binocular rivalry (Blake and Logothetis, 2002; Tong et al., 2006)? Observers' target perception during MIB was closely linked to the initial response modulations of the target subregions in V4. This is consistent with changes of single-unit activity in V4 that precede monkeys' perceptual reports during binocular rivalry (Leopold and Logothetis, 1996). Together, these observations indicate that the strength of the neural representation of a target stimulus in ventral visual cortex is closely linked to observers' perception of the target (Blake and Logothetis, 2002; Wilke et al., 2006). There were, however, several differences between our MIB results and those reported for rivalry. First, previous fMRI studies of rivalry reported response modulations in earlier visual areas, including V1 and even the LGN (Polonsky et al., 2000; Tong and Engel, 2001; Haynes et al., 2005; Lee et al., 2005, 2007; Meng et al., 2005; Wunderlich et al., 2005). Some of these modulations have been shown to be spatially (and temporally) specific (Haynes et al., 2005; Lee et al., 2005, 2007; Meng et al., 2005). During MIB, in contrast, the target-specific response modulations in V1 were not robust. This suggests that the spontaneous target disappearance emerged at a higher level of visual cortical processing. Second, the global response component, evident throughout early visual cortex during MIB and during replay of MIB, has not been reported for binocular rivalry. Third, target disappearance during MIB, but not during replay, was associated with transient responses of opposite polarity in the mask-specific subregions of dorsal visual cortex. Such a dissociation between the subjective illusion and its physical replay has not yet been reported for binocular rivalry.
Our findings are consistent with the notion that the mask representation in the dorsal pathway plays a causal role in the spontaneous suppression of the target representation in the ventral pathway. The opposite target-specific responses in V4 and mask-specific responses in V3AB and pIPS were not evident in earlier visual areas. These opposite responses were specific to MIB and did not occur during physical removal of the target. Moreover, the response increase in the IPS predominated in the hemisphere contralateral to the target representation, suggesting that an integrated representation of the mask object was engaged in competition with the target representation in V4 (Stoner et al., 2005). This pattern of results is inconsistent with models of MIB based solely on mechanisms confined to the cortical target subregion, such as boundary adaptation, which might underlie perceptual filling-in (Hsu et al., 2004, 2006). However, our results do not rule out the hypothesis that such local mechanisms contribute to target disappearance in conjunction with long-range cortical competition (Hsu et al., 2004, 2006).
The spatially specific response increases in V3AB and pIPS during target disappearance suggest that spontaneous fluctuations of endogenous attention may have caused the target to disappear (Bonneh et al., 2001). The target-specific V4 response decreases and the mask-specific IPS response increases are both incompatible with capture of attention by the target disappearance (i.e., an attention shift to the target). Instead, they are consistent with a spontaneous attention shift from the target to the mask that occurs before the perceptual report. Perceptual transitions in bistable perception, and the concomitant changes of neural activity, can occur even in the absence of attention shifts (Lee et al., 2007; Pastukhov and Braun, 2007). Nevertheless, endogenous attention may be one of several important factors initiating such transitions (Leopold and Logothetis, 1999; Chun and Marois, 2002; Meng and Tong, 2004; Stoner et al., 2005). One possible interpretation of the present results, therefore, is that spontaneous attention shifts are the dominant cause of the response decreases in ventral visual cortex that lead to subjective target disappearance during MIB.
Footnotes
- This work was supported by Leopoldina National Academy of Science Grant BMBF-LPD 9901/8-136 (T.H.D.), National Institutes of Health Grant R01-EY16752 (D.J.H.), and the Weizmann–New York University Demonstration Fund in Neuroscience. We thank Jeremy Freeman, Sang-Hun Lee, Markus Siegel, and the members of the Heeger Laboratory for comments.
- Correspondence should be addressed to Tobias H. Donner, Department of Psychology and Center for Neural Science, New York University, 6 Washington Place, Eighth Floor, New York, NY 10003-6634. tobias.donner@nyu.edu