The visual system interprets most visual scenes according to a single best interpretation, thereby effectively disambiguating the inherently ambiguous information provided by retinal stimulation. The phenomenal instability of ambiguous figures like the Necker cube provides an especially dramatic and compelling example of this more general ambiguity, but the underlying processes are believed to be essentially the same as those involved in an observer's normal visual perception (Long and Toppino, 2004). Thus, ambiguous or bistable figures exemplify the constructive nature of vision, which normally occurs so rapidly and effortlessly that it is difficult to explore the complex mechanisms that underlie the process.
There are two general approaches to explaining the perception of ambiguous figures. One favors the more passive or bottom-up processes of neural adaptation and recovery among cortical structures, whereas the other stresses top-down influences on illusory reversals by means of cognitive processes like hypothesis testing, problem solving, or voluntary attention. Both lines of research have well-established psychophysical support (for review, see Long and Toppino, 2004).
In a recent study, Weilnhammer et al. (2013) used functional magnetic resonance imaging (fMRI) to identify brain regions involved in the perception of ambiguous stimuli and investigated their effective connectivity by means of dynamic causal modeling (DCM). The authors presented participants with rotating Lissajous figures, which are perceived as 3D objects that spontaneously change their rotation direction (Weilnhammer et al., 2013, their Fig. 1). In contrast to other ambiguous stimuli, this change of direction happens almost instantaneously (i.e., with a very short transition time) and most frequently at specific configurations of the stimulus. To create a control condition, the authors used rotating Lissajous figures with a slight phase shift, thereby disambiguating the percept. This sophisticated “replay” condition exogenously triggered perceptual changes, but otherwise mimicked the ambiguous condition as closely as possible. Most importantly and in contrast to previous work, ambiguous and replay conditions did not differ in transition times, which was confirmed by psychophysical pilot studies and participants' reports during debriefing after the experiment.
The issue of transition times is important, because previous fMRI studies reported reversal-related brain activity in frontal and parietal regions, but it remained debatable whether these activations reflect attentional top-down processes that initiate (i.e., cause) perceptual reversals (Sterzer and Kleinschmidt, 2007), or whether they are a mere consequence of perceptual reversals. Support for the latter view was recently presented by Knapen et al. (2011), who analyzed frontoparietal activity with respect to the length of transition times between the perceptual alternatives. These authors found higher blood-oxygen level-dependent (BOLD) activity for transitions with longer durations in both ambiguous and replay conditions and concluded that frontoparietal regions are activated in response to differences in perceptual transitions.
In line with previous studies, Weilnhammer et al. (2013) found increased BOLD activity in the right inferior frontal gyrus (IFG) and right human motion complex (hMT+) during the perceptual reversals of ambiguous stimuli. Because ambiguous and control conditions were carefully matched, the authors could conclude that the observed frontoparietal activations were not merely caused by differences in transition times. Taken alone, however, this analysis cannot resolve the timing issue of whether the parts of the identified network were activated in a feedforward (bottom-up) or feedback (top-down) manner, because a causal role for a given area (e.g., the frontoparietal cortex) in initiating figure reversals cannot be inferred from correlative neurophysiological measures.
To explain their correlative fMRI results in terms of either top-down or bottom-up processing, or a mixture of both, Weilnhammer et al. (2013) used DCM to model causal relationships between the regions identified by their fMRI analyses. They fitted different models to the signal time courses observed at right IFG and hMT+. A top-down model, assuming that perceptual reversals are associated with a modulation of top-down connectivity from IFG to hMT+, led to the highest probability in the Bayesian model comparison. The authors concluded that right IFG activity might play an important role during bistable perception and that their results are in line with previous studies suggesting a causal role of right frontal and parietal cortices in perceptual transitions.
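The logic of such a Bayesian model comparison can be illustrated with a minimal sketch. It assumes a fixed-effects comparison with a uniform prior over models, in which posterior model probabilities follow from the log model evidences. The model names and log-evidence values are hypothetical placeholders, not results from Weilnhammer et al. (2013), and the procedure used in that study may differ (e.g., a random-effects comparison across participants).

```python
import math


def posterior_model_probabilities(log_evidences):
    """Posterior probability of each model under a uniform model prior.

    log_evidences maps a model name to its log model evidence.
    The maximum is subtracted before exponentiating for numerical stability.
    """
    max_log_evidence = max(log_evidences.values())
    weights = {
        name: math.exp(value - max_log_evidence)
        for name, value in log_evidences.items()
    }
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# Hypothetical log evidences for three candidate models (illustration only).
example_log_evidences = {
    "top_down": -1200.0,   # reversals modulate the IFG -> hMT+ connection
    "bottom_up": -1203.0,  # reversals modulate the hMT+ -> IFG connection
    "both": -1201.5,       # reversals modulate both connections
}

probabilities = posterior_model_probabilities(example_log_evidences)
best_model = max(probabilities, key=probabilities.get)
```

With these placeholder values the top-down model receives the highest posterior probability. The difference in log evidence between two models is the log Bayes factor, so only differences between models, not absolute values, affect the outcome.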
Although these conclusions are convincing in light of the data presented, it might be argued that the experimental design favored a dominant role of top-down processes. In particular, the experimental blocks (40–43 s) may have been too short for adaptation-like bottom-up processes to fully emerge, since it is well established in research on bistable phenomena that the number of perceptual reversals systematically increases during the first minutes of viewing before it reaches an asymptote (for review, see Long and Toppino, 2004).
According to this bottom-up interpretation of bistable perception, prolonged viewing of an ambiguous figure results in neural adaptation, which is specific to the networks underlying the representation of one of the possible percepts. When the level of adaptation reaches a critical threshold, networks mediating the competing representation suddenly dominate, resulting in a perceptual reversal. A competitive adaptation-recovery cycle between the two representational networks is then instantiated. Assuming that the recovery from adaptation is incomplete during each recurring cycle, the time to reach the critical threshold decreases until an asymptotic reversal rate is reached (Long and Toppino, 2004). Hence, it is possible that effective bottom-up processing can only be observed after several minutes of viewing.
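The adaptation-recovery cycle described above can be made concrete with a toy simulation. This is a minimal sketch under strong simplifying assumptions that are ours, not those of Long and Toppino (2004): adaptation of the dominant network rises linearly to a fixed threshold, the suppressed network recovers exponentially, and all parameter values are hypothetical.

```python
import math


def dominance_durations(n_periods, rise_rate=0.1, recovery_rate=0.2, threshold=1.0):
    """Durations (in s) of successive dominance periods in a toy model.

    Assumptions (hypothetical, for illustration only):
    - adaptation of the dominant network rises linearly at rise_rate per second
    - adaptation of the suppressed network decays exponentially at recovery_rate
    - a reversal occurs when the dominant network reaches threshold
    """
    adaptation = [0.0, 0.0]  # adaptation level of the two competing networks
    dominant = 0
    durations = []
    for _ in range(n_periods):
        # Time the dominant network needs to climb from its current level to threshold.
        duration = (threshold - adaptation[dominant]) / rise_rate
        durations.append(duration)
        suppressed = 1 - dominant
        # The suppressed network recovers only partially during this period.
        adaptation[suppressed] *= math.exp(-recovery_rate * duration)
        adaptation[dominant] = threshold
        # Reversal: the previously suppressed network becomes dominant.
        dominant = suppressed
    return durations


durations = dominance_durations(60)
```

In this sketch each network starts its next dominance period with residual adaptation, so it reaches threshold sooner than before. The dominance durations therefore shorten over the first cycles and then level off, which corresponds to a reversal rate that increases toward an asymptote.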
On a more general note, the findings of Weilnhammer et al. (2013) further demonstrate the usefulness of fMRI in identifying the neuronal network underlying figure reversals. The neurophysiological mechanisms, however, by which the respective brain areas (prefrontal and/or parietal lobes) exert their influence on low-level areas of the network cannot be identified by this tool and remain elusive (Logothetis, 2008). In this context, brain oscillations in different frequency bands as identified by electroencephalography (EEG) and magnetoencephalography (MEG) provide important candidate mechanisms reflecting both local bottom-up processes and more global large-scale interactions within widely distributed cortical networks (Engel et al., 2001). For instance, frontal gamma band activity during perceptual reversals might represent top-down signals from the frontal lobes, establishing feature binding of relevant object representations during attentional selection; in contrast, during phases of perceptual maintenance, activity in the alpha band seems more prominent (for review, see Kornmeier and Bach, 2012). Thus, the spatiotemporal dynamics of brain oscillations render them well suited for studying interrelations of bottom-up and top-down mechanisms within a hybrid model of figure reversal.
In addition, fMRI may fail to detect more subtle changes in neural activity. It has been argued that short-term synchronization of synaptic activity without a significant increase in metabolic demand may not trigger a hemodynamic response, yet such neural events would be readily detectable by means of electrophysiological recordings. This is of special relevance for stimulus constellations and experimental designs that cause a nearly continuous engagement of cortical regions, inducing too few fluctuations of hemodynamic activity to be found by standard analyses of BOLD imaging. Furthermore, the sluggishness of the hemodynamic response leaves fMRI with a temporal resolution on the order of seconds, which may not always suffice to dissociate prestimulus from poststimulus activity, or feedforward from recurrent processing if they occur in close temporal proximity. Here, the simultaneous recording of EEG and fMRI, combined with a comparison of unimodal and multimodal analyses, is best suited to reveal both the regions and the neural mechanisms involved in the processing of ambiguous stimuli (Huster et al., 2012).
To bridge the gap between correlation and causality, however, fMRI and EEG should be complemented by neuromodulation methods. Transcranial magnetic stimulation (TMS), for instance, has been successfully used to transiently disrupt specific cortical areas to probe their causal role in bistable perception (Zaretskaya et al., 2010). Again, however, TMS-induced virtual lesions are not informative with respect to the potential mechanisms underlying figure reversals, such as brain oscillations. New techniques for noninvasive human brain stimulation like rhythmic TMS and transcranial alternating current stimulation (tACS) have recently become available, allowing modulation of brain oscillations. The effectiveness of these methods in entraining brain oscillations at different frequencies has been demonstrated for visual perception in general (Romei et al., 2010), as well as specifically for bistable perception (Strüber et al., 2013), thereby providing encouraging evidence that these techniques have the potential to establish the causal role of neural oscillations in perception and cognition.
By using DCM for the assessment of causal dependencies between brain regions associated with bistable perception, Weilnhammer et al. (2013) successfully took a new direction in the study of ambiguous stimuli. Future studies should complement this important work by using combined recording and analysis of fMRI and EEG, as well as noninvasive brain stimulation methods to further our understanding of how the brain constructs a viable model of its surrounding environment.
Footnotes
Editor's Note: These short, critical reviews of recent papers in the Journal, written exclusively by graduate students or postdoctoral fellows, are intended to summarize the important findings of the paper and provide additional insight and commentary. For more information on the format and purpose of the Journal Club, please see http://www.jneurosci.org/misc/ifa_features.shtml.
This work was supported by the German Research Foundation (DFG RA 2357/1-1 to S.R.; HU 1729/2-1 to R.J.H.). We thank Daniel Strüber for helpful discussions.
Correspondence should be addressed to Stefan Rach, Experimental Psychology Laboratory, Department of Psychology, European Medical School, Carl von Ossietzky Universität, Ammerländer Heerstrasse 114-118, 26129 Oldenburg, Germany. stefan.rach@uni-oldenburg.de