Duncan (1984) demonstrated that visual attention prioritizes whole objects: subjects performed better when reporting two features of the same object than when reporting one feature from each of two adjacent objects. This finding marked the beginning of object-based attention research (Egly et al., 1994). Later studies showed that the neural (fMRI) representation of multiple attributes belonging to a target object is enhanced relative to those of a control object (O'Craven et al., 1999), and that object-based attention has a temporal dimension, sampling the target and control objects at different frequencies (Fiebelkorn et al., 2013).
The view of attention as a homogeneous construct has recently been called into question. Attention is now viewed as an umbrella term for multiple subprocesses [feature-based attention (Treisman and Gelade, 1980); spatial attention (Posner, 1980); top-down vs bottom-up attention (Buschman and Miller, 2007); and sustained attention (Rosenberg et al., 2016)]. Object-based attention (Egly et al., 1994), as described above, is one of these subprocesses. This heterogeneity in the attention literature has prompted a number of scientists (Chun et al., 2011) to advocate taxonomizing the construct to facilitate greater precision in future research.
In a series of recent articles, Summerfield and Egner (2009, 2016) propose not only that the attention literature is heterogeneous in comprising multiple subcomponents, but also that some experiments designed to investigate attention may confound attention itself with the conceptually distinct construct of expectation. They postulate that expectation is driven by information about the probability of an upcoming event, whereas attention is driven by information about its relevance. In contrast to this straightforward conceptual distinction, the difference in experimental design required to induce each of the two cognitive processes is subtle. In both attention and expectation experiments, a cue provides probabilistic information about an upcoming target, and the presence of this cue facilitates performance (Summerfield and Egner, 2016). Summerfield and Egner (2009, 2016) argue that if the cue provides information about the probability of occurrence of an upcoming event and the subject's task is simply to indicate whether the event occurred, then the experiment elicits a neurocognitive expectation process. If the experiment is instead designed so that the cue indicates which of multiple dimensions is task relevant and the subject's task is to respond to some feature within the relevant dimension, then the experiment elicits an attention mechanism (Summerfield and Egner, 2009, 2016). They further speculate that expectation and attention might rely on distinct neural mechanisms: an expected event in many cases results in decreased neural processing, whereas an attended event results in increased processing (Summerfield and Egner, 2009).
This recasting of portions of the attention literature as expectation enables interpretation of experimental results from the perspective of predictive coding, a computational theory of how the brain facilitates perception from sensation by minimizing the prediction error between expected and received sensory input (Huang and Rao, 2011).
If the central claim by Summerfield and Egner (2009, 2016), that a subtle adaptation to existing attention experiments induces a distinct but parallel process to attention, is correct, then this raises the question of whether expectation, like attention, is also a heterogeneous construct. This theory creates the opportunity to branch out new expectation subfields in parallel to the more mature subdomains of attention research (e.g., feature-based, spatial, top-down vs bottom-up, object-based, and sustained expectation). The field of feature-based expectation has already been introduced (Summerfield and Egner, 2016). Another candidate is object-based expectation, paralleling the existing field of object-based attention (Jiang et al., 2016). An open question within this proposed framework is whether the expectation statuses of individual features interact to form object-level expectation.
In a recent article in The Journal of Neuroscience, Jiang et al. (2016) used computational simulations to compare three competing models of how expectations about one object feature affect expectations about another (Jiang et al., 2016, their Fig. 3), as follows: (1) expectations do not spread from one object feature to the other; (2) a prediction error in one feature spreads to the other feature to render the whole object unexpected (the reconciliation hypothesis); and (3) the expectation status of one feature repels the expectation status of the other feature, thereby promoting the perceptual inference that the two features belong to separate objects (the segregation hypothesis).
The authors performed a behavioral experiment in which subjects were presented with a cloud of moving dots, which were either red or green (color dimension) and were moving either up or down (motion dimension; Jiang et al., 2016, their Fig. 1a). A trial-by-trial auditory cue generated an expectation of the upcoming feature values with 75% validity: the timbre signaled the upcoming color, and the pitch direction signaled the upcoming motion direction. When subjects were instructed to allocate sustained attention to one feature (color), the trial-by-trial expectation cue was associated with a behavioral benefit in response times not only to the attended feature, but also to the unattended feature (motion; Jiang et al., 2016, their Fig. 1c). This effect was interpreted as evidence for cross-feature spread of expectation in support of the reconciliation hypothesis.
To examine the neural substrates of this effect, Jiang et al. (2016) performed the same experiment using fMRI. They analyzed their data using intersubject multivoxel pattern analysis (for review, see Haxby et al., 2014), comparing the voxel-by-voxel pattern of activation between conditions in different regions of the brain. In one analysis, a linear support vector machine classifier was trained to distinguish activity patterns in early visual cortex resulting from two conditions in which expectation was consistent (color and motion both expected vs color and motion both unexpected). In a second analysis, the same classifier was trained to distinguish between two conditions where expectation was inconsistent between features (color expected and motion unexpected vs color unexpected and motion expected). The classifier showed greater accuracy when separating the two consistent conditions than when separating the two inconsistent conditions (Jiang et al., 2016, their Fig. 4f,g). This result matched the predictions of the reconciliation hypothesis but not those of either alternative hypothesis (Jiang et al., 2016, their Fig. 4a–c). The authors concluded that objects are the unit of selection, not only for attention, but also for expectation.
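The logic of this kind of analysis can be sketched in a few lines: train a linear classifier on multivoxel activity patterns from two conditions and measure accuracy on held-out trials. The sketch below is not the authors' pipeline; it uses synthetic "voxel" data and a minimal Pegasos-style subgradient solver as a stand-in for a packaged linear SVM, with all dimensions and parameters chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "voxel patterns": 60 trials x 50 voxels per condition,
# with a small per-voxel mean difference between conditions A and B.
n_trials, n_vox = 60, 50
signal = 0.4 * rng.normal(0, 1, n_vox)
X = np.vstack([rng.normal(0, 1, (n_trials, n_vox)) + signal,
               rng.normal(0, 1, (n_trials, n_vox)) - signal])
y = np.concatenate([np.ones(n_trials), -np.ones(n_trials)])

def train_linear_svm(X, y, lam=0.01, epochs=200):
    """Minimal linear SVM via Pegasos-style subgradient descent."""
    w = np.zeros(X.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * (X[i] @ w)
            w *= 1 - eta * lam          # shrink (regularization step)
            if margin < 1:              # hinge-loss subgradient step
                w += eta * y[i] * X[i]
    return w

# Split-half cross-validation: fit on half the trials, test on the rest.
idx = rng.permutation(len(y))
train, test = idx[: len(y) // 2], idx[len(y) // 2:]
w = train_linear_svm(X[train], y[train])
acc = np.mean(np.sign(X[test] @ w) == y[test])
print(f"cross-validated accuracy: {acc:.2f}")
```

The key property exploited in the study is comparative: the same classifier is run on two different condition pairs, and only the relative accuracies are interpreted.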
In considering the behavioral results, readers with a psychology background will note an analog in the classical Stroop (1935) and Simon (Simon and Wolf, 1963) effects: performance on experimental tasks that require subjects to keep track of a conjunction of inconsistent features is worse than performance on tasks that require subjects to keep track of only a single feature. Such behavioral interference effects from inconsistent feature conditions are common in the psychology literature and need not be interpreted as object-level perceptual selection.
In considering the fMRI results, we note that drawing conclusions about brain function from a relative comparison of two classifiers requires that the classifiers face problems of equal difficulty, apart from the difference predicted by the hypothesis under test. If we assume that early visual cortex contains neural ensembles that separately encode color and motion, then, from a predictive coding perspective (Rao and Ballard, 1999; Alink et al., 2010), the color ensemble will be highly active when the color is unexpected and the motion ensemble will be highly active when the motion direction is unexpected. The classifier for the two consistent conditions must then separate the state of cortex in which both ensembles are highly active from the state in which neither ensemble is highly active. This could in principle reduce to a comparison of whether visual cortex is active versus inactive. The classifier for the inconsistent conditions, however, must separate the two intermediate states: the state where the color ensemble is highly active and the motion ensemble is not from the state where the motion ensemble is highly active and the color ensemble is not. This would entail learning to separate complex spatiotemporal patterns of activity in the motion and color ensembles, a task that is more difficult than separating high from low activity. This scenario could explain the result of the classifier comparison without invoking cross-feature spread of expectation in early visual cortex. Therefore, the presented conclusion may not be the only interpretation of these data.
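This difficulty asymmetry can be illustrated with a toy simulation. All numbers and names below are hypothetical; the one substantive assumption is that each voxel mixes the color and motion prediction-error ensembles with heavily overlapping weights, so the consistent contrast rides on overall activity while the inconsistent contrast depends on small weight differences. A nearest-centroid classifier stands in for the linear SVM.

```python
import numpy as np

rng = np.random.default_rng(1)
n_vox, n_train, n_test, sigma = 100, 100, 200, 3.0

# Hypothetical voxel weights for color and motion prediction-error
# signals; the shared "base" makes the two ensembles overlap heavily.
base = rng.normal(0, 1, n_vox)
w_color = base + 0.2 * rng.normal(0, 1, n_vox)
w_motion = base + 0.2 * rng.normal(0, 1, n_vox)

def trials(pe_color, pe_motion, n):
    """Simulated voxel patterns: weighted prediction errors plus noise."""
    mean = pe_color * w_color + pe_motion * w_motion
    return mean + rng.normal(0, sigma, (n, n_vox))

def centroid_accuracy(cond_a, cond_b):
    """Nearest-centroid decoding accuracy for two conditions."""
    ca = trials(*cond_a, n_train).mean(0)
    cb = trials(*cond_b, n_train).mean(0)
    correct = 0
    for cond, own in [(cond_a, True), (cond_b, False)]:
        test = trials(*cond, n_test)
        da = np.linalg.norm(test - ca, axis=1)
        db = np.linalg.norm(test - cb, axis=1)
        correct += np.sum((da < db) == own)
    return correct / (2 * n_test)

# Consistent pair: both prediction errors high vs both low.
acc_consistent = centroid_accuracy((1, 1), (0, 0))
# Inconsistent pair: one prediction error high, the other low.
acc_inconsistent = centroid_accuracy((1, 0), (0, 1))
print(acc_consistent, acc_inconsistent)
```

Under this construction the consistent classifier outperforms the inconsistent one even though no expectation spreads between features, which is the point of the critique: the accuracy gap can arise from classifier difficulty alone.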
A simple control could have resolved this issue: the inclusion of a second object, such as an additional, spatially separate cloud of dots. Experiments on object-level spread of attention (Duncan, 1984; Egly et al., 1994; O'Craven et al., 1999; Fiebelkorn et al., 2013) typically include a control object for the purpose of demonstrating that attention does not spread there. A control object would provide a stronger test of the three hypotheses: an absence of spread to the control object would support the object-based interpretation, whereas without a control object the results are difficult to interpret. For instance, they might instead reflect spatial spread of expectation.
Future experiments on object-based expectation would further benefit from considering the role of time, a feature that has recently started to gain traction in the attention literature (Buschman and Kastner, 2015; Denison et al., 2017). From the perspective of a perceptual decision-making framework (Kayser et al., 2010), it is possible that probability- and relevance-driven selection interact over the time course of a perceptual decision. It is known that object-based attention alternates between a cued and uncued location on the same object at a faster frequency (8 Hz) than between a cued location on one object and an uncued location on a different object (4 Hz; Fiebelkorn et al., 2013). Rhythmic sampling on subsecond timescales has also been demonstrated for spatial attention (Landau and Fries, 2012), raising the possibility that rhythmic brain activity may support multiple forms of attention. One hypothesis is that expectation may align the phase of ongoing rhythmic attentional sampling to optimize cortical excitability at the time of an expected event. Neural oscillations may disentangle these rapid endogenous prioritization processes during feature integration (Helfrich and Knight, 2016). Delineating the spatiotemporal progression of these rapid mechanisms will necessitate the use of recording methods with greater temporal resolution than fMRI, such as electroencephalography, magnetoencephalography, and electrocorticography. In the meantime, the present study (Jiang et al., 2016) is a step in the right direction toward understanding the multiple processes that are collectively called attention.
Footnotes
Editor's Note: These short reviews of recent JNeurosci articles, written exclusively by students or postdoctoral fellows, summarize the important findings of the paper and provide additional insight and commentary. If the authors of the highlighted article have written a response to the Journal Club, the response can be found by viewing the Journal Club at www.jneurosci.org. For more information on the format, review process, and purpose of Journal Club articles, please see http://jneurosci.org/content/preparing-manuscript#journalclub.
This work was supported by National Institute of Neurological Disorders and Stroke Grant R37-2113532 to Professor Robert T. Knight (Ph.D. advisor to S.J.K.S.) and by the Alexander von Humboldt Foundation (R.F.H.). We thank Falk Lieder, Sebastian Musslick, Jacob Miller, Jesse Livezey, Rika Antonova, and Robert Nishihara for helpful discussion.
The authors declare no competing financial interests.
- Correspondence should be addressed to S. J. Katarina Slama, Helen Wills Neuroscience Institute, Barker Hall #210C, Berkeley, CA 94720. slama{at}berkeley.edu