There is an extensive literature about the neuronal processes in the visual system that enable us to appreciate spatial structure at different scales. Basically, spatial receptive fields increase in size and complexity from early- to high-level visual areas. But what about our sense of temporal structure? From momentary motion percepts and motor programs to language and conscious cognition, and on to the scale of the lifespan, all human experience is temporally structured. Like spatial perception, temporal perception operates at different scales, from subsecond dynamic stimulus patterns to the course of events as they unfold in a meaningful sequence. The perception of temporal structure has not received the attention it deserves.
A fundamental feature of our world is its temporal asymmetry: many of the temporal patterns that we perceive never occur in reverse. Just as gravity orients many objects in space (e.g., faces, houses, and trees; all appear most frequently in a canonical “upright” orientation), so causality orients event sequences in time. As a result, the statistics of natural sequences are not, in general, symmetric with respect to the direction of time. This temporal asymmetry concerns small as well as large temporal scales and low- as well as high-level perceptual processes. We suspect that this asymmetry concerns all sensory modalities. It is certainly present in motor control and cognition itself. Beyond vision, we may consider examples such as our perception of phonemes, words, sentences, and stories, our stream of conscious thought, or our attribution of intentionality to others' actions.
In the same way that face perception relies on the upright appearance of faces for optimal performance (upside-down faces are harder to recognize, for example), our perception of temporal patterns is likely to rely on the canonical temporal orientation of dynamic stimulus patterns. At the most general level, our ability to exploit the temporal structure of natural-world statistics allows us to anticipate everyday events and to understand the environment we live in. Exactly how this occurs remains a mystery.
A ground-breaking recent investigation by Hasson et al. (2008) begins to address this challenging topic. The authors used functional magnetic resonance imaging (fMRI) to study the temporal response properties of different brain regions. To investigate temporal structure across a wide range of scales from moments to minutes, the authors used silent movies with complex story lines as stimuli. Commercial movies are designed, arguably, to drive higher cortical regions in predictable ways (Hasson et al., 2004), albeit perhaps to different degrees, depending on the screenplay and director. To study the dependence of the responses on the temporal sequence, these complex, natural, and dynamic stimuli were presented to subjects in forward and backward temporal sequence. The movies were also presented after cutting them into segments of varying length and randomly reordering those segments to create shuffled clips (this was called “time scrambling” at different time scales). Based on their own earlier work (Hasson et al., 2004), the authors explore new territory with these unconventional stimuli: most visual fMRI studies have relied on stimuli of simple (or degenerate) temporal structure, such as static pictures or moving dots.
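The time-scrambling manipulation can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' stimulus code: a movie is represented as an array of frames, the hypothetical helper `time_scramble` cuts it into consecutive segments of `segment_len` frames and shuffles their order while keeping the frame order within each segment, and `time_reverse` plays the frames backward. Segment length is counted in frames, so the duration of a segment depends on the (assumed) frame rate.

```python
import numpy as np

def time_scramble(frames, segment_len, seed=0):
    """Cut `frames` into consecutive segments of `segment_len` frames (the last
    segment may be shorter) and return them in a random order. Frame order
    within each segment is preserved. Hypothetical helper for illustration."""
    frames = np.asarray(frames)
    starts = range(0, len(frames), segment_len)
    segments = [frames[s:s + segment_len] for s in starts]
    order = np.random.default_rng(seed).permutation(len(segments))
    return np.concatenate([segments[i] for i in order])

def time_reverse(frames):
    """Play the same frames in the opposite temporal order."""
    return np.asarray(frames)[::-1]

# Usage: integers stand in for 12 movie frames; scramble at a 4-frame scale.
frame_ids = np.arange(12)
print(time_scramble(frame_ids, segment_len=4))  # three intact 4-frame runs, shuffled
print(time_reverse(frame_ids))
```

Longer segments preserve the natural sequence over longer stretches, which is what allows scrambling at several scales to probe how much sequence integrity a region needs.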
Like turning a natural visual stimulus such as a face upside down, time reversal is a useful stimulus manipulation because it preserves many features of the original stimulus: the set of stimulus frames is the same, which equates a host of properties computed from this set, including all spatial features. Moreover, temporal frame adjacency is preserved, the temporal power spectrum at each image location is unaltered (the spatiotemporal spectrum is merely mirrored along the temporal-frequency axis, which corresponds to reversing the direction of motion), and the speed of visual motion and the magnitude of its acceleration remain the same, with motion simply running in the opposite direction. But time reversal makes natural sequences unnatural, just as an upside-down face is unnatural and therefore more difficult to recognize.
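The invariances listed above can be checked on a toy stimulus. The sketch below assumes a hypothetical 1-D "movie" in which a bright bar moves one pixel per frame. It confirms that time reversal leaves the frame set, frame adjacency, and per-pixel temporal power spectrum unchanged, mirrors the 2-D (time × space) amplitude spectrum along the temporal-frequency axis, and flips only the sign of the bar's velocity.

```python
import numpy as np

# Hypothetical toy stimulus: movie[t, x] is the luminance at frame t, position x.
# A bright bar moves rightward by one pixel per frame.
n_frames, n_pixels = 16, 32
movie = np.zeros((n_frames, n_pixels))
for t in range(n_frames):
    movie[t, 4 + t] = 1.0

reversed_movie = movie[::-1]  # time reversal: same frames, opposite order

# 1. Same set of frames, so all purely spatial (per-frame) properties are equated.
same_frames = sorted(map(tuple, movie)) == sorted(map(tuple, reversed_movie))

# 2. Temporal adjacency: the unordered pairs of neighboring frames are unchanged.
def neighbor_pairs(m):
    return {frozenset((tuple(a), tuple(b))) for a, b in zip(m[:-1], m[1:])}

same_adjacency = neighbor_pairs(movie) == neighbor_pairs(reversed_movie)

# 3. The temporal power spectrum at every pixel is unchanged.
temporal_power = np.abs(np.fft.fft(movie, axis=0)) ** 2
temporal_power_rev = np.abs(np.fft.fft(reversed_movie, axis=0)) ** 2
same_temporal_power = np.allclose(temporal_power, temporal_power_rev)

# 4. The 2-D amplitude spectrum is mirrored along the temporal-frequency axis:
#    |F_rev(k_t, k_x)| = |F(-k_t, k_x)|, i.e., energy moves to the opposite motion direction.
spectrum = np.abs(np.fft.fft2(movie))
spectrum_rev = np.abs(np.fft.fft2(reversed_movie))
mirrored = np.roll(spectrum[::-1], 1, axis=0)  # row k becomes row (-k mod n_frames)

# 5. Speed is preserved; only the sign of the velocity flips.
def bar_velocity(m):
    positions = m.argmax(axis=1)  # bar position in each frame
    return np.diff(positions).mean()

print(same_frames, same_adjacency, same_temporal_power)
print(np.allclose(spectrum_rev, mirrored), bar_velocity(movie), bar_velocity(reversed_movie))
```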
Using conventional activation analysis, Hasson et al. (2008) asked how response amplitude in different brain regions differs between forward, backward, and time-scrambled presentations. Under the plausible assumption that neurons are tuned to natural spatiotemporal patterns, one might expect reduced responses to the less natural and less comprehensible backward and time-scrambled stimuli. However, Hasson et al. (2008) found similar and sometimes higher levels of activity when movies were viewed in reverse [Hasson et al. (2008), their Fig. 6 (http://www.jneurosci.org/cgi/content/full/28/10/2539/F6)]. One possible explanation is the increased mental effort subjects may have made to comprehend the sequence.
The most interesting result is revealed when the authors go beyond conventional activation analysis and investigate the effect of their temporal stimulus manipulations on the reliability of the response time courses. Building on previous studies (Bartels and Zeki, 2004; Hasson et al., 2004), they investigated the correlation between regional-average time courses for repeated presentations of the same stimulus (forward, backward, or time scrambled). Complex natural stimuli such as movies elicit complex response time courses. Correlating time courses from repeated presentations obviates the need for an explicit model of the expected responses and allows the authors to assess to what extent the response is stimulus driven: a strongly stimulus-driven response should be reproduced with high accuracy when the stimulus is repeated, yielding a high correlation. A low correlation, by contrast, would indicate that a large proportion of the variance of the region's activity is not stimulus related. That variance could reflect either internal brain dynamics or noise.
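The logic of this reliability measure can be illustrated with simulated data. In the sketch below, which is a toy model rather than the authors' analysis pipeline, each presentation's regional-average time course is the sum of a stimulus-driven component that is identical across repeats and independent non-stimulus variance; hemodynamics are ignored and all time courses are hypothetical. Under this model, the expected repeat correlation is approximately the stimulus-driven variance divided by the total variance, so the two simulated regions should yield correlations near 1/(1 + 0.3²) ≈ 0.92 and 1/(1 + 2²) = 0.2.

```python
import numpy as np

def repeat_reliability(run_a, run_b):
    """Pearson correlation between regional-average time courses from two
    presentations of the same stimulus: a model-free index of stimulus drive."""
    return np.corrcoef(run_a, run_b)[0, 1]

rng = np.random.default_rng(1)
n_volumes = 2000
# Hypothetical stimulus-driven component: identical on every repeat of the stimulus.
stimulus_driven = rng.standard_normal(n_volumes)

def simulate_run(non_stimulus_sd):
    """One presentation: stimulus-driven component plus independent non-stimulus
    variance (internal dynamics or noise; the correlation cannot tell them apart)."""
    return stimulus_driven + non_stimulus_sd * rng.standard_normal(n_volumes)

r_driven = repeat_reliability(simulate_run(0.3), simulate_run(0.3))  # mostly stimulus driven
r_weakly_driven = repeat_reliability(simulate_run(2.0), simulate_run(2.0))  # mostly non-stimulus variance
print(f"strongly driven region: r = {r_driven:.2f}; weakly driven region: r = {r_weakly_driven:.2f}")
```

Note that the mean activity of the two simulated regions is the same; only the proportion of reproducible, stimulus-locked variance differs.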
Early visual areas and motion-sensitive area hMT+ showed highly stimulus-driven activity time courses under all conditions. When the temporal sequence was reversed or scrambled, so was the activity time course, but the time course was equally reliable. This is consistent with the idea that these regions represent approximately instantaneous statistics of the stimulus. Several higher regions, including the superior temporal sulcus (STS), the precuneus, the posterior lateral sulcus (LS), the temporal parietal junction (TPJ), and the frontal eye field (FEF), exhibited highly reliable, stimulus-driven activity time courses to forward presentations of the movie clips, but responded less predictably when the temporal sequence was disrupted. Whereas the STS and precuneus required integrity of the natural temporal sequence at an intermediate time scale of ∼12 s to respond reliably, the LS, TPJ, and FEF required longer time-scale sequence integrity (∼36 s). This suggests that the latter regions accumulate stimulus information over longer periods.
Hasson et al. (2008) introduce another interesting analysis: they correlated the time course during forward presentation with the time-reversed time course during backward presentation (correcting for hemodynamic lag). Qualitatively, the results of this analysis are consistent with the reliability findings described above: the regions responding less reliably to backward presentations also show lower correlations between forward and reversed backward time courses. Quantitatively, however, the forward-to-reverse-backward correlations tended to be lower than the backward–backward correlations, even though the backward–backward correlations are doubly affected by the decreased reliability. This tendency was consistent across regions and held even for V1, hMT+, the lateral occipital complex, and the parahippocampal place area (PPA) [Hasson et al. (2008), their Fig. 2B,C (http://www.jneurosci.org/cgi/content/full/28/10/2539/F2)]. Assuming that the authors' correction for the temporal asymmetry of the hemodynamic response was fully successful, this result suggests that the backward time courses are not merely less reliable but also differ in their stimulus-driven (reliable) component. This would be expected if neurons were tuned to specific natural spatiotemporal patterns that do not naturally occur in reverse. However, the result could also be accounted for by other models, including evidence accumulators, which would not necessarily need to be tuned to complex natural stimuli or even to spatiotemporal sequences (consider our thought experiment in Fig. 1).
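Why the lag correction matters can be seen in a toy simulation. We do not know exactly how Hasson et al. (2008) implemented their correction; the sketch below assumes a hypothetical gamma-shaped hemodynamic kernel peaking 5 samples after neural activity, and a region whose neural response to the backward movie is simply the time-reversed forward response (no temporal integration at all). Time reversal turns the hemodynamic delay into an advance, so the reversed backward time course is realigned with a fixed shift of twice the peak latency. Because the kernel is asymmetric (a steep rise and a slower tail), this shift realigns the peaks but cannot undo the change in shape, so even for this noise-free, purely instantaneous region the corrected correlation should be high but below 1.

```python
import numpy as np

# Hypothetical hemodynamic kernel sampled at an assumed 1 s resolution:
# gamma-shaped, peaking 5 samples after a neural event, with a steep rise and slow tail.
lags = np.arange(30.0)
hrf = lags ** 5 * np.exp(-lags)
hrf /= hrf.sum()
peak_lag = int(hrf.argmax())  # 5 samples

def bold(neural):
    """Causal convolution of a neural time course with the kernel."""
    return np.convolve(neural, hrf)[: len(neural)]

def forward_vs_reversed_backward(bold_fwd, bold_bwd, shift):
    """Correlate the forward time course with the time-reversed backward time course,
    after delaying the reversed series by `shift` samples (a simple lag correction)."""
    rev = bold_bwd[::-1]
    n = len(rev)
    return np.corrcoef(bold_fwd[shift:], rev[: n - shift])[0, 1]

# Hypothetical region without temporal integration: its neural response to the
# backward movie is exactly the time-reversed response to the forward movie.
rng = np.random.default_rng(2)
neural_fwd = rng.standard_normal(2000)
neural_bwd = neural_fwd[::-1]
bold_fwd, bold_bwd = bold(neural_fwd), bold(neural_bwd)

r_uncorrected = forward_vs_reversed_backward(bold_fwd, bold_bwd, shift=0)
# Reversal turns a delay of peak_lag into an advance of peak_lag: realign by twice that.
r_corrected = forward_vs_reversed_backward(bold_fwd, bold_bwd, shift=2 * peak_lag)
print(f"uncorrected r = {r_uncorrected:.2f}, lag-corrected r = {r_corrected:.2f}")
```

In a sketch like this, any residual gap between the corrected correlation and the forward repeat reliability reflects the hemodynamics rather than the neurons, which is why the interpretation above hinges on the correction being complete.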
In analogy to the established concept of the spatial receptive field, Hasson et al. (2008) introduce the concept of “temporal receptive window” to interpret their findings. Early visual cortex and hMT+ thus have the shortest temporal receptive windows, the STS and precuneus are intermediate, and the LS, TPJ, and FEF have long temporal receptive windows (∼36 s). Although there seems to be some relationship between spatial receptive-field size and temporal receptive-window size (with early visual cortex, for example, having small receptive fields and a short temporal receptive window), the two hierarchies may not exactly coincide (the higher ventral stream PPA, for example, may have large receptive fields and a relatively short temporal receptive window).
Areas with intermediate and long temporal receptive windows may process narratives, theories of mind, and predictions of the sequence of events. The FEF exhibited an unexpectedly long temporal receptive window. Unlike monkey FEF, however, human FEF is not well defined. It is generally accepted that human FEF is located within Brodmann's area 6, between the dorsal and ventral premotor cortices (Paus, 1996). Previous functional imaging studies have shown lateral portions of dorsal premotor cortex, which could correspond to the region described as FEF in the current study, to be activated during the temporal orienting of attention (Coull and Nobre, 1998; Coull et al., 2000). The long temporal receptive window could thus be related to the high-level cognitive process of orienting attention selectively toward specific moments in time. In the present experiments, this temporal attention mechanism may have been especially busy during the viewing of temporally disrupted movie sequences, because subjects more frequently need to reorient attention and redefine their expectations.
Like the receptive-field concept, the concept of temporal receptive window is certainly useful, but it raises several questions. Does the temporal receptive window indicate the time period of the temporal pattern template a neuron responds to? Or does it indicate the length of the temporal window within which a temporally less-extended pattern is detected? The latter interpretation would be analogous to the spatial receptive field of a position-invariant neuron. It would imply a persistent response whose presence would indicate that the pattern the neuron responds to “has recently occurred.”
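The two readings of the temporal receptive window can be made concrete with a pair of hypothetical detector units operating on a symbol sequence (a deliberately abstract stand-in for a stimulus stream; the function names are our own). `template_unit` responds only at the moment a pattern spanning its whole window is completed, whereas `persistent_unit` keeps responding for up to `window` steps after a shorter pattern has occurred, i.e., it signals that the pattern "has recently occurred."

```python
def template_unit(seq, template):
    """Reading 1: the window is the duration of the pattern template itself.
    Responds (1) at step t only if the last len(template) items match the template."""
    n = len(template)
    return [int(t + 1 >= n and seq[t + 1 - n:t + 1] == template) for t in range(len(seq))]

def persistent_unit(seq, pattern, window):
    """Reading 2: a brief pattern detected anywhere within the last `window` steps,
    analogous to the spatial receptive field of a position-invariant neuron."""
    n = len(pattern)
    response = []
    for t in range(len(seq)):
        recent = seq[max(0, t + 1 - window):t + 1]
        found = any(recent[i:i + n] == pattern for i in range(len(recent) - n + 1))
        response.append(int(found))
    return response

stream = "xabcxxxxx"  # hypothetical stimulus stream; "abc" occupies steps 1-3
print(template_unit(stream, "abc"))              # responds only at step 3
print(persistent_unit(stream, "abc", window=5))  # responds at steps 3, 4, and 5
```

Both units could be said to have a "window," but only the second produces the persistent response that the position-invariant interpretation implies.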
The questions continue: When we imagine a face not seen in years, higher visual regions are thought to host the imagery representation constructed from memory. Does this indicate that these regions' temporal receptive window spans years or a lifetime? Or should we account for such top-down effects involving memory as “extratemporal-receptive-window effects,” in analogy to the extra-receptive-field effects postulated in the spatial domain (Blakemore and Tobin, 1972; Lamme, 1995)? The analogy between spatial receptive field and temporal receptive window may thus foreshadow both the promise and the potential problems of the new concept.
From a computational perspective, it is not trivial to predict how time reversal of a stimulus will affect the response time course of a single neuron. In Figure 1, we imagine how six different hypothetical neurons might respond to a brief natural movie snippet played forward and backward. This thought experiment suggests that the effects of the stimulus manipulations explored by Hasson et al. (2008) depend on the type of neuron and the targeted level of organization (single neuron, pattern information, regional activation). A low forward-to-reverse-backward correlation could have a variety of causes, including a simple evidence-accumulation process as well as tuning to a natural spatiotemporal sequence that does not naturally occur in reverse. It will be exciting to see future studies explore these questions further. One avenue would be to combine the stimulus manipulations introduced by Hasson et al. (2008) with pattern-information analysis (Norman et al., 2006) to assess to what extent a region's activity pattern distinguishes forward and backward presentations (Fig. 1, second column from the right) and also to what extent it distinguishes a given pair of brief stimuli when both are presented either forward or backward.
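A minimal version of the evidence-accumulation case can be simulated. The sketch below is our own toy model, not a reproduction of Fig. 1: a hypothetical unit leakily accumulates a white-noise stimulus feature, which by construction has no natural temporal structure for the unit to be tuned to. With `leak = 0` the unit is instantaneous, and its forward response equals its time-reversed backward response exactly; as `leak` grows, the unit integrates over longer windows and the forward-to-reverse-backward correlation falls (for white noise, its expected value is 1 - leak²), even though nothing about the unit is tuned to natural sequences.

```python
import numpy as np

def leaky_accumulator(stimulus, leak):
    """Hypothetical unit: r[t] = leak * r[t-1] + s[t].
    leak = 0 gives an instantaneous unit; leak -> 1 integrates over ever longer windows."""
    response = np.zeros(len(stimulus))
    acc = 0.0
    for t, s in enumerate(stimulus):
        acc = leak * acc + s
        response[t] = acc
    return response

def forward_reversed_backward_r(stimulus, leak):
    """Correlation between the response to the forward stimulus and the
    time-reversed response to the backward stimulus."""
    fwd = leaky_accumulator(stimulus, leak)
    bwd = leaky_accumulator(stimulus[::-1], leak)
    return np.corrcoef(fwd, bwd[::-1])[0, 1]

rng = np.random.default_rng(3)
# Hypothetical stimulus feature: white noise, i.e., no natural temporal structure.
stimulus = rng.standard_normal(20_000)

results = {leak: forward_reversed_backward_r(stimulus, leak) for leak in (0.0, 0.5, 0.9)}
for leak, r in results.items():
    print(f"leak={leak}: r = {r:.2f} (white-noise expectation {1 - leak ** 2:.2f})")
```

The design choice of a white-noise input is deliberate: it rules out tuning to natural sequences, so any drop in correlation here comes from temporal integration alone.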
In the big picture of systems neuroscience, Hasson et al. (2008) beautifully demonstrate the limits of a deterministic stimulus–response model for higher cortical regions. Whereas complex natural stimuli (e.g., movies in forward presentation) can reliably drive higher regions, the activity of these regions appears to be similar on average but less predictable in its fluctuations when the stimulus is nonsensical to the region (e.g., played backward or time scrambled) or absent (Nir et al., 2006). This suggests an intriguing interpretation: rather than slavishly following the stimulus (or falling silent when the stimulus is not to their preference), higher regions may engage and disengage with the external world dynamically. And when disengaged, they may remain quite active [Hasson et al. (2008), their Fig. 6 (http://www.jneurosci.org/cgi/content/full/28/10/2539/F6)], apparently involved in an intrinsic dynamic of their own.
The study by Hasson et al. (2008) combines conceptual, methodological, and theoretical contributions, which will be particularly important to future studies using complex natural stimuli. Methodologically, it suggests that we should separately consider the activity elicited by a stimulus and the reliability of the response (two aspects confounded when considering a t value or significance test). It also demonstrates a very useful general method for assessing the reliability of a complex response pattern: correlating its replications. In terms of brain theory, Hasson et al. (2008) provide a glimpse of the global picture of how our percepts of temporal structure are constructed in the brain. The concept of temporal receptive window seems set to inspire many future studies to follow up on the questions it raises in time.
Editor's Note: These short, critical reviews of recent papers in the Journal, written exclusively by graduate students or postdoctoral fellows, are intended to summarize the important findings of the paper and provide additional insight and commentary. For more information on the format and purpose of the Journal Club, please see http://www.jneurosci.org/misc/ifa_features.shtml.
- Correspondence should be addressed to Fabiana M. Carvalho, Department of Psychology, University of Glasgow, 58 Hillhead Street, Glasgow G12 8QB, UK.