Abstract
In the attentive tracking task, observers track multiple objects as they move independently and unpredictably among visually identical distractors. Although a number of models of attentive tracking implicate visual working memory as the mechanism responsible for representing target locations, no study has ever directly compared the neural mechanisms of the two tasks. In the current set of experiments, we used electrophysiological recordings to delineate similarities and differences between the neural processing involved in working memory and attentive tracking. We found that the contralateral electrophysiological response was similarly sensitive to the number of items attended in both tasks but that there was also a unique contralateral negativity related to the process of monitoring target position during tracking. This signal was absent during periods of the tracking task when objects briefly stopped moving. These results provide evidence that, during attentive tracking, the process of tracking target locations elicits an electrophysiological response that is distinct and dissociable from neural measures of the number of items being attended.
Introduction
To successfully interact with a dynamic environment, humans need to maintain representations of their environment over time and track the changing spatial positions of those representations when necessary. Although a great deal of work has been devoted to understanding the maintenance of these representations, the critical process of tracking spatial position of targets as they move has been comparatively neglected. Here, we demonstrate a novel neural signature that is unique to the process of tracking target positions, along with some initial estimates of the relevant scalp topography and time course.
In visual working memory (VWM) experiments (Luck and Vogel, 1997), observers have to actively maintain target representations over time. In multiple object tracking (MOT) experiments (Pylyshyn and Storm, 1988; Alvarez and Cavanagh, 2005), this requirement is coupled with the need to continually monitor spatial position as targets move. Although recent neuroimaging studies have begun to characterize the neural mechanisms that underlie MOT (Culham et al., 2001; Jovicich et al., 2001; Howe et al., 2009), it has proven difficult to separate activity related to monitoring changing target positions from maintenance activity. One approach to delineating these mechanisms is to contrast the neural activity during MOT with that observed during VWM, deducing that the difference between the two primarily reflects activity that is related to the process of tracking the spatial position of the targets. This study is the first to directly compare activity between these two tasks within the same experiment.
Neuroimaging studies of both VWM (Linden et al., 2003; Todd and Marois, 2004; Xu and Chun, 2006) and MOT (Culham et al., 2001; Jovicich et al., 2001) have shown that activity within the intraparietal sulcus (IPS) increases parametrically with target load, suggesting that this area is involved in the representation of target information. This finding is mirrored by human event-related potential (ERP) studies that have found sustained contralateral activity over posterior parietal electrode positions that is modulated by target load in both VWM (Vogel and Machizawa, 2004; Vogel et al., 2005; McCollough et al., 2007) and MOT tasks (Drew and Vogel, 2008). Importantly, these target load responses [termed “contralateral delay activity” (CDA)] have been shown to saturate near the behaviorally derived capacity limitations of performance in these tasks, bolstering the supposition that this activity is related to on-line representation of target information.
In contrast, the neural mechanisms that underlie the ability to track the spatial position of targets are not well understood. In addition to IPS, MOT tasks elicit activation across a wide range of cortical regions such as the frontal eye fields (FEFs), the superior parietal lobule (SPL), and the human motion complex (MT+). Thus far, it has been difficult to establish which areas are related to the process of tracking target positions, as opposed to processes serving other perceptual, cognitive, or oculomotor components of the task (Howe et al., 2009). Since tracking the current positions of each target as it moves throughout the visual field must be a quickly changing, dynamic process, the sluggish temporal resolution of functional magnetic resonance imaging (fMRI) may be insufficient to adequately characterize such a rapid process.
Materials and Methods
Overview
We examined lateralized ERP responses across task and stimulus manipulations to isolate separable neural mechanisms that are responsible for two core components of attentive tracking: target maintenance and tracking target locations. In previous work, our group has demonstrated strong evidence that CDA amplitude is an index of target maintenance. In experiment 1a, we compared the electrophysiological response to lateralized versions of a tracking task and a change detection task. In experiment 1b, we again measured performance in a tracking task and a change detection task, but here we adopted radial motion displays and instructed participants to pay attention or ignore the motion in different blocks of the experiment. This experiment equated both the visual stimulation and the observer performance between the two tasks. In experiments 1a and 1b, we found an increase in CDA amplitude in response to the tracking task, but we wondered whether this increase was caused by the act of paying attention to task-relevant motion. In experiment 2, we compared the electrophysiological response of paying attention to motion in a lateralized random dot kinematogram (RDK) to tracking a single object. CDA amplitude was much reduced in the RDK task, suggesting that attending to motion by itself is insufficient to generate the large tracking activity observed in experiment 1. Finally, in experiment 3, we manipulated the necessity to monitor changing target locations within a trial by stopping and starting motion. We found that, when the objects stopped moving, CDA amplitude decreased, again suggesting that the CDA amplitude increase we observed throughout the experiments in this study is attributable to the act of tracking the spatial location of targets.
Participants
We analyzed the data of 13 observers in experiment 1a, 16 in experiment 1b, 13 in experiment 2, and 12 in experiment 3. We rejected trials where an eye movement or blink was detected. If >25% of an observer's trials were rejected on this basis, the observer's data were eliminated from additional analyses. In total, 7 of the 61 observers were excluded from the sample on this basis. Ages ranged from 18 to 28 and all observers gave informed consent according to procedures approved by the University of Oregon and were paid $10/h for participation. All observers reported having no history of neurological problems, normal color vision, and normal or corrected-to-normal visual acuity. In experiments 1a, 1b, and 3, there were 10 blocks of 64 trials each. Blocks lasted ∼5 min. In experiment 2, observers completed 196 tracking trials and 576 attention to motion trials. Task order was counterbalanced across observers.
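The exclusion rule described above can be sketched in a few lines. This is only an illustration of the stated criterion (observers with more than 25% of trials rejected are dropped); the observer labels and rejection counts are hypothetical, and the total of 640 trials assumes the 10 blocks of 64 trials used in experiments 1a, 1b, and 3.

```python
# Illustrative sketch of the artifact-based exclusion rule.
# Observer names and rejection counts are hypothetical, not data from the study.

MAX_REJECTION_RATE = 0.25  # observers with MORE than 25% rejected trials are excluded


def keep_observer(n_rejected: int, n_total: int) -> bool:
    """Return True if the observer's rejection rate does not exceed the limit."""
    return n_rejected / n_total <= MAX_REJECTION_RATE


# 10 blocks x 64 trials = 640 trials per observer (experiments 1a, 1b, and 3)
trials_per_observer = 10 * 64

hypothetical_rejections = {"obs01": 90, "obs02": 200, "obs03": 160}
kept = [
    name
    for name, n_rejected in hypothetical_rejections.items()
    if keep_observer(n_rejected, trials_per_observer)
]

# Sample sizes reported in the text: 61 recorded, 7 excluded, 54 analyzed
analyzed = 13 + 16 + 13 + 12
```

Note that an observer with exactly 25% rejected trials is retained under this reading, because the text specifies exclusion only when the rate is greater than 25%.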
Experiment 1
There were two versions of experiment 1. In experiment 1a, we used VWM and MOT tasks from our group's previous work (Vogel and Machizawa, 2004; Drew and Vogel, 2008), whereas in experiment 1b we adapted stimuli introduced by Alvarez and Cavanagh (2005). Task type (memory vs tracking) was blocked, whereas set sizes of one or three (two in experiment 1b) items were interleaved. In experiment 1a, the order of block type (memory or tracking) was counterbalanced across observers. In experiment 1b, we used a fixed order of memory blocks first, followed by tracking blocks, in an effort to avoid observers unnecessarily attending the irrelevant motion.
Experiment 1a
Standard visual working memory task.
The lateralized working memory task (see Fig. 1A) was adapted from the procedure used by our laboratory in previous work (Vogel and Machizawa, 2004). Each trial began with a 200 ms arrow cue followed by a cue-to-sample interval that varied randomly between 100 and 200 ms. One or three colored boxes (subtending 0.6° visual angle) were then displayed on both the attended and unattended sides of the screen for 500 ms within an invisible 10.5 × 4.5° bounding rectangle that was offset 2.1° lateral to the fixation cross in each hemifield. This was followed by a 1500 ms delay period. At the end of each trial, the items from the selection period reappeared and observers were asked to categorize the items as either “same” or “different” with a button press on a game pad controller. On 50% of trials, one of the boxes changed to a color that was not previously on the attended side; on the other one-half of trials, the colors were identical.
Standard tracking task.
The bilateral tracking task (see Fig. 1A) was adapted from previous work from our laboratory (Drew and Vogel, 2008). As in the standard VWM task, each trial started with an arrow cue that instructed observers which hemifield to attend. After the cue, eight boxes (0.6° square) appeared on each side of the screen and remained stationary for 500 ms. Of these eight boxes, either one or three were colored red to identify these items as targets and the remaining boxes were drawn in black, signifying them to be distractors. The same number of boxes was filled with a photometrically equiluminant (Konica Minolta ChromaMeter CS-100a) green color on the unattended side of the screen. After 500 ms, the red and green items changed to black, making them visually indistinguishable from the distractors, at which point all of the boxes moved randomly for 1500 ms, bouncing whenever they made contact with other objects or when they reached the bounding rectangle described previously. Velocity and direction of motion also changed at random intervals during the trials. Average velocity was 1.6°/s (minimum, 0.8°/s; maximum, 2.4°/s). At the end of each trial, one item was colored red (one item was filled in green on the unattended side) and observers were asked to judge whether this item was one of the targets or not (“same” or “different” than the original color). On one-half of the trials, the probed item was a target and on the other one-half of trials it was a distractor. The timing of the cue, initial selection period, delay/tracking interval, and test array was identical to that of the standard VWM task.
Experiment 1b
In this experiment, we again asked participants to either hold item identity in memory or track spatial locations of the items. Critically, by using the same radial motion display in both tasks, we were able to balance the visual stimulation across the two tasks. The task (see Fig. 3A) was adapted from stimuli used in a tracking study by Alvarez and Cavanagh (2005).
Each trial began with a 200 ms arrow cue, followed by a 32 ms interstimulus interval. Observers were asked to attend one or two bars on lateralized spinning pinwheels (two perpendicular rectangles joined at the center) and to keep track of the position of the cued bars as the pinwheels spun randomly for 2000 ms at an average rate of ∼159°/s (SD, 19.5; maximum, 177; minimum, 141). Each bar was 2.9 × 0.3°. The pinwheels were arranged at the corners of a 5.6 × 5.6° box centered at fixation, so they were centered ∼4.0° from fixation. After an initial 500 ms selection period, the cue colors disappeared and the pinwheels began rotating. The speed and direction changed at random intervals so that the motion was unpredictable. After 2500 ms (the 500 ms selection period plus 2000 ms of rotation), the pinwheels stopped rotating and one bar on each side changed color. In the “radial tracking task,” observers were asked to attend to one bar on either one or two pinwheels and track its spatial location as it rotated. Observers had to report whether the bar colored at the end of the trial was a target bar or not.
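The stated eccentricity of the pinwheels follows from the display geometry: a pinwheel at the corner of a square centered on fixation lies at a distance equal to half the square's diagonal. The short check below is a sketch of that arithmetic only; the function name is ours and is not taken from the original stimulus code.

```python
import math

BOX_SIDE_DEG = 5.6  # side of the square whose corners hold the pinwheels (from the text)


def corner_eccentricity(side_deg: float) -> float:
    """Distance from the center of a square to one of its corners, in degrees."""
    half_side = side_deg / 2
    return math.hypot(half_side, half_side)


eccentricity = corner_eccentricity(BOX_SIDE_DEG)
print(f"{eccentricity:.2f}")  # prints 3.96, i.e. roughly 4.0 degrees from fixation
```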
In different blocks of the experiment, the observers completed the “radial memory task” using the same displays. In this task, observers were asked to memorize the initial color of the cued pinwheel arm(s), ignoring subsequent motion. At the end of each trial, observers categorized the color of a cued bar as either the same as or different from its color at the beginning of the trial. To equate performance across the two tasks, we used a set of seven highly similar, equiluminant colors varying smoothly between red and green.
Experiment 2
Observers completed two tasks, “attention to motion” and “tracking” (see Fig. 5A), that were blocked and counterbalanced across observers (as in experiment 1a). As in experiment 1b, both tasks used the same stimulus. Each trial began with a 500 ms cue instructing the observer to attend to the left, right, or, in the attention to motion task, both hemifields. This was followed by an interstimulus interval that varied randomly from 250 to 500 ms. Next, one circular aperture (diameter, 7° visual angle; lateralized 0.76° from the central fixation point) appeared in each hemifield. Each aperture was filled with a random dot kinematogram composed of 450 dots (diameter, 0.075° visual angle) that moved coherently in a random direction for the duration of the trial. Each dot had a 250 ms duration, after which it disappeared and reappeared in a random position within the aperture. Each aperture also contained 2 disks (diameter, 0.63° visual angle). For the first 500 ms of each trial, one of the two disks was red, and the other was gray. For the remainder of the trial, all disks were gray. The disks moved randomly at a constant rate of 7.5°/s, bouncing off the walls of the aperture but not each other.
Attention to motion task.
Observers were cued to attend to either the right or left aperture (“cued” trials), or to both apertures (“uncued” trials). Observers were asked to monitor the aperture(s) for a motion event that occurred on two-thirds of all trials. The motion event was either a brief (166 ms) speeding (to 5.1°/s) or slowing (to 0.7°/s) of the velocity of all the dots in one aperture. Otherwise, the dots moved at a constant rate of 3°/s. At the end of each trial, all motion stopped and observers were asked to categorize the motion event as either “absent,” “slower,” or “faster.” The spatial cues were 100% valid and motion events in the uncued trials were equally likely to occur in either aperture.
Tracking task.
Observers were asked to track one of two disks on each trial while ignoring the motion fields. The visual stimulation in the tracking trials was identical to that in the attention to motion trials with one exception: at the end of each trial, motion stopped and one disk in each aperture turned red. Observers were asked to judge, with a button press, whether this disk was initially red or not.
Experiment 3
Experiment 3 used a tracking task similar to that described in experiment 1a. Tracking duration was set to 2000 ms. There were four conditions in this experiment, and observers were always asked to track two lateralized targets on each trial. In the Normal condition, all objects moved randomly for the duration of the trial. In the Pause condition, all objects moved randomly for the first 682 ms of the tracking period, paused for 500 ms, and then resumed random motion. The Stop condition was identical to the Pause condition except that, once stopped, the objects remained stationary for the remainder of the trial. Finally, in the No-Move condition, all objects remained stationary for the duration of the trial. All four trial types were randomly intermixed and were indistinguishable from each other for at least the 500 ms selection period. The stimuli were matched in both hemifields, so that when stimuli were moving or stationary in the attended hemifield, they would also be moving or stationary, as appropriate, in the unattended hemifield.
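The four motion schedules can be summarized as a function of time within the tracking period. The following is a minimal sketch, assuming time is measured from the onset of the 2000 ms tracking period (that is, after the selection period); the condition labels and function name are illustrative and do not come from the original experiment code.

```python
TRACKING_MS = 2000          # tracking duration (from the text)
MOVE_BEFORE_PAUSE_MS = 682  # initial motion in the Pause and Stop conditions
PAUSE_MS = 500              # length of the pause in the Pause condition


def is_moving(condition: str, t_ms: float) -> bool:
    """Return True if objects are moving at time t_ms into the tracking period."""
    if not 0 <= t_ms < TRACKING_MS:
        raise ValueError("t_ms must fall within the tracking period")
    if condition == "normal":
        return True
    if condition == "no_move":
        return False
    if condition == "stop":
        # Motion ends at 682 ms and never resumes
        return t_ms < MOVE_BEFORE_PAUSE_MS
    if condition == "pause":
        # Motion is interrupted from 682 ms to 1182 ms, then resumes
        pause_end = MOVE_BEFORE_PAUSE_MS + PAUSE_MS
        return not (MOVE_BEFORE_PAUSE_MS <= t_ms < pause_end)
    raise ValueError(f"unknown condition: {condition}")
```

Under this reading, the Pause and Stop conditions are identical until 1182 ms into the tracking period, which is what allows the pause interval itself to be compared against continuous motion without the observer knowing whether motion will resume.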
Electrophysiological recording and analysis
ERPs were recorded in each experiment using our standard recording and analysis procedures (McCollough et al., 2007; Drew and Vogel, 2008). Observers were asked to hold central fixation throughout each trial, and we rejected all trials that were contaminated by blocking, blinks, or large (>1°) eye movements.
We recorded from 22 tin electrodes mounted in an elastic cap (Electrocap International) using the International 10/20 System. The 10/20 sites F3, FZ, F4, T3, C3, CZ, C4, T4, P3, PZ, P4, T5, T6, O1, and O2 were used along with five nonstandard sites: OL midway between T5 and O1; OR midway between T6 and O2; PO3 midway between P3 and OL; PO4 midway between P4 and OR; POz midway between PO3 and PO4. All sites were recorded with a left-mastoid reference, and the data were re-referenced off-line to the algebraic average of the left and right mastoids. Horizontal electrooculogram (EOG) was recorded from electrodes placed ∼1 cm to the left and right of the external canthi of each eye to measure horizontal eye movements. To detect blinks, vertical EOG was recorded from an electrode mounted beneath the left eye and referenced to the left mastoid. The EEG and EOG were amplified with a SA Instrumentation amplifier with a bandpass of 0.01–80 Hz and were digitized at 250 Hz in LabView 6.1 running on a Macintosh. In this paper, as in previous work, we focus on lateralized components by defining electrode pairs as either contralateral or ipsilateral with respect to the side of the screen the observers were asked to covertly attend on a given trial. As in previous work, we quantified the CDA by subtracting ipsilateral activity from contralateral activity, then averaging the resultant difference wave across a set of five electrode pairs (P3/4, PO3/4, OL/OR, O1/O2, T5/6).
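To make the lateralized measure concrete, the sketch below computes a contralateral minus ipsilateral difference wave (the convention of the difference waves shown in Fig. 2A) and averages it over the five posterior electrode pairs. It is a simplified illustration, not the laboratory's analysis code: the data layout (a dictionary mapping electrode names to lists of voltages in microvolts) and the function name are assumptions.

```python
# Left/right members of the five electrode pairs used to quantify the CDA
CDA_PAIRS = [("P3", "P4"), ("PO3", "PO4"), ("OL", "OR"), ("O1", "O2"), ("T5", "T6")]


def cda_difference_wave(erp: dict, attended_side: str) -> list:
    """Contralateral minus ipsilateral voltage, averaged over the CDA pairs.

    erp maps electrode name -> list of voltages (one value per time sample).
    attended_side is "left" or "right" (the cued hemifield).
    """
    if attended_side not in ("left", "right"):
        raise ValueError("attended_side must be 'left' or 'right'")
    n_samples = len(erp[CDA_PAIRS[0][0]])
    summed = [0.0] * n_samples
    for left_site, right_site in CDA_PAIRS:
        # The hemisphere opposite the attended hemifield is contralateral
        if attended_side == "left":
            contra, ipsi = erp[right_site], erp[left_site]
        else:
            contra, ipsi = erp[left_site], erp[right_site]
        for i in range(n_samples):
            summed[i] += contra[i] - ipsi[i]
    return [value / len(CDA_PAIRS) for value in summed]
```

With this sign convention, a sustained contralateral negativity such as the CDA appears as negative values in the returned wave.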
In addition to deleting all trials in which eye movements or blinks were detected, in each experiment, we determined whether there was a drift of eye position toward the attended side by examining the horizontal EOG channel over a long time window (100–2000 ms). Attended side did not interact with condition in any of our experiments (all values of F < 1.4; all values of p > 0.2). However, in two of the four experiments, there was a small but significant drift toward the attended side of the screen (experiment 1a: F(1,12) = 7.64, p = 0.017; experiment 3: F(1,10) = 7.98, p = 0.018). It is important to note that the greatest magnitude of drift (1.13 μV) we observed (experiment 3) corresponds to an eye movement of <0.1° of visual angle (Hillyard and Galambos, 1970). In all experiments, our stimuli were lateralized by at least 2° from fixation.
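The conversion from horizontal EOG voltage to gaze angle can be illustrated with a simple linear scaling. The calibration constant below (about 16 μV per degree) is an assumption introduced for illustration; this section cites Hillyard and Galambos (1970) for the conversion but does not state the value, so the constant should be replaced with the appropriate calibration for a given recording setup.

```python
# ASSUMPTION: roughly 16 microvolts of HEOG deflection per degree of eye rotation.
# This value is not given in the text and is used here only for illustration.
ASSUMED_UV_PER_DEGREE = 16.0

MIN_STIMULUS_ECCENTRICITY_DEG = 2.0  # stimuli were at least 2 degrees from fixation


def heog_to_degrees(heog_uv: float) -> float:
    """Convert an HEOG deflection in microvolts to an approximate gaze shift in degrees."""
    return heog_uv / ASSUMED_UV_PER_DEGREE


largest_drift_deg = heog_to_degrees(1.13)  # largest drift reported (experiment 3)
```

Under this assumed calibration, the largest observed drift is well under 0.1°, which is small relative to the minimum stimulus eccentricity of 2°.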
Results
Experiment 1a: comparing VWM and tracking with standard displays
In the first experiment, we recorded ERPs from observers as they performed lateralized versions of a VWM task and an MOT task in separate blocks. In both tasks, we presented items in both hemifields but asked observers to either track or remember the items within a single hemifield. For each task, they tracked or remembered either one or three targets per trial (mixed within blocks).
Behavioral results
As expected, accuracy decreased when set size was increased from one object (memory task, 96%; tracking task, 93%) to three objects (memory, 86%; tracking, 77%; F(1,12) = 48.5, p < 0.001). Covertly tracking moving objects was more difficult than holding the same number of color boxes in memory (F(1,12) = 9.54; p = 0.009). We also found a significant interaction between set size and trial type (F(1,12) = 5.45; p = 0.038) that was driven primarily by a larger set size effect in the tracking condition.
Electrophysiological results
Replicating previous work (Vogel and Machizawa, 2004; Drew and Vogel, 2008), we observed a large, occipito-parietal contralateral negativity that rose soon after the objects disappeared in the VWM task and after motion began in the tracking task (Fig. 1). Figure 2A shows the ERP difference waves (contralateral − ipsilateral). In Figure 2B, we plot the amplitude during the early time window (800–1200 ms). There was a significant effect of set size on amplitude (F(1,12) = 19.43; p < 0.001), and the tracking task produced significantly greater amplitude than the memory task (F(1,12) = 8.10; p < 0.015). However, task and set size did not interact (F(1,12) = 2.64; p = 0.13), suggesting that the mechanisms underlying the set size effect are independent of those underlying the increased amplitude during the tracking task. We suggest that the additional amplitude during tracking is the signature of tracking the changing position of the targets. Subsequent experiments were designed to rule out alternative explanations.
Figure 2C illustrates an interesting secondary difference between the two tasks: Although there was a significant main effect of set size (F(1,12) = 13.60; p = 0.003), the set size effect in the VWM task decreased by the late (1550–1950 ms) time window, whereas the MOT set size effect was approximately constant throughout the trial, leading to a set size by task interaction (F(1,12) = 11.99; p = 0.005). This appears to be driven primarily by the fact that the set size effect was no longer significant in the VWM task (t(12) = 1.04; p = 0.317). Notably, although the set size effect is no longer present and the overall CDA amplitude is lower, there is still a significant CDA for the VWM trials in this late time period (t(12) = 2.64; p = 0.02). Although the apparent difference in time course is interesting in its own right, our primary concern is whether this decrease in CDA amplitude is attributable to stimulus differences between tasks. One simple explanation is that the late decline of the CDA in the memory task is attributable to the absence of visual stimulation during the delay interval of a typical change detection trial, whereas stimuli remain visible throughout a tracking trial (contrast the middle panels of Fig. 1A). We therefore hypothesize that CDA amplitude should stay stable in the presence of competing information during the delay interval.
Conclusions
Both tracking and memory tasks elicited sustained CDA activity that increased in amplitude with set size. However, across both set sizes, the amplitude was substantially larger in the tracking task than in the memory task. Moreover, activity appeared to decrease later in the trial in the memory task, but not in the tracking task. We would like to conclude that the larger, more persistent signal seen in the tracking task reflects processes that are specific to attentive tracking, such as shifting spatial attention or monitoring target positions as objects move, but not necessary for the memory task. However, in experiment 1a, there are a number of physical differences between the two tasks. The tracking task had continuously visible stimuli, whereas the memory task did not. Furthermore, in the current experiment, the tracking task was more difficult than the memory task. In experiment 1b, we attempted to control for these alternative explanations by using identical stimuli for each task and equating overall performance levels for the two tasks.
Experiment 1b: comparing VWM and tracking in identical displays
The goal of experiment 1b was to replicate experiment 1a with physically identical stimuli for MOT and VWM tasks. To this end, we modified a version of the tracking task originally described by Alvarez and Cavanagh (2005). In both tasks, observers were cued to one or two target arms. In the tracking task, the observer had to indicate whether the arm highlighted at the end of the trial was the same as the initially cued arm, regardless of the color. In the memory task, the observer had to indicate whether the color of the highlighted arm was the same as the color of the cue, regardless of which arm was highlighted. We equated accuracy levels across the two tasks by using a set of seven photometrically equiluminant shades between red and green, thereby increasing the difficulty of the memory task. However, although overall performance was equated, the difficulty of the two tasks during the tracking/maintenance interval may not have been equivalent. By manipulating the similarity of the colors used in this version of the task, we increased the difficulty of the task, but this increase may have been primarily attributable to an increased rate of comparison errors (Awh et al., 2007). Indeed, the cognitive processes necessary to complete MOT and VWM tasks fundamentally differ during the tracking/maintenance intervals: whereas VWM merely requires the observer to actively maintain target identity, tracking necessitates maintaining target identity and monitoring the spatial position of each target as it moves. Although our manipulation cannot control for these differences, by equating task performance we hoped to control for more general factors that may have influenced the results of experiment 1a, such as effort and arousal.
Thus, the results of this experiment alone are insufficient to rule out the possibility that the difference in difficulty between the two tasks during the tracking/maintenance periods could account for the amplitude increase during tracking tasks. In Discussion, however, we argue that evidence from other published works as well as the results from the current set of experiments suggest that such a difficulty explanation is unlikely to account for this basic finding.
Behavioral results
Accuracy decreased when set size was increased from one (memory task, 83%; tracking task, 86%) to two (memory, 71%; tracking, 71%) objects (F(1,15) = 118.1; p < 0.001). Overall, accuracy did not significantly differ between the two tasks (F(1,15) = 1.4; p = 0.248) and the two factors did not interact (F(1,15) = 3.65; p = 0.075).
Electrophysiological results
The results from the early time window (800–1200 ms) (Fig. 3C) replicated the effects observed in experiment 1a: we found a large main effect of the number of items attended (F(1,15) = 37.84; p < 0.001), the CDA amplitude was significantly greater for the tracking task than for the memory task (F(1,15) = 25.52; p < 0.001), and the task effect did not interact with number of items (F < 1). Even when the tasks were as closely matched as possible, the requirement to continually track target positions resulted in a substantial increase in contralateral negative voltage activity.
Late amplitude.
In contrast to experiment 1a, the later time window (1550–1950 ms) (Fig. 3D) yielded the same pattern of results as in the early time window: significant main effects for target load (F(1,15) = 22.58; p < 0.001) and task (F(1,15) = 26.63; p < 0.001), but no interaction (F < 1). This result supports our hypothesis that the late decline observed for VWM amplitude in experiment 1a was attributable to the lack of ongoing visual stimulation.
Scalp topography of maintenance and target tracking.
If the additional negativity, seen in the tracking task, has a different scalp topography from the CDA, this would be converging evidence for our hypothesis that the added signal represents a separable mechanism related to tracking the spatial location of targets.
To assess this, we first examined the set size effects by performing a subtraction between remember 2 and remember 1 over a long time window (500–2500 ms) and comparing that with the scalp distribution resulting from the subtraction of track 2 and track 1 (Fig. 4A). When computing topographic maps, we collapsed across attend right and attend left trials by trading lateralized electrode sites for attend right trials such that the right hemisphere was always contralateral. Therefore, the topographic maps denote the average contralateral response on the right hemisphere and the average ipsilateral response on the left. Medial electrodes are simply the average amplitude during attend right and attend left trials.
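The collapsing step used for the topographic maps can be sketched as follows. For attend-right trials the contralateral hemisphere is the left one, so swapping the left and right members of each lateral pair places contralateral activity at right-hemisphere sites; attend-left trials are left unchanged, and the two are then averaged. The data layout (electrode name mapped to a mean amplitude) and the function names are assumptions made for illustration.

```python
# Lateral electrode pairs from the recording montage; midline sites have no partner
LATERAL_PAIRS = [("F3", "F4"), ("T3", "T4"), ("C3", "C4"), ("P3", "P4"),
                 ("T5", "T6"), ("O1", "O2"), ("OL", "OR"), ("PO3", "PO4")]
SWAP = {}
for left_site, right_site in LATERAL_PAIRS:
    SWAP[left_site] = right_site
    SWAP[right_site] = left_site


def to_contra_right(amplitudes: dict, attended_side: str) -> dict:
    """Relabel sites so that the right hemisphere is always contralateral."""
    if attended_side == "left":
        return dict(amplitudes)  # right hemisphere is already contralateral
    if attended_side == "right":
        # Midline sites (absent from SWAP) keep their own label
        return {SWAP.get(site, site): value for site, value in amplitudes.items()}
    raise ValueError("attended_side must be 'left' or 'right'")


def collapse_sides(attend_left: dict, attend_right: dict) -> dict:
    """Average attend-left and relabeled attend-right amplitudes site by site."""
    a = to_contra_right(attend_left, "left")
    b = to_contra_right(attend_right, "right")
    return {site: (a[site] + b[site]) / 2 for site in a}
```

In the collapsed map, right-hemisphere sites carry the average contralateral response, left-hemisphere sites carry the average ipsilateral response, and midline sites carry the plain average of the two attention conditions.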
Both subtractions reveal a lateral and posterior focus of activity over occipito-parietal electrode sites that is consistent with previous work on the topography of the CDA (McCollough et al., 2007; Jolicoeur et al., 2008). An analysis of the normalized electrode pair (F3/4, C3/4, P3/4, PO3/4, T5/6, OL/R, O1/2) by task interaction (McCarthy and Wood, 1985) was not significant (F(6,90) = 0.21; p = 0.977), providing no evidence that the scalp topographies of the two set size effects differed.
Next, we isolated the effect of increased amplitude during tracking by collapsing across set size and subtracting averaged memory amplitude from tracking amplitude (Fig. 4B). This activity showed a much more broadly distributed activity that was more dorsal and anterior than the set size effect. When we directly compared the two scalp topographies, we found a significant electrode position by condition interaction (F(6,90) = 2.39; p = 0.034), suggesting that the two effects stemmed from distinct sources. In experiment 1a, we found a similar pattern of results: in the early time window (800–1200 ms), normalized amplitude for the set size effect for the two tasks did not interact across electrode site (F(6,72) = 0.52; p = 0.794), but there was a marginally significant task by electrode effect (F(6,72) = 1.95; p = 0.085). Given the difficulty of interpreting the underlying neural generators based on scalp voltage distributions (Urbach and Kutas, 2002), this apparent scalp topography difference should be interpreted with caution. Nonetheless, the current analyses suggest that the set size effects observed in the two tasks likely stem from similar sources, whereas the increased amplitude during tracking appears to stem from a distinct source.
Experiment 2: attention to motion or tracking target locations?
Does the increased amplitude observed during tracking arise from shifts in spatial attention that occur as target locations are tracked or merely from attending to a moving stimulus? Experiment 1b strongly suggested that it was not caused by physical differences in the stimuli, such as the presence of motion. However, it could be that it is attention to motion, rather than tracking target positions per se, that accounts for the effect. Therefore, in experiment 2, we directly contrasted tracking and attention to motion. The aim of this experiment was to observe the neural signature of attention to motion in the absence of tracking so as to compare it with the signature associated with tracking, which we conceptualize as involving both attention to motion and shifting spatial attention as target position changes. In the attention to motion task, observers were asked to monitor either one or two lateralized RDK fields so that they could categorize a brief motion event that occurred on two-thirds of all trials. The motion event was either a brief speeding or slowing of the velocity of all dots in a given RDK field. To ensure that lateralized attention was necessary to complete this task, we contrasted cued and uncued trials. On cued trials, observers were precued to the aperture that would contain the motion event. On uncued trials, the motion event was equally likely in either aperture, forcing the observer to monitor both apertures.
Numerous neuroimaging and primate neurophysiology studies have shown that this class of task should elicit large task-related increases in activity in cortical regions such as area MT and MT+ (Treue and Maunsell, 1996; O'Craven et al., 1997; Serences and Boynton, 2007). Moreover, RDK displays are ideal for this purpose because the observer cannot track a single dot in the field to perform the task. Instead, the observer must continuously attend the motion aperture more globally to detect the brief velocity change that could occur at any point (or not at all) during the trial. We contrasted this with an attentive tracking condition by overlaying two disks on top of each motion field and requiring the observer to track one target disk as it moved randomly along with a distractor disk. If the increased contralateral activity observed during tracking is driven by attention to moving stimuli rather than being specific to tracking the movement of targets, we should find a similar sustained contralateral activity in both the global motion and tracking conditions.
Behavioral results
Mean accuracy for the tracking task was higher than for the attention to motion task (80 vs 72%), but chance performance was also higher in the tracking task (50%) than in the motion task (33.33%). After correcting for guessing (Macmillan and Creelman, 1991), we found that performance in the two tasks was statistically equivalent (t(12) = 1.78; p = 0.100). As expected, we found that performance in the uncued condition (67%) of the motion task was significantly less accurate than in the cued condition (71%; t(12) = 2.67, p = 0.03).
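The effect of the guessing correction on the group means can be illustrated with a short calculation. The sketch below assumes the standard high-threshold correction, (p − g)/(1 − g), where g is the chance rate. The exact formula applied in the analysis is not spelled out in the text, so both the formula and the function name `correct_for_guessing` are assumptions made for illustration.

```python
def correct_for_guessing(p_correct, chance):
    """Rescale accuracy so that chance maps to 0 and perfect performance to 1.

    Assumes the standard high-threshold correction (p - g) / (1 - g);
    the text does not state which correction formula was used.
    """
    if not 0.0 <= chance < 1.0:
        raise ValueError("chance must be in [0, 1)")
    return (p_correct - chance) / (1.0 - chance)


# Group means and chance levels reported in the text.
tracking = correct_for_guessing(0.80, 0.50)      # chance = 50%
motion = correct_for_guessing(0.72, 1.0 / 3.0)   # chance = 33.33%

print(f"tracking: {tracking:.2f}, motion: {motion:.2f}")
```

Under this assumption the corrected group means come out at roughly 0.60 and 0.58, which fits the reported absence of a reliable difference. The reported t test would have been computed on per-observer scores, which are not available here.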
Electrophysiological results
Transient response to display onset.
To measure the spatial allocation of attention at the onset of the RDK fields, we examined the early visually evoked ERP waveforms (the P1 and N1) that reflect sensory processing. Here, and in subsequent ERP analyses unless otherwise noted, we examined only the cued trials in the attention to motion task, ignoring uncued trials in which we would not expect to observe spatial attention effects. The P1 and N1 components are modulated by spatial attention, with larger amplitudes observed in response to stimuli in attended locations compared with unattended locations (Heinze et al., 1990; Mangun et al., 1991; Hillyard et al., 1998). In lateralized tasks, spatial attention is associated with more positive voltage over contralateral electrodes during the P1 and more negative voltage over ipsilateral electrodes during the N1. Therefore, a simple index of the locus of spatial attention may be computed by subtracting contralateral activity from ipsilateral over the duration of the P1 and N1 (75–175 ms). Using this computation, large positive numbers indicate that the contralateral side is attended. We computed the mean amplitude from contralateral and ipsilateral channels for the OL/OR electrodes 75–175 ms after the onset of the stimuli. We found that there was a significant main effect of hemisphere (contralateral or ipsilateral; F(1,12) = 32.58, p < 0.001), but not of task (motion or tracking; F(1,12) = 0.18, p = 0.676), nor was the interaction between the two factors significant (F(1,12) = 4.00; p = 0.069). Planned comparisons showed that contralateral amplitude was higher than ipsilateral amplitude for both the motion and tracking tasks (t(12) = 4.24, 5.5, respectively, both values of p < 0.005) (Fig. 5B,C), indicating that observers used the cue and were initially attending the cued hemifield in both tasks.
Sustained response.
The difference waveforms (contralateral minus ipsilateral) for both tasks are shown in Figure 5D. Despite sharing identical physical stimulation, we found that these two tasks elicit clearly distinguishable electrophysiological responses (t(12) = 4.19; p < 0.005) during the 500–2500 ms time window. Although attending to motion did elicit a measurable amount of sustained contralateral voltage (−0.66 μV; t(12) = 3.24, p < 0.008), its amplitude was only a small fraction of that observed for the same stimuli while the subject engaged in tracking (−2.07 μV; t(12) = 4.83, p < 0.001) (Fig. 5D,E). These results indicate that sustained attention to a moving stimulus is insufficient to account for the large sustained contralateral activity that we have consistently observed during tracking conditions. Thus, we conclude that this activity primarily reflects processes specifically related to attentional tracking.
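The sustained measure reported above is a mean of the contralateral-minus-ipsilateral difference wave over a fixed time window. As a minimal sketch of that computation, the example below averages a difference wave over the 500–2500 ms window. The function name `mean_amplitude`, the 100 ms sampling step, and the waveform values are all hypothetical and chosen only to make the arithmetic easy to follow.

```python
def mean_amplitude(times_ms, contra_uv, ipsi_uv, start_ms, end_ms):
    """Mean contralateral-minus-ipsilateral amplitude for start_ms <= t < end_ms."""
    if not (len(times_ms) == len(contra_uv) == len(ipsi_uv)):
        raise ValueError("time and voltage arrays must have the same length")
    diffs = [c - i for t, c, i in zip(times_ms, contra_uv, ipsi_uv)
             if start_ms <= t < end_ms]
    if not diffs:
        raise ValueError("no samples fall inside the requested window")
    return sum(diffs) / len(diffs)


# Synthetic averaged waveforms for one observer (illustrative values only):
# a flat ipsilateral trace and a contralateral trace that steps to -2 uV at 500 ms.
times = list(range(0, 3000, 100))
ipsi = [0.0] * len(times)
contra = [-2.0 if t >= 500 else 0.0 for t in times]

sustained = mean_amplitude(times, contra, ipsi, 500, 2500)
print(sustained)
```

Computing this value once per observer and condition would yield the kind of per-observer amplitudes that enter the t tests reported in this section.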
Experiment 3: manipulating the necessity to continuously monitor spatial information
We have shown that the additional amplitude observed during tracking, relative to VWM (experiment 1a), is not attributable to the mere presence of motion (experiment 1b) nor to attention to motion (experiment 2). The data remain consistent with our hypothesis that the added amplitude reflects monitoring target position. In experiment 3, we move from the subtractive strategy of the previous experiments to a more direct approach. If the increased CDA amplitude observed in our previous experiments reflects processing related to monitoring changes in target position, then it should only be present when the objects are moving. Therefore, if an observer is tracking moving stimuli, and the objects temporarily stop moving, there should be a reduction in CDA amplitude, which should then resume once the targets begin to move again. In experiment 3, we introduced this new condition, providing a direct test of our target location hypothesis under constant task conditions. An additional advantage is that we can estimate the time course of the tracking-related activity: when objects stop moving, how long will it be before the difference is reflected in the amplitude of the waveform?
In experiment 3, we asked observers to track two lateralized objects in four motion conditions: Normal, Pause, Stop, and Never-Move, all randomly intermixed. Normal trials were identical to MOT trials from previous experiments, with continuous motion throughout the trial, whereas on Never-Move trials, objects were stationary throughout the trial. On Pause trials, all objects stopped moving for 500 ms and then began to move again. Stop trials were similar to Pause trials except that the objects never started moving again. All conditions were interleaved, with identical initial selection periods of 500 ms. The Normal (continuous tracking) and Never-Move (target position maintenance without tracking) conditions essentially served as baseline conditions, analogous to the tracking and working memory conditions, respectively, in experiments 1a and 1b. These conditions were compared with the Pause and Stop conditions, which began identically to the Normal condition. If the additional ERP activity observed during tracking in experiments 1a and 1b was driven by the need to track the spatial locations of the targets, stopping object motion midtrial should reduce the ERP amplitude to a level equivalent to maintaining target position (the Never-Move condition). If the objects started moving again, as they did in the Pause condition, we would predict that amplitude would rise back to the level associated with continuous tracking (the Normal condition).
Behavioral results
Performance in this task varied as a function of condition, from 96% correct in the Never-Move condition, to 91% in the Stop condition, to 82% in the Pause and Normal conditions. This led to a significant effect of condition (F(1,33) = 39.12; p < 0.001). This effect appears to be driven primarily by the fact that performance in the Never-Move condition was higher than in any of the other conditions (t(11) = 6.71; p < 0.001).
Electrophysiological results
Figure 6 shows that our key prediction was confirmed: directly manipulating whether or not observers needed to track objects within a trial led to transient decreases and increases in CDA amplitude. The added signal seen during tracking dissipated during a pause in motion and increased again soon after the onset of motion. We analyzed the mean differential amplitude during the selection phase (200–300 ms) and in three time windows during tracking (Fig. 6A,B): early (1000–1500 ms), middle (1500–2000 ms), and late (2000–2500 ms). Note that the middle period corresponds to the time when motion stopped on Pause trials. Amplitudes for all four conditions were equivalent during the selection period before motion onset (F(3,33) = 0.314; p = 0.815). In the early period (F(3,33) = 20.43; p < 0.001) and all subsequent time windows, there was a significant effect of condition. Only the Never-Move condition did not require tracking object locations, and amplitude was significantly lower in this condition compared with the other three (simple contrast: F(1,11) = 39.00, 31.9, and 43.6; all values of p < 0.001), which were equivalent (F(2,22) = 0.62; p = 0.547). In the middle time window after the initial cessation of motion, Pause and Stop amplitudes were significantly lower than Normal amplitude (t(11) = 2.56, 5.5, respectively; both values of p < 0.03). Finally, in the late time period, the Normal and Pause conditions require monitoring target position, whereas the Stop and Never-Move conditions do not. Accordingly, Pause amplitude rose significantly above Never-Move amplitude (t(11) = 4.84, p < 0.001) so that it was equivalent to Normal amplitude (t(11) = 2.09; p = 0.061), and Stop amplitude was equivalent to amplitude in the Never-Move condition (t(11) = 1.95; p = 0.077), which served as a spatial working memory control condition in this experiment.
This confirmed our prediction that the additional amplitude associated with tracking, compared with visual memory alone, appears to be specifically related to the necessity to track the positions of moving objects. Thus, when objects are not currently moving, amplitude decays to a level equivalent to the Never-Move condition. Another interpretation concordant with our data might be that, when objects stopped moving, observers recoded their positions into a different representation, such as a verbal code. Regardless of whether object information was represented in visual memory or recoded verbally, our results suggest that, when objects stop moving, unique tracking mechanisms are no longer used, resulting in a decrease in CDA amplitude.
The significantly greater amplitude in the Normal condition relative to the Never-Move condition during the early time window (1000–1500 ms: t(11) = 6.25, p < 0.001) serves as a replication of the results of experiment 1 using a position memory task rather than a color by position memory task. In this light, it is worth noting that amplitude in the Never-Move condition follows a pattern very similar to that of the memory condition in experiment 1a, slowly decreasing as the trial progresses. Since there were stationary stimuli present throughout the “retention” interval, this argues that the amplitude decreases over time because of the absence of motion, rather than the absence of stimuli.
One of the advantages of electrophysiological research is that we can record precisely when the brain registers a change in the stimulus. How long does the target tracking effect persist after motion ceases, and how soon does it reengage when motion restarts? To estimate the latency of this effect, we subtracted amplitude in the Pause and Stop conditions from amplitude in the Normal condition (Fig. 6C) with a 50 ms sliding window. The Pause and Stop conditions showed very similar decreases in amplitude in response to motion cessation: both Pause and Stop amplitude dropped significantly below Normal amplitude for the first time in the 1500–1550 ms time window, or 343 ms after motion stopped. Amplitude in the Stop condition dropped to a level equivalent to the Never-Move condition at the same point (343 ms after motion stopped) and the two conditions remained statistically equivalent for the remainder of the trial. In the Pause condition, amplitude rose to be statistically equivalent to Normal during the 1800–1850 ms time window, within 193 ms after motion resumed.
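The logic of this latency estimate can be sketched in a few lines. The example below steps a 50 ms window through a Normal-minus-Pause difference wave and returns the first window in which the across-observer mean differs from zero by a one-sample t test. Several details are assumptions rather than the reported procedure: the window step (here equal to the window width), the use of a fixed critical t value (2.201, two-tailed, α = 0.05, 11 degrees of freedom), the absence of any correction for multiple comparisons, and the synthetic data. The function name `first_significant_window` is hypothetical.

```python
import math
import statistics


def first_significant_window(times_ms, diffs_by_observer, width_ms=50,
                             step_ms=50, t_crit=2.201):
    """Return (start, end) of the first window whose mean differs from zero.

    diffs_by_observer holds one list per observer with the difference
    amplitude (e.g. Normal minus Pause) at each time in times_ms.
    t_crit is the two-tailed critical t for alpha = 0.05 and 11 degrees of
    freedom (12 observers); it must be changed for other sample sizes.
    Returns None if no window reaches the criterion.
    """
    for start in range(times_ms[0], times_ms[-1] + 1, step_ms):
        idx = [k for k, t in enumerate(times_ms) if start <= t < start + width_ms]
        if not idx:
            continue
        # One mean amplitude per observer for this window.
        means = [sum(obs[k] for k in idx) / len(idx) for obs in diffs_by_observer]
        sd = statistics.stdev(means)
        if sd == 0:
            continue  # t is undefined without between-observer variance
        t_value = statistics.mean(means) / (sd / math.sqrt(len(means)))
        if abs(t_value) > t_crit:
            return start, start + width_ms
    return None


# Synthetic data (illustrative only): 12 observers sampled every 10 ms, with a
# 1 uV shift in the difference wave from 1500 ms on and a fixed offset per observer.
times = list(range(1000, 2000, 10))
offsets = [(j - 5.5) * 0.1 for j in range(12)]
data = [[off + (-1.0 if t >= 1500 else 0.0) for t in times] for off in offsets]

onset_window = first_significant_window(times, data)
print(onset_window)
```

Subtracting the time at which motion stopped from the start of the returned window would then give a latency estimate of the kind reported here. With overlapping windows or a corrected significance threshold, the estimate could shift by some tens of milliseconds.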
We therefore tentatively conclude that the process of monitoring target positions turns off with a latency of ∼350 ms. Kreegipuu and Allik (2007) recently estimated that it takes 200 ms for the visual system to register the onset or offset of motion. Thus, our data suggest that it takes an additional 150 ms to register that it is no longer necessary to monitor target position information. This relatively fast, but not immediate processing may help explain why, when asked to localize targets with a mouse click, observers tend to click slightly ahead of the most recent position (Iordanescu et al., 2009). Efficient spatial tracking may necessitate both an awareness of the current location of the target as well as an estimate of where the object could go next (Horowitz and Cohen, 2010). If this is the case, then the delay observed before amplitude decreased in the absence of motion may be attributable to a prediction mechanism continuing to operate for some period of time even though the spatial position of the targets is not changing.
Discussion
The goal of these experiments was to study the processes by which observers maintain and track the spatial position of objects. Although the temporary storage of spatial indexes in VWM is a key component of a number of models of MOT (Oksama and Hyönä, 2004, 2008; Alvarez and Cavanagh, 2005), this is the first clear demonstration of an overlap in the neural activation between the two tasks. Both VWM and MOT evoked a strikingly similar contralateral component that was sensitive to the number of items attended on a given trial and that has been shown to be sensitive to behavioral performance in both VWM (Vogel and Machizawa, 2004) and MOT (Drew and Vogel, 2008). Furthermore, using closely matched VWM and MOT tasks, we have demonstrated that the amplitude of the sustained, lateralized response, or CDA, was consistently greater for MOT than for VWM. The scalp topography of the additional amplitude was distinct from the topography of the set size effect, which was similarly distributed in both VWM and MOT tasks. Experiment 2 demonstrated that the act of attending to motion was not sufficient to drive the contralateral tracking activity observed in experiments 1a and 1b. Finally, in experiment 3, we showed that the contralateral tracking activity responded in predictable ways to transient changes in stimulus motion: decreasing when targets stopped moving after a latency of ∼350 ms, and rising again when target motion restarted. In sum, the increased contralateral negativity observed in experiments 1a, 1b, and 3 appears to be specifically tied to the process of tracking the spatial position of targets.
Although we feel confident that the observed increase in contralateral negativity reflects attentive tracking, it is not currently clear what role the observed activity plays in the task of tracking moving objects. As outlined above, we have ruled out a number of explanations including overall task performance and merely paying attention to a moving stimulus. Although it is challenging to perfectly equate effort levels during the delay/tracking interval, previous work has suggested that CDA amplitude is not modulated by task difficulty within a given set size (Drew and Vogel, 2008; Ikkai et al., 2010; Luria et al., 2010). Furthermore, if the increased amplitude associated with tracking is attributable to increased difficulty during the tracking (compared with maintenance) period, then tracking should be associated with more activity from the same apparent source. Rather, we found that the increased activity associated with tracking showed a distinct, more dorsal distribution of activity than the activity associated with increasing the target load in both tasks. In sum, although we cannot rule out a difficulty explanation, together our data suggest that the observed increase in amplitude is attributable to task-specific mechanisms.
At present, we do not know whether the increased amplitude during tracking reflects shifts of spatial attention or the process of updating target positions. fMRI studies of tracking have implicated FEF, SPL, and anterior IPS, which are also involved in covert attentional shifts (Corbetta et al., 1998; Wojciulik and Kanwisher, 1999). Our studies cannot discriminate between the updating and shifting hypotheses because, thus far, we have focused on tracking spatial positions. Additional experiments will be necessary to determine whether the tracking activity observed in the current study is observed when target identity must be updated with location held constant (Blaser et al., 2000). However, it would be quite interesting if the tracking activity were to reflect attentional shifts, given that one well-characterized ERP signature of spatial orienting is the N2pc, a transient component that deflects 200–300 ms after stimulus onset, whereas the observed tracking activity is a sustained slow-wave component that appears to be constant throughout tracking trials. In this case, we might be observing the signal of continuous attentional “smooth pursuit” (Horowitz et al., 2004), as opposed to the discrete “attentional saccades” indexed by the N2pc. However, it is not clear that there is a firm distinction between these two hypotheses, as attentional shifts might be the mechanism for updating spatial memory [following the logic of Postle et al. (2004)]. It seems plausible that the process of updating target information may be driven, at least in part, by covert shifts of spatial attention.
Assuming for the moment that our interpretations are correct, we can ask about the neural substrates of indexing locations and tracking the spatial location of targets. Our method was not designed to answer this question, beyond suggesting that the tracking source is more dorsal and anterior than the source of the indexing process. However, a set of conceptually similar studies using fMRI and unit recordings is instructive here. Reynolds and colleagues (Mitchell et al., 2007; Sundberg et al., 2009) have recently shown that activity in V4 is strongly modulated by the relative locations of a tracked item and visually identical distractors. However, the recordings in both of these studies occurred during a pause in the motion of the objects, similar to the pause in motion used in experiment 3. As in experiment 3, we hypothesize that activity observed during this time period is attributable to processes related to indexing spatial locations of objects rather than tracking the spatial location of the target. A number of studies have shown that activity in the intraparietal sulcus increases with the number of items that must be encoded into visual working memory (Linden et al., 2003; Todd and Marois, 2004; Xu and Chun, 2006; McNab and Klingberg, 2008) and asymptotes when the behavioral capacity is exceeded (Todd and Marois, 2004; Xu and Chun, 2006). Similarly, activity in IPS increases as observers are asked to track more targets (Culham et al., 2001; Jovicich et al., 2001). Given the fact that activity in IPS [particularly posterior IPS, according to Xu and Chun (2006)] increases as a function of set size during both VWM and MOT tasks, activity in this area may reflect a pointer system that devotes an attentional focus to each of the tracked targets. In support of this idea, posterior IPS was the only region that Howe et al. (2009) found to be more active when attending to stationary targets than when passively viewing the display.
If both tasks use posterior IPS to focus attention on target locations, then perhaps the strikingly similar behavioral capacity limitations in the two tasks are driven by the processing capacity of an IPS-based pointer system. This would help explain the correlation between VWM and MOT performance (Oksama and Hyönä, 2004), as well as the fact that the two tasks strongly interfere with one another in a dual-task situation (Fougnie and Marois, 2006).
If posterior IPS is the locus of the indexing mechanism, what areas are involved in tracking target locations? To perform the attentive tracking task, this mechanism must apprehend several basic motion parameters of the display, such as the speed and trajectories of the targets. One plausible neural candidate for such a mechanism is MT+, which is known to contain large proportions of motion-selective neurons that are sensitive to motion parameters such as coherence, trajectory, and speed. Similar to the spatial tracking activity observed in the current study, during fMRI studies of MOT, MT+ shows strong activation when motion is attended, compared with passive viewing of the moving display, but the area appears to be less responsive to target load manipulations than other areas such as IPS (Culham et al., 2001; Jovicich et al., 2001). Furthermore, Howe et al. (2009) found enhanced activation in MT+ when observers had to track the spatial locations of objects, relative to when the objects were stationary.
What have we learned about the nature of tracking moving objects? First, our interpretation of the neural activity from the current study suggests that there are at least two separate mechanisms that underlie tracking: an indexing mechanism that is closely tied to VWM representation and a mechanism that tracks target locations. Second, although one might expect that, when there are more targets to track, there would be more tracking activity, in our experiments tracking activity did not interact with target load. It may be that our measures are not sufficiently sensitive to detect the increase in target position processing with load. Another possibility is that this tracking mechanism is effectively either on or off depending on the task requirements, similar to the response found in area MT+ in the studies discussed above.
To interact with our environment, it is necessary to constantly track the spatial locations of objects as they move and as we move with reference to the objects. The present data represent a step forward in understanding this process, delineating the differences between holding object information in visual working memory and tracking object location during an attentive tracking task. In the process, we believe we have isolated a neural signature of tracking target positions that appears to be sensitive to the necessity to track spatial positions, but not to the number of spatial positions that must be tracked.
Footnotes
This work was supported by National Institutes of Health Grant MH-65576 (T.S.H., E.K.V.). We thank Ed Awh, Paul van Donkelaar, and Ulrich Mayr for comments on a previous version of the paper.
Correspondence should be addressed to Trafton Drew, Visual Attention Laboratory, Harvard Medical School, 64 Sidney Street, Suite 170, Cambridge, MA 02139. tdrew1@rics.bwh.harvard.edu