Abstract
Most mental processes consist of a number of processing steps that are executed sequentially. The timing of the individual mental operations can usually only be estimated indirectly, from the pattern of reaction times. In vision, however, many processing steps are associated with the modulation of neuronal activity in early visual areas. Here we exploited this association to elucidate the time course of neuronal activity related to each of the self-paced mental processing steps in complex visual tasks. We trained monkeys to perform two tasks, search–trace and trace–search, which required performing a sequence of two operations: a visual search for a specific color and the mental tracing of a curve. We used multielectrode recording techniques to monitor the representations of multiple visual items in area V1 at the same time and found that the relevant curve as well as the target of visual search evoked enhanced neuronal activity with a timing that depended on the order of operations. This modulation of neuronal activity in early visual areas could allow these areas to (1) act as a cognitive blackboard that permits the exchange of information between successive processing steps of a sequential visual task and to (2) contribute to the orderly progression of task-dependent endogenous attention shifts that are driven by task structure and evolve over hundreds of milliseconds.
Introduction
Many tasks of our daily lives take from seconds to minutes. It is usually possible to subdivide these tasks into simpler subtasks that are executed sequentially. Consider, for example, the task of finding your way on a map. You typically start looking for your current position, and then for your destination. Next you explore the various possible routes until you have found the one that is shortest or most convenient (Roelfsema, 2005). Some of these subtasks can be decomposed further, e.g., when there are two possible paths on the map you may first explore one before you try the other. At some small timescale, however, it becomes difficult to subdivide the task components further, and previous authors have suggested that there exists an “atomic step of cognition” lasting 100–300 ms (Newell, 1990; Ballard et al., 1997; Anderson and Lebiere, 1998). In the visual domain, the atomic processing steps have been called “elemental operations,” and the neuronal program that solves a complex task was called a “visual routine” (Ullman, 1984).
Previous studies estimated the time course of atomic processing steps from the reaction times of subjects across tasks of varying complexity (Anderson and Lebiere, 1998). Here we measured the time course by recording neuronal activity in the visual cortex. We investigated two routines that involve a visual search and the tracing of a curve so that we could take advantage of the well-established association between these subtasks and shifts of attention that can be measured with neurophysiological techniques (Chelazzi et al., 1998, 2001; Everling et al., 2002; Sheinberg and Logothetis, 2001; Bichot et al., 2005). Previous neurophysiological studies of attention shifts cued subjects to attend one item in the display and then redirected attention by switching this cue (Motter, 1994; Khayat et al., 2006; Busse et al., 2008; Herrington and Assad, 2009), but the dynamics of neuronal activity related to self-paced endogenous attention shifts resulting from a complex visual task were not investigated previously.
We trained monkeys to perform two routines that required curve tracing and visual search, in different orders, and simultaneously monitored neuronal activity at multiple locations in the primary visual cortex (area V1) with multielectrode recording techniques. We found that V1 neurons that represent a traced curve or the target of search enhance their response when these items become relevant, and we were therefore able to monitor the precise time course of the search and trace operations embedded in a routine. We observed that the necessary endogenous attention shifts are associated with substantial delays in the visual cortex where the response modulation of the first operation evolves over ∼200 ms and persists during the initiation of the second operation that requires an additional 60 to 240 ms, depending on the task and precise configuration of the stimulus. These results suggest that neurons in early visual areas contribute to the implementation of complex visual tasks and may play a role in the transfer of information between successive processing steps.
Materials and Methods
We used surgical and electrophysiological techniques to record multiunit activity (MUA) in area V1 from chronically implanted microwires and electrode arrays in two macaque monkeys (Macaca mulatta) as described previously (Supèr and Roelfsema, 2005). All experimental procedures complied with the National Institutes of Health Guide for Care and Use of Laboratory Animals and were approved by the Institutional Animal Care and Use Committee of the Royal Netherlands Academy of Arts and Sciences.
Tasks and stimuli.
Figure 1 illustrates the two tasks. The monkeys were seated at a distance of 75 cm from a 21 inch computer display with a frame rate of 70 or 100 Hz. Trials started with a blank screen with a fixation point (with a size of 0.2°) in the center of the display. The stimuli appeared when the monkey had kept his gaze on the fixation point for 300 ms. Each trial required the monkeys to maintain eye position within a square fixation window of ∼1° (see Eye position measurements) until they were allowed to make an eye movement. If the animal broke fixation, then the trial was aborted and repeated later in the same block. In the trace-then-search task, the monkeys first traced a curve connected to the fixation point to identify the color of the marker at its other end (see Fig. 1a, T-Marker) and then searched for a large colored disc (see Fig. 1a, T-Disc) with the same color. The monkeys performed these two operations mentally while keeping their gaze on the fixation point. When the stimulus had been in view for 600 ms, the fixation point disappeared, and the monkeys were required to make an eye movement to the target disc. In the second search-then-trace task, the order of operations was reversed (see Fig. 1b). The monkeys searched for a marker with the same color as the fixation point (see Fig. 1b, red T-Marker) and traced the curve connected to this marker to locate a disc at the curve's other end that was the target for an eye movement. Monkeys received a juice reward only when they made an eye movement to the relevant target disc. The stimuli (eight in the trace-then-search task and four in the search-then-trace task) were shown interleaved in a pseudorandom sequence (so that the same stimulus was never repeated more than three times in succession). The curves were white, and the markers (size, 0.3°) and discs (size, 0.7°) were red or green shown on a black background. 
The size of the stimuli ranged from 3 to 8° of visual angle, and the luminance ranged from 20 to 80 cd/m², depending on the color. Stimuli varied in position and size to match as closely as possible the center of the receptive fields (RFs) of the recording sites, and we ensured that only target or distractor elements fell into the RF (see Figs. 2, 4). The monkeys were initially trained to perform simple curve-tracing tasks before the recordings from the complex tasks reported here. We used different curve configurations in this initial training phase (Roelfsema et al., 1998) to ensure that the monkeys did not rely on the length or shape of the target curve to solve the task.
Surgical procedures.
We first implanted a head holder under aseptic conditions and general anesthesia, which was induced with ketamine (15 mg/kg, i.m.) and maintained after intubation by ventilating with a mixture of 70% N2O and 30% O2, supplemented with 0.8% isoflurane, fentanyl (0.005 mg/kg, i.v.), and midazolam (0.5 mg/kg · h, i.v.). In separate surgeries, we implanted microwires and electrode arrays chronically in the right hemisphere of monkey G and the left hemisphere of monkey C. The wires had a core of platinum-iridium (20 μm) and a coating of polyimide. They were positioned at a depth of 0.8–2 mm below the cortical surface (Supèr and Roelfsema, 2005). The electrode arrays consisted of 4 × 5 or 5 × 5 electrodes (Blackrock Microsystems). The animals recovered for at least 21 d before training was resumed.
Multiunit recording of neuronal activity in area V1.
We recorded MUA from a total of 77 recording sites with a clear visual response. We obtained good recordings from 46 sites of two electrode arrays in monkey C, and a total of 31 sites in monkey G from two electrode arrays (N = 17) and a number of wire electrodes (N = 14). The criteria for a good visual response required the average response to be at least 50% of the standard deviation of the trial-to-trial variation in the spontaneous activity. Of all the included recording sites, 80% had a visual response stronger than twice the standard deviation of the spontaneous activity, i.e., the d′ of their visual response was two or higher. We illustrated the neuronal activity on individual trials to give an impression of the reliability of the responses in supplemental Figure 1 (available at www.jneurosci.org as supplemental material). We recorded the envelope of the amplified signal filtered between 750 and 5000 Hz [called MUAE by Supèr and Roelfsema (2005)]. The MUA signal represents the pooled activity of a number of neurons in the vicinity of the electrode tip, and the population responses obtained with this method are identical to those obtained by pooling across single units (Supèr and Roelfsema, 2005; Cohen and Maunsell, 2009). Supplemental Figure 2 (available at www.jneurosci.org as supplemental material) compares the MUAE signal to MUAS that can be recorded from the same electrode by counting the spikes that cross a preset threshold. The advantage of MUAE over MUAS is that it does not depend on an arbitrary threshold, and we found previously that it has a better signal-to-noise ratio (Supèr and Roelfsema, 2005). We mapped the RF borders of the neurons at every recording site by measuring the onset and offset of the response evoked by a light bar moving in one of eight directions (Supèr and Roelfsema, 2005). The average RF size was 1.3° (range, 0.6–2.8°), and the average eccentricity was 4.0° (range, 1.0–7.0°). 
The receptive field centers of array and wire electrodes in the two monkeys are shown in supplemental Figure 3 (available at www.jneurosci.org as supplemental material).
Both monkeys could perform the search-then-trace task as well as the trace-then-search task before we started the recording sessions. We collected the data in the trace-then-search task in monkey G over a period of 2 weeks. We varied the stimulus from day to day in this animal because the receptive field positions were spread out over the visual field (supplemental Fig. 3, available at www.jneurosci.org as supplemental material), and we wanted to ensure that recording sites were well driven by the stimulus. This was followed by recordings in the search-then-trace task over a period of 5 weeks. We recorded from two electrode arrays in monkey C, and accordingly, the receptive fields formed two clusters (supplemental Fig. 3, available at www.jneurosci.org as supplemental material). We could therefore record from all recording sites on a single day. The data in the trace-then-search and search-then-trace tasks in monkey C were recorded on different days, separated by 4 weeks. We selected these days because the recordings had a good signal-to-noise ratio and the monkey was performing well. In this animal, we switched back and forth between the two tasks and obtained similar results across a number of recording days.
In the analysis of the data, we included only correct trials (the total number of trials ranged from 80 to 200 per condition). We normalized the responses to the average peak response evoked by the target and distractor curve (or marker), after subtraction of the spontaneous activity. Population responses were computed by averaging across the normalized responses at individual recording sites. The data from each recording site were used only once.
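The normalization and population averaging described above can be sketched as follows. This is a minimal illustration under one reading of the text: the peak is taken from the average of the target and distractor responses after subtraction of the spontaneous activity. The function names and the list-based data layout are hypothetical and not taken from the original analysis code.

```python
def normalize_response(target, distractor, spontaneous):
    """Normalize one recording site's responses (hypothetical helper).

    target, distractor: lists of mean activity per time bin.
    spontaneous: mean activity before stimulus onset.
    Assumption: the normalization factor is the peak of the average of the
    two spontaneous-subtracted responses.
    """
    target_c = [x - spontaneous for x in target]
    distractor_c = [x - spontaneous for x in distractor]
    mean_trace = [(a + b) / 2.0 for a, b in zip(target_c, distractor_c)]
    peak = max(mean_trace)
    if peak <= 0:
        raise ValueError("no visual response above spontaneous activity")
    return [x / peak for x in target_c], [x / peak for x in distractor_c]


def population_average(site_traces):
    """Average normalized traces across sites, each site counted once."""
    n_sites = len(site_traces)
    return [sum(values) / n_sites for values in zip(*site_traces)]
```

Normalizing before averaging gives every recording site the same weight in the population response, irrespective of its absolute signal amplitude.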
We used d′ to quantify how well the neurons at a recording site discriminated between the target and distractor stimulus on single trials, and computed d′ as follows: d′ = (μT − μD)/σ, where μT and μD are average responses evoked by the target and distractor items, and σ represents the pooled standard deviation of the neuronal response across trials. We also computed modulation indices to quantify the strength of the response modulation. The modulation index (MI) was defined as the difference between the response evoked by the target and distractor curve (or disc) divided by the average response: MI = (target − distractor)/[(target + distractor)/2].
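Both measures can be computed from single-trial responses in a few lines. The sketch below is illustrative only. The text does not specify how the standard deviation was pooled, so the degrees-of-freedom-weighted pooled variance used here is an assumption, as are the function names.

```python
import math
import statistics


def d_prime(target_trials, distractor_trials):
    """d' = (mean target - mean distractor) / pooled standard deviation.

    Assumption: the pooled variance weights each condition by its degrees
    of freedom (n - 1); the source only says "pooled standard deviation".
    """
    mu_t = statistics.fmean(target_trials)
    mu_d = statistics.fmean(distractor_trials)
    n_t, n_d = len(target_trials), len(distractor_trials)
    pooled_var = (
        (n_t - 1) * statistics.variance(target_trials)
        + (n_d - 1) * statistics.variance(distractor_trials)
    ) / (n_t + n_d - 2)
    return (mu_t - mu_d) / math.sqrt(pooled_var)


def modulation_index(target, distractor):
    """MI = (target - distractor) / [(target + distractor) / 2]."""
    return (target - distractor) / ((target + distractor) / 2.0)
```

Note that d′ depends on the trial-to-trial variability, whereas the MI depends only on the mean responses, so the two measures can dissociate across recording sites.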
Estimation of the latency of the visual response and the response modulation.
We used a method for the estimation of the latency of the visual responses and the response modulation that is based on fitting a curve to the response (difference). Although some previous studies measured latency as the first of a number of consecutive time bins satisfying a significance criterion (Lennie, 1981; Maunsell and Gibson, 1992), we have noted that this method yields biased latency estimates, especially if the difference in the neuronal response between conditions evolves gradually, because latencies become shorter when more trials are collected, i.e., when the overall significance is higher (Roelfsema et al., 2003). Our curve-fitting method is independent of the number of trials, and the latency is measured by fitting a function f(t) to the response or the response difference (Thompson et al., 1996; Roelfsema et al., 2003).
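The dependence of the significance-based latency on the number of trials can be illustrated with a toy simulation. Everything in the sketch below is an assumption made for illustration: the linear ramp of the response difference, the unit-variance Gaussian noise, the 10 ms bins, the run of five consecutive bins, and the z criterion of 1.96. None of these values are taken from the studies cited above.

```python
import math
import random


def threshold_latency(n_trials, rng, bin_ms=10, n_bins=60,
                      run_length=5, z_crit=1.96):
    """Time (ms) of the first of `run_length` consecutive significant bins."""

    def signal(t):
        # Toy response difference: zero until 150 ms, then a linear ramp
        # that reaches 1 at 350 ms.
        return min(max((t - 150.0) / 200.0, 0.0), 1.0)

    significant = []
    for b in range(n_bins):
        t = b * bin_ms
        trials = [signal(t) + rng.gauss(0.0, 1.0) for _ in range(n_trials)]
        mean = sum(trials) / n_trials
        # The noise SD is 1 by construction, so z = mean * sqrt(n).
        significant.append(mean * math.sqrt(n_trials) > z_crit)
    for b in range(n_bins - run_length + 1):
        if all(significant[b:b + run_length]):
            return b * bin_ms
    return None


def mean_latency(n_trials, n_repeats=20, seed=0):
    """Average the threshold latency over repeated simulated experiments."""
    rng = random.Random(seed)
    latencies = [threshold_latency(n_trials, rng) for _ in range(n_repeats)]
    return sum(latencies) / n_repeats
```

In this toy model, `mean_latency(400)` comes out shorter than `mean_latency(25)` although the underlying signal is identical, which is the bias that motivates the curve-fitting approach.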
We derived the shape of f(t) from the following two assumptions: (1) the onset of the response (modulation) has a Gaussian distribution, and (2) a fraction of it dissipates exponentially. We derived the following two differential equations from these assumptions: ∂m1(t)/∂t = −αm1(t) + g(t, μ, σ) for the dissipating modulation, and ∂m2(t)/∂t = g(t, μ, σ) for the nondissipating modulation. The total response (modulation) equals m1(t) + m2(t) = f(t); g(t, μ, σ) is a Gaussian density with a mean μ and a standard deviation σ, and α⁻¹ is the time constant of the dissipation. The equations are solved by the sum of an ex-Gaussian (Luce, 1986) and a cumulative Gaussian: f(t) = d · exp(μα + 0.5σ²α² − αt) · G(t, μ + σ²α, σ) + c · G(t, μ, σ). Thus, the function f(t) is described by five parameters, μ, σ, α, c, and d; G(t, μ, σ) is a cumulative Gaussian, and c and d determine the relative contribution of nondissipating and dissipating modulation, respectively. We (arbitrarily) defined the latency of the visual response (latonset) and response modulation (latmod) as the time where the fitted function reached 33% of its maximum (lat33), but we also report the latency estimates obtained with different criteria in the results section.
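A closed-form solution of the two differential equations above, scaled by d and c, can be written down and evaluated directly. The sketch below evaluates such a function and reads off the latency at a given fraction of its maximum. It does not perform the fit itself, and the function names, the 1 ms evaluation grid, and the 600 ms window are illustrative assumptions.

```python
import math


def cumulative_gaussian(t, mu, sigma):
    """G(t, mu, sigma): cumulative Gaussian distribution."""
    return 0.5 * (1.0 + math.erf((t - mu) / (sigma * math.sqrt(2.0))))


def fitted_response(t, mu, sigma, alpha, c, d):
    """f(t) = d * ex-Gaussian (dissipating) + c * cumulative Gaussian."""
    dissipating = math.exp(
        alpha * mu + 0.5 * (alpha * sigma) ** 2 - alpha * t
    ) * cumulative_gaussian(t, mu + alpha * sigma ** 2, sigma)
    return d * dissipating + c * cumulative_gaussian(t, mu, sigma)


def latency_at_fraction(params, fraction=0.33, t_max=600.0, step=1.0):
    """First time (ms) at which f(t) reaches `fraction` of its maximum.

    params: (mu, sigma, alpha, c, d). The maximum is taken over the
    evaluation grid from 0 to t_max (an assumption of this sketch).
    """
    times = [i * step for i in range(int(t_max / step) + 1)]
    values = [fitted_response(t, *params) for t in times]
    threshold = fraction * max(values)
    for t, value in zip(times, values):
        if value >= threshold:
            return t
    return None
```

For example, `latency_at_fraction((200.0, 30.0, 0.01, 1.0, 0.5))` gives lat33 for one hypothetical parameter set, and calling it with `fraction=0.25`, `0.5`, or `0.75` gives the alternative criteria lat25, lat50, and lat75.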
We used a statistical resampling procedure (bootstrapping) to measure the significance of differences in latency between the response modulations in different conditions (Press et al., 1993). If there were N1 recording sites in condition 1 and N2 sites in condition 2, we randomly selected N1 sites with replacement from the first sample and N2 from the second sample and determined the latency of the modulation in the two simulated conditions, Latsim1 and Latsim2, using the curve-fitting method described above. We repeated this 10,000 times and determined the significance of a difference in latency by investigating whether the confidence interval based on the simulated latency differences (Latsim1 − Latsim2) included zero. We derived the 95% confidence intervals of the latency of the response modulations equivalently from the distributions of Latsim.
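The resampling scheme can be sketched as follows. The latency estimator is passed in as a function because the curve fit itself is not reproduced here; `estimate_latency` is a hypothetical callable that maps a list of resampled recording sites to one latency. The percentile-based confidence interval and the fixed random seed are further assumptions of this sketch.

```python
import random


def bootstrap_latency_difference(sites_1, sites_2, estimate_latency,
                                 n_boot=10000, seed=0):
    """Bootstrap the latency difference between two sets of recording sites.

    Returns (lower, upper, significant), where lower and upper bound the
    95% percentile interval of Latsim1 - Latsim2 and `significant` is True
    when that interval excludes zero.
    """
    rng = random.Random(seed)
    differences = []
    for _ in range(n_boot):
        # Resample sites with replacement, keeping the original sample sizes.
        sample_1 = rng.choices(sites_1, k=len(sites_1))
        sample_2 = rng.choices(sites_2, k=len(sites_2))
        differences.append(estimate_latency(sample_1) - estimate_latency(sample_2))
    differences.sort()
    lower = differences[int(0.025 * n_boot)]
    upper = differences[int(0.975 * n_boot) - 1]
    return lower, upper, not (lower <= 0.0 <= upper)
```

In a real analysis, `estimate_latency` would average the resampled sites' responses, fit f(t) to the population response difference, and return the time at which the fit reaches 33% of its maximum.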
Eye position measurements.
We implanted a gold ring under the conjunctiva of one eye to record the eye position using the double magnetic induction method (Bour et al., 1984) and recorded this signal at a rate of >500 Hz. During stimulus presentation, monkey G had to maintain steady fixation within a 1.2 × 1.2° (or 1 × 1°) fixation window centered on the fixation point. We had to make the fixation window for monkey C larger (maximally, 1 × 1.6°) because the vertical eye position signal contained 50 Hz noise with a peak-to-peak amplitude of 0.3°. We confirmed the fixation accuracy of monkey C by analyzing the eye position distribution off-line and found that the 95% confidence interval of fixation positions was [−0.2°, 0.2°] in the X direction, and [−0.5°, 0.6°] in the Y direction. We ruled out that small eye movements within the fixation window influenced the neuronal responses using a stratification analysis (Roelfsema et al., 1998).
Results
We trained two monkeys to perform two tasks that required them to search for an item of a specific color and to trace one of two curves. In the first trace-then-search task, the monkeys initiated a trial by looking at a fixation point, and after a delay of 300 ms, a stimulus appeared with two curves, two circular markers, and two larger circular discs (Fig. 1a). The monkeys had to trace the curve that was connected to the fixation point to identify the color of the marker at the other end of this curve (T-Marker), while ignoring a distractor curve. The animals had to search for the larger disc (T-Disc) with the same color as the target marker and to select this disc as the target for an eye movement. When the stimulus had been in view for 600 ms, the fixation point disappeared, cuing the monkeys to make a saccade to the target disc. Stimuli differed in whether the upper or lower curve was connected to the fixation point and in the colors of the markers and discs, so that all permutations of the stimulus resulted in a total of eight stimuli that were shown in a pseudorandom sequence (Fig. 2a). Although this task was relatively complex, the animals reached an average performance of 90.1% correct (monkey C, 91.7%; monkey G, 88.5%). In the second search-then-trace task, the order of operations was reversed. Trials started with a gray fixation point that changed color to either green or red at the same time that a stimulus appeared (Fig. 1b). The animals had to search for a target marker with the same color as the fixation point that was located at the start of a target curve. The monkeys had to trace this target curve to localize a disc at its other end, and this disc was the target for an eye movement. The animals reached an average performance of 87.5% in the search-then-trace task (monkey C, 82.9%; monkey G, 92.0%).
We recorded reaction times in separate experimental sessions in which the monkeys were allowed to respond as fast as possible (supplemental Table 1, available at www.jneurosci.org as supplemental material). As expected, the response times in the complex tasks were longer than in tasks that could be solved by tracing or searching only. During the recordings of neuronal activity, monkeys had to maintain fixation for 600 ms to avoid the influence of eye movements on the activity of V1 neurons.
Neuronal activity in area V1 in the trace-then-search task
To measure the time course of the search and trace operations, we recorded multiunit neuronal activity from electrodes that were implanted chronically in area V1. We placed the markers and discs in the receptive fields of different V1 neurons so that we could monitor their representations at the same time. Figure 2a shows the stimuli from a recording session with the trace-then-search task, as well as two typical RFs of multiunit recording sites: one on the marker at the end of a curve [recording site 1 with a receptive field (RF1)] and the other on one of the discs [recording site 2 with a receptive field (RF2)]. For four of the stimuli, the target disc was closer to the target marker (Fig. 2a, close configuration, left) than for the other four stimuli (far configuration, right). This difference between stimuli caused an unexpected difference in neuronal activity. We therefore analyzed trials from the close and far configuration separately.
Figure 2b shows the responses evoked at RF1 on the marker at the end of either the target or distractor curve. These responses were pooled across stimuli with opposite color combinations (shown above each other in Fig. 2a) so that the average RF stimulation was the same for responses evoked by the target and distractor markers (50% red and 50% green). It can be seen that the neuronal responses evoked by the relevant marker (target) were stronger than those evoked by the irrelevant marker (distractor) in the close as well as in the far configuration. This response enhancement was not observed during the initial neuronal response triggered by the appearance of a stimulus in the receptive field, but after a delay, indicating that it is a correlate of tracing of the target curve, which was required to determine the relevant color for the subsequent search. To evaluate the significance of the response enhancement, we compared the distribution of single-trial responses in a window from 200–600 ms after stimulus presentation with a Mann–Whitney U test and found activity evoked by the relevant marker to be significantly enhanced in both the close and far configurations (p < 10⁻⁶).
To measure the timing of the response modulation caused by tracing, we subtracted the response evoked by the irrelevant marker from that evoked by the relevant one and fitted a curve to the difference response (Fig. 2b, light blue curve). We (arbitrarily) determined the modulation latency as the point in time where the fitted function reached 33% of its maximum (Roelfsema et al., 2003). The latency of the response modulation was 214 ms in the close configuration and 201 ms in the far configuration, whereas the visual response (averaged across the target and distractor markers) had a much shorter latency of 42 ms (Fig. 2b, black curve).
In the same recording session, we measured the responses of neurons at recording site 2 with a receptive field on the target disc (i.e., the target of the visual search) (Fig. 2a, RF2). The target disc evoked significantly stronger activity than the distractor disc in both the close (U test; p < 10⁻⁴) and far configuration (p < 0.05), a response enhancement that we attribute to the search for the disc with the appropriate color (Fig. 2c). When we determined the timing of the search-related response modulation, we were surprised to find a difference between the close and far configurations. The latency of the response enhancement was 258 ms in the close configuration (i.e., 44 ms after the tracing modulation observed at recording site 1), and it increased to 400 ms in the far configuration, which is 199 ms after the response enhancement at recording site 1 and >350 ms after the visual response with a latency of 39 ms.
Population analysis in the trace-then-search task
We observed similar results when we evaluated the activity across the entire population of recording sites in the trace-then-search task. Figure 3a shows the average neuronal response of 44 recording sites with an RF on the marker or on a segment of the curve close to the marker (Fig. 2a, RF1) (22 recording sites in monkey C and 22 in monkey G). The response evoked by the relevant marker (Fig. 3a, orange) was stronger than the response evoked by the irrelevant marker (blue) in the close configuration (paired t test; t = 8.5; p < 10⁻¹⁰) as well as in the far configuration (paired t test; t = 4.8; p < 2 × 10⁻⁵). To assess the strength and reliability of the response modulation across trials at individual recording sites, we computed the d′, defined as the difference between responses evoked by target and distractor marker divided by the standard deviation of single-trial responses in a window from 200–600 ms after stimulus onset (see Materials and Methods). The blue bars in Figure 3c represent the distribution of d′ values across recording sites in the close (left) and far configurations (right). In the close configuration, all d′ values are positive, which indicates that the target marker generally evoked stronger activity than the distractor marker (p < 10⁻¹⁰; sign test). The median modulation index [(target − distractor)/average] was 0.73, and the response evoked by the target marker was on average 98% stronger than the response evoked by the distractor. The same was true for the far configuration, where the majority of recording sites had a positive d′ value (p < 10⁻⁶; sign test) and a median MI of 0.30 corresponding to a target response that was, on average, 41% stronger than the distractor response. At the population level, the response enhancement occurred after 180 ms in the close configuration and after 192 ms in the far configuration, >130 ms after the neurons' initial visual responses triggered at a latency of 41 ms (Fig. 3a, bottom, compare blue and black curves).
To investigate the visual search operation, we evaluated the activity at 47 recording sites with a receptive field on the target or distractor disc (Fig. 2a, RF2) (24 sites in monkey C and 23 in monkey G). The responses evoked by target discs were stronger than responses evoked by distractor discs in the close configuration (paired t test; t = 11.3; p < 10⁻¹⁰) with an average MI of 0.34 (38% increase in activity in the window from 200–600 ms). The same effect was observed in the far configuration (t = 7.5; p < 10⁻⁸) (Fig. 3b), where the average MI was 0.15, corresponding to an average increase in activity by 13% (in the window from 200–600 ms; activity increased by >50% in the last phase of the response). The reliability of the search-related response enhancement was also reflected by the responses of individual recording sites. The d′ values were positive at 46 of the 47 recording sites in the close configuration (p < 10⁻¹⁰; sign test) and at 41 of these sites in the far configuration (p < 10⁻⁷; sign test) (Fig. 3c, red bars). The search-related response modulation occurred later than the response modulation caused by tracing. In the close configuration, neurons with an RF on the target disc enhanced their response at a latency of 267 ms, which was significantly later than the tracing modulation (bootstrapping test; p < 0.005) (see Materials and Methods), and the delay was even longer in the far configuration, with search modulation occurring at a latency of 435 ms, >200 ms after the tracing modulation (bootstrapping test; p < 0.001). In addition to the latency estimates obtained at 33% of the maximum of response modulation (lat33), we also investigated the moment in time that the fitted function reached 25, 50, or 75% of its maximum (lat25, lat50, and lat75).
In the close configuration of the trace-then-search task, the differences in lat25, lat50, and lat75 between the trace and search modulations were 74 ms (bootstrapping test; p < 0.05), 112 ms (p < 0.001) and 154 ms (p < 0.001), respectively. In the far condition, the differences in lat25, lat50, and lat75 were 246 ms (p < 0.002), 240 ms (p < 0.001) and 238 ms (p < 0.001).
We next determined the latencies at the individual recording sites with significant modulation (at p < 0.05; Mann–Whitney U test; i.e., a subset of all recording sites in Fig. 3c) and found that tracing modulation occurred earlier than search modulation in the close (median, 198 vs 280 ms; p < 10⁻⁶; Mann–Whitney U test; Nmarker = 34; Ndisc = 34) as well as in the far configuration (median, 228 vs 429 ms; p < 10⁻⁵; Nmarker = 18; Ndisc = 15) (Fig. 3d).
We also observed the latency differences if we analyzed the data of the two monkeys separately. In monkey C, the tracing modulation occurred at a latency of 196 ms in the close configuration, which was significantly earlier than the search modulation with a latency of 275 ms (supplemental Fig. 6, available at www.jneurosci.org as supplemental material) (bootstrapping test; p < 0.001). In the far configuration, the tracing modulation had a delay of 231 ms, and this was significantly earlier than the search modulation with a latency of 472 ms (bootstrapping test; p < 0.001). In the close configuration, the trace modulation in monkey G had a latency of 164 ms, which was significantly earlier than the search modulation with a latency of 302 ms (supplemental Fig. 7, available at www.jneurosci.org as supplemental material) (p < 0.05). In the far configuration, the trace modulation occurred at a latency of 111 ms, which was earlier than the search modulation with a latency of 409 ms (p < 0.002).
Neuronal activity in the search-then-trace task
In the trace-then-search task, we found that the trace-related response modulation preceded the modulation caused by search. Does this temporal ordering reflect the sequencing of operators that are embedded in a visual routine? An alternative hypothesis is that tracing is simply a faster process than search so that the tracing modulation would always precede the search modulation. We therefore investigated the V1 responses in the search-then-trace task where the order of the elemental operations was reversed (Fig. 1b). Now the animals searched for a marker with the same color as the fixation point and traced a curve connected to this marker to locate the target for an eye movement. The variations in the color at the fixation point and the colors of the markers resulted in a total of four stimuli, and Figure 4a shows the receptive fields of two simultaneously recorded sites relative to these stimuli in one of the recording sessions. The receptive field of recording site 1 (Fig. 4a, RF1) was on the marker, whereas the receptive field of recording site 2 (RF2) was on the curve. It can be seen that the relevant marker that was the target of the search evoked stronger activity at recording site 1 than the irrelevant marker (U test; p < 10⁻⁶), and the latency of the response modulation was 264 ms (Fig. 4b). Neurons at recording site 2 responded more strongly to the target curve than to the distractor curve (U test; p < 10⁻⁶), and this response enhancement had a latency of 313 ms (Fig. 4c). Thus, the response enhancement due to the visual search at recording site 1 preceded the response enhancement caused by tracing at recording site 2.
Figure 5a shows the neuronal responses averaged across 41 recording sites with a receptive field on the marker (22 in monkey C, 19 in monkey G). It can be seen that the relevant marker that was the target of the search evoked a response that was 48% stronger than the irrelevant marker (paired t test; t = 7.1; p < 2 × 10⁻⁸; median MI, 0.46 in a window from 200–600 ms after stimulus onset), and this enhanced activity was reflected by the distribution of d′ values, which was shifted to positive values (Fig. 5c, red bars) (sign test; p < 10⁻⁹). Thus, the visual search increased the neuronal activity at the target's location, just as we had observed in the trace-then-search task. To evaluate the trace operation of the search-then-trace task, we analyzed the responses of 46 recording sites with a receptive field on the curve (22 in monkey C, 24 in monkey G). The responses evoked by the target curve were, on average, 52% stronger than the responses evoked by the distractor curve (paired t test; t = 6.6; p < 5 × 10⁻⁸; median MI, 0.41), and the distribution of d′ values was shifted to positive values (Fig. 5c, blue bars) (sign test; p < 10⁻⁸). This modulation of the neuronal responses evoked by the target curve provides a neuronal correlate of the spread of attention along this curve.
Our evaluation of the timing of neuronal activity in the search-then-trace task revealed that the first event in area V1 was the visual response with a latency of about 40 ms (Fig. 5a,b). This was followed by the search modulation that occurred at a latency of 228 ms, and this event was followed, in turn, by the appearance of the tracing modulation at a latency of 287 ms, 59 ms later than the search modulation (latency measured at 33% of the maximal modulation; bootstrapping test; p < 0.001). The alternative latency definitions gave similar results because the onset of search modulation measured at 25, 50, and 75% of the maximal modulation preceded tracing modulation by 58, 61, and 62 ms, respectively (p < 0.001 in each case).
The differences in latency were also evident when we compared individual recording sites with significant modulation (at p < 0.05; Mann–Whitney U test). The modulation at recording sites with a receptive field on the curve occurred 62 ms after the modulation of the responses of neurons with a receptive field on the marker (Fig. 5d) (median, 243 vs 305 ms; p < 0.002; Mann–Whitney U test; Nmarker = 31, Ncurve = 35). Thus, in this task, the order of the search and trace operations was inverted relative to the trace-then-search task, with the modulation caused by tracing occurring after the modulation caused by search.
We also investigated the latency of the modulation within monkeys. In monkey C, the search modulation occurred after 244 ms, significantly earlier than the trace modulation with a latency of 316 ms (supplemental Fig. 8, available at www.jneurosci.org as supplemental material) (bootstrapping test; p < 0.005). In monkey G, there was a trend in the same direction with search modulation occurring after 221 ms, earlier than the trace modulation with a latency of 245 ms, but this latency difference was not significant (bootstrapping test; p > 0.05).
Because of the design of the stimuli, the receptive fields on the curve in the search-then-trace task had a larger eccentricity, on average, than receptive fields falling on the marker. We performed a control experiment in monkey C with longer curves so that all the receptive fields fell on the curve (supplemental Fig. 4, available at www.jneurosci.org as supplemental material). In this experiment, the latency difference between recording sites with parafoveal and more eccentric receptive fields disappeared, which indicates that the delay between the search and trace modulation in the search-then-trace task reflects sequential processing and is not caused by differences in receptive field eccentricity.
Control for variations in eye position
We conducted a stratification analysis (Roelfsema et al., 1998) to exclude the possibility that small eye movements around the fixation point contributed to the modulation of neuronal responses. This procedure first removes trials with microsaccades, which are detected by convolving the eye-movement traces with a step function. We then computed the distribution of eye positions in the two stimulus conditions (target and distractor in the RF) in a window of 200–600 ms after stimulus appearance (supplemental Fig. 5, available at www.jneurosci.org as supplemental material). The stratification procedure corrects for differences in gaze position by removing surplus trials from the two conditions that are compared until the eye position distributions are identical (at a resolution of 0.2 × 0.2°). Stratification did not change the results across the population of recording sites. After stratification, the latency of the trace-related modulation in the close configuration of the trace-then-search task was 181 ms, significantly earlier than the search-related modulation with a latency of 271 ms (p < 0.005; bootstrapping test). In the far configuration, the trace modulation had a latency of 244 ms and occurred significantly earlier than the search modulation with a latency of 438 ms (p < 0.005). Similarly, in the search-then-trace task, the latency of the search modulation was 237 ms, which was significantly earlier than the trace modulation with a latency of 292 ms (p < 0.005). We conclude that our results are not attributable to systematic differences in eye position between conditions.
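The trial-matching step of the stratification procedure can be illustrated with the sketch below. It assumes that each trial is summarized by one mean gaze position in degrees, that trials with microsaccades have already been removed, and that surplus trials are discarded at random within each 0.2 × 0.2° bin. The function name, the random selection, and the data layout are assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratify(eye_a, eye_b, bin_deg=0.2, seed=0):
    """Subsample two sets of trials so that both contain the same number of
    trials in every bin_deg x bin_deg eye-position bin.

    eye_a, eye_b: lists of (x_deg, y_deg) gaze positions, one per trial.
    Returns two sorted lists with the indices of the retained trials.
    """
    rng = random.Random(seed)

    def bin_trials(positions):
        bins = defaultdict(list)
        for index, (x, y) in enumerate(positions):
            bins[(int(x // bin_deg), int(y // bin_deg))].append(index)
        return bins

    bins_a, bins_b = bin_trials(eye_a), bin_trials(eye_b)
    keep_a, keep_b = [], []
    # Only bins occupied in both conditions can contribute matched trials.
    for key in sorted(bins_a.keys() & bins_b.keys()):
        n = min(len(bins_a[key]), len(bins_b[key]))
        keep_a += rng.sample(bins_a[key], n)  # drop surplus trials at random
        keep_b += rng.sample(bins_b[key], n)
    return sorted(keep_a), sorted(keep_b)


# Hypothetical gaze positions (degrees) for target and distractor trials.
target_eye = [(0.05, 0.05), (0.06, 0.04), (0.07, 0.03), (0.25, 0.05)]
distractor_eye = [(0.08, 0.02), (0.27, 0.05), (0.29, 0.10), (0.50, 0.50)]
kept_target, kept_distractor = stratify(target_eye, distractor_eye)
print(len(kept_target), len(kept_distractor))
```

After this step the two conditions have identical binned eye-position distributions, so any remaining difference in neuronal response cannot be attributed to a difference in gaze position at that resolution.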
Comparison of reaction time to the timing of response modulation
In our neurophysiological experiments, the monkeys maintained fixation for 600 ms during stimulus presentation so that we could analyze the latency of attentional modulation, uncontaminated by the influence of eye movements. It is of interest to compare the timing of modulation to the reaction times that were recorded in the sessions where the animals could react as soon as the stimulus appeared (supplemental Table 1, available at www.jneurosci.org as supplemental material). The reaction time is expected to occur later than the response modulation because there are additional delays associated with motor preparation. Indeed, in most of the conditions, there was a substantial delay between the onset of the response modulation and the saccadic reaction time (supplemental Table 1, supplemental Figs. 6–8, available at www.jneurosci.org as supplemental material). One exception is monkey C in the far condition of the trace-then-search task, whose average reaction time of 502 ms was only slightly later than the V1 response modulation with a latency of 472 ms (supplemental Fig. 6d, available at www.jneurosci.org as supplemental material). We considered the possibility that this short delay was caused by comparing data from different recording days, because the timing of the V1 response modulation and the reaction times had been measured in different sessions. We therefore also recorded neuronal activity in area V1 during a reaction time version of the trace-then-search task in monkey C (supplemental Fig. 9, available at www.jneurosci.org as supplemental material). In the close condition, the latency of the response modulation at the colored marker was 237 ms, and the latency of the response modulation at the search target was 292 ms. The average reaction time was 497 ms in the close condition, >200 ms after the onset of the V1 modulation.
In the far condition, the trace modulation occurred after 222 ms, the search modulation after 415 ms, and the reaction time was 531 ms. Thus, the attentional modulation now occurred earlier than in the main experiment and well before the saccade.
Discussion
Here we recorded neuronal activity in area V1 during complex visual tasks that required visual search and curve tracing. By recording from different visual field representations at the same time, we were able to show that the timing of neuronal response modulation related to visual search and curve tracing depends on the order of these operations. The first operation took ∼200 ms, whereas the second operation caused an additional delay of 60 to 240 ms. These results provide new insights into the contribution of early visual areas to attentive processing and reveal, for the first time, the time course of endogenous attention shifts during sequential visual tasks. In addition, our results provide an unexpected insight into the visual search process, because we found that the timing of the search modulation depends on the relative positions of the location where a color is cued and the location where it can be found.
Activity in area V1 started with a transient response (Fig. 6, gray responses) that relies on the feedforward connections from the LGN (van Rullen et al., 1998; Lamme and Roelfsema, 2000; Kreiman et al., 2005). This early (preattentive) processing phase is followed by a later (attentive) phase, when operations like search and tracing modulate neuronal activity, presumably through horizontal and feedback connections. We considered the possibility that the initial transient response delays the onset of the response modulation. However, we showed previously that the attentional modulation of V1 responses can occur as early as 80 ms when a saccade brings an attended stimulus into the receptive field (Khayat et al., 2004), which indicates that the visual transients presumably did not influence the timing of response modulation in the present task.
Previous studies on the time course of endogenous attention shifts first cued monkeys to attend one item and then changed the cue so that attention had to be redirected. The redirection of attention took between 150 and 200 ms (Motter, 1994; Khayat et al., 2006; Busse et al., 2008; Herrington and Assad, 2009). The present results generalize these findings to attention shifts that are the direct consequence of the structure of an intrinsically sequential task. These attention shifts evolve on a similar time scale (here between 60 and 240 ms) as the attention shifts induced by a cue.
Origin of the V1 attentional response modulation
In accordance with previous studies, the responses of V1 neurons evoked by relevant image elements were stronger than responses evoked by irrelevant elements (Motter, 1993; Roelfsema et al., 1998; Vidyasagar, 1998; Li et al., 2006; Roberts et al., 2007; Chen et al., 2008). Because we recorded multiunit activity, the enhancement of the target representation could be caused by an increase in activity of neurons that are also activated by the distractor or by the recruitment of a new population of neurons that is silent when the distractor falls in their RF. Previous single-unit studies in area V1 found that many neurons that respond to a relevant item have a weaker response to an irrelevant one (Motter, 1993; Supèr and Roelfsema, 2005; Chen et al., 2008), although Vidyasagar (1998) reported that there is also a substantial fraction of V1 neurons that are only activated by attended stimuli.
A mechanistic model for the neuronal implementation of sequential tasks
Our results suggest mechanistic explanations for the implementation of the two composite tasks that are summarized in Figure 6. In the trace-then-search task, the first tracing step is associated with the enhancement of neuronal responses evoked by the relevant curve, at a latency of ∼180 ms (Fig. 6a,b, blue). Modeling studies (Sha'ashua and Ullman, 1988; Ullman, 1996; Grossberg and Raizada, 2000; Roelfsema, 2006) show that curve tracing can be implemented in the visual cortex as the propagation of an enhanced response along the representation of the target curve. The enhanced activity starts at the fixation point, and when it reaches the end of the curve, the color of the relevant marker can be registered in areas of the visual and frontal cortex as target color for the subsequent search (Chelazzi et al., 1998; Sheinberg and Logothetis, 2001; Everling et al., 2002). Models of visual search propose that a color selective signal is then fed from higher areas back to the lower areas (van der Velde and de Kamps, 2001; Hamker, 2005) to enhance the response of neurons with a receptive field on the target color. We observed a strong enhancement of V1 responses evoked by the target disc, in accordance with these models and also in line with neurophysiological studies in area V4 (Chelazzi et al., 2001; Bichot et al., 2005). The delay caused by the search was 90 ms if the target disc was close to the marker, and it increased to 240 ms if they were farther apart. This dependence of the search delay on the relative location of cue and target conforms to psychophysical findings that endogenous attention shifts are fastest over short distances (Hazlett and Woldorff, 2004), although the effects observed by us were larger. It is possible that the configuration of the stimulus contributed to this large difference in timing. 
The target disc was at a location that was at a direct continuation of the traced target curve in the close condition, whereas it was at a location that was a continuation of the ignored distractor curve in the far condition. The enhanced neuronal response at the location of the target disc (Fig. 6a,b, red) could be read out by cortical areas involved in the planning of an appropriate saccade, and it might thereby complete the trace-then-search task.
In the search-then-trace task, the modulation of neuronal responses indexes another order of the operations: now search initially enhances neuronal activity evoked by the target marker at a latency of 228 ms (Fig. 6c). The resulting focus of enhanced response at this marker (Fig. 6c, red circle) provides the starting location of the subsequent curve-tracing process, which is implemented as a spread of the response enhancement along the target curve 59 ms later, and this process eventually labels the circle at the end of this curve as the target for the saccade.
It is likely that the modulation of neuronal activity in the visual cortex during sequential tasks has a correlate in psychology, as selective attention is directed to the target of a search task (Kim and Cave, 1995), and it also spreads along the target curve of a curve-tracing task (Houtkamp et al., 2003). We tentatively propose that these shifts of attention are the psychological correlate of task-induced modulations of neuronal activity in lower and higher areas of the visual cortex, in accordance with Figure 6, although other perceptual processes like brightness perception and color filling-in may also depend on the propagation of neuronal activity (Paradiso and Nakayama, 1991; Sasaki and Watanabe, 2004; Huang and Paradiso, 2008). Because attention and neuronal response modulations are at different levels of description, attention shifts cannot cause a modulation of neuronal activity. Conversely, a mechanistic description of a visual routine that relies on the propagation of enhanced neuronal activity can predict where, when, and why the monkey directs his attention at the psychological level.
Visual routines and parameter transfer
The visual routine theory of Ullman (1984) proposed that complex tasks in vision can be solved by the sequential application of a limited number of operations. In this view, a visual routine is more than a succession of processing steps, however, because there is the need for successive operations to exchange information with each other (Roelfsema, 2005). In the search-then-trace task, for example, the end result of the search has to be passed on to the subsequent tracing operation that must start at the appropriate location. Our results suggest that interactions between lower and higher areas could cause the visual cortex to act as a cognitive blackboard (Mumford, 1992) where relevant features and locations are temporally indexed with an increased neuronal response, thus permitting an exchange of information (Roelfsema et al., 2000). Because this important point is perhaps nonintuitive, we have illustrated how a computer program would use variables to transfer information between subroutines of the search-then-trace task in Figure 6d. Variables provide subroutines with the information that is necessary for their initiation. They return their results in other variables (with the prefix “VAR”). The subroutine Color_Search, for example, receives the relevant color “Col” as input, finds the marker with this color, and stores its location in another variable, Loc1, which is then passed on to the Trace_Curve operation. We propose that the enhancement of neuronal firing rates in the visual cortex plays an equivalent role in the neuronal implementation of visual routines because it highlights the information that is exchanged between subroutines. For example, the search process enhances the firing rate of neurons that respond to the relevant marker (Fig. 6c, red circle, i.e., Loc1) and can thus specify the start of the subsequent trace operation. 
We found that the search modulation indeed persisted during the start of the trace operation, as is required for the information transfer. Such a role of early visual areas in the temporary storage of variables is in accordance with recent studies proposing that area V1 contributes to working memory (Harrison and Tong, 2009; Serences et al., 2009). The enhancement of neuronal activity in other visual areas could index other features, like colors, shapes, or motions, if they have to be transferred between successive processing steps.
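The transfer of information between subroutines described above can be made concrete with a short sketch of the search-then-trace task. It borrows the names Color_Search, Trace_Curve, Col, and Loc1 from the text, but the data structures (a mapping from marker locations to colors, and curves as lists of points) and the name `loc2` are hypothetical choices for illustration and are not taken from Figure 6d.

```python
def color_search(markers, col):
    """Return the location of the marker whose color matches `col`."""
    for location, color in markers.items():
        if color == col:
            return location
    raise ValueError(f"no marker with color {col!r}")


def trace_curve(curves, start):
    """Follow the curve that begins at `start` and return its end point."""
    for curve in curves:
        if curve[0] == start:
            return curve[-1]
    raise ValueError(f"no curve starts at {start!r}")


def search_then_trace(markers, curves, col):
    loc1 = color_search(markers, col)  # search stores its result in Loc1
    loc2 = trace_curve(curves, loc1)   # Loc1 specifies where tracing starts
    return loc2                        # end of the target curve: saccade target


# Hypothetical display: two colored markers, each at the start of one curve.
markers = {(2, 0): "yellow", (-2, 0): "blue"}
curves = [[(2, 0), (3, 2), (4, 4)], [(-2, 0), (-3, 2), (-4, 4)]]
print(search_then_trace(markers, curves, "blue"))  # (-4, 4)
```

In this sketch, the variable `loc1` plays the role that the text assigns to the enhanced firing rate at the relevant marker: it must remain available until the trace operation has started.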
A causal role for attentional modulation in area V1 in visual routines?
Our results are consistent with the possibility that the delayed V1 response modulation contributes to performance in sequential tasks. However, the V1 activity could also reflect processing in higher visual areas that feed back to area V1 when the subtasks have been solved. A recent study (Khayat et al., 2009) that compared neuronal activity in the frontal eye fields (FEFs) to area V1 in the curve-tracing task did not find significant differences in the latency of the attentional modulation between these areas. Thus, if the curve-tracing task is first solved in higher areas, then these areas presumably do not include the FEFs. Another possibility is that areas of the visual and frontal cortex jointly select display items that are relevant and that the attentional modulation develops simultaneously in these areas. Thompson et al. (1996) estimated that the delay between attentional modulation in FEFs and the onset of the saccade is ∼70 ms, and this value is compatible with the delays between the V1 attentional modulation and the monkeys' response times in the present study (supplemental Table 1, available at www.jneurosci.org as supplemental material).
The tasks of the present study required the visual selection of image elements that were near to distractors. In these situations, the enhanced activity may have to be propagated in areas with a high spatial resolution, like area V1, because RFs in higher visual areas could be too large for selection, in particular if the target and distractor have similar features. In this view, the distance between relevant and irrelevant image elements may influence the involvement of early visual areas, and differences in these distances between studies could explain why attentional modulation is sometimes weak in area V1 (Luck et al., 1997) and sometimes strong (Motter, 1993). In the curve-tracing task, the V1 response modulation is prominent because the activity of only a few recording sites suffices for the identification of the target curve (Poort and Roelfsema, 2009). Moreover, monkeys are likely to make an error on trials where V1 neurons that represent the distractor curve enhance their response over the neurons that represent the target curve (Roelfsema and Spekreijse, 2001; Pooresmaeili et al., 2010). Similarly, activity in area V1 has been shown to predict figure perception in a texture segregation task (Supèr et al., 2001).
The causal involvement of area V1 in visual routines could be directly tested with interference techniques. Such an experiment would require control over the delayed response modulation without affecting the feedforward propagation of information from V1 to higher visual areas. Lamme et al. (2002) showed that a poststimulus mask that interferes with the delayed V1 response modulation in a texture segregation task also impairs figure perception. Furthermore, transcranial magnetic stimulation over area V1 disrupts visual perception and visual search, even if this disruption occurs in a later processing phase when the visual stimulus has activated higher visual areas (Cowey and Walsh, 2001; Juan and Walsh, 2003). Future studies using this and other techniques could test if, when, and how early visual areas contribute to the execution of sequential visual tasks.
Footnotes
This work was supported by a grant from the European Union (EU IST Cognitive Systems Project 027198, “Decisions in Motion”), a grant from NWO-ALW (Nederlandse Organisatie voor Wetenschappelijk Onderzoek-Aard en Levenswetenschappen), Human Frontier Science Program Grant RGP0007/2007-C, and a grant from NWO-VICI to P.R.R. We thank Victor Lamme for his assistance in the surgeries, and Kor Brandsma and Jacques de Feiter for biotechnical assistance.
- Correspondence should be addressed to Pieter R. Roelfsema at the above address. p.roelfsema{at}nin.knaw.nl