Humans are able to categorize complex natural scenes very rapidly and effortlessly, which has led to an assumption that such ultra-rapid categorization is driven by feedforward activation of ventral brain areas. However, recent accounts of visual perception stress the role of recurrent interactions that start rapidly after the activation of V1. To study whether or not recurrent processes play a causal role in categorization, we applied fMRI-guided transcranial magnetic stimulation on early visual cortex (V1/V2) and lateral occipital cortex (LO) while the participants categorized natural images as containing animals or not. The results showed that V1/V2 contributed to categorization speed and to subjective perception during a long activity period before and after the contribution of LO had started. This pattern of results suggests that recurrent interactions in visual cortex between areas along the ventral stream and striate cortex play a causal role in categorization and perception of natural scenes.
Electrophysiological measurements during categorization of natural scenes have shown that the human brain can discriminate images containing animals from distractors as early as 150 ms after image presentation (Thorpe et al., 1996). Categorization proceeds rapidly and effortlessly, even when the images occur in unpredictable positions in peripheral vision (Fize et al., 2005) or outside the focus of attention (Li et al., 2002). These findings have challenged the traditional view that high-level visual processing is slow and attention demanding and have led to the account that the ultra-rapid categorization must be driven exclusively by feedforward activation from V1 via higher ventral visual areas to prefrontal cortex (Thorpe and Fabre-Thorpe, 2001).
Recent accounts of visual perception stress the role of feedback from higher visual areas to V1 (Bullier, 2001; Hochstein and Ahissar, 2002; Lamme, 2004). Recurrent interactions within and between lower- and higher-order areas are initiated quickly, almost immediately after the activation has reached V1 and has been fed forward (Bullier, 2001; Lamme, 2004). Using intracerebral recordings from humans, Liu et al. (2009) showed that category-related information is present in occipitotemporal areas in the ventral visual pathway as early as 100 ms after stimulus onset. Boehler et al. (2008) applied high-resolution magnetoencephalography and found recurrent modulation of V1 activity as early as 100–120 ms after stimulus onset, 27 ms after the onset of the initial feedforward sweep of processing in V1 at 71 ms. The delay was only 11 ms relative to the onset of activity in extrastriate areas. Thus, the current estimations of the speed of processing in the human visual system support the idea that the category-related activity at 150 ms (Thorpe et al., 1996) does not simply represent the first responses in hierarchically high visual areas, but that it may already involve recurrent activity between higher and lower areas.
In humans, most of the causal evidence for recurrent processing comes from the dorsal stream; it has been shown that the contribution of early visual areas (V1/V2) is critical for motion perception after the activation of MT/V5 (Pascual-Leone and Walsh, 2001; Silvanto et al., 2005a,b; Koivisto et al., 2010). Causal evidence for the involvement of recurrent processing between early and later areas along the ventral stream is still lacking. We examined whether or not recurrent processing contributes to the categorization and subjective perception of natural images. We used fMRI-guided, navigated single-pulse transcranial magnetic stimulation (TMS) to causally interfere with the activity of early visual areas (V1/V2) and the lateral occipital area (LO), an intermediate area along the ventral stream that is activated during conscious object recognition (Grill-Spector et al., 2000) and is involved in analyzing visual shape (Grill-Spector, 2009). If recurrent interactions contribute to categorization or perception, V1/V2 activity should be critical for performance both before and after LO shows its earliest activation. By contrast, the feedforward account predicts that V1/V2 activity should not have any critical role after LO has been activated.
Materials and Methods
Thirteen healthy, right-handed participants with normal or corrected-to-normal vision were tested (six males; mean age, 23.2 years; range, 20–27 years). The experiment was undertaken with the understanding and written consent of each participant. The study was accepted by the ethical committees of the University of Turku and the Hospital District of Helsinki and Uusimaa, and it was conducted in accordance with the Declaration of Helsinki.
Stimuli and design.
The visual stimuli were color photographs of natural scenes (Fig. 1a) from various online sites (e.g., www.iaps-association.org; www.kuvaliiteri.fi). The images of animals and non-animals represented a mixture of general views and close-ups. They varied in terms of luminance, color, and spatial frequency within both categories; therefore, the categorization task could not be based on detection of low-level visual features. The images of animals (n = 216) contained one or more animals displayed in their natural environments (54 mammals, 54 birds, 54 reptiles/fishes, 54 small animals/insects). The non-animal images (n = 216) represented various subcategories (54 landscapes/plants, 54 food/fruits/vegetables, 54 buildings, 54 vehicles). The participants had never seen the photographs before. Thirty additional images were used only in practice trials.
The experiment used a 2 (stimulation area: V1/V2 vs LO) × 2 (type: animal vs non-animal) × 7 [stimulus onset asynchrony (SOA): 30, 60, 90, 120, 150, 180, 210 ms] (plus the no-TMS condition) within-subject design. Participants were tested in two sessions, separated by at least 3 days. The order of V1/V2 and LO stimulations was counterbalanced. For six participants, V1/V2 was stimulated in the first three stimulus blocks and LO in the remaining blocks in the first session; in the second session the order was reversed. For seven participants, the first session began with three LO stimulation blocks, followed by three V1/V2 stimulation blocks; the order was reversed in the second session.
In one session, six stimulus blocks (three during V1/V2 stimulation and three during LO stimulation) were completed. Each stimulus block included 72 stimuli. Within a block, 16 images were assigned to the baseline (no TMS) condition and eight images (one from each of the eight subcategories) to each TMS SOA (30, 60, 90, 120, 150, 180, 210 ms). The SOA at which the stimuli were presented was counterbalanced across the participants by rotating the images across the SOAs and no-TMS condition. Each image appeared equally often in the V1/V2 and LO stimulation conditions. In the first session, each image was unique. In the second session, the images were repeated, so that each image was seen twice by each participant. In both stimulation conditions, the total number of images per SOA for each participant was 48 (24 animal and 24 non-animal images).
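The block structure described above can be checked with a few lines of arithmetic. The following is only a sketch of the bookkeeping, not code used in the study; all constant and function names are hypothetical, and the numbers are taken directly from the design description.

```python
# Design constants as stated in the text; names are illustrative only.
SOAS_MS = [30, 60, 90, 120, 150, 180, 210]  # TMS stimulus onset asynchronies
SUBCATEGORIES = 8                  # 4 animal + 4 non-animal subcategories
BASELINE_PER_BLOCK = 16            # no-TMS images per block
BLOCKS_PER_AREA_PER_SESSION = 3    # three V1/V2 blocks and three LO blocks
AREAS = 2                          # V1/V2 and LO
SESSIONS = 2


def design_counts():
    """Return the trial counts implied by the block structure."""
    per_soa_per_block = SUBCATEGORIES  # one image from each subcategory
    stimuli_per_block = BASELINE_PER_BLOCK + per_soa_per_block * len(SOAS_MS)
    blocks_per_session = BLOCKS_PER_AREA_PER_SESSION * AREAS
    return {
        "stimuli_per_block": stimuli_per_block,
        "images_per_session": stimuli_per_block * blocks_per_session,
        "trials_per_soa_per_area": (
            per_soa_per_block * BLOCKS_PER_AREA_PER_SESSION * SESSIONS
        ),
    }


if __name__ == "__main__":
    for name, value in design_counts().items():
        print(name, value)
```

The per-session total equals the 216 animal plus 216 non-animal images, which is consistent with each image being unique within the first session and repeated once in the second.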
The stimulus images (3.1° × 2.5°) were presented on a 19-inch monitor with 800 × 600 resolution (75 Hz; 13.3 ms per frame) from a distance of 150 cm (Fig. 1b). Each trial began with a fixation point in the center of the screen for 700 ms, followed by the stimulus in the center for 13.3 ms. The participants were asked to decide as fast and accurately as possible whether the image represented animals or not. After each decision, they rated the quality of their subjective perception on a scale from 0 to 2 (0 = my perception was unclear and my response was a pure guess; 1 = my perception was unclear but I could see a part/pattern of animal/object; 2 = I had clear perception of the animal/object). The animal/non-animal response was given by pressing one of two buttons on the back of the response pad (Dual Action; Logitech) with the forefinger or middle finger of the right hand; the subjective rating was indicated by pressing one of the buttons on the top of the pad with the thumb of the right hand. It was stressed that both the speedy decision and the subjective rating were important and should be done carefully. After a practice block, it was verified that the participant had understood the instructions and could name all of the response buttons correctly.
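As a rough illustration of the display geometry, the frame duration follows from the refresh rate, and the physical image size follows from the visual angle and viewing distance. This is a minimal sketch under the standard visual-angle formula; the function names are hypothetical and the physical size is not reported in the text.

```python
import math


def frame_duration_ms(refresh_hz):
    """Duration of one video frame in milliseconds."""
    return 1000.0 / refresh_hz


def size_on_screen_cm(visual_angle_deg, distance_cm):
    """Physical extent that subtends a given visual angle at a given distance."""
    half_angle_rad = math.radians(visual_angle_deg) / 2.0
    return 2.0 * distance_cm * math.tan(half_angle_rad)


if __name__ == "__main__":
    print(round(frame_duration_ms(75), 1))       # one frame at 75 Hz
    print(round(size_on_screen_cm(3.1, 150), 1))  # image width in cm
    print(round(size_on_screen_cm(2.5, 150), 1))  # image height in cm
```

Under these assumptions a single frame at 75 Hz lasts about 13.3 ms, matching the stated presentation time, and the image would measure roughly 8.1 × 6.5 cm on the screen.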
The fMRI measurements for each participant were performed with a 3-T MRI scanner (Signa HDxt; General Electric) with a phased array eight-channel coil. The visual stimuli were presented with a three-micromirror data projector (Christie X3; Christie Digital Systems) using Presentation software (Neurobehavioral Systems). The major imaging parameters were repetition time (TR), 1.8 s; echo time (TE), 30 ms; flip angle (FA), 60°; field-of-view (FOV), 20 cm; matrix, 64 × 64; and slice thickness, 3 mm. Twenty-nine slices were acquired in interleaved order.
The V1 and V2 localization, including target region identification (the region corresponding to the visual field position in which the visual stimuli were presented), was based on 24-region multifocal fMRI (Vanni et al., 2005). Four runs, each 4 min long, comprised 32 miniblocks of 7.3 s duration; during each miniblock, a subset of the 24 regions was stimulated. We determined retinotopic LO representation at the same retinotopic positions as in V1/V2 with 50 achromatic photographs of objects (1.3° or 3.1° in diameter), which were contrasted with fixation alone. Four runs, each 4 min long, comprised blocks at nine different locations of the visual field. The visual motion-sensitive area V5 was used as a functional landmark and was localized with one run of low-contrast expanding and contracting rings (24°) versus rest.
Standard preprocessing with slice-time and motion correction was followed by estimation of a general linear model with the SPM8 toolbox for Matlab (MathWorks). Functional areas were determined from three-dimensional (3D) images using functional and anatomical landmarks (Fig. 1c,d). The V1/V2 TMS target area was selected on the basis of the retinotopic activations elicited by the foveally presented object pictures and the multifocal stimuli. Because V1 is anatomically surrounded by V2, selective TMS stimulation of V1 without stimulation of V2 is very difficult; we therefore selected the retinotopic area corresponding to the fovea in the right occipital pole that was closest to the skull and thus the easiest to stimulate with TMS. The LO in the right hemisphere was localized on the basis of the activation elicited by the object pictures that did not overlap with those elicited by the multifocal and motion stimuli, but was located posterior to V5, approximately halfway between V5 and the V1/V2 areas. The coordinates of the approximate center of the target areas were visually estimated, extracted with SPM, and used as the TMS stimulation target sites.
A Nexstim eXimia stimulator and a Nexstim biphasic 70 mm figure-of-eight coil were used for administering single TMS pulses. A chin rest was used to obtain a stable head position and earplugs were used to attenuate the TMS pulse-induced noise. The coil was fixed on a holder, the coil plane was positioned tangentially on the head, and the TMS pulses were directed at the target sites (V1/V2 or LO) by using the MRI-guided eXimia navigated brain stimulation (NBS) system (Nexstim), which continuously registers the relationship between the brain and TMS coil with a spatial resolution of 2 mm.
The TMS intensity was 70% of the maximum output of the stimulator, provided that it did not produce eye blinks, muscle twitches, or other uncomfortable sensations. To obtain a comfortable level of stimulation for all of the participants, the intensity was decreased to 65% for one participant in the V1/V2 condition, and to 60–67% for four participants in the LO condition. The TMS-induced electric field distribution in the V1/V2 and LO target areas was modeled with the eXimia NBS system, which estimates the E-field strength in the brain by using a spherical conductor model (Sarvas, 1987; Heller and van Hulsteyn, 1992). The estimated E-field strength did not differ between V1/V2 (128 V/m, SD = 25) and LO (122 V/m; SD = 24) target areas (t(12) = 0.83; not significant). NBS takes into account the shape of the copper spirals inside the TMS coil, the coil orientation and location, current direction, and the overall shape of the head and the brain. The spherical conductor model does not take into account the sulci/gyri pattern around the calcarine sulcus. The validity of the E-field strength estimation might be increased with the finite element model, which indicates higher focality of the field in gray matter than the spherical model (Thielscher et al., 2011), but for our purposes the accuracy of the spherical model is sufficient.
Analyses of response speed were based on median reaction times (RT) in trials where the image categorization was made correctly. The quality of subjective perception in the same trials was operationalized as average scores in the subjective rating scale (0–2). Accuracy was analyzed according to signal detection theory (Stanislaw and Todorov, 1999). We calculated d′ as a measure of accuracy because it is unaffected by response bias. A d′ value of 0 indicated an inability to distinguish signals from noise, whereas larger values indicated a greater ability to distinguish signals from noise.
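The two dependent measures described here can be sketched as follows. The standard definition is d′ = z(hit rate) − z(false alarm rate), with animal images treated as signal trials. The text does not say how hit or false alarm rates of exactly 0 or 1 were handled, so the correction used below (adding 0.5 to each cell) is an assumption, and all function names are hypothetical.

```python
from statistics import NormalDist, median


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false alarm rate).

    ASSUMPTION: 0.5 is added to every cell so that rates of exactly
    0 or 1 do not yield infinite z scores; the source does not state
    which correction, if any, was applied.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


def median_correct_rt(trials):
    """Median RT (ms) over correctly categorized trials only.

    `trials` is an iterable of (rt_ms, was_correct) pairs.
    """
    return median(rt for rt, was_correct in trials if was_correct)


if __name__ == "__main__":
    # Invented counts for one participant and one SOA (24 animal, 24 non-animal).
    print(round(d_prime(22, 2, 3, 21), 2))
    # Invented trials: the incorrect 900 ms trial is excluded from the median.
    print(median_correct_rt([(500, True), (900, False), (560, True), (540, True)]))
```

A participant who responds "animal" equally often to animal and non-animal images obtains d′ = 0 regardless of how often that response is given, which is why the measure is insensitive to response bias.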
Preliminary analyses showed that RTs to animals (551 ms, SD = 33) were faster than to non-animals (593 ms, SD = 29) (F(1,12) = 7.38, p = 0.019). Because stimulus type (animal vs non-animal) did not show any significant interactions in the analyses of RTs, accuracy, and subjective perception, further analyses were performed with the data from pooled stimulus types.
The RTs, accuracy, and subjective ratings were analyzed with 2 (TMS stimulation area: V1/V2 vs LO) × 7 (SOA) repeated-measures ANOVAs (with Huynh–Feldt corrected p values), which were supplemented by trend analyses to reveal the effects of SOA more exactly. Significant effects of SOA were followed by Fisher's procedure and by comparing the scores to those in the no-TMS baseline condition with two-tailed paired-samples t tests.
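The comparison of each SOA against the no-TMS baseline can be illustrated with a paired-samples t statistic computed over participants. This is a minimal sketch of the textbook formula, not the authors' analysis code; the function name is hypothetical, and the two-tailed p value would be read from a t distribution with the returned degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(condition, baseline):
    """Paired-samples t statistic and degrees of freedom.

    Both arguments hold one score per participant, in the same order.
    """
    if len(condition) != len(baseline):
        raise ValueError("both conditions need one score per participant")
    diffs = [c - b for c, b in zip(condition, baseline)]
    n = len(diffs)
    # t = mean difference divided by its standard error
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


if __name__ == "__main__":
    # Invented median RTs (ms) for three participants, for illustration only.
    tms_rts = [580.0, 602.0, 575.0]
    baseline_rts = [560.0, 590.0, 548.0]
    print(paired_t(tms_rts, baseline_rts))
```

With the 13 participants of the present study, each such test has 12 degrees of freedom.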
The asynchrony between the onset of the image and that of the TMS pulse (SOA) influenced categorization speed (F(6,72) = 3.39, p < 0.05) (Fig. 2a), and this influence depended on the stimulated area (area × SOA, quadratic interaction; F(1,12) = 4.99, p < 0.05). TMS on V1/V2 decreased response speed (F(6,72) = 2.80, p = 0.017) at the SOAs of 90, 120, 150, 180, and 210 ms (p < 0.05), while stimulation of LO decreased response speed (F(6,72) = 2.92, p < 0.05) at the SOAs of 150 and 180 ms (p < 0.05). These findings indicate clearly that the contribution of V1/V2 was still critical after LO had shown its first response.
TMS did not have any significant effect on the accuracy of categorization (Fig. 2b). The high overall level of accuracy (91.7%) could be expected, as the images were not degraded in any way to make them harder to recognize, they were relatively large, and they were presented to the fovea, which is cortically represented in both hemispheres and enjoys a high degree of cortical magnification. However, the quality of subjective perception was significantly impaired by TMS (F(6,72) = 5.23, p < 0.001). A quadratic trend (p < 0.001) shows that subjective perception was first impaired and then returned to the normal level as SOA increased (Fig. 2c). Moreover, the effects of TMS differed between the areas (area × SOA, quadratic trend, F(1,12) = 12.80, p < 0.005). The contribution of V1/V2 (F(1,12) = 22.85, p < 0.001, quadratic trend) was critical at the SOAs of 90, 120, 150, and 180 ms (p < 0.05), whereas the contribution of LO (F(1,12) = 6.72, p < 0.025, quadratic trend) had a later onset, being significant at 150 ms (p < 0.05). These findings indicate that the quality of subjective perception depended on the contribution of V1/V2 before and after the onset of the LO activity.
We tested the correlations between the modeled TMS-induced E-field in the targeted V1/V2 and LO areas and the magnitude of TMS-induced suppression at the SOAs where TMS impaired perception (Fig. 2d). The stronger the field in the V1/V2 target area, the stronger the suppression of perception at the SOA of 120 ms (r = 0.63, p = 0.020, Spearman's nonparametric test). Because of the smaller suppressive effect in LO and the resultant small variation in the size of the effect, the corresponding correlation (r = 0.28) between the field strength in the LO target area and suppression at the SOA of 150 ms was not statistically significant. As the power of the TMS output in V1/V2 stimulation was constant (70% for 12 of the 13 participants), the correlation suggests that TMS impaired perception by disrupting cortical processing in visual areas rather than by inducing nonspecific effects. The specificity of the influences of TMS is also confirmed by the different time windows of the V1/V2 and LO effects.
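Spearman's coefficient, as used for the field-strength correlation, is the Pearson correlation computed on ranks. The sketch below shows that computation in plain Python, with tied values receiving their average rank; the function names are hypothetical and the example values are invented, not the study's data.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Invented E-field strengths (V/m) and suppression scores, for illustration.
    field = [98.0, 110.0, 125.0, 131.0, 150.0]
    suppression = [0.05, 0.20, 0.10, 0.30, 0.35]
    print(round(spearman_rho(field, suppression), 2))
```

Because only ranks enter the computation, the coefficient is insensitive to the exact scale of the modeled field strength and to outlying values, which suits a sample of 13 participants.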
In addition, because some of the participants reported that the stimulation of LO was less comfortable than the V1/V2 stimulation, we further ruled out the possibility of nonspecific effects resulting from LO stimulation by showing that it does not influence performance in a color perception task. LO is not known to be involved in color perception; therefore, if color perception were suppressed in the same manner as scene categorization, the results of the main experiment would probably result from nonspecific effects of TMS. Seven of the participants (plus one of the authors) categorized and rated their subjective perception (scale, 0–3) of isoluminant blue and green rectangles (3.1° × 2.5°; 21 cd/m²). The overall performance level was adjusted individually in pre-experimental trials by varying the similarity (the amount of blue) between the colors so that the difficulty level (∼90% correct) would correspond to that in scene categorization. The stimulus presentation and stimulation of LO were otherwise identical to those of the categorization experiment. The results showed that stimulus-TMS SOA did not have any effect on response speed (F(6,42) = 1.34) or on the ratings of subjective perception (F < 1). Although SOA did not show any main effect or trends, we used t tests to examine further the critical SOAs where TMS had produced impairments in the main experiment. The response times at the SOAs of 150 ms (554 ms, SD = 64) and 180 ms (549 ms, SD = 54) did not differ from those at the shortest SOA, 30 ms (554 ms, SD = 80) or no-TMS baseline (522 ms, SD = 57) (p > 0.05). Neither did TMS impair the ratings of subjective perception at the SOA of 150 ms (2.2, SD = 0.4) compared with those at the shortest SOA (2.1, SD = 0.5) or no-TMS baseline (2.2, SD = 0.4) (p > 0.05).
Thus, TMS did not influence performance at the 150 and 180 ms SOAs during which categorization and perception of natural scenes were impaired, making it unlikely that nonspecific effects of TMS explain our results.
Previous TMS studies (Pascual-Leone and Walsh, 2001; Silvanto et al., 2005a,b; Koivisto et al., 2010) have shown that the activity of early visual cortex is necessary for subjective perception of motion after the activity of MT/V5, suggesting a role for recurrent processing along the dorsal stream in motion perception. The present study suggests that recurrent interactions also play a causal role along the ventral stream (the object recognition pathway) contributing to categorization and perception of natural images.
The application of TMS showed a long, critical V1/V2 activity period (90–210 ms) in categorization of natural images. This period began before and continued after LO showed its earliest contribution (150 ms). Therefore, the results do not support the linear feedforward account of categorization that predicts that the contribution of V1/V2 is over after the information has been fed forward along the ventral stream. As predicted by the recurrent processing hypothesis, V1/V2 continued to be critical for categorization after the higher area in hierarchy along the ventral stream (LO) had shown its earliest response. In addition, the effects of TMS on the quality of subjective perception converge with the results on categorization speed. V1/V2 contributed to subjective perception 90–180 ms after the image, whereas the critical activity of LO occurred at 150 ms, also showing that the activity of V1/V2 continued to be critical after activation of the higher area in visual hierarchy. In our more recent TMS study (N. Salminen-Vaparanta, M. Koivisto, S. Vanni, L. Henriksson, G. Quaß, V. Noreika, and A. Revonsuo, unpublished observations), LO contributed to recognition of line drawings at 120 and 140 ms. The activation of LO by 150 ms or before fits well with the results of source localization (Fize et al., 2005), which suggest that the category-related activity at 150–170 ms during perception of natural images results from extrastriate sources.
A recent study (Camprodon et al., 2010) provided support for recurrent processing by showing that TMS to the occipital pole impaired accuracy in recognition of natural images at two discrete time windows, 100 and 220 ms after the image. The task called for discrimination at the basic level of category hierarchy between images of birds and mammals, which requires more detailed processing of the image and a longer categorization time than the discrimination at superordinate level between animals and non-animal images (Macé et al., 2009). The average response time was 769 ms in the study by Camprodon et al. (2010), which is ∼200 ms longer than in the present study. Thus, our study generalizes the role of V1/V2 in recurrent processing to a less demanding task, resembling more closely the conditions under which ultra-rapid categorization has been observed previously (Thorpe et al., 1996).
The application of TMS did not reveal discrete V1/V2 time windows that would correspond to feedforward and recurrent processing, respectively. In fact, the single long V1/V2 activity period that we found is more consistent than two or more discrete periods (Camprodon et al., 2010) with models (Lamme, 2004) that assume that iterative recurrent loops between V1 and higher areas are progressively engaged as activation proceeds forward in the visual hierarchy. As additional areas are involved in the feedback loops at each successive level of processing, there should not be any separate inactive period in the early visual areas between feedforward and feedback activity.
Finally, one must note that our results do not directly show that the early visual areas received feedback from LO or other extrastriate areas. They indicate that perception and categorization depend on V1/V2 activity at latencies at which activation has already been fed forward and a clear population response has emerged in LO. As the feedback projections from extrastriate cortex back to V1 take as little as 10 ms (Hupé et al., 2001), the involvement of recurrent loops between areas along the ventral stream and V1 during the end part of the V1/V2 activity period is likely. These recurrent loops may enhance discriminability or efficiency of signal processing, tune down surrounding activation gain, and eventually give rise to cognitive processes, such as segregation between figure and ground.
This work was supported by the Academy of Finland Grants 124623, 124698, and 125175. N.S.V. was supported by the Graduate School of Psychology in Finland.
Correspondence should be addressed to Mika Koivisto, Centre for Cognitive Neuroscience, University of Turku, Turku 20014, Finland.