Abstract
Natural scenes are characterized by individual objects as well as by global scene properties such as spatial layout. Functional neuroimaging research has shown that this distinction between object and scene processing is one of the main organizing principles of human high-level visual cortex. For example, object-selective regions, including the lateral occipital complex (LOC), were shown to represent object content (but not scene layout), while scene-selective regions, including the occipital place area (OPA), were shown to represent scene layout (but not object content). Causal evidence for a double dissociation between LOC and OPA in representing objects and scenes is currently limited, however. One TMS experiment, conducted in a relatively small sample (N = 13), reported an interaction between stimulation site (LOC vs OPA) and task (object vs scene recognition) (Dilks et al., 2013). Here, we present a high-powered preregistered replication of this study (N = 72, including male and female human participants), using group-average fMRI coordinates to target LOC and OPA. Results revealed unambiguous evidence for a double dissociation between LOC and OPA: relative to vertex stimulation, TMS over LOC selectively impaired the recognition of objects, while TMS over OPA selectively impaired the recognition of scenes. Furthermore, we found that these effects were stable over time and consistent across individual objects and scenes. These results show that LOC and OPA can be reliably and selectively targeted with TMS, even when defined based on group-average fMRI coordinates. More generally, they support the distinction between object and scene processing as an organizing principle of human high-level visual cortex.
SIGNIFICANCE STATEMENT Our daily-life environments are characterized both by individual objects and by global scene properties. The distinction between object and scene processing features prominently in visual cognitive neuroscience, with fMRI studies showing that this distinction is one of the main organizing principles of human high-level visual cortex. However, causal evidence for the selective involvement of object- and scene-selective regions in processing their preferred category is less conclusive. Here, testing a large sample (N = 72) using an established paradigm and a preregistered protocol, we found that TMS over object-selective cortex (lateral occipital complex) selectively impaired object recognition, while TMS over scene-selective cortex (occipital place area) selectively impaired scene recognition. These results provide strong causal evidence for the distinction between object and scene processing in human visual cortex.
- lateral occipital cortex
- object recognition
- occipital place area
- scene recognition
- transcranial magnetic stimulation
Introduction
Our daily-life environments, such as city streets, forests, and kitchens, are characterized both by local object content (cars, trees, refrigerators) and global spatial layout. These sources of information are to some extent independent, with many scenes being identifiable without requiring individual object recognition, and most objects being identifiable without a background scene. Accordingly, cognitive neuroscience research has revealed distinct pathways in visual cortex that are primarily involved in either scene or object recognition (Oliva, 2013; Epstein, 2014).
In humans, fMRI research has shown that viewing objects (vs scrambled controls) selectively activates regions in lateral occipital cortex (LOC) and posterior fusiform gyrus (Malach et al., 1995; Grill-Spector, 2003). By contrast, viewing scenes (vs objects) activates the following three scene-selective regions (Epstein and Baker, 2019): the parahippocampal place area (PPA), the retrosplenial complex (RSC), and the occipital place area (OPA). Studies using multivariate pattern analysis have shown that these scene- and object-selective regions represent distinct aspects of a scene, with scene-selective regions representing spatial layout (e.g., open vs closed) and object-selective regions representing object properties (Kravitz et al., 2011; Park et al., 2011; Harel et al., 2013).
While the neuroimaging evidence for a functional distinction between scene- and object-selective regions is convincing, the evidence that these regions are causally involved in selectively processing their preferred category is weaker. A number of studies have used transcranial magnetic stimulation (TMS) to interfere with activity in one of these regions and thereby impair object or scene recognition. These studies showed that TMS over LOC impaired object recognition performance (Pitcher et al., 2009; Koivisto et al., 2011; Mullin and Steeves, 2011). Because TMS cannot effectively reach deeper structures, studies investigating scene recognition have targeted OPA rather than PPA or RSC. These studies found that scene categorization is impaired when OPA is stimulated (Dilks et al., 2013; Ganaden et al., 2013). The causal evidence for a double dissociation between object-selective LOC and scene-selective OPA, however, is limited to a single experiment conducted in a relatively small sample (N = 13; Dilks et al., 2013). Providing such evidence is important for ruling out alternative explanations for impaired performance. For example, TMS-related artifacts (e.g., discomfort) differ depending on scalp location, compromising the direct comparison of task performance across regions. TMS-related artifacts (e.g., the clicking sound of a TMS pulse) may also affect tasks differently, for example depending on the attentional demands of each task.
Figure 1. A, Examples of stimuli for each category. B, TMS sites used in the present study. The OPA and LOC sites were based on fMRI coordinates from previous studies [i.e., LOC, Talairach coordinates 45, −74, 0 (Pitcher et al., 2009); OPA, Talairach coordinates 34, −77, 21 (Julian et al., 2016)]. C, Example trial of the 4AFC object/scene recognition task. Five TMS pulses at a rate of 10 Hz were delivered starting at stimulus onset.
In the experiment by Dilks et al. (2013), participants performed two four-alternative forced-choice (4AFC) categorization tasks, one involving scenes (beach, forest, city, kitchen) and one involving objects (camera, chair, car, shoes) (Fig. 1). In different blocks, TMS was applied over right LOC, right OPA, or vertex. Results showed a significant interaction between region (LOC, OPA, vertex) and task (scenes, objects). However, such an interaction does not necessarily indicate a double dissociation as it could, in principle, result from task-specific effects of TMS over one region alone. Furthermore, TMS over LOC has been shown to both impair object perception and facilitate scene perception (Mullin and Steeves, 2011), such that scene recognition might be worse when stimulating OPA compared with LOC even when TMS over OPA has no effect on scene recognition. Comparisons with vertex (control) stimulation are therefore important for interpreting the interaction. These comparisons provided some evidence that TMS over LOC impaired object recognition (p = 0.06) and TMS over OPA impaired scene recognition (p < 0.05).
While these results are encouraging, the effects of TMS are variable across individuals (Ridding and Ziemann, 2010), and effect sizes are typically small, such that large samples are needed to obtain generalizable results. Indeed, it is increasingly recognized that results obtained in small samples are often not reproducible or overestimate effect sizes (Button et al., 2013). Therefore, to provide strong causal evidence for a double dissociation between object- and scene-selective regions, we replicated the experiment by Dilks et al. (2013) in a large sample of participants (N = 72) using a preregistered protocol.
Materials and Methods
The preregistration is available at this website: https://aspredicted.org/jm6ey.pdf.
Participants
Seventy-two volunteers (mean age ± SEM, 23.30 ± 0.42 years; age range, 18–33 years; 43 females) participated in the study. The current study was performed to select participants for a subsequent study, for which three groups of 24 participants were needed. All volunteers had normal or corrected-to-normal vision and were right handed. Exclusion criteria were as follows: (1) history of epileptic seizures; (2) a nuclear family member afflicted with epilepsy; (3) history of psychiatric or neurologic diseases; (4) metallic objects in the head; (5) implanted devices (e.g., pacemaker, cochlear implant); and (6) having used psychoactive medication or recreational drugs <48 h before the experiment. The protocol was approved by the medical ethical committee of the Commissie Mensgebonden Onderzoek (Arnhem-Nijmegen) and conducted in accordance with the Declaration of Helsinki (Fortaleza amendments).
Behavioral task
Participants performed a 4AFC task identical to the task used by Dilks et al. (2013). In a blocked design, participants were presented either with an object (a camera, car, chair, or shoes) or with a scene (a beach, city, forest, or kitchen). All stimuli were grayscale images from the SUN Database (Xiao et al., 2010), further degraded by blending each image with an 8 × 8 grid of tiles. For scenes, these tiles were grayscale with random intensity. For objects, a scrambled version at 60% transparency was used as tiles and overlaid on the image. All images spanned a visual angle of 9° × 9° (Fig. 1A).
The experiment consisted of two parts. The first part determined the optimal stimulus presentation duration for each category using a thresholding procedure targeting an average accuracy of ∼63%. For this, a QUEST staircase procedure was used (number of trials, 32; β = 3.5; δ = 0.01; γ = 0.5; grain = 2) in two runs of 256 stimuli (i.e., 128 object and 128 scene stimuli, equaling 32 stimuli per category). Presentation durations were constrained to between 64 and 128 ms for each category. For each category, the lower (i.e., shorter) of the two runs' threshold durations was used in the second part of the experiment. Average presentation durations were as follows: camera, 91.6 ms; car, 74.9 ms; chair, 114.0 ms; shoes, 110.5 ms; beach, 80.8 ms; city, 93.3 ms; forest, 86.9 ms; and kitchen, 116.1 ms. Each stimulus was followed by a mask presented for 500 ms (Fig. 1C). The mask consisted of a 4 × 4 grid of tiles containing a scramble of eight randomly selected image parts. Scene masks were made from undegraded images, whereas object masks were made from degraded images.
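The thresholding step can be illustrated with a simplified, QUEST-like Bayesian staircase. This sketch is not the QUEST implementation used in the study: it assumes a Weibull psychometric function in the QUEST form with the β, δ, and γ values given above, treats grain = 2 as a 2 ms grid step, presents the posterior mean as the next duration, and simulates a hypothetical observer whose true threshold is 95 ms. As in the procedure described above, durations are clipped to 64–128 ms and the lower estimate of two runs is kept.

```python
import math
import random

# Parameters from the text; interpreting grain as a 2 ms grid step is an assumption.
BETA, DELTA, GAMMA = 3.5, 0.01, 0.5
GRAIN_MS = 2.0
MIN_MS, MAX_MS = 64.0, 128.0
N_TRIALS = 32


def p_correct(duration_ms, threshold_ms):
    """Weibull psychometric function in the QUEST form, with guess (GAMMA) and lapse (DELTA) rates."""
    return DELTA * GAMMA + (1 - DELTA) * (
        1 - (1 - GAMMA) * math.exp(-((duration_ms / threshold_ms) ** BETA))
    )


def posterior_mean(grid, log_post):
    """Mean of the (unnormalized) log posterior over candidate thresholds."""
    top = max(log_post)
    weights = [math.exp(lp - top) for lp in log_post]
    return sum(t * w for t, w in zip(grid, weights)) / sum(weights)


def run_staircase(respond):
    """One run of a simplified QUEST-like staircase; respond(duration) returns True if correct."""
    grid = [20.0 + GRAIN_MS * i for i in range(141)]  # hypothetical candidate thresholds, 20-300 ms
    log_post = [0.0] * len(grid)                       # flat prior over the grid
    for _ in range(N_TRIALS):
        # Present the current best guess, clipped to the allowed 64-128 ms range
        duration = min(MAX_MS, max(MIN_MS, posterior_mean(grid, log_post)))
        correct = respond(duration)
        # Bayesian update: add the log-likelihood of the observed response for each candidate
        for i, t in enumerate(grid):
            p = p_correct(duration, t)
            log_post[i] += math.log(p if correct else 1 - p)
    return min(MAX_MS, max(MIN_MS, posterior_mean(grid, log_post)))


rng = random.Random(0)


def observer(duration_ms):
    """Hypothetical simulated observer with a true threshold of 95 ms."""
    return rng.random() < p_correct(duration_ms, 95.0)


runs = [run_staircase(observer) for _ in range(2)]
threshold = min(runs)  # keep the lower (shorter) of the two run estimates
print(f"selected duration: {threshold:.1f} ms")
```

Using the posterior mean rather than the mode is one common design choice for QUEST-style procedures; the stored threshold is the Weibull parameter, and the accuracy it corresponds to depends on γ and δ.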
The second part of the experiment consisted of six runs of 64 stimuli (i.e., 32 object and 32 scene stimuli, equaling 8 stimuli per category). During this part, TMS was applied starting at stimulus onset. The stimulation site, that is, right LOC (rLOC), right OPA (rOPA), or vertex, changed with every run and followed a pseudorandom, counterbalanced palindromic design.
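A palindromic, counterbalanced run order can be sketched as follows. Each order mirrors a permutation of the three sites (e.g., A B C C B A), so every site is stimulated once in each half of the experiment. Cycling the 72 participants through the six possible orders is an assumption about how counterbalancing could be implemented; the text does not specify the exact assignment scheme.

```python
from itertools import permutations

SITES = ("rLOC", "rOPA", "vertex")


def palindromic_orders(sites=SITES):
    """All six-run palindromic orders (e.g., A B C C B A) built from permutations of the sites."""
    return [list(p) + list(reversed(p)) for p in permutations(sites)]


def assign_orders(n_participants, sites=SITES):
    """Hypothetical scheme: cycle participants through the orders so each is used equally often."""
    orders = palindromic_orders(sites)
    return [orders[i % len(orders)] for i in range(n_participants)]


schedule = assign_orders(72)
print(schedule[0])  # ['rLOC', 'rOPA', 'vertex', 'vertex', 'rOPA', 'rLOC']
```

With 72 participants, each of the six orders is used exactly 12 times, which is what makes the first-half versus second-half comparison reported in the Results balanced across sites.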
Transcranial magnetic stimulation and site localization
A MagVenture MagPro X100 magnetic stimulator was used to deliver TMS with a Cool B-65 figure-of-eight coil. Note that this is different from Dilks et al. (2013), who used a MagStim Super Rapid2 stimulator.
At the onset of each object or scene stimulus, a train of five pulses was applied at 10 Hz (i.e., 500 ms total duration of a single train) at 60% of maximum stimulator output (MSO). TMS was delivered over rLOC, rOPA, and vertex, with a posterior-to-anterior magnetic field orientation. Coil position during stimulation was monitored using a Localite neuronavigation system. The size and shape of each participant's head were modeled by marking the left and right tragus, the left and right canthi of the eyes, the nasal bridge, the inion, and the vertex, after which a standard MNI brain was fitted to these points. Targets were the group-average peak activation coordinates reported in previous fMRI studies: rLOC at Talairach coordinates 45, −74, 0 (Pitcher et al., 2009), and rOPA at Talairach coordinates 34, −77, 21 (Julian et al., 2016). Note that Dilks et al. (2013) used individual fMRI coordinates. The vertex was located on top of the head, at the midpoint between both tragi. For all stimulation sites, coil location was monitored using the Localite system and kept within 2 mm of the target position (Fig. 1B).
Before the start of the task, the phosphene threshold (PT) of each participant was determined using the "method of constant stimuli" (Mazzi et al., 2017). On average, the PT was 60.43 ± 1.45% of MSO, ranging from 38% to 100% of MSO. After PT determination, the five-pulse, 10 Hz stimulation used during the experiment was briefly applied to each experimental region so that participants could acclimatize to the sensation of TMS. No adverse events were reported.
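As a minimal sketch of threshold estimation with the method of constant stimuli, the phosphene threshold can be read off as the intensity at which the proportion of "seen" reports crosses a criterion. The 50% criterion, the use of linear interpolation instead of a fitted psychometric function, and the data below are all hypothetical; they are not taken from Mazzi et al. (2017) or from this study.

```python
def threshold_by_interpolation(intensities, p_seen, criterion=0.5):
    """Intensity (% MSO) at which the proportion of 'seen' reports first reaches the criterion,
    using linear interpolation between tested intensities; None if it is never reached."""
    pairs = sorted(zip(intensities, p_seen))
    if pairs and pairs[0][1] >= criterion:
        return pairs[0][0]  # criterion already reached at the lowest tested intensity
    for (x0, y0), (x1, y1) in zip(pairs, pairs[1:]):
        if y0 < criterion <= y1:
            return x0 + (criterion - y0) * (x1 - x0) / (y1 - y0)
    return None  # criterion not reached within the tested range


# Hypothetical data: proportion of trials on which a phosphene was reported at each intensity
intensities = [40, 50, 60, 70, 80]
p_seen = [0.0, 0.2, 0.4, 0.7, 0.9]
pt = threshold_by_interpolation(intensities, p_seen)
print(f"PT = {pt:.1f}% MSO")  # PT = 63.3% MSO
```

Returning None when the criterion is never crossed mirrors the practical case of participants who report no phosphenes even at high intensities.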
Statistical analysis
As in Dilks et al. (2013) and following our preregistration, the main analysis was a 3 × 2 GLM repeated-measures ANOVA with percentage correct as the dependent variable and TMS site (rLOC, rOPA, and vertex) and category (object and scene) as independent variables. Significant main and interaction effects were followed up with post hoc t tests, for which Bayes factors (BF10) are also reported. Values >1 indicate evidence for the alternative hypothesis (i.e., performance in the active TMS condition differs from vertex TMS), with larger values indicating stronger evidence. Values <1 indicate evidence for the null hypothesis (i.e., performance does not differ between active and vertex TMS), with values closer to 0 indicating stronger evidence. Data were further explored by comparing the LOC effect (percentage correct during vertex – LOC stimulation during object recognition) and the OPA effect (percentage correct during vertex – OPA stimulation during scene recognition) between the first and second halves of the experiment. A paired-samples t test was used to test for differences between halves, and one-sample t tests against 0 were used to test whether each effect was >0. Finally, repeated-measures ANOVAs were used to test for differences in the LOC and OPA effects between individual stimulus categories, and one-sample t tests were used to test whether the effect for each category was >0. All analyses were performed using SPSS 25.0 (ANOVA and post hoc tests) and JASP 0.12.2 (Bayes factors).
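The effect scores and t tests described above can be sketched in plain Python. The snippet computes a per-participant effect score (vertex minus active-site accuracy), a one-sample t test against 0, a paired t test, and Cohen's d_z; two-sided p values are obtained by numerically integrating the Student t density. The accuracy values are hypothetical, and this is not the SPSS or JASP implementation used in the study.

```python
import math
from statistics import mean, stdev


def t_two_sided_p(t, df, steps=2000):
    """Two-sided p value of Student's t, by Simpson integration of the t density from 0 to |t|."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    s = pdf(0.0) + pdf(a) + sum((4 if k % 2 else 2) * pdf(k * h) for k in range(1, steps))
    return max(0.0, 1.0 - 2.0 * s * h / 3.0)


def one_sample_t(values, mu=0.0):
    """One-sample t test against mu; returns (t, df, p, Cohen's d_z)."""
    n, m, sd = len(values), mean(values), stdev(values)
    t = (m - mu) / (sd / math.sqrt(n))
    return t, n - 1, t_two_sided_p(t, n - 1), (m - mu) / sd


def paired_t(x, y):
    """Paired t test on the differences x - y (e.g., vertex minus active-site accuracy)."""
    return one_sample_t([a - b for a, b in zip(x, y)])


# Hypothetical object-recognition accuracies (% correct) for five participants
vertex_obj = [70.0, 65.0, 72.0, 68.0, 60.0]
loc_obj = [66.0, 64.0, 66.0, 65.0, 59.0]
loc_effect = [v - l for v, l in zip(vertex_obj, loc_obj)]  # per-participant LOC effect
t, df, p, d = one_sample_t(loc_effect)
print(f"t({df}) = {t:.2f}, p = {p:.3f}, d_z = {d:.2f}")
```

A paired t test of vertex versus LOC accuracy and a one-sample t test of the LOC effect against 0 are the same test, which is why `paired_t` simply reuses `one_sample_t`.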
Results
Main results: replication of Dilks et al. (2013)
A 3 (TMS site: rOPA, rLOC, vertex) × 2 (category: objects, scenes) ANOVA revealed a significant interaction (F(2,142) = 7.09, p = 0.001), indicating that TMS site differentially affected performance (percentage correct) in the two tasks. To test whether object and scene recognition performance was impaired following stimulation, we followed up this interaction by separate one-way ANOVAs for each category.
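The follow-up one-way repeated-measures ANOVAs can be illustrated with a minimal sketch that partitions the total sum of squares into condition, subject, and error terms. With 72 participants and three TMS sites, the degrees of freedom are (2, 142), as reported above; with four stimulus categories they would be (3, 213). The function name and data are hypothetical, and no sphericity correction is applied.

```python
def rm_anova_oneway(data):
    """One-way repeated-measures ANOVA; data[s][c] is the score of subject s in condition c.
    Returns (F, df_condition, df_error)."""
    n, k = len(data), len(data[0])
    grand = sum(map(sum, data)) / (n * k)
    cond_means = [sum(row[c] for row in data) / n for c in range(k)]
    subj_means = [sum(row) / k for row in data]
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)  # between-subject variance is removed
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_cond - ss_subj                   # condition x subject residual
    df_cond, df_error = k - 1, (k - 1) * (n - 1)
    f_value = (ss_cond / df_cond) / (ss_error / df_error)
    return f_value, df_cond, df_error


# Hypothetical accuracies for three subjects under three TMS sites
example = [[1, 2, 3], [2, 3, 5], [3, 4, 4]]
print(rm_anova_oneway(example))
```

Removing the subject term is what distinguishes the repeated-measures test from a between-subjects ANOVA: stable individual differences in overall accuracy do not inflate the error term.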
For the object recognition task, a significant main effect of TMS site indicated that object recognition was affected by TMS (F(2,142) = 8.38, p < 0.001; Fig. 2A). Post hoc t tests showed that performance was reduced during LOC TMS compared with vertex TMS (t(71) = 4.05, p < 0.001, d = 0.48, BF10 = 269.65; Fig. 2B) and compared with OPA TMS (t(71) = 2.55, p = 0.013, d = 0.30, BF10 = 3.81). OPA TMS did not impair object recognition performance compared with vertex TMS (t(71) = 1.42, p = 0.159, d = 0.17, BF10 = 0.31).
Figure 2. A, Results of the present study (N = 72). The findings are similar to those of Dilks et al. (2013), showing decreased performance for scene recognition during TMS over OPA (compared with vertex) and decreased performance for object recognition during TMS over LOC (compared with vertex). B, Average and individual data of the LOC effect (percentage performance difference vertex – LOC during object recognition) and OPA effect (percentage performance difference vertex – OPA during scene recognition) for both the present study and Dilks et al. (2013). C, Percentages of participants susceptible to TMS over OPA, LOC, both, or neither. D, LOC (red) and OPA (blue) effect for the first half and second half of the experiment. E, LOC (red) and OPA (blue) effect per category. In all graphs: *p < 0.05, **p < 0.01, ***p < 0.001, with error bars reflecting the SEM.
For the scene recognition task, we similarly found a main effect of TMS site (F(2,142) = 11.97, p < 0.001; Fig. 2A). In this case, performance was reduced during OPA TMS compared with vertex TMS (t(71) = 4.91, p < 0.001, d = 0.58, BF10 = 5753.48; Fig. 2B) and compared with LOC TMS (t(71) = 3.08, p = 0.003, d = 0.36, BF10 = 5.39). LOC TMS did not impair scene recognition performance compared with vertex TMS (t(71) = 1.51, p = 0.135, d = 0.18, BF10 = 0.687).
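The effect sizes and Bayes factors reported for these post hoc tests can be approximately reproduced from the t values. For a paired t test, Cohen's d_z = t/√n (e.g., 4.05/√72 ≈ 0.48). The sketch below implements the Jeffreys–Zellner–Siow (JZS) Bayes factor for a one-sample or paired t test with a Cauchy prior of scale √2/2 on effect size, which we assume matches the JASP default used here; because the reported t values are rounded, outputs will only approximate the published BF10 values.

```python
import math


def cohens_dz(t, n):
    """Cohen's d_z for a paired (or one-sample) t test with n participants."""
    return t / math.sqrt(n)


def jzs_bf10(t, n, r=math.sqrt(2) / 2, lo=-10.0, hi=20.0, steps=6000):
    """JZS Bayes factor BF10 for a one-sample/paired t test with a Cauchy(0, r) prior on
    effect size, integrating over g on a log scale with the trapezoid rule."""
    nu = n - 1

    def integrand(u):
        g = math.exp(u)
        a = 1 + n * g * r * r
        likelihood = a ** -0.5 * (1 + t * t / (a * nu)) ** (-(nu + 1) / 2)
        prior = (2 * math.pi) ** -0.5 * g ** -1.5 * math.exp(-1 / (2 * g))  # inverse-gamma(1/2, 1/2)
        return likelihood * prior * g  # extra g is the Jacobian of g = exp(u)

    h = (hi - lo) / steps
    inner = sum(integrand(lo + i * h) for i in range(1, steps))
    marginal_h1 = h * (inner + 0.5 * (integrand(lo) + integrand(hi)))
    # Both marginals omit the same constant factor, which cancels in the ratio
    marginal_h0 = (1 + t * t / nu) ** (-(nu + 1) / 2)
    return marginal_h1 / marginal_h0


for label, t in [("LOC vs vertex, objects", 4.05), ("OPA vs vertex, objects", 1.42)]:
    print(f"{label}: d_z = {cohens_dz(t, 72):.2f}, BF10 = {jzs_bf10(t, 72):.2f}")
```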
These results replicate the results of Dilks et al. (2013), providing convincing causal evidence for a double dissociation between scene- and object-selective regions in the recognition of scenes and objects. The overall pattern of results of the two studies was nearly identical, with statistically stronger evidence in the current study because of the larger sample size (Fig. 2B).
Individual differences and consistency across time
Although TMS was effective overall in altering scene and object recognition performance, responses to TMS varied considerably across individuals. To quantify this variability, we calculated responder rates: 37.5% of participants showed an effect of both LOC TMS (object recognition performance, vertex – LOC > 0) and OPA TMS (scene recognition performance, vertex – OPA > 0); 18.1% showed an effect of LOC but not OPA TMS; 25.0% showed an effect of OPA but not LOC TMS; and the remaining 19.4% showed neither (Fig. 2C). Susceptibility to TMS over LOC and OPA was thus 55.6% and 62.5%, respectively. Although these rates are lower than those of Dilks et al. (2013; 69.2% and 84.6%, respectively), χ2 tests indicated no significant difference between these percentages (χ2 = 0.84, p = 0.358; and χ2 = 2.40, p = 0.122, respectively). The susceptibility rates found here agree with those of previous studies targeting visual areas; for example, van Koningsbruggen et al. (2013) reported a susceptibility rate of 54% for TMS over the extrastriate body area, located near the LOC.
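The χ2 comparison of susceptibility rates can be reproduced with a Pearson χ2 test on 2 × 2 tables (responders vs nonresponders, present study vs Dilks et al., 2013) without continuity correction. The counts are back-calculated from the reported percentages (27 both, 13 LOC only, 18 OPA only, and 14 neither out of 72 here; 9 and 11 responders out of 13 in Dilks et al., 2013) and are therefore an assumption; with these counts, the test returns the χ2 and p values reported above.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]; df = 1."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2))  # survival function of chi-square with 1 df
    return chi2, p


# Counts back-calculated from the reported percentages (assumption)
both, loc_only, opa_only, neither = 27, 13, 18, 14
n_here, n_dilks = 72, 13
loc_responders, opa_responders = both + loc_only, both + opa_only  # 40 and 45

loc = chi2_2x2(loc_responders, n_here - loc_responders, 9, n_dilks - 9)
opa = chi2_2x2(opa_responders, n_here - opa_responders, 11, n_dilks - 11)
print(f"LOC: chi2 = {loc[0]:.2f}, p = {loc[1]:.3f}")  # LOC: chi2 = 0.84, p = 0.358
print(f"OPA: chi2 = {opa[0]:.2f}, p = {opa[1]:.3f}")  # OPA: chi2 = 2.40, p = 0.122
```

The identity p = erfc(√(χ2/2)) holds only for one degree of freedom, which is sufficient for 2 × 2 tables.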
Individual differences in the excitability of the stimulated cortex, skull thickness, neuronal and axonal orientation, and metabolic factors may all affect susceptibility to TMS. One way to account for this variability is to consider the phosphene threshold. We found that phosphene threshold was not significantly correlated with either the LOC TMS effect (r = 0.04, p = 0.739) or the OPA TMS effect (r = 0.13, p = 0.269). This suggests that these physiological differences between participants, to the extent that they are captured by phosphene threshold, did not significantly modulate our effects.
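The correlation analysis can be sketched as a Pearson correlation whose significance is tested with t = r√(n − 2)/√(1 − r²) on n − 2 degrees of freedom. As a check, r = 0.04 with n = 72 gives p ≈ 0.74, in line with the value reported above. The helper functions are illustrative and not the software used in the study.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_to_t(r, n):
    """t statistic for testing r against 0, with df = n - 2."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def t_two_sided_p(t, df, steps=2000):
    """Two-sided p value of Student's t, by Simpson integration of the t density from 0 to |t|."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    a = abs(t)
    h = a / steps
    s = pdf(0.0) + pdf(a) + sum((4 if k % 2 else 2) * pdf(k * h) for k in range(1, steps))
    return max(0.0, 1.0 - 2.0 * s * h / 3.0)


t_pt = r_to_t(0.04, 72)  # PT vs LOC effect, reported r and N
print(f"t(70) = {t_pt:.2f}, p = {t_two_sided_p(t_pt, 70):.3f}")
```

In practice, `pearson_r` would be applied to each participant's phosphene threshold and LOC (or OPA) effect score.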
Since all TMS sites were stimulated twice in a palindromic design, it was possible to investigate the consistency of the effects over time by comparing the TMS-induced LOC effect (percentage correct object recognition, vertex – LOC) and OPA effect (percentage correct scene recognition, vertex – OPA) in the first half to the second half of the task. The LOC effect was significant for both the first half (4.60 ± 1.15%, t0(71) = 3.90, p < 0.001, d = 0.47) and second half (4.04 ± 1.40%, t0(71) = 2.88, p = 0.005, d = 0.34), with no significant difference between the two halves (t(71) = 0.36, p = 0.724, d = 0.04, BF10 = 0.138; Fig. 2D). Similarly, the OPA effect was significant in the first half (4.69 ± 1.12%, t0(71) = 4.18, p < 0.001, d = 0.49) and second half (3.13 ± 0.99%, t0(71) = 2.89, p = 0.005, d = 0.37) of the experiment, with no difference between halves (t(71) = 1.05, p = 0.300, d = 0.12, BF10 = 0.219; Fig. 2D). These results indicate that the TMS effects of LOC and OPA were robust over time.
Furthermore, to ensure that the results were not driven by specific features of one or a few objects or scenes, the LOC and OPA effects were compared between categories. The LOC effect did not differ significantly between individual objects (F(3,213) = 0.23, p = 0.859), nor did the OPA effect differ between individual scenes (F(3,213) = 0.85, p = 0.465). Moreover, all categories showed an effect in the expected direction (Fig. 2E), although the effect did not reach significance for every category tested separately (LOC effect: camera, p = 0.006; car, p = 0.084; chair, p = 0.048; shoes, p = 0.08; OPA effect: beach, p = 0.289; city, p = 0.064; forest, p = 0.008; kitchen, p = 0.006). This is likely because each category comprises only a quarter of the trials of its task, which increases variability and reduces statistical power.
Finally, to rule out speed–accuracy trade-offs, we analyzed reaction time (RT) data. A 3 × 2 repeated-measures ANOVA revealed no significant main effect of TMS site (F(2,142) = 0.839, p = 0.394) or category (F(1,71) = 3.28, p = 0.074), and no significant TMS site × category interaction (F(2,142) = 0.11, p = 0.893). Thus, RT did not differ between TMS conditions for either object or scene stimuli, ruling out speed–accuracy trade-offs.
Discussion
The distinction between object processing and scene processing features prominently in cognitive neuroscience research (Oliva, 2013; Epstein, 2014). Indeed, neuroimaging studies have demonstrated that this distinction is one of the main organizing principles of human high-level visual cortex, with object-selective cortex representing object content (but not scene layout) and scene-selective regions representing scene layout (but not object content). Here, by testing a large sample (N = 72) using an established paradigm (Dilks et al., 2013) and a preregistered protocol, we provide strong TMS evidence supporting this distinction, with TMS over object-selective cortex (LOC) selectively interfering with object recognition and TMS over scene-selective cortex (OPA) selectively interfering with scene recognition. The results could not be explained by a speed–accuracy trade-off, and were consistent across time and individual object and scene stimuli.
What may explain the distinct functional roles of LOC and OPA, as revealed here? A likely possibility is that LOC contributed to object recognition by representing object shape or object contour, while OPA contributed to scene recognition by representing more globally distributed image properties such as surface texture (Cant and Goodale, 2007). Another possibility is that LOC and OPA differentially contributed to the tasks by processing different low-level visual input, such as spatial frequencies (Rajimehr et al., 2011), rectilinear features (Nasr et al., 2014), or retinal locations (Levy et al., 2001). Further research is needed to disentangle these contributions by using carefully controlled stimuli. For example, one fMRI study used a task manipulation to show increased activation in LOC when participants attended to the shape (flat vs convex) of a surface compared with attending to its texture (rock vs wood; Cant and Goodale, 2011). The opposite result (texture > shape) was found in scene-selective cortex. It would be interesting to test whether LOC and OPA are causally involved in these tasks, for which visual input is equated.
While the current results provide strong evidence for distinct scene- and object-selective regions in visual cortex, this does not mean that scene- and object-selective pathways are functionally independent. In natural vision, scenes and objects are processed interactively, with scene context facilitating object recognition and object processing facilitating scene recognition. For example, when local object cues are degraded (e.g., a distant car in the mist), scene context can drive object recognition (Brandman and Peelen, 2017). Conversely, a clearly visible object can allow for recognizing an otherwise ambiguous scene (Brandman and Peelen, 2019). fMRI studies showed that degraded objects were more reliably represented in LOC when shown on a congruent scene background (Brandman and Peelen, 2017), and degraded scenes were more reliably represented in OPA when shown together with an intact object (Brandman and Peelen, 2019). In other words, object representations in LOC can be driven by scene cues, and scene representations in OPA can be driven by object cues. Future TMS studies could follow up on these results, testing whether LOC and OPA are causally involved in representing their preferred categories (objects or scenes) when these are merely inferred from nonpreferred cues (scenes or objects).
To provide a near-exact replication of Dilks et al. (2013), we used the same behavioral task, stimuli, procedure, and TMS timing. However, one important difference between the studies is that we used group-average Talairach coordinates rather than individually localized regions. This may have reduced spatial specificity in the present study, potentially decreasing the effect size (Sack et al., 2009). Nevertheless, the results of the two studies are remarkably similar (Fig. 2B). Beyond the replication of the main results, the effect sizes and variability are also quite comparable. The decrease in object recognition performance after LOC TMS was 4.7% (SD, 8.0%; Cohen's d = 0.59) in the study by Dilks et al. (2013), and 3.9% (SD, 8.3%; Cohen's d = 0.47) in the present study. The decrease in scene recognition performance after OPA TMS was 4.9% (SD, 5.7%; Cohen's d = 0.86) in the study by Dilks et al. (2013), and 3.8% (SD, 6.4%; Cohen's d = 0.59) in the present study. These results show that LOC and OPA can be targeted reliably based on group-average fMRI coordinates, providing an easy and cost-effective way to investigate the LOC and OPA when an fMRI scan is not feasible.
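To illustrate why sample size matters for effects of this magnitude, the sketch below computes Cohen's d as mean/SD from the values above, together with the approximate power of a two-sided paired t test. It uses a normal approximation, which slightly overestimates power and underestimates the required n in small samples. This is an illustration, not a power analysis performed for this study; the sample size here was set by the needs of a subsequent study.

```python
import math
from statistics import NormalDist


def cohens_d(mean_diff, sd_diff):
    """Cohen's d for a within-subject difference: mean / SD of the difference scores."""
    return mean_diff / sd_diff


def approx_power(d, n, alpha=0.05):
    """Approximate power of a two-sided paired t test (normal approximation; ignores the opposite tail)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(d * math.sqrt(n) - z_crit)


def required_n(d, power=0.8, alpha=0.05):
    """Approximate number of participants needed for the target power (normal approximation)."""
    z = NormalDist()
    return math.ceil(((z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) / d) ** 2)


d_loc = cohens_d(3.9, 8.3)  # LOC effect in the present study
print(round(d_loc, 2))                    # 0.47
print(f"{approx_power(d_loc, 13):.2f}")   # approximate power with N = 13
print(f"{approx_power(d_loc, 72):.2f}")   # approximate power with N = 72
print(required_n(d_loc))                  # 36
```

Under these assumptions, an effect of d ≈ 0.47 would be detected less than half of the time with 13 participants, but almost always with 72.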
Finally, TMS timing in the current study was relatively nonspecific, with a 500 ms window and a frequency of 10 Hz. Several studies have suggested that the LOC represents object category between 120 and 200 ms after stimulus onset (Koivisto et al., 2011; Carlson et al., 2013; Cichy et al., 2014; Isik et al., 2014; Kaiser et al., 2016). Similarly, the OPA represents scene layout around the same time window (Cichy et al., 2017; Henriksson et al., 2019). Based on these latencies, TMS timing could be optimized to disrupt object and scene recognition more powerfully, with a narrower window and a higher frequency; for example, TMS between 100 and 300 ms after stimulus onset at a frequency of 20 Hz. Additionally, single- or double-pulse TMS at specific latencies could be used to test the time points at which these regions are causally involved in the recognition of objects and scenes (Koivisto et al., 2011; Pitcher et al., 2012; Reeder et al., 2015).
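The difference between the current protocol and the suggested one can be made concrete by listing pulse onsets relative to stimulus onset. The sketch assumes five pulses in the suggested 20 Hz train (100–300 ms at 50 ms intervals) and uses 120–200 ms as the window in which LOC and OPA are thought to represent object category and scene layout; both choices are illustrative.

```python
def pulse_onsets(start_ms, freq_hz, n_pulses):
    """Onset times (ms after stimulus onset) of a regular repetitive TMS train."""
    interval_ms = 1000.0 / freq_hz
    return [start_ms + i * interval_ms for i in range(n_pulses)]


def pulses_in_window(onsets, lo_ms=120.0, hi_ms=200.0):
    """Pulses falling inside the assumed critical processing window."""
    return [t for t in onsets if lo_ms <= t <= hi_ms]


current = pulse_onsets(0, 10, 5)     # five pulses at 10 Hz from stimulus onset
suggested = pulse_onsets(100, 20, 5)  # hypothetical 20 Hz train from 100 to 300 ms
print(current, pulses_in_window(current))      # [0.0, 100.0, 200.0, 300.0, 400.0] [200.0]
print(suggested, pulses_in_window(suggested))  # [100.0, 150.0, 200.0, 250.0, 300.0] [150.0, 200.0]
```

Under these assumptions, only one pulse of the current train falls inside the 120–200 ms window, compared with two pulses of the suggested train, while the total number of pulses stays the same.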
In summary, the current study provides causal evidence for a double dissociation between object-selective LOC and scene-selective OPA, successfully replicating Dilks et al. (2013). By showing that these regions can be reliably targeted with TMS, even when using group-average fMRI coordinates, we can now address new questions regarding the functionality of these regions. One interesting avenue for future research is to move beyond isolated object and scene processing and test the causal time course of LOC and OPA during naturalistic situations, in which scene and object processing interact and mutually inform each other.
Footnotes
This project has received funding from the European Research Council under the European Union's Horizon 2020 research and innovation program (Grant Agreement 725970). We thank Daniel Dilks for providing the stimuli and task.
The authors declare no competing financial interests.
- Correspondence should be addressed to Miles Wischnewski at mileswischnewski{at}gmail.com