Abstract
Functional magnetic resonance imaging has revealed a set of regions selectively engaged in visual scene processing: the parahippocampal place area (PPA), the retrosplenial complex (RSC), and a region around the transverse occipital sulcus (previously known as “TOS”), here renamed the “occipital place area” (OPA). Are these regions not only preferentially activated by scenes, but also causally involved in scene perception? Although past neuropsychological data imply a causal role in scene processing for PPA and RSC, no such evidence exists for OPA. Thus, to test the causal role of OPA in human adults, we delivered transcranial magnetic stimulation (TMS) to the right OPA (rOPA) or the nearby face-selective right occipital face area (rOFA) while participants performed fine-grained perceptual discrimination tasks on scenes or faces. TMS over rOPA impaired discrimination of scenes but not faces, while TMS over rOFA impaired discrimination of faces but not scenes. In a second experiment, we delivered TMS to rOPA or to the object-selective right lateral occipital complex (rLOC) while participants performed categorization tasks involving scenes and objects. TMS over rOPA impaired categorization accuracy of scenes but not objects, while TMS over rLOC impaired categorization accuracy of objects but not scenes. These findings provide the first evidence that OPA is causally involved in scene processing, and further show that this causal role is selective for scene perception. Our findings illuminate the functional architecture of the scene perception system, and also argue against the “distributed coding” view in which each category-selective region participates in the representation of all objects.
Introduction
Our ability to perceive the visual environment is remarkable: we can recognize a scene within a fraction of a second (Potter, 1976; Biederman, 1981; Thorpe et al., 1996), and use that information to seamlessly navigate. Given the ecological importance of scene perception and navigation, it is perhaps not surprising then that particular regions of the human brain are specialized for processing visual information about scenes, including the parahippocampal place area (PPA) (Epstein and Kanwisher, 1998), the retrosplenial complex (RSC) (Maguire, 2001), and a region near the transverse occipital sulcus, formerly referred to as “TOS” (Grill-Spector, 2003), but for reasons outlined in the Discussion, henceforth called the “occipital place area” (OPA). Here we investigate the least-studied of these regions, OPA, and ask whether it is not only selectively responsive to scenes, but also whether it plays a causal role in scene processing.
Evidence that the PPA is not only activated when people perceive scenes, but that it is further necessary for this function, comes from patients with damage in or near the PPA, who have deficits in simple visual identification of scenes or landmarks (Aguirre and D'Esposito, 1999; Mendez and Cherrier, 2003), and difficulty more generally in knowing where they are (Habib and Sirigu, 1987; Epstein et al., 2001). Similarly, although patients with RSC damage can recognize salient landmarks, they fail to use these landmarks to orient themselves or to navigate through a larger environment (Takahashi et al., 1997), implying a causal role of RSC in navigation. Together, these results suggest that both PPA and RSC are key cortical regions underlying our ability to recognize scenes and use this information in navigation. By contrast, little is known about OPA apart from its selective responses to visually presented scenes (Nakamura et al., 2000; Grill-Spector, 2003; Hasson et al., 2003; MacEvoy and Epstein, 2007; Dilks et al., 2011). Thus, we asked here whether OPA also plays a causal role in scene perception. To answer this question, we applied transcranial magnetic stimulation (TMS) to the right OPA (rOPA) in two experiments.
Multiple different kinds of information can be extracted from visually presented scenes, including the layout of surrounding space, the category of the scene (e.g., beach or city), and the identity of particular places. Here we tested the role of rOPA in the first two abilities. In Experiment 1, we delivered TMS to rOPA or the face-selective right occipital face area (rOFA) while participants were required to discriminate fine-grained differences in scene layout or face shape. To precisely measure discrimination thresholds for scene layout and face shape, we used a staircase procedure to determine how different the stimuli had to be for participants to correctly discriminate the scenes or faces. In Experiment 2, we delivered TMS to rOPA or the object-selective right lateral occipital complex (rLOC) while participants performed a categorization task on scenes or objects. If OPA is causally and selectively involved in scene processing, then TMS over OPA should disrupt perception of scenes but not faces or objects.
Materials and Methods
Participants
Eight participants took part in Experiment 1 (5 females, mean age 23 years), and 14 participants took part in Experiment 2 (9 females, mean age 24 years). One participant from Experiment 2 withdrew because of discomfort from the TMS and was therefore not included in the analysis, leaving 13 participants. All participants had good visual acuity, and were free of ophthalmic, neurologic, and general health problems. Participants provided informed consent in accordance with the Committee on the Use of Humans as Experimental Subjects at the Massachusetts Institute of Technology.
Functional magnetic resonance imaging scanning
Before TMS, each participant completed a localizer scan to identify the category-selective regions of interest (ROIs) (i.e., rOPA, rOFA, and rLOC) for TMS.
Data acquisition.
Scanning was performed in a 3.0 T Siemens Trio scanner at the A. A. Martinos Imaging Center at the McGovern Institute for Brain Research at the Massachusetts Institute of Technology. Functional images were acquired with a Siemens 32-channel phased array head coil and a gradient-echo EPI sequence [32 slices, repetition time (TR) = 2 s, echo time = 30 ms, voxel size = 3 × 3 × 3 mm, and 0.6 mm interslice gap]. Slices were oriented approximately parallel to the calcarine sulcus and provided whole-brain coverage. In addition, a high-resolution T1-weighted MPRAGE anatomical scan was acquired for anatomically localizing the functional activations.
Design.
A blocked functional magnetic resonance imaging (fMRI) design was used in which participants viewed 3 s movie clips of faces, bodies, scenes, objects, and scrambled objects (Pitcher et al., 2011). Each participant completed four runs. Each run was 234 s long and consisted of two blocks per stimulus category. The order of the stimulus category blocks in each run was palindromic (e.g., faces, objects, scenes, bodies, scrambled objects, scrambled objects, bodies, scenes, objects, faces) and was randomized across runs. Each block contained 6 movie clips from the same category for a total of 18 s per block. We also included 18 s fixation blocks at the beginning, middle, and end of each run, during which time the screen alternated between different full-screen colors once every 3 s (∼0.33 Hz).
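The run length follows from the block structure: 10 stimulus blocks (2 per category) plus 3 fixation blocks, each 18 s, give 13 × 18 = 234 s. A minimal Python sketch of one run's block order is below. It assumes the middle fixation block sits at the center of the palindrome, and the shuffling scheme (a seeded `random.Random`) and the function name `make_run` are illustrative, not the authors' code.

```python
import random

# Stimulus categories from the localizer; the order is shuffled separately for each run.
CATEGORIES = ["faces", "bodies", "scenes", "objects", "scrambled objects"]
BLOCK_S = 18     # 6 clips x 3 s per stimulus block
FIXATION_S = 18  # fixation blocks at the beginning, middle, and end of a run


def make_run(rng: random.Random) -> list:
    """Return a list of (block label, duration in s) for one palindromic localizer run."""
    first_half = CATEGORIES[:]
    rng.shuffle(first_half)                   # randomize order for this run
    second_half = first_half[::-1]            # mirror the first half -> palindrome
    blocks = [("fixation", FIXATION_S)]
    blocks += [(c, BLOCK_S) for c in first_half]
    blocks.append(("fixation", FIXATION_S))   # middle fixation block
    blocks += [(c, BLOCK_S) for c in second_half]
    blocks.append(("fixation", FIXATION_S))
    return blocks


run = make_run(random.Random(0))
print([label for label, _ in run])
print(sum(d for _, d in run))  # 3 x 18 s fixation + 10 x 18 s stimulus blocks = 234 s
```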
Data analysis.
Data were analyzed with FS-FAST and Freesurfer (http://surfer.nmr.mgh.harvard.edu/) (Dale et al., 1999; Fischl et al., 1999). Before statistical analysis, images were motion corrected (Cox and Jesmanowicz, 1999), smoothed (3 mm FWHM Gaussian kernel), and linearly detrended. ROIs were determined using standard general linear model analyses with predictors for each condition convolved with a gamma function (delta = 2.25 and tau = 1.25). rOPA was identified using a scenes > objects contrast. rOFA and rLOC were identified using faces > objects and objects > scrambled contrasts, respectively.
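Building one condition's predictor can be sketched as a boxcar convolved with a gamma-function hemodynamic response. The exact HRF formula is an assumption here: the sketch uses h(t) = ((t − δ)/τ)² exp(−(t − δ)/τ) for t > δ (zero before the delay), with δ = 2.25 s and τ = 1.25 s from the text and TR = 2 s; FS-FAST's internal normalization and any nuisance regressors are not reproduced. The names `gamma_hrf` and `block_regressor` are hypothetical.

```python
import numpy as np


def gamma_hrf(t, delta=2.25, tau=1.25, alpha=2):
    """Gamma-function HRF: ((t - delta)/tau)**alpha * exp(-(t - delta)/tau), zero for t <= delta."""
    s = np.clip((np.asarray(t, dtype=float) - delta) / tau, 0.0, None)  # 0 before the delay
    return s ** alpha * np.exp(-s)


def block_regressor(onsets, duration, n_scans, tr=2.0, dt=0.1):
    """Boxcar for one condition convolved with the gamma HRF, sampled once per TR."""
    n_hi = int(round(n_scans * tr / dt))       # high-resolution time grid
    t_hi = np.arange(n_hi) * dt
    box = np.zeros(n_hi)
    for onset in onsets:
        box[(t_hi >= onset) & (t_hi < onset + duration)] = 1.0
    kernel = gamma_hrf(np.arange(0.0, 30.0, dt))
    kernel = kernel / kernel.sum()             # unit area: a long block plateaus near 1
    conv = np.convolve(box, kernel)[:n_hi]
    step = int(round(tr / dt))
    return conv[::step][:n_scans]              # one value per acquired volume


# Hypothetical example: two 18 s scene blocks at 36 s and 180 s in a 117-volume (234 s) run.
scene_reg = block_regressor([36.0, 180.0], 18.0, n_scans=117)
```

A scenes > objects contrast would then compare the fitted weights of the scene and object regressors in each voxel; that GLM fit is not shown.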
TMS
TMS site localization.
TMS sites were located with the Brainsight TMS-MRI coregistration system (Rogue Research). Specifically, ROIs were localized by overlaying individual activation maps from the fMRI localizer on high-resolution MRI scans for each participant. The target sites (i.e., rOPA, rOFA, and rLOC) were then identified by selecting the voxel exhibiting the peak activation in each ROI (Fig. 1). The corresponding coil locations were marked on a bathing cap that was placed on the participant's head. In addition, a vertex control site was identified as the midpoint between the bridge of the nose and the inion, and between the participant's temples.
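Selecting the target as the peak voxel within an ROI amounts to a masked argmax over the statistical map. The sketch below works in voxel indices only. How the ROI mask is drawn and how the voxel is converted to head coordinates for coil placement (handled by Brainsight in the study) are outside its scope, and `peak_voxel` is a hypothetical helper.

```python
import numpy as np


def peak_voxel(stat_map: np.ndarray, roi_mask: np.ndarray) -> tuple:
    """Return the (i, j, k) voxel index of the largest statistic inside a boolean ROI mask."""
    masked = np.where(roi_mask, stat_map, -np.inf)  # exclude voxels outside the ROI
    flat_index = np.argmax(masked)
    return tuple(int(v) for v in np.unravel_index(flat_index, stat_map.shape))


# Toy example: a 4 x 4 x 4 "t-map" with a stronger peak outside the ROI that must be ignored.
t_map = np.zeros((4, 4, 4))
t_map[1, 2, 3] = 5.0
t_map[0, 0, 0] = 10.0
roi = np.zeros_like(t_map, dtype=bool)
roi[1:3, 1:3, 1:4] = True
print(peak_voxel(t_map, roi))  # (1, 2, 3)
```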
TMS stimulation.
A Magstim Super Rapid2 stimulator was used to deliver TMS via a figure-eight coil with a wing diameter of 70 mm. TMS was delivered at 10 Hz and 60% of maximal stimulator output. A single intensity was used for all participants on the basis of previous studies (Pitcher et al., 2009). Test stimuli were presented during a 500 ms train of repetitive TMS (rTMS).
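At 10 Hz, a 500 ms train contains 5 pulses, one every 100 ms. The sketch below assumes the first pulse coincides with test-stimulus onset (`onset_ms = 0`); the text does not state the exact pulse alignment, and in practice pulses are triggered through the stimulator hardware rather than by code like this. `pulse_times_ms` is a hypothetical helper.

```python
def pulse_times_ms(freq_hz: float = 10.0, train_ms: float = 500.0, onset_ms: float = 0.0) -> list:
    """Times (ms) of each pulse in a repetitive-TMS train that starts at onset_ms."""
    interval_ms = 1000.0 / freq_hz                 # 100 ms between pulses at 10 Hz
    n_pulses = int(round(train_ms / interval_ms))  # 500 ms / 100 ms = 5 pulses
    return [onset_ms + i * interval_ms for i in range(n_pulses)]


print(pulse_times_ms())  # [0.0, 100.0, 200.0, 300.0, 400.0]
```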
Experiment 1
To test whether sensitivity to fine-grained differences in scene layout or face shape is affected by TMS to rOPA and rOFA, respectively, participants performed a delayed two-alternative forced choice (2AFC) match-to-sample task on either scenes or faces with TMS delivered during the presentation of the second stimulus (Fig. 2A). The pair of stimuli in each trial varied in morph difference (Fig. 2B), which we adjusted with a staircase procedure. Our measure of discrimination ability was the stimulus morph difference necessary to produce 75% correct discrimination. TMS sites included rOPA, rOFA, and a vertex control site. Participants completed 2 runs while being stimulated at each site. Scene and face trials were interleaved within a run. The order of the TMS sites was palindromic (e.g., rOPA, rOFA, vertex, vertex, rOFA, rOPA), and roughly counterbalanced across participants. There were 36 trials per condition per run.
Design.
On each trial, a fixation cross appeared, followed by the first stimulus (scene or face), presented at the center of the screen for 1 s. Immediately afterward, a test pair was shown simultaneously side-by-side for 500 ms (Fig. 2A). Each test pair included the immediately preceding study item and a distracter item created by morphing the study item with a different item. Morphs for faces and scenes were created with FantaMorph software (Abrosoft) by morphing one face (scene) into another face (scene) in steps of 5%. Figure 2B shows example morph continua for scenes and faces. TMS pulses were delivered for the entire time the test stimuli were on the screen. The test pair was followed by a white screen that remained until participants made their response. Participants were told to report which of the two test items they had just seen (Fig. 2A). They pressed the “q” key if the item was on the left, or the “w” key if the item was on the right. The correct item was on the right 50% of the time. A QUEST staircase procedure (Watson and Pelli, 1983) was used to adjust the difficulty of the task by changing the distance between the two test items on the morph spectrum. This procedure provided an estimate of the morph threshold at which participants were able to discriminate between the study item and the morphed item with an accuracy of 75% correct; chance was 50%. The following QUEST parameters were set: number of trials = 36; beta = 3.5; delta = 0.01; gamma = 0.5; grain = 5. We used a staircase procedure rather than the traditional measure of accuracy during TMS because it may provide a more sensitive measure of perceptual discrimination ability than mean accuracy on a fixed set of trials: a larger percentage of the trials target the critical difficulty range for each participant, with fewer uninformative floor and ceiling trials.
For each TMS site, the two discrimination thresholds for each condition (scenes or faces) acquired from the two runs were averaged before significance testing.
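A QUEST-style staircase keeps a posterior distribution over the observer's threshold and places each trial where that posterior is centered. The Python sketch below is a simplified reimplementation, not Psychtoolbox's `QuestCreate`/`QuestUpdate`: it uses a Weibull psychometric function with the beta = 3.5, delta = 0.01, and gamma = 0.5 values given above and a 75% threshold criterion, but the log10 intensity scale, the Gaussian prior, the grid step, and testing at the posterior mean are assumptions. Intensity is the morph difference between study item and distracter, so larger values are easier. The names `weibull_pc` and `Quest` are hypothetical.

```python
import numpy as np


def weibull_pc(x, t, beta=3.5, gamma=0.5, delta=0.01, p_thresh=0.75):
    """Probability correct at log10 intensity x for a log10 threshold t.

    Weibull form with guess rate gamma and lapse rate delta, shifted so that p(x == t) == p_thresh.
    """
    # Offset eps at which the unshifted function reaches p_thresh.
    u = -np.log((1 - (p_thresh - delta * gamma) / (1 - delta)) / (1 - gamma))
    eps = np.log10(u) / beta
    return delta * gamma + (1 - delta) * (1 - (1 - gamma) * np.exp(-10 ** (beta * (x - t + eps))))


class Quest:
    """Minimal Bayesian adaptive staircase over a grid of candidate log10 thresholds."""

    def __init__(self, guess, sd, step=0.01, half_range=2.0, **psy):
        self.t_grid = np.arange(guess - half_range, guess + half_range + step / 2, step)
        self.log_post = -0.5 * ((self.t_grid - guess) / sd) ** 2  # Gaussian log prior
        self.psy = psy  # optional overrides for weibull_pc (beta, gamma, delta, p_thresh)

    def mean(self):
        """Posterior mean of the log10 threshold; used both to place trials and as the estimate."""
        post = np.exp(self.log_post - self.log_post.max())  # subtract max to avoid underflow
        return float(np.sum(self.t_grid * post) / np.sum(post))

    def update(self, x, correct):
        """Multiply the posterior by the likelihood of the observed response at intensity x."""
        p = weibull_pc(x, self.t_grid, **self.psy)
        self.log_post += np.log(p if correct else 1 - p)


# Simulated observer (hypothetical): 75% correct at a 30% morph difference.
rng = np.random.default_rng(0)
true_t = np.log10(30.0)
q = Quest(guess=np.log10(40.0), sd=0.5)
for _ in range(36):  # 36 trials per condition per run, as in the text
    x = min(q.mean(), 2.0)  # cap at a 100% morph difference
    q.update(x, rng.random() < weibull_pc(x, true_t))
print(f"estimated threshold: {10 ** q.mean():.1f}% morph difference")
```

With the lapse rate folded into the likelihood, a single unlucky response cannot drive the posterior to zero anywhere on the grid, which is one reason QUEST-style procedures include a delta term.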
To introduce the task, each run began with practice trials that used the same procedure as the experimental task but very easy test pairs: two scene pairs and two face pairs, each containing the study item and an 80%-morph distracter item. After these four practice trials, the program advanced to the experimental trials. No TMS was administered during practice trials.
Stimuli.
Stimuli were grayscale photographs of faces and grayscale computer-generated scenes. The faces were of Caucasian men, with neutral expression, no facial hair or glasses, wearing black hats to hide hair and ears, shown in front view, in front of a black background (Duchaine and Nakayama, 2006) (Fig. 2B). Scenes were perspective views of houses generated with Google SketchUp (Fig. 2B). Stimuli were presented on a Dell OptiPlex 330 running Windows Vista using Matlab (version 2008b, MathWorks) and the Psychtoolbox extension (version 3.0.9) (Brainard, 1997). Participants were seated 71 cm away from the screen, with their head stabilized by a chin rest. Stimuli were 9 × 9 degrees in visual angle.
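The on-screen size implied by the viewing geometry follows from basic trigonometry: an extent subtending θ degrees at distance d measures 2·d·tan(θ/2). At 71 cm, 9° corresponds to about 11.18 cm. Converting that to pixels would require the monitor's size and resolution, which the text does not report; `size_for_visual_angle` is a hypothetical helper.

```python
import math


def size_for_visual_angle(angle_deg: float, distance_cm: float) -> float:
    """Physical extent (cm), centered on fixation, that subtends angle_deg at distance_cm."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)


print(round(size_for_visual_angle(9, 71), 2))  # 11.18
```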
Experiment 2
To test whether categorizing a scene (i.e., identifying the scene as a beach, forest, city, or kitchen) or an object (i.e., identifying the object as a camera, chair, car, or shoes) is affected by TMS to rOPA and rLOC, respectively, participants performed a delayed 4AFC categorization task on either scenes or objects with TMS delivered during the presentation of the stimulus (Fig. 3A). TMS sites included rOPA, rLOC, and a vertex control site. Participants completed 6 TMS runs in total, with a single site stimulated in each run. Scene and object trials were presented in separate blocks. The order of the TMS sites was palindromic (e.g., rOPA, rLOC, vertex, vertex, rLOC, rOPA), and roughly counterbalanced across participants. There were 32 trials per condition per site.
Design.
The experiment consisted of two phases. The goal of the first phase (the thresholding phase) was to determine, for each participant, the stimulus duration per category (i.e., beach, forest, city, or kitchen for the scene stimuli; camera, chair, car, or shoes for the object stimuli) that would produce an average performance of ∼63% for that category. The second phase was the main experiment: stimulus durations per category per participant were fixed at the previously determined threshold and TMS was administered.
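One way to read the target accuracy, offered as an interpretation rather than the authors' stated rationale: with chance (guess) rate γ, the accuracy halfway between chance and perfect performance is γ + (1 − γ)/2. For the 4AFC categorization task this gives ∼63%, and for the 2AFC discrimination task of Experiment 1 it gives the 75% criterion used there.

```latex
p_{\mathrm{mid}} = \gamma + \frac{1-\gamma}{2}, \qquad
\gamma = \tfrac{1}{4}\ (\text{4AFC}) \Rightarrow p_{\mathrm{mid}} = 0.625 \approx 63\%, \qquad
\gamma = \tfrac{1}{2}\ (\text{2AFC}) \Rightarrow p_{\mathrm{mid}} = 0.75
```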
In both phases, participants performed a 4AFC categorization task in separate blocks for the scene and object images. Their task was to pick the correct category of the image they had just seen by pressing one of four keys. Specifically, a fixation cross appeared, followed by the test stimulus, followed by a 500 ms mask, followed by a gray screen that remained on until the participant made a response (Fig. 3A). In the first phase, there were 2 runs containing one block per task (object or scene), with a total of 128 stimuli per block (32 per category). Participants' stimulus presentation duration thresholds for each category were determined within a run, using a QUEST staircase procedure (number of trials = 32; beta = 3.5; delta = 0.01; gamma = 0.5; grain = 2). The thresholds obtained in the 2 runs were compared, and the shorter-duration threshold was selected for the second phase to take potential learning effects into account. Average presentation duration across participants was 96 ms (range 64–128 ms). In the second phase, stimulus durations per category per participant were fixed at the previously determined threshold. This phase consisted of 6 runs with one scene block and one object block in each run, and 32 stimuli per block (8 per category). TMS was delivered to a single stimulation site within a run. As described above, the sites included rOPA, rLOC, and vertex, and were stimulated in a palindromic design.
Stimuli.
Stimuli were grayscale photographs of scenes and objects (Fig. 3B). The scene stimuli were images of beaches, forests, cities, and kitchens (64 images per category) taken from the SUN Database (Xiao et al., 2010). The object stimuli were images of cameras, chairs, cars, and shoes (64 per category). Pilot testing showed that participants were close to ceiling even at the shortest possible presentation duration (one screen refresh). Thus, to bring performance into a dynamic range, it was necessary to degrade the images. The scenes were degraded by blending each image with a square grid (8 × 8) of grayscale tiles of random (i.e., white-noise) intensity. To degrade the object images, their transparency was increased by 40%, and they were placed on a scrambled object background. The scrambled backgrounds were constructed by randomly selecting an equal number of square tiles from images in each of the 4 object categories, arranging them in an 8 × 8 grid of the same dimensions as the object images, and permuting their positions. A total of 8 different backgrounds were used. It was necessary to use somewhat different strategies for degrading the scene and object stimuli because of differences in the properties of the two stimulus types (e.g., an object on a background vs a scene with no background). Pilot experiments suggested that the two strategies produced similar decrements in performance for their respective stimulus types.
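The two degradation strategies can be sketched with NumPy on grayscale images scaled to [0, 1]. Several details are assumptions not reported in the text: the blending weight `alpha` for the scene noise grid, uniformly distributed tile intensities, reading "transparency increased by 40%" as compositing the whole object image at 60% opacity over the background, and image sides divisible by the grid size. All function names are hypothetical.

```python
import numpy as np


def blend_random_tiles(img, n_tiles=8, alpha=0.5, rng=None):
    """Blend a grayscale scene (values in [0, 1]) with an n_tiles x n_tiles grid of random gray tiles.

    alpha is the weight of the noise grid; 0.5 is a placeholder, not a value from the study.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = img.shape
    assert h % n_tiles == 0 and w % n_tiles == 0, "image sides must be divisible by n_tiles"
    levels = rng.random((n_tiles, n_tiles))                         # one uniform gray level per tile
    grid = np.kron(levels, np.ones((h // n_tiles, w // n_tiles)))  # upsample tiles to image size
    return (1 - alpha) * img + alpha * grid


def scramble_background(images, n_tiles=8, rng=None):
    """Tile-scrambled background: equal numbers of tiles from each source image, permuted in a grid."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = images[0].shape
    th, tw = h // n_tiles, w // n_tiles
    n_cells = n_tiles * n_tiles
    assert n_cells % len(images) == 0, "each source image must contribute equally"
    per_image = n_cells // len(images)  # e.g., 4 object images -> 16 tiles each
    tiles = []
    for img in images:
        for cell in rng.choice(n_cells, size=per_image, replace=False):  # random source tiles
            r, c = divmod(int(cell), n_tiles)
            tiles.append(img[r * th:(r + 1) * th, c * tw:(c + 1) * tw])
    out = np.empty((th * n_tiles, tw * n_tiles))
    for cell, idx in enumerate(rng.permutation(len(tiles))):  # shuffle tile positions
        r, c = divmod(cell, n_tiles)
        out[r * th:(r + 1) * th, c * tw:(c + 1) * tw] = tiles[idx]
    return out


def fade_onto_background(obj_img, background, transparency=0.4):
    """Composite the object image at (1 - transparency) opacity over the background."""
    return (1 - transparency) * obj_img + transparency * background
```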
The scene and object stimuli were backward masked with masks constructed as follows: scene masks consisted of a 4 × 4 grid of tiles of randomly selected segments from 8 different undegraded scene images (1 image per category; 2 tiles per image). Object masks were generated with an identical procedure with the exception that the object images used were already degraded. Stimuli were presented on a Dell OptiPlex 330 running Windows Vista using Matlab (version 2008b, MathWorks) and the Psychtoolbox extension (version 3.0.9) (Brainard, 1997). Participants were seated 71 cm away from the screen, with their head stabilized by a chin rest. Stimuli and masks were 9 × 9 degrees in visual angle, and a unique mask was generated for each trial.
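A per-trial mask of this kind is a random mosaic: 8 source images × 2 tiles each fill the 16 cells of a 4 × 4 grid. In the sketch below, tiles are cut at random (not grid-aligned) positions and then shuffled into the grid; both choices are assumptions, as is the name `make_mask`. For object masks, the already-degraded object images would be passed in as sources. Calling it with fresh randomness on every trial yields a unique mask per trial.

```python
import numpy as np


def make_mask(source_images, grid=4, tiles_per_image=2, rng=None):
    """Backward mask: a grid x grid mosaic of tiles cut at random positions from source images."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = source_images[0].shape
    th, tw = h // grid, w // grid
    tiles = []
    for img in source_images:
        for _ in range(tiles_per_image):
            # Random tile-sized segment; positions need not align with the grid (an assumption).
            r = int(rng.integers(0, h - th + 1))
            c = int(rng.integers(0, w - tw + 1))
            tiles.append(img[r:r + th, c:c + tw])
    assert len(tiles) == grid * grid, "need exactly grid * grid tiles"
    mask = np.empty((th * grid, tw * grid))
    for cell, idx in enumerate(rng.permutation(len(tiles))):  # shuffle tiles into the grid
        i, j = divmod(cell, grid)
        mask[i * th:(i + 1) * th, j * tw:(j + 1) * tw] = tiles[idx]
    return mask
```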
Results
If OPA is causally and selectively involved in scene perception, then TMS over OPA should disrupt perception of scenes, but not other categories of stimuli (e.g., faces or objects). Accordingly, the following predictions were tested: (1) impaired scene discrimination when TMS was delivered over rOPA, but not when TMS was delivered over rOFA; and impaired face discrimination when TMS was delivered over rOFA, but not when TMS was delivered over rOPA (Experiment 1), and (2) impaired scene categorization when TMS was delivered over rOPA, but not when TMS was delivered over rLOC; and impaired object categorization when TMS was delivered over rLOC, but not when TMS was delivered over rOPA (Experiment 2).
Experiment 1
As predicted, scene discrimination was impaired when TMS was delivered over rOPA, but not when TMS was delivered over rOFA, while face discrimination was impaired when TMS was delivered over rOFA, but not when TMS was delivered over rOPA (Fig. 2C). A 3 (TMS site: rOPA, rOFA, vertex) × 2 (Category: Scenes and Faces) repeated-measures ANOVA revealed a significant interaction (F(2,14) = 17.00, p = 0.001), with a significantly greater threshold for scenes than faces when TMS was delivered over rOPA, relative to vertex and rOFA, and a significantly greater threshold for faces than scenes when TMS was delivered over rOFA, relative to vertex and rOPA (all interaction contrasts, p < 0.05).
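The degrees of freedom follow from the fully within-subject design: the TMS site × category interaction has (3 − 1)(2 − 1) = 2 numerator df, and its error term (the subject × site × category interaction) has 2 × (n − 1) df, i.e., 14 for the 8 participants of Experiment 1 and 24 for the 13 analyzed participants of Experiment 2. The NumPy sketch below computes the uncorrected interaction F from a subjects × sites × categories array. Whether the authors applied a sphericity correction is not stated, a p value would come from the upper tail of the F distribution (e.g., `scipy.stats.f.sf`), and `rm_anova_interaction` is a hypothetical helper.

```python
import numpy as np


def rm_anova_interaction(y):
    """Site x Category interaction F for a fully within-subject design.

    y has shape (n_subjects, n_sites, n_categories). Returns (F, df_effect, df_error).
    """
    n, a, b = y.shape
    grand = y.mean()
    m_a = y.mean(axis=(0, 2))   # site means
    m_b = y.mean(axis=(0, 1))   # category means
    m_ab = y.mean(axis=0)       # site x category cell means
    m_s = y.mean(axis=(1, 2))   # subject means
    m_sa = y.mean(axis=2)       # subject x site means
    m_sb = y.mean(axis=1)       # subject x category means

    ss_ab = n * np.sum((m_ab - m_a[:, None] - m_b[None, :] + grand) ** 2)
    # Error term: subject x site x category residual (one observation per cell).
    resid = (y - m_sa[:, :, None] - m_sb[:, None, :] - m_ab[None, :, :]
             + m_a[None, :, None] + m_b[None, None, :] + m_s[:, None, None] - grand)
    ss_err = np.sum(resid ** 2)

    df_ab = (a - 1) * (b - 1)
    df_err = df_ab * (n - 1)
    return (ss_ab / df_ab) / (ss_err / df_err), df_ab, df_err


# Simulated values (not the study's data): 8 subjects x 3 sites x 2 categories.
rng = np.random.default_rng(0)
F, df1, df2 = rm_anova_interaction(rng.normal(size=(8, 3, 2)))
print(f"F({df1},{df2}) = {F:.2f}")
```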
Follow-up ANOVAs contrasting the TMS sites for each category (i.e., scenes and faces) separately confirmed the above finding. For scenes, a three-level (TMS site: rOPA, rOFA, vertex) repeated-measures ANOVA revealed a significant main effect of TMS site (F(2,14) = 6.18, p = 0.01), with a significantly higher discrimination threshold when TMS was delivered over rOPA compared with either vertex or rOFA (main effects contrasts, p = 0.006 and 0.03, respectively), and no difference between vertex and rOFA (main effects contrast, p = 0.34). By contrast, for faces, a three-level (TMS site: rOPA, rOFA, vertex) repeated-measures ANOVA revealed a significant main effect of TMS site (F(2,14) = 10.35, p = 0.002), with a significantly higher discrimination threshold when TMS was delivered over rOFA compared with either vertex or rOPA (main effects contrasts, p = 0.009 and 0.002, respectively), and no difference between vertex and rOPA (main effects contrast, p = 0.78).
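Each follow-up test is a one-way repeated-measures ANOVA over the three TMS sites, with 3 − 1 = 2 and (3 − 1)(8 − 1) = 14 df in Experiment 1. The sketch below computes that F and, for pairwise comparisons, a paired t on the difference scores; with two levels, t² equals the repeated-measures F(1, n − 1). Treating the reported "main effects contrasts" as such paired comparisons is an assumption, no multiple-comparison correction is applied, and both function names are hypothetical.

```python
import numpy as np


def rm_anova_oneway(y):
    """One-way repeated-measures ANOVA on y with shape (n_subjects, n_sites)."""
    n, k = y.shape
    grand = y.mean()
    ss_site = n * np.sum((y.mean(axis=0) - grand) ** 2)   # between-site variability
    ss_subj = k * np.sum((y.mean(axis=1) - grand) ** 2)   # between-subject variability (removed)
    ss_err = np.sum((y - grand) ** 2) - ss_site - ss_subj  # site x subject residual
    df_site, df_err = k - 1, (k - 1) * (n - 1)
    return (ss_site / df_site) / (ss_err / df_err), df_site, df_err


def paired_contrast(y, i, j):
    """Compare sites i and j with a paired t on difference scores; returns (t, df)."""
    d = y[:, i] - y[:, j]
    return d.mean() / (d.std(ddof=1) / np.sqrt(len(d))), len(d) - 1
```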
Experiment 2
As predicted, scene categorization was impaired when TMS was delivered over rOPA, but not when TMS was delivered over rLOC, and object categorization was impaired when TMS was delivered over rLOC, but not when TMS was delivered over rOPA (Fig. 3C). A 3 (TMS site: rOPA, rLOC, vertex) × 2 (Category: Scenes and Objects) repeated-measures ANOVA revealed a significant interaction (F(2,24) = 11.77, p = 0.001), with significantly lower categorization accuracy for scenes than objects when TMS was delivered over rOPA, relative to vertex and rLOC, and significantly lower accuracy for objects than scenes when TMS was delivered over rLOC, relative to rOPA (all interaction contrasts, p < 0.05), and marginally significantly lower accuracy relative to vertex (interaction contrast, p = 0.06).
Follow-up ANOVAs contrasting the TMS sites for each category (i.e., scenes and objects) separately confirmed the above finding. For scenes, a three-level (TMS site: rOPA, rLOC, vertex) repeated-measures ANOVA revealed a significant main effect of TMS site (F(2,24) = 6.80, p = 0.005), with significantly lower accuracy when TMS was delivered over rOPA compared with either vertex or rLOC (main effects contrasts, both p values < 0.05), and no difference between vertex and rLOC (main effects contrast, p = 0.71). By contrast, for objects, a three-level (TMS site: rOPA, rLOC, vertex) repeated-measures ANOVA revealed a significant main effect of TMS site (F(2,24) = 6.90, p = 0.004), with significantly lower accuracy when TMS was delivered over rLOC compared with rOPA (main effects contrast, p = 0.01), a marginally significant difference between rLOC and vertex (main effects contrast, p = 0.06), and no difference between vertex and rOPA (main effects contrast, p = 0.13).
Discussion
In two separate experiments, we found that TMS to rOPA disrupted the perception of scenes, but not faces or objects, providing the first evidence that OPA is causally and selectively involved in scene perception. We have renamed this region (formerly known as TOS) the occipital place area (OPA) because (1) the data reported here strengthen the evidence for the specific role of this region in the perception of places and scenes, (2) the region in question is defined by its function, not solely by its anatomical location, and (3) indeed the region is not even within the transverse occipital sulcus in all subjects (Nasr et al., 2011). Other results from our study replicate prior findings (Pitcher et al., 2007, 2009) that rOFA is causally involved in face perception and rLOC is causally involved in object perception, and further show that our control tasks were sensitive to TMS, just not to TMS over OPA. Together, our findings have implications both for the cognitive and neural basis of scene perception, and for the category specificity of representations in high-level visual cortex.
First, what do these data tell us about scene perception? A number of hypotheses have been put forth regarding the kinds of information that might be extracted from visually presented scenes, including spatial layout (Epstein and Kanwisher, 1998; Kravitz et al., 2011; Park et al., 2011), scene category (Dilks et al., 2011; Walther et al., 2011), and the identity of specific familiar places and their location in a broader cognitive map (Epstein et al., 1999). Our findings reveal that OPA is causally involved in at least two of these aspects of scene perception: the perception of spatial layout (Experiment 1), and the recognition of scene category (Experiment 2). Moreover, these findings dovetail with a recent study showing that OPA is selectively engaged in visual scene processing: OPA is sensitive to the left-right orientation of scenes, not objects (Dilks et al., 2011). However, an important future question is whether OPA represents such high-level scene information itself (which may further be necessary for navigation and reorientation—Cheng and Newcombe, 2005; Spelke et al., 2010), or whether it extracts more basic perceptual information such as spatial frequency (Rajimehr et al., 2011), surface slant (Thrun, 2002), or spatial envelope properties (Oliva and Torralba, 2001), en route to a full representation of the scene.
Further, although the current data do not directly address the distinct role of each scene-selective region, the neuroanatomical location of OPA, posterior to PPA and RSC, raises the possibility that OPA serves as an early stage in the scene perception system and is thus involved in earlier stages of scene processing. This hypothesis fits with proposals of posterior-to-anterior hierarchically organized cortical networks not only for low-level vision (Hubel and Wiesel, 1959), but also for high-level visual processing. For example, it has been proposed that face-selective regions of cortex are themselves arranged in a processing hierarchy, proceeding from the more posterior occipital face area (OFA) to the more anterior fusiform face area (Yovel and Kanwisher, 2004; Pitcher et al., 2007; Schwarzlose et al., 2008; Liu et al., 2010). Similarly, Taylor et al. (2007) have argued that the body-selective regions of cortex are hierarchically organized, proceeding from the more posterior extrastriate body area to the more anterior fusiform body area. A future experiment varying the timing of the TMS to OPA could shed some light on whether OPA is involved in earlier stages of scene processing, preceding scene processing in PPA and RSC (see Pitcher et al., 2007 for a similar experiment on OFA).
Second, our finding that OPA is causally and selectively involved in the perception of scenes addresses the general question of whether category-selective regions contribute only to the perception of their preferred stimuli, or whether these regions constitute parts of broader distributed representations of all objects—the “distributed coding hypothesis” (Haxby et al., 2001). We have shown that OPA is causally involved in the perception of scenes (in two different experiments), and not in the perception of faces or objects. Similarly, we replicate prior results that OFA is not causally involved in processing nonfaces (Pitcher et al., 2009), in this case scenes. Thus our results suggest that even if a category-selective region contains information about nonpreferred stimuli, such information plays no detectable causal role in the perception of those nonpreferred stimuli. Of course, it is possible that a more powerful disruption method may show some causal role of these regions in processing nonpreferred stimuli, and thus it will be important in the future to test this hypothesis further with data from patients and other stronger disruption methods, such as electrical microstimulation in humans and macaques.
Finally, our finding that OPA is necessary for scene perception points to an important new set of questions for further investigation, questions not only about the precise function of a given category-selective region, but also about how multiple cortical regions might interact: How does OPA interact with the other scene-selective regions? What is the connectivity among the scene-selective regions, and between them and the rest of the brain? Whatever the ultimate answers to these questions, our results demonstrate that the cortical region OPA is causally and selectively involved in scene processing.
Footnotes
This work was supported by National Institutes of Health Grant EY013455 (N.K.). We thank David Pitcher for his expertise and advice on TMS.
The authors declare no competing financial interests.
Correspondence should be addressed to Daniel D. Dilks, McGovern Institute for Brain Research, MIT, 43 Vassar Street, 46-4141, Cambridge, MA 02139. dilks{at}mit.edu