Abstract
fMRI studies have revealed three scene-selective regions in human visual cortex [the parahippocampal place area (PPA), transverse occipital sulcus (TOS), and retrosplenial cortex (RSC)], which have been linked to higher-order functions such as navigation, scene perception/recognition, and contextual association. Here, we document corresponding (presumptively homologous) scene-selective regions in the awake macaque monkey, based on direct comparison to human maps, using identical stimuli and largely overlapping fMRI procedures. In humans, our results showed that the three scene-selective regions are centered near—but distinct from—the gyri/sulci for which they were originally named. In addition, all these regions are located within or adjacent to known retinotopic areas. Human RSC and PPA are located adjacent to the peripheral representation of primary and secondary visual cortex, respectively. Human TOS is located immediately anterior/ventral to retinotopic area V3A, within retinotopic regions LO-1, V3B, and/or V7. Mirroring the arrangement of human regions fusiform face area (FFA) and PPA (which are adjacent to each other in cortex), the presumptive monkey homolog of human PPA is located adjacent to the monkey homolog of human FFA, near the posterior superior temporal sulcus. Monkey TOS includes the region predicted from the human maps (macaque V4d), extending into retinotopically defined V3A. A possible monkey homolog of human RSC lies in the medial bank, near peripheral V1. Overall, our findings suggest a homologous neural architecture for scene-selective regions in visual cortex of humans and nonhuman primates, analogous to the face-selective regions demonstrated earlier in these two species.
Introduction
A sense of “place,” and the ability to recognize the environment and localize oneself within it, is crucial for survival in most animals. Although place-related cues take myriad forms across the animal kingdom, visual cues predominate in humans and other primates. In humans, functional MRI studies (Aguirre et al., 1996, 1998; Epstein and Kanwisher, 1998; Ishai et al., 1999; Maguire, 2001; Bar and Aminoff, 2003; Grill-Spector, 2003; Hasson et al., 2003) have described three visual cortical regions that are more active during the presentation of “places” (typically, scenes or isolated houses) compared with the presentation of other visual stimuli such as faces, objects, body parts, or scrambled scenes. Typically, these human brain regions are named for nearby anatomical landmarks as follows: (1) parahippocampal place area (“PPA”), (2) transverse occipital sulcus (“TOS”), and (3) retrosplenial cortex (“RSC”).
Although all three regions respond well to scenes, recent fMRI studies have revealed intriguing functional differences between them. For instance, PPA reportedly processes the visual-spatial structure of scenes (Epstein and Kanwisher, 1998), responding to changes in viewpoint and to scene novelty but not during navigation tasks, whereas RSC shows the opposite pattern (Epstein et al., 1999, 2003; Park and Chun, 2009).
Such evidence suggests that these regions form a network for scene processing, analogous to the well-known network for face processing. Based on human fMRI, this face-processing network includes the occipital face area (OFA), the fusiform face area (FFA), and the anterior face region (Kanwisher et al., 1997; Grill-Spector et al., 2004; Rajimehr et al., 2009). Recent studies have revealed neurobiological mechanisms underlying this network by studying homologous regions in macaque monkeys (Tsao et al., 2003, 2008a; Rajimehr et al., 2009). Primate studies have shown that (1) at least some of these face-processing regions are anatomically interconnected, as shown by microstimulation combined with fMRI (Moeller et al., 2008); (2) these regions are organized hierarchically, based on physiological recordings (Freiwald and Tsao, 2010); and (3) this face-processing network extends to prefrontal cortex, as demonstrated by fMRI activation (Tsao et al., 2008b). Thus, studies of the face-processing network in monkeys have greatly expanded our understanding of the neurobiological substrates of face perception and recognition.
Analogously, our main goal here was to test for macaque homologs of human PPA, TOS, and RSC, to enable subsequent studies of scene-processing mechanisms in macaque cortex. To generate an optimal reference map, we first defined the precise locations of these regions in human cortex. These maps indicated that all three scene-selective regions are centered near but not on the sulci/gyri for which they were named. Moreover, these “scene-selective” regions are located in or adjacent to known retinotopic areas, including the lowest-tier areas V1 (adjacent to RSC) and V2 (adjacent to PPA). In macaques, the homolog of PPA is located adjacent to the FFA homolog, mirroring the topography of adjacent human regions FFA and PPA. The macaque fMRI also revealed a homolog of human TOS, which included V3A. Preliminary versions of this work have been presented previously (Devaney et al., 2008).
Materials and Methods
Human subjects
Seventeen normal human subjects (seven females; 22–33 years of age), with normal or corrected-to-normal vision, were tested in one to three experimental sessions each (Table 1). Written informed consent was obtained from each subject before the experiments. All experimental procedures were approved under Massachusetts General Hospital protocols.
Primate subjects
Seven juvenile male macaque monkeys (Macaca mulatta) were used in these studies (Table 2). Three of the monkeys (4–6 kg) were studied at the Massachusetts General Hospital (MGH), and four (5.0–8.5 kg) were studied at the National Institute of Mental Health (NIMH). Surgical details and the training procedures for the monkeys were similar across the two sites and have been described in detail previously (Vanduffel et al., 2001; Tsao et al., 2003; Bell et al., 2009). All experimental procedures conformed to NIH guidelines and were approved under experimental protocols at MGH and NIMH, respectively.
Human imaging
Human subjects were scanned in a horizontal 3 T Siemens Tim Trio MR imager at MGH. Gradient echo EPI sequences were used for functional imaging (TR, 2000 ms; TE, 30 ms; flip angle, 90°; 3.0 mm isotropic voxels; 33 axial slices). A 3D MP-RAGE sequence (1.0 mm isotropic) was used for high-resolution anatomical imaging from the same subjects.
Throughout the functional scans, all subjects continuously fixated a small fixation spot at the center of the visual display. To control attention during the functional scanning, subjects reported an unpredictably timed color change of the fixation target, except as noted. Each session consisted of 10–15 functional runs, and each run contained 14 blocks (block duration, 16 or 24 s).
Primate imaging
All primates were implanted with an MR-compatible headpost and trained to work in the sphinx position in an MR-compatible horizontal restraint device. As in the human task, all monkey subjects were required to fixate a small spot at the center of the display screen nearly continuously. Eye position was monitored using an infrared pupil tracking system (ISCAN). Monkeys were rewarded with water or juice for maintaining fixation within a square-shaped central fixation window (typically, 2 × 2° in size) surrounding the fixation spot.
MGH.
Primate scanning at MGH used the 3 T scanner described above. A gradient echo EPI sequence was used for functional imaging (TR, 2000 ms; TE, 19 ms; flip angle, 90°; 1.0 mm isotropic voxels; 50 axial slices). Each monkey session consisted of 20–25 functional runs, with each run containing 14 blocks (block duration, 30 or 40 s). Each monkey was scanned for two to five sessions, and data from all sessions were averaged together. To increase functional sensitivity in the monkey scans (in part, to compensate for the smaller voxels required by the smaller primate brain), we used a gradient insert coil (Siemens AC88), parallel imaging with a four-channel phased array coil, and an exogenous contrast agent [monocrystalline iron oxide nanoparticle (MION); 8–10 mg/kg, i.v.]. Previous studies in the same animals (Vanduffel et al., 2001; Leite et al., 2002; Tsao et al., 2003) have confirmed that MION and BOLD label corresponding cortical areas, although within-area activity details may differ slightly (Smirnakis et al., 2007). For each monkey, structural scans were also acquired under anesthesia, using a 3D MP-RAGE sequence (0.35 mm isotropic voxels).
NIMH.
Imaging data were collected using a 3 T GE scanner. A gradient echo EPI sequence was used for functional imaging (TR, 2000 ms; TE, 17.9 ms; flip angle, 90°; 1.5 mm isotropic voxels; 27 coronal slices), with an eight-channel surface coil array and MION as the contrast agent (7–11 mg/kg, i.v.). Each session consisted of 10–30 functional runs, each containing three blocks (block duration, 40 s). Each monkey was scanned for two sessions, and data from all sessions were averaged together. High-resolution T1-weighted whole-brain anatomical scans (voxel size, 0.5 mm³) were also acquired on a 4.7 T Bruker scanner with a modified driven equilibrium Fourier transform sequence.
Data analysis
For all human and monkey subjects, functional and anatomical data were preprocessed and analyzed using FreeSurfer (http://surfer.nmr.mgh.harvard.edu/). For each subject, the cortical surface was extracted and reconstructed, allowing analysis on both the “inflated” and “flattened” views.
All functional images were motion corrected, spatially smoothed (unless otherwise noted) using a 3D Gaussian kernel [2.5 mm half width at half-maximum (HWHM) in humans and 1 mm HWHM in monkeys], and normalized across scans. The hemodynamic response was modeled as a gamma function, and average signal intensity maps were then calculated for each condition. Voxelwise statistical tests were based on a univariate general linear model. The significance levels were projected onto the inflated/flattened cortex after a rigid coregistration of functional and anatomical volumes. For monkey data, additional manual corrections were applied to avoid possible misalignment between functional and structural scans. Using FreeSurfer, functional maps were spatially normalized across sessions (in monkeys) and across subjects (in humans and monkeys). Activity within individual human and monkey brains was then transformed spatially onto the “averaged human” and “averaged monkey” brains, respectively (for details, see Fischl et al., 1999), and averaged using a fixed-effects model.
As noted for each analysis, the averaged human cortical surface was based on either the 10 subjects participating in our main study or 40 independent human subjects (the standard FreeSurfer average). For all monkeys, we generated an averaged anatomical surface based on the four NIMH monkeys and projected the averaged activity onto those anatomical maps.
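The Python sketch below illustrates the voxelwise analysis described above. It converts the stated HWHM smoothing widths to Gaussian standard deviations, builds a gamma-convolved block regressor, and computes a scenes-versus-faces t statistic with an ordinary least-squares GLM. It is not the FreeSurfer implementation. The gamma parameters (onset delay 2.25 s, time constant 1.25 s, exponent 2) and all function names are assumptions chosen for the example.

```python
import math

import numpy as np


def hwhm_to_sigma(hwhm_mm: float) -> float:
    """Convert a Gaussian half width at half maximum (HWHM) to its standard deviation."""
    # HWHM = sigma * sqrt(2 ln 2), so sigma = HWHM / sqrt(2 ln 2).
    return hwhm_mm / math.sqrt(2.0 * math.log(2.0))


def gamma_hrf(tr_s: float, length_s: float = 30.0, delta_s: float = 2.25,
              tau_s: float = 1.25, alpha: int = 2) -> np.ndarray:
    """Gamma-shaped hemodynamic response sampled every TR, scaled to unit sum (assumed parameters)."""
    t = np.arange(0.0, length_s, tr_s)
    x = np.clip((t - delta_s) / tau_s, 0.0, None)  # zero before the assumed onset delay
    h = x ** alpha * np.exp(-x)
    return h / h.sum()


def block_regressor(labels, condition, block_s, tr_s, hrf):
    """Boxcar for one condition (one sample per TR), convolved with the HRF."""
    per_block = int(round(block_s / tr_s))
    boxcar = np.repeat([1.0 if lab == condition else 0.0 for lab in labels], per_block)
    return np.convolve(boxcar, hrf)[: boxcar.size]


def contrast_t(y, X, c):
    """OLS fit of y on design X; return the t statistic for contrast c and the betas."""
    beta, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - rank
    sigma2 = resid @ resid / dof
    var_c = sigma2 * (c @ np.linalg.pinv(X.T @ X) @ c)
    return float(c @ beta / np.sqrt(var_c)), beta
```

For example, `hwhm_to_sigma(2.5)` gives σ ≈ 2.12 mm for the human data, and `hwhm_to_sigma(1.0)` gives σ ≈ 0.85 mm for the monkey data. A scene-minus-face contrast corresponds to `c = [1, -1, 0]` on a design with scene, face, and constant columns.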
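The text states only that the averaged maps used a fixed-effects model. The Python sketch below shows one common fixed-effects formulation and is not FreeSurfer's implementation. At each vertex, every subject's (or session's) estimate is weighted by the inverse of its variance. Because between-subject variance is ignored, the resulting statistic supports inferences about the tested subjects, not the wider population. The function name is hypothetical.

```python
import numpy as np


def fixed_effects(betas, variances):
    """Inverse-variance weighted average across subjects (axis 0); returns beta, variance, z."""
    b = np.asarray(betas, dtype=float)
    w = 1.0 / np.asarray(variances, dtype=float)  # precision weights
    var_ffx = 1.0 / w.sum(axis=0)                  # variance of the pooled estimate
    beta_ffx = (w * b).sum(axis=0) * var_ffx       # precision-weighted mean
    return beta_ffx, var_ffx, beta_ffx / np.sqrt(var_ffx)
```

Passing arrays of shape (subjects, vertices) pools each vertex independently. This matches the surface-based averaging described above, once each subject's map has been resampled to the common surface.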
In human subjects, flattened maps were generated using largely automated routines in FreeSurfer. These procedures automatically created a number of cuts around the medial aspect of the inflated surface: one in a region around the corpus callosum to remove all midbrain structures, one down the fundus of the calcarine sulcus, a set of equally spaced radial cuts, and a sagittally oriented cut around the temporal pole. The resulting cut surface was projected onto a plane that was oriented perpendicular to the average surface normal at each cortical site. Further details of these procedures have been described previously (Fischl et al., 1999).
Visual stimuli
For all experiments (human and macaque) at MGH, stimuli were presented via an LCD projector (Sharp; 1024 × 768 pixel resolution, 60 Hz refresh rate) onto a rear-projection screen using a PC. MATLAB 7.0 and Psychophysics Toolbox (Brainard, 1997; Pelli, 1997) were used to program the experiments. The stimuli were presented in a blocked design. Within a given functional scan, the first and last blocks were always null epochs (i.e., a fixation point on a black background), to allow the hemodynamic response to reach a steady state. The remaining stimulus blocks were ordered pseudorandomly, without a rest period between them. Within each block, stimuli (see below) were presented for 1 s.
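The Python sketch below lays out one run under this block structure. The text does not specify a pseudorandomization scheme, so the equal-count shuffle is an assumption, and the function names are hypothetical. The timing arithmetic uses parameters stated earlier: 14 blocks per run, TR of 2 s, and 1 s per image.

```python
import random


def make_run(conditions, n_blocks=14, seed=None):
    """Null block, equal numbers of shuffled stimulus blocks (no rest between them), null block."""
    n_stim = n_blocks - 2
    if n_stim % len(conditions):
        raise ValueError("stimulus blocks must split evenly across conditions")
    order = list(conditions) * (n_stim // len(conditions))
    random.Random(seed).shuffle(order)  # hypothetical pseudorandomization scheme
    return ["null"] + order + ["null"]


def run_timing(n_blocks, block_s, tr_s, image_s=1.0):
    """Return (run duration in s, number of volumes, images per stimulus block)."""
    duration = n_blocks * block_s
    return duration, int(round(duration / tr_s)), int(round(block_s / image_s))
```

With 16 s human blocks, a 14-block run lasts 224 s (112 volumes at a TR of 2 s). With the 40 s monkey blocks at MGH, the same layout lasts 560 s (280 volumes).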
Corresponding stimulus presentation details were similar for the monkeys tested at NIMH. There, stimuli were presented via a Sharp Notevision3 projector (resolution, 1024 × 768), via Presentation software (12.2). Each block lasted 40 s, during which 20 images were presented for 2 s each, alternating with 20 s fixation blocks (neutral gray background). Individual scanning runs began and ended with a block of baseline fixation.
Specific stimuli, human subjects
Scenes.
We used four different sets of scenes. Image set 1 comprised achromatic (gray-scaled) scenes: 23 images of furnished or empty rooms and 23 outdoor scenes (cities or natural landscapes). Set 2 included eight naturally colored images of the scanning rooms, all of which were familiar to the subjects (Rajimehr et al., 2009). Set 3 was eight achromatic scenes of familiar locations outside the scanning rooms, including both indoor and outdoor images. Set 4 included eight achromatic scenes of unfamiliar places, including both indoor and outdoor images.
Faces.
Three different sets of face images were used in this experiment. Image set 1 was 23 images of individual faces (contrasted with scene set 1). Set 2 included eight colored face mosaics, each composed of multiple equal-sized faces adjacent to each other and matched in retinotopic extent to the scene images (contrasted with scene set 2) (Rajimehr et al., 2009). Set 3 included computer-generated (FaceGen) faces, similar to those used by Yue et al. (2011).
Additional category-related images.
Set 1 included eight unfamiliar computer-generated objects (“blobs”) (Yue et al., 2011). Set 2 was eight images of tools (Bell et al., 2009). Set 3 included eight scrambled versions of the scene stimuli. The scrambled images were based on perturbing a random noise field at different scales, to match the original image statistics (Portilla and Simoncelli, 2000).
Retinotopic mapping.
To map the retinotopic organization within approximately the central half of the cortical representation (the central 10° radius of the visual field), we used two complementary sets of retinotopic stimuli. Set 1 was scenes and face mosaics (face set 2), which were presented within retinotopically limited apertures, on a black background. The retinotopic apertures included (1) a foveal disk (1.5° radius), (2) a peripheral annulus (5° inner radius and 10° outer radius), (3) an upper vertical meridian wedge (10° radius and 60° angle), (4) a lower vertical meridian wedge (10° radius and 60° angle), (5) a left horizontal meridian wedge (10° radius and 30° angle), and (6) a right horizontal meridian wedge (10° radius and 30° angle). Set 2 was phase-encoded, contrast-reversing (1 Hz) checkerboards within continuously rotating rays or continuously expanding/contracting ring stimuli, as described previously (Sereno et al., 1995; Tootell et al., 1997).
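The six apertures in set 1 can be written as masks over visual-field coordinates in degrees, as in the Python sketch below. It assumes that each wedge is centered on its meridian, so a 60° wedge spans ±30°, and that positive y is the upper visual field. Neither convention is stated in the text, and the dictionary keys and function names are hypothetical.

```python
import numpy as np


def polar_coords(x_deg, y_deg):
    """Eccentricity (deg) and polar angle (deg, counterclockwise from the right horizontal meridian)."""
    ecc = np.hypot(x_deg, y_deg)
    angle = np.degrees(np.arctan2(y_deg, x_deg)) % 360.0
    return ecc, angle


def angular_distance(a_deg, b_deg):
    """Smallest absolute difference between two polar angles, in degrees."""
    d = np.abs(a_deg - b_deg) % 360.0
    return np.minimum(d, 360.0 - d)


def wedge(x_deg, y_deg, center_deg, width_deg, radius_deg=10.0):
    """Wedge of total angle width_deg centered on center_deg, out to radius_deg."""
    ecc, angle = polar_coords(x_deg, y_deg)
    return (ecc <= radius_deg) & (angular_distance(angle, center_deg) <= width_deg / 2.0)


def annulus(x_deg, y_deg, inner_deg, outer_deg):
    """Ring between two eccentricities (a disk when inner_deg is 0)."""
    ecc, _ = polar_coords(x_deg, y_deg)
    return (ecc >= inner_deg) & (ecc <= outer_deg)


APERTURES = {
    "foveal_disk": lambda x, y: annulus(x, y, 0.0, 1.5),
    "peripheral_annulus": lambda x, y: annulus(x, y, 5.0, 10.0),
    "upper_vertical_meridian": lambda x, y: wedge(x, y, 90.0, 60.0),
    "lower_vertical_meridian": lambda x, y: wedge(x, y, 270.0, 60.0),
    "left_horizontal_meridian": lambda x, y: wedge(x, y, 180.0, 30.0),
    "right_horizontal_meridian": lambda x, y: wedge(x, y, 0.0, 30.0),
}
```

Because the functions are vectorized, a coordinate grid from `np.meshgrid` can be passed instead of scalars to rasterize each aperture for display.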
In one subject, we also mapped the representation of the far peripheral visual field, using radially scaled, contrast-reversing checkerboards presented at a range of eccentricities from 70° to (and beyond) the visible limits of the visual field, centered on the vertical and horizontal meridians (retinotopic set 3).
Specific stimuli, monkey subjects, MGH
Stimuli were identical to the human scene set 2, face set 2, and retinotopic set 1, described above.
Specific stimuli, monkey subjects, NIMH
The stimuli used at NIMH were achromatic photographs from three image categories, all relatively familiar to the monkeys. Set 1 was individually presented monkey faces, from the local colony. Set 2 was scenes of the NIMH scanning, training, and housing rooms. Set 3 was objects from those environments. Retinotopic stimuli were not used in the monkeys at NIMH.
Results
Overall
Figure 1A–E illustrates the group-averaged scene-selective activity from the main group of human subjects (n = 10; Table 1), using faces as control images, in the folded (Fig. 1A,B), inflated (Fig. 1C,D), and flattened (Fig. 1E) cortical surfaces. Consistent with previous studies (Epstein et al., 2007; Park and Chun, 2009), we found significantly higher responses to scenes in three main regions, bilaterally, in the vicinity of (1) PPA, (2) TOS, and (3) RSC.
For comparison, Figure 1F shows an fMRI map from an awake fixating macaque monkey, in response to the same stimuli, displayed in the same cortical surface format. As in humans, multiple scene-biased regions were evident in the macaque. Regions that appear to correspond in the two species (presumptive homologs) are named accordingly in white (Fig. 1, compare E, F). Below, we test this putative map correspondence in detail.
For simplicity and historical continuity, we used the original names for the human scene-selective regions PPA (Epstein et al., 1999), TOS (Grill-Spector, 2003), and RSC (Maguire, 2001). We also extended the original naming scheme to indicate presumptive monkey homologs of these areas, by adding “m” (i.e., mPPA, mTOS, mRSC). However, because the present evidence revealed inaccuracies in all of these names, a new set of names that remains accurate for both human and macaque cortex is proposed in the Discussion.
Human fusiform anatomy
To clarify the functional maps of PPA, it is helpful to first document a detail in the anatomical maps. Generally, the fusiform gyrus is described as a single uninterrupted gyrus (Polyak, 1957; Duvernoy, 1999). However, one group (Chao et al., 1999; Haxby et al., 1999) distinguished fMRI activity on the “medial fusiform” gyrus from that on the “lateral fusiform” gyrus. Here, we found that this functional subdivision has a rough anatomical correlate: the central portion of the fusiform gyrus is usually split along its length by a shallow sulcus. We named this the “middle fusiform sulcus”; it separates the “medial fusiform” gyrus from the “lateral fusiform” gyrus.
Figure 2 shows this anatomical feature in the averaged MRI-based cortical surfaces from two independent subject pools: (1) the current group average (n = 10; Fig. 2A,B) and (2) the averaged surfaces from the standard FreeSurfer average brain (n = 40; Fig. 2C,D). This cortical surface analysis averages the cortical folding pattern (i.e., the gyri and sulci) without conventional volumetric (3D) blurring. However, note that the cortical folds in each individual surface are best fit to the group-averaged folding pattern, so the individual maps are subject to minor 2D misalignment relative to the group map (Fischl et al., 1999).
The middle fusiform sulcus (white arrow) is apparent in both group-averaged cortical surfaces (Fig. 2B,D). In the n = 40 surface, the middle fusiform sulcus is only 2.5 mm deep, so its two banks together span only ∼5 mm of cortical surface. By contrast, the two sulci defining the external borders of the fusiform gyrus (i.e., the collateral and occipitotemporal sulci) are much deeper, with maximum depths of ∼10 and 6 mm, respectively. In our n = 17 subject pool, the values were similar: the depth and length of the middle fusiform sulcus in individual surfaces ranged from 2 to 5 mm and from 8 to 54 mm, respectively.
To confirm the presence of this sulcus in actual brains, we examined ex vivo brains from human autopsy. A middle fusiform sulcus was present in 20 of 24 hemispheres examined (83%). Examples are shown in Figure 3.
Human PPA
Figure 4 shows the location of scene-selective activity in this region (PPA), from the main human dataset (n = 10), based on group-averaged maps of the anatomy and function from a common set of subjects. Also shown is the center of fMRI activity (the voxel showing the highest statistical bias for scenes) in the group average (Fig. 4C) and in the individual data comprising our group map (Fig. 4D). Counter to expectations, we found that this ventral scene-selective region (the “parahippocampal place area”) was not centered on the parahippocampal gyrus. Instead, it was consistently centered near the lateral lip of the collateral sulcus, where it meets the medial fusiform gyrus, in both our group-averaged and the individual maps, in both hemispheres. Of course, a lower-amplitude activity bias could extend onto the parahippocampal gyrus, depending on the statistical threshold chosen, the levels of signal averaging and spatial filtering, and variations between individuals.
Role of stimulus variations
There is no single, quantifiable stimulus comparison for localizing PPA. Instead, different studies have localized this region based on correspondingly different scenes or houses, contrasted with various sets of faces, objects, body parts, and/or scrambled scenes. Thus, it could be argued that the location of PPA varies with the stimuli used to localize it. This could occur if the optimal stimuli vary continuously (instead of area-wise) across the cortical sheet (Wang et al., 1996) (but see Tootell et al., 2008). Alternatively, it could occur in some models of a distributed representation (Ishai et al., 2000a,b). Conceivably, either of these hypotheses could explain why the scene-selective patch found here lies lateral to, rather than on, the parahippocampal gyrus.
To address this, we directly tested whether the location and topography of PPA varies due to corresponding stimulus variations. Figure 5 shows the results produced by four different sets of scenes versus natural and computer-generated faces, objects, or scrambled scenes (see Materials and Methods). Despite these wide stimulus variations, the topography of PPA remained remarkably constant in comparisons within a common subject pool.
Thus, the unexpected localization of the scene-selective region here (away from the crown of the parahippocampal gyrus) cannot be attributed to stimulus differences between the current and past studies. Instead, these results in PPA are fully consistent with results in classic lower-level visual areas such as V1, V2, and MT, none of which changes shape or position across the cortical map with variations in object stimuli.
Meta-analysis
How does the unexpected PPA localization here compare with analogous localizations in the literature? To clarify this, the following meta-analysis was conducted. The centers of previously published scene-biased activity in this region were translated onto a common, standardized cortical surface (using FreeSurfer and its averaged human brain) based on Talairach coordinates (Talairach and Tournoux, 1988) reported in previous studies (Table 3). Coordinates were found in 12 neuroimaging comparisons of scenes or buildings, relative to faces, objects, or scrambled scenes. Each study was assigned a character, and that distribution is shown in Figure 6. Eleven studies were based on fMRI; one was based on PET.
The averaged center of PPA in the current data (asterisk, from Fig. 4C) lies squarely in the middle of these previously published sites; thus, our data were representative. Among the previously published sites, five were located on the crown of the medial fusiform gyrus, but none was on the crown of the parahippocampal gyrus. This and prior descriptions (Haxby et al., 1999; Levy et al., 2004) suggest that the parahippocampal place area is not centered on the parahippocampal gyrus (see Discussion); instead, it is located lateral to that gyrus. However, as noted above, submaximal activity beyond the center can extend onto adjacent regions of the cortical surface (including the parahippocampal gyrus), depending on thresholding and related factors.
All but one of the remaining sites were located along the lip of the collateral sulcus, which divides the medial fusiform gyrus from the parahippocampal gyrus. The confluence of published centers along the lip (but not within the depth) of the collateral sulcus may reflect signal contributions from the large vein that overlies the collateral sulcus (Menon et al., 1993; Kim et al., 1994), in addition to signal contributions arising from the gray matter itself.
The macaque homolog of human FFA
Human PPA is located immediately adjacent to FFA in the cortical sheet, on its medial side (the side closest to the splenium of the corpus callosum). Thus, any candidate homolog of PPA in the monkey (“mPPA”) should also lie immediately adjacent to monkey FFA (mFFA), on the side closest to the splenium.
To test that prediction, it was necessary to first localize mFFA as a reference landmark. Previously (Tsao et al., 2003; Rajimehr et al., 2009), the location of mFFA was defined based on quantitative transformation of cortical areas in the human and macaque maps, using fMRI and equivalent stimuli, based on maps from individual monkeys. In both reports, mFFA is the large, high-amplitude, face-selective patch located approximately midposteriorly along the length of the superior temporal sulcus, extending from the ventral bank onto the lip of the middle temporal gyrus (Fig. 1F, black asterisk).
However, additional face-responsive patches have also been reported in this cortical region, which might confuse the accurate localization of mFFA. In the simplest account, both monkeys and humans have two main face patches in corresponding cortical regions of each hemisphere, with the more posterior patch comprising (m)FFA (Pinsk et al., 2005, 2009; Hadj-Bouziane et al., 2008; Bell et al., 2009; Rajimehr et al., 2009). Another account is more complex: the monkey has either three (Tsao et al., 2003) or six (Tsao et al., 2008a) face patches in each hemisphere, whereas humans have three (Tsao et al., 2008a) in this occipito-temporal region.
It is possible that this discrepancy arises in part from variation in the individual maps chosen for illustration. To date, group-averaged maps have not been calculated for the monkey face patches, which would reduce or eliminate such individual variations. To remedy this, group-averaged maps were first calculated from the fMRI data from three monkeys (Table 2) used throughout this study, based on the same localizing stimuli used in human subjects (faces vs scenes). In the monkey experiments, we used an exogenous contrast agent (MION) (see Materials and Methods), which increased the spatial specificity of the MRI signal compared with the more conventional BOLD signal used in human studies (Mandeville and Marota, 1999; Vanduffel et al., 2001; Leite et al., 2002). These averaged data showed two main face patches in each hemisphere (Fig. 7), consistent with those described earlier (Pinsk et al., 2005, 2009; Hadj-Bouziane et al., 2008; Bell et al., 2009; Rajimehr et al., 2009) (Fig. 1E).
To confirm this finding, we calculated a second group-averaged map based on an additional and independent set (n = 4) of monkeys. This second set of activity maps was generated in a different laboratory (NIMH), using a different scanner, based on stimuli that were familiar to the monkeys (i.e., faces of conspecifics, scenes and objects from the laboratory)—as opposed to stimuli that were matched to the human localization studies, as tested first. Despite these technical differences, again the group averages showed two main face patches (Fig. 8, black asterisks and arrowheads), as expected from previous reports (ibid).
At lower thresholds, additional, smaller face-biased patches were sometimes found within a given monkey, as described previously (Tsao et al., 2008a; Ku et al., 2011). However, the presence and location of such additional patches varied across animals, dependent on threshold level and other factors. Accordingly, those patches did not survive group averaging. Note also that face-selective activity in mFFA sometimes extended farther posteriorly (in or near V4d) as in the human maps (Fig. 1F). However, in both species, retinotopic maps from the same subjects suggest that this variable posterior activity reflects a difference in stimulus size/position, not necessarily face selectivity per se.
Macaque PPA
Based on the cortical maps, a candidate mPPA should lie adjacent to this main face patch (mFFA) in the monkey cortical map, analogous to the relationship of FFA to PPA in the human map. Thus, in macaques, mPPA should lie on the crown of the middle temporal gyrus, slightly anterior and ventral to the posterior middle temporal sulcus.
Such a result has been shown in individual maps from two monkeys (Rajimehr et al., 2011). Here, that initial finding was confirmed in both sets of group-averaged data (Figs. 7, 8). In one hemisphere, scattered regions of scene-biased activity also extended into the region of the occipitotemporal sulcus (Fig. 7D). However, the location of this latter activity was inconsistent, whereas the peak of scene-selective activity in mPPA was consistent across all four averaged hemispheres (Figs. 7, 8).
Human TOS
In humans, an additional focus of scene-selective activity is found in dorsal occipital cortex (Nakamura et al., 2000; Grill-Spector, 2003; Hasson et al., 2003; Epstein et al., 2005; Park and Chun, 2009) (Figs. 1, 9). Depending on experimental details, that dorsal patch can be as prominent as the one in PPA, in both amplitude and topographical extent. However, this dorsal occipital patch has received relatively little attention.
In the original report, the dorsal patch of scene-selective activity was localized on the transverse occipital sulcus; thus, it was named “TOS.” However, before that time, a classic retinotopically defined area (“V3A”) was also localized on the transverse occipital sulcus (Tootell et al., 1997). Thus, either (1) the transverse occipital sulcus spans both activity-defined areas (i.e., V3A plus TOS), (2) the TOS region coincides with (or includes) V3A, or (3) the original localization of TOS is incorrect.
Our evidence supports the third hypothesis. When averaged across subjects and hemispheres, this scene-selective patch (TOS) was centered on the crown of the lateral occipital gyrus (Fig. 9), anterior and ventral to the transverse occipital sulcus. As in PPA, the centers of highest activity occurred on the edges of this gyrus, consistent with a contribution from the large veins overlying the adjacent sulci.
Human area V3A is easily defined based on retinotopic mapping stimuli, because it has a distinctive map of the complete contralateral visual field (Tootell et al., 1997). In Figure 10, we localized the scene-selective TOS region relative to retinotopically defined area V3A, within all hemispheres in which V3A was unambiguously defined, based on two retinotopic criteria: (1) upper versus lower field subdivisions and (2) horizontal versus vertical meridians (see Materials and Methods).
These data confirmed that TOS is consistently located immediately anterior and ventral to V3A, and dorsal to the confluent foveal representations in V1 through V3 (Fig. 10). Thus, TOS lies within explicitly retinotopic cortex—extending from V7 (Tootell et al., 1998) through V3B (Press et al., 2001) and LO-1 (Larsson and Heeger, 2006).
Macaque TOS
Next, we tested whether a TOS homolog (“mTOS”) exists in macaque visual cortex. When translated from the human maps to the macaque maps, a homolog for human TOS should lie immediately anterior to macaque V3A (Gattass et al., 1988), in macaque “V4d,” and/or the newly described retinotopic representations CIP-1, CIP-2 (Arcaro et al., 2011), and perhaps also the DP (dorsal prelunate) gyrus (Andersen et al., 1990; Heider et al., 2005).
However, this specific human-to-monkey prediction is complicated by the existing maps of macaque V3A, which are not perfectly clear. The original single-unit maps of V3A frequently showed a representation of the contralateral 180° on the anterior bank of the lunate sulcus, posterior to the prelunate gyrus (Van Essen and Zeki, 1978; Gattass et al., 1988). However, in some animals, the anterior (upper field) representation in V3A was less certain (Gattass et al., 1988). A similar uncertainty can be seen in fMRI maps of V3A in some macaques (Fig. 11, upper field representation). When defined by variations in polar angle, the fMRI maps of V3A in macaque consistently extend over the prelunate gyrus (Arcaro et al., 2011) (Fig. 11).
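Polar-angle maps like these are typically derived from phase-encoded rotating-ray stimuli of the kind described in Materials and Methods. The Python sketch below is illustrative only. It takes the Fourier phase of a voxel's time series at the stimulus frequency, removes an assumed fixed hemodynamic lag, and converts the result to a polar angle. The 4 s lag, the starting angle, and the counterclockwise rotation are assumptions. Studies often cancel the lag instead by averaging runs with opposite rotation directions. The function names are hypothetical.

```python
import numpy as np


def response_phase_deg(ts, n_cycles):
    """Phase (0-360 deg) of a time series at the stimulus frequency (n_cycles per run)."""
    x = np.asarray(ts, dtype=float)
    x = x - x.mean()
    component = np.fft.rfft(x)[n_cycles]
    # cos(2*pi*k*n/N - phi) puts angle -phi in Fourier bin k, hence the sign flip.
    return float(np.degrees(-np.angle(component)) % 360.0)


def coherence(ts, n_cycles):
    """Amplitude at the stimulus frequency relative to the total non-DC amplitude."""
    x = np.asarray(ts, dtype=float)
    amp = np.abs(np.fft.rfft(x - x.mean()))
    return float(amp[n_cycles] / np.sqrt(np.sum(amp[1:] ** 2)))


def polar_angle_deg(ts, n_cycles, period_s, hrf_lag_s=4.0, start_deg=0.0, clockwise=False):
    """Polar angle (deg) of the ray position that best drives this voxel, under the assumptions above."""
    phase = response_phase_deg(ts, n_cycles) - 360.0 * hrf_lag_s / period_s  # remove assumed lag
    direction = -1.0 if clockwise else 1.0
    return (start_deg + direction * phase) % 360.0
```

Thresholding on `coherence` and then mapping `polar_angle_deg` across the cortical surface yields the kind of upper/lower field and meridian map used above to delimit V3A.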
In all three animals in which the MR slice prescription included this region (MGH), we found patches of scene-selective activity in this general location, extending variably across both sides of the prelunate gyrus (Figs. 1F, 7, black arrows). In two monkeys, we were also able to map the retinotopy (Fig. 11). Direct comparison between the scene-biased and retinotopic maps showed that mTOS included area V4d, which is roughly the topographic equivalent of human areas V7, V3B, and LO-1 (Fig. 11C,D,F). However, in macaques, this scene-selective activity also extended into area V3A, to a variable degree. In one hemisphere, mTOS lay mainly in area V3A, without any clear activity in area V4d (Fig. 11E). Thus, mTOS activity included V3A (as defined by polar angle), plus areas more anterior to V3A (as in human TOS). Given the uncertainty in the definition of macaque V3A, this extension does not argue strongly against homology, and macaque TOS is likely homologous to human TOS.
Human RSC
A third patch of scene-selective fMRI activity was noted in human studies (Maguire et al., 1998; O'Craven and Kanwisher, 2000) and eventually attributed to RSC (Maguire, 2001), referring to architectonically defined retrosplenial cortex (Brodmann, 1909). However, the fMRI-defined scene-selective RSC has not been localized in detail.
In our human maps, scene-selective RSC was consistently located in the fundus of the parieto-occipital sulcus, bilaterally (Fig. 12A,B). Extrapolating from many early architectonic studies, the scene-selective RSC region thus lies near the peripheral retinotopic representations of primary and secondary visual cortex, V1 and V2. To localize these regions in more detail, we first compared functional and anatomical maps based on group-averaged data (Fig. 12). Scene-selective RSC was localized using our main group-averaged data based on faces versus scenes, as described above. V1 was localized anatomically, based on increased myelination in the stria of Gennari (Hinds et al., 2008), as translated to the current brain surface using spherical coordinates (Fischl et al., 1999). The topography of V2 was based on the following two kinds of data: (1) previous fMRI studies of the retinotopy in human V2 (Sereno et al., 1995; DeYoe et al., 1996; Engel et al., 1997; Pitzalis et al., 2006, 2010) up to 60° eccentricity, and (2) flattened human cortical tissue stained for cytochrome oxidase (Tootell and Taylor, 1995; Horton and Hocking, 1998) including the far peripheral representation, which reveals thin stripes that are known to span the width of V2 (Tootell et al., 1983; Horton, 1984).
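The step of translating a group-defined label (here, the myelination-based V1 border) "to the current brain surface using spherical coordinates" can be illustrated with a toy example. The sketch below assumes that both surfaces have already been registered to a common unit sphere (the registration itself, as in Fischl et al., 1999, is not shown) and simply carries each label to a target vertex by nearest-neighbor lookup on that sphere. The vertex coordinates, labels, and function names are hypothetical, and this is a simplification rather than the procedure implemented in FreeSurfer.

```python
import math

# Hypothetical sketch: transfer labels between two cortical surfaces that are
# assumed to be already registered to a common unit sphere. Each vertex is
# given as (latitude, longitude) in degrees on that sphere.


def to_cartesian(lat_deg, lon_deg):
    """Convert spherical (lat, lon) in degrees to a unit 3D vector."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))


def transfer_labels(source_vertices, source_labels, target_vertices):
    """Give each target vertex the label of its nearest source vertex.

    On a unit sphere, the largest dot product corresponds to the smallest
    great-circle distance, so no explicit arc-length computation is needed.
    """
    source_xyz = [to_cartesian(*v) for v in source_vertices]
    transferred = []
    for v in target_vertices:
        t = to_cartesian(*v)
        best = max(range(len(source_xyz)),
                   key=lambda i: sum(a * b for a, b in zip(source_xyz[i], t)))
        transferred.append(source_labels[best])
    return transferred


if __name__ == "__main__":
    src = [(0.0, 0.0), (0.0, 90.0), (45.0, 180.0)]
    labels = ["V1", "V2", "other"]
    tgt = [(5.0, 3.0), (-2.0, 85.0)]
    print(transfer_labels(src, labels, tgt))
```

Working on the sphere rather than in the folded volume is what allows a label defined on one (group-averaged) surface to be mapped onto another surface whose folding pattern differs.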
According to these group data, RSC is located immediately adjacent to V1. The close proximity of RSC to V1 and V2 is somewhat surprising, given the higher-order properties reported for RSC (Epstein et al., 2007; Park and Chun, 2009; Vann et al., 2009) (see Discussion).
These maps also revealed a partially mirror-symmetrical topography in scene-selective regions PPA and RSC (Fig. 12). Although PPA lies farther from the border with V1, both RSC and PPA lie adjacent to the peripheral representation of V2: PPA is located adjacent to the representation of the upper visual field, whereas RSC lies adjacent to the representation of the lower visual field.
Given these unexpected results in the group-averaged data, we conducted more detailed tests to confirm these conclusions in an individual subject. Figure 12D–F shows those results, based on patterns of fMRI activity produced by (1) scenes versus faces (set 2; to label RSC and PPA); (2) vertical versus horizontal meridians in the central 20° (retinotopic set 1); and (3) monocular stimulation of the visible limit of the ipsilateral far periphery of the visual field (the monocular crescent), versus the (invisible) farther periphery (see Materials and Methods). As a reference, we also included the group-averaged border of V1 based on the stria of Gennari.
Overall, we found a good match between the group-averaged data and the individual data. The retinotopically defined V1/V2 border (the vertical meridian representation) in the individual subject corresponded well with the myelination boundary in the group-averaged map (Fig. 12E), within approximately the central half of V1, where both measures were available. In addition, the peripheral extent of checkerboard-driven activation in the individual map coincided with the peripheral border of V1 in the myelination map (Fig. 12F). This checkerboard-driven activity spread slightly into adjacent areas, including presumptive V2 and the posterior portion of PPA. Such spread was expected; previous studies have demonstrated that both V2 (Sereno et al., 1995; DeYoe et al., 1996; Engel et al., 1997) and PPA (Rajimehr et al., 2011) are strongly activated by flickering checkerboards.
As in the group map, RSC in this individual map was located immediately adjacent to the dorsal border of peripheral V1, thus occupying what would otherwise be the peripheral representation of V2. Also consistent with the group comparison, PPA was located adjacent to peripheral V2, at an eccentricity similar to (or even more peripheral than) that of RSC.
Macaque RSC
Based on the translation of cortical maps across species, a presumptive macaque homolog of RSC should be located on the medial bank, in or adjacent to the medial parieto-occipital sulcus (POm) (Pitzalis et al., 2006). In at least one monkey, we confirmed the presence of such a scene-biased patch, bilaterally (Fig. 13). As in human RSC, this presumptive macaque homolog of RSC (“mRSC”) was small and low in amplitude in response to the localizer used here. This small size and low amplitude may explain why mRSC did not reach threshold in the group map (n = 3; Fig. 7C,D).
Discussion
The correspondence between scene-selective regions in human and macaque cortex is diagrammed in Figure 14.
Human PPA
We found that scene-selective fMRI activity in PPA was typically centered on the lips of the collateral sulcus and adjacent medial fusiform gyrus, rather than on the parahippocampal gyrus per se. This was borne out in our MRI data (Figs. 4, 5) and in a meta-analysis of the literature (Fig. 6). This finding is also consistent with a few reports describing functionally equivalent regions on the collateral sulcus (Levy et al., 2004) or medial fusiform gyrus (Chao et al., 1999; Haxby et al., 1999).
This discrepancy with the conventional localization of PPA cannot easily be attributed to differences in experimental design or stimuli relative to previous localizers. Although the size of PPA varied according to the stimuli we tested, the peak location and the topography of this area remained remarkably constant within a given set of subjects (Fig. 5).
Medial fusiform gyrus
In two independent group-averaged cortical surfaces (n = 17 and n = 40; Fig. 2), and in 20 of 24 human brains from autopsy (Fig. 3), we documented that a shallow sulcus (the middle fusiform sulcus) subdivides the fusiform gyrus into two parallel branches: the lateral and medial fusiform gyri. This middle fusiform sulcus roughly divides the scene-responsive fMRI activity (on the medial fusiform gyrus) from face-responsive activity (on the lateral fusiform gyrus). Because the middle fusiform sulcus was not considered in the original report (Epstein and Kanwisher, 1998), it remains true that PPA is located on the gyrus immediately medial to “FFA,” in both the present and the original accounts.
Macaque PPA
We compared maps across species in the cortical sheet, using functional landmarks, without considering the cortical folding patterns. This approach has become standard (Van Essen et al., 2001; Tootell et al., 2003; Orban et al., 2004; Sereno and Tootell, 2005), partly because gyri and sulci vary enormously across species. For instance, macaques do not have a fusiform gyrus. Even when similar cortical folds exist, homologous areas vary in location relative to those folds across species. For example, the well-established direction-selective area MT/V5 is located in the superior temporal sulcus in macaque, but in the inferior temporal sulcus in humans.
Previously (Rajimehr et al., 2011), we presented evidence for mPPA in two individual monkeys. Here, we confirmed that finding in seven animals, in two independent group averages. In all cases, mPPA was defined as a patch of scene-responsive activity (Figs. 7, 8) centered exactly where a macaque homolog of human PPA should lie, adjacent to the most prominent face patch (mFFA). In the folded brain, this location is ventral and slightly anterior to the posterior middle temporal sulcus (PMTS). Area TEO is centered roughly on the PMTS (Boussaoud et al., 1991); thus, mPPA apparently lies immediately anterior to TEO. Like human PPA, mPPA is elongated along the posterior-to-anterior axis (Figs. 1, 7, 8). Thus, by the local-neighborhood criterion, the human-to-macaque match is good. A more global comparison, including areas much farther from PPA (e.g., the anterior temporal lobe and the subiculum), may not match as well, consistent with the disproportionate expansion of some cortical regions in humans relative to macaques (Fig. 14).
Human TOS
Our data (Figs. 9, 10) indicate that the human scene-selective region TOS is actually centered on the nearby lateral occipital gyrus, rather than within its namesake, the transverse occipital sulcus. As shown previously (Tootell et al., 1997), the transverse occipital sulcus spans a different, retinotopically defined area, V3A. Thus, scene-selective TOS should lie immediately anterior and lateral to retinotopically defined V3A, in or near retinotopic human areas V7 (Tootell et al., 1998), V3B (Press et al., 2001), and/or LO-1 (Larsson and Heeger, 2006). That conclusion was confirmed here in six hemispheres (Fig. 10), consistent with earlier illustrations in two hemispheres (Levy et al., 2004) and in one of two hemispheres in the study by Spiridon et al. (2006).
Macaque TOS
Macaque cortical maps showed a corresponding cluster of scene-selective patches in dorsal occipital cortex (mTOS) (Figs. 1, 11). As in human TOS, mTOS includes the area anterior to macaque V3A (i.e., area V4d). In macaques, mTOS also extends posteriorly into V3A (Fig. 11), depending on how V3A is defined.
This possible posterior extension of mTOS in macaques (relative to humans) does not argue against homology, because incremental changes occur naturally as cortical maps evolve across species. Moreover, if there is an interspecies shift in (m)TOS relative to V3A, this has a precedent in the existing literature. In humans, V3A shows high motion selectivity (Tootell et al., 1997). However, in macaques, higher motion selectivity is instead found in area V3 (Van Essen et al., 1990). To the extent that mTOS includes V3A, the region of high scene selectivity would thus be located adjacent and anterior to the region of higher motion selectivity (Fig. 11), in both humans and macaques. That is, both functional properties (sensitivity to motion and sensitivity to scenes) would have shifted by a single area.
RSC
A third scene-responsive area was named RSC, with reference to the architectonically defined retrosplenial cortex [areas 26, 29, and 30 of Brodmann (1909)]. However, Brodmann's report of small cytoarchitectonically defined areas located posterior to the splenium (i.e., BA 26, 29, and 30) was not confirmed by subsequent anatomists (Economo, 1929; Bailey and von Bonin, 1951), nor was an analogous area reported in macaque (Brodmann, 1909). More importantly, the location of Brodmann areas 26, 29, and 30 does not overlap with the location of scene-selective RSC. Recently, the original definition has been blurred further, by substantially broadening its borders (Fenske et al., 2006; Epstein et al., 2007) and/or by renaming it (retrosplenial “complex”) (Bar, 2007). In all of our data, scene-selective RSC is a discrete region consistently located in the fundus of the parieto-occipital sulcus, ∼1 cm from the original Brodmann areas.
Surprisingly, we also found that RSC is located immediately adjacent to V1, in what would otherwise be the peripheral representation of dorsal V2. Except for RSC, V1 is bordered mainly by the second-tier cortical area V2. Thus, RSC is unusual: it is an apparently higher-tier area (Park and Chun, 2009) that nevertheless borders the two lowest-level areas in the cortical visual hierarchy (Van Essen et al., 1990). Functionally similar areas are often located near each other (e.g., area MT/V5 and surrounding direction-selective areas), presumably because such adjacency shortens the more numerous connections between functionally related areas. However, counterexamples also exist, in which adjacent areas are not functionally similar. The proximity of RSC to V1/V2 may be an example of the latter.
The topography of these three areas (V1, V2, and RSC) is consistent with certain observations in the literature. First, Gattass et al. (1988) reported that, unlike V1, V2 does not include a representation of the far peripheral visual field. Such a retinotopic difference would “make room” for RSC along the V1 border, as reflected in our data. Second, our data are consistent with evidence for an asymmetry between dorsal and ventral V2 in macaques (Van Essen et al., 1984; Felleman and Van Essen, 1991).
An even more restricted representation of eccentricity has also been reported in area V3 (Van Essen et al., 1984; Gattass et al., 1988). Analogous to the case of RSC, such an arrangement would make room for PPA adjacent to peripheral V2 (Fig. 12).
Nomenclature
The present data reveal numerous complications in the current names for scene-selective cortical regions. The human regions are not centered on the gyri/sulci for which they are named, and the human names cannot be accurately generalized to homologous areas in macaques. Discrepancies of the latter kind are common in cross-species comparisons, because different species develop different sulci and gyri.
Above, we used the original names for the scene-selective regions, for historical continuity. However, in Figure 14, we propose a simple alternative naming scheme that remains accurate across both humans and macaques. In the new scheme, regions PPA, TOS, and RSC are renamed VS, DS, and MS, respectively (for ventral, dorsal, and medial regions of scene responsivity). Corresponding regions in humans and macaques would be distinguished using the prefix “h” or “m,” yielding hVS, hDS, and hMS in humans, and mVS, mDS, and mMS in monkeys.
Future directions
The demonstration of scene-selective regions in macaques enables future experiments using classical neurobiological techniques to reveal common neural mechanisms underlying scene processing. For instance, what are the functional properties of single units in each of these scene-selective patches? Do the different scene-selective regions share specific neural connections with each other, and/or with higher-level brain regions implicated in place processing (e.g., hippocampus via entorhinal cortex) and/or spatial navigation (the dorsal stream)? An analogous proliferation of knowledge about neural mechanisms followed the fMRI demonstration of “face-selective” patches in macaques (Tsao et al., 2003), which was itself prompted by fMRI studies of face-selective regions in humans (Kanwisher et al., 1997; Haxby et al., 2000; Rajimehr et al., 2009). We hope that the current study will serve a similar purpose.
Footnotes
This work was supported by NIH Grants R01 MH67529 and R01 EY017081 (R.B.H.T.), the Martinos Center for Biomedical Imaging, the NCRR, the MIND Institute, Shared Instrumentation Grants 1S10RR023401, 1S10RR019, and 1S10RR023043, and the NIMH Intramural Research Program.
Correspondence should be addressed to Shahin Nasr, Martinos Center for Biomedical Imaging, Massachusetts General Hospital, 149 13th Street, Charlestown, MA 02129. shahin{at}nmr.mgh.harvard.edu
This article is freely available online through the J Neurosci Open Choice option.