A key component of spatial navigation is the ability to use visual information to ascertain where one is located and how one is oriented in the world. We used functional magnetic resonance imaging to examine the neural correlates of this phenomenon in humans. Subjects were scanned while retrieving different kinds of topographical and nontopographical information in response to visual scenes. In the three critical conditions, they viewed images of a familiar college campus, and reported either the location of the place depicted in the image (location task), the compass direction that the camera was facing when the image was taken (orientation task), or whether the location was on campus or not (familiarity task). Our analyses focused on the retrosplenial cortex (RSC)/parietal-occipital sulcus region and the parahippocampal place area (PPA), which previous studies indicate play a critical role in place recognition. RSC activity depended on the type of information retrieved, with the strongest response in the location task. In contrast, PPA activity did not depend on the retrieval task. Additional analyses revealed a strong effect of familiarity in RSC but not in the PPA, with the former region responding much more strongly to images of the familiar campus than to images of an unfamiliar campus. These results suggest that the PPA and RSC play distinct but complementary roles in place recognition. In particular, the PPA may primarily support perception of the immediate scene, whereas RSC may support memory retrieval mechanisms that allow the scene to be localized within the broader spatial environment.
Spatial navigation is a core cognitive ability in humans and animals. Neurophysiological studies, primarily in animals, have identified cell populations that encode spatial quantities useful for navigation, including place cells in the hippocampus (O'Keefe and Dostrovsky, 1971), head-direction (HD) cells in Papez circuit structures (Taube et al., 1990; Taube, 1998), and grid cells in entorhinal cortex (Hafting et al., 2005). An extensive literature has developed around these results, and computational models that incorporate these neurophysiological data have been proposed (Redish, 1999; Byrne et al., 2007).
The neural basis of spatial navigation in humans is less clear. Although place cells have been identified in the human medial temporal lobes (Ekstrom et al., 2003), neurophysiological studies in humans are limited because they can only be performed on restricted clinical populations. Neuroimaging studies, in contrast, can be performed on normal subjects and have the advantage that they allow simultaneous monitoring of neural activity across the entire brain. These studies have identified a set of cortical regions, including parahippocampal cortex (PHC), retrosplenial cortex (RSC), and posterior and medial parietal cortices, that respond more strongly during virtual or imagined navigation than during non-navigational control tasks (Aguirre et al., 1996; Maguire et al., 1998; Ino et al., 2002; Hartley et al., 2003). These regions have also been implicated in navigationally relevant functions such as scene encoding (Epstein, 2005) and spatial memory retrieval (Burgess et al., 2001; Kohler et al., 2002; Shelton and Gabrieli, 2002; Wolbers and Buchel, 2005). Neuropsychological reports indicate that damage to these regions can lead to impairments that are specific to the topographical domain (Habib and Sirigu, 1987; Takahashi et al., 1997; Aguirre and D'Esposito, 1999; Mendez and Cherrier, 2003). In summary, previous work supports the idea that PHC, RSC, and parietal regions mediate navigationally relevant processes in humans.
Complementary to this neuroscientific work, behavioral studies have distinguished between various visuospatial mechanisms that might be useful for navigation, including mechanisms for recognizing viewpoint-specific scene “snapshots” (Diwadkar and McNamara, 1997; Shelton and McNamara, 1997) and mechanisms for orienting and localizing the observer in world-centered space (Mou and McNamara, 2002; McNamara et al., 2003; Burgess et al., 2004). However, the precise link between these cognitive mechanisms and their neural substrates remains unclear. One possibility is that certain regions, or combinations of regions, might preferentially support one or more of these mechanisms. For example, we have previously hypothesized that the parahippocampal place area (PPA) (Epstein and Kanwisher, 1998) and RSC might support distinct but complementary processes for place recognition, with the PPA encoding information about the spatial structure of the local scene (akin to a scene “snapshot”), and RSC encoding more large-scale spatial representations that allow the local scene to be situated in a broader environment that may extend beyond the current horizon. If this is the case, then the PPA should be activated during simple perceptual analysis of a scene, whereas RSC activity should depend on retrieval of long-term spatial knowledge. Although a previous study from our laboratory found indirect evidence for this proposal (Epstein and Higgins, 2007), we have not previously tested it directly.
In the current study, subjects were scanned with functional magnetic resonance imaging (fMRI) while they viewed campus scenes and retrieved information about the location, orientation, or familiarity of the depicted place. If the proposed division of labor were correct, we predicted that the RSC would show greater response during location and orientation judgments than during simple familiarity judgments, because only the former judgments require explicit retrieval of long-term spatial knowledge. In contrast, we predicted that PPA activity would be less sensitive to task requirements, because all three tasks involved perception of scenes.
Materials and Methods
Fifteen subjects with normal or corrected-to-normal vision were recruited from the University of Pennsylvania community and gave written informed consent according to procedures approved by the University of Pennsylvania institutional review board. All subjects were required to have spent ≥2 years on campus to ensure familiarity with campus locations. Subjects were paid for their participation.
Scanning was performed at the Center for Functional Neuroimaging at the University of Pennsylvania on a 3T Siemens (Erlangen, Germany) Trio equipped with an eight-channel multiple-array Nova Medical (Wilmington, MA) head coil. T2*-weighted images sensitive to blood oxygenation level-dependent contrasts were acquired using a gradient-echo echo-planar pulse sequence [repetition time (TR), 3000 ms; echo time (TE), 30 ms; voxel size, 3 × 3 × 3 mm; matrix size, 64 × 64 × 45]. Structural T1-weighted images for anatomical localization were acquired using a three-dimensional magnetization-prepared rapid-acquisition gradient echo pulse sequence [TR, 1620 ms; TE, 3 ms; inversion time, 950 ms; voxel size, 0.9766 × 0.9766 × 1 mm; matrix size, 192 × 256 × 160]. Visual stimuli were rear projected onto a Mylar screen at the head of the scanner with an Epson (Long Beach, CA) 8100 3-liquid crystal display projector equipped with a Buhl long-throw lens (Navitar, Rochester, NY) and viewed through a mirror mounted to the head coil. Responses were recorded using a four-button fiber optic response pad system.
A digital camera was used to obtain 60 color images of the University of Pennsylvania campus. Images were taken at 30 distinct locations, 15 of which were on the east side of campus and 15 of which were on the west side of campus. For purposes of this experiment, “east” and “west” locations were defined by reference to 36th Street, which runs through the center of campus. At each of these 30 locations, one image was taken of the view observed when facing west, and one image was taken of the view observed when facing east. Pilot surveys were used to identify locations that were familiar to most University of Pennsylvania students. In addition, 30 photographs of the University of Illinois (Urbana-Champaign, IL) campus (which was unfamiliar to all subjects) and 60 photographs of common objects (30 vehicle and 30 nonvehicle objects) were used as control stimuli.
The experiment consisted of four experimental scans followed by two functional localizer scans. Before entering the scanner, each subject was asked to analyze maps of the University of Pennsylvania campus and view pictures taken along 36th Street, to ensure that he/she was familiar with the location of the street, which was used as a reference for location judgments (see below). The pictures shown in this prescan orientation were not shown again during the experiment.
Experimental scans were 8 min 36 s long and were divided into twelve 36 s blocks, interleaved with 6 s interblock fixation periods and an additional 12 s fixation period at the end of the scan. In each block, subjects viewed five color photographs for 3.3 s each with 2.7 s interstimulus intervals and performed one of four possible tasks (see Fig. 1): (1) in “location” blocks, they used a button box to report whether each photograph depicted a place located east or west of 36th Street (regardless of camera orientation); (2) in “orientation” blocks, they reported whether each photograph was taken by a camera facing east or west (regardless of camera location); (3) in “familiarity” blocks, they reported whether each photograph was taken on the Penn campus or elsewhere; (4) in “object” blocks, they viewed object photographs and reported whether each one depicted a vehicle or nonvehicle object. Subjects were instructed to make their responses during the time that either the stimulus or the fixation cross appeared on the screen. The task and response alternatives for each block were indicated by a visual word prompt that preceded the block and was on the screen for 4 s followed by a 2 s blank interval before the presentation of the first photograph. In addition, a small visual word prompt remained on the screen in the bottom left corner throughout the block to ensure that subjects would not forget the appropriate task.
All told, there were 48 blocks in the experiment. Because each block was composed of five distinct trials, this made for 240 trials total (60 for each condition). The same set of 60 Penn images was used to construct the location and the orientation blocks. Of these 60 images, 30 were used to construct half of the trials in the familiarity blocks (with the choice counterbalanced across subjects). The remaining 30 familiarity trials were constructed using the Illinois photographs. Note that the 6 s temporal separation between trial onsets made it feasible to analyze the data in an event-related manner; however, we presented the trials within blocks to minimize the effects of switching between tasks.
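The scan timing and trial counts given above can be checked with a few lines of arithmetic. The following Python sketch only restates numbers from the text; the constant and function names are illustrative, and it assumes that one 6 s fixation period accompanies each of the twelve blocks in a scan, which is what makes the stated total of 8 min 36 s come out.

```python
# Design parameters as stated in the text (durations in seconds unless noted).
N_SCANS = 4
BLOCKS_PER_SCAN = 12
BLOCK_DURATION = 36
INTERBLOCK_FIXATION = 6
FINAL_FIXATION = 12
TRIALS_PER_BLOCK = 5
N_TASKS = 4
STIMULUS_MS = 3300
ISI_MS = 2700


def scan_duration_s():
    """Total length of one experimental scan in seconds."""
    # Assumption: one 6 s fixation period per block (twelve per scan).
    per_block = BLOCK_DURATION + INTERBLOCK_FIXATION
    return BLOCKS_PER_SCAN * per_block + FINAL_FIXATION


def trial_counts():
    """Return (total blocks, total trials, trials per task)."""
    total_blocks = N_SCANS * BLOCKS_PER_SCAN
    total_trials = total_blocks * TRIALS_PER_BLOCK
    return total_blocks, total_trials, total_trials // N_TASKS


minutes, seconds = divmod(scan_duration_s(), 60)
print(minutes, seconds)                # 8 36
print(trial_counts())                  # (48, 240, 60)
print((STIMULUS_MS + ISI_MS) / 1000)   # 6.0 s between trial onsets
```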
Functional localizer scans were 8 min 15 s long and were divided into 15 s picture epochs during which subjects viewed color photographs of scenes, common objects, and other stimuli presented at a rate of 1.33 pictures/s in a blocked design as described previously (Epstein et al., 2005). The scenes in the functional localizer scans depicted unfamiliar places that were not shown in the main experiment.
After leaving the scanner, subjects were asked to rate their familiarity with each location from the Penn stimulus set on a scale from one to five, with a rating of one indicating a complete lack of familiarity with the depicted place and a rating of five indicating complete familiarity with the place.
Functional images were corrected for differences in slice timing by resampling slices in time to match the first slice of each volume, realigned with respect to the first image of the scan, spatially normalized to the Montreal Neurological Institute template, and spatially smoothed with an 8 mm full-width at half-maximum (FWHM) Gaussian filter. Data were analyzed using the general linear model as implemented in VoxBo (www.voxbo.org) including an empirically derived 1/f noise model, filters that removed high and low temporal frequencies, regressors to account for global signal variations, and nuisance regressors to account for between-scan differences.
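For readers unfamiliar with the FWHM convention, the width of a Gaussian smoothing kernel is related to its standard deviation by FWHM = 2·sqrt(2·ln 2)·sigma. The short Python sketch below applies this standard relation to the 8 mm kernel and the 3 mm voxel size reported above; the function name is illustrative and the code is not part of the VoxBo pipeline.

```python
import math


def fwhm_to_sigma(fwhm):
    """Convert a Gaussian full-width at half-maximum to a standard deviation."""
    return fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


sigma_mm = fwhm_to_sigma(8.0)    # 8 mm FWHM smoothing kernel
sigma_voxels = sigma_mm / 3.0    # expressed in units of 3 mm voxels
print(round(sigma_mm, 3))        # 3.397
```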
Eight regressors (two for each task) were used to model the effects of interest. Specifically, location trials were classified as either location A or location B depending on whether the image shown on that trial was also shown in the familiarity condition (location A) or not shown in the familiarity condition (location B). A similar division was made for orientation trials (orientation A vs orientation B). Familiarity trials in which Penn images were shown were modeled separately from familiarity trials in which Illinois images were shown (Penn vs not Penn). Thus, comparison of response between location A, orientation A, and Penn conditions allowed us to measure the effect of task across three conditions that used the same stimulus set, whereas comparison between Penn and not Penn allowed us to measure the effect of familiarity across two conditions in which subjects performed the same task. Because these were the effects of interest, we report activation during location A, orientation A, Penn, and not Penn and do not consider the response during location B and orientation B. For brevity, we will use the terms location and orientation to refer to location A and orientation A conditions. To ensure an equal number of trials across conditions, vehicle and nonvehicle trials were modeled separately within the object blocks, but because this difference was not of interest, the response to objects reflects the average of these two regressors. Each trial was modeled as an impulse response function convolved with a canonical hemodynamic response function.
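As a rough illustration of how a single condition regressor of this kind can be built, the Python sketch below places a unit impulse at each trial onset and convolves the result with a hemodynamic response function sampled at the 3 s TR. Several things here are assumptions rather than details from the text: the double-gamma shape (gamma densities with shape parameters 6 and 16, undershoot scaled by 1/6) is a commonly used form and may differ from the canonical function in VoxBo, the 172 volumes per scan follow from 516 s / 3 s only if no volumes were discarded, and the onsets are hypothetical.

```python
import math

TR = 3.0  # repetition time in seconds, from the acquisition parameters


def canonical_hrf(t):
    """Double-gamma HRF at time t (s); an assumed shape, not VoxBo's own."""
    if t <= 0:
        return 0.0
    peak = t ** 5 * math.exp(-t) / math.gamma(6)          # gamma density, shape 6
    undershoot = t ** 15 * math.exp(-t) / math.gamma(16)  # gamma density, shape 16
    return peak - undershoot / 6.0


def build_regressor(onsets_s, n_volumes, tr=TR, hrf_length_s=30.0):
    """Convolve unit impulses at the given onsets with the sampled HRF."""
    impulses = [0.0] * n_volumes
    for onset in onsets_s:
        index = int(round(onset / tr))  # nearest volume to the onset
        if 0 <= index < n_volumes:
            impulses[index] += 1.0
    kernel = [canonical_hrf(k * tr) for k in range(int(hrf_length_s / tr) + 1)]
    regressor = [0.0] * n_volumes
    for i, amplitude in enumerate(impulses):
        if amplitude == 0.0:
            continue
        for k, weight in enumerate(kernel):
            if i + k < n_volumes:
                regressor[i + k] += amplitude * weight
    return regressor


# Hypothetical onsets: five trials spaced 6 s apart, starting 6 s into the scan.
onsets = [6.0 + 6.0 * k for k in range(5)]
regressor = build_regressor(onsets, n_volumes=172)
print(len(regressor))  # 172
```

In a full model, one such column would be built per condition and entered into the design matrix alongside the nuisance terms.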
In addition to the analysis described above, we performed three additional analyses of the data to better understand the relationship between neural activity and behavioral performance. First, to visualize the relationship between reaction time (RT) and fMRI response, we performed an analysis in which the trials for each condition were divided into quintiles according to RT (e.g., fastest 20% of trials equals the first quintile, next fastest 20% equals the second quintile, etc.), and each quintile was modeled using a separate regressor (Sayres and Grill-Spector, 2006). Second, to determine whether fMRI response differences between conditions could be explained by RT differences, we performed an analysis in which an additional covariate modeling the RT for each trial was added to the eight standard regressors. Finally, to determine whether fMRI response differences between conditions could be explained by differences in behavioral accuracy, we performed an analysis in which responses to correct and incorrect trials were modeled separately for each condition. The results for correct trials in this last analysis were similar to those obtained for all trials in the eight-regressor model and will not be considered further.
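The quintile split used in the first of these analyses can be sketched in Python as follows. This is a minimal illustration with hypothetical RTs; the handling of ties (by original trial order) and of trial counts that are not multiples of five is an assumption, since the text does not specify either.

```python
def assign_rt_quintiles(rts):
    """Label each trial 1 (fastest 20%) through 5 (slowest 20%) by RT rank."""
    n = len(rts)
    order = sorted(range(n), key=lambda i: rts[i])  # trial indices, fastest first
    labels = [0] * n
    for rank, trial_index in enumerate(order):
        labels[trial_index] = rank * 5 // n + 1
    return labels


# Hypothetical RTs (s) for ten trials of one condition.
rts = [1.9, 0.8, 1.2, 2.4, 1.0, 1.5, 3.1, 0.9, 2.0, 1.7]
print(assign_rt_quintiles(rts))  # [4, 1, 2, 5, 2, 3, 5, 1, 4, 3]
```

Each quintile label would then define a separate regressor for that condition.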
A final analysis examined the relationship between item repetition and fMRI response. As noted above, the same image set was used to construct location A, orientation A, and Penn conditions; as such, each image in this set appeared three times in the course of the experiment. To examine the effect of this image repetition, we reassigned trials in location A, orientation A, and Penn conditions into three new categories (first presentation, second presentation, third presentation), which superseded the previous assignment by task. The trials in location B and orientation B conditions were similarly reassigned into two regressors (first presentation, second presentation), although as before, results from these trials are not considered. Additional regressors modeled not Penn, vehicle, and nonvehicle trials.
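The reassignment by presentation number amounts to counting how many times each image has appeared so far in the order of presentation. A minimal Python sketch, using hypothetical image identifiers, is given below.

```python
from collections import defaultdict


def presentation_numbers(image_sequence):
    """Return 1 for an image's first appearance, 2 for its second, and so on."""
    seen = defaultdict(int)
    numbers = []
    for image_id in image_sequence:
        seen[image_id] += 1
        numbers.append(seen[image_id])
    return numbers


# Hypothetical trial sequence; the labels are placeholders, not actual stimuli.
sequence = ["a", "b", "a", "c", "a", "b"]
print(presentation_numbers(sequence))  # [1, 1, 2, 1, 3, 2]
```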
Functional regions of interest (ROIs) were defined for each subject using data from the functional localizer scans. These regions consisted of voxels responding more strongly (t > 3.5) to scenes than to common objects in the posterior parahippocampal/collateral sulcus region (PPA) and retrosplenial/parietal-occipital sulcus region (RSC). Using these criteria, the left PPA, right PPA, and right RSC were identifiable in all subjects, and the left RSC was identifiable in all subjects but one. The mean sizes of these ROIs were as follows: left PPA, 5.7 ± 3.1; right PPA, 6.3 ± 3.7; left RSC, 3.6 ± 2.9; right RSC, 4.7 ± 4.1 cm3 (errors are 1 SD). The time course of fMRI response during the main experimental scans was extracted from each ROI (averaging over all voxels) and entered into the general linear models described above to calculate parameter estimates (β values) for each condition, which were used as the dependent variables in a second-level random-effects ANOVA. We saw no evidence that the pattern of response differed between the left and right PPA or between the left and right RSC, so data were averaged across both hemispheres before second-level analysis.
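The two ROI steps described above, thresholding the localizer contrast and averaging the time course over the selected voxels, can be sketched in Python as follows. The sketch uses toy values and assumes that the t values passed in have already been restricted to the relevant anatomical region; the function names are illustrative.

```python
def define_roi(t_values, threshold=3.5):
    """Return indices of voxels whose scenes > objects t value exceeds threshold."""
    return [i for i, t in enumerate(t_values) if t > threshold]


def mean_time_course(voxel_time_courses, roi_indices):
    """Average the time courses of the ROI voxels at each time point."""
    if not roi_indices:
        raise ValueError("ROI contains no voxels")
    n_timepoints = len(voxel_time_courses[0])
    return [
        sum(voxel_time_courses[v][t] for v in roi_indices) / len(roi_indices)
        for t in range(n_timepoints)
    ]


# Toy example: four voxels, three time points (values are hypothetical).
t_values = [1.2, 4.0, 3.5, 5.1]
time_courses = [
    [0.0, 0.0, 0.0],
    [1.0, 2.0, 3.0],
    [9.0, 9.0, 9.0],
    [3.0, 4.0, 5.0],
]
roi = define_roi(t_values)
print(roi)                                   # [1, 3]
print(mean_time_course(time_courses, roi))   # [2.0, 3.0, 4.0]
```

The averaged time course would then serve as the dependent variable in the general linear model for that ROI.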
Note that RSC is defined here as a functional rather than as an anatomical region. As such, we did not attempt to restrict RSC to the anatomically defined retrosplenial cortex (i.e., Brodmann's areas 29 and 30), but allowed it to extend superiorly into the posterior cingulate (area 23) and posteriorly into the parietal-occipital sulcus/anterior calcarine region. Although these regions have been distinguished based on cytoarchitectonic (Kobayashi and Amaral, 2000; Morris et al., 2000) and connectivity data (Morris et al., 1999; Kobayashi and Amaral, 2003), as well as functionally (Sugiura et al., 2005), our primary aim was to distinguish the RSC from the PPA rather than to distinguish different subdivisions of the RSC/posterior cingulate/parietal-occipital sulcus region from each other. We leave the task of more precisely localizing function relative to the anatomical structures to future studies. A similar terminology has been used by other authors (Bar and Aminoff, 2003).
For whole-brain analyses, subject-specific β maps were calculated for contrasts of interest and then smoothed to 12 mm FWHM to facilitate between-subject averaging before entry into a random effects analysis. Voxels with a significance level of p < 0.001, uncorrected, are reported.
For both behavioral and fMRI data, we consider two effects of interest. First, the effect of retrieval task was measured by examining differences between the location, orientation, and Penn conditions. Second, the effect of campus familiarity was measured by examining differences between the Penn and not Penn conditions. fMRI response during the object blocks is also reported for purposes of comparison.
Behavioral results are reported in Table 1. Only trials that were used in the fMRI analyses were included. Trials in which subjects made an incorrect response were excluded. Behavioral data from one subject were lost because of a computer error.
There were highly significant effects of retrieval task, evident in both the accuracy (F(2,26) = 20.5; p < 0.00001) and RT (F(2,26) = 66.3; p < 0.0000000001) data. Specifically, accuracies were lower and RTs longer on the more navigationally intensive location and orientation tasks than on the less navigationally intensive familiarity task (accuracy: location vs Penn, t(13) = 4.5, p < 0.001; orientation vs Penn, t(13) = 6.0, p < 0.0001) (RT: location vs Penn, t(13) = 7.1, p < 0.00001; orientation vs Penn, t(13) = 12.6, p < 0.0000001). These effects were quite strong: compared with the Penn trials, RTs on location and orientation trials were 753 and 891 ms longer, respectively. There was also a significant RT difference between the location and orientation trials (t(13) = 2.3; p < 0.05) that was accompanied by a marginally significant difference in accuracy (t(13) = 2.0; p = 0.07). However, these differences were quite small compared with the location versus Penn and orientation versus Penn differences. In summary, the location task was much harder than the familiarity task, and the orientation task was slightly harder still.
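The degrees of freedom reported for the pairwise comparisons (13) are consistent with paired tests across the 14 subjects whose behavioral data were available. As an illustration of that computation only, the Python sketch below calculates a paired t statistic from per-subject condition means. The RT values are hypothetical and are not the study's data, and the function name is illustrative.

```python
import math
import statistics


def paired_t(condition_a, condition_b):
    """Return (t, degrees of freedom) for a paired comparison across subjects."""
    if len(condition_a) != len(condition_b):
        raise ValueError("conditions must have one value per subject")
    diffs = [a - b for a, b in zip(condition_a, condition_b)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1


# Hypothetical mean RTs (ms) for 14 subjects in two conditions.
location_rt = [2100, 1950, 2300, 2050, 2200, 1900, 2150,
               2250, 2000, 2120, 2080, 2190, 1980, 2240]
penn_rt = [1400, 1300, 1450, 1350, 1500, 1200, 1380,
           1420, 1310, 1390, 1340, 1460, 1280, 1440]
t_value, df = paired_t(location_rt, penn_rt)
print(df)  # 13
```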
A marginal effect of campus familiarity was also observed. Specifically, there was a trend toward faster RTs on Penn trials than on not Penn trials (t(13) = 2.0; p = 0.06). However, there was no concomitant difference in accuracies (t < 1, NS).
The post-scan familiarity test confirmed that subjects were very familiar with the locations shown in the experiment. On a scale of 1–5, where 1 corresponded to “did not recognize the place” and 5 corresponded to “knew the location right away,” the average rating was 4.2. The SD across subjects was 0.35, and the SD across items was 0.60.
The key prediction of the study was that the level of RSC activity would depend on the retrieval task, whereas PPA activity would be comparatively less affected. The results bore out these predictions (Fig. 2). PPA activity did not vary as a function of retrieval task (F < 1, NS) or familiarity (t < 1, NS; but see below, Long-term familiarity and item repetition). The consistency of response across retrieval task is particularly striking, given the large differences in RTs between the location/orientation and Penn conditions. These data strongly support a role for PPA in perceptual analysis: the PPA responds strongly when a scene is in view, regardless of whether or not the subject retrieves long-term spatial knowledge in response to it.
The pattern in the RSC was quite different. Here, we found strong modulation of activity as a function of both retrieval task (F(2,28) = 11.4; p < 0.001) and familiarity (t(14) = 4.6; p < 0.001). Specifically, the RSC responded marginally more strongly during location retrieval than during orientation retrieval (t(14) = 2.1; p = 0.06), significantly more strongly during orientation retrieval than during simple judgments of familiarity (orientation vs Penn, t(14) = 2.6; p < 0.05), and significantly more strongly to familiar Penn scenes than to unfamiliar not Penn scenes (t(14) = 4.6; p < 0.001). These results are consistent with a role for RSC in spatial knowledge retrieval for two reasons. First, they demonstrate that RSC activity varies as a function of the type of spatial knowledge retrieved (location vs orientation vs Penn). Second, they demonstrate that RSC activity varies as a function of environmental familiarity (Penn vs not Penn), which affects the suitability of the stimulus as a trigger for spatial knowledge retrieval.
To further demonstrate the difference in pattern between the PPA and RSC, we performed two additional analyses in which data from the PPA and RSC were combined and ROI was included as a factor. We found highly significant interactions of ROI with both retrieval task (F(2,28) = 32.8; p < 0.0000001) and familiarity (F(1,14) = 56.5; p < 0.00001), confirming our observation that these two effects manifest themselves differently in these two regions.
RT effects in the RSC
To what extent can differences in fMRI response between conditions in the RSC be explained by differences in RTs? The answer to this question is important for understanding whether the various retrieval tasks tap distinct memory processes, or whether they tap the same processes but to different degrees. Simple observation indicates that some of the fMRI response differences were congruent with differences in RTs, whereas others were not. For example, fMRI response was greater for orientation trials than for Penn trials, corresponding to a concomitant increase in RT. In contrast, fMRI response was greater for location trials than for orientation trials, although RTs were longer and accuracies lower in the orientation condition. Thus, it seemed possible that some (but not all) of the fMRI response differences in RSC could be explained in terms of RT differences.
To examine this further, we plotted fMRI response in RSC as a function of RT (Fig. 3). There was a clear increase in fMRI response as a function of RT for all four scene conditions. Indeed, the data suggest that the greater response during navigationally intensive judgments (location and orientation) than during familiarity judgments (Penn and not Penn) could be explained in part by this RT effect. To test this, we added RT as a covariate to our original analysis. The β weight on this covariate was positive (t(13) = 6.6; p < 0.0001), indicating a significant effect of RT on fMRI response. With the addition of this RT covariate, the previously observed pattern of greater response during location retrieval than during orientation retrieval remained robust (t(13) = 2.9; p < 0.02), as did the greater response to Penn scenes than to not Penn scenes (t(13) = 5.1; p < 0.001). However, the location > Penn difference was no longer significant (t < 1, NS) and the orientation > Penn difference actually reversed (t(13) = −2.1; p = 0.055). In summary, RT differences account for some but not all of the fMRI response differences in RSC.
These results suggest that the less navigationally intensive familiarity judgments may have recruited many of the same RSC processes engaged during the more navigationally intensive location and orientation judgments, but to a lesser degree. In particular, the results are consistent with the idea that RSC supports processes for localizing places, which were engaged most strongly on location trials, somewhat less strongly on orientation trials, and least strongly on Penn trials. Indeed, given the fact that all tasks were performed within the same experimental session, it is likely that subjects retrieved some amount of location information in all three conditions. As such, the longer RTs and greater RSC response on location trials relative to Penn trials might be the behavioral and neural concomitants of the same phenomenon (longer/stronger engagement of RSC localization processes during location trials). This claim is supported by the fact that the fMRI response difference in RSC between location and Penn trials is eliminated when RT differences are regressed out. It is also notable that the RSC response during orientation judgments is weaker than the RSC response during location judgments, although the orientation task is more difficult and east/west orientation cannot be ascertained without first identifying location. This suggests that once location is identified with the specificity necessary for the performance of the orientation task, the additional processing necessary to specify orientation (which results in longer RTs for this task) is mediated by other regions of the brain.
Long-term familiarity and item repetition
As noted above, PPA response to familiar and unfamiliar locations was equivalent. In contrast, a recent study from our laboratory observed a small but reliable advantage for familiar versus unfamiliar places in this region (Epstein et al., 2007). What accounts for these apparently discrepant results? Previous studies have demonstrated that PPA response to a scene image is reduced when the image is repeated (Epstein et al., 1999; Ewbank et al., 2005). Because each image in the Penn condition was shown three times during the course of the experiment (i.e., in the Penn, location A, and orientation A conditions), whereas each image in the not Penn condition was only shown once, it is possible that the effect of place familiarity was masked by response reductions caused by image repetition.
To test this, we performed an additional analysis in which we separately modeled the response to the first, second, and third presentation of the images in the Penn/location A/orientation A stimulus set. As can be seen in Figure 4, fMRI response was significantly reduced by stimulus repetition in both the PPA (F(2,28) = 25.5; p < 0.000001) and RSC (F(2,28) = 22.6; p < 0.00001). Critically, the PPA response to the first presentation of Penn images was significantly higher than the PPA response to the not Penn images (t(14) = 3.8; p < 0.01), replicating our previous results. However, PPA response to the second presentation of the Penn images was no larger than the response to not Penn images (t < 1, NS), and PPA response to the third presentation was even further reduced. RSC response to Penn images was higher than response to not Penn images for all three presentation positions (all t values > 3.8; all p values < 0.002). Thus, an effect of environmental familiarity can be observed in the PPA, although it is much less robust than the familiarity effect in the RSC.
Although our primary focus was on the PPA and RSC, we also performed exploratory whole-brain analyses to identify regions that were differentially activated by the various retrieval tasks. Of particular interest was identifying regions that were more active for the navigationally intensive location and orientation tasks than for the less navigationally intensive familiarity task. To this end, we examined the location versus Penn and orientation versus Penn contrasts across the entire brain.
Results are shown in Figure 5. There was a striking degree of overlap between the regions activated in the location > Penn and orientation > Penn contrasts (Fig. 5a,b, orange regions). These overlapping regions included the superior frontal gyrus/sulcus, ascending and descending segments of the intraparietal sulcus, and two foci in the thalamus. In addition, the RSC/parietal-occipital sulcus region was activated for the location > Penn contrast, whereas several foci in the cerebellum were activated in the orientation > Penn contrast (data not shown). Some of these regions have been previously identified as being critical for spatial transformation processes, such as mental rotation and mental perspective taking (Harris et al., 2000; Creem et al., 2001; Zacks et al., 2003a,b; Wraga et al., 2005; Keehner et al., 2006; for review, see Zacks and Michelon, 2005). Thus, activity in these regions may reflect engagement of generic spatial processes that are useful across a variety of tasks and a variety of scales. These processes would complement the navigationally specific encoding and recognition processes supported by the PPA and RSC. Alternatively, some of the frontal/parietal/thalamic activations may reflect attentional or eye movement differences between the tasks (Curtis and D'Esposito, 2006).
The reverse contrasts (Penn > location and Penn > orientation) revealed several regions more active in the less navigationally demanding conditions, including the dorsal and ventral medial frontal cortices, anterior temporal lobe, anterior hippocampus, amygdala, angular gyrus, middle occipital gyrus, and cuneus (Fig. 5a,b, blue regions). Some of these regions (medial frontal, angular gyrus, middle temporal lobe) have been previously identified as forming a “default network” that tends to be deactivated during cognitive tasks (Gusnard et al., 2001; Raichle et al., 2001; Golland et al., 2007); as such, the relatively greater activity during the Penn condition in these regions may simply reflect the fact that the familiarity task is less difficult. Similarly, activation of the temporal pole and anterior hippocampus may reflect greater episodic or narrative processing related to the scenes during the easier familiarity task.
We also compared activity between the Penn and not Penn conditions and between the location and orientation conditions. For the familiarity contrast (Penn > not Penn), we observed greater activity during the Penn condition in a wide swath of the medial and lateral parietal cortices (including RSC/posterior cingulate, precuneus, inferior parietal lobe) and also in the left middle temporal lobe and several frontal regions (Fig. 5c). These data are particularly interesting when considered in light of results from two earlier studies that examined effects of location familiarity. In the first study, greater response to familiar than to unfamiliar locations was found in the RSC (with somewhat weaker effects in transverse occipital sulcus and PPA) but not in the wide swath of parietal cortex observed here (Epstein et al., 2007). Interestingly, this study used substantially briefer image presentation times (700 ms, as opposed to 3300 ms in the current study). The second study used even faster presentation times (30/70 ms) and found no effect of familiarity even in the RSC unless subjects were explicitly required to identify the images as specific locations [see experiment 2 in Epstein and Higgins (2007)]. Taken as a whole, these results suggest that familiarity effects in the RSC require a minimum presentation time to develop and may spread from the RSC to other parietal regions over a relatively slow time course of 1–3 s. This slow time course may reflect conscious mental imagery of unseen aspects of the familiar locations. For the location versus orientation contrast, we observed that the precuneus and right inferior frontal gyrus (BA45) were more active in the location condition, whereas the caudate nucleus was more active for the orientation condition (data not shown). We do not at present have an explanation for this pattern, which was not predicted.
We investigated the role of the PPA and RSC in spatial navigation by measuring fMRI response in these regions while subjects made location, orientation, or familiarity judgments on photographs of real-world environments. We found clear evidence for a dissociation of function between these regions. Response in the PPA did not vary as a function of task or stimulus familiarity, indicating a primary role for this region in scene perception. In contrast, response in the RSC varied strongly with both task and familiarity, consistent with a primary role for this region in topographical memory retrieval. In summary, the results support our hypothesis that the PPA and RSC play distinct but complementary roles in place recognition.
The role of the PPA in scene processing has long been established (for review, see Epstein, 2005). The PPA responds strongly when scenes are viewed or imagined (O'Craven and Kanwisher, 2000), both within the framework of an explicit navigational task (Aguirre et al., 1996; Maguire et al., 1998) and during passive viewing (Epstein and Kanwisher, 1998; Goh et al., 2004). It responds less strongly to nonscene objects, although even here the response is modulated by the geometric richness of the context within which the object was initially viewed (Janzen and van Turennout, 2004; Aminoff et al., 2007). Based on these results, we have hypothesized previously that the PPA represents the spatial structure of the currently visible (or imagined) visual environment. The present results strongly support this hypothesis. Despite the large differences in accuracy and RT between the navigationally intensive location and orientation retrieval tasks and the less navigationally intensive familiarity task, there was no difference in PPA response among these three conditions. As far as the PPA is concerned, the important thing is that a scene is viewed, not what kind of information is retrieved in response to it.
Also consistent with a PPA role in scene perception is the fact that PPA responses to familiar and unfamiliar locations were equivalent. However, we treat this finding with caution, because it appears that PPA responses in location A, orientation A, and Penn conditions were reduced by image repetition relative to the response in the not Penn condition, thus masking a small familiarity effect. Indeed, in other experiments, we have observed a small but reliable advantage for familiar versus unfamiliar places in the PPA (Epstein et al., 2007). Nevertheless, the general pattern across all experiments is clear: the first-order response in the PPA is determined by stimulus content, not retrieval task or environmental familiarity.
In contrast to the hypothesized PPA role in scene perception, the current results strongly implicate RSC in retrieval of long-term spatial knowledge. Three aspects of the results support this conclusion. First, RSC response was stronger when viewing familiar scenes for which the beyond-the-horizon spatial surroundings were known than when viewing unfamiliar scenes for which these surroundings were unknown. Second, RSC response was stronger when subjects explicitly retrieved long-term spatial knowledge when making location and orientation judgments than when they made simple familiarity judgments that did not explicitly require retrieval of this information. Third, RSC response varied as a function of the type of spatial information retrieved, with greater activity during location retrieval than during orientation retrieval. In summary, the RSC response to scenes was modulated by the potential for retrieving long-term spatial information, the effort to retrieve long-term spatial information, and the type of spatial information retrieved.
The proposed division of labor between the PPA and RSC is consistent with results from several lines of previous research. First, neuropsychological studies indicate that damage to the parahippocampal cortex (PHC) leads to an inability to identify scenes (Habib and Sirigu, 1987; Mendez and Cherrier, 2003), whereas damage to the RSC leads to a syndrome in which patients can identify scenes by name but cannot use this information to orient in the wider world (Takahashi et al., 1997; Katayama et al., 1999). These results suggest that the parahippocampal cortex is more critical for recognition of the immediate scene whereas the RSC is more critical for using the local scene to situate oneself within the larger environment. Second, neuroimaging studies have observed RSC activity in a variety of memory retrieval tasks, both with spatial (Burgess et al., 2001; Ino et al., 2002) and episodic (Cabeza et al., 2004) information, consistent with the memory retrieval role we propose here. Third, in a previous study examining neural correlates of scene recognition, we observed greater activity in both the PPA and RSC when subjects identified scenes as specific locations than when they classified scenes into general categories (Epstein and Higgins, 2007). In the PPA, the greater response during location identification was attributable to the fact that the location task required more intensive processing of the specific idiosyncratic geometries of each scene, consistent with a PPA role in local scene representation. In contrast, some of the response difference in RSC was attributable to memory retrieval processes that were engaged during recovery of long-term knowledge about specific familiar locations but not during identification of general scene categories. The current study extends these earlier results by showing that the RSC is critically involved in retrieval of information that is explicitly spatial.
Although the current results do not allow us to identify the type of memory processes supported by the RSC, we can offer some speculations based on the current and previous results. It is of interest that the greatest RSC response was in the location retrieval condition. Recordings from rodent RSC have identified neurons that represent allocentric orientation (HD cells) (Chen et al., 1994), and also neurons that represent conjunctions of location and orientation (direction-dependent place cells) (Cho and Sharp, 2001). The monkey RSC is connected to parietal regions (area 7a), Papez circuit structures (anterior thalamic nucleus), and medial temporal regions (parahippocampal and entorhinal cortices) (Morris et al., 1999; Kobayashi and Amaral, 2003). Byrne et al. (2007) have proposed that RSC combines information about allocentric head direction (received from Papez circuit structures) with information about the distances and egocentric bearings of local surfaces (received from parietal regions) to calculate the distance and bearings to bounding surfaces in allocentric coordinates. This information is then transmitted to the medial temporal lobe. Our data are consistent with the idea that RSC mediates the translation between egocentric observer location (which can be ascertained from the immediate scene) and allocentric location (which cannot be calculated without retrieval of long-term knowledge). HD cells and direction-dependent place cells may encode the spatial quantities that make this egocentric to allocentric transformation possible.
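To make the egocentric-to-allocentric idea concrete, the following is a minimal illustrative sketch of the geometry involved. It is not the Byrne et al. (2007) model and makes no claim about how RSC neurons implement the computation. The function name, the degree-based angle convention (clockwise from north, with an egocentric bearing of 0 meaning straight ahead), and the east/north output are all assumptions introduced here for illustration.

```python
import math


def egocentric_to_allocentric(head_direction_deg, ego_bearing_deg, distance):
    """Convert one egocentric observation of a surface into allocentric terms.

    Hypothetical helper with assumed conventions (not from the source):
    - head_direction_deg: allocentric heading, degrees clockwise from north
    - ego_bearing_deg: bearing of the surface relative to straight ahead
      (positive = to the right, negative = to the left)
    - distance: distance to the surface, in arbitrary units
    Returns the allocentric bearing and the (east, north) offset of the surface.
    """
    # Allocentric bearing = heading plus egocentric bearing, wrapped to [0, 360).
    allo_bearing = (head_direction_deg + ego_bearing_deg) % 360.0
    theta = math.radians(allo_bearing)
    # With bearings measured clockwise from north, east uses sin and north uses cos.
    east = distance * math.sin(theta)
    north = distance * math.cos(theta)
    return allo_bearing, (east, north)


# Example: an observer facing east (90 degrees) sees a wall 10 units away,
# directly to the left (-90 degrees); the wall therefore lies to the north.
bearing, (east, north) = egocentric_to_allocentric(90.0, -90.0, 10.0)
```

The point of the sketch is only that the same egocentric observation maps to different allocentric locations depending on heading, so a head-direction signal is required before a local view can be placed in world-centered coordinates.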
It is also worthwhile to consider how the current results relate to cognitive theories of spatial processing. Wang and Spelke (2002) have proposed that human spatial navigation relies primarily on three systems: (1) a path integration mechanism, (2) a mechanism for place recognition based on matching of viewpoint-specific “snapshots,” and (3) a “geometric module,” which recovers heading direction after disorientation by reference to local scene geometry (Cheng, 1986). More recently, Burgess (2006) has proposed that these mechanisms are complemented by another set of processes that encode location and orientation in world-centered coordinates. Our data are consistent with the idea that the PPA mediates a viewpoint-specific snapshot system. The particular sensitivity of the PPA to geometric information, such as the presence or absence of large fixed bounding surfaces (Epstein and Kanwisher, 1998; Henderson et al., 2006), suggests that these PPA snapshots might provide the input that allows the geometric module to operate; however, the relationship between the PPA and the geometric module should be considered speculative at this point (Cheng and Newcombe, 2005). It is also unclear whether the PPA goes beyond representing the visible aspects of the local scene to encode information about the orientation or location of the body in local space. Importantly, however, the insensitivity of the PPA to retrieval task or scene familiarity suggests that it does not represent orientation or location in global (i.e., beyond-the-horizon) space. As discussed above, identification of global location appears to involve RSC processes, perhaps in conjunction with Papez circuit mechanisms that represent global orientation (Sharp et al., 2001).
In summary, our results demonstrate a clear division of labor between the PPA and RSC. The former encodes the visuospatial structure of the immediate scene, whereas the latter uses the immediate scene as a cue to access stored representations of the broader environment. These results are a first step in the larger goal of understanding the relationship between the mental processes that underlie spatial navigation and the neural systems that support them.
This work was supported by grants from the National Institutes of Health (EY-016464) and the Whitehall Foundation (2004-05-99-APL) to R.A.E. We thank Sean Macevoy for useful discussions, Petya Radoeva for assistance with data display, and Steve Higgins for the University of Illinois stimuli.
Correspondence should be addressed to Russell A. Epstein, Department of Psychology, 3720 Walnut Street, Philadelphia, PA 19104-6241.