The parahippocampal cortex (PHC) has been implicated in both place/scene processing and episodic memory. We proposed that this region should instead be seen as intrinsically mediating contextual associations and not place/scene processing or episodic memory exclusively. Given that place/scene processing and episodic memory both rely on associations, this modified framework provides a platform for reconciling what seemed like different roles assigned to the same region. Comparing scenes with scenes, we show here that the PHC responds significantly more strongly to scenes with rich contextual associations compared with scenes of equal visual qualities but fewer associations. This result adds unequivocal support to the view that the PHC mediates contextual associations in general rather than places or scenes proper, and necessitates a revision of the current view that the PHC contains a dedicated place/scenes “module.”
The parahippocampal cortex (PHC) is a region within the medial temporal lobe (MTL) that has been implicated in memory-related functions. However, in parallel to its well accepted role particularly in episodic memory (Gabrieli et al., 1997; Brewer et al., 1998; Wagner et al., 1998; Schacter and Wagner, 1999; Ranganath et al., 2004; Daselaar et al., 2006), others view it as central to place-, scene-, and navigation-related processing (Aguirre et al., 1996; Stern et al., 1996; Epstein and Kanwisher, 1998; Janzen and van Turennout, 2004; Yi and Chun, 2005; Epstein et al., 2007; Henderson et al., 2007), and it has even been termed the “parahippocampal place area” (“PPA”) (Epstein and Kanwisher, 1998). Episodic memory and scene processing share some common characteristics and functions: episodic memories often involve recollection of spatial contexts, and similar networks seem to be involved in remembering past episodes and imagining novel scenes or future events (Addis et al., 2007; Bar, 2007; Buckner and Carroll, 2007; Hassabis et al., 2007; Szpunar et al., 2007) (for review and discussion, see Schacter et al., 2007). Importantly, however, PHC contributes to episodic memory formation even under conditions in which scene or place processing is minimal (Wagner et al., 1998; Ranganath et al., 2004). As a result, it used to be difficult to attribute a unique, unequivocal role to this region because it was unclear how the same cortical region could mediate two differing functions such as episodic memory and spatial processing (Fig. 1).
We proposed a modified framework with which to consider the function of the PHC, whereby the PHC more generally mediates the representation and processing of contextual associations (Bar and Aminoff, 2003; Bar, 2004; Fenske et al., 2006; Aminoff et al., 2007). Within this framework, it is easy to reconcile spatial processing and memory findings in the PHC, given that both rely heavily on associations: spatial representations rely on the association between what and where, and episodic memory is based on associations among elements that comprise a specific episode. This framework provides a natural platform for bridging the various functions attributed to the PHC. Indeed, that MTL structures should be seen as more general than space specific has been suggested in the past (Eichenbaum et al., 1999; Lipton et al., 2007). To test this proposal, we conducted here a direct study of whether there is truly a uniquely place/scene-specific module in the PHC, or whether instead scenes and places activate the PHC because they rely on contextual associations in their representation.
To perform this test, we contrasted scene stimuli with other scene stimuli, where the difference between the scenes in the two contrasted conditions was the number of contextual associations that they elicit (Fig. 2). Specifically, scenes in the strong contextual associations condition included a foreground object (or objects) that, based on our previous surveys (Bar and Aminoff, 2003), possessed a significant number of strong contextual associations (e.g., a traffic light or a desk chair), whereas the scenes in the weak contextual associations condition focused on an object (or objects) that did not have unique, strong contextual associations (e.g., a bottled water or a plain wooden table). All scenes looked realistic and contained coherent collections of objects in places. Importantly, a special effort was made to ensure that the scenes in the two conditions were equally complex in their details and visual properties, including spatial frequency content, complexity as defined by number of objects in the scene, and scene type (i.e., indoor vs outdoor) (see Materials and Methods). Functional magnetic resonance imaging (fMRI) activation, particularly in the PHC, was compared when participants viewed the scenes/places in these two conditions. By carefully equalizing place and visual information in the scenes of both conditions, this study goes beyond previous demonstrations and tests directly whether the PHC shows exclusive place sensitivity or whether its demonstrated response to place information is a result of the broader role of the PHC in contextual processing. If the PHC, and the PPA in particular, shows increased response to scenes of stronger contextual associations compared with scenes of comparable visual information but weaker contextual associations, it will provide strong support for the proposal that the PHC should be seen as intrinsically mediating contextual associations. 
Such a result would make it difficult to argue that the PPA is sensitive exclusively to the place information available in scenes, or to scenes in general, because this information is equated here across conditions.
Materials and Methods
Twenty-four participants were scanned in this experiment. The participants were all right-hand dominant, with a mean ± SD age of 22.8 ± 3.22 years. All participants had normal or corrected-to-normal vision. Informed written consent was obtained from each of the participants before the scanning sessions. All procedures were approved by Massachusetts General Hospital Human Studies Protocol number 2000-002038. Five participants were excluded from the analysis: two fell asleep during the experiment, one was excluded because of technical problems with the presentation of the stimuli, and two because of technical difficulties with the scanner. One additional participant was removed from the analysis because of outlier data: the percentage signal change in each region of interest (ROI) for this participant was consistently at least two SDs above the group mean. The remaining 18 participants, 12 of whom were female, were used in the analysis of this experiment.
Colorful photographs of everyday scenes were used in this experiment. A portion of the scenes was generously provided by Dr. Marvin Chun (Yale University, New Haven, CT) and Dr. Helene Intraub (University of Delaware, Newark, DE) (Park et al., 2007). Each scene contained a background of varying level of complexity and at least one complete foreground object. The foreground object(s) either had strong contextual associations (e.g., a gas pump) or was only weakly associated with multiple contexts (e.g., a plain wooden table). These levels of associativity were derived from extensive surveys we conducted previously (Bar and Aminoff, 2003). In those surveys, the list of strong contextual objects was derived from two types of procedures: one in which participants were given a context (e.g., a kitchen) and asked to name the object most related to that context (e.g., an oven). The top two objects most commonly named for each context across participants were considered to have strong contextual associations. In the second survey, we gave participants the name of an object (e.g., an oven) and asked them to name the context most strongly associated with the object (e.g., a kitchen). The objects that were consistently associated with only one context were considered to have strong contextual associations; the objects that yielded a large variety of associated contexts, or no context at all, were considered to have weak contextual associations. Strongly contextual objects are highly diagnostic of a particular context, whereas weakly contextual objects are not, by our definition. The scenes used here were divided into strong and weak contextual associations corresponding to the contextual assignment of their foreground object(s) in those surveys.
Importantly, overall scene complexity was matched between the scenes in the two conditions. The complexity of a scene was defined by the number of objects in it, and the average number of objects in the strong and weak conditions was equated: the average number of objects for the strong scenes was 4.3 and for the weak scenes was 4.0 (t = 0.85; p > 0.39). There were a total of 256 scenes presented; 38 of these scenes were removed from the analysis to equate the complexity between the strong and weak conditions. The equated conditions had 107 scenes in the strong condition and 111 in the weak condition. Place information was matched between the strong and weak contextual associations scenes by having the same proportion of indoor (e.g., rooms; 12%) and outdoor environments (82%). Finally, it was also important to make sure that the scenes in the two conditions did not differ significantly on low-level properties. Therefore, we conducted a spectral analysis comparing spatial frequency content in the scenes of both conditions. To do this, we used two methods: a two-dimensional Fourier transform (FFT2) and Welch's periodogram method. A two-sample t test between the means across all spatial frequencies of the images in the strong versus the images in the weak context conditions yielded the following: t(1,256) = −0.15, p > 0.88 for the FFT2 method; t(1,256) = −0.29, p > 0.77 for the Welch method. Therefore, we can say with confidence that the only consistent difference between the scenes in the two conditions was the strength of the contextual associations that they elicit.
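The complexity check described above is a two-sample comparison of per-scene object counts. The following is a minimal sketch of such a test, assuming a pooled-variance Student's t (the text does not state which variant was used). The function name `two_sample_t` and the object counts are hypothetical and are not the study's stimuli.

```python
import math
from statistics import mean, variance


def two_sample_t(a, b):
    """Pooled-variance (Student's) two-sample t statistic.

    Returns (t, degrees_of_freedom). Assumes both samples have at
    least two values and roughly equal variances.
    """
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(a) - mean(b)) / standard_error, df


# Hypothetical object counts per scene, for illustration only.
strong_counts = [4, 5, 3, 6, 4, 5, 4, 3]
weak_counts = [4, 4, 3, 5, 4, 5, 3, 4]
t_stat, df = two_sample_t(strong_counts, weak_counts)
```

A small absolute t value, as in the matched conditions reported above, indicates that the two sets of scenes do not differ reliably in the number of objects.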
For some of the analyses, the stimuli were further divided into categories based on place information, i.e., whether an indoor or an outdoor scene. The indoor scenes were further divided into categories defined by the number of objects in the scene: low (one to two objects), medium (three to four objects), and high (more than four objects). This was done only with the indoor scenes because of the ambiguity in defining individual objects with the outdoor scenes (i.e., a background of trees).
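The object-count binning for indoor scenes can be written down directly from the category boundaries given above. This is only an illustrative sketch; the function name `complexity_category` is hypothetical.

```python
def complexity_category(n_objects: int) -> str:
    """Map an indoor scene's object count to the complexity bin in the text.

    low: one to two objects, medium: three to four, high: more than four.
    """
    if n_objects < 1:
        # Every scene contained at least one complete foreground object.
        raise ValueError("a scene must contain at least one object")
    if n_objects <= 2:
        return "low"
    if n_objects <= 4:
        return "medium"
    return "high"
```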
Procedure and design
There were three phases in the experiment. The first phase consisted of two fMRI runs in which participants passively viewed photographs of scenes. Although no active response was required, participants were explicitly instructed that the foreground and the background were equally important (i.e., to remember them for a subsequent memory task) and that they should attend to both equally. During these two runs, each trial consisted of a picture that was presented for 1500 ms, with a 1500 ms interstimulus interval. One hundred twenty-eight scenes were presented in each run, intermixed with an additional 42 fixation trials. Picture trials from the strong context condition and from the weak context condition were intermixed with fixation trials in a predetermined order for each functional run to maximize efficiency and accuracy in extracting the hemodynamic response function (order was created by the function optseq of the FreeSurfer Toolbox). Stimuli subtended a 9° visual angle. The second phase of this experiment consisted of a similar setup as the first phase but pertains to a false memory hypothesis and is beyond the scope of this study. No data from this second phase were used for analysis in the present paper. The third phase of this experiment consisted of a PPA localizer adapted from Epstein et al. (2003). It consisted of three runs using a block design in which task-related blocks alternated with fixation blocks. Each block lasted 20 s. In each task block, 20 pictures of the same kind of stimulus were presented. We used five types of stimuli: indoor scenes, outdoor scenes, weak contextual objects, faces, and scrambled colorful pictures. In each block, a picture was presented for 400 ms, with a 600 ms interstimulus interval. Each run consisted of two blocks per stimulus type, except for the scrambled pictures, which only had one block. Participants had to perform a one-back memory task, and pressed a button if the picture repeated.
There were two repetitions per block, randomly interspersed within the trials. In total, combining the three runs, there were 120 trials, or six blocks, per stimulus type, except for the scrambled pictures, for which there were only 60 trials, or three blocks. Fixation blocks consisted of a black fixation cross in the middle of the screen that was presented for 1700 ms, with a 300 ms interstimulus interval. On the last presentation of each fixation block, the fixation cross changed to a red color to alert the subject that a picture block was about to begin. There was a total of eight blocks, or 80 trials, of fixation per run, with a total of 24 fixation blocks, or 240 fixation trials used.
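The trial and block counts above can be cross-checked with simple arithmetic. The sketch below derives them from the stated durations. Two points are assumptions not given in the text: that a fixation trial in the first phase occupies the same 3000 ms slot as a picture trial, and that runs contain no additional lead-in or lead-out time. All variable names are illustrative.

```python
# All durations are in milliseconds and come from the text unless marked.

# Phase 1: event-related scene-viewing runs.
SCENES_PER_RUN = 128
FIXATION_TRIALS_PER_RUN = 42
TRIAL_MS = 1500 + 1500  # picture duration + interstimulus interval
# Assumption: a fixation trial lasts as long as a picture trial.
phase1_run_ms = (SCENES_PER_RUN + FIXATION_TRIALS_PER_RUN) * TRIAL_MS

# Phase 3: block-design PPA localizer.
RUNS = 3
BLOCK_MS = 20_000
PICTURE_TRIAL_MS = 400 + 600
FIXATION_TRIAL_MS = 1700 + 300
TASK_BLOCKS_PER_RUN = {
    "indoor scenes": 2,
    "outdoor scenes": 2,
    "weak contextual objects": 2,
    "faces": 2,
    "scrambled pictures": 1,
}
FIXATION_BLOCKS_PER_RUN = 8

pictures_per_block = BLOCK_MS // PICTURE_TRIAL_MS
fixations_per_block = BLOCK_MS // FIXATION_TRIAL_MS

# Total trials per stimulus type across the three localizer runs.
trials_per_type = {
    name: blocks * RUNS * pictures_per_block
    for name, blocks in TASK_BLOCKS_PER_RUN.items()
}
total_fixation_trials = FIXATION_BLOCKS_PER_RUN * RUNS * fixations_per_block

# Assumption: no padding before or after the alternating blocks.
localizer_run_ms = (
    sum(TASK_BLOCKS_PER_RUN.values()) + FIXATION_BLOCKS_PER_RUN
) * BLOCK_MS
```

Under these assumptions the derived counts agree with the totals reported above: 120 trials per stimulus type, 60 for scrambled pictures, and 240 fixation trials.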
The participants were engaged in viewing pictures while whole-brain fMRI scans were collected on a 3 tesla Siemens Tim Trio scanner using a gradient echo-planar imaging sequence [echo time (TE), 25 ms; flip angle, 90°]. The acquired slices were axial, parallel to the anterior commissure–posterior commissure line (33 slices, 3 mm thickness, 1 mm skip). The repetition time (TR) for the first phase of the experiment was 3000 ms, and the TR for the third phase, the localizer, was 2000 ms. Each participant had a series of both anatomical scans as well as functional scans. Anatomical images were acquired using a high-resolution three-dimensional magnetization-prepared rapid gradient echo sequence (128 sagittal slices; TE, 3.39 ms; TR, 2530 ms; flip angle, 7°; voxel size, 1 × 1 × 1.33 mm).
Functional data were analyzed using the FreeSurfer analysis tools (https://surfer.nmr.mgh.harvard.edu). Data from individual fMRI runs were first corrected for motion using the AFNI package (Cox, 1996) and spatially smoothed with a Gaussian filter with a full-width at half-maximum (FWHM) of 8 mm for the two scene-viewing runs of the first phase and 5 mm for the PPA localizer runs. The intensities for all runs were then normalized to correct for signal intensity changes and temporal drift, with global rescaling for each run to a mean intensity of 1000. For the first two runs, a finite impulse response model was used for the region of interest analysis, and a gamma-fit analysis was used for the statistical parametric maps. A gamma-fit analysis was used in the localizer runs. To account for intrinsic serial correlation in the fMRI data within subjects, we used a global autocorrelation function that computes a whitening filter (Burock and Dale, 2000). The data were then tested for statistical significance, and activation maps were constructed for comparisons of the different conditions. ROIs were based on a random-effect analysis.
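Two of the preprocessing steps above reduce to simple formulas. The sketch below shows the standard relationship between a Gaussian kernel's full-width at half-maximum and its standard deviation, and a global rescaling of intensities to a target mean of 1000. It is a generic illustration, not the FreeSurfer or AFNI implementation; the function names and the sample intensities are hypothetical.

```python
import math


def fwhm_to_sigma(fwhm_mm: float) -> float:
    """Standard deviation of a Gaussian with the given FWHM.

    Uses FWHM = 2 * sqrt(2 * ln 2) * sigma.
    """
    return fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def rescale_to_global_mean(values, target_mean=1000.0):
    """Multiply all intensities by one factor so their mean equals target_mean."""
    scale = target_mean / (sum(values) / len(values))
    return [v * scale for v in values]


sigma_scene_runs = fwhm_to_sigma(8.0)  # kernel used for the scene-viewing runs
sigma_localizer = fwhm_to_sigma(5.0)   # kernel used for the localizer runs

# Hypothetical raw intensities from one run, for illustration only.
rescaled = rescale_to_global_mean([480.0, 520.0, 500.0, 510.0])
```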
Cortical surface-based analysis.
Once the data from all trials were averaged, the mean and variance volumes were resampled onto the cortical surface for each subject. Each hemisphere was then morphed into a sphere in the following manner. First, each cortical hemisphere was morphed into a metrically optimal spherical surface. The pattern of cortical folds was then represented as a function on a unit sphere. Next, each individual subject's spherical representation was aligned with an averaged folding pattern constructed from a larger number of individuals aligned previously. This alignment was accomplished by maximizing the correlation between the individual and the group but prohibiting changes in the surface topology and simultaneously penalizing excessive metric distortion (Fischl et al., 1999b). These methods have been described in detail previously (Dale et al., 1999; Fischl et al., 1999a, 2001; Segonne et al., 2004).
The PHC ROI and an additional ROI in the retrosplenial complex (RSC) were defined by the localizer part of the experiment. Using the contrast in which scenes (collapsed across outdoor and indoor scenes) were compared with faces, weak contextual objects, and scrambled pictures, greater activity elicited for scenes was used to define the scene-selective regions of the PHC (i.e., the PPA) and the RSC. This served as the structural constraint for analyzing the data of the first phase, in which the contextual modulation of scene perception was examined. A functional constraint was also used for these ROIs, whereby only voxels that had significant activation in the omnibus contrast were used (where the omnibus contrast is defined as testing for any difference, between any of the conditions, and critically against baseline). All of the voxels that met these constraints were then averaged, allowing the contrasts of interest to be computed across the resulting time courses. One participant did not show a scene-selective activation in the left hemisphere (LH) RSC, and, because these areas were defined by this localizer so that we can later examine the relative effects of strong and weak contextual scenes in scene-sensitive regions, this subject could not be included in the analysis of the LH RSC but was included in all other ROI analyses. A one-way repeated-measures ANOVA was performed for experimental conditions on the mean percentage of peak signal change calculated for each condition.
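The ROI procedure above amounts to intersecting two voxel masks, averaging the surviving voxels, and summarizing the averaged time course. A minimal sketch follows, assuming boolean masks and per-voxel time courses held in plain lists. The data layout, the function names, and the definition of peak percentage signal change relative to a supplied baseline are illustrative assumptions, not the FreeSurfer implementation.

```python
def roi_voxels(localizer_mask, omnibus_mask):
    """Indices of voxels passing both the localizer and the omnibus constraint."""
    return [
        i
        for i, (in_localizer, in_omnibus) in enumerate(
            zip(localizer_mask, omnibus_mask)
        )
        if in_localizer and in_omnibus
    ]


def mean_time_course(time_courses, voxels):
    """Average the time courses of the selected voxels, time point by time point."""
    if not voxels:
        # Mirrors the case of a participant with no scene-selective voxels in an ROI.
        raise ValueError("ROI is empty for this participant")
    n_timepoints = len(time_courses[voxels[0]])
    return [
        sum(time_courses[v][t] for v in voxels) / len(voxels)
        for t in range(n_timepoints)
    ]


def peak_percent_signal_change(time_course, baseline):
    """Largest percentage deviation above baseline in the time course."""
    return max(100.0 * (x - baseline) / baseline for x in time_course)


# Hypothetical four-voxel example, for illustration only.
localizer_mask = [True, True, False, True]
omnibus_mask = [True, False, True, True]
time_courses = [
    [1000.0, 1010.0, 1020.0],
    [1000.0, 1000.0, 1000.0],
    [1000.0, 1050.0, 1060.0],
    [1000.0, 1030.0, 1040.0],
]
voxels = roi_voxels(localizer_mask, omnibus_mask)
roi_course = mean_time_course(time_courses, voxels)
peak = peak_percent_signal_change(roi_course, baseline=1000.0)
```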
The critical contrast in directly testing whether the PHC contains an exclusively place/scene-related “module” or more generally mediates contextual associations was to compare activation elicited by scenes with strong contextual associations and scenes of comparable visual and spatial properties but weak contextual associations. The results of this main contrast show a significant difference in the PHC as a function of associative strength, independent of place information and visual properties (Fig. 3). Similar results were obtained in the RSC, which has been proposed to mediate place/scene processing (Epstein et al., 2007), and to a smaller but significant extent in the medial prefrontal cortex (MPFC), both of which have been implicated in playing a central role in the processing and representation of contextual associations in addition to the PHC (Bar and Aminoff, 2003; Fenske et al., 2006; Bar et al., 2007). Additional increased activation for scenes with strong contextual associations compared with scenes with weak contextual associations was found on the lateral surface in the inferior parietal lobule. At a much reduced spatial extent, differential activity sensitive to scenes with strong contextual associations was found in the left cerebellum. Differential activation that was greater for scenes with weak contextual associations compared with scenes with strong contextual associations was found in the left and right caudate. However, this activity was at a much reduced spatial extent (spanning only one to three voxels) compared with the activation of the PHC, RSC, and MPFC. Given this limited evidence of differential activity in the caudate, we report the results but do not make claims about the underlying caudate role at this stage.
In this main contrast, it is important to remember that we are comparing scenes with scenes, in which the primary difference is whether the foreground objects have strong or weak contextual associations. Therefore, even the condition that is considered here as weak in terms of contextual associations is inherently highly contextual, because each stimulus is a picture of a scene with multiple objects that could elicit numerous associations. In fact, even weak contextual individual objects, presented in isolation (Bar and Aminoff, 2003), elicit activation in this network that, although lower than that elicited by strong contextual objects, is significant nevertheless. This observation emphasizes the associative way in which we perceive our environment (Bar et al., 2007).
Results from the PPA localizer, overlaid on the statistical map of the contrast of interest (Fig. 3A), reveal a substantial overlap between the processing of contextual associations and what has previously been attributed to place processing proper. The PHC and RSC regions that were differentially active in the PPA localizer (i.e., stronger for entire scenes compared with faces, weak contextual objects, and scrambled pictures) were used as ROIs to compare strong context with weak context scenes (Fig. 3B). In each such area, in both hemispheres, we found a significant effect of context such that the strong scenes activated the PPA and RSC ROIs significantly more than the weak scenes [PPA LH, t(17) = 3.25, p < 0.005; PPA right hemisphere (RH), t(17) = 3.14, p < 0.006; RSC LH, t(16) = 2.69, p < 0.02; RSC RH, t(17) = 2.89, p < 0.01].
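The degrees of freedom reported above (17 for 18 participants, 16 for the LH RSC with one participant fewer) are consistent with a within-subject comparison of the two conditions. A minimal sketch of a paired t statistic on per-participant signal change follows. The function name `paired_t` and the values are hypothetical and are not the study's data.

```python
import math
from statistics import mean, stdev


def paired_t(condition_a, condition_b):
    """Paired t statistic on per-participant differences.

    Returns (t, degrees_of_freedom), where df = number of participants - 1.
    """
    if len(condition_a) != len(condition_b):
        raise ValueError("each participant needs a value in both conditions")
    diffs = [a - b for a, b in zip(condition_a, condition_b)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n)), n - 1


# Hypothetical percentage signal change per participant, for illustration only.
strong_psc = [0.50, 0.62, 0.41, 0.55, 0.48]
weak_psc = [0.42, 0.51, 0.40, 0.47, 0.45]
t_stat, df = paired_t(strong_psc, weak_psc)
```

A positive t value here means that the strong context condition produced the larger response on average across participants.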
We also examined the effect of place information and the effect of number of objects within the scene in each ROI. Scenes were separated into an indoor category and an outdoor category and tested for significant differences across both the strong context scenes and the weak context scenes. A significantly greater amount of activity was elicited when viewing scenes in an indoor setting compared with an outdoor setting [PPA RH, F(1,17) = 8.62, p < 0.01; PPA LH, F(1,17) = 6.18, p < 0.03; RSC RH, F(1,17) = 12.38, p < 0.003; RSC LH, F(1,16) = 4.12, p < 0.06], which is consistent with previous such demonstrations (Bar and Aminoff, 2003; Henderson et al., 2007). We conducted an additional analysis in which we split the indoor scenes into three complexity categories defined by the number of objects: low, medium, and high (see Materials and Methods). The ROI analysis revealed significantly greater activity as the number of objects in the scene increased [PPA RH, F(2,34) = 4.13, p < 0.03; PPA LH, F(2,34) = 2.98, p < 0.06; RSC RH, F(2,34) = 4.00, p < 0.03; RSC LH, F(2,32) = 1.71, NS]. At each “number of objects” category and in each “place” category, the strong contextual scenes elicited more activity than the weak contextual scenes, demonstrating the dominant sensitivity to contextual associations. The number of objects within a scene, as well as the increased detail in the indoor scenes compared with the outdoor scenes, adds richness to the scene and thus correspondingly increases the number of associations elicited. Note that it is still theoretically possible that the PHC responds to the number of objects independent of contextual associations. To be able to argue that, however, one would have to create conditions in which the number of objects is different between scenes, while the number of contextual associations that the different conditions elicit remains constant, and test whether PHC activation changes accordingly.
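The F statistics above have the form expected from a one-way repeated-measures ANOVA: with k conditions and n participants, the degrees of freedom are k − 1 and (k − 1)(n − 1), which gives (2, 34) for three complexity levels and 18 participants. The sketch below computes such an F statistic from a participants-by-conditions table. The function name `rm_anova_oneway` and the data are hypothetical, and no sphericity correction is applied.

```python
def rm_anova_oneway(data):
    """One-way repeated-measures ANOVA.

    data[s][c] is the measurement for participant s in condition c.
    Returns (F, df_condition, df_error).
    """
    n = len(data)      # participants
    k = len(data[0])   # conditions
    grand_mean = sum(sum(row) for row in data) / (n * k)
    condition_means = [sum(row[c] for row in data) / n for c in range(k)]
    participant_means = [sum(row) / k for row in data]

    ss_condition = n * sum((m - grand_mean) ** 2 for m in condition_means)
    ss_participant = k * sum((m - grand_mean) ** 2 for m in participant_means)
    ss_total = sum((x - grand_mean) ** 2 for row in data for x in row)
    # Error term: variability left after removing condition and participant effects.
    ss_error = ss_total - ss_condition - ss_participant

    df_condition = k - 1
    df_error = (k - 1) * (n - 1)
    f_value = (ss_condition / df_condition) / (ss_error / df_error)
    return f_value, df_condition, df_error


# Hypothetical table: 3 participants x 3 complexity levels, for illustration only.
example = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 5.0],
    [3.0, 5.0, 6.0],
]
f_value, df_condition, df_error = rm_anova_oneway(example)
```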
In summary, we conclude from these findings that context modulates PHC activity for both types of place information (outdoor/indoor) and for any equated number of objects, and thus contextual associations are most critical for PHC activation independent of place/scene information, number of objects, and visual properties such as spatial frequency content.
We demonstrated that the PHC, and particularly the area that has been defined as place specific (PPA) within it, is intrinsically sensitive to the strength of contextual associations elicited by the scene, independent of place information and visual properties. This adds major support to our proposal that the PHC mediates contextual associations. Furthermore, the mere place information in the scenes, or the fact that the stimuli are scenes for that matter, is not sufficient to explain the operations of the PHC (and RSC). It is important to consider the implications of this demonstration of a direct sensitivity to contextual associations of the PHC to the view of its possible role in place and scene processing.
We have shown here that two scenes, which are on average equal in the information that they contain but different in the number of associations they elicit, activate the PHC to significantly different degrees. Therefore, the associative nature of complex scenes is a key factor in eliciting stronger parahippocampal activation. The data presented here do not imply that place information is not a relevant factor for the PHC. Place information is extremely rich with associations and, as such, is expected to rely heavily on a region that mediates contextual associations. In other words, spatial associations are primarily contextual associations, and place scenes are made up of rich contextual associations. Our perception of the function of the PHC is that it is critical for associative processing in various domains. This broader framework allows reconciling the fact that both spatial and nonspatial material activates the PHC, when sufficiently associative (Bar and Aminoff, 2003; Aminoff et al., 2007). In fact, our findings are consistent with the findings reported by Epstein and Kanwisher (1998), whereby pictures of rooms with objects elicited stronger PHC activation than rooms without objects, which we see as the cortical manifestation of the increased number of associations that is elicited by an increased number of objects in the scene. Taken together, we do not argue that the PHC is insensitive to place/scene information but rather that the sensitivity that it shows to such information is derived from its more broadly defined role in associative processing. Therefore, the intrinsic function of the PHC is in contextual associations, and this sensitivity accommodates both spatial and nonspatial associations. One prediction that stems from this framework is that place information that is not associative, if such information exists, will not activate the PHC because it lacks associations. 
By showing that the PHC response is better correlated with contextual associations than with place information proper we merely argue that the role of the PHC and the PPA in it should be considered in broader functional terms, and these terms should be dominated by an associative explanation rather than by a spatial explanation. This characterization collectively explains the activations observed in these regions in our context studies, in studies of spatial and scene processing, and in studies of episodic memory.
Contextual associations and place processing are distinguished here by showing that PHC activation is modulated by contextual associations even when place/scene information remains unchanged. As mentioned in the Introduction, the PHC has also been implicated in episodic memory, which we have not addressed here directly. Episodic memories reflect experience with specific episodes and have been shown to require the hippocampus (Scoville and Milner, 1957; Burgess et al., 2002) presumably in addition to the PHC. When elements of such episodes repeat in our environment in sufficiently varying contexts, they are gradually dissociated from particular episodes and may gradually become more semantic than episodic. For example, seeing a bottle of water in different contexts dissociates it from a particular context and leaves in memory a more semantic knowledge about it. The type of contextual associations we refer to with relation to the proposed role of the PHC might lie in between episodic and semantic information. When an object or an episode most frequently appears in a particular, unique context, this object/episode remains highly diagnostic of that context and is thus considered here as a strong contextual item. Objects and episodes that tend to appear in different contexts, conversely, become less predictive of a context and thus are considered weakly contextual. We have proposed that strongly contextual information is clustered in memory in context frames (Bar and Ullman, 1996; Bar, 2004). Such context frames represent prototypical, recurring information pertaining to the specific context, such as identities of objects most commonly associated with that context, as well as typical relationships between them.
To understand to what extent such context frames are episodic or not, one would have to study this issue directly in the future, for example by creating episodes that are either congruent or incongruent with a context, and evaluating whether activation is more sensitive to the episodic nature of the information or to its relation to typical contexts.
The indication that the caudate may be more sensitive to scenes with weak contextual associations than to scenes with strong contextual associations might suggest that objects that appear in many different contexts have a representation that is more semantic than episodic in nature. This issue is of potential importance for theories of multiple memory systems and deserves more direct testing in the future.
We have concentrated here on the PHC in particular, because the primary goal of this study was to test directly whether its sensitivity is more related to place information or more broadly to contextual associations. However, the cortical network involved in context-related processes is more distributed and in addition to the PHC includes the RSC and the MPFC. All of these regions showed here an activation pattern that was correlated with magnitude of contextual associations in the scenes, as elaborated in Results. Contextual processing is a broad term that encompasses several subprocesses. The division of labor between these context network components is outside the scope of this specific study, but we have proposed and shown initial evidence (Bar and Aminoff, 2003; Bar, 2007; Aminoff et al., 2008) that the RSC contains contextual representations that are more abstract and prototypical (i.e., context frames), the PHC is more directly related to the specific appearance and physical properties of a context and its associated elements (and thus might contain more episodic versions of context frames), and the MPFC, which shows significant activation for strongly contextual information primarily when the task requires an explicit focus on the context, might be more related to the context-based generation of predictions. Naturally, given that this characterization of the context network is in its initial stages, additional studies will help refine these definitions further.
It is easy to see why the pioneering studies that have been conducted previously to study scene and place analysis in the cortex came to the conclusion that the PHC contains a region that is strictly place/scene sensitive (Aguirre et al., 1998; Epstein and Kanwisher, 1998). After all, the idea that humans have developed special sensitivity to orientation-related visual information is highly appealing. [Indeed, there are convincing reports of more exclusively place-related processing and representations in the neighboring entorhinal cortex (Fyhn et al., 2004; Brun et al., 2008).] By building on those initial PHC studies while controlling for level of contextual associations, we show that the role of the PHC should be considered more generally as mediating contextual associations, which are central for navigation, processing place information, and scenes but are used also for other cognitive functions relying on such associations.
This work was supported by National Institute of Neurological Disorders and Stroke Grant NS050615 and a Dart Neuroscience Award (both to M.B.), National Institute of Mental Health Grant MH-NS60941 (D.L.S.), and National Research Service Award T32MH070328 (E.A.). We thank M. Chun and H. Intraub for their generosity in providing some of the scene stimuli, K. Kveraga for his help with analyzing the spectral content of the stimuli, and two anonymous reviewers for constructive comments.
Correspondence should be addressed to Moshe Bar, Martinos Center for Biomedical Imaging at Massachusetts General Hospital, Harvard Medical School, 149 Thirteenth Street, Charlestown, MA 02129.