## Abstract

As we navigate the world, we use learned representations of relational structures to explore and to reach goals. Studies of how relational knowledge enables inference and planning are typically conducted in controlled small-scale settings. It remains unclear, however, how people use stored knowledge in continuously unfolding navigation (e.g., walking long distances in a city). We hypothesized that multiscale predictive representations guide naturalistic navigation in humans, and these scales are organized along posterior-anterior prefrontal and hippocampal hierarchies. We conducted model-based representational similarity analyses of neuroimaging data collected while male and female participants navigated realistically long paths in virtual reality. We tested the pattern similarity of each point, along each path, to a weighted sum of its successor points within predictive horizons of different scales. We found that anterior PFC showed the largest predictive horizons, posterior hippocampus the smallest, with the anterior hippocampus and orbitofrontal regions in between. Our findings offer novel insights into how cognitive maps support hierarchical planning at multiple scales.

**SIGNIFICANCE STATEMENT** Whenever we navigate the world, we represent our journey at multiple horizons: from our immediate surroundings to our distal goal. How are such cognitive maps at different horizons simultaneously represented in the brain? Here, we applied a reinforcement learning-based analysis to neuroimaging data acquired while participants virtually navigated their hometown. We investigated neural patterns in the hippocampus and PFC, key cognitive map regions. We uncovered predictive representations with multiscale horizons in prefrontal and hippocampal gradients, with the longest predictive horizons in anterior PFC and the shortest in posterior hippocampus. These findings provide empirical support for the computational hypothesis that multiscale neural representations guide goal-directed navigation. This advances our understanding of hierarchical planning in everyday navigation of realistic distances.

## Introduction

When we navigate a city, we draw on our memory. We learn, retrieve, and update representations of relationships among different locations. This relational knowledge guides decisions and behavior (O'Keefe and Nadel, 1978; Burgess et al., 2002; Behrens et al., 2018; Momennejad, 2020) and has been captured by computational models of planning, inference, and spatial navigation (Garvert et al., 2017; Momennejad et al., 2017; Stachenfeld et al., 2017). While some argue for one-step relational representations, here we provide evidence for computational models suggesting that predictive representations of relational structures may be organized at multiple scales along hippocampal (Stachenfeld et al., 2017; Momennejad and Howard, 2018) and prefrontal (Christoff and Gabrieli, 2000; Koechlin and Hyafil, 2007; Momennejad and Haynes, 2013) hierarchies. Such hierarchical structure could also enable the extraction of abstract relational structures that unfold at lower levels (Fig. 1*A*).

Previously, we had shown that representations learned by reinforcement learning (RL) models capture human planning (Momennejad et al., 2017), in highly controlled experiments with fixed predictive scales. Here, we used model-based fMRI data analysis to test predictions of a multiscale predictive representation model (Momennejad and Howard, 2018) on brain signals collected during virtual navigation of a real-world city that participants had learned in their daily lives (dataset from Brunec et al., 2018). Our results show that representations in prefrontal and hippocampal hierarchies during real-world navigation were aligned with model predictions.

Our framework of multiscale representations stems as much from computational models as from decades of electrophysiology and neuroimaging findings. Our first hypothesis was that, during virtual navigation, anterior hippocampus would display representational similarity at longer predictive scales than posterior hippocampus. Rodent place field size increases along the dorsal-ventral hippocampal axis, with larger and more overlapping place fields in more ventral regions (Jung et al., 1994; Kjelstrup et al., 2008; Strange et al., 2014; Contreras et al., 2018). Human fMRI evidence suggests a similar gradient along the posterior-anterior axis (homologous to the rodent dorsal-ventral axis) (Poppenk et al., 2013). The larger-scale anterior hippocampal representations might support goal-directed search (Ruediger et al., 2012), integration of spatial and nonspatial states further apart (Collin et al., 2015), and longer time horizons (Nielson et al., 2015). Posterior hippocampal representations are more myopic and may support fine-grained spatial relations (Evensmoen et al., 2013) and pattern separation in memory (Schlichting et al., 2015; Duncan and Schlichting, 2018; Lohnas et al., 2018). Recent computational models of predictive representations capture multiscale place fields and why they skew toward goals (Stachenfeld et al., 2017; Momennejad and Howard, 2018).

Our second hypothesis was that anterior PFC (antPFC) would display representational similarity to more distant states than posterior PFC. The PFC's hierarchical representations (Badre and D'Esposito, 2007) support active navigation and planning (Spiers and Gilbert, 2015; Epstein et al., 2017), computing alternative paths to goal (Javadi et al., 2017), reversals and detours (Spiers and Gilbert, 2015), and retrospective revaluation via offline replay (Momennejad et al., 2018). Neuroimaging evidence suggests a prefrontal hierarchy whereby more antPFC regions support relational reasoning (Christoff et al., 2009), abstraction (Christoff et al., 2001; Bunge et al., 2003), and prospective memory (Gilbert, 2011; Momennejad and Haynes, 2012, 2013).

Finally, we hypothesized that representational scales in antPFC would exceed the longest predictive horizons of hippocampal representations (Fig. 1*B*). The antPFC is the largest cytoarchitectonic region of the human PFC (Ramnani and Owen, 2004). PFC's recurrent interconnectivity enables information to linger across longer scales allowing slower learning and integration. In contrast, the hippocampus supports rapid statistical learning (Schapiro et al., 2013, 2016, 2017) and is less heterogeneous across mammalian species (Strange et al., 2014).

To test these hypotheses, we conducted model-based representational similarity analyses (Fig. 1*D*) on an existing fMRI dataset (Brunec et al., 2018), in which participants actively navigated to known goals (goal-directed condition), or followed a dynamic arrow along unfamiliar routes (GPS condition) in a virtual version of Toronto (Fig. 1*C*). The participants' experience in this virtual setup was as realistic as possible within the constraints of fMRI, and benefited from real-world familiarity, allowing us to compare predictive horizons on well-learned versus novel routes.

Consistent with our predictions, antPFC displayed representational similarity at longer horizons on goal-directed compared with GPS-guided paths. Anterior hippocampus followed the PFC, whereas posterior hippocampus supported smallest-scale predictive representations.

## Materials and Methods

##### Subjects

Twenty-two healthy right-handed volunteers were recruited. One participant was excluded because of excessive difficulty with the task (i.e., repeatedly getting lost). Two additional participants were excluded because of incomplete data or technical issues. Exclusions resulted in 19 participants who completed the study (9 males; mean age 22.58 years, range 19-30 years). The sample size was not predetermined using a power analysis. We reanalyzed the dataset from a previously published study and included all participants with usable data. All participants had lived in Toronto for at least 2 years (mean = 10.45, SE = 1.81). All participants were free of psychiatric and neurologic conditions, had normal or corrected-to-normal vision, and otherwise met the criteria for participation in fMRI studies. Informed consent was obtained from all participants in accordance with Rotman Research Institute at Baycrest's ethical guidelines. Participants received monetary compensation on completion of the study.

##### Experimental design and paradigm

The details of the experimental design have been reported previously (Brunec et al., 2018). The task used a realistic navigation software drawing on 360° panoramic images from Google Street View. This allowed participants to walk through a virtual Toronto from a first-person, street-level perspective. The navigation software was written in MATLAB version 7.5.0.342. Navigation was controlled using three buttons: left, right, and forward. A “done” button allowed participants to indicate that they had completed a route. The task was projected on a screen in the bore of the scanner viewed by the participants through a mirror mounted inside of the head coil. Participants navigated in four conditions, and navigated 16 routes in total (four in each condition, in a randomized order).

Data from two conditions of interest were analyzed in the present manuscript: goal-directed and GPS/arrow-following routes. The routes were constructed before the day of scanning: participants built routes with researcher assistance, using a computer program which showed overhead maps of Toronto. Additionally, sets of routes in areas of Toronto with which participants were generally unfamiliar were created. Four of these routes were randomly assigned to each participant to be used in the baseline (GPS) condition. In the scanner, participants were provided with goal-directed route destinations and asked to navigate toward the goal along the most goal-directed/comfortable route. GPS trials involved no goal-directed navigation; instead, participants followed a dynamic arrow (Fig. 1*C*). To navigate GPS-guided routes, participants used the same control buttons as they did along goal-directed routes. However, in the GPS condition, they did not know the goal or the distance. We only analyzed routes where participants successfully reached the goal (*M*_{goal-directed} = 3.37, *M*_{GPS} = 3.16 routes). Comparing these conditions enabled us to contrast navigational signals associated with goal-directed navigation with matched motor control and optic flow, but no goal.

##### fMRI acquisition and preprocessing

Participants were scanned with a 3T Siemens MRI scanner at the Rotman Research Institute at Baycrest. A high-resolution 3D MPRAGE T1-weighted pulse sequence image (160 axial slices, 1 mm thick, FOV = 256 mm) was first obtained to register functional maps against brain anatomy. Functional T2*-weighted images were acquired using EPI (30 axial slices, 5 mm thick, TR = 2000 ms, TE = 30 ms, flip angle = 70 degrees, FOV = 200 mm). The native EPI resolution was 64 × 64 with a voxel size of 3.5 mm × 3.5 mm × 5.0 mm. The preprocessed data were the same as those used in Brunec et al. (2018). Images were first corrected for physiological motion using the Analysis of Functional Neuroimages software (AFNI; Cox, 1996). All subsequent preprocessing steps were conducted using the statistical parametric mapping software SPM12 (Penny et al., 2011). Preprocessing involved slice timing correction, spatial realignment, and coregistration with a resampled 3 mm isotropic voxel size, with no spatial smoothing. The mean time courses from participant-specific white matter and CSF masks were regressed out of the functional images, alongside estimates of the 6 rigid-body motion parameters from each EPI run. To further correct for the effects of motion which may persist despite standard processing (Power et al., 2012), an additional motion scrubbing procedure was added to the end of our preprocessing pipeline (Campbell et al., 2013). Using a conservative multivariate technique, time points that were outliers in both the 6 rigid-body motion parameter estimates and BOLD signal were removed, and outlying BOLD signal was replaced by interpolating across neighboring data points. This method further reduces effects of motion-induced spikes on the BOLD signal without leaving sharp discontinuities because of the removal of outlier volumes.

##### Analysis

We used two main representational similarity analyses (Fig. 1*D*). To maximally benefit from the temporal resolution afforded by fMRI, paths were discretized into steps: each step corresponded to a TR, during which an entire brain volume was measured. In the first analysis, we computed the correlation between every given step (TR) and the average of all future steps (TRs) within a particular horizon (e.g., the mean of the 10 TRs following the current TR). In the second analysis, following the equations for predictive or successor representations (Dayan, 1993; Momennejad et al., 2017), we computed the correlation between every given step and the discount-weighted sum of future steps within a horizon. The pattern across voxels at each future TR was weighted exponentially using a discount parameter (γ, ɣ) between 0 and 1. The value of the discount parameter determined the scale of abstraction, with different values corresponding to different levels of a representational hierarchy (Momennejad and Howard, 2018).
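The second analysis can be sketched in a few lines of NumPy. This is a minimal illustration, not the analysis code used in the study: the function name `discounted_future_similarity`, the explicit `horizon` cutoff, and the choice to skip TRs that lack a full horizon of successors are all assumptions made for the example.

```python
import numpy as np

def discounted_future_similarity(patterns, gamma, horizon):
    """Correlate each TR's voxel pattern with the gamma-weighted sum of its successors.

    patterns: array of shape (n_TRs, n_voxels) for one navigated route.
    gamma:    discount parameter between 0 and 1.
    horizon:  number of future TRs included in the sum (assumed cutoff).
    Returns one Pearson r per TR that has a full horizon of successors.
    """
    n_trs = patterns.shape[0]
    # The successor k steps ahead is weighted by gamma**k.
    weights = gamma ** np.arange(1, horizon + 1)
    rs = []
    for t in range(n_trs - horizon):
        future = patterns[t + 1 : t + 1 + horizon]   # (horizon, n_voxels)
        weighted_sum = weights @ future              # (n_voxels,)
        rs.append(np.corrcoef(patterns[t], weighted_sum)[0, 1])
    return np.array(rs)
```

Under these assumptions, a small `gamma` makes the weighted sum depend almost entirely on the next TR, whereas a `gamma` close to 1 lets distant TRs contribute.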

##### ROI analysis

We investigated the predictive similarity of each state to future representations in a set of ROIs. To do so, we first extracted voxelwise time courses across each navigated route and *z*-scored the values within each voxel. We then ran two predictive similarity analyses. First, we measured the correlation of each time point (TR) with the mean of successor TRs within a given horizon (e.g., correlation between TR at time t, and the mean of 10 following TRs). Second, we correlated the voxelwise pattern at each time point (TR) within each navigated route with a discount-weighted sum of future TRs. The patterns at future TRs were weighted by different constant values (ɣ), corresponding to different predictive spatial scales. The specified ɣ values were 0.1, 0.6, 0.8, and 0.9 (Fig. 1*D*). With increasing ɣ values, time points further in the future remain weighted >0.
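The preprocessing step and the first (unweighted) analysis described above can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and it assumes every voxel has nonzero variance along the route.

```python
import numpy as np

def zscore_voxels(patterns):
    """Z-score each voxel's time course (each column) across the TRs of a route."""
    return (patterns - patterns.mean(axis=0)) / patterns.std(axis=0)

def mean_future_similarity(patterns, n_future):
    """Correlate each TR's pattern with the mean pattern of the next n_future TRs.

    TRs without n_future successors are skipped (an assumption of this sketch).
    """
    rs = []
    for t in range(patterns.shape[0] - n_future):
        future_mean = patterns[t + 1 : t + 1 + n_future].mean(axis=0)
        rs.append(np.corrcoef(patterns[t], future_mean)[0, 1])
    return np.array(rs)
```

For example, `mean_future_similarity(zscore_voxels(route), 10)` would give the TR-by-TR similarity to the mean of the 10 following TRs for an array `route` of shape (TRs, voxels).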

As the average distance traversed within each TR was ∼25 m, a ɣ value of 0.1 meant that only the immediately following step (1 TR away) was weighted >0, and steps farther in the distance contributed little to no weight to the sum of future representations. We computed the predictive horizon using the unit of fMRI measurement (i.e., a TR of 2 s). Hence, depending on the speed of navigation, which was matched across conditions (see Fig. 2*C*), each step could cover a varying range of spatial distances (in meters) within and across subjects. Here we used the average distance traversed within a given horizon. For a ɣ value of 0.6, ∼7 steps in the future were weighted >0, corresponding to ∼175 m (see Fig. 6*D*). For a ɣ value of 0.8, ∼15 steps (∼375 m) were weighted >0, and for a ɣ value of 0.9, ∼32 steps (∼800 m) were weighted >0 (see Fig. 6*D*).
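The conversion from steps to meters is simple arithmetic and can be checked directly. The sketch below takes the reported step counts and the ∼25 m per TR as given; the helper name `horizon_summary` is hypothetical. It also prints the weight ɣ^k at the edge of each reported horizon. Because ɣ^k is strictly positive for every k, "weighted >0" is best read as "weighted non-negligibly": for ɣ = 0.6, 0.8, and 0.9 the edge weight comes out near 0.03.

```python
METERS_PER_TR = 25  # approximate average distance covered per TR, as reported

# Approximate number of future steps reported as weighted > 0 for each discount value.
reported_steps = {0.1: 1, 0.6: 7, 0.8: 15, 0.9: 32}

def horizon_summary(gamma, steps, meters_per_step=METERS_PER_TR):
    """Return (horizon distance in meters, weight gamma**steps at the horizon edge)."""
    return steps * meters_per_step, gamma ** steps

for gamma, steps in reported_steps.items():
    meters, edge_weight = horizon_summary(gamma, steps)
    print(f"gamma={gamma}: {steps} steps, about {meters} m, edge weight {edge_weight:.3f}")
```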

The TR-by-TR correlations within each route were averaged to derive the representation of future states on each trial. We applied this analysis to *a priori* ROIs, including bilateral anterior and posterior hippocampi (aHPC, pHPC) and anterior and medial prefrontal cortical ROIs (antPFC, mPFC). The antPFC and mPFC ROIs were defined as spheres surrounding peak voxels identified in preliminary findings from an fMRI adaptation of a known behavioral study of successor representations (Momennejad et al., 2017) reported by Russek et al. (2018). The spheres were centered on an anterior prefrontal voxel (MNI coordinates *x* = 8, *y* = 68, *z* = 8) and a medial prefrontal voxel (MNI coordinates *x* = −22, *y* = 56, *z* = 10). These analyses were performed for each of the ROIs, as well as in a searchlight within the PFC.

##### PFC searchlight analysis

To identify any gradients of predictive representation in the PFC, a custom searchlight analysis was performed within a PFC mask (created in WFU PickAtlas). The analysis was restricted to gray matter voxels, and a spherical ROI with a 6 mm radius was used to iteratively correlate each TR with the discount-weighted sum of future states for voxels within each searchlight. The searchlight analysis was performed for four different values of ɣ: 0.1, 0.6, 0.8, and 0.9. The single-subject correlation maps were then compared against zero (AFNI *3dttest++*). The output *z* score maps were thresholded at values corresponding to 5% false positive rates established by a cluster-size permutation simulation (AFNI *ClustSim*).
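To give a sense of the searchlight size, the sketch below enumerates the voxels that fall inside a 6 mm sphere. It assumes the 3 mm isotropic resampled voxel size mentioned in the preprocessing section and assumes that a voxel belongs to the sphere when its center lies within the radius. The function name `sphere_offsets` is hypothetical, and the custom searchlight used in the study may have defined membership differently.

```python
import itertools

def sphere_offsets(radius_mm, voxel_mm):
    """Integer voxel offsets whose centers lie within radius_mm of the center voxel."""
    reach = int(radius_mm // voxel_mm)
    offsets = []
    for dx, dy, dz in itertools.product(range(-reach, reach + 1), repeat=3):
        # Compare squared distances in mm to avoid a square root.
        if (dx * dx + dy * dy + dz * dz) * voxel_mm ** 2 <= radius_mm ** 2:
            offsets.append((dx, dy, dz))
    return offsets

offsets = sphere_offsets(radius_mm=6, voxel_mm=3)
```

Under these assumptions each searchlight covers at most `len(offsets)` voxels, fewer near the edge of the gray matter mask.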

##### Model-based analysis: the discount-weighted sum of successor states

This section addresses the reasoning behind testing the successor representation hypothesis in terms of pattern similarity between a given state and the discount-weighted sum of its successor states (Fig. 1). Consider an environment that consists of *n* states, some of which lead to one another. Consider *T* to be the *n* × *n* matrix of transition probabilities for one-step transitions among these *n* states. In a deterministic environment, when there is a transition from a given state *s*_{i} to state *s*_{j}, we assign 1 in the *i*th row and *j*th column of *T*. The successor representation under a random policy can be then computed from *T* as follows (for comparison to policy-dependent SR, see Momennejad, 2020):

$$M = \sum_{t=0}^{\infty} \gamma^{t} T^{t} = (I - \gamma T)^{-1} \tag{1}$$

$$M(s_1, s_g) = \sum_{t=0}^{\infty} \gamma^{t}\, T^{t}(s_1, s_g) \tag{2}$$

Equation 2 expands Equation 1 for computing the successor representation from state *s*_{1} to the goal state *s*_{g} from *T*, which is one cell in the SR matrix. Recall that *T* denotes the matrix of one-step transition probabilities among adjacent states, while SR contains multistep dependencies among nonadjacent states. Here the parameter *t* refers to the number of steps (or the distance) between states. This parameter need not denote temporal steps, and can denote any type of sequential relationship among states.

Assume the starting state is *s*_{1} and the goal state is *s*_{5}. Expanding Equation 2, the successor representation from State 1 to State 5 is the fifth element in the first row of the successor representation (Eq. 3), and corresponds to the expected discounted number of times we expect to visit State 5, if we start from State 1:

$$M(s_1, s_5) = \gamma^{0} T^{0}(s_1, s_5) + \gamma\, T(s_1, s_5) + \gamma^{2} T^{2}(s_1, s_5) + \gamma^{3} T^{3}(s_1, s_5) + \gamma^{4} T^{4}(s_1, s_5) + \cdots \tag{3}$$

Equations 2 and 3 only capture 1 cell or element in the SR row associated with state *s*_{1}. In the successor representation framework, the *s*th row of the SR matrix (the M matrix in Dayan, 1993 equations) is the representation we expect to observe when the agent is in state *s*. It denotes how often we expect to visit the current state's successors on average and given a discount. A given row of the successor representation includes the present state, and the weighted representation of successor states. Thus, at the moment when an agent is in state *s*, the row activation of successor states predicts the simultaneous activation of γ-weighted representations. We take this simultaneous row activation as the sum of all activated weighted states in the row (Eq. 4).

In short, the first row of the SR matrix corresponds to the representation that is simultaneously activated when the agent is in State 1, which is the sum of *M*(*s*_{1}, *s*_{2}), *M*(*s*_{1}, *s*_{3}), *M*(*s*_{1}, *s*_{4}), *M*(*s*_{1}, *s*_{5}). Since we only have a goal-directed trajectory, this can be the weighted sum of representations of successor states (Eq. 4). Each successor state is weighted by the discount factor (γ, ɣ) to the power of its distance (here in the number of states) to the starting state. A simple prediction following this weighted sum view is that being in a given state along the trajectory activates the row associated with that state and hence the weighted sum of successor states on that trajectory. This predicts neural similarity between the current state and the weighted sum of successor state representations.

$$\sum_{j=2}^{5} M(s_1, s_j)\, s_j = \gamma\, s_2 + \gamma^{2} s_3 + \gamma^{3} s_4 + \gamma^{4} s_5 \tag{4}$$

Here *s*_{j} on each side of Equation 4 stands for the representation of state *s*_{j}, and the right-hand side uses *M*(*s*_{1}, *s*_{j}) = γ^{j−1}, which holds on a deterministic path.

We did not have access to pretraining representations of the stimuli, for example, the uncorrelated representation of each location on the trajectory before being associated with specific paths (through lived experience in Toronto). Since we do not have these pretraining representations, this method offers an approximation of the expected similarity structure. Therefore, as a general rule, we make the following prediction. In a goal-directed trajectory, and assuming the agent stays on path, we can assume that the transition probability between two adjacent states, for example, *T*(*s*_{i}, *s*_{j}), equals 1 (i.e., we have a deterministic Markov Decision Process (MDP)). We predict that Equation 3 approximates the pattern similarity of the TR in the *i*th state to the weighted sum of TRs that are its successor states. The predictive horizon is the successor distance within which the discount parameter γ > 0 (see Fig. 6). We hypothesize that different parts of the brain will show pattern similarity contingent with different values of the discount parameter γ.
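As a concrete check of this reasoning, the sketch below builds the transition matrix for a deterministic five-state path and computes the successor representation with the standard closed form *M* = (*I* − γ*T*)^{−1} (Dayan, 1993). The five-state path, the value γ = 0.8, and the function name are illustrative assumptions and do not come from the study's code.

```python
import numpy as np

def successor_representation(T, gamma):
    """Closed-form SR: M = (I - gamma * T)^-1, equal to the sum over t of gamma^t T^t."""
    n = T.shape[0]
    return np.linalg.inv(np.eye(n) - gamma * T)

# Deterministic path s1 -> s2 -> s3 -> s4 -> s5; s5 is terminal, so its row is all zeros.
n_states = 5
T = np.zeros((n_states, n_states))
for i in range(n_states - 1):
    T[i, i + 1] = 1.0

gamma = 0.8
M = successor_representation(T, gamma)

# Row for s1: the present state gets weight 1, and the successor k steps away gets gamma**k.
first_row = M[0]
```

On such a path, the entry `M[0, 4]` equals γ⁴, so each successor is weighted by the discount factor raised to its distance from the starting state.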

This is a first step toward testing the multiscale predictive representation hypothesis in a realistic navigation setting. To improve prediction accuracy, future studies are needed that incorporate diverse paths through each state, to each goal, and to different goals. These studies should include a larger graph or MDP of the environment with different starting and goal locations. In order to study map-dependent and path-dependent changes in the representation of each location, a study design is needed where the participants learn a new environment. Such studies would enable us to compare pretraining and post-training neural correlations among the states or locations in the environment.

## Results

Participants navigated a set of distances they regularly traversed in everyday life (*M*_{goal-directed} = 3.5, *M*_{GPS} = 2.5 km). After completing each route, participants rated how familiar each route felt, and how difficult they found it to navigate on a scale from 1 to 9 (where 1 would correspond to least familiar and most difficult, respectively). As expected, the average reported familiarity was higher in the goal-directed condition (mean = 7.0, SD = 1.44) than in the GPS condition (mean = 3.0, SD = 0.51; *t*_{(18)} = −10.53, *p* < 0.001, *d* = 2.42; Fig. 2*A*). The subjective difficulty was similar in the goal-directed (mean = 6.89, SD = 1.43) and GPS (mean = 7.24, SD = 1.08) conditions, suggesting that all navigated routes were perceived to be similarly undemanding (*t*_{(18)} = 0.827, *p* = 0.419, *d* = 0.190; Fig. 2*B*). There was also no difference in movement speed across the goal-directed (mean = 24.91, SD = 7.66) and GPS conditions (mean = 25.23, SD = 2.25; *t*_{(18)} = 0.191, *p* = 0.851, *d* = 0.044; Fig. 2*C*). GPS routes did, however, include more turns (mean = 7.08, SD = 1.39) than goal-directed routes (mean = 5.86, SD = 1.78; *t*_{(18)} = 3.04, *p* = 0.007, *d* = 0.698; Fig. 2*D*). This was the case despite the GPS routes being shorter than goal-directed routes, on average (*t*_{(18)} = −4.31, *p* < 0.001, *d* = 0.989; Fig. 2*E*).

### Hippocampal and prefrontal gradients of near-future predictive representations

To investigate predictive representations along hippocampal and prefrontal hierarchies, we conducted a progression of analyses. First, we investigated representational similarity between each time point (TR) and the average of future *n* TRs, where *n* determined different future horizons (i.e., unweighted average of 1, 2, 3, 4, 5, or 10 future TRs) (Fig. 3). We conducted the analyses separately on 6 *a priori* ROIs of anterior-posterior hippocampus (split into 6 slices as in Brunec et al., 2018) and *a priori* selected mPFC and antPFC ROIs (see Materials and Methods). Second, we conducted the same analyses with discount-weighted sums of future TRs at different horizons (Figs. 4 and 5), focusing on two posterior and anterior hippocampal ROIs and mPFC and antPFC. In follow-up analyses, we included the path distance on each route as a factor in the model. Third, we conducted the discount-weighted sum RSA in a PFC-masked searchlight analysis to detect scales of representation in the PFC (Figs. 6 and 7).

### Representational similarity to mean of future TRs across horizons

We conducted linear mixed effects models on these similarity measures in bilateral hippocampi for each of the routes traveled within each condition. We included average Fisher's *z*-transformed similarity on each route as the dependent variable, and axial segment (1-6), number of TRs (1-5), and hemisphere (L, R) as fixed effects. Similar analyses were performed for PFC ROIs. Participants were included as a random effect. The random intercept mixed effects models were implemented in R (R Core Team) using the packages *lme4* (Bates et al., 2015) and *lmerTest* (Kuznetsova et al., 2017) to assess significance. This produced a Type III ANOVA table with Satterthwaite's method of approximating degrees of freedom. Where these included decimal numbers, they were rounded to the nearest integer. Effect sizes for individual factors in mixed effects models were calculated as η_{p}^{2} values using the *effectsize* R package (Ben-Shachar et al., 2020). For overall model fits, we report marginal pseudo-*R*^{2} values using the *r.squaredGLMM* function from the *MuMIn* R package, which represent the variance explained by fixed effects in the model (*R*^{2}_{M}) (Nakagawa et al., 2017; Bartoń, 2020). The similarity values for 10 TRs ahead were not entered in the present model because of the nonlinear shift from 5 to 10 TR, but they are plotted in Figure 3. All plots were generated with the ggplot2 package (Wickham, 2016).
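The dependent variable can be illustrated with a short sketch of Fisher's *z*-transform, which maps a correlation *r* to atanh(*r*). The helper names are hypothetical, and transforming each TR-level correlation before averaging (rather than averaging first) is an assumption of this sketch. The mixed effects models themselves were fit in R and are not reproduced here.

```python
import math

def fisher_z(r):
    """Fisher's z-transform of a correlation coefficient, z = atanh(r)."""
    return math.atanh(r)

def mean_route_similarity(tr_correlations):
    """Average the Fisher z-transformed TR-by-TR correlations of one route."""
    zs = [fisher_z(r) for r in tr_correlations]
    return sum(zs) / len(zs)
```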

#### Hippocampal results

We found a significant effect of axial segment (*F*_{(5,6796)} = 45.38, *p* < 0.001, η_{p}^{2} = 0.03), driven by greater future representations in the anterior segments compared with posterior ones. There was also a main effect of condition (*F*_{(1,6796)} = 1182.35, *p* < 0.001, η_{p}^{2} = 0.15), reflecting generally greater values in the goal-directed (Fig. 3*A*), compared with the GPS condition (Fig. 3*B*), and a significant effect of the future horizon (*F*_{(1,6796)} = 633.44, *p* < 0.001, η_{p}^{2} = 0.27), reflecting higher similarity values for states closer to the present. There was a main effect of hemisphere, reflecting higher values in the right compared with the left hemisphere (*F*_{(1,6796)} = 6.97, *p* = 0.008, η_{p}^{2} = 0.001). There were significant interactions between axial segment and condition (*F*_{(5,6796)} = 6.97, *p* < 0.001, η_{p}^{2} = 0.004), axial segment and future horizon (*F*_{(20,6796)} = 2.13, *p* = 0.002, η_{p}^{2} = 0.006), and condition and future horizon (*F*_{(4,6796)} = 13.16, *p* < 0.001, η_{p}^{2} = 0.008). The latter interaction is of particular interest as it suggests that the decline across different predictive horizons was greater in the GPS compared with the goal-directed condition. There was no significant three-way interaction (*F* < 1). The overall *R*^{2}_{M} of the model was 0.30.

#### PFC results

We fit the same models separately for the *a priori* selected ROIs in antPFC and mPFC. In the antPFC (overall *R*^{2}_{M} = 0.22), there was a significant main effect of condition (*F*_{(1,1222)} = 363.76, *p* < 0.001, η_{p}^{2} = 0.23), as well as a main effect of future horizon (*F*_{(4,1222)} = 48.36, *p* < 0.001, η_{p}^{2} = 0.14), but no condition by future horizon interaction (*F*_{(4,1222)} = 1.18, *p* = 0.319, η_{p}^{2} = 0.004). In the mPFC (overall *R*^{2}_{M} = 0.26), there was a significant effect of condition (*F*_{(1,1222)} = 218.77, *p* < 0.001, η_{p}^{2} = 0.19) and a significant effect of future horizon (*F*_{(4,1222)} = 114.82, *p* < 0.001, η_{p}^{2} = 0.27), but again no significant condition by future horizon interaction (*F*_{(4,1222)} = 1.82, *p* = 0.122, η_{p}^{2} = 0.006).

Comparing the representational similarity in the goal-directed and GPS conditions against zero, we found that the antPFC displayed >0 similarity for every predictive horizon, including 10 steps ahead, in the goal-directed condition (all *p* values < 0.001), but only up to 5 steps in the GPS condition (all *p* values for 1-5 steps < 0.001). In contrast, the mPFC only displayed >0 similarity up to 5 steps in the future on goal-directed routes (*p* values < 0.001) and 3 steps on GPS routes (*p* values ≤ 0.002). The anterior-most hippocampal segment displayed >0 similarity for up to 4 steps in the future (*p* values ≤ 0.006) on goal-directed routes and only 1 step on GPS routes (*p* < 0.001), while the posterior-most hippocampal segment displayed >0 similarity for 1 step on goal-directed routes (*p* < 0.001), and 2 steps on GPS routes (*p* values ≤ 0.006).

We next conducted similar analyses with the weighted sum of future TRs of different horizons.

### Model-based representational similarity to future TRs in ROIs

When an RL agent is in a given state during navigation, the discount-weighted sum of the successor states is the predictive representation for that state (see Materials and Methods). Therefore, we investigated the similarity between each time point and the ɣ-weighted sum of the representations of future states. We ran a series of linear mixed effects models following the logic described above, including each route within each of the conditions. The models included Fisher's *z*-transformed representational similarity values as the dependent variable, with ɣ and condition as fixed effects and participant as a random effect. ɣ was modeled as an ordinal variable. All analyses were implemented using the same packages as above. For the hippocampus, the reported statistics and plotted values apply to the right hippocampus, as there was no significant difference between left and right hippocampi (all *p* values > 0.34).

#### Mixed effects analysis

The first mixed effects model included all ROIs to compare average representational similarity differences across regions with different hypothesized scales. There was a significant main effect of ɣ (*F*_{(2,1448)} = 322.14, *p* < 0.001, η_{p}^{2} = 0.31), suggesting (not surprisingly) more representational similarity within horizons that are closer to the present state. We also observed a significant main effect of condition (*F*_{(1,1452)} = 309.46, *p* < 0.001, η_{p}^{2} = 0.18), suggesting representational similarity at higher predictive horizons in the goal-directed compared with the GPS condition (Fig. 4*A*,*B*). There was a main effect of ROI (*F*_{(3,1448)} = 547.38, *p* < 0.001, η_{p}^{2} = 0.53), confirming the hypothesis of longer predictive horizons in the antPFC, followed by mPFC, aHPC, and pHPC. There was also a significant interaction between ɣ and condition (*F*_{(2,1448)} = 7.49, *p* < 0.001, η_{p}^{2} = 0.01), and a significant interaction between condition and ROI (*F*_{(3,1448)} = 10.13, *p* < 0.001, η_{p}^{2} = 0.02). The overall *R*^{2}_{M} of the model was 0.57.

#### Within-ROI analyses

Follow-up mixed effects models were conducted for predictive similarity values within each ROI. Significance was established against a Bonferroni-adjusted value of ɑ = 0.0125 (for 4 ROIs). In the antPFC (overall *R*^{2}_{M} = 0.27), there was a significant main effect of ɣ (*F*_{(2,347)} = 53.29, *p* < 0.001, η_{p}^{2} = 0.23). There was also a significant effect of condition, with significantly higher correlations in the goal-directed than the GPS condition (*F*_{(1,349)} = 103.42, *p* < 0.001, η_{p}^{2} = 0.23). There was no significant ɣ × condition interaction (*F* < 1). In mPFC (overall *R*^{2}_{M} = 0.34), there was again a significant main effect of ɣ (*F*_{(2,350)} = 106.39, η_{p}^{2} = 0.38), as well as a main effect of condition (*F*_{(1,352)} = 83.19, *p* < 0.001, η_{p}^{2} = 0.19) in the same direction as the antPFC. The ɣ × condition interaction did not reach the adjusted significance threshold (*F*_{(2,350)} = 3.44, *p* = 0.033, η_{p}^{2} = 0.02).

In the aHPC (overall *R*^{2}_{M} = 0.45), there was a significant main effect of ɣ (*F*_{(2,348)} = 151.90, *p* < 0.001, η_{p}^{2} = 0.47), a main effect of condition (*F*_{(1,350)} = 128.05, *p* < 0.001, η_{p}^{2} = 0.27), as well as a ɣ × condition interaction (*F*_{(2,348)} = 4.89, *p* = 0.008, η_{p}^{2} = 0.03). As in the mPFC, this interaction reflected a steeper slope across ɣ values in the GPS condition (–0.16) than in the goal-directed condition (–0.12). In the pHPC (overall *R*^{2}_{M} = 0.47), there was a significant main effect of ɣ (*F*_{(2,349)} = 218.38, *p* < 0.001, η_{p}^{2} = 0.56), a main effect of condition (*F*_{(1,351)} = 87.99, *p* < 0.001, η_{p}^{2} = 0.20), and a ɣ × condition interaction that did not reach the adjusted threshold (*F*_{(2,349)} = 3.81, *p* = 0.023, η_{p}^{2} = 0.02), again reflecting a numerically steeper slope in the GPS condition (–0.17) than in the goal-directed condition (–0.13).

To test for evidence of predictive representations, we compared the similarity values in each ROI, condition, and value of ɣ against zero using one-sample *t* tests, with an adjusted value of ɑ = 0.002 (24 comparisons in total). At ɣ = 0.1, the similarity values in all ROIs were significantly >0 in both conditions. At ɣ = 0.6, similarity values for all ROIs but the pHPC were significantly >0 in the goal-directed condition. In the GPS condition, however, similarity values in neither the aHPC nor the pHPC were significantly >0. At ɣ = 0.8, values in both antPFC and mPFC remained significantly >0 in the goal-directed condition, but only antPFC remained >0 in the GPS condition. For this value of ɣ, the values in aHPC and pHPC were not significantly >0 in either condition, and were actually significantly <0 in the pHPC. This significant negative correlation could reflect the differentiation of neural patterns across time, potentially as a means of separating experience into fine-grained units.
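
The two adjusted thresholds above follow from dividing a familywise ɑ by the number of tests. As a minimal sketch, assuming a familywise ɑ of 0.05 (implied by 0.0125 for 4 ROIs but not stated explicitly) and using a helper name of our own, the thresholds and the reported ɣ × condition *p* values can be checked as follows:

```python
def bonferroni_alpha(familywise_alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return familywise_alpha / n_tests

# Assumption: familywise alpha of 0.05.
alpha_roi = bonferroni_alpha(0.05, 4)     # within-ROI mixed effects models
alpha_ttest = bonferroni_alpha(0.05, 24)  # one-sample t tests (24 comparisons)

# Reported gamma x condition interaction p values from the within-ROI models.
interaction_p = {"mPFC": 0.033, "aHPC": 0.008, "pHPC": 0.023}
survives = {roi: p < alpha_roi for roi, p in interaction_p.items()}
```

Under this assumption, only the aHPC interaction falls below the within-ROI threshold of 0.0125.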

### Representational similarity during goal-directed navigation is related to traveled path distance

If the hippocampus and PFC represent planning processes associated with the currently navigated route, these representations should be modulated by the route path distance. To test this, we included the path distance on each route as a factor in the mixed effects model. Path distance was calculated as the summed change in longitude and latitude coordinates between each adjacent pair of TRs. To account for the contribution of time, we also regressed out the number of TRs on each route. The reported model fits thus account for the variability in the amount of time spent navigating different routes. Before running these models, we mean-centered distances within each participant to account for different ranges traveled. We excluded 9 goal-directed routes from a total of 8 participants because of improbably long paths that diverged >1.5 km from the main path. Including these paths, however, did not change the significance of any of the results.
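
The distance computation and within-participant centering described above can be sketched as follows. This is an illustration only: the text says "summed change" in coordinates, and we assume here that each step is measured as the Euclidean length of the coordinate change between adjacent TRs. The function names and toy values are hypothetical.

```python
import math

def path_distance(coords):
    """Sum step lengths between adjacent TRs.

    coords: list of (longitude, latitude) tuples, one per TR.
    Assumption: each step is the Euclidean length of the coordinate change.
    """
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))

def mean_center(values):
    """Center one participant's route distances on that participant's mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

# Hypothetical toy route: four TRs on a coordinate grid (arbitrary units).
route = [(0.0, 0.0), (3.0, 4.0), (3.0, 6.0), (6.0, 10.0)]
total = path_distance(route)               # steps of length 5, 2, and 5
centered = mean_center([total, 8.0, 4.0])  # three routes from one participant
```

Centering within participant removes differences in the overall range traveled, so that the distance predictor reflects whether a route was long or short relative to that participant's other routes.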

#### Path distance results

In the goal-directed condition (overall *R*^{2}_{M} = 0.71), there were significant effects of ɣ (*F*_{(2,630)} = 186.83, *p* < 0.001, η_{p}^{2} = 0.37) and ROI (*F*_{(3,630)} = 369.49, *p* < 0.001, η_{p}^{2} = 0.64). There was no significant main effect of path distance (*F*_{(1,645)} = 1.06, *p* = 0.304, η_{p}^{2} = 0.002), but there were significant interactions between ROI and path distance (*F*_{(3,630)} = 38.13, *p* < 0.001, η_{p}^{2} = 0.15) and ɣ and path distance (*F*_{(2,630)} = 6.47, *p* = 0.002, η_{p}^{2} = 0.02; Fig. 5). The plotted values in Figure 5 were estimated using the effects package in R (Fox, 2003; Fox and Weisberg, 2011). There was no significant interaction between ɣ and ROI, nor a three-way interaction (both *p* values > 0.30). As predicted, we observed no main effect of path distance in the GPS condition (*F* < 1), nor any interactions with ROI (*F*_{(3,671)} = 2.07, *p* = 0.103, η_{p}^{2} = 0.009) or ɣ (*F* < 1). The main effects of ɣ (*F*_{(2,671)} = 180.62, *p* < 0.001, η_{p}^{2} = 0.35) and ROI (*F*_{(3,671)} = 209.62, *p* < 0.001, η_{p}^{2} = 0.48) remained significant, however. The overall *R*^{2}_{M} of this model was 0.68.

#### ROI and path distance interactions

We conducted a linear mixed effects model for each of the ROIs, predicting representational similarity from path distance and ɣ. In the antPFC (overall *R*^{2}_{M} = 0.32), there were significant effects of ɣ (*F*_{(2,144)} = 33.92, *p* < 0.001, η_{p}^{2} = 0.32) and path distance (*F*_{(1,149)} = 91.51, *p* < 0.001, η_{p}^{2} = 0.38), but no interaction between the two (*F*_{(2,144)} = 1.09, *p* = 0.340, η_{p}^{2} = 0.01). This suggests that the effect of path distance was stable across different predictive horizons in antPFC. In mPFC (overall *R*^{2}_{M} = 0.32), the effects of ɣ (*F*_{(2,144)} = 71.77, *p* < 0.001, η_{p}^{2} = 0.50) and path distance (*F*_{(1,147)} = 82.70, *p* < 0.001, η_{p}^{2} = 0.36) were again significant, as was the interaction between the two (*F*_{(2,144)} = 3.62, *p* = 0.029, η_{p}^{2} = 0.05). In the aHPC (overall *R*^{2}_{M} = 0.40), there was a significant effect of ɣ (*F*_{(2,145)} = 69.82, *p* < 0.001, η_{p}^{2} = 0.49), a significant effect of path distance (*F*_{(1,152)} = 47.17, *p* < 0.001, η_{p}^{2} = 0.24), and a weaker interaction between ɣ and path distance (*F*_{(2,145)} = 2.93, *p* = 0.057, η_{p}^{2} = 0.04). Finally, in the pHPC (overall *R*^{2}_{M} = 0.50), there were significant effects of ɣ (*F*_{(2,144)} = 146.40, *p* < 0.001, η_{p}^{2} = 0.67), path distance (*F*_{(1,148)} = 82.52, *p* < 0.001, η_{p}^{2} = 0.36), and a weaker interaction between the two (*F*_{(2,144)} = 3.23, *p* = 0.042, η_{p}^{2} = 0.04).

#### Comparison with Euclidean distance

To establish how specific these results were to the traversed paths, we reran the models but this time included the Euclidean distance from start to goal as a predictor instead. In the goal-directed condition (overall *R*^{2}_{M} = 0.69), the effects of ɣ and ROI remained significant (both *p* values < 0.001), but there was no main effect of Euclidean distance (*F* < 1), and no significant interaction between ɣ and Euclidean distance (*F*_{(2,631)} = 1.93, *p* = 0.145, η_{p}^{2} = 0.006). There was an interaction between ROI and Euclidean distance (*F*_{(3,630)} = 3.33, *p* = 0.019, η_{p}^{2} = 0.02), but no three-way interaction (*F* < 1). In the GPS condition, the effects of ɣ and ROI were again significant (*p* values < 0.001), and there was a weaker main effect of Euclidean distance (*F*_{(1,49)} = 4.39, *p* = 0.041, η_{p}^{2} = 0.08), but no other main effects or interactions (all *p* values > 0.60).

### Model-based representational similarity in prefrontal searchlights

PFC has a much larger volume than the hippocampus. In order to identify hierarchies of predictive representations comparable to hippocampal ROIs, we ran a searchlight analysis and computed similarity for voxels within every spherical searchlight (of 6 mm radius). The searchlight analysis was performed for four values of ɣ (0.1, 0.6, 0.8, 0.9) within each of the conditions. The thresholded *z* score maps for different values of ɣ are displayed as overlays in Figure 6*A*, along with the average thresholded similarity maps within each condition (thresholded at 0.06; Fig. 6*B*).
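
To make the searchlight geometry concrete, the following sketch enumerates the voxels that fall within a 6 mm sphere around a center voxel. The voxel size is not given in this section, so the 3 mm isotropic value is a hypothetical assumption, as are the function names.

```python
from itertools import product

def sphere_offsets(radius_mm, voxel_mm):
    """Integer voxel offsets whose centers lie within radius_mm of the center voxel."""
    reach = int(radius_mm // voxel_mm)
    return [
        (i, j, k)
        for i, j, k in product(range(-reach, reach + 1), repeat=3)
        if (i * i + j * j + k * k) * voxel_mm ** 2 <= radius_mm ** 2
    ]

def searchlight_voxels(center, offsets, mask):
    """Voxels of one searchlight, restricted to coordinates present in mask (a set)."""
    ci, cj, ck = center
    return [(ci + i, cj + j, ck + k) for i, j, k in offsets
            if (ci + i, cj + j, ck + k) in mask]

# Assumption: 3 mm isotropic voxels (hypothetical value for illustration).
offsets = sphere_offsets(radius_mm=6.0, voxel_mm=3.0)
```

Similarity would then be computed from the pattern across the voxels returned by `searchlight_voxels` and assigned to the center voxel, with the sphere moved across every voxel in the prefrontal mask.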

#### Prefrontal hierarchy

To capture the gradient of values from the anterior-most to the posterior-most segments of the PFC, we calculated the average value of representational similarity across voxels within each anterior-posterior slice (i.e., the *y* direction). The slopes are plotted in Figure 7. These plots reveal a gradation of predictive representations extending from posterior-most to anterior-most slices of the PFC. This trend was reliable in both the goal-directed and GPS conditions, but the representational similarity values were consistently greater in the goal-directed condition.
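
The slice-wise averaging can be illustrated with a short sketch. The averaging step follows the description above; fitting an ordinary least-squares line to the slice means is our assumption about how a slope could be summarized, and the toy values are hypothetical.

```python
from collections import defaultdict

def slice_means(voxel_values):
    """Average similarity within each anterior-posterior (y) slice.

    voxel_values: dict mapping (x, y, z) voxel coordinates to a similarity value.
    Returns {y: mean similarity across voxels sharing that y}, ordered by y.
    """
    by_slice = defaultdict(list)
    for (_, y, _), value in voxel_values.items():
        by_slice[y].append(value)
    return {y: sum(v) / len(v) for y, v in sorted(by_slice.items())}

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical toy values: similarity increases toward anterior (larger y) slices.
toy = {(0, 0, 0): 0.02, (1, 0, 0): 0.04, (0, 1, 0): 0.05, (1, 1, 0): 0.07,
       (0, 2, 0): 0.08, (1, 2, 0): 0.10}
means = slice_means(toy)
slope = ols_slope(list(means.keys()), list(means.values()))
```

A positive slope in this toy case corresponds to similarity values that grow from posterior to anterior slices.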

To account for the proportion of different histologically defined brain regions covered by each significant cluster, we calculated the percentage of overlap between each prefrontal Brodmann area (BA) region and the significant voxels for each value of ɣ in each of the conditions (Table 1; Fig. 8). These percentages represent the proportion of each BA region covered by the significant thresholded clusters. We found the largest overlap between voxels in the antPFC (BA 10) and significant voxels in the searchlight analysis with various ɣ values. Following anterior and polar PFC was BA 11, corresponding to the OFC, and then BA 25 and 32, corresponding to subgenual area or cingulate cortex and ACC, respectively. These regions were followed by smaller overlap in area 47, corresponding to the orbital part of the inferior frontal gyrus, areas 46 and 9 corresponding to the dorsolateral PFC, and no overlap in area 45 corresponding to the inferior frontal gyrus.
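
The overlap percentages are the share of each BA region's voxels that also belong to a significant cluster. A minimal sketch, with hypothetical toy masks rather than values from Table 1:

```python
def percent_overlap(region_voxels, significant_voxels):
    """Percentage of a region's voxels that are also in the significant set."""
    region = set(region_voxels)
    if not region:
        raise ValueError("region mask is empty")
    return 100.0 * len(region & set(significant_voxels)) / len(region)

# Hypothetical toy masks given as voxel coordinates.
ba10 = {(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 0)}
significant = {(0, 0, 0), (0, 1, 0), (1, 1, 0), (5, 5, 5)}
overlap = percent_overlap(ba10, significant)
```

Because the denominator is the size of the BA region rather than the size of the cluster, the measure is not inflated for large regions.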

### Representational similarity slope along PFC hierarchy

#### Controlling for distance: matched distance analysis

As discussed in earlier sections, the distances were not matched between the two conditions (Fig. 2*E*). To account for this difference, we conducted a matched analysis in which we manually selected pairs of routes with the minimum difference in distance for each participant, up to 1 km (Fig. 9*A*). We were unable to include 3 of the participants in this analysis as the distances in their goal-directed and GPS routes were too different (with a difference in distance > 1.5 km). For the remaining 16 participants, there was no significant difference between the selected GPS and goal-directed routes (*p* = 0.215). We ran a paired-samples *t* test comparing participants' prefrontal RSA maps for the two selected routes. We also included the difference in distance for the two selected routes as a covariate for each participant. The brain maps of the average correlation values thresholded at 0.04 are presented in Figure 9*B*, and the results of the 5% False Positivity Rate (FPR)-corrected *t* test in Figure 9*C*.
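
The route matching was done manually, but the selection rule can be sketched in code. The following is an illustration under our own assumptions: the function name is hypothetical, and the maximum allowed difference is left as a parameter because the text mentions both a 1 km limit for pairs and a 1.5 km exclusion criterion.

```python
from itertools import product

def closest_pair(goal_distances, gps_distances, max_diff_km):
    """Pick the goal-directed/GPS route pair with the smallest distance gap.

    Returns (goal_index, gps_index, gap_km), or None when even the best
    pair differs by more than max_diff_km (the participant is excluded).
    """
    pairs = product(enumerate(goal_distances), enumerate(gps_distances))
    gi, si, gap = min(
        ((g_i, s_i, abs(g - s)) for (g_i, g), (s_i, s) in pairs),
        key=lambda item: item[2],
    )
    return (gi, si, gap) if gap <= max_diff_km else None

# Hypothetical route distances in km for two participants.
included = closest_pair([4.2, 2.9, 6.0], [2.5, 1.8], max_diff_km=1.5)
excluded = closest_pair([7.0, 8.5], [2.0, 3.0], max_diff_km=1.5)
```

The remaining gap for each included participant is the quantity that would enter the paired comparison as a covariate.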

We compared matched-distance searchlight results in the goal-directed and GPS conditions. In this comparison, relatively few clusters significantly differed between the goal-directed and GPS conditions. However, the comparison at each level of ɣ suggests that there is a set of clusters along the rostrocaudal extent of the PFC which differentiates between goal-directed and GPS-guided navigation (Table 1). Notably, while only orbitofrontal clusters were significantly different for smaller horizons, more dorsal and rostral/polar PFC clusters emerged in the comparison of larger horizons or scales, between the goal-directed and GPS conditions. It is worth noting, however, that to ensure matched distances between the goal-directed and GPS conditions, we excluded individuals with a large difference between the distances in the two conditions. As a result, this analysis included individual paths from only 16 participants, which likely increased noise and lowered statistical power, as is common with more naturalistic data.

## Discussion

We investigated the hypothesis that relational knowledge about navigational paths is organized as multiscale predictive representations in hippocampal and prefrontal hierarchies. We found evidence for such multiscale representations in a task where participants navigated the city of Toronto virtually in goal-directed and GPS-guided conditions, with realistically long distances. Our fMRI representational similarity analysis between each state (TR) and a discounted sum of its prospective states (at multiple scales, 25-875 m) confirmed this hypothesis. These results support the idea that prefrontal-hippocampal representations organize relational knowledge, in this case for navigation, at different scales of generalization and abstraction (Behrens et al., 2018; Momennejad and Howard, 2018).
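
The core computation, the similarity between each TR and a discounted sum of its successors, can be sketched as follows. This is a simplified illustration and not the analysis pipeline itself: weighting the k-th successor by ɣ^k, truncating at a fixed number of steps, using Pearson correlation, and averaging over TRs are all assumptions, and the function names and toy patterns are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equal-length vectors (nonzero variance assumed)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def discounted_future(patterns, t, gamma, horizon):
    """Discount-weighted sum of the voxel patterns that follow TR t.

    The k-th successor (k = 1, 2, ...) gets weight gamma ** k. Truncating at
    `horizon` steps is an assumption made for illustration.
    """
    n_vox = len(patterns[t])
    total = [0.0] * n_vox
    for k in range(1, horizon + 1):
        if t + k >= len(patterns):
            break  # ran past the end of the route
        w = gamma ** k
        for v in range(n_vox):
            total[v] += w * patterns[t + k][v]
    return total

def predictive_similarity(patterns, gamma, horizon):
    """Mean correlation between each TR and its discounted future, over one route."""
    sims = [pearson(patterns[t], discounted_future(patterns, t, gamma, horizon))
            for t in range(len(patterns) - 1)]
    return sum(sims) / len(sims)

# Hypothetical toy route: 4 TRs x 3 voxels, all sharing the same spatial pattern.
toy = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 6.0, 9.0], [1.0, 2.0, 3.0]]
sim = predictive_similarity(toy, gamma=0.8, horizon=3)
```

Larger values of ɣ place more weight on distant successors, so a region whose similarity stays above zero as ɣ increases carries information about a longer stretch of the upcoming path.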

Our primary goal was to investigate at which scale, and in which condition, different brain regions remain informative about the upcoming path. Representational similarity patterns in different regions stopped carrying information about predicted paths at different horizons. Namely, for longer horizons, fewer regions had above-chance similarity to planned paths (Fig. 3), suggesting a hierarchy of representation. Moreover, representations in posterior regions stopped being predictive at smaller scales, and gradually more anterior regions remained predictive at longer horizons (Fig. 4). Notably, there was an interaction between hierarchy and condition: in the goal-directed condition (Fig. 4*A*), the findings were more pronounced than in the GPS-guided condition (Fig. 4*B*). This finding reveals that, during planning at realistically long horizons, regions higher in the representational hierarchy carry predictive information.

We have reported four main findings. First, fMRI similarity reflected longer predictive horizons for paths in the goal-directed, compared with the GPS condition. Second, similarity in the anterior hippocampus and antPFC was significantly higher in the goal-directed condition and for longer horizons (Fig. 4). Third, predictive representations were organized along a posterior-anterior hippocampal gradient of predictive horizons (25-175 m) with larger scales in gradually more anterior regions (Fig. 3). Fourth, representational similarity to future horizons was organized along an anterior-posterior gradient in the PFC with larger-scale horizons (25-875 m) in gradually more anterior regions (Fig. 6).

### Representational hierarchy

In spatial navigation, hierarchical representation could enable hierarchical planning and subgoal computation (Ribas-Fernandes et al., 2019). Our proposal is that larger and more abstract scales of predictive representations in antPFC may support planning at larger scales (Fig. 1, large-scale graph). This higher-level plan may be translated into gradually more granular representations in prepolar PFC, OFC, and anterior hippocampal regions (Fig. 1, mid-scale graph), and finer-scale trajectories are translated by hippocampal gradients to the smallest predictive horizons of place fields (Fig. 1, small-scale graph). In our analyses, this was reflected in a gradation of representational similarity: at longer horizons, fewer regions had above-chance similarity to planned paths.

### Hippocampal hierarchy

One possibility is that the PFC represents the global structure of each route, while the hippocampus supports fine-grained representation of individual locations. This is consistent with recent cognitive map work in rodents, monkeys, and humans indicating PFC's involvement in active navigation and planning (Epstein et al., 2017), as well as evidence that the dorsal-ventral or posterior-anterior hippocampal axis supports gradually larger spatiotemporal scales (Poppenk et al., 2013; Strange et al., 2014; Peer et al., 2019) and inference on mnemonic relations (Collin et al., 2015; Schlichting and Preston, 2015). Recent computational perspectives suggest that the hippocampus and PFC form and update a predictive map of the state space at multiple scales (Momennejad and Howard, 2018), organizing relational knowledge of spatial and nonspatial states (McKenzie et al., 2014; Schuck et al., 2016; Garvert et al., 2017; Bellmund et al., 2018).

### Prefrontal hierarchy

Comparing predictive similarity across the PFC (Figs. 6–9; Table 1) revealed an overall effect of condition and prefrontal gradient. Longer predictive similarity horizons were observed in the goal-directed versus GPS-guided condition, and antPFC regions showed predictive similarity for longer horizons: the anterior or polar PFC (BA 10; Table 1; Fig. 8), OFC (BA 11), and subgenual cortex and ACC (BA 25 and 32), consistent with the slope of predictive similarity in Figure 7. Such a prefrontal hierarchy of relational abstraction could support task sets and schema. BA 10 is structurally well connected within the PFC and with the rest of the cortex and has long decay, making it a candidate for supporting higher scales of abstraction, from predictive representations with larger scales of integration to clustering relational graphs with higher radii. These functions may rely on longer sustained memory leading to binding over longer time scales, associating farther apart locations, or increasing representational similarity among clusters of associations.

### Nonspatial relevance

Previous work has proposed hierarchies of predictive representations along prefrontal and hippocampal gradients (Momennejad and Howard, 2018), a hierarchy of time scales in the brain (Chen et al., 2015), and a role for hippocampal-prefrontal interactions in integrating episodes to build abstract schema (Schlichting and Preston, 2017). Similar representational hierarchy may also underlie relational knowledge and category generalization (Constantinescu et al., 2016), abstraction and transfer (Cole et al., 2011), reward predictions (Takahashi et al., 2017), associative inference, and schema learning (Moscovitch and Melo, 1997; Zeithamova and Preston, 2010; Zeithamova et al., 2012; van Kesteren et al., 2013; Hebscher and Gilboa, 2016; Spalding et al., 2018; Yu, 2018; Lee et al., 2021). A crucially nonspatial body of evidence indicates a functional role for antPFC in the encoding and retrieval of prospective memory task-sets and goals (Haynes and Rees, 2006; Gilbert, 2011; Momennejad and Haynes, 2012, 2013). Lesions to the frontopolar cortex do not impair navigation, intelligence, or working memory, but impair multitasking and prospective memory (Burgess, 2000; Volle et al., 2011), such as completing a sequential plan for simple everyday tasks (Burgess, 2000). This fits with the proposal that the PFC is organized in a rostrocaudal hierarchy (Koechlin et al., 2003; Badre and D'Esposito, 2007; Koechlin and Hyafil, 2007; Koechlin, 2011), with more anterior or rostral regions corresponding to higher levels of integration and relational abstraction (Bunge et al., 2003; Christoff et al., 2009; Momennejad and Haynes, 2013).

### OFC

More anterior OFC yielded higher predictive similarity for larger predictive horizons (Figs. 4, 8). OFC and antPFC have both been suggested to support model-based RL (Daw et al., 2011), where an agent unfolds a learned state-action-state associative model during goal-directed planning and decision-making (McDannald et al., 2012, 2014; Daw and Dayan, 2014; Pauli et al., 2019). Thus, OFC may maintain task-relevant state-state relational maps that enable iterative value computation in planning and decision-making (Daw et al., 2005; Simon and Daw, 2011; Keiflin et al., 2013; Wilson et al., 2014; Wimmer and Büchel, 2019). Predictive representations in anterior hippocampus were the most similar to OFC representations, consistent with recent work on OFC-hippocampal interactions in model-based behavior (Wood and Grafman, 2003; Keiflin et al., 2013; Schuck et al., 2016; Wikenheiser and Schoenbaum, 2016; Miller et al., 2017; Vikbladh et al., 2019).

Prefrontal hierarchies of representations need not be static. These representations could be constructed from compressed representations, for example, eigenvectors (Stachenfeld et al., 2017), inverse Laplace transform (Momennejad and Howard, 2018), or generative models (Whittington et al., 2020). Future studies can shed light on prefrontal and medial temporal contributions to information integration, eigen-decomposition, generative models, and abstraction.

### Caveats and future directions

The fMRI dataset used here (Brunec et al., 2018) was acquired for different questions, leaving some caveats for present purposes, some of which we addressed in our control analyses, whereas others remain to be addressed by future studies.

The first caveat: navigated routes in the goal-directed condition were longer than those in the GPS-guided condition (Fig. 2). To overcome this, we controlled for distance in one-sample *t* tests to reveal regions with significant pattern similarity for a given horizon (Fig. 6*C*). In a more conservative analysis, we excluded longer routes from analyses, including only goal-directed routes within the range of distances in the GPS-guided condition (Fig. 9). Consistent with our earlier findings, longer predictive scales engaged more dorsal PFC regions in the goal-directed condition. Pending replication with more controlled designs, these control analyses suggest that our main findings are reliable (Table 1; Figs. 7 and 8).

The second caveat: the selection of routes did not include multiple past and future trajectories for each state, nor multiple past routes for each goal location. Such designs would enable testing the graph structure of relations, advancing previous work using routes with multiple paths (Balaguer et al., 2016; Chanales et al., 2017), and dissociating pattern similarity because of recent memory from pattern similarity because of predictive representations more directly.

Follow-up fMRI studies can also investigate compressed representations and abstraction by asking whether states that appear on many paths have a pronounced predictive representation (e.g., subgoals, states with special graph properties), and whether nearby locations are clustered as one state (or subgoal) by some brain regions. Further studies could compare the temporal hierarchy of large-scale predictive representations for higher-level plans (e.g., train from New York to Philadelphia) and smaller subgoal processing (e.g., walk to the train station). One way to test this is to orthogonally manipulate distance and the number and location of subgoals, such as turns. Such designs could also test the dynamics of goal and subgoal representation, complementing existing electrophysiology and neuroimaging work showing goal representation in MTL and PFC (Howard et al., 2014; Brown et al., 2016; Sarel et al., 2017; Tsitsiklis et al., 2019).

### Other interpretations

Theoretically, we have proposed that a given state has higher representational similarity to its frequently visited successors along the planned path (Ezzyat and Davachi, 2014; Momennejad et al., 2017). This can also be discussed in terms of increased association, integration, abstraction, and clustering (Ritvo et al., 2019); the spread of activation across memory networks (Sievers and Momennejad, 2019); or the replay of previous trajectories or paths (Wu and Foster, 2014; Ambrose et al., 2016; Momennejad et al., 2018). While there are clever analytic designs to hint one way or another, a clear-cut dissociation of these hypotheses requires methods with higher spatiotemporal resolution, such as electrophysiology and MEG.

In conclusion, we present support for the hypothesis that multiscale predictive representations in hippocampal-prefrontal hierarchies underlie cognitive maps and hierarchical planning. While posterior hippocampal regions supported the smallest predictive scales, anterior prefrontal regions supported the largest predictive horizons. Follow-up studies can be designed to further investigate planning, subgoal setting, and abstraction in spatial and nonspatial tasks.

## Footnotes

These data were collected under the Canadian Institutes of Health Research grant MOP49566 awarded to Morris Moscovitch. At the time when this research was conducted, the authors were supported by the Canadian Institutes of Health Research grant MOP125958 awarded to Morris Moscovitch (supporting I.K.B.), the Alzheimer Society of Canada doctoral award to I.K.B., James S. McDonnell Foundation and Natural Sciences and Engineering Research Council Discovery and Accelerator grants awarded to Morgan Barense (supporting I.K.B.), National Institute of Mental Health grant R01-MH104606 to Joshua Jacobs (supporting I.M.) and John Templeton Foundation grant NIBIB R01EB022864 to the Princeton Neuroscience Institute (supporting I.M.). We gratefully acknowledge and thank Morris Moscovitch and Jason Ozubko for designing the original experiment and collecting and sharing the data with us. We also thank Ken Norman, Buddhika Bellana, and Morgan Barense for helpful discussions.

The authors declare no competing financial interests.

- Correspondence should be addressed to Iva K. Brunec at ivab{at}sas.upenn.edu or Ida Momennejad at idamo{at}microsoft.com