Abstract
Electrophysiological recording studies in the dorsocaudal region of medial entorhinal cortex (dMEC) of the rat reveal cells whose spatial firing fields show a remarkably regular hexagonal grid pattern (Fyhn et al., 2004; Hafting et al., 2005). We describe a symmetric, locally connected neural network, or spin glass model, that spontaneously produces a hexagonal grid of activity bumps on a twodimensional sheet of units. The spatial firing fields of the simulated cells closely resemble those of dMEC cells. A collection of grids with different scales and/or orientations forms a basis set for encoding position. Simulations show that the animal’s location can easily be determined from the population activity pattern. Introducing an asymmetry in the model allows the activity bumps to be shifted in any direction, at a rate proportional to velocity, to achieve path integration. Furthermore, information about the structure of the environment can be superimposed on the spatial position signal by modulation of the bump activity levels without significantly interfering with the hexagonal periodicity of firing fields. Our results support the conjecture of Hafting et al. (2005) that an attractor network in dMEC may be the source of path integration information afferent to hippocampus.
 hippocampus
 head direction
 path integration
 entorhinal cortex
 place cells
 navigation
Introduction
The ability of rodents to navigate based on path integration (Mittelstaedt and Mittelstaedt, 1980; Etienne et al., 1986; Etienne, 1987; Alyan and McNaughton, 1999) and the persistence of hippocampal place fields in the dark have motivated the inclusion of a neural path integrator (PI) in theories of rodent navigation (McNaughton, 1989; O’Keefe, 1989; Touretzky and Redish, 1996; Redish and Touretzky, 1997; Samsonovich and McNaughton, 1997) (see also Etienne and Jeffery, 2004; O’Keefe and Burgess, 2005). In a “locale navigation” system (O’Keefe and Nadel, 1978), path integration helps solve the “simultaneous localization and mapping” problem (Smith et al., 1990; Montemerlo et al., 2002): to map an environment, sensory features must be associated with a motionderived representation of the animal’s location in some coordinate system. In a “praxic navigation” system, path integration can construct trajectories to previously visited locations.
Recent electrophysiological recording experiments in the dorsocaudal region of rat medial entorhinal cortex (dMEC) strongly suggest that the path integrator resides there (Fyhn et al., 2004; Hafting et al., 2005). Cells in this region exhibit multiple spatial firing fields arranged in a hexagonal grid. The scale of the grid increases, moving from the dorsal edge of dMEC to more ventral locations, but locally, cells recorded on the same tetrode tend to share both scale and grid orientation, suggesting that they are part of a local microcircuit (Hafting et al., 2005).
We show via a neural network model that the data are consistent with dMEC serving as the path integrator within a locale navigation system. Our model provides both a cogent explanation of the firing properties of grid cells and a mechanism by which such cells could satisfy the computational requirements for path integration. Redish and Touretzky (1997) posited these to include the following: (1) spatially localized firing fields that are universal across environments; (2) activity patterns updated based on selfmotion information; (3) activity patterns reset during reentry into a familiar environment; and (4) population activity patterns coding for position over a large area. We do not address here what role the dMEC might play in praxic navigation.
We first show that hexagonally spaced activity bumps can arise spontaneously on a sheet of neurons in a spin glasstype neural network model (Hopfield, 1982, 1984). Introducing an asymmetric form of the connection matrix and assigning a preferred direction of motion to each cell, with a corresponding velocitydependent input such as might be provided by the head direction system (Sharp et al., 2001; Wiener and Taube, 2005), allows the bumps to shift in any direction and gives individual units hexagonally repeating firing fields. We then show that a collection of grids with different scales and orientations allows an efferent structure such as the hippocampus to construct placespecific representations covering a substantially larger area than the period of the largest grid. Finally we show that “sensory” patterns can be superimposed onto the network, modulating the strengths of the firing fields of the cells without disrupting their hexagonal structure.
Parts of this work have been published previously in abstract form (Touretzky and Fuhs, 2005).
Spin glass model
To model the formation of a grid of bumps, we use a network of nonlinear neurons arranged on a twodimensional sheet. The weight matrix governing their mutual interactions is symmetric, and neuron interactions are constrained to be local: a neuron interacts only with those within a neighborhood around it on the sheet. This type of network is known as a spin glass model, by analogy with statistical mechanics models of frustration in magnetic systems (Hopfield, 1982, 1984). Spin glass models exhibit stable states (attractors) that correspond to local minima of an energy function. They will settle into one of these minima from any nearby starting state. Adding noise helps the system escape from shallow energy minima (frustrated states) and settle into a global minimum energy state. Attractor networks have been widely used to model the hippocampus (Samsonovich and McNaughton, 1997; Redish, 1999; Kali and Dayan, 2000; Touretzky et al., 2005) (see also Zhang 1996).
The construction of the weight matrix was inspired by the observation, originally proven by Thue (1892, 1910), that the optimal packing of equally sized circles in a plane is a hexagonal lattice. Similarly, hexagonal activity patterns on a sheet can be explained as the result of competition between each neuron and those neighbors that are within a certain radius around it. We created such competition by composing weights based on two properties. First, weights are proportional to a periodic function of the distance between units on the sheet; this creates cooperation between units of similar “phases” and competition between units outofphase. Second, unit interactions are constrained to be local. The result yields a symmetric weight matrix that produces multiple activity bumps that arrange themselves in a hexagonally periodic lattice attributable to the locality and radial symmetry of the recurrent connections.
Let ξ_{i} be the membrane voltage of neuron i, and let f_{i} be its firing rate. We use square root as the nonlinear transfer function that converts membrane voltage to firing rate, with a threshold of 0: Ermentrout (1994) showed that this transfer function was consistent with that of a conductancebased model with class I membranes, i.e., neurons whose firing rate can vary continuously from 0.
The evolution of the membrane voltage ξ_{i} over time is governed by a differential equation that includes an integration time constant τ, recurrent connections with weights W_{ij}, a velocity input ν_{i}, and a Gaussian noise term ε: To allow for translation of the hexagonal activity patterns across the sheet, the full weight matrix W_{ij} is composed of symmetric and asymmetric components. The symmetric matrix establishes hexagonal periodicity, whereas the asymmetric matrix provides directional biases to translate the pattern.
To calculate W_{ij}, we first define a distance function d_{ij} between pairs of units on this sheet. Each unit is assigned an integer coordinate pair (x_{i}, y_{i}) whose x_{i} and y_{i} values range from 1 to N_{diam}; the circular sheet of neurons used in these simulations is 61 units in diameter, 2861 units total. The Euclidean distance between units i and j is then . The symmetric matrix for connections between neurons i and j has connection strengths: The structure of the connection strengths is principally determined by Ψ(ωd_{ij}), a local, periodic function of the distance between units. This function is shown in Figure 1E and derived in the Appendix. The spatial frequency ω rescales the function, thus determining the number of bumps that form along one axis of the grid. The term γ_{i} decreases the projection strengths of neurons near the edge of the sheet, limiting the boundary effects. Figure 1A shows the symmetric weight matrix for the central unit in the sheet.
The asymmetric weight matrix W^{asym}, which is purely inhibitory, has a similar form but is offset from the center, as shown in Figure 1B: Because the asymmetric weights serve only to translate the activity pattern across the sheet, a small region of inhibition is sufficient. We therefore restricted the weights to include only the portion of Ψ up to its first 0 crossing, ψ_{1}.
The offset distance function δ_{ij} is biased in a direction opposite the preferred movement direction of the cell φ_{i}, offset by approximately oneeighth wavelength: The offsets are δ_{i}^{x} = β/ωcosφ_{i} and δ_{i}^{y} = β/ωsinφ_{i}. In theory, the φ_{i} values can be random and uniformly distributed around the circle. However, because of the relatively small number of units in our simulation, we found it advantageous to use just four preferred directions, 90° apart, assigned in an alternating manner across the grid so that, at every point at which four pixels meet, all four preferred directions are represented. This approach ensures a smooth distribution of preferred directions so that the local bump representation is not biased toward motion in any particular direction.
Figure 1C shows the sum of the symmetric and asymmetric projection weights of the central unit, W_{ij} = W_{ij}^{sym} + W_{ij}^{asym}. Although the efferent weight matrix for each unit is asymmetric, the afferent connections of the unit are approximately symmetric, because it receives projections from units with all possible preferred directions. Figure 1D shows the input weights for the central unit of the sheet; the apparent graininess in this plot is attributable to the variation in preferred directions across units.
The projection strengths of units near the edge of the sheet were faded to 0 by the γ_{i} term in Equations 3 and 4 to ameliorate edge effects. These edge effects were caused by units on the edge receiving unbalanced input: unlike units closer to the center whose inputs were from units in all directions, units near the edge received no input from beyond the edge of the sheet. This imbalance resulted in bumps of activity preferentially forming at the edges of the sheet, constraining the formation thereafter of bumps in the interior. For sheets with noncircular boundaries, this caused the lattice of bumps to form only at orientations that would maximize the number of bumps aligned along the edges of the sheet. For example, a square sheet elicits grid bump lattices with orientations of 0 or 90°. Also, the interbump distance along one axis is distorted by a factor of relative to the other, reflecting the stretching of the lattice necessary for bumps to be located along all four edges. The existence of these minimum energy states prevents the network from exhibiting the variety of grid orientations observed by Hafting et al. (2005). Moreover, when the bumps are moved across the sheet, they tend to bunch up along the edge rather than smoothly sliding off, which destroys the hexagonal symmetry. Switching to a toroidal topology would resolve this latter problem but would not eliminate the tendency toward axis alignment.
The γ_{i} term solves both problems. It eliminates distortion caused by edge effects because the progressively weaker weights fail to reinforce the bumps as they move toward the edge, so that they smoothly “fade away” rather than abruptly “fall off.” Also, the annular shape eliminates the bias in favor of axisaligned grids. The γ_{j} term is defined as follows: The top three panels of Figure 2A show the formation of a hexagonal lattice of bumps across the neural sheet. The noise term ε in Equation 2 serves to break the initial symmetry, and the phase and orientation of the bumps is established within the first few dozen time steps. After 200 time steps, the result is a robust lattice of bumps. Parameter values for this simulation are given in Table 1.
Path integration
The central hypothesis explored by our model is that path integration can be achieved by moving the bump array around on the sheet. A variety of schemes have been proposed for shifting an attractor bump around a ring (Skaggs et al., 1995; Redish et al., 1996; Zhang, 1996; Goodridge and Touretzky, 2000; Sharp et al., 2001; Hahnloser, 2003) or over a sheet (Samsonovich and McNaughton, 1997; Eliasmith, 2005). The model presented here differs from these previous models in that, rather than moving a single bump, it is designed to move multiple bumps simultaneously while enforcing a particular spatial relationship between them.
To move the bumps in direction Φ, the velocity input v_{i} to units with preferred direction φ_{i} close to Φ is increased, whereas the input to units with preferred direction nearly opposite Φ is decreased. The asymmetric inhibitory projections, because they are offset from the center of each unit, inhibit one flank of the bumps more than the opposite flank, causing the bumps to shift.
Let s be the animal’s current speed, normalized to lie between 0 and 1, and let Φ be its current direction of motion. Then the velocity input to each unit is calculated as follows: The value of v_{i} ranges between 0 and 2; it is 0.5 when the animal’s speed is 0. A plot of v_{i} as a function of φ_{i} resembles the head direction cell tuning curves seen in postsubiculum (Taube et al., 1990a,b) or anterior dorsal thalamus (Blair and Sharp, 1995; Taube, 1995), which have a width of 90–100°. However, the model also functions properly with broader tuning functions; previous tests used a ς_{hd} value of 0.633, which produced a much wider curve.
The bottom three panels of Figure 2A show the bump array being shifted to the right. The hexagonal periodicity of the bumps is preserved during translation. Interestingly, except at the edges of the sheet, the velocity modulation causes little change in the firing rates of the cells, suggesting that the dMEC could perform path integration even if firing rates were only weakly velocity dependent.
Figure 2B shows the place fields of three units in a square arena. To generate these fields in an efficient and systematic manner, the lattice of bumps was shifted to the right to correspond to leftward travel along the top edge of the arena. Then, at each of 75 positions along the top edge, the network was reset to its previous state at that position, and the activity pattern was shifted upward to correspond to downward travel. As the activity pattern shifted across the neural sheet, individual cells showed a corresponding pattern of activity across space. The firing fields differ only as to phase; the grid spacing and orientation is constant.
In vivo development of the weight matrix
Although a developmental model of dMEC is beyond the scope of this work, albeit an intriguing avenue for future research, we want to provide some insight into the developmental processes potentially underlying the construction of such a matrix.
Patches of cells within dMEC must first be organized into twodimensional sheets. One way to achieve this is by innervation by an afferent cortical sheet in a localized manner: each dMEC cell receives projections from a small region within the sheet. Such projections would need to be only approximately topographic, i.e., the “logical” position of the dMEC cell in the sheet, as dictated by the source of its afferent projections, does not bear a strict correspondence with its physical location within the brain. Early in development within the visual system, a similar imprecision appears to exist: the chemical markers underlying axonal wiring yield an approximately retinotopic projection that is then improved based on the spatiotemporal coactivity of the neurons (Miller et al., 1989; Wong, 1999). This assumption is important, because anatomically proximate dMEC cells (recorded on the same tetrode) show firing fields at different phases. However, only a small amount of jitter in the relationship between logical and physical position is sufficient to account for the observed heterogeneity of field phase in neighboring neurons. To illustrate this point, consider the second and third cells in Figure 2B. Although close to each other on the sheet, they have completely different firing field phases.
To construct the symmetric portion of the weight matrix via Hebbian learning, dMEC cells would require coincident activity between units that varied as a radially symmetric, locally weighted periodic function of their distance. It is important to note that, in the context of Hebbian learning, coincident activity is a statistical construct. On average, the correlation between units at distance d should be proportional to Ψ(ωd_{ij}). However, many possible sets of activity patterns could give rise to such a correlation structure, and individual activity patterns need not look anything like the weight matrix itself. For example, the weight function ψ used in these simulations was derived from multiwave packets of activity propagating across the sheet of dMEC neurons (see Appendix).
Furthermore, the specific quantitative formulation of ψ used in these simulations is not critical to capturing the activity patterns of the grid cells. Although the ψ used shows two “rings” of excitatory connections, we also tried ψ functions composed of one and three rings (two and fourwave packets; see Appendix), both of which produced hexagonal activity patterns. The choice of two rings was based only on our subjective impression that it produced the most accurate path integration. We also tried a cropped version of the 0thorder Bessel function of the first kind in which values beyond the fourth 0 crossing were set to 0; this also produced hexagonal activity patterns. In fact, in early versions of our model, a completely different function was successfully used, one based on the product of a sinusoid with a Gaussian. Thus, any function approximately of the form that we have described should suffice.
The asymmetric portion of the weight matrix is simpler in form and serves only to translate the bumps across the sheet. Models of the head direction system have used various weight matrices to achieve translating of an activity bump around a ring, and our asymmetric weight function is unlikely to be unique. Hahnloser (2003) showed that integration of an activity bump on a ring can be learned by a form of antiHebbian learning. A similar form of learning could likely be generalized to a twodimensional space. Because the asymmetric connectivity used here only creates interactions between neighboring neurons within the same activity bump, the asymmetric connections translate bumps independently of each other, whereas the symmetric connections enforce the hexagonal lattice structure. Thus, teaching a network to translate a single bump across a sheet would be sufficient for it to later translate a lattice of bumps.
Resetting the path integrator
The coupling of path integration and place systems facilitates both the correction of inaccurate path integration information by sensory cues and the representation of place in cuedeprived conditions. The ability of stable cues to reset the place code (Knierim et al., 1995, Jeffery and O’Keefe, 1999) and to influence praxic navigation (Etienne et al., 2004) is well established. The integration of these two systems suggests that recall of a hippocampal map when, for example, the rat returns to a familiar environment, should lead to reset of the path integrator to agree with the previously constructed place code (Touretzky and Redish, 1996; Redish et al., 1998). We assume that the projection from CA1 and subiculum to the deep layers of entorhinal cortex can influence the attractor bumps, and that these connections are established through Hebbian learning when the environment is novel.
To simulate recall of a specific activity pattern, we supplied additional external input to each of the cells equal to the value of the firing rate f_{i} of that cell in that pattern. The input was supplied continuously for 200 time steps via an additional term to Equation 2, and it was always successful in resetting the state of the network. However, weaker inputs, typically those below 0.25 f_{i}, did not always succeed, and, because of the nonlinear properties of the attractor network, the effect of this input was state dependent. If the input pattern was close to the current state, the network settled into a new minimum energy state that mirrored the input. However, if the input was nearly orthogonal to the current state, it had little effect, and the bumps did not move. This suggests that, after brief exposure to an environment or under conditions in which longterm potentiation (LTP) is impaired, a grid is more likely to be reset during the rat’s reentry into the environment if the phase to which it is being reset is similar to its phase just before entry.
Multiple grids
As Hafting et al. (2005) point out, a single hexagonal grid of the scale observed in dMEC is insufficient for path integration because the bump pattern soon repeats itself, leading to ambiguities in the rat’s location. However, multiple grids, with different scales and/or orientations, can encode a much larger space without repetition. More ventrally located cells in dMEC have larger firing fields and proportionately greater spacing between them; the distance between field peaks varies by at least a factor of 2, and cells in different regions of the dMEC show different grid orientations.
To determine to what extent a conjunctive encoding of multiple grids could produce unique positional encodings over a large space, we generated a set of grids of varying spatial frequencies and orientations and computed their activity patterns at all possible positions in environments of varying sizes. We used these patterns to drive a population of 2000 simulated “place cells.” The place cell population activity vectors were then correlated across positions to quantify how well the population encoding could distinguish positions over a large space when driven by a set of grid patterns that repeated over smaller spaces.
For computational efficiency, simulated grid cell activities were calculated in closed form. We first defined three basis vectors b_{k}, 60° apart: Each grid g was assigned a random orientation θ_{g}, defining a rotation matrix R_{g} that was used to rotate the basis vectors: Each grid was also assigned a unique random spatial frequency ω_{g} between 1 and 2. The grid was represented by an 8 × 8 array of units. Each unit i had a phase offset vector p_{i} that represented its position within the grid; values ranged from (0, 0) to (4π/, 4π/). Three amplitude functions z_{g,i,k} were calculated for each grid unit as a function of the environment location, x: The variable x ranged over a square space with side length L times the period of the lowest frequency grid, i.e., its values ranged from (0, 0) to (4π/ L, 4π/ L); values of L up to 10 were simulated. The firing rates of the grid cells, which were hexagonally periodic functions, were calculated based on the sum of the three amplitude functions: where […]_{+} is the semilinear threshold function that maps negative values to 0.
To construct a place cell representation based on the conjunctive activities of the grids, each of the n = 2000 place cells received connections from these grid cells, at most one grid cell per frequency. The weights were all of unit strength, and the phases of grid cells selected to project to each place cell were chosen at random. The activity of each place cell was calculated as f_{j} = [ξ_{j} − ξ_{98%}]_{+}, where and ξ_{98%} is a global inhibition term.
For the place representation to be sparsely coded, as seen in the hippocampus, a simple feedforward inhibition mechanism was used to reduce the number of active place cells at any one position to ∼2% of the population. The global inhibition term, ξ_{98%}, was calculated as a function of the average weighted input to the place cell population: The 98th percentile of ξ_{j} values was then well approximated by a quadratic function of ξ̄: ξ_{98%} = b_{2}ξ̄^{2} + b_{1}ξ̄ + b_{0}, with coefficients as shown in Table 1. ξ_{98%} is monotonically increasing and slightly sublinear over the range of ξ̄ values in these simulations.
Similarity of population codes within an arena
Grid and place cell activities were calculated over square spaces whose side length, L, was up to 10 times the period of the lowest spatial frequency grid. At each position in the space, the place cell population vector was compared with the vectors at every other position, calculating a correlation coefficient between each pair of vectors. To measure how well the place cells uniquely represented each of the positions in the space, each position was associated with a second position (at least one highestfrequency period away) whose population vector was the most similar to the first. Correlation coefficients between pairs of “most similar positions,” r_{max}, indicated how well the population represented the space. Values of r_{max} above 0.5 were considered to indicate inadequate distinction between positions. Although this threshold is somewhat arbitrary, other thresholds yielded the same pattern of results.
Figure 3A shows how the percentage of positions for which r_{max} > 0.5 varies with the size of the space and the numbers of grids. For this experiment, the orientations of the grids were the same. With only two grids, repetition within even smaller spaces was inevitable; population vectors at each position quickly began to reoccur as the size of the space increased. However, as more grids were used, progressively larger spaces could be represented without similar population vectors occurring at multiple positions.
When grid orientations were allowed to vary, additional decorrelation between population vectors was observed. Figure 3B shows the distribution of r_{max} when grid orientations were randomized. Clearly, using grids at varying orientations provides a substantial benefit in representing a large number of locations.
In the simulations above, each place cell received a projection from a grid cell in every grid. To explore the effects of a sparser connection structure, the projection density was reduced to two randomly chosen grids per place cell. Figure 3C shows the effect of this sparsity on the efficacy of the place code. The results are nearly indistinguishable from the dense connectivity results in Figure 3B.
Place cells tended to have multiple firing fields in the larger environments, and, when many grids were sampled, the firing fields were randomly distributed. This would be expected in any sparse code in which individual units are reused as part of a larger set of patterns, and this has been observed experimentally (Gerrard et al., 2001). When only a few grids were sampled, the place cells displayed a somewhat hexagonal arrangement of firing patterns, but this regularity was still limited to localized portions of the space. In no case did the place fields show the hexagonal regularity of the grid cells across the entire space.
Partial remapping
With multiple independent grids, resetting the path integrator would require resetting all of the grids. If only some grids were reset, the result should be what Muller et al. (1991) called “partial remapping.” To quantify this effect, we calculated place cell population vectors for each position in a space four times the period of the lowestfrequency grid. The population vectors were calculated twice: during the second run, the phases of N_{reset} of the grids were reset to their previous value, whereas the remaining grids were set to random phase values unrelated to their values during the first run. The simulations used eight grids of varying spatial frequencies and orientations, and place cells received either sparse (three grids) or dense (all eight grids) projections from the grid cell population. The population vector correlations between the first and second runs are shown in Figure 4A. As the number of reset grids increased, the similarity of the place code representation also increased. The number of grids sampled (three vs all eight) had a negligible effect.
Discordant remapping
Several labs have reported that place cells show discordant remapping during “double rotations,” when two sets of cues are rotated in equal and opposite directions (Tanila et al., 1997; Brown and Skaggs, 2002; Lee et al., 2004a). Specifically, some place fields rotated with the local track cues, other fields rotated with distal room cues, and others remapped. Moreover, the activity patterns in CA3 and CA1 differed. Lee et al. (2004a) found that, whereas CA3 fields rotated predominantly with one set of cues, CA1 fields rotated in approximately equal numbers with each set of cues, suggesting that the CA3 and CA1 representations may be formed independently.
The discordant remapping must initially be driven by sensory information. However, early experiments in darkness suggest that, once formed, some hippocampal place fields in CA1 persist without the support of visual sensory information (Quirk et al., 1990). Is it possible that discordant remapping could persist in darkness as well?
With multiple independent grids, some grid orientations could align with respect to the local cues, whereas others could align with respect to the distal cues. To understand the effects of such grid discordance on a place cell population, we simulated the activity of place cells under two conditions. In the “standard condition,” grid cell networks were initialized to random orientations. In the “rotated condition,” the grid networks were divided into two sets, and the orientations in one set were rotated by 90° with respect to those of the other set. Place field correlations were calculated to determine whether the fields in the rotated condition were similar to those in the standard condition (modulo rotation) or whether they bore little resemblance to the standard configuration field at either rotation angle, i.e., they remapped.
Figure 4B shows the results. When the number of grids in each set is equal (four grids each) and place cells are sparsely connected, the distribution of place cell responses to the double rotation is consistent with those found by Lee et al. (2004a) in CA1. As one set grows larger than the other, place cells tend to show fields consistent with the orientation of that set. In this figure, place field correlations <0.5 were taken to indicate remapping. Other values would change the percentages somewhat; however, values around 0.5 are typical in physiology studies.
Sensory modulation of activity patterns
Hafting et al. (2005) found that dMEC cells exhibited different peak firing rates in their various spatial firing fields, and these variations were reproducible during a second session in the same environment. We interpret this in the following way: dMEC receives sensory or hippocampal input that varies depending on the animal’s location, and this secondary input exerts a modulatory influence on the activity levels of dMEC cells without disrupting the hexagonal activity pattern.
To test this interpretation, we explored whether such a secondary input could modulate the rates of different neurons in our model in a reproducible way. A set of 100 random external input patterns was generated, abstractly representing the input from sensory cortex or hippocampus. A pattern of hexagonal firing fields was allowed to form in the network. After the firing fields stabilized, input patterns were applied to the network in the following way: (1) apply random noise for 10 time steps, (2) apply one of the input patterns for 10 time steps, (3) take a snapshot of the vector of firing rates, and repeat. Thus, every 20 time steps, a new pattern was presented for 10 time steps. The purpose of the intervening noise was to eliminate temporal correlations between successive patterns attributable to hysteresis in the network. The entire sequence of 100 patterns was presented twice.
The mean of the firing rates across patterns was subtracted from each snapshot, so that the snapshots reflect inputmodulated changes from the average hexagonal grid pattern. The snapshots from the first and second presentations were then crosscorrelated (Pearson’s r).
Figure 5A shows the correlation coefficients across patterns, comparing the first presentation with the second. The dark diagonal suggests that the second presentation evokes very similar network activity to the first presentation and is unrelated to that of the other patterns. Figure 5B (top) is a histogram of the correlation coefficients between the first and second presentations of the same pattern (i.e., the diagonal elements in Fig. 5A); all correlations are tightly clustered above 0.8. The middle histogram shows correlation coefficients between the first presentation of pattern i and the second presentation of another pattern, that which evoked the closest network activity to that of pattern i. In all cases, the next closest matching pattern has a dramatically lower correlation coefficient. The bottom histogram shows all offdiagonal correlation values from the correlation coefficient matrix shown in Figure 5A.
To assess the ability of individual hippocampal cells with limited fanin to discriminate between these sensory patterns imposed on the same array of activity bumps, we recalculated the correlations using a randomly selected subset of 20 units with nonzero activity instead of the full population. Pattern discrimination is still feasible, and the results are shown in Figure 5, C and D.
Discussion
Summary of properties of dMEC cells
The following points summarize the account of our model of the extant data on grid cells in dMEC and present predictions from the model for future studies.
(1) Dorsal MEC cells exhibit multiple spatial firing fields arranged in a hexagonal lattice. These are expressed immediately in a novel environment, show a constant phase relationship between cells across environments, and do not change scale with the size of the arena. Explanation: Local, radially symmetric oncenter/offsurround connections produce an attractor network whose stable states are a twodimensional toroidal manifold of hexagonally periodic activity patterns. (Note that, whereas the state space manifold is toroidal, the network connectivity is a simple sheet.) Different neurons in the same network are most active at different phases of the periodic pattern, but the firing fields of all units will show the same orientation and spacing.
The model occasionally settles into stable states with local irregularities in the network activity pattern, e.g., a pentagonal or heptagonal bump cluster. Such irregularities are caused by frustration among the arrangement of bumps as they form, not properties of specific neurons in the system. From this, an important prediction follows: when such irregularities are observed, they should be environment specific. Although not explicitly addressed by Hafting et al. (2005), the cell shown in their Figure 6 appears to bear this out: the field of the cell looks hexagonal in the familiar environment and heptagonal in the novel environment. If such irregularities occur consistently in dMEC fields, it would support a model such as this in which the hexagonal periodicity of fields is not hardwired. In contrast, a model based on a single attractor bump moving around on a sheet with hexagonally periodic toroidal connectivity would always yield hexagonal fields.
(2) The size and spacing of fields is similar among neighboring neurons but varies systematically along the dorsoventral axis. Explanation: As Hafting et al. (2005) point out, neighboring neurons in dMEC are likely coupled together as part of one of many local neural networks. We show computationally that multiple independent networks provide a basis for constructing a representation of space over a substantially larger domain than the periodicity of any one network.
The size and spacing of fields is determined by two factors in our model. The first is the ω parameter, which determines the spatial frequency of bumps on the neural sheet. The second is the strength of velocity, which determines how far the bumps are translated across the sheet per unit of travel in the environment. Increasing the rate of bump shift relative to physical movement shrinks the size and spacing of nodes in the field of a cell. Decreasing this ratio expands the node size and spacing. One should be able to determine whether velocity modulation differences account for field size variations: cells with tightly packed fields should show stronger changes in firing rate as a function of the animal’s velocity than cells with larger, more dispersed fields. Evidence of this has been observed downstream in the hippocampus (Maurer et al., 2005).
Amaral et al. (1990) estimated that there were ∼200,000 layer II entorhinal cortical cells that projected to the dentate gyrus (DG). If the number of cells in dMEC is approximately onequarter of that, then ∼17 networks of the size simulated here could be embedded within layer II. However, grid cells have also been found in layers III, V, and VI of dMEC (Moser et al., 2005), so the number of grid networks, or the number of cells per network, could be substantially higher than our model assumes.
(3) The orientation of fields is similar among neighboring neurons but varies over the dorsoventral axis and is not preserved across environments. Explanation: Radially symmetric connections allow each network to settle into a bump array with any orientation. We show that this heterogeneity of orientations provides a superior representation for constructing a place representation over a large space.
(4) Across multiple sessions in the same environment, phase and orientation of a firing field of a cell, and firing rate differences between its peaks, are all reproducible, whereas in different environments, they are different. Explanation: The ability to restore the state of a grid cell network to an earlier state is similar to the previously studied recall problem in singlebump attractor networks. Without such coherent input, the network settles into a random stable state. Failing to reset some of the networks or resetting them discordantly may reinforce partial or discordant remapping in the hippocampus.
Variations in peak firing rates among individual nodes of activity can be caused by afferent input without disrupting the hexagonal firing patterns in the network, and these inputs can be distinguished from one another based on the firing rates of the network.
(5) In darkness, spacing, mean firing rate, and spatial information did not change, but the fields shifted somewhat relative to the lighted condition. Explanation: Relative phases and orientations of different grid fields should persist in the dark because they are a product of attractor dynamics, not sensory input. However, field drift attributable to path integration error will accumulate in the absence of sensory information that can maintain alignment of the fields with external landmarks.
(6) Grid cells show weak velocity modulation? Sargolini et al. (2005) found the firing of some grid cells to be modulated by head direction, which is one of the two components of a velocity signal, the other being speed. Our model predicts that true velocity modulation will be found in at least some dMEC cells. The attractor dynamics of a network tend to dampen the impact of the velocity input on the firing rate of a cell, predicting that grid cells will show some velocity modulation, but it will be small relative to the range of firing rates of the cell.
Neural architecture for path integration
In the idealized architecture presented here, all units use the same weight pattern, rotated to reflect their preferred direction, and all cells show velocity tuning. There are asymmetric connections between cells of similar directional bias, with progressively more symmetric connections between cells of differing directional biases. An alternative architecture might divide the roles of the two connection types. In this scenario, two populations of dMEC cells would interact. The symmetrically weighted cells would enforce the hexagonal periodicity of firing patterns, whereas the asymmetric cells would shift these patterns during movement. Only the latter population should show velocity modulation.
dMEC impact on hippocampal remapping
Redish and Touretzky (1997) argued that navigation was likely accomplished using a set of interdependent representations, two of which were the hippocampal “place” representation, which provided an environmentspecific place code, and a universal coordinate map to support path integration based on vestibular and motor information. In their theory, these two representations were tightly coupled: in cuedeficient conditions, the hippocampal place representation could be updated based on idiothetic movement information, whereas in cuerich conditions, cumulative error in the PI could be corrected by efferent projections from the place code, whose representation is strongly driven by sensory information. Thus, in a novel environment, a bidirectional mapping is learned between PI coordinates and hippocampal representations of place. When returning to a familiar environment, it follows that the PI system would have to be reset to realign it with the animal’s present location. Failure to reset the system would result in complete hippocampal remapping (Redish et al., 1998).
Recent data has shed light on the possibility that the hippocampal remapping phenomenon may be mediated by a set of distinct (if interdependent) processes. Specifically, there appears to be evidence that, in novel environments, CA1 and CA3 representations form independently (Leutgeb et al., 2004; Lee et al., 2004b; Wilson et al. 2005). Place fields in CA1 are observed even when the Schaffer collateral projection from CA3 is severed (Brun et al., 2002). This suggests that immediate CA1 remapping is mediated by an extrahippocampal process such as the failure to reset the grid cell networks in dMEC.
In contrast, the gradual remapping in CA1 that manifests over many days of exposure is likely to be intrahippocampally mediated. When visual cues were reshaped (Lever et al., 2002) or repositioned either with respect to each other (Shapiro et al., 1997; Jeffery, 2000) or in conflict with vestibular information (Sharp et al., 1995), the number of CA1 cells that remapped during each successive experimental manipulation progressively increased. Preliminary evidence from recordings in DG shows heightened sensitivity to arena shape changes in DG relative to CA3 (Leutgeb et al., 2005), suggesting that, with experience, this remapping is propagated along the trisynaptic pathway from DG to CA1. Such DG sensitivity is consistent with behavioral data showing that DG lesions impair the rats’ ability to distinguish two locations only if they are nearby (Gilbert et al., 2001).
Together, these data suggest that hippocampal remapping attributable to gross environmental changes (or deficient LTP) may be caused by a failure to reset the dMEC grid cells, whereas subtle changes to spatial relationships between landmarks may be detected within the hippocampus and gradually propagated into CA1 in an experiencedependent manner.
Appendix: derivation of the weight function Ψ
During the development of the visual system, it has been known for some time that spontaneous waves of activity propagate across the surface of the retina (for review, see Shatz, 1996; Wong, 1999). These waves have been observed in several mammals, including rodents. They occur one at a time (one wave fully propagates across the retina before another is formed), and the direction of propagation of each wave varies randomly from one to the next. The waves are believed to serve several roles, including the refinement of topographical specificity of axonal projections to the LGN and beyond, as well as ocular specificity by layer in the LGN and by column in visual cortex.
For the purpose of learning the symmetric weight matrix, we consider a variation on this theme: a “wave packet,” composed of multiple contiguous waves, propagating across a dMEC sheet, each wave packet traveling in a different random direction. By passing a wave packet across the sheet, connections are learned both between units coactivated by the same wave and between units activated by other waves in the packet. Because the direction of propagation varies from packet to packet, units develop radially symmetric weights: one annulus of excitation and inhibition for each wave in the packet.
We simulated a square sheet of 31 × 31 dMEC units. Each unit i was assigned integral (x, y) positions on the grid from (−15, −15) to (15, 15), which were converted to polar coordinates (r_{i}, θ_{i}). The temporal phase of each unit was defined as follows: where κ determines the width of each wave in the packet, and θ_{wave} specifies the direction of travel of the wave packet; a different random value of θ_{wave} was picked for each packet. As time t increased, the temporal phase of each unit advanced. Each wave packet was propagated across the sheet by clamping the firing rate f_{i} of each unit to a sinusoidal function of its temporal phase: where N_{w} is the number of waves in the packet. For values of ζ_{i}(t) < 0, the wave packet had yet to reach the unit; for values between 0 and 2πN_{w}, the unit was somewhere within the packet, and, for values greater than 2πN_{w}, the entire packet had already passed over the unit. Outside the packet, the firing rates of the units were set to a baseline tonic activity, f_{tonic}, of 1. Within the packet, the firing rate of each unit varied from 0 to 2 as a sinusoidal function of its temporal phase. After each packet finished propagating across the network, t was reset, a new random value for θ_{wave} was chosen, and propagation of the next packet began.
A form of Hebbian learning was used to shape connections between dMEC units in the network: This learning rule is similar to the Bienenstock–Cooper–Munro (BCM) learning rule (Bienenstock et al., 1982) in that the average postsynaptic activity f̄;_{i} determines the threshold between weight increase and decrease. However, the nonlinearity of the BCM rule was removed, making analytic analysis of the asymptotic weight values tractable. In these simulations, f̄;_{i} was set to f_{tonic}, the true mean activity of each unit. The learning rate τ_{w} was slowed exponentially from 2000 to 100,000 during the course of training to increase accuracy by averaging over many packets once the weights were close to their asymptotic values.
One thousand wave packets, each containing three waves, were successively driven across the network. A onedimensional projection of the learned weights of the center unit is shown in Figure 1E (magenta points); weights of other units were similar. The magnitude of the weight is plotted as a function of the distance between the center unit and the others in the network. The three waves in each packet gave rise to a weight profile with a strong center peak and two progressively weaker peaks at larger distances.
As learning progresses, W_{ij}^{learn} converges to the expected value of (f_{i} − f̄;_{i})f_{j} over all wave packets. This expected value can be determined analytically as a function of the distance d between units i and j. Let ζ_{i} be the temporal phase of some unit i. Outside the packet, f_{i} = f̄;_{i}, so dW_{ij}^{learn}/dt = 0 We therefore only consider values of ζ_{i} between 0 and 2πN_{w}. Let ζ_{i} be the temporal phase of some unit j. Figure 1F shows an example of units i and j, both within the wave packet. Let us consider unit i to be at some fixed temporal phase ζ_{i} within the packet; thus, varying θ_{wave} results in the packet rotating about unit i. The temporal phase of unit j can therefore be expressed as a function of ζ_{i}, θ_{wave}, and the relative positions of the two units on the sheet: where θ_{ij} is the angle of unit j relative to unit i. By averaging over the ranges of ζ_{i} and θ_{wave}, one can calculate the asymptotic values of W_{ij}^{learn} as a function of the distance between units: The outer integral averages over the range of temporal phases of unit i as the packet passes over it. The inner integral averages over the various phases of unit j at a given distance d by averaging over all possible directions of wave propagation.
To evaluate Equation A5, we must consider the piecewise nature of the firing rate function (Eq. A2). Although we assume by construction that unit i is always within the packet, unit j may or may not be, depending on the value of θ_{wave}. Figure 1G shows the result of increasing θ_{wave} to the point at which unit j is at the front edge of the wave packet. This value of θ_{wave} will be referred to as θ_{front,1}. Rotating the wave packet further results in unit j being outside the wave packet. Rotating it further, unit j rejoins the wave packet at θ_{front,2}. Rotating further, unit j remains within the packet until θ_{back,1} and is behind the packet until θ_{back,2}. Thus, θ_{wave} values may be divided into four intervals: [θ_{back,2}, θ_{front,1}] and [θ_{front,2}, θ_{back,1}] when unit j is within the packet; [θ_{front,1}, θ_{front,2}] when the packet has yet to pass over unit j; and [θ_{back,1}, θ_{back,2}] when the packet has finished passing over unit j. This leads to the following substitution of Equation A2 into Equation A5: The inner integral in Equation A5 has been divided into a sum of integrals in Equation A6, each spanning one of the four aforementioned intervals of θ_{wave.} The first and second inner integrals are equal and so they are collapsed together in Equation A7. The third and fourth inner integrals do not contribute to the result because the range of the outer integral is a multiple of 2π. Simplifying Equation A6 and substituting in Equation A4 yields the following: Although the intervals of θ_{wave} and the value θ_{ij} depend on the position of unit j relative to unit i, the value of the inner integral depends only on the distance between the units because of the rotational symmetry underlying its construction. Hence, Ψ is simply a function of d.
Equation A7 was integrated numerically for many values of d, resulting in the green line shown in Figure 1E. The results of these integrations were used in the construction of the weight matrices shown in Figure 1A–D and in the simulations shown in Figure 2.
Footnotes
 Received October 12, 2005.
 Revision received March 14, 2006.
 Accepted March 14, 2006.

This work was supported by National Institutes of Health Grant MH59932.
 Correspondence should be addressed to David S. Touretzky, Computer Science Department, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213. Email: dst{at}cs.cmu.edu