Abstract
Primates rely predominantly on vision to gather information from the environment, and neurons representing visual space and gaze position are found in many brain areas. Within the medial temporal lobe, a brain region critical for memory, neurons in the entorhinal cortex of macaque monkeys exhibit spatial selectivity for gaze position. Specifically, the firing rate of single neurons reflects fixation location within a visual image (Killian et al., 2012). In rodents, entorhinal cells such as grid cells, border cells, and head direction cells show spatial representations aligned to visual environmental features instead of the body (Hafting et al., 2005; Sargolini et al., 2006; Solstad et al., 2008; Diehl et al., 2017). However, it is not known whether similar allocentric representations exist in primate entorhinal cortex. Here, we recorded neural activity in the entorhinal cortex of two male rhesus monkeys during a naturalistic, free-viewing task. Our data reveal that a majority of entorhinal neurons represent gaze position and that simultaneously recorded neurons represent gaze position relative to distinct spatial reference frames, with some neurons aligned to the visual image and others aligned to the monkey's head position. Our results also show that entorhinal neural activity can be used to predict gaze position with a high degree of accuracy. These findings demonstrate that visuospatial representation is a fundamental property of entorhinal neurons in primates and suggest that entorhinal cortex may support relational memory and motor planning by coding attentional locus in distinct, behaviorally relevant frames of reference.
SIGNIFICANCE STATEMENT The entorhinal cortex, a brain area important for memory, shows striking spatial activity in rodents through grid cells, border cells, head direction cells, and nongrid spatial cells. The majority of entorhinal neurons signal the location of a rodent relative to visual environmental cues, representing the location of the animal in the world rather than relative to the body. Recently, we found that entorhinal neurons can signal the location of gaze while a monkey explores images visually. Here, we report that spatial entorhinal neurons are widespread in the monkey and that these neurons can exhibit a world-based spatial reference frame locked to the bounds of explored images. These results help connect the extensive findings in rodents to the primate.
Introduction
The rodent entorhinal cortex contains cell types that represent body position, including grid cells, border cells, and head direction cells (Hartley et al., 2014). However, unlike many sensory and motor areas of the brain that represent spatial information relative to the body (egocentric representation), entorhinal cells fire when the body occupies a particular space relative to features in the local environment (allocentric representation) (Quirk et al., 1992; Fyhn et al., 2004, 2007; Hafting et al., 2005; Sargolini et al., 2006; Barry et al., 2007; Savelli et al., 2008; Aronov and Tank, 2014). For example, an entorhinal neuron will fire selectively when a rodent is located in a particular place within an enclosure that has a visible cue card on the wall and then fire for that same location relative to the card when the card is moved (Quirk et al., 1992). This world-based, allocentric reference frame contrasts with the egocentric spatial reference frame of sensory and motor neurons. Specifically, whereas visual cortical neurons fire egocentrically to reflect the spatial arrangement of stimuli on the retina, entorhinal neurons instead fire allocentrically across highly variable visual inputs (Quirk et al., 1992) or even without visual input (Hafting et al., 2005) to reflect the same body position relative to objects in the world.
The great majority of this rich line of work on entorhinal spatial coding has been conducted in rodents, leaving open the question of whether primate entorhinal neurons display allocentric coding and how such coding might support the memory function of this brain area. Allocentric representations have been identified in other regions of the human and nonhuman primate brain (Olson and Gettner, 1995; Georges-François et al., 1999; Committeri et al., 2004; Chen and Crawford, 2017). Recent work demonstrated that primate entorhinal cells exhibit spatial representations by firing selectively when a monkey looked at certain regions of space on a computer screen while freely viewing images (Killian et al., 2012). However, because the monkey was head fixed and the images were presented in only a single location, it was unclear whether the neural representation of gaze position was image centered or was instead referenced to the head position. Specifically, it was unclear whether the representation was allocentric and reflected gaze position relative to conspicuous environmental features (the image boundaries) or instead was egocentric and reflected the position of the eye in the orbit. To begin to address this question, we recorded gaze position and the activity of individual entorhinal neurons in monkeys freely viewing images that were displayed in different screen locations across trial blocks. We determined whether neurons represented where the monkey was looking and, importantly, whether gaze position was coded relative to the visual image position in an allocentric, image-centered reference frame or relative to the monkey's head in a head-centered reference frame. The results identified a strikingly large proportion of entorhinal cells that coded gaze position and, although some cells coded allocentric space by showing consistent spatial firing locked to the image display window across its varied screen locations, other simultaneously recorded cells showed firing consistent with a head-centered reference frame. Together, these data show that gaze position is represented widely across the primate entorhinal cortex and gaze position is represented in multiple spatial reference frames simultaneously across the neuronal population.
Materials and Methods
Experimental design and statistical analyses.
To identify the frame of reference for spatial entorhinal neurons, we recorded gaze position and the spiking activity from single entorhinal neurons while two head-stabilized rhesus macaque monkeys freely viewed large (30° × 25° for Monkey MP; 30° × 15° for Monkey WR), complex images that were presented at two different locations on a stationary screen within the recording session (Fig. 1A). In left trial blocks, images were centered 2° to the left of the center of the screen, whereas in right trial blocks, images were centered 2° to the right of the center of the screen, resulting in a total offset of 4° between the image locations. This spatial offset was chosen because it is a half cycle of the average spatial periodicity of previously observed grid cells (Killian et al., 2012) and therefore maximized the separation between spatially periodic activity locked to the image display and activity locked to the screen. Images were shifted horizontally instead of vertically to accommodate the shift of large images on a screen that was wider than tall.
Task and recording. A, Free-viewing task schematic. Left and Right trial blocks differed by a 4° shift in visual stimulus location relative to the screen borders (green rectangle). In Left trials, stimuli were centered 2° left of the screen center, whereas in Right trials, stimuli were centered 2° to the right of the screen center. The monkey's head remained in the same position relative to the room and the computer screen for all trials. All stimuli presented within each trial block were confined within the screen space occupied by the images of that block (the “image window” of that trial block). Fixation: Trials began with a 500–750 ms required fixation on a cross positioned pseudorandomly in one of nine potential locations across a gray background rectangle. Image viewing: A complex image (photograph of variable content) was displayed for 5 s of free viewing. Calibration: The monkey received a fruit slurry reward for releasing a response bar in response to a subtle color change of a small square (which could appear in multiple locations across the gray background rectangle). The monkey's gaze on this square was used to calibrate eye-tracking software and correct any drift in recorded eye position. A minimum of 30 image trials were presented within each block. Right and Left trial blocks started alternate recording sessions. B, Estimated position of recording channels within the entorhinal cortex in one recording session is shown in red on a coronal MRI. C, Successful targeting of the entorhinal cortex was further confirmed for each session by the electrophysiological signature of the LFP across cortical layers locked to eye movement (Killian et al., 2012, 2015). An example of this signature is shown from one recording session. Black lines are individual average LFPs for each of the 13 recording channels. The amplitude has been normalized to the maximum over all LFPs. Data are aligned to onset of eye movement.
Nonparametric bootstrapping procedures were used to determine significance at the level of p ≤ 0.05 for all statistical analyses unless otherwise noted. Details of how bootstrapping was used in different analyses are reported along with the description of each analysis. Unless specified, all analyses were performed using custom code in MATLAB (The MathWorks).
Subjects, training, and surgery.
Two male rhesus macaques, 10 and 11 years old, and weighing 13.8 and 16.7 kg, respectively, were trained to sit in a primate chair (Crist Instrument) with a fixed head position and to release a touch bar for fruit slurry reward delivered through a tube. The monkeys were trained to perform various tasks by releasing the touch bar at appropriate times relative to visual stimuli presented on a screen. MRIs of each monkey's head were made both before and after surgery to plan and confirm implant placement. Separate surgeries were performed to implant a head post and then, months later, a recording chamber and finally a craniotomy within the chamber. All experiments were performed in accordance with protocols approved by the Emory University and University of Washington Institutional Animal Care and Use Committees.
Behavioral task.
For all recordings, the monkey was seated in a dark room, head fixed and positioned so that the center of the screen (54.1 cm × 29.9 cm LCD screen, 120 Hz refresh rate, 1280 × 720 pixels; BenQ America) was aligned with his neutral gaze position and 60 cm away from the plane of his eyes (equating to ∼25 screen pixels per degree of visual angle or 1°/cm). Stimulus presentation was controlled by a PC running Cortex software (National Institute of Mental Health, Bethesda, MD). Gaze location was monitored at 240 Hz with an infrared eye-tracking system (I-SCAN).
Gaze location was calibrated before and during each recording session with calibration trials (Fig. 1A) in which the monkey held a touch-sensitive bar while fixating a small (0.5°) gray square presented at various locations on the monitor. The square turned yellow 400–750 ms (uniform distribution) after its appearance and the monkey was required to release the bar in response to the color change for delivery of a fruit slurry reward. The subtlety of the color change forced the monkey to fixate visually the location of the small square to correctly perform those trials, therefore allowing calibration of gaze position to the displayed stimuli. Specifically, the gain and offset of the recorded gaze position were adjusted so that gaze position matched the position of the fixated stimulus. Throughout the session, calibration trials enabled continual monitoring of the quality of gaze position data and correction of any drift. The monkeys performed alternating blocks of trials in which presented images were centered either slightly to the right or slightly to the left (Fig. 1A). The left and right image window locations were offset by 4°. Before each image presentation, a crosshair (0.3° × 0.3°) appeared in 1 of 9 possible locations across a gray background rectangle superimposed on the dark background of the screen. The gray rectangle was the same size and position as the images presented in that trial block and encompassed the screen space for all visual stimuli in that trial block. Once gaze position registered within a 3° × 3° window around the crosshair and was maintained within that spatial window for 500–750 ms, the image was presented. Images were complex natural images downloaded from the public photo-sharing website, Flickr (www.flickr.com). If necessary, images were resized by the experimenter for stimulus presentation (sized 30° × 15° for Monkey WR and 30° × 25° for Monkey MP). Monkeys were allowed to view the image freely and the image vanished after gaze position had registered within the image frame for a cumulative 5 s. No food reward was given during image-viewing trials. Each image presentation was followed by three calibration trials. The color-change square of calibration trials was superimposed on the gray background rectangle of a given image block. To cover the relevant screen space adequately for calibration during and after the experiment, there were a large number of unique color-change square locations (100 and 54 unique locations for Monkey MP and Monkey WR, respectively) across image location trial blocks, as well as an additional 18 unique calibration points from the required fixation on the crosshair. These frequent and spatially ranging calibration points ensured that gaze position data could be tracked for accuracy across the entire session for the whole expanse of relevant screen area.
After completing a block of trials, a new block of trials would begin with all visual stimuli (images and calibration trial stimuli) shifted laterally 4°. Stimuli in the left trial block were centered 2° to the left of the center of the screen, whereas stimuli in the right trial block were centered 2° to the right of the center of the screen. Left and right trial blocks were pseudorandomly selected to be the first trial block of an experimental session.
In the first 15 sessions (of 26 sessions total) for Monkey WR, an ABA design with three trial blocks was used (visual stimuli were centered the same way in the first and last trial blocks, whereas stimuli in the middle block were centered at a shifted location), with a total maximum of 180 image presentations per session. In the rest of the sessions for Monkey WR and all 14 sessions for Monkey MP, there was a total maximum of 240 image presentations across only two trial blocks (120 image presentations within each image window location).
Offline eye position calibration.
Eye position data from calibration trials were examined offline after the experiment to further improve calibration and ensure that fixation locations were stable throughout the experiment. Eye position traces during calibration trials were fit to the calibration points (and crosshair fixation data to crosshair points) with affine, polynomial, projective, or linear transformations in MATLAB (“cp2tform” function). The best-fitting transform was selected by visual inspection of plots showing calibration points and the fit eye position data from calibration. The selected transform was then applied to the rest of the eye position data. One session was excluded from further analysis because this check revealed an unsalvageable compromise in quality of the eye position data.
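For concreteness, a minimal MATLAB sketch of this fitting step follows; eyeXY (raw gaze samples recorded at fixation, N × 2) and targetXY (the corresponding known stimulus locations, N × 2) are hypothetical variable names, and cp2tform/tformfwd are the Image Processing Toolbox functions named above (newer MATLAB releases would use fitgeotrans and transformPointsForward).

    % Fit candidate transforms mapping raw eye samples to known target locations.
    types = {'nonreflective similarity', 'affine', 'projective', 'polynomial'};
    for t = 1:numel(types)
        if strcmp(types{t}, 'polynomial')
            tform = cp2tform(eyeXY, targetXY, types{t}, 2);  % 2nd-order polynomial
        else
            tform = cp2tform(eyeXY, targetXY, types{t});
        end
        [fx, fy] = tformfwd(tform, eyeXY(:,1), eyeXY(:,2));
        % Overlay the transformed eye samples on the calibration points so the
        % best-fitting transform can be selected by visual inspection.
        figure; plot(targetXY(:,1), targetXY(:,2), 'k+', fx, fy, 'r.');
        title(types{t}); axis equal
    end
    % The selected transform is then applied to the full eye position trace:
    % [calX, calY] = tformfwd(bestTform, rawX, rawY);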
Electrophysiology.
For each recording session, a laminar electrode array (AXIAL array with 13 channels; FHC) mounted on a microdrive (FHC) was lowered slowly into the brain through the craniotomy. MRIs along with the neural signal were used to guide the penetration. Spikes and LFPs were recorded using hardware and software from Blackrock and neural data were sampled at 30 kHz. A 500 Hz high-pass filter was applied, as well as an electric line cancellation at 60 Hz. In some recording sessions, a channel without any spiking activity was used as a reference electrode to subtract artifact noise (e.g., reward delivery, movement of the monkey). Spikes were sorted offline into distinct clusters using principal component analysis (Offline Sorter; Plexon). Sorted clusters were then processed further by custom code in MATLAB to eliminate any data in which the minimum interspike interval was <1 ms and to identify any missed changes in signal (e.g., shrinking of the waveform of interest, a new waveform appearing) using a raster and plots of waveforms across the session for each cell. When a change in signal was identified, appropriate cuts were made to exclude compromised spike data from before or after a change point. A total of 455 potential single units originally cut in Offline Sorter were reduced to 357 single units. To further ensure recording location within the entorhinal cortex and to identify from which cortical layers units were recorded, we examined each session's data for the stereotypical electrophysiological signature produced across entorhinal cortical layers at the onset of saccadic eye movement (Fig. 1C; Killian et al., 2012, 2015). One recording session, which other electrode placement metrics suggest was conducted above the entorhinal cortex within the hippocampus, lacked this electrophysiological signature and was excluded from further analysis (eight single units were excluded from being categorized as entorhinal cells). No recording sessions showed the current source density electrophysiological signature of adjacent perirhinal cortex (Takeuchi et al., 2011) at stimulus onset. The laminar location of each recorded channel was estimated using approximate cortical thickness along with layer-specific signal features: the phase reversal across cortical layers that occurs near layer II ∼200 ms after saccade onset and a phase reversal 100–150 ms after saccade onset indicating the transition to white matter dorsally. When one of these two laminar-specific signals was missing, the resulting ambiguities were retained in laminar classification (i.e., a neuron was classified as ambiguously being either in superficial or deep layers).
The location of cells along the anterior–posterior anatomical axis was determined by visually matching the anatomical features within a brain atlas (Paxinos et al., 2000) to the postchamber implant surgery coronal MRI slice (1 mm slices) estimated to be the plane of a recording. The distance of cells from the rhinal sulcus was determined by the voxel distance (0.5 mm voxels) between the sulcus and the estimated recording location on a coronal MRI slice.
Rate maps to characterize neural representation of gaze.
To identify neural activity related to gaze position as the monkey viewed the images, firing rate maps were computed for each neuron. The firing rate maps were computed across all images combined and showed a neuron's activity level across gaze positions within the screen space occupied by the images (Fig. 2). Data from the first 500 ms of image viewing were excluded to avoid transient visual responses to the onset of an image. The image space was divided into 0.5° square spatial bins and the number of spikes that occurred when the monkey's eye position fell within each bin was divided by the total viewing time within that bin. To accommodate potentially different firing field sizes and levels of spatial resolution, rate maps were smoothed three different ways: Adaptive smoothing (Skaggs et al., 1993) and smoothing with a Gaussian filter (5.5° × 5.5°) that had a standard deviation of either 1° or 1.5°. All three of these smoothing methods produced similar-looking rate maps for a given piece of data. Although smoothing generated extrapolated values for unvisited spatial bins, these values were subsequently removed so that unvisited spatial bins were left empty.
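A minimal MATLAB sketch of this computation is shown below for one smoothing variant; gazeX/gazeY (gaze samples in degrees of image space) and spikeCount (spikes assigned to each 240 Hz gaze sample) are hypothetical variable names, fspecial is an Image Processing Toolbox function, and smoothing the spike and occupancy maps separately before dividing is one common implementation choice rather than a detail confirmed here.

    % Occupancy-normalized firing rate map over image space (0.5 deg bins).
    binSize = 0.5;                                  % deg per spatial bin
    xEdges  = 0:binSize:30;  yEdges = 0:binSize:25; % 30 x 25 deg image (Monkey MP)
    dt      = 1/240;                                % sec per gaze sample

    xi = discretize(gazeX, xEdges);  yi = discretize(gazeY, yEdges);
    ok = ~isnan(xi) & ~isnan(yi);                   % keep samples inside the image
    sz = [numel(yEdges) - 1, numel(xEdges) - 1];
    occ = accumarray([yi(ok) xi(ok)], dt, sz);             % seconds per bin
    spk = accumarray([yi(ok) xi(ok)], spikeCount(ok), sz); % spikes per bin
    rateMap = spk ./ occ;                           % Hz; NaN where never visited

    % Gaussian smoothing: 5.5 x 5.5 deg kernel (11 x 11 bins), SD = 1 deg (2 bins).
    k = fspecial('gaussian', [11 11], 2);
    smoothMap = conv2(spk, k, 'same') ./ conv2(occ, k, 'same');
    smoothMap(occ == 0) = NaN;                      % leave unvisited bins empty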
Spatial stability.
To assess whether each neuron's spatial activity was consistent for gaze position across trials, rate maps were assessed for stability across time. A spatial correlation (Pearson's) was computed between two rate maps from one neuron from separate time periods of the neural recording. This spatial correlation was considered significant if it was greater than 95% of bootstrapped correlation values, which were computed from shuffled data of the neuron's original two rate maps. Specifically, the spike train of a rate map was shifted circularly along the corresponding eye position trace at 1000 equally spaced increments; that is, with each shift, the end of the spike train wrapped around to correspond to the beginning of the eye position data. To avoid a large overlap with the original map, the starting positions of spike train shifts along the eye position trace were constrained to begin at least 10 s after the start of the trace and 10 s before the end of the trace. For each of the 1000 shuffles, a new rate map was generated and a spatial correlation was computed between the two generated rate maps. If the original spatial correlation was greater than or equal to the 95th percentile of the 1000 spatial correlation values generated from shuffling, then the spatial correlation between the neuron's two rate maps was considered to be significant (p ≤ 0.05) and the neuron was considered to show spatial stability.
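A sketch of this shuffling test in MATLAB, assuming the rate map construction above has been wrapped in a hypothetical helper makeRateMap and that spikes1 holds the spike counts per 240 Hz gaze sample underlying map1 (corr and prctile are Statistics and Machine Learning Toolbox functions):

    % Observed spatial correlation between the neuron's two rate maps.
    rObs = corr(map1(:), map2(:), 'rows', 'complete');  % Pearson; NaN bins ignored

    % Null distribution: circularly shift the spike train along the eye trace.
    nShuff = 1000;  fs = 240;                   % gaze samples per second
    margin = 10 * fs;                           % stay >= 10 s from either end
    shifts = round(linspace(margin, numel(spikes1) - margin, nShuff));
    rNull  = zeros(nShuff, 1);
    for s = 1:nShuff
        shufSpikes = circshift(spikes1, shifts(s));  % end wraps to beginning
        shufMap    = makeRateMap(gazeX1, gazeY1, shufSpikes);
        rNull(s)   = corr(shufMap(:), map2(:), 'rows', 'complete');
    end
    isStable = rObs >= prctile(rNull, 95);      % spatial stability at p <= 0.05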
To determine spatial stability for a neuron across trials in which images were presented in a single location, data from those trials were split across time to yield two rate maps across the same screen pixels that were then tested for correlated activity through the shuffling procedure described above. We then extended this analysis to examine spatial stability across trial blocks, when the image window was laterally shifted to a different location. To quantify image-aligned spatial activity that shifted with the location of presented images, a correlation was computed for each neuron between the two rate maps from image viewing in the two different image display windows. A cell was considered to have a spatial representation that shifted along with the image window location (“image-aligned” spatial representation) if rate maps from the two different image window locations, aligned with the image bounds, yielded a significant spatial correlation (Fig. 3A). Passing this criterion would indicate that a cell encodes gaze position in an allocentric, image-centered reference frame. Conversely, a cell was considered to have a stable spatial representation that did not shift along with image window location (“screen-aligned” spatial representation) if rate maps from two different image window locations but the same overlapped screen space yielded a significant spatial correlation (Fig. 3B). Passing this criterion would indicate that a cell does not encode gaze position in an image-centered reference frame, but rather in a reference frame that remained stationary like the screen or the monkey's head.
To test whether a neuron's spatial activity shifted partially in the direction of the new image window location, spatial correlations for each neuron were computed for a range of partial spatial offsets (Fig. 4). Each neuron's rate maps from different image windows were correlated at eight spatial offsets that ranged between 0% and 100% of the image shift distance (4°). The incremental distance between different spatial offsets was one rate map spatial bin (0.5°). This analysis was performed three times using three different rate map smoothing methods (described above) to be agnostic about the "correct" smoothing. The resulting three correlation values for each offset were then summed to create one cumulative correlation vector per neuron across the range of spatial offsets, meaning that one cumulative correlation value corresponded to each offset. The offset with the highest value for each neuron is indicated in Figure 4 in red; the lowest value is shown in blue. Variability was computed by resampling the given cell population with replacement for the same number of original cells to repeat the analysis 1000 separate times. Error bars in Figure 4 represent the middle 95% of values from these 1000 iterations.
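The offset sweep reduces to correlating the two maps while sliding one over the other a bin at a time, as in the following sketch; mapL and mapR are hypothetical rate maps from the left and right image windows indexed on common screen coordinates.

    % Correlate rate maps across image windows at offsets spanning the 4 deg
    % shift (0.5 deg bins), from screen-aligned (0%) to image-aligned (100%).
    nStep  = 8;                                % 4 deg / 0.5 deg per bin
    rShift = nan(nStep + 1, 1);
    for s = 0:nStep
        a = mapL(:, 1:end-s);                  % overlapping columns only
        b = mapR(:, 1+s:end);                  % slide the right-window map left
        rShift(s+1) = corr(a(:), b(:), 'rows', 'complete');
    end
    % Summing rShift vectors over the three smoothing methods gives the
    % cumulative correlation profile per neuron; its maximum and minimum
    % offsets correspond to the red and blue entries of Figure 4.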
Saccade direction selectivity.
Neurons meeting criterion for spatial stability were tested for saccade direction selectivity to avoid a potential confound within a rate map between gaze position firing fields and a saccade direction preference. If a neuron did show a saccade direction preference, then the neural data were tested to determine whether rate map stability was due to preferential firing for that saccade direction.
Saccade direction selectivity was tested in three different perisaccade epochs: 100 ms leading up to the saccade, 100 ms centered on saccade onset, and a 100 ms period starting once the saccade completed. For each of these epochs, a neuron's firing rate across different saccade directions was computed using a saccade direction bin 10° wide that incremented by 5°. Using methods similar to Killian et al. (2015), data were pseudorandomly downsampled so that each angular quadrant had the same number of trials. Any neurons with more than 10% of saccade direction bins lacking values were excluded from further analysis (one neuron). If the downsampled neuronal response showed significant (p ≤ 0.05) nonuniformity on the Rayleigh test and also no significant departure from a von Mises distribution on Kuiper's test (p ≤ 0.05; Berens and Valesco, 2009), then the response was considered to be selective for saccade direction. No neurons showing significant nonuniformity on the Rayleigh test (0/16) showed a significant departure on the von Mises test. The preferred direction of a neuron was considered to be the direction with the maximum firing rate.
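For illustration, the Rayleigh statistic on the binned, downsampled tuning curve can be computed as below; theta (direction bin centers, radians) and fr (firing rate per bin, acting as weights) are hypothetical names, and the p-value uses a standard approximation, so this should be read as a sketch of the test rather than the exact CircStat implementation cited above.

    % Weighted Rayleigh test for nonuniform firing across saccade directions.
    n = sum(fr);                                % total weight
    r = abs(sum(fr .* exp(1i * theta))) / n;    % mean resultant vector length
    z = n * r^2;                                % Rayleigh statistic
    p = exp(-z) * (1 + (2*z - z^2) / (4*n));    % standard approximation
    isDirectionSelective = p <= 0.05;
    [~, iPk] = max(fr);
    prefDirDeg = rad2deg(theta(iPk));           % preferred direction = peak bin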
Rate map stability of neurons selective for saccade direction was recomputed after removing the data for the preferred saccade direction. Data were removed for saccades within 15 degrees of the preferred direction for any saccade epoch showing significant direction selectivity, and rate maps from these cut data were used to compute a new spatial correlation value. If this value was significantly lower than the original correlation (lower than the fifth percentile of 1000 bootstrapped correlation values generated from the size-matched, downsampled data from the original rate maps for p < 0.05), then the spatial stability of the original rate maps was considered to be due to the neuron's saccade direction preference and the compared rate maps were no longer deemed to show spatial stability.
Image salience.
Neurons with spatial stability were tested for a representational confound between gaze location and image salience by computing a spatial correlation (Spearman's rank correlation coefficient) between the cell's rate map and the image salience map of presented stimuli (Saliency Toolbox; Walther and Koch, 2006). A correlation was considered significant if it was higher than the middle 95% of a distribution of bootstrapped correlation values between the image salience map and 1000 rate maps generated by shuffling the original rate map data.
Decoding gaze location from neural activity.
To assess the quality of the spatial information carried by the population as a whole, we used neural data from all 349 recorded neurons to decode gaze location. We first selected the two rate maps for each neuron that had the highest spatial stability as determined by the spatial correlation weighted by its percentile within shuffled correlations of the same data. These two rate maps for each neuron were then stacked with rate maps of other neurons to create two population rate maps (Fig. 5A). In this way, a neuron could contribute either the first and second half of its data within one image window or data from the first and second image windows aligned to image window borders or to the screen borders. This process not only allowed an agnostic approach for targeting the most spatially consistent activity across all cells even if they were not categorized as spatial, but also permitted usage of cells with only data from one image window location. To be stacked, all rate maps were made to be the same size (the size of the screen area common to all image windows). The size of rate map spatial bins (0.5° × 0.5°) was not changed. The two population maps therefore reflected data from two different time periods for each neuron in the population.
The population firing rate vector of each spatial bin (0.5° × 0.5°) in one population map was then treated as a vector with a gaze location that needed to be estimated (Fig. 5A). This estimate was made by correlating the population firing rate vector in question with every spatial bin's population firing rate vector in the other population rate map (Pearson's correlation) and then choosing the bin with the highest correlation value. This chosen bin served as the estimate of gaze location for the vector in question. Because all rate maps of individual neurons were normalized between 0 and 1 before being added to the population rate map, general firing rate levels of individual neurons did not aid prediction. Each gaze location bin could be chosen only once, so a unique gaze location was predicted for every firing rate vector. If one gaze location was the best match for more than one vector, then that location was matched to the vector with which it was best correlated. Prediction error was computed as the Cartesian distance in units of degrees of visual angle between the actual location and predicted location of the population firing rate vector in question. The distribution of prediction errors for all 1643 firing rate vectors is shown in the top panel of Figure 5B. To estimate the variability of the median of this distribution, the distribution was resampled with replacement for the same number of predicted gaze locations to create a bootstrapped error distribution and the 50th percentile of this distribution was stored. This process was repeated 1000 times to produce 1000 bootstrapped values that represented the variability of the median prediction error, indicated by a horizontal error bar in the top panel of Figure 5B.
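The decoding step amounts to a correlation matrix between the two population maps followed by a uniqueness constraint on the predicted bins. A minimal MATLAB sketch follows; popA and popB (nBins × nCells stacks of flattened, [0 1]-normalized rate maps over the common screen area, assumed NaN-free) and binXY (nBins × 2 bin-center coordinates in degrees) are hypothetical names, and the greedy matching implements the rule that each gaze bin is predicted once, with ties going to the more strongly correlated vector.

    % Predict gaze position for each population vector by maximum correlation.
    C = corr(popA', popB');                % C(i,j): bin i of popA vs bin j of popB
    nBins = size(popA, 1);
    pred  = nan(nBins, 1);                 % predicted popB bin for each popA bin
    Cw    = C;
    for it = 1:nBins
        [~, best] = max(Cw(:));            % strongest remaining vector-bin pair
        [i, j] = ind2sub(size(Cw), best);
        pred(i) = j;
        Cw(i, :) = -Inf;                   % each vector receives one prediction
        Cw(:, j) = -Inf;                   % each gaze bin can be chosen only once
    end
    % Prediction error: Cartesian distance between actual and predicted bins.
    errDeg    = hypot(binXY(:,1) - binXY(pred,1), binXY(:,2) - binXY(pred,2));
    medianErr = median(errDeg);            % degrees of visual angle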
To compare the gaze location prediction to chance success, we predicted gaze location the same way as described above, except we scrambled the population rate maps by shifting each cell's contributing rate maps (one “layer” within the population rate map) a random x–y distance (Fig. 5B, bottom). This resulted in population rate maps with the same general spatial structure as those used in the original analysis, except that the positions of firing fields were randomly shifted within each component cell's rate map. As described above, the 50th percentile of prediction error was used to describe the typical error of the prediction across vectors. These 50th percentile prediction error values were then compared between the scrambled and original data. If the middle 95% of the median error values from the original data were lower and did not overlap with the middle 95% of the median error values from the scrambled data, then the original prediction was considered to have significantly lower error than would be expected by chance (p < 0.05).
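Continuing the sketch above, the chance comparison only requires shifting each cell's layer of the population map by a random x–y displacement before rerunning the decoder; mapH and mapW (rate map height and width in bins, with nBins = mapH × mapW) follow the naming of the previous sketch.

    % Scramble each cell's rate map with a random circular x-y shift.
    popScr = popB;
    for c = 1:size(popB, 2)
        layer = reshape(popB(:, c), mapH, mapW);          % back to a 2-D map
        layer = circshift(layer, [randi(mapH), randi(mapW)]);
        popScr(:, c) = layer(:);                          % fields moved, structure kept
    end
    % Decoding popA against popScr yields the chance error distribution.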
Grid activity.
Each cell was tested for grid activity by determining grid scores (Sargolini et al., 2006; Brandon et al., 2011; Killian et al., 2012) for each of its rate maps, the significance percentile of those grid scores, and the spatial stability across time of any map with a significant score. No rate map was considered to have significant grid activity if it lacked significant spatial stability in the relevant image space. For example, a neuron was considered to show image-aligned grid-like activity if it was one of the neurons categorized as having image-aligned spatial activity (described above) and it had a significant grid score relative to shuffled values for any image-aligned rate maps. Likewise, a neuron was considered to show screen-aligned grid-like activity if it was one of the neurons categorized as having screen-aligned spatial activity and it had a significant grid score in any screen-aligned rate maps. A neuron with data from only one image window was considered to show grid activity if it had a significant grid score and showed significant spatial stability across time between the halves of its data. A neuron was also considered to show grid activity if it lacked stability across image windows but had a significant grid score for a spatially stable rate map within a single image window. All of these tests were performed separately for each of the three different rate map-smoothing methods described earlier.
The same tests were also performed to examine grid activity across smaller rate maps (Fig. 6-1), except that those tests were intended to illustrate a conceptual point and were therefore less exhaustive. The aim was to test whether smaller rate maps, more comparable in size to those in our past research (Killian et al., 2012), would yield a comparable proportion of neurons with grid activity. Each neuron's rate maps were split across space along the two diagonals of the map (i.e., one rate map was divided into four rate maps with some overlapping data; see Fig. 6-1) and then any map with grid-like activity (a grid score ≥95th percentile of grid scores produced from that map's shuffled data) that was also spatially stable was counted as a neuron showing stable grid activity for this smaller space. Neurons with stable grid activity from this analysis are reported separately with regard to this test and are not grouped in with the other analysis of grid activity that used the whole image-viewing area. In addition, tests for grid activity across a smaller area differed in that they excluded rate maps smoothed adaptively, rate maps aligned to screen bounds from a single image window, and all cells with data from two image windows that only were significantly stable within, not across, image windows.
Grid scores for each rate map were computed from its autocorrelation in two different ways using the higher score as the final score (Killian et al., 2012). In the first method, the six closest peaks to the center peak of the autocorrelation were detected as the six bins closest to the center peak that each had a positive value higher than the surrounding 24 bins. The ringed area of the autocorrelation map that included these six peaks but excluded the central peak was then extracted, the values for each radial location along the ring were averaged together to create one value for each position on a ring vector, and then this ring vector was correlated with itself at rotational offsets of 30°, 60°, 90°, 120°, and 150°. The grid score was then calculated as the maximum correlation value of the 30°, 90°, and 150° rotations (the angle rotations for which a grid cell with 60° symmetry would be expected to have a low correlation) subtracted from the minimum correlation value of the 60° and 120° rotations (rotations for which a cell with 60° symmetry would be expected to have a high correlation). The initial ring area extracted from the autocorrelation began at half the distance of the average peak distance from the center, and was only two bins wide. In subsequent iterations, the width of the extracted area grew by one bin and the rotational values were assessed repeatedly to produce additional grid scores for fatter ring areas. The maximum grid score from all ring widths was taken as the grid score for this method. In the second method for computing grid scores that corrected for elliptical distortion of 60° symmetry (Brandon et al., 2011), the same process was repeated except the extracted ring area from the autocorrelation was adjusted to be an ellipse (the farthest peak was considered the major axis of the ellipse or, in another full repetition of grid score calculation, the closest peak was considered the minor axis of the ellipse). The highest resulting grid score across the two methods was used as the final score for a rate map. A grid score of a given rate map was considered significant if it was greater than or equal to the 95th percentile of synthetic grid scores produced from shuffling the data for that rate map 1000 times (the shuffling procedure is described in the “Spatial stability” section).
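The core of the grid score, rotational correlation of an annulus of the autocorrelation, can be sketched in MATLAB as follows. This simplified version rotates the masked autocorrelation itself (as in Sargolini et al., 2006) rather than collapsing it to a ring vector, and takes the annulus radii rIn and rOut as given, whereas the full procedure above derives them from the six nearest peaks and iteratively widens the ring; xcorr2 and imrotate are Signal Processing and Image Processing Toolbox functions.

    % Simplified grid score from rotational symmetry of the autocorrelation.
    m = rateMap - mean(rateMap(:), 'omitnan');
    m(isnan(m)) = 0;                              % zero-fill unvisited bins
    ac = xcorr2(m);  ac = ac / max(ac(:));        % spatial autocorrelation
    [h, w] = size(ac);
    [X, Y] = meshgrid((1:w) - ceil(w/2), (1:h) - ceil(h/2));
    ring   = hypot(X, Y) >= rIn & hypot(X, Y) <= rOut;  % annulus with the 6 peaks
    angles = [30 60 90 120 150];
    rc     = zeros(1, numel(angles));
    for a = 1:numel(angles)
        rotAc = imrotate(ac, angles(a), 'bilinear', 'crop');
        rc(a) = corr(ac(ring), rotAc(ring));      % Pearson at each rotation
    end
    % 60 deg symmetry: high correlation at 60/120 deg, low at 30/90/150 deg.
    gridScore = min(rc([2 4])) - max(rc([1 3 5]));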
Results
Consistent spatial activity of individual neurons
A total of 349 single entorhinal neurons (237 from Monkey WR and 112 from Monkey MP) were recorded across 41 sessions. Over the entire population, ∼40% (136/349) of all recorded neurons exhibited a consistent spatial representation across many different presented images by representing gaze position either within or across locations of the image display window. Approximately one-fifth of all neurons (71/349) consistently represented gaze position across trials in which the images were presented in the same location (Fig. 2). Across trials in which images were presented in different locations, approximately one-third of neurons with data from two image window locations (87/283) exhibited a consistent spatial representation. Half of these neurons (44/87) exhibited an allocentric spatial representation locked to the bounds of the image display window, signaling gaze location relative to the bounds of the image display. For example, the neuron shown in Figure 2A (also Movie 1) shifted its spatial representation along with the shifted location of the image display window (Fig. 3A). The other half (43/87) of these neurons showed a spatial representation that did not shift with the location of the image window on the screen, suggesting that they were signaling gaze position in an egocentric or stationary reference frame (Fig. 3B, Movie 2). These relative proportions were similar in each monkey (image-aligned spatial activity, Monkey WR: 53%, Monkey MP: 44%). Interestingly, neurons with different spatial reference frames (i.e., those with image-aligned or screen-aligned spatial activity) were often recorded simultaneously (18/34 sessions in which neurons with stable spatial activity across image frames were recorded) and were found across all cortical layers and all recording locations within the entorhinal cortex.
Neurons exhibit stable spatial activity when images are presented in one location. A, Spatial activity of a neuron is shown for trials when the image (30° × 15°) is presented on the left. The green rectangle represents the borders of the screen schematically and the black area represents screen space outside of the image display window of a trial block. Eye position trace is shown in gray and red dots indicate gaze locations where the neuron fired action potentials above the median firing rate. The same data are shown as a rate map to the right of each eye trace plot. A movie of spikes occurring for this example cell as gaze position moves over the screen is shown in Movie 1. B, Additional single neurons with stable spatial activity. Each gray rectangle shows the activity of one neuron. Rate maps are the same size as presented images, which were 30° × 25° for one monkey (top row) and 30° × 15° for the other monkey (bottom row). The top and bottom rate map displayed for each neuron show data from the first and second half of viewing within a single image window location, respectively. Warm and cool colors within the firing rate maps indicate high and low firing rate, respectively; the firing rate range reported for each rate map corresponds to the minimum and maximum of the color bar (top right). The spatial correlation between each neuron's pair of rate maps is indicated by “r” and followed by an asterisk to indicate a significance of p ≤ 0.05.
Image-aligned spatial activity. The movie shows spiking activity from one example neuron shown both in Figure 2A and the leftmost plot of Figure 3A as eye position moves over the screen. Data are shown separately for trials in which the monkey viewed images presented at two separate screen locations. Activity is aligned to image bounds.
Spatial activity can be aligned to the image or aligned to the screen. For each neuron, the data are split between the different image window locations. Firing rate maps and the spatial correlation between them are shown for each neuron with the same plotting schematic as in Figure 2. A, Spatial activity for individual neurons (n = 2) that shifted their spatial activity along with the location of the image window. These image-aligned neurons show consistent spatial activity across the two image window locations when rate maps are aligned to image bounds. Rate maps are the same size as presented images, which were 30° × 15° or 30° × 25°, depending on the monkey. A movie of spikes occurring for an example neuron as gaze position moves over the screen is shown in Movie 1. B, Spatial activity is shown for individual neurons (n = 2) that did not shift their spatial activity along with the location of the image window. These screen-aligned neurons show consistent spatial activity across two image window locations when rate maps (over common image space) are aligned to screen bounds. Rate maps are the size of the image space common to all image presentations (26° × 15°). Crosshatching indicates the part of the image window rate map that is not shown for visual clarity of nonshifting activity and represents the area of the screen not shared among all image presentations. Spikes occurring for an example neuron as gaze position moves over the screen are shown in Movie 2.
Screen-aligned spatial activity. The movie shows spiking activity from one example neuron shown in the leftmost plot of Figure 3B as eye position moves over the screen. Data are shown separately for trials in which the monkey viewed images presented at two separate screen locations. Activity is aligned to screen bounds.
All neurons with spatial consistency were further analyzed to determine the extent to which any saccade direction selectivity or salient image features may have contributed to observed spatial representations. Twelve percent of spatial neurons (16/136) showed selectivity for a saccade direction, but all of these neurons maintained spatial stability (p ≤ 0.05) after removing the preferred saccade direction responses. Regarding salient image features, 1% of spatial neurons (2/136) showed a consistent central spatial firing that was correlated (p ≤ 0.05) with the central spatial bias of salient image regions.
To determine whether spatial activity shifted partially in the direction of a new image window location, spatial correlations for each neuron were computed for a range of spatial offsets between 0% and 100% of the 4° distance between image windows. Across all neurons, neural activity was most consistent across image window locations for complete (0% or 100%) rather than partial shifts (Fig. 4, top). For the population of neurons with allocentric spatial representations that shifted along with image window location (“image-aligned cells”), neural activity was most consistent when spatial stability was tested at a completely shifted offset (Fig. 4, second row, red bars) and least spatially consistent at a nonshifted offset (Fig. 4, second row, blue bars). For example, the activity of the image-aligned cell shown in Figures 2A and 3A (leftmost example) is least spatially consistent of all possible spatial offsets when its activity is aligned to the screen. The reverse was observed for the population of cells with spatial representations that did not shift along with the image window location (“screen-aligned cells”; Fig. 4, third row).
Neural activity is most spatially consistent for complete (instead of partial) shifts along with image window location. Cell rate maps from different image window location trial blocks were tested for partial shifting by computing the correlation values for a range of spatial offsets between 0% and 100% of the full shift of image window location. For each spatial offset, the red bars indicate the number of cells with their maximum spatial stability at that particular offset. This is shown for all cells (top row), cells that exhibited spatial activity that shifted with the image window location (second row, “image-aligned cells”), cells that exhibited spatial activity that did not shift with the image window location (third row, “screen-aligned cells”), and cells that did not pass criterion to be categorized as image-aligned or screen-aligned (bottom row, “other cells”). Blue bars indicate the number of cells with their minimum spatial stability at each spatial offset. Error bars indicate 95% confidence intervals of 1000 bootstrap iterations (resampling with replacement the cell population for the same number of original cells). Nonoverlapping error bars confirm in each cell population that more cells have their maximum spatial correlation values at 0% and 100% shifts compared with partial shifts (p < 0.05). See Materials and Methods for additional details on computing cell spatial stability.
Although approximately one-third of neurons recorded with two image locations (87/283) met criterion as spatially stable across different image window locations, it is clear from the bottom panel of Figure 4 that the remaining cells (n = 196) mimic the correlation pattern of the cells that passed criterion; their rate maps also have maximum correlation values at complete (0% or 100%) rather than partial shifts with image window location. In addition, like the cells that passed criterion for spatial stability, approximately equal numbers of this cell population were maximally stable when perfectly aligned with the image window location or aligned with the screen. Including these neurons that exhibited maximum spatial stability at 0% or 100% of spatial offset, the majority of all recorded neurons (67%, 232/349) represented gaze location.
Decoding gaze location from neural activity
To assess the quality of the spatial information carried by the population as a whole, we used neural data from all 349 recorded neurons to decode gaze location. A gaze location was predicted for each population firing rate vector in a population rate map (Fig. 5A). The difference between the actual location and predicted gaze location is shown for all firing rate vectors in a histogram in the top panel of Fig. 5B. To compare this result to chance success, the bottom panel of Fig. 5B shows decoding error when the data are scrambled by circularly shifting each cell's rate map by a random x and y value (causing the firing fields to change location). When considering all 1643 population firing rate vectors in the population rate map, the size of the resulting median prediction error was quite small and was significantly smaller than chance (p < 0.001; Fig. 5B). Even when only 10 neurons were used to predict gaze position, the median error was significantly lower than chance and the prediction error was as low as 2.5° for some groups of neurons tested (Fig. 5-1A).
Gaze location can be decoded from population neural activity. A, Schematic of how gaze location was predicted for one population firing rate vector in a population firing rate map. Each cell contributed data from two different time periods (two rate maps) to create two population rate maps, each with approximately half the data. Within the population map shown on the left, a small black square located in the same location on each cell's rate map indicates the location of one gaze location spatial bin. The values of that gaze location bin location across rate maps constitute that spatial bin's population firing rate vector (bottom left). The gaze position bin location of this vector is then predicted by finding the highest correlation between the vector and all the vectors in the other population rate map containing the other half of the data (right). The bin location of the vector with the highest correlation was taken as the predicted gaze location. Across all predictions, each spatial bin (0.5° × 0.5°) was limited to being predicted once, meaning that, if two different vectors from the map shown on the left both had their highest correlation with the same bin vector in the map shown on the right, then only one vector (with the higher correlation value) would be predicted to have that spatial bin location. B, Predicting gaze location of population firing rate vectors using aligned rate maps (blue) of all recorded cells (n = 349) exceeds chance success (red). Histograms show smaller (p < 0.05) median error between the real and predicted gaze location when aligned neural activity was used to predict gaze location (top row) than when the prediction was made from scrambled rate maps (bottom row). Vertical line indicates the median and error bar indicates the 95% confidence interval of median error values generated by resampling the prediction error distribution with replacement 1000 times. Figure 5-1 shows prediction error when fewer neurons are used, as well as the number of cells contributing data aligned to the screen or the image window. C, Probability of predicting the correct gaze location using neural activity of all recorded cells (n = 349) is shown across all firing rate vectors. Probability was determined by the rank of the correct gaze location among all potential locations ordered by their correlation value. In other words, if the correct gaze location bin had the highest correlation value across all potential gaze location bins, then the probability of predicting it would be 1, but if it were ranked 80th of 100 potential bin locations, then the probability of correctly predicting it would be 0.20.
Figure 5-1
The accuracy of predicted gaze position is not due to image-aligned or screen-aligned cells alone. Using only image-aligned or only screen-aligned cells to predict gaze position resulted in no difference in median prediction error (p = 0.17). In addition, because exactly the same number of neurons across the population had their best spatial correlations in image-aligned or screen-aligned reference frames (Fig. 5-1B, rightmost values show that the cyan and beige points perfectly overlap), each reference frame was used equally in the prediction analysis.
Grid activity
A proportion of neurons with stable spatial activity (14/136, 10%) demonstrated grid-like representations and had significant grid scores (Fig. 6). These grid cells were, like the general population of spatially consistent neurons, approximately evenly split between those with a spatial representation aligned to the image window location (n = 5) and those aligned to the screen (n = 7). The remaining grid cells (n = 2) could not be tested for spatial stability beyond one image window because data were collected from only one image window location.
Grid-like activity can shift along with image window location. An example neuron is shown that yielded a significant grid score (p = 0.04) for a spatially stable rate map across two image window locations. The neuron's spatial activity was aligned with the location of the image window (r = 0.4 spatial correlation between rate maps from two image window locations aligned to image window bounds, p = 0.002). Spikes occurring for this neuron as gaze position moves over the screen are shown in Movie 3. Plotting schematic is the same as Figures 2 and 3. The grid score for each rate map (g) is indicated on the top of the map. Rate maps are the same size as presented images (30° × 25°). Additional neurons showed stable, grid-like activity when a smaller portion of visual space was considered (Figure 6-1, and Movies 4 and 5).
Figure 6-1
Grid cells accounted for a small proportion of the total recorded neurons (14/349, 4%, without preselecting spatially consistent neurons), which contrasts with the larger proportion (12%) identified in a previous study from our laboratory (Killian et al., 2012). Although only approximately one-fourth of our recorded neurons (n = 95/349) were located in the superficial layers, where grid cells are predominantly found in rodents, we suspect that this is not primarily responsible for our low yield of grid-like activity because the previous study identified grid cells in both superficial and deep layers. Rather, we speculate that our observation of a lower proportion of neurons with grid activity is attributable to the fact that we measured spatial activity across a much larger viewing area than in the previous study, with a viewing area subtending four to seven times more area in degrees of visual angle. Although a smaller area certainly provides a lower bar for detecting grid activity simply because a grid pattern need only be consistent across a smaller area, grid patterns have been shown to lose consistency across very large spaces in rodents (Stensola et al., 2015).
To determine whether a smaller viewing area would result in more reliable grid activity, we assessed grid activity and its stability for each neuron for half the area of its rate maps by dividing each rate map along each diagonal (Fig. 6-1, and Movies 4 and 5). Consistent with qualitative inspection, this analysis revealed a larger proportion of neurons with significantly stable grid activity (47/349 cells, 13%), which was comparable to the proportion reported in earlier work (Killian et al., 2012). Importantly, even within the smaller area, neurons with stable grid activity had more than six firing fields (Fig. 6-1).
Image-aligned spatial activity that is grid like. The movie shows spiking activity from the neuron shown in Figure 6 as eye position moves over the screen. Data are shown separately for trials in which the monkey viewed images presented at two separate screen locations. Grid-like activity is aligned to image bounds.
Grid-like spatial activity in part of viewing area. The movie shows spiking activity from the neuron shown in Figure 6-1A, as eye position moves over the screen. Data were collected from only one image window location. The first and second half of trials are shown separately. Grid-like activity is shown over half of the viewing area.
Neuron exhibits grid-like spatial activity in part of viewing area that is image aligned. The movie shows spiking activity from the neuron shown in Figure 6-1B, as eye position moves over the screen. Data are shown separately for trials in which the monkey viewed images presented at two separate screen locations. Grid-like activity over half of the viewing area is aligned to image bounds.
Discussion
Entorhinal neurons, including grid cells, border cells, and head direction cells, represent body position relative to world features as rodents actively explore their environment with locomotive movement. In monkeys, similar spatial representations have been identified during visual exploration with eye movement; however, the frame of reference for these spatial representations in primates was unknown. In particular, it was unclear whether entorhinal neurons code gaze position relative to visual world features, similar to the allocentric activity identified in rodents. In addition, the proportion of neurons with spatial representations had never been examined. Accordingly, this work sought to assess the prevalence of spatial representations among primate entorhinal neurons and to identify their spatial reference frame by recording spiking activity while monkeys freely viewed images displayed at different locations.
Our results revealed that a majority of primate entorhinal neurons represent gaze position. Approximately half of these neurons fired consistently when a monkey fixated specific locations within an image display window even when the location of the image window was moved within the screen, demonstrating that individual primate entorhinal neurons can reflect an allocentric, visual frame of reference. Because not all simultaneously recorded neurons exhibited spatial firing that moved along with the location of the image display window, these results demonstrate that coactive neurons do not necessarily code gaze position within the same spatial frame of reference. Importantly, whereas the firing fields that we observed most often exhibited an irregular layout across space, the spatial activity across the neural population was stable and specific enough to allow for successful decoding of gaze position, with a low median error of 2.5° between actual and predicted location.
Coactive neurons with distinct reference frames
We found that the reference frame can differ across simultaneously recorded cells. These findings are consistent with some reports of rodent entorhinal cells noncoherently altering their spatial responses to environmental change (Savelli et al., 2008; Stensola et al., 2012), but stand in contrast to other reports of coherent spatial responses across cells to environmental change (Hafting et al., 2005; Fyhn et al., 2007; Solstad et al., 2008; Savelli et al., 2017). One explanation for the discrepancy across studies could be the variability in the strength of the environmental change (Jeffery, 2011). Cells might respond to environmental change independently of one another only when the change is subtle, which is perhaps a condition met in our experiment. Supportive of this idea, changing only the odor of an environment resulted in partial remapping in the rodent hippocampus, where only some cells within a population of simultaneously recorded cells changed their spatial activity (Anderson and Jeffery, 2003). Likewise, changing the location of a key landmark in a foraging task rich with other landmarks revealed that coactive hippocampal cells could show spatial firing fields relative to that landmark or the unchanging landmarks of the room (Gothard et al., 1996). Such mixed-reference frame hippocampal activity is potentially a downstream reflection of entorhinal cell activity. Consistent with this idea, inactivation of the medial entorhinal cortex in rodents can cause partial remapping in hippocampal CA1 neurons (Rueckemann et al., 2016), indicating that spatial responses of hippocampal cells can be directed by entorhinal input. Together, these results suggest that both entorhinal and hippocampal neurons can sustain multiple reference frames at one time across the population and may do so in concert as connected subnetworks across brain areas.
An important caveat is that we did not exhaustively test spatial responses relative to all possible shifts in reference frame. Specifically, we did not chronically monitor cell activity over multiple experimental sessions to determine whether individual cells switch reference frames over time. In addition, we did not examine neuronal responses to shifts of other possible reference frames, such as the screen itself or the room; consequently, screen-aligned neurons could encode gaze position relative to either the monkey's head or an allocentric, stationary reference frame such as the screen. Earlier work in the primate hippocampus found that only a minority of neurons coded gaze position relative to the body axis in true egocentric coordinates (Feigenbaum and Rolls, 1991).
Prevalence of irregular spatial activity
We designed this study to enhance our ability to identify grid cell activity by using larger visual images than in our previous study, thus allowing for more firing fields within each image (Killian et al., 2012). However, we instead observed a dominance of irregular spatial cells, a result that informs our understanding in several ways. First, the small number of cells with coherent grid activity over the large image space could be viewed as consistent with results in rodents in which grid representations were influenced by local features, distorting the coherence of the grid pattern especially across large enclosures (Stensola et al., 2015). Congruent with this idea, the data from the present study demonstrate that some irregular spatial cells have significant grid-like activity within a restricted region of the visual display (Fig. 6-1). Second, the dominance of irregular spatial cells suggests that they play an important role within the circuit. In rodents, ∼10% of cells in the superficial layers of medial entorhinal cortex are grid cells (Tang et al., 2014; Sun et al., 2015), whereas irregular spatial cells are widespread, comprising as much as 50–70% of superficial-layer medial entorhinal cells (Sun et al., 2015; Diehl et al., 2017), and they likely constitute a major input to the hippocampus (Zhang et al., 2013). Irregular spatial cells, along with place cells in the hippocampus, also show persistent spatial responses even when theta oscillations and grid cell patterns are diminished by septal inactivation (Brandon et al., 2011; Koenig et al., 2011), indicating that irregular cells may play an important role in sustaining the spatially specific firing of hippocampal place cells (Poucet et al., 2014). The theoretical utility of cells with irregular fields for self-localization in large environments has been highlighted recently (Hedrick and Zhang, 2016). Here, our neural data, dominated by irregular spatial activity, were used to decode gaze position with a high degree of spatial resolution (within 2.5° of actual gaze position). Interestingly, this resolution is ecologically meaningful: it is approximately the limit of visual space within which visual detail can be extracted during a single fixation (Findlay and Gilchrist, 2003).
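For readers unfamiliar with how such spatial fields are characterized, the sketch below computes a fixation-based firing-rate map by dividing smoothed spike counts by smoothed viewing time per spatial bin; irregular fields appear as spatially stable peaks in such maps that lack the sixfold symmetry of a grid. The map extent, bin count, and smoothing width are illustrative assumptions, not the parameters used in this study.

```python
# Minimal sketch of a fixation-based firing-rate map (spikes per second of
# viewing time per spatial bin). Extent, bin count, and smoothing width are
# illustrative assumptions, not the parameters used in this study.
import numpy as np
from scipy.ndimage import gaussian_filter

def rate_map(gaze_xy, spikes_per_fix, durations,
             extent=(-15.0, 15.0), n_bins=30, sigma=1.5):
    """gaze_xy: (n, 2) fixation positions in degrees; spikes_per_fix: spike
    counts per fixation; durations: fixation durations in seconds."""
    edges = np.linspace(extent[0], extent[1], n_bins + 1)
    spike_map, _, _ = np.histogram2d(gaze_xy[:, 0], gaze_xy[:, 1],
                                     bins=[edges, edges],
                                     weights=spikes_per_fix)
    occupancy, _, _ = np.histogram2d(gaze_xy[:, 0], gaze_xy[:, 1],
                                     bins=[edges, edges], weights=durations)
    # Smooth numerator and denominator separately before dividing so that
    # sparsely sampled bins do not produce spurious rate peaks.
    spike_map = gaussian_filter(spike_map, sigma)
    occupancy = gaussian_filter(occupancy, sigma)
    return np.where(occupancy > 0,
                    spike_map / np.maximum(occupancy, 1e-12), np.nan)
```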
Entorhinal gaze position signal is not a traditional visual response or a motor response
We identified a large proportion of entorhinal neurons that represent gaze position. However, it is important to note that this entorhinal representation of gaze position is distinct from the traditional eye-centered responses observed in early visual cortex and oculomotor areas. Specifically, whereas a neuron in a visual or oculomotor area responds to a confined region of eye-centered space within the contralateral visual field (its response field), we observed that entorhinal neurons fired selectively when the monkey fixated multiple locations within a 30°-wide image window, with highly variable scan paths across more than 100 images in each session. In addition, entorhinal neurons showed a lack of sensitivity to image content, firing consistently for fixation locations occupied by a wide range of perceptually distinct images. These entorhinal properties are perhaps unsurprising given that the entorhinal cortex receives extensive input from the perirhinal cortex (Van Hoesen and Pandya, 1975; Suzuki and Amaral, 1994), where cells have large, bilateral receptive fields (Desimone and Gross, 1979). Although our experiment does not address whether a neuron can represent gaze position relative to a particular visual object within an image, our observations do reveal that neurons can represent gaze position relative to the image window itself, treated as a visual object.
Another notable feature of the current research is the use of a free-viewing paradigm instead of the fixation-based paradigms traditionally used to measure visual and eye movement responses in primates. The free-viewing paradigm used here is in many ways analogous to the free-foraging paradigms used to assess entorhinal spatial responses in rodents and thus allows for cross-species comparisons. Moreover, this naturalistic, exploratory free-viewing paradigm was instrumental in identifying spatial representations in the primate entorhinal cortex (Killian et al., 2012, 2015), and this kind of paradigm has been shown to be sensitive to damage to medial temporal lobe structures in primates (Pascalis and Bachevalier, 1999; Ryan et al., 2000; Zola et al., 2000; Smith and Squire, 2008; Hannula et al., 2010).
Implications
The present results provide evidence that entorhinal cells can code gaze position in a visual reference frame that is spatially broad and insensitive to image content. Such large, visual reference frames could be used to produce eye movements from memory that use a remembered environment as a frame of reference. Specifically, some eye movements may be guided by gaze position relative to the remembered structure of an environment, such as in natural behavior when people shift gaze to a target outside the current field of view (Land et al., 1999; Hayhoe et al., 2003). Recent work in patients with medial temporal lobe damage suggests a strong role for this brain region in the rapidly acquired memory of the spatial layout in a visually presented scene (Urgolites et al., 2017). How the entorhinal cortex, as part of a memory system, uses these spatial signals to produce adaptive behavior is a fascinating question and future studies are necessary to advance our understanding of the neural mechanisms by which memory guides viewing behavior (Meister and Buffalo, 2016).
Footnotes
This work was supported by the National Institutes of Health (Grants 2R01MH080007 and R01MH093807 and National Institute of Mental Health Grant P51 OD010425 to E.A.B.). We thank Laura Kakalios, Kiril Staikov, Megan Jutras, and Kelly Morrisroe for assistance with animal training and handling; Nathan Killian for experimental advice and supplying example MATLAB code for analysis; and Jon Rueckemann for helpful discussion regarding the analysis of gaze position prediction.
The authors declare no competing financial interests.
Correspondence should be addressed to Dr. Miriam Meister, Washington National Primate Research Center, University of Washington, Box 357330, Seattle, WA 98195. Email: mmeister@uw.edu