Abstract
Everyday life is composed of events organized by changes in contexts, with each event containing an unfolding sequence of occurrences. A major challenge facing our memory systems is how to integrate sequential occurrences within events while also maintaining their details and avoiding over-integration across different contexts. We asked if and how distinct hippocampal subfields come to hierarchically and, in parallel, represent both event context and subevent occurrences with learning. Female and male human participants viewed sequential events defined as sequences of objects superimposed on shared color frames while undergoing high-resolution fMRI. Importantly, these events were repeated to induce learning. Event segmentation, as indexed by increased reaction times at event boundaries, was observed in all repetitions. Temporal memory decisions were quicker for items from the same event compared to across different events, indicating that events shaped memory. With learning, hippocampal CA3 multivoxel activation patterns clustered to reflect the event context, with more clustering correlated with behavioral facilitation during event transitions. In contrast, in the dentate gyrus (DG), temporally proximal items that belonged to the same event became associated with more differentiated neural patterns. A computational model explained these results by dynamic inhibition in the DG. Additional similarity measures support the notion that CA3 clustered representations reflect shared voxel populations, while DG’s distinct item representations reflect different voxel populations. These findings suggest an interplay between temporal differentiation in the DG and attractor dynamics in CA3. They advance our understanding of how knowledge is structured through integration and separation across time and context.
- attractor dynamics
- CA3
- dentate gyrus
- encoding
- event segmentation
- high-resolution fMRI
- hippocampus
- learning
- pattern completion
- pattern separation
Significance Statement
A major challenge of our memory system is to integrate experiences occurring in the same context to generalize context-appropriate knowledge, while also maintaining distinct representations of these same occurrences to avoid confusion. Here, we uncover a novel mechanism for hierarchical learning in the human hippocampus that might help to resolve this tension. In the CA3 subregion of the hippocampus, the neural representations of items presented sequentially in the same context, but not in different contexts, became more overlapping with learning. In contrast, adjacent items, appearing close in time and in the same context, became increasingly more differentiated in the dentate gyrus. Thus, multiple representations in different hippocampal subregions encoded in parallel might enable simultaneous generalization and specificity in memory.
Introduction
A challenge to a memory system is how to distinguish similar experiences that occur in the same context to avoid confusion, while also generalizing across these same experiences to extract shared context-relevant knowledge. Changes in context, termed event boundaries, are essential to constructing context-relevant knowledge because they cause us to parse continuous experience into discrete episodes or events (Clewett and Davachi, 2017; Radvansky and Zacks, 2017; Clewett et al., 2019; Bird, 2020; Zacks, 2020; Maurer and Nadel, 2021). These events unfold in time, and within each event, layered on a stable context, there is a higher-frequency sequence of occurrences or subevents. For example, making breakfast entails a sequence of subevents, such as brewing coffee and frying eggs, and is a distinct event from leaving our house to commute to work. As events become more familiar, such as our everyday life routines, learning provides opportunities to integrate subevents belonging to the same event, which can improve linking information in memory (Paz et al., 2010; DuBrow and Davachi, 2014; Ezzyat and Davachi, 2014). However, integration can also blur the distinction of event details (Brown and Stern, 2013; Favila et al., 2016; Chanales et al., 2019; Wanjia et al., 2021). Thus, it remains an important and open question how the memory system might balance between context-specific integration while also maintaining detailed representations of subevents.
Prior work has pointed toward distinct computations supported by hippocampal subfields CA3 and dentate gyrus (DG; O’Reilly and McClelland, 1994; Treves and Rolls, 1994; Norman and O’Reilly, 2003; Yassa and Stark, 2011). The hippocampus has been implicated in the representation of context, time, and sequential events (O’Keefe and Nadel, 1978; Davachi and DuBrow, 2015; Eichenbaum, 2017; Bellmund et al., 2018, 2020; Buzsáki and Tingley, 2018). In CA3, highly interconnected auto-associative networks have been hypothesized to promote attractor dynamics: similar inputs converge to the same CA3 activity pattern (“pattern completion”), whereas different inputs are attracted toward different activity patterns (Marr, 1971; Treves and Rolls, 1994; Lee et al., 2004; Vazdarjanova and Guzowski, 2004; Yassa and Stark, 2011; Kesner and Rolls, 2015; Knierim and Neunuebel, 2016). Through this mechanism, subevents that unfold in the same context and share perceptual and temporal information can be further integrated, whereas at event boundaries, changing perceptual input may drive CA3 toward a different pattern, representing the new context (Hasselmo and Eichenbaum, 2005; Howard et al., 2005; Kesner and Rolls, 2015).
As CA3 representations of items from the same event become more similar, DG might distinguish these same sequential representations. The DG has been widely implicated in “pattern separation”: the allocation of distinct neural representations to highly similar information (Treves and Rolls, 1994; Leutgeb and Leutgeb, 2007; Yassa and Stark, 2011; Knierim and Neunuebel, 2016; Nakazawa, 2017). Most if not all of this work has focused on pattern separation of highly similar spatial environments (Leutgeb et al., 2007; Leutgeb and Leutgeb, 2007; Neunuebel and Knierim, 2014; Baker et al., 2016; Berron et al., 2016; Danielson et al., 2016; Wanjia et al., 2021) or object stimuli (Kirwan and Stark, 2007; Bakker et al., 2008; Lacy et al., 2011). For sequential events that evolve in time, subevents that appear close in time are the most similar in their temporal information and thus might require disambiguated neural representations to minimize interference. We investigated whether the DG performs such temporal differentiation and, if so, whether it is sensitive to event structure.
Much of everyday life is composed of familiar routines. Repeated experiences not only allow opportunities for both integration and encoding of details but also entail the risk of over-integration that can lead to the loss of contextually relevant knowledge. Theoretical models and some empirical work, mostly in rodents, suggest that both attractor dynamics and neural differentiation might require familiarity with spatial environments (Lever et al., 2002; Leutgeb et al., 2004; Wills et al., 2005; Gill et al., 2011; Steemers et al., 2016; Chanales et al., 2017; Schapiro et al., 2017; Fernandez et al., 2023). This motivated us to examine how context-based temporal differentiation and integration increase as sequential events become familiar, to promote adaptive learning.
Materials and Methods
Participants
Thirty participants were included in this study (18 females, aged 18–35 years, mean age 23.32). One additional participant was excluded due to excessive movement (>3 mm within a scan, as indicated by MCFLIRT estimated mean displacement; most participants had below 1.5 mm movement in all scans; one participant had one scan with 3 mm displacement, but our examination revealed that MCFLIRT could correct this motion to 0.6 mm, which is within a voxel, so this participant was included). This criterion was set after motion correction was applied during preprocessing, before any functional or behavioral analysis. Two additional participants were excluded due to poor compliance with the task (a priori criterion of performance lower than 2.5 SD below the group average in the temporal memory test, during the first or the second day of testing). The sample size of 30 was determined based on previous fMRI studies using similar approaches (Schlichting et al., 2015; Aly et al., 2018; Dimsdale-Zucker et al., 2018) and a behavioral study using a similar paradigm (Heusser et al., 2018). The participants were members of the New York University community, with normal or corrected-to-normal vision. They were screened to ensure they had no neurological conditions or any contraindications for MRI. The participants provided written informed consent to participate in the study and received payment at a rate of $30 an hour for their time in the fMRI scanner and $10 an hour for their time outside the scanner. The study was approved by the New York University Institutional Review Board.
Materials
The stimulus set consisted of 144 grayscale images of nameable objects on a white background, selected from a pool previously used in similar studies in our lab (Heusser et al., 2016, 2018). The images were resized to 350 × 350 pixels, such that the object occupied as much of that square as possible, and the rest of the square was white. During the list-learning phase, objects appeared in the center of a colored square frame. The colored square was 600 × 600 pixels in total, with the object image covering 350 × 350 pixels in the center of the square, leaving a frame of 125 pixels around the image of the object. We used six colors in the study (brackets refer to the RGB values of the colors): red [255,0,0], green [51,221,0], blue [0,0,255], yellow [255,255,0], orange [255,127,0], and magenta [255,0,255], which were set based on Matlab's default settings (R2018b). The background of the screen was set to gray [128,128,128] during all tasks.
Experimental design
Overview
The experiment was divided into two consecutive days that were identical in their structure. On each day, participants learned three lists of objects. Each list included different objects. During list learning, each list was repeated five times, with identical and immediate repetitions. Before and after the five presentations of each list, all objects of that list appeared twice in random order. The random presentation after list learning was followed by a temporal memory test for the order of the objects as they appeared during list learning. On Day 2, after the temporal memory test of the third list, participants' memory of the object–background color association was tested for all objects presented on both days. All phases were conducted in the scanner and were controlled by Matlab (R2018b), using Psychtoolbox 3 extensions (Brainard, 1997; Pelli, 1997; Kleiner et al., 2007). On each day, participants received detailed instructions on all phases and practiced them before they entered the scanner and once more in the scanner before performing the tasks. Upon completion of all tasks on Day 2, the participants left the scanner and were briefly debriefed. In the current study, we focused on fMRI data from hippocampal subfields in the list-learning phase and behavioral data from the list-learning phase and the temporal memory test (future analyses of this multitask design will include analyses of temporal memory, color memory, and pre- and post-learning similarity from the random-order viewing, as well as analysis of cortical areas in the list-learning task). Below, we provide details on each of these tasks.
List learning
Participants learned six lists, three on each day of the experiment. In each list, participants intentionally encoded 24 grayscale objects that appeared sequentially, embedded in a colored frame (Fig. 1A). Objects were unique to each list. On each trial, participants had to visualize the object in the color of the frame and indicate by a button press whether the object–color combination was pleasing or unpleasing to them. Participants responded with the index or middle finger of their left hand (counterbalanced across participants), using an MRI-compatible response box. The participants were informed that the decision was subjective, that there were no right or wrong answers, and that performing this task would help them remember the color in a later memory task. Each object appeared on the screen for 2 s and was followed by a 3.5 s interstimulus interval (ISI) and a 0.5 s fixation cross before the onset of the next trial. The white background and the colored frame remained on the screen continuously (i.e., during the trial and the 3.5 + 0.5 s ISI and fixation). Participants were asked to make their judgment while the object was still on the screen, although responses were also collected for 1 s after the object was removed from the screen, to allow late responses.
The fixed intertrial interval (ITI) during list learning was chosen for two reasons. First, unlike previous fMRI studies that were designed to test specific pairs of items (DuBrow and Davachi, 2014; Ezzyat and Davachi, 2014), we designed the study to examine the similarity between multiple pairs of items in a list. This was done to enable testing our main hypotheses regarding the clustering of multiple items based on events, as well as the temporal differentiation analysis between different pairs of items within and across events. Altering the ITI between items would have compromised our ability to compare pairs of items with different objective time gaps, since it would have induced variability in both behavior and fMRI signals. We reasoned that measuring and including many pairs might be especially useful for representational similarity analysis (RSA) in small hippocampal subfields, to overcome SNR limitations. Thus, we followed previous MVPA studies, including RSA in hippocampal subfields, that used a fixed ITI (Kuhl et al., 2012; Kuhl and Chun, 2014; Kim et al., 2017) and included a rather slow ITI (4 s, along with the 2 s item presentation, making a 6 s stimulus onset asynchrony). Second, a fixed ITI allows straightforward background connectivity analysis, because trial-evoked activity can be filtered out by using a bandpass filter that removes activation corresponding to the trial frequency (see below).
The color of the frame was identical for four consecutive objects, before switching to another color for the next four objects. The color of the frame switched to the other color simultaneously with the presentation of the first object in an event. This made every four objects an event with a shared context, as manipulated by the frame color (Heusser et al., 2016, 2018). We refer to the first object in an event as the boundary object, appearing in Event position 1, followed by nonboundary objects in Event positions 2, 3, and 4. This operationalization of a sequence of items with a stable context as an “event” is in line with our conceptualization in the Introduction and with vast prior literature (DuBrow and Davachi, 2013, 2014, 2016; Ezzyat and Davachi, 2014, 2021; Heusser et al., 2016, 2018; Sols et al., 2017; Clewett et al., 2020; Franklin et al., 2020). There were six events in each list; thus, in total, participants learned 36 events in 6 lists. Within each list, the frame alternated between two colors (i.e., Events 1, 3, and 5 were of one color, and Events 2, 4, and 6 were of the other color in the color pair). We used pairs of highly contrasting colors to elicit clear events: red–green, yellow–blue, and orange–magenta. Each of these three color pairs appeared in one list on each day. The allocation of each color to odd-numbered versus even-numbered events was randomized for each participant, with the restriction that if a color appeared in odd-numbered events on the first day of the experiment, it would appear in even-numbered events when the same color pair appeared on the second day of the experiment, to avoid an identical order of colors across lists (objects were always different across lists). The allocation of each color pair to the first, second, or third list on each day was also randomized per participant, with the restriction that the first list on Day 2 could not include the same color pair as the first list on Day 1, to prevent participants who might have remembered the color pair from the first day from believing that they were going to see exactly the same experiment again on Day 2. To encourage associative binding of the objects, we instructed the participants to make stories linking the objects together and informed them that this would help them correctly remember the order of the objects in the following temporal memory test (DuBrow and Davachi, 2013). The participants were also provided with an example. Critically, we did not inform or guide them regarding the different events in making the stories. Thus, for each trial, participants made judgments about the object–color combination, typically within the 2 s of object presentation (see Results), and created the stories, which they could do during the object presentation and the 4 s ITI (a total of 6 s). In a later debrief, participants reported that they had successfully performed the task and had completed their stories after a couple of repetitions. Each list was presented five times, with immediate and identical repetitions. That is, the same 24 objects (6 events) repeated in exactly the same order, with the same color frames. Each repetition was scanned in a separate scan, which began with a 2 s fixation cross (no background and color frame) and ended with a 12 s fixation cross, appearing immediately after the removal of the last object from the screen, to allow estimation of the BOLD response of the last trial.
Temporal memory test
After the five repetitions of each list, participants were tested on their memory for the order of the objects (Fig. 1C). On each trial, participants were first presented with an image of an object from the list, cueing them to recall the following object in the list and hold it in memory. All objects appeared without the colored frame in the memory test. The cue appeared in the center of the screen for 3 s and was followed by a 5 s white fixation cross appearing in the center of the screen. A white fixation cross was used during the delay to remind participants that they still needed to hold the following object in memory. Other fixation crosses during this task were black. Then, two objects appeared on the screen next to each other, a target object and a distractor. Participants had to indicate which object immediately followed the cue object during list learning and rate their confidence. The four response options (“left sure,” “left unsure,” “right unsure,” and “right sure”) appeared below the corresponding object on the left/right, and participants indicated their response by pressing one of the corresponding four buttons. The participants were asked to respond as quickly as possible without sacrificing accuracy. They were further encouraged to recall the target object during the appearance of the cue and maintain it in their memory, because they might not have sufficient time to recall once the two probes appeared on the screen. The appearance of the target on the left or right side was randomized per participant and per list. The distractor was always the object that immediately followed the target object. We used this fine-grained test, in which participants had to arbitrate between two temporally close objects, to make sure that our lists were well learned.
We had three types of trials: within-event boundary trials, in which the cue was the boundary object and the target was the object in Position 2 (6 trials per list, a total of 36 trials in the experiment). These are within-event trials because both the cue and the target appeared in the same event; thus, they test temporal memory for occurrences within a single event. In within-event nonboundary trials, participants were cued with the object in Position 2, and the target was the object that appeared in Position 3 (6 trials per list, a total of 36). We also had across-event trials, in which participants were cued with the last object in an event (Position 4), and the target was the first object of the next event (5 trials per list, a total of 30). Critically, all objects appeared with an identical temporal distance during list learning (i.e., the target immediately followed the cue), and the objects were presented during retrieval with no color frame. Thus, the within-event and across-event trials were identical, except that in across-event trials, participants had to retrieve the target from the next event. The order of the trials was pseudorandomized per participant and per list, such that no object was repeated (as a cue, target, or distractor) with a gap of less than one trial. We also verified that the same event was not tested in any two consecutive trials (for across-event trials, this meant that objects from the event that the cue belonged to, and objects from the event that the target belonged to, did not appear in the previous or next trial). The two probe objects appeared on the screen for 3 s and were followed by a 1 s fixation cross, during which responses were still collected. This was followed by an ITI of an additional 3/5/7 s (average 5.25 s). Here we included jittering between trials since each trial was largely independent of other trials. To avoid variance in behavior and neural signal across different pair types, the interval between the cue and the appearance of the probes was fixed.
During this interval, participants performed a secondary task: an arrow appeared in the middle of the screen, randomly alternating between left and right every second. Participants had to indicate the direction of the arrow by a button press. This task was selected to reduce baseline activation levels in the hippocampus between trials, to improve signal estimation (Stark and Squire, 2001; Duncan et al., 2012; note that in the current study, we only report behavioral data from the temporal memory test; this optimization was done for future studies). After the arrows, there was a 1 s fixation, which, 400 ms before the onset of the next trial, disappeared for 100 ms and appeared again for another 300 ms. This blinking fixation was meant to prepare participants for the next trial, which began with the presentation of the cue. This task also included a 2 s fixation cross before the beginning of the first trial and a 13 s fixation cross after the two probes of the last trial, to allow estimation of the BOLD response of the last trial.
fMRI parameters and preprocessing
Participants were scanned in a 3 T Siemens Magnetom Prisma scanner using a 64-channel head coil. The experiment included an MPRAGE T1-weighted anatomical scan (0.9 × 0.9 × 0.9 mm resolution) and a T2-weighted scan (0.9 × 0.9 × 0.9 mm resolution) collected at the beginning of Day 1, before any task. Two pairs of fieldmap scans were acquired on each day (in each pair, one scan was acquired in the anterior–posterior (AP) phase encoding direction and the other in the posterior–anterior (PA) direction). The first pair was acquired before the first list and the second before the third list. Each repetition of each list was conducted in a single whole-brain T2*-weighted multiband EPI scan (77 volumes, TR = 2,000 ms, 204 × 204 mm FOV, 136 × 136 matrix, TE = 28.6 ms, flip angle = 75°, phase encoding direction: anterior–posterior, GRAPPA factor of 2, multiband acceleration factor of 2). In each volume, 58 slices were acquired, tilted −20° from the AC–PC line, with a 1.5 × 1.5 × 2 mm (width × length × thickness) voxel size, no gap, in an interleaved order. At the end of Day 2, participants were also given the option of undergoing an additional MPRAGE and T2 scan. These images were collected to ensure anatomical images were available in case the MPRAGE and T2 images collected on Day 1 were corrupted, but unless mentioned otherwise, the Day 1 images were used for the analysis.
The imaging data were preprocessed using FSL version 5.0.11 (http://www.fmrib.ox.ac.uk/fsl). Images were first corrected for B0 distortion using the acquired fieldmap scans and the topup function. For optimal correction, we corrected all the scans of each list based on the nearest fieldmap scans. Thus, the first list on each day was corrected using the first pair of fieldmap scans, while the second and third lists on each day were corrected using the second pair of fieldmap scans. For three participants, we slightly modified the fieldmap scans we used to adapt for movement detected by the experimenter between runs, but importantly, all scans were corrected for distortion. Following distortion correction, volumes that included motion outliers were detected using the fsl_motion_outliers command (framewise displacement, threshold = 0.9). Preprocessing was conducted using FSL's FEAT (version 6.00). Images were high-pass filtered (0.1 Hz) and motion corrected within the run to the middle volume using MCFLIRT. No smoothing was performed, to protect the boundaries of the hippocampal subfields and because smoothing is unnecessary for RSA (Kriegeskorte et al., 2006, 2008). All analyses were conducted in each participant's native functional space and within each participant's anatomically segmented ROIs. Note that smoothing was also redundant for the univariate analysis detailed below, since we examined the average BOLD signal across all voxels in an ROI.
For the analysis of the functional data, we aligned the functional images (for RSA) and the resulting t statistics of the GLMs (for univariate analyses) in each scan to the same functional space within the participant, by registering them to a template EPI image. To create this template, for each participant, we averaged together the fieldmap-corrected EPI reference images of the first scan on each day using FSL's AnatomicalAverage command, which registers and averages both images (the reference images are single-volume images created by the scanner before the beginning of the scan, with parameters identical to the functional scan). This averaging across images of both days was done to avoid biasing the registration of images toward one day of the experiment. The MPRAGE image of each participant was registered to the participant's template EPI image using FSL's BBR registration. The resulting transformation matrix was then used to register the anatomical ROIs to each participant's template EPI image, which is the participant's native functional space.
Regions of interest
The hippocampal subregions, namely, CA3, DG, and CA1 (in each hemisphere), as well as CSF and white matter ROIs to be used as nuisance regressors, were automatically segmented for each participant by FreeSurfer 6.0 (Iglesias et al., 2015), using both the MPRAGE T1-weighted and the T2-weighted images (version 6.0 and the inclusion of the T2-weighted image address the criticism raised regarding previous versions of FreeSurfer; Wisse et al., 2014). We acknowledge that CA3 and DG are difficult to distinguish (Wisse et al., 2017) and that fMRI studies using 3 T scanners have so far collapsed across these ROIs (Kyle, 2015; Tompary et al., 2016; Dimsdale-Zucker et al., 2018), preventing investigation of their different functions. We leveraged FreeSurfer 6.0's automatic segmentation and took a novel and automated approach to register the hippocampal subfields to each participant's functional space, with the goal of retaining voxels from small hippocampal subfields to allow RSA (see below).
The anatomically segmented ROIs are masks that include values of 0 or 1 (1 marks a voxel included in that ROI). When registering these ROIs to the functional space, the alignment and resampling procedure can mix voxels together, such that voxels can have values between 0 and 1. These values reflect how much of a registered voxel originated in voxels with a value of 1, that is, voxels that were part of the anatomical ROI (note that registering the EPI image to the anatomical space creates the same mixture, only without knowing whether or how voxels are mixed, because the original values are not given as 0 or 1; critically, that approach would further manipulate the functional data, so we instead registered the anatomical ROIs to the functional space). For each participant and each hippocampal subfield produced by FreeSurfer's anatomical segmentation, we first registered the anatomical ROI mask to the participant's native functional space and thresholded the masks at a value of 0.25. These registered images were then masked by a hippocampus ROI, to ensure that only hippocampus voxels were included in our subfield ROIs. The hippocampus ROI was created by combining all hippocampal subfields detected by FreeSurfer, excluding the hippocampal fissure, the fimbria, and the hippocampal–amygdala transition area. Finally, we assigned each voxel to the subfield with the largest value, meaning that the voxel originated mostly in that subfield. We reasoned that this approach allows us to functionally dissociate hippocampal subfields (even if not fully distinguish between them) while retaining the obvious benefits of an automatic segmentation and registration procedure, namely, being reproducible and efficient. Indeed, in both the univariate and the RSA results (see Results), we were able to dissociate CA3 from DG, demonstrating that our methodology was successful in revealing functional dissociations between these subregions.
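The winner-take-all assignment step can be summarized in a few lines. The following is a minimal sketch in R, under the assumption that the registered subfield masks have already been loaded as numeric vectors of partial-volume values (one entry per functional voxel); all variable names are hypothetical:

```r
# Hypothetical inputs: partial-volume values in [0, 1] per functional voxel
# for each registered subfield mask, plus a 0/1 hippocampus mask vector.
masks <- list(CA3 = ca3_vals, DG = dg_vals, CA1 = ca1_vals)

# Threshold each registered mask at 0.25, then keep only hippocampus voxels.
masks <- lapply(masks, function(v) ifelse(v >= 0.25, v, 0))
vals  <- do.call(cbind, masks) * hippocampus_mask

# Assign each voxel to the subfield with the largest partial-volume value;
# voxels surviving no mask remain unassigned.
assignment <- ifelse(rowSums(vals) > 0,
                     colnames(vals)[max.col(vals, ties.method = "first")],
                     NA)
```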
The white matter ROI was aligned to each functional scan, and the mean time series was extracted from each preprocessed scan using the fslmeants command. To create a confined CSF ROI (Bartoň et al., 2019), we identified, for each participant, a voxel in each hemisphere at the center of the central part of the lateral ventricle and placed a 3 mm sphere around that voxel (this was done directly on the template EPI image to make sure we detected the desired voxel in the final functional space and because the ventricles are clearly visible on EPI images). We then masked that ROI with the CSF ROI created by FreeSurfer, to further ensure that only CSF voxels were included. The resulting ROI was registered to each functional scan, and the mean time series was extracted from each preprocessed scan using the fslmeants command.
Quantification and statistical analysis
Throughout the analyses of both the behavioral and the neural data, when we used mixed-level models, they were implemented using the lme4 package in R (Bates et al., 2015; R Core Team, 2018). We used chi-square (likelihood-ratio) tests for statistical comparisons of models with versus without the effect of interest (details below for the effects included in each analysis), and AIC was used for model selection. As all our analyses were within-participant, all models included a random intercept per participant. The models further included a fixed effect of list (as a factor) to capture variance related to the different lists. All analysis decisions detailed below were made a priori based on prior literature and recommendations. Generally, ANOVAs were used a priori when the measure was calculated per participant (accuracy and confidence rates, univariate fMRI analyses), whereas mixed-effects models were used, as recommended, to analyze single-trial data (RTs, Lo and Andrews, 2015; and RSA, Dimsdale-Zucker and Ranganath, 2018). No further statistical analyses were performed on these data.
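To illustrate the model-comparison logic, here is a minimal sketch in R (the data frame and column names are hypothetical):

```r
library(lme4)

# Hypothetical single-trial data frame 'trials': one row per trial, with
# columns rt (reaction time), boundary and list (factors), subject (factor).
m_full    <- lmer(rt ~ boundary + list + (1 | subject), data = trials, REML = FALSE)
m_reduced <- lmer(rt ~ list + (1 | subject), data = trials, REML = FALSE)

# Chi-square (likelihood-ratio) test of the effect of interest, comparing
# models with versus without the boundary term; AIC is also reported.
anova(m_reduced, m_full)
```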
Behavioral data: list learning
To examine whether boundary items elicited longer RTs across repetitions, we analyzed reaction times (RTs) in the pleasant/unpleasant task. Missing responses, too-quick responses (below 100 ms), and RTs that were 3 SD above or below the average per participant, list, and repetition were excluded from the analysis. On average, 3.19% of the responses were excluded per participant. We also excluded the first response in each list, which was, as expected, much slower than the rest of the list. Single-trial RTs were scaled without centering and entered as the predicted variable into general linear mixed models (gLMMs, as implemented by the glmer function) with an inverse Gaussian response distribution (Lo and Andrews, 2015). We examined the effects of boundary (boundary/nonboundary objects), repetition (1–5 as a continuous variable), and their interaction.
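A minimal sketch of this model in R follows, assuming a single-trial data frame with the hypothetical columns rt_scaled, boundary, rep, list, and subject; the identity link shown here follows the recommendation of Lo and Andrews (2015) but is an assumption about the exact specification used:

```r
library(lme4)

# rt_scaled: single-trial RTs scaled without centering; boundary: boundary vs.
# nonboundary object; rep: repetition (1-5, numeric); list: list factor.
m_rt <- glmer(rt_scaled ~ boundary * rep + list + (1 | subject),
              data = learn_trials,
              family = inverse.gaussian(link = "identity"))

# The boundary, repetition, and interaction effects are then each assessed by
# comparing this model against a model omitting the term of interest.
```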
Behavioral data: temporal memory test
To test whether memory differed as a function of event position, and whether participants had to bridge across events in memory, we analyzed accuracy rates, high-confidence hit rates, and RTs. Accuracy and high-confidence hit rates were calculated per participant and entered into a repeated-measures one-way ANOVA with the factor of Event position (Pos1-2, Pos2-3, Pos4-1; Pos1-2, e.g., indicates that participants were cued with an object learned at Event position 1, while the following object in Event position 2 was the target object; Pos1-2 and Pos2-3 thus tested memory within events, while Pos4-1 trials tested memory across different events; see above). The ANOVA was implemented using the aov function in the R stats package, including a within-participant error term, that is, participant/event position. To examine RTs, single-trial RTs of high-confidence hits were entered as the predicted variable into a gLMM (as implemented by the glmer function) with an inverse Gaussian response distribution (Lo and Andrews, 2015). We used only high-confidence hits to avoid mixing together different response types and because high-confidence responses constituted the vast majority of participants' accurate responses (see Results). The effect of event position was tested using all positions in one model, as well as with planned comparisons examining the different pairwise comparisons between positions.
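For concreteness, the repeated-measures ANOVA on the per-participant rates can be sketched in R as follows (column names are hypothetical):

```r
# acc_by_pos: one row per participant x event position, with columns subject
# (factor), position (Pos1-2 / Pos2-3 / Pos4-1), and acc (proportion correct).
fit <- aov(acc ~ position + Error(subject / position), data = acc_by_pos)
summary(fit)
```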
Univariate activation
To characterize whether the average BOLD signal differed between CA3 and DG during event learning, preprocessed data from each repetition of each list were entered into a voxel-based GLM as implemented by FSL's FEAT (version 6.0). We had a regressor for each event position (four regressors), which included all the trials in the relevant event position, modeled with 2 s boxcars locked to the onset of each trial and convolved with FEAT's double-gamma HRF. As nuisance regressors, we included the six motion regressors output by MCFLIRT (three for translation and three for rotation, in each of the x, y, and z directions of movement), the average time series of white matter, and the average time series of CSF, as well as a regressor for each time point detected as an outlier by fsl_motion_outliers (each such regressor has 1 for the outlier time point and 0 for all other time points). To average across different lists within each participant, we registered first-level outputs to each participant's native functional space (as detailed above) and ran a second-level (participant-level) GLM that averaged across lists, resulting in a t statistic reflecting activation per voxel per event position in each repetition.
For group-level analysis, the t statistics were averaged per participant, ROI, and hemisphere. We quantified each hemisphere separately since previous investigations reported lateralized hippocampal effects (Schlichting et al., 2015; Kim et al., 2017; Dimsdale-Zucker et al., 2018; Bein et al., 2020). The mean activation levels were entered into a repeated-measures ANOVA with the factors of event position (1–4), repetition (1–5 as a continuous variable), ROI (CA3/DG), and hemisphere (right/left), implemented by the aov function in R, including a within-participant error term (participant/event position × repetition × ROI × hemisphere; note that the aov function wraps around the lm function in R). Follow-up repeated-measures ANOVAs were conducted within each ROI, with the factors of position and repetition and the relevant within-participant error term. Partial eta squared was estimated as the effect size where relevant (sjstats package; Lüdecke, 2020).
Item–item RSA in time and context
To test the main hypotheses regarding learning event representations in CA3 and DG, we conducted RSA (Kriegeskorte et al., 2006, 2008). To remove nuisance artifacts, preprocessed data were entered into a voxel-based GLM (implemented by FSL's FEAT) with the nuisance regressors described above: six motion regressors output by MCFLIRT (three for translation and three for rotation, in each of the x, y, and z directions of movement), the average time series of white matter and of CSF, as well as a regressor for each time point detected as an outlier by fsl_motion_outliers (default setting; this produces, per outlier time point, a regressor that has 1 for that time point and 0 for all other time points). The residuals of this model were then registered to the template EPI, and the time course of each voxel within each ROI (CA3, DG) was extracted and z-scored (Kuhl et al., 2012; Kuhl and Chun, 2014). The third TR after the onset of each trial (starting 4 s after trial onset, capturing the peak of the HRF according to FSL's HRF function, which is ∼4 s after onset; Kim et al., 2017) was extracted from all voxels within an ROI to form the activity pattern corresponding to each trial. Note that this TR is before the appearance of the next trial, ensuring that patterns are not contaminated by activity related to the next trial. Additionally, due to our relatively long stimulus onset asynchrony (6 s), this time point is 10 s after the onset of the previous trial, when most of the BOLD response to the previous trial had already subsided, allowing a rather clean estimation of activation. To clean the data for RSA (as recommended in Dimsdale-Zucker and Ranganath, 2018), and particularly because correlations are susceptible to outliers, voxels demonstrating activation levels exceeding ±3 SD of the mean within each participant and each ROI (across all voxels in an ROI in all lists and repetitions) in a specific trial were a priori removed from all analyses involving that trial (on average across participants, 0.2% of voxel responses were excluded in each ROI).
Within each ROI and participant, we then calculated Pearson's correlation between the activity pattern corresponding to each trial and the activity patterns corresponding to all other trials within a list and repetition, and Fisher-transformed these correlation values. In the current study, we specifically targeted changes in similarity due to learning, or repetition. Thus, from each correlation value, we subtracted the corresponding value of the first presentation. This allowed us to control for correlations between activity patterns that result from characteristics of the BOLD signal and processing steps (Schapiro et al., 2012; Mumford et al., 2014; Schlichting et al., 2015), which is especially important when comparing patterns across different temporal gaps and when the order of the trials cannot be randomized, as in our design (Mumford et al., 2014). Since the lists were identical in all repetitions, subtracting the correlation value of the first presentation from the correlation values obtained in the following repetitions alleviates the concern that our results may reflect spurious correlations. This decision was additionally informed by a preliminary step in which, to detect potential biases in similarity values, we plotted the similarity matrices across all trials in each repetition in a CSF ROI and saw some consistent biases in all repetitions. Importantly, viewing the matrices was done with all items, prior to running the main analyses comparing trials of interest. Outlier correlation values (±3 SD of the mean within each participant) were a priori excluded from the analysis, following recommendations to clean data for RSA (Dimsdale-Zucker and Ranganath, 2018), and the similarity values were entered into the group analysis.
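The core similarity computation amounts to a few matrix operations. Below is a minimal sketch in R, assuming patterns[[r]] holds a voxels × trials matrix of single-TR activity patterns for one ROI, list, and repetition (the variable names are hypothetical):

```r
fisher_z <- function(r) atanh(r)  # Fisher transformation of correlations

# Trials x trials matrix of Fisher-transformed Pearson correlations between
# the activity patterns (columns) of all trials in one repetition.
sim_matrix <- function(p) fisher_z(cor(p, method = "pearson"))

# Learning-related similarity: subtract the first-presentation baseline
# from each subsequent repetition (Repetitions 2-5).
sim_learned <- lapply(2:5, function(r) sim_matrix(patterns[[r]]) - sim_matrix(patterns[[1]]))
```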
The group analysis was conducted by submitting the pairwise similarity values to linear mixed-level models. We took the similarity values for temporal distances of one, two, and three items, that is, the similarity between an object and the immediately following object (lag of 1), as well as the next two objects to follow (lags of 2 and 3), in all repetitions. Larger distances were not included because they did not occur within an event, only across events. In all models, we included a fixed effect of list and a random intercept per participant.
Are event representations different between CA3 and DG?
In addressing this question, we treated hemispheres separately due to previous research that reported lateralized representational similarity effects in hippocampal subfields (Kyle, 2015; Schlichting et al., 2015; Stokes et al., 2015; Kim et al., 2017; Dimsdale-Zucker et al., 2018; Bein et al., 2020). Thus, the similarity values per pair of objects in Repetitions 2–5 were entered into a mixed-effects linear model that examined the interaction between event (within event/across events), ROI (CA3/DG), and hemisphere (right/left). Since we observed a significant effect of hemisphere (see Results), we continued to test for the interaction of event by ROI in each hemisphere. Since an interaction of event by ROI was only found in the left hemisphere (see Results), the following analyses were only conducted in the left hemisphere.
How do event representations change through learning within each of CA3 and DG?
Within each of the left CA3 and DG ROIs, we tested the effects of event, temporal distance, and repetition, and the interactions between these factors (in the analyses that included repetition, we included the similarity values in all Repetitions 1–5 without subtracting the first presentation, as the effect of repetition or its interactions directly test the change across repeated presentations of the lists). To further examine how learned similarity values may change from early to late in learning, we tested the effects of event, temporal distance, and their interaction in the second or the fifth repetition. Since the observed similarity effects in both CA3 and DG were most pronounced in the fifth repetition, further analyses were limited to the fifth repetition. Further, in all models that did not include temporal distance as an effect of interest, we controlled for temporal distance by including this factor as an explanatory variable in the model. Likewise, when testing for the effect of temporal distance, the effect of event was included in the model to control for within- versus across-event differences.
Are CA3 event representations related to segmentation behavior?
To foreshadow, we found an “event representation” in the left CA3 in the fifth repetition: higher similarity within compared to across events (see Results). We then examined whether left CA3 event representations correlated with behavioral segmentation as participants were learning the lists. To that end, we took the difference between within-event and across-event similarity in the left CA3 in the fifth repetition, per participant and per pair of consecutive events, as well as the difference between boundary RTs (Pos1 of the second of the two events) and pre-boundary RTs (the preceding Pos4, belonging to the first of the two events). These similarity differences were then entered as an explanatory variable into a mixed-level model, with the RT differences as the predicted variable (linear models were used because the RT difference distribution was normal, not requiring general linear models). These difference scores control for baseline differences and examine whether similarity differences correlate with the relative increase in RTs for the boundary item. The similarity values were scaled within participant, and as before, fixed effects of list and event order and a random intercept per participant were added. We then followed up on this analysis by examining separately each of the values comprising these difference scores; namely, we examined whether across-event or within-event similarity correlated with RTs for the boundary item (Pos1). Likewise, we examined whether across-event or within-event similarity correlated with pre-boundary RTs (Pos4 of the first of each two events).
Control for a background color effect in similarity
As mentioned above, we designed our study in a way that allows us to examine a color effect irrespective of belonging to the same sequential event, because we alternated between two background colors in each list. To examine color effects, we compared a model with a regressor coding “1” for each similarity value between items presented with the same background color and “0” for each pair of items presented with different colors, to a model without this regressor. We excluded items within the same event or across adjacent events, because these might show higher or lower similarity simply because they belong to the same or to an adjacent sequential event, which, in turn, could artificially contribute to a color effect, given that all items in the same event share the same color and all items from adjacent events have different background colors. Temporal distance and list were accounted for in the model, as above.
Control for univariate activation
Note that Pearson's correlation measure subtracts the mean and, therefore, accounts for differences in mean activation. Nevertheless, to further establish that differences in representational similarity are not mere reflections of differences in univariate activation, we repeated the main analyses but added univariate activation (the difference in univariate activation between the fifth and the first repetition) as an explanatory variable in the statistical models. Since each similarity value is the correlation between two items, we took the univariate activation difference per trial for each of the corresponding two items in the similarity analysis, as well as the interaction between them. This yielded a total of three univariate activation explanatory variables added to our mixed-level linear models: item 1, item 2, and item 1 × item 2. We then repeated these analyses using raw activation values in the fifth or the first repetition rather than the difference from the first repetition.
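In model form, this control amounts to adding three covariates to the similarity model; a minimal sketch in R (column names are hypothetical):

```r
library(lme4)

# sim: pairwise similarity value; uni1/uni2: univariate activation difference
# (fifth minus first repetition) for each item in the pair; dist: temporal
# distance; event: within vs. across events.
m_ctrl <- lmer(sim ~ event + dist + list + uni1 + uni2 + uni1:uni2 + (1 | subject),
               data = pairs_rep5)
```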
A closer look into voxel populations: additional similarity measures
We also assessed whether different populations of voxels, rather than changes in the level of activation (potentially in a subgroup of neurons), may underlie similarity differences. To that end, we adopted two similarity measures known for human fMRI that have previously been used to investigate hippocampal representations in rodents (Walther et al., 2016; Madar et al., 2019). These are the normalized dot product (ndp) and the vector-norm difference (Eqs. 1, 2; X and Y refer to two activity patterns). The ndp is the sum of the products of each pair of parallel voxels in the two vectors (i.e., the dot product), normalized by the norms of the two vectors (the norm of each vector is the square root of the dot product of the vector with itself, which gives the total length of the vector). The ndp can be thought of as the cosine of the angle between two vectors (i.e., a higher ndp means more similar patterns). The vector-norm difference is the difference between the norms of two activity patterns. Note that since each voxel is multiplied by itself, negative activation values become positive and contribute to a larger vector norm. Thus, this measure is not an average activation level, but rather the sum of the magnitudes of changes in activation level, regardless of the direction:

\[ \mathrm{ndp}(X, Y) = \frac{X \cdot Y}{\lVert X \rVert \, \lVert Y \rVert} \tag{1} \]

\[ \lVert X \rVert - \lVert Y \rVert, \quad \text{where } \lVert X \rVert = \sqrt{X \cdot X} \tag{2} \]
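Both measures are direct to compute from two activity patterns; for illustration, in R:

```r
# Normalized dot product (Eq. 1): cosine of the angle between patterns x and y.
ndp <- function(x, y) sum(x * y) / (sqrt(sum(x * x)) * sqrt(sum(y * y)))

# Vector-norm difference (Eq. 2): difference in overall response magnitude.
norm_diff <- function(x, y) sqrt(sum(x * x)) - sqrt(sum(y * y))
```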
A dynamic inhibition model for temporal differentiation in DG
As a preview, in the left DG, we found that learning led to lower similarity between temporally proximal items that appeared in the same event context. We formalized our hypothesized inhibitory dynamics to account for this observed lower similarity. Specifically, we propose that the DG neurons activated for one item (e.g., item n) become inhibited for some time, such that they are not active when the immediately following item (n + 1) appears. This results in other neurons representing the n + 1 item and in a low overlap between the activity patterns of temporally adjacent items (items n and n + 1). Inhibition then gradually decays, so that for the following item (n + 2), some of the neurons can overcome this inhibition and are active again, resulting in a slightly higher correlation between items n and n + 2. By item n + 3, inhibition has decayed further, such that similarity levels are high again and reach the levels of between-event similarity. Between events, inhibition does not come into play because items can be distinguished based on perceptual features alone; thus, DG differentiation is unnecessary.
We simulated these dynamics, with the primary goal of illustrating that temporally decaying inhibition could, in principle, produce the pattern of data we observed. We do not aim to offer a comprehensive mechanistic account of temporal differentiation in DG (this would be beyond the scope of the current paper; see Discussion). We simulated activity patterns of two continuous events, to examine similarity values at all temporal distances within and across events. Patterns were set as vectors of 0 (inactive units) or 1 (active units). The proportion of active units in each simulated activity pattern was set to 0.1 or 0.2, reflecting low levels of activation in DG (Jung and McNaughton, 1993; Chawla et al., 2005). Inhibition levels in the model determine the proportion of units from the pattern of a previous item that are not allowed to be activated in the current pattern. Inhibition increased linearly across repetitions and decayed exponentially with the time between items in each repetition, based on Equation (1):
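The equation itself did not survive in this version of the text. A form consistent with the description above (linear growth over repetitions r and exponential decay over within-event lag d) would be, as an illustrative assumption rather than the original parameterization:

\[ I(r, d) = \alpha \, r \, e^{-(d - 1)/\tau}, \]

where \(\alpha\) scales the linear increase across repetitions and \(\tau\) sets the rate of decay across item positions; the specific parameter values used in the original simulation are not recoverable here.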
The simulation starts with the first presentation (Rep1), allocating a random activity pattern to the item at Position 1 in the event. Then, for each of the following items within the event, a random activity pattern is generated, with inhibition levels determined by Equation (1), setting the proportion of units from the pattern of the previous item that are not allowed to be activated in the pattern of the current item. For each item, inhibition of units in all previous patterns in the event is enforced. Precisely, for an item in Position 2, units from Position 1 are inhibited. For an item in Position 3, units from Positions 1 and 2 are inhibited, each at a different level based on Equation (1), and for an item in Position 4, units from Positions 1, 2, and 3 are inhibited, each at a different level based on Equation (1). Then, Item 1 of the next event is presented to the model, and the process starts again by allocating a random activity pattern to Item 1. This effectively turns off inhibition between events.
For each of the subsequent repetitions, the model attempts to reactivate the pattern set for each item in the previous repetition, reflecting memory reactivation (Danker and Anderson, 2010; Ritchey et al., 2013; Tompary et al., 2016). Because inhibition levels increase across repetitions, some overlap that was permitted in previous repetitions might be too high in the current repetition, and some of the previously active units should now be inhibited (their activity set to 0). To implement this, from the units overlapping between the reactivated pattern and the pattern of the previous item being examined, we randomly sampled the minimal number of units that had to be set to 0 to satisfy the current level of inhibition, and set these units to 0. Next, we computed Pearson's correlations between simulated patterns, as a function of temporal distance and within versus across events, as was done for the empirical data.
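To make the procedure concrete, here is a toy sketch of the within-event inhibition step in R. It is a simplification: patterns are regenerated on each repetition rather than reactivated and pruned as in the full procedure, and all parameter values are illustrative assumptions:

```r
set.seed(1)
n_units  <- 200; p_active <- 0.1   # sparse, DG-like patterns
alpha    <- 0.15; tau <- 1         # inhibition growth and decay (assumed values)

# Inhibition rate: grows linearly with repetition r, decays with lag d.
inhib <- function(r, d) min(1, alpha * r * exp(-(d - 1) / tau))

simulate_event <- function(r) {
  pats <- list()
  for (pos in 1:4) {
    allowed <- rep(TRUE, n_units)
    # Units active for earlier items in the event are blocked at a
    # lag-dependent rate; between events, no inhibition is applied.
    for (prev in seq_len(pos - 1)) {
      prev_active <- which(pats[[prev]] == 1)
      n_block <- round(inhib(r, pos - prev) * length(prev_active))
      if (n_block > 0)
        allowed[prev_active[sample.int(length(prev_active), n_block)]] <- FALSE
    }
    pat <- numeric(n_units)
    pat[sample(which(allowed), n_units * p_active)] <- 1
    pats[[pos]] <- pat
  }
  pats
}

# In a late repetition, lag-1 pairs are strongly differentiated while
# lag-3 pairs are not, mirroring the empirical DG pattern.
pats <- simulate_event(r = 5)
cor(pats[[1]], pats[[2]])  # lag 1: low overlap
cor(pats[[1]], pats[[4]])  # lag 3: inhibition has largely decayed
```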
Background connectivity analysis
We estimated learning-related changes in background (low-frequency) connectivity between CA3 and DG. This approach was previously used to study changes in connectivity between hippocampal subfields at different phases of learning (Tambini et al., 2010; Duncan et al., 2014; Tompary et al., 2015). To remove from the BOLD signal activity directly related to items presented in the task, as well as other nuisance factors, we used the residuals of the GLM that modeled univariate activation in each list and repetition of the task (see above), which included a regressor for each event position, nuisance regressors of CSF and white-matter activity, and six movement regressors (Duncan et al., 2014). These residuals, reflecting fluctuations in the BOLD signal that are not tied to items in the task, were bandpass filtered into the 0.01–0.035 Hz range using AFNI's 3dBandpass function (Newton et al., 2011; Duncan et al., 2014). This upper threshold is within the range of correlations between gray matter areas in fMRI (Cordes et al., 2001) and further ensures that low-frequency fluctuations do not reflect the event structure of the task, as events changed every 24 s, or ∼0.04 Hz. To capture changes in connectivity due to learning, the average time course for each list in the first and last presentations (first rep, fifth rep) was extracted for each ROI and participant, and a Pearson's correlation was computed between the time courses of CA3 and DG. The resulting correlation coefficients were Fisher-transformed and averaged across lists per repetition. Thus, they reflect the connectivity between CA3 and DG during each repetition. We used a paired-sample t test to compare the first presentation (first rep) and the fifth (fifth rep). Thus, we directly tested the difference in connectivity between repetitions, which accounts for baseline connectivity levels between CA3 and DG.
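The final comparison reduces to correlating two filtered time courses per list and testing the change across repetitions; a minimal sketch in R, with hypothetical variable names (ca3_ts[[s]][[l]][[r]] and dg_ts[[s]][[l]][[r]] holding the bandpass-filtered mean residual time courses for participant s, list l, and repetition r):

```r
# Fisher-transformed CA3-DG correlation for one participant and repetition,
# averaged across the six lists.
conn <- function(s, r) {
  z <- sapply(1:6, function(l) atanh(cor(ca3_ts[[s]][[l]][[r]],
                                         dg_ts[[s]][[l]][[r]])))
  mean(z)
}

n_sub <- length(ca3_ts)
conn_first <- sapply(seq_len(n_sub), conn, r = 1)
conn_fifth <- sapply(seq_len(n_sub), conn, r = 5)

# Paired test of the learning-related change in background connectivity.
t.test(conn_fifth, conn_first, paired = TRUE)
```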
Results
Behavioral measures of learning
Participants were exposed to repeating sequences of objects superimposed on a colored frame and were asked to judge the pleasantness of the object–color combination for each object. The color of the frame changed every four objects, representing a change in context that was operationalized as an event boundary (Fig. 1A; Heusser et al., 2016, 2018). Previous studies have shown that, for once-presented events, processing time increases at event boundaries, reflecting a cost of transitioning to a new event (Zwaan, 1996; Pettijohn and Radvansky, 2016; Heusser et al., 2018). We investigated whether this marker of event segmentation persists with repetitions, indicating that participants continue to segment repeated events. Single-trial RTs for the pleasantness judgments were entered into mixed-level general linear models (Lo and Andrews, 2015), testing the effects of boundary, repetition, and their interaction. These models revealed a strong effect of repetition, reflecting that participants became significantly faster over repetitions (Fig. 1B), indicating learning (
After learning each list, we tested temporal memory for adjacent items from the list using a serial recognition memory test. In each trial, participants were cued with an object from the list and asked to select, from two options, the object that came next in the list (Fig. 1C). We tested temporal memory for three types of pairs: Pos1-2, Pos2-3, and Pos4-1 (the first number represents the event position of the cue and the second number the event position of the target). As expected from presenting the sequences over five repetitions during learning, participants were highly accurate and confident in their responses, indicating that the lists were well learned (accuracy: Pos1-2 M = 88.24, SD = 11.91; Pos2-3 M = 88.24, SD = 10.33; Pos4-1 M = 88.33, SD = 10.74; high-confidence hits: Pos1-2 M = 81.20, SD = 18.35; Pos2-3 M = 80.96, SD = 18.20; Pos4-1 M = 88.33, SD = 18.60; rates calculated out of total responses). (Between the last repetition of each list and the temporal memory test, participants observed all items in random order. We expected that, given the extensive training that included many repetitions, and since we presented all items regardless of events, this would have minimal influence on behavior. Indeed, we observed high accuracy rates and confidence levels.) No difference was observed in accuracy or high-confidence rates for the three memory trial types (one-way repeated-measures ANOVA with the factor of event position: accuracy, F(2, 58) = 0.003, p = 0.997; high-confidence, F(2, 58) = 0.39, p = 0.68).
The event structure of the task was evident in participants’ RTs during retrieval (Swallow et al., 2009; Radvansky and Zacks, 2017). Interestingly, RTs were significantly slower for Pos4-1 trials, when participants had to make a temporal memory judgment for items that were from adjacent events, compared to those drawn from the same event (Pos1-2, Pos2-3; Fig. 1C). Single-trial RTs were entered into a mixed-level general linear model (Lo and Andrews, 2015), which revealed a strong effect of event position (Pos1-2/Pos2-3/Pos4-1;
Univariate activation decreases with learning in DG, but not in CA3
To establish that CA3 and DG can be dissociated in human fMRI, we first examined univariate activation (average fMRI BOLD response) in DG and CA3 over the course of learning. To this end, the univariate activation estimates during learning (t statistics resulting from a standard GLM analysis, see Materials and Methods: univariate activation), computed per event position, repetition, and participant and averaged across all voxels in each ROI, were entered into a repeated measures ANOVA with the factors of event position (1 through 4), repetition (1 through 5), ROI (CA3/DG), and hemisphere (right/left). We found a main effect of ROI (F(1, 29) = 7.69, p = 0.0096), as well as an interaction between ROI and repetition.
We followed up on the ROI by repetition interaction by testing the effect of repetition within each ROI, collapsed across hemispheres. Activation levels in each ROI were entered into a repeated measures ANOVA with the factors of event position and repetition. We found a significant main effect of repetition in DG (F(1, 29) = 5.577, p = 0.025), reflecting decreasing activation over learning, but no such effect in CA3.
Different event representations in the left CA3 versus DG
To test our main questions regarding neural representations in CA3 and DG, we computed the similarity between multivoxel activity patterns corresponding to sequential objects across learning repetitions (Fig. 2). To specifically target changes in multivariate patterns corresponding to the learning of event structure, we used the first presentation of each list as a baseline and subtracted the similarity values for the first presentation from those measured during all subsequent presentations (Schapiro et al., 2012; Schlichting et al., 2015). This allowed us to specifically target learning-related changes in pattern similarity while controlling for correlations that might stem from the BOLD signal characteristics and analysis steps, which are identical in all repetitions (Schapiro et al., 2012; Mumford et al., 2014; Schlichting et al., 2015; Materials and Methods). Similarity values were computed for three different temporal distances (lags of 1, 2, and 3 items), that is, the similarity between an object and the immediately following object (lag 1), or the object presented two or three trials later (lags 2 and 3), with lag 3 being the maximum temporal distance within an event. These similarity values between objects that appeared in the same event (within event) could therefore be compared to similarity values between objects with the same temporal distance but from different events (across events; Fig. 2, Materials and Methods: Item–item RSA in time and context). The similarity values per pair of objects were entered into a mixed-effects linear model that examined the interaction between event (within event/across events), ROI (CA3/DG), and hemisphere (right/left). Hemispheres were treated separately due to previously reported lateralized representational similarity effects in hippocampal subfields (Kyle, 2015; Schlichting et al., 2015; Stokes et al., 2015; Kim et al., 2017; Dimsdale-Zucker et al., 2018; Bein et al., 2020) as well as in hippocampal event representations (DuBrow and Davachi, 2013; Ezzyat and Davachi, 2014; Hsieh et al., 2014; Heusser et al., 2016). The three-way interaction between event, ROI, and hemisphere was significant.
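For concreteness, the following sketch illustrates the logic of this analysis, assuming per-item multivoxel patterns (e.g., single-trial GLM estimates) stacked in presentation order, with events of four items; the data layout and function names are illustrative, not the authors’ code.

```python
# A minimal sketch of the item-item pattern similarity analysis, under the
# assumptions stated above.
import numpy as np

EVENT_LEN = 4  # four items per event, as in the task

def lagged_similarity(patterns, lag):
    """Pearson similarity between each item and the item `lag` positions
    later, split into within-event and across-event pairs."""
    within, across = [], []
    n_items = patterns.shape[0]
    for i in range(n_items - lag):
        r = np.corrcoef(patterns[i], patterns[i + lag])[0, 1]
        # same event iff both items fall in the same block of four
        if i // EVENT_LEN == (i + lag) // EVENT_LEN:
            within.append(r)
        else:
            across.append(r)
    return np.mean(within), np.mean(across)

def learning_change(patterns_by_rep, lag, rep):
    """Similarity change relative to the first presentation (the baseline)."""
    w1, a1 = lagged_similarity(patterns_by_rep[0], lag)
    wr, ar = lagged_similarity(patterns_by_rep[rep], lag)
    return wr - w1, ar - a1  # within-event and across-event change

# synthetic demo: 5 repetitions of 16 items (4 events x 4 items), 150 voxels
rng = np.random.default_rng(0)
patterns_by_rep = [rng.normal(size=(16, 150)) for _ in range(5)]
print(learning_change(patterns_by_rep, lag=1, rep=4))
```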
Event representations in the left CA3
To further characterize the nature of the differences between CA3 and DG, we first focused on the left CA3. Following previous models of CA3 (Hasselmo and Eichenbaum, 2005; Howard et al., 2005; Kesner and Rolls, 2015), we propose that contextual stability within the same event, together with a shift to a different activity pattern between events, can be a mechanism by which attractor dynamics lead to integrated event representations in CA3 (Horner and Burgess, 2014; Davachi and DuBrow, 2015; Horner et al., 2015; Clewett et al., 2019; Liu et al., 2022). Our prior work has shown that the stability of hippocampal representations correlates with temporal memory for event sequences (DuBrow and Davachi, 2014; Ezzyat and Davachi, 2014). Consistent with this, it has also been shown that the magnitude of activation in CA3 during retrieval is related to holistic event retrieval success (Horner et al., 2015; Grande et al., 2019). Thus, we hypothesized that CA3 would integrate representations of information occurring within the same event while separating representations across different events.
Pattern similarity values for objects at all temporal distances (lags 1, 2, 3), within and across events, were entered into mixed-level models testing the effect of event (within/across), temporal distance (lag 1, 2, 3), and their interaction (first irrespective of repetition). In CA3, we found a main effect of event, reflecting higher similarity within compared to across events.
CA3 event representations facilitate transitioning between events
If event representations in CA3 distinguish nearby events by allocating them to distinct neuronal ensembles, what consequences, if any, might this have for behavioral segmentation? As mentioned above, RTs for event boundary items are slower compared to nonboundary items, which has previously been interpreted as a cost of transitioning between events (Pettijohn and Radvansky, 2016; Heusser et al., 2018; Zacks, 2020). We asked whether it is easier to transition to an event whose representation is clear and distinct from that of the current event and thus might evoke less confusion. This could mean a relatively small cost of transitioning between these events, reflected in only a small RT increase for the boundary item. Alternatively, shifting between very different representations might render the transition to the next event more difficult, resulting in longer RTs for distinctly represented events. To examine these possibilities, we created a metric of neural clustering to index distinct representations of adjacent events by averaging similarity within each of two adjacent events and subtracting from it the similarity across these same events (see Materials and Methods: Are CA3 event representations related to segmentation behavior?). Neural clustering was then regressed against the transition cost: the difference between the RT to the boundary item in the second of these events and the RT to the immediately preceding pre-boundary item (this subtraction measure per event pair closely controls for any baseline differences across lists or participants; see Materials and Methods). As can be seen in Figure 4, greater neural clustering in the left CA3 significantly correlated with less cost at the boundary, namely, a smaller RT difference between the pre-boundary and boundary items in Repetition 5.
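The sketch below illustrates how such a clustering metric and transition cost could be computed, under the same illustrative assumptions as the earlier sketch (patterns in presentation order, four items per event); names and data are hypothetical.

```python
# A minimal sketch of the neural clustering metric and transition cost.
import numpy as np
from scipy.stats import pearsonr

EVENT_LEN = 4

def event_patterns(patterns, event_idx):
    return patterns[event_idx * EVENT_LEN:(event_idx + 1) * EVENT_LEN]

def pairwise_mean_r(a, b):
    """Mean Pearson correlation between all pattern pairs drawn from a and b."""
    return np.mean([np.corrcoef(x, y)[0, 1] for x in a for y in b])

def within_mean_r(a):
    n = len(a)
    return np.mean([np.corrcoef(a[i], a[j])[0, 1]
                    for i in range(n) for j in range(i + 1, n)])

def neural_clustering(patterns, ev):
    """Mean within-event similarity of two adjacent events minus the
    mean similarity across them."""
    e1, e2 = event_patterns(patterns, ev), event_patterns(patterns, ev + 1)
    within = (within_mean_r(e1) + within_mean_r(e2)) / 2
    return within - pairwise_mean_r(e1, e2)

# synthetic demo: one repetition of 16 items (4 events), with per-item RTs
rng = np.random.default_rng(0)
patterns = rng.normal(size=(16, 150))
rts = 1.0 + rng.normal(0, 0.1, 16)

clustering = [neural_clustering(patterns, ev) for ev in range(3)]
# transition cost: RT(boundary item of event ev+1) minus RT(pre-boundary item)
costs = [rts[(ev + 1) * EVENT_LEN] - rts[(ev + 1) * EVENT_LEN - 1]
         for ev in range(3)]
# in the real analysis, this correlation pools many event pairs/participants
print(pearsonr(clustering, costs))
```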
CA3 representations of different events likely reflect different populations of voxels
So far, we have shown greater within-event compared to across-event pattern similarity in CA3 and that the extent of neural event clustering facilitates event transitions, as measured in response times. We next assessed the extent to which these results are consistent with attractor dynamics, in which different events are represented by different populations of CA3 neurons (Treves and Rolls, 1994; Kesner and Rolls, 2015; Knierim and Neunuebel, 2016). In fMRI, the unit of measure is a voxel, so we asked whether different voxels are engaged for distinct events. While the data reported above show learning-related emergence of event representations, the correlation measure used (which is prevalent in RSA studies), while capturing changes in the population of voxels activated for an item, can also be sensitive to changes in overall activation level in only a subgroup of voxels. That is because when activation levels in a subgroup of voxels change, the mean activation level across the entire group of voxels changes correspondingly. Consequently, the distance from the mean of each voxel changes, and with it, the value of the correlation (note that this is true for Spearman’s correlation as well, not just Pearson’s correlation). Thus, we computed additional measures of representational similarity that are (1) sensitive to the overlap in the population of active voxels versus the level of activation, and (2) sensitive to changes in activation levels, regardless of the population of voxels involved.
To that end, we used two known similarity measures, the normalized dot product and the norm difference (Walther et al., 2016), which have previously been used in rodent work to examine similarity in hippocampal subfields (Madar et al., 2019). The normalized dot product is the dot product of two activity patterns divided by the product of their norms (the square root of the sum of squared activation values across all voxels), that is, the cosine of the angle between the patterns (higher means more similar); this normalization makes it robust to changes in levels of activation, even in only some voxels. The norm difference is the difference between the norms of two activity patterns; thus, it is only sensitive to differences in the level of activation and is blind to the population of voxels contributing to that difference. Briefly, simulations we conducted show that, indeed, changing activation levels in some voxels of nonoverlapping patterns influences Pearson’s correlation and the norm difference, but not the normalized dot product, while changing the overlap between patterns while holding activation levels constant influences both Pearson’s correlation and the normalized dot product, but not the norm difference.
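The following sketch illustrates the three measures and the logic of these simulations on synthetic patterns; the specific values and manipulations are assumptions chosen to make the dissociation visible, not the authors’ simulation code.

```python
# A minimal sketch of the three similarity measures and the simulation logic.
import numpy as np

def pearson_r(x, y):
    return np.corrcoef(x, y)[0, 1]

def normalized_dot(x, y):
    # dot product divided by the product of the norms = cosine of the angle;
    # robust to changes in activation level, even in a subgroup of voxels
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

def norm_difference(x, y):
    # sensitive only to overall activation level, blind to which voxels differ
    return abs(np.linalg.norm(x) - np.linalg.norm(y))

rng = np.random.default_rng(0)
base = np.abs(rng.normal(1.0, 0.2, 100))

# two nonoverlapping patterns: different halves of the voxels are active
x = np.concatenate([base[:50], np.zeros(50)])
y = np.concatenate([np.zeros(50), base[50:]])

# (1) boosting activation in a subgroup of y's voxels changes Pearson's r and
# the norm difference, but the normalized dot product stays 0 (no overlap)
y_boost = y.copy()
y_boost[50:75] *= 2
print(pearson_r(x, y), pearson_r(x, y_boost))               # changes
print(norm_difference(x, y), norm_difference(x, y_boost))   # changes
print(normalized_dot(x, y), normalized_dot(x, y_boost))     # 0 in both cases

# (2) shifting which voxels are active, holding activation values constant,
# changes Pearson's r and the normalized dot product, but not the norm
# difference (same values, so the norms are identical)
z = np.roll(x, 25)  # partially overlapping set of active voxels
print(pearson_r(x, x), pearson_r(x, z))                     # changes
print(normalized_dot(x, x), normalized_dot(x, z))           # changes
print(norm_difference(x, z))                                # ~0
```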
Computing the normalized dot product and norm difference on our fMRI data, we anticipated that we would replicate our correlation results in the normalized dot product measure but see no differences between the patterns’ norms. Indeed, the normalized dot product replicated our Pearson’s correlation results, with higher similarity within compared to across events in the fifth repetition (main effect of event, within vs across events).
Temporal differentiation in the left DG that is sensitive to context
As attractor dynamics in CA3 may pull the representations of items from the same event into a more similar space, a complementary mechanism is needed to distinguish these same sequential representations, to allow knowledge of sequential event details. For sequential events that evolve in time, subevents that appear close in time are the most similar in their temporal information and require disambiguation. In rodents, DG lesions have been shown to impair temporal memory (Morris et al., 2013), and, in humans, a prior study showed increased fMRI BOLD signal in CA3/DG in response to changes in the order of sequences of items (Azab et al., 2014), suggesting sensitivity to temporal order. However, neither of these studies revealed how items are coded in the DG during the unfolding of events. A recent in vitro study found that temporally similar spike trains inputted to DG tissue resulted in divergent DG activity patterns, providing some initial evidence for pattern separation in time (Madar et al., 2019). Some recent work has provided evidence that distinct representations emerge across learning for similar trajectories, but these experiments could not dissociate time from space (Chanales et al., 2017; Liu et al., 2022; Fernandez et al., 2023).
We investigated the following open questions: (1) Does the DG perform temporal differentiation? (2) If it does, is this differentiation sensitive to event structure? As above, similarity values between activity patterns of objects with temporal distances of 1, 2, and 3, both within and across events, were entered into mixed-level models. In the left DG, we found a significant interaction between event and temporal distance.
We next examined how temporal differentiation evolved through learning. To that end, pattern similarity values from the left DG were entered into a mixed-level model testing the interaction of event, temporal distance, and repetition. This revealed a significant three-way interaction.
Our findings suggest that DG differentiation is unlikely to occur between items that merely share a background color, as the activity patterns of within-event items that shared the same background color but had a larger temporal distance (lags 2 and 3) did not show differentiation. However, to examine this directly, we asked whether, in the fifth repetition, temporal differentiation within event (computed as in the main analysis above) is stronger than differentiation between items that appeared across events with the same background color and event position (e.g., comparing items in Event position 1 with items in Event position 2 in other events that shared the same background color, and likewise for all items at lags 1, 2, and 3). While the main analysis comparing across events controlled for objective temporal distance, this across-event comparison reflects a larger temporal distance than within events but controls for the background color and event position of items. Here as well, there was a significant interaction of event (within vs across same color/position) and “temporal distance”.
To sum up, through learning, items that were close in time and belonged to the same event, and thus potentially shared similar temporal and perceptual information, became more separated in the left DG (O’Reilly and McClelland, 1994; Treves and Rolls, 1994; Leutgeb and Leutgeb, 2007; Yassa and Stark, 2011; Kesner, 2018).
A dynamic inhibition model for temporal pattern separation in DG
The neurophysiological literature shows uniquely high levels of inhibition in DG (Freund and Buzsáki, 1998; Chawla et al., 2005; Coulter and Carlson, 2007; Jinde et al., 2013), which determine the temporal dynamics of neuronal firing (Bartos et al., 2007). Thus, we suggest that inhibitory dynamics may account for the lower similarity between temporally adjacent items (Hasselmo and Wyble, 1997; Myers and Scharfman, 2011; Kesner and Rolls, 2015). Specifically, we propose that the DG neurons that are activated for one item (e.g., item n) become inhibited for some time such that they are not active when the immediately following item (n + 1) appears. This results in other neurons representing the n + 1 item and in a low overlap between the activity patterns of temporally adjacent items (items n and n + 1). Since inhibition gradually decays in time, for the following item (n + 2), some of the neurons can overcome the lower level of inhibition and are active again, resulting in a slightly higher correlation between items n and n + 2. By item n + 3, inhibition has decayed enough that similarity levels are high again and reach the levels of between-event similarity. Between events, this inhibition between DG neurons does not come into play because items can be distinguished based on perceptual features alone, making DG pattern separation unnecessary (Fig. 6 and Discussion).
We formalized this idea using simulations to illustrate that temporally decaying inhibition could, in principle, produce the pattern of data we observed. We do not offer a comprehensive mechanistic account of temporal differentiation, which would be beyond the scope of the current paper. We increased the level of maximal inhibition (between items n and n + 1) across repetitions but kept the decay-in-time parameter constant (see Materials and Methods: A dynamic inhibition model for temporal differentiation in DG). Across a range of ROI sizes and rates of inhibition decay, our simulations showed that decaying inhibition indeed results in the pattern of results observed in DG, namely, low similarity for temporally proximal items, with similarity increasing as temporal distance increases (Fig. 6). Conceptually, learning in such a model can be seen as shaping an activity pattern for an item that is differentiated enough from previous items to overcome increasing levels of inhibition.
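As an illustration of this idea, the sketch below implements a toy version of decaying inhibition: units active for one item are inhibited and then gradually released over subsequent items. All parameter values and the winner-take-all selection rule are illustrative assumptions, not the authors’ exact model.

```python
# A minimal sketch of temporally decaying inhibition in a DG-like population.
import numpy as np

rng = np.random.default_rng(1)
N_UNITS, N_ACTIVE, N_ITEMS = 200, 40, 4  # one simulated event of 4 items
MAX_INHIBITION = 0.6   # inhibition applied to just-active units; in the
                       # paper's model this grows across learning repetitions
DECAY = 0.15           # proportion of inhibition remaining after each item

def simulate_event():
    inhibition = np.zeros(N_UNITS)
    patterns = np.zeros((N_ITEMS, N_UNITS))
    for t in range(N_ITEMS):
        drive = rng.random(N_UNITS) - inhibition  # input minus inhibition
        active = np.argsort(drive)[-N_ACTIVE:]    # most driven units win
        patterns[t, active] = 1.0
        inhibition *= DECAY                       # older inhibition decays
        inhibition[active] = MAX_INHIBITION       # newly active units inhibited
    return patterns

def lag_similarity(patterns, lag):
    return np.mean([np.corrcoef(patterns[i], patterns[i + lag])[0, 1]
                    for i in range(len(patterns) - lag)])

# average over many simulated events: similarity should rise with lag,
# mirroring the within-event pattern observed in DG
sims = {lag: [] for lag in (1, 2, 3)}
for _ in range(500):
    p = simulate_event()
    for lag in (1, 2, 3):
        sims[lag].append(lag_similarity(p, lag))
for lag in (1, 2, 3):
    print(lag, round(float(np.mean(sims[lag])), 3))
```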
DG temporal differentiation likely reflects different populations of voxels
The dynamic inhibition model suggests that activation of different populations of voxels might underlie temporal pattern separation. Thus, we applied the additional similarity measures introduced above (normalized dot product and norm difference) to strengthen the empirical evidence that different populations of voxels, rather than changes in the level of activation (potentially in a subgroup of voxels), underlie the results observed in DG. As above, if the pattern similarity results reflect changes in the activation of distinct voxels and not in the activation level of the same voxels, we predicted that we would replicate our correlation results in the normalized dot product measure, but not in norm differences. Indeed, mean normalized dot product values in the left DG showed a similar pattern to Pearson’s correlation: a lower normalized dot product for items that are close in time, only within event, but not across events (Fig. 5). As with Pearson’s correlation, in the fifth repetition, the interaction between event and temporal distance was significant.
Exploratory analysis: similarity values by event position
As an exploratory analysis, in both CA3 and DG, we examined the stability of representational similarity effects across event positions (1–4) in the fifth repetition. In the left CA3, similarity with the beginning and end of an event (Positions 1 and 4) was numerically higher than similarity with the middle of an event (Fig. 7A). In the left DG, temporal pattern separation between objects that are close in time appeared stable across the different positions in the event (Fig. 7B), consistent with the dynamic inhibition model.
CA3–DG connectivity increases with learning
Our results show that DG temporal differentiation is sensitive to event structure, yet how this sensitivity arises remains an open question. Our results also show that, over learning, neural patterns in CA3 become more stable within events. Thus, one hypothesis is that feedback or communication between CA3 and DG may also increase as learning advances. Indeed, a prominent theoretical model proposes that inhibition in DG, and consequently pattern separation, is achieved via back projections from CA3 (Scharfman, 2007; Myers and Scharfman, 2011). If pattern separation is mediated via CA3 back projections, and given that it was higher in Repetition 5 compared to early learning, a concomitant increase in CA3–DG synchronization should be evident from the first to the fifth learning repetition. To test this, we used a background connectivity approach. Background connectivity is the correlation between slow fluctuations of the BOLD signal in two ROIs, after removing trial-evoked activity using linear regression (here we used 0.01–0.035 Hz, slower than the rate of object presentation). It is thought to capture broad, state-level changes in circuit dynamics (rather than momentary fluctuations) and has been shown to capture different phases of learning in hippocampal subfields (Duncan et al., 2014; Tompary et al., 2015). We found that background connectivity between CA3 and DG did indeed significantly increase from the first to the fifth repetition (first rep: M = 0.34, SD = 0.24; fifth rep: M = 0.49, SD = 0.23, t(29) = 3.07, p = 0.005, Cohen’s d = 0.56; additional repetitions: second rep: M = 0.43, SD = 0.23; third rep: M = 0.41, SD = 0.21; fourth rep: M = 0.45, SD = 0.25).
No event representation in CA1
Although we focused on CA3 versus DG representations during the learning of events, for comparison purposes, we report data from CA1 as well. In the left CA1, we found no effect of event or temporal distance, no interaction between these effects, and no interaction with repetition.
Discussion
Decades of research have shown that the brain segments continuous experience into discrete events based on context (Schank and Abelson, 1977; Gernsbacher, 1985; Zacks et al., 2007; Ezzyat and Davachi, 2011; DuBrow and Davachi, 2014; Clewett and Davachi, 2017; Radvansky and Zacks, 2017; Heusser et al., 2018; Zacks, 2020). The hippocampus is central to event cognition (Ben-Yakov et al., 2014; Ben-Yakov and Henson, 2018; Clewett et al., 2019; Maurer and Nadel, 2021; Zheng et al., 2022), potentially through its role in representing contextual and sequential information (O’Keefe and Nadel, 1978; Buzsáki and Moser, 2013; Davachi and DuBrow, 2015; Deuker et al., 2016; Ranganath and Hsieh, 2016; Eichenbaum, 2017; Buzsáki and Tingley, 2018; Bellmund et al., 2020; Umbach et al., 2020; Rueckemann et al., 2021). Critically, the hippocampus does not merely create copies of the external environment. Rather, it creates integrated representations of related life events, potentially allowing generalization and inference, while at the same time facilitating separated neural representations to distinguish memories (Duncan and Schlichting, 2018; Sugar and Moser, 2019; Brunec et al., 2020; Molitor et al., 2021; Liu et al., 2022). Theory and empirical data have broadly implicated the CA3 subregion of the hippocampus in integration and the DG subregion in separation (Marr, 1971; O’Reilly and McClelland, 1994; Treves and Rolls, 1994; Yassa and Stark, 2011; Knierim and Neunuebel, 2016). However, how time and context shape integration versus separation in hippocampal subregions to structure event representations, and how these representations change with learning, is unknown.
Here, participants learned repeating events while undergoing high-resolution fMRI. Event structure was evident in participants’ behavior, reflected in slower RTs for boundary items during learning, and slower RTs in a temporal memory test when memories bridged across different events. Using high-resolution fMRI and multivoxel pattern similarity analysis, we show novel evidence that CA3 activity patterns within the same event became more similar compared to across events, irrespective of temporal distance, suggesting that CA3 clustered representations based on events. The strength of neural event clustering in CA3 correlated with faster RTs during event transitions, suggesting that CA3 clustering facilitated transitioning between events. This adaptive CA3 segmentation is consistent with an attractor dynamics account for CA3.
In contrast to CA3, DG showed greater separation between objects that were close in time, compared to further in time. Previous human and rodent work showed evidence for pattern separation of visual stimuli or similar spatial environments (Kirwan and Stark, 2007; Leutgeb et al., 2007; Leutgeb and Leutgeb, 2007; Bakker et al., 2008; Neunuebel and Knierim, 2014; Baker et al., 2016; Berron et al., 2016; Danielson et al., 2016; Nakazawa, 2017; Wanjia et al., 2021). Here, for the first time to our knowledge, we show that DG performs differentiation across time, for temporally proximal information during an event, to facilitate distinct representations of temporally adjacent subevents. Importantly, our findings suggest that DG temporal differentiation is adaptive: when subevents belonged to different higher-level events that could be disambiguated based on perceptual context—in our case, the different background colors—we saw no temporal differentiation. A dynamic inhibition model, proposing that activated units are temporarily inhibited and then gradually released from inhibition, accounted for DG temporal pattern separation. In both CA3 and DG, integrated and differentiated event representations strengthened with learning. Finally, we used additional similarity measures to provide supporting evidence that different populations of voxels contributed to changes in similarity in both regions, consistent with attractor dynamics in CA3 and dynamic inhibition in DG.
CA3 event representations emerged even though the task allowed integration across different contexts. Specifically, in the current study, the order of objects, and hence of sequential events, was identical across the five learning repetitions of each list. This theoretically offered participants the opportunity to predict and, thus, sequentially link across events even with the change in context features. From the perspective that event segmentation arises from a prediction error process (Zacks et al., 2007, 2011; Franklin et al., 2020; Zacks, 2020), we might have expected to see a reduction in segmentation over learning repetitions, as any “error” signaling across events should be reduced with repeated exposure. Instead, behavioral markers of segmentation were still evident in the fifth repetition, and CA3 neural clustering was even stronger, suggesting that event segmentation is a process applied even when events are highly predictable. Consistent with this, previous studies have found evidence for hippocampal event representations in highly predictable environments (Schapiro et al., 2012, 2016; Hsieh et al., 2014; Kyle, 2015; Hindy et al., 2016; Kok and Turk-Browne, 2018). However, those paradigms defined target sequences by high within-sequence predictability relative to low between-sequence predictability; thus, learning the predictability structure was the only way of segmenting sequences into events (Schapiro et al., 2012, 2016; Hsieh et al., 2014). Other studies trained participants on discrete sequences or environments, requiring no segmentation (Kyle, 2015; Hindy et al., 2016; Kok and Turk-Browne, 2018; Zheng et al., 2021; Dimsdale-Zucker et al., 2022; Liu et al., 2022). By contrast, in our study, the fixed order made transitions within and across events fully and equally predictable, such that participants could have learned and integrated across events. Nevertheless, participants leveraged the predictability in the list structure to adaptively chunk events instead of integrating across event boundaries (Clewett and Davachi, 2017; Shin and DuBrow, 2021).
In contrast to CA3 event-level representations, DG differentiated the representations of items that were close in time and experienced in the same event. Thus, DG temporal differentiation is specific and adaptive: when occurrences could be disambiguated based on different contexts, we did not observe separation in time. These results build on prior theories of the hippocampus (Treves and Rolls, 1994; Yassa and Stark, 2011; Kesner and Rolls, 2015; Kesner, 2018) as well as empirical work in humans showing reduced hippocampal pattern similarity between the multivoxel activity patterns of stimuli that are visually highly overlapping (Schlichting et al., 2015; Favila et al., 2016; Chanales et al., 2017; Koolschijn et al., 2019; Wanjia et al., 2021; Fernandez et al., 2023) or close in narrated, but not actual, time (Bellmund et al., 2022; but see Deuker et al., 2016). Additional studies have reported decreased similarity in CA3/DG between objects that share spatiotemporal context (Copara et al., 2014; Kyle, Smuda, et al., 2015; Dimsdale-Zucker et al., 2018; Liu et al., 2022) and specifically in DG (Berron et al., 2016; see also Baker et al., 2016). Our results extend this prior work in two critical ways: first, we show that DG differentiation can occur in the temporal domain, thus advancing our knowledge of how temporally extended experiences are represented in the brain (Eichenbaum, 2014; Davachi and DuBrow, 2015). Second, our results suggest that temporal differentiation is sensitive to context: DG does not separate all temporally similar experiences, but only when these experiences share the same context. Like the CA3 findings suggesting integration of temporally extended events, the DG temporal pattern separation results suggest that the hippocampus edits events, creating integrated and separated representations that are putatively adaptive for behavior (Ben-Yakov and Henson, 2018; Clewett et al., 2019; Sugar and Moser, 2019).
Motivated by the neurophysiological literature showing uniquely high levels of inhibition in DG (Freund and Buzsáki, 1998; Chawla et al., 2005; Bartos et al., 2007; Coulter and Carlson, 2007; Jinde et al., 2013; Hainmueller and Bartos, 2020), we considered that inhibitory dynamics in DG may underlie temporal differentiation. Potentially, voxels that are activated for one item are then inhibited for the following item, maximizing separation between items. This inhibition then gradually decays, resulting in gradually decaying separation. This is consistent with a model proposing that back projections from CA3 inhibit DG neurons and facilitate pattern separation (Myers and Scharfman, 2011). Here, we found that CA3–DG connectivity increased with learning, concomitant with the increase in DG temporal pattern separation, providing some initial support for the latter account. Future research could investigate the specific mechanism by which temporal pattern separation may arise in the DG.
An interesting question is whether hippocampal representations extend to higher, as well as lower, levels of event hierarchy. In our study, an even higher-level event could be the list, encompassing a sequence of color-defined events. The current study was not designed to test hippocampal representations at the list level, as each list (and repetition) was included in a separate fMRI scan. Thus, comparing representations within versus across lists would include comparing similarity values within versus across scans, which is problematic (Mumford et al., 2014). Another related question is to what extent the breaks we imposed between repetitions of the same list influenced learning of hierarchical event representations in the hippocampus. Previous studies mentioned above showed event representations of discrete sequences, including at the list level (Kyle, 2015; Hindy et al., 2016; Kok and Turk-Browne, 2018; Zheng et al., 2021; Dimsdale-Zucker et al., 2022; Liu et al., 2022). Other studies showed hierarchical representations across multiple levels of hierarchy in the hippocampus (McKenzie et al., 2014; Theves et al., 2021), as well as time and space representations across extended periods of time (Ziv et al., 2013; Nielson et al., 2015; Hainmueller and Bartos, 2018). Thus, we postulate that the CA3 findings reported here could extend to higher hierarchical levels, whereby the representations are more similar within higher-level events that span a longer time (e.g., list) and might even be discontinuous.
In contrast, another theoretical possibility is an all-or-none event representation, whereby any change of features marks a novel event, and all hierarchical levels are represented alike in CA3. We believe that this possibility is less likely, given that at a lower event level, that of single items, changes of features between items did not interrupt CA3 clustered representations (we defined items here as subevents, but any subevent is also an event, and indeed a vast memory literature defines each occurrence of an item as an “event” to be remembered). It would be interesting to test, potentially using electrophysiological methods that have the appropriate temporal resolution, whether the same CA3 and DG dynamics we observed here at the event level also occur within each item at a fine-grained temporal scale.
Many previous representational similarity studies in humans collapsed across CA3 and DG (Schapiro et al., 2012; Kyle, 2015; Stokes et al., 2015; Hindy et al., 2016; Dimsdale-Zucker et al., 2018, 2022; Wanjia et al., 2021; Zheng et al., 2021; Liu et al., 2022), hindering the examination of differences in these regions’ putative functions, namely, CA3 integration through attractor dynamics compared to DG pattern separation (Marr, 1971; Treves and Rolls, 1994; Yassa and Stark, 2011; Kesner and Rolls, 2015). Here, we leveraged a widely accepted automatic segmentation protocol of hippocampal subfields (Iglesias et al., 2015), in combination with automated alignment of the ROIs from anatomical to functional space (Materials and Methods). In our fMRI data, we were able to dissociate CA3 and DG in two different measures: average BOLD activity and pattern similarity. Thus, while we do not wish to argue that we have achieved perfect anatomical segmentation, our segmentation procedure was useful in uncovering compelling functional distinctions. Future studies, for example using 7 T fMRI, will be important to confirm these findings. However, our novel procedure has clear advantages: it is available to any 3 T scanner user, and it is fully automated, which is efficient and reproducible.
We did not find robust event or temporal representations in CA1. In this, our results are consistent with previous human studies that did not find CA1 multivariate pattern representations of spatial context (Stokes et al., 2015) or temporal context (Dimsdale-Zucker et al., 2022). Shared by the current and these previous studies is that context representations were examined after only minimal learning of a few repetitions. Interestingly, other human studies that used extensive learning show spatial and temporal context representation in CA1 (Schapiro et al., 2012; Kyle, Smuda, et al., 2015; Hindy et al., 2016; Thavabalasingam et al., 2019). In rodents as well, some spatial and temporal representations in CA1 require learning (Lever et al., 2002; Pastalkova et al., 2008; Gill et al., 2011; Mankin et al., 2012). It is possible that more extensive learning is required for context representations to manifest in CA1 because of CA1’s diverse inputs, namely, CA3 and entorhinal cortex (Marr, 1971; Amaral, 1993; Kesner and Rolls, 2015; Knierim, 2015). Theoretical models and emerging empirical work converge on the notion that in familiar environments, CA1 activity is more strongly modulated by CA3 inputs, whereas in novel environments, entorhinal input has a strong modulatory effect on CA1 (Hasselmo et al., 1996, 2002; Colgin et al., 2009; Tort et al., 2009; Kemere et al., 2013; Duncan et al., 2014; Colgin, 2016; Lopes-dos-Santos et al., 2018; Bein et al., 2020). Potentially, when an environment is very well learned, CA3 input to CA1 is strong enough that CA3 contextual representations also shape CA1 representations.
In sum, in the present study, we examined learning of temporally extended events and found that CA3 integrated items within event, and separated across events, consistent with an attractor dynamics account. DG, in contrast, separated representations for items that were close in time and within the same context, consistent with temporal differentiation that is sensitive to context. These results show how the hippocampus hierarchically represents events to address a major challenge of our memory system: how to integrate information to build contextually relevant knowledge, while maintaining distinct representations of this same information (McKenzie et al., 2014; Tompary and Davachi, 2017; Duncan and Schlichting, 2018; Brunec et al., 2020). Through learning, such hierarchical event representations can become the building blocks of organized semantic representations, including schemas, categories, and concepts (Collins and Quillian, 1969; Tulving, 1972; Eichenbaum, 2004; Murphy, 2004; Ghosh and Gilboa, 2014; Garvert et al., 2017; Morton et al., 2017; Mack et al., 2018; Renoult et al., 2019; Nieh et al., 2021; Theves et al., 2021). How event representations as observed here are transformed to become schematic knowledge is an exciting question for future research (Moscovitch et al., 2016; Robin and Moscovitch, 2017; Tompary and Davachi, 2017; Gilboa and Moscovitch, 2021).
Data Accessibility
Data are available upon reasonable request from the authors.
Code Accessibility
Analysis code is available at https://github.com/odedbein/Repatime_public.
Footnotes
This research was supported by the National Institute of Mental Health Grant R01MH074692 to L.D. O.B. was supported by the National Institute of Mental Health Grant T32MH065214, the McCracken fellowship, and a Grinker Award. We thank Andrew Heusser and J. Quinn Lee for their help in the initial conceptualization of the behavioral paradigm and Natalie Plotkin and Adam Benzekri for their help in the behavioral piloting of the task. We thank Monika Riegel, Tarek Amer, and David Clewett for insightful comments and conversations and Emily Cowan, Tarek Amer, Camille Gasser, and John Thorp for their comments on earlier drafts of the manuscript.
The authors declare no conflict of interest.
- Correspondence should be addressed to Oded Bein at oded.bein@princeton.edu or Lila Davachi at ld24@columbia.edu.