Representation of task episodes in human cortical networks

Abstract
Task episodes consist of sequences of steps that are performed to achieve a goal. The current study used fMRI to examine which regions of the brain represent full episodes, items, and current step. Participants learned 6 tasks each consisting of 4 steps. Inside the scanner, participants were cued which task to perform and then sequentially identified the target item of each step in the correct order. The multiple demand (MD) network and the visual cortex exhibited phasic responses to each task step, suggesting that they are sensitive to the fine structure of the episode. In contrast, default mode network (DMN) regions showed a phasic response predominantly to the onset of the entire task episode. Beyond these phasic responses, gradually increasing activity across each task episode was seen throughout most of the brain. Representational similarity analysis of episode and item coding revealed a significant dissociation between the MD and DMN networks. Compared to MD regions, which showed strong coding of individual items but not of the entire episode, the DMN showed representation of both item and episode, with coding for the episode localized to the parahippocampal cortex. The data hint that the most abstract level of task structure may be encoded in medial frontal cortex.

A central feature of purposeful everyday behavior is the retrieval of learned sequences of events from memory (Hsieh and Ranganath 2015) to guide our current actions. This involves parcellating a main goal (e.g., "make a stew") into smaller achievable steps (e.g., "take food from fridge" → "wash vegetables" → "chop vegetables" → "cook on stove") to allow progression towards the goal (Penfield and Evans 1935; Cooper and Shallice 2000; Farooqui et al. 2012). We call these temporally organized sequences of steps that occur within a given context "task episodes". A key aspect of these task episodes is that extended episodes of behavior are controlled as one unit, and not as a collection of independent acts (Schneider and Logan 2006; Duncan 2010; Farooqui and Manly 2018a). Whenever a step is completed, its specific content loses relevance, but higher-level task representations of the full episode must remain in control of behavior (Farooqui et al. 2012; Farooqui and Manly 2018a). This raises the question of how different brain regions work together to execute the current step of a task while keeping the overall goal in mind.
Previous literature has highlighted the importance of a set of frontal and parietal regions, known as the multiple demand (MD) network (Duncan and Owen 2000), in executing complex mental programs (Duncan 2010, 2013; Farooqui et al. 2012). It has been proposed that the MD network plays a key role in defining and controlling parts of task episodes, allowing goals to be achieved by decomposition into a structure of subgoals (Kurby and Zacks 2008; Farooqui et al. 2012; Duncan 2013). The MD network is well suited to focusing on the specific contents of a current cognitive operation, dynamically encoding information relevant to a current decision (Asaad et al. 2000; Everling et al. 2002; Li et al. 2007; Woolgar et al. 2011; Stokes et al. 2013), and radically changing its pattern of activity across successive task steps (Sigala et al. 2008; Duncan 2010). In particular, Farooqui et al. (2012) investigated the role of MD activity in task episodes requiring a series of target detection steps. The authors found that target detections that completed the entire task episode elicited the greatest MD activity, followed by those completing a subtask, and finally those for steps within one subtask. Because MD activity depended on task completion, the MD network was suggested to be involved in directing and revising the control representations of each step of the episode.
The ability to organize sequences of events within a given context has also been a key topic in the study of episodic memory (Ezzyat and Davachi 2011; Eichenbaum 2013; Hsieh et al. 2014; Cohn-Sheehy and Ranganath 2017; Radvansky and Zacks 2017). Tulving's original definition emphasized the importance of temporal events: "Episodic memory receives and stores information about temporally dated episodes or events, and temporal-spatial relations among these events" (Tulving 1972, p. 385). Event segmentation theory (Zacks and Tversky 2001; Zacks and Swallow 2007; Radvansky and Zacks 2017) proposes that humans can segment incoming information into temporal parts that are meaningfully related to the current situation. When important situation features change, the current event model is updated, and this update is experienced as an event boundary. Neuroimaging studies have found that brain regions sensitive to event boundaries overlap with areas associated with episodic memory retrieval, including regions in the default mode network (DMN; Speer et al. 2007; Ben-Yakov et al. 2014; Richmond and Zacks 2017; Baldassano et al. 2018). Furthermore, it has been suggested that these dynamics within the DMN may reflect the underlying meaning of the episode rather than simple stimulus changes (Radvansky and Zacks 2017), as coarse segmentation elicited greater DMN activity than fine-grained segmentation (Speer et al. 2007). Consistent with this observation, the DMN has been implicated in higher-level cognition at a broader scale, such as the encoding of schemas (Robin and Moscovitch 2017), situation models (Reagh and Ranganath 2018), and cognitive contexts (Crittenden et al. 2015). In a study of topographic mapping of a hierarchy of temporal receptive windows (TRWs), participants listened to a story scrambled at the time scales of words, sentences, and paragraphs (Lerner et al. 2011).
Results showed that early sensory regions were driven by incoming sensory input and were similarly responsive in all conditions; however, MD regions exhibited intermediate TRWs, whereas DMN regions were at the apex of the TRW hierarchy, such that they responded reliably only when intact paragraphs were heard in a meaningful sequence. This evidence suggests that the DMN is well suited to representing task episodes over an extended timescale.
Although various brain networks have been implicated in the execution of task episodes, to our knowledge no study has contrasted the roles of MD and DMN in representing different aspects of a task episode. In the current study, we aimed to examine which brain regions are involved in coding information at various levels of abstraction within a single task: individual steps (including their content and position within an episode), whole episodes, and groups of related episodes. Prior to the experiment, participants learned 6 everyday task episodes (3 kitchen tasks and 3 bathroom tasks), each consisting of 4 steps. Inside the scanner, participants carried out a continuous "execution" task in which, after being cued which task to perform, they sequentially identified the target item of each step in the correct order. This design allowed us to examine which brain regions represent rooms (e.g., kitchen), full episodes (e.g., "make a stew"), items within the episode (e.g., "take food from fridge"), and current position in the episode (e.g., 1st step). We hypothesized that different regions of the brain would be sensitive to different levels of the temporal task hierarchy. We first focused on the MD and DMN networks as a priori regions of interest. We hypothesized that the MD network would be especially involved in moment-to-moment control (Duncan 2010, 2013; Farooqui et al. 2012), whereas the DMN would be especially involved in the representation of full episodes (Lerner et al. 2011; Reagh and Ranganath 2018). In addition to these pre-conceived […] as a higher-level cognitive representation of relationships between different elements of an episode. We hypothesized that selective activity in the posterior medial network and/or mPFC might encode higher aspects of task structure, including episode and/or room.
We used both univariate finite impulse response (FIR) models, to characterize the temporal evolution of activity through the extended episode, and representational similarity analysis (RSA), to investigate coding of cognitive representations of task structure and content.
Forty-three participants (22 male, 21 female; mean age = 26.54, SD = 4.93) were included in the experiment at the MRC Cognition and Brain Sciences Unit. An additional 18 participants were excluded: 2 were discovered to have cysts, 1 lost several slices due to poor bounding box positioning, 9 were excluded due to poor behavioral performance (accuracy more than three scaled median absolute deviations below the median), and a further 6 were excluded due to excessive head motion (> 5 mm). All participants were neurologically healthy and right-handed, with normal or corrected-to-normal vision. Procedures were carried out in accordance with ethical approval obtained from the Cambridge Psychology Research Ethics Committee, and participants provided written, informed consent before the start of the experiment.
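The behavioral exclusion rule can be expressed as a one-sided robust outlier test. The sketch below is illustrative only: it assumes the conventional normal-consistency factor 1.4826 for the "scaled" median absolute deviation (the constant actually used is not stated here), and `low_accuracy_outliers` is a hypothetical helper name.

```python
import statistics

# Conventional factor making the MAD a consistent estimate of the SD under
# normality. Assumption: the exact constant used in the study is not stated.
MAD_SCALE = 1.4826


def low_accuracy_outliers(accuracies, n_mads=3.0):
    """Return indices of participants whose accuracy lies more than
    `n_mads` scaled MADs *below* the group median (one-sided test)."""
    med = statistics.median(accuracies)
    mad = statistics.median(abs(a - med) for a in accuracies)
    threshold = med - n_mads * MAD_SCALE * mad
    return [i for i, a in enumerate(accuracies) if a < threshold]


# Example: only the participant at index 5 falls below the cut-off.
print(low_accuracy_outliers([0.97, 0.98, 0.99, 0.96, 0.98, 0.70]))
```

Because the median and MAD are barely affected by a few very poor performers, the cut-off is not dragged down by the outliers it is meant to detect, as a mean-and-SD rule would be.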

Stimuli and task procedures
The study consisted of a learning session outside the scanner and an execution session in the scanner. During the learning session, participants learned 6 everyday task sequences ("episodes") each based in one of two locations ("rooms"; 3 kitchen and 3 bathroom). Each episode consisted of 4 ordered "steps". For example, the episode "make a stew" consisted of the steps "take food from fridge", "wash vegetables", "chop vegetables", "cook on stove".
Each step was associated with a unique image ("item"). The complete set of stimuli is shown in Figure 1.
Figure 1. Illustration of the 6 task episodes (3 kitchen and 3 bathroom tasks) memorized before going into the scanner. Each task episode consisted of 4 steps to be completed in serial order (e.g., the task "make a stew" consisted of "take food from fridge", "wash vegetables", "chop vegetables", "cook on stove"). ("Wipe mouth" and "Rinse face" have been scrambled here to comply with bioRxiv regulations.)
In the learning session, participants viewed the names and images of the steps of each task episode in sequential order. The step images were presented simultaneously with a background image corresponding to the room they occur in (kitchen or bathroom). The learning was self-paced, in separate runs for each room. Within each room, each task sequence was presented three times, and each item within the sequence was presented until the participant decided to move on to the next item. There was a 1.5 s inter-stimulus interval between items. After viewing all six sequences, participants were tested on their memory of the task episodes by (1) sorting picture cards representing all steps of the six task episodes into the correct sequences, and (2) completing a pen-and-paper test in which they were asked to write down the names of the steps in the correct order for each task episode. Most participants performed these two tests without error. A few participants made a mistake on 1-2 items but were able to correct their answers after being told they had made a mistake. The tests ensured that participants had memorized the specific step sequence of each task. Before entering the scanner, participants practiced a shortened version of the main experiment, containing one trial of each task episode. During scanning, participants performed two runs of the experiment, interleaved with shorter runs (~5 minutes) of a localizer task that was not analyzed and is not described further.
Figure 2 illustrates the structure of the task episode paradigm. At the start of each 45 s episode, participants were presented with a cue (e.g., "make a stew") for 1 s, indicating which task to complete. This was followed by a fixation period of 1.5–7.5 s before the onset of the first step. On each step, participants performed three visual searches. On each search, an array of 4 images was presented in a horizontal row (total horizontal extent approximately 12.6° of visual angle). These included, randomly ordered from left to right: the correct image ("target") corresponding to the current task step; a distractor image representing an incorrect step from the correct task; a distractor representing the correct step but from an incorrect task; and an additional distractor from the same incorrect task. To ensure that each display contained 2 images from each room, incorrect-task distractors always came from the room other than that of the current task. The array remained on screen for 2 s, and within this time the participant indicated the position of the target image using a 4-choice keyboard with their right hand. A 1 s fixation interval preceded the onset of the next search array. Each step thus lasted 9 s, with the participant selecting the same target in each of three search events; this long step duration allowed separation of the hemodynamic responses to successive task steps. At the end of the third search event, a 0.2 s presentation of the words "STEP COMPLETED" indicated the completion of that step and was followed by a 0.8 s fixation interval. Without further cueing, the participant then moved on to the next task step. After completing the last of the 4 steps, a fixation interval of 0.5–6.5 s was presented before the onset of the cue for the next task. The total interval between the end of the last step of the previous task and the first step of the next task was fixed at 9 s.
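The timing arithmetic above can be checked with a short schedule generator. This is a sketch built only from the durations stated in the text; the function names are hypothetical, and the post-task fixation is derived as whatever keeps the 9 s gap fixed (9 − 1 s cue − pre-step fixation), which reproduces the stated 0.5–6.5 s range.

```python
# Durations (seconds) taken from the task description.
CUE_S = 1.0
SEARCH_S = 2.0            # each search array stays on screen for 2 s
ISI_S = 1.0               # fixation between search arrays within a step
SEARCHES_PER_STEP = 3
FEEDBACK_S = 0.2          # "STEP COMPLETED"
POST_FEEDBACK_S = 0.8
STEPS_PER_EPISODE = 4
GAP_S = 9.0               # end of last step -> first step of next task


def step_events(onset):
    """Events of one step starting at `onset`; returns (events, end_time)."""
    events, t = [], onset
    for k in range(SEARCHES_PER_STEP):
        events.append(("search", t, SEARCH_S))
        t += SEARCH_S
        if k < SEARCHES_PER_STEP - 1:
            t += ISI_S    # fixation before the next array
    events.append(("step_completed", t, FEEDBACK_S))
    return events, t + FEEDBACK_S + POST_FEEDBACK_S


def run_schedule(pre_step_fixations):
    """Cue and first-step onsets for consecutive episodes.

    `pre_step_fixations` holds each episode's jittered cue-to-step fixation
    (1.5-7.5 s); the preceding post-task fixation keeps GAP_S constant.
    """
    cues, first_steps, prev_end = [], [], None
    for pre in pre_step_fixations:
        if not 1.5 <= pre <= 7.5:
            raise ValueError("pre-step fixation must lie in 1.5-7.5 s")
        cue = 0.0 if prev_end is None else prev_end + (GAP_S - CUE_S - pre)
        t = cue + CUE_S + pre
        cues.append(cue)
        first_steps.append(t)
        for _ in range(STEPS_PER_EPISODE):
            _, t = step_events(t)
        prev_end = t
    return cues, first_steps


cues, first_steps = run_schedule([1.5, 7.5, 3.0])
print(cues, first_steps)
```

One consequence is that first-step onsets fall on a fixed 45 s grid (4 × 9 s of steps plus the 9 s gap), so the jitter moves only the cue relative to the steps, which is what allows cue-related activity to be modeled separately.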
Each run consisted of 36 task episodes (with an additional dummy episode at the start), constructed so that each task followed each possible preceding task (including itself) exactly once. Task ordering was chosen before the start of each run by calculating the design efficiency (Dale 1999) of all pairwise contrasts between tasks: 1000 task orders were simulated, and the most efficient one was chosen. Each of the two runs lasted ~28 min.
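A sequence in which every ordered pair of the 6 tasks occurs exactly once as consecutive episodes is an Eulerian circuit of the complete directed graph with self-loops (36 edges, hence 37 episodes, the first acting as the dummy). The sketch below generates such candidate orders with Hierholzer's algorithm; it is an assumed construction, not necessarily the authors' generator, and it does not sample circuits uniformly. Scoring candidates by Dale's efficiency, 1 / trace(C (XᵀX)⁻¹ Cᵀ) for a contrast matrix C and design matrix X, is not implemented here.

```python
import random
from collections import Counter


def balanced_task_order(n_tasks=6, seed=None):
    """Sequence of n_tasks**2 + 1 tasks in which every ordered pair (a, b),
    including a == b, appears exactly once as consecutive elements.

    Built as an Eulerian circuit (Hierholzer's algorithm) of the complete
    directed graph with self-loops; element 0 plays the dummy episode.
    """
    rng = random.Random(seed)
    # Remaining outgoing transitions of each task, in random order.
    unused = {u: rng.sample(range(n_tasks), n_tasks) for u in range(n_tasks)}
    stack, circuit = [rng.randrange(n_tasks)], []
    while stack:
        u = stack[-1]
        if unused[u]:
            stack.append(unused[u].pop())   # follow an unused transition
        else:
            circuit.append(stack.pop())     # dead end: emit vertex
    return circuit[::-1]


def transition_counts(order):
    """How often each (previous task, next task) pair occurs."""
    return Counter(zip(order, order[1:]))


order = balanced_task_order(seed=1)
print(len(order), set(transition_counts(order).values()))
```

In practice, many such orders (e.g. 1000) would be generated and each scored by efficiency, keeping the best.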
Figure 2. Structure of an example task episode. Episodes began with a cue indicating which task to perform (e.g., "make a stew"). After a short delay, the first search array of four items appeared, and participants were asked to select the item corresponding to the first step of that task (here, "take food from fridge"). Participants selected this same target in 3 search arrays (total step duration = 9 s), then were given a brief indicator that the step had been completed, and moved on to the next step (here "wash vegetables"). Completion of all four steps completed the entire task episode.
The data were preprocessed and analyzed using the automatic analysis (aa) pipelines and modules (Cusack et al. 2015), which called relevant functions from Statistical Parametric Mapping software (SPM 12, http://www.fil.ion.ucl.ac.uk/spm) implemented in Matlab (The MathWorks, Inc., Natick, MA, USA). EPI images were realigned to correct for head motion using rigid-body transformation, unwarped based on the field maps to correct for voxel displacement due to magnetic-field inhomogeneity, and slice-timing corrected. The T1 image was coregistered to the mean EPI, and then coregistered and normalized to the MNI template.
The normalization parameters of the T1 image were applied to all functional volumes. The model incorporated a high-pass filter with a cutoff at 1/128 Hz. Spatial smoothing of 10 mm FWHM was applied for univariate analysis, but no smoothing was done for multivariate analysis.

FIR Model
Statistical analyses were performed first at the individual level, using a general linear model (GLM). To capture the BOLD time course throughout the task episode, a 45 s epoch, running from the onset of the first search array of each task to the first search array of the next task, was modeled using a finite impulse response (FIR) basis set of 30 boxcar regressors, each 1.5 s long. In this way, the response throughout task episodes could be modeled without making any assumptions about the shape of the hemodynamic response. Error episodes (defined as episodes with > 25% errors) were removed from the analysis using a similar but separate set of regressors. Effects of cues, and of errors on individual search arrays, were also removed by modeling them with epoch regressors matching the duration of the respective events (1 s for cues and 2 s for error events), convolved with a basis function representing the canonical hemodynamic response. The estimates for each time point were extracted from the two networks of interest and averaged over voxels within each network and across the 6 tasks. These average beta estimates for individual participants were entered into a random effects group analysis. The time points from the first 36 s of the average task response in each ROI were plotted for visualization.
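The FIR basis set can be pictured as a design matrix with one indicator column per 1.5 s bin of the 45 s episode. The sketch below is a simplification: the repetition time `tr_s` is not stated in this section and is a parameter here, scans are sampled at their onset times, and SPM's own FIR implementation (built on a finer microtime grid) is not reproduced. `fir_design` is a hypothetical name.

```python
import math


def fir_design(episode_onsets_s, n_scans, tr_s, bin_s=1.5, n_bins=30):
    """FIR design matrix: rows are scans, columns are post-onset bins.

    Column b is 1 for scans acquired between b*bin_s and (b+1)*bin_s
    after an episode onset (30 bins x 1.5 s = 45 s by default).
    """
    X = [[0.0] * n_bins for _ in range(n_scans)]
    for onset in episode_onsets_s:
        for scan in range(n_scans):
            b = math.floor((scan * tr_s - onset) / bin_s)
            if 0 <= b < n_bins:
                X[scan][b] = 1.0
    return X


# Two consecutive episodes, assuming (hypothetically) TR = 1.5 s.
X = fir_design([0.0, 45.0], n_scans=60, tr_s=1.5)
print([row.index(1.0) for row in X[:32]])
```

Each column's beta is then the mean response at that latency, with no assumed hemodynamic shape; applying the same function to error-episode onsets would give the separate nuisance set described above.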

Event-based GLM analysis
To complement the FIR model, an event-based GLM analysis was performed. In this analysis, we aimed to separate phasic activity linked to onset of each step from tonic activity across the whole step. Accordingly, each step was modelled using two regressors, an onset regressor modelled with 0 s duration and an epoch regressor modelled with 9 s duration, each convolved with the canonical haemodynamic response function. There were accordingly 48 regressors of interest, two (onset and epoch) for each of the 4 steps in each of the 6 tasks.
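The onset/epoch decomposition can be illustrated by convolving a stick function and a 9 s boxcar with a double-gamma HRF. This stdlib sketch uses SPM's commonly cited default shape (peak gamma of shape 6, undershoot of shape 16, ratio 1/6, 1 s scale) but is not SPM's `spm_hrf`; it ignores SPM's microtime resolution and regressor scaling, and all names are hypothetical.

```python
import math


def canonical_hrf(dt=0.1, length_s=32.0):
    """Double-gamma HRF (SPM-like defaults), normalized to sum to 1."""
    h = []
    for i in range(int(round(length_s / dt))):
        t = i * dt
        peak = t ** 5 * math.exp(-t) / math.gamma(6)
        undershoot = t ** 15 * math.exp(-t) / math.gamma(16)
        h.append(peak - undershoot / 6.0)
    total = sum(h)
    return [v / total for v in h]


def convolve(signal, kernel):
    """Causal discrete convolution, truncated to len(signal)."""
    out = [0.0] * len(signal)
    for i, x in enumerate(signal):
        if x:
            for j, k in enumerate(kernel[: len(signal) - i]):
                out[i + j] += x * k
    return out


def step_regressors(step_onset_s, step_dur_s=9.0, total_s=60.0, dt=0.1):
    """Onset (0 s stick) and epoch (9 s boxcar) regressors for one step."""
    n = int(round(total_s / dt))
    i0, n_dur = int(round(step_onset_s / dt)), int(round(step_dur_s / dt))
    stick = [0.0] * n
    stick[i0] = 1.0
    box = [1.0 if i0 <= i < i0 + n_dur else 0.0 for i in range(n)]
    h = canonical_hrf(dt)
    return convolve(stick, h), convolve(box, h)


onset_reg, epoch_reg = step_regressors(10.0, total_s=80.0)
peak_idx = max(range(len(onset_reg)), key=onset_reg.__getitem__)
print(peak_idx * 0.1)   # onset response peaks roughly 5 s after the step
```

The onset regressor captures a transient peaking about 5 s after each step begins, whereas the epoch regressor rises and plateaus over the 9 s step; fitting both lets phasic and tonic components compete for variance. Repeating this for 4 steps × 6 tasks yields the 48 regressors of interest.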
Error and cue activities were removed as before, with the cue also modelled using a combination of onset (0 s duration) and epoch (duration from cue onset to the onset of the first task step) components. Beta estimates were averaged across the 6 tasks for individual participants, and the contrasts against implicit baseline were entered into a random effects group analysis. To determine whether the BOLD signal showed significant linear changes towards goal completion, we performed t tests on increasing ([-3 -1 +1 +3]) and decreasing ([+3 +1 -1 -3]) linear contrasts across task steps. To complement the ROI analysis, this analysis was also carried out at the whole-brain level, using an FDR-corrected threshold of p < 0.05.
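The linear-trend test and the FDR threshold can be sketched as follows: each participant's four step betas are collapsed with the contrast weights, a one-sample t statistic is computed across participants, and voxelwise p-values are thresholded with the Benjamini-Hochberg step-up rule. This is an illustrative sketch; converting t to p needs a t distribution (not in the standard library), and SPM offers its own FDR routines whose exact variant is not specified here.

```python
import math
import statistics

INCREASING = (-3, -1, 1, 3)                  # weights for steps 1-4
DECREASING = tuple(-w for w in INCREASING)   # (3, 1, -1, -3)


def linear_trend_t(step_betas_per_subject, weights=INCREASING):
    """One-sample t statistic (and df) of a linear contrast across subjects."""
    values = [sum(w * b for w, b in zip(weights, betas))
              for betas in step_betas_per_subject]
    n = len(values)
    se = statistics.stdev(values) / math.sqrt(n)
    return statistics.fmean(values) / se, n - 1


def bh_fdr(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure: True where a test survives."""
    m = len(p_values)
    order = sorted(range(m), key=p_values.__getitem__)
    n_reject = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            n_reject = rank          # largest rank meeting the criterion
    keep = [False] * m
    for idx in order[:n_reject]:
        keep[idx] = True
    return keep


t, df = linear_trend_t([[1, 2, 3, 4], [1, 2, 3, 5], [0, 2, 3, 4]])
print(round(t, 6), df)
print(bh_fdr([0.01, 0.04, 0.03, 0.5]))
```

Because the weights sum to zero, the contrast is insensitive to the overall level of activity and responds only to a monotonic change across steps.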

RSA analysis
We performed representational similarity analysis (RSA) using linear discriminant contrast (LDC) to quantify dissimilarities between activation patterns. The analysis was done using the RSA toolbox (Nili et al. 2014), in conjunction with in-house software. The LDC was chosen because it is multivariate noise-normalized, potentially increasing sensitivity, and is a cross-validated measure that is distributed around zero when the true distance is zero (Nili et al. 2014). The LDC also allows inference on contrasts of dissimilarities across multiple pairs of stimuli. A pattern for each step of each task was obtained by averaging the onset and epoch responses from the standard GLM described above. This resulted in 24 patterns in total in each run. For each pair of patterns, the patterns from run 1 were projected onto a Fisher discriminant fitted for run 2, with the difference between the projected patterns providing a cross-validated estimate of a squared Mahalanobis distance. This was repeated projecting run 2 onto run 1, and we took the average as the dissimilarity measure between the two patterns. All pairs of pattern dissimilarities therefore formed a symmetric representational dissimilarity matrix (RDM) with zeros on the diagonal by definition. To compare dissimilarity magnitude across ROIs of different sizes, the LDC values were normalized by dividing by the number of voxels within each ROI.
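The two-fold LDC computation can be sketched in a few lines. To stay within the standard library, this sketch makes a simplifying assumption: the noise covariance is diagonal (per-voxel variances for each run, 1.0 if omitted), whereas the method described above uses a full multivariate noise covariance, typically shrinkage-regularized. Function names are hypothetical and are not RSA-toolbox functions.

```python
def ldc(a_run1, b_run1, a_run2, b_run2, noise_var1=None, noise_var2=None):
    """Cross-validated LDC between conditions a and b, per voxel.

    Simplification (assumption): diagonal noise covariance per run.
    """
    n = len(a_run1)
    v1 = noise_var1 or [1.0] * n
    v2 = noise_var2 or [1.0] * n
    d1 = [a - b for a, b in zip(a_run1, b_run1)]
    d2 = [a - b for a, b in zip(a_run2, b_run2)]
    # Discriminant fitted on run 2 (d2 / v2), applied to run-1 difference.
    fold_12 = sum(x * y / v for x, y, v in zip(d1, d2, v2))
    # Discriminant fitted on run 1 (d1 / v1), applied to run-2 difference.
    fold_21 = sum(x * y / v for x, y, v in zip(d1, d2, v1))
    return 0.5 * (fold_12 + fold_21) / n     # average folds, per voxel


def ldc_rdm(patterns_run1, patterns_run2, **noise):
    """Symmetric RDM with a zero diagonal from per-run condition patterns."""
    k = len(patterns_run1)
    rdm = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            rdm[i][j] = rdm[j][i] = ldc(patterns_run1[i], patterns_run1[j],
                                        patterns_run2[i], patterns_run2[j],
                                        **noise)
    return rdm


print(ldc([1.0, 0.0], [0.0, 0.0], [1.0, 0.0], [0.0, 0.0]))
print(ldc([1.0, 0.0], [0.0, 0.0], [0.0, 0.0], [1.0, 0.0]))
```

Because the two runs contribute independent noise, the product of run-wise differences has expectation zero when the conditions do not truly differ; individual estimates can therefore be negative, as in the second example.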

Coding of information within regions of interest
We first performed this RSA analysis using activation patterns from a priori MD and DMN network ROIs. Activation pattern dissimilarity of each stimulus pair, cross-validated across the two scanning runs, was quantified by LDC. The result was a 24 × 24 representational dissimilarity matrix (RDM), as shown in Figure 3A, with each cell showing cross-validated LDC dissimilarity between the corresponding two task events. These included event pairs that shared the same room and episode but different steps (red cells); events that shared the same room and step but different episodes (white cells); events that shared the same room, but different episodes and steps (purple cells); events that shared the same step, but different rooms and episodes (blue cells); and events that differed in rooms, episodes and steps (green cells). All event pairs additionally differed in item. The cells on the diagonal (yellow) are zero by definition as they do not reflect a distance between different task events.
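The colour coding of the 24 × 24 RDM follows mechanically from each event's room, episode and step. The sketch below labels every cell; the assignment of episodes 0-2 to the kitchen and 3-5 to the bathroom is an arbitrary illustrative convention, and the names are hypothetical.

```python
from collections import Counter

N_ROOMS, EPISODES_PER_ROOM, N_STEPS = 2, 3, 4


def task_events():
    """(room, episode, step) for every step of every episode: 24 events."""
    return [(e // EPISODES_PER_ROOM, e, s)
            for e in range(N_ROOMS * EPISODES_PER_ROOM)
            for s in range(N_STEPS)]


def cell_type(a, b):
    """Colour of the RDM cell comparing events a and b."""
    (room_a, ep_a, step_a), (room_b, ep_b, step_b) = a, b
    if a == b:
        return "yellow"                       # diagonal
    if room_a == room_b:
        if ep_a == ep_b:
            return "red"                      # same episode, different step
        return "white" if step_a == step_b else "purple"
    # Different room implies different episode.
    return "blue" if step_a == step_b else "green"


events = task_events()
labels = [[cell_type(a, b) for b in events] for a in events]
print(Counter(label for row in labels for label in row))
```

Counting cells shows how unbalanced the categories are (72 red, 48 white, 144 purple, 72 blue and 216 green off-diagonal cells), which is why the indices are built from mean distances per cell type rather than sums.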
Based on this matrix, separate contrasts were used to examine coding of room, episode, step and item. Brain regions that coded for room should show higher dissimilarity for patterns from different rooms than from the same room; for example, the dissimilarity between "get toothpaste" and "take food from fridge" should be greater than the dissimilarity between "prepare ingredients" and "take food from fridge" (Figure 3, green vs. purple cells, and blue vs. white cells). To obtain an index of room coding (Figure 3, bottom), we calculated mean LDC distances for green, purple, blue and white cells, and averaged the two differences green minus purple and blue minus white. Regions that code for episodes should show higher dissimilarity for different episodes compared to the same episode; for example, "take food from fridge" should show higher dissimilarity to "hand mix batter" than to "wash vegetables" (Figure 3, purple cells vs. red cells). Our index of episode coding was simply the mean distance for purple minus the mean distance for red cells. Regions that code for steps should show greater dissimilarity for different steps compared to the same step; for example, "hand mix batter" should be more dissimilar to "take food from fridge" than to "wash vegetables" (Figure 3, purple vs. white, and green vs. blue). To index step coding, the two differences purple minus white and green minus blue were averaged. A more complex formula was needed to derive item coding, assuming additive effects of item, episode and step. Within a room, red cells (Figure 3) differ in item and step, white cells differ in item and episode, and purple cells differ in item, step and episode. If there were no item coding, LDC dissimilarities for purple cells should be the sum of dissimilarities for red and white cells. With item coding, the dissimilarities for purple cells will be less than this sum. Accordingly, we computed item coding from the sum of the mean distance for red and the mean distance for white, minus the mean distance for purple cells (Figure 3, bottom).
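The four indices are simple linear combinations of the mean LDC per cell colour. The sketch below (hypothetical function names) computes them and checks them against a toy additive model in which differing item, step, episode and room contribute 1, 2, 3 and 4 units of distance respectively, so red = 1 + 2, white = 1 + 3, purple = 1 + 2 + 3, blue = 1 + 3 + 4 and green = 1 + 2 + 3 + 4.

```python
from statistics import fmean


def mean_by_type(rdm, labels):
    """Average off-diagonal LDC values for each cell colour."""
    groups = {}
    for rdm_row, label_row in zip(rdm, labels):
        for value, label in zip(rdm_row, label_row):
            if label != "yellow":
                groups.setdefault(label, []).append(value)
    return {label: fmean(values) for label, values in groups.items()}


def coding_indices(m):
    """Room, episode, step and item indices from mean LDC per cell colour."""
    return {
        "room": ((m["green"] - m["purple"]) + (m["blue"] - m["white"])) / 2,
        "episode": m["purple"] - m["red"],
        "step": ((m["purple"] - m["white"]) + (m["green"] - m["blue"])) / 2,
        "item": m["red"] + m["white"] - m["purple"],
    }


# Toy additive model: item = 1, step = 2, episode = 3, room = 4.
means = {"red": 3.0, "white": 4.0, "purple": 6.0, "blue": 8.0, "green": 10.0}
print(coding_indices(means))   # each index recovers its own component
```

Under additivity, red + white counts the item term twice and the step and episode terms once each, while purple counts every term once, so subtracting purple leaves exactly the item contribution.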

Searchlight analyses
Next, to obtain more specific localization of regions that contained information within the larger networks, and to test for additional regions outside the predefined networks, we implemented a whole brain searchlight procedure (Kriegeskorte et al. 2006) to perform pattern analyses in small spherical ROIs (radius = 10 mm) centered on every voxel of the brain in turn. The procedure was identical to that described in the ROI analysis. Pairwise dissimilarities for each cell type were derived from a 24 × 24 RDM in each sphere, and were assigned to the center voxel. Differences of dissimilarities corresponding to the strength of room, episode, step, and item coding for each sphere were calculated as before, generating whole-brain maps of information coding for each subject. These individual subject maps were smoothed with a 10 mm FWHM Gaussian filter before performing second-level random effects analyses to identify voxels that coded for these four types of information across subjects. Unless otherwise specified, all results are reported at the FDR-corrected threshold of p < 0.05.
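The searchlight neighbourhood is the set of voxels whose centres lie within 10 mm of the centre voxel. Voxel size is not stated in this section, so the 3 mm isotropic grid below is purely hypothetical; with it, a full sphere contains 171 voxels, and spheres at the brain edge shrink when restricted to the mask. Function names are illustrative.

```python
def sphere_offsets(radius_mm=10.0, voxel_mm=(3.0, 3.0, 3.0)):
    """Integer voxel offsets whose centres lie within radius_mm of origin.

    voxel_mm is a hypothetical voxel size (not given in the text).
    """
    vx, vy, vz = voxel_mm
    ni, nj, nk = (int(radius_mm // v) for v in voxel_mm)
    return [(i, j, k)
            for i in range(-ni, ni + 1)
            for j in range(-nj, nj + 1)
            for k in range(-nk, nk + 1)
            if (i * vx) ** 2 + (j * vy) ** 2 + (k * vz) ** 2 <= radius_mm ** 2]


def searchlight_voxels(center, offsets, mask):
    """Voxels of the sphere around `center` that fall inside `mask`,
    a set of (x, y, z) voxel indices."""
    cx, cy, cz = center
    return [(cx + i, cy + j, cz + k) for i, j, k in offsets
            if (cx + i, cy + j, cz + k) in mask]


offsets = sphere_offsets()
print(len(offsets))   # voxels in a full 10 mm sphere on a 3 mm grid
```

For each centre, the patterns of the returned voxels would feed the same 24 × 24 LDC computation as in the ROI analysis, with the resulting indices written back to the centre voxel.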

Figure 3. Illustration of representational similarity analysis. (A) Conceptual RDM. LDC dissimilarities were computed between every possible pair of events (6 tasks × 4 steps), generating a 24 × 24 RDM. Diagonal cells of the RDM (yellow cells) are zero by definition, as they do not reflect a dissimilarity between different events. Off-diagonal cells reflect pattern dissimilarity between trials that differ in target item and step (red cells); item and task (white cells); item, task and step (purple cells); item, room and task (blue cells); or item, room, task and step (green cells). (B) Hypothetical bar graph indicating the expected ordinal magnitude of LDC dissimilarities for each cell type, assuming that all four factors contribute equally to the estimated dissimilarities. Contrasts of selected cell types (bottom) were used to derive indices of room, episode, step and item coding. ("Wipe mouth" and "Rinse face" have been scrambled here to comply with bioRxiv regulations.)

Behavioral results
Group behavioral accuracy and reaction time results are shown in Figure 4. Performance was poorest for the first search array of each step, especially for steps 2-4, when participants were required to switch from one step to the next. Overall accuracy was 97.7% ± 0.1% (mean ± SEM) and overall reaction time was 869 ± 4 ms. A step (steps 1-4) × search array (first, second, third within each step) ANOVA for accuracy revealed a significant main effect of step (F(3,126) = 14.21, p < 0.001), a significant main effect of array (F(2,84) = 31.81, p < 0.001), and a significant step × array interaction (F(6,252) = 6.02, p < 0.001). A similar ANOVA for reaction time also showed a significant main effect of step (F(3,126) = 16.15, p < 0.001), a significant main effect of array (F(2,84) = 252.97, p < 0.001), and a significant step × array interaction (F(6,252) = 10.10, p < 0.001).

ROI analysis
The FIR model provided estimates of the observed BOLD response across a full task episode in successive 1.5 s windows starting from the onset of the first step. In the first analysis, we extracted these FIR responses from a priori ROIs (Figures 5Ai and Bi). The MD network exhibited a clear phasic response to the execution of each task step (Figure 5Aii), with 4 peaks throughout each task episode corresponding to the 4 steps, suggesting that it is sensitive to the fine structure of the contents within an episode. Additionally, overall MD activity gradually increased throughout the task episode, suggesting that the MD network is also sensitive to progress through the full task episode. In contrast, DMN regions showed a phasic response that was smaller and more specific to the first step, followed by a tonic activation that began below baseline but gradually increased throughout the episode (Figure 5Bii).
These results suggest DMN involvement in initiation and progress of the entire task episode, with less sensitivity to its fine structure.

To quantify the phasic and tonic components contributing to the BOLD response at each task step, we performed a complementary GLM analysis with onset and epoch regressors modelling each task step. Onset regressors were designed to reflect phasic activity at the onset of each task step, while epoch regressors were designed to reflect tonic activity throughout the step. Within the MD network (Figures 5Aiii, iv)

Figure 5. ROI analysis of the (A) MD and (B) DMN networks. In each panel, (i) shows the regions of interest; (ii) shows results of the FIR analysis, with BOLD response as a function of time as participants progressed from the first search display to the end of the task episode; (iii) shows beta estimates for onset regressors for each task step; and (iv) shows beta estimates for epoch regressors. Error bars indicate standard error of the mean.

Whole-brain analysis
Results from the whole-brain analysis, again separating onset and epoch regressors, are shown in Figure 6. Panels A and B show contrasts of average onset and epoch regressors against baseline, while lower panels show increasing (C, D) and decreasing (E, F) linear trends across task steps.
In comparison to baseline, onset effects (Figure 6A) were significant in large parts of the MD network, including regions of lateral frontal, insular, dorsomedial frontal, and lateral parietal cortex. Onset effects were also seen in large regions of visual cortex, medial parietal cortex, and subcortical structures including the cerebellum. Epoch effects (Figure 6B), in contrast, were more restricted, including parts of the DMN (medial prefrontal cortex, hippocampus, parahippocampal cortex, and temporal pole), as well as expected regions of visual and motor cortex. Onset regressors showed a linear increase across successive task steps in a more restricted subset of MD regions. Linear decreases were extensive, including many parts of the DMN (compare Figures 5Bi and 6E). However, the ROI analysis indicates that this decrease across the DMN is driven by an onset effect only for the first step of the task. Epoch regressors showed a linear decrease in visual cortex but, otherwise, an extensive pattern of linear increase across much of the brain.
These findings confirm that, in comparison to the DMN, MD regions were sensitive to the fine temporal structure of the task, with a phasic response to the onset of each new task step. This phasic response was shared with several other brain regions, most notably visual cortex and subcortical structures. The DMN, in contrast, showed onset activity only at the start of an entire task.
Finally, the data show a pattern of gradual increase in activity across the full 36 s of the task, widespread throughout the brain.

RSA results
Results of the RSA analysis are shown in Figure 7. Rows A-D show contrasts between sets of LDC values (Figure 3) reflecting coding of different types of information: room, episode, item and step. Left panels show contrast values for a priori MD and DMN networks, with asterisks indicating values significantly greater than zero. Right panels show whole-brain searchlight results.

Figure 7. LDC contrasts representing the strength of (A) room, (B) episode, (C) item, and (D) step coding in (i) MD and DMN ROIs and (ii), (iii) the whole-brain searchlight. Error bars represent standard error. *** indicates p < 0.001, ** indicates p < 0.01, and * indicates p < 0.05 for comparisons against zero. All whole-brain maps except room coding are thresholded at FDR < 0.05, whereas room coding is thresholded at p < 0.05 (uncorrected).

Room coding
Neither the MD nor the DMN network showed significant room coding (ts = -0.59 and -0.39, both ps > 0.55), and room coding did not differ between the two networks (t(42) = -0.13, p = 0.90). In the searchlight analysis, we did not find room coding in any region after FDR correction; however, at the lenient threshold of uncorrected p < 0.05, a region in the medial prefrontal cortex (mPFC) showed greater dissimilarity for events in different rooms compared to those in the same room.

Episode coding
The DMN showed significant coding for episode (t(42) = 2.63, p = 0.01), whereas the MD network did not (t(42) = 1.72, p = 0.09); however, the difference between networks was not significant (t(42) = -1.47, p = 0.15). The whole-brain searchlight analysis revealed episode coding localized rather specifically to a region near the anterior fusiform gyrus and parahippocampal cortex (PHC) bilaterally.
It is possible that responses to regressors modeling adjacent steps could be similar due to imperfect temporal separation of the signal, such that pairs of steps within the same task appear more similar than those from different tasks because of differences in temporal separation in addition to differences in task episode. We examined this possibility by taking the voxels showing significant episode coding in the RSA searchlight and recalculating the contrast using three subsets of cells, chosen to differ in separation of one, two, or three steps. That is, we extracted LDC values from cells of the RDM that represented steps one (step 1 vs. step 2, step 2 vs. step 3, and step 3 vs. step 4), two (step 1 vs. step 3 and step 2 vs. step 4), or three (step 1 vs. step 4) steps apart, and contrasted between-episode vs. within-episode cells within each subset separately. If temporal leakage were contributing to activity patterns, and hence to apparent episode coding, we should expect a stronger effect for steps closer together in time. However, we found no evidence of any difference in episode coding across these three conditions (F(2,84) = 0.11), nor a linear trend as a function of step separation (F(1,42) = 0.80).

Step coding
Both MD (t(42) = 9.91, p < 0.001) and DMN (t(42) = 11.56, p < 0.001) networks showed significant coding for step. The MD network showed greater step coding than the DMN (t(42) = 4.46, p < 0.001).
Step coding was widespread across most of the brain. This was not surprising: in our univariate analysis, we observed significant linear trends across the task episode for much of the brain (visual cortex showed decreasing activity, while most other regions showed increasing activity).
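Why a univariate ramp is expected to produce step coding can be shown with a toy example. The sketch assumes a hypothetical ROI whose multivoxel pattern is fixed and whose amplitude simply grows linearly with step, with no step-specific pattern information; it uses noise-free Euclidean distance rather than the cross-validated LDC used in the study, and all numbers are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_voxels, n_steps = 50, 4

# Hypothetical ROI: a fixed pattern whose overall amplitude ramps linearly with step.
baseline = rng.normal(0.0, 1.0, n_voxels)  # pattern shared by all steps
gain = rng.uniform(0.5, 1.5, n_voxels)     # how strongly each voxel follows the ramp
slope = 0.2                                # amplitude change per step (sign is irrelevant)

patterns = np.array([baseline + slope * s * gain for s in range(n_steps)])

# Euclidean distance between every pair of step patterns (noise-free, so no cross-validation).
diff = patterns[:, None, :] - patterns[None, :, :]
dist = np.linalg.norm(diff, axis=-1)

# Normalizing by the one-step distance shows distances proportional to step separation.
print(np.round(dist / dist[0, 1], 6))
```

Because the distance between steps s and t is slope × |s − t| × ‖gain‖, the normalized matrix equals |s − t|: steps become more dissimilar the further apart they are, which an RSA step model will detect even though no step-specific pattern exists. A decreasing ramp, as in visual cortex, gives the same result.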

Network comparison
Finally, we asked whether activity patterns in the MD and DMN networks carried different information about distinct aspects of task episodes by comparing LDC contrasts for room, episode, item, and step coding across these two network ROIs. A 2 (network) × 4 (information type) ANOVA showed a significant interaction (F(3,120) = 14.25, p < 0.001), as well as main effects of network (F(1,40) = 18.55, p < 0.001) and information type (F(3,120) = 70.51, p < 0.001). When the analysis was restricted to episode and item coding, the resulting 2 × 2 ANOVA still showed a significant interaction (F(1,42) = 12.31, p < 0.001), as well as main effects of network (F(1,42) = 13.23, p < 0.001) and information type (F(1,42) = 4.61, p = 0.04).
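For the 2 × 2 follow-up ANOVA, every effect has one numerator degree of freedom, so with n participants each F(1, n − 1) equals the square of a one-sample t statistic on a per-participant contrast score. The sketch below illustrates this standard equivalence; the array layout (participant × network × information type), the factor order, and the function name `rm_anova_2x2` are hypothetical, and it is not the authors' analysis code.

```python
import numpy as np


def rm_anova_2x2(data):
    """F(1, n-1) for each effect of a 2x2 within-subjects design.

    data: array of shape (n_subjects, 2, 2), indexed [subject, network, info_type].
    Each effect is tested via a per-subject contrast score c; the repeated-measures
    F equals the squared one-sample t of c against zero: F = n * mean(c)**2 / var(c).
    """
    data = np.asarray(data, dtype=float)
    if data.ndim != 3 or data.shape[1:] != (2, 2):
        raise ValueError("expected shape (n_subjects, 2, 2)")
    n = data.shape[0]
    contrasts = {
        # network 1 vs network 2, averaged over information type
        "network": data[:, 0, 0] + data[:, 0, 1] - data[:, 1, 0] - data[:, 1, 1],
        # information type 1 vs 2, averaged over network
        "info_type": data[:, 0, 0] - data[:, 0, 1] + data[:, 1, 0] - data[:, 1, 1],
        # difference of differences
        "interaction": data[:, 0, 0] - data[:, 0, 1] - data[:, 1, 0] + data[:, 1, 1],
    }
    return {name: n * c.mean() ** 2 / c.var(ddof=1) for name, c in contrasts.items()}


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    fake = rng.normal(size=(43, 2, 2))  # 43 made-up participants, no real effects
    print(rm_anova_2x2(fake))
```

Subject-specific offsets cancel in every contrast score, which is why the repeated-measures F is unaffected by overall differences between participants. The same numbers can be checked by passing each contrast score to `scipy.stats.ttest_1samp` and squaring the resulting t.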

Discussion
The present study used fMRI to examine how different cortical networks represent task episodes, focusing on the MD and DMN networks. Using FIR analysis to capture the evolution of the BOLD response throughout a multistep episode, we found that MD regions exhibited clear phasic responses to each task step, suggesting that they are sensitive to the fine temporal structure within a task. In contrast, DMN regions did not show significant phasic responses to every step, but instead exhibited a peak at the onset of the entire task episode. Both networks showed a gradual increase in overall activity throughout the task, suggesting that both are sensitive to task progression. Representational similarity analysis showed differential coding of task features in the MD and DMN networks. MD regions showed strong coding of individual items but not of the entire task episode, whereas in the DMN, item coding was weaker than in MD and episode coding was significant. The RSA searchlight analysis confirmed this differential representation of task features. The content of individual task steps was represented in visual cortex and the MD network. Task identity was represented in the PHC, with a hint of room identity in the mPFC.
Step was widely represented across most of the brain, in line with strong changes in univariate activity as the task progressed.
The finding that MD regions are especially sensitive to the identity of a current task step and its specific item content is consistent with prior research. Many previous experiments have shown coding of task-relevant information in MD regions that can rapidly change according to task demands (e.g., Li et al., 2007;Woolgar et al., 2011;Freedman et al., 2001), including radical reorganization between successive task steps (Sigala et al., 2008). fMRI studies show strong MD activity when a subgoal is completed and in transitions from one event to another (Sridharan et al. 2007;Farooqui et al. 2012), with progressively increasing activity as a goal is approached (Farooqui et al. 2012;Desrochers et al. 2018). The pattern of MD activation in our study is consistent with these previous findings. The results suggest that, as a task episode progresses, MD representations are in constant flux, reorganizing to encode the detailed contents of each task step.
While MD regions did not show significant discrimination for episodes, the DMN exhibited both item and episode coding. Univariate results showed a significant peak of DMN activity at the beginning of each episode, but no significant onset responses to subsequent steps.
These findings are consistent with prior reports of transient DMN activation at event boundaries (Ben-Yakov et al. 2013;Baldassano et al. 2018). It has been proposed that the mental programs required for carrying out a task are assembled at the beginning of task execution (Schneider and Logan 2006;Farooqui and Manly 2018b). It is possible that the DMN is involved in retrieving the entire task sequence from long-term memory before the episode begins. Our results match the observation that the DMN has long temporal receptive windows and can code for information accumulated over longer time scales (Hasson et al. 2008;Lerner et al. 2011;Manning et al., 2015).
Searchlight RSA revealed that episode representation was not spread equally throughout the DMN, but was focused near bilateral PHC. While the DMN is involved in retrieval, it has been proposed to serve a broader function: the mental construction of prospective episodes from stored episodic, conceptual, and contextual representations (Buckner and Carroll 2007;Gilbert and Wilson 2007;Andrews-Hanna 2012). The PHC is a key component of the DMN, and has been shown to be involved in context representation (Diana et al. 2007;Ranganath 2010a;Ranganath and Ritchey 2012;Aminoff et al. 2013;Reagh and Ranganath 2018). It has been proposed that DMN regions such as the PHC encode a situation model, representing spatial, temporal, and broader causal relationships between different elements of an event (Ranganath and Ritchey 2012;Reagh and Ranganath 2018). Our results match prior indications that the PHC codes for broad features of a current context (Ranganath 2010a;Kim and Maguire 2018).
Within the DMN, it has been proposed that the mPFC is involved in the representation of schemas which are more general than particular task episodes (Preston and Eichenbaum 2013;Spalding et al. 2015;Gilboa and Marlatte 2017;Robin and Moscovitch 2017).
Evidence has suggested that the mPFC accumulates information about the context of interrelated episodes (Preston and Eichenbaum 2013;Robin and Moscovitch 2017). In our data, we found a weak suggestion of room coding in the mPFC. Although we trained participants to memorize task episodes one room at a time, and the episodes have clear semantic associations with these locations, our experiment did not require grouping of episodes from the same room, perhaps contributing to weak room representation. Future research could provide more insight into the involvement of mPFC in encoding more generic cognitive contexts.
We also found that the DMN showed significant coding for items. The searchlight analysis revealed that areas coding for items included several regions of the DMN (including TPJ, pIPL, Rsp, PHC, HF+, amPFC, and PCC), in addition to more prominent representation in visual and MD regions. These results show some DMN representation not just of full task episodes, but also of specific contents within the episode. It has been suggested that the hippocampus, a key region in the DMN, is involved in binding items to contextual episodes (O'Reilly and Rudy 2001;Diana et al. 2007;Manning et al., 2015;Hsieh et al., 2014). To play this binding role, the hippocampus is thought to receive both item representations (e.g. from perirhinal cortex) and episode representations (e.g. from PHC and prefrontal cortex subregions) (Polyn and Kahana 2008;Ranganath 2010a, 2010b;Manning et al., 2015). Although we found both item and episode representations coexisting in PHC, consistent with a compositional code, this experiment cannot determine whether and where items and episodes might be bound into a conjunctive representation: because items were unique to each task, item-episode conjunctions are indistinguishable from item coding.
Disentangling these different forms of co-representation requires designs in which the same item appears in different contexts. Besides item-context conjunctions in the hippocampus (Hsieh et al., 2014), such designs have associated various frontal and temporal regions with item-order associations (e.g., Reverberi et al. 2012;Kalm and Norris 2014) and with rule-rule compositionality (e.g., Cole et al. 2011).
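The confound between item coding and item-episode conjunctions can be made concrete at the level of condition labels: if two labelings group the conditions identically, any model RDM built from them is identical, so no pattern of data can favour one over the other. The sketch uses hypothetical item labels; "unique_design" mimics the present study (each item belongs to one episode), while "shared_design" is an invented alternative in which the same four items recur in every episode.

```python
from itertools import combinations

N_EPISODES, N_STEPS = 6, 4


def same_grouping(labels_a, labels_b):
    """True if two labelings put exactly the same pairs of conditions in the same group."""
    pairs = combinations(range(len(labels_a)), 2)
    return all((labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j]) for i, j in pairs)


# Like the present study: every (hypothetical) item belongs to exactly one episode.
unique_design = [(e, f"item_{e}_{s}") for e in range(N_EPISODES) for s in range(N_STEPS)]
# Hypothetical alternative: the same four items recur in every episode.
shared_design = [(e, f"item_{s}") for e in range(N_EPISODES) for s in range(N_STEPS)]

if __name__ == "__main__":
    for name, design in [("unique items", unique_design), ("shared items", shared_design)]:
        items = [item for _, item in design]
        conjunctions = list(design)  # (episode, item) pairs
        print(name, same_grouping(items, conjunctions))
```

With unique items, the item labeling and the conjunction labeling coincide, so item and conjunction models cannot be separated; once items recur across episodes, the two labelings diverge and become testable against each other.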
Both MD and DMN, along with most regions of the brain, tracked progress through the task episode, as shown by increasing linear trends in the univariate data and by step coding in the RSA. These observations are consistent with previous studies that tracked activity and step representation throughout a task episode in MD (Farooqui et al. 2012;Desrochers et al. 2018) and DMN (Hsieh and Ranganath 2015) ROIs, but suggest that such tracking may be a much more global property of brain function. While visual cortex showed a decrease in sustained activity over time, which may reflect adaptation to the sensory input (Grill-Spector and Malach 2001;Grill-Spector et al. 2006), most other cortical regions showed an increase in sustained activity over the episode. Because this effect was so widespread, it is difficult to interpret precisely, and different areas may increase for different reasons (Kalm and Norris 2017). For example, increased activation in some regions may reflect revision and reconfiguration of control representations, which may become more demanding as larger portions of the task are completed (Farooqui et al. 2012;Desrochers et al. 2015, 2016). These activity changes could also reflect gradual assembly of an episode representation (Dumontheil et al. 2011) or accumulation of new information (Hasson et al. 2008;Lerner et al. 2011).
A hierarchical control structure is an organized representation of control elements (Rosenbaum et al. 1983;Schneider and Logan 2006) with task identity, local entities, and serial position codes. Our results describe how broad brain networks are involved in executing task sequences, with differential representation of individual task components and entire task episodes. The DMN, we suggest, may establish the overall cognitive context, representing both individual cognitive operations and their broader context, and may be involved in binding them together. Within the DMN, there may be differentiation between a posterior "situation model" and a broader, more schematic representation in the mPFC. At the same time, the MD system, along with sensory regions, encodes the detailed content of individual cognitive operations. Acting together, these two brain networks manage the hierarchical structure of goal-directed behavior.