Abstract
Progress in understanding the neural bases of cognitive control has been supported by the paradigmatic color-word Stroop task, in which a target response (color name) must be selected over a more automatic, yet potentially incongruent, distractor response (word). For this paradigm, models have postulated complementary coding schemes: dorsomedial frontal cortex (DMFC) is proposed to evaluate the demand for control via incongruency-related coding, whereas dorsolateral PFC (DLPFC) is proposed to implement control via goal and target-related coding. Yet, mapping these theorized schemes to measured neural activity within this task has been challenging. Here, we tested for these coding schemes relatively directly, by decomposing an event-related color-word Stroop task via representational similarity analysis. Three neural coding models were fit to the similarity structure of multivoxel patterns of human fMRI activity, acquired from 65 healthy, young-adult males and females. Incongruency coding was predominant in DMFC, whereas both target and incongruency coding were present with indistinguishable strength in DLPFC. In contrast, distractor information was strongly encoded within early visual cortex. Further, these coding schemes were differentially related to behavior: individuals with stronger DLPFC (and lateral posterior parietal cortex) target coding, but weaker DMFC incongruency coding, exhibited less behavioral Stroop interference. These results highlight the utility of the representational similarity analysis framework for investigating neural mechanisms of cognitive control and point to several promising directions to extend the Stroop paradigm.
Significance Statement
How the human brain enables cognitive control (the ability to override behavioral habits to pursue internal goals) has been a major focus of neuroscience research. This ability has been frequently investigated by using the Stroop color-word naming task. With the Stroop as a test-bed, many theories have proposed specific neuroanatomical dissociations, in which medial and lateral frontal brain regions underlie cognitive control by encoding distinct types of information. Yet providing a direct confirmation of these claims has been challenging. Here, we demonstrate that representational similarity analysis, which estimates and models the similarity structure of brain activity patterns, can successfully establish the hypothesized functional dissociations within the Stroop task. Representational similarity analysis may provide a useful approach for investigating cognitive control mechanisms.
Introduction
Goals, held in mind, can be used to overcome behavioral habits. Understanding how the human brain enables such cognitive control has been a fundamental interest of both basic and translational cognitive neuroscience. Toward this end, the use of response conflict tasks has been instrumental (e.g., Botvinick et al., 2001; Ridderinkhof et al., 2004). These tasks involve trials in which a less-automatic, but goal-relevant course of action, the target response, must be selected in the face of a habitual, but goal-irrelevant alternative, the distractor. The paradigmatic example is the color-word Stroop task (Stroop, 1935; Posner and Snyder, 1975; MacLeod, 1991): on each trial, the hue of a word must be named, despite the word expressing a potentially conflicting, that is, incongruent, color (see Fig. 1C). A major goal in this field has been to use measures of neural activity evoked by response conflict tasks, such as Stroop, to test models of cognitive control.
One broad, neurocomputational-level model ascribes particular roles to different frontoparietal regions in overcoming response conflict (Miller and Cohen, 2001; Shenhav et al., 2013). Central to this view is the type of task information these regions encode. The dorsomedial frontal cortex (DMFC) is proposed to “evaluate” demand for cognitive control, via encoding of incongruency-related information (see Fig. 1A, bottom row). Such information, according to this view, is used by dorsolateral PFC (DLPFC), in concert with lateral posterior parietal cortex (LPPC), to “implement” control, via encoding of goal and target-related information (see Fig. 1A, top row). Thus, this model predicts key functional dissociations between medial and lateral frontoparietal cortex (see Fig. 1B). But although this view has been influential, directly establishing these dissociations during the performance of standard color-word Stroop tasks has been difficult.
To date, the most traction on this problem has been gained via fMRI designs that temporally dissociate presentation of task-rule and incongruency-related information, in which subjects were instructed before each Stroop trial about which task to perform (color-naming, word-reading; MacDonald et al., 2000; Floden et al., 2011). But, while these studies generally found supportive evidence for the key claims, results were subject to three notable limitations. First, these studies were likely underpowered for fMRI (e.g., N = 12 in Floden et al., 2011; N = 9 in MacDonald et al., 2000). This fact alone warrants a follow-up study. Second, it is unclear whether the results extend to the more-standard Stroop-task design, in which task rules are not explicitly instructed before each trial, but are instead internally maintained. For example, goal-relevant coding in DLPFC may depend on such explicit rule instruction. Third, the prior results do not speak to functional dissociations within a single Stroop trial, during which interference is actually experienced and resolved. It is therefore possible, for instance, that the role of DLPFC (or other frontoparietal regions) in Stroop is primarily preparatory, and is less critical during actual interference resolution.
To address these questions, a neuroanatomically precise technique is needed that does not rely on temporal dissociations, but can instead read out multiple, simultaneously encoded sources of task information from individual brain regions of interest (ROIs). Multivariate (multivoxel) pattern analysis (MVPA) of fMRI, in popular use for over a decade (Edelman et al., 1998; Haxby et al., 2001; Cox and Savoy, 2003), accomplishes exactly this purpose. Surprisingly, however, these methods have not been brought to bear on the question of a functional dissociation between medial and lateral frontoparietal cortex in resolving Stroop conflict.
We fill this gap in the literature by using representational similarity analysis (RSA; Edelman et al., 1998; Kriegeskorte et al., 2008), a specific MVPA framework, to test for dissociations in frontoparietal coding during Stroop-task performance (see Fig. 1C–E). We conducted a retrospective analysis of data collected as part of the Dual Mechanisms of Cognitive Control project (Braver et al., 2020). Our primary goal was proof-of-principle: to demonstrate the potential of RSA for testing theorized distinctions in neural coding within cognitive control tasks, such as the Stroop (Freund et al., 2021).
Materials and Methods
We report how we determined our sample size, all data exclusions, all manipulations, and all measures in the study (Simmons et al., 2011).
Code, data, and task accessibility
Code (R Core Team, 2019) and data to reproduce all analyses, in addition to supplementary analysis reports, are publicly available (https://doi.org/10.5281/zenodo.4784067). As part of the planned data release of the Dual Mechanisms of Cognitive Control project, raw and minimally preprocessed fMRI data have been deposited on OpenNeuro (https://doi.org/10.18112/openneuro.ds003465.v1.0.3). Additionally, the authors will directly share the specific fMRI data used for this study on request. Task scripts are available at the project website (http://pages.wustl.edu/dualmechanisms/tasks). More detailed information regarding all aspects of the project can be found on the Project's OSF page (https://osf.io/xfe32/).
Participants
Individuals were recruited from the Washington University and surrounding St. Louis metropolitan communities for participation in the Dual Mechanisms of Cognitive Control project. The present study began with a subset (N = 66; 38 women, 26 men, 1 “prefer not to answer”) of these subjects: those with a full set of imaging and behavioral data from the Stroop task during a particular scanning session (the “proactive” session), selected for methodological reasons (see Selection of data). One subject was excluded from all analyses because of a scanner error. We split the remaining sample into two sets of individuals: a primary analysis set (N = 49; 27 women, 21 men, 1 “prefer not to answer”), which we used in all analyses, and a validation set (N = 16; 11 women, 5 men), which was only used in the Model selection analysis (see below). This unbalanced partitioning was done to account for the familial structure present within our sample. Specifically, subjects within each set (primary, validation) were all unrelated; however, subjects within the validation set were co-twins of 16 subjects within the primary analysis set. Two of these co-twins were selected for use in the primary analysis set as their respective co-twins had atypically high rates of response omission (>10%; >20% errors of any type); the remaining co-twins were randomly selected. Critically, partitioning the sample in this way ensured that the primary analysis set was a random sample of independent subjects.
The partitioning of the data into two subsets also afforded the opportunity to use the validation subset as held-out data for evaluation of the brain–behavior model within the Model selection analysis. As we performed this sorting of individuals into primary and validation sets only once and did not analyze the validation-set data (except to assess predictive accuracy of the final selected model), the validation set provides an unbiased assessment of predictive accuracy, in the sense that no statistical “double-dipping” could have occurred. But because the sets are familially dependent, it is perhaps more accurate to consider the validation-set analyses as assessing a kind of test–retest reliability (i.e., while eliminating the potential confound of practice effects), rather than providing an estimate of out-of-sample predictive accuracy. To evaluate this matter, follow-up control analyses were conducted in which the co-twins were removed from the primary analysis set.
Experimental design and statistical analysis
Task
Participants performed the verbal color-word Stroop (1935) task. Names of colors were visually displayed in various hues, and participants were instructed to “say the name of the color, as fast and accurately as possible; do not read the word.”
The set of stimuli consisted of two subsets of color-word stimuli (randomly intermixed during the task): a mostly incongruent and an unbiased set. Each stimulus set was created by pairing four color words with four corresponding hues in a balanced factorial design, forming 16 unique color-word stimuli within each set. The mostly incongruent set consisted of stimuli with hues (and corresponding words) 'blue' (RGB = 0, 0, 255), 'red' (255, 0, 0), 'purple' (128, 0, 128), and 'white' (255, 255, 255); the unbiased set, of 'black' (0, 0, 0), 'green' (0, 128, 0), 'pink' (255, 105, 180), and 'yellow' (255, 255, 0). These words were centrally presented in uppercase, bold Courier New font on a gray background (RGB = 191, 191, 191). Of stimuli within the mostly incongruent set, incongruent stimuli were presented to subjects more often than congruent stimuli (per block, proportion congruent = 0.25). Unbiased stimuli were presented with a balanced frequency (proportion congruent = 0.5). These manipulations of incongruency statistics are standard manipulations to elicit proactive control (Bugg, 2014; e.g., Gonthier et al., 2016) and were performed to investigate questions outside the scope of the current study. Thus, as described further below, the unbiased stimulus set was excluded from all analyses.
Each trial (see, e.g., Fig. 1C) began with a central fixation cross, presented for 300 ms on a gray background (RGB = 191, 191, 191). The color-word stimulus, preceded by a blank screen following fixation offset (100 ms), was centrally presented for a duration of 2000 ms, fixed across trials. The duration of the intertrial interval (triangle of fixation crosses) was 900, 2100, or 3300 ms, selected randomly (with uniform probability). Each of two scanning runs consisted of three blocks of 36 trials, intermixed with four resting fixation blocks, during which a fixation cross appeared for 30 s. This formed a mixed block-event design (Petersen and Dubis, 2012). Each of the 16 mostly incongruent stimuli — that is, each unique colored word (e.g., “BLUE” displayed in red hue) — was presented in both runs. Within each run for each participant, mostly incongruent stimuli were presented an equal number of times within each block. Within each block, stimulus order was fully randomized.
Selection of data
We focused our fMRI pattern analyses solely on trials from the mostly incongruent stimulus set within a particular scanning session (the “proactive” session) of our Stroop task. This selection was made purely on the basis of methodological reasoning: these trials were the only set of trials within the larger Dual Mechanisms project in which each unique Stroop stimulus (i.e., one of the 16 color-word combinations) was presented an equal number of times (9) to each participant, constituting a balanced design. Balanced designs ensure that differences in the total number of trials per condition cannot explain any differences observed in pattern correlations among conditions.
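The trial counts implied by this balanced design can be checked with a short sketch. This is an illustration in Python rather than the project's released R code; the variable names are ours, and only the four hues, the factorial pairing, and the nine presentations per stimulus are taken from the text.

```python
from itertools import product

# Hues (and corresponding words) of the mostly incongruent set, as listed in Task.
WORDS = ["blue", "red", "purple", "white"]
PRESENTATIONS_PER_STIMULUS = 9  # per participant, as stated above

# Balanced factorial design: every hue paired with every word.
stimuli = list(product(WORDS, WORDS))  # (hue, word) tuples
trials = [s for s in stimuli for _ in range(PRESENTATIONS_PER_STIMULUS)]

n_congruent = sum(hue == word for hue, word in trials)
proportion_congruent = n_congruent / len(trials)

print(len(stimuli), len(trials), proportion_congruent)  # 16 144 0.25
```

Because every stimulus contributes the same number of trials, the 4 congruent stimuli account for 36 of the 144 trials, which reproduces the stated proportion congruent of 0.25.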
Display and recording systems
The experiment was programmed in E-Prime 2.0 (2013, Psychology Software Tools), presented on a Windows 7 Desktop, and back-projected to a screen at the end of the bore for viewing via a mirror head-mount. Verbal responses were recorded for offline transcription and response time (RT) estimation. The first 45 participants spoke into a MicroOptics MR-compatible electronic microphone (MicroOptics Technologies); because of mechanical failure, however, we replaced this microphone with the noise-cancelling FOMRI III (OptoAcoustics), which subsequent participants used. A voice-onset processing script (from the MATLAB Audio Analysis Library) was used to derive RT estimates on each trial via spectral decomposition (the accuracy of which was verified by manually coding RTs from a subsample of subjects and ensuring the two methods gave similar estimates). Code for this algorithm is available within the Dual Mechanisms GitHub repository (https://github.com/ccplabwustl/dualmechanisms/tree/master/preparationsAndConversions/audio).
Importantly, we verified that the change in microphone did not induce confounding between-subject variance in RT measures of interest. While RT estimates recorded via the MicroOptics microphone tended to be slower (b = 102.59, p = 0.01) and more variable (
Image acquisition, preprocessing, and GLM
The fMRI data were acquired with a 3T Siemens Prisma (32 channel head-coil; CMRR multiband sequence, factor = 4; 2.4 mm isotropic voxel, with 1200 ms TR, no GRAPPA, ipat = 0), and subjected to the minimally preprocessed functional pipeline of the Human Connectome Project (version 3.17.0), outlined by Glasser et al. (2013). More detailed information regarding acquisition can be found on the Project OSF site (https://osf.io/tbhfg/). All analyses were conducted in volumetric space; surface maps are displayed in figures only for ease of visualization. Before revision of this manuscript, the data were reprocessed with fMRIPrep, using the standard fMRIPrep pipelines (Esteban et al., 2019, 2020). At this point, the preprocessed results with the HCP pipelines were inadvertently removed. Thus, some follow-up control analyses were conducted with the fMRIPrep-preprocessed data (Extended Data Figs. 2-4, 3-2, 4-4). The fMRIPrep pipeline was implemented in a Singularity container (Kurtzer et al., 2017) with additional custom scripts used to implement file management (more detail on the pipeline is available at https://osf.io/6p3en/; container scripts are available at https://hub.docker.com/u/ccplabwustl).
After preprocessing, to estimate activation patterns, we fit a whole-brain voxelwise GLM to BOLD time-series in AFNI, version 17.0.00 (Cox, 1996). To build regressors of primary interest, we convolved with an HRF [via AFNI's BLOCK(1,1)] 16 boxcar time courses, each coding for the initial second of presentation of a mostly incongruent stimulus that resulted in a correct response. We also included two regressors [similarly created via BLOCK(1,1)] to capture signal associated with congruent and incongruent trials of noninterest (unbiased stimuli) that prompted correct responses, an error regressor coding for any trial in which a response was incorrect or omitted (via BLOCK), a sustained regressor coding for task versus rest (via BLOCK), a transient regressor coding for task-block onsets [as a set of piecewise linear spline functions via TENTzero(0,16.8,8)], six orthogonal motion regressors, five polynomial drift regressors (order set automatically) for each run, and an intercept for each run. These models were created via 3dDeconvolve and solved via 3dREMLfit. The data for each subject's model consisted of 2 runs × 3 blocks × 36 trials (144 from the mostly-incongruent stimulus group, 72 from unbiased). Frames with FD > 0.9 were censored.
Definition of ROIs
Our primary hypotheses concerned a set of six anatomic regions: DMFC, DLPFC, and LPPC in each hemisphere. Consequently, our primary analyses used a targeted ROI-based analysis approach. Rather than defining functional ROIs via a whole-brain searchlight, which has known issues (Etzel et al., 2013), we defined ROIs via a cortical parcellation atlas. We selected the MMP atlas (Glasser et al., 2016) for two reasons: (1) the atlas was developed recently via multimodal imaging measures; and (2) individual MMP parcels are relatively interpretable, as they are heterogeneously sized and have been explicitly connected to a battery of cognitive tasks (Assem et al., 2020), the canonical functional connectivity networks (Ji et al., 2019), and a large body of neuroanatomical research (Glasser et al., 2016). We used a volumetric version, obtained from https://figshare.com/articles/HCP-MMP1_0_projected_on_MNI2009a_GM_volumetric_in_NIfTI_format/3501911?file=5534024 (also available on the project GitHub repository; see Code, Data, and Task accessibility). We then defined a set of six spatially contiguous sets of MMP parcels (three in each hemisphere), which we refer to as “superparcels,” that corresponded to each of our ROIs. For full superparcel definitions, see Extended Data Fig. 1-1. DMFC was defined as the four parcels covering SMA–pre-SMA and dACC. DLPFC was defined as the four parcels that cover middle frontal gyrus (i.e., mid-DLPFC). LPPC was defined as all parcels tiling IPS, from the occipital lobe to primary somatosensory cortex. The overwhelming majority of parcels that met these anatomic criteria were assigned, within a previous report, to the cinguloopercular (most of DMFC), frontoparietal (most of DLPFC), and dorsal-attention (most of LPPC) control networks (Ji et al., 2019). Further, these ROI definitions contain several parcels that correspond to key nodes within the “multiple demand” network (Assem et al., 2020). 
To assess the robustness of our results to particular superparcel definitions, we additionally used alternative, more inclusive, superparcel definitions of DMFC and DLPFC (see Extended Data Fig. 1-1). For the brain–behavior model selection analysis (see Model selection), we compiled a larger set of anatomically clustered MMP parcels, covering regions across the cortex (Extended Data Fig. 3-3). Two additional, non-MMP ROIs were included in this set, to give better coverage of particular functional brain regions. A mask for ventral somatomotor cortex (the “SomatoMotor–Mouth” network) was obtained from the Gordon atlas (Gordon et al., 2016), as the MMP does not split somatomotor cortex into dorsal and ventral divisions. A mask for left ventral occipito-temporal cortex (encompassing the “visual word-form” area) was obtained using MNI coordinates −54<x<−30, −70<y<−45, −30<z<−4, specified in a prior report (Twomey et al., 2011). To remove cerebellar voxels from this ROI, we used the Diedrichsen atlas (Diedrichsen, 2006) hosted by AFNI (https://afni.nimh.nih.gov/pub/dist/atlases/SUIT_Cerebellum/SUIT_2.6_1/).
Estimation of coding strength β
To estimate the regional strength of target, distractor, and incongruency coding, we used the RSA framework (Kriegeskorte et al., 2008). The RSA framework consists of modeling the observed similarity structure of activation patterns with a set of theoretically specified model similarity structures (see Fig. 1E). For a given subject and cortical region, fMRI GLM coefficient estimates for each of the 16 conditions of interest (four colors factorially paired with four words; e.g., the word “WHITE” presented in blue hue) were assembled into a condition-by-voxel activity pattern matrix B. The observed similarity structure was estimated as the condition-by-condition correlation matrix
These four models were jointly fitted to the observed similarity structure from each region through multiple regression (ordinary least-squares), separately for each subject. The response vector y and design matrix of this regression were assembled in a series of steps. (1) The 120 unique off-diagonal elements of each similarity matrix (one observed and four models) were extracted and unwrapped into vectors. (2) The four model similarity vectors were separately z-scored and assembled into columns. This formed the RSA design matrix. (3) The observed similarity vector was rank-transformed (Nili et al., 2014) and then z-score standardized, to form a vector r. (4) The vector r was prewhitened to remove a specific nuisance component. This component stemmed from the task design: though each mostly incongruent stimulus occurred an equal number of times throughout the course of a session, these stimuli were not fully balanced across the two scanning runs. Specifically, half of the stimuli were presented 3 times in the first run versus 6 times in the second (vice versa for the other half). As each scanning run contains a large amount of run-specific noise (Mumford et al., 2014; Alink et al., 2015), this imbalance across runs could lead to a bias in the resulting β coefficients, in which pattern similarity of stimuli that mostly occurred within the same run would be inflated. We formalized this component of bias as another model similarity vector, v, with elements equal to 1 if the run in which condition i most frequently occurred matched the run in which condition j most frequently occurred, and 0 otherwise. The magnitude of this bias was estimated as the slope term b1 in a linear regression
Thus, the RSA regression yielded three β coefficients of interest: βtarget, βdistr., and βincon. These coefficients can be understood as a (standardized) contrast on (rank-transformed) correlations of activity patterns, between conditions in which only one task dimension was shared (e.g., the target dimension for βtarget), versus those in which no dimensions were shared (i.e., different levels of target, distractor, and congruency).
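The assembly of the RSA regression (steps 1 to 3) can be sketched as follows. This is a minimal standard-library Python illustration, not the released R code, and it rests on several assumptions: the binary coding of each model matrix (in particular, incongruency coded as incongruent–incongruent pairs with a separate congruent–congruent nuisance column) is our reading of the description; the prewhitening of step 4 is omitted; the "observed" similarity vector is simulated; and all function names are hypothetical.

```python
import random
from itertools import combinations
from statistics import mean, pstdev

COLORS = ["blue", "red", "purple", "white"]
conditions = [(hue, word) for hue in COLORS for word in COLORS]  # 16 conditions
pairs = list(combinations(range(len(conditions)), 2))            # 120 unique cells

def is_congruent(cond):
    return cond[0] == cond[1]

def model_vector(rule):
    """Unwrap a binary model similarity matrix into a vector over condition pairs."""
    return [1.0 if rule(conditions[i], conditions[j]) else 0.0 for i, j in pairs]

models = {
    "target": model_vector(lambda a, b: a[0] == b[0]),      # same hue
    "distractor": model_vector(lambda a, b: a[1] == b[1]),  # same word
    # Assumed coding: II pairs, with CC pairs absorbed by a nuisance column.
    "incongruency": model_vector(lambda a, b: not is_congruent(a) and not is_congruent(b)),
    "congruent_nuisance": model_vector(lambda a, b: is_congruent(a) and is_congruent(b)),
}

def zscore(v):
    m, s = mean(v), pstdev(v)
    return [(x - m) / s for x in v]

def rank_transform(v):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(v)), key=lambda k: v[k])
    ranks = [0.0] * len(v)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and v[order[end + 1]] == v[order[start]]:
            end += 1
        for k in order[start:end + 1]:
            ranks[k] = (start + end) / 2 + 1
        start = end + 1
    return ranks

def ols_slopes(columns, y):
    """Ordinary least-squares slopes (intercept fitted, then dropped)."""
    X = [[1.0] + [col[k] for col in columns] for k in range(len(y))]
    p = len(X[0])
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination.
    A = [[sum(row[a] * row[b] for row in X) for b in range(p)]
         + [sum(row[a] * yk for row, yk in zip(X, y))] for a in range(p)]
    for c in range(p):
        pivot_row = max(range(c, p), key=lambda q: abs(A[q][c]))
        A[c], A[pivot_row] = A[pivot_row], A[c]
        pivot = A[c][c]
        A[c] = [x / pivot for x in A[c]]
        for q in range(p):
            if q != c:
                factor = A[q][c]
                A[q] = [x - factor * z for x, z in zip(A[q], A[c])]
    return [A[k][p] for k in range(1, p)]

# Simulated similarity vector with target and incongruency structure (not real data).
rng = random.Random(0)
toy_similarity = [0.3 * t + 0.1 * i + rng.gauss(0, 0.05)
                  for t, i in zip(models["target"], models["incongruency"])]

design = [zscore(v) for v in models.values()]      # step 2
r_vec = zscore(rank_transform(toy_similarity))     # step 3
betas = dict(zip(models, ols_slopes(design, r_vec)))
print({name: round(b, 2) for name, b in betas.items()})
```

Under this coding, 24 of the 120 pairs share a target, 24 share a distractor, 66 are incongruent–incongruent, and 6 are congruent–congruent, so each β of interest contrasts pairs sharing one dimension against pairs sharing none.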
Dimensionality reduction
We used non-metric multidimensional scaling (Kruskal, 1964), a flexible, nonparametric dimensionality reduction technique, to visualize the structures of activity patterns within selected regions (see Fig. 2): ventral somatomotor cortex (corresponding to “mouth” homunculi), primary visual cortex (V1), and (left) DMFC. These parcels were selected to highlight coding of each task dimension. For each selected region, we averaged observed correlation matrices across subjects and then subtracted these values from 1 to obtain a dissimilarity matrix. Before averaging, we z-transformed (inverse hyperbolic tangent, artanh) correlations, and inverted this transform after averaging. (In contrast to the RSA regression above, we did not rank-transform correlation matrices, as non-metric multidimensional scaling incorporates a monotonic regression). Similar to our RSA, we prewhitened each similarity matrix before conducting this procedure (see Step 4 in Estimation of coding strength β). Each mean dissimilarity matrix was submitted to an implementation of Kruskal's nonmetric multidimensional scaling, vegan::metaMDS() in R, to generate a two-dimensional configuration (Oksanen et al., 2019).
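The averaging step described above (artanh transform, mean across subjects, inverse transform, then subtraction from 1) can be illustrated with a small Python sketch. The function name and the two-subject toy matrices are hypothetical; the scaling itself (vegan::metaMDS() in the source) is not reproduced here.

```python
import math

def mean_dissimilarity(subject_matrices):
    """Average correlation matrices on the Fisher-z scale, then return 1 - r."""
    n = len(subject_matrices[0])
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # the diagonal (r = 1) has dissimilarity 0
            z_values = [math.atanh(m[i][j]) for m in subject_matrices]
            mean_r = math.tanh(sum(z_values) / len(z_values))
            out[i][j] = 1.0 - mean_r
    return out

# Two toy "subjects" with a single pair of conditions (not real data).
subject_a = [[1.0, 0.0], [0.0, 1.0]]
subject_b = [[1.0, 0.8], [0.8, 1.0]]
d = mean_dissimilarity([subject_a, subject_b])
print(round(d[0][1], 6))
```

In this toy case the Fisher-z average of r = 0 and r = 0.8 is r = 0.5, whereas the arithmetic mean would be 0.4; the transform thus gives relatively more weight to strong correlations.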
Group-level dissociation analysis
To test for regional dissociations in coding preferences, we fit a hierarchical linear model on RSA model fits (see Estimation of coding strength β) obtained from our three ROIs within each hemisphere, and for our three RSA models. Fixed effects were estimated for the interaction of RSA model, ROI, and hemisphere. Random effects by subject were estimated for the interaction of RSA model and ROI, with a full covariance structure (9 × 9; Barr et al., 2013). This model was fit with lme4::lmer() in R (Bates et al., 2014).
Planned contrasts on the fixed effects were performed to test our hypotheses. p values were estimated using an asymptotic z-test, as implemented by the multcomp::glht() function in R (Hothorn et al., 2008). We performed three types of contrasts: (1) to compare coding strengths within-region (e.g., DMFC:
Selection of behavioral measures for individual-level analyses
Audio recordings of verbal responses were transcribed and coded for errors offline by two researchers independently. Discrepancies in coding were resolved by a third. Errors were defined as any nontarget color word spoken by a subject before utterance of the correct response (e.g., including distractor responses, but not disfluencies) or as a response omission. Trials in which responses were present but unintelligible (e.g., because of high scanner noise or poor enunciation) were coded as such.
We fit two hierarchical models on these data: one on errors and one on RTs. Several observations were excluded from these models. From the error model, only trials with responses coded as “unintelligible” were excluded (54). From the RT model, several types of trials were excluded: (a) trials with RTs >3000 ms (1) or <250 ms (53; 52 of which were equal to zero); (b) a cluster of fast and unrealistically invariable RTs from 2 subjects (23/216, 26/216), which likely reflected an artifact of insufficient voice-onset signal within the recording; (c) trials with a residual RT that was more extreme than three interquartile ranges from an initial multilevel model fitted to all subjects' data (of the structure of the model equation below; as in Baayen and Milin, 2010); and (d) all trials with an incorrect (137), unintelligible (54), or no response (52). In total, 232 trials were excluded (0-62 per subject), leaving 10,352 trials for analysis (154-216 per subject).
RTs for subject s were modeled (following Laird and Ware, 1982 notation) as follows:
This hierarchical modeling framework enabled us to estimate the amount and internal consistency of individual variability in the Stroop interference effect within both RTs and errors while accounting for trial-level error (e.g., Haines et al., 2020). We used these subject-level estimates to validate that our behavioral measures met prerequisite properties for individual differences analyses. To assess the amount of between-subject variability in the Stroop interference effect, a nested model comparison was conducted, in which the models fitted above were compared with a random-intercept model. Stroop effects differed significantly across subjects in RT (
Individual-level dissociation analysis
Similar to our Group-level dissociation analysis, we tested our individual-level hypotheses within a hierarchical modeling framework. Preliminary analyses suggested that error measures were inadequate for individual differences analyses (see above), so we focused solely on RT measures.
We began with the RT model described in the preceding section. Now, however, for a given RSA model and ROI, we incorporated into the fixed effects each subject's estimated coding strength, βs, by interacting this coding-strength term with the congruency factor. This formed a cross-level, continuous-by-categorical interaction,
Model selection
To complement our hypothesis-driven brain–behavior analyses, we used a more data-driven model-selection approach. An expanded set of 24 superparcels (in addition to our six ROIs) was defined (see Fig. 3B; for list, see Extended Data Fig. 3-3). Some superparcels were included as ROIs; others were included as negative controls (i.e., regions that were not predicted to be important for explaining behavioral performance). Subject-level coefficients of the Stroop interference effect contrast were extracted from the behavior-only RT model and used as the response vector (in the model equation, the slope elements of
To assess validation-set accuracy, the selected model coefficients were applied to a design matrix from validation-set subjects, generating a predicted Stroop effect vector. The linear correlation was estimated between this predicted Stroop effect and observed Stroop effects (estimated as conditional modes via a hierarchical model separately fitted to validation-set data). The significance of this correlation was assessed by randomly permuting the training-set response vector, refitting the model, generating new predicted validation-set values, and re-estimating the predicted–observed correlation 10,000 times. The p value was given by the proportion of resamples in which the null correlation was greater than the observed correlation.
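The logic of this permutation test can be sketched in Python. The model below is a placeholder (an ordinary least-squares fit with two predictors) standing in for the selected brain–behavior model, whose form is not reproduced here; the data are simulated, and all function names are hypothetical.

```python
import random
from statistics import mean

def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def fit_two_predictors(x1, x2, y):
    """Closed-form least squares for y ~ 1 + x1 + x2 (placeholder model)."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    c1 = [a - m1 for a in x1]
    c2 = [a - m2 for a in x2]
    cy = [a - my for a in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(a * a for a in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * b for a, b in zip(c1, cy))
    s2y = sum(a * b for a, b in zip(c2, cy))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return my - b1 * m1 - b2 * m2, b1, b2

def predict(coefs, x1, x2):
    b0, b1, b2 = coefs
    return [b0 + b1 * a + b2 * b for a, b in zip(x1, x2)]

def permutation_p(train, valid, n_perm=10000, seed=0):
    """train and valid are (x1, x2, y) tuples; returns (observed r, p value)."""
    rng = random.Random(seed)
    tx1, tx2, ty = train
    vx1, vx2, vy = valid
    observed = pearson(predict(fit_two_predictors(tx1, tx2, ty), vx1, vx2), vy)
    shuffled = list(ty)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute the training-set response vector
        null_coefs = fit_two_predictors(tx1, tx2, shuffled)  # refit
        null_r = pearson(predict(null_coefs, vx1, vx2), vy)
        exceed += null_r > observed
    return observed, exceed / n_perm

def simulate(n, rng):
    """Simulated predictors and response (not real data)."""
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    x2 = [rng.gauss(0, 1) for _ in range(n)]
    y = [a - b + rng.gauss(0, 0.5) for a, b in zip(x1, x2)]
    return x1, x2, y

rng = random.Random(1)
observed_r, p_value = permutation_p(simulate(49, rng), simulate(16, rng), n_perm=2000)
print(round(observed_r, 3), p_value)
```

Only the training-set response is permuted; the validation-set predictors and observed Stroop effects are held fixed, so the null distribution reflects what the fitting procedure predicts when the training brain–behavior relationship is broken.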
Exploratory whole-cortex RSA
The RSA-model fitting procedure, as outlined in Estimation of coding strength β, was separately conducted on each MMP cortical parcel. Inferential statistics followed those suggested by Nili et al. (2014). One-sided signed-rank tests were conducted for significance testing (>0). p values were FDR-corrected over all 360 parcels, separately for each task dimension (target, distractor, incongruency).
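The correction step can be illustrated with a short Python sketch. We assume here the Benjamini–Hochberg procedure, the most common FDR method; the text does not name the specific procedure, and the function name and toy p values are hypothetical.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p values (assumed FDR procedure)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda k: pvals[k])  # indices, ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Step down from the largest p value, enforcing monotonicity.
    for rank in range(m, 0, -1):
        k = order[rank - 1]
        running_min = min(running_min, pvals[k] * m / rank)
        adjusted[k] = running_min
    return adjusted

# Toy p values for four parcels (not real data); in the analysis above,
# the correction would run over 360 parcels, once per task dimension.
toy_p = [0.01, 0.04, 0.03, 0.005]
print([round(q, 4) for q in bh_adjust(toy_p)])
```

Running the correction separately for target, distractor, and incongruency means that the number of tests m is 360 in each case rather than 1080.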
Univariate activation analyses
We additionally conducted a standard “univariate activation” analysis on these data. This was not meant to evaluate whether univariate activity was a plausible confounding variable in our analysis, but rather to provide some basis for comparing our data to most extant neuroimaging studies of Stroop. For a given ROI (or MMP parcel), β coefficients from the first-level fMRI GLM were averaged over voxels by stimuli, then over stimuli by congruency. These mean values were then contrasted, analogous to the behavioral Stroop interference effect (incongruent – congruent). This statistic gives an estimate of the overall (across-voxel) difference in fMRI activity within a given brain region on incongruent versus congruent trials.
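The order of averaging in this contrast can be made explicit with a minimal Python sketch. The data structure (a mapping from each hue-word condition to a list of voxel-wise GLM coefficients) and the toy values are assumptions made for illustration.

```python
from statistics import mean

def univariate_stroop_contrast(patterns):
    """patterns maps (hue, word) -> list of voxel-wise GLM coefficients."""
    # First average over voxels, separately for each stimulus.
    stimulus_means = {cond: mean(voxels) for cond, voxels in patterns.items()}
    # Then average over stimuli, separately for each congruency level.
    incongruent = mean(v for (hue, word), v in stimulus_means.items() if hue != word)
    congruent = mean(v for (hue, word), v in stimulus_means.items() if hue == word)
    return incongruent - congruent

# Toy region with two voxels and four stimuli (not real data).
toy_patterns = {
    ("red", "red"): [1.0, 1.0],
    ("blue", "blue"): [0.0, 2.0],
    ("red", "blue"): [2.0, 4.0],
    ("blue", "red"): [3.0, 3.0],
}
contrast = univariate_stroop_contrast(toy_patterns)
print(contrast)
```

Averaging within stimulus before averaging within congruency level gives each stimulus equal weight, regardless of how many stimuli fall into each congruency level.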
Follow-up control analyses
To establish the robustness of our results, we conducted several control analyses that examined a number of confounds and concerns: potential differences in signal-to-noise ratio (SNR) across prefrontal ROIs, the effects of head motion, different RSA models for incongruency coding, the presence of bias imposed by the experimental design, within-run versus between-run RSA estimation, and the effects of downsampling to account for different trial numbers across runs. These are each described next.
Comparison of SNR ratios
To test for differences in SNR between DMFC and DLPFC, we estimated “noise ceilings” within each region and contrasted them across regions. Noise ceilings indicate the maximum observable group-level effect size (RSA model fit) given the level of between-subject variability in similarity structure (Nili et al., 2014). Lower (smaller) average noise ceilings indicate poorer SNR for group-level tests. We used the cross-validated “lower-bound” noise ceiling estimator of Nili et al. (2014), as this yields a lower-variance estimate, and therefore more powerful contrast across regions, than the non-cross-validated “upper-bound” (Hastie et al., 2009). For a given region, the lower-bound noise ceiling is defined for each subject s in
RSA on head motion estimates
As a negative control analysis for our exploratory whole-cortex RSA, we attempted to decode task variables (target, distractor, incongruency) via RSA from framewise estimates of head motion. The 6 motion regressors that were used in the fMRI GLM as nuisance covariates (corresponding to translation and rotation in 3 dimensions) were regressed on the design matrix containing predicted BOLD timecourses of our 16 conditions of interest. The coefficient matrix resulting from this regression was then submitted to the RSA procedures described in Estimation of coding strength β and Exploratory whole-cortex RSA. To check whether more aggressive movement denoising within the fMRI GLM was warranted (i.e., in addition to the 6 nuisance regressors), we repeated this movement-based RSA using 12 motion regressors (the 6 bases and their temporal derivatives). RSA model fits from the 6-basis and 12-basis motion-based RSAs were compared via a paired-sample signed-rank test.
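The regression step of this negative control can be sketched as follows, in Python with NumPy. All names and the simulated inputs are hypothetical, and the backward-difference derivative is an assumed stand-in for however the temporal derivatives were actually computed; the resulting condition × condition matrix would then be passed to the RSA models.

```python
import numpy as np

def motion_patterns(design, motion):
    """Least-squares fit of motion (T x P) on design (T x C); returns C x P coefficients."""
    coefs, *_ = np.linalg.lstsq(design, motion, rcond=None)
    return coefs

def add_temporal_derivatives(motion):
    """Append backward differences (first row zero), giving 2P regressors (assumed form)."""
    deriv = np.vstack([np.zeros((1, motion.shape[1])), np.diff(motion, axis=0)])
    return np.hstack([motion, deriv])

rng = np.random.default_rng(1)
n_timepoints, n_conditions = 300, 16
# Stand-in for the predicted BOLD timecourses of the 16 conditions (simulated).
design = rng.normal(size=(n_timepoints, n_conditions))
# Toy slowly drifting motion traces: 3 translations and 3 rotations (simulated).
motion6 = rng.normal(size=(n_timepoints, 6)).cumsum(axis=0) * 0.01
motion12 = add_temporal_derivatives(motion6)

patterns6 = motion_patterns(design, motion6)      # 16 conditions x 6 motion bases
patterns12 = motion_patterns(design, motion12)    # 16 conditions x 12 motion bases
rsm6 = np.corrcoef(patterns6)                     # 16 x 16 similarity matrix for RSA
rsm12 = np.corrcoef(patterns12)
```

Each row of the coefficient matrix plays the role of a condition's "activity pattern," with motion parameters in place of voxels.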
Alternative RSA incongruency models
The RSA incongruency model parameterized the congruent–congruent correlations (i.e., Rij where i and j are both patterns from congruent trials, denoted here simply as CC) with a separate nuisance regressor. That is, these cells were effectively excluded, and the model instead computed the contrast II – IC. This exclusion was done because we have no specific hypotheses regarding how congruent trials should be encoded relative to one another. Other parameterizations are possible, however, including models that (a) incorporate CC correlations within the “baseline” or intercept term, by omitting the congruent nuisance regressor [i.e.,
Design bias
For the current study, a typical within-run form of RSA estimation was implemented, in which correlations were computed among activation patterns estimated within the same scanning run and first-level GLM. Within-run RSA has been criticized because it is susceptible to design biases that occur when trial orders are insufficiently randomized within the experiment (Cai et al., 2019). A priori, this was not a strong concern in the current design, as trial orders were fully randomized both within and between participants. Nevertheless, we conducted several diagnostic and robustness analyses to validate that our results and conclusions were not impacted by this potential bias. First, we estimated the extent of the potential bias by running through our RSA pipeline data simulated under a “worst-case” scenario, that is, when SNR = 0 (details described within Extended Data Fig. 1-2) and across a wide range of autocorrelation strengths. At worst, the three models were weakly biased (within 0.02-0.05 of α = 0.05; Extended Data Fig. 1-2). This result indicated that while the bias was present, it was relatively minimal (compare Cai et al., 2019). Second, we validated that these simulated estimates were realistic, by using our actual fMRI data to estimate the false positive rate empirically. To do this, we conducted RSA on the first-level GLM coefficients from each subject's ventricles, as these voxels should contain no brain activity signal but noise characteristics similar to those of the voxels of interest (a ventricle mask of 2431 voxels was obtained from AFNI servers: https://afni.nimh.nih.gov/pub/dist/tgz/suma_MNI_N27.tgz). By treating the group-level mean and SD of these RSA model fits as the parameters of a non-central null distribution,
Between-run RSA
As an alternative to within-run RSA, various between-run estimation approaches have been proposed, which have been shown to be less sensitive to potential design biases. We opted not to use between-run RSA for our primary analyses, both because of the reduced effects of design bias established above and because between-run RSA is noted to be considerably more conservative than within-run RSA (Cai et al., 2019). Moreover, several particulars of the present design are known to further hamper its power. Namely, between-run RSA makes incomplete use of the data, an issue that is exacerbated to the maximum extent possible in the present case, as our design has only the minimum number (two) of cross-validation folds (runs; Diedrichsen et al., 2020). Additionally, because the image acquisition sequence involved a reversal of phase-encoding direction across the two runs, this effectively adds a strong nonlinear component of noise if between-run RSA is used.
Nevertheless, to examine further the extent of design bias in our data, as well as the robustness of our results to the drop in power imposed by between-run RSA, we conducted a follow-up analysis of our primary results using between-run RSA approaches. We conducted two forms of between-run RSA: the first used “cross-run RSA”, which operates on the cross-correlation of patterns between scanning runs (see Alink et al., 2015), and the second used “cross-validated RSA”, which operates on the inner product of pattern contrasts between runs (see Walther et al., 2016). We selected these two forms of RSA as they have complementary benefits. Cross-run correlation is most comparable to our original within-run correlation, as they are both linear correlations. However, using this method within our data set also necessitated using downsampling (as the numbers of trials per condition were not perfectly balanced at the run level; see Downsampling analysis), which increases the variance of resulting estimates because of discarding data. In contrast, cross-validated RSA is insensitive (in terms of expected value) to the issue of trial numbers per condition (Diedrichsen et al., 2020). Using this method therefore allowed us to conduct the RSA using all the data at once, without downsampling. But cross-validated RSA tests a more constrained hypothesis than cross-run RSA. Whereas cross-run RSA can be sensitive to nonlinear differences between conditions, cross-validated RSA tests linear discriminability between conditions. Thus, when a nonlinear boundary separates conditions, cross-validated RSA will fail to detect an effect, whereas cross-run RSA could succeed. A nonlinear boundary would occur, for example, when one condition (e.g., incongruent stimuli) drives a reliable, common response while the other conditions (e.g., congruent stimuli) drive either unreliable, or stable but heterogeneous, responses.
Nevertheless, to make cross-validated RSA as comparable as possible to our primary RSA, we z-score standardized patterns before computation and omitted spatial prewhitening (Walther et al., 2016).
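To make the distinction between the two between-run measures concrete, the sketch below computes both from two runs of condition-level patterns, in Python with NumPy. It is an illustration under stated assumptions rather than the pipeline used here: the function names and simulated patterns are hypothetical, the cross-run matrix is symmetrized across run order for convenience, and the cross-validated measure is written as an unwhitened inner product of z-scored pattern contrasts, normalized by the number of voxels.

```python
import numpy as np

def zscore_patterns(patterns):
    """Standardize each condition's pattern across voxels (mean 0, SD 1)."""
    centered = patterns - patterns.mean(axis=1, keepdims=True)
    return centered / patterns.std(axis=1, keepdims=True)

def cross_run_correlation(run1, run2):
    """Correlations between run-1 and run-2 condition patterns, symmetrized."""
    n = run1.shape[0]
    r = np.corrcoef(run1, run2)[:n, n:]   # rows: run-1 conditions; columns: run-2 conditions
    return (r + r.T) / 2

def cross_validated_distance(run1, run2):
    """Inner product of pattern contrasts (i minus j) across runs, without prewhitening."""
    z1, z2 = zscore_patterns(run1), zscore_patterns(run2)
    diff1 = z1[:, None, :] - z1[None, :, :]
    diff2 = z2[:, None, :] - z2[None, :, :]
    return np.einsum("ijv,ijv->ij", diff1, diff2) / z1.shape[1]

# Simulated data: 16 conditions x 200 voxels, shared signal plus run-specific noise.
rng = np.random.default_rng(2)
signal = rng.normal(size=(16, 200))
run1 = signal + 0.5 * rng.normal(size=signal.shape)
run2 = signal + 0.5 * rng.normal(size=signal.shape)
r_cross = cross_run_correlation(run1, run2)
d_cv = cross_validated_distance(run1, run2)
```

Because noise is independent across runs, the cross-validated contrast has an expected value of zero when two conditions evoke the same pattern, whereas the cross-run matrix retains an informative diagonal (the reliability of each condition's pattern).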
Downsampling analyses
Last, we checked whether the primary results were robust to the prewhitening method of data preprocessing, which was introduced to handle the imbalance of trials across runs (see Estimation of coding strength β, step 4). In this analysis, we instead handled this issue by performing RSA after equating the number of trials per run × condition, by iteratively downsampling conditions with random subsets of trials. Specifically, we first fitted GLMs on fMRI time-series, separately for each scanning run, that contained a single regressor per trial (LS-A method of Mumford et al., 2012). The minimum number of times we presented each unique Stroop stimulus in a single run was 3; this was the number to which we downsampled all conditions with >3 occurrences. For these conditions, we randomly sampled three trials, and averaged GLM coefficients voxelwise over these trials. (For 3-trial conditions, we simply averaged all trials that were present.) This formed 16 separate condition-level coefficient vectors (activity patterns) per run. We then averaged these coefficient vectors across runs, then estimated condition × condition correlation matrices from these patterns (averaging across runs was omitted from the downsampled cross-run RSA; see preceding section). We repeated this resampling, averaging, and correlation process 1000 times, and averaged the resulting correlation matrices across iterations (after an artanh transform). These correlation matrices were then submitted to the same RSA as outlined in Estimation of coding strength β, with the omission of the prewhitening step (4).
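The resampling loop described above can be sketched as follows, in Python with NumPy. This is a hedged illustration, not the analysis code: the function name, argument layout, and simulated trial counts are hypothetical, and two details are assumptions of the sketch, namely that the diagonal is zeroed before the artanh transform (to avoid infinite values) and that the result is left on the Fisher-z scale.

```python
import numpy as np

def downsampled_rsm(trial_betas, conditions, runs, n_keep=3, n_iter=1000, seed=0):
    """Mean Fisher-z condition x condition matrix over random trial subsets.

    trial_betas: (n_trials, n_voxels) single-trial GLM coefficients.
    conditions, runs: integer label per trial.
    """
    rng = np.random.default_rng(seed)
    cond_ids, run_ids = np.unique(conditions), np.unique(runs)
    z_sum = np.zeros((cond_ids.size, cond_ids.size))
    for _ in range(n_iter):
        run_patterns = []
        for run in run_ids:
            patterns = []
            for cond in cond_ids:
                idx = np.flatnonzero((runs == run) & (conditions == cond))
                keep = rng.choice(idx, size=n_keep, replace=False)   # equate trial counts
                patterns.append(trial_betas[keep].mean(axis=0))      # voxelwise average
            run_patterns.append(np.stack(patterns))
        mean_patterns = np.mean(run_patterns, axis=0)   # average patterns across runs
        r = np.corrcoef(mean_patterns)
        np.fill_diagonal(r, 0.0)                        # assumed handling: artanh(1) is infinite
        z_sum += np.arctanh(r)
    return z_sum / n_iter

# Simulated, imbalanced design: 2 runs, 16 conditions, 3 to 6 trials per run x condition.
rng = np.random.default_rng(3)
conditions, runs = [], []
for run in (0, 1):
    for cond in range(16):
        n_trials = 3 + (cond + run) % 4
        conditions += [cond] * n_trials
        runs += [run] * n_trials
conditions, runs = np.array(conditions), np.array(runs)
trial_betas = rng.normal(size=(conditions.size, 100))
z_matrix = downsampled_rsm(trial_betas, conditions, runs, n_iter=50)
```

Conditions that already have exactly three trials in a run are unaffected by the subsampling, because drawing three of three trials without replacement returns all of them.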
Results
Influential theories of cognitive control have proposed specific dissociations in the type of task information encoded by human medial and lateral frontoparietal cortex (Fig. 1A,B). But previous studies have largely approached this question indirectly, by using tasks designed to recruit these regions differentially in time, then testing for temporal dissociations in regional-mean levels of fMRI activity. Here, we used a more direct approach, by using the similarity structure of neural activity patterns evoked within these regions to estimate their informational content. In particular, through RSA, we compared neural coding of three distinct types of Stroop-task information — target, distractor, and incongruency (Fig. 1D,E) — within each ROI simultaneously, while Stroop interference was being experienced and resolved.
Figure 1-1
ROIs, or “superparcels,” for primary analyses. Colored parcels were included in these superparcels. For DLPFC and DMFC, we additionally conducted sensitivity analyses with more inclusive definitions (i.e., including parcels with asterisked labels). Network assignments were provided in Ji et al. (2019). Parcel names that are in bold were identified as likely being situated over nodes of the “multiple demand network” (Assem et al., 2020), a collection of regions that is commonly recruited by a variety of demanding tasks. Of these, those in gray belonged to the penumbra (recruited somewhat task-selectively) and those in black belonged to the core (robustly recruited regardless of task). LPPC, lateral posterior parietal cortex; MD, multiple demand; FPN, frontoparietal network; CON, cinguloopercular network; DMN, default mode network; DAN, dorsal attention network; SM, somatomotor network; Vis2, second visual network.
Figure 1-2
Simulation to diagnose the presence and extent of design bias. A, Estimated false positive rate of each RSA model across a range of autocorrelation strengths. We simulated time-series of pure noise and regressed these onto each subject's design matrix used in the analysis. Time-series were drawn
We describe three sets of analyses. First, we examined group-level effects, to test for neuroanatomical dissociations in representation of task information (Fig. 1B, middle). Second, we examined individual-level effects (i.e., individual differences), to test for dissociations in brain–behavior relationships (Fig. 1B, right). These two analyses were ROI-based and primarily focused on dorsomedial frontal and lateral frontoparietal regions (Fig. 1D), but also included sensorimotor ROIs for comparison purposes. The last set of analyses was conducted in whole-cortex exploratory fashion, to provide a more comprehensive picture of the anatomic profile of each task dimension.
Group
DMFC and DLPFC exhibit distinct coding profiles
Primary group-level results are summarized in Figure 2. Statistical estimates corresponding to results outlined within this section are contained in Table 1.
Figure 2-1
Parameter estimates from a sensitivity test of the group-level RSA, which used alternative (more inclusive) definitions of DLPFC and DMFC (see Extended Data Fig. 1-1). This table contains statistics analogous to Table 1.
Figure 2-2
Mean RSA model fits in each MMP parcel within frontoparietal ROIs. (A) DLPFC, (B) DMFC, (C) LPPC. Hemispheres displayed separately. Error bars represent 95% CIs on between-subject variability (estimated via bias-corrected and accelerated bootstrap, 10,000 resamples).
Figure 2-3
Univariate analysis. A, Across-voxel mean levels of activity in response to incongruent versus congruent conditions (“univariate Stroop contrast”) within each frontoparietal ROI. Error bars represent 95% CIs of between-subject variability (estimated via bootstrap). B, Subject-level correlations between the univariate Stroop contrast and RSA incongruency model fit.
Figure 2-4
Group-level results from alternative RSA techniques. “Within-run downsampled” columns correspond to results from the downsampling analysis, while “Cross-run downsampled” and “Cross-validated” columns correspond to results from between-run RSA (see relevant Method sections). Following Nili et al. (2014), one-sample contrasts are sign-rank tested against a one-tailed hypothesis, while model comparisons are two-tailed. p values are uncorrected.
Figure 2-5
Comparison of results from alternative RSA incongruency models to original parameterization. Two alternative incongruency models were fitted, and model fits (β) were compared with our original parameterization of incongruency coding. See Alternative RSA Incongruency Models section of Methods for description of alternative models. Note that the II – CC parameterization is expected to yield higher variance fits than the other two, because of excluding substantially more data per subject. A, Incongruency model fits for each parameterization in each of our ROIs. Error bars show bootstrapped 95% CI. Each line is a subject. Model fits were highly similar across parameterizations. B, Incongruency model fits for each parameterization in each MMP parcel (each point is a parcel). Model fits were highly similar across parameterizations.
The DMFC has been strongly associated with the coding of incongruency information in response conflict tasks, such as the Stroop. RSA approaches were used to directly test the specificity of that hypothesis. This region was indeed found to encode the incongruency dimension of the Stroop task (left:
We next focused on lateral frontoparietal regions and the coding of target information. DLPFC indeed encoded target information (left:
Thus, at the group level, we observed a single rather than double dissociation between medial and lateral frontoparietal cortex, in the form of enhanced sensitivity to incongruency relative to target coding in left DMFC. In all ROIs, however, these two sources of task information were more strongly encoded than distractor information.
Sensitivity and control analyses
We next tested a series of hypotheses to scrutinize and extend our results.
First, we conducted a positive control analysis to bolster confidence in the statistical power of RSA methods within the present design. In particular, we sought to determine whether our methods could detect dissociations in task coding that are strongly expected to exist. For this, we focused on primary somatomotor and visual cortical ROIs, the responses of which can be assumed to reflect, relatively selectively, response-related (i.e., motoric) and visual form-related coding. As the distractor (word) defines the visual form of the Stroop stimulus, coding of form-related features should be captured by our distractor model. In parallel, as our analysis included only correct-response trials (i.e., in which the target response was spoken), coding of motoric features should be captured by our target model. Consistent with this logic, within early visual cortex, evidence of preferential distractor coding was observed (Fig. 2C; distractor:
Second, we conducted sensitivity analyses to assess the robustness of our results to the particular ROI definitions used. In one analysis, we tested more expansive ROI definitions, by using alternatively defined superparcels (Extended Data Fig. 1-1). These definitions included additional, more rostral PFC parcels (1 in DMFC, 3 in DLPFC), which begin to encroach into ventromedial PFC and frontopolar cortex (e.g., the rostral DMFC parcel was assigned to the Default-Mode network within the Cole-Anticevic divisions). Nevertheless, the previously observed dissociations were robust to these more liberal definitions (Extended Data Fig. 2-1). In the other analysis, we examined whether the overall superparcel coding profiles were representative of individual parcels. While DMFC and DLPFC results generally reflected those of their constituent parcels (Extended Data Fig. 2-2A,B), interestingly, there was substantial heterogeneity within left LPPC (Extended Data Fig. 2-2C). Similar to left DMFC, a collection of left LPPC regions spanning the length of the intraparietal sulcus strongly encoded the incongruency dimension (i.e., IP1, IP2, IPS1, AIP, LIP, MIP).
Third, we examined whether the lack of observed discrimination between target and incongruency coding dimensions within DLPFC could be explained by increased error variance potentially present in fMRI activity patterns within this region. Prior work has suggested that PFC regions might be particularly susceptible to this confound (Bhandari et al., 2018). It is possible that we might have observed a dissociation in left DMFC but not DLPFC because of differential levels of statistical power across the two regions. We therefore derived an SNR analysis to determine whether this was a viable explanation (see Materials and Methods). A paired-sample bootstrap test did not indicate a systematic difference between DLPFC versus left DMFC group-level SNR (
Fourth, to provide a basis for comparison to most extant neuroimaging research of the Stroop task, we conducted a “univariate activation” analysis, examining whether these brain regions were generally more active during incongruent versus congruent conditions. No regions were found to respond more strongly overall to incongruent versus congruent conditions (Extended Data Fig. 2-3A), although the mean contrast in DMFC (L) was positive (i.e., incongruent>congruent). This null result was not surprising, however, because of the high frequency of incongruent trials within the experiment — which is known to reduce both the behavioral and neural univariate Stroop effect (Logan and Zbrodoff, 1979; Carter et al., 2000; De Pisapia and Braver, 2006). While this null result demonstrates the utility of using RSA in this case, it should not, however, be seen as direct evidence for the increased sensitivity of RSA versus univariate methods, as the univariate and RSA-based tests as implemented here are subject to different constraints and are thus incomparable (Allefeld et al., 2016). Finally, the magnitude of the univariate Stroop effects was only weakly correlated to incongruency coding model fits (Extended Data Fig. 2-3B), suggesting that these measures were nonredundant.
Finally, we tested whether these patterns of results were robust to alternative RSA techniques, including a downsampling technique to equate trial counts across runs, two “between-run” RSA methods (cross-run RSA and cross-validated RSA), and alternative parameterizations of the incongruency coding model (see Materials and Methods). Findings were robust to downsampling and to between-run RSA (Extended Data Fig. 2-4), and were highly similar across different parameterizations (Extended Data Fig. 2-5). Interestingly, however, the detection of incongruency coding in DMFC depended on whether a linear or nonlinear RSA method was used. When using an RSA method that tests linear discriminability between conditions (cross-validated RSA), the incongruency coding effect was abolished in DMFC; whereas when using a comparable method that is sensitive to nonlinear pattern differences (cross-run RSA), the effect remained quite strong. This pattern of results suggests that incongruency information was encoded nonlinearly within DMFC activation patterns. Indeed, this can be seen within the two-dimensional embedding (Fig. 2), as a radial, rather than linear, separation of incongruent (central) and congruent (peripheral) stimuli.
Individual
Primary individual-level results are summarized in Figure 3. Statistical estimates corresponding to results outlined within this section are contained in Table 2 (for scatter plots of all associations, see Extended Data Fig. 3-1).
Figure 3-1
Associations between subjects' coding strength (β) and Stroop interference effect (RT) separately displayed within each hemisphere, ROI, and coding model (target, incongruency). Green, target coding model; orange, incongruency coding model. Confidence bands indicate 95% CI from percentile bootstrap (10,000 resamples).
Figure 3-2
Robustness check of individual-level correlations found in our primary analysis (see Fig. 3; Table 2) to alternative RSA techniques. “Within-run downsampled” corresponds to results from the downsampling analysis, while “Cross-run downsampled” and “Cross-validated” correspond to the between-run RSA (see relevant Method sections). While attenuation of effect sizes is generally expected when moving to higher variance methods (e.g., downsampling, between-run RSA), critically, the signs of the effects are robust to technique. Notably, however, the large reduction in DMFC incongruency coding correlation with cross-validated RSA, but preservation of the effect with cross-run RSA, matches what is seen at the group level (see Extended Data Fig. 2-4, 4-4), and suggests that incongruency information was encoded non-linearly within DMFC activation patterns.
Figure 3-3
Superparcels defined for model-selection analysis. All superparcels were lateralized (one in each hemisphere) except for relatively early sensory and motor regions ('Aud,' 'V1-V3,' 'VSM'). Ventral somatomotor cortex was defined using the Gordon et al. (2016) atlas 'somato-motor–mouth network,' as MMP's somatomotor parcels encompass all of the somatomotor strip (i.e., dorsal and ventral areas). Left ventral occipitotemporal cortex (“visual word-form area”) was defined using coordinates from Twomey et al. (2011). DPM, dorsal premotor cortex; VPM, ventral premotor cortex; VVis, ventral visual cortex; AIns, anterior (frontal) insular cortex; IFC, inferior frontal cortex; OFC, orbitofrontal cortex; FPC, frontoparietal cortex; Aud, early auditory cortex; V1-V3, visual cortex; VSM, ventral somatomotor cortex; LVOT, left ventral occipitotemporal cortex.
Better-performing subjects have stronger lateral frontoparietal target coding
The fidelity of target-related information in lateral frontoparietal cortex — DLPFC, in particular — is thought to be closely linked to the efficiency with which an individual resolves response conflict in tasks such as Stroop. By using subject-level target-coding estimates (βtarget) to model behavioral performance (RT), we tested this fundamental prediction relatively directly. Indeed, subjects with stronger target coding in both right DLPFC and right LPPC resolved Stroop interference effects more quickly (DLPFC:
Conversely, incongruency-related responses of DMFC are thought to be positively associated with maladaptive policies of response selection. In line with this notion, subjects with stronger incongruency coding in left DMFC tended to exhibit greater Stroop interference, although this interaction was nonsignificant (
Considering the collective pattern of results, we conclude here that our findings support the hypothesis that target coding in (right) DLPFC and in LPPC reflected a common process of implementing control.
Model selection affirms a lateral–medial dissociation and identifies unexpected relationships
Because the preceding hypothesis-driven analysis exclusively focused on a limited set of regions, important brain–behavior relationships may have been missed. A more accurate model may even omit target and incongruency coding from DLPFC, DMFC, and LPPC altogether. Consequently, we conducted a more comprehensive test to identify regions and task dimensions that could better account for Stroop performance variability across individuals.
A data-driven model selection analysis was conducted to address this question (see Materials and Methods). We defined an expanded set of 24 cortical regions (superparcels), including the six defined and used earlier, which covered various areas that may be important for performing the Stroop task (Fig. 2B; Extended Data Fig. 3-3). Conducting RSA on each superparcel furnished three coding estimates (one per coding model) per superparcel. These 72 estimates were then used as features in a cross-validated model selection procedure.
Strikingly, the selected model contained all three hypothesized measures: (right) DLPFC and LPPC target coding, and (left) DMFC incongruency coding. In addition, two unexpected measures were identified, both with negative slopes: left DLPFC distractor coding (
Nevertheless, to provide a cursory test of a truly independent validation set (i.e., with no familial dependency to the training set), we excluded all subjects in the training set who were co-twins of those in the validation set, and reconducted this model selection procedure. This amounted to discarding 16/49 (33%) of training-set observations. The selected model contained only one variable, which was not in our ROIs (but in early visual cortex), and was unable to predict held-out Stroop effects (r = −0.12). This is perhaps unsurprising, however, given the substantial reduction in the size of the training dataset for an already high-dimensional model. To reduce the dimensionality, we reconducted this analysis, focusing now only on ROIs and coding schemes of interest (target coding in DLPFC and LPPC, and incongruency coding in DMFC, within each hemisphere), and additionally ensured that all variables were used in prediction (via ridge regression). This model was better able to predict the held-out Stroop effect (r = 0.20), in particular, relative to a comparable model that contained theoretically “mismatched” ROI×coding scheme combinations (incongruency coding in DLPFC and LPPC, target coding in DMFC; r = −0.17; bootstrapped p = 0.088).
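The logic of the reduced, ridge-based prediction can be sketched as follows, in Python with NumPy. This is an illustration only: the data are simulated, the penalty value, the validation-set size, and the feature ordering are assumptions, and the closed-form estimator shown here is not necessarily the implementation that was used. Only the training-set size (49 − 16 = 33) and the number of features (3 ROIs × 2 hemispheres) follow from the text.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge regression on standardized features; returns a predict function."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    y_mean = y.mean()
    Z = (X - mu) / sd
    coefs = np.linalg.solve(Z.T @ Z + alpha * np.eye(X.shape[1]), Z.T @ (y - y_mean))
    # New data are standardized with the training-set mean and SD.
    return lambda X_new: ((X_new - mu) / sd) @ coefs + y_mean

rng = np.random.default_rng(4)
n_train, n_valid, n_features = 33, 16, 6    # n_valid is a hypothetical value
# Simulated weights: negative for four "target coding" features,
# positive for two "incongruency coding" features (assumed ordering).
true_weights = np.array([-1.0, -1.0, -1.0, -1.0, 1.0, 1.0])
X_train = rng.normal(size=(n_train, n_features))
X_valid = rng.normal(size=(n_valid, n_features))
y_train = X_train @ true_weights + rng.normal(size=n_train)   # simulated Stroop effects
y_valid = X_valid @ true_weights + rng.normal(size=n_valid)

predict = ridge_fit(X_train, y_train, alpha=1.0)              # alpha chosen arbitrarily
r_heldout = np.corrcoef(predict(X_valid), y_valid)[0, 1]      # held-out prediction accuracy
```

Unlike a selection procedure that can drop variables, the ridge penalty shrinks all coefficients toward zero while keeping every feature in the model, which is what allows the "matched" and "mismatched" feature sets to be compared on equal footing.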
Results from these model selection analyses affirm a functional dissociation across the medial–lateral axis of frontoparietal cortex, and further demonstrate that Stroop-task representations within DLPFC and DMFC hold relatively privileged relationships with behavior.
Exploratory whole-cortex RSA
In a final, exploratory analysis, we estimated RSA models separately for each MMP parcel, to determine more comprehensively how target, distractor, and incongruency coding are distributed across cortex. These three task dimensions were encoded across cortex according to different neuroanatomical profiles. Target coding was widespread (observed in 207/360 parcels), covering substantial portions of the frontal and temporal lobes, including many perisylvian regions (Fig. 4, top; Extended Data Fig. 4-1). Notably, the strongest target coding was observed within regions that receive strong sensory and (or) motor-related input (Extended Data Fig. 4-1). Contrastingly, incongruency coding was detected predominantly within prefrontal and intraparietal sulcal parcels — including DMFC, but also left LPPC, bilateral superior frontal gyrus, and left lateral frontopolar cortex (rostral DLPFC) — but additionally within left retrosplenial and right lateral occipital cortex (Fig. 4, middle; Extended Data Fig. 4-3). Aside from this occipital area, these incongruency-coding parcels notably belonged to control networks (frontoparietal, cinguloopercular, and dorsal attention; Extended Data Fig. 4-3). In a third, distinct profile, distractor coding was only observed within early visual cortex (left V1 and V2; Fig. 4, bottom; Extended Data Fig. 4-2).
Figure 4-1
MMP parcels with significant target model fits. p values are FDR-corrected across all 360 cortical parcels.
Figure 4-2
MMP parcels with significant distractor model fits. p values are FDR-corrected across all 360 cortical parcels.
Figure 4-3
MMP parcels with significant incongruency model fits. p values are FDR-corrected across all 360 cortical parcels.
Figure 4-4
Robustness check of results from exploratory whole-cortex RSA (see Fig. 4) to alternative RSA techniques. “Within-run downsampled” corresponds to results from the downsampling analysis, while “Cross-run downsampled” and “Cross-validated” correspond to between-run RSA (see relevant Method sections). Maps are thresholded for model fits >0 with significance at FDR-corrected (across all 360 parcels) p value of 0.05. Note that, in the present design, between-run RSA methods are expected to strongly attenuate effect sizes (see Materials and Methods). Nevertheless, our core results were robust: in both downsampled and between-run RSA methods, we identified incongruency coding in parcels within DMFC, target coding in mid-DLPFC, and distractor coding in visual cortex. Notably, the reduction in DMFC incongruency coding with cross-validated RSA, but preservation of the effect with cross-run RSA, matches what is seen at the individual-difference level (see Extended Data Fig. 3-2), and suggests that incongruency information was encoded non-linearly within DMFC activation patterns (see also Fig. 2B). Relative to the prewhitening approach we used to handle imbalance of trials across runs, the downsampling approach appears to have increased sensitivity to incongruency coding, in terms of coverage (compare Fig. 4). This could be because the prewhitening approach (in a separate, preceding regression) is a somewhat more aggressive denoising strategy. Critically, however, this did not impact our main conclusions: we still observed stronger incongruency coding in DMFC than in DLPFC (see Extended Data Fig. 2-4).
As a negative control analysis, we tested whether we could decode these three task variables (target, distractor, incongruency) from framewise head motion estimates, using the same RSA procedures as above. No task variable was significantly encoded within patterns of head movements (
Finally, we repeated this exploratory analysis using alternative RSA methods involving downsampling and between-run RSA (see Downsampling analysis, and Between-run RSA; Extended Data Fig. 4-4). Across these analyses, the core results were quite robust: we found incongruency coding in parcels within DMFC (though this depended on the use of nonlinear RSA methods, as in Extended Data Figs. 2-4, 3-2), target coding in mid-DLPFC, and distractor coding in visual cortex. Our findings were therefore not specific to a particular estimation method.
Collectively, these exploratory results confirm and extend our prior findings. (1) As with the reported brain–behavior associations, target coding was emphasized relative to coding of other task dimensions. (2) Yet despite this emphasis, important and expected dissociations in anatomic profiles were identified across our three coding models, further suggesting that these models were successful in measuring coding of distinct task dimensions (Fig. 1A).
Discussion
We analyzed the similarity structure, or representational geometry (Kriegeskorte and Kievit, 2013), of frontoparietal activity patterns associated with cognitive control, during performance of the classic color-word Stroop task. In left DMFC, incongruency coding predominated. While DLPFC and LPPC encoded both target and incongruency-related information, distractor coding was not detected in these regions but was instead identified in early visual cortex. Further, these neural coding estimates were important and specific indicators of individual differences in magnitude of the behavioral Stroop interference effect. Individuals with stronger target coding in right DLPFC and right LPPC, but weaker incongruency coding in left DMFC, exhibited enhanced cognitive control, in terms of a reduced Stroop effect. Further, in a more comprehensive predictive model that included coding measures from a wide set of cortical regions, coding measures specifically from lateral frontoparietal and dorsomedial frontal regions were privileged in their link to behavior.
On one level, this study is a specific extension of research that has drawn dissociations between control-related functions of DLPFC and DMFC (MacDonald et al., 2000; Floden et al., 2011). Most prominently, MacDonald et al. (2000) used a modified Stroop-task design, in which task rules (delivered via precues) randomly alternated between color naming and word reading across trials, to demonstrate that DLPFC and DMFC encoded different types of information during cognitive control engagement. In particular, DLPFC was selectively recruited following cues for the more demanding color-naming task, whereas DMFC was instead driven by incongruent color-naming trials. This pattern of recruitment suggested that DLPFC encodes task-set and rule-related information in a preparatory manner, whereas DMFC encodes incongruency and conflict-related information in a stimulus-evoked manner. In the current study, we leveraged the high spatial dimensionality of fMRI to test whether this functional dissociation can be observed within a common time window of response selection, and further with a more traditional Stroop-task design, which does not involve task cues or switches. Our findings reinforce the conclusions of these relatively low-powered studies (N = 12 in Floden et al., 2011; N = 9 in MacDonald et al., 2000) and indicate that the dissociations were not dependent on the use of cued task-switching designs. Synthesizing these prior findings with those of the present study hints at a continuity in the putative role of DLPFC during target selection. Rather than exclusively contributing to preparation, DLPFC coding may evolve from proactively representing abstract rule or set-related information, toward more concrete targets and behavioral choices as relevant stimulus information becomes available in the environment.
This view accords with work in monkey neurophysiology (Mante et al., 2013; Rigotti et al., 2013; Stokes et al., 2013), yet further work is needed to determine whether similar dynamics occur within human DLPFC and how such dynamics may reflect or interact with specific processes of cognitive control in Stroop-like tasks.
More broadly, the results of this study highlight the utility of RSA and the general representational geometric framework for investigating cognitive control. Previous work has used MVPA decoding in the Stroop task to study the impact of control demand on posterior representations (e.g., Banich et al., 2019). Here, we used the RSA framework to explicitly model and decompose control-related frontoparietal representations. Indeed, a major motivation of the current study was to assess how well RSA measures of frontoparietal coding map to theorized mechanisms of control. For this purpose, the medial–lateral functional dissociation in frontoparietal cortex was a useful test-bed, as it features in several theoretical accounts (Botvinick et al., 2001; Miller and Cohen, 2001; Ridderinkhof et al., 2004; Shenhav et al., 2013). Our results were generally in line with these accounts, joining with a growing body of research in suggesting that RSA provides a convenient yet powerful framework from which neural measures can be used to test cognitive control theory (for review, see Freund et al., 2021).
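To make the idea of modeling and decomposing a similarity structure concrete, the following is a minimal sketch rather than the pipeline used in this study. It assumes a hypothetical 4 × 4 color-word stimulus set, binary model similarity matrices for target, distractor, and incongruency coding, and an ordinary least-squares fit to the lower triangle of a simulated similarity matrix. All names (`model_matrices`, `fit_rsa`, the stimulus labels, and the simulated weights) are illustrative assumptions.

```python
import itertools
import numpy as np

# Hypothetical stimulus set: every pairing of 4 ink colors with 4 words.
COLORS = ["red", "blue", "green", "yellow"]
WORDS = ["RED", "BLUE", "GREEN", "YELLOW"]
conditions = list(itertools.product(COLORS, WORDS))  # 16 (color, word) pairs


def model_matrices(conds):
    """Build binary model similarity matrices for three coding schemes."""
    n = len(conds)
    target = np.zeros((n, n))
    distractor = np.zeros((n, n))
    incongruency = np.zeros((n, n))
    for i, (ci, wi) in enumerate(conds):
        for j, (cj, wj) in enumerate(conds):
            target[i, j] = ci == cj          # same ink color
            distractor[i, j] = wi == wj      # same word
            # both stimuli are incongruent (ink color differs from word)
            incongruency[i, j] = (ci != wi.lower()) and (cj != wj.lower())
    return {"target": target, "distractor": distractor,
            "incongruency": incongruency}


def fit_rsa(observed, models):
    """Regress the lower triangle of `observed` on the model matrices."""
    idx = np.tril_indices_from(observed, k=-1)   # unique off-diagonal pairs
    names = list(models)
    X = np.column_stack([models[name][idx] for name in names])
    X = np.column_stack([np.ones(X.shape[0]), X])  # add an intercept column
    betas, *_ = np.linalg.lstsq(X, observed[idx], rcond=None)
    return dict(zip(names, betas[1:]))           # drop the intercept


# Simulate a region whose similarity structure mixes target and
# incongruency coding (weights 0.1 and 0.3 are arbitrary) plus noise.
models = model_matrices(conditions)
rng = np.random.default_rng(0)
observed = (0.1 * models["target"]
            + 0.3 * models["incongruency"]
            + rng.normal(0.0, 0.01, size=(16, 16)))

estimates = fit_rsa(observed, models)
print(estimates)
```

In this toy example, the fitted weights should approximate the simulated mixture, with a distractor weight near zero. In real data, the observed matrix would instead come from correlations among condition-wise multivoxel patterns within a region of interest.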
Nevertheless, the current work represents only an initial step in using the RSA framework to investigate cognitive control within Stroop-like tasks. As such, our study raises a number of unaddressed questions. But promisingly, there are ample opportunities for improving and extending the RSA framework highlighted here. For instance, a key limitation of the current study was the finding of widespread coding of the target dimension, suggesting a lack of specificity in the target RSA model. This is perhaps not surprising, however, as the model would capture not only coding of attentional-template and choice-related information, but also hue and response-related information. We mitigated this issue by demonstrating that target coding was selectively related to behavior within DLPFC and LPPC. Yet, this limitation could be addressed more powerfully by experimental design. Adding specific factorial manipulations, such as a task rule manipulation (see MacDonald et al., 2000; Hall-McMaster et al., 2019) or a response modality manipulation (see Minxha et al., 2020; see also Barch et al., 2001), would enable a richer, more precise set of cognitive control-relevant coding variables to be estimated (Fig. 5).
Future work could also address some of the complexities revealed by our data that were not entirely accounted for by the theoretical frameworks we used. For one, although predicted coding profiles emerged in some frontoparietal ROIs, all regions encoded incongruency and target information. This incongruency-coding finding is consistent with prior univariate fMRI research (e.g., Nee et al., 2007; Niendam et al., 2012) and a more recent finding that the responses of single neurons in human dACC and DLPFC are robustly modulated by conflict (Smith et al., 2019). With respect to target coding, however, one speculative interpretation is that, during the relatively late phase of response selection and execution, control networks may lose modular structure as the circuitry collectively converges on a behavioral choice. This interpretation accords with the fact that “choice axes” are encoded within multiple key nodes of frontoparietal decision circuitry, for example: in macaque LIP (Roitman and Shadlen, 2002), in macaque caudal DLPFC (Mante et al., 2013), and in human dACC and pre-SMA (Minxha et al., 2020; see also Okazawa et al., 2021). This account could be addressed using the enriched experimental design described above, identifying when and where choice coding is emphasized over the course of a trial.
Another unexpected finding was a relatively robust negative relationship between the strength of incongruency coding in left ventral visual cortex and the magnitude of the Stroop effect (Fig. 3B, left). One interpretation of this finding is provided by the biased competition framework, as an effect of selective visual attention. Prior work has demonstrated that certain ventral visual regions, those that are strongly tuned to target features, activate as a function of Stroop incongruency (Egner and Hirsch, 2005). In our task, mid-ventral stream areas may have received biasing input, selectively on incongruent trials, which enhanced stimulus-related coding and communication with downstream regions. Using the expanded RSA design sketched above, we might expect that such an effect would be limited to color naming conditions, when selective attention processes would be most prominent.
Perhaps the most surprising result was the robust negative correlation observed between left DLPFC distractor coding and the behavioral Stroop effect (Fig. 3B, right). At face value, accounting for this finding within the framework of top-down biased competition is difficult. But, given the statistics of our task, in which incongruent trials were frequent and congruent trials were rare, distractor information could have been used to facilitate performance. An association between distractor features and incongruency could have been learned and used to influence response selection, for example, by retrieving and implementing a stimulus-appropriate attentional setting (Melara and Algom, 2003; Bugg and Crump, 2012). Indeed, subjects clearly do exploit these associations, as indicated by reduced Stroop effects for stimuli that are “mostly incongruent,” also known as “item-specific proportion congruency” effects (ISPC; Bugg et al., 2011; Jiang et al., 2015; see also Crump and Milliken, 2009). The prediction that distractor coding might reflect ISPC effects could be tested by varying ISPC levels across different stimuli. For stimuli in which the specific color or word is not predictive of congruency, the relationship between DLPFC distractor coding and improved Stroop performance should not be present.
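As an illustration of what varying ISPC levels across stimuli means in practice, the sketch below computes the proportion of congruent trials for each distractor word in a trial list. The trial list, the function name `item_specific_pc`, and the specific proportions are hypothetical and do not describe the design of the present study.

```python
from collections import defaultdict


def item_specific_pc(trials):
    """Return the proportion of congruent trials for each distractor word.

    `trials` is a sequence of (ink_color, word) pairs; a trial counts as
    congruent when the ink color matches the lowercased word.
    """
    counts = defaultdict(lambda: [0, 0])  # word -> [n_congruent, n_total]
    for color, word in trials:
        counts[word][0] += int(color == word.lower())
        counts[word][1] += 1
    return {word: n_con / n_tot for word, (n_con, n_tot) in counts.items()}


# Hypothetical trial list: "RED" is mostly incongruent (word predicts
# conflict), whereas "GREEN" is unbiased (word is not predictive).
trials = (
    [("red", "RED")] * 1 + [("blue", "RED")] * 3
    + [("green", "GREEN")] * 2 + [("red", "GREEN")] * 2
)

pc = item_specific_pc(trials)
print(pc)
```

Under the account above, a link between distractor coding and reduced interference would be expected for items like the hypothetical "RED" set, but not for items like the unbiased "GREEN" set.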
Finally, the present study sets the stage for using RSA to test the dual-mechanisms framework of cognitive control (Braver, 2012). This framework explains much within- and between-individual variability in cognitive control function by the existence of two operational “modes” of cognitive control: proactive and reactive. These modes are proposed to have dissociable signature neural coding schemes. Proactive control should rely heavily on goal-relevant coding schemes that originate in LPFC before target-stimulus onset as abstract rule or context coding, but which may morph into target coding after stimulus onset. In contrast, reactive control should rely on an incongruency-based coding scheme (including coding of whichever task dimensions are predictive of incongruency), originating post-target onset, with potential loci in DMFC or subcortical structures (Jiang et al., 2015; Chiu et al., 2017). As suggested here, it may be possible to measure correlates of these neural coding schemes via RSA. Experimental factors that encourage subjects to adopt one mode over another (e.g., strategy training, expectancy of difficulty) should correspondingly shift frontoparietal coding schemes along these proactive and reactive dimensions. Further, their behavioral relevance should predictably change, as well: for example, in task contexts in which a proactive control mode is theoretically maladaptive, subjects with stronger proactive coding should perform worse. Thus, the dual-mechanisms framework suggests a broad range of hypotheses amenable to testing with RSA methods (see, e.g., Hall-McMaster et al., 2019). Such hypotheses can be addressed in the broader Dual Mechanisms of Cognitive Control dataset (Braver et al., 2020), of which the data used here are a small subset.
Footnotes
This work was supported by National Institutes of Health Grant R37 MH066078 to T.S.B. Mensh and Kording (2017) was a useful resource for organizing an initial draft of this manuscript. Computations were performed in part using the facilities of the Washington University Center for High Performance Computing, which were supported in part by National Institutes of Health Grants 1S10RR022984-01A1 and 1S10OD018091-01. Surface images were prepared with Connectome Workbench (Marcus et al., 2011). The original copy of this manuscript was drafted with papaja (Aust and Barth, 2020) and knitr (Xie, 2015). We thank all former and current team members of the Dual Mechanisms of Cognitive Control Project for their efforts; Jo Etzel for general methodological wisdom; Atsushi Kikumoto for useful thoughts on our RSA models; the Cognitive Control and Psychopathology Laboratory for support and suggestions; and other Washington University in St. Louis colleagues in the J.M.B., Kool, and Zacks laboratories.
The authors declare no competing financial interests.
Correspondence should be addressed to Michael C. Freund at m.freund@wustl.edu