Abstract
Metacognition describes the process of monitoring one's own mental states, often for the purpose of cognitive control. Previous research has investigated how metacognitive signals are generated (metacognitive monitoring), for example, when people (both female and male) judge their confidence in their decisions and memories. Research has also investigated how metacognitive signals are used to influence behavior (metacognitive control), for example, setting a reminder (i.e., cognitive offloading) for something you are not confident you will remember. However, the mapping between metacognitive monitoring and metacognitive control needs further study at the neural level. We used fMRI to investigate a delayed-intentions task with a reminder element, allowing human participants to use their metacognitive insight to engage metacognitive control. Using multivariate pattern analysis, we found that we could separately decode both monitoring and control, and, to a lesser extent, cross-classify between them. Therefore, brain patterns associated with monitoring and control are partially, but not fully, overlapping.
SIGNIFICANCE STATEMENT Models of metacognition commonly distinguish between monitoring (how metacognition is formed) and control (how metacognition is used for behavioral regulation). Research into these facets of metacognition has often happened in isolation. Here, we provide a study which directly investigates the mapping between metacognitive monitoring and metacognitive control at a neural level. We applied multivariate pattern analysis to fMRI data from a novel task in which participants separately rated their confidence (metacognitive monitoring) and how much they would like to use a reminder (metacognitive control). We find support for the notion that the two aspects of metacognition overlap partially but not fully. We argue that future research should focus on how different metacognitive signals are selected for control.
Introduction
Our brains possess a remarkable ability to monitor performance and to then use metacognition to control future behavior. For example, if you have low confidence that you will remember a delayed intention (metacognitive monitoring [MetaM]) like regular medication intake, you might set a reminder on your phone (metacognitive control [MetaC]). This distinction between monitoring and control is found in the seminal metamemory framework by Nelson and Narens (1990; see also Flavell, 1976; Kluwe, 1982; Brown, 1987; Yeung et al., 2004; Efklides, 2008; Fletcher and Carruthers, 2012; Shea et al., 2014; S. M. Fleming and Daw, 2017; Fleur et al., 2021), which proposes that cognition functions at two distinct levels: the object and the meta level (Fig. 1A). Information at the object level about decisions, memories, attention, action, and so forth is re-represented at the meta level via a process of MetaM. Meanwhile, information at the meta level controls processing at the object level (MetaC). Shimamura's (2000) dynamic filtering theory extends the framework by Nelson and Narens (1990), ascribing the role of the object level to posterior cortical regions and the role of the meta level to PFC. The information flow between these regions forms the basis of MetaM and MetaC.
We are only slowly beginning to understand the neural mapping between MetaM and MetaC. This mapping or link describes the relationship that exists between MetaM and MetaC on a functional level: are these labels describing the identical process or two different computations with different inputs? This question is important because one rationale for studying MetaM is that it can provide insight into MetaC (e.g., Boldt and Yeung, 2015; Bang and Fleming, 2018; Gherman and Philiastides, 2018; Miyamoto et al., 2018; Odegaard et al., 2018; Shekhar and Rahnev, 2018; Ye et al., 2018; Masset et al., 2020; Wokke et al., 2020). This rationale would be strengthened if the mapping between the two were better understood. Furthermore, dissociations have been found between MetaM and MetaC. For example, in some circumstances, young children (Redshaw et al., 2018), obsessive-compulsive disorder patients (Vaghi et al., 2017), older adults (Dunlosky and Connor, 1997), and individuals with autism spectrum conditions (Grainger et al., 2016) have a diminished mapping between MetaM and MetaC, which could lead to suboptimal behavioral regulation. However, the potential neural substrates for this variability are unknown.
One of the reasons why the MetaM-MetaC mapping has received little attention is that the two aspects of metacognition are usually studied in isolation (but see Koriat et al., 2006, 2014; Son and Schwartz, 2009; Qiu et al., 2018; Mei et al., 2020; Schulz et al., 2021). Studies on MetaM commonly explore the variables that affect how confident people feel and the associated neural correlates. For example, neuroimaging studies have identified a widespread network of involved regions, including the rostrolateral PFC (Yokoyama et al., 2010; S. M. Fleming et al., 2012; Allen et al., 2017) and also the precuneus specifically for metamemory studies (e.g., Baird et al., 2013; McCurdy et al., 2013; Ye et al., 2018). Moreover, machine-learning techniques have been used to “decode” brain patterns associated with low versus high confidence, using both fMRI (Cortese et al., 2016; Hebart et al., 2016; Morales et al., 2018) and EEG (Boldt and Yeung, 2015). Research on MetaC, on the other hand, has focused on situations in which metacognitive experiences are used for learning, communication, or speed-accuracy trade-off, to name a few (e.g., Metcalfe and Finn, 2008; Bahrami et al., 2010; Shea et al., 2014; Guggenmos et al., 2016; Desender et al., 2019; Lak et al., 2020; Frömer et al., 2021).
Most of what we know about the link between monitoring and control comes from the field of cognitive control and error monitoring. Electrophysiological correlates have been found that not only signal when an error has been committed but are also sensitive to performance fluctuations on correct trials (Allain et al., 2004; Yeung et al., 2004). Such monitoring of errors often results in slower responses immediately after a mistake, a robust and often-replicated phenomenon termed post-error slowing (Rabbitt, 1966; Notebaert et al., 2009; Danielmeier and Ullsperger, 2011). In addition to errors, conflict signals appear to be monitored by the posterior medial frontal cortex, including the dorsal anterior cingulate cortex (dACC). The lateral PFC (lPFC) is thought to receive this input and implement cognitive control (Ridderinkhof et al., 2004). It should be noted that participants are often not aware of such errors or response conflicts and that these studies do not directly measure metacognitive signals. Nevertheless, evidence from this domain suggests that similar brain regions support MetaM and MetaC. Qiu et al. (2018) conducted four elegant fMRI experiments using a decision–redecision paradigm: each stimulus was presented twice in a row, and participants gave a response and a confidence rating after each presentation. They reasoned that participants would engage MetaM for their initial response and use MetaC to revise and improve decisions in the redecision phase. Their analyses revealed an involvement of dACC in the first response and lFPC in the second. However, because the order of the decision-redecision phases was always the same, it is impossible to conclude whether the redecision phase really triggered more MetaC or whether the signal observed in lPFC was instead a “late” monitoring one. Another open question is whether MetaM and MetaC rely on similar representations.
In order to address these questions, it is necessary to study both aspects of metacognition in a single paradigm, which we did using a cognitive offloading task. Cognitive offloading is the use of physical action to reduce cognitive demand (e.g., setting external reminders rather than relying on internal memory). Previous research has demonstrated a MetaM-MetaC link whereby individuals are more likely to set reminders (MetaC) when they have low confidence in their memory abilities (MetaM), over and above the influence of their actual memory performance (Dunn and Risko, 2016; Risko and Gilbert, 2016; Hu et al., 2019; for a review see Gilbert et al., 2022). This robust pattern can be observed even when reminder setting is not explicitly instructed (Boldt and Gilbert, 2019) or when confidence is measured in an unrelated perceptual task (Gilbert, 2015). Here, we use a decoding approach to examine this link at a neural level.
Participants performed a delayed intention task where in separate blocks they engaged in MetaM (how confident am I that I will remember?) or MetaC (how much would I like a reminder?). This allowed us to answer two questions: (1) Do similar brain patterns characterize MetaM and MetaC? If so, (2) can the neural patterns that characterize specific acts of MetaC be exhaustively characterized in terms of their associated processes of MetaM? We answered these questions by examining cross-classification between MetaM and MetaC: the extent to which a classifier trained on one judgment can decode the other. Insofar as this is possible, this implies a shared neural code for MetaM and MetaC. But if cross-classification is weaker than decoding MetaM and MetaC individually, this implies that their neural bases do not overlap fully.
Materials and Methods
Participants
We trained 29 participants in a behavioral task during a first session. After reviewing their training data, 22 participants returned to the laboratory for a second MRI session 1-21 d later, excluding 7 participants (2 unsuited for MRI because of safety regulations, 2 had extreme staircase values, 3 were unavailable for a second session). Another participant was excluded after scanning because of excessive movement in the scanner. This resulted in a final sample of 21 participants, of which 15 were female and 6 were male. While we determined our sample size based on practical constraints and on available resources, the final sample size of N = 21 is nevertheless in accordance with previous MRI studies using similar methods (Hebart et al., 2016; Morales et al., 2018; Qiu et al., 2018). Participants were 20.3 years on average (18-26 years) and were paid £36 for their participation in both sessions (∼90 and 150 min). All participants were right-handed, had intact color vision, no uncorrected visual impairments, and had not been diagnosed with any psychiatric or neurologic disorders. All testing was approved by the local ethics committee, and participants gave informed consent before taking part in the study.
Experimental design
In order to investigate the extent to which neural patterns associated with MetaM and MetaC are similar or distinct, we had to study both aspects of metacognition within a single paradigm. Participants underwent short miniblocks of ongoing shape discrimination trials. For this ongoing task, participants had to quickly and accurately decide whether an array of colored shapes grouped around a fixation dot looked on average more like a circle or a square (De Gardelle and Summerfield, 2011) by pressing one of two buttons. The response categories were equally likely. During some of these miniblocks, participants also had to maintain a delayed intention to press a different button if the stimulus appeared in a target color (Fig. 1B). In approximately half of the miniblocks, participants were allowed to use a reminder (cognitive offloading) to support their prospective memory: the central fixation dot of the stimulus took on the target color for the duration of the miniblock. Instead of having to rely on their memory, participants could then simply wait for the color of the shapes to match the color of the fixation dot, making the fulfillment of the delayed intention much easier. There were 12 colors, placed equidistant in RGB space. Within each miniblock, colors were drawn without replacement. There was only one target color per miniblock, presented at the beginning of the miniblock, and its occurrence during the ongoing-task trials always terminated the miniblock.
The task comprised three within-subject experimental conditions (20% Baseline, 40% MetaM, and 40% MetaC; Table 1), each structured into miniblocks. A miniblock comprised presentation of a target color (except in the Baseline condition, which had no prospective-memory element), a single metacognitive rating or cursor placement, and then 3-7 ongoing-task trials. The number of trials per miniblock was drawn from an exponential distribution with a mean of µ = 1.1; in other words, shorter miniblocks were more frequent than longer miniblocks. Each of the eight blocks consisted of 94 shape trials spread unevenly across 40 miniblocks (Fig. 1C). The critical difference between our two key conditions was the metacognitive rating given about the target color before each miniblock. In the MetaC condition, participants reported how much they would want to set a reminder to help them remember this target color. The higher the rating, the greater the likelihood of receiving a reminder, which occurred on ∼50% of miniblocks. More specifically, ratings larger than the moving median of the past 20 MetaC ratings were assigned a reminder, whereas in miniblocks with ratings below this cutoff, participants had to rely on unaided memory. In the MetaM condition, participants reported their prospective confidence in remembering the target color. However, this rating had no influence on the likelihood of receiving a reminder, which occurred on a randomly selected 50% of miniblocks. In other words, the two conditions also differed in the relationship between participants' ratings and the provision of reminders: in the MetaM condition, ratings had no influence on whether or not participants received a reminder, whereas in the MetaC condition, which miniblocks contained a reminder was largely determined by participants' ratings. Therefore, in the MetaM condition, participants engaged in MetaM but did not exercise MetaC. In the MetaC condition, they exercised control to make a decision, which is known to be guided by metacognition (Gilbert, 2015; Boldt and Gilbert, 2019). However, they were not explicitly asked to make a direct metacognitive judgment.
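The two reminder-assignment rules described above can be sketched in Python. This is an illustrative sketch, not the task code (which ran in MATLAB): the function names, the 0-1 rating scale, the starting cutoff used before any MetaC ratings exist, taking the median over the preceding ratings only, and treating a rating exactly equal to the median as "no reminder" are all assumptions.

```python
import random
from statistics import median


def metac_reminders(ratings, window=20, initial_cutoff=0.5):
    """MetaC rule: a rating larger than the median of the preceding `window`
    MetaC ratings earns a reminder.

    Assumptions (not specified in the text): ratings lie on a 0-1 scale,
    `initial_cutoff` is used before any rating history exists, and a rating
    equal to the median does not earn a reminder.
    """
    history = []
    reminders = []
    for rating in ratings:
        cutoff = median(history[-window:]) if history else initial_cutoff
        reminders.append(rating > cutoff)
        history.append(rating)
    return reminders


def metam_reminders(n_miniblocks, rng):
    """MetaM rule: reminders on a randomly selected half of the miniblocks,
    independent of the confidence ratings."""
    n_reminder = n_miniblocks // 2
    flags = [True] * n_reminder + [False] * (n_miniblocks - n_reminder)
    rng.shuffle(flags)
    return flags


print(metac_reminders([0.2, 0.9, 0.1, 0.6]))  # [False, True, False, True]
print(sum(metam_reminders(32, random.Random(0))))  # 16
```

Because the MetaC cutoff tracks each participant's own recent ratings, roughly half of MetaC miniblocks receive a reminder regardless of a participant's overall rating level, which keeps the reminder rate comparable to the fixed 50% of the MetaM condition.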
In the Baseline condition, there was no target color and thus no prospective-memory component (and no need for a reminder). The rating participants were asked to give was therefore an “empty” one: a scale without labels was presented on screen together with a cursor and two small markers indicating where the cursor should be placed. Participants then had to move the cursor to the indicated position. In all three conditions, participants were instructed to move the cursor at least once to submit a rating.
Each block comprised only two of the three conditions, the Baseline condition together with either the MetaM or the MetaC condition, and blocks alternated between these two types. Within each block, conditions were predictable, that is, one Baseline miniblock was always followed by four miniblocks of the other condition. We determined the optimal order of conditions using simulations, allowing us to maximize the efficiency of our design. The main analysis window was the initial 7 s of each miniblock (presentation of the target color and rating). At the time of these prospective ratings, participants did not yet know whether they would receive a reminder. This kept our key contrast free of confounds, which would have been unavoidable had we chosen a retrospective confidence judgment, as is more common in the field. To increase the number of instances of this analysis window, we included partial miniblocks; that is, half of the time (20 miniblocks per run), the miniblock ended immediately after the rating without the need to perform any shape classification trials or search for the target.
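The fixed structure of a run can be sketched as follows. This is a hypothetical Python illustration: the function name is invented, and because the text does not describe how the simulations determined the final order, the sketch simply draws the 20 partial miniblocks at random.

```python
import random


def run_sequence(meta_condition, n_miniblocks=40, rng=None):
    """Return (condition, is_partial) for each miniblock of one run.

    One Baseline miniblock is followed by four miniblocks of the run's
    metacognitive condition ('MetaM' or 'MetaC'); half of all miniblocks are
    partial, i.e., they end right after the rating.
    Assumption: partial miniblocks are drawn uniformly at random.
    """
    rng = rng or random.Random()
    conditions = ["Baseline" if i % 5 == 0 else meta_condition
                  for i in range(n_miniblocks)]
    partial = set(rng.sample(range(n_miniblocks), n_miniblocks // 2))
    return [(cond, i in partial) for i, cond in enumerate(conditions)]


# Eight runs alternating between MetaM and MetaC blocks (which condition
# comes first would be counterbalanced across participants).
runs = [run_sequence("MetaM" if r % 2 == 0 else "MetaC", rng=random.Random(r))
        for r in range(8)]
first = runs[0]
print(sum(c == "Baseline" for c, _ in first), sum(p for _, p in first))  # 8 20
```

With 40 miniblocks per run, this yields 8 Baseline and 32 metacognitive miniblocks per run, which across alternating MetaM and MetaC runs matches the 20%/40%/40% split of conditions.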
The study comprised two sessions. The purpose of the first session was assessment of MRI safety, completion of a prestudy questionnaire on how much participants liked the 12 colors used in the task, and training in the behavioral task (presented in MATLAB using Psychtoolbox-3; Kleiner et al., 2007). Participants first completed eight practice blocks, each introducing them to a new aspect of the paradigm. They then completed four experimental runs that were identical to the task they would have to complete while in the scanner, each lasting ∼9 min. During the second session, participants first completed two practice blocks outside of the scanner (each lasting ∼5 min) to remind them of the task before they completed eight runs in the scanner, with a 6 min T1 scan between the fourth and fifth runs. One participant completed only six runs because of feeling unwell inside the scanner. Because this resulted in an unbalanced design, we excluded this participant from all multivariate analyses.
At the end of the second session, participants also filled in a post-experiment questionnaire in which they rated again how much they liked each color and how difficult they found it, and answered several additional questions to determine whether they perceived the MetaM and MetaC conditions as similar, how much control they felt they had during these conditions, how they used the reminders depending on whether or not they had asked for them, and how they approached each rating. The orientation of the rating scales was flipped halfway through the experiment to avoid confounding visuomotor processes with low versus high ratings. The order of scale orientations, response keys for the shape task, and the order of the conditions were counterbalanced across participants.
MRI data collection and preprocessing
We used a 1.5 T Siemens Avanto scanner with a 32-channel head coil and MRI-safe button boxes. We acquired T1-weighted structural images as well as T2*-weighted EPI (64 × 64; 3.2 × 3.2 × 3.2 mm voxels) with BOLD contrast. We used a multiband acquisition sequence with acceleration factor = 3, TE = 54.8 ms, flip angle = 75°, to record 39 interleaved, axial slices (3.2 mm thick, oriented approximately to the anterior commissure–posterior commissure plane). This allowed us to cover most of the brain with an effective repetition time of 1.3 s per volume. The phase-encoding direction was anterior to posterior. Functional scans were acquired in eight runs, each comprising 410 volumes (∼9 min). The first five volumes in each run were discarded to allow for T1 equilibration effects. Between the fourth and fifth functional runs, an ∼6 min T1-weighted MPRAGE structural scan was collected.
All preprocessing was done using SPM12 (https://www.fil.ion.ucl.ac.uk/spm/software/spm12/). The T1-weighted images were skull stripped, and their origin was set to the anterior commissure. We then realigned the EPI volumes, normalized them into 3 mm cubic voxels with fourth-degree B-spline interpolation using normalization parameters derived from segmentation of the coregistered structural scan, and smoothed them with an isotropic 8 mm FWHM Gaussian kernel.
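The FWHM of a Gaussian kernel relates to its standard deviation by σ = FWHM / (2√(2 ln 2)) ≈ FWHM / 2.355. The Python sketch below converts the 8 mm kernel used here, and the 4 mm kernel later applied to the searchlight maps, into σ in millimetres and in 3 mm voxels (the unit expected by voxel-based filters such as `scipy.ndimage.gaussian_filter`). It is a worked illustration of the formula, not SPM12's smoothing code.

```python
import math


def fwhm_to_sigma(fwhm):
    """Standard deviation of a Gaussian with the given FWHM (same units)."""
    return fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


VOXEL_MM = 3.0  # voxel size after normalization
for fwhm_mm in (8.0, 4.0):  # univariate kernel; kernel for searchlight maps
    sigma_mm = fwhm_to_sigma(fwhm_mm)
    print(f"FWHM {fwhm_mm:.0f} mm -> sigma {sigma_mm:.2f} mm "
          f"= {sigma_mm / VOXEL_MM:.2f} voxels")
# FWHM 8 mm -> sigma 3.40 mm = 1.13 voxels
# FWHM 4 mm -> sigma 1.70 mm = 0.57 voxels
```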
Statistical analysis
Analyses of behavioral data were conducted using R version 3.6.0 (Planting of a Tree) with the additional packages plyr, plotrix, Hmisc, R.matlab, viridis, effsize, raincloudplots, ggplot2, grid, gridExtra, and Rmisc. Statistical tests were two-sided unless stated otherwise. We report effect sizes as Cohen's d for t tests and as partial eta squared (ηp²) for ANOVAs. For the fMRI analyses, the volumes acquired during the eight runs were treated as separate time series. For each time series, the variance in the BOLD signal was decomposed with a set of regressors in a GLM. Three regressors coded for the target color presentation and the rating as a 7 s boxcar, separately for each miniblock and rating condition (in MetaM blocks: Baseline, low MetaM rating, and high MetaM rating; in MetaC blocks: Baseline, low MetaC rating, and high MetaC rating). Six additional regressors represented effects of no interest: (1) stimulus presentation as a stick function, separately for targets and nontargets; (2) the ongoing task, spanning from the onset of the first to the last shape stimulus of the miniblock, separately for miniblocks with and without a prospective-memory requirement (MetaM and MetaC vs Baseline); and (3) the time when the computer revealed to the participant whether they were allowed to use a reminder, as a stick function, separately for Reminder and Own Memory miniblocks. All regressors were convolved with a canonical hemodynamic response function. The regressors outlined above, along with six regressors representing residual movement-related artifacts and the mean over scans, comprised the full model for each run. The data and model were high-pass filtered at a cutoff of 1/128 Hz. Parameter estimates for each regressor were calculated from the least-squares fit of the model to the data.
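The core of this first-level model (boxcars convolved with a canonical HRF and fitted by least squares) can be sketched in Python with NumPy and SciPy. This is a simplified illustration, not SPM12: the double-gamma HRF only approximates SPM's canonical shape, the onsets and effect sizes are invented, discarding five volumes per run is assumed, and the nuisance regressors, movement parameters, and high-pass filter are omitted.

```python
import numpy as np
from scipy.stats import gamma

TR = 1.3        # effective repetition time (s)
N_SCANS = 405   # assumes 410 volumes per run minus 5 discarded volumes
BOX_S = 7.0     # target color + rating window (s)


def canonical_hrf(tr, length=32.0):
    """Double-gamma HRF sampled every `tr` s (gamma shapes 6 and 16,
    undershoot ratio 1/6, approximating SPM's default); unit sum."""
    t = np.arange(0.0, length, tr)
    hrf = gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0
    return hrf / hrf.sum()


def boxcar_regressor(onsets, duration, n_scans, tr):
    """Boxcar of `duration` s at each onset, convolved with the HRF."""
    frame_times = np.arange(n_scans) * tr
    box = np.zeros(n_scans)
    for onset in onsets:
        box[(frame_times >= onset) & (frame_times < onset + duration)] = 1.0
    return np.convolve(box, canonical_hrf(tr))[:n_scans]


def fit_glm(y, regressors):
    """Ordinary least squares with a constant term (the run mean) appended."""
    X = np.column_stack(regressors + [np.ones(len(y))])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


# Hypothetical onsets (s) for the three rating regressors of one run.
onsets = {"Baseline": [10, 100, 200, 300],
          "low": [40, 130, 230, 330],
          "high": [70, 160, 260, 360]}
regs = [boxcar_regressor(o, BOX_S, N_SCANS, TR) for o in onsets.values()]

# Noise-free simulated voxel time course with known betas and a mean of 10.
y = 0.5 * regs[0] + 1.0 * regs[1] + 2.0 * regs[2] + 10.0
beta = fit_glm(y, regs)
print(np.round(beta, 3))
```

With noise-free simulated data, the fit recovers the simulated betas; with real data, the same fit yields one parameter estimate per regressor and run, which is what the subsequent contrasts and decoding analyses operate on.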
Effects of interest were assessed in a random-effects analysis by first forming subject-specific contrasts subtracting the Baseline from each of the other two conditions. The resulting contrast images were entered into a repeated-measures ANOVA with nonsphericity correction (Friston et al., 2002), and the mean of the two contrasts served as a condition-agnostic selection contrast to identify a network of regions active in the rating task. Results are reported at a height threshold of p < 0.001 uncorrected, combined with an extent threshold determined by SPM12 to achieve p < 0.05 familywise error correction for multiple comparisons across the whole-brain volume. ROI analyses were conducted by extracting subject-specific contrast estimates from the resulting ROIs with the toolbox MarsBaR (Brett et al., 2002), then entering the resulting data into an ANOVA in R using the same correction procedure described above.
The logic behind the key analysis of our study was the following: Replicating and extending previous findings (Cortese et al., 2016; Hebart et al., 2016; Morales et al., 2018), we first trained separate classifiers to detect (1) whether participants were in a high or low confidence state (MetaM), and (2) whether they had high or low desire for a reminder (MetaC). These classifiers could then also be combined in a cross-classification analysis, testing whether a classifier trained on MetaM ratings can also predict MetaC ratings (and vice versa). Insofar as this cross-classification is possible, this suggests shared brain representations for both aspects of metacognition. Going one step further, we then compared within-category classification accuracy to cross-classification accuracy to distinguish between two possible patterns of results: If MetaM and MetaC are based on the exact same representational code, there should be no difference in classification accuracy. If, on the other hand, MetaM and MetaC share only partially overlapping patterns, within-category classification accuracy should be significantly higher than cross-classification accuracy, while cross-classification accuracy should still differ significantly from chance.
For the multivariate pattern analyses, we used The Decoding Toolbox (Hebart et al., 2015), applied to the beta images from the previously described general linear models (except that the models were refit to unsmoothed, unnormalized data and the MetaM and MetaC boxcar regressors were split into two regressors each using a median split on the respective metacognitive rating). Of our four separate decoding analyses, two drew the training and testing data from the same condition (low vs high ratings for the MetaM and MetaC conditions, respectively; defined by block-, condition-, and subject-wise median splits), whereas the other two cross-classified (train on low vs high MetaM ratings and test on high vs low MetaC ratings, and vice versa; the rating scale had to be flipped for MetaC because low confidence implies high desire for a reminder). For each of these analyses, a linear support vector machine was trained to discriminate between low and high ratings based on the patterns of BOLD activity across voxels. Given the alternating block design and the fact that the orientation of the scale was flipped halfway through the study, we had two low and two high rating images available for each training or testing fold, resulting in a twofold procedure (Fig. 1D).
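The four decoding analyses and the flipped MetaC labels can be sketched in Python as below. This is a hedged illustration rather than The Decoding Toolbox: to stay NumPy-only, a nearest-class-mean linear classifier stands in for the linear support vector machine, and the simulated "beta images", the function names, and the choice to keep cross-classification within the same twofold split are assumptions.

```python
import numpy as np


def aligned_label(condition, high_rating):
    """+1/-1 label on a common 'confidence' axis: high MetaM ratings and low
    MetaC ratings (little desire for a reminder) are both coded +1."""
    label = 1 if high_rating else -1
    return -label if condition == "MetaC" else label


def train_linear(X, y):
    """Nearest-class-mean linear classifier (a stand-in for a linear SVM)."""
    mu_pos, mu_neg = X[y == 1].mean(axis=0), X[y == -1].mean(axis=0)
    w = mu_pos - mu_neg
    b = -w @ (mu_pos + mu_neg) / 2.0  # boundary halfway between class means
    return w, b


def decode(X, y, fold, cond, train_cond, test_cond):
    """Twofold decoding: train on one fold of `train_cond`, test on the other
    fold of `test_cond`, average both directions; returns accuracy minus
    chance (0.5)."""
    accs = []
    for train_fold in (0, 1):
        tr = (fold == train_fold) & (cond == train_cond)
        te = (fold != train_fold) & (cond == test_cond)
        w, b = train_linear(X[tr], y[tr])
        accs.append(np.mean(np.sign(X[te] @ w + b) == y[te]))
    return float(np.mean(accs)) - 0.5


# Simulated beta images: two low and two high images per condition and fold,
# all sharing one hypothetical "confidence" pattern plus noise.
rng = np.random.default_rng(0)
n_vox = 50
pattern = rng.normal(size=n_vox)
rows, y, fold, cond = [], [], [], []
for f in (0, 1):
    for c in ("MetaM", "MetaC"):
        for high in (False, True):
            for _ in range(2):
                lab = aligned_label(c, high)
                rows.append(lab * pattern + 0.1 * rng.normal(size=n_vox))
                y.append(lab)
                fold.append(f)
                cond.append(c)
X, y, fold, cond = map(np.array, (rows, y, fold, cond))

results = {(tr_c, te_c): decode(X, y, fold, cond, tr_c, te_c)
           for tr_c in ("MetaM", "MetaC") for te_c in ("MetaM", "MetaC")}
print(results)
```

In this idealized simulation, MetaM and MetaC share a single pattern, so all four analyses (two within-condition, two cross-classification) reach ceiling; partially overlapping codes would instead show cross-classification above chance but below within-condition accuracy.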
We used a whole-brain searchlight approach (Kriegeskorte et al., 2006), meaning that for each voxel a separate support vector machine was built, fitted to the beta values within a sphere with a radius of 3 voxels (9.6 mm). This resulted in three-dimensional decoding accuracy maps in native space for each participant and analysis. Decoding accuracy is expressed relative to chance (i.e., with 50% subtracted, so that a value of 5% corresponds to an accuracy of 55%). These maps were then normalized into MNI space (using the same normalization parameters as the univariate analyses) and smoothed using a Gaussian kernel (FWHM, 4 mm). This kernel was half the width of the one used for the univariate analyses, to avoid excessive smoothing given that the searchlight analysis already imposes spatial smoothing on the data. The resulting images were entered into a one-sample t test in SPM12, identifying voxels with consistently above-chance decoding accuracy in a random-effects analysis. We note that the suitability of second-level t tests has been challenged for information-like measures, such as classification accuracy, where classifier performance can meaningfully be above, but not below, chance levels (Allefeld et al., 2016; Hirose, 2020). However, this characteristic does not apply to our two key hypothesis-testing analyses. For the cross-classification between MetaM and MetaC, high MetaM could predict either higher or lower MetaC. For the comparison between within- and cross-classification, accuracy for one classification could be higher or lower than the other. Therefore, in both cases, our statistical tests are valid because they are performed on data that can meaningfully take values both above and below zero.
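To make the searchlight geometry concrete, the Python sketch below enumerates the voxel offsets of a sphere with a radius of 3 voxels, assuming that a voxel belongs to the sphere when its centre lies within 3 voxel widths of the searchlight centre; The Decoding Toolbox's exact inclusion rule and its brain masking are not reproduced here. At the native 3.2 mm resolution, this radius corresponds to 9.6 mm.

```python
import itertools

import numpy as np


def sphere_offsets(radius):
    """Integer (x, y, z) offsets whose distance from the centre is <= radius."""
    r = int(np.floor(radius))
    offsets = [d for d in itertools.product(range(-r, r + 1), repeat=3)
               if d[0] ** 2 + d[1] ** 2 + d[2] ** 2 <= radius ** 2]
    return np.array(offsets)


def searchlight_indices(center, shape, offsets):
    """Voxel coordinates of the sphere around `center`, clipped to the volume."""
    coords = np.asarray(center) + offsets
    inside = np.all((coords >= 0) & (coords < np.asarray(shape)), axis=1)
    return coords[inside]


offsets = sphere_offsets(3)
print(f"{len(offsets)} voxels per sphere, radius {3 * 3.2:.1f} mm")
# 123 voxels per sphere, radius 9.6 mm
```

Spheres centred near the edge of the volume contain fewer voxels; in practice, searchlights are additionally restricted to a brain mask.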
In addition to the main multivariate pattern analyses described above, we conducted an ROI analysis analogous to the univariate one: we defined a condition-agnostic contrast (the mean of all four decoding analyses), extracted ROIs with significantly above-chance decoding accuracy, and entered the resulting classification accuracies into a repeated-measures ANOVA with factors ROI, training condition (MetaM/MetaC), and classification type (within-condition/cross-classification).
Results
Behavioral results
Our sample included 22 participants, one of whom was excluded because of excessive motion in the MRI scanner (for more details, see Materials and Methods). Participants performed the tasks with a high level of accuracy (mean shape-discrimination accuracy = 93.4%, SEM = 0.84%; nonsignificant shape bias, t(20) = 1.2, p = 0.25, d = 0.26; mean target-detection rate = 88.1%, SEM = 3.37%; note that chance target-detection accuracy would be 8.3%; false alarm rate = 0.8%). We decided against a direct manipulation of difficulty (e.g., spacing some colors closer to each other in color space), as any effect of confidence would then have been difficult to interpret because of its inherent confound with difficulty. Instead, we relied on natural fluctuations in confidence caused, for example, by individual color preferences or fatigue. Figure 2A shows that average unaided memory performance varied across colors, with some colors (e.g., the fourth color, a shade of green) associated with lower accuracy when participants had to remember them without a reminder. Moreover, this figure shows that participants did not all share the same color-difficulty profile; instead, individual participants found particular colors more difficult than others. Performance in the Baseline condition was high. Here, an indicator of compliance with instructions is participants' placement of the cursor between two thin lines marked on the scale (Fig. 2B). Participants reported that those lines were difficult to see in the scanning session. Nevertheless, their cursor locations peaked around the marked location and landed within the marked positions on approximately half the trials (δ = 47.6%).
We next established that the reminders aided participants in fulfilling the delayed intentions by comparing target-detection error rates for miniblocks in which participants had to use their own memory (fixation dot stayed white) with miniblocks in which they were allowed to use a reminder (fixation dot took on the target color; Fig. 2C). In both conditions, error rates were reduced when reminders could be used (F(1,20) = 20.5, p < 0.001, ηp² = 0.51; t > 3.4, p < 0.01, d > 0.37 when tested separately for the MetaM and MetaC conditions). Error rates did not differ significantly between conditions (F < 1), nor was there an interaction between the two factors (F < 1).
When asked explicitly after the experiment how similar they found the two conditions, participants rated them as similar but not identical (mean = 0.68 on a scale from 0 = totally different to 1 = exactly the same; min = 0.28; max = 0.98). Indeed, participants' perceptions of the two conditions differed in how much control they felt they had over the reminders. On a scale ranging from 0 = no control to 1 = full control, participants rated the MetaM condition with a mean of 0.32 (min = 0.00; max = 0.88) and the MetaC condition with a mean of 0.80 (min = 0.06; max = 0.98). This difference was significant (t(20) = 6.4, p < 0.001, d = 1.94), showing that participants grasped the key difference between the two conditions.
We also aimed to rule out that any condition differences found in the pattern classification analyses could be caused by behavioral differences in how participants approached the two types of rating. First, Figure 3A shows that the average ratings participants gave for each individual color were almost indistinguishable between metacognitive-monitoring and metacognitive-control ratings. Indeed, when we correlated, for each participant, the color-wise average MetaM and MetaC ratings, the mean correlation was r = 0.76, with 19 of 21 participants showing a significant positive correlation. Relatedly, the distributions of ratings and rating RTs were closely matched for the two types of rating (Fig. 3B,C). Importantly, participants did not receive any instructions to use the two scales in the same way (except for being asked to use the entire range of the scale in both cases).
Furthermore, neither of the metacognitive rating conditions showed a systematic relationship between ratings and accuracy. For retrospective confidence judgments, confidence and accuracy are commonly found to correlate; that is, participants express lower confidence on errors than on correct trials (confidence resolution or Type II sensitivity). In the MetaC condition, by contrast, participants' ratings triggered reminders, so one might expect the opposite pattern: miniblocks in which participants expressed a high need for a reminder should be the ones in which they were allowed to offload, and error rates should therefore be lower. However, we found no significant difference between correct- and error-trial ratings in any of the four cells (MetaC reminder, t(14) = 0.2, p = 0.88, d = 0.04; MetaC own memory, t(19) = 0.1, p = 0.95, d = 0.02; MetaM reminder, t(17) = 1.0, p = 0.34, d = 0.30; MetaM own memory, t(20) = 1.1, p = 0.28, d = 0.23; participants with missing data excluded from the respective analysis). We also correlated the dichotomous accuracy vector with the continuous rating for each of the four cells, separately for each participant. The distributions of these correlations are shown in the right panels of Figure 3D; none differed significantly from zero (t < 1.0, p > 0.32). Both the prospective nature of the ratings in the present task (for example, participants might have felt they needed to invest more effort in miniblocks in which they felt less confident or wanted a reminder more) and our unique offloading design could have reduced confidence resolution; importantly, however, this applied to both rating conditions.
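The per-participant accuracy-rating correlation described above can be sketched in Python. Correlating a 0/1 accuracy vector with continuous ratings via Pearson's r is equivalent to a point-biserial correlation. The function names and example values are hypothetical; cells in which a participant made no errors (so the correlation is undefined) are returned as NaN and dropped, mirroring the exclusion of participants with missing data.

```python
import numpy as np


def accuracy_rating_r(correct, ratings):
    """Pearson r between 0/1 accuracy and ratings (point-biserial r).
    Returns NaN when either vector is constant, e.g., no errors in a cell."""
    correct = np.asarray(correct, dtype=float)
    ratings = np.asarray(ratings, dtype=float)
    if correct.std() == 0 or ratings.std() == 0:
        return float("nan")
    return float(np.corrcoef(correct, ratings)[0, 1])


def one_sample_t(values):
    """t statistic and df for the mean of `values` against zero, NaNs dropped."""
    v = np.asarray(values, dtype=float)
    v = v[~np.isnan(v)]
    t = v.mean() / (v.std(ddof=1) / np.sqrt(len(v)))
    return float(t), len(v) - 1


# Hypothetical per-participant correlations for one cell (one undefined).
rs = [0.1, -0.1, 0.2, 0.0, float("nan")]
print(one_sample_t(rs))
```

A two-sided p value can then be obtained from the t distribution with the returned degrees of freedom (e.g., as `2 * scipy.stats.t.sf(abs(t), df)`).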
Univariate fMRI results
We first performed univariate analyses to identify brain regions activated by the requirement to encode new intentions and make metacognitive judgments about them. To this end, we averaged across the two metacognition conditions (MetaM and MetaC) and compared them with the Baseline condition, allowing us to identify ROIs activated by our task. After familywise-error correction for multiple comparisons, this contrast revealed seven regions showing increased BOLD signal in the metacognitive conditions (Table 2; Fig. 4A,B).
Within the seven ROIs, activity was then compared between the metacognition conditions. More specifically, activity was extracted from two separate contrasts (MetaM > Baseline and MetaC > Baseline) and then compared. This comparison is orthogonal to the initial selection contrast and therefore unbiased (Kriegeskorte et al., 2009). BOLD signal was higher for the MetaC than for the MetaM condition in all seven ROIs (Fig. 4C), and this main effect was significant in an ROI (7) × Condition (2: MetaC/MetaM) repeated-measures ANOVA (F(1,20) = 8.1, p = 0.01, ηp² = 0.29). There was furthermore a reliable main effect of ROI (F(7,140) = 7.8, p < 0.001, ηp² = 0.28) as well as a significant interaction of the two factors (F(7,140) = 3.4, p < 0.01, ηp² = 0.14), reflecting that both the overall signal change and the difference between conditions were larger in some ROIs than in others. Together, these results show that regions responding to the conditions requiring delayed intentions and metacognitive judgments were more active when participants rated how much they would like a reminder (MetaC) than when they rated how confident they were (MetaM).
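Why the ROI comparison is unbiased can be illustrated by checking that the selection and test contrast weights are orthogonal. This is a simplified sketch: it checks the weight vectors only and assumes a balanced design with independent, equal-variance condition estimates; in general, independence of selection and test also depends on the design matrix and noise structure (Kriegeskorte et al., 2009). The constant and function names are hypothetical.

```python
# Order of the condition estimates (betas) in each contrast vector.
CONDITIONS = ("MetaM", "MetaC", "Baseline")

# ROI selection: mean of the two metacognition conditions > Baseline.
SELECTION = (0.5, 0.5, -1.0)
# Test: (MetaC > Baseline) minus (MetaM > Baseline), i.e., MetaC > MetaM.
TEST = (-1.0, 1.0, 0.0)
# A biased alternative for comparison: testing MetaC > Baseline alone.
BIASED_TEST = (0.0, 1.0, -1.0)


def dot(u, v):
    """Inner product of two contrast weight vectors."""
    return sum(a * b for a, b in zip(u, v))


def apply_contrast(weights, betas):
    """Contrast value for a dict of condition betas (one ROI, one participant)."""
    return sum(w * betas[c] for w, c in zip(weights, CONDITIONS))


print(dot(SELECTION, TEST))         # prints 0.0 -> orthogonal to the selection
print(dot(SELECTION, BIASED_TEST))  # prints 1.5 -> shares variance with selection
```

A test contrast that is not orthogonal to the selection contrast would partly re-measure the effect that defined the ROI and inflate the comparison.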
We repeated the univariate analyses for deactivations, revealing six “task-negative” regions showing decreased signal in the conditions requiring delayed intentions and metacognitive judgments compared with Baseline (Fig. 5; Table 3). These regions included the cingulate and paracingulate cortices, supplementary motor area, supramarginal gyrus, middle and inferior temporal gyri, occipital gyri, and anterior cingulate gyrus. Within these task-negative ROIs, deactivation was numerically stronger when participants rated how confident they were (MetaM) than when they rated how much they would like a reminder (MetaC); however, BOLD signal did not differ significantly between the MetaC and the MetaM condition (F(1,20) = 1.3, p = 0.26, ηp² = 0.06). There was a reliable main effect of ROI (F(5,100) = 18.2, p < 0.001, ηp² = 0.48). The interaction was not significant (F < 1).
Multivariate fMRI results
The multivariate analyses allowed us to address our two key questions: (1) Do the brain patterns of different metacognitive experiences also distinguish different acts of control? (2) Can the neural patterns that characterize specific acts of MetaC be exhaustively characterized in terms of their associated metacognitive experiences? In a first analysis, we attempted to decode confidence (MetaM). Figure 6A,B shows the resulting decoding accuracy maps, corrected for chance level and for multiple comparisons. Nine clusters contained information predicting whether participants were in a low- or high-confidence state; these included the anterior cingulate gyrus, parietal occipital sulcus, central sulcus, superior parietal lobule, superior occipital gyrus, cuneus, precuneus, supplementary motor area, occipital fusiform gyrus, calcarine cortex, superior corona radiata, and precentral gyrus (Table 4).
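To make "decoding accuracy corrected for chance level" concrete, the sketch below decodes low versus high ratings from beta patterns with leave-one-run-out cross-validation. The classifier and cross-validation scheme are not specified in this section, so a simple nearest-centroid classifier is swapped in purely for illustration; searchlight mapping and multiple-comparison correction are omitted, and the data format and names are hypothetical.

```python
from math import dist


def centroid(patterns):
    """Voxel-wise mean of a list of equal-length activity patterns."""
    n = len(patterns)
    return [sum(values) / n for values in zip(*patterns)]


def loro_accuracy(samples, labels=("low", "high")):
    """Leave-one-run-out decoding accuracy with a nearest-centroid classifier.

    samples: list of (run, label, pattern) tuples, e.g., one beta pattern per
    run and rating level. Every training fold must contain each label.
    """
    runs = sorted({run for run, _, _ in samples})
    correct = total = 0
    for held_out in runs:
        train = [(lab, pat) for run, lab, pat in samples if run != held_out]
        test = [(lab, pat) for run, lab, pat in samples if run == held_out]
        centroids = {
            lab: centroid([pat for y, pat in train if y == lab]) for lab in labels
        }
        for true_label, pattern in test:
            # Assign the held-out pattern to the closest class centroid.
            predicted = min(labels, key=lambda lab: dist(pattern, centroids[lab]))
            correct += predicted == true_label
            total += 1
    return correct / total


def accuracy_minus_chance(samples, labels=("low", "high")):
    """Decoding accuracy relative to chance (1 / number of classes)."""
    return loro_accuracy(samples, labels) - 1 / len(labels)


# Toy two-voxel patterns from three runs (made up): "high" patterns are shifted.
toy = [(run, "low", [0.1 * run, 0.0]) for run in range(3)]
toy += [(run, "high", [1.0, 1.0 - 0.1 * run]) for run in range(3)]
print(accuracy_minus_chance(toy))  # prints 0.5 (perfect decoding, 1.0 - 0.5)
```

Holding out whole runs keeps training and test patterns independent, which is what allows accuracy above 0.5 to be read as information about the rating level.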
We then repeated the equivalent analysis for the MetaC condition, again successfully decoding whether participants gave a low or high rating (i.e., desire for a reminder) from five clusters, including the occipital pole, lateral occipital cortex, superior parietal lobule, superior frontal gyrus (medial segment), and middle temporal gyrus (Table 4). Together, these analyses show that the neuroimaging data contain meaningful patterns that distinguish both different metacognitive experiences (low vs high confidence) and different acts of MetaC (low vs high desire for a reminder).
Having established the existence of meaningful patterns across the brain that distinguish different levels of both MetaM and MetaC, we could then ask whether it was possible to cross-classify the two aspects of metacognition. More specifically, we trained classifiers to distinguish low- from high-confidence beta images (MetaM) and tested them on predicting high versus low MetaC ratings. Because an inverse relationship is expected between MetaM and MetaC ratings (i.e., low confidence predicts high desire for a reminder, and vice versa), one of the scales was inverted for this analysis. Above-chance classification accuracy can then be interpreted as evidence for overlapping patterns encoding both MetaM and MetaC. The same analysis was also applied in the opposite direction (train on MetaC, test on MetaM). We found overlapping patterns encoding the two types of metacognitive rating, but only in the latter direction (train on MetaC, test on MetaM) did classification accuracy remain above chance after correction for multiple comparisons. The surviving cluster was located in the left superior and middle frontal gyri. These findings show that brain patterns associated with different metacognitive experiences (low vs high confidence) also distinguish different acts of MetaC (low vs high desire for a reminder).
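The role of the scale inversion in cross-classification can be sketched as follows. As in any illustration here, the nearest-centroid classifier, the (label, pattern) data format, and all names are hypothetical stand-ins rather than the pipeline used in this study; the point is only that training labels from one condition are mapped onto the other condition's scale before testing.

```python
from math import dist

# MetaC labels refer to desire for a reminder; MetaM labels refer to confidence.
# Low confidence is expected to go with high desire for a reminder, so one
# scale is inverted before cross-classifying.
INVERT = {"low": "high", "high": "low"}
LABELS = ("low", "high")


def centroid(patterns):
    """Voxel-wise mean of a list of equal-length activity patterns."""
    n = len(patterns)
    return [sum(values) / n for values in zip(*patterns)]


def cross_classify(train, test, invert_train_labels=True):
    """Train a nearest-centroid classifier on one condition, test on the other.

    train, test: lists of (label, pattern) tuples with labels in LABELS.
    """
    if invert_train_labels:
        train = [(INVERT[label], pattern) for label, pattern in train]
    centroids = {
        lab: centroid([pat for y, pat in train if y == lab]) for lab in LABELS
    }
    hits = 0
    for true_label, pattern in test:
        predicted = min(LABELS, key=lambda lab: dist(pattern, centroids[lab]))
        hits += predicted == true_label
    return hits / len(test)


# Toy patterns (made up): high desire for a reminder (MetaC) resembles
# low confidence (MetaM).
meta_c = [("high", [1.0, 1.0]), ("low", [0.0, 0.0])]
meta_m = [("low", [0.9, 1.0]), ("high", [0.1, 0.0])]
print(cross_classify(meta_c, meta_m))                             # prints 1.0
print(cross_classify(meta_c, meta_m, invert_train_labels=False))  # prints 0.0
```

Swapping the arguments, cross_classify(meta_m, meta_c), gives the opposite direction (train on MetaM, test on MetaC). Without the inversion, a shared but inversely signed representation would show up as below-chance rather than above-chance accuracy.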
To address our second key question, we compared classification accuracy between the two types of classification analysis described above: within-condition classification (train on MetaM and test on MetaM; train on MetaC and test on MetaC) versus across-condition classification (i.e., cross-classification: train on MetaC and test on MetaM; train on MetaM and test on MetaC). We first performed a condition-blind analysis by averaging across all four decoding analyses. This identified, in an unbiased manner, ROIs containing information in one or more of the analyses, yielding significant effects in the occipital pole, middle occipital gyrus, parietal cortex (superior parietal lobule, precuneus), superior frontal gyrus, middle frontal gyrus, and precentral gyrus (Table 5; Fig. 6C,D). Within the resulting ROIs, classification accuracies in the four analyses could then be compared (Fig. 6E) to test whether decoding accuracy differed significantly between the within-condition classification and the cross-classification analyses. Analogous to our univariate analysis, these comparisons were unbiased because they were orthogonal to the analysis used to define the ROIs. We entered the classification accuracies from these regions into a repeated-measures ANOVA with the factors ROI, training condition (MetaM/MetaC), and classification type (within-condition classification/between-condition cross-classification). There was a significant main effect of classification type (F(1,19) = 6.2, p = 0.02, ηp² = 0.25), with higher accuracy for within-condition classification than for between-condition cross-classification. This finding can be interpreted as partially overlapping neural representations of MetaM and MetaC, as opposed to perfect overlap between the patterns associated with the two aspects of metacognition. There was no main effect of training condition or of ROI (F values < 1).
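Because classification type has only two levels and is a within-participant factor, its main effect in the repeated-measures ANOVA is equivalent to a paired t test on participant-wise within-minus-cross differences, with F(1, n − 1) = t². The sketch below shows this equivalence under the assumption that each participant's accuracies have already been averaged across ROIs; the dictionary keys and function name are hypothetical, and the accuracies are made up.

```python
from math import sqrt
from statistics import mean, stdev


def classification_type_effect(participants):
    """Main effect of classification type (within vs cross) as a paired t test.

    participants: list of dicts, one per participant, holding decoding
    accuracies already averaged across ROIs. Hypothetical keys:
      "M->M": train MetaM, test MetaM    "C->C": train MetaC, test MetaC
      "C->M": train MetaC, test MetaM    "M->C": train MetaM, test MetaC
    Returns (t, F, df_error), where F = t**2 equals the ANOVA F(1, df_error).
    """
    # Averaging over training condition gives the marginal means per participant.
    diffs = [
        (p["M->M"] + p["C->C"]) / 2 - (p["C->M"] + p["M->C"]) / 2
        for p in participants
    ]
    t = mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))
    return t, t ** 2, len(diffs) - 1


# Made-up accuracies for three participants.
toy = [
    {"M->M": 0.60, "C->C": 0.60, "C->M": 0.58, "M->C": 0.58},
    {"M->M": 0.60, "C->C": 0.60, "C->M": 0.56, "M->C": 0.56},
    {"M->M": 0.60, "C->C": 0.60, "C->M": 0.54, "M->C": 0.54},
]
t, f, df = classification_type_effect(toy)
print(f"t({df}) = {t:.2f}, F(1,{df}) = {f:.1f}")  # prints t(2) = 3.46, F(1,2) = 12.0
```

The interactions with ROI reported below require the full factorial model and are not covered by this shortcut.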
We found a significant interaction between ROI and classification type (within- vs cross-condition classification; F(6,114) = 2.4, p = 0.03, ηp² = 0.11), reflecting that the difference between within-condition and across-condition decoding was larger in some ROIs than in others. No other interactions were significant (F values < 1). In sum, while our results demonstrate overlapping patterns between MetaM and MetaC, they also suggest that patterns of MetaC cannot be exhaustively characterized by the associated patterns of MetaM observed when participants report their confidence.
Discussion
MetaM is only valuable insofar as it can subsequently influence control, and MetaC can only occur if there are metacognitive representations to begin with, which can then be used to adjust future behavior. The two processes must therefore be intimately related, yet the mapping between them requires further study, especially at the neural level. Here we report three main findings: (1) MetaM and MetaC can each be decoded separately; (2) brain patterns of different levels of metacognitive monitoring (low vs high confidence) also distinguish different acts of MetaC (low vs high desire for a reminder); and (3) this overlap in patterns, while significant, is only partial. These findings suggest that patterns of brain activity corresponding to specific acts of MetaC are partially, but not fully, characterized by associated acts of MetaM.
Our cross-classification analysis revealed involvement of the left superior and middle frontal gyri, which form part of the laPFC, in both MetaM and MetaC. The role of the laPFC in metacognition has already been highlighted by previous studies, suggesting a role in domain-general metacognition (Morales et al., 2018; see also Vaccaro and Fleming, 2018), in the readout of sensory information as an input for confidence signals (Shekhar and Rahnev, 2018), and, more broadly, a role of more rostral parts of the laPFC in mediating metacognitive accuracy (S. M. Fleming et al., 2010; Rounis et al., 2010). Crucially, the laPFC has also been implicated in MetaC (Qiu et al., 2018; for review, see Shimamura, 2000; S. M. Fleming and Dolan, 2014; Seow et al., 2021), consistent with its more general proposed involvement in cognitive control (MacDonald et al., 2000; Ridderinkhof et al., 2004). Our study therefore extends this growing body of research implicating the laPFC in metacognition and cognitive control.
Because MetaC could not be characterized exhaustively in terms of the MetaM judgments we investigated, the obvious question arises which other signals might contribute to MetaC. We consider two main possibilities. The first possibility is that nonmetacognitive signals also play a role in influencing MetaC. A wide variety of signals may be relevant here, such as motivation, the costs and rewards associated with different levels of performance, serial dependencies, fatigue, and states of interoceptive and bodily awareness reflecting endogenous signals like arousal (Allen et al., 2016; Hauser et al., 2017; Rouault et al., 2018). This influence of nonmetacognitive signals on MetaC was already acknowledged in the seminal paper by Nelson and Narens (1990) introducing their metamemory framework. The influence of a wide variety of signals on control is also central to an influential model from the cognitive control literature, the Expected Value of Control model (EVC; Shenhav et al., 2013). This model emphasizes the flexibility with which different control signals are selected, based on the costs and benefits associated with these signals. The model proposes that the dACC integrates both costs and benefits to form the expected value of control. Because MetaC may involve the integration of multiple relevant signals, including the products of MetaM as well as additional nonmetacognitive signals, this could potentially explain the greater univariate signal we observed for the MetaC than for the MetaM condition, suggesting that additional processes contribute to the MetaC judgment beyond those involved in MetaM. We also note that classification type (within- vs cross-classification) interacted significantly with region, although there was no main effect of region, suggesting that the overlap between MetaM and MetaC is greater in some regions than in others.
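For reference, the central quantity of the EVC model can be written as follows, following Shenhav et al. (2013), with s a candidate control signal, σ the current state, and o_i the possible outcomes. Any mapping onto the present task is a hypothetical illustration, not part of the model or of our analyses: s could stand for relying on a reminder versus one's own memory, the outcome probabilities would partly reflect confidence signals of the kind measured in the MetaM condition, and the Value and Cost terms would carry nonmetacognitive signals such as those listed above.

```latex
\begin{aligned}
\mathrm{EVC}(s, \sigma) &= \Big[\sum_i \Pr(o_i \mid s, \sigma)\,\mathrm{Value}(o_i)\Big] - \mathrm{Cost}(s)\\
s^{*} &= \arg\max_{s}\, \mathrm{EVC}(s, \sigma)
\end{aligned}
```

On this reading, two situations with identical confidence can yield different control decisions whenever the value or cost terms differ.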
A second possible contribution to the MetaC condition is the integration of additional metacognitive signals beyond the confidence judgment required by the MetaM condition. In our task, for instance, participants' desire for a reminder might have been influenced not only by confidence in their prospective memory but also by confidence in their perceptual judgments. Consistent with this, behavioral evidence suggests that confidence judgments are influenced by a variety of domain-general and domain-specific signals (Gilbert, 2015; Kantner et al., 2018; Rouault et al., 2018). Confidence can be regarded as an explicit representation of uncertainty, and uncertainty exists at multiple levels throughout the brain (as proposed by the Bayesian brain hypothesis; Knill and Pouget, 2004). Therefore, the metacognitive signals measured in the MetaM condition probably form only a subset of the metacognitive signals that may have contributed to MetaC judgments.
Our paradigm involved measurement of only a single MetaC judgment, which may have been influenced by multiple MetaM signals. In reality, there are multiple types of both MetaM and MetaC. Take, for example, a foreign language student studying for a test at her desk during the early evening hours. The student reads a word on a flashcard, and we can assume she has access to two relevant metacognitive signals: on the one hand, the certainty with which the word is perceived in the waning light; on the other, the certainty with which the word is recognized from memory. The former should guide her decision whether to switch on her desk lamp; the latter should guide her decision whether to place the flashcard on the pile marked “restudy.” Moreover, the same confidence signal can lead to opposite consequences depending on the situation, as shown by Carlebach and Yeung (2020). The authors report that low confidence leads to advice-seeking when the quality of the advice is known and high. However, when the quality of the advice is unknown, people tend to seek advice especially when their confidence is high, in order to test the accuracy of the advisor. How, then, does the brain “harvest” these various confidence signals and route them to the appropriate act(s) of MetaC? How does it flexibly switch to a different set of signals when required to do so? How are metacognitive signals weighted by past rewards, and how do such weightings shift when our goals change? Questions such as these could potentially be addressed by adapting the present paradigm to a situation involving two or more forms of MetaM and MetaC.
The key finding of our study was the cross-classification between MetaM and MetaC. At a whole-brain corrected threshold, this analysis produced a significant effect in only one direction (train on MetaC, test on MetaM). Whether this reflects a genuine asymmetry in cross-classification or simply a thresholding artifact remains unclear and could be addressed in future work. Our finding of successful cross-classification is in line with the notion that metacognition should be regarded as a cornerstone of cognitive control. More than two decades ago, this point was made prominently by Fernandez-Duque et al. (2000), who drew parallels between metacognitive and executive control functions. Similarly, Yeung and Summerfield (2012, 2014) have suggested that error monitoring, as it is commonly studied in the cognitive-control literature, constitutes an inverse, binary measure of graded confidence. It is therefore not surprising that decision confidence is tracked by a well-established electrophysiological marker of error monitoring, the error positivity (Boldt and Yeung, 2015). Other empirical examples of links between metacognition and cognitive control are the findings that metacognitive efficiency correlates with cognitive control ability (Drescher et al., 2018) and that confidence modulates the speed-accuracy trade-off on a trial-by-trial basis, with participants prioritizing accuracy over response speed after a low-confidence decision (Desender et al., 2019). The latter effect is reminiscent of post-error slowing (Rabbitt, 1966; Jentzsch and Dudschig, 2009; Danielmeier and Ullsperger, 2011), one of the most extensively studied effects in the cognitive control literature.
Our findings bear some interesting parallels to another recent decoding study. Mei et al. (2020) reported the results of two behavioral experiments, each focused on a different type of prospective decision (the belief that one will successfully classify a visual stimulus vs the decision whether to attend to the stimulus on the upcoming trial). The authors found that the data from one experiment (awareness ratings, confidence ratings, and accuracy on previous trials) could be used to predict the prospective decision in the other experiment, and vice versa. This cross-classification analysis therefore highlights similarities between MetaM (in this case, beliefs about successfully classifying the upcoming stimulus) and MetaC (in this case, the decision to attend), showing that both aspects of metacognition can be predicted from the same behavioral precursors.
Despite the theoretical distinction between the two facets of metacognition and the different labels assigned to our conditions, the conceptual distinction between them is not as straightforward as it may seem. For example, our MetaM condition might still be considered to involve an act of MetaC, in the sense that participants need to use their metacognitive knowledge to control the placement of the cursor on the scale to indicate low versus high confidence. We suggest that the key distinction between the conditions is that MetaM involves a relatively direct read-out of metacognitive (e.g., confidence) signals, whereas MetaC involves using these signals to inform more complex behaviors rather than to report the metacognitive experience itself. However, because metacognitive reports are, at least to some degree, inferential in nature (Koriat, 1993), MetaM and MetaC might be better seen as end points of a continuum than as dichotomous processes.
In conclusion, our study delineates the similarities and divisions between neural correlates of MetaM and MetaC. Ultimately, understanding the link between monitoring and control could inform interventions, such as metacognitive training in conditions including brain injury (J. Fleming et al., 2017), schizophrenia (Moritz and Woodward, 2007), and obsessive-compulsive disorder (Fisher and Wells, 2008). We propose that a full understanding of the relationship between monitoring and control will require a focus on the ways in which distinct metacognitive signals are integrated and selectively routed to appropriate acts of MetaC.
Footnotes
This work was supported by the Wellcome Trust, who awarded Sir Henry Wellcome Postdoctoral Fellowship 206480/Z/17/Z to A.B.; and the Economic & Social Research Council, who awarded Research Grant ES/N018621/1 to S.J.G. Neither of these funding bodies played a role in the conceptualization, design, data collection, analysis, decision to publish, or preparation of the manuscript. For the purpose of Open Access, the author has applied a CC BY public copyright license to any Author Accepted Manuscript version arising from this submission. We thank the MetaOffloading laboratory for help with the MRI data collection; Pei-Chun Tsai for help with anatomical labeling; and Carsten Allefeld and Kai Görgen for useful discussions regarding the data analysis.
The authors declare no competing financial interests.
Correspondence should be addressed to Annika Boldt at a.boldt@ucl.ac.uk