Abstract
Previous studies have shown that self-generated stimuli in auditory, visual, and somatosensory domains are attenuated, producing decreased behavioral and neural responses compared with the same stimuli that are externally generated. Yet, whether such attenuation also occurs for higher-level cognitive functions beyond sensorimotor processing remains unknown. In this study, we assessed whether cognitive functions such as numerosity estimations are subject to attenuation in 56 healthy participants (32 women). We designed a task allowing the controlled comparison of numerosity estimations for self-generated (active condition) and externally generated (passive condition) words. Our behavioral results showed a larger underestimation of self-generated compared with externally generated words, suggesting that numerosity estimations for self-generated words are attenuated. Moreover, the linear relationship between the reported and actual number of words was stronger for self-generated words, although the ability to track errors about numerosity estimations was similar across conditions. Neuroimaging results revealed that numerosity underestimation involved increased functional connectivity between the right intraparietal sulcus and an extended network (bilateral supplementary motor area, left inferior parietal lobule, and left superior temporal gyrus) when estimating the number of self-generated versus externally generated words. We interpret our results in light of two models of attenuation and discuss their perceptual versus cognitive origins.
SIGNIFICANCE STATEMENT We perceive sensory events as less intense when they are self-generated compared with when they are externally generated. This phenomenon, called attenuation, enables us to distinguish sensory events from self and external origins. Here, we designed a novel fMRI paradigm to assess whether cognitive processes such as numerosity estimations are also subject to attenuation. When asking participants to estimate the number of words they had generated or passively heard, we found a larger underestimation in the former case, providing behavioral evidence of attenuation. Attenuation was associated with increased functional connectivity of the intraparietal sulcus, a region involved in numerosity processing. Together, our results indicate that the attenuation of self-generated stimuli is not limited to sensory consequences but also impacts cognitive processes such as numerosity estimations.
Introduction
The ability to distinguish self-generated versus externally generated stimuli is crucial for self-representation (Kircher and David, 2003; Legrand, 2006). A typical mechanism by which stimuli generated by oneself and those caused by external sources are distinguished is sensory attenuation, whereby self-generated stimuli are perceived as less intense. Indeed, previous studies have shown that self-produced stimuli in the auditory (Baess et al., 2011; Timm et al., 2014), visual (Hughes and Waszak, 2011; Benazet et al., 2016), and somatosensory (Shergill et al., 2013; Kilteni and Ehrsson, 2020) domains, are perceived as less intense compared with the same stimuli when they are externally generated. Such attenuation was shown at the behavioral level and at the neural level in sensory cortical regions (e.g., auditory cortex; Rummell et al., 2016; Whitford, 2019) as well as the thalamus, cerebellum, supplementary motor area (SMA), and inferior parietal cortex (Hickok, 2012; Lima et al., 2016; Bansal et al., 2018; Brooks and Cullen, 2019).
Previous studies have shown that attenuation not only applies to stimuli that are generated by overt actions, but also extends to covert processes such as motor imagery (Kilteni et al., 2018) or inner speech (Scott et al., 2013; Whitford et al., 2017). In the latter case, attenuation of the electrophysiological (Whitford et al., 2017; Jack et al., 2019) and behavioral (Scott et al., 2013) responses corresponding to the test stimulus (phoneme) was observed when it matched the imagined cued phoneme. These studies used phonemes that are integrated into the early stages of hierarchical speech processing implying primary sensory cortices (Liebenthal et al., 2005; DeWitt and Rauschecker, 2012), where attenuation effects have been demonstrated (Rummell et al., 2016). An outstanding question is whether the attenuation of self-generated stimuli is limited to sensory consequences (e.g., weakened percept) or whether it can also impact cognitive processes that are nonperceptual in nature such as abstract judgments or performance monitoring.
To answer this question, we investigated the cognitive function of numerosity estimations, defined as approximate judgments when counting is not involved (Dehaene, 1997). Properties of numerosity estimations such as innateness, amodality, or precision that linearly decreases with increasing numerosity have been described (Anobile et al., 2016; Burr et al., 2018). Extensive neuroimaging work has also established that brain areas in the intraparietal sulcus (IPS) region play a key role in numerosity processing (for review, see Arsalidou and Taylor, 2011). If attenuation for self-related functions extends to higher-level cognitive processes such as numerosity estimations, we would expect the number of items that are self-generated to be underestimated compared with externally generated items in relation to modulation of IPS activity. Thus, this study aimed at investigating whether numerosity estimations of self-generated words were attenuated compared with numerosity estimations of passively heard words, thereby demonstrating that self-attenuation applies to cognitive processes beyond sensory processing.
To investigate which brain regions and networks showed activity related to attenuation processes, we designed a functional magnetic resonance imaging (fMRI) paradigm allowing the controlled comparison of numerosity estimations of either self-generated words (active condition) during a phonetic verbal fluency task or externally generated words (passive condition) while passively listening to a stream of words. Additionally, we asked participants to evaluate the error they could have made in their estimation (performance monitoring). Assuming that attenuation occurs during numerosity estimations (active condition), we predicted that participants would underestimate the number of self-generated versus externally generated words and explored possible effects on performance monitoring. As performance monitoring is better for self-initiated versus observed processes (Pereira et al., 2020), we expected to find a stronger relationship between the reported and actual number of words in the active versus passive condition. At the neural level, considering that sensory attenuation of self-generated stimuli involves the corresponding sensory brain regions (e.g., primary auditory cortex for attenuated sounds; Rummell et al., 2016; Whitford, 2019), we expected to find a reduced BOLD signal during the active condition versus passive condition in areas responsible for numerosity processing including the IPS.
Materials and Methods
Participants.
Three independent participant groups were tested in this study. First, we performed a behavioral pilot experiment in a mock scanner, where we tested 17 participants (6 women; age range, 18–28 years; mean age, 23 years; 95% CI = 23, 24; schooling level, between 13 and 21 years; mean schooling level, 17 years; 95% CI = 16, 18). During the main fMRI experiment, we studied 25 participants (14 women; age range, 18–37 years; mean age, 23 years; 95% CI = 22, 26; schooling level, between 12 and 22 years; mean schooling level, 17 years; 95% CI = 16, 18). An additional control experiment in the mock scanner was performed, where 14 participants (12 women) were tested (age range, 18–29 years; mean age, 24 years; 95% CI = 22, 26; schooling level, between 10 and 23 years; mean schooling level, 16 years; 95% CI = 14, 18). All participants were right handed according to the Edinburgh Hand Preference Inventory (Oldfield, 1971) and native French-speaking healthy volunteers with no history of neurologic or psychiatric disease and no recent reported history of drug use. Participants had normal or corrected-to-normal vision and no claustrophobia. All participants were naive to the purpose of the study, gave informed consent in accordance with institutional guidelines and the Declaration of Helsinki, and received monetary compensation (20 ₣/h). The study was approved by the local ethical committee of the canton of Geneva (protocol ID, 2015–00092).
Experimental task.
The experiment was performed in a mock scanner environment (both for the behavioral pilot experiment and the training for the main fMRI experiment) and in the MRI scanner (main fMRI experiment). The task consisted of phonetic verbal fluency and passive word-listening parts, followed by numerosity and error estimations (Fig. 1). The task comprised two conditions (20 trials each), during which participants either covertly generated words (active condition) or listened to prerecorded words (passive condition). Each trial started with a randomly jittered intertrial interval varying between 4 and 4.75 s, followed by an audio cue (2 s) indicating whether the active or the passive condition would follow, and a recorded cue letter (2 s). Next, the word generation phase lasted between 20 and 35 s; in the active condition, participants had to covertly generate words starting with the cued letter (Fig. 1, top). For each word they covertly generated, participants were asked to press a response button, which gave us access to the actual number of generated words. In the passive condition, a series of prerecorded audio words was played to the participants, who were asked to press the response button for every word that contained the cue letter (Fig. 1, bottom). We avoided asking participants to detect words starting with the cued letter, as this could have influenced word generation in the next active trials with the same cued letter. In both conditions, the end of the generation/listening phase was indicated by an audio cue (0.5 s). After that, participants first reported their estimate of the total number of words generated (active condition) or heard (passive condition). For this, they used two buttons that moved a slider displayed on the screen. The slider was initialized at a random integer (range, 0–20), which could be changed in value by pressing the response box buttons (left button, decrease the value; right button, increase the value).
This was followed by an error estimation, where participants were asked to evaluate their performance on the numerosity estimation by estimating the magnitude of the error they thought they might have made (in number of words). For this, an automatically sliding bar was presented, and the participant had to select the desired value with one button press (e.g., ±2 words error). Values varied from ±0 words (i.e., numerosity response judged as correct) to ±5 words. The total time for numerosity and error estimations was restricted to a maximum of 7 s. Except for the numerosity and error estimation periods, participants were asked to perform the task with eyes closed.
Figure 1-1
Data simulation of behavioral performance measures. Data were simulated with pseudorandom generation of an actual number of words and numerosity estimations ranging between 5 and 20. Error estimation varied between 0 and 5 as set during the task design. A, Simulated numerosity performance. Increasing intensity of red represents a linear increase of overestimation (positive values of numerosity performance), while increasing intensity of blue represents a linear increase of underestimation (negative values of numerosity performance). B, Simulated performance monitoring when the number of words is constant (10 or 15 words). The performance monitoring value is depicted in the heatmap changing from blue (being correct) to red (worse performance monitoring).
Stimuli.
All stimuli were prepared and presented using MATLAB 2016b (MathWorks) and the Psychtoolbox-3 Toolbox (psychtoolbox.org; Brainard, 1997; Pelli, 1997; Kleiner et al., 2007). Twenty different cue letters were used for active and passive conditions. The same cue letter was used once during the active and once during the passive condition in counterbalanced order. Words played back during the passive condition were chosen from a list of 420 French words (Ferrand and Alario, 1998). The audio stimuli presented during the task were recorded by male and female native French speakers in a neutral manner and saved in wav format with an 11,025 Hz sampling frequency. The gender of the voice pronouncing words in the passive condition was matched to the participant's gender. During the experiment, participants were equipped with MRI-compatible earphones and response buttons for the right hand.
Procedure.
Participants were trained in a mock scanner before the main fMRI experiment to familiarize themselves with the task. They were asked to perform four trials of the task during the training (twice for each condition), with the cue letters “j” and “k.” These letters were not used later during the main fMRI experiment. We note that participants were instructed by the experimenter not to count or use any strategy to try to remember the number of words generated/heard while performing numerosity and error estimations.
The main experiment consisted of three runs lasting ∼15 min each, with short breaks in between. The total duration of a trial varied between 40 and 55 s because of the pseudorandomized duration of the word generation/listening phase. This time variability was introduced to avoid habituation and predictability for the number of generated/heard words and to decorrelate hemodynamic activity related to word generation/listening and numerosity estimations. The experiment was designed in 10 blocks with four trials of the same condition per block. Each block of active condition trials preceded a block of the passive condition, allowing us to use the number and pace of generated words, recorded from the participant's button presses, to play back the words during the next block of the passive condition. The order of the number of words played with their corresponding pace during the passive condition was shuffled within the block. This was done to ensure that participants could not recognize whether the number and pace of the played words matched the preceding active condition block. Additionally, to avoid repetition effects between blocks, the cue letters between two consecutive blocks were different, meaning that the cue letters in a given active block were not used in the following passive block.
After the main experiment, a standard phonemic verbal fluency test (generation time, 60 s; cue letter, “p”; Lezak, 1995) was performed overtly to verify that subjects understood the task correctly. Overall, the experiment lasted ∼1 h and 30 min (MRI session) and 1 h in the mock scanner (pilot session). The pilot mock scanner study contained the same procedure as the main MRI experiment, except for the shorter breaks between the runs since there was no scanning involved.
The sample size for the main fMRI experiment was based on a recent study from our group in which similar conditions were contrasted (e.g., judgment of self-generated vs observed decisions; Pereira et al., 2020).
Control experiment.
An additional control experiment in the mock scanner was performed to rule out the possibility that repetition effects occurred between consecutive blocks. It consisted only of passive condition trials, which were divided into “random” and “repeated” blocks. The random block trials were designed by pseudorandomly generating the number of words played (mean, 10 words/block; range, 6–20 words/trial) and their playing pace (total time of playing, between 20 and 35 s) to match the main fMRI experiment. Each repeated block directly followed a random block and used the same number of words and word-playing pace as that preceding random block, but in a shuffled order. This control experiment consisted of two random and two repeated blocks, with four trials/block, and lasted ∼30 min.
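The block structure of the control experiment can be illustrated with a minimal sketch. The function names, the dictionary layout, and the uniform sampling of word counts and durations are assumptions made for illustration; the text does not specify the sampling distribution, and uniform draws over 6–20 words would not reproduce the reported mean of 10 words.

```python
import random


def make_random_block(rng, n_trials=4, min_words=6, max_words=20,
                      min_dur=20.0, max_dur=35.0):
    """Draw a word count and a total playback duration for each trial.

    Uniform sampling is an assumption; the paper only gives the ranges.
    """
    return [
        {"n_words": rng.randint(min_words, max_words),
         "duration_s": rng.uniform(min_dur, max_dur)}
        for _ in range(n_trials)
    ]


def make_repeated_block(rng, random_block):
    """Reuse the trials of the preceding random block in a shuffled order."""
    repeated = [dict(trial) for trial in random_block]
    rng.shuffle(repeated)
    return repeated


# Two random blocks, each followed by its repeated counterpart.
rng = random.Random(0)
blocks = []
for _ in range(2):
    rand_block = make_random_block(rng)
    blocks.append(("random", rand_block))
    blocks.append(("repeated", make_repeated_block(rng, rand_block)))
```

Because each repeated block copies both the word count and the duration of a trial, the playback pace (words per second) is preserved while the trial order changes.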
Behavioral performance measures.
Most statistical frameworks to analyze performance monitoring have been developed for discrimination tasks with a binary response (for review, see Fleming and Lau, 2014). In the following, we propose two indices of numerosity performance and performance monitoring to analyze ordinal data (e.g., the number of words). In addition to the prerequisites described below, these indices were defined at the single-trial level, so they could serve as parametric regressors of interest in the fMRI analysis.
The numerosity performance index is a normalized accuracy ratio reflecting how correctly participants estimated the number of words during the generation/listening phase. For each trial, we wanted signed numerosity performance to be proportional to the difference between the reported number of generated/heard words [numerosity estimation (N)] and the actual number of words generated/heard (W). We normalized this difference by the sum of numerosity estimation and the actually generated/heard number of words (N + W; Eq. 1) to give more weight to errors made about low numbers of words (e.g., an error of ±2 given a numerosity estimation of 8 has higher magnitude than an error of ±2 given a numerosity estimation of 16; Extended Data Fig. 1-1A). This normalization allowed us to assess attenuation effects independent of the number of generated words, given that the precision of numerosity estimation linearly decreases with an increasing number of items (Piazza et al., 2004). Negative numerosity performance values thus reflected an underestimation of generated/heard words, and positive values reflected an overestimation of generated/heard words. In contrast, null numerosity performance values reflected correct answers about the number of generated/heard words, as follows:
\[ \text{Numerosity performance} = \frac{N - W}{N + W} \tag{1} \]
In addition to the numerosity performance index, we derived separate measures of accuracy (N – W) and accuracy ratio [(N – W)/W] to assess the generalizability of our findings.
Performance monitoring reflected how well participants estimated an error about their previous performance. We defined it as the absolute value of the difference between the error estimation (E) and accuracy (N – W), normalized by the sum of the numerosity estimation and words generated/heard (N + W; Eq. 2). Normalization was performed to consider the difficulty; the same error made when estimating the low number of words or high number of words should be penalized proportionally. A performance monitoring value of 0 reflected ideal error tracking, whereby participants correctly estimated the error made during the numerosity estimation. An increase in performance monitoring value represented an increase in error magnitude while estimating the difference between numerosity estimation and the actual number of words generated/heard (Extended Data Fig. 1-1B), as follows:
Data cleaning was performed before statistical analysis: trials for which participants did not generate at least five words or failed to answer numerosity or error estimations within the time limit were excluded from behavioral and fMRI analysis (in total, 2.44 ± 1.87 trials/subject were excluded). The threshold of five words was selected according to the working memory capacity of 5 ± 2 items (Cowan, 2010). Considering all participants' data, 6.1% of all trials were discarded.
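As a minimal sketch, the two single-trial indices and the exclusion rule can be written as follows. The function and argument names are hypothetical. The numerosity performance function follows the definition in the text, (N − W)/(N + W). For performance monitoring, the sketch assumes that accuracy enters as its magnitude |N − W|, because the error estimation E was reported as an unsigned magnitude (±0 to ±5 words); this reading is an assumption rather than a confirmed detail of the original analysis.

```python
def numerosity_performance(n_reported, n_actual):
    """Signed, normalized estimation error: (N - W) / (N + W).

    Negative values indicate underestimation, positive values overestimation.
    """
    return (n_reported - n_actual) / (n_reported + n_actual)


def performance_monitoring(n_reported, n_actual, error_estimate):
    """Normalized mismatch between the estimated and the actual error.

    Assumption: the actual error is taken as |N - W| because E is unsigned.
    A value of 0 corresponds to ideal error tracking.
    """
    actual_error = abs(n_reported - n_actual)
    return abs(error_estimate - actual_error) / (n_reported + n_actual)


def keep_trial(n_actual, answered_in_time, min_words=5):
    """Exclusion rule: at least five words and both answers given in time."""
    return n_actual >= min_words and answered_in_time
```

For example, reporting 8 words when 10 were generated gives a numerosity performance of −2/18 (about −0.11), whereas reporting 16 when 18 were generated gives −2/34 (about −0.06), so the same absolute error weighs more at low word counts.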
Behavioral data analysis.
All continuous variables (numerosity performance, accuracy, accuracy ratio, and performance monitoring) were analyzed using linear mixed-effects regressions with condition (“active”, “passive”, or “random”, “repeat” for the control experiment) as a fixed effect and a random intercept by participant and condition. The inclusion of additional random effects was guided by model comparison and selection based on Bayesian information criteria. Analyses were performed using the lme4 (Bates et al., 2015) and lmerTest (Kuznetsova et al., 2017) packages in R (https://www.r-project.org/). The significance of fixed effects was estimated using Satterthwaite's approximation for degrees of freedom of F statistics (Luke, 2017).
Bayesian analysis was performed using the brms package (Bürkner, 2017) in R to examine the occurrence of possible repetition effects between random and repeated passive blocks in the control experiment. Namely, we conducted a Bayesian linear model of numerosity performance with condition (random, repeated) as a fixed effect and a random intercept by participant and condition with four chains of 10,000 iterations including 2000 warmup samples. We made a prior assumption that underestimation would be observed in the passive condition (prior with Gaussian distribution of mean = −0.032 and SD = 0.156), based on the difference between numerosity performance during the active and passive conditions observed in the main fMRI experiment.
Data simulation of absolute accuracy during numerosity estimations.
To examine whether the absolute accuracy (correct numerosity estimations of words generated/heard) of numerosity estimations observed in the experimental data may be approximated by a noisy sampling process rather than counting, we performed data simulations. More specifically, we simulated 1000 numerosity estimations (as the main experiment contained 40 trials/subject × 25 subjects) from a normal distribution centered on 10, with an SD varying from 0.5 to 6 in steps of 0.1. We rounded each simulated estimation to the nearest integer and counted how many times the number 10 was obtained. The simulation was iterated 10,000 times.
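The simulation described above can be sketched with the Python standard library. The function names, the seed, and the reduced default number of iterations are illustrative assumptions (the text reports 10,000 iterations, which is slow in pure Python); `expected_proportion` is an added analytic cross-check, not part of the described procedure.

```python
import math
import random


def proportion_exact(sd, rng, n_estimates=1000, target=10):
    """Share of rounded draws from N(target, sd) that equal the target."""
    hits = sum(
        1 for _ in range(n_estimates)
        if round(rng.gauss(target, sd)) == target
    )
    return hits / n_estimates


def simulate(sds, n_iterations=100, n_estimates=1000, seed=0):
    """Average the proportion of exact hits over iterations for each SD."""
    rng = random.Random(seed)
    results = {}
    for sd in sds:
        props = [proportion_exact(sd, rng, n_estimates)
                 for _ in range(n_iterations)]
        results[sd] = sum(props) / len(props)
    return results


def expected_proportion(sd):
    """Analytic probability that a normal draw rounds to its own mean."""
    return math.erf(0.5 / (sd * math.sqrt(2)))


# SD grid from 0.5 to 6.0 in steps of 0.1 (56 values).
sds = [round(0.5 + 0.1 * i, 1) for i in range(56)]

# Small demonstration run on three SD values only.
example = simulate([1.0, 2.0, 3.0], n_iterations=10)
```

For an SD of 2, the analytic probability is about 0.197, close to the 19.5% reported for the simulated data.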
fMRI data acquisition.
MRI data were acquired using a Siemens Magnetom Prisma 3 T scanner with a 64-channel head coil. T1-weighted (1 mm isotropic) scans were acquired using an MPRAGE (magnetization-prepared rapid acquisition gradient echo) sequence [TR = 2300 ms; TI = 900 ms; TE = 2.25 ms; flip angle = 8°; GRAPPA (generalized autocalibrating partial parallel acquisition) = 2; FOV = 256 × 256 mm; 208 slices]. Functional scans were obtained using an echoplanar imaging (EPI) sequence (multiband acceleration = 6; TR = 1000 ms; TE = 32 ms; flip angle = 58°; FOV = 224 × 224 mm; matrix = 64 × 64; slice thickness = 2 mm; 66 slices). The number of functional image volumes varied according to the experiment duration (2278 ± 61 volumes).
fMRI data preprocessing.
Anatomical and functional images were processed and analyzed using SPM-12 (Wellcome Center for Human Neuroimaging). Preprocessing steps included slice time correction, field map distortion correction, realignment and unwarping to spatially correct for head motion and distortions, coregistration of structural and functional images, normalization of all images to common Montreal Neurologic Institute (MNI) space, and spatial smoothing with a Gaussian kernel with a full-width at half-maximum of 4 mm. Quality assurance of all EPI images was performed with the criteria of a maximum of 2 mm translation and 2° rotation between volumes. In addition, excessive movement was assessed with the mean framewise displacement (FD; Power et al., 2012) with an exclusion threshold of 0.5 mm. None of the subjects had a mean FD (0.2 ± 0.06 mm) above this threshold.
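For illustration, the framewise displacement criterion can be sketched as follows, using the definition of Power et al. (2012): the sum of absolute volume-to-volume changes in the six realignment parameters, with rotations converted to millimeters on a sphere. The input layout (three translations in mm followed by three rotations in radians), the 50 mm sphere radius, and the inclusion of the first volume (FD = 0) in the mean are assumptions; the function names are hypothetical.

```python
def framewise_displacement(motion_params, radius_mm=50.0):
    """Framewise displacement per volume.

    motion_params: sequence of 6-tuples (tx, ty, tz in mm; rx, ry, rz in
    radians), one per volume. Rotations are converted to arc length on a
    sphere of radius_mm. The first volume is assigned an FD of 0.
    """
    fd = [0.0]
    for prev, curr in zip(motion_params, motion_params[1:]):
        translation = sum(abs(c - p) for p, c in zip(prev[:3], curr[:3]))
        rotation = sum(abs(c - p) * radius_mm
                       for p, c in zip(prev[3:], curr[3:]))
        fd.append(translation + rotation)
    return fd


def mean_fd(motion_params, radius_mm=50.0):
    """Mean framewise displacement across all volumes of a run."""
    fd = framewise_displacement(motion_params, radius_mm)
    return sum(fd) / len(fd)


def exceeds_threshold(motion_params, threshold_mm=0.5):
    """True if the mean FD is above the exclusion threshold."""
    return mean_fd(motion_params) > threshold_mm
```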
fMRI data analysis.
We used a two-level random-effects analysis. In the first-level analysis, condition-specific effects were estimated according to a general linear model (GLM) fitted for each subject. An average mask of gray matter from all subjects was built using FSL (https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/) and was used to mask out white matter and nonbrain tissues. The GLM was built using six boxcar regressors corresponding to the duration of the word generation/listening phase, numerosity estimation and error estimation in the active and passive conditions. Parametric modulators of numerosity performance and performance monitoring were included in the numerosity estimation and error estimation regressors, respectively. Further, we added regressors of no interest corresponding to audio instructions, button presses, and excluded trials, plus six regressors for head motion (translation and rotation).
At the second level (group level), we performed a one-way ANOVA with F tests to assess the main effects common to active and passive conditions and t tests to analyze the difference between conditions (active vs passive) for each regressor of interest: numerosity estimation and error estimation. We used a voxel-level statistical threshold of p < 0.001 and corrected for multiple comparisons at the cluster level using familywise error (FWE) correction with a threshold of p < 0.05. We used the automated anatomical labeling atlas for brain parcellation (Tzourio-Mazoyer et al., 2002).
Functional connectivity analysis.
Psychophysiological interaction (PPI) analysis was used to identify modulations of functional coupling between a seed region and other brain regions by experimental conditions (active vs passive; Friston et al., 1997). To perform this analysis, we used the generalized PPI toolbox version 13.1 (McLaren et al., 2012). Spheres of 6 mm radius were formed around the peak coordinates of the right IPS cluster (x = 29; y = −65; z = 50) and the left IPS cluster (x = −27; y = −66; z = 47) that were identified in the second-level analyses. First-level (individual) GLM analyses were performed, including task regressors of numerosity estimation or error estimation (psychological term) and the time course of the seed region (physiological term). As for the other fMRI analyses described above, we performed t tests to compare the differences between conditions.
Data availability.
The MRI data that support the findings of this study are available in a public repository on Zenodo (https://doi.org/10.5281/zenodo.4925909). Behavioral data, analysis, and task codes are available as follows: https://gitlab.epfl.ch/lnco-public/cognitive-attenuation.
Results
Behavioral results
By design, the number of generated and heard words was matched between conditions (active: mean = 10.7; SD = 3.6; 95% CI = 10.4, 11.1; passive: mean = 10.7; SD = 3.5; 95% CI = 10.4, 11.0; F(1,884) = 0.005; p = 0.94). The duration of word generation and listening did not differ between conditions either (active: mean = 27.4; SD = 5.6; 95% CI = 26.9, 27.9; passive: mean = 28.4; SD = 7.0; 95% CI = 27.7, 29.0, F(1,26) = 2.04; p = 0.16). Having shown that conditions were similar in terms of difficulty, we turned to the analysis of numerosity performance and performance monitoring.
Numerosity performance indices revealed that globally, participants underestimated the number of words (mean = −0.05; SD = 0.11; 95% CI = −0.06, −0.05). This underestimation was significantly larger (F(1,24) = 5.85; p = 0.023; 18 of 25 subjects showed the effect) in the active condition (mean = −0.07; SD = 0.11; 95% CI = −0.08, −0.06) compared with the passive condition (mean = −0.04; SD = 0.12; 95% CI = −0.05, −0.03; Fig. 2A). Comparable results were obtained during the pilot experiment in the mock scanner in an independent group of subjects (active: mean = −0.05; SD = 0.01; 95% CI = −0.06, −0.03; passive: mean = −0.03; SD = 0.10; 95% CI = −0.04, −0.02; F(1,16) = 5.80; p = 0.016). These two experiments suggest that numerosity estimations for self-generated words are attenuated.
Figure 2-1
Mixed-effects linear regression between numerosity estimation (reported number of words) and the actual number of words depicted for each participant. The dashed reference line with a slope equal to 1 represents ideal performance. Thick lines represent the individual model fit per condition (active, red; passive, blue). Each dot represents a participant's single-trial performance (overlapping dots are depicted in higher color intensity).
We further show that the larger underestimation in the active compared with the passive condition was also found when considering the measures of accuracy (active: mean = −1.48; SD = 2.38; 95% CI = −1.69, −1.26; passive: mean = −0.94; SD = 2.59; 95% CI = −1.17, −0.71; F(1,24) = 4.43; p = 0.045) and accuracy ratio as dependent variables (active: mean = 0.89; SD = 0.20; 95% CI = 0.87, 0.91; passive: mean = 0.95; SD = 0.23; 95% CI = 0.93, 0.97; F(1,24) = 7.44; p = 0.011).
We then assessed the linear relationship between the reported (numerosity estimation) and actual number of words (Fig. 2B). In addition to a main effect of words (F(1,908) = 1042; p < 0.001) and condition (F(1,198) = 8.25; p = 0.0045), we found an interaction between condition and generated/heard words (F(1,802) = 4.3; p = 0.038). This interaction was driven by a steeper slope in the active condition (mean = 0.62; 95% CI = 0.57, 0.67) compared with the passive condition (mean = 0.55; 95% CI = 0.50, 0.60; Fig. 2C), which indicates better numerosity tracking for self-generated words during the main fMRI experiment (a slope of value 1 reflecting ideal performance). We note that this interaction did not reach significance in our pilot experiment (F(1,350) = 1.10; p = 0.66), which calls for interpreting this result with caution.
Next, we investigated whether performance monitoring varied between conditions, by quantifying how well participants were able to track the error made when estimating the number of generated/heard words. We found no differences in performance monitoring (F(1,24) = 1.09; p = 0.3) between the active condition (mean = 0.07; SD = 0.06; 95% CI = 0.07, 0.08) and the passive condition (mean = 0.08; SD = 0.06; 95% CI = 0.07, 0.09) during the main fMRI experiment or during the mock scanner pilot experiment (F(1,16) = 0.78; p = 0.39; active: mean = 0.07; SD = 0.06; 95% CI = 0.06, 0.08; passive: mean = 0.06; SD = 0.06; 95% CI = 0.06, 0.07). Thus, we found no evidence for an effect of attenuation on performance monitoring.
Finally, we conducted a control experiment for word generation outside the scanner, during which participants had to perform a standard verbal fluency task overtly with the cue letter “p.” Comparing the number of words generated, starting with the letter “p” overtly (outside the scanner) and covertly (during the scanning), we did not observe any significant differences between the number of words generated overtly (mean = 12.6; SD = 3.79; 95% CI = 10.9, 14.1) and covertly (mean = 12.3; SD = 3.96; 95% CI = 10.6, 13.9; paired t test: t(24) = −0.4; p = 0.69). This control experiment confirms that subjects did generate words covertly and comparably to overt fluency, as instructed.
Control experiment
The global underestimation observed in the main experiment was replicated (mean = −0.06; SD = 0.11; 95% CI = −0.08, −0.05). This underestimation was not different (F(1,117) = 0.03; p = 0.87) between random blocks (mean = −0.06; SD = 0.12; 95% CI = −0.09, −0.04) and repeated blocks (mean = −0.07; SD = 0.12; 95% CI = −0.09, −0.04), with a Bayes factor BF10 = 0.09, supporting the absence of a difference between conditions.
Data simulation of absolute accuracy during numerosity estimations
As can be seen in Figure 3, the value of 10 occurred in 19.5% of all data points when drawn from a normal distribution with SD = 2, similar to the observed empirical data.
fMRI results
Numerosity performance
Brain activity during the numerosity estimation phase was widespread, regardless of the experimental condition (Extended Data Table 1-1). Differences between conditions revealed widespread relative deactivations in the active compared with the passive condition in the bilateral parietal cortex, including IPS, middle-superior temporal gyri, precuneus, cerebellum, middle cingulate gyri, SMA, insula, middle-superior frontal gyri, hippocampus, caudate nucleus, and putamen (Table 1, detailed list of all areas).
Table 1-1
Numerosity estimation: main effect (active + passive conditions). Brain areas activated during the numerosity estimation phase independent of parametric modulation. Voxel level, p < 0.001, uncorrected; cluster threshold at p < 0.05, FWE corrected. BA, Brodmann area; k, cluster size; R, right hemisphere; L, left hemisphere.
To avoid contaminating the results with inherent differences between the active and passive conditions that are not specific to numerosity processes, we looked for brain activity parametrically modulated by numerosity performance. We found a single brain region with such a pattern of activity, namely the right IPS, whose activity was negatively correlated with numerosity performance (main effect: F = 20.0; pFWE = 0.042; cluster size = 148; MNI coordinates: x = 29, y = −65, z = 50; Fig. 4A,B). The same pattern was found bilaterally when using a less stringent threshold (peak level uncorrected, p < 0.005; Fig. 4C, Table 2). Contrary to our prediction, IPS activation did not differ between the active and passive conditions (pFWE > 0.05), suggesting that this effect was independent of whether words were heard or actively generated.
Using PPI analysis, we investigated whether the functional connectivity of the right IPS with other brain regions differed between the active and passive conditions. This analysis revealed that bilateral SMA, left inferior parietal lobule (IPL), and left superior temporal gyrus (STG) had increased connectivity with the right IPS in the active compared with the passive condition (Fig. 5, Table 3). No effect in the other direction was observed.
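The logic of a PPI contrast can be illustrated with a toy example. The sketch below is not the analysis pipeline of the study: a real PPI is fit voxelwise within a GLM, typically after deconvolving the seed signal, whereas here the interaction is simply computed as the difference between the seed-to-target regression slopes of the two conditions (which is what the interaction coefficient of a full model with a 0/1 condition regressor amounts to). All signals are synthetic, and all names and coupling values are hypothetical.

```python
import random


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate_target(seed_signal, coupling, noise_sd, rng):
    """Synthetic target signal: scaled seed signal plus Gaussian noise."""
    return [coupling * s + rng.gauss(0.0, noise_sd) for s in seed_signal]


rng = random.Random(42)
n_scans = 500

# Synthetic seed-region signal (standing in for the right IPS) per condition.
ips_active = [rng.gauss(0.0, 1.0) for _ in range(n_scans)]
ips_passive = [rng.gauss(0.0, 1.0) for _ in range(n_scans)]

# Synthetic target-region signal with an assumed stronger coupling
# in the active (0.8) than in the passive (0.3) condition.
target_active = simulate_target(ips_active, 0.8, 0.5, rng)
target_passive = simulate_target(ips_passive, 0.3, 0.5, rng)

# Interaction effect: condition-dependent change in seed-target coupling.
slope_active = ols_slope(ips_active, target_active)
slope_passive = ols_slope(ips_passive, target_passive)
ppi_effect = slope_active - slope_passive
print(f"active: {slope_active:.2f}, passive: {slope_passive:.2f}, "
      f"difference: {ppi_effect:.2f}")
```

A positive difference in this toy setting corresponds to stronger coupling with the seed region in the active than in the passive condition.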
Performance monitoring
In addition to numerosity performance, we also quantified brain activity during the error estimation phase. We found widespread cortical and subcortical activations during this phase compared with baseline, regardless of the experimental condition (Extended Data Table 4-1). Moreover, bilateral insula and right putamen were significantly less active during the active condition compared with the passive condition (Fig. 6, Table 4). The left caudate nucleus showed the opposite pattern, with higher activity in the active condition than in the passive condition.
Table 4-1
Error estimation: main effect (active + passive conditions). Brain areas with activations during the error estimation phase, independent of parametric modulation. Voxel level, p < 0.001, uncorrected; cluster threshold at p < 0.05, FWE corrected. BA, Brodmann area; k, cluster size; R, right hemisphere; L, left hemisphere.
To avoid confounding factors between the active and passive conditions, we looked more specifically at parametric modulations of error monitoring. We observed that the BOLD signal in the left IPS was more related to performance monitoring in the active condition compared with the passive condition (t = 4.02; pFWE = 0.041; kvox = 198; MNI coordinates: x = −27, y = −66, z = 47; Fig. 7). Interestingly, this activation cluster spatially overlapped with the observed activation cluster related to the main effect of numerosity performance when applying a lower statistical threshold (exploratory analysis: peak level uncorrected, p < 0.005; cluster level, pFWE < 0.05; Fig. 4C).
Last, results from the PPI analysis using the left IPS as a seed ROI did not reveal any functional connectivity differences between the active and passive conditions during error estimation.
Discussion
The present study examined whether the attenuation of self-generated stimuli impacts cognitive processes beyond the traditionally investigated sensory processes, namely numerosity estimations and performance monitoring. To this end, we developed an experimental paradigm allowing the controlled comparison of numerosity estimations and performance monitoring regarding self-generated and externally generated words while acquiring fMRI data. We found that participants more strongly underestimated the number of self-generated compared with externally generated words, providing behavioral evidence that numerosity estimations are indeed subject to attenuation. As expected, numerosity performance was associated with hemodynamic activity changes in IPS. Furthermore, a network including the bilateral SMA, left IPL, and left STG showed increased functional connectivity with the right IPS during numerosity estimations for self-generated words, suggesting that numerosity-related attenuation involves this neural network. Finally, by asking participants to monitor the accuracy of their own numerosity estimations, we found equivalent performance monitoring for self-generated and externally generated words. Although no difference was found at the behavioral level, we found that performance monitoring was associated with increased IPS activity in the active condition versus the passive condition.
Behavioral and neural markers of attenuated numerosity estimations
Participants underestimated the number of words they generated or heard. Such underestimations have previously been described when perceptual quantities (e.g., number of dots or sequences of sounds) are mapped onto discrete measures (e.g., Arabic numerals; Castronovo and Seron, 2007; Reinert et al., 2019). In the present study, we found that word numerosity underestimation was stronger in the active compared with the passive condition, in line with our hypothesis that attenuation of self-generated stimuli may extend to higher-level cognitive functions (Kilteni et al., 2018; Jack et al., 2019). Of note, this underestimation could not be due to simple repetition effects between successive blocks of active and passive condition trials, as a control experiment showed no differences in numerosity estimations between repeated blocks of passive condition trials.
Attenuation of self-generated stimuli has been investigated mostly for sensory processes and refers to the diminished behavioral and neural responses associated with self-generated compared with externally generated stimuli (Timm et al., 2014; Benazet et al., 2016). Importantly, attenuation has recently also been described for imagined actions in the absence of overt actions (Kilteni et al., 2018; Jack et al., 2019). For example, imagined self-touch was felt as less intense compared with externally applied touch (Kilteni et al., 2018). Similarly, imagined speech elicited reduced electrophysiological signals related to auditory processing (Whitford et al., 2017). These studies, however, investigated cognitive processes intimately linked to sensorimotor systems (e.g., imagined movement and touch). To the best of our knowledge, it was unknown whether comparable mechanisms of attenuation also affect cognition beyond sensorimotor processing, such as numerosity estimations with no or less obviously implicated sensorimotor processes (Dehaene, 1997). Here we show that cognitive processes such as numerosity estimations are attenuated differentially, depending on whether the estimated items are self-generated or not. As of today, there is no consensus to explain how the brain attenuates expected (e.g., self-generated) stimuli while remaining sensitive to unexpected ones (for a recent review and unifying theoretical account, see Press et al., 2020). Among the so-called cancelation theories, the internal forward model (Miall and Wolpert, 1996; Farrer and Frith, 2002) proposes that corollary discharges related to action are used to predict the sensory consequences of that action. When such predictions match the actual sensory feedback from the action, its sensory consequences are attenuated, and the action is perceived as self-generated (Wolpert and Flanagan, 2001).
The forward model thus links sensory attenuation to the sensory predictions generated by a neural comparator. An analogous mechanism has been proposed to account for attenuation for covert actions such as motor imagery (Kilteni et al., 2018) or inner speech (Tian and Poeppel, 2010). The present data show that attenuation exists beyond overt and covert actions, raising the possibility that the forward model extends to repetitive cognitive activity (e.g., fluency and related numerosity estimations). One possibility, similar to what has been described for inner speech, would be that numerosity underestimation stems from a weaker perceptual representation of self-generated words. In other words, attenuation may not impact numerosity estimations directly, but rather through a decreased representation of self-generated words, which in turn may lead to attenuation of numerosity estimations.
A second account could be that mental operations have a gating effect (Cromwell et al., 2008), independent of predictive mechanisms, thereby directly affecting the strength of the mental representations of self-generated words. A similar gating mechanism has recently been shown to offer a plausible alternative to forward model accounts of sensory self-attenuation (Thomas et al., 2020). Finally, the general mechanisms put forward by the predictive coding account of active inference to explain sensory attenuation may also apply to nonperceptual cognitive attenuation such as the one we observed (Brown et al., 2013).
At the neural level, the IPS showed activity related to numerosity performance, which is in line with previous studies (Cohen and Dehaene, 1996; Piazza et al., 2006). This relation, however, was not modulated by condition (active vs passive), as one could have expected based on previous reports showing that sensory attenuation coincides with decreased activity in the corresponding sensory area (e.g., attenuation in primary auditory cortex during self-generated auditory stimuli; Rummell et al., 2016; Whitford, 2019). Importantly, we found differences in functional coupling during numerosity estimations of the IPS with a network of brain regions between the active and passive conditions. The increase in coupling during the active condition occurred between the IPS and a network comprising the SMA, IPL, and STG, which are known to be involved in predictive processing, for example of self-generated auditory and imagined speech (Lima et al., 2016; Tian et al., 2016). Although previous neuroimaging studies have mainly shown attenuated brain activity for self-generated actions (Whitford, 2019), increases in functional connectivity such as we describe have been reported in the primary auditory cortex during attenuated speech (van de Ven et al., 2009) or in somatosensory cortex during attenuated touch (Kilteni and Ehrsson, 2020). Thus, increased functional connectivity within a network that is centered on the key numerosity region (IPS) and has been associated with speech-related processing further supports the view that numerosity underestimation stems from an attenuation-related process when participants self-generate words.
In addition to numerosity underestimation, we observed that participants formed more accurate numerosity judgments for self-generated versus externally generated words: while participants' word estimations were lower in the active condition than in the passive condition, the relation between their estimated number of words and the actual number of words was better in the active condition compared with the passive condition, suggesting a sharper representation of the number of self-generated words. This result corroborates recent findings showing better monitoring for decisions that are committed rather than observed (Pereira et al., 2020). This improved monitoring of self-generated words could be related to the sharpening of expected representations known in the sensory domain (Kok et al., 2012) or to a self-generation effect underlying the facilitation of information encoding and enhanced recall for self-generated stimuli (Slamecka and Graf, 1978; Bertsch et al., 2007). We note that this effect was not replicated during the mock pilot experiment, which included a smaller sample of participants with a globally higher absolute accuracy. It is possible that this subtle effect could not be observed in such conditions, which warrants future investigations and cautious interpretation.
Performance monitoring is modulated at the neural but not behavioral level
Previous research has shown that both humans and nonhuman primates (Beran et al., 2006; Duyan and Balcı, 2018, 2019) can monitor the quality of their numerosity estimations. Thus, in addition to asking participants to estimate the number of words they generated or heard, we also asked them to estimate their own error during this process (performance monitoring). Although our behavioral results showed similar performance monitoring between conditions, we observed that performance monitoring was associated with hemodynamic activity in the left IPS predominantly in the active condition. Interestingly, this region is not typically associated with performance monitoring, unlike the prefrontal cortex or the insula/inferior frontal gyrus (Vaccaro and Fleming, 2018). Since activity in the left IPS is related to numerosity estimation (Cappelletti et al., 2007; Dormal et al., 2012), this parametric modulation could represent a substrate for monitoring that is specific to numerosity estimations. By investigating global fMRI activity during error estimations, we also observed decreased activation in the insula and putamen and increased activity in the caudate nucleus when comparing the active versus passive conditions. Of note, the anterior insula and putamen were activated during both conditions but were attenuated in the active compared with the passive condition. While the anterior insula activations are consistent with previous literature on performance monitoring (Ullsperger et al., 2010; Bastin et al., 2017), the findings of modulated activity in the striatum were unexpected. Both areas are known to be essential for the control of goal-directed decision-making (Balleine et al., 2007; Kim and Im, 2019). Further links between performance monitoring and activity modulation between conditions in striatal regions should be explored in future studies.
Methodological considerations
As generating versus listening to words inherently involves different behavioral and neural processes, several aspects of our design should be considered when interpreting the current findings as reflecting attenuation per se rather than experimental confounds. At the behavioral level, the larger underestimation during the active condition was not due to difficulty differences between conditions, as task difficulty was matched by design, and absolute accuracy (i.e., correct numerosity estimations of words generated/heard) was similar in both conditions. Of note, a steeper slope between numerosity estimations and the number of words generated in the active condition suggests that participants tracked their performance better, even when underestimating more compared with the passive condition. Although we could not explicitly control whether participants counted the number of words they generated or heard, instructions were given not to do so. Moreover, counting seems unlikely, as absolute accuracy was reached in only 19.5% of all trials, which is comparable to a recent study with a similar setup (Serino et al., 2021) and much lower than what is typically observed when participants are instructed to count (Kansaku et al., 2007). Furthermore, a similar level of absolute accuracy was obtained on simulated data with numerosity estimations drawn from a normal distribution with SD = 2, suggesting that behavior in this task is approximated by a noisy sampling process rather than counting. At the neural level, to ensure that the inherent differences between conditions did not contaminate our results, we focused analyses on the numerosity and error estimation phases, which were identical between conditions, instead of the word generation/listening phases. In addition, we did not base our conclusions on a direct contrast between the two conditions, but on correlations between neural activity and numerosity or error estimates.
Conclusion
Based on behavioral and neuroimaging data, we propose that higher-level cognitive functions such as numerosity estimations of self-generated words are attenuated. Such attenuation involves a functional network including a key numerosity region (IPS) and speech-related regions including the SMA, IPL, and STG. While attenuating the sensory consequences of one's actions is of crucial importance for aspects of the self, such as the sense of agency, attenuating the products of one's mental activities may also be relevant to distinguish them from external sources of information. Our paradigm offers a promising tool to investigate attenuation processes related to the self in cognition and to compare and distinguish them from sensory attenuation processes. It may be of relevance to the study of clinical cases in which attenuation in sensory and cognitive domains may be altered, including patients with psychotic symptoms such as thought insertion, whereby thoughts are not experienced as one's own but as those of someone else.
Footnotes
This work was supported by the Bertarelli Foundation (Grant 532024), the Swiss National Science Foundation (Grant 3100A0-112493), National Center of Competence in Research “Synapsy - The Synaptic Bases of Mental Diseases” (Grant 51NF40–185897), and by two donors advised by CARIGEST SA (Fondazione Teofilo Rossi di Montelera e di Premuda and a second one wishing to remain anonymous). N.F. has received funding from the European Research Council under the European Union Horizon 2020 Research and Innovation Program (Grant 803122). We thank the Foundation Campus Biotech Geneva (FCBG) Human Neuroscience Platform of the Campus Biotech for technical support during data acquisition.
The authors declare no competing financial interests.
- Correspondence should be addressed to Olaf Blanke at olaf.blanke{at}epfl.ch or Nathan Faivre at nathan.faivre{at}univ-grenoble-alpes.fr