Abstract
When individuals make a movement that produces an unexpected outcome, they learn from the resulting error. This process, essential in both acquiring new motor skills and adapting to changing environments, critically relies on error sensitivity, which governs how much behavioral change results from a given error. Although behavioral and computational evidence suggests error sensitivity can change in response to task demands, neural evidence regarding the flexibility of error sensitivity in the human brain is lacking. Here, we tested whether the nervous system's sensitivity to errors, as measured by prediction-driven suppression of auditory cortical activity, can be modulated by altering participants’ (both males and females) perceived variability during speech. Our results showed that error sensitivity, as measured by this suppression, was increased after exposure to an auditory perturbation that increased speakers’ perceived variability. The current study demonstrates that the nervous system's sensitivity to errors can be behaviorally modulated, a finding with significant potential to enhance motor learning and rehabilitation.
- auditory feedback
- error sensitivity
- motor variability
- sensorimotor learning
- speaking-induced suppression
Significance Statement
Error sensitivity, how responsive the sensorimotor system is to perceived errors, is typically considered a stable, intrinsic characteristic. Here, however, we demonstrate that a temporary manipulation of motor output variability induces a persistent change in a neural index of error sensitivity, highlighting a potential avenue to enhance motor learning and rehabilitation.
Introduction
Throughout our daily activities, we constantly monitor the outcomes of our actions and adjust our behavior when these outcomes do not match our expectations. A critical variable in such sensorimotor learning is error sensitivity, the “learning rate” determining how much is learned from trial to trial (Thoroughman and Shadmehr, 2000).
Early models of error-based sensorimotor adaptation assumed that the sensitivity to error for a given action is constant; that is, the brain always learns a certain fraction of the error (Thoroughman and Shadmehr, 2000; Scheidt et al., 2001; Smith et al., 2006; van Beers, 2012). However, recent work using psychophysics and computational modeling of reaching movements has suggested that error sensitivity is not static, but changes as a function of error size (Marko et al., 2012) and the history of previously experienced error (Herzfeld et al., 2014). Similarly, although movement variability is often thought to be constant, arising largely from unwanted noise in the nervous system (Faisal et al., 2008), recent work demonstrates that motor variability is more actively controlled (Wong et al., 2009; Tang et al., 2022). Furthermore, motor variability correlates with adaptation rate (Wu et al., 2014) and aftereffects following adaptation (Tsay et al., 2024), suggesting error sensitivity may play a shared role in both processes.
Despite this behavioral and computational evidence for flexible error sensitivity in the motor system, there is a paucity of neural evidence regarding whether and how error sensitivity can be modulated in the human brain. Speech production provides a unique opportunity to examine changes in neural sensitivity to errors during motor behavior. The sensitivity of the nervous system to auditory errors in speech has been extensively studied by examining the suppression of the auditory cortical response to self-produced speech compared with its response to playback of the same signal (Curio et al., 2000; Houde et al., 2002; Ventura et al., 2009; Flinker et al., 2010). This speaking-induced suppression (SIS), primarily seen in the left hemisphere, is thought to reflect a partial neural cancellation of incoming auditory feedback by efference copy prediction. Critical for our purposes, SIS is modulated by sensory prediction error (Behroozmand and Larson, 2011; Chang et al., 2013; Niziolek et al., 2013; Sitek et al., 2013). For example, the magnitude of SIS diminishes in response to an increase in perceived error resulting from external auditory perturbation (Behroozmand and Larson, 2011; Chang et al., 2013) or internal variability (i.e., acoustic deviance; Niziolek et al., 2013; Beach et al., 2024). Moreover, when auditory perturbations are consistent, SIS gradually returns to baseline as participants adapt through repeated exposures, with a correlation between changes in SIS and adaptation (Kim et al., 2023). In sum, previous work suggests that SIS can index sensitivity to sensory prediction errors. This implies that when the magnitude of the sensory error is constant, increased error sensitivity will be manifested by greater perceived error and, consequently, a decrease in the magnitude of SIS.
In the current study, we use SIS to test whether the nervous system's sensitivity to errors can be modulated by auditory feedback perturbations that alter speakers’ perceived variability. A recent behavioral study found that regardless of whether these perturbations were applied in an inward or outward direction (i.e., reducing or increasing perceived variability, respectively; Fig. 1A), participants responded by increasing their produced variability (Tang et al., 2022). Critically, simulations using a state-space model of learning suggested that the seemingly counterintuitive response in the outward condition was best explained by an increase in error sensitivity, which drove trial-to-trial overcorrections that increased variability. Conversely, the response in the inward condition could not be explained by changes in error sensitivity but was instead consistent with increases in underlying motor variability, as participants relaxed their constraints on movement outcomes. Based on these results, we predicted that exposure to outward (but not inward) perturbations would result in an increase in error sensitivity that would be reflected in reduced SIS in this condition.
Materials and Methods
Participants
Twenty-six native speakers of American English participated: 16 in the main MEG study and 10 in a small behavioral pilot before the main study. The pilot group of 10 participants (7/3 females/males, 18–39 years, mean 23.8 ± 5.9) were recruited to confirm the effect of auditory perturbation on produced variability under the current experimental procedure, including both speaking and active listening trials. All participants were right-handed with no reported history of neurological disorders. Participants’ hearing thresholds were measured using the modified Hughson–Westlake audiogram procedure (Carhart and Jerger, 1959; Hughson and Westlake, 1944); all participants had normal hearing as defined by thresholds of 25 dB HL or less for frequencies between 250 and 4,000 Hz. All participants gave their informed consent, and the protocol was approved by the Institutional Review Board of the University of Wisconsin–Madison.
Our main analysis compared the magnitudes of SIS before and after auditory feedback perturbation. One participant who did not exhibit suppression during the baseline phase across all three MEG visits was excluded, as we could not assess our main hypothesis in this participant. In total, data from 15 participants (9/6 females/males, 24–65 years, mean 39.3 ± 12.1) were further analyzed.
Experimental procedure
Each participant completed three separate sessions in which they were exposed to different auditory feedback perturbations (either inward-pushing or outward-pushing) or to no perturbation at all (control). The order was counterbalanced across participants. There was at least a 1-week interval between sessions (mean interval: 10.1 d) to minimize potential carryover effects in speech from the auditory perturbation.
During MEG recording, participants were seated upright in a sound-attenuated, magnetically shielded recording room and required to complete two types of tasks: a speaking task and a listening task. Stimulus words (“ease,” “add,” and “odd”) were pseudorandomly selected and presented on a Panasonic DLP projector (PT-D7700U-K) for 1.5 s, one at a time. The interstimulus interval was randomly jittered between 0.4 and 1.15 s. These stimulus words contain the same three vowels as in our previous behavioral study (Tang et al., 2022) but start without an onset consonant to avoid movement artifacts in MEG. During the speaking task, participants produced the stimulus words while receiving auditory feedback of their own voices through insert earphones. Participants were instructed to keep still and minimize jaw movement during speech, although not at the expense of vowel quality. During the listening task, participants passively listened to the audio recorded in the previous speaking task. They were instructed to keep their eyes open and look at the screen during both tasks.
Each MEG session was divided into three phases (Fig. 1B):
Baseline phase: 60 productions of each word with unaltered auditory feedback, divided into two blocks (30 productions of each word in each block). Each block of the speaking task was followed by a block of the listening task.
Exposure phase: 150 productions of each word with (inward-pushing or outward-pushing) or without (control) perturbation applied, divided into five blocks (30 productions of each word in each block). See below, Real-time auditory perturbation, for more detailed perturbation information. Only the first block of the speaking task in this phase was followed by a listening task.
Test phase: Identical to baseline phase.
Figure 1. Experiment design. A, Schematic and examples of perturbations (inward and outward) applied to vowel formants during the exposure phase from a single representative participant. The formant values that the participant produced and heard are indicated by black and colored circles, respectively. The ellipses represent a 95% confidence interval around the data points of the same color. All auditory formant perturbations were applied in mels, a logarithmic measure of frequency. B, Experimental procedure. Each participant completed three sessions, one for each of the three perturbations (inward, outward, control) applied during the exposure phase. Participants performed both a speaking task and a listening task grouped into blocks of 90 trials. The stimulus words were “ease,” “add,” and “odd,” pseudorandomized within each block. No participants initially perceived their speech to be altered (Extended Data Table 1-1).
Table 1-1. Perturbation awareness.
After every 45 trials, participants were given a short break (<30 s); a long break (1–2 min) was given every 90 trials. After completing all three MEG sessions, participants were given a brief questionnaire to assess their awareness of the perturbations. The pilot group completed a single session which followed the same experimental procedure as the outward-pushing session. This group received the same instructions (i.e., keep still and minimize jaw movement during speech) as participants did in the main study, though they were not seated in the MEG recording room.
Real-time auditory perturbation
During the exposure phase, participants were exposed to no perturbation (control) or to auditory perturbations that increased (outward-pushing) or decreased (inward-pushing) their perceived variability (Tang et al., 2022). The inward-pushing perturbation shifted every production toward the center of that participant's distribution for each vowel (i.e., the median F1/F2 values, the vowel “targets”; Fig. 1A). The outward-pushing perturbation shifted every production away from these targets. In speech, vowel sounds are defined by resonances in the vocal tract, known as formants. The first two formant frequencies, F1 and F2, are mostly determined by the height and front-back position of the tongue body, respectively, and sufficient to disambiguate different vowel sounds (Ladefoged, 2001). The perturbation magnitude was 50% of the distance, in F1/F2 space, between the current formant values and the vowel targets. Participants’ median F1/F2 values for each vowel were calculated during the baseline phase and subsequently used to calculate the participant-specific perturbation field.
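For illustration, the perturbation rule can be sketched in a few lines of Python/NumPy (the experiment implemented it in real time in a modified version of Audapter; the hertz-to-mel conversion shown, the function names, and the example formant values are illustrative assumptions rather than the study's implementation):

```python
import numpy as np

def hz_to_mel(f_hz):
    """One common hertz-to-mel convention (illustrative; the exact
    conversion used by the real-time system may differ)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def perturb_formants(produced_hz, target_hz, condition, gain=0.5):
    """Return the fed-back [F1, F2] in mels: shifted toward the vowel target
    (inward), away from it (outward), or unchanged (control), by `gain`
    (50%) of the produced-to-target distance in 2D mel space."""
    produced = hz_to_mel(produced_hz)
    target = hz_to_mel(target_hz)          # participant's baseline median F1/F2
    error = produced - target              # vector from target to production
    if condition == "control":
        return produced
    sign = -1.0 if condition == "inward" else 1.0
    return produced + sign * gain * error

# Hypothetical production of "ease" (F1/F2 = 300/2300 Hz) with a baseline
# median target of 280/2400 Hz
print(perturb_formants([300, 2300], [280, 2400], "outward"))
print(perturb_formants([300, 2300], [280, 2400], "inward"))
```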
Apparatus
We used a modified version of Audapter (Cai et al., 2008; Tourville et al., 2013) to record participants’ speech, alter the speech signal when necessary, and play the (potentially altered) signal back to participants in near real time [an unnoticeable delay of ∼18 ms, as measured on our system following Kim and Max (2020)]. Speech was recorded at 16 kHz via a lavalier microphone (Shure SM93, modified for compatibility with MEG recording) placed ∼4–7 cm away from the left corner of the mouth. Speech recordings were played back to the participants via MEG-compatible insert earphones (TIP-300, Nicolet Biomedical) at a volume of ∼80 dB SPL. The volume of speech playback varied dynamically with the amplitude of participants’ produced speech.
MEG data were acquired at Froedtert Hospital, using a 306-channel (204 planar gradiometers and 102 magnetometers) whole-head biomagnetometer system (Vectorview, Elekta-Neuromag). The raw data were acquired with a sampling rate of 2 kHz and high-pass filtered with a 0.03 Hz cutoff frequency. The head position of participants relative to the sensors was determined using four head-position indicator coils attached to the scalp surface, whose locations were digitized using a Polhemus Fastrak system, together with three anatomical landmarks (nasion and preauricular points) and ∼100 additional scalp points to improve anatomical registration with MRI. Head position was monitored continuously during the entire MEG recording and confirmed for consistency between blocks. Horizontal and vertical eye movements and heartbeats were monitored with concurrent electrooculogram (EOG) and electrocardiogram (ECG) recording.
In a separate MRI session after all three MEG sessions were completed, high-resolution T1- and T2-weighted images of each participant were obtained with a GE Healthcare Discovery MR750 3-T MR system in order to coregister each participant's MEG activity to a structural image of his or her own brain.
Acoustic analysis
Acoustic data were preprocessed and analyzed following the procedures previously described in Tang et al. (2022). F1 and F2 of all recorded speech words were tracked offline using wave_viewer (Niziolek and Houde, 2015), an in-house software tool that provides a MATLAB GUI interface to Praat (Boersma and Weenink, 2019). Linear predictive coding (LPC) order and pre-emphasis values were set individually for each participant. Vowel onset and offset were first automatically detected using a participant-specific amplitude threshold. All trials were then checked manually for errors. Errors in vowel onset and offset were corrected by manually labeling these times using the waveform and spectrogram. Errors in formant tracking were corrected by adjusting the pre-emphasis value or LPC order. In total, a limited number of trials (1.2% in control, 1.4% in inward-pushing, 1.1% in outward-pushing) were excluded due to production errors (e.g., if the participant said the wrong word), disfluencies, or unresolvable errors in formant tracks.
The primary goal of the acoustic analysis was to evaluate how variability changed across the different phases of each session. Variability within each experimental phase was measured as the average 2D distance in F1/F2 space from each vowel token to the center of the distribution for that vowel in that phase, measured from the first 50 ms of the vowel. For offline analysis, the exposure phase was divided into early (the first block of speaking), middle, and late exposure phases (two speaking blocks each). We also measured vowel centering, a measure of within-trial variability change which is thought to reflect online correction for auditory errors (Niziolek et al., 2013; Niziolek and Kiran, 2018; Niziolek and Parrell, 2021). Centering was calculated as the change in distance to the vowel center from vowel onset (first 50 ms, dinit) to vowel midpoint (middle 50 ms, dmid): C = dinit − dmid.
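For concreteness, a minimal Python/NumPy sketch of these two measures is given below (the study's acoustic analyses were performed in MATLAB; the sketch assumes formants are already expressed in mels and takes the median F1/F2 as the center of each vowel's distribution):

```python
import numpy as np

def distances_to_center(formants_mel):
    """Per-trial 2D distance from each token's [F1, F2] to the center
    (median) of that vowel's distribution, in mels."""
    formants = np.asarray(formants_mel, dtype=float)     # shape (n_trials, 2)
    return np.linalg.norm(formants - np.median(formants, axis=0), axis=1)

def phase_variability(onset_formants_mel):
    """Variability within a phase: mean distance to center, computed from
    formants averaged over the first 50 ms of each vowel."""
    return distances_to_center(onset_formants_mel).mean()

def centering(onset_formants_mel, mid_formants_mel):
    """Mean within-trial reduction in distance to center from vowel onset
    (first 50 ms) to vowel midpoint (middle 50 ms): C = d_init - d_mid."""
    d_init = distances_to_center(onset_formants_mel)
    d_mid = distances_to_center(mid_formants_mel)
    return float(np.mean(d_init - d_mid))
```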
Previously, Niziolek et al. (2013) showed that SIS was reduced when speech was less prototypical (i.e., for trials whose formants were farther from the center of the distribution in 2D formant space for a given vowel), compared with more prototypical speech productions (trials whose formants were closer to this center). Although not the primary focus of the current study, we conducted similar analyses to attempt to replicate these results with our current data. For each participant, speech productions during the baseline phase (60 productions of each stimulus word with unaltered auditory feedback) were divided into center and peripheral trials, defined as the closest and farthest 20 trials from each vowel target (closest and farthest third of trials; Fig. 5B). Combining these across all three stimulus words, 60 center and 60 peripheral trials were defined in the baseline phase of each MEG session for each participant.
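A minimal sketch of this tertile split, under the same assumptions as above (illustrative only; in the actual analysis the split was computed separately for each stimulus word before pooling):

```python
import numpy as np

def split_center_periphery(onset_formants_mel):
    """Split one vowel's 60 baseline productions into the closest (center)
    and farthest (peripheral) thirds by 2D distance to the vowel's median."""
    formants = np.asarray(onset_formants_mel, dtype=float)   # (n_trials, 2)
    d = np.linalg.norm(formants - np.median(formants, axis=0), axis=1)
    order = np.argsort(d)
    k = len(d) // 3                                          # 20 of 60 trials
    return order[:k], order[-k:]          # indices of center, peripheral trials
```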
MEG analysis
MEG sensor data preprocessing
A temporal variant of signal space separation (tSSS) using MaxFilter software v2.2 (Elekta-Neuromag) was first performed to remove external magnetic interferences and discard noisy sensors. MEG preprocessing and source estimation were performed with Brainstorm (http://neuroimage.usc.edu/brainstorm/; Tadel et al., 2011) combined with in-house MATLAB code.
All recordings were manually inspected to detect segments contaminated by large body/head movements or remaining environmental noise sources, which were discarded from further analysis. Heartbeat and eyeblink artifacts were automatically detected from the ECG and EOG traces and removed using signal space projections (SSP). Projectors were calculated using principal component analysis (one component per sensor) with Brainstorm default parameter settings (ECG: [−40, +40] ms, 13–40 Hz; EOG: [−200, +200] ms, 1.5–15 Hz). In all participants, the principal components (one for heartbeats and one for eyeblinks) that best captured the artifacts’ sensor topography were manually selected and removed. This was sufficient to remove artifact contamination. The preprocessed data (projectors included) were then bandpass filtered between 4 and 40 Hz with an even-order linear phase FIR filter, based on a Kaiser window design (Brainstorm default settings). The 4 Hz high-pass cutoff was applied to filter out low-frequency movement-related artifacts during speech production, improving detection of the M100. Sound onsets were detected offline automatically from the audio channel using an amplitude threshold (i.e., when the amplitude of the signal increased above 1.4 times the standard deviation of the signal over the entire file) and were corrected manually after visual inspection of the waveform. Filtered data were then baseline-corrected using a baseline period from −700 to −400 ms relative to sound onset (avoiding potential prespeech preparatory movement) and segmented into epochs of 1,100 ms (−700 to 400 ms relative to sound onset).
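The threshold-based onset detection described above can be sketched as follows (illustrative Python/NumPy; the actual preprocessing was performed in Brainstorm/MATLAB, detected onsets were corrected by hand, and the minimum-gap argument is an added guard rather than a reported parameter):

```python
import numpy as np

def detect_sound_onsets(audio, fs, thresh_sd=1.4, min_gap_s=0.5):
    """Return candidate sound-onset times (s): samples where the rectified
    audio first rises above thresh_sd * SD of the whole recording, keeping
    only onsets separated by at least min_gap_s."""
    audio = np.asarray(audio, dtype=float)
    above = np.abs(audio) > thresh_sd * np.std(audio)
    crossings = np.flatnonzero(above[1:] & ~above[:-1]) + 1   # upward crossings
    onsets, last = [], -np.inf
    for c in crossings:
        if (c - last) >= min_gap_s * fs:
            onsets.append(c)
            last = c
    return np.asarray(onsets) / fs
```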
MEG source estimation
Source reconstruction was performed in Brainstorm using minimum-norm (MN) imaging. MEG sensor data were coregistered to individual anatomical MRIs for each participant using common fiducial markers and head shape digitization. For each MEG block, a forward model of neural magnetic fields was computed using the overlapping spheres method. The positions and orientations of the elementary dipoles were constrained perpendicularly with respect to the cortical surface. The noise covariance matrix, which was used for source estimation, was calculated from a MEG empty room recording (∼3–6 min) collected the same day as the participant's recordings. Dynamic statistical parametric maps (dSPMs), which are a set of z-scores providing spatiotemporal source distribution with millisecond temporal resolution, were estimated by applying Brainstorm's MN imaging approach (see Fig. 3A for source activity maps from a representative participant).
Measuring SIS
Source maps were averaged across blocks for conditions (task × experimental phase: Speakbaseline, Listenbaseline, Speaktest, Listentest; 180 trials each). For each MEG session of each participant, a region of interest (ROI) was defined as the 10 vertices (constrained) that had the largest response at M100 peak during Listenbaseline, restricted to left auditory and inferior parietal areas. Figure 3C shows the ROIs chosen for each participant during each MEG session projected on a template brain. Source brain activity was extracted from the chosen ROIs and root mean square (RMS) transformed, yielding a time series of positive evoked responses. The M100 peak in each condition was defined as the time point of maximal activity between 85 and 120 ms after sound onset; peaks were confirmed by visual inspection. The M100 amplitudes were then calculated as the mean amplitude across a 20 ms window centered at the M100 peak for each condition. In alignment with previous studies (Curio et al., 2000; Niziolek et al., 2013), SIS was calculated by taking the difference in M100 amplitude between the listening and speaking tasks (Fig. 3A; SISbaseline = Listenbaseline − Speakbaseline; SIStest = Listentest − Speaktest).
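In sketch form, the M100 and SIS measurement reduces to the following (illustrative Python/NumPy; roi_sources is assumed to be a vertices × time array of dSPM values from the chosen ROI, times is in seconds relative to sound onset, and in the study peak latencies were additionally confirmed by visual inspection):

```python
import numpy as np

def m100_amplitude(roi_sources, times, search=(0.085, 0.120), win=0.020):
    """RMS-combine ROI source time series, locate the M100 peak between 85
    and 120 ms after sound onset, and return the mean amplitude over a
    20 ms window centered on that peak."""
    times = np.asarray(times, dtype=float)
    rms = np.sqrt(np.mean(np.asarray(roi_sources, dtype=float) ** 2, axis=0))
    in_search = (times >= search[0]) & (times <= search[1])
    peak_t = times[in_search][np.argmax(rms[in_search])]
    in_win = (times >= peak_t - win / 2) & (times <= peak_t + win / 2)
    return rms[in_win].mean()

def speaking_induced_suppression(listen_roi, speak_roi, times):
    """SIS = M100 amplitude (listening) - M100 amplitude (speaking)."""
    return m100_amplitude(listen_roi, times) - m100_amplitude(speak_roi, times)
```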
Figure 2. Baseline-normalized variability changes across sessions (control, inward, and outward). Individual and group means are indicated by thin lines with small transparent dots and thick lines with large solid dots, respectively. Error bars show standard error (SE). * indicates significant change (p < 0.05) from baseline. Baseline-normalized variability and centering changes in the pilot behavioral study are shown in Extended Data Figure 2-1. Baseline-normalized centering changes in the MEG study are shown in Extended Data Figure 2-2.
Figure 2-1. Baseline-normalized variability (left) and centering (right) changes in pilot behavioral study (outward perturbation), normalized by subtracting the average value in the baseline from the remaining trials. Individual and group means are indicated by thin lines with small transparent dots and thick lines with large solid dots, respectively. Error bars show standard error (SE). * indicates significant change (p < 0.05) from baseline.
Figure 2-2. Baseline-normalized centering changes (normalized by subtracting the average value in the baseline from the remaining trials) across sessions (control, inward and outward). Individual and group means are indicated by thin lines with small transparent dots and thick lines with large solid dots, respectively. Error bars show standard error.
Statistical analysis
Following similar procedures previously described in Tang et al. (2022), acoustic statistical analyses were performed with repeated-measures ANOVAs and post hoc tests. Data from baseline, late exposure, and test phases were included in the repeated-measures ANOVAs separately for the variability and centering results and for each MEG session, with phase and vowel identity as within-subject factors. Baseline-normalized changes in variability (normalized by subtracting the average value in the baseline from the remaining trials) were also compared between sessions (inward, outward, and control) using repeated-measures ANOVAs separately for the late exposure and test phases.
To test our main hypothesis that error sensitivity, measured as SIS magnitude, can be modulated by perturbations that change perceived variability, normalized SIS changes (calculated as 100 × (SISbaseline − SIStest) / SISbaseline) from all three MEG sessions were included in a repeated-measures ANOVA, with session (inward, outward, control) as a within-subject factor. We also assessed the significance of normalized SIS changes in each session using one-sample t tests (against value = 0, H0 = no change from baseline). Correlations between SIS changes and variability changes were estimated with Spearman's correlation. To test for potential changes in SIS based on prototypicality (Niziolek et al., 2013), we compared the SIS between center (SISc = Listenc − Speakc) and peripheral trials (SISp = Listenp − Speakp) during the baseline of all three MEG sessions using a repeated-measures ANOVA with distance (center vs periphery) and session (inward, outward, control) as within-subject factors.
For all analyses, post hoc comparisons with Bonferroni correction were conducted in the event of a significant main effect or interaction. The significance level for all statistical tests was set to p ≤ 0.05. Statistical analyses for both behavioral and MEG data were conducted in R (R Core Team, 2019).
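The SIS statistics above can be summarized in a short sketch (the study ran its statistics in R; the Python version below, using placeholder data, is only meant to make the form of the normalized SIS change, the repeated-measures ANOVA, and the follow-up tests explicit):

```python
import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.stats.anova import AnovaRM

def normalized_sis_change(sis_baseline, sis_test):
    """Normalized SIS change (%): 100 * (SIS_baseline - SIS_test) / SIS_baseline."""
    b = np.asarray(sis_baseline, dtype=float)
    t = np.asarray(sis_test, dtype=float)
    return 100.0 * (b - t) / b

# Placeholder long-format data: 15 participants x 3 sessions (not the study's values)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "subject": np.repeat(np.arange(15), 3),
    "session": np.tile(["control", "inward", "outward"], 15),
    "sis_change": rng.normal(10, 30, 45),
})

# Repeated-measures ANOVA with session as a within-subject factor
rm = AnovaRM(df, depvar="sis_change", subject="subject", within=["session"]).fit()
print(rm.anova_table)

# One-sample t test against 0 (H0: no change from baseline) for one session
outward = df.loc[df["session"] == "outward", "sis_change"]
print(stats.ttest_1samp(outward, popmean=0.0))

# Spearman correlation between SIS changes and (placeholder) variability changes
print(stats.spearmanr(outward, rng.normal(0, 5, size=15)))
```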
Results
Speech variability can be modulated by alterations of perceived acoustic variability
Previous work in speech production showed that an outward auditory feedback perturbation, which increases perceived variability, results in increases in produced variability, potentially due to increased sensitivity to auditory errors (Tang et al., 2022). We conducted an initial behavioral study (pilot group: 10 participants) to confirm that the addition of passive listening trials, necessary for our neural metric of auditory error sensitivity, does not alter the behavioral response to this outward perturbation (Fig. 1B, outward condition only). In speaking blocks, participants produced the stimulus words (“ease,” “add,” and “odd”) one at a time while receiving auditory feedback of their own speech through circumaural headphones. Blocks of listening trials were interspersed between speaking blocks, as would be required for measuring SIS. Vowel formants, resonances of the vocal tract that distinguish different vowels, were perturbed during the exposure phase, with formants pushed outward from the center of each vowel (50% of the distance to the center in 2D mel frequency space). Vowel formants were unperturbed in initial baseline and final test phases. As in our previous study, participants in the pilot group significantly increased their produced variability during the outward perturbation (Extended Data Fig. 2-1; main effect of phase: F(2.1,18.7) = 4.741, p = 0.021).
For the main MEG study, each participant (N = 15) took part in three separate MEG sessions, each separated by at least 1 week, during which they received different auditory feedback during the exposure phase of the experiment (Fig. 1B): an inward perturbation that decreased perceived variability, an outward perturbation that increased perceived variability, and a no-perturbation control (normal auditory feedback). The inward and outward perturbations were designed to decrease and increase, respectively, perceived trial-to-trial variability without affecting the overall mean of the formants (Tang et al., 2022). To confirm that this was achieved in the main study, we first calculated the magnitude of the applied perturbations across all trials and participants. During the exposure phase, the inward perturbation significantly decreased formant variability in the auditory playback by 19 mels relative to participants’ produced variability (one-sample t test against 0, t = 19.31, p < 0.001), while the outward perturbation significantly increased formant variability by 17.9 mels (t = 17.59, p < 0.001). Consistent with our intention to alter perceived variability without introducing any consistent shift in formants, neither perturbation affected the mean formant values (all means <2 mels, all p > 0.05 against 0).
There was no significant difference in baseline production variability across the three sessions (F(1.2,17.6) = 0.872, p = 0.388), suggesting movement variability during speech production remains stable over time in the absence of any auditory perturbation. To further evaluate the stability of baseline variability across sessions, intraclass correlation coefficient (ICC, using average measures) was calculated based on a two-way mixed-effects model. The across-session ICC for baseline speech variability was 0.89, representing good reliability (Koo and Li, 2016).
We then measured how participants changed their produced variability in response to the feedback perturbations. Produced variability was stable in the control session (Fig. 2, no main effect of phase: F(2,28) = 1.006, p = 0.378), while participants significantly increased their produced variability during both the inward (main effect of phase: F(2,28) = 4.545, p = 0.02) and outward sessions (Fig. 2).
Contrary to our previous behavioral study, the variability increase observed during the inward session's test phase (when the perturbation was removed) was not statistically significant (t = 2.28, p = 0.116, d = 0.59), although it was numerically similar to the increase during the late exposure phase (+6.0 mels; Tang et al., 2022). However, directly comparing variability changes across sessions did show a reliable main effect of session during both the late exposure (F(2,28) = 6.823, p = 0.004) and test phases.
In our previous work (Tang et al., 2022), the outward perturbation resulted in an increase not only in produced variability but also in vowel centering (+2.2 mel), defined as the reduction in variability from vowel onset (first 50 ms) to vowel midpoint (middle 50 ms). Because vowel centering may reflect corrections to ongoing vowel trajectories (Niziolek et al., 2015), this metric may also be related to auditory error sensitivity, although online corrections and trial-to-trial adaptation may be differentially sensitive to errors (Parrell et al., 2017; Franken et al., 2019; Lester-Smith et al., 2020). In the pilot group (N = 10), we observed numerically larger (+4.6 mel) but nonsignificant increases in vowel centering during the outward perturbation session (Extended Data Fig. 2-1; late exposure phase: t = 1.94, p = 0.251). In the main MEG study, no significant change in centering was observed in participants in any of the three perturbation sessions (outward: −0.41 mels, p > 0.05 in all cases; Extended Data Fig. 2-2).
No participants reported awareness of the perturbation, and none correctly identified the perturbation as a change to their vowels when informed after the final session that their speech had been manipulated (Extended Data Table 1-1).
In sum, we replicated our previous behavioral results: participants exposed to both inward-pushing and outward-pushing perturbations unconsciously increased their produced variability.
Alteration of perceived variability affects neural sensitivity to auditory errors
We first confirmed that M100 peak amplitudes during the baseline phase did not differ across sessions in either the speaking (F(2,28) = 2.73, p = 0.082) or the listening task (F(2,28) = 1.17, p = 0.324; one-way ANOVAs with session as a within-subject factor). Next, we compared the auditory cortical responses to speech between the speaking and listening tasks. As seen in Figure 3B, compared with the listening task, the average M100 response (sensor activity: averaged across all participants and all three MEG session baselines) was suppressed during the speaking task. Peak activations were observed around the left auditory and inferior parietal areas (cortical source maps). Source brain activity was then extracted from the ROIs (Fig. 3C, ROI locations for individual participants) and root mean square (RMS) transformed, yielding a time series of positive evoked responses (see Materials and Methods for a detailed description and Fig. 3A for time courses of source activity extracted from ROIs for speaking and listening during the baseline phase in a representative participant).
Figure 3. Speaking-induced suppression (SIS) in left auditory cortex. A, Top panel, Time courses of source activity extracted from regions of interest (ROIs; see bottom panel) for speaking (solid line) and listening (dashed line) during the baseline phase in a representative participant. SIS is calculated as the difference in M100 amplitude between the listening and speaking peaks (vertical bar on the y-axis). Bottom panel, Source cortical maps (dSPM) during listening and speaking trials in the baseline phase at M100 peak (102 ms after sound onset). ROIs were defined as the 10 vertices surrounding the vertex with the largest response (restricted to left temporal and inferior parietal areas) at the time of the M100 peak during listen trials. B, Time courses of averaged sensor activity for baseline listening (top panel, dashed lines) and speaking (bottom panel, solid lines) during the baseline phase (averaged across all participants and all three MEG session baselines). Averaged, spatially smoothed cortical source maps at three visible peaks (36, 95, and 176 ms after sound onset) are shown below. C, ROI locations for each of the three MEG sessions projected on a common cortical surface template. Individual participants are indicated by different colors.
The primary goal of MEG analysis was to test whether the nervous system's sensitivity to errors, as measured by the magnitude of SIS (Fig. 4), can be modulated by auditory perturbations that alter speakers’ perceived variability. The magnitude of SIS has been shown to be modulated by the perceived sensory error (Behroozmand and Larson, 2011; Chang et al., 2013; Niziolek et al., 2013; Sitek et al., 2013); therefore, when the magnitude of the error is constant, increased error sensitivity will be manifested by greater perceived error, and, consequently, a decrease in the magnitude of SIS. In the current experiment, we compare the magnitude of SIS between baseline and test phases when the magnitude of sensory error is constant (no perturbation applied). We predicted that increased error sensitivity induced by temporary exposure to auditory perturbation would be reflected by a decrease in the magnitude of SIS in the test phase compared with the baseline phase.
Figure 4. Modulation of SIS across sessions (control, inward and outward). A, Source-localized auditory cortical time course aligned to sound onset during speaking (solid line) and listening (dotted line), shown separately before (baseline, left column) and after (test, right column) exposure to auditory perturbations. Shaded regions around the MEG traces indicate SEM across participants. B, SIS magnitudes across sessions obtained during baseline recording. C, Normalized SIS changes (%, calculated as 100 × (SISbaseline − SIStest) / SISbaseline) across sessions. Group means are indicated by solid squares. Connected points represent data from individual subjects. Error bars show standard error (SE). * indicates significance (p < 0.05).
All participants exhibited SIS: the M100 peak was suppressed in the speaking task relative to the listening task during all three sessions (Fig. 4A). We confirmed that the SIS magnitudes during the baseline phase did not differ across sessions (F(2,28) = 0.61, p = 0.549; Fig. 4B), suggesting that this measure is stable over time in the absence of any auditory perturbation. Moreover, the across-session ICC (average measures) for SIS magnitudes was estimated to be 0.82, representing good reliability (Koo and Li, 2016).
Consistent with our predictions, SIS was attenuated after exposure to the outward perturbation (decrease of 22.2% ± 8.2 s.e., t = 2.70, p = 0.017, d = 0.69; Fig. 4C), suggesting that speakers became more sensitive to auditory errors in this condition. This was true even though produced variability had returned to baseline levels; in other words, the decrease in SIS was not due to greater acoustic error during the test phase of this session. Conversely, SIS did not show significant changes after exposure to the inward perturbation (7.4% ± 16.1 s.e., t = 0.46, p = 0.652) or when no perturbation was applied (control session, 28.9% ± 15.6 s.e., t = 1.85, p = 0.089). Normalized SIS changes in the test phase differed significantly across sessions (F(2,28) = 3.237, p = 0.054).
In addition, we tested whether M100 peak amplitudes changed after perturbations (from baseline to test) in speaking and listening tasks separately and whether such changes differed between sessions (inward, outward and no perturbation). We found that M100 amplitudes showed significant changes during listening trials after both inward (decrease of 8.8% ± 3.5 s.e., t = 2.19, p = 0.046, d = 0.57) and outward perturbations (decrease of 14.5% ± 4.6 s.e., t = 2.62, p = 0.020, d = 0.67). However, changes did not significantly differ across sessions when corrected for multiple comparisons (p > 0.07 for all cases). During the speaking trials, M100 amplitudes showed a relatively large but nonsignificant change only after the outward perturbation (decrease of 10% ± 7.5 s.e., t = 1.86, p = 0.083; p > 0.15 for other cases). Therefore, changes in M100 amplitude in speaking and/or listening trials alone were not responsible for significant across-session differences in SIS. Significant across-session differences were observed in speaking-induced suppression (SIS) only, which necessarily compares the evoked responses between listening and speaking and is suggested to reflect sensitivity to sensory prediction errors.
Although the current study was designed and powered primarily to assess group-level changes in neural markers of auditory error sensitivity, we additionally performed Spearman's correlation analyses between changes in produced variability and SIS, separately for each session. The results did not show any significant correlation between behavioral and SIS measures (ρ < 0.2, p > 0.05 in all cases). However, this negative result should be interpreted with caution given that (1) these two measures were necessarily calculated from different blocks in the experiment (i.e., variability was compared between baseline and late exposure phases, while SIS was compared between baseline and test phases) and (2) even at 80% power (β = 0.2, α = 0.05), a sample size of 47 would be required to detect a moderate correlation, suggesting the sample size in the current study (N = 15) did not have enough power to reliably detect any potential correlation between our behavioral and neural measures.
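For reference, the quoted sample size can be reproduced with the standard Fisher z approximation, assuming a two-tailed α = 0.05 and a correlation of ρ ≈ 0.4 (the specific value taken as "moderate" here is an assumption; a minimal sketch follows):

```python
import numpy as np
from scipy import stats

def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate N needed to detect a correlation of size r (two-tailed),
    via the Fisher z transform: n = ((z_{1-a/2} + z_{power}) / atanh(r))^2 + 3."""
    z_a = stats.norm.ppf(1 - alpha / 2)
    z_b = stats.norm.ppf(power)
    return int(np.ceil(((z_a + z_b) / np.arctanh(r)) ** 2 + 3))

print(n_for_correlation(0.4))   # -> 47, matching the figure quoted in the text
```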
Suppression decreases with acoustic deviance
A previous study found that SIS was reduced in less prototypical productions (i.e., trials farther from the center of the vowel distribution) compared with more prototypical productions (i.e., trials closer to the center of the vowel distribution), suggesting that the auditory system is sensitive to sensory errors resulting solely from motor variability (Niziolek et al., 2013). Although not the main focus of the current study, we conducted an analysis to confirm these findings in the current dataset. Following previous methods, we divided baseline trials into tertiles based on their distance to the median formants for that vowel: center trials were defined as the closest tertile and peripheral trials as the farthest tertile (Fig. 5B; see Materials and Methods). Confirming previous work, we found that SIS was smaller in peripheral trials relative to center trials in all three sessions (Fig. 5A,C; F(1,14) = 13.179, p = 0.003).
Figure 5. Center versus periphery vowel productions. A, Source-localized auditory cortical time course aligned to vowel onset, during speaking (solid line) and listening (dotted line), separated into trials at the center (left column, green) and periphery (right column, purple) of each vowel's distribution. Shaded regions represent SEM across participants. B, Productions from a single participant in the current study, shown in 2D formant frequency space. C, SIS amplitudes measured over trials from the center and periphery across sessions (control, inward and outward). Large open circles show group means. Error bars indicate SE. The individual SIS amplitudes are shown in small filled circles.
Discussion
This study used a widely observed phenomenon in the auditory-motor domain, speaking-induced suppression (SIS), as a window into sensory error processing at the cortical level. Results showed that neural sensitivity to sensory errors, as measured by this suppression, can be modulated by altering speakers’ perceived variability. In particular, SIS was reduced (consistent with an increase in error sensitivity) after exposure to an outward-pushing perturbation that increased participants’ perceived variability, even after that perturbation was removed and participants’ produced variability had returned to baseline levels. These results confirm predictions about changes in error sensitivity in response to this perturbation generated from previous behavioral data and state-space modeling.
Given its importance to sensorimotor learning, sensitivity to sensory error has been extensively studied in the past two decades. Growing behavioral evidence has shown that the sensitivity to sensory error is affected by a number of factors. Individuals adjust their error sensitivity based on the level of confidence they have in their sensory feedback (Korenberg and Ghahramani, 2002; Burge et al., 2008). For example, when the visual feedback from a movement outcome is blurry, individuals are less likely to modify their motor commands than when it is sharp (Izawa and Shadmehr, 2008; Wei and Körding, 2010). Avraham et al. (2020) found that error sensitivity increased in consistent environments and suggested that such an increase was mainly due to the contribution of explicit strategies rather than the implicit process driven by sensory prediction errors. Reward and punishment feedback have also been shown to affect the sensitivity to error. Providing explicit rewards can change the speed of adaptation and also enhance the retention of adaptation (Galea et al., 2015; Nikooyan and Ahmed, 2015; Mawase et al., 2017).
The novel contribution of our study to the literature on error sensitivity is evidence consistent with a modulation of sensitivity to sensory error during speech adaptation. Compared with visuomotor adaptation during reaching, auditory-motor adaptation is thought to be a more implicit process, occurring without learner awareness (Munhall et al., 2009; Keough et al., 2013; Kim and Max, 2020; Lametti et al., 2020). No participants reported any conscious awareness of the auditory perturbations we applied in the current study or previous behavioral study (Tang et al., 2022). Thus, our results, for the first time, provide evidence that increasing perceived motor variability can increase sensory error sensitivity during implicit sensorimotor learning.
At first glance, this evidence of increased error sensitivity caused by a perturbation that magnified self-produced variability may seem to conflict with recent studies in reaching, which found that the variability of a sensorimotor perturbation (visuomotor rotation) reduced error sensitivity (i.e., learning rate) during implicit sensorimotor learning (Albert et al., 2021). However, more recent work (Wang et al., 2024) has shown that this effect results directly from the fact that adaptation corrects for errors of different magnitudes in a nonlinear manner, with no change in the underlying sensitivity to error. Additionally, there is a substantial difference between the manipulation in Albert et al. (2021), which modulated the variability of a sensory perturbation unrelated to participants’ own behavior, and our paradigm, which directly magnifies motor variability contingent on the participants’ movements. This difference (variability in the perturbation vs perceived self-produced motor variability) could potentially affect the sensorimotor system in varying ways. The high perturbation variability created in Albert et al. (2021) diminished the reliability of the error signal. In contrast, the outward perturbation applied in our study caused individuals to perceive their own movements as more variable, possibly leading to a decrease in confidence in their own feedforward motor control system and an increase in error sensitivity. Our paradigm is thus tied more closely to intrinsic motor variability and offers a novel examination of the relationship between motor variability and error sensitivity.
While evidence from behavioral studies suggests error sensitivity is malleable, there is limited neural evidence regarding whether and how error sensitivity can be modulated in the human brain. In the current study, we used SIS to measure the change in error sensitivity at the cortical level before and after exposure to auditory perturbations that alter speakers’ perceived variability. We argue that the measure of speaking-induced suppression necessarily compares the evoked responses between listening and speaking, using the listening condition as a baseline from which to evaluate the magnitude of suppression. It is worth mentioning that, although there were significant changes during listening trials alone before and after perturbations in the current study, we believe that these changes were not primarily attributable to the experimental manipulations. First, these changes did not differ across sessions (outward vs inward). Moreover, if an experimental manipulation did evoke differences in listening activity (i.e., a difference in auditory processing alone), we would expect these differences to also be manifest in the speaking condition, as the input audio signal from the headphones is identical. This was not the case in our data; in other words, after exposure to the outward-pushing perturbation, the processing of self-generated and externally generated signals changed differentially. We argue that this difference reflects a change in the perceived accuracy of the prediction, regardless of whether the speaking or listening activity (or both) differs from the baseline.
In our study, exposure to an outward perturbation resulted in an attenuation of SIS during the test phase, after the perturbation had been removed. Importantly, participants’ produced variability had returned to baseline levels during this test phase, suggesting the decrease in SIS can be attributed to an increase in sensitivity to the same sensory error. In the inward perturbation session, we observed no increase in SIS. However, participants’ produced variability remained at elevated levels during the test phase compared with baseline (not significantly different from baseline in the current study, though this reached significance in our prior study; Tang et al., 2022). One possibility, therefore, is that error sensitivity may have decreased as a result of the inward perturbation, but that this decrease was cancelled out by the increase in sensory error they experienced during the test phase. Nevertheless, it is also possible that error sensitivity cannot be modulated by inward perturbations. Future analyses comparing SIS in the trials with the same sensory error (i.e., produced variability) during the inward perturbation session would help clarify this point.
It is important to note that, in this study, SIS was used to measure error sensitivity, but we do not argue that sensory prediction error arises exclusively in the sensory cortex. The cerebellum, an integral part of the motor system, is widely thought to generate predictions and compute and process sensory prediction errors (Kawato, 1999; Diedrichsen et al., 2005; Tseng et al., 2007; Shadmehr et al., 2010). For example, increased cerebellar activation has been found in the presence of prediction errors arising from an unexpected sensory event or the absence of an anticipated somatosensory stimulus (Schlerf et al., 2012). Numerous animal studies have indicated that error signals are encoded in the complex spikes of Purkinje cells of the cerebellum (Kitazawa et al., 1998; Kobayashi et al., 1998; Streng et al., 2018).
Such prediction errors then modulate a range of neuronal responses (den Ouden et al., 2012). For example, in the auditory-motor domain, the magnitude of SIS decreases in response to an increase in perceived error induced by external auditory perturbation (Behroozmand and Larson, 2011; Chang et al., 2013) or arising from internal variability (i.e., acoustic deviance; Niziolek et al., 2013; Beach et al., 2024). Therefore, although it is not feasible to measure cerebellar-mediated prediction error by directly recording cerebellar output using MEG, we can hypothesize changes in prediction error by assessing SIS at the auditory cortex. It should be noted that SIS, measured in this study using MEG, reflected an average response from left temporal and inferior parietal areas (see Fig. 3 for individual ROIs). A previous study using direct cortical recording (ECoG/iEEG), with higher spatial resolution, suggested that SIS and speech perturbation-response enhancement might be encoded at distinct temporal and inferior parietal regions (Chang et al., 2013). This is consistent with single-unit animal data showing that distinct subgroups of sensory neurons are responsible for suppression and enhancement (Eliades and Wang, 2003; Nelson et al., 2013; Schneider et al., 2014).
Several issues remain to be addressed in future work. First, we did not find a significant correlation between behavioral and neural measures (i.e., changes in acoustic variability associated with changes in SIS magnitude). As mentioned in our results, the behavioral and SIS changes were not measured in the same phase. Moreover, the sample size in the current study (N = 15) had limited power to detect any potential correlation. It would be important in the future to more directly test the correlation between behavioral and neural measures with a larger sample size. Second, in the main MEG study, in contrast to our previous behavioral study (Tang et al., 2022), we did not observe an increase in vowel centering during the outward perturbation session. In the pilot group, we did observe an increase in centering which was numerically similar to the results of Tang et al. (2022), but the increase was not statistically significant. To observe a reliable centering change, a larger sample size (e.g., >20) seems to be required, which also suggests that centering might be a less sensitive and noisier measure of error sensitivity. Future research is needed to determine the validity and reliability of centering as an index of online error correction.
In summary, error sensitivity, which determines how much we learn from erroneous movements, is a critical factor in sensorimotor learning. Previous work has provided behavioral evidence that error sensitivity can be modulated, but neural evidence has so far been lacking. Here, we took advantage of a well-established neural response during speech production that reflects changes in auditory error sensitivity. For the first time, we found that the nervous system's sensitivity to errors, as measured by prediction-driven suppression, can be modulated by altering perceived variability. Enhancing our ability to learn from erroneous movements is a crucial element in practical settings such as rehabilitation.
Footnotes
This work was supported by National Institutes of Health Grant R01 DC019134, a grant awarded through the University of Wisconsin–Madison Fall Research Competition, and a core grant to the Waisman Center from the National Institute of Child Health and Human Development (P50HD105353).
The authors declare no competing financial interests.
- Correspondence should be addressed to Caroline A. Niziolek at cniziolek@wisc.edu.