Abstract
The human brain tracks temporal regularities in acoustic signals faithfully. Recent neuroimaging studies have shown complex modulations of synchronized neural activities to the shape of stimulus envelopes. How to connect neural responses to different envelope shapes with listeners’ perceptual ability to synchronize to acoustic rhythms requires further characterization. Here, we examine motor and sensory synchronization to noise stimuli with periodic amplitude modulations (AM) in human participants (14 females, 10 males). We used three envelope shapes that varied in the sharpness of amplitude onset. In a synchronous motor finger-tapping task, we show that participants more consistently align their taps to the same phase of stimulus envelope when listening to stimuli with sharp onsets than to those with gradual onsets. This effect is replicated in a sensory synchronization task, suggesting a sensory basis for the facilitated phase alignment to sharp-onset stimuli. Surprisingly, despite less consistent tap alignments to the envelope of gradual-onset stimuli, participants are equally effective in extracting the rate of amplitude modulation from both sharp and gradual-onset stimuli, and they tapped consistently at that rate alongside the acoustic input. This result demonstrates that robust tracking of the rate of acoustic periodicity is achievable without the presence of sharp acoustic edges or consistent phase alignment to stimulus envelope. Our findings are consistent with assuming distinct processes for phase and rate tracking during sensorimotor synchronization. These processes are most likely underpinned by different neural mechanisms whose relative strengths are modulated by specific temporal dynamics of stimulus envelope characteristics.
Significance Statement
Ample evidence demonstrates synchronized neurophysiological activity to the temporal regularities of sounds. This phenomenon has been proposed to reflect neural responses to onset edges in acoustic signals. Here, we examine listeners' ability to behaviorally synchronize to stimuli with sharp or gradual onsets. In two experiments, we show that while the sharp amplitude onsets facilitate temporal phase alignment between participants' behavioral output and the stimulus envelope, sharp onsets are not essential for tracking the rate of auditory rhythms in the acoustic input. The dissociation between phase and rate tracking suggests distinct underlying neural mechanisms that are separately modulated.
Introduction
Temporal regularity is a fundamental feature of our acoustic environments (Ding et al., 2017; Varnet et al., 2017). Our auditory system tracks these regularities from continuous fluctuations in the amplitude envelope of sounds (Doelling et al., 2014). Neural synchronization to amplitude envelopes has been observed across various types of stimuli, from clicks with sharp amplitude onsets (Lakatos et al., 2013; ten Oever et al., 2017) to noise and musical sounds with gradual onsets (Henry et al., 2014; Doelling et al., 2019).
Recent neurophysiological studies have demonstrated that the shape of stimulus envelope impacts certain characteristics of stimulus–brain synchronization (Doelling et al., 2019; Oganian and Chang, 2019). For instance, Irsik et al. (2021) recorded electroencephalographic (EEG) responses to periodic noise sequences with sharp or gradual onsets. They showed that both stimulus types induce synchronized neural activities in the auditory cortex, albeit with distinct response patterns. Sharp-onset stimuli elicit stronger auditory event-related potentials, which have been argued to underpin efficient tracking of temporal regularity in stimulus sequences (Oganian and Chang, 2019; Zou et al., 2021). Gradual-onset stimuli evoke smoother, sinusoidal neural activity, which exhibits higher intertrial phase coherence (Irsik et al., 2021). This result was interpreted as robust neural synchronization to the repetition rate of stimuli, which resonates with consistent phase alignment between slow cortical activity and stimulus envelope when participants listened to sequences with gradual-onset stimuli (Doelling et al., 2019).
Neuroimaging findings reveal an interesting relation between the brain's sensitivity to the shape of stimulus' envelope and its processing of temporal regularity of stimulus sequences. Intuitively, one would assume that tracking temporal regularity of stimulus sequences relies on precise assessment of each stimulus' onset timing. Thus, sharp-onset stimuli, which provide clearer timing cues, should lead to stronger synchronization to stimulus sequences. However, despite stronger evoked responses to sharp- than to gradual-onset stimuli, continued neural synchronization to rhythmicity at the sequence level appears similarly robust for both stimulus types (Irsik et al., 2021).
These neuroimaging observations leave open the question of how the brain tracks temporal regularity. Moreover, since it is often not straightforward to assume a direct correspondence between the intensity of a neural response and the efficiency of a hypothesized cognitive process, neuroimaging data alone cannot fully characterize the impact of stimulus envelope on listeners' abilities to track temporal regularity.
Here, we address this question by examining listeners' synchronization to stimulus sequences at the behavioral level. Participants perform a sensorimotor synchronization (SMS) task (Repp, 2005; Repp and Su, 2013), wherein they synchronously tap to periodically repeated noise stimuli with various stimulus envelopes. Participants' motor output allows us to directly assess how stimulus envelope influences (1) their precision in capturing the onset of the repeated stimulus and (2) their ability to track the temporal regularity of stimulus sequences.
To examine participants' perception of stimulus onset, we examined to which precise position in the stimulus envelope they aligned their synchronized taps. The average alignment location reflects the perceived onset, also known as the perceptual center, of the stimulus (Morton et al., 1976). Previous studies showed that precise perception of stimulus onset results in consistent alignment locations across trials of the same stimulus type (see Hawkins, 2014 for a review).
To examine participants' tracking of the rhythmicity of stimulus sequences, we analyzed two characteristics of their continuous finger taps alongside stimulus sequences: (1) how consistently participants align each tap to the envelope of the repeated stimulus and (2) how steadily they maintain their tapping rate in line with the periodicity of the stimulus sequence. These two aspects, known as phase and period tracking, were proposed to operate concurrently to assure motor synchronization to periodic acoustic signals (Repp, 2005). We examine how stimulus envelope impacts each of these processes.
In addition to the motor experiment, we further conducted a sensory experiment to uncover the sensory basis of the envelope effect and formulated a model to qualitatively account for results from both experiments.
Materials and Methods
Participants
Twenty-four participants (14 females, 10 males; average age, 22.04; range, 18–27) provided written informed consent to take part in the study and received monetary compensation for their participation. All participants were right-handed and reported normal hearing. The experimental procedure was approved by the Ethics Council of the Max-Planck Society (no. 2017_12).
Experimental design
Stimuli
We used broadband (0–22,050 Hz) Gaussian noise stimuli which were amplitude modulated (AM) at 3 Hz. Each modulation window (333 ms) was composed of an AM noise stimulus (273 ms) that was preceded and followed by a silence period of 30 ms (in total 60 ms of silence in each modulation window). We used three modulation envelopes: damped, symmetrical, and ramped. The damped envelope corresponded to the descending half-cycle (peak to trough) of a sinusoidal function with a period of 273 × 2 = 546 ms (Fig. 1A, left), such that noise stimuli with this envelope exhibited sharp (abrupt) amplitude onset and gradual amplitude offset. The symmetrical envelope corresponded to the full cycle (trough to trough) of a sinusoidal function with a period of 273 ms (Fig. 1A, middle), such that noise stimuli with this envelope exhibited symmetrical amplitude rise and decay with the peak amplitude in the middle of the modulation window. The ramped envelope corresponded to the ascending half-cycle (trough to peak) of a sinusoidal function with a period of 273 × 2 = 546 ms (Fig. 1A, right), such that noise stimuli with this envelope exhibited gradual amplitude onset and sharp (abrupt) amplitude offset. The RMS of each modulation window was normalized across different stimulus envelope shapes.
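The stimulus construction described above can be illustrated with a minimal Python sketch. It is not the code used in the study (stimuli were generated in MATLAB). The raised-cosine scaling of each envelope to the range 0 to 1, the function names `envelope` and `modulation_window`, and the `target_rms` value are assumptions made for illustration only.

```python
import math
import random

FS = 44100        # sampling rate in Hz, as stated in the Procedure
GAP_MS = 30       # silence before and after each stimulus

def envelope(shape, n):
    """Return n envelope samples in [0, 1] (assumed raised-cosine scaling)."""
    env = []
    for i in range(n):
        x = i / n  # relative position within the stimulus, 0 <= x < 1
        if shape == "damped":          # peak to trough: sharp onset
            env.append(0.5 * (1 + math.cos(math.pi * x)))
        elif shape == "symmetrical":   # trough to trough: peak in the middle
            env.append(0.5 * (1 - math.cos(2 * math.pi * x)))
        elif shape == "ramped":        # trough to peak: sharp offset
            env.append(0.5 * (1 - math.cos(math.pi * x)))
        else:
            raise ValueError(f"unknown envelope shape: {shape}")
    return env

def modulation_window(shape, target_rms=0.1, rng=random):
    """Build one 3 Hz modulation window: silence, AM Gaussian noise, silence."""
    n_win = FS // 3                      # 14700 samples per cycle
    n_gap = round(FS * GAP_MS / 1000)    # 1323 samples of silence per side
    n_stim = n_win - 2 * n_gap           # roughly 273 ms of noise
    noise = [rng.gauss(0, 1) * e for e in envelope(shape, n_stim)]
    window = [0.0] * n_gap + noise + [0.0] * n_gap
    # Normalize the RMS of the whole window so all shapes are matched.
    rms = math.sqrt(sum(s * s for s in window) / len(window))
    return [s * target_rms / rms for s in window]
```

Calling `modulation_window("damped")` returns one cycle of the sequence; concatenating 30 such windows would give a 10 s sequence at a 3 Hz modulation rate.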
Figure 1. Stimuli and paradigm. A, Sample excerpts of acoustic stimuli with different envelope shapes. Each sample shows 1 s of acoustic signal which comprises three cycles of the amplitude modulated noise with the damped, symmetrical, and ramped envelopes (the modulation frequency is 3 Hz). B, Paradigm for the motor synchronization task. The schematic illustrates a single trial of the task. The top row depicts the acoustic stimulus, and the bottom row indicates the timing of each finger tap from the participant in response to the acoustic input. The red area highlights the analysis time window during which participants are assumed to have reached synchronized tapping. The time window comprises the final 20 taps of each trial. C, Paradigm for the sensory synchronization task. During this task, participants are presented with the same noise sequences as in the motor synchronization task. Along with each cycle of the modulated noise, a 1 kHz tone click (blue line) is presented, with its initial location being randomly selected within the modulation time window. Both the modulated noise and tone click are presented diotically. Participants are instructed to change the location of the tone click using a dial until the click is perceived to be synchronized to the noise stimuli. Participants then press a button to confirm the final location of the click, marking the end of the trial.
Tasks
Each participant performed two behavioral tasks. The first is a sensorimotor synchronization task in which participants were asked to produce continuous finger taps in sync with the periodically presented AM noise stimuli (Fig. 1B). The second task consisted of a sensory synchronization task in which the periodic AM noise sequence was simultaneously presented with a sequence of pure tone clicks that had the same presentation rate (Fig. 1C). Participants were instructed to adjust the delay between the tone sequence and noise sequence until the two acoustic streams were perceived as in-sync with each other.
The sensory synchronization task was introduced to complement findings from the motor synchronization task. Due to the absence of motor output, the sensory synchronization task examined participants' perception of the acoustic landmarks within the envelope of the noise stimuli that elicit the percept of synchronization with another acoustic stream. The same acoustic landmarks are assumed to serve as sensory references to guide the motor synchronization in the finger tapping task. Results from the sensory task would by hypothesis reveal how the envelope shape impacts the level of variability in the sensory processing of these landmarks and thus highlight the sensory components of synchronization variability in participants' performances in the motor task.
Procedure
Participants were seated in a sound-proof booth in front of an LCD monitor to receive instructions and feedback. Auditory stimuli were generated using MATLAB (The MathWorks) at 44.1 kHz/16 bits, output by a high-quality interface (RME Fireface UCX) and presented to participants binaurally via electrodynamic headphones (Beyerdynamic DT770 PRO). The experiment was run using MATLAB Psychophysics Toolbox extensions (Brainard, 1997) on a Fujitsu Celsius M730 computer running Windows 7 (64 bit).
Motor synchronization task
Participants were instructed to lay their right forearm and hand on the desk and tap their right index finger on the desk surface during each trial next to a Schaller Oyster S/P contact microphone. The sound of their taps was collected by the microphone and recorded via the RME Fireface UCX soundcard.
In each trial, one periodic sequence of amplitude modulated (AM) broadband noise with a modulation rate at 3 Hz was presented to the participant. Each sequence contained 30 modulation cycles, leading to a duration of 10 s. The average output intensity of the modulated sequence was calibrated at 70 dB (A-weighted). In each trial, the 10 s AM noise sequence was always preceded by a short noise stimulus whose amplitude was fixed at the maximum level of the amplitude modulation (Fig. 1B). The duration of this unmodulated noise stimulus varied randomly between 2 and 2.333 s across different trials.
Participants were instructed to start tapping their finger continuously after the onset of the unmodulated noise at any speed and rhythm they liked. Given that the duration of the unmodulated noise was variable across trials, having participants engage in free-style tapping alongside this noise stimulus assured a random alignment between participants' taps and the stimulus envelope at the moment when the periodic amplitude modulation started. After the beginning of the amplitude modulation, participants' task was to transition from free-style tapping to synchronized tapping to the periodically presented noise stimuli as fast as possible. Once synchronized tapping had been reached, they needed to maintain the synchronized tapping until the end of the sequence. After each trial, an intertrial interval randomly distributed between 1 and 2 s occurred before the beginning of the next trial.
For this task, each participant performed a total of 90 trials, divided into 10 blocks. Each block contained nine trials, three for each envelope shape. The trial order within each block was pseudorandomized such that there was no repetition of the same envelope condition between any two adjacent trials. Each block lasted ∼2 min, and participants were given a short break after each block. The experiment started with a practice session composed of six trials (two trials for each envelope condition), such that participants could familiarize themselves with the trial structure and their task, and find a comfortable arm and hand position for the experiment. Data from this session were excluded from the analysis.
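The pseudorandomization constraint (three trials per envelope shape in each block, with no envelope condition repeated on adjacent trials) can be met in several ways. The sketch below uses simple rejection sampling, which is an assumption, since the source does not describe how the orders were generated. The function name `block_order` is hypothetical.

```python
import random

CONDITIONS = ("damped", "symmetrical", "ramped")

def block_order(reps=3, rng=random):
    """Return a shuffled block with no envelope repeated on adjacent trials."""
    trials = [c for c in CONDITIONS for _ in range(reps)]
    while True:
        rng.shuffle(trials)
        # Accept the shuffle only if every neighbouring pair differs.
        if all(a != b for a, b in zip(trials, trials[1:])):
            return list(trials)
```

Roughly one in ten random shuffles of nine trials satisfies the constraint, so the loop terminates quickly in practice.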
Sensory synchronization task
On each trial, participants received a periodic sequence of AM noise stimuli with one of the three stimulus envelopes. Individual AM stimuli within each sequence were created with the same physical characteristics as in the motor synchronization task. Therefore, the AM noise sequence exhibited a 3 Hz modulation rate. Concurrently with the noise stimuli, a 1 kHz pure tone click was presented within each modulation window. The tone click had a 10 ms duration with 1 ms rise and decay time, and it was presented with a signal-to-noise ratio (SNR) of 2 dB with respect to the AM noise stimuli. This SNR was selected such that the tone stimulus could be perceived at all locations of the modulation window without being too intense. The noise stimuli and tone clicks were diotically presented to participants.
For each trial, the first tone click was presented at a random temporal position within the modulation window of the noise stimulus, which gave a random delay between the onset of the tone click and the onset of the AM noise. Participants were instructed to adjust the temporal position of the tone click by rotating a physical dial (Griffin Technology PowerMate USB knob) until they perceived the tone click as being synchronized with the AM noise stimuli. Turning the dial clockwise would move the tone click forward in time, while turning the dial anticlockwise would move the tone backward. It is noteworthy that the tone click was moved within a circular space that covered the length of the modulation window of the noise stimuli. That is, moving the tone click beyond the right-end boundary of the modulation window would make it appear at the left-end boundary of that window, and vice versa. Since both acoustic streams (AM noise and tone click) were presented continuously, participants received the updated position of the tone click in real time and could continuously assess the synchrony between the two streams. Upon their perceptual judgment of synchrony between the tone click and AM noise, participants were instructed to press a button on the keyboard to confirm the final position of the tone click. This confirmation would mark the end of the trial and stop the presentation of both stimuli. An intertrial interval randomly distributed between 1 and 2 s occurred before the beginning of the next trial.
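The circular behavior of the dial can be expressed as a wrap-around of the click position within the modulation window. The following sketch is illustrative only. The function name `move_click` and the step sizes are hypothetical, since the source does not report the dial resolution.

```python
WINDOW_MS = 1000 / 3  # length of one modulation window at 3 Hz

def move_click(position_ms, step_ms):
    """Shift the click by step_ms (positive = later) and wrap around the window."""
    # Python's modulo returns a non-negative result for a positive divisor,
    # so a click pushed past either boundary re-enters from the other side.
    return (position_ms + step_ms) % WINDOW_MS
```

For example, moving a click at 330 ms forward by 10 ms places it about 6.7 ms after the start of the window, and moving a click at 5 ms backward by 10 ms places it about 328.3 ms into the window.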
Participants also received 90 trials in total, divided into 10 blocks. Each block contained nine trials, three for each envelope shape. The trial order within each block was pseudorandomized such that there was no repetition of the same envelope condition between any two adjacent trials. For this task, since the duration of each trial was determined by participants, i.e., depending on how long they needed to reach the percept of synchrony, the duration of each block was variable. On average, the 10 blocks of the test lasted ∼45 min. This experiment started with a practice session composed of six trials (two trials for each envelope condition). Data from this session were excluded from the analysis.
Data analysis
Motor synchronization task
Detection of individual taps of each trial
The raw data of each trial consisted of a sound file that contained two channels. The first channel contained the sound of participants' finger taps during the entire trial recorded by the external microphone. The second channel contained the sequence of noise stimuli that was internally recorded by the soundcard during the trial. In order to measure the onset timing of individual taps of each trial, we developed a procedure that enabled flexible amplitude thresholding to extract tap onsets (abrupt amplitude rises to a relatively high level) from the background noise floor and, occasionally, from other low amplitude artifactual sound that overlapped with the tap (e.g., sound caused by movements of the participants' arm or hand against the desk surface). The trial recording and detected tap onsets were plotted for visual inspection, which allowed for further removal of sound artifacts that were falsely detected as taps.
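The source does not specify the details of the thresholding procedure. As a rough illustration of the general idea, the sketch below marks a tap onset at the first sample whose absolute amplitude reaches a threshold, and ignores further crossings until the signal has stayed below threshold for a minimum gap. The fixed `threshold`, the `min_gap_s` parameter, and the function name are assumptions. The procedure actually used was described as flexible and was followed by visual inspection.

```python
def detect_tap_onsets(signal, fs, threshold, min_gap_s=0.1):
    """Return onset times (s) of supra-threshold bursts in a tap recording."""
    onsets = []
    last_above = None  # time of the most recent supra-threshold sample
    for i, sample in enumerate(signal):
        if abs(sample) >= threshold:
            t = i / fs
            # A new tap starts only after a sufficiently long quiet period.
            if last_above is None or t - last_above >= min_gap_s:
                onsets.append(t)
            last_above = t
    return onsets
```

Low-amplitude artifacts that stay below the threshold are ignored, which mirrors the stated goal of separating taps from the noise floor and from weaker movement sounds.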
Analysis rationale
Since we investigate participants' performance in synchronized tapping to stimulus sequences, we focused on participants' final 20 taps of each trial (Fig. 1B), during which participants are assumed to have reached consistent synchronized tapping. We derived different measurements from these taps to examine (1) participants' perceptual precision of stimulus' onset and (2) their continued tracking of temporal regularity of stimulus sequences.
The first examination concerns participants' perceptual precision in detecting the onset of each stimulus envelope. In a given trial, once a participant achieves synchronized tapping to the repeated stimulus, their taps should align approximately to the same location of the repeated stimulus envelope. The average alignment location across these taps provides a good estimate of the participant's perceived onset of the repeated stimulus during that trial. Moreover, the average tap alignment across all the trials of the same stimulus type provides an estimate of the overall perceived onset for that participant. This averaged location of the perceived onset is referred to as the perceptual center (Morton et al., 1976).
Besides the overall location of participants' perceptual center, previous research has also shown that stimuli with less salient physical onsets lead to larger variability in participants' average tap alignment locations across different trials. This effect can be described as the broadening of the perceptual center, which reflects less consistent localization of the perceived onset of the stimuli occurring across different trials and across different participants (see Hawkins, 2014 for a review). Reduced onset saliency can result from increased acoustic complexity of the signal (Villing et al., 2011) or slower amplitude ramping (Scott, 1993; Danielsen et al., 2019). Based on these previous findings, we expect larger variability in participants' tap alignment locations across trials for stimuli with gradual amplitude onsets compared with those with sharp onsets. We refer to measurements related to this examination as between-trial synchronization performance.
The second examination concerns participants' ability to continuously track the temporal regularity from individual stimulus sequences. Here, instead of estimating the overall tap alignment location for each trial, we focus on dynamic changes across participants' synchronized taps within each trial. From these taps, we derive two measurements to examine (1) how consistently participants maintain the tap alignment locations across synchronized taps and (2) how steadily they maintain their tapping rate in line with the periodicity of the stimulus sequence. Previous research of sensorimotor synchronization has proposed that these two abilities, known as phase and period tracking, reflect two cognitive processes that operate concurrently during continuous motor synchronization to periodic acoustic signals (Repp, 2005; Jacoby and Repp, 2012).
Given the presence of intrinsic noise in human motor output, after achieving synchronized tapping in each trial, one should expect participants' taps to fluctuate to a certain extent around the location of the perceived stimulus onset. The degree of this fluctuation could be impacted by the temporal precision of the perceived onset. Under the assumption that gradual onsets reduce perceptual precision of stimulus onset, we expect participants to also show larger variability in tap alignment locations across individual taps within each trial. For the stability of participants' tapping rate within trials, we measure, for each trial, the variability of intertap intervals (ITIs) across the synchronized taps. Intuitively, greater variations in participants' tap-stimulus alignments across consecutive taps within a trial should also lead to larger variations in their ITIs across the same taps. We thus also expect an impact of stimulus envelope shape on the steadiness of participants' tapping rates within trials. We refer to measurements related to the second examination as within-trial synchronization performance.
Between-trial synchronization performance
To assess participants' between-trial performance, we examined the average location and variability across participants' tap-stimulus alignments (TSAs) from different trials of each envelope condition. The essential measurement for these analyses was the TSA location for each trial (TSAtrial), which was computed by averaging the TSA across the 20 taps within the trial (Fig. 2A). For this measurement, we first calculated the temporal delay (in ms) between the onset of each tap and the onset of the nearest modulation window. The cyclic presentation of AM noise stimuli led to a circular distribution of the TSAs. In order to more accurately estimate the average TSA location across the 20 taps of each trial, we converted the time delays (ms) into angles (radians) within a circular space that covered the length of the modulation window and computed the circular mean of these angles, which gave the average TSA location of each trial (TSAtrial). We then averaged TSAtrial across different trials of each envelope condition, which was used to assess participants' overall TSA locations for each envelope shape (TSAcondition). To examine how consistently participants' taps aligned to the same part of stimulus envelope across different trials of the same envelope condition, we computed the circular standard deviation (CSD; Mardia, 1975) across TSAtrial from different trials of each envelope condition (CSDbetween-trial TSA), using the CircStat toolbox in MATLAB (Berens, 2009). Consistent tap-stimulus alignments between trials should lead to low CSDbetween-trial TSA.
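The conversion from delays to angles and the circular statistics can be sketched as follows. This is an illustration in Python, not the CircStat code used in the study. The mapping of the full modulation window (1000/3 ms) onto 2π radians is consistent with the values reported in the Results, and the CSD is written in the common form sqrt(−2 ln R), where R is the mean resultant length. The source does not state which variant of the CSD was computed, so this choice is an assumption.

```python
import math

WINDOW_MS = 1000 / 3  # one modulation window maps onto a full circle

def ms_to_rad(delay_ms):
    return 2 * math.pi * delay_ms / WINDOW_MS

def rad_to_ms(angle):
    return angle * WINDOW_MS / (2 * math.pi)

def circ_mean(angles):
    """Direction of the mean resultant vector, in (-pi, pi]."""
    s = sum(math.sin(a) for a in angles)
    c = sum(math.cos(a) for a in angles)
    return math.atan2(s, c)

def circ_std(angles):
    """Circular standard deviation sqrt(-2 ln R), R = mean resultant length."""
    n = len(angles)
    s = sum(math.sin(a) for a in angles) / n
    c = sum(math.cos(a) for a in angles) / n
    r = min(math.hypot(s, c), 1.0)  # guard against rounding slightly above 1
    if r == 0.0:
        return math.inf
    return math.sqrt(-2 * math.log(r))
```

The circular mean matters for taps that straddle the window boundary. Two taps at 10 ms before and 10 ms after stimulus onset have delays of about 323.3 ms and 10 ms; their arithmetic mean falls in the middle of the window, whereas `rad_to_ms(circ_mean(...))` returns a value close to 0 ms.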
Within-trial synchronization performance
To assess participants' within-trial performance, we examined their maintenance of the tapping rates and tap-stimulus alignments across the final 20 taps within each trial. To this end, we calculated the ITIs between every two adjacent taps across the 20 taps of each trial, which yielded 19 ITIs per trial (Fig. 4A). The mean ITI of each trial thus indicated the tapping rate of the trial, which should be close to 333 ms upon synchronized tapping at 3 Hz. Meanwhile, the maintenance of the tapping rate within the trial would be reflected in the standard deviation across the 19 ITIs, which we referred to as SDwithin-trial ITI. Steady tapping rate within trials should lead to low SDwithin-trial ITI. To examine the maintenance of tap-stimulus alignments (TSA) within trials, we measured the TSAs of individual taps within each trial, first as time delay (ms) and then converted into angles (radian) within a circular space that covered the modulation window of noise stimuli (Fig. 4A). To examine how consistently participants' individual taps aligned to the same part of stimulus envelope within single trials, we computed the circular standard deviation (Mardia, 1975) across the 20 TSAs for each trial (CSDwithin-trial TSA). Consistent tap-stimulus alignment within trials should lead to low CSDwithin-trial TSA for each individual trial.
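The within-trial ITI measures can be sketched as below. The use of the sample standard deviation (n − 1 in the denominator) is an assumption, as the source does not state which estimator was used. The function name `iti_stats` is hypothetical.

```python
import statistics

def iti_stats(tap_times_ms):
    """Return (ITIs, mean ITI, SD of ITIs) for the tap onsets of one trial."""
    # Differences between adjacent taps: 20 taps yield 19 intervals.
    itis = [b - a for a, b in zip(tap_times_ms, tap_times_ms[1:])]
    return itis, statistics.mean(itis), statistics.stdev(itis)
```

For perfectly isochronous tapping at 3 Hz, the mean ITI is close to 333 ms and the SD is close to zero; timing jitter increases the SD while leaving the mean largely unchanged.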
Statistical analysis
To examine the effect of stimulus envelope shape on the between-trial variability of participants' TSAs, we used a nonparametric Friedman test with participants' CSDbetween-trial TSA as the dependent variable and stimulus envelope as the within-participant factor (damped vs symmetrical vs ramped). In case of a significant effect of stimulus envelope, we ran post hoc comparisons with Wilcoxon signed-rank tests. We conducted the same analyses on participants' within-trial variability of TSAs (CSDwithin-trial TSA) and of ITIs (SDwithin-trial ITI). See Results for detailed descriptions of these analyses.
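For readers unfamiliar with the Friedman test, the sketch below computes its statistic for a design with three within-participant conditions. It ranks the conditions within each participant, sums the ranks per condition, and uses the chi-square approximation with two degrees of freedom, for which the upper-tail probability is exactly exp(−χ²/2). No tie correction is applied, and the function names are hypothetical. It is a simplified illustration rather than the routine used in the study.

```python
import math

def rank_row(values):
    """Average ranks (1 = smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def friedman_three_conditions(data):
    """data: one row per participant with three values (one per condition)."""
    n, k = len(data), len(data[0])
    if k != 3:
        raise ValueError("the closed-form p-value below assumes 3 conditions")
    rank_sums = [0.0] * k
    for row in data:
        for j, r in enumerate(rank_row(row)):
            rank_sums[j] += r
    chi2 = 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)
    p = math.exp(-chi2 / 2)  # chi-square upper tail for 2 degrees of freedom
    return chi2, p
```

With four hypothetical participants who all show the same ordering of the three conditions, the statistic reaches its maximum of n(k − 1) = 8.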
Removal of trials with outlier synchronization performances
All participants self-reported having achieved synchronized tapping for most trials across the three envelope conditions. Nevertheless, we ran a procedure for each participant's data to detect and remove outlier trials in which the participant gave relatively inferior synchronization performance.
For each participant and envelope condition, we removed trials which, across the final 20 taps, exhibited exceptionally irregular tapping rates (ITIs) and/or tap-stimulus alignments with respect to other trials from the same condition. We conducted outlier removals for each envelope condition in order to avoid unfairly removing more trials from conditions which presumably elicited less consistent synchronized tapping than other conditions. Our goal was to achieve a more uniform trial cohort that reflects participants' synchronization performances in each envelope condition.
For each condition, trials whose mean tapping rate (reflected in ITItrial) fell outside the central 99.7% of the condition distribution (estimated as a normal distribution; equivalent to three standard deviations on either side of the mean) were considered outliers and removed. Trials whose within-trial ITI variation (SDwithin-trial ITI) was higher than the 99.7th percentile (right side) of the condition distribution (estimated as a χ² distribution) were considered outliers and removed. Trials whose within-trial TSA variation (CSDwithin-trial TSA) was higher than the 99.7th percentile (right side) of the condition distribution (estimated as a χ² distribution) were considered outliers and removed.
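The first of these criteria, the two-sided rule based on a normal approximation, can be sketched as follows. The example estimates the mean and sample standard deviation from the trials of one condition and flags trials more than three standard deviations away. The function name and the estimator are assumptions. The two one-sided criteria require fitting a distribution to the variability measures and are not sketched here, because the source does not describe the fitting procedure.

```python
import statistics

def normal_outliers(values, z=3.0):
    """Return indices of values more than z sample SDs from the mean."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return [i for i, v in enumerate(values) if abs(v - mu) > z * sd]
```

Applied to the 30 mean ITIs of one condition, the function returns the indices of trials whose tapping rate deviates strongly from the rest of that condition.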
In summary, our procedure removed trials with exceptionally irregular tapping rates and/or tap-stimulus alignments, while preserving the effect of each envelope condition on both measurements. Our procedure removed an average of 5.74% of trials per participant.
For each selected trial, we conducted Rayleigh’s test to check whether the 20 TSAs within the trial form a uniform distribution. A uniform distribution makes it harder to interpret the trial-averaged TSA location. This test did not reveal any trials with uniform distribution of TSAs. For each participant, we also conducted the same test on the TSAs of selected trials for each condition before computing the condition-average TSA. This test revealed two participants (S114 and S119) for whom the distribution of TSAs across trials in the ramped condition did not differ significantly from a uniform distribution. Since we did not use the condition-averaged TSA for any analyses, and since all trials from these two participants passed the nonuniform distribution test, we did not exclude data from these two participants from our analyses on between-trial and within-trial TSA variabilities. However, we excluded the condition-average TSA for the ramped condition of these two participants from the calculation of the average TSA location across participants.
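Rayleigh's test evaluates whether a set of angles is compatible with a uniform distribution around the circle. A minimal sketch is given below. It computes the resultant length R and the statistic z = R²/n, and obtains the p-value from a widely used closed-form approximation (Zar, 1999). Whether the study used this exact approximation is an assumption, and the function name is hypothetical.

```python
import math

def rayleigh_test(angles):
    """Return (z, p) for Rayleigh's test of circular uniformity."""
    n = len(angles)
    c = sum(math.cos(a) for a in angles)
    s = sum(math.sin(a) for a in angles)
    big_r = math.hypot(c, s)   # resultant length, between 0 and n
    z = big_r ** 2 / n
    # Closed-form approximation of the p-value; a small p rejects uniformity.
    p = math.exp(math.sqrt(1 + 4 * n + 4 * (n * n - big_r ** 2)) - (1 + 2 * n))
    return z, p
```

Twenty taps clustered around the same phase give a p-value near zero, whereas twenty phases spread evenly around the circle give a p-value near 1, in which case an average alignment location is difficult to interpret.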
Sensory synchronization task
Determine the click-noise alignment for each trial
For this task, the experimental control program recorded the temporal position of the tone click within the last modulation window of each trial. This final temporal position was selected and confirmed by the participant when the tone click was perceived as in-sync with the AM noise sequence (Fig. 3A). For each trial, we calculated the delay (in ms) between the onset of the final tone click and the onset of the corresponding modulation window, which indicated the click-noise alignment of the trial (CNAtrial). As for the tap-stimulus alignment (TSA), we converted time delays (ms) into angles (radians) within a circular space that covered the modulation window in order to more accurately estimate the average click-noise alignment for each trial.
Between-trial synchronization performance
We then computed two measurements in order to assess participants' between-trial synchronization performance in the sensory synchronization task. First, we computed the average CNA across all the trials of each envelope condition (CNAcondition), which reflected each participant's overall CNA locations for each envelope shape. To examine the level of consistency with which participants aligned the tone click to the noise stimuli across different trials of each envelope condition, we computed the circular standard deviation (Mardia, 1975) across CNAs from different trials of each envelope condition (CSDbetween-trial CNA). Consistent click-noise alignments between trials should lead to low CSDbetween-trial CNA for individual participants.
For each participant, we conducted Rayleigh’s test to check whether the CNAs across trials of each envelope condition form a uniform distribution. A uniform distribution makes it harder to interpret the condition-averaged CNA location. This test revealed one participant (S117) of whom the distribution of CNAs across trials in the ramped condition did not differ significantly from a uniform distribution. Since we did not use the condition-averaged CNA for any analyses, we did not exclude data from this participant from our analyses on between-trial CNA variabilities. However, we excluded the condition-average CNA for the ramped condition of this participant from the calculation of the average CNA location across participants.
Statistical analyses
To examine the effect of stimulus envelope shape on the between-trial variability of participants' CNAs, we used a nonparametric Friedman test with participants' CSDbetween-trial CNA as the dependent variable and stimulus envelope as the within-participant factor (damped vs symmetrical vs ramped). In case of significant effect of stimulus envelope, we ran post hoc comparisons with Wilcoxon signed-rank tests.
Results
Effect of envelope shape on participants' average tap-stimulus synchronization
Figure 2B shows each participant's average tap-stimulus alignment for each stimulus envelope (TSAcondition). Note that all TSA-related analyses were conducted using circular data (in radians; see Materials and Methods). To better illustrate the TSA locations with respect to different types of envelopes, we displayed participants' TSAcondition as time delays (in ms) to the AM onset of noise stimuli in Figure 2B. Also, in our results we report the average and variability of TSA in both radians and ms (converted from the circular data). For the damped stimuli, participants' average taps were generally aligned to stimulus onset with a negative asynchrony (converted time delay: mean = −10.66 ms, SD = 11.55 ms; circular TSAcondition: mean = −0.20 radian, CSD = 0.22 radian). For the other two stimulus envelopes with more gradual amplitude onsets, participants' average taps were distributed after the onset of the amplitude rise, yielding an average delay of 44.89 ms (SD = 14.3 ms) across participants for the symmetrical envelope (circular TSAcondition: mean = 0.85 radian, CSD = 0.27 radian) and an average delay of 113.74 ms (SD = 38.35 ms) for the ramped envelope (circular TSAcondition: mean = 2.14 radian, CSD = 0.72 radian).
Participants' between-trial performance in the motor synchronization task. A, Computation of the tap-stimulus alignment of each trial (TSAtrial) by averaging the TSA across the 20 final taps of each trial. B, Average TSA of each stimulus envelope condition for individual participants (left, damped stimuli; middle, symmetrical stimuli; right, ramped stimuli). In each graph, the gray curve indicates the wave form of an example stimulus with the corresponding stimulus envelope. The x-axis represents time, with zero indicating the onset of the amplitude rise. The red circles indicate the condition-average of TSA (TSAcondition) of each of the 24 participants. The y-axis shows participant ID, common across the three graphs. Two participants' average TSAs for the ramped condition are omitted (S114 and S119). In these two cases, the distribution of TSAs across trials did not differ significantly from a uniform distribution, which affects the interpretability of the average TSA locations across these trials (see Materials and Methods for more details). C, Between-trial variation of TSA of the three stimulus envelope conditions (CSDbetween-trial TSA). For each envelope condition, circles indicate the average CSDbetween-trial TSA of individual participants. The levels of CSD are shown in both radians (y-axis on the left side) and ms (converted from circular data; y-axis on the right side). Asterisks mark statistical significance of pairwise comparisons among envelope conditions (***p < 0.001).
We then examined the between-trial variability of participants' TSAs to different stimulus envelopes (Fig. 2C). We used a nonparametric Friedman test with participants' CSDbetween-trial TSA as the dependent variable and stimulus envelope as the within-participant factor. The test showed a significant effect of stimulus envelope [Chi-2(2) = 46.08, p < 0.001]. Post hoc comparisons with Wilcoxon signed-rank tests revealed higher CSDbetween-trial TSA for the ramped envelope than for the symmetrical [Ramped-Symmetrical = 0.45 radian (23.7 ms); z = 4.29; p < 0.001] and for the damped conditions [Ramped-Damped = 0.57 radian (30.03 ms); z = 4.29; p < 0.001], as well as higher CSDbetween-trial TSA for the symmetrical condition than for the damped condition [Symmetrical-Damped = 0.12 radian (6.33 ms); z = 4.17; p < 0.001]. Additional statistical tests using mean resultant vector length (R) as the dependent variable showed the same effects of envelope shape (Extended Data Fig. S1A).
In summary, participants exhibited larger between-trial variability in their tap-stimulus alignments when synchronized to stimuli with more gradual onsets than to those with sharper onsets. This result aligns with findings from a previous study that employed sensorimotor synchronization tasks to musical and quasimusical stimuli with different onset dynamics (Danielsen et al., 2019).
Effect of envelope shape on participants' click-noise synchronization
Next, the analysis of the sensory synchronization task revealed similar between-trial performance as the motor synchronization task. First, participants' average click-noise alignment (CNAcondition) for different types of envelope shapes showed similar distributional properties as their average tap-stimulus alignment (TSAcondition) for the same envelope condition (Fig. 3B). For the damped stimuli, participants' final click locations were aligned to stimulus onset with a positive asynchrony, contrasting with the negative asynchrony of the sensorimotor tapping task (time delay: mean = 12.3 ms, SD = 8.18 ms; circular CNAcondition: mean = 0.23 radian, CSD = 0.15 radian). For the symmetrical stimuli, participants' final click locations showed an average delay of 79.86 ms (SD = 13.98 ms) with respect to the onset of amplitude modulation (circular CNAcondition: mean = 1.51 radian, CSD = 0.26 radian). For the ramped stimuli, participants' final click locations showed an average delay of 162.63 ms (SD = 24.73 ms) with respect to the onset of amplitude modulation (circular CNAcondition: mean = 3.07 radian, CSD = 0.47 radian).
Participants' between-trial performance in the sensory synchronization task. A, The click-noise alignment of each trial (CNAtrial) was measured based on the final position of the tone click of each trial, which was confirmed by the participant when the tone click was perceived as in-sync with the noise stimulus. B, Average CNA of each stimulus envelope condition for individual participants (left, damped stimuli; middle, symmetrical stimuli; right, ramped stimuli). In each graph, the gray curve indicates the wave form of an exemplar stimulus with the corresponding stimulus envelope. The x-axis presents time, with zero indicating the onset of amplitude rise. The blue circles indicate the condition-average of CNA (CNAcondition) of each of the 24 participants. The y-axis shows participant ID, which is common across the three graphs. One participant's average CNA for the ramped condition is omitted (S117). For this participant, the distribution of CNA across trials did not differ significantly from a uniform distribution, which affects the interpretability of the average CNA location across these trials (see Materials and Methods for more details). C, Between-trial variation of CNA of the three stimulus envelope conditions (CSDbetween-trial CNA). For each envelope condition, circles indicate the average CSDbetween-trial CNA of individual participants. The levels of CSD are shown in both radians (y-axis on the left side) and ms (converted from circular data; y-axis on the right side). Asterisks indicate statistical significance of pairwise comparisons among envelope conditions (***p < 0.001).
Second, we examined the between-trial variability of participants' CNAs for different stimulus envelopes. A Friedman test with participants' CSDbetween-trial CNA as the dependent variable and stimulus envelope as the within-participant factor revealed a significant effect of stimulus envelope [Chi-2(2) = 42.25, p < 0.001]. Post hoc comparisons with Wilcoxon signed-rank tests revealed higher CSDbetween-trial CNA for the ramped envelope than for the symmetrical [Ramped-Symmetrical = 0.37 radian (19.61 ms); z = 4.26; p < 0.001] and for the damped conditions [Ramped-Damped = 0.53 radian (27.94 ms); z = 4.29; p < 0.001], as well as higher CSDbetween-trial CNA for the symmetrical condition than for the damped condition [Symmetrical-Damped = 0.16 radian (8.33 ms); z = 4.06; p < 0.001; Fig. 3C]. Additional statistical tests using mean resultant vector length (R) as the dependent variable showed the same effects of envelope shape (Extended Data Fig. S1B).
Finally, we compared participants' average synchronization locations in their motor task and sensory task. For each participant, we calculated the angular difference between the average tap-stimulus alignment (TSAcondition) and the average click-noise alignment (CNAcondition) for each stimulus envelope. We referred to this measurement as motor-sensory asynchrony (MSA). All three envelope shapes yielded a negative MSA across participants: damped stimuli (angular difference: mean = −0.43 radian; CSD = 0.22 radian; converted time difference: mean = −23 ms; SD = 11.56 ms); symmetrical stimuli (angular difference: mean = −0.66 radian; CSD = 0.35 radian; converted time difference: mean = −34.93 ms; SD = 18.43 ms); and ramped stimuli (angular difference: mean = −0.82 radian; CSD = 0.81 radian; converted time difference: mean = −43.27 ms; SD = 42.93 ms). This finding supported the presence of negative mean asynchrony between participants' tap locations and the presumable locations of the acoustic landmarks to which participants synchronize their taps.
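The angular difference underlying the MSA can be sketched as a signed, wrapped subtraction. The wrapping convention and the 1000/3 ms window are assumptions, and the example reuses the group-level damped means reported above purely for illustration; the study computed the difference per participant.

```python
import cmath
import math

# Assumption: one modulation window lasts 1000/3 ms (3 Hz AM rate).
PERIOD_MS = 1000.0 / 3.0


def angular_difference(a, b):
    """Signed difference a - b, wrapped into the range [-pi, pi]."""
    return cmath.phase(cmath.exp(1j * (a - b)))


def angle_to_delay(angle_rad, period_ms=PERIOD_MS):
    """Convert an angle in radians into a time difference in ms."""
    return angle_rad * period_ms / (2.0 * math.pi)


# Illustration with the reported group means for damped stimuli:
# TSA = -0.20 radian, CNA = 0.23 radian.
msa_rad = angular_difference(-0.20, 0.23)
msa_ms = angle_to_delay(msa_rad)  # close to the reported -23 ms
```

Wrapping matters when the two alignments fall on opposite sides of the window boundary: a tap at 3.0 radian and a click at −3.0 radian differ by about −0.28 radian, not by 6 radian.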
In summary, the results from both motor and sensory synchronization tasks converge to show more variable synchronization locations within the envelope of gradual-onset stimuli than within the envelope of sharp-onset stimuli. This variability effect was mainly observed across different trials within individual participants, although more variable alignments to gradual-onset stimuli were also observable across different participants (Figs. 2B, 3B). In the following analyses, we focus on participants' motor synchronization performances on individual trials. The objective of these next analyses is to assess the impact of stimulus envelope shape on participants' ability to generate and sustain temporally regular motor output that is synchronized to the input acoustic sequences.
Effect of envelope shape on participants' continuous synchronization to acoustic input
We examined two characteristics of participants' continuous motor output across the different taps in individual trials (see Materials and Methods for details): (1) the maintenance of tap-stimulus alignments (TSA) across different taps which was reflected in the variation of TSAs across taps of individual trials (CSDwithin-trial TSA; Fig. 4A) and (2) the maintenance of tapping rate which was reflected in the variation of ITIs across different taps of individual trials (SDwithin-trial ITI; Fig. 4A).
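Both within-trial measures can be derived from the same list of tap times, as in the sketch below. It assumes that envelope onsets fall at multiples of a 1000/3 ms window starting at time zero, that the circular standard deviation is sqrt(-2 ln R), and that the ITI spread is a sample standard deviation. None of these details are specified above, and the function name is hypothetical.

```python
import cmath
import math
import statistics

# Assumption: envelope onsets occur at multiples of 1000/3 ms from t = 0.
PERIOD_MS = 1000.0 / 3.0


def within_trial_measures(tap_times_ms, period_ms=PERIOD_MS):
    """Return (circular SD of TSA in radians, SD of ITI in ms) for one trial."""
    # Tap-stimulus alignment of each tap as an angle within its window.
    angles = [2.0 * math.pi * (t % period_ms) / period_ms for t in tap_times_ms]
    r = min(abs(sum(cmath.exp(1j * a) for a in angles)) / len(angles), 1.0)
    csd_tsa = math.sqrt(-2.0 * math.log(r)) if r > 0.0 else math.inf
    # Intertap intervals between consecutive taps.
    itis = [b - a for a, b in zip(tap_times_ms, tap_times_ms[1:])]
    sd_iti = statistics.stdev(itis)
    return csd_tsa, sd_iti


# Hypothetical trial: 20 taps alternating 5 ms early and 5 ms late.
taps = [k * PERIOD_MS + (5.0 if k % 2 else -5.0) for k in range(1, 21)]
csd_tsa, sd_iti = within_trial_measures(taps)
```

In this alternating example both measures are large together, which is the correspondence one would expect if taps simply jittered around a fixed landmark.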
Participants' within-trial tapping performances. A, Measurements of within-trial variation of tap-stimulus alignments (TSA) and intertap interval (ITI). B, Within-trial variation of TSA of the three stimulus envelope conditions (CSDwithin-trial TSA). For each envelope condition, circles indicate the average CSDwithin-trial TSA of individual participants. The levels of CSD are shown in both radians (y-axis on the left side) and ms (converted from circular data; y-axis on the right side). C, Within-trial variation of ITI of the three stimulus envelope conditions (SDwithin-trial ITI). For each envelope condition, circles indicate the average SDwithin-trial ITI of individual participants. Asterisks indicate statistical significance of pairwise comparisons among envelope conditions (*p < 0.05; ***p < 0.001).
We first examined the impact of envelope shape on participants' TSAs within trials. A Friedman test with participants' average CSDwithin-trial TSA as the dependent variable and stimulus envelope as the within-participant factor revealed a main effect of stimulus envelope [Chi-2(2) = 42.25, p < 0.001]. Post hoc comparisons with Wilcoxon signed-rank tests showed higher CSDwithin-trial TSA for the ramped envelope than for the symmetrical [Ramped-Symmetrical = 0.091 radian (4.85 ms); z = 4.26; p < 0.001] and for the damped conditions [Ramped-Damped = 0.13 radian (7.01 ms); z = 4.29; p < 0.001], as well as higher CSDwithin-trial TSA for the symmetrical condition than for the damped condition [Symmetrical-Damped = 0.041 radian (2.15 ms); z = 3.97; p < 0.001; Fig. 4B]. Additional statistical tests using mean resultant vector length (R) as the dependent variable showed the same effects of envelope shape (Extended Data Fig. S1C).
Next, we examined the impact of envelope shape on participants' ITIs within trials. Overall, all three envelope shapes yielded an average ITI close to 333 ms (Damped: mean = 333.42 ms, SD = 0.51 ms; Symmetrical: mean = 333.60 ms, SD = 0.55 ms; Ramped: mean = 334.09 ms, SD = 1.26 ms). This result indicates a tapping rate close to 3 Hz for all three envelope types, with the largest deviation from the target interval being 1.09 ms for the ramped envelope condition. A Friedman test with mean ITI as the dependent variable and envelope shape as the within-participant factor revealed a significant main effect of envelope shape [Chi-2(2) = 9.25, p < 0.01]. Post hoc comparisons with Wilcoxon signed-rank tests showed higher ITI for the ramped envelope than for the symmetrical [Ramped-Symmetrical = 0.47 ms; z = 2.17; p < 0.05] and for the damped conditions [Ramped-Damped = 0.67 ms; z = 2.52; p < 0.05]. The difference between the symmetrical condition and the damped condition was marginally significant (Symmetrical-Damped = 0.18 ms; z = 1.94; p = 0.052).
To examine the impact of envelope shape on participants' within-trial ITI variations, we conducted a Friedman test with SDwithin-trial ITI as the dependent variable and envelope shape as the within-participant factor. This analysis showed a significant main effect of envelope shape [Chi-2(2) = 6.25, p < 0.05; Fig. 4C]. Post hoc comparisons with Wilcoxon signed-rank tests revealed a significant difference between ramped and damped envelopes, but, unexpectedly, with the ramped envelope exhibiting lower within-trial ITI variations than the damped envelope [Ramped-Damped = −0.49 ms, z = −2, p < 0.05; Fig. 4C]. There was also a marginally significant difference between ramped and symmetrical envelopes, also with the ramped envelope exhibiting lower within-trial ITI variations (Ramped-Symmetrical = −0.81 ms, z = −1.77, p = 0.067; Fig. 4C).
Our analyses reveal complex modulations of participants' within-trial synchronized tapping performances by the shape of stimulus envelope. On the one hand, we demonstrated that gradual-onset envelopes caused participants' within-trial TSA to vary more across taps compared with sharp-onset envelopes, which is in line with our results on participants' between-trial TSA variabilities. On the other hand, the effect of stimulus envelope shape on participants' within-trial ITIs showed unexpected patterns. First, although our analysis showed a significant effect of envelope shape on participants' mean ITI, it is noteworthy that the ITI differences across conditions were numerically negligible (<1 ms), with 0.67 ms being the largest difference between ramped and damped conditions. Most unexpectedly, our analysis showed similar within-trial ITI variations across the three conditions. Not only did gradual-onset stimuli not cause participants to vary more in their ITIs across taps, but the ramped stimuli even showed slightly smaller within-trial ITI variations than the damped and symmetrical stimuli.
The differential influence of envelope shape on within-trial TSA and ITI variations is intriguing, as one might expect a rather straightforward correspondence between the two measurements: more variable tap-stimulus alignment between consecutive taps should lead to more variable intervals between consecutive taps. Therefore, it is puzzling how participants exhibit larger difficulty in maintaining consistent tap-stimulus alignment in trials with the gradual-onset stimuli than with the sharp-onset stimuli, while being able to maintain their ITIs similarly consistently for both envelope types. To address this unexpected pattern, we conducted a more detailed examination on the trajectories of participants' tap locations within trials.
Analyses of tap trajectory reveal stronger presence of TSA drift in synchronized tapping to ramped stimuli
Visual inspection revealed potential links between participants' within-trial tap trajectories and the levels of ITI and TSA variations. Figure 5B shows the TSA trajectories across the 20 taps from three selected trials of a single participant (Participant 117; Fig. 5A). Based on the previous findings, we are particularly interested in exploring characteristics of participants' tap trajectories that could result in different levels of TSA variations—while exerting little impact on ITI variation. Accordingly, the three selected trials illustrated in Figure 5B exhibit different levels of within-trial TSA variations while showing nearly identical within-trial ITI variations (Fig. 5A). Visual inspection of their tap trajectories reveals that the increase of TSA variations coincided with the amount of TSA drifts during the course of the trial (Fig. 5B). Meanwhile, the degree of TSA drifts did not impact the ITI variations of the three trials.
Analyses of tap trajectories within individual trials. A, Within-trial variations of tap-stimulus alignment (TSA) and intertap interval (ITI) of trials from Participant 117. Filled circles indicate three trials that we selected to show their TSA trajectories across the 20 taps in Figure 5B. These three trials exhibited different levels of within-trial TSA variations (CSDwithin-trial TSA, y-axis) while showing similar within-trial ITI variations (SDwithin-trial ITI, x-axis). B, Trajectories of tap-stimulus alignments across 20 taps of the three selected trials. In each graph, the x-axis presents time, with zero indicating the onset of the noise stimulus; the y-axis presents the order of the 20 taps from top to bottom; circles indicate the TSA location of each individual tap. C, Average TSA drift of the three stimulus conditions. For each envelope condition, circles indicate the average TSA drift of individual participants. The levels of TSA drift are shown in both radians (y-axis on the left side) and ms (converted from circular data; y-axis on the right side). Asterisks indicate statistical significance of pairwise comparisons among envelope conditions (***p < 0.001).
Based on these observations, we examined the impact of stimulus envelope on the degree of within-trial TSA drift. To quantify the degree of TSA drift in each trial, we computed the average TSA location across the first 10 taps and across the last 10 taps of the 20-tap analysis window for each trial. The absolute difference between the two average TSA locations thus indicates the amount of TSA drift that takes place during the course of the trials. We then conducted a Friedman test on TSA drift with envelope shape as within-participant factor. The test revealed a main effect of envelope [Chi-2(2) = 44.33, p < 0.001]. Post hoc analyses revealed larger TSA drifts in participants' taps to ramped stimuli than to the symmetrical [Ramped-Symmetrical = 0.17 radian (9.04 ms); z = 4.29; p < 0.001] and damped stimuli [Ramped-Damped = 0.24 radian (12.61 ms); z = 4.29; p < 0.001], as well as larger TSA drifts for the symmetrical stimuli than for damped stimuli [Symmetrical-Damped = 0.067 radian (3.58 ms); z = 3.82; p < 0.001; Fig. 5C].
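The drift measure described above can be sketched as the absolute difference between the circular means of the first and second halves of the analysis window. Wrapping the difference onto the circle is an assumption of this sketch, and the function names are hypothetical.

```python
import cmath


def circular_mean(angles):
    """Direction of the summed unit vectors, in the range [-pi, pi]."""
    return cmath.phase(sum(cmath.exp(1j * a) for a in angles))


def tsa_drift(tsa_angles):
    """Absolute circular difference between the mean TSA of the first
    half and the mean TSA of the second half of the taps."""
    half = len(tsa_angles) // 2
    first = circular_mean(tsa_angles[:half])
    last = circular_mean(tsa_angles[half:])
    # Assumption: the difference is wrapped so it never exceeds pi.
    return abs(cmath.phase(cmath.exp(1j * (last - first))))


# Hypothetical trial: TSA moves by 0.01 radian on each of 20 taps.
drifting_trial = [0.01 * k for k in range(20)]
stable_trial = [0.05] * 20
drift_drifting = tsa_drift(drifting_trial)  # about 0.10 radian
drift_stable = tsa_drift(stable_trial)      # about 0
```

Because the measure compares two 10-tap averages, a steady drift of 0.01 radian per tap produces a value of about 0.10 radian, that is, ten steps between the centers of the two halves.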
In summary, our analysis of within-trial tap trajectories showed greater TSA drift when participants synchronously tap to stimuli with gradual-onset envelopes than to those with sharp-onset envelopes. While larger TSA drifts are expected to lead to higher within-trial TSA variations, they did not seem to affect the level of within-trial ITI variations. It is noteworthy that greater TSA drift for gradual-onset envelopes may also account for slightly slower tapping rates in these conditions, as our analysis of participants' average ITI showed. If a participant's TSAs constantly drift toward one direction within a trial, it should make the tapping rate in that trial faster or slower than the modulation rate of the stimulus sequence. Higher overall ITI for gradual-onset stimuli thus suggests that participants more often drifted their taps toward a slower tapping rate.
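The relation between steady drift, ITI variation, and mean ITI can be made concrete with an idealized tap sequence. The drift of 0.76 ms per tap and the 1000/3 ms window are hypothetical values chosen for illustration, and real taps would of course add jitter on top of the drift.

```python
import statistics

# Assumption: one modulation window lasts 1000/3 ms (3 Hz AM rate).
PERIOD_MS = 1000.0 / 3.0
DRIFT_PER_TAP_MS = 0.76  # hypothetical constant drift toward later taps


def drifting_taps(n_taps=20, drift_ms=DRIFT_PER_TAP_MS):
    """Tap times whose alignment shifts by a fixed step on every tap."""
    return [k * PERIOD_MS + k * drift_ms for k in range(n_taps)]


taps = drifting_taps()
itis = [b - a for a, b in zip(taps, taps[1:])]

iti_sd = statistics.pstdev(itis)    # essentially zero: the rate is steady
mean_iti = statistics.mean(itis)    # window length plus the drift step
# Alignment of each tap relative to its own window onset, in ms.
tsas = [t - k * PERIOD_MS for k, t in enumerate(taps)]
tsa_span = tsas[-1] - tsas[0]       # accumulated shift across the trial
```

In this idealized case every ITI equals the window length plus the drift step, so the ITI variation is zero and the mean ITI is slightly longer than the stimulus period, while the tap-stimulus alignment moves by more than 14 ms across the 20 taps.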
A unified model to account for sensory and motor alignment with noise stimuli
Results from both motor and sensory synchronization tasks reveal that stimuli with gradual amplitude onsets lead to larger variability in participants' alignment locations across different trials: tap-stimulus alignment (TSA) for the motor task and click-noise alignment (CNA) for the sensory task. This pattern indicates a relatively broad zone within the envelope of these stimuli that allows participants to align either their motor output or a sensory probe in order to achieve perceptual synchronization with the noise stimuli. In contrast, the small between-trial variability in TSA and CNA locations for sharp-onset stimuli indicates a narrow zone within these envelopes for achieving synchronization.
We outline a possible model which qualitatively accounts for this aspect of the alignment results from both tasks. The model characterizes a process which determines, for each individual trial, the location within the envelope of the noise stimuli which results in maximum synchrony between the noise stimuli and participants' taps (in the motor task) or the tone click (in the sensory task). As a proof of concept, we use the synchronization process in the sensory task to illustrate the proposed mechanism, given that the computation of maximum synchrony between tone clicks and noise is restricted to the auditory domain and is more straightforward to implement.
The model is shown in Figure 6A. The core computational unit takes two inputs: the cochlear output of the modulated noise sequence (right section of the diagram) and that of the tone-click sequence (left section of the diagram). Since the tone clicks are at 1 kHz, we assume that the cochlear region at 1 kHz is relevant for the alignment task. The two cochlear envelopes at 1 kHz reflect instantaneous firing rate functions of the neuronal circuits that respond to the noise stimuli and tone clicks, respectively. In the computational realization of our model, maximum synchrony is determined by the sequential cascade of two neuronal circuits. First, the cochlear envelopes are processed by an upward-level-crossing detector, which produces a spike when the up-going envelope crosses a certain threshold. The timing of the spike depends on the level of the threshold. Second, a coincidence detector is modeled as the Euclidean distance between the spike time from the noise envelope, which serves as the reference, and the spike time of the click envelope (which in the experiment can be continuously adjusted via a dial). Maximum synchrony is reached when the adjustment of click time leads to minimum distance with the noise time, which results in the optimal temporal alignment between the two auditory streams.
Model for the sensory synchronization task. A, Schematic presentation of the model. A coincidence detector provides the distance between reference spikes, driven by the modulated noise stimuli as well as the clicks. The search for best alignment is accomplished via dialing the click phase toward minimum distance. The neuronal circuit that generates spikes is modeled as an upward-level-crossing detector, and the parameter is the level-crossing threshold. The figure depicts spike sequences for two threshold values (in orange and in magenta), overlaid on top of the corresponding simulated cochlear envelopes, for ramped and damped stimuli. B–D, Locations of the best alignment as a function of threshold level of the upward-level-crossing detector for damped stimuli (B), symmetrical stimuli (C), and ramped stimuli (D). The blue curves represent the amplitude envelope of the 1 kHz cochlear channel of the modulated noise stimuli. Dotted lines indicate the level of threshold. Red squares indicate the locations of the optimal synchronization positions that correspond to the intersection between each threshold level and noise envelope.
Given the transient, sharp nature of the tone clicks, which should result in constant spike times with respect to their onset, the position of the optimal alignment ultimately depends on the spike time generated from the envelope of the modulated noise stimuli. Our model considers two sources of variability that affect the spike time of the noise stimuli. The first one is caused by biophysical uncertainties in the neuronal circuit of the upward-level-crossing detector, which changes the level of the threshold from one trial to another. The second one is caused by the random amplitude fluctuations in the envelope of the noise stimuli given the stochasticity of the broadband noise used in our study. Both sources of variability affect the location where the envelope of noise stimuli meets the threshold level and hence, the resulting spike time.
Outcomes of the model show that the spike times of gradual-onset stimuli are sensitive to the level of threshold (Fig. 6C,D), while those of sharp-onset stimuli are not (Fig. 6B). As shown in the figures, fluctuations of the threshold level within a fixed range lead to wider distributions of the spike time from the envelopes of gradual-onset stimuli than from those of sharp-onset stimuli, which consequently results in wider distribution of optimal alignment locations for the former. This outcome is qualitatively in line with larger between-trial variability of click-noise alignments for gradual-onset stimuli than for sharp-onset stimuli observed in the sensory task (Fig. 3), which suggests a reset of the biophysical threshold at the start of each trial.
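The dependence of spike time on threshold level can be illustrated with a toy level-crossing detector. The sketch replaces the simulated cochlear envelopes with noise-free linear ramps and uses hypothetical rise times (1, 80 and 160 ms) and threshold values, so it reproduces only the qualitative pattern and does not implement the model in Figure 6.

```python
def linear_rise_envelope(rise_ms, total_ms=300.0, step_ms=0.1):
    """Sampled envelope rising linearly from 0 to 1 over rise_ms and then
    held at 1 (a hypothetical stand-in for a cochlear envelope)."""
    n = int(round(total_ms / step_ms)) + 1
    times = [i * step_ms for i in range(n)]
    env = [min(t / rise_ms, 1.0) for t in times]
    return times, env


def upward_crossing_time(times, env, threshold):
    """Time of the first sample at which the envelope crosses the
    threshold from below; None if it never does."""
    for i in range(1, len(env)):
        if env[i - 1] < threshold <= env[i]:
            return times[i]
    return None


def spike_time_spread(rise_ms, thresholds):
    """Range of spike times (ms) obtained across the given thresholds."""
    times, env = linear_rise_envelope(rise_ms)
    spikes = [upward_crossing_time(times, env, th) for th in thresholds]
    return max(spikes) - min(spikes)


# Hypothetical trial-to-trial threshold levels and rise times.
thresholds = [0.3, 0.4, 0.5, 0.6, 0.7]
spread_damped = spike_time_spread(1.0, thresholds)
spread_symmetrical = spike_time_spread(80.0, thresholds)
spread_ramped = spike_time_spread(160.0, thresholds)
```

Under these assumptions the same range of thresholds maps onto a spike-time range proportional to the rise time: well under 1 ms for the near-instant rise, and twice as wide for the ramped as for the symmetrical envelope. The random amplitude fluctuations of the noise, the second source of variability in the model, are deliberately left out.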
The model also shows that the stochasticity of the broadband noise generates additional variation in spike time for each threshold level. This variation is also larger for gradual-onset stimuli than for sharp-onset stimuli. Under the assumption of a fixed threshold level for each trial, the additional variation in spike time should manifest across individual cycles of noise stimuli. Given that we only recorded the final alignment location of the tone click of each trial, such variation cannot be reflected in our data. Meanwhile, one could speculate that these additional variations may affect participants' certainty level in selecting the best alignment location, with less certainty for gradual-onset stimuli than for sharp-onset stimuli.
Finally, although this model is based on the sensory synchronization task, its architecture can also be generalized to qualitatively account for the finger tapping data. For the latter task, the act of tapping replaces the tone dialing: the subject aims at minimizing the distance between the spike time from the noise envelope and the perceived timing of their own taps (somatosensory input). The determination of the spike time of noise stimuli can be implemented in the same way as the sensory synchronization model, with the same two sources of variation affecting the spike time. Meanwhile, the determination of spike time for tapping would mainly involve sensory processing in the somatosensory modality. Furthermore, given the sequential nature of continuous tapping, participants should also be able to make use of their preceding tapping intervals, in addition to the spike time of the noise stimuli, to adjust the timing of their following taps (Jacoby and Repp, 2012).
Stronger impact of ramping properties on between-trial than within-trial TSA variabilities
Unlike the damped envelope, in which amplitude rises instantly, the symmetrical and ramped envelopes exhibit a progression in ramping time. Specifically, the ramped envelope has twice the ramping duration of the symmetrical envelope, which makes the ramping speed of the former half that of the latter.
Both ramping time and speed impact TSA variability. Since participants only align their taps to the rising part of stimulus envelope (Morton et al., 1976; Hawkins, 2014), the symmetrical envelope only contains half the range for viable synchronization spots compared with the ramped envelope. Moreover, faster ramping speed in the symmetrical envelope makes the stimulus onset more salient, which should further narrow down the zone for participants' tap alignments.
When comparing participants' CSDbetween-trial TSA in the two conditions, we found that between-trial TSA variability for the ramped condition is more than twice as large as that for the symmetrical condition (Symmetrical: 0.34 radian corresponding to 17.98 ms; Ramped: 0.79 radian, 41.68 ms). However, this condition difference is substantially reduced when comparing participants' within-trial TSA variations, CSDwithin-trial TSA (Symmetrical: 0.35 radian, 18.65 ms; Ramped: 0.44 radian, 23.50 ms).
These observations are in line with the assumptions that between-trial and within-trial TSA variabilities reflect different cognitive components of sensorimotor synchronization. Specifically, the full range of available acoustic landmarks for synchronization, which is reflected in the between-trial TSA variability, may be more directly determined by the physical properties of the onset envelope. In contrast, the range of TSA variability within a sequence of synchronous taps is more restricted within a narrower zone inside the full range of viable synchronization locations. Participants likely commit to a specific landmark in each trial and employ various mechanisms to maintain their tap alignment, which thereby restricts within-trial TSAs within a narrower zone.
Discussion
We examined how stimulus envelope shape influences listeners' behavioral synchronization to periodic acoustic input. First, we demonstrated, in both motor and sensory experiments, that envelope shape affects participants' temporal precision in perceiving stimulus onset. Second, we reported a complex impact of envelope shape on participants' continuous motor synchronization within trials: while gradual onsets reduced participants' consistency in tap-stimulus alignments, they showed little effect on tapping rate stability.
Gradual-onset envelopes provide less stable acoustic landmarks for onset perception
Stimulus onset plays a key role in sensorimotor synchronization, as listeners naturally align their motor output to acoustic landmarks perceived as the onset of an auditory event (Morton et al., 1976). Previous research on perceptual centers has demonstrated that the perceptual precision of stimulus onset is strongly influenced by the temporal dynamics of the stimulus envelope (see Hawkins, 2014 for a review). Reduced precision in onset localization has been observed using stimuli with slower ramping speed (Scott, 1993; Danielsen et al., 2019) and increased acoustic complexity (Villing et al., 2011). Using amplitude modulated broadband noise and two behavioral tasks, we confirmed previous findings that slower amplitude ramping increases variability in perceived onset locations across trials. This result suggests that gradual-onset stimuli provide a broader zone of viable landmarks for sensorimotor synchronization. Meanwhile, lower saliency of these possible landmarks also makes them unstable over time, which leads participants to shift among them across trials.
Consistent tap-stimulus alignment relies on the saliency of acoustic landmarks
Our within-trial analyses demonstrated a clear effect of onset sharpness on participants' ability to consistently align their taps to stimulus envelope within individual trials, where sharp onsets led to more consistent tap-stimulus alignments than gradual onsets. This result resonates with previous neuroimaging studies showing stronger auditory evoked responses to sharp-onset stimuli (Thomson et al., 2009; Doelling et al., 2019; Irsik et al., 2021). Intracranial recordings in humans and nonhuman primates also revealed specific neural populations in the auditory cortex whose firing rate is specifically modulated by the sharpness of stimulus onset (Oganian and Chang, 2019; Liu and Wang, 2022). For instance, Liu and Wang (2022) showed that the spiking of onset-sensitive neurons in marmoset auditory cortex is temporally more scattered when the animals were exposed to stimuli with ramped amplitude onsets than to those with sharp amplitude onsets.
Since evoked responses are time locked to their sensory triggers, strong responses to repetitive salient landmarks should provide reliable timing cues for the motor system to sustain consistent tap-stimulus alignments within individual trials. In contrast, reduced evoked responses to gradual-onset envelopes suggest less precise timing estimation of potential landmarks that can be extracted. Reliance on the percept of these landmarks should lead to larger variability in tap-stimulus alignments across taps. Our result confirms this assumption and suggests a functional link between the strength of auditory evoked responses and the precision in temporal tracking of repeated acoustic landmarks during sensorimotor synchronization.
Maintaining the tapping rate with salient and weak acoustic landmarks
Intuitively, one would expect a direct correspondence between the consistency of tap-stimulus alignment within a trial and the steadiness of the tapping rate. Assuming that participants' synchronized taps fluctuate randomly around a landmark that was identified at the beginning of the trial, larger fluctuations should lead to greater variations of both tap-stimulus alignments and ITIs. However, we found that despite larger variations in within-trial tap-stimulus alignments for gradual-onset stimuli than for sharp-onset stimuli, both stimulus types yielded similar levels of stability in participants' ITIs. Further analysis revealed that participants' tap alignments to gradual-onset stimuli did not fluctuate randomly but exhibited small-stepped drifts across consecutive taps. This pattern preserves a steady tapping rate while allowing for substantial deviations of tap-stimulus alignments across the trial.
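The contrast between random fluctuation and small-stepped drift can be illustrated with a toy simulation. The sketch below is not the analysis used in this study; the function names, the 0.5 s stimulus period, the 14 ms noise level, and the Gaussian noise model are all assumptions chosen for illustration. In the "jitter" mode, each tap scatters independently around one fixed landmark; in the "drift" mode, small steps accumulate across taps. The noise is scaled so that both modes produce the same expected ITI variability.

```python
import numpy as np

def simulate_trial(rng, mode, n_taps=60, period=0.5, iti_sd=0.014):
    """Return (tap-stimulus alignments, ITIs) in seconds for one simulated trial.

    All parameter values are illustrative assumptions, not values from the study.
    """
    if mode == "jitter":
        # Independent scatter around one fixed landmark. Each ITI is the
        # difference of two independent offsets, so the offset SD is
        # iti_sd / sqrt(2) to give an ITI SD of iti_sd.
        align = rng.normal(0.0, iti_sd / np.sqrt(2), n_taps)
    elif mode == "drift":
        # Small steps accumulate across taps: the alignment is a random walk,
        # while each ITI is perturbed by only a single step.
        align = np.cumsum(rng.normal(0.0, iti_sd, n_taps))
    else:
        raise ValueError(f"unknown mode: {mode}")
    taps = np.arange(n_taps) * period + align
    return align, np.diff(taps)

def mean_variability(mode, n_trials=200, seed=0):
    """Average within-trial SD of alignments and of ITIs across simulated trials."""
    rng = np.random.default_rng(seed)
    stats = []
    for _ in range(n_trials):
        align, itis = simulate_trial(rng, mode)
        stats.append((np.std(align), np.std(itis)))
    return np.mean(stats, axis=0)  # (alignment SD, ITI SD)

if __name__ == "__main__":
    for mode in ("jitter", "drift"):
        align_sd, iti_sd = mean_variability(mode)
        print(f"{mode}: alignment SD = {align_sd:.4f} s, ITI SD = {iti_sd:.4f} s")
```

Under these assumptions, the two modes yield comparable ITI variability, but the drift mode yields a much larger within-trial spread of tap-stimulus alignments, which is the qualitative dissociation described above.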
The slow tap drift may be interpreted in two ways. First, the perceptual instability of the identified landmark in gradual-onset stimuli could cause small shifts of its perceived location within a trial. In this case, participants' taps simply follow the slow landmark movements and drift gradually. Alternatively, unstable landmarks in gradual envelopes may drive participants to rely more on monitoring their own ITIs to regulate the timing of their taps. Participants may first establish a reference interval based on consecutive onset landmarks extracted from the initial stimuli of the sequence and then regulate the timing of their taps to match their ITIs to the reference interval as closely as possible. In this case, slow drifts occur because small variations in tapping interval are less perceptible without strong external feedback, and corrections are only made when deviations become large enough to be noticed.
Phase and period correction during synchronous tapping
The tapping pattern for gradual-onset stimuli points to two corrective processes that both contribute to the maintenance of synchronized tapping: (1) phase correction, which minimizes temporal misalignment between each tap and a sensory reference, and (2) period correction, which keeps the intervals between successive taps as close to a target interval as possible (Repp, 2005; Repp and Su, 2013).
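The two corrective processes can be made concrete with a noise-free linear error-correction sketch. This is one common way of formalizing phase and period correction, assumed here for illustration rather than taken from the present study. The function name `simulate_tapping`, the gain parameters `alpha` (phase correction) and `beta` (period correction), and all numerical values are hypothetical.

```python
def simulate_tapping(stim_intervals, alpha, beta, start_period):
    """Noise-free two-process error correction (illustrative assumption).

    stim_intervals: successive stimulus inter-onset intervals in seconds.
    alpha: phase-correction gain applied to the last asynchrony.
    beta: period-correction gain applied to the internal reference interval.
    Returns (asynchronies, ITIs), where asynchrony = tap time - onset time.
    """
    asyn = 0.0
    period = start_period
    asynchronies, itis = [asyn], []
    for stim_interval in stim_intervals:
        # Phase correction: adjust the upcoming tap interval by a fraction
        # of the most recent asynchrony.
        iti = period - alpha * asyn
        # Period correction: adjust the stored reference interval itself.
        period = period - beta * asyn
        # New asynchrony after the next tap and the next stimulus onset.
        asyn = asyn + iti - stim_interval
        asynchronies.append(asyn)
        itis.append(iti)
    return asynchronies, itis

if __name__ == "__main__":
    # Hypothetical tempo change: the stimulus slows from 0.50 s to 0.55 s.
    stimulus = [0.5] * 5 + [0.55] * 60
    for label, a, b in (("phase only", 0.5, 0.0), ("phase + period", 0.5, 0.1)):
        asyn, itis = simulate_tapping(stimulus, alpha=a, beta=b, start_period=0.5)
        print(f"{label}: final asynchrony = {asyn[-1]:.4f} s, final ITI = {itis[-1]:.4f} s")
```

In this sketch, phase correction alone is enough for the tapping interval to match the new stimulus rate, but it leaves a constant offset between taps and onsets. Adding period correction updates the internal reference interval, so the offset shrinks toward zero. The gains are chosen with `beta` smaller than `alpha` because, in this particular formulation, larger `beta` values make the simulated taps oscillate rather than settle.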
Previous studies typically used stimulus perturbation paradigms, in which a phase shift or tempo (period) change is introduced to the stimulus sequence after participants have achieved synchronized tapping. These studies showed that participants' tap correction to a stimulus phase shift occurs rapidly (e.g., after a single tap; Thaut et al., 1998; Repp, 2000), while correction to a tempo change is usually established gradually across multiple taps (Repp and Keller, 2004). The latter result is in line with the slow tap drift observed in our gradual-onset condition. Moreover, tap drifts have also been observed when participants try to maintain periodic tapping without external stimuli (i.e., self-paced tapping), where only period correction is possible (Madison, 2001). In summary, previous findings on period correction support the interpretation that the tap drift pattern observed in the gradual-onset condition reflects an increased reliance on monitoring of ITIs (i.e., tapping rate) to maintain synchronized tapping.
A putative neural mechanism for rate maintenance in sensorimotor synchronization
While phase correction operates tap by tap on the time delay between each of the participant's taps and the perceived landmark in the corresponding stimulus cycle, period correction requires temporarily storing a specific interval, or rate, and comparing that reference interval to each of the participant's ITIs. Existing neurobiological research has suggested that the latter process could be achieved through neural entrainment, which synchronizes rhythmic neural activities in the auditory cortex to the temporal regularity of external acoustic stimuli (Lakatos et al., 2019). This phenomenon relies on intrinsic oscillatory fluctuations of neuronal excitability in the auditory cortex (Buzsáki and Draguhn, 2004; Lakatos et al., 2005), which can adjust their period to match the modulation rate of external acoustic input (Thut et al., 2011; Poeppel and Teng, 2020).
Despite differing explanations on several aspects of the entrainment mechanism (Doelling and Assaneo, 2021), neural entrainment has been demonstrated using stimuli with salient onset landmarks like tone clicks (Lakatos et al., 2013), as well as those with weak, more gradual, landmarks (Henry and Obleser, 2012; Irsik et al., 2021). Crucially, passive listening to rhythmic stimuli has been shown to elicit similarly rhythmic activities in the motor cortex (Assaneo and Poeppel, 2018). The rhythmic coupling between auditory and motor areas could underpin monitoring and correction of tapping rate during active sensorimotor synchronization.
Focal synchronization spots within a broad zone of the stimulus envelope
We interpreted the large between-trial TSA variability for gradual-onset stimuli as reflecting a broader zone of viable, yet unstable, acoustic landmarks for sensorimotor synchronization within the stimulus envelope. It is, however, noteworthy that both our motor and sensory tasks required participants to align their behavioral output to a specific location within the viable zone in each trial. The sense of committing to a specific landmark is explicit in the sensory task, where participants could freely explore various locations to align the tone click and the noise stimulus before explicitly choosing one optimal location that gave the percept of maximum synchrony between the two stimulus streams. A similar process of settling on an optimal landmark should also take place in the motor task, although our within-trial measurements focused more on participants' ability to maintain the alignment to the selected location.
It remains unclear what mechanism accounts for the variation of the optimal synchronization spot across trials. In our hypothesized auditory model, we propose that this variation primarily arises from biophysical fluctuations within the neuronal circuit that detects the amplitude onset of auditory input, which may also underlie the scattered spike timing of auditory neurons following ramped sensory input (Liu and Wang, 2022).
Conclusion
Our behavioral assessment of synchronization to external auditory rhythmicity highlights distinct processes for tracking individual acoustic events and the rate of periodicity in continuous signals. The data provide a meaningful link between neurophysiological responses to envelope characteristics of auditory input and the cognitive aspects of humans' behavioral synchronization to sensory rhythmicity.
Data Availability
Data from both behavioral tasks and all analysis code are available at the Open Access Data Repository of the Max Planck Society (https://doi.org/10.17617/3.DBNTVP).
Footnotes
We thank Julia Guldan, Freya Materne, and Claudia Lehr for their assistance with data collection as well as Cornelius Abel and Patrick Ulrich for technical support. We also thank Xiangbin Teng for thoughtful comments on various aspects of the study. This work was supported by the Max Planck Society and the Ernst Struengmann Institute for Neuroscience.
The authors declare no competing financial interests.
This paper contains supplemental material available at: https://doi.org/10.1523/JNEUROSCI.1488-24.2025
Correspondence should be addressed to Yue Sun at yue.sun{at}esi-frankfurt.de.