In Spoken Word Recognition, the Future Predicts the Past

Laura Gwilliams; Tal Linzen; David Poeppel; Alec Marantz

doi:10.1523/JNEUROSCI.0065-18.2018

Figures

Figure 1.
Stimuli examples. a, An example 11-step voice onset time syllable continuum used in Experiment 1. b, An example 11-step place of articulation syllable continuum used in Experiment 1. c, An example five-step perceptually defined continuum pair used in Experiment 2 generated from the words “barricade” and “parakeet” (shown in green). The resultant non-words “parricade” and “barakeet” are shown in red. The point of disambiguation is represented with a dashed line.
Figure 2.
Behavioral results for Experiment 1. Top, Behavioral psychometric function of phoneme selection as a function of the 11-step acoustic continuum. PoA and VOT continua are plotted separately. The colored horizontal lines correspond to the five behavioral classification positions used to define the perceptual continuum used in Experiment 2. Bottom, Reaction times as a function of the 11-step continuum. Note the slowdown for ambiguous tokens and the slower responses to items on the VOT continuum compared with the PoA continuum.
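As a rough illustration of how classification positions can be read off a psychometric function, the sketch below assumes a logistic curve and inverts it to find the continuum position for a target response proportion. The midpoint, slope, and five target proportions are hypothetical values chosen for the example; they are not taken from the caption or the data.

```python
import math


def psychometric(step, midpoint, slope):
    """Proportion of responses for one category at a continuum position (logistic curve)."""
    return 1.0 / (1.0 + math.exp(-slope * (step - midpoint)))


def step_for_proportion(p, midpoint, slope):
    """Invert the logistic curve: continuum position where the response proportion equals p."""
    return midpoint + math.log(p / (1.0 - p)) / slope


# Hypothetical curve parameters and target proportions (assumptions for illustration only).
MIDPOINT = 6.0   # assumed category boundary on an 11-step continuum
SLOPE = 1.2      # assumed steepness of the curve
TARGETS = [0.05, 0.25, 0.5, 0.75, 0.95]

positions = [step_for_proportion(p, MIDPOINT, SLOPE) for p in TARGETS]
for p, pos in zip(TARGETS, positions):
    print(f"proportion {p:.2f} -> continuum position {pos:.2f}")
```

Under these assumptions the five positions are symmetric around the boundary, so the middle position is the most ambiguous token and the outer two are near-unambiguous endpoints.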
Figure 3.
Early responses to ambiguity in left HG (LHG) across the two experiments. A, Experiment 1: Time course of responses for each ambiguity level averaged over significant sources in LHG plotted separately for PoA and VOT continua. B, Experiment 1: Location of sources found to be sensitive to ambiguity in the spatiotemporal cluster test time-locked to syllable onset. Light-shaded region of cortex represents the search volume (HG and STG). Average t-value over time is plotted on individual vertices. C, Experiment 1: Averaged responses in significant sources in LHG over the p50m peak time-locked to syllable onset from 40 to 80 ms. Note that, for the p–t continuum, /p/ is “front” and /t/ is “back.” For the t–k continuum, /t/ is “front” and /k/ is “back.” D, Experiment 2: Location of sources found to be sensitive to ambiguity in the spatiotemporal cluster test time-locked to word onset. E, Experiment 2: Responses time-locked to word onset averaged from 40 to 80 ms over significant sources. F, Experiment 2: Location of sources found to be sensitive to ambiguity in the spatiotemporal cluster test time-locked to POD onset. G, Experiment 2: Response time-locked to POD onset averaged from 40 to 80 ms in significant sources. dSPM refers to a noise-normalized estimate of neural activity. **p < .01.
Figure 4.
Decoding analysis on acoustic stimuli. a, FFT decomposition of the first 20 ms of the auditory stimuli plotted for each phoneme continuum. The histogram represents the 1000 permutations used to determine the significance of classification accuracy. b, Accuracy of the logistic regression classifier in identifying the correct phoneme based on leave-one-out cross validation. Accuracy drops off for more ambiguous tokens. c, Chance-level accuracy in classifying steps along the continuum.
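The logic of scoring a classifier with leave-one-out cross validation and comparing that score with a label-permutation null distribution can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it swaps in a nearest-centroid classifier for the logistic regression named in the caption so that it runs on the standard library alone, and the feature vectors and labels are toy values invented for the example.

```python
import random


def nearest_centroid_predict(train_x, train_y, x):
    """Predict the label whose training centroid is closest to x (squared Euclidean distance)."""
    sums, counts = {}, {}
    for features, label in zip(train_x, train_y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    best_label, best_dist = None, float("inf")
    for label, acc in sums.items():
        centroid = [value / counts[label] for value in acc]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_accuracy(xs, ys):
    """Leave-one-out accuracy: hold out each item once, train on the rest."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_centroid_predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


def permutation_test(xs, ys, n_permutations=1000, seed=0):
    """Return observed accuracy, the null accuracies, and a permutation p-value."""
    rng = random.Random(seed)
    observed = loo_accuracy(xs, ys)
    null = []
    for _ in range(n_permutations):
        shuffled = ys[:]
        rng.shuffle(shuffled)  # break the link between features and labels
        null.append(loo_accuracy(xs, shuffled))
    exceed = sum(1 for acc in null if acc >= observed)
    p_value = (exceed + 1) / (n_permutations + 1)
    return observed, null, p_value


# Toy two-dimensional "spectral" features for two phoneme categories (invented values).
xs = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [0.0, 0.0],
      [1.0, 0.9], [0.9, 1.1], [1.1, 1.0], [1.0, 1.0]]
ys = ["b", "b", "b", "b", "p", "p", "p", "p"]

observed, null, p_value = permutation_test(xs, ys)
print(f"LOO accuracy = {observed:.2f}, permutation p = {p_value:.3f}")
```

The list `null` plays the role of the histogram in panel a: it shows what accuracies arise by chance when the labels carry no information about the features.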
Figure 5.
Time course of regression analysis for the four primary variables of interest for Experiment 2 time-locked to word onset (left column) and point of disambiguation (right column). A, Location of the most significant cluster for ambiguity (green) and acoustics (orange) derived from the spatiotemporal cluster test. B, Activity for each step on the continuum, averaged over the spatiotemporal extent of the cluster, after regressing out the other variables in the model: plotting ambiguity effect after regressing out acoustics and feature type; plotting acoustic effect after regressing out ambiguity and feature type. C, Mean t-values averaged in the corresponding cluster for ambiguity and acoustics when put into the same regression model. Note that because the cluster is formed based on the sum of adjacent t-values that may be either above 1.96 or below −1.96, the mean value over sources is not directly interpretable as “t above 1.96 = p < 0.05.” D, Location of the most significant cluster for PoA (pink) and VOT (blue). E, Activity averaged for each level of the phonetic features when regressing out the other phonetic feature; for example, regressing out the effect of VOT and then plotting residual activity along the PoA dimension and vice versa. F, Mean t-values averaged in the corresponding cluster for PoA and VOT when put into the same regression model. *p < .05, **p < .01.
Figure 6.
Results of multiple regression applied at each phoneme of the words presented in Experiment 2. Analysis was applied to average source estimates in auditory cortex at different time windows. For ambiguity and acoustics, activity was averaged over left or right HG (the results for both hemispheres are shown). For PoA and VOT, activity was averaged over left or right STG and HG. The plotted values represent the t-value associated with how much the regressor modulates activity in the averaged region and time window. The analysis was applied separately at the onset of a number of phonemes within the words: p0 = word onset; POD = point of disambiguation; +1 = one phoneme after disambiguation point. Bonferroni-corrected p-values are shown for reference: *p < 0.05; **p < 0.01; ***p < 0.001. Average number of trials per subject is shown below the x-axes because the number of trials entered into the analysis decreases at longer phoneme latencies.
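For readers unfamiliar with the correction, the mapping from uncorrected p-values to Bonferroni-corrected significance markers can be sketched as below. The uncorrected p-values are hypothetical, and the sketch assumes the number of comparisons equals the number of p-values passed in; the caption does not state how many tests were corrected for.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capping the result at 1."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


def stars(p):
    """Significance marker using the thresholds given in the caption."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


# Hypothetical uncorrected p-values, one per regressor (assumed for illustration).
uncorrected = [0.01, 0.5, 0.0001]
corrected = bonferroni(uncorrected)
for raw, adj in zip(uncorrected, corrected):
    print(f"uncorrected {raw:.4f} -> corrected {adj:.4f} {stars(adj)}")
```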
Figure 7.
Testing for phonological commitment: analysis pipeline. A, Location of cluster sensitive to the interaction between lexical resolution (word vs. non-word) and continuous latency of POD as defined in milliseconds. B, Time course of interaction between lexical resolution and “early” versus “late” disambiguation. “Early” is defined as at or before the latency from word onset shown on the x-axis; “late” is defined as after that latency. The split from green to red shows the final position at which the interaction is still significant (450 ms). C, Condition averages for the early/late words and non-words at POD. A significant interaction can be seen when splitting responses at 450 ms.
Figure 8.
Schematic model of processing stages. Acoustic input in the form of spectrotemporal information is fed to primary auditory cortex (i). Here, we hypothesize that subphonetic acoustic information of the input is compared with an internal representation of the perceptual boundary between phonetic features. The absolute distance from the boundary is computed, which corresponds to phoneme ambiguity as tested in this study. The signed distance (i.e., closer to one category or another) corresponds to phoneme acoustics. This processing stage is therefore the locus of the ambiguity effect, although we do not claim that ambiguity is neurally represented per se. Next, this information travels to STG (ii), where the phonetic features of a sound (e.g., VOT, PoA) are processed. Note that it is likely that other features of the sound, such as manner, are also generated at this stage, as indicated by the ellipsis. The outputs of these two stages are fed to a neural population that tries to derive a discrete phonological representation based on the features of the input (iii). This stage represents the “phoneme commitment” process, which converges over time by accumulating evidence through its own recurrent connection, as well as feedforward input from the previous stages and feedback from the subsequent stages. The output of the processes performed at each phoneme position then feeds to a node that tries to predict the phonological sequence of the word (iv) to activate potential lexical items based on partial matches with the input (v). Note that both /p/- and /b/-onset words are activated in the example because both cohorts are partially consistent. Below, we show the anatomical location associated with each processing stage. Stage i (processing subphonetic acoustic detail) is located in HG bilaterally (in green). Stages ii–iii (processing phonetic features) are located in STG bilaterally (in blue). Stage v (activating lexical candidates) is in left middle temporal gyrus (in purple).
Note the similarities with the functional organization of the dual-stream model proposed by Hickok and Poeppel (2007).
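The relationship between the two stage i quantities, signed distance (acoustics) and absolute distance (ambiguity), can be made concrete with a small sketch. The 11-step continuum and the boundary at step 6 are assumptions for illustration, and the `ambiguity` rescaling (largest distance minus absolute distance, so that higher means more ambiguous) is a hypothetical convenience rather than a measure defined in the paper.

```python
def boundary_distances(step, boundary):
    """Return (signed, absolute) distance of a continuum step from the category boundary."""
    signed = step - boundary
    return signed, abs(signed)


# Assumed 11-step continuum with the perceptual boundary at step 6 (illustrative values).
STEPS = list(range(1, 12))
BOUNDARY = 6

rows = []
max_abs = max(abs(s - BOUNDARY) for s in STEPS)
for step in STEPS:
    acoustics, distance = boundary_distances(step, BOUNDARY)
    ambiguity = max_abs - distance  # hypothetical rescaling: higher = more ambiguous
    rows.append((step, acoustics, distance, ambiguity))

for step, acoustics, distance, ambiguity in rows:
    print(f"step {step:2d}: acoustics {acoustics:+d}, |distance| {distance}, ambiguity {ambiguity}")
```

Steps on opposite sides of the boundary (for example, 4 and 8) share the same absolute distance but differ in sign, which is why ambiguity and acoustics can be entered as separate regressors.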