The influence of lexical statistics on temporal lobe cortical dynamics during spoken word listening
Introduction
Many theoretical accounts of spoken word recognition posit a process of matching acoustic input with stored representations that have rich lexical and semantic structure. These models assume the existence of acoustic–phonetic, phonemic, and lexical targets in the brain that are activated when specific input is received (Marslen-Wilson, 1987, Marslen-Wilson, 1989, McClelland and Elman, 1986, Norris, 1994). For example, hearing the word “cat” evokes activity in neural populations that are selective for phonetic features like plosives and low front vowels, which in turn activate stored representations of the phonemes /k/, /æ/, and /t/. The activations of these phonemic representations are integrated over time and serve as inputs to neurons at the lexical level that represent the word “cat”. The dynamic nature of this process has led many researchers to suggest that the representations in this hierarchy interact with and influence each other, making word recognition an iterative process in which multiple targets at each level remain active until the input is no longer consistent with them (e.g., /k-æ-p/) (Heald and Nusbaum, 2014, Marslen-Wilson and Welsh, 1978).
Several influential models of the neural basis of speech comprehension (Hickok and Poeppel, 2007, Scott and Wise, 2004) propose a set of cortical regions that perform the transformation from spectrotemporal representations of speech signals to abstracted lexical representations of words. These proposals are based on data from patients with lesions to various cortical areas (Dronkers & Wilkins, 2004), and on recent neuroimaging studies that support a distributed and interconnected network of cortical regions thought to be responsible for the representation of words and language (see, e.g., Davis and Gaskell, 2009, Turken and Dronkers, 2011). Many of these studies observe functional specialization of different regions in the temporal lobe, with acoustic–phonetic and phonemic representations in the posterior superior temporal cortex, and higher-order lexical representations in the middle, anterior, and ventral temporal cortex. This pathway is often referred to as the auditory ventral stream and is argued to link acoustic, phonemic, and lexical processing (see also DeWitt and Rauschecker, 2012, Okada et al., 2010, Lerner et al., 2011).
However, it remains unclear how this transformation occurs, and specifically how the ventral stream integrates high-level knowledge about the language with bottom-up acoustic input. In particular, the mental lexicon can be characterized by a number of features and statistics that relate the stored representations of individual words with one another, and also with lower-level features like phonemes and phonetic features. As a speech token unfolds, a cohort of forms stored in the lexicon that match the acoustic input is activated (Marslen-Wilson, 1987, Marslen-Wilson, 1989). This matching set of lexical forms (the cohort) will change over time as more of the target word is heard, thereby changing the lexical competition space on a moment-by-moment basis. It is therefore necessary to capture the temporal dynamics of these changing lexical statistics when describing the processes involved in word comprehension. A primary goal of the present study is to describe the spatiotemporal dynamics of spoken word recognition across the duration of a word, and across the auditory ventral stream.
In the present study, we compare neural responses to real words (e.g. ceremony, repetition) and novel forms, or pseudowords (e.g. moanaserry, piteretion), to examine how this lexical structure is encoded in the brain. Several studies have found differences in the hemodynamic response to real words and pseudowords (Davis and Gaskell, 2009, Mainy et al., 2008, Mechelli et al., 2003, Raettig and Kotz, 2008, Tanji et al., 2005). However, it is likely that the word/pseudoword difference is not purely binary; in particular, there is behavioral evidence that word-like forms may be processed as potential real words (De Vaan et al., 2007, Lindsay et al., 2012, Meunier and Longtin, 2007). Taken together, these findings suggest that while neural responses to pseudowords can be reliably distinguished from familiar word forms, the processing of novel forms may also rely on information stored in the lexicon, including high-level features like cohort statistics. Therefore, a second goal of this study is to explore how this type of stored lexical information can affect the processing of both pseudowords and real words.
Lexical statistics that capture aspects of lexical competition may be of particular importance for both real word and pseudoword processing. Cohort size (Magnuson et al., 2007, Marslen-Wilson, 1987, Marslen-Wilson, 1989) is defined as the number of words in the lexicon that match the phonemes the listener has heard up to any given point in a word. This provides an incremental metric of potential lexical forms that changes as acoustic input is received. Average cohort frequency, by contrast, is defined as the average lexical frequency of the words in a cohort. Finally, summed cohort frequency sums the lexical frequency of all words in a cohort, thus quantifying the number of words and their relative usage statistics in a single metric. The extent to which neural activity evoked by real words and pseudowords is modulated by these features allows us to explore the specific linguistic processes involved in the acoustic-to-lexical transformation.
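The three statistics just defined can be sketched directly from their definitions. The toy lexicon and the per-million frequency values below are hypothetical, chosen only to illustrate the computation:

```python
# Sketch of the three cohort statistics defined in the text: cohort size,
# average cohort frequency, and summed cohort frequency. The lexicon and
# frequency counts are hypothetical illustrations, not the study's norms.
LEXICON = {
    # word: (phoneme sequence, lexical frequency per million)
    "cat":  (("k", "ae", "t"), 30.0),
    "cap":  (("k", "ae", "p"), 12.0),
    "can":  (("k", "ae", "n"), 250.0),
    "cool": (("k", "uw", "l"), 40.0),
}

def cohort_stats(heard):
    """Return (cohort size, average frequency, summed frequency)
    for the phonemes heard so far."""
    n = len(heard)
    freqs = [f for ph, f in LEXICON.values() if ph[:n] == tuple(heard)]
    size = len(freqs)
    summed = sum(freqs)
    average = summed / size if size else 0.0
    return size, average, summed

size, avg, total = cohort_stats(["k", "ae"])
print(size, round(avg, 1), total)  # 3 97.3 292.0
```

Note that a high-frequency member like "can" dominates both the average and the sum, which is why the two frequency-weighted statistics can dissociate from raw cohort size.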
To study the on-line processing of lexical forms and cohort statistics, we examined cortical responses to real words and pseudowords using data recorded from high-density electrocorticographic (ECoG) electrodes placed directly on the cortical surface. ECoG provides high spatial and temporal resolution with a relatively high signal-to-noise ratio at the individual electrode level. These properties are critical to our study goals of examining how the lexical status and cohort statistics of specific speech tokens affect neural activity as the input is being processed in real time. Because the neural representations of words are complex, distributed, and likely high-dimensional, these methodological advantages may be necessary to uncover the nature of lexical processing.
Subjects
Four human subjects underwent surgical placement of a 256-channel subdural electrocorticography (ECoG) array as part of clinical treatment for epilepsy. All electrode arrays were placed over the perisylvian region of the language-dominant hemisphere (left hemisphere for all but subject 3; no observable patterns differentiated the right hemisphere data from the left hemisphere data). All subjects gave informed written consent prior to surgery and experimental testing.
Stimulus design
The stimulus set consisted
Timing and location of responses to real words and pseudowords
Real words and pseudowords evoked different high-gamma neural responses across many temporal lobe electrodes (see Fig. 2a for electrode placement in one subject) in all participants, with typically stronger activity for pseudowords (Fig. 2b). This lexicality effect was significant between 320 and 1500 ms (bootstrap p < 0.05). In addition to these magnitude differences, there was a clear progression in the timing of the peak of the neural response from posterior to anterior temporal lobe electrodes
Discussion
We used high-resolution direct intracranial recordings to examine how the ventral stream for speech processes lexical information in both familiar and novel word forms. We found that neural activity recorded from the temporal lobe while participants listened to words and pseudowords reflects differences between these broad categories with relatively complex temporal dynamics. Further, responses to spoken stimuli were modulated by language-level features – cohort size, average cohort frequency,
Acknowledgments
The authors would like to thank Connie Cheung, Angela Ren, and Susanne Gahl for technical assistance and valuable comments on drafts of this work, and Stephen Wilson for stimulus design. E.S.C. was funded by a National Science Foundation Research Fellowship. M.K.L. was funded by a National Institutes of Health National Research Service Award F32-DC013486 and by an Innovative Research Grant from the Kavli Institute for Brain and Mind. E.F.C. was funded by National Institutes of Health Grants
References (46)
- et al. (2001). Induced electrocorticographic gamma activity during auditory perception. Clinical Neurophysiology.
- et al. (2006). High-frequency gamma oscillations and human brain mapping with electrocorticography. Progress in Brain Research.
- et al. (2004). Lesion analysis of the brain areas involved in language comprehension. Cognition.
- et al. (2011). Sub-centimeter language organization in the human temporal lobe. Brain and Language.
- (2012). The cortical organization of lexical knowledge: A dual lexicon model of spoken language processing. Brain and Language.
- et al. (2009). Parallel versus serial processing dependencies in the perisylvian speech network: A Granger analysis of intracranial EEG data. Brain and Language.
- et al. (2014). Characterizing the dynamics of mental representations: The temporal generalization method. Trends in Cognitive Sciences.
- et al. (2014). Dynamic speech representations in the human temporal lobe. Trends in Cognitive Sciences.
- et al. (2012). Acquiring novel words and their past tenses: Evidence from lexical effects on phonetic categorisation. Journal of Memory and Language.
- (1987). Functional parallelism in spoken word-recognition. Cognition.
- Processing interactions and lexical access during word recognition in continuous speech. Cognitive Psychology.
- The TRACE model of speech perception. Cognitive Psychology.
- Multimodal imaging of repetition priming: Using fMRI, MEG, and intracranial EEG to reveal spatiotemporal profiles of word processing. Neuroimage.
- Morphological decomposition and semantic integration in word processing. Journal of Memory and Language.
- Statistical and computational models of the visual world paradigm: Growth curves and individual differences. Journal of Memory and Language.
- Shortlist: A connectionist model of continuous speech recognition. Cognition.
- An event-related fMRI investigation of phonological–lexical competition. Neuropsychologia.
- Neuromagnetic evidence for the timing of lexical activation: An MEG component sensitive to phonotactic probability but not to neighborhood density. Brain and Language.
- Auditory processing of different types of pseudo-words: An event-related fMRI study. Neuroimage.
- The functional neuroanatomy of prelexical processing in speech perception. Cognition.
- Probabilistic phonotactics and neighborhood activation in spoken word recognition. Journal of Memory and Language.
- Functional organization of human sensorimotor cortex for speech articulation. Nature.
- Moving beyond Kučera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English. Behavior Research Methods.