Abstract
Visual word recognition, at a minimum, involves the processing of word form and lexical information. Opinions diverge on the spatiotemporal distribution of, and interaction between, these two types of information. Feedforward theory argues that they are processed sequentially, whereas interactive theory holds that lexical information is processed rapidly and modulates early word form processing. To distinguish between the two theories, we recorded stereoelectroencephalography (SEEG) from 33 human adults with epilepsy (25 males and eight females) during visual lexical decisions. The stimuli included real words (RWs), pseudowords (PWs) with legal radical positions, nonwords (NWs) with illegal radical positions, and stroke-changed words (SWs) in Chinese. Word form and lexical processing were measured by the word form effect (PW versus NW) and the lexical effect (RW versus PW), respectively. Broadband gamma (60 ∼ 140 Hz) SEEG activity served as the electrophysiological measure. A word form effect was found in eight left brain regions (i.e., the inferior parietal lobe, insula, and fusiform, inferior temporal, middle temporal, middle occipital, precentral, and postcentral gyri) from 50 ms poststimulus onset, whereas a lexical effect was observed in five left brain regions (i.e., the calcarine, middle temporal, superior temporal, precentral, and postcentral gyri) from 100 ms poststimulus onset. The two effects overlapped in the precentral (300 ∼ 500 ms) and postcentral (100 ∼ 200 ms and 250 ∼ 600 ms) gyri. Moreover, high-level regions provided early feedback to word form regions. These results demonstrate that lexical processing occurs early and modulates word form recognition, providing key evidence in support of interactive theory.
SIGNIFICANCE STATEMENT
A pivotal unresolved dispute in the field of word processing is whether word form recognition is obligatorily modulated by high-level, top-down lexical information. To address this issue, we recorded intracranial SEEG from 33 adults with epilepsy to precisely delineate the spatiotemporal dynamics of word form and lexical processing during visual word recognition. We observed that lexical processing began as early as 100 ms after stimulus onset and even overlapped spatiotemporally with word form processing. Moreover, high-order regions provided feedback to word form regions in the early stage of word recognition. These results reveal the crucial role of high-level lexical information in word form recognition and deepen our understanding of the functional coupling among brain regions in word processing networks.
- broadband gamma activity
- feedforward theory
- interactive theory
- stereoelectroencephalography
- visual word recognition
Introduction
Visual word recognition entails several components, including the processing of word form (e.g., word length, letter order, and radical position) and of high-level lexical information (e.g., phonology and semantics; Carreiras et al., 2014). These components are supported by a wide variety of brain regions (Coltheart et al., 2010; Woollams et al., 2011). Visual word form information is coded in the left ventral occipitotemporal cortex (Binder et al., 2006; Vinckier et al., 2007; Bruno et al., 2008; Woolnough et al., 2021), whereas nonvisual lexical information is processed in multiple cortices. Semantics engages the anterior temporal gyrus, inferior frontal gyrus, angular gyrus, and posterior middle temporal gyrus, and phonology engages the supramarginal gyrus, superior temporal gyrus, and perisylvian cortex (Lau et al., 2008; Wu et al., 2012; Chen et al., 2020; Ding et al., 2020).
However, the spatiotemporal distribution of, and interaction between, visual word form processing and high-level lexical processing remain unclear. Feedforward theory argues that lexical properties are processed after visual word form and do not affect word form processing (Jobard et al., 2003; Levy et al., 2009; Solomyak and Marantz, 2009). By contrast, interactive theory assumes that visual words automatically activate top-down knowledge (e.g., sounds and meanings), which provides predictive feedback for word form processing (Woodhead et al., 2014; Whaley et al., 2016; Li et al., 2020). Resolving this dispute requires establishing when and where visual word form and nonvisual high-level lexical information are processed.
Researchers have simultaneously manipulated word form and lexical factors to explore the spatiotemporal distributions of visual and nonvisual information using event-related potentials (ERPs; Hauk et al., 2006; Lin et al., 2011). Studies of Chinese characters illustrate this approach. A Chinese compound character comprises a semantic radical and a phonetic radical, whose positions follow orthographic rules (Li and Kang, 1993; Shu et al., 2003). Previous studies found that orthographically legal characters induced a stronger negative ERP component at ∼170 ms (N170) over left posterior electrodes than orthographically illegal characters did, regardless of lexicality. The authors therefore inferred that visual radical position information, rather than lexical features, was processed at 170 ms after stimulus presentation (Lin et al., 2011; Yum et al., 2015, 2017). However, others have observed that phonology was already accessed at 100 ms in high-order regions [e.g., the left inferior frontal gyrus (IFG) and precentral gyrus (PrCG)] for English words (Wheat et al., 2010; Klein et al., 2015). These findings disagree on whether lexical information is processed during the visual form processing stage. The inconsistency might arise from differences between languages (e.g., Chinese vs English) or between the brain regions of interest (e.g., posterior primary regions vs high-order regions). Addressing this issue requires examining the precise spatiotemporal dynamics and functional connectivity of word form and lexical processes across primary and high-level regions within a single language.
The present study seeks to reveal the spatiotemporal distribution of word form and lexical processes, and the functional coupling between them, across a wide range of cortical areas during Chinese word recognition. To accomplish this, 33 adults with epilepsy performed a classical lexical decision task (Dehaene and Cohen, 2011). The task comprised real words (RWs), orthographically legal but meaningless pseudowords (PWs), orthographically illegal nonwords (NWs), and stroke-changed words (SWs; Fig. 1A). Word form processing was measured by the word form effect (i.e., PW vs NW), and lexical processing was measured by the lexical effect (i.e., RW vs PW). The electrophysiological measure was the power of broadband gamma activity (BGA; 60 ∼ 140 Hz) in participants' intracranial stereoelectroencephalography (SEEG) signals. ERPs mainly reflect low-frequency oscillations (<40 Hz). By contrast, BGA is highly correlated with the firing rates of local neuronal populations and thus indexes increases (or decreases) in local neural activity (Lachaux et al., 2012), particularly in advanced cognitive tasks (e.g., language and memory; Kadipasaoglu et al., 2014; Cibelli et al., 2015).
Materials and Methods
Participants
Thirty-three adults with epilepsy (25 males) were recruited from Sanbo Brain Hospital, Capital Medical University, China. They had been stereotactically implanted with depth electrodes to localize seizure foci for further clinical treatment. An a priori power analysis was conducted with G*Power 3.1.7 (F tests, ANOVA: repeated measures, within factors; Faul et al., 2007). The analysis used α = 0.05 and an effect size of f = 0.25, estimated from two relevant previous studies (Lin et al., 2011; Wang et al., 2016). It indicated that a total of 24 participants would provide 80% statistical power. All participants were native Mandarin Chinese speakers, and most (29 patients) were right-handed (Oldfield, 1971). The patients' mean age was 27.48 years (SD = 6.26; range, 18 ∼ 43 years), and their mean education duration was 12.15 years (SD = 2.87; range, 6 ∼ 17 years). The seizure zones in these patients covered the bilateral frontal, parietal, temporal, and occipital lobes and the central sulcus (Table 1). In total, 354 electrodes with 5035 contacts were implanted. For all but six patients, electroencephalography (EEG) signals were recorded from 64 contacts using a 64-channel BrainAmp amplifier system (Brain Products; Table 1). A contact was not selected for recording if it was the last contact of an electrode, showed high impedance (>15 kΩ), or was close to the epileptogenic zone. The epileptogenic zone of each patient was identified from preoperative magnetic resonance imaging (MRI), positron emission tomography (PET), and SEEG recordings. We also aimed to select contacts in classical language areas that have repeatedly been reported to participate in language processing. These areas include Broca's area, Wernicke's area, the fusiform gyrus (FG), the inferior temporal gyrus (ITG), and the precentral gyrus (PrCG; Jobard et al., 2003; Bolger et al., 2005; Ferstl et al., 2008; Price, 2012; Wu et al., 2012).
The six patients without 64 recorded contacts had fewer than 64 contacts that met the above inclusion criteria. Thus, 2095 contacts from 332 electrodes were recorded in our experiment. More electrodes (left, 232 vs right, 100) and contacts (1456 vs 639) were located in the left hemisphere (Fig. 2A; Table 1). All patients provided written informed consent. The study was approved by the Institutional Review Board of the National Key Laboratory of Cognitive Neuroscience and Learning, Beijing Normal University.
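The a priori power calculation can be approximated in code. The sketch below assumes that G*Power's "ANOVA: repeated measures, within factors" test uses the noncentrality λ = f²·N·m·ε/(1 − ρ), with df1 = (m − 1)ε and df2 = (N − 1)(m − 1)ε for a single group. The text does not report the remaining settings, so three further values are assumptions: m = 4 measurements (one per condition), a correlation among repeated measures of ρ = 0.5 (the G*Power default), and ε = 1 (sphericity). The function names are hypothetical. The code uses SciPy's central and noncentral F distributions.

```python
from scipy import stats

def rm_anova_power(n, f=0.25, m=4, rho=0.5, eps=1.0, alpha=0.05):
    """Power of a one-group, within-factor repeated-measures F test.

    m, rho and eps are assumed settings, not values reported in the text.
    """
    df1 = (m - 1) * eps
    df2 = (n - 1) * (m - 1) * eps
    lam = f ** 2 * n * m * eps / (1 - rho)        # assumed noncentrality formula
    f_crit = stats.f.ppf(1 - alpha, df1, df2)     # critical F under H0
    return float(stats.ncf.sf(f_crit, df1, df2, lam))

def min_sample_size(target=0.80, **settings):
    """Smallest N whose power reaches `target` under the given settings."""
    n = 3
    while rm_anova_power(n, **settings) < target:
        n += 1
    return n

print(min_sample_size())  # compare with the 24 participants reported above
```

If the returned value differs from 24, the settings actually used probably differ from these assumptions, for example in the number of measurements or in ρ.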
Materials and experimental procedure
The stimuli comprised 300 single-character Chinese items: 150 RWs and 150 false words. The false words, which do not exist in the Chinese corpus, comprised 50 PWs, 50 NWs, and 50 SWs. An RW (e.g., “种”, plant, /zhong4/) has a semantic (S) radical (“禾”, crops, /he2/) and a phonetic (P) radical (“中”, middle, /zhong1/). The semantic radical usually provides semantic clues for the whole word, whereas the phonetic radical provides phonological clues (Weekes et al., 2006; Bi et al., 2007). Radicals usually occupy specific positions in Chinese characters (Taft et al., 1999). In PWs, the S and P radicals appeared in their commonly occupied positions. In NWs, they appeared in unusual positions. SWs were created by adding or removing one stroke from an RW (Fig. 1A). RWs and false words had the same radicals; that is, false words were created by recombining the radicals of the RWs. A classical lexical decision task was adopted. Each item was presented for 1000 ms and was followed by a jittered interstimulus interval (ISI; 1500 ∼ 2000 ms). During the ISI, a fixation cross was presented for 1000 ∼ 1500 ms, followed by a 500 ms blank screen (Fig. 1B). The fixation cross served to direct the participants' attention and to provide a low-level baseline for neural activity. The participants were instructed to decide whether each item was a real word by pressing the Yes or No button with their right hand. Half of the patients pressed the Yes button with their index finger and the No button with their middle finger (the yes-index finger group). The other half used the reverse assignment (the yes-middle finger group). The experiment comprised 10 blocks of 30 stimuli each. Seventeen patients (patient codes 1 ∼ 3, 6, 8, 9, 13 ∼ 17, 24, 26 ∼ 28, 31, and 32 in Table 1) completed the task twice. During the task, intracranial SEEG signals were recorded online at a sampling rate of 5000 Hz with BrainAmp amplifiers.
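The trial structure described above can be sketched as a schedule generator. The 1000 ms stimulus, the 1000 ∼ 1500 ms fixation, the 500 ms blank, and the 10 blocks of 30 come from the text. The following are assumptions for illustration: fixation durations drawn uniformly in whole milliseconds, item order fully randomized across blocks, and the hypothetical names `build_schedule` and its dictionary fields.

```python
import random

STIM_MS = 1000                # stimulus duration (from the text)
BLANK_MS = 500                # blank screen at the end of the ISI (from the text)
FIXATION_MS = (1000, 1500)    # fixation-cross duration range (from the text)

def build_schedule(items, n_blocks=10, seed=0):
    """Shuffle items into equal blocks and attach jittered timing (hypothetical helper)."""
    rng = random.Random(seed)
    order = list(items)
    rng.shuffle(order)                            # assumption: full randomization
    per_block = len(order) // n_blocks
    trials = []
    for i, item in enumerate(order):
        fixation = rng.randint(*FIXATION_MS)      # assumption: uniform, whole ms
        trials.append({"block": i // per_block + 1, "item": item,
                       "stim_ms": STIM_MS, "fixation_ms": fixation,
                       "blank_ms": BLANK_MS, "isi_ms": fixation + BLANK_MS})
    return trials

labels = ["RW"] * 150 + ["PW"] * 50 + ["NW"] * 50 + ["SW"] * 50
trials = build_schedule(labels)
```

By construction, every ISI (fixation plus blank) falls within the stated 1500 ∼ 2000 ms range.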
Behavioral analysis
We calculated patients' accuracy (ACC) and reaction time (RT) in the four experimental conditions (RW, PW, NW, and SW). A one-way repeated-measures ANOVA was used to test the main effect of condition, and paired-sample t tests were used for pairwise comparisons between conditions (Bonferroni corrected p < 0.01).
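These behavioral statistics can be sketched without dependencies, assuming a complete participants × conditions matrix with no missing cells. With 33 patients and four conditions, the degrees of freedom are (4 − 1) = 3 and (33 − 1)(4 − 1) = 96, which matches F(3,96) in the Results. The sketch omits p-values because they require the F and t distributions (e.g., from SciPy). The helper names are hypothetical.

```python
import math
from statistics import mean, stdev

def rm_anova_oneway(data):
    """One-way repeated-measures ANOVA.

    data: one row per participant, one column per condition (no missing cells).
    Returns (F, df_condition, df_error).
    """
    n, k = len(data), len(data[0])
    grand = mean(v for row in data for v in row)
    cond_means = [mean(row[j] for row in data) for j in range(k)]
    subj_means = [mean(row) for row in data]
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((v - grand) ** 2 for row in data for v in row)
    ss_error = ss_total - ss_cond - ss_subj       # condition x participant residual
    df_cond, df_error = k - 1, (n - 1) * (k - 1)
    return (ss_cond / df_cond) / (ss_error / df_error), df_cond, df_error

def paired_t(x, y):
    """Paired-sample t statistic for x - y and its degrees of freedom."""
    diffs = [a - b for a, b in zip(x, y)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs))), len(diffs) - 1

def bonferroni(p_values):
    """Bonferroni-corrected p-values (capped at 1)."""
    return [min(1.0, p * len(p_values)) for p in p_values]
```

With four conditions there are six pairwise comparisons, so each raw p-value would be multiplied by 6 before it is compared with the threshold.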
SEEG data preprocessing
SEEG signals were processed using the EEGLAB toolbox (Delorme and Makeig, 2004), the FieldTrip toolbox (Oostenveld et al., 2011), and in-house scripts in MATLAB 2018b (MathWorks). Data processing consisted of the following steps. (1) The raw EEG signals were referenced online to a scalp contact placed at the vertex. Offline, we removed contacts that displayed high impedance (>15 kΩ) or contained many artifacts or epileptiform activity, and rereferenced the data to the average of the remaining contacts. (2) The data were filtered offline with a finite impulse response filter from 1 to 200 Hz, notch filtered at 50 Hz and its harmonics to remove line noise, downsampled to 1000 Hz, and segmented into epochs from −1 to 2 s relative to stimulus onset. (3) Epochs with incorrect behavioral responses were excluded. To control for potential artifacts in the raw SEEG data, we further removed epochs in which most contacts showed improbable values (4 SDs away from the mean amplitude across all epochs), as well as epochs with voltages >350 μV. (4) To obtain stimulus-related oscillations, we conducted a time-frequency analysis to calculate the event-related spectral perturbation (ERSP), which quantifies changes in the power spectrum time-locked to stimulus onset (Makeig et al., 2004). A stimulus-induced increase or decrease in EEG oscillations in a specific frequency band was defined as event-related synchronization (ERS) or desynchronization (ERD), respectively. The time-frequency decomposition of each epoch was performed in the frequency band of interest (high γ band, 60 ∼ 140 Hz) using a Morlet wavelet transform with a width of seven cycles. Spectral power drops off rapidly with increasing frequency. To correct for this, the power of each epoch was normalized by the inverse square of frequency (Thesen et al., 2012).
The power at each time-frequency point after stimulus presentation was z-scored relative to the baseline power (from 800 ms before stimulus presentation to stimulus onset). The converted values were then averaged across the frequency band and across epochs. (5) To localize the anatomic positions of the intracranial contacts for each patient, preoperative MRI scans were coregistered with postoperative computed tomography (CT) scans using FreeSurfer (Fischl, 2012). To perform group analysis across all patients (Babajani-Feremi et al., 2016), we projected each patient's contacts onto the standard Montreal Neurological Institute (MNI) reference brain to obtain MNI coordinates for all contacts. Using BrainNet Viewer software (Xia et al., 2013), the contacts from all patients were superimposed on a brain surface template for visualization. Further analyses were restricted to contacts located in gray matter regions of the Automated Anatomical Labeling (AAL) template (Tzourio-Mazoyer et al., 2002).
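Step (4) can be sketched with NumPy. The pipeline below convolves each epoch with seven-cycle complex Morlet wavelets, applies the inverse-square-frequency correction (i.e., multiplies power by f²), and z-scores each frequency against the −800 to 0 ms baseline. It then averages across the 60 ∼ 140 Hz band and across epochs. Several details are assumptions rather than the authors' MATLAB code: the wavelet construction (Gaussian envelope truncated at ±3.5 SD, unit energy), the use of each epoch's own baseline mean and SD, and the function names.

```python
import numpy as np

def morlet_power(signal, fs, freqs, n_cycles=7):
    """Power time course at each frequency via complex Morlet wavelet convolution."""
    power = np.empty((len(freqs), len(signal)))
    for i, f in enumerate(freqs):
        sigma_t = n_cycles / (2 * np.pi * f)              # temporal SD of the envelope (s)
        half = int(np.ceil(3.5 * sigma_t * fs))
        t = np.arange(-half, half + 1) / fs               # odd length, centred on zero
        wavelet = np.exp(2j * np.pi * f * t) * np.exp(-t**2 / (2 * sigma_t**2))
        wavelet /= np.sqrt(np.sum(np.abs(wavelet) ** 2))  # unit energy
        power[i] = np.abs(np.convolve(signal, wavelet, mode="same")) ** 2
    return power

def baseline_zscored_bga(epochs, fs, onset_idx, freqs, baseline_ms=800):
    """epochs: (n_epochs, n_samples) array with stimulus onset at sample onset_idx.

    Returns the BGA time course: z-scored power averaged over frequencies and epochs.
    """
    freqs = np.asarray(freqs, dtype=float)
    n_base = int(round(baseline_ms * fs / 1000))
    per_epoch = []
    for ep in epochs:
        p = morlet_power(ep, fs, freqs)
        p *= freqs[:, None] ** 2                          # 1/f^2 correction: multiply by f^2
        base = p[:, onset_idx - n_base:onset_idx]         # -800 ms to onset
        z = (p - base.mean(axis=1, keepdims=True)) / base.std(axis=1, keepdims=True)
        per_epoch.append(z.mean(axis=0))                  # average over the band
    return np.mean(per_epoch, axis=0)                     # average over epochs
```

In this sketch the z-score is computed separately at each frequency, so the f² weighting cancels. The weighting would change the result only if power were pooled across frequencies before baseline normalization.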
Behavior-SEEG power mapping analysis
Identifying the task-relevant regions for collapsed experimental conditions
To define the brain regions involved in visual word recognition, we first selected the contacts that responded during the lexical decision task. Trials from the RW, PW, NW, and SW conditions were pooled. A contact was defined as responsive if it showed significant ERS or ERD in BGA. This meant that its BGA rose above the baseline mean + 3 SDs or fell below the baseline mean − 3 SDs for 50 consecutive milliseconds, with the baseline taken as the 800 ms window before stimulus onset. Requiring 50 ms makes the estimate of significant BGA activation less susceptible to momentary fluctuations (Osman et al., 1992; Matsuo et al., 2015; Ozker et al., 2017). Contact selection was performed at each time point from stimulus onset to 800 ms after stimulus presentation. Note that the identified contacts might be engaged in the entire visual lexical recognition process. This process includes word-specific processes as well as domain-general cognitive processes (e.g., visual perception, working memory, decision-making, and executive control). We then assigned these responsive contacts to anatomic brain regions based on the AAL atlas. A brain region was defined as task relevant if it contained at least 20 responsive contacts from at least five different patients.
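The contact and region criteria above can be sketched as follows. The code assumes that the criterion is applied to the trial-averaged BGA time course at the 1000 Hz sampling rate used after downsampling, so one sample corresponds to 1 ms. It also assumes that the baseline SD is the population SD across baseline samples. The function names and the `(patient_id, region, responsive)` input format are hypothetical.

```python
import numpy as np
from collections import defaultdict

def has_run(mask, min_len):
    """True if `mask` contains at least `min_len` consecutive True values."""
    run = 0
    for m in mask:
        run = run + 1 if m else 0
        if run >= min_len:
            return True
    return False

def is_responsive(bga, times_ms, n_sd=3.0, min_ms=50):
    """Responsive if BGA leaves baseline mean +/- n_sd*SD for min_ms consecutive
    samples between 0 and 800 ms (1 kHz sampling assumed: 1 sample = 1 ms)."""
    base = bga[(times_ms >= -800) & (times_ms < 0)]
    hi = base.mean() + n_sd * base.std()
    lo = base.mean() - n_sd * base.std()
    post = bga[(times_ms >= 0) & (times_ms <= 800)]
    return has_run(post > hi, min_ms) or has_run(post < lo, min_ms)   # ERS or ERD

def task_relevant_regions(contacts, min_contacts=20, min_patients=5):
    """contacts: iterable of (patient_id, region, responsive) tuples."""
    n_contacts = defaultdict(int)
    patients = defaultdict(set)
    for patient, region, responsive in contacts:
        if responsive:
            n_contacts[region] += 1
            patients[region].add(patient)
    return sorted(r for r in n_contacts
                  if n_contacts[r] >= min_contacts and len(patients[r]) >= min_patients)
```

The two-threshold region rule prevents a region from qualifying on the strength of many contacts from only one or two patients.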
To examine whether the response fingers affected the SEEG results, we calculated the incidence of task-relevant patients in each finger group for each task-relevant region. Incidence was defined as the number of task-relevant patients in the yes-index finger (or yes-middle finger) group divided by the total number of patients in that group. The incidences in the two finger groups were compared with χ² tests.
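For each region, the finger-group comparison reduces to a 2 × 2 contingency table: finger group by task-relevant or not. The sketch below implements Pearson's χ² test without continuity correction. The text does not say whether a correction was applied, so that choice is an assumption, and the function name is hypothetical. With one degree of freedom, the p-value equals erfc(√(χ²/2)), because χ² with df = 1 is the square of a standard normal variable. When groups are small and expected counts fall below about 5, Fisher's exact test is a common alternative.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the 2 x 2 table
        [[a, b],   # yes-index group: task-relevant, not task-relevant
         [c, d]]   # yes-middle group: task-relevant, not task-relevant
    Returns (chi2, p) with 1 degree of freedom.
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    p = math.erfc(math.sqrt(chi2 / 2))    # survival function of chi-square, df = 1
    return chi2, p
```

The incidence for each group is simply a / (a + b) or c / (c + d).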
Finally, we extracted the BGA power time courses of all contacts in each task-relevant region and characterized them with two measures. The onset latency was defined as the earliest time point after stimulus presentation at which the BGA power rose above (or fell below) baseline noise for 50 ms. It was estimated for each responsive contact from the trial-averaged BGA power. The peak latency was defined as the time from stimulus onset to the maximum BGA value (peak power) within the epoch (up to 2000 ms after stimulus presentation). For each pair of brain regions, independent-samples t tests across contacts compared the onset latencies and, separately, the peak latencies [false discovery rate (FDR) corrected p < 0.01].
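The two latency measures can be sketched as follows. "Baseline noise" is assumed here to mean the same baseline mean ± 3 SD band used for contact selection, and sampling is assumed to be 1 kHz (1 sample = 1 ms). The function names are hypothetical.

```python
import numpy as np

def first_run_start(mask, min_len):
    """Index where the first run of >= min_len consecutive True values begins, else None."""
    run = 0
    for i, m in enumerate(mask):
        run = run + 1 if m else 0
        if run == min_len:
            return i - min_len + 1
    return None

def onset_latency(bga, times_ms, n_sd=3.0, min_ms=50):
    """Earliest post-stimulus time (ms) at which trial-averaged BGA rises above or
    falls below baseline mean +/- n_sd*SD for min_ms samples (1 kHz assumed)."""
    base = bga[(times_ms >= -800) & (times_ms < 0)]
    hi = base.mean() + n_sd * base.std()
    lo = base.mean() - n_sd * base.std()
    t_post, x_post = times_ms[times_ms >= 0], bga[times_ms >= 0]
    starts = [s for s in (first_run_start(x_post > hi, min_ms),
                          first_run_start(x_post < lo, min_ms)) if s is not None]
    return int(t_post[min(starts)]) if starts else None

def peak_latency(bga, times_ms, t_max=2000):
    """Time (ms) of the maximum BGA between stimulus onset and t_max."""
    sel = (times_ms >= 0) & (times_ms <= t_max)
    return int(times_ms[sel][np.argmax(bga[sel])])
```

A contact with no sustained deviation gets no onset latency (`None`). Such a contact would not have passed the responsiveness criterion in the first place.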
Delineating the activity of each experimental condition in task-relevant regions
To examine which task-relevant regions were associated with behavioral performance in each experimental condition (RW, PW, NW, and SW), we conducted the following analyses. First, for each patient, we averaged the power values of contacts in each brain region in successive time windows (window length, 100 ms; step, 50 ms). Based on previous EEG studies of word recognition (Hauk et al., 2006; Chen et al., 2013; Hirshorn et al., 2016; Woolnough et al., 2021), we analyzed only data up to 600 ms after stimulus presentation. Second, we calculated the inverse efficiency (IE) of each patient contributing to each brain region. IE is the mean response time of correct trials divided by accuracy, so a higher IE reflects lower behavioral efficiency (Townsend and Ashby, 1983; Wei et al., 2012). Third, we computed Spearman's rank correlations between power values and IE across patients (p < 0.05). Finally, confirmatory Bayesian analyses of all correlations were performed using Kendall's tau-b with the default settings in JASP version 0.8.6 (https://jasp-stats.org/). Bayes factors (BF10) quantified the evidence for a correlation relative to the null hypothesis of no correlation (Wagenmakers et al., 2018).
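The window scheme, the IE measure, and the rank correlation can be sketched without dependencies. Spearman's ρ is computed as the Pearson correlation of ranks, with tied values receiving their average rank. The sketch omits p-values; `scipy.stats.spearmanr` provides them. The Bayesian Kendall's tau-b analysis run in JASP is not reproduced. The function names are hypothetical.

```python
from statistics import mean

def sliding_windows(start=0, stop=600, length=100, step=50):
    """(start, end) pairs in ms: 0-100, 50-150, ..., 500-600."""
    return [(s, s + length) for s in range(start, stop - length + 1, step)]

def inverse_efficiency(correct_rts_ms, accuracy):
    """Mean RT of correct trials divided by proportion correct (higher = less efficient)."""
    return mean(correct_rts_ms) / accuracy

def rank(values):
    """Ranks starting at 1; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = rank(x), rank(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

For each region, condition, and window, power values and IE values would be paired across the patients who contributed contacts to that region.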
Revealing the difference between experimental conditions in each task-relevant region
The present study focuses on delineating the spatiotemporal dynamics of the word form effect (PW vs NW) and the lexical effect (RW vs PW) in each task-relevant region. However, the contrast between the RW and NW conditions (i.e., the word form plus lexical effect) can also help interpret the function of each region. Therefore, we compared SEEG power values among the three conditions (RW, PW, and NW) in each region. Specifically, for each task-relevant region, we fitted a linear mixed-effects model in successive time windows (window length, 100 ms; step, 50 ms) up to 600 ms after stimulus presentation (FDR corrected p < 0.05). The model tested the fixed effect of experimental condition, with BGA power as the dependent variable and behavioral IE as a covariate. When a region showed a significant main effect, pairwise comparisons between conditions were conducted in that region (FDR corrected p < 0.05). This yielded the time windows of three effects: the word form effect (PW vs NW), the lexical effect (RW vs PW), and the word form plus lexical effect (RW vs NW). A conjunction analysis then identified, for each region, the time windows shared by the effects and those unique to each effect.
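The text does not name the FDR procedure. The sketch below assumes the Benjamini-Hochberg step-up procedure and pairs it with a simple set-based conjunction of significant windows. The windows in the usage test are hypothetical. Existing library routines cover the same ground: `statsmodels.stats.multitest.multipletests(..., method="fdr_bh")` performs the correction, and `statsmodels.formula.api.mixedlm` fits the mixed model. The model's random-effects structure is not specified in the text, so it is not sketched here.

```python
def fdr_bh(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure; returns True for rejected hypotheses."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            k_max = rank                  # largest rank that passes its threshold
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        rejected[i] = rank <= k_max       # step-up: reject all ranks up to k_max
    return rejected

def conjunction(windows_a, windows_b):
    """Split significant time windows into shared and effect-specific sets."""
    a, b = set(windows_a), set(windows_b)
    return {"shared": sorted(a & b), "only_a": sorted(a - b), "only_b": sorted(b - a)}
```

Because the procedure is step-up, a p-value can be rejected even if it misses its own rank threshold, provided a larger p-value passes at a higher rank.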
To further validate the significant effects observed in this analysis, we calculated the proportion of contacts whose effect direction matched that of the observed significant effect. This was done for the word form (PW vs NW), lexical (RW vs PW), and word form plus lexical (RW vs NW) effects.
Investigating the feedback received by the left ventral occipitotemporal and occipital cortices from other task-relevant regions in each experimental condition
To explore the direction of information flow between brain regions, we performed Granger causal analysis (GCA) with a multivariate autoregressive model. GCA assumes that all time courses are covariance stationary. Under this framework, if past values of time course X improve the prediction of the current value of time course Y beyond what Y's own past provides, X is said to Granger-cause Y. This approach has been widely used with intracranial EEG data (Perrone-Bertolotti et al., 2012; Si et al., 2017). To improve the stationarity of the data, we detrended the preprocessed data, applied first-order differencing, and subtracted the mean voltage. The Dickey-Fuller test (α = 0.01) was used to test for unit roots. We attempted to determine the optimal model order using the Akaike information criterion (Akaike, 1974) and the Bayesian information criterion (Seth, 2005). However, neither criterion yielded an optimal model order, as in previous electrophysiological studies (e.g., Brovelli et al., 2004; Gow et al., 2008; Gow and Segawa, 2009; Omigie et al., 2015). We therefore tested three model orders (15, 20, and 25) and obtained similar results for all of them. For simplicity, we report only the results for a model order of 25. Granger causal (GC) values were computed in 200 ms windows from 400 ms before to 1000 ms after stimulus onset. They were expressed as percentage change (%) relative to the 400 ms prestimulus baseline. The statistical significance of GC values was assessed with surrogate statistics (permutation test with 200 permutations). Connectivity from one brain region to another was considered to exist for a patient if that patient's GC values exceeded the surrogate distribution (p < 0.05) for 50 ms.
To reduce false positives, a connection between two brain regions was considered to exist at the group level only if it was significant in more than half of the patients. Note that the connectivity analysis could be conducted only in patients who had contacts in both regions of interest. Among the regions showing word form effects, the left ventral occipitotemporal cortex (VOTC) plays a key role in processing orthographic information (e.g., bigram frequency, number of common letters, consonant-vowel structure; Vinckier et al., 2007; Lochy et al., 2018; Woolnough et al., 2021). In addition, the left occipital cortex has been implicated in orthographic tasks (Wu et al., 2012) and associated with visual word form representations (Zhang et al., 2018). Therefore, we examined directional connectivity from the other task-relevant brain regions to the VOTC [i.e., the fusiform gyrus (FG) and inferior temporal gyrus (ITG)] and to the occipital cortex [i.e., the middle occipital gyrus (MOG)]. In the following list, each number is the count of patients available for GCA from the source region to the given target (e.g., "CS: 7 → FG" means that seven patients could be used for GCA from CS to FG). The source regions were the left calcarine sulcus (CS: 7 → FG; 5 → ITG; 6 → MOG), IFG (3 → FG; 2 → ITG; 4 → MOG), inferior parietal lobe (IPL: 3 → FG; 3 → ITG; 4 → MOG), insula (INS: 3 → FG; 2 → ITG; 3 → MOG), middle frontal gyrus (MFG: 2 → FG; 2 → ITG; 2 → MOG), middle temporal gyrus (MTG: 8 → FG; 7 → ITG; 9 → MOG), precuneus (PcUN: 4 → FG; 4 → ITG; 5 → MOG), postcentral gyrus (PsCG: 3 → FG; 2 → ITG; 5 → MOG), PrCG (1 → FG; 1 → MOG), and superior temporal gyrus (STG: 7 → FG; 6 → ITG; 7 → MOG).
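The logic of the connectivity pipeline can be illustrated with a simplified sketch. It is a bivariate (pairwise) Granger causality estimate, fitted by least squares, rather than the multivariate autoregressive model described in the text. The stationarity steps, percentage-change normalization, surrogate test, and more-than-half-of-patients rule follow the text. Two aspects are assumptions: the surrogates are built by circularly shifting the source signal, and GC is measured as the log ratio of restricted to full residual variance. All function names are hypothetical. The unit-root check could use `statsmodels.tsa.stattools.adfuller`, which is not shown.

```python
import numpy as np

def make_stationary(x):
    """Detrend, first-order difference, and demean, as listed in the text."""
    t = np.arange(len(x))
    x = x - np.polyval(np.polyfit(t, x, 1), t)   # remove linear trend
    x = np.diff(x)                              # first-order differencing
    return x - x.mean()

def lagged_design(series, order):
    """Intercept plus `order` past values of each series in `series`."""
    n = len(series[0])
    cols = [np.ones(n - order)]
    for s in series:
        for lag in range(1, order + 1):
            cols.append(s[order - lag:n - lag])
    return np.column_stack(cols)

def residual_var(design, target):
    coef = np.linalg.lstsq(design, target, rcond=None)[0]
    return np.var(target - design @ coef)

def granger(x, y, order):
    """Bivariate GC x -> y: log of restricted over full residual variance."""
    target = y[order:]
    restricted = residual_var(lagged_design([y], order), target)
    full = residual_var(lagged_design([y, x], order), target)
    return float(np.log(restricted / full))

def surrogate_p(x, y, order, n_perm=200, seed=0):
    """Observed GC and permutation p-value against circularly shifted copies of x."""
    rng = np.random.default_rng(seed)
    observed = granger(x, y, order)
    null = [granger(np.roll(x, rng.integers(order + 1, len(x) - order)), y, order)
            for _ in range(n_perm)]
    return observed, (1 + sum(g >= observed for g in null)) / (n_perm + 1)

def percent_change(gc, baseline_gc):
    """GC expressed as % change from the prestimulus baseline GC."""
    return 100.0 * (gc - baseline_gc) / baseline_gc

def group_connection(patient_significant):
    """Group-level connection if significant in more than half of the patients."""
    return sum(patient_significant) > len(patient_significant) / 2
```

Circular shifting preserves each signal's autocorrelation while breaking its temporal alignment with the target region, which is why the sketch uses it as the null model.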
Data availability
The datasets that support the findings of this study are available from the corresponding authors on request.
Results
Behavioral performance
The behavioral performance of the 33 patients in the lexical decision task is shown in Fig. 1C. The main effect of condition (RW, PW, NW, and SW) on ACC was significant (F(3,96) = 32.40; p < 0.001; ηp² = 0.50; one-way ANOVA). The ACC of the PW condition [0.80 (mean) ± 0.14 (SD)] was significantly lower than that of the other three conditions: RW (0.94 ± 0.05; t(32) = −5.26; Bonferroni corrected p < 0.001; Cohen's d = −0.92; paired t test), NW (0.96 ± 0.03; t(32) = −7.25; corrected p < 0.001; Cohen's d = −1.26; paired t test), and SW (0.89 ± 0.07; t(32) = −5.38; corrected p < 0.001; Cohen's d = −0.94; paired t test). The ACC of the SW condition was significantly lower than that of the NW condition (t(32) = −5.86; corrected p < 0.001; Cohen's d = −1.02; paired t test) and the RW condition (t(32) = −3.23; corrected p = 0.003; Cohen's d = −0.56; paired t test). Furthermore, the ACC of the NW condition was significantly higher than that of the RW condition (t(32) = 2.97; corrected p = 0.006; Cohen's d = 0.52; paired t test). Similarly, the main effect of condition on RT was significant (F(3,96) = 59.40; p < 0.001; ηp² = 0.65; one-way ANOVA). Follow-up pairwise comparisons showed that the RT of the PW condition (882.70 ms ± 106.4 ms) was significantly longer than that of the RW condition (783.20 ms ± 96.21 ms; t(32) = 8.20; corrected p < 0.001; Cohen's d = 1.43; paired t test), the NW condition (783.50 ms ± 99.30 ms; t(32) = 15.21; corrected p < 0.001; Cohen's d = 2.65; paired t test), and the SW condition (841.50 ms ± 94.53 ms; t(32) = 4.80; corrected p < 0.001; Cohen's d = 0.84; paired t test). The RT of the SW condition was significantly longer than that of the NW condition (t(32) = 9.73; corrected p = 0.001; Cohen's d = 1.69; paired t test) and the RW condition (t(32) = 6.26; corrected p < 0.001; Cohen's d = 1.09; paired t test).
However, there was no significant difference in RT between the RW and NW conditions (t(32) = −0.03; corrected p = 0.98; Cohen's d = −0.005; paired t test). Generally, the patients performed more poorly in the PW condition than in the other three conditions.
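As a consistency check, the reported effect sizes agree with two standard conversions: ηp² = F·df1/(F·df1 + df2), and d_z = t/√n with n = 33. This agreement suggests that Cohen's d for the paired comparisons is the paired-design d_z. That is an inference from the numbers, not something the text states. The function names are hypothetical.

```python
import math

def partial_eta_squared(F, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return F * df_effect / (F * df_effect + df_error)

def cohens_dz(t, n):
    """Cohen's d for paired designs (d_z) recovered from a paired t statistic."""
    return t / math.sqrt(n)

print(round(partial_eta_squared(32.40, 3, 96), 2))   # ACC main effect: 0.5 (reported 0.50)
print(round(partial_eta_squared(59.40, 3, 96), 2))   # RT main effect: 0.65
print(round(cohens_dz(-5.26, 33), 2))                # PW vs RW accuracy: -0.92
```

All eleven paired-comparison d values reported above round to within 0.005 of t/√33.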
SEEG analysis results
Task-relevant regions in collapsed experimental conditions
We recorded SEEG signals from 2095 contacts in 33 patients (Fig. 2A; Table 1). Before the SEEG analyses, we successively excluded six types of contacts:
(1) those with high impedance during the experiments (left hemisphere, 28; right, 11);
(2) those with many artifacts or epileptiform activity (left, 75; right, 20);
(3) those with improbable values in >10% of all trials (left, 104; right, 35);
(4) those outside the AAL gray matter regions (Tzourio-Mazoyer et al., 2002; left, 259; right, 125);
(5) those unrelated to the visual lexical decision task (left, 423; right, 236);
(6) those not meeting the region inclusion criterion of at least 20 task-relevant contacts from at least five different patients in an AAL brain region (left, 168; right, 212).
The remaining contacts (left, 399; right, 0) were distributed across 13 regions in the left hemisphere and came from 23 patients (Fig. 2B; Table 2). These brain regions were considered task relevant. They might participate in multiple processes, such as visual perception, working memory, decision-making, executive control, or lexical/word form processing. For each task-relevant region, the incidence of task-relevant patients did not differ significantly between the yes-index finger and yes-middle finger groups (ps > 0.05; χ² tests). This indicates that the response-finger assignment did not influence the likelihood that a patient showed task-relevant activity.
The contacts in each of the 13 task-relevant regions showed significant ERS of BGA when trials from the four experimental conditions (RW, PW, NW, and SW) were averaged (Fig. 3A). Note that most of these task-relevant regions (10 of 13) were also identified when only trials from the first session of the task were used. Moreover, in these 10 regions, the time series based on first-session trials correlated highly with those based on trials from both sessions (all Pearson correlation coefficients > 0.97; ps < 0.001). We therefore report results based on trials from both sessions throughout this article.
To elucidate the spatiotemporal relationships among task-relevant regions, we compared the response onset latencies and peak latencies of the BGA time series across regions. The onset latencies of three posterior visual regions (the left FG, 142 ms ± 55 ms; ITG, 265 ms ± 186 ms; MOG, 197 ms ± 171 ms) were earlier than those of three regions (the left CS, 421 ms ± 164 ms; INS, 422 ms ± 174 ms; STG, 550 ms ± 207 ms; t values −10.04 to −3.45; FDR corrected ps < 0.01; Cohen's d values −2.84 to −0.86; independent t tests). The onset latencies of two posterior visual regions (the left FG and MOG) were earlier than those of four regions (the left IFG, 333 ms ± 151 ms; IPL, 334 ms ± 152 ms; PcUN, 366 ms ± 140 ms; and PsCG, 357 ms ± 179 ms; t values −7.76 to −3.08; FDR corrected ps < 0.01; Cohen's d values −2.26 to −0.78; independent t tests). However, the onset latencies of two posterior visual regions (the left ITG and MOG) were comparable with those of the remaining three regions (the left MFG, 227 ms ± 103 ms; MTG, 256 ms ± 174 ms; and PrCG, 260 ms ± 158 ms; t values −0.96 to 1.68; FDR corrected ps > 0.05; Cohen's d values −0.24 to 0.39; independent t tests; Fig. 3B). The peak latencies of the three posterior visual regions (the left FG, 355 ms ± 325 ms; ITG, 433 ms ± 379 ms; and MOG, 373 ms ± 381 ms) did not differ significantly from one another (t values −0.89 to 0.73; FDR corrected ps > 0.05; Cohen's d values −0.22 to 0.16).
The peak latencies of these three regions were significantly earlier than those of six other regions (the left CS, 945 ms ± 503 ms; INS, 795 ms ± 392 ms; IPL, 742 ms ± 328 ms; PcUN, 801 ms ± 471 ms; PsCG, 705 ms ± 274 ms; and STG, 761 ms ± 269 ms; t values −5.12 to −3.74; FDR corrected ps < 0.01; Cohen's d values −1.44 to −0.81; independent t tests). However, they were comparable with those of three other brain areas (the left MFG, 525 ms ± 386 ms; MTG, 411 ms ± 351 ms; and PrCG, 536 ms ± 295 ms; t values −2.29 to −0.26; FDR corrected ps > 0.05; Cohen's d values −0.59 to −0.06; independent t tests; Fig. 3C). The remaining region, the left IFG, peaked at 679 ms ± 487 ms. Of the three posterior regions, only the left MOG reached peak power significantly earlier than the left IFG (t(73) = 3.05; FDR corrected p < 0.01; Cohen's d = 0.71; independent t test). These results suggest that some high-order brain regions (e.g., the left MFG, MTG, and PrCG) might be coactivated in parallel with the posterior ventral occipitotemporal and occipital cortices during visual word processing.
Activity of each experimental condition in task-relevant regions
Figure 4 shows the correlations across patients between BGA power in each task-relevant region and behavioral IE in each experimental condition. Significant correlations appeared in two regions for the RW condition (the left INS, 100 ∼ 200 ms and 150 ∼ 250 ms; left PcUN, 200 ∼ 300 ms and 350 ∼ 600 ms; ps < 0.05) and two regions for the PW condition (the left IPL, 0 ∼ 100 ms; left MTG, 50 ∼ 150 ms; ps < 0.05). They also appeared in two regions for the NW condition (the left INS, 400 ∼ 550 ms; left PcUN, 450 ∼ 600 ms; ps < 0.05) and one region for the SW condition (the left CS, 150 ∼ 250 ms; p < 0.05). These results suggest that high-level brain regions (e.g., the left INS, IPL, MTG, and PcUN) might be involved in early visual word recognition.
The difference between experimental conditions in each task-relevant region
Figure 5 shows BGA power over time for three conditions (RW, PW, and NW) in each region. The linear mixed-effects model revealed significant main effects of experimental condition in 10 task-relevant regions (F values 4.60 to 17.27; FDR corrected ps < 0.05; ηp² values 0.12 to 0.35; linear mixed effects). Pairwise comparisons revealed a significant word form effect (PW vs NW; FDR corrected ps < 0.05) in eight left task-relevant regions (the left FG, 150 ∼ 550 ms, t(54) = −3.11, p = 0.003, Cohen's d = −0.59; INS, 450 ∼ 550 ms, t(63) = 2.14, p = 0.036, Cohen's d = 0.39; IPL, 50 ∼ 200 ms, t(56) = −3.93, p < 0.001, Cohen's d = −0.89; ITG, 200 ∼ 600 ms, t(83) = −2.90, p = 0.005, Cohen's d = −0.45; MOG, 150 ∼ 550 ms, t(90) = −2.40, p = 0.019, Cohen's d = −0.36; MTG, 50 ∼ 200 ms, t(92) = −4.25, p < 0.001, Cohen's d = −0.68; PrCG, 300 ∼ 500 ms, t(65) = −3.44, p = 0.001, Cohen's d = −0.60; and PsCG, 100 ∼ 200 ms, t(69) = −3.38, p = 0.001, Cohen's d = −0.56, and 250 ∼ 600 ms, t(71) = −4.63, p < 0.001, Cohen's d = −0.77; linear mixed effects). All of these regions except the left INS showed higher BGA power in the NW condition than in the PW condition. A significant lexical effect (RW vs PW; FDR corrected ps < 0.05) was observed in five left task-relevant regions (the left CS, 400 ∼ 600 ms, t(57) = 3.83, p < 0.001, Cohen's d = 0.77; MTG, 200 ∼ 300 ms, t(83) = −3.31, p = 0.001, Cohen's d = −0.53; PrCG, 300 ∼ 500 ms, t(65) = 3.40, p = 0.001, Cohen's d = 0.59; PsCG, 100 ∼ 200 ms, t(70) = 2.71, p = 0.009, Cohen's d = 0.45, and 250 ∼ 600 ms, t(71) = 4.56, p < 0.001, Cohen's d = 0.76; and STG, 500 ∼ 600 ms, t(54) = 2.75, p = 0.008, Cohen's d = 0.56; linear mixed effects). All of these regions except the left MTG showed higher BGA power in the RW condition than in the PW condition. In the left PrCG (300 ∼ 500 ms) and PsCG (100 ∼ 200 ms and 250 ∼ 600 ms), the word form and lexical effects occupied identical time windows.
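The "FDR corrected" thresholds above control the false discovery rate across many regions and time windows. This section does not name the procedure; a common choice is the Benjamini-Hochberg step-up method, sketched below as an assumption (`fdr_bh` is a hypothetical helper name, not the study's code).

```python
def fdr_bh(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns (rejected, adjusted) lists in the original order of p_values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values are monotone in rank
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    rejected = [q <= alpha for q in adjusted]
    return rejected, adjusted
```

For example, raw p values of 0.01, 0.04, 0.03, and 0.005 become adjusted values of about 0.02, 0.04, 0.04, and 0.02, so all four survive at α = 0.05 but only the first and last survive at α = 0.03.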
Because the RW versus NW contrast encompasses at least both word form and lexical processes, it is not surprising that significant RW versus NW differences also appeared in the time windows of the above word form and lexical effects. However, two of the 10 regions (the left PrCG, 300 ∼ 500 ms; PsCG, 100 ∼ 200 ms and 250 ∼ 600 ms) showed no significant difference between RW and NW, although each presented both a significant word form effect and a significant lexical effect. This might reflect a greater working memory load when encoding unfamiliar visual representations (e.g., the NW stimuli; Rämä et al., 2001; Yang et al., 2011), which would raise activity in the NW condition.
The validation analysis showed that the word form effect had a consistent direction in the majority of contacts within each region (the left FG, 20/28 contacts; INS, 20/30; IPL, 15/20; ITG, 34/42; MOG, 33/45; MTG, 29/39; PrCG, 25/33; and PsCG, 30/36). Similarly, the lexical effect showed a consistent direction in most contacts within each region (the left CS, 18/25; MTG, 26/39; PrCG, 27/33; PsCG, 31/36; and STG, 15/24). The combined word form plus lexical effect (RW vs NW) also showed a consistent direction in the majority of contacts within each region (the left CS, 18/25; FG, 23/28; INS, 26/30; IPL, 18/20; ITG, 29/42; MOG, 31/45; MTG, 26/39; and STG, 19/24).
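The fractions above (e.g., 20/28) count contacts that follow the region-level effect. The exact criterion is not stated in this section; a minimal sketch, assuming a contact counts as consistent when its mean condition difference in the effect window (e.g., PW minus NW BGA power) has the same sign as the region-level difference. `consistent_contacts` is a hypothetical helper name.

```python
def consistent_contacts(contact_diffs, group_diff):
    """Count contacts whose condition difference has the same sign as the region effect.

    contact_diffs: per-contact mean BGA difference (e.g., PW minus NW) in the effect window.
    group_diff: region-level difference in the same direction; only its sign is used.
    Returns (n_consistent, n_contacts, is_majority). Hypothetical helper.
    """
    n_consistent = sum(
        1 for d in contact_diffs
        if d != 0 and (d > 0) == (group_diff > 0)
    )
    n_total = len(contact_diffs)
    return n_consistent, n_total, n_consistent > n_total / 2
```

Under this reading, a result such as 20/28 for the left FG means that 20 of its 28 contacts showed the same direction of PW versus NW difference as the region as a whole.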
These results indicate that lexical processing might occur at an early stage of word recognition (from 100 ms after stimulus onset in the left PsCG and from 200 ms in the left MTG) and that lexical processing overlaps with word form processing in time and space (e.g., in the left PrCG and PsCG).
The feedback received by the left ventral occipitotemporal and occipital cortices from other task-relevant regions in each experimental condition
Figure 6 illustrates the strength of interregional connectivity within the first 150 ms after stimulus presentation from high-level task-relevant regions to the ventral occipitotemporal cortex (i.e., the left FG and ITG) and the posterior occipital cortex (i.e., the left MOG), the posterior word form regions in which a word form effect was observed. The left MOG received directional connectivity from the left IFG in the RW condition, from the left IFG, INS, IPL, and PsCG in the PW condition, and from the left INS, PcUN, and PsCG in the NW condition. The left ITG received directional connectivity from the left CS in the RW condition and from the left CS, IPL, MTG, and PsCG in the PW condition. The left FG received directional connectivity from the left IFG in the RW condition, from the left MTG and PsCG in the PW condition, and from the left INS in the NW condition. These results indicate that the word form regions (the left FG, ITG, and MOG) received early top-down modulation from nonvisual high-level regions (e.g., the left IFG, INS, IPL, PcUN, PsCG, and MTG) during the word form stage of word recognition.
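Directional connectivity of this kind is typically quantified with Granger causality: a source region "Granger-causes" a target if the source's past improves prediction of the target beyond the target's own past. A minimal bivariate, time-domain sketch, assuming ordinary least-squares autoregressive fits on two BGA time series; the model order, whether the study used conditional or spectral Granger causality, and how trials were pooled are not specified in this section, and `granger_strength` is a hypothetical helper name.

```python
import numpy as np

def _lagged(x, order):
    """Columns x[t-1], ..., x[t-order] for t = order .. len(x) - 1."""
    n = len(x)
    return np.column_stack([x[order - k : n - k] for k in range(1, order + 1)])

def granger_strength(source, target, order=2):
    """Time-domain Granger causality source -> target (hypothetical helper).

    Returns log(residual variance of target predicted from its own past /
    residual variance when the source's past is added). Values near 0 mean
    the source adds no predictive information.
    """
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    y = target[order:]
    ones = np.ones((len(y), 1))
    restricted = np.hstack([ones, _lagged(target, order)])
    full = np.hstack([restricted, _lagged(source, order)])
    res_r = y - restricted @ np.linalg.lstsq(restricted, y, rcond=None)[0]
    res_f = y - full @ np.linalg.lstsq(full, y, rcond=None)[0]
    return float(np.log(res_r.var() / res_f.var()))
```

Because the full model nests the restricted one, the statistic is never negative; in practice its significance would be assessed against a null distribution (e.g., from trial-shuffled data), which this sketch omits.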
Discussion
To examine whether high-level lexical information is involved in early visual word form processing, the current study compared BGA (60 ∼ 140 Hz) power elicited by four types of stimuli (RW, PW, NW, and SW) during a visual lexical decision task in 33 adults with epilepsy. Only 19.05% of the contacts (399/2095) responded to the experimental stimuli, and these contacts were distributed across 13 left-hemisphere regions. Task-relevant regions might have been absent from the right hemisphere because (1) the current task depends mainly on left-lateralized processing and (2) some brain tissue crucial for lexical decision was not recorded, owing to the sparse spatial sampling of SEEG or the small number of contacts in some areas. Among these regions, word form processing (measured by the word form effect, i.e., PW vs NW) occurred from 50 ms after stimulus onset, and high-level lexical processing (measured by the lexical effect, i.e., RW vs PW) occurred from 100 ms after stimulus onset. The lexical effect thus emerged at an early latency and even spatiotemporally overlapped with the word form effect in the left PrCG (300 ∼ 500 ms) and PsCG (100 ∼ 200 ms and 250 ∼ 600 ms). Moreover, the high-level regions provided early top-down feedback to the word form regions. By characterizing the fine-grained spatiotemporal dynamics of word processing across a wide range of brain areas in one language, these findings provide supporting evidence for interactive theory.
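All analyses rest on broadband gamma activity (BGA, 60 ∼ 140 Hz). The study's exact pipeline (filter type, sub-bands, baseline normalization) is not described in this section; as a swapped-in, commonly used approach, the sketch below computes instantaneous band power from an FFT-domain analytic signal: in-band positive-frequency bins are doubled, all other bins are zeroed, and the squared magnitude of the inverse FFT gives the power envelope. `broadband_gamma_power` is a hypothetical helper name.

```python
import numpy as np

def broadband_gamma_power(signal, fs, low=60.0, high=140.0):
    """Instantaneous power of the [low, high] Hz band via an FFT-domain analytic signal.

    Positive-frequency bins inside the band are doubled; DC, negative frequencies,
    and out-of-band bins are zeroed. The inverse FFT is then the analytic signal of
    the band-passed data, and its squared magnitude is the power envelope.
    """
    x = np.asarray(signal, dtype=float)
    spectrum = np.fft.fft(x)
    freqs = np.fft.fftfreq(x.size, d=1.0 / fs)
    in_band = (freqs >= low) & (freqs <= high)  # low > 0, so only positive bins
    analytic = np.fft.ifft(np.where(in_band, 2.0 * spectrum, 0.0))
    return np.abs(analytic) ** 2
```

A unit-amplitude 100 Hz sinusoid yields a flat power of 1 regardless of any 10 Hz component mixed in, because the 10 Hz bins fall outside the band. Real pipelines usually also express this power relative to a prestimulus baseline, which is not shown here.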
Word form processing regions
We observed word form effects in eight brain regions (the left FG, INS, IPL, ITG, MOG, MTG, PrCG, and PsCG) from 50 ms poststimulus presentation. In these regions, stimuli with legal word forms (e.g., RW and PW) induced similar activation patterns, which differed from the pattern induced by stimuli with illegal word forms (e.g., NW). The left FG and ITG, located in the VOTC, play critical roles in processing visual word form (Nobre et al., 1994; Woolnough et al., 2021) and represent orthography from coarse to fine over time (Hirshorn et al., 2016). We found a word form effect in the left VOTC from 150 ms poststimulus presentation, which might support gist-level discrimination of words with different visual statistics (Hirshorn et al., 2016). Previous studies have also suggested that the occipital cortex shows a preference for visual words (Zhang et al., 2018) and participates in orthographic tasks with Chinese characters (Wu et al., 2012). This region might be responsible for visuospatial processing and for ordering symbols in unfamiliar strings (Boros et al., 2016). Thus, the early activity of the MOG might contribute to the visuospatial processing of the stimuli, whose radicals are combined into complex square-shaped configurations.
The left IPL is known to play an essential role in spatial selection, sequencing, and the discrimination of letter positions (Pammer et al., 2006; Cohen et al., 2008; Ossmy et al., 2014; Carreiras et al., 2015b). Interestingly, we observed a word form effect in the left IPL from 50 ms after stimulus onset, even earlier than the word form effects in the VOTC and occipital cortices. We infer that position information (here, the positions of radicals) might be transferred from dorsal parietal areas to the ventral visual stream through potential functional and anatomical connections (Bouhali et al., 2014; Finn et al., 2014). The left INS has been implicated in word reading and contributes to phonology (Dickens et al., 2019). In the current study, the left INS showed higher activation for orthographically well-formed stimuli (e.g., RW and PW) than for stimuli with illegal orthography (e.g., NW), which suggests that this region might process abstract orthography. The left MTG, PrCG, and PsCG were involved in both word form and lexical processing; we discuss their roles in the following section.
High-level linguistic regions
We found lexical effects in five brain regions (the left CS, MTG, PrCG, PsCG, and STG) from 100 ms poststimulus presentation. These early lexical effects might not be driven merely by the high-level task adopted here: Carreiras et al. (2015a) observed early activation of the left angular gyrus and intraparietal sulcus at ∼120 ms after stimulus onset even in a low-level visual task (pressing a button when a dot was presented as part of a stimulus). The left CS is usually considered to be involved in early visual processing (De Putte et al., 2018). Interestingly, we observed that this region was also engaged in lexical processing. This finding might be explained by interactive theory (Price and Devlin, 2011), which holds that the function of the occipitotemporal region is determined by the interaction between visual input and top-down predictions generated from nonvisual attributes. The left STG is often implicated in phonological access, phonological short-term memory (Mano et al., 2013), and sublexical print-to-sound mapping (Tan et al., 2001). Consistent with a previous study (Hauk et al., 2012), the left MTG showed an early lexical effect from 200 ms, which was preceded in the same region by a word form effect (50 ∼ 200 ms). The left MTG might be important for interfacing orthographic with semantic information during reading (Purcell et al., 2014), linking word form and semantic networks (Price and Mechelli, 2005), lexico-semantic processing (Taylor et al., 2013), and the access and retrieval of semantic information (Gitelman et al., 2005; Wei et al., 2012). The left PrCG and PsCG have been associated with phonology (Dickens et al., 2019). In particular, the left PrCG has been linked to prelexical phonological representations derived from orthography (Wheat et al., 2010) and to articulatory mapping (Dickens et al., 2019).
Notably, BGA power in the left PrCG and PsCG did not differ significantly between the RW and NW conditions. One possibility is that encoding the less familiar NW stimuli increased working memory load and thus required stronger activation of these two regions to maintain information in working memory during the visual lexical decision task (Rypma et al., 1999; Rämä et al., 2001).
Modulation of linguistic regions to word form processing regions
Consistent with previous electrophysiological studies (Woodhead et al., 2014; Whaley et al., 2016), the present study showed directional connectivity from high-order language regions to word form regions at an early latency (before 150 ms poststimulus presentation). Whaley et al. (2016) found early modulation of the visual word form region by frontal regions (e.g., the left IFG and MFG) associated with phonological representations during word production. Similarly, the current study found that high-level brain regions (e.g., the left IFG, INS, and PsCG), which are linked to graphophonological conversion and spelling-sound translation (Jobard et al., 2003; Borowsky et al., 2006), provided early feedback to the word form regions. Unlike in the two studies mentioned above, more semantically related regions (e.g., the left MTG; Binder et al., 2009; Wu et al., 2012) were connected to the ITG in the current study. These differences in feedback connections might indicate that top-down modulation is shaped by task demands, as proposed by interactive theory (Price and Devlin, 2011): oral production mainly requires the articulation of phonology, whereas lexical decision relies more on semantic processing (Edwards et al., 2005).
Consistent with interactive theory (Carreiras et al., 2014), we found that top-down modulation was also affected by the properties of the visual input. For PW stimuli, which contain phonetic radicals, word form processing received more feedback from the left dorsal IPL, a region that might be associated with the transformation from orthography to phonology (Taylor et al., 2013) and with the short-term storage of phonological information (Tan et al., 2005). For NW stimuli, which have novel word forms, the left PcUN, which might be related to visuospatial imagery (Cavanna and Trimble, 2006), provided more feedback to word form processing.
Limitations
The current study has at least the following caveats. (1) The experimental conditions contained unequal numbers of trials (RW, 300 trials; PW, 100 trials; NW, 100 trials; SW, 100 trials), which might bias the lexical effect (RW vs PW). (2) Granger causality analysis (GCA) revealed that some high-level brain regions provided feedback to the left VOTC and MOG immediately after stimulus presentation. This might be due to the short interstimulus interval (1500 ∼ 2000 ms): postresponse monitoring or predictions carried over from the preceding trial might produce top-down modulation that appears implausibly early. (3) Because no control task was included, word-specific processing could not be isolated from more general processes (e.g., visual perception, working memory, decision-making, and executive control).
Conclusion
By analyzing SEEG signals recorded from 33 adults with epilepsy, we observed that lexical processing occurred early and even overlapped with word form processing in time and space. Furthermore, high-order brain regions (e.g., the left IFG, INS, IPL, MTG, PcUN, and PsCG) provided early top-down feedback to the posterior visual word form regions (i.e., the left FG, ITG, and MOG). These results demonstrate that word form and lexical processes in visual word recognition are interactive rather than sequential, providing important insights into the neural network dynamics of visual word recognition.
Footnotes
This work was supported by the National Key Research and Development Program of China (2018YFC1315200), the National Natural Science Foundation of China (31872785, 81972144, 31871099), and the National Defense Basic Scientific Research Program of China (2018110B011). We thank the reviewers for comments and Meng Zhao and Zhao Liu of Sanbo Brain Hospital for data collection.
The authors declare no competing financial interests.
- Correspondence should be addressed to Zaizhu Han at zzhhan{at}bnu.edu.cn or Yuguang Guan at ygguan2000{at}163.com