Debate surrounds the precise cortical location and timing of access to phonological information during visual word recognition. Therefore, using whole-head magnetoencephalography (MEG), we investigated the spatiotemporal pattern of brain responses induced by a masked pseudohomophone priming task. Twenty healthy adults read target words that were preceded by one of three kinds of nonword prime: pseudohomophones (e.g., brein–BRAIN), where four of five letters are shared between prime and target, and the pronunciation is the same; matched orthographic controls (e.g., broin–BRAIN), where the same four of five letters are shared between prime and target but pronunciation differs; and unrelated controls (e.g., lopus–BRAIN), where neither letters nor pronunciation are shared between prime and target. All three priming conditions induced activation in the pars opercularis of the left inferior frontal gyrus (IFGpo) and the left precentral gyrus (PCG) within 100 ms of target word onset. However, for the critical comparison that reveals a processing difference specific to phonology, we found that the induced pseudohomophone priming response was significantly stronger than the orthographic priming response in left IFG/PCG at ∼100 ms. This spatiotemporal concurrence demonstrates early phonological influences during visual word recognition and is consistent with phonological access being mediated by a speech production code.
Extensive research has shown that phonological processing skill is a critical predictor of reading acquisition (Bradley and Bryant, 1983) and has been identified as a source of difficulty in dyslexia (for review, see Goswami, 2000). A common technique used to probe the earliest stages of processing in visual word identification is the masked priming paradigm. Such studies demonstrate reaction time advantages when pseudohomophones (e.g., brein) prime target words like “BRAIN” as compared to orthographic control primes (e.g., broin) (Perfetti and Bell, 1991; Lukatela and Turvey, 1994). This pseudohomophone priming effect is typically interpreted as indicating that the initial access code for word recognition is phonological in nature.
Although behavioral masked priming studies have suggested that phonological access occurs as quickly as 50–100 ms after words are presented (Ferrand and Grainger, 1993), such studies cannot determine precisely the time course of events that comprise visual word recognition. In part this is because outcome measures like reaction time represent the output of the system as a whole. But more importantly, experimental manipulations such as varying prime duration do not necessarily provide direct information about the time course of processing. For example, Rayner et al. (2003) demonstrated that exposure to text as brief as 60 ms is sufficient for lexical information to be extracted, but this was indexed by changes in eye-fixation duration ∼250 ms after stimulus. Thus, observing an experimental effect with a 60 ms prime does not necessarily mean that a particular processing step happens within 60 ms. Rather, 60 ms worth of input provides sufficient information to permit that process to occur, at whatever time point thereafter.
To elucidate when and where phonological access occurs during visual word recognition, time-sensitive neurophysiological measurements are ideal. Typically, the earliest EEG correlates of phonological priming have been found ∼200–300 ms following word presentation (Sereno et al., 1998; Grainger et al., 2006). An exception to this was reported by Ashby et al. (2009), who recorded EEG as participants read targets with voiced and unvoiced final consonants (e.g., fad, fat), preceded by pseudoword primes that were incongruent or congruent in voicing and vowel duration (e.g., fap, faz). Phonological feature congruency modulated ERPs by 80 ms, indicating that subphonemic features can be activated rapidly during word recognition. This latter finding is consistent with recent MEG studies showing early responses to printed words ∼100 ms after stimulus onset in the left inferior frontal gyrus, pars opercularis (IFGpo) and the precentral gyrus (PCG) (Pammer et al., 2004; Cornelissen et al., 2009). Put together, these neurophysiological data imply that phonological activation may indeed occur at ∼100 ms, and may be mediated by the IFGpo/PCG. However, such a conclusion is premature because neither Pammer et al. (2004) nor Cornelissen et al. (2009) specifically manipulated phonology in relation to IFGpo/PCG activity, and Ashby et al. (2009) did not localize their ERP data. Therefore, to test this idea, we used MEG to measure brain responses during a masked pseudohomophone priming task, and analyzed the data with cortical source reconstruction methods that provide high temporal resolution (milliseconds) and good spatial resolution (estimated to be 5 mm for 85% of voxels) (Barnes et al., 2004).
Materials and Methods
Twenty native English-speaking, strongly right-handed adults (mean age 23.2 years, SD 5.97 months; 12 female) gave informed consent to participate in the study. None had been diagnosed with a reading disability, and all read normally based on WRAT-III performance. Handedness was defined by the Annett Hand Preference Questionnaire (Annett, 1967). The study conformed with The Code of Ethics of the World Medical Association (Declaration of Helsinki).
The target words were 111 English five-letter nouns and verbs, with a mean word frequency count of 19.7 (CELEX). These were primed by pseudohomophones of the target word (PSEUD), matched orthographic nonwords (ORTH), and unrelated nonwords (UNREL). Pseudohomophone primes shared four out of five letters with their target word (brein–BRAIN) and were pronounced identically. Orthographic control primes shared the same four out of five letters with the target word but were pronounced differently (broin–BRAIN). Unrelated primes were pseudowords that shared no letters (in any position) with the target word (lopus–BRAIN). Five observers judged whether pseudohomophones sounded like target words and whether orthographic controls sounded different from target words. Winer's interrater reliability for these decisions was 0.97. All three prime types were matched on bigram frequency using a positional bigram frequency count derived from the five-letter words in CELEX. The mean log10 frequencies were as follows: pseudohomophones 5.639 (SEM 0.033); orthographic primes 5.687 (SEM 0.027); and unrelated primes 5.635 (SEM 0.034). A one-way ANOVA for positional bigram frequency score was not significant (F(2,330) = 0.06, p = 0.94), indicating that no condition contained primes made up of more frequently occurring letter pairs than any other condition.
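The degrees of freedom reported above follow from three prime types with 111 items each (3 − 1 = 2 between groups, 333 − 3 = 330 within groups). The minimal Python sketch below shows how a one-way ANOVA F-ratio of this kind is computed. The function name `one_way_anova` and the toy values are illustrative assumptions, not the authors' analysis code or stimulus data.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]

    # Between-group sum of squares: group size times squared deviation of
    # each group mean from the grand mean.
    ss_between = sum(len(g) * (gm - grand_mean) ** 2
                     for g, gm in zip(groups, group_means))
    # Within-group sum of squares: squared deviation of each score from its
    # own group mean.
    ss_within = sum((v - gm) ** 2
                    for g, gm in zip(groups, group_means) for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Toy example (hypothetical scores, three small groups).
f_ratio, df_b, df_w = one_way_anova([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(f_ratio, df_b, df_w)
```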
Catch trials were randomly interspersed with experimental trials. Target catch trials had an animal name as the target, ensuring the participant had a purpose for attending to the stimuli (i.e., to spot the animal names). Prime catch trials had an animal name as the prime with the purpose of monitoring the visibility of the primes.
Participants were asked to rapidly and silently read target words and to press a button only if they spotted an animal name. Participants could be heard over an intercom at all times, ensuring they were not reading words aloud. The experiment consisted of 373 trials (including 40 catch trials) of 1890 ms separated by a fixation cross with duration randomly jittered between 1200 and 2200 ms. Each trial comprised the following: 300 ms blank screen, 500 ms forward mask “#####,” 66.7 ms lowercase prime, 16.7 ms backward mask “#####,” 300 ms uppercase target word, and 500 ms blank screen.
Stimuli were back-projected (60 Hz vertical refresh) as light gray words and symbols (Arial Monospace 24 pt) on a dark gray background using Presentation v12.0 (Neurobehavioural Systems). At a viewing distance of ∼75 cm stimuli subtended ∼1° vertically and ∼5° horizontally. Each participant saw each of the 111 target words three times, once for each priming condition (PSEUD, ORTH, and UNREL), making a total of 333 trials. A pseudorandom blocked design ensured that each participant saw a unique overall target word presentation order, and across six participants, prime–target relationships were counterbalanced.
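The prime and backward-mask durations in the trial sequence are whole multiples of the frame period of a 60 Hz display. The short Python sketch below converts each stated duration into a frame count. The event labels and the helper `ms_to_frames` are hypothetical names used only for illustration; this is not the stimulus-presentation script.

```python
REFRESH_HZ = 60  # vertical refresh rate stated for the projector


def ms_to_frames(duration_ms, refresh_hz=REFRESH_HZ):
    """Nearest whole number of display frames for a duration in ms."""
    return round(duration_ms * refresh_hz / 1000)


# Trial events and durations (ms) as listed in the trial description.
trial_events = [
    ("blank", 300),
    ("forward_mask", 500),
    ("prime", 66.7),
    ("backward_mask", 16.7),
    ("target", 300),
    ("blank", 500),
]

for name, duration_ms in trial_events:
    print(f"{name}: {duration_ms} ms = {ms_to_frames(duration_ms)} frame(s)")
```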
MEG data were collected continuously using a 4D Neuroimaging Magnes 3600 Whole Head, 248 channel system, with the magnetometers arranged in a helmet shaped array. Data were sampled at a rate of 678.17 Hz (200 Hz anti-alias filter). Head shape and head coil position were recorded with a 3-D digitizer (Polhemus Fastrak), and used for coregistration (Kozinska et al., 2001) with a high-resolution T1-weighted anatomical volume reconstructed to 1 mm isotropic resolution, acquired using GE 3.0T Signa Excite HDx.
Source-space analysis: beamforming.
Neural sources of activity were reconstructed with an in-house modified type I vectorized linearly constrained minimum-variance beamformer (Van Veen et al., 1997; Huang et al., 2004). In a beamforming analysis, the neuronal signal at a location of interest in the brain is constructed as the weighted sum of the signals recorded by the MEG sensors, the sensor weights computed for each location forming three spatial filters, one for each orthogonal current direction. The beamformer weights are determined by an optimization algorithm so that the signal from a location of interest contributes maximally to the beamformer output, whereas the signal from other locations is suppressed. For a whole-brain analysis, a cubic lattice of spatial filters is defined within the brain (here 5 mm spacing), and an independent set of weights is computed for each of them. The outputs of the three spatial filters at each location in the brain are then summed to generate the total power at each so-called “virtual electrode” (VE) over a given temporal window and within a given frequency band.
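To make the weighted-sum idea concrete, the following Python sketch computes spatial-filter weights for a single grid location using the generic unit-gain LCMV formulation of Van Veen et al. (1997), W = (LᵀC⁻¹L)⁻¹LᵀC⁻¹, where L is the lead field for three orthogonal current directions and C is the sensor covariance. This is an assumption-laden illustration and not the in-house type I vectorized beamformer used here, whose implementation details are not given in the text. The function names `lcmv_weights` and `source_power` and the optional `reg` regularization parameter are hypothetical.

```python
import numpy as np


def lcmv_weights(leadfield, cov, reg=0.0):
    """Unit-gain LCMV weights for one source location.

    leadfield : (n_sensors, 3) forward field for three orthogonal directions
    cov       : (n_sensors, n_sensors) sensor covariance matrix
    reg       : optional diagonal loading, as a fraction of mean sensor variance
    Returns a (3, n_sensors) weight matrix, one spatial filter per direction.
    """
    n_sensors = cov.shape[0]
    cov_reg = cov + reg * np.trace(cov) / n_sensors * np.eye(n_sensors)
    cinv_l = np.linalg.solve(cov_reg, leadfield)   # C^-1 L
    gram = leadfield.T @ cinv_l                    # L^T C^-1 L  (3 x 3)
    # cov_reg is symmetric, so (C^-1 L)^T equals L^T C^-1.
    return np.linalg.solve(gram, cinv_l.T)


def source_power(weights, cov):
    """Total power at the virtual electrode, summed over the three filters."""
    return float(np.trace(weights @ cov @ weights.T))


# Synthetic example: 20 hypothetical sensors, random lead field and data.
rng = np.random.default_rng(0)
leadfield = rng.standard_normal((20, 3))
data = rng.standard_normal((20, 500))
cov = data @ data.T / data.shape[1]
weights = lcmv_weights(leadfield, cov)
print(source_power(weights, cov))
```

The unit-gain constraint means that `weights @ leadfield` equals the 3 × 3 identity matrix, so activity at the target location passes through unchanged while the output variance contributed by other locations is minimized.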
The localization accuracy of spatial filtering approaches to source analysis has been found to be superior to that of alternative MEG analysis techniques such as minimum norm (Sekihara et al., 2005). However, the accuracy of spatial filtering approaches can be affected by several factors, including the length of the analysis window, signal-to-noise level, and the signal bandwidth (Brookes et al., 2008). Simulation studies have suggested that type I spatial filters maintain localization accuracy at adverse signal-to-noise ratios and are not prone to produce “phantom” sources of activity (Huang et al., 2004).
The main limitation of MEG is the difficulty in detecting and localizing deep sources. However, Hillebrand and Barnes (2002) have demonstrated ∼90% detection rate for MEG signals in IFGpo/PCG, middle occipital gyrus (MOG), and indeed most of the cortical network involved in reading which is the concern of the current study. An exception to this is the medial portion of the middle and anterior fusiform gyrus, where detection probability reduces to ∼50%. In addition there is a theoretical restriction in resolving perfectly temporally correlated sources (Van Veen et al., 1997). However, perfect correlation between distinct sources is unlikely and beamforming has been shown to resolve even highly temporally correlated sources (Huang et al., 2004).
A major advantage of beamformer analysis relative to alternative source localization techniques, such as equivalent current dipole modeling or minimum norm estimation, is the ability to image changes in cortical oscillatory power that do not give rise to a strong signal in the evoked-average response. Evoked signal components tend to have a stereotypical wave shape that is phase locked to the onset of the stimulus, so that they can be revealed both by the evoked average in the time domain and by frequency domain analyses. In contrast, induced components are changes in oscillatory activity that may occur within a predictable time window following stimulus onset but lack sufficient phase locking to be revealed by averaging in the time domain. They are, however, revealed by changes in power in the frequency domain.
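The distinction between evoked and induced activity can be illustrated with a small simulation. In the Python sketch below, a 35 Hz oscillation is either phase locked across trials or given a random phase on every trial. The simulated sampling rate (1 kHz), the trial count, and the function names are assumptions chosen for illustration; no recorded data are involved.

```python
import numpy as np


def simulate_trials(n_trials, phase_jitter, freq_hz=35.0, seed=0):
    """Sinusoidal trials (n_trials x 200 samples, 200 ms at 1 kHz).

    phase_jitter is the half-width (radians) of a uniform phase distribution:
    0 gives perfectly phase-locked trials, pi gives fully random phases.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(200) / 1000.0
    phases = rng.uniform(-phase_jitter, phase_jitter, size=(n_trials, 1))
    return np.sin(2 * np.pi * freq_hz * t + phases)


def evoked_and_total_power(trials):
    """Power of the trial-averaged waveform and mean single-trial power."""
    evoked = trials.mean(axis=0)
    evoked_power = float(np.mean(evoked ** 2))  # survives time-domain averaging
    total_power = float(np.mean(trials ** 2))   # computed before averaging
    return evoked_power, total_power


locked = evoked_and_total_power(simulate_trials(200, phase_jitter=0.0))
jittered = evoked_and_total_power(simulate_trials(200, phase_jitter=np.pi))
print("phase locked (evoked, total):", locked)
print("random phase (evoked, total):", jittered)
```

With random phases the averaged waveform nearly cancels, while single-trial power is unchanged, which is why induced effects require power-based measures.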
Source-space analysis: statistics.
After acquisition, the MEG data were segmented into epochs running from 900 ms before target onset to 800 ms after. Epochs containing artifacts, such as blinks, articulatory movements, swallows, and other movements, were rejected.
Previous MEG studies of visual word recognition have revealed a complex spread of activation across the cortex with time (Tarkiainen et al., 1999; Pammer et al., 2004; Cornelissen et al., 2009). The earliest components of this pattern occur in occipital, occipitotemporal, and prefrontal cortex ∼100–150 ms after stimulus. Therefore, as a compromise between being able to reveal this temporal pattern across the whole brain and being able to resolve oscillatory activity as low as 5–10 Hz, we conducted beamforming analyses for 200-ms-long windows.
At the first, within-subject level of statistical analysis, we computed a paired-sample t-statistic for each point in the VE grid. To do this, we compared the mean difference in oscillatory power (averaged across epochs) in four frequency bands (5–15 Hz, 15–25 Hz, 25–35 Hz, and 35–50 Hz) between a 200 ms passive window (i.e., −790 to −590 ms relative to target onset), which was shared between all conditions, and two active time windows (0–200 ms and 200–400 ms following target onset). This procedure generates separate t-maps for each participant, for each contrast, at each of the frequency band/time window combinations. Individual participants' t-maps were then transformed into the standardized space defined by the Montreal Neurological Institute (MNI).
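For a single virtual electrode, frequency band, and time window, the within-subject statistic amounts to a paired t-test on per-epoch power in the active and passive windows. A minimal Python sketch follows, assuming one power value per epoch per window; the function name `paired_t` and the example values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(active_power, passive_power):
    """Paired-sample t-statistic for active minus passive power across epochs."""
    diffs = [a - p for a, p in zip(active_power, passive_power)]
    standard_error = stdev(diffs) / sqrt(len(diffs))
    return mean(diffs) / standard_error


# Hypothetical band power for four epochs at one virtual electrode.
active = [2.0, 4.0, 6.0, 8.0]
passive = [1.0, 2.0, 3.0, 4.0]
print(paired_t(active, passive))
```

Repeating this computation at every grid point yields one t-map per participant, contrast, frequency band, and time window.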
At the second, group level of statistical analysis, we used a multistep procedure (Holmes et al., 1996) to compute the permutation distribution of the maximal statistic (by relabeling experimental conditions), in our case the largest mean t-value (averaging across participants) from the population of VEs in standard MNI space (Nichols and Holmes, 2004). For a single VE, the null hypothesis asserts that the t-distribution would have been the same whatever the labeling of experimental conditions. At the group level, for whole-brain images, we rejected the omnibus hypothesis (that all the VE hypotheses are true) at level α = 0.05 if the maximal statistic for the actual labeling of the experiment was in the top 100α% of the permutation distribution for the maximal statistic. This critical value is the (c + 1)th largest member of the permutation distribution, where c = ⌊αN⌋ (i.e., αN rounded down). This test has been formally shown to have strong control over experiment-wise type I error (Holmes et al., 1996).
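The maximal-statistic procedure can be sketched as follows. The Python example assumes that relabeling the two conditions of a paired contrast is equivalent to flipping the sign of a participant's whole t-map, and that the permutation distribution is sampled at random rather than enumerated in full. Both are assumptions made for illustration, as are the function name `max_stat_threshold` and the toy t-maps.

```python
import random


def max_stat_threshold(tmaps, n_perm=1000, alpha=0.05, seed=0):
    """Observed maximal mean t and its permutation critical value.

    tmaps: list of per-participant t-maps, each a list with one t per VE.
    """
    rng = random.Random(seed)
    n_subj = len(tmaps)
    n_ve = len(tmaps[0])

    def max_mean_t(signs):
        # Largest across-participant mean t over all virtual electrodes.
        return max(
            sum(s * tmap[v] for s, tmap in zip(signs, tmaps)) / n_subj
            for v in range(n_ve)
        )

    observed = max_mean_t([1] * n_subj)
    null = [observed]  # the actual labeling is one member of the distribution
    for _ in range(n_perm - 1):
        signs = [rng.choice((-1, 1)) for _ in range(n_subj)]
        null.append(max_mean_t(signs))

    null.sort(reverse=True)
    c = int(alpha * len(null))   # alpha * N rounded down
    critical = null[c]           # the (c + 1)th largest member
    return observed, critical


# Toy data: 10 participants, 2 VEs; VE 0 carries a consistent effect.
toy_tmaps = [[5.0, 1.0 if i % 2 == 0 else -1.0] for i in range(10)]
observed, critical = max_stat_threshold(toy_tmaps)
print(observed, critical, observed > critical)
```

Because the threshold is derived from the distribution of the maximum over all VEs, any VE whose mean t exceeds it is significant with experiment-wise error controlled at α.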
At specified regions of interest (ROIs), we wanted to compare the evoked and induced frequency components between experimental conditions, retaining millisecond temporal resolution. We selected ROIs based on peaks in the group-level analyses, and used separate beamformers to reconstruct the time series at these sites. We used Stockwell transforms (Stockwell et al., 1996) to compute time–frequency plots for each participant for each condition, and used generalized linear mixed models (GLMMs) to compare these at the group level. The GLMMs included repeated-measures factors to account for the fact that each participant's time–frequency plot is made up of multiple time–frequency tiles. Time–frequency (spatial) variability was integrated into the models by specifying a spatial correlation model for the model residuals (Littell et al., 2006).
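For readers unfamiliar with the Stockwell transform, the sketch below gives a compact frequency-domain implementation of the discrete transform described by Stockwell et al. (1996): each frequency row is the inverse FFT of the shifted signal spectrum multiplied by a Gaussian whose width scales with frequency. It is an illustrative NumPy version, not the code used for the ROI analyses; the function name `stockwell` and the test signal are assumptions.

```python
import numpy as np


def stockwell(signal):
    """Discrete Stockwell transform of a real 1-D signal.

    Returns a complex array of shape (n_samples // 2 + 1, n_samples).
    Row n corresponds to frequency n * fs / n_samples for sampling rate fs.
    """
    x = np.asarray(signal, dtype=float)
    n_samples = x.size
    spectrum = np.fft.fft(x)
    # Signed frequency offsets in FFT order: 0, 1, ..., -2, -1.
    m = np.fft.fftfreq(n_samples) * n_samples
    n_freqs = n_samples // 2 + 1

    out = np.empty((n_freqs, n_samples), dtype=complex)
    out[0] = x.mean()  # the zero-frequency row is the signal mean
    for n in range(1, n_freqs):
        gaussian = np.exp(-2.0 * np.pi ** 2 * m ** 2 / n ** 2)
        shifted = np.roll(spectrum, -n)  # shifted[k] = spectrum[k + n]
        out[n] = np.fft.ifft(shifted * gaussian)
    return out


# Test signal: 10 full cycles of a cosine across 128 samples.
samples = np.arange(128)
tf_plot = stockwell(np.cos(2 * np.pi * 10 * samples / 128))
print(np.abs(tf_plot).mean(axis=1).argmax())
```

The magnitude of each row gives the time-resolved amplitude at that frequency, which is the quantity compared between conditions in a time–frequency plot.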
To verify our task design and stimuli, 18 participants (none of whom subsequently participated in MEG) read aloud target words primed by the PSEUD [mean vocal reaction time (VRT) 419.1 ms, SEM 21.9 ms], ORTH (mean VRT 439.8 ms, SEM 21.1 ms), and UNREL (mean VRT 467.5 ms, SEM 19.3 ms) conditions, thus confirming a 21 ms pseudohomophone priming effect. A repeated-measures ANOVA with post hoc comparisons revealed significant differences between PSEUD and ORTH (t(1,34) = 6.13, p < 0.0001), PSEUD and UNREL (t(1,34) = 14.35, p < 0.0001), and ORTH and UNREL (t(1,34) = 8.23, p < 0.0001).
In MEG, participants were very poor at correctly identifying animal words in the prime position (mean d′ = 0.40, SD 0.77), indicating appropriately low awareness of primes. In comparison, participants correctly identified animal words in the target position with a mean d′ = 3.55 (SD 0.54), indicating participants were successfully attending to the task.
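The sensitivity index d′ is the difference between the z-transformed hit rate and false-alarm rate. A minimal Python sketch is given below. The log-linear correction (adding 0.5 to each count) is an assumption introduced here to avoid infinite z-scores; the text does not state whether or how extreme rates were corrected, and the counts in the example are hypothetical.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate)."""
    # Log-linear correction (an assumption, see above) keeps rates off 0 and 1.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)


# Hypothetical counts: chance-level detection versus good detection.
print(d_prime(10, 10, 10, 10))
print(d_prime(18, 2, 1, 19))
```

Values near zero indicate that animal names could not be distinguished from other items, as for the primes, whereas larger values indicate reliable detection, as for the targets.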
Figure 1a illustrates three-dimensional rendered images for a representative condition (PSEUD), thresholded at p < 0.05 (corrected). During the first 200 ms following target onset, in all three conditions cortical activity was centered on left IFGpo, PCG, and left and right MOGs. However, the inherent uncertainty in spatial localization of MEG beamforming analysis prevented us from clearly distinguishing the extent to which activity was localized in either IFGpo or PCG alone or whether there was functionally distinct activity in both areas. Therefore, henceforth we label this cluster of activation as IFGpo/PCG. Activation during this time window also extended inferiorly toward left and right mid-fusiform gyri, and superiorly toward right superior parietal lobule.
Figure 1b shows substantial overlap in IFGpo/PCG activity for all three conditions in the first 200 ms. During the 200–400 ms following target onset, all conditions activated additional reading-related regions, including anterior middle temporal gyrus, left posterior middle temporal gyrus, angular and supramarginal gyri, and left superior temporal gyrus.
We performed region of interest analyses on the IFGpo/PCG (centered on MNI coordinate: −56, 4, 18) and the left and right MOG (centered on MNI coordinates: −26, −96, 8 and 24, −98, 10, respectively) sites visible in the 0–200 ms window (Fig. 1a), to compare the strength of responses between conditions. Figure 1c shows the results of group-level comparisons of time–frequency plots for the critical comparison between PSEUD and ORTH (these statistical contours are based on the estimated marginal means derived from the model parameters, and the predicted population margins were compared using tests for simple effects by partitioning the interaction effects). It demonstrates that shared phonology between prime and target, over and above shared orthography, results in significantly greater induced 30–40 Hz activity (blue–aqua scale) at IFGpo/PCG ∼100 ms after stimulus presentation. No such differences were found in MOG.
Within 100 ms of target word onset, we observed stronger responses to pseudohomophone priming than to orthographic priming of visually presented words in a cluster that includes left IFGpo and/or PCG. These findings therefore demonstrate an early neurobiological response to phonological priming during visual word recognition within a time frame that is consistent with behavioral studies and the ERP result of Ashby et al. (2009). Furthermore, these data provide additional confirmation of the early activation of IFGpo/PCG in response to visually presented words as reported by Pammer et al. (2004) and Cornelissen et al. (2009).
Involvement of left posterior IFG is not unique to language and visual word recognition tasks. For example, there is fMRI evidence for a role during motor imitation (Buccino et al., 2004) and in cognitive control (Snyder et al., 2007). However, the early difference we observed between the PSEUD and ORTH conditions was obtained from an event-related paradigm in which the task, silent reading, was identical for all trials, and where participants could not predict the nature of the upcoming prime, or even detect it reliably. Therefore, we argue it is most parsimonious to attribute our findings to a stimulus-driven differential effect in phonological priming, rather than top-down alternatives such as cognitive control. Indeed, we interpret the early engagement of MOG and IFGpo/PCG as reflecting prelexical orthographic–phonological mapping between these regions. Several lines of evidence are consistent with this idea. First, abstract representations of letters/letter clusters are available in MOG as early as ∼100 ms after a printed word is presented (Tarkiainen et al., 1999; Hauk et al., 2006), thus providing the necessary orthographic component of the mapping within an appropriate timeframe. Second, white matter fiber tract connections between the inferior and middle temporal cortices, MOG and IFGpo/PCG may be carried by the superior longitudinal fasciculus (Wakana et al., 2004; Bernal and Altman, 2010), and these could provide the necessary anatomical connectivity. Third, MEG evidence of functional connectivity between MOG and IFGpo/PCG from reading tasks indicates that nodes in the left occipitotemporal cortex can cause the activity observed in prefrontal nodes of the reading network (Kujala et al., 2007). Fourth, early activation of IFGpo/PCG should be observed for pronounceable letter strings not only for silent reading tasks as demonstrated in the current study (Fig. 1b) but also for visual lexical decision and for passive viewing of words, and this has been shown by Pammer et al. (2004) and Cornelissen et al. (2009), respectively.
The difference in induced oscillatory responses between pseudohomophone priming and orthographic priming at left IFGpo/PCG occurred within 100 ms of target onset, whereas differences in the evoked activity were not apparent in IFGpo until ∼150–200 ms. This may provide an explanation for the failure of most EEG studies to identify such an early neurophysiological signature for fast phonological priming. Because analyses of EEG data are often restricted to the evoked average signal, only the brain responses that are phase locked to target onset are routinely observed. Ashby et al. (2009), however, used short three-letter stimuli, which may have aligned the phases of the cortical responses to individual trials sufficiently to reveal a significant effect of phonological consistency in their evoked averaged analysis, analogous to the recognition point for spoken stimuli.
Although the inferior frontal gyrus is implicated in many functions, direct recording in surgical patients (Greenlee et al., 2004) and fMRI studies (Brown et al., 2008) indicate that IFGpo/PCG in particular is strongly associated with motor control of speech articulators. Further evidence that this region is associated with speech production codes comes from Pulvermüller et al. (2006) who found that when individuals listened to speech sounds, somatotopic representations of articulatory features were activated in PCG which were spatially consistent with the motor representations required for generating those same speech sounds. Finally, activation of this speech-motor region is consistent with findings from behavioral studies suggesting that the phonology accessed in visual word recognition is sensitive to articulatory characteristics of the words (Abramson and Goldinger, 1997; Lukatela et al., 2004). In conclusion therefore, the early involvement of IFGpo/PCG in pseudohomophone priming supports a role for these sites in prelexical access to phonological information during visual word recognition. Moreover, these findings suggest that early word recognition may be achieved by a direct print-to-speech mapping mediated by a speech production code.
The contribution of S.J.F. was supported by National Institute of Child Health and Human Development Grant P01-HD-01994 to Haskins Laboratories. We are grateful to Andy Ellis of York University and Michael Simpson of York Neuroimaging Centre, who provided advice and guidance at various stages of this project regarding the development of the experimental protocols, MEG data acquisition, and analysis. We thank Vesa Kiviniemi (Department of Statistics, University of Kuopio, Kuopio, Finland) for advice on the statistical analyses of the time–frequency plots and Jane Ashby of Central Michigan University for comments on drafts of the manuscript.
Correspondence should be addressed to Dr. Piers L. Cornelissen, Department of Psychology, University of York, York YO10 5DD, UK.