Abstract
Speech production relies on fine voluntary motor control of respiration, phonation, and articulation. The cortical initiation of complex sequences of coordinated movements is thought to result in parallel outputs: one directed toward motor neurons and the other, the “efference copy,” projecting to auditory and somatosensory fields. It is proposed that the efference copy encodes the expected sensory consequences of speech and is compared with the actual postarticulatory sensory feedback. Previous functional neuroimaging evidence has indicated that the cortical target for the merging of feedforward motor and feedback sensory signals is left-lateralized and lies at the junction of the supratemporal plane with the parietal operculum, located mainly in the posterior half of the planum temporale (PT). The design of these studies required participants to imagine speaking or generating nonverbal vocalizations in response to external stimuli. The resulting assumption is that verbal and nonverbal vocal motor imagery activates neural systems that integrate the sensory-motor consequences of speech, even in the absence of primary motor cortical activity or sensory feedback. The present human functional magnetic resonance imaging study used univariate and multivariate analyses to investigate both overt and covert (internally generated) propositional and nonpropositional speech (noun definition and counting, respectively). Activity in response to overt, but not covert, speech was present in bilateral anterior PT, with no increased activity observed in posterior PT or parietal opercula for either speech type. On this evidence, the responses of the left and right anterior PTs better fulfill the criteria for sensory target and state maps during overt speech production.
Introduction
Speech, although performed effortlessly from childhood onwards, is a highly complex sensory-motor skill. It depends on habitual (that is, automatically executed) sequences of coordinated movements of the respiratory, laryngeal, and articulatory muscles. Using functional magnetic resonance imaging (fMRI) to investigate speech production, from syllable repetition to more natural connected speech, has proved to be a challenge due to artifacts generated by overt articulation. To avoid these artifacts, much fMRI research on speech production has used covert rather than overt speech. A series of studies by Hickok and colleagues proposed a region for auditory-motor integration (the “sensorimotor interface”), located within the posterior planum temporale (PT) and extending up into the parietal operculum (PO; Hickok et al., 2003, 2009; Pa and Hickok, 2008; Buchsbaum et al., 2011). This small region of cortex has been termed “area Spt,” and it is proposed that it is strongly left-lateralized. It is considered to be the homolog of part of nonhuman primate polysensory area Tpt (Pandya and Sanides, 1973). The assumption has been that area Spt is active in the absence of actual motor or sensory activity, perhaps as the result of pre-articulatory feedforward signals from premotor areas to sensory regions that are tuned to respond to postarticulatory sensory feedback. However, there are obvious objections to inferring sensory-motor neural networks controlling speech from studies that do not involve actual vocal production.
Previous studies have directly compared overt and covert speech, but with conflicting results in terms of the networks involved (Yetkin et al., 1995; Barch et al., 1999; Palmer et al., 2001; Huang et al., 2002; Shuster and Lemieux, 2005). These inconsistent results may be partly explained by the very small numbers of participants included in these studies (≤10). In addition, these studies used continuous image acquisition rather than sparse sampling, even for overt speech production, which may result in apparent task-related brain activity that is, in reality, movement artifact (Kemeny et al., 2005).
The specific tasks involved may also explain the discrepancy across these previous studies; for example, the Stroop test (Barch et al., 1999), word-stem completion (Rosen et al., 2000; Palmer et al., 2001), and word generation following a letter cue (Yetkin et al., 1995; Lurito et al., 2000) emphasize domain-general cognitive functions, such as selective attention and inhibition of prepotent responses, rather than speech production itself. Further, these tasks required participants to produce only single words, which again does not fully engage the speech production system.
The present study compared activity in response to both overt and covert sentence-level speech production. This allowed us to identify the systems common to both and to reveal functional components that are absent when speaking covertly. Two different types of speech task were included: propositional (noun definition) and nonpropositional (counting). The specific aim of this study was to determine, by investigating activity within the left temporoparietal junction, whether intention (covert speech) activates a “sensorimotor interface” in the absence of actual motor and sensory activity.
Materials and Methods
Participants.
Seventeen right-handed native speakers of English (eight female) took part in this study, with an age range of 21 years, 10 months, to 61 years, 3 months (mean, 28 years, 3 months). The study was approved by the local research ethics committee and all participants gave informed written consent.
Experimental conditions.
There were four language tasks (propositional and nonpropositional speech with two response types, overt and covert speech) and a rest baseline (referred to as Overt Noun Definition, Overt Counting, Covert Noun Definition, Covert Counting, and Rest). The propositional speech tasks required participants to describe nouns, which were selected using the Medical Research Council psycholinguistic database (Wilson, 1988). All had high values for concreteness and imageability, although frequency was variable. Fifty nouns were selected from the database and then randomly assigned to either the overt or covert speaking conditions. There were no significant differences in mean concreteness, imageability, or frequency between the words assigned to the overt condition and those assigned to the covert condition. The list of nouns and their psycholinguistic values is presented in Table 1. Nonpropositional speech was tested with a counting task, in which participants counted upward from one for the duration of the trial at a rate of ∼1 per second. Stimuli were displayed on an MRI-compatible screen for 7.5 s, and participants were instructed to start speaking as soon as the stimuli appeared on the screen. The end of the task was indicated by a fixation cross. All tasks were preceded by an image that indicated whether the following task was to be performed overtly or covertly. The rest condition consisted of a series of X's displayed on the screen, and no response from the participant was required. There were 25 trials of each speech condition and 20 rest trials, presented in a pseudorandomized order.
Stimuli list with values for linguistic variables
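The trial counts above fit the single functional run exactly: 4 × 25 speech trials plus 20 Rest trials give 120 trials, matching the 120 volumes acquired at a 10 s TR (a 20 min run), assuming one trial per TR. The Python sketch below builds one such pseudorandomized order. The rule that no condition repeats more than three times in a row is a hypothetical constraint chosen for illustration; the actual pseudorandomization rule is not reported.

```python
import random
from collections import Counter

TR_S = 10.0  # one trial per 10 s TR (assumed from 120 trials and 120 volumes)

# Trial counts taken from the text; condition labels are plain strings.
TRIAL_COUNTS = {
    "Overt Noun Definition": 25,
    "Overt Counting": 25,
    "Covert Noun Definition": 25,
    "Covert Counting": 25,
    "Rest": 20,
}


def longest_run(seq):
    """Length of the longest stretch of identical consecutive items."""
    best = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best


def make_schedule(seed=0, max_run=3, max_attempts=10_000):
    """Shuffle the trials until no condition occurs more than max_run times in a row.

    max_run is a hypothetical pseudorandomization rule, not one reported in the study.
    Returns (onset_s, condition) pairs with one trial per TR.
    """
    trials = [name for name, n in TRIAL_COUNTS.items() for _ in range(n)]
    rng = random.Random(seed)
    for _ in range(max_attempts):
        rng.shuffle(trials)
        if longest_run(trials) <= max_run:
            return [(i * TR_S, cond) for i, cond in enumerate(trials)]
    raise RuntimeError("no order satisfied the run-length rule")


schedule = make_schedule(seed=1)
print(len(schedule), "trials;", len(schedule) * TR_S / 60, "min")  # 120 trials; 20.0 min
print(Counter(cond for _, cond in schedule))
```

Rejection sampling keeps the sketch short; with five conditions and a three-in-a-row limit, a valid order is usually found within a few shuffles.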
Online speech recordings.
Audio recordings were taken of the participants' online speech to ensure that the participants responded appropriately with regard to overt speech and to ensure that they had not spoken overtly during a covert trial. Rare trials in which the participants failed to respond during an overt trial, or spoke overtly when the task should have been performed covertly, were excluded from analyses.
Data acquisition.
MRI data were obtained using a Philips Intera 3.0 tesla scanner, using dual gradients, a phased-array head coil, and sensitivity encoding with an undersampling factor of 2. Sparse acquisition was performed to minimize movement-related and respiratory-related artifact associated with speech studies (Hall et al., 1999), as well as to minimize auditory masking. Functional MR images were obtained using a T2*-weighted, gradient-echo, echoplanar imaging sequence with whole-brain coverage (TR, 10 s; acquisition time, 2 s, giving 8 s for the participants to speak during silence; echo time, 30 ms; flip angle, 90°). Thirty-two axial slices with a slice thickness of 3.25 mm and an interslice gap of 0.75 mm were acquired in ascending order (resolution, 2.19 × 2.19 × 4.00 mm; field of view, 280 × 224 × 128 mm). There was one run of 120 volumes. Quadratic shim gradients were used to correct for magnetic field inhomogeneities within the brain. T1-weighted images were also acquired for structural reference. Stimuli were presented visually using Matlab Psychophysics toolbox (Psychtoolbox-3; www.psychtoolbox.org) run on an IFIS-SA system (In Vivo). Sounds were delivered through MR-compatible headphones and speech was recorded using a fiber-optic noise-cancelling microphone.
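The reported acquisition parameters are internally consistent, and the short calculation below makes this explicit: an 8 s silent gap per TR, a 4 mm slice pitch (3.25 mm slice plus 0.75 mm gap) whose 32 slices span the 128 mm field of view in z, and a 20 min run. The in-plane matrix it prints is only what the field of view and the 2.19 mm in-plane resolution imply; it is an approximation, not a reported value.

```python
TR_S = 10.0          # repetition time
TA_S = 2.0           # acquisition time per volume
N_VOLUMES = 120      # one run
N_SLICES = 32
SLICE_MM = 3.25      # slice thickness
GAP_MM = 0.75        # interslice gap
FOV_MM = (280.0, 224.0, 128.0)
INPLANE_MM = 2.19    # reported in-plane resolution

silent_gap_s = TR_S - TA_S                   # time available to speak in silence
slice_pitch_mm = SLICE_MM + GAP_MM           # centre-to-centre slice distance
coverage_mm = N_SLICES * slice_pitch_mm      # superior-inferior extent of the slab
# Implied in-plane matrix (approximate; not reported in the text).
approx_matrix = tuple(round(f / INPLANE_MM) for f in FOV_MM[:2])
run_min = N_VOLUMES * TR_S / 60

print(f"silent gap {silent_gap_s} s, slice pitch {slice_pitch_mm} mm, "
      f"coverage {coverage_mm} mm, ~{approx_matrix} matrix, run {run_min} min")
```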
fMRI data whole-brain analysis.
fMRI data were analyzed using FEAT (FMRI Expert Analysis Tool) version 5.98, part of FSL (FMRIB Software Library). Preprocessing included motion correction using MCFLIRT (Jenkinson et al., 2002), nonbrain removal using BET (Brain Extraction Tool; Smith, 2002), spatial smoothing using a Gaussian kernel of 5 mm full width at half maximum, grand-mean intensity normalization of the entire four-dimensional (4-D) dataset by a single multiplicative factor, and high-pass temporal filtering (Gaussian-weighted least-squares straight line fitting, with σ = 50.0 s). A simple model design was used, applying the default FSL settings: each trial occupied a single TR and was modeled as a single response, assuming that the event occurred in the middle of the TR. This makes fewer assumptions about how long participants spoke, which is appropriate in the present study because there is no measure of the duration of covert speech. We also used a full model approach, which makes more assumptions about speech timings and was based on the average timings of overt speech production. Both approaches gave similar results, and here we present data from the simple model design. The BOLD response was modeled using a double-gamma hemodynamic response function (Glover, 1999). In addition, motion outliers were included in the model as additional confound variables. Time-series statistical analysis was performed using FILM (FMRIB's Improved Linear Modeling) with local autocorrelation correction (Woolrich et al., 2001). Z (Gaussianized T/F) statistic images were thresholded using clusters determined by Z > 2.3 and a corrected cluster significance threshold of p < 0.05. Registration to high-resolution structural and standard space images was performed using FLIRT (FMRIB's Linear Image Registration Tool; Jenkinson and Smith, 2001; Jenkinson et al., 2002). Higher-level group analysis was performed using FLAME (FMRIB's Local Analysis of Mixed Effects) stage 1 (Beckmann et al., 2003; Woolrich et al., 2004).
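To illustrate the simple model design, the sketch below builds a double-gamma HRF and a regressor for a sparse design in which each event is a single impulse in the middle of its TR, sampled once per volume. The gamma shape parameters (6 and 16) and the undershoot ratio (1/6) are common canonical defaults assumed here, not values read from FEAT. The assumption that each volume is sampled 9 s into its TR (the midpoint of a 2 s acquisition at the end of the 10 s TR) is also hypothetical.

```python
import math
import numpy as np


def gamma_pdf(t, shape, scale=1.0):
    """Gamma density evaluated elementwise; zero for t <= 0."""
    t = np.asarray(t, dtype=float)
    out = np.zeros_like(t)
    pos = t > 0
    out[pos] = (t[pos] ** (shape - 1) * np.exp(-t[pos] / scale)
                / (math.gamma(shape) * scale ** shape))
    return out


def double_gamma_hrf(t, shape1=6.0, shape2=16.0, ratio=1.0 / 6.0):
    """Positive gamma response minus a scaled, later gamma undershoot.

    The shape and ratio values are common canonical defaults, assumed here.
    """
    return gamma_pdf(t, shape1) - ratio * gamma_pdf(t, shape2)


def sparse_regressor(event_trials, n_vols, tr=10.0, dt=0.1, sample_offset=9.0):
    """One impulse per event trial at mid-TR, convolved with the HRF on a fine
    time grid, then sampled once per volume at `sample_offset` s into each TR.

    sample_offset = 9.0 is a hypothetical readout time (middle of a 2 s
    acquisition placed at the end of the TR).
    """
    n_hi = int(round(n_vols * tr / dt))
    sticks = np.zeros(n_hi)
    for i in event_trials:
        sticks[int(round((i * tr + tr / 2) / dt))] = 1.0
    hrf = double_gamma_hrf(np.arange(0.0, 32.0, dt))
    conv = np.convolve(sticks, hrf)[:n_hi]
    idx = [int(round((v * tr + sample_offset) / dt)) for v in range(n_vols)]
    return conv[idx]


reg = sparse_regressor(event_trials=[2], n_vols=6)
print(np.round(reg, 4))
```

In this toy run with a single event in trial 2, the regressor is zero for the earlier volumes, positive for the volume sampled 4 s after the event, and slightly negative for the next volume, which falls in the undershoot.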
The initial whole-brain analysis was a 2 (Production: Overt and Covert) × 2 (Task: Noun Definition and Counting) ANOVA. Subsequent to this, each condition was contrasted to the common baseline condition of Rest. Finally, direct contrasts between the various speech-related conditions were performed.
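The factorial and direct comparisons above can be written as contrast weights over the four speech conditions, with Rest as the implicit baseline. The sketch below shows one such set of weights; the order of the explanatory variables is a hypothetical choice, and a whole-brain ANOVA may in practice be set up with F-tests rather than these directional t-contrasts.

```python
import numpy as np

# Hypothetical EV order for the four speech conditions; Rest is the implicit baseline.
EVS = ["Overt Noun Definition", "Overt Counting",
       "Covert Noun Definition", "Covert Counting"]

CONTRASTS = {
    # 2 x 2 factorial effects
    "Production (Overt > Covert)": [1, 1, -1, -1],
    "Task (Noun Definition > Counting)": [1, -1, 1, -1],
    "Production x Task": [1, -1, -1, 1],
    # each condition against Rest
    **{f"{ev} > Rest": [int(i == j) for j in range(4)] for i, ev in enumerate(EVS)},
    # direct contrasts between speech conditions
    "Overt > Covert (Noun Definition)": [1, 0, -1, 0],
    "Overt > Covert (Counting)": [0, 1, 0, -1],
    "Noun Definition > Counting (Overt)": [1, -1, 0, 0],
    "Noun Definition > Counting (Covert)": [0, 0, 1, -1],
}

contrast_matrix = np.array(list(CONTRASTS.values()))
print(contrast_matrix.shape)  # (11, 4)
```

The three factorial contrasts are mutually orthogonal, so the two main effects and the interaction are estimated independently of one another.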
Region of interest analysis.
Four regions of interest (ROIs) were defined on an individual basis, separately for each hemisphere, using Freesurfer's autosegmentation, based on gyral and sulcal landmarks (detailed below; Fig. 1). Individually defined left and right hemisphere ROIs were used as there are many studies demonstrating hemispheric anatomical asymmetries at the temporoparietal junction, including the PT (Westbury et al., 1999).
ROIs. The masks used for the ROI analyses are shown in both hemispheres on an inflated Freesurfer standard brain image: the fOp (1, dark blue), the PO (2, light blue), the posterior PT (3, orange), and the anterior PT (4, yellow).
Three ROIs were around the temporoparietal junction (anterior PT, posterior PT, and PO) and the fourth was the frontal operculum (fOp). Detailed explanation of the selection of these three ROIs around the temporoparietal junction is given in our earlier paper (Simmonds et al., 2011). The fOp was selected to investigate changes in the premotor responses to different types of speech.
The fOp and the PT were labeled using Freesurfer's automatic parcellation. The anterior and posterior halves of the PT were defined on an average brain surface and then applied to individual brains using Freesurfer. This decision was made in light of the functional heterogeneity of the PT: the anterior half displays a more canonical auditory response, whereas the posterior half has been associated both with auditory-motor integration and with responses to acoustic stimulation produced by the human vocal tract (Pa and Hickok, 2008). In the absence of a defined PO within Freesurfer, we defined the ROI as in our previous work (Simmonds et al., 2011). The cortical surface from each participant's high-resolution T1 scan was reconstructed using Freesurfer (Dale et al., 1999), and the ROIs were then automatically defined on each individual's reconstructed cortical surface. This approach has been shown to be comparable in accuracy to manual labeling of brain regions (Fischl et al., 2002). Mean effect sizes for overt and covert propositional and nonpropositional speech conditions, relative to rest, were calculated for each individual. For this analysis, the functional data were not spatially smoothed before averaging, because the ROIs are small structures with variable anatomy (Da Costa et al., 2011; Dorsaint-Pierre et al., 2006; Westbury et al., 1999).
We performed an additional analysis using a medial and lateral division of the PT, but it added no additional information to that based on the anterior–posterior segmentation (described above) and so we do not report those results here.
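Because the ROIs are defined in each participant's own space and the data are left unsmoothed, the ROI measure reduces to averaging each condition's effect-size map over the voxels of the mask. The sketch below assumes that the maps and a volumetric ROI mask have already been loaded as NumPy arrays of identical shape (for example with a NIfTI reader); loading the images and projecting Freesurfer surface labels into volume space are not shown.

```python
import numpy as np


def roi_mean_effects(effect_maps, roi_mask):
    """Mean of each unsmoothed effect-size map within one subject-specific ROI.

    effect_maps: dict of condition name -> 3-D array in the subject's space.
    roi_mask: boolean (or 0/1) 3-D array of the same shape.
    """
    mask = np.asarray(roi_mask, dtype=bool)
    if not mask.any():
        raise ValueError("ROI mask contains no voxels")
    means = {}
    for cond, img in effect_maps.items():
        img = np.asarray(img, dtype=float)
        if img.shape != mask.shape:
            raise ValueError(f"{cond}: map shape {img.shape} != mask shape {mask.shape}")
        means[cond] = float(img[mask].mean())
    return means


# Toy example: a 4 x 4 x 4 volume in which the ROI covers two voxels.
mask = np.zeros((4, 4, 4), dtype=bool)
mask[1, 1, 1] = mask[1, 1, 2] = True
overt = np.zeros((4, 4, 4))
overt[1, 1, 1], overt[1, 1, 2] = 1.0, 3.0
covert = np.zeros((4, 4, 4))
print(roi_mean_effects({"Overt": overt, "Covert": covert}, mask))
```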
Independent components analysis.
We have previously used independent components analysis (ICA) combined with dual regression on fMRI data collected with sparse sampling (Geranmayeh et al., 2012; Simmonds et al., 2014). This approach is more sensitive than a standard subtraction analysis in that it can reveal task-dependent activity within a component of a distributed network when the net activity within a region is not apparent in a univariate analysis because of the presence of functionally distinct overlapping networks. In the present study, this method was used to identify separate networks originating from anatomically adjacent or overlapping regions within the left PT. The PT was defined using the Harvard-Oxford probabilistic atlas within FSL. A temporal concatenation group ICA (Beckmann et al., 2005), limited to seven components, was then run on the functional data to identify sets of voxels within the PT that covary (Leech et al., 2012; Simmonds et al., 2014). Although these patterns can partially overlap spatially and temporally, each corresponds to a separate functional–anatomical network (Beckmann et al., 2005). The time course for each spatial map was calculated as in our previous studies (Leech et al., 2012; Simmonds et al., 2014), using a multiple regression with the 4-D functional data as the dependent variable and the seven spatial maps from the ICA as the independent variables.
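The time-course step described above is a spatial multiple regression: each volume's voxel pattern within the PT is regressed on all seven spatial maps at once. The NumPy sketch below illustrates this under stated assumptions: the data are already masked and flattened to time × voxels, and each volume and map has its spatial mean removed, which is a common but assumed choice. It is not the FSL implementation.

```python
import numpy as np


def component_timecourses(data, spatial_maps, demean=True):
    """Spatial regression: estimate one time course per ICA component.

    data:         (n_timepoints, n_voxels) 4-D data flattened within the PT mask.
    spatial_maps: (n_components, n_voxels) ICA maps over the same voxels.
    Returns (n_timepoints, n_components). Each volume's voxel pattern is
    regressed on all maps jointly, so each time course accounts for the others.
    """
    Y = np.asarray(data, dtype=float)
    S = np.asarray(spatial_maps, dtype=float)
    if demean:  # remove the spatial mean of each volume and each map (assumed choice)
        Y = Y - Y.mean(axis=1, keepdims=True)
        S = S - S.mean(axis=1, keepdims=True)
    coef, *_ = np.linalg.lstsq(S.T, Y.T, rcond=None)  # (n_components, n_timepoints)
    return coef.T


# Synthetic check: data built exactly from 7 maps and 120 time points.
rng = np.random.default_rng(0)
maps = rng.normal(size=(7, 300))
true_tc = rng.normal(size=(120, 7))
est_tc = component_timecourses(true_tc @ maps, maps)
print(est_tc.shape)  # (120, 7)
```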
Having identified seven distinct components within the PT, we then used a second general linear model to calculate correlations between each of these components and activity across the whole brain. We have previously used this approach (Leech et al., 2011; Simmonds et al., 2014), which is a variant on the dual regression approach (Zuo et al., 2010). This generated statistical maps that provide a whole-brain voxelwise measure of functional connectivity for each of the components while controlling for variance from the other components. Random permutation testing was used to compute nonparametric statistics, corrected for multiple comparisons using familywise error cluster corrections with a nominal t value of 2.3. The function of these separate components was investigated by comparing the individual time courses generated for each subject with the experimental time course (i.e., when the specific overt and covert speech production tasks were carried out). The dependent variable was the individual time course for each component, and the experimental design formed the design matrix of the general linear model. This generated β values quantifying how much each component's time course was modulated by the different task conditions. Subsequent ROI analyses used this measure of BOLD signal change with task. Further details on these methods are reported in our previous work (Simmonds et al., 2014).
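The two further regressions described above can be sketched in the same way: a whole-brain regression of each voxel's time series on all component time courses jointly, and a regression of one component's time course on the task design to obtain a β per condition. The condition-indicator design (one volume per trial, Rest as the implicit baseline) is a simplifying assumption made for illustration, and the permutation-based inference is not shown.

```python
import numpy as np


def connectivity_maps(data, timecourses):
    """Regress every voxel's time series on all component time courses jointly.

    data: (n_timepoints, n_voxels); timecourses: (n_timepoints, n_components).
    Returns (n_components, n_voxels) maps, each controlling for the other components.
    """
    X = np.column_stack([np.ones(len(timecourses)), timecourses])
    betas, *_ = np.linalg.lstsq(X, np.asarray(data, dtype=float), rcond=None)
    return betas[1:]  # drop the intercept row


def task_betas(timecourse, labels, conditions):
    """Regress one component time course on condition indicators.

    Assumes one volume per trial; volumes whose label is not in `conditions`
    (here Rest) form the implicit baseline absorbed by the intercept.
    """
    labels = np.asarray(labels)
    X = np.column_stack([np.ones(len(labels))] +
                        [(labels == c).astype(float) for c in conditions])
    betas, *_ = np.linalg.lstsq(X, np.asarray(timecourse, dtype=float), rcond=None)
    return dict(zip(conditions, betas[1:]))


# Toy example: a component that responds only to the first condition.
conds = ["OND", "OC", "CND", "CC"]  # hypothetical short labels for the four speech tasks
labels = (conds + ["Rest"]) * 24    # 120 volumes
tc = np.where(np.asarray(labels) == "OND", 2.0, 0.0)
print(task_betas(tc, labels, conds))
```

With a Rest baseline, each β is simply that condition's mean minus the Rest mean, which is why the toy component yields a β of 2 for the first condition and 0 for the others.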
Results
Behavioral performance
Technical problems meant that audio recordings were not available for one participant, who was therefore excluded from analysis. The 16 participants with audio recordings generally responded accurately, and over half (nine) had an accuracy rate of 100%. Seven participants made occasional incorrect responses (i.e., treating a covert trial as an overt one, or vice versa) during the speech tasks, and these trials were excluded from the fMRI analyses.
fMRI analysis
Whole-brain ANOVA results
The whole-brain 2 (Production: Overt and Covert) × 2 (Task: Noun Definition and Counting) ANOVA revealed a main effect of Production and a main effect of Task, but no significant interaction. The main effect of Production was observed in medial premotor cortex [the supplementary motor area (SMA)], primary sensorimotor and auditory cortices, thalami, putamen, and paravermal cerebellum (Fig. 2A). The pattern of activity was symmetrically distributed between both cerebral and cerebellar hemispheres.
Main effects of Production and Task. A, The main effect of Production (Overt > Covert) revealed bilateral activity in medial premotor cortex (1), primary sensorimotor cortex (2), primary and association auditory cortex (3), left and right thalamus (4), basal ganglia (5), and paravermal cerebellum (6). B, The main effect of Task (Noun Definition > Counting) revealed activity in pre-SMA, extending in lower planes into anterior cingulate cortex (1); left inferior frontal gyrus (2); paracingulate cortex (3); bilateral thalamus (4); bilateral caudate nucleus (5); left inferolateral temporal cortex (6); bilateral putamen (7); and cerebellum (8). Axial slices are shown in 4 mm decrements; from left to right, the top row shows slices 50 to 22, the middle row shows slices 18 to −10, and the bottom row shows slices −14 to −42. Z-statistic images were thresholded using clusters determined by z > 2.3 and a corrected cluster significance threshold of p < 0.05. Results are displayed on a standard brain template (MNI152) in the neurological convention.
The main effect of Task was observed in left inferior frontal gyrus extending to the superior frontal gyrus, as well as pre-SMA, paracingulate cortex, left inferolateral temporal cortex, bilateral thalami, caudate nuclei and putamen, and right lateral cerebellum (Fig. 2B). What appears to be activity in the left lateral cerebellum is likely to be activity smoothed down from the adjacent inferior temporal cortex.
Individual conditions greater than Rest
Overt Noun Definition produced bilateral activation in the SMA and pre-SMA, extending into anterior cingulate and paracingulate cortices; premotor and primary sensorimotor cortices; posterior supratemporal planes; basal ganglia; thalami; the superior temporal gyri; and the cerebellum (Fig. 3A). In addition, there was left-lateralized activity in the inferior and middle frontal gyri, fOp, and posterior inferior temporal gyrus. Covert Noun Definition returned an activation pattern that appeared similar to, but less extensive and more left-lateralized than, that of Overt Noun Definition (Fig. 3B). There was activity in the pre-SMA, extending into cingulate and paracingulate cortices, left inferior frontal gyrus, left middle temporal gyrus, and right cerebellum. When viewing high-resolution images (not illustrated), we did not observe activity in area Spt during Covert Noun Definition, even at a liberal threshold of p < 0.05 uncorrected. Overt Counting also resulted in an activation pattern that partly overlapped with that observed during Overt Noun Definition but was less extensive (Fig. 3C). There was bilateral activation in premotor, primary sensorimotor, and auditory cortices, secondary somatosensory cortices, and paravermal cerebellum. Table 2 presents the cluster peaks and local submaxima. Covert Counting, relative to Rest, did not result in any significant activity at the cluster-forming threshold of z > 2.3.
Whole-brain results for three speech tasks, each against Rest. A, Overt Noun Definition > Rest (dark blue) revealed activity in the SMA, extending into anterior cingulate and paracingulate cortices (1); bilateral primary sensorimotor cortices (2); bilateral posterior supratemporal planes (3); bilateral superior temporal gyri (4); left inferior frontal gyrus (5); left posterior inferior temporal gyrus (6); and the cerebellum (7). B, Covert Noun Definition > Rest (pink) revealed activity in left inferior frontal gyrus (1); left middle temporal gyrus (2); left pre-SMA, extending into anterior cingulate and paracingulate cortices (3); and the right cerebellum (4). C, Overt Counting > Rest (turquoise) revealed activity in bilateral primary sensorimotor cortices (1), bilateral primary auditory cortices (2), bilateral secondary somatosensory cortices (3), and bilateral paravermal cerebellum (4). In each section sagittal slices are shown; the top rows show slices from the right hemisphere (from left to right: x = 7, 37, 47, 57, and the orthogonal coronal slice); the bottom rows show slices from the left hemisphere (from left to right: x = −63, −53, −43, −13, −3). Z-statistic images were thresholded using clusters determined by z > 2.3 and a corrected cluster significance threshold of p < 0.05. Results are displayed on a standard brain template (MNI152).
Cluster peaks and local submaxima for the whole-brain results
Direct comparisons between conditions
The direct contrast of Overt against Covert, for Noun Definition only, resulted in bilateral activity in the SMA, primary sensorimotor cortices, primary and association auditory cortices in the superior temporal gyri, and the paravermal cerebellum. In addition there was signal in the left caudate nucleus but no other activity in the basal ganglia at the statistical threshold used (Fig. 4A). The same contrast for Counting revealed activity in a very similar distribution (Fig. 4B). The reverse contrast of Covert against Overt did not show activation at the threshold used, for either Noun Definition or Counting.
Overt and Covert direct contrasts. A, Overt Noun Definition > Covert Noun Definition. B, Overt Counting > Covert Counting. C, Overt Noun Definition > Overt Counting. D, Covert Noun Definition > Covert Counting. For the four contrasts, sagittal slices are shown, from left to right, x = −63, −53, −43, −13, −3, 7, 37, 47, 57, and the orthogonal coronal slice. Z-statistic images were thresholded using clusters determined by z > 2.3 and a corrected cluster significance threshold of p < 0.05. Results are displayed on a standard brain template (MNI152).
Directly contrasting Noun Definition with Counting, both for Overt (Fig. 4C) and Covert speech (Fig. 4D), demonstrated activity in the pre-SMA, extending into anterior cingulate and paracingulate cortices, the left inferior frontal gyrus, the basal ganglia, and the right lateral cerebellum.
ROI analyses
The ROI analyses gave high-resolution detail of univariate activity within the left and right PTs (anterior and posterior halves), the adjacent POs, and the fOps. Due to the anatomical variability of this region, we used subject-specific ROIs. Ono and colleagues (1990) have demonstrated that the posterior extent of the Sylvian sulcus is oriented vertically in approximately half of normal subjects, and horizontally in most of the rest. Therefore, group analyses with anatomical normalization will blur the distinction between the PT (on the ventral bank of the posterior Sylvian sulcus) and the PO (on the dorsal bank). Activity in the theoretically motivated ROIs was measured for the four experimental conditions. The posterior ROIs (PTs and POs) allowed investigation of activity related to sensory feedback during speech production, and the anterior ROIs (fOps) allowed investigation of the premotor control of articulation (Golfinopoulos et al., 2010). The ROI values were entered into eight separate (one per ROI and hemisphere) 2 (Production: Overt and Covert) × 2 (Task: Noun Definition and Counting) ANOVAs. In view of the multiple ANOVAs, the threshold for statistical significance was set at p < 0.01. There was a significant main effect of Task in the left fOp (Noun Definition > Counting, F(1,16) = 23.575, p < 0.0005; Fig. 5A). There was a main effect of Production in the left and right anterior PT [left Overt > Covert, F(1,16) = 19.086, p < 0.001 (Fig. 5A); right Overt > Covert, F(1,16) = 9.833, p < 0.01 (Fig. 5B)]. In neither anterior PT was activity during the Covert conditions greater than Rest. There was no Task × Production interaction in either hemisphere. Post hoc two-tailed paired t tests revealed significant differences in the left fOp and the left and right anterior PTs (Table 3). As is evident from Figure 5, neither posterior PT demonstrated significant differences in activity between the speech conditions. Further, regional activity during these conditions was no greater than during Rest.
In the adjacent POs, activity during Covert Noun Definition was significantly below Rest.
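In a fully within-subject 2 × 2 design, every effect has one numerator degree of freedom, so its F statistic equals the square of a one-sample t statistic computed on a per-subject contrast score, tested on (1, n − 1) degrees of freedom for n participants. The sketch below uses this identity; the [subject, production, task] array layout is an assumption, the demonstration data are synthetic, and the last line shows how the p value for a reported F can be recomputed with SciPy.

```python
import numpy as np
from scipy import stats


def rm_anova_2x2(y):
    """Repeated-measures 2 x 2 ANOVA via per-subject contrast scores.

    y: (n_subjects, 2, 2) array indexed [subject, production, task],
       with production 0 = Overt, 1 = Covert and task 0 = Noun Definition, 1 = Counting.
    Returns {effect: (F, p)}; every effect has df = (1, n - 1).
    """
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    contrasts = {
        "Production": y[:, 0, :].mean(axis=1) - y[:, 1, :].mean(axis=1),
        "Task": y[:, :, 0].mean(axis=1) - y[:, :, 1].mean(axis=1),
        "Interaction": (y[:, 0, 0] - y[:, 0, 1]) - (y[:, 1, 0] - y[:, 1, 1]),
    }
    out = {}
    for name, d in contrasts.items():
        t = d.mean() / (d.std(ddof=1) / np.sqrt(n))  # one-sample t on contrast scores
        F = t ** 2
        out[name] = (F, stats.f.sf(F, 1, n - 1))
    return out


rng = np.random.default_rng(0)
synthetic = rng.normal(size=(16, 2, 2))  # synthetic values, not study data
print(rm_anova_2x2(synthetic))
# p value for an F statistic on (1, 16) degrees of freedom, e.g. F = 19.086
print(stats.f.sf(19.086, 1, 16))
```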
ROI analyses for the four experimental conditions. A, Left ROIs. B, Right ROIs. antPT, Anterior PT; postPT, posterior PT. Mean effect sizes for each of the speaking tasks contrasted with Rest are shown: Overt Noun Definition (black), Covert Noun Definition (white), Overt Counting (gray), and Covert Counting (hatched). Error bars represent 95% confidence intervals. Asterisk indicates significant main effect of Production (Overt/Covert) or Task (Noun Definition/Counting). Table 3 presents the significant post hoc t tests.
Significant t tests following significant ROI ANOVAs
When directly testing for lateralization, the only ROI to show a significant left-lateralization was the fOp, for both Overt Noun Definition (Left > Right, p = 0.02) and Overt Counting (Left > Right, p = 0.001). The anterior PT showed a weak trend toward left-lateralization for Overt Noun Definition (p = 0.09), but not for the other conditions.
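The lateralization test is not specified in detail above. One straightforward way to run it, assumed here, is a paired t test of left- versus right-hemisphere ROI effect sizes across participants, optionally summarized per participant by a lateralization index (L − R)/(|L| + |R|). Both the choice of test and the index are illustrative assumptions. SciPy's `ttest_rel` accepts `alternative="greater"` for a one-tailed Left > Right test.

```python
import numpy as np
from scipy import stats


def lateralization(left, right, alternative="two-sided"):
    """Paired left-vs-right comparison of ROI effect sizes across participants.

    Returns (per-participant lateralization index, t statistic, p value).
    The index (L - R) / (|L| + |R|) is set to 0 where both values are 0.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    denom = np.abs(left) + np.abs(right)
    li = np.divide(left - right, denom, out=np.zeros_like(left), where=denom > 0)
    res = stats.ttest_rel(left, right, alternative=alternative)
    return li, res.statistic, res.pvalue


# Toy values for four hypothetical participants (not study data).
li, t, p = lateralization([2.0, 3.0, 4.0, 5.0], [1.0, 1.0, 1.0, 1.0])
print(np.round(li, 3), round(float(t), 3), round(float(p), 3))
```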
ICA results
From the seven components identified within the PT, three were relevant to the purpose of this study. Two components demonstrated activity that could be predicted from the univariate whole-brain analyses. The first component showed greater activity during both Overt conditions than during both Covert conditions and Rest, and demonstrated connectivity with bilateral superior temporal gyri, bilateral ventral primary sensorimotor cortices, and bilateral paravermal cerebellum. The second component showed the same profile of activity across the conditions but was distributed across bilateral superior temporal gyri and sulci, the left inferior frontal gyrus, and bilateral cerebellum, lateral to the paravermal regions identified in the first component. No component demonstrated activity during both Overt and Covert speech that was greater than Rest. One component responded to Covert Counting, and to a lesser extent to Covert Noun Definition, relative to both Rest and the Overt conditions. The functional connectivity of this component was largely restricted to the left PT (Fig. 6), and it was not active during overt speech. Therefore, this component equates with “covertness” and not with the function that might be expected for sensorimotor integration.
ICA results. A, The “Spt-like” component identified from the PT ICA. B, The whole-brain functional connectivity map for this component. The colored overlays are displayed on sagittal (x = −50 mm, left image), coronal (y = −40 mm, center image), and axial (z = 21 mm, right image) slices taken from a standard MNI brain template. The statistical threshold for the whole-brain functional connectivity was set at p < 0.01, corrected for multiple comparisons using a correction for familywise error rate. C, Activity for this component, as identified by the ICA, in response to the different speech production tasks: Overt Noun Definition (black), Covert Noun Definition (white), Overt Counting (gray), and Covert Counting (hatched). Error bars represent 95% confidence intervals.
Discussion
This study used fMRI to investigate both propositional and nonpropositional speech production, with the two speech tasks performed both overtly and covertly. The motivation was to investigate whether a subcomponent of the left PT, and not the right, demonstrated a response profile compatible with a role as a sensorimotor interface during the generation of speech. This functional–anatomical association was proposed over a decade ago, based on data from both fMRI (Hickok et al., 2000) and positron emission tomography (PET; Wise et al., 2001), and received subsequent support from additional fMRI studies (Hickok, 2009). Further, it has been proposed that a lesion of this cortical region could explain the syndrome of conduction aphasia (Buchsbaum et al., 2011). However, although the prearticulatory processing stages of speech production in frontal cortex are strongly left-lateralized, centered on “classic” Broca's area (Brodmann areas 44 and 45), the primary motor cortical outflow to the many axial muscles controlling speech production and the auditory and somatosensory reafferent feedback involve both cerebral hemispheres. Therefore, it might be presumed that the sensorimotor interface for speech production in posterior cortex is distributed across both cerebral hemispheres. Nevertheless, the fMRI-defined functional–anatomical association in what has become known as area Spt, localized to the posterior left PT (Hickok et al., 2009), has become influential in theories about the control of articulation (Hickok et al., 2011; Hickok, 2012). Left-lateralized activation in area Spt/temporoparietal junction has been reported for overt compared with covert sentence reading (Kell et al., 2011) and for affective compared with neutral sentence reading (Pichon and Kell, 2013). Using overt sentence generation, Tremblay and Small (2011) have demonstrated bilateral activation in the transverse temporal gyrus, extending caudally into the PT.
The present study found no evidence in support of area Spt's involvement in sensory-motor integration during speech. Although arguing from a null result can be problematic, the design of the present study was well suited to detect any activity associated with overt and covert speech in this region. In contrast, other components of the widely distributed networks supporting speech production observed in the present study were compatible with previous research. The common systems for Overt Noun Definition and Overt Counting, each relative to Rest, were in accord with studies using both PET and fMRI (Braun et al., 2001; Blank et al., 2002; Guenther et al., 2006; Awad et al., 2007; Dhanjal et al., 2008; Simmonds et al., 2011; Geranmayeh et al., 2012) that have revealed the bilateral cortical and subcortical systems controlling the complex sequential movements involved in the production of connected speech. The ANOVA demonstrated a main effect of Task (Noun Definition > Counting), regardless of whether speech was performed overtly or covertly, with a more asymmetrical distribution in higher-order cortices. The regions included pre-SMA, extending ventrally into dorsal anterior cingulate cortex, the left inferior frontal gyrus, left posterior middle temporal gyrus, and the right lateral cerebellum. This main effect of Noun Definition, independent of actual articulation, replicates the results from a previous multicenter PET study of covert single-word generation (Poline et al., 1996).
The replication of previously published results indicated that the design and execution of the present study were reliable. Further analyses were then directed at the left and right temporoparietal junctions. The definition of area Spt, based on univariate analyses of functional imaging data (Hickok et al., 2009), predicts that the present study should have identified activity in the left posterior PT that was greater than Rest for all speech conditions, Covert as well as Overt. However, the profile of activity across conditions in the subject-specific ROIs that encompassed the anterior and posterior PTs and POs does not confirm previous descriptions of the response of the left-lateralized area Spt. The present study showed activity only in the anterior PT, both left and right, and only for the Overt tasks. Although the anterior PTs are typically thought to comprise unimodal auditory cortex, this activity may reflect somatosensory as well as auditory reafferent feedback, as a study in nonhuman primates has demonstrated that cortex immediately posterior to primary auditory cortex is heteromodal (Smiley et al., 2007). There was no activity greater than Rest in either the left or the right posterior PT during either the Overt or the Covert speech tasks. The same was true for both POs, with evidence of deactivation in these regions for Covert Noun Definition.
However, an absence of activity in a univariate contrast does not imply an absence of involvement: when a brain region comprises anatomically overlapping processing subcomponents, its net activity may fail to increase, or may even decline, even when the one subcomponent under investigation is activated by the task. This possibility can be investigated with multivariate analyses. The ICA performed in this study, investigating the whole-brain connectivity of the left PT, failed to find a region that was active for both Overt and Covert speech conditions relative to Rest, which is the response expected of the functionally defined area Spt. There were two components more active for the Overt speech conditions relative to both the Covert speech conditions and Rest, and the anatomical distributions of their associated networks were in accord with the univariate analyses. Interestingly, there was one component, with almost no connectivity with other brain regions, that was more active for the Covert than the Overt conditions, although not significantly more active than Rest. Its distribution would accord with the published anatomical location of area Spt. It is possible that this region would become apparent in univariate contrasts between other conditions of covert speech, which would explain the activity observed in this region in other studies (Hickok et al., 2009). However, activity associated with covertness alone does not equate with the predicted function of a core component for articulatory motor-sensory integration. This “covertness-related” activity is likely involved in articulatory planning and may also reflect some form of auditory imagery, rather than sensory-motor integration (Parker Jones et al., 2014). Alternatively, this component may reflect some form of inhibition arising from formulating a sentence without uttering it.
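The averaging argument above can be illustrated with a toy simulation. This is a sketch under stated assumptions, not the analysis used in this study: it assumes a hypothetical ROI of 100 voxels, half belonging to a subcomponent that activates during a hypothetical block-design task and half to an overlapping subcomponent that deactivates by the same amount, plus arbitrary Gaussian noise. A regression on the ROI-average time course finds essentially no net task effect, whereas a simple multivariate decomposition (a singular value decomposition of the centred voxel-by-time data, used here as a stand-in for, not an implementation of, ICA) recovers a component that tracks the task.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the toy example is reproducible

# Hypothetical block design: alternating 10-scan blocks of Rest (0) and Task (1)
n_scans = 120
task = np.tile(np.r_[np.zeros(10), np.ones(10)], n_scans // 20)

# Hypothetical ROI: the first half of the voxels activate with the task,
# the second half (an overlapping subcomponent) deactivate by the same amount
n_vox = 100
weights = np.r_[np.ones(n_vox // 2), -np.ones(n_vox // 2)]
noise = 0.5 * rng.standard_normal((n_scans, n_vox))  # assumed noise level
data = np.outer(task, weights) + noise  # shape (scans, voxels)


def task_beta(y, x):
    """Ordinary least-squares slope of y on x, with an intercept."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


# Univariate view: regress the ROI-average time course on the task regressor
roi_mean = data.mean(axis=1)
net_effect = task_beta(roi_mean, task)

# Multivariate view: SVD of the voxel data after removing each voxel's mean
centred = data - data.mean(axis=0)
u, s, vt = np.linalg.svd(centred, full_matrices=False)
component = u[:, 0] * s[0]  # time course of the first component
loadings = vt[0]            # voxel weights of the first component
comp_corr = abs(np.corrcoef(component, task)[0, 1])

print(f"ROI-average task effect: {net_effect:.3f}")
print(f"|r| between first component and task: {comp_corr:.3f}")
```

Under these assumptions, `net_effect` is close to zero while `comp_corr` is close to 1, and the signs of the first component's voxel `loadings` separate the activating and deactivating halves of the ROI. The symmetric cancellation is deliberately extreme; in real data the subcomponents need not be equal in size or response amplitude.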
There are two influential reviews of the organization of speech and language that differ considerably in detail. The one by Hickok and Poeppel (2007), which strongly proposes a left-lateralized area Spt acting as a sensorimotor interface, depicts a direct connection from this region to the left ventral premotor cortex, the fOp, and the anterior insula. The model proposed by Rauschecker and Scott (2009; Rauschecker, 2011) depicts forward and inverse auditory-motor mapping involving bidirectional connections between left auditory, inferior parietal, premotor, and inferior frontal cortices. The PT ROI used for the present ICA did not show this extensive cortical circuit. However, a previous ICA by our group, using a larger left posterior perisylvian ROI, more closely represented the Rauschecker and Scott model in terms of connected regions (Simmonds et al., 2014). In that study, two regions, the posterior left superior temporal gyrus and the adjacent ventral anterior parietal cortex, demonstrated differently distributed and anticorrelated activity across temporal, inferior parietal, and posterior frontal cortices across a range of speech tasks. The data included those from the present study and from a further study of sublexical speech production (repetition of non-native words). However, these networks were symmetrically distributed between the two cerebral hemispheres, whereas the Rauschecker and Scott model implies left cerebral hemisphere dominance. The proposal that the motor, sensory, and combined sensorimotor processes involved in speech production, independent of the lexical, semantic, and syntactic levels of language, are distributed symmetrically around bilateral perisylvian cortices is also the conclusion of a study using a very different technique, namely direct electrocorticographic recordings at the time of epilepsy surgery (Cogan et al., 2014).
In that study, as the PTs are buried within the posterior part of the Sylvian fissure, recordings were not made directly from these sites. The present fMRI study complements the study of Cogan and colleagues by demonstrating a symmetrical response of the left and right anterior PTs to overt speech production, and by finding no demonstrable asymmetrical role for the posterior PT.
A recent study also contests the notion that an infarct of area Spt can result in the clinical syndrome of conduction aphasia (Buchsbaum et al., 2011), which is predominantly characterized by impaired repetition and spontaneous speech, with other language functions largely intact. That study attributed this syndrome in eight patients to damage to the white matter tract that connects left posterior temporal, parts of inferior parietal, and ventral premotor cortices (Parker Jones et al., 2014).
In conclusion, we propose that both left and right anterior PTs are components of bilateral perisylvian regions involved in the sensory-motor control of articulation. This does not invalidate other recent detailed models of speech production (Guenther and Vladusich, 2012; Hickok, 2012), but the results presented here do not support a pre-eminent role for area Spt. In accord with Cogan and colleagues (2014), we conclude that speech production is best considered to be supported by a bilateral, distributed perisylvian network.
Footnotes
This work was supported by the Medical Research Council.
The authors declare no competing financial interests.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution and reproduction in any medium provided that the original work is properly attributed.
- Correspondence should be addressed to Anna J. Simmonds, Computational, Cognitive and Clinical Neuroimaging Laboratory (C3NL), Imperial College London, Hammersmith Hospital campus, London W12 0NN, UK. anna.simmonds08{at}imperial.ac.uk
This article is freely available online through the J Neurosci Author Open Choice option.