Abstract
The visual word form area (VWFA) is a region in the left occipitotemporal sulcus of literate individuals that is purportedly specialized for visual word recognition. However, there is considerable controversy about its functional specificity and connectivity, with some arguing that it serves as a domain-general, rather than word-specific, visual processor. The VWFA is a critical region for testing hypotheses about the nature of cortical organization, because it is known to develop only through experience (i.e., reading acquisition), and widespread literacy is too recent to have influenced genetic determinants of brain organization. Using a combination of advanced fMRI analysis techniques, including individual functional localization, multivoxel pattern analysis, and high-resolution resting-state functional connectivity (RSFC) analyses, with data from 33 healthy adult human participants, we demonstrate that (1) the VWFA can discriminate words from nonword letter strings (pseudowords); (2) the VWFA has preferential RSFC with Wernicke's area and other core regions of the language system; and (3) the strength of the RSFC between the VWFA and Wernicke's area predicts performance on a semantic classification task with words but not other categories of visual stimuli. Our results are consistent with the hypothesis that the VWFA is specialized for lexical processing of real words because of its functional connectivity with Wernicke's area.
Significance Statement
The visual word form area (VWFA) is critical for determining the nature of category-related organization of the ventral visual system. However, its functional specificity and connectivity are fiercely debated. Recent work concluded that the VWFA is a domain-general, rather than word-specific, visual processor with no preferential functional connectivity with the language system. Using more advanced techniques, we obtained results that stand in stark contrast to these earlier findings. We demonstrate that the VWFA is highly specialized for lexical processing of real words, and that a fundamental factor driving this specialization is its preferential intrinsic functional connectivity with core regions of the language system. Our results support the hypothesis that intrinsic functional connectivity contributes to category-related specialization within the human ventral visual system.
Introduction
The category-related organization of the human ventral occipitotemporal cortex (VOTC) is remarkably consistent across individuals, with distinct regions showing preferential responses to particular object categories. One of the factors that constrains this organization is the differential connectivity of specific category-related VOTC regions with other cortical regions that store and/or process category-relevant properties (Martin, 2006; Mahon and Caramazza, 2011). Recent work has provided compelling evidence for this idea using measures of structural connectivity (Kravitz et al., 2011a, 2013; Saygin et al., 2011, 2016; Bouhali et al., 2014; Osher et al., 2016) and intrinsic functional connectivity [i.e., resting-state functional connectivity (RSFC); Hutchison et al., 2014; Stevens et al., 2015].
Within this framework, the so-called visual word form area (VWFA), a brain region purportedly specialized for processing visually presented words (Glezer et al., 2009; Dehaene and Cohen, 2011; Hirshorn et al., 2016), would be expected to show privileged connectivity with core components of the language system (Martin, 2006; Cai et al., 2008; Behrmann and Plaut, 2013; Hannagan et al., 2015), especially with the posterior-most extent of the superior temporal gyrus (STG)/Sylvian fissure, the planum temporale, the critical site “where spoken and written language meet” (Nakada et al., 2001) and the heart of Wernicke's area, a critical region for language comprehension (DeWitt and Rauschecker, 2013). However, word specialization in the VWFA only develops through experience (Dehaene and Cohen, 2011). Thus, over the course of reading acquisition, repeated functional coupling of the occipitotemporal sulcus (visual word analysis) with Wernicke's area (word comprehension) should lead to strengthened RSFC related to reading skill (Stevens and Spreng, 2014).
Previous studies investigating the functional specificity and connectivity of the VWFA have not localized this region, or other language regions, in individual subjects, but instead have used group-level analyses or coordinates obtained from previous publications. This is a critical limitation, given recent work demonstrating the importance of individually localizing regions for evaluating category-specific RSFC (Stevens et al., 2012, 2015). Accordingly, no study to date has demonstrated privileged connectivity of the VWFA with Wernicke's area. Moreover, although some studies have reported structural (Bouhali et al., 2014; Fan et al., 2014; Saygin et al., 2016) and functional (Koyama et al., 2010, 2011; Li et al., 2013; Wang et al., 2015; Chai et al., 2016) connectivity between the VWFA and the left lateral frontal and temporal regions, they failed to demonstrate that these areas were involved in any aspect of language processing. Even more troubling, other studies have reported preferential RSFC between the VWFA and brain regions associated with visual attention, rather than language (Vogel et al., 2012a; Zhou et al., 2015). These latter findings are more in keeping with claims that the VWFA is nothing more than a general visual processor for discriminating high-spatial-frequency stimuli of any kind (Price and Devlin, 2003, 2004; Vogel et al., 2012b).
Here, we investigated the functional specificity and connectivity of the VWFA using individual functional localization. We hypothesized that the VWFA is specialized for lexical processing of written words because of its RSFC with core language regions, and therefore that (1) it would discriminate real words from nonword letter strings (i.e., pseudowords); (2) it would show specific and strong RSFC with other core areas of the classical language system (Marslen-Wilson and Tyler, 2007), especially Wernicke's area; and (3) the strength of RSFC between the VWFA and Wernicke's area would predict individual differences in reading skill.
Using a combination of fMRI techniques—a multicategory individual functional localizer, multivoxel pattern analysis (MVPA), and high-resolution RSFC analyses—we demonstrate that (1) the VWFA reliably differentiates real words from pseudowords; (2) it has preferential RSFC with Wernicke's area and other core regions of the language system; and importantly, (3) the strength of RSFC between the VWFA and Wernicke's area predicts performance on a semantic classification task for words, but not other categories of visual stimuli.
Materials and Methods
Healthy young adults participated in two fMRI sessions ∼1 week apart. The first day of scanning included a high-resolution resting-state scan, followed by a 10-run multicategory task-based functional localizer. The second day of scanning, ∼1 week later, included 12 runs of a rapid event-related semantic classification task with randomly intermixed stimuli from multiple categories, including words and pictures.
Participants.
Participants were 33 healthy, right-handed, young adults (mean age, 24.0 years; range, 22–33 years; 17 females; 16 males), with normal or corrected-to-normal visual acuity and no history of psychiatric, neurological, or other medical illness, or of drug or alcohol abuse, that might compromise cognitive function. Data from these participants have been reported previously (Stevens et al., 2015). Data reported here are from participants who completed one or both of two separate sessions ∼1 week apart (Session 1: n = 33; Session 2: n = 24 of the same participants). All participants were paid for their participation and gave written informed consent before participation, in accordance with a National Institutes of Health Institutional Review Board-approved protocol.
Functional localizer.
In the first session, participants were scanned with fMRI during a multicategory task-based functional localizer described previously (Stevens et al., 2015). Briefly, a standard block-design task was presented across 10 runs, each run consisting of 14 task blocks (20 s), one for each of 14 different stimulus categories, interleaved with 13 fixation blocks (10 s). Stimuli from seven categories of interest were analyzed in the current study: grayscale images (600 × 600 pixels) of words (multiple fonts) and pictures, including nameable objects (nonmanipulable objects, such as furniture, vehicles, etc.; tools; animals; body parts), faces, and scenes, all matched in image size to the longest dimension across all exemplars in all categories, projected onto a screen, and viewed via a mirror mounted to the head coil. Task blocks comprised 20 trials [stimulus duration, 300 ms; interstimulus interval (ISI), 700 ms] of a single stimulus category, with stimuli repeating in immediate succession either once or twice in each block. To ensure attention to each stimulus, participants performed a repetition detection (“one-back”) task, by pressing a button with the left index finger to indicate a repetition.
Semantic classification task.
In the second session ∼1 week later, participants were scanned during 12 runs of a semantic classification task in a rapid event-related design. Stimuli (different from those used in Session 1) included 20 exemplars from each of 14 categories, including eight picture categories (tools, nonmanipulable objects, animals, body parts, scenes, faces, abstract objects, phase-scrambled images from all stimulus categories), five word categories (tool words, nonmanipulable object words, animal words, body part words, scene words), and pseudowords (nonword letter strings that respect the phonotactic rules of language). All 280 stimuli were presented six times each in pseudorandom order (optimized using the program optseq) as follows: once across the first two runs, and once again across each subsequent two-run period, resulting in exactly six presentations of each stimulus, distributed across the entire session. Words were concrete common nouns presented in multiple fonts and were matched with pseudowords for length, number of orthographic neighbors, and bigram frequency by position, using the English Lexicon Project (Balota et al., 2007). For each stimulus, whether picture or word, participants indicated whether it was “natural” (i.e., faces, animals, body parts, some scenes), “man-made” (i.e., tools, nonmanipulable objects, abstract objects, some scenes), or neither (i.e., pseudowords, scrambled images), as quickly and accurately as possible. One-third of all trials across the session were fixation trials to introduce temporal jitter. Classification accuracy for each individual category was calculated as the proportion of correct responses minus the proportion of incorrect responses, including only trials for which there was a response. Trials lasted 2 s (stimulus duration, 300 ms; ISI, 1700 ms).
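The accuracy measure described above (proportion correct minus proportion incorrect, computed only over trials with a response) can be illustrated with a minimal Python sketch. The function name, the use of `None` to mark a missed trial, and the example responses are assumptions made for illustration; they are not taken from the original analysis code.

```python
def classification_accuracy(responses, correct_answers):
    """Proportion correct minus proportion incorrect, over responded trials only.

    `responses` holds the participant's answer on each trial, or None when
    no response was made (hypothetical encoding). `correct_answers` holds
    the expected answer for each trial.
    """
    scored = [(r, c) for r, c in zip(responses, correct_answers) if r is not None]
    if not scored:
        raise ValueError("no trials with a response")
    n_correct = sum(1 for r, c in scored if r == c)
    n_incorrect = len(scored) - n_correct
    return (n_correct - n_incorrect) / len(scored)


# Hypothetical trials for one "natural" category; the fourth trial was missed.
responses = ["natural", "natural", "man-made", None, "natural", "neither"]
answers = ["natural"] * 6
score = classification_accuracy(responses, answers)
```

Under this definition the score ranges from −1 (all responses incorrect) to 1 (all responses correct), and missed trials do not affect it.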
MRI scanning.
MRI data were collected using a GE Signa 3 tesla whole-body MRI scanner (GE Healthcare) with an eight-channel head coil at the National Institutes of Health Clinical Center Nuclear Magnetic Resonance Center Research Facility using standard procedures as previously described (Stevens et al., 2015). In Session 1, a high-resolution T1-weighted anatomical image (MPRAGE) was obtained [124 axial slices; slice thickness, 1.2 mm; field of view (FOV), 24 cm; acquisition matrix, 256 × 256]. Then, a high-resolution “resting-state” scan (8 min 10 s; 140 TRs) was acquired using gradient-echo echo-planar fMRI with whole-brain coverage while participants maintained visual fixation on a centrally located white cross on a gray background (TR = 3500 ms; TE = 27 ms; flip angle, 90°; 42 interleaved contiguous axial slices per volume; slice thickness, 3.0 mm; FOV, 220 mm; acquisition matrix, 128 × 128; single-voxel volume, 1.7 × 1.7 × 3.0 mm). Finally, task-evoked brain activity was similarly measured during the functional localizer runs (TR = 2000 ms; TE = 27 ms; flip angle, 77°; 41 interleaved contiguous axial slices per volume; slice thickness, 3.0 mm; FOV, 216 mm; acquisition matrix, 72 × 64; single-voxel volume, 3.0 mm isotropic). Each localizer run lasted 7 min 18 s (219 TRs). In Session 2, the 12 rapid event-related task runs (7 min 24 s; 222 TRs) were scanned with the same parameters as the localizer runs. A second anatomical image was then acquired as in Session 1. Independent measures of nuisance physiological variables (cardiac and respiration) were recorded during all scans in both sessions for later removal.
fMRI data preprocessing.
Echo-planar images were preprocessed using the Analysis of Functional Neuroimages (AFNI) software package (Cox, 1996). The first four TRs were removed from each run. Large transients in the remaining volumes were removed through interpolation (3dDespike). Volumes were then slice-time corrected (3dTshift) and coregistered to the volume nearest the anatomical scan (3dVolreg). Nuisance physiological and nonphysiological artifacts were removed using a modified version of the ANATICOR procedure (Jo et al., 2010), as previously described (Stevens et al., 2015). This procedure has been shown to drastically reduce or virtually eliminate motion-related artifacts in resting-state time-series analyses (Jo et al., 2013).
The functional localizer was used to independently define category-related functional ROIs to be used as seeds in RSFC analyses of the resting-state data, and for MVPA of data collected in Session 2. Thus, to preserve the integrity and spatial specificity of the task and resting-state data, as well as the high resolution of the latter, ROIs created from the localizer data were aligned and resampled to the space and resolution of the task and resting-state data independently, before MVPA and RSFC analyses.
All RSFC analyses were performed in cortical surface space. For cortical surface-based analyses, participant-specific surface models were created from each participant's anatomical scan using Freesurfer. Standard-mesh surfaces of 141,000 nodes per hemisphere were created using AFNI Surface Mapper (SUMA; Saad et al., 2004) to produce node-to-node anatomical correspondence across surfaces from multiple participants. The denoised residual time series described above for both the localizer data and resting-state data were mapped onto the cortical surfaces (3dVol2Surf), with a mean kernel of 10 sampling points uniformly distributed along a line between smooth white matter and pial surfaces, extending 80% of the distance between corresponding nodes on the two surfaces. Spatial smoothing was performed on the surface-mapped functional data (SurfSmooth) with a heat kernel resulting in a 6 mm full-width-at-half-maximum noise spatial correlation structure along the white matter surface. MVPA of the Session 2 task data was performed in each participant's native volume space on unsmoothed data as described below.
ROI definition.
To derive the BOLD response magnitudes for each of the conditions of interest at the individual participant level, the blocked functional localizer data were modeled with a boxcar function convolved with a canonical hemodynamic response function and deconvolved (AFNI; 3dDeconvolve–block). The model included 14 regressors corresponding to the 14 stimulus categories, in addition to nuisance regressors (12 regressors for the motion parameters and a third-order polynomial regressor to account for very low-frequency MRI signal drift). The localizer data were used to define both group-level (g-) and individual (i-) functional ROIs for each participant. These ROIs were then applied to their respective resting-state data for whole-brain voxelwise and pairwise ROI-based RSFC analyses, and to task data for MVPA. Standard contrasts were used to define ROIs showing category-preferential responses for different stimulus conditions. ROI peak activation (t statistic) was localized in both volume-based and surface-based (standard) space simultaneously, using spatially locked anatomically corresponding AFNI and SUMA viewing interfaces, to cross-reference activations across viewing modalities and facilitate anatomical specificity and accuracy.
ROIs were created for the VWFA and other core language areas (Marslen-Wilson and Tyler, 2007) at the location of peak activation (p < 0.01) in the contrast words > nameable entities (tools, nonmanipulable objects, animals, body parts) within the corresponding anatomical locations. While there is variability in the precise anatomical locations typically associated with these functional areas in the literature, they were defined here as follows, based on the cited literature: the VWFA, within the left occipitotemporal sulcus (Dehaene et al., 2010); Wernicke's area, within the left planum temporale of the posterior STG/Sylvian fissure (Nakada et al., 2001; DeWitt and Rauschecker, 2013); Broca's area, within the pars opercularis of the left posterior inferior frontal gyrus (Amunts et al., 1999); and the precentral gyrus (PCG) region, at peak activation within the left precentral gyrus (Marslen-Wilson and Tyler, 2007). The left and right fusiform face areas (FFAs) were localized at the site of peak activation in the contrast faces > scenes within the left and right lateral fusiform gyrus, respectively. To include a minimum number of voxels required for the analyses and maximize participant specificity, ROIs for MVPA were defined in individual participants' native volume-space and included only voxels with a positive t value in the contrast of interest within a 6-mm-radius sphere centered on the peak voxel. To maximize individual spatial specificity in the whole-brain voxelwise and pairwise ROI-based RSFC analyses, each ROI constituted a single peak node in standardized cortical surface space (SUMA: Saad et al., 2004), representing the mean activity within ∼8.7 mm3 at the corresponding location in native volume-space. Functional localizer data were analyzed at the group level using node-wise (surface)/voxelwise (volume) repeated-measures ANOVAs.
For display purposes, all contrast maps are overlaid on cortical surfaces and volume slices with a statistical node-wise/voxelwise threshold of p < 0.001, whole-brain threshold of p < 0.01 (false discovery rate corrected), and a minimum cluster size of 25 nodes/voxels (voxel size: native space, 1.7 × 1.7 × 3.0 mm; standard TT_N27 space, 3.0 × 3.0 × 3.0 mm).
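The construction of the MVPA ROIs described in this section (voxels with a positive t value within a 6-mm-radius sphere around the peak voxel) can be sketched as follows. This is an illustration only: the function name, the representation of voxels as coordinate/t-value pairs in millimeters, the inclusive treatment of the sphere boundary, and the example values are all assumptions, and the actual ROIs were created with AFNI.

```python
from math import dist


def sphere_roi(peak_mm, voxels, radius_mm=6.0):
    """Return coordinates of voxels inside the sphere that have positive t.

    peak_mm: (x, y, z) of the peak voxel, in mm.
    voxels: iterable of ((x, y, z), t_value) pairs (hypothetical layout).
    """
    return [
        xyz
        for xyz, t in voxels
        if t > 0 and dist(xyz, peak_mm) <= radius_mm
    ]


# Hypothetical voxels around a peak placed at the origin.
voxels = [
    ((3.0, 0.0, 0.0), 2.1),   # inside the sphere, positive t: kept
    ((0.0, 3.0, 3.0), 0.5),   # inside the sphere, positive t: kept
    ((0.0, 0.0, 9.0), 3.0),   # outside the 6 mm sphere: dropped
    ((3.0, 3.0, 0.0), -1.2),  # inside the sphere, negative t: dropped
]
roi = sphere_roi((0.0, 0.0, 0.0), voxels)
```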
MVPA.
The MVPA analyses were conducted in a manner broadly in keeping with previous analyses reported in the literature using representational similarity analysis (RSA; Kriegeskorte et al., 2008; Kravitz et al., 2010, 2011b). The advantage of using RSA is that it allows one to determine the extent to which a given cortical ROI discriminates between individual stimuli or different categories of stimuli (Kriegeskorte et al., 2008; Kravitz et al., 2010, 2011b) based on the voxelwise pattern of activation within this region, and independent of any overall difference in the mean amplitude of the BOLD response across an ROI. The latter is consistent with previous work explicitly demonstrating that MVPA techniques (e.g., RSA) are not sensitive to overall mean amplitude differences across voxels within an ROI (Davis et al., 2014). First, ROIs were derived for each participant as described above. For all ROIs derived within each participant, but not for the group-defined VWFA, all overlapping voxels were removed to ensure that each ROI had unique voxel constituents. Following this step, deconvolved responses to every individual stimulus (n = 280) were extracted for each voxel in each ROI across all possible independent halves of the data. Data were always divided between runs to rule out any autocorrelation confounds across the independent halves. Session 2 comprised six sequential full iterations of all stimuli across pairs of runs, and was thus essentially a six-run design, though these six “full runs” were presented to participants as 12 shorter runs to limit the duration of sustained fMRI scanning to reduce participant fatigue and/or discomfort. Given the six full runs in Session 2, there were 10 possible ways of dividing the data in half. Each possible split was run separately through the MVPA-RSA described below, and combined only when similarity matrices had been created.
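The count of 10 possible split halves follows from choosing 3 of the 6 full runs and treating a split and its mirror image as the same partition (20 / 2 = 10). A minimal Python sketch of this enumeration is given below; the function name and the zero-based run indices are illustrative assumptions.

```python
from itertools import combinations


def split_halves(n_runs):
    """Return every unique way to divide run indices into two equal halves."""
    if n_runs % 2:
        raise ValueError("need an even number of runs")
    runs = range(n_runs)
    splits = []
    for half in combinations(runs, n_runs // 2):
        # Keep only halves that contain run 0, so that a partition (A, B)
        # is not counted a second time as (B, A).
        if 0 in half:
            other = tuple(r for r in runs if r not in half)
            splits.append((half, other))
    return splits


splits = split_halves(6)
print(len(splits))
```

Each returned pair contains disjoint sets of runs, so the two halves never share data from the same run.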
The response patterns for each stimulus were then correlated with the response patterns of every stimulus in the other half of the split data. This resulted in 280 × 280 full similarity matrices wherein each point represented the similarity between the pattern of response across voxels within a given ROI as defined by Pearson's r. The main diagonal of these matrices represents the reproducibility of the pattern of response to an individual stimulus across independent presentations. This main diagonal was then removed before averaging the similarity matrices by category to create 14 × 14 category similarity matrices. These matrices served as the basis of the two discrimination metrics used to establish the presence of discriminative information for pictures and the difference between words and pseudowords.
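The procedure described above can be sketched in plain Python for a single split and a single ROI. This is an illustration under stated assumptions, not the original pipeline: the function names, the list-based data layout, and the toy patterns (four stimuli, three voxels) are hypothetical, and the sketch averages by category in one pass rather than first storing the full stimulus-by-stimulus matrix.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def category_similarity(half_a, half_b, labels):
    """Average split-half pattern correlations for each category pair.

    half_a, half_b: one voxel pattern (list of floats) per stimulus, in the
    same stimulus order for both halves. labels: category of each stimulus.
    """
    cats = sorted(set(labels))
    sums = {(p, q): 0.0 for p in cats for q in cats}
    counts = {(p, q): 0 for p in cats for q in cats}
    for i, pat_a in enumerate(half_a):
        for j, pat_b in enumerate(half_b):
            if i == j:
                continue  # drop the main diagonal (same stimulus in both halves)
            key = (labels[i], labels[j])
            sums[key] += pearson_r(pat_a, pat_b)
            counts[key] += 1
    return {k: sums[k] / counts[k] for k in sums if counts[k]}


# Toy data: 2 word and 2 pseudoword stimuli, 3 voxels each (made-up values).
labels = ["word", "word", "pseudo", "pseudo"]
half_a = [[1, 2, 3], [1, 2, 4], [3, 2, 1], [4, 2, 1]]
half_b = [[1, 3, 3], [2, 2, 5], [3, 1, 1], [5, 2, 1]]
sim = category_similarity(half_a, half_b, labels)
```

In this toy case the word patterns resemble each other and differ from the pseudoword patterns, so the within-category entries are positive and the between-category entries are negative.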
Picture discrimination was calculated in each participant as the difference between the average within-category similarity (correlation between a picture category and itself across two independent halves of the data) and the average between-category similarity (correlation between two different picture categories). If the within-category similarity was significantly greater than the between-category similarity, that was taken as evidence for the ability to discriminate the picture category from the response of any particular ROI.
Word/pseudoword decoding was calculated in each participant as the difference between the average within-category similarity (correlation between the pseudoword category and itself) and the average between-category similarity (correlation between the pseudoword and word categories). If the within-category similarity was significantly greater than the between-category similarity, that was taken as evidence for the ability to discriminate pseudowords from words in any particular ROI.
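Both discrimination metrics are a within-category similarity minus a between-category similarity, computed from the category similarity matrix. A minimal sketch follows; the function name, the dictionary layout keyed by category pairs, and all similarity values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def discrimination_index(sim, targets, others=None):
    """Mean within-category similarity minus mean between-category similarity.

    sim: dict mapping (cat_a, cat_b) -> mean split-half correlation.
    targets: categories whose within-category similarity is averaged.
    others: categories to compare the targets against; when omitted, each
        target is compared against the remaining targets.
    """
    within = [sim[(c, c)] for c in targets]
    if others is None:
        between = [sim[(a, b)] for a in targets for b in targets if a != b]
    else:
        between = [sim[(a, b)] for a in targets for b in others]
    return sum(within) / len(within) - sum(between) / len(between)


# Made-up category similarities for one ROI in one participant.
sim = {
    ("pseudo", "pseudo"): 0.10, ("pseudo", "word"): 0.04,
    ("word", "pseudo"): 0.04, ("word", "word"): 0.09,
    ("faces", "faces"): 0.20, ("faces", "scenes"): 0.05,
    ("scenes", "faces"): 0.05, ("scenes", "scenes"): 0.18,
}

# Pseudoword-with-itself similarity minus pseudoword-with-word similarity.
word_pseudo = discrimination_index(sim, ["pseudo"], ["word"])
# Picture categories: average within minus average between.
picture = discrimination_index(sim, ["faces", "scenes"])
```

A positive index indicates that response patterns are more reproducible within a category than across categories; group-level inference would then test these per-participant indices against zero.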
We predicted that word/pseudoword discrimination would be strongest in the i-VWFA, and significantly stronger than the discrimination for pictures, while the opposite should hold true for the i-FFA. Repeated-measures ANOVAs were used to quantify differences in discrimination indices across ROIs; two-tailed paired-samples t tests were used to examine the simple main effects. Error bars in all plots represent 1 between-subjects SEM.
RSFC analyses.
All RSFC analyses were conducted on the cortical surface. The residual denoised BOLD signal time series was extracted from each ROI, and its correlation (Pearson's r) with that of every other cortical surface node was calculated for whole-brain node-wise analyses; for pairwise ROI-based analyses, the correlations for pairs of ROIs' time series were calculated. All correlations were transformed using Fisher's z before being analyzed at the group level. The ROI-based RSFC analyses were performed using repeated-measures ANOVAs and paired-samples t tests (two-tailed) on the z-transformed correlation values among pairs of ROIs' time courses.
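The correlation and Fisher z steps can be illustrated with a short Python sketch. The two time series below are invented values standing in for the denoised residual signals of a seed ROI and one target node; the function names are likewise assumptions for illustration.

```python
from math import atanh, sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length time series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fisher_z(r):
    """Fisher r-to-z transform: z = atanh(r) = 0.5 * ln((1 + r) / (1 - r))."""
    return atanh(r)


# Hypothetical residual time series for a seed ROI and a target node.
seed = [0.2, -0.1, 0.4, 0.0, -0.3, 0.5]
target = [0.1, -0.2, 0.5, 0.1, -0.4, 0.3]

r = pearson_r(seed, target)
z = fisher_z(r)
```

The transform makes the sampling distribution of the correlation values approximately normal, which is why it is applied before the ANOVAs and paired t tests at the group level.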
Results
Functional localization of VWFA, language, and control VOTC regions
A general linear model contrast of words > nameable objects was used to localize core language regions by identifying the peak activation for this contrast in the left occipitotemporal sulcus (VWFA), the left planum temporale (Fig. 1c) of the posterior STG/Sylvian fissure (Wernicke's area), the pars opercularis of the left inferior frontal gyrus (Broca's area), and the left precentral gyrus (PCG). All of these regions were reliably identifiable at the group level (n = 33) in standardized anatomical space (AFNI TT_N27: x, y, z coordinates) as follows: VWFA: −52, −49, −13; Wernicke's area: −49, −40, 26; Broca's area: −46, 14, 23; precentral gyrus: −46, −10, 41 (Fig. 1a). However, previous work has demonstrated that individually localizing cortical regions improves, or is necessary for, identification of functional specificity (Glezer and Riesenhuber, 2013) and connectivity (Stevens et al., 2012, 2015). All of these language regions were reliably identified at the individual level, including the i-VWFA, in each of the 33 participants in their native anatomical volume-space and corresponding standard surface-space (Fig. 1b, Example Single Participant). Word-related activation was strongly left lateralized, as expected, with almost no significant activation in the right hemisphere at the group level (Fig. 1a) and, typically, minimal activation at the individual level (Fig. 1b). To determine whether the VWFA is specialized for real-word recognition, and has preferential functional connectivity with Wernicke's area and other core language regions, as we predicted, we localized several control regions in the VOTC for comparisons (see Fig. 3b), including the g-VWFA, as well as the individually localized left and right i-FFA, defined as the location of peak activation in the contrast faces > scenes in the left and right lateral fusiform gyrus, respectively. 
These control regions are very stringent, given that all of these ROIs are within very close proximity to one another (see Fig. 3b). Further, some have suggested that the left FFA and VWFA are not anatomically or functionally distinct (Behrmann and Plaut, 2013, 2014).
MVPA-RSA of word versus pseudoword and picture discrimination in the VWFA
During the second fMRI session ∼1 week later, participants were required to classify randomly intermixed words and pictures as natural, man-made, or neither, as quickly and accurately as possible, with each stimulus presented six times across the entire session to facilitate MVPA-RSA. This technique makes it possible to determine the extent to which a given cortical ROI can discriminate between individual stimuli or different categories of stimuli (Kriegeskorte et al., 2008; Kravitz et al., 2010, 2011b) based on the voxelwise pattern of activation within this region.
Pseudowords were pronounceable letter strings that respect the phonotactic rules of language, but have no lexical meaning (e.g., “zant”), and were matched with the words (concrete common nouns) on numerous parameters, including length, number of orthographic neighbors, and bigram frequency by position. Therefore, because pseudowords are visually and orthographically similar to real words, a strong difference in brain activation for these stimuli would be based on lexical, rather than perceptual or orthographic properties. The critical question was whether the i-VWFA could reliably discriminate between words and pseudowords better than between other categories of visual stimuli, and do so better than other category-preferring VOTC regions and the g-VWFA. One-sample t tests indicated that word/pseudoword and picture category discrimination were significantly better than chance in all ROIs (all p's < 0.001). Critically, however, the i-VWFA showed significantly better word/pseudoword discrimination than all of the control regions, and conversely, significantly worse picture discrimination than the left i-FFA and right i-FFA, as predicted. A 2 × 2 ANOVA with ROI (i-VWFA vs left i-FFA) and category discrimination (word/pseudoword vs picture category) revealed a significant crossover interaction (F(1,25) = 34.370, p < 0.0001), with the i-VWFA performing significantly better at word/pseudoword discrimination (t(23) = 3.579, p = 0.0014), but worse at picture-category discrimination (t(23) = −5.349, p < 0.0001) than the left i-FFA. A parallel analysis comparing the i-VWFA to the right i-FFA also demonstrated a highly significant crossover interaction (F(1,25) = 70.541, p < 0.0001), with the i-VWFA performing better at word/pseudoword discrimination (t(23) = 6.866, p < 0.0001), but worse at picture-category discrimination (t(23) = −4.720, p < 0.0001), than the right i-FFA. 
Finally, the i-VWFA performed better than the g-VWFA at both word/pseudoword (t(23) = 2.673, p = 0.0131) and picture-category discrimination (t(23) = 3.998, p = 0.0005; Fig. 2).
Our key MVPA-RSA finding is that the i-VWFA discriminates words from pseudowords and, importantly, does so more strongly than the control ROIs (multivariate effect). An association between this effect and any difference in the mean amplitude of the BOLD response for words versus pseudowords (univariate effect) across ROIs might be expected, given that they are measures of the same activity aggregated in different ways. To test for an association between the multivariate effect and any univariate effect across ROIs, and to quantify the unique contribution of the former when controlling for the latter, we conducted the following analyses.
First, simple linear regression, with the multivariate effect as the criterion variable and univariate effect as the predictor variable, revealed that the univariate effect was a significant predictor of the multivariate effect in all four ROIs, as expected, but not equally so across the ROIs, as it accounted for very different proportions of variance (adjusted R2) in the different ROIs, ranging from ∼13 to 46% of the variance (i-VWFA: adjusted R2 = 0.458; B = 0.44; t = 4.7, p < 0.001; g-VWFA: adjusted R2 = 0.132; B = 0.2; t = 2.19, p < 0.05; left FFA: adjusted R2 = 0.19; B = 0.16; t = 2.62, p < 0.05; right FFA: adjusted R2 = 0.227; B = 0.1; t = 2.89, p < 0.01). This indicates that although positively associated, the multivariate effect does not depend on the univariate effect.
Second, comparison of the effect sizes (Cohen's d) of the univariate and multivariate effects in each ROI revealed that the effect size was greater for the multivariate effect than the univariate effect in three of the four ROIs, including the i-VWFA (univariate: mean, 0.104; t = 8.08, p < 0.001; Cohen's d = 1.58; multivariate: mean, 0.068; t = 8.32, p < 0.001; Cohen's d = 1.63), the left i-FFA (univariate: mean, 0.036; t = 2.13, p < 0.05; Cohen's d = 0.417; multivariate: mean, 0.030; t = 5.36, p < 0.001; Cohen's d = 1.05), and the right i-FFA (univariate: mean, 0.016; t = 1.42, p < 0.; Cohen's d = 0.278; multivariate: mean, 0.011; t = 4.72, p < 0.001; Cohen's d = 0.927), but the opposite was true for the g-VWFA (univariate: mean, 0.081; t = 4.44, p < 0.001; Cohen's d = 0.871; multivariate: mean, 0.038; t = 4.09, p < 0.001; Cohen's d = 0.803). Thus, the relative effect sizes of the multivariate and univariate effects varied substantially across the ROIs, further suggesting that the effects are independent.
Finally, we conducted an ANCOVA of the multivariate effect across ROIs with the univariate effect as a covariate to assess whether the differential ability to discriminate words from pseudowords across the ROIs, as revealed by MVPA-RSA, was significant when controlling for the univariate effect across these ROIs. The covariate was significant (univariate effect: F(1,99) = 29.86, p < 0.001, partial η2 = 0.232). Importantly, however, the multivariate effect across ROIs was still significant, after controlling for the univariate effect (multivariate effect: F(3,99) = 5.77, p < 0.001, partial η2 = 0.149). Further, after controlling for the univariate effect, pairwise comparisons of the adjusted means of the multivariate effect across ROIs revealed that word/pseudoword discrimination was stronger in the i-VWFA (mean, 0.058; 95% confidence interval, 0.045–0.071) than all other ROIs: g-VWFA (mean, 0.033; 95% confidence interval, 0.021–0.045; p < 0.005); left i-FFA (mean, 0.035; 95% confidence interval, 0.023–0.047; p < 0.05); right i-FFA (mean, 0.020; 95% confidence interval, 0.008–0.033; p < 0.001). There were no significant differences in the adjusted means between the three control ROIs (all p's > 0.05).
These analyses demonstrate the following: (1) the multivariate and univariate effects are positively related, as expected; and (2) there are independent, statistically significant multivariate effects, after controlling for the univariate effects, consistent with previous work demonstrating that “manipulations affecting a single underlying cognitive dimension can be sensitively detected by MVPA measures that are, by definition, insensitive to the mean level of activation across voxels” (Davis et al., 2014). Thus, MVPA methods are more powerful than univariate voxelwise analyses for two reasons: First, they exploit voxel-level variability within subjects that is discarded in univariate analysis; second, they discard subject-level variability in mean activation, which in univariate analyses can reduce sensitivity (Davis et al., 2014).
RSFC of the VWFA with language regions
Previous work has demonstrated that the VWFA, when defined in standardized anatomical space using coordinates from previous literature, shows no significant or preferential RSFC with any language regions, but with dorsal attention network regions and/or other visual regions instead, in whole-brain voxelwise RSFC analyses (Vogel et al., 2012a; Zhou et al., 2015). However, the critical question is whether the i-VWFA shows preferential RSFC with language regions, and Wernicke's area in particular, relative to the g-VWFA and to immediately adjacent VOTC regions that show a preference for nonword categories. Therefore, as in previous work (Stevens et al., 2015), we used an ROI-based analysis to compare the RSFC strength of the i-VWFA with particular language ROIs, relative to that of two adjacent control regions in the same hemisphere, the g-VWFA and left i-FFA. These analyses were restricted to the left VOTC ROIs to eliminate potential confounds due to cross-hemispheric differences in RSFC. Whole-brain voxelwise RSFC of the VWFA in an example single participant showed strongest RSFC (r > 0.5) with all three core language regions: Wernicke's area, Broca's area, and PCG (Fig. 3a). Across all participants (n = 33), a 3 × 3 ANOVA with seed ROI (i-VWFA, g-VWFA, left i-FFA) and language ROI (Wernicke's, Broca's, PCG) revealed a highly significant main effect of seed ROI (F(2,64) = 5.933, p = 0.0043), driven by stronger RSFC between the i-VWFA and the language regions. Specifically, the i-VWFA showed stronger RSFC with Wernicke's area than did the left i-FFA (t(32) = 2.762, p = 0.0094) and the g-VWFA (t(32) = 2.073, p = 0.046), stronger RSFC with Broca's area than did the left i-FFA (t(32) = 2.375, p = 0.0237), and stronger RSFC with PCG than did the left i-FFA (t(32) = 2.162, p = 0.0382) and the g-VWFA (t(32) = 3.860, p = 0.0005; Fig. 3c).
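The logic of an ROI-based RSFC comparison can be illustrated as follows. This is a minimal sketch under stated assumptions, not the pipeline used in the study: it assumes RSFC strength is the Pearson correlation between the mean time series of two ROIs, that correlations are Fisher z-transformed before group statistics, and that seed ROIs are compared with a paired t test across participants. All function names and the toy numbers are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length time series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z(r):
    """Fisher r-to-z transform (assumed here; stabilizes variance of r)."""
    return math.atanh(r)


def paired_t(a, b):
    """Paired t statistic and degrees of freedom for two matched samples."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / math.sqrt(var / n), n - 1


# Toy per-participant RSFC values (hypothetical): each seed's correlation
# with one language ROI, already computed with pearson_r.
r_seed_a = [0.42, 0.35, 0.51, 0.28, 0.40]
r_seed_b = [0.30, 0.33, 0.37, 0.20, 0.31]

t, df = paired_t([fisher_z(r) for r in r_seed_a],
                 [fisher_z(r) for r in r_seed_b])
print(f"paired t({df}) = {t:.3f}")
```

Comparing seeds within participants in this way keeps each participant as their own control, which is why adjacent same-hemisphere ROIs make a stringent baseline.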
RSFC of VWFA with Wernicke's area predicts word-classification accuracy
To test our hypothesis that the strength of RSFC of the VWFA with Wernicke's area would be directly related to language skill, we correlated ROI-based RSFC strength (Fig. 4a) with accuracy on the semantic-classification task for words and pictures from multiple categories across participants. For each stimulus, whether picture or word, participants indicated whether it was natural (i.e., faces, animals, body parts, some scenes), man-made (i.e., tools, nonmanipulable objects, abstract objects, some scenes), or neither (i.e., pseudowords, scrambled images), as quickly and accurately as possible. There was a highly significant correlation between RSFC strength of the i-VWFA with Wernicke's area and accuracy on word classification only (Fig. 4b; r = 0.49, p = 0.0039), but no significant correlations with accuracy for any of the other categories, though the correlation with accuracy for pseudowords was marginal (Fig. 4c; r = 0.32, p = 0.0729; Table 1). The latter is not surprising, given that distinguishing pseudowords from words (i.e., a lexical decision task) relies on language skills. To directly test whether this brain–behavior correlation for word accuracy was significantly stronger than that for the nameable pictures, we first compared the VWFA–Wernicke's area RSFC correlation with accuracy for words to that with mean accuracy across all nameable picture categories (animals, body parts, tools, nonmanipulable objects, scenes; Fig. 4d; r = 0.215, p = 0.2292) using a two-tailed test of the difference between two dependent correlations with one variable in common (Steiger, 1980; Lee and Preacher, 2013). The correlation with accuracy for words was significantly higher than that with mean accuracy for all nameable pictures (z = 2.202, p = 0.028, two-tailed; Table 1). 
This predicted significant effect between words and all nameable pictures was followed up with one-tailed tests in the predicted direction: correlation with word accuracy > correlation with accuracy for each of the individual nameable picture categories. The VWFA–Wernicke's RSFC correlation with accuracy for words was significantly higher than that for all of the nameable picture categories (z = 1.673–3.667, p = 0.047–0.0001; Table 1; Fig. 4e,f), except body parts, which did not reach significance (z = 1.111, p = 0.133). The correlation with word accuracy did not significantly differ from the correlation with pseudoword accuracy (z = 1.156, p = 0.124), as expected.
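The comparison of two dependent correlations that share one variable can be sketched as follows. This is an illustrative implementation of Steiger's (1980) Z statistic using Fisher-transformed correlations and the pooled-correlation covariance term, which is our understanding of the test cited above. It requires the correlation between the two accuracy measures (r23), which is not reported in this passage; the value used in the example is a hypothetical placeholder, so the output is not expected to match the reported z values.

```python
import math


def steiger_z(r12: float, r13: float, r23: float, n: int):
    """Steiger's Z for two dependent correlations sharing variable 1.

    r12, r13: correlations of the shared variable with variables 2 and 3.
    r23: correlation between variables 2 and 3.
    Returns (z, two-tailed p) under the standard normal distribution.
    """
    z12, z13 = math.atanh(r12), math.atanh(r13)
    r_bar_sq = ((r12 + r13) / 2) ** 2
    # Covariance between the two Fisher-transformed correlations.
    psi = r23 * (1 - 2 * r_bar_sq) - 0.5 * r_bar_sq * (1 - 2 * r_bar_sq - r23 ** 2)
    cov = psi / (1 - r_bar_sq) ** 2
    z = (z12 - z13) * math.sqrt(n - 3) / math.sqrt(2 - 2 * cov)
    p_two_tailed = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_tailed


# r12 and r13 are taken from the text (words: 0.49; nameable pictures: 0.215).
# ASSUMPTION: r23 = 0.7 is a hypothetical placeholder, not a reported value.
z, p = steiger_z(r12=0.49, r13=0.215, r23=0.7, n=33)
print(f"z = {z:.3f}, two-tailed p = {p:.4f}, one-tailed p = {p / 2:.4f}")
```

The statistic grows as r23 increases, because two highly correlated accuracy measures make a given difference between their brain–behavior correlations less likely to arise by chance.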
Discussion
Using multiple fMRI techniques, we demonstrate that (1) the i-VWFA differentiates real words from pseudowords better than it differentiates among other categories of visual stimuli, and does so more strongly than adjacent VOTC regions; (2) the i-VWFA has preferential RSFC with Wernicke's area and other core language regions, relative to adjacent VOTC regions; and (3) the strength of i-VWFA-to-Wernicke's area RSFC is correlated with reading skill specifically. These findings strongly support the claim that the VWFA is specialized for lexical processing of written words in part because of its preferential intrinsic functional connectivity with Wernicke's area, which is consistent with the idea that experience-driven strengthening of RSFC is a contributing factor to category-related specialization of VOTC regions.
Functional specialization of VWFA for real words
There has been considerable debate regarding the nature of functional specialization within the VWFA. Some have argued that it responds preferentially to high spatial-frequency visual stimuli in general, rather than to words per se (Vogel et al., 2012b), and others have argued that it responds to a wide range of visual and nonvisual word and nonword stimuli, interacting with task demands as well (Price and Devlin, 2003). We suggest two limitations of studies upon which such conclusions are based: (1) there is considerable variability in the location and size of the VWFA across individuals; therefore, group-level whole-brain analyses in standardized anatomical space fail to capture the functional specificity of this region (Glezer and Riesenhuber, 2013); and (2) simple differences in the amplitude of BOLD response are used as a proxy for representational specificity, when in fact there are much more sensitive measures, such as repetition suppression/adaptation (Grill-Spector et al., 2006; Schacter et al., 2007; Stevens et al., 2008) and MVPA techniques (Kriegeskorte et al., 2006, 2008).
Few studies have used MVPA to analyze representational specificity in the VWFA (Liu et al., 2013; Boylan et al., 2014), and none have demonstrated that it can discriminate words from pseudowords. We argue that the best demonstration of specificity for real words is the ability to robustly discriminate words from pseudowords, given that they share the same phonotactic properties and, thus, differences at a sublexical level (e.g., visual properties or orthography) cannot explain differential patterns of response. Using fMRI rapid adaptation, Glezer et al. (2009) were the first to show that the i-VWFA discriminates between words and pseudowords. However, they did not demonstrate that these effects were unique to the VWFA, or that the VWFA is better at discriminating words from pseudowords than at discriminating among other categories of visual stimuli. While our MVPA-RSA results complement those of Glezer et al. (2009), they also represent a substantial advance, insofar as we demonstrate that the i-VWFA is significantly better at word/pseudoword discrimination, but not picture-category discrimination, than other category-preferring VOTC regions and the g-VWFA.
VWFA has preferential RSFC with Wernicke's area and other language areas
To our knowledge, we report the first evidence that the VWFA has preferential connectivity with Wernicke's area, perhaps the most critical cortical region for language comprehension (Nakada et al., 2001; DeWitt and Rauschecker, 2013). Using high-resolution RSFC analyses, we demonstrated that when individually defined, the VWFA had strong RSFC with left-lateralized core language regions, including Wernicke's area, Broca's area, and PCG, in whole-brain voxelwise analyses in the individual participants' native anatomical space (Fig. 3a). Using an ROI-based analysis, we also found that the VWFA had stronger RSFC with these regions than did stringent adjacent control regions in the VOTC across participants (Fig. 3b,c).
Several recent studies have explored the anatomical connectivity of the VWFA using diffusion-weighted MRI. Bouhali et al. (2014) reported that an ROI corresponding to the VWFA had strong connectivity with left-lateralized “perisylvian language-related regions,” including the frontal operculum and STG. Notably, however, no language regions were localized in this study, and the perisylvian STG region they identified encompassed only the very anterior-to-mid-STG, and clearly did not include the planum temporale or the broader posterior aspect of the STG typically associated with Wernicke's area. Fan et al. (2014) identified a single region near the VWFA peak that showed different connectivity patterns between typically developing (TD) and dyslexic children. While they concluded that this region showed stronger connectivity with “linguistic regions” in TD than in dyslexic children, this included only the middle and inferior temporal gyri, and language regions were not localized. Thus, these studies failed to identify robust/significant anatomical connectivity between the VWFA and Wernicke's area; in both studies, the VWFA was defined using group-level analyses or coordinates from previous literature, rather than individual localization.
Saygin et al. (2016) recently demonstrated that anatomical connectivity patterns determine/constrain the potential location of the VWFA before any functional specialization for words in this region. They showed that the location of word-preferential activation in the left VOTC of literate 8-year-olds could be predicted from their own individual anatomical “connectivity fingerprints” at age 5, before they could read and before showing any functional specificity for words in this region whatsoever. These results provide critical evidence that anatomical connectivity constrains the characteristic locations of functionally specialized VOTC regions, but importantly, that it does not drive functional specialization, given that not all individuals learn to read, and illiterate adults do not show word-preferential activation in the VOTC (Dehaene et al., 2010). However, a recent hypothesis proposes that RSFC plays a dynamic causal role in the development of functional specialization through a Hebbian-like mechanism, whereby repeated coupling of brain regions simultaneously engaged during task performance over time leads to selective long-term increases in RSFC among these regions (Stevens and Spreng, 2014). By this account, while anatomical connectivity constrains the probable location of potential category-related functional regions, it is experience-driven increases in sustained functional connectivity that actualize the functional specialization of these regions.
Several studies have explicitly investigated RSFC of the VWFA (Koyama et al., 2010, 2011; Vogel et al., 2012a; Li et al., 2013; Wang et al., 2015; Zhou et al., 2015; Chai et al., 2016). All of these studies defined the VWFA based on group-level analyses or coordinates from previous literature (e.g., meta-analyses), and none of these studies identified significant RSFC of the VWFA with Wernicke's area in whole-brain voxelwise analyses. The stark contrast of these findings to our results underscores the importance of taking individual differences into account by individually localizing functional ROIs, and the value of ROI-based RSFC analyses.
RSFC of VWFA with Wernicke's area predicts word-classification accuracy
There is now considerable evidence that RSFC strength across domain-specific and process-specific cortical circuits predicts individual differences in performance specific to those domains and processes: e.g., RSFC strength among face-preferential regions predicts performance across a range of tasks with faces, but not other categories (Zhu et al., 2011; O'Neil et al., 2014). Likewise, several studies have demonstrated that RSFC among various language regions is related to measures of language performance (Hampson et al., 2006; Koyama et al., 2011; Zhou et al., 2015; Chai et al., 2016). For example, stronger RSFC between Wernicke's and Broca's areas was correlated with reading performance (Koyama et al., 2011) and the ability to acquire a second language (Chai et al., 2016); reduced RSFC of the VWFA with middle frontal gyrus was associated with dyslexia (Zhou et al., 2015), while increased RSFC with mid-STG was associated with reading speed in a second language (Chai et al., 2016). However, we have argued that a primary constraint on the location of the VWFA within the left occipitotemporal sulcus might be its proximity and privileged connectivity with Wernicke's area, and further, that word specialization in the VWFA is related to its intrinsic functional connectivity with this region; here, we demonstrate that the VWFA has preferential connectivity with this critical region, and moreover, that the strength of this connectivity predicts performance on a classification task with words, but not other categories of stimuli. These results are consistent with the finding that anatomical disconnection (which would disrupt functional connectivity) between the ventral and lateral temporal cortex can result in alexia without agraphia (Welcome et al., 2014).
Conclusion
The VWFA is a category-preferring VOTC region in literate individuals that develops only through experience (Baker et al., 2007). Several studies now convincingly demonstrate that anatomical connectivity profiles can reliably identify the location of category-related functional regions in the VOTC (Saygin et al., 2011; Osher et al., 2016). Accordingly, the future location of the VWFA can be identified before the development of functional specificity for words in this region (Saygin et al., 2016), indicating that while anatomical connectivity constrains location, it does not drive specialization. While the evidence for training-dependent anatomical plasticity is tenuous (Thomas and Baker, 2013), evidence for the dynamic role of intrinsic functional connectivity, as revealed by RSFC analysis, in functional plasticity and its relationship to behavioral performance is abundant and compelling (Stevens and Spreng, 2014). Thus, task-driven functional coupling among regions critical for performance during skill acquisition may lead to sustained increases in RSFC among these regions over time, which in turn facilitates subsequent functional coupling and task performance, giving rise to functional specialization of cortical regions. The evidence we report here regarding the functional specificity and connectivity of the VWFA provides critical support for this hypothesis.
Footnotes
This study was supported by the Division of Intramural Research Programs, National Institute of Mental Health, National Institutes of Health (ZIA MH002588-26; clinical trials number: NCT00001360). We thank Ziad Saad, Bob Cox, and Daniel Glen for invaluable assistance with AFNI and SUMA analyses; Steve Gotts, Kelley Barnes, and Chris Baker for helpful discussions; and Naail Khan, Lily Solomon-Harris, and Jennifer Gabel for assistance preparing the manuscript.
The authors declare no competing financial interests.
Correspondence should be addressed to W. Dale Stevens, Department of Psychology, York University, 4700 Keele Street, Toronto, ON M3J 1P3, Canada. stevensd@yorku.ca