Abstract
How verbal and nonverbal visuoperceptual input connects to semantic knowledge is a core question in visual and cognitive neuroscience, with significant clinical ramifications. In an event-related functional magnetic resonance imaging (fMRI) experiment, we determined how cosine similarity between fMRI response patterns to concrete words and pictures reflects semantic clustering and semantic distances between the represented entities within a single category. Semantic clustering and semantic distances between 24 animate entities were derived from a concept-feature matrix based on feature generation by >1000 subjects. In the main fMRI study, 19 human subjects performed a property verification task with written words and pictures and a low-level control task. The univariate contrast between the semantic and the control task yielded extensive bilateral occipitotemporal activation from posterior cingulate to anteromedial temporal cortex. Entities belonging to the same semantic cluster elicited more similar fMRI activity patterns in left occipitotemporal cortex. When words and pictures were analyzed separately, the effect reached significance only for words. The semantic similarity effect for words was localized to left perirhinal cortex. According to a representational similarity analysis of left perirhinal responses, semantic distances between entities correlated inversely with cosine similarities between fMRI response patterns to written words. An independent replication study in 16 new subjects confirmed these findings. Semantic similarity is reflected by similarity of functional topography at a fine-grained level in left perirhinal cortex. The word specificity excludes perceptually driven confounds as an explanation and is likely to be task dependent.
Introduction
How does visuoperceptual input connect to semantic knowledge? For pictures, a representational space can be uncovered from functional magnetic resonance imaging (fMRI) response patterns in occipitotemporal cortex from which the identity and category membership of objects of different kinds can be decoded (Kriegeskorte et al., 2006; Kay et al., 2008; Haxby et al., 2011; Connolly et al., 2012). How a picture looks is tightly linked to what it represents (“its referent”), the properties of its referent, and to which other entities it is related. Different strategies have been adopted to try to disentangle stimulus-driven, perceptual factors that facilitate decoding success from conceptually driven, semantic factors (Connolly et al., 2012; Peelen and Caramazza, 2012). In contrast to pictures, the semantic content of written words is independent of how the words look. If a semantic similarity effect can be found for words, it can be safely assumed that this effect is not confounded by stimulus-driven perceptual effects, hence the importance of studying semantic effects in occipitotemporal cortex not only for pictures but also for words.
Multivariate pattern analysis (MVPA) studies that examined semantic processing of words and pictures within the same experiment have mainly focused on decoding the broad semantic category to which the words or pictures belong, such as tools and dwellings (Shinkareva et al., 2012); animals and tools (Simanova et al., 2012); or fruits, tools, clothes, mammals, and birds (Fairhall and Caramazza, 2013). When subjects perform a typicality rating of concrete words within their respective semantic categories and fMRI response patterns are used for training a classifier, response patterns for pictures in a subsequent test set allow the classifier to successfully assign pictures to the proper category (Fairhall and Caramazza, 2013). The regions that contribute to accurate cross-modal decoding are, among others, posterior middle temporal gyrus, precuneus, and ventral temporal cortex (Fairhall and Caramazza, 2013). As no correlation was obtained between semantic distances and the similarity of activity patterns in the ventral temporal regions, the semantic nature of the ventral temporal effect remained less firmly established by that study than was the case in, e.g., posterior middle temporal gyrus or precuneus (Fairhall and Caramazza, 2013). Univariate contrasts have revealed common activations for words and pictures during semantic tasks in mid and anterior fusiform cortex (Buckner et al., 2000; Bright et al., 2004; Visser and Lambon Ralph, 2011) and the anterior temporal pole (Vandenberghe et al., 1996; Rogers and McClelland, 2004; Lambon Ralph et al., 2010), making these regions prime candidates within occipitotemporal cortex for semantic processing.
In the current study of words and pictures, the emphasis was not on drawing boundaries between broad categories but on how semantic similarity within a single category (Connolly et al., 2012) is reflected in the cosine similarity of fMRI activity patterns in ventral occipitotemporal cortex. Remaining within one category reduces the perceptual confounds for the picture modality. The semantic clusters and distances were determined in a data-driven manner on the basis of a concept-feature matrix derived from extensive behavioral work (De Deyne et al., 2008).
Materials and Methods
Participants
Nineteen subjects (10 men, 9 women, between 19 and 26 years old) participated in the main fMRI study and 16 other subjects (5 men, 11 women, between 19 and 26 years old) in the replication study. All subjects were native Dutch speakers and strictly right-handed as tested by the Oldfield Inventory (Oldfield, 1971). The volunteers were free of psychotropic or vasoactive medication and had no neurological or psychiatric history. All participants gave written informed consent in accordance with the Declaration of Helsinki. The Ethics Committee of the University Hospitals Leuven approved the experimental protocol.
Stimuli
Stimulus presentation and response registration were controlled by a PC running Presentation 14.8 (Neurobehavioral Systems). Subjects viewed the stimuli on a mirror in front of them using a Barco 6400i projector at a frequency of 60 Hz and a resolution of 1024 × 768 pixels. Stimuli were projected against a black background.
Twenty-four animal stimuli were selected for the fMRI study as follows: a feature generation task was performed by 1003 college students for a total of 136 animal stimuli. Each student was presented with 10 stimulus words and asked to write down 10 features for each word, with the instruction emphasizing that different types of features had to be generated (e.g., perceptual, functional). A minimum of 180 features was gathered for each stimulus word. Next, the applicability of the most frequently generated features (n = 764) was scored for each animal by four different subjects, who were instructed to judge for each animal-feature pair whether or not the feature characterized the entity (feature applicability judgment task; De Deyne et al., 2008). The feature applicability judgment task avoids an availability bias, i.e., it corrects for the fact that some features are more salient for one entity than for others, because it requires that all properties are verified for all entities, regardless of whether the feature was generated for the specific animal or not. This matrix, with rows corresponding to objects and columns to semantic features (concept-feature matrix), was used to derive a similarity matrix by computing the cosine similarities between each pair of rows. These cosine similarity values are a measure of the semantic distances between entities: a high cosine similarity between two entities reflects a short semantic distance between them. Note that such a similarity matrix derived from the concept-feature matrix reflects all kinds of features (encyclopedic, sensory, etc.). For the current study, 24 animal stimuli were selected in a way that ensured broad coverage of the animal category. We applied bottom-up hierarchical clustering to the similarity matrix of these 24 stimuli using Ward's method. Hierarchical clustering revealed that the 24 animal stimuli could be represented as six semantic clusters (silhouette coefficient = 0.79; Fig. 1).
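For illustration, this stimulus selection step can be sketched in Python as follows (a minimal sketch with a randomly generated stand-in for the real 24 × 764 concept-feature matrix; the original analysis was performed in MATLAB, and all variable names here are hypothetical):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist, squareform
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Stand-in for the binary concept-feature matrix (rows = animals, columns = features).
X = (rng.random((24, 764)) < 0.2).astype(float)

# Pairwise cosine similarity between entities; cosine distance = 1 - similarity.
cos_dist = pdist(X, metric="cosine")          # condensed distance vector
similarity = 1.0 - squareform(cos_dist)       # 24 x 24 similarity matrix

# Bottom-up hierarchical clustering with Ward's method, cut into six clusters.
# (Strictly, Ward's method assumes Euclidean distances; here it is applied to
# cosine distances, following the paper's description.)
Z = linkage(cos_dist, method="ward")
labels = fcluster(Z, t=6, criterion="maxclust")

# Silhouette coefficient of the six-cluster solution (0.79 for the real data).
print(silhouette_score(squareform(cos_dist), labels, metric="precomputed"))
```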
Word length of the stimuli was between 3 and 11 characters. We matched lexical parameters (word frequency (Baayen et al., 1993), familiarity, age of acquisition, and imageability) as closely as possible between the six semantic clusters (Cree and McRae, 2003; Table 1). The similarity matrix derived from the concept-feature matrix did not correlate with the pairwise absolute differences in any of the lexical parameters (Table 1). Words were presented with a letter size of 0.7 visual degrees.
For each noun, a prototypical color photo was selected. Picture size was 5.1 × 5.1 visual degrees. In the control condition we also used consonant letter strings and scrambled pictures. The consonant letter strings were created by randomizing the position of the letters from the word stimuli and replacing the vowels with consonants according to a fixed rule. Scrambled pictures were created by dividing the picture stimuli into squares of 0.7 × 0.7 visual degrees and randomizing the position of these parts (Fig. 2B). The entities from which these scrambled pictures were derived could not be identified, as ascertained in an independent group of five subjects.
Experimental task design
In the property verification task, subjects had to respond whether a given property was applicable to a given animal. We used eight properties, both sensory and nonsensory, which were selected from the behavioral data (De Deyne et al., 2008) and were among the most frequently generated by the 1003 students. These properties were as follows: “large,” “legs,” “wings,” “smooth,” “exotic,” “mammal,” “sea,” and “flies.” The properties were selected in such a way that each of them applied to approximately half of the entities and that for each entity the number of positive and negative correct responses was distributed evenly.
During the fMRI experiment, subjects performed the property verification task and a low-level control task (Fig. 2). At the start of each trial, the fixation point changed from white to red for 500 ms. Next an animal stimulus (picture or word modality) was presented foveally for 750 ms against a black background, followed by a backward mask of 40 ms. After a delay of 1 s, a probe question was presented: a property was displayed in writing followed by a question mark for a duration of 150 ms. Each entity was presented eight times, four times as a word and four times as a picture. Each of the eight properties was presented exactly once for each entity. Subjects had to determine whether the property applied to the animal and provide a yes–no response via key press before the start of the next trial. Following the probe word, a square white fixation point (0.6 visual degrees) was displayed for 6060 ms. The long interstimulus interval of 8500 ms was chosen to minimize the need for deconvolution of the hemodynamic response to each stimulus.
In the low-level control condition, a scrambled picture (7.5 × 7.5 visual degrees) or a consonant letter string was shown for 750 ms, followed by a backward mask of 40 ms. After a delay of 1 s, a probe question was presented, consisting of the printed word “word” or “photo” followed by a question mark. Subjects had to select between two key response options depending on whether the sample stimulus originated from a word or a picture. We also included null events, during which subjects simply maintained fixation on the central fixation point for 8500 ms.
Each subject underwent a total of six runs. Each run (255 scans) contained 32 property verification trials, 16 control trials, and 12 null trials. Each run contained an equal number of word and picture trials. Over the total of six runs, each animal was presented in eight property verification trials, four times as a word and four times as a picture, such that each animal was combined exactly once with each feature. In each run, one-third of the 24 animals were presented twice and two-thirds were presented once. The stimuli that were presented only once served as input to generate the stimuli of the low-level control task for that run.
Statistical analysis was performed using the Statistics Toolbox of MATLAB 2011b. We used the Wilcoxon rank sum test to compare reaction times between the property verification and the control trials and between property verification trials using words and pictures. We also compared the reaction times between the eight different features. Using a two-way ANOVA, we evaluated whether there was an interaction between feature and input modality (word or picture).
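Purely by way of illustration, these behavioral comparisons could be set up as follows in Python (the actual analysis used the MATLAB Statistics Toolbox; the simulated reaction times, the DataFrame layout, and all names are assumptions):

```python
import numpy as np
import pandas as pd
from scipy.stats import ranksums
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
# Hypothetical per-trial reaction time table.
df = pd.DataFrame({
    "rt": rng.normal(1200, 250, 304),
    "task": rng.choice(["verification", "control"], 304),
    "modality": rng.choice(["word", "picture"], 304),
    "feature": rng.choice([f"f{i}" for i in range(8)], 304),
})

# Wilcoxon rank sum test: property verification vs control reaction times.
stat, p = ranksums(df.loc[df.task == "verification", "rt"],
                   df.loc[df.task == "control", "rt"])
print(stat, p)

# Two-way ANOVA: interaction between probed feature and input modality.
model = smf.ols("rt ~ C(feature) * C(modality)",
                data=df[df.task == "verification"]).fit()
print(sm.stats.anova_lm(model, typ=2))
```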
MRI acquisition
Structural and functional images were acquired on a 3 T Philips Intera system (Best) equipped with an 8-channel head volume coil. Structural imaging sequences consisted of a T1-weighted 3D turbo-field-echo sequence (repetition time = 9.6 ms, echo time = 4.6 ms, in-plane resolution = 0.97 mm, slice thickness = 1.2 mm). Functional images were obtained using T2* echoplanar images comprising 36 transverse slices (repetition time = 2 s, echo time = 30 ms, voxel size 2.75 × 2.75 × 3.75 mm3, slice thickness = 3.75 mm, Sensitivity Encoding (SENSE) factor = 2), with the field of view (220 × 220 × 135 mm3) covering the entire brain. Each run was preceded by four dummy scans to allow for saturation of the blood oxygenation level-dependent (BOLD) signal. Eye position was monitored using an Applied Sciences Laboratory infrared system (ASL 5000/LRO system).
Image processing
Preprocessing of the fMRI data (spatial realignment, slice time correction, coregistration, and normalization with a voxel size of 3 × 3 × 3 mm3) was performed using Statistical Parametric Mapping (SPM8; Wellcome Trust Centre for Neuroimaging, London, UK). All images were smoothed (Op de Beeck, 2010) using a 5 × 5 × 7 mm3 Gaussian kernel. From these images, the fMRI response pattern was derived by calculating the area under the curve of the BOLD response within every voxel between 2 and 8 s after trial onset. Standard SPM8 modeling was used to remove covariates of no interest (motion regressors, low-frequency trends). This procedure resulted in a 3D activation map for each stimulus presentation, containing the fMRI response pattern.
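The per-trial response estimate amounts to a voxelwise area under the curve; a minimal Python sketch, assuming a preprocessed, residualized time-by-voxel array sampled at TR = 2 s (array and function names are illustrative, not part of the SPM8 pipeline):

```python
import numpy as np

def trial_pattern(run: np.ndarray, onset_scan: int, tr: float = 2.0) -> np.ndarray:
    """Area under the BOLD curve per voxel, 2-8 s after trial onset.

    run: (time x voxel) array of preprocessed, residualized BOLD signal.
    onset_scan: scan index of trial onset.
    """
    start = onset_scan + int(round(2.0 / tr))   # first sample, 2 s after onset
    stop = onset_scan + int(round(8.0 / tr))    # last sample, 8 s after onset
    window = run[start:stop + 1, :]             # inclusive window of samples
    return np.trapz(window, dx=tr, axis=0)      # AUC per voxel = pattern vector
```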
Univariate contrasts
The main purpose of the univariate contrast was to define an occipitotemporal volume of interest for the MVPA based on the contrast of property verification and control trials across modalities. We modeled the fMRI data using a general linear model with the five event types: property verification trials for words and pictures, control trials for words and pictures, and null events. A random-effects analysis with time derivative was performed. We determined the main effect of task: [property verification for words + property verification for pictures] − [control task for consonant letter strings + control task for scrambled pictures] (contrast 1). The significance level was set at a voxel-level threshold of uncorrected p < 0.001 with a cluster-level inference of p < 0.05 corrected for the whole-brain volume (Poline et al., 1997). Each of the occipitotemporal clusters that reached significance in the group analysis for contrast 1 was used as a volume of interest (VOI) for multivoxel pattern analysis. For all subjects the same VOIs derived from the group analysis were used.
For the sake of completeness, we will also report the main effect of stimulus modality ([property verification for words + control task for consonant letter strings] − [property verification for pictures + control task for scrambled pictures] (contrast 2), and inverse) and the interaction effect between task and stimulus modality ([property verification for pictures − control task for scrambled pictures] − [property verification for words − control task for consonant letter strings] (contrast 3), and inverse).
Multivariate analysis
For each trial, a vector was constructed in a high-dimensional space, with the activity levels in each voxel as the elements of the vector and the dimensionality of this vector equal to the number of voxels examined. The cosine similarity between two vectors is the cosine of the angle formed by the vectors. When cosine similarity equals 1, the patterns are identical, save a possible scaling factor (Mitchell et al., 2008). Per subject, the cosine similarity of the vectors was calculated for each possible pair of trials within the a priori defined VOIs. The matrix containing the pairwise cosine similarity between every two trials is the similarity matrix.
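In code, this computation reduces to a normalized dot product; a Python sketch, with `patterns` an assumed trials-by-voxels array of the per-trial response estimates described above:

```python
import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine of the angle between two activation vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def similarity_matrix(patterns: np.ndarray) -> np.ndarray:
    """Trial-by-trial matrix of pairwise cosine similarities."""
    normed = patterns / np.linalg.norm(patterns, axis=1, keepdims=True)
    return normed @ normed.T   # entry (i, j) = cosine similarity of trials i and j
```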
Nonparametric comparisons of cosine similarity between semantic clusters.
We determined the cosine similarities of fMRI response patterns between pairs of trials belonging to the same semantic cluster. For each possible pair of trials within each semantic cluster, we determined the cosine similarity between the fMRI response patterns in each subject. This included word–word, picture–picture, and word–picture pairs. We then averaged the cosine similarities over all pairs within a semantic cluster in each subject. The group average of these values over the entire group of 19 subjects will be called the average cosine similarity (ACS). To evaluate whether the ACS for entities belonging to the same semantic cluster differed from chance, we compared the ACS to those obtained based on random permutation labeling, with 10,000 random permutations. Random permutation labeling is a nonparametric test: within each subject a random label is assigned to every trial, such that no label recurs. Next, cosine similarity is recalculated with these random labels for each subject. ACS within semantic clusters was then compared with the distribution of the ACS obtained with these random labels. We used a one-tailed statistical threshold of p ≤ 0.05 uncorrected (Bruffaerts et al., 2013).
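A per-subject Python sketch of this permutation test follows (in the actual analysis the ACS was first averaged across the 19 subjects and only then compared with the permuted distribution; `sim` and `cluster_ids` are assumed inputs):

```python
import numpy as np

def within_cluster_acs(sim: np.ndarray, cluster_ids: np.ndarray) -> float:
    """Mean cosine similarity over all trial pairs sharing a semantic cluster."""
    iu = np.triu_indices(len(cluster_ids), k=1)     # all unordered trial pairs
    same = cluster_ids[iu[0]] == cluster_ids[iu[1]]
    return sim[iu][same].mean()

def permutation_p(sim, cluster_ids, n_perm=10_000, seed=0):
    """One-tailed p-value from random permutation labeling of the trials."""
    rng = np.random.default_rng(seed)
    observed = within_cluster_acs(sim, cluster_ids)
    null = np.array([within_cluster_acs(sim, rng.permutation(cluster_ids))
                     for _ in range(n_perm)])
    return observed, (null >= observed).mean()
```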
Second, for each possible pair of word trials within each semantic cluster, we determined the cosine similarity between the fMRI response patterns in each subject. We then averaged the cosine similarities over all these pairs within each subject. To evaluate whether the ACS for word pairs belonging to the same semantic cluster differed from chance, we compared the ACS to those obtained based on random permutation labeling of word trials, with 10,000 random permutations. We applied the same procedure to each possible pair of picture trials within each semantic cluster and compared the ACS to those obtained based on random permutation labeling of picture trials.
We also directly compared the cosine similarities obtained for words within semantic clusters with those obtained for pictures within semantic clusters: to this purpose, we calculated the rank of the results in the random permutation labeling distribution of words and pictures for each subject. Next, we compared the ranking of pictures versus words across subjects by means of a Wilcoxon signed-rank test and set the threshold for significance at p ≤ 0.05.
Finally, we evaluated possible transmodal effects: for each possible pair consisting of one word and one picture within each semantic cluster, we determined the cosine similarity between the fMRI response patterns in each subject. To evaluate whether the average cosine similarity within semantic clusters differed from chance for these word–picture pairs, we compared these ACS to those obtained based on random permutation labeling of word–picture pairs, in the same manner as outlined above.
Representational similarity analysis.
In volumes where we found an effect of semantic clustering on the fMRI activity pattern, we evaluated whether the semantic distances between entities (De Deyne et al., 2008) correlated with the cosine similarity between fMRI response patterns to these entities (representational similarity analysis, RSA; Kriegeskorte et al., 2008). We calculated the cosine similarity between (1) the entity-by-entity matrix of cosine similarities based on the behavioral feature generation data (De Deyne et al., 2008) and (2) the entity-by-entity matrix of cosine similarities based on the fMRI data. The significance of the result was determined by 10,000 random permutation labelings of the second matrix. The similarity matrix between entities based on behavioral data was derived from the cosine similarities between the rows of the concept-feature matrix (De Deyne et al., 2008). The similarity matrix between entities based on fMRI data was calculated by averaging the cosine similarity values over all trial pairs representing the same entities across all 19 subjects. The RSA was applied separately to the entity-by-entity matrices of cosine similarities based on fMRI data for words, for pictures, and for words and pictures combined.
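Schematically, the RSA reduces to a second-order cosine similarity between the two entity-by-entity matrices, with an entity-relabeling permutation test; a Python sketch assuming 24 × 24 inputs `S_beh` (behavioral) and `S_fmri` (fMRI), with illustrative names:

```python
import numpy as np

def second_order_similarity(S_beh: np.ndarray, S_fmri: np.ndarray) -> float:
    """Cosine similarity between the off-diagonal parts of two similarity matrices."""
    iu = np.triu_indices(S_beh.shape[0], k=1)
    a, b = S_beh[iu], S_fmri[iu]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def rsa_permutation_p(S_beh, S_fmri, n_perm=10_000, seed=0):
    """Permutation test that randomly relabels the entities of one matrix."""
    rng = np.random.default_rng(seed)
    observed = second_order_similarity(S_beh, S_fmri)
    null = np.empty(n_perm)
    for i in range(n_perm):
        order = rng.permutation(S_fmri.shape[0])
        null[i] = second_order_similarity(S_beh, S_fmri[np.ix_(order, order)])
    return observed, (null >= observed).mean()
```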
Replication study.
The replication study had as its sole purpose to assess the replicability of the main study's principal findings in an independent sample of 16 new subjects. The experimental design, the preprocessing steps, and the analytical approach in the replication study were identical to those in the main study.
Results
Behavioral data
The responses of the 19 volunteers in our main experiment agreed with the feature applicability judgments of De Deyne et al. (2008) on 93.26% of all property verification trials. Agreement was lowest within the semantic cluster of herpetofauna (90.03%) and highest within the cluster of insects (96.58%). Reaction times to property verification trials were significantly slower than to the control task (control task mean: 1007 ms, SD 237 ms; Wilcoxon rank sum test: W = 472, p = 0.003). During the property verification trials, there were no differences between word and picture trials (words: mean 1304 ms, SD 303 ms; pictures: mean 1238 ms, SD 256 ms; Wilcoxon rank sum test: W = 347, p = 0.502). Reaction times depended on the feature that was being probed (Wilcoxon rank sum test: W = 472, p = 0.002): according to a post hoc analysis, responses to the properties “has wings” (mean: 1078 ms) and “flies” (mean: 1092 ms) were significantly faster than to the property “exotic” (mean: 1480 ms). There was no interaction effect between the property being probed and the input modality (two-way ANOVA: F(7,288) = 0, p = 1).
Univariate fMRI contrasts
Compared with the low-level control trials, the property verification trials led to significant activation of a large bilateral occipitotemporal activity cluster (contrast 1; extent (ext.): 1413 voxels of 3 × 3 × 3 mm3; Table 2, Fig. 3A). We divided the occipitotemporal activity cluster into a left (Fig. 4A) and a right (Fig. 5A) occipitotemporal VOI, leaving out voxels on the midline (x = 0). The left-sided activity cluster contained 804 voxels and the right-sided cluster 468 voxels; these two clusters were used as the VOIs for the cosine similarity analysis.
Compared with the low-level control trials, the property verification trials also activated left pars triangularis (ext.: 306 voxels), right pars orbitalis (ext.: 50 voxels), left angular gyrus (ext.: 53 voxels), and left superior frontal gyrus (anterior portion, ext.: 43 voxels; supplementary motor area, ext.: 58 voxels) (cluster-level corrected p < 0.05; Table 2, Fig. 3A), but these regions outside occipitotemporal cortex will not be analyzed further in the current report. For completeness, we also report the main effect of input modality (contrast 2) and the interaction effect between input modality and task (contrast 3).
There was a significant main effect of input modality, with higher activity for pictures and scrambled pictures than for words and consonant letter strings in primary visual cortex, middle occipital gyri, and posterior fusiform cortex bilaterally (contrast 2; ext.: 3388, Z = 6.66). The inverse contrast did not yield any significant differences at the preset threshold. The interaction between input modality and task was significant: compared with words, property verification with pictures activated the left and right posterior fusiform cortex (contrast 3; Fig. 3B; right ext.: 339, Z = 5.55; left ext.: 384, Z = 5.15).
We evaluated the temporal signal-to-noise ratio (TSNR) in anterior temporal cortex given the well-known magnetic susceptibility artifact in this region. We calculated the TSNR by dividing the mean of the time series by its SD (Murphy et al., 2007). In perirhinal cortex the TSNR ranged between 117 and 187 across our 19 subjects, values that are well within a proper sensitivity range (Murphy et al., 2007), while more anterolaterally, e.g., within the middle part of the left temporal pole, the TSNR ranged between 42 and 100.
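The TSNR computation itself is a one-liner; a Python sketch, with `timeseries` an assumed time-by-voxel array from one run:

```python
import numpy as np

def tsnr(timeseries: np.ndarray) -> np.ndarray:
    """Voxelwise temporal SNR: mean of the time series divided by its SD."""
    return timeseries.mean(axis=0) / timeseries.std(axis=0, ddof=1)
```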
Cosine similarity within semantic clusters
In the left occipitotemporal VOI (Fig. 4A), average cosine similarity between fMRI response patterns to entities belonging to the same semantic cluster was significantly higher than chance (ACS = 0.023, p = 0.050; Table 3). In the right occipitotemporal VOI (Fig. 5A) this effect did not reach significance (ACS = 0.019, p = 0.066; Table 3). When the semantic similarity effect was determined for each input modality separately, a significant effect was found for words in the left occipitotemporal VOI (ACS = 0.017, p = 0.045; Fig. 4D), but not in the right occipitotemporal VOI (ACS = 0.008, p = 0.255; Fig. 5D). No semantic similarity effects were found for pictures in left or right occipitotemporal cortex at the prespecified threshold (Table 3, Figs. 4C, 5C).
To localize the semantic similarity effect for words in the left occipitotemporal VOI in further detail, we divided this VOI a priori into five equal parts (Fig. 6A,B) of ∼160 contiguous voxels each, along the anteroposterior axis (Table 4; see the sketch below). Of the five subdivisions, only the left perirhinal cortex and adjacent left anteromedial fusiform cortex (center coordinate: x = −26, y = −25, z = −14; Fig. 4E) exhibited significantly higher similarity between fMRI response patterns for words belonging to the same semantic cluster compared with random permutation labeling (ACS = 0.010, p = 0.020; Table 4, Fig. 4H). The ranking for words was significantly higher than that for pictures in this region (p = 0.040; Fig. 4, compare G,H). This left perirhinal volume of interest consisted of Brodmann areas 35 and 36, as derived from PickAtlas (Maldjian et al., 2003), and lay medial to the lateral bank of the collateral sulcus (Insausti et al., 1998; Fig. 6C). To estimate the effect of smoothing, we reanalyzed our data without smoothing and obtained essentially the same result (ACS for words in left perirhinal cortex = 0.004, p = 0.047).
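The a priori subdivision can be sketched as follows (Python; `coords` is an assumed 804 × 3 array of voxel coordinates in mm, with MNI y increasing from posterior to anterior):

```python
import numpy as np

def split_along_y(coords: np.ndarray, n_parts: int = 5) -> list:
    """Split VOI voxels into equal partitions along the anteroposterior axis."""
    order = np.argsort(coords[:, 1])        # voxel indices, posterior to anterior
    return np.array_split(order, n_parts)   # ~160 voxel indices per partition
```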
To evaluate to which degree our findings were specific to the left occipitotemporal activity cluster, we applied the same procedure to the right-sided activity cluster, even though the overall effect of semantic similarity did not reach significance there. When the right occipitotemporal activity cluster was divided into five equal partitions, no semantic similarity effects were found when words or pictures were analyzed separately (Table 5). When we directly compared cosine similarity for words belonging to the same semantic cluster between the left and right perirhinal volumes, the effect tended to be higher in the left perirhinal volume than in the right (p = 0.059).
No transmodal effects were observed in left or right occipitotemporal cortex based on the average cosine similarity between all word–picture pairs (Table 3).
Representational similarity analysis
In left perirhinal cortex, cosine similarity between word pairs based on the concept-feature matrix (semantic distances) correlated significantly with the cosine similarity of the fMRI response patterns to these word pairs: the cosine similarity between the cosine similarities derived from the concept-feature matrix and the cosine similarities derived from the fMRI responses, respectively, was 0.156. Random permutation labelings demonstrated that this similarity in structure was higher than chance (p = 0.042; Fig. 7). When words and pictures were pooled, the cosine similarity between the cosine similarities from the concept-feature matrix and the cosine similarities based on the fMRI responses was also significant in this region (cosine similarity: 0.540, p = 0.015). For pictures separately, we did not find any effects (cosine similarity: 0.524, p = 0.174).
Replication study
In the main study, cosine similarity for words belonging to the same semantic cluster was significantly increased in left perirhinal cortex and, according to the RSA, cosine similarity between response patterns to words correlated inversely with the semantic distances between these words. To evaluate the replicability of these two findings, we acquired fMRI data in an additional set of 16 new subjects using exactly the same paradigm and evaluated semantic similarity effects for words within the same left perirhinal VOI. We replicated our main findings in this novel dataset: average cosine similarity between fMRI response patterns to entities belonging to the same semantic cluster was significantly higher than chance for words in left perirhinal cortex (ACS: 0.020, p = 0.047).
RSA in the independent sample confirmed that the cosine similarity between the cosine similarities derived from the concept-feature matrix and the cosine similarities derived from the fMRI response patterns for written words was significantly increased in left perirhinal cortex (cosine similarity: 0.414, p = 0.008).
Discussion
An effect of semantic similarity specifically for words was present in left perirhinal cortex and the adjacent left anteromedial fusiform cortex (Table 4, Fig. 4H). This was demonstrated by the increase in cosine similarity of fMRI response patterns when words belonged to the same semantic cluster (Fig. 4H). It was also evident from the significant second-order cosine similarity between the similarity matrix derived from the concept-feature matrix and the similarity matrix based on the fMRI response patterns (RSA; Kriegeskorte et al., 2008; Fig. 7). These two principal novel findings from the main study were confirmed in the replication study in 16 new participants.
To limit the number of comparisons, we restricted our VOI to ventral occipitotemporal activity clusters obtained from the contrast between the property verification condition and the low-level baseline condition across input modalities, words, or pictures (Fig. 3A). The way in which we defined the VOI is statistically independent of the similarity analysis of the responses to the property verification trials. The contrast revealed a number of activations that have been reported in a wide variety of studies of semantic processing (for review, see Binder et al., 2009; Price, 2012). The visual word form area (Cohen et al., 2000) was not part of the activity pattern obtained by contrasting property verification to the low-level control condition. This is most likely due to the use of words in the probe question of the control condition, so that effects related to sublexical or lexical-orthographical processing were subtracted out.
In absolute terms, the cosine similarity between pairs of pictures belonging to the same semantic cluster was higher than that between pairs of words, yet the effect of semantic cluster was only significant for words (Table 4). Statistical significance depends not only on the values for pairs belonging to the same semantic cluster but also on the values obtained after random permutation labeling. The latter values were higher for pictures (Fig. 4G) than for words (Fig. 4H), indicating that cosine similarity of activity patterns was generally higher for pictures than for words, regardless of semantic clustering. A high cosine similarity between random pairs may be a consequence of the visual similarity between pictures within the animate category. The comparison with random pairs effectively serves as a control for nonsemantic sources of differences in cosine similarity between stimuli.
In posterior fusiform cortex, we found relatively weak semantic similarity effects for pictures or when words and pictures were pooled (Table 4). Numerous previous studies have reported category effects in posterior fusiform cortex but almost always used far more widely separated categories than we did (e.g., faces vs. houses, Haxby et al., 2001; or animals vs. tools, Chao et al., 1999). Hidden perceptual differences between the pictures belonging to different semantic clusters could have contributed to the subthreshold posterior fusiform semantic similarity effects we observed (Bruffaerts et al., 2013). A semantic similarity effect for words is much less likely to be driven by hidden perceptual confounds, since the relationship between the visual form of a word and its meaning is entirely arbitrary (Price et al., 2003; Devlin et al., 2005). Hence, in the remainder of the discussion we will mainly focus on semantic similarity effects that were present for words.
In left perirhinal cortex and the adjacent left anteromedial fusiform cortex, we obtained a significant semantic similarity effect for words belonging to the same semantic cluster in both the main study (Fig. 4H) and the independent replication study. The semantic similarity effect was significantly stronger when entities were presented as words than as pictures (Table 4, Fig. 4, compare G,H). This differs from what we would have concluded purely on the basis of the univariate contrast: aggregate response amplitudes in this region were equally high for the property verification task with words as with pictures, which could have suggested an amodal semantic effect (Fig. 4F). Other examples have been reported of how MVPA may reveal the distinctive nature of activity patterns where voxelwise contrasts between response amplitudes would suggest commonality (Nestor et al., 2013).
Both amodal and word-specific effects of semantic processing have been reported in anteromedial temporal cortex (Bright et al., 2004). A preferential involvement, however, for words as input appears to be relatively consistent in this region between studies of semantic processing (Bright et al., 2004; Chan et al., 2011; Visser and Lambon Ralph, 2011). Cortical surface electrode recordings have revealed category-selective responses to written and auditory words representing animals and objects in human anteroventral temporal areas including inferotemporal, perirhinal, and entorhinal cortices (Chan et al., 2011).
Other studies have found semantic effects in this region also for pictures (Bright et al., 2004; Liu et al., 2009; Peelen and Caramazza, 2012). Our perirhinal region partly overlaps with a ventral temporal region obtained in a recent study of cross-modal classification (Fairhall and Caramazza, 2013). Our results differ from that study in two respects. First, we did not obtain cross-modal similarity effects. In contrast to Fairhall and Caramazza (2013), our stimuli were restricted to the animate category. The task was also different: a typicality rating (Fairhall and Caramazza, 2013) may require more explicit access to the semantic content of the pictures, whereas a property verification task with pictures may rely relatively more on structural description processing, reducing the need to access the semantic memory system in full. In contrast, for written words, access of the orthographic representation to the semantic system is obligatory before sensory features of the referent can be retrieved. As a second difference compared with Fairhall and Caramazza (2013), cosine similarity between entities as derived from the concept-feature matrix correlated with cosine similarity between fMRI response patterns to the corresponding words. The difference between the two studies is therefore not merely one of overall sensitivity. It is plausible that the degree to which this region reflects semantic similarity for words or pictures is dynamically influenced by the exact task performed (Taylor et al., 2012; Mano et al., 2013). Alternatively, we may have missed a similarity effect for pictures belonging to the same semantic cluster because similarity between fMRI response patterns was already high even for random pairs (in contrast to what we found for words). This could have made it harder to isolate any additional effect specifically due to semantic similarity, all the more so if the high cosine similarity between random pairs of pictures is a consequence of the high visual similarity between pictures, given the close link between visual and semantic similarity (Dilkina and Lambon Ralph, 2012).
Which cognitive process could be fulfilled by this perirhinal region? As the semantic content of concrete entities must be processed at an increasingly fine-grained level, activation progresses more anteriorly in the occipitotemporal pathway (Bright et al., 2005). Within the conceptual structure account model (Bright et al., 2005; Taylor et al., 2011), the perirhinal cortex plays a particular role in the integration of multiple properties that are represented in a distributed manner over the cortex (Bright et al., 2005; Devlin and Price, 2007; Holdstock et al., 2009; Tyler et al., 2013), analogous to the associative role of the hippocampus in episodic memory. The relationship between semantic similarity and similarity in activity patterns in left anteromedial temporal cortex can also be reconciled with the “similarity in topography” (SIT) model (Damasio, 1989; Simmons and Barsalou, 2003), but only if “topography” is defined at the fine-grained level of activity patterns within neuroanatomical regions. An influential functional-anatomical theory of semantic memory, the semantic hub theory, emphasizes that semantic processing of concrete entities requires convergence of information into anatomical hubs that mediate the construction of multimodal coherent concepts (Rogers and McClelland, 2004; Binney et al., 2012; Visser et al., 2012). Our findings reveal that left perirhinal response patterns reflect the semantic similarity structure as derived from the concept-feature matrix, which is compatible with a role as semantic hub within this theory. Apart from left perirhinal cortex, inferolateral and basal anterior temporal regions may also serve as semantic hubs (Binney et al., 2012), but the TSNR in these regions was lower than in perirhinal cortex.
The role of left anteromedial fusiform cortex in word semantics is also relevant from a clinical standpoint. In a study of semantic dementia, metabolism in this left-sided region correlated with picture naming and verbal fluency, while metabolism in the homotopic right anterior fusiform correlated with nonverbal associative-semantic processing (Mion et al., 2010). Semantic dementia is also associated with pronounced volume loss in this region (Binney et al., 2010). In aphasic stroke patients, activity in perirhinal cortex during semantic decisions on verbal input was lower than in controls (Sharp et al., 2004).
To conclude, our findings confirm the role of perirhinal cortex in semantic processing and provide strong support for neuroanatomical models implicating anteromedial temporal cortex in semantic similarity (Bright et al., 2005; Binney et al., 2012).
Footnotes
R.B. is a PhD fellow of the Research Foundation Flanders (F.W.O.) and R.V. is a Senior Clinical Investigator of the F.W.O. This work was funded by F.W.O. Grant G0660.09, KU Leuven Grants OT/08/56 and OT/12/097, and Federaal Wetenschapsbeleid Belspo Inter-University Attraction Pole Grants P6/29 and P7/11.
The authors declare no competing financial interests.
Correspondence should be addressed to Dr. Rik Vandenberghe, Neurology Department, University Hospitals Leuven, Herestraat 49, 3000 Leuven, Belgium. rik.vandenberghe@uz.kuleuven.ac.be