Abstract
How the brain encodes the semantic concepts represented by words is a fundamental question in cognitive neuroscience. Hemodynamic neuroimaging studies have robustly shown that different areas of posteroventral temporal lobe are selectively activated by images of animals versus manmade objects. Selective responses in these areas to words representing animals versus objects are sometimes also seen, but they are task-dependent, suggesting that posteroventral temporal cortex may encode visual categories, while more anterior areas encode semantic categories. Here, using the spatiotemporal resolution provided by intracranial macroelectrode and microelectrode arrays, we report category-selective responses to words representing animals and objects in human anteroventral temporal areas including inferotemporal, perirhinal, and entorhinal cortices. This selectivity generalizes across tasks and sensory modalities, suggesting that it represents abstract lexicosemantic categories. Significant category-specific responses are found in measures sensitive to synaptic activity (local field potentials, high gamma power, current sources and sinks) and unit-firing (multiunit and single-unit activity). Category-selective responses can occur at short latency (as early as 130 ms) in middle cortical layers and thus are extracted in the first pass of activity through the anteroventral temporal lobe. This activation may provide input to posterior areas for iconic representations when required by the task, as well as to the hippocampal formation for categorical encoding and retrieval of memories, and to the amygdala for emotional associations. More generally, these results support models in which the anteroventral temporal lobe plays a primary role in the semantic representation of words.
Introduction
Evidence for selective effects of lesions on the ability to comprehend and produce words associated with living animals versus manmade objects has been reported since at least 1946 (Nielsen, 1946; Warrington and McCarthy, 1983; Warrington and Shallice, 1984). Deficits in production and comprehension of animal-related concepts are associated with lesions of the left inferior temporal and ventral occipital cortex, while deficits in naming and comprehending tools and manmade objects are associated with lesions of frontal premotor and posterior middle temporal gyri (McCarthy, 1995; Tranel et al., 1997; Mahon and Caramazza, 2009). Building on these findings, fMRI and PET studies have shown increased activity in lateral posterior fusiform gyrus and ventrolateral occipital cortex for animal versus tool stimuli (Martin et al., 1996; Chao et al., 1999; Devlin et al., 2005; Hauk et al., 2008), while manmade objects evoke increased activation of middle temporal gyrus (Perani et al., 1999; Hauk et al., 2008) and medial fusiform gyrus (Chao et al., 1999; Devlin et al., 2005).
The selective hemodynamic responses to pictures depicting objects versus animals in the ventral occipitotemporal cortex have been interpreted as examples of high-level visuoperceptual areas specialized for various categories of objects, including faces and buildings (Kanwisher et al., 2001; Martin, 2007). Some neuroimaging studies have found that this selective response to animals versus objects extends to the words that refer to them (Chao et al., 1999; Devlin et al., 2005), whereas others have not (Mummery et al., 1998; Phillips et al., 2002; Price and Devlin, 2003). Since the tasks where words are effective are thought to invoke elaborative processing, and the areas involved are also activated by visual imagery (Ishai et al., 1999), the hypothesis has been advanced that this area is involved in structural, rather than semantic, representations (Devlin et al., 2005). That is, in tasks that invoke extended processing of words, top-down projections from semantic areas are hypothesized to activate ventral occipitotemporal areas specialized for the perceptual processing of objects and animals (Devlin et al., 2005; Noppeney et al., 2006), whereas images activate this area in a bottom-up fashion (Mechelli et al., 2003, 2004; Noppeney et al., 2006). Devlin et al. (2005) note that the posterior occipitotemporal lesions producing category-selective visual agnosia often spare general semantic knowledge concerning the same categories, implying that the region is higher order visual-perceptual rather than semantic per se (Etcoff et al., 1991; Arguin, 1996; Humphreys et al., 1997).
This hypothesis predicts that words evoke a relatively early and selective response in the semantic area, which in turn evokes the later, selective feedback activation. Hemodynamic methods lack the temporal resolution to determine the latency of category-specific activity. Here, we use intracranial EEG (iEEG) to study category-selective responses in anteroventral temporal lobe (avTL) to words referring to animals and manmade objects. Using written and spoken words, as well as multiple tasks, allowed for the exploration of supramodal, task-independent semantic representations. Microelectrode and macroelectrode arrays provided the spatiotemporal resolution for the detection of early, potentially first-pass, category-specific activation in ventral temporal lobe.
Materials and Methods
Participants.
Nine patients, five female and four male, at the Massachusetts General Hospital or Beth Israel Deaconess Medical Center with medically intractable epilepsy participated in this study while undergoing clinical evaluation using intracranial electrodes. Patients were between the ages of 17 and 65 years and all were right-handed. Patients were implanted with a variable number of depth electrodes as determined by a clinical team caring for the patients. Patients were enrolled in this study under the auspices of local institutional review board oversight in accordance with the Declaration of Helsinki. See Table 1 for detailed subject information.
Patient information
Intracranial electrodes and recording.
Intracranial EEG recordings were obtained from ≤80 channels of clinical macroelectrode arrays. Six or eight contact depth electrodes (Adtech Medical) were used to record from both medial and lateral cortical areas of the frontal and temporal lobes. Contacts were platinum cylinders, 1.1 mm in diameter and 2.3 mm in length, with 5 mm between the center of adjacent contacts. The decision to implant electrodes and the type, number, and spatial configuration of electrode placement was determined entirely on clinical grounds. iEEG was continuously recorded at 500 Hz with bandpass filtering from 0.1 to 200 Hz. These macroelectrode depth recordings were obtained from six of the nine patients.
Intracranial macroelectrodes were localized by using a volumetric image coregistration procedure. Using Freesurfer scripts (http://surfer.nmr.mgh.harvard.edu), the preoperative T1-weighted MRI (showing the brain anatomy) was aligned with a postoperative CT (showing electrode locations), and both were transformed into Talairach coordinates. Electrode coordinates were manually determined from the CT and also placed into Talairach space. To visualize electrode locations, coordinates were plotted on the average Freesurfer pial surface (fs-average) and individual coronal MRI slices were obtained for each contact.
The remaining three patients were implanted with linear arrays of microelectrodes capable of recording local field potentials (LFPs) and multiunit activity (MUA) across the cortical layers. These arrays were 3.5 mm in length with 24 platinum–iridium contacts (40 μm diameter) spaced 150 μm apart. Recordings from these laminar electrodes were obtained in dual bands: 2 kHz sampling rate for field potentials and 20 kHz for unit activity. The amplifier used a bipolar electrode configuration to minimize noise (for details of construction and use of these arrays, see Ulbert et al., 2001; Cash et al., 2009; Csercsa et al., 2010; Keller et al., 2010). In these three patients, postoperative T1- and T2-weighted MRIs were obtained with the electrodes in place. Direct visualization localized the microelectrodes to the inferotemporal cortex (IT) in patient L1, perirhinal cortex (PR) in patient L2, and entorhinal cortex (ER) in patient L3 (Table 1). Direct visualization from MRI, informed by the known laminar cytoarchitecture of the respective cortical areas and confirmed and refined by determination of background activity (e.g., white matter and CSF have different amplitude local field potentials compared with gray matter) permitted the individual contacts of each laminar electrode to be assigned to putative cortical layers (Ulbert et al., 2004a,b; Halgren et al., 2006; Fabó et al., 2008).
Analysis.
Averaged LFPs were computed for all macroelectrode recordings. Continuous signals from each iEEG channel were initially low-pass filtered at 30 Hz and subsequently epoched from 1 s before to 2 s after stimulus onset. Trials containing large artifacts were rejected using a predefined amplitude threshold, and trials containing epileptic discharges were rejected manually. After alignment to stimulus onset, waveforms from all channels were baseline corrected using a 500 ms prestimulus period. These preprocessing steps were performed within MATLAB using the EEGLAB 6.03b toolbox (Delorme and Makeig, 2004).
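The epoching and baseline-correction steps described above can be illustrated with a minimal NumPy sketch. It is not the EEGLAB pipeline the authors used: the function name `epoch_and_baseline`, the `reject_uv` threshold parameter, and the synthetic data are assumptions made for illustration, and the signal is assumed to have been low-pass filtered already.

```python
import numpy as np

def epoch_and_baseline(signal, onsets, fs, pre_s=1.0, post_s=2.0,
                       baseline_s=0.5, reject_uv=None):
    """Cut epochs around stimulus onsets and subtract the prestimulus mean.

    signal    : 1-D array, one continuous (already low-pass filtered) channel
    onsets    : stimulus onset positions, in samples
    fs        : sampling rate in Hz
    reject_uv : hypothetical peak-amplitude rejection threshold (None keeps all)
    """
    pre, post = int(round(pre_s * fs)), int(round(post_s * fs))
    base = int(round(baseline_s * fs))
    epochs = []
    for onset in onsets:
        if onset - pre < 0 or onset + post > len(signal):
            continue  # skip epochs that run off the recording
        ep = signal[onset - pre:onset + post].astype(float)
        ep = ep - ep[pre - base:pre].mean()  # baseline: last 500 ms before onset
        if reject_uv is not None and np.abs(ep).max() > reject_uv:
            continue  # crude amplitude-based artifact rejection
        epochs.append(ep)
    return np.array(epochs)

# Synthetic demo: 20 s of noise with a DC offset, sampled at 500 Hz.
fs = 500
rng = np.random.default_rng(0)
signal = 5.0 + rng.normal(0.0, 1.0, 20 * fs)
onsets = [2 * fs, 6 * fs, 10 * fs, 19 * fs]  # the last epoch runs off the end
epochs = epoch_and_baseline(signal, onsets, fs)
averaged_lfp = epochs.mean(axis=0)           # trial average for this channel
```

Manual rejection of epileptic discharges, as done in the study, has no automatic counterpart in this sketch.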
Gamma-band responses were also computed for macroelectrode recordings. Power was first computed from 30 to 100 Hz in 2 Hz increments using a Morlet wavelet time–frequency analysis. The number of wavelet cycles was increased linearly from 3.6 to 12 as frequencies ranged from 30 to 100 Hz, providing a constant temporal and frequency resolution across the entire band. The resulting temporal resolution (σt) was 30 ms with a corresponding frequency resolution (σf) of 8 Hz. Spectral power at each frequency was normalized to the power in a 500 ms prestimulus baseline before averaging across the entire band to generate a single gamma-band event-related spectral perturbation waveform.
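As a rough illustration of this gamma-band computation, the sketch below convolves one epoch with complex Morlet wavelets at 30 to 100 Hz in 2 Hz steps, with the number of cycles rising linearly from 3.6 to 12, then normalizes each frequency to its prestimulus power and averages across the band. The wavelet width convention (sigma_t = cycles / (2 * pi * f)), the amplitude normalization, and the function names are assumptions; the source does not specify them.

```python
import numpy as np

def morlet_power(x, fs, freq, n_cycles):
    """Power over time at one frequency via a complex Morlet wavelet."""
    sigma_t = n_cycles / (2.0 * np.pi * freq)   # assumed width convention
    t = np.arange(-4 * sigma_t, 4 * sigma_t, 1.0 / fs)
    wavelet = np.exp(2j * np.pi * freq * t) * np.exp(-t**2 / (2 * sigma_t**2))
    wavelet /= np.abs(wavelet).sum()            # assumed amplitude normalization
    return np.abs(np.convolve(x, wavelet, mode="same")) ** 2

def gamma_ersp(epoch, fs, onset_idx, baseline_s=0.5):
    """Baseline-normalized power averaged over the 30-100 Hz band."""
    freqs = np.arange(30, 101, 2)                 # 30-100 Hz in 2 Hz steps
    cycles = np.linspace(3.6, 12.0, len(freqs))   # linear increase in cycles
    base = slice(onset_idx - int(baseline_s * fs), onset_idx)
    rel = []
    for f, n in zip(freqs, cycles):
        p = morlet_power(epoch, fs, f, n)
        rel.append(p / p[base].mean())            # normalize to prestimulus power
    return np.mean(rel, axis=0)                   # average across the band

# Synthetic demo: noise plus a 60 Hz burst 400-600 ms after stimulus onset.
fs, onset_idx = 500, 500
rng = np.random.default_rng(1)
t = np.arange(1500) / fs
x = rng.normal(0.0, 1.0, t.size)
burst = (t >= 1.4) & (t < 1.6)
x[burst] += 5.0 * np.sin(2 * np.pi * 60.0 * t[burst])
ersp = gamma_ersp(x, fs, onset_idx)
```

Because the cycle count grows in proportion to frequency (3.6/30 = 12/100), the ratio of frequency to cycles stays fixed at about 8.3 Hz, which is one way to read the constant resolution across the band described above.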
From the laminar microelectrodes, population current source density (CSD) and MUA were estimated. The CSD estimates the transmembrane current in each cortical layer, while MUA estimates changes in firing rate of the same population of neurons. CSD was computed as the second spatial derivative of field potentials after applying a five-point Hamming filter (Ulbert et al., 2001). Differential transmembrane current sources between conditions were displayed by plotting the subtraction of the mean CSD for objects from the mean CSD for animals. When comparing more than two conditions, the F-statistic from a one-way ANOVA of the CSD computed across individual trials was plotted as a measure of the difference between conditions. MUA was computed by first filtering the 20 kHz signal from each channel between 500 and 3000 Hz and subsequently rectifying the signal. This rectified signal was then low-pass filtered at 30 Hz.
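The CSD and MUA computations can be sketched as follows. Several choices here are assumptions rather than the authors' implementation: the potentials are treated as referential (not bipolar) recordings at 150 μm spacing, the CSD sign follows the common convention in which the negated second derivative is taken, tissue conductivity is omitted, and an FFT brick-wall mask stands in for whatever band-pass and low-pass filters were actually used. All function names are hypothetical.

```python
import numpy as np

def csd_estimate(lfp, spacing_mm=0.15):
    """Second spatial derivative across depth after 5-point Hamming smoothing.

    lfp : array (n_contacts, n_samples), contacts ordered by depth.
    Returns an array with 6 fewer rows (4 lost to smoothing, 2 to the derivative).
    """
    kernel = np.hamming(5)
    kernel /= kernel.sum()
    smoothed = np.apply_along_axis(
        lambda col: np.convolve(col, kernel, mode="valid"), 0, lfp)
    second_diff = smoothed[2:] - 2 * smoothed[1:-1] + smoothed[:-2]
    return -second_diff / spacing_mm**2      # assumed sign convention

def fft_bandlimit(x, fs, lo, hi):
    """Zero all FFT bins outside [lo, hi] Hz (brick-wall stand-in for a filter)."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    spec[(freqs < lo) | (freqs > hi)] = 0.0
    return np.fft.irfft(spec, n=len(x))

def mua_envelope(raw, fs=20000):
    """Band-limit to 500-3000 Hz, rectify, then low-pass at 30 Hz."""
    band = fft_bandlimit(raw, fs, 500.0, 3000.0)
    rectified = np.abs(band)
    return fft_bandlimit(rectified, fs, 0.0, 30.0)

# CSD demo: a potential that is quadratic in depth has a constant second derivative.
depth_mm = np.arange(24) * 0.15
lfp = np.tile((depth_mm ** 2)[:, None], (1, 10))
csd = csd_estimate(lfp)

# MUA demo: slow 10 Hz wave plus a 1 kHz "unit-like" burst in the second half.
fs_unit = 20000
t = np.arange(fs_unit) / fs_unit
raw = 50.0 * np.sin(2 * np.pi * 10.0 * t)
raw += np.where(t >= 0.5, np.sin(2 * np.pi * 1000.0 * t), 0.0)
mua = mua_envelope(raw, fs_unit)
```

In the demo the slow wave is removed by the band-limiting step, so the envelope rises only where the high-frequency burst is present.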
To test the statistical significance of response differences between animal and object categories, a cluster-based nonparametric Monte-Carlo hypothesis test was used on LFP and MUA waveforms, gamma-band power, and CSD plots (Maris and Oostenveld, 2007). This corrects for multiple comparisons while preserving sensitivity in the time domain. All reported temporal regions of significant differences within averaged LFP, gamma waveforms, or CSD or MUA plots are at a p < 0.05 level.
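The logic of a cluster-based permutation test on a single waveform can be sketched as below. This is a simplified illustration in the spirit of Maris and Oostenveld (2007), not the authors' code: the Welch t-statistic, the cluster-forming threshold of 2.0, the use of summed |t| as cluster mass, the number of permutations, and all function names are assumptions.

```python
import numpy as np

def _tvals(a, b):
    """Welch t-statistic at every time point (a, b: trials x time)."""
    va = a.var(axis=0, ddof=1) / len(a)
    vb = b.var(axis=0, ddof=1) / len(b)
    return (a.mean(axis=0) - b.mean(axis=0)) / np.sqrt(va + vb)

def _clusters(t, thresh):
    """Return (start, stop, mass) for each same-sign run with |t| > thresh."""
    out, start = [], None
    sign = np.sign(t) * (np.abs(t) > thresh)
    for i in range(len(t) + 1):
        cur = sign[i] if i < len(t) else 0
        if start is not None and cur != sign[start]:
            out.append((start, i, float(np.abs(t[start:i]).sum())))
            start = None
        if start is None and cur != 0:
            start = i
    return out

def cluster_permutation_test(a, b, thresh=2.0, n_perm=1000, seed=0):
    """Return (start, stop, p) per observed cluster, corrected across time."""
    rng = np.random.default_rng(seed)
    observed = _clusters(_tvals(a, b), thresh)
    pooled, n_a = np.vstack([a, b]), len(a)
    null_max = np.zeros(n_perm)
    for k in range(n_perm):
        idx = rng.permutation(len(pooled))      # shuffle condition labels
        perm = _clusters(_tvals(pooled[idx[:n_a]], pooled[idx[n_a:]]), thresh)
        null_max[k] = max((m for _, _, m in perm), default=0.0)
    # p-value: fraction of permutations whose largest cluster is at least as big
    return [(s, e, float((null_max >= m).mean())) for s, e, m in observed]

# Synthetic demo: 40 trials per category, with a difference at samples 40-59.
rng = np.random.default_rng(2)
animals = rng.normal(0.0, 1.0, (40, 100))
objects = rng.normal(0.0, 1.0, (40, 100))
animals[:, 40:60] += 2.0
results = cluster_permutation_test(animals, objects, n_perm=500)
significant = [c for c in results if c[2] < 0.05]
```

Comparing each observed cluster against the distribution of the largest cluster under label shuffling is what controls for multiple comparisons across time points while keeping sensitivity to temporally extended effects.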
Language tasks.
All participants performed a language task involving written words (SV) and two participants also performed an auditory-word version (SA) of the same task. Each trial involved presentation of a written word for 300 ms (SV task) or an auditory word 500 ms in length (SA task), followed by a fixation point. The shorter duration of the visual stimulus was chosen to align the potentials related to lexicosemantic processing. Subjects were instructed to press a button if the presented word represented an object larger than one foot in any dimension (target trials; e.g., tiger, sofa), while refraining from responding to objects smaller than a foot (nontarget trials; e.g., cricket, lipstick). Exactly half of the trials involved words representing objects or animals larger than one foot, requiring a motor response (target trials). This required subjects to access the semantic representations of these particular words and retrieve visuospatial or propositional knowledge of the associated object. Words were equally divided between living objects (animals and animal parts) and manmade objects. Half of the trials presented a novel word which was shown only once during the experiment; the other half of the trials presented one of 10 repeated words (each shown multiple times during the experiment). Object/animal, target/nontarget, and novel/repeated were fully crossed and balanced.
Novel words representing living and manmade objects were balanced in terms of number of syllables (SA: living = 1.52, manmade = 1.36; SV: living = 2.18, manmade = 2.09), letters (SA: living = 5.22, manmade = 5.21; SV: living = 6.49, manmade = 6.8), and lexical frequency (per million: SA: living = 15.5, manmade = 17.34; SV: living = 12.52, manmade = 12.45) (Francis and Kucera, 1982). These word properties were not statistically different between living and manmade object categories (Wilcoxon sign-rank, p > 0.05). Repeated words were chosen to be representative of the novel words with respect to frequency and length. Visual stimuli were presented as white text on a black background; auditory stimuli were normalized in peak volume and length. The SV and SA tasks contained unique sets of words with no overlap between the two experiments. The visual version of the task included 390 trials; the auditory version included 780 trials (for further description of the tasks and extracranial MEG/EEG analyses of word processing in these tasks, see Marinković et al., 2003; Dale et al., 2000; Chan et al., 2011).
A word memory task (WM) was also performed on the three subjects implanted with microelectrode arrays. Subjects were first asked to remember a list of 10 words, each presented three times. During the experiment, words were visually presented for 300 ms with a stimulus onset asynchrony of 2000 ms. Subjects were asked to press a button whenever any word from the initial list was visually displayed. The target words were shown 12 times each (for a total of 120 trials) while 120 novel words were displayed only once over the course of the experiment. Words were either animals, manmade objects, or abstract nouns (e.g., respect, honor).
Finally, patient L2 also performed an abstractness judgment task (DI) in which the subject was asked to view visually presented words and respond to any words that were abstract rather than concrete. A total of 480 novel words were presented, with no repetition, and words referred to animals, objects, or abstract nouns.
Results
Averaged LFP differences between animals and manmade objects
In general, averaged LFP waveforms in anteroventral temporal lobe showed large deflections at ∼400–500 ms (Fig. 1). This is likely an intracranial manifestation of the well-studied scalp N400 potential (Kutas and Federmeier, 2000; Marinković, 2004).
Ventrotemporal category specificity in averaged local field potentials. Center, Depth electrode coordinates from all patients in Talairach space plotted on the Freesurfer average surface. Blue circles indicate electrodes at temporal recording sites demonstrating significant averaged LFP differences, gray circles indicate electrodes at temporal recording sites without significant LFP differences, and yellow circles indicate Talairach coordinates of either the center or maximally significant voxel for category-specific fMRI or PET responses as reported in previous literature. Coronal MRI slices of the temporal lobe are shown for each significant electrode location. Side plots, Averaged LFP waveforms (solid lines) or gamma power (dashed lines) for animals (blue) versus objects (red). Electrodes in occipitotemporal sulcus, collateral sulcus, and hippocampus/parahippocampal gyrus demonstrate category specificity. Differences are seen largely starting at 400 ms and in some cases, remain until 1500 ms after stimulus onset. In three subjects, gamma-band power (30–100 Hz) was differentially modulated by animals and objects. Latencies of significant differences are seen as early as 300 ms and as late as 1200 ms.
Robust animal/object-specific activity was observed in bilateral ventral and medial temporal areas in the averaged LFPs of the six patients with macroelectrode recordings (Fig. 1, plots, solid lines). Specifically, electrodes in or near collateral and occipitotemporal sulci, both anteriorly and posteriorly, demonstrated category-specific activity. Two electrodes near right hippocampus and parahippocampal gyrus also showed category-specific activity. Significant differences between the two semantic categories were observed as early as 200 ms and as late as 1500 ms. While category-specific differences were apparent at the 400–500 ms peak in seven of the 13 electrodes, six electrodes demonstrated differences within the slow return to baseline beyond 500 ms. In all but two cases (the right hemisphere electrodes in patient D3), the response to animals yielded more negative LFP waveforms than the response to manmade objects.
In the two subjects who also performed the auditory version (SA) of the size judgment task (patients D5 and D6), the electrodes that demonstrated written word category specificity also exhibited category-specific responses to spoken words. The averaged LFP waveforms in both task modalities exhibited similar morphology, and category-specific differences occurred at similar latencies. This suggests that these ventral temporal regions are supramodal with respect to the encoding of semantic category.
The locations of the electrodes demonstrating category-specificity found here are farther anterior than the posterior ventral temporal locations reported in hemodynamic studies of activation to pictures or words representing objects or animals (Chao et al., 1999; Perani et al., 1999; Thompson-Schill et al., 1999; Whatmough et al., 2002; Price et al., 2003; Devlin et al., 2005; Mechelli et al., 2006; Noppeney et al., 2006) (Table 2). While the Talairach coordinates of fusiform-specific category-selective activity reported in those studies ranged from y = −33 to −83 (mean = −58) in the anterior–posterior axis, the coordinates of the involved electrodes in this study ranged from y = −11 to −37 (mean = −24). In four other studies, PET or fMRI category-specific activity was observed in response to pictures at the temporal poles (Damasio et al., 1996; Mummery et al., 1996; Moore and Price, 1999; Devlin et al., 2002). This activity is farther anterior than the recording sites reported in this study. Category-specific activity in this portion of the ventral temporal lobe has not previously been reported by neuroimaging.
Talairach coordinates of category-specific responses in previous neuroimaging studies
Because these category-specific findings are generally consistent across subjects despite the varying epilepsy etiologies and seizure foci (Table 1), it is likely that the semantic processing observed in the avTL is representative of the normal function of this area.
Gamma-band selectivity
In three of the subjects, category-specificity was found in gamma-band power (30–100 Hz) in medial and inferior temporal electrodes (Fig. 1, plots, dashed lines). In these subjects, gamma-band power increases at approximately the same time as the major deflection of the averaged LFP. However, in several cases, these responses continue beyond the 400–500 ms peak in the field potentials, demonstrating that increased gamma power may continue even after the field potential returns to baseline. Time–frequency plots of these channels indicate that the high-frequency activity seen here is a result of increases in frequencies between 30 and ∼120 Hz. The most pronounced example of category specificity was seen in subject D5. In this subject, gamma-band category differences were clearly visible in both visual and auditory modalities of the size judgment task in the left anterior occipitotemporal sulcus electrode. While significantly more gamma-power was visible in response to object trials, gamma-power increased for both semantic categories. Significant differences began at 300 ms and lasted until 1200 ms. These latencies began slightly earlier than the corresponding LFP differences seen in the same electrode.
Gamma-band and LFP specificity, although spatially correlated, were not simultaneously present for every electrode. In patients D1, D4, and D6, none of the channels showing LFP category specificity showed differential gamma-band activity. In patients D2 and D3, one of the three electrodes that showed LFP specificity also showed gamma-band specificity. In patient D5, one electrode showed only LFP specificity, one showed only gamma-band specificity, and two showed specificity in both types of activity. These data demonstrate that while LFP and gamma-band activity often occur together, they can also occur independently. In all electrodes showing gamma band differences except for one, LFP differences were also seen, suggesting that gamma activity tends to be more focal.
Multiunit activity and current source density
In the three subjects with laminar microelectrode arrays, CSD plots illustrate robust task-related responses in IT, PR, and ER (Fig. 2). Category-specific differences were observed in the size judgment (SZ), DI, and WM tasks. This suggests that even in a task that does not require explicit access of visual–structural information (the word memory task), category-selective responses are still seen.
Laminar microelectrode recordings demonstrate category-selective responses. CSD and MUA show category-specific differences between animals and manmade objects for the three implanted patients. CSD was computed as the second spatial derivative of laminar recordings. In CSD plots, outlined regions indicate statistically significant differences between animals and objects for the SZ task, or animals (ani), objects (obj), and abstract nouns (abs) in the WM or DI tasks (p < 0.05). Animal/object (ani-obj) plots were generated by subtracting the mean CSD for objects from the mean CSD for animals. Plots of the F-statistic from a one-way ANOVA indicate differences between three conditions (object/animal/abstract) for the WM or DI tasks. In MUA waveform plots, shaded regions indicate time-points with statistically significant differences. L1, The right (R) IT electrode shows a layer IV sink beginning at 160 ms that is modulated by semantic category in both SZ and WM tasks. L2, In the right PR electrode, the first sink occurs ∼100 ms in layers IV/V in all three tasks. Category specificity is seen in these same layers beginning as early as 150 ms. Differential MUA responses are seen in deeper layers and demonstrate animal-specific increases in firing beginning as early as 200 ms. L3, In the left (L) ER electrode, an initial layer V/VI sink is present beginning as early as 100 ms in the SZ task and ∼200 ms in the WM task. Category-selectivity is present in deeper layers at 130 ms and more superficial layers later. MUA responses for the WM task demonstrate animal-specific increases in firing.
In the inferotemporal electrode in patient L1, activation began with a sink in putative layer IV with a concurrent source in layers II/III peaking at 160 ms in both SZ and WM tasks. Category-specific differences were seen within this first layer IV sink starting at 150 ms, and again at 900 ms in both layer IV and upper layers. This difference can be characterized by a larger layer IV sink in response to animals.
In the right perirhinal cortex electrode in patient L2, an early sink was again present in putative layer IV beginning at 120 ms and peaking at 150 ms, followed by a superficial layer II/III sink at ∼500 ms for the SZ task. Category differences were seen within this first activation in layers IV/V starting at 150 ms, and were again characterized by a larger sink in response to animals. Early responses to the WM and DI tasks were very similar in terms of laminar distribution, latency, and category-specific difference.
Patient L2 also yielded reliable multiunit activity for SV and WM tasks. Robust increases in MUA were apparent in layers IV and V with clear differences between animals and objects. This increase in unit firing implies that the early layer IV/V sink in the CSD is excitatory in nature. The MUA response to animals is significantly larger than the response to manmade objects, with differences beginning at ∼200 ms. Similarly, in the WM task, the MUA response to animals was significantly larger than the response to objects or abstract nouns, but no difference was found between these latter two categories (p > 0.05). These differences began at 230 ms. In the abstractness judgment (DI) task, the CSD response to animals was again larger than the response to either manmade objects or abstract concepts, with no difference between the two latter categories, mirroring the MUA differences in the SZ and WM tasks.
In the entorhinal cortex electrode in patient L3, activation began with a sink in layer V/VI at 120 ms followed by a sink in superficial layers II/III at ∼190 ms. In the ER electrode, differences were seen starting at 130 ms in deeper layers, with additional differences appearing in more superficial layers at ∼450 ms. These differences were quite prolonged, and lasted beyond 1500 ms. Robust multiunit activity was also observed in layers V/VI for the WM task in this patient. As in the case of the perirhinal electrode, the activity in this entorhinal electrode increased over baseline, indicating an excitatory early sink. Again, the MUA showed a larger increase to animals but no differences between manmade objects and abstract objects.
In all cases, category-specific differences occurred at the layer and latency of the first current sink in the CSD, suggesting that first-pass activation of these areas contains semantic information. Furthermore, the layer IV location of this initial current sink in IT and PR is consistent with the typical layer where feedforward activation arrives. This also suggests that the main source of this information is from longer-distance corticocortical afferents rather than local interneurons.
Single-unit category selectivity
Single-unit firing was identified in the perirhinal microelectrode recordings of patient L2 (Fig. 3). A total of eight distinct units were identified across the 24 channels. A raster plot of a representative unit is shown in Figure 3A. In this case, firing decreased after stimulus onset. While mean firing rates were low (∼0.1 Hz), three of the eight units demonstrated statistically significant differences in firing for animal and object trials between 0 and 300 ms (Wilcoxon rank-sum, p < 0.01; Fig. 3C). In all three cases, more spiking was observed in response to animals than objects, which is consistent with increased MUA response to animals in the same electrode.
Perirhinal cortex single unit firing rates show animal/object information specificity. A, Single unit raster plot and peristimulus time histogram for a representative unit. B, Mean firing rate in five time bins for the same unit shown in A for animals (blue) and objects (red). From 0 to 300 ms, the drop in firing rate for objects is much larger than the drop in response to animals. C, Number of spikes per trial (sorted into animal and object trials) for each of eight identified units. Percentages indicate the proportion of trials with at least one spike in which the stimulus was a word associated with an animal (blue) or manmade object (red). Asterisks indicate the three units with statistically significant differences in firing between animal and object trials (Wilcoxon rank-sum, p < 0.01).
Discussion
While many studies have demonstrated category-specific hemodynamic activity to images in posterior ventral temporal areas, responses to words in these areas have been more variable and little has been seen more anteriorly. We report focal electrophysiological responses selective for words referring to animals versus objects in the inferotemporal, perirhinal, and entorhinal sectors of the human avTL. Differences were observed both in measures sensitive to synaptic activity (LFP, gamma-band power, and CSD) and to unit-firing (MUA and single units) at multiple spatial scales. The timing, laminar location, and task correlates of this activity have implications for the mechanisms whereby more posterior ventral visual regions may show similar differential activation to the same stimuli. The avTL categorical responses may also contribute to stimulus-selective cuing of the hippocampal formation for recall and of the amygdala for emotional evaluation. More generally, these findings provide additional evidence for a key role of avTL in semantic encoding.
Semantic category selectivity is present in the initial responses recorded in IT and PR at latencies as early as 130 ms. Using CSD analysis (Ulbert et al., 2001; Pettersen et al., 2006; Einevoll et al., 2007), we identified the initial response as a sink in what was estimated to be middle cortical layers, the location where feedforward afferents terminate (Van Hoesen and Pandya, 1975; Saleem et al., 1993; Saleem and Tanaka, 1996). These afferents are excitatory, as confirmed by the concurrent increase of category-selective multiunit activity. The principal source of feedforward afferents to these structures in macaques arises largely in ventral occipitotemporal cortex (Desimone et al., 1980; Mishkin et al., 1983; Martin-Elkins and Horel, 1992; Suzuki and Amaral, 1994; Suzuki, 1996; Lavenex and Amaral, 2000). In humans, these structures could correspond to the various high-level visual material-specific processors that generally show their first peak of activity between 150 and 200 ms and lie just anterior to classical retinotopic cortical areas (Allison et al., 1994, 1999; Halgren et al., 1999; VanRullen and Thorpe, 2001).
Indeed, it is possible that these afferents arise in the ventral occipitotemporal regions that respond selectively to pictures of objects and animals (Chao et al., 1999; Perani et al., 1999; Chao and Martin, 2000; Devlin et al., 2005; Noppeney et al., 2006; Liu et al., 2009). However, we consider this possibility unlikely because these occipitotemporal areas do not reliably respond to words referring to these categories, but rather, their response is task-dependent (Mummery et al., 1998; Phillips et al., 2002; Price et al., 2003; Devlin et al., 2005).
In contrast, the category-selective responses to words in this study were present regardless of the task, including size, familiarity, and abstract/concrete judgment tasks. In fact, the word-memory task, which does not require explicit activation of an object's visual form, also yielded category-specific responses in these areas. Our data suggest that avTL projections to ventral occipitotemporal cortex may cause it to display category-selective hemodynamic responses to words. Strong feedback projections between homologous areas have been demonstrated in macaques (Van Hoesen, 1982; Halgren et al., 1999; Suzuki et al., 2000; Lavenex et al., 2002). This hypothesis posits that occipitotemporal areas encode visual structural, rather than supramodal, semantic information, resulting in automatic bottom-up activation by images, consistent with the early latencies seen by Liu et al. (2009). However, category-specific activation to words would only be observed in this area during tasks that require a full instantiation of that item's structural form. This interpretation is consistent with that proposed previously by Devlin et al. (2005) based on fMRI and neuropsychological results. MEG studies have also shown that more anterior areas in the ventral stream provide feedback to ventral occipitotemporal areas after first-pass processing of pictures to participate in the successful identification of visual objects (Bar et al., 2006), especially when precision is required (Clarke et al., 2011). Feedback projections arise in infragranular pyramidal cells in deep cortical layers. The current study recorded sustained activity in deep layers of avTL sites, also selective for semantic category, immediately after the feedforward peak in putative layer IV. 
Thus, this study demonstrates category-specific synaptic and unit-activity in input layers at early latencies reflecting feedforward activation, and in deep layers at longer latencies reflecting the presumed source of feedback to ventral occipitotemporal areas.
Figure 4 illustrates our proposed model of category-selective perceptual and semantic information flow in the temporal lobe. The implication that activation of perceptual processing areas by words is secondary to lexicosemantic encoding, as well as being nonobligatory and task-dependent, may be inconsistent with some of the stronger claims of embodied cognition (Martin, 2007; Mahon and Caramazza, 2009). In our model, semantic category responses in avTL would reflect projections from the visual word form area (VWFA) in the fusiform gyrus at the occipitotemporal junction (Halgren et al., 1994; Cohen et al., 2000; Crone et al., 2001; Dehaene et al., 2005) and a possibly homologous auditory area in the superior temporal sulcus (Scott et al., 2000; Parker et al., 2005; Saur et al., 2008). It may be possible to conceive of the category-selective responses reported here as a continuation of progressively greater abstraction, a general theme of the ventral stream (Mishkin et al., 1983; Ungerleider and Haxby, 1994; Mesulam, 1998; Vinckier et al., 2007).
Figure 4. Model of lexicosemantic information flow in the temporal lobe. Visual inputs (either pictures or written words) are preprocessed by low-level occipital areas. Visual information proceeds to material-selective visual form areas in ventral occipitotemporal cortex that represent the structural information present in an image or the orthographic representation of a written word. Category specificity for images may arise in this area from the structural differences between living and nonliving objects. This information then proceeds to anteroventral temporal cortex, in which lexicosemantic associations are processed. Spoken word information proceeds along a similar pathway within the superior temporal cortices. When the task requires access to visuostructural information after a written or auditory word is perceived, feedback pathways (blue arrows) activate ventral occipitotemporal cortices.
In several sites, semantic category-selective responses were evoked by both visual and auditory words. This implies that there may also be input to avTL from auditory areas analogous to the VWFA, consistent with projections from the superior temporal lobe to this region in macaques (Seltzer and Pandya, 1978; Saleem et al., 2000), MEG colocalization of N400 responses to auditory and visual words in avTL (Marinković et al., 2003), and activation of anterior temporal lobe (aTL) to written and spoken language in fMRI (Spitsyna et al., 2006). Unfortunately, we did not record responses to auditory words from the laminar microarrays, and so could not determine whether these responses were feedback or associative.
These supramodal category-selective responses to words are consistent with proposals that the avTL plays a central role in semantic representations (Lambon Ralph et al., 2010). Anomias and semantic dementia can be caused by lesions of this area (Bozeat et al., 2000; Damasio et al., 2004; Davies et al., 2004; Patterson et al., 2007; Jefferies et al., 2009; Mion et al., 2010), and the main generator of the N400, an event-related potential associated with lexicosemantic associations, is found here (Smith et al., 1986). Neuroimaging studies have often failed to find responses in aTL due to susceptibility artifacts or limited field-of-view (Visser et al., 2010a); however, recent studies using distortion-corrected fMRI (Binney et al., 2010; Visser et al., 2010b) and repetitive transcranial magnetic stimulation (Pobric et al., 2007, 2010a,b; Lambon Ralph et al., 2009) have provided further evidence for the importance of aTL in semantic processing. Interestingly, these studies have shown category-general semantic processing in lateral aTL (Lambon Ralph et al., 2010; Pobric et al., 2010b), while this study demonstrates category-specific effects in more inferior and medial areas. This is consistent with the finding that semantic dementia patients generally do not show a category-specific deficit; however, herpes simplex virus encephalitis patients, who have significantly greater medial involvement, do show such an effect (Lambon Ralph et al., 2007; Noppeney et al., 2007).
Our recordings demonstrated strong modulation of gamma-band activity by category membership when retrieving knowledge about the objects or animals represented by words. Gamma-band power from 30 to 40 Hz, recorded extracranially, has previously been associated with feature binding and the semantic lookup of lexical items (Lutzenberger et al., 1994; Pulvermüller et al., 1996a,b; Tallon-Baudry and Bertrand, 1999). The data presented here are broadly consistent with a role for gamma activity in the semantic encoding of lexical items within the avTL. Our results also suggest that gamma-band activity tends to be more focal than low-frequency LFP activity, as others have proposed (Lindén et al., 2010).
The anteroventral, inferotemporal, and perirhinal areas showing early semantic category-selective responses project strongly to entorhinal cortex, the gateway to the hippocampus (Insausti et al., 1987; Burwell, 2000). O'Keefe and Nadel (1987) originally proposed that the human hippocampus maps semantic space in a manner analogous to the rodent mapping of physical space. Indeed, human hippocampal neurons selectively fire to specific words (Heit et al., 1988), which may correspond to the firing of rodent hippocampal neurons to specific places. More recently, the apparent raw material for constructing place cells has been identified as the grid cells of entorhinal cortex in rats (Hafting et al., 2005). In a similar way, the firing of human entorhinal cells to specific semantic categories may provide the inputs used by hippocampal cells to select for individual words (Heit et al., 1988; Kreiman et al., 2000).
In conclusion, the results of this study demonstrate not only that category selectivity is present in avTL, but that this selectivity is present on the first pass of activity through this area. This activity is seen in measures sensitive to both synaptic and unit-firing activity at multiple spatial scales. The model proposed here suggests that avTL encodes semantic categories and provides this information to posterior ventral temporal areas when task demands so require, resulting in their variable category-selective hemodynamic response to words.
Footnotes
This work was supported by an NDSEG Fellowship, an AMNTP grant, and the Frank H. Buck Scholarship to A.M.C., a Rappaport Fellowship to S.S.C., and OTKA K-81357 and NKTH-ANR Neurogen to I.U. Overall support was provided by NIH Grant NS18741. We thank A. R. Dykstra, C. J. Keller, N. Dehghani, J. Cormier, R. Zepeda, J. Donoghue, and I. Sukhotinsky for their helpful comments.
Correspondence should be addressed to either Alexander M. Chan, 55 Fruit Street, Thier 423, Boston, MA 02114, amchan{at}mit.edu; or Eric Halgren, 9500 Gilman Drive, Mail Code 0841, La Jolla, CA 92093-0841, ehalgren{at}ucsd.edu.