Abstract
Common or folk knowledge about animals is dominated by three dimensions: (1) level of cognitive complexity or “animacy;” (2) dangerousness or “predacity;” and (3) size. We investigated the neural basis of the perceived dangerousness or aggressiveness of animals, which we refer to more generally as “perception of threat.” Using functional magnetic resonance imaging (fMRI), we analyzed neural activity evoked by viewing images of animal categories that spanned the dissociable semantic dimensions of threat and taxonomic class. The results reveal a distributed network for perception of threat extending along the right superior temporal sulcus. We compared neural representational spaces with target representational spaces based on behavioral judgments and a computational model of early vision and found a processing pathway in which perceived threat emerges as a dominant dimension: whereas visual features predominate in early visual cortex and taxonomy in lateral occipital and ventral temporal cortices, these dimensions fall away progressively from posterior to anterior temporal cortices, leaving threat as the dominant explanatory variable. Our results associate the perception of threat with neural structures that underlie perception and cognition of social actions and intentions, suggesting a broader role for these regions than previously thought, one that includes the perception of potential threat from agents independent of their biological class.
SIGNIFICANCE STATEMENT For centuries, philosophers have wondered how the human mind organizes the world into meaningful categories and concepts. Today this question is at the core of cognitive science, but our focus has shifted to understanding how knowledge manifests in dynamic activity of neural systems in the human brain. This study advances the young field of empirical neuroepistemology by characterizing the neural systems engaged by an important dimension in our cognitive representation of the animal kingdom ontological subdomain: how the brain represents the perceived threat, dangerousness, or “predacity” of animals. Our findings reveal how activity for domain-specific knowledge of animals overlaps the social perception networks of the brain, suggesting domain-general mechanisms underlying the representation of conspecifics and other animals.
Introduction
Epistemology is becoming an empirical science. Advances in functional neuroimaging and computational methods for neural decoding make it possible to study how knowledge is encoded in brain activity (Mitchell et al., 2008; Huth et al., 2012; Haxby et al., 2014; Guntupalli et al., 2016). For this purpose, the animal kingdom represents an exceptional test domain: knowledge of animals is relevant to human brain evolution, and nature provides objective criteria for evaluating animals along multiple salient dimensions.
Classic studies (Henley, 1969; Rips et al., 1973) show that knowledge of animals is shaped by two salient dimensions, namely “predacity” and size. Although the representation of size has been studied (Konkle and Caramazza, 2013), no previous neuroimaging work has systematically investigated predacity (but see Davis et al., 2014 for a hypothetical example). The term predacity has precedent in the psychological literature, where it describes an implicit dimension underlying similarity judgments. However, to avoid confusion with the biological concept, which entails specific relationships of eating and being eaten, we refer instead to perceived threat or aggressiveness; note that, whereas a frog is generally viewed as harmless, it is a dangerous predator to a fly. We assume the psychological dimension of “predacity” falls under this more general notion.
In previous work (Connolly et al., 2012), we found that the representational structure in occipitotemporal regions—collectively, the lateral occipital complex (LOC), which is known to support visual category representation (Grill-Spector et al., 2001; Haxby et al., 2001)—reflected semantics in sufficient detail to support taxonomic hierarchies of animal classes, such as mammals versus birds versus bugs. Furthermore, the first principal component (PC) of the multidimensional similarity space in the LOC suggested an “animacy continuum” (Huth et al., 2012). Pictures of nonhuman mammals evoke responses in the LOC similar to those evoked by pictures of humans, whereas bugs evoke patterns similar to those for inanimate objects, suggesting a human-centric hierarchy with mammals near the top and insects near the bottom. A similar continuum underlies similarity judgments based on taxonomy; however, the psychological distinction between the “least-animate” animals and actual nonliving stimuli was much greater than that observed in the brain; thus, the living–nonliving distinction is not an obvious feature of the LOC (Sha et al., 2015).
However, animacy is not the only dimension represented in the LOC. Although animacy is a central feature of this region, patterns for animal classes remain distinct after removing animacy-related variance from category-specific multivariate data (Connolly et al., 2012). Furthermore, the general representational space in the LOC, characterized by responses to a full-length feature film, is best modeled by no fewer than 35 unique components (Haxby et al., 2011). These observations suggest that representation in the LOC comprises multiple feature dimensions encoded in overlapping, multiplexed neural population codes. Category-specific patterns measured with fMRI reflect spatially quantized sums of activity of intermingled population responses. Our challenge is to tease apart the unique components of this high-dimensional space to isolate the representation of specific semantic dimensions.
To this end, we investigated how a dimension known to be important for knowledge about animals—namely, “predacity” or more generally perceived threat—is represented and how it is disentangled from the representation of animacy. We measured brain activity evoked by viewing 12 animal classes selected to vary along the dimensions of threat and taxonomic class spanning the range of animacy (Fig. 1). We hypothesized that threat would be identified with one of the as-yet unspecified secondary dimensions within the complex LOC space. However, as we report below, this hypothesis was not supported by our results. Instead, threat was a prominent dimension in a distributed network centered on the right superior temporal sulcus (STS), emerging as independent from low-level visual and taxonomic representations in a posterior-to-anterior progression. The overlap of this network with the social perception network suggests common mechanisms for perceiving the dangerousness of animals and the intentions and dispositions of other humans.
a, The stimuli were images from 12 animal categories spanning a 3 × 2 experimental design with three taxonomic groups (mammals, reptiles, and bugs) and two levels of threat (high and low). b, While undergoing fMRI scanning, subjects saw the stimuli presented in sets of three images in brief succession and monitored whether all three images came from the same animal category. Category-specific brain responses were estimated using the GLM with regressors modeling the encoding trials in which all three images came from the same category. Catch trials (i.e., trials that contained an oddball category) were infrequent and were not included in the subsequent analyses.
Materials and Methods
Participants
Participants were 12 adults with normal or corrected-to-normal vision from the Dartmouth College community (age range, 20–35 years; mean age, 25 years; seven males). Subjects were screened for MRI scanning and provided informed consent in accordance with the Institutional Review Board of Dartmouth College. Subjects were paid an hourly rate for participation.
Stimuli and design
Subjects were shown still images of 12 different animal categories: deer, cottontail rabbits, wolves, cougars, tortoises, frogs, cobras, crocodiles, ladybugs, monarch butterflies, tarantulas, and scorpions. These 12 categories span the cells of a 3 × 2 experimental design (Fig. 1a) with three levels of taxonomic class (mammals, reptiles and amphibians, and bugs) and two levels of threat (low and high). Two unique animal categories were assigned to each cell of this design. For simplicity, we refer to the taxonomic group containing three reptiles and one amphibian as “reptiles.”
The images used for the fMRI experiment comprised 20 color photographs for each of 12 animal categories, plus left–right flipped complements, for a total of 40 unique images per category. The images were collected from the public domain. The images were edited to remove the background and scaled to the maximal size in a 400 × 400 pixel frame. Images were presented to subjects in the MRI scanner using a rear-projection screen positioned at the rear of the scanner and viewed with a mirror mounted to the head coil. Viewed images subtended ∼10° of visual angle. The stimulus presentation program was written in Python and used the PsychoPy package (Peirce, 2007).
Procedure
The stimuli were presented to subjects using a slow event-related design while they were engaged in a simple oddball task (Fig. 1b). “Same” encoding events consisted of three different images of the same category presented consecutively for 500 ms each. Occasional catch trials were pseudorandomly interspersed with same-encoding trials. Catch trials were oddball events in which the first two animals presented were from the same class but the third animal was from a different class (e.g., rabbit, rabbit, tarantula). Subjects responded on every event, pressing the index-finger button for same events and the middle-finger button for oddballs. Events were followed by a 4500 ms interstimulus interval [6000 ms stimulus onset asynchrony (SOA)]. During null events, a fixation cross remained on the screen for the 1500 ms stimulus presentation interval. Event order was determined using a Type 1, Index 1 de Bruijn sequence (Aguirre et al., 2011) for first-order counterbalancing according to a 14-condition design: 12 target conditions plus oddball and blank events. There were three concatenated sequences split into seven runs with six presentations of each stimulus condition per run. To preserve the continuity of the de Bruijn sequence across scanning runs, three dummy trials were added to the beginning and one to the end of each run. There was a total of 88 trials per run, 72 of which were target category trials (excluding oddball, blank, and dummy trials), and a grand total of 616 trials in the experiment, with 504 target encoding events.
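These counts follow directly from the counterbalancing scheme: a Type 1, Index 1 de Bruijn cycle over k conditions contains k² events, with each condition appearing k times. A minimal arithmetic check of the design (illustrative values restating the numbers above, not the authors' stimulus code):

```python
k = 14                                             # 12 target conditions + oddball + blank
seq_len = k ** 2                                   # 196 events per de Bruijn cycle
n_seqs, n_runs = 3, 7
events_per_run = n_seqs * seq_len // n_runs        # 84 counterbalanced events per run
trials_per_run = events_per_run + 3 + 1            # plus 4 dummy trials = 88
per_condition_per_run = events_per_run // k        # 6 presentations of each condition
target_trials_per_run = 12 * per_condition_per_run # 72 target trials per run

assert trials_per_run == 88
assert trials_per_run * n_runs == 616              # grand total of trials
assert target_trials_per_run * n_runs == 504       # target encoding events
```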
Image acquisition
Brain images were acquired with a 3T Philips Achieva Intera scanner with a 32-channel head coil, using gradient-echo echo-planar imaging with a sensitivity-encoded reduction factor of 2. The MR parameters were as follows: TE, 35 ms; TR, 2000 ms; flip angle, 90°; resolution, 3 × 3 mm; matrix size, 80 × 80; and FOV, 240 × 240 mm. There were 42 transverse slices with full-brain coverage, and the slice thickness was 3 mm with no gap. Slices were acquired in an interleaved order. Each of the seven functional runs included 276 dynamic scans and four dummy scans for a total time of 560 s per run. At the end of each scanning session, a single high-resolution T1-weighted (TE, 3.72 ms; TR, 8.176 ms) anatomical scan was acquired with a 3D turbo field echo sequence. The voxel resolution was 0.938 × 0.938 × 1.0 mm, with a bounding box matrix of 256 × 256 × 220 (FOV, 240 × 240 × 220 mm).
Image preprocessing
fMRI image preprocessing was done using AFNI (Automated Functional Neuro-Imaging) software (Cox, 2012; free software available for download online from http://afni.nimh.nih.gov/afni). Time series data were corrected for differences in slice acquisition time (using 3dTshift in AFNI) and subject movement (3dvolreg). The time series were despiked (3dDespike) to remove extreme values not attributable to physiological processes, thus correcting for transient scanner artifacts. The time series were detrended (3dDetrend) within each run to remove fluctuations in signal attributable to scanner drift using up to fifth-order polynomials. Motion-related components estimated during the motion-correction step were also removed from the signal during the detrending step. Values were normalized within each run to make the sum-of-squares equal to 1. Using a general linear model (GLM, using 3dDeconvolve in AFNI), we estimated the voxelwise magnitude of the hemodynamic responses evoked by each of the 12 stimulus classes. Predictor variables were constructed for each stimulus by convolving the presentation time courses with a canonical gamma-variate hemodynamic response model. Catch trials were modeled as nuisance variables and were not analyzed further. For each subject, eight separate GLM analyses were computed: one fitting the predictor variables for the entire seven-run experiment and one for each individual run. The voxelwise β weights for each stimulus class were used as input for all subsequent multivoxel pattern analyses (MVPA) and representational similarity analyses.
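For readers unfamiliar with this step, the following is a minimal sketch of the GLM logic (not the AFNI 3dDeconvolve implementation): binary stimulus onset time courses are convolved with a gamma-variate HRF to build the design matrix, and voxelwise β weights are estimated by least squares. The HRF parameters and the random stand-in time series are assumptions for illustration.

```python
import numpy as np
from scipy.stats import gamma

TR, n_scans, n_conditions = 2.0, 276, 12

t = np.arange(0, 24, TR)
hrf = gamma.pdf(t, a=6)                          # simple gamma-variate HRF shape (assumed)
hrf /= hrf.sum()

# Binary onset time courses: 1 at TRs where a condition's trial begins.
onsets = np.zeros((n_scans, n_conditions))
# ... fill `onsets` from the actual event timing ...

X = np.column_stack([np.convolve(onsets[:, j], hrf)[:n_scans]
                     for j in range(n_conditions)])
X = np.column_stack([X, np.ones(n_scans)])       # add intercept column

y = np.random.randn(n_scans)                     # stand-in for one voxel's time series
betas, *_ = np.linalg.lstsq(X, y, rcond=None)    # voxelwise beta weights
```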
The high-resolution T1-weighted images were used as input to Freesurfer (Fischl and Dale, 2000) for cortical surface reconstruction (using the Freesurfer recon-all suite, which is freely available for download online; http://surfer.nmr.mgh.harvard.edu). The Freesurfer-generated surfaces were transformed to SUMA (Surface Mapping) format using SUMA (Saad and Reynolds, 2012; http://afni.nimh.nih.gov/afni/suma/). Each subject's surfaces were fitted to high- and low-resolution standard mesh grids. The high-resolution grid was defined by an icosahedron with 64 linear divisions, yielding 81,924 nodes for the whole-brain cortical surface, and the low-resolution grid was defined by an icosahedron with 32 linear divisions, yielding 20,484 nodes. The standard mesh grids allow for anatomical correspondence between surface nodes across subjects based on sulcal alignment of inflated surfaces to the MNI template provided by Freesurfer.
Surface-based MVPA searchlight
Functional brain mapping was done using a surface-based searchlight algorithm (Oosterhof et al., 2010, 2011) as implemented in PyMVPA (Hanke et al., 2009), which is documented and freely available for download online (http://www.pymvpa.org). The surface-based searchlight technique is a refinement of the original spherical searchlight proposed by Kriegeskorte et al. (2006). Spherical searchlights run the risk of including mixtures of voxels from across sulcal boundaries and voxels that may include only white matter or CSF, making some results difficult to interpret. Surface-based searchlights ensure that only gray-matter voxels from contiguous regions on the cortical manifold are included in the same searchlight. This method is fully explained by Oosterhof et al. (2011); a brief explanation follows. Each surface node on the low-resolution standard mesh grid defines the center of a local disc comprising neighboring surface nodes of the high-resolution standard grid. Voxels in a subject's native volume are mapped to the surface nodes of the high-resolution grid using a many-to-one mapping, so that each voxel is assigned to only one surface node (its closest node in volume space), whereas each surface node may be assigned more than one voxel. The radius of the searchlight disc was expanded outward from the center node until a desired number of voxels was selected: we used discs containing 100 voxels for all searchlight analyses. Iterating over all of the surface nodes in the low-resolution surface grid, each unique set of 100 voxels in turn was treated as a multivariate dataset, and a measure of interest (e.g., MVPA classification accuracy) was computed and recorded at the center node. Subsequent group analyses were done by computing statistics across subjects (e.g., one-sample t test against chance classification performance) at each node in the low-resolution standard surface mesh grid.
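Schematically, the searchlight logic can be sketched as follows (a simplification of the PyMVPA implementation; the data structures `node_to_voxels` and `neighbors` are assumptions standing in for the surface geometry computed by the toolbox):

```python
import numpy as np

def surface_searchlight(data, node_to_voxels, neighbors, measure, n_vox=100):
    """data: (n_samples, n_voxels) array of beta patterns.
    neighbors[center]: high-res nodes around a low-res center, ordered by distance.
    node_to_voxels[node]: voxel indices assigned to a high-res node.
    measure: callable applied to each searchlight's (n_samples, n_vox) subset."""
    results = {}
    for center, ring in neighbors.items():
        voxels = []
        for node in ring:                     # grow the disc until 100 voxels selected
            voxels.extend(node_to_voxels.get(node, []))
            if len(voxels) >= n_vox:
                break
        results[center] = measure(data[:, voxels[:n_vox]])  # record at center node
    return results
```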
Four-step analysis procedure
We analyzed the data using a four-step procedure. First, we ran searchlight pattern classifiers to identify all regions that contained information for discriminating stimuli based on either taxonomy or level of perceived threat. The purpose of this step was to characterize the anatomical extent of entire pathways that are likely to process these semantic dimensions. Second, we investigated the internal organization of each pathway. Because it is unlikely that the functional profile across a pathway is uniform, it is necessary to segment the pathways to better understand how information is organized and transformed across the pathway. Therefore, we segmented the pathways using a clustering algorithm to group surface nodes based on similarity of searchlight-defined dissimilarity matrices (DMs). This technique provides a functional segmentation of the pathways. Third, we visualized the representational spaces within subregions of each pathway using multidimensional scaling (MDS). Finally, we evaluated the structure of the information content across each pathway by comparing neural DMs from each functional cluster to a set of models that predict structure based on visual features and behavioral judgments. The following sections provide details for each of the four steps in our pipeline.
Step 1: localization using support vector machine patterns classification.
Using the surface-based searchlight approach described above, we localized all brain regions supporting category-specific information based on either taxonomy or threat. For this purpose, we used support vector machine (SVM; Cortes and Vapnik, 1995; Chang and Lin, 2011) pattern classifiers, as implemented in PyMVPA (Hanke et al., 2009). To classify our stimuli based on superordinate categories, such as taxonomy (e.g., mammals vs reptiles vs bugs) or threat (e.g., high vs low), we split the data to avoid training and testing on the same individual subordinate classes (e.g., cougars). In the high-dimensional representational space, some dimensions may carry subordinate category-specific information (e.g., information specific to cougars) but not necessarily about taxonomy or threat. A decision boundary influenced by such dimensions, with cougars in the training set, will produce correct classifications at the superordinate level whenever a cougar is encountered in the testing set, leading to inflated superordinate classification accuracies based on confounded subordinate-level information. We split the data into cross-validated training and testing folds using one design factor (e.g., taxonomy) and then tested classification of the second factor (in this case, threat level) on the left-out taxonomic class. Thus, when training to discriminate taxonomy, we trained the classifier to learn mammals versus reptiles versus bugs with only one-half of the stimulus categories in the training set (e.g., only those with low threat) and then tested on the other half (e.g., only those with high threat). We then repeated training and testing with the roles of the two halves of the stimuli reversed (e.g., train using only high threat and then test using only low threat). Similarly, when training a classifier to discriminate between high and low threat, we trained on two of the taxonomic classes (e.g., mammals and reptiles) and then tested on the left-out class (e.g., bugs). All possible combinations of stimulus class-based cross-validation folds were tested, and the mean accuracies were computed for each subject and recorded at the center node for each searchlight.
For multiclass SVM classification (i.e., mammals vs reptiles vs bugs), we used the standard implementation found in PyMVPA, which computes all linear maximum-margin hyperplanes separating every pair of classes. Test items are classified by each pairwise SVM and then ultimately assigned to the class that received the maximum number of votes. The soft-margin parameter, C, was automatically scaled by the norm of the training data for each data fold.
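A minimal sketch of this cross-factor validation scheme, using scikit-learn's SVC in place of the PyMVPA classifier (label arrays and shapes are assumptions; sklearn's SVC uses the same one-vs-one voting scheme for multiclass problems):

```python
import numpy as np
from sklearn.svm import SVC

def cross_factor_accuracy(X, fold_labels, target_labels):
    """Train on all but one level of `fold_labels` (e.g., taxonomy) and test
    classification of `target_labels` (e.g., threat) on the left-out level,
    so no subordinate category appears in both training and testing sets.
    X: (n_samples, n_voxels) searchlight data."""
    accuracies = []
    for left_out in np.unique(fold_labels):
        train = fold_labels != left_out
        test = fold_labels == left_out
        clf = SVC(kernel='linear', decision_function_shape='ovo')
        clf.fit(X[train], target_labels[train])
        accuracies.append(clf.score(X[test], target_labels[test]))
    return np.mean(accuracies)
```

Classifying taxonomy uses the same function with the roles of the two label arrays reversed, so that the folds are defined by threat level.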
Step 2: organization of surface nodes into functional clusters.
To identify regions of interest (ROIs) that represented coherent functional subregions of the larger networks, we clustered the surface nodes based on the similarity between local representational structure measured as DMs. First, for each node to be clustered, we defined its DM as the set of correlation distances between all pairs of animal categories for the 100-voxel searchlight volume centered on that node. The DMs for all nodes to be clustered were vectorized and used as the rows in an N × M matrix, where N is the total number of nodes and M is the total number of pairwise distances in the upper triangle of the DM (i.e., 66 given 12 stimuli). Because the surface nodes were anatomically aligned between subjects using the standard mesh grid, a unique N × M matrix was produced for each subject in which the rows corresponded to the same N nodes.
We performed hierarchical clustering with the Ward method (Ward, 1963), implemented in the Scikit-learn Python package (Pedregosa et al., 2011). The stability and reproducibility of the clusters was assessed using cross-validation across subjects, following a scheme based on the study by Yeo et al. (2011). First, we calculated the searchlight-based DMs for all subjects. Then, for each surface node, we took the average DM across a randomly selected half of the subjects. We used Ward clustering to find cluster solutions for k = 2 through k = 100 clusters on these data. For each k, we then predicted the clustering solution for the second half of the data based on the centroids determined by the clustering solution on the first half, yielding a cluster prediction map. Next, we ran the clustering algorithm again on just the second half of the data with k clusters. We then calculated a consistency measure by comparing the predicted cluster map with the cluster map obtained by clustering the second half of the data directly, using an adjusted mutual information statistic (Vinh et al., 2009). We repeated the process 1000 times with different random halves of the data for each k from 2 through 100. Maximum consistency for clustering the threat maps was found with seven clusters, whereas maximum consistency for the taxonomy maps was found with 10 clusters. Thus, we chose the k = 7 and k = 10 solutions for the threat and taxonomy networks, respectively, for additional investigation.
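One iteration of this split-half stability procedure might be sketched as follows (a simplification: held-out nodes are assigned to clusters by nearest centroid, and the input shapes are assumptions):

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import adjusted_mutual_info_score

def split_half_consistency(dms, k, rng):
    """dms: (n_subjects, n_nodes, 66) vectorized searchlight DMs.
    Example call: split_half_consistency(dms, k=7, rng=np.random.default_rng(0))."""
    order = rng.permutation(len(dms))
    half1 = dms[order[:len(dms) // 2]].mean(axis=0)   # (n_nodes, 66) average DMs
    half2 = dms[order[len(dms) // 2:]].mean(axis=0)

    labels1 = AgglomerativeClustering(n_clusters=k, linkage='ward').fit_predict(half1)
    centroids = np.stack([half1[labels1 == c].mean(axis=0) for c in range(k)])

    # Predict half2 cluster membership from half1 centroids ...
    predicted = np.argmin(((half2[:, None, :] - centroids[None]) ** 2).sum(-1),
                          axis=1)
    # ... and compare with clustering half2 directly.
    labels2 = AgglomerativeClustering(n_clusters=k, linkage='ward').fit_predict(half2)
    return adjusted_mutual_info_score(predicted, labels2)
```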
After determining the clusters based on representational similarity, the resulting clusters were subjected to a secondary spatial clustering step. The purpose of this step was to exclude isolated nodes and to identify anatomically plausible functional clusters. Spatial clustering used a proximity criterion of 10 mm and a neighborhood criterion of 20 nodes (i.e., each retained node required at least 20 neighboring nodes within 10 mm) and was implemented using the AFNI SurfClust program.
Step 3: visualization of representational spaces within clusters.
For visualization of the representational structure within the cluster-defined ROIs, we used STATIS (Structuration des Tableaux À Trois Indices de la Statistique; Abdi et al., 2012). STATIS is a method for combining data tables from multiple subjects to provide a group MDS solution using principal components analysis (PCA; Kruskal and Wish, 1978; Abdi and Williams, 2010; Abdi et al., 2012). Each participant contributes a data matrix with dimensions N × M, where N is the number of stimulus classes (in our case 12) and M is the number of voxels. Note that our cluster-defined ROIs are defined over surface nodes in the standard mesh grid, but the mappings from the nodes to the corresponding voxels in each subject are unique; therefore, the value of M varies across subjects. Before analysis, the individual data tables were mean centered along the columns, mean centered along the rows, and then normalized along the rows by dividing the row values by the row norms. Note that this transformation yields N × N cross-product matrices that are equivalent to Pearson's correlation matrices on the rows. We first analyze the between-subject correspondence between representational structures by computing the eigendecomposition of the S × S Rv matrix (Abdi, 2007), where S is the number of subjects (in our case 12); the Rv matrix reflects the pairwise correlations between similarity structures for all pairs of subjects. The first eigenvector provides a weight for each subject that corresponds to how similar that subject's similarity space is to the group average. These weights may also be used to identify individual differences and outliers among the subjects. The weights are then used to scale the original N × M data matrices, optimally weighting individual contributions to a group compromise data matrix. We combined the weighted subtables by stacking them horizontally to produce an N × M̂ group compromise, where M̂ is the sum of M across subjects. The group PCA solution is then computed using the singular value decomposition of the compromise matrix. After computing the PCA of the compromise, we compute confidence intervals for the factor scores for each stimulus class on each PCA dimension. This is done using bootstrap resampling of subjects with replacement and projecting the recalculated factor scores into the PCA space (Abdi et al., 2009). We visualize the distribution of bootstrapped factor scores by fitting ellipses containing 95% of the points to the factor score clouds in two dimensions.
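The core of STATIS can be sketched compactly (a simplified reading of Abdi et al., 2012, assuming the per-subject tables have already been double centered and row normalized as described above):

```python
import numpy as np

def statis_factor_scores(tables):
    """tables: list of per-subject (N x M_s) matrices, preprocessed as above.
    Returns group factor scores (N x components) for the N stimuli."""
    # N x N cross-product matrix per subject (~ correlation structure over stimuli)
    cps = [T @ T.T for T in tables]
    # Rv matrix: normalized inner products between subjects' cross-product matrices
    rv = np.array([[np.sum(a * b) / np.sqrt(np.sum(a * a) * np.sum(b * b))
                    for b in cps] for a in cps])
    evals, evecs = np.linalg.eigh(rv)
    weights = np.abs(evecs[:, -1])            # first eigenvector -> subject weights
    weights /= weights.sum()
    # Weighted horizontal stack forms the group compromise; PCA via SVD.
    compromise = np.hstack([w * T for w, T in zip(weights, tables)])
    U, s, Vt = np.linalg.svd(compromise, full_matrices=False)
    return U * s                              # factor scores for the N stimuli
```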
Step 4: evaluation of models for explaining representational structure.
Similarity ratings were collected using Amazon's Mechanical Turk (AMT) crowdsourcing service (Mason and Suri, 2012). Subjects performed a triad judgment task in which they chose the odd one out from a set of three images presented side by side. In the taxonomic condition, subjects were instructed to make decisions based on the “kind of animals depicted.” In the threat condition, subjects were instructed to make their decisions based on “how dangerous” the animals were. The combinations of 3 of the 12 stimuli yield 220 unique triads. For a given condition, every unique triad was judged by 10 different AMT workers, and each worker was assigned a subset consisting of one-fourth (55) of the triads. Across conditions, 80 anonymous raters made a total of 4400 triad judgments. Raters were paid a fixed sum for each set of 55 judgments. Triad judgments were converted to DMs following the procedures described by Connolly et al. (2012). The resulting behavioral DMs were used as models of semantic spaces separately reflecting threat-related and taxonomy-related semantics. Henceforth, we refer to the threat and taxonomy behavioral models as THREAT and TAX, respectively.
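The conversion from triad judgments to a DM can be sketched as follows (a schematic of the general logic, not necessarily the exact procedure of Connolly et al., 2012: each triad contributes one observation to each of its three pairs, and the pair kept together gains similarity):

```python
import numpy as np
from itertools import combinations

def triads_to_dm(judgments, n_items=12):
    """judgments: list of (a, b, odd) index tuples, where items a and b were
    grouped together and `odd` was chosen as the odd one out."""
    sim = np.zeros((n_items, n_items))
    count = np.zeros((n_items, n_items))
    for a, b, odd in judgments:
        for i, j in combinations((a, b, odd), 2):
            count[i, j] += 1
            count[j, i] += 1
        sim[a, b] += 1                        # the kept pair gains similarity
        sim[b, a] += 1
    with np.errstate(invalid='ignore', divide='ignore'):
        dm = 1.0 - np.where(count > 0, sim / count, 0.0)
    np.fill_diagonal(dm, 0.0)
    return dm
```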
In addition to these behavioral models, we used a computational model of visual processing in the primary visual cortex to model the early visual neural response to the stimulus images (HMAX; Serre et al., 2007). This standard model is a hierarchical bank of Gabor wavelet filters with various orientations, spatial frequencies, and visual field locations. We used the C1 units of the HMAX model to model a grayscale version of each stimulus image. C1 units are the second layer of the HMAX model and are said to have response profiles similar to those of complex cells in the striate cortex. The C1 unit vectors for each stimulus were averaged across exemplars for each of the 12 animal categories, and the DM for this early visual model (VIS) was calculated using correlation distance between all pairs of these averaged C1 unit vectors.
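Computing the VIS DM from the model outputs then reduces to a correlation distance over category-averaged C1 vectors (a sketch; the array of C1 responses and its dimensions are placeholders):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)
c1 = rng.random((12, 40, 1000))   # placeholder: (categories, exemplars, C1 units)

category_means = c1.mean(axis=1)                                  # average over exemplars
vis_dm = squareform(pdist(category_means, metric='correlation'))  # 12 x 12 VIS DM
```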
The THREAT model was not correlated with TAX (r = 0.01) or VIS (r = −0.02), and TAX and VIS were moderately correlated with each other (r = 0.33).
When comparing model DMs with neural DMs, it is important to calculate a measure of the noise ceiling that reflects the maximum correlation between DMs that can be expected given the level of noise inherent in the model. In our model DMs derived from behavioral judgments, the source of noise is intrasubject and intersubject differences, whereas in VIS, noise for a particular category, e.g., scorpions, arises from differences across exemplar images of that category. To calculate the noise ceiling for VIS, we computed the average correlation between different approximations of VIS using bootstrap resampling of our stimuli for each category, yielding a new VIS DM on each iteration. We then calculated the noise ceiling as the average correlation between all bootstrapped VIS DMs. We then repeated this step for TAX and THREAT using bootstrap resampling of our AMT subjects to measure the effects of between-subject variation.
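For VIS, the bootstrap noise-ceiling estimate might look like the following sketch (exemplars are resampled with replacement within each category; shapes and iteration counts are assumptions):

```python
import numpy as np
from scipy.spatial.distance import pdist

def vis_noise_ceiling(c1, n_boot=100, seed=0):
    """c1: (n_categories, n_exemplars, n_units) C1 responses."""
    rng = np.random.default_rng(seed)
    n_cat, n_ex, _ = c1.shape
    dms = []
    for _ in range(n_boot):
        means = np.stack([c1[c, rng.integers(0, n_ex, n_ex)].mean(axis=0)
                          for c in range(n_cat)])        # resample exemplars
        dms.append(pdist(means, metric='correlation'))   # bootstrapped VIS DM
    corrs = np.corrcoef(np.array(dms))                   # DM-to-DM correlations
    return corrs[np.triu_indices(n_boot, k=1)].mean()    # mean off-diagonal r
```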
Computing environment
Stimulus delivery, data preprocessing, and analysis were performed using the Debian GNU/Linux operating system with additional neuroscience software from the NeuroDebian repository (Halchenko and Hanke, 2012).
Results
The SVM classification searchlight revealed several regions that carry information discriminating between low- and high-threat animals (Fig. 2). The searchlight results reflect classifiers trained on stimuli from two of the three superordinate animal categories—mammals, reptiles, or bugs—and tested on the left-out category. For example, a classifier trained to separate high- versus low-threat animals based only on exemplars from mammals and bugs was then tested on just the reptiles. This conservative cross-validation scheme ensured that accuracy for threat was not attributable to learning to classify individual subordinate classes or to learning narrow-scope threat variation that might apply only within particular taxonomic classes.
A, Map showing the mean group accuracy for better-than-chance classification (chance, 0.5) for high versus low threat. Pattern classification was performed using a linear SVM classifier within a surface-based searchlight. The accuracy values passing the group result threshold of t(11) > 3 ranged from ∼0.53 to 0.58. B, Group searchlight t map for better-than-chance classification (chance, 0.33) for taxonomic class controlling for threat. Significantly above-chance accuracies ranged from 0.38 to 0.63. Maps in A and B are thresholded at t(11) > 3 (p = 0.006, uncorrected for multiple comparisons).
The regions that carried information related to threat included primarily an extended area in the right STS, with additional patches of cortex in the left posterior STS (STSp), the left anterior part of the intraparietal sulcus, the left inferior frontal sulcus, a portion of early visual cortex (EV) along the calcarine sulcus, and several smaller isolated locations in the frontal and temporal lobes.
In parallel to the threat analyses, searchlight classification was done to map three-way taxonomic class discrimination: mammals versus reptiles versus bugs. A similarly conservative cross-validation scheme was used in these analyses, in which a classifier was trained on high-threat animals only (i.e., wolf, cougar, crocodile, cobra, scorpion, tarantula) and tested on low-threat animals only (i.e., deer, rabbit, tortoise, frog, butterfly, ladybug) and vice versa. The searchlight results for discriminating taxonomic classes yielded significant classification accuracy throughout high-level visual cortex, including the ventral temporal, lateral occipital, and posterior intraparietal cortices (Fig. 2). We also observed significant classification accuracy in the EV, with the exception of a small region at the occipital pole. The extent of the taxonomic classification map largely replicates our previous findings (Connolly et al., 2012; Sha et al., 2015).
The SVM searchlights identified regions of cortex that carry information for discriminating between the different levels of threat and taxonomy, but this information sheds little light on the functional organization within these regions or on how the different regions might be related to each other. To investigate the organization within these networks, we used hierarchical clustering to subdivide the above-threshold nodes from the SVM maps, i.e., those nodes with t > 3 in the group analysis. [Note that t tests have been shown to be robust against violations of normality (Rasch and Guiard, 2004), but their use for making valid population inferences about information measures may be controversial. Our choice of the t test in this context reflects standard current practice.] This analysis identified clusters of nodes with similar representational geometries, as indexed by the representational DMs of responses to the 12 animal categories. Figure 3A shows the seven clusters from the threat map. The spatial clustering step, with a criterion of 20 neighbors within 10 mm, yielded four large anatomical clusters: (1) the right anterior STS (STSa, cluster 7); (2) the right mid-STS (STSm, cluster 6); (3) the right STSp (cluster 5); and (4) the EV (cluster 1). Clusters 2–4 yielded no nodes surviving our initial spatial clustering criteria. For completeness, to include these clusters in the analysis, we relaxed our spatial clustering criterion to require just 10 neighbors instead of 20, keeping the proximity criterion at 10 mm. The locations of clusters 2–4 can be seen in Figure 3.
Maps showing the clusters found by clustering the surviving surface nodes from the searchlight maps (Fig. 2) based on the similarity between local representational similarity matrices measured using a searchlight for each surface node. In a second step, the spatial clustering was applied to identify anatomically adjacent groups of surface nodes. A, The threat map yielded seven clusters. B, The taxonomy map yielded 10 clusters.
All 10 of the clusters derived from the taxonomy map yielded spatially coherent clusters (Fig. 3B). The pattern of clusters emerging from the taxonomy map resembles a bilateral progression of functionally related regions along the ventral and lateral occipitotemporal pathway. Starting with cluster 1 in the EV, the lateral and ventral anterior progression is mirrored bilaterally from clusters 3 through 7. Along the medial surface starting from the occipital pole, the progression is slightly different: starting in the EV, cluster 1 gives way to clusters 2, 8, and then 9 along the medial anterior progression, with cluster 10 appearing only on the right.
To visualize the representational structure within regions, we used MDS calculated with STATIS (Abdi et al., 2012). For simplicity, we have included MDS plots (Fig. 4) for the two regions that showed the clearest effects of threat and taxonomy, the STSa (threat cluster 7) and the anterior LOC (taxonomy cluster 5), respectively. The three-dimensional depiction of the representational space defined over the first three PCs of the STATIS solution for the STSa (Fig. 4A) shows a plane separating high- from low-threat animals. To estimate the stability of the representational spaces, we used bootstrap resampling of subjects to calculate 95% confidence intervals for the factor scores of the stimulus classes on each PC (Fig. 4, ellipses). The biplots for the STSa for PCs 1 versus 2 and 1 versus 3 reflect the stability of the solution and show how high- and low-threat animals are separated from each other, as indicated by non-overlapping ellipses. The representational structure in taxonomy cluster 5 in the LOC shows strong separation between mammals and bugs on the first PC, with reptiles in between. This result mirrors the animacy continuum result that we have reported previously (Connolly et al., 2012; Sha et al., 2015). PCs 2 and 3 appear to mainly reflect differences among the various reptiles.
MDS for threat cluster 7 (STSa; A) and taxonomy cluster 5 (LOC; right; B). A, Top, Three-dimensional plot showing the first three PCs from STATIS for STSa, color coded for low (blue) and high (red) threat. The middle and bottom plots show the second and third dimensions, respectively, plotted against the first dimension. B, Top, Three-dimensional plot showing the first three PCs for the anterior LOC, color coded for mammals (brown), reptiles and amphibians (green), and bugs (purple). Middle and bottom plots show the second and third dimensions, respectively, plotted against the first dimension. The ellipses show 95% confidence intervals for the values of the factor scores based on 1000 bootstrap resamplings of the subjects. Tau is the percentage of variance accounted for by each PC.
Evaluation of predictive models
Finally, to account for the representational organization within these regions and how they may be functionally related to each other, we looked at the relationships among the representational structures defined by the DMs for each of the regions, two models of representational structure based on behavioral ratings (THREAT and TAX), and one model of visual features (VIS; Fig. 5).
DMs for threat (THREAT), taxonomy (TAX), and the visual model (VIS). THREAT and TAX were derived from behavioral judgments collected using AMT. For THREAT, participants were instructed to make similarity judgments based on how dangerous the animal is, whereas for TAX, subjects were instructed to make judgments based on what kind of animal was depicted. VIS was based on features from a computational neural model (HMAX; Serre et al., 2007) to model the visual response to the stimulus images. The THREAT model was not correlated with TAX (r = 0.01) or VIS (r = −0.02), and TAX and VIS were moderately correlated with each other (r = 0.33). Noise ceiling calculations based on bootstrap resampling of behavioral subjects (for THREAT and TAX) or stimuli (for VIS) revealed maximum expected correlations with these models to be bounded by r = 0.99, r = 0.99, and r = 0.88 for THREAT, TAX, and VIS, respectively. Model DMs are shown on the same scale normalized over the interval zero to one.
The similarity relationships between pairs of DMs are shown using classical MDS in Figure 6. The MDS solutions were calculated separately for threat and taxonomy clusters based on DMs defined over the following: (1) seven threat clusters and three models (Fig. 6A); and (2) 10 taxonomy clusters and three models (Fig. 6B). In both solutions, the first PC captures the maximal distance between the THREAT model versus the VIS and TAX models, and the second PC separates VIS from TAX. The pattern of factor scores for threat clusters (Fig. 6A) shows that cluster 7 (STSa) was closest to the THREAT model, cluster 5 (STSp) closest to TAX, and cluster 1 (EV) closest to the VIS model. The factor scores for the 10 taxonomy clusters (Fig. 6B) had almost uniformly negative values on the threat-defined first PC. Their values instead spanned the range of the second PC, with the clusters in the occipital pole (1 and 2) closest to the VIS model, the transitional occipitotemporal cluster 3 neutrally loaded on PC2, and the clusters in the LOC (4–7) nearest to TAX.
Classical (metric) MDS showing the factor scores on the first two PCs for distance matrices defined between clusters and predictive models. A, Factor scores for the first two PCs for the MDS solution calculated from the pairwise DM defined over the seven threat clusters (circles) and three models (squares). Red arrow corresponds to increasing similarity with THREAT along the posterior–anterior axis of the right STS. B, Factor scores for the first two PCs for the MDS solution calculated from the pairwise DM defined over 10 taxonomy clusters (diamonds) and three models (squares). The green arrow corresponds to increasing similarity with TAX from EV to LOC. The solutions in A and B were calculated separately.
The factor scores along PC1 for the threat clusters in the right STS (Fig. 6A) suggest a trend of increasing similarity to THREAT in the posterior-to-anterior direction. To quantify this effect, we used the bootstrapped DMs to recalculate the MDS solution in Figure 6A (10,000 times) and for each iteration recorded the distances between THREAT and each of the STS clusters along the first PC. For each iteration, we calculated the slope of the linear fit to the ordered distances. A negative slope suggests decreasing distance between the ROIs and THREAT from STSp to STSm to STSa. The number of bootstrapped solutions producing a negative slope was 9873 of 10,000 (p < 0.02).
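The slope test reduces to a few lines (a sketch; the bootstrapped distance array is an assumed input produced by the MDS resampling described above):

```python
import numpy as np

def slope_pvalue(dists):
    """dists: (n_boot, 3) PC1 distances from THREAT to the clusters,
    ordered STSp, STSm, STSa, one row per bootstrapped MDS solution."""
    x = np.arange(3)
    slopes = np.array([np.polyfit(x, d, 1)[0] for d in dists])
    return np.mean(slopes >= 0)   # proportion of non-negative slopes = p value
```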
To measure the amount of unique variance in the brain DMs accounted for by each of our models, we calculated semipartial correlations for each brain region and each model DM by first regressing each model DM on the DMs of the other two models and then correlating the resultant residuals with the brain DMs (Fig. 7; Tables 1, 2). Using this method and bootstrap resampling of subjects, we calculated the unique explanatory contribution for each model for each region with 95% confidence intervals.
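Concretely, the semipartial correlation for one region and one model can be sketched as follows (inputs are vectorized upper-triangle DMs; variable names are assumptions):

```python
import numpy as np

def semipartial_r(brain_dm, model_dm, other_dms):
    """Regress `model_dm` on the other two model DMs and correlate the
    residuals with `brain_dm`. All inputs are length-66 vectors."""
    X = np.column_stack([np.ones_like(model_dm)] + list(other_dms))
    beta, *_ = np.linalg.lstsq(X, model_dm, rcond=None)
    residual = model_dm - X @ beta          # unique part of the model DM
    return np.corrcoef(residual, brain_dm)[0, 1]
```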
A, The semipartial correlations between the three models and the four surviving threat clusters. B, Semipartial correlations between the models and seven clusters visible along the right lateral surface. Error bars indicate 95% confidence intervals based on bootstrap resampling of the subjects.
Semipartial correlations (with ±95% confidence intervals) explaining unique variance accounted for by models (THREAT, TAX, and VIS) for seven threat clusters
Semipartial correlations (with ±95% confidence intervals) explaining unique variance accounted for by models (THREAT, TAX, and VIS) for 10 taxonomy clusters
The THREAT model DM was positively correlated with each of the DMs for the threat-derived ROIs. This is expected because these regions were derived from the THREAT classification searchlight map. Visual inspection of the posterior-to-anterior trends along the STS suggests that the influences of the TAX and VIS models diminish in progression from posterior to anterior, whereas the relative influence of THREAT increases. The absolute magnitude of the correlation of THREAT with the STSa was significantly greater than that for the STSm (p < 0.05; based on the distribution of bootstrapped partial correlations) and trended toward significance compared with the STSp (p < 0.1).
The purpose of these analyses is to illuminate relationships between subregions of the networks and the relative goodness of fit for different explanatory models of representation. We have taken care to avoid making claims based on evaluating ROIs against the same criteria used to identify them. For example, as noted, it is expected that the threat-derived ROIs all correlate significantly with the THREAT model because these regions were selected initially using searchlight classification for high versus low threat. However, the observation that VIS best predicts representational structure in cluster 1 (EV) is not predetermined by the selection process, although it is consistent with what we know about the EV. Similarly, TAX predicts representational structure in cluster 5 (STSp) better than the THREAT model.
Threat representation becomes disentangled from confounding visual and taxonomic representations as the relative influence of these factors diminishes. To test the relative influences of THREAT versus TAX and VIS, we used the bootstrapped correlation data to measure the interactions between pairs of ROIs across the STS and the target model DMs. Thus, to test whether the relative influence of THREAT compared with VIS increased from the STSp to the STSa, we first calculated the difference between the correlation values for THREAT and VIS with the STSa for each bootstrap (STSaTH-V), then we calculated the difference between THREAT and VIS with the STSp (STSpTH-V) for each bootstrap, and finally we calculated the difference between these two quantities for each bootstrap (STSaTH-V − STSpTH-V). Because we expect the difference between THREAT and VIS to be greater in the STSa than in the STSp, these values should be greater than zero, and so the proportion of bootstrapped differences below zero directly gives a p value to test for significance. For the interaction tested by (STSaTH-V − STSpTH-V), the proportion of values below zero based on 10,000 bootstraps was p = 0.0011, i.e., significant at α = 0.002. A similar test for the interaction between the STSa and STSm (STSaTH-V − STSmTH-V) yielded a proportion of values below zero of p = 0.02, and that for the STSm versus STSp (STSmTH-V − STSpTH-V) was not significant. Thus, the relative influence of THREAT compared with VIS significantly increases from the STSp to the STSa and from the STSm to the STSa, with a nonsignificant increase between the STSp and STSm. Parallel analyses testing the relative influence of THREAT versus TAX yielded the following: STSaTH-T − STSpTH-T, p < 0.0001; STSaTH-T − STSmTH-T, p = 0.05; STSmTH-T − STSpTH-T, p < 0.0001. Thus, the relative influence of THREAT versus TAX increases from the STSp to the STSm and again from the STSm to the STSa.
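The interaction test itself is a difference of differences over bootstraps (a sketch; the four input arrays are the bootstrapped model-ROI correlations described above):

```python
import numpy as np

def interaction_pvalue(r_threat_ant, r_vis_ant, r_threat_post, r_vis_post):
    """Each argument: (n_boot,) bootstrapped correlations of one model with
    one ROI. Returns the proportion of bootstraps in which the THREAT-VIS
    advantage is NOT larger in the anterior ROI, i.e., the p value."""
    diff = (r_threat_ant - r_vis_ant) - (r_threat_post - r_vis_post)
    return np.mean(diff < 0)
```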
These analyses show the directionality of the transformation of representational structure—THREAT emerges as the predominant explanatory factor from posterior to anterior along the STS by paring away confounding representations of low-level visual features and semantic information about taxonomy.
Visual inspection of trends outside the STS suggests a similar posterior-to-anterior tradeoff between the influences of VIS and TAX. Across bootstraps, the DM in taxonomy cluster 1 (EV) was always more strongly correlated with VIS than with either TAX (p < 0.0001) or THREAT (p < 0.0001). Across the contiguous taxonomy clusters 3–5, the correlations with VIS decrease, whereas the correlations with TAX increase. To show the significance of these trends, it suffices to show the following. (1) The interaction between taxonomy clusters 4 and 3 (C4T-V − C3T-V)—with cluster 4 just anterior to cluster 3—yielded no values less than zero over 10,000 bootstraps. Thus, the (signed) difference between the correlations of TAX and VIS in cluster 4 was always greater than that in cluster 3 (p < 0.0001). (2) The interaction between taxonomy clusters 5 and 4 (C5T-V − C4T-V)—with cluster 5 just anterior to cluster 4—also yielded no values less than zero. Thus, the difference between the correlations of TAX and VIS in cluster 5 was always greater than that in cluster 4 (p < 0.0001), consistent with a transition from visual representation in the posterior LOC to taxonomic representation in the anterior LOC.
Discussion
This study investigated how knowledge about the natural world is encoded in neural populations in the human neocortex. Our strategy was to measure and evaluate patterns of BOLD activity across the brain in response to viewing a set of naturalistic stimuli chosen to orthogonalize psychological dimensions important for categorization within the animal kingdom ontological domain. Following classic studies in cognitive psychology, our target dimension was perceived threat—referred to in that literature as predacity—for which the neural basis was unexamined until now. The other dimension of interest was taxonomic classification, for which we chose representative animals from mammals, reptiles, and bugs. These classes were chosen to span the implicit dimension we refer to as the animacy continuum, a major feature of LOC representation. Our primary finding was that, although animacy dominated representational structure in the LOC, threat emerged as an independent dimension along a separate dorsolateral temporal pathway.
To summarize our findings, we used searchlight pattern classifiers to localize regions supporting distinctions based on taxonomy or threat, resulting in two mostly non-overlapping pathways: (1) threat classification yielded a set of nodes spanning from the EV through the right STS; and (2) taxonomic classification yielded the familiar ventrolateral occipitotemporal LOC pathway. To investigate how information is represented and transformed along these pathways, we used clustering to divide the pathways into subregions based on intrinsic functional organization. MDS within subregions provided visual evidence for a representational space separating animal classes based on perceived threat along the first two PCs in the STSa (Fig. 4A, threat cluster 7) and by the animacy continuum along the first PC in the LOC (Fig. 4B, taxonomy cluster 5). Finally, we illuminated how information is transformed across each pathway by evaluating the relative goodness of fit for three explanatory models—THREAT, TAX, and VIS—corresponding to DMs derived from judgments of threat, judgments of taxonomy, and a model of early vision, respectively.
Evaluation of the models across the threat pathway revealed that the influence of threat becomes more pronounced relative to the other models in a posterior-to-anterior progression. Structure in the STSp (threat cluster 5) reflects a combination of THREAT, TAX, and VIS, with a trend for stronger taxonomic organization, followed by early visual features and then threat. The prominence of taxonomic representation in the STSp suggests that its representational geometry is an early stage in the emergence of threat as an independent dimension. Correlations with VIS and TAX diminish moving from the STSp to the STSm (threat cluster 6), whereas the correlation with THREAT remains mostly constant. The STSa was significantly more strongly correlated with the THREAT model than with the other models, with a trend toward increasing absolute correlation with threat in the STSa compared with the STSp and STSm. These patterns suggest a progression that disentangles the representation (DiCarlo and Cox, 2007) of threat by paring away representations of early visual and unrelated semantic features to build a clearer, unconfounded representation.
We found no evidence for threat representation in the ventral pathway. Although we cannot rule out the possibility that ventral regions might contribute threat-relevant signals as inputs to the STS pathway, the lack of evidence for such contributions is consistent with the view of independent pathways for visual representation along the ventral and the more dorsal lateral temporal cortices. Evidence from fMRI, transcranial magnetic stimulation, and white-matter tractography suggests the independence of the ventral face pathway—including the occipital and fusiform face areas—from the lateral STS face-processing regions (Gschwind et al., 2012; Pitcher et al., 2014). Furthermore, connections from the STSa to medial structures including the amygdala (Gschwind et al., 2012; Kravitz et al., 2013) suggest that this pathway mediates evaluation of visual stimuli for emotional response; however, our investigation of the amygdala showed no effects of threat in either response magnitude or multivariate signal (data not shown), consistent with previous findings that increased amygdala response is evoked only by phobia-relevant stimuli in subjects with specific phobias and not in normal controls (Dilger et al., 2003; Ahs et al., 2009).
The STS is widely accepted to be a central component of the social perception network (Allison et al., 2000) and is part of the core system for face perception (Haxby et al., 2000), representing the changeable aspects of faces that convey expressions and intentions. It is also activated by biological motion such as point-light displays of human actions (Grossman et al., 2010), geometric shapes that appear to act with agentic purpose (Castelli et al., 2000; Gobbini et al., 2007), and robots (Shultz and McCarthy, 2012). The STSa reflects head-angle-invariant representation of gaze direction (Carlin et al., 2011) and dynamic information about faces (Pitcher et al., 2011) critical for evaluating the intentions of others. Our demonstration that the STSa is also a central locus of threat representation therefore suggests a relationship between the representation of threat from animals and social perception. The STSa—and the extended network associated with it—is involved in the cognitive evaluation of threat, and, in social contexts, evaluation of threat involves assessing the aggressiveness or trustworthiness (Winston et al., 2002; Engell et al., 2007) of other humans and their current intentions with regard to oneself.
Social cognition, by definition, involves thinking about conspecifics. However, there is evidence of overlap in cognitive mechanisms for perceiving conspecific and interspecific actions. Neural activity evoked by observing motions of animals overlaps with that for observing motions of humans (Ptito et al., 2003). Observing mouth movements of dogs, monkeys, and humans activates motor cortex if the actions are in the human repertoire (Buccino et al., 2004). The STSp in dog experts but not non-experts is active when viewing socially relevant dog postures (Kujala et al., 2012), consistent with findings that perception of fear in dogs is different for experts and non-experts (Wan et al., 2012).
We also examined a set of regions identified by high classification accuracy for taxonomic classes, including regions spanning most of the LOC, taxonomy clusters 3–7 (Figs. 3, 6, 7). Analysis of these regions suggests a progression from early to late vision: cluster 3 was best predicted by the VIS model, cluster 4 correlated slightly more with TAX than VIS, and clusters 5–7 were clearly best predicted by TAX. Our structural analyses of cluster 5, a central cluster of the LOC (Fig. 4), revealed maximal distinction between mammals and bugs with reptiles in between, replicating the animacy continuum result reported by Connolly et al. (2012) and Sha et al. (2015), in which highly animate animals (e.g., mammals) evoke greater activity in lateral than in medial ventral temporal cortex, whereas less animate animals (e.g., bugs) show the reverse pattern (topographical analysis not shown here).
Conclusions
The representation of threat independent of the biological class of perceived animals is encoded within neural systems that also play a central role in social perception. The representation of agents—animals, people, or animated cartoons—that are motivated by complex inner states is related to the animacy continuum (Gobbini et al., 2007, 2011; Sha et al., 2015). However, threat is independent of the degree of perceived animacy, insofar as animals both high and low in animacy can be predators or prey, for example, wolves versus rabbits or scorpions versus ladybugs. Therefore, perception of threat is apparently related to a different dimension of social perception: the perception of dispositions that can lead to threatening or benign actions. The internal states that motivate these actions can be complex or simple, because the relevant factor is the potential danger to oneself or others.
Footnotes
Funding was provided by National Institute of Mental Health Grants F32MH085433-01A1 (A.C.C.) and 5R01MH075706 (J.V.H.) and by National Science Foundation Grant NSF1129764 (J.V.H.). We thank the present and former members of the Haxby Laboratory who provided expert advice and assistance, especially Jason Gors, Michael Hanke, and Yu-Chien Wu.
The authors declare no competing financial interests.
Correspondence should be addressed to Andrew C. Connolly, Department of Neurology, Dartmouth Hitchcock Medical Center, One Medical Center Drive, Lebanon, NH 03756. E-mail: andrew.c.connolly@dartmouth.edu