There is now strong empirical evidence that modal perception, action, and emotion systems play a large role in concept retrieval (Fischer & Zwaan, 2008; Kiefer & Pulvermüller, 2012; Meteyard, Rodriguez Cuadrado, Bahrami, & Vigliocco, 2012). Concepts are generalizations derived from sensory, motor, and affective experiences, and the principle that modal brain systems responsible for these experiences are also involved in knowledge retrieval provides a parsimonious account of concept acquisition and storage (Barsalou, 1999; Damasio, 1989). Embodiment of conceptual knowledge provides a natural mechanism for grounding concepts in perception and action and thus the critical means by which concepts can refer to the external world (Harnad, 1990).

Much more research is needed to clarify the extent to which different levels of sensory, motor, affective and other hierarchical processing systems are involved in concept representation, the role of bimodal and multimodal areas, the involvement of these systems in representing temporal and spatial event concepts, their role in abstract concepts, and so on. As this research unfolds, it is also useful to keep in mind that not all brain areas that process concepts are content-specific. Before the “embodiment revolution,” it was not uncommon to study conceptual processing in the brain without reference to any specific type of semantic content. Many functional imaging studies, for example, compared neural responses evoked by unselected words with responses evoked by pseudowords (Binder et al., 2003; Binder, Medler, Desai, Conant, & Liebenthal, 2005; Cappa, Perani, Schnur, Tettamanti, & Fazio, 1998; Démonet et al., 1992; Henson, Price, Rugg, Turner, & Friston, 2002; Ischebeck et al., 2004; Kotz, Cappa, von Cramon, & Friederici, 2002; Kuchinke et al., 2005; Mechelli, Gorno-Tempini, & Price, 2003; Orfanidou, Marslen-Wilson, & Davis, 2006; Rissman, Eliassen, & Blumstein, 2003; Xiao et al., 2005). The assumption was that meaningful words would engage concept retrieval to a greater degree than meaningless pseudowords, regardless of the specific content of the word meanings. 
A similar logic applied to studies contrasting related word pairs with unrelated word pairs (Assaf et al., 2006; Graves, Binder, Desai, Conant, & Seidenberg, 2010; Mashal, Faust, Hendler, & Jung-Beeman, 2007; Mechelli, Josephs, Lambon Ralph, McClelland, & Price, 2007; Raposo, Moss, Stamatakis, & Tyler, 2006) and studies contrasting sentences with random word strings (Humphries, Binder, Medler, & Liebenthal, 2006; Humphries, Willard, Buchsbaum, & Hickok, 2001; Kuperberg et al., 2000; Mashal, Faust, Hendler, & Jung-Beeman, 2009; Obleser & Kotz, 2010; Obleser, Wise, Dresner, & Scott, 2007; Pallier, Devauchelle, & Dehaene, 2011; Stringaris, Medford, Giampietro, Brammer, & David, 2007). In each case, a “semantic system” was expected to respond more strongly to the more meaningful stimulus than to the less meaningful stimulus, regardless of the specific type of content that was represented.

Somewhat surprisingly, given the unselected nature of the stimuli and the wide variety of tasks that were used, these studies yielded very reproducible results. My colleagues and I performed an activation likelihood estimate (ALE) meta-analysis of 87 such studies (Binder, Desai, Conant, & Graves, 2009). To be included, each experiment had to include a comparison task that provided controls for orthographic, phonological, and general cognitive demands of the semantic task. The results (see Fig. 1) revealed a distributed, left-lateralized network composed of seven nodes: (1) inferior parietal cortex (angular gyrus and portions of the supramarginal gyrus); (2) middle and inferior temporal gyri, extending into the anterior temporal lobe; (3) ventromedial temporal cortex (fusiform and parahippocampal gyri); (4) dorsomedial prefrontal cortex (superior frontal gyrus and posterior middle frontal gyrus); (5) ventromedial prefrontal cortex; (6) inferior frontal gyrus (mainly pars orbitalis); and (7) the posterior cingulate gyrus and precuneus.

Fig. 1

A supramodal “conceptual hub” network identified by quantitative meta-analysis of 87 neuroimaging studies of semantic processing. The studies all included a manipulation of stimulus meaningfulness but no manipulation of modality-specific content. Note. DMPFC = dorsomedial prefrontal cortex; FG/PH = fusiform gyrus/parahippocampus; IFG = inferior frontal gyrus; IPC = inferior parietal cortex; PC = posterior cingulate/precuneus; VMPFC = ventromedial prefrontal cortex. Adapted with permission from Binder et al. (2009). (Color figure online.)

Some anatomical characteristics of this network are noteworthy. Without exception, all nodes in the network are high-level multimodal/supramodal areas distant from primary sensory and motor cortices (Mesulam, 1985; Sepulcre, Sabuncu, Yeo, Liu, & Johnson, 2012). Each has been identified as a “hub” with a dense and widely distributed pattern of connectivity (Achard, Salvador, Whitcher, Suckling, & Bullmore, 2006; Buckner et al., 2009). A conspicuous feature of the parietal and temporal regions is that they are sandwiched between multiple modal association cortices. For example, the angular gyrus lies at a confluence of visual, somatosensory, and auditory processing streams. Macaque area PG/7a, the closest monkey homologue of the angular gyrus, receives inputs exclusively from secondary visual, auditory, and multimodal regions (Andersen, Asanuma, Essick, & Siegel, 1990; Cavada & Goldman-Rakic, 1989; Jones & Powell, 1970). The ventral anterior temporal lobe, which has probably been under-represented in fMRI studies of semantic processing due to difficulty obtaining MRI signals from this region (Devlin et al., 2000), is another case in point. This region receives inputs from a broad range of modal association cortices (Jones & Powell, 1970; Van Hoesen, 1982), and patients with damage to this general region show multimodal (visual, auditory, motor) knowledge deficits (Patterson et al., 2007). Such facts suggest that these temporal and parietal nodes occupy positions at the top of a multimodal, convergent sensory-motor-affective hierarchy (Damasio, 1989). Their activation across a wide range of meaningful stimuli regardless of sensory-motor-affective content suggests that the information processed in these regions is not strongly tied to a particular perceptual or motor modality.

But what is the precise nature of the information represented in these high-level convergence zones, and what role might these representations play in semantic cognition? Standard models of cognitive processing certainly depend on amodal symbolic representations (Newell & Simon, 1976; Pylyshyn, 1984), but are these abstract representations necessary for actual conceptual processing in the brain or merely a convenience for creating computational models? Evidence that sensory, motor, and affective systems play a role in conceptual processing is increasingly difficult to deny, and the principle of modality-specific knowledge representation provides an elegant account of concept acquisition and grounding. If the conceptual content of actual human consciousness can be fully specified by activation of sensory-motor-affective information, what need is there for highly abstract representations (Barsalou, 1999; Gallese & Lakoff, 2005; Martin, 2007; Prinz, 2002)? In addition to their possible redundancy, abstract representations are usually conceived as having fixed content, such that models composed entirely of abstract symbols are often criticized as inflexible and unable to account for context effects (Barsalou, 1982; McCarthy & Hayes, 1969; Murphy & Medin, 1985). In contrast, distributed modal representations of conceptual knowledge are capable of context-sensitive variation in the pattern and relative strength of activation of component modal features, enabling dynamically flexible conceptual representation (Barsalou, 2003).

In the following brief discussion, I propose a way of thinking about abstract conceptual representations as high-level conjunctions rather than amodal symbols, and discuss some specific functions these representations might have. A variety of empirical neuroimaging findings are then explained in terms of the predicted responses of such representations to particular stimulus and task manipulations. The formulation owes much to previous convergence zone theories (Damasio, 1989; Simmons & Barsalou, 2003) and pluralistic representational accounts (Andrews, Frank, & Vigliocco, 2014; Dove, 2009; Louwerse & Jeuniaux, 2010; Meteyard et al., 2012; Patterson et al., 2007). The principal aims here are to expand the list of potential computational advantages conferred by high-level conjunctive representations and to review in some detail the neuroimaging evidence specifically relevant to these proposed processes.

The utility of broadly conjunctive conceptual representations

Some clarification of terminology is first necessary. Symbolic representations in traditional computational theories of cognition are “abstract” by definition: They refer to concepts via an arbitrary relationship and have no intrinsic content aside from links to other symbols (Harnad, 1990). The theory presented here is rather different. Abstract representations in the brain arise from a process of hierarchical conjunctive coding, and it is their combinatorial nature that is important rather than their abstractness per se. Conjunctive representation occurs when a neuron or neural ensemble responds preferentially to a particular combination of inputs. The essential function of neurons is to collect and combine information, and conjunctive representation seems to be a ubiquitous feature of perceptual systems in the brain (Barlow, 1995). Abstraction occurs at the level of a conjunctive representation because the representation codes the simultaneous occurrence of two or more inputs, say A and B, and not, in general, all of the particulars of A or B. These particulars are retrieved as needed by top-down activation of A and B by the conjunctive representation (Damasio, 1989).

Rather than “abstract representation,” a term closely tied to nonbiological models of cognition, I will use the term “crossmodal conjunctive representation” (CCR) to emphasize the essential combinatorial function of these representations and their origin in neurobiological systems. Another advantage of this term is that it offers flexibility regarding how “abstract” a particular representation is relative to low-level sensory-motor representations. All indications are that conjunctive representations are arranged hierarchically in perception and action systems (Felleman & Van Essen, 1991; Graziano & Aflalo, 2007; Hubel & Wiesel, 1968; Iwamura, 1998; Kobatake & Tanaka, 1994), with multiple levels of representational complexity, where “complexity” refers to the number or range of low-level inputs contributing to activation of the conjunctive representation. The degree to which lower-level information (e.g., information coding a particular shape, color, or body action) is retained at higher levels of representation (e.g., banana) presumably varies depending on the salience of the information and level of representation. At very high levels of this convergent hierarchy, CCRs might retain so little representation of actual experiential information that they functionally resemble arbitrary symbols. The key point, however, is that CCRs are not theoretical constructs; they arise through neurobiological convergences of information. They are as abstract as they “need to be” to represent a combination of inputs. On this view, there is no absolute demarcation between embodied/perceptual and abstract/conceptual representation in the brain.

It is important to stress here that the CCR terminology is adopted purely as a convenient, descriptive label intended to bring to mind the basic neural computational process of conjunctive coding, and should not be taken as a novel proposal. A number of previous authors have proposed models of knowledge representation based on hierarchical conjunctive coding in convergence zones at different levels of complexity (Damasio, 1989; Simmons & Barsalou, 2003). A CCR is equivalent to the content represented in a crossmodal convergence zone (Simmons & Barsalou, 2003).

Another important clarification is that CCRs are not necessarily highly localist in their neural realization. The critical aspect of CCRs is that they represent broad combinations of inputs. In theory such representations could be instantiated in single, dedicated cells, and such sparse, highly localized representations have been observed in the medial temporal lobe (Quiroga, Reddy, Kreiman, Koch, & Fried, 2005). Given the almost infinite number of concepts and concept variations that are possible, however, it is more likely that CCRs are instantiated as distributed neural ensembles or networks, and that a given neural ensemble represents a range of related concepts through variation in a distributed pattern of activation (O’Reilly & Busby, 2001).

The role of conjunctive coding has been explored, under various guises, in multiple sensory and motor domains (Fitzgerald, Lane, Thakur, & Hsiao, 2006; Graziano & Aflalo, 2007; Hubel & Wiesel, 1968; Kobatake & Tanaka, 1994; Schreiner, Read, & Sutter, 2000; Suga, 1988) and in episodic memory encoding (O’Reilly & Rudy, 2001; Lin, Osan, & Tsien, 2006; Rudy & Sutherland, 1995). In the domain of semantic cognition, Rogers, Patterson, and colleagues argued that broadly convergent, supramodal conceptual representations allow the brain to recognize underlying object similarity structure in the face of variably overlapping and conflicting features (Rogers et al., 2004; Rogers & McClelland, 2004; Patterson et al., 2007). For example, people know that apples, oranges, bananas, grapes, and lemons are all fruit despite salient differences in their appearance, taste, associated actions, and names. In computer simulations in which neural networks were trained to map between sensory, motor, and verbal features of objects, only networks containing highly convergent representations were able to capture semantic similarity relationships between the objects (Rogers & McClelland, 2004). Thus, CCRs that capture multimodal convergences appear to be necessary for learning taxonomic category relationships.

A related and equally ubiquitous phenomenon for which CCRs provide a much-needed explanation is thematic association. Consider the statement “The boy walked his dog in the park.” The inference that the dog is likely wearing a leash cannot be made purely on the basis of the sensory-motor features of dog, walk, park, or leash. Rather, the leash is a thematic or situation-specific association based on co-occurrence experiences. Thematic associations of this kind (dog-bone, coffee-cup, paper-pencil, shoe-lace, etc.) are pervasive in everyday experience and provide much of the foundation for our pragmatic knowledge (Estes et al., 2011). What kind of neural mechanism would support such associations? A mechanism that is sensitive only to sensory-motor feature similarity would find this a hard problem. Any association between coffee and cup based on feature content would be unlikely to generalize to other associations of coffee (e.g., cream, sugar, café, barista). The problem is that thematic associations primarily reflect situational co-occurrence rather than the structure of feature content, and the enormous number and variety of such associations would seem to make links based solely on a linear function of overlapping features impossible.

CCRs solve this problem by providing highly abstract conceptual representations activated by conjunctions of features, which can then “wire together” with other highly abstract conceptual representations with which they co-occur. That is, activation of the concept leash in the context of walk, dog, and park results from direct activation of the CCR for leash by the CCRs for the other concepts, independent of the sensory-motor feature overlap between these concepts. Mapping between concepts that have little or no systematic feature overlap, like dog and leash, is conceptually similar to other arbitrary mapping problems, such as mapping between orthographic or phonological word forms and meaning. In such cases, the output is not a simple linear combination of features of the input, and intermediate representations that combine information across multiple features are necessary to enable nonlinear transformations (Rumelhart, Hinton, & Williams, 1986). Thus, another principal function of high-level CCRs is to provide a neural mechanism for activating a field of thematically associated concepts independent of any shared sensory-motor feature structure.
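
The need for intermediate conjunctive units in arbitrary mappings can be illustrated with a toy computation. The following sketch is not taken from the cited work: the weights are set by hand, the function names are hypothetical, and exclusive-or stands in for any mapping whose output is not a linear combination of input features. It suggests that no single threshold unit reading two input features directly reproduces the mapping (at least over the small weight grid searched here), whereas adding one unit that codes the conjunction of the inputs makes it trivial.

```python
from itertools import product

def step(x):
    """Threshold unit: fires (1) when its net input is positive."""
    return 1 if x > 0 else 0

def linear_unit(a, b, w_a, w_b, bias):
    """Single threshold unit reading the two input features directly."""
    return step(w_a * a + w_b * b + bias)

def with_conjunction(a, b):
    """Output unit that also reads a hidden unit coding 'a AND b'."""
    conj = step(a + b - 1.5)             # hidden conjunctive unit (hand-set weights)
    return step(a + b - 2 * conj - 0.5)  # conjunction inhibits the output

inputs = list(product([0, 1], repeat=2))
xor_target = [a ^ b for a, b in inputs]

# Brute-force search over a small, arbitrary weight grid:
# no single threshold unit on this grid reproduces the target mapping.
grid = [x / 2 for x in range(-6, 7)]
linear_solves_xor = any(
    [linear_unit(a, b, w_a, w_b, bias) for a, b in inputs] == xor_target
    for w_a, w_b, bias in product(grid, repeat=3)
)

conj_outputs = [with_conjunction(a, b) for a, b in inputs]
print("single unit solves mapping:", linear_solves_xor)
print("with conjunctive unit:", conj_outputs == xor_target)
```

The grid search is only a demonstration, not a proof; the general result that such mappings are not linearly separable is the point made by Rumelhart et al. (1986).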

Learning and retrieving taxonomic and thematic associations, however, is not an end in itself. The ability to learn and retrieve associations between concepts makes possible a range of other abilities. Prominent among these is the ability to mentally retrieve a typical situation or context in which a concept occurs. Thematic association underlies, for example, our ability to retrieve the context kitchen when presented with the concept oven, and to retrieve a set of other concepts thematically related to ovens and kitchens. This rich associative retrieval in turn enables more efficient and more complete comprehension of oven, and it primes the processing of any items in the thematically related field that might subsequently appear (Estes et al., 2011; Hare, Jones, Thomson, Kelly, & McRae, 2009; Metusalem et al., 2012). Thus, thematic association can be thought of as a form of prediction that allows anticipation of future events and extensive inference about current situations (Bar, 2007).

Although associations derived from experience offer important predictive advantages, human conceptual abilities are not limited to retrieval of frequent associations. A defining feature of human thought is its generativity and creative capacity. This generative capacity depends on the ability to compute mental representations of situations (i.e., events, states, and other propositional content). A situation, in the most general sense, can be thought of as simply a configuration of concepts, generally including entities, actions, properties, and relationships. For illustration purposes, take any two objects O1 and O2, two intentional agents A1 and A2, an intransitive action I, a transitive action T, a locative preposition L, and a property P:

The O1 was P. = property state (e.g., The ball was heavy.)

The O1 was L the O2. = spatial relationship state (e.g., The ball was in the box.)

The A1 did I. = intransitive event (e.g., The girl ran.)

The A1 did T to O1. = transitive object event (e.g., The girl hit the ball.)

The A1 did T to A2. = transitive social event (e.g., The girl hit the boy.)

As these schematic examples illustrate, propositional content is constructed of configurations of concepts. For a situation to be represented in awareness, all of the constituent concepts must be simultaneously activated and in some sense bound together, with each concept assigned its thematic role. It is difficult to see how such complex conceptual combinations could be instantiated using sensory-motor representations alone. This would require a flexible representation of thematic roles within sensory-motor systems that would distinguish, say, the concept of girl as an agent versus girl as a patient in a social situation. Such a distinction would depend on relationships between the girl and the other entities comprising the situation, which by definition arise de novo from the particulars of the situation and so could not be contained within the sensory-motor content of girl. High-level CCRs provide a schematic, or “chunked” representation of concepts to which roles can be assigned flexibly, based on context.
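
The distinction between a concept and the role it occupies can be made concrete with a small data-structure sketch. This is only an illustration under my own assumptions, not a claim about neural implementation: the `Situation` class and `make_event` helper are hypothetical names, and concepts are reduced to bare labels standing in for "chunked" CCRs. The point is that role bindings live in the configuration, not in the content of any constituent.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Situation:
    """A configuration of concepts: an action plus (role, concept) bindings."""
    action: str
    roles: tuple  # tuple of (role, concept) pairs

def make_event(action, agent, patient):
    """Bind two concept labels to the agent and patient roles of an action."""
    return Situation(action, (("agent", agent), ("patient", patient)))

# The same constituent concepts, bound to different roles.
e1 = make_event("hit", "girl", "boy")   # The girl hit the boy.
e2 = make_event("hit", "boy", "girl")   # The boy hit the girl.

same_constituents = {c for _, c in e1.roles} == {c for _, c in e2.roles}
same_situation = e1 == e2

print("same constituents:", same_constituents)
print("same situation:", same_situation)
```

Nothing in the label `girl` changes between the two events; only the binding differs, which is the sense in which roles are assigned flexibly based on context.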

The specific mechanisms by which such conceptual composition occurs are still largely unknown, and a detailed discussion of these processes is beyond the scope of this review. In a language comprehension context, syntax obviously provides important sources of information for constraining conceptual composition. The present theory, however, is about conceptual processing in general, whether in a linguistic or a nonlinguistic “mental imagery” context. Even in language tasks it seems clear that a conceptual composition must be computed independent of language prior to comprehension or overt expression (Bransford & Johnson, 1973; Kintsch & van Dijk, 1978; Metusalem et al., 2012; Tanenhaus et al., 1995). One general idea is that CCRs are associated with other concepts that “afford” particular kinds of roles and relationships. As one example, the concept of intentionality is strongly associated with concepts of individual people and groups of people, and to some extent with intelligent animals. Activation of this associated concept biases interpretation toward a role as agent in a situation. As another example, very large, inanimate objects (parks, buildings, etc.) are associated with the concept of being fixed in space, which affords a role as a spatial reference point and a geographical ‘container’ in which activities can occur. Verb concepts, too, have associations that constrain the types of subjects and objects with which they can sensibly combine (e.g., a car can hit a tree but a car cannot eat a tree) and specify the spatial, temporal, body action, mental experience, social, and other schemata contained in the event that is being represented (Jackendoff, 1990; Levin, 1993).

According to this theory, then, another principal function of high-level CCRs is to create mental representations of situations. The importance of this process for human cognition is hard to overstate, as it provides the semantic content for our episodic memory, imagination of future events, evaluation of propositions for truth value, moral judgments, goal setting and problem solving, daydreaming and mind wandering, and all other thought processes that involve forming relational configurations of concepts. One often-discussed problem for which such configurations might provide a general solution is the representation of very abstract concepts, such as justice, evil, truth, loyalty, and idea. Many such concepts seem to be learned by experience with complex social and introspective situations that unfold over time and involve multiple agents, physical events, and mental events (Barsalou & Wiemer-Hastings, 2005; Borghi, Flumini, Cimatti, Marocco, & Scorolli, 2011; Wiemer-Hastings & Xu, 2005). Thus, Barsalou has proposed that such concepts seem “abstract” because their content is distributed across multiple components of situations (Barsalou & Wiemer-Hastings, 2005). According to this view, then, the ability to build mental representations of situations through relational configuration of high-level CCRs is central to the representation of many abstract concepts.

A frequently noted limitation of symbolic representations is their static nature, which rules out contextual flexibility in concept retrieval (Barsalou, 1982; McCarthy & Hayes, 1969; Murphy & Medin, 1985; Wittgenstein, 1958). It is important to realize, however, that this problem arises only in models composed entirely of static symbols. Hierarchical convergence zone models contain a mixture of (subsymbolic) distributed modal representations and more abstract conjunctive codes, and permit interactions between and within levels. Context effects could arise in these structures through two mechanisms. First, interactions at high levels between CCRs representing the context (call them “context CCRs”) and CCRs representing the topic concept could modulate activation of other high-level CCRs associated with the topic. For example, in the context of the question, “What color is your dog?”, the context CCR color activates a field of color concepts, one of which is associated with my dog and thus receives additional activation. Second, context CCRs could cause top-down activation of modal components of the topic CCR. In the context of the question, “What does your dog sound like?”, the context CCR sound interacts with the topic CCR my dog to produce top-down activation of a perceptual simulation of the sounds produced by my dog.

Some neuroimaging evidence for broadly conjunctive conceptual representations

Given the hypothesis that nodes in the “conceptual hub” network shown in Fig. 1 contain high-level CCRs, several fairly straightforward predictions are possible regarding modulation of activity in these nodes. The first is that activation in these areas should reflect the number of CCRs that are active (and their intensity of activation) at any given moment, which in turn depends on the number of associations that these CCRs have. Distributed neural ensembles in these regions are literally equivalent to CCRs, each of which can activate a set of associated CCRs. (The exact set activated and the strength of activation of each member in the set is assumed to vary with context and individual experience.) All else being equal, a CCR that activates many other associated CCRs (causing, in turn, activation of the CCRs associated with those CCRs, and so on) will produce greater activation in these areas than a CCR with relatively few or relatively weak associations. This prediction was verified by Bar and colleagues (Bar, 2007) in a series of studies contrasting object concepts that have strong thematic associations (e.g., microscope) with objects that have weaker or less consistent thematic associations (e.g., camera). Relative to low-association concepts, high-association concepts produced greater activation of the posterior cingulate/precuneus region, the medial prefrontal cortex, and a left parieto-occipital focus that is probably in the posterior angular gyrus (Talairach coordinates -49, -72, 13).
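
The logic of this prediction can be sketched as a simple spreading-activation computation. The sketch below is a toy model under stated assumptions, not a description of the Bar (2007) analyses: the association graph, the strength values, the `decay` parameter, and the `spread` function are all hypothetical, chosen only to show that a concept with more and stronger associations yields greater summed activation across the network.

```python
def spread(associations, source, decay=0.5, steps=2):
    """Sum activation after spreading from `source` for `steps` hops.

    Each concept is activated at most once; activation passed along a link
    is scaled by the link strength and by `decay`.
    """
    activation = {source: 1.0}
    frontier = {source: 1.0}
    for _ in range(steps):
        next_frontier = {}
        for node, act in frontier.items():
            for neighbor, strength in associations.get(node, {}).items():
                if neighbor in activation:
                    continue  # already active; do not re-activate
                gain = act * strength * decay
                next_frontier[neighbor] = next_frontier.get(neighbor, 0.0) + gain
        activation.update(next_frontier)
        frontier = next_frontier
    return sum(activation.values())

# Hypothetical association strengths, for illustration only.
associations = {
    "microscope": {"laboratory": 0.9, "scientist": 0.8, "slide": 0.8},
    "laboratory": {"scientist": 0.7, "test tube": 0.6},
    "camera": {"photo": 0.4},
}

high = spread(associations, "microscope")  # strong thematic associations
low = spread(associations, "camera")       # weaker associations
print("high-association concept activates more:", high > low)
```

In this toy model, summed activation depends on both the number of associates and the strength of each link, which corresponds to the two factors (set size and association strength) named in the prediction.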

The Bar et al. experiment demonstrates the specific effect of association strength and set size on activity at high levels of the conceptual network, but the same principle accounts for a wide range of similar results from studies that did not explicitly manipulate this variable. For example, nodes in the conceptual hub network are activated by single words relative to pseudowords (Binder, Medler, et al., 2005; Binder et al., 1999; Binder et al., 2003; Cappa et al., 1998; Démonet et al., 1992; Henson et al., 2002; Ischebeck et al., 2004; Kotz et al., 2002; Kuchinke et al., 2005; Mechelli et al., 2003; Orfanidou et al., 2006; Rissman et al., 2003; Xiao et al., 2005; see Fig. 2, top left). According to the present theory, this is due to the fact that pseudowords have no strong associations with concepts, and therefore evoke little, if any, high-level CCR activation. Very similar results were obtained in studies comparing responses to familiar and unfamiliar proper names (Sugiura et al., 2006; Woodard et al., 2007; see Fig. 2, lower left). Like pseudowords relative to words, unfamiliar names, which refer to no known individual, have far fewer associations than familiar names, which refer to actual people about whom one has associated knowledge. An important related observation is that activation of conceptual hub regions by known words is stronger when a semantic retrieval task is required than when a non-semantic task (e.g., phonological or orthographic decision) is required (Craik et al., 1999; Devlin, Matthews, & Rushworth, 2003; Gitelman, Nobre, Sonty, Parrish, & Mesulam, 2005; Miceli et al., 2002; Mummery, Patterson, Hodges, & Price, 1998; Otten & Rugg, 2001; Price, Moore, Humphreys, & Wise, 1997; Roskies, Fiez, Balota, Raichle, & Petersen, 2001; Scott, Leff, & Wise, 2003). This indicates that activation of CCRs and spread of activation to associated CCRs is not an entirely automatic process, but depends in part on cognitive control.

Fig. 2

Activation of the conceptual hub network by four manipulations of associative content. Top left: Activation by words (hot colors) relative to pseudowords (cool colors) during an oral reading task. (Adapted with permission from Binder, Westbury, et al., 2005.) Lower left: Activation by familiar (i.e., publicly famous) person names relative to unfamiliar names during a famous/unfamiliar decision task. (Adapted with permission from Woodard et al., 2007.) Top right: Activation by concrete words (hot colors) relative to abstract words (cool colors) averaged across three studies using lexical decision, semantic decision, and oral reading tasks. (Adapted with permission from Binder, 2007.) Lower right: Activation by high-frequency (hot colors) relative to low-frequency (cool colors) words during an oral reading task. (Adapted with permission from Graves, Binder, et al., 2010.) Areas activated in common across all studies include the angular gyrus, posterior cingulate gyrus/precuneus, and superior frontal gyrus (dorsomedial prefrontal cortex). (Color figure online.)

Another observation explained by the general principle of association is the activation of conceptual hubs, particularly the angular gyrus, ventral temporal lobe, and posterior cingulate region, by concrete relative to abstract concepts (Bedny & Thompson-Schill, 2006; Binder, Medler, et al., 2005; Binder, Westbury, et al., 2005; Binder et al., 2009; Jessen et al., 2000; Fliessbach, Weis, Klaver, Elger, & Weber, 2006; Graves, Desai, Humphries, Seidenberg, & Binder, 2010; Sabsevitz et al., 2005; Wallentin, Østergaard, Lund, Østergaard, & Roepstorff, 2005; see Fig. 2, top right). Concrete words show a variety of behavioral processing advantages over abstract words, including faster response times in lexical and semantic decision tasks and better recall in episodic memory tasks. Paivio explained these advantages as due to the availability of visual and other sensory associations in the case of concrete concepts and not in the case of abstract concepts (Paivio, 1986). Schwanenflugel and colleagues proposed that concrete concepts have greater “context availability” (Schwanenflugel, 1991), meaning that they more readily or automatically activate a network of situational and contextual associations than abstract concepts. Thus, these theories have in common the idea that abstract concepts produce less activation of associated knowledge than concrete concepts. This claim might initially seem to contradict other proposals, mentioned above, that abstract concepts depend on complex situational knowledge to a greater degree than concrete concepts (Barsalou & Wiemer-Hastings, 2005). However, the idea that abstract concepts depend more on situational knowledge does not mean that this knowledge is more available. Recent work by Hoffman et al. using latent semantic analysis of text corpora suggests that abstract concepts actually tend to occur in a wider variety of semantic contexts than concrete words (Hoffman et al., 2011). However, high contextual variability is also associated with reduced distinctiveness of meaning (Hoffman et al., 2013), which presumably makes retrieval of associations less automatic in the case of abstract concepts (Schwanenflugel, 1991). The greater activation of conceptual hub nodes by concrete concepts is therefore consistent with the idea that activation of these nodes reflects the overall intensity of associated concept activation rather than just their sheer number.

Word frequency is another variable correlated with number and strength of associations (Nelson & McEvoy, 2000). Frequency of use is an approximate indicator of the familiarity of a concept (Baayen, Feldman, & Schreuder, 2006; Graves, Desai, et al., 2010; Toglia & Battig, 1978) and the variety of contexts in which it is used (Adelman, Brown, & Quesada, 2006; Hoffman et al., 2011). Frequency was positively correlated with the number of semantic features subjects produced in a feature listing procedure (McRae, Cree, Seidenberg, & McNorgan, 2005). Several studies (Carreiras, Riba, Vergara, Heldmann, & Münte, 2009; Graves, Desai, et al., 2010; Prabhakaran, Blumstein, Myers, Hutchison, & Britton, 2006) have now reported activation of conceptual hub nodes (angular gyrus, posterior cingulate gyrus, and dorsomedial prefrontal cortex) as a function of increasing word frequency (see Fig. 2, lower right). Assuming that words with higher frequency of use automatically activate a larger number of associations, this result is consistent with the aforementioned word-pseudoword, familiar-unfamiliar name, and concrete-abstract effects, all of which can be accounted for by a common underlying mechanism (i.e., relative differences in the overall intensity of activation of associated concepts).

Note that these modulatory effects are, strictly speaking, “supramodal” in the sense that they are not related to any particular sensory, motor, or affective content; it is therefore unclear how they could be explained in terms of modal representations. Vigliocco and colleagues (Kousta, Vigliocco, Vinson, Andrews, & Del Campo, 2011; Vigliocco et al., 2014) have pointed out a correlation between abstractness and affective content, but this correlation would explain only activation differences favoring abstract words, not the converse. Whereas associative networks of high-level CCRs provide a unified account of all of these phenomena, it is unclear how theories that deny or minimize the role of such representations (e.g., Barsalou, 1999; Gallese & Lakoff, 2005; Martin, 2007; Prinz, 2002) can account for any of them.

A closely related modulatory effect is predicted for stimulus contrasts involving different levels of compositionality. “Compositionality” refers here to the degree to which a combination of concepts expresses a coherent meaning. The word strings below, for example, illustrate different degrees of compositionality:

  (1) the man on a vacation lost a bag and a wallet

  (2) on vacation a lost a and bag wallet a man the

  (3) the freeway on a pie watched a house and a window

  (4) a ball the a the spilled librarian in sign through fire

In (1), the constituent concepts can be combined to represent a semantically coherent, plausible situation, and the lawful syntactic structure assists the formation of this representation by indicating thematic roles. In (2), the same constituents are present but without a supporting syntactic structure. In (3), thematic roles are clear from the syntax, but the constituents have no semantic relationship to a common theme that would enable the construction of a coherent situation. In (4) there is no clear thematic relationship among the constituents and no syntactic cues to indicate thematic roles.

The importance of compositionality is that it permits a wide range of additional associations to be activated. Once the situation depicted in (1) is represented in the conceptual hub network, for example, we can activate representations of how the man in the situation might feel having lost these valuable items, possible scenarios that led to the losses, what repercussions the losses might have, and what actions he might then take. Each of these associated representations can then lead to activation of other relevant associations, such as representations of possible objects that were in the lost bag and wallet, the possible locations of the missing items, and the likelihood that they will be found. Activation of such associated concepts and situations is much less likely to occur in response to (2) because of the relative difficulty in retrieving a coherent representation of the situation in the absence of syntactic cues, although a partial representation might still be possible as a result of interactions between the thematically related concepts without explicit role assignment. Activation of associated concepts and situations is also less likely in response to (3) because the situation described does not correspond to any plausible real-world event (a freeway cannot be located on a pie, and a freeway cannot watch something), although the combination of freeway, house, and window might evoke a partial representation of a house situated near a freeway. Similarly, string (4) might conceivably activate a partial representation of a fire in a library, but the absence of a clear theme linking all the constituents and the lack of thematic roles would likely result in a rather weak and noisy representation.

Compositionality-related modulation of neural activity in the conceptual hub network has been demonstrated across several levels of linguistic complexity. At the simplest level, semantically related word pairs have been shown in multiple studies to produce stronger activation of conceptual hubs compared to semantically unrelated words (Assaf et al., 2006; Graves, Binder, et al., 2010; Kotz et al., 2002; Mashal et al., 2007; Mechelli et al., 2007; Raposo et al., 2006). As pointed out by Raposo et al. (2006), this “semantic enhancement” effect is unexpected based on neural models of priming that predict less neural activity when words are primed by feature overlap or repetition (Buckner, Koutstaal, Schacter, & Rosen, 2000; Copland et al., 2003; Masson, 1995). Unlike repetition priming, however, in which no new information is provided by the second stimulus in a pair, semantically related pairs provide an opportunity for conceptual combination, in which the pair of words now refers to a new concept or situation (Downing, 1977; Gagné & Shoben, 1997; Graves, Binder, et al., 2010; Smith & Osherson, 1984). At the sentence level, conceptual hubs respond more strongly to semantically meaningful, grammatical sentences than to semantically anomalous sentences and word strings (Humphries et al., 2001; Humphries et al., 2006; Kuperberg et al., 2000; Mashal et al., 2009; Obleser & Kotz, 2010; Obleser et al., 2007; Pallier et al., 2011; Stringaris et al., 2007). Fig. 3 illustrates a typical response pattern using sentence and word string conditions like those in examples (1–4) above. Finally, the principle of compositionality can be extended to the level of discourse and narrative text comprehension, which are characterized by representation of multiple situations in thematically related temporal sequences. Just as sentences elicit activation of a wider range of associated concepts than isolated words, connected text is capable of illustrating events in much richer detail and complexity than isolated sentences, thereby eliciting a larger and richer set of associated concepts. As expected, conceptual hubs respond more strongly to text narratives and discourse than to isolated sentences (Ferstl, Neumann, Bogler, & von Cramon, 2008; Fletcher et al., 1995; Homae, Yahata, & Sakai, 2003; Xu, Kemeny, Park, Frattali, & Braun, 2005; Martín-Loeches, Casado, Hernández-Tamames, & Álvarez-Linera, 2008; Yarkoni, Speer, & Zacks, 2008).

Fig. 3

Activation of the conceptual hub network by sentence compositionality effects. The map shows areas activated by a contrast between semantically and syntactically coherent sentences (Sem+ Syn+), exemplified by item (1) in the text, and semantically random word strings (Sem- Syn-), exemplified by item (4) in the text. Graphs show activation levels (in arbitrary units of BOLD signal change relative to the “resting” baseline) for the four conditions exemplified by items (1–4) in the text: coherent sentences (Sem+ Syn+), thematically associated word strings (Sem+ Syn-), semantically random sentences (Sem- Syn+), and semantically random word strings (Sem- Syn-). A graded response is observed reflecting varying levels of compositionality, weighted more toward semantic than syntactic structure. Note. Adapted with permission from Humphries et al. (2006).

To summarize, the hypothesis that neural activity in conceptual hub areas reflects activation of associated networks of CCRs accounts for a wide range of empirical data. At the simplest level, this hypothesis explains effects of lexicality, familiarity, concreteness, and frequency observed in single word studies. With more complex conceptual structures, the same basic mechanism accounts for successively greater activation by sentences and phrases relative to unrelated word strings, and connected text relative to unrelated sentences. Finally, the same principles can be applied to account for the “spontaneous” activity that occurs in these regions during the conscious “resting” state. This state is now generally recognized to include rich and dynamically changing conceptual content in the form of mental representations of situations pertaining to the past, present, and future (Andreasen et al., 1995; Andrews-Hanna, 2012; Antrobus, 1968; Binder et al., 1999; McKiernan, D’Angelo, Kaufman, & Binder, 2006; Pope & Singer, 1976; Smallwood & Schooler, 2006). The adaptive and other intrinsic properties of such representations have made them an independent focus of study, but even such complex mental representations must arise from simpler neurobiological processes. The proposal offered here is that the conceptual content of these representations arises through activation of associated combinations of CCRs, the conceptual building blocks for representing situations in conscious awareness.

Summary

I have argued for the importance of a type of abstract conceptual representation derived from convergences of information at crossmodal levels. High-level CCRs capture broad conjunctions of inputs and retain variable amounts of experiential information content; thus, they are not equivalent to amodal symbols. Broadly conjunctive conceptual representations perform an essential ‘chunking’ function that is useful for capturing taxonomic similarity structure, making possible thematic association, and enabling situation building in conscious awareness, three ubiquitous conceptual processes that seem difficult to explain using purely modal representations. The neurobiological importance of abstract CCRs is supported by empirical evidence for a network of high-level convergence zones (conceptual hubs) whose neural activity depends on the general associative richness (i.e., meaningfulness) of stimuli but not on the presence or absence of particular modal sensory-motor content. The need for abstract conceptual representations has been questioned by some proponents of a pure embodied knowledge view, perhaps in part as a reaction to traditional nonbiological, symbolic models of conceptual processing. While some versions of embodiment theory explicitly recognize the need for conjunctive representations (Barsalou, 1999; Damasio, 1989; Simmons & Barsalou, 2003), the computational advantages of high-level, supramodal conjunctions and the proportion of cortex devoted to their processing are often underestimated. The theory promoted here is that neural representations at different levels of abstraction contribute to conceptual knowledge in different ways. Whereas modal sensory, motor, and affective representations serve to ground concepts by enabling reference to the external world, abstract CCRs enable associative and generative processes that support a range of mental simulation, recall, deduction, prediction, and other phenomena dependent on the representation of situations. These associative and generative processes represent a large component of everyday conceptual cognition and depend on a large, distributed, dedicated brain network.