Introduction

As the topological and dynamic properties of the human connectome continue to be elucidated (see the December 2013 issue of Trends in Cognitive Sciences), there is mounting interest in complex network models of conceptual knowledge, such as those proposed by Meyer and Damasio (2009), Binder and Desai (2011), Lambon Ralph (2014), and Reilly et al. (2014). These models differ in nontrivial ways, but they all assume that a flexible, multilevel architecture is needed to accommodate the available data. The lowest level consists of modality-specific systems for perception, action, and emotion. Within each of these systems, feature patterns that recur across experiences involving similar entities or events are extracted and stored in long-term modality-specific memory. These patterns constitute the “embodied” content of concepts, but they are not able to support all aspects of conceptual knowledge. This is because high-level convergence/divergence zones, also referred to as hubs, are needed to serve such functions as capturing cross-modal correspondences, identifying categorization criteria that transcend surface similarities, distinguishing between typical and atypical members of categories, and coordinating top-down activations of modality-specific features during online processing.

Importantly, these sorts of theoretical frameworks are well-suited to handle variability in conceptual grounding effects—i.e., in behavioral and neural markers of the retrieval of modality-specific representations during conceptual processing. Lebois et al. (in press) point out that such variability is far more common than often assumed and that even the most entrenched features of concepts are not always activated in an immediate, automatic manner (see also Willems & Casasanto, 2011; Willems & Francken, 2012; Zwaan, 2014). In the same vein, Binder and Desai (2011, p. 531) make the following remarks about their own model of conceptual knowledge: “All levels are not automatically accessed under all conditions. Rather, this access is subject to factors such as context, frequency, familiarity, and task demands.… In highly familiar contexts, the schematic representations are sufficient for adequate and rapid processing. In novel contexts or when the task requires deeper processing, sensory-motor-affective systems make a greater contribution in fleshing out the representations.”

It also is noteworthy that when words enter the architecture, they sculpt it in many fine-grained ways, for they are essentially language-particular coding devices that have been culturally designed to package concepts for communicative purposes (Malt & Majid, 2013; Tomasello, 2014). Extending the original ideas of Paivio (1986), some researchers have proposed that the statistical co-occurrence patterns of word-forms across discourses can give rise to a “disembodied” form of conceptual knowledge, and a number of computational studies have shown that, on the basis of such associations among word-forms, it is possible to model a variety of psycholinguistic phenomena, including priming effects, sentence completions, ambiguity resolution, and the extraction of gist from texts (Burgess & Lund, 1997; Landauer & Dumais, 1997; Griffiths et al., 2007; Jones & Mewhort, 2007). At the same time, however, there is growing agreement that, in accord with Binder and Desai’s (2011) view, when tasks require detailed, meticulous conceptual processing of concrete entities and events in the world, it is usually helpful, if not necessary, to draw upon one or more of the modality-specific systems that subserve the most relevant embodied representations (Barsalou et al., 2008; Simmons et al., 2008; Dove, 2011; Louwerse, 2011; Andrews et al., 2014).

The purpose of this brief paper is to demonstrate the explanatory power of these kinds of hybrid, pluralistic models by focusing on the meanings of verbs. In an influential analysis, Levin (1993) grouped more than 3,000 English verbs into roughly 50 classes and 200 subclasses, most of which cover a wide range of semantic fields in the enormous conceptual realm of action. To take a few examples, there are verbs of body-internal motion, like fidget, twitch, squirm, wiggle, sway, rock, etc.; verbs of assuming a position, like crouch, kneel, stoop, lean, slump, sprawl, etc.; verbs of gesturing with specific body parts, like nod, shrug, point, wave, squint, wink, etc.; verbs of ingesting, like chew, chomp, munch, nibble, gobble, devour, etc.; verbs of exerting a force, like press, push, shove, pull, tug, yank, etc.; and so on and so forth. If one adopts a broader perspective that encompasses the roughly 6,000 languages currently spoken around the globe, one finds that although there are some universal properties of verb meaning (Van Valin, 2006), there is also a great deal of diversity (McGregor, 2002; Levinson & Wilkins, 2006; Filipovic, 2007; Majid et al., 2008; Malt et al., 2008). This is nicely illustrated by the semantic field of manner of motion. In English, this complex psychological space is intricately partitioned into distinct categories by more than 100 verbs that vary in terms of visual pattern, motor pattern, intention, emotion, and social significance, leading to numerous clusters of lexical items that encode subtly different kinds of rapid motion (e.g., dash, scurry, scramble, sprint), leisurely motion (e.g., amble, drift, stroll, mosey), furtive motion (e.g., creep, sneak, tiptoe, sidle), smooth motion (e.g., glide, slide, slink, slip), awkward motion (e.g., limp, lurch, stagger, stumble), etc. (Levin, 1993; Slobin, 2000). The meanings of some of these verbs are so specialized that it is hard to find equivalents in other languages; and conversely, other languages often have unique motion verbs that do not match any individual words in English, some examples being gulukudu (“rush in headlong”) in Zulu, widawid (“swinging the arms while walking”) in Ilocano, and tyôko-maka (“move around in small steps”) in Japanese (Slobin, 2006).

How are such concepts represented in the brain? I will argue that, in keeping with the kinds of frameworks mentioned above, the idiosyncratic semantic features that distinguish between action verbs within the same class are subserved by modality-specific cortical systems. To be clear, I am talking about features such as those that allow one to determine that strut is more like sashay than stroll, that slap is more like spank than jab, that hack is more like chop than carve, and that twist is more like bend than rip. As shown by Kemmerer et al. (2008), making these sorts of subtle similarity judgments engages widely distributed cortical regions that have been independently associated with performing and perceiving actions, and it is reasonable to suppose that these neural response patterns reflect, in part, the retrieval of embodied representations that are inherent components of the verb meanings and that facilitate the precise conceptual comparisons required by the task. Crucially, however, there is increasing evidence that these detailed semantic components, and the corresponding modality-specific brain regions, are not always activated in an immediate, automatic fashion, nor are they essential for accomplishing all tasks. On the contrary, there is a substantial amount of variability regarding when, how, and why they are recruited. To elaborate these points, I will concentrate on just one particular aspect of verb meanings, namely their motor features, which are hypothesized to be represented in the precentral motor cortices. Before commencing, however, I must make a caveat: Due to space limitations, I will restrict the discussion mostly to premotor and primary motor regions. Hence, I will not address either Broca’s area or the inferior parietal lobule, even though both of these regions also have been implicated in some of the motor aspects of action concepts.

Activation patterns

If it is the case that the motor features of verb meanings are represented in the precentral motor cortices, one would expect that these regions often are engaged during the processing of action verbs. Consistent with this prediction, many fMRI studies have shown that tasks such as reading action verbs (Hauk et al., 2004), hearing action verbs (Raposo et al., 2009), making semantic similarity judgments about action verbs (Kemmerer et al., 2008), and distinguishing action verbs from nonwords (de Grauwe et al., 2014), do tend to evoke significant responses in premotor and/or primary motor areas. Moreover, these responses often occur in a somatotopic manner, i.e., in such a way that verbs for leg/foot actions (e.g., stomp) engage leg/foot areas, verbs for arm/hand actions (e.g., grab) engage arm/hand areas, and verbs for face/mouth actions (e.g., bite) engage face/mouth areas (for reviews see Pulvermüller, 2005, 2013; Kemmerer & Gonzalez Castillo, 2010). Several studies have included functional localizer scans to verify that some of the precentral motor areas that are ignited when participants process body-part-specific action verbs also are ignited when they execute correspondingly body-part-specific movements. There are even a few hints that the laterality of the neural responses to verbs for unimanual actions (e.g., scribble) varies according to the handedness of the participants, such that mainly left-hemisphere motor areas are engaged in right-handers, whereas mainly right-hemisphere motor areas are engaged in left-handers (Willems et al., 2010a). These results suggest that the distinctive motor features of verb meanings may be coded in the precentral motor cortices in ways that reflect individual differences in how the designated types of actions tend to be performed (for additional data consistent with this view see Beilock et al., 2008; Lyons et al., 2010). Further research is needed, however, to elucidate exactly how the neural coding of the motor features of verb meanings relates to the neural coding of not only the execution, but also the observation and imagination, of the matching kinds of actions within the frontal lobes (for steps in this direction see Willems et al., 2010b; Moody-Triantis et al., 2014; Rueschemeyer et al., 2014).

One complication that has emerged in this field of inquiry is that when the verb-induced activation peaks from multiple studies are compared, it can be seen that while they do tend to cluster in a manner that roughly resembles the layout of the classic motor homunculus, there is still a great deal of variability (Kemmerer & Gonzalez Castillo, 2010; de Zubicaray et al., 2013). It is possible, however, that this simply reflects the complex anatomofunctional organization of the precentral motor cortices. For instance, the frontal motor system in the macaque brain appears to be parcellated not only in terms of somatotopy, but also in terms of actotopy—i.e., according to different categories of ethologically important behaviors that require the coordination of multiple joints, such as licking/chewing behaviors, defensive behaviors, reach-to-grasp behaviors, central-space manipulation behaviors, and climbing/leaping behaviors (Graziano & Aflalo, 2007). If the frontal motor system in the human brain is also shaped by both mapping principles, this could influence the distribution of verb-induced activation peaks (Fernandino & Iacoboni, 2010).

Another concern that has arisen is that when the verb-induced activation peaks from multiple studies are plotted in relation to cytoarchitectonically defined premotor and primary motor regions (Eickhoff et al., 2006), many of them fall beyond the boundaries of both areas (de Zubicaray et al., 2013). This worry is mitigated, however, by the fact that the premotor and primary motor regions have also been defined on the basis of functional neuroimaging data (Mayka et al., 2006), and when numerous verb-induced activation peaks are plotted on these maps instead, most if not all of them fall inside the boundaries of both areas (Kemmerer & Gonzalez Castillo, 2010; Kemmerer et al., 2012). Additional work is needed to determine which approach to defining the two regions is more appropriate, as well as how the two regions differentially contribute to representing the motor features of verb meanings.

Finally, and somewhat surprisingly, a recent meta-analysis by Watson et al. (2013) found that although action verbs do tend to elicit significant responses in the left posterior middle temporal gyrus (related to visual motion patterns) and the left supramarginal gyrus (related to action planning), they do not reliably elicit significant responses in the precentral motor cortices. However, the authors acknowledge two nontrivial limitations of their investigation: first, their meta-analytic method, namely activation likelihood estimation (ALE), forced them to exclude fMRI studies that examined activity in functionally defined motor regions of interest; and second, as they put it, “if stimuli within or across studies refer to actions executed with different effectors (e.g., leg, arm, face), then the power to detect a spatially coherent effect within somatotopically organized areas will be diminished” (Watson et al., 2013, p. 1199). It also is worth noting that the motor features of verb meanings may not always be accessed to the same extent on every occasion but may instead be accessed to different degrees in different situations, depending on factors such as task and context. This possibility was mentioned in the Introduction, and evidence for it is presented below, together with a discussion of its theoretical implications.

Processing dynamics

A number of studies suggest that action verbs ignite somatotopically mapped motor areas very quickly. For example, in an electrophysiological study that used sophisticated source localization techniques, Hauk and Pulvermüller (2004) found that verbs for leg/foot actions, arm/hand actions, and face/mouth actions elicited the expected topographic response patterns approximately 200 ms after stimulus presentation—a point in time that, according to Dehaene and Changeux (2011), is well before the roughly 300 ms threshold of conscious access (see also Shtyrov et al., 2004; Dalla Volta et al., 2014). Furthermore, in an electrophysiological study that involved subliminal presentation of verbs for arm/hand actions, Boulenger et al. (2008b) found that the stimuli modulated the readiness potential (an index of motor preparation) associated with subsequent reaching movements, and also influenced the kinematics of those movements. In addition, several investigations have employed magnetoencephalography to demonstrate that action verbs engage body-part-congruent precentral motor areas with remarkable speed, in some cases as soon as 100 ms after the words can be uniquely identified (Shtyrov et al., 2014; see also Pulvermüller et al., 2005b; Moseley et al., 2013; Klepp et al., 2014; for a critique see Papeo & Caramazza, 2014). Taken together, these findings show that action verbs can trigger somatotopic frontal activity in an apparently automatic manner, without the need for deep semantic processing.

However, there is growing evidence that when action verbs are perceived, the mobilization of the relevant precentral motor cortices is not completely autonomous and independent of higher-level cognitive influences but is instead susceptible to modulation by the task and context (for a review see Tomasino & Rumiati, 2013; see also Kiefer et al., 2012). Here are some examples. As an illustration of task effects, in a transcranial magnetic stimulation (TMS) study, Papeo et al. (2009) found that although the level of activity in hand-related primary motor cortex was enhanced when participants were instructed to think about the motor features of hand-related verbs, it was not enhanced when participants were instructed to count the number of syllables in the very same words. As an illustration of linguistic context effects, in an fMRI study, Raposo et al. (2009) found that although somatotopically mapped motor areas were engaged when participants heard action verbs in isolation (e.g., kick) and in literal phrases (e.g., kick the ball), they were not engaged when participants heard such verbs in idiomatic phrases (e.g., kick the bucket) (for similar results see Aziz-Zadeh et al., 2006, Desai et al., 2013, and Schuil et al., 2013, but for contrary results see Boulenger et al., 2009). Finally, as an illustration of nonlinguistic context effects, in an fMRI study, Papeo et al. (2012) found that not only action verbs but also purely stative verbs recruited the precentral motor cortices in certain circumstances, specifically when they were encountered after participants first performed a mental rotation task using a motor-oriented rather than a visuospatially oriented strategy.

These studies, among many others (e.g., Sato et al., 2008; Mirabella et al., 2012; Aravena et al., 2012, 2014), suggest that verb-induced motor activation is not a rigid, inflexible affair, but is instead sensitive to attentional and situational factors that we are only beginning to understand. It is essential to realize, however, that the mere fact that there is some variability regarding when and how the motor features of verb meanings are retrieved does not imply that those features are not really long-term components of the concepts or that they are not really subserved by the precentral motor cortices. It simply requires that we move closer to the kinds of models described earlier, in which the activation of modality-specific semantic features is not deterministic but rather conditioned by many factors.

Importantly, this sort of instability is not unique to the motor features of verb meanings but has been found for a variety of other conceptual categories as well (Lebois et al., in press). For example, several neuroscientific studies have shown that when people process nouns for tools, the recruitment of sensory and motor areas is strongly influenced by which semantic properties are emphasized by the context (Hoenig et al., 2008; Van Dam et al., 2012, 2014). There is even evidence that the color features of color words are not always accessed in an obligatory manner. In the classic Stroop paradigm, when people are asked to name the font colors of color words, their reaction times are slower when the two colors are incongruent (e.g., the word green in red font) than when they are congruent (e.g., the word green in green font). This interference effect has traditionally been treated as evidence for automatic semantic access, but as Lebois et al. (in press) point out, it can be reduced or eliminated by varying the proportion of congruent to incongruent trials (Jacoby et al., 2003), by varying the frequency of congruent and incongruent trials for specific word-color pairings (Jacoby et al., 2003), by coloring a single letter in the color word instead of the whole word (Besner et al., 1997), and by priming the notion of dyslexia (Goldfarb et al., 2011). It also can be diminished or abolished by the post-hypnotic suggestion that the words are meaningless symbols (for a review see Lifshitz et al., 2013; for an alternative perspective see Augustinova & Ferrand, 2014). Surely, however, the lack of a consistent interference effect in the Stroop paradigm does not imply that the color features of color words are not genuine components of the meanings. By the same token, the discovery that the motor features of action verbs (and tool nouns) are not always accessed in the same way does not imply that those features are not genuine components of the meanings. Instead, the sorts of findings discussed here suggest that it may be prudent to abandon the traditional but rather simplistic assumption that all aspects of concepts are reliably retrieved in an invariant, impulsive fashion, and begin to explore in greater detail the more nuanced view that, as proposed by the kinds of theories outlined in the Introduction, different properties of concepts may be accessed to different degrees and with different time courses across different situations.

Functional relevance

It often is argued that if the precentral motor cortices subserve the motor features of verb meanings, altering the operations of these areas should affect comprehension. Support for this prediction comes from studies that have used either transcranial magnetic stimulation (TMS) or the lesion method. For instance, Pulvermüller et al. (2005a) delivered single TMS pulses to specific motor sites 150 ms before the onset of words and found that stimulation of a leg/foot site led to faster recognition of leg/foot-related verbs than arm/hand-related verbs, whereas stimulation of an arm/hand site led to faster recognition of arm/hand-related verbs than leg/foot-related verbs. TMS also can be applied repetitively to disrupt cortical computations, and in a study that adopted this kind of approach, Gerfo et al. (2008) showed that targeting an arm/hand site slowed down reaction times for morphologically transforming arm/hand-related verbs but not abstract verbs (see also Repetto et al., 2013; Kuipers et al., 2013). Turning to lesion studies, Kemmerer et al. (2012) reported an experiment in which 226 brain-damaged patients, most of whom suffered from strokes in varied sectors of the left and right hemispheres, were administered six tasks that probed their conceptual knowledge of actions in both verbal and nonverbal ways. The majority of items involved arm/hand-related actions, and analyses of deficit-lesion relationships revealed that impairments were most reliably and specifically associated with damage in just a few left-hemisphere areas, one of which was an arm/hand-related portion of the precentral gyrus. It is also noteworthy that the precentral motor cortices gradually degenerate in motor neuron disease (a.k.a. amyotrophic lateral sclerosis), and patients with this pathology have significantly worse understanding of action verbs than object nouns (Bak & Hodges, 2004; Hillis et al., 2006; Grossman et al., 2008). Similarly, Parkinson’s disease interferes, albeit indirectly, with the operations of the precentral motor cortices, and it has been argued that patients with this disorder have selectively impaired appreciation of action verbs (Boulenger et al., 2008a; Fernandino et al., 2013; Ibáñez et al., 2013).

At the same time, however, some studies either suggest or allow for the possibility that the precentral motor cortices are not always necessary for understanding action verbs. For instance, Arévalo et al. (2012) reported an experiment in which 27 left-hemisphere stroke patients were asked to judge whether a given verb correctly described a given picture of a leg/foot action, arm/hand action, or face/mouth action. Although many of the patients had lesions that included some precentral motor areas, significant correlations were not found between impaired performance on body-part-specific action categories and damage to the corresponding body-part-specific motor areas (see also Maieron et al., 2013). Furthermore, in the lesion study by Kemmerer et al. (2012) described above, it is possible that the patients who failed the tasks did so not because of their tendency to have damage in left precentral arm/hand-related motor areas, but rather because of their tendency to have damage in one or more of several other left-hemisphere regions that have been associated with action concepts, specifically the inferior frontal gyrus, the supramarginal gyrus, and the posterior middle temporal gyrus (see also Urgesi et al., 2014). In addition, even though there is, as indicated above, some evidence that Parkinson’s disease disrupts the comprehension of action verbs, the impairments in these patients are often mild and may affect non-action verbs as well (Kemmerer et al., 2013; see also Da Silva et al., 2014).

On the other hand, while it may be the case that the motor features of verb meanings are not always essential for comprehension, this does not invalidate the hypothesis that those features are genuine components of the relevant action concepts. After all, as noted earlier, it may be possible to achieve a relatively high level of accuracy on some tasks by drawing mainly on other sources of information. For example, determining that pummel is more like punch than prod normally engages not only arm/hand-related precentral regions that presumably underpin the unique motor specifications of the three verbs, but also posterior middle temporal regions that presumably underpin the unique visual motion specifications of the three verbs (Kemmerer et al., 2008). But even though the former features most likely facilitate the conceptual comparison process, it is certainly possible that an accurate decision could be reached by relying mostly or even entirely on the latter features. Indeed, this may be part of the explanation for why patients with Parkinson’s disease are able to perform the task as accurately as healthy participants (Kemmerer et al., 2013). Likewise, consider the opening lines of the short story by Richard Bausch called Nobody in Hollywood: “I was pummeled as a teenager. For some reason I had the sort of face that asked to be punched.” Again, one’s understanding of the designated events is no doubt deepened by retrieving both the motor and the visual components of the verb meanings, but one could probably achieve a moderate level of comprehension by only accessing the visual components, and a more superficial level of understanding might be feasible by only accessing either the pertinent crossmodal convergence/divergence zones or the statistical co-occurrence patterns of the word-forms.

Still, the kinds of flexible, multilevel models that we have been considering are not so plastic that they can accommodate any type of result. For instance, they predict that if certain experimental tasks were designed so that they necessarily required access to specifically the motor features of verb meanings, performance would be significantly affected by either enhancing or disrupting the operations of the precentral motor cortices. The construction of such tasks is therefore an important methodological challenge for future research.