Introduction

Twenty-five years have passed since Marc Jeannerod used the simple behavior of grasping objects to create a model system for understanding goal-oriented action (Jeannerod 1984). By carefully documenting the detailed kinematics of a constrained yet ecologically robust behavior, his laboratory was able to decompose prehensile actions into fundamental components (Jeannerod 1986). Distinguishing components of an action such as reach, grasp and their primitives led to a conceptualization of action goals as cognitive representations. A key insight was that the underlying control processes were modular (Jeannerod 1994b). The modularity was supported by lesion studies in patients demonstrating double dissociations between object perception and action with objects, foreshadowing the coming revolution in cognitive neuroscience (Jeannerod 1988, 1994a). Within only a few years, Jeannerod’s ideas led to the characterization of a putative neural architecture of how the brain implements prehension, based on a synthesis of behavior, physiology, anatomy and computational developments (Jeannerod et al. 1995). This architecture continues to be sustained and improved by a flood of experimental evidence, with over 50 new publications each year on the cognitive neuroscience of prehension. This review focuses on the leading edge of this flood, examining how recent experimental evidence from just the past few years has enhanced or modified how we might conceptualize prehension from a cognitive perspective, sensitive to evidence of how prehension is instantiated at a neural level. As such, the review serves as an “update” limited to recent results, rather than as the kind of historical or comprehensive review that others have recently provided (Castiello 2005; Tunik et al. 2007; Castiello and Begliomini 2008). Emphasis is placed on studies of reach and grasp, rather than all goal-oriented behavior, and on methods drawn from cognitive neuroscience. From this, it is possible to identify some points of opportunity, where data are sparse and uncertainty is high.

The review begins with a consideration of how precision grasps are selected and transformed into motor commands within premotor networks. Then, the complement to this is pursued: how is action-relevant information generated from vision or the touch of an object? While there is overwhelming evidence for two visual streams for processing grasp-relevant objects, and it is possible to continue doing experiments dissociating these streams, the new excitement is in determining how parallel perceptual networks within parietal cortex, along with the ventral stream, are connected and share information to achieve common motor goals. The on-line control of grasping action is then considered within a state estimation framework that can accommodate a wealth of new behavioral results that accentuate the critical role of haptics in prehension. An additional development is the recognition that state estimation includes the maintenance of a desired goal, and this in turn affects how internal models of objects, grip and load forces are used. The sections close with speculations about what constitutes an action goal and how action goals might be represented in the brain.

Action vocabularies

Humans and monkeys are capable of an extraordinary range of possible solutions for grasping (Macfarlane and Graziano 2009). Many theories that consider how the brain can create such a vast repertoire propose that the nervous system retrieves knowledge or programs about prototype grasping actions and uses these to generate anticipatory motor commands (for early physiological evidence, see Rizzolatti et al. 1987a, 1990). There may be a vocabulary for actions that provides a basis set that could be combined to approximate the full range of grasp behaviors. Behavioral approaches to test this hypothesis seek to determine what the content of an action vocabulary is, what the selection processes within this vocabulary are, and how elements of a vocabulary are implemented as actual motor commands. This last step can be viewed as a transformation process from high- to low-level goals that are ultimately expressed primarily in motor cortex.

In precision grasping, a detailed vocabulary is needed to shape a multifinger, roughly prism-shaped opposition space to match an object’s shape (Ansuini et al. 2006) and with an appropriate distribution of desired multidigit forces (for review, see Zatsiorsky and Latash 2008). A classic approach models the thumb and fingers as a pair of independent pointing actuators (Smeets and Brenner 2001). However, recent studies raise doubts about whether this type of opposition space is sufficient (van de Kamp and Zaal 2007). Fingers need to do a lot more than just point to particular locations if they are to achieve sufficient dexterity for handling objects. Any action vocabulary must reflect additional task requirements, such as matching to object shape, generating an appropriate grip force and possibly even scaling lift forces. Evidence that these additional features are part of an action vocabulary can be demonstrated by showing they are anticipatory: they emerge behaviorally during transport toward an object or at initial contact with an object, prior to on-line manipulation. There is clear evidence that finger preshaping is anticipatory and appears during the reach phase to a single object. Critically, as the goal of the task changes, so does the finger shaping, prior to object contact, establishing that anticipation reflects not only the object, but the action goal (Ansuini et al. 2008). Even proximal arm muscles show EMG changes consistent with anticipatory feedforward planning that is object specific (Martelloni et al. 2009). Another anticipatory component that may be part of an action vocabulary is grip force scaling. This is nicely demonstrated in deafferented patients, who lack proprioceptive sensory input yet are able to generate appropriate grip force scaling with familiar objects during precision grasping (Hermsdörfer et al. 2008). Indirect evidence for at least a modest influence of cortical input on grip force knowledge can be inferred from patients with middle cerebral artery stroke, who show a very mild loss of anticipatory grip force control, even ipsilateral to the lesion (Quaney et al. 2009). Patients with cerebellar agenesis or other cerebellar pathology show preserved size-force coupling, suggesting that the cerebellum is not needed for action selection (Rabe et al. 2009).
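To make the double-pointing idea concrete, the following is a minimal sketch in Python. It assumes the standard minimum-jerk profile and treats grip aperture as an emergent variable, namely the distance between two independently pointing digits. The start positions, contact points and timing are hypothetical, and the full model’s approach parameter (which makes each digit arrive orthogonally to the object surface and produces the characteristic aperture overshoot) is omitted for brevity.

```python
import numpy as np

def min_jerk(p0, p1, t, T):
    """Minimum-jerk position at time t for a movement of duration T
    along a straight line from p0 to p1."""
    tau = t / T
    s = 10 * tau**3 - 15 * tau**4 + 6 * tau**5   # smooth 0 -> 1 profile
    return p0 + s * (p1 - p0)

# Hypothetical digit start positions and object contact points (cm, 2-D).
thumb_start, index_start = np.array([0.0, -2.0]), np.array([0.0, 2.0])
thumb_goal, index_goal = np.array([30.0, -3.0]), np.array([30.0, 3.0])

T = 1.0  # movement duration (s)
for t in np.linspace(0.0, T, 6):
    thumb = min_jerk(thumb_start, thumb_goal, t, T)
    index = min_jerk(index_start, index_goal, t, T)
    # Grip aperture is emergent: just the distance between the two digits.
    print(f"t = {t:.1f} s, aperture = {np.linalg.norm(index - thumb):.2f} cm")
```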

Another important system that could play a direct role in grasp selection or planning is the basal ganglia (BG). There is overwhelming fMRI and PET evidence that the BG are involved in force scaling across a range of tasks (for detailed review, see Prodoehl et al. 2009). Some of this may be related to on-line control of force or force pulses during each movement (Vaillancourt et al. 2004, 2007; Spraker et al. 2007; Tunik et al. 2009). In patients with degeneration of the BG due to Parkinson’s or Huntington’s disease, there can be marked abnormalities of force/load coupling, high variability in static force and higher-than-needed grip forces (Prodoehl et al. 2009). Of note, anticipatory grip force scaling can be preserved in Parkinson’s disease, suggesting that the selection of grip force at the time of initial movement planning is a cortico-cortical process that is not basal ganglia dependent (Weiss et al. 2009). An alternative hypothesis is that the basal ganglia are needed for the formation of experience-dependent short-term sensorimotor memories. Object knowledge acquired in one trial could then be used in the planning of the next trial, as has been shown recently (Weiss et al. 2009). This short-range adaptation might require a reward-based learning mechanism tied to the context or motor goal (for review, see Shadmehr and Krakauer 2008; Shadmehr et al. 2010).

Returning to the problem of selecting among possible actions: if an action vocabulary consists of distinct elements, then choosing among them might be subject to interference by distracters. This is observed behaviorally, where flanker objects slow the reach, increase finger abduction and alter thumb flexion. However, the individual fingers do not shape into a new pattern that matches the specific shape of the distracters (Ansuini et al. 2007). This could be because the distracters add noise to the selection process or because a default grasp of lower specificity is chosen. Another experimental strategy for understanding the content of stored action representations is to identify conditions that constrain what can be selected or anticipated in advance. What information is not stored by the nervous system in anticipatory planning? This has been tested in behavioral transfer studies, where the overall gain of finger forces for a familiar object is readily transferred between hands, but the individuated finger grip and lift forces needed to control object dynamics are not (Albert et al. 2009). Even explicit knowledge about an object does not overcome this limitation (Lukos et al. 2008). Rather, individuated grip and lift (as opposed to initial grasp) require direct on-line experience with the object.

To be useful, an action vocabulary needs to incorporate prior experience so that it can generalize to new actions. Are there features of an object that are particularly influential in determining grip selection on future trials? Grasp memory is strongly influenced by object size, which sets a strong constraint on expected weight (Cole 2008). Density also has a significant albeit weaker influence. A putative action vocabulary that involves memory for grip force scaling can be disrupted by transcranial magnetic stimulation (TMS)-induced virtual lesions of the ventral premotor cortex during object exposure (Dafotakis et al. 2008).

From goals to movements

It has long been known that grasp-related information is represented by single neurons, particularly in the posterior part of area F5 of ventral premotor cortex in the monkey (Rizzolatti et al. 1987b). These “canonical” neurons respond to many aspects of a grasp action including high-level action goals, consistent with the presence of a motor vocabulary. In remarkable new work in monkeys, multiunit activity recorded in ventral premotor cortex (and also in dorsal premotor cortex) predicted the current grasp action with high accuracy (89%). This level of accuracy was not observed at the single neuron level or with local field potentials, suggesting that small-sized networks may be particularly important for generating a particular grasping action (Stark and Abeles 2007; Stark et al. 2007a).
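The decoding logic behind this kind of result can be illustrated with a toy simulation (synthetic spike counts and an off-the-shelf linear decoder, not the actual data or methods of Stark and colleagues): each unit is only weakly tuned to grasp type, so single-unit decoding hovers near chance, while pooling the population yields high cross-validated accuracy.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials, n_units = 200, 20
grasp = rng.integers(0, 2, n_trials)            # 0 = precision, 1 = power

# Each unit is weakly tuned: a small rate difference between grasp types,
# buried in Poisson spiking noise.
base = rng.uniform(5.0, 15.0, n_units)          # baseline rates (spikes/bin)
tuning = rng.normal(0.0, 1.5, n_units)          # per-unit rate shift
counts = rng.poisson(np.clip(base + np.outer(grasp, tuning), 0.1, None))

clf = LogisticRegression(max_iter=1000)
acc_single = cross_val_score(clf, counts[:, :1], grasp, cv=5).mean()
acc_pooled = cross_val_score(clf, counts, grasp, cv=5).mean()
print(f"single unit: {acc_single:.2f}, pooled population: {acc_pooled:.2f}")
```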

At some point during motor planning, a particular set of motor commands must be generated, based on higher-level representations that define an action goal. While it is possible that motor planning areas such as ventral premotor cortex (PMv) or dorsal premotor cortex (PMd) could do this directly by shaping activity of spinal cord motor neurons, recent studies show that such direct spinal-motor connections are quite sparse, supporting a view that planning areas shape behavior elsewhere (Boudrias et al. 2009). Growing evidence shows that motor cortex is the critical final common influence on the spinal cord. This is supported by studies that reversibly inactivate motor cortex in monkeys: in this condition, there is a pronounced loss of the digit responses induced by microstimulation of PMv (Schmidlin et al. 2008). In other words, the PMv influence on finger shaping is through interactions with motor cortex (for review, see Chouinard 2006). Tight functional coupling between PMv and motor cortex can be demonstrated by applying a TMS conditioning stimulus to human PMv and then measuring the effect on the motor evoked potential (MEP) generated by a second TMS pulse applied to motor cortex 6–8 ms later. During a precision grasp, the conditioning stimulus facilitates the MEP response, whereas at rest it actually inhibits the MEP response from motor cortex (Davare et al. 2008). Interestingly, power grip had no effect on the MEP, suggesting that the PMv–motor cortex interaction is not required for this more basic type of grasp. A related TMS virtual lesion study shows that the information passed between PMv and motor cortex includes the sequential recruitment of appropriate hand muscles (Davare et al. 2006).

Object properties also appear to play a major role in determining motor selection within PMv. But is it the object’s identity that is important, or the physical properties of the object that constrain grasp selection? Many studies support the latter interpretation. In a double-pulse TMS experiment, the preconditioning stimulus to PMv modified the MEPs in motor cortex as a function of object geometry, showing that PMv processes key object properties that are then implemented through motor cortex (Davare et al. 2009). In fMRI studies performed in monkeys, the ventral premotor cortex is responsive to 3-D features of objects, suggesting that grasp-relevant object information forms part of a shared representation in this area (Joly et al. 2009). Consistent with this interpretation, direct recordings within monkey PMv show non-selective, increasing local field potentials during object viewing that become selective during the grasp and hold phases of a precision grasp task (Spinks et al. 2008). In a single neuronal recording study comparing response properties between area F5, a component of ventral premotor cortex, and motor cortex, neurons in F5 show object tuning that becomes selective earlier than in motor cortex (Umilta et al. 2007). Human fMRI adaptation studies show that PMv adapts to repeated exposure to a particular grasping axis, but not to a particular object (Kroliczak et al. 2008). In other words, PMv is more closely linked to the specific motor solutions tied to an object than to the object per se. This is further amplified in studies of size-weight illusions or contrasts. For example, in a human fMRI study of object grasping and lifting, there was fMRI adaptation in PMv with repeated exposures to objects of the same density (Chouinard et al. 2009). Density estimation is essential for determining the initial grip and lift forces.

While PMv has been associated with selection of specific grip postures (Raos et al. 2006), prehension also requires integration of reach and grasp. A focus on just grasp-related activity at the single neuron level provides an incomplete measure of what may be planned in PMv. Ventral premotor cortex contains neurons that are specific for either reach direction or grasp type, and these are highly intermixed, as is selectivity for proximal and distal movements (Stark et al. 2007a). This is consistent with a mosaic-like topological model in which grasp/reach functions and joint/body representations are intertwined, allowing for enormous combinatorial variation. Multiunit activity in PMv that reflects both reach- and grasp-selective activity is a very strong predictor of prehension, demonstrating the emergence of composite representations of reach and grasp (Stark et al. 2008). It remains to be seen whether the differential distributions of proximal–distal and reach-grasp neurons within premotor areas could provide an anatomic substrate for the adaptive strategies used by patients with stroke and hemi-cerebral palsy, who in general show increased reliance on proximal musculature in reach-grasp tasks (Domellöf et al. 2009).

The role of PMd in precision grasping is far less understood than what has been mapped in PMv, and functional comparisons of single neurons in PMv and PMd during precision grasping are rare. Like PMv, dorsal premotor cortex contains neurons that are specific for either reach direction or grasp type (Stark et al. 2007a). PMd neurons modulate in relation to object identity as well as object size (independent of object vision), and when grasp force requirements are specifically manipulated, a small percentage of PMd neurons modulate with force (Hendrix et al. 2009). PMd also demonstrates composite representations for reach and grasp based on analysis of multiunit activity in monkeys (Stark et al. 2008). Virtual lesions of PMd in humans disrupt the coupling of grasp and lift (Davare et al. 2006). An important distinguishing feature between PMd and PMv is that the former may play a stronger role in associating symbolic features with object properties: TMS to PMd in humans disrupts the ability to associate object mass with symbolic information, reflected in abnormal lift forces (Nowak et al. 2009a).

In considering the role of motor cortex in prehension more closely, there is accumulating evidence that different neurons in motor cortex capture different parameters of movement in a precision grasping task, and it is the population encoding across this multidimensional space that enables complex behavior (Stark et al. 2007b). Action goals appear to influence response properties across all levels of the motor system, including motor cortex. TMS-induced MEPs at motor cortex vary depending on whether a goal is present or not, demonstrating a remarkable flexibility in the degree to which context shapes the output structure of motor cortex (Cattaneo et al. 2009). In another study, the size of the MEP map of the first dorsal interosseous muscle over motor cortex did not differ between a five-finger and a two-finger precision grasp (Reilly and Mercier 2008). One interpretation of these findings is that the action goal and the underlying map in motor cortex for controlling the grasp are closely related. This close connection may be important for designing optimal therapies to accelerate recovery after stroke: rather than focusing on elemental movements, there might be better recovery with therapies involving goal-directed behavior. In support of this idea, in a recent animal experiment, a lesion to motor cortex reduced the individuated finger movements needed for precision grasping, and dexterity in these animals recovered better with 1 h/day of goal-directed prehensile training (Murata et al. 2008).

Action initiation

Neural systems are required to control the initiation of a precision grasp in relation to the actor’s needs, motivation and desires. Traditionally, the mesial wall premotor areas have been associated with the initiation of internally generated actions (Kermadi et al. 1997; Deiber et al. 1999). Intraoperative cortical stimulation of mesial wall areas, including pre-supplementary motor area and anterior cingulate motor area (CMA), in patients with epilepsy can induce automatic reach and grasp movements (Chassagnon et al. 2008). These areas do not have strong direct connections to spinal motor circuits (Boudrias et al. 2010) and act on movement indirectly. Thus, the effect of stimulation might reflect a “release” of reach- and grasp-specific motor programs stored in other areas, including PMv. The processes that select and initiate actions are likely to be embedded in larger-scale circuits that can estimate the relative values of alternative behaviors, the costs of different behaviors and the confidence that an action can be achieved successfully (reviewed in Rushworth et al. 2007).

Two dorsal pathways for grasping

The concept of distinct dorsal and ventral visual streams for processing visual information is well accepted, with ventral temporal and occipital cortex essential for semantic object identification and dorsal occipital and parietal cortex critical for physical interaction with objects (Goodale et al. 1991; Milner and Goodale 1995). The distinction is nicely captured in a recent fMRI study that showed greater activity in lateral occipital cortex (LOC) of the ventral stream when an object was part of a perceptual task and in the dorsal stream area AIP when the same object was grasped (Cavina-Pratesi et al. 2007). Differences were also shown in another functional imaging study using repetition suppression that dissociated ventral and dorsal streams based on object identity and orientation (Valyear et al. 2006). The separation of the streams appears to take place close to visual cortex. Lesion localization analysis in a well-characterized patient demonstrates that a lesion located in medial occipital/fusiform/lingual gyri is sufficient to cause visual agnosia with preserved visually guided grasping (Karnath et al. 2009).

While the ventral–dorsal stream dichotomy is a useful starting point, it is becoming increasingly apparent that there is more to the story, particularly within parietal cortex (for recent review, see Creem-Regehr 2009). Functional imaging provides a way to survey the entire cortex to identify putative areas involved in reach and grasp. High-resolution 2-deoxyglucose autoradiography of the brain in monkeys performing reach-to-grasp tasks demonstrates the involvement of AIP, MIP, LIP, VIP, PF, PFG, PG, V6, V6Ad, PGm7 and superior parietal lobule (Evangeliou et al. 2009). This widespread recruitment is also observed in vivo with blood flow studies of awake primates performing a reach-to-grasp task (Nishimura et al. 2007). In other words, almost all subsectors of parietal cortex show some level of involvement in prehension. Similar broad recruitment is found in human fMRI studies. How can this plethora of brain areas, schematized in Fig. 1, be broken into functionally relevant sub-networks? It is increasingly recognized that prehension is supported by two parallel networks in parietal cortex (for reviews, see Johnson and Grafton 2003; Rizzolatti and Matelli 2003). The classic pathway between ventral premotor cortex (F5) and AIP constitutes one key inferior parietal network, with AIP serving as a critical hub that is closely coupled to the adjacent inferior parietal lobule, particularly cortical areas SII and PFG. The second network, discussed in detail later, contains a hub organized around area V6A in the medial wall of parietal cortex, with strong connections to MIP and PMd. The inferior parietal network has neurons that are very similar to those in area F5 in being active during execution of goal-based natural actions (Bonini et al. 2010). Indeed, there is increasing evidence for widespread goal-related activity across IPL, with some evidence for somatotopic organization (PF mouth, PFG hand, PG arm) (Rozzi et al. 2008). In humans, there is some evidence that this inferior parietal network is lateralized to the left. While AIP activity is mainly contralateral to the hand used for grasping, there is some leftward asymmetry of parietal cortex that is more apparent in right-handers (Begliomini et al. 2008; Stark and Zohary 2008). AIP projects strongly to ventral premotor area F5, and new data establish connections to prefrontal areas 46 and 12 (Borra et al. 2008).

Fig. 1

Anatomic connections of areas V6A and AIP based on tract tracing in non-human primates. These are hubs that define superior and inferior networks within the parietal cortex. The superior network has strong connections to superior parietal lobule and dorsal premotor cortex. The inferior network has strong connections with temporal cortex, SII and the posterior portion of area F5. Anatomic labeling is approximate

The parietal cortex in the region of area AIP has long been known to contain neurons intimately involved in object grasping (Taira et al. 1990). One of the most important new developments in characterizing the inferior parietal network, including area AIP, concerns the role of stereoscopic vision. Stereoscopic vision, while not essential, leads to faster, more accurate grasping. Behavioral experiments comparing monocular and binocular vision during grasp have historically been inconsistent on this point, in part because of the use of small object sets, whose attributes subjects could more easily remember. Differences between binocular and monocular vision on grasp precision emerge with larger object sets and are also apparent in persons with long-standing monocular vision, who show a prolongation of the time for the fingers to make final contact around an object (Keefe and Watt 2009; Melmoth et al. 2009). Stereoscopic vision is more important for grasping than for pointing (i.e., placing objects), suggesting that there should be a high prevalence of neurons sensitive to binocular disparity in grasp-related parietal cortex, particularly AIP (Greenwald and Knill 2009). Recent studies replicate and extend early experiments demonstrating that there are neurons in AIP sensitive to stereoscopic features of an object, with depth, shape and curvature sensitivity similar to what was originally defined in area TE of inferior temporal cortex. In both areas, the sensitivity is preserved across position in depth and position in the fronto-parallel plane, with maximal selectivity when the stimulus is at the fixation point (Sakata et al. 1999; Durand et al. 2007; Srivastava et al. 2009). The two areas differ in terms of latency, sensitivity to disparity and tuning to object curvature (Srivastava et al. 2009). AIP neurons are less sensitive to sharp edges and show strong monotonic tuning to curvature. The findings suggest that TE neurons are more categorical in their selectivity, whereas AIP neurons are selective for the metric properties of a 3-D shape. Evidence for 3-D shape processing is also captured in fMRI studies of awake behaving macaque monkeys: there is sensitivity to object depth disparity in temporal cortex (area TE) and AIP, as well as in premotor area F5A (Joly et al. 2009). In humans, fMRI activity in AIP increases as the grasp precision increases (Begliomini et al. 2007), perhaps because there is increased processing of grasp-relevant object features. Alternatively, this could be due to the increased on-line control necessary for higher precision movements. In another fMRI study, object-viewing conditions (monocular vs. binocular, object orientation) were manipulated during grasping. Both LOC and AIP showed increasing activity, and also increasing functional connectivity, when subjects had to use monocular vision and with steeper slant angle of the object, making grasp more difficult (Verhagen et al. 2008).

Haptic and visual information converge in the inferior network of the dorsal stream, where they are used to develop grasp-relevant information about objects. When reaching to grasp within virtual environments, it is possible to examine the relative contribution of haptic or visual knowledge to grasp planning on subsequent trials. In this setting, visual information describing an object’s features dominates future behavior, with incompatible haptic feedback having little influence on future grasps (Lee et al. 2008). TMS disruption studies also suggest that visual and tactile information are not processed equivalently in the inferior network. To test this, area AIP was disrupted with low-frequency rTMS during the tactile or visual encoding of an object, followed by tactile or visual recognition (Buelte et al. 2008). During the manipulation of objects with the right hand, rTMS over the left anterior intraparietal sulcus (IPS), the putative homolog of monkey area AIP, induced a significant deterioration for visual encoding and tactile recognition, but not for tactile encoding and visual recognition.

The second key parietal network is organized around area V6A in the medial wall of the posterior parietal cortex. The dorsal-most portion of V6A has strong anatomic connections to the superior parietal lobule, particularly area MIP, and to dorsal premotor cortex, as shown in Fig. 1. It also interconnects with AIP, LIP and VIP, as well as MST, underscoring the extensive interconnections throughout parietal cortex (Gamberini et al. 2009). The superior parietal network was traditionally viewed as a reach-related system. A more precise functional description is that it is essential for integrating reaching requirements with a goal-directed grasp. This new interpretation is based on several lines of evidence. The first is the fact that the cortex in the area of V6A has object-sensitive properties. In fMRI, this region is active when graspable objects are present (Maratos et al. 2007). When multiple objects are present, this region is more involved in their individuation than in their identification (Xu 2009). Increasing the number of object distracters during grasping tasks increases activity in this region, likely reflecting either increased aiming requirements or individuation processes (Chapman et al. 2007).

Further support that the superior parietal network is more than just a reach-related system comes from studies that specifically manipulate grasping. Single neuronal recordings in area V6A of monkeys identify neurons that are sensitive to reach direction as well as to grip orientation (Fattori et al. 2009) and hand preshaping (Fattori et al. 2010). In human fMRI grasp studies, both V6 and PMd are active during grasp planning, are sensitive to the angle of the object to be grasped and are insensitive to monocular or binocular viewing (Verhagen et al. 2008). The V6A area is also more active when targets are within reach (Gallivan et al. 2009). The area is closely connected to area MST and has motion-sensitive properties as well, perhaps allowing for joint coding of self and target motion (Pitzalis et al. 2010). From this, it can be speculated that the superior parietal network may also be important for grasping during motion, as when a monkey swings from branch to branch.

Unlike the inferior parietal network, response laterality in the superior network (including V6A and caudal IPS) is driven by target location (Stark and Zohary 2008). Receptive fields in V6A are broadly tuned and uninfluenced by stereoscopic vision, consistent with a role in grasping toward peripherally placed targets. In human psychophysics experiments, restriction of peripheral vision in normal subjects reduces performance for both planning and, to a lesser degree, execution of grasps (González-Alvarez et al. 2007). This could be due to reduced information passing through this superior parietal network. Damage to this network should lead not only to misreaching in the periphery but also to errors in grasps that require reaching. This idea is supported by behavioral experiments in optic ataxia patients, who show grip timing errors with respect to the reach but no grip error when minimal reach is needed in the task (Cavina-Pratesi et al. 2010). In the latter case, the inferior parietal network is presumably intact for forming grip aperture independent of reach.

One uncharted functional territory that is ripe for study is the role of eye movements in grasp control. There is emerging behavioral evidence that saccades can be closely coupled to aspects of grasping and increase grasp precision. The eyes do not just look at the center of mass of objects but tend toward the index finger (Brouwer et al. 2009). They will also look toward occluded portions of target objects that the index finger will contact (de Grave et al. 2008). But how is this achieved? Somehow, the location of object features must be passed to gaze-related areas of cortex. Neurons in gaze-related regions such as area LIP do respond to 2-D objects, albeit with a complicated influence on receptive field properties (Janssen et al. 2008). Direct comparisons of fMRI responses during saccade and prehension tasks are mainly notable for the degree of common recruitment in parietal regions, with very few parietal areas showing preferential activation for one task or the other (Hinkley et al. 2009). This may be because the tasks share many functional operations, and classic cognitive subtraction may not be the best way to map parietal function with fMRI. Newer fMRI approaches such as repetition suppression (Epstein et al. 2008) or analysis with multivoxel pattern classification (Haynes and Rees 2006) may prove more effective at identifying these interactions between gaze and grasp.
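Why pattern-based methods can succeed where subtraction fails is easy to show with a toy simulation (synthetic data, purely illustrative, not real fMRI): if saccade and grasp trials drive the same mean amplitude in a parietal region of interest but with different voxel-wise patterns, a univariate contrast sees nothing while a cross-validated classifier decodes the task.

```python
import numpy as np
from scipy import stats
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

rng = np.random.default_rng(1)
n_trials, n_voxels = 100, 50
cond = np.repeat([0, 1], n_trials // 2)          # 0 = saccade, 1 = grasp

# Both tasks drive the same mean ROI amplitude, but each engages a different
# voxel-wise pattern (zero-mean, so the regional average does not change).
pattern = rng.normal(0.0, 1.0, n_voxels)
pattern -= pattern.mean()
data = 10.0 + np.outer(2 * cond - 1, pattern) \
       + rng.normal(0.0, 2.0, (n_trials, n_voxels))

# Classic subtraction: compare mean ROI amplitude between conditions.
roi_mean = data.mean(axis=1)
t, p = stats.ttest_ind(roi_mean[cond == 0], roi_mean[cond == 1])
print(f"subtraction: t = {t:.2f}, p = {p:.2f}")  # typically non-significant

# Multivoxel pattern classification on the same data.
acc = cross_val_score(LinearSVC(dual=False), data, cond, cv=5).mean()
print(f"MVPA accuracy: {acc:.2f}")               # well above chance
```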

Connections between dorsal and ventral streams

One of the more intractable problems for the next decade will be to understand how information is shared between the dorsal and ventral streams to achieve common task goals. Significant advances have already emerged from anatomic tract-tracing studies. A key discovery was the identification of strong, direct connections between the two perceptual streams. For example, area AIP in the monkey has strong connections with areas in the ventral stream, including the lower bank of the superior temporal sulcus in the region of areas TEa/TEm and the middle temporal gyrus (Borra et al. 2008). Area TE also has strong projections to area 45B in prefrontal cortex (Borra et al. 2010; Gerbella et al. 2010). Connections between the dorsal and ventral streams are also being identified in human diffusion tensor imaging (DTI) maps of white matter tracts. Here too, there are significant connections between the posterior middle temporal gyrus and both anterior and posterior IPL. Furthermore, these connections are stronger in the left hemisphere and may provide some of the scaffolding for grasping-related behavior, particularly when involving tools (Ramayya et al. 2010) or accessing semantic knowledge about objects (for review, see Noppeney 2008).

Imaging studies are beginning to identify object features decoded in the ventral stream that might influence grasp planning. Using repetition suppression, one study showed that the ventral stream is sensitive not only to high-level object identification but also to lower-level object properties including size or color and, to a lesser degree, weight and density (Chouinard et al. 2008, 2009). Grip selection is strongly biased toward superficial object features such as shape and less so toward features such as density that require some prior knowledge about an object (Cole 2008). The implication is that object identity information, decoded in the ventral stream, might provide a way to access prior knowledge about an everyday object’s expected properties and hence influence grasp planning. That said, it remains to be determined which aspects of an object defined from prior experience influence grasp planning and which features are mostly managed on-line during object manipulation.

Another way to test how object knowledge in the ventral pathway might influence grasp planning in the dorsal stream is to look for performance deficits in patients with ventral pathway lesions. On simple testing, visual agnosic patients classically demonstrate preserved hand orientation (e.g., orienting the hand when posting a letter in a mail slot) and form normal grip apertures that match an object’s geometric properties, suggesting that the ventral stream is not needed for basic grasp planning (Goodale et al. 1991). However, in natural conditions when real objects such as tools are grasped, additional decisions must be made, including whether to use an under- or overhand grip (Rosenbaum et al. 1992). These decisions are determined both by end-state comfort effects and by task goals that place higher demands on object knowledge. In visual agnosics, the choice of grip selection (over- or underhand) is far less consistent than in normal subjects, although lower-level kinematics such as grip sizing are normal (Dijkerman et al. 2009).

Ventral and dorsal streams might also interact through memory. It has been proposed that memory buffers in the dorsal stream are very limited (Milner et al. 2003). Once an object is no longer visible, the ventral stream may be required to maintain relevant information about the object if it is to be grasped in the dark. This has been tested with visual illusions, based on the assumption that illusion effects are mediated via the ventral stream and that grasps planned in the dorsal stream are immune to illusion effects. With a delay, a grasp might appear to be influenced by an illusion if it were dependent on the ventral stream. However, recent work shows that the changes in grasp with a delay are not necessarily due to an illusion effect, and alternative experimental methods will be needed to test for memory effects (Franz et al. 2009). One new approach that shows a possible role of the ventral pathway in supporting grasping when there is a memory load used single-pulse disruptive TMS, delivered to ventral (LOC) or dorsal (AIP) stream cortex at movement onset, with or without a delay between a brief object presentation and movement onset. In this case, TMS to AIP disrupted grasp planning both at onset and with a delay. In contrast, TMS to LOC only disrupted grasp kinematics in the delayed condition (Cohen et al. 2009). A plausible interpretation is that AIP is used for all grasp planning and thus remains sensitive to the TMS both early and late, whereas the LOC only becomes relevant for grasp planning when there is a delay and object memory is required.

There may be limits on how categorical knowledge can influence grasp planning. Humans are very fast at learning to associate appropriate grasps, reflecting size and to a lesser degree weight requirements, with particular objects based on shape as well as symbolic features such as color. It is possible that the ventral stream is needed for categorical classification and that category knowledge could generalize grasping information to new objects via ventral–dorsal connections. However attractive, this idea is not yet supported by any experiment. One study showed that learned associations between color and grasp kinematics did not generalize to new objects of the same category (similar color) (Desanghere and Marotta 2008).

Controlling grasps on-line: state estimation

Object grasping and manipulation rely heavily on the ability of the nervous system to anticipate the consequences of ongoing movements so that fine dexterity can be achieved (Wolpert et al. 1998; Wolpert and Ghahramani 2000). This is particularly apparent when it is possible to make strong predictions about the properties of a familiar object, such as expected weight, texture and default size. Anticipation is also needed in manipulating objects of uncertain physical properties. Anticipation in this case implies that the brain has access to internal models based less on prior knowledge and more on information acquired in real time. That is, it knows something about how the motor apparatus and handled object should respond to motor commands under different conditions. In engineering and robotics, internal models are computational solutions for modifying or adapting motor commands based on both forward and inverse models of either kinematics or dynamics (Lalazar and Vaadia 2008). Whether these computational principles are actually implemented in the brain, or are just metaphors for what the human nervous system does, remains uncertain. Regardless, from a cognitive perspective and as shown schematically in Fig. 2, internal models, when integrated into a larger conceptual framework, are extremely useful heuristics to account for many of the findings observed in experiments on goal-directed prehension.

In this conceptual framework, objects must be included as intermediaries for accomplishing naturalistic prehension. That is, the objects are not the goals but are used to achieve goals, such as grasping a hammer to pound a nail. Prior knowledge in the brain allows for the selection of appropriate objects (a hammer) and motor programs (hammering) for accomplishing particular goals. Pre-existing motor programs such as hammering allow for task execution, but there still needs to be an evaluative process to determine whether the desired goal is being achieved. If a nail were pounded in the dark, one might not know that additional force was needed to drive the nail into hardwood. State estimation is an additional important component for relating a desired goal to all relevant information about the state of the body as well as the object. Object position, vision, haptic feedback, proprioception and efference copy are the key sources for generating a continuous estimate of the state of the actor and tool. A critical aspect of this functional compartmentalization is that state estimation includes a representation of the desired outcome (Tunik et al. 2007). Failure of the combined object/body state to move toward a desired outcome (as assessed by some form of difference vector) could lead to (1) the modification of an ongoing motor command, by way of internal models, or (2) replanning of the entire motor program if an outcome is predicted not to occur. One can swing the hammer harder to pound a nail or, alternatively, reprogram the entire task by employing a bigger hammer. Computationally, state estimation and internal models are tightly bound, with state knowledge needed by internal models and the output of internal models needed to update the state estimate. This structure allows for enormous capacity to adapt across a broad range of conditions. If some sources of information are unavailable at a given time, other sources of feedback or prior knowledge are used to update an action.
Internal models that incorporate object dynamics depend on prior knowledge of the object’s physical properties (e.g., the center of mass of a hammer and its swing weight) and, to a lesser degree, its functional properties. In contrast, action selection relies more on knowledge about an object’s potential functional properties (e.g., hammering with a shoe) than on its physical properties.
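One minimal way to make this compartmentalization concrete is a forward-model state estimator of the Kalman type, sketched below under deliberately simplified linear, one-dimensional assumptions; the gains, noise levels and dynamics are arbitrary, and no claim is made about neural implementation. Efference copy drives a prediction, noisy sensory feedback corrects it, and the command is generated from the difference between the desired goal and the estimated state.

```python
import numpy as np

rng = np.random.default_rng(2)
dt, n_steps = 0.01, 300
goal = 1.0                      # desired position of the grasped object

x_true, x_hat = 0.0, 0.0        # true and estimated hand/object position
k_ctrl, k_sens = 5.0, 0.3       # controller gain, sensory correction gain

for _ in range(n_steps):
    # Command driven by the difference vector between goal and estimated state.
    u = k_ctrl * (goal - x_hat)

    # Plant: true state evolves under the command plus an unmodeled disturbance.
    x_true += u * dt + rng.normal(0.0, 0.002)

    # Forward model: predict the consequence of the command from efference copy.
    x_pred = x_hat + u * dt

    # Noisy sensory feedback (vision, haptics, proprioception).
    y = x_true + rng.normal(0.0, 0.05)

    # State estimate: blend the forward prediction with the sensory error.
    x_hat = x_pred + k_sens * (y - x_pred)

print(f"goal = {goal:.2f}, estimated = {x_hat:.3f}, true = {x_true:.3f}")
```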

Fig. 2

Schematic showing the functional modularity supporting prehensile behavior. Arrows indicate main sources of information sharing. A motor command can be selected and programmed in advance and adjusted on-line based on state estimation and internal models that also track object information

Recent behavioral studies of prehension support the compartmentalization outlined in Fig. 2. The model suggests that internal models have access to object knowledge from both prior experience and on-line feedback (haptics). In the case where prior knowledge about an object is weak, such as the weight distribution of an object, internal models would rely more heavily on state information provided from on-line feedback. After grasping and lifting a novel object, one might predict that the physical properties of the object become generally accessible to internal models as object “priors”. If so, finger force requirements that were learned with one hand might transfer to the other inexperienced hand. In fact, global scaling parameters of grip force as well as lift forces do transfer (Nowak et al. 2009b), but the grip force distribution across individual fingers does not (Albert et al. 2009). Thus, individuated finger control requires on-line haptic feedback to adjust a refined internal model that can handle internal dynamics of an object.

In the case of reduced peripheral feedback, there should be increased reliance on preprogrammed behavior and delays in on-line control. Grasp execution is sensitive to even minor reductions in the feedback used in state estimation. For example, grasp precision decreases if peripheral vision is restricted, even if there is no task-relevant information in the periphery (González-Alvarez et al. 2007). Conversely, improving the fidelity of feedback with stereovision increases grasp precision (Melmoth and Grant 2006). There also appears to be a capacity limitation in state estimation during on-line control. Under some conditions, adding distracter targets in a precision grasp task does not necessarily influence initial planning or action selection, but it can modify on-line behavior (Olivier and Velay 2009). This may be due to an inability to represent multiple action goals and their desired outcomes.

Not surprisingly, TMS disruption of S1 leads to increased time spent generating appropriate lifting forces (Schabrun et al. 2008). The need for haptic feedback is also revealed in patients with chronic peripheral deafferentation (Hermsdörfer et al. 2008). They display increased compensatory grip forces and show a failure in the dynamic scaling of finger forces during object manipulation, consistent with an inability to update state estimation and use internal models. Nevertheless, they can adjust their initial grip force to an object. One way to explain this discrepancy is that the initial grip force adjustments could be updated using an internal model derived mainly from the initial motor commands. During manipulation, however, the handler faces a far more difficult dynamical problem that also depends on peripheral feedback to build a reliable internal model.
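The division of labor suggested here, anticipatory scaling from priors plus haptic refinement during manipulation, can be sketched with standard grip-force physics: for a two-digit grip, slip prevention requires a grip force of at least load/(2μ), and healthy subjects add a safety margin. In the sketch below, the slip constraint is textbook mechanics, but the numerical values and the micro-slip update rule are illustrative assumptions.

```python
def required_grip(load_n: float, mu: float, margin: float = 1.4) -> float:
    """Slip-preventing grip force for a two-digit grip, times a safety margin (N)."""
    return margin * load_n / (2.0 * mu)

# Feedforward phase: grip is scaled from priors (expected weight and texture).
expected_load, expected_mu = 4.0, 0.6     # hypothetical familiar-object priors
grip = required_grip(expected_load, expected_mu)
print(f"anticipatory grip: {grip:.1f} N")

# Manipulation phase: haptic feedback reveals the surface is more slippery
# than expected, so the friction estimate and the grip force are updated.
true_mu = 0.3
target = required_grip(expected_load, true_mu)
while grip < target:                       # micro-slips detected on-line
    grip *= 1.1                            # incremental haptic-driven upscaling
print(f"grip after haptic update: {grip:.1f} N")
```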

The use of visual feedback for adjusting grasps can be observed as early as 15 months of age (Carrico and Berthier 2008) and suggests that internal models are largely built on implicit learning mechanisms. In adults, object priors based on explicit knowledge also appear to have only limited access to state estimation and internal models. Explicit knowledge tends to be linked to more superficial features, such as where an object’s center of mass resides. While this can guide where to place the fingers, there is far less explicit knowledge of object dynamics, such as how to apply grip forces to stabilize an object (Lukos et al. 2008). Explicit knowledge about what type of feedback is available for state estimation on subsequent trials has minimal influence on whether or not the system will reprogram a movement to reflect this anticipated change in state (Whitwell et al. 2008).

Haptic and visual feedback are needed not only for real-time control but also to calibrate internal models that are useful on future trials. To test this, haptic feedback can be withdrawn in a virtual reality experiment, which leads to degraded grasps on successive trials. This decalibration can be eliminated by intermixing trials with and without haptic feedback, implying that there is an inherent time constant for maintaining haptic calibration (Bingham et al. 2007). The need for calibration to maintain accurate performance over trials is one example showing that haptic and visual reference frames are very dynamic over time. Another example, at a finer level of description, is the observation that the relative size of peripersonal space centered around the hand can dynamically change as the action unfolds (Brozzoli et al. 2009).

While Fig. 2 shows feedback as a unidirectional source of information in state estimation, this is certainly an oversimplification. Organisms actively manipulate sensors to improve feedback. For example, the eyes will saccade toward the grasping index finger to increase foveation at critical points of object affordance (Brouwer et al. 2009). This is a rich area for further analysis.

There is general consensus that both cerebellum and parietal cortex play critical roles in forming internal models that involve state estimation for tasks such as pointing and grasping (for recent reviews, see Nowak et al. 2007b; Tunik et al. 2007; Andersen and Cui 2009). Many studies establish a role of the cerebellum in grasp execution: cerebellar patients show impairments in hand transport, in hand shaping, in the time to peak grip force and in grip/load force coupling (Brandauer et al. 2008). They are also impaired in anticipating lift forces (Rabe et al. 2009), in compensating for expected loads (self-triggered release of a ball) (Nowak et al. 2007a) and in transferring learned lift force knowledge between the hands (Nowak et al. 2009b). From these and many other studies, an important role has been proposed for the cerebellum in inverse plus forward models that update motor commands. As such, there should be evidence for predictive or anticipatory signals in the cerebellum during precision grasping. Single neuronal recordings of cerebellar Purkinje cells show object-specific modulation of signals appearing within the reach phase or at grasp onset in a precision reach-to-grasp task (Mason et al. 2006). There was no significant interaction between object and grasp force modulation, supporting previous grasping experiments showing that kinematics and force are signaled independently. These findings show there is cerebellar updating during on-line control. Cerebellar involvement is likely implemented in part via thalamo-cortical pathways to motor cortex. If so, there should be evidence for anticipatory or predictive activity within motor cortex during on-line control. This idea is now being pursued in part with fMRI. Motor cortex activity, as measured in an fMRI adaptation paradigm, was modulated by the weight of an object (Chouinard et al. 2009), probably as a result of differences in the final grip force adjustments needed to lift the objects. Responses within motor cortex and cerebellum track monotonically with overall grip force (Keisker et al. 2009). Exposure to an object whose dynamics have been learned causes increased activity in cerebellum and motor cortex prior to movement onset, suggesting that the physical priors of the object trigger internal models stored in these two areas (Bursztyn et al. 2006). When the dynamics of an object are experimentally manipulated by making subjects balance a flexible ruler compared to a simple grip task, there is also relatively greater activity in cerebellum and motor cortex, suggesting that these areas are either implementing or representing internal dynamics (Milner et al. 2007). Motor cortical involvement in humans is also shown by associating predictive information about when a load perturbation will occur during gripping with EEG long-latency reflexes that must originate in central cortical areas (Kourtis et al. 2008). A more direct relationship between cerebellum and motor cortex has been measured physiologically in non-human primates: as grip force is initiated, there is increased coherence between the deep cerebellar nuclei and motor cortex (Soteropoulos and Baker 2006).

In addition to cerebellar influences, there is growing evidence for direct influences between posterior parietal cortex and motor cortex, based on double-pulse TMS studies (Koch and Rothwell 2009). In humans, work on the relationship between a desired goal and state estimation in prehension tasks has focused on the role of parietal cortex, particularly areas AIP and SPL. One experimental approach is to change the target object within a trial, forcing subjects to amend an ongoing movement to accommodate the new task goal. TMS disruption of AIP at movement onset consistently disrupts both the on-line maintenance of an action and the ability to update a goal when the target changes (Rice et al. 2006, 2007). Single-pulse TMS to AIP also disrupts the on-line adjustment of grip forces when subjects are exposed to an unexpected object mass (Dafotakis et al. 2008). EEG linear source analysis of subjects grasping objects suggests that AIP is recruited earlier than nearby SPL, and the duration of the response in the AIP area is longer when there is an object perturbation (Tunik et al. 2008). In contrast, initiation of a corrective movement coincides with activation in SPL. AIP and adjacent SPL are closely connected and share many functional properties when measured at the single neuron level in monkeys (Gardner et al. 2007a, b). Both show object-selective responses, increase in activity during the approach phase of a grasp and reach a peak of activity at object contact. Activity also increases with precision demands. Area 5 of the superior parietal lobule is also critical for mediating synergies between reach and grasp during ongoing movement (Chen et al. 2009) and for maintaining on-line information about the state of the hand and its trajectory (Archambault et al. 2009). MIP, a portion of the SPL on the medial bank of the IPS, has strong anatomic connections with the gaze and arm areas of the interpositus nucleus of the cerebellum (Prevosto et al. 2010). Together, these data support two possibly dissociable processes: the integration of the target goal with an emerging action plan (within AIP) and further on-line adjustments (within SPL).

What is an action goal?

Prehension experiments remain a powerful approach for motivating experiments on what constitutes an action goal (Rizzolatti et al. 1987a; Rosenbaum et al. 1999). Prehension goals span many levels of complexity and can be defined by at least three experimental end-states. The first is grasp-centric, with completion of the grip defined as completion of the goal. This is a long-standing experimental approach, but it is an uncommon goal in the real world, where objects are grasped to be manipulated. To make prehension research ecologically valid, it is going to be essential for future studies to consider factors needed for controlling object dynamics, such as lift forces and individuated finger movements, as integral components of grasping. These are readily introduced when the end-state is object-centric, i.e., where the final position of an object is the goal. Selection in this case is influenced by biomechanical constraints or end-state comfort effects (Rosenbaum et al. 1992): biomechanical comfort constrains the end-state, and the object position determines it. Finally, how an object (tool) is used can be thought of as an end-state defining an action goal. This requires selection of specialized movements that go beyond end-state comfort effects and draw heavily on object knowledge. An important future question is whether there is evidence for hierarchically distributed neural architectures supporting these different end-states.

Most work on end-states as goals relies on indirect evidence, such as behavioral studies. For example, when people grasp cubes, there is an effect on maximum grip aperture if a subject simultaneously observes another person grasping toward larger objects (Dijkerman and Smit 2007). A fundamental issue is whether this type of interference between observation and execution is at the level of the goal state (the object) or at lower levels, such as the underlying movements. To test this further, another behavioral study manipulated the congruency between observed and executed actions (power or precision grasp) with and without a target object. The critical finding was that interference effects required the presence of an object and were more pronounced for precision than for power grasps (Vainio et al. 2007). This suggests that grasp-based end-states may be organized at a level supraordinate to lower-level planning processes such as the type of movement. Another way to think about this is to consider the goal representation as necessary to reconcile two parallel operations: the selection of a target object and the selection of an action to perform with the object. This joint selection among alternatives has been modeled computationally. The critical point is that the goal state shapes this interaction, which can be demonstrated by showing that distracter objects will differentially influence the reach depending on the nature of the current action plan (Botvinick et al. 2009). These interactions across levels of action planning can also be observed when the two hands of a subject are tested against each other. In one study, subjects were asked to grasp and lift a smooth cylinder with one hand, before and after judging the level of difficulty of a “grasping for pouring” action involving a smaller cylinder and the opposite hand. The simulated grasp exerted a direct influence on an actual motor act with the other hand, suggesting there may be conjoined representations of the graspable characteristics of the object, the biomechanical constraints of the arms and the overall action goal (Frak et al. 2007).

When an object is grasped to serve as a tool, a whole host of additional computational requirements is introduced. Studies in apraxic subjects underscore the fact that multiple factors can influence grip selection for a given tool: knowledge about the function of the object, structural tool characteristics, biomechanical costs of the movement and previous experience (Randerath et al. 2009). The basic principles of tool function are acquired early in development (measured as early as 10 months) and require physical, not just observational, experience (Sommerville et al. 2008). Relationships between the semantic understanding of a tool and the tool’s action properties remain uncertain (Noppeney et al. 2006; Noppeney 2008). Clearly, semantic knowledge can be used to constrain action planning. There is also a growing argument that these sources of knowledge actually share neural substrates, in line with a view that information storage is based on grounded cognition (Barsalou 2008). Nevertheless, this remains an area ripe for additional investigation.

One of the basic mechanisms that may be necessary for tool use is the remapping of the body schema to incorporate the acting part of a tool. This idea is based on work in primates that learned to use a tool that extended physical reach (Iriki 2006). It was proposed that the new skill led to an extension of the receptive fields of parietal neurons to cover both the hand and the tool. The same idea was tested in human subjects, who learned to reach and grasp with a set of “grabbers” that extended the reach of the limb (Cardinali et al. 2009). The key question was whether subjects would adapt their body schema to the tool and, if so, whether they would show after-effects on subsequent trials when grasping with the hand alone. Indeed, after adaptation, there were measurable effects on trials with just the hand and no tool, suggesting a deep change of body schema induced by the tool. Adaptation of body schema was also tested by manipulating visual feedback during grasping. A video system allowed subjects to see their grasping hand as larger or smaller than normal (Marino et al. 2010). In trials where the hand appeared larger, there was an adaptive reduction in maximum grip aperture (MGA). This reduction persisted into catch trials that tested for after-effects, showing the depth of this adaptive process. The converse, observing a smaller-than-normal hand, did not lead to an increased MGA. The authors hypothesized that this asymmetry in the direction of adaptation likely reflects the fact that our bodies grow larger over development but never shrink.

These studies emphasize that the body representation is not constant, but highly plastic. However, simply extending the body schema by stretching proprioceptive space to match the hand plus tool may be insufficient to explain the range of alterations needed for the body schema to match complex tools. Indeed, there is emerging evidence that expanding or stretching a body schema with tools does not necessarily alter the representation of peripersonal space and its boundaries (Gallivan et al. 2009). Rather than extending the body schema, tool use may actually induce the distalization of the end-effector from the hand to the tool. This is a far more specific process, in which body representations can be spatially relocated to a new island of space. Different tools extend the body schema in different ways, requiring the remapping of the visual target and tool-specific haptic feedback of the hand (Arbib et al. 2009). Distalization, as a process distinct from schema extension, can be shown behaviorally by testing for differential gains of visual discrimination across the workspace (the two predicted gain profiles are contrasted in the sketch below). If the tool simply extends the body schema, then the discrimination gains normally found at the fingertip in a pointing task should transfer continuously to the tip of a tool used for pointing. The data show instead that discrimination performance is enhanced in parallel at both spatial locations, but not at nearby and intermediate locations. In other words, the fingertip is distalized to a new location, mediated in a specific way by the tool (Collins et al. 2008).
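The logic of this behavioral test reduces to two competing gain profiles across the workspace. The following toy illustration uses hypothetical gain values, not data from Collins et al. (2008), to contrast the predictions:

# Hypothetical visual-discrimination gain profiles under the two
# hypotheses; locations run outward from the hand to the tool tip.
locations = ["near hand", "fingertip", "intermediate", "tool tip"]

# Extension (one reading): the schema is stretched, so enhancement
# should spread continuously from the fingertip out to the tool tip.
extension = {"near hand": 0.0, "fingertip": 1.0,
             "intermediate": 1.0, "tool tip": 1.0}

# Distalization: the end-effector is relocated to a new island of
# space, leaving intermediate locations unenhanced.
distalization = {"near hand": 0.0, "fingertip": 1.0,
                 "intermediate": 0.0, "tool tip": 1.0}

for loc in locations:
    print(f"{loc:>12}: extension={extension[loc]}, "
          f"distalization={distalization[loc]}")

The reported pattern, parallel gains at the fingertip and tool tip with no enhancement in between, matches the second profile.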

In some areas, such as AIP, the ability to represent an action goal is relatively concrete, in that it depends on the presence of the target object to generate a parallel representation of the motor action (Baumann et al. 2009). To show this, single neurons in area AIP were recorded in animals trained to perform power or precision grips on a handle at different orientations. In a cue separation task, when the object was presented first, neurons representing power or precision grips were activated simultaneously until the actual grip type was instructed. In contrast, when the grasp type instruction was presented before the object, grip-type information was only weakly represented in AIP but was strongly encoded once the grasp target was revealed. As the granularity of functional imaging improves, it is also becoming possible to dissociate areas more closely related to the action plan or goal from those reflecting underlying object affordances (Valyear et al. 2007). Another way to conceptualize AIP and its interconnected prehension network is to assume they are optimally tuned for actions requiring prehensile interactions with objects. Other types of hand–object interaction would then represent computational outliers and impose increased processing demands. Evidence for this was found in an fMRI study in which activity was greater throughout left-hemisphere fronto-parietal circuits for non-prehensile object manipulation (pushing, poking, etc.) than for prehensile manipulation (Buxbaum et al. 2006).

The influence of action goals on underlying neural processing can be found throughout premotor and motor cortex. In healthy subjects, TMS-induced MEPs from motor cortex are influenced by the observation of actions in others, and this can be used to examine the specificity of observed action goals on motor cortex. A clever experimental manipulation is to dissociate the muscle-specific responses used for an action from responses tied to the action of a tool. This can be done by having subjects observe someone using regular pliers to grasp an object versus reverse pliers, where the grip must close to open the pliers. When the observed action is devoid of a goal, the MEP pattern of muscle recruitment reflects the underlying muscle pattern of the observed movement. When a goal is present, the MEP pattern instead reflects the action of the tool used to accomplish the goal, rather than the specific muscles involved. In other words, motor cortex sensitivity was exquisitely linked to the motor goal and the distalization of the hand action into the tool (Cattaneo et al. 2009). The same basic finding can be observed at the neuronal level in monkeys trained to use similar tools. Cortical motor neurons active during hand grasping also become active during grasping with pliers, as if the pliers were now the fingers. This motor embodiment occurs both for normal pliers and for “reverse pliers,” an implement that requires finger opening, instead of closing, to grasp an object (Umiltà et al. 2008). Neuronal flexibility in remapping goal states to new tools rather than the hand can also be identified with fMRI in PMv and AIP in humans (Jacobs et al. 2010). These data underscore a remarkable flexibility of neuronal ensembles to functionally reconfigure to match new goal states. This implies that there may be a multiplicity of effector maps that can be recombined to achieve these new states. In support of this idea, a non-human primate study used intracortical microstimulation to map distinct subregions for proximal and distal movements (Stark et al. 2007a). These same sites did not show segregation for reach and grasp actions during natural prehension. That is, there is a large degree of intermixing of neurons encoding reach and grasp across premotor cortex, presumably providing multiple solutions for coordinating the different components of prehension as a function of the action goal.

Prehension remains a powerful “simple” system for understanding the neural underpinnings of goal-directed behavior. We have learned much about the underlying anatomic, functional and computational principles that guide object-centric movement. However, prehension is embedded within a much larger and more complex behavioral repertoire. Still missing are the computational principles and neural mechanisms that explain how multiple movements, including prehension, are chained together to achieve temporally remote action outcomes (Lashley 1951; Fogassi et al. 2005). Computationally, this could be achieved via hierarchical task planning processes supported by a cascade of “if-then” rules (Cooper and Shallice 2006), as sketched below. This sort of highly structured, hierarchical planning architecture might map readily onto prefrontal cortex, which is already recognized as playing a critical role in complex planning (Badre and D’Esposito 2007, 2009; Botvinick 2008). On the other hand, hierarchical planning may not be the only solution for accomplishing complicated actions. There are strong computational arguments that many familiar actions can be executed via non-hierarchical processes, given sufficient practice (Botvinick and Plaut 2006). It is reasonable to propose that the next challenge in this domain will be reconciling this tension between automatic and controlled processes in forming goal-oriented behavior.
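As a concrete illustration of what such a rule cascade might look like, here is a minimal sketch loosely in the spirit of hierarchical “if-then” task planning. It is not the implementation of Cooper and Shallice (2006); the goals, rules and primitive acts are invented for illustration. A high-level goal decomposes into ordered subgoals via condition-action rules, with leaves corresponding to primitive motor acts such as reach and grasp.

# State is a set of facts; a rule fires when its precondition holds.
state = {"cup on table", "hand empty"}

# Each rule: (goal, precondition) -> ordered subgoals or primitives.
rules = {
    ("pour coffee", frozenset()): ["hold cup", "tilt cup"],
    ("hold cup", frozenset({"hand empty"})): ["reach cup", "grasp cup"],
}

# Primitive acts update the state directly.
primitives = {
    "reach cup": lambda s: s,
    "grasp cup": lambda s: (s - {"hand empty"}) | {"cup in hand"},
    "tilt cup":  lambda s: s | {"coffee poured"},
}

def execute(goal, state):
    """Recursively expand a goal through the rule cascade."""
    if goal in primitives:
        print("do:", goal)
        return primitives[goal](state)
    for (g, precond), subgoals in rules.items():
        if g == goal and precond <= state:
            for sub in subgoals:
                state = execute(sub, state)
            return state
    raise ValueError(f"no applicable rule for {goal!r}")

final = execute("pour coffee", state)
print(final)

The contrast with the non-hierarchical alternative is that a recurrent network trained on such sequences would produce the same ordered behavior without any explicit goal stack, which is precisely the tension between controlled and automatic routes noted above.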