2006 Special Issue
Perceiving the unusual: Temporal properties of hierarchical motor representations for action perception
Introduction
An increased interest in computational mechanisms that allow robots to observe, imitate and learn from human actions has resulted in a number of computational architectures that match demonstrated actions to the observer robot's equivalent motor representations (Alissandrakis et al., 2002; Billard, 2000; Demiris & Hayes, 2002; Schaal et al., 2003). These architectures, whilst sharing common computational components, such as modules for processing and classifying visual information and retrieving motor representations, differ in the way the perceptual information is coded and classified, the organisation of the motor system, and the stage at which the motor representations are used. The final aspect, the stage at which the motor representations are used, differentiates architectures that follow the general ‘observe, classify, imitate’ decomposition (Kuniyoshi, Inaba, & Inoue, 1994) from those that advocate a stronger involvement of the motor systems in the perception process, through a ‘rehearse, predict, observe, reinforce’ decomposition (Demiris & Hayes, 2002; Demiris & Johnson, 2003; Schaal et al., 2003). In the latter, the observer robot invokes its motor systems to rehearse potential actions, predicting and confirming incoming observed states during the demonstration. This approach has gained biological credibility with the discovery of the mirror system in monkeys and humans (Grèzes et al., 2003; Rizzolatti et al., 1996). Not all theoretical models advocate the actual rehearsal of candidate actions as our previous work has done (Demiris & Hayes, 2002); some opt instead for a weaker version of this motor theory of perception, usually termed ‘motor resonance’, in which the motor representations are retrieved through a resonance mechanism rather than a generative one.
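As an illustration of the ‘rehearse, predict, observe, reinforce’ decomposition, the following sketch scores candidate behaviours by how well their rehearsed predictions track a demonstration. All names, dynamics and the confidence-update rule are hypothetical illustrations, not the implementation of any of the cited architectures:

```python
import math

# Each candidate behaviour pairs an inverse model (state -> command) with a
# forward model (state, command -> predicted next state). Confidences are
# reinforced when the rehearsed prediction matches the observed demonstration.

def rehearse_and_reinforce(candidates, observed_states, gain=1.0):
    """Return a confidence score per candidate behaviour."""
    confidences = {name: 0.0 for name in candidates}
    for prev, curr in zip(observed_states, observed_states[1:]):
        for name, (inverse, forward) in candidates.items():
            command = inverse(prev)              # rehearse: what would I do here?
            predicted = forward(prev, command)   # predict the resulting state
            error = abs(predicted - curr)        # compare with the observation
            confidences[name] += gain * math.exp(-error)  # reinforce good matches
    return confidences

# Toy 1-D demonstration: a state that increases by 1 at each step.
demo = [0.0, 1.0, 2.0, 3.0]
candidates = {
    "advance": (lambda s: 1.0, lambda s, u: s + u),   # tracks the demo exactly
    "retreat": (lambda s: -1.0, lambda s, u: s + u),  # predicts the opposite
}
scores = rehearse_and_reinforce(candidates, demo)
# The 'advance' behaviour accumulates higher confidence than 'retreat'.
```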
For imitation approaches that advocate the use of motor systems during the perception stage, it becomes crucial to have a clear and flexible organisation of the motor system. Hierarchical representations, with primitive motor structures at the lowest level and progressively more complex structures at higher levels, have been proposed (Demiris & Johnson, 2003; Wolpert et al., 2003) and tested in robotic systems (Demiris & Johnson, 2003), which successfully learned and used sequences of actions by observation. However, little has been done with respect to the temporal dimension of these representations, including how they can be coordinated, as well as their relation to biological data.
In this paper, we examine in detail the issue of hierarchical representations, and in particular how higher-level models can be composed from (and coordinate) lower-level primitives. Our approach uses representations based on the biologically plausible minimum variance model of movement control (Harris & Wolpert, 1998; Simmons & Demiris, 2005), which leads to a principled and biologically plausible coordination of the underlying components. We subsequently compare a particular instantiation of our hierarchical attentive multiple models for execution and recognition (HAMMER) architecture (Demiris & Khadhouri, in press) for reaching and grasping actions with transcranial magnetic stimulation (TMS) data from humans during the passive observation of grasping movements by a demonstrator (Gangitano et al., 2004).
Hierarchies
Hierarchies are computationally interesting since they advocate a logical representational decomposition: motor primitives at the lower levels take care of the executional details while progressively higher levels shift their emphasis towards exerting temporal, contextual and cognitive control. From a robotics point of view, this allows for easier task planning and execution. In action understanding and gesture recognition, hierarchical representations have been regularly used since they allow
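This division of labour can be sketched in a few lines of code. The classes below are a hypothetical illustration of the decomposition, not the paper's implementation: the primitive holds the executional detail, while the higher-level node only exerts temporal (ordering) control:

```python
# Illustrative sketch: a higher-level node delegates execution to lower-level
# primitives and handles only their ordering, mirroring the decomposition in
# which primitives carry executional detail and higher levels exert control.

class Primitive:
    def __init__(self, name, command):
        self.name, self.command = name, command

    def step(self, state):
        return state + self.command   # executional detail lives here


class Sequence:
    """Higher-level node: coordinates primitives, holds no motor detail."""
    def __init__(self, children):
        self.children = children

    def run(self, state):
        trace = [state]
        for child in self.children:
            state = child.step(state)
            trace.append(state)
        return trace


# A hypothetical composite behaviour built from two primitives.
reach_and_grasp = Sequence([Primitive("reach", 2.0), Primitive("grasp", 0.5)])
trace = reach_and_grasp.run(0.0)
```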
Building blocks
The HAMMER family of architectures uses inverse and forward models (Karniel, 2002, Narendra and Balakrishnan, 1997, Wolpert and Kawato, 1998) as the basic building blocks. An inverse model is a module that takes as inputs the current state of the system and the target goal(s) and outputs the control commands that are needed to achieve or maintain those goal(s). The functional reverse to this concept is that of a forward model of a controlled system: a forward model is a module that takes as
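The complementary roles of the two building blocks can be made concrete with a toy one-dimensional example. The point-mass dynamics and gain below are illustrative assumptions, not the HAMMER implementation:

```python
# Toy 1-D system with assumed dynamics: next_state = state + command.

def inverse_model(state, goal, gain=0.5):
    """Current state + goal -> motor command that moves towards the goal."""
    return gain * (goal - state)

def forward_model(state, command):
    """Current state + command -> predicted next state (the functional
    reverse of the inverse model)."""
    return state + command

state, goal = 0.0, 10.0
for _ in range(20):
    u = inverse_model(state, goal)    # which command achieves the goal?
    state = forward_model(state, u)   # which state will that command produce?
# After repeated application the predicted state converges on the goal.
```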
The HAMMER–MV implementation
HAMMER–MV follows the general architecture of HAMMER, but uses minimum variance controllers as lower level inverse models, and coordinated combinations of these at the higher ones. We will start by giving an overview of our implementation of the minimum variance model, and show results on how it can be used to generate biologically plausible reaching trajectories. Subsequently, we will describe how we implement a particular instance of a hierarchical representation for a grasp using the minimum
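The core idea behind the minimum variance model can be illustrated with a toy computation. Assuming, as in Harris and Wolpert (1998), that motor-command noise has a standard deviation proportional to command magnitude, a simple integrator shows why smoother command profiles yield lower endpoint variance; the dynamics and noise constant below are illustrative assumptions, not the controller used in HAMMER–MV:

```python
# Signal-dependent noise: each command u contributes independent noise with
# standard deviation k * |u|, so endpoint variance is the sum of (k * u)^2.

def endpoint_variance(commands, k=0.1):
    """Endpoint variance of a simple integrator driven by noisy commands."""
    return sum((k * u) ** 2 for u in commands)

# Two command profiles producing the same total displacement (sum = 4.0):
jerky  = [4.0, 0.0, 0.0, 0.0]   # all effort in one large command
smooth = [1.0, 1.0, 1.0, 1.0]   # effort spread evenly over the movement

# Because the noise penalises large commands quadratically, the smooth
# profile achieves the same displacement with lower endpoint variance.
```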
Experiments
In our final set of experiments, human demonstrations of reaching actions were recorded and given as input to a simulated 2D arm with six degrees of freedom, controlled using HAMMER–MV. In the following sections, we describe the visual stimuli we recorded, and the equations governing the matching of the model arm's performance against the human data.
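The specific matching equations are not reproduced in this snippet. As an illustrative stand-in, a root-mean-square error between the demonstrated and generated trajectories shows the kind of comparison involved; the function and data below are hypothetical, not the paper's metric:

```python
import math

def rms_error(demonstrated, generated):
    """Root-mean-square error between two equal-length 1-D trajectories."""
    assert len(demonstrated) == len(generated)
    return math.sqrt(
        sum((d - g) ** 2 for d, g in zip(demonstrated, generated))
        / len(demonstrated)
    )

# Hypothetical sampled positions: recorded human reach vs. model arm output.
human = [0.0, 0.8, 1.9, 3.1, 4.0]
model = [0.0, 1.0, 2.0, 3.0, 4.0]
score = rms_error(human, model)   # lower score = closer match to the human
```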
Discussion
It has been argued earlier in this paper that hierarchical representations can be a useful engineering tool when structuring motor systems. The HAMMER–MV implementation of this concept demonstrates why: hiding the lower-level details inside higher-level structures allows for easier task planning than that achieved with flat, non-hierarchical representations, since only the details of the goal and desired task parameters need to be supplied and the higher inverse model will recruit and
Conclusions
The neurophysiological data mentioned in this paper lend support to the notion that the human brain does not passively observe actions but actively forms hypotheses and predicts forthcoming states. In Gangitano et al. (2004), it was shown that there is no temporal fragmentation of the action plan in the motor representation of the observer. The computational implementation of the HAMMER architecture described in this paper reproduced these results, using a hierarchical controller based on the
Acknowledgements
The first author acknowledges the support of the UK Engineering and Physical Sciences Research Council (EPSRC Grant GR/S11305/01) and the Royal Society. The second author is supported by an EPSRC/DTA doctoral scholarship. Thanks to all the BioART members for their valuable feedback, and especially to Anthony Dearden for his help in capturing and analysing the visual stimuli.
References
- Fagg, A. H., & Arbib, M. A. (1998). Modelling parietal–premotor interactions in primate control of grasping. Neural Networks.
- Fuster, J. M. (2004). Upper processing stages of the perception–action cycle. Trends in Cognitive Sciences.
- Grèzes, J., et al. (2003). Activations related to mirror and canonical neurones in the human brain: An fMRI study. NeuroImage.
- Karniel, A. (2002). Three creatures named forward model. Neural Networks.
- Rizzolatti, G., et al. (1996). Premotor cortex and the recognition of motor actions. Cognitive Brain Research.
- Rowe, J., et al. (2002). Attention to action: Specific modulation of corticocortical interactions in humans. NeuroImage.
- Sommerville, J. A., et al. (2005). Action experience alters 3-month-old infants' perception of others' actions. Cognition.
- Umiltà, M. A., et al. (2001). I know what you are doing: A neurophysiological study. Neuron.
- Wolpert, D. M., & Kawato, M. (1998). Multiple paired forward and inverse models for motor control. Neural Networks.
- Alissandrakis, A., et al. (2002). Imitating with ALICE: Learning to imitate corresponding actions across dissimilar embodiments. IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans.
- Billard, A. (2000). Learning motor skills by imitation: A biologically inspired robotic model. Cybernetics and Systems.
- Demiris, Y., & Hayes, G. (2002). Imitation as a dual route process featuring predictive and learning components: A biologically-plausible computational model.
- Demiris, Y., & Johnson, M. (2003). Distributed, predictive perception of actions: A biologically inspired architecture for imitation and learning. Connection Science.
- Van Essen, D. C., & Maunsell, J. H. R. (1983). Hierarchical organization and functional streams in the visual cortex. Trends in Neurosciences.
- Ferrari, P. F., et al. (2005). Mirror neurons responding to observation of actions made with tools in monkey ventral premotor cortex. Journal of Cognitive Neuroscience.
- Fitts, P. M. (1954). The information capacity of the human motor system in controlling the amplitude of movement. Journal of Experimental Psychology.
- Flash, T., & Hogan, N. (1985). The coordination of arm movements: An experimentally confirmed mathematical model. The Journal of Neuroscience.
- Gangitano, M., et al. (2001). Phase-specific modulation of cortical motor output during movement observation. Cognitive Neuroscience and Neuropsychology.
- Gangitano, M., et al. (2004). Modulation of premotor mirror neuron activity during observation of unpredictable grasping movements. European Journal of Neuroscience.
- Gergely, G. (2003). What should a robot learn from an infant? Mechanisms of action interpretation and observational learning in infancy. Connection Science.
- Harris, C. M., & Wolpert, D. M. (1998). Signal-dependent noise determines motor planning. Nature.