Neuropharmacology

Volume 37, Issues 4–5, 5 April 1998, Pages 407-419

Goal-directed instrumental action: contingency and incentive learning and their cortical substrates

https://doi.org/10.1016/S0028-3908(98)00033-1

Abstract

Instrumental behaviour is controlled by two systems: a stimulus–response habit mechanism and a goal-directed process that involves two forms of learning. The first is learning about the instrumental contingency between the response and reward, whereas the second consists of the acquisition of incentive value by the reward. Evidence for contingency learning comes from studies of reward devaluation and from demonstrations that instrumental performance is sensitive not only to the probability of contiguous reward but also to the probability of unpaired rewards. The process of incentive learning is evident in the acquisition of control over performance by primary motivational states. Preliminary lesion studies in the rat suggest that the prelimbic area of prefrontal cortex plays a role in contingency learning, whereas incentive learning for food rewards involves the insular cortex.

Introduction

Prediction and control are the keys to successful adaptation to varying environments. Predictive learning allows an animal to anticipate biologically important events and resources by detecting and learning about signals of their occurrence. Traditionally, this form of learning has been studied using Pavlovian or classical conditioning procedures in which the signal acquires the capacity to elicit anticipatory responses as a result of its predictive association with a reinforcer. Although the functional importance of Pavlovian responses is well established (Hollis et al., 1997), their adaptive form is determined by evolutionary processes rather than individual learning, with the consequence that a purely Pavlovian animal is at the mercy of the stability of the causal consequences of its behaviour.

This point can be illustrated by one of the simplest behavioural capacities: the ability to approach signals of valuable resources. Having fed chicks at a distinctive food bowl, Hershberger (1986) found that when they were later removed from the vicinity of the bowl, not surprisingly they immediately ran back to it. Presumably, the initial feeding established the visual features of the bowl as a Pavlovian signal for food, capable of eliciting a conditioned approach response. For a second group, however, Hershberger reversed the normal relationship between locomotion and spatial translation by placing the chicks in a 'looking glass' world in which the bowl receded twice as fast as they ran towards it, and approached them at twice the speed that they ran away from it. This reversal of the normal relation between locomotion and relative spatial translation required the chicks to learn to run away from the bowl in order to reach it. This they were unable to do over 100 min of training.

The problem for Hershberger's chicks resides with their insensitivity to the change in the causal consequences of their behaviour, at least with respect to spatial locomotion, and it is the ability to learn about such causal relationships that represents a second form of acquired behavioural adaptation to varying environments. Learning about behaviour-outcome associations is typically studied using instrumental conditioning procedures in which a relationship is arranged between an action and a reinforcer. Whereas Pavlovian conditioning enables an animal to anticipate motivationally significant events, it is instrumental conditioning that allows control over these events in the service of its needs and desires. And it is instrumental learning that is the focus of this paper.

The classic learning theories of the neobehaviourist era (Tolman, 1932, Hull, 1943) were developed in response to studies of what was, at least nominally, instrumental conditioning. In the 1970s, however, the primary focus of learning theory shifted from the instrumental to the Pavlovian paradigm, primarily for technical reasons. In the Pavlovian paradigm, the experimenter has control over the critical elements of the relationship, the signal and the reinforcer, whereas one of the elements of the instrumental association, the response itself, is under the subject's control. Consequently, contemporary theories of conditioning (Rescorla and Wagner, 1972, Wagner, 1981, Pearce and Hall, 1980, Gallistel, 1990) have focused almost exclusively on the Pavlovian paradigm. Much the same is true of the neurobiological analysis of conditioning. The favoured procedures for investigating the neural structures mediating conditioning are Pavlovian (Lavond et al., 1993, LeDoux, 1995), with the consequence that most neural network models (Hawkins and Kandel, 1984, Gluck and Thompson, 1987, Schmajuk, 1997) also address this form of learning.

It is true that brain mechanisms mediating rewards in general have been the subject of intensive study (see Robbins and Everitt, 1996, for a recent review), but this analysis has been largely undertaken without any attempt to specify exactly how reward processes make contact with structures that mediate instrumental action. To the extent that an associative structure and learning process for instrumental action has been specified, it typically takes the form of a variant of the classic stimulus–response (S–R)/reinforcement system originally advanced in Thorndike's (1911) 'Law of Effect' (Donahoe et al., 1993). The central idea embodied in this so-called law is simple: the presentation of a reward shortly after the performance of an instrumental action strengthens or reinforces an association between the stimuli present when the response was performed and the response production mechanism, so that these stimuli become capable of eliciting the response.

There is no doubt that the S–R/reinforcement mechanism, when embodied within artificial creatures and elaborated with attentional and motivational mechanisms, can support sophisticated and complex instrumental behaviour (Grand et al., 1996). As a system for adapting to the causal structure of the environment, however, an S–R process has two major limitations. The first relates to the fact that S–R processes are sensitive only to the contiguous pairing of action and reinforcer rather than to the causal relationship between these events. As a result, an S–R agent is prone to develop superstitious responding under conditions in which a reinforcer reliably follows a response even when there is no causal association between the response and the reward.

The second limitation arises from the failure of an S–R process to encode the consequences of a response or, in other words, to represent the causal relationship between an action and a reward. All that is acquired during instrumental learning, according to this theory, is a procedural connection between a stimulus and a response so that the former becomes capable of reliably eliciting the latter. As a result, an S–R agent does not ‘know’ about the consequences of its behaviour and thus cannot evaluate different courses of action in terms of the relevance of the outcomes to its current needs and motivational states.

In the next two sections, we describe evidence demonstrating that real animals are subject to neither of these constraints.

Section snippets

Instrumental contingency

Hammond (1980) was the first to demonstrate that animals are sensitive to the causal relation between response and reward even when the contiguous pairings between them are kept constant. Under his schedule, the first response in each 1-s period has a fixed probability of being reinforced. Thus, for example, hungry rats might be trained to press a lever under a schedule in which the first press in each second is followed by the delivery of a food pellet with a fixed probability. The causal
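The logic of Hammond's design can be sketched concretely. The Python snippet below is an illustrative simulation, not part of the original study: the probability values and the `p_press` parameter (standing in for the animal's response rate) are made-up, and the ΔP measure is the standard index of response–reward contingency, p(reward|press) − p(reward|no press), rather than anything computed in the paper itself.

```python
import random

def simulate_hammond_schedule(p_reward_given_press, p_reward_given_no_press,
                              p_press=0.5, n_seconds=10_000, seed=1):
    """Simulate a free-operant schedule of the kind Hammond (1980) used.

    Time is divided into 1-s bins. If the simulated rat presses in a bin,
    a pellet is delivered with probability p_reward_given_press; if it does
    not press, a 'free' pellet is delivered with probability
    p_reward_given_no_press. Returns counts of paired rewards, unpaired
    rewards and presses.
    """
    rng = random.Random(seed)
    paired = unpaired = presses = 0
    for _ in range(n_seconds):
        if rng.random() < p_press:              # the rat pressed this second
            presses += 1
            if rng.random() < p_reward_given_press:
                paired += 1                     # reward contiguous with a press
        elif rng.random() < p_reward_given_no_press:
            unpaired += 1                       # unpaired ('free') reward

    return paired, unpaired, presses

def contingency(p_reward_given_press, p_reward_given_no_press):
    """Delta-P: a standard index of response-reward contingency."""
    return p_reward_given_press - p_reward_given_no_press
```

The point of the manipulation is that raising p(reward|no press) toward p(reward|press) drives ΔP toward zero while leaving the contiguous press–pellet pairings untouched; Hammond found that responding declined as the contingency was degraded in this way, which a purely contiguity-sensitive S–R process cannot explain.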

Reward devaluation

The second limitation of an S–R process lies with the absence of any encoding of the relationship between the response and the reward with the consequence that an S–R agent is incapable of truly goal-directed behaviour. In other words, the theory treats all instrumental behaviour as simple, elicited habits. This is certainly not the interpretation that our folk psychology gives for many of our instrumental actions. We typically regard our actions as purposive and explicitly selected and

Motivational control

The fact that a hungry rat presses a lever more rapidly for a food reward than a sated animal hardly warrants experimental demonstration, and yet the explanation of such simple motivational effects has always been problematic for theories of instrumental action. The deceptively simple explanation that the motivational state of hunger activates specifically food-directed behaviour is precluded within S–R/reinforcement theory. In the absence of any knowledge of the consequence of behaviour, an

Multiple learning processes

This brief and selective survey demonstrates that the processes mediating simple instrumental or operant behaviour are much more complex than is envisaged by classic S–R/reinforcement theory. Specifically, we have identified two features of instrumental performance that lie outside the scope of this theory. First, performance is sensitive not only to the contiguity between the response and the reward but also to their contingency. Secondly, the impact of reward devaluation demonstrates that some

Cortical structures and instrumental action

A favoured strategy in the psychological study of learning and memory is that of process dissociation, and the present tripartite analysis is no exception. Indeed, our analysis of instrumental learning conforms to the popular distinction between declarative and procedural learning (Dickinson, 1980, Squire, 1992) with contingency learning being declarative in nature and habit learning procedural. Such distinctions are first and foremost psychological in that they stand or fall by purely

Summary and conclusions

The control of instrumental action is complex, involving the interaction of at least three psychological processes. Although the insensitivity of performance to reward devaluation after overtraining implies a role for the classic S–R/reinforcement mechanism, the fact that devaluing the reward after more limited training reduces responding in a subsequent extinction test demonstrates that instrumental action can be goal-directed. Goal-directed control is mediated by two further processes,

Acknowledgements

The research reported in this article and the preparation of the manuscript were supported by grants from the National Institute of Mental Health (NIMH grant no. MH 56446) and from the European Commission BIOMED 2 programme.

References (50)

  • J.S. Carp et al. Motoneuron plasticity underlying operantly conditioned decrease in primate H-reflex. J. Neurophysiol. (1994)
  • X.Y. Chen et al. Operant conditioning of H-reflex in freely moving rats. J. Neurophysiol. (1995)
  • R.M. Colwill et al. Postconditioning devaluation of a reinforcer affects instrumental responding. J. Exp. Psychol.: Anim. Behav. Proc. (1985)
  • R.M. Colwill et al. Instrumental responding remains sensitive to reinforcer devaluation after extensive training. J. Exp. Psychol.: Anim. Behav. Proc. (1985)
  • R.M. Colwill and R.A. Rescorla (1986) Associative structures in instrumental learning. In: G.H. Bower (Ed.) The...
  • R.M. Colwill et al. The role of response-reinforcer associations increases throughout extended instrumental training. Anim. Learn. Behav. (1988)
  • T.L. Davidson. The nature and function of interoceptive signals to eat: Toward integration of physiological and learning perspectives. Psychol. Rev. (1993)
  • A. Dickinson (1980) Contemporary Animal Learning Theory. Cambridge University Press,...
  • A. Dickinson et al. Motivational control of goal-directed action. Anim. Learn. Behav. (1994)
  • A. Dickinson et al. Motivational control of instrumental action. Curr. Dir. Psychol. Sci. (1995)
  • A. Dickinson and D.R. Shanks (1995) Instrumental action and causal representation. In: D. Sperber, A.J. Premack (Eds.)...
  • A. Dickinson et al. Motivational control after extended instrumental training. Anim. Learn. Behav. (1995)
  • A. Dickinson et al. Bidirectional instrumental conditioning. Q. J. Exp. Psychol. (1996)
  • J.W. Donahoe et al. A selectionist approach to reinforcement. J. Exp. Anal. Behav. (1993)
  • J.W. Donahoe et al. The unit of selection: What do reinforcers reinforce? J. Exp. Anal. Behav. (1997)