Encoding the serial order of events is an essential function of working memory, but one whose neural basis is not yet well understood. In the present work, we advance a new model of how serial order is represented in working memory. Our approach is predicated on three key findings from neurophysiological research: (1) prefrontal neurons that code conjunctively for item and order, (2) parietal neurons that represent count information through a graded and compressive code, and (3) multiplicative gain modulation as a mechanism for information integration. We used an artificial neural network, integrating across these three findings, to simulate human immediate serial recall performance. The model reproduced a core set of benchmark empirical findings, including primacy and recency effects, transposition gradients, effects of interitem similarity, and developmental effects. The model moves beyond previous accounts by bridging between neuroscientific findings and detailed behavioral data, and gives rise to several testable predictions.
Working memory is a cognitive function that serves to preserve task-relevant information in an active and accessible form over periods of a few seconds (Baddeley, 1986; Jonides et al., 2005). It has long been recognized that one critical feature of working memory is its capacity to encode and maintain information about the serial order of perceived events (Marshuetz, 2005). This capacity is essential in many domains including the comprehension, learning, and production of action sequences, the encoding of causal relationships, and perhaps above all, language processing (Martin and Gupta, 2004).
The ability to recall serial order information from working memory, and the limits of this ability, have been studied by cognitive psychologists for decades, and this research effort has yielded an exceedingly detailed description of human serial recall performance. However, the neural mechanisms underlying the behavioral data are not yet fully understood. Although several neuroscientific models have been proposed previously (Dominey et al., 1995; Dominey, 1997; Beiser and Houk, 1998; O'Reilly and Soto, 2001), few have made contact with behavioral data at any level of detail. At the same time, where psychologically sophisticated models have been offered, they have rarely made significant contact with evidence from neuroscience (Houghton, 1990; Burgess and Hitch, 1999; Brown et al., 2000; Farrell and Lewandowsky, 2002; Botvinick and Plaut, 2006).
In the present study, we introduce a novel computational model of working memory for serial order, which bridges between the domains of neuroscience and behavior. The model is based directly on a set of recent neuroscientific findings and shows how these observations, when integrated into a single account, might explain detailed patterns of serial recall performance. In what follows, we begin by reviewing the neuroscientific data on which the model is founded, and then report a series of simulation studies in which the model was tested against empirical benchmarks from the behavioral literature.
Elements of the account
(1) Conjunctive coding of item and rank information in prefrontal cortex
The first basic finding that our model draws on comes from single-unit recordings in monkeys performing immediate serial recall and related tasks. Across a series of such studies, beginning with Barone and Joseph (1989) and continuing most recently with Inoue and Mikami (2006) (see also Kermadi et al., 1993; Kermadi and Joseph, 1995; Funahashi et al., 1997; Ninokura et al., 2003, 2004), a critical and consistent finding has been that sequences are encoded through a conjunctive code, which crosses item with order information.[a] Specifically, within the prefrontal cortex as well as the caudate nucleus, single neurons have been found to respond selectively to particular items (shapes or locations), but their response to these items depends on the ordinal position in which the items appear (see Fig. 1A). The representational code carried by these neurons is conjunctive in the sense that the neurons respond maximally to a particular conjunction, or combination, of item and ordinal position. Such conjunctive coding suggests how the brain may solve the binding problem inherent to sequence encoding: the need to link individual items with individual serial positions.
(2) Information integration through gain-field encoding
The second key finding derives from single-unit recording studies that suggest how, in general, the brain may compute conjunctive codes. Starting from studies on spatial coordinate transformations in vision, Salinas and colleagues (Salinas and Thier, 2000; Salinas and Abbott, 2001) have proposed that information from multiple domains is commonly integrated, at the neural level, through multiplicative gain modulation. For example, in spatial processing, information about retinotopic location and eye position is integrated to yield head-centered representations. Single-unit recording data suggest that this mapping is mediated by parietal neurons whose response profiles can be modeled as the product of two receptive fields, one for retinotopic position and one for eye position (Brotchie et al., 1995). The sufficiency of this mechanism was demonstrated in a neural network model by Pouget and Sejnowski (1997) (see Fig. 1C). Additional computational studies have indicated how the same kind of gain modulation might support information integration in other domains, including object recognition and sensorimotor mapping (Pouget and Snyder, 2000; Salinas and Thier, 2000; Salinas and Abbott, 2001; Salinas, 2004).
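The gain-field idea can be illustrated with a toy neuron whose response is the product of a receptive field over retinal position and a gain function over eye position. The Gaussian and sigmoid shapes, their widths, and the function names below are illustrative assumptions, not values taken from the studies cited above. The sketch shows the signature property of multiplicative gain modulation: eye position rescales the response without shifting the preferred retinal location.

```python
import math

def retinal_tuning(x, preferred_x=0.0, width=10.0):
    """Gaussian receptive field over retinal position x (degrees); assumed shape."""
    return math.exp(-((x - preferred_x) ** 2) / (2 * width ** 2))

def eye_gain(eye, threshold=0.0, slope=0.2):
    """Sigmoidal gain as a function of eye position (degrees); assumed shape."""
    return 1.0 / (1.0 + math.exp(-slope * (eye - threshold)))

def gain_field_response(x, eye):
    """Multiplicative gain modulation: product of the two tuning functions."""
    return retinal_tuning(x) * eye_gain(eye)

positions = list(range(-30, 31, 5))
for eye in (-20.0, 0.0, 20.0):
    curve = [gain_field_response(x, eye) for x in positions]
    peak = positions[curve.index(max(curve))]
    print(f"eye position {eye:+.0f} deg: peak at x = {peak}, height = {max(curve):.3f}")
```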
(3) Graded, compressive representations of sequential numerosity in intraparietal sulcus
The third finding of interest bears on the question of how serial position or rank may be represented at the neural level. It has been proposed, based in part on neuroimaging data, that serial order processing may draw on representations of number arising within the intraparietal sulcus (IPS) (Marshuetz et al., 2000, 2006; Marshuetz, 2005; Nieder, 2005). Previous single-unit recording work by Nieder et al. (2006) provides additional motivation for this idea by demonstrating that neurons in the IPS respond selectively to the number of occurrences of a repeating event, with a distinct subset of neurons responding preferentially to the event's first occurrence, another subset to its second occurrence, and so forth. Nieder et al. (2006) described these neurons as coding for “sequential numerosity” or “sequential quantity.” In what follows, for brevity, we describe such neurons as coding for rank.
Importantly, the study by Nieder et al. (2006), together with closely related work (Nieder et al., 2002; Nieder and Miller, 2003, 2004; Nieder, 2005), provides detailed information concerning the format of rank representations within the IPS. First, Nieder et al. (2006) found that IPS neurons code for rank in a graded manner; individual neurons responded maximally to a specific rank, but also responded more weakly to other ranks, with the response dropping off in intensity with distance from the preferred rank. Second, closely related work on numerosity representation (Nieder and Miller, 2003) indicates that IPS neurons represent count information using a compressive code, reflected in more broadly tuned receptive fields for larger numbers (see Fig. 1B). As Nieder and Miller (2003) and others have noted, such compressive coding provides an explanation for the so-called scalar property, an instance of Weber's law according to which two numerosities separated by a fixed numerical difference are discriminated better when they are small than when they are large.
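The link between compressive tuning and Weber's law can be made concrete with a scaled log-normal tuning curve, that is, a Gaussian over log(n). In the sketch below, the width σ = 0.35 and the function names are illustrative assumptions, not values from the cited studies. The width of each tuning curve at half its maximum grows in proportion to the preferred count, so the ratio width/preferred stays constant, and a unit tuned to a larger count responds more strongly to the next count up, which makes neighboring large counts harder to tell apart.

```python
import math

SIGMA = 0.35  # assumed tuning width in log units (illustrative, not a fitted value)

def lognormal_tuning(n, preferred, sigma=SIGMA):
    """Scaled log-normal tuning: a Gaussian in log(n) with peak 1 at n == preferred."""
    return math.exp(-((math.log(n) - math.log(preferred)) ** 2) / (2 * sigma ** 2))

def half_max_width(preferred, sigma=SIGMA):
    """Full width at half maximum of the tuning curve, in linear (count) units."""
    a = sigma * math.sqrt(2 * math.log(2))  # half-max offset in log units
    return preferred * (math.exp(a) - math.exp(-a))

for preferred in (1, 2, 4, 8):
    width = half_max_width(preferred)
    # Response of the unit tuned to `preferred` when the count is one larger.
    overlap = lognormal_tuning(preferred + 1, preferred)
    print(f"preferred={preferred}: width={width:.2f}, "
          f"width/preferred={width / preferred:.2f}, response to n+1={overlap:.2f}")
```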
Our central proposal is that these three findings (conjunctive coding of item and rank, information integration through multiplicative gain modulation, and graded, compressive coding of count information) can be fit together to provide a satisfying account of how serial order is represented in working memory. According to this account, during sequence encoding, graded and compressive rank representations arising within the IPS feed forward to the prefrontal cortex, where rank information is integrated with item information through multiplicative gain modulation. The resulting graded conjunctive representation in the prefrontal cortex provides the basis for serial recall.
Neural network implementation
To make this account explicit, and to evaluate its ability to account for human recall performance, we implemented it as a runnable neural network model. The structure of the network was based directly on the gain field model of visual processing proposed by Pouget and Sejnowski (1997). Like that model, ours was composed of interconnected processing units, which assumed scalar activation values representing the time-averaged spike rates of individual neurons.[b] These were organized into four layers or groups (see Fig. 2). There were two input layers, one representing item (e.g., shape, location, or verbal item), and the other representing ordinal position or rank. As detailed below, each unit in the item layer responded maximally to a specific and unique item, but also responded submaximally to other items, to an extent determined by those items' similarity to the unit's optimal stimulus. The response profiles for units in the rank layer were chosen to resemble those reported by Nieder and colleagues (Nieder, 2005; Nieder et al., 2006). Specifically, each unit responded maximally to a unique rank, but also showed graded responses to surrounding ranks. This graded profile, as well as the compressive quality of empirically observed encodings of number, was captured by making each unit's response a scaled log-normal function of rank (see Fig. 1D).
Both input layers sent projections to an internal layer. Each unit within this layer received connections from one unit in the item layer and one unit in the rank layer, and assumed a level of activation equal to the product of the activations of these two input units (see Fig. 1E). All units in the internal layer sent projections to each unit in an output layer, within which each unit coded for a specific response sequence (see Materials and Methods).
The model was used to simulate immediate serial recall for six-item sequences. The first item in the target sequence was presented by imposing the appropriate patterns of activation over the item and rank input layers, and activations in the internal layer were updated based on these inputs. The second item in the target sequence was presented on the next time step by imposing new patterns of activation over the input layers. The pattern of internal-layer activation induced by these inputs was added to the pattern induced by the first item in the sequence. This summation implemented the assumption that sequence elements are represented through a superpositional, activation-based code, as argued by Botvinick and Plaut (2006) (see also Beiser and Houk, 1998; O'Reilly and Soto, 2001). Empirical support for such a superpositional code comes from neurophysiological studies such as those by Inoue and Mikami (2006) and Mushiake et al. (2006), both of which reported representation of sequences through concurrent activation of prefrontal neurons coding conjunctively for item and rank.
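The encoding scheme described above can be sketched in a few lines of Python. The activation functions are stand-ins: the rank code follows the scaled log-normal form described in the text (with a peak activation of 1 assumed), while the item code (activation 1 for the preferred item and 1 − δ for all others) is a hypothetical choice made only for illustration. Layer sizes follow Materials and Methods: 6 item units and 9 rank units, hence 54 internal units, with one output unit for each of the 6! = 720 orderings.

```python
import math
from itertools import permutations

N_ITEMS, N_RANKS = 6, 9  # layer sizes from Materials and Methods

def rank_code(r, sigma=0.5):
    """Scaled log-normal rank code: unit rho peaks (at 1, an assumed scaling) when r == rho."""
    return [math.exp(-((math.log(r) - math.log(rho)) ** 2) / (2 * sigma ** 2))
            for rho in range(1, N_RANKS + 1)]

def item_code(s, delta=0.6):
    """Hypothetical item code: 1 for the preferred item, 1 - delta for all others."""
    return [1.0 if psi == s else 1.0 - delta for psi in range(N_ITEMS)]

def encode(sequence, sigma=0.5, delta=0.6):
    """Noise-free encoding: one internal unit per (rank unit, item unit) pair,
    driven by the product of its two inputs and summed over list positions."""
    h = [0.0] * (N_RANKS * N_ITEMS)
    for position, item in enumerate(sequence, start=1):
        R, I = rank_code(position, sigma), item_code(item, delta)
        for rho in range(N_RANKS):
            for psi in range(N_ITEMS):
                h[rho * N_ITEMS + psi] += R[rho] * I[psi]  # gain modulation + superposition
    return h

all_orderings = list(permutations(range(N_ITEMS)))  # one output unit per ordering
h = encode((0, 1, 2, 3, 4, 5))
print(len(h), "internal units;", len(all_orderings), "possible orderings")
```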
Subsequent presentation of the third through sixth items resulted in a distributed pattern of activation in the internal layer that contained information pertaining to all six items in the target sequence (see Fig. 3A,B). With this pattern in place, activation fed forward from the internal layer to the output layer. The synaptic weights connecting these layers were trained, using supervised gradient-descent learning, to activate the output unit representing the target sequence (see Materials and Methods).
On each step of processing, random noise was applied to the activation value of each unit in the input and internal layers, modeling the intrinsic variability of activation codes in biological neurons. Because of this noise, the model's internal representation for any given target sequence might “accidentally” end up resembling the pattern usually used to represent a different sequence, causing the model to commit a recall error (see Fig. 3C).
Following the procedures detailed in the next section, we used the model to simulate immediate serial recall under a range of conditions, evaluating its ability to capture a key set of behavioral benchmarks.
Materials and Methods
Simulations were implemented in MATLAB (MathWorks, Natick, MA).
The model comprised six item units, nine rank units, 54 internal units, and 720 output units. Each item unit was associated with an optimal stimulus (ψ), and its activation, $I_\psi(s)$, was determined by a function of this optimal stimulus and the item actually occurring as a stimulus, s, with a model parameter δ controlling the degree of dissimilarity between item representations (range, 0–1). For the simulation involving mixed confusable and nonconfusable items, input items were divided into two groups: for alternating lists, one group of three confusable items and one group of three nonconfusable items; for isolate lists, one group of five confusable items and one separate nonconfusable item. Confusable items were assumed to differ by δC, and nonconfusable items by δN. The separation between confusable and nonconfusable items was determined by a third parameter, δNC.
Each rank unit was assumed to be activated maximally by a specific rank ρ, and to assume an activation based on this rank and the rank actually being encoded (r), according to a scaled log-normal function, $R_\rho(r)$, the activation of the rank unit with preferred rank ρ during presentation of the item at rank r. As shown in Figure 1D, this function leads to graded, compressive response profiles resembling those reported by Nieder and colleagues (Nieder et al., 2002, 2006; Nieder and Miller, 2003, 2004) (see Fig. 1B). The profiles are graded in the sense that rank units respond maximally to a particular rank but also submaximally to other ranks, and compressive in the sense that unit tuning curves broaden with increasing rank.
Each unit in the internal layer took inputs from a unique pair of item and rank units, and assumed an activation value based on the product of their activation values:

$$h_{\rho\psi} \mathrel{+}= R_\rho(r)\, I_\psi(s)$$

where $h_{\rho\psi}$ is the activation of the internal unit receiving input from the rank unit with preferred rank ρ and the item unit with preferred stimulus ψ. The += symbol indicates that internal unit activation was augmented by the indicated activation product on each step of encoding. At each step of encoding, multiplicative noise, with SD ν, was applied to the input and internal layers.
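A minimal sketch of noisy encoding and recall follows. Two elements are assumptions rather than details given in the text: multiplicative noise is implemented by scaling each activation by (1 + νε), with ε drawn from a standard normal distribution, and recall is read out by template matching (choosing the ordering whose noise-free internal pattern has the largest dot product with the noisy pattern), which stands in for the trained softmax output layer. The item code is the same hypothetical one used in the earlier sketch. Noise-free patterns are always recalled correctly, whereas noisy patterns can drift toward the pattern for a different ordering.

```python
import math
import random
from itertools import permutations

N_ITEMS, N_RANKS = 6, 9
LISTS = list(permutations(range(N_ITEMS)))

def rank_code(r, sigma=0.5):
    # Scaled log-normal rank code (peak 1 at the preferred rank; assumed scaling).
    return [math.exp(-((math.log(r) - math.log(rho)) ** 2) / (2 * sigma ** 2))
            for rho in range(1, N_RANKS + 1)]

def item_code(s, delta=0.6):
    # Hypothetical item code: 1 for the preferred item, 1 - delta otherwise.
    return [1.0 if psi == s else 1.0 - delta for psi in range(N_ITEMS)]

def noisy(values, nu, rng):
    # Assumed form of multiplicative noise with SD nu: v * (1 + nu * eps).
    if nu == 0:
        return list(values)
    return [v * (1.0 + nu * rng.gauss(0.0, 1.0)) for v in values]

def encode(sequence, nu=0.0, rng=None):
    """h += R * I at each step, with noise on the input and internal layers."""
    if rng is None:
        rng = random.Random(0)
    h = [0.0] * (N_RANKS * N_ITEMS)
    for position, item in enumerate(sequence, start=1):
        R = noisy(rank_code(position), nu, rng)
        I = noisy(item_code(item), nu, rng)
        h = [h[rho * N_ITEMS + psi] + R[rho] * I[psi]
             for rho in range(N_RANKS) for psi in range(N_ITEMS)]
        h = noisy(h, nu, rng)
    return h

TEMPLATES = [encode(seq) for seq in LISTS]  # noise-free patterns (nu = 0)

def recall(h):
    """Stand-in readout: the ordering whose noise-free template best matches h."""
    scores = [sum(a * b for a, b in zip(h, t)) for t in TEMPLATES]
    return LISTS[scores.index(max(scores))]

target = (0, 1, 2, 3, 4, 5)
rng = random.Random(1)
errors = sum(recall(encode(target, nu=0.3, rng=rng)) != target for _ in range(50))
print("noise-free recall:", recall(encode(target)))
print(f"noisy recall errors: {errors} / 50")
```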
Each output unit represented a unique ordering of the six items represented in the item input layer. Every output unit received inputs from all internal units. At the end of encoding, the activation $o_i$ of each output unit was set according to the softmax function:

$$o_i = \frac{\exp(a_i)}{\sum_k \exp(a_k)}$$

where $a_i$ is the net input to unit i, determined by the activations of the internal units ($h_j$) and the intervening connection weights ($w_{ij}$):

$$a_i = \sum_j w_{ij}\, h_j$$

The task simulated was immediate forward recall for six-item sequences.[c] The target sequences always included the same six items. At the onset of each new trial, all unit activations were set to zero. Presentation of target items then proceeded as described above. After presentation of the sixth list item, the output layer was updated, and its most active output unit identified the output sequence.
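The output computation can be sketched as follows. Subtracting the largest net input before exponentiating is a standard numerical safeguard that leaves the softmax result unchanged. The toy activations and weights below are illustrative values only, not parameters of the model.

```python
import math

def net_inputs(h, W):
    """a_i = sum_j w_ij * h_j for every output unit i (W is a list of rows)."""
    return [sum(w_ij * h_j for w_ij, h_j in zip(row, h)) for row in W]

def softmax(a):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(a)
    exps = [math.exp(x - m) for x in a]
    total = sum(exps)
    return [e / total for e in exps]

h = [0.2, 0.9, 0.4]           # toy internal-layer activations (illustrative only)
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]          # toy weights: output unit i reads internal unit i
o = softmax(net_inputs(h, W))
winner = o.index(max(o))       # the most active unit identifies the recalled sequence
print([round(x, 3) for x in o], winner)
```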
Internal-to-output weights were initially set to 0. All 720 possible target lists were presented in random order, without replacement, and after each trial the internal-to-output weights were adjusted using the delta rule:

$$\Delta w_{ij} = \alpha\,(t_i - o_i)\,h_j$$

where α is a learning rate, $o_i$ is the activation of output unit i, $h_j$ is the activation of internal unit j, and $t_i$ is the target value for output unit i for the present target list (1 for the output unit representing the target list, otherwise 0). The learning rate was dynamically adjusted to minimize the training duration, which was truncated at 500 cycles through the training set. However, essentially identical results were obtained with a fixed learning rate of 0.001 and a fixed training duration of 2500 cycles. The noise parameter ν was set to zero during training.
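The delta-rule training step can be sketched on a scaled-down network (3-item lists, 5 rank units, and one output unit for each of the 6 possible orderings) so that it runs quickly in pure Python. The reduced sizes, the fixed learning rate, the number of cycles, and the item code are assumptions for illustration. The update itself is the standard delta rule, Δw_ij = α(t_i − o_i)h_j, applied to softmax outputs o_i, with weights starting at 0 and noise-free training patterns.

```python
import math
import random
from itertools import permutations

# Scaled-down toy (an assumption for speed): 3-item lists, 5 rank units, 6 orderings.
N_ITEMS, N_RANKS = 3, 5
LISTS = list(permutations(range(N_ITEMS)))

def rank_code(r, sigma=0.5):
    return [math.exp(-((math.log(r) - math.log(rho)) ** 2) / (2 * sigma ** 2))
            for rho in range(1, N_RANKS + 1)]

def item_code(s, delta=0.6):
    # Hypothetical item code: 1 for the preferred item, 1 - delta otherwise.
    return [1.0 if psi == s else 1.0 - delta for psi in range(N_ITEMS)]

def encode(seq):
    h = [0.0] * (N_RANKS * N_ITEMS)
    for pos, item in enumerate(seq, start=1):
        R, I = rank_code(pos), item_code(item)
        for rho in range(N_RANKS):
            for psi in range(N_ITEMS):
                h[rho * N_ITEMS + psi] += R[rho] * I[psi]
    return h

def forward(h, W):
    """Softmax output activations o_i for internal pattern h and weights W."""
    a = [sum(w * x for w, x in zip(row, h)) for row in W]  # net inputs a_i
    m = max(a)
    e = [math.exp(x - m) for x in a]
    total = sum(e)
    return [x / total for x in e]

CODES = [encode(seq) for seq in LISTS]  # noise-free patterns (nu = 0 during training)

def train(cycles=300, alpha=0.02, seed=0):
    rng = random.Random(seed)
    W = [[0.0] * len(CODES[0]) for _ in LISTS]  # weights start at 0
    for _ in range(cycles):
        order = list(range(len(LISTS)))
        rng.shuffle(order)  # every list once per cycle, in random order
        for k in order:
            h = CODES[k]
            o = forward(h, W)  # outputs computed before this trial's update
            for i, row in enumerate(W):
                t_i = 1.0 if i == k else 0.0
                for j, h_j in enumerate(h):
                    row[j] += alpha * (t_i - o[i]) * h_j  # delta rule
    return W

W = train()
target_probs = [forward(CODES[k], W)[k] for k in range(len(LISTS))]
print("mean probability of the correct ordering:", round(sum(target_probs) / len(LISTS), 3))
```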
To evaluate performance under a given set of parameters, the model was tested 50 times on each sequence in the training set, and average positional accuracy was computed. In addressing each behavioral benchmark, parameters minimizing root mean squared error were sought through grid search over the model's three free parameters: σ (the width of the rank tuning curves), δ, and ν (in the mixed-list simulation, the five parameters σ, δC, δN, δNC, and ν).
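The parameter search can be sketched as an exhaustive grid search that keeps the parameter combination with the lowest root mean squared error. The grid values and `toy_simulate` below are hypothetical placeholders: in the actual procedure, each grid point would require running the trained model on the test sequences and computing positional accuracy, and the target curve would be the behavioral benchmark. Here the target curve is generated from the toy function itself (it is not real data), so the search should recover the generating parameters exactly.

```python
import math
from itertools import product

def rmse(predicted, observed):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed))

def grid_search(simulate, observed, grids):
    """Evaluate `simulate` at every grid point and keep the lowest-RMSE parameters."""
    names = list(grids)
    best_params, best_err = None, math.inf
    for values in product(*(grids[name] for name in names)):
        params = dict(zip(names, values))
        err = rmse(simulate(**params), observed)
        if err < best_err:
            best_params, best_err = params, err
    return best_params, best_err

def toy_simulate(sigma, delta, nu):
    # Hypothetical stand-in for "run the model and compute accuracy by serial position".
    return [sigma * k + delta * k ** 2 + nu * k ** 3 for k in range(1, 7)]

grids = {"sigma": [0.3, 0.5, 0.7], "delta": [0.2, 0.4, 0.6], "nu": [0.0, 0.1, 0.2]}
observed = toy_simulate(sigma=0.5, delta=0.4, nu=0.1)  # synthetic target, not real data
print(grid_search(toy_simulate, observed, grids))
```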
In behavioral studies of serial recall, plotting recall accuracy by serial position typically results in a “bow-shaped” curve (Fig. 3D), reflecting a recall advantage for initial items (the primacy effect) and a smaller advantage for the last one or two items (the recency effect). The positional recall accuracy of the model displayed this same profile, as shown in Figure 3E. This pattern of performance stems from two factors. Both the primacy and recency effects derive from edge effects: items at the boundaries of the sequence have fewer near neighbors with which to exchange positions. The primacy effect derives, additionally, from the greater distinctiveness of items at the beginning of the list, driven by the compressive rank code of the model.[d] The contribution of this factor can be seen by comparing Figure 3, E and F. Figure 3F illustrates the performance of the model when ordinary Gaussian rather than log-normal rank codes are used, eliminating the broadening of tuning curves with increasing rank. As comparison with Figure 3E makes clear, this change to the model substantially reduces the magnitude and extent of the primacy effect.
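Why the compressive code favors early positions can be seen by comparing how similar the rank codes for neighboring positions are. In this sketch, the σ values and function names are illustrative assumptions. Under the log-normal code, the population codes for ranks 1 and 2 overlap much less than those for ranks 5 and 6, so early items are more distinct; under a fixed-width Gaussian code, the similarity of neighboring ranks shows no such increase.

```python
import math

RANK_UNITS = range(1, 10)  # nine rank units with preferred ranks 1..9

def lognormal_code(r, sigma=0.5):
    """Compressive code: Gaussian in log(rank), so tuning broadens with rank."""
    return [math.exp(-((math.log(r) - math.log(rho)) ** 2) / (2 * sigma ** 2))
            for rho in RANK_UNITS]

def gaussian_code(r, sigma=1.0):
    """Comparison code: Gaussian in rank with a fixed width (no broadening)."""
    return [math.exp(-((r - rho) ** 2) / (2 * sigma ** 2)) for rho in RANK_UNITS]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))

def adjacent_similarity(code):
    """Similarity between the codes for ranks r and r + 1, for r = 1..5."""
    return [cosine(code(r), code(r + 1)) for r in range(1, 6)]

for name, code in (("log-normal", lognormal_code), ("fixed-width Gaussian", gaussian_code)):
    print(name, [round(s, 3) for s in adjacent_similarity(code)])
```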
Another consistent finding from behavioral studies of serial recall is that when an item is recalled at the incorrect serial position (a transposition error), its recall position is likely to lie near its original position. As shown in Figures 3 and 4, the model's recall performance displayed this same property. This aspect of the model's behavior derives from the similarity structure of its internal representations. Because of the form of the model's rank representations, items in nearby ordinal positions are represented more similarly than items in more widely separated positions, which makes it relatively common for the model to confuse the locations of closely spaced items.
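The link between rank similarity and the transposition gradient can be checked directly: exchanging two items changes the internal pattern by an amount that grows with the distance between their positions. This sketch redefines the assumed rank code and hypothetical item code from the earlier sketches so that it is self-contained, and measures the Euclidean distance between the internal pattern for a list and the pattern for the same list with two items exchanged. A smaller distance means that noise is more likely to turn one pattern into the other.

```python
import math

N_ITEMS, N_RANKS = 6, 9

def rank_code(r, sigma=0.5):
    return [math.exp(-((math.log(r) - math.log(rho)) ** 2) / (2 * sigma ** 2))
            for rho in range(1, N_RANKS + 1)]

def item_code(s, delta=0.6):
    # Hypothetical item code: 1 for the preferred item, 1 - delta otherwise.
    return [1.0 if psi == s else 1.0 - delta for psi in range(N_ITEMS)]

def encode(sequence):
    h = [0.0] * (N_RANKS * N_ITEMS)
    for position, item in enumerate(sequence, start=1):
        R, I = rank_code(position), item_code(item)
        for rho in range(N_RANKS):
            for psi in range(N_ITEMS):
                h[rho * N_ITEMS + psi] += R[rho] * I[psi]
    return h

def exchange(sequence, i, j):
    """Copy of `sequence` with the items at 1-based positions i and j swapped."""
    s = list(sequence)
    s[i - 1], s[j - 1] = s[j - 1], s[i - 1]
    return tuple(s)

def distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

target = (0, 1, 2, 3, 4, 5)
base = encode(target)
distances = [distance(base, encode(exchange(target, 3, 3 + lag))) for lag in (1, 2, 3)]
for lag, d in zip((1, 2, 3), distances):
    print(f"exchange positions 3 and {3 + lag}: distance = {d:.3f}")
```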
Effects of interitem similarity
In behavioral studies, when sequence items are highly confusable (e.g., phonologically similar in verbal recall), recall performance is undermined (Fig. 4B). Conrad (1965) showed that this is attributable in part to an increase in the number of transposition errors when items are confusable. Moreover, transpositions in confusable lists are prone to span wider lags than in nonconfusable lists (Henson, 1996) (Fig. 4D). The performance of the model displayed these same effects (Fig. 4C,E). Variations in interitem similarity were simulated by varying the degree of overlap between activation patterns in the model's item input layer (Fig. 4A; see Materials and Methods). Increasing interitem similarity reduced recall by increasing the number of transpositions, and increased the tendency of items to transpose across relatively wide lags. Once again, the model's performance can be understood in terms of the similarity relations among its internal representations. The internal representations of two different list orderings are more similar, and therefore more confusable, when items are relatively highly overlapping than when they overlap less.
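Interitem similarity can be explored in the same way by lowering δ in the hypothetical item code (activation 1 − δ for nonpreferred items), which increases the overlap between item patterns. The sketch compares the internal patterns for a list and for the same list with its first two items exchanged; with more item overlap, the two orderings are represented more similarly. The rank code and all names are the same illustrative assumptions used in the earlier sketches.

```python
import math

N_ITEMS, N_RANKS = 6, 9

def rank_code(r, sigma=0.5):
    return [math.exp(-((math.log(r) - math.log(rho)) ** 2) / (2 * sigma ** 2))
            for rho in range(1, N_RANKS + 1)]

def item_code(s, delta):
    # Hypothetical item code: smaller delta means more overlap between items.
    return [1.0 if psi == s else 1.0 - delta for psi in range(N_ITEMS)]

def encode(sequence, delta):
    h = [0.0] * (N_RANKS * N_ITEMS)
    for position, item in enumerate(sequence, start=1):
        R, I = rank_code(position), item_code(item, delta)
        for rho in range(N_RANKS):
            for psi in range(N_ITEMS):
                h[rho * N_ITEMS + psi] += R[rho] * I[psi]
    return h

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))

def distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

target, exchanged = (0, 1, 2, 3, 4, 5), (1, 0, 2, 3, 4, 5)
for delta in (0.8, 0.2):  # 0.8: distinct items; 0.2: confusable items
    u, v = encode(target, delta), encode(exchanged, delta)
    print(f"delta={delta}: cosine={cosine(u, v):.4f}, distance={distance(u, v):.4f}")
```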
Another behavioral finding that has received a great deal of recent emphasis involves recall for sequences of highly similar items (e.g., in verbal recall, the phonologically related letters B, P, T, C, G) that contain one or more distinctive items, for example, BPTRCG or BRPMTL. The general finding is that the distinctive or “nonconfusable” items within such mixed lists are recalled as well as, or better than, when the same items appear among other nonconfusable items (e.g., JRYMQL) (Fig. 4B). Varying the degree of overlap among the model's item representations to simulate the presentation of mixed lists (Fig. 4A; see Materials and Methods) yielded a comparable pattern of recall performance (Fig. 4C).
Another benchmark behavioral finding pertains to recall performance among children versus adults. Not surprisingly, recall accuracy improves with age. A more informative finding is that the transposition gradient becomes steeper with age; that is, transpositions tend to span smaller lags (McCormack et al., 2000) (Fig. 5B,C). This effect has been proposed to derive from a progressive sharpening of neural rank representations over the course of development (Lipton and Spelke, 2003). We simulated this by varying the breadth of tuning among the rank input units in the model (Fig. 5A; see Materials and Methods). Relatively broad tuning yielded recall performance resembling that observed among children (Fig. 5D,E).
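The developmental manipulation amounts to changing the width σ of the rank tuning curves. The sketch below uses illustrative σ values that are not fitted to the developmental data. It shows that broader tuning makes the rank codes for more distant positions more similar, the property expected to flatten the transposition gradient.

```python
import math

def rank_code(r, sigma, n_units=9):
    """Scaled log-normal rank code with tuning width sigma (in log units)."""
    return [math.exp(-((math.log(r) - math.log(rho)) ** 2) / (2 * sigma ** 2))
            for rho in range(1, n_units + 1)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))

def similarity_by_lag(sigma, start=3):
    """Similarity between the code for rank `start` and ranks 1-3 positions later."""
    return [cosine(rank_code(start, sigma), rank_code(start + lag, sigma)) for lag in (1, 2, 3)]

narrow, broad = similarity_by_lag(0.3), similarity_by_lag(0.8)  # illustrative widths
print("narrow tuning:", [round(s, 3) for s in narrow])
print("broad tuning: ", [round(s, 3) for s in broad])
```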
One possible objection to the account implemented in the model is that it would seem, in the general case, to require a prohibitively large number of processing units. It is often assumed that conjunctive representational regimes scale poorly, because of the problem of combinatorial explosion. However, O'Reilly and colleagues (O'Reilly and Busby, 2002; O'Reilly et al., 2003) have demonstrated that this assumption is not generally warranted. In the present model, the use of conjunctive representations of item and order might appear to require at least I × R units, where I is the number of distinct items to be represented and R is the number of distinct ranks. However, as shown in Figure 6 (blue data series), the present model can recall six-item lists when equipped with fewer than 36 (that is, 6 × 6) internal units. As in the theoretical account provided by O'Reilly and colleagues (O'Reilly and Busby, 2002; O'Reilly et al., 2003), the present model's ability to function with only a subset of its internal units is attributable to its use of coarse conjunctive representations, within which any given unit carries information about a range of item-rank pairings. The redundancy inherent in such coarse coding also means that the model can continue to perform accurately if a small number of units are removed after training (data not shown). Another important consequence is that, although the model can function correctly with relatively few internal units, increasing the number of internal units makes performance more robust to noise. This is shown in Figure 6 (lower data series), which plots the model's performance under noise across a range of internal layer sizes.
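The point about coarse coding can be illustrated with a deliberately extreme, noise-free case. Under the assumed rank code and hypothetical item code used in the earlier sketches, keeping only six internal units (one per item, all with preferred rank 1; a hypothetical choice) still yields a distinct pattern for every one of the 720 orderings, because each retained unit responds in a graded way across all ranks. The sketch says nothing about robustness: as noted above, additional internal units are what make performance resistant to noise.

```python
import math
from itertools import permutations

def rank_response(r, rho, sigma=0.5):
    """Assumed scaled log-normal response of a rank unit preferring rank rho."""
    return math.exp(-((math.log(r) - math.log(rho)) ** 2) / (2 * sigma ** 2))

def item_response(s, psi, delta=0.6):
    """Hypothetical item response: 1 for the preferred item, 1 - delta otherwise."""
    return 1.0 if psi == s else 1.0 - delta

def reduced_code(sequence, kept_units):
    """Noise-free activations of a subset of internal units, each labeled (rho, psi)."""
    return tuple(
        sum(rank_response(pos, rho) * item_response(item, psi)
            for pos, item in enumerate(sequence, start=1))
        for rho, psi in kept_units)

kept = [(1, psi) for psi in range(6)]  # hypothetical subset: 6 of the 54 internal units
codes = {tuple(round(x, 6) for x in reduced_code(seq, kept))
         for seq in permutations(range(6))}
print(f"{len(kept)} internal units give {len(codes)} distinct patterns for 720 orderings")
```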
Very similar results were obtained with an implementation of the model in which additive noise was used, an implementation in which activation in each input layer was normalized to sum to 1, and an implementation in which separate output groups were used for each ordinal position, with output item at each position represented by the most active unit in the relevant group. However, as noted previously, use of straight Gaussian rank representations with fixed variance, in place of the original log-normal representations, changed the behavior of the model considerably, yielding a pattern of recall accuracy inconsistent with the empirical data (Fig. 3F).[e]
We have presented a computational model addressing how serial order is represented in cortical working memory. The model is integrative in two senses. First, the model integrates across three basic findings from single-unit neurophysiology, indicating how they may fit together to subserve a single, critical cognitive function. Second, the model bridges across the domains of neuroscience and behavior, starting from formally specific and highly constraining neuroscientific findings, and leveraging these to explain detailed patterns of recall behavior.
Together, this combination of attributes represents a significant step beyond previous models of serial order processing. A number of psychological models have engaged behavioral data in detail (Page and Norris, 1998; Burgess and Hitch, 1999; Brown et al., 2000; Farrell and Lewandowsky, 2002; Botvinick and Plaut, 2006). In fact, the model we proposed has important features in common with some of these models, most notably the use of overlapping rank representations (Houghton, 1990; Burgess and Hitch, 1999; Brown et al., 2000; Botvinick, 2005; Botvinick and Plaut, 2006). However, in contrast to the present model, most models addressing detailed behavioral benchmarks have not made meaningful contact with neuroscientific data.
Our model also shares basic features with a number of previous models addressing the neural basis of serial order processing, including the use of conjunctive, superpositional sequence representations (Dominey et al., 1995; Dominey, 1997; Beiser and Houk, 1998; O'Reilly and Soto, 2001). The model we presented goes beyond this previous work by making contact with detailed behavioral data.
Predictions of the model
Like other work proposing the dependence of serial order memory on rank representations in the IPS (Marshuetz, 2005; Nieder, 2005), our model predicts that any disruption of these representations should specifically impair immediate serial recall performance. This appears consistent with neuropsychological evidence associating left parietal damage with impairments in memory span (Vallar and Shallice, 1990). A more distinctive prediction of the model is that there should exist neocortical neurons whose response properties take the form of gain fields combining item and order information in a graded and compressive manner. Although the data suggest that such neurons may occur in the inferior prefrontal cortex (Inoue and Mikami, 2006; see Fig. 1A), at least for visual stimuli, gain field representations might well first arise more posteriorly. Indeed, receptive fields resembling those predicted by the model have been observed in the superior parietal lobule, in the context of motor production (Sawamura et al., 2002).
Directions for additional evaluation and development
To focus on the issue of representation, our model abstracted over several mechanisms and processes, which could be addressed in a fuller implementation. For example, the internal units in our model were assumed to display persistent activation, a key property of active memory widely believed to underpin working memory function (Fuster, 2001; Miller and Cohen, 2001). One way of elaborating the model would be to incorporate specific mechanisms giving rise to sustained activation, along the lines proposed by Compte et al. (2000) or Zipser et al. (1993). Our implementation also did not address the mechanism by which multiplicative codes might be computed. A more explicit account of this might be drawn from work such as that of Mehaffey et al. (2005). Finally, like some previous neuroscientific models (Beiser and Houk, 1998), our model abstracted over the process of recall itself. This is another area where the model calls for further development, and where previous models once again provide useful precedents (Dominey, 1997; O'Reilly and Soto, 2001; Botvinick and Plaut, 2006).
There also remain a large number of interesting behavioral phenomena to which the present theory might be applied. Findings not addressed in the present work include list length effects, suffix and modality effects, grouping effects, and effects of irrelevant speech, as well as effects of prior probability (Botvinick and Bylsma, 2005). Testing the applicability of the model to such additional phenomena is a worthwhile direction for future work.
This work was supported by National Institutes of Health Grant MH16804 (M.B.) and the University of Tokyo International Academic Exchange Grant Program (T.W.).
[a] Although our focus in the present work is on activation-based mechanisms for serial order memory centered in the prefrontal cortex, it is important to acknowledge evidence that memory for serial order information may also depend on long-term memory mechanisms housed in medial temporal lobe structures (Fortin et al., 2002).
[b] As in the model of Pouget and Sejnowski (1997), no effort was made to capture differences in overall firing rates between cortical regions (e.g., between the IPS and prefrontal cortex). Such an undertaking would face the problem that spike rates in the relevant empirical studies have tended to be reported only in normalized form.
[c] Rank units with preferred ranks larger than six were included in the model because, given the graded nature of the rank code, such units naturally contribute to the representation of six-item sequences.
[d] Another consequence of this factor is that exchanges between adjacent items become more frequent with increasing rank. Thus, although the format of the data in Figure 3E does not make it evident, the model is less prone to exchange items at positions 2 and 3 than items at positions 4 and 5.
[e] The strong recency effect in the figure reflects the fact that early list items are more subject to the cumulative effects of noise. If this factor is equalized across items (as might be justified given that in the laboratory task items are recalled one by one), the straight Gaussian implementation yields a symmetric recall accuracy curve, still inconsistent with the empirical pattern.
- Correspondence should be addressed to Matthew Botvinick, Princeton University, Psychology Department, 3-C-10 Green Hall, Princeton, NJ 08540.
- Baddeley, 1986.
- Barone and Joseph, 1989.
- Beiser and Houk, 1998.
- Botvinick, 2005.
- Botvinick and Bylsma, 2005.
- Botvinick and Plaut, 2006.
- Brotchie et al., 1995.
- Brown et al., 2000.
- Burgess and Hitch, 1999.
- Compte et al., 2000.
- Conrad, 1965.
- Dominey, 1997.
- Dominey et al., 1995.
- Farrell and Lewandowsky, 2002.
- Farrell and Lewandowsky, 2003.
- Fortin et al., 2002.
- Funahashi et al., 1997.
- Fuster, 2001.
- Henson, 1996.
- Henson, 1998.
- Houghton, 1990.
- Inoue and Mikami, 2006.
- Jonides et al., 2005.
- Kermadi and Joseph, 1995.
- Kermadi et al., 1993.
- Lipton and Spelke, 2003.
- Marshuetz, 2005.
- Marshuetz et al., 2000.
- Marshuetz et al., 2006.
- Martin and Gupta, 2004.
- McCormack et al., 2000.
- Mehaffey et al., 2005.
- Miller and Cohen, 2001.
- Mushiake et al., 2006.
- Nieder, 2005.
- Nieder and Miller, 2003.
- Nieder and Miller, 2004.
- Nieder et al., 2002.
- Nieder et al., 2006.
- Ninokura et al., 2003.
- Ninokura et al., 2004.
- O'Reilly and Busby, 2002.
- O'Reilly and Soto, 2001.
- O'Reilly et al., 2003.
- Page and Norris, 1998.
- Pouget and Sejnowski, 1997.
- Pouget and Snyder, 2000.
- Salinas, 2004.
- Salinas and Abbott, 2001.
- Salinas and Thier, 2000.
- Sawamura et al., 2002.
- Vallar and Shallice, 1990.
- Zipser et al., 1993.