Abstract
Recent behavioral studies have given rise to two contrasting models for limited working memory capacity: a “discrete-slot” model in which memory items are stored in a limited number of slots, and a “shared-resource” model in which the neural representation of items is distributed across a limited pool of resources. To elucidate the underlying neural processes, we investigated a continuous network model for working memory of an analog feature. Our model network fundamentally operates with a shared resource mechanism, and stimuli in cue arrays are encoded by a distributed neural population. On the other hand, the network dynamics and performance are also consistent with the discrete-slot model, because multiple objects are maintained by distinct localized population persistent activity patterns (bump attractors). We identified two phenomena of recurrent circuit dynamics that give rise to limited working memory capacity. As the working memory load increases, a localized persistent activity bump may either fade out (so the memory of the corresponding item is lost) or merge with another nearby bump (hence the resolution of mnemonic representation for the merged items becomes blurred). We identified specific dependences of these two phenomena on the strength and tuning of recurrent synaptic excitation, as well as network normalization: the overall population activity is invariant to set size and delay duration; therefore, a constant neural resource is shared by and dynamically allocated to the memorized items. We demonstrate that the model reproduces salient observations predicted by both discrete-slot and shared-resource models, and propose testable predictions of the merging phenomenon.
Introduction
Working memory (WM), the ability to internally maintain and manipulate information, is critical for cognition and executive control of behavior (Baddeley, 1992). A hallmark of WM is its limited capacity: we can actively hold only a few (∼4) unrelated items of information at a time (Miller, 1956; Luck and Vogel, 1997; Cowan, 2005). For visual WM, studies suggest that the limited capacity can be accounted for by a fixed number of discrete memory slots (“discrete-slot” model) (Pashler, 1988; Luck and Vogel, 1997; Zhang and Luck, 2008). For instance, in Zhang and Luck's (2008) study, a number of colored squares were flashed on the screen, followed by a brief delay. Then, one of the items was cued and the subject had to report the color of the cued square by clicking on a color wheel. The performance data were consistent with a model in which the report has a fixed precision regardless of the set size for a small number of items, and is random for the others, suggesting that the information is stored in discrete slots. Other recent studies offered evidence for an alternative explanation of the WM capacity limit in terms of a shared, finite resource (“shared-resource” model), with a power-law decay of precision as a function of the set size (Wilken and Ma, 2004; Bays and Husain, 2008). Although the discrete-slot model is intuitively appealing, its neural mechanism is poorly understood. A promising explanation is that each item is actively stored in a subset of neurons that fire synchronously in the gamma band, with different groups of neurons firing at different phases; the maximum number of phases limits WM capacity (Lisman and Idiart, 1995; Raffone and Wolters, 2001). Yet little direct neurophysiological evidence has been reported (Fukuda et al., 2010), especially when the items are displayed simultaneously.
Moreover, an analog feature such as color is more likely to be encoded by a distributed neural representation (Conway and Tsao, 2009), in which similar colors would interfere with each other (Elmore et al., 2011). For these reasons, the temporal dynamics of a WM circuit underlying limited capacity remain unclear.
In this study, we investigated this issue using a spiking neural network of Compte et al. (2000) (with parameter variations), which was designed for WM of an analog quantity like a direction or a position on a color wheel. We found that, whereas the neural representation of cues is distributed in a continuous network, the system behaves in a way consistent with the discrete-slot model, because each item is stored in a distinct bell-shaped activity bump and the network is roughly normalized so that the total activity remains approximately the same for different set sizes, regardless of whether persistent activity bumps are uniformly or randomly distributed in space, and across time in the delay, despite fade-out and merging of bumps. Moreover, we identify two distinct dynamical effects limiting WM capacity, namely excessive (respectively insufficient) recurrent excitation leads to a merging (respectively fade-out) of the activity bumps, which have testable behavioral implications.
Materials and Methods
Model setup.
We adopted a ring architecture, suitable for representation of an angular feature by a continuous network with spiking neurons (Compte et al., 2000). The model was originally designed for a spatial WM task, in which the direction, between 0° and 360°, of a spatial cue must be remembered across a delay period (Funahashi et al., 1989). This setting is thus adequate for the Zhang and Luck (2008) experiment, because the position on a color wheel can be described by a directional angle. The model consists of 4096 direction-selective pyramidal cells and 1024 interneurons. Both pyramidal cells and interneurons are modeled as leaky integrate-and-fire neurons (Tuckwell, 1988). The subthreshold membrane potential, V(t), obeys: Cm dV(t)/dt = −gL(V(t) − VL) − Isyn(t), where Isyn(t) is the total synaptic current to the neuron, Cm is the membrane capacitance, gL is the leak conductance, and VL is the resting potential; other parameters are the firing threshold potential, Vth, the reset potential, Vres, and the refractory period, τ. Cm = 0.5 nF, gL = 0.025 μS, and τ = 2 ms for pyramidal cells; Cm = 0.2 nF, gL = 0.020 μS, and τ = 1 ms for interneurons; VL = −70 mV, Vth = −50 mV, and Vres = −60 mV for all neurons (Troyer and Miller, 1997; Wang, 1999).
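As a rough illustration of the leaky integrate-and-fire dynamics and the threshold, reset, and refractory rules described above, the following minimal sketch integrates a single pyramidal cell with forward Euler. The constant injected current `i_inj`, the time step `dt`, the simulated duration, and the function name are illustrative assumptions; synaptic currents are not simulated, and the sign convention (a positive `i_inj` depolarizes the cell) is chosen for the sketch only.

```python
def simulate_lif(i_inj, t_max=500.0, dt=0.02,
                 c_m=0.5, g_l=0.025, v_l=-70.0,
                 v_th=-50.0, v_res=-60.0, t_ref=2.0):
    """Return spike times (ms) of one LIF cell driven by a constant current.

    Units: c_m in nF, g_l in uS, voltages in mV, time in ms, i_inj in nA.
    Default values are the pyramidal-cell parameters quoted in the text;
    i_inj, t_max, and dt are assumptions made for this sketch.
    """
    v = v_l
    refractory_until = -1.0
    spikes = []
    n_steps = int(round(t_max / dt))
    for step in range(n_steps):
        t = step * dt
        if t < refractory_until:
            continue  # membrane is held at v_res during the refractory period
        # Forward-Euler step of C_m dV/dt = -g_L (V - V_L) + i_inj
        v += dt * (-g_l * (v - v_l) + i_inj) / c_m
        if v >= v_th:
            spikes.append(t)
            v = v_res
            refractory_until = t + t_ref
    return spikes
```

With these parameters the rheobase is gL(Vth − VL) = 0.5 nA, so a call such as `simulate_lif(0.4)` stays subthreshold, whereas `simulate_lif(0.6)` fires repetitively.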
The recurrent currents are mediated by AMPA (AMPAR), NMDA (NMDAR), and GABA (GABAR) receptors. The current from spontaneous neural activity outside the local network is modeled as task-irrelevant background noise, Inoise. The external current, Iext, delivers the stimuli in a cue array to pyramidal cells. Each neuron thus receives a total synaptic current: Isyn = IAMPA + INMDA + IGABA + Inoise + Iext.
Currents mediated by AMPAR, NMDAR, and GABAR to neuron i are modeled as:
where [Mg2+] = 1 mM (Jahr and Stevens, 1990), VE = 0 mV, and VI = −70 mV. Given a spike train, {tk}, in the presynaptic neuron, a gating variable, s, for AMPAR or GABAR follows fast dynamics,
The connectivity between pyramidal cells is structured, consistent with a columnar organization (Goldman-Rakic, 1995; Rao et al., 1999; Constantinidis et al., 2001; Conway and Tsao, 2009). Specifically, the synaptic coupling between neurons i and j, gij, is the synaptic conductance GEE multiplied by W(θi − θj), where θi is the preferred direction of neuron i. This connectivity W(θi − θj) = J− + (J+ − J−)
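A minimal sketch of a structured footprint of this kind, assuming the Gaussian profile and unit-mean normalization of Compte et al. (2000), that is, W(θi − θj) = J− + (J+ − J−) exp(−(θi − θj)²/2σ²) with the average of W over the ring equal to 1. Both the Gaussian form and the normalization are assumptions here, as are the function names; the default J+ and σ are the wide-connectivity values quoted in the Results.

```python
import math


def circ_diff(a, b):
    """Signed angular difference a - b in degrees, wrapped to [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def gaussian_mean(sigma, n=720):
    """Ring average of exp(-d^2 / (2 sigma^2)), d in [-180, 180), midpoint rule."""
    step = 360.0 / n
    total = 0.0
    for k in range(n):
        d = -180.0 + (k + 0.5) * step
        total += math.exp(-d * d / (2.0 * sigma * sigma))
    return total / n


def j_minus(j_plus, sigma):
    """Solve the assumed normalization mean(W) = 1 for J-."""
    m = gaussian_mean(sigma)
    return (1.0 - j_plus * m) / (1.0 - m)


def footprint(theta_i, theta_j, j_plus=3.62, sigma=11.25):
    """Assumed Gaussian footprint W(theta_i - theta_j); angles in degrees."""
    jm = j_minus(j_plus, sigma)
    d = circ_diff(theta_i, theta_j)
    return jm + (j_plus - jm) * math.exp(-d * d / (2.0 * sigma * sigma))
```

Under this normalization J− is fixed by J+ and σ, so strengthening iso-directional coupling (larger J+) necessarily weakens coupling between cells with distant preferred directions.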
Decoding method.
Neurons are divided into subpopulations according to the stimuli. By calculating the population vector for the subpopulation of the αth stimulus in a cue array, we decoded its “memory trace” as
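As an illustration of population-vector decoding, the sketch below computes the standard estimate, the direction of the rate-weighted sum of unit vectors pointing along each neuron's preferred direction, for one subpopulation. This is the textbook form and is assumed here; the function name, the zero-activity threshold, and the choice to return `None` for a silent subpopulation are illustrative choices.

```python
import math


def population_vector(rates, preferred_deg):
    """Decode an angle (degrees, in [0, 360]) from firing rates.

    rates: firing rates of the neurons in one subpopulation.
    preferred_deg: their preferred directions in degrees.
    Returns None when the summed vector is (numerically) zero, e.g. after
    a bump has faded out; that convention is an assumption of this sketch.
    """
    x = sum(r * math.cos(math.radians(th)) for r, th in zip(rates, preferred_deg))
    y = sum(r * math.sin(math.radians(th)) for r, th in zip(rates, preferred_deg))
    if math.hypot(x, y) < 1e-12:
        return None
    return math.degrees(math.atan2(y, x)) % 360.0
```

Because the sum is taken over unit vectors rather than raw angles, a bump straddling 0°/360° is decoded correctly without special handling.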
Simulation protocol.
We simulated two types of tasks to examine the WM performance: delayed-recall tasks (DRTs) and change-detection tasks (CDTs). In a DRT (Figs. 1–7, 9A,B), the network actively maintains the directions in a cue array as bumps during a delay of ≤9 s (Fig. 1B). Each cue array contains one or more different directions, and θin,α denotes the direction of the αth stimulus. A pyramidal cell, with preferred direction θ, thus receives the external input from all n items in a cue array as
In the CDT (Figs. 8, 9C–E), we used the same protocol as that in a DRT for WM retention process, where the color is encoded as a value of hue from 0° to 360° (color set green: 90° to 150°; blue: 210° to 270°; red: 0° to 30° and 330° to 360°). We used the wide connectivity network in Figure 8 (see below) and narrow connectivity network in Figure 9 (see below). In each trial, a cue array (with 2, 3, 4, 6 or 8 colors) and a test array (with the same set size as the cue), separated by a 1 s delay, are shown, and a decision must be made on whether they are the same. In half of the trials, the test arrays, θtest, are identical to the cue arrays, θin, namely “same” trials, where the amplitude of change is Δ = 0°; while in the other half of the trials, one color in the cue array is changed to a color with an amplitude, Δ, from 10° to 90° away from its value, namely “diff” trials. To make such a decision based on the memory, we used a downstream “match-nonmatch” neural circuit, which was previously developed by Engel and Wang (2011). Furthermore, we simplified this neural circuit as a sigmoid function (Fig. 8B), which decreases with the difference between the memory read-outs, θout, and test items, θtest: where a = 0.8214, b = −0.8243, θ0 = 23.57°, and dθ = 6.32° for Figure 8 (according to the human behavioral data of Wilken and Ma, 2004; see below); a = 0.9214, b = −0.9243, θ0 = 28.57°, and dθ = 6.32° for Figure 9C,E (according to the human behavioral data of Lin and Luck, 2009; see below).
Specifically, in a Lin and Luck CDT (Fig. 9C–E), we adopted 3 types of cue arrays, namely far (low similarity), close (high similarity), and far+close, and performed 3 types of tests, namely same, diff1, and diff2 (750 trials for each condition). On far trials, each color in a cue array is randomly chosen from a different color set (green, blue, or red); on close trials, all colors are randomly chosen from the same set; on far+close trials, two colors are chosen from the same set, while the other one is chosen from a different set. In same tests, the test array is identical to the cue array; in diff1 tests, one color in the cue array is changed to a color 30° away from the original color (for colors with high similarity, this changed color is on its divergent side); in diff2 tests (only for colors with high similarity in close and far+close trials), one color with high similarity is changed to an intermediate level on its convergent side. A mixture of the same trials (50%) and diff1 trials (50%) is equivalent to the behavioral experiments of Lin and Luck (2009). In both close and far+close trials, the minimum distance between sampled colors is ≥20°.
Quantification of WM performance and capacity.
In DRTs, the WM performance can be measured using parameters Pm and s.d. from the discrete-slot model by a von Mises fit
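For orientation, a two-parameter fit of this kind can be sketched as a mixture of a von Mises distribution centered on zero offset (weight Pm, concentration κ) and a uniform distribution (weight 1 − Pm), following the form commonly associated with Zhang and Luck (2008). The mixture form, the coarse grid-search maximum-likelihood procedure, and all function names below are assumptions of this sketch; offsets are in radians, and the conversion from κ to s.d. is omitted.

```python
import math
import random


def bessel_i0(x):
    """Modified Bessel function I0(x) from its power series."""
    term, total, k = 1.0, 1.0, 0
    while term > 1e-16 * total:
        k += 1
        term *= (x * x / 4.0) / (k * k)
        total += term
    return total


def mixture_pdf(x, p_m, kappa):
    """Density of the assumed mixture at response offset x (radians)."""
    von_mises = math.exp(kappa * math.cos(x)) / (2.0 * math.pi * bessel_i0(kappa))
    return p_m * von_mises + (1.0 - p_m) / (2.0 * math.pi)


def fit_mixture(offsets, p_grid, kappa_grid):
    """Grid-search maximum-likelihood estimate of (Pm, kappa)."""
    best = None
    for kappa in kappa_grid:
        norm = 2.0 * math.pi * bessel_i0(kappa)
        # The von Mises term depends only on kappa, so compute it once per kappa.
        vm = [math.exp(kappa * math.cos(x)) / norm for x in offsets]
        for p_m in p_grid:
            ll = sum(math.log(p_m * v + (1.0 - p_m) / (2.0 * math.pi)) for v in vm)
            if best is None or ll > best[0]:
                best = (ll, p_m, kappa)
    return best[1], best[2]
```

A grid search is used only to keep the sketch dependency-free; a gradient-based optimizer would normally replace it when fitting real response-offset data.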
In this study, we also developed two parameters for WM performance: correct rate of the reports,
For the CDT in Figure 8A (see below), we measured the WM performance using parameters hit rate, false alarm rate, and correct rate of the reports, Pc. We defined the hit rate as the probability to respond to “different” in the diff trials, 1 − P {same, Δ > 0°}, and the false-alarm rate as the probability to respond to different in the same trials, 1 − P {same, Δ = 0°}. The correct rate across the diff and same trials is:
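The three measures can be tallied from trial outcomes as in the sketch below. It computes the correct rate as the pooled fraction of correct responses, which equals (hit rate + 1 − false-alarm rate)/2 when same and diff trials are equally frequent, as in the protocol described here; that pooled definition, the trial encoding as `(delta, response)` pairs, and the function name are assumptions of this sketch.

```python
def cdt_rates(trials):
    """Return (hit rate, false-alarm rate, correct rate) for a CDT.

    trials: iterable of (delta_deg, response) pairs, where delta_deg is the
    amplitude of change (0 for same trials) and response is "same" or "diff".
    """
    trials = list(trials)
    diff_responses = [r for d, r in trials if d > 0]
    same_responses = [r for d, r in trials if d == 0]
    hits = sum(r == "diff" for r in diff_responses)
    false_alarms = sum(r == "diff" for r in same_responses)
    correct_rejections = len(same_responses) - false_alarms
    hit_rate = hits / len(diff_responses)
    false_alarm_rate = false_alarms / len(same_responses)
    correct_rate = (hits + correct_rejections) / len(trials)
    return hit_rate, false_alarm_rate, correct_rate
```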
Measures of population activity.
The average firing rate of pyramidal cells is
The instantaneous average recurrent excitatory synaptic conductance, Gα(t), and the instantaneous average firing rate of pyramidal cells, Rα(t) (Fig. 7D,E) of the αth bump are calculated as follows:
We calculated the average firing rates
Results
Population coding gives rise to the discrete-slot model in a continuous attractor network with normalization
Using a continuous network model of spiking neurons selective for an angle θ representing an analog feature such as the position on a color wheel (Fig. 1A), we investigated WM capacity by examining how the system responded to the presentation of an array of directions (Fig. 1B–D). Figure 2A shows the spatiotemporal spiking activity pattern of a network with wide connectivity (J+ = 3.62, σ = 11.25°), with firing rates plotted as a color-coded map, for a uniform array of 2, 3, 4 or 6 directions. Several characteristics are worth noting. First, before the onset of the cue array, pyramidal cells discharge spontaneously at a low rate (<3 Hz) without tuning to any specific direction. Second, when the cue array is presented, the pyramidal cells whose preferred directions are close to the stimuli in the cue array increase their firing rates and form distinct bell-shaped activity profiles (bumps) that encode the directions of the corresponding stimuli. Third, these activity bumps continue to evolve after the cue array is withdrawn. When the set size is small (Fig. 2A, top), the WM load is low, and all the activity bumps persist throughout the WM delay with slight drifts. For instance, Figure 2A (upper left) shows that two activity bumps are elicited in the cueing stage and persist during a 1 s delay with almost identical bump widths. Therefore, the representations of directions are actively maintained in WM and can be read out accurately after the delay. On the other hand, when the set size is large, the WM load is high, and some activity bumps may fade out or merge during the WM delay. For instance, in a sample trial with 6 directions (Fig. 2A, lower right), one activity bump persists throughout the delay, three bumps fade out, and two bumps merge in the early phase of the delay. Hence, after a short delay (1 s), the information about the three faded-out cues is lost, and the original directions of the two merged cues are blurred.
We assessed the network performance based on the readout from neural population activity in the last 0.25 s of the delay using a population decoding algorithm (Materials and Methods). Consistent with the observations from human visual experiments (Luck and Vogel, 1997; Zhang and Luck, 2008), the network model performs well at small set sizes and poorly at large set sizes. With a 1 s delay (black, Fig. 2B–D), the correct rate of reports is high (Pc ≈ 100%) and the memory resolution is fine (SD ≈ 2°) when the set size is smaller than a critical number (∼4), whereas Pc sharply decreases to a low level (∼20%) and SD drastically increases to a plateau (∼18°) once the set size exceeds this critical number (Fig. 2C,D). We found that this critical set size not only defines the WM capacity, which maximizes the product of the correct rate of reports and the set size (Fig. 2B), but also sets an upper bound on the number of distinct activity bumps. In our model, because recurrent network dynamics continue to unfold over time, merging or fade-out can occur later in the delay period; WM capacity therefore depends on the delay duration, as shown by a comparison of performance with a 1 s versus a 9 s delay (black vs gray, Fig. 2B–D). We will return to this model prediction later. The plateau of response precision implies that the network represents the memorized directions with an almost constant accuracy, and the low correct response rate indicates that the network forgets some of them if the set size exceeds WM capacity. Therefore, even though our model is a continuous network and cue directions are encoded by a distributed neural population, it reproduces the defining behavior of a discrete-slot model.
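The capacity criterion used here, the set size that maximizes the product of the correct rate and the set size, can be written in a few lines. The function name and the example values in the usage note are hypothetical; the values only mimic the step-like profile described above and are not taken from the simulations.

```python
def wm_capacity(pc_by_set_size):
    """Return the set size n that maximizes n * Pc(n).

    pc_by_set_size: dict mapping set size (int) to correct rate Pc in [0, 1].
    If several set sizes tie, the first one in the dict's order is returned.
    """
    return max(pc_by_set_size, key=lambda n: n * pc_by_set_size[n])
```

For a hypothetical step-like profile such as `{1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 0.2, 6: 0.2, 8: 0.2}`, the product n·Pc peaks at n = 4, so `wm_capacity` returns 4.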
Interestingly, in our model, the neural population activity is normalized, in the sense that the total width of activity bumps and the average firing rate of pyramidal cells are almost independent of the set size. Although a single activity bump in the upper left panel of Figure 2A is clearly wider than one in the upper right panel, the total width of activity bumps is the same. Even when bumps fade out (Fig. 2A, lower right), the total width of activity bumps remains essentially constant, because the merging bumps expand while the fade-out bumps shrink. To quantify this observation, we calculated the total width of activity bumps and the average firing rate of all pyramidal cells in the 0.25 s preceding the end of the delay period, and found that both are almost constant across set sizes and delay durations, with or without fade-out and merging (Fig. 2E,F). If we define the WM resources as the activity of pyramidal cells, network normalization indicates that the system recruits a roughly constant amount of memory resource (Bays and Husain, 2008; Buschman et al., 2011), which is dynamically shared by the memorized items (Fig. 2A). With a set size smaller than WM capacity, each activity bump has the same width; the neural representations of items in WM thus share the resources equally. With a set size above WM capacity, an activity bump may merge with another activity bump or fade out, and the WM resources are dynamically shifted from a fade-out activity bump to another activity bump or absorbed by merging activity bumps, which agrees with the defining behavior of the shared-resource model (Bays and Husain, 2008). Moreover, to maintain persistent activity, a distinct activity bump must recruit a minimum number of pyramidal cells to make the local excitation strong enough. Therefore, the normalization, which implies a fixed total bump width, gives rise to a maximum number of distinct activity bumps.
Working memory capacity depends on the strength and width of recurrent excitatory connections
The model has structured excitatory recurrent connections and unstructured inhibitory connections, allowing us to focus on the effects of excitatory-to-excitatory (E-E) connections on WM capacity. Specifically, we gradually varied the E-E connection strength J+ and spatial footprint σ to examine how WM capacity depends on the local recurrent excitation (Fig. 3A). With weak or narrow connections (navy blue), recurrent excitation is insufficient to support any persistent activity bump. Otherwise, WM capacity varies from 2 to 7, which is consistent with human reports of single-feature WM capacity (Xu and Chun, 2006). The E-E connectivity thus plays an important role in modulating WM capacity. We found that WM capacity monotonically increases with J+ for a fixed σ (horizontal white line, Fig. 3A). Increasing J+ enhances iso-directional and weakens cross-directional E-E connections. Consequently, neurons within an activity bump receive stronger mutual excitation, whereas neurons in different bumps excite each other less effectively; self-maintenance of distinct activity bumps is therefore favored and WM capacity is larger. In contrast, with increasing σ for a fixed J+, WM capacity first increases and then decreases (vertical white line, Fig. 3A). A narrower connectivity (smaller σ) results in fewer pyramidal cells being recruited to represent each item, as well as decreased mutual excitation; it is thus detrimental to the maintenance of an activity bump and leads to a smaller WM capacity. Conversely, if σ is large, an excessive number of pyramidal cells represent each item (bumps are wide). These wide bumps merge with a high probability, and WM capacity is thus small. Therefore, a large WM capacity requires strong recurrent excitation with an optimal spatial footprint.
Although WM capacity depends on E-E connections, the typical characteristics of networks with different E-E connections are similar. Given uniform cue arrays, the performance curves of the network with narrow connectivity (×a, Fig. 3A) are similar to those of the network with wide connectivity (×b, Fig. 3A). Given uniform cue arrays, Pc drastically decreases to a low level and SD sharply increases to a plateau, in a step-like manner, when the set size exceeds WM capacity (∼3 in Fig. 2 and ∼6 in Fig. 3B,C for a 9 s delay). However, given random cue arrays in which the minimum distance between items is ≥24° (Zhang and Luck, 2009), Pc continuously decreases and SD continuously increases with the set size until reaching the same plateaus as for the uniform cue arrays (Fig. 3B,C). Of note, because of the attractor dynamics in our model, the memorized items are stored separately in different discrete slots during the retention process (discrete-slot feature), so that SD (Fig. 3C) for random cue arrays resembles the human experimental data from the discrete-slot model. Through global inhibition, the roughly constant memory resource is dynamically allocated to the memorized items (shared-resource feature), so that the relative precision, which is normalized by the maximum Ps over all trials (Materials and Methods), follows a power-law decay as a function of set size, resembling that observed in the shared-resource model (compare Bays and Husain, 2008, their Fig. 3B, with our Fig. 3D).
Surprisingly, although the persistent activity pattern and performance differ considerably between random and uniform cue arrays, the network's normalization is remarkably similar for the two types of cue arrays (Fig. 3E,F). Therefore, the total WM resources are independent of the details of external inputs, but are determined by the E-E connectivity profile (compare Fig. 2E,F with Fig. 3E,F).
Working memory capacity depends on delay duration
Figure 4A shows the same sample spatiotemporal patterns as those in Figure 2A, except for a longer delay of 9 s. Notably, activity bumps may fade out or merge at different times in the delay. This explains why the performance is different for a delay of 1 s versus 9 s (Fig. 2B), when the set size is in an intermediate range. For a small set size, none of the activity bumps will fade out or merge in a prolonged delay, whereas for a large set size, bump fade-out or merging takes place early. In both cases, WM performance is insensitive to the delay duration. On the other hand, for intermediate “critical” set sizes, bump merging and fade-out exhibit slow stochastic dynamics during the delay. Consequently, WM capacity exhibits a dependence on delay duration (Fig. 4B, top). SD also depends on delay duration in a trend mirroring that of correct rate (Fig. 4B, bottom), and the set size at which SD starts to saturate is roughly a linear function of WM capacity (Fig. 4C).
Comparing capacity estimation using different measurements
The behavior of WM performance “near a critical set size” has been examined in a human study as a probe of the forgetting mechanism of WM (Zhang and Luck, 2009). It was assumed that a subject would report a random value when he or she forgets the memorized item, or report a value around the original cue when he or she remembers it (Zhang and Luck, 2008). This discrete-slot model was formulated using a 2-parameter von Mises fit. To compare with the psychophysical data from the discrete-slot model, we performed this fit to the distribution of the response offset, θout − θin (unbinned data), for random cue arrays. With a 1 s delay, we found that the discrete-slot fit of simulated data (Fig. 5A) is comparable with that of Zhang and Luck (2008), their Figure 1c. We compared the quantities Pm and s.d. of the discrete-slot model with Pc and SD of our model using the same data, plotted as functions of set size (Fig. 5B,C) and delay duration (Fig. 5D,E). These two quantification methods display the same trend of performance: Pm and Pc decrease (s.d. and SD increase) as a function of set size and delay duration. Of note, (1) Pc with a high threshold displays a smooth decrease, which resembles Pm (Fig. 5B,D); (2) although SD from our model exhibits a continuous and smooth increase with set size (Bays and Husain, 2008; Bays et al., 2009) and delay duration, its alternative fit, s.d., reaches a plateau when the set size is >4 (capacity), which is consistent with the behavioral data in recent studies using brief delays (Zhang and Luck, 2008; Anderson et al., 2011) and with the prediction by Fukuda et al. (2010). Notably, s.d. is nearly constant across delay durations (Fig. 5E), indicating that the decline in performance with longer delays mostly results from the sudden death of the mnemonic items (fade-out; Zhang and Luck, 2009).
Network mechanism underlying fade-out and merging of activity bumps
We have identified the fade-out and merging as the main dynamic effects limiting WM capacity. Indeed, the probability of a fade-out or a merging bump sharply increases from a low level to a high plateau when the set size increases above the WM capacity, as shown for both wide (Fig. 6A) and narrow (Fig. 6B) network connectivities. The sum of the fade-out and merging probabilities approaches 100% when set size is much larger than WM capacity.
With the same parameters as in Figure 6B and a set size of 6 (capacity), all the activity bumps persist throughout a delay of 9 s (Fig. 7A). However, with a set size of 8 (above capacity), only four activity bumps persist until the end of the delay, whereas two activity bumps fade out (at 67.5° and 337.5°) and two others merge (at 247.5° and 292.5°) into one (Fig. 7B). A persistent activity bump has a bell-shaped spatial distribution of neural activity, whereas the merging activity bump displays a wide plateau in the spatial profile of neural activity, as shown by the population activity for the last 1 s of the delay (Fig. 7A,B, right).
Single neurons inside these three types of bumps display distinct firing activities during the delay (Fig. 7C). First, neurons around the peak of the persistent bump at 22.5° (a in Fig. 7B,C) spike at a high frequency in the cueing stage and at a moderate frequency (∼50 Hz) with small fluctuations during the delay, thereby maintaining the memory trace of the cue direction. Second, neurons in the bump at 67.5° (b in Fig. 7B,C), which eventually fades out, increase their firing rates upon cue presentation and exhibit persistent activity (∼50 Hz) in the early phase of the delay, but abruptly cease firing at ∼3 s into the delay. This sudden disappearance of mnemonic activity reveals an “all-or-none” mechanism for losing the memorized information. Third, for neurons within and between the two bumps at 247.5° and 292.5° (c in Fig. 7B,C), the firing rates are initially quite different. Over time, however, they all converge to a similar activity level of ∼50 Hz late in the delay, when the two bumps eventually merge with each other. Neurons near the centers of the two activity bumps behave similarly to those in a persistent bump, whereas the firing rates of neurons located at the edges of the original two distinct bumps slowly ramp up; neurons at the midpoint between the two distinct bumps are essentially silent in the early phase of the delay period, but display a sharp jump of activity to ∼50 Hz in the late phase of the delay. Therefore, a gradual ramping and a sudden transition from spontaneous activity to a persistent state during the delay may be manifestations of a merging phenomenon that is observable at the single-cell level.
E-E interactions support the persistent activity (Fig. 3A) and might also be a key factor determining whether a bump fades out or merges with another. To test this, we calculated two quantities for each activity bump in Figure 7B, the instantaneous average recurrent excitatory synaptic conductance, G(t), and the instantaneous average firing rate of pyramidal cells, R(t), throughout the delay, and classified the bumps into three groups: persistent bumps (P), fade-out bumps (F), and merging bumps (M) (Fig. 7D,E). In a fade-out bump, G(t) exhibits a sharp decrease at an unpredictable time in the delay, to a small but nonzero level. R(t) decreases to zero, implying that the excitatory drive received by the neurons in the bump is below firing threshold. Note that the sudden drop of G(t) precedes that of R(t), as expected for a fade-out process: the decrease of excitatory currents leads to fewer spikes in a localized activity bump, which in turn results in even weaker recurrent excitation; the cycle continues until the overall excitatory drive becomes too small and the bump fades out. For the bumps that eventually merge, G(t) and R(t) increase during the merging process and reach a high level afterward. The increase of G(t) precedes that of R(t), the opposite of the fade-out process: stronger excitatory currents lead to more spikes in the localized activity bump, which results in even more recurrent excitation; when this positive feedback exceeds a certain level, neurons between the two activity bumps receive enough excitation to switch to a high-activity state (Fig. 7C, right), and the two activity bumps merge with each other. For persistent bumps, G(t) and R(t) fluctuate but remain roughly constant over time. Their values are smaller than those of merging bumps and larger than those of fade-out bumps. Therefore, insufficient excitation leads to fade-out, whereas excessive excitation results in merging.
To better examine the correlation between recurrent excitation and neural activity, we calculated the average firing rates R̄ and the average excitatory synaptic conductance of each activity bump Ḡ. Figure 7F shows R̄ plotted against Ḡ for different activity bumps using a uniform cue array of set size 8. Three groups can be clearly discerned: the values of R̄ and Ḡ for merging bumps are larger than those for persistent bumps, which are larger than those for fade-out activity bumps. Specifically, insufficient local excitation (in nS), Ḡ < 32, leads to fade-out; strong recurrent excitation, 32 < Ḡ < 35, ensures a persistent bump; and excessive recurrent excitation, Ḡ > 35, results in merging of activity bumps.
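The three regimes can be summarized as a simple threshold rule on the average excitatory conductance Ḡ. The sketch below uses the 32 nS and 35 nS boundaries quoted above for this particular network and cue array; the function name and the treatment of values exactly at a boundary (assigned to the persistent class) are assumptions made for illustration.

```python
def classify_bump(g_bar_ns, low=32.0, high=35.0):
    """Classify a bump by its average recurrent excitatory conductance (nS).

    The thresholds are those reported for a uniform cue array of set size 8
    and should not be expected to transfer to other parameter settings.
    """
    if g_bar_ns < low:
        return "fade-out"    # insufficient local excitation
    if g_bar_ns > high:
        return "merging"     # excessive recurrent excitation
    return "persistent"      # intermediate excitation sustains the bump
```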
Working memory capacity estimation using change-detection tasks
In addition to DRTs, the CDT is an alternative experimental scheme widely used to assess WM capacity. However, it is not trivial to relate change-detection performance to that of DRTs, because a CDT includes three stages: (1) a sample stage, for encoding the visual inputs; (2) a retention stage, for working memory; and (3) a retrieval stage, for a decision based on the memory. Any of these stages can independently influence the performance in the detection task. Previous research suggests either that the sample stage acts as a bottleneck, limiting the number of items encoded in memory in a bottom-up manner via attention, and thus the working memory capacity (Zhang and Luck, 2008; Buschman et al., 2011), or that the retrieval process affects the detection accuracy through an inhibitory reciprocal network (Johnson et al., 2009). Nevertheless, little is known about the contribution of the WM retention process per se. A previous behavioral observation (Basile and Hampton, 2011) demonstrates that the psychometric curves from CDTs mimic those from DRTs; one can therefore predict that change-detection performance would decrease with increasing set size (Luck and Vogel, 1997; Vogel et al., 2001; Wilken and Ma, 2004; Basile and Hampton, 2011; Elmore et al., 2011). To test this hypothesis, we performed a change-detection task with different set sizes (Fig. 8).
In this CDT, the set size of the cue array is the same as that of the test array. The network reports whether the test array is the same as (match) or different from (nonmatch) the cue array. In half of the trials, the test arrays are identical to the cue arrays, namely same trials (the amplitude of change is 0°; Fig. 8A, top), whereas in the other half of the trials, one color in the cue array is changed to a color 10° to 90° away from its original value, namely diff trials (nonmatch; Fig. 8A, bottom). The probability of reporting a match is obtained from a downstream match-nonmatch decision neural circuit (Fig. 8B; Engel and Wang, 2011). In simulations, we found that the hit rate decreases and the false-alarm rate increases as a function of set size (Fig. 8C), which is consistent with behavioral observations (Wilken and Ma, 2004, their Fig. 4). Psychometric curves show the probability of responding diff as a function of the amplitude of change, |θin − θtest|, for different set sizes (Fig. 8D). Of note, when the amplitude of change is small, e.g., |θin − θtest| = 10°, the change-detection performance improves as the set size increases, implying that “similarity” could improve the change-detection performance in some parameter regime. When the amplitude is large, e.g., |θin − θtest| > 50°, the psychometric curves saturate, and the performance curve from the CDT mimics that from the DRT (Fig. 8E) (Wilken and Ma, 2004; Basile and Hampton, 2011; see also the comparison between change-detection tasks with different amplitudes of change by Fougnie et al., 2010). Overall, our simulations demonstrate that the change-detection performance decreases as the set size increases, which agrees with the predicted performance in DRTs in Figure 5, B and C.
Similarity effect on working memory performance
Our model exhibits two distinct mechanisms underlying the decrease of memory precision with increasing WM load or delay duration in the DRT: fade-out (complete loss of stored information) and merging (a more quantitative blurring of stored information). However, a misreporting error caused by merging can easily be overlooked in the analysis of a DRT with a minimum distance ≥24° and binned data (Bays et al., 2010). We therefore propose two testable tasks to investigate the effect of merging, or similarity, on WM performance.
First, we investigated the merging process using 2-item cues with different similarity and found that (1) merging can take place given a long delay (∼8 s) even when the items are only weakly similar (100°; Fig. 9A); and (2) merging biases the memory traces toward the convergent side (Fig. 9B), so that the similarity of the items is enhanced (although there are still 2 items). The network could thus confuse one item with another in a CDT using test arrays that are similar to the cue arrays (Fig. 8), e.g., purple to blue (Elmore et al., 2011).
We then conducted a Lin and Luck (2009) CDT (Materials and Methods), using three types of cue arrays, namely far (low-similarity), close (high-similarity), and far+close, and three types of test arrays, namely same, diff1 (nonconvergent side), and diff2 (convergent side; Fig. 9C). In simulations, merging occurs only between the high-similarity items. Consistent with the 2-item-cue result, the memory traces of high-similarity items converge to an intermediate level, while the trace of a low-similarity item drifts around the cue (Fig. 9D). As a result, the distribution of the response offset of the high-similarity items is biased toward the convergent side, implying an increase (decrease) of the distance between the cue and test arrays for diff1 (diff2, respectively) trials, while that of the low-similarity items is centered at zero. One can thus argue that similarity should have a large effect on the diff1 and diff2 tests, but little effect on the same test. To test this, we assessed the probability of choosing same for each trial using a downstream match-nonmatch decision circuit (Fig. 8B). Figure 9E demonstrates that all three types of trials exhibit similar performance in the same test; trials with high similarity show better performance in the diff1 test, which resembles the behavioral observation that similarity improves performance in a Lin and Luck (2009) task, whereas similarity impairs performance in the diff2 test.
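The geometric argument above (a convergent bias enlarges the effective change for diff1 tests and shrinks it for diff2 tests) can be made concrete with a small worked example. The function and its parameters are hypothetical and introduced only for illustration; the bias is treated as a fixed shift of the memory trace toward the neighboring item, and angles are handled on a line (no wrap-around) for simplicity.

```python
def memory_test_distance(cue, neighbor, change, bias, test_type):
    """Distance (degrees) between a biased memory trace and the test color.

    cue       : original color of the tested item
    neighbor  : color of the similar item it tends to merge with
    change    : amplitude of the change applied in the test array
    bias      : assumed shift of the memory trace toward the neighbor
    test_type : 'diff1' (change away from the neighbor, nonconvergent side)
                or 'diff2' (change toward the neighbor, convergent side)
    """
    toward = 1.0 if neighbor > cue else -1.0
    memory = cue + toward * bias            # trace drifts toward the neighbor
    if test_type == "diff1":
        test = cue - toward * change        # test moves away from the neighbor
    elif test_type == "diff2":
        test = cue + toward * change        # test moves toward the neighbor
    else:
        raise ValueError("test_type must be 'diff1' or 'diff2'")
    return abs(test - memory)

# A 20 degree change with a 5 degree convergent bias:
d1 = memory_test_distance(100.0, 130.0, 20.0, 5.0, "diff1")   # 25.0
d2 = memory_test_distance(100.0, 130.0, 20.0, 5.0, "diff2")   # 15.0
```

With zero bias (as for low-similarity items, whose offset distribution is centered at zero) both distances reduce to the nominal change amplitude.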
To conclude, we proposed testable tasks to detect merging during the WM delay, and showed that similarity in cue arrays can either improve (Johnson et al., 2009; Lin and Luck, 2009) or impair (Elmore et al., 2011) detection accuracy, depending mainly on the post-WM comparison process (compare also the performance curves in the DRT and CDT in Fig. 8E; Hollingworth, 2003; Mitroff et al., 2004).
Discussion
In this work, we carried out a systematic study of WM capacity using a spiking network. We found that the model actively maintains multiple objects with an analog feature using concurrent activity bumps and reproduces the salient characteristics of performance in visual WM tasks (Bays and Husain, 2008; Zhang and Luck, 2008, 2009; Anderson et al., 2011). The spatial extent (σ) and the strength (J+) of recurrent synaptic excitation greatly affect WM capacity (Wang et al., 2011), in contrast to, or complementing, previous work showing that the spatial extent of lateral inhibition determines WM capacity (Macoveanu et al., 2006; Edin et al., 2009). We also identified two distinct dynamical effects limiting WM capacity: excessive (respectively insufficient) recurrent excitation leads to merging (respectively fade-out) of the activity bumps.
Reconciling discrete-slot and shared-resource models in a neural circuit model
Two models have been proposed to understand WM capacity based on psychophysical observations: the discrete-slot model, wherein the capacity originates from the number of discrete memory slots (Luck and Vogel, 1997; Zhang and Luck, 2008), and the shared-resource model, wherein the capacity is conceptually limited by a constant memory resource (Wilken and Ma, 2004; Bays and Husain, 2008). For the discrete-slot model, Lisman and Idiart (1995) and Raffone and Wolters (2001) suggested that items are maintained by oscillatory activity across populations. It has been observed that neuronal activity related to items in WM is enhanced at specific phases of the gamma cycle (Siegel et al., 2009). However, the non-overlapping enhanced phases may result from the sequential presentation of the items. Furthermore, if the memorized items were encoded at different non-overlapping phases, interference between them would not be observed in experiments. Our model instead provides a neural mechanism underlying both the discrete-slot and shared-resource models without invoking a phase code.
Behaviorally, our model offers a unifying explanation for seemingly incompatible features of the two contrasting models. In psychological studies, the discrete-slot model predicts a hard limit of WM, where memory resolution decreases as a bilinear function of set size, while the shared-resource model predicts a monotonic decline (Anderson et al., 2011). Our model exhibits the hard limit of WM capacity in a broad parameter region. However, for a network with narrow connectivity, this hard limit can increase to a large number, so that shared-resource-like behavior is observed with random cue arrays. Furthermore, when the set size exceeds WM capacity, some bumps fade out suddenly, rather than through a gradual exponential decay during the WM delay. The fade-out corresponds to the “sudden death” of WM in human experiments, which strongly supports the discrete-slot model (Zhang and Luck, 2009) but is hardly accounted for by the shared-resource model (Huang, 2010). In contrast, with randomly distributed cues, correct rates and WM resolution decrease smoothly, which was taken as strong evidence for the shared-resource model (Bays and Husain, 2008). Finally, we found interference between similar items, which supports only the shared-resource model (Wilken and Ma, 2004; Elmore et al., 2011) and is difficult to incorporate into the discrete-slot model. Of note, in the broad parameter region, these two behavioral features, sudden death and “interference,” can coexist in our model; the probability of sudden death (respectively interference) increases in a network with narrow (respectively wide) connectivity. Therefore, our model provides a hybrid view of WM capacity: the cue items are memorized in different chunks (“activity bumps”); items within the same chunk show shared-resource-like behavior (“merging bumps”), while chunks behave like discrete slots (“fade-out” through global inhibition; Buschman et al., 2011; Machizawa and Driver, 2011).
Besides the behavioral observations, we also found neural evidence for the reconciliation of these two models. First, the overall activity of memory neurons remains nearly constant despite the fade-out and merging of bumps as the WM load increases, in agreement with the neurophysiological observation that the average firing rate of prefrontal cortex neurons in behaving monkeys is roughly the same for different numbers of cues during WM maintenance (Siegel et al., 2009). Second, a limited WM resource is shared by all activity bumps and can be reallocated during the WM delay (Bays and Husain, 2008). When the set size exceeds WM capacity, local excitation within a bump may be insufficient, and some activity bumps fade out, which leads to the reallocation of their memory resources to other bumps; this may result in excessive local excitation of some activity bumps and merging between them. The merging and fade-out phenomena are correlated, as a result of the “overload effect,” and thus a limited number of activity bumps persist separately. Consequently, a continuous recurrent (attractor) neural network endowed with normalization exhibits a rich repertoire of dynamical effects compatible with the discrete-slot and shared-resource models. Furthermore, the normalization of neural activity is a general principle of sensory information processing (Treue et al., 2000; Reynolds and Heeger, 2009). Here we suggest that it is also a desirable property of WM circuits (Buschman et al., 2011).
Role of recurrent excitatory connection in limited WM capacity
Using a neural network with uniform connections onto and from interneurons, we differentially assessed the impact of recurrent excitatory connections on WM capacity. First, increasing J+ enhances local iso-directional excitation, decreases long-range cross-directional excitation, and thus monotonically boosts WM capacity. In contrast, WM capacity is maximized at an intermediate value of σ. A systematic analysis of the parameter space of J+ and σ indicates that WM capacity is constrained between 2 and 7, consistent with human studies (Xu and Chun, 2006). Furthermore, the E-E connections strongly affect the amount of memory resources, as measured by the total width of activity bumps and the mean population firing rate of pyramidal cells. Using randomly versus uniformly distributed cues, we found that the normalization is independent of the configuration of external inputs. Therefore, for a given network connectivity, the total amount of memory resources is roughly fixed, and different external inputs lead to a different dynamical allocation of resources. Previously, Edin et al. (2009) showed that WM capacity is limited by lateral inhibition, and that top-down excitation could rescue a fading activity bump. Our work is complementary, suggesting that recurrent synaptic excitation greatly affects, and perhaps even predominantly controls, the limited capacity of a WM circuit.
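The trade-off between local and long-range excitation as J+ increases can be illustrated with a normalized Gaussian connectivity footprint of the kind commonly used in ring models. The sketch below is an assumption made for illustration: it takes W(Δθ) = J− + (J+ − J−) exp(−Δθ²/2σ²) with the ring average of W fixed at 1, which may differ in detail from the profile defined in Materials and Methods, and the function names are hypothetical.

```python
import math

def gaussian_mean(sigma_deg, n=3600):
    """Ring average of exp(-theta^2 / (2 sigma^2)) for theta in [-180, 180)."""
    step = 360.0 / n
    total = sum(
        math.exp(-((-180.0 + k * step) ** 2) / (2.0 * sigma_deg ** 2))
        for k in range(n)
    )
    return total / n

def j_minus(j_plus, sigma_deg):
    """Baseline J- implied by the assumed normalization <W> = 1 over the ring.

    Solves 1 = J- + (J+ - J-) * g, where g is the ring average of the Gaussian.
    """
    g = gaussian_mean(sigma_deg)
    return (1.0 - j_plus * g) / (1.0 - g)

def weight(delta_deg, j_plus, j_minus_value, sigma_deg):
    """Assumed E-E weight between cells whose preferred cues differ by delta_deg."""
    d = abs(delta_deg) % 360.0
    d = min(d, 360.0 - d)                   # circular distance on the ring
    gauss = math.exp(-d ** 2 / (2.0 * sigma_deg ** 2))
    return j_minus_value + (j_plus - j_minus_value) * gauss

# Raising J+ at fixed sigma lowers J-, i.e., weakens cross-directional excitation.
weak_far = j_minus(2.0, 10.0)
weaker_far = j_minus(4.0, 10.0)
```

Under this assumed form, the normalization is what couples the two effects: stronger iso-directional excitation necessarily comes with weaker cross-directional excitation.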
Similarity effect on change-detection tasks
With random cue arrays and a set size below capacity, the network in our model can show confusable memory slots, which result from the merging of neural subpopulations storing different items. Merging skews the response offset distribution toward the convergent side, so that high-similarity objects are maintained with poor precision. Consequently, merging lowers Pc as the set size increases in CDTs (Luck and Vogel, 1997; Wilken and Ma, 2004). However, for a given set size, when cue arrays differ in similarity, we found the counterintuitive phenomenon that similarity improves performance when the test is placed on the nonconvergent side of merging (Lin and Luck, 2009). Johnson et al. (2009) provided a population firing-rate model leading to the same prediction, which had a specialized and tuned network scheme for the retrieval process; they applied the model to behavioral experiments in which the discriminability index d′ is larger with similar stimuli than with disparate ones. A comparison between the two models is worthwhile. First, the prediction of their model originates from the proposed mechanism of the match-nonmatch decision circuit, rather than from WM retention per se. Second, fade-out is the exclusive mechanism limiting WM capacity in their model. When fade-out occurs, their model responds nonmatch, whereas a more reasonable alternative is to respond randomly (since no memory trace is available to guide the response). Furthermore, a biologically based spiking network such as ours (rather than an abstract population rate model) is required to elucidate the detailed circuit dynamics underlying the limited memory capacity during a retention delay.
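For reference, the discriminability index d′ mentioned above is conventionally computed from the hit and false-alarm rates as d′ = z(H) − z(FA), where z is the inverse of the standard normal cumulative distribution. The following minimal sketch uses this standard equal-variance signal-detection definition; it is not taken from the cited studies, and the function name is an assumption.

```python
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate):
    """Equal-variance signal-detection d' = z(H) - z(FA).

    Rates must lie strictly between 0 and 1; inv_cdf raises an error at 0 or 1,
    so extreme rates need a correction chosen by the analyst beforehand.
    """
    z = NormalDist().inv_cdf        # inverse standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)

# Equal hit and false-alarm rates carry no sensitivity.
no_sensitivity = d_prime(0.5, 0.5)
```

A decreasing hit rate together with an increasing false-alarm rate, as found with increasing set size in Fig. 8C, corresponds to a decreasing d′ under this definition.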
To conclude, this study focused on the delay-period dynamics of a WM circuit, which limit storage capacity for a single feature; the model can potentially be extended to a multi-feature version and used to study resource allocation over different features (Fougnie et al., 2010). Although we did not explicitly investigate the influence of the delay duration on CDTs, from the DRT results we predict that performance would decay as the delay duration increases (Magnussen et al., 1996; Magnussen, 2000), e.g., in a sudden-death manner (Regan, 1985; Bennett and Cortese, 1996). Other factors may also contribute to determining WM capacity, such as the role of selective attention during the encoding of stimulus items, i.e., a bottleneck effect (Awh and Jonides, 2001), interactions across the distributed networks involved in WM (Pessoa et al., 2002), overlaps of neural representations of different items (Warden and Miller, 2007), or synchronous oscillations (Siegel et al., 2009). Regardless, this work reveals and highlights a rich repertoire of dynamical behaviors that unfold in time and underlie the limited capacity of a WM circuit. It shows that a shared-resource mechanism, using population coding in a continuous network, can nevertheless capture behavioral characteristics predicted by the discrete-slot model. Our work therefore contributes to resolving a major debate in the field, and offers new insights into the neurodynamical mechanism of WM capacity.
Footnotes
This work was supported by NSFC-60974075, 91132702, and the Fundamental Research Funds for the Central Universities (D.-H.W.), NIH-MH062349 and the Kavli Foundation (X.-J.W.). We thank Albert Compte and Wei Ji Ma for discussions, and Moran Furman for help with the model program code. Simulations were carried out at Beijing Normal University.
The authors declare no competing financial interests.
Correspondence should be addressed to either of the following: Xiao-Jing Wang, Department of Neurobiology and Kavli Institute for Neuroscience, Yale University School of Medicine, 333 Cedar Street, New Haven, CT 06510, xjwang{at}yale.edu; or Da-Hui Wang, Department of Systems Science and National Key Laboratory of Cognitive Neuroscience and Learning, Beijing Normal University, Beijing 100875, China, wangdh{at}bnu.edu.cn