Abstract
Visual working memory is an online workspace for temporarily representing visual information from the environment. The two most prevalent empirical characteristics of working memory are that it is supported by sustained neural activity over a delay period and it has a severely limited capacity for representing multiple items simultaneously. Traditionally, such delay activity and capacity limits have been considered to be exclusive for maintaining information about objects that are no longer visible to the observers. Here, by contrast, we provide both neurophysiological and psychophysical evidence that the sustained neural activity and capacity limits for items that are continuously visible to the human observer are indistinguishable from those measured for items that are no longer visible. This holds true even when the observers know that the objects will not disappear from the visual field. These results demonstrate that our explicit representation of objects that are still “in view” is far more limited than previously assumed.
Introduction
Visual working memory (WM) is considered to be an online workspace for temporarily representing task-relevant visual information that is no longer visibly present so that it may be manipulated or acted upon at a later time (Baddeley and Hitch, 1974; Cowan, 2001). Within the brain, evidence for this system has come from sustained neural activity that is measured once the memoranda are removed and generally persists until the observer must make a report (Pasternak and Greenlee, 2005; Jonides et al., 2008). This so-called “delay activity” has been observed in numerous cortical areas across a wide variety of techniques, including single-unit electrophysiology in monkeys (Miyashita and Chang, 1988; Funahashi et al., 1989; Fuster, 1990; Miller et al., 1993; Chafee and Goldman-Rakic, 1998; Constantinidis et al., 2001; Buschman et al., 2011) as well as electroencephalography (Klaver et al., 1999; Vogel and Machizawa, 2004), magnetoencephalography (Robitaille et al., 2010), and functional magnetic resonance imaging in humans (Courtney et al., 1997; Todd and Marois, 2004). While there is broad agreement in the literature that the neural delay activity is likely the neurophysiological implementation of visual WM, a fundamental attribute of this activity has remained untested. Specifically, does delay activity necessitate the removal of the memoranda from view, or does this neural phenomenon occur even when the relevant information remains continuously visible to the observer? At a broader level, this question has fundamental implications for the requisite conditions for visual WM itself; that is, is visual WM exclusively engaged once task-relevant information disappears, or is it continuously in operation for representing both visibly present and absent information?
In addition to delay activity measured at the neural level, visual WM has notable characteristics at the behavioral level. The most striking of these characteristics is that the capacity of this system is highly limited (Pashler, 1988; Luck and Vogel, 1997). While there is ongoing debate regarding whether this capacity is best described as a maximal number of items (Awh et al., 2007; Zhang and Luck, 2008) or as a fluid amount of resources (Wilken and Ma, 2004; Bays and Husain, 2008), all current models agree that behavioral performance is only highly accurate for a small amount of information at one time, often as few as three simple items (Fukuda et al., 2010). Note, though, that these limits have generally been observed in task situations in which the items are removed from view after an initial encoding period, and it is unclear whether similar capacity limits would be observed for memoranda that remain within view. On one hand, performance limits observed in visual WM tasks may be the consequence of a capacity-limited “storage” mechanism that becomes engaged once task-relevant information is removed. On the other hand, the limited performance may instead be due to a limited-capacity “representational” mechanism that is engaged regardless of whether the items are still in view or not. In the current study, we examined these questions by testing whether the same neural and behavioral indices of visual WM would be observed for displays of items that either remained continuously in view or disappeared across a delay.
Materials and Methods
Overview
We conducted three experiments. In the first experiment, we recorded event-related potentials (ERPs) from human observers while they performed a common visual WM task (Vogel and Machizawa, 2004; McCollough et al., 2007; Drew and Vogel, 2008). Experiments 2 and 3 served as follow-up behavioral experiments to test potential alternative hypotheses for the main results in Experiment 1.
Participants
A unique set of neurologically normal college students participated in each experiment [n = 25 (12 males), 27 (13 males), and 32 (18 males) for Experiments 1, 2, and 3, respectively]. All participants gave informed consent according to the procedures of a protocol approved by the Human Subjects Committee at the University of Oregon.
Experiment 1
Behavioral procedure.
As in Figure 1a, at the beginning of each trial, a central arrow cue (200 ms) instructed the participants to covertly attend to the items in either the left or the right hemifield. In the absent condition, after the offset of the arrow cue (500 ms), the sample array was presented for 100 ms, which was followed by a blank retention interval of 900 ms. In the present condition, the sample array was presented for 1000 ms, which was directly followed by the test display without any blank interval. The test display remained on the screen until the participant responded.
All stimulus arrays were presented within two 6.2° × 11.3° rectangular regions that were centered 5.1° to the left and right of a central fixation cross on a gray background (8.2 cd/m²). Each sample array consisted of two, four, or six colored squares (1° × 1°) in each hemifield. The color of each square was selected randomly without replacement from a set of nine highly discriminable colors (red, pink, brown, blue, cyan, violet, green, yellow, and white). Stimulus positions were randomized on each trial, with the constraint that the distance between squares within a hemifield was at least 2.6° (center to center). The test cue was presented at one of the sample array positions. Each test cue consisted of two colored rectangles, each half the width of a sample square. One was the same color as the sample square at that position, and the other was a new color that was not presented in the sample array. Participants were asked to indicate the color of the sample stimulus at that location by pressing one of the two buttons on a game pad controller. The responses were unspeeded, with strong emphasis on accuracy.
Each participant performed six conditions of 180 trials per condition [two (absent or present) by three (two, four, or six in sample array size)]. The absent and present conditions were delivered in separate blocks in random order across participants, which assured that the participants knew in advance how long the sample array would be visibly present for each trial. Sample array size was randomly selected within the block.
Computing visual capacity.
We computed each individual's visual capacity with a standard formula (Pashler, 1988; Cowan, 2001) that essentially assumes that if a participant can access and hold K items from an array of S items, then they can indicate which color was presented at the cued location on K/S of the trials. To correct for guessing, this procedure also takes into account chance-level performance. The formula is K = S(P − 50)/50, where K is the visual capacity, S is the size of the array, and P is the percentage correct.
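As an illustration of the formula above, the following is a minimal Python sketch rather than the analysis code used in the study. One way to read the guessing correction is that the probed item is known on K/S of the trials and guessed at 50% on the rest, so P/100 = K/S + (1 − K/S)/2, which rearranges to K = S(P − 50)/50. The function name `capacity_k` and the example accuracies are hypothetical and are not data from the experiments.

```python
def capacity_k(set_size, percent_correct):
    """Capacity estimate K = S(P - 50)/50 for a two-alternative test.

    set_size        -- S, number of items in the cued hemifield
    percent_correct -- P, percentage of correct responses (chance = 50)
    """
    return set_size * (percent_correct - 50.0) / 50.0


# Hypothetical accuracies chosen for illustration only.
for s, p in [(2, 100.0), (4, 87.5), (6, 75.0)]:
    print(f"S = {s}, P = {p}%  ->  K = {capacity_k(s, p)}")
```

In this made-up example, accuracy falls as the array grows from four to six items while K stays at 3, which is the kind of plateau the capacity estimate is designed to reveal.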
Electrophysiological recordings and analysis.
ERPs were recorded using our standard recording and analysis procedures (Vogel and Machizawa, 2004; McCollough et al., 2007; Drew and Vogel, 2008), including rejection of trials contaminated by blinks or large (>1°) eye movements. We recorded from 22 standard electrode sites spanning the scalp, including international 10/20 sites F3, F4, C3, C4, P3, P4, O1, O2, PO3, PO4, T5, and T6, as well as nonstandard sites occipital left (OL; midway between O1 and T5) and occipital right (OR; midway between O2 and T6). The horizontal electrooculogram (EOG) was recorded from electrodes placed 1 cm to the left and right of the external canthi to measure horizontal eye movements, and the vertical EOG was recorded from an electrode beneath the right eye referenced to the left mastoid to detect blinks and vertical eye movements. Trials containing ocular artifacts, movement artifacts, or amplifier saturation were excluded from the averaged ERP waveforms. The average proportion of rejected trials was 5.5% (SD, 4.6) across the participants. The electroencephalogram and EOG were amplified by an SA Instrumentation amplifier with a bandpass of 0.01–80 Hz (half-power cutoff, Butterworth filters) and were digitized at 250 Hz by a PC-compatible microcomputer.
We computed contralateral waveforms by averaging the activity recorded at right hemisphere electrode sites when participants were cued to the left side of the sample array and vice versa. To rule out the possibility that the number of accepted trials from left and right hemisphere electrodes was unbalanced, we performed a three-way ANOVA (arrow cue by sample array set size by presentation condition) on rejection rate. The result showed no main effect of arrow cue condition (F(1,24) = 1.02, p > 0.32) nor an interaction among cue, sample array size, and presentation condition (F(2,48) = 1.97, p > 0.15), suggesting that our manipulations did not induce any imbalance in the number of accepted trials from left and right hemisphere electrodes used to construct the contralateral waveforms. The contralateral delay activity (CDA) was measured at posterior parietal, lateral occipital, and posterior temporal electrode sites (P3/P4, PO3/PO4, T5/T6, OL/OR, and O1/O2) as the difference in mean amplitude between the ipsilateral and contralateral waveforms, with a measurement window of 300–900 ms after the onset of the sample array. Mean amplitudes were compared across conditions by analysis of variance.
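The CDA measurement described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the recording or analysis pipeline used in the study: the function names, the synthetic waveforms, the epoch running from −200 to 996 ms, the inclusive window boundaries, and the contralateral-minus-ipsilateral sign convention (so that a negative-going contralateral voltage yields a negative CDA) are all assumptions made for the example. Only the 300–900 ms window and the 250 Hz sampling rate come from the text.

```python
def mean_amplitude(waveform, times_ms, start_ms=300, end_ms=900):
    """Mean voltage over samples whose time falls in [start_ms, end_ms]."""
    window = [v for v, t in zip(waveform, times_ms) if start_ms <= t <= end_ms]
    if not window:
        raise ValueError("no samples fall inside the measurement window")
    return sum(window) / len(window)


def cda_amplitude(contra, ipsi, times_ms, start_ms=300, end_ms=900):
    """Contralateral-minus-ipsilateral mean amplitude in the window.

    The sign convention (contra minus ipsi) is an assumption of this sketch.
    """
    return (mean_amplitude(contra, times_ms, start_ms, end_ms)
            - mean_amplitude(ipsi, times_ms, start_ms, end_ms))


# Synthetic averaged waveforms: 250 Hz sampling gives one sample every 4 ms.
times_ms = list(range(-200, 1000, 4))
ipsi = [0.0 for _ in times_ms]
# Hypothetical contralateral waveform: -1.5 microvolts from 300 ms onward.
contra = [-1.5 if t >= 300 else 0.0 for t in times_ms]

print(cda_amplitude(contra, ipsi, times_ms))
```

In practice the contralateral and ipsilateral waveforms would first be averaged over the accepted trials and the electrode pairs listed above, so the two input lists here stand in for those averages.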
Differences in scalp topography were tested by normalizing the amplitude data for each electrode pair (F3/F4, C3/C4, P3/P4, PO3/PO4, T5/T6, OL/OR, O1/O2) and testing for the interaction between electrode position and presentation condition (absent and present) by a two-way ANOVA, following the procedure described by McCarthy and Wood (1985). To examine the effect of time course, the CDA amplitudes were first binned into 100 ms time windows (nine bins from 100–900 ms) and then subjected to a three-way ANOVA with time course (nine bins) by presentation condition (absent or present) by set size (set sizes two, four, and six).
Experiment 2: behavioral procedure
The method of Experiment 2 was identical to that of Experiment 1 except as noted below. The sample array was presented for 1, 2, or 5 s. In the absent condition, the sample array was followed by a blank retention interval of 1 s. In the present condition, the sample array was directly followed by the test display without any blank retention interval. At the test display, the participants reported the sample color at the test location in the cued hemifield by pressing a button. The number of sample stimuli in the array was fixed at six. The participants performed six conditions of 60 trials per condition [two (absent or present) by three (1, 2, or 5 s for sample array presentation)]. All six conditions were delivered in separate blocks.
To prevent a verbal encoding strategy, we presented two digits before the sample array presentation and asked the participants to subvocally rehearse them throughout the trial (verbal suppression method) (Vogel et al., 2001). At the end of randomly selected trials (about every 20 trials on average), the participants were asked to write down on a sheet of paper the two digits that they had rehearsed. Data from the 24 participants who performed the subvocal task 100% correctly were used for further analysis. The remaining three participants were excluded because they did not meet this criterion.
Experiment 3: behavioral procedure
As in Figure 4a–d, at the beginning of each trial, a central arrow cue (200 ms) instructed the participants to covertly attend to the items in either the left or the right hemifield. Then, the sample array was presented for 1 s, which was directly followed by the test display. At the test display, the participants reported the sample color at the test location in the cued hemifield by pressing a button. Each sample square subtended 1.6° × 1.6°. The number of sample squares in the array was fixed at four.
There were four conditions with regard to the type of test cue. The first condition was exactly the same as the present condition in Experiment 1. In this basic condition (see Fig. 4a), each test cue consisted of two colored rectangles, each half the width of a sample square. One of the two was the same color as the sample square at the test position, and the other was a new color that was not presented in the sample array. In the other three conditions (see Fig. 4b–d), all the sample stimuli stayed on the screen at the test display. If nothing happened at the test display except for the onset of the new color cue, such a salient transient would exogenously attract attention to that location, resulting in easy detection of the new color rectangle (Phillips and Singer, 1974; Posner, 1980; Stelmach et al., 1984; Rensink et al., 1997). To avoid this, we presented an additional transient in three ways. (1) Four small dots, each subtending 0.2° by 0.2°, were presented 0.2° away from each corner of the test cue (see Fig. 4b). (2) The test cue consisted of four colored stripes (see Fig. 4c). Two colors alternated side by side, and the participants were asked to use only the two peripheral alternatives in reporting the color of the sample stimulus. (3) The test cue subtended 1.1° by 1.1°, which was 0.5° smaller than the original sample stimulus (see Fig. 4d).
Each participant performed four conditions of 60 trials per condition (basic, four-dot, stripe, or small cue). These four conditions were delivered in separate blocks in random order across participants.
Results
Experiment 1: behavioral and electrophysiological experiment
In the first experiment, we compared the visual capacity across two conditions. On each trial, a bilateral sample array of two, four, or six colored squares was presented, and the participants were asked to covertly attend to the colors in the hemifield that was indicated with an arrow cue (Fig. 1a). In the “absent” condition, the sample array was presented for 100 ms, which was followed by a blank retention interval of 900 ms and a test cue. In the “present” condition, the sample array was continuously present for 1000 ms, and the test display immediately followed it without a blank interval. The test display consisted of two conjoined colored rectangles (forming a square) drawn at the position of one of the sample items. The participants indicated the color of the sample stimulus at that location by pressing one of two buttons. The absent and present conditions were performed in separate blocks, which assured that the participants knew in advance how long the sample array would be visibly present on each trial.
Behavioral results
Performance was assessed for each condition using a common formula for estimating capacity (K; Fig. 1b) (Pashler, 1988; Cowan, 2001). In the absent condition, capacity estimates increased from two-item arrays to four-item arrays, yielding a main effect of sample array size by a one-way ANOVA (F(2,48) = 8.11, p < 0.001; Tukey's HSD, p = 0.001), but there was no further increase from four to six items (Tukey's HSD, p = 0.86). Surprisingly, the same pattern emerged when the items remained present on the screen, with capacity increasing from two-item arrays to four-item arrays (a main effect of sample array size by a one-way ANOVA, F(1,24) = 12.12, p < 0.001; Tukey's HSD, p < 0.001) and showing no increase from four to six items (Tukey's HSD, p = 0.33). Importantly, we found no significant interaction between presentation condition (present vs absent) and set size (F(2,48) = 1.39, p > 0.25), indicating that the same performance limits were observed regardless of whether the items remained visible or not. Furthermore, we found that individual differences in performance for the two conditions were strongly correlated (r values >0.80, t(23) values >6.39, p values <0.001; Fig. 1c), suggesting that the average capacity results observed at the group level were determined by the same underlying limits at the individual level. Moreover, the regression slopes were near 1 with intercepts that were close to 0, which together indicate that performance in the two conditions was indistinguishable.
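The t(23) values reported alongside the correlation coefficients are consistent with the standard t test for a Pearson correlation, t = r√(n − 2)/√(1 − r²) with n − 2 degrees of freedom. The sketch below assumes that this is the test that was used, which the text does not state explicitly; the function name `t_from_r` is hypothetical. With n = 25 participants, r = 0.80 gives t(23) ≈ 6.39, matching the reported lower bound.

```python
import math


def t_from_r(r, n):
    """t statistic (df = n - 2) for a Pearson correlation r over n pairs."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


# Lower bound reported for Experiment 1: r = 0.80, n = 25 participants.
print(round(t_from_r(0.80, 25), 2))
```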
Electrophysiology
While the behavioral results suggested that present and absent stimuli produce the same capacity limits, we also sought to determine whether performance was supported by the same neural mechanisms. We used a lateralized electrophysiological marker of working memory capacity referred to as the CDA (Vogel and Machizawa, 2004; Drew et al., 2011; Reinhart et al., 2012), recorded while subjects performed the task. In both the present and absent conditions, we found a sustained negative-going voltage at the parieto-occipital electrodes over the hemisphere that was contralateral to the visual field containing the to-be-observed items (Fig. 2a,b). The CDA amplitude was highly sensitive to the number of items in the sample array (Fig. 2c,d). It increased from two items to four items (a main effect of sample array size by a two-way ANOVA, F(2,48) = 46.08, p < 0.001; Tukey's HSD, p < 0.001), with no increase in amplitude from four items to six items (Tukey's HSD, p = 0.96). This bilinear function with an inflection at four items is a hallmark of capacity being reached and was equivalent for both the present and absent conditions, yielding no significant main effect of presentation condition (F(1,24) = 2.48, p > 0.12) nor interaction between array size and presentation condition (F(2,48) = 0.50, p > 0.60). Likewise, the scalp topography and time course of the CDA were indistinguishable between the two conditions. With respect to scalp topography, we observed no significant interaction between electrode position and presentation condition (F(1,24) = 1.39, p > 0.21). The time course of the CDA also showed no discernible differences between visibility conditions, with no main effect of presentation condition (F(1,24) = 0.11, p > 0.74) nor interaction between presentation condition, array size, and time course bin (nine bins from 100 to 900 ms; F(16,384) = 1.27, p > 0.22) by a three-way ANOVA (see Materials and Methods).
Importantly, the observation of a normal time course of the CDA in the present condition demonstrates that this delay activity becomes engaged at the same time following the sample onset, regardless of whether the array remained visible or not. This particular finding ruled out an alternative explanation of the behavioral results, namely that subjects may have simply waited to store the items in memory until just before the presentation of the test. If this were the case, we would have expected either no CDA or one that was delayed until just before the test.
The CDA amplitude was tightly correlated between the present and absent conditions (r values >0.79; t(23) values were >6.17; p values were <0.001), yielding nearly identical voltage values within each participant (Fig. 2e). We also found that the known relationship between the CDA and performance was still observed and was nearly identical for both the present and absent conditions (r values were >0.64, t(23) values were >3.99, p values <0.001; Fig. 2f). Together, these results suggest that the same neural signatures of visual WM are observed for both the visibly present and absent memory items.
To test for hemispheric differences, we compared the ERP responses to the sample array by a four-way ANOVA with arrow cue (cued left and right) by presentation condition (absent and present) by sample array size (two, four, and six) by channels (ipsilateral and contralateral channels; F3/F4, C3/C4, P3/P4, PO3/PO4, T5/T6, OL/OR, O1/O2). The result yielded no significant main effect of arrow cue (F(1,24) = 2.04, p > 0.16) nor an interaction between presentation condition, sample array size, arrow cue, and channels (F(26,624) = 0.92, p > 0.57), which suggested that there was no systematic hemispheric effect. The left and right hemispheres served as a memory (contralateral) or nonmemory (ipsilateral) hemisphere in a very similar way.
Experiment 2: insufficient sample duration?
It is possible that the lack of a behavioral advantage in the present condition was due to the items being visibly available for only 1 s, which may not have been long enough to process a larger amount of information from the display than in the absent condition. We tested this alternative by manipulating the duration of the sample array, with exposures of up to 5 s. In the “absent” condition, the sample array was followed by a blank retention interval of 1 s and a test cue. In the “present” condition, the sample array was immediately followed by the test cue without a blank interval.
Here, we again found identical performance regardless of visibility condition across the increasing sample presentation durations (Fig. 3). Although extending the sample array duration from 1 or 2 to 5 s significantly increased estimated capacity (main effect of sample array duration by ANOVA, F(2,46) = 3.83, p = 0.02; Tukey's HSD, p = 0.03 between 1 and 5 s, p = 0.09 between 2 and 5 s), the increment was slight: by a linear regression analysis, the slope was 0.10 items per second in the absent condition and 0.09 items per second in the present condition. We interpret the slight increase in performance with increasing duration as possibly reflecting contributions of long-term memory encoding.
To test the consistency of the measured capacity between individual observers, correlation analysis of capacity was performed between the sample presentation durations (1, 2, and 5 s) and between the presentation conditions (absent and present). We found strong correlations between all of the conditions (Pearson's correlation coefficients ranged from 0.56 to 0.81; all t(22) values >3.19, p values <0.005), which ruled out the possibility that the similar capacities for both visibility conditions were due to mere coincidence.
Experiment 3: offset of unprobed items?
In the present condition of the first two experiments, immediately before the presentation of the test probe, each of the unprobed items disappeared from the screen. This highly salient disappearance may have been disruptive to the ongoing representation of the display, leading subjects to adopt a strategy of selecting a small number of items to memorize to protect them from the potentially disruptive offset. To minimize such an effect, we left all of the unprobed items on the screen until the participants completed the response to the probed item.
In addition to a condition in which the unprobed items disappeared (Fig. 4a), we introduced three probe conditions (Fig. 4b–d) in which all the sample stimuli stayed on the screen at the test display. It is well known that if a new color cue onsets alone during the test display, such a salient transient would exogenously attract attention to that location, leading to easy detection of the changed color (Phillips and Singer, 1974; Posner, 1980; Stelmach et al., 1984; Rensink et al., 1997). To avoid this transient confound, we presented an additional transient in three different ways to indicate the item to be reported: (1) Four small dots were presented slightly apart from each corner of the test cue (Fig. 4b). (2) Two colors were alternated side by side in a stripe pattern. The participants were asked to use only the two peripheral alternatives in reporting the color of the sample stimulus (Fig. 4c). (3) The test cue was smaller than the original sample stimulus (Fig. 4d).
The measured performance again showed a consistent limitation across the four probe conditions (Fig. 4e), with no main effect of cue type by a one-way ANOVA (F(3,93) = 0.37, p = 0.77). As in the previous experiments, significant correlations were also found between all of the conditions (Pearson's correlation coefficients ranged from 0.52 to 0.72; all t(30) values >3.33; p values <0.005), which ruled out the possibility that the similar capacity was due to mere coincidence.
To test for hemispheric differences, we compared the behavioral capacity estimates across left and right hemifields (cued left and cued right conditions) in our experiments. The result showed highly consistent capacity estimates across left and right visual hemifields, yielding no main effect of arrow cue (left and right; F values <1.43, p values >0.24) in any of the experiments.
Discussion
Our study provides novel evidence that the behavioral capacity limits and the sustained neural activity underlying them are indistinguishable for no-longer-visible and continuously visible items. The same capacity limitation was consistently observed even when stimuli were continuously visible to the observers for displays presented for up to 5 s. We replicated this general finding while controlling for both encoding duration and the test probes. Furthermore, our observation of a normal time course of the CDA for the present condition demonstrates that the neural resources necessary for visual WM are deployed at the same time and in the same fashion as in the absent condition. The no-longer-visible and continuously visible conditions were performed in separate blocks, which assured that the participants knew in advance that there would be no blank delay period in the continuously visible condition. Thus, the observed capacity limit is not likely due to a mnemonic strategy that might be involved in previous change detection tasks, in which the sample stimuli are usually expected to disappear after several hundred milliseconds (Luck and Vogel, 1997; Buschman et al., 2011). Together, the present findings are sharply inconsistent with the historical assumption that the delay activity and associated behavioral limitations of visual WM are exclusively observed when an observer must store sensory inputs that no longer exist. Instead, they argue that the neural and behavioral correlates of visual WM reflect the same limit on representing task-relevant information regardless of its physical presence.
Historically, visual cognition has been subdivided into two coarse stages: perception, which represents spatial and identity information for items that are still within view, and visual memory, by which observers can continue to represent relevant visual information that is no longer visible. Experimentally, tasks that examine “perception” and “memory” have generally differed in terms of whether the information of interest is still visibly available to the observer. The logic is that, by making the information no longer available, one can be more confident that the resulting neural activation and performance are primarily determined by memory rather than by the perceptual activity driven by the physical presence of the stimulus itself. Using this simple methodological distinction, many neuroscientists have separately characterized the neural and psychophysical properties of these two systems. While this general methodological distinction has been productive, our current results argue that this core assumption regarding the physical availability of the stimulus is not a valid means of separating these two constructs. Instead, these results suggest a strong degree of overlap between the traditionally defined constructs of perception and memory, at least in terms of the two most notable attributes thought to be exclusive to visual WM: sustained neural delay activity (Curtis and D'Esposito, 2003; Pasternak and Greenlee, 2005; Jonides et al., 2008) and behavioral capacity limits (Phillips, 1974; Pashler, 1988; Luck and Vogel, 1997; Vogel et al., 2001). This could be interpreted either as evidence that visual WM is engaged for stimuli that are still in view or as evidence that our conscious representation of perceptual information is highly limited. In either scenario, the limiting factor appears to be the ability to simultaneously represent multiple pieces of information, a finding that is consistent with those of Buschman et al. (2011), who demonstrated that capacity limits are determined by an encoding bottleneck in parietal cortex.
Compatible findings of representational limits and delay activity for visible items have been reported in paradigms such as change blindness (Simons et al., 2000), visual search (Rensink, 2000; Emrich et al., 2009), and multiple object tracking (Pylyshyn and Storm, 1988; Oksama and Hyönä, 2004). In each of these paradigms, the observer must actively represent information about multiple items from a display that is continuously in view. However, in each of these other examples, in addition to representing items from the display, the observer must engage in some form of operation among these representations, such as filtering of distractors in visual search tasks and spatial updating of target positions in multiple object tracking tasks. Such operations on their own may have served to limit performance or necessitate delay activity. Alternatively, these tasks might have implicitly required a visual memory component. For instance, a visual search task requires participants to remember previously selected search items to avoid reexamining the same items while finding a target as quickly as possible (Gilchrist and Harvey, 2000; Husain et al., 2001; Peterson et al., 2001, 2007; Boot et al., 2004). Likewise, in multiple object tracking, the observer has to remember the previous positions of the target objects to correctly update their current positions (Oksama and Hyönä, 2004, 2008; Alvarez and Cavanagh, 2005; Drew and Vogel, 2008). Thus, even though all the stimuli are continuously visible throughout a trial in these paradigms, there are compelling reasons for a participant to rely on visual working memory. Consequently, the similar capacity limits observed for perceptually visible items and no-longer-visible (working memory) items in those paradigms could have been the result of the implicit memory requirement.
By contrast, the continuously visible condition in the present study did not require participants to remember items in any traditional sense because they simply had to maintain static information about the items within view with no additional cognitive processes. Therefore, the present result directly shows that the limited performance is purely due to the capacity limit in representing items that are still within view. Together, the delay activity and capacity limits for representing multiple visual items are not exclusive to maintenance of information that is no longer there, but also limit our conscious representations of items that are still within view.
Footnotes
This work was supported by a grant from the Japan Society for the Promotion of Science (H.T.) and a grant from the National Institute of Mental Health (E.K.V.). We thank Dr. Hirohito M. Kondo for helping with data collection in a pilot experiment.
- Correspondence should be addressed to Hiroyuki Tsubomi, Faculty of Humanities, University of Toyama, Gofuku 3190, Toyama-shi, Toyama 930-8555, Japan. htsubomi{at}hmt.u-toyama.ac.jp