Abstract
Human cognition is characterized by severe capacity limits: we can accurately track, enumerate, or hold in mind only a small number of items at a time. It remains debated whether capacity limitations across tasks are determined by a common system. Here we measure brain activation of adult subjects performing either a visual short-term memory (vSTM) task consisting of holding in mind precise information about the orientation and position of a variable number of items, or an enumeration task consisting of assessing the number of items in those sets. We show that task-specific capacity limits (three to four items in enumeration and two to three in vSTM) are neurally reflected in the activity of the posterior parietal cortex (PPC): an identical set of voxels in this region, commonly activated during the two tasks, changed its overall response profile reflecting task-specific capacity limitations. These results, replicated in a second experiment, were further supported by multivariate pattern analysis in which we could decode the number of items presented over a larger range during enumeration than during vSTM. Finally, we simulated our results with a computational model of PPC using a saliency map architecture in which the level of mutual inhibition between nodes gives rise to capacity limitations and reflects the task-dependent precision with which objects need to be encoded (high precision for vSTM, lower precision for enumeration). Together, our work supports the existence of a common, flexible system underlying capacity limits across tasks in PPC that may take the form of a saliency map.
Introduction
Visual cognition is characterized by high flexibility but also capacity limits. Although the visual system can adapt its representational accuracy, the number of items concurrently processed is limited: in tasks as different as rapid object enumeration or visual short-term memory (vSTM), subjects can only process three or four items at a time. These capacity limits may reflect a general mechanism of object individuation (Piazza et al., 2011; Wutz and Melcher, 2013) that is commonly accessed in many different attentional tasks and that, as we have suggested, may take the form of a saliency (or priority) map (Bisley and Goldberg, 2003). Saliency maps topographically represent the conspicuity (or “saliency”) of items at every location. Map-like architectures for spatial attention have been observed previously in the monkey lateral intraparietal (LIP) area (Bisley and Goldberg, 2003) and the putative human homolog posterior parietal cortex (PPC; Connolly et al., 2002). Critically, PPC has been implicated in studies of capacity limits in both enumeration (Piazza et al., 2002) and vSTM (Todd and Marois, 2004), as well as in visuospatial attention tasks in general, suggesting a shared neural substrate for capacity limits across tasks (Colby and Goldberg, 1999). Evidence for the hypothesis of shared neural systems across tasks remains scarce because of a lack of studies investigating more than one task at a time (but see Silk et al., 2010). Here we directly test the hypothesis that a map architecture in human PPC (Gottlieb, 2007; Bays et al., 2010; Melcher and Piazza, 2011; Franconeri et al., 2013) represents individual items with a flexible degree of precision (e.g., modulated by context and task requirements) and reflects capacity limits across different tasks. Recent empirical and computational evidence links lateral inhibition strength between items to the precision of represented items within a map (Roggeman et al., 2010; Dempere-Marco et al., 2012; Sengupta and Melcher, 2014). 
High inhibition reduces the noise within a map, allowing for precise representations of items, but restricts capacity to few items. Conversely, low inhibition allows for a larger number of items to be represented yet less precisely. The representational precision of a given item varies with the observer's current goals. Whereas in a vSTM task participants encode multiple features, such as location and orientation of items, in enumeration tasks, no precise encoding of object features is necessary. The mere individuation of items is sufficient to encode them as units (Melcher and Piazza, 2011; Wutz and Melcher, 2013).
Here, we manipulated the required representational precision of objects by engaging participants in two tasks: (1) a vSTM task, requiring high encoding precision; and (2) an enumeration task, requiring low encoding precision. This allowed us to test whether there were common PPC maps for the two tasks with changing neural response profile dependent on task demands. We predict a nonlinear increase of PPC activation with increasing number of to-be-encoded objects in a task-dependent manner: in a task requiring low precision, activation should increase only when the set exceeds three or four items, whereas in a task requiring high precision, activation should increase already beyond one item.
Materials and Methods
The current study comprises two experiments, both described below: (1) a main experiment; and (2) a control experiment.
Main experiment
Participants.
A total of 19 healthy adults with normal or corrected-to-normal vision and no history of neurological or psychiatric illness participated in the study, which was approved by the Ethics Committee of the University of Trento. Two participants were excluded from subsequent analysis because of extensive head motion during scanning. All subsequent analyses are based on data from 17 participants (seven females; mean ± SD age, 25.78 ± 10.3 years).
Stimuli.
Stimuli consisted of a variable number of Gaussian-modulated sinusoidal grating (Gabor) patches. Each Gabor in a set was individually tilted from vertical to the left or right by a random angle between 15° and 45°. In the main experiment, subjects performed two tasks in different fMRI runs of a single session: (1) enumeration; and (2) vSTM (see below). For the enumeration task, the number of items varied between one and eight, whereas for the vSTM task, it varied between one and six. Each numerosity was presented equally often and at least twice in each of four blocks. Numerosity was fully crossed with saliency, which had two levels. For low-saliency displays, all the Gabors had the same contrast of 35%. In high-saliency displays, one Gabor flickered at 20 Hz between 100% and 33% contrast. Because this saliency manipulation did not have a significant effect on behavioral or functional imaging data, we collapsed across saliency levels for all analyses. Two sets of Gabors were created to control for non-numerical factors of the stimuli (http://www.unicog.org/pm/pmwiki.php/Main/Arithmetics). In one set, the overall surface of the Gabor patches was kept constant across numerosities; thus, individual item size (varying between 2.65° and 0.93° visual angle for numerosities 1 and 8, respectively) and density (defined as total surface covered by the Gabors divided by the convex hull of the Gabors) were inversely related to numerosity. In the second set, the individual item size was kept constant (1.26° visual angle); thus, the total area covered by the Gabors and density increased with increasing numerosity. The average overall surface covered by the Gabor patches across different numerosities was identical for the two sets.
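As an illustration only, a single Gabor patch of the kind described above could be generated as follows. This is a minimal sketch, not the stimulus code used in the study: the patch size in pixels, the number of grating cycles, and the width of the Gaussian envelope (`size_px`, `cycles`, `sigma_frac`) are hypothetical values that the text does not specify, and the mapping of the sign of the tilt onto "left" versus "right" depends on the display's axis convention.

```python
import numpy as np


def random_tilt(rng):
    """Draw a tilt from vertical: 15-45 degrees, randomly to the left or right."""
    magnitude = rng.uniform(15.0, 45.0)
    sign = rng.choice([-1.0, 1.0])
    return sign * magnitude


def gabor_patch(size_px=65, tilt_deg=30.0, contrast=0.35,
                cycles=4.0, sigma_frac=0.15):
    """Return a square Gabor patch with luminance in [0, 1] (0.5 = mean gray).

    size_px should be odd so that the patch has a well-defined center pixel.
    cycles and sigma_frac are hypothetical; the text gives no such values.
    """
    half = size_px // 2
    # Pixel coordinates expressed as a fraction of the patch width
    y, x = np.mgrid[-half:half + 1, -half:half + 1] / float(size_px)
    theta = np.deg2rad(tilt_deg)
    # Axis along which luminance varies for bars tilted theta from vertical
    u = x * np.cos(theta) + y * np.sin(theta)
    grating = np.cos(2.0 * np.pi * cycles * u)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma_frac ** 2))
    return 0.5 + 0.5 * contrast * grating * envelope


rng = np.random.default_rng(0)
patch = gabor_patch(tilt_deg=random_tilt(rng), contrast=0.35)
```

The 35% contrast corresponds to the low-saliency displays; the flickering item in the high-saliency displays would alternate between two such patches of different contrast.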
Procedure of main tasks.
Two seconds before each trial, a red fixation dot appeared and remained on the screen for 1000 ms to indicate the upcoming trial. Each trial began with the presentation of a gray fixation dot. After a delay of 800 ms, the fixation dot disappeared for 200 ms, signaling the subsequent onset of a stimulus. A number of Gabor patches (sample) appeared on screen for 200 ms (500 ms for the vSTM task), followed by a white fixation cross presented for 500 ms (600 ms for vSTM trials). In the enumeration task, subjects were to name as quickly as possible the number of Gabors. In the vSTM task, a second display (test) was presented that included only one of the previously shown Gabor patches from the sample set (Fig. 1). Subjects were instructed to judge whether the orientation of the test stimulus was changed with respect to the item in the sample set that had been presented in that location. The orientation of the test Gabor was either identical to the sample item or was a mirrored version along the vertical axis. Subjects were required to give their response within a range of 1.7 s. The next trial started after a variable delay (±0–500 ms) with a mean duration of 7400 ms (7000 ms for enumeration) within which the red fixation appeared. The average trial length was 11.2 s for the vSTM task and 10.4 s for the enumeration task. Each experiment was divided into four fMRI runs. Each run lasted ∼7.1 min for the vSTM task and 6.9 min for the enumeration task. Figure 1 schematically depicts a trial in the main task.
Figure 1. Schematic depiction of a trial in the main experiment. After the initial presentation of a gray fixation cross for 800 ms, followed by a brief blank period, a variable number of Gabors appeared on screen (for 500 and 200 ms in the vSTM and enumeration tasks, respectively). In the vSTM task (left part), a delay period of 600 ms was followed by the presentation of an arbitrarily chosen Gabor that had to be evaluated via button press with respect to a change in orientation (here: orientation changed). In the enumeration task (right part), participants were asked to immediately utter an estimate of the number of Gabors on screen, which was recorded and transcoded offline.
Procedure of saccades localizer.
In one additional fMRI run, subjects performed 10 blocks of eye movements, each followed by a baseline period in which identical visual stimulation was presented but participants did not move their eyes. The change between the eye movement and the fixation task was signaled via a change in the color of a central fixation cross. Each block of saccades was composed of 14 sequential presentations of a target cross (width and height, 0.38° visual angle) that appeared ∼5° (up to ±0.42° jitter in x and y) to the left or the right of fixation or near fixation (with the same jitter) for, on average, 1000 ms (±200 ms jitter; five trials of 800 and 1200 ms, four of 1000 ms). Each block used a different order, and block order was randomized across participants. The total duration of the localizer was 4 min.
Imaging parameters.
Functional data in the main experiment were acquired at the Laboratory for Functional Neuroimaging at the Center for Mind/Brain Sciences in Mattarello, Italy on a 4 T MR system (Bruker MedSpec Biospin MR) as T2*-weighted echo-planar image (EPI) volumes using an eight-channel birdcage head coil. Thirty-seven axial slices covering the whole brain were obtained with a TR of 2.2 s (TE, 33 ms; flip angle, 75°; 3 × 3 × 3 mm voxels; no gap). For the saccades localizer task, the TR was 2.4 s. Before each block, we performed an additional scan to measure the point-spread function (PSF) of the acquired sequence, which served for distortion correction. The first three images (6.6 s) in each series served to guarantee stable magnetization and were not recorded. For each participant, an anatomical scan was obtained using a MPRAGE sequence with 176 slices covering the entire brain (TR, 2.7 s; TE, 4.18 ms; flip angle, 7°; voxel size, 1 × 1 × 1 mm; no gap).
Behavioral data analysis.
Vocal responses in the enumeration task were recorded and manually labeled offline. Vocal onset times (VOTs) were determined for each trial using an in-house MATLAB algorithm that detected intensity changes above a participant-specific threshold. We then determined the subitizing range (capacity) per participant by fitting a bilinear function to the VOTs and accuracy rates over numerosities. The fit identified the best combination of two ranges: one with zero slope for small numbers, followed by one with a variable positive slope for larger ones. Subitizing range was operationalized as the intersection of the two lines. As described by Cowan (2001), vSTM capacity was determined by calculating Cowan's K using the formula K = (hit rate + correct rejection rate − 1) × n, with n denoting the number of items in a given set.
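The two behavioral capacity measures can be sketched as follows. This is a minimal illustration under stated assumptions, not the in-house MATLAB routine: the grid search over candidate intersection points, the step size, and the function names `cowans_k` and `fit_bilinear` are hypothetical, and the example VOT values are made up for demonstration.

```python
import numpy as np


def cowans_k(hit_rate, correct_rejection_rate, set_size):
    """Cowan's K = (hit rate + correct rejection rate - 1) * n."""
    return (hit_rate + correct_rejection_rate - 1.0) * set_size


def fit_bilinear(numerosities, values, step=0.01):
    """Fit a flat segment followed by a positively sloped line.

    Model: y = plateau                      for n <= knee
           y = plateau + slope * (n - knee) for n >  knee
    The knee (intersection of the two lines) is found by grid search; for
    each candidate knee, plateau and slope follow from linear least squares.
    """
    n = np.asarray(numerosities, dtype=float)
    y = np.asarray(values, dtype=float)
    best = None
    for knee in np.arange(n.min(), n.max(), step):
        design = np.column_stack([np.ones_like(n), np.maximum(0.0, n - knee)])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        if coef[1] <= 0:  # only accept a positive slope for the second range
            continue
        sse = float(np.sum((design @ coef - y) ** 2))
        if best is None or sse < best[0]:
            best = (sse, knee, coef[0], coef[1])
    if best is None:
        raise ValueError("no intersection with a positive slope found")
    return best[1], best[2], best[3]


# Made-up VOTs (ms) for numerosities 1-7: flat up to 3, then rising
vots = [500, 500, 500, 700, 900, 1100, 1300]
knee, plateau, slope = fit_bilinear(range(1, 8), vots)
k = cowans_k(hit_rate=0.9, correct_rejection_rate=0.8, set_size=4)
```

In this sketch the returned `knee` plays the role of the subitizing range, that is, the intersection of the flat and the rising segment.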
Imaging data analysis.
After correcting the data for field distortions using the acquired PSF, the functional imaging data were preprocessed using SPM8 software (http://www.fil.ion.ucl.ac.uk/spm/software/spm8). Images were corrected for motion and slice-timing differences, realigned to the first image in the series of the respective experiment, and coregistered to the individual anatomies. For the reported random effects and classifier analyses, the functional images were smoothed with a 6 mm FWHM Gaussian kernel after normalization to the standard template of the Montreal Neurological Institute. The fMRI data from the two main tasks were modeled in a common design to allow for direct comparisons between tasks. The enumeration task was modeled with 16 predictors: 1 predictor for each numerosity (8 levels) × saliency (2 levels). The vSTM task was modeled using 12 predictors (6 numerosities × 2 saliencies). All predictors were convolved with a canonical hemodynamic response function and its temporal derivative in SPM8. Session-specific motion parameters were included as effects of no interest to account for remaining artifacts attributable to head motion. We extracted and normalized the β weights for each value of numerosity for each participant in both tasks. Voxels were selected as follows. Based on previous fMRI data (Piazza et al., 2003; Todd and Marois, 2004) and on a computational saliency map model of individuation (Sengupta and Melcher, 2014), we hypothesized that the brain activation profile over numerosities in the enumeration task would differ from the profile in the vSTM task in the PPC regions. In the enumeration task, brain activity was expected to follow a “subitizing profile,” characterized by a constant level of activation for small numerosities within the subitizing range (1–3) and a linear increase for higher numbers (a “flat and then increase” profile). 
Conversely, the “vSTM profile” would be characterized by an initial linear increase of brain activity with increasing number of items in the 1–3 range and a plateau for displays with a larger number of items (“increase and then flat” profile). Any subsequent reference to brain regions that were active in either of the two tasks refers to the activations as defined by these two response profiles. To demonstrate that the same voxels in the PPC flexibly adapt their activation profile to the required representational precision of the task at hand, we analyzed voxels that fulfilled three criteria. First, we isolated superior parietal cortex voxels with an anatomically defined mask using the Wake Forest University (WFU) PickAtlas toolbox in SPM (Maldjian et al., 2003). Second, we used the saccades localizer random-effects contrast (saccade vs fixation) to further restrict our initial PPC anatomical voxel selection. Finally, we chose voxels on the basis of the results from one task (e.g., enumeration) and analyzed their response profile in the other task (e.g., vSTM task).
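The two hypothesized response profiles can be written down as contrast weights over the numerosity predictors. The sketch below is one possible piecewise-linear formalization under stated assumptions: the function names, the mean-centering, and the default inflection point of 3 are illustrative choices, not the contrast vectors actually entered into SPM8.

```python
import numpy as np


def subitizing_profile(numerosities, inflection=3):
    """'Flat and then increase': constant up to the inflection, then linear."""
    raw = np.maximum(0.0, np.asarray(numerosities, dtype=float) - inflection)
    return raw - raw.mean()  # mean-center so the weights sum to zero


def vstm_profile(numerosities, inflection=3):
    """'Increase and then flat': linear up to the inflection, then a plateau."""
    raw = np.minimum(np.asarray(numerosities, dtype=float), inflection)
    return raw - raw.mean()


enum_w = subitizing_profile(range(1, 9))  # enumeration: numerosities 1-8
vstm_w = vstm_profile(range(1, 7))        # vSTM: numerosities 1-6
```

Changing `inflection` shifts the point at which the profile bends, which is how alternative regressor profiles with other inflection points could be expressed in the same scheme.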
Additionally, we applied multivariate pattern analysis (MVPA) algorithms to test to what extent individual numerosities elicit distinguishable patterns of brain activation and how this may differ across tasks. Pattern recognition analysis applied a linear multiclass classification based on support vector machines in the implementation of LIBSVM (Library for Support Vector Machines; http://www.csie.ntu.edu.tw/~cjlin/libsvm/), with the regularization parameter C fixed to 1. The data entering classification were 240 β images in total (20 images for each of the six numerosity conditions corresponding to individual trials in each of the two tasks). For each of the regions of interest (ROIs) mentioned below, each individual pattern was mean corrected across voxels, and, to reduce potential session- or time-related confounds, voxelwise activations were normalized by subtracting from each voxel the mean across the six numerosity conditions. Because for each session there were 10 trials per condition, this was done 10 times, starting with the first trial of each condition and repeating the same procedure up to the 10th trial for each condition in a given session. Separate classifiers were trained and tested for the enumeration and the vSTM conditions (with the theoretical chance level corresponding to
Control experiment
Participants.
Six healthy participants (all females; mean ± SD age, 24.4 ± 0.61 years) were tested in the control experiment that was approved by the Ethics Committee of the Humboldt University of Berlin.
Stimuli.
Stimuli consisted of two sets of tilted dark gray bars, displayed against a light gray background. Bars were used instead of Gabor patches to facilitate feature encoding (orientation) and increase performance in the vSTM task (Alvarez and Cavanagh, 2008; Melcher and Piazza, 2011). Two sets were created using a variant of the above-described MATLAB routines to control for non-numerical stimulus features in the same way as described above. Individual bar size varied between 0.92° × 0.32° (set size 8) and 2.3° × 0.86° (set size 1) visual angle in set 1 and was fixed to 0.92° × 0.32° in set 2. As in the main experiment, participants were presented with a variable number of items in a vSTM task (one to six items) and an enumeration task (one to eight items).
Procedure and behavioral data analysis.
The control study was designed to test whether minor procedural differences between the tasks in the main experiment (e.g., slightly different duration of stimulus presentation) might explain the observed behavioral performance differences and related changes in brain activation profiles (see below). Each trial started with the presentation of a black fixation dot in the center of the screen (500 ms), followed by the simultaneous presentation of a variable number of tilted bars on the screen for 150 ms (sample). In the vSTM task, the offset of the sample stimulus was followed by the presentation of a white fixation dot for 1000 ms (delay period), which was replaced by the presentation of the test stimulus, containing an identical number of tilted bars in identical positions. The participants' task was to remember the orientations of the sample bars and to decide whether or not one of the bars in the test display had changed orientation by 90°, which was the case in half of the trials. The test display was replaced by black fixation dots on button press or after 1.7 s. The next trial started on average 3.8 s (minimum, 3.3 s; maximum, 4.4 s) after the response period, yielding a mean trial length of 7.15 s. In enumeration trials, a 1.7 s response period started immediately with the presentation of the sample display. Subsequent trials started after an average interval of 7.8 s (minimum, 7.4 s; maximum, 8.3 s). Intertrial fixation was ensured by the presentation of a gray fixation dot in the center of the screen in both tasks.
Vocal responses in the enumeration task were recorded and transcribed offline. Because of technical limitations, no VOT determination was possible and no manual responses were recorded during the vSTM task. Six additional participants (three females; mean ± SD age, 28.3 ± 7.2 years) were tested with the identical paradigm outside the scanner.
Imaging parameters and analysis.
Functional data were acquired at the Berlin Center for Advanced Neuroimaging on a 3 T TIM Trio scanner (Siemens) as T2*-weighted EPI volumes using a standard 12-channel head coil. Forty-two axial slices covering the whole brain were obtained with a TR of 2.5 s (TE, 25 ms; flip angle, 82°; 2.5 × 2.5 × 2.5 mm voxels; 20% gap). The first two images (5 s) in each series served to guarantee stable magnetization and were not recorded. For each participant, an anatomical scan was obtained using an MPRAGE sequence with 192 slices covering the entire brain (TR, 1.9 s; TE, 2.52 ms; flip angle, 9°; voxel size, 1 × 1 × 1 mm; no gap; generalized autocalibrating partially parallel acquisitions factor, 2).
Functional imaging data were analyzed using the same software (SPM8) and routines as the main experiment. To ensure sampling brain activation from identical voxels, we used the (resliced) second-level masks from the main experiment.
Computational saliency map model
The saliency map model that we used to make quantitative predictions about the activation level of the PPC in the two tasks extended the work of Roggeman et al. (2010), who considered set sizes from 1 to 64, to the set sizes (one to eight) used in this experiment. Following Roggeman et al. (2010) and Sengupta and Melcher (2014), we constructed a recurrent on-center, off-surround network with a single layer of 70 completely interconnected nodes (Fig. 2A). Each node can be considered, theoretically, as a group of neurons in the parietal cortex encoding an object or the location of an object in an attentional priority or saliency map. The three main parameters that define the type of network are as follows: (1) strength of self-excitation for each node (α); (2) strength of lateral inhibition between nodes (β); and (3) decay constant for the passive decay term (λ). The differential equation governing the time evolution of the network of nodes is given by the following:
where xi(t) is the activation of node i at time t, and Ii represents the intensity of external input (∀i, 0 ≤ Ii ≤ 1). In our simulation, the input Ii is a unit step function, i.e., it has the value 1 for a certain number of time steps for the particular node i and 0 for the rest of the time steps. Input is only presented for a finite amount of time, typically much less than the total time of the simulation. F(x) is the activation function given by the following formula:
The decay parameter was set to λ = 1. We modeled the dynamics according to the discrete form of the differential equation governing the time evolution. The activation of the nodes is updated at each step according to the following equation:
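As a hedged reconstruction, the three expressions referred to in this section can be written in the additive on-center, off-surround form that is standard for this family of recurrent networks. The symbols α, β, λ, x_i, I_i, and F are those defined in the text; the additive noise term and the saturating shape of F are assumptions and should be checked against Roggeman et al. (2010) and Sengupta and Melcher (2014).

```latex
% Continuous-time dynamics (assumed form)
\frac{dx_i}{dt} = -\lambda x_i + \alpha F(x_i)
                  - \beta \sum_{j \neq i} F(x_j) + I_i + \text{noise}

% Activation function (assumed form)
F(x) =
\begin{cases}
  0,               & x \le 0 \\
  \dfrac{x}{1+x},  & x > 0
\end{cases}

% Discrete-time update (assumed form)
x_i(t) = (1-\lambda)\, x_i(t-1) + \alpha F\big(x_i(t-1)\big)
         - \beta \sum_{j \neq i} F\big(x_j(t-1)\big) + I_i(t) + \text{noise}
```

Under this form, setting λ = 1 removes the carry-over term (1 − λ) x_i(t − 1), so each update depends on the previous state only through F.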
As reported previously by Roggeman et al. (2010), the inhibition parameter determines the degree to which the network behavior can track a small or larger number of items. At high inhibition, the network activation increases with the number of items up to an upper limit that should reflect encoding capacity in the vSTM task. At medium inhibition levels, necessary to individuate but not to track fine local features of the individuated objects, the network activation should show no detectable increase within the very small number range, followed by a steeper increase for larger numerosities, which should reflect higher capacity in enumeration tasks (also referred to as subitizing range). Very low inhibition allows a larger number of nodes in the network to respond in the case of a larger number of inputs, and this might reflect activation related to numerosity estimation, a task that we did not use in our experiments. The behavior of the network can be understood intuitively in terms of competition: when there is strong competition between nodes, strong nodes inhibit other nodes, leading to a winner-take-all system. When the inhibition parameter is weak, the activity of one node does not inhibit its neighbors, allowing many different nodes to be active at the same time. Thus, there is a tradeoff between inhibition and precision. As in the studies by Roggeman et al. (2010) and Sengupta and Melcher (2014), the input was presented to the model for five time steps, and the simulations ran for 50 time steps. We then plotted the mean of 100 simulations of the presentation of one to eight items under a high inhibition parameter (β = 0.28), simulating the high object feature coding precision requirements of the vSTM task, and a medium inhibition parameter (β = 0.12), simulating the lower object feature coding precision requirements of the enumeration task.
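A simulation of this kind can be sketched in a few lines. The following is an illustration under explicit assumptions, not the authors' implementation: it assumes an additive update x_i(t) = (1 − λ) x_i(t − 1) + α F(x_i) − β Σ_{j≠i} F(x_j) + I_i + noise with F(x) = x / (1 + x) for x > 0 and 0 otherwise; the self-excitation value `alpha=2.2`, the noise level `noise_sd`, and the summary readout (summed positive activation at the final step) are hypothetical choices that the text does not state. Only the 70 nodes, λ = 1, the five input steps, the 50 total steps, the 100 repetitions, and the two β values come from the description above.

```python
import numpy as np


def activation(x):
    """Assumed saturating activation: 0 for x <= 0, x / (1 + x) for x > 0."""
    pos = np.clip(np.asarray(x, dtype=float), 0.0, None)
    return pos / (1.0 + pos)


def simulate(n_items, beta, alpha=2.2, lam=1.0, n_nodes=70,
             input_steps=5, total_steps=50, noise_sd=0.01, rng=None):
    """Run one simulation and return the node activations at the last step."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.zeros(n_nodes)
    inp = np.zeros(n_nodes)
    inp[rng.choice(n_nodes, size=n_items, replace=False)] = 1.0  # unit step input
    for t in range(total_steps):
        f = activation(x)
        inhibition = beta * (f.sum() - f)            # sum over all j != i
        drive = inp if t < input_steps else 0.0      # input only at the start
        noise = noise_sd * rng.standard_normal(n_nodes)
        x = (1.0 - lam) * x + alpha * f - inhibition + drive + noise
    return x


def mean_total_activation(set_size, beta, n_sims=100, seed=0, **kwargs):
    """Average summed positive activation over repeated simulations."""
    rng = np.random.default_rng(seed)
    totals = [simulate(set_size, beta, rng=rng, **kwargs).clip(min=0).sum()
              for _ in range(n_sims)]
    return float(np.mean(totals))


set_sizes = range(1, 9)
vstm_curve = [mean_total_activation(n, beta=0.28) for n in set_sizes]  # high inhibition
enum_curve = [mean_total_activation(n, beta=0.12) for n in set_sizes]  # medium inhibition
```

Whether the resulting curves reproduce the reported "increase and then flat" and "flat and then increase" profiles depends on the assumed α, the noise level, and the chosen readout, so the sketch should be read as a starting point for exploring the inhibition parameter rather than as a replication.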
Figure 2. A, Graphical illustration of the computational saliency/priority model. Self-excitation and inhibitory connections between nodes are indicated by the different endpoints (arrows and filled circles, respectively). B–E, Behavioral results from the main (black) and control (gray) experiments. Average voice onset times (B) and percentage correct (D) across numerosities in the enumeration task and Cowan's K (C) and percentage correct (E) in the vSTM task.
Results
Computational model
One important argument in favor of a common, flexible system for both enumeration and vSTM would be the ability to model such flexibility in the same computational model. To test this, we applied a model of visuospatial saliency maps in parietal cortex initially developed by Roggeman et al. (2010). The model is based on a recurrent on-center/off-surround network of connected nodes, with each node representing a spatial location. The nodes are interconnected, with a self-excitation parameter α and a lateral inhibition parameter β (Roggeman et al., 2010; Sengupta and Melcher, 2014). As shown previously, the behavior of this saliency map model depends critically on the inhibition between nodes. In a high inhibition regimen, we found that activation increased as set size went from one to three items and then reached a plateau (Todd and Marois, 2004; Kawasaki et al., 2008; Fig. 3I). However, for a medium level of inhibition, we found flat activation up to approximately three items (Fig. 3G) as would be expected for subitizing and in line with previous results (Piazza et al., 2003). Thus, changing only the inhibition parameter leads to changing response profiles as a function of the representational precision required by the task at hand. Based on these results, we predict that the activation profiles in the two tasks should reflect the varying representational precision in brain areas organized in a map-like architecture, thus resembling the activation profiles of the computational model.
Figure 3. Brain activation results. A, Brain regions exhibiting a vSTM profile. B, Results of the saccades localizer. C, Brain regions with a subitizing profile. All random-effects contrasts are projected onto the top view of the left and right hemispheres of an inflated brain template, thresholded at p < 0.05 (FDR-corrected) except the saccades localizer (p = 0.005, uncorrected). D, E, Enlarged view of the overlapping activation in PPC. D, Overlap (purple) between saccades localizer (blue) and vSTM (red) activation. E, Overlap (turquoise) between saccades localizer (blue) and enumeration (green) activation. F–I, Empirically observed and computational model activation profiles in the enumeration task (F, G, respectively) and the vSTM task (H, I, respectively) expressed as standardized β weights (data) and arbitrary units (model). The empirical activation profile for enumeration is based on voxels that have been identified by the overlap between vSTM and saccades. The profile for vSTM is based on voxels that have been identified by the overlap between enumeration and saccades. Results from the main and control experiments are shown in black and gray, respectively. Error bars depict SEM.
Behavioral results
Performance in the enumeration task matched the expected response profiles. For numerosities 1–3, the verbal estimates were highly accurate and did not vary in speed (Fig. 2B,D for VOTs and accuracy, respectively). Beyond numerosity 3, latency increased and accuracy decreased for larger numerosity values as expected. To determine the subitizing range, we fitted the data (VOTs and error rates) using a bilinear fit algorithm. Using pairwise t tests, we found VOTs attaining a plateau at seven items (six vs seven items, t(16) = −3.39, p = 0.004; seven vs eight items, t(16) = −0.054, p = 0.958). To avoid artificially reduced estimates of subitizing range, we included only numerosities 1–7 in the VOT analysis. This analysis revealed a subitizing range of approximately three items for VOTs (mean ± SD, 2.71 ± 0.75) and of approximately four items for error rates (mean ± SD, 4.0 ± 0.98).
To confirm the presence of a subitizing range in the enumeration task, we used a one-way repeated-measures ANOVA with numerosity as the only factor to test for (1) the absence of an impact of numerosity for small numerosities and (2) an impact of numerosity for larger numerosities. We observed a marginal impact of numerosity for enumeration performance in the one to three range for VOTs (F(2,32) = 2.95, p = 0.07), which was absent for error rates (F(2,32) = 0.93, p = 0.39). For numerosities 1–4, we observed an impact of numerosity on VOTs [F(3,48) = 18.943, p < 0.001, Greenhouse-Geisser ε = 0.61 (Greenhouse and Geisser, 1959)] that was marginal only in accuracy (F(3,48) = 3.3, p = 0.053, ε = 0.63). In contrast, a marked effect of numerosity was present for larger numerosities between 4 and 8, for both VOTs (F(3,48) = 16.340, p < 0.001, ε = 0.6) and error rates (F(3,48) = 20.203, p < 0.001, ε = 0.9). Together, these results suggest an average subitizing range of three to four items.
Accuracy in the vSTM task significantly decreased with increasing numerosity over the whole range (Fig. 2E), with an average ± SD Cowan's K of K = 1.2 ± 0.53 (maximum K = 1.38; Fig. 2C). The capacity estimate of ∼1.5 items for orientation memory is consistent with previous studies using an orientation task on Gabors (Alvarez and Cavanagh, 2008; Melcher and Piazza, 2011).
Behavioral performance in the control experiment was comparable with the main experiment. Enumeration accuracy was characterized by a near-perfect performance for numerosities one through three and monotonically decreasing accuracy with additional increases of set size (Fig. 2D,E). To visualize vSTM performance, we calculated Cowan's K (Fig. 2C). K (mean K = 2.11; maximum K = 3.0) increased with increasing set sizes until reaching a plateau.
Brain imaging results
GLM results
As a starting point, we traced brain regions that responded according to the different expected response profiles across numerosities in the enumeration and vSTM tasks. For the enumeration task, we were looking for voxels with a response profile that would parallel the behavioral results, that is, voxels that did not exhibit an increase of activation for low numerosities (n ≤ 3) but a parametric increase in activation for higher numerosities (n > 3), equivalent to an exponential function. For the vSTM task, we traced voxels that showed a complementary response profile with an increase of activation for lower numerosities (n ≤ 3), reaching a plateau for higher numerosities (n ≥ 3), equivalent to the inverse of an exponential function. Figure 3 shows the resulting activated networks projected onto an inflated brain template using the Human PALS (population-average landmark- and surface-based)-B12 Atlas (Van Essen, 2005; Van Essen and Dierker, 2007) implemented in Caret software (Van Essen et al., 2001). vSTM (Fig. 3A) activated bilateral precentral regions (frontal eye fields), superior parietal cortex and occipital cortex. Figure 3B shows the activations elicited by the saccades localizer task, consisting mainly of superior parietal and occipital regions. Enumeration (Fig. 3C) activated a large network of frontal, precentral, and parietal regions extending into the occipital cortex (for a detailed list of activated sites, see Table 1). Virtually identical brain regions were obtained when using regressor profiles with lower inflection points of 3 and 2, better matching the empirically observed profiles for enumeration and vSTM, respectively.
Table 1. Brain areas from the enumeration task, the vSTM task, and the saccades localizer
Given our interest in the activity of saliency/priority maps, we sought to sample from the human homolog of monkey area LIP by adopting an inclusive masking approach that only included voxels that were (1) anatomically located in the parietal cortex and (2) active in the saccades localizer task. From these voxels, we selected only those that were active in either the enumeration task or the vSTM task with their differential response profiles as described above. Overlapping voxels between vSTM and saccades, and between enumeration and saccades, are shown in Figure 3, D and E, respectively. For voxels that exhibited a vSTM profile in the vSTM task, we then plotted the activation profile in the enumeration task (Fig. 3F). Conversely, for voxels that showed a subitizing profile, we plotted the activation profile in the vSTM task (Fig. 3H). The response profile changed completely as a function of the specific task at hand. Voxels that paralleled the behavioral profile in the enumeration task changed their profile in the context of the vSTM task and vice versa. In both cases, the β values varied significantly with numerosity as indicated by one-way repeated-measures ANOVA (vSTM, F(5,80) = 11.1, p < 0.001, ε = 0.70; enumeration, F(7,112) = 31.73, p < 0.001, ε = 0.72).
To statistically validate that voxels in PPC changed their response profile with task requirements, we analyzed the data points that were common to both tasks (i.e., numerosities 1–6) by fitting a log-linear function (Anobile et al., 2012) to the β weights in the context of both tasks. The function of the form
Very similar results, including a reversal of response profiles as a function of task requirements, were obtained from the following: (1) voxels in the parietal cortex without masking the task-related activation maps with the saccades map; and (2) voxels within the saccades map without applying the masks from the respective activation maps. This suggests that the observed pattern of results is not confined to the voxels from the mask conjunction but that other regions in PPC also exhibit a similar response behavior.
To test whether minor procedural differences between the enumeration and vSTM tasks, rather than the task itself, could account for the observed differences in the neural response profiles, we conducted a control experiment. In particular, in the main experiment, stimulus presentation times were slightly longer during the vSTM task, which in theory might have influenced the results, for example by allowing more saccadic eye movements during the vSTM task than during enumeration.
As shown in Figure 3, F and H (depicted in gray), despite now using identical stimulus displays and identical stimulus presentation parameters in the two tasks, we again found that brain activation profiles in PPC differed markedly between tasks. In line with the results from the main experiment, we found a stable activation level for numerosities 1–3 in the enumeration task and an increase of activation with increasing set size (more than three). In contrast, in the vSTM task, activation increased from one to approximately four items before reaching a plateau and decreasing again for six items. It is important to note that the pattern of results from the main experiment was replicated despite independent samples, different MR systems, imaging parameters, and stimuli, which supports the idea of flexible coding accuracy as the driving factor in the current findings. Thus, it is unlikely that marginal procedural task differences (e.g., stimulus duration) in the main experiment played an important role in the observed differences in response profiles between the two tasks.
Decoding results
From single-cell recordings in monkeys (Roitman et al., 2007), we know that area LIP contains neurons that code for numbers. Previous fMRI multivariate decoding experiments showed that individual numbers can be discriminated in the human brain using activation patterns in the PPC (Eger et al., 2009) and in the functional equivalent of area LIP (Eger et al., 2013). In the present context, we used decoding to obtain an additional neuronal index of differential capacity limitation in the two tasks: we asked whether the number of items that can be accurately discriminated varied as a function of task demands. The confusion matrices in Figure 4 depict classification results from multiclass classification between the six numbers for both tasks and ROIs. A repeated-measures ANOVA of Cohen's κ with the factors ROI (PPC vs PVC) and task (enumeration vs vSTM) revealed higher classification performance for enumeration (F(1,16) = 42.16, p < 0.0001) and in PPC (F(1,16) = 158.97, p < 0.0001). No interaction was observed (F(1,16) = 0.19, p = 0.666). This result was corroborated by significantly better classification accuracy (percentage correct) for enumeration (F(1,16) = 74.14, p < 0.0001) and in PPC (F(1,16) = 10.997, p = 0.0044). Individual classification accuracy was higher than chance (16.7%) in each ROI and task (PPC enumeration, 34%, t(16) = 10.96, p < 0.0001; PPC vSTM, 30.3%, t(16) = 9.24, p < 0.0001; PVC enumeration, 25.4%, t(16) = 9.1, p < 0.0001; PVC vSTM, 21.1%, t(16) = 2.9, p = 0.0104).
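For readers unfamiliar with the two performance measures, the following sketch computes percentage correct and Cohen's κ from a confusion matrix of trial counts, using the standard definition of κ. The row/column layout (rows as true numerosity, columns as predicted numerosity) and the toy matrices are assumptions for illustration; this is not the analysis code used in the study.

```python
def accuracy(confusion):
    """Proportion of trials on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


def cohens_kappa(confusion):
    """Cohen's kappa: agreement corrected for the agreement expected by chance.

    Undefined (division by zero) if chance agreement equals 1,
    e.g. when every trial falls into a single class.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    p_observed = accuracy(confusion)
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    p_expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (p_observed - p_expected) / (1 - p_expected)


# Toy six-class matrices: perfect decoding and pure guessing.
perfect = [[10 if i == j else 0 for j in range(6)] for i in range(6)]
guessing = [[1] * 6 for _ in range(6)]
print(accuracy(guessing))      # 1/6, the chance level for six classes
print(cohens_kappa(perfect))   # 1.0
```

With six balanced classes, chance accuracy is 1/6 (about 16.7%), which corresponds to κ = 0.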
Classification results. Confusion matrices displaying the percentages of trials in which patterns from each of the six given numerosities were classified as the same or each one of the other numerosities. Values along the diagonal correspond to correct classifications, and off-diagonal values correspond to misclassifications. A, Confusion matrices of the multiclass classification algorithm in PPC in the vSTM task (left) and the enumeration task (right). B, Confusion matrices of the multiclass classification algorithm in PVC in the vSTM task (left) and the enumeration task (right). C, Confusion matrices in PPC when the multiclass classification algorithm generalized from one task to the other. For example, on the left, the classifier was trained using numerosities from the vSTM task and tested with numerosities from the enumeration task (vSTM → Enum).
Paralleling the GLM results, classification performance for different numerosities was modulated by task, resulting in different classification profiles. For enumeration, classification was on average much higher than for vSTM, being best for values 1–2 and 5–6 and reaching lowest values for classification of three and four items. However, in vSTM, classification peaked at low numerosities and decreased with increasing numerosity, resulting in a broader confusion range for larger numerosities compared with enumeration. To further corroborate these impressions, we computed classification “tuning curves” for each task and numerosity in both ROIs. That is, for each predicted numerosity (e.g., one item), we computed the difference between the correct classification (i.e., the value on the diagonal in the confusion matrix) and the mean of the false classifications (i.e., values off the diagonal in the confusion matrix). We compared these classification profiles in a repeated-measures ANOVA with the factors task (enumeration and vSTM) and numerosity (one through six) for each ROI. In PPC, we observed a main effect of numerosity (F(5,80) = 37.54, p = 0.0001, ε = 0.73). Task marginally influenced performance (F(1,16) = 3.62, p = 0.08). Most importantly, an interaction between numerosity and task (F(5,80) = 16.49, p < 0.00001, ε = 0.8) indicated that the confusion profile across numerosities was significantly modulated by task. For the early visual area, main effects of task and numerosity were significant (F(1,16) = 7.49, p = 0.0146; F(5,80) = 2.57, p = 0.0443, ε = 0.83) but did not significantly interact with each other (F(5,80) = 2.23, p = 0.0838, ε = 0.72). In summary, these analyses based on decoding directly show that the number of items that can be accurately represented is lower in the vSTM task than in the enumeration task, confirming the prediction related to differential capacity limitations.
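The tuning-curve measure described above (diagonal value minus the mean of the off-diagonal values) can be sketched as follows. The sketch assumes a square confusion matrix with rows as presented numerosity and columns as predicted numerosity, and takes the off-diagonal values within each predicted-numerosity column; the text does not specify the orientation, so this choice and the function name are assumptions.

```python
def tuning_index(confusion):
    """For each predicted class, diagonal entry minus mean off-diagonal entry.

    confusion[i][j]: percentage of trials of true class i classified as j.
    The column-wise reading is an assumption for illustration.
    """
    k = len(confusion)
    indices = []
    for j in range(k):
        column = [confusion[i][j] for i in range(k)]
        correct = column[j]
        false_mean = (sum(column) - correct) / (k - 1)
        indices.append(correct - false_mean)
    return indices


# Toy three-class matrix in percent (made-up numbers).
toy = [
    [60, 30, 10],
    [20, 50, 30],
    [10, 30, 60],
]
print(tuning_index(toy))  # [45.0, 20.0, 40.0]
```

A larger index means a sharper peak at the correct numerosity; an index of zero means the classifier assigns that label no more often to the correct class than to the others.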
Finally, the notion of shared neural map architecture underlying enumeration and vSTM may suggest that numerosity coding should at least partially generalize across tasks. To test this idea, we trained a classifier to discriminate numerosity information in one task (e.g., enumeration) and tested it with numerosities from the other (e.g., vSTM). Overall classification rates were lower than in within-task classification (t(16) = −8.79, p < 0.0001; Fig. 4C), and they were higher in PPC than in early visual area (F(1,16) = 19.35, p = 0.0005), whereas no task difference or interaction between task and ROI was observed (F(1,16) = 0.009, p = 0.925; F(1,16) = 0.124, p = 0.7292). Classification rates were significantly better than chance in PPC for both vSTM to enumeration generalization (22.2%, t(16) = 4.928, p = 0.0002) and enumeration to vSTM generalization (22.0%, t(16) = 3.76, p = 0.0017). However, classification rates in early visual areas were at chance level (enumeration to vSTM, 17.8%, t(16) = 1.38, p = 0.187; vSTM to enumeration, 17.5%, t(16) = 1.246, p = 0.2305).
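The cross-task generalization scheme (train on patterns from one task, test on patterns from the other) can be sketched with a deliberately simple stand-in classifier. The nearest-centroid rule below is a substitute chosen for brevity and is not claimed to be the classifier used in the study; patterns are plain lists of voxel values and all data are made-up toy numbers.

```python
def fit_centroids(patterns, labels):
    """Mean pattern per label (nearest-centroid stand-in for the real classifier)."""
    sums, counts = {}, {}
    for pattern, label in zip(patterns, labels):
        acc = sums.setdefault(label, [0.0] * len(pattern))
        for d, value in enumerate(pattern):
            acc[d] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, pattern):
    """Label of the centroid closest to `pattern` in squared Euclidean distance."""
    def sq_dist(label):
        return sum((c - p) ** 2 for c, p in zip(centroids[label], pattern))
    return min(centroids, key=sq_dist)


def generalization_accuracy(train_x, train_y, test_x, test_y):
    """Train on one task's patterns, test on the other task's patterns."""
    centroids = fit_centroids(train_x, train_y)
    hits = sum(predict(centroids, x) == y for x, y in zip(test_x, test_y))
    return hits / len(test_y)


# Toy data: two "numerosities", two voxels, one task for training, one for testing.
task_a_x = [[1.0, 0.0], [1.2, 0.0], [0.0, 1.0], [0.0, 1.1]]
task_a_y = [1, 1, 2, 2]
task_b_x = [[0.9, 0.1], [0.1, 0.9]]
task_b_y = [1, 2]
print(generalization_accuracy(task_a_x, task_a_y, task_b_x, task_b_y))  # 1.0
```

Swapping the roles of the two tasks gives the reverse direction of generalization.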
These results may lead to the conclusion that number is automatically extracted in this region independently of the tasks that subjects are performing. However, a closer exploration of the generalization confusion matrices suggested that this is not the case because we observe an asymmetric pattern of confusions between numerosities across tasks. In Figure 4C, classification errors in the top right part of the confusion matrix seemed to be more prominent compared with the bottom left part of the matrix. This would imply that a classifier that was trained to infer numerosities from PPC activation while subjects were performing the vSTM task would tend to “underestimate” numerosities when fed with PPC activation while subjects perform the enumeration task. This is compatible with the idea that numerosity might be encoded as the number of peaks in a saliency map and that, for any given number of objects during enumeration, there are systematically more active peaks than during vSTM. To test this hypothesis, we directly compared classification errors in the top right to errors in the bottom left, separately for each ROI. Because generalization in early visual area was not above chance, we restricted this analysis to PPC. False classification rates in the vSTM to enumeration generalization were systematically biased (F(1,16) = 42.96, p < 0.0001) and bias was modulated by direction of generalization, as indicated by the interaction (F(1,16) = 20.51, p = 0.0003). We observed significantly higher false classification rates in the top right part of the confusion matrix compared with the bottom left part when generalizing from vSTM to enumeration (t(16) = 6.45, p < 0.0001) but not when generalizing from enumeration to vSTM (t(16) = 0.81, p = 0.429).
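The asymmetry test compares misclassifications above the diagonal with those below it. A minimal sketch of that comparison for a single confusion matrix follows; which triangle corresponds to underestimation depends on how the matrix in Figure 4C is laid out, which is not specified here, so the code only separates the two triangles. The function name and the toy matrix are hypothetical.

```python
def triangle_error_rates(confusion):
    """Mean off-diagonal value above and below the diagonal.

    Returns (upper_mean, lower_mean) for entries with column index > row index
    and column index < row index, respectively.
    """
    k = len(confusion)
    upper = [confusion[i][j] for i in range(k) for j in range(i + 1, k)]
    lower = [confusion[i][j] for i in range(k) for j in range(i)]
    return sum(upper) / len(upper), sum(lower) / len(lower)


# Toy three-class matrix in percent with more errors above the diagonal.
biased = [
    [50, 30, 20],
    [10, 60, 30],
    [5, 15, 80],
]
upper_mean, lower_mean = triangle_error_rates(biased)
print(upper_mean > lower_mean)  # True
```

In a group analysis, the two means would be computed per subject and then compared across subjects, as in the paired comparisons reported above.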
Discussion
Going beyond previous, separate studies showing different patterns of PPC activation for enumeration and vSTM tasks in different subjects, here we demonstrate that the same voxels in PPC are involved in both tasks. However, their response profiles were flexible and task dependent. For example, voxels defined by their flat activity profile up to approximately three items during enumeration exhibit a different response profile during vSTM, closely tracking vSTM capacity.
The current findings provide direct evidence that object enumeration and vSTM, previously studied separately, share a crucial neural mechanism that reflects modulations of their capacity limits. Starting from a map architecture in which representational precision could be varied according to task demands (Roggeman et al., 2010; Sengupta and Melcher, 2014), we predicted task-specific activation profiles reflecting previously reported performance patterns. In such a flexible saliency/priority map, a small number of items can be represented with high precision and minimal noise to allow for rich encoding of stimulus features, such as orientation and spatial position, that were required in our vSTM context. With lower precision, more items would be represented at the cost of lower feature resolution, albeit sufficient for mere enumeration of items in a given set (Melcher and Piazza, 2011). Such a map architecture has a number of advantages. First, it provides a way to account both for evidence of discrete representations and for the fact that capacity limits change across context and task (Melcher and Piazza, 2011; Franconeri et al., 2013), providing a way forward from debates about slots versus resources (Franconeri et al., 2013). Second, these maps are biologically plausible models of the well-studied behavior of neurons in PPC areas, such as the lateral intraparietal sulcus (Gottlieb, 2007).
Although a saliency/priority map in PPC that represents multiple objects with a variable degree of precision would parsimoniously account for the observed pattern of results, we cannot exclude the contribution of other regions to the observed capacity limits of visual perception. For example, in the case of vSTM, there are additional processes involved beyond individuating items (for review, see Melcher and Piazza, 2011; Wutz and Melcher, 2013). However, at the very least, our results indicate that PPC is part of a neural network that reflects capacity-limited information coding. Moreover, we show how a saliency/priority map model can account for previous results showing differential variation in activation profiles as a function of number of items for both enumeration and vSTM tasks by varying only a single parameter: inhibition between nodes.
To further validate the predictions of the saliency/priority map hypothesis, we used MVPA over the PPC regions to decode cortical activity patterns associated with the different set sizes in the two tasks. Distributed activity in PPC was differentially modulated by set size across the two tasks: numerosity was accurately decoded from PPC activation in the two tasks, but the overall decoding accuracy of numerosity was higher for the enumeration task than the vSTM task. Indeed, the tuning precision of the decoder was high across all numerosities in the enumeration regime, whereas it was high only for numerosities 1 and 2 in the vSTM regime. These data are in line with the idea that a common saliency/priority map in PPC codes for multiple objects with a different degree of precision in tasks differing by the amount of individual object resolution required. In particular, the low decoding accuracy for larger numerosities in the vSTM task supports the idea that activity of the saliency map is concentrated on a limited number of items, and, because of a lack of free resources, information about additional items in a given set is lost when set size exceeds the given task-specific capacity. Additionally, beyond what was shown in the univariate analyses, we demonstrated that, in the enumeration regime, PPC was sensitive to number throughout the entire tested range of numerosities, including numerosities 1–3, which were impossible to differentiate based on the activation level in the univariate analysis. This finding speaks to a previously open question of whether small numerosities are coded using the same parietal cortex mechanisms as the ones involved in the coding of high numerosities or whether there is a separate neural system underlying subitizing (Vetter et al., 2011; He et al., 2013).
Although overall activation was constant across numerosities in the 1–3 range, the information distributed over several PPC voxels was sufficient to discriminate number, in line with the existence of number neurons in the PPC of the macaque monkey specifically tuned to a broad range of numerosities (Roitman et al., 2007). The multivariate analysis also allowed us to directly compare the representations of items in PPC with those of PVC. Although some information related to numerosity was also present in PVC, numerosity coding was more reliable [decoding accuracy in PPC vs PVC: enumeration, 34% vs 25%; vSTM, 30% vs 21% (chance = 16.7%)] and more precise (i.e., smaller width of the tuning profile in PPC vs PVC in both enumeration and vSTM) in PPC. These results are consistent with the idea that PPC is specifically sensitive to the individuation of specific items rather than just the total amount of visual stimulation. Moreover, the task-specific response profiles argue against the idea that PPC activation merely reflects automatic extraction of numerical information regardless of task requirements.
Finally, the multivariate analysis allowed us to more directly compare the information in PPC and PVC during the two tasks. We found that numerosity information in the PPC (but not in PVC) generalized across tasks: here the decoder was trained on the data from the vSTM task and tested on the data from the enumeration task (and vice versa). Interestingly, however, training the decoder in a high inhibition, vSTM regime led to an underestimation of numerosity as encoded in PPC during the low inhibition, enumeration regime. For example, a PPC pattern evoked by six elements in the low inhibition task was more similar to a pattern evoked by four elements than to a pattern evoked by six elements in the high inhibition task. This finding provides additional support for the idea of a flexible representation system in which the same voxels change their response profile as a function of task.
Because of the very nature of our vSTM task and our neural measure (the BOLD signal extends over many seconds), in the present study we cannot distinguish whether PPC activation is related to the perceptual encoding stage or to the memory maintenance stage of vSTM. Previous studies on brain activity during similar vSTM tasks have shown that the BOLD signal is similarly modulated by set size both at encoding and during memory maintenance (Todd and Marois, 2004). Hence, it is reasonable to assume that both encoding and maintenance rely on activity in neural circuits in PPC that are organized in a map-based architecture. Maintaining the positions of the items, weighted for their behavioral and sensory relevance, may indeed be one of the key functions of the map architecture in PPC.
The differential activation patterns reported for the enumeration and the vSTM task conditions may be seen as reflecting “task difficulty.” Indeed, both the pattern of error rates and response times indicate that increasing set size differentially modulates the difficulty of the two tasks (it is equally easy to enumerate sets of one to three items, whereas the difficulty of encoding/maintaining the features of the objects increases with increasing set sizes from one to three). According to our model, this is attributable to the differential amount of lateral inhibition of the saliency map that is set by the task demands and that occurs right at the encoding stage in both tasks. Therefore, in this respect, our model explains why set size differently modulates difficulty in the two tasks. Although we cannot completely exclude that response time may have affected the PPC BOLD activation amplitude, especially for the enumeration task, our idea of task-specific flexible representational precision in a map architecture is supported by the MVPA results, for which the BOLD amplitude is discarded. Nevertheless, it remains an interesting question for future research whether PPC activity for different set sizes would resemble the profiles observed here after equating both the maintenance component and the response time component across tasks.
Conclusions
Overall, the current results suggest that previous reports of neural activity in parietal cortex during enumeration and vSTM tasks reflect a common, flexible system to represent multiple individual objects. This flexibility can be accounted for by a map-like architecture. Indeed, such a model is biologically plausible, given previous studies of PPC, and would help to reconcile findings showing both discrete representations and variations in capacity and resolution of representations in different tasks across multiple experiments.
Footnotes
- Received June 28, 2013.
- Revision received May 19, 2014.
- Accepted May 22, 2014.
This work was supported by the Italian Ministry of Universities and Research (Research Programs of Relevant National Interest 2009), the Caritro Foundation (Fondazione Cassa di Risparmio di Trento e Rovereto), German Research Foundation Grant KN 959/2 (A.K.), India-Trento Programme for Advanced Research (R.S.), and European Research Council Starting Grant 313658 (D.M.).
The authors declare no competing financial interests.
- Correspondence should be addressed to Dr. André Knops, Department of Psychology, Humboldt University of Berlin, Rudower Chaussee 18, D-12489 Berlin, Germany. knops.andre{at}gmail.com
- Copyright © 2014 the authors 0270-6474/14/349857-10$15.00/0