Journal of Neuroscience

Featured Article | Research Articles, Systems/Circuits

Neural Correlates of Crowding in Macaque Area V4

Taekjun Kim and Anitha Pasupathy
Journal of Neuroscience 12 June 2024, 44 (24) e2260232024; https://doi.org/10.1523/JNEUROSCI.2260-23.2024
1Department of Biological Structure, University of Washington, Seattle, Washington 98195
2Washington National Primate Research Center, University of Washington, Seattle, Washington 98195

Abstract

Visual crowding refers to the phenomenon where a target object that is easily identifiable in isolation becomes difficult to recognize when surrounded by other stimuli (distractors). Many psychophysical studies have investigated this phenomenon and proposed alternative models for the underlying mechanisms. One prominent hypothesis, albeit with mixed psychophysical support, posits that crowding arises from the loss of information due to pooled encoding of features from target and distractor stimuli in the early stages of cortical visual processing. However, neurophysiological studies have not rigorously tested this hypothesis. We studied the responses of single neurons in macaque (one male, one female) area V4, an intermediate stage of the object-processing pathway, to parametrically designed crowded displays and texture statistics-matched metameric counterparts. Our investigations reveal striking parallels between how crowding parameters—number, distance, and position of distractors—influence human psychophysical performance and V4 shape selectivity. Importantly, we found that enhancing the salience of a target stimulus could alleviate crowding effects in highly cluttered scenes, and that this relief could be temporally protracted, reflecting a dynamic process. Thus, a pooled encoding of nearby stimuli cannot explain the observed responses, and we propose an alternative model in which V4 neurons preferentially encode salient stimuli in crowded displays. Overall, we conclude that the magnitude of crowding effects is determined not just by the number of distractors and target–distractor separation but also by the relative salience of targets versus distractors based on their feature attributes—the similarity of distractors and the contrast between target and distractor stimuli.

  • object recognition
  • primate
  • saliency computation
  • shape perception
  • temporal dynamics
  • ventral visual pathway

Significance Statement

Psychophysicists have long studied the phenomenon of visual crowding, but the underlying neural mechanisms are unknown. Our results reveal striking correlations between the responses of neurons in midlevel visual cortical area V4 and psychophysical demonstrations, showing that crowding is influenced not only by the number and spatial arrangement of distractors but also by the similarity of features between target and distractors, as well as among the distractors themselves. Overall, our studies provide strong evidence that the visual system uses strategies to preferentially encode salient features in a visual scene, presumably to process visual information efficiently. When multiple nearby stimuli are equally salient, the phenomenon of crowding ensues.

Introduction

In cluttered natural visual environments, object recognition capacity can be severely limited. This is well demonstrated by find-the-difference puzzles and inattentional blindness displays where even large objects can be missed (Rock et al., 1992; Simons and Chabris, 1999; Mack, 2003). Human psychophysical studies have explored this issue in detail for decades and have demonstrated that diminished object recognition in clutter, referred to as “crowding,” has several defining characteristics (Bouma, 1970; Levi, 2008; Pelli and Tillman, 2008; Whitney and Levi, 2011). First, the discriminability of an object (i.e., target) depends on the distance between the object and surrounding distractors: nearby distractors more severely limit discriminability. Second, crowding worsens with distance from the fovea: the target–distractor distance over which crowding operates increases with eccentricity (Bouma, 1970). Third, at a fixed target–distractor distance, the crowding zone is not circular but elongated toward the point of fixation, such that radially positioned distractors induce a stronger crowding effect than tangentially positioned ones (Toet and Levi, 1992; Petrov and Meleshkevich, 2011; Kwon et al., 2014). However, more recent studies have revealed that these principles may not universally apply to all stimulus configurations (Saarela et al., 2010; Manassi et al., 2012). Specifically, the spatial configuration of targets and distractors, not just their number and spatial separation, is thought to be critical for determining the strength of crowding (Herzog et al., 2015).

One prominent model posits that visual crowding represents the loss of information, a by-product of pooled stimulus encoding in visual cortex (Anderson et al., 2012; Zhaoping, 2019; Henry and Kohn, 2022). Concretely, single neurons in low- to midlevel stages of the ventral stream (e.g., V2, V4) are thought to encode the aggregated statistics of the image region within their receptive fields (RFs; i.e., the texture-pooling model), resulting in crowding (Parkes et al., 2001; Pelli and Tillman, 2008; Balas et al., 2009; Freeman and Simoncelli, 2011). An alternative model argues for a lossless encoding process in visual cortex; instead, crowding may result from limited attentional resolution during the decoding process for object recognition in higher processing stages (He et al., 1996; Burrows and Moore, 2009; Chaney et al., 2014). Still other models have been proposed, based on feature substitution between targets and distractors (Ester et al., 2015) or on inaccurate mapping after eye movements (Nandy and Tjan, 2012). These models, however, cannot account for all observed psychophysical results. For example, psychophysical findings demonstrating a reversal of crowding effects with an increase in the number of distractors indicate a role for global stimulus configurations in determining crowding (Manassi et al., 2012, 2013; Herzog et al., 2016). Furthermore, neurophysiological studies have seldom queried the responses of neurons to crowded displays (but see Motter, 2018; Henry and Kohn, 2020, 2022), leaving a significant gap in our understanding of the neuronal mechanisms underlying crowding.

In this study, we targeted macaque area V4, an intermediate stage in the ventral visual pathway known to be critical for object shape processing and recognition (Pasupathy et al., 2020). We used parametrically designed arrays of shape stimuli to investigate how shape selective responses of V4 neurons, in two awake, fixating macaque monkeys, were influenced by the manipulation of distance, number, and position of clutter stimuli. Across the V4 population, we evaluated whether the most effective distractor position varies based on the RF location, as predicted by anisotropy of crowding zones, providing a neurophysiological correlate to human psychophysical observations. We also sought to assess the appropriateness of the texture-pooling model by studying neuronal responses to texture statistics-matched metameric stimuli and crowded displays where we titrated target saliency. To gain insights into the temporal dynamics of representational processes that impact visual crowding, we conducted single trial population decoding analyses as a function of time. Lastly, based on our findings, we propose a novel model that selectively processes salient features in a cluttered visual scene to efficiently manage visual information.

Materials and Methods

Animal preparation

Two healthy adult macaque monkeys (Monkey 1: male, 9.0 kg, 9 years old; Monkey 2: female, 6.0 kg, 9 years old) participated in the study. Animals had custom-built head posts anchored to the skull with orthopedic screws. A V4 recording chamber was placed over the left prelunate gyrus based on structural MRI scans. A craniotomy was performed in a subsequent surgery, 1–2 d before the first recording date. All animal procedures conformed to NIH guidelines and were approved by the Institutional Animal Care and Use Committee at the University of Washington.

During experiments, animals were seated in a primate chair in front of an LCD monitor (57 cm away) and were required to hold their gaze within 1° of a small central fixation spot (0.1° diameter). As animals were engaged in this simple passive fixation task, a series of stimuli were presented in the visual periphery.

Data collection

Recordings were performed using an epoxy-insulated tungsten microelectrode (PHC). The microelectrode was lowered into the cortex with a hydraulic microdrive (MO-97A; Narishige). Signals were amplified, bandpass filtered (150 Hz to 8 kHz), and digitized (sampling rate 32 kHz) with a Plexon MAP system (RASPUTIN v2 HLK3 System; Plexon). Spike waveforms were sorted offline using principal component analysis (Offline Sorter; Plexon). Time stamps of single unit spiking activity, eye positions (EyeLink 1000; SR Research), and stimulus events (verified with a photodiode signal) were stored at a 1 kHz sampling rate for later analysis.

Experimental procedures

Once a well-isolated single unit was identified, we first estimated the position of the RF using a hand-mapping procedure as animals were engaged in a passive fixation task (see above, Animal preparation). This was followed by an automated mapping procedure on a 7 × 7 grid (1° step) centered on the RF estimated by hand-mapping. For this experiment, we used a circular random dot motion patch (diameter, 1.4°) and presented the stimulus for 300 ms, 10 times at each of the 49 grid locations in random order. The revised RF center was estimated by fitting a 2D Gaussian to the measured responses (for more details, see below, RF shape estimation) and the RF size was estimated based on the following equation [based on data from Gattass et al. (1988); validated by Pasupathy and Connor (2001)]:

RF diameter = 1.0° + 0.625 × RF eccentricity

Following the RF mapping, the main experiment was performed to evaluate the effect of clutter on shape responses. While a monkey maintained fixation, a series of 3–4 stimuli (see below, Clutter conditions) were presented, each for 300 ms with a 300 ms interstimulus interval, at the center of the RF. We studied the responses of 147 well-isolated V4 neurons with RF eccentricities within the central 10° of the visual field (mean ± SD: 5.45 ± 1.80° for Monkey 1; 5.48 ± 2.05° for Monkey 2). For all neurons, we collected a minimum of six repetitions for each of the 111 stimulus conditions described below.
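For concreteness, a minimal sketch of this size rule in Python (the function name and example values are ours, not the authors' code):

```python
# Sketch of the RF size rule [based on Gattass et al. (1988)].
# Diameter and eccentricity are in degrees of visual angle.
def rf_diameter_deg(rf_eccentricity_deg: float) -> float:
    return 1.0 + 0.625 * rf_eccentricity_deg

# Example: at the mean eccentricity of ~5.5 deg recorded here, the
# estimated RF diameter is ~4.4 deg, so a target scaled to 0.5 x the
# diameter would span ~2.2 deg.
print(rf_diameter_deg(5.5))  # 4.4375
```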

Clutter conditions

To determine how surrounding clutter modulates shape responses and selectivity of V4 neurons, we measured the responses of each neuron to a central target shape in the presence of an array of surrounding distractors that were systematically varied across trials. Overall, we had a total of 111 stimulus conditions (Fig. 1A) where a central target shape was presented at one of eight rotations (0–315°, 45° steps) either alone or in combination with surrounding distractors that varied in number, distance from central target, and their saliency. The following paragraphs describe our target and distractor shape and clutter arrangements in detail.

Figure 1.

Visual stimulus design. A, Tabulation of clutter conditions. For each target–distractor configuration (rows), responses to eight target rotations (# Conditions = 8) were evaluated, except when distractors were presented without a target (Target = None). An open circle under Metamer column indicates that metameric stimuli were also presented (see Materials and Methods). B, Shape stimulus set. Target (red box) and distractor shapes were chosen from a set of 2D shapes [a subset of shapes created by Pasupathy and Connor (2001)]. Target stimulus was at the RF center, scaled to half of the estimated RF diameter. C, The target stimulus was presented either (i) alone or in combination with various distractor arrangements which varied in terms of (ii) distance from central target, (iii) number, (iv) saliency defined by target color, and (v) shape, shape + size of distractors. In all conditions, the target shape was shown at eight rotations in 45° increments (as in i). Targets were achromatic or chromatic when presented alone. In all clutter conditions targets were achromatic except when target saliency was titrated by color (iv). Distractors were always achromatic. The target size was the same in all conditions, but it is scaled down in ii for illustration purposes. Distractors were the same size as target except when titrating saliency by size (yellow dot). Metameric stimulus pairs with matched texture statistics (in vi: panels 1–2, 3–4, and 5–6) were included to test the texture-pooling model (see Materials and Methods). Colored dots in A and C identify identical stimulus conditions repeated in the figure for illustration purposes.

Target and distractor shapes

The target and distractor shapes were chosen from a standard set of 2D stimuli constructed by a systematic combination of convex and concave boundary elements (Fig. 1B). In previous studies (Pasupathy and Connor, 2001; Kim et al., 2019), this shape set has been used successfully to evoke a broad range of responses, facilitating the characterization of shape selectivity in V4 neurons tuned to different boundary features, for example, convexities pointing up, concavities to the left, etc. For this study, we used one shape as the target stimulus (red box) across all recorded neurons. This shape has different convex and concave features along the boundary and when presented at eight rotations (0–315°, 45° steps), could evoke a broad range of responses from individual neurons (see Results). The target shape (achromatic; luminance, 20 cd/m²) was presented at the center of the RF, and its size was adjusted to 0.5 × RF diameter. The luminance of the achromatic background was set to 2.4 cd/m². The distractor shapes were randomly chosen from the same shape stimulus set. They were achromatic and their luminance was randomly set to one of 5.7, 9.8, or 15.4 cd/m². Distractors were arranged around the target stimulus at varying distances and sizes as noted below.

Target–distractor distance

Target shapes were surrounded by distractors positioned at three radial distances from the target: near, middle, or far (Fig. 1Cii). In the near distance condition, six distractors were evenly spaced and positioned at a radial distance equal to the RF radius of the neuron under study. For the middle and far distance conditions, the center-to-center distances between the target and distractors were set to 2× and 3× the RF radius, respectively. Since we estimated the V4 RF diameter as 1.0° + 0.625 × RF eccentricity, distractors in the near distance condition fall within the critical spacing described by Bouma's law (i.e., 0.4–0.5 × eccentricity), and those in the middle and far distance conditions fall outside it. To keep the distance between distractors (i.e., the density of distractors) constant across the three distance conditions, we used 12 and 18 distractors for the middle and far distance conditions, respectively. Distractor shapes were randomly chosen for each repetition, but they were the same across days. Thus, every neuron in our dataset was subjected to the same array of target + distractor stimuli, repositioned and scaled to match the RF center and size. To avoid stimulating the exact same positions with the distractor stimuli, distractors were evenly spaced around the circle, but the precise angular positions were randomly chosen. In addition, we introduced a minor spatial jitter in the horizontal and vertical positions of individual distractors, amounting to ∼10% of the size of the distractor stimulus.
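A minimal sketch of this placement scheme (illustrative names; this and the later sketches assume Python with NumPy and are not the authors' own analysis code):

```python
import numpy as np

# n distractors evenly spaced on a ring, with a random common angular
# phase and ~10% positional jitter, as described in the text.
def distractor_positions(n, ring_radius, distractor_size, rng=None):
    rng = np.random.default_rng(rng)
    offset = rng.uniform(0.0, 2 * np.pi / n)    # random angular phase
    angles = offset + np.arange(n) * 2 * np.pi / n
    x = ring_radius * np.cos(angles)
    y = ring_radius * np.sin(angles)
    jitter = 0.10 * distractor_size             # ~10% of distractor size
    x += rng.uniform(-jitter, jitter, size=n)
    y += rng.uniform(-jitter, jitter, size=n)
    return np.column_stack([x, y])              # positions rel. to RF center

# Example: six near distractors at one RF radius (2.2 deg) for a 4.4 deg RF.
print(distractor_positions(6, ring_radius=2.2, distractor_size=2.2, rng=0))
```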

Number of distractors

To determine how the number of distractors influences responses, we presented the target shape surrounded by one, three, or six distractors (Fig. 1Ciii). The distance between the central target and surrounding distractors was fixed as the radius of the RF. Distractors were evenly spaced along the angular dimension (e.g., 120° apart for three distractors, 60° for six distractors), but their angular positions and shapes were randomly chosen for each trial with a small spatial jitter as described above. The angular positions were carefully calibrated across 10 trial iterations to ensure uniform stimulation of the peripheral region of the RF. All neurons were tested with the same target–distractor arrangements.

Target saliency—color, shape, and size

To determine whether target saliency, that is, target contrast relative to distractors, resists the effects of crowding, we compared responses to the target + distractor arrays in the near distance condition discussed above with three conditions where target saliency was enhanced using different cues: color, shape, and size. In the color saliency set (Fig. 1Civ), the distractors were the same as in the near condition, but the target stimulus was defined by a chromatic contrast as well; its luminance contrast was the same as that of the achromatic target (20 cd/m²) in the near condition. In the shape saliency set (Fig. 1Cv, second panel), all distractors were circles to facilitate distractor grouping. In the shape + size saliency set (Fig. 1Cv, first panel), the distractors were 12 small circles, each half the size of the target (0.5 × target size).

Visual crowding models

Metameric stimuli: texture statistics model

To rigorously test whether neuronal responses to the clutter displays in our main experiment can be characterized as encoding pooled texture statistics, we created metameric stimulus images for a subset of our stimuli using the Portilla–Simoncelli texture statistics model (Portilla and Simoncelli, 2000; see examples in Fig. 1Cvi). The model extracts 740 texture synthesis parameters from each stimulus image and then iteratively modifies a white noise image until its parameters are matched with those from the original stimulus image. We directly compared responses to the original source with those to the synthesized images that share pooled texture statistics. We performed this comparison for three of the stimulus conditions where we presented six random shape distractors, six circle distractors, or 12 small circle distractors, all at a distance of RF radius from the target stimulus (Fig. 1A).
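The full Portilla–Simoncelli synthesis is involved; purely as a toy illustration of the underlying logic, the sketch below iteratively molds a noise image until a stand-in pooled statistic (blockwise mean and variance, rather than the model's 740 parameters) matches the source. It is not the texture model used in the study.

```python
import numpy as np

def pooled_stats(img, block=16):
    """Blockwise mean and variance: a toy stand-in for the 740
    Portilla-Simoncelli texture parameters. Assumes image dimensions
    are multiples of `block`."""
    h, w = img.shape
    b = img.reshape(h // block, block, w // block, block)
    return b.mean(axis=(1, 3)), b.var(axis=(1, 3))

def toy_metamer(source, iters=5, block=16, seed=0):
    """Iteratively adjust a noise image until its pooled statistics match
    the source. These two statistics match exactly in one step; the real
    model needs many iterations because its constraints interact."""
    synth = np.random.default_rng(seed).standard_normal(source.shape)
    t_mean, t_var = pooled_stats(source, block)
    up = lambda a: np.kron(a, np.ones((block, block)))  # back to pixel grid
    for _ in range(iters):
        c_mean, c_var = pooled_stats(synth, block)
        synth = up(t_mean) + (synth - up(c_mean)) * up(
            np.sqrt(t_var / (c_var + 1e-9)))
    return synth
```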

Saliency computation model

We built a three-stage model to simulate how preferential encoding of salient stimuli might arise in midlevel cortical stages. The visual inputs to the model were the stimulus displays discussed above, rescaled to 224 × 224 pixels. The first processing stage of the model extracts basic visual features using the first convolutional layer (Conv1) of AlexNet, a deep neural network pretrained to classify images into 1,000 object categories (Krizhevsky et al., 2017). This stage convolves the input image with the 64 filters of Conv1, and the resulting feature maps have dimensions of 55 × 55 pixels each. The critical next step is to combine multiple feature maps into a single saliency map. Drawing from prior work (Itti et al., 1998; Gao et al., 2008; Coen-Cagli et al., 2012; Erdem and Erdem, 2013), we quantified saliency as the absolute difference in intensity between the center and the surrounding regions. The center and surround regions were defined using spatially overlapping 2D Gaussian functions with a threefold difference in spatial scale (σ_center = 2 pixels; σ_surround = 6 pixels). Then, only those maps with a high center-surround difference exceeding a threshold (activation from the center region > 2× activation from the surround region) were linearly combined.
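A sketch of these three stages, assuming torchvision's pretrained AlexNet as the source of the Conv1 filters; the peak-based form of the threshold test is our reading of the rule described above:

```python
import numpy as np
import torch
from scipy.ndimage import gaussian_filter
from torchvision.models import alexnet, AlexNet_Weights

def saliency_map(image_hwc):
    """image_hwc: 224 x 224 x 3 float array. Returns a 55 x 55 saliency map."""
    conv1 = alexnet(weights=AlexNet_Weights.DEFAULT).features[0].eval()
    x = torch.as_tensor(image_hwc, dtype=torch.float32)
    x = x.permute(2, 0, 1).unsqueeze(0)              # -> 1 x 3 x 224 x 224
    with torch.no_grad():
        fmaps = conv1(x).squeeze(0).abs().numpy()    # 64 maps, 55 x 55 each
    saliency = np.zeros(fmaps.shape[1:])
    for fm in fmaps:
        center = gaussian_filter(fm, sigma=2.0)      # sigma_center = 2 px
        surround = gaussian_filter(fm, sigma=6.0)    # sigma_surround = 6 px
        # Keep only "informative" maps; this peak comparison is one reading
        # of "center activation > 2x surround activation".
        if center.max() > 2.0 * surround.max():
            saliency += np.abs(center - surround)    # center-surround contrast
    return saliency
```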

Data analysis

Time course of spiking responses

On each trial, we extracted spike trains (1 ms bin) aligned on stimulus onset. Peristimulus time histograms (PSTHs) were constructed by averaging spike trains across multiple repetitions and convolving with a Gaussian kernel (σ = 5 ms).
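A minimal version of this computation (Gaussian smoothing via SciPy; names are ours):

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

# PSTH: 1 ms bins aligned on stimulus onset, averaged over repeats,
# smoothed with a Gaussian kernel (sigma = 5 ms).
def psth(spike_times_per_trial, t_max_ms=400, sigma_ms=5.0):
    """spike_times_per_trial: list of arrays of spike times (ms, rel. onset)."""
    edges = np.arange(t_max_ms + 1)                    # 1 ms bin edges
    counts = np.stack([np.histogram(t, bins=edges)[0]
                       for t in spike_times_per_trial])
    rate = counts.mean(axis=0) * 1000.0                # spikes/s per 1 ms bin
    return gaussian_filter1d(rate, sigma=sigma_ms)     # Gaussian smoothing
```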

Average responses of single neurons

For each stimulus condition, we quantified the average response magnitude by counting spikes within a window from 0 to 400 ms after each stimulus onset (to include both onset and offset responses of V4 neurons) and averaging across multiple repetitions.

Quantification of distractor modulation

To quantify how the distractors that surround the central target modulate responses to the target stimulus, we computed a distractor modulation index (DMI), given by the following:

Distractor modulation index = CS / (C + S − B) − 1

where CS, C, S, and B represent neuronal responses in the target–distractor stimulus configuration, center target stimulus alone, surround distractors alone, and baseline (i.e., no stimulus) conditions, respectively. The denominator indicates the response level estimated by linear summation of the center target alone and surrounding distractors alone conditions. Negative DMI values indicate suppressive modulation.
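In code, with trial-averaged firing rates as inputs (a sketch; variable names mirror the equation):

```python
# Distractor modulation index as defined above.
def distractor_modulation_index(cs, c, s, b):
    """DMI = CS / (C + S - B) - 1; negative values indicate suppression."""
    return cs / (c + s - b) - 1.0

# Example: full-array response of 20 sp/s vs a linear prediction of
# target alone (25) + distractors alone (15) - baseline (5) = 35 sp/s.
print(distractor_modulation_index(20, 25, 15, 5))  # -0.43, i.e., 43% suppression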

Modulation of shape selectivity by distractors

Shape selectivity of individual neurons was assessed by performing a one-way analysis of variance (ANOVA) on the responses to the eight rotations of the target shape presented in isolation. The target shape stimulus we used for this study is characterized by multiple boundary features (three sharp convexities, two concavities, and a broad convexity). For different rotations of the stimulus, these features occupy different angular positions relative to object center. Depending on the feature preferences of the neuron, the responses as a function of target rotation may not be unimodal; if a neuron responds preferentially to a sharp convexity at the top relative to object center, shapes at 0, 90, and 180° would evoke strong responses. Our shape selectivity metric simply assesses whether responses varied with rotation. Neurons that showed significant (p < 0.05) shape selectivity (73/147; 49.3%) based on this one-way ANOVA were subjected to the following additional analyses. To quantify the effects of distractors on shape selectivity, we evaluated the similarity of shape tuning between the target alone condition and each of the target–distractor conditions by calculating r̂_ER, an unbiased estimate of the correlation between responses in the two stimulus conditions (Pospisil and Bair, 2022). This metric corrects the downward bias of the squared correlation (r²) caused by noise. Briefly, it takes the square root of spike counts from individual trials to stabilize variance and computes noise terms reflecting the trial-to-trial variability. Subtracting these noise terms from the numerator and denominator of r² gives the unbiased r̂²_ER. We took the square root of r̂²_ER, set the sign of r̂_ER to that of the Pearson correlation coefficient, and truncated the values to lie in the [−1, 1] range. The statistical significance of r̂_ER was assessed by comparing the observed value with a distribution of r̂_ER resulting from 1,000 bootstrap samples in which target rotation orders were randomly shuffled. A significant correlation coefficient implies that shape selectivity survived despite the presence of distractors.
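The sketch below shows the selectivity screen and a simplified noise-corrected correlation in the spirit of r̂_ER; it subtracts noise terms from the denominator only, so it is not the full Pospisil and Bair (2022) estimator:

```python
import numpy as np
from scipy.stats import f_oneway

def is_shape_selective(counts, alpha=0.05):
    """counts: n_rotations x n_trials spike counts, isolated target.
    One-way ANOVA across the eight target rotations."""
    return f_oneway(*counts).pvalue < alpha

def noise_corrected_r(counts_a, counts_b):
    """Noise-corrected tuning correlation between two conditions
    (each n_conditions x n_trials), after sqrt variance stabilization.
    Simplified: corrects the denominator only, unlike the full r_ER."""
    a, b = np.sqrt(counts_a), np.sqrt(counts_b)
    ma, mb = a.mean(axis=1), b.mean(axis=1)              # tuning curves
    noise_a = a.var(axis=1, ddof=1).mean() / a.shape[1]  # noise var of mean
    noise_b = b.var(axis=1, ddof=1).mean() / b.shape[1]
    num = np.cov(ma, mb)[0, 1]
    den = np.sqrt(max(ma.var(ddof=1) - noise_a, 1e-12) *
                  max(mb.var(ddof=1) - noise_b, 1e-12))
    return float(np.clip(num / den, -1.0, 1.0))          # truncate to [-1, 1]
```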

Average shape tuning curves across the population

To construct average shape tuning curves across the population for each clutter condition, we first normalized the tuning curves from each neuron by the responses to the most preferred target shape from the “target alone” condition and averaged the normalized tuning curves across all neurons.

Temporal dynamics of target shape selectivity

To determine the timing of target shape selectivity in each of the distractor conditions, we divided responses into two groups (four preferred vs four nonpreferred targets) based on target preference determined from the target alone condition and asked when the responses of the two groups deviated significantly from each other, using the Mann–Whitney U test (p < 0.05) within a 30 ms sliding window (moving in 1 ms steps).
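A direct transcription of this test (a sketch; the input layout is our assumption):

```python
import numpy as np
from scipy.stats import mannwhitneyu

# 30 ms windows moved in 1 ms steps; Mann-Whitney U test at p < 0.05.
def selectivity_latency(pref_trials, nonpref_trials, win=30, alpha=0.05):
    """Inputs: trials x time (1 ms bins) spike matrices. Returns the first
    significant time point (window start, ms) or None."""
    for t in range(pref_trials.shape[1] - win):
        a = pref_trials[:, t:t + win].sum(axis=1)      # counts per trial
        b = nonpref_trials[:, t:t + win].sum(axis=1)
        if mannwhitneyu(a, b).pvalue < alpha:
            return t
    return None
```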

RF shape estimation

We sought to examine whether V4 neuron responses exhibit anisotropies in the effect of clutter as a function of the distractor position relative to the RF center. For each neuron, we obtained the RF map from spiking responses elicited during the automated RF mapping paradigm (see above, Experimental procedures). We then fitted a two-dimensional elliptical Gaussian function to the map to determine the position, size (full-width at half-maximum = 2.355 × σ), and the orientation of the long axis of the RF.

A two-dimensional elliptical Gaussian function is expressed as follows:

f(x, y) = A0 + A × exp(−[a(x − x0)² + 2b(x − x0)(y − y0) + c(y − y0)²])

with

a = cos²θ / (2σX²) + sin²θ / (2σY²)
b = sin(2θ) / (4σX²) − sin(2θ) / (4σY²)
c = sin²θ / (2σX²) + cos²θ / (2σY²)

where A0 is a constant, A is the scale factor, and x0 and y0 are the center of the Gaussian along the x and y axes, respectively. Parameters a, b, and c are set as above, where θ represents the counterclockwise rotation angle of the Gaussian blob. A value of 0 for θ indicates that the blob is pointing to the right.
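A sketch of this fit using SciPy's curve_fit (the initialization heuristics are ours):

```python
import numpy as np
from scipy.optimize import curve_fit

def elliptical_gaussian(xy, A0, A, x0, y0, sx, sy, theta):
    """2D elliptical Gaussian as parameterized in the text."""
    x, y = xy
    a = np.cos(theta)**2 / (2 * sx**2) + np.sin(theta)**2 / (2 * sy**2)
    b = np.sin(2 * theta) / (4 * sx**2) - np.sin(2 * theta) / (4 * sy**2)
    c = np.sin(theta)**2 / (2 * sx**2) + np.cos(theta)**2 / (2 * sy**2)
    return A0 + A * np.exp(-(a * (x - x0)**2
                             + 2 * b * (x - x0) * (y - y0)
                             + c * (y - y0)**2))

def fit_rf(response_map, gx, gy):
    """response_map: 7 x 7 mean rates; gx, gy: matching coordinate grids."""
    xy = np.vstack([gx.ravel(), gy.ravel()])
    peak = response_map.argmax()
    p0 = [response_map.min(), np.ptp(response_map),
          gx.ravel()[peak], gy.ravel()[peak], 1.0, 1.0, 0.0]
    popt, _ = curve_fit(elliptical_gaussian, xy, response_map.ravel(), p0=p0)
    fwhm = 2.355 * np.abs(popt[4:6])   # RF size (FWHM) along the two axes
    return popt, fwhm
```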

Determination of RF hotspot

The 2D Gaussian model described above assumes an RF shape that is symmetric about the RF center. Thus, it does not allow for quantifying differences in RF extent between the two directions along the radial axis. To determine whether the RF extent in the outward radial direction is larger than the extent in the inward radial direction, we defined the RF hotspot as the center of mass of the RF positions that evoke responses exceeding 90% of the peak response.
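One reading of this definition, as a response-weighted mean over supra-threshold grid positions (the weighting is our assumption):

```python
import numpy as np

def rf_hotspot(response_map, gx, gy, frac=0.90):
    """Center of mass of grid positions whose response exceeds `frac` of
    the peak. gx, gy: coordinate grids matching response_map."""
    mask = response_map >= frac * response_map.max()
    w = response_map[mask]
    return (gx[mask] * w).sum() / w.sum(), (gy[mask] * w).sum() / w.sum()
```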

Realigning distractor positions based on radial and tangential axes from the fixation point

To explore whether V4 neurons represent the anisotropy observed in visual crowding, we determined the radial and tangential axes for each neuron based on their RF locations. The radial axis is defined by the line connecting the RF center to the fixation point, and the tangential axis is orthogonal to it. Subsequently, we realigned the 10 single distractors’ positions onto these established axes. The original positions of the distractors were spaced 36° apart, but to accurately reflect the orthogonal radial and tangential axes, we divided the repositioned positions into 12 overlapping bins, each spanning 60° and spaced 30° apart.
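A sketch of the realignment, with fixation at the origin so the radial axis runs from fixation through the RF center; 0° denotes the outward radial direction (names are illustrative):

```python
import numpy as np

def realign_to_radial(rf_center_xy, distractor_xy):
    """Express distractor angles relative to the outward radial direction."""
    radial = np.arctan2(rf_center_xy[1], rf_center_xy[0])   # outward axis
    rel = distractor_xy - rf_center_xy                      # rel. to target
    ang = np.arctan2(rel[:, 1], rel[:, 0]) - radial         # 0 = radial out
    return np.degrees(ang) % 360.0

def bin_membership(angles_deg, centers=np.arange(0, 360, 30), halfwidth=30.0):
    """12 overlapping 60-deg bins spaced 30 deg apart, as in the text."""
    d = np.abs((angles_deg[:, None] - centers[None, :] + 180) % 360 - 180)
    return d <= halfwidth            # n_distractors x 12 boolean membership
```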

Population decoding analysis

To examine the temporal evolution of target shape processing in V4 neurons under various target–distractor conditions, we assessed population decoding performance over time using a 100 ms sliding window. Using the shape selective neurons identified above (see above, Modulation of shape selectivity by distractors), the analysis proceeded as follows: a linear discriminant analysis (LDA) model was trained using responses from a subpopulation (N = 30) under the target + far distance distractor condition. The fitted LDA model was then used to predict the target shape across all other target–distractor conditions (chance level, 0.125). This procedure was repeated 100 times, with 30 neurons randomly selected without replacement in each iteration.
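A sketch of this iteration loop with scikit-learn's LDA (the data layout is assumed, not the authors'):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Train on the far-distractor condition, test on another condition,
# repeated over random 30-neuron subpopulations. X arrays are assumed to
# be (trials x neurons) pseudopopulation responses in one time window;
# y arrays hold the eight target rotation labels.
def decode_target(train_X, train_y, test_X, test_y,
                  n_neurons=30, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    acc = np.empty(n_iter)
    for i in range(n_iter):
        idx = rng.choice(train_X.shape[1], size=n_neurons, replace=False)
        lda = LinearDiscriminantAnalysis()
        lda.fit(train_X[:, idx], train_y)            # far-distractor data
        acc[i] = lda.score(test_X[:, idx], test_y)   # other condition
    return acc.mean()                                # chance = 1/8 = 0.125
```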

Experimental design and statistical analysis

Details of the experimental procedures and visual stimuli are described above (see above, Data collection and Clutter conditions). The shape selectivity of individual neurons across the eight target rotations was assessed by conducting a one-way ANOVA. The strength of the linear relationship between pairs of variables (e.g., responses from “target alone” and “target + distractor” conditions) was assessed by computing an unbiased correlation metric, r̂_ER (the square root of r̂²_ER, the unbiased estimate of r² between two sets of noisy neural responses; Pospisil and Bair, 2022). The Wilcoxon signed-rank test was used for paired comparisons of correlation coefficients or modulation indices between clutter conditions in each neuron. Independent group comparisons were performed using a nonparametric Mann–Whitney U test. A p value of 0.05 or less was considered significant.

Data and software availability

The data and analysis code that support the findings of this study are available from the corresponding author upon request.

Results

We studied the responses of 147 V4 neurons in two awake, fixating macaque monkeys (66 in Monkey 1, 81 in Monkey 2) to examine how their responses and selectivity to an isolated shape are modified by the presence of surrounding clutter stimuli. We used a large set of systematically designed stimulus arrays (Fig. 1) where we varied the number, distance and position of the distractor stimuli, and the salience of target stimuli, to probe how V4 responses correlate with the perceptual characteristics of crowding.

Effects of target–distractor distance on target shape selectivity

A fundamental psychophysical observation is that crowding effects dissipate with increasing distance between target and distractors (Bouma, 1970; Levi, 2008; Whitney and Levi, 2011). Specifically, target–distractor distance that is less than approximately half of the target eccentricity impairs target discrimination (Bouma, 1970; Pelli and Tillman, 2008). To determine if V4 responses exhibit a similar trend, we examined how shape selectivity and response strength are modulated by distractors at three different distances from the target stimulus centered within the RF. The results from an example neuron are shown in Figure 2A–C. When the target stimulus was presented in isolation, this neuron showed diverse responses across stimulus orientation (Fig. 2A, red). When the distractors were far from the target (Fig. 2A, second row), responses were weaker than for target alone (distractor modulation index = −27%), especially for the preferred target orientations (compare columns 1–4 for rows 1 and 2). Despite this, the responses showed considerable variability across target orientations (one-way ANOVA; p < 0.01) and the overall tuning curve was similar to that when the target was in isolation (Fig. 2B, compare red and light gray curves), implying that shape selectivity was well maintained [correlation coefficient (r̂_ER) = 0.99; p < 0.01] despite suppressive modulation by surrounding distractor stimuli. The results were similar when the distractors occupied an intermediate distance (Fig. 2A, third row; Fig. 2B, dark gray). However, when the distractors were close to the target (i.e., separated by RF radius, within Bouma's area; see Materials and Methods), the variation in responses across target orientation was weak (Fig. 2B, compare black vs other lines), and as expected, there was a weak correlation between the tuning curves for target alone and target + near distractors (r̂_ER = 0.10; p = 0.49).

Figure 2.

Effect of target–distractor distance on shape tuning. A–C, Example neuron. A, Raster plots and PSTHs for responses to (rows) target alone, target + far, middle, and near distractor conditions, respectively. Columns show responses to different target orientations, rank-ordered by responses to target-alone. Stimulus panels are shown here at a higher contrast to aid visibility (see Fig. 1C for veridical illustration). B, Tuning curves based on average responses (0–400 ms) for the four target + distractor conditions. Error bars indicate the standard error of the mean. C, Average PSTHs for the preferred (top 4) and nonpreferred (bottom 4) targets for each distractor condition are shown in solid and dotted lines, respectively. Black asterisks indicate time points with significant difference between solid and dotted curves (Mann–Whitney U test in a 30 ms sliding window; p < 0.05). D–F, Population results. D, Average normalized tuning curves across the subpopulation of neurons with significant shape selectivity in the target alone condition (73/147). Error bars indicate the standard error of the mean. E, The distribution of distractor modulation index (DMI) for each distractor condition is presented. Filled bars represent neurons whose shape tuning curve exhibited statistically significant correlations with that in the “target alone” condition. Red symbols denote the mean correlation values in neurons within the lower (0–50%), middle (25–75%), and upper (50–100%) ranges of the DMI distribution. F, The proportion of neurons with significant shape-dependent modulation (i.e., asterisks in C) as a function of time for each distractor condition. The red shaded area, representing the equivalent analysis for the “target alone” condition, is included for panel-to-panel comparison.

Figure 2C illustrates the time course of target shape selectivity as a function of distractor distance. For far and intermediate distractor distances, a statistically significant difference between preferred (solid lines) and nonpreferred (dotted lines) targets emerged early (soon after visual response onset), but this was not the case for near distractors.

We observed similar results across our subpopulation of 73 shape selective neurons. Population average tuning curves for target + distractor conditions became shallower (Fig. 2D) and correlation with the target alone condition decreased [mean correlation coefficient (r̂_ER): 0.75 → 0.67 → 0.24] as target–distractor distance decreased. All pairwise differences between the three distributions were statistically significant (Wilcoxon signed-rank test; p < 0.01). The magnitude of suppressive modulation by surrounding distractors was also strongest in the near distractor condition (46% decrease) and weakest in the far distractor condition (31% decrease; Wilcoxon signed-rank test; p < 0.01; Fig. 2E). In all three distance conditions (far, middle, and near), the average correlation between tuning curves for the “target alone” and “target + distractors” conditions showed a slight increase as the magnitude of suppressive modulation decreased (Fig. 2E, red symbols). The time course of emergence of target shape selectivity across the population was also very similar for the target alone and target + far/middle distractor conditions (Fig. 2F): for all except the near distractor case, the proportion of significantly modulated neurons showed a rapid increase ∼50 ms after stimulus onset, reaching peak values at ∼100 ms.

Effects of distractor number on target shape selectivity

Psychophysical studies have also documented that the effects of crowding increase with the number of distractors (Felisberti et al., 2005; Põder and Wagemans, 2007). To assess the influence of the number of distractors on V4 shape selectivity, we compared responses in the “target alone” condition with those from target + one, three, and six distractor conditions. In this case, all distractors were placed at the same distance from the target (i.e., 0.5 × RF diameter; Fig. 3A). The results from an example neuron (Fig. 3, example unit #2) show that target shape preference was consistent between the “target alone” and “target + one distractor” conditions [Fig. 3A,B, correlation coefficient (r̂_ER) = 0.99; p < 0.01]. However, as the number of distractors increased, shape-dependent response variation gradually decreased, and shape selectivity could no longer be observed in the “target + six distractors” condition (one-way ANOVA; p = 0.78). Consequently, the separation between average PSTHs for preferred (solid line) and nonpreferred (dotted line) targets decreased with increasing distractor number (Fig. 3C).

Figure 3.

Effect of distractor number on responses and selectivity. A–C, Example neuron responses. A, Raster plots with PSTHs for responses to (rows) target alone, target + 1, 3, or 6 distractors, respectively. Columns show responses to different target orientations. B, Target shape selectivity curves of the example unit for the four different conditions. C, Average PSTHs for the preferred (top 4) and nonpreferred (bottom 4) targets. D–F, Population results. D, Average normalized tuning curves for target shape selectivity in the presence and absence of distractors. E, The distribution of distractor modulation index (DMI) for each distractor condition is presented. Filled bars represent neurons whose shape tuning curve exhibited statistically significant correlations with that in the “target alone” condition. Red symbols denote the mean correlation values in neurons within the lower (0–50%), middle (25–75%), and upper (50–100%) ranges of the DMI distribution. F, The proportion of neurons with significant shape-dependent modulation (i.e., asterisks in C) as a function of time for each distractor condition. All conventions are as in Figure 2.

Population results confirm that more numerous distractors generally induce stronger suppression of preferred responses and a decline in shape selectivity (Fig. 3D–F). As the number of distractors increased (1 → 3 → 6), the mean correlation coefficient with the target alone condition decreased (mean r̂_ER: 0.71 → 0.45 → 0.24). The magnitude of suppressive modulation in the one, three, and six distractor conditions gradually increased from −37 to −41 to −46%, respectively. However, within a given distractor number condition, stronger suppressive modulation by distractors was not associated with weaker shape selectivity (Fig. 3E, red symbols). Notably, the strength of suppression from distractors did not increase linearly: three or six times as many distractors did not produce three or six times stronger surround suppression.

Anisotropy of visual crowding in area V4

In addition to the effects of distractor distance and number, psychophysical experiments have reported a significant inward–outward anisotropy in visual crowding (Petrov et al., 2007; Whitney and Levi, 2011; Nandy and Tjan, 2012). Along the radial axis connecting the target position and the fixation point, studies have found greater interference from distractors positioned beyond the target than from those between the target and the fixation point (Fig. 4C). To examine whether V4 provides a neural correlate for this psychophysical finding, we compared the strength of the visual crowding effect across various near-surround positions.

Figure 4.

Effects of distractor position on visual crowding: anisotropic crowding zone. A, B, RF summary illustrates both location (A) and size (B). RFs from two monkeys are positioned in the lower right visual field and their sizes increase with eccentricity. Red line in B denotes the values computed using the equation employed to estimate the RF diameter (see Materials and Methods). C, For each neuron, the positions of the distractors were transformed into a radial/tangential axis relative to the fixation point. The yellow circles depict two example RF locations. The red, green, and blue arrows indicate the radial outward, radial inward, and tangential directions with respect to the target location. D, The histograms illustrate the correlation between tuning curves for target alone versus target + one distractor calculated at 12 distractor positions, which were realigned based on the radial and tangential axes originating from the fixation point. Filled bars indicate statistically significant cases. In the polar plot, red and black data points compare the median values (red vertical lines) and proportion of significant cases (filled bars) from histograms of the matched directions, respectively. E, RF shape estimation from four example neurons. A 2D Gaussian fit (white ellipses) is superimposed on the raw response map (7 × 7 grid, 1° intervals). Theta angle represents the counterclockwise angle of the major axis of RF with respect to the horizontal line. Red and blue dots indicate Gaussian fit center and RF hotspot, respectively. F, Distribution of the major axis orientation in the Gaussian RF fitting of the recorded neurons. Red vertical line indicates the median value. For the RFs in the lower right visual field, an angle bigger than 90° represents an elongated RF shape oriented toward the fixation point. G, In most neurons, the RF hotspot (y-axis) is closer to the fixation point than the center of the 2D Gaussian fit (Wilcoxon signed-rank test; p < 0.05), suggesting that the RF extent is larger in the outward radial direction compared with the inward direction. The red line represents the unity line.

In the “target + one distractor” condition, we studied responses to targets in the presence of a single distractor positioned at each of 10 evenly spaced near-surround locations on each trial (Fig. 1Ciii; see Materials and Methods). From these data, for each neuron we computed the correlation between “target alone” and “target + one distractor” tuning at each of the 10 locations. These correlation values underestimate the true correlation because they are based on a single repeat (Pospisil and Bair, 2022), but we can aggregate the data across neurons to look for population trends. We determined the radial/tangential axes based on the RF location of each neuron (Fig. 4C) and then realigned distractor locations onto these axes (see Materials and Methods; Fig. 4D).

Along the radial axis, the correlation in shape tuning between target alone and target + one distractor was significantly weaker for outward than for inward positions (compare 0 vs 180° in the polar plot; median r: 0.29 vs 0.56; Mann–Whitney U test: p < 0.01; proportion of significant correlation cases: 18.0 vs 29.2%), consistent with the anisotropy observed in psychophysical studies. Psychophysical studies have also demonstrated a radial versus tangential anisotropy (Whitney and Levi, 2011; Nandy and Tjan, 2012; Kwon et al., 2014). While we were unable to directly assess this phenomenon because we did not place radial or tangential distractors in pairs, our RF shape analysis revealed that a majority of recorded V4 cells exhibited an elongated RF shape oriented toward the fixation point (Fig. 4E,F). We also found that most neurons exhibited a skew in their RF profile: the RF hotspot associated with the strongest response (Fig. 4E, blue dots) was shifted toward the fixation point relative to the center of the 2D Gaussian fit (Fig. 4E, red dots; Wilcoxon signed-rank test; p < 0.05; Fig. 4G). This observation corroborates previous findings (Motter, 2009) and suggests that visual stimuli arranged radially (and outwardly) may exert a stronger influence on V4 neuronal activity than those arranged tangentially (and inwardly).

Visual crowding in V4 cannot be explained by pooled encoding of nearby stimuli

One popular model to explain visual crowding posits that nearby stimuli within the RF of neurons in low- to midlevel visual cortex may be pooled and encoded, for example, in terms of their summary statistics (Parkes et al., 2001; Pelli and Tillman, 2008; Balas et al., 2009; Freeman and Simoncelli, 2011). To test this hypothesis rigorously, we pursued two additional lines of inquiry. First, we asked whether enhancing the saliency of the target stimulus alleviates crowding effects (Fig. 5), reasoning that it would not if the effect were due to pooled encoding. Second, we asked whether metameric versions of crowded displays evoked similar responses (Fig. 6), reasoning that they would if the representation were based on pooled image statistics.

Figure 5.

Effects of target saliency (color cue) on visual crowding. Target (gray or colored) appeared alone or in combination with six random distractors. A–C, Example unit responses. A, Raster plots with PSTHs. B, Target shape selectivity curves of the example unit for the four different conditions. C, Average PSTHs for the preferred (top 4) and nonpreferred (bottom 4) targets. D–F, Population results. D, Average normalized tuning curves for target shape selectivity across conditions. E, The distribution of distractor modulation index (DMI) for each distractor condition. Filled bars represent neurons whose shape tuning curve exhibited statistically significant correlation with that in the “target alone” condition. Red symbols denote the mean correlation values for neurons within the lower (0–50%), middle (25–75%), and upper (50–100%) ranges of the DMI distribution. F, The proportion of neurons with significant shape-dependent modulation as a function of time for each distractor condition. All conventions are as in Figure 2.

Figure 6.

Effects of target saliency (shape, size cues) on visual crowding. Target appeared alone or in combination with 12 small circles, six circles, or six random distractors. A–C, Example unit responses. A, Raster plots with PSTHs. B, Target shape selectivity curves of the example unit from four different conditions. C, Average PSTHs for the preferred (top 4) and nonpreferred (bottom 4) targets. D–F, Population results. D, Average normalized tuning curves for target shape selectivity across conditions. E, The distribution of distractor modulation index (DMI) for each distractor condition. Filled bars represent neurons whose shape tuning curve exhibited statistically significant correlation with that in the “target alone” condition. Red symbols denote the mean correlation values for neurons within the lower (0–50%), middle (25–75%), and upper (50–100%) ranges of the DMI distribution. F, The proportion of neurons with significant shape-dependent modulation as a function of time for each distractor condition. All conventions are as in Figure 2.

Enhancing saliency

In the preceding experiments, the distractors and target shape were achromatic, possessed bounding contours with varying curvature, and were sized to be ∼0.5 × RF diameter in linear extent (see Materials and Methods). Thus, the target stimuli did not stand out relative to the distractors. We used two strategies to enhance target saliency. First, the central target was defined by a chromatic contrast in addition to the luminance contrast, and we asked how this influenced the strength of responses and shape selectivity.

Figure 5 compares responses to achromatic (gray) and color target stimuli surrounded by six randomly shaped distractors. Importantly, gray and color targets were defined by the same luminance contrast, but the color targets were also defined by a chromatic contrast. For the example neuron in Figure 5A–C, the shape responses and tuning curves under the “target alone” condition were very similar for the gray and color targets, indicating that shape selectivity was not affected by target stimulus color for this neuron. However, in the “target–distractor” conditions, the results were markedly different. When the gray target was surrounded by gray distractors (black curves), target shape selectivity was completely abolished. In contrast, when the central target had a different color (light blue curves), the presence of surrounding distractors had little impact on shape selectivity (Fig. 5B).

The population-averaged tuning curves of the 62 shape-selective neurons (62/135; 45.9%) reaffirmed that shape selectivity for color targets was more pronounced than for gray targets in the presence of multiple achromatic distractors (Fig. 5D–F). Despite comparable suppressive modulation by gray distractors for both color and gray targets (−0.41 vs −0.39), a substantially higher proportion of neurons exhibited significant shape tuning for color targets than for gray targets (21/62 vs 12/62; Fig. 5E). While the effects were strong in some single units (as in Fig. 5B), across the population the effect was weaker than that observed in the far distractor or single distractor conditions (Figs. 2D, 3D, light gray curves). However, it is important to note that the magnitude of these effects is comparable with attentional modulation in V4 populations with stimuli of intermediate contrast (Reynolds et al., 2000). A more pronounced finding was that the onset of target shape selectivity was significantly delayed in the “color target + gray distractor” condition compared with the color or gray “target alone” conditions. This delay was observed in both the single neuron and the population data (Fig. 5C,F).

In a second experiment, rather than modifying the target directly, we enhanced its saliency by presenting distractors of the same shape (all circles) that could be perceptually grouped together (Wagemans et al., 2012; Chang and Gauthier, 2022). Similar to the randomly shaped distractors, these circle distractors were achromatic and positioned at a distance of 0.5 × RF diameter, but they were either the same size as the target (Fig. 6A, third panel) or smaller (Fig. 6A, second panel), further enhancing the saliency of the central target. The results obtained from an exemplar neuron (Fig. 6A–C) demonstrate that shape selectivity is better preserved when all the distractors share a circular shape [correlation coefficient, r̂_ER: 0.71 (random) vs 0.84 (circle)], and this effect is further enhanced when the circles are smaller than the target shape (r̂_ER = 0.95). Across the population (57/133; 42.9%), as more salient cues were added to the target in cluttered scenes (shape cues → shape + size cues), we observed an increase in the slope of the target shape tuning curve (Fig. 6D, black vs gray curves) and enhanced shape selectivity (see proportions of filled bars in Fig. 6E). This finding suggests that salient cues enhance the discriminability of target shapes in cluttered scenes based on V4 responses.

The analysis of the temporal dynamics of target shape selectivity once again highlighted notable distinctions between the target alone condition and the condition involving a salient target in a cluttered scene. For the example neuron (Fig. 6C), the PSTHs across the different conditions showed a similar profile in terms of response onset and peak time, but a striking contrast in when significant differences emerged between preferred and nonpreferred responses. For target alone, significant differences emerged soon after response onset, but much later (after peak time) for salient targets presented in cluttered scenes (Fig. 6C,F).

Comparison with metameric stimuli

To investigate the relationship between texture-like representation and visual crowding, we used a texture synthesis model to create synthetic images that differ physically from the originals (i.e., target + distractors) but have matched texture statistics within the RF (see Materials and Methods). If a single neuron in area V4 encodes a cluttered visual scene using texture-like summary statistics, then its responses to the originals should correlate strongly with its responses to the synthetic images and relatively weakly with “target alone” responses. Our results contradict this prediction. Responses to “target + distractors” were better correlated with responses to “target alone” than with those to texture statistics-matched synthetic (metameric) images (Fig. 7A). Moreover, this effect was more pronounced when the target was salient (i.e., “target + circle,” “target + small circle” distractors; Fig. 7B,C). This indicates that V4 neurons encode the shape information of a salient target, which is not captured by the texture statistics model.

Figure 7.

Test of texture statistics model. A–C, Test of the texture statistics model for visual crowding. A, For each neuron, we computed two correlations: (1) the correlation between responses to “target + random distractor” and “target alone” stimuli and (2) the correlation between responses to “target + random distractor” stimuli and matched metamers. Population data are shifted below the diagonal suggesting that responses to target + random distractors are better correlated with the “target alone” condition (x-axis). Cells with a significant correlation (p < 0.05) in x-axis alone, y-axis alone, or both are identified (see legend). B, C, The same analyses as in A for the “circle distractor” (B) and “small circle distractor” (C) conditions in which the target is more salient.

Temporal dynamics of perceptual grouping

To understand the time course and efficacy of saliency- and perceptual grouping-related processes in alleviating crowding, we quantified the target shape decoding performance as a function of time using a 100 ms sliding window for the different classes of stimuli tested here. Given that we studied all neurons with the same set of stimuli and every trial was unique (in terms of target and distractor placement combinations), we constructed a pseudopopulation across all recording sessions. Using the far distractor condition as the training dataset, we first used LDA to build a classifier for target stimulus orientation (see Materials and Methods). This classifier was then used to decode target orientation across all other stimulus conditions.

For the achromatic target alone condition, decoding performance (red, all panels) showed an immediate increase after stimulus onset, reaching its peak ∼100 ms afterward. As expected, the decoding performance declined in the distractor distance (Fig. 8A) and number (Fig. 8B) conditions, but the dynamics were quite similar in terms of rise and peak times (Fig. 8A,B, compare gray and red lines). For the condition where we presented six distractors close to the target (black lines), decoding performance remained close to chance level (0.125, 1 out of 8).

Figure 8.

Temporal dynamics of population decoding. Population decoding performance plotted as a function of time across different distractor conditions. In all four panels, target alone (red) and target + 6 near distractors (black) are identical. Decoding performance declines in the presence of distractors but time course varies across conditions. For the salient target conditions (light blue curve in C, gray curves in D), rise time and the maximum decoding performance time are delayed compared with target alone conditions (red and blue curves in C,D), but this is not the case for distance and number effects (gray curves in A,B). Green line indicates the chance level of target orientation decoding (0.125, 1 out of 8). Different colored lines represent different target–distractor configurations.

However, when multiple distractors were in close proximity to the target, the temporal evolution of decoding performance displayed a marked dependency on salience. When a target was made salient by a color cue, decoding performance increased at a slower rate than in the target-only condition, reaching its peak ∼200 ms after target onset (Fig. 8C). Similarly, when saliency was titrated by distractor shape/size cues (Fig. 8D), we also observed a noticeable delay in when decoding performance began to increase. This finding strongly supports the notion that the effects of target saliency and those of distractor distance/number are mediated by distinct neuronal mechanisms.

Alternative model: saliency computation

Our findings indicate the need for an improved model of visual scene encoding that not only explains visual crowding effects but also incorporates the computation of visual saliency. Specifically, our results suggest that salient stimuli, defined as stimuli with high contrast in visual features relative to nearby stimuli, may be preferentially encoded in V4. Many researchers have proposed biologically plausible models for computing visual saliency (Itti et al., 1998; Li, 2002; Itti, 2005; Coen-Cagli et al., 2012). Although these models vary in their details, they share two common aspects: (1) early visual areas extract basic visual features of an image, such as orientation, luminance, color, and texture, in parallel and pass the outputs to the next processing stage, and (2) later stages integrate differentially weighted feature-based representations to create a saliency map. Here we incorporate this saliency map strategy to schematize an alternative model for the processing of visual scenes with multiple shapes that could account for the observed crowding effects in area V4 (Fig. 9). In the first processing stage, the visual input undergoes parallel processing through a bank of low-level feature detectors (linear filtering) for attributes such as orientation, color, luminance, and texture. For our simulation, we used the 64 Conv1 layer filters from AlexNet (Krizhevsky et al., 2017), but more elaborate filter banks, including filters at multiple scales, could be substituted. The output of the linear filtering process (convolution of the input image with each of the 64 linear filters) produces the feature maps, which are illustrated for two example stimuli.
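A minimal sketch of this first stage, using PyTorch and the pretrained AlexNet Conv1 filters, follows. Only the use of the 64 Conv1 filters is taken from the text; the file name and crop indices are illustrative.

```python
import torch
import torchvision.models as models
import torchvision.transforms.functional as TF
from PIL import Image

# First convolutional layer of pretrained AlexNet: 64 filters, 11 x 11,
# stride 4. These serve as the bank of low-level feature detectors.
alexnet = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1).eval()
conv1 = alexnet.features[0]

img = Image.open("stimulus.png").convert("RGB")  # hypothetical stimulus image
x = TF.to_tensor(TF.resize(img, (224, 224))).unsqueeze(0)  # 1 x 3 x 224 x 224

with torch.no_grad():
    feature_maps = conv1(x)  # 1 x 64 x 55 x 55, one map per filter

# Central crop, mirroring the 224 -> 112 (input) and 55 -> 28 (feature map)
# crops described in the Figure 9 caption.
center_maps = feature_maps[:, :, 13:41, 13:41]  # 28 x 28 central region
```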

Figure 9.

Hierarchical saliency computation model. Visual input is first processed in parallel by a set of low-level feature detectors (e.g., orientation, color, luminance, texture) in earlier visual areas. To focus on the central region of the visual scene, visual inputs (112 × 112 pixels) and feature maps (28 × 28 pixels) were cropped from the larger images of size 224 × 224 pixels and 55 × 55 pixels, respectively. Feature maps show the outputs of the first five AlexNet filters. The next stage of processing performs an RF center-surround operation within each feature dimension and selectively combines only the informative feature maps, i.e., those in which the RF center region is more strongly activated than its surround (see the main text for details).

Next, consistent with prior work (Itti et al., 1998; Gao et al., 2008; Coen-Cagli et al., 2012; Erdem and Erdem, 2013), we propose a within-feature surround normalization process that elevates high-contrast features within each map. These normalized maps may then be thresholded and pooled to form the input for shape-selective computations in V4. Figure 9 shows our simulation results for two stimuli. For a chromatic central target, feature maps based on chromatic contrast may elevate the salience of the target stimulus relative to the case where the central target is also achromatic. As shown in Figure 6, the salience of the central target can also be enhanced by grouping surrounding distractors that share common features, while keeping the central target unchanged. This likely involves recurrent processing, and further investigation is necessary to understand how neural circuits accomplish it.
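The following is a simplified, pixelwise sketch of this surround normalization and pooling step. The Gaussian scales, kernel sizes, and the pooling rule are our assumptions; the model in Figure 9 selects whole maps by comparing RF center and surround activation, whereas this sketch applies the comparison at every location.

```python
import torch
import torch.nn.functional as F

def gaussian_kernel(sigma, size):
    """Normalized 2-D Gaussian kernel of shape (1, 1, size, size)."""
    ax = torch.arange(size, dtype=torch.float32) - size // 2
    g = torch.exp(-(ax ** 2) / (2 * sigma ** 2))
    k = torch.outer(g, g)
    return (k / k.sum()).view(1, 1, size, size)

def surround_normalize(feature_maps, sigma_c=1.0, sigma_s=4.0, eps=1e-6):
    """Within-feature center-surround contrast: each map is compared with a
    narrow (center) and a broad (surround) Gaussian average; locations where
    the center exceeds the surround are elevated, then maps are pooled."""
    n_maps = feature_maps.shape[1]
    kc = gaussian_kernel(sigma_c, 9).repeat(n_maps, 1, 1, 1)
    ks = gaussian_kernel(sigma_s, 17).repeat(n_maps, 1, 1, 1)
    m = feature_maps.abs()
    c = F.conv2d(m, kc, padding=4, groups=n_maps)   # local (center) average
    s = F.conv2d(m, ks, padding=8, groups=n_maps)   # broader surround average
    contrast = (c - s) / (s + eps)
    # Threshold (keep only center > surround) and pool across feature maps
    return contrast.relu().sum(dim=1, keepdim=True)  # single saliency map
```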

Discussion

We investigated the neural correlates of visual crowding in macaque area V4. Our findings align with previous human psychophysics research, demonstrating that neuronal selectivity for target shapes is influenced by the number of distractors, their distance from the target, and the overall spatial arrangement and relative features of the target and distractor assemblies. Importantly, our findings demonstrate that a salient target can mitigate the influence of nearby distractors, consistent with psychophysical reports of uncrowding. To the best of our knowledge, our results provide the first comprehensive description of how various target–distractor configurations modulate V4 shape selectivity in single neurons, delineating how simple scenes are encoded and the implications of that encoding for both causing and alleviating crowding.

Saliency trumps crowding

Traditionally, the predominant idea has been that adding distractors to a visual scene decreases the discriminability of a target (Andriessen and Bouma, 1976; Palmer, 1994; Pelli et al., 2004). However, Herzog and colleagues demonstrated an "uncrowding" effect whereby crowding by flankers could be reversed by the appropriate placement of a larger number of flanking elements (Manassi et al., 2012, 2013; Herzog et al., 2016). Our results are consistent with these findings. Even in highly cluttered visual scenes, the detrimental effect of distractors on V4 shape selectivity can be mitigated when target objects possess distinctive features, in terms of shape, size, or color, that set them apart from surrounding distractors. These results indicate that visual crowding effects depend not simply on target–distractor separation but also on the spatial position of the distractors and the overall stimulus configuration determined by the properties of the distractor and target stimuli.

The uncrowding effect of Herzog and colleagues is thought to rely on perceptual grouping cues at multiple levels, ranging from low-level feature similarity between target and distractors to global context integration such as contour completion, and may be closely linked to the computation of target saliency (Herzog et al., 2016; Whitney and Yamanashi Leib, 2018; Choung et al., 2021). The simple bottom-up saliency computation stream we propose in this study (Fig. 9) does not incorporate global contextual information, which may be influenced by feedback signals from higher-order visual areas (Chicherov et al., 2014; Jastrzębowska et al., 2021). Future research should investigate the role of feedback signals in perceptual grouping and the computation of target saliency.

Visual crowding that depends on target saliency shows similarities to the mechanisms that distinguish specific sounds from background noise. In both cases, the perceptual system faces the challenge of extracting and isolating relevant information in a cluttered or noisy environment. In the auditory domain, the detectability of a target sound in the presence of masking can be increased by adding sound energy that is distant in frequency from both the masker and the target (Hall and Grose, 1990; Verhey et al., 2003). This effect, known as comodulation masking release, is observed when the remote sound and the masker share a consistent pattern of amplitude modulation. Interestingly, we observed an analogous phenomenon in our experiments: increasing the grouping cues among distractors enhances the saliency of the target and consequently reduces visual crowding.

Encoding salient objects in V4

Many past studies have described in detail how single isolated stimuli—oriented bars, shapes, and texture patches—modulate the responses of neurons in midlevel stages of the ventral visual pathway (Desimone and Schein, 1987; Gallant et al., 1993; Pasupathy and Connor, 2001; Okazawa et al., 2015; Kim et al., 2019). But how simple scenes composed of multiple objects are encoded and how that contributes to visual crowding is largely unknown.

One theory posits that neurons with peripheral RFs encode visual scenes in terms of texture-like summary statistics, integrating information across spatial regions that increase in size with eccentricity (Balas et al., 2009). Supporting this idea, a psychophysical study demonstrated that human observers struggle to discriminate images synthesized to have matched texture statistics at the RF sizes of area V2 in the ventral visual pathway (Freeman and Simoncelli, 2011). However, alternative perspectives argue that texture-based features alone are insufficient to represent the heterogeneous global structure of a scene (Wallis et al., 2019; Bornet et al., 2021). We directly addressed the limitations of the texture summary statistics model in explaining the crowding effect: the responses of V4 cells to a target surrounded by multiple distractors were more similar to the responses in the target alone condition, which had markedly different texture statistics, than to the responses to metamers with matched texture statistics. This trend was more pronounced when the target was salient. It is important to note that we calculated the texture statistics within a square region encompassing the near distractors and generated a metameric image based on these statistics. The square region covered an area slightly larger than the RF of an individual V4 neuron, and the synthesized images had a square boundary not present in the original stimulus set, which could potentially influence the neuronal response. In future work, our evaluation of the model could be improved by using more naturalistic stimuli and by systematically increasing the area over which texture statistics are computed.
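To illustrate the kind of statistic-matching test involved, here is a toy stand-in in Python. The actual analysis used the full Portilla and Simoncelli (2000) statistic set computed within the square analysis window; this sketch uses only a few marginal moments and autocorrelation terms, and every choice in it is illustrative.

```python
import numpy as np

def toy_texture_stats(patch):
    """Toy stand-in for texture summary statistics of a grayscale patch:
    marginal moments plus a few small-lag autocorrelation terms. The real
    model uses joint wavelet statistics (Portilla and Simoncelli, 2000)."""
    z = (patch - patch.mean()) / (patch.std() + 1e-8)
    stats = [patch.mean(), patch.std(), (z ** 3).mean(), (z ** 4).mean()]
    for dy, dx in [(0, 1), (1, 0), (1, 1)]:             # shift lags in pixels
        shifted = np.roll(np.roll(z, dy, axis=0), dx, axis=1)
        stats.append((z * shifted).mean())              # autocorrelation terms
    return np.array(stats)

def stats_matched(patch_a, patch_b, tol=0.05):
    """Two patches are 'metameric' under this toy model if their summary
    statistics agree within tolerance."""
    return np.allclose(toy_texture_stats(patch_a), toy_texture_stats(patch_b),
                       atol=tol)
```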

Overall, our results provide further support for the hypothesis that V4 neurons encode segmented objects in a visual scene (Pasupathy et al., 2020). When one of multiple objects is salient, that object is preferentially encoded across the V4 population. If multiple objects are equally salient, the individual objects may remain perceptually inaccessible due to limited processing capacity, and the phenomenon of crowding ensues (Whitney and Yamanashi Leib, 2018; Chang and Gauthier, 2022). We note that our experiments were conducted in animals engaged in a passive fixation task; future studies will need to relate V4 neuronal responses to shape discrimination performance with crowded displays.

Visual processing stages that support (un)crowding

To uncover the neural mechanisms underlying crowding, we targeted area V4, as in prior work (Motter, 2018; Henry and Kohn, 2020, 2022), for several reasons. RF sizes of V4 neurons are in good agreement with the sizes of crowding zones (e.g., 0.5 × target eccentricity; Bouma, 1970; Gattass et al., 1988; Pelli and Tillman, 2008). V4 responses encode a variety of stimulus properties (Roe et al., 2012; Pasupathy et al., 2020), are strongly modulated by both spatial and feature-based attention, and exhibit task-relevant, flexible representations of visual stimuli (Mirabella et al., 2007; Popovkina and Pasupathy, 2022). V4 could therefore be well positioned to compute target saliency. However, previous studies have shown that visual crowding effects are evident as early as V1 or V2 (Millin et al., 2014; He et al., 2019), suggesting that crowding may arise from activity at multiple levels of the visual processing hierarchy (Whitney and Levi, 2011; Manassi and Whitney, 2018; Rosenholtz et al., 2019). Indeed, functional magnetic resonance imaging and electroencephalography investigations are consistent with this view (Chicherov et al., 2014; Jastrzębowska et al., 2021). A recent study reported that surrounding distractors impaired neuronal discriminability for the orientation of a small target grating more in V4 than in V1 (Henry and Kohn, 2022), and our preliminary data suggest that, unlike in V4, target saliency does not mitigate the detrimental effects of distractors on shape selectivity in anesthetized V2 recordings (Kim and Pasupathy, 2022). Furthermore, the extent of visual crowding varies with perceptual experience, or training (Chung, 2007; Williamson et al., 2009; Wong and Gauthier, 2012). Future studies in other brain regions and with a broader range of stimuli will clarify our understanding of how neuronal processing shapes the perception of crowded displays.

Footnotes

  • We are grateful to Rohit Kamath, Dr. Anjani Chakrala, and Dr. Dina Popovkina for providing helpful discussions and comments on the manuscript and Amber Fyall for assistance with animal training. This work was supported by National Eye Institute (NEI) Grant R01 EY018839 to A.P.; NEI Center Core Grant for Vision Research P30 EY01730 to the UW; National Institutes of Health/Office of Research Infrastructure Programs Grant P51 OD010425 to the WaNPRC.

  • The authors declare no competing financial interests.

  • Correspondence should be addressed to Taekjun Kim at taekjun@uw.edu or Anitha Pasupathy at pasupat@u.washington.edu.

SfN exclusive license.

References

  1. Anderson EJ, Dakin SC, Schwarzkopf DS, Rees G, Greenwood JA (2012) The neural correlates of crowding-induced changes in appearance. Curr Biol 22:1199–1206. https://doi.org/10.1016/j.cub.2012.04.063
  2. Andriessen JJ, Bouma H (1976) Eccentric vision: adverse interactions between line segments. Vision Res 16:71–78. https://doi.org/10.1016/0042-6989(76)90078-X
  3. Balas B, Nakano L, Rosenholtz R (2009) A summary-statistic representation in peripheral vision explains visual crowding. J Vis 9:13. https://doi.org/10.1167/9.12.13
  4. Bornet A, Choung O-H, Doerig A, Whitney D, Herzog MH, Manassi M (2021) Global and high-level effects in crowding cannot be predicted by either high-dimensional pooling or target cueing. J Vis 21:10. https://doi.org/10.1167/jov.21.12.10
  5. Bouma H (1970) Interaction effects in parafoveal letter recognition. Nature 226:177–178. https://doi.org/10.1038/226177a0
  6. Burrows BE, Moore T (2009) Influence and limitations of popout in the selection of salient visual stimuli by area V4 neurons. J Neurosci 29:15169–15177. https://doi.org/10.1523/JNEUROSCI.3710-09.2009
  7. Chaney W, Fischer J, Whitney D (2014) The hierarchical sparse selection model of visual crowding. Front Integr Neurosci 8:73. https://doi.org/10.3389/fnint.2014.00073
  8. Chang T-Y, Gauthier I (2022) Domain-general ability underlies complex object ensemble processing. J Exp Psychol Gen 151:966–972. https://doi.org/10.1037/xge0001110
  9. Chicherov V, Plomp G, Herzog MH (2014) Neural correlates of visual crowding. NeuroImage 93:23–31. https://doi.org/10.1016/j.neuroimage.2014.02.021
  10. Choung O-H, Bornet A, Doerig A, Herzog MH (2021) Dissecting (un)crowding. J Vis 21:10. https://doi.org/10.1167/jov.21.10.10
  11. Chung STL (2007) Learning to identify crowded letters: does it improve reading speed? Vision Res 47:3150–3159. https://doi.org/10.1016/j.visres.2007.08.017
  12. Coen-Cagli R, Dayan P, Schwartz O (2012) Cortical surround interactions and perceptual salience via natural scene statistics. PLoS Comput Biol 8:e1002405. https://doi.org/10.1371/journal.pcbi.1002405
  13. Desimone R, Schein SJ (1987) Visual properties of neurons in area V4 of the macaque: sensitivity to stimulus form. J Neurophysiol 57:835–868. https://doi.org/10.1152/jn.1987.57.3.835
  14. Erdem E, Erdem A (2013) Visual saliency estimation by nonlinearly integrating features using region covariances. J Vis 13:11. https://doi.org/10.1167/13.4.11
  15. Ester EF, Zilber E, Serences JT (2015) Substitution and pooling in visual crowding induced by similar and dissimilar distractors. J Vis 15:4. https://doi.org/10.1167/15.1.4
  16. Felisberti FM, Solomon JA, Morgan MJ (2005) The role of target salience in crowding. Perception 34:823–833. https://doi.org/10.1068/p5206
  17. Freeman J, Simoncelli EP (2011) Metamers of the ventral stream. Nat Neurosci 14:1195–1201. https://doi.org/10.1038/nn.2889
  18. Gallant JL, Braun J, Van Essen DC (1993) Selectivity for polar, hyperbolic, and cartesian gratings in macaque visual cortex. Science 259:100–103. https://doi.org/10.1126/science.8418487
  19. Gao D, Mahadevan V, Vasconcelos N (2008) On the plausibility of the discriminant center-surround hypothesis for visual saliency. J Vis 8:13. https://doi.org/10.1167/8.7.13
  20. Gattass R, Sousa AP, Gross CG (1988) Visuotopic organization and extent of V3 and V4 of the macaque. J Neurosci 8:1831–1845. https://doi.org/10.1523/JNEUROSCI.08-06-01831.1988
  21. Hall JW 3rd, Grose JH (1990) Comodulation masking release and auditory grouping. J Acoust Soc Am 88:119–125. https://doi.org/10.1121/1.399957
  22. He D, Wang Y, Fang F (2019) The critical role of V2 population receptive fields in visual orientation crowding. Curr Biol 29:2229–2236.e3. https://doi.org/10.1016/j.cub.2019.05.068
  23. He S, Cavanagh P, Intriligator J (1996) Attentional resolution and the locus of visual awareness. Nature 383:334–337. https://doi.org/10.1038/383334a0
  24. Henry CA, Kohn A (2020) Spatial contextual effects in primary visual cortex limit feature representation under crowding. Nat Commun 11:1687. https://doi.org/10.1038/s41467-020-15386-7
  25. Henry CA, Kohn A (2022) Feature representation under crowding in macaque V1 and V4 neuronal populations. Curr Biol 32:5126–5137.e3. https://doi.org/10.1016/j.cub.2022.10.049
  26. Herzog MH, Sayim B, Chicherov V, Manassi M (2015) Crowding, grouping, and object recognition: a matter of appearance. J Vis 15:5. https://doi.org/10.1167/15.6.5
  27. Herzog MH, Thunell E, Ögmen H (2016) Putting low-level vision into global context: why vision cannot be reduced to basic circuits. Vision Res 126:9–18. https://doi.org/10.1016/j.visres.2015.09.009
  28. Itti L (2005) Models of bottom-up attention and saliency. In: Neurobiology of attention (Itti L, Rees G, Tsotsos JK, eds), pp 576–582. Burlington: Academic Press.
  29. Itti L, Koch C, Niebur E (1998) A model of saliency-based visual attention for rapid scene analysis. IEEE Trans Pattern Anal Mach Intell 20:1254–1259. https://doi.org/10.1109/34.730558
  30. Jastrzębowska MA, Chicherov V, Draganski B, Herzog MH (2021) Unraveling brain interactions in vision: the example of crowding. NeuroImage 240:118390. https://doi.org/10.1016/j.neuroimage.2021.118390
  31. Kim T, Bair W, Pasupathy A (2019) Neural coding for shape and texture in macaque area V4. J Neurosci 39:4760–4774. https://doi.org/10.1523/JNEUROSCI.3073-18.2019
  32. Kim T, Pasupathy A (2022) Visual saliency alleviates crowding in macaque area V4 but not V2. Program no. 715.14. Neuroscience meeting planner. San Diego, CA: Society for Neuroscience.
  33. Krizhevsky A, Sutskever I, Hinton GE (2017) ImageNet classification with deep convolutional neural networks. Commun ACM 60:84–90. https://doi.org/10.1145/3065386
  34. Kwon M, Bao P, Millin R, Tjan BS (2014) Radial-tangential anisotropy of crowding in the early visual areas. J Neurophysiol 112:2413–2422. https://doi.org/10.1152/jn.00476.2014
  35. Levi DM (2008) Crowding—an essential bottleneck for object recognition: a mini-review. Vision Res 48:635–654. https://doi.org/10.1016/j.visres.2007.12.009
  36. Li Z (2002) A saliency map in primary visual cortex. Trends Cogn Sci 6:9–16. https://doi.org/10.1016/S1364-6613(00)01817-9
  37. Mack A (2003) Inattentional blindness: looking without seeing. Curr Dir Psychol Sci 12:180–184. https://doi.org/10.1111/1467-8721.01256
  38. Manassi M, Sayim B, Herzog MH (2012) Grouping, pooling, and when bigger is better in visual crowding. J Vis 12:13. https://doi.org/10.1167/12.10.13
  39. Manassi M, Sayim B, Herzog MH (2013) When crowding of crowding leads to uncrowding. J Vis 13:10. https://doi.org/10.1167/13.13.10
  40. Manassi M, Whitney D (2018) Multi-level crowding and the paradox of object recognition in clutter. Curr Biol 28:R127–R133. https://doi.org/10.1016/j.cub.2017.12.051
  41. Millin R, Arman AC, Chung STL, Tjan BS (2014) Visual crowding in V1. Cereb Cortex 24:3107–3115. https://doi.org/10.1093/cercor/bht159
  42. Mirabella G, Bertini G, Samengo I, Kilavik BE, Frilli D, Libera CD, Chelazzi L (2007) Neurons in area V4 of the macaque translate attended visual features into behaviorally relevant categories. Neuron 54:303–318. https://doi.org/10.1016/j.neuron.2007.04.007
  43. Motter BC (2009) Central V4 receptive fields are scaled by the V1 cortical magnification and correspond to a constant-sized sampling of the V1 surface. J Neurosci 29:5749–5757. https://doi.org/10.1523/JNEUROSCI.4496-08.2009
  44. Motter BC (2018) Stimulus conflation and tuning selectivity in V4 neurons: a model of visual crowding. J Vis 18:15. https://doi.org/10.1167/18.1.15
  45. Nandy AS, Tjan BS (2012) Saccade-confounded image statistics explain visual crowding. Nat Neurosci 15:463–469. https://doi.org/10.1038/nn.3021
  46. Okazawa G, Tajima S, Komatsu H (2015) Image statistics underlying natural texture selectivity of neurons in macaque V4. Proc Natl Acad Sci U S A 112:E351–E360. https://doi.org/10.1073/pnas.1415146112
  47. Palmer J (1994) Set-size effects in visual search: the effect of attention is independent of the stimulus for simple tasks. Vision Res 34:1703–1721. https://doi.org/10.1016/0042-6989(94)90128-7
  48. Parkes L, Lund J, Angelucci A, Solomon JA, Morgan M (2001) Compulsory averaging of crowded orientation signals in human vision. Nat Neurosci 4:739–744. https://doi.org/10.1038/89532
  49. Pasupathy A, Connor CE (2001) Shape representation in area V4: position-specific tuning for boundary conformation. J Neurophysiol 86:2505–2519. https://doi.org/10.1152/jn.2001.86.5.2505
  50. Pasupathy A, Popovkina DV, Kim T (2020) Visual functions of primate area V4. Annu Rev Vis Sci 6:363–385. https://doi.org/10.1146/annurev-vision-030320-041306
  51. Pelli DG, Tillman KA (2008) The uncrowded window of object recognition. Nat Neurosci 11:1129–1135. https://doi.org/10.1038/nn.2187
  52. Pelli DG, Palomares M, Majaj NJ (2004) Crowding is unlike ordinary masking: distinguishing feature integration from detection. J Vis 4:12. https://doi.org/10.1167/4.12.12
  53. Petrov Y, Meleshkevich O (2011) Asymmetries and idiosyncratic hot spots in crowding. Vision Res 51:1117–1123. https://doi.org/10.1016/j.visres.2011.03.001
  54. Petrov Y, Popple AV, McKee SP (2007) Crowding and surround suppression: not to be confused. J Vis 7:12. https://doi.org/10.1167/7.2.12
  55. Põder E, Wagemans J (2007) Crowding with conjunctions of simple features. J Vis 7:23. https://doi.org/10.1167/7.2.23
  56. Popovkina DV, Pasupathy A (2022) Task context modulates feature-selective responses in area V4. J Neurosci 42:6408–6423. https://doi.org/10.1523/JNEUROSCI.1386-21.2022
  57. Portilla J, Simoncelli EP (2000) A parametric texture model based on joint statistics of complex wavelet coefficients. Int J Comput Vis 40:49–70. https://doi.org/10.1023/A:1026553619983
  58. Pospisil DA, Bair W (2022) Accounting for bias in the estimation of r2 between two sets of noisy neural responses. J Neurosci 42:9343–9355. https://doi.org/10.1523/JNEUROSCI.0198-22.2022
  59. Reynolds JH, Pasternak T, Desimone R (2000) Attention increases sensitivity of V4 neurons. Neuron 26:703–714. https://doi.org/10.1016/S0896-6273(00)81206-4
  60. Rock I, Linnett CM, Grant P, Mack A (1992) Perception without attention: results of a new method. Cogn Psychol 24:502–534. https://doi.org/10.1016/0010-0285(92)90017-V
  61. Roe AW, Chelazzi L, Connor CE, Conway BR, Fujita I, Gallant JL, Lu H, Vanduffel W (2012) Toward a unified theory of visual area V4. Neuron 74:12–29. https://doi.org/10.1016/j.neuron.2012.03.011
  62. Rosenholtz R, Yu D, Keshvari S (2019) Challenges to pooling models of crowding: implications for visual mechanisms. J Vis 19:15. https://doi.org/10.1167/jov.19.7.15
  63. Saarela TP, Westheimer G, Herzog MH (2010) The effect of spacing regularity on visual crowding. J Vis 10:17. https://doi.org/10.1167/10.10.17
  64. Simons DJ, Chabris CF (1999) Gorillas in our midst: sustained inattentional blindness for dynamic events. Perception 28:1059–1074. https://doi.org/10.1068/p281059
  65. Toet A, Levi DM (1992) The two-dimensional shape of spatial interaction zones in the parafovea. Vision Res 32:1349–1357. https://doi.org/10.1016/0042-6989(92)90227-A
  66. Verhey JL, Pressnitzer D, Winter IM (2003) The psychophysics and physiology of comodulation masking release. Exp Brain Res 153:405–417. https://doi.org/10.1007/s00221-003-1607-1
  67. Wagemans J, Elder JH, Kubovy M, Palmer SE, Peterson MA, Singh M, von der Heydt R (2012) A century of gestalt psychology in visual perception: I. Perceptual grouping and figure–ground organization. Psychol Bull 138:1172–1217. https://doi.org/10.1037/a0029333
  68. Wallis TS, Funke CM, Ecker AS, Gatys LA, Wichmann FA, Bethge M (2019) Image content is more important than Bouma's Law for scene metamers. Elife 8:e42512. https://doi.org/10.7554/eLife.42512
  69. Whitney D, Levi DM (2011) Visual crowding: a fundamental limit on conscious perception and object recognition. Trends Cogn Sci 15:160–168. https://doi.org/10.1016/j.tics.2011.02.005
  70. Whitney D, Yamanashi Leib A (2018) Ensemble perception. Annu Rev Psychol 69:105–129. https://doi.org/10.1146/annurev-psych-010416-044232
  71. Williamson K, Scolari M, Jeong S, Kim M-S, Awh E (2009) Experience-dependent changes in the topography of visual crowding. J Vis 9:15. https://doi.org/10.1167/9.11.15
  72. Wong YK, Gauthier I (2012) Music-reading expertise alters visual spatial resolution for musical notation. Psychon Bull Rev 19:594–600. https://doi.org/10.3758/s13423-012-0242-x
  73. Zhaoping L (2019) A new framework for understanding vision from the perspective of the primary visual cortex. Curr Opin Neurobiol 58:1–10. https://doi.org/10.1016/j.conb.2019.06.001
Keywords

  • object recognition
  • primate
  • saliency computation
  • shape perception
  • temporal dynamics
  • ventral visual pathway
