Abstract
Crowding, the phenomenon of impaired visual discrimination due to nearby objects, has been extensively studied and linked to cortical mechanisms. Traditionally, crowding has been studied extrafoveally; its underlying mechanisms in the central fovea, where acuity is highest, remain debated. While low-level oculomotor factors are not thought to play a role in crowding, this study shows that they are key factors in defining foveal crowding. Here, we investigate the influence of fixational behavior on foveal crowding and provide a comprehensive assessment of the magnitude and extent of this phenomenon (N = 13 human participants, four males). Leveraging a unique blend of tools for high-precision eyetracking and retinal stabilization, we show that removing the retinal motion introduced by oculomotor behavior diminishes the negative effects of crowding. Ultimately, these results indicate that ocular drift contributes to foveal crowding by causing the same pooling region to be stimulated by both the target and nearby objects over the course of time, not just in space. The temporal aspect of this phenomenon is peculiar to crowding at this scale and indicates that the mechanisms contributing to foveal and extrafoveal crowding differ.
Significance Statement
Foveated stimuli are often crowded. The effects of crowding have been extensively studied in the visual periphery and are thought to have a cortical origin. Nonetheless, foveal crowding mechanisms remain elusive. Here, we show that acuity drops by two lines on a Snellen Chart when flankers surround a stimulus presented at the very center of gaze. Further, at this scale, crowding cannot be regarded as a purely cortical phenomenon. Because foveal neurons’ receptive fields are the smallest, eye jitter during fixation introduces spatial uncertainty by sweeping target and surrounding distractors over the same cortical pooling region even during short fixation periods, exacerbating crowding effects.
Introduction
Imagine driving in a busy urban street: the visual system is overwhelmed by a complex and crowded scene including traffic signs, cars approaching at different speeds, pedestrian crossings, and bicycles coming from all sides. Often, in this familiar situation, stimuli are closely packed together. This makes it challenging for the visual system to focus on individual elements and identify relevant information. One phenomenon that makes this scenario particularly taxing is known as visual crowding. Even if an object can be accurately identified in isolation, when it is surrounded by similar-looking stimuli (flankers), visual discrimination is significantly impaired. Crowding occurs in most everyday visual tasks such as reading, visual search, and driving. The study of crowding has a long history (Bouma, 1970; Andriessen and Bouma, 1976), with some early studies dating back to 1923 (Korte, 1923), and early reports dating back as far as 1684 (Strasburger and Wade, 2015). It is now well known that discrimination abilities decrease when the space between optotypes decreases (even when they do not overlap), and that stimulus discrimination gets progressively more difficult as stimuli are presented at increasing eccentricities. Together, these effects are described under the umbrella of Bouma’s law (Bouma, 1970; Pelli, 2008).
While the study of crowding has traditionally focused on parafoveal and peripheral vision (Whitney and Levi, 2011), the input to the area immediately surrounding the center of gaze is often crowded (Fig. 1A). This region of the visual field projects onto a retinal region known as the foveola. The foveola is of paramount importance for vision; it spans only 1° in size, it is free of capillaries and rods, and it is characterized by the highest cone density, allowing for high visual resolution (Curcio et al., 1990; Kolb et al., 1995). Understanding how crowding works at this scale can yield valuable insights into the mechanisms underpinning visual acuity in more natural conditions. Yet, foveal crowding is little studied.
Notably, the study of foveal crowding in healthy observers has been marked by conflicting results and interpretations. Some studies reported the absence of crowding within the fovea (He et al., 1975; Strasburger et al., 1991), while others provided evidence supporting its existence (Flom et al., 1963; Danilova and Bondarko, 2007; Siderov et al., 2013; Pelli et al., 2016). However, some of these latter studies were questioned (Hess et al., 2000), and it was argued that the reported effects could be attributed to optical blur, which would cause stimuli to overlap on the retina even if they are physically separated in space, inducing a masking effect (Levi, 2008; Coates et al., 2018). Using an Adaptive Optics Scanning Laser Ophthalmoscope to bypass the eye’s optics and present aberration-free stimuli directly on the retina, a recent study showed that crowding occurs in the foveola (Coates et al., 2018). Although it is now increasingly accepted that crowding affects vision even at the very center of gaze where acuity is highest, its contributing factors remain unclear.
According to the dominant view, crowding is the result of integration/pooling of visual information over a region beyond the bounds of the target object; crowding occurs when the same pooling region is simultaneously stimulated by a flanker and a target (Parkes et al., 2001; Pelli et al., 2004; Pelli, 2008; Greenwood et al., 2009; Rosenholtz et al., 2019). These pooling regions grow linearly with eccentricity (Bouma, 1970), which explains why the effects of crowding increase with eccentricity, and how increasing the spacing between the target and flanker can alleviate crowding effects (Toet and Levi, 1992; Pelli et al., 2004). Further, crowding is modulated by specific stimulus features (color, shape, likeness) (Bernard and Chung, 2011), and it is likely a combination of an increase of positional uncertainty, source confusion, and featural averaging (Harrison and Bex, 2017). Notably, not all flankers cause crowding; it has been shown that in certain instances, the target perceptually un-pairs from the flankers effectively leading to uncrowding (Herzog, 2022), suggesting that crowding is the result of grouping mechanisms rather than purely pooling, and also relies on the similarity of the flankers (Manassi et al., 2012). In either case, crowding is considered a fundamentally cortical phenomenon (but see Rodriguez and Granger, 2021). It is however not clear if the same mechanisms could be responsible for crowding in the central fovea where pooling may happen to a lesser extent and other factors like fixational eye movements may play a role.
Although recent evidence supports the idea that the retinal motion introduced by large eye movements (i.e., saccades) shapes some aspects of peripheral crowding (Nandy and Tjan, 2012), studies on crowding generally assume that the image on the retina remains still, or that retinal motion is negligible during fixation. As a result, observers’ eye movements are often not recorded during crowding experiments. However, the eye is in constant motion even during fixation. This motion is mostly the result of microsaccades and ocular drift (Steinman, 2003; Cherici et al., 2012; Ratnam et al., 2017; Poletti, 2023). Ocular drift in particular continually shifts the retinal projection of stimuli across many photoreceptors in the fovea. Extrafoveally, the motion introduced by ocular drift likely does not have an impact on crowding: it is small compared to the size of the pooling regions and, as a result, does not move stimuli across separate pooling regions. However, this motion cannot be ignored when examining crowding at the center of gaze, where the receptive fields (RFs) of retinal ganglion cells (RGCs) and potential pooling regions are much smaller. This motion, although seemingly negligible, causes the stimulus to traverse multiple foveolar cones and RGC RFs within the short time frame of a fixation (≈ 500 ms) (Fig. 1B–C).
One way ocular drift may influence foveal crowding is by shifting RGC RFs over both the flankers and the target during the course of fixation. Even if at each instant in time target and flankers do not stimulate the same RF, an RF that initially is stimulated by a flanker may later, during fixation, be stimulated by the target too as a consequence of ocular drift sweeping stimuli across the central fovea over time (Fig. 1D). When considering a typical pattern of ocular drift and an average cone size at the preferred locus of fixation (Curcio et al., 1990), we estimate that for stimuli near the acuity limit, 5′ in size, and flanker edge-to-edge spacing ranging from 1′ to 3′ on average
The role of oculomotor behavior in foveal crowding, an important but neglected component, remains uncharted territory. In this study, we bridge this knowledge gap by assessing the magnitude and the extent of crowding at the center of gaze and the impact of ocular drift. Using a unique blend of tools allowing for higher resolution in recording eye movements and higher accuracy in localizing the line of sight (Fig. 2A) (Santini et al., 2007; Ko et al., 2016; Wu et al., 2023), we show that ocular drift plays an important role in foveal crowding, and that at this scale, crowding is not a purely cortical phenomenon.
Methods and Materials
Observers
Thirteen adult observers with normal vision participated in this study, including twelve naive subjects and one experienced observer who is an author of the study. Participants consisted of four males and nine females, with ages ranging from 18 to 25 years old. With the exception of the author, all subjects had emmetropic vision and did not require any corrective measures to achieve a minimum Snellen acuity of 20/20. The author of the study wore corrective contact lenses. Seven observers took part in the retinal stabilization experiment. Ethical approval for this research study was obtained from the University of Rochester’s Research Subjects Review Board. Prior to participating in the study, subjects underwent an initial screening session where they were provided with a comprehensive explanation of the experiment and had the opportunity to review the materials in the consent form in detail. Informed consent was obtained from each subject after they demonstrated their understanding of the study and verbally agreed to participate. The consent process was properly documented for each participant.
Experimental setup
Eye movements in the main experiment were recorded with high precision either by means of a Generation 6 DPI eye tracker (Fourward Technologies), with a 1 kHz sampling rate (Crane and Steele, 1985; Ko et al., 2016), or by means of a custom-made digital Dual Purkinje Image (dDPI) eye tracker, with a sampling rate of 340 Hz (Wu et al., 2023). Both systems have an internal noise well below 1′ and a spatial resolution of at least 1′ (Crane and Steele, 1985; Ko et al., 2016; Wu et al., 2023). To reduce noise and achieve higher precision in the eyetracking signal, the head was immobilized by means of a dental-imprint bite bar and head-holder. Stimuli were shown on an LCD monitor (ASUS PG258Q), with a vertical refresh rate of 200 Hz, and a spatial resolution of 1920 × 1080 pixels. The monitor was either 3 or 5 m away from the observer (1 pixel = 0.25′ and 1 pixel = 0.19′, respectively).
Stimuli and apparatus
Stimuli were presented monocularly to the right eye while the left eye was patched. Stimuli consisted of the digits 3, 5, 6, and 9 in the Pelli number font (Pelli et al., 2016). These targets are designed specifically for studying crowding in the fovea, and it has been shown that the critical spacing is independent of the spacing-to-size ratio used for this font (Pelli et al., 2016). Consistent with Pelli et al. (2016), we show that when using Pelli’s font, critical spacing remains constant across a range of different spacing-to-size ratios (1.2, 1.4, 1.5, and 1.8 times the stimulus width) (Fig. 3).
In the main experiment, the stimulus was either presented in an uncrowded (the target number was presented in isolation) or crowded (the target number was surrounded by other digits in Pelli font) condition (Fig. 2B). In an effort to avoid facilitation effects, in the crowded condition, horizontal flankers never matched the target number and were never the same on both sides of the target. All targets were presented at maximum contrast in black text on a uniform gray background at the center of the display. The central region of the display where the stimuli appeared was highlighted by presenting four peripheral arches for the 400 ms before the stimulus appeared and during the 500 ms the stimulus was displayed. Stimuli were rendered by means of EyeRIS (Santini et al., 2007), a custom-developed system allowing flexible gaze-contingent display control. This system acquires eye movement signals from the eye-tracker, processes them in real time and, if necessary, then updates the stimulus on the display according to the desired combination of estimated oculomotor variables.
Visual acuity was calculated both as units of stimulus width (arcminutes) and as minimum angle of resolution (MAR). To convert Pelli digits to MAR, instead of taking 1/5 of the stimulus width as with the tumbling E, MAR was defined as 1/2 of the stimulus strokewidth (Pelli et al., 2016). Therefore, an optotype that was 2′ wide would correspond to the 20/20 Snellen MAR line.
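As an illustration of this conversion, the following minimal Python sketch maps the width of a Pelli digit to MAR and to a Snellen denominator. The function names are hypothetical, and the sketch assumes only the definition stated above (MAR equal to half of the strokewidth) together with the usual convention that a MAR of 1′ corresponds to 20/20.

```python
def width_to_mar(width_arcmin: float) -> float:
    """MAR (arcmin) for a Pelli digit, taken as half of its strokewidth."""
    return width_arcmin / 2.0


def mar_to_snellen_denominator(mar_arcmin: float, numerator: float = 20.0) -> float:
    """Snellen denominator, assuming a MAR of 1 arcmin corresponds to 20/20."""
    return numerator * mar_arcmin


if __name__ == "__main__":
    for width in (2.0, 2.2):
        mar = width_to_mar(width)
        denom = mar_to_snellen_denominator(mar)
        print(f"width {width:.2f}' -> MAR {mar:.2f}' -> 20/{denom:.0f}")
```

Under these assumptions, a 2′-wide digit maps onto the 20/20 line, in agreement with the statement above.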
Calibration procedure
Data were collected by means of multiple experimental sessions. Each session lasted approximately 1 h, and each subject completed on average 8 sessions. Every session started with preliminary setup operations that lasted a few minutes, involving comfortably positioning the observer in the apparatus, tuning the eye tracker for optimal performance, and executing a two-step gaze-contingent calibration procedure to map the eye tracker’s output into visual angle. This procedure improves localization of the preferred retinal locus of fixation by approximately one order of magnitude over standard methods (Poletti and Rucci, 2016). In the first phase (automatic calibration), observers sequentially fixated on each of the nine points of a 3 × 3 grid, as is standard in oculomotor experiments. Points in the grid were 1° or 1.25° apart from one another on the horizontal axis, and 40′ or 50′ apart on the vertical axis (varying based on screen distance). In the second phase (manual calibration), observers confirmed or refined the mapping given by the automatic calibration by fixating again on each of the nine points of the grid while the location of the line of sight, estimated on the basis of the automatic calibration, was displayed in real time on the screen. Observers used a joypad to fine-tune the estimated gaze location if necessary. The manual calibration procedure was repeated for the central position before each trial to compensate for possible microscopic head movements and system drift that may occur even on a bite bar.
Experimental paradigm
Once the subject initiated the trial, a 10′ × 10′ fixation point was briefly presented at the center of the screen to clearly identify the location where the stimuli would appear. A 400 ms blank-screen delay period followed, to avoid any aftereffects from the fixation point. The target was then presented. Subjects were asked to identify the stimulus, choosing among four possible digits, by pressing a button on a remote controller.
Target acuity and critical spacing were determined by following the parametric estimation by sequential testing (PEST) procedure (Taylor and Creelman, 1967), in which both the size and the spacing of flankers (in the crowded condition) are changed online based on the subject’s performance, using a spacing-to-size ratio of 1.4. Tested spacings ranged from 0.19′ to 1.60′ edge-to-edge.
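To make the relation between stimulus size and flanker spacing concrete, the minimal Python sketch below computes the horizontal center-to-center and edge-to-edge spacing implied by a spacing-to-size ratio of 1.4. The function names are hypothetical, and the sketch assumes that target and flankers have the same width, so that edge-to-edge spacing equals center-to-center spacing minus one stimulus width.

```python
SPACING_TO_SIZE = 1.4  # spacing-to-size ratio stated in the text


def center_to_center(width_arcmin: float, ratio: float = SPACING_TO_SIZE) -> float:
    """Horizontal center-to-center spacing (arcmin) for a given digit width."""
    return ratio * width_arcmin


def edge_to_edge(width_arcmin: float, ratio: float = SPACING_TO_SIZE) -> float:
    """Edge-to-edge gap (arcmin); assumes target and flankers share the same width."""
    return center_to_center(width_arcmin, ratio) - width_arcmin


if __name__ == "__main__":
    width = 4.0  # arcmin, the widest digit reported in the Results
    print(f"center-to-center: {center_to_center(width):.2f}'")
    print(f"edge-to-edge:     {edge_to_edge(width):.2f}'")
```

With a 4′-wide digit, this gives an edge-to-edge gap of 1.60′, which matches the upper end of the tested range.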
Retinal stabilization
In the experiment using retinal stabilization (Fig. 6), eye movements were recorded with a dDPI eyetracker with a sampling rate of 1000 Hz (Wu et al., 2023). Stimuli were shown on an LCD monitor (ASUS PG259QN), with a vertical refresh rate of 240 Hz, and a spatial resolution of 1920 × 1080 pixels. The monitor was placed 5 m away from the observer (1 pixel = 0.19′). The average system rendering time was 2.8 ms.
In the stabilized condition, the entire array of stimuli moved on the monitor to compensate for the subject’s eye movements using EyeRIS (Santini et al., 2007), a custom-made system for gaze-contingent display control. We examined the effect of varying flanker spacing after finding the threshold stimulus size for each subject. Stimuli were sized for each individual subject and condition to yield a performance of ≈75% correct responses. To ensure good stabilization, trials with blinks, saccades, microsaccades, or poor quality tracking were discarded. Based on the system rendering latency, we estimated the average residual motion on the retina across subjects to be 1.5′ ± 2′ on the horizontal axis and 1.3′ ± 0.9′ on the vertical axis.
Data analysis
Eye movements
Eye movements were categorized into two main groups: saccades (including microsaccades) and ocular drift. Ocular motion in between saccades was defined as drift. Classification of these eye movements was first performed automatically, then thoroughly reviewed by an expert experimenter. Trials containing saccades, blinks and/or bad tracking during stimulus presentation were removed. Furthermore, trials in which subjects did not respond or gaze was more than 30′ from the center fixation point at the beginning of the trial were also removed (on average
Determination of cone stimulation
In Figure 1C, a theoretical foveal cone mosaic was used. Each cone was 0.5′ in size, which approximately matches the average cone size at the preferred retinal locus across subjects (Curcio et al., 1990). Flankers and target at the threshold size were overlaid on the theoretical cone mosaic for each observer. Motion of the stimuli on the mosaic was based on the individual subject’s eye traces. For the stimulated cones, we then determined the probability of a cone being stimulated by both the target and either of the flankers over time. Probabilities were calculated at the individual trial level and then averaged across trials. In Figure 1D, instead of using a theoretical cone mosaic, we used the actual foveal cone mosaic for one of the subjects.
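The logic of this analysis can be sketched in a few lines of Python. The sketch is illustrative only and is not the analysis code used for Figure 1: it assumes a square lattice of cones with 0.5′ pitch, treats each digit as a solid bounding box with a 1:5 aspect ratio, models only the two horizontal flankers, and replaces recorded eye traces with a hypothetical random-walk trace. All function names and default values are assumptions.

```python
import random

CONE_PITCH = 0.5  # arcmin, cone size assumed for the theoretical mosaic


def cone_centers(half_extent, pitch=CONE_PITCH):
    """Cone centers on a square lattice (a simplification of a real mosaic)."""
    n = int(half_extent / pitch)
    return [(i * pitch, j * pitch) for i in range(-n, n + 1) for j in range(-n, n + 1)]


def inside(px, py, cx, cy, width, height):
    """True if point (px, py) falls within a box centered at (cx, cy)."""
    return abs(px - cx) <= width / 2 and abs(py - cy) <= height / 2


def co_stimulation_fraction(trace, width, ratio=1.4, aspect=5.0, half_extent=10.0):
    """Fraction of stimulated cones hit by both the target and a flanker over a trial."""
    height = width * aspect
    flanker_offsets = (-ratio * width, ratio * width)
    cones = cone_centers(half_extent)
    hit_target, hit_flanker = set(), set()
    for dx, dy in trace:  # retinal displacement of the stimulus array (arcmin)
        for k, (x, y) in enumerate(cones):
            if inside(x, y, dx, dy, width, height):
                hit_target.add(k)
            if any(inside(x, y, dx + off, dy, width, height) for off in flanker_offsets):
                hit_flanker.add(k)
    stimulated = hit_target | hit_flanker
    return len(hit_target & hit_flanker) / len(stimulated) if stimulated else 0.0


def random_walk_trace(n_samples, step_sd, seed=0):
    """Hypothetical drift trace (random walk); a stand-in for recorded eye traces."""
    rng = random.Random(seed)
    x = y = 0.0
    trace = [(x, y)]
    for _ in range(n_samples - 1):
        x += rng.gauss(0.0, step_sd)
        y += rng.gauss(0.0, step_sd)
        trace.append((x, y))
    return trace


if __name__ == "__main__":
    static = co_stimulation_fraction([(0.0, 0.0)], width=2.0)
    moving = co_stimulation_fraction(random_walk_trace(200, 0.1), width=2.0)
    print(f"no motion: {static:.3f}   with simulated drift: {moving:.3f}")
```

In this simplified setting, a stimulus array that does not move on the retina yields no cones shared by target and flankers, whereas any displacement comparable to the edge-to-edge gap produces shared cones. Per-trial values would then be averaged across trials, as described above.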
Estimation of strokewidth thresholds and critical spacing
Strokewidth threshold, i.e., the minimum stimulus width required to perform reliably above chance level (62.5% correct, with a 25% chance level), was determined using a cumulative Gaussian psychometric function (Wichmann and Hill, 2001). The critical spacing estimates in the crowded condition were measured based on the distance from the center of the target to the center of one of the neighboring horizontal flankers.
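For reference, a cumulative Gaussian psychometric function for a four-alternative task can be sketched as follows. This is a minimal illustration rather than the fitting code used in the study: the function names, the zero lapse rate, and the example parameter values are assumptions, and the fitting procedure itself is not shown.

```python
from statistics import NormalDist


def p_correct(width, mu, sigma, chance=0.25, lapse=0.0):
    """Proportion correct predicted by a cumulative Gaussian with a guess rate."""
    phi = NormalDist(mu, sigma).cdf(width)
    return chance + (1.0 - chance - lapse) * phi


def width_at(performance, mu, sigma, chance=0.25, lapse=0.0):
    """Stimulus width (arcmin) at which the function reaches a given performance."""
    phi = (performance - chance) / (1.0 - chance - lapse)
    return NormalDist(mu, sigma).inv_cdf(phi)


if __name__ == "__main__":
    mu, sigma = 1.67, 0.4  # hypothetical parameters, for illustration only
    print(f"width at 62.5% correct: {width_at(0.625, mu, sigma):.2f}'")
    print(f"width at 75% correct:   {width_at(0.75, mu, sigma):.2f}'")
```

With a 25% chance level and no lapses, 62.5% correct lies halfway between chance and perfect performance, so under these assumptions the threshold coincides with the mean of the underlying Gaussian.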
Statistics were run using MATLAB’s available toolboxes. All data and MATLAB scripts used to create the figures in the manuscript have been uploaded onto the Open Science Framework repository.
Results
To investigate crowding in the foveola, we first measured acuity thresholds in 13 individuals with isolated targets in the uncrowded condition. Acuity was then measured again when flankers surrounded the target in the crowded condition (Fig. 2). Stimuli consisted of digits in Pelli’s font (Pelli et al., 2016), a font specifically designed for this purpose as it allows for smaller flanker spacing while preserving acuity. Because this font has an aspect ratio of 1:5 (width:height), acuity is determined based on the width of the stimulus (here referred to as strokewidth), and crowding is primarily determined by the spacing of the horizontal flankers. Therefore, crowding effects reported here refer to the horizontal spacing of flankers. Stimulus size changed adaptively based on a PEST procedure (Taylor and Creelman, 1967) and ranged in width from 0.3′ to 4′ (1.5′ to 20′ vertically). In the crowded condition, the center-to-center spacing varied, together with stimulus size, from 0.4′ to 5.6′, with a spacing-to-size ratio of 1.4 applied separately on each axis based on the width and height of the target, respectively, as in Pelli et al. (2016). Notably, when using this font, foveal crowding effects are constant across a range of spacing-to-size ratios (Pelli et al., 2016) (Fig. 3). Subjects were asked to determine the identity of a briefly presented digit among four possible choices (Fig. 2B). Fixational eye movements were recorded using a high-precision Dual Purkinje Image eyetracker (Crane and Steele, 1985; Ko et al., 2016; Wu et al., 2023) coupled with a system for gaze-contingent display control (Santini et al., 2007). Together, these systems enable not only high-precision recordings of fixational eye movements, but also a more accurate localization of the line of sight compared to commercial video eye-trackers (Poletti and Rucci, 2016).
On average, in the uncrowded condition, subjects required a stimulus width of 1.67′ to perform above chance level (Fig. 4A), approximately equivalent to 20/17 on the MAR acuity chart (see methods for details). Acuity thresholds were higher, i.e., worse acuity, for all subjects in the crowded condition (Fig. 4B). When the stimulus was crowded, acuity decreased to 20/22, i.e., the threshold stimulus width increased by roughly 30%, to 2.2′ (Figs. 4B–C and 5; P < 0.0001, paired two-tailed t-test; effect size as measured by Cohen’s d, d = 1.10). Hence, surrounding a stimulus with flankers has the immediate impact of decreasing visual acuity by approximately 2 lines on the Snellen eye chart. The critical spacing along the vertical meridian was 11′; however, as mentioned earlier, the horizontal flankers constitute the main limiting factor for crowding when using stimuli in Pelli’s font. Hence, further analyses focused on the horizontal rather than the vertical spacing. When comparing the percentage of correct responses for stimuli of the same size in both the uncrowded and crowded conditions, we observed a significant drop in performance. On average, there was a
Importantly, retinal projections of stimuli during the task were constantly moving as a result of fixational eye movements. Because stimuli were already presented at the center of gaze and their size was in the order of a few arcminutes, the rate of microsaccades, which are used to precisely recenter the stimuli on the preferred locus of fixation (Intoy and Rucci, 2020; Poletti, 2023), was low (approximately
Therefore, if ocular drift contributes to foveal crowding, then we expect that removing the retinal motion introduced by fixational eye movements reduces critical spacing. More specifically, when stimuli are maintained approximately at the same retinal location throughout the viewing time, crowding effects should decrease. To test this prediction, we used a technique known as retinal stabilization to maintain stimuli at the same retinal location during the task (Fig. 6A, see methods for detail). In this condition, the probability of the same cone being stimulated both by the target and the flanker over time is very low and only due to the residual retinal motion introduced by stabilization errors.
Subjects’ acuity in the stabilized condition was determined using the same adaptive procedure as in the main experiment. Based on the psychometric fits, we determined the stimulus size that would yield a performance of ≈75% correct responses for each condition (stabilized vs. unstabilized) separately when the stimulus was viewed in isolation. Stimuli in the crowded condition were maintained at this fixed size throughout the task, while flanker spacing varied between 0.25 and 2 times the stimulus width. Consistent with previous work (Rucci et al., 2007; Intoy and Rucci, 2020), for isolated stimuli, a larger strokewidth value was necessary in the stabilized condition (i.e., worse acuity) for performance to be comparable to the unstabilized condition (paired t-test, P = 0.0074, d = 0.76, see Fig. 6B). Therefore, stimulus size was larger in the stabilized condition for all subjects.
To account for this acuity change, we expressed spacing thresholds in nominal spacing (i.e., as a multiple of the tested stimulus width), a measure that has been used before when comparing crowding for stimuli with different sizes (Siderov et al., 2013). As illustrated in Figure 6C, crowding effects decreased when the stimulus was stabilized. We found that the nominal center-to-center spacing and edge-to-edge spacing (Fig. 7A; Fig. 7B reports center-to-center spacing in arcminutes) were smaller in the stabilized condition. Whereas visual acuity was worse under retinal stabilization, as demonstrated in earlier studies (Rucci et al., 2007; Ratnam et al., 2017; Intoy and Rucci, 2020), here we found that stabilization reduced visual crowding. These findings emphasize the impact of ocular drift on foveal crowding; eliminating the retinal motion introduced by ocular drift, which causes target and flankers to stimulate the same pooling region during fixation, leads to a reduction of crowding effects. It is important to point out that retinal stabilization did not eliminate visual crowding, indicating that, in addition to ocular drift, both cortical and optical factors still contribute to this phenomenon.
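The normalization to nominal spacing can be illustrated with a short Python sketch. The function name and the numerical values are hypothetical and serve only to show why the same physical spacing corresponds to different nominal spacings when the threshold stimulus size differs between conditions.

```python
def nominal_spacing(spacing_arcmin: float, width_arcmin: float) -> float:
    """Spacing expressed as a multiple of the tested stimulus width."""
    if width_arcmin <= 0:
        raise ValueError("stimulus width must be positive")
    return spacing_arcmin / width_arcmin


if __name__ == "__main__":
    spacing = 3.0  # arcmin, hypothetical center-to-center spacing
    for label, width in (("unstabilized", 2.0), ("stabilized", 2.5)):
        # Widths are illustrative, not measured thresholds.
        print(f"{label}: {nominal_spacing(spacing, width):.2f} x stimulus width")
```

Expressing thresholds in this way removes the effect of the larger stimulus size needed under stabilization, so that the two conditions can be compared on a common scale.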
Discussion
Traditionally, since neurons integrate information over relatively long intervals, the fixational motion of the eye has been regarded as a possible source of smearing and blurring. However, contrary to this idea, it has long been argued that sensitivity to temporal changes of luminance could actually enhance fine spatial vision. This proposal has a long history. Weymouth and colleagues proposed that these movements might explain sub-cone Vernier acuity and contribute to overall visual acuity (Andersen and Weymouth, 1923; Averill and Weymouth, 1925; Weymouth et al., 1928). This idea then evolved into the so-called dynamic theories of visual acuity (Andersen and Weymouth, 1923; Marshall and Talbot, 1942; Arend, 1973; Ahissar and Arieli, 2001; Rucci and Victor, 2015), and recent technological advances have provided increasing evidence supporting this view (Rucci et al., 2007; Kuang et al., 2012; Ratnam et al., 2017; Anderson et al., 2020; Intoy and Rucci, 2020; Clark et al., 2022; Intoy et al., 2024; Witten et al., 2024). The essence of this proposal is that the motion of the eye, combined with the strong sensitivity of the early visual system to temporal changes, enables encoding of spatial patterns into a spatiotemporal format (Rucci et al., 2018). In keeping with this proposal, retinal stabilization experiments showed that when stimuli are retinally stabilized acuity drops (Rucci et al., 2007; Ratnam et al., 2017; Intoy et al., 2024), a finding that has been replicated in this study in the uncrowded condition. However, while it is now generally accepted that fixational eye movements contribute to fine spatial vision, their influence on visual crowding, which involves mechanisms different from those involved in acuity, has not been considered.
According to current theories and models, crowding is primarily driven by cortical mechanisms in V1 and beyond (Parkes et al., 2001; Pelli, 2008; Balas et al., 2009; Bi et al., 2009; Van den Berg et al., 2010). It is believed that lower-level factors, such as the eye’s physiological instability during fixation, do not affect crowding in the visual periphery. This is because the spatial region from which visual information is pooled together at the cortical level is larger than the amount of motion ocular drift introduces. However, in the central fovea, where receptive fields are much smaller, ocular drift moves stimuli over different pooling regions. Yet, to date, most studies on foveal crowding do not even record eye movements. Here, we show that the foveola is not exempt from crowding, that foveal crowding is shaped by the retinal consequences of oculomotor behavior, and that it is not a purely cortical phenomenon. Importantly, we show that, whereas retinal stabilization has a negative impact on acuity, it ameliorates crowding. Again, this pattern of results cannot be explained by retinal smearing causing blurring of the stimulus on the retina. According to this account, both acuity and crowding under retinal stabilization would improve. However, if the visual system groups together stimuli that fall within the same crowding pooling region, not just in space but also in time, then when the flankers, moved on the retina as a result of ocular drift, stimulate the same pooling region previously stimulated by the target, uncertainty in target identification increases. This causes more crowding compared to when stimuli are viewed under retinal stabilization. Whether or not the stimulus is viewed in isolation, ocular drift enhances stimulus edges and high spatial frequencies. However, when the stimulus is surrounded by flankers, enhancing high-spatial-frequency patterns in the stimulus does not improve target visibility (unless the flankers and the target form a gestalt; Herzog et al., 2015) because ocular drift at the same time introduces uncertainty in the identity of the target when the same pooling regions are stimulated by both flankers and target over time. Further, our findings suggest that factors contributing to foveal crowding may originate as early as the retina, as the impact of ocular drift on crowding may change depending on individual differences in receptor spacing.
In the central fovea, the increased probability of the same receptors being stimulated by both the target and distractors over time as a result of ocular drift is a determining factor inducing foveal crowding. This is reminiscent of the phenomenon of temporal crowding reported in the visual periphery (Yeshurun et al., 2015). Temporal crowding occurs when target and distractors are presented at the same spatial location in quick succession at different instants in time. Notably, temporal crowding in the periphery only influences crowding in specific circumstances, i.e., when stimuli in the external world change over time. In the central fovea, similar mechanisms are at play under normal viewing conditions. However, rather than stimuli changing physically over time, they are fixed in space but are continuously being swept over many foveal cones by fixational eye movements.
Although fixational drift is often ignored in visual crowding research, studies investigating clinical populations represent a notable exception. These populations are characterized by an abnormally large fixational instability (Chung and Bedell, 1995; Maxwell et al., 1995; Schor and Hallmark, 1978; Zhang et al., 2008; González et al., 2012) (see Verghese et al., 2019 for a review). Amblyopia, characterized by reduced visual acuity in one eye, frequently leads to erratic fixations, intensifying crowding effects (Chung et al., 2015). Similarly, the effects of crowding are exacerbated in individuals with nystagmus who struggle to maintain a stable gaze (Pascal and Abadi, 1995; Tailor et al., 2021). As a result of this abnormal gaze instability, stimuli move on the retina at higher speeds, and they are quickly brought away from the preferred locus of fixation, and likely outside the boundaries of the foveola, where crowding effects are stronger, in the pauses between saccades (Chung et al., 2015). Here, we show that even the normal physiological instability of gaze contributes to visual crowding. Even if ocular drift is small enough to maintain the stimuli within the region of the highest acuity in the central fovea, this motion naturally introduces uncertainty in target identification in the presence of surrounding flankers.
The prevailing notion in vision science has traditionally held that vision within the foveola is relatively flat and uniform (Hirsch and Curcio, 1989; Marcos and Navarro, 1997; Domdei et al., 2021) and primarily limited by the optics at the front of the eye (Hirsch and Curcio, 1989). While we observe crowding at the very center of gaze, an intriguing question remains unanswered: does crowding behave uniformly within this highly sensitive region, or does it exhibit variations similar to those observed in the periphery? Previous work from our lab has highlighted that vision is not uniform within this 1° region (Poletti et al., 2013; Intoy and Rucci, 2020). However, it remains an open question whether the magnitude and extent of crowding increase with larger eccentricities within the 1° foveola, and if so, how the rate of increase with eccentricity compares to the way crowding changes extrafoveally. Understanding whether crowding displays spatial non-uniformities even within the foveola carries significant implications for our comprehension of the underlying mechanisms of foveal crowding and foveal vision. Further, at some point, with increasing stimulus eccentricity and with an increase in receptive field size, the effects of drift on visual crowding should become negligible. Ultimately, based on an individual’s amount of ocular drift, extent of pooling regions, and acuity thresholds at different eccentricities across the fovea, it should be possible to predict the eccentricity at which the influence of ocular drift becomes negligible.
Crowding thresholds measured in the absence of optical constraints (Coates et al., 2018) are lower than those reported here. Therefore, it is likely that under normal viewing conditions crowding results from a combination of oculomotor behavior and optical factors. If we were to eliminate both optical constraints and the temporal modulations introduced by ocular drift at the same time, and if no additional pooling mechanisms are present at this scale, we might observe a further reduction in critical spacing. This could potentially lead to critical spacings as small as 0.5′, approximately the spacing between cone photoreceptors at the preferred retinal locus.
Even though crowding reduces acuity, it may often be advantageous in natural conditions. Crowding can facilitate the perception and extraction of patterns, i.e., textures and gestalts, from the visual input. Using a Bayesian ideal observer model, it has been shown that crowding can effectively convey information about overall similar patterns and spatial redundancies of the natural world (Cicchini et al., 2022). While some features are lost, other features may be enhanced (such a trade-off has been observed in color and motion discrimination tasks; Greenwood and Parsons, 2020). Crowding can thus be considered a mechanism for efficient exploitation of the spatial redundancies of the natural world (Cicchini et al., 2022). Crowding in the foveola likely serves a similar role, facilitating fine texture discrimination.
Ultimately, this study shows that the mechanisms driving crowding in the central fovea are different from those normally at play in the visual periphery, where crowding is primarily the result of cortical and spatial factors. In the central fovea, fixational oculomotor behavior has a major impact on crowding and adds a temporal dimension to this phenomenon.
Footnotes
This work was supported by National Science Foundation grant BCS-1534932 (to M.P.), by National Institutes of Health grant R01EY029788-01 (to M.P.) and grant EY001319 (to the Center for Visual Science). We would also like to thank Florian Jaeger, Michele Rucci, Krish Prahalad and Janis Intoy for their helpful discussion and decision-making in the experimental design and analysis. We would like to thank Benjamin Moon and Austin Roorda for the Adaptive Optics image acquired for one of our subjects (Fig. 1D). We also thank the anonymous reviewers for the useful and constructive comments.
The authors declare no competing financial interests.
Correspondence should be addressed to Martina Poletti at martina_poletti{at}urmc.rochester.edu.