Abstract
Category learning and visual perception are fundamentally interactive processes, such that successful categorization often depends on the ability to make fine visual discriminations between stimuli that vary on continuously valued dimensions. Research suggests that category learning can improve perceptual discrimination along the stimulus dimensions that predict category membership and that these perceptual enhancements are a byproduct of functional plasticity in the visual system. However, the precise mechanisms underlying learning-dependent sensory modulation in categorization are not well understood. We hypothesized that category learning leads to a representational sharpening of underlying sensory populations tuned to values at or near the category boundary. Furthermore, such sharpening should occur largely during active learning of new categories. These hypotheses were tested using fMRI and a theoretically constrained model of vision to quantify changes in the shape of orientation representations while human adult subjects learned to categorize physically identical stimuli based on either an orientation rule (N = 12) or an orthogonal spatial frequency rule (N = 13). Consistent with our predictions, modeling results revealed relatively enhanced reconstructed representations of stimulus orientation in visual cortex (V1–V3) only for orientation rule learners. Moreover, these reconstructed representations varied as a function of distance from the category boundary, such that representations for challenging stimuli near the boundary were significantly sharper than those for stimuli at the category centers. These results support an efficient model of plasticity wherein only the sensory populations tuned to the most behaviorally relevant regions of feature space are enhanced during category learning.
Significance Statement
Poisonous or edible? Friend or foe? Quickly grouping objects into appropriate categories is critical to our survival. Many category decisions are supported by the presence of one or more defining features—for example, the shape and color of a banana can easily distinguish it from other fruits at the store. Other decisions require highly precise perceptual representations—which exact shade of yellow determines whether a banana is ripe? We tested the hypothesis that ongoing learning of new visual categories should lead to more precise sensory representations, especially where precision is likely to improve categorization performance. Our results bore this out: active category learning can lead to rapid and specific improvements in the way early visual cortex represents relevant features.
Introduction
Category learning enables us to predict the behavioral relevance of novel stimuli. In the visual domain, this is made possible by selectively attending to the specific features that lead to successful categorization. For example, noting whether an organism has wings is useful for distinguishing birds from mammals but uninformative when classifying bird species. Instead of discrete features, bird watchers are better served by attending to continuous dimensions such as color or texture. Learning to categorize such stimuli can lead to improved perception of subtle differences across relevant dimensions, especially for physically similar stimuli that nonetheless belong to distinct categories (Rosch et al., 1976; Jolicoeur et al., 1984; Diamond and Carey, 1986; Hamm and McMullen, 1998; Tarr and Gauthier, 2000; Zeithamova and Maddox, 2007; Curby and Gauthier, 2010; Seger et al., 2015).
Dimensional relevancy is a likely catalyst for this improved perceptual sensitivity. For instance, categorizing size- and brightness-varying objects by size makes small size differences easier to distinguish, but not brightness differences (Goldstone, 1994). This may be due to perceptual stretching along the relevant dimension, where small feature value differences become exaggerated (Goldstone and Steyvers, 2001; Folstein et al., 2013, 2015). Neuroimaging and single-unit recording studies support the hypothesis that category learning leads to warped neural representations of relevant exemplars (Sigala and Logothetis, 2002; O’Bryan et al., 2018a; but see Jiang et al., 2007), and such neural plasticity may directly support perceptual discrimination (Folstein et al., 2012, 2013).
Dimension-wide perceptual stretching can account for a broad range of results (Nosofsky, 1986). Nonetheless, open arguments suggest that category learning should produce localized enhancement for a subset of features along an attended dimension (sometimes termed categorical perception). Perceptual noise leads to particularly high classification error rates near category boundaries (Aha and Goldstone, 1992; Ashby and Maddox, 1993), and as such, precise perceptual representations for these exemplars may be uniquely crucial. If so, classifying visually similar between-category exemplars should lead to enhanced neural representations at or around the boundary—especially during learning when internal boundaries are inherently noisy.
Behavioral evidence for such localized representational enhancement has been mixed (Van Gulick and Gauthier, 2014; Folstein et al., 2015; Juárez et al., 2019), with most studies searching for demonstrations of persistent perceptual improvements outside of active categorization. However, localized representational enhancement is consistent with the known neurobiology of feature-based selective attention. When nonhuman primates attend to specific feature values (e.g., red), sensory neurons that are tuned to the most task-informative values exhibit elevated firing rates, whereas responses from neurons that are tuned to uninformative values (e.g., blue) within the same feature space are often suppressed (Sigala and Logothetis, 2002; Martinez-Trujillo and Treue, 2004; Yang and Maunsell, 2004), leading to enhanced representations of relevant sensory input (Ling et al., 2009). Importantly, the perceptual learning literature indicates that this representational enhancement is task-dependent, especially in early visual cortex (Byers and Serences, 2014).
Most visual categorization studies have focused on parietal, prefrontal, and extrastriate regions with the expectation that they are uniquely sensitive to learning effects (Freedman and Assad, 2016; Uyar et al., 2016). Few studies have examined the possible downstream effects of category learning on retinotopically organized regions of visual cortex, with the recent exception of Ester et al. (2020). Despite relatively sparse research, there is ample evidence to suggest that V1 may play an integral role during category learning, analogous to its role in perceptual learning.
We addressed this question using fMRI and an encoding model to reconstruct orientation representations within early visual cortex while subjects actively learned to categorize grating stimuli based on either an orientation (line angle) rule or an orthogonal spatial frequency (line width) rule. We predicted that orientation representations should be enhanced among orientation learners to optimally support boundary acquisition and minimize prediction error during learning. Furthermore, these sensory modulations should be most pronounced for exemplars that border subjects’ assigned category boundaries, consistent with an efficient model of plasticity.
Materials and Methods
Subjects
A total of 26 healthy adult human subjects (age range: 18–32 years; 13 females, 12 males, and 0 nonbinary) with normal or corrected-to-normal vision were recruited from the Texas Tech University community. Data from one subject were removed due to excessive movement in the scanner, which resulted in a considerable loss of visual cortex coverage. All the subjects provided written informed consent before participating in accordance with the Declaration of Helsinki. The subjects were paid $20/h for the fMRI scanning sessions and $10/h for behavioral training completed outside of the scanner. This study was approved by the Texas Tech University IRB.
Materials
Visual stimuli were rendered using MATLAB (v.9.1, MathWorks) and presented via Psychophysics Toolbox (v.3.3; Kleiner et al., 2007) on a desktop PC running Windows 10. For a pre-scan training session, stimuli were displayed on a 1,920 × 1,080 pixel resolution BenQ XL2430T monitor measuring 58 cm wide and set to a 100 Hz refresh rate. During all fMRI scans, stimuli were presented on a 1,024 × 768 resolution projection screen measuring 19 cm wide and at a 60 Hz refresh rate.
Categorization task
The primary goal of this experiment was to characterize modulations in orientation-selective population responses while subjects actively learned categories, where orientation was either a category-relevant or irrelevant stimulus dimension. To accomplish this goal, the subjects learned to classify grating stimuli into one of two categories via trial and error based on either an orientation rule (n = 12) or a spatial frequency rule (n = 13). This group sample size was determined based on related studies obtaining medium to large within-group effect sizes with samples ranging between 8 and 13 (Scolari et al., 2012; Byers and Serences, 2014; Ester et al., 2020). Pseudo-random assignment to these conditions was performed based on subject number to ensure an approximately equal number of subjects in each group.
The subjects were not aware of the rule they would learn prior to beginning the categorization task, but they were informed that it may be based on either the orientation or spatial frequency dimensions of the gratings. Critically, all the subjects encountered an identical stimulus set over the course of the experiment regardless of their assigned categorization rule; the task differed between subjects only with respect to the categories to which each stimulus belonged.
Procedurally, each trial of the categorization task began with a 3 s grating stimulus. Gratings were presented centrally on a middle gray background with a radius of 8° of visual angle and flickered at a rate of 5 Hz to drive responses in early visual cortex. During both stimulus presentation and inter-stimulus intervals (ISIs), the subjects were instructed to maintain fixation on a black point in the center of the screen. Fixation was monitored in real time by the experimenter via an MRI-compatible eye tracker (Eyelink 1000 Plus; SR Research) to ensure that the retinotopic location of stimuli was consistent both within and between subjects across task conditions.
The subjects responded with a button press corresponding to “Category A” or “Category B” during the stimulus presentation period. During the last 1 s of the trial, feedback was administered via a color change at central fixation (green and red for correct and incorrect, respectively) while the grating stimulus remained on the screen. Following the 3 s combined stimulus presentation, response, and feedback window, the grating was removed from the screen, and the subjects encountered a fixation-only ISI. The duration of each ISI was pseudo-randomly jittered with a mean of 4 s and drawn from a distribution ranging between 2 and 6 s in 500 ms steps (resulting in nine possible ISI durations encountered equally often during each scanning run). The subjects completed six categorization scanning runs of 54 trials each, with each run lasting 6 min 20 s.
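The ISI distribution described above can be sketched as follows (a minimal illustration; the function name and seeding are our own, not the authors' code):

```python
import random

def make_isi_schedule(n_trials=54, seed=None):
    """Build a jittered ISI schedule: nine durations from 2 to 6 s in
    0.5 s steps, each occurring equally often, in shuffled order."""
    durations = [2.0 + 0.5 * i for i in range(9)]   # 2.0, 2.5, ..., 6.0 s
    reps = n_trials // len(durations)               # 6 repeats of each
    schedule = durations * reps
    rng = random.Random(seed)
    rng.shuffle(schedule)
    return schedule

isis = make_isi_schedule(seed=1)
# mean ISI is 4 s by construction
```

Because every duration appears equally often, the mean ISI of 4 s is guaranteed within each run rather than only in expectation.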
The exemplars encountered during the experiment varied on the two critical dimensions. Each exemplar took on one of 18 possible values in orientation space, ranging from 5° to 175° in 10° steps. Similarly, the exemplars expressed one of 18 possible values in spatial frequency space, ranging from 0.44 cycles/degree to 1.25 cycles/degree in 0.045 cycle/degree steps. The values for each dimension were randomized throughout the experiment.
For all the subjects assigned to learn the spatial frequency rule, the category boundary was defined as the midpoint of the constrained spatial frequency space, with the nine highest spatial frequencies belonging to Category A and the nine lowest spatial frequencies belonging to Category B. For the subjects assigned to learn an orientation rule, one of four possible category boundary pairs was assigned based on subject number (20°/110°, 40°/130°, 60°/150°, and 80°/170°). Boundary pairs are required because the 180° orientation space is circular: a single boundary cannot divide a circular space into two categories (Fig. 1).
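Under these definitions, category assignment in the circular orientation space can be sketched as follows (which side of a boundary pair is labeled Category A is an arbitrary illustrative choice; exemplar orientations sit 5° off the boundaries, so no stimulus ever falls on a boundary):

```python
def assign_category(orientation, boundary_pair):
    """Assign a grating to one of two categories in the circular 180-deg
    orientation space. boundary_pair = (b, b + 90); orientations strictly
    between the two boundaries form one category, the rest the other."""
    b1, b2 = boundary_pair
    return 'A' if b1 < orientation % 180 < b2 else 'B'

# the 18 exemplar orientations: 5, 15, ..., 175 degrees
orientations = [5 + 10 * k for k in range(18)]
labels = [assign_category(o, (20, 110)) for o in orientations]
# 9 exemplars fall in each category
```

Note that for the segment of the circle that wraps through 0° (e.g., orientations of 175° and 5° for the 80°/170° pair), both values correctly land in the same category.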
Within the week prior to their scheduled fMRI scans, the subjects attended a brief (<30 min) training session outside of the scanner where they completed two practice blocks of a categorization task. The task employed the same stimuli and response mappings used in the primary fMRI experiment. Critically, however, the categorization rule for these practice blocks was identical across all the participants, using a 45°/135° orientation boundary pair that was not assigned to any subjects for the scanning session. The subjects were told that the categorization rule could be based on either the spatial frequency (width) or the orientation of the gratings, but were not explicitly told which rule was in effect. The rationale for this brief practice session was to familiarize the participants with the task procedure and stimuli, and it was ultimately expected to support more rapid learning when the categorization task was completed in the scanning environment. On the day of the scanning session, the subjects were reminded that they would encounter a new, random rule defined by either the spatial frequency or orientation of the gratings.
Orthogonal contrast discrimination task
To allow for tightly controlled within-subjects comparisons, the subjects completed six scanning runs of an orthogonal contrast discrimination task made up of the same flickering stimuli used in the categorization task. Here, the subjects were required to discriminate between slight increases and decreases in grating contrast. We reasoned that discriminating contrast changes would provide a strong control condition, because this requires that the subjects attend to the grating to successfully complete the task (thus matching the presumed spatial extent of attention across tasks).
Once in each trial, the contrast of the grating either decreased or increased for 100 ms (within a single flicker cycle). The subjects were instructed to press a button with their index finger to indicate a perceived decrease in contrast, and with their middle finger to indicate a perceived increase in contrast. As with the categorization task, feedback was administered in the form of a red or green fixation point appearing on the screen for the final 1 s of the 3 s stimulus presentation window. Each trial was separated by a jittered ISI with the same parameters used in the categorization task described above.
To allow enough time for the subjects to respond and receive feedback during the 3 s stimulus presentation window, the brief contrast changes were applied at pseudo-random intervals within the first 1.5 s of stimulus onset. For the first run of the contrast task, the magnitude of contrast changes (both increases and decreases) started at a default of 20%. After the first run, task difficulty was manually titrated by the experimenter on a run-by-run basis to approximately match the expected performance in the categorization task by increasing or decreasing the magnitude of contrast change for each run in 5–10% increments.
Importantly, all contrast scans were run first to ensure that the subjects did not engage in orientation or spatial frequency categorization during the task. The same contrast changes were then implemented on a scan-by-scan basis during the categorization task to perfectly equate all stimulus properties across the two study phases, but these changes were irrelevant during categorization.
Retinotopic mapping
All the subjects recruited for the study completed a separate, standard retinotopic mapping scan. This procedure is used to identify and map early visual cortical areas (V1, V2, and V3) unique to each subject. The scans required fixation on a rotating checkerboard stimulus, subtending 60° of visual angle and flickering at a rate of 8 Hz (Engel et al., 1994; Sereno et al., 1995; Swisher et al., 2007; Arcaro et al., 2009). To ensure that the subjects were attentive throughout the scan, they were instructed to press a button with their right index finger when they detected a gray segment that periodically appeared in the stimulus display. The functional datasets were later projected onto an inflated representation of cortex for each subject to demarcate the functional borders between visual areas V1v, V1d, V2v, V2d, V3v, and V3d.
fMRI data acquisition and preprocessing
Imaging data were acquired on a 3.0T Siemens Skyra MRI scanner at the Texas Tech Neuroimaging Institute. MPRAGE anatomical scans (two collected during the retinotopy scan session; one collected during the experimental scan session) provided high-resolution structural images of the whole brain in the sagittal plane for each participant (TR = 2.5 s; TE = 1.7 ms; θ = 7°; slice thickness = 1 mm; slices = 172). Functional images were acquired using a single-shot T2*-weighted gradient echo EPI sequence (TR = 2 s; TE = 40 ms; θ = 72°; FoV = 256 mm; matrix = 128 × 128; number of axial slices = 25; voxel size = 2 × 2 × 3 mm with 0.5 mm gap), and slices were oriented to cover the full extent of the occipital lobe.
Data preprocessing was carried out using AFNI and SUMA with custom time series analysis routines for slice-time correction, between- and within-scan motion correction, and high-pass temporal filtering (3 cycles/run). Voxel time series were normalized (z-scored) within run to correct for differences in mean signal intensity across voxels, and trial-level activation in each voxel was demeaned to ensure that evidence of orientation selectivity can be attributed to the activation patterns in orientation-selective cortex as opposed to mean changes in the BOLD response across voxels that may be evoked by different orientations. Finally, data were spatially smoothed using a 4 mm full width at half maximum Gaussian kernel.
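The two normalization steps described above can be sketched as a minimal numpy illustration (array names and shapes are assumptions, not the authors' AFNI pipeline):

```python
import numpy as np

def normalize_run(voxel_ts, trial_betas):
    """Sketch of the within-run normalization described in the text.
    voxel_ts:    (timepoints, voxels) raw time series for one run
    trial_betas: (trials, voxels) trial-level BOLD amplitude estimates
    """
    # z-score each voxel's time series within the run to correct for
    # differences in mean signal intensity across voxels
    z = (voxel_ts - voxel_ts.mean(axis=0)) / voxel_ts.std(axis=0)
    # demean each trial across voxels so that apparent orientation
    # selectivity cannot be driven by global shifts in mean BOLD amplitude
    demeaned = trial_betas - trial_betas.mean(axis=1, keepdims=True)
    return z, demeaned
```

After the second step, each trial's activation pattern has zero mean across voxels, so any decodable orientation information must reside in the pattern itself.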
Voxel selection
For the primary categorization task and orthogonal contrast discrimination task, 3 s trial-level BOLD responses were estimated using block regressors in AFNI's 3dDeconvolve program. Estimates for the amplitude of the BOLD response on each trial served as input for the inverted encoding model described below that was used to generate reconstructed orientation representations associated with each task. This was accomplished using an iterative leave-one-out approach, in which one scanning run from each task type was held out for testing and the remaining 10 runs (five from the contrast discrimination task and five from the categorization task) were used for training. This approach was repeated six times so that all runs ultimately served as test sets.
Prior to each iteration of the inverted encoding model, the independent training set was first used to identify subsets of voxels in V1, V2, and V3 that best distinguished between differing orientation values. F-values for a one-factor ANOVA with orientation as the single factor were computed for all voxels contained within the current training set and ranked; the top 25% of voxels were then used to train and test the model. Importantly, this approach allows us to reconstruct orientation representations for each task from a common set of orientation-selective voxels.
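The selection step can be sketched with a hand-rolled one-way ANOVA (a numpy-only illustration; the function name and array layout are assumptions):

```python
import numpy as np

def select_orientation_voxels(betas, orientations, keep_frac=0.25):
    """Rank voxels by a one-way ANOVA F statistic (single factor:
    orientation) computed on the training set, and keep the top fraction.
    betas: (trials, voxels); orientations: (trials,) condition labels."""
    groups = np.unique(orientations)
    n_trials, n_vox = betas.shape
    grand_mean = betas.mean(axis=0)
    ss_between = np.zeros(n_vox)
    ss_within = np.zeros(n_vox)
    for g in groups:
        x = betas[orientations == g]
        ss_between += len(x) * (x.mean(axis=0) - grand_mean) ** 2
        ss_within += ((x - x.mean(axis=0)) ** 2).sum(axis=0)
    df_between = len(groups) - 1
    df_within = n_trials - len(groups)
    F = (ss_between / df_between) / (ss_within / df_within)
    n_keep = int(np.ceil(keep_frac * n_vox))
    return np.argsort(F)[::-1][:n_keep]   # indices of the retained voxels
```

Because the F-values are computed only on the training runs, the held-out test run never influences which voxels enter the model, avoiding circularity.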
Inverted encoding model analyses
The BOLD responses observed for selected voxels should be associated with the summed activity of many individual orientation-selective neurons. Although the underlying neurons contributing to the BOLD signal from each single voxel may theoretically exhibit different orientation preferences, they should collectively exhibit small but reliable biases in orientation sensitivity that are then detectable at the voxel level (Kamitani and Tong, 2005; Serences et al., 2009; Jia et al., 2011). These consistent biases can be leveraged to make quantitative predictions about how representations of stimulus orientation have changed across visual cortex as a result of task demands or voluntary attention (Sprague et al., 2015). This approach was adopted under the premise that learning categories defined by an orientation rule may lead to shifts in the amplitude, slope, and/or bandwidth of population-wide response functions (Byers and Serences, 2014), particularly for challenging stimuli falling near the category boundaries.
To generate these predictions, we employed an inverted encoding model (Brouwer and Heeger, 2009, 2011; Scolari et al., 2012; Byers and Serences, 2014; Sprague and Serences, 2015; Sprague et al., 2018; Ester et al., 2020). Encoding models make theoretically motivated assumptions about how relevant features are represented in the brain. When the subjects encounter visual features represented in this model, the resulting BOLD response can be used to weight voxels according to the similarity between their true response and the theoretical response for each feature. Finally, the model is “inverted,” such that the voxel weights associated with each feature are used to reconstruct channel response functions (CRFs) using independent task data.
The functions used for the theoretical basis set in the model were based on well-established single-unit tuning functions in V1 associated with orientation perception. Specifically, the model assumes each orientation tuning function to be half-sinusoidal in shape and raised to the ninth power, where the half-bandwidth of orientation-selective neurons spans 20° of orientation space. The model requires a minimum number of evenly spaced functions such that the entire 180° space is covered, and the maximum number of functions should not exceed the number of unique features presented in order to avoid overfitting. To both satisfy these criteria and to maintain consistency with previous studies (Scolari et al., 2012), we used a basis set of 10 evenly distributed orientation functions in the current experiment.
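Such a basis set can be sketched as follows (assuming a rectified cosine raised to the ninth power, wrapped over the circular 180° space; this parameterization yields a half-bandwidth of roughly 20°, consistent with the description above, but the exact functional form is an illustrative assumption):

```python
import numpy as np

def make_basis(n_channels=10, power=9):
    """Evaluate n_channels evenly spaced basis functions over the
    circular 180-deg orientation space. Each function is a half
    sinusoid raised to the 9th power."""
    thetas = np.arange(180)                                # 0..179 deg
    centers = np.arange(n_channels) * (180 / n_channels)   # 0, 18, ..., 162
    basis = np.empty((180, n_channels))
    for i, c in enumerate(centers):
        # circular distance in the 180-deg space, wrapped to [-90, 90)
        d = (thetas - c + 90) % 180 - 90
        basis[:, i] = np.cos(np.deg2rad(d)) ** power
    return basis
```

Each column peaks at its channel center and falls to half its maximum roughly 20° away, so neighboring channels overlap enough to tile the space smoothly.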
The a priori model parameters described above were incorporated into an encoding model first described by Brouwer and Heeger (2009, 2011) with the goal of reconstructing orientation representations associated with different task conditions. Formally, the model requires input parameters for the number of voxels selected (m), the number of trials in the training or testing datasets (n), and the number of pre-defined orientation channels (k, where k = 10 for the current study). B1 and B2 represent m × n matrices used to denote the training and testing datasets. The training data (B1) was mapped on to the full rank matrix of hypothetical channel outputs (C1, k × n) using a weight matrix (W, m × k) estimated from the training data using a GLM:

B1 = WC1,

such that the ordinary least-squares estimate of the weight matrix is Ŵ = B1C1ᵀ(C1C1ᵀ)⁻¹. The model was then inverted by mapping the held-out test data (B2) back into channel space using the estimated weights, yielding the reconstructed channel responses Ĉ2 = (ŴᵀŴ)⁻¹ŴᵀB2 (k × n).
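The weight-estimation and inversion steps of the Brouwer and Heeger procedure can be sketched in a few lines of linear algebra (a generic numpy implementation, not the authors' code; matrix conventions follow the dimensions given above, with B voxels × trials and C channels × trials):

```python
import numpy as np

def train_iem(B1, C1):
    """Estimate the weight matrix W (voxels x channels) via OLS from
    B1 = W @ C1, i.e., W_hat = B1 C1' (C1 C1')^-1."""
    return B1 @ C1.T @ np.linalg.inv(C1 @ C1.T)

def invert_iem(W, B2):
    """Invert the model: reconstruct channel responses for test data,
    C2_hat = (W' W)^-1 W' B2."""
    return np.linalg.inv(W.T @ W) @ W.T @ B2
```

In a full pipeline, the per-trial reconstructions returned by the inversion step are typically re-centered on the presented orientation and averaged before curve fitting.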
Finally, each subject's averaged CRF was fit with the following exponential cosine function (Byers and Serences, 2014; Ester et al., 2020):

f(x) = α·e^(κ(cos(μ − x) − 1)) + β,

where α and β reflect the amplitude and baseline of the function, μ indexes its center, and κ is a concentration parameter inversely related to bandwidth.
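A minimal numpy sketch of an exponential cosine of this form and a coarse grid-based fit follows (assuming the standard parameterization from Byers and Serences, 2014, with amplitude α, baseline β, concentration κ, and center μ; the grid search is an illustrative stand-in for the authors' fitting routine):

```python
import numpy as np

def exp_cosine(x, alpha, beta, kappa, mu):
    """f(x) = alpha * exp(kappa * (cos(mu - x) - 1)) + beta,
    with x and mu in radians spanning the channel space."""
    return alpha * np.exp(kappa * (np.cos(mu - x) - 1.0)) + beta

def fit_exp_cosine(x, y, kappas=np.linspace(0.5, 30, 60),
                   mus=np.linspace(0, 2 * np.pi, 73)):
    """Coarse grid fit: for each candidate (kappa, mu), alpha and beta
    follow from ordinary least squares; keep the best-fitting combination.
    The grids and fitting strategy are illustrative choices."""
    best = None
    for kappa in kappas:
        for mu in mus:
            g = np.exp(kappa * (np.cos(mu - x) - 1.0))
            A = np.column_stack([g, np.ones_like(x)])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            sse = np.sum((A @ coef - y) ** 2)
            if best is None or sse < best[0]:
                best = (sse, coef[0], coef[1], kappa, mu)
    return best[1:]   # alpha, beta, kappa, mu
```

Because α and β enter the model linearly, they can be solved in closed form for each candidate (κ, μ), which reduces the search to two dimensions.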
The statistical effects of categorization condition (orientation/spatial frequency), task phase (contrast/categorization), and distance from the category boundary (near/far) on each CRF parameter were evaluated using linear mixed models implemented in the “lme4” R package with subjects as a random effect, accompanied by pairwise comparisons via paired or two-sample t tests, where appropriate. Effect size estimates for pairwise comparisons were computed using Cohen's d. In addition to traditional p-values, we report Bayes Factors (BF10) to characterize evidence for the null or alternative hypotheses using the “BayesFactor” R package (Morey and Rouder, 2011).
Inverted encoding model predictions
Our task design allowed us to test for possible changes in orientation representations both within and between subjects. First, the orthogonal contrast detection task served as a stimulus-matched comparison condition to determine if learning to categorize stimuli based on orientation enhances the neural representation of behaviorally relevant feature values in visual cortex. We anticipated that the reconstructed representations of stimulus orientation should be relatively enhanced during the categorization task compared to the contrast discrimination task for subjects assigned an orientation rule. This enhancement could take the form of higher amplitudes, steeper slopes, and/or narrower bandwidths. Such enhanced representations may be most beneficial when they are centered on or near the presented orientation value, especially during early learning when the participants are engaged in active exploration of the category space. Thus, we might also expect the estimated function centers to be closer to the presented orientation value during orientation categorization compared to contrast discrimination. Because we are specifically evaluating representations of orientation space in visual cortex across all conditions, and because we expect such enhancement to only be present within neural populations representing the task-relevant dimension, these predictions are specific to only the orientation rule group. Conversely, we expected no differences in any of these measures between the categorization and contrast tasks among the spatial frequency group, where in both cases, orientation is irrelevant.
We furthermore predicted that the subjects learning to categorize stimuli based on orientation would exhibit enhanced orientation representations that are specifically relevant to categorization decisions. In particular, we expected representational enhancement to be most prominent for stimuli near subjects’ assigned category boundaries in the orientation group compared to exemplars at the center of each category.
Offline, we randomly applied one of the four orientation boundary pairs to each of the spatial frequency learners’ data to accommodate between-subject comparisons of orientation representations for near- and far-boundary trials. Importantly, we used the same boundary pairs that were assigned to the orientation group, so that the boundaries were fully matched between groups. For the spatial frequency group, we expected the shape of the resulting CRFs to be uniform, as no significant representational differences should occur for stimulus values that are near or far from arbitrarily assigned orientation boundaries.
It is possible that boundary-specific enhancement effects emerge at a specific stage of learning. For example, enhanced sensory representations of stimuli may only be beneficial during early learning when many errors are committed around the category boundary. Alternatively, such effects may instead emerge only after an adequately high level of performance is achieved in the task. To address these possibilities, we divided the data into early (blocks 1 and 2) and late (blocks 5 and 6) learning stages to test whether learning duration differentially modulates the shape of reconstructed representations for near- and far-boundary stimuli.
Results
Learning performance
To ensure that the spatial frequency categorization task was an appropriate control for orientation categorization, we first compared mean task accuracy and asymptotic learning between both groups. Mean categorization accuracy was well above chance among both the orientation (M = 83.0%, SD = 11.8%) and spatial frequency (M = 85.1%, SD = 4.1%) groups across the six categorization blocks. Critically, all the subjects learned their respective category rules as indicated by accuracy on the last two learning blocks (orientation: range = 62.0–96.3%; spatial frequency: range = 75.9–92.6%), including the worst-performing subject, whose accuracy remained significantly above chance in the last two blocks based on a one-sample t test of trial responses against a mean of 0.5, t(107) = 2.57, p = 0.006.
To ensure that category learning was well matched between groups, we used a linear mixed model with factors for learning block, categorization group, and their interaction. The model revealed a significant main effect of block on accuracy, F(5, 115) = 6.94, p < 0.001, while neither the main effect of group, F(1, 23) = 0.35, p = 0.56, nor block × group interaction, F(5, 115) = 0.60, p = 0.70, were significant. Taken together, these results suggest that the accuracy of both groups improved significantly over the course of the six learning blocks, and that these improvements did not differ between categorization rules (Fig. 2a). Thus, the spatial frequency rule served as an appropriate control condition to the orientation rule.
Secondary within-subject comparisons were carried out to assess performance on the stimulus-matched contrast discrimination task relative to categorization. The mean accuracy for the discrimination task was somewhat lower than that observed for the categorization task in both the orientation (M = 74.9%, SD = 15.1%) and spatial frequency groups (M = 79.5%, SD = 6.2%). Linear mixed models with accuracy as the outcome variable and factors for task (contrast vs categorization), block, and their interaction revealed a significant main effect of task for the orientation group, F(1, 11) = 7.42, p = 0.02, with categorization accuracy being higher than contrast discrimination accuracy on average (Fig. 2b). For the spatial frequency group, a significant task × block interaction was observed, F(5, 120) = 3.54, p = 0.005, such that the difference in accuracy between the categorization and contrast tasks was larger during early blocks than during late blocks (Fig. 2c). Importantly, the mean performance on the contrast discrimination task did not significantly differ between the orientation and spatial frequency subjects, t(23) = −0.99, p = 0.33, d = 0.43. These results suggest that the orthogonal contrast discrimination task was slightly more difficult than the subsequent categorization tasks completed by both groups, but critically, these differences were largely equated between the experimental groups.
Channel response functions for categorization versus contrast discrimination tasks
We predicted relative increases in amplitude and slope, as well as possible decreases in bandwidth and center shift of orientation CRFs, when comparing the orientation categorization task to the orthogonal, physically matched contrast discrimination task. This prediction is based on the broader theory that fine perceptual discriminations between continuously valued stimuli may be supported by stronger (e.g., higher amplitude) and/or more specific (e.g., steeper slopes) neural representations of task-relevant features in the sensory populations responsible for their perception (Scolari et al., 2012; Byers and Serences, 2014). We were particularly interested in testing the interaction between task (contrast vs categorization) and category learning condition (orientation vs spatial frequency) as evidence of representational enhancement. The reported analyses were computed using all trials regardless of behavioral performance to ensure that all unique orientation values were equally represented in each condition. However, we note that the results remained consistent when restricted to correct trials only. The results for areas V2 and V3 were closely matched across statistical comparisons, so the CRFs were averaged across both regions for all analyses.
Linear mixed models including factors for categorization condition (orientation and spatial frequency), task phase (contrast discrimination and categorization), and their interaction were performed with each CRF measure as the outcome variables for both V1 and V2/V3 (Fig. 3). Consistent with our predictions, the model revealed a significant crossover interaction between categorization dimension and task phase in amplitude within area V1, F(1, 23) = 9.87, p = 0.003, BF10 = 15.8. Amplitudes were significantly higher during categorization compared to the contrast discrimination task among orientation rule learners, t(11) = 2.33, p = 0.04, d = 1.18, BF10 = 1.96. The spatial frequency group showed a trend in the opposite direction: amplitude on the contrast discrimination task was slightly greater and did not significantly differ from that on the categorization task, t(12) = −1.47, p = 0.17, d = 0.62, BF10 = 0.67 (Fig. 3). Between groups, orientation rule learners exhibited significantly higher amplitudes than spatial frequency rule learners during categorization, t(23) = 2.85, p = 0.01, d = 1.25, BF10 = 5.50, but not during the orthogonal contrast discrimination task, t(23) = −1.33, p = 0.20, d = 0.53, BF10 = 0.70. Directionally consistent, albeit less reliable, patterns were observed in V2/V3 (categorization dimension × task phase interaction: F(1, 23) = 3.52, p = 0.07, BF10 = 0.45; categorization vs contrast task: orientation rule learners: t(11) = 1.39, p = 0.19, d = 0.48, BF10 = 0.63; spatial frequency rule learners: t(12) = −0.79, p = 0.44, d = 0.26, BF10 = 0.36; orientation vs spatial frequency rule learners: categorization task: t(23) = 2.21, p = 0.04, d = 0.55, BF10 = 2.02; contrast task: t(23) = −0.13, p = 0.90, d = 0.03, BF10 = 0.37).
Within V1, we similarly observed a significant two-way interaction between categorization dimension and task phase for slope, F(1, 23) = 4.21, p = 0.046, BF10 = 73.5. Slopes were significantly steeper in the categorization task compared to the contrast discrimination task for the orientation group, t(11) = 2.78, p = 0.02, d = 1.36, BF10 = 3.72, while they did not reliably differ for the spatial frequency group, t(12) = 1.08, p = 0.30, d = 0.50, BF10 = 0.46. Likewise, slopes were steeper for orientation rule learners compared to spatial frequency rule learners during categorization, t(23) = 2.94, p = 0.007, d = 1.27, BF10 = 6.48, but did not differ between the two groups during the contrast discrimination task, t(23) = 0.56, p = 0.58, d = 0.23, BF10 = 0.41. Once again, the slope patterns in V2/V3 were consistent but weaker than what was observed in V1 (categorization dimension × task phase interaction: F(1, 23) = 3.23, p = 0.08, BF10 = 7.03; categorization vs contrast task: orientation rule learners: t(11) = 2.28, p = 0.04, d = 0.79, BF10 = 1.84; spatial frequency rule learners: t(12) = 0.66, p = 0.52, d = 0.18, BF10 = 0.34; orientation vs spatial frequency rule learners: categorization task: t(23) = 2.72, p = 0.01, d = 0.68, BF10 = 4.46; contrast task: t(23) = 0.33, p = 0.74, d = 0.09, BF10 = 0.38).
In addition to amplitude and slope, we tested the effects of category learning on orientation CRF center shift and bandwidth. Center shift reflects the relative accuracy of orientation representations: smaller absolute values indicate that the peak of the CRF lies closer to the true orientation presented on a given trial. Bandwidth reflects the specificity of the representation in orientation space. We found strong Bayesian evidence for a two-way interaction between categorization dimension and task phase for center shift within V1, F(1, 23) = 3.81, p = 0.06, BF10 = 12.7. Pairwise comparisons revealed that the CRF centers were significantly closer to 0° during category learning than during the contrast discrimination task among orientation rule learners, t(11) = −3.25, p = 0.008, d = 0.98, BF10 = 7.25. This pattern, however, was absent among the spatial frequency rule learners, t(12) = −0.82, p = 0.43, d = 0.23, BF10 = 0.37. CRF centers for the orientation group were also significantly closer to 0° when compared to the spatial frequency group during the categorization task, t(23) = −2.32, p = 0.03, d = 0.62, BF10 = 2.37, while the groups did not differ during the contrast discrimination task, t(23) = 0.69, p = 0.50, d = 0.19, BF10 = 0.44. This suggests that orientation rule learners exhibited more accurate representations of presented orientations than spatial frequency rule learners specifically during category learning. This pattern was restricted to V1, however (V2/V3: interaction term: F(1, 23) = 0.68, p = 0.41, BF10 = 0.22). In contrast to the other measures, bandwidth was not significantly modulated by categorization condition and phase in either V1, F(1, 23) = 0.60, p = 0.45, BF10 = 0.12, or V2/V3, F(1, 23) = 0.51, p = 0.48, BF10 = 0.08.
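For intuition about the four CRF measures, they can be approximated nonparametrically from a sampled reconstruction. The actual analyses fit a parametric function to each CRF (see Methods); this stdlib-only sketch, including the toy Gaussian-shaped CRF, is illustrative only:

```python
import math

def crf_measures(offsets, resp):
    # Nonparametric summaries of a channel response function (CRF) sampled at
    # orientation offsets (deg) relative to the presented stimulus (0 deg).
    baseline, peak = min(resp), max(resp)
    amplitude = peak - baseline                # height of the peak above baseline
    center_shift = offsets[resp.index(peak)]   # peak location relative to 0 deg
    half = baseline + amplitude / 2.0
    above = [o for o, r in zip(offsets, resp) if r >= half]
    bandwidth = max(above) - min(above)        # width at half maximum, on the sample grid
    pairs = list(zip(offsets, resp))
    slope = sum(abs(r2 - r1) / (o2 - o1)       # mean absolute steepness between samples
                for (o1, r1), (o2, r2) in zip(pairs, pairs[1:])) / (len(pairs) - 1)
    return amplitude, slope, center_shift, bandwidth

# Toy CRF: a Gaussian-shaped bump centered on the presented orientation
offsets = list(range(-90, 91, 10))
resp = [0.1 + 0.8 * math.exp(-(o / 30.0) ** 2) for o in offsets]
amp, slope, shift, bw = crf_measures(offsets, resp)
```

In these terms, a higher amplitude indicates a stronger representation, a steeper slope a more specific one, a center shift nearer 0° a more accurate one, and a narrower bandwidth more precise tuning.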
Thus far, we have compared reconstructed representations of stimulus orientation across visually matched tasks. Taken together, the results suggest that neural representations of orientation were enhanced during active categorization, and only when orientation was the category-relevant dimension (Fig. 3). This was largely true across the tested areas (V1 and V2/V3), although the effects were stronger and more reliable in primary visual cortex.
Channel response functions for near-boundary orientations versus central orientations
In the analyses thus far, we have considered all stimulus orientation values together. However, we hypothesized that representational enhancement should be most pronounced for difficult-to-classify stimuli that border the category boundary, where stronger and/or more specific perceptual representations would benefit performance the most. To test these predictions, we created two groups of trials: ones containing orientation values near the assigned boundary (5° offset), and ones containing orientation values far from the boundary (35° and 45° offset). Note that for spatial frequency learners, these trial groups were created based on arbitrary orientation boundaries assigned offline for comparison purposes.
Four linear mixed models with factors for categorization condition (orientation and spatial frequency), orientation offset from the category boundary (near and far), and their interaction were carried out for each tested visual area with amplitude, slope, center shift, and bandwidth as the respective outcome variables (Fig. 4). The amplitude model revealed a significant interaction effect in V1 between category rule and stimulus distance from the boundary, F(1, 23) = 5.56, p = 0.03, BF10 = 91.3 (but not in V2/V3: F(1, 23) = 0.62, p = 0.44, BF10 > 100 for a simplified model including only main effects of categorization task and distance from the boundary). Reconstructed representations of stimulus orientation in V1 had higher amplitudes for stimuli bordering the category boundary relative to those far from the boundary within the orientation group, t(11) = 4.06, p = 0.002, d = 0.76, BF10 = 23.4, but not the spatial frequency group, t(12) = 1.38, p = 0.19, d = 0.45, BF10 = 0.61.
The same interaction was significant within the slope of V1 CRFs, F(1, 23) = 11.9, p = 0.002, BF10 = 26.4: Among orientation rule learners, near-boundary stimuli elicited steeper slopes than those far from the boundary, t(11) = 4.32, p = 0.001, d = 0.88, BF10 = 33.9, an effect that was not present for spatial frequency rule learners, t(12) = −1.15, p = 0.27, d = 0.48, BF10 = 0.48 (Fig. 4). As with amplitude, this effect was largely restricted to V1 (V2/V3: F(1, 23) = 0.71, p = 0.41, BF10 = 3.38).
A converging albeit marginally significant two-way interaction between categorization task and distance from the boundary was present in center shift within V1, F(1, 23) = 3.72, p = 0.07, BF10 = 0.53 (but not in V2/V3: F(1, 23) = 0.02, p = 0.90, BF10 = 0.16). Consistent with predictions, CRFs were centered closer to the presented stimulus on near-boundary trials compared to far-from-boundary trials among orientation rule learners, t(11) = −2.60, p = 0.02, d = 0.45, BF10 = 2.86. At the same time, the centers did not significantly differ across distances from the arbitrary orientation boundaries among spatial frequency rule learners, t(12) = 0.76, p = 0.46, d = 0.19, BF10 = 0.36. Furthermore, the mean center shift was significantly closer to 0 on near-boundary trials among the orientation group compared to the spatial frequency group, t(23) = −2.77, p = 0.01, d = 0.72, BF10 = 4.80. Consistent with our contrast discrimination results (Fig. 3), categorization group and distance from the boundary did not reliably modulate bandwidth in either V1, F(1, 23) = 0.89, p = 0.35, BF10 = 0.98, or V2/V3, F(1, 23) = 2.70, p = 0.11, BF10 = 0.19.
Taken together, our results across amplitude, slope, and center shift converge in strong support of the hypothesis that sensory representations within V1 were stronger and more precise for task-relevant stimulus dimensions in response to learning. Moreover, this enhancement was primarily applied to the most behaviorally relevant features in the space (in this case, orientations flanking the category boundary).
Effects of learning on reconstructed representations of orientation
Averaging across all six blocks of the categorization task, we demonstrated that category learning enhances the sensory representation of task-relevant features. One question that remains is whether and how these representations change over the course of learning. For example, it is possible that the boundary-specific enhancement of orientation representations in primary visual cortex only emerges after asymptotic learning, when the subjects have successfully detected and established the location of the category boundaries. Alternatively, representational sharpening may be driven by prediction error during active learning and thus more apparent during the early stages of the task, when the subjects may engage in explicit hypothesis testing to determine the category rule (Medin and Schaffer, 1978; Choi et al., 1993; Johansen and Palmeri, 2002). Finally, a third possibility is that the boundary-specific enhancement holds stable across the course of learning. Most subjects across both categorization conditions reached a performance asymptote by the fourth task block (Fig. 2). Thus, to isolate possible early- and late-learning effects in the present study, we compared reconstructed representations in blocks 1 and 2 of the categorization task to those in blocks 5 and 6. Notably, the model training procedure was identical for early- versus late-learning scanning runs.
To test whether the boundary-specific representational sharpening observed in V1 was differentially modulated early or late during the learning process, we extended the previous model to include a categorical predictor for early versus late learning, with particular interest in the three-way interaction between categorization condition, distance from the boundary, and learning stage. For amplitude, this three-way interaction was not significant, F(1, 46) = 0.01, p = 0.97, although the Bayes factor favored the full model overall, BF10 = 59.9. However, the model revealed a significant two-way interaction between categorization condition and early/late learning, F(1, 46) = 8.32, p = 0.005, BF10 > 100, for a simplified model including only an early/late × condition interaction and marginal main effects. This interaction reflects the fact that for orientations both near and far from the participants’ boundaries, amplitudes were significantly higher in late versus early learning for the orientation group, t(11) = 2.66, p = 0.02, d = 0.34, BF10 = 3.10, whereas learning duration was associated with a small but statistically significant decrease in amplitude among the spatial frequency group, t(12) = −2.20, p = 0.048, d = 0.63, BF10 = 1.66. This inverse effect in amplitude between the two groups occurred independently of boundary effects: The difference in amplitude for near- versus far-from-boundary exemplars within the orientation group did not differ between learning stages, t(11) = 0.01, p = 0.99, d = 0.001, BF10 = 0.29. Neither the three-way interaction effect, F(1, 46) = 0.19, p = 0.66, nor any marginal effects reached significance when slope was used as the outcome variable, BF10 = 0.01 for the full slope model.
Interestingly, orientation CRF bandwidths were also seemingly modulated in response to category learning, reflected in a significant two-way interaction between categorization condition and learning stage, F(1, 46) = 5.80, p = 0.02, BF10 = 0.26 (but note the discordance between null hypothesis testing and Bayesian analysis, which indicates anecdotal evidence in favor of the null). Independent of category boundaries, bandwidths were relatively narrower in late learning compared to early learning in the orientation group, t(11) = −1.91, p = 0.08, d = 1.02, BF10 = 1.14, albeit not significantly so. At the same time, we observed a learning effect trending in the opposite direction for the spatial frequency group, t(12) = 2.01, p = 0.07, d = 0.69, BF10 = 1.29, whereby bandwidths widened over the course of learning. Across the board, patterns in V2/V3 were once again directionally consistent with V1, but largely unreliable (see Fig. 5).
The amplitude and (to a lesser extent) bandwidth modulation observed within V1 suggests that representations of category-relevant stimulus dimensions are enhanced, especially at later stages of learning. At the same time, boundary-specific representational changes emerged relatively early in learning and remained consistent after the subjects reached asymptotic performance. Because high behavioral performance was achieved quickly on average during the categorization tasks (Fig. 2a), it may be unsurprising that evidence for perceptual enhancement was present during early learning, as we defined it. We note that our approach may have missed dynamic changes to sensory representations occurring on a smaller scale than we can reliably evaluate with the current dataset (e.g., within the first block of exposure and/or on the scale of individual trials; O'Bryan et al., 2018a). Thus, more research is needed to understand the precise time course of sensory modulation during category learning.
Association between task accuracy and reconstructed representations of orientation
Finally, we were interested in testing whether differences in behavioral performance between subjects during the scanning session were associated with the shape of their reconstructed representations of orientation specifically for boundary-adjacent exemplars. On one hand, it is possible that representational sharpening was most pronounced for high-performing subjects who had more time to narrow their attentional focus on the category boundaries after quickly establishing the category rule. Alternatively, it is possible that the representational sharpening observed for near-boundary exemplars in the orientation group is an error-driven effect, such that individuals who were committing more errors in this perceptually challenging region of the feature space exhibited the strongest sharpening effects as a compensatory mechanism.
To address this question, we computed Pearson correlations between mean categorization accuracy and subject-level differences in CRF parameters associated with near- versus far-from-boundary stimuli (e.g., near amplitude–far amplitude). Because all observed patterns were stronger and more reliable in primary visual cortex, we expected any significant associations with behavior to occur in this area. We found that orientation subjects’ categorization accuracy was significantly associated with relatively higher CRF amplitudes in V1 for near-boundary stimuli, r = 0.59, 95% CI [0.02, 0.87], t(10) = 2.31, p = 0.04, as well as with relatively steeper slopes for near-boundary stimuli, r = 0.68, 95% CI [0.17, 0.90], t(10) = 2.90, p = 0.02 (Fig. 6, red). Among spatial frequency learners, the associations between task accuracy and indices of near-boundary representational sharpening were negative and nonsignificant (amplitude: r = −0.46, 95% CI [−0.81, 0.12], t(11) = −1.71, p = 0.12; slope: r = −0.10, 95% CI [−0.62, 0.48], t(11) = −0.33, p = 0.75; Fig. 6, blue). Fisher's r-to-z transformations revealed that the correlations between accuracy and boundary-specific representational changes significantly differed between categorization groups (amplitude: z = 2.56, p = 0.01; slope: z = 2.02, p = 0.04). We found no significant associations between learning performance and V1 center shift or bandwidth, nor with any individual CRF parameter in V2/V3.
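Fisher's r-to-z comparison of two independent correlations uses the standard formula z = (atanh(r1) − atanh(r2)) / sqrt(1/(n1 − 3) + 1/(n2 − 3)); with the group sizes here (N = 12 orientation, N = 13 spatial frequency), the reported statistics can be reproduced as follows:

```python
import math

def fisher_z_compare(r1, n1, r2, n2):
    # Fisher's r-to-z test for the difference between two independent correlations
    z1, z2 = math.atanh(r1), math.atanh(r2)      # r-to-z transform
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))  # standard error of the difference
    return (z1 - z2) / se

# Amplitude: r = .59 (orientation, n = 12) vs r = -.46 (spatial frequency, n = 13)
z_amp = fisher_z_compare(0.59, 12, -0.46, 13)    # → 2.56
# Slope: r = .68 vs r = -.10
z_slope = fisher_z_compare(0.68, 12, -0.10, 13)  # → 2.02
```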
Collectively, the results suggest that learning category rules defined by orientation not only led to sharper representations of orientation than learning an orthogonal rule, but that the strength and specificity of the reconstructed representations in V1 for challenging near-boundary (5°) exemplars track individual differences in categorization accuracy. Higher-performing orientation subjects exhibited more relative enhancement of near-boundary orientation representations than lower-performing subjects who nonetheless learned the category rules. Importantly, because the current sample size is limited for the purposes of testing individual differences in behavior, future studies should seek to replicate the demonstrated associations between categorization performance and indices of representational enhancement.
Discussion
Learning to categorize visual stimuli leads to improved perceptual representations (Jolicoeur et al., 1984; Hamm and McMullen, 1998; Zeithamova and Maddox, 2007; Soto and Ashby, 2015), but a consensus has not been reached on how and when this occurs. We hypothesized that category learning is supported by a representational enhancement of near-boundary exemplars within sensory cortex, and that this effect should be most pronounced during learning, especially if it facilitates exploration and discovery of the boundaries that define categories. Using task manipulations that matched all aspects of the visual display, we found evidence of stronger and sharper orientation representations during a categorization task compared to an orthogonal contrast discrimination task and only for orientation rule learners. Moreover, reconstructed representations of near-boundary exemplars within V1 exhibited higher amplitudes, steeper slopes, and smaller shifts from the presented orientation compared to those far from the boundary. This suggests that visual category learning is accompanied by rapid, feature-specific functional plasticity in early visual cortex to support more challenging category-relevant perceptual discriminations.
Category learning is a complex cognitive process that recruits a variety of distinct brain regions in a task-dependent manner (Seger and Miller, 2010). In the current study, the participants learned to categorize stimuli based on unidimensional rules that can be acquired via explicit reasoning processes (Ashby et al., 1998; Zeithamova and Maddox, 2006; Ashby and Maddox, 2011) which are thought to depend critically on the prefrontal cortex (Ashby et al., 1998; Freedman et al., 2003; Jiang et al., 2007; Paniukov and Davis, 2018; O’Bryan et al., 2018b) and executive cortico-striatal loops between the head of the caudate and PFC (Seger and Cincotta, 2006; Seger and Miller, 2010). Rule-based categorization is therefore expected to operate on abstract stimulus representations that are far removed from their basic sensory properties. Indeed, recent work shows that rule-based category learning generalizes for stimuli presented at untrained retinal locations, consistent with the view that this type of categorization depends principally on visual input from higher-level regions with large receptive fields (Rosedahl et al., 2018). Thus, a direct neurobiological mechanism linking visual category learning to changes in early visual cortex has been unclear to date, and whether neural plasticity includes sensory regions has been largely ignored or discounted in most neuroscientific investigations of categorization (Freedman and Assad, 2016).
We argue that our new findings of localized perceptual enhancement during categorization are the result of top-down, feature-based attention mechanisms recruited to support learning in tasks that require discrimination of specific feature values. The importance of accounting for learning-related attentional flexibility has long been recognized across theoretical accounts of categorization (Nosofsky, 1986; Kruschke, 1992), and several neuroimaging studies have shown that higher-order visual areas exhibit greater sensitivity to stimuli along attended, diagnostic dimensions relative to those that do not predict category membership (Sigala and Logothetis, 2002; Li et al., 2007; Meyers et al., 2008; Folstein et al., 2013; Mack et al., 2013; Uyar et al., 2016; O’Bryan et al., 2018b). Mounting evidence indicates that attentional control can also exert modulatory effects in early visual cortex, including V1 (Kamitani and Tong, 2005; Scolari and Serences, 2009; Serences et al., 2009; Scolari et al., 2012), leaving open the possibility that V1 contributes to category learning despite its limited treatment in the literature. For example, perceptual learning elicits sensory modulation in early visual cortex by increasing gain for attended features, suggesting that such learning may directly support behaviorally relevant perceptual discriminability (Byers and Serences, 2014).
Our data likely reflect a similar mechanism, whereby modulation of orientation representations in early visual cortex are a result of selective attention toward category-relevant features. These attentional control signals may originate from frontoparietal cortex (Scolari et al., 2015) or from frontal regions that are routinely activated during categorization tasks (e.g., rostrolateral prefrontal cortex; Davis et al., 2017; Paniukov and Davis, 2018; O'Bryan et al., 2018b). Future research should investigate how higher-order regions interact with sensory cortex in support of category learning.
The current study demonstrates that neural representations in early, sensory-driven regions can be rapidly modified to optimize behavior in a learning context. Our results complement recent work by Ester et al. (2020), who similarly used an inverted encoding model to test for categorization-related sensory modulation in visual cortex. The participants in their study were trained to ceiling performance on an orientation categorization task prior to scanning, such that their fMRI results reflect perceptual representations following, but not during, the acquisition of category rules. In a departure from our findings, the results revealed that the reconstructed representations of stimulus orientation were shifted toward the mean of the category they belonged to, suggesting that increasing within-category similarity in sensory populations supports generalization of learned rules.
The current study extends these findings by providing novel support for localized representational enhancement during ongoing category learning. Stimuli bordering orientation rule learners’ assigned boundaries elicited stronger (via increased amplitude), more specific (via steeper slope), and more faithful (via reduced center shift) representations of orientation values in visual cortex than far-from-boundary stimuli. This was especially true in V1. This representational gain implies a stretching of the feature space that is specific to difficult-to-classify exemplars. For example, small offset differences near an orientation category boundary should be more neurobiologically separable (and by extension, perceptually separable) than an identical difference between exemplars near a category center. Moreover, among orientation rule learners, the reconstructed representations for stimulus orientation falling at least 35° from a boundary did not significantly differ from those observed in two orthogonal tasks (contrast discrimination; spatial frequency categorization) where stimulus orientation was irrelevant to task performance. These results suggest that representational sharpening—at least for orientation perception—can be a highly specific effect. Notably, the same pattern could arise from specific feature-based attention paradigms in which only select values are cued or consistently relevant to behavioral goals (Addleman et al., 2022). In our study, trial and error likely guided selective attention to the features that maximize classification accuracy.
The disparate findings between Ester et al. (2020) and the current study could be reconciled by acknowledging that different stages of categorization behavior may depend on distinct attention mechanisms. For example, the participants may initially apply top-down attention during early category learning in service of explicit hypothesis testing, especially in the context of simple (unidimensional) rules (Medin and Schaffer, 1978; Choi et al., 1993; Ashby et al., 1998; Johansen and Palmeri, 2002). This is consistent with our interpretation of the current data. As learning progresses, the participants may switch from a rule-based approach to an exemplar-based one (Johansen and Palmeri, 2002; consistent with Ester et al.). Thus, it is possible that the boundary-specific enhancement we observed exclusively during learning would dissipate after the participants settled on a precise categorization rule and shifted to more automatic, procedural response patterns (Soto et al., 2014). This may correspond with a feature-based attentional shift toward central exemplars and an associated representational shift (Ester et al., 2020) to maximize accurate classification of novel stimuli. Although we did not observe this same center shift here, our results revealed a relative increase in amplitude for central exemplars during late-learning stages, which could serve as a precursor to an exemplar-based approach.
Mechanisms to account for the enhancement of category-relevant dimensions via selective attention are included in most contemporary models of category learning, including exemplar (Nosofsky, 1986; Kruschke, 1992), clustering (Love et al., 2004), and decision bound models (Ashby and Maddox, 1993). However, these models assume that attention is applied uniformly to stimuli along category-relevant dimensions. Consistent with this assumption, several behavioral studies have provided evidence for dimension-wide attentional modulation (Folstein et al., 2012, 2015; Van Gulick and Gauthier, 2014) while failing to detect localized perceptual effects. Nonetheless, more flexible models that allow for exemplar-specific attentional modulation may better account for human behavior in certain categorization tasks (Aha and Goldstone, 1992; Rodrigues and Murre, 2007). Our data provide evidence that such flexible modulation is possible. Accordingly, future research should test whether and how different categorization models could account for these neurobiological data.
Relatedly, more research is needed to establish whether and how perceptual enhancements that result during category learning transfer to novel stimuli and subsequent orthogonal tasks. Different studies attempting to characterize such sensory modulations at the neural and behavioral levels have tested their predictions during the learning process (Goldstone and Steyvers, 2001; Sigala and Logothetis, 2002), following asymptotic learning (Jiang et al., 2007; Folstein et al., 2012, 2013; Ester et al., 2020), and using interleaved categorization and discrimination tasks (Gureckis and Goldstone, 2008). Future studies should seek to contrast these different approaches to establish which scenarios best facilitate transfer between category learning and indices of perceptual sensitivity.
Our data support the prediction that visual category learning is associated with a representational enhancement of sensory populations that are tuned to category-relevant stimulus dimensions. Furthermore, this enhancement was uniquely observed for challenging stimuli that bordered subjects’ category boundaries. Collectively, these results suggest that learning-related changes to the human visual system may be implemented more flexibly and efficiently than previously thought.
Footnotes
This work was supported by the National Science Foundation Grant 1923267 awarded to M.S. S.R.O.’s primary contributions to this work were made at Texas Tech University. He is currently affiliated with the Department of Cognitive, Linguistic & Psychological Sciences, Brown University. We thank Thomas C. Sprague for his helpful discussions on this project.
The authors declare no competing financial interests.
Correspondence should be addressed to Sean R. O’Bryan at sean_obryan{at}brown.edu.