A Neural Mechanism of Social Categorization

Humans readily sort one another into multiple social categories from mere facial features. However, the facial features used to do so are not always clear-cut because they can be associated with opponent categories (e.g., feminine male face). Recently, computational models and behavioral studies have provided indirect evidence that categorizing such faces is accomplished through dynamic competition between parallel, coactivated social categories that resolve into a stable categorical percept. Using a novel paradigm combining fMRI with real-time hand tracking, the present study examined how the brain translates diverse social cues into categorical percepts. Participants (male and female) categorized faces varying in gender and racial typicality. When categorizing atypical faces, participants' hand movements were simultaneously attracted toward the unselected category response, indexing the degree to which such faces activated the opposite category in parallel. Multivoxel pattern analyses (MVPAs) provided evidence that such social category coactivation manifested in neural patterns of the right fusiform cortex. The extent to which the hand was simultaneously attracted to the opposite gender or race category response option corresponded to increased neural pattern similarity with the average pattern associated with that category, which in turn was associated with stronger engagement of the dorsal anterior cingulate cortex. The findings point to a model of social categorization in which occasionally conflicting facial features are resolved through competition between coactivated ventral–temporal cortical representations with the assistance of conflict-monitoring regions. More broadly, the results offer a promising multimodal paradigm to investigate the neural basis of “hidden”, temporarily active representations in the service of a broad range of cognitive processes.
SIGNIFICANCE STATEMENT Individuals readily sort one another into social categories (e.g., sex, race), which have important consequences for a variety of interpersonal behaviors. However, individuals routinely encounter faces that contain diverse features associated with multiple categories (e.g., feminine male face). Using a novel paradigm combining neuroimaging with hand tracking, the present research sought to address how the brain comes to arrive at stable social categorizations from multiple social cues. The results provide evidence that opponent social categories coactivate in face-processing regions, which compete and may resolve into an eventual stable categorization with the assistance of conflict-monitoring regions. Therefore, the findings provide a neural mechanism through which the brain may translate inherently diverse social cues into coherent categorizations of other people.


Introduction
Humans naturally sort the world into categories to "provide maximum information with the least cognitive effort" about its myriad contents (Rosch, 1978, p. 28). In the case of other people, such categorization occurs automatically (e.g., sex or race), in turn activating stereotypes and attitudes that influence interpersonal behavior (Macrae and Bodenhausen, 2000). Although seamlessly categorized, faces across the human population exhibit natural within-category variability and thus vary along relevant social category continua in a graded fashion. Therefore, we frequently encounter faces that vary in their prototypicality (e.g., a female face with masculine features). Once perceived, such within-category variability in facial features can affect the activation of stereotypes and attitudes (Blair et al., 2002) and bear real-world consequences (Galinsky et al., 2013).
Initial research acknowledging the possibility of such gradedness during social categorization argued against simplified binary assumptions (i.e., a category is either activated or not). According to this early account, not only do faces automatically activate a particular social category, but the strength of that activation can vary (Locke et al., 2005). More recent computational models, such as the dynamic interactive model, posit that multiple social categories are always activated in parallel to some extent, particularly when a given face's features are associated with different categories (Freeman and Ambady, 2011). For instance, a female face with certain masculine features may elicit partial activations of both male and female categories early on. This triggers a dynamic competition process that later stabilizes the category percept over time. Recent behavioral studies have provided evidence for this process through the use of computer mouse tracking, in which the attraction of hand trajectories to unselected response options (en route to a final response option) indexes partial activation of multiple categories during perception (e.g., in the case of gender and race). However, it is currently unclear at what level of neural representation social category coactivation manifests and how the brain arrives at stable perceptions from multiple activated social categories. The fusiform gyrus (FG) and surrounding ventral temporal cortex are involved in face (Haxby et al., 2001) and social category representation (Contreras et al., 2013). If multiple social categories are indeed coactivated by natural mixtures of facial features, then we may expect FG representational patterns of target faces to simultaneously approximate those of two distinct category representations in a graded manner.
Indeed, gradations have been observed recently both within (Iordan et al., 2016) and between (Sha et al., 2015) semantic category representations in ventral-temporal cortex and linked to semantic categorizations behaviorally (Ritchie et al., 2015). However, research has yet to investigate how such graded neural representation may involve the competition and resolution of coactivated categories.
Once social categories are coactivated, conflict-monitoring mechanisms are likely important to detect the conflict and help resolve competition between multiply activated representations (Botvinick et al., 2001). Indeed, the resolution of competing perceptual representations is integral to categorization responses. A large body of research suggests that such functions may be performed by the cingulo-opercular network (Dosenbach et al., 2006), including the anterior insula/frontal operculum (aI/fO) and, centrally, the presupplementary motor area and dorsal anterior cingulate cortex (pre-SMA/dACC). Neuroimaging studies have suggested that more dorsal components of the extended pre-SMA/dACC region hold a conflict monitoring signal over and above other often confounding processes in nearby regions, namely task difficulty (Neta et al., 2014), arousal (Nachev et al., 2005), and prediction error (Jahn et al., 2016). It has also been proposed that the nature of pre-SMA processing is more cognitive than the motor processing associated with the nearby SMA (Nachev et al., 2008). Relevant to the current research, conflict-monitoring regions have been shown to respond to similar instances of conflictual social category activations, such as cases in which bottom-up facial features are inconsistent with top-down expectations (Hehman et al., 2014). Therefore, here, we hypothesized the engagement of the pre-SMA/dACC by instances of conflicting social categories.

Materials and Methods
The present work sought to test a model of social categorization in which social categories are represented in a sensitive, graded fashion in the FG. In cases of natural inconsistencies frequently encountered in the social world (e.g., feminine male, a white face with black-related features), we predict a corepresentation of conflictual social categories in the FG, which may in turn trigger conflict-related processes in the pre-SMA/dACC and other cingulo-opercular regions that may help to resolve multiply activated social categories into stable perceptions. To test this, we developed a novel paradigm to synchronize fMRI with real-time categorization dynamics assessed by computer mouse tracking.
Figure 1. a, Stimuli across all three tasks. Per categorization task (top: color; middle: sex; bottom: race), faces varied from one category to another, being either typical or atypical exemplars of their respective categories. b, Behavioral results (n = 16) and example of mouse-tracking paradigm. Results showed significantly higher maximum perpendicular deviation (MD) toward atypical (white line) versus typical (black line) targets across tasks (b = 0.05, SE = 0.006, t(15.058) = 7.935, p < 0.0001), demonstrating increased deviation toward the nonchosen response options (i.e., coactivation). Results in the bar plot are depicted to plot condition differences intuitively using within-subject error bars (39); however, the analysis was completed in a multilevel mixed model. In this paradigm, participants clicked a start button, after which the stimulus appeared and they selected response options in either top corners of the screen (e.g., male vs female; white vs black; red vs blue). Mouse trajectories were recorded continuously to observe the amount of deviation toward nonchosen responses and index category coactivation. Typical and atypical black faces are included as examples of these conditions.
In the scanner, participants (n = 16) made speeded categorizations of the sex (male vs female) and race (white vs black) of typical and atypical face targets and, to assess the domain generality of the effects, of the color (blue vs red) of object targets as well (Fig. 1a). Atypical targets were those still reliably perceived as belonging to the correct category yet exhibiting a slight featural resemblance to the opponent category (e.g., male face with feminine features; Fig. 1a). We hypothesized that right FG (Kanwisher et al., 1997; Haxby et al., 2001; specializing in face perception) multivoxel patterns should show evidence of social category representation, including the corepresentation of conflictual categories, whereas the pre-SMA/dACC and cingulo-opercular regions should exhibit a stronger engagement during such conflicts.
To investigate the coactivation of opponent categories during perception, we measured online mouse trajectories and blood oxygenation level-dependent (BOLD) responses from subjects performing a categorization task during fMRI. We used a fiber-optic computer mouse system (NataTech) to allow subjects to operate the mouse in the scanner environment. Pretesting ensured this introduced negligible motor artifacts. The experiment involved three separate 2 × 2 within-subjects design categorization tasks: gender (gender: female vs male × typicality: typical vs atypical), race (race: black vs white × typicality: typical vs atypical), and color (color: blue vs red × typicality: typical vs atypical). Participants also completed a standard demographic survey.

Subjects
Sixteen adult subjects were recruited from the Dartmouth College undergraduate student community (62% male; M age = 19.37; 6 white, 6 Asian, 2 black, 2 other; all right-handed; number of subjects based upon recent studies applying similar methods to study face processing; Ratner et al., 2013; Watson et al., 2014; Stolier and Freeman, 2016). Subjects were financially compensated or received partial course credit for participation. Before the study, subjects underwent an informed consent process and screening for fMRI scanning, which was approved by the Committee for the Protection of Human Subjects at Dartmouth College.

Materials
Task. Evidence for social category coactivation has mostly been obtained via a computer mouse-tracking paradigm. In the mouse-tracking paradigm, hand movements en route to response options are recorded such that, in addition to a final categorical response, the hand's attraction toward each response option indexes the extent of its activation. For instance, when categorizing a female face with subtle masculine features, although participants ultimately select the female response, their hand trajectory simultaneously exhibits a partial attraction to select the male response on the opposite side of the screen. To date, such parallel attraction effects in mouse-tracking paradigms have successfully provided evidence in favor of category coactivation during the categorization of a face's sex, race, age (Cloutier et al., 2014), or emotion (Mattek et al., 2016). The use of hand movements as a continuous index of evolving categorization dynamics is widely supported by neurophysiological research (Cisek and Kalaska, 2005). For instance, in perceptual decision-making tasks in which a monkey commits a response by reaching in one of two potential directions, premotor cortical populations initially tune toward the two response directions simultaneously. As evidence accumulates over time, gradually, the population for the to-be-selected response is amplified, whereas that for the unselected response is suppressed, demonstrating that information about a perceptual decision is made immediately available to the premotor cortex as it accrues, rather than once it has finalized (Cisek and Kalaska, 2005). In humans, event-related potential studies show that ongoing processing results during categorization (e.g., evidence for a male vs. female perceptual target and according response in a categorization task) are immediately and continuously shared with the motor cortices to steer a hand-guided categorical response over time (Freeman et al., 2011b).
Such work suggests that a participant's hand motion, as recorded in mouse-tracking paradigms, can reflect dynamic updates of a decision process as it evolves over fractions of a second (Freeman and Ambady, 2010; Hehman et al., 2015). Nevertheless, hand movements in these paradigms cannot definitively rule out the possibility that a more indirect trajectory reflects merely a less decisive movement (i.e., weaker activation of the selected category) rather than genuine parallel attraction (i.e., coactivation; van der Wel et al., 2009) and thus would benefit from converging evidence. Although other methods of detecting competition between response options are also prevalent (e.g., eye tracking), hand tracking is best suited to the current context because of its high temporal resolution (providing an index of response competition ~70 times/s) and because continuous manual data are preferable to discrete oculomotor data for measuring genuine simultaneous activation of multiple response options (Freeman et al., 2011).
Participants completed a set of two-choice categorization tasks within the fMRI scanner. The task was designed with MouseTracker software. This allowed us to collect online mouse trajectory data in addition to response decision and timing data (Wojnowicz et al., 2009). Mouse-tracking trials were implemented in a standard two-choice categorization task. Participants were required to make a speeded two-choice categorization decision once the target stimulus appeared. Subjects first clicked a start button at the bottom-center of the screen and then used the mouse to click response options at the top-right and top-left corners of the screen (e.g., male vs female). During this motion, online x- and y-coordinates of the mouse were continuously recorded at ~70 Hz as the participants responded. These data were used later to estimate curvature toward opposing responses (e.g., when categorizing an atypical female face, deviation toward the "male" response en route to a final "female" response). These deviations were used as an index of category coactivation.
Stimuli. Face stimuli were generated with FaceGen Modeler. This software uses a 3D morphing algorithm based on anthropometric parameters of the human population, in which various social category cues can be precisely manipulated while holding other extraneous cues constant. Forty unique face identities were generated for both face categorization tasks (gender and race; Fig. 1a). These identities were then morphed to appear as typical and atypical category members (e.g., male and female; white and black). Atypical category members were still reliably recognized as members of their specified category but displayed facial features of the opposing category (e.g., female with slight masculine features). This resulted in a total of 160 face stimuli per task, made up of 4 conditions: category (2: male and female or white and black) × typicality (2: typical vs atypical).
Color stimuli were 40 household object photographs fully tinted to different colors. Object photographs were used so as to have an equal number of exemplars of each category as used in the face tasks. Each object photograph was colored as typical and atypical colors. Specifically, each photograph was tinted as typical red and blue, as well as two colors on the spectrum between red and blue still recognized as their respective color category condition. Therefore, a total of 160 color stimuli were generated, made up of 4 conditions: category (2: blue vs red) × typicality (2: typical vs atypical).
Stimulus condition validation. Given the a priori typicality condition labeling of stimuli as typical or atypical based upon parameters in their generation, we collected additional, independent data to validate the assigned typicality of each stimulus. In an independent online sample, we had three groups of participants rate the typicality of each stimulus used in the main imaging study for the color task (n color = 25; M age = 37.84, SD age = 13.79, 40% male, all white), race task (n race = 25; M age = 39.96, SD age = 14.63, 72% male, all white), and sex task (n sex = 25; M age = 40.38, SD age = 12.88, 40% male, all white). Participants were recruited online through Amazon Mechanical Turk and received monetary compensation for their participation. They gave informed consent in a manner approved by the University Committee on Activities Involving Human Subjects at New York University. Participants were presented with each stimulus within their assigned stimulus set from the scanner tasks (160 stimuli per task, as described above) and asked to indicate how "typical" the stimulus appeared of its respective category (e.g., "How typical of the color BLUE is this image?"; 7-point Likert items). To validate the condition labeling, we regressed the independent sample typicality ratings on the condition labels per stimulus (contrast coded: −1 = typical, 1 = atypical). We found that atypical stimuli were rated as significantly less typical than typical stimuli (F(1,478) = 956.112, p < 0.00001). This analysis validates the generation of stimuli varying along the typicality condition.
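The validation regression can be illustrated with a minimal Python sketch. It assumes one mean rating per stimulus (160 stimuli × 3 tasks = 480 observations, which would be consistent with the reported residual df of 478) and a single contrast-coded predictor; the function name `simple_regression_f` and the toy ratings are hypothetical and are not the study data.

```python
def simple_regression_f(x, y):
    """Ordinary least squares with one predictor.

    Returns (slope, F statistic, (df_model, df_residual)).
    With x contrast coded as -1 (typical) and 1 (atypical), the slope is
    half the difference between the atypical and typical mean ratings.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_reg = slope * sxy          # variance explained by the predictor
    df_res = n - 2                # e.g., 480 stimuli would give 478
    f_stat = ss_reg / (ss_res / df_res)
    return slope, f_stat, (1, df_res)


if __name__ == "__main__":
    # Toy ratings on a 7-point scale (illustrative values only).
    labels = [-1, -1, -1, 1, 1, 1]
    ratings = [6.0, 7.0, 5.0, 2.0, 3.0, 1.0]
    print(simple_regression_f(labels, ratings))
```

A negative slope in this coding indicates that atypical stimuli received lower typicality ratings than typical stimuli.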

Procedure
Before beginning the experiment, participants completed a practice shape categorization task (triangles vs squares) in the scanner to familiarize them with the scanner mouse and task. Participants then completed three two-choice categorization tasks during fMRI: race, gender, and color. The task order was pseudorandomized per participant; however, the color task never occurred first. Tasks were completed one at a time, each comprising four sequential functional runs. Each run included 40 total trials with a trial order pseudorandomized optimally for event-related BOLD signal estimation using optseq (Dale, 1999), presenting each task condition 10 times within the run. Therefore, participants completed a total of 40 trials per condition over the course of the experiment. Another 10 trials were null events including a fixation cross to estimate baseline. Trials were 4000 ms in duration, in which participants had up to 2500 ms to provide a response. The stimulus was replaced by a fixation cross after any response or if participants did not respond on time, which remained on screen until the beginning of the next trial. During this period, participants were required to return the mouse to click the "start" button at the bottom of the screen and await the next trial. The next trial was not presented if participants failed to return to the start button on time. After the scan, participants completed a general demographic survey.

Experimental design and statistical analyses
Mouse-trajectory preprocessing. Standard mouse-tracking preprocessing was used (Freeman and Ambady, 2010). All response trajectories were rescaled into a standard coordinate space (top left: [−1, 1.5]; bottom right: [1, 0]) and normalized into 100 time bins using linear interpolation to permit averaging of their full length across multiple trials. For comparison, all trajectories were remapped rightward. To obtain a by-trial index of category coactivation, we calculated the maximum perpendicular deviation (MD) of each mouse trajectory toward the opposite response option. During two-choice mouse-tracking categorization tasks (e.g., male vs female), deviation in a subject's mouse trajectory toward an opposite category response (indexed by MD) is a well-validated measure of the degree to which that other category was also activated during the perceptual process (Fig. 2b; Spivey and Dale, 2006; Freeman et al., 2011a).
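The preprocessing steps just described (time normalization, rightward remapping, and MD) can be sketched as follows. This is an illustrative reimplementation, not the MouseTracker code: it assumes trajectories are already rescaled to the standard coordinate space with the start button at (0, 0), and the function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def time_normalize(traj: List[Point], n_bins: int = 100) -> List[Point]:
    """Resample a trajectory to n_bins equally spaced steps by linear interpolation."""
    if len(traj) < 2:
        raise ValueError("need at least two samples")
    last = len(traj) - 1
    out = []
    for i in range(n_bins):
        pos = i * last / (n_bins - 1)      # fractional index into the raw samples
        lo = min(int(pos), last - 1)
        frac = pos - lo
        x = traj[lo][0] + frac * (traj[lo + 1][0] - traj[lo][0])
        y = traj[lo][1] + frac * (traj[lo + 1][1] - traj[lo][1])
        out.append((x, y))
    return out


def remap_rightward(traj: List[Point]) -> List[Point]:
    """Mirror trajectories that end on the left so that all end on the right."""
    if traj[-1][0] < 0:
        return [(-x, y) for x, y in traj]
    return list(traj)


def max_deviation(traj: List[Point]) -> float:
    """Signed maximum perpendicular deviation from the straight start-to-end line.

    After rightward remapping, positive values indicate deviation toward the
    opposite (left) response option.
    """
    (x0, y0), (x1, y1) = traj[0], traj[-1]
    dx, dy = x1 - x0, y1 - y0
    length = (dx * dx + dy * dy) ** 0.5
    if length == 0:
        raise ValueError("start and end points coincide")
    best = 0.0
    for x, y in traj:
        # Cross product is positive when the point lies left of the ideal line.
        d = (dx * (y - y0) - dy * (x - x0)) / length
        if abs(d) > abs(best):
            best = d
    return best


if __name__ == "__main__":
    # A response that first drifts toward the left option, then ends top right.
    raw = [(0.0, 0.0), (-0.5, 0.75), (1.0, 1.5)]
    print(max_deviation(time_normalize(remap_rightward(raw))))
```

Remapping before computing MD means a single sign convention applies to every trial regardless of which side the selected response was on.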
Image acquisition. Subjects were scanned using a 3 T Philips Intera Achieva Scanner equipped with a SENSE birdcage head coil in the Dartmouth Brain Imaging Center. All stimuli were back projected onto a screen visible via a mirror mounted on the MRI head coil (visual angle ~13.5 × 13.5°). Anatomical images were acquired using a T1-weighted protocol (256 × 256 matrix, 128 1.33 mm transverse slices). Functional images were acquired using a single-shot gradient echo EPI sequence (TR = 2000 ms, TE = 35 ms). Thirty-five interleaved oblique-axial slices (3 mm × 3 mm × 4 mm voxels; no slice gap) parallel to the AC-PC line were obtained.
Data preprocessing and pattern estimation. Preprocessing of the imaging data was conducted using AFNI software (version 16.0.09; Cox, 1996). Functional imaging data preprocessing included high-pass filtering, slice timing correction, 3D motion correction, voxelwise detrending, spatial smoothing using a 3D Gaussian filter (4 mm FWHM for pattern analyses; 8 mm FWHM for univariate ANOVA analyses), and time-series z-normalization. Structural and functional data of each subject were transformed to standard MNI space. We estimated the average hemodynamic response per voxel for each condition (using the 3dDeconvolve procedure in AFNI). The design matrix included a total of eight predictors: the four stimulus conditions within each task (typical and atypical conditions for each of the two categories) and four predictors of no interest (incorrect responses, no responses, failed starts, and null trials). All predictors were modeled as boxcar functions across the first 2 s of each event (during which the face stimuli were presented) and convolved with a gamma variate function (GAM in AFNI). Trial-by-trial neural response estimates were also obtained with 3dDeconvolve, using the same response function (GAM) and onset specifications but fitting a unique regressor per stimulus presentation (via the stim_times_IM method). For pattern analysis, we used the resulting voxelwise t-values (comparing condition responses with baseline) to comprise the whole-brain patterns of activation per stimulus condition (either per run for classification analyses or per trial within each run for trial-by-trial analyses). t statistics were used for multivariate pattern analyses because they have been found advantageous in analyses decoding fMRI data, and these features were not normalized (Misaki et al., 2010).
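To make the regressor construction concrete, the following Python sketch builds a 2 s boxcar, convolves it with a gamma variate, and samples the result at the TR of 2 s. It is an approximation for illustration, not AFNI's implementation: the gamma-variate form h(t) = (t/(pq))^p · exp(p − t/q) and the parameter values p = 8.6 and q = 0.547 are assumed here to match AFNI's documented GAM defaults and should be checked against the 3dDeconvolve documentation; the function names are hypothetical.

```python
import math


def gamma_variate(t, p=8.6, q=0.547):
    """Gamma-variate response, scaled to peak at 1 when t = p * q (about 4.7 s).

    The parameter defaults are an assumption (see the text above).
    """
    if t <= 0:
        return 0.0
    return (t / (p * q)) ** p * math.exp(p - t / q)


def event_regressor(onsets, duration, n_vols, tr=2.0, dt=0.1, kernel_len=20.0):
    """Boxcar of `duration` seconds at each onset, convolved with the gamma
    variate on a fine time grid (step dt) and sampled once per volume."""
    n_fine = int(round(n_vols * tr / dt))
    boxcar = [0.0] * n_fine
    for onset in onsets:
        start = int(round(onset / dt))
        stop = int(round((onset + duration) / dt))
        for i in range(start, min(stop, n_fine)):
            boxcar[i] = 1.0
    kernel = [gamma_variate(k * dt) for k in range(int(round(kernel_len / dt)))]
    fine = [0.0] * n_fine
    for i, b in enumerate(boxcar):
        if b == 0.0:
            continue
        for k, h in enumerate(kernel):
            if i + k < n_fine:
                fine[i + k] += b * h * dt   # discrete approximation of the convolution
    step = int(round(tr / dt))
    return [fine[v * step] for v in range(n_vols)]


if __name__ == "__main__":
    # One 2 s event at time 0, sampled over 10 volumes (20 s).
    print([round(v, 3) for v in event_regressor([0.0], 2.0, 10)])
```

Fitting one such regressor per condition gives condition-level estimates, whereas fitting one per stimulus presentation corresponds to the trial-by-trial approach described above.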
For univariate ANOVA analyses, we followed standard procedures and used the resulting voxelwise β values (comparing condition responses to baseline) per condition (either per run for ANOVA analyses or per trial per run for trial-by-trial analyses).
Multivoxel pattern analyses (MVPAs). All MVPAs were performed using PyMVPA (Hanke et al., 2009). Per task, we performed a two-way classification between each general category condition (i.e., sex: male vs female; race: white vs black; color: blue vs red). Classification was executed with a support vector machine algorithm. All classification analyses were cross-validated in a leave-one-run-out cross-validation scheme (n − 1 cross-validation scheme with 4 runs; 2 observations per condition within each run, therefore 6 observations per condition in the training data). These analyses were performed whole brain through a searchlight algorithm. Specifically, cross-validated classification was performed within a 123-voxel sphere (radius = 3 voxels) surrounding each voxel in the brain, with average performance of the classifier mapped back to the center voxel of the sphere. This resulted in a whole-brain map of average classification performance in each task per subject to be submitted to group-level analysis.
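Two details of this design can be checked with a short Python sketch: that a searchlight sphere of radius 3 voxels contains 123 voxels, and that leave-one-run-out cross-validation over 4 runs leaves 6 training observations per condition. The sketch uses only the standard library and does not call PyMVPA; the function names are hypothetical, and the classifier itself is omitted.

```python
from itertools import product


def sphere_offsets(radius):
    """Integer voxel offsets (x, y, z) lying within `radius` voxels of a center voxel."""
    r = range(-radius, radius + 1)
    return [(x, y, z) for x, y, z in product(r, r, r)
            if x * x + y * y + z * z <= radius * radius]


def leave_one_run_out(runs):
    """Yield (train_indices, test_indices), holding out one run per fold."""
    for held_out in sorted(set(runs)):
        train = [i for i, r in enumerate(runs) if r != held_out]
        test = [i for i, r in enumerate(runs) if r == held_out]
        yield train, test


if __name__ == "__main__":
    print(len(sphere_offsets(3)))            # number of voxels per searchlight

    # 4 runs x 2 categories x 2 patterns per category per run
    # (one typical and one atypical pattern for each category).
    runs = [run for run in range(4) for _ in range(4)]
    labels = ["male", "male", "female", "female"] * 4
    for train, test in leave_one_run_out(runs):
        n_male_train = sum(labels[i] == "male" for i in train)
        print(len(train), len(test), n_male_train)
```

In a full searchlight, the offsets would be added to each center voxel's coordinates, the classifier trained and tested on those voxels in every fold, and the mean accuracy written back to the center voxel.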
Figure 2. Results from category searchlight classification per task (n = 16). n-fold cross-validated classification results (using support vector machines) from a searchlight analysis (radius = 3 voxels) were analyzed at the group level, indicating regions where general target category (e.g., female, Black, blue) could be decoded accurately (above chance, i.e., 50% of the time in a 2-way classification analysis). A swath of cortex spanning earlier ventral and dorsal streams from early visual cortex (EVC) through the bilateral fusiform gyrus (FG) was found to hold information about target categories and significantly decode sex, race, and color category membership. Result maps were significance tested and corrected for multiple comparisons using a cluster-wise nonparametric permutation scheme (voxelwise p < 0.005; FWE rate of 0.01). All task result maps are depicted on cortical surfaces: a, Sex task. b, Race task. c, Color task.
Group-level analyses and multiple-comparisons corrections. Whole-brain group-level classification results reported were significance tested and corrected for multiple comparisons using a cluster-wise nonparametric permutation scheme appropriate to MVPA results acquired through a searchlight procedure (GroupClusterThreshold in PyMVPA). This algorithm first performed within-subject classification accuracy permutation analyses by generating 100 maps per subject (shuffling classifier labels) and using an identical classifier and cross-validation method as the nonpermuted analyses. Next, a voxelwise cluster-forming threshold for the accuracy map was formed from permutation testing of a group-level map of voxelwise null distributions (feature-wise threshold of p < 0.005; 100,000 permuted group-level maps formed via a stratified random sampling bootstrapping process, averaging maps between participants). These thresholded bootstrap samples were then used to derive an empirical probability of various cluster sizes in searchlight classification accuracy maps under the null hypothesis [familywise error (FWE) rate of 0.01]. Results reported are searchlight classification clusters surviving this significance test (Fig. 2, Table 1).
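The logic of this permutation scheme can be sketched in Python under strong simplifications. The sketch treats the brain as a one-dimensional strip of voxels (so clusters are runs of adjacent voxels), uses random numbers in place of permuted accuracy maps, and records only the largest null cluster per bootstrap map. It is not PyMVPA's GroupClusterThreshold implementation, and all function names are hypothetical.

```python
import random


def bootstrap_group_maps(perm_maps, n_boot, rng):
    """perm_maps[s][k] is the k-th permuted accuracy map of subject s.

    Each bootstrap group map averages one randomly drawn permuted map
    per subject (sampling stratified by subject).
    """
    n_vox = len(perm_maps[0][0])
    group_maps = []
    for _ in range(n_boot):
        picks = [rng.choice(subject_maps) for subject_maps in perm_maps]
        group_maps.append([sum(m[v] for m in picks) / len(picks)
                           for v in range(n_vox)])
    return group_maps


def voxelwise_thresholds(null_maps, p):
    """Per-voxel cluster-forming threshold: upper-tail cutoff of the null values."""
    n = len(null_maps)
    k = min(n - 1, int((1.0 - p) * n))
    return [sorted(m[v] for m in null_maps)[k] for v in range(len(null_maps[0]))]


def cluster_sizes_1d(mask):
    """Sizes of runs of adjacent suprathreshold voxels (toy 1D adjacency)."""
    sizes, run = [], 0
    for flag in mask:
        if flag:
            run += 1
        elif run:
            sizes.append(run)
            run = 0
    if run:
        sizes.append(run)
    return sizes


def null_max_cluster_sizes(null_maps, thresholds):
    """Largest suprathreshold cluster in each null map."""
    sizes = []
    for m in null_maps:
        mask = [a > t for a, t in zip(m, thresholds)]
        sizes.append(max(cluster_sizes_1d(mask), default=0))
    return sizes


if __name__ == "__main__":
    rng = random.Random(0)
    # 16 subjects x 100 permuted maps x 50 voxels of stand-in accuracy values.
    perm = [[[rng.random() for _ in range(50)] for _ in range(100)]
            for _ in range(16)]
    null_maps = bootstrap_group_maps(perm, 1000, rng)
    thr = voxelwise_thresholds(null_maps, 0.005)
    print(sorted(null_max_cluster_sizes(null_maps, thr))[-10:])
```

An observed cluster would then be judged against the distribution of null cluster sizes, with the real analysis using three-dimensional voxel adjacency.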
To compare univariate differences in BOLD responses across regions of the brain, a 2 (typicality: typical vs atypical) × 3 (task: sex, race, color) whole-brain mixed-effects ANOVA was conducted (p < 0.01, corrected; participant included as a random effect; 3dANOVA2 in AFNI). To further explore fundamental task differences, we contrast coded the main effect of task [1 sex, 1 race, −2 color] in a whole-brain analysis. This analysis provided a whole-brain map of univariate effects per subject to be submitted to group-level analyses (b-value maps per typicality, task, their interaction, as well as the contrast coded effect of face vs color tasks). For such univariate analyses, we corrected for multiple comparisons using Monte Carlo simulations (3dClustSim in AFNI; smoothness estimated by a spatial autocorrelation function). We maintained an experiment-wide α < 0.01 by using a voxelwise threshold of p < 0.001 with a minimum cluster extent of 83. Minimum threshold and cluster extents were those provided by the output of 3dClustSim.
Trial-by-trial analyses. To conduct analyses investigating the close trial-by-trial relationship of mouse trajectories (MDs) and neural responses (multivoxel pattern effects and univariate activation effects), we extracted trial-by-trial estimates of independent neural effects within ROIs identified from each of our whole-brain analyses. From the results of our classification analysis (Fig. 2, Table 1), we segmented an ROI of the right FG (rFG) from result maps per task (sex rFG ROI = 198 voxels; race rFG ROI = 130 voxels; color rFG ROI = 84 voxels). This mask was created as the portion of the result clusters spanning visual regions that lay on the ventral temporal cortex, being an intersection mask of the result map cluster with a ventral temporal lobe mask as defined by the Harvard-Oxford atlas (Jenkinson et al., 2012). From the results of our whole-brain univariate ANOVA analysis (Fig. 3, Table 2), we created a single pre-SMA/dACC ROI from the pre-SMA/dACC cluster responding significantly more strongly to atypical than typical trials (pre-SMA/dACC ROI = 295 voxels). Due to the robustness of ANOVA results, the pre-SMA/dACC cluster that survived significance testing was notably larger than the rFG ROIs at voxelwise p < 0.005 used to correct classification analyses (Table 2). Therefore, the pre-SMA/dACC mask was created as the surviving pre-SMA/dACC cluster at a conservative voxelwise p < 0.0001 to make the cluster a more reasonably comparable size.
To match neural effects to our trial-by-trial behavioral measures, we extracted them from independent trial-by-trial estimates of neural responding. Specifically, to acquire a trial-by-trial estimate of category coactivation in the rFG, within the rFG ROI per task, we computed the neural pattern similarity (correlation distance; Kriegeskorte et al., 2008) of each atypical trial to the average neural pattern of its opponent category. For instance, on a given atypical male trial, we would estimate the similarity of that trial's neural pattern (i.e., during perception of the atypical male) to the average neural pattern of its opponent category (e.g., the average neural pattern to typical female targets), therefore indexing the degree to which an atypical male elicits a neural response similar to female. To acquire trial-by-trial estimates of potential conflict monitoring activation, for each atypical trial we extracted the pre-SMA/dACC BOLD response estimated in the trial-by-trial general linear model (GLM) (against baseline), indexing the extent of responsiveness trial-by-trial in that region to atypical exemplars.
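The pattern-similarity measure can be illustrated with a minimal Python sketch. It assumes that patterns are lists of voxelwise values within the ROI and that correlation distance is defined as 1 minus the Pearson correlation, so that a higher correlation (lower distance) means greater similarity to the opponent category; the function names and toy values are hypothetical.

```python
def pearson_r(a, b):
    """Pearson correlation between two equal-length voxel patterns."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def correlation_distance(a, b):
    """Correlation distance: 0 for identical pattern shape, 2 for inverted."""
    return 1.0 - pearson_r(a, b)


def mean_pattern(patterns):
    """Voxelwise average across a list of patterns."""
    n = len(patterns)
    return [sum(p[v] for p in patterns) / n for v in range(len(patterns[0]))]


def opponent_similarity(trial_pattern, opponent_patterns):
    """Similarity of one atypical trial to the average opponent-category pattern."""
    return pearson_r(trial_pattern, mean_pattern(opponent_patterns))


if __name__ == "__main__":
    # Toy 4-voxel patterns (illustrative values only).
    atypical_male_trial = [0.2, 1.1, 0.4, 1.6]
    typical_female_patterns = [[0.1, 1.0, 0.5, 1.5], [0.3, 0.8, 0.2, 1.9]]
    print(opponent_similarity(atypical_male_trial, typical_female_patterns))
```

Because the Pearson correlation ignores overall mean and scale, the measure reflects the spatial shape of the response across voxels rather than its amplitude.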
This provided a dataset in which, for each participant, stimulus, trial, and task, we matched the MD, reaction time (RT), rFG neural pattern similarity to opponent category, and pre-SMA/dACC atypical activation for that trial. These datasets were analyzed in R software (http://www.R-project.org/). We used the multilevel mixed linear model (lmer) from the R package lme4 (Bates et al., 2014). Unstandardized regression coefficients are reported. All variables were normalized to optimize performance of the random slopes algorithm (lmer performs best with similarly scaled variables; Bates et al., 2014; normalized across subjects, each variable mean set to 0 and SD to 1). This same normalization transformation was used in later mediation analyses to maintain a similar metric. Each of these models allowed for random slopes and intercept with specific stimulus identity nested within each participant.
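The normalization step can be sketched in Python as below. The sketch pools values across subjects before standardizing, as described, and assumes the sample SD (n − 1 denominator, as R's scale function would use); the source does not state which SD was used, and the function name and values are hypothetical.

```python
from statistics import mean, stdev


def z_normalize(values):
    """Center a variable at 0 and scale it to SD 1 (sample SD, an assumption)."""
    m = mean(values)
    s = stdev(values)
    if s == 0:
        raise ValueError("variable has zero variance")
    return [(v - m) / s for v in values]


if __name__ == "__main__":
    # MD values pooled across all subjects' trials (illustrative values only).
    md_all_trials = [0.12, 0.40, 0.05, 0.33, 0.21, 0.58]
    print([round(z, 3) for z in z_normalize(md_all_trials)])
```

Putting MD, RT, pattern similarity, and BOLD estimates on a common scale in this way is what keeps the predictors similarly scaled for the mixed-model fitting described above.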
Category competition versus general indecision. Another alternative explanation of increased trajectory deviations is that they merely reflect a less decisive movement toward the selected response rather than a genuine parallel attraction toward the opposite response. For instance, during perception of a less typical white face, participant trajectories may take a less direct route to the white response due to mere indecision (e.g., slower accumulation of evidence in favor of the white category) as opposed to genuine attraction to the alternate (black) category reflecting category coactivation and competition. We conducted an additional behavioral experiment to rule out this potential alternative, in which we included target faces bearing partial cues of both irrelevant and relevant categories. For instance, in a white versus black categorization task, partial Asian cues on a white face would be task irrelevant, whereas partial black cues on a white face would be task relevant. If participants' trajectory effects merely reflected less decisive movements, then the irrelevant and relevant conditions should elicit similar MD effects because both feature the same degree of increased ambiguity and noise with respect to category cues. If instead trajectory effects reflect genuine coactivation and parallel attraction toward the opposite response, then trajectories should deviate toward that response only when facial cues partially specify the opposite category.
Figure 3. Results from atypical versus typical contrast from whole-brain typicality × task group-level mixed effects repeated-measures ANOVA. Results indicate regions where responses were greater to atypical versus typical category targets on average across tasks (there were no interactions with task across the brain). Interestingly, we found that the cingulo-opercular network, the pre-SMA/dACC and FO/IA, responded more strongly to atypical than typical category members, suggesting potential involvement of conflict-monitoring processes in response to increased category coactivation and competition during atypical target perception. The pre-SMA/dACC cluster was used to construct an ROI for trial-by-trial analyses. p < 0.01 corrected (voxelwise p < 0.001 uncorrected with a minimum cluster extent of 83). There was no evidence of any interactions across the brain between typicality and task. All regions showed greater activation to atypical relative to typical targets. L, Left; R, right; M, medial.
Participants. An additional behavioral experiment was performed in which participants (n = 49) were recruited to perform a computer mouse tracking task online (mean age = 33.06, SD = 9.1; 61.22% male; 71.43% white, 10.2% black, 18.37% other; one participant omitted due to trackpad use). Participants were recruited online through Amazon Mechanical Turk and received monetary compensation for their participation. Subjects gave informed consent in a manner approved by the University Committee on Activities Involving Human Subjects at New York University.
Task. Participants completed three sets of two-choice categorization tasks. The task was designed with MouseTracker (see Materials and Methods). Mouse-tracking trials reflect standard two-choice categorization trials. Participants were required to make a speeded two-choice categorization decision once the target stimulus appeared. However, participants first clicked a start button at the bottom of the screen and then used the mouse to click one of the response options at the top-right and top-left corners of the screen (e.g., "black" vs "white"). During this motion, online x- and y-coordinates of the mouse were recorded continuously as the participants responded. These data were used later to estimate curvature toward opposing responses (e.g., when categorizing an atypical black face, deviation toward the "white" response en route to a final "black" response).
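One way to turn the recorded x- and y-coordinates into a curvature measure is sketched below. It assumes that MD is the largest perpendicular distance between the observed trajectory and the straight line joining its first and last samples; the function name `max_deviation` is hypothetical, and MouseTracker's own preprocessing (e.g., rescaling and time normalization) and sign convention are not reproduced here.

```python
import math

def max_deviation(xs, ys):
    """Largest perpendicular distance from the straight start-to-end line.

    Assumption: deviation is measured against the direct path from the first
    to the last recorded sample; the sign (toward or away from the opposite
    response) is left out for brevity.
    """
    x0, y0, x1, y1 = xs[0], ys[0], xs[-1], ys[-1]
    length = math.hypot(x1 - x0, y1 - y0)
    if length == 0:
        return 0.0
    return max(
        abs((x1 - x0) * (y0 - y) - (x0 - x) * (y1 - y0)) / length
        for x, y in zip(xs, ys)
    )

# Hypothetical trajectories ending at a top-right response.
direct = max_deviation([0.0, 0.5, 1.0], [0.0, 0.5, 1.0])
curved = max_deviation([0.0, 0.0, 1.0], [0.0, 1.0, 1.0])
```

A direct movement yields a value near zero, whereas a movement that first drifts toward the other side of the screen yields a larger value.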
Stimuli. Face stimuli were generated with FaceGen software. Ten unique face identities were generated. For each identity, three base race faces were generated (Asian, black, and white). In addition, for each base race, three race cue conditions were generated: no partial cues (base race) and partial cues of the two other races. This generated a total of 90 (10 identities × 3 base races × 3 race cue conditions) face stimuli. Each face stimulus was placed on a gray background (RGB: 175,175,175). Relevance of face partial race cues was determined by task.
Procedure. Participants were instructed to categorize each face according to its perceived race. Each participant completed all three tasks (Asian vs black, Asian vs white, black vs white). Task order was randomly assigned for each participant. Within each task, participants categorized a total of 60 stimuli: for each of the two task-relevant races, 10 base race faces, 10 relevant cue faces, and 10 irrelevant cue faces. Stimulus presentation was randomly ordered per task.
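The stimulus counts above can be checked with a short enumeration. This is an illustrative sketch only: the tuple layout and the helper names `task_stimuli` and `cue_condition` are hypothetical, and it assumes that each task shows only faces whose base race is one of the two response options.

```python
from itertools import product

RACES = ["Asian", "black", "white"]
IDENTITIES = range(1, 11)  # ten FaceGen identities

# Each stimulus: (identity, base race, partial-cue race or None for no cues).
stimuli = [
    (identity, base, cue)
    for identity, base in product(IDENTITIES, RACES)
    for cue in [None] + [r for r in RACES if r != base]
]

def task_stimuli(race_a, race_b):
    """Select the faces shown in a race_a versus race_b task.

    Assumption: only faces whose base race is one of the two response
    options are shown; cues of the third race count as task irrelevant.
    """
    relevant = {race_a, race_b}
    return [s for s in stimuli if s[1] in relevant]

def cue_condition(stimulus, race_a, race_b):
    """Label a stimulus as 'base', 'relevant', or 'irrelevant' for a task."""
    _, _, cue = stimulus
    if cue is None:
        return "base"
    return "relevant" if cue in (race_a, race_b) else "irrelevant"
```

Under these assumptions the full set contains 90 stimuli, and a black versus white task draws 60 of them, 20 in each cue condition.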
Analysis preparation. One subject was removed for not following task instructions. Consistent with prior face categorization mouse-tracking work, we limited analysis to correct trials with quick RTs (<2000 ms). The average error rate across subjects was low (2.44% of trials) and nearly all RTs were within the target range (98.8% of trials, <2000 ms). To perform analyses with subject as the unit of analysis, we estimated our measure of category coactivation (MD) for three conditions within each subject: base race, relevant partial cues, and irrelevant partial cues. This analysis was collapsed across task, base race, and partial cues race for parsimony.
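The trial filtering and per-subject aggregation can be sketched as below. The dictionary keys (`subject`, `condition`, `correct`, `rt_ms`, `md`) and the function name `condition_means` are hypothetical; only the RT cutoff and the three conditions come from the text.

```python
from collections import defaultdict
from statistics import mean

RT_CUTOFF_MS = 2000  # from the text: analysis limited to RTs < 2000 ms

def condition_means(trials):
    """Average MD per (subject, condition) over correct, fast trials.

    `trials` is a list of dicts with hypothetical keys:
    subject, condition, correct, rt_ms, md.
    """
    pooled = defaultdict(list)
    for t in trials:
        if t["correct"] and t["rt_ms"] < RT_CUTOFF_MS:
            pooled[(t["subject"], t["condition"])].append(t["md"])
    return {key: mean(values) for key, values in pooled.items()}
```

Each subject then contributes one MD value per condition (base race, relevant partial cues, irrelevant partial cues), which matches the subject-level unit of analysis described above.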

Results
Sex, race, and color categorization proceeded in separate runs. During neuroimaging, participants categorized targets by moving a fiber-optic computer mouse. On each trial, a start button appeared at the bottom center of the screen. Once clicked, the target face or object appeared at the bottom center of the screen and participants were asked to click a response at the top-left or top-right corner of the screen as quickly and accurately as possible. The movement trajectory recorded during each trial, including the MD toward the unselected category response (on the opposite side of the screen), indexed coactivation of that category (Fig. 1b).

Behavioral results
First, we analyzed the relationship between MD and target typicality to assess behavioral measures of coactivation and competition between categories due to partial cues of the opponent category. We used a multilevel random-slopes model to regress MD upon typicality (typical vs atypical; coded −1 and 1, respectively) on a trial-by-trial basis across tasks (allowing for random intercepts and slopes, with face identities nested within subjects; this model structure was used for all subsequent analyses). Consistent with prior mouse-tracking studies involving sex, race, and color categorization, there was significantly higher MD during atypical compared with typical trials [b = 0.122, SE = 0.015, t(15.058) = 7.935, p < 0.0001, 95% confidence interval (CI) = 0.091-0.153; Fig. 1b], suggesting that mouse trajectories were partially attracted to the opposite category response due to atypical cues related to that category.
Rather than providing evidence for a parallel competition between coactivated categories, increased MD could also be spuriously produced from sequential shifts in movements. Specifically, our prediction is that, throughout the response trajectory, a participant's movement in the atypical condition should always reflect a dynamically weighted combination of movement toward both categories due to parallel activation (e.g., both male and female; both white and black). If true, then the average trajectory in the atypical condition should exhibit graded, partial attraction toward the opposite category en route to the selected category. However, a higher average MD could also be caused by nongraded activations with several discrete-like errors in which, on some trials, one category activates ~100% (straight movement to "female"), followed by a subsequent correction and the other category activating ~100% (straight movement toward "male"). If all trials in the atypical condition exhibited such discrete shifts, then the average trajectory would clearly be shaped as such, which was not the case (Fig. 1b). However, if only a subpopulation of trials in the atypical condition exhibited such shifts, it is possible the average trajectory would spuriously exhibit graded, partial attraction, but the amount of attraction (MD) would be bimodally distributed. This is because some trials would involve a shift (i.e., extreme attraction), whereas others would proceed normally with a direct movement (i.e., zero attraction). The modality of the MD distribution was tested with Hartigan's dip statistic, a method found to most reliably distinguish between such discrete-shift versus parallel-attraction trajectory profiles in mouse-tracking experiments (Freeman and Dale, 2013).
There was no evidence of multimodality in the MD distribution of the atypical condition (D atypical = 0.0043, p atypical = 0.9831, n.s.), nor in that of the typical condition (D typical = 0.0031, p typical = 0.9977, n.s.; D all = 0.0019, p all = 0.9999, n.s.).
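The logic of the bimodality check can be illustrated with a small sketch. Note the substitution: Hartigan's dip statistic is not available in the Python standard library, so the code below uses the bimodality coefficient instead, a different and simpler measure that is sometimes applied for the same purpose. It is not the test reported above, and the two example samples are hypothetical.

```python
from statistics import NormalDist, mean

def bimodality_coefficient(values):
    """Bimodality coefficient from sample skewness and excess kurtosis.

    Values above 5/9 (about 0.555, the value for a uniform distribution)
    are conventionally read as hinting at bimodality.
    """
    n = len(values)
    m = mean(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3
    # Bias-corrected sample skewness and excess kurtosis.
    skew = g1 * (n * (n - 1)) ** 0.5 / (n - 2)
    kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)
    return (skew ** 2 + 1) / (kurt + 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

# Hypothetical MD samples: a mixture of direct and extreme-shift trials,
# and a single bell-shaped cluster of graded deviations.
discrete_shift = [0.0] * 50 + [1.0] * 50
graded = [NormalDist(0.5, 0.1).inv_cdf((i + 0.5) / 100) for i in range(100)]
```

A mixture of zero-attraction and extreme-attraction trials pushes the coefficient above the 5/9 benchmark, whereas a single graded cluster stays below it, which mirrors the distinction the dip test is used to draw.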
Together, these findings cement the evidence that the trajectory attraction effects observed in the fMRI experiment reflect genuine coactivation of both social categories in parallel, rather than a subpopulation of discrete-like error responses or a mere weaker representation of the selected category and less decisive movement.

Category representational analyses
We first sought to identify regions involved in representing faces' social categories. Rather than assessing response differences between conditions averaged across voxels within a region, MVPAs may identify regions where conditions reliably elicit distributed multivoxel patterns of local activity, which is often the case in perceptual representation (Haxby et al., 2001). We performed classification to identify regions that could discriminate between overall category conditions above chance (male vs female; white vs black; blue vs red). A searchlight procedure was used, in which classification analyses (via support vector machines) were conducted iteratively in local searchlight spheres throughout cortex. We limited these analyses to occipital and ventral-temporal cortex given their well established role in perceptual representation of such stimuli (Kanwisher et al., 1997; Haxby et al., 2001). Searchlight analysis (FWE rate of 0.01, corrected) revealed a single broad swath spanning occipital and early ventral-temporal cortex, including the FG, which exhibited above-chance classification accuracies in all three tasks (Fig. 2, Table 1). These findings are consistent with prior research observing categorical representation of faces and colors in these regions (Brouwer and Heeger, 2013; Contreras et al., 2013; Freeman and Johnson, 2016; Stolier and Freeman, 2016).
Having confirmed that faces' sex and race categories were indeed reflected in multivoxel patterns of the FG and other ventral-temporal regions involved in perceptual representation, we next tested whether the multivoxel pattern elicited by a given face approximates that associated with the face's opponent category to the extent that the face bears cues associated with that category. Specifically, we were interested in whether the extent of coactivation of an opponent category (e.g., female for a male face), as measured by MD, is associated with a stronger approximation in right FG multivoxel patterns toward those associated with that opponent category (e.g., female). To do so, we first estimated the neural pattern representational similarity (Kriegeskorte et al., 2008) of each trial category representation to its opponent category's. This was calculated as the similarity of each atypical trial voxel pattern to the average voxel pattern of the opponent category (Pearson correlational distance; 1 − r). For instance, we calculated the representational similarity of each atypical male trial voxel pattern to the average voxel pattern of the typical female condition. These trial-by-trial data were calculated within the rFG region elicited by our searchlight classification analysis (see Materials and Methods). Whereas the whole-brain searchlight demonstrated the discriminability of categories collapsing across typicality conditions (e.g., male vs female regardless of typicality), here, we performed an independent analysis within the atypical condition. Specifically, we assessed the trial-by-trial relation of each atypical exemplar's neural-pattern similarity with that trial's associated MD, an analysis that is statistically independent from the initial whole-brain analysis (overall discriminability of categories within each subject).
Using multilevel regression that can incorporate trial-by-trial data (performed in R with lme4), rFG neural category similarity values were regressed upon MD, finding a significantly positive relationship between rFG pattern similarity and MD (b = 0.015, SE = 0.003, t(17.284) = 4.427, p = 0.0004, 95% CI = 0.008-0.023). Therefore, to the extent that participants were partially attracted to the alternate category response behaviorally (e.g., toward "male" for a female face), rFG patterns exhibited a degree of greater similarity to that alternate category (e.g., male).
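The trial-level measure described above, the Pearson correlational distance between one trial's voxel pattern and the average pattern of the opponent category, can be sketched as follows. The function names and the toy four-voxel patterns are hypothetical; real patterns would come from the rFG ROI, and how the distance was signed before entering the regression is not stated in this section.

```python
from statistics import mean

def pearson_r(a, b):
    """Pearson correlation between two equally long sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

def mean_pattern(patterns):
    """Voxel-wise average across a list of equally long trial patterns."""
    return [mean(voxel) for voxel in zip(*patterns)]

def correlation_distance(trial_pattern, opponent_patterns):
    """1 - r between one trial pattern and the opponent category's mean pattern."""
    return 1 - pearson_r(trial_pattern, mean_pattern(opponent_patterns))

# Toy example: typical-female patterns and two atypical-male trial patterns.
typical_female = [[1, 2, 3, 4], [3, 4, 5, 6]]
close_trial = correlation_distance([10, 20, 30, 40], typical_female)
far_trial = correlation_distance([4, 3, 2, 1], typical_female)
```

Smaller distances indicate that a trial's pattern lies closer to the opponent category's average pattern.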

Category competition analyses
To identify neural regions responsive to the extent of coactivation and competition, we first conducted a 2 (typicality: typical vs atypical) × 3 (task: sex, race, color) whole-brain mixed-effects ANOVA (p < 0.01, corrected; participant included as a random effect; 3dANOVA3 in AFNI). This revealed a significant main effect of typicality, with stronger BOLD responses to atypical versus typical target faces in the cingulo-opercular network, including the pre-SMA/dACC and aI/fO (Fig. 3, Table 2). No regions were elicited by the typicality × task interaction effect that survived correction (p < 0.01, corrected).
More importantly, we sought to further characterize the nature of the pre-SMA/dACC's stronger responses to atypical exemplars. Although this result suggests that the pre-SMA/dACC may be involved in monitoring for conflicting coactivations, which are more present in the atypical condition, stronger support for this hypothesis would come from showing that the pre-SMA/dACC response in this context is specifically sensitive to category coactivation; if so, then pre-SMA/dACC activation should correlate with MD on a trial-by-trial basis. Using the pre-SMA/dACC region elicited by the previous whole-brain analysis (see Materials and Methods), trial-level mean responses within this ROI were calculated to examine the relationship of pre-SMA/dACC activity with trial-by-trial behavioral indices of category competition (MD). A separate GLM was constructed to estimate mean pre-SMA/dACC response for each trial (see Materials and Methods). Within this ROI of the pre-SMA/dACC, we performed an analysis wherein trial-by-trial neural responses for only atypical targets were extracted and their relationship with trial-by-trial MD was tested. Note that this represents an independent analysis from the initial whole-brain contrast of atypical > typical.
Consistent with our hypothesis, regressing pre-SMA/dACC activation on MD in a trial-by-trial fashion indicated a significantly positive relationship (b = 0.120, SE = 0.025, t(14.552) = 4.76, p = 0.0003, 95% CI = 0.081-0.196). To more stringently assess the nature of the pre-SMA/dACC response, we also performed this analysis while additionally controlling for an alternative explanation of pre-SMA/dACC responses, namely mere task difficulty (i.e., RTs) and motor effort (i.e., total hand motion and force; summated velocity and absolute acceleration across all time points in the mouse-trajectory data per trial). If pre-SMA/dACC responsiveness reflected task difficulty alone rather than genuine coactivation, then controlling for trial-by-trial RT should eliminate the correlation with MD. In addition, if pre-SMA/dACC responsiveness reflected mere motor effort due to task demands of more atypical trials (given the putative recruitment of regions surrounding the pre-SMA in motor preparation as well; Nachev et al., 2008), then controlling for trial-by-trial motor effort indices should eliminate the relationship with MD. Indeed, prior work has identified a signal for conflict processes separable from other pre-SMA/dACC signals such as task difficulty through such covariate analyses (Neta et al., 2014). We regressed pre-SMA/dACC activity upon MD controlling for RT (task difficulty), summated x-/y-axis velocity (total motion), and summated absolute acceleration (total force), again finding a significant positive relationship between pre-SMA/dACC activation and MD (b = 0.083, SE = 0.029, t(23.2) = 2.884, p = 0.0083, 95% CI = 0.027-0.143). These findings suggest that the pre-SMA/dACC is specifically responsive to competition between opposing social category responses above and beyond the mere difficulty and motor effort of the categorization (Nachev et al., 2008; Jahn et al., 2016), which is consistent with certain prominent accounts of this region.

Mediation analysis
Last, if competition between coactivated categories in the rFG elicits conflict-monitoring processes in the pre-SMA/dACC, then we may expect increased pre-SMA/dACC activity on trials with greater rFG category coactivation. Specifically, we tested a mediation model in which MD (behavioral index of competition) mediates the relationship between rFG pattern similarity (category coactivation) and pre-SMA/dACC activity. We tested this model with the multilevel approach put forth by Bauer et al. (2006), which uses a Monte Carlo simulation (10,000 iterations) to estimate 95% CIs for the total and indirect effect. Consistent with the previous analyses, there was a significant total effect (p < 0.0001, 95% CI = 0.24235-0.36201), with a positive relationship between rFG category coactivation and pre-SMA/dACC activation. More importantly, there was a significant indirect effect of MD (p = 0.005, 95% CI = 0.0031-0.01563), supporting our hypothesis that the positive relationship between rFG pattern similarity and pre-SMA/dACC responses may partly be accounted for by the competition between category activations (Fig. 4; b indirect effect = 0.00893, SE indirect effect = 0.00321, p indirect effect = 0.005, 95% CI indirect effect = 0.00263-0.01523; b path a = 0.08029, SE path a = 0.01745; b path b = 0.1043, SE path b = 0.02374; b path c = 0.30237, SE path c = 0.03056, p path c < 0.0001, 95% CI path c = 0.24246-0.36227; b path c′ = 0.2934, SE path c′ = 0.0301). Together, this suggests that competition between category coactivations in the rFG may trigger stronger responses in the pre-SMA/dACC, which is then engaged to help resolve such conflict.
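The Monte Carlo interval for the indirect effect can be sketched as below. This is a simplified illustration, not the authors' procedure: it draws paths a and b from independent normal distributions using the estimates and SEs reported above, and it treats the covariance between the random a and b slopes (which contributes to the average indirect effect in multilevel mediation) as a fixed constant. That covariance is not reported in the text, so it is left at 0 here and the resulting interval will not match the published one. The function name `monte_carlo_indirect_ci` is hypothetical.

```python
import random

def monte_carlo_indirect_ci(a, se_a, b, se_b, cov_ab=0.0, draws=10_000, seed=0):
    """95% Monte Carlo interval for an indirect effect a*b + cov_ab.

    Assumptions: a and b are drawn from independent normal sampling
    distributions, and cov_ab (the covariance of the random a and b slopes
    in a multilevel model) is treated as a fixed constant.
    """
    rng = random.Random(seed)
    sims = sorted(
        rng.gauss(a, se_a) * rng.gauss(b, se_b) + cov_ab for _ in range(draws)
    )
    lower = sims[int(0.025 * draws)]
    upper = sims[int(0.975 * draws) - 1]
    return lower, upper

# Path estimates and SEs as reported in the text; cov_ab is not reported,
# so it is left at 0 and the interval will not match the published one.
low, high = monte_carlo_indirect_ci(0.08029, 0.01745, 0.1043, 0.02374)
```

An interval that excludes zero is what supports the claim of a significant indirect effect; the simulation avoids assuming that the product of two normally distributed estimates is itself normal.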

Category competition versus general indecision
Last, we analyzed our additional data to rule out the alternative explanation that increased trajectory deviations are merely due to less decisive movement toward the response. We used a repeated-measures ANOVA with Helmert coding to make two primary comparisons: (1) relevant cues versus both irrelevant cues and base race and (2) irrelevant cues versus base race. We found relevant cues to elicit significantly greater MD (M = 0.509) than other conditions on average (M = 0.475; F(1,48) = 7.006, p = 0.01). Pairwise comparisons revealed that, whereas relevant cues (M = 0.509) elicited significantly higher MD than both irrelevant cues (M = 0.478; t(48) = 2.214, p = 0.034) and base race (M = 0.472; t(48) = 2.567, p = 0.014), there was no evidence of a difference between irrelevant cues and base race (t(48) = 0.486, p = 0.629).
These results show that relevant partial other-race cues elicited increases in MD relative to no partial cues and irrelevant partial cues, whereas irrelevant partial cues did not elicit increases in MD relative to no partial cues. Therefore, only when a face bore cues specifying the alternate category did participants exhibit a parallel attraction to that category; mere ambiguity or uncertainty was not sufficient to elicit MD effects. These findings confirm the specific sensitivity of MD to task-relevant cues and category coactivation, rather than mere indecision.
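The two Helmert comparisons can be checked against the reported condition means with a short sketch. The contrast weights below are one common way of writing Helmert-style contrasts and are an assumption about the exact coding used; the dictionary and function names are hypothetical. The sketch reproduces only the mean differences, not the F or t statistics, which require subject-level data.

```python
# Condition means for MD as reported in the text.
means = {"relevant": 0.509, "irrelevant": 0.478, "base": 0.472}

# Helmert-style contrast weights (each row sums to zero).
contrasts = {
    "relevant_vs_others": {"relevant": 1.0, "irrelevant": -0.5, "base": -0.5},
    "irrelevant_vs_base": {"relevant": 0.0, "irrelevant": 1.0, "base": -1.0},
}

def contrast_value(weights, condition_means):
    """Weighted sum of condition means for one planned comparison."""
    return sum(w * condition_means[c] for c, w in weights.items())

effects = {name: contrast_value(w, means) for name, w in contrasts.items()}
```

The first contrast comes to about 0.034, that is, 0.509 minus the 0.475 average of the other two conditions, and the second to about 0.006, in line with the pattern of a relevant-cue effect and no detectable irrelevant-cue effect.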

Discussion
Through the integration of mouse-tracking and neuroimaging, we found convergent evidence that multiple social categories are simultaneously and partially activated during the perception of faces whose features partly overlap with other categories (e.g., feminine male face). Specifically, while categorizing faces' gender or race, or objects' color, participants' hand trajectories partially deviated toward the opposite category response when a target's features resembled that category. Such results are evidence for coactivation between social categories that compete over time, consistent with previous behavioral studies (Freeman and Johnson, 2016), and additional analyses ruled out alternative explanations such as discrete-like errors or mere indecisiveness. Neuroimaging results demonstrated that such social category coactivation was reflected in the similarity of multivoxel patterns in the rFG and ventral visual stream. To the extent that participants' response trajectory exhibited a parallel attraction to an opponent social category (e.g., female for a feminine male face), the elicited rFG pattern was correspondingly more similar to the average pattern associated with that opponent category. In turn, this increased pattern similarity effect predicted stronger overall pre-SMA/dACC activation, likely reflecting the pre-SMA/dACC's role in detecting and resolving the category inconsistency. Moreover, such results were not limited to the social domain (sex, race), instead generalizing across nonsocial categorization as well (color).
The present findings bolster previous behavioral studies suggesting that perceivers translate a natural spectrum of facial cues into stable categorizations of other people through a competition process involving multiply activated categories (Freeman and Johnson, 2016), as proposed by recent computational models of social categorization (Freeman and Ambady, 2011). Earlier accounts proposing activation of a single category representation of variable strength (Locke et al., 2005) are not well accommodated by the findings. Comparisons of neural responses across social and nonsocial categorization strongly converged, supporting accounts that social categorization draws on domain-general mechanisms suited to other forms of perceptual categorization (Freeman and Ambady, 2011). Indeed, dynamic competition between coactivated representations is prevalent across domains, [...]
Figure 4. [...] path c = 0.30237, SE path c = 0.03056, p path c < 0.0001, 95% CI path c = 0.24246-0.36227) was partly accounted for by the extent of category competition as measured behaviorally (MD; reduced effect: path c′; b path c′ = 0.2934, SE path c′ = 0.0301). This result suggests that visual representations approximating the target category (e.g., male) and its competitor (e.g., female) simultaneously may lead to increased category competition, which in turn leads to stronger engagement of the pre-SMA/dACC. Results were significance tested with a Monte Carlo simulation (10,000 iterations) to estimate confidence intervals for the total and indirect effect (for other effects, *p < 0.05, **p < 0.01, ***p < 0.0001). Unstandardized betas and their SEs are reported per path.
[...] temporal resolution of fMRI. Therefore, in modeling the interplay of the rFG and the pre-SMA/dACC, the statistical mediation is suggestive of a possible causal chain of events, but the correlational nature limits any strong inference.
Despite this, the role of rFG pattern similarity in representing social categories, including coactivated categories, and the response of the pre-SMA/dACC and cingulo-opercular network to such pattern similarity effects in the presence of coactivated categories are clear.
In summary, the members of any social category that we encounter are rarely a perfect prototype of that given category. They deviate not only in the degree of their membership, but also in the extent to which their cues relate to alternative, opponent categories. Although recent computational models and behavioral studies have suggested that this leads to the coactivation and competition of potential categories (Freeman and Ambady, 2011; Freeman and Johnson, 2016), the neural basis of how the brain resolves occasionally conflicting cues into social categorical percepts has remained unclear. The present results provide evidence that, in processing the gender or race of a face, opponent social categories coactivate in the rFG, which compete and may resolve into an eventual stable categorization with the assistance of the pre-SMA/dACC. Therefore, the findings provide a neural mechanism through which the brain may translate inherently diverse social cues into coherent categorizations of other people.