Abstract
Eye movements provide a functional signature of how human vision is achieved. Many recent studies have consistently reported robust idiosyncratic visual sampling strategies during face recognition. Whether these interindividual differences are mirrored by idiosyncratic neural responses remains unknown. To address this question, we first tracked eye movements of male and female observers during face recognition. Additionally, for every observer we obtained an objective index of neural face discrimination through EEG that was recorded while they fixated different facial information. We found that foveation of the facial features that were fixated longer during face recognition elicited stronger neural face discrimination responses across all observers. This relationship occurred independently of interindividual differences in preferential facial information sampling (e.g., eye vs mouth lookers), and started as early as the first fixation. Our data show that eye movements play a functional role during face processing by providing the neural system with the information that is diagnostic to a specific observer. The effective processing of identity involves idiosyncratic, rather than universal face representations.
SIGNIFICANCE STATEMENT When engaging in face recognition, observers deploy idiosyncratic fixation patterns to sample facial information. Whether these individual differences concur with idiosyncratic face-sensitive neural responses remains unclear. To address this issue, we recorded observers' fixation patterns, as well as their neural face discrimination responses elicited during fixation of 10 different locations on the face, corresponding to different types of facial information. Our data reveal a clear interplay between individuals' face-sensitive neural responses and their idiosyncratic eye-movement patterns during identity processing, which emerges as early as the first fixation. Collectively, our findings favor the existence of idiosyncratic, rather than universal face representations.
Introduction
The visual system continuously processes perceptual inputs to adapt to the world by selectively moving the eyes toward task-relevant, i.e., diagnostic information. As a consequence, eye movements do not unfold randomly, and during face processing humans deploy specific gaze strategies. For many years, face recognition was considered to elicit a T-shaped fixation pattern encompassing the eye and mouth regions, which was universally shared across all observers (Yarbus, 1967; Henderson et al., 2005). However, over the last decade, a growing body of work has challenged this view by revealing cross-cultural (Blais et al., 2008; Miellet et al., 2013), idiosyncratic (Mehoudar et al., 2014), and within-observer (Miellet et al., 2011) differences during face recognition. For example, both Western and Eastern observers exhibit comparable face recognition proficiency while deploying respectively a T-shaped versus a more central fixation bias (for review, see Caldara, 2017). In addition, in line with early observations based on individual participants (Walker-Smith et al., 1977), recent studies demonstrate that observers deploy unique sampling strategies (Kanan et al., 2015; Arizpe et al., 2017), which are stable over time (Mehoudar et al., 2014), and relevant to behavioral performance (Peterson and Eckstein, 2013). Specifically, individuals' sampling strategies deviate considerably from the well established T-shaped pattern, which is merely the result of the group averaging of idiosyncratic visual sampling strategies of individual Western observers (Mehoudar et al., 2014).
Despite the growing literature on the existence of idiosyncratic sampling strategies, their functional role and underlying neural mechanisms remain poorly understood. Some studies have investigated the impact of the fixated facial information input on neural responses, by recording the electroencephalographic (EEG) signals while observers fixated different facial information [i.e., viewing positions (VPs)]. This body of work has focused on the N170 face-sensitive event related potential (ERP) component (Bentin et al., 1996), and has demonstrated that VPs differentially modulate the N170. The finding of the eye region eliciting larger amplitudes (Itier et al., 2006; de Lissa et al., 2014; Nemrodov et al., 2014; Rousselet et al., 2014) has been interpreted in terms of a universal neural preference toward this facial information. However, these studies have mainly involved grand-average analyses, and did not control for individual fixation preferences. Consequently, it remains unclear whether idiosyncratic fixation biases concur with idiosyncratic neural responses.
A paradigm that has been increasingly used to examine different aspects of face processing, including e.g., face categorization, identity or facial expression discrimination (Liu-Shuang et al., 2014; Norcia et al., 2015; Rossion et al., 2015; Dzhelyova et al., 2017) involves fast-periodic visual stimulation (FPVS). Such FPVS paradigms involve stimulation with a series of stimuli that periodically differ with respect to a given dimension. Neural synchronization to the frequency of changes provides an implicit measure of the process of interest. Compared with traditional ERPs, the FPVS response is less susceptible to noise artifacts, and its remarkably high signal-to-noise ratio increases the likelihood of detecting subtle differences between experimental manipulations (Norcia et al., 2015). Such signal properties make the FPVS paradigm paired with EEG recordings ideal to investigate the relationship between VP-dependency of neural responses and idiosyncratic visual sampling strategies.
In the present study, we extracted observers' fixation patterns exhibited during an old/new face recognition task (Blais et al., 2008). Additionally, we recorded their neural face discrimination responses using a FPVS paradigm, in which same identity faces were presented at a constant frequency rate with periodically intervening oddball identities, while observers fixated 1 of 10 VPs. We then applied a robust data-driven statistical approach to relate the idiosyncratic sampling strategies to the electrophysiological responses across all electrodes independently. As early as the first fixation, we find a strong positive relationship between idiosyncratic sampling strategies and neural face discrimination responses recorded across different VPs, which can be observed across all observers. In particular, independently of the sampling strategy, the longer a VP was fixated under natural viewing conditions, the stronger the neural face discrimination response during its enforced fixation.
Materials and Methods
Participants
The sample size was motivated by studies using the same FPVS paradigm to index neural face discrimination that had been published before data acquisition (Dzhelyova and Rossion, 2014a,b; Liu-Shuang et al., 2014, 2016; sample size range: 8–12). In Dzhelyova and Rossion's (2014b) study using a within-subject design, the minimal effect size observed in a repeated-measures ANOVA was 0.2 (partial η²). As effect size estimates in the literature are often overly optimistic, we planned our experiment based on an effect size of 0.1 and an estimated sample size of 15 participants, which results in a power of 0.95 to detect an effect. Based on prior experience and the requirement of high-quality data from independent methods, we chose to test a total of 20 participants. Our cohort comprised 20 Western Caucasian observers (11 females, two left-handed, mean age: 25 ± 3 years) with normal or corrected-to-normal vision and no history of psychiatric or neurological disorders. Three observers were excluded because of poor quality of the eye movement data. All participants provided written informed consent and received financial compensation for participation; all procedures were approved by the local ethics committee. Finally, all observers performed the eye-tracking and the EEG experiment during the same testing session and systematically in this order. It is worth noting that the stimuli used in the two experiments differed and therefore cannot account for an order effect. In addition, none of the observers were aware of their fixation biases.
Procedures
Eye-tracking
Experimental design.
Stimuli consisted of 56 Western Caucasian (WC) and 56 East Asian (EA) identities respectively obtained from the KDEF (Karolinska Directed Emotional Faces; Lundqvist et al., 1998) and the AFID (Asian Face Image Database; Bang et al., 2001). Faces were presented at a viewing distance of 75 cm and subtended 12.56° (height from chin to hairline) × 9.72° (width) of visual angle on a VIEWPIxx/3D monitor (1920 × 1080 pixel resolution, 120 Hz refresh rate).
Observers completed two learning and recognition blocks per stimulus race. In each block, observers were instructed to learn 14 face identities (7 females) randomly displaying either neutral, happy, or disgust expressions. After a 30 s pause, a series of 28 faces (14 old faces) was presented and observers were instructed to indicate by key-press, as quickly and as accurately as possible, whether each face was familiar or not. To prevent image matching strategies, learned identities displayed a different facial expression in the recognition blocks. Each trial involved presentation of a central fixation cross (which also served as an automatic drift correction), followed by a face presented pseudorandomly in one of four quadrants of the computer screen, to avoid potential anticipatory fixation strategies. During the learning phase, stimuli were presented for 5 s; during the recognition phase, presentation was terminated upon participants' responses. Eye movements were recorded during both the learning and recognition phases.
Data acquisition and processing.
The oculomotor behavior was recorded for each participant using an EyeLink 1000 Desktop Mount with a temporal resolution of 1000 Hz. The raw data are available upon request. Data were acquired using the Psychophysics (Brainard, 1997) and EyeLink (Cornelissen et al., 2002) Toolboxes running in a MATLAB R2013b environment. Calibrations and validations were performed at the beginning of the experiment using a nine-point fixation procedure. Additionally, before each trial a fixation cross appeared in the center of the screen and participants were instructed to fixate on it until a new stimulus appeared, to ensure that eye movements were correctly tracked. A new calibration was performed at this stage if the eye drift exceeded 1° of visual angle.
After removing eye blinks and saccades using the algorithm developed by Nyström and Holmqvist (2010), observers' eye-movement data from the Old-New task were processed to create individual fixation maps, independently for learning and recognition phase. For both phases we removed noisy trials suffering from loss of data and/or precision and for the recognition session we only considered trials where subjects provided a correct response. Previous studies have shown that with this paradigm there are no differences in the sampling strategies used to sample WC or EA faces (Blais et al., 2008; Caldara, 2017). Therefore, to increase the signal-to-noise ratio, fixation maps were extracted independently of the stimulus race. After preprocessing the eye movement data, fixation maps were computed independently for each subject based on 54 and 60 trials for the learning and recognition phases, respectively. These were the minimum number of trials available for all subjects. Individuals' fixation intensities (based on the cumulative fixation duration) were derived using these fixation maps and predefined circular regions-of-interest (ROIs; Fig. 1). The ROIs covered 1.8° of visual angle and were centered on the 10 viewing positions fixated during the FPVS experiment.
Illustration of the ROIs surrounding the 10 VPs. Observers' fixation maps were overlaid onto an ROI mask to compute the fixation intensity per ROI. The ROIs covered 1.8° of visual angle and were centered on nine equidistant viewing positions (red circles) and on an additional VP corresponding to the center of the stimulus (black circle).
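The ROI-based fixation intensity described above can be illustrated with a minimal sketch. It is not the authors' pipeline: it sums raw fixation durations rather than working on fixation maps, it assumes that the 1.8° figure is the ROI diameter (radius 0.9°), and the ROI centers and fixations are hypothetical toy values.

```python
import math

def roi_fixation_durations(fixations, roi_centers, roi_radius):
    """Sum fixation durations that fall inside each circular ROI.

    fixations:   iterable of (x, y, duration) tuples, positions in degrees.
    roi_centers: list of (x, y) ROI centers, in the same units.
    roi_radius:  ROI radius in the same units (assumed value, see above).
    """
    totals = [0.0] * len(roi_centers)
    for x, y, duration in fixations:
        for i, (cx, cy) in enumerate(roi_centers):
            if math.hypot(x - cx, y - cy) <= roi_radius:
                totals[i] += duration
                break  # ROIs are assumed not to overlap
    return totals

# Hypothetical toy data: three ROIs, four fixations (degrees, milliseconds).
centers = [(-2.0, 2.0), (2.0, 2.0), (0.0, -2.0)]
fixes = [(-2.1, 2.2, 250.0), (1.8, 1.9, 300.0),
         (-1.9, 2.0, 150.0), (5.0, 5.0, 400.0)]
durations = roi_fixation_durations(fixes, centers, roi_radius=0.9)
print(durations)
```

The last toy fixation lies outside every ROI and therefore does not contribute to any cumulative duration.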
EEG
Experimental design.
We used full-front, color images of 50 identities (25 female) from the same set described previously (Liu-Shuang et al., 2014). All faces conveyed a neutral expression, were cropped to exclude external facial features, and were presented against a gray background. Each original stimulus subtended 11.02° (height) × 8.81° (width) of visual angle at a viewing distance of 70 cm.
Face stimuli were displayed using the fast periodic visual stimulation (FPVS) paradigm at a constant frequency of 6 Hz. Each trial lasted 62 s and consisted of a series of same-identity faces (i.e., base), with oddball identities intervening at every seventh stimulus, hence at a frequency of 0.857 Hz (6 Hz/7; Fig. 2A–C). The experiment comprised a total of 20 trials: 10 conditions (the viewing positions participants were required to fixate on; Fig. 2B,C), with two trials per condition (trials differed with respect to the gender of the face stimuli). To prevent eye movements, participants were instructed to maintain fixation on a central cross. The position of the face stimuli was varied across trials to manipulate the fixated viewing position, and hence the fixated facial information. Faces were presented through sinusoidal contrast modulation (Fig. 2A). Additionally, 2 s of gradual fade in and fade out were added at the beginning and end of each trial. To maintain subjects' attention, the fixation cross briefly (200 ms) changed color (red to blue) randomly between seven and eight times within each trial; participants were instructed to report the color change by button-press. Subjects were also monitored through a camera placed in front of them and connected to the experimenter's computer. No additional eye-tracking was performed during EEG acquisition, as these measures were considered sufficient for the intended purposes. Finally, to avoid pixel-wise overlap, stimulus size varied randomly from 80 to 120% of the original size [visual angle ranged from 8.82° to 13.22° (height) × 7.05° to 10.57° (width)].
FPVS paradigm and viewing positions. A, Faces were presented at a frequency rate of 6 Hz through sinusoidal contrast modulation. Base stimuli consisted of images of the same facial identity; interleaved oddball stimuli conveying different identities were presented every seventh base stimulus. B, Illustration of the 10 VPs fixated by participants. C, Examples of two trials displaying fixation on the left eye (VP1; top row), or mouth (VP8; bottom row).
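The stimulation parameters reported above can be checked with a few lines of arithmetic. The sketch below is illustrative only: it treats the 80 to 120% size jitter as a linear scaling of visual angle, which is a small-angle approximation and an assumption on our part.

```python
from fractions import Fraction

base_hz = Fraction(6)     # base stimulation rate (Hz)
oddball_every = 7         # an oddball identity appears at every seventh face
oddball_hz = base_hz / oddball_every

# Size jitter, treated as linear scaling of visual angle (approximation).
orig_height_deg, orig_width_deg = 11.02, 8.81
height_range = (round(orig_height_deg * 0.8, 2), round(orig_height_deg * 1.2, 2))
width_range = (round(orig_width_deg * 0.8, 2), round(orig_width_deg * 1.2, 2))

print(round(float(oddball_hz), 3))
print(height_range, width_range)
```

Under these assumptions the computed ranges agree with the 8.82 to 13.22° (height) and 7.05 to 10.57° (width) values given in the text.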
Data acquisition and processing.
Electrophysiological responses were recorded with a BioSemi Active-Two amplifier system (BioSemi) with 128 Ag/AgCl active electrodes at a sampling rate of 1024 Hz. Electrodes were relabeled according to the more conventional 10–20 system notation following the guidelines by Liu-Shuang et al. (2016). Additional electrodes placed at the outer canthi and below both eyes registered eye movements and blinks; the magnitude of the offset of all electrodes was reduced and maintained within ±25 mV. The recorded EEG was analyzed using Letswave 5 (https://github.com/NOCIONS/Letswave5; Mouraux and Iannetti, 2008). The raw data are available upon request. Preprocessing consisted of bandpass filtering the signal with a fourth-order Butterworth filter (0.1–100 Hz). Data were subsequently downsampled to 256 Hz and segmented by condition, resulting in twenty 66 s epochs, which included 2 s before and after stimulation. Independent component analysis was performed on each participant's data to remove contamination caused by eye movements and blinks.
Noisy electrodes were interpolated using the three nearest spatially neighboring channels; this process was applied to no more than 5% of all scalp electrodes. Segments were then re-referenced to a common average reference and cropped to an integer number of oddball cycles, excluding 2 s after stimulus onset and 2 s before stimulus offset (∼58 s epochs; 14,932 bins). Epochs were then averaged separately for each subject per condition.
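Cropping to an integer number of oddball cycles ensures that the oddball frequency falls (almost) exactly on a frequency bin after the Fourier transform. The sketch below illustrates this under the assumption that 50 oddball cycles were retained, a value inferred here from the ∼58 s epoch length rather than stated in the text. With this assumption the computation yields 14,933 samples, within one sample of the 14,932 bins reported; the exact cropping rule applied by the analysis software is not known to us.

```python
from fractions import Fraction
import math

fs = 256                      # sampling rate after downsampling (Hz)
oddball_hz = Fraction(6, 7)   # oddball frequency: 6 Hz base rate / 7
n_cycles = 50                 # assumed number of whole oddball cycles kept

# Number of samples spanning n_cycles oddball periods, rounded down.
n_samples = math.floor(n_cycles * fs / oddball_hz)

# Frequency resolution of the spectrum and the (fractional) oddball bin.
resolution_hz = Fraction(fs, n_samples)
oddball_bin = float(oddball_hz / resolution_hz)

print(n_samples, round(n_samples / fs, 2), round(oddball_bin, 3))
```

Because the epoch spans a whole number of cycles, the oddball frequency and its harmonics sit at (nearly) integer multiples of the cycle count in bin units, which limits spectral leakage into neighboring bins.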
Frequency domain.
A fast Fourier transform was applied to the averaged segments and the amplitude spectrum was extracted. The data were baseline corrected by subtracting from each frequency's amplitude the average of its surrounding 20 bins, excluding the two immediately neighboring ones. Finally, for each subject and condition, the summed baseline-corrected amplitude of the oddball frequency and its significant harmonics provided the index of neural face discrimination. Following previous procedures (Dzhelyova et al., 2017), harmonics were considered significant until the mean z-score across all conditions no longer exceeded 1.64 (p < 0.05). Based on this criterion we considered the first 11 harmonics, excluding the seventh harmonic, which coincides with the base stimulation frequency (6 Hz).
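The baseline correction and harmonic summation can be sketched as follows. This is an illustration rather than the Letswave implementation: we assume that "surrounding 20 bins" means 10 bins on each side of the target bin and that the excluded bins are the single bin immediately adjacent on each side. The spectrum is a hypothetical toy example with the oddball frequency placed at bin 50.

```python
def baseline_correct(amplitudes, index, n_each_side=10, n_skip=1):
    """Subtract the mean of neighboring bins from the amplitude at `index`.

    Uses `n_each_side` bins on each side of the target bin and skips the
    `n_skip` bins immediately adjacent to it (assumed reading of the text).
    """
    lower = range(index - n_skip - n_each_side, index - n_skip)
    upper = range(index + n_skip + 1, index + n_skip + n_each_side + 1)
    neighbors = [amplitudes[i] for i in list(lower) + list(upper)
                 if 0 <= i < len(amplitudes)]
    return amplitudes[index] - sum(neighbors) / len(neighbors)

def summed_oddball_response(amplitudes, oddball_bin, harmonics):
    """Sum the baseline-corrected amplitudes at the selected harmonics."""
    return sum(baseline_correct(amplitudes, oddball_bin * k) for k in harmonics)

# Harmonics 1-11 of the oddball, minus the seventh (equal to the 6 Hz base rate).
harmonics = [k for k in range(1, 12) if k != 7]

# Toy spectrum: flat noise floor of 0.1 with a peak of 1.1 at every harmonic.
oddball_bin = 50
spectrum = [0.1] * 700
for k in range(1, 12):
    spectrum[oddball_bin * k] = 1.1

response = summed_oddball_response(spectrum, oddball_bin, harmonics)
print(round(response, 6))
```

In this toy spectrum each retained harmonic contributes a baseline-corrected amplitude of 1.0, and the peak at the seventh harmonic is ignored because it cannot be separated from the base response.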
Statistical analyses
Using the iMAP4 toolbox (Lao et al., 2017) we computed a linear regression to explore the relationship between the fixation bias (the z-scored fixation duration) displayed during the recognition phase and neural face discrimination (i.e., the FPVS response amplitude). To this aim we fitted a linear mixed-effects model with random effects for the intercept and for fixation duration, grouped by subject. To avoid a priori assumptions regarding the topography of the effect, we regressed the two variables at all scalp electrodes independently and results were Bonferroni-corrected.
This computation determines whether VP-dependent fixation durations are associated with the amplitude of the neural face discrimination response elicited by each VP. Importantly, because the analysis takes idiosyncrasies into consideration, there is no a priori expectation on how VPs are ranked. We opted for this approach in light of individual differences in fixation patterns reported previously (Mehoudar et al., 2014; Kanan et al., 2015; Arizpe et al., 2017), and similar idiosyncrasies assumed to exist for neural face discrimination responses across VPs. Therefore, the model used here allows each subject to have his/her specific VP-pattern, and a relationship emerges if the fixation pattern is predictive of the neural response pattern of the same subject. Finally, as the current work focuses only on individual subjects, we did not perform any analysis involving average fixation maps and average EEG responses.
To determine whether fixation maps would show a stronger correlation with EEG responses of the same subject, we randomly sampled the fixation maps of our subjects to correlate eye movements from one observer with the EEG response of another observer. On these new data we performed the same regression described above. This process was repeated 1000 times, and within each iteration we summed the significant F values (p < 0.05/128). We then ranked the 1000 summed significant F values and selected the 95th percentile as the threshold to assess statistical significance. Results were retained as significant only if the summed significant F values from the original analysis were above this simulated threshold.
Additionally, although the main focus of this work was to isolate the relationship between eye movements during correct recognition of faces and neural face discrimination responses, to provide a comprehensive view of our data we also investigated whether this relationship would occur when fixation biases were computed from only the (1) first or (2) second face fixation in each trial. Moreover, we performed the same analyses on the eye movements of the learning phase. We thus investigated the relationship between eye movements and neural face discrimination for (1) all fixations, (2) the first fixation only, and (3) the second fixation only, separately for the learning and recognition phases.
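Two ingredients of this analysis lend themselves to a short illustration: the z-scoring of fixation durations and the Bonferroni-corrected significance threshold across the 128 electrodes. The sketch below does not fit the mixed-effects model itself. It assumes that z-scoring is done within each observer across the 10 VPs, which the text does not specify, and the durations are hypothetical.

```python
import statistics

def zscore(values):
    """Z-score a list of values using the sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

n_electrodes = 128
alpha = 0.05
bonferroni_threshold = alpha / n_electrodes

# Hypothetical cumulative fixation durations (ms) of one observer at 10 VPs.
durations = [820, 640, 150, 90, 400, 60, 30, 210, 45, 120]
z = zscore(durations)

print(f"{bonferroni_threshold:.2e}")
print([round(v, 2) for v in z])
```

The resulting threshold corresponds to the p < 3.91e−04 criterion reported for the electrode-wise regression results.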
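The logic of this shuffling procedure can be sketched in a few lines. Two simplifications are made and should be kept in mind: the statistic is swapped for the sum of per-subject Pearson correlations (a stand-in for the summed significant F values of the mixed model), and each shuffle is constrained so that no observer keeps their own fixation data, which is our assumption about how mismatched pairs were formed. All data are simulated.

```python
import random
import statistics

def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def summed_stat(fix, eeg):
    """Stand-in statistic: sum of per-subject fixation-EEG correlations."""
    return sum(pearson(f, e) for f, e in zip(fix, eeg))

def shuffled_threshold(fix, eeg, n_iter=1000, percentile=95, seed=0):
    """Percentile of the statistic when fixation and EEG data are mismatched."""
    rng = random.Random(seed)
    null = []
    for _ in range(n_iter):
        order = list(range(len(fix)))
        # Reshuffle until no observer is paired with their own fixation data.
        while any(i == j for i, j in enumerate(order)):
            rng.shuffle(order)
        null.append(summed_stat([fix[j] for j in order], eeg))
    null.sort()
    return null[int(len(null) * percentile / 100) - 1]

# Simulated data: 8 observers x 10 VPs, EEG coupled to each observer's own fixations.
rng = random.Random(1)
fix = [[rng.gauss(0, 1) for _ in range(10)] for _ in range(8)]
eeg = [[v + 0.1 * rng.gauss(0, 1) for v in f] for f in fix]

observed = summed_stat(fix, eeg)
threshold = shuffled_threshold(fix, eeg)
print(observed > threshold)
```

The observed statistic is retained as significant only when it exceeds the threshold derived from the mismatched pairings, which mirrors the decision rule described above.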
Results
Behavior
As expected, subjects' recognition performance in the Old-New task, as indexed by d′, was significantly better for Western Caucasian (M = 1.62, SD = 0.64) than East Asian faces (M = 0.97, SD = 0.60), t(16) = 5.72, p < 0.01. Subjects' performance was nearly at ceiling for the FPVS orthogonal task (M = 0.91, SD = 0.18). Note that a color change was considered as detected if observers reported it within 700 ms from its onset. Because of technical issues, one subject's behavioral responses were not recorded.
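For reference, the sensitivity index d′ is conventionally computed as the difference between the z-transformed hit and false-alarm rates. The sketch below shows this standard formula; the log-linear correction for extreme rates and the trial counts are assumptions made for illustration, as the text does not report how d′ was computed.

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    Adds 0.5 to each cell (log-linear correction) so that rates of 0 or 1
    do not yield infinite z-scores; this correction is an assumption.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)

# Hypothetical counts for one recognition block (14 old and 14 new faces).
dp = d_prime(hits=11, misses=3, false_alarms=4, correct_rejections=10)
print(round(dp, 2))
```

A value of 0 indicates no discrimination between old and new faces, and larger values indicate better recognition performance.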
Eye movements and FPVS response
Table 1 summarizes the number of fixations and the similarity between fixation maps during learning and recognition sessions (indexed by the cosine distance, with a distance of 0 indicating identical fixation maps). The average fixation map (computed for descriptive purposes; Fig. 3A) demonstrates that, as a group, observers preferentially sampled facial information encompassing the eyes, nasion, nose, and mouth. However, because the focus of this work was to investigate the relationship between fixation patterns and neural responses at the individual level, group data were not subject to any further analysis.
Number of fixations and fixation maps' similarity between learning and recognition session for each observer
Fixation maps and oddball responses. A, C, The grand-average fixation map and FPVS responses are shown, respectively, whereas B and D show the two measures for the same subjects. For illustration, only four subjects are reported.
At the individual level, the majority of individual observers' fixation maps did not perfectly conform to the grand average fixation pattern (Figs. 3A,B, 4), clearly demonstrating the existence of idiosyncratic visual sampling strategies. Mirroring these results, the grand average neural face discrimination response amplitudes varied as a function of VPs, with the greatest amplitudes for the central position (Fig. 3C). However, the neural response amplitudes also markedly differed across individuals (Figs. 3D, 4).
Fixation maps for the recognition session and neural face discrimination responses of all subjects.
Regression analysis: assessing the relationship between fixation and neural biases
The data-driven regression between individuals' fixation durations and FPVS responses across VPs computed independently on all electrodes revealed a positive relationship at right occipito-temporal and central-parietal clusters (Fig. 5A).
The relationship between fixation duration and neural face discrimination responses across VPs observed across all subjects considered individually. A, Regression F values (top) and β maps (bottom) are shown only for electrodes exhibiting a significant effect (p < 3.91e−04). B, Scatterplot illustrates individual subjects (light gray lines) as well as the group (black line) effect at electrode P10. C, Zoom in to B. Each subject is plotted alone with their individual correlation (blue line). VPs are color- and shape-coded as indicated in the key. The subjects are ordered as a function of their relationship magnitude (slope). Although observers exhibited idiosyncratic VP-dependent fixation durations (see also individuals' fixation maps in Fig. 7), all showed a positive relationship, with facial features fixated longer (i.e., VPs) eliciting stronger neural responses. Note that here the neural face discrimination response magnitude is displayed at the occipito-temporal electrode showing the largest effect (i.e., P10).
The occipito-temporal cluster includes 12 significant electrodes with the strongest effect at P10 [F(1,169) = 32.91, β = 0.27 (0.17, 0.36), p = 4.40e−08] and the smallest at P9 [F(1,169) = 13.26, β = 0.20 (0.09, 0.31), p = 3.61e−04; Table 2]. Despite interindividual variations in the neural face discrimination response amplitude and fixation durations, we observed a positive relationship for all observers (Fig. 5B,C).
Results for fixation-dependent regression analyses (for the recognition session)
An effect was also found on the central-parietal cluster comprising 13 electrodes, with C1 showing the strongest effect [F(1,169) = 33.05, β = 0.14 (0.09, 0.19), p = 4.14e−08] and FCz exhibiting the smallest effect [F(1,169) = 15.41, β = 0.12 (0.06, 0.18), p = 1.26e−04; Fig. 5A; Table 2].
Finally, to determine whether fixation maps would correlate better with EEG responses of the same subject, we ran simulations of the same analyses in which EEG responses were correlated with fixation maps of different observers. In each iteration, we summed the significant F values, and the 95th percentile of this distribution constituted our simulated threshold (see Statistical analyses). The sum of significant F values (670.89) obtained using the original data exceeded the simulated threshold (536.32), and was therefore significant (Table 2). Significant results were also obtained for analyses performed on the first (summed F values = 474.07, simulated threshold = 345.67) and the second fixation (summed F values = 500.79, simulated threshold = 310.82; Fig. 6B; Table 2). The results of the same analyses performed on data acquired during the learning session were significant only for the first fixation (summed F values = 447.33, simulated threshold = 315.06; Fig. 6A; Table 3).
Results of the analyses performed using the fixation bias computed based on the learning (A) or recognition (B) data. For the learning session, analyses are reported for all fixations, and for only the first or second fixation. For the recognition session, analyses are reported for only the first or second fixation. F and β values are shown only for electrodes exhibiting a significant effect (p < 7.81e−05). Below each topography of the effect are the fixation maps of all observers. *Indicates which effect was significant at the simulated threshold.
Results for fixation-dependent regression analyses (for the learning session)
Can specific fixation biases account for the observed relationship?
To explore whether subjects exhibiting a particular fixation bias during recognition (e.g., for the eyes) would show a stronger relationship between fixation and neural biases, we first ranked observers' fixation maps based on the magnitude of their individual relationship. As shown in Figure 7A, subjects showing similar fixation patterns could exhibit relationships of slightly different magnitude (e.g., nasion: S04 and S05), whereas observers exhibiting different fixation maps could rank closely in terms of relationship strength (e.g., S11 and S16). Additionally, we computed the distance of each observer's fixation map from the average fixation pattern. Here, each map is treated as a vector and the measure of interest is the cosine distance between each observer's map and the average one (a distance of 0 indicates identical fixation maps). This produces a value ranging between 0 and 1 for each subject. The higher the distance, the more dissimilar that subject's pattern is from the average. Finally, we computed a Spearman correlation between this distance and the strength of the relationship between fixation and neural bias, which was nonsignificant (r = −0.31, p = 0.22; Fig. 7B).
Fixation maps and strength of the fixation–neural-bias relationship. A, Observers' fixation maps sorted as a function of the slope of observers' relationship between fixation bias and neural face discrimination response amplitude. The slope is reported for the occipito-temporal electrode showing the strongest effect (i.e., P10). B, The scatterplot illustrates the lack of correlation between: the cosine distance of individuals' fixation maps from the average fixation map (y-axis) and strength of the relationship between fixation and neural bias (x-axis). The data show there was not a particular fixation bias more likely to correlate with the neural bias. Note that a cosine distance of zero indicates identical fixation maps.
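The cosine distance used to compare each observer's fixation map with the average map can be written compactly. The sketch below flattens each map into a vector of non-negative values, for which the distance is bounded between 0 and 1; the three four-pixel "maps" are hypothetical toy data.

```python
import math

def cosine_distance(a, b):
    """Return 1 minus the cosine similarity of two flattened fixation maps."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Toy maps: three observers, four pixels each (non-negative fixation durations).
maps = [[4.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 3.0, 2.0],
        [2.0, 2.0, 2.0, 2.0]]
average = [sum(column) / len(maps) for column in zip(*maps)]
distances = [cosine_distance(m, average) for m in maps]

print([round(d, 3) for d in distances])
```

Because the measure depends only on the direction of the vectors, two maps with the same spatial pattern but different overall fixation durations have a distance of 0.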
Discussion
This study investigated the relationship between idiosyncratic visual sampling strategies for faces and the magnitude of neural face discrimination responses during fixation on different facial locations. Our data show that visual information sampling is distinct across observers, and these differences are positively correlated with idiosyncratic neural responses predominantly at occipito-temporal electrodes. Specifically, the VPs that elicited stronger neural face discrimination responses coincided with the VPs that were more fixated under free-viewing conditions. Altogether, our data show that face processing involves idiosyncratic coupling of distinct information sampling strategies and unique neural responses to the preferentially sampled facial information.
For many years, the accepted notion in vision research was that face processing elicits a unique and universal cascade of perceptual and cognitive events to process facial identity, with particular importance ascribed to information conveyed by the eye region. For instance, eye movement studies have revealed a bias toward sampling of the eye region (Blais et al., 2008), the diagnosticity of which has been further documented by psychophysical approaches (e.g., Bubbles; Gosselin and Schyns, 2001). Electrophysiological studies have also reported increased N170 magnitude during fixation on the eyes, compared with other information (Nemrodov et al., 2014; de Lissa et al., 2014). Collectively, these independent findings were taken to support the existence of a fixation and neural preference for the eye region that is shared across all observers.
However, this idea has recently been challenged. For example, findings from eye movement studies emphasize idiosyncrasies in sampling preferences that are highly distinct from the group-average T-shaped pattern (Mehoudar et al., 2014; Arizpe et al., 2017), as well as the existence of cultural differences (Blais et al., 2008; Caldara, 2017). These individual differences are not systematically associated with performance, as “mouth lookers” (i.e., observers showing preferential fixation on the mouth) could perform similarly to “eyes lookers”. Equally, two eyes lookers could exhibit very different performance (Peterson and Eckstein, 2013). Nonetheless, each observer's adopted sampling strategy is optimal in the sense that performance is maximal when fixation is enforced on preferentially sampled information, and decreases during fixation of other information (Peterson and Eckstein, 2013). These results suggest that individual differences do not reflect random intersubject variation, but rather subtend functional idiosyncrasies in face processing.
Our results replicate and extend these previous findings, by showing that idiosyncratic visual sampling strategies strikingly mirror individuals' patterns of neural face discrimination responses across VPs. Specifically, the facial regions preferentially sampled during natural viewing were those eliciting stronger neural face discrimination responses when fixated. This pattern was present in all observers, with some of them even showing a perfect match between the most fixated facial feature and the one eliciting the strongest neural response at the electrode showing the strongest statistical relationship.
Interestingly, this relationship also emerged when the fixation bias was computed based only on the first or the second fixation. This observation suggests that, from the very earliest information intake, fixations are directed toward observer-specific preferred face information. Moreover, it also indicates that idiosyncratic fixation strategies emerge as early as the first fixation on faces.
When considering single fixations performed during face learning, a significant relationship emerged only for the first one. The reduced sensitivity of the learning phase compared with the recognition phase might be due to the imposed duration (i.e., 5 s) for processing faces during this part of the experiment. This long time period introduces an inherent variability in information sampling. In the recognition session, however, observers are required to recognize faces as quickly and as accurately as possible, eliciting a restricted number of diagnostic fixations (Table 1) during a short period of time (M = 1457.3 ms, SD = 421.3 ms). However, it is worth noting that overall observers deployed similar fixations across both sessions (Table 1), a result that reinforces the idea of a reliable occurrence of idiosyncratic eye-movement strategies over (a long period of) time (Mehoudar et al., 2014) for the face recognition task.
The effect we find could be partially related to an overall preference toward facial features, such as the eyes and mouth or the center of the face (i.e., T-shaped pattern). However, significantly weaker effects are observed when correlating fixation maps and neural response derived from different individuals. These observations clearly demonstrate the existence of a tight coupling between idiosyncratic fixation biases and neural responses, instead of a general tuning for facial features per se.
The strong and striking relationship between information sampling and neural idiosyncrasies suggests a functionally relevant process. Eye movements feed the neural face system with the diagnostic information to optimize information processing. The eyes constantly move to center elements of interest in the fovea, where visual acuity is greatest. This critical functional role, coupled with the relationship reported here between idiosyncratic sampling strategies and the neural face discrimination response pattern, leads to two main considerations. First, our data show that face identity processing involves a fine-tuned interplay between oculomotor mechanisms and the face-sensitive neural network. Second, the diagnosticity associated with different facial information varies across observers. For a long time, researchers have debated the nature of face representations, mainly contrasting the idea of faces being represented as indivisible wholes (holistic or configural) with that of a collection of multiple, distinctively perceivable features (featural). This ongoing debate cannot be settled based on our finding of visual and neural idiosyncrasies. These idiosyncrasies do, however, refute the concept of a single face representation format shared across observers.
Our observations raise further important methodological and theoretical questions. The first concerns the traditional approach of standardizing the visual input to allow comparability across observers. The idiosyncratic differences in facial location tuning call into question the appropriateness of using a single visual stimulation location. Specifically, the conventional central presentation used in the majority of face processing studies might inherently create a perceptual bias that favors some but not all observers, who exhibit differential neural responses for this fixation location (and others). Additional open questions concern, for instance, (1) the extent to which the relationship between the visual sampling strategies and neural response patterns is task- and category-specific, and (2) the direction of this relationship. Future studies are required to accurately determine the neural structures underlying the observed relationship (for example, by means of fMRI). Finally, our approach may offer a promising novel route in clinical settings, if disorders involving face processing impairments (e.g., prosopagnosia, autism, schizophrenia) are found to involve an abnormal relationship between fixation patterns and neural responses to faces.
Footnotes
This work was supported by the Swiss National Science Foundation (Grant IZLJZ1_171065/1) to R.C.
The authors declare no competing financial interests.
Correspondence should be addressed to Roberto Caldara at roberto.caldara@unifr.ch