Abstract
Infants preferentially process familiar social signals, but the neural mechanisms underlying continuous processing of maternal speech remain unclear. Using EEG-based neural encoding models built on temporal response functions, we investigated how 7-month-old human infants track maternal vs. unfamiliar speech and whether this affects simultaneous face processing. Infants (13 boys, 12 girls) showed stronger neural tracking of their mother’s voice, independent of acoustic properties, suggesting an early neural signature of voice familiarity. Furthermore, central encoding of unfamiliar faces was diminished when infants heard their mother’s voice, and face-tracking accuracy at central electrodes increased with earlier occipital face tracking, suggesting heightened attentional engagement. However, we found no evidence for differential processing of happy vs. fearful faces, contrasting previous findings on early emotion discrimination. Our results reveal interactive effects of voice familiarity on multimodal processing in infancy: while maternal speech enhances neural tracking, it may also alter how other social cues, such as faces, are processed. The findings suggest that early auditory experiences shape how infants allocate cognitive resources to social stimuli, emphasizing the need to consider cross-modal influences in early development.
Significance statement
How infants continuously process familiar social signals, and how this familiarity shapes perception, has remained unknown. Here we demonstrate that infants’ brains preferentially track maternal speech over unfamiliar voices, highlighting the early tuning of the auditory system to socially relevant signals. Moreover, the maternal voice modulates the neural encoding of concurrently presented faces without eliciting emotion-specific differences. These findings underscore the role of caregiver signals in shaping multisensory integration during early development.
Footnotes
The authors declare no competing financial interests.
The authors are grateful to the families for their participation. We also thank the German Research Foundation (DFG) for funding to SJ (JE781/3-1). Generative artificial intelligence (DeepL by DeepL; ChatGPT by OpenAI) was used for proofreading; the authors retain full responsibility for the content.