People interact with other people and with objects in distinct and categorizable ways (e.g., kicking is making contact with the foot). We can recognize these action categories across variations in actors, objects, and settings; moreover, we can recognize them from both dynamic and static visual input. However, it is unclear which neural systems support action recognition across these perceptual differences. Here we used multivoxel pattern analysis of fMRI data to identify brain regions that support visual action categorization in a format-independent way. Human participants were scanned while viewing eight categories of interactions (e.g., pulling) depicted in two visual formats: (1) controlled videos of two interacting actors; and (2) still images selected from the internet, involving different actors, objects, and scene contexts. Action category was decodable across visual formats in bilateral inferior parietal, bilateral occipitotemporal, left premotor, and left middle frontal cortex. In most of these regions, the similarity space of action categories was consistent across subjects and visual formats, a property that can contribute to a common understanding of events among individuals. These results suggest that the identified brain regions support action category codes that are crucial for action recognition and action understanding.
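To make the cross-format decoding logic concrete, the following is a minimal, hypothetical sketch in Python (using NumPy and scikit-learn); it is not the authors' actual analysis pipeline, and all variable names and array sizes are illustrative assumptions. The idea is that a classifier trained on multivoxel patterns evoked by one visual format is tested on patterns evoked by the other, and above-chance accuracy indicates a format-independent action category code.

    # Illustrative sketch only (not the authors' pipeline): cross-format
    # decoding of action category from multivoxel response patterns.
    # Assumes per-trial patterns have already been estimated for each format.
    import numpy as np
    from sklearn.svm import LinearSVC

    rng = np.random.default_rng(0)
    n_trials, n_voxels = 80, 200                         # hypothetical sizes
    X_video = rng.standard_normal((n_trials, n_voxels))  # video-trial patterns
    X_image = rng.standard_normal((n_trials, n_voxels))  # still-image patterns
    y = np.repeat(np.arange(8), n_trials // 8)           # 8 action categories

    # Train on one format, test on the other; averaging the two directions
    # gives a symmetric estimate of cross-format decoding accuracy.
    acc_v2i = LinearSVC(max_iter=10000).fit(X_video, y).score(X_image, y)
    acc_i2v = LinearSVC(max_iter=10000).fit(X_image, y).score(X_video, y)
    print("cross-format accuracy:", (acc_v2i + acc_i2v) / 2)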
Humans tend to interpret the observed actions of others in terms of categories that are invariant to incidental features: whether a girl pushes a boy or a button, and whether we see it in real time or in a single snapshot, it is still pushing. Here we investigate the brain systems that facilitate the visual recognition of these action categories across such differences. Using fMRI, we identify several areas of parietal, occipitotemporal, and frontal cortex that exhibit action category codes that are similar across viewing of dynamic videos and still photographs. Our results provide strong evidence for the involvement of these brain regions in recognizing the way that people physically interact with objects and other people.
The authors declare no competing financial interests.
This work was supported by the Center for Functional Neuroimaging at the University of Pennsylvania and by an NSF Integrative Graduate Education and Research Traineeship, an NSF Graduate Research Fellowship, and NIH Vision Training Grant 2T32EY007035-36 (to A.H.). We thank Rachel Olvera and Jennifer Deng for assistance with stimulus collection, Jack Ryan and Stamati Liapis for assistance with data collection, and Michael Bonner, Steven Marchette, and Anjan Chatterjee for helpful comments on an earlier version of this manuscript. We also thank the two actors who appeared in the videos.