Abstract
The perception of our limbs in space is built upon the integration of visual, tactile, and proprioceptive signals. Accumulating evidence suggests that these signals are combined in areas of premotor, parietal, and cerebellar cortices. However, it remains to be determined whether neuronal populations in these areas integrate hand signals according to basic temporal and spatial congruence principles of multisensory integration. Here, we developed a setup based on advanced 3D video technology that allowed us to manipulate the spatiotemporal relationships of visuotactile (VT) stimuli delivered on a healthy human participant's real hand during fMRI and investigate the ensuing neural and perceptual correlates. Our experiments revealed two novel findings. First, we found responses in premotor, parietal, and cerebellar regions that were dependent upon the spatial and temporal congruence of VT stimuli. This multisensory integration effect required a simultaneous match between the seen and felt postures of the hand, which suggests that congruent visuoproprioceptive signals from the upper limb are essential for successful VT integration. Second, we observed that multisensory conflicts significantly disrupted the default feeling of ownership of the seen real limb, as indexed by complementary subjective, psychophysiological, and BOLD measures. The degree to which self-attribution was impaired could be predicted from the attenuation of neural responses in key multisensory areas. These results elucidate the neural bases of the integration of multisensory hand signals according to basic spatiotemporal principles and demonstrate that the disintegration of these signals leads to “disownership” of the seen real hand.
Introduction
We conduct our daily activities while paying little attention to the plethora of sensory signals that continuously reach our brain from the body and its surrounding space. The brain constantly combines information from multiple sensory channels to optimize behavior and construct stable representations of the environment (Ernst and Bülthoff, 2004; Stein and Stanford, 2008) and the body (Graziano and Botvinick, 2002; van den Bos and Jeannerod, 2002). The maintenance of an updated representation of the body is an essential prerequisite for goal-directed or defensive interactions with the external world (Jeannerod et al., 1995) and the sense of bodily self (Gallagher, 2006; Ehrsson, 2007; Blanke, 2012).
Research on the neural bases of the integration of bodily signals began with the characterization of neurons in the posterior parietal and premotor cortices of nonhuman primates. These neurons respond to both tactile and visual stimulation of a body part (Rizzolatti et al., 1981a,b; Graziano et al., 1997; Duhamel et al., 1998) and are modulated by proprioceptive inputs (Andersen, 1997; Graziano, 1999; Graziano et al., 2000). The integration of visual, tactile, and proprioceptive signals is governed by principles of temporal and spatial congruence (Meredith and Stein, 1986; Avillac et al., 2007), which constrain the selection of the signals that are to be combined. In humans, fMRI studies have identified a set of fronto-parieto-cerebellar regions with activation patterns that suggest that they integrate bodily signals across sensory channels (Bremmer et al., 2001; Lloyd et al., 2003; Makin et al., 2007; Beauchamp et al., 2010; Gentile et al., 2011). However, it remains to be investigated whether the integration of visual, tactile, and proprioceptive signals from the upper limb follows the basic spatiotemporal principles of multisensory integration.
Models of bodily self-perception include multisensory integration as the key mechanism underlying the self-attribution of limbs (Botvinick and Cohen, 1998; Tsakiris, 2010; Ehrsson, 2012). However, neuroimaging studies of the neural bases of the feeling of limb ownership have relied upon the use of a perceptual illusion: the “rubber hand” illusion (Ehrsson et al., 2004, 2005; Tsakiris et al., 2007). This approach constitutes a major limitation because the use of illusions rests on two untested assumptions: that the same mechanisms that mediate illusory percepts underlie the “default” self-perception of one's real body, and that the observed neuronal activations are not confounded by nonspecific cognitive effects arising when unfamiliar bodily sensations are experienced. Moreover, no previous neuroimaging study has tested the prediction that the disintegration of multisensory signals from one's real hand will result in a loss of the default feeling of limb ownership (Newport and Gilpin, 2011).
Here, we developed a setup based on 3D video technology that allowed us to manipulate the congruence of visual, tactile, and proprioceptive stimuli on a participant's real hand during fMRI. In three separate experiments, we first describe how BOLD responses in multisensory areas in the premotor, parietal, and cerebellar cortices, as well as the effective connectivity between these regions, obey spatiotemporal principles of multisensory integration. A control experiment excluded the possibility that these results can be explained by differences in visuospatial attention. Finally, we revealed a link between the disintegration of multisensory hand signals and losses in default limb self-attribution, as indexed by converging subjective, psychophysiological, and threat-evoked BOLD measures.
Materials and Methods
Participants
Thirty healthy volunteers participated in the study. According to self-reports, all of the participants were right-handed except for one left-handed volunteer. A total of 15 participants (ages 21–33 years, mean 27 ± 3 years, 11 males) were recruited for Experiment 1. A total of 15 participants (ages 22–33 years, mean 26 ± 3 years, 10 males) participated in both Experiments 2 and 3 (5 of whom had taken part in Experiment 1). The participants had normal or corrected-to-normal vision and had no history of neurological or sensory disorders. Informed consent was obtained from all participants before the experimental sessions. The Regional Ethical Review Board of Stockholm approved the study.
Recording of participant-specific stereoscopic visual stimuli
An important methodological aim of the present study was to tailor the visual stimuli for each individual participant, thereby creating a video interface that allowed us to manipulate the congruence of multisensory stimuli on the participant's own real hand. This aspect represents an important methodological difference compared with previous studies that have used artificial hands or mannequins in conjunction with perceptual illusions to investigate the role of multisensory integration in bodily self-perception (Botvinick and Cohen, 1998; Ehrsson et al., 2004, 2007; Lenggenhager et al., 2007; Tsakiris et al., 2007; Petkova and Ehrsson, 2008; Barnsley et al., 2011; Petkova et al., 2011). Before the scanning sessions, the participants were asked to lie down on a bed in a position that matched the position that they would later take inside the MRI scanner (Fig. 1A). The participant's right hand was placed on a table with an adjustable slope (42 × 35 cm). The right hand was always located within the right hemispace and never crossed the body midline. The tilt angles of the table and the arm and the location of the hand on the table's surface were documented to allow full reproducibility inside the scanner. While the participants kept their eyes closed, two identical cameras (CamOne Infinity HD, resolution 1920 × 1080, Touratech AG, Germany) were placed just above their eyes at a distance of ∼9 cm from each other. The experimenter used a 3D object (a red sphere made of a soft material, ∼2 cm in diameter, attached to a 50-cm-long thin wooden stick; Brozzoli et al., 2011, 2012; see also Gentile et al., 2011) to deliver tactile stimuli to the participant's right hand while receiving specific audio cues to control the stimulus timing. The audio cues were designed to ensure that the experimenter was blind to the presentation order of the different experimental conditions. For details on the different conditions, see the corresponding section for each experiment below. The recordings that were obtained from the two cameras were imported into a computer running Final Cut Pro 7 (Apple), and the two video streams were synchronized using a frame-by-frame technique. The recordings from the left and right cameras, which corresponded to the left and right eye, respectively, were arranged side-by-side in a single frame with a size of 1600 × 600 pixels. A metronome track matching the stimulus timing of each experimental condition was added to the video to help the experimenter synchronize the visual and tactile stimuli. Using a frame-sequential technique implemented with custom-made hardware and software, the visual stimuli were transmitted to the MR-compatible head-mounted displays during the imaging sessions (see below), yielding true stereoscopic, high-quality visual stimuli that featured the participant's own right hand (Fig. 1B).
Validation of the participant-specific experimental setup
Before the fMRI acquisition sessions, the participants were interviewed about the quality of the stereoscopic images that featured their own right hand. All participants reported that they could clearly and unambiguously visually recognize their real right hand through the head-mounted displays. Because the seen and felt locations of the hand matched, all of the participants reported that they “were just looking directly at their hand.” Therefore, all participants reported a strong “default” feeling of ownership of the seen 3D video image of their real right hand. Moreover, pilot experiments on two of the participants validated the setup's sensitivity for detecting BOLD responses to synchronous visuotactile (VT) stimuli on the participants' own hand, analogous to a previous study describing BOLD responses to the VT stimulation of one's hand in direct full view without head-mounted displays (Gentile et al., 2011). In these two pilot experiments, we measured the effect of congruent VT stimulation of the hand (relative to baseline in a block design) in separate runs in which the participants saw their hand either through the “see-through” head-mounted displays or without them (direct vision). In both pilot datasets, the pattern of the brain responses to congruent VT stimuli in key premotor and posterior parietal areas reproduced previously described activation patterns obtained under direct viewing conditions, thus confirming the ecological validity of the present setup (Gentile et al., 2011; data not shown).
Experimental setup
In all three experiments, the participants lay comfortably in a supine position on the bed of the MRI scanner. Each participant's head was propped up (≈30°) using a custom-made wooden wedge and additional foam pads to allow him/her to look directly into a pair of MR-compatible head-mounted displays (Nordic Neuro Laboratory; FOV 30° horizontal × 23° vertical; resolution 800 × 600) that were positioned in front of their eyes (Fig. 1A). The participants' right hand was placed on the same table used for the video recordings, which was mounted on the bed above the participants' waist. Great care was taken to ensure that the participant's position in the scanner matched the position in which the 3D videos were recorded: specifically, the angle at which the table was tilted and the position of the participant's hand on the table could easily be reproduced inside the scanner, thereby ensuring a match between the seen and felt positions of the hand. The same small spherical object was used to deliver tactile stimuli to the participant's right hand in all experiments. An MR-compatible camera (acquisition frequency 60 Hz; MRC Systems) mounted in proximity to the left head-mounted display was used to record eye movements during all experimental sessions. All participants could maintain appropriate fixation throughout the experiment and no sessions were discarded because of an inability to fixate. To ensure that the timing between the visual stimuli and the manually delivered tactile stimuli conformed to the intended experimental manipulations, the same experimenter performed pilot experiments on four participants before Experiment 1 (see below) using a custom-made instrumented probe to deliver the tactile stimuli. The probe contained a simple mechanical sensor that allowed the logging of the onset and offset time points of each individual stroking movement. Therefore, the timing error between the onsets of the visual (taken as the first frame showing the object making contact with the hand) and tactile (as recorded by the sensor) counterparts of each individual stimulus could be computed for all conditions in Experiment 1 (see below for details). The average timing error was 66.7 ± 0.1 ms for the synchronous conditions, 69.6 ± 0.2 ms for the spatially incongruent condition, and 68.1 ± 0.1 ms for the temporally incongruent condition. The timing errors did not differ significantly across conditions or experimental sessions for any of the four pilot participants. After the main experimental sessions, the participants were openly questioned about their perception of the different conditions and reported no perceivable mismatch between the visual and tactile stimuli during the temporally synchronous multisensory events.
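For illustration, the per-stroke timing-error computation reduces to the following minimal Python sketch (the variable names and log format are hypothetical; the actual logging was handled by the custom-made probe hardware and software):

```python
import numpy as np

def timing_error_stats(visual_onsets, tactile_onsets):
    """Mean and SD (ms) of per-stroke visuotactile onset differences.

    visual_onsets: time (s) of the first video frame showing the object
        contacting the hand, one entry per stroke.
    tactile_onsets: contact times (s) logged by the probe's mechanical
        sensor, one entry per stroke.
    """
    errors_ms = 1000.0 * np.abs(np.asarray(visual_onsets)
                                - np.asarray(tactile_onsets))
    return errors_ms.mean(), errors_ms.std(ddof=1)

# Example: six strokes with onsets 2.5 s apart and a ~67 ms offset,
# comparable to the values reported for the pilot data
visual = np.arange(6) * 2.5
tactile = visual - 0.067
print(timing_error_stats(visual, tactile))
```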
Experiment 1: Temporal and spatial congruence enhances VT integration
In Experiment 1, we tested the hypothesis that the integration of visual and tactile signals from the hand depends on their spatial and temporal congruence. Therefore, the key experimental manipulations involved changes in the temporal or spatial congruence of VT stimuli delivered on the participant's own right hand. In particular, we opted for a categorical comparison of fully congruent versus fully incongruent VT stimulation conditions in an efficient block design. This allowed us to compare situations in which strong perceptual binding of visual and somatic signals occurred with situations in which such binding did not occur (or only minimally so) in otherwise equivalent experimental conditions. Individual VT stimuli comprised 1 s strokes applied to the hand and digits with a small spherical object made of a soft material (see above). Two parts of the hand were chosen as stimulation target zones: the right index finger and the back of the right hand. Stimuli to the right index finger were applied starting from the proximal phalanx and stopping at the distal phalanx in proximity to the fingertip (Fig. 1C). Stimuli to the back of the hand were delivered starting laterally and ending medially at an ∼90° orientation with respect to the stimulus trajectory on the index finger (Fig. 1C). The temporal and spatial congruence of the VT stimuli was manipulated for both parts of the hand, which resulted in a total of six experimental conditions of interest in a block design in which each block lasted for a period of 16 s (Fig. 1C). Stimulation blocks contained a total of six visual stroking movements and six tactile stroking movements. Each stroking movement lasted 1 s. In VT Congruent blocks, the visual stroking movement fully matched the tactile stroking movement in time and location for either the index finger or the back of the hand, yielding a 1 s congruent VT stroking movement. The onset of each stroking movement was 1.5 s after the offset of the preceding movement; in other words, the onsets of consecutive VT stroking movements within each block were separated by 2.5 s. VT Time Incongruent blocks also contained six 1 s visual stroking movements and six 1 s tactile stroking movements. The visual and tactile stroking movements of each pair of VT stroking movements were matched in terms of spatial location (index finger or back of the hand). However, a delay of 1.25 s was introduced between the onsets of the visual and tactile stroking movements of each pair. Therefore, a time interval of 250 ms separated the offset of the visual stroking movement and the onset of the tactile stroking movement. Consistent with earlier behavioral and neuroimaging investigations, this temporal gap between the seen and felt strokes was chosen to ensure that no (or only minimal) perceptual binding of the visual and tactile stimuli occurred in this condition (Ehrsson et al., 2004; Shimada et al., 2005, 2009). The time interval occurring between the onsets of consecutive visual (or, equivalently, tactile) stroking movements was 2.5 s. This resulted in a temporally asynchronous succession of nonoverlapping visual and tactile stroking movements on the corresponding part of the hand. VT Space Incongruent blocks also contained six visual stroking movements and six tactile stroking movements. The visual and tactile stroking movements were temporally congruent, but their spatial locations on the hand differed (visual: index finger, tactile: back of the hand or visual: back of the hand, tactile: index finger; Fig. 1C).
Extensive pilot experiments confirmed that both the VT Time Incongruent and VT Space Incongruent conditions strongly minimized or eliminated the perceptual binding of visual and tactile stimuli on the hand. In addition, all of the experimental conditions were fully matched in terms of the total number of seen and felt strokes; the timing structure of the three block types is summarized in the sketch below.
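The following Python sketch makes the within-block timing concrete (condition and site labels are illustrative; the actual stimuli were delivered manually under metronome guidance, and onsets are expressed relative to the first stroke of the block):

```python
import numpy as np

STROKE = 1.0   # stroke duration (s)
SOA = 2.5      # onset-to-onset interval between consecutive strokes (s)
DELAY = 1.25   # visuotactile onset asynchrony in VT Time Incongruent (s)

def block_schedule(condition, site="index"):
    """Visual/tactile stroke onsets and sites within one block (sketch).

    Returns (visual_onsets, visual_site, tactile_onsets, tactile_site);
    'condition' is 'congruent', 'time_incongruent', or
    'space_incongruent'; 'site' is 'index' or 'dorsum'.
    """
    other = "dorsum" if site == "index" else "index"
    visual = np.arange(6) * SOA                 # six 1 s strokes per block
    if condition == "congruent":
        return visual, site, visual, site       # matched in time and place
    if condition == "time_incongruent":
        # 1.25 s onset lag -> 250 ms gap between visual offset (onset +
        # STROKE) and tactile onset, so the strokes never overlap
        return visual, site, visual + DELAY, site
    if condition == "space_incongruent":
        return visual, site, visual, other      # synchronous, wrong site
    raise ValueError(condition)
```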
Each block started with a 1 s interval in which the participant's right hand was visible but no stimuli were applied. Consecutive blocks were separated by a 9 s baseline interval in which a black screen was shown with a fixation target located at the position where the hand would appear during the trial. Each of the six conditions was repeated five times per session in three acquisition sessions, resulting in a total of 15 repetitions per condition. The order of the different conditions was randomized throughout the experiment. The participants were instructed to keep their gaze on the fixation cross during the baseline or on the corresponding location (the position of the hand) during the VT blocks. To further monitor the participants' alertness, we included two catch trials in each of the three acquisition sessions. In both of the catch trials, one of the six individual stroking stimuli was replaced by a stationary (non-moving) stimulus either in its visual (one visual catch trial) or in its tactile (one tactile catch trial) component. The participants were instructed to press a button with their left hand as soon as they detected the occurrence of a catch stimulus. By assigning the catch trial to either the visual or the tactile component of the VT stimuli, we ensured that the participants would not exclusively pay attention to one of the sensory modalities of interest (i.e., only vision or only touch). The catch trials were modeled as regressors of no interest and were discarded from all further analyses.
Experiment 2: VP congruence and changes in hand self-perception
Integration of visual, tactile, and proprioceptive signals from the hand
In this experiment, we tested the hypothesis that the neural and perceptual integration of visual and tactile signals from the hand requires a concurrent match between the seen and felt positions of the hand. Moreover, we sought to examine the link between the BOLD signatures of multisensory integration and the default perception that the real hand in view is one's own. The overall procedure involved the preparation of tailor-made visual stimuli of each participant's own body and hand; the experimental setup and the number and timing of the stimuli were identical to those described in Experiment 1. To avoid extending the total duration of the scanning sessions, we included VT stimuli on only one of the two parts of the hand that was stimulated in Experiment 1, namely the right index finger. Furthermore, we selected only one of the congruence manipulations, specifically the temporal aspect of the VT stimuli. This decision was further warranted by the detailed analysis of the data from Experiment 1, which revealed no significant differences between the two target zones on the hand in terms of sensitivity to spatiotemporal multisensory congruence (Fig. 3). We implemented a 2 × 2 factorial design in which we independently manipulated the congruence of the VT stimuli (congruent vs temporally incongruent) and the posture of the hand (hand on the table: visuoproprioceptive [VP] match; hand retracted: VP mismatch). This arrangement resulted in a total of four experimental conditions of interest (Fig. 1D). Two of these conditions were identical to those containing congruent or temporally incongruent VT stimuli on the right index finger used in Experiment 1, which allowed internal validation of the findings from the first experiment. The two remaining conditions differed only in terms of the posture of the participant's own right hand in the scanner. Although the visual and tactile stimuli remained identical, in two of the four acquisition sessions, the right hand was retracted from the table, placed on the participant's chest, and rotated by ∼90° within the right hemispace in proximity to the body's midline. This postural manipulation introduced a clear mismatch between the visual and proprioceptive hand signals. The order of the acquisition sessions for the two positions of the hand was counterbalanced across the participants. Each condition was repeated a total of eight times in each session. The participants were instructed to perform the same tasks as in Experiment 1, comprising continuous fixation throughout the acquisition sessions and the detection of two catch trials per session (see Experiment 1 above for details).
Quantifying self-perception of the hand
Our experimental setup allowed us to relate BOLD responses reflecting different levels of congruence between visual, tactile, and proprioceptive signals directly with changes in the multisensory perception of the hand in view as one's own. The degree of self-attribution of the seen hand was quantified by objective (BOLD and skin conductance responses during fMRI) and subjective (postscan questionnaire) measures, as described in detail below.
Recording of skin conductance responses during fMRI
Previous studies of the feeling of body ownership that have used perceptual illusions have described how the skin conductance responses (SCRs) evoked by physical threats applied to artificial limbs or mannequins can be used as objective physiological evidence of self-attribution (Armel and Ramachandran, 2003; Ehrsson, 2007; Petkova and Ehrsson, 2008; Guterstam et al., 2011; Newport and Gilpin, 2011; Guterstam and Ehrsson, 2012). Similarly, a previous fMRI study described threat-evoked BOLD responses in key areas related to pain anticipation and anxiety when a sharp object approached the participants' real hand or an artificial hand during the rubber hand illusion (Ehrsson et al., 2007). Here, instead of using perceptual illusions to transfer the feeling of ownership over an artificial limb, we experimentally altered the congruence of multiple sensory signals to probe the maintenance of self-attribution of one's real hand in the context of incongruent multisensory stimulation. We concurrently recorded BOLD responses and SCRs evoked by threats to the participant's own right hand after periods of exposure to multisensory stimuli under different levels of congruence between visual, tactile, and proprioceptive signals. To accomplish this goal, a randomly selected 50% of all of the trials for each of the four main experimental conditions (see above) were followed by a 2 s threat stimulus. The latter featured a kitchen knife appearing in the field of view of the head-mounted displays and sliding swiftly just above the participant's own right hand (Fig. 8A; adapted from Guterstam et al., 2011). To record the physiological skin conductance response to the threat stimuli, we used an MR-compatible SCR-recording module (Brain Products). Before the onset of the acquisition sessions, two electrodes were attached to the index and middle fingers of the participant's left hand using electrode gel. The electrodes were connected to an MR-compatible amplifier (BrainAmp ExG MR; Brain Products), and continuous recordings for each session were collected using a computer running the BrainVision Recorder (acquisition sampling rate 5000 Hz). All recordings were stored and imported into MATLAB (MathWorks) for further offline analysis. For each threat event, we identified the maximal and minimal values of the SCR within a 5 s temporal window that was aligned to the event onset. For all trials, the event-specific SCR amplitude was then calculated as the difference between the maximal and minimal values, and an average value was computed for each participant and condition (Dawson et al., 1990; Ehrsson, 2007; Petkova and Ehrsson, 2008; Guterstam and Ehrsson, 2012). For technical reasons, SCR data could not be recorded for one participant. Therefore, all further analyses featuring SCR data included 14 participants. The data passed the Kolmogorov–Smirnov test for normality and were analyzed using a 2 × 2 repeated-measures ANOVA in SPSS version 20 software.
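The event-related SCR amplitude computation described above amounts to the following minimal sketch (variable and function names are hypothetical; the actual analysis was performed offline in MATLAB):

```python
import numpy as np

FS = 5000  # acquisition sampling rate (Hz)

def scr_amplitudes(scr, threat_onsets, fs=FS, window_s=5.0):
    """Mean event-specific SCR amplitude for one condition (sketch).

    For each threat event, the amplitude is the maximum minus the
    minimum of the continuous skin conductance trace within a 5 s
    window aligned to the event onset; the per-condition score is the
    mean over events. 'scr' is the continuous recording as a numpy
    array; 'threat_onsets' are event onset times in seconds.
    """
    amps = []
    for onset in threat_onsets:
        i0 = int(onset * fs)
        win = scr[i0:i0 + int(window_s * fs)]
        amps.append(win.max() - win.min())
    return float(np.mean(amps))
```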
Postscan questionnaire
Immediately after the acquisition sessions, the participants were presented with one additional repetition of each of the four experimental conditions of interest in counterbalanced order. At the end of each repetition, the participants were asked to rate five statements on a scale that ranged from 0 (“I completely disagree with the statement”) to 10 (“I fully agree with the statement”). The statements were presented via the head-mounted displays and the experimenters logged the verbally reported subjective ratings. The first two statements were designed to probe the multisensory perception of the hand in view as one's own single hand (S1: “It felt like my own real hand was located where I saw it”; please note that the visual stimulus always comprised the participant's own right hand) and the perceptual binding of the visual and tactile events on one's hand (S2: “It felt as if the touch I experienced was directly caused by the object I saw”). The third statement directly addressed the hypothesis that increasing the incongruence of the multisensory signals from the upper limb would impair limb self-perception in such a way that the participants would attribute the visual image of their hand to someone else, which is compatible with a loss of ownership (S3: “It felt as if I was looking at somebody else's hand”). The last two statements served as controls for the task demands and suggestibility (S4: “When I saw the objects, I had the sensation that my hand was numb”; S5: “I did not know exactly where my hand was located”). All of the subjective ratings were tested for normality using the Kolmogorov–Smirnov test in SPSS version 20 software. Because all of the variables passed the test for normality, we used parametric repeated-measures ANOVAs to test for interaction effects among the four conditions for each individual statement as well as paired sample t tests predicated on the a priori hypotheses. For simplicity, two-tailed tests were used in all cases, although our hypotheses were always related to the directionality of the effects. The α value was set at 5%, and Bonferroni correction for multiple comparisons was applied when necessary.
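As an illustration of this statistical procedure (which was run in SPSS), the normality check and the Bonferroni-corrected paired comparisons for a single statement could be sketched in Python as follows (the array layout and function names are assumptions):

```python
import numpy as np
from scipy import stats

def analyze_statement(ratings):
    """Normality check plus a priori paired t tests (sketch).

    'ratings' is an (n_participants x 4) array holding one statement's
    ratings across the four conditions (column 0 = fully congruent).
    """
    # Kolmogorov-Smirnov test of each condition against a fitted normal
    for cond in range(ratings.shape[1]):
        x = ratings[:, cond]
        z = (x - x.mean()) / x.std(ddof=1)
        stat, p = stats.kstest(z, "norm")
        print(f"condition {cond}: KS p = {p:.3f}")
    # two-tailed paired t tests vs the congruent condition,
    # Bonferroni-corrected for the number of comparisons
    pairs = [(0, 1), (0, 2), (0, 3)]
    for a, b in pairs:
        t, p = stats.ttest_rel(ratings[:, a], ratings[:, b])
        print(f"{a} vs {b}: t = {t:.2f}, "
              f"corrected p = {min(1.0, p * len(pairs)):.3f}")
```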
Experiment 3: Multisensory integration versus endogenous visuospatial attention
In this control experiment, we tested the hypothesis that the responses to multisensory congruent VT stimuli on the hand observed in Experiments 1 and 2 could be detected even in the context of an explicit manipulation of the participants' endogenous visuospatial attention. Such a finding would eliminate the possibility that the parietal and premotor responses to multisensory VT stimuli that were measured in the present study, as well as in previously published work (Gentile et al., 2011), could be entirely accounted for by differential levels of endogenous visuospatial attention between congruent and incongruent multisensory stimuli (Corbetta and Shulman, 2002; Talsma et al., 2010). To accomplish this goal, we adapted a visuospatial attention task that was used in previous investigations of multisensory modulations of brain responses across different levels of endogenous attention (Zimmer and Macaluso, 2007). For simplicity, we included only two of the experimental conditions used in Experiments 1 and 2, namely VT congruent (temporal, spatial, and VP match) and VT temporally incongruent (spatial and VP, but not temporal, match) stimuli. The overall procedures, the number of stimuli, and their timing within each trial were identical to those used in the main experiments (see above). Regardless of the congruence of the multisensory stimuli, participants were instructed to maintain their gaze on a semitransparent fixation cross placed close to their visible right hand. While maintaining their gaze on the cross, participants directed the focus of their endogenous visuospatial attention to a small circle containing black and white grating lines (the circle's diameter subtended an angle of approximately 3° in the visual field of view of the head-mounted displays; Fig. 9A). The orientation of the grating lines changed randomly every 2 s, assuming 1 of 9 possible orientations (angles in degrees: −45, −33.75, −22.5, −11.25, 0, 11.25, 22.5, 33.75, 45). The participants were instructed to press a button with their left hand as soon as they detected the grating lines in the 0 degrees (i.e., vertical) orientation. Responses and reaction times were recorded using a computer running Presentation (Neurobehavioral Systems). Each of the two experimental conditions was repeated nine times in a single acquisition session.
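The logic of the attention task can be illustrated with the following sketch (the function names and scoring are illustrative; stimulus delivery and response logging were actually handled by Presentation):

```python
import random

ANGLES = [-45, -33.75, -22.5, -11.25, 0, 11.25, 22.5, 33.75, 45]

def grating_sequence(n_changes, seed=0):
    """Orientation sequence for the attention task: a new orientation
    is drawn from the nine possible angles every 2 s."""
    rng = random.Random(seed)
    return [rng.choice(ANGLES) for _ in range(n_changes)]

def hit_rate(sequence, pressed):
    """'pressed' is a parallel list of booleans (button press within
    the 2 s slot); only the vertical (0 degree) grating is a target."""
    targets = [i for i, ang in enumerate(sequence) if ang == 0]
    if not targets:
        return float("nan")
    return sum(pressed[i] for i in targets) / len(targets)
```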
FMRI data acquisition
FMRI acquisition was performed using a Siemens TIM Trio 3T scanner equipped with a 12-channel head coil. Gradient echo T2*-weighted EPIs with BOLD contrast were used as an index of brain activity (Logothetis et al., 2001). A functional image volume was composed of 40 continuous near-axial slices of 3 mm thickness (with a 0.1 mm interslice gap), which ensured that the whole brain was within the FOV (58 × 76 matrix, 3.0 mm × 3.0 mm in-plane resolution, TE = 40 ms). One complete volume was collected every 2.54 s (TR = 2540 ms). A total of 900 functional volumes were collected for each participant in Experiment 1, divided equally across three sessions. A total of 660 volumes were acquired for Experiment 2, which comprised four sessions of equal length. Finally, 165 volumes were collected in Experiment 3, which comprised a single session. An initial baseline of 15 s and a final baseline of 15 s were included in each session for all experiments. The first three volumes of each session were discarded to account for non-steady-state magnetization. To facilitate the anatomical localization of statistically significant activations, a high-resolution structural image was acquired for each participant at the end of the experiment (3D MPRAGE sequence, voxel size = 1 mm × 1 mm × 1 mm, FOV = 250 mm × 250 mm, 176 slices, TR = 1900 ms, TE = 2.27 ms, flip angle = 9°).
Data preprocessing, modeling, and statistical inference
All fMRI data were screened for potential artifacts using ART (Massachusetts Institute of Technology). No noteworthy artifacts were detected in any of the acquired datasets. The functional imaging data from all three experiments underwent the same series of preprocessing steps using SPM8 (Wellcome Trust Center for Neuroimaging) before all successive analyses. The functional volumes were motion corrected with respect to the first volume of each series, corrected for slice-timing errors, and coregistered to the high-resolution structural image. The latter was segmented into gray matter, white matter, and CSF partitions and was normalized to the MNI standard space. The same transformation was then applied to all functional images, which were resliced to a resolution of 2 mm × 2 mm × 2 mm and spatially smoothed with an 8 mm FWHM Gaussian kernel.
For each experiment, we fitted a general linear model (GLM) to the data for each individual participant. We defined boxcar regressors for the conditions of interest (see below for experiment-specific details) and convolved them with the standard hemodynamic response function modeled in SPM8. Linear contrasts of interest were defined for each participant as appropriate combinations of the model parameters and exported to a second-level random-effects analysis. Given our strong a priori hypotheses on the anatomical localization of the multisensory brain regions, we applied a correction for multiple comparisons in all statistical tests within regions of interest defined around peaks from a previous study (Gentile et al., 2011). Therefore, unless otherwise specified, all reported peaks of activation are statistically significant at a threshold of p < 0.05 after correction for multiple comparisons using familywise error corrections within the corresponding volumes of interest. For visualization purposes only, all activation maps are displayed at a threshold of p < 0.001 (uncorrected) and are overlaid onto a representative inflated cortical surface using Freesurfer (MGH) as well as onto axial, sagittal, or coronal sections from the average anatomical image for all participants in the study. All anatomical localizations of the significant peaks of activation refer to this average anatomical image, with the nomenclature from the human brain atlas of Duvernoy and Parratte (1999). Contrast estimates for all significant peaks of activation were extracted using MATLAB (MathWorks) and displayed as bar charts together with the corresponding SEs.
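Schematically, the first-level model amounts to convolving condition boxcars with a canonical hemodynamic response function and fitting by least squares. The following numpy sketch illustrates this logic (it is not SPM8's implementation; the double-gamma HRF below is only a generic approximation of SPM's canonical shape, and onsets are rounded to whole scans):

```python
import numpy as np
from scipy.stats import gamma

TR = 2.54  # repetition time (s)

def canonical_hrf(tr, duration=32.0):
    """Double-gamma HRF (peak ~6 s, undershoot ~16 s); an
    approximation of the canonical shape, not SPM's exact code."""
    t = np.arange(0, duration, tr)
    return gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0

def boxcar(onsets, block_dur, n_scans, tr):
    """Boxcar regressor for one condition, sampled at the TR."""
    x = np.zeros(n_scans)
    for onset in onsets:
        x[int(onset / tr):int((onset + block_dur) / tr)] = 1.0
    return x

def fit_glm(Y, onsets_per_condition, block_dur, tr=TR):
    """Least-squares GLM fit; Y is an (n_scans x n_voxels) array.
    Returns one row of beta estimates per regressor; a linear
    contrast estimate is then simply c @ beta."""
    n_scans = Y.shape[0]
    X = np.column_stack(
        [np.convolve(boxcar(o, block_dur, n_scans, tr),
                     canonical_hrf(tr))[:n_scans]
         for o in onsets_per_condition]
        + [np.ones(n_scans)])          # constant term
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return beta
```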
FMRI regional analyses
In Experiment 1, we aimed to identify brain regions in which BOLD responses to VT stimuli are enhanced by the temporal and spatial congruence of the unisensory components. We defined two linear contrasts, VT Congruent vs VT Time Incongruent and VT Congruent vs VT Space Incongruent, by pooling data from the trials that involved both the index finger and the back of the hand. These two contrasts are fully matched in terms of all visual and tactile inputs across the two locations on the hand. Therefore, significant voxels obtained from the above contrasts are taken to contain neuronal populations that specifically integrate visual and tactile stimuli from the hand that are congruent in time and colocalized on the same part of the hand. Significant peaks of activation, corrected for multiple comparisons, were obtained from both contrasts. To combine the activation maps from the two contrasts and display them as a single map, we used an inclusive masking procedure. One of the two contrasts, VT Congruent vs VT Space Incongruent, served as an inclusive mask for the other contrast, VT Congruent vs VT Time Incongruent. Please note that using one of the contrasts as an inclusive mask for the other contrast has no effect on the validity of the statistical methods applied and is not in any way circular. Swapping the role of the two contrasts, for example, using VT Congruent vs VT Time Incongruent as an inclusive mask for VT Congruent vs VT Space Incongruent, yielded the same significant peaks of activation, as expected.
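Conceptually, inclusive masking is a voxelwise conjunction of two thresholded statistical maps, as in this minimal sketch (array-based, with hypothetical threshold arguments):

```python
import numpy as np

def inclusive_mask(t_map, mask_t_map, t_thresh, mask_thresh):
    """Inclusive masking of one contrast map by another (sketch).

    A voxel survives only if it exceeds 't_thresh' in the contrast of
    interest AND 'mask_thresh' in the masking contrast. Because the
    two contrasts compare VT Congruent against two different
    incongruent conditions, the procedure is not circular.
    """
    return np.where((t_map > t_thresh) & (mask_t_map > mask_thresh),
                    t_map, 0.0)
```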
Next, we moved on to test the hypothesis that the spatiotemporal multisensory congruence effect obtained from the previous analysis was present for both stimulated parts of the hand. We repeated the procedure described above with linear contrasts separately defined for the two parts of the hand. In this case, spatial incongruence was first defined as identical visual input in the presence of incongruent tactile input. The same results were obtained by labeling as spatially incongruent all trials with identical tactile input but incongruent visual input for each part of the hand.
In Experiment 2, we first tested the hypothesis that the integration of visual and tactile signals from the hand is modulated by proprioceptive signals from the upper limb to construct a unitary percept of one's hand in space. To this end, we defined the interaction contrast from the 2 × 2 factorial design (see above) as (VT Congruent/VP Match vs VT Time Incongruent/VP Match) vs (VT Congruent/VP Mismatch vs VT Time Incongruent/VP Mismatch), where the labels VP Match and VP Mismatch indicate the two postures of the right hand. This contrast yielded voxels that exhibited BOLD responses that cannot be accounted for by an additive effect of the (temporal) congruence between the visual and tactile stimuli and the congruence between the seen and felt postures of one's own right hand, as determined by the match between vision and proprioception. Instead, this contrast revealed voxels containing neuronal populations that compute the integration of the visual and tactile signals only in the context of a postural match between the seen and felt location of the hand. This response pattern would be expected from brain areas that perform multisensory integration in hand-centered coordinates.
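In terms of the four condition regressors, this interaction corresponds to the contrast weights [+1, −1, −1, +1], as the following sketch illustrates (the regressor ordering is an assumption for illustration):

```python
import numpy as np

# Assumed ordering of the four condition regressors in the GLM:
# [VT Congruent/VP Match, VT Time Incongruent/VP Match,
#  VT Congruent/VP Mismatch, VT Time Incongruent/VP Mismatch]
interaction = np.array([+1, -1, -1, +1])

def interaction_effect(beta):
    """Interaction contrast estimate at one voxel: positive only when
    the VT-congruence effect under VP Match, (b0 - b1), exceeds the
    same effect under VP Mismatch, (b2 - b3)."""
    return interaction @ beta[:4]
```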
Next, we tested the hypothesis that the disintegration of the congruence between vision, touch, and proprioception would impair the perceptual mechanisms that underlie the “default” self-attribution of the visible real hand. We defined a 2 × 2 factorial interaction contrast that was identical to the contrast described above but that modeled the 2 s threat events after the periods of exposure to multisensory stimuli under different levels of congruence. This contrast revealed voxels with BOLD responses that reflected the modulation of the threat-evoked cortical response in areas related to pain anticipation and anxiety (Lloyd et al., 2006; Ehrsson et al., 2007).
In Experiment 3, our aim was to reproduce the sensitivity to multisensory congruence that we observed in the first two experiments during a situation in which we explicitly controlled for endogenous visuospatial attention. Behavioral data were collected in terms of the response times and accuracies for each trial. The fMRI data were analyzed by defining the linear contrast VT Congruent vs VT Time Incongruent in the presence of an explicit task that shifted the focus of the participants' endogenous visuospatial attention away from the stimuli on the hand. Statistical inference was conducted by testing the above contrast on all the significant peaks of activation identified by the main analysis in Experiment 2 (see above). Please note that because the visuospatial attention task was introduced in a separate acquisition session at the end of the main experiment, this statistical analysis is not in any way circular.
FMRI effective connectivity analyses
We used psychophysiological interaction (PPI) analyses to test the hypothesis that the integration of spatially and temporally congruent signals in the fronto-parieto-cerebellar regions would be associated with increases in the effective connectivity between different nodes of the involved networks. The PPI reflects context-induced changes in the strength of the connectivity between two brain regions, a seed and a target, as measured by a change in the magnitude of the linear regression slope that relates their underlying activities. A significant PPI indicates that the contribution of one area to the activity of another area changes significantly with the experimental or psychological context (Friston et al., 1997). Given that such interactions are assumed to take place at the neuronal level, it is recommended to estimate the underlying neuronal signal from the measured BOLD response. This process is built on the deconvolution of the hemodynamic response function from the measured BOLD time series and was described in detail in a seminal paper by Gitelman et al. (2003). In both Experiments 1 and 2, connectivity changes between a seed region in the left intraparietal sulcus (IPS) and the other areas of the brain were assessed for specific manipulations of the congruence of visual, tactile and proprioceptive signals from the hand. We chose to place the seed in the left IPS on the basis of both neurophysiological evidence and previous brain imaging studies. Given its ideal anatomical location for receiving converging inputs from sensory areas that process visual, tactile, and proprioceptive signals and its well known anatomical connectivity with other multisensory areas in the frontal and inferior parietal lobes (Grefkes and Fink, 2005; Culham et al., 2006; Avillac et al., 2007), the IPS is ideally suited to act as a central node for the processing of multisensory bodily signals (Makin et al., 2008; Ehrsson, 2012). Indeed, the left IPS has been implicated in a number of studies that have investigated the processing of multisensory signals from the contralateral right hand (Lloyd et al., 2003; Ehrsson et al., 2004; Makin et al., 2007; Brozzoli et al., 2011, 2012; Gentile et al., 2011). The seed region was defined for each participant and was centered on the peak voxel found within a 10 mm radius sphere centered on the group peak for the contrast of interest. The seed region's time series was computed as the first eigenvariate of all voxels within a 4 mm radius sphere centered on the participant-specific peak voxel. At the individual level, three regressors were created that represented the time series of the seed region (the physiological factor), the experimental manipulation of interest (the psychological factor), and their product (the PPI). A GLM containing these three regressors was estimated for each participant and contrast estimates for the PPI regressor were analyzed at the group level using one-sample t tests in a random-effects model. Statistical inference was applied in a manner that was identical to the approach used for all of the fMRI regional analyses (see above).
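The construction of the three PPI regressors can be sketched as follows. This is a simplified illustration: for clarity, the interaction term is formed directly on the BOLD signal, whereas the actual analysis followed Gitelman et al. (2003) and computed the product at the deconvolved neuronal level before reconvolving with the HRF:

```python
import numpy as np

def ppi_design(seed_ts, psych):
    """Design matrix with the three PPI regressors (simplified sketch).

    seed_ts : seed time series (first eigenvariate of the seed sphere).
    psych   : psychological vector with +1 / -1 / 0 per scan, coding
              the contrast conditions as described for each experiment.
    Columns: [PPI interaction, physiological, psychological, constant].
    """
    seed = (seed_ts - seed_ts.mean()) / seed_ts.std()
    ppi = seed * psych                     # the psychophysiological term
    return np.column_stack([ppi, seed, psych, np.ones_like(seed)])

def ppi_estimate(Y, X):
    """Parameter estimate for the PPI regressor at each target voxel;
    a significant positive estimate indicates context-dependent
    coupling with the seed region."""
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return beta[0]  # first column of X is the PPI regressor
```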
In Experiment 1, we tested the hypothesis that both the temporal congruence and the spatial congruence of visual and tactile stimuli on one's own seen real hand modulate the effective connectivity between key nodes of multisensory circuits that integrate hand signals (Makin et al., 2008; Gentile et al., 2011; Guterstam et al., 2013). The seed region in the left IPS was centered on the group peak from the main analysis of Experiment 1 (MNI coordinates: x = −36, y = −44, z = 58; Figs. 2B, 4A–C). The average distance of the participant-specific seed region from the group peak was 6.52 ± 2.40 mm. The PPI analyses followed the same logic as the corresponding regional analyses. We conducted two separate analyses using VT Congruent vs VT Time Incongruent and VT Congruent vs VT Space Incongruent as experimental factors and extracted significant voxels for both contrasts by correcting for multiple comparisons on the basis of our a priori hypotheses (see above). In the first of these two analyses, the psychological regressor in the PPI model corresponded to VT Congruent vs VT Time Incongruent, where the first condition was coded with a weight of +1 and the second with a weight of −1. All other conditions in the original GLM were coded with a weight of 0. In the second PPI analysis for Experiment 1, the psychological regressor corresponding to VT Congruent vs VT Space Incongruent was generated by coding the first condition with a weight of +1 and the second with a weight of −1. Again, the remaining conditions were coded with 0.
In Experiment 2, our aim was to demonstrate that the effective neural connectivity between the multisensory regions described above changes as a function of the proprioceptive inputs from the upper limb. We defined the comparison VT Congruent/VP Match vs VT Congruent/VP Mismatch as the experimental factor for a PPI analysis with the seed in the left IPS (peak voxel from the factorial interaction analysis: x = −36, y = −44, z = 58; Figs. 5B, 6A–C). The psychological regressor was therefore generated by coding the condition VT Congruent/VP Match with +1 and the condition VT Congruent/VP Mismatch with −1. All remaining conditions in the original GLM were coded with a weight of 0. The average distance of the participant-specific seed region from the group peak was 6.06 ± 2.24 mm. It is noteworthy that, because the PPI-GLM contains a regressor that explicitly models the experimental manipulation of interest, the effective connectivity results cannot be explained by differential levels of brain activation in response to the different experimental conditions. Instead, these findings provide independent evidence in favor of the hypothesis that the identified brain regions work in concert to build a representation of the hand that draws from multiple sensory modalities and strictly depends on their relative congruence.
Correlations between subjective and objective measures of self-attribution of the hand and multisensory brain activity
To further corroborate the finding that different levels of congruence among visual, tactile, and proprioceptive signals affect the participants' ability to maintain a stable perception of their own hand as part of their body, we performed multiple regression analyses to relate observed changes in the BOLD response to subjective and objective indices of hand self-perception for Experiment 2. First, we defined a “self-attribution index” by computing the interaction score for subjective ratings to statement S1 for each participant according to the formula: (VT Congruent/VP Match vs VT Time Incongruent/VP Match) vs (VT Congruent/VP Mismatch vs VT Time Incongruent/VP Mismatch). The indices for all participants were then entered as a covariate in a multiple regression model alongside the BOLD interaction effect. The model was estimated for the whole brain and yielded voxels that displayed a significant positive correlation between the subjective ratings of hand self-perception and the corresponding brain responses. Second, we computed the interaction effect size for the skin conductance responses to the four experimental conditions in Experiment 2 according to the same formula: (VT Congruent/VP Match vs VT Time Incongruent/VP Match) vs (VT Congruent/VP Mismatch vs VT Time Incongruent/VP Mismatch). The SCR indices for each participant were entered as a covariate in a new multiple regression model alongside the BOLD interaction effect, which was related to the intervals of multisensory stimulations preceding the threat events. This analysis revealed voxels with a BOLD effect size (which is related to the magnitude of the response to multisensory inputs) that significantly predicted the physiological response to a threat event that followed in time. Both multiple regression analyses were anatomically unbiased (i.e., they were whole-brain analyses) and independent from the main regional analyses (see above) and therefore were not circular.
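The per-participant interaction index and its relation to the BOLD interaction effect reduce to the following sketch (SPM estimated the regression at every voxel; the function names here are illustrative):

```python
import numpy as np
from scipy import stats

def interaction_index(scores):
    """Per-participant interaction score, e.g., for statement S1 or
    the SCR amplitudes. 'scores' is a length-4 vector ordered as
    [VT Congruent/VP Match, VT Time Incongruent/VP Match,
     VT Congruent/VP Mismatch, VT Time Incongruent/VP Mismatch]."""
    return (scores[0] - scores[1]) - (scores[2] - scores[3])

def regress_on_bold(bold_interaction, behavioral_scores):
    """Across-participant regression of the BOLD interaction effect
    (one value per participant, at one voxel) on the behavioral index;
    a sketch of the covariate-in-a-regression logic."""
    idx = np.array([interaction_index(s) for s in behavioral_scores])
    slope, intercept, r, p, se = stats.linregress(idx, bold_interaction)
    return r, p
```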
Results
Experiment 1: Temporal and spatial congruence enhances the integration of visual and tactile signals from the hand
We first tested the hypothesis that the integration of visual and tactile signals from one's own hand in multisensory premotor-parietocerebellar regions would obey basic principles of temporal and spatial congruence (Meredith and Stein, 1986; Avillac et al., 2007; Stein and Stanford, 2008). We observed that the bilateral cortices that line the anterior and medial segments of the IPS, which are close to the junction with the postcentral sulcus and the superior parietal gyrus, contained multiple peaks of activation that displayed responses to VT stimuli on the hand that were significantly modulated by both the temporal and the spatial congruence of the sensory signals (VT Congruent vs VT Time Incongruent and VT Congruent vs VT Space Incongruent; p < 0.05 corrected for multiple comparisons; Fig. 2; Table 1). More inferiorly in the parietal lobe, we observed significant peaks of activation in the bilateral supramarginal gyri, part of clusters that encompass the inferior posterior parietal cortex, the parietal operculum, and the lower segment of the postcentral sulcus. In the frontal lobe, significant responses were measured in the bilateral portions of the precentral gyrus and in the inferior segments of the precentral sulcus, which correspond to activations in the dorsal and ventral premotor cortices. Significant modulations were also observed in the bilateral parts of the lateral occipital cortex (LOC) that matched the stereotactic location of the proposed extrastriate body area (EBA; Downing et al., 2001; Costantini et al., 2011; Weiner and Grill-Spector, 2011). Subcortically, the same BOLD pattern was observed in lobule VII of the inferior and posterior right (ipsilateral) cerebellum. Further details on the anatomical locations of the reported activations are presented in Figure 2 and Table 1.
Next, we broke down the analysis described above for the two anatomical locations on the hand separately (finger vs dorsum; see Materials and Methods for details), which confirmed that the congruence principles that underlie the integration of vision and touch generalize across both locations, as expected. Significant effects of temporal and spatial VT congruence were found in the same areas identified by the main contrast described above (p < 0.05 corrected for multiple comparisons; Fig. 3) for both the index finger and the back of the hand when the data from these two stimulation sites were analyzed separately. Therefore, vision and touch are fused together under conditions of spatiotemporal congruence to form a unitary percept on both parts of the hand that received VT stimulation.
Increased effective connectivity for temporally and spatially congruent VT stimuli
We hypothesized that the coactivation of premotor and posterior parietal areas indicates the specific engagement of anatomically interconnected frontoparietal circuits that integrate sensory information from the body (Graziano and Botvinick, 2002; Fogassi and Luppino, 2005; Ehrsson, 2012). Similarly, the reported activations in the cerebellum suggest the involvement of specific parieto-cerebellar circuits (Ramnani, 2006; Sultan and Glickstein, 2007; Sang et al., 2012). Finally, we predicted that multisensory influences would alter the effective connectivity between the intraparietal cortex and body-sensitive visual areas in the lateral occipital cortices (Orlov et al., 2010; Costantini et al., 2011; Downing and Peelen, 2011). First, in the PPI analyses that used the left IPS as the seed region, we observed increases in the effective connectivity between this area and regions in the right (ipsilateral) intraparietal cortex, the bilateral inferior parietal cortices (supramarginal gyrus, parietal operculum, and inferior segments of the postcentral sulcus), and the bilateral precentral gyri and sulci (dorsal and ventral premotor cortices; Fig. 4). Second, significant modulations in the effective connectivity with the left IPS were observed in the inferior and posterior right (ipsilateral) cerebellum, with peaks located in lobule VII. Third, regions in the bilateral lateral occipital cortices in locations that match that of the EBA in standard space (Downing and Peelen, 2011; Weiner and Grill-Spector, 2011) displayed significantly stronger connectivity with the left intraparietal cortex under conditions of VT congruence. These increases in the effective connectivity were dependent on the temporal and spatial congruence of visual and tactile signals from the hand. Therefore, our results suggest that the construction of unitary percepts on the hand is the result of the dynamic integration of information along specific anatomical-functional pathways that is governed by principles of temporal and spatial congruence between sensory signals.
Experiment 2: VP congruence and effects on hand self-attribution
Next, we tested the hypothesis that the integration of visual and tactile signals from the hand requires concurrent congruent visual and proprioceptive signals from the upper limb and, consequently, that a mismatch between the seen and felt posture of the hand would abolish this VT integration effect. Moreover, we sought to relate the BOLD responses that reflect multisensory integration in the premotor-parietocerebellar regions to changes in the default perception of the hand in view as one's own. Changes in limb self-attribution were probed with complementary neuroimaging (threat-evoked BOLD responses in areas related to pain anticipation) and subjective (questionnaire ratings) and objective psychophysiological (skin conductance) measures.
VT integration depends on congruent VP signals
To test our hypothesis, we implemented a 2 × 2 factorial design in which we independently manipulated the (temporal) congruence of visual and tactile signals and the congruence between the seen and felt positions of the participant's hand (Fig. 1D). When examining the interaction contrast (VT Congruent/VP Match vs VT Time Incongruent/VP Match) vs (VT Congruent/VP Mismatch vs VT Time Incongruent/VP Mismatch), which identifies voxels that combine visual, tactile, and proprioceptive signals exclusively under conditions of congruence among all three modalities, we identified multiple peaks of activation in the bilateral anterior and medial segments of the IPS, which belong to clusters that encompass the junction with the postcentral sulcus and the superior parietal gyrus (p < 0.05 corrected for multiple comparisons; Fig. 5; Table 2). In the inferior parietal lobes, voxels in the bilateral supramarginal gyri, parts of the clusters that extend into the inferior end of the postcentral sulcus, the parietal operculum, and the right inferior postcentral gyrus displayed the same modulations. Portions of the bilateral precentral gyri and inferior segments of the precentral sulcus in the frontal lobes (dorsal and ventral premotor cortices) also exhibited a significant interaction. Finally, the same effect was observed in the right (ipsilateral) inferior and posterior lobule VII of the cerebellum and in the right LOC (in a stereotactic location that possibly corresponds to the proposed EBA). Further details on the anatomical locations of the activations and the corresponding statistical parameters are presented in Figure 5 and Table 2. Therefore, we demonstrated that the neural and perceptual integration of VT hand signals is contingent upon congruent VP signals from the upper limb in a way that is compatible with multisensory integration in a hand-centered reference frame (Makin et al., 2007; Brozzoli et al., 2011, 2012).
Connectivity between multisensory areas tightens with congruent VP inputs
Next, we set out to demonstrate that congruent VP inputs that concern the location of the participant's hand constitute a necessary factor for the increase in connectivity between multisensory areas that integrate vision and touch (the areas observed in Experiment 1). To test this hypothesis, we performed an independent PPI analysis by selecting the posture of the hand as the experimental factor of interest. We replicated the same connectivity patterns between a seed region in the left IPS and key multisensory regions that we identified in Experiment 1 (p < 0.05 corrected for multiple comparisons; Fig. 6). Specifically, we observed increases in the effective connectivity between the left anterior IPS and regions in the right (ipsilateral) anterior and medial IPS, in the bilateral inferior parietal cortices (with peaks located in the supramarginal gyrus and in the inferior portions of the postcentral sulcus and gyrus), and in the bilateral precentral gyri and inferior parts of the precentral sulci (dorsal and ventral premotor cortices). Moreover, the right (ipsilateral) inferior and posterior cerebellum displayed enhanced connectivity with the left IPS. Finally, the same effect was observed in regions of the LOC in locations compatible with the activations reported above. These results imply that the effective connectivity within multisensory circuits that process the integration of visual and tactile signals from the hand is significantly enhanced in the context of congruent visual and proprioceptive inputs that concern hand position.
Quantifying the multisensory perception of the real hand in view as one's own
We hypothesized that the neural mechanisms that underlie the integration of vision, touch, and proprioception from one's real hand would be related to the perceptual binding of multisensory signals into a unified experience of the hand as part of one's own body (the feeling of limb ownership). Accordingly, we expected that increasing the incongruence between the visual, tactile, and proprioceptive signals would affect the default feeling of limb ownership (Moseley et al., 2008; Barnsley et al., 2011; Newport and Gilpin, 2011). We recorded both subjective (questionnaires) and objective (SCR and BOLD responses to physical threats applied to the hand) indices of self-attribution.
Subjective evidence
During the experimental sessions, participants retained a strong subjective feeling that they were looking directly at their own hand only under conditions of congruence between visual, tactile, and proprioceptive inputs (statement S1; interaction between VT congruence and VP match, F(1,14) = 19.286, p = 0.001; Fig. 7A). Moreover, such multisensory congruence was associated with the subjective perception of a unitary VT event on the hand (statement S2; interaction between VT congruence and VP match, F(1,14) = 4.808, p = 0.046; Fig. 7A). Interestingly, in conditions with incongruent VT or VP signals, the participants provided significantly lower ratings, as if they no longer felt that the hand they were looking at was their own. Moreover, incongruent visual, tactile, and proprioceptive signals led the participants to affirm the experience that they were looking at somebody else's hand when they were in fact looking at their own real right hand (statement S3; main effect of condition, F(1,14) = 182.21, p < 0.001; pairwise comparisons, all p < 0.014: VT Congruent/VP Match < VT Time Incongruent/VP Match < VT Congruent/VP Mismatch < VT Time Incongruent/VP Mismatch, two-tailed paired-samples t tests, Bonferroni-corrected for multiple comparisons; Fig. 7A). These findings demonstrate that violations of the spatiotemporal congruence of VT-proprioceptive signals from one's real hand affect the default multisensory perception of the hand as one's own.
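As an illustration of how such a 2 × 2 repeated-measures ANOVA can be computed, the following sketch uses the statsmodels AnovaRM class; the file name and column labels are hypothetical and do not correspond to the actual analysis files.

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Hypothetical long-format table: one row per participant and condition,
# with columns subject, vt (Congruent/Incongruent), vp (Match/Mismatch),
# and rating (questionnaire score for a given statement).
ratings = pd.read_csv("questionnaire_ratings.csv")

res = AnovaRM(ratings, depvar="rating", subject="subject",
              within=["vt", "vp"]).fit()
print(res)  # the vt:vp interaction row corresponds to the F(1,14) terms above
```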
Peripheral physiological evidence
We recorded threat-evoked SCRs as objective evidence for the hypothesis that multisensory congruence is required to maintain the default self-attribution of one's real hand. A repeated-measures ANOVA revealed significantly larger SCRs to threat events that followed periods of congruent multisensory stimulation than to those that followed the three incongruent conditions (interaction between VT congruence and VP match, F(1,13) = 6.517, p = 0.024; post hoc two-tailed paired t tests revealed greater SCRs to VT Congruent/VP Match than to all other conditions, p < 0.05; Fig. 8C).
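A minimal sketch of such post hoc pairwise comparisons is given below; the file names are placeholders for however the per-participant SCR amplitudes are stored, and the Bonferroni correction is shown for illustration (mirroring the correction applied to the questionnaire comparisons above).

```python
from itertools import combinations

import numpy as np
from scipy.stats import ttest_rel

# Hypothetical per-participant SCR amplitudes for the four conditions.
conditions = ["cong_match", "incong_match", "cong_mismatch", "incong_mismatch"]
scr = {c: np.loadtxt(f"scr_{c}.txt") for c in conditions}

pairs = list(combinations(conditions, 2))
alpha = 0.05 / len(pairs)  # Bonferroni correction over the six comparisons
for a, b in pairs:
    t, p = ttest_rel(scr[a], scr[b])  # two-tailed paired t test
    print(f"{a} vs {b}: t = {t:.2f}, p = {p:.4f}, significant: {p < alpha}")
```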
BOLD evidence
Finally, we analyzed the BOLD responses evoked by the appearance of the knife close to the hand. To this end, we defined a 2 × 2 factorial interaction contrast, which revealed increased responses in the left anterior cingulate cortex and in the right anterior insular cortex (p < 0.05 corrected for multiple comparisons; Fig. 8D), areas previously associated with emotional responses related to the anticipation of nociceptive stimuli on the hand (Critchley et al., 2003; Singer et al., 2004; Wager et al., 2004; Farrell et al., 2005; Lloyd et al., 2006; Ehrsson et al., 2007). Moreover, significant responses were observed in the right premotor cortex and in the right cerebellum (Fig. 8D), possibly reflecting the engagement of a sensorimotor circuit related to the preparation of defensive bodily movements (Graziano and Cooke, 2006). In conclusion, converging evidence from subjective, physiological, and BOLD measures indicates that breaking the congruence between visual, tactile, and proprioceptive signals from the real hand significantly weakens the default feeling of ownership of the seen hand.
Multisensory brain activity is related to changes in self-attribution of the real hand
We performed independent multiple regression analyses to test the hypothesis that BOLD responses reflecting the integration of congruent multisensory signals would be linearly related to changes in the self-perception of the hand (as quantified above). First, the rated degree of ownership of the seen hand could be predicted from the effect size of the interaction contrast reflecting VT integration in the context of congruent VP signals in the left ventral premotor and inferior parietal cortices and in the right intraparietal and lateral occipital cortices (p < 0.05 corrected for multiple comparisons in a separate multiple regression analysis; Fig. 7B). At a lower threshold, the same relationship was observed in the right premotor cortex (x = 48, y = 14, z = 40, p < 0.001 uncorrected for multiple comparisons) and in the left intraparietal cortex (x = −32, y = −48, z = 48, p < 0.001 uncorrected), in line with the activations observed in these regions in the main factorial analyses described above. Second, the effect size of the threat-evoked SCRs could be predicted from the effect size of the BOLD interaction reflecting the multisensory integration associated with the preceding period of VT stimulation; this relationship was observed in the right ventral and dorsal premotor cortices and in the right LOC (p < 0.05 corrected for multiple comparisons; Fig. 8E). In summary, these data provide further robust evidence for a link between multisensory integrative mechanisms and the maintenance of the default self-attribution of one's hand.
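The logic of these across-participants regressions can be sketched as follows (illustrative only; the array names are hypothetical, and the actual analyses included standard inference and correction for multiple comparisons across voxels).

```python
import numpy as np

def voxelwise_slopes(effect_sizes, behaviour):
    """Minimal sketch of an across-participants regression.

    effect_sizes : (n_subjects, n_voxels) per-participant estimates of
                   the multisensory interaction contrast
    behaviour    : (n_subjects,) behavioural index, e.g., ownership
                   ratings or threat-evoked SCR amplitudes
    Returns the per-voxel slope relating the BOLD effect size to behaviour.
    """
    X = np.column_stack([np.ones_like(behaviour), behaviour])
    coef, *_ = np.linalg.lstsq(X, effect_sizes, rcond=None)
    return coef[1]  # slope map: voxels where BOLD tracks self-attribution
```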
Experiment 3: Multisensory congruence versus endogenous visuospatial attention
Finally, we conducted an additional experiment to test whether the congruence effects between visual, tactile, and proprioceptive signals measured in Experiments 1 and 2 could be reproduced in the context of an explicit manipulation of the participants' endogenous visuospatial attention (Corbetta and Shulman, 2002; Macaluso and Driver, 2005), using a visuospatial attention task (Zimmer and Macaluso, 2007). Neither the average reaction time (congruent stimuli: 596 ± 65 ms; incongruent stimuli: 585 ± 70 ms; t = 0.727, p = 0.48, two-tailed paired t test; Fig. 9B) nor the accuracy (congruent stimuli: 88 ± 4%; incongruent stimuli: 87 ± 4%; t = 0.564, p = 0.59, two-tailed paired t test; Fig. 9B) differed between congruent and incongruent blocks, which suggests that the participants engaged their task-related visuospatial attention equally across the two types of blocks. In terms of brain activity, all the key areas that displayed enhanced BOLD responses to congruent visual, tactile, and proprioceptive signals in Experiment 2 (Fig. 5) showed a greater response to congruent than to incongruent VT stimuli on the hand, even during the explicit visuospatial attention task (p < 0.05 corrected for multiple comparisons; Fig. 9C). This finding rules out the possibility that our results could be accounted for by changes in the level of visuospatial attention between congruent and incongruent multisensory events on the hand. Rather, it suggests that the neural mechanisms that underlie the integration of sensory signals from the hand are, at least in part, independent of the deployment of endogenous visuospatial attention and likely operate in a primarily bottom-up fashion (Macaluso et al., 2005; Zimmer and Macaluso, 2007).
Discussion
This study yielded two main findings. First, we identified a set of functionally interconnected areas in the premotor, parietal, and cerebellar cortices that contain neuronal populations that integrate visual, tactile, and proprioceptive hand signals according to basic spatiotemporal rules. Second, we found that disrupting the congruence of these sensory inputs leads to a diminished sense of ownership of the hand; this loss of self-attribution was quantified by subjective, psychophysiological, and BOLD measures and could be predicted from multisensory responses in premotor and parietal areas. These findings extend our knowledge of the mechanisms that underlie the integration of multisensory bodily signals (Graziano and Botvinick, 2002; Macaluso and Maravita, 2010) and have important implications for models of bodily self-perception (Tsakiris, 2010; Blanke, 2012; Ehrsson, 2012; Moseley et al., 2012).
Integration of vision, touch, and proprioception from the real hand
We report that multisensory regions in the frontal, parietal, and cerebellar cortices combine visual and tactile hand signals preferentially under conditions of spatiotemporal congruence (Calvert et al., 2000; Holmes and Spence, 2005). Our analyses revealed BOLD population responses that are reminiscent of the basic spatiotemporal principles described in seminal neurophysiological studies of multisensory integration (Meredith and Stein, 1986; Avillac et al., 2007). Furthermore, we provide evidence that VT integration depends on congruent VP signals from the upper limb in a manner that is consistent with integration in a hand-centered reference frame (Makin et al., 2007; Brozzoli et al., 2011, 2012). Because our results stem from fMRI block designs that contrast fully congruent versus fully incongruent multisensory conditions, we cannot make inferences about the specific nature of the temporal and spatial windows of neuronal integration for individual stimuli as in earlier neurophysiological investigations in animals (Avillac et al., 2007; Stein and Stanford, 2008). However, our findings provide neuroimaging evidence that human premotor-parietocerebellar regions are sensitive to the congruence of VT-proprioceptive signals from one's real upper limb. Finally, a control experiment revealed that the multisensory congruence effects could be reproduced in the context of an explicit manipulation of visuospatial attention (Zimmer and Macaluso, 2007).
Frontoparietal circuits
Increases in the BOLD response in the anterior intraparietal cortices and in regions of the superior and inferior parietal and premotor cortices, as well as increases in the effective connectivity between these regions, were contingent on the spatiotemporal congruence of multisensory signals. In nonhuman primates, dense anatomical connections have been described between the anterior and medial portions of the intraparietal cortex (Grefkes and Fink, 2005) and posterior superior parietal (Mountcastle et al., 1975; Iriki et al., 1996; Graziano et al., 2000), inferior parietal (Hyvärinen and Poranen, 1974; Luppino and Rizzolatti, 2000), and premotor (Rizzolatti et al., 1981a,b; Fogassi et al., 1999; Graziano and Gandhi, 2000) areas. Neurons in these regions increase their discharge rate in response to VT signals that obey principles of spatiotemporal congruence (Duhamel et al., 1998; Avillac et al., 2007) and to congruent visual and proprioceptive information from the upper limb (Graziano, 1999; Graziano et al., 2000). Our results support the existence of frontoparietal networks that integrate VT-proprioceptive hand signals and extend previous findings by demonstrating that spatiotemporal congruence is a key factor that modulates neuronal processing and connectivity within these circuits.
Corticocerebellar circuits
Our findings also suggest the involvement of the posterior-inferior parts of the lateral cerebellar hemispheres in the integration of multisensory hand signals. This part of the cerebellum receives converging proprioceptive (Murphy et al., 1973; van Kan et al., 1993), visual (Glickstein et al., 1994; Sultan and Glickstein, 2007), and tactile (Bloedel, 1973; Dum and Strick, 2003) inputs and is responsive to congruent multisensory stimulation of the right upper limb (Kavounoudias et al., 2008; Naumer et al., 2010; Gentile et al., 2011). Moreover, the cerebellum plays an important role in the processing of synchrony and congruence among the senses (Miall et al., 1993; Blakemore et al., 2000; Ito, 2000). Our finding of increased connectivity between this region and the intraparietal cortex is consistent with the known anatomical connections between these areas (Glickstein et al., 1994; Ramnani, 2006; Sang et al., 2012). We theorize that the cerebellum is pivotal for the detection of congruent multisensory signals and the formation of crossmodal predictions, and that these computations feed into multisensory processing in frontoparietal areas, leading to the fusion of hand signals from multiple sensory modalities.
Multisensory incongruence leads to a loss of default limb self-attribution
We present converging evidence from subjective, psychophysiological, and neural measures demonstrating that breaking down the congruence between multisensory hand signals entails a significant loss of the feeling of ownership of one's real hand. Our approach differs from previous studies of limb self-attribution that have relied on perceptual illusions in which the sense of touch and the feeling of ownership are referred to external objects (Botvinick and Cohen, 1998; Ehrsson et al., 2004; Tsakiris et al., 2007; Moseley et al., 2008). Therefore, whereas previous studies have demonstrated the importance of multisensory integration in establishing illusory ownership of noncorporeal objects, our results suggest that analogous mechanisms underlie the natural self-attribution of one's real hand (Botvinick, 2004; Moseley, 2011; Newport and Gilpin, 2011). Our findings are compatible with neuropsychological studies of disorders in bodily self-awareness that have proposed that the failure to integrate visual, tactile, and proprioceptive signals in multisensory frontoparietal circuits might result in impaired bodily self-perception and loss of ownership (Vallar and Ronchi, 2009; Newport and Gilpin, 2011). Our results support this proposal by revealing that the disintegration of multisensory signals in frontoparietal-cerebellar multisensory circuits leads to losses in hand self-attribution among healthy participants.
Subjective, psychophysiological, and neural evidence for reduced self-attribution of the hand
The observed changes in the self-perception of the upper limb were systematically related to the degree of functional engagement of the multisensory neural circuits under investigation. Specifically, declines in the subjective feeling of ownership of the hand could be predicted from BOLD responses in the ventral premotor, intraparietal, and inferior parietal cortices, which is consistent with the central role of frontoparietal networks in maintaining default bodily self-perception (Vallar and Ronchi, 2009; Gentile et al., 2011). Moreover, the SCRs tracked the disintegration of the multisensory inputs: physiological responses to physical threats directed toward the hand were weakened after exposure to incongruent multisensory stimuli (Newport and Gilpin, 2011), and these threat-evoked SCRs could be related to the multisensory premotor responses. Furthermore, the analysis of the brain responses evoked by the same threat events revealed an attenuation of the BOLD signal in the anterior cingulate and insular cortices, key areas involved in the anticipation of painful stimuli (Critchley et al., 2003; Singer et al., 2004; Wager et al., 2004; Farrell et al., 2005; Ehrsson et al., 2007); this attenuation mirrored the loss of self-attribution of the hand. In addition, threat-induced BOLD responses were observed in parts of the premotor cortex and cerebellum, which are central components of the neural circuits that plan and control hand movements (Jeannerod et al., 1995; Culham et al., 2006; Bernier and Grafton, 2010), in particular defensive movements executed to protect the body from potential threats within peripersonal space (Graziano and Cooke, 2006; Lloyd et al., 2006; Sambo et al., 2012). Exposure to incongruent hand signals reduced the activation of these circuits, reflecting the loss of self-attribution of the real hand.
Multisensory modulations of body-sensitive visual areas
Interestingly, we found that congruent multisensory inputs enhanced the BOLD response in the bilateral LOCs. Although we did not perform independent functional localization, the sites of the reported activations are compatible with studies that have characterized the EBA, a cortical region highly specialized in the visual processing of body parts (Downing et al., 2001; Kontaris et al., 2009; Orlov et al., 2010; Weiner and Grill-Spector, 2011). The finding of multisensory modulations of LOC activity is consistent with previous reports of tactile (Amedi et al., 2001; Amedi, 2002; Kitada et al., 2009; Lacey et al., 2009; Costantini et al., 2011) and proprioceptive (Astafiev et al., 2004; Orlov et al., 2010) influences on visual responses in lateral occipital areas. These influences likely stem from sources in the posterior parietal cortex (Macaluso et al., 2000; Downing and Peelen, 2011), which is consistent with our finding of enhanced connectivity with the intraparietal cortex. Substantial debate surrounds the role of these regions in the visual self-recognition of body parts (Arzy et al., 2006; Saxe et al., 2006; Peelen and Downing, 2007; Myers and Sowden, 2008; Hodzic et al., 2009). Our results suggest that body-sensitive extrastriate areas receive information from frontoparietal circuits about the self-identity of the seen hand. Crucially, the BOLD response in these areas was enhanced when the participants experienced that they were looking at their own real hand and was correlated with the subjective and psychophysiological measures of limb self-attribution. We theorize that the LOC responses reflect crossmodal interplay (Driver and Noesselt, 2008; Kayser et al., 2010), whereby congruent tactile and proprioceptive signals influence the visual processing of hand signals via top-down modulations from posterior parietal regions in a way that potentially aids the self-recognition of the hand.
Concluding remarks
This study demonstrates that the integration of hand signals in frontoparietal-cerebellar circuits depends on the spatiotemporal congruence of the VT-proprioceptive inputs. Moreover, our findings reveal a close link between the disruption of multisensory congruence and the loss of the default self-attribution of the hand. Together, these results advance our understanding of the multisensory integrative mechanisms that support the default self-perception of one's own body.
Footnotes
This study was supported by the European Research Council, the Swedish Foundation for Strategic Research, the Human Frontier Science Program, the James S. McDonnell Foundation, the Swedish Research Council, and Söderberska Stiftelsen. G.G. and A.G. are members of the Stockholm Brain Institute Research School. C.B. was supported by Marie Curie Actions. The fMRI scans were conducted at the MR-Center at the Karolinska University Hospital in Huddinge, Stockholm, Sweden. We thank Hiske van Duinen, Christopher Berger, and Zakaryah Abdulkarim for assistance with the experimental procedures.
Correspondence should be addressed to Giovanni Gentile, Department of Neuroscience, Karolinska Institutet, Retzius väg 8, 17177 Stockholm, Sweden. Giovanni.Gentile@ki.se