Positron emission tomography in three-dimensional acquisition mode was used to identify the neural populations involved in tactile–visual cross-modal transfer of shape. Eight young male volunteers went through three runs of three different matching conditions: tactile–tactile (TT), tactile–visual (TV), and visual–visual (VV), and a motor control condition. Fifteen spherical ellipsoids were used as stimuli.
By subtracting the different matching conditions and calculating the intersections of statistically significant activations, we could identify cortical functional fields involved in the formation of visual and tactile representation of the objects alone and those involved in cross-modal transfer of the shapes of the objects.
Fields engaged in representation of visual shape, revealed in VV–control, TV–control and TV–TT, were found bilaterally in the lingual, fusiform, and middle occipital gyri and the cuneus. Fields engaged in the formation of the tactile representation of shape, appearing in TT–control, TV–control and TV–VV, were found in the left postcentral gyrus, left superior parietal lobule, and right cerebellum.
Finally, fields active in both TV–VV and TV–TT were considered as those involved in cross-modal transfer of information. One field was found, situated in the right insula–claustrum. This region has been shown to be activated in other studies involving cross-modal transfer of information. The claustrum may play an important role in cross-modal matching, because it receives and gives rise to multimodal cortical projections. We propose here that modality-specific areas can communicate, exchange information, and interact via the claustrum.
The relations between sight and touch have long been a matter of debate; in 1709 the philosopher George Berkeley, in his book An Essay Towards a New Theory of Vision, concluded that there were no necessary connections between a tactile world and a visual world. Even today there is theoretical and experimental support for the view that there are no cortical convergence regions in which neuron populations integrate information from different sensory modalities and from different submodalities (Abeles, 1991; Felleman and van Essen, 1991; Young et al., 1992; Singer, 1993, 1995). Information from visual submodalities seems to be processed by parallel “functional streams” (Ungerleider and Mishkin, 1982; Livingstone and Hubel, 1988; Haxby et al., 1991; Zeki et al., 1991; Gulyas et al., 1994). Because direct anatomical connections between somatosensory and visual areas in primates are sparse, if they exist at all (Selemon and Goldman-Rakic, 1988; Cavada and Goldman-Rakic, 1989; Neal et al., 1990), one might expect somatosensory and visual information to be processed by segregated populations of neurons. Still, everyone recognizes a key, whether it is felt in a pocket or seen on a table. How is this possible?
The research done on tactile–visual cross-modal performance has been based mainly on the assumption that there must exist amodal representations of form in so-called polysensory areas, defined as areas activated by stimuli from more than one sensory modality. Nevertheless, in a review on cross-modal abilities in nonhuman primates, Ettlinger and Wilson (1990) concluded that there is no polysensory cross-modal area, no cross-modal region “in which representations formed in one sense would reside and be accessed by another sense,” but suggested instead a system in which the senses can access each other directly from their sensory-specific systems. For the present purpose, we define cross-modal-specific areas as areas activated only when information coming from two or more different sensory modalities is compared.
We examined tactile–visual matching of the shapes of objects. In tactile–visual matching of three-dimensional objects, the tactile information is of a different nature from that of visual information. When the hand is used to palpate an object, the information is sampled in a piecemeal manner, such that only a part of the object surface is covered by the fingers during each sampling path (Roland and Mortensen, 1987), and information is integrated over time to form a truly three-dimensional shape representation. In the visual system, information about an object can be simultaneously obtained and transferred to the visual cortex, but if it is seen from a stationary angle of view, only a part of its surface is sampled. This difference speaks against any common polymodal or amodal representation for the two modalities.
We studied tactile–tactile and visual–visual intramodal matching with the aim of identifying cortical fields engaged in the processing and representation of tactile and visual shape. Putative polysensory areas thus should be activated by tactile as well as visual intramodal shape matching. We also studied tactile–visual cross-modal matching of shape with the purpose of identifying cross-modal-specific areas. No polymodal or cross-modal cortical areas were found; instead, the claustrum was specifically activated by cross-modal shape matching.
MATERIALS AND METHODS
Eight young male volunteers (aged between 22 and 32 years; mean, 26 years) participated in the study. All of the subjects gave informed consent according to the requirement of the Ethics Committee and the Radiation Safety Committee of the Karolinska Institute. None had previous or present history of significant medical illness, and all had a normal magnetic resonance imaging (MRI) scan. All subjects were right-handed, according to the Edinburgh questionnaire (Oldfield, 1971).
Two similar sets of 15 spherical ellipsoids, all of the same weight but of different shapes, were used as stimuli (Fig. 1).
The regional cerebral blood flow was measured during four different conditions, each scanned three times. These were matching tasks in tactile–tactile (TT), tactile–visual (TV), and visual–visual (VV) modalities, and a control condition.
The experimenter stood beside the subject. The stimuli were presented as pairs, constituted in the following way: one-fourth were matching ellipsoids of identical shapes, another fourth were ellipsoids one step apart in shape, a third fourth were two steps apart, and the final fourth were three steps apart. We define a step as the gap between the two ellipsoids with the least difference in shape (for a detailed description of the stimuli, see Roland and Mortensen, 1987). The pairs were presented in random order. The subjects had not seen the objects before the experiment, nor did they know how many objects were in the series. The TT tasks were always done first, because we did not want the subjects to see the stimuli before the experiments requiring visual exposure. The other conditions were randomized.
TT matching. The subjects were instructed to fixate a point on the presentation shelf, seen in the mirror, during the whole procedure. The tactile stimuli were presented sequentially in pairs to the right hand of the subjects. The subjects received the ellipsoids in their palm in different orientations and explored them by palpation between the fingers and opposing thumb. The first ellipsoid in the pair was palpated for 3–4 sec, and the second was palpated for 2–3 sec. A new pair was presented approximately every 8 sec. The subjects were not able to see the ellipsoids. If they thought that the two ellipsoids had identical shapes, the subjects raised their right thumb.
TV matching. The tactile and the visual stimuli were presented simultaneously. The subjects had ∼3 sec to palpate one ellipsoid (without being able to see it) and ∼3 sec to look at the ellipsoid presented on a shelf placed over their head. A new pair was presented approximately every 4 sec. A mirror allowed the subjects to see the objects at a distance of 60 cm. The subjects again raised their right thumb if they thought that the two ellipsoids were identical.
VV matching. The first ellipsoid in the pair was presented on the shelf for ∼4 sec, and then the second ellipsoid was presented for ∼2 sec. A new pair was presented approximately every 8 sec. The subjects answered by raising their right thumb if they thought that the two ellipsoids had identical shapes.
Control. Subjects fixated the same fixation point as in the TT experiment and were instructed to move their right hand in the same way as if they were actually palpating the objects.
Eye movements were monitored by a video recording of the subject’s face on a split screen. The movements of the subjects’ right hands were recorded as well. Subjects were told not to speak during the entire procedure, and the room was kept as quiet as possible.
Before the positron emission tomographic (PET) experiment, five additional subjects underwent psychophysical testing of the same conditions. The only notable difference was that we tested visual–tactile and tactile–visual matching separately, to be sure that performance did not depend on which modality was presented first. Each subject went through four runs of 120 pairs of ellipsoids, with a priori varying probabilities of matching. Receiver operating characteristic curves and d′ values, measuring the separation between the means of the noise and signal distributions, were calculated for each subject in each modality (Green and Swets, 1966).
Each subject, lying in a supine position and equipped with a stereotactic helmet (Bergström et al., 1981), had a high-resolution MRI scan and a PET scan. The MRI scans were done using a spoiled gradient echo sequence obtained with a 1.5 T General Electric Signa scanner [echo time, 5 msec; repetition time, 21 msec; flip angle, 50°, giving rise to a three-dimensional (3D) volume of 128 × 256 × 256 in isotropic voxels of 1 mm3]. Each subject had an arterial catheter inserted under local anesthesia in the left radial artery for the measurement of arterial concentration of radiotracer.
The regional cerebral blood flow (rCBF) was measured in 3D acquisition mode with a Siemens ECAT EXACT HR PET camera (for technical description, see Wienhard et al., 1994). The radiotracer used was 15O-labeled butanol, which was synthesized according to the method of Berridge et al. (1991). Fifteen microcuries of radiotracer were injected intravenously as a bolus at the beginning of each run, followed by a 20 ml flush with saline. The tasks began ∼15 sec before the radiotracer injection and proceeded throughout the duration of the scan (180 sec). During this time, the subjects could match ∼22 pairs of stimuli. The rCBF was calculated by an autoradiographic procedure using frames between 0 and 60 sec (Meyer, 1989). The sinograms were reconstructed with a ramp filter with a cutoff frequency of 0.5 cycles, and the reconstructed image was subsequently filtered with a 4.2 mm full-width half-maximum 3D isotropic Gaussian filter.
The individual MRI and rCBF images were standardized anatomically using the human brain atlas of Roland et al. (1994). To reduce the variance of the rCBF measurements, the global blood flow was normalized to 50 ml · 100 gm−1 · min−1.
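The global normalization step can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the function name and the convention of treating zero voxels as outside the brain are assumptions:

```python
import numpy as np

def normalize_global_flow(rcbf, target=50.0):
    """Scale an rCBF volume so that its global mean equals `target`
    (ml * 100 gm^-1 * min^-1). `rcbf` is a 3D array of flow values;
    voxels equal to zero are assumed to lie outside the brain and
    are excluded from the global mean."""
    brain = rcbf[rcbf > 0]
    return rcbf * (target / brain.mean())
```

After scaling, every subject's image has the same global mean, so subtraction images reflect regional rather than global flow differences.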
The statistical analysis tested the hypothesis that clusters of high t values occur by chance in the standard anatomical space; it was described in detail previously (Roland et al., 1993). In short, individual voxel-by-voxel rCBF subtraction images were calculated for each subject. For a given test repeated i = 1, …, n times in subject k:

\Delta rCBF_{i,k} = rCBF^{task}_{i,k} - rCBF^{control}_{i,k}

These images were subsequently averaged for each subject,

\overline{\Delta rCBF}_{k} = \frac{1}{n} \sum_{i=1}^{n} \Delta rCBF_{i,k},

and these individual mean images were used to calculate a group mean image,

\overline{\Delta rCBF} = \frac{1}{8} \sum_{k=1}^{8} \overline{\Delta rCBF}_{k},

from which a t image was eventually calculated:

t = \frac{\overline{\Delta rCBF}}{SD(\overline{\Delta rCBF}_{k}) / \sqrt{8}}.

The spatial three-dimensional autocorrelation was then determined in Student’s t pictures obtained from the ΔrCBF_{i,k} images of subtracting two TT matching images to give a noise image, as described by Roland et al. (1993). The resulting noise t pictures were thresholded at different t values, and clusters of voxels with suprathreshold values were identified. We simulated 2000 groups of eight subjects each for each t threshold. From these noise t pictures, tables of clusters of suprathreshold t values exceeding a certain number of voxels were produced (Roland et al., 1993). The criteria for accepting rCBF changes in adjacent clustered voxels as activations were set so that there was an average probability of p < 0.1 of finding one or more false-positive clusters within the three-dimensional space of the standard anatomical brain format. Accordingly, the descriptive Student’s t pictures were thresholded such that the (omnibus) probability of finding one or more false-positive clusters was p < 0.1. The resulting cluster images thus show only the activated parts of the brain and zero elsewhere. The significance of each cluster was also assessed by the nonparametric method of Holmes et al. (1996). For a t threshold of 2.5, this method gave p < 0.1 for one or more false-positive clusters ≥900 mm3 in size.
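The Monte Carlo cluster-size thresholding described above can be sketched in miniature: simulate groups of spatially smoothed noise volumes, form a one-sample t image across the simulated subjects, and find the cluster size that chance exceeds in only a fraction p of simulations. All sizes, smoothing widths, and function names below are illustrative assumptions, not the parameters of Roland et al. (1993):

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)

def max_cluster_size(t_img, t_thresh):
    """Size (in voxels) of the largest connected suprathreshold cluster."""
    labels, n_clusters = ndimage.label(t_img > t_thresh)
    if n_clusters == 0:
        return 0
    return int(np.bincount(labels.ravel())[1:].max())

def critical_cluster_size(shape=(32, 32, 32), n_subjects=8, n_sim=200,
                          t_thresh=2.5, smooth=2.0, p=0.1):
    """Simulate `n_sim` groups of noise volumes, smooth each subject's
    volume to mimic spatial autocorrelation, compute the group t image,
    and return the cluster size exceeded by chance with probability ~p."""
    maxima = []
    for _ in range(n_sim):
        noise = rng.standard_normal((n_subjects,) + shape)
        # sigma 0 on the subject axis: smooth only within each volume
        group = ndimage.gaussian_filter(noise, (0,) + (smooth,) * 3)
        t_img = group.mean(axis=0) / (
            group.std(axis=0, ddof=1) / np.sqrt(n_subjects))
        maxima.append(max_cluster_size(t_img, t_thresh))
    return int(np.quantile(maxima, 1.0 - p))
```

Any observed cluster larger than the returned critical size would then be accepted at the chosen omnibus level.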
These cluster images are henceforth referred to as TT–control, TV–control, VV–control, TV–TT, and TV–VV. For the convenience of the reader, the different significant clusters are listed in Tables 2 and 3 in accordance with the method of Roland et al. (1993). In Tables 2 and 3 the average t value of each cluster is also given, calculated as the mean of the values of the voxels constituting the cluster.
The cluster images TT–control, TV–control, VV–control, TV–TT, and TV–VV were then used to form Boolean intersection images (Ledberg et al., 1995), as, for example, TT–control ∩ TV–VV. This Boolean intersection carries no assumptions and shows the intersections or overlaps of clusters that correspond to cortical or subcortical regions active in both TT–control and TV–VV. For example, if a cluster in TT–control has p < 0.05 of being a false positive, the probability that any cluster from TV–VV by chance will overlap it is p < 0.05, because the prerequisite for overlap is that the TT–control cluster is present.
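Because the cluster images are zero outside significant clusters, the Boolean intersection reduces to a voxelwise AND; a minimal sketch (function name assumed):

```python
import numpy as np

def boolean_intersection(cluster_a, cluster_b):
    """Voxelwise Boolean intersection of two cluster images, i.e.,
    3D arrays that are nonzero inside statistically significant
    clusters and zero elsewhere (used to form, e.g., TT-control
    intersected with TV-VV). Returns a Boolean 3D mask."""
    return np.logical_and(cluster_a != 0, cluster_b != 0)
```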
The psychophysical testing showed that there was a linear relationship between the presented and the chosen stimulus, regardless of the modalities, as shown by the linear regression curves. The direction of cross-modal information transfer (i.e., tactile to visual vs visual to tactile) had no influence on performance (Fig. 4).
During the actual PET scanning, the probability of responding “match” given a matching pair, i.e., p(match | match), and the probability of responding “match” given a nonmatching pair, i.e., p(match | nonmatch), were calculated for each subject. On the basis of these probabilities, a measure of performance, d′, was calculated (Green and Swets, 1966). The subjects had d′ values between 0.40 and 2.88, indicating that all of them performed above chance or noise level. Furthermore, this range of d′ values was comparable to the range of 0.65–3.0 obtained by the five subjects performing the psychophysical test outside of the PET camera, indicating no major differences between the two groups. For the subjects doing PET, there were no differences in d′ between conditions, indicating the same level of difficulty between the tasks. By paired comparison of d′ values, the subjects being their own controls, the average intrasubject differences in d′ were TV–VV, −0.66 ± 0.93 (SD), and TV–TT, −0.08 ± 0.61.
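The d′ of Green and Swets (1966) is the distance between the z-transformed hit and false-alarm rates; a minimal sketch of the computation (function name assumed):

```python
from statistics import NormalDist

def d_prime(p_hit, p_false_alarm):
    """Signal-detection d' from p(match | match) and
    p(match | nonmatch): the difference of the inverse-normal
    (z) transforms of the two response probabilities."""
    z = NormalDist().inv_cdf
    return z(p_hit) - z(p_false_alarm)
```

A subject responding at chance (equal hit and false-alarm rates) gets d′ = 0, whereas d′ well above 0 indicates genuine discrimination.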
To examine whether the motor activity of exploring the ellipsoids tactually was balanced between the matching tasks (i.e., TV and TT), between TV and the control condition, and between TT and the control condition, we analyzed the frequencies of movements of the individual fingers of the subjects doing PET. By a paired comparison, the subjects being their own controls, the average intrasubject differences in TT–control were −0.05 ± 0.08 Hz for the thumb and 0.02 ± 0.12 Hz for the index finger. The corresponding differences in frequencies for TV–control were −0.02 ± 0.07 Hz for the thumb and −0.06 ± 0.13 Hz for the index finger. The number of thumb and index movements in TT, TV, and control varied little for a given subject: the intrasubject SDs were between 0.6 and 8%. The subjects fixated the fixation point as instructed during the TT and rest conditions, and there were no significant eye movements while the subjects were presented the visual stimuli in TV and VV.
Regional cerebral blood flow changes
Table 1 shows the rationale of the different subtractions. Somatosensory areas and perhaps polysensory areas participating in the formation of the tactile representation of the stimuli were assumed to be active in tactile–tactile minus control, tactile–visual minus control, and tactile–visual minus visual–visual. By calculating the intersection between the two independent cluster images of TV–VV and TT–control, we expected to isolate those areas involved in the formation of the tactile representation of the stimuli. Visual areas and perhaps polysensory areas were expected to be activated during the visual matching in the following paradigms: visual–visual minus control, tactile–visual minus control, and tactile–visual minus tactile–tactile. The overlap between TV–TT and VV–control was expected to show areas specifically engaged in the formation of the visual representation of the stimuli. The overlap between the two cluster images of the cross-modal matching tasks, i.e., TV–TT and TV–VV, would isolate areas specifically engaged in the cross-modal matching procedure, whereas the overlap between all of the tasks minus control would isolate polysensory areas activated regardless of the mode of stimulation. Tables 2 and 3 show the location (i.e., center of gravity), volume, and mean t value of the fields of activation in the different tasks. Tables 4–6 show the location and extent of the overlaps of fields engaged in the formation of the tactile–visual representation of the stimuli and in the cross-modal transfer of information.
Tactile–tactile minus control
Several fields of activation were found in the parietal lobe. The biggest increase of rCBF was situated in the left postcentral gyrus, extending posteriorly from the posterior part of the gyrus and the cortex lining the postcentral sulcus into superior parietal lobule and the anterior part of the intraparietal sulcus. A second focus of activation was found in the right parietal lobe, situated in the supramarginal gyrus (Table 2).
Other foci of activation were found in the right thalamus, the right temporal pole, and the left anterior prefrontal cortex. The cerebellum showed several foci of activation bilaterally.
Tactile–visual minus control
Fields of activation were found mainly in the occipital, temporal, and parietal lobes.
The biggest activation cluster was situated on the left cuneus, extending anteroposteriorly along the calcarine sulcus. A second cluster was situated on the left superior occipital gyrus, on its posterior part. A third cluster was situated on the left middle occipital gyrus. The right occipital cortex contained one cluster of activation that was situated on the lingual gyrus and the cortex lining the collateral sulcus, extending anteriorly to the fusiform gyrus (Table 3).
In the left parietal lobe, a field of activation was found in the postcentral gyrus, extending medially onto the cortex lining the postcentral sulcus and onto the posterior parietal cortex, including the cortex lining the anterior part of the intraparietal sulcus. A second cluster was situated on the left superior parietal lobule. A third cluster was found in the left precuneus and in the cortex lining the parieto-occipital sulcus. In the right parietal lobe, we found a cluster of activation on the superior parietal lobule that was situated more posteriorly than the one on the left side (Table 2).
Other foci of activation were found in the right thalamus, in the anterior prefrontal cortex bilaterally, and in the right cerebellum.
Two minor clusters at the location of the right insula–claustrum were found, of 172 and 143 mm3, with mean t values of 3.35 and 3.34, respectively. These were not statistically significant.
Visual–visual minus control
In the occipital lobes, several clusters of activation were found bilaterally. The largest was situated on the left lingual gyrus, extending anteroposteriorly along the calcarine sulcus (Table 3).
The inferior part of the middle occipital gyri was bilaterally activated, and the left superior part of the middle occipital gyrus also showed a cluster of activation. The right fusiform gyrus was activated in its posterior part.
The parietal lobes showed two foci of activation on the right side. One was situated in the superior parietal lobule and extended to the cortex lining the intraparietal sulcus, and another was in the angular gyrus.
We found bilateral fields of activation in the anterior and posterior cingulate cortex and in the anterior part of the middle frontal gyrus. The left anterior insula was also activated.
Tactile–visual minus visual–visual
According to our hypothesis, this subtraction should reveal the areas engaged in the tactile exploration of the stimuli and in cross-modal matching. Clusters of activation were found in the parietal lobes, with a large cluster centered on the left postcentral gyrus, extending anteriorly to the cortex lining the central sulcus, posteriorly to the postcentral sulcus and the anterior part of the intraparietal sulcus, and more medially onto the superior parietal lobule. A second cluster was situated on the right postcentral gyrus. Another cluster was found at the bottom of the left central sulcus. The left middle cingulate gyrus and the left thalamus were activated. The cerebellum showed bilateral foci of activation. Finally, a cluster of activation was situated in the right insula–claustrum region.
Tactile–visual minus tactile–tactile
This subtraction image was hypothesized to reveal the areas engaged in the visual perception of form as well as in cross-modal matching. Fields of activation were found in the occipital and parietal cortex as well as in the thalamus and in the right insula–claustrum region (Table 3).
In the occipital cortex, the middle occipital gyri were activated bilaterally; other foci of activation were found in the left lingual gyrus extending on to the collateral sulcus and in the left posterior and anterior fusiform gyrus.
The parietal cortex showed areas of activation in the left precuneus, in the right superior parietal gyrus, and in the cortex lining the intraparietal sulcus as well as in the right supramarginal and angular gyri.
The pulvinar nucleus of the thalamus was activated bilaterally, and a cluster of activation was found in the left hippocampus. At a more liberal level of significance (p < 0.6) a small cluster (t = 6.3) was found in the left superior colliculus (Talairach and Tournoux, 1988; coordinate 6, −30, 2). Fields of activation were found in the left cerebellum.
A cluster of significant activation was found in the right insula–claustrum region, with a center of gravity situated in the claustrum.
TT–control ∩ TV–VV
This intersection of two independent cluster images was supposed to reveal the areas specifically activated in the formation of the tactile representation of the stimuli (Fig. 5). A cluster of activation was found in the left parietal lobe, on the left postcentral gyrus, extending into the cortex lining the postcentral sulcus and the anterior intraparietal sulcus and into the posterior parietal cortex. A second cluster was situated more anteriorly on the postcentral gyrus, extending to the cortex lining the central sulcus (Table 4).
Two clusters of activation were found in the right cerebellum.
Whether two activated fields, X_{i,A} and X_{i,B}, originating from the respective cluster images A and B, reflect activity in approximately the same synaptic cortical field can be addressed in a forward way. Let the estimated volumes of a cluster in TT–control and a cluster in TV–VV be V_{TT} and V_{TV–VV}, and let their centers of gravity be TT_{cog} and TV–VV_{cog}, respectively. In this case, V_{TT} < V_{TV–VV} (Table 2). A reasonable criterion for judging whether two clusters reflect activation in the same location is that the overlap Q produced by the two clusters is equal to or greater than half the volume of the smaller of the two clusters, and that the centers of gravity of both clusters are included in the overlap (Ledberg et al., 1995), i.e.:

Q \geq \tfrac{1}{2} \min(V_{TT}, V_{TV–VV}), \qquad TT_{cog} \in Q, \qquad TV–VV_{cog} \in Q.

This was fulfilled for all three intersections (overlaps), as seen by comparing Tables 2 and 4.
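The Ledberg et al. (1995) criterion above can be sketched directly on Boolean cluster masks; this is an illustrative reimplementation (function names and the rounding of centers of gravity to voxel indices are assumptions), not the original analysis code:

```python
import numpy as np

def center_of_gravity(mask):
    """Voxel index nearest the mean coordinate of a Boolean cluster."""
    return tuple(np.round(np.mean(np.argwhere(mask), axis=0)).astype(int))

def same_field(cluster_a, cluster_b):
    """Ledberg-style criterion: two Boolean 3D clusters reflect the
    same field if their overlap Q is at least half the volume of the
    smaller cluster and both centers of gravity fall inside Q."""
    q = cluster_a & cluster_b
    if q.sum() < 0.5 * min(cluster_a.sum(), cluster_b.sum()):
        return False
    return bool(q[center_of_gravity(cluster_a)]) and \
           bool(q[center_of_gravity(cluster_b)])
```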
VV–control ∩ TV–TT
This intersection of two independent cluster images was supposed to show the areas specifically engaged in the formation of the visual representation of the stimuli (Fig. 6). All clusters of activation were found in the occipital lobes (Table 5). The lingual gyri and collateral sulci were activated bilaterally, as were the fusiform gyri. Other foci of activation were found in the superior occipital gyrus of the right hemisphere and, in the left hemisphere, in the middle occipital gyrus, the cuneus, the cortex lining the parieto-occipital sulcus, and the cortex lining the calcarine sulcus. Of these overlaps, only the one in the middle occipital gyrus fulfilled the criterion: the overlap produced by the two clusters was greater than half the volume of the smaller of the two clusters, and the centers of gravity of both clusters were included in the overlap.
TV–TT ∩ TV–VV
This intersection image was supposed to show the areas engaged in cross-modal transfer of information (Fig. 7). Only one cluster of activation was found, situated in the right insula–claustrum, with a center of gravity toward the claustrum (Fig. 8, Table 6). Here also the centers of gravity of the two clusters were included in the overlap.
TT–control ∩ TV–control ∩ VV–control
This intersection image was intended to isolate polysensory areas engaged in the processing of somatosensory and visual shape information, regardless of the modality. We did not find any significant cluster of activation in this intersection.
This experiment, in which each subject performed three runs of four different tasks, gave us the opportunity to study, within the same group of subjects, different aspects of the somatosensory and visual processing of shape and to isolate the structures involved in cross-modal transfer of shape.
The motor measurements showed that the matching conditions were mutually balanced and balanced against the control condition for motor activity of the right hand. Accordingly, no activations appeared in any of the motor cortices in any of the subtractions.
In addition, the signal energies for the somatosensory and visual shape stimulation (Roland and Mortensen, 1987) were also balanced across the three matching conditions. Accordingly, we observed no changes in visual cortices in TV–VV and no changes in somatosensory cortices in TT–TV. One might argue that attention in TV was divided between the somatosensory and the visual modalities, whereas it was allocated to the visual modality in VV and the somatosensory modality in TT. However, to balance the allocation of attention toward the somatosensory and the visual modality between conditions TV and TT and TV and VV, the number of matchings in TV was twice the number during TT and VV.
The performance of the subjects, reflected by their d′ values, was balanced across matching conditions. This made it unlikely that differences between conditions could be interpreted as differences in attention and/or task difficulty. The attentional effects on rCBF may nevertheless vary between TT and VV: cross-modal attentional effects in visual tasks tend to decrease the rCBF in somatosensory areas, and somatosensory tasks tend to decrease rCBF in visual areas (Haxby et al., 1994; Kawashima et al., 1995). It is not unlikely that the greater pace of the TV condition compared with the TT and VV conditions implied a higher rate of switching attention from the tactile to the visual modality and vice versa. This could be a possible explanation for the pulvinar activation seen in TV–TT and TV–VV (Petersen et al., 1985, 1987).
Tables 2 and 4 show the fields activated every time the subjects perceived the ellipsoids tactually. Fields specifically engaged in the haptic processing of the ellipsoids were found by computing the intersection of the cluster images of TT–control and TV–VV. They were located in the contralateral postcentral gyrus, the superior parietal lobule, and the cortex lining the anterior part of the intraparietal sulcus. The cortex lining the postcentral sulcus and the anterior part of the intraparietal sulcus, and the cortex of the anterior part of the superior parietal lobule, have in other studies been activated specifically during haptic processing of the shape and length of objects (Roland and Larsen, 1976; Seitz et al., 1991; O’Sullivan et al., 1994; Roland et al., 1996). Together with the present results, this strongly indicates that neurons in these regions are engaged in the formation of the haptic shape representation. We use the expression haptic processing of shape to note that, although the experimental design attempted to balance the allocation of attention between the somatosensory and visual modalities, we cannot exclude that in the intersection TT–control ∩ TV–VV the effect of somatosensory attention to shape is inseparable from the processing of shape information.
Structures involved in visual processing of the shape of the ellipsoids, revealed in VV–control, TV–control, and TV–TT, showed similar patterns of activation of the visual cortex. Fields solely engaged in visual processing of the shapes of the objects, and to some extent attention to visual object shape, were isolated in the intersection of the two independent cluster images of VV–control with TV–TT, which revealed several areas in the occipital cortex. All areas—lips of the calcarine sulcus, cuneus, lingual gyrus, and the cortex lining the collateral sulcus, fusiform gyrus, and the middle occipital gyrus—have been described with different methods as being visual areas (Clarke and Miklossy, 1990; Zilles and Schleicher, 1993; Clarke, 1994a,b; Hadjikhani et al., 1994; Hadjikhani, 1995; Clarke et al., 1995; Sereno et al., 1995; Van Essen et al., 1995a,b). Of these regions, the lingual, fusiform, and occipital gyri have been activated by perception or discrimination of visual form and geometrical patterns (Gulyas and Roland, 1994; Gulyas et al., 1994; Roland and Gulyas, 1995).
We did not find any polymodal areas, i.e., fields of activation present consistently in tactile–tactile, tactile–visual, and visual–visual matching versus control. However, we found an area consistently activated in the two subtractions of tasks involving cross-modal transfer, i.e., TV–TT and TV–VV; it was situated in the right insula–claustrum.
The neural structures participating in cross-modal transfer have long been a matter of debate, because lesion studies were never able to point to a particular structure consistently involved when cross-modal deficits were present. With notable exceptions (Ettlinger and Wilson, 1990), cross-modal research has generally been based on the assumption that there must be a special process to deal with the confluence of different sensory inputs in “polysensory convergence areas” (Pandya and Kuypers, 1969; Jones and Powell, 1970; Petrides and Iversen, 1976). The earliest attempts to study the effects of brain lesions on cross-modal performance used the cross-modal recognition method of Cowey and Weiskrantz (1975). Sahgal et al. (1975) and Petrides and Iversen (1976) reported impairment in cross-modal (tactile–visual) matching abilities after posterior temporal and prestriate removal and after lesions of the arcuate sulcus cortex. In more recent studies, authors have used a different cross-modal recognition paradigm (Jarvis and Ettlinger, 1977) in which monkeys learn cross-modality (vision or touch) discrimination tasks. Cortical lesions involving the superior temporal sulcus and the lateral prefrontal region in monkeys did not produce deficits (Ettlinger and Garcha, 1980). Streicher and Ettlinger (1987) examined cross-modal performance for entirely new and unfamiliar objects. Lesions in the frontal, temporal, and parietal cortex gave rise to impairment of cross-modal recognition of unfamiliar objects, despite normal performance on familiar objects.
Thus lesions to the cortex claimed to be polymodal, e.g., the cortex lining the superior temporal sulcus, the intraparietal sulcus, the amygdala, and the lateral prefrontal cortex, have failed to abolish cross-modal matching consistently and specifically (Cowey and Weiskrantz, 1975; Sahgal et al., 1975; Petrides and Iversen, 1976; Jarvis and Ettlinger, 1977; Ettlinger and Garcha, 1980; McNally et al., 1982; Murray and Mishkin, 1984; Streicher and Ettlinger, 1987; Nahm et al., 1993). With the exception of the prefrontal cortex, none of these areas was activated by TV, and neither was the superior colliculus, also claimed to be a polymodal structure (Stein et al., 1976). The amygdala and the cortex lining the superior temporal sulcus were not activated in cross-modal matching (i.e., the intersection of TV–TT and TV–VV), even when the threshold was set quite liberally. Ettlinger and Wilson (1990) suggested an alternative model for the mechanism of cross-modal performance, positing a so-called "leakage" between perceptual and memory systems. On the basis of a 2-deoxyglucose study in monkeys trained to a high level of cross-modal performance, they suggested that one pathway for such leakage may be through the ventral claustrum (Hörster et al., 1989).
Then how does the brain match visual shape with somatosensory shape? Is it possible that the cortical fields representing visual shape communicate with the cortical fields representing somatosensory shape? Studies in monkeys offer no support for such an anatomical arrangement. The possible candidates, areas 7a and 7b, are not interconnected to any significant extent (Cavada and Goldman-Rakic, 1989; Andersen et al., 1990; Neal et al., 1990), and we do not know where the homologs of areas 7a and 7b lie in humans. Neurophysiological studies in monkeys support the notion of parallel processing of visual and tactile shape: in a TV task, neurons in the somatosensory cortex react only to the tactile components, immediately and during the short delay, whereas neurons in the visual association cortex react only to the visual components (Maunsell et al., 1991; Zhou and Fuster, 1996).
The left lateral prefrontal cortex was activated in TT–control, VV–control, and TV–control, the only consistent example of activations in nonsomatosensory, nonvisual cortex (Tables 3, 4). Although these functional fields were located near each other, they did not overlap. Thus the lateral prefrontal cortex may be engaged in the matching of two stimuli, but stimuli originating from different modalities do not engage the same prefrontal functional field. That the active fields did not overlap, however, does not exclude the possibility that they communicate. Indeed, bilateral cooling of a larger lateral prefrontal region in monkeys reversibly hampers not only TV and VT cross-modal matching but also TT matching (Shindy et al., 1994).
Direct communication between cortical fields representing visual shape and cortical fields representing somatosensory shape, or communication of each of these sets of representative fields with a third common cortical field, might not be necessary for the matching. However, if one wants to advocate fully parallel processing of somesthesis and vision, it is difficult to envisage how the matching is actually achieved. On the basis of our results, a speculative solution is that the populations of neurons associated with somatosensory processing of shape synchronize their activity with the populations associated with visual processing (Singer, 1995). If this is so, then communication must exist in at least one location to facilitate this synchronization.
In previous experimental studies, short-term memory components have been a confounding factor. In our approach, TV did not contain any memory component, whereas TT and VV had a short delay between the first and second object. Because we observed no changes or decreases in the claustrum–insula region in TT–control or in VV–control, the consistent activation of the claustrum–insula cannot be attributed to a memory component. Although unlikely, it cannot be excluded that the claustrum–insula activation may be partly attributable to the fact that the number of matchings in TV was twice that in TT and VV. Against this is the fact that the claustrum–insula was not active in TT–control or VV–control.
Other studies may support the idea of the involvement of the claustrum in cross-modal transfer of information. The claustrum is best developed in primates, cetaceans, and carnivores, and its size is roughly proportional to cortical volume. The claustrum is connected with virtually all of the cerebral cortex. Its connections have been studied mostly in cats and nonhuman primates (Neal et al., 1986; Hinova-Palova et al., 1988; Hörster et al., 1989; Cortimiglia et al., 1991; Boussaoud et al., 1992; Clasca et al., 1992; Morecraft et al., 1992; Baizer et al., 1993; Steele and Weller, 1993; Tokuno and Tanji, 1993; Updyke, 1993; Webster et al., 1993) (for review, see Sherk, 1986). These studies show that the claustrum receives and gives rise to direct cortical projections and that it contains maps of different sensory (visual, auditory, and somatosensory) and motor systems.
A recent retrograde-labeling study of claustral projections to S1 and V1 by Minciacchi et al. (1995) in the cat showed a clear topographic organization composed of two parts. In the somatosensory claustrum, there is a progression of cells projecting to the hindpaw, forepaw, and face representations. The visual claustrum has a retinotopic organization, and claustral neurons project in a retinotopic manner to corresponding parts of V1. A second pattern of claustral projections is composed of neurons distributed diffusely throughout the nucleus; in both the somatosensory and the visual claustrum, these intermingle with the topographically projecting cells.
In the monkey, Webster et al. (1993) demonstrated that the portions of the claustrum connected with TEO and TE appear to overlap those connected with other cortical areas, including V1, V2, V4, MT, MST, inferior prefrontal cortex, the frontal eye fields, and posterior parietal cortex. Tokuno and Tanji (1993) showed reciprocal connections between the primary motor area and the claustrum in the monkey, and Baizer et al. (1993) demonstrated in the monkey that cells in the claustrum project to both temporal and parietal cortex and that there are two representations of the face and hand.
We found the insula–claustrum consistently active only when somatosensory shape representations were compared with visual shape representations, whereas we did not find any polymodal areas active during the processing of both somatosensory and visual shape information. This study and others support the involvement of the insula–claustrum in cross-modal transfer of information. The claustrum may be a site of organized and direct interaction between modality-specific areas. Because only the somatosensory areas were specifically active in the formation of the somatosensory representation of shape, and because only the visual areas were specifically active in the formation of the visual representation of shape, we propose that, instead of being based on modality-nonspecific representations in polysensory areas, cross-modal transfer takes place between modality-specific areas, and that these modality-specific areas can communicate via the claustrum. This does not, however, exclude the possibility that communication for the purpose of matching exists at other locations, for example, in the prefrontal cortex. Because the claustrum is a small nucleus that is difficult to distinguish from the insula with PET, further studies with more sensitive techniques are needed to confirm our hypothesis that the claustrum plays a crucial role in cross-modal transfer of information.
This work was supported by the Swiss National Foundation for Scientific Research, the Société Académique Vaudoise, and the Volvo Foundation.
Correspondence should be addressed to Dr. Nouchine Hadjikhani, Massachusetts General Hospital-Nuclear Magnetic Resonance Center, Building 149, 13th Street, Charlestown, MA 02129.