Perception of the acoustic world requires the simultaneous processing of the acoustic patterns associated with sound objects and their location in space. In this functional magnetic resonance experiment, we investigated the human brain areas engaged in the analysis of pitch sequences and sequences of acoustic spatial locations in a paradigm in which both could be varied independently. Subjects were presented with sequences of sounds in which the individual sounds were regular interval noises with variable pitch. Positions of individual sounds were varied using a virtual acoustic space paradigm during scanning. Sound sequences with changing pitch specifically activated lateral Heschl's gyrus (HG), anterior planum temporale (PT), planum polare, and superior temporal gyrus anterior to HG. Sound sequences with changing spatial locations specifically activated posteromedial PT. These results demonstrate directly that distinct mechanisms for the analysis of pitch sequences and acoustic spatial sequences exist in the human brain. This functional differentiation is evident as early as PT: within PT, pitch pattern is processed anterolaterally and spatial location is processed posteromedially. These areas may represent human homologs of macaque lateral and medial belt, respectively.
Considerable controversy surrounds the anatomical and functional organization of the human cortical auditory system (Cohen and Wessinger, 1999; Belin and Zatorre, 2000; Romanski et al., 2000; Middlebrooks, 2002; Zatorre et al., 2002). In nonhuman primates, distinct ventral “what” and dorsal “where” auditory processing streams have been proposed on electrophysiological grounds (Kaas and Hackett, 2000; Rauschecker and Tian, 2000; Tian et al., 2001). In humans, anatomical (Galaburda and Sanides, 1980; Rivier and Clarke, 1997; Galuske et al., 1999; Tardif and Clarke, 2001), functional imaging (Alain et al., 2001; Maeder et al., 2001; Warren et al., 2002), electrophysiological (Alain et al., 2001; Anourova et al., 2001) and lesion (Clarke et al., 2000) data are consistent with an anterior auditory cortical what pathway that processes sound object information and a posterior where pathway that processes spatial information. However, the extent and functional basis of any such separation of processing remains contentious (Cohen and Wessinger, 1999; Belin and Zatorre, 2000; Middlebrooks, 2002; Zatorre et al., 2002). Representative previous human functional imaging studies of auditory what and where processing are summarized in a supplemental table (available at www.jneurosci.org).
It has recently been proposed that the human planum temporale (PT) plays a critical role in disambiguating the intrinsic properties of sounds from the acoustic correlates of spatial location, before further processing of those specific attributes in distinct cortical areas (Griffiths and Warren, 2002). PT is a large region of auditory association cortex, occupying the superior temporal plane posterior to Heschl's gyrus (HG) (Westbury et al., 1999). PT is involved in processing many different types of sound patterns, including both intrinsic spectrotemporal features of sound objects and auditory spatial information (Griffiths and Warren, 2002). Taken together, the results of a number of functional imaging studies [summarized by Griffiths and Warren (2002)] suggest that distinct subregions for processing particular sound attributes may exist within human PT; however, its functional architecture has not been established (Recanzone, 2002).
In this functional magnetic resonance imaging (fMRI) experiment, we tested the hypothesis that there are distinct cortical substrates for processing pitch patterns and the location of sounds in space by comparing directly the processing of sequences of pitch and sequences of spatial positions. Specifically, we hypothesized that pitch sequences are processed in a network of areas including lateral HG, PT, and planum polare (PP) (Patterson et al., 2002), whereas spatial information is processed in a posterior network that includes PT and inferior parietal lobe (IPL) (Pavani et al., 2002; Warren et al., 2002; Zatorre et al., 2002). We predicted a common involvement of PT in both tasks and were interested specifically in the possibility that distinct subregions of PT may be associated with each task. The stimuli were sequences of sounds with temporal regularity and associated pitch [iterated ripple noise (IRN)] presented in virtual space. Like natural sound objects, these broadband stimuli can be localized accurately in external acoustic space. However, their associated pitch and spatial characteristics can be varied independently in a factorial experimental design.
Materials and Methods
Stimuli were based on either IRN or fixed amplitude, random phase noise with passband 1 Hz to 10 kHz, created digitally at a sampling rate of 44.1 kHz. Stimuli were convolved with generic head-related transfer functions (HRTFs) (Wightman and Kistler, 1989) to create a percept of external location in virtual acoustic space. Sounds were combined in sequences containing either 25 or 23 elements in which the duration of each individual element was fixed at 250 msec with an intersound pause of 75 msec. The pitch of the IRN stimuli either remained fixed throughout the sequence or was varied randomly among the first six elements of a 10-note octave spanning 70–140 Hz. Sounds were located at one of four initial spatial positions: 0, 90, 180, or -90° in azimuth. The spatial location of the sound either remained fixed or was varied randomly from element to element. Sequences with changing spatial location were generated from four different combinations of azimuthal positions: the step between successive azimuthal positions could be ±20, 30, or 40° in size, and the order and direction (clockwise or counterclockwise) of steps was randomized. The pitch of the first and last element and the spatial location of the first and last element were constrained to be identical in any given sequence. The experimental paradigm is represented schematically in Figure 1.
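The sequence constraints described above can be sketched in Python as follows. This is a minimal illustration and not the stimulus code used in the experiment. The logarithmic spacing of the 10-note octave, the choice of a random fixed pitch, the mixing of step sizes within one sequence, the rejection sampling used to return to the starting azimuth, and all function names are assumptions. IRN synthesis and HRTF convolution are omitted.

```python
import random

STEP_SIZES = (20, 30, 40)            # azimuth step magnitudes in degrees (from the text)
START_AZIMUTHS = (0, 90, 180, -90)   # initial positions in degrees (from the text)


def pitch_set(low_hz=70.0, notes_per_octave=10, n_used=6):
    """First six notes of a 10-note octave starting at 70 Hz.

    ASSUMPTION: notes are spaced logarithmically (equal frequency ratios).
    """
    return [low_hz * 2 ** (k / notes_per_octave) for k in range(n_used)]


def wrap_azimuth(deg):
    """Map any angle in degrees to the interval (-180, 180]."""
    wrapped = (deg + 180) % 360 - 180
    return 180 if wrapped == -180 else wrapped


def pitch_sequence(n, changing, rng):
    """Pitch (Hz) of each element; first and last elements are identical."""
    pitches = pitch_set()
    if not changing:
        # ASSUMPTION: the fixed pitch is drawn from the same six notes.
        return [rng.choice(pitches)] * n
    seq = [rng.choice(pitches) for _ in range(n - 1)]
    seq.append(seq[0])  # constrain the last element to match the first
    return seq


def azimuth_sequence(n, changing, rng):
    """Azimuth (degrees) of each element; first and last are identical."""
    start = rng.choice(START_AZIMUTHS)
    if not changing:
        return [start] * n
    # ASSUMPTION: redraw the random steps until they sum to a whole number
    # of turns, so that the sequence ends where it began.
    while True:
        steps = [rng.choice((-1, 1)) * rng.choice(STEP_SIZES)
                 for _ in range(n - 1)]
        if sum(steps) % 360 == 0:
            break
    seq = [start]
    for step in steps:
        seq.append(wrap_azimuth(seq[-1] + step))
    return seq


rng = random.Random(0)  # seeded only to make the sketch reproducible
example_pitches = pitch_sequence(25, True, rng)
example_azimuths = azimuth_sequence(25, True, rng)
```

Because pitch and azimuth are drawn by separate functions, either attribute can be held fixed while the other varies, which mirrors the factorial manipulation described in the text.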
Subjects (five males, seven females) were aged 23–38. All were right-handed. None had any history of hearing or neurological disorder, and all had normal structural MRI scans. All subjects gave informed consent, and the experiment was performed with the approval of the local Ethics Committee.
During fMRI scanning, stimuli were delivered using a custom electrostatic system (http://www.ihr.mrc.ac.uk/caf/soundsystem/index.shtml) at a sound pressure level of 70 dB. Blood oxygenation level-dependent (BOLD) contrast images were acquired at 2 T (Siemens Vision, Erlangen, Germany) using gradient echo planar imaging in a sparse protocol (repetition time/echo time = 12,000/40 msec) (Hall et al., 1999). Each volume comprised 48 contiguous 4 mm slices with an in-plane resolution of 3 × 3 mm. Seven stimulus conditions, each corresponding to a different type of sound sequence and a distinct percept, were used (Fig. 1): (1) IRN with fixed pitch and fixed spatial position (fixed pitch notes with fixed location in azimuth); (2) IRN with changing pitch and fixed spatial position (changing pitch notes at a fixed azimuthal location); (3) IRN with fixed pitch and changing spatial position (fixed pitch notes at a sequence of azimuthal locations); (4) IRN with changing pitch and changing spatial position (changing pitch notes at a sequence of azimuthal locations); (5) fixed amplitude random phase noise with fixed spatial position (a noise burst at a fixed azimuthal location); (6) fixed amplitude random phase noise with changing spatial position (a noise burst at a sequence of azimuthal locations); (7) silence. Subjects were pretested before scanning with examples of stimuli based on each generic HRTF to select the HRTF that gave the most reliable percept of an external sound source during scanning. All subjects perceived the stimuli used during scanning as originating from locations outside the head. In sequences during which spatial location varied, the percept was an instantaneous “jump” between consecutive positions. Sequences were presented in randomized order. Two hundred twenty-four brain volumes were acquired for each subject (16 volumes for each condition, in two sessions). Subjects were asked to attend to the sound sequences. 
To help maintain alertness, they were required to make a single button press with the right hand at the end of each sequence (25 element and 23 element sequences were presented in random order) and to fixate a cross piece at the midpoint of the visual axes.
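The timing and volume counts implied by these parameters can be checked with a short calculation. The sketch below assumes that the 75 msec pause occurs only between elements and not after the last one; the function name is illustrative.

```python
ELEMENT_MS = 250   # duration of each sound (from the text)
PAUSE_MS = 75      # intersound pause (from the text)
TR_MS = 12_000     # repetition time of the sparse protocol (from the text)


def sequence_duration_ms(n_elements):
    """Total duration of a sequence, with pauses only between elements."""
    return n_elements * ELEMENT_MS + (n_elements - 1) * PAUSE_MS


long_ms = sequence_duration_ms(25)    # 25 * 250 + 24 * 75
short_ms = sequence_duration_ms(23)   # 23 * 250 + 22 * 75

# Seven conditions, 16 volumes per condition per session, two sessions.
total_volumes = 7 * 16 * 2

print(long_ms, short_ms, total_volumes)  # prints: 8050 7400 224
```

Under this assumption both sequence lengths (8.05 and 7.4 sec) fit within a single 12 sec repetition, and the product of conditions, volumes, and sessions matches the 224 volumes reported per subject.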
Each subject's ability to detect changes in pitch pattern, changes in spatial pattern, or simultaneous changes in both types of pattern was assessed psychophysically immediately after scanning using a two-alternative, forced-choice procedure. Subjects listened to pairs of sound sequences in which each sequence contained seven elements that varied either in pitch or spatial location or both simultaneously. The task was to detect a single difference in pitch or spatial pattern associated with changing one element between the members of each pair. Psychophysical test sequences were based on the same pitch and spatial parameters as those used during scanning; noise-based versions were also included. All subjects could easily detect sequences that differed only in pitch pattern (mean correct response rate 84%), sequences that differed only in spatial pattern (mean correct response rate 78%), and sequences that differed in both pitch and spatial pattern (mean correct response rate 78%). One-way ANOVA did not show any effect of trial type on performance at the p < 0.05 significance threshold.
Imaging data were analyzed for the entire group and for each individual subject using statistical parametric mapping implemented in SPM99 software (http://www.fil.ion.ucl.ac.uk/spm). Scans were first realigned and normalized spatially (Friston et al., 1995) to the Montreal Neurological Institute (MNI) standard stereotactic space (Evans et al., 1993). Data were smoothed spatially with an isotropic Gaussian kernel of 8 mm full width at half maximum. Statistical parametric maps (SPMs) were generated by modeling the evoked hemodynamic response for the different stimuli as boxcars convolved with a synthetic hemodynamic response function in the context of the general linear model.
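The idea of a boxcar convolved with a synthetic hemodynamic response function can be illustrated with a minimal Python sketch. It is not the SPM99 implementation. The double-gamma shape and its parameters (shapes 6 and 16, undershoot ratio 1/6), the 0.1 sec time grid, and the 8.05 sec stimulus duration are assumptions chosen for illustration.

```python
import math


def gamma_pdf(t, shape, scale=1.0):
    """Gamma probability density; zero for non-positive times."""
    if t <= 0:
        return 0.0
    x = t / scale
    return x ** (shape - 1) * math.exp(-x) / (math.gamma(shape) * scale)


def double_gamma_hrf(t, peak_shape=6.0, under_shape=16.0, ratio=1 / 6):
    """ASSUMED synthetic HRF: a positive gamma lobe minus a scaled undershoot."""
    return gamma_pdf(t, peak_shape) - ratio * gamma_pdf(t, under_shape)


def boxcar(onset_s, duration_s, total_s, dt):
    """1 while the stimulus is on, 0 otherwise, sampled every dt seconds."""
    n = int(round(total_s / dt))
    return [1.0 if onset_s <= i * dt < onset_s + duration_s else 0.0
            for i in range(n)]


def convolve(signal, kernel, dt):
    """Discrete causal convolution, truncated to the length of the signal."""
    out = [0.0] * len(signal)
    for i, s in enumerate(signal):
        if s == 0.0:
            continue
        for j, k in enumerate(kernel):
            if i + j >= len(signal):
                break
            out[i + j] += s * k * dt
    return out


dt = 0.1                                              # seconds per sample
hrf = [double_gamma_hrf(i * dt) for i in range(int(32 / dt))]
stimulus = boxcar(onset_s=0.0, duration_s=8.05, total_s=40.0, dt=dt)
regressor = convolve(stimulus, hrf, dt)               # predicted BOLD time course
peak_time_s = regressor.index(max(regressor)) * dt    # latency of the predicted peak
```

In a general linear model, one such regressor per stimulus condition forms a column of the design matrix, and the fitted weight on each column estimates the response to that condition.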
In the group analysis, BOLD signal changes between conditions of interest were assessed using a random effects model that estimated the second level t statistic at a significance threshold of p < 0.05 after false discovery rate correction for multiple comparisons (Genovese et al., 2002). Individual subject data were analyzed to further assess the anatomical variability of pitch and auditory spatial processing within the group. In the analysis of each individual subject, BOLD signal changes between conditions of interest were assessed by estimating the t statistic for each voxel at a significance threshold of p < 0.05 after small volume correction taking the a priori anatomical hypotheses into account. For the pitch conditions, anatomical small volumes that included right and left lateral HG, PP, and PT were derived from the group mean normalized structural MRI brain volume and 95% probability maps for left and right human PT (Westbury et al., 1999). For the spatial conditions, anatomical small volumes were based on 95% probability maps for left and right human PT (Westbury et al., 1999).
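The false discovery rate correction can be illustrated with the Benjamini-Hochberg step-up rule on which this approach is based. The sketch below is a minimal illustration and not the SPM99 code. It assumes the simple form of the rule (independent or positively dependent tests), and the p values in the example are hypothetical.

```python
def fdr_threshold(p_values, q=0.05):
    """Return the largest p value that survives Benjamini-Hochberg control
    of the false discovery rate at level q, or None if no test survives.

    The sorted p values p(1) <= ... <= p(m) are compared with q * i / m;
    the threshold is the largest p(i) that satisfies p(i) <= q * i / m.
    """
    ranked = sorted(p_values)
    m = len(ranked)
    threshold = None
    for i, p in enumerate(ranked, start=1):
        if p <= q * i / m:
            threshold = p
    return threshold


# HYPOTHETICAL voxel-wise p values, for illustration only.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042,
             0.060, 0.074, 0.205, 0.212, 0.216]
print(fdr_threshold(example_p, q=0.05))  # prints: 0.008
```

Voxels with p values at or below the returned threshold are declared significant. In this example only the two smallest values survive, although five of the ten values lie below an uncorrected threshold of 0.05.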
Results
In the group random effects analysis, significant activation was demonstrated in each of the contrasts of interest at the p < 0.05 voxel level of significance after false discovery rate correction for multiple comparisons. Broadband noise (without pitch) compared with silence produced extensive bilateral superior temporal activation, including medial HG (Fig. 2b, center). The contrasts between conditions with changing pitch and fixed pitch (main effect of pitch change) and between all conditions (both pitch and noise) with changing spatial location and fixed location (main effect of spatial change) produced specific activations restricted to distinct anatomical regions on the superior temporal plane (Fig. 2a,b). Pitch changes (but not spatial location changes) produced bilateral activation involving lateral HG, anterior PT, and PP anterior to HG, extending into superior temporal gyrus. Lateral HG activation lay outside the 95% probability boundaries for primary auditory cortex (PAC) as defined by Rademacher et al. (2001). In contrast, spatial location changes (but not pitch changes) produced bilateral activation involving posterior PT. Within PT (Fig. 2b), activation attributable to pitch change occurred anterolaterally, whereas activation attributable to spatial change occurred posteromedially. Local maxima in the superior temporal plane for each of the main effects are listed in Table 1. Within PT, local maxima for spatial change were clearly posterior bilaterally to those for pitch change. For pitch change, additional local maxima occurred anteriorly in right PP and left lateral HG. Although no local maxima occurred in left PP and right lateral HG, these regions were clearly also activated by pitch change (Fig. 2a,b). Only a small number of voxels within PT were activated by both pitch changes and spatial location changes (Fig. 2a,b). No interactions were observed between the pitch and spatial change conditions.
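The contrasts described here can be written as weight vectors over the seven stimulus conditions. The Python sketch below shows one plausible coding. The weights are inferred from the verbal description of the contrasts and are assumptions, as are the variable names.

```python
# Conditions in the order listed in Materials and Methods:
# (label, sound type, pitch changes?, location changes?)
CONDITIONS = [
    ("IRN, fixed pitch, fixed location",       "irn",   False, False),
    ("IRN, changing pitch, fixed location",    "irn",   True,  False),
    ("IRN, fixed pitch, changing location",    "irn",   False, True),
    ("IRN, changing pitch, changing location", "irn",   True,  True),
    ("noise, fixed location",                  "noise", None,  False),
    ("noise, changing location",               "noise", None,  True),
    ("silence",                                None,    None,  None),
]


def sign(flag):
    """+1 for a changing attribute, -1 for a fixed one, 0 if not applicable."""
    if flag is None:
        return 0
    return 1 if flag else -1


# Main effect of pitch change: changing minus fixed pitch, IRN conditions only.
pitch_contrast = [sign(p) if sound == "irn" else 0
                  for _, sound, p, _ in CONDITIONS]

# Main effect of spatial change: changing minus fixed location,
# across both IRN and noise conditions.
space_contrast = [sign(loc) for _, _, _, loc in CONDITIONS]

# Pitch-by-space interaction within the 2 x 2 IRN design.
interaction_contrast = [sign(p) * sign(loc) if sound == "irn" else 0
                        for _, sound, p, loc in CONDITIONS]

# Broadband noise versus silence.
noise_vs_silence = [0, 0, 0, 0, 1, 1, -2]

print(pitch_contrast)        # prints: [-1, 1, -1, 1, 0, 0, 0]
print(space_contrast)        # prints: [-1, -1, 1, 1, -1, 1, 0]
print(interaction_contrast)  # prints: [1, -1, -1, 1, 0, 0, 0]
```

Each vector sums to zero, so every contrast tests a difference between conditions and not the overall response to sound.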
For both the main effect contrasts of interest, the group SPMs for left and right cerebral hemispheres were compared in a random effects analysis using a paired t test thresholded at the p < 0.05 voxel level after small volume correction taking the a priori anatomical hypotheses into account. For the main effect of pitch, anatomical small volumes were based on right and left lateral HG, PP, and PT (derived from the group mean normalized structural MRI brain volume) and 95% probability maps for left and right human PT (Westbury et al., 1999); for the main effect of space, anatomical small volumes were based on 95% probability maps for left and right human PT (Westbury et al., 1999). The distributions of activation did not differ significantly between cerebral hemispheres for either pitch or spatial processing.
Individual subject analyses (using a voxel significance threshold of p < 0.05 after small volume correction) showed activation patterns similar to the group analysis. Pitch change produced local maxima within the prespecified region (contiguous areas in each hemisphere comprising lateral HG, PT, and PP) in 10 of 12 individual subjects. Changing spatial location produced local maxima within the prespecified region (PT in each hemisphere) in all individual subjects.
Discussion
This study has demonstrated distinct human brain substrates for the analysis of pitch sequences and acoustic spatial sequences in a single fMRI paradigm. These substrates comprise secondary and association auditory cortical areas beyond PAC in medial HG (Rademacher et al., 2001). A bilateral anterior network of areas dedicated to the processing of pitch sequences includes lateral HG, anterior PT, PP, and superior temporal gyrus, whereas a bilateral posterior network dedicated to the processing of spatial sequences includes posteromedial PT.
The present findings are consistent with proposed dual what and where processing pathways in the macaque (Kaas and Hackett, 2000; Rauschecker and Tian, 2000; Tian et al., 2001) and the increasing evidence for distinct anterior and posterior auditory networks emerging from human anatomical (Galaburda and Sanides, 1980; Rivier and Clarke, 1997; Galuske et al., 1999; Tardif and Clarke, 2001), functional imaging (Alain et al., 2001; Maeder et al., 2001; Warren et al., 2002), electrophysiological (Alain et al., 2001; Anourova et al., 2001), and lesion (Clarke et al., 2000) studies. In humans, the anterior network (including PP, anterior superior and middle temporal gyri, and superior temporal sulcus) has been implicated in the analysis (what) of many different types of spectrotemporal pattern, including simple spectral and temporal patterns (Griffiths et al., 1998b; Binder et al., 2000; Thivard et al., 2000; Zatorre and Belin, 2001; Hall et al., 2002; Patterson et al., 2002), musical melodies (Zatorre et al., 1994, 1996), vocal sounds (Belin et al., 2000), and speech (Zatorre et al., 1992; Scott et al., 2000; Vouloumanos et al., 2001; Wise et al., 2001). The posterior network including IPL is active in the spatial (where) analysis of both stationary (Alain et al., 2001) and moving (Baumgart et al., 1999; Warren et al., 2002) sounds. The present experiment has demonstrated distinct human auditory cortical mechanisms that are simultaneously and specifically engaged in processing different properties of sound sequences. The mechanism for processing pitch pattern is situated anteriorly, whereas the mechanism for processing spatial pattern is situated posteriorly.
Bilateral activation of the hemispheric networks that process auditory spatial and pitch sequences is evident in the present study (Fig. 2). For both pitch processing and spatial sequence processing, the distributions of activation did not differ significantly between the left and right cerebral hemispheres. Previous studies of auditory spatial processing have suggested bilateral (Pavani et al., 2002; Warren et al., 2002) or right-lateralized (Baumgart et al., 1999) activation of PT. For the processing of pitch sequences and chords, a more consistent pattern of right-lateralized activation in superior temporal lobe areas beyond PAC has been shown in a number of studies (Zatorre et al., 1994; Tervaniemi et al., 2000; Patterson et al., 2002). The contrast between random pitch and fixed pitch elements in the study of Patterson et al. (2002) is closest to the pitch change contrast used here. Patterson et al. (2002) also found bilateral activation of lateral PT and PP, although the rightward asymmetry of activation demonstrated in that study was not evident in the present experiment.
This study has shown that analysis of both pitch sequences and spatial sequences involves PT. Previous human functional imaging studies have indicated that PT is involved in the analysis of both the intrinsic spectrotemporal (Binder et al., 1996; Giraud et al., 2000; Thivard et al., 2000; Hall et al., 2002; Warren et al., 2002) and the spatial (Baumgart et al., 1999; Pavani et al., 2002; Warren et al., 2002) properties of many types of complex sounds (for review, see Griffiths and Warren, 2002). We have argued previously (Warren et al., 2002) that posteromedial PT activation is a neural correlate of the perception of acoustic space. In contrast, the network of parietal and frontal areas that have been activated inconsistently in previous studies of auditory spatial processing (Griffiths et al., 1998a, 2000; Baumgart et al., 1999; Bushara et al., 1999; Griffiths and Green, 1999; Weeks et al., 1999; Lewis et al., 2000; Alain et al., 2001; Maeder et al., 2001; Pavani et al., 2002; Warren et al., 2002; Zatorre et al., 2002) may have a role in auditory attention or (covert) motor preparation. The lack of an output task therefore may account for the absence of activation in this frontoparietal network in the present experiment.
In this study, we have demonstrated that patterns of pitch and auditory spatial location are analyzed at different sites within human PT. Pitch information is processed anterolaterally, whereas spatial information is processed posteromedially. Such functional differentiation is not evident in medial HG, the site of PAC (Rademacher et al., 2001). Although we do not dismiss the possibility that neurons within PAC may process acoustic correlates of spatial position (Toronchuk et al., 1992), the present evidence suggests that the processing of intrinsic and spatial sound properties diverges beyond PAC and as early as PT. These distinct functional subregions may correspond to the cytoarchitecturally distinct regions Te2 (medial) and Te3 (lateral) identified in the human posterior temporal plane (Morosan et al., 2001). Such a functional subdivision of human PT is consistent with anatomical and electrophysiological data in nonhuman primates. Auditory association cortices in humans and macaques share a number of cytoarchitectural features (Galaburda and Sanides, 1980). Functionally distinct medial (CM) and lateral (CL) belt areas have been described in the macaque posterior superior temporal plane (Tian et al., 2001). This region has been implicated in the analysis of sound source location (Leinonen et al., 1980; Recanzone, 2000) and proposed as the origin of an auditory dorsal stream for processing spatial information (Rauschecker and Tian, 2000). However, a certain subpopulation of neurons in area CL responds both to the spatial location of complex sounds and to specific call sounds (Tian et al., 2001). This observation and the present human evidence suggest that auditory association cortex may have a similar functional organization in humans and nonhuman primates. There is relative (rather than absolute) selectivity of medial belt areas for processing spatial information and lateral belt areas for processing object information. 
However, the electrophysiological properties of the medial portion of the posterior superior temporal plane are technically difficult to study in both humans and nonhuman primates. We therefore would hesitate to suggest a precise functional or anatomical homology between macaque CM and CL, human Te2 and Te3, and the posteromedial and anterolateral PT functional subregions in the present study.
The controversy surrounding the existence of dual what and where human auditory processing streams (Middlebrooks, 2002) was a major motivation for the present experiment. No account has satisfactorily reconciled the evidence, on the one hand, for a duality of processing streams and, on the other hand, for their mutual interdependence (Middlebrooks, 2002; Zatorre et al., 2002). On the basis of the present evidence, we propose a crucial role for human PT in gating auditory information between the two streams. Previously, we have hypothesized (Griffiths and Warren, 2002) that human PT acts as a “computational hub” that is able to disambiguate object from spatial information in complex sounds. According to this generative model, in performing its computations, PT both accesses learned representations in higher order cortical areas and also gates spatial and object-related information to those higher areas. The present study refines our earlier model of PT operation in two ways: it suggests anatomically distinct spatial (posteromedial) and object (anterolateral) processing mechanisms within PT and distinct communication between these and other cortical areas. Acoustic spatial information is processed in a well defined region of the posterior superior temporal plane, whereas the areas that process object properties (pitch patterns) are distributed along the anteroposterior axis of the superior temporal lobe, including both the posterior temporal plane and anterior auditory areas. According to our model of human PT function, deconvolution in the posterior superior temporal plane will yield spatial and object information for further processing in distinct pathways. However, we do not exclude the possibility, suggested by macaque work (Rauschecker and Tian, 2000), that there may be other direct inputs to the distributed object identification (what) network from PAC or thalamus.
The anterior–posterior distribution of object processing in our data is consistent with macaque electrophysiology (Tian et al., 2001). Specifically, object specificity in the macaque defined using a range of animal calls is present in both anterior and posterior belt areas but is shown in a smaller proportion of neurons in the posterior belt. We suggest that in both humans and nonhuman primates there are mechanisms for processing the spatial and object properties of complex sounds in different subregions of the posterior temporal plane and that these mechanisms access distinct cortical areas.
