Abstract
A central tenet in brain research is that early sensory cortex is modality specific, and, only in exceptional cases, such as deaf and blind subjects or professional musicians, is influenced by other modalities. Here we describe extensive cross-modal activation in the auditory cortex of two monkeys while they performed a demanding auditory categorization task: after a cue light was turned on, monkeys could initiate a tone sequence by touching a bar and then earn a reward by releasing the bar on occurrence of a falling frequency contour in the sequence. In their primary auditory cortex and posterior belt areas, we found many acoustically responsive neurons whose firing was synchronized to the cue light or to the touch or release of the bar. Of 315 multiunits, 45 exhibited cue light-related firing, 194 exhibited firing that was related to bar touch, and 268 exhibited firing that was related to bar release. Among 60 single units, we found one neuron with cue light-related firing, 21 with bar touch-related firing, and 36 with release-related firing. This firing disappeared at individual sites when the monkeys performed a visual detection task. Our findings corroborate and extend recent findings on cross-modal activation in the auditory cortex and suggest that the auditory cortex can be activated by visual and somatosensory stimulation and by movements. We speculate that the multimodal corepresentation in the auditory cortex has arisen from the intensive practice of the subjects with the behavioral procedure and that it facilitates the performance of audiomotor tasks in proficient subjects.
Introduction
A widely held assumption is that the auditory cortex, like other early sensory cortical areas, is unimodal and primarily involved in the processing of sounds and that the auditory modality is eventually integrated with other modalities in specific brain structures (Stein and Meredith, 1993). This view, however, is challenged by a number of studies on awake behaving animals. It has been found that responses of neurons in primary and secondary auditory cortical areas to acoustic stimuli depend on whether or not animals were performing an auditory detection task (Miller et al., 1972), on whether there was spatial attention to the sound source (Benson and Hienz, 1978), or on attention to specific frequencies or temporal targets (Fritz et al., 2003, 2005). Other studies reported that responses are modulated by auditory short-term memory (Gottlieb et al., 1989; Sakurai, 1994), long-term memory (Recanzone et al., 1993; Beitel et al., 2003; Suga and Ma, 2003; Weinberger, 2004), stimulus anticipation (Hocherman et al., 1981), attention (Hubel et al., 1959), audio-motor association (Vaadia et al., 1982; Durif et al., 2003), eye position (Werner-Reiss et al., 2003), reinforcement condition (Beaton and Miller, 1975), and vocal production (Müller-Preuss and Ploog, 1981; Eliades and Wang, 2003, 2005). Aside from nonauditory modulations of auditory responses, it has been found that neurons in the auditory cortex can respond to somatosensory (Schroeder et al., 2001; Fu et al., 2003) or to visual stimuli alone in normal (Schroeder and Foxe, 2002) or in experimentally cross-wired animals (Sur et al., 1990). Evidence for cross-modal activation of the auditory cortex also comes from noninvasive imaging studies in professional musicians (Bangert et al., 2001) and in deaf subjects (Finney et al., 2001).
Because all of the animal studies showing nonacoustic activation of the auditory cortex have been conducted on nonperforming or anesthetized subjects, very little can be inferred about possible functional roles of cross-modal activation in the auditory cortex.
In the present study, we searched for nonacoustic activation of the auditory cortex in highly trained monkeys while they performed an auditory categorization task. Within a positive reinforcement go/no go behavioral task paradigm, monkeys discriminated a rising from a falling pitch direction, independent of the initial frequency of the tones in the sequence (Brosch et al., 2004). We recorded neuronal firing from the auditory cortex while these monkeys were engaged in the task and analyzed whether and how the firing was related to various nonacoustic events of the behavioral procedure.
Materials and Methods
Behavioral procedure. Experiments were performed on two male adult Macaca fascicularis that had been trained to categorize the direction of a frequency change in a tone sequence (rising versus falling) (Brosch et al., 2004). Monkeys were seated in a custom-made primate restraining chair within a double-walled soundproof room (1202-A; IAC, Winchester, UK). The front compartment of the chair accommodated a red light-emitting diode (LED), a stainless steel touch bar, and a water spout, which were controlled by a computer. During a behavioral session, the monkeys were permanently connected to a battery-powered resistance meter. Contact with the touch bar closed the circuitry of the resistance meter (current of <5 μA). The changes in resistance (high or low) were signaled, via fiber optics, to the computer interface, which measured these changes with a temporal resolution of 1 μs. The water spout was connected through a plastic tube to a magnetic valve, which was located outside the soundproof room.
A trial started by turning on the light-emitting diode (see Fig. 1). This cue indicated that, within the following 3 s, the monkeys could grasp and hold a touch bar. Once they did so, a tone sequence was triggered 2.22 s later. The first three tones in the sequence had the same frequency. They were followed by three tones of lower frequency, either immediately or after three to six intermittent tones of higher frequency. The monkeys were immediately rewarded with ∼0.2 ml of water when they released the touch bar 240-1240 ms after the onset of the first tone of lower frequency. This onset will be termed the “go event.” After bar release, the cue light was turned off and the tone sequence was stopped, which could happen during either a 200 ms tone or one of the 200 ms silent intervals. This was followed by a 6 s intertrial period. When the monkeys prematurely released the touch bar before the go event, a 7 s timeout was added to the intertrial period. In case the monkeys did not release the touch bar during the entire tone sequence, the cue light was extinguished after the last of the three low-frequency tones in the sequence, and the 7 s timeout was applied. For the procedure, monkey F used his left hand, and monkey B used his right hand.
The frequency of the initial tones in the sequence varied randomly from trial to trial in steps of 0.5 octaves over a range of 4.5 octaves. The range was centered on a frequency that was fixed during a training session and that varied from 1 to 2 kHz between sessions, depending on the frequency selectivity of the neurons in a given session. Frequency changes in the sequence were 0.5 or 1 octaves. Tones had a duration of 200 ms with a rise/fall time of 10 ms and were presented at a level of ∼60 dB sound pressure level (SPL). Intertone intervals were 200 ms. The acquisition of this categorization task was very difficult for the monkeys and required ∼100,000 trials of training with strict temporal contingencies between the tone sequence, the cue light, behavior, and reward to reach a performance level of >70% (Brosch et al., 2004).
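As an illustration of the stimulus grid described above, the following minimal sketch lists the candidate initial frequencies for one session. It assumes that the 4.5 octave range is sampled in 0.5 octave steps with both endpoints included and placed symmetrically around the session's center frequency; the function name and parameters are hypothetical and are not taken from the original stimulus software.

```python
def initial_frequencies(center_hz, span_octaves=4.5, step_octaves=0.5):
    """Return candidate initial-tone frequencies (Hz) for one session.

    Assumption: the range is symmetric around center_hz on a log2 axis
    and both endpoints are included.
    """
    n_steps = round(span_octaves / step_octaves)
    # Offsets in octaves relative to the center frequency.
    offsets = [-span_octaves / 2 + i * step_octaves for i in range(n_steps + 1)]
    return [center_hz * 2 ** offset for offset in offsets]


if __name__ == "__main__":
    for f in initial_frequencies(1000.0):
        print(f"{f:.1f} Hz")
```

Under these assumptions, a session yields 10 possible initial frequencies, from which one would be drawn at random on each trial.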
Animal preparation. After completion of the training, a head holder was surgically implanted into the monkeys' skull to allow atraumatic head fixation. After retraining with head restraint, a recording chamber implant operation was performed over the left auditory cortex. All surgical procedures were performed under deep general anesthesia, followed by a full course of antibiotic and analgesic treatment (for additional details, see Brosch et al., 1999). Experiments were approved by the authority for animal care and ethics of the federal state of Sachsen Anhalt (number 43.2-42502/2-253 IfN) and conformed to the rules for animal experimentation of the European Communities Council Directive (86/609/EEC).
Acoustic stimuli. Acoustic signals were generated digitally with a computer, which was interfaced with an array processor (Tucker-Davis Technologies, Gainesville, FL), at a sampling rate of 100 kHz and digital-to-analog converted. The analog signal was amplified (model A202; Pioneer, Long Beach, CA) and fed to a free-field loudspeaker (Manger, Mellrichstadt, Germany), which was placed 1.2 m and 40° from the midline to the right side of the animal. For electrophysiology, we generated acoustic search stimuli (pure tones, noise bursts, and frequency sweeps) with a waveform generator (Tucker-Davis Technologies). For a quantitative assessment of the best frequency and the spectral bandwidth of a unit, we presented a random sequence of 400 pure tones at 40 different frequencies, usually covering a range of approximately eight octaves (e.g., 0.125-32 kHz) in equal logarithmic steps. Tones had the same duration, envelope, and level as those used for the auditory categorization task. Intertone intervals were 980 ms. Generally, these tones were presented at the end of a recording session. From the spikes recorded during passive tone stimulation, we generated response planes, which yielded the frequency range that elicited a significant response, the best frequency (the tone that gave the largest response), and the latency of the first spike in response to the tones (see Figs. 1A, 2A,F,H, insets) (for additional details, see Brosch et al., 1999). The frequency selectivity of most units could also be assessed from the responses to the tones presented during the auditory categorization task.
The SPL was measured and calibrated with a free-field ½ inch microphone (40AC; G.R.A.S., Vedbæk, Denmark) located close to the monkey's head and a spectrum analyzer (SA 77; Rion). The output of the sound delivering system varied 10 dB in the frequency range of 0.2-32 kHz. At the SPLs used in our study, harmonic distortion was >36 dB below the signal level.
Electrophysiology. For electrophysiological recordings, we used a seven-electrode system (Thomas Recording, Giessen, Germany). Electrodes were arranged in a 610-μm-diameter circle and laterally separated from one another by 305 μm. They could be independently advanced in z-direction. After preamplification, the signals from each electrode were amplified and filtered to split them into local field potentials (10-140 Hz) and action potentials (0.5-5 kHz). Only spike data are analyzed in this paper. All data were recorded onto 32-channel analog-to-digital data acquisition systems [BrainWave (DataWave Technologies, Minneapolis, MN) or Alpha-Map (Alpha Omega, Grapeland, TX)]. By means of the built-in spike detection tools of the data acquisition systems (threshold crossings and spike duration), we discriminated the action potentials of a few neurons (multiunit) from each electrode and stored the time stamp and the waveform of each action potential with a sampling rate of 20.833 or 50 kHz.
From individual multiunit records, the action potentials of a single unit were extracted off-line with a template-matching algorithm (Schmidt, 1984). The template was created by selecting a number of visually similar and large spike shapes and calculating the average waveform. Subsequently, the waveforms of all events in a multiunit record were cross-correlated with the template, and those waveforms whose normalized cross-correlation maximum was >0.9 were considered to be generated by the same neuron. This separation was followed by verifying that there were no first-order interspike intervals <1.5 ms, i.e., smaller than the refractory period of single units in the cortex. Figure 2D-F shows examples of sorted spike waveforms. The firing of single units, therefore, is included in the firing of some of the multiunits shown in Results. Unless specified otherwise, we will use the term “unit” to refer to both single units and multiunits throughout the article. We also searched for events with artifactual waveforms in individual multiunit records and extracted templates from them to detect and delete all events that were attributable to electrical interferences. This procedure was particularly effective in “raw” records in which we observed many events immediately after the switching of the magnetic valve. Despite limitations in the interpretation of multiunit activity, this signal also includes the activity of neurons that are rarely revealed in extracellular single-unit recordings. Thus, it may be that neurons with small spikes are more prone to exhibit firing that is related to nonacoustic events.
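The template-matching and refractory-period checks can be sketched as follows. This is a simplified illustration, not the algorithm of Schmidt (1984): it assumes that waveforms are already aligned, so the normalized correlation is evaluated at zero lag only rather than as a maximum over lags, and it subtracts the mean of each waveform before correlating. All function names are hypothetical.

```python
import math


def mean_template(waveforms):
    """Average a set of hand-selected, equally long spike waveforms."""
    n = len(waveforms)
    return [sum(w[i] for w in waveforms) / n for i in range(len(waveforms[0]))]


def normalized_correlation(a, b):
    """Zero-lag normalized correlation of two equally long waveforms.

    Assumption: waveforms are pre-aligned and mean-subtracted here.
    """
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0:
        return 0.0  # a flat waveform cannot match any template
    return sum(x * y for x, y in zip(da, db)) / denom


def select_matching_events(waveforms, template, threshold=0.9):
    """Indices of events whose correlation with the template exceeds threshold."""
    return [i for i, w in enumerate(waveforms)
            if normalized_correlation(w, template) > threshold]


def has_refractory_violation(spike_times_ms, min_isi_ms=1.5):
    """True if any first-order interspike interval is shorter than min_isi_ms."""
    times = sorted(spike_times_ms)
    return any(t2 - t1 < min_isi_ms for t1, t2 in zip(times, times[1:]))
```

In such a scheme, a candidate single unit would be accepted only if `has_refractory_violation` returns False for the time stamps of the selected events.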
Electrodes were oriented at an angle of ∼45° in the dorsoventral plane such that they either directly penetrated the auditory cortex or first traversed parietal cortex. We only included the following: (1) sites at which neurons responded to the tones during the performance of the behavioral task, (2) sites at which neurons responded to passive presentation (while the animal was not engaged in the behavioral task) of either pure tones of different frequencies or noise bursts, and (3) sites that were more ventral and <1 mm in the supratemporal plane from a site with an auditory response. These criteria guaranteed that only recordings from the auditory cortex entered our analysis. Recordings were made from a region extending 7 mm in mediolateral direction in monkey B and 6 mm in monkey F, and from a region extending 7 mm in caudomedial direction in monkey B and 8 mm in monkey F, including the primary auditory cortex in both monkeys. Because animals have not yet been killed, we preliminarily assessed areal membership by physiological criteria, namely the spatial distributions of best frequencies and response latencies that are characteristic for the primary auditory cortex and posterior auditory fields (Kaas and Hackett, 2000). In both monkeys, neuronal responses to nonacoustic events were seen within the entire region from which we recorded.
Data analysis. To determine the temporal relationship between the neuronal firing and various events of the behavioral paradigm, we calculated for each record perievent time histograms (PETHs) with a bin size of 20 ms, which were referenced to individual events of the behavioral procedure. To identify periods with significantly increased or decreased firing, we first identified a time window of interest in a PETH in which the firing was >3 SDs above or below the baseline firing, which was taken as the mean firing in the period of 1800 ms before light onset. We then conducted Wilcoxon's test to compare the firing in the time window of interest with the firing in a time window of the same duration taken from baseline. The statistical tests revealed that 97.4% of the 2447 time windows of interest in our sample had p < 0.05, most of them with p < 10⁻¹⁰.
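The first step of this analysis, building a PETH and flagging bins that deviate by >3 SDs from baseline, can be sketched as below. This is an illustrative reconstruction under stated assumptions, not the original analysis code: the baseline is taken as the leading bins of the same histogram, the population SD across baseline bins is used, and the subsequent Wilcoxon test is omitted. Function names are hypothetical.

```python
import statistics


def peth(spike_times_ms, event_times_ms, t_start_ms, t_stop_ms, bin_ms=20):
    """Perievent time histogram in spikes/s, averaged over events.

    Bins cover [t_start_ms, t_stop_ms) relative to each event.
    """
    n_bins = int((t_stop_ms - t_start_ms) // bin_ms)
    counts = [0] * n_bins
    for event in event_times_ms:
        for spike in spike_times_ms:
            rel = spike - event
            if t_start_ms <= rel < t_start_ms + n_bins * bin_ms:
                counts[int((rel - t_start_ms) // bin_ms)] += 1
    scale = 1000.0 / (bin_ms * len(event_times_ms))
    return [c * scale for c in counts]


def bins_beyond_baseline(rates, n_baseline_bins, n_sd=3.0):
    """Indices of post-baseline bins more than n_sd SDs from the baseline mean.

    Assumption: the first n_baseline_bins bins form the baseline
    (e.g., 90 bins of 20 ms for the 1800 ms before light onset).
    """
    baseline = rates[:n_baseline_bins]
    mean = statistics.mean(baseline)
    sd = statistics.pstdev(baseline)
    return [i for i in range(n_baseline_bins, len(rates))
            if abs(rates[i] - mean) > n_sd * sd]
```

Contiguous runs of flagged bins would then define the time windows of interest that enter the statistical comparison with baseline.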
Because the tone sequence always started 2.22 s after a monkey had touched the bar, the PETH synchronized to this event also showed responses that were time locked to the tone sequence. Because responses to different frequencies across sequences were collapsed, this PETH does not reveal the frequency selectivity of a unit. The time between onset of the cue light and the monkey's grasp as well as the time between the go event and the monkey's bar release varied from trial to trial. For all sessions combined, average reaction time to the cue light was 711 ± 194 ms in monkey F and 791 ± 138 ms in monkey B. Average reaction time to the go event was 709 ± 201 ms in monkey F and 755 ± 269 ms in monkey B.
Audio and visual recordings. Many recording sessions were video and audio monitored with a Sony (Tokyo, Japan) CCD-F375E video camera, placed ∼40° from the midline on the left side and at a distance of ∼1.4 m from the animal. The camera signal was fed into a video cassette recorder located outside the recording room, on which the signal was stored. Video tapes were analyzed to determine the monkeys' movements relative to specific events of the behavioral procedure in fine temporal detail (41.67 ms resolution). The audio track was digitized and served to search for sounds generated by the monkeys or by our equipment that might have been audible to the monkeys. The audio recordings with the built-in microphone of the video camera could detect pure tones played at 5 dB SPL in the frequency range from 1 to 15 kHz from the calibrated free-field speaker. Outside this frequency range, the sensitivity decreased by 20 dB at 200 Hz and by 15 dB at 22 kHz. Because the monkeys might have been closer to some of the potential sound sources in the experimental room than the microphone of the video camera, these sources could produce higher SPLs at the monkey's ear. For example, a sound generated 14 cm from the monkey is ∼20 dB above the SPL measured at the location of the microphone. However, such sound sources did not seem to be present because we were not able to detect any acoustic artifacts attributable to the LED or other equipment in the recording room when we searched for them with a bat detector (Pettersson D200), which included the ultrasound range (<120 kHz, sensitivity ∼15 dB SPL).
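The ∼20 dB figure quoted above is consistent with the inverse distance law for a point source in a free field, which the following sketch evaluates. It assumes spherical spreading without reflections and a microphone ∼1.4 m from the source; both are simplifications, and the function name is hypothetical.

```python
import math


def spl_difference_db(far_distance_m, near_distance_m):
    """SPL at the near point minus SPL at the far point for a point source.

    Assumes free-field spherical spreading (6 dB per doubling of distance).
    """
    return 20.0 * math.log10(far_distance_m / near_distance_m)


if __name__ == "__main__":
    # Source 0.14 m from the monkey versus 1.4 m from the microphone.
    print(f"{spl_difference_db(1.4, 0.14):.1f} dB")
```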
Visual control task. To further control for artifacts and to explore the task specificity of the firing that was related to nonacoustic events of the behavioral procedure, we trained one monkey in a later experimental stage to also perform a visual detection task. For this task, the same behavioral procedure was used as for the standard auditory task, except that the auditory stimuli were replaced by visual stimuli. After the cue light was lit, the monkey had to grasp and hold the touch bar until the light started blinking periodically (50 ms off, 50 ms on). The hold period varied from trial to trial between 2.2 and 5.8 s, to yield similar intervals between the first contact with the touch bar and the go event in the visual and auditory task conditions. If the monkey released the touch bar during the interval 240-1240 ms after onset of the blinking, the light was turned off, the trial was scored correct, and a reward was administered. Otherwise, the trial was scored incorrect. Monkey F learned to switch between the auditory and the visual task in the same session within 70 trials of a single training session. After a few sessions, the monkey scored correct in ∼98% of the visual task trials and remained ∼70% correct in the auditory task trials.
Results
This report is based on 60 single units and 315 multiunits from the auditory cortex of two monkeys. They were recorded with a seven-electrode system over 70 behavioral sessions while the monkeys categorized tone sequences with rising and falling pitch direction (Brosch et al., 2004). We found that many of these units fired action potentials not only in response to the tones in the sequence but also exhibited firing that was synchronized to nonacoustic events that coordinated the auditory categorization task. Figure 1 shows an example of such firing in a representative multiunit from the primary auditory cortex. The three PETHs were triggered on the onset of the cue light (A), on the moment the monkey's hand touched the bar (B), and on the moment the monkey released the bar (C). Each of the PETHs consists of one or several peaks, which indicates that the neuronal firing was synchronized to these events. The unit fired 80-120 ms after the cue light went on, 120 ms before until 80 ms after the monkey had contacted the touch bar, and from 160 ms before until 1220 ms after the release of the bar. Figure 1B also shows the responses to the first two tones of the sequence, which started 2.22 s after bar touch. More examples of such firing in other multiunits and in single units are presented in Figures 2, 4, 6, and 7. In the following, we will first describe general characteristics of the firing that was related to nonacoustic events. We then show how the firing to different events of the behavioral procedure can be disentangled. Next, we present results from control experiments that aimed at ruling out artifacts. The Results are concluded by a comparison of the neuronal sensitivity for different events of the behavioral procedure.
Cue light-related firing
In our sample, 45 multiunits (14.3%) and one single unit (1.7%) exhibited firing that was synchronized to the onset of the cue light. In 39 multiunits and the single unit, this firing was significantly stronger than before light onset, whereas the firing decreased in six multiunits (Wilcoxon's test, p < 0.05). Increases in firing ranged from 119 to 700% (median of 139%), and decreases ranged from 57 to 75% (median of 70%). Latencies of cue light firing ranged between 60 and 240 ms after light onset (median latency of 100 ms). In all units, the increase in firing was transient, with a median duration of 60 ms (range of 20-220 ms). These latencies and response durations corresponded well to those in early visual cortical areas (Raiguel et al., 1989). To describe the dynamics of the entire population as a function of time, we computed a recruitment function, which shows the number of units that fired in different time bins relative to the onset of the cue light (Fig. 3A). This function reveals that the maximal number of units fired 120 ms after the onset of the cue light and that cue light-related firing disappeared within 360 ms after light onset.
It is unlikely that the light caused artifacts in the neuronal recordings. The switching of the LED with a relay outside the recording room did not produce any electrical interferences. Even if it had produced electrical artifacts, they should occur at the time of switching and not ∼100 ms later. Neither the switching of the relay nor of the LED itself produced any measurable sound in the soundproof room (Fig. 4D,G). An additional argument against artifacts is that, except for two of the 45 multiunits, light-related firing was not observed in error trials, i.e., in trials in which the monkeys did not grasp the touch bar during the 3 s period after light onset (Fig. 5). This suggests that, in many neurons, light-related firing was evoked only under specific conditions, such as when the monkey attended to visual stimuli or associated the cue light with auditory processing, or when the visual stimulus fell into the visual receptive field of a cell in the auditory cortex.
Grasping-related firing
One hundred ninety-four multiunits (61.6%) and 21 single units (35.0%) exhibited firing that was associated with the grasping of the touch bar, i.e., the PETH triggered on bar touch consisted of bins in which the firing was significantly different from baseline firing (Wilcoxon's test, p < 0.05). Like the examples shown in Figures 1B, 2, A and C, 4B, and 6A, these units transiently increased or decreased their firing for a period of 20-1080 ms (median of 100 ms for multiunits and 120 ms for single units). The first unit in our sample that increased its firing did so 340 ms before bar touch (Fig. 3B). Subsequently, the number of firing units in our sample increased and reached a maximum 20 ms after the hand had touched the bar. During the following 800 ms of the hold period, the number of recruited units with transiently elevated firing gradually decayed.
Many units also exhibited slow changes of their firing after the monkeys had touched the bar, which persisted at least until the tone sequence started. The firing of the multiunit shown in Figure 2A slowly decreased ∼700 ms after the grasp and continued to do so until the onset of the tone sequence (this distinguished slow changes of firing from transient firing). Figure 2C shows a single unit with a slow decrease of firing during the hold period. Slow decreases of firing were seen in 37 multiunits (11.8%) and four single units (6.7%). The opposite behavior, a slow increase of firing during the hold period, was seen in 43 multiunits (13.7%). A representative multiunit with such firing is shown in Figure 2B. In two single units, there appeared to be such increases, which did not become significant, however. In our sample, slow changes of firing became significant at the earliest 140 ms after the monkeys had touched the bar (Fig. 3B). Subsequently, the percentage of multiunits with these properties increased and reached a value of ∼25% when the tone sequence began.
It is unlikely that myogenic artifacts during arm movements were contained in our recordings of grasping-related activity because such firing was also observed in single units (Fig. 2C) and because multiunits simultaneously recorded at different cortical sites often had uncorrelated firing (Fig. 2A,B). We could not detect any sounds associated with the grasping of the touch bar that might have evoked such firing (Fig. 4E,H). A key argument against artifacts arises from the visual control experiment in which grasping-related firing of a given unit disappeared when the monkey performed a visual task instead of the auditory task (Fig. 6).
Release-related firing
In correct trials, firing synchronized to the release of the touch bar was found in 36 single units (60%) and 268 multiunits (85.1%), similar to the number of acoustically responsive single units (37; 61.7%) and multiunits (271; 86.0%). Firing started at the earliest 380 ms before the release (median latency of -120 ms for multiunits and -100 ms for single units) and lasted, with variable time courses, for a median period of 900 ms in multiunits and 760 ms in single units (Fig. 3C). All but two of these units increased their firing during this period, with changes ranging from 60 to 6300% of the baseline firing (median change of 218%). Release-related multiunit firing was significantly stronger than light-related multiunit firing (median increase of 139%; Wilcoxon's test, p < 10⁻⁹) and grasping-related multiunit firing (median increase of 167%; p < 10⁻¹³) when the largest bins in the PETHs were compared.
Segregation of firing synchronized to different events of the behavioral procedure
We were partly able to identify the events of the behavioral procedure to which the neuronal firing was synchronized. One approach was to analyze events whose temporal separation varied from trial to trial, such as the time between the go event (onset of the first tone with a lower frequency in the sequence) and the monkey's reaction to this event (bar release). To disentangle the firing related to each of the two events, we calculated four PETHs, which were triggered on either the go event or the bar release and were established separately for trials with different reaction times to the go event. This is exemplified in Figure 2, G and H, which show two PETHs of a multiunit whose firing was triggered on the bar release and on the go event, respectively. In both panels, one PETH was calculated from trials in which the monkey released the touch bar early (red bars) after the go event and another PETH in which the release occurred late (blue curve). The two PETHs were distinguished by the median reaction time.
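The median split that yields the four PETHs can be sketched as follows. The sketch only prepares the four sets of trigger times (go event or bar release, early or late trials); it assumes that trials with a reaction time equal to the median are assigned to the early group, which the text does not specify. Names are hypothetical.

```python
import statistics


def split_by_median_reaction_time(trials):
    """Split trials into early and late groups by median reaction time.

    trials: list of (go_time_ms, release_time_ms) tuples.
    Assumption: reaction times equal to the median count as early.
    """
    reaction_times = [release - go for go, release in trials]
    median_rt = statistics.median(reaction_times)
    early = [t for t, rt in zip(trials, reaction_times) if rt <= median_rt]
    late = [t for t, rt in zip(trials, reaction_times) if rt > median_rt]
    return early, late, median_rt


def trigger_sets(trials):
    """Trigger times for the four PETHs: (go, release) x (early, late)."""
    early, late, _ = split_by_median_reaction_time(trials)
    return {
        ("go", "early"): [go for go, _ in early],
        ("go", "late"): [go for go, _ in late],
        ("release", "early"): [release for _, release in early],
        ("release", "late"): [release for _, release in late],
    }
```

Peaks that stay in register between the early and late PETHs for a given trigger are the ones that can be attributed to that trigger event.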
In Figure 2G, the PETH consists of three narrow peaks around the time of bar release and two broader peaks well after bar release, which are all in register both for trials with early and late releases. Conversely, the multiple small peaks before bar release, representing firing synchronized to the onset of the tones in the sequence, are not in register. This is attributable to “incorrectly” triggering this firing on the bar release. Hence, only the three narrow peaks around bar release and the following broader peaks are independent of reaction time and therefore represent firing that is time locked to bar release. Because the tone sequence was played until it was stopped by the bar release, this analysis also suggests that the large peak immediately before bar release represents firing that is related to the last tone in the sequence. This can be inferred from the periodicity of the two PETHs that reflect the repetitive tone stimulation. Therefore, we speculate that a tone response may be greatly modified if the tone is followed by a behavioral response, such as the monkey's recognition of pitch direction or to a reward-expecting decision. Similar response modifications have been observed in the somatosensory cortex of monkeys while they performed a vibrotactile categorization task (Romo et al., 2002).
The opposite synchronization behavior was seen when the firing of the unit was analyzed relative to the go event, i.e., to the first tone with a lower frequency in the sequence (Fig. 2H). The PETHs for early and late behavioral responses initially consist of five narrow peaks in register. This indicates that, independent of the monkey's reaction time, the firing is precisely coupled to the onset of the tones in the sequence. In contrast, the firing >400 ms after the go event depends on the monkey's reaction time. A high and broad peak emerged ∼500 ms after the go event in trials with early releases (red bars), i.e., when the median release occurred 558 ms after the onset of the first low-frequency tone and thus 118 ms after the second low-frequency tone. No such peak was present when the monkey released the touch bar late after the go event (blue bars), i.e., when the median release occurred 859 ms after the first low-frequency tone and thus 59 ms after the third low-frequency tone. In this case, rather, strong firing emerged ∼900 ms after the go event and thus at the time of the third low-frequency tone. These observations indicate that the occurrence of strong firing was related to the variable time of bar release. Because the monkey indicated with the bar release the (subjective) occurrence of the falling pitch direction, this again suggests that late tone responses can be modified in magnitude and time by the monkey's decision.
Similar to the separation of firing synchronized to tone onsets and bar release, we could disentangle firing that was time locked to the cue light from firing that was time locked to the bar touch. This analysis revealed, for example, that the sharp peak in the PETH of Figure 1A represents firing synchronized to the cue light, whereas the following broad and small peak represents firing that was synchronized to the reaction time to the cue light. Likewise, the two peaks ±200 ms around bar touch in Figure 1B were found to be synchronized to this event, whereas the preceding small peak was not synchronized to this event. The procedures to separate the firing synchronized to tone onsets from that synchronized to bar release, as well as the firing synchronized to the cue light from that synchronized to the bar touch, were performed on all multiunits, such that Figure 3 shows only the firing that could be identified as being temporally related to the event indicated on the abscissa.
Analysis of error trials
We further identified how the firing was related to different events at the end of the tone sequence (sequence offset) by comparing the following: (1) correct trials in which the monkeys responded correctly to the go event with (2) incorrect trials in which they prematurely released the bar or (3) incorrect trials in which they did not release the touch bar during the entire tone sequence (no-response trials). Although different events were associated with correct and incorrect trials, the common event was the extinction of the cue light. Therefore, we calculated separate PETHs from the firing in the three conditions, all of which were triggered on the extinction of the cue light. A representative example in which we applied our analysis is shown in Figure 7. Comparison of the PETHs revealed that this multiunit fired differently in correct and incorrect trials. In correct trials (Fig. 7A), the PETH consisted of four distinct peaks, one shortly before and one shortly after sequence offset, and another two ∼700 and ∼1200 ms later. Because these peaks were found in the same time bins, independent of the monkey's reaction time (data not shown), they indicate firing that was synchronized to sequence offset. In premature trials (Fig. 7B), the PETH consisted of only the two peaks around sequence offset. In no-response trials (Fig. 7C), only the peak ∼700 ms after sequence offset was significant, except for some earlier tone-evoked peaks. Because the monkey moved his hand from the touch bar only in correct and premature trials but not in no-response trials, these comparisons suggest that the strong firing shortly before sequence offset in correct and premature trials was related to the monkey's (correct or incorrect) recognition of the falling pitch direction or to his motor response to this go event.
Furthermore, we conclude that the late peaks 700 and 1200 ms after sequence offset were not attributable to myogenic or acoustic artifacts evoked by the licking because these peaks were not observed in premature trials, in which the monkey licked in all 28 trials at a time comparable with the licking in correct trials. Both monkeys exhibited this licking habit in other behavioral sessions as well. The late firing after correct releases could also not be attributed to the motor control of licking because the peak ∼700 ms after sequence offset was also observed in no-response trials. In the latter trials, the monkey did not lick immediately after sequence offset but started licking 2214 ± 1808 ms later and did so in 44% of the trials.
Despite the limitations of smaller sample sizes in incorrect trials, we found that, in premature trials, most multiunits started firing before sequence offset (Fig. 3C, green curve). The number of firing multiunits remained high until 140 ms after sequence offset and then decayed monotonically, unlike in correct trials. In no-response trials (Fig. 3C, blue bars), the first peak in the histogram (at -200 ms) represents responses to the last tone in the sequence. However, there were also multiunits that fired after sequence offset. Their firing commenced at the earliest 40 ms after sequence offset, and most multiunits fired 20 ms later. The activation was relatively brief in most multiunits (median duration of 60 ms). Because only the cue light was extinguished at sequence offset, this firing may have been evoked by this visual event. In addition to the early firing, there were multiunits that fired during the period 260-1260 ms after sequence offset, most of them at ∼760 ms. The later firing might be related to the release of the touch bar in no-response trials, which occurred, on average, 600 ± 120 ms after sequence offset. Qualitatively similar results were seen in single units, although the small number of incorrect trials and the correspondingly small number of spikes did not permit quantitative analyses.
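The condition-wise PETHs used in this analysis (correct, premature, and no-response trials, all triggered on the extinction of the cue light) can be illustrated with a minimal sketch. It is not the analysis code used in the study: the function name `peth`, the 20 ms bin width, the analysis window, and the normalization to spikes per bin per trial are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def peth(
    spike_times_ms: Sequence[float],
    event_times_ms: Sequence[float],
    window_ms: Tuple[float, float] = (-500.0, 1500.0),  # assumed window
    bin_ms: float = 20.0,                               # assumed bin width
) -> Tuple[List[float], List[float]]:
    """Peri-event time histogram (hypothetical helper).

    Returns the left edge of each bin (ms relative to the trigger event)
    and the mean spike count per bin, averaged over trigger events.
    """
    start, stop = window_ms
    n_bins = int(round((stop - start) / bin_ms))
    counts = [0] * n_bins
    for t0 in event_times_ms:
        for t in spike_times_ms:
            rel = t - t0  # spike time relative to this trigger event
            if start <= rel < stop:
                idx = min(int((rel - start) // bin_ms), n_bins - 1)
                counts[idx] += 1
    n_trials = max(len(event_times_ms), 1)  # avoid division by zero
    edges = [start + i * bin_ms for i in range(n_bins)]
    return edges, [c / n_trials for c in counts]
```

To compare conditions, the same spike train would be passed once per trial type, each time with only the cue-light extinction times of that trial type as `event_times_ms`.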
Measurements of sounds synchronized to bar release
Some weak sounds were synchronized with bar release, but their measurements exclude them as the sole source of release-related firing. From the audio recordings in the soundproof room, we computed spectrograms that were synchronized to bar release. The representative spectrogram shown in Figure 4F consists of several separate epochs with increased sound pressure. Until shortly before bar release, it shows the tones of different frequency in the sequence. At the time of bar release, there was a brief noise band at ∼5 kHz, which was attributable to the click produced by the magnetic valve of the water delivery system outside the sound chamber. After bar release, there were three periods with licking sounds, which occurred at ∼360, 1080, and 1720 ms.
When we compared the spectrogram with the PETH (Fig. 4C), we found, first, that strong firing occurred before bar release during the stimulus- and artifact-free interval between the end of the last tone in the sequence and the click of the valve. This suggests that the firing shortly before bar release was not evoked by sounds synchronized with the release. Second, the sharp peak immediately after the origin of the PETH probably indicates firing that was evoked by the sounds of the water delivery system. Third, the late periods of firing 760 and 1300 ms after bar release did not coincide with the licking sounds. This suggests that the licking sounds themselves did not drive the unit. This conclusion is further supported by the lack of increased firing in premature trials in which the monkey also produced licking sounds (data not shown). Alternatively, the self-generated licking could modulate neuronal activity in the auditory cortex, similar to auditory cortex modulations observed before and during self-generated vocalizations (Eliades and Wang, 2003, 2005) and self-generated sounds (Martikainen et al., 2005).
Neuronal firing in the auditory cortex during the performance of a visual task
To further control for artifacts and to explore the task dependence of the firing that was related to nonacoustic events of the behavioral procedure, we trained monkey F to perform a visual detection task in addition to the standard auditory categorization task. In the visual task condition, the go event was the transition of the cue light from being constantly lit to a blinking state, whereas all other procedures were the same as in the auditory task condition. Figure 6 shows the firing of a multiunit from the auditory cortex that was recorded during a session in which the monkey initially performed the auditory task and, a few seconds after the last auditory trial, was switched to the visual task. In the auditory task condition, the multiunit exhibited firing that was related to the grasping of the touch bar and to the tone sequence (A) as well as to the release of the touch bar (B). In contrast, no modulations of the firing were observed during the performance of the visual detection task, i.e., there were neither increases of firing synchronized to the grasping nor increases of firing synchronized to the release of the touch bar. A similar task dependence was observed in another five multiunits in the auditory cortex. None of them exhibited cue light-related responses in either task condition. Because only the stimuli were different in the two task conditions while all other procedures were unchanged, these observations suggest that the grasping-related and release-related firing observed in the auditory task condition were not caused by sound or other artifacts. Furthermore, they suggest that auditory cortex neurons only fired during nonacoustic events if these events were associated with an auditory task.
Relationship between firing related to acoustic and nonacoustic events
In our sample, units could exhibit firing that was related to one or more events of the behavioral procedure (Fig. 8). There were single units that responded to the tone sequence (Fig. 2F), single units whose firing was synchronized only to nonacoustic events of the behavioral procedure and not to the tone sequence (Fig. 2E), and single units that exhibited acoustically evoked firing as well as firing synchronized to nonacoustic events (Fig. 2C,D). This suggests that there may be different types of neurons in the auditory cortex, which we tentatively termed purely auditory, nonauditory (i.e., neurons that did not respond to pure tones during and outside the behavioral task; no other auditory stimuli were tested), and mixed type. In the entire sample, there were 37 single units with tone responses. A surprisingly small proportion of them (41%), however, was purely auditory, whereas most of them were of mixed type, i.e., they also exhibited firing that was related to the grasp or release of the touch bar. The mixed-type neurons were supplemented by 23 nonauditory single units. The specific pairings of different response properties are summarized in Figure 8A. A Monte Carlo simulation revealed that the pairings were not by chance but systematic. For this simulation, we calculated the expected number of units with different pairings of response properties if pairings were by chance and generated many simulated numbers of units of such chance pairings. We then compared the average squared difference between the measured and the expected number of pairings to the distribution of the average squared difference between simulated numbers and the expected number with such pairings. This comparison revealed that it was highly unlikely to find the observed pairings in a sample with chance pairings (p < 0.0001).
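The logic of the Monte Carlo simulation can be sketched as follows. This is an illustration under stated assumptions, not the procedure as implemented in the study: it assumes that "chance pairing" means statistical independence of the response properties given their observed marginal frequencies, and the function name `chance_pairing_test`, the number of simulations, and the add-one p value estimate are hypothetical choices.

```python
import itertools
import random
from collections import Counter
from typing import Sequence, Tuple


def chance_pairing_test(
    units: Sequence[Tuple[int, ...]],  # one tuple of 0/1 flags per unit
    n_sim: int = 2000,                 # assumed number of simulations
    seed: int = 0,
) -> Tuple[float, float]:
    """Return (observed statistic, Monte Carlo p value)."""
    rng = random.Random(seed)
    n, k = len(units), len(units[0])
    # Marginal probability of each response property in the sample.
    marg = [sum(u[j] for u in units) / n for j in range(k)]
    combos = list(itertools.product((0, 1), repeat=k))
    # Expected count of each pairing if properties combine by chance.
    expected = {}
    for c in combos:
        p = 1.0
        for j, bit in enumerate(c):
            p *= marg[j] if bit else 1.0 - marg[j]
        expected[c] = n * p

    def mean_sq_diff(counts: Counter) -> float:
        # Average squared difference between counts and expectation.
        return sum((counts.get(c, 0) - expected[c]) ** 2 for c in combos) / len(combos)

    observed = mean_sq_diff(Counter(tuple(u) for u in units))
    exceed = 0
    for _ in range(n_sim):
        # Simulated sample: each property drawn independently per unit.
        sim = Counter(tuple(int(rng.random() < m) for m in marg) for _ in range(n))
        if mean_sq_diff(sim) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_sim + 1)
```

A small p value indicates that the observed pairings deviate from the chance expectation more strongly than almost all simulated samples do.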
A qualitatively similar although quantitatively different picture emerged when the multiunit data were analyzed (Fig. 8B). In the multiunit sample, purely auditory sites as well as nonauditory sites were less common than in the single-unit sample, whereas the number of mixed-type sites was increased.
For the multiunit sample, we also compared spectral receptive field properties of units that did or did not exhibit firing that was related to nonacoustic events of the behavioral procedure (Table 1). Receptive field properties were taken from the responses to 40 different pure tones in the passive listening condition and included best frequency as well as the lowest and highest frequency that yielded a response. Pairwise t tests were applied for each receptive field property to find out whether purely auditory units had response properties different from those of mixed-type or nonauditory units. Except that units without grasping-related firing responded to significantly lower frequencies than units with grasping-related firing, no other significant differences in spectral receptive field properties were found between purely auditory units and units that fired during nonacoustic events.
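A group comparison of this kind can be sketched for a single receptive field property (e.g., best frequency) as shown below. The text does not specify the t test variant, so the unequal-variance (Welch) form used here is an assumption, and the helper name `welch_t` is hypothetical. The sketch returns only the t statistic and the degrees of freedom; converting these to a p value requires a t distribution, which is not part of the Python standard library.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Two-sample t statistic and Welch-Satterthwaite degrees of freedom.

    `a` and `b` hold one value per unit (e.g., best frequency) for the two
    groups being compared; each group needs at least two values.
    """
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb  # squared standard errors
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df
```

The function would be called once per receptive field property, with the values of the purely auditory units in one group and those of the mixed-type or nonauditory units in the other.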
Discussion
In monkeys performing an auditory categorization task, we found many neurons in the auditory cortex that fired during nonacoustic events of the behavioral procedure. Because most of the firing was transient and time locked to these events, was not seen or was modified when the monkeys did not react to the events or performed a visual task, and was not attributable to artifacts, we conclude the following. (1) The firing after turning on the cue light was evoked by this light. (2) The firing before contact with the touch bar was related to the preparation or execution of movements. (3) The firing after the monkey's hand had been placed on the touch bar was evoked by tactile stimulation. The sudden increase of the number of firing neurons 20 ms after bar touch did not occur at a time when neurons are expected to be recruited for the control of hand movements. (4) The firing before the bar release was related either to the recognition of the pitch direction in the tone sequence or to its behavioral consequence (hand movement, reward).
Our results extend previous findings that auditory cortex responses to acoustic stimuli are modulated during the performance of an auditory task (Miller et al., 1972; Benson and Hienz, 1978; Fritz et al., 2003, 2005). They corroborate that auditory cortex neurons respond to somatosensory stimulation in awake nonperforming (Schroeder et al., 2001) and anesthetized monkeys (Fu et al., 2003) and to visual stimuli in awake nonperforming (Schroeder and Foxe, 2002) and in experimentally cross-wired (Sur et al., 1990) animals. These findings are in line with recent demonstrations of anatomical connections between early auditory and visual cortical areas (Falchier et al., 2002; Rockland and Ojima, 2003) and complement reports of acoustic responses in visual cortical areas 17-19 (Morrell, 1972; Fishman and Michael, 1973; Bronchti et al., 2002).
The sustained changes of firing we observed during the bar-holding period may reflect stimulus expectation or preparation and adjustments for the processing of upcoming acoustic stimuli. The transient activation before and synchronized to bar release may, in part, reflect late responses to the go tones that are modified in latency and magnitude by the monkeys' decision to release the bar. In contrast, we consider modulatory inputs unlikely to account for most of the firing in the auditory cortex that was synchronized with nonacoustic events of the behavioral task, such as the onset of the cue light and the grasping and release of the touch bar. First, during their occurrence, there were no responses to acoustic stimuli that could have been modulated by nonauditory input. Second, many transient responses to nonacoustic events occurred at short latencies, comparable with those in early visual (Raiguel et al., 1989) and somatosensory cortical areas (Romo et al., 1996). Nevertheless, some of the firing seen in the present experiments may be prone to nonsensory modulations. Both grasping-related and release-related firing in the auditory cortex appeared to require the monkey to be engaged in an auditory task because these activations disappeared when the monkey performed a visual instead of the standard auditory task. This may also apply to cue light-related firing in the auditory cortex, which we hardly ever observed when the monkeys subsequently did not touch the bar. This implies that, in many neurons, cue light-related firing was evoked only under specific conditions, as when the monkey attended to visual stimuli, associated the cue light with auditory processing, or when the visual stimulus fell into the visual receptive field of a cell in the auditory cortex.
The responses to nonacoustic stimuli and the firing during arm movements suggest that the auditory cortex can represent nonacoustic events in addition to sound. The functional implications of the nonauditory corepresentation are not clear yet. There is some evidence that blind subjects benefit from a somatosensory representation in the visual cortex for Braille reading (Cohen et al., 1997). Our interpretation of the extensive multimodal corepresentation in the auditory cortex of our highly trained monkeys is that the categorization of acoustic stimuli was intimately associated with visual stimuli, hand movements, and tactile feedback about the proper placement of the hand on a touch bar. The cue light indicated that the monkeys could initiate a tone sequence. By grasping and holding the touch bar, the monkeys signaled their readiness to listen to acoustic stimuli and started the tone sequence. By withdrawing their hand from the touch bar, the monkeys expressed the result of their auditory processing. It is conceivable that the corepresentation of nonacoustic events in the auditory cortex of our monkeys has emerged during the long training period they have spent on the acquisition of the task. The representation of nonauditory sensory modalities and movements in the auditory cortex could accelerate and improve the subject's performance in highly demanding auditory tasks (Bangert et al., 2001).
Footnotes
This work was supported by the State of Sachsen-Anhalt, Bundesministerium für Bildung und Forschung, and Deutsche Forschungsgemeinschaft (Br1385/2; SFB-TR31, A4). We thank Cornelia Bucks for technical assistance during and after the experiments. The valuable suggestions of Drs. Peter Heil and Jonathan Fritz and those of the anonymous reviewers are gratefully acknowledged.
Correspondence should be addressed to Dr. Michael Brosch, Leibniz-Institut für Neurobiologie, Brenneckestraße 6, 39118 Magdeburg, Germany. E-mail: brosch{at}ifn-magdeburg.de.
Copyright © 2005 Society for Neuroscience 0270-6474/05/256797-10$15.00/0