Abstract
Sound categorization is essential for auditory behaviors like acoustic communication, but its genesis within the auditory pathway is not well understood—especially for learned natural categories like vocalizations, which often share overlapping acoustic features that must be distinguished (e.g., speech). We use electrophysiological mapping and single-unit recordings in mice to investigate how representations of natural vocal categories within core auditory cortex are modulated when one category acquires enhanced behavioral relevance. Taking advantage of a maternal mouse model of acoustic communication, we found no long-term auditory cortical map expansion to represent a behaviorally relevant pup vocalization category—contrary to expectations from the cortical plasticity literature on conditioning with pure tones. Instead, we observed plasticity that improved the separation between acoustically similar pup and adult vocalization categories among a physiologically defined subset of late-onset, putative pyramidal neurons, but not among putative interneurons. Additionally, a larger proportion of these putative pyramidal neurons in maternal animals compared with nonmaternal animals responded to the individual pup call exemplars having combinations of acoustic features most typical of that category. Together, these data suggest that higher-order representations of acoustic categories arise from a subset of core auditory cortical pyramidal neurons that become biased toward the combination of acoustic features statistically predictive of membership to a behaviorally relevant sound category.
Introduction
We rapidly categorize sensory stimuli in our natural environment with seemingly little effort (Thorpe et al., 1996; Murray et al., 2006), yet categorization is a challenging task for the brain to implement, as it involves simultaneously disregarding some sources of stimulus variation while ascribing importance to others. This can be particularly difficult when distinct categories overlap in their sensory features, which then requires learning to discriminate across multiple features to separate categories. The neural mechanisms that permit such separation are of great interest. Though visual categorization has been well studied (Tsao and Livingstone, 2008; Seger and Miller, 2010; DiCarlo et al., 2012), less is known about auditory categorization (Ohl et al., 2001; Bao et al., 2013). Vocal categories can elicit selective responses in higher-order fields along the pathway to sensorimotor and prefrontal areas (Gifford et al., 2005; Prather et al., 2009; Chang et al., 2010; Leaver and Rauschecker, 2010), but how these categorical responses emerge in the feedforward representation of stimuli is unknown.
One possibility is that, with experience, a preferential weighting across the stimulus features associated with a category arises progressively in higher-order areas based on an unbiased representation of these features in primary sensory cortex. Alternatively, neural activity in primary sensory areas may already be modulated by category, even if those responses are not category specific. This could manifest as enhanced neural firing for the statistically more likely combinations of stimulus features in a behaviorally important category. Indeed, in core auditory cortex (AC), tuning maps can be expanded toward behaviorally relevant stimulus features following conditioning (Recanzone et al., 1993; Weinberger, 2004; Polley et al., 2006), though how such an expansion might accommodate the features of behaviorally relevant categories, which vary in multiple dimensions simultaneously, is not clear.
Here, we exploit a mouse model of communication sound learning to address this question, using two natural categories of vocalizations whose features overlap in multiple acoustic dimensions. Ultrasonic vocalizations (USVs) of mouse pups and adult male mice are both call categories that an adult female mouse encounters naturally. Though neither category carries sustained behavioral relevance for nonmaternal females (Hammerschmidt et al., 2009; Shepard and Liu, 2011), experience raising pups results in recognition of pup USVs as behaviorally important, and thus perceptually salient (Ehret et al., 1987; Lin et al., 2013). Intriguingly, pup USVs are perceived categorically by mothers along both spectral and temporal dimensions (Ehret and Haack, 1982; Ehret, 1992). Core AC responses to pup USVs are altered by motherhood in terms of their temporal encoding (Liu et al., 2006; Liu and Schreiner, 2007), inhibitory plasticity (Galindo-Leon et al., 2009; Lin et al., 2013), population dynamics (Rothschild et al., 2013), and integration with multimodal pup cues (Cohen et al., 2011). However, it is not known whether the increased behavioral relevance of pup over adult USVs for maternal females promotes excitatory plasticity that helps distinguish these overlapping categories. Here we find that in mothers, pup and adult USV categories are “untangled” via plasticity occurring in a specific subset of late-onset putative pyramidal neurons (PPNs).
Materials and Methods
All procedures were approved by the Emory University Institutional Animal Care and Use Committee. Subjects were 14- to 24-week-old female CBA/CaJ mice. Animals were held on a 14 h light/10 h dark cycle, and were allowed access to food and water ad libitum.
Acoustically matched USVs.
Mouse pup and adult mouse USVs are complex, single-frequency whistles that differ from one another in many ways. Some of these differences are captured along basic acoustic parameters like duration, frequency, and degree of frequency modulation (FM) at onset (Fig. 1A,B). Pup and adult USVs naturally concentrate in different regions of these acoustic dimensions, but clearly overlap such that it is possible to find a natural USV from each vocal category that falls within systematically varied combinations of narrow ranges of each parameter (Fig. 1A, 9 frequency/duration grid ranges, B, ×2 FM ranges). Hence, we consider each pair of such selected pup and adult USVs to be “acoustically matched” (Fig. 1C, USV 1 and 19, 2 and 20). Despite the matching in “basic” acoustic parameters, “higher-order” acoustic features like the frequency trajectory of individual USVs (Fig. 1C) are obviously not matched across the categories, with adult USVs generally showing much greater FM throughout the calls.
Anesthetized multiunit electrophysiological mapping.
Tone- and vocalization-evoked responses were mapped across the left core auditory cortices of eight pup-naive virgin females and eight maternal females (0–1 d after weaning their first litter of pups). Animals were anesthetized with a 6:1 ketamine/xylazine cocktail (100 and 5 mg/kg, respectively) and a craniotomy was performed to reveal the left auditory cortex. A head post was also cemented to the skull of the animal at bregma to maintain head position throughout the recording.
Multiunit electrophysiological recordings (sample rate, 24,414.0625/s) were taken using a 4 MΩ 3 × 1 tungsten matrix multielectrode (FHC) with a 305 μm interelectrode spacing. Recordings were made at a depth of 400 μm, while pure tones (60 ms duration, seven sound intensities from 5 to 65 dB SPL, at 30 frequencies log-spaced 4–80 kHz, repeated five times each, and presented in pseudorandom order) and USVs (Fig. 1, calls 5, 6, 9, 10, 13, 14, 23, 24, 27, 28, 31, and 32, repeated 15 times each and presented at 65 dB SPL in pseudorandom order) were played back (sample rate, 223,214.2857/s) from a free-field speaker (EMIT high-energy speaker, Infinity Systems) positioned 11 cm lateral to the right ear, and calibrated to equalize sound pressure across frequencies. Sound presentation and data acquisition were controlled using Tucker-Davis Technologies System 3 hardware and the Brainware software application. Details on our acoustic stimulation equipment have been published previously (Galindo-Leon et al., 2009; Lin et al., 2013).
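The tone stimulus set described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and two details are assumptions not stated in the text, namely that the log-spaced frequencies include both endpoints (4 and 80 kHz) and that the seven intensities are evenly spaced in 10 dB steps.

```python
import math
import random

def log_spaced(f_min_khz, f_max_khz, n):
    """Return n frequencies (kHz) equally spaced on a log axis.

    Assumption: both endpoints are included in the set.
    """
    if n < 2:
        raise ValueError("need at least two frequencies")
    step = (math.log(f_max_khz) - math.log(f_min_khz)) / (n - 1)
    return [math.exp(math.log(f_min_khz) + k * step) for k in range(n)]

def tone_trials(freqs_khz, levels_db, n_repeats, seed=0):
    """Build every (frequency, level) pair n_repeats times, then shuffle.

    A seeded generator stands in for the pseudorandom presentation order.
    """
    trials = [(f, lvl) for f in freqs_khz for lvl in levels_db
              for _ in range(n_repeats)]
    random.Random(seed).shuffle(trials)
    return trials

freqs = log_spaced(4.0, 80.0, 30)        # 30 tones, 4-80 kHz
levels = [5 + 10 * k for k in range(7)]  # assumed 10 dB steps: 5, 15, ..., 65
trials = tone_trials(freqs, levels, n_repeats=5)
```

Under these assumptions the mapping protocol amounts to 30 × 7 × 5 = 1050 tone presentations per recording site.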
Core auditory cortex includes those fields with dominant feedforward thalamocortical input from the lemniscal auditory pathway, and consists of primary auditory cortex (A1), anterior auditory field (AAF), and ultrasound field (UF; Stiebler et al., 1997)—the functionally defined, rostrodorsally located field that encompasses neurons tuned to frequencies in the ultrasound range (≥40 kHz; Fig. 2B). Sharp frequency tuning and a peristimulus time histogram (PSTH) peak <15 ms from sound onset were required for recordings from core auditory cortical areas, and the best frequency (BF) of the tuning curve and the spatial position of the recording site dictated to which core subregion (AAF, A1, or UF) a site was assigned (Stiebler et al., 1997). Auditory-responsive sites not meeting these criteria were deemed likely noncore sites and are not considered here. A perimeter of nonresponsive sites was established around auditory-responsive areas to ensure that the entire spatial extent of core auditory cortex was captured. Spatial coordinates of each recording site were tracked by comparing electrode position to a high-resolution photo of the craniotomy region obtained before recording, using vascular features as references.
Off-line, BF was identified as the frequency that elicited the strongest response (in spikes per second), averaging over all sound intensities at or above threshold. BF maps were generated by performing Voronoi tessellations (MATLAB, voronoi) on all recording sites for a given animal. All area-based measurements (e.g., proportionate area tuned to a given frequency band) were made by summing the areas of the Voronoi polygons of recording sites that met the given inclusion criteria. A fraction of the recording sites were associated with polygons with spuriously large or infinite boundaries because, for technical reasons, we were unable to “enclose” that area with a perimeter of nonresponsive sites. Therefore, we reassigned the median polygon area measurement to the 5% of sites with the largest polygon areas.
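A minimal Python sketch of the two computations in this paragraph follows. It is not the analysis code used in the study (which relied on MATLAB). The function names are hypothetical, BF is taken over levels at or above threshold (matching the "superthreshold" wording in Results), and the way the 5% count is rounded is an assumption.

```python
import statistics

def best_frequency(rates, freqs_khz, levels_db, threshold_db):
    """rates[i][j] is the response (spikes/s) to freqs_khz[i] at levels_db[j].

    The response at each frequency is averaged over levels at or above
    threshold, and the frequency with the largest mean is returned.
    """
    cols = [j for j, lvl in enumerate(levels_db) if lvl >= threshold_db]
    if not cols:
        raise ValueError("no levels at or above threshold")
    mean_rates = [sum(row[j] for j in cols) / len(cols) for row in rates]
    best = max(range(len(freqs_khz)), key=lambda i: mean_rates[i])
    return freqs_khz[best]

def cap_largest_areas(areas, fraction=0.05):
    """Replace the largest `fraction` of polygon areas with the median area.

    Assumption: the number of sites replaced is fraction * n rounded to the
    nearest integer. Infinite areas sort last, so they are replaced first.
    """
    n_replace = int(fraction * len(areas) + 0.5)
    median_area = statistics.median(areas)
    order = sorted(range(len(areas)), key=lambda i: areas[i], reverse=True)
    capped = list(areas)
    for i in order[:n_replace]:
        capped[i] = median_area
    return capped
```

For example, with 20 sites of which one has an unbounded polygon, `cap_largest_areas` swaps only that one area for the median of the 20 values, so a single unenclosed border site no longer dominates the summed area.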
Awake single-unit electrophysiology.
Single-unit (SU) recordings were made from the core auditory cortices of 43 awake female mice. “Maternal” animals were either primiparous mothers recorded within 2 weeks of pup weaning (n = 12 mice) or cocaring virgin female mice (“early” cocarers, n = 6) prepared for recordings immediately following 6 d of pup care. “Nonmaternal” animals were either pup-naive virgin females (n = 12) or former cocarers recorded within 2 weeks after pup weaning (n = 13). The subgroups contained within the maternal and nonmaternal groups were previously shown to have similar behavioral preferences for and neural responses to pup USVs (Ehret and Koch, 1989; Lin et al., 2013), and were grouped in the present study to provide additional experimental power.
Two days before recording, animals were anesthetized with isoflurane (2–5%, delivered with O2), and aseptic surgery was performed to stereotaxically define a recording grid over the left auditory cortex and implant a head post (Galindo-Leon et al., 2009). Briefly, the skull was exposed, and the left temporal muscle was deflected to permit access to the bone overlying auditory cortex. Using India ink applied to a stiff wire mounted on a stereotaxic manipulator, dots ∼100 μm in diameter were drawn on the skull in three rows (1.5, 2.0, and 2.5 mm below bregma) and five columns (spanning 50–90% of the distance between bregma and lambda, in 10% steps). Following recording grid placement, dental cement was used to secure an inverted flat-head machine screw on the midline equidistant from bregma and lambda. The animal was then allowed to recover, with buprenorphine (0.05 mg/kg) provided for pain management.
On the day of a recording, the animal was reanesthetized with isoflurane and holes (∼150 μm in diameter) were drilled on one or more grid points with an insect needle held by a pin vise. In addition, a hole for the ground wire was drilled in the left frontal cortex. Two hours after recovering from this procedure, the animal was handled for 10 min and then placed into a foam-lined cylindrical (∼3 cm diameter) restraint device that secures the body while leaving the head exposed. The implanted head post was then secured to a post mounted on a vibration-isolation table, while the capsule containing the body of the animal was suspended from rubber bands. Each recording typically lasted 2–4 h, and excessive movement or signs of stress signaled the end of an experiment.
Electrophysiological activity was recorded (sample rate, 24,414.0625/s) by a 6 MΩ tungsten electrode (FHC), and filtered at >300 Hz and <3 or 6 kHz. The electrode was driven orthogonally into auditory cortex using a hydraulic microdrive (FHC) to an initial depth of 700 μm. The electrode was then retracted toward the cortical surface in 5 μm steps until an SU was detected. SU isolation was based on the absence of spikes during the absolute refractory period (1 ms), and on a cluster analysis of various spike features (e.g., first vs second peak amplitudes, vs peak–peak times). In several cases, multiple SUs were recorded at one location and could be extracted by clustering based on spike features.
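The refractory-period criterion for SU isolation can be illustrated with a short check. This is a sketch with a hypothetical function name; the 1 ms window comes from the text, and the clustering on spike features is not reproduced here.

```python
def refractory_violations(spike_times_ms, refractory_ms=1.0):
    """Count interspike intervals shorter than the absolute refractory period.

    A well-isolated single unit should yield zero violations.
    """
    times = sorted(spike_times_ms)
    return sum(1 for a, b in zip(times, times[1:]) if b - a < refractory_ms)
```

A candidate unit with spikes at 0, 0.5, 3, and 10 ms would be flagged with one violation (the 0.5 ms interval).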
As in the anesthetized mapping study, stimulus presentation and data acquisition were controlled through Tucker-Davis Technologies System 3 hardware and the Brainware application via modules programmed in the RPvdsEx environment. Pure tones (60 ms duration, 40 frequencies log-spaced at 6.4–95 kHz, at 60 dB SPL, repeated 5–15 times each, and presented in pseudorandom order) and USVs (Fig. 1, calls 1–36, repeated up to 50 times each and presented at 65 dB SPL in pseudorandom order) were played back (sample rate, 223,214.2857/s) via a free-field speaker (EMIT high-energy speaker, Infinity Systems) positioned 11 cm lateral to the right ear of the animal. Occasionally, an SU drifted sufficiently in amplitude that it could no longer be isolated, in which case stimulus presentation was terminated with fewer trials.
Single-unit classification.
Off-line, the consensus of three independent, blind observers determined whether an SU was unresponsive, excited by, or only inhibited by USVs, judging from the spike rasters and overall PSTH. Raster trials were grouped according to USV category, and were arranged in order of increasing USV frequency and duration. Observers judged whether there was increased or decreased firing in the 100 ms after stimulus onset by looking for a change in the density and consistency across trials of the rasters. If there was only decreased firing, the SU was classified as USV inhibited (and is not considered further here); if there was any increased firing, the SU was classified as USV excited, even if inhibition was also present. This method largely agreed with an automated algorithm for classifying USV-excited SUs (Lin and Liu, 2010), which can suffer from false negatives.
Here we focused only on USV-excited units. Then, to account for some diversity present in the SU population on the whole, we subdivided the pool of USV-excited SUs into subgroups meeting previously validated physiological parameters. In our prior work (Lin and Liu, 2010), we identified a subset of the thin-spiking units in auditory cortex whose onset responses to USVs are best predicted by a feedforward model of acoustic envelope integration (Neubauer and Heil, 2008); a separate subpopulation of late-responding thick-spiking units was quite poorly predicted by this model. The study found that these two distinct groups of auditory cortical neurons that encode sound onsets so differently actually varied significantly in basic physiological parameters aside from their spike waveform, including spontaneous firing rates and response latency. We used those parameter means ± 3 SDs as templates for pruning our larger SU population down to these nonoverlapping best-predicted (peak–peak spike width, <0.35 ms; onset latency between 7.1 and 12.5 ms; spontaneous rate, <29.2 spikes/s) and poorly predicted (peak–peak spike width, >0.35 ms; onset latency between 19.9 and 58.7 ms; spontaneous rate, <5.4 spikes/s) subgroups. Because spike width is known to correlate with neurochemical identity, and because our subpopulations have nonoverlapping distributions of spike widths, we refer to these subsets of SUs as distinct subgroups of putative interneurons (PINs) and putative pyramidal neurons, respectively.
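The template-based pruning described above reduces to a pair of range checks. The sketch below uses the cutoffs quoted in the text. The function name is hypothetical, and treating the latency bounds as inclusive is an assumption.

```python
def classify_unit(spike_width_ms, onset_latency_ms, spont_rate_hz):
    """Assign a USV-excited SU to the PIN or PPN template, or to neither.

    Cutoffs follow the parameter ranges quoted in the text.
    Assumption: latency bounds are inclusive.
    """
    if (spike_width_ms < 0.35
            and 7.1 <= onset_latency_ms <= 12.5
            and spont_rate_hz < 29.2):
        return "PIN"   # thin-spiking, early-onset (best-predicted) subgroup
    if (spike_width_ms > 0.35
            and 19.9 <= onset_latency_ms <= 58.7
            and spont_rate_hz < 5.4):
        return "PPN"   # thick-spiking, late-onset (poorly predicted) subgroup
    return None        # unit falls outside both templates
```

Because the two latency ranges do not overlap and the spike-width criteria are mutually exclusive, no unit can satisfy both templates; units meeting neither are simply left out of the subgroup analyses.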
Single-unit response analyses.
We analyzed SU firing rates evoked by stimuli, measured over a 100 ms window starting at stimulus onset. Absolute firing rates were compared by Wilcoxon rank-sum tests, since these rates were not generally normally distributed due to their clustering at low rates. Firing rates were also normalized by either dividing the evoked firing rate by the spontaneous rate [signal-to-noise ratio (SNR)], or by subtracting the spontaneous rate from the evoked rate. Normalized firing rates were then compared using a two-way ANOVA. Significant results were reported only if both the divisive and subtractive normalization methods produced statistically significant comparisons. To conserve space, only division-normalized firing rate responses were plotted.
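As a concrete illustration of the two normalizations, the following sketch computes an evoked rate over the 100 ms window and then normalizes it divisively (SNR) and subtractively. Function names are hypothetical, and the text does not say how a spontaneous rate of zero was handled, so the sketch simply raises an error in that case.

```python
def evoked_rate(spike_times_ms, n_trials, window_ms=100.0):
    """Mean firing rate (spikes/s) in the window starting at stimulus onset.

    spike_times_ms pools spike times (relative to onset) across all trials.
    """
    count = sum(1 for t in spike_times_ms if 0.0 <= t < window_ms)
    return 1000.0 * count / (n_trials * window_ms)

def divisive_snr(evoked_hz, spont_hz):
    """Divisive normalization: evoked rate over spontaneous rate (SNR)."""
    if spont_hz <= 0:
        raise ValueError("SNR is undefined for a zero spontaneous rate")
    return evoked_hz / spont_hz

def subtractive_rate(evoked_hz, spont_hz):
    """Subtractive normalization: evoked rate minus spontaneous rate."""
    return evoked_hz - spont_hz
```

A unit with three spikes inside the window over two trials has an evoked rate of 15 spikes/s. With a spontaneous rate of 3 spikes/s, its SNR is 5 and its subtractive rate is 12 spikes/s.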
To estimate the dissimilarity between the spiking responses to two different calls, the metric of van Rossum (2001) was used. To compute the van Rossum distances, each spike train was first convolved with a decaying exponential function, as follows (Eq. 1):

$$f(t) = \sum_{k=1}^{M} H(t - t_k)\, e^{-(t - t_k)/\tau} \qquad (1)$$

where H(t) is the Heaviside step function, t_k is the time of the kth spike, M is the total number of spikes in the spike train, and τ is the exponential decay constant. From here, a distance was computed between two spike trains (Eq. 2):

$$D_{ij}^{2} = \frac{1}{\tau} \int_{0}^{T} \left[ f_i(t) - f_j(t) \right]^{2} dt \qquad (2)$$

where D_ij is the van Rossum distance between spike trains i and j, and T is the length of the window. We chose an exponential decay constant (τ) of 10 ms, which is well matched to the time course of EPSPs in the auditory cortex (Wehr and Zador, 2005) and has been associated with optimal discrimination of conspecific vocalizations in the songbird analog of primary auditory cortex (Narayan et al., 2006). Distances were computed for “collapsed” spike trains that included all spikes for a given unit in all trials in which the stimuli of interest were played. This helped us overcome low spike numbers in individual trials, while retaining stimulus information. If in the resulting collapsed spike train there were two spikes with identical times, one of the spike times was jittered by the smallest amount possible bounded by our sampling rate of the spiking data (0.0001 ms). Finally, the metric of Victor and Purpura (1997) was also computed and produced the same conclusions, so only the van Rossum metric is reported here.
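The distance computation can be sketched numerically as follows. The sketch assumes the standard form of the van Rossum (2001) metric: each train is filtered with a causal exponential of time constant τ, and the squared difference of the filtered trains is integrated over the window and scaled by 1/τ. The function names, the 0.05 ms integration step, and the 100 ms default window are illustrative assumptions, not values taken from the study's analysis code.

```python
import math

def collapse_trials(trials):
    """Pool the spike times (ms) from all trials into one sorted train."""
    return sorted(t for trial in trials for t in trial)

def filtered_train(spike_times_ms, t_ms, tau_ms):
    """Value at time t of the train convolved with a causal exponential."""
    return sum(math.exp(-(t_ms - s) / tau_ms)
               for s in spike_times_ms if s <= t_ms)

def van_rossum_distance(train_i, train_j, window_ms=100.0, tau_ms=10.0,
                        dt_ms=0.05):
    """Approximate the van Rossum distance with a left Riemann sum.

    D^2 = (1/tau) * integral over [0, T] of (f_i - f_j)^2 dt
    """
    n_steps = int(round(window_ms / dt_ms))
    total = 0.0
    for k in range(n_steps):
        t = k * dt_ms
        diff = (filtered_train(train_i, t, tau_ms)
                - filtered_train(train_j, t, tau_ms))
        total += diff * diff * dt_ms
    return math.sqrt(total / tau_ms)
```

Two sanity checks follow from this form. Identical trains give a distance of zero, and a single spike compared with an empty train gives D² close to 1/2 when the window is much longer than τ. Moving one spike further from its counterpart in the other train increases the distance, which is the property that makes the metric sensitive to spike timing at the scale of τ.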
To compare responses to acoustically matched USVs, we calculated distances between the collapsed spike trains generated by pup and adult USVs matched for frequency, duration, and degree of frequency modulation at onset (Fig. 1, e.g., calls 5 and 23, 6 and 24) or between two adult USVs matched for frequency and duration (Fig. 1, e.g., calls 23 and 24). Then, for each SU, we took the simple arithmetic mean of distances between matched pup and adult calls, and between matched adult calls. For a given SU, between-category discrimination was compared with within-category discrimination by expressing as a ratio the average pup–adult USV distance to the average adult–adult USV distance, such that a value of 1 would indicate comparable between-category and within-category discrimination.
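The between-category to within-category ratio for one SU is a quotient of two arithmetic means. A minimal sketch, with a hypothetical function name and made-up distances in the usage note:

```python
from statistics import mean

def discrimination_ratio(pup_adult_distances, adult_adult_distances):
    """Ratio of mean pup-adult distance to mean adult-adult distance.

    A value of 1 means matched pup and adult calls are no more discriminable
    than two matched adult calls; values above 1 favor between-category
    discrimination.
    """
    return mean(pup_adult_distances) / mean(adult_adult_distances)
```

For instance, hypothetical pup–adult distances of 2.0 and 4.0 against adult–adult distances of 1.0 and 2.0 give a ratio of 2.0.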
To complement our analyses of individual SU responses, we also examined the population responses of each neural subpopulation as a whole. Three independent, blinded observers scored the USV responses of each SU using a binary yes/no scale. Only USVs that generated unambiguous excitatory responses (i.e., three “yes” scores) were counted. The reported proportions were calculated by dividing the number of such responses for a given USV by the total number of SUs in the subpopulation. These proportions were compared with the relative probability of the USV under consideration. These probabilities were calculated by constructing a three-dimensional histogram of our USV library over the three acoustic parameters we controlled for (duration, frequency, and degree of frequency modulation at onset), with a bin resolution of 2 ms, 0.976 kHz, and 1.952 kHz/ms.
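Both quantities in this paragraph can be sketched briefly. The bin widths are those given in the text. Everything else is an assumption made for illustration: the function names, bins anchored at zero, and "relative probability" taken as the bin count divided by the library size.

```python
import math
from collections import Counter

# Bin widths from the text: duration (ms), frequency (kHz), onset FM (kHz/ms).
BIN_WIDTHS = (2.0, 0.976, 1.952)

def bin_index(call):
    """Map a (duration, frequency, onset FM) triple to its 3-D bin index.

    Assumption: bin edges are anchored at zero along every dimension.
    """
    return tuple(math.floor(v / w) for v, w in zip(call, BIN_WIDTHS))

def relative_probabilities(library):
    """Fraction of library calls falling in each occupied 3-D bin."""
    counts = Counter(bin_index(call) for call in library)
    return {b: n / len(library) for b, n in counts.items()}

def proportion_responding(scores):
    """scores holds, for each SU, the three observers' yes/no judgments.

    Only unanimous "yes" scores count as an excitatory response.
    """
    return sum(1 for s in scores if all(s)) / len(scores)
```

The probability assigned to a given USV exemplar is then `relative_probabilities(library)[bin_index(exemplar)]`, which can be compared against `proportion_responding` for that exemplar across the subpopulation.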
Results
Motherhood does not alter core AC map for ultrasounds or USVs
Mouse pup and adult USVs form distinct multidimensional acoustic clusters (see Materials and Methods; Fig. 1), and the combination of these acoustic features allows >90% accurate categorization by an ideal receiver (Liu et al., 2003; Liu, 2006). Of these features, ultrasonic frequency best predicts membership to a USV category, and thus might elicit a biased representation across the maternal AC as a consequence of pup USVs having greater behavioral relevance to those animals. Since much prior work has established that laboratory conditioning with pure tones drives an increase in the representation of the relevant frequencies across the AC map (Weinberger, 2004), we asked whether this also holds for ultrasonic frequencies in the maternal AC after pup USVs have gained behavioral relevance.
Using standard cortical mapping methodology, we recorded multiunit electrophysiological responses to pure tones across the ACs of anesthetized pup-naive females and mothers who had weaned their first litter immediately before recording. At each site, we extracted a BF, which was defined as the frequency evoking the strongest firing rate, averaging over all superthreshold levels (Fig. 2A), and created a BF map of core AC (Fig. 2B). Our samplings of core fields A1, AAF, and UF were not different between groups, as assessed by their field-specific BF distributions (Kolmogorov–Smirnov test: A1, D(231) = 0.072; AAF, D(181) = 0.18; AAF/A1 border, D(61) = 0.14; UF, D(89) = 0.21; p > 0.05 in all cases).
We found that the spatial map of ultrasound tone responses was not changed at this postweaning time point by pup experience. The proportionate size of the core auditory field UF, which encompasses neurons tuned to frequencies in the ultrasound range (Stiebler et al., 1997), was comparable in mothers and virgins [two-sample t test: t(14) = 0.05, p > 0.05, not significant (ns); Fig. 2C]. The total area of core AC tuned to frequencies above ∼50 kHz, including sites inside and outside the UF, was also not different between groups (two-sample t test: t(14) = 0.30, p > 0.05, ns; Fig. 2D). More broadly, there was no systematic difference in the BF distribution across groups (D(568) = 0.089, p > 0.05). Finally, the average area of AC excited by a subset of natural pup USVs themselves was comparable for maternal and nonmaternal animals (two-sample t test: t(14) = 0.35, p > 0.05, ns; Fig. 2E). Thus, in contrast to expectations based on the concept of sensory cortical map plasticity, the increased behavioral relevance of the pup USV category for maternal mice did not enlarge the portion of core AC responding to the ultrasonic frequencies associated with pup USVs, or to natural combinations of features present in real pup USVs.
Increased behavioral relevance does not alter average SU excitation by pup USV category in awake mice
That increasing the behavioral relevance of pup USVs for maternal mice failed to yield map plasticity could be a true negative result consistent with the emergence of categorical representations only in higher-order areas. Alternatively, experimental factors may have masked the ability to observe such an effect. Responses to natural vocalizations could be different under anesthesia (Huetz et al., 2009). Further, the coarse resolution afforded by multiunit recording—while sufficient for typical laboratory mapping studies—could be insufficient to reveal plasticity for the subtle differences that distinguish acoustically similar categories from one another.
To avoid these limitations, we next analyzed USV-excited SU responses from awake maternal (n = 18 mice; 52 SUs) and nonmaternal (n = 25 mice; 78 SUs) mice, using our full set of acoustically matched pup and adult USVs (Fig. 1C). Our sampling of core AC fields did not differ between animal groups (χ²(3) = 6.2, p > 0.05). Example raster plots and PSTHs showing responses to pup and adult USVs illustrate the diversity with which different SUs can be excited by USVs (Fig. 3). The two SUs on the left, one each from a nonmaternal and maternal mouse, exhibit short-latency bursts of firing whose rate is graded by increasing onset frequency, regardless of whether a call is a pup or adult USV. In contrast, the two SUs on the right of Figure 3, one each from a nonmaternal and maternal mouse, show longer-latency and longer-duration responses to fewer calls. In particular, Figure 3D shows an SU with highly specific, more sustained responses to only certain pup (but not adult) USVs that have lower onset frequencies and onset FM. Spontaneous rates across these SUs also varied widely.
Given such diversity, when we pooled across SUs (Fig. 4A), we found no significant change in the pup USV-evoked absolute firing rate as a result of maternal experience (Wilcoxon rank-sum test, W(52,78) = 5387, p > 0.05; Fig. 4B), agreeing with a previous study based on a smaller number of USV-excited SUs (Galindo-Leon et al., 2009). Furthermore, when compared with responses evoked by adult USVs, neither maternal animals (Wilcoxon rank-sum test, W(52,52) = 2623, p > 0.05) nor nonmaternal animals (Wilcoxon rank-sum test, W(78,78) = 6049, p > 0.05) showed a significant difference for pup USV-evoked responses (Fig. 4B), suggesting that the average excitatory response of core auditory cortical neurons to USVs does not differentiate between these vocal categories. Hence, insofar as average pooled responses and spatial response maps are similar coarse measures of stimulus representations, our result from USV-excited SUs in awake animals is consistent with the absence of excitatory plasticity observed in our anesthetized multiunit mapping study.
Improved SNR for pup USV category among maternal putative pyramidal neurons
Given this high degree of diversity in response profiles, however, averaging over the entire USV-excited SU population could obscure systematic plasticity occurring within distinct subpopulations of neurons. Segregating SUs into different physiological subclasses, as has been done based on action potential waveform (Atencio and Schreiner, 2008; Lin and Liu, 2010), could help to reduce this variability. We thus subdivided (see Materials and Methods) our SU population to isolate more homogeneous subsets whose sound-encoding characteristics have been previously validated using a feedforward modeling approach (Lin and Liu, 2010). Specifically, we had found that a distinct subset of all PINs with thin spikes have short-latency, more transient responses to calls that can be well predicted by the acoustics of the onset of a sound (Fig. 3A,B), whereas a subset of all PPNs with thick spikes and longer latencies are much less acoustically faithful to these onset characteristics (Fig. 3C,D). We next assessed whether these neural subgroups, which can be found in both maternal and nonmaternal animals, are differentially plastic for the pup USV category as it gains behavioral relevance.
Among our PPNs in nonmaternal mice, the average absolute firing rates were similar in response to pup and adult USVs (Fig. 4D, top, E, right). For maternal mice, though (Fig. 4D, bottom), the time course of PPN firing was stronger on average in response to pup USVs compared with adult USVs. Additionally, the spontaneous firing of USV-excited PPNs (but not PINs) in maternal mice was significantly lower than for nonmaternal PPNs (Wilcoxon rank-sum test; PPN: W(14,15) = 156, p < 0.05; PIN: W(8,11) = 68, p > 0.05; Fig. 4C), refining a similar result found earlier for USV-responsive SUs (Lin et al., 2013). As such, there was a significant interaction between maternal experience and USV type for both the absolute firing rate (two-way ANOVA, F(1,27) for group × USV = 7.53, p < 0.05) and the SNR (two-way ANOVA, F(1,27) for group × USV = 8.30, p < 0.05; Fig. 4E). Indeed, nearly every PPN in maternal mice had a higher absolute firing rate and SNR for pup versus adult USVs, a percentage far greater than chance (14 of 15 PPNs; proportion test: z = 2.51, p < 0.05), unlike in nonmaternal mice (9 of 15 PPNs; proportion test: z = 0.55, p > 0.05).
In contrast, neither the absolute firing rate responses (two-way ANOVA: F(1,17) for group × USV = 0.48, p > 0.05, ns) nor the SNR (two-way ANOVA, F(1,17) for group × USV = 2.67, p > 0.05, ns; Fig. 4F,G) of PINs were systematically influenced by USV category in nonmaternal or maternal mice. The number of PINs with higher responses to pup USVs compared with adult USVs was also no different from chance for either group of animals (proportion test: maternal mice: 6 of 8 PINs, z = 1.0, p > 0.05; nonmaternal mice: 5 of 11 PINs, z = −0.21, p > 0.05). Given that pup and adult USVs were matched for basic acoustic features (see Materials and Methods), our results suggest that PPNs, but not PINs, in maternal animals become sensitive to higher-order features differentiating the two vocal categories, thereby selectively biasing the representation of pup over adult USVs, presumably to enhance detection of this behaviorally relevant category.
Improved neural discrimination of pup from acoustically matched adult USVs among maternal putative pyramidal neurons
If PPNs in maternal animals differentiate pup from adult USVs based on higher-order acoustic features, then responses for a pup and adult USV that are matched in basic acoustic parameters should differ more than the responses for the two adult USVs that are matched in the same basic parameters. This would indicate sensitivity to features of the sound that we cannot match, and which would be more systematically different between pup and adult USVs (e.g., the more pronounced FM observed throughout the adult USVs). We sought support for this idea by applying spike distance metrics to measure the similarity between spike trains evoked by pairs of matched USVs. Spike trains with high similarity (or a small “distance” between trains) are less discriminable than those with low similarity (large distance). We calculated distances between spike train pairs consisting of either two adult USV responses or one pup USV response and one adult USV response, where stimuli were matched in all cases for duration and frequency (i.e., calls occupied the same grid location as in Fig. 1A, e.g., USV 1–19 vs 19–20).
Among PPNs, mean pup–adult distances were reliably larger than adult–adult distances for most SUs in maternal but not nonmaternal mice (Fig. 5A, top), indicating enhanced discrimination of pup from acoustically matched adult USVs in the maternal group. For this subpopulation, we found the ratio of mean pup–adult to mean adult–adult USV response distance to be significantly higher for maternal than for nonmaternal mice (two-sample t test: t(27) = 2.39, p < 0.05; Fig. 5A, top inset). This was not found for the PIN subpopulation (Fig. 5A, bottom), where pup–adult and adult–adult USV response distances were comparable across groups (two-sample t test: t(17) = 1.22, p > 0.05, ns; Fig. 5A, bottom inset). Our results were consistent whether we used the van Rossum (2001) method for computing spike distances (Fig. 5) or the alternative Victor and Purpura (1997) metric.
Our findings were made more striking by the fact that the pup and adult USVs used to estimate pup–adult distances were actually matched for all three basic acoustic parameters (frequency, duration, and degree of FM at USV onset), whereas the USVs used for adult–adult distances were necessarily matched only for frequency and duration. Thus, in these comparisons, the pup USV was more similar in its basic acoustics to the adult USV than the latter was to another adult USV. Yet for putative pyramidal neurons in maternal animals, adult USVs evoked spike trains that were more dissimilar from those evoked by the semantically different pup USV than from those evoked by the semantically identical adult USV. This improved discrimination of the pup away from the adult USV is presumably based on natural acoustic features that were not matched between categories, such as the full trajectory of the frequency of a USV.
Bias for prototypical combinations of acoustic features in pup USV category by maternal putative pyramidal neurons
The analyses above suggest plasticity within individual USV-excited PPNs for the combination of acoustic features found in natural pup USVs. At the population level, we can ask whether this translates into a bias in how many such units respond when a specific sound contains the combination of features most predictive of this category. We constructed maps of the incidence of detectable neural responses (see Materials and Methods) for each pup USV within our acoustic grid for all SUs classified as USV excited, with separate maps for USVs with low- or high-onset FM (Fig. 6A). The gray shading overlaid on the acoustic feature probability clouds indicates the proportion of PPNs that responded to the corresponding USV exemplar.
Across the 18 pup USVs, the average proportion of the USV-excited PPN subpopulation responding was significantly higher for maternal compared with nonmaternal animals (paired t test: t(17) = 3.85, p < 0.01). Interestingly, the increase was not simply uniform across USVs. The difference between nonmaternal and maternal animals in the proportion of responding units for each USV was significantly correlated with the probability of the combination of onset frequency, duration and onset FM for that USV to be found in our pup USV library (r = 0.49, p < 0.05; Fig. 6B). Taking the three most probable pup USVs (Fig. 6A, highlighted in red) and comparing them to the three least probable pup USVs (Fig. 6A, highlighted in brown), we found a significant interaction between animal group and USV probability (two-way ANOVA: F(1,4) for group × USV = 7.76, p < 0.05). Put another way, the difference in the proportion of responsive PPNs between the maternal and nonmaternal populations was greater for the more probable pup USV exemplars, relative to the proportion difference observed for less probable pup USV exemplars. No such effect was observed for PINs, as the proportion of this subpopulation that was excited by the most and least probable pup USVs was not modulated by maternal experience (two-way ANOVA: F(1,4) for group × USV = 5.08, p > 0.05, ns). Importantly, since this per-call analysis was performed only on USV-excited SUs (i.e., excited by any call), whose proportion as a fraction of all recorded SUs is not different between maternal and nonmaternal animals (χ²(3) = 3.544, p > 0.05), our result here is not inconsistent with the lack of large-scale map plasticity across core AC reported above. Instead, what becomes biased in the core AC of animals that find a vocal category behaviorally relevant is the representation by specific PPNs of the combinations of acoustic features that are statistically more predictive of that category.
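The per-call comparison above can be sketched as follows, assuming that each unit's response to each pup USV has already been scored as detected or not. The function names, the sign convention (maternal minus nonmaternal), and the input layout are hypothetical choices for illustration and are not taken from the analysis code of this study.

```python
import numpy as np

def proportion_difference(responded_maternal, responded_nonmaternal):
    """Per-call difference in the fraction of responsive units.

    Each input is a 0/1 (or boolean) array of shape (n_units, n_calls),
    where 1 marks a detectable response of that unit to that call.
    Returns maternal minus nonmaternal proportion for each call
    (sign convention assumed here).
    """
    p_maternal = np.asarray(responded_maternal, dtype=float).mean(axis=0)
    p_nonmaternal = np.asarray(responded_nonmaternal, dtype=float).mean(axis=0)
    return p_maternal - p_nonmaternal

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    return float(np.corrcoef(x, y)[0, 1])
```

Given a vector holding the probability of each call's feature combination in the pup USV library, `pearson_r(call_probability, proportion_difference(...))` would give a correlation of the kind reported for Fig. 6B; the significance test is omitted from this sketch.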
Discussion
Natural sound categories like speech exhibit stimulus variation in multiple acoustic dimensions simultaneously, so that more than one feature is often needed to distinguish between categories. Here, we investigated how core auditory cortical activity differentiates such feature combinations in the case of species-specific vocal categories that overlap in their features (Liu et al., 2003) and asked whether increasing the behavioral importance of a stimulus category biases this differentiation. Unexpectedly, standard methods from laboratory-conditioning studies for assessing experience-dependent plasticity revealed that in our natural maternal communication context, the enhanced behavioral relevance of a vocal category did not correlate with map expansion for the features of that sound. Instead, robust excitatory plasticity for category features was found at a finer scale. We identified a distinct set of core auditory cortical putative pyramidal neurons that develop increased sensitivity to the specific combinations of basic and higher-order acoustic features in the acquired vocal category. In maternal animals, these neurons, but not a complementary subset of putative interneurons, exhibited increased absolute firing rates and signal-to-noise ratios in response to behaviorally important pup USVs, as well as greater spike train dissimilarity relative to acoustically matched but semantically distinct adult USVs. Importantly, at a population level, the plasticity within this class of neurons favored the combination of acoustic features that is statistically more likely to be naturally found in the behaviorally relevant vocal category.
Expansion of the core AC area tuned to a behaviorally relevant frequency has been the classic means by which experience-dependent excitatory plasticity has been documented in the laboratory (Shepard et al., 2012; Schreiner and Polley, 2014) and has been argued to be a long-term physiological trace of the acquired behavioral relevance of a sound (Weinberger, 2004). Hence, given that the single-frequency USVs of mouse pups are a natural extension of the pure tones used for laboratory conditioning, the lack of map plasticity for high ultrasonic frequencies and natural USVs after motherhood was unexpected. Several experimental considerations might have contributed to this negative result. First, it is possible that by mapping mothers at a postweaning time point, we missed a peak in map plasticity that occurred transiently to support pup USV learning during active pup care (Reed et al., 2011). Second, we mapped using only 60 ms pure tones and prerecorded USVs that, while statistically similar to the calls of the experimental mothers' own pups, were not actually produced by those pups, and we did not include other synthetic sounds that might have revealed circuitry differences (e.g., FM sweeps). This choice allowed fine control over the acoustic features of the USVs and ensured that our results would be attributable to these features rather than the explicit familiarity of a USV. However, this practice differs from conditioning studies that map responses to the actual exemplars used in training, which might be a requirement for observing map-level changes. Alternatively, our negative map finding may reflect what happens in the real world when sound categories (e.g., pup and adult USVs) overlap in the frequency domain. Tonotopic map expansion correlated with the enhanced relevance of one category could be counterproductive since it would also expand the representation of the less relevant category spanning the same frequency range. Moreover, if expansion occurs only for the exact stimulus experienced, such plasticity would not be adaptive for processing natural variants within a sound category.
In a real-life situation where categorization across multiple acoustic features is needed, a more critical form of experience-dependent plasticity may pertain to how specific subpopulations of SUs respond to important categories. In the present study, we found that while maternal experience did not alter firing rates or SNRs when averaging over the entire SU pool, robust plasticity favoring pup USVs over acoustically matched adult USVs emerged when a definable subgroup of PPNs was considered separately. This effect was not found for a complementary subgroup of PINs. We carved out these subgroups using characteristics validated from a feedforward model that predicts initial cortical spiking from the integration of the amplitude envelope of a sound (Neubauer and Heil, 2008; Lin and Liu, 2010). The best-predicted units faithfully encode sound onsets with short latency and are all thin spiked with higher spontaneous firing rates, while the worst-predicted units are predominantly thick spiked with low spontaneous rates and longer, less predictable response latencies (Lin and Liu, 2010). The spike widths of these subpopulations suggest that they represent subsets of interneurons and pyramidal neurons, respectively (McCormick et al., 1985; González-Burgos et al., 2005). Recent studies have identified differential roles in sound encoding for these neural subtypes (Atencio and Schreiner, 2008; Lin et al., 2013; Schneider and Woolley, 2013), but, to our knowledge, our results are the first to demonstrate how plasticity specific to a subset of thick-spiked core AC neurons could seed a more categorical representation of a natural vocal category.
The difference in long-term plasticity between these subsets of PPNs and PINs may seem somewhat surprising, under the presumption of a general balance between excitation and inhibition in AC (Wehr and Zador, 2003; Wu et al., 2008). Indeed, making a stimulus perceptually relevant by pairing it with cholinergic activation alters both inhibitory and excitatory synaptic inputs onto AC neurons until a new balance is reached (Froemke et al., 2007, 2013). It may be that our systematically defined PIN subpopulation represents only one type of inhibitory interneuron (presumably fast-spiking, parvalbumin-positive; Lin and Liu, 2010), and, by leaving unclassified other thin- and thick-spiked SUs, we may have missed long-term increases in the pup USV-excited responses of other classes of inhibitory interneurons. For example, late-onset somatostatin-expressing interneurons in layer 2/3 of AC target distal dendrites of excitatory neurons and might be responsible for modulating the frequency trajectory-dependent responses of our late-responding USV-excited PPNs (Li et al., 2014). Prior work also suggests that motherhood alters USV-inhibited responses (Galindo-Leon et al., 2009; Lin et al., 2013), particularly in regions tuned lower than the ultrasound frequencies, though it is not yet clear whether this would be due to increased firing of fast-spiking interneurons (Cohen et al., 2011) or plasticity in other interneurons.
Within PPNs though, we uncovered an acquired “combination sensitivity” for multiple acoustic features of a behaviorally relevant category. Auditory combination sensitivity has been seen in many species (Suga et al., 1983; Margoliash and Fortune, 1992; Rauschecker et al., 1995; Atencio et al., 2008; Sadagopan and Wang, 2009), but whether it is innately established, developmentally emergent, and/or subject to plasticity in adulthood has not been widely investigated. Our study used a unique paradigm to provide evidence that adult plasticity in this form of stimulus encoding is possible. Most studies take individual complex sounds and either parametrically morph (DiMattina and Wang, 2006) or decompose them into constituent features (Margoliash and Fortune, 1992). We instead used a collection of natural USV exemplars chosen to systematically span combinations of basic acoustic features (onset frequency, duration, and onset FM) shared by both USV categories.
Plasticity in combination sensitivity was found both within individual SUs and across the neural population. In maternal mice, but not nonmaternal mice, individual USV-excited PPNs better separated how they spiked for the set of pup USVs compared with adult USVs matched for basic acoustic features, as if they had become more sensitive to higher-order features (e.g., frequency trajectory) that make pup USVs distinct. Further, at the population level, responsiveness within the USV-excited population was reweighted across the sampled acoustic space to better encode calls with combinations of basic acoustic features that are statistically more likely to occur in natural pup USVs (Liu et al., 2003). The bias emerging from such combination sensitivity in these units likely primes higher-order areas along the auditory pathway to respond in a more category-selective manner.
More generally, our findings help to illuminate the process by which the auditory system extracts behaviorally relevant sound “objects” from the acoustic environment. Our data suggest that this begins as early as in primary auditory cortical fields, albeit in a specific subclass of neurons. Plasticity here due to learning could allow the acoustic features of the behaviorally relevant auditory object to better “pop out” of background sounds. For this reason, “salient” may be a more appropriate term than “behaviorally relevant” by connoting a more bottom-up process influenced by experience (Treue, 2003). This could enable downstream areas to be more sensitive to learned cross-category stimulus variations and to be more tolerant of within-category variations, thereby helping to untangle categorical stimulus features (DiCarlo et al., 2012). The plasticity we describe may in fact be an early mechanism by which a speaker becomes attuned to the acoustic feature combinations within phonemes that are ubiquitous in a language. For instance, the ability of a Mandarin speaker to facilely discriminate different tonal contours that a non-Mandarin speaker does not clearly perceive may be enabled by just the sort of plasticity for co-occurring acoustic features that we uncovered here.
Footnotes
*K.N.S. and F.G.L. are co-first authors.
This work was supported by National Institutes of Health Grants R01-DC-8343 (R.C.L.), F31-DC-11987 (K.N.S.), T90-DA-032466 (C.L.Z.), and T32-HD-071845 (K.K.C.). We thank E.E. Galindo-Leon for electrophysiology assistance.
The authors declare no competing financial interests.
Correspondence should be addressed to Robert C. Liu, Rollins Research Building, Room 2006, 1510 Clifton Road NE, Atlanta, GA 30322. robert.liu@emory.edu