Abstract
Hair cells of the inner ear are essential for hearing and balance. As a consequence, pathogenic variants in genes specifically expressed in hair cells often cause hereditary deafness. Hair cells are few in number and not easily isolated from the adjacent supporting cells, so the biochemistry and molecular biology of hair cells can be difficult to study. To study gene expression in hair cells, we developed a protocol for hair cell isolation by FACS. With nearly pure hair cells and surrounding cells, from cochlea and utricle and from E16 to P7, we performed a comprehensive cell type-specific RNA-Seq study of gene expression during mouse inner ear development. Expression profiling revealed new hair cell genes with distinct expression patterns: some are specific for vestibular hair cells, others for cochlear hair cells, and some are expressed just before or after maturation of mechanosensitivity. We found that many of the known hereditary deafness genes are much more highly expressed in hair cells than surrounding cells, suggesting that genes preferentially expressed in hair cells are good candidates for unknown deafness genes.
Introduction
Hair cells (HCs), the mechanoreceptor cells for hearing and balance, are epithelial cells that have undergone terminal division and differentiation. They are embedded in sensory epithelia adjacent to cells that do not have a mechanosensory function. Understanding the gene-expression patterns that differentiate HCs from various surrounding cells (SCs) can help elucidate the proteins that give HCs their unique function.
Because HCs do not divide they must last the life of an animal, which can be many decades in the case of humans. Death of HCs over time leads to deficits in hearing and balance (Groves, 2010). Much research on treatment for age-related hearing loss is focused on identifying the patterns of gene expression that lead to an HC phenotype, so that induction of these genes in remaining supporting cells might restore function (for review, see Géléoc and Holt, 2014). But first we need to understand gene expression during development.
Some hearing loss has hereditary causes. Congenital deafness occurs in 1 of approximately 500 births and >50% of cases are caused by gene defects (Smith et al., 1993). Many such genes are enriched in HCs: although HCs constitute only approximately one-tenth of the cells in the sensory epithelium of the cochlea, approximately one-third of the genes causing inner ear malformations or dysfunction in the mouse involve HC pathology (85 of 277; http://hearingimpairment.jax.org/master_table1.html). In humans, nearly 300 genetic loci for hearing impairment have been described, but the causative genes for ∼100 have been identified (116; http://hereditaryhearingloss.org). Understanding the complete transcriptome of HCs may thus facilitate the discovery of new deafness genes.
We previously performed transcriptome studies of HCs by differential analysis between normal whole-sensory epithelia and mutant epithelia, either lacking HCs (Atoh1−/−; Scheffer et al., 2007a, b) or lacking hair bundles (Pou4f3−/−; Z.-Y. Chen and D.P. Corey, unpublished observations). But elimination or inactivation of one cell type can affect gene expression by remaining cells, compromising gene identification by differential comparison. It would be better to isolate pure populations of normal cells. Here, we used a mouse strain that expresses eGFP driven by the promoter for Pou4f3 (Huang et al., 2013), an HC transcription factor. We developed an enzymatic treatment to dissociate cells of the sensory epithelia, and FACS to purify HC. With next-generation or high-throughput sequencing (HTS), we performed an unbiased and quantitative transcriptome study at four developmental time points, before and during the acquisition of mechanosensitivity. We compared gene expression by HCs to that of the other cells in the sensory epithelium, collectively referred to as surrounding cells. Groups of genes differentially expressed in one or another cell type were linked to function. To make these data and comparative expression metrics publically available, we created the Shared Harvard Inner Ear Laboratory Database (shield.hms.harvard.edu), which presents gene expression data integrated with comprehensive annotation including potential deafness loci.
Materials and Methods
Animal protocols.
All experiments were performed in compliance with ethical regulations and approved by the Animal Care Committees of Massachusetts Eye and Ear and Harvard Medical School.
Cell dissociation, FACS, and RNA extraction.
We used a transgenic mouse strain expressing GFP under the control of the Pou4f3 promoter (Tg(Pou4f3-Isl1-eGFP)). These mice have normal morphology, normal expression of differentiation markers, and normal function until 17 months old (Huang et al., 2013). We also used a strain expressing tdTomato under the control of the Gfi1 promoter (Gfi1tm1(Cre)Gan;R26tdTomato). In both strains, the only fluorescent cells in the inner ear are HCs. Animals from either sex were used. Utricles were dissected from the temporal bone and (at postnatal stages) incubated for 2 min in protease XXIV (0.1 mg/ml) to remove the otoconia. Cochleae were dissected and freed from the spiral ganglion and Reissner's membrane to expose the sensory epithelium. All dissections were done in ice-cold PBS, and utricles and cochleae were dissected in <1 h. The organs were collected in DMEM (Life Technologies) + 5% FBS on ice. The cells were dissociated by incubating the organs at 37°C in 1 mg/ml dispase (Gibco) and 1 mg/ml collagenase I (Worthington) in 100 μl for ∼10–12 utricles or 200 μl for 10–12 cochleae for 30 min at E16 and P0 or 45 min at P4 and P7 and triturating with a pipette. The dissociation was controlled visually with an inverted microscope. Dissociation buffer (Gibco 13151–014 + 5% FBS) was added to complete the dissociation and the samples were placed on ice. The dissociated tissues were filtered through a 40 μm cell strainer to eliminate clumps before sorting. Cells were sorted on a BD FACS Aria II cell sorter using a 100 μm nozzle and low pressure. Hair cells were collected using the brightest GFP fluorescence signal and other cells were collected using the lowest fluorescence signal. The number of collected cells is indicated in Figure 1D. Cells were collected directly into a lysis buffer from the RNeasy Plus Micro kit (Qiagen). Total RNA extraction was performed using the manufacturer's recommendation and the quality was assessed with Agilent 2100 Bioanalyzer picoChips.
Study design and statistics.
In this series of 16 samples, we used a matrix design of three factors: cell types (HCs and SCs), tissues sources (cochlea and utricle), and developmental stages (E16, P0, P4, and P7). We did not include a biological duplicate of the same condition; instead, for differential gene expression analysis of each factor, we combined samples of different levels in other factors and treated them as replicates to allow statistical analysis. This practice overestimated the variance due to other factors; therefore, the FDR values were more conservatively estimated than those calculated when true biological replicates were used.
HTS.
mRNA was amplified using the Ovation 3′-DGE System (NuGEN). 3′-enriched transcriptome tags [∼500 bp on average from the poly(A) tail] were converted into nondirectional Illumina sequencing libraries at the Biopolymer Facility at Harvard Medical School. Synthetic spike-in standards were not used, because the goal was not absolute quantification, but relative differential expression. Single-end reads of 35 bp or 50 bp were sequenced using standard protocols on an Illumina HiSeq Sequencer. Paired-end reads were used for GFP+ cochlea and utricle at P0 and some sample libraries were rerun when the original read depths were not sufficient. The sequence quality was assessed using the open-source FastQC program. Quality-filtered reads were mapped against NCBI build 37/mm9 mouse genome assembly at DNAnexus (dnanexus.com). Assignments of nonunique alignments were weighed by the posterior probability of their mapping. The read counts of 20,207 RefSeq mouse genes were calculated using the 3SEQ transcriptome-based quantification analysis pipeline (https://classic.dnanexus.com/whitepapers/DNAnexus-whitepaper-rna-3seq-transcriptome.pdf). Custom scripts were developed to generate raw read-count tables. These data were summarized, normalized, and statistically tested for differential gene expression for each gene under various comparison schemes using the DESeq package in R/Bioconductor (Anders and Huber, 2010). Multiple testing corrections were performed using the Benjamini and Hochberg methods and the false discovery rate (FDR) was calculated for each gene.
We chose nondirectional single-end sequencing of 3′-tagged mRNA. RNA-Seq of the 3′ end more accurately quantifies transcripts than RNA-Seq of full-length transcripts, because length and GC content contribute most to sequencing bias (Asmann et al., 2009; Morrissy et al., 2009). Because recovered HCs were few in number, we had to amplify the transcripts to reach sufficient amount to construct the HTS libraries. Amplification inevitably introduces bias, which would be exacerbated by variable lengths of different genes if samples were random primed and the integrity of the RNA samples would be variably affected. Using RPKM/FPKM to represent gene expression values would not be as accurate because of variable actual length of transcripts amplified, especially for genes expressed at low levels and with complex genomic organization (Steijger et al., 2013). On the other hand, 3′-tags of the transcripts are of similar lengths, less susceptible to RNA degradation, and universally amplified from the polyA tail. Read counts are therefore less biased and they more faithfully reflect gene expression levels in the cells. Because the 3′-tags are short (with a mode of ∼500 bp) and because we aimed to quantify the expression level, single-end sequencing was sufficient to generate the read counts. Because few genes have overlapping 3′-ends in opposite direction in mammalian genomes, we opted for nondirectional sequencing.
The dChip V2010.01 software (https://sites.google.com/site/dchipsoft/) was used to visualize the expression data by hierarchically clustering genes. The distance metric between two genes was calculated as correlation of the expression of the 3′-tags of two genes across samples, and the centroid linkage method was used to compute the distances between a gene and a gene cluster, and between two gene clusters (Golub et al., 1999). This involved log transformation of the normalized expression values from the DESeq analysis, computing the standardized expression values (scaled to have mean 0 and SD 1) of a gene across samples, averaging the standardized values of genes sample wise in a gene cluster, and using this averaged expression profile as the expression vector of a gene cluster to compute distance between gene clusters.
The raw data are available on the GEO database, accession number GSE60019.
RT-PCR.
RNA was extracted from P6 cochleae, utricles, and hearts of CD1 mice using QIAzol (Qiagen), treated with DNase I (Roche), and reverse transcribed using random hexamer primer, with Superscript II (Invitrogen) to generate cDNA. PCRs were performed with recombinant TaqDNA polymerase (Invitrogen).
Quantitative PCR.
Real-time analysis was performed using TaqMan gene expression assays (Life Technologies) on an Mx3000P QPCR system (Agilent Technologies). All reactions were run in duplicate using the Platinum Quantitative PCR SuperMix-UDG w/ROX (Life Technologies). Data were normalized to the housekeeping gene Pgk1 and analyzed using the deltaCt method. Probes used include the following: Mm01181529_s1 (Atoh1), Mm00438168_m1 (Cdkn1b/p27Kip1), Mm00517585_m1 (Isl1), Mm01274015_m1 (Myo7a), Mm01168739_m1 (Pcdh15), Mm00436617_m1 (Pgk1), Mm00454761_m1 (Pou4f3), Mm00435971_m1 (Prox1), and Mm03053810_s1 (Sox2) from Life Technologies. The Cdh23 probe was designed to recognize isoforms E and F (accession EU681829 and EU681830): forward primer (5′-GTGATCACACGGAAGGTGAATA-3′, Probe/56-FAM/CCACATTCC/ZEN/ACAACCAGCCCTACA/3IABkFQ/, and reverse primer 5′-TTGACGATGAAGATGGGTGTC-3′, synthesized by Integrated DNA Technologies.
PCR primers include the following: Gapdh: Forward 5′-CTCATGACCACAGTCCATGC-3′, Reverse 5′-AGGTCCACCACCCTGTTGC-3′; Emcn ISH probe: Forward 5′-CAGATGGAACACCTCCCG-3′, Reverse 5′-TCCACGGATCGAGGCTA-3′; Ptgds ISH probe: Forward 5′-GACACAGTGCAGCCCAACTTTCAA-3′, Reverse 5′-TGACTGACTTCTCTCACCTGCGTT-3′; Strip2 ISH probe: Forward 5′-GAATATGGAGATTCAGACGGGC-3′, Reverse 5′-AAACATGACCACCTTCCAGAGC-3′; and Lmod3 ISH probe: Forward 5′-GTGAGGAGCTCGATGAAGACG-3′, Reverse 5′-TCGTCATCTTCCTCCTCCTCC-3′.
In situ hybridization.
Probes were obtained from Anja Beckers (Dnah5; Beckers et al., 2007), I.M.A.G.E clone (ID 4984025; Grp), or amplified from mouse cochlear cDNA (see primer list), and cloned into the pCRII-Topo vector (Invitrogen). Inner ears of CD1 mice were collected at stages E16 and P6, fixed in 4% formaldehyde, and cryosectioned (7- to 10-μm-thick frozen sections). In situ hybridization was performed as previously described (Scheffer et al., 2007b).
Immunocytochemistry.
For cryosections, inner ears of P6 CD1 mice of either sex were collected, fixed in 4% paraformaldehyde, and cryosectioned (7–10 μm thick). A microwave antigen-retrieval technique was applied (H-3300; Vector Laboratories) before permeabilization and blocking in 1× PBS + 0.05% Triton+ 8% normal goat serum for 1 h at room temperature. The sections were then incubated with primary antibodies overnight at 4°C and secondary antibodies for 1 h in blocking solution at room temperature. Stained slices and tissues were mount with ProLong Gold Antifade Reagent with DAPI (Invitrogen).
For whole mount inner ears of CD1 mice were fixed in 4% paraformaldehyde and dissected to expose the organs of Corti. Tissues were permeabilized/blocked in 1× PBS + 0.3% Triton + 8% normal goat serum (1 h, room temperature), then incubated with primary antibodies overnight (4°C) and secondary antibodies for 1 h in blocking solution (room temperature).
Primary antibodies were rabbit polyclonal anti-MYO7A (1/2000; Proteus Biosciences; 25-6790), mouse monoclonal anti-MYO7A (1/2000; Developmental Studies Hybridoma Bank, University of Iowa; 138-1), mouse monoclonal anti-LDB3 (1/300; Abnova; H00011155-M06), rabbit polyclonal anti-PMCA2 (1/200; Dumont et al., 2001), rabbit polyclonal anti-LMOD3 (1/1500; Proteintech; 14948-1-AP), and rabbit polyclonal anti-LMOD1 (1/1000; Proteintech; 15117-1-AP). Secondary antibodies were Alexa Fluor 488 or 594 goat anti-rabbit or anti-mouse antibodies (Invitrogen). Actin was counterstained with Alexa Fluor 594 phalloidin (Invitrogen).
Biological process ontology analysis.
To ascertain the extent of functional enrichment we performed an analysis with the functional annotation tool DAVID 6.7 (Huang da et al., 2009a, b), which determines whether biological processes are enriched within a target list of genes. Genes were considered enriched in cochlear and utricular HCs when total read count was >50 and, at the considered age, (GFP+ − GFP−)/(GFP+ + GFP−) > 0.67.
Principal component analysis.
Normalized gene expression values were assembled into a matrix with rows of different sample types and columns of genes. Genes with a total number of reads across all ages <10 were excluded and the numbers <1 at a single age were rounded to 1. The matrix was log2 transformed and the principal component analysis (PCA) was performed on the data using the statistics package in R. Each gene was considered a variable vector.
Results
RNA-Seq of purified mouse hair cells
To purify HCs, we used a transgenic mouse strain expressing GFP under the control of the Pou4f3 promoter. In this strain, the GFP signal is specific for cochlear inner and outer hair cells (IHCs and OHCs) and for vestibular HCs (Fig. 1A; Keithley et al., 1999; Huang et al., 2013). SCs, which include supporting cells in the sensory epithelium and nearby cells of the inner ear epithelia, were not fluorescent. We dissected the sensory epithelia from the cochlea and utricle at four stages of development—E16, P0, P4, and P7—which correspond to ages before, during, and after acquisition of mechanosensitivity. The cochlear epithelium was trimmed to include the organ of Corti, the spiral limbus, and basilar membrane but not spiral ganglion or lateral wall, and the utricular epithelium was trimmed to the edge of the macula (Fig. 1B). Cells from cochleae and utricles expressing GFP (GFP+) were separated from GFP− cells using enzymatic dissociation followed by FACS. The strong GFP signal in HCs and the size differential enabled specific separation of the HCs from SCs (Fig. 1C). Flow cytometry showed that the strongly GFP+ HCs constituted 1.2 ± 0.2 and 0.30 ± 0.08% of all cells in the utricular and cochlear samples, respectively. The number of cells harvested for each sample is indicated in Figure 1D.
FACS purification and RNA-Seq results. A, GFP signal in transverse sections of cochlear and utricular HCs from P4 Tg(Pou4f3-Isl1-eGFP) mice. Note the specific GFP expression in cochlear IHCs and OHCs. Supporting cells and other SCs are not labeled. DAPI and fluorescent phalloidin counterstains outlined surrounding areas. B, Dissected tissues used for FACS at P4, shown as combined fluorescence and bright field. Hair cells expressing GFP are green. Cochlear samples included hair cells and supporting cells of the organ of Corti, spiral limbus, basilar membrane, part of the spiral ligament, and spiral ganglion neuronal processes. Utricle sensory epithelium was trimmed away from crista ampullaris, vestibular ganglion, and the roof of the utricle. C, FACS analysis of dissociated sensory epithelia. Gating on GFP+ DAPI− cells, coupled to the size of the cells, allowed specific separation of the hair cells from surrounding cells and exclusion of dead cells and doublets. To avoid contamination, only cells exhibiting the highest intensity levels in the GFP channel were harvested as GFP+ cells. GFP− cells were those with low GFP fluorescence, as delineated. D, Number of GFP+ and GFP− cells harvested by FACS at all ages in cochlea and utricle.
Transcriptomes of the purified HCs and SCs were then studied by HTS. To quantify coding RNAs accurately, polyadenylated mRNA was amplified and 3′-enriched transcriptome tags were converted into nondirectional cDNA sequencing libraries. Each of 16 samples (HCs and SCs from cochlea and utricle at four developmental stages) was sequenced to a depth of 40–100 million reads.
PCA
Gene expression varied widely across the 16 samples. To understand the primary determinants of gene expression, we used PCA to identify the drivers of differences. The normalized GFP+ and GFP− matrices were merged based on gene names; the merged matrix was then analyzed by PCA. The first three principal components explained >50% of the variance. A biplot of principal components 1 and 2 is shown in Figure 2A. HC samples are green and SC samples purple; samples from cochlea are in dark colors and utricle in light colors. The first component (PC1; accounting for 22% of the variance) is strongly associated with the cell type: all the SC samples (left) are separated from all the HC samples (right). The second component (PC2; accounting for 17% of the variance) is loosely related to the developmental stage, with the oldest stages (P4 and P7) represented at the top of the plot and P0 at the bottom. The third and subsequent components do not show obvious correlation with identifiable factors, suggesting high complexity of gene expression patterns in HC and other cells in the inner ear during development.
Tissue and age differences in expression. A, Principal component analysis plot of all 16 samples (light colors for utricle and dark for cochlea) for all 20,207 genes represented in the NCBI build 37/mm9 mouse genome assembly. PC1 correlated with cell type: GFP− (purple) to the left and GFP+ (green) to the right, and PC2 correlated approximately with age. Most points are expression from Pou4f3-eGFP mice; those denoted as -T1 and -T2 are from mice expressing tdTomato under control of Gfi1. B, C, Analysis of the HTS results. Differential expression in GFP+ and GFP− samples was assessed using the average number of reads at all four ages. Only genes with a minimum of 15 total reads across all samples were considered. There were 5364 enriched in hair cells (GFP+/GFP− > 2), 3230 genes were enriched in surrounding cells (GFP+/GFP− < 0.5), and 9539 genes were nonspecific (0.5 < GFP+/GFP− < 2). D, Differential mRNA expression in postnatal utricular and cochlear HC samples. Genes included in this plot were at least four-fold enriched in HCs compared with SCs in either utricle or cochlea (or both). HC genes preferentially expressed (>5×) are shown in light green (utricle) or dark green (cochlea); not preferentially expressed are black.
Reproducibility of expression data
To study the widest range of cell types and developmental points, we did a single set of four RNA-Seq experiments at each age. To understand whether duplicates would produce the same results, we repeated the experiment at one age using a second mouse strain in which the fluorescent protein tdTomato is expressed under control of the Gfi1 promoter (Gfi1tm1(Cre)Gan;R26tdTomato). We dissociated and sorted cells from P0 mice using FACS as before, but with settings appropriate for tdTomato fluorescence. The experiment was run in duplicate, with independent dissections, sorting, and RNA-Seq. RNA-Seq was highly reproducible. Technical replicates obtained by sequencing the same library multiple times showed over 95% correlation. RNA-Seq results of biological replicates in P0 Gfi1tm1(Cre)Gan;R26TdTomato HCs and SCs in cochlea and utricle at P0 were correlated by 84–93%. PCA confirmed the reproducibility among samples from the same cell type, age, and tissue source, even those from different transgenic mouse strains (Fig. 2A). Data from the P0 tdTomato experiments are represented in the PCA plot (Fig. 2A) as “-T,” with the duplicates denoted as -T1 and -T2. For each cell type, duplicates tend to cluster with each other in the PCA plot, and they cluster with the data from GFP-expressing mice. Although there is no perfect overlap, these experiments provide confidence that the patterns of gene expression during development are generally reproducible.
Genes enriched in HCs
We examined the differential mRNA expression in HCs and SCs by pairwise comparison between the GFP+ and GFP− samples using the normalized read counts at four ages. The number of reads reflect the abundance of mRNA in each sample. These were divided into three different categories of genes: the HC-enriched genes (green; GFP+/GFP− > 2), the SC-enriched genes (purple; GFP+/GFP− < 0.5), and the nonspecific genes (black; 0.5 < GFP+/GFP− < 2; Fig. 2B,C). Among the 20,207 annotated mouse genes, only 2008 genes were not expressed (<15 total read counts). Among the remaining 18,199 genes, 5430 were found to be preferentially expressed in HCs (including the genes with no reads in the SCs) and 3230 in SCs; 9539 were expressed in both.
Differences between cochlear and utricular HCs
To identify genes encoding proteins important for mature HC function, we divided them further by expression in cochlear HCs, utricular HCs, or both (Fig. 2D). We selected 916 genes by choosing those enriched 4× with FDR < 0.05 in postnatal (P0, P4, and P7) cochlear HCs compared with postnatal cochlear SCs, or enriched in postnatal utricular HCs with the same criteria. Utricle HC reads were plotted against cochlear HC reads for each gene. Genes were marked as cochlea-enriched if cochlear reads were >5× utricle reads (222 genes; dark green), or as utricle enriched (41 genes; light green), or as expressed in both (653; black). Selected genes are labeled with gene names. As the cochlea sample is dominated by OHCs, some of these genes may be involved in functions unique to OHCs, such as the generation and regulation of electromotility. Indeed, one of the most differentially expressed genes is Slc26a5, encoding prestin. Genes highly enriched in postnatal utricular cells include those encoding axonemal dyneins (Dnah5, Dnah9, Dnah10, and Dnah11), a flagellar protein (Tekt2), and a centrosomal protein (Ccdc81)—consistent with the presence of kinocilia in mature utricular but not cochlear HCs.
Gene-expression analysis
To determine the specificity of the HTS results, we examined the expression patterns of genes previously known to be enriched in HCs relative to SCs. Figure 3 shows, for selected genes, the number of reads in utricular and cochlear HCs compared with SCs at each age. Transcripts from well known HC genes such as Atoh1, Gfi1, Slc26a5, Tmc1, Lhfpl5, Pcdh15, Fscn2, and Chrna9 were found, as expected, to be highly enriched in HCs (green bars) compared with SCs (purple bars; Zuo, 2002; Peng et al., 2011; Schimmang, 2013). The enrichment of each gene in HCs, averaged across organs and ages, is indicated by the fold-change (FC) value (FC = GFP+/GFP−), which ranged from 35 (Pcdh15) to 1395 (Gfi1), and by the FDR (Fig. 3). Similarly, genes known to be enriched in SCs, such as Wnt5a and the deafness gene Coch (Robertson et al., 2001; Qian et al., 2007), have few reads in GFP+ cells (Fig. 3). Fold-change values for these two are 0.06 and 0.008, respectively.
Representative HTS expression profiles for Slc26a5 (prestin), Atoh1, Tmc1, Gfi1, Fscn2, Pcdh15, Lhfpl5, Chrna9, Coch, and Wnt5a. Histograms display the normalized number of reads in hair cells (green) and surrounding cells (purple) in samples from the cochlea (co; dark colors) and utricle (ut; light colors) at ages E16, P0, P4, and P7. Indicated for each gene are the FC representing the GFP+/GFP− counts ratio and the multiple test adjusted FDR determined by the Benjamini–Hochberg procedure.
These data, from developmental ages E16 to P7, reveal the time course of gene expression in addition to enrichment in different cell types. For instance the transcription factor Atoh1, involved in the differentiation of HCs as early as E12.5, is known to be downregulated at postnatal stages when HCs mature (Bermingham et al., 1999; Scheffer et al., 2007a). As expected, HTS data show expression of Atoh1 decreasing between E16 and P7 in the utricle and between P0 and P7 in the cochlea—consistent with the earlier maturation of the utricle compared with the cochlea (Fig. 3).
Purity of the FACS HCs
We examined differential expression in GFP+ and GFP− cells of genes known to be expressed in HCs and supporting cells. In particular, we looked at the Notch, Hes, Jag, and Delta genes, which are Notch pathway effectors that determine the mosaic pattern of the inner ear sensory epithelia. As expected, Jag2 and all the Delta-like (Dll) members were highly enriched in GFP+ cells (Fig. 4A). Notch1, Notch2, Notch3, and Hes1 were enriched in GFP− cells. Jag1 and Hes5, previously shown to be expressed in both HCs and SCs (Zine et al., 2000; Hartman et al., 2009), had a broad expression pattern in our HTS data. Thus the dissociation protocol apparently separated HCs from SCs.
Validation of the HTS results. A, HTS expression profiles at all ages of key proteins of the Notch pathway. The Notch receptor genes, Notch1, 2, and 3, as well as the downstream effector, Hes1, were mostly expressed in surrounding cells (purple). Hes5 and Jag1 had nonspecific expression (black). The ligand-encoding genes Jag2, Dll1, Dll3, and Dll4, were enriched in hair cells (green). B, HTS results (blue) compared with qPCR (gray). The average counts and qPCR results in P0, P4, and P7 cochlea samples relative to the E16 sample were plotted as fold change.
We also used gene expression to assess the purity of the samples. HCs and supporting cells are connected by both tight junctions and adherens junctions, so they can be difficult to separate. Despite the mesh filtering before FACS and the use of a size criterion to avoid doublets of cells during the sort, there might be a supporting cell contamination among the purified hair cells. At P7, when adherens junctions were mature and contamination might be highest, we identified nine genes that had at least 60 total reads in the two GFP+ samples and had between 100- and 270-fold more reads in the GFP− samples. These include Ptgds, which encodes prostaglandin D2 synthase; Col1a2, which encodes collagen type 1a2; Dcn, which encodes the collagen-associated protein DECORIN; and Coch, which encodes the surrounding cell protein COCHLIN (others were Atp1a2, Car3, Lef1, Apod, and Apoe). If there was significant contamination from SCs, SC genes would have been detected at higher levels in the GFP+ samples. Suppose, for instance, that a gene is expressed exclusively in SCs and that there are 8000 reads for that gene in the GFP− sample. If there is 1% contamination of SCs in the GFP+ sample, we would expect 100-fold fewer reads in the GFP+ sample, or 80 reads. Because we found at least nine genes with >100-fold fewer reads in GFP+ than GFP− samples, we can conclude that the HC sample is at least 99% pure (1 − 1/100).
A similar analysis found 14 genes (Ocm, Smpx, Espnl, Tmc1, Srrm4, Calb1, Strip2, Kcnmb2, Pcp4, Pvalb, Nrxn3, Mreg, Apba1, and Bdnf) that had 100- to 1000-fold more reads in the GFP+ than GFP− samples at P7, indicating that the SC sample is also at least 99% pure.
Validation of RNA-Seq results by quantitative PCR
To validate the differences in expression among cell types and developmental stages revealed by HTS, we used quantitative PCR (qPCR) to examine the expression of selected HC genes (Pcdh15, Pou4f3, Cdh23, Atoh1, and Myo7A). In HTS profiles from the cochlear samples from E16 to P7, these genes were found to have a significantly differential expression pattern (Fig. 4B). qPCR on the same samples also showed differential expression and good agreement with HTS data. Similarly, we studied the expression of genes expressed in the prosensory domain during early development that later becomes supporting cell specific, such as Prox1, Cdkn1b, Isl1, and Sox2 (Chen and Segil, 1999; Bermingham-McDonogh et al., 2006; Hume et al., 2007; Huang et al., 2008). Both HTS and qPCR showed a relative decrease in expression in GFP+ cells from embryonic to postnatal cochleae, as expected (Fig. 4B).
Deafness genes
Mutations of genes that are uniquely expressed in HCs are likely to cause deafness. We ranked, by enrichment level in HCs, the 18,199 genes expressed in our samples (Fig. 5A). We then identified the 112 known syndromic and nonsyndromic deafness genes (http://hereditaryhearingloss.org) and examined the positions of these genes within the ranked list. The deafness genes represented ∼0.6% of the detected genes (109/18,199). We found that 50 of the deafness genes known to be HC specific, such as the Usher syndrome genes and myosin genes Myo6 and Myo7a, were enriched in HCs according to HTS data. More than half of them (27) were included in the top 1000 genes of this ranking (Fig. 5A,B, first bin). With additional selection criteria, deafness gene candidates can be enriched further. For instance, if a search is restricted to those genes upregulated by >5× during development and sorted by the GFP+/GFP− ratio, then 20 deafness genes are in the top 500, and 10 are in just the top 100. Genes differentially expressed in HCs are therefore a rich pool of deafness-gene candidates.
Deafness gene distribution. A, Expressed genes (total reads > 15) ranked in order of the GFP+/GFP− read ratio at all ages. Green indicates the HC-enriched genes (fold change > 2), purple the genes enriched in SCs (fold change < 0.5), and black the nonspecific genes (NS). Known deafness genes are indicated in each category. B, Histogram showing the distribution of deafness genes relative to their GFP+/GFP− rank at all ages. Each bin represents 1000 genes. The majority of deafness genes previously shown to be expressed in hair cells was found in the top 2000 genes, and deafness genes expressed in surrounding cells were found within the last 3000 genes.
Similarly, more than half of the deafness genes enriched >2× in SCs, such as Coch and Pou3f4, were in the 1000 genes most enriched in SCs (Fig. 5A,B, last two bins). Genes most enriched in HCs or SCs are more likely to encode proteins with specialized roles in the inner ear and, perhaps, are more likely to produce only hearing and balance disorders if mutated, so are good candidates for deafness genes.
Discovery of new HC and SC genes
To provide insight into gene networks involved in HC development and function, we grouped genes with similar patterns of expression. Such grouping has the potential to identify narrower classes of genes than a simple enrichment as in Figure 3. We constructed relative mRNA expression profiles and used dChip to arrange the genes according to their expression pattern. The hierarchical clustering heat maps showed subsets of genes enriched in SCs, in all HCs, in cochlear HCs, or in utricular HCs (Fig. 6, Tables 1 and 2). A variety of expression patterns was observed and these match expression patterns of constituent genes when known (Zine et al., 2000; Zuo, 2002; Jones et al., 2006; Hertzano et al., 2007; Scheffer et al., 2007b; Cotanche and Kaiser, 2010; M.J. Shin et al., 2010; Peng et al., 2011; Yoon et al., 2011; Carlisle et al., 2012; Schimmang, 2013; Hereditary Hearing Loss Homepage, http://hereditaryhearingloss.org). For instance, some genes were much more highly expressed in HCs than SCs, and were expressed at the earliest ages tested (Fig. 6A). Among them are Atoh1, Gfi1, Pou4f3, and Lhx3—all transcription factors essential for HC development.
Spatial and temporal expression patterns of genes by hierarchical clustering. Heat maps are shown for selected tree branches clustered by spatial and temporal expression patterns; red represents above-average expression levels and blue below-average levels. Each row represents a gene, and each column a sample. Plotted below each heat map are the average standardized gene expression values across samples for that cluster. Examples of genes in the cluster are listed. The GFP+ samples are shown in green and the GFP− samples in purple. Clusters represent embryonic and postnatal hair cell-enriched genes (A), postnatal hair cell-enriched genes (B), cochlear hair cell-enriched genes (C), utricular hair cell-enriched genes (D), and surrounding cell-enriched genes (E). See also Tables 1 and 2. Coch, cochlea; Ut, utricle.
Genes expressed by embryonic and postnatal hair cells, postnatal hair cells, cochlear hair cells, and utricular hair cells as shown in Fig. 6
Genes expressed by surrounding cells as shown in Fig. 6
HCs become mechanically sensitive between E16 and P0 in the utricle and between P0 and P4 in the cochlea (Géléoc and Holt, 2003; Lelli et al., 2009). It is reasonable to suppose that genes critical for mechanotransduction are expressed in HCs but not SCs and are first expressed during those periods. Thus it was not surprising to find such genes enriched in HCs but not expressed early in the development of the cochlea (Fig. 6B). These include Cdh23, Tmc1, Myo15, and Kcna10, which are associated with mature HC function (Kawashima et al., 2011; Peng et al., 2011; Lee et al., 2013). Still others (Fig. 6C) were expressed only later in development, but only in cochlear HCs. Genes in this cluster may be associated especially with OHCs, as OHCs outnumber IHCs by three to one in the cochlear HC sample. Notable among these is Slc26a5, encoding the OHC motor prestin. A fourth cluster (Fig. 6D) is primarily expressed in utricular HCs. Some of these genes, such as Dnah5, Dnah10, Dnah11, and Tekt2 (Yoon et al., 2011), encode known components of microtubule-based cilia, and their presence in this group may reflect the retention of kinocilia in the utricle and their degradation in cochlea. Another cluster (Fig. 6E) is expressed at much higher levels in SCs than in HCs. These include genes known to function in SCs, such as Coch, Gjb2, and Notch2. Table 1 lists the genes in each of the HC clusters and Table 2 lists the genes in the SC cluster. The clusters shown are just a subset of all branches in a hierarchical tree, but they illustrate the potential to identify genes that may have similar functions in HCs at similar stages of development.
Validation with in situ hybridization and immunocytochemistry
To validate the tissue and cell specificity revealed by the normalized read counts and the hierarchical clustering, we performed in situ hybridization and immunocytochemistry for selected genes (FC > 90; FDR< 0.1). Based on HTS reads, gastrin releasing peptide (Grp) and glutaredoxin cysteine rich 2 (Grxcr2) were expressed in both cochlear and vestibular HCs (Fig. 7A,B). Indeed, in situ hybridization in the organ of Corti and vestibular epithelium showed high levels of message in the HCs. Striatin interacting protein 2 (Strip2) and leiomodin 3 (Lmod3) were predicted to be expressed only by cochlear HCs (Fig. 7C, D). With in situ hybridization we found message only in the OHCs of the cochlea and not in vestibular epithelia. Dynein axonemal heavy chain 5 (Dnah5) was predicted to be highly expressed in vestibular but not cochlear HCs and this was confirmed by in situ hybridization (Fig. 7E). Endomucin (Emcn) was predicted to be in SCs, especially of the cochlea. With in situ hybridization we found high levels in cells of the basilar membrane (Fig. 7F), which were part of the SC samples. Finally, Ptgds, encoding a prostaglandin D2 synthase, was predicted to be mainly in vestibular SCs. We found expression in the dark cells of the ampulla (Fig. 7G), which are involved in pumping K+ into endolymph. We also found Ptgds expressed in the stria vascularis of the cochlea, which is similarly involved in ion transport. The low number of reads in cochlear SCs, in contrast to the conspicuous expression in the stria vascularis, is probably because the stria vascularis was not included in the sensory epithelium dissected for FACS.
Validation by in situ hybridization of cell specificity predicted from hierarchical clustering. Shown for each gene are the normalized HTS read counts for each condition (left), a schematic of the predicted cell specificity based on these counts (middle), and in situ hybridization in cochlear and vestibular sensory epithelia (right). Scale bars: 20 μm. A, Grp was expressed in both cochlear and saccular hair cells at E18. B, Grxcr2 was expressed in both cochlear and utricular hair cells at P6. C, D, Strip2 and Lmod3 were detected only in P6 OHCs. E, Dnah5 expression was restricted to utricular hair cells at P6. F, Emcn, predicted to be in cochlear surrounding cells, was found mostly in basilar membrane at P6. G, Ptgds, predicted to be mainly in vestibular surrounding cells, was found adjacent to the sensory epithelium of the ampulla but also in the stria vascularis of the cochlea at E18. Co, cochlea; ut, utricle.
We also used immunocytochemistry to validate predicted expression and to confirm translation of selected genes. Lmod3 encodes leiomodin 3, an actin-filament nucleator (Chereau et al., 2008; Campellone and Welch, 2010; Nanda and Miano, 2012). It was predicted from HTS data to be expressed only in HCs of cochlea but not utricle, and primarily at the later two developmental stages. With antibody labeling, we found strong leiomodin 3 labeling of cochlear outer but not inner hair cells at P6, and little or no label in the utricle, consistent with in situ hybridization (Figs. 7D, 8A). A closely related gene, Lmod1, was apparently expressed in all HCs: it was found in the embryonic and postnatal HC cluster (Fig. 6A, Table 1). By antibody label we found it throughout the cytoplasm of all HCs of the cochlea and utricle, but not SCs, as predicted by HTS (Fig. 8B).
Validation by immunocytochemistry of predicted cell specificity at P6. Normalized HTS read counts for each condition, predicted cell specificity, and antibody labeling in cochlear and vestibular sensory epithelia. Scale bars: 20 μm. A, Lmod3 was found mostly in cochlear but not utricular hair cells; within the cochlea, it was expressed mainly in OHCs. B, Lmod1 was found in all cochlear and utricular hair cells and was located throughout the cell bodies. C, Ldb3 was found in cochlear and utricular hair cells; in the cochlea, it was expressed mainly in IHCs. LDB3 immunoreactivity was localized to the zonula adherens region. Co, cochlea; ut, utricle.
Similarly, Ldb3 encodes the cardiac muscle protein LIM-domain-binding 3 (LDB3/ZASP/Cypher), which links α-actinin-2 to PKC (Zhou et al., 1999). HTS counts predict expression only in HCs, and more in utricular than cochlear HCs. Consistent with HTS data, we found sparse LDB3 labeling in cochlear IHCs and strong labeling in vestibular HCs, with striking localization at the circumferential actin bands of the zonula adherens (Fig. 8C). Although a limited number of genes was tested, immunocytochemistry suggests that gene expression at the mRNA level generally reflects protein expression.
Biological process changes with age
HCs in both cochlea and utricle undergo a striking differentiation between E16 and P7. The development of the cochlea is not perfectly correlated with development of the utricle: the utricle gains mechanosensitivity by E17, whereas the cochlea is delayed until P2, and hair bundles have all formed in cochlea by P2, whereas some continue to develop in utricle for 2 weeks. Even within the cochlea, development progresses from base to apex over ∼2 d. However the broad patterns of expression may be revealed by ages separated by 3–4 d. We generated four lists of genes that are preferentially expressed in all HCs at E16, P0, P4, or P7. To investigate the biological processes associated with HC development, we used the Database for Annotation, Visualization and Integrated Discovery (DAVID) for functional annotation of these genes (Huang da et al., 2009a, b). DAVID uses the Gene Ontology classifications of associated genes with biological processes, and then calculates which processes are strongly represented in a specific gene list.
Figure 9 represents the degree of enrichment for 43 of the most represented processes, and how enrichment changes with age. E16 HCs mostly expressed genes related to morphogenesis, differentiation, and development, reflecting the transition to a HC phenotype. P0 and P4 HCs continued to express genes associated with differentiation, but sensory and inner ear categories were more highly represented. P7 HCs were no longer enriched in morphogenesis genes, but showed much more representation of cell signaling and synaptic genes, consistent with a mature phenotype. Although P4 HCs are largely mature in mechanosensation, these data reveal further differentiation of function between P4 and P7, especially as HCs connect to the brain.
Enrichment of gene ontology categories expressed by HCs at E16 (gray), P0 (purple), P4 (blue), and P7 (red). The y-axis represents the enrichment (observed number of genes/expected).
Discussion
Isolation of HCs
Biochemical and molecular genetic studies of the HC proteome have been difficult because HCs are embedded in a sensory epithelium that includes a variety of other cell types. Biochemical characterization of one organelle—the stereocilium—has been achieved by mechanical harvesting of hair bundles (Shepherd et al., 1989; Gillespie and Hudspeth, 1991; J.B. Shin et al., 2010), but the rest of the HC has been hard to purify. A zebrafish transgenic line expressing GFP in hair cells (Tg(pou4f3:GAP-GFP)) was used to purify precursors of lateral line HC for characterizing gene expression during HC regeneration (Steiner et al., 2014), looking at earlier stages of cell-fate determination. Hertzano et al. (2011) used the endogenous cell-surface markers CD326 (Epcam) and CD49f (Itga6) to separate the sensory epithelial cells from nonsensory cells. Our data show, however, that these markers are not specific to HCs. In this study, we used FACS of cells from a transgenic mouse line (Pou4f3-Isl1-eGFP) to separately purify HCs and SCs of the cochlea and the utricle. Based on the relative abundance of selected mRNAs from brightly fluorescent and nonfluorescent cells, we estimate that the samples were >99% pure.
Sorting based on Pou4f3 expression provides higher purity than with some other HC markers such as Atoh1. For instance, we and others have FACS-purified HC from the Tg(Atoh1-GFP) mouse line (M. Huang and Z.-Y. Chen, unpublished observations; Sinkkonen et al., 2011). In those experiments we found significant contamination from supporting cells, a likely consequence of the expression of Atoh1 in the common progenitors of HC and supporting cells (Yang et al., 2010; Driver et al., 2013). Gfi1 is strongly expressed by HCs, but we found it to be expressed as well by a few other cell types in the cochlea (data not shown). Pou4f3 expression is limited to HCs, so its promoter is a better driver of HC-specific markers (Huang et al., 2011; Masuda et al., 2011).
Picking single cells with visual guidance is another way to achieve high purity, as HCs are morphologically distinguishable from surrounding cells. In this way, Liu et al. (2014) investigated gene expression in adult IHCs and in OHCs, but were not able to compare them to SCs (Liu et al., 2014). In the top 200 genes they found expressed in IHCs or OHCs, we found that only 42 or 50, respectively, were enriched by more than twofold in the cochlea at P7. For instance, they found expression of the ion channel genes Trpm3 and Trpm7 in HCs, as we did, but these genes are expressed even more highly in SCs. Comparing HC expression with SCs, at different ages, complements such studies and gives additional insight into function.
Advantages of RNA-Seq
We used RNA-Seq to generate an unbiased dataset of HC and SC mRNAs. In other studies, a pool of 30–100 cells was sufficient to obtain high-resolution transcriptomes with RNA-Seq (Marinov et al., 2014), so our dataset, with a minimum of 1120 sorted cells (Fig. 1D), is expected to accurately represent HC gene expression. RNA-Seq has a much higher dynamic range than the microarrays used in our previous studies (Scheffer et al., 2007a, b), with read counts varying over a millionfold in these samples. Moreover, a significant number of genes are not represented on microarrays or have sequence errors. For instance, RNA-Seq data led us to study the Xirp2 gene (Scheffer et al., 2015), but no probe sets corresponding to Xirp2 are included in the Affymetrix Mouse GeneChips MG-U74v2 or MOE430 that we used previously (Scheffer et al., 2007b). Although newer microarrays are more representative, they are still limited to known transcript isoforms, are unable to detect transcripts expressed at low levels, are saturated for highly expressed transcripts, and are difficult to use for cross-platform analysis. RNA-Seq data may be reanalyzed with data from different studies or with new reference sequences to generate new insights. While RNA-Seq data do not replace existing microarray-generated transcriptomic data, the miniaturization of RNA-Seq library preparation, the high multiplexibility with DNA barcodes, the ability to use single cells (Shalek et al., 2014), and the striking decline in sequencing costs all suggest that RNA-Seq should be the method of choice for transcript quantification and identification.
SHIELD database
We compiled these expression data in the Shared Harvard Inner Ear Laboratory Database (SHIELD; https://shield.hms.harvard.edu). It includes similar expression data from other laboratories (Lu et al., 2011; Shin et al., 2013; Kwan et al., 2015) and provides links for each gene to a variety of other databases.
Sorting by hierarchical analysis of gene expression
To understand the development and differentiation of HC, we performed RNA-Seq of HC mRNA at various developmental stages. These data can be sorted or clustered in a variety of ways to identify genes that may be involved in particular functions. We began with a hierarchical clustering that groups genes based on their expression levels in the 16 conditions. We were able to identify genes that are upregulated during development of mechanosensitivity (E16 compared with P0 in the utricle and P0 compared with P4 in the cochlea). Many such genes are known to be essential for HC function (Fig. 6A,B), and these include the following: Tmc1, Tmc2, Pcdh15, Cdh23, Lhfpl5, Myo15, Tmie, and Fscn2. It is likely that many more genes important for mechanotransduction are among this subset.
Another cluster contained genes that are highly expressed in cochlear HCs but less so in utricular HCs (Fig. 6C). Because the cochlear sample is dominated by OHCs, many genes enriched in cochlea relative to utricle may participate in functions unique to OHCs. With in situ hybridization or specific antibodies, we validated the restricted expression in OHCs of two such genes, Lmod3 and Strip2 (Figs. 6C,D, 7A). The STRIP2 protein is associated with cortical F-actin bundles and cell–cell adhesion (Goudreault et al., 2009; Bai et al., 2011) and may participate in the OHC cortical lattice. Slc26a5, Strip2, and Lmod3 were consistently found in the 222 genes enriched in cochlear HCs (Fig. 2D). Our comparison of cochlear and utricular HCs is not as direct as that of Liu et al. (2014) for determining IHC and OHC differences, but has the advantages of much greater dynamic range and an understanding of changes in expression during development.
A third cluster is expressed in HCs, but more highly in utricle than cochlea (Fig. 6D). Some of these encode proteins associated with microtubule-based cilia; they could be components of the kinocilia, which are present in utricle but absent in mature cochlea.
Some closely related genes have very different expression patterns in the inner ear, reflecting distinct function even within one cell type such as HCs. For instance, the leiomodins, Lmod1 and Lmod3 (Fig. 8), are differentially expressed: Lmod1 is expressed in all HCs of the inner ear, throughout development, whereas Lmod3 is restricted to the OHC, and only after P0.
It is important to recognize the limitations of such comparisons. In this analysis, genes highly enriched in HCs are enriched only relative to SCs. They might be highly expressed in related cell types such as central neurons, and not unique to HCs. Similarly, genes expressed in both HCs and SCs might, nevertheless, be exclusive to inner ear sensory epithelia and appear nowhere else in the body. Comparison to expression databases from other tissues can help illuminate the possible functions of a gene.
Finally, genes expressed in other cochlear cell types and important for other aspects of cochlear function could be identified in the same way, limited only by the availability and specificity of mouse lines expressing a fluorescent tag in certain cells of the inner ear.
Deafness
These data, presented in the SHIELD database, have already helped identify a new Usher syndrome gene, CIB2 (Riazuddin et al., 2012), and helped confirm the HC expression of Kir2, an inwardly rectifying potassium channel protein (Levin and Holt, 2012). One of the genes most highly enriched in our HC data is Grxcr2, and mutations in human GRXCR2 were recently found to cause recessive hearing loss (Imtiaz et al., 2014). Another highly enriched gene is XIRP2; in related work we found XIRP2 to be associated with the stereocilia and cuticular plates of HCs, and to cause progressive stereocilia disorganization when absent (Scheffer et al., 2015). We expect that these expression data will aid in the discovery of additional human deafness genes. In addition, deafness is not always associated with vestibular dysfunction, and indeed we found many genes enriched in just cochlea or utricle. Differential expression data should aid in understanding the etiology of complex inner ear disorders.
In summary, we performed the first comprehensive gene expression study specific for vestibular and cochlear HCs during mouse development. The public database generated is a powerful tool for discovering and understanding proteins involved in HC function. We anticipate these data will lead to the discovery of new genes important for HC mechanotransduction and new deafness genes in mouse and human.
Footnotes
This research was supported by National Institutes of Health grants R01-DC000304 and R01-DC002281 (D.P.C.), R03-DC013866 (J.S.), and R01-DC006908 (Z.-Y.C.); by the Frederick and Ines Yeatts Hair Cell Regeneration Grant (Z.-Y.C.); by P30 DC05209 to the Eaton-Peabody Laboratory of M.E.E.; and by a Hearing Health Foundation Emerging Research Grant (J.S.). D.P.C. is an Investigator and J.S. was an Associate of the Howard Hughes Medical Institute. We appreciate assistance from the Harvard Stem Cell Institute Flow Cytometry Core Facility at Massachusetts General Hospital and from the Harvard Medical School Biopolymer Facility. We also appreciate the gift of a DNAH5 probe from Anja Beckers at Hannover Medical School. We thank the Neurobiology Imaging Facility (a part of the Neural Imaging Center supported by National Institute of Neurological Disorders and Stroke P30 Core Center Grant NS072030) for consultation and instrumentation and Dr. Lisa Goodrich for useful comments on this manuscript.
The authors declare no competing financial interests.
- Correspondence should be addressed to either of the following: David P. Corey, Department of Neurobiology and Howard Hughes Medical Institute, Harvard Medical School, 220 Longwood Avenue, Boston, MA 02115, dcorey{at}hms.harvard.edu; or Zheng-Yi Chen, Department of Otology and Laryngology, Harvard Medical School and Massachusetts Eye & Ear, Boston, MA 02114. zheng-yi_chen{at}meei.harvard.edu