Abstract
The major cell classes of the brain differ in their developmental processes, metabolism, signaling, and function. To better understand the functions and interactions of the cell types that comprise these classes, we acutely purified representative populations of neurons, astrocytes, oligodendrocyte precursor cells, newly formed oligodendrocytes, myelinating oligodendrocytes, microglia, endothelial cells, and pericytes from mouse cerebral cortex. We generated a transcriptome database for these eight cell types by RNA sequencing and used a sensitive algorithm to detect alternative splicing events in each cell type. Bioinformatic analyses identified thousands of new cell type-enriched genes and splicing isoforms that will provide novel markers for cell identification, tools for genetic manipulation, and insights into the biology of the brain. For example, our data provide clues as to how neurons and astrocytes differ in their ability to dynamically regulate glycolytic flux and lactate generation attributable to unique splicing of PKM2, the gene encoding the glycolytic enzyme pyruvate kinase. This dataset will provide a powerful new resource for understanding the development and function of the brain. To ensure the widespread distribution of these datasets, we have created a user-friendly website (http://web.stanford.edu/group/barres_lab/brain_rnaseq.html) that provides a platform for analyzing and comparing transciption and alternative splicing profiles for various cell classes in the brain.
Introduction
Together with neurons, glia (astrocytes, oligodendrocytes, and microglia) and vascular cells (endothelial cells and pericytes) are essential for the development and function of the nervous system (Allen and Barres, 2009; Molofsky et al., 2012). Each of these major cell types express a distinct repertoire of genes, and together these transcriptomes provide fundamental insights into the development and function of the brain. A transcriptome database of purified brain cell types has a variety of potential uses, including identifying cell type-specific markers, helping to develop tools that allow cell type-specific genetic manipulations (e.g., cell ablation, gene silencing, and optogenetic manipulations), revealing the metabolic division of labor between different cell types and suggesting novel cell-specific functions.
Previous groups have performed transcriptome studies of one or more purified cell types in the brain (Dugas et al., 2006; Rossner et al., 2006; Lovatt et al., 2007; Cahoy et al., 2008; Doyle et al., 2008; Lau et al., 2008; Daneman et al., 2010; Wylie et al., 2010; Bracko et al., 2012; Friedrich et al., 2012; Lerch et al., 2012; Bandyopadhyay et al., 2013; Beutner et al., 2013; Chiu et al., 2013; Orre et al., 2014), but because of the technical complexity involved in purifying numerous cell types and varying foci of previous studies, a complete transcriptome dataset that allows direct comparison of gene expression across all of the major glial and vascular brain cell types is lacking. Moreover, the majority of previous transcriptome studies were performed using microarrays, in which transcript abundance is indirectly deduced from fluorescence signal intensities.
RNA sequencing (RNA-Seq) is a method that profiles the transcriptome by deep sequencing of isolated RNAs. Transcript abundance is directly proportional to the number of sequencing reads that map to a specific transcript, resulting in lower false-negative and false-positive discovery rates in addition to a greater linear range (Bainbridge et al., 2006; Cloonan et al., 2008; Marioni et al., 2008; Mortazavi et al., 2008; Nagalakshmi et al., 2008; Sultan et al., 2008; Wilhelm et al., 2008; Trapnell et al., 2010; Wu et al., 2010). Direct sequencing of RNA libraries also provides the opportunity to explore alternative splicing, a key mechanism that contributes to transcriptome diversity (Pan et al., 2008; Wang et al., 2008). Aberrant alternative splicing has been implicated in many neurological diseases (Buckanovich et al., 1996; Yang et al., 1998), yet the complex landscape of differential splicing among glia, neurons, and vascular cells has not been investigated.
In this study, we purified neurons, astrocytes, microglia, endothelial cells, pericytes, and various maturation states of oligodendrocytes from mouse cortex and used RNA-Seq to generate a high-resolution transcriptome database of >22,000 genes. We identified thousands of novel cell type-enriched genes and developed a library of thousands of cell type-specific splicing events. To ensure the widespread distribution of these datasets, we have created a user-friendly website (http://web.stanford.edu/group/barres_lab/brain_rnaseq.html) that provides readily accessible platforms for analyzing and comparing transcription and alternative splicing profiles for various cell classes in the brain.
Materials and Methods
Purification of glia, neurons, and vascular cells
All procedures involving animals were conducted in conformity with Stanford University guidelines that are in compliance with national and state laws and policies. The purification procedures are modified from previously described dissociation, immunopanning, and fluorescence-activated cell sorting (FACS) purification protocols (Dugas et al., 2006; Cahoy et al., 2008; Daneman et al., 2010; Foo et al., 2011).
Purification of astrocytes.
To purify astrocytes, we used a bacterial artificial chromosome (BAC) transgenic mouse line expressing EGFP under the control of regulatory sequences in Aldh1l1–BAC (Heintz, 2004). This line has been characterized previously to have complete astrocyte-specific labeling throughout the brain (Cahoy et al., 2008). For all cell types, one biological replicate consists of pooled cells from a litter of 3–12 mice. The cortices were dissected out, and the meninges were removed. The tissue was enzymatically dissociated to make a suspension of single cells as described previously (Dugas et al., 2006; Cahoy et al., 2008). Briefly, the tissue was incubated at 33°C for 45 min in 20 ml of a papain solution containing Earle's balanced salt solution (EBSS; catalog #E7510; Sigma), d(+)-glucose (22.5 mm), NaHCO3 (26 mm), DNase (125 U/ml; catalog #LS002007; Worthington), papain (9 U/ml; catalog #LS03126; Worthington), and l-cysteine (1 mm; catalog #C7880; Sigma). The papain solution was equilibrated with 5% CO2 and 95% O2 gas before and during papain treatment. After papain treatment, the tissue was washed three times with 4.5 ml of inhibitor buffer containing BSA (1.0 mg/ml; catalog #A-8806; Sigma) and ovomucoid (also known as trypsin inhibitor; 1.0 mg/ml; catalog #109878; Roche Diagnostics) and then mechanically dissociated by gentle sequential trituration using a 5 ml pipette. Dissociated cells were layered on top of 10 ml of a high-concentration inhibitor solution with 5 mg/ml BSA and 5 mg/ml ovomucoid and centrifuged at 130 × g for 5 min. The cell pellet was then resuspended in 12 ml of Dulbecco's PBS (DPBS; catalog #14287;Invitrogen) containing 0.02% BSA and 12.5 U/ml DNase and filtered through a 20 μm Nitex mesh (Lab Pak 03-20/14; Sefar America) to remove undissociated cell clumps. Cell health was assessed by trypan blue exclusion. Only single-cell suspensions with >85% viability were used for purification experiments.
Propidium iodide (PI; 1 μg/ml; catalog #P4864;Sigma) was added to the single-cell solution to label dead cells. Cells were sorted on a BD Aria II cell sorter (BD Bioscience) with a 70 μm nozzle. Dead cells and debris were gated first by their low forward light scatter and high side light scatter and second by high PI staining. Doublets were removed by high side light scatter. Cell concentration and flow rate were carefully adjusted to maximize purity. Astrocytes were identified based on high EGFP fluorescence. FACS routinely yielded >99% purity based on reanalysis of sorted cells. Purified cells were harvested by centrifugation at 2000 × g for 5 min. The cell pellet was then used for RNA extraction.
Purification of endothelial cells.
To purify endothelial cells, we used Tie2–EGFP transgenic mice available from The Jackson Laboratory. These mice express EGFP under the pan-endothelial Tie2 promotor (Motoike et al., 2000; Daneman et al., 2010). Single-cell suspensions and FACS were performed as described above.
Preparation of panning plates.
To prepare panning plates for immunopanning, Petri dishes (150 × 15 mm; catalog #351058; BD Biosciences) were incubated with 22 ml of Tris-HCl buffer solution (50 mm, pH 9.5) and 150 μg secondary antibody overnight at 4°C. Each dish was then washed three times with 10–20 ml of DPBS and incubated with the corresponding primary antibodies diluted in 12 ml of DPBS/0.2% BSA solution per dish for at least 2 h at room temperature. Lectin-coated panning plates were prepared by adding 22 ml of DPBS and 50 μg of Banderiaea simplicifolia lectin 1 (BSL-1; catalog #L-1100; Vector Laboratories) and incubating at 4°C overnight. All panning dishes were washed three times with 10–20 ml of DPBS immediately before use. The secondary antibodies (Jackson ImmunoResearch) we used included affinity-purified goat anti-mouse IgG + IgM [heavy and light (H + L) chain; catalog #115-005-044], goat anti-mouse IgM μ-chain (catalog #115-005-02), and goat anti-rat IgG (H + L chain; catalog #112-005-167). All immunopanning was performed at room temperature.
Purification of neurons.
To purify neurons, a single-cell suspension was prepared as described above and incubated at 34°C for 1 h to allow expression of cell-surface protein antigens digested by papain and then incubated on two sequential panning plates coated with BSL-1 to deplete endothelial cells (10 min each), followed by a 30 min incubation on a plate coated with mouse IgM anti-O4 hybridoma (Bansal et al., 1989; 4 ml of hybridoma supernatant diluted with 8 ml of DPBS/0.2% BSA) to deplete oligodendrocyte precursor cells (OPCs), and then incubated for 20 min on a plate coated with rat anti-mouse cluster of differentiation 45 (CD45) (catalog #550539; BD Pharmingen; 1.25 μg in 12 ml of DPBS/0.2% BSA) to deplete microglia and macrophages. Finally, cells were added to a plate coated with rat anti-mouse L1 neural cell adhesion molecule (L1CAM; 30 μg in 12 ml of DPBS/0.2% BSA; catalog #MAB5272; Millipore) to bind neurons. The adherent cells on the L1CAM plate were washed eight times with 10–20 ml of DPBS to remove all antigen-negative nonadherent cells and then removed from the plate by treating with trypsin (1000 U/ml; catalog #T-4665; Sigma) in 8 ml of Ca2+ and Mg2+ free EBSS (catalog #9208; Irvine Scientific) for 3–10 min at 37°C in a 10% CO2 incubator. The trypsin was then neutralized with 20 ml of fetal calf serum (FCS) solution containing 30% FCS (catalog #10437-028; Gibco), 35% DMEM (catalog #11960-044; Invitrogen), and 35% Neurobasal (catalog #21103-049; Gibco). The cells were dislodged by gentle squirting of FCS solution over the plate and harvested by centrifugation at 200 × g for 10 min.
Purification of microglia and oligodendrocyte-lineage cells.
Both microglia and peripheral macrophages express the surface protein CD45. Therefore, macrophages are potential contaminants in the microglia preparation isolated with anti-CD45 antibody. To minimize macrophage contamination, we first perfused the mice with 10 ml of PBS to wash away blood from the brain. A single-cell suspension was then prepared as described above and incubated 20 min on a plate coated with rat anti-mouse CD45 (1.25 μg in 12 ml of DPBS/0.2% BSA; catalog #550539; BD Pharmingen) to harvest microglia. To purify oligodendrocyte-lineage cells, we then incubated cells not bound to the anti-CD45 antibody-coated plate sequentially on four BSL-1-coated plates (8 min each) to deplete endothelial cells and remaining microglia. The remaining cells were next incubated for 30 min on a rat anti-PDGF receptor α (PDGFRα; 10 μg in 12 ml of DPBS/0.2% BSA; catalog #10R-CD140aMS; Fitzgerald) coated plate to harvest OPCs and then incubated on an additional PDGFRα plate and mouse A2B5 monoclonal antibody ascites (American Type Culture Collection) coated plate for 30 min each to deplete remaining OPCs. The cell suspension was next incubated on an anti-myelin oligodendrocyte glycoprotein (MOG) hybridoma-coated plate for 30 min to harvest myelinating oligodendrocytes (MOs), followed by an additional anti-MOG hybridoma-coated plate for 30 min to deplete any remaining MOs. Finally, the cell suspension was incubated on an anti-GalC hybridoma-coated plate for 30 min to harvest newly formed oligodendrocytes (NFOs). For purification of RNA, the cells were lysed while still attached to the panning plate with Qiazol reagent (catalog #217004; Qiagen), and total RNA was purified as described below.
Purification of pericytes.
To purify pericytes, a single-cell suspension was prepared as described above and incubated at 34°C for 1 h to allow expression of cell-surface protein antigens digested by papain. The cells were incubated on three sequential panning plates coated with anti-CD45 (1.25 μg in 12 ml of DPBS/0.2% BSA) antibody to deplete microglia and macrophages (30 min each) and then incubated 30 min on a panning plate coated with anti-CD31 antibody (6 μg in 12 ml of DPBS/0.2% BSA; catalog #10RCD31gRT; Fitzgerald) to deplete endothelial cells. The cells were finally incubated 45 min on a panning plate coated with anti-PDGFRβ antibody (20 μg in 12 ml of DPBS/0.2% BSA; catalog #ab91066; Abcam) to harvest pericytes. Cells were removed from the plate by trypsin digestion as described above.
RNA library construction and sequencing
The polyadenylated fraction of RNA isolated from brain cell types was used for 100 bp paired-end RNA-Seq. Total RNA was extracted using the miRNeasy kit (Qiagen) under the protocols of the manufacturer. The quality was accessed by Bioanalyzer. Samples with high RNA integrity number (>8) were used for library construction. One hundred nanograms of total RNAs were used for each sequencing library. We used the TruSeq RNA Sample Prep Kit (Illumina) to construct poly(A) selected paired-end sequencing libraries following the instructions in the TruSeq RNA Sample Preparation V2 Guide (Illumina). All libraries were then sequenced using the Illumina HiSeq 2000 Sequencer. Two replicates of pooled animals for each cell type were sequenced. To minimize batch effects in library preparation and sequencing, samples were collected and sequenced in the largest feasible group size and performed by the same individuals.
Read mapping, transcript assembly, and expression level estimation
Mapping of Illumina 100 bp paired-end reads to the mouse reference genome [University of California, Santa Cruz (UCSC) Genome Browser version mm9] was performed using TopHat software (version 1.3.3; Trapnell et al., 2010), which invokes Bowtie (version 0.12.7) as an internal read mapper (Langmead et al., 2009). TopHat was designed to align reads from an experiment to a reference genome without relying on known splice sites and can be used to identify novel splice variants of genes. Read mapping (TopHat) was run with default settings and -G option, which supplies TopHat with gene model annotation of known transcripts (Illumina iGenome UCSC mm9.gtf annotation file; downloaded from http://cufflinks.cbcb.umd.edu/igenomes.html) to facilitate read mapping. After read mapping, transcripts were then assembled using Cufflinks software (version 1.3.0; Trapnell et al., 2010). Expression level estimation was reported as fragments per kilobase of transcript sequence per million mapped fragments (FPKM) value together with confidence intervals for each sample. An in-house pipeline was developed to automate the abovementioned mapping and assembly process.
Determination of the threshold for reliable FPKM estimation
To facilitate downstream analyses (such as gene fold change analysis, etc.) and to assess the accuracy and reliability of our RNA-Seq experiments, we applied a widely used procedure to determine the threshold for reliable FPKM estimation based on optimizing the intersection of false-positive and false-negative rates (Ramskold et al., 2009; Lerch et al., 2012; Chen et al., 2013). We conducted an analysis based on the 95% confidence intervals of FPKM values as calculated by Cufflinks (Trapnell et al., 2010). Genes whose estimated FPKM values had lower confidence bounds of 0 were labeled as “unreliable.” We calculated the numbers of reliable and unreliable FPKM values at various FPKM thresholds and produced false-positive and false-negative rate curves to identify an FPKM value that is optimized for both false positives and negatives. We determined an FPKM value of ∼0.04 as a threshold for minimum gene expression according to the above procedure and chose a more conservative FPKM threshold of 0.1 for the following analyses. Therefore, genes expressed at FPKM values >0.1 are expressed at statistically significant levels (>99% confidence), although this represents the bottom of an exceedingly large dynamic range (>5 orders of magnitude) of FPKM values. Any FPKM that is <0.1 were set to 0.1 for fold enrichment calculations to avoid ratio inflation (Quackenbush, 2002). The RNA-Seq data have been deposited in the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) and is accessible through GEO Series accession number GSE52564.
Analysis of long noncoding RNAs
To identify cell type-enriched long noncoding RNAs (lncRNAs), we obtained the most current lncRNA annotations from the GENCODE project and created an annotation file that included both Ref-Seq and lncRNA genes. We then recalculated expression levels with Cufflinks using the amended annotation file and extracted corresponding FPKM values for lncRNAs.
Analysis of cell type-enriched genes, transcription factors, signaling pathways, and metabolic pathways
Differential expression was calculated as the FPKM of a given cell type divided by the average FPKM of all other cell types. Genes were ranked by their fold enrichment in each cell type, and top enriched genes for each cell type were identified (see Fig. 3B). To identify cell type-enriched transcription factors, we filtered our data with a previously annotated transcription factor database (Zhang et al., 2012), and top enriched transcription factors of each cell type were identified as described above.
Compared with the other seven cell types, the pericyte samples are confounded by a small number of contaminating astrocytes and endothelial cells. Accordingly, we did not include these data in any differential expression calculations of the remaining cell types. Nevertheless, when calculating pericyte-enriched genes, small astrocyte and endothelial cell contaminants do not contribute to the list of pericyte-specific transcripts because astrocyte and endothelial genes are included in the denominator of the enrichment calculation. Therefore, we have included pericyte data exclusively in lists of top cell type-specific genes (see Fig. 4; Table 1). Considering the current dearth of RNA-Seq pericyte transcriptome data, the pericyte gene expression presented here provides novel opportunities to identify molecular handles to study this important cell population.
We used Ingenuity Pathway Analysis (Ingenuity) to identify signaling pathways and metabolic pathways enriched in each cell type. Genes with FPKM >5 and fold enrichment >5 in each cell type were fed into Ingenuity Pathway Analysis to highlight those pathways with substantial expression values. Information of gene functions in specific signaling pathways and metabolic pathways with experimental evidence from the Ingenuity knowledge base were used for analysis.
Microarray analysis
Identical cell samples were used for RNA-Seq and microarray analyses. Each sample was purified with miRNeasy kit (Qiagen) and equally split into two portions, one for RNA-Seq and the other for microarray analysis. Mouse genome 430 2.0 Array (Affymetrix) was used for analysis. The .cel files from the arrays were analyzed with Arraystar 4.0 software (DNAstar) using robust multichip average processing and quantile normalization.
qRT-PCR
Forty genes found by RNA-Seq to be enriched in specific cell types were selected for qRT-PCR validation. We designed primers using NCBI primer blast software (http://www.ncbi.nlm.nih.gov/tools/primer-blast/) and selected primer pairs with the least probabilities of amplifying nonspecific products as predicted by NCBI primer blast. We designed primer pairs to amplify products that span exon–exon junctions to avoid amplification of genomic DNA. We tested the specificity of the primer pairs by PCR with mouse whole-brain cDNA (catalog #MD201; Zyagen), and we examined the PCR product by agarose gel electrophoresis.
We performed qRT-PCR with the Fluidigm BioMark system using methods modified from a previously described protocol (Citri et al., 2012). Briefly, purified cell samples for qRT-PCR were prepared identically to the methods described for the RNA-Seq samples. Cells were harvested and lysed with SuperscriptIII CellsDirect Onestep qPCR kit (catalog #11732-020; Invitrogen) according to the instructions of the manufacturer. A 500 nm primer mixture of all primer pairs was mixed with cell lysates and components of the Superscript III CellsDirect Onestep qPCR kit components according to the instructions of the manufacturer. Reverse transcription and 8–13 cycles of specific target amplification were performed in a single program. Specific target amplification products were then treated with 1 U/μl Exonuclease I (catalog #M0293L; New England Biolabs) to degrade unbound primers. The products and exonuclease were incubated at 37°C for 30 min, followed by inactivation at 80°C for 15 min. Exonuclease I-treated samples were diluted 1:1 with DNA suspension buffer (catalog #T0223; Teknova) and mixed with 2× SsoFast EvaGreen Supermix (catalog #172-5211; Bio-Rad) and 20× DNA Binding Dye Sample Loading Reagent (catalog #100-0388; Fluidigm) before loading to preprimed 96.96 Dynamic Array IFC (Fluidigm). Primer pairs were mixed with 2× Assay loading reagent (catalog #85000736; Fluidigm) and DNA suspension buffer (Teknova) and loaded to the same 96.96 Dynamic Array IFC (Fluidigm). In addition to purified cell samples, we included unpurified mouse whole-brain samples as positive controls and reactions without primer pairs or cDNA as negative controls. Thirty cycles of qPCR amplification were performed on the Fluidigm BioMark system, followed by a melting-curve analysis. A total of 9216 (96 × 96) qPCR reactions were performed on a single chip. The Fluidigm BioMark software was used for data analysis and visualization. As a quality-control step, we manually examined the melting curves for each reaction and included only those reactions with a single peak at the expected melting temperature for final data analysis. Threshold was determined automatically, and threshold cycle (Ct) values were calculated by the Fluidigm BioMark software. Gene enrichment was calculated using the ΔΔCt method in relation to the housekeeping gene Gapdh.
Analysis of alternative splicing
We developed a pipeline, Quantas, for analysis of alternative splicing information from RNA-Seq data (http://zhanglab.c2b2.columbia.edu/index.php/Quantas). Briefly, we first performed de novo mapping of raw reads to the reference genome (mm9), as well as known and novel exon junctions using small seeds with the OLego algorithm (Wu et al., 2013). We then used gapless to infer transcript structure of alternative isoforms between ambiguously located paired-end reads using a simple Bayes model that considers size constraints of each cDNA fragment and previous isoform abundance estimations from mapped junction reads. We then used countit, a library of codes that quantifies the gapless output against a comprehensive database of alternative splicing events (six in our case; Zhang et al., 2010). We derived the information relative to alternatively spliced events and corresponding genes from the output files. We computed an inclusion score estimating the probability of an exon to be included or excluded for each event type in each sample. The score is calculated as the ratios of the OLego computed counts for the form including the exon over the total. Most splicing events exhibit a bimodal distribution in which the majority of the minor isoforms constitute <20% of the reads. Based on these data, we considered an event to be alternatively spliced only if it had a score between 0.2 and 0.8 to isolate those splicing events with the most biological significance. Finally, we also performed statistical tests (Fisher's test) to determine significance of the splicing events across different cell types. We then formatted output results into html and visualized both aligned reads and mapped junction reads on a web browser.
Double-fluorescent in situ hybridization
Full-length mouse cDNA expression plasmids were obtained for Eno2, Glast, Plp1, Cx3cr1, Gdpd2, Ppapdc1a, Eltd1, Olfml3, and Lppr3 (Thermo Fisher Scientific). A full-length Cldn5 expression plasmid was a gift from D. Agalliu (University of California, Irvine, Irvine, CA). Digoxigenin (DIG)- or fluorescein-labeled single-stranded antisense and sense riboprobes were prepared by linearizing plasmids and transcription with T7, Sp6, or T3 RNA polymerases and an RNA labeling kit (Roche) according to the instructions of the manufacturer. Fresh frozen 16-μm-thick brain sections were processed as described previously (Schaeren-Wiemers and Gerfin-Moser, 1993), with the following modifications for fluorescent in situ hybridization: after hybridization, sections were treated three times (20 min each) in 3% hydrogen peroxide (Sigma-Aldrich) in PBS to block endogenous peroxidase activity, washed three times in buffer B1 (0.1 m Tris-HCl, pH 7.5, and 150 mm NaCl), and blocked in blocking reagent (PerkinElmer Life and Analytical Sciences) in buffer B1 plus 0.1% Triton X-100 for 1 h. Anti-DIG POD or anti-FITC POD (Roche) antibodies were cross-absorbed against lightly fixed brain tissue to decrease nonspecific binding, diluted in blocking buffer (1:400), and applied to tissue overnight at 4°C. Slides were developed using a fluorescein or Cy3 Tyramide Signal Amplification Plus kit (PerkinElmer Life and Analytical Sciences). For double-fluorescent in situ hybridization, the protocol was repeated starting with the peroxidase blocking step, using the alternate antibody, and a different fluorophore. Slides were extensively washed in PBS and coverslipped with Vectashield with DAPI (Vector Laboratories) for imaging.
Results
Creation of an RNA-Seq transcriptome library
To maximize biological diversity, we pooled dissected cerebral cortices from two litters of mice for each cell type that we profiled. Purified cell populations were obtained by immunopanning with cell type-specific cell-surface antibodies and FACS of transgenically labeled cell populations (astrocytes with Aldh1l1–GFP mice and endothelial cells with Tie2–GFP mice). Despite the fact that our protocols for immunopanning and FACS differ only in the final step of cell isolation, we wanted to ensure that the two methods did not affect gene expression differently. Therefore, we used both immunopanning and FACS to isolate astrocytes from mouse brain cortices and directly compared expression profiles of the two populations. As expected, both methodologies produced highly pure cell populations with minimal variation in gene expression (data not shown). Selecting an ideal time point for transcriptome profiling is a balance between maximizing cell maturation with increasing difficulty of cell isolation over time. We chose P7 because it is an age when differentiated populations of neurons, astrocytes, and endothelial cells are present and can be acutely purified with minimal activation. At P7, astrocytes are robustly growing and promoting synapse formation. Although these astrocytes have not reached their final mature size and morphology, their gene expression profiles closely resemble that of mature astrocytes isolated from P30 brains (Cahoy et al., 2008). Oligodendrocyte-lineage cell isolation occurred at P17 because it represents the earliest time point when the full collection of oligodendrocyte-lineage cells are present. We extracted RNA from purified cell populations and performed RNA-Seq with the Illumina HiSeq 2000 platform. We obtained 65.6 ± 5.4 million (data represent mean ± SD unless otherwise specified) 100 bp reads per sample. We successfully mapped 86.9 ± 6.4% of the fragments to the genome. Few fragments were mapped to multiple locations of the genome (1.5 ± 0.7%).
To assess the reproducibility of our data and conservation across biological replicates, we calculated correlations across all RNA-Seq samples and found high correlations among cell type replicates (Spearman's rank correlation, mean r = 0.979 ± 0.004) and lower correlations across differing cell types (Spearman's rank correlation, mean r = 0.714 ± 0.008). A notable and expected exception is that NFOs and MOs exhibit a tighter correlation (mean r = 0.873 ± 0.010) than typically observed across differing cell types (Fig. 1C).
Purity of neuronal, glial, and vascular samples
We next wanted to validate the purity of the isolated brain cell types. We probed the transcriptome data for expression of well known cell type-specific genes for astrocytes (e.g., Gfap, Aldh1l1, Slc1a3, Aqp4), neurons (e.g., Tubb3, Stmn2, Snap25, Eno2, Syn1), OPCs (e.g., Pdgfra, Cspg4), NFOs (e.g., Enpp6, Nfasc), MOs (e.g., Plp, Mog, Mbp, Mobp), microglia (e.g., Ccl3, Cd11b, Tnf), and endothelial cells (Cldn5, Flt1, Esam) (Fig. 1D; Dugas et al., 2006; Cahoy et al., 2008; Daneman et al., 2010; Beutner et al., 2013; Chiu et al., 2013). These classic cell type-specific markers each exhibited high expression levels in their corresponding cell types and undetectable or extremely low expression levels by the remaining cell populations (Fig. 1D). As expected, some of the oligodendrocyte-lineage markers are expressed in a graded manner at high levels during one maturation stage and at lower levels at a different maturation stage (see OPC, NFO, and MO data in Fig. 1D; Dugas et al., 2006). These data confirmed the purity of the various isolated cell types and established the feasibility of constructing a high-quality transcriptome database.
Improved sensitivity of RNA-Seq versus microarray
We chose to use RNA-Seq to construct our transcriptome database because it offers several substantial advantages over microarray-based platforms. Comparing these two technologies, RNA-Seq is believed to have increased sensitivity, improved linearity, and a vastly larger dynamic range (Marioni et al., 2008; Mortazavi et al., 2008; Wang et al., 2009). Furthermore, microarray experiments also rely on species-specific predesigned probes of varying quality that preclude analysis of unannotated genes, which substantially limits data comparison across different studies or species. In contrast, RNA-Seq generates results at single-nucleotide resolution without a priori knowledge and includes results of both known and novel genes. The high reproducibility and platform agnostic nature of RNA-Seq allows direct comparison of data from different studies across diverse species.
Expression estimation by RNA-Seq can be biased by factors including non-uniform distribution of cDNA fragments and differences in fragment GC content. To correct for these biases, we adopted the widely accepted Cufflinks algorithm to estimate FPKM values. Cufflinks implements a likelihood-based bias correction step that considers fragment size and GC content bias, among others, and has been shown to significantly improve expression estimation (Roberts et al., 2011).
To directly compare the quality of our RNA-Seq transcriptome with microarray-generated data, we compared the two technologies on identical RNA samples. We evenly divided the RNA isolated from several of the brain cell types into two groups. Half of the RNA from each cell type was analyzed by RNA-Seq, whereas the other identical half was analyzed using microarray. Comparing identical samples by the two platforms back-to-back made it possible to validate the RNA-Seq data with an orthogonal methodology and to confirm previously reported advantages of RNA-Seq over the microarray platform.
To directly compare the RNA-Seq and microarray data, we calculated the correlation between expression of those transcripts that were present in both platforms (Fig. 2A). As expected, samples from identical cell types demonstrated high correlations (Spearman's rank correlation, mean r = 0.945 ± 0.014) in contrast to those samples from different cell types (Spearman's rank correlation, mean r = 0.721 ± 0.009), indicating that both RNA-Seq and microarray are consistent in identifying characteristic gene expression signatures of distinct cell classes. Although most correlation values between mismatched cell types were low, we again found slightly higher correlations between oligodendrocytes of different maturation stages (OPCs, NFOs, and MOs). These trends correctly represented the developmental and functional similarities and differences of glia, neurons, and vascular cells in the brain. Biological replicates of identical cell types showed high correlation as determined from either method, indicating that our cell purification methods are highly reproducible.
To compare the sensitivities of the two technologies, we compared the distributions of cell type-specific gene enrichments. For a given cell type, we calculated the fold enrichment of each gene—defined as expression of the gene in the given cell type divided by the average expression level of the gene in all other cell types—for both RNA-Seq and microarray datasets. We plotted these fold enrichment values for each gene to compare the two platforms directly (Fig. 2B–E). The resulting scatter plots did not reveal the 1:1 trend line expected if both technologies were equally sensitive but rather produced plots that are stretched across the RNA-Seq axis (x-axis). These data indicate that RNA-Seq identified a greater number of differentially expressed genes than did the microarray platform, likely because of the higher sensitivity of RNA-Seq in detecting low-abundance transcripts and the increased linear range of RNA-Seq compared with the microarrays.
We next examined the overlap between genes that were identified as differentially expressed by RNA-Seq and microarrays. We identified differentially expressed genes (using a fourfold enrichment or depletion cutoff) in both modalities and found that the majority of genes classified as differentially expressed by microarray were similarly identified by RNA-Seq (mean, 92.5 ± 2.6%; for cell type-specific counts, see Fig. 2B–E). Conversely, RNA-Seq consistently identified a significant number of differentially expressed genes that were not classified as differentially expressed in the microarray dataset [ranging from 1471 (oligodendrocytes) to 3529 (microglia) genes; Fig. 2B–E]. This trend was consistent across all fold enrichment cutoffs (from 1.5 to 50; data not shown). Together, these data indicate that the RNA-Seq platform accurately identified many of the expression differences observed with the microarray analysis but also detected a substantially larger number of new genes with previously unknown cell type-specific distributions.
To investigate the relationship between enrichment and transcript abundance, we plotted the fold enrichment of all genes against their transcript abundance in the whole cortex. Consistent among all cell types, we observed a random distribution of enrichment scores across all transcript abundance levels. In other words, differential gene expression (as determined by RNA-Seq) is not significantly biased by transcript abundance in the cortex. We have included one such enrichment/expression plot for the astrocyte data in Figure 2F as a representative example.
To further examine the genes detected as specifically enriched in the RNA-Seq data, we compared the expression levels of cell type-enriched genes identified by both RNA-Seq and microarray versus those identified as cell type-enriched by RNA-Seq alone (Fig. 2G). Here we observed a slight trend where genes enriched in the RNA-Seq data alone were clustered toward lower FPKM expression levels compared with those genes that were classified as enriched in both modalities. This likely reflects a consequence of the improved sensitivity and linearity of RNA-Seq compared with microarray, and it is from this pool of lesser expressed genes that we chose to validate new candidates with qPCR (see Fig. 6).
RNA-Seq analysis reveals distinct gene expression signatures of each cell type
To assess the relationship between gene expression profiles of different cell types, we performed unsupervised hierarchical clustering of our complete RNA-Seq transcriptome data (Fig. 3A). As expected, different maturation stages within the oligodendrocyte lineage clustered together. Although astrocytes, oligodendrocytes, and microglia are all considered glia, they do not cluster distinctly from neurons as is typically assumed based on their misleading nomenclature. These glial cell types are actually as dissimilar from each other as they are from neurons (Cahoy et al., 2008). These differences are consistent with their distinct functions and unique developmental origins. Indeed, ectoderm-derived neurons, astrocytes, and oligodendrocyte-lineage cells cluster more closely together than do the mesoderm-derived microglia and endothelial cells. The clustering heat map reveals distinct groups of genes specifically expressed by each cell type that appear as red blocks in Figure 3A. Principal component analysis revealed similar distinctions between cell classes (data not shown).
RNA-Seq analysis identifies novel cell type-enriched genes
We analyzed the most highly enriched genes for each cell type (Figs. 3B, 4). These genes are highly expressed (FPKM >20) by one cell type and undetectable or expressed at low levels by other cell types in the brain. As expected, several genes on this list are well known cell type-specific markers, e.g., Aqp4, Aldh1l1, and Fgfr3 for astrocytes, Dlx1 and Stmn2 for neurons, Pdgfra and Cspg4 for OPCs, Mbp and Mog for MOs, Tnf and C1qa for microglia, and Cldn5 for endothelial cells (Fig. 4). Additionally, however, we discovered a large number of cell type-enriched genes that have not yet been described previously as cell type-specific in the literature to the best of our knowledge. The restricted expression pattern of these genes reveals potential cell type-specific roles in brain development, signaling, metabolism, and disease. For example, we found astrocytic enrichment of the autism and schizophrenia-associated gene Tspan7, neuronal enrichment of the gene encoding a novel transmembrane protein Tmem59l that is homologous to proteins implicated in the processing of amyloid precursor protein, and OPC enrichment of the gene encoding Neurexophilin1, a protein involved in synapse formation. Additional novel cell type-specific genes are included in Figure 4.
Transcription factors are key regulators of gene expression. Cell type-specific transcription factor expression is important for cell fate determination and cell differentiation (e.g., NeuroD for specification of neuron cell fate, Olig2 for specification of oligodendrocyte fate, and GM98/Myrf for oligodendrocyte differentiation; Lee et al., 1995; Zhou and Anderson, 2002; Emery et al., 2009). Our RNA-Seq analysis identified numerous cell type-specific or enriched transcription factors (Fig. 4). These include well known transcription factors (e.g., Sox9 specific to astrocytes, Dlx1, Dlx5, and Tbr1 specific to neurons, Olig1 and Olig2 specific to OPCs, Nkx2-2, Nkx6-2, and Myrf specific to oligodendrocytes) and a large number of transcription factors that were not recognized previously to be cell type-specific. Additional investigation of these genes is likely to be a fruitful approach to understanding the development of the CNS.
What are the global differences in the signaling and metabolic pathways among cell types of the brain? Historically, this question has been difficult to address without a highly linear transcriptome database. Here, we performed Ingenuity Pathway Analysis enriched by each cell type among all expressed genes from our samples and identified distinct signaling and metabolic signatures for neurons, glia and vascular cells of the brain. As expected, our analysis identified the enrichment of a collection of neurotransmitter receptor signaling pathways in neurons, immune cell signaling pathways in microglia, and epithelial adherence junction signaling pathways in endothelial cells. Moreover, we found the enrichment of NF-κB, Wnt/β-catenin, and sonic hedgehog signaling pathways in astrocytes, suggesting the importance of these major signaling pathways in the development of astrocytes and/or the induction of reactive astrogliosis. We detected the enrichment of the inhibition of matrix metalloproteases and intrinsic prothrombin activation pathways in OPCs, suggesting that OPCs have unique properties in their interactions with the extracellular matrix compared with other cell types of the brain. Metabolic differences among unique cell types of the brain are poorly understood. We systematically analyzed enrichment of metabolic pathways in each cell type of the brain. Highlights include the enrichment of the γ-linolenate biosynthesis pathway in astrocytes, 3-phosphoinositide (IP3) biosynthesis and degradation pathways in oligodendrocytes, the pyridoxal 5′-phosphate salvage pathway in microglia, the citrulline–nitric oxide cycle in endothelial cells, and the retinoate biosynthesis pathway in pericytes. In addition, we conducted weighted gene coexpression network analysis (WGCNA) to cluster genes into coexpression modules using our RNA-Seq data (data available on our RNA-Seq data website; see Notes).
The eukaryotic genome is transcribed in a developmentally regulated manner to produce large numbers of lncRNA or large intervening noncoding RNAs (Guttman et al., 2009; Mercer et al., 2009), the function of which in gene regulation and cancer pathogenesis are increasingly being recognized. However, the expression landscape of lncRNAs across different cell types has not been characterized in a complex organ such as the brain. We detected the expression of 811 lncRNAs with an inclusive criterion (FPKM >1) and 227 lncRNAs with a stringent criterion (FPKM >5) in at least one cell type. Some lncRNAs are among the highest expressed transcripts in the brain, with 12 having FPKM values >100, placing them among the 7% most highly expressed genes in our dataset. To ask whether we could observe cell type-specific expression of lncRNAs, we compared the numbers of lncRNAs expressed by each cell type. We found that astrocytes and neurons express large numbers of lncRNAs (109 ± 2 and 92 ± 3, respectively), whereas MOs and endothelial cells express fewer lncRNAs (44 ± 3 and 48 ± 3, respectively). Some lncRNAs are expressed in cell type-specific or enriched manners (Table 1). Their function in cell type-specific gene regulation warrants future investigation.
How do cells of the brain communicate with each other to coordinate their unique roles in brain development and function? Much of our knowledge of cell–cell interactions is derived from studies of secreted ligands and their transmembrane receptors. For example, astrocytes secrete thrombospondins, which bind their receptors α2δ1 on neurons to stimulate synapse formation (Christopherson et al., 2005; Eroglu et al., 2009). Endothelial cells secrete heparin-binding EGF-like growth factor, which binds its receptor epidermal growth factor receptor on astrocytes to promote astrocyte survival (Foo et al., 2011). Biochemical and genetic identification of receptor–ligand pairs in cell types of the brain has been cumbersome and time consuming. Our transcriptome database provides an unprecedented opportunity to systematically identify cell type-specific expression of ligands and their corresponding receptors (Table 2). For example, we identified specific expression of Lgr6 by astrocytes and its ligands, the R-spondins, by neurons. Stimulation of Lgr6 by R-spondins potentiates Wnt/β-catenin signaling (Gong et al., 2012). Our finding of the cell type-specific expression of this ligand-receptor pair suggests a role of neurons in regulating Wnt/β-catenin signaling in astrocytes. Investigation of the function of these receptors and ligands will tremendously expand our knowledge of neuroglial, neurovascular, and gliovascular interactions in the brain.
An alternative splicing database of glia, neurons, and vascular cells of the brain
Alternative splicing generates enormous transcriptome complexity by producing multiple mRNA isoforms from a single gene. To capture this transcriptome diversity, we constructed the first RNA-Seq alternative splicing database of collective populations of glia, neurons, and vascular cells. A daunting task in analyzing alternative splicing from RNA-Seq data is to map hundreds of millions of short reads (usually 50–150 nt in length) back to the reference genome and to detect known or novel splice junctions. For this purpose, we recently developed an algorithm, OLego, that has an improved sensitivity and accuracy compared with previously published programs (Wu et al., 2013). The false-negative rate (8.2%) of OLego almost halved the false-negative rate of the widely used programs TopHat (15.4%) and PASSion (14.8%). OLego is capable of detecting six common alternative splicing event types (detailed in Fig. 5A), including alternative 3′ and 5′ splice sites, retained introns, and various types of exon inclusion/exclusion events.
To address the global question of how many genes in the brain are alternatively spliced, we mapped RNA-Seq reads from neurons, astrocytes, OPCs, NFOs, MOs, microglia, and endothelial cells with the OLego algorithm. This analysis identified 6588 total genes that are alternatively spliced in at least one cortical cell type. We next considered how alternative splicing varies across each of these differing cell types. We found that the overall frequency of alternative splicing (5549 ± 674 events corresponding to 3187 ± 347 unique genes per cell type) and the distribution of various splicing event types were similar across all of our samples (Fig. 5B,C). Cassette exon events consistently comprised the most frequent alternative splicing event (35.3 ± 1.3%), followed by a nearly identical distribution of the remaining five splicing event types (Fig. 5B). Despite the overall similarity in the frequency of alternative splicing, we detected thousands of cell type-specific alternatively spliced genes. Neurons contain the greatest number of specific alternative splicing events (3110) in contrast to oligodendrocyte-lineage cells, which contained less than half as many (1469 unique splicing events in OPCs; Fig. 5D). This observation likely reflects the significant heterogeneity that is present among various neuronal subtypes. The most significant splicing events enriched in each cell type can be found in Table 3. How many alternative splicing events of a gene, on average, are present in a given cell type? Per cell type, we detected 5549 ± 275 alternative splicing events in 3171 ± 142 alternatively spliced genes, resulting in ∼1.7 alternative splicing events per alternatively spliced gene. To provide easy access to the alternative splicing data that we generated, we deposited the complete splicing dataset in an interactive web browser and database (see Notes).
To test the accuracy of our splicing database, we looked for genes reported previously to be differentially spliced in neurons and glia. Two particularly interesting examples include the differential expression of neuronal and glial isoforms of Nfasc and Sgce genes (Zonta et al., 2008; Ritz et al., 2011), both of which are confirmed in our splicing database, among others. Next, to validate the accuracy of the novel cell type-specific splicing events in our dataset, we searched for a biologically significant gene candidate to validate via PCR. One of our top cell type-specific splicing events involved the gene pyruvate kinase (Pkm), which catalyzes the final step of glycolysis (Tani et al., 1988). Our RNA-Seq data indicated that neurons and glial cells express distinct splicing isoforms of the Pkm gene, Pkm1 and Pkm2, respectively, and we confirmed this result using RT-PCR with isoform-specific primers (Fig. 5E,F). Unlike Pkm1, the Pkm2 isoenzyme is allosterically regulated to dynamically regulate the relative rate of energy production (see below, Regulation of energy metabolism in neurons and astrocytes).
qRT-PCR and in situ hybridization validate RNA-Seq results
We next used two different methods, qRT-PCR and in situ hybridization, to determine the accuracy of the cell type-enriched genes identified by RNA-Seq. We selected 40 genes that had not been identified previously in the literature as cell type-specific for our validation. Some of these genes are expressed at moderate levels and others at low levels. We chose genes with low expression values because their validation provides confidence in the overall accuracy of the RNA-Seq dataset, whereas preferentially validating highly expressed genes is more straightforward but less informative. Additionally, these cell type-enriched genes have not been well characterized and are therefore novel markers worth validating. We used the Fluidigm Biomark microfluidics technology for qRT-PCR analysis. Using the Biomark microfluidics system, we loaded small volumes of 96 primer pairs and 96 samples to a microfluidics chip, in which each sample and each primer pair were mixed together to produce 9216 individual nanoliter-scaled qRT-PCR reactions in a single experiment. This eliminates the plate-to-plate variation seen commonly in traditional qRT-PCR experiments and allowed us to easily assay the expression of a large number of genes with many replicates. Thirty-four of the 40 genes we assayed were detected by qRT-PCR in the identical cell-enriched pattern as determined by RNA-Seq (Fig. 6). Six genes were not consistently amplified across replicates in any cell type. This is likely attributable to the lower sensitivity of running nano-scaled qRT-PCR reactions on microfluidics chips compared with RNA-Seq. None of the 40 genes we tested were reproducibly amplified in a cell type that contradicted the RNA-Seq data. Of the 34 genes we validated by PCR, our RNA-Seq data indicated that each was >10-fold enriched in a given cell type. Using the same threshold, only 20 of these genes were similarly identified as cell type-enriched by our microarray data. At a less-stringent twofold enrichment cutoff, only 29 were identified as cell type-enriched by microarray, thus demonstrating the increased sensitivity of the RNA-Seq data.
Finally, we sought to examine the accuracy of our transcriptome database by performing in situ hybridization on several relatively unknown cell-specific genes from our dataset. We generated specific probes to these candidate genes and cohybridized them with classic cell type-specific markers to investigate their cell specificity (Fig. 6C–G). We identified novel markers for astrocytes (Gdpd2), neurons (Lppr3), oligodendrocytes (Ppapdc1a), and microglia (Olfml3). Consistent with this finding, Olfml3 was reported recently to be a microglia-specific marker in the spinal cord (Chiu et al., 2013). These results suggest that our database includes a substantial number of novel cell type-specific genes that could have wide-ranging significance for understanding brain function. In Figure 6C, the in situ hybridization signals of Gdpd2 and Glast are located in close proximity but do not overlap. When we examined Gdpd2 and Glast signals together with DAPI stains, it was clear that clusters of Gdpd2- and Glast-positive puncta localized to the same cell. In our experience, many astrocyte-specific genes produce in situ hybridization signals that are localized to processes instead of/in addition to the cell body. In situ hybridization signals in astrocytes often appear more diffuse than those in other cell types. The function of mRNA in astrocyte processes is unknown. One possibility is the presence of distinct RNA granules containing Gdpd2 and Glast mRNAs within astrocytes.
Regulation of energy metabolism in neurons and astrocytes
To demonstrate the application of this database toward understanding fundamental cellular interactions in the brain, we used our expression and splicing data to investigate the cellular division of labor in brain energy metabolism. We began by looking at glycolytic enzymes, because the primary energy consumers in the brain, neurons and astrocytes, each exhibit diametric patterns of glucose consumption (for review, see Bélanger et al., 2011; Bouzier-Sore and Pellerin, 2013); neurons heavily rely on oxidative metabolism, whereas astrocytes exhibit high glycolytic rates in vivo and in vitro (Lebon et al., 2002; Itoh et al., 2003; Bouzier-Sore and Pellerin, 2013). Under physiologic conditions, the majority of glucose entering the glycolytic pathway in astrocytes is metabolized to lactate rather than oxidized in the mitochondria. This phenomenon forms the foundation of the astrocyte–neuron lactate shuttle hypothesis (Walz and Mukerji, 1988), which proposes that astrocytes synthesize and subsequently expel lactate so that neurons can use it as an energy source. Subsequently, numerous studies have demonstrated that neurons efficiently use lactate as an energy source (Schurr et al., 1997; Bouzier et al., 2000; Qu et al., 2001) and may even prefer it over glucose (Bouzier et al., 2000; Itoh et al., 2003).
We identified expression and splicing differences in various glycolytic enzymes that help distinguish astrocytes as primary glycolytic cells (summarized in Fig. 7). The first of these enzymes is 6-phosphofructose-2-kinase/fructose-2,6-bisphosphatase-3 (Pfkfb3), which synthesizes fructose 2,6-bisphosphate, a potent glycolytic activator. Protein levels of Pfkfb3 are low in neurons because of its constant proteasomal degradation (Herrero-Mendez et al., 2009; Almeida et al., 2010), and, consistent with these observations, we found that Pfkfb3 mRNA is enriched by an order of magnitude in astrocytes compared with neurons. Another key regulated step of glycolysis is the conversion of phosphoenol pyruvate into pyruvate through the enzyme pyruvate kinase (PK). PK is found in two primary isoforms, Pkm1 and Pkm2. We identified and validated the exclusive expression of Pkm1 in neurons and Pkm2 in astrocytes (and other glial cells; Fig. 5E,F). Pkm2 contains an inducible nuclear translocation signal that allows a cell to regulate the amount of glycolytic flux in response to the local energy state. However, Pkm1 is not allosterically regulated and locks neurons in a steady state of glycolysis. After PK catalyzes the formation of pyruvate, its flux into oxidative metabolism is carefully controlled by the activity of the enzyme pyruvate dehydrogenase (Pdh). Phosphorylation-mediated inactivation of Pdh in astrocytes (Itoh et al., 2003; Halim et al., 2010) prevents glycolytic pyruvate from entering oxidative phosphorylation. Analysis of our transcriptome data demonstrates that pyruvate dehydrogenase kinase 4 (Pdk4) transcripts are >30 times enriched in astrocytes compared with neurons, suggesting that high Pdk4 expression in astrocytes might be responsible for subsequent Pdh inactivation. Diminished Pdh activity in astrocytes requires pyruvate to be shunted to the formation of lactate by the enzyme lactate dehydrogenase (Ldh). Our data indicate that astrocytes and neurons preferentially express different versions of the lactase dehydrogenase enzymes that interconvert pyruvate and lactate, Ldha and Ldhb. Ldha and ldhb are biased toward the production of pyruvate or lactate, respectively, and we find that Ldhb is enriched in astrocytes (twofold), whereas Ldha is primarily expressed in neurons (1.7-fold).
The expression and splicing patterns that we identified in Pfkfb3, Pkm2, Pdk4, and Ldha/b are each consistent with metabolic observations in astrocytes and neurons. Although the majority of the primary glycolytic and oxidative phosphorylation enzymes are expressed at similar levels in neurons and astrocytes, the differential expression of these key regulatory enzymes helps to explain how astrocytes perform such high rates of aerobic glycolysis. Our data suggest that, in contrast to neurons, astrocytes express a collection of regulatory enzymes that are sensitive to the energy state of the cell.
Discussion
An RNA-Seq transcriptome database of purified cell classes of the brain
In this study, we refined purification protocols for neurons, astrocytes, various maturation stages of oligodendrocytes, microglia, endothelial cells, and pericytes from mouse cortex and obtained an RNA-Seq transcriptome dataset of these cell types. Although a large number of neuronal subtypes have already been recognized (and glial subpopulations likely exist), grouping these populations into major cell type classifications allows for the investigation of interactions between primary cell classes. We obtained data on the expression level of >22,000 genes, and we determined the abundance of each alternative splicing variant of each gene for each of our samples.
The systematic transcriptional dissection of an organ to its primary cell type components for the purpose of establishing a transcriptome database provides an essential step toward understanding a complex tissue such as the brain. Understanding brain function (as well as dysfunction) requires an understanding of how distinct cell classes in the brain interact in a dynamic environment. For example, recent advances in understanding disease mechanisms in the neurodegenerative disease amyotrophic lateral sclerosis have revealed that the disease is not simply a motor neurons disease. Rather, it is a disease of the spinal cord in which astrocytes, microglia, and oligodendrocytes all contribute to disease progression (Ling et al., 2013). Thus, the database we have established should be invaluable in future studies of complex cell–cell interactions in other neurological diseases.
Our dataset contains a tremendous quantity of novel gene expression data, and exploration of this dataset by the neuroscience research community will generate hundreds of new testable hypotheses with biological significance. We noted that the vast majority of well annotated cell type-specific genes exhibit the predicted cell-enrichment patterns in our dataset (Fig. 1). Furthermore, RNA-Seq allows for the detection of novel genes that are unannotated in the current genome; we detected 26 novel transcripts with expression levels of FPKM >5 that were previously unannotated in UCSC Genome Browser (Meyer et al., 2013), Ensembl (Flicek et al., 2012), Vega (Ashurst et al., 2005), and AceView (Thierry-Mieg and Thierry-Mieg, 2006) databases (data not shown).
When directly compared with the microarray platform, our RNA-Seq library identified a dramatic increase in the number of cell type-specific genes (Fig. 2). This is in part attributable to the higher sensitivity of RNA-Seq in detecting low-abundance RNAs. One caveat regarding the interpretation of large-enrichment scores is to consider the effects of ratio inflation at lower FPKM levels. The biological relevance of a gene that is 20-fold enriched at an FPKM value of 2 should be weighed with more caution than a similarly enriched gene with an FPKM of 200.
An alternative splicing database of purified cell types of the brain
Alternative pre-mRNA splicing is an essential mechanism for generating proteome complexity. Previous expressed sequence tagged mapping studies have led to the conclusion that the brain has the highest number of alternative splicing events when compared with other organs (Yeo et al., 2004), and different brain regions are associated with complex patterns of alternative splicing (Johnson et al., 2009). However, the cellular source of this complexity is not known because analyses were performed on brain tissues comprising mixed cell types. We sought to answer the question of how many genes are alternatively spliced in the mouse cortex with our highly sensitive RNA-Seq data. Of the ∼22,000 genes that comprise the mouse genome, we examined a large collection of RNA-Seq data across several tissue types and identified 10,447 cases in which there is direct evidence of alternative splicing or predicted alternatively spliced isoforms. Of these genes that are potentially alternatively spliced in the mouse genome, 9036 are expressed in the mouse cortex above a minimum expression threshold. Interestingly, we found that 6588 of these genes (73%) are alternatively spliced in at least one major brain cell type. This demonstrates that the majority of genes that are known to undergo alternative splicing in the mouse genome are alternatively spliced in the brain. Thus, an understanding of cellular transcriptomes would not be complete unless alternative splicing information accompanies gene-level expression data.
Tissue-specific alternative splicing is well established (Xu et al., 2002). However, the regulation of alternative splicing in a cell type-specific manner in a complex tissue like the brain has not been addressed. We report that alternative splicing occurs as frequently in glia and vascular cells as it does in neurons. In contrast to the overall similarity in the frequency of alternative splicing, each cell type contains its own specific repertoire of thousands of alternatively spliced RNAs. These observations suggest that the distinct functions of each cell type may result as much from the presence of specific protein isoforms generated by alternative splicing as from global differences in the levels of gene expression. Moreover, previous studies have shown that disruption of alternative splicing can lead to neurological diseases in human and animal models (Buckanovich et al., 1996; Yang et al., 1998; Gehman et al., 2011), and our splicing dataset provides a resource for understanding the cell type-specific contributions to these disorders.
Alternative splicing of specific genes has also been shown to play important roles in the development and function of the nervous system. For example, the mechanisms by which differentially spliced isoforms of neuroligin and neurexin pre-mRNA mediate synaptic adhesion offer a well characterized example of how cell-specific splicing can have considerable functional consequences (Chih et al., 2006). Before systematic methods for the analysis of alternative splicing became available, characterizing alternative splicing of specific molecules (e.g., neuroligin and neurexin) and their role in the nervous system was cumbersome and time consuming. In the present study, we generated a high-quality splicing dataset for every gene in the major cell classes of the brain. Although a systematic and deep analysis of this splicing data is beyond the scope of this study, a preliminary exploration of the data has already identified interesting splicing patterns of considerable biological significance. For example, the neural cell adhesion molecule (NCAM), encoded by the Ncam1 gene, is a homophilic cell adhesion molecule involved in synaptic plasticity, learning, and memory (Becker et al., 1996; Senkov et al., 2006; Stoenica et al., 2006). Multiple splicing isoforms of Ncam1 have been reported previously (Owens et al., 1987; Krushel et al., 1998). We found that Ncam1 is spliced uniquely in neurons and glia. The neuronal isoform includes an exon that encodes a transmembrane domain, whereas the glial form excludes this exon. Both isoforms contain a signal peptide sequence. Therefore, neuronal NCAM is likely located on the plasma membrane, whereas glial NCAM is likely secreted or glycophosphatidylinositol anchored. This discovery raises interesting new possibilities of the cellular mechanism by which NCAM controls synaptic plasticity, learning, and memory.
Another finding from our alternative splicing dataset is the differential expression of splicing isoforms of the H2afy gene in oligodendrocyte lineage. H2afy encodes an atypical histone variant macroH2A1 enriched in heterochromatin. The gene H2afy contains two mutually exclusive exons, and alternative splicing generates two isoforms, mH2A1.1 and mH2A1.2 (Kustatscher et al., 2005). Embryonic stem cells exclusively express mH2A1.2, and as development progresses, the expression of the mH2A1.1 isoform increases. In addition, cancer cells downregulate the mH2A1.1 isoform, which inhibits cell proliferation (Novikov et al., 2011). We found that OPCs predominately express mH2A1.2, NFOs express approximately equal amounts of mH2A1.1 and mH2A1.2, and MOs predominately express mH2A1.1. This developmental progression of splicing isoform change correlates with the reduced proliferation capacity as oligodendrocytes differentiate. Our data are consistent with a role of mH2A1 in regulating oligodendrocyte differentiation through chromosomal changes.
RNA-Seq is highly reproducible and reference independent. Thus, this database of wild-type mouse neurons, glia, and vascular cells of the brain provides a valuable reference dataset for future studies that compare gene expression patterns of diseased versus healthy tissues. Our cell purification protocols can be used to similarly purify cell types from brains of mutant mice. Comparing these transcriptome studies at the resolution of individual cell types will help elucidate mechanisms of neurological diseases. Accumulating evidence has demonstrated that glia are involved in a variety of neurological diseases, including schizophrenia, autism spectrum disorders, epilepsy, Alzheimer's disease, Parkinson's disease, multiple sclerosis, amyotrophic lateral sclerosis, and stroke (Matute et al., 2005; Tian et al., 2005; Nagai et al., 2007; Chen et al., 2009; Li et al., 2011; Lioy et al., 2011; Molofsky et al., 2012; Zamanian et al., 2012). As the research community continues to look beyond neurons to understand the pathogenesis of neurologic dysfunction, the comprehensive transcriptome dataset of the brain presented here will be a valuable resource.
Notes
Supplemental material for this article is available at http://web.stanford.edu/group/barres_lab/brain_rnaseq.html. Searchable RNA-Seq database for purified cell types. This material has not been peer reviewed.
Footnotes
This work is supported by National Institutes of Health Grants R01 MH09955501 and R01 NS08170301 (B.A.B.), T32GM007365 with additional support from Stanford School of Medicine and its Medical Scientist Training Program (S.S.), and R00 GM95713 (C.Z.), a Berry Postdoctoral Fellowship (Y.Z.), and a National Health and Medical Research Council of Australia C. J. Martin Fellowship (S.L.). J.Q.W., K.C., and S.D. are supported by National Institutes of Health and the Staman Ogilvie Fund–Memorial Hermann Foundation. We thank Dr. Dritan Agalliu for Cldn5 expression plasmid and Vincent and Stella Coates for their generous support.
The authors declare no competing financial interests.
- Correspondence should be addressed to either of the following: Ye Zhang, Stanford University School of Medicine, Department of Neurobiology, 299 Campus Drive, Fairchild Building, Stanford, California 94305-5125, zhangye{at}stanford.edu; or Jia Qian Wu, University of Texas Medical School, 1825 Pressler Street, SRB-630.01, Houston, TX 77030. jiaqian.wu{at}uth.tmc.edu