Abstract
For many years, efforts to decipher the various cellular components that comprise the CNS were stymied by a lack of technical strategies for isolating and profiling the brain's resident cell types. The advent of transcriptional profiling, combined with powerful new purification schemes, changed this reality and transformed our understanding of the macroglial populations within the brain. Here, we chronicle the historical context and scientific setting for our efforts to transcriptionally profile neurons, astrocytes, and oligodendrocytes, and highlight some of the profound discoveries that were cultivated by these data.
Following a lengthy battle with pancreatic cancer, Ben Barres passed away during the writing of this Progression piece. Among Ben's innumerable contributions to the greater scientific community, his addition of publicly available transcriptome databases of CNS cell types will forever remain a testament to his generous spirit and boundless scientific curiosity. Although he had impressively committed a majority of these enormous gene lists to memory, Ben could oftentimes be spotted at meetings buried in his cell phone on the Barres RNAseq database. Perhaps the only thing he enjoyed more than exploring these data himself was knowing how useful these contributions had been (and will hopefully continue to be) to his scientific peers.
Introduction
As scientists, our curiosity motivates us to understand our surroundings and to ask how we function and interact with our environment. In general, we ask these questions by applying a reductionist approach to difficult problems. For example, the naturally inquisitive scientist who wants to understand the inner workings of a computer might follow a stereotyped formula by first examining the superficial structural features, followed by disassembling the device to scrutinize its individual components: the transistors and capacitors speckled across the logic board, the wires that weave beneath the keyboard, and the LED pixels that illuminate the display. Piece by piece, the curious examiner dissects this complex device into its individual functional elements, asking what role each plays to better comprehend how the unit operates as a whole.
Many speculate that the brain is biologically analogous to one of the most technically advanced computers in nature, yet our understanding of this organ remains in its nascency. As neurobiologists, we have for many decades successfully applied similar principles of reverse engineering to decipher the function of the brain, from cataloguing the major structural components, to delineating the wiring diagrams that comprise specific microcircuits and macrocircuits. But despite these efforts, one of the most fundamental steps toward interpreting how this organ functions remained vexingly elusive: what are the attributes of the brain's cellular components and how do these entities interact to control the development, function, and pathology of such a complex structure? In particular, neurobiologists had long been focused on understanding the molecular details of a single cellular entity within the brain: the neuron. But much like the circuits within the computer, there exists a diverse variety of additional (and abundant) cellular components that together comprise the brain and whose molecular identities remained largely a mystery. In this piece, we detail the scientific journey that led to the development of the first glial transcriptomic database of the murine brain (Cahoy et al., 2008) and discuss how obtaining these data in one of the most widely used model organisms helped to shape the field of neuron-glial interactions as we know it today.
The birth of transcriptomics and its limited application to the brain
In the late 1990s, the first automated gene expression platforms (Schena et al., 1995) promised to unveil a new realm of biologic discovery. For the first time, it was possible to interrogate gene expression at a transcriptome-wide level. This was an unprecedented advance in molecular profiling, as it provided the opportunity not only to better define the molecular signature of individual tissues, but to expose new functions and tissue-level interactions based on the expression pattern of specific gene networks. Not surprisingly, neurobiological questions were some of the first to be addressed by gene chip array technologies. By the early 2000s, several groups were already performing gene expression studies on whole tissue samples of the murine brain and revealing a surprising degree of molecular heterogeneity across major CNS structures, such as cerebellum, cortex, and hippocampus (Sandberg et al., 2000).
As microarray technology became more user-friendly and less costly, its application became more widespread. Transcriptomic profiles of healthy and pathologic states of the rodent and human brain became more prevalent, but these studies were invariably performed on entire chunks of brain tissue. While these data provided numerous new insights into global gene expression in various disease states, they added less to our fundamental understanding of the cellular components that together comprise this tissue. This was analogous to diagnosing a malfunctioning transistor beneath your computer keyboard by carving off the bottom corner of the laptop, grinding up the components, and attempting to interpret an abnormality within this mechanical hodgepodge. Thus, there was a clear need to somehow first separate each of the major cellular components within the CNS, and then perform transcriptomic profiling on these purified populations. But this was a surprisingly difficult problem that would stymie neurobiologists for many years.
Cell purification strategies at the turn of the century
The mammalian CNS is composed of several major cell types, classically subdivided into neurons, glia, and vascular cells. Although glia are the most abundant of these populations, the predominance of neuron-centric research meant that many of the first cell type-specific transcriptomic profiles were performed on neurons and neuronal subtypes. At the time, there were two primary methods for purifying specific CNS cell populations. The first was the use of a marker approach, in which a fluorescent protein was expressed in the same pattern as that of a known marker gene. This could be accomplished using targeted knockin of a fluorescent protein sequence at a marker locus, or through transgenic bacterial artificial chromosomes (BAC) that recapitulated endogenous expression patterns of cell type-specific genes. Once a given neuronal population expressed a fluorescent marker, these cells could be separated using FACS to a high purity. Microarray expression profiling of neurons using this technique was first performed in Caenorhabditis elegans by Yun Zhang et al. (2002). This was also an ideal technique at the time since numerous mouse lines were being generated out of the GENSAT project (http://www.gensat.org), and large-scale implementation of the marker approach quickly became the norm (Gong et al., 2003). Of course, a major caveat of the marker approach is that it requires a transgenic line with exceptional cell type specificity and no nonspecific labeling in contaminating populations, which was challenging given the dearth of cell type-specific promoters. The second method that was used for purification of specific neuronal subtypes utilized laser microdissection, in which individual cells were painstakingly removed from a piece of tissue and collected for downstream analysis. This technique required a technical tour de force that limited its widespread utility, and its interpretation could be blurred by intertwined cell processes.
What about glia?
If we were provided a detailed textbook outlining the structure and function of the transistors within our computers, we would still struggle to comprehend this machine as a whole. This incomplete glimpse essentially ignores the other critically interdependent components and thus restricts our ability to decipher the more complex intricacies of the device. In a similar vein, while multiple groups were making exciting strides in collating high-quality transcriptomic profiles of neurons and neuronal subpopulations (Arlotta et al., 2005; Lobo et al., 2006; Nelson et al., 2006), little progress was made toward understanding the molecular signatures of other CNS cell types, such as glia. Much of this lag was not for lack of interest; rather, the genetic toolsets for isolating glial populations were simply not yet adequate. This created a chicken and the egg situation; without specific markers for glia, it was difficult to isolate them and subsequently identify better markers.
The first step toward rectifying the absence of glial transcriptomes came in 2006, when a postdoctoral fellow in our group, Jason Dugas, succeeded in purifying rat oligodendrocyte precursor cells and premyelinating, postmitotic oligodendrocytes (Dugas et al., 2006), the CNS cell type responsible for insulating axons and enhancing the propagation of electrical signals between neurons. Given the lack of oligodendrocyte-specific transgenic lines and the technical challenges of laser microdissection, this work was performed using an alternative purification technique known as immunopanning (Barres et al., 1992). Immunopanning involves passing a single-cell suspension over a series of Petri dishes coated with antibodies against cell type-specific surface antigens, which bind and retain the targeted populations. These transcriptomics studies on young oligodendrocytes, combined with the plethora of existing neuronal transcriptomes, provided useful gene expression profiles for mouse neurons and rat oligodendrocytes. Despite this, a global direct comparison between the main CNS neural cell types still remained elusive because of a lack of methods to purify postnatal astrocytes and myelinating oligodendrocytes.
Filling in the glial gaps to build the first mouse transcriptome database
In 2007, two MD/PhD students in the laboratory, John Cahoy and Amit Kaushal, along with postdoctoral fellow Ben Emery, embraced the task of creating the first comprehensive transcriptome database of astrocytes, neurons, and oligodendrocytes. Their first achievement was the development of a purification scheme for murine astrocytes in which they creatively combined, in sequence, two of the previously described cell sorting strategies. Transgenic mice expressing EGFP under the S100β promoter had strong fluorescence in astrocytes but were plagued by nonspecific expression in oligodendrocyte-lineage cells as well. Thus, the group's strategy was to first use the immunopanning protocol to negatively select contaminating oligodendrocytes and deplete these populations from the single-cell suspension before proceeding to FACS to isolate the remaining EGFP-positive astrocytes. Of note, just 2 months before publication of this study, Maiken Nedergaard's group also published a story in which they successfully separated murine neocortical astrocytes for transcriptomic profiling using the Glt1 transgenic line (Lovatt et al., 2007), which provided a useful orthogonal validation of our own astrocyte transcriptome data. The second achievement for Cahoy et al. (2008) was their ability to repurpose the previous rat oligodendrocyte immunopanning protocol for mouse oligodendrocyte lineages, and to extend this protocol to include mature, myelinating mouse oligodendrocytes. In addition to separating each of these primary glial cell lineages, Cahoy et al. (2008) performed cell purifications at various postnatal ages between P1 and P30 to capture developmentally regulated gene expression changes, which we suspected would yield clues about new glial functions during specific phases of CNS development. Finally, to complete the study, the team carefully validated new cell type markers that were identified from the transcriptomic data and annotated novel gene pathways that were previously unknown to be present in these glial populations.
At the time, sifting through this new transcriptomic database was an exercise in patience given the never-ending wealth of information that was suddenly available with a simple search of a spreadsheet. Each gene search query revealed new clues about cell type specificity or developmental regulation, not to mention there were >20,000 genes to scrutinize! While we knew that dozens of new projects would spawn from the abundant data presented in this paper, it is still remarkable to revisit the discussion section less than one decade later and realize the degree to which these data were a harbinger of future scientific discoveries in the field of neuron-glial interactions.
One common critique of the methods used for purifying cell types via FACS or immunopanning is that the 2- to 3-h-long process, which involves enzyme-based dissociation of tissue, may lead to acute gene expression changes that do not reflect true in vivo expression profiles. Not long after publication of the mouse transcriptome database, Nat Heintz's group developed a novel method for interrogating cell type-specific gene expression called bacTRAP (Translating Ribosome Affinity Purification) (Heiman et al., 2008). In this method, transgenic mice are engineered to express an EGFP-tagged ribosomal transgene, which enables tagging of polysomes for immunoaffinity purification of mRNA. Essentially, any BAC transgenic mouse with cell type-specific transgene expression could be used to pull down actively translating ribosomes and their associated mRNA molecules. One advantage of this technique was that mRNA could be collected immediately after sacrificing the animal, without concern for downstream expression changes during the purification process. Much to our delight, when we compared our transcriptomic profiles with those from the bacTRAP method, we found only a very small number of differences. These included the upregulation of a small subset of well-described immediate early genes, such as Fosb and Junb. Expression of these genes escalates during the dissociation process, which is supported by the observation that their expression correlates closely with the total duration of the purification process (either FACS or immunopanning). As a result, we attempt to minimize the duration of dissociation and immunopanning whenever possible and are cognizant of these genes to avoid interpreting their expression fluctuations as physiologically relevant.
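The precaution described above, staying alert to immediate early genes when reading expression differences, can be illustrated with a minimal sketch. It is an illustration only, not the authors' pipeline: the function name `split_dissociation_artifacts` and the example gene list are hypothetical, and the flagged set contains only the two genes named in the text (Fosb and Junb), which a real analysis would presumably extend.

```python
# Illustrative sketch only (not the authors' pipeline).
# The flagged set holds just the two immediate early genes named in the text;
# the function name and the example hit list below are hypothetical.
DISSOCIATION_IEGS = frozenset({"Fosb", "Junb"})


def split_dissociation_artifacts(gene_symbols, ieg_set=DISSOCIATION_IEGS):
    """Partition gene symbols into (retained, flagged), preserving input order.

    Matching is case-insensitive so that mouse (Fosb) and human (FOSB)
    style symbols are both caught.
    """
    ieg_upper = {symbol.upper() for symbol in ieg_set}
    retained, flagged = [], []
    for symbol in gene_symbols:
        if symbol.upper() in ieg_upper:
            flagged.append(symbol)   # possible dissociation-induced change
        else:
            retained.append(symbol)  # no known dissociation concern
    return retained, flagged


if __name__ == "__main__":
    hits = ["Aldh1l1", "Fosb", "Megf10", "Junb"]  # hypothetical hit list
    retained, flagged = split_dissociation_artifacts(hits)
    print("retained:", retained)
    print("flagged as possible dissociation artifacts:", flagged)
```

Flagging rather than silently dropping these genes keeps them visible, so that a change in one of them can still be followed up with a dissociation-free method such as bacTRAP.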
How discoveries from the transcriptomic data influenced a field
In hindsight, it is remarkable to see how beneficial the Cahoy et al. (2008) transcriptome database was to our understanding of glial function. Below is just a sampling of the most direct impacts of this work on studies that themselves had far-reaching consequences in the field of neuron-glial interactions. The following narratives provide examples of how the value of the initial dataset continued to propagate many new discoveries in the field.
One major discovery of Cahoy et al. (2008) was the identification of a new cell type-specific astrocyte marker. At the time, there was a great need for an improved pan-astrocyte marker because many had observed that there was significant regional and subtype heterogeneity in other astrocyte markers. For example, GFAP, the most widely used astrocyte-specific identifier, is preferentially expressed in white matter over gray matter astrocytes and also has brain region-specific expression patterns (Chai et al., 2017). S100β and GLT-1, two other commonly used astrocyte markers, are also known to have nonspecific expression in oligodendrocytes and early neural progenitors, respectively. Thus, the discovery of Aldh1L1 as a mature, pan-astrocyte marker (and the validation of the Aldh1L1-EGFP GENSAT BAC mouse) provided a new resource to the glial community for astrocyte visualization and as a target for developing tools that permit astrocyte-selective genetic manipulations. The Aldh1L1 promoter was first used as a constitutive driver of astrocyte-specific Cre expression and has now been developed in an inducible form (Aldh1L1-Cre/ERT2) (Winchenbach et al., 2016) that can be used for multiple experimental paradigms.
Another point highlighted in the discussion of the Cahoy et al. (2008) paper was the finding that murine astrocytes appeared to express many of the same evolutionarily conserved phagocytic pathways used by glia in Drosophila and C. elegans (Ced-1/Draper/Megf10 and Mertk/Axl/αvβ5 pathways). Thus, we speculated that, although microglia were considered to be the main phagocytic cells of the CNS, astrocytes may also contribute significantly to synaptic pruning and other phagocytic roles in vivo. In 2013, work by another postdoctoral fellow in the laboratory, Won-Suk Chung, demonstrated just that (Chung et al., 2013). He showed that astrocytes are active phagocytic cells in the developing CNS and mediate synapse phagocytosis through Megf10/Mertk-dependent pathways. This has important implications for neural circuit development, as circuits in the visual system do not refine correctly in the absence of astrocytic expression of Megf10/Mertk. This was a profoundly new way of thinking about synapse pruning in the brain and opened many new avenues for investigating the potential pathophysiology of neurodevelopmental disorders, which are diseases of synapses and may be propagated by defects in the pruning process mediated by microglia (Sekar et al., 2016) or even astrocytes. The conceptual underpinnings of the theory of astrocyte synapse phagocytosis in the mammalian brain were derived directly from the molecular profiles of Cahoy et al. (2008).
One final example of the impact of the Cahoy et al. (2008) data was the discovery of new astrocyte-specific surface markers that allowed for the development of immunopanning (and FACS) methods for purifying both rodent and human astrocytes. While the specificity of the astrocyte Aldh1L1-EGFP BAC mice made isolating astrocyte populations by FACS an easier task, this was not an ideal method for culturing cells because many astrocytes are too fragile to survive the sorting process. Thus, we were later able to mine this database when searching for new cell-surface astrocyte-specific markers that might be useful for immunopanning cells. This led to the discovery of Itgβ5 as a surface antigen that could be targeted by antibody for the purification and subsequent culture of rodent astrocytes (Foo et al., 2011). This was a critical advance for experiments that required the use of purified astrocyte populations. Before the development of this method, creating purified astrocyte cultures relied upon the serum-mediated expansion and passaging of astrocyte progenitors, which poorly reflect their in vivo counterparts both molecularly and functionally (Foo et al., 2011). Additionally, when we found that Itgβ5 could not be used to purify primary human astrocytes, it was through the use of this resource that we were able to identify yet another cell-surface antigen, HepaCAM, as a suitable human astrocyte-specific marker (Ye Zhang et al., 2016; Sloan et al., 2017).
The mouse transcriptome 2.0
As with all technologies, microarray platforms soon became obsolete with the advent and commercialization of RNA sequencing. RNA sequencing offered several advantages over microarray platforms, including increased sensitivity, the addition of splicing information, and more linear quantification. In the context of how useful the Cahoy et al. (2008) database had already become to the greater scientific community, we saw a new opportunity to improve upon these data in two key ways. First, we could apply RNA-seq to CNS cell types to more accurately capture the transcriptomic landscape and to shed light on how alternative splicing of genes was regulated among various cell classes. Second, we now had better tools to capture the remaining CNS cell types that were absent in the original microarray study (microglia and vascular cells). Thus, in the second round of CNS transcriptomic profiling of the murine brain, we optimized the techniques that were first pioneered by Cahoy et al. (2008) to purify all of the major neuronal, glial, and vascular subtypes and subsequently performed deep RNA sequencing (Ye Zhang et al., 2014). It was clear from the outset that the data from Cahoy et al. (2008) were already a valuable asset to neuroscientists around the globe. Thus, for these successive mouse RNA-seq data, along with multiple subsequent RNA-seq datasets (Bennett et al., 2016; Clarke et al., 2018), we developed a user-friendly database to ensure that scientists around the world could easily search genes of interest and visualize how their expression is partitioned among the various CNS cell types (www.BrainRNAseq.org). Since its inception in the summer of 2014, this resource has steadily garnered increased usage and continues to attract users from scientific communities all over the globe (Fig. 1). Most importantly, the website allows for easy one-stop shopping of gene expression data in the brain. Hear about a newly implicated gene in your favorite disease model, and a nifty 5 s search query will reveal details about its cell type specificity.
The future and open mysteries
As quickly as the microarray platform faded from use, advances in RNA-sequencing technologies are accelerating at an astounding rate. Where once our efforts to profile the cell types in the mouse brain were stymied by lack of cell-purification strategies, massive single-cell sequencing technologies are now rendering purification steps unnecessary. Several large-scale single-cell sequencing studies have subsequently recapitulated many of the findings in the original Cahoy et al. (2008) manuscript, and are unveiling new heterogeneous subpopulations of neurons and glia that have their own unique transcriptional signatures (Darmanis et al., 2015; Zeisel et al., 2015; Tasic et al., 2016). This includes the identification of the genes Mfge8 and Gfap as possible markers of distinct subclasses of cortical astrocytes (Zeisel et al., 2015). Interestingly, because single-cell profiles tend to have low read coverage, many of these studies require the use of “gold-standard” cell type-specific markers, which originate from the initial Cahoy et al. (2008) bulk transcriptomic data. Astrocyte diversity may also exist in regionally specific domains throughout the CNS. For example, new studies comparing astrocytes isolated from distinct brain regions have identified subpopulation-specific markers, such as Crym in striatal astrocytes (Chai et al., 2017). Astrocytes have also been shown to exhibit neural circuit-specific functions, which adds a further layer of heterogeneity to this diverse cell type.
Among the abundance of data provided in the initial Cahoy et al. (2008) dataset and subsequently supplemented in the RNA-seq database, a number of open questions remain to be explored. One of the most profound initial discoveries is that there exists a remarkable amount of differential splicing between neurons and glia in the brain. Of the genes expressed in the mouse cortex, >73% are alternatively spliced by at least one CNS cell type, and many of these genes have cell type-specific isoforms whose functions remain entirely unknown (Ye Zhang et al., 2014).
Unveiling this layer of transcriptomic detail adds a level of cell type-specific gene expression beyond what was initially revealed in the Cahoy et al. (2008) story and is likely to hold significant clues about the functional segregation between neurons and glia in the brain. In particular, when looking carefully at overall gene expression and isoform differences between neurons and glia, one of the most striking distinctions can be seen in the expression of genes involved in various metabolic processes. This is likely to have significant implications for how metabolism in the brain is segregated across cell types and for the nuances of CNS manifestations of pathologic conditions that lead to metabolic derangements.
In light of all the promising new avenues of scientific exploration that were made available by transcriptomic profiling of the CNS, we must also remain cognizant of the inherent limitations of these methods. For example, transcriptional and proteomic profiles are often discordant, suggesting that translational kinetics, post-translational modifications, and general protein dynamics could distort our interpretation of RNA expression data. Gene expression is but one clue in understanding the function of the cells that comprise the CNS, and it will require a continued multidimensional approach of cellular, molecular, systems, and computational level strategies to understand the role that each of the resident CNS cells plays in the larger scheme.
It has now been almost 10 years since Cahoy et al. (2008) first published the transcriptomic profiles of the major macroglial cell types that reside in the CNS. This work provided a unique opportunity to begin the cellular dissection of the mammalian brain and to glimpse into the instruction manual of its most basic components. As we continue to ask what roles these cells play in the physiology of the brain, we can repeatedly turn back to this manual, and to the many subsequent resources that it has helped generate, to guide our path forward toward deciphering such an intricate organ.
Footnotes
Author reflections on developments since the publication of “A transcriptome database for astrocytes, neurons, and oligodendrocytes: a new resource for understanding brain development and function,” by John D. Cahoy, Ben Emery, Amit Kaushal, Lynette C. Foo, Jennifer L. Zamanian, Karen S. Christopherson, Yi Xing, Jane L. Lubischer, Paul A. Krieg, Sergey A. Krupenko, Wesley J. Thompson and Ben A. Barres. (2008) J Neurosci 28:264–78.
This work was supported by the National Institutes of Health Grant R01 MH099555-03 to B.A.B. and National Institute of Mental Health Grant F30MH106261 and Bio-X Predoctoral Fellowship to S.A.S.
The authors declare no competing financial interests.
Correspondence should be addressed to Dr. Steven A. Sloan, Department of Neurobiology, School of Medicine, Stanford University, 291 Campus Drive, Stanford, CA 94305. ssloan1@stanford.edu