Abstract
Regional cellular heterogeneity is a fundamental feature of the human neocortex; however, details of this heterogeneity are still undefined. We used single-nucleus RNA-sequencing to examine cell-specific transcriptional features in the dorsolateral PFC (DLPFC) and the subgenual anterior cingulate cortex (sgACC), regions implicated in major psychiatric disorders. Droplet-based nuclei-capture and library preparation were performed on replicate samples from 8 male donors without history of psychiatric or neurologic disorder. Unsupervised clustering identified major neural cell classes. Subsequent iterative clustering of neurons further revealed 20 excitatory and 22 inhibitory subclasses. Inhibitory cells were consistently more abundant in the sgACC and excitatory neuron subclusters exhibited considerable variability across brain regions. Excitatory cell subclasses also exhibited greater within-class transcriptional differences between the two regions. We used these molecular definitions to determine which cell classes might be enriched in loci carrying a genetic signal in genome-wide association studies or for differentially expressed genes in mental illness. We found that the heritable signals of psychiatric disorders were enriched in neurons and that, while the gene expression changes detected in bulk-RNA-sequencing studies were dominated by glial cells, some alterations could be identified in specific classes of excitatory and inhibitory neurons. Intriguingly, only two excitatory cell classes exhibited concomitant region-specific enrichment for both genome-wide association study loci and transcriptional dysregulation. In sum, by detailing the molecular and cellular diversity of the DLPFC and sgACC, we were able to generate hypotheses on regional and cell-specific dysfunctions that may contribute to the development of mental illness.
SIGNIFICANCE STATEMENT Dysfunction of the subgenual anterior cingulate cortex has been implicated in mood disorders, particularly major depressive disorder, and the dorsolateral PFC, a subsection of the PFC involved in executive functioning, has been implicated in schizophrenia. Understanding the cellular composition of these regions is critical to elucidating the neurobiology underlying psychiatric and neurologic disorders. We studied cell type diversity of the subgenual anterior cingulate cortex and dorsolateral PFC of humans with no neuropsychiatric illness using a clustering analysis of single-nuclei RNA-sequencing data. Defining the transcriptomic profile of cellular subpopulations in these cortical regions is a first step to demystifying the cellular and molecular pathways involved in psychiatric disorders.
- cingulate
- dorsolateral prefrontal cortex
- RNA-seq
- schizophrenia
- single-nucleus
Introduction
The neurobiology underlying psychiatric and neurologic disorders is complex and poorly understood. Two brain regions that are repeatedly suspected to be dysfunctional in major psychiatric illnesses are the dorsolateral PFC (DLPFC) and the subgenual anterior cingulate cortex (sgACC).
The DLPFC is implicated in higher-order cognitive functions, including attention, working memory, abstract reasoning, and cognitive control (MacDonald et al., 2000; Egner and Hirsch, 2005; Hare et al., 2009). Accordingly, the DLPFC is considered a key region of dysfunction in various psychiatric illnesses, especially schizophrenia (SCZ) (Weinberger et al., 1986; Manoach et al., 1999; Potkin et al., 2009; Koenigs and Grafman, 2009; Price and Drevets, 2012). Consequently, the DLPFC has been the target of postmortem studies of SCZ (Fromer et al., 2016; Gandal et al., 2018b; Jaffe et al., 2018) and major depressive disorder (MDD) (Nagy et al., 2020). The less-studied sgACC is involved in linking emotional and cognitive systems, emotion processing, and autonomic and emotional arousal (Mayberg et al., 1999; Price and Drevets, 2012). It has been implicated in mood disorders, particularly MDD and bipolar disorder (BD), and has been the focus of imaging, postmortem, and treatment studies (Tripp et al., 2011; Scifo et al., 2018; Akula et al., 2021).
Detailed characterization of the cellular composition of these brain regions is essential because mechanisms of psychiatric diseases might act through distinct classes of cells. Single-nucleus RNA-sequencing (snRNA-seq) has recently emerged as a high-throughput method to characterize cell populations based on each individual nucleus's transcriptional profile (Darmanis et al., 2015; Habib et al., 2016, 2017; Krishnaswami et al., 2016). This unsupervised approach is advantageous for its ability to elucidate subtle gene expression differences on the individual cell level, undetectable at the bulk-tissue level. Single-cell transcriptomics has already proved useful in understanding various mental illnesses, such as the dysregulation of corticocortical projection neurons in autism spectrum disorder (ASD) (Velmeshev et al., 2019). While earlier snRNA-seq studies have largely used mouse brains, more recent studies have studied various cortical areas of the human brain, including the PFC (Lake et al., 2016, 2018; Habib et al., 2017). In the DLPFC, recent snRNA-seq studies have demonstrated cell subtype-specific abnormalities in MDD (Nagy et al., 2020) and SCZ (Ruzicka et al., 2020; Reiner et al., 2021). In the sgACC, there is only one study with snRNA-seq in 5 individuals (Tran et al., 2021). Additional studies using more subjects are necessary to create a consensus profile of cell types in these brain regions and elucidate how the molecular profiles of these two brain regions inform their functional roles.
This study seeks to map the molecular and cellular landscape of the DLPFC and sgACC using snRNA-seq in human postmortem brain. We profiled a total of 87,279 nuclei from both brain regions in 8 individuals without a history of psychiatric or neurologic disorders. Our large dataset allowed us to identify conserved and unique cell populations between the two cortical regions. Using enrichment analyses, we localized psychiatric disorder-relevant transcriptional and heritability signals in the context of our cellular characterization. Overall, this study provides a resource that catalogs the transcriptomic heterogeneity of these key cortical regions at single-cell resolution.
Materials and Methods
Contact for reagent and resource sharing
Further information and requests for resources and reagents should be directed to and will be fulfilled by S.M. (marencos{at}mail.nih.gov).
Experimental design and statistical analysis
snRNA-seq data were generated from the DLPFC and sgACC of 8 male subjects (age = 44.8 ± 11.2 SD; sex = male; race = 5 White and 3 Black; postmortem interval = 24.3 ± 7 h; pH = 6.5 ± 0.17), each without a history of psychiatric (including substance use disorders) or neurologic disorders during their lifetime (for detailed demographics, see Extended Data Fig. 1-1). Technical replicates for all 16 unique specimens at the droplet-based nuclei capture step resulted in 32 total samples. Quality control steps (detailed in snRNA-seq data preprocessing) eliminated seven samples with low total cDNA yield resulting in 25 samples with 105,047 nuclei and 28,195 genes. Pearson correlations were performed to compare technical replicates.
Unsupervised clustering algorithms in the Seurat version 4.1.0 (Hao et al., 2021) pipeline were used to assign nuclei into general cell classes and to identify and eliminate doublets, clusters with high mitochondrial contamination, and artifact nuclei (for details, see Unsupervised clustering, quality control, and clustering) yielding a final dataset of 63,901 nuclei. Additional subclustering was performed on Ex and In cells using Seurat version 4.1.0 tools. The default nonparametric Wilcoxon rank sum test with Bonferroni correction was applied to identify cluster markers. Pearson correlation was used to compare our cell classification to published snRNA-seq datasets (for details, see Comparison with published human snRNA-seq datasets).
To define consensus cell class identities, our data were integrated with the Allen Cell Types Database Human Multiple Cortical Areas SMART-seq dataset via reciprocal PCA in Seurat version 4.1.0, and a nearest centroid classifier was applied to map each cell onto the cluster labels in the Allen dataset. Regional subpopulation proportional representation was assessed using the GLIMMIX (generalized linear mixed model) procedure with logit function and the option of β distribution with default covariance structure in SAS version 9.4 to test for effects of subpopulations, brain region, and their interaction (see Regional differences in subclusters). Differential expression between regions and subclusters was tested using a pseudo-bulk approach using the edgeR likelihood ratio test (for details, see Regional differences in subclusters).
Linkage disequilibrium score regression (LDSC) was implemented to examine cell type enrichment for GWAS signals from studies of ASD (Grove et al., 2019), attention-deficit/hyperactivity disorder (ADHD) (Demontis et al., 2019), BD (Stahl et al., 2019), MDD (Wray et al., 2018), and SCZ (Schizophrenia Working Group of the Psychiatric Genomics Consortium, 2014) (for details, see Cell type-specific genes and psychiatric disorder enrichment).
Postmortem human specimen collection and dissection
Postmortem brains were collected at the Human Brain Collection Core (HBCC), National Institute of Mental Health Intramural Research Program, with permission from the legal next-of-kin according to the National Institutes of Health Institutional Review Board and ethical guidelines under protocol 17 M-N073. Cases were obtained from the Offices of the Chief Medical Examiner of the District of Columbia, Northern Virginia, and Central Virginia. Clinical, neuropathological, and toxicological characterization was performed as previously described (Deep-Soboslay et al., 2005; Lipska et al., 2006; Kunii et al., 2015).
Samples were dissected from frozen coronal slabs using a dental drill. For the DLPFC (BA9/46) dissections, gray matter tissue from the midpoint of the middle frontal gyrus was targeted. Subcortical white matter was carefully excluded. For the sgACC (BA24/25), the portion of the cingulum immediately ventral to the genu of the corpus callosum was dissected.
snRNA-seq nuclei extraction and isolation
The nuclei isolation protocol was adapted from Krishnaswami et al. (2016) with additional washes to reduce mitochondrial contamination. To maximize our ability to capture the entire cellular diversity in an anatomically defined brain region during dissection, we obtained two separate 50-80 mg aliquots from each frozen dissected block and combined them after nuclei isolation.
Frozen samples were physically dissociated in a homogenizing buffer using a Dounce homogenizer (DWK Life Sciences), five strokes with a narrow pestle followed by 15 strokes with a wide pestle. The resulting nuclear suspensions were filtered through a 35 μm mesh cell strainer. Nuclear suspensions were washed 3 times, pelleting the nuclei with centrifugation at 1200 × g for 9 min before resuspending the pellets in 0.5 ml of wash buffer. Nuclear suspensions from the same frozen block (described above) were combined at this step and thoroughly mixed.
For nuclei counting, a 10 μl aliquot of each nuclear suspension was stained 1:1 with 0.4% Trypan Blue. Nuclei were counted using a hemocytometer under a bright-field microscope. Concentrations ranged from 1 million to 1.5 million nuclei per milliliter. A final aliquot of 50,000 nuclei in 1 ml of storage buffer was taken for the 10× Genomics nuclei capture.
10× genomics nuclei capture and library construction
We followed the Chromium Single Cell 3′ Reagent Kits User Guide version 2 Chemistry (CG00052) and used Chromium Single Cell 3′ Library & Gel Bead Kit version 2 (PN-120237) to produce Single Cell 3′ libraries ready for Illumina sequencing (https://support.10xgenomics.com/). Droplet-based nuclei-capture and library preparation experiments were conducted using materials and support from the Single Cell Analysis Facility, National Cancer Institute.
First, we created technical duplicates from the nuclei suspensions and loaded them onto the Chromium Controller microfluidic device. We loaded volumes provided in the Cell Suspension Volume Calculator Table of the 10× user guide to target a recovery rate of 6000 nuclei per sample. The microfluidic device partitioned single nuclei into bar code-containing Gel Bead-In-Emulsions (GEMs). Samples were incubated, and poly-adenylated mRNAs were reverse-transcribed into cDNAs containing a unique molecular identifier (UMI) and a cell bar code unique to each GEM. Afterward, GEMs were broken, and the barcoded cDNAs were amplified by PCR. Next, a sample index, R2 (read 2 primer sequence), and standard P5 and P7 primers used in Illumina bridge amplification were added during library construction via End Repair, A-tailing, Adaptor Ligation, and PCR. The 16 bp 10× bar code and 10 bp UMI were encoded at the start of Read 1 while the 8 bp sample index information was incorporated into the i7 index read, and cDNA insert sequenced as Read 2.
Sequencing, bar code processing, and alignment
Sequencing was performed by the Center for Cancer Research Sequencing Facility at the Frederick National Laboratory for Cancer Research. Thirty-two 10× Genomics Chromium Single Cell 3′ version 2 paired-end libraries were multiplexed and sequenced on one run on the HiSeq 4000. Base-calling was performed with Illumina instrument runtime analysis software, RTA version 2.4.11. The output raw base call (BCL) files were demultiplexed using Bcl2fastq version 2.17, allowing one mismatch in barcodes, to generate FASTQ files. Sequence alignment and bar code processing were performed with 10× Genomics's processing software, Cell Ranger version 2.1.1 (Zheng et al., 2017). A customized human reference genome, GRCh38-1.2.0_premRNA (described below), was used for counting reads and the “expect-cells” parameter for estimating number of nuclei in each sample above a background expression level was set to 6000, consistent with the number of nuclei targeted during library preparation.
Because snRNA-seq captures precursor mRNA (pre-mRNA) transcripts that have not been alternatively spliced, these transcripts have retained intronic features. Recent RNA-seq and comparative studies of single nucleus and single-cell RNA-seq datasets have shown that including reads mapping to intronic regions as gene counts accurately estimates true gene expression (Gaidatzis et al., 2015; Lake et al., 2017; Bakken et al., 2018). Additionally, this method has an added benefit for downstream clustering analysis by increasing read depth for inherently low coverage snRNA-seq datasets. Therefore, a customized, pre-mRNA reference genome was created following recommendations from 10× Genomics. “Gene_biotype” tags in the original GTF file were modified such that Cell Ranger's count function included intronic features as gene counts.
snRNA-seq data preprocessing
Data processing, analysis, and handling were performed using the R package Seurat version 4.1.0 (Hao et al., 2021). Filtered gene count matrices were read into R using Seurat's Read10X function. After initial review of experimental and sequencing quality control (QC) characteristics, 7 of 32 total samples were identified as “dropout” samples and removed from the dataset. These samples displayed low total cDNA yield calculated from BioAnalyzer electropherograms. They contained high sequence duplication rate (>45%) calculated with the sequencing QC tool FastQC (Andrews, 2010) and high sequencing saturation rate (>85%) from Cellranger QC outputs. Together, these QC results suggested inadequate RNA capture and subsequent PCR overamplification of these “dropout” samples Twenty-five samples (14 DLPFC, 11 sgACC) remained for downstream analysis and were assessed for further quality control. Detailed 10× Genomics Library capture, sequencing, and alignment data for each sample of brain tissue is listed in Extended Data Fig. 1-2.
Filtering of noisy genes and outlier nuclei was performed; 5475 genes detected in fewer than three nuclei were filtered from the data. Nuclei with <200 genes do not contain enough transcriptional information for effective clustering and subsequent biological interpretation. This threshold is like those used in published snRNA-seq studies (Hu et al., 2017). No nuclei fell below this threshold in our data. Nuclei with large library sizes (>5000 genes detected and 15,000 UMIs) are likely to be “doublets” or “multiplets,” which are expected technical artifacts of the microfluidic droplet technology (Habib et al., 2017; Hu et al., 2017; Lake et al., 2018). We excluded 227 nuclei whose values exceeded these thresholds (0.2% of total nuclei). Next, the fraction of reads mapping to mitochondrially encoded genes was calculated for each barcoded nucleus. Mitochondrially encoded mRNA transcripts represent false transcriptional signals and thus should not be present in snRNA-seq data. Nuclei with >20% of reads mapping to mitochondrially encoded genes were filtered out (25,273 nuclei). Subsequently, mitochondrially encoded genes were removed from the count matrix. After these preprocessing QC steps, 105,047 nuclei and 28,195 genes were retained for analysis.
Correlation between technical replicates of samples
To assess similarity between technical replicates, we calculated the Pearson correlation coefficient between average gene expression of replicates using the cor function in R (Extended Data Fig. 1-3). Correlations of average expression across all genes and then across only highly variable genes were assessed. Correlation of gene expression (Pearson r > 0.998) and correlation of number of nuclei obtained (Pearson r = 0.992) across the technical replicates was high (Extended Data Fig. 1-3). Given the high correlation coefficient values between technical replicates and the added clustering power that is achieved with larger sample size (number of nuclei), technical replicates were combined for subsequent downstream analyses. Technical replicates could be treated as additional samples of the same subject because they were separate aliquots from nuclei suspensions, representing different nuclei from the same subject.
Unsupervised clustering, quality control, and clustering
Normalization of gene expression to account for library size and expression level was performed with the Seurat function NormalizeData using default settings. Next, highly variable genes (HVGs) in our dataset were identified using the Seurat function FindVariableGenes for unsupervised clustering. Briefly, genes were first binned into 20 bins based on their mean expressions. Z scores were calculated within each bin. This method of binning controlled for the correlation between gene expression and variability. After binning, genes with SD > 0.65 and mean log normalized expression between 0.015 and 4 were selected as HVGs. Next, the Seurat function ScaleData was used to compute z scores for each gene in the list of HVGs. We calculated z scores by fitting gene expression to a negative binomial model because it more accurately resembles gene expression distributions. In this regression, the total number of UMIs detected and fraction of mitochondrial genome encoded gene counts per nucleus were regressed out to further eliminate library size effects on the downstream clustering analysis. Next, for linear dimensional reduction, a principal component analysis (PCA) was performed on the z scores of the HVGs using the Seurat function RunPCA. The top 50 principal components (PCs) were selected after assessment of PCs with an elbow plot and were used as input to Seurat's unsupervised clustering algorithm function FindClusters. In this algorithm, Euclidean distances between nuclei were calculated in the PCA space (top 50 PCs) and a K-nearest neighbor (KNN) graph-based approach was used to group nuclei into clusters. The resolution parameter, which represents the granularity of the clustering analysis, was set at 3, and all other parameters were set at default settings. For two-dimensional t-distributed stochastic neighbor embedding (tSNE) visualization, the RunTSNE function from Seurat was performed at default settings on the same principal components that were used for clustering.
After clustering, a variety of methods were used to identify and disqualify clusters. The distribution of number of genes, number of UMIs, fraction of mitochondrial gene counts, and fraction of ribosomal protein gene counts were visually assessed for each unbiased cluster using violin plots. To assess similarity of these clusters to each other, hierarchical clustering analysis on the average expression for each cluster was performed using R function hclust, a pairwise correlation heatmap between all clusters was generated with R function cor to view similar mean expression profiles, and distances of tSNE coordinates were visually assessed. Doublet or mixed clusters, identified by high numbers of genes and UMIs detected in addition to the combined expression of multiple canonical cell class marker genes, and clusters with very low numbers of detected genes, low UMI counts, and lack of distinguishing expression profiles, were discarded. The remaining subset of nuclei were reclustered using new HVGs calculated from the subset. Cluster removal and reclustering were repeated until all clusters that formed could be identified by their specific expression of canonical cell-type marker genes from existing literature and could not be removed by quality control as described above. A total of 87,729 of 105,047 nuclei remained after cluster curation.
Finally, to remove artifact nuclei, which include collections of ambient RNA encapsulated during the droplet experiment or doublets/multiplets, we curated each cluster by calculating cell-type scores for each nucleus (barnyard curation). Cell-type scores were calculated as a nucleus's average expression of the top five specific genes for each broad cell class, listed in Extended Data Figure 1-4. Top five specific genes were determined as the five highest-ranked genes when significant differentially expressed genes (DEGs) for each broad cell cluster were ranked by specificity (percent of in-group nuclei expressing the gene/percent of out-group nuclei expressing the gene). An additional criterion for top five specific genes for broad cell classes was that the genes were required to overlap with marker genes from at least one other single-cell or nucleus RNA-seq dataset (Darmanis et al., 2015; Lake et al., 2016; Habib et al., 2017; Hu et al., 2017; Sathyamurthy et al., 2018). By assessing each nucleus's score for each of the broad cell cluster, nuclei with “promiscuous” cellular identity (i.e., having scores greater in cell classes other than the cell class to which it belongs) were filtered out from further analysis. A total of 63,901 of 87,279 nuclei remained after barnyard curation.
Following QC, 63,901 nuclear expression profiles remained from 25 samples, averaging 93 million reads per sample and 27,061 reads per nucleus, slightly above the range (19,000-26,000 reads per nucleus) typically required for sequencing saturation with 10× Genomics version 2 chemistry in human brain snRNA-seq (Habib et al., 2017). Using 10× Genomics' software Cell Ranger for read alignment and counting, 1641 median UMIs and 1100 median genes per nucleus were detected, which is consistent with other snRNA-seq studies in postmortem human brain samples (Habib et al., 2017; Lake et al., 2018). A modified human reference genome was used to incorporate intronic reads as gene counts (Gaidatzis et al., 2015; Lake et al., 2017; Bakken et al., 2018). Ninety percent of reads mapped confidently to the customized genome.
Determination of cluster marker genes
The FindMarkers and FindAllMarkers functions in Seurat were applied with default settings to identify marker genes for each cluster (i.e., genes enriched in a cluster compared with all other clusters). The default test used was a nonparametric Wilcoxon rank sum test. Only genes whose expression value were > 10% of the nuclei in the in-group, whose Bonferroni corrected p values < 0.05, and whose log fold-change > 0.25 were saved as marker genes. This approach was used to identify marker genes for broad cell clusters (Extended Data Fig. 1-5), and excitatory and inhibitory neuron subclusters (Extended Data Fig. 2-1).
Subclustering inhibitory and excitatory neurons
Transcriptional signals that differentiate neurons from glial cells may not be the same signals that differentiate subtypes of neurons, excitatory and inhibitory, from each other (Tasic et al., 2018). The signals that differentiate excitatory from inhibitory neurons, in turn, may also not be the same signals that differentiate canonical inhibitory neuron subtypes from each other. To effectively resolve the transcriptional diversity of inhibitory neuron subtypes without the influence of noninhibitory neurons, we performed subclustering on the restricted set of nuclei identified as inhibitory neurons. We implemented the same data-driven QC approaches described above for the broad cell class clustering analysis with an added step of merging nonunique clusters. From an initial 24 inhibitory neuron subclusters, we reassigned nuclei of one subcluster that did not have a uniquely identifiable expression signal (<10 unique DE genes) and one subcluster that mostly consisted of nuclei from one individual. This process is called “merging” and is performed by reassigning the identity of each nucleus individually to the subcluster with the highest average shared nearest neighbors. This process reassigned nuclei to subclusters that had very similar transcriptional profiles to its original cluster. Similarity of subclusters could be assessed by correlation heatmaps and relative position to one another in a hierarchical clustering tree. This resulted in 22 final inhibitory neuron subclusters. DE analysis was performed as described above for each subcluster against all other inhibitory neurons to identify potential marker genes. Subclustering was also performed for the excitatory neurons. From the initial 29 subclusters that formed, nine subclusters were identified as nonunique cell types and nuclei of these subclusters were reassigned to subclusters with highest average shared nearest neighbors. Merging these subclusters resulted in 20 final excitatory neuron subclusters represented by all 8 individuals. Statistics for the final list of subclusters are shown in Extended Data Figure 2-1.
Comparison with published human snRNA-Seq datasets
For comparison with published human brain snRNA-Seq datasets, we read in count matrices (Lake et al., 2016; Habib et al., 2017; Lake et al., 2017; Tasic et al., 2018; D. Wang et al., 2018) into R as Seurat objects. Average gene expression was computed for each cluster from each dataset. A Pearson correlation coefficient was calculated using R function, cor, on mean expression for all pairwise comparisons of clusters. We limited our genes for correlation comparisons to 3310 HVGs (disregarding 78 ribosomal protein genes) to reduce noise from gene expression signals that are less informative among brain cells and to avoid dataset-specific technical artifacts. All heatmaps were plotted with either the ggplot2 R package or the pheatmap R package with colors palettes generated from the RColorBrewer R package. Correlation heat maps are shown in Extended Data Figure 2-4.
Dataset integration in Seurat and nearest centroid classification
snRNA-seq count data from multiple cortical areas were downloaded from the Allen Cell Types Database (https://portal.brain-map.org/atlases-and-data/rnaseq/human-multiple-cortical-areas-smart-seq) and imported into Seurat (version 4) along with HBCC data. Nuclei with a missing cluster label were removed. Inhibitory neurons, excitatory neurons, and non-neuronal cells were analyzed separately. For each major subclass, cells were log-normalized and the top 2000 most variable genes across both datasets were selected using the SelectIntegrationFeatures function in Seurat. PCA was run separately on both HBCC and Allen Institute data for the selected genes. Integration anchors were obtained via the FindIntegrationAnchors function using reciprocal PCA on the top 30 principal components and a k.anchor default value of 5. Datasets were integrated via the IntegrateData function with a k.weight default value of 100.
For classification of HBCC data based on Allen Institute labels, a nearest centroid classifier was implemented via the lolR library and was trained on the Allen cluster labels with the integrated gene expression values from 2000 genes as described above. This classifier was then applied to predict the Allen cluster label of each cell in the HBCC data. A contingency figure of the original HBCC cluster ID and predicted Allen label was created, and the fraction of HBCC cells mapping to each Allen label was visualized in a heatmap (see Fig. 2E,F).
Regional differences in subclusters
To establish whether particular neuronal subpopulations were unequally represented in the DLPFC or the sgACC, we calculated the proportion of neuronal subpopulations using all neurons as the denominator, then used the GLIMMIX (generalized linear mixed model) procedure with logit function and the option of β distribution with default covariance structure in SAS version 9.4 to test for effects of subpopulations, brain region, and their interaction. Subject (id) was used as random effect, assuming subjects participating in the study are a random sample of all possible subjects. This procedure was repeated on absolute counts of each neuronal subpopulation (omitting the β distribution).
Differential expression analysis between subclusters and brain regions was conducted using a conservative pseudo-bulk approach via edgeR likelihood ratio test (Robinson et al., 2010), while adjusting for donor status using the Libra R package (Squair et al., 2021). DEGs were defined as those that passed false discovery rate (FDR)-corrected p < 0.05 and had an absolute logFC > 0.1. To explore the relevance of DEGs, these genes were used as input into gene enrichment analysis with the Gene Ontology (GO) (Gene Ontology Consortium, 2001), Kyoto Encyclopedia of Genes and Genomes (KEGG) (Kanehisa and Goto, 2000), and DisGeNET (Piñero et al., 2015) as implemented in the ClusterProfiler (Yu et al., 2012) and disgenet2r R packages (Piñero et al., 2020).
Cell type-specific genes and psychiatric disorder enrichment
To test the association between cell type-specific genes and the heritability for psychiatric disorders, we used LDSC, as implemented previously (Finucane et al., 2015; Skene et al., 2018; Bryois et al., 2020). In brief, single-nucleus RNA expression data were scaled to a total of 1 million UMIs for each cell type, separately for each cortical region. A specificity metric was calculated as a ratio of the expression of each gene in a given cell type cluster over the total expression of that gene across all cell types (Skene et al., 2018; Bryois et al., 2020). The top 10% most specific genes for each cell type were used as input for LD score regression. Genome Wide Association Study (GWAS) data for the following psychiatric disorders were used: ASD (Grove et al., 2019), ADHD (Demontis et al., 2019), BD (Stahl et al., 2019), MDD (Wray et al., 2018), and SCZ (Schizophrenia Working Group of the Psychiatric Genomics Consortium, 2014), accessed through the Psychiatric Genomics Consortium data sharing portal (https://www.med.unc.edu/pgc/). Gene coordinates were extended by 100 kb upstream and 100 kb downstream for each as done previously (Finucane et al., 2015; Bryois et al., 2020). p values were FDR-corrected for 7 broad cell types, and 47 subclusters (Benjamini and Hochberg, 1995).
The same top 10% of cell specific genes were used to test for enrichment with DEGs from a recently published large RNA seq study of ASD, SCZ, and BD samples (Gandal et al., 2018b). Hypergeometric tests were used to assess the enrichment or overlap between gene sets, and the p values were corrected as described above.
RNA ISH (RNAscope)
For RNAscope validation of cell types, 10-μm-thin cryosections were taken from the DLPFC and sgACC dissected from postmortem brain samples (Extended Data Fig. 2-7). The sections underwent post-fixation in 10% NBF at 4°C for 1 h. Tissue was dehydrated and stained following the Advanced Cell Diagnostics protocol for RNAscope 2.5 Duplex Chromogenic Assay (Anderson et al., 2016) with the following modifications: Protease IV treatment reaction time was reduced to 15 min; amplification reaction time was optimized to obtain the best resolution of puncta for the different target probes labeled with HRP-based green; counterstaining was done with 25% Hematoxylin 560 MX (Leica 042817) diluted in dH2O, for 10 s; alternatively, Nissl staining was done with 1% thionine for 30 s, subsequent wash in dH2O for 30 s twice, and differentiated in acidified ethanol for 5-15 s. Whole slide image acquisition was performed on a NanoZoomer S60 Digital Slide Scanner (Hammamatsu).
Code/software
Raw FASTQ files, final Seurat R Object containing nuclei expression data, and metadata are publicly available for download on the National Institute of Mental Health Data Archive (https://nda.nih.gov/experimentView.html?experimentId=1852&collectionId=3151) and Synapse (https://doi.org/10.7303/syn48374287.1). R code for this project is available for download on our GitHub repository: https://github.com/NIMH-HBCC. A shareable interactive user interface using iSEE (Rue-Albrecht et al., 2018) and Binder 2.0 (https://elifesciences.org/labs/8653a61d/introducing-binder-2-0-share-your-interactive-research-environment) is also accessible from the GitHub HBCC repository. Datasets and gene sets from referenced literature were obtained online from supplementary data.
Results
Classification of nuclei into broad cell types
Unsupervised clustering of the nuclei that survived quality control and preprocessing steps (see Materials and Methods; Extended Data Figs. 1-1, 1-2, and 1-3) generated 47 initial clusters. Initial clusters of the same broad cell type were grouped together, which ultimately resulted in seven distinct broad clusters: excitatory neurons (Ex), inhibitory neurons (In), astrocytes (Astro), oligodendrocytes (Oligo), oligodendrocyte precursor cells (OPCs), endothelial cells (Endo), and microglia (Micro) (Fig. 1; Extended Data Figs. 1-4 and 1-5) (Habib et al., 2017; Lake et al., 2018; D. Wang et al., 2018). Neuronal nuclei contained a higher median number of genes and UMIs (1225 and 1875, respectively) than non-neuronal nuclei (674 and 915, respectively), as expected of the physically larger nuclear sizes and greater functional complexity of neurons.
Molecularly defined broad cell types in the human DLPFC and sgACC from 8 male subjects (for donor details, see Extended Data Fig. 1-1). A, Workflow diagram of 10× Genomics snRNA-seq library capture, sequencing, and alignment (for sample metrics, see Extended Data Fig. 1-2), and RNAscope validation. Correlations between technical replicates can be found in Extended Data Figure 1-3. B, tSNE plot of broad cell clusters. C, Expression violin plots of canonical marker genes enriched in each broad cell type (see also Extended Data Figs. 1-4 and 1-5). D, Proportion of nuclei count from each of 8 donors in each broad cell cluster. E, Normalized expression of the most representative canonical marker gene for each broad cell type visualized in relation to the tSNE plot.
Figure 1-1
snRNA-seq subject information. Details about each subject sampled in the snRNA-seq are provided. Brain Number is a lab-specific identifier for brain tissue to conceal personal identifiable information. PMI: post-mortem interval. RIN: RNA integrity number. pH: pH of brain tissue upon collection. Download Figure 1-1, XLSX file.
Figure 1-2
10x Genomics library capture, sequencing, alignment data. Details about each sample of brain tissue are provided. Experiments were carried out in four batches. Sample ID provides brain number, brain region, and replicate number. Cases excluded from analysis are marked in RED. Download Figure 1-2, XLSX file.
Figure 1-3
Correlations between technical replicates of snRNA-seq samples. Pearson correlation coefficients between the replicates for each sample were calculated across all genes and across highly variable genes. Highly variable genes are genes with standard deviation > 0.65 and mean log normalized expression between 0.015 and 4. All cases were well replicated (Pearson correlation coefficient across all genes > 0.99). Download Figure 1-3, XLSX file.
Figure 1-4
Top 10 marker genes for broad cell classes. Significantly differentially expressed genes were ranked by specificity. Ten marker genes with the highest rank for each broad cluster are shown. Specificity is defined as percent of in-group nuclei expressing the gene/percent of out-group nuclei expressing the gene. Enrichment of the genes under their respective broad cell classes is replicated in published single cell RNA-Seq studies (Darmanis S et al., 2015; Lake BB et al., 2016; Habib N et al., 2017; Hu P et al., 2017; Sathyamurthy A et al., 2018). Genes that were selected for barnyard curation for each broad cell class are highlighted in green. Download Figure 1-4, XLSX file.
Figure 1-5
Unique differentially expressed genes for each broad cell class. Adjusted p value: p value after Barnyard curation. A p value of 0 represents a number less than the smallest number represented by R (∼2E-308). Avg log(FC): average value of the natural log of the gene fold change of the expression in the assigned broad cell class over the expression in all other broad cell classes. Pct.in: percent of assigned broad cell class nuclei expressing the gene. Pct.out: percent of all other broad cell class nuclei expressing the gene. Specificity: pct.in/pct.out. Download Figure 1-5, XLSX file.
Transcriptional diversity in excitatory neurons
To investigate the diversity of neuronal subtypes, excitatory neuronal nuclei were extracted from the larger dataset and iteratively reclustered to yield 20 robust, transcriptionally distinct Ex subclusters (Fig. 2A,C). Each subcluster was given a simple assignment (e.g., Ex18), in which the number corresponds to the relative size of each subcluster in descending order (e.g., Ex01 is the largest group of excitatory cells). Differential gene expression was determined for each Ex subcluster versus all other Ex subclusters to identify enriched cell type-specific markers for each subcluster (Extended Data Fig. 2-1). Each subcluster contained at least one nucleus from all 8 subjects, thereby avoiding individual-biased clusters (Extended Data Fig. 2-1).
Ex subclusters were categorized into upper (L2-3), middle (L4-5), and lower (L6-6b) cortical layer groups based on the expression of known layer-specific marker genes: CARTPT (L2-3), CUX2 (L2-4), RORB (L3-5), FOXP2 (L6), PCP4 (L5), GRIN3A (L5), GRIK3 (L5-6), TLE4 (L5-6), and CTGF (L6b) (Fig. 2) (Zeng et al., 2012; Hodge et al., 2019). Subcluster layer assignment was verified with comparison to the Allen Brain Institute's ISH atlas (Extended Data Fig. 2-3) (https://human.brain-map.org/ish/search). We further validated our subcluster classification by demonstrating strong overall correlation with excitatory neuronal clusters defined in other studies across various cortical regions in both mouse and human (Extended Data Fig. 2-5) (Habib et al., 2017; Lake et al., 2018; Tasic et al., 2018; D. Wang et al., 2018).
We next integrated the neuronal clusters identified here with the Allen Brain Institute's Human Multiple Cortical Areas SMART-seq dataset (Fig. 2E; Extended Data Fig. 2-4), which includes data from middle temporal gyrus, ACC, primary visual cortex, primary motor cortex, primary somatosensory cortex, and primary auditory cortex (https://portal.brain-map.org/atlases-and-data/rnaseq) and defines neuronal populations that are conserved across species with anatomically confirmed layer assignments and inferred connectivity of intratelencephalic (IT: all upper- and middle-layer subclasses), extratelencephalic (ET: Ex25), corticothalamic (CT: Ex12), layer 6b (Ex13 and Ex22), and near-projecting (NP: Ex17) cells (Hodge et al., 2019; Bakken et al., 2021). Notably, L4 IT and L3-5 IT neuron subtypes characteristic of primary visual, sensory, and motor cortices were not found. Other exceedingly rare L5, L5/6, and L6 classes were also not found in our dataset. Using this classification method, most HBCC subclusters mapped to several Allen clusters with one cluster being favored (Fig. 2E; Extended Data Fig. 2-4). Rarely, multiple HBCC subclusters would map to the same Allen cluster. A clear one-to-one correspondence for the lower-layer neuronal subpopulations was observed. Middle- and upper-layer subclusters were less well-matched to corresponding clusters. Using this data integration, we labeled our clusters following common cell type nomenclature guidelines (Miller et al., 2020). The labels included layer specificity, inferred connectivity, and major classifying markers, keeping the Allen nomenclature where their classifying markers coincided with ours (Fig. 2E; Extended Data Figs. 2-2 and 2-4).
The smallest subcluster, Ex25 (Exc L5 ET FEZF2 GABRQ), integrated with multiple Allen L5 ET classes, but primarily with Exc L5 FEZF2 MORN2, a rare cell type found primarily in the ACC. Intriguingly, Ex25 displayed a gene expression signature corresponding to a rare neuron type, the von Economo neuron (VEN), with enriched expression of VEN markers, such as ADRA1A, GABRQ, VAT1L, and ADAMTS2 (Extended Data Fig. 2-5). VENs are morphologically defined by a unique spindle shape and are implicated in the pathogenesis of frontotemporal lobar dementia and SCZ (Seeley et al., 2006; L. Yang et al., 2019). The identification of this rare, well-characterized cell type reinforces the validity of our subcluster analysis and cell type classifications.
Transcriptional diversity in inhibitory neurons
In parallel to our analysis of Ex subclusters, we subset the inhibitory neuron nuclei from the larger dataset and performed reclustering to obtain 22 transcriptionally distinct In subclusters (Fig. 2B,D). Except for In22 and In24, both of which consisted of at least one nucleus from 7 subjects, each subcluster contained at least one nucleus from all 8 subjects (Extended Data Fig. 2-2).
Differential gene expression was performed to identify specific markers for each subcluster (Extended Data Fig. 2-1). Our first observation was that In subclusters segregated along developmental lineages derived from caudal and medial ganglionic eminences (CGE and MGE) (Extended Data Fig. 2-3). CGE-derived interneurons accounted for 54% of all interneurons and expressed the transcription factors NFIB and PROX1. MGE-derived subclusters expressed SOX6 and LHX6. Using RNAscope, we confirmed coexpression of the MGE marker LHX6 in SST-expressing cells across all cortical layers and in PVALB-expressing cells in layers IV and above (Extended Data Fig. 2-3). The CGE marker HTR2C was coexpressed with VIP in all cortical layers (Extended Data Fig. 2-3). Intriguingly, the LAMP5 subcluster In06 expressed both CGE and MGE transcription factors (NFIB, LHX6, and SOX6).
We next observed that In subclusters mapped to the four major classes of inhibitory neurons: SST, PVALB, VIP, and LAMP5. In subclusters also matched robustly to corresponding Allen Institute clusters (Bakken et al., 2021). Integration of our data with the Allen Brain Institute's Human Multiple Cortical Areas SMART-seq dataset revealed distinct mapping of all In subclusters to few related Allen clusters with no overlap between In subclusters (Fig. 2F), emphasizing the robustness of our cluster classification. Few Allen clusters were not identified among our In subclusters: Inh L6 SST NPY, L1-2 PVALB TAC1, Inh L6 LAMP5 ANKRD20A11P, Inh L1 VIP SOX11, Inh L1 ADARB2 DISP2, and Inh L1-3 VIP CCDC184 classes, which are especially rare subtypes, and L4-5 PVALB TRIM67, which is primarily found in sensory and visual cortices. Similar to Ex subclusters, the In subclusters were assigned labels following common cell type nomenclature guidelines (Miller et al., 2020), which include broad neuronal class, canonical marker gene, and cell type-specific marker genes.
Numerical differences in neuronal subclusters between DLPFC and sgACC
While there was no difference between the brain regions in the overall counts of excitatory cells (Extended Data Fig. 2-1), significant differences were found (p < 0.0001) at the subcluster resolution (Fig. 3). Some Ex subclusters appeared almost exclusively in the DLPFC; these included three supragranular subclusters (Ex15 L2-3 LINC00507 PDGFD, Ex01 L2-4 RORB CARTPT, and Ex02 L3-4 RORB PRSS12) and one middle-layer subcluster (Ex03 L4-5 RORB ANXA1) (Fig. 3B,C). Conversely, one supragranular subcluster (Ex20 L3 LINC00507 PTGER3) and one middle-layer subcluster (Ex16 L4-5 RORB RSPO3) exhibited enrichment in the sgACC (Fig. 3B,C). Notably, most deep-layer excitatory subclusters, consisting of pyramidal projection neurons, were present in both brain regions with similar frequencies.
Subclusters of excitatory (Ex) and inhibitory (In) neurons (for subcluster details, see Extended Data Fig. 2-1). A, B, Nomenclature and tSNE plot of Ex and In subclusters, respectively. Nomenclature format: Excitatory or versus inhibitory, projection type (if Ex), cortical layer, marker gene, unique differentially expressed marker gene; HBCC ID is written in parentheses (see Extended Data Fig. 2-2). tSNE plot shows subclassification of Ex subclusters to upper, middle, or lower cortical layers and subclassification of In subclusters to LAMP5, SST, VIP, and PVALB (see Extended Data Fig. 2-3). C, D, Expression violin plots for cortical layer marker genes for Ex subclusters and for ganglionic eminence marker genes for In subclusters, respectively (see Extended Data Fig. 2-3). E, F, Nearest centroid classification of Allen Cell Types Database and HBCC Ex and In subclusters, respectively. Ex Allen clusters are divided into Allen projection subclasses. See Extended Data Figure 2-3 for tSNE analysis comparing HBCC and Allen Brain Atlas subclusters and RNAscope validation of select markers and Extended Data Figure 2-4 for comparisons of subclusters by brain region. See Extended Data Figure 2-5 for correlation of HBCC subclusters with additional human snRNA-seq datasets. See Extended Data Figure 2-6 for correlation values of subclusters to each other and Extended Data Figure 2-7 for donor details of cases used for RNAscope studies.
Figure 2-1
snRNA-seq subcluster statistics. Each subcluster is identified by excitatory (Ex) or Inhibitory (In) classification and by a number that had been assigned at the initial subclustering. Excluded subclusters via QC are not shown. Five broad cell class of glia are listed at the end. Download Figure 2-1, XLSX file.
Figure 2-2
Differentially expressed genes for each subclass. Subclusters for excitatory and inhibitory neurons are named starting with Ex and In, respectively; Astro = astrocytes, Endo = endothelial cells, Micro = microglia, Oligo = oligodendricytes, OPCs = oligodendricyte precursor cells. Subclusters are ordered as they are shown in Figure 2. Excitatory subcluster groups are organized by location in the cortical layer. Inhibitory subcluster groups are organized by ganglionic eminence. Adjusted p value: p value after Barnyard curation. A p value of 0 represents a number less than the smallest number represented by R (∼2E-308). Avg log(FC): average value of the natural log of the gene fold change of the expression in the assigned broad cell over the expression in all other broad cell classes. Pct.in: percent of assigned subcluster nuclei expressing the gene. Pct.out: percent of all other subcluster nuclei expressing the gene. Specificity: pct.in/pct.out. Download Figure 2-2, XLSX file.
Figure 2-3
(A, B) Total number of nuclei profiled for each Ex and In subclusters, respectively. (C, D) Percentage of nuclei from each subject for each Ex and In subclusters, respectively. (E) In-situ hybridization images (Allen Brain Institute) showing expression patterns across layers for cortical layer marker genes in the DLPFC and sgACC (http://human.brain-map.org/ish/). Markers FOXP2, TLE4, and CTGF are not shown as they are not available on the Allen Brain Atlas. (F) Scaled mean expression values of inhibitory neuron marker genes for MGE-derived In subclusters. (G) Percent nuclei expression of ganglionic eminence marker genes by each In subcluster. Two distinct expression patterns emerge for In subclusters derived from medial ganglionic eminence (MGE) and for In subclusters derived from caudal ganglionic eminence (CGE). (H) tSNE plots identifying corresponding HBCC Ex subclusters (top plot) to Allen Brain Atlas Ex subclusters (bottom plot). (I) tSNE plots identifying corresponding HBCC In subclusters (top plot) to Allen Brain Atlas In subclusters (bottom plot). (J) RNAscope validation of LHX6+/SST+ cell types in human sgACC and DLPFC sections. LHX6 and SST transcripts were detected with chromogenic RNAscope probes, red and green, respectively. In both the sgACC and the DLPFC LHX6+/SST+ neurons were found throughout the cortex. (K) RNAscope validation of LHX6+/PVALB+ cell types in human sgACC and DLPFC sections. LHX6 and PVALB transcripts were detected with chromogenic RNAscope probes, red and green, respectively. In both the DLPFC and sgACC, LHX6+/PVALB+ neurons were found in cortical layers IV and above. (L) RNAscope validation of HTR2C+/VIP+ neurons in human sgACC and DLPFC sections. HTR2C and VIP transcripts were detected with chromogenic RNAscope probes, red and green, respectively. In both the sgACC and the DLPFC HTR2C+/VIP+ neurons were found throughout the cortex. (M) RNAscope validation of POSTN+/SMYD1+ neurons in human sgACC and DLPFC sections. SMYD1 and POSTN transcripts were detected with chromogenic RNAscope probes, red and green, respectively. POSTN is a gene found to be downregulated in SCZ (Gandal MJ et al., 2018b). In both the sgACC and the DLPFC POSTN+/SMYD1+ neurons were found in the lower-layer. Download Figure 2-3, TIF file.
Figure 2-4
(A, D) Proportion analysis using GLIMMIX showing main effects of brain region, subcluster identities, and of the interaction between brain region and subcluster identity for Ex and In, respectively (see methods). (B, E) UMAP plots of Ex and In neurons, respectively, colored by brain region for HBCC subclusters (top) and Allen Brain cell types (bottom). (C, F) Nearest centroid correspondence of Allen Brain Atlas Cell Types Database and HBCC excitatory and inhibitory neurons, respectively. X-axis shows neuronal subclusters separated by brain region. Y-axis shows Allen Brain cell types. Gray plot shows the number of cells represented by each Allen Brain cluster. Colored plot shows the percent of cells represented by each Allen Brain region per cluster. There was no correspondence of HBCC subclusters to L4 IT Allen Brain clusters. Download Figure 2-4, EPS file.
Figure 2-5
Heatmaps of Pearson correlation values showing the relationship between neuronal subclusters from this study (HBCC) and those from similar snRNAseq studies. (A) Correlation with Lake et al., 2016 neuronal clusters in the human frontal cortex, occipital cortex, and temporal cortex. (B) Correlation with Lake et al., 2018 neuronal clusters in the human frontal cortex, occipital cortex, and temporal cortex. (C, D) Correlation with (Habib N et al., 2017) neuronal clusters in human prefrontal cortex and hippocampus. (E) Correlation with (Wang D et al., 2018) neuronal clusters in human prefrontal cortex. (F, G, H) Correlation with (Tasic B et al., 2018) neuronal clusters and classes in mice primary visual cortex and anterior lateral motor cortex. Download Figure 2-5, EPS file.
Figure 2-6
Correlation values of subclusters to each other. (A) Heatmap of Pearson correlation values showing the relationship between all subclusters. (B) Heatmap of Pearson correlation values showing the relationship between Ex subclusters. (C) Heatmap of Pearson correlation values showing the relationship between In subclusters. (D) Heatmap of Euclidean distances between Ex subclusters. (E) Heatmap of Euclidean distances between In subclusters. Download Figure 2-6, EPS file.
Figure 2-7
RNAscope subject information. Details about each subject sampled in the RNAscope experiments are provided. PMI: post-mortem interval. RIN: RNA integrity number. pH: pH of brain tissue upon collection. Download Figure 2-7, XLSX file.
Neuronal diversity between the DLPFC (salmon) and sgACC (cyan). A, tSNE plots showing excitatory (Ex) and inhibitory (In) neuronal subclusters colored by brain region. B, Percentage of cells in each brain region for each subcluster. C, Proportion of cells for each brain region for each subcluster, shown as logit values of ranked numerosity. Ex20, Ex15, Ex01, Ex02, Ex03, Ex16, In04, and In05 have nonoverlapping CIs. Wilcoxon rank sum test showed that Ex15, Ex01, and In04 have significant differences between DLPFC and sgACC after adjusted p values: **p < 0.05. Ex20, Ex02, Ex03, Ex16, In01, and In06 also show meaningful differences between DLPFC and sgACC with unadjusted p values: *p < 0.005. D, Number of differential expressed genes for each brain region within each subcluster (FDR < 0.05, logFC > 0.1) (Extended Data Fig. 3-1). See Extended Data Figure 3-2 for subcluster enriched GO terms and Extended Data Figure 3-3 for the proportional contribution of individual donors to each subcluster.
Figure 3-1
Differentially expressed genes in Ex and In subclusters by brain region. Adjusted p value: p value after Barnyard curation. A p value of 0 represents a number less than the smallest number represented by R (∼2E-308). Avg log(FC) (DLPFC/sgACC): average value of the natural log of the gene fold change of the expression in the DLPFC over the expression in the sgACC. Genes selected using edgeR pseudobulk method. Download Figure 3-1, XLSX file.
Figure 3-2
Gene Ontology (GO) Terms for Subclusters with High Numbers of Differentially Expressed Genes (DEGs). Displays the top 10 most significant GO Biological Processes Terms for increased differential expression in the DLPFC (top row) and sgACC (bottom row) for each subcluster that has a number of DEGs greater than 50. The color of the dot represents the adjusted p value, while the size of the dot represents the number of DEGs in the given GO term. Gene Ratio is the percentage of total DEGs in the given GO term. Download Figure 3-2, EPS file.
Figure 3-3
Proportion of nuclei-count from each of 8 donors in the DLPFC and sgACC for each (A) inhibitory subcluster, (B) excitatory subcluster, and (C) glia. Different color represents a different donor. Note that we have 6 cases are represented for sgACC and 8 cases for DLPFC (see Fig. 1-2). Download Figure 3-3, EPS file.
Inhibitory neurons were more numerous in the sgACC (p < 0.0001) than in the DLPFC (Fig. 3B,C); however, In subclusters exhibited less regional variability than Ex subclusters. Only In04 L5 PVALB HGF basket cells, In06 L1-6 LAMP5 CA13 and In01 L1-6 VIP VIP subclusters were significantly enriched in the sgACC. Overall, we found that heterogeneity of supragranular Ex cells is the primary distinguishing feature of these cortical regions.
Transcriptional differences between DLPFC and sgACC for neuronal subpopulations
We next asked whether transcriptional diversity within subclusters further contributed to differences between these cortical areas by examining differential gene expression between DLPFC- and sgACC-derived nuclei within each subcluster (Fig. 3D). Very few DEGs were found for In subclusters with a median of 14 DEGs of 1083 median total genes detected (1.3%) (Fig. 3D). Ex subclusters had a median of 67 DEGs per cluster of 1640 median total genes detected (4.1%). In particular, six Ex subclusters had extremely high interregional variation in gene expression Ex06 (20.0%), Ex25 (17.4%), Ex10 (16.4%), Ex08 (12.6%), Ex12 (9.6%), Ex13 (6.3%), and Ex02 (5.2%) (Fig. 3D). GO analyses on these subclusters were performed to identify molecular processes that distinguish each brain region (Extended Data Fig. 3-2). The predominant differences included enrichment for synapse organization, cell-cell adhesion, trans-synaptic signaling, and other GO terms associated with functions of neurotransmission in the sgACC versus DLPFC (Extended Data Fig. 3-2).
Ex25 (Exc L5 ET FEZF2 GABRQ) had the highest absolute number of DEGs of the neuronal subclusters. This was the only extratelencephalic projecting subcluster identified in our dataset. Upon closer inspection, we found that GABRQ, a marker gene for VENs (L. Yang et al., 2019; Hodge et al., 2020) was only expressed in sgACC-derived nuclei of the Ex25 subcluster (Fig. 4A), whereas other markers for VENs (ADRA1A, VAT1L, and ADAMTS2) were present in both brain regions. Despite the large number of interregional DEGs, this subcluster could not be further subdivided using our unbiased classification methods as clustering using a higher resolution would have yielded clusters without representation from all individuals. We therefore investigated whether this population of neurons simply had a lower level of GABRQ expression in the DLPFC not detected by our single-nucleus methodology or whether they were indeed distinct populations. Using RNAscope, we examined the coexpression of GABRQ and ADRA1A in the DLPFC and sgACC and found that (1) GABRQ expression was indeed restricted to the sgACC and was not present in ADRA1A-expressing neurons in the DLPFC and (2) that these cells exhibited distinct morphologies in the two brain regions. The ADRA1A/GABRQ double-positive neurons in layer V of the sgACC exhibited classic spindle-shaped morphology that typifies VENs, whereas the ADRA1A-positive neurons in the DLPFC exhibited the more generalized pyramidal morphology (Fig. 4B; Extended Data Fig. 4-1).
Putative VEN subcluster Ex25. A, Ex25 gene expression of VEN markers in the DLPFC and the sgACC. B, RNAscope histology images comparing VENs in human DLPFC and sgACC sections. ADRA1A and GABRQ transcripts were detected with chromogenic RNAscope probes, red and green, respectively. ADRA1A+/GABRQ+ neurons were found in layer V in both the sgACC and DLPFC. Note the different morphology of the cell bodies in DLPFC and the apparent reduction in expression of GABRQ. For additional images, see Extended Data Figure 4-1. C, Enrichment analysis of genes with greater expression in the sgACC within Ex25 neuronal subpopulation, using GO (left), KEGG (middle), and DisGeNET database of gene-disease associations (right). Point size represents the number of genes that are differentially expressed and within a particular enrichment category (e.g., a GO term).
Figure 4-1
RNAscope histology images comparing VENs in human DLPFC and sgACC sections in various subjects. ADRA1A and GABRQ transcripts were detected with chromogenic RNAscope probes, red and green, respectively. Download Figure 4-1, EPS file.
Cellular enrichment of psychiatric disorder transcriptomic and GWAS signals
Last, we examined the enrichment of bulk-tissue RNA-seq studies of psychiatric disorders, SCZ, BD, and ASD (Fromer et al., 2016; Gandal et al., 2018a; Jaffe et al., 2018), in our transcriptomic profiles of neuronal subpopulations. To this end, we performed hypergeometric tests on data from cell type-specific enrichment of DEGs obtained from the largest postmortem RNA-expression meta-analysis to date (Gandal et al., 2018b). In parallel, we used LDSC on GWAS signals derived from large case-control studies (Schizophrenia Working Group of the Psychiatric Genomics Consortium, 2014; Wray et al., 2018; Demontis et al., 2019; Grove et al., 2019; Stahl et al., 2019) to examine the enrichment of heritable loci across neuronal subclusters for the same three neuropsychiatric disorders and two more: ADHD and MDD.
At the broad cell cluster level, glial cells exhibited the greatest enrichment for transcriptional changes across psychiatric disorders. Astrocytes were particularly enriched for DEGs upregulated in SCZ and ASD. Microglia and endothelial cells were highly enriched for DEGs downregulated in SCZ and BD. Neurons were enriched for DEGs downregulated in ASD and upregulated in BD but showed no enrichment in the other disorders (Fig. 5).
Cellular enrichment of psychiatric disorders. Blue heatmaps represent transcriptomic cell type-specific enrichment of upregulated and downregulated DEGs for different psychiatric disorders (SCZ, BD, ASD). Red heatmaps represent genetic cell type-specific enrichment of GWAS loci for different psychiatric disorders (SCZ, BD, ASD, ADHD, MDD). Rows represent HBCC cell types. Each cell type is divided by brain region. Scales represent –log(FDR p value) of enrichment. Values that are statistically significant are shown and starred. Upregulated and downregulated DEGs were obtained from Gandal et al. (2018b). GWAS signals were obtained from the following studies: Schizophrenia Working Group of the Psychiatric Genomics Consortium (2014); Wray et al. (2018); Demontis et al. (2019); Grove et al. (2019); and Stahl et al. (2019).
Additional DEG enrichment signals emerged when assessing neurons at the resolution of subclusters. Several Ex subclusters, primarily in the DLPFC, were associated with altered expression in ASD. Gene expression dysregulation in BD was associated with two In (In01 Inh L1-6 VIP VIP, In19 Inh L4-6 SST RCAN3) and two Ex subclusters (Ex10 Exc IT L2-3 LINC00507 SEMA3C, Ex03 Exc IT L4-5 RORB ANXA1). Eighteen In and Ex subclusters were enriched for transcriptional changes in SCZ in either or both the DLPFC and sgACC (Fig. 5).
GWAS signals at the broad cell class level were generally enriched in neurons and OPCs, especially for BD and MDD, with no significant enrichment in astrocytes, oligodendrocytes, microglia, or endothelial cells across disorders. For SCZ, a nominal association for GWAS signals with broad neuronal classes was found; however, this did not survive FDR correction. Additional GWAS signal enrichment for specific neurons was found at the subcluster classification level, especially for SCZ, where enrichment of SCZ genetic signals was identified in four Ex subclusters (Ex15 Exc IT L2-3 LINC00507 PDGFD, Ex7 Exc IT L6 THEMIS TMEM233, Ex14 Exc IT L6 THEMIS THEMIS, and Ex20 Exc IT L3 LINC00507 PTGER3). Intriguingly, two of these Ex subclusters also featured transcriptional alteration in SCZ: Ex20 in the sgACC and Ex14 in the DLPFC. While no GWAS signals mapped to Ex25 Exc ET L5 FEZF2 GABRQ, this subcluster was enriched for DEGs upregulated in SCZ only in the sgACC. Only one In neuronal subcluster (In07 Inh L2-4 SST AHR) exhibited GWAS enrichment in BD, and no neuronal enrichment was found for ADHD and MDD (Fig. 5).
Discussion
Neuronal diversity in the DLPFC and sgACC
In this study, we sought to characterize the cellular composition of the DLPFC and sgACC, two brain regions implicated in various psychiatric disorders (Drevets et al., 2008; Guillozet-Bongaarts et al., 2014). Through snRNA-seq of these brain regions from 8 individuals with no history of mental illness, we were able to profile 63,901 nuclei, ultimately identifying 42 transcriptionally distinct neuronal populations using an unbiased, data-driven approach. The transcriptional identities of these subclusters proved to be quite robust. Most clusters corresponded tightly to those defined by the Allen Institute Multiple Cortical Areas SMART-seq dataset, which were achieved with layer-specific dissections and much deeper sequencing.
GABAergic interneurons have been classically defined by their derivation from either the MGE or CGE, their expression of the canonical markers SST, PVALB, VIP, or LAMP5, and morphology. While the molecular definition of these cells remains unclear, a few populations could be clearly defined: In21 Inh L3-6 PVALB COL15A1 are presumably Chandelier cells because of expression of TRPS1 and PTHLH and integration with Allen Inh L1-3 PVALB MFI2 (Fig. 2F) (Marin and Muller, 2014); and In02 (Inh L1-4 LAMP5 KIT) are likely Rosehip neurons, with the molecular profile GAD1+/CCK+/CNR1–/SST–/CALB2–/PVALB– (Boldog et al., 2018). We also identified a rare LAMP5 subtype (In06 Inh L1-6 LAMP5 CA13) with characteristics of both MGE and CGE derivation (Extended Data Fig. 2-2H). A similar population was identified in mouse (Tasic et al., 2018) and human (Luo et al., 2022), and a recent study using integrative cross-species alignment and ISH in mouse and marmoset hypothesizes that these LAMP5+/LHX6+ cells may be MGE-derived (Krienen et al., 2020). The remaining populations of SST, VIP, PVALB, and LAMP5 cells do not map unequivocally to specific morphologic or neurophysiological types, although recent works attempt to bridge this gap (Gouwens et al., 2019; Scala et al., 2021). An approach combining immunohistochemical markers, morphology, and RNA probes could help map molecular definitions to morphologic/functional GABAergic classes.
Unlike interneurons, Ex subclusters exhibited considerable diversity across cortical regions. Diversity in cell numbers was greatest in supragranular layers (Fig. 3C), while diversity in expression was distributed across different layers (Fig. 3D). The correspondence of our Ex subclusters with those from the Allen Institute was less precise for superficial layer classes than deeper-layer Ex subclasses. Three distinct cell types identified in our study (Ex10, Ex15, and Ex01) mapped to Exc L2-3 LINC00507 RPL9P17 in the Allen dataset, and Ex20 and Ex18 both mapped to Exc L3 LINC00507 PSRC1. This may reflect evolutionary expansion of supragranular layers in human (DeFelipe, 2011), resulting in regional specialization of cells within these superficial layers. Indeed, several superficial and middle-layer Ex subclasses tend to be enriched in one or few regions, whereas deeper-layer subclasses appear to be present across brain regions (Extended Data Fig. 3-1B,C). Interestingly, the Allen cell classes derived primarily from the ACC (Exc L3-5 RORB HSBP3, Exc L3-5 RORB CD24, Exc L3-5 LINC00507 SLN, and Exc L3-5 LINC00507 PSRC1) were found enriched in the sgACC, which is a contiguous brain region. Supragranular layers consist mostly of corticocortical projection neurons (Krienen et al., 2016), which are postulated to be more varied in higher-order cortical regions to support specialized local information processing. Deeper-layer projection neurons responsible for communicating with other cortical and extracortical regions exhibited less regional variation in cell proportions.
These differences in cell types are also reflected in the cytoarchitectonics of the two regions, the major difference being the relative absence of granular layer 4 in the sgACC (Rajkowska and Goldman-Rakic, 1995; Palomero-Gallagher et al., 2008). Overall, layer 4 RORB-expressing Ex neurons were 2.6-fold overrepresented in DLPFC versus sgACC. Specifically, three abundant L4-enriched Ex subclasses (Ex01 IT L2-4 RORB CARTPT, Ex02 IT L3-4 RORB PRSS12, Ex03 IT L4-5 RORB ANXA1 in Fig. 3C) were significantly overrepresented in DLPFC. Only the rarer Ex16 IT L4-5 RORB RSPO3 was overrepresented in sgACC (Fig. 3B; Extended Data Fig. 2-1).
We also identified a rare, well-defined cell population, the VEN (Ex25 Exc ET L5 FEFZ2 GABRQ). VENs are extratelencephalic projecting excitatory neurons with morphologically characteristic spindle-shaped cell bodies primarily found in the fronto-insular, anterior cingulate, and medial fronto-polar cortices (González-Acosta et al., 2018; Banovac et al., 2021). In our dataset, Ex25 had the greatest number of intercortical DEGs and indeed expressed canonical VEN markers, such as ADRA1A and ADAMTS2. However, Ex25 in the DLPFC did not express GABRQ. Moreover, we noted that, in the DLPFC, these ADRA1A-positive cells exhibited pyramidal morphology as opposed to having a spindled or forked morphology. Hodge et al. (2020) similarly found that this population also mapped to both pyramidal and forked cell morphologies, and others have suggested the existence of VENs in the DLPFC (BA9) (Fajardo et al., 2008; González-Acosta et al., 2018). The fact that these populations of neurons consistently cocluster across studies and datasets suggests that L5 ET cells play similar functional roles across the two brain regions. Like Ex25, other intratelencephalic projecting cell types (Ex10 in L2-3 or Ex06 in L3-5) also exhibit marked intercortical differential gene expression. The sgACC has reciprocal connections with medial and orbital frontal cortical areas, entorhinal cortex and amygdala (Calderazzo et al., 2021; McHale et al., 2022), whereas the DLPFC connects to lateral frontal, parietal, insular, and superior temporal regions (Yeterian et al., 2012); it is possible that the large number of intercortical DEGs present in these long-range projection neurons reflect the underlying differences in connectivity.
Enrichment of psychiatric genetic signals and transcriptional changes in specific neuronal populations
One key advantage of cell-type classification is the ability to interrogate the contribution of individual cell populations to the development and expression of psychiatric disorders. To this end, we examined the cell-specific enrichment for both heritability signals and alterations of gene expression. Our findings suggest: (1) that the genetic determinants of psychiatric disorders act primarily on neurons; (2) that the gene expression changes detected in bulk-RNA-seq studies are dominated by the response of glial cells to the changes in neurons or factors arising after illness onset; and (3) that gene expression changes in neuronal subpopulations are masked in bulk-RNA-seq studies.
Neurons exhibited enrichment for GWAS signals for BD and MDD at the broad class level; however, enrichment of specific neuronal subclusters was not evident for these disorders. However, neuronal subclassification revealed surprisingly strong enrichment for SCZ GWAS signals in four Ex subclusters: Ex15 IT L2-3 LINC00507 PDGFD and Ex14 IT L6 THEMIS THEMIS in the DLPFC; Ex20 IT L3 LINC00507 PTGER3 in the sgACC; and Ex7 IT L6 THEMIS THEM233 in both brain regions. No cell type-specific enrichment was observed for GWAS variants associated with ASD, possibly reflecting (1) the smaller size of ASD GWAS studies (Grove et al., 2019) or (2) that rare de novo variants might contribute disproportionately to the genetics of ASD (Iossifov et al., 2014; Sestan and State, 2018; T. Wang et al., 2020). Others have found enrichment of GWAS signal for SCZ and BD in several excitatory and inhibitory subclasses (Skene et al., 2018; Tran et al., 2021) and for ASD in one inhibitory subtype, OPCs, oligodendrocytes, and astrocytes (Tran et al., 2021). Skene et al. (2018) associated GWAS signals for both SCZ and BD to many of the same inhibitory and excitatory cell types. Our study includes a larger number of nuclei (63,901 vs ∼14,000) and finer resolution clustering for Ex and In subtypes, enabling resolution of differences in the neuronal association of SCZ and BD GWAS signals. Additional key differences include our use of human single nuclei versus mouse single cell data; our focus on two prefrontal cortical brain regions versus the inclusion of striatum, somatosensory, and hippocampus, and differences in the multiple comparison correction procedures used. Regarding Tran et al. (2021), differing results may be explained by: (1) our use of LDSC versus MAGMA to determine relevant GWAS signals; (2) the resolution of the cell types resolved; and (3) the threshold for cell type gene specificity selected for analysis. Nevertheless, the convergent finding is that the heritable factors contributing to the development of a psychiatric disorder act primarily through neurons.
Differential gene expression paints a different picture. DEGs for SCZ, BD, and ASD were enriched primarily in glia, microglia, and endothelial cells with less significant enrichment scattered throughout several In and Ex subclusters. For BD, DEGs derived from studies of the DLPFC (Gandal et al., 2018b) were minimally enriched in neuronal subclusters. However, DEGs identified in a separate study of sgACC (Akula et al., 2021) were not enriched in any cell type, most likely because this study contained only 39 cases with BD (data not shown). Analysis of a more recent RNA-seq study of 138 cases with BD in the sgACC (Zandi et al., 2022) confirmed the findings based on transcriptomics in the DLPFC of BD (Gandal et al., 2018b) (data not shown). With respect to ASD, DEG enrichment primarily localized to upper- and middle-layer Ex subclusters in the DLPFC, corroborating findings by others that upper-layer excitatory neurons (Velmeshev et al., 2019) and the PFC is most affected in ASD (Courchesne et al., 2011; Parikshak et al., 2016; Brumback et al., 2018; Schwede et al., 2018). Two PVALB In subclusters, In04 and In05, were enriched for genes downregulated in ASD. This result aligns with the finding of decreased numbers of PVALB+ interneurons in the PFC of ASD brains (Hashemi et al., 2017).
For SCZ, astrocytes exhibited the strongest enrichment for genes with increased expression in disease and a few Ex subclusters exhibited enrichment for genes with either increased expression in superficial layer subclusters (Ex20 IT L3 LINC00507 PTGER3, Ex18 IT L3 LINC00507 PSRC1, and Ex06 IT L3 − 5 RORB IZUMO3) or reduced expression in middle- or lower-layer subclusters (Ex08 IT L4 − 5 RORB RPRM, Ex09 IT L4 − 5 RORB TDO2, Ex14 IT L6 THEMIS THEMIS, and Ex22 L6b FEZF2 CTGF). Recent snRNA-seq studies of SCZ also found enrichment of DEGs in similar excitatory cell classes (Ruzicka et al., 2020; Reiner et al., 2021; Batiuk et al., 2022). We noted that Ex25 L5-ET L5 FEZF2 GABRQ (VENs) in the sgACC, but not DLPFC, exhibited modest enrichment for genes upregulated in SCZ and ASD, in line with other studies that suggest VENs may be selectively vulnerable in SCZ, ASD, and frontotemporal lobar dementia (Uppal et al., 2014; Krause et al., 2017; Y. Yang et al., 2017; Hodge et al., 2020). Regarding inhibitory cells, VIP In subclusters were enriched for genes with increased expression in SCZ; and while some enrichment was found for SST and LAMP5 In subclusters, no enrichment was noted for any PVALB In subclusters. This finding stands in contrast to findings of presynaptic and postsynaptic abnormalities in PVALB cells (for review, see Lewis et al., 2012), possibly because some within-subcluster alterations may go undetected in homogenate RNA-seq studies, which are more likely to capture global changes across multiple cell classes or changes from a dominant cell population.
Only for SCZ did two cell types demonstrate enrichment for both GWASs and differential expression signals: Ex20 IT L3 LINC00507 PTGER3 and Ex14 IT L6 THEMIS THEMIS. This finding is tantalizing as it suggests a possible connection between heritable features and transcriptional phenotypes within these cell types. Given that more cell types did not exhibit convergent signals suggests that these excitatory neuronal cell types may exhibit the strongest dysregulation in SCZ and may be the primary driving defect underlying the disorder. In this hypothetical framework, the transcriptional changes identified in glia and In subtypes are a compensatory/regulatory response to Ex cell dysfunction. Still, it is possible that (1) some transcriptional changes in other individual neuronal subclusters are obscured in homogenate RNA-seq studies; and that (2) GWAS signals that act during development may not be expressed in the appropriate cell type in adulthood and remain undetected.
Our study was limited by: (1) small sample size of only male individuals; (2) technical differences between the 10× Genomics and SMART-seq platforms resulting in reduced detection of low-expressed genes; and (3) nuclei isolation biased toward collection of neuronal nuclei over glial nuclei, making detection of rare cell types difficult. We were unable to identify the Allen Brain SST CHODL cluster (Inh L6 SST NPY), which is a rare SST interneuron with long-range projections. Additionally, our requirement that each subcluster be present in all or almost all the subjects, while conservative, also increases the likelihood of missing rare cell types. While it is possible that clusters might represent distinct transcriptional states of a similar cell subtype (Kharchenko, 2021), the correspondence of the cell types identified here to those from the Allen Institute supports the validity of our clustering methods.
In conclusion, we analyzed snRNA-sequencing data from the DLPFC and sgACC, identified 42 robust neuronal subclasses, elucidated cellular and molecular differences between the regions, and mapped genetic risk and expression changes associated with psychiatric disorders onto specific cell types. These findings reveal new insights into the functional differences between the sgACC and DLPFC and identify hypothetical mechanisms by which these functions might be altered in SCZ, MDD, BP, and ASD.
Footnotes
- Received April 29, 2022.
- Revision received February 27, 2023.
- Accepted March 15, 2023.
This work was supported by the Intramural Research Program of the National Institute of Mental Health ZICMH002903-15. This work utilized the computational resources of the National Institutes of Health HPC Biowulf cluster (http://hpc.nih.gov). We thank the families of the deceased for tissue donations; the Offices of the Medical Examiners of Washington, D.C. and Virginia for the extraction of the tissue; the Single Cell Analysis Facility, National Cancer Institute Center for Cancer Research and the Sequencing Facility, and National Cancer Institute Center for Cancer Research for support in the snRNA-seq protocol, which are funded through the Frederick National Laboratory for Cancer Research Contract 75N91019D00024; and Advanced Cell Diagnostics for support in the RNAscope protocol.
The authors declare no competing financial interests.
- Correspondence should be addressed to Stefano Marenco at marencos{at}mail.nih.gov or Pavan K. Auluck at pavan.auluck{at}nih.gov
- Copyright © 2023 the authors