Abstract
Systemic study of pathogenic pathways and interrelationships underlying genes associated with Alzheimer's disease (AD) facilitates the identification of new targets for effective treatments. Recently available large-scale multiomics datasets provide opportunities to use computational approaches for such studies. Here, we devised a novel disease gene identification (digID) computational framework that consists of a semi-supervised deep learning classifier to predict AD-associated genes and a protein–protein interaction (PPI) network-based analysis to prioritize the importance of these predicted genes in AD. digID predicted 1,529 AD-associated genes and revealed potentially new AD molecular mechanisms and therapeutic targets including GNAI1 and GNB1, two G-protein subunits that regulate cell signaling, and KNG1, an upstream modulator of CDC42 small G-protein signaling, a mediator of inflammation, and a candidate coregulator of amyloid precursor protein (APP). Analysis of mRNA expression validated their dysregulation in AD brains and further revealed significant spatial patterns across brain regions as well as among subregions of the frontal cortex and hippocampus. Super-resolution STochastic Optical Reconstruction Microscopy (STORM) further demonstrated their subcellular colocalization and molecular interactions with APP in a transgenic mouse model of both sexes with AD-like mutations. These studies support the predictions made by digID while highlighting the importance of concurrent biological validation of computationally identified gene clusters as potential new AD therapeutic targets.
Significance Statement
Powerful computational approaches such as machine learning (ML) can interrogate large-scale multiomics datasets to predict disease-associated genes in an unbiased, systemic manner. This study presents a new disease gene identification (digID) computational framework that uses a semi-supervised deep learning classifier. Using super-resolution imaging and the spatial biology paradigm, we further revealed that the G-protein signaling predicted by the ML model to be AD-related is subject to spatial expression dysregulation. Computational discoveries therefore require independent biological validation to yield medical insights, and our data highlight three novel G-protein signaling genes and their signaling networks as potential new AD therapeutic targets.
Introduction
Alzheimer's disease (AD) is the most common form of dementia and is a major leading cause of adult death in the United States, killing more people than breast cancer and prostate cancer combined (Matthews et al., 2019). Although extensive studies have been conducted to understand the root causes of AD, it currently has no effective treatments. The presence of amyloid beta (Aβ) plaques, neurofibrillary tangles (NFT), neuronal death and synaptic loss, blood clotting, and inflammation in the brain have all been described as pathological mechanisms of AD (Long and Holtzman, 2019; Leng and Edison, 2021). Current therapeutic efforts have largely focused on targeting and reducing the accumulation of Aβ and NFT but have yet to produce the desired clinical outcomes of halted or reversed AD progression, underscoring the need for novel research approaches (Breijyeh and Karaman, 2020).
Systemic study of AD pathways and their interactions has become possible because of new advancements in bioinformatics and the increasing abundance of multiomics data that can represent the molecular landscape of AD (Emilsson et al., 2008; Ideker and Sharan, 2008). With large-scale multiomics datasets from AD patient cohorts and various gene annotation databases, together with a few dozen high-confidence AD-associated genes that have been discovered mainly through genome-wide association studies (GWAS; Lambert et al., 2013; Jansen et al., 2019; Bellenguez et al., 2020), machine learning can be applied to identify undiscovered AD genes by searching for genes with biological fingerprints similar to those of the high-confidence AD genes.
Recently developed semi-supervised machine learning methods are uniquely positioned to serve as classifiers for such studies. Unsupervised machine learning has difficulty utilizing existing knowledge (e.g., known disease-associated genes), which often results in suboptimal prediction performance. In contrast, supervised binary classification requires large numbers of both positively and negatively labeled samples and cannot learn from unlabeled samples and is therefore also not ideal for disease gene discovery. On the other hand, semi-supervised machine learning methods such as baggingPU (ensemble-based; Wright, 2018) and ladder_net (deep learning-based; Gupta, 2019) can learn from both labeled and unlabeled samples to improve classification performance. Remarkably, semi-supervised deep learning approaches are able to learn the underlying relationships across a wide array of samples (Rasmus et al., 2015), enabling the identification of disease-associated genes from diverse large-scale datasets.
Additionally, various computational approaches have been utilized to prioritize candidate genes for more efficient drug target validation (Arabfard et al., 2019; Mukherjee et al., 2019). Among these computational approaches, protein–protein interaction (PPI) network-based ranking of the importance of disease genes has been demonstrated to be an effective method (Erten and Koyutürk, 2010).
In this study, a computational framework using semi-supervised deep learning to predict AD-associated genes followed by PPI network analysis to rank the predicted AD genes according to their importance in AD development is presented to systemically understand the molecular pathways underlying AD pathogenesis. This framework identified three novel AD-associated genes as potential new targets for AD treatment, which were further validated by biological analysis of spatial expression patterns in AD as well as their molecular interactions with APP in transgenic mice bearing AD-like mutations.
Materials and Methods
Compilation of multiomics data
The input dataset consists of both knowledge-based data and patient data. The knowledge-based datasets, which contain information on gene ontology, gene pathway annotation, and PPI, were downloaded from https://github.com/fabiofabris/Bioinfo2019 (Fabris et al., 2019). From this repository, we used “go_features.csv” (originates from NCBI website) for gene ontology features, “pathdipall_features.csv” for pathway annotation features, and “ppi_features.csv” for PPI features. These datasets were all binary matrices where each row was one gene and each column was a feature.
To construct a gene expression dataset, 133 datasets (microarray and NGS) were extracted from the NCBI GEO database (https://www.ncbi.nlm.nih.gov/gds/). These comprised 3,401 brain tissue samples from AD patients of both sexes defined by clinical diagnosis, 1,076 brain tissue samples and 2,320 nonbrain tissue samples from patients with other common age-related diseases (e.g., Parkinson's disease, atherosclerosis, and osteoporosis), and 3,684 brain tissue samples and 4,348 nonbrain tissue samples from healthy controls. The search query was (diseaseName[MeSH Terms] OR diseaseName[All Fields]) AND ("gene expression"[MeSH Terms] OR gene expression[All Fields]) AND ("hominidae"[MeSH Terms] OR "Homo"[Organism] OR homo[All Fields])) AND "Homo sapiens", where diseaseName was Alzheimer's, Parkinsonism, Atherosclerosis, Stroke, Osteoarthritis, Osteoporosis, Chronic obstructive pulmonary disease, or Type 2 diabetes. For microarray datasets, the versions with normalized data were used. Sequencing read counts from NGS datasets were transformed into TPM (transcripts per million). The transformed NGS and normalized microarray data were then centered and scaled by gene to the range from 0 to 1.
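The read count transformation and per-gene scaling described above can be sketched as follows. This is a minimal illustration and not the authors' pipeline: it assumes transcript lengths in kilobases are available for each gene and that the 0 to 1 scaling is a min-max rescaling across samples. The gene names, lengths, and the helper names `counts_to_tpm` and `min_max_scale` are hypothetical.

```python
def counts_to_tpm(counts, lengths_kb):
    """Convert raw read counts for one sample into TPM.

    counts     -- dict gene -> read count
    lengths_kb -- dict gene -> transcript length in kilobases (assumed known)
    """
    # Reads per kilobase: normalize each count by transcript length.
    rpk = {g: counts[g] / lengths_kb[g] for g in counts}
    # Rescale so that the values of one sample sum to one million.
    total = sum(rpk.values())
    return {g: 1e6 * v / total for g, v in rpk.items()}


def min_max_scale(values):
    """Rescale one gene's values across samples to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant gene: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Toy example with hypothetical genes and transcript lengths.
sample = {"GENE_A": 100, "GENE_B": 300, "GENE_C": 600}
lengths = {"GENE_A": 1.0, "GENE_B": 3.0, "GENE_C": 2.0}
tpm = counts_to_tpm(sample, lengths)

# One gene measured in three samples, scaled to [0, 1].
scaled = min_max_scale([2.0, 4.0, 6.0])
```

Because TPM values sum to one million within each sample, they are comparable across samples before the per-gene scaling is applied.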
The Gene Ontology, pathway annotation, PPI, and gene expression datasets were then merged by gene, and the genes with more than a quarter of null values were dropped. Below are the 133 GEO datasets used for this study: GSE23290, GSE24378, GSE20141, GSE93885, GSE110226, GSE4757, GSE45596, GSE61196, GSE7621, GSE104704, GSE161355, GSE54282, GSE110298, GSE43490, GSE53890, GSE8397, GSE29378, GSE109887, GSE36980, GSE28894, GSE122063, GSE5281, GSE132903, GSE48350, GSE15222, GSE92538, GSE104687, GSE118553, GSE71620, GSE131617, GSE33000, GSE44772, GSE84422, GSE104674, GSE156508, GSE100786, GSE107037, GSE109048, GSE110008, GSE1145, GSE117525, GSE11784, GSE117999, GSE120774, GSE125771, GSE130928, GSE13496, GSE13850, GSE13896, GSE152326, GSE165121, GSE169077, GSE18608, GSE18876, GSE20146, GSE20257, GSE22148, GSE22253, GSE24425, GSE27597, GSE30063, GSE35959, GSE36700, GSE37768, GSE39540, GSE41036, GSE43191, GSE46750, GSE48556, GSE51588, GSE55235, GSE56342, GSE58294, GSE64554, GSE64614, GSE64998, GSE66360, GSE66635, GSE7158, GSE73089, GSE73655, GSE75181, GSE76925, GSE77344, GSE77962, GSE83500, GSE94499, GSE98460, GSE98918, GSE99039, GSE100905, GSE79666, GSE110731, GSE159699, GSE144254, GSE148822, GSE64810, GSE68719, GSE114517, GSE125050, GSE174367, GSE125583, GSE102485, GSE86468, GSE94736, GSE103174, GSE107894, GSE111120, GSE113957, GSE114007, GSE115348, GSE122709, GSE128177, GSE129042, GSE131681, GSE132831, GSE133099, GSE135743, GSE135902, GSE139073, GSE141432, GSE145284, GSE145746, GSE158312, GSE164416, GSE164471, GSE47460, GSE51799, GSE72815, GSE75337, GSE81965, GSE88888, and GSE92724.
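The merging and filtering step can be illustrated with a short sketch. It assumes an outer join on gene symbol (the join type is not stated in the text) and that feature names are unique across tables; the function names and toy tables are hypothetical.

```python
def merge_by_gene(*tables):
    """Outer-join feature tables keyed by gene symbol.

    Each table is a dict: gene -> dict of feature name -> value.
    Features missing for a gene are filled with None (null).
    """
    all_features = []
    for table in tables:
        names = sorted({name for row in table.values() for name in row})
        all_features.extend(names)
    all_genes = sorted({gene for table in tables for gene in table})

    merged = {}
    for gene in all_genes:
        row = {}
        for table in tables:
            row.update(table.get(gene, {}))
        merged[gene] = {name: row.get(name) for name in all_features}
    return merged


def drop_sparse_genes(merged, max_null_fraction=0.25):
    """Drop genes with more than max_null_fraction null features."""
    kept = {}
    for gene, row in merged.items():
        null_fraction = sum(v is None for v in row.values()) / len(row)
        if null_fraction <= max_null_fraction:
            kept[gene] = row
    return kept


# Toy tables standing in for the GO, PPI, and expression feature sets.
go = {"G1": {"go_1": 1, "go_2": 0}, "G2": {"go_1": 0, "go_2": 1}}
ppi = {"G1": {"ppi_1": 1}, "G3": {"ppi_1": 0}}
expr = {"G1": {"expr_1": 0.2}, "G2": {"expr_1": 0.9}}

merged = merge_by_gene(go, ppi, expr)
filtered = drop_sparse_genes(merged)
```

In this toy case G2 is missing exactly one of four features (a quarter) and is kept, whereas G3 is missing three of four and is dropped.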
Curated high confidence known AD-associated and non-AD-associated genes for class labels
The 43 high-confidence known AD-associated genes used as the positive samples in the input data were curated from a large-scale GWAS (Lambert et al., 2013), an AD-associated variants study (Del-Aguila et al., 2015), and the Human Phenotype Ontology (HPO; Köhler et al., 2021). For our known AD-associated gene set, we included all genes defined as AD-associated genes by HPO or genes found to be significantly associated with AD (p < 10⁻⁶) in any of these GWAS/genomic variants studies (Lambert et al., 2013; Del-Aguila et al., 2015). The 43 curated known AD-associated genes used for this study are A2M, ABCA7, ADAM10, AKAP9, APOE, APP, BIN1, CACNA1G, CASS4, CD2AP, CD33, CELF1, CLU, CR1, DSG2, EPHA1, FERMT2, GATA1, HFE, HLA-DRB1, HLA-DRB5, INPP5D, MAPT, MEF2C, MPO, MS4A6A, MS4A4A, NME8, NOS3, PICALM, PLAU, PLCG2, PLD3, PSEN1, PSEN2, PTK2B, RIN3, SLC24A4, SORL1, SRCAP, TREM2, UNC5C, and ZCWPW1. The 200 non-AD-associated genes were selected according to the criteria described by Huang et al. (2018).
The following are the non-AD-associated genes used for this study: BTN3A1, NPC2, AGR3, SLC19A3, CHPT1, LGI4, LAPTM4B, RIC8B, DAGLB, SLC25A13, RELL1, G6PD, SLC16A10, CCDC80, ROM1, TTC7A, PCTP, RAB37, CMTM6, SLC25A38, MARCO, DERA, SAMD9, RDH5, GXYLT2, BTN2A2, HEXB, TMEM59, FAR2, DDHD2, ROPN1L, PIGO, SMPDL3A, TEX101, SULT1C4, EVA1A, TRIM55, SLC26A1, MCEE, MPV17L2, CLEC2B, AGBL2, CEP120, PIGG, SCRG1, EMC1, SLC22A8, CERCAM, RASSF9, MPC1, KATNAL2, XIRP2, PACRG, SPIN1, ENTPD4, NEUROD4, CABYR, EMC3, GCNT2, PDXK, MEIG1, EFHC1, GGT7, SLC19A2, CAAP1, ALG13, TENM1, OGFOD2, TSNAXIP1, DPPA2, GALT, GLB1L3, VKORC1L1, NIPSNAP1, SLC25A12, ACOT13, COQ4, PHOSPHO2, RIPPLY2, REEP4, XYLB, ACOT7, CMBL, MMD, UBL7, NXNL1, MGAT1, MTHFSD, SLC26A2, PSMG3, KIAA1549, UGT8, LENG8, SLC35A3, NAT2, CCDC146, IL17B, SEMG1, PIGN, COMMD9, IL36G, MTO1, PHLDA3, MGAT5B, STARD10, CD1E, NAT1, SULT1C2, MTHFS, MPC2, TMEM45B, TRMT1, ANXA9, SDR9C7, MGAT4B, SGSH, METTL25, PIGX, SLC5A6, PIGM, ACYP2, CEPT1, SLC51A, WDR45, ATRNL1, NAT8L, ESYT1, TM4SF4, B3GALNT2, MMACHC, ACYP1, SULT1A1, REEP3, SULT1A2, UGGT2, EMC7, GLB1L, TPD52L3, ALG14, PEX26, UGGT1, TRMT10A, NT5DC1, NAIF1, DRP2, MMADHC, ACAD10, TMEM160, BFSP2, XPO7, CDKL2, ELOVL4, GYPB, SLC22A1, CDS2, SSX1, SEPHS2, SEPHS1, CDS1, MMGT1, OXCT2, PNMA2, THNSL2, B3GALT1, PNPO, RCN3, TPD52L2, EMC9, DHDDS, NUDT9, GDAP1, ENTPD7, COQ3, LXN, FUCA1, FUCA2, SLC16A2, GLOD4, SRM, B3GALT4, BTN3A3, B3GALT2, RNASET2, PEX16, CNPY2, ANKMY2, EXTL2, ALAD, ZMPSTE24, GNPTAB, ETFDH, DCTPP1, LAPTM4A, BTN1A1, GPD2, EMC2, DAGLA, CLN5, DGKB, and DGKQ.
Ladder_net for AD-associated gene prediction
ladder_net (Gupta, 2019) is a recent Keras implementation of a semi-supervised deep learning algorithm called Ladder Networks (Rasmus et al., 2015). A Ladder Network consists of an encoder neural network, which trains on labeled data and is used for the final predictions, and a decoder neural network, which learns from unlabeled data to decrease the noise between the layers of the encoder. Ladder Networks have been shown in related work to perform well on semi-supervised learning problems. Our experimental results also show that ladder_net can produce reasonably good results (AUC = 0.8819 ± 0.0460, accuracy = 0.8358 ± 0.0459, specificity = 0.8500 ± 0.0559, sensitivity = 0.7471 ± 0.0511 [mean ± SD]) even when 5-fold cross-validation (80–20% train–test split) is performed on only the labeled data, which are a very small subset of the whole dataset. We manually tuned the hyperparameters of ladder_net. The layer_sizes parameter was set to [input_size, 1,000, 500, 250, 250, 250, n_classes], where input_size is the number of features in the data input (30,419 features) and n_classes is the number of output classes (2 classes). The epoch parameter was set to 45 based on the shape of the loss curve. All of the curated AD-associated genes were labeled as positive, and the 200 non-AD genes (selected according to the method of Huang et al., 2018) were labeled as negative. Default values were used for the other parameters. Five-fold cross-validation was used to evaluate the prediction performance of ladder_net on the compiled multiomics dataset. After evaluation, the probability of each gene being associated with AD was predicted by the trained ladder_net.
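The layer configuration and the 5-fold evaluation scheme can be sketched as below. The sketch does not call ladder_net itself. It only writes out the reported layer sizes and shows one way to build five folds over the 243 labeled genes (43 positive, 200 negative). Stratifying the folds by class is an assumption, since the text does not say how folds were drawn, and `stratified_folds` is a hypothetical helper.

```python
import random

# Layer sizes as reported in the text: 30,419 input features, 2 output classes.
input_size, n_classes = 30419, 2
layer_sizes = [input_size, 1000, 500, 250, 250, 250, n_classes]


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds while preserving the class ratio.

    labels -- list of 0/1 class labels for the labeled genes only.
    Returns k lists of indices; each fold serves once as the test set
    (about 20% of the data) while the remaining four folds train.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal indices round-robin so each fold gets a near-equal share.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


# 43 positive and 200 negative labeled genes, as in the text.
labels = [1] * 43 + [0] * 200
folds = stratified_folds(labels)

for test_fold in folds:
    held_out = set(test_fold)
    train_idx = [i for i in range(len(labels)) if i not in held_out]
    # A classifier would be trained on train_idx and scored on test_fold here.
```

With only 43 positives, each test fold holds 8 or 9 known AD genes, which is one reason the per-fold metrics are reported with their standard deviations.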
Construction of PPI networks and calculation of centrality scores
The PPI data file used for network analysis was downloaded from the STRING database (https://version-10-5.string-db.org/cgi/download.pl?species_text=Homo+sapiens; Szklarczyk et al., 2021). STRING PPIs with a confidence score of ≥900 were selected and then mapped to the predicted AD genes to construct the network of AD-associated genes. Finally, redundant gene–gene interactions were excluded from the network. The R package igraph (Csardi and Nepusz, 2006) was used to transform the AD gene network into an igraph object. The Latora closeness centrality function from the CINNA R package (Ashtiani et al., 2019) was used to calculate the centrality scores of these AD-associated genes using the igraph object as input.
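The network construction and ranking steps can be sketched in a few lines. The analysis above was done in R with igraph and CINNA, so the Python below is only an illustration. It assumes that an interaction is retained when both partners are in the gene set, and that Latora closeness is the unnormalized sum of reciprocal shortest-path distances, which stays well defined on disconnected networks. The toy interactions and function names are hypothetical.

```python
from collections import deque


def build_network(ppis, genes, min_score=900):
    """Build an undirected adjacency map from (gene_a, gene_b, score) rows.

    Keeps interactions scoring at least min_score whose partners are both
    in `genes`. Storing neighbors in sets removes redundant entries such
    as A-B listed again as B-A.
    """
    adj = {}
    for a, b, score in ppis:
        if score >= min_score and a in genes and b in genes and a != b:
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
    return adj


def latora_closeness(adj, source):
    """Sum of reciprocal shortest-path distances from source to all others."""
    dist = {source: 0}
    queue = deque([source])
    while queue:  # breadth-first search over unweighted edges
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    # Unreachable genes contribute nothing to the sum.
    return sum(1.0 / d for node, d in dist.items() if node != source)


# Toy interactions: the network reduces to the chain A-B-C-D.
ppis = [("A", "B", 950), ("B", "A", 950), ("B", "C", 990),
        ("C", "D", 920), ("A", "D", 400), ("A", "X", 999)]
genes = {"A", "B", "C", "D"}

adj = build_network(ppis, genes)
scores = {g: latora_closeness(adj, g) for g in adj}
# Rank by descending centrality, breaking ties alphabetically.
ranked = sorted(scores, key=lambda g: (-scores[g], g))
```

In this chain the two inner genes score highest, mirroring how hub-like genes rise to the top of the ranking.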
Spatial analysis of mRNA expression
The mRNA data obtained from GEO datasets GSE28146, GSE36980, GSE33000, GSE118553, and GSE138260 were extracted to analyze the spatial patterns of mRNA expression in healthy controls and AD patients. The human cortex data were subdivided into whole cortex and prefrontal cortex, whereas the human hippocampal data were subdivided into whole hippocampus and the CA1 region. The frontal cortex was selected according to MRC-LBB brain region selection criteria and validated by hallmark AD pathology, whereas for the prefrontal cortex, DLPFC (BA9) brain tissues of AD patients and nondementia (ND) control samples were obtained from the Harvard Brain Tissue Resource Center (HBTRC). The comparisons were made on the three G-protein signaling genes (GNAI1, GNB1, and KNG1) predicted by digID to be relevant in AD as well as the genes associated with KNG1 signaling (ITSN1, ITSN2, and CDC42). Statistical analyses were performed as described under Experimental design and statistical analyses. A p value < 0.05 was considered statistically significant.
Co-immunoprecipitation of proteins and Western blot
PC12 cells were cultured in DMEM supplemented with horse serum and fetal bovine serum as described previously (Lu et al., 2002). PC12 cells were treated with vehicle, bradykinin (BK), or nerve growth factor (NGF) for 20 min. They were then rinsed with ice-cold phosphate-buffered saline (PBS) before adding Pierce RIPA Buffer (Thermo Fisher Scientific, 89900) with Protease Inhibitor Cocktail Tablets (Roche, 11836153001) and Phosphatase Inhibitor Cocktail (Sigma-Aldrich, P0044). Cells were then scraped off the culture plate, passed through a 21-gauge needle, and centrifuged at 13,200 rpm for 30 min at 4°C, and the supernatant was collected. Protein concentration was measured using the BCA protein assay. A total of 500–1,000 µg of protein was immunoprecipitated using rabbit antibodies against GNAI1, GNB1, or KNG1. Following incubation with protein A/G beads to capture the protein co-precipitates, proteins were run on 8–16% Tris-Glycine gels (Invitrogen, XP08165BOX) and transferred to a nitrocellulose membrane (Pall, 66485). The membranes were blocked in 5% milk in TBST at room temperature on an orbital shaker. They were incubated with mouse anti-APP (22C11) overnight at 4°C and then washed three times with 5% milk in TBST. The membranes were incubated with secondary antibody at room temperature, washed, incubated in ECL (GE Healthcare) at room temperature, and imaged using an Odyssey FC (Li-Cor Biosciences) with Image Studio version 5.2. The cell lysates were also reverse co-immunoprecipitated with mouse anti-APP (22C11) and then Western blotted with rabbit anti-GNAI1, anti-GNB1, and anti-KNG1 antibodies, respectively.
Mouse cortical neuronal cultures and immunofluorescent light microscopy
Mouse embryonic cortical cultures were prepared essentially as described by Jones et al. (2004). Briefly, 17 d timed pregnant mice were euthanized, and the embryos were removed in accordance with the Institutional Guide for the Care and Use of Laboratory Animals. Cortices were collected, and cells were dissociated by trypsinization and plated onto poly-L-lysine-coated coverslips in DMEM supplemented with 10% FBS. After neurons adhered to the substrate, the medium was changed to Neurobasal supplemented with B-27. This culture scheme allowed us to maintain viable low-density neuronal cultures for several weeks. At 10–12 d in vitro, the neurons were fixed in 4% paraformaldehyde and treated with 0.2% Triton X-100 for 15 min. After being blocked with 10% BSA, the neurons were double labeled with antibodies against APP (Mouse, SIG-39220, Sigma) and GNAI1 (Rabbit, SAB2100936), GNB1 (Rabbit, SAB2701168), and KNG1 (Rabbit, 11926-AP). For conventional fluorescent light microscopy, FITC and Cy3 were used as secondary antibodies whereas for super-resolution imaging, Alexa Fluor 647 and Atto 488 were used. After PBS washes, the coverslips were mounted and analyzed under either a Zeiss M2 (Carl Zeiss) or Nikon Ti2-E inverted microscope with an L-APPS H-TIRF attachment and four-line LUN-F laser module. The conventional morphometric analyses were performed with the MetaMorph Imaging software system (Universal Imaging).
STORM super-resolution imaging and analysis
For STORM analysis, N-STORM calibrations (X, Y, and Z calibrations and chromatic alignments) were performed as stated in the NIS-Elements Advanced Research User's Guide and described in detail in Naser et al. (2022). The alignment of multicolor asymmetric point spread functions allows for Z calibration and color correction in the X, Y, and Z directions.
Image acquisition
Three sets of images were taken for each double staining combination in cells and mouse brain tissue. Raw images were collected and first analyzed for “blinks” which indicate epifluorescent switching; blink thresholds were standardized to ensure that each image was analyzed equally. Thresholding based on peak intensity height allowed the elimination of potential background staining, as well as reduction of false-positive signals. Processing of “blinks” generated a reconstructed STORM image from which further analysis was completed on neuronal images and the images taken from brain tissues of wild-type and 3xTg-AD (triple transgenic mice carrying Alzheimer-like mutations in presenilin 1 M146V, APPsw, and TauP301L) mice of both sexes.
Quantification of fluorescence colocalization
From three areas of 16 µm², the discrete fluorescent signals (puncta) were counted for GNAI1, GNB1, KNG1, and APP. Signals separated from each other by more than 20 nm were each counted as a single molecule. Two fluorescent signals were considered colocalized when the distance between them was shorter than 20 nm. The ratio of colocalization was defined as the number of colocalized fluorescent signals over the total number of fluorescent signals. The data are presented as mean ± SEM with p values assigned.
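The colocalization ratio defined above can be sketched as a nearest-neighbor test on localization coordinates. This is an illustration rather than the analysis software actually used: it assumes 2D puncta coordinates in nanometers and takes the total as the number of puncta of the protein of interest. The coordinates and the function name are hypothetical.

```python
import math


def colocalization_ratio(puncta, app_puncta, cutoff_nm=20.0):
    """Fraction of puncta lying closer than cutoff_nm to any APP punctum.

    puncta, app_puncta -- lists of (x, y) coordinates in nanometers.
    """
    if not puncta:
        return 0.0
    colocalized = sum(
        1 for p in puncta
        if any(math.dist(p, a) < cutoff_nm for a in app_puncta)
    )
    return colocalized / len(puncta)


# Hypothetical coordinates for one protein channel and the APP channel.
gnai1 = [(0.0, 0.0), (100.0, 100.0), (250.0, 40.0), (400.0, 400.0)]
app = [(10.0, 5.0), (260.0, 60.0), (900.0, 900.0)]

ratio = colocalization_ratio(gnai1, app)
```

Here only the first punctum lies within 20 nm of an APP signal (about 11.2 nm away); the third is about 22.4 nm away and therefore does not count.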
Experimental design and statistical analyses
Unless otherwise indicated, all statistics were performed in GraphPad Prism 9.5.1 (528) for Windows (GraphPad Software). Differences between two sample groups were analyzed using Student's t test. For comparison of the mRNA expression of different genes of interest between control and AD samples, a two-way ANOVA followed by post hoc Tukey's HSD analysis was used. Other statistical details are available in the figures or figure legends, where values are mean ± standard error. p values < 0.05 were considered significant.
Results
digID computational framework for the identification of candidate AD targets
To apply machine learning computational approaches to unbiasedly identify potential AD gene targets, we designed a novel disease gene identification (digID) computational framework that uses a semi-supervised deep learning classifier to predict AD-associated genes followed by PPI network-based analysis to prioritize the importance of these predicted genes in AD. As illustrated in Figure 1, the input of digID is a compiled diverse large-scale dataset with each gene labeled as positive (known AD-associated genes), negative (non-AD genes), or unlabeled (remaining genes). The predicted disease genes were then prioritized using PPI network-based ranking to identify potential therapeutic targets for AD treatment.
AD-associated genes predicted by semi-supervised deep learning model
The compiled large diverse dataset was used to validate our proposed method. Based on 5-fold cross-validation, digID using ladder_net demonstrated good prediction performance with AUC = 0.9317 ± 0.0363, accuracy = 0.9278 ± 0.0236, specificity = 0.9550 ± 0.0447, and sensitivity = 0.8185 ± 0.0598 (mean ± SD; Fig. 2A). Using the trained ladder_net model, the probability of each gene being associated with AD was predicted. The distribution of the probabilities shows a clear separation between known AD genes and non-AD genes (Fig. 2B). The curated known AD genes are enriched in the high probability end (>0.75), while the majority of known non-AD and unlabeled genes are at the low probability end, except for a small group of unlabeled genes, which are present at the high probability end (Fig. 2B). This distribution enabled the use of 0.75 as the confidence threshold for AD gene classification instead of the conventional 0.5 to better control the false-positive rate. In total, 1,529 unlabeled genes passed the threshold and were considered as potential AD genes. Only a few curated known AD genes (5 out of 43) have prediction probabilities lower than 0.75.
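The thresholding step can be illustrated with a short sketch that calls candidates among unlabeled genes and computes sensitivity and specificity on the labeled ones. The probabilities, gene names, and function names below are invented for illustration and are not the study's predictions.

```python
THRESHOLD = 0.75  # confidence threshold used in the text instead of 0.5


def call_candidates(probabilities, labels, threshold=THRESHOLD):
    """Return unlabeled genes whose predicted probability exceeds threshold.

    probabilities -- dict gene -> predicted probability of AD association
    labels        -- dict gene -> 1 (known AD) or 0 (non-AD); unlabeled absent
    """
    return sorted(
        g for g, p in probabilities.items()
        if g not in labels and p > threshold
    )


def sensitivity_specificity(probabilities, labels, threshold=THRESHOLD):
    """Sensitivity and specificity on the labeled genes at a threshold."""
    tp = sum(1 for g, y in labels.items() if y == 1 and probabilities[g] > threshold)
    fn = sum(1 for g, y in labels.items() if y == 1 and probabilities[g] <= threshold)
    tn = sum(1 for g, y in labels.items() if y == 0 and probabilities[g] <= threshold)
    fp = sum(1 for g, y in labels.items() if y == 0 and probabilities[g] > threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical predictions for labeled and unlabeled genes.
probabilities = {"KNOWN_1": 0.95, "KNOWN_2": 0.60, "NEG_1": 0.05,
                 "NEG_2": 0.80, "UNL_1": 0.90, "UNL_2": 0.30, "UNL_3": 0.76}
labels = {"KNOWN_1": 1, "KNOWN_2": 1, "NEG_1": 0, "NEG_2": 0}

candidates = call_candidates(probabilities, labels)
strict = sensitivity_specificity(probabilities, labels)
lenient = sensitivity_specificity(probabilities, labels, threshold=0.5)
```

Raising the threshold from 0.5 to 0.75 trades sensitivity for fewer false-positive calls; in the toy data one known gene drops below the cutoff, just as 5 of the 43 curated genes do in the study.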
Functional enrichment analysis of the predicted genes was performed using the Database for Annotation, Visualization and Integrated Discovery (DAVID) v6.8 (Huang da et al., 2009a, b). The result indicates that these candidate genes are enriched in a number of cellular signaling processes, such as Rap and Ras small GTPases, NF-κB, and some of the cancer signaling pathways. Strikingly, small G-protein signaling pathways including Rap1 and Ras signaling dominate in the predicted AD-associated genes (Fig. 2C).
Prioritization of AD-associated genes using PPI network-based ranking
In total, 599 out of the 1,567 AD-associated genes (1,529 predicted AD genes and 38 curated known AD genes with prediction probabilities >0.75) have at least one high-confidence PPI (STRING confidence score ≥ 900) in the STRING database. These 599 genes were then used to build the AD gene network using Cytoscape (Shannon et al., 2003) for further analysis (Fig. 3A). Again, small G-protein signaling was the most highly enriched category in the functional enrichment analysis of these 599 genes using DAVID (Fig. 3B). Latora closeness (Ashtiani et al., 2018) was used to calculate the centrality scores of the 599 AD genes included in the AD gene interaction network, and the centrality scores were used to rank these genes for their importance in AD: the higher the centrality score of a gene, the more important the gene is in the AD gene interaction network. The top 10 ranked AD-associated genes are listed in Table 1 and highlighted in Figure 3A.
One of the top ranked genes, APP (amyloid precursor protein), whose cleavage product Aβ is the protein basis of the hallmark amyloid plaques of AD pathology, is the target of the newly FDA-approved AD drugs Aduhelm and Leqembi. Six other top ranked genes, namely the SRC family kinases (SRC, FYN, LCK, and LYN), EGFR (epidermal growth factor receptor), and RAC1 (Rac family small GTPase 1 or Ras-related C3 botulinum toxin substrate 1), have been investigated as potential AD treatment targets in preclinical and clinical studies (Siddiqui et al., 2012; Um et al., 2012; Kim et al., 2013; Masters et al., 2015; Liu, 2017; Gwon et al., 2019; Kikuchi et al., 2020; Mansour et al., 2021; Portugal et al., 2022; Table 1). However, three of the top 10 ranked genes have barely been studied in the context of AD: GNAI1 (G-protein subunit alpha i1) and GNB1 (G-protein subunit beta 1), which encode two of the three G-protein complex subunits, and KNG1 (kininogen 1), which is involved in immune response (Wu, 2015). Interestingly, kininogen is the precursor of bradykinin (BK), which activates another small G-protein, Cdc42 GTPase, that plays a pivotal role in synaptic modulation and is implicated in neuropsychiatric and neurodegenerative diseases (Aguilar et al., 2017). Therefore, these three genes are potentially novel and important AD-associated genes.
In order to understand the interactions among the top 10 ranked AD genes, we built the PPI network of genes interacting with the top ranked AD genes using Cytoscape (Shannon et al., 2003; Fig. 3C). Figure 3C shows that the network consists of several connected subnetworks, in which the top ranked genes act as hubs/master regulators, suggesting their importance in this network of AD-associated genes. Additionally, Figure 3D depicts the functional enrichment of the genes involved in the subnetworks of APP-KNG1 and GNB1-GNAI1. Remarkably, KNG1 interacts with many genes that also interact with APP (Fig. 3C), including genes involved in blood clotting (e.g., SERPINE1), inflammatory response (e.g., CCR1, C3, and C5), and synaptic regulation (e.g., F2R, AGTR1, GNB1, and GNAI1). These three AD pathologies reportedly are associated with amyloid plaques (Strickland, 2018) and are consistent with the known physiological functions of KNG1 (Das et al., 2002; Langhauser et al., 2012; Wu, 2015). During the preparation of this manuscript, Zamolodchikov et al. reported that KNG1 and APP colocalize and co-accumulate in AD patient brains and that KNG1 is involved in the accumulation of Aβ plaques and their induced inflammation (Zamolodchikov et al., 2022). This observation together with our finding suggests that combination treatments that target both KNG1 and APP may be more effective than targeting APP alone.
Intriguingly, GNAI1 and GNB1 serve as the hubs of a subnetwork that interacts with two other subnetworks (with RAC1 and APP-KNG1 as the hubs) that are associated with the core AD pathologies of neurodegeneration, amyloid accumulation, blood clotting, and inflammation (Masters et al., 2015; Kikuchi et al., 2020). Although it has been reported that RAC1 and APP can regulate each other (Désiré et al., 2005), this observation suggests for the first time that the interaction between RAC1 and APP may be mediated by GNAI1 and GNB1 (Fig. 3C). Additionally, genetic studies showed that GNB1 knock-out causes neural tube defects and impaired neural progenitor cell proliferation in mouse brains (Okae and Iwakura, 2010). Moreover, variants of GNAI1 and GNB1 are associated with severe neurodevelopmental disorders in humans (Petrovski et al., 2016; Schultz-Rogers et al., 2020; Muir et al., 2021).
Spatial mRNA expression of digID identified AD-associated genes in human whole hippocampus vs CA1 subregion
To further assess the potential links of digID-identified G-protein signaling to AD, we performed a spatial analysis of the mRNA expression profiles of GNAI1, GNB1, and KNG1, as well as the KNG1 downstream signaling elements intersectin 1 (ITSN1), intersectin 2 (ITSN2), and small G-protein CDC42, in nondementia controls and AD brains using public datasets from the GEO database.
We compared spatial mRNA expression of digID identified AD-associated GNAI1, GNB1, and KNG1 in human whole hippocampus versus CA1 subregion (Fig. 4A,B). To evaluate whether mRNA expression of the genes of interest (GNAI1, GNB1, KNG1, ITSN1, ITSN2, and CDC42) is influenced by disease state (control and AD), we applied the following statistical analyses.
According to the two-way ANOVA, for the CA1 region of the hippocampus from the GEO dataset GSE28146, there was a main effect of the gene, F(5,168) = 101.1, p < 0.0001; a nonstatistically significant main effect of disease, F(1,168) = 0.2255, p = 0.6355; and a gene × disease state interaction, F(5,168) = 4.202, p = 0.0013. Post hoc Tukey's honest significant difference (HSD) analysis revealed that mRNA expression of GNAI1 (p < 0.0001) and GNB1 (p = 0.0362) showed statistically significant differences between control and AD (Fig. 4A).
On the other hand, for the mRNA expression of the whole hippocampus from GSE36980, two-way ANOVA revealed that there was a main effect of the gene, F(5,96) = 952.3, p < 0.0001; a main effect of disease F(1,96) = 6.854, p = 0.0103; and a nonstatistically significant gene × disease state interaction F(5,96) = 1.395, p = 0.2331. Although the gene × disease state interaction was not statistically significant, post hoc Tukey's (HSD) analysis revealed that GNAI1 showed a statistically significant difference between control and AD with a p = 0.0449 (Fig. 4B).
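The degrees of freedom in the F statistics reported above follow from the two-way layout of six genes by two disease states. The sketch below works through that arithmetic. It assumes every sample contributes one expression value per gene and that the ANOVA is fit with an interaction term, so the observation counts it returns are implied by the reported degrees of freedom rather than stated in the text. The function names are hypothetical.

```python
def two_way_anova_df(n_genes, n_states, n_observations):
    """Degrees of freedom for a two-way ANOVA with interaction.

    Returns (df_gene, df_state, df_interaction, df_residual).
    """
    df_gene = n_genes - 1
    df_state = n_states - 1
    df_interaction = df_gene * df_state
    # One cell mean is estimated for every gene x state combination.
    df_residual = n_observations - n_genes * n_states
    return df_gene, df_state, df_interaction, df_residual


def implied_observations(df_residual, n_genes, n_states):
    """Invert the residual formula to recover the number of observations."""
    return df_residual + n_genes * n_states


# Six genes (GNAI1, GNB1, KNG1, ITSN1, ITSN2, CDC42), two disease states.
ca1_obs = implied_observations(168, n_genes=6, n_states=2)
hippocampus_obs = implied_observations(96, n_genes=6, n_states=2)
ca1_df = two_way_anova_df(6, 2, ca1_obs)
```

Under these assumptions, F(5,168) corresponds to 180 observations and F(5,96) to 108, each a multiple of six, which is consistent with one value per gene per sample.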
Regional mRNA expression of digID identified AD-associated genes in human frontal versus prefrontal cortex
We then further compared two independent mRNA datasets of human frontal cortex with no overlapping brain samples (Fig. 4C,D). For the mRNA expression of human frontal cortex from GSE138260, two-way ANOVA revealed that there was a main effect of the gene, F(5,204) = 723.6, p < 0.0001; a nonstatistically significant main effect of disease F(1,204) = 3.247, p = 0.0731; and a gene × disease state interaction F(5,204) = 2.320, p = 0.0446. Post hoc Tukey's HSD analysis revealed that GNAI1 (p = 0.0233) and KNG1 (p = 0.0408) showed a statistically significant difference between control and AD (Fig. 4C). For the mRNA expression of human frontal cortex from GSE118553, two-way ANOVA revealed that there was a main effect of the gene, F(5,378) = 2,567, p < 0.0001; a nonstatistically significant main effect of disease F(1,378) = 1.289, p = 0.2569; and a nonstatistically significant gene × disease state interaction F(5,378) = 1.640, p = 0.1484. Although we did not observe a statistically significant disease × gene interaction, a single effect was observed during the post hoc Tukey HSD analysis. The post hoc analysis revealed that KNG1 (p = 0.0064) showed a statistically significant difference between control and AD (Fig. 4D). However, the mRNA expression for KNG1 in GSE138260 was increased in AD, a surprising discrepancy when compared with that of the GSE118553 KNG1 mRNA expression data (Fig. 4, compare C,D).
We additionally analyzed the mRNA dataset GSE33000, which recorded expression data from human prefrontal cortex. Two-way ANOVA revealed that there was a main effect of the gene, F(5,2784) = 101.7, p < 0.0001; a main effect of disease, F(1,2784) = 172.2, p < 0.0001; and a gene × disease state interaction, F(5,2784) = 22.74, p < 0.0001. Post hoc Tukey analysis demonstrated a striking downregulation of all three genes, GNAI1 (p < 0.0001), GNB1 (p < 0.0001), and KNG1 (p < 0.0001), in AD (Fig. 4E). This result is consistent with the previously reported significant downregulation of GNB1 protein in AD patient brains (Manavalan et al., 2013). Corresponding to KNG1 downregulation, ITSN1 mRNA was significantly (p = 0.0320) downregulated in AD (Fig. 4E).
These findings showed that while digID identified GNAI1, GNB1, and KNG1 as potentially linked to AD, their mRNA expression profiles can differ markedly between brain regions, such as the cortex versus the hippocampus, or between subregions, such as the whole hippocampi versus the hippocampal CA1 area.
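As a reading aid for the F statistics reported above, the following sketch recomputes the degrees of freedom of a two-way ANOVA with interaction. It assumes that the gene factor has six levels (implied by the reported df of 5 and consistent with the six genes named in this study: GNAI1, GNB1, KNG1, CDC42, ITSN1, and ITSN2), that disease state has two levels, and that every gene was measured in every sample. The function names are illustrative and are not part of digID.

```python
def two_way_anova_df(n_levels_a, n_levels_b, n_observations):
    """Degrees of freedom for a two-way ANOVA with an interaction term."""
    n_cells = n_levels_a * n_levels_b
    if n_observations <= n_cells:
        raise ValueError("need more observations than factor-level cells")
    return {
        "factor_a": n_levels_a - 1,
        "factor_b": n_levels_b - 1,
        "interaction": (n_levels_a - 1) * (n_levels_b - 1),
        "error": n_observations - n_cells,
    }


def observations_from_error_df(n_levels_a, n_levels_b, error_df):
    """Invert the error df to recover the total number of observations."""
    return error_df + n_levels_a * n_levels_b


# Assumptions (not stated explicitly in the text): six genes, two disease states.
N_GENES = 6
N_STATES = 2

# Error degrees of freedom as reported in the F statistics.
REPORTED_ERROR_DF = {"GSE138260": 204, "GSE118553": 378, "GSE33000": 2784}

for dataset, error_df in REPORTED_ERROR_DF.items():
    n_obs = observations_from_error_df(N_GENES, N_STATES, error_df)
    df = two_way_anova_df(N_GENES, N_STATES, n_obs)
    # Samples per gene holds only if each gene was measured in every sample.
    print(dataset, "observations:", n_obs, "per gene:", n_obs // N_GENES, df)
```

Under these assumptions, the reported error degrees of freedom correspond to 36, 65, and 466 samples per gene, respectively. These counts are inferred from the F statistics and are not stated in the text.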
Subcellular localization and molecular interactions of digID identified AD-associated genes with APP in cultured neurons
Determining molecular proximity can increase confidence in the signaling network interactions provided by STRING. We analyzed the colocalization of GNAI1, GNB1, and KNG1 proteins with APP, which is known to be associated with AD and lies within the hub of the signaling networks revealed by the network analysis of the predicted AD genes. Using STORM super-resolution fluorescent microscopy, which can resolve two colocalized proteins within 20 nm (Xu et al., 2017; Zhou et al., 2019; Naser et al., 2022), we first examined mouse hippocampal neurons in culture (Fig. 5A). The top panels of Figure 5A show overall soma and process imaging, whereas the bottom panels show Z-axial images at higher magnification. GNAI1 expression is enriched in the soma, although it is also expressed in the neuronal processes. In both soma and neuronal processes, GNAI1 can be found in the proximity of APP (Fig. 5A, GNAI1). Similarly, GNB1 signal is enriched in the soma but is also expressed in the neuronal processes (Fig. 5A, GNB1). On the other hand, KNG1 expression is similar in the soma and the neuronal processes. GNAI1, GNB1, and KNG1 all showed varying degrees of colocalization with APP in both the soma and the neuronal processes (Fig. 5A, arrows point to the overlapping fluorescence). However, quantification showed that the overlap ratios (puncta separated by less than 20 nm) of GNAI1, GNB1, and KNG1 relative to APP in the cultured primary neurons did not differ significantly (Fig. 5B).
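The overlap criterion used in this quantification (two puncta separated by less than 20 nm) can be illustrated with a minimal nearest-neighbor sketch. The actual analysis pipeline is not described here, so the function name, the brute-force search, and the coordinates below are hypothetical and serve only to show how such a ratio could be computed from localization centroids given in nanometers.

```python
import math


def colocalized_fraction(puncta, app_puncta, threshold_nm=20.0):
    """Fraction of puncta whose nearest APP punctum is closer than threshold_nm.

    Coordinates are (x, y) or (x, y, z) tuples in nanometers.
    """
    if not puncta:
        raise ValueError("puncta must not be empty")
    if not app_puncta:
        return 0.0
    hits = 0
    for p in puncta:
        # Brute-force nearest neighbor; adequate for small illustrative sets.
        nearest = min(math.dist(p, q) for q in app_puncta)
        if nearest < threshold_nm:  # strictly shorter than the threshold
            hits += 1
    return hits / len(puncta)


# Hypothetical centroids in nm (illustration only, not measured data).
gnai1_puncta = [(0.0, 0.0), (100.0, 100.0), (250.0, 40.0), (400.0, 400.0)]
app_puncta = [(10.0, 5.0), (110.0, 112.0), (300.0, 300.0)]

print(colocalized_fraction(gnai1_puncta, app_puncta))  # 0.5
```

In this toy example, two of the four hypothetical GNAI1 puncta lie within 20 nm of an APP punctum, giving an overlap ratio of 0.5. For large localization tables, a spatial index would be a more practical choice than the brute-force search shown here.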
We performed additional validation studies with co-immunoprecipitation experiments. GNAI1, GNB1, and KNG1 all formed stable complexes with APP, as demonstrated by both forward and reverse co-immunoprecipitation using either anti-GNAI1, anti-GNB1, and anti-KNG1 as precipitating antibodies (Fig. 5C) or anti-APP as a reverse precipitating antibody (Fig. 5D). Interestingly, BK, the Cdc42 upstream activator and the active peptide derived from high molecular weight (HMW) KNG1, reduced the G-proteins that formed complexes with APP (Fig. 5C). On the other hand, APP interactions with GNAI1, GNB1, and KNG1 were not affected by BK or NGF treatments (Fig. 5D).
Subcellular localization and molecular interactions of digID identified AD-associated genes with APP in 3xTg-AD mouse brain
We then determined whether GNAI1, GNB1, and KNG1 colocalized with APP in vivo and showed any dysregulation in 3xTg-AD mouse brain hippocampi by using STORM fluorescent microscopy. Figure 6A shows double labeling of GNAI1 with APP in WT and 3xTg-AD mouse hippocampal CA1, CA2, and CA3 subregions. GNAI1 expression was widespread in CA1 and remained in super-resolution proximity to APP, by STORM standards, in both WT and 3xTg-AD. There was clear fragmentation of the APP staining pattern in 3xTg-AD CA2 and CA3 subregions in comparison with WT, and the colocalization was more sporadic (Fig. 6A, bottom). A similar trend can be seen in Figure 6B, which shows double labeling of GNB1 with APP in WT and 3xTg-AD mouse hippocampal CA1, CA2, and CA3 subregions. CA2 and CA3 showed increased distance between GNB1 and APP in 3xTg-AD mice when compared with WT (Fig. 6B, bottom). KNG1, on the other hand, demonstrated a different pattern of dysregulation in 3xTg-AD mouse hippocampi (Fig. 6C). Here, APP showed more discrete clusters in 3xTg-AD CA1 than in WT. Striking colocalization of APP with KNG1 can be seen in CA2 of WT, whereas this colocalization was lost in 3xTg-AD. APP interaction with KNG1 was similar between 3xTg-AD and WT CA3 subregions (Fig. 6C, bottom, arrows). The different patterns of molecular interactions among GNAI1, GNB1, KNG1, and APP are further supported by the quantification of STORM signals (Fig. 6D–F).
Discussion
In this study, we developed digID, a novel computational framework for disease gene prediction and prioritization. digID is able to systemically predict AD-associated genes and assess their importance by using genome-wide datasets as input, thus avoiding the bias that can potentially be introduced by traditional approaches that focus on individual genes or functional pathways. To our knowledge, this is one of the few computational frameworks that have been built to streamline the prediction and the prioritization of disease genes in AD. This approach also allows both gene–gene relationship datasets and disease-specific gene expression datasets to be compiled into a diverse large-scale dataset for disease gene discovery, increasing the sensitivity and specificity of disease gene prediction. The robust performance of the semi-supervised deep learning approach in AD gene prediction was underscored by multilevel biological validations.
This study identified a comprehensive molecular landscape of 1,567 AD-associated genes with high accuracy, thus providing a resource for understanding the molecular mechanisms of AD. The PPI network analysis of these AD genes identified novel disease mechanisms, including multiple master regulators/gene hubs that contribute to the same AD pathology and different AD pathologies that are connected at the molecular level. Collectively, these results suggest the importance of combination therapies for AD and provide guidance for developing effective AD treatments.
In addition, this study discovered three potentially novel therapeutic targets for AD treatment. KNG1 was identified as a potentially important coregulator of APP in Aβ accumulation-related AD pathology, suggesting that KNG1 could be a new candidate for AD treatment in combination with APP-targeted therapies. Interestingly, GNB1 and GNAI1, which have barely been studied in the context of AD but reportedly play important roles in neurodevelopment, were identified as two of the top-ranked AD genes and as potential mediators of the interactions between the small GTPase RAC1 and APP. These findings warrant further studies to understand the importance of these newly identified genes in AD, and their potential as new targets for AD treatment needs to be experimentally validated.
Indeed, as with other machine learning and computational tools, the application of digID requires independent experimental validation. Our study applied two approaches to determine the validity of the digID-postulated novel AD genes (i.e., GNAI1, GNB1, and KNG1) and their association with well-known AD networks. One approach was to assess whether individual genomic datasets derived from different human ND and AD brain regions lead to similar digID-postulated novel gene associations. Another approach was to determine whether these novel genes GNAI1, GNB1, and KNG1 are in the right molecular proximity to known AD genes such as APP. Our data demonstrated that while mRNA for GNAI1, GNB1, KNG1, and the KNG1 downstream small GTPase CDC42 showed significant dysregulation in AD relative to ND in the prefrontal cortex, considerable uncertainty exists when mRNA data from the frontal cortex were used to address the same question. Similarly, when comparing mRNA data from hippocampi and the CA1 subregion of the hippocampi, GNAI1 and GNB1 showed dysregulation in the CA1 subregion but only GNB1 showed significant downregulation in AD. There were no differences in mRNA of KNG1, CDC42, ITSN1, and ITSN2 between ND and AD in either the hippocampi or the CA1 subregion. On the other hand, KNG1 showed clear dysregulation in the datasets of both frontal cortex and prefrontal cortex. These data showed that the digID-postulated AD gene network is subject to spatial expression interpretation.
The other approach to validate the postulated AD genes was through STORM and co-immunoprecipitation analyses. STORM can generate super-resolution images with a lateral (x, y) resolution of 20 nm and an axial (z) resolution of 50 nm, within which single molecular interactions can be indicated (Xu et al., 2017; Zhou et al., 2019). Our studies clearly showed that a fraction of GNAI1, GNB1, and KNG1 is colocalized with APP at the STORM level and likely forms stable complexes with APP. When their colocalization was assessed in 3xTg-AD mouse hippocampi, we observed subregion-selective changes in GNAI1, GNB1, and KNG1 expression in comparison with WT hippocampi. These data echo the spatial dysregulation of mRNA expression of GNAI1, GNB1, and KNG1 in human AD in comparison with ND frontal cortex and prefrontal cortex.
The selective spatial dysregulation of AD gene expression not only established the involvement of G-protein signaling through GNAI1, GNB1, and KNG1 in AD pathogenesis but also underscored the importance of accounting for it when formulating therapeutic strategies targeting these gene networks. Furthermore, our recent studies showed that beyond gene and protein expression, spatial dysregulation of G-protein activities should be taken into consideration when designing targeting approaches (Nik Akhtar and Lu, 2023). Therefore, supplementing computational modeling with biological validation is essential for novel machine learning methods to provide optimal biomedical insights.
Data and Materials Availability
The link for digID code and data can be found at https://github.com/danielzhang-hackley/digID.
All other data supporting the findings of this study are available in the main text, the supplementary materials, or upon request from the corresponding author in accordance with institutional technology safety and privacy policy.
Footnotes
We thank Christi Boykin and Amna Naser for technical assistance.
This study was supported in part by National Institutes of Health Director's Transformative Research Award R01GM146257, the Wooten Foundation for Neurodegenerative Diseases Research, and the SmartState Endowment Fund of South Carolina.
The authors declare no competing financial interests.
Correspondence should be addressed to Qun Lu at qun{at}mailbox.sc.edu.