Abstract
A major challenge in neuroscience is finding which genes affect brain integrity, connectivity, and intellectual function. Discovering influential genes holds vast promise for neuroscience, but typical genome-wide searches assess approximately one million genetic variants one-by-one, leading to intractable false positive rates, even with vast samples of subjects. Even more intractable is the question of which genes interact and how they work together to affect brain connectivity. Here, we report a novel approach that discovers which genes contribute to brain wiring and fiber integrity at all pairs of points in a brain scan. We studied genetic correlations between thousands of points in human brain images from 472 twins and their nontwin siblings (mean age: 23.7 ± 2.1 SD years; 193 male/279 female). We combined clustering with genome-wide scanning to find brain systems with common genetic determination. We then filtered the image in a new way to boost power to find causal genes. Using network analysis, we found a network of genes that affect brain wiring in healthy young adults. Our new strategy makes it computationally more tractable to discover genes that affect brain integrity. The gene network showed small-world and scale-free topologies, suggesting efficiency in genetic interactions and resilience to network disruption. Genetic variants at hubs of the network influence intellectual performance by modulating associations between performance intelligence quotient and the integrity of major white matter tracts, such as the callosal genu and splenium, cingulum, optic radiations, and the superior longitudinal fasciculus.
Introduction
A major goal in neuroscience is to identify specific genes that affect brain structure and function, as well as genes that modulate risk for neuropsychiatric disease (Gandhi and Wood, 2010). Using genome-wide association (GWA) scans, scientists can now link genetic markers across the whole genome with biological measures in human populations, to search for common genetic variants that associate with higher risk for disease, e.g., abnormalities in the ApoE gene (Schuff et al., 2009) or the CLU gene (Braskie et al., 2011) associated with late-onset Alzheimer's disease. GWA can help to identify contributors to the mechanisms of neuropsychiatric disease, which can be pursued to identify molecular targets for treatment. Even so, standard GWA studies ignore interactions between genes (Manolio et al., 2009; Becker et al., 2011); the power of GWA is also rather low as one needs to correct for approximately a million hypotheses tested across the genome, so vast samples of subject need to be assessed.
Recently, the cDNA microarray method has been widely used to identify gene coexpression networks where the component genes are “clustered” into groups that jointly affect complex traits (Ghazalpour et al., 2006). The use of clustering may reveal evidence of collective effects of networks of genes on brain pathology. For example, some gene sets associated with neuronal integrity may show altered expression patterns in schizophrenia patients (Torkamani et al., 2010). Nevertheless, in situ DNA sampling and analysis with microarrays may only be feasible in the postmortem brain; moreover, the spatial scope and resolution in tissue sampling across the brain are limited.
In this study, we developed a novel approach to gene discovery in the living brain, outlined in Figure 1. We examined networks of genes that influence brain integrity measured by diffusion tensor images (DTI). DTI is a variant of standard MRI, measuring the 3D directionality of water diffusion at each location in the brain; fractional anisotropy (FA), derived from DTI, is a widely accepted measure of fiber tract coherence and integrity (Basser et al., 1994). By adapting classical twin methods and hierarchical clustering to DTI scans from 472 twins and their nontwin siblings, we identified patterns of brain regions where white matter integrity, measured by FA, was under strong and relatively homogeneous genetic control. We then performed GWA to identify single nucleotide polymorphisms (SNPs) associated with FA in these regions. We discovered SNP pairs that tended to jointly affect brain regions. By applying network topology analysis to this SNP association matrix, we isolated an SNP network influencing white matter integrity in the living brain. This gene network showed small-world and scale-free properties, with several dominant hubs and structured subnetworks. These configurations may make the network biologically economical and more resistant to disruption than random networks (Albert et al., 2000; Rubinov and Sporns, 2010). Hub SNPs in the network interacted with each other in their effects on white matter integrity. They may also influence intellectual performance by modulating the associations between white matter integrity and the performance intelligence quotient (IQ).
Flowchart of the genetic clustering strategy to boost power to discover genetic variants influencing the brain. Our imaging-genetic analysis method was designed to reduce the complexity of the search for significant associations between the genotype of every SNP across the genome and the imaging phenotype (white matter integrity here) at every location across the brain. We first reduced the dimensionality of the image data by selecting ROIs where white matter integrity was under strong genetic control, using structural equation modeling and hierarchical clustering. We then searched along the genome to detect every SNP that was significantly associated with FA in these ROIs. To account for possible interactions between these significant SNPs, we connected them to form a network, based on correlations between their effects on white matter integrity. We then assessed (1) their topological configurations, specifically the small-world and scale-free properties of the network and (2) the functional roles of the hub SNPs in the network, i.e., how these hub SNPs interacted to affect white matter integrity and intellectual performance. See Materials and Methods for details of each individual step in the analysis.
Materials and Methods
Participants.
Five-hundred and thirty-one healthy adult subjects (age: 23.7 ± 2.1 years, mean ±SD; age range: 20–29 years; sex: 217 male/314 female)—consisting of pairs of twins and their nontwin siblings—were recruited from 271 different families. Twin subjects were recruited to study genetic effects that underlie different traits, by comparing the similarity of monozygotic (MZ) and dizygotic (DZ) twins/siblings. All subjects received high-resolution brain MRI and neurocognitive evaluations as part of a 5-year research project evaluating healthy Australian twins. The projected sample size for the adult study is 1150 at completion (de Zubicaray et al., 2008). As described previously (Chiang et al., 2009b), zygosity was established objectively by typing nine independent DNA microsatellite polymorphisms (polymorphism information content >0.7), using standard PCR methods and genotyping. Results were cross-checked with blood group (ABO, MNS, and Rh), and phenotypic data (hair, skin, and eye color), giving an estimated overall probability of correct zygosity assignment >99.99%. Zygosity was subsequently confirmed by GWA. All subjects were screened to exclude left-handers and cases of pathology known to affect brain structure, a history of significant head injury, a neurological or psychiatric illness, substance abuse or dependence, or a psychiatric disorder in any first-degree relative. The study was formally approved by the Institutional Review Boards of the Queensland Institute for Medical Research and the University of California, Los Angeles. Written informed consent was obtained from all participants before any experimental procedures were performed.
SNP genotyping.
Genotyping for subjects in this study is part of a larger genotyping project that includes >4400 twins and their siblings, designed to identify melanoma risk factors (Wright and Martin, 2004; Medland et al., 2009). DNA was isolated from blood cells using standard protocols. Genomic DNA samples were analyzed on the Human610-Quad BeadChip (Illumina). Of single nucleotide polymorphisms (SNPs), 592,392 were genotyped. SNPs that did not satisfy the following quality control criteria were excluded: genotype call rate <95%, significant deviation from Hardy–Weinberg equilibrium p <10−6, minor allele frequency <0.01, or only one allele was observed, and a platform-specific recommended quality control score (BeadStudio GenCall score) <0.7. After quality control, 529,497 SNPs remained. Genotype data were collected for 484 of the 531 subjects who received brain MRI scanning. We excluded 12 subjects who were identified to be ancestry outliers (Medland et al., 2009) from the subsequent genome-wide association study (GWAS), so there were 472 subjects with available image and genome-wide SNP data (age: 23.7 ± 2.1 years, mean ± SD; age range: 20–29 years; sex: 193 M/279 F). The participants' demographic information and the twin/sibling family composition are summarized in Table 1.
Demographic data and family composition of the participants whose image and genome-wide SNP data were both available
Evaluation of psychometric intelligence.
General intellectual ability was assessed at age 16 [as in the study by Luciano et al. (2003)] using the Multidimensional Aptitude Battery (MAB) (Jackson, 1984), a measure highly correlated with the Wechsler Adult Intelligence Scale. MAB is designed for assessment of adults and adolescents aged 16 and older. In this study, we examined three verbal (information, arithmetic, and vocabulary) and two performance (spatial and object assembly) sub tests. Each subtest gave a raw score, and verbal (VIQ), performance (PIQ), and full-scale (FIQ) intelligence quotient standardized scores were derived from these sub tests. IQ data were available for 456 participants (Table 1).
Image processing and registration.
All MR images were collected using a 4 tesla Bruker Medspec MRI scanner (Bruker Medical), with a transverse electromagnetic (TEM) headcoil, at the Center for Advanced Imaging (University of Queensland, Australia). Diffusion-weighted scans were acquired using single-shot echo planar imaging with a twice-refocused spin echo sequence, to reduce eddy-current induced distortions. Imaging parameters were as follows: 21 axial slices (5 mm thick), FOV = 23 cm, TR/TE 6090/91.7 ms, 0.5 mm gap, with a 128 × 100 acquisition matrix. Thirty images were acquired: three with no diffusion sensitization (i.e., T2-weighted images) and 27 diffusion-weighted images (b = 1145.7 s/mm2) with gradient directions evenly distributed on an imaginary hemisphere. The reconstruction matrix was 128 × 128, yielding a 1.8 × 1.8 mm2 in-plane resolution. Total scan time was 3.05 min. We used the FMRIB software library (FSL, http://www.fmrib.ox.ac.uk/fsl/) to preprocess and linearly align the diffusion images. For each participant, motion artifacts were corrected by linearly registering all the T2-weighted and diffusion-weighted images to one of the T2-weighted images (the “eddy_correct” command). Then the three T2-weighted images were averaged and stripped of nonbrain tissues to yield a binary brain extraction mask (cerebellum included), using the Brain Extraction Tool (BET) (Smith, 2002), followed by expert manual editing, if necessary. The masked T2-weighted image was then registered to a standardized high-resolution brain MRI template defined in the International Consortium for Brain Mapping space (ICBM) (Holmes et al., 1998) with a nine-parameter linear transformation using the software FLIRT (Jenkinson and Smith, 2001). The resulting transformation parameters were used to rotationally reorient the diffusion tensors (computed from diffusion-weighted images using the “DTIFIT” command) at each voxel (Alexander et al., 2001). The tensor-valued images were linearly realigned based on trilinear interpolation of the log-transformed tensors (Arsigny et al., 2005) and resampled to isotropic voxel resolution (with dimensions: 128 × 128 × 93 voxels, resolution: 1.7 × 1.7 × 1.7 mm3). A fractional anisotropy (FA) map was then constructed from each affine-registered diffusion tensor image. FA is defined as the ratio of the SD to the root mean square of the eigenvalues of a diffusion tensor (Basser and Pierpaoli, 1996). Each participant's FA image was then registered to a mean FA image computed for the first 258 subjects scanned [a subset of the study sample in this paper; (Chiang et al., 2009a)]. For this alignment step, we used a validated fluid registration algorithm that maximizes the Jensen–Rényi divergence of the joint intensity histogram of the two FA images (Chiang et al., 2007).
We averaged the fluidly registered FA images across all subjects and restricted subsequent data analysis to regions with average FA > 0.2, as recommended by Smith et al. (2006). This focused our regions of interest on major white matter fiber structures. Each participant's FA map was smoothed using an isotropic Gaussian filter with full-width at half maximum (FWHM) = 6 mm (Smith et al., 2006).
Defining regions of interest for GWA.
We defined regions of interest (ROIs) for GWA from brain regions where genetic influences on white matter integrity were high, based on a standard structural equation model (SEM) to model the observed variation of FA, which is widely used in twin studies. The SEM partitions the observed variance into components (proportions of the overall variance) due to additive genetic factors (A), shared environment (C), and unique environment (E). In our sample, where nontwin siblings are also included, we used the extended twin design (Posthuma et al., 2000; Lenroot et al., 2009) where the covariance matrix for FA was modeled for each family based on the known genetic similarity between relatives. The parameters of the SEM were estimated using a maximum likelihood scheme to maximize their fit (Neale et al., 1992). The significance map for the genetic factors, p(A), was determined as minus two times the difference between the log-likelihood of the full (ACE) and the restricted (CE) model at each voxel, which is asymptotically distributed approximately as a χ2 distribution with one degree of freedom. p(A) was further assessed using the false discovery rate method by Benjamini and Hochberg (1995) (BH-FDR) to correct for multiple comparisons. We selected voxels where the genetic component A contributed to at least 60% of the total variation in FA and its significance passed the BH-FDR ≤ 0.05 threshold. This led to 4876 candidate voxels for the ROIs (5% of the total 99,480 voxels). By doing this, we ensured that the main effect (i.e., genetic influences on FA) for the genetic correlation analysis below was significant, and no more than 5% of the voxels declared as genetically influenced are likely to be false positive findings.
We estimated the genetic correlation coefficient rg between any two voxels, using the cross-trait cross-twin method (Chiang et al., 2009b), yielding a 4876-by-4876 genetic correlation matrix. rg is the correlation between the genetic effects influencing two traits x and y, i.e., Cov(Gx,Gy)/
Here aij = 0 or 1 is the adjacency function between nodes i and j. aij = 1 if the genetic correlation rg between them is not <0.95, and indicates that the two nodes are connected.
ROI selection using hierarchical clustering. a, The dendrogram shows how the hierarchical clustering method aggregates the 4876 candidate voxels into clusters based on the TO index between each pair of voxels. Each voxel was represented by the most terminal branch, i.e., leaf, of the dendrogram. Voxels for which the pairwise distance d ≤ 0.5 were grouped to form a cluster, where d = 1 − TO index (the topological overlap index), and were demonstrated as leaves of the same color in the dendrogram. b, Eighteen clusters composed of no less than 50 voxels each were selected as ROIs; the color of dendrogram leaves in these ROI-clusters was retained in the bar, while leaves of the non-ROI clusters were shown in gray. c, Each leaf of the dendrogram coincides with the corresponding column in the color-coded matrix, where the TO index matrix is shown in the upper triangle, and the genetic correlation (rg) matrix in the lower triangle. Compared with rg, the TO index identified clusters that were smaller in size, but contained voxels of higher connectedness. d, Selected ROIs are shown overlapped on the FA maps that are displayed in radiological orientation. These ROIs were rearranged as sequentially numbered from the inferior to superior level of the brain. Therefore, the color and numbering of ROIs in d match those on the bar in b, but the numbering of the ROIs in b is not in numerical order.
GWA for white matter integrity.
The mean FA for each ROI was associated respectively with each SNP across the whole genome (n = 529,497) using the Quantitative Transmission Disequilibrium Test (qTDT) model, where the phenotype (mean FA of the ROI here) was regressed against the SNP genotypic value, with variance components that accounted for within-family phenotypic correlations included in the model (Abecasis et al., 2000). If an SNP has two alleles A and a, and allele A is the major allele that appears more frequent than allele a, then the SNP genotypic value equals 1 for AA, 0 for Aa, and −1 for aa. Subjects' age and sex were included as covariates. qTDT tests the significance of the association between FA and an SNP genotype effect by comparing the difference in log-likelihood functions between the full model (the SNP effect was included) and the restricted (the SNP effect was excluded) model. To increase power for detecting associations, we used the whole genotypic value of the SNPs for the association analysis (total association), instead of dividing the genotypic value into between-family and within-family components (Abecasis et al., 2000). Since the ethnic outliers had been removed from GWA, the risk for detecting spurious associations due to population stratification was minimized.
Corrections for multiple comparisons.
We adopted a two-stage approach to correct for multiple comparisons across the genome and across the image. For associations between FA and SNPs, we first used the traditional Bonferroni method to correct for multiple comparisons across the genome (Lander and Kruglyak, 1995): SNPs with a p value <0.05/(number of SNPs across the genome) = 0.05/529,497 = 9.4 × 10−8 were declared to be significantly associated with FA. For each of these significant SNPs, we then further corrected for multiple comparisons across the ROIs using the bootstrap method (Rhodes et al., 2002). The p value of the SNP was converted to a Z value using the inverse normal cumulative distribution function. Then the p value adjusted for multiple comparisons, denoted by q, was derived by comparing the sum of Z values across all the 18 ROIs for that SNP with the bootstrap distribution of 100,000 sums of Z values randomly selected from the 529,497 SNPs. An SNP with q < 0.05 was considered to reach overall significance.
Genetic interconnection network and network topology.
We then extended our analysis from detecting the effects of individual genetic loci at single locations in the image, to constructing a genetic network that underlies white matter integrity over multiple brain regions. Our methods were inspired by previous work deriving gene coexpression networks (Zhang and Horvath, 2005; Ghazalpour et al., 2006) but modified to deal with the case where SNPs in genes may influence multiple phenotypes (different parts of a brain image). In the network scheme, the connectedness between SNPs was derived from the correlations between their phenotypic effects. We selected SNPs whose association with FA in at least one ROI was with a significance p value < 10-5; this criterion was somewhat relaxed (compared with the very strict GWA-significant threshold with p < 9.4 × 10−8) to include more SNPs into the network analysis and might therefore boost sensitivity for detecting interactions of gene effects. Three-hundred and twenty-nine SNPs were selected. We then computed the Spearman's rank correlation coefficient ρ for every pair of SNPs, based on their partial regression coefficients with respect to FA across all the 18 ROIs. The partial regression coefficient β for an SNP represents change in FA due to the unit additive effect of the major allele. β could be positive, zero, or negative, depending on the direction of the SNP major allele effect on FA. We used Bi to denote the vector that contains β values across all the 18 ROIs for SNP i. SNPs i and j were considered to be connected (i.e., aij = 1) if (1) the correlation between Bi and Bj, denoted by ρij was no less than a threshold ρT, and, (2) the two SNPs were not in linkage disequilibrium (LD), defined as a pairwise LD r2 for the SNPs ≤ 0.2 (Hill and Robertson, 1968). The latter criterion ensured that the connectedness between these two SNPs was not simply due to their positional proximity in the chromosomes, which was reflected by a higher LD between them. The topological configuration of the genetic network was a function of ρT. Here, we tested different ρT values from 0.4 to 0.9, with increments of 0.1. For each ρT, we computed three common measures of the network: the degree of each node (SNP), the clustering coefficient C, and the characteristic path length L (Watts and Strogatz, 1998; Bullmore and Sporns, 2009; Rubinov and Sporns, 2010). The clustering coefficient C is a measure of segregation in the network; a higher value of C indicates that the nodes of the network tend to form isolated clusters (Watts and Strogatz, 1998). On the other hand, the characteristic path length L is an index of integration in the network. It measures the average shortest path length between all pairs of nodes in the network; a smaller L indicates that the topological distance between the segregated clusters in the network is shorter, or the integration between these segregated clusters is more efficient. When a network is simultaneously highly segregated and integrated, it is in a small-world topology (Rubinov and Sporns, 2010). This small-world property is quantitatively estimated by the “small-worldness” of the network (Humphries and Gurney, 2008): S = (C/Crand)/(L/Lrand). Here, Crand and Lrand are the mean clustering coefficient and the characteristic path length of Nr random networks, where each random network was generated by randomly rewiring the links of the original network but keeping the distribution of the degree of the nodes unchanged (Maslov and Sneppen, 2002). Nr was set to 1000. A network is deemed a small-world if S > 1 (Humphries and Gurney, 2008).
We further tested whether the genetic network was also in a scale-free topology. The degree distribution of a scale-free network follows a power law, i.e., f(k) ∝ k−λ, where f(k) is the frequency of the nodes that have degree k (Barabasi and Albert, 1999). A scale-free network is known to be more tolerant of random errors (Albert et al., 2000).
Gene ontology enrichment analysis.
To identify possible functional roles for the genes in the network, we performed gene ontology (GO) enrichment analysis using the web-based Gene Set Enrichment Analysis Toolkit (WebGestalt2) (Zhang et al., 2005; Duncan et al., 2010). The 329 SNPs selected for network analysis were first annotated to corresponding genes, where possible, by searching the NCBI dbSNP database (http://www.ncbi.nlm.nih.gov/SNP/batchquery.html), yielding a total of 101 genes. These genes were then compared with all genes in the human genome using the hypergeometric test, and the genes were assigned, where possible, to enriched GO categories under three main domains, cellular components, molecular functions, and biological processes. Each GO category contained no less than two genes. Multiple comparisons across these enriched categories were corrected using the BH-FDR method (Benjamini and Hochberg, 1995), with the significance level for the FDR-adjusted p value set to 0.05. The results were represented by an enriched directed acyclic graph (DAG).
Detecting genetic interactions between SNPs and interaction effects of SNPs on FA-IQ associations.
Finding connections between SNPs defined in the above section may help to search for epistasis (genetic interactions) between SNPs that affect white matter integrity, i.e., where the presence of one SNP may modify or interact with the effect of other SNPs. To explore this, we estimated the interaction between SNPs whose associations with FA were with a significance p value < 10−5 in at least one ROI (n = 329 SNPs), by adding an interaction term to the qTDT model: y = μ + xNβN + g1β1 + g2β2 + (g1 · g2) βint.
Here, y is the mean FA within an ROI. For voxel-based analysis, y is the FA of an individual voxel in the white matter mask (defined as FA > 0.2). μ is the overall mean of FA, and xN is a vector of covariates that includes the subjects' age and sex. g1 and g2 indicate the genotypic value of SNP1 and SNP2 respectively, and (g1 · g2) represents the interaction between the two SNPs.
We also tested whether the hub SNPs in the scale-free genetic network (see Results for the 9 hub SNPs) may affect IQ via its modulatory effect on the association between white matter integrity and IQ. Here, we estimated the interaction of a hub SNP with FA and the PIQ scale at every voxel within the white matter mask; we selected PIQ, as in our previous study (Chiang et al., 2009b) we found that the cross-trait correlation between FA and PIQ was strongly mediated by overlapping genetic factors. This was modeled by: y = μ + xNβN + g · β1 + PIQ · β2 + (g · PIQ) βint.
Here, g is the genotypic value of a hub SNP, PIQ indicates the participant's PIQ scale, and (g · PIQ) represents the interaction between the hub SNP and PIQ. The definition of y, μ, and xN are the same as in the previous regression equation for genetic interactions between SNPs.
βint ≠ 0 indicates a significant interaction effect, which was determined as minus two times the difference between the log-likelihood of the full model and the restricted (βint was excluded) model, which is asymptotically distributed approximately as a χ2 distribution with one degree of freedom. We then used the Bonferroni method to correct for multiple comparisons across SNPs, with the p threshold set equal to 0.05/ns = 1.7 × 10−6. Here, ns = 29,180 is the number of all possible pairs of the SNPs (= 329 × 328/2), minus those in linkage disequilibrium (LD r2 for the SNPs ≤ 0.2). To correct for multiple comparisons across the ROIs or the image volume, for the ROI analysis (for genetic interactions between SNPs), the interaction between two SNPs was regarded as significant, if the significance p value of βint in any of the 18 ROIs was less than the threshold that controlled the BH-FDR within 5%. For the voxel-based analysis (for genetic interactions between SNPs, and interactions of the hub SNP with FA and PIQ), as the statistical comparisons were performed in ∼100,000 voxels, we used a more stringent but accepted method, known as topological FDR (Chumbley and Friston, 2009), to control the expected proportion of false positive findings over all connected sets of voxels, i.e., clusters. We controlled the topological FDR of clusters within 5%, which means that only 5% of the clusters that are declared to be significant tend, on average, to be false positive findings.
Results
To perform GWA for white matter integrity measured by FA, we first identified 18 ROIs using hierarchical clustering based on the 4876-by-4876 topological overlap (TO) matrix (Fig. 2). We assumed that there should be more power to detect associations between FA in these ROIs and SNPs identified from the whole genome, as the FA of voxels within an individual ROI was determined by a common set of genes. Twenty-four SNPs were detected in GWA (Table 2), which showed strong “genome-wide significant” associations with white matter integrity (FA), with an uncorrected p value <9.4 × 10−8, and an overall significance (across all 18 ROIs) q value <0.05.
SNPs whose associations with FA reach overall significance, at levels that correct for the genome-wide search, and correct for the number of ROIs assessed
Next, we were interested in how individual SNPs were correlated under a network configuration to affect brain integrity, or in other words, how the genetic network drives the brain network. We defined the nodes of the network as SNPs that were associated with white matter integrity with a significance p value < 10-5 in at least one ROI (number of nodes = 329). We then examined the topological properties of the genetic network, by increasing the correlation threshold ρT from 0.4 to 0.9 (Table 3). Every ρT yielded a small-world network (S > 1). However, when the network consisted of nodes with stronger connections only (ρT ≥ 0.8), S increased abruptly, as also shown in Figure 3a, suggesting that the efficiency of the genetic network was determined by SNPs with stronger correlations in their effects. For the scale-free property, given that the slope of the fitted line for log(f(k)) versus log(k) should be negative in a typical scale-free configuration, the genetic network can be considered to be scale-free only when ρT ≥ 0.7, where p values for the fit are also very small. If we further considered the effect size for the above correlation fitted by requiring a signed R2 > 0.8 (Zhang and Horvath, 2005), only ρT = 0.9 yielded a scale-free network by this definition, where f(k) ∝ k−1.58 (Fig. 3b, signed R2 = 0.93, p = 3.26 × 10−11). Signed R2 here was derived by adjusting the sign of the Pearson's coefficient of determination R2, or signed R2 = −sgn(λ)·R2, where sgn represents the sign function. In other words, if the fitted line for log(f(k)) versus log(k) has a negative (positive) slope λ, it would have a positive (negative) signed R2. Note that ρT = 0.9 also gave the highest small-worldness index (S = 6.70), which corroborated the theoretical prediction that scale-free networks are ultra-small (Cohen and Havlin, 2003).
The topological properties of the genetic network
Network topology at different correlation thresholds ρT. a, We measured the small-world topology of the genetic network by estimating the small-worldness index, S (see Materials and Methods for definition) at different correlation thresholds ρT. The nodes of the network were SNPs associated with white matter integrity with a significance p value < 10-5 in at least one ROI. Two nodes (SNPs) are defined as “connected,” if they have strongly correlated effects on FA (the Spearman rank correlation coefficient ρ between their respective association with FA is no less than ρT) and are not in linkage disequilibrium (linkage disequilibrium r2 ≤ 0.2). Every ρT yielded a small-world network (S > 1). However, the value of S at ρT ≥ 0.8 is much higher than that with respect to a smaller ρT (0.4–0.7), suggesting that the efficiency of the genetic network is determined by SNPs with strong correlations in their effects. b, When ρT = 0.9, the genetic network has a scale-free topology, as shown by the log–log plot (the network is visualized in Fig. 4 below). k is the number of connections that an SNP has, and k = 1 means that the SNPs are isolated and not connected to any other SNPs. f(k) is the number of SNPs with k connections, and f(k) = 144.00 × k-1.58 (Pearson R2 for line fitting = 0.93, p = 3.26 × 10−11).
Figure 4 visualizes the genetic network at ρT = 0.9. The visual layout of these SNPs is based on the Kamada–Kawai algorithm (Kamada and Kawai, 1989) implemented in the Pajek software [http://vlado.fmf.uni-lj.si/pub/networks/pajek/ (Batagelj and Mrvar, 1998)]. This SNP network obeys a scale-free topology. This means that most SNPs have no or only scarce connections with other SNPs. However, a small proportion of SNPs is highly connected with each other and they are therefore the hubs of the network. For example, 134 SNPs (41% of all 329 SNPs) are isolated and not connected with any other SNPs (k = 1). On the other hand, 9 SNPs (< 3% of all SNPs) are highly connected with other SNPs, and thus the hubs of the network. They are (Fig. 4a, labeled as red circles) rs11686777 (k = 20), rs10894606, rs4245110, and rs4491241 (k = 18), rs7839054 (k = 17), rs10091460 (k = 16), rs17038289, rs2012223, and rs4872262 (k = 15) (Table 4). rs10894606, rs10091460, rs17038289, and rs4872262 are also genome-wide significant. These hub SNPs were associated with FA in the ROIs identified in major white matter tracts (Fig. 4b). Specifically, almost all (8 of 9) hub SNPs were associated with white matter integrity in ROI 13 in the superior corona radiata, a white matter structure containing axons projecting to and from the cerebral cortex.
Visualization of the genetic network at ρT = 0.9. a, The nodes of the network are displayed as blue circles, and two connected nodes are linked by a black line. See Materials and Methods or Figure 3 for the definition of network nodes and their connections. SNPs that have no less than 15 connections (k ≥ 15) are defined as the hubs of the network and are colored in red, with their names labeled (see b for the names of these SNPs). To identify white matter regions that the genetic network is most likely to influence, associations between these hub SNPs and the white matter ROIs (see Fig. 2) are demonstrated by a matrix of circles in b. Each circle is the same color as the corresponding hub SNP on the same row, to ease interpretation. When a circle is present, it indicates that the hub SNP was associated with the white matter integrity of the ROI (shown in the same column as the circle) with a significance p value < 10-5. ROIs that did not show significant associations with any hub SNP (e.g., ROIs 1–5 and 14–18) are not shown.
Hub SNPs of the genetic network
To determine the functional significance of the genetic network, we further used the program, WebGestalt2, to assess gene ontology (GO) enrichment for the 101 genes to which the nodes (SNPs) of the network were annotated. Figure 5 shows the directed acyclic graph showing patterns of genetic enrichment. Thirteen genes are enriched in “biological processes,” with their function specified as “cell adhesion.” Fourteen genes belong to the “molecular functions” domain, particularly for calcium ion binding. Forty-two genes are enriched in cellular components relevant to the integrity of nerve fibers: 7 of them are implicated in the integrity of synapses, 33 for the plasma membrane, and 2 for voltage-gated sodium channel complexes. We further analyzed the seven genes in our network that influence the integrity of synapses, using the GNCPro program (http://gncpro.sabiosciences.com/gncpro/gncpro.php). GNCpro is a visualization/data integration system that integrates collective biological knowledge and databases to visualize gene expression interactions as a network (Liu et al., 2010). Figure 6 visualizes various interaction patterns between these seven genes and their neighboring genes in a network configuration. For example, the DMD (dystrophin) and the CACNA1C (calcium channel, voltage-dependent, L type, α 1C subunit) genes, both belong to the enriched GO category “synapse. ” They interact with each other by coexpressing with AVPR1B (arginine vasopressin receptor 1B) and THRA (thyroid hormone receptor, α)—two hormone-related genes that influence cerebral metabolism and cognitive function (Esaki et al., 2003; DeVito et al., 2009).
This directed acyclic graph (DAG) shows the gene ontology categories that are enriched in SNPs that are found in the genetic network described here. SNPs from the genetic network in Figure 4 were annotated as belonging to a set of 101 genes. These genes were further annotated and organized into enriched GO categories using WebGestalt2 (Zhang et al., 2005; Duncan et al., 2010), with all genes in the human genome as a reference (see Materials and Methods). These enriched GO categories were hierarchically organized into a DAG tree, where each box in the tree lists the name of the GO category, the number of genes in that category, as well as the FDR-adjusted p value (adjP) if the enrichment is significant. Categories in red are enriched ones (adjusted p value <0.05), while those in black are non-enriched.
An interaction network involving some of the genes we recovered, i.e., those influencing synaptic integrity in prior studies. We used the program, GNCPro (SABiosciences), to analyze the expression relationships between the seven genes belonging to the enriched gene ontology category “synapse” (see Fig. 5), visualized as a gene interaction network (Liu et al., 2010). These seven genes are represented by solid dark purple circles; they are as follows: DMD, CACNA1C, CTBP2, FAIM2, SYT17, SYN3, and GRM8. See Table 5 for the full name and function of these genes. To detect possible indirect interactions among these genes, e.g., where a gene influences the expression of another gene via some intermediate genes, any other genes found to interact with any of the seven synapse genes were included in the analysis as neighboring genes, represented by gray lozenge nodes. These neighboring genes do not necessarily contain SNPs of the genetic network in Figure 4. The edges of the network indicate various types of interactions among the genes, represented as solid and dotted lines in color as follows: solid red lines, downregulation (the product of a gene downregulates the other one); solid purple lines, coexpression (the two genes are expressed at the same time); solid brown lines, physical interaction (products of the two genes are known to bind one another); dotted aquamarine lines, predicted protein interaction (products of the two genes are predicted to bind one another); dotted purple lines, predicted transcription factor regulation (a transcription factor has a predicted binding site for the promoter of another gene). See http://www.genecards.org/ for the full name of the genes represented by lozenge nodes.
We found that network connectivity between SNPs determined their genetic interactions. Figure 7a shows that SNPs with more connections in the network had on average greater number of interactions with other SNPs. In other words, hub SNPs in the network are more likely to interact with other SNPs in their effects on FA than nonhub ones. This trend is clear in Figure 7b, where the number of interactions for an SNP with degree k is weighted by the probability of finding such an SNP. The fitted curve in Figure 7b shows that hub SNPs in the network are very promising candidates for genome-wide searches for genetic interactions (GWIA), as the likelihood for detecting interactions for these SNPs with others is raised to nearly the second power of their connectivity degree. Figure 8 displays a map of interactions between two hub SNPs at different chromosomes, rs10091460 and rs10894606 (the former is on chromosome 8 and the latter on chromosome 11; both are genome-wide significant). The voxel-based analysis shows that interactions between these two SNPs are highly significant across major white matter tracts, especially in the superior corona radiata and superior longitudinal fasciculus bilaterally.
Connectivity between SNPs predicts their interactions. Each bar in a indicates how many other SNPs, on average, interact with a single SNP with degree k, in terms of their statistical influences on white matter integrity. Two SNPs are defined as statistically interacting with each other if the significance p value of βint in any of the 18 ROIs is less than the threshold that controls the BH-FDR within 5%, as described in Materials and Methods. The fitted line in the bar graph shows that if an SNP has more connections, it is more likely to interact with other SNPs on affecting FA (y-coordinate = 15.91 + 5.26k; Pearson R2 for line fitting = 60%; Spearman's rank correlation coefficient = 0.76, p = 1.4 × 10−4). Even so, considering that the frequency of SNPs with k connections follows a power-law distribution with respect to k (Fig. 3), in (b), we developed an interaction index, which was the ratio of the average number of SNPs that interacted with SNPs with k connections (i.e., the y-coordinate in plot a), over the number of SNPs with k connections. This interaction index may be thought of as the frequency of interactions for an SNP with k connections, conditioned on the occurrence of that SNP. The interaction index is approximately proportional to the square of k (y-coordinate = 0.11 × k2.10; Pearson R2 for the power curve fitting = 90%, while fitting to a line gave R2 = 53% only; Spearman's rank correlation coefficient = 0.96, p < 10-6), indicating that interactions between SNPs are much more likely to be detected for SNPs with higher connectivity.
Interactions between two example SNPs, rs10091460 and rs10894606. These maps show regions where the two SNPs, rs10091460 and rs10894606, significantly interact with each other in affecting white matter integrity. Voxelwise significance p values were color-coded and displayed on a log10 scale. We used the topological FDR method to correct for multiple comparisons across the significant clusters; these were defined as connected sets of voxels with the standard Normal variate Z > 3, where Z corresponds to the voxelwise p value. Only clusters that passed the topological FDR ≤ 0.05 threshold are displayed. MNI coordinates (in mm) of the slices, in radiological orientation, are shown below each slice. Interactions between these two SNPs are highly significant across major white matter tracts, especially in the superior corona radiata (red arrow) and superior longitudinal fasciculus (magenta arrow) bilaterally.
We next investigated the functional roles of the hub SNPs by testing their interaction effects on FA-IQ associations. Six of the nine hub SNPs indirectly influenced performance intellectual performance by modulating the associations between white matter anisotropy and performance IQ (Fig. 9). This modulatory effect was significant in white matter tracts related to IQ (Chiang et al., 2009b), such as the fornix (rs10091460 and rs4872262), genu and splenium of the corpus callosum (rs17038289), cingulum (rs17038289), the anterior limb of the internal capsule (rs10091460 and rs4872262), optic radiation (rs2012223), superior longitudinal fasciculus (rs7839054), and superior corona radiata (rs4245110 and rs17038289). The sign of βint indicates the direction of the modulatory effects of these SNPs. We found βint > 0 for rs4245110 and rs2012223, meaning that the addition of the major allele of these SNPs was associated with an increase in slope for the PIQ (on the x-axis) regressed against FA (on the y-axis). By contrast, rs7839054, rs10091460, and rs4872262 had a value of βint < 0, indicating that the addition of the major allele of these SNPs was associated with decrease in slope of the FA-PIQ regression. By contrast with the other SNPs, rs17038289 showed both directions of genetic modulation, with βint > 0 in the cingulum and the genu and splenium of the corpus callosum, and with βint < 0 in the superior corona radiata.
Interaction effects of the hub SNPs on the correlation between white matter integrity and performance IQ. These maps show regions where the hub SNPs significantly interact with associations between white matter integrity and performance IQ, displayed as clusters that passed the topological FDR ≤ 0.05 threshold. Voxels in these clusters were color-coded by their p value (on a log10 scale) and also the direction of the interaction effect (red-yellow for βint > 0, and blue-green for βint < 0). MNI coordinates (in mm) of the slices, in radiological orientation, are shown below each slice. SCR, Superior corona radiata; SLF, superior longitudinal fasciculus; ALIC, anterior limb of the internal capsule; CC, corpus callosum; Cglm, cingulum; OR, optic radiation.
Discussion
In this study, we developed a new approach to identify networks of genetic variants that influence brain fiber integrity and intellectual performance in 472 young healthy adults. Voxel-based analysis is popular in imaging: when applied to genetic questions, it has identified genetic effects on regions of the brain without the need for an a priori hypothesis (Hulshoff Pol et al., 2006; Chiang et al., 2009b). Even so, directly performing GWA for imaging measures voxel-by-voxel may lose considerable power due to the sheer number of multiple tests—across both the image and the genome. Here we combined voxel-based and ROI approaches for GWA. Using genetic correlation to find voxels with common genetic influences, we defined ROIs, whose component voxels had FA values influenced by a highly overlapping set of genes (quantified by the genetic correlation matrix, rg(x,y), that compares all pairs of voxels). We detected 24 SNPs in 6 genes from the whole genome with significant effects on white matter integrity, after a highly conservative Bonferroni-type correction for multiple comparisons across the genome and across all ROIs.
The genetic network underlying white matter integrity
We then recovered a network of SNPs that influence the brain by linking SNPs with correlated effects on white matter integrity. The correlation coefficient between the phenotypic effects of two genetic loci measures the similarity of their expression patterns (Eisen et al., 1998). We further assumed that if two SNPs are connected in a network to affect the brain, the spatial distribution of their influences on brain integrity will also be similar, i.e., their partial regression coefficients (β) with respect to FA will be correlated. This correlation-based approach has been used in gene expression profiling, where genes with correlated expression levels in cDNA microarrays have been linked to form gene coexpression networks (Zhang and Horvath, 2005; Ghazalpour et al., 2006). We note that this approach to gene expression networks inspired our work here on SNP networks that influence brain images. Such gene coexpression networks in various organisms, e.g., mouse or yeast, have been shown to exhibit scale-free topology (Carlson et al., 2006; Ghazalpour et al., 2006). We note that a genetic network based on correlations between gene expression levels may better describe genetic influence on phenotypic traits than networks based on SNPs. Essentially, genes are functional units of heredity, and it is not always clear how SNPs impact gene expression. Nevertheless, the multivariate nature of a gene-wide association analysis makes it computationally intensive, especially when applied to imaging brain phenotypes (Hibar et al., 2011).
The genetic network influencing white matter integrity is both small-world and scale-free
The genetic network underlying white matter integrity exhibited a small-world topology. In a small-world network, interactions among network nodes (SNPs) are routed through “hub” nodes of the network, which makes the mean path length of the network smaller. Small-world networks are thus more efficient than random networks (Barabási and Oltvai, 2004). When the genetic network was defined by including only SNPs with strong correlations in their effects on the brain, the network became ultra-small, with a so-called “scale-free” topology, with connection probabilities approximately following a power law. Evolutionarily, hub genes (SNPs) in the scale-free genetic network may originate from gene duplication that follows the “preferential attachment,” or “rich-gets-richer” principle (Barabási and Oltvai, 2004). For example, when any gene linked to a hub gene is duplicated, since the duplicated gene shares the same function as the original gene to some degree, the connectivity degree (k) of the hub gene will increase by 1 (i.e., rich-gets-richer). On the contrary, an isolated gene (k = 1) may increase its number of connections with other genes only when it is duplicated itself, which is a low probability event. Scale-free genetic networks may be evolutionarily advantageous: they are relatively error tolerant, and may remain stable even if their nodes are randomly damaged (Albert et al., 2000). As white matter integrity is critical for normal brain function, its underlying genetic network may be quite robust, offering resistance to environmental stressors. The scale-free property observed in our genetic network has been reported for several other kinds of biological networks, e.g., in gene or protein interactions, and even in the internet and in other social networks (Ravasz et al., 2002). Brain connectivity, assessed using functional MRI, may also follow a scale-free distribution (Eguíluz et al., 2005).
Functional significance of the genetic network
The gene ontology enrichment analysis indicates that the genetic network for white matter integrity is also critical for neural transmission, affecting the integrity of synapses and plasma membranes, and the function of sodium-channel complexes. Moreover, hub SNPs of the network, i.e., SNPs that were highly connected with other SNPs, belong to genes crucial for normal brain function (Tables 4, 5): the OPCML gene is a member of the IgLONs glycoprotein gene family, modulating cell—cell recognition and axonal migration (Reed et al., 2007). SCN3A encodes the α subunit of the voltage-gated sodium channel. Sodium channels mediate the voltage-dependent permeability of excitable membranes to sodium ions, facilitating the generation and propagation of action potentials in neurons (Malo et al., 1994). The HADH protein is one of the enzymes in the fatty acid β-oxidation pathway, which catalyzes the conversion of 3-hydroxyacyl-CoA to 3-oxoacyl-CoA, a key step in fatty acid metabolism during brain development (Reichmann et al., 1988). This gene is also involved in amyloid β-peptide binding in Alzheimer's disease (He et al., 1998).
The full name and function (if known) of the gene symbols referred in this paper
These hub SNPs influenced intellectual performance by modulating the associations between performance IQ and the integrity of the fiber systems that are part of visuospatial circuits (Chiang et al., 2009b; Kravitz et al., 2011), including the optic radiations (Tuch et al., 2005), fornix (Kwok and Buckley, 2010), corpus callosum (Begré et al., 2007), cingulum bundle (Bush et al., 2000), and the superior longitudinal fasciculus, which is associated with spatial working memory performance in children (Vestergaard et al., 2011). Human intelligence is highly heritable; a substantial proportion (40–50%) of the variation in IQ is genetically determined (Davies et al., 2011). However, effects of individual SNPs on IQ are very small (∼1% of variation in IQ), indicating that genetic variants may influence intellectual performance via the accumulated effects of multiple SNPs (Davies et al., 2011), or, as found here, as a modulatory factor that interacts with both white matter integrity and IQ. We also found that the same genotype of an SNP, e.g., rs17038289, could impose both positive and negative modulatory effects on FA–IQ associations in different brain regions. This may be attributed to differences in the microenvironment of the brain area where a gene is expressed (Posner and Rothbart, 2005; Lein et al., 2007; Liang et al., 2007). A similar example of a region-specific gene effect has also been found for the Val66Met polymorphism of the brain-derived neurotrophic factor (BDNF), a gene crucially related to cognitive performance. Dominant homozygote genotypes (Val/Val) were associated with greater volume in the hippocampus and the prefrontal cortex (Pezawas et al., 2004), but with lower integrity in major white matter tracts such as the corpus callosum and the optic radiation (Chiang et al., 2011).
Network connectivity predicts genetic interactions
The SNP network detected in Figure 4 may help to guide the search for genes that interact to influence white matter integrity. The connected SNPs, especially those at network hubs, strongly interact with each other in their effects on white matter integrity. Finding interactions between genetic variants may help in explaining their effects on normal phenotypes or disease expression (Prata et al., 2009); higher-order interactions between genetic variants are typically overlooked in GWA designs, which typically seek associations between traits and individual SNPs one at a time (Becker et al., 2011). To address this, genome-wide interaction analysis (GWIA) has become popular in recent years, to search for possible interactions between pairs of SNPs on traits of interest (Marchini et al., 2005; Becker et al., 2011). Even so, the computational requirements to apply GWIA to imaging studies are currently prohibitive. For a single voxel, a 500,000-SNP GWA requires 1.25 × 1011 statistical tests to detect SNP interactions. There is a concomitant loss in power as more stringent corrections are needed for the vast number of tests involved in GWIA versus GWA (Becker et al., 2011). Computational effort may be wasted in such a brute-force search, as most genomic SNPs have no detectable phenotypic effect, both here and in other GWA (Visscher, 2008; Stein et al., 2010). In our analyses, the network connectivity of a given SNP predicts its likelihood in interacting with other SNPs. Our method offers a simpler approach to GWIA and selects candidate SNPs by detecting correlated effects on FA; computing correlations between SNPs' partial regression coefficients (β) with respect to FA is much easier than testing how every SNP pair might interact, using a second-order regression. This may boost power to detect interactions, as promising interactions are prioritized based on advantageous search criteria.
Our method will likely empower work in imaging genomics and connectomics (e.g., the Human Connectome Project). Specifically, associations between the SNPs detected here and FA, and the modulatory effects of the hub SNPs on associations between FA and IQ, require replication. While our study was under review, a related study was published that used clustering of the genetic correlation matrix, in twins, to define brain regions with common genetic determination (Chen et al., 2012). In that paper, the authors partitioned the cortical surface, modeled using anatomical MRI, into several regions that had partially overlapping genetic determinants. It was speculated that using these genetically clustered regions may boost the power for subsequent genome-wide analysis using GWAS. However, that study did not perform GWAS on the clustered cortical regions. In our current study, we have shown that a related method for genetic clustering can indeed be applied to diffusion images of the brain, and when GWAS was performed on the resulting clusters, the method yielded promising genome-wide associations. We further showed that the resulting SNP network showed a characteristic network topology, with scale-free and small-world properties, and implicated specific molecular pathways in driving brain integrity. Rare DNA sequence variations that contribute to genetic influences on FA may go undetected in GWA even when sample sizes are very large (Schork et al., 2009). When data from large independent cohorts become more widely available, for example via the Enigma project (http://enigma.loni.ucla.edu), two advantages of this genetic network analysis will be apparent: (1) genetic correlation can tap into the natural latent structure of gene action in a brain image; (2) voxel clustering by genetic affinity should boost power to find SNPs with correlated effects in genome-wide scans. SNP aggregation can be used for dimension reduction, so the resulting networks are strong candidates for verification and replication, without the computational and statistical burden of a full image-wide genome-wide search.
Footnotes
This study was supported by Grant RO1 HD050735 from the National Institute of Child Health and Human Development, and Project Grant 496682 from the National Health and Medical Research Council, Australia. The collection of IQ data and zygosity typing was supported by the Australian Research Council (A7960034, A79906588, A79801419, DP0212016). M.-C.C. is supported by the National Science Council (NSC 100-2218-E-010-001-MY2) and the Brain Research Center, National Yang-Ming University (a grant from Ministry of Education, Aim for the Top University Plan, 100AC-B12), Taiwan. G.I.d.Z. is supported by an Australian Research Council Future Fellowship (FT0991634). Additional support for algorithm development was provided by the NIA, NIBIB, and the National Center for Research Resources (AG016570, EB01651, RR019771 to P.M.T.). We are also grateful to the twins for their willingness to participate in our studies, and research nurses, Marlene Grace and Ann Eldridge, Queensland Institute of Medical Research, for twin recruitment.
The authors report no competing financial interests.
- Correspondence should be addressed to Dr. Paul Thompson, Laboratory of Neuro Imaging, Department of Neurology, UCLA School of Medicine, 635 Charles E. Young Drive South, Suite 225E, Los Angeles, CA 90095-7332. thompson{at}loni.ucla.edu