A major challenge in neuroscience is finding which genes affect brain integrity, connectivity, and intellectual function. Discovering influential genes holds vast promise for neuroscience, but typical genome-wide searches assess approximately one million genetic variants one-by-one, leading to intractable false positive rates, even with vast samples of subjects. Even more intractable is the question of which genes interact and how they work together to affect brain connectivity. Here, we report a novel approach that discovers which genes contribute to brain wiring and fiber integrity at all pairs of points in a brain scan. We studied genetic correlations between thousands of points in human brain images from 472 twins and their nontwin siblings (mean age: 23.7 ± 2.1 SD years; 193 male/279 female). We combined clustering with genome-wide scanning to find brain systems with common genetic determination. We then filtered the image in a new way to boost power to find causal genes. Using network analysis, we found a network of genes that affect brain wiring in healthy young adults. Our new strategy makes it computationally more tractable to discover genes that affect brain integrity. The gene network showed small-world and scale-free topologies, suggesting efficiency in genetic interactions and resilience to network disruption. Genetic variants at hubs of the network influence intellectual performance by modulating associations between performance intelligence quotient and the integrity of major white matter tracts, such as the callosal genu and splenium, cingulum, optic radiations, and the superior longitudinal fasciculus.
A major goal in neuroscience is to identify specific genes that affect brain structure and function, as well as genes that modulate risk for neuropsychiatric disease (Gandhi and Wood, 2010). Using genome-wide association (GWA) scans, scientists can now link genetic markers across the whole genome with biological measures in human populations, to search for common genetic variants that associate with higher risk for disease, e.g., abnormalities in the ApoE gene (Schuff et al., 2009) or the CLU gene (Braskie et al., 2011) associated with late-onset Alzheimer's disease. GWA can help to identify contributors to the mechanisms of neuropsychiatric disease, which can be pursued to identify molecular targets for treatment. Even so, standard GWA studies ignore interactions between genes (Manolio et al., 2009; Becker et al., 2011); the power of GWA is also rather low as one needs to correct for approximately a million hypotheses tested across the genome, so vast samples of subject need to be assessed.
Recently, the cDNA microarray method has been widely used to identify gene coexpression networks where the component genes are “clustered” into groups that jointly affect complex traits (Ghazalpour et al., 2006). The use of clustering may reveal evidence of collective effects of networks of genes on brain pathology. For example, some gene sets associated with neuronal integrity may show altered expression patterns in schizophrenia patients (Torkamani et al., 2010). Nevertheless, in situ DNA sampling and analysis with microarrays may only be feasible in the postmortem brain; moreover, the spatial scope and resolution in tissue sampling across the brain are limited.
In this study, we developed a novel approach to gene discovery in the living brain, outlined in Figure 1. We examined networks of genes that influence brain integrity measured by diffusion tensor images (DTI). DTI is a variant of standard MRI, measuring the 3D directionality of water diffusion at each location in the brain; fractional anisotropy (FA), derived from DTI, is a widely accepted measure of fiber tract coherence and integrity (Basser et al., 1994). By adapting classical twin methods and hierarchical clustering to DTI scans from 472 twins and their nontwin siblings, we identified patterns of brain regions where white matter integrity, measured by FA, was under strong and relatively homogeneous genetic control. We then performed GWA to identify single nucleotide polymorphisms (SNPs) associated with FA in these regions. We discovered SNP pairs that tended to jointly affect brain regions. By applying network topology analysis to this SNP association matrix, we isolated an SNP network influencing white matter integrity in the living brain. This gene network showed small-world and scale-free properties, with several dominant hubs and structured subnetworks. These configurations may make the network biologically economical and more resistant to disruption than random networks (Albert et al., 2000; Rubinov and Sporns, 2010). Hub SNPs in the network interacted with each other in their effects on white matter integrity. They may also influence intellectual performance by modulating the associations between white matter integrity and the performance intelligence quotient (IQ).
Materials and Methods
Five-hundred and thirty-one healthy adult subjects (age: 23.7 ± 2.1 years, mean ±SD; age range: 20–29 years; sex: 217 male/314 female)—consisting of pairs of twins and their nontwin siblings—were recruited from 271 different families. Twin subjects were recruited to study genetic effects that underlie different traits, by comparing the similarity of monozygotic (MZ) and dizygotic (DZ) twins/siblings. All subjects received high-resolution brain MRI and neurocognitive evaluations as part of a 5-year research project evaluating healthy Australian twins. The projected sample size for the adult study is 1150 at completion (de Zubicaray et al., 2008). As described previously (Chiang et al., 2009b), zygosity was established objectively by typing nine independent DNA microsatellite polymorphisms (polymorphism information content >0.7), using standard PCR methods and genotyping. Results were cross-checked with blood group (ABO, MNS, and Rh), and phenotypic data (hair, skin, and eye color), giving an estimated overall probability of correct zygosity assignment >99.99%. Zygosity was subsequently confirmed by GWA. All subjects were screened to exclude left-handers and cases of pathology known to affect brain structure, a history of significant head injury, a neurological or psychiatric illness, substance abuse or dependence, or a psychiatric disorder in any first-degree relative. The study was formally approved by the Institutional Review Boards of the Queensland Institute for Medical Research and the University of California, Los Angeles. Written informed consent was obtained from all participants before any experimental procedures were performed.
Genotyping for subjects in this study is part of a larger genotyping project that includes >4400 twins and their siblings, designed to identify melanoma risk factors (Wright and Martin, 2004; Medland et al., 2009). DNA was isolated from blood cells using standard protocols. Genomic DNA samples were analyzed on the Human610-Quad BeadChip (Illumina). Of single nucleotide polymorphisms (SNPs), 592,392 were genotyped. SNPs that did not satisfy the following quality control criteria were excluded: genotype call rate <95%, significant deviation from Hardy–Weinberg equilibrium p <10−6, minor allele frequency <0.01, or only one allele was observed, and a platform-specific recommended quality control score (BeadStudio GenCall score) <0.7. After quality control, 529,497 SNPs remained. Genotype data were collected for 484 of the 531 subjects who received brain MRI scanning. We excluded 12 subjects who were identified to be ancestry outliers (Medland et al., 2009) from the subsequent genome-wide association study (GWAS), so there were 472 subjects with available image and genome-wide SNP data (age: 23.7 ± 2.1 years, mean ± SD; age range: 20–29 years; sex: 193 M/279 F). The participants' demographic information and the twin/sibling family composition are summarized in Table 1.
Evaluation of psychometric intelligence.
General intellectual ability was assessed at age 16 [as in the study by Luciano et al. (2003)] using the Multidimensional Aptitude Battery (MAB) (Jackson, 1984), a measure highly correlated with the Wechsler Adult Intelligence Scale. MAB is designed for assessment of adults and adolescents aged 16 and older. In this study, we examined three verbal (information, arithmetic, and vocabulary) and two performance (spatial and object assembly) sub tests. Each subtest gave a raw score, and verbal (VIQ), performance (PIQ), and full-scale (FIQ) intelligence quotient standardized scores were derived from these sub tests. IQ data were available for 456 participants (Table 1).
Image processing and registration.
All MR images were collected using a 4 tesla Bruker Medspec MRI scanner (Bruker Medical), with a transverse electromagnetic (TEM) headcoil, at the Center for Advanced Imaging (University of Queensland, Australia). Diffusion-weighted scans were acquired using single-shot echo planar imaging with a twice-refocused spin echo sequence, to reduce eddy-current induced distortions. Imaging parameters were as follows: 21 axial slices (5 mm thick), FOV = 23 cm, TR/TE 6090/91.7 ms, 0.5 mm gap, with a 128 × 100 acquisition matrix. Thirty images were acquired: three with no diffusion sensitization (i.e., T2-weighted images) and 27 diffusion-weighted images (b = 1145.7 s/mm2) with gradient directions evenly distributed on an imaginary hemisphere. The reconstruction matrix was 128 × 128, yielding a 1.8 × 1.8 mm2 in-plane resolution. Total scan time was 3.05 min. We used the FMRIB software library (FSL, http://www.fmrib.ox.ac.uk/fsl/) to preprocess and linearly align the diffusion images. For each participant, motion artifacts were corrected by linearly registering all the T2-weighted and diffusion-weighted images to one of the T2-weighted images (the “eddy_correct” command). Then the three T2-weighted images were averaged and stripped of nonbrain tissues to yield a binary brain extraction mask (cerebellum included), using the Brain Extraction Tool (BET) (Smith, 2002), followed by expert manual editing, if necessary. The masked T2-weighted image was then registered to a standardized high-resolution brain MRI template defined in the International Consortium for Brain Mapping space (ICBM) (Holmes et al., 1998) with a nine-parameter linear transformation using the software FLIRT (Jenkinson and Smith, 2001). The resulting transformation parameters were used to rotationally reorient the diffusion tensors (computed from diffusion-weighted images using the “DTIFIT” command) at each voxel (Alexander et al., 2001). The tensor-valued images were linearly realigned based on trilinear interpolation of the log-transformed tensors (Arsigny et al., 2005) and resampled to isotropic voxel resolution (with dimensions: 128 × 128 × 93 voxels, resolution: 1.7 × 1.7 × 1.7 mm3). A fractional anisotropy (FA) map was then constructed from each affine-registered diffusion tensor image. FA is defined as the ratio of the SD to the root mean square of the eigenvalues of a diffusion tensor (Basser and Pierpaoli, 1996). Each participant's FA image was then registered to a mean FA image computed for the first 258 subjects scanned [a subset of the study sample in this paper; (Chiang et al., 2009a)]. For this alignment step, we used a validated fluid registration algorithm that maximizes the Jensen–Rényi divergence of the joint intensity histogram of the two FA images (Chiang et al., 2007).
We averaged the fluidly registered FA images across all subjects and restricted subsequent data analysis to regions with average FA > 0.2, as recommended by Smith et al. (2006). This focused our regions of interest on major white matter fiber structures. Each participant's FA map was smoothed using an isotropic Gaussian filter with full-width at half maximum (FWHM) = 6 mm (Smith et al., 2006).
Defining regions of interest for GWA.
We defined regions of interest (ROIs) for GWA from brain regions where genetic influences on white matter integrity were high, based on a standard structural equation model (SEM) to model the observed variation of FA, which is widely used in twin studies. The SEM partitions the observed variance into components (proportions of the overall variance) due to additive genetic factors (A), shared environment (C), and unique environment (E). In our sample, where nontwin siblings are also included, we used the extended twin design (Posthuma et al., 2000; Lenroot et al., 2009) where the covariance matrix for FA was modeled for each family based on the known genetic similarity between relatives. The parameters of the SEM were estimated using a maximum likelihood scheme to maximize their fit (Neale et al., 1992). The significance map for the genetic factors, p(A), was determined as minus two times the difference between the log-likelihood of the full (ACE) and the restricted (CE) model at each voxel, which is asymptotically distributed approximately as a χ2 distribution with one degree of freedom. p(A) was further assessed using the false discovery rate method by Benjamini and Hochberg (1995) (BH-FDR) to correct for multiple comparisons. We selected voxels where the genetic component A contributed to at least 60% of the total variation in FA and its significance passed the BH-FDR ≤ 0.05 threshold. This led to 4876 candidate voxels for the ROIs (5% of the total 99,480 voxels). By doing this, we ensured that the main effect (i.e., genetic influences on FA) for the genetic correlation analysis below was significant, and no more than 5% of the voxels declared as genetically influenced are likely to be false positive findings.
We estimated the genetic correlation coefficient rg between any two voxels, using the cross-trait cross-twin method (Chiang et al., 2009b), yielding a 4876-by-4876 genetic correlation matrix. rg is the correlation between the genetic effects influencing two traits x and y, i.e., Cov(Gx,Gy)/, where Gx and Gy are the genetic effects that influence traits x and y, respectively. Var(Gx) and Cov(Gx, Gy) denote the variance of Gx and the covariance of Gx and Gy. Note that rg is not the proportion of total covariance due to genes, i.e., Cov(Gx,Gy)/ [Cov(Gx,Gy) + Cov(Ex,Ey)], where Ex and Ey are respectively the environmental effects that influence traits x and y; this term can be quite small even if the genetic correlation is very large (even unity). We then identified clusters of voxels that were highly similar in their genetic determination, and defined these as our ROIs, using the hierarchical clustering method in the Matlab Statistics Toolbox (The MathWorks). We assumed that it would be more likely to detect associations between SNPs identified across the genome and FA in these ROIs than in other regions, as the FA of voxels in an individual ROI was strongly determined by a common set of genes. Hierarchical clustering initially treated each of the 4876 voxels as a node in a network. It began with each node as a single cluster, and merged every two nodes with the highest similarity, or the shortest distance, into a successively larger cluster. Genetic similarity between any two nodes was quantified by the topological overlap (TO) index, which was based on rg and measured the interconnectivity between the two nodes (Ravasz et al., 2002; Zhang and Horvath, 2005), as follows: Here aij = 0 or 1 is the adjacency function between nodes i and j. aij = 1 if the genetic correlation rg between them is not <0.95, and indicates that the two nodes are connected. is the number of nodes connected to both nodes i and j, and is the degree of node i (Rubinov and Sporns, 2010). TOij lies between 0 and 1; a higher value of TOij indicates that nodes i and j are linked to more common neighbors. We used the TO index for network clustering, as it has been shown to provide more distinct aggregations of network nodes than direct correlation (rg) (Zhang and Horvath, 2005). We defined the distance between nodes i and j as dij = 1 − TOij. The distance between two clusters that contain more than one node was given by the average distance across all pairs of nodes in the two clusters. Finally, we created a hierarchical tree of clusters at different distance levels (Fig. 2). Here, a cluster at level d means that the distance between every pair of all the subclusters it contains is not greater than d. By setting a threshold d = 0.5, we obtained 233 clusters. We then selected those of no less than 50 voxels (246 mm3) in size as our ROIs; 18 ROIs were identified (Fig. 2). Setting a high threshold on ROI size might bias gene hunting in favor of SNPs with extensive effects on white matter integrity but ignore those with strong associations with FA in small regions only. Even so, we assumed that effects detected in larger ROIs are more likely to be neurophysiologically meaningful, especially when detected in the presence of imaging noise.
GWA for white matter integrity.
The mean FA for each ROI was associated respectively with each SNP across the whole genome (n = 529,497) using the Quantitative Transmission Disequilibrium Test (qTDT) model, where the phenotype (mean FA of the ROI here) was regressed against the SNP genotypic value, with variance components that accounted for within-family phenotypic correlations included in the model (Abecasis et al., 2000). If an SNP has two alleles A and a, and allele A is the major allele that appears more frequent than allele a, then the SNP genotypic value equals 1 for AA, 0 for Aa, and −1 for aa. Subjects' age and sex were included as covariates. qTDT tests the significance of the association between FA and an SNP genotype effect by comparing the difference in log-likelihood functions between the full model (the SNP effect was included) and the restricted (the SNP effect was excluded) model. To increase power for detecting associations, we used the whole genotypic value of the SNPs for the association analysis (total association), instead of dividing the genotypic value into between-family and within-family components (Abecasis et al., 2000). Since the ethnic outliers had been removed from GWA, the risk for detecting spurious associations due to population stratification was minimized.
Corrections for multiple comparisons.
We adopted a two-stage approach to correct for multiple comparisons across the genome and across the image. For associations between FA and SNPs, we first used the traditional Bonferroni method to correct for multiple comparisons across the genome (Lander and Kruglyak, 1995): SNPs with a p value <0.05/(number of SNPs across the genome) = 0.05/529,497 = 9.4 × 10−8 were declared to be significantly associated with FA. For each of these significant SNPs, we then further corrected for multiple comparisons across the ROIs using the bootstrap method (Rhodes et al., 2002). The p value of the SNP was converted to a Z value using the inverse normal cumulative distribution function. Then the p value adjusted for multiple comparisons, denoted by q, was derived by comparing the sum of Z values across all the 18 ROIs for that SNP with the bootstrap distribution of 100,000 sums of Z values randomly selected from the 529,497 SNPs. An SNP with q < 0.05 was considered to reach overall significance.
Genetic interconnection network and network topology.
We then extended our analysis from detecting the effects of individual genetic loci at single locations in the image, to constructing a genetic network that underlies white matter integrity over multiple brain regions. Our methods were inspired by previous work deriving gene coexpression networks (Zhang and Horvath, 2005; Ghazalpour et al., 2006) but modified to deal with the case where SNPs in genes may influence multiple phenotypes (different parts of a brain image). In the network scheme, the connectedness between SNPs was derived from the correlations between their phenotypic effects. We selected SNPs whose association with FA in at least one ROI was with a significance p value < 10-5; this criterion was somewhat relaxed (compared with the very strict GWA-significant threshold with p < 9.4 × 10−8) to include more SNPs into the network analysis and might therefore boost sensitivity for detecting interactions of gene effects. Three-hundred and twenty-nine SNPs were selected. We then computed the Spearman's rank correlation coefficient ρ for every pair of SNPs, based on their partial regression coefficients with respect to FA across all the 18 ROIs. The partial regression coefficient β for an SNP represents change in FA due to the unit additive effect of the major allele. β could be positive, zero, or negative, depending on the direction of the SNP major allele effect on FA. We used Bi to denote the vector that contains β values across all the 18 ROIs for SNP i. SNPs i and j were considered to be connected (i.e., aij = 1) if (1) the correlation between Bi and Bj, denoted by ρij was no less than a threshold ρT, and, (2) the two SNPs were not in linkage disequilibrium (LD), defined as a pairwise LD r2 for the SNPs ≤ 0.2 (Hill and Robertson, 1968). The latter criterion ensured that the connectedness between these two SNPs was not simply due to their positional proximity in the chromosomes, which was reflected by a higher LD between them. The topological configuration of the genetic network was a function of ρT. Here, we tested different ρT values from 0.4 to 0.9, with increments of 0.1. For each ρT, we computed three common measures of the network: the degree of each node (SNP), the clustering coefficient C, and the characteristic path length L (Watts and Strogatz, 1998; Bullmore and Sporns, 2009; Rubinov and Sporns, 2010). The clustering coefficient C is a measure of segregation in the network; a higher value of C indicates that the nodes of the network tend to form isolated clusters (Watts and Strogatz, 1998). On the other hand, the characteristic path length L is an index of integration in the network. It measures the average shortest path length between all pairs of nodes in the network; a smaller L indicates that the topological distance between the segregated clusters in the network is shorter, or the integration between these segregated clusters is more efficient. When a network is simultaneously highly segregated and integrated, it is in a small-world topology (Rubinov and Sporns, 2010). This small-world property is quantitatively estimated by the “small-worldness” of the network (Humphries and Gurney, 2008): S = (C/Crand)/(L/Lrand). Here, Crand and Lrand are the mean clustering coefficient and the characteristic path length of Nr random networks, where each random network was generated by randomly rewiring the links of the original network but keeping the distribution of the degree of the nodes unchanged (Maslov and Sneppen, 2002). Nr was set to 1000. A network is deemed a small-world if S > 1 (Humphries and Gurney, 2008).
We further tested whether the genetic network was also in a scale-free topology. The degree distribution of a scale-free network follows a power law, i.e., f(k) ∝ k−λ, where f(k) is the frequency of the nodes that have degree k (Barabasi and Albert, 1999). A scale-free network is known to be more tolerant of random errors (Albert et al., 2000).
Gene ontology enrichment analysis.
To identify possible functional roles for the genes in the network, we performed gene ontology (GO) enrichment analysis using the web-based Gene Set Enrichment Analysis Toolkit (WebGestalt2) (Zhang et al., 2005; Duncan et al., 2010). The 329 SNPs selected for network analysis were first annotated to corresponding genes, where possible, by searching the NCBI dbSNP database (http://www.ncbi.nlm.nih.gov/SNP/batchquery.html), yielding a total of 101 genes. These genes were then compared with all genes in the human genome using the hypergeometric test, and the genes were assigned, where possible, to enriched GO categories under three main domains, cellular components, molecular functions, and biological processes. Each GO category contained no less than two genes. Multiple comparisons across these enriched categories were corrected using the BH-FDR method (Benjamini and Hochberg, 1995), with the significance level for the FDR-adjusted p value set to 0.05. The results were represented by an enriched directed acyclic graph (DAG).
Detecting genetic interactions between SNPs and interaction effects of SNPs on FA-IQ associations.
Finding connections between SNPs defined in the above section may help to search for epistasis (genetic interactions) between SNPs that affect white matter integrity, i.e., where the presence of one SNP may modify or interact with the effect of other SNPs. To explore this, we estimated the interaction between SNPs whose associations with FA were with a significance p value < 10−5 in at least one ROI (n = 329 SNPs), by adding an interaction term to the qTDT model: y = μ + xNβN + g1β1 + g2β2 + (g1 · g2) βint.
Here, y is the mean FA within an ROI. For voxel-based analysis, y is the FA of an individual voxel in the white matter mask (defined as FA > 0.2). μ is the overall mean of FA, and xN is a vector of covariates that includes the subjects' age and sex. g1 and g2 indicate the genotypic value of SNP1 and SNP2 respectively, and (g1 · g2) represents the interaction between the two SNPs.
We also tested whether the hub SNPs in the scale-free genetic network (see Results for the 9 hub SNPs) may affect IQ via its modulatory effect on the association between white matter integrity and IQ. Here, we estimated the interaction of a hub SNP with FA and the PIQ scale at every voxel within the white matter mask; we selected PIQ, as in our previous study (Chiang et al., 2009b) we found that the cross-trait correlation between FA and PIQ was strongly mediated by overlapping genetic factors. This was modeled by: y = μ + xNβN + g · β1 + PIQ · β2 + (g · PIQ) βint.
Here, g is the genotypic value of a hub SNP, PIQ indicates the participant's PIQ scale, and (g · PIQ) represents the interaction between the hub SNP and PIQ. The definition of y, μ, and xN are the same as in the previous regression equation for genetic interactions between SNPs.
βint ≠ 0 indicates a significant interaction effect, which was determined as minus two times the difference between the log-likelihood of the full model and the restricted (βint was excluded) model, which is asymptotically distributed approximately as a χ2 distribution with one degree of freedom. We then used the Bonferroni method to correct for multiple comparisons across SNPs, with the p threshold set equal to 0.05/ns = 1.7 × 10−6. Here, ns = 29,180 is the number of all possible pairs of the SNPs (= 329 × 328/2), minus those in linkage disequilibrium (LD r2 for the SNPs ≤ 0.2). To correct for multiple comparisons across the ROIs or the image volume, for the ROI analysis (for genetic interactions between SNPs), the interaction between two SNPs was regarded as significant, if the significance p value of βint in any of the 18 ROIs was less than the threshold that controlled the BH-FDR within 5%. For the voxel-based analysis (for genetic interactions between SNPs, and interactions of the hub SNP with FA and PIQ), as the statistical comparisons were performed in ∼100,000 voxels, we used a more stringent but accepted method, known as topological FDR (Chumbley and Friston, 2009), to control the expected proportion of false positive findings over all connected sets of voxels, i.e., clusters. We controlled the topological FDR of clusters within 5%, which means that only 5% of the clusters that are declared to be significant tend, on average, to be false positive findings.
To perform GWA for white matter integrity measured by FA, we first identified 18 ROIs using hierarchical clustering based on the 4876-by-4876 topological overlap (TO) matrix (Fig. 2). We assumed that there should be more power to detect associations between FA in these ROIs and SNPs identified from the whole genome, as the FA of voxels within an individual ROI was determined by a common set of genes. Twenty-four SNPs were detected in GWA (Table 2), which showed strong “genome-wide significant” associations with white matter integrity (FA), with an uncorrected p value <9.4 × 10−8, and an overall significance (across all 18 ROIs) q value <0.05.
Next, we were interested in how individual SNPs were correlated under a network configuration to affect brain integrity, or in other words, how the genetic network drives the brain network. We defined the nodes of the network as SNPs that were associated with white matter integrity with a significance p value < 10-5 in at least one ROI (number of nodes = 329). We then examined the topological properties of the genetic network, by increasing the correlation threshold ρT from 0.4 to 0.9 (Table 3). Every ρT yielded a small-world network (S > 1). However, when the network consisted of nodes with stronger connections only (ρT ≥ 0.8), S increased abruptly, as also shown in Figure 3a, suggesting that the efficiency of the genetic network was determined by SNPs with stronger correlations in their effects. For the scale-free property, given that the slope of the fitted line for log(f(k)) versus log(k) should be negative in a typical scale-free configuration, the genetic network can be considered to be scale-free only when ρT ≥ 0.7, where p values for the fit are also very small. If we further considered the effect size for the above correlation fitted by requiring a signed R2 > 0.8 (Zhang and Horvath, 2005), only ρT = 0.9 yielded a scale-free network by this definition, where f(k) ∝ k−1.58 (Fig. 3b, signed R2 = 0.93, p = 3.26 × 10−11). Signed R2 here was derived by adjusting the sign of the Pearson's coefficient of determination R2, or signed R2 = −sgn(λ)·R2, where sgn represents the sign function. In other words, if the fitted line for log(f(k)) versus log(k) has a negative (positive) slope λ, it would have a positive (negative) signed R2. Note that ρT = 0.9 also gave the highest small-worldness index (S = 6.70), which corroborated the theoretical prediction that scale-free networks are ultra-small (Cohen and Havlin, 2003).
Figure 4 visualizes the genetic network at ρT = 0.9. The visual layout of these SNPs is based on the Kamada–Kawai algorithm (Kamada and Kawai, 1989) implemented in the Pajek software [http://vlado.fmf.uni-lj.si/pub/networks/pajek/ (Batagelj and Mrvar, 1998)]. This SNP network obeys a scale-free topology. This means that most SNPs have no or only scarce connections with other SNPs. However, a small proportion of SNPs is highly connected with each other and they are therefore the hubs of the network. For example, 134 SNPs (41% of all 329 SNPs) are isolated and not connected with any other SNPs (k = 1). On the other hand, 9 SNPs (< 3% of all SNPs) are highly connected with other SNPs, and thus the hubs of the network. They are (Fig. 4a, labeled as red circles) rs11686777 (k = 20), rs10894606, rs4245110, and rs4491241 (k = 18), rs7839054 (k = 17), rs10091460 (k = 16), rs17038289, rs2012223, and rs4872262 (k = 15) (Table 4). rs10894606, rs10091460, rs17038289, and rs4872262 are also genome-wide significant. These hub SNPs were associated with FA in the ROIs identified in major white matter tracts (Fig. 4b). Specifically, almost all (8 of 9) hub SNPs were associated with white matter integrity in ROI 13 in the superior corona radiata, a white matter structure containing axons projecting to and from the cerebral cortex.
To determine the functional significance of the genetic network, we further used the program, WebGestalt2, to assess gene ontology (GO) enrichment for the 101 genes to which the nodes (SNPs) of the network were annotated. Figure 5 shows the directed acyclic graph showing patterns of genetic enrichment. Thirteen genes are enriched in “biological processes,” with their function specified as “cell adhesion.” Fourteen genes belong to the “molecular functions” domain, particularly for calcium ion binding. Forty-two genes are enriched in cellular components relevant to the integrity of nerve fibers: 7 of them are implicated in the integrity of synapses, 33 for the plasma membrane, and 2 for voltage-gated sodium channel complexes. We further analyzed the seven genes in our network that influence the integrity of synapses, using the GNCPro program (http://gncpro.sabiosciences.com/gncpro/gncpro.php). GNCpro is a visualization/data integration system that integrates collective biological knowledge and databases to visualize gene expression interactions as a network (Liu et al., 2010). Figure 6 visualizes various interaction patterns between these seven genes and their neighboring genes in a network configuration. For example, the DMD (dystrophin) and the CACNA1C (calcium channel, voltage-dependent, L type, α 1C subunit) genes, both belong to the enriched GO category “synapse. ” They interact with each other by coexpressing with AVPR1B (arginine vasopressin receptor 1B) and THRA (thyroid hormone receptor, α)—two hormone-related genes that influence cerebral metabolism and cognitive function (Esaki et al., 2003; DeVito et al., 2009).
We found that network connectivity between SNPs determined their genetic interactions. Figure 7a shows that SNPs with more connections in the network had on average greater number of interactions with other SNPs. In other words, hub SNPs in the network are more likely to interact with other SNPs in their effects on FA than nonhub ones. This trend is clear in Figure 7b, where the number of interactions for an SNP with degree k is weighted by the probability of finding such an SNP. The fitted curve in Figure 7b shows that hub SNPs in the network are very promising candidates for genome-wide searches for genetic interactions (GWIA), as the likelihood for detecting interactions for these SNPs with others is raised to nearly the second power of their connectivity degree. Figure 8 displays a map of interactions between two hub SNPs at different chromosomes, rs10091460 and rs10894606 (the former is on chromosome 8 and the latter on chromosome 11; both are genome-wide significant). The voxel-based analysis shows that interactions between these two SNPs are highly significant across major white matter tracts, especially in the superior corona radiata and superior longitudinal fasciculus bilaterally.
We next investigated the functional roles of the hub SNPs by testing their interaction effects on FA-IQ associations. Six of the nine hub SNPs indirectly influenced performance intellectual performance by modulating the associations between white matter anisotropy and performance IQ (Fig. 9). This modulatory effect was significant in white matter tracts related to IQ (Chiang et al., 2009b), such as the fornix (rs10091460 and rs4872262), genu and splenium of the corpus callosum (rs17038289), cingulum (rs17038289), the anterior limb of the internal capsule (rs10091460 and rs4872262), optic radiation (rs2012223), superior longitudinal fasciculus (rs7839054), and superior corona radiata (rs4245110 and rs17038289). The sign of βint indicates the direction of the modulatory effects of these SNPs. We found βint > 0 for rs4245110 and rs2012223, meaning that the addition of the major allele of these SNPs was associated with an increase in slope for the PIQ (on the x-axis) regressed against FA (on the y-axis). By contrast, rs7839054, rs10091460, and rs4872262 had a value of βint < 0, indicating that the addition of the major allele of these SNPs was associated with decrease in slope of the FA-PIQ regression. By contrast with the other SNPs, rs17038289 showed both directions of genetic modulation, with βint > 0 in the cingulum and the genu and splenium of the corpus callosum, and with βint < 0 in the superior corona radiata.
In this study, we developed a new approach to identify networks of genetic variants that influence brain fiber integrity and intellectual performance in 472 young healthy adults. Voxel-based analysis is popular in imaging: when applied to genetic questions, it has identified genetic effects on regions of the brain without the need for an a priori hypothesis (Hulshoff Pol et al., 2006; Chiang et al., 2009b). Even so, directly performing GWA for imaging measures voxel-by-voxel may lose considerable power due to the sheer number of multiple tests—across both the image and the genome. Here we combined voxel-based and ROI approaches for GWA. Using genetic correlation to find voxels with common genetic influences, we defined ROIs, whose component voxels had FA values influenced by a highly overlapping set of genes (quantified by the genetic correlation matrix, rg(x,y), that compares all pairs of voxels). We detected 24 SNPs in 6 genes from the whole genome with significant effects on white matter integrity, after a highly conservative Bonferroni-type correction for multiple comparisons across the genome and across all ROIs.
The genetic network underlying white matter integrity
We then recovered a network of SNPs that influence the brain by linking SNPs with correlated effects on white matter integrity. The correlation coefficient between the phenotypic effects of two genetic loci measures the similarity of their expression patterns (Eisen et al., 1998). We further assumed that if two SNPs are connected in a network to affect the brain, the spatial distribution of their influences on brain integrity will also be similar, i.e., their partial regression coefficients (β) with respect to FA will be correlated. This correlation-based approach has been used in gene expression profiling, where genes with correlated expression levels in cDNA microarrays have been linked to form gene coexpression networks (Zhang and Horvath, 2005; Ghazalpour et al., 2006). We note that this approach to gene expression networks inspired our work here on SNP networks that influence brain images. Such gene coexpression networks in various organisms, e.g., mouse or yeast, have been shown to exhibit scale-free topology (Carlson et al., 2006; Ghazalpour et al., 2006). We note that a genetic network based on correlations between gene expression levels may better describe genetic influence on phenotypic traits than networks based on SNPs. Essentially, genes are functional units of heredity, and it is not always clear how SNPs impact gene expression. Nevertheless, the multivariate nature of a gene-wide association analysis makes it computationally intensive, especially when applied to imaging brain phenotypes (Hibar et al., 2011).
The genetic network influencing white matter integrity is both small-world and scale-free
The genetic network underlying white matter integrity exhibited a small-world topology. In a small-world network, interactions among network nodes (SNPs) are routed through “hub” nodes of the network, which makes the mean path length of the network smaller. Small-world networks are thus more efficient than random networks (Barabási and Oltvai, 2004). When the genetic network was defined by including only SNPs with strong correlations in their effects on the brain, the network became ultra-small, with a so-called “scale-free” topology, with connection probabilities approximately following a power law. Evolutionarily, hub genes (SNPs) in the scale-free genetic network may originate from gene duplication that follows the “preferential attachment,” or “rich-gets-richer” principle (Barabási and Oltvai, 2004). For example, when any gene linked to a hub gene is duplicated, since the duplicated gene shares the same function as the original gene to some degree, the connectivity degree (k) of the hub gene will increase by 1 (i.e., rich-gets-richer). On the contrary, an isolated gene (k = 1) may increase its number of connections with other genes only when it is duplicated itself, which is a low probability event. Scale-free genetic networks may be evolutionarily advantageous: they are relatively error tolerant, and may remain stable even if their nodes are randomly damaged (Albert et al., 2000). As white matter integrity is critical for normal brain function, its underlying genetic network may be quite robust, offering resistance to environmental stressors. The scale-free property observed in our genetic network has been reported for several other kinds of biological networks, e.g., in gene or protein interactions, and even in the internet and in other social networks (Ravasz et al., 2002). Brain connectivity, assessed using functional MRI, may also follow a scale-free distribution (Eguíluz et al., 2005).
Functional significance of the genetic network
The gene ontology enrichment analysis indicates that the genetic network for white matter integrity is also critical for neural transmission, affecting the integrity of synapses and plasma membranes, and the function of sodium-channel complexes. Moreover, hub SNPs of the network, i.e., SNPs that were highly connected with other SNPs, belong to genes crucial for normal brain function (Tables 4, 5): the OPCML gene is a member of the IgLONs glycoprotein gene family, modulating cell—cell recognition and axonal migration (Reed et al., 2007). SCN3A encodes the α subunit of the voltage-gated sodium channel. Sodium channels mediate the voltage-dependent permeability of excitable membranes to sodium ions, facilitating the generation and propagation of action potentials in neurons (Malo et al., 1994). The HADH protein is one of the enzymes in the fatty acid β-oxidation pathway, which catalyzes the conversion of 3-hydroxyacyl-CoA to 3-oxoacyl-CoA, a key step in fatty acid metabolism during brain development (Reichmann et al., 1988). This gene is also involved in amyloid β-peptide binding in Alzheimer's disease (He et al., 1998).
These hub SNPs influenced intellectual performance by modulating the associations between performance IQ and the integrity of the fiber systems that are part of visuospatial circuits (Chiang et al., 2009b; Kravitz et al., 2011), including the optic radiations (Tuch et al., 2005), fornix (Kwok and Buckley, 2010), corpus callosum (Begré et al., 2007), cingulum bundle (Bush et al., 2000), and the superior longitudinal fasciculus, which is associated with spatial working memory performance in children (Vestergaard et al., 2011). Human intelligence is highly heritable; a substantial proportion (40–50%) of the variation in IQ is genetically determined (Davies et al., 2011). However, effects of individual SNPs on IQ are very small (∼1% of variation in IQ), indicating that genetic variants may influence intellectual performance via the accumulated effects of multiple SNPs (Davies et al., 2011), or, as found here, as a modulatory factor that interacts with both white matter integrity and IQ. We also found that the same genotype of an SNP, e.g., rs17038289, could impose both positive and negative modulatory effects on FA–IQ associations in different brain regions. This may be attributed to differences in the microenvironment of the brain area where a gene is expressed (Posner and Rothbart, 2005; Lein et al., 2007; Liang et al., 2007). A similar example of a region-specific gene effect has also been found for the Val66Met polymorphism of the brain-derived neurotrophic factor (BDNF), a gene crucially related to cognitive performance. Dominant homozygote genotypes (Val/Val) were associated with greater volume in the hippocampus and the prefrontal cortex (Pezawas et al., 2004), but with lower integrity in major white matter tracts such as the corpus callosum and the optic radiation (Chiang et al., 2011).
Network connectivity predicts genetic interactions
The SNP network detected in Figure 4 may help to guide the search for genes that interact to influence white matter integrity. The connected SNPs, especially those at network hubs, strongly interact with each other in their effects on white matter integrity. Finding interactions between genetic variants may help in explaining their effects on normal phenotypes or disease expression (Prata et al., 2009); higher-order interactions between genetic variants are typically overlooked in GWA designs, which typically seek associations between traits and individual SNPs one at a time (Becker et al., 2011). To address this, genome-wide interaction analysis (GWIA) has become popular in recent years, to search for possible interactions between pairs of SNPs on traits of interest (Marchini et al., 2005; Becker et al., 2011). Even so, the computational requirements to apply GWIA to imaging studies are currently prohibitive. For a single voxel, a 500,000-SNP GWA requires 1.25 × 1011 statistical tests to detect SNP interactions. There is a concomitant loss in power as more stringent corrections are needed for the vast number of tests involved in GWIA versus GWA (Becker et al., 2011). Computational effort may be wasted in such a brute-force search, as most genomic SNPs have no detectable phenotypic effect, both here and in other GWA (Visscher, 2008; Stein et al., 2010). In our analyses, the network connectivity of a given SNP predicts its likelihood in interacting with other SNPs. Our method offers a simpler approach to GWIA and selects candidate SNPs by detecting correlated effects on FA; computing correlations between SNPs' partial regression coefficients (β) with respect to FA is much easier than testing how every SNP pair might interact, using a second-order regression. This may boost power to detect interactions, as promising interactions are prioritized based on advantageous search criteria.
Our method will likely empower work in imaging genomics and connectomics (e.g., the Human Connectome Project). Specifically, associations between the SNPs detected here and FA, and the modulatory effects of the hub SNPs on associations between FA and IQ, require replication. While our study was under review, a related study was published that used clustering of the genetic correlation matrix, in twins, to define brain regions with common genetic determination (Chen et al., 2012). In that paper, the authors partitioned the cortical surface, modeled using anatomical MRI, into several regions that had partially overlapping genetic determinants. It was speculated that using these genetically clustered regions may boost the power for subsequent genome-wide analysis using GWAS. However, that study did not perform GWAS on the clustered cortical regions. In our current study, we have shown that a related method for genetic clustering can indeed be applied to diffusion images of the brain, and when GWAS was performed on the resulting clusters, the method yielded promising genome-wide associations. We further showed that the resulting SNP network showed a characteristic network topology, with scale-free and small-world properties, and implicated specific molecular pathways in driving brain integrity. Rare DNA sequence variations that contribute to genetic influences on FA may go undetected in GWA even when sample sizes are very large (Schork et al., 2009). When data from large independent cohorts become more widely available, for example via the Enigma project (http://enigma.loni.ucla.edu), two advantages of this genetic network analysis will be apparent: (1) genetic correlation can tap into the natural latent structure of gene action in a brain image; (2) voxel clustering by genetic affinity should boost power to find SNPs with correlated effects in genome-wide scans. SNP aggregation can be used for dimension reduction, so the resulting networks are strong candidates for verification and replication, without the computational and statistical burden of a full image-wide genome-wide search.
This study was supported by Grant RO1 HD050735 from the National Institute of Child Health and Human Development, and Project Grant 496682 from the National Health and Medical Research Council, Australia. The collection of IQ data and zygosity typing was supported by the Australian Research Council (A7960034, A79906588, A79801419, DP0212016). M.-C.C. is supported by the National Science Council (NSC 100-2218-E-010-001-MY2) and the Brain Research Center, National Yang-Ming University (a grant from Ministry of Education, Aim for the Top University Plan, 100AC-B12), Taiwan. G.I.d.Z. is supported by an Australian Research Council Future Fellowship (FT0991634). Additional support for algorithm development was provided by the NIA, NIBIB, and the National Center for Research Resources (AG016570, EB01651, RR019771 to P.M.T.). We are also grateful to the twins for their willingness to participate in our studies, and research nurses, Marlene Grace and Ann Eldridge, Queensland Institute of Medical Research, for twin recruitment.
The authors report no competing financial interests.
- Correspondence should be addressed to Dr. Paul Thompson, Laboratory of Neuro Imaging, Department of Neurology, UCLA School of Medicine, 635 Charles E. Young Drive South, Suite 225E, Los Angeles, CA 90095-7332.