Heritability of Working Memory Brain Activation

Although key to understanding individual variation in task-related brain activation, the genetic contribution to these individual differences remains largely unknown. Here we report voxel-by-voxel genetic model fitting in a large sample of 319 healthy, young adult, human identical and fraternal twins (mean ± SD age, 23.6 ± 1.8 years) who performed an n-back working memory task during functional magnetic resonance imaging (fMRI) at a high magnetic field (4 tesla). Patterns of task-related brain response (BOLD signal difference of 2-back minus 0-back) were significantly heritable, with the highest estimates (40–65%) in the inferior, middle, and superior frontal gyri, left supplementary motor area, precentral and postcentral gyri, middle cingulate cortex, superior medial gyrus, angular gyrus, superior parietal lobule, including precuneus, and superior occipital gyri. Furthermore, high test-retest reliability for a subsample of 40 twins indicates that nongenetic variance in the fMRI brain response is largely due to unique environmental influences rather than measurement error. Individual variations in activation of the working memory network are therefore significantly influenced by genetic factors. By establishing the heritability of cognitive brain function in a large sample that affords good statistical power, and using voxel-by-voxel analyses, this study provides the necessary evidence for task-related brain activation to be considered as an endophenotype for psychiatric or neurological disorders, and represents a substantial new contribution to the field of neuroimaging genetics. These genetic brain maps should facilitate discovery of gene variants influencing cognitive brain function through genome-wide association studies, potentially opening up new avenues in the treatment of brain disorders.


Introduction
Functional magnetic resonance imaging (fMRI) is a powerful tool for interrogating the mechanisms of the brain's response to different environmental stimuli. However, even with a rigidly standardized stimulus or task, we know that the brain's response is highly variable between people. It is of considerable interest to know how much of this variability is due to genetic differences between people, how much to their (unique) environmental experience, and how much to measurement error. To tease apart these contributions to individual variability, we measured brain response to an n-back working memory (WM) task. Activation during this task is affected in persons with genetically influenced neurodegenerative (Wishart et al., 2004) and neuropsychiatric disorders (Callicott et al., 1998;Matsuo et al., 2007), and in healthy family members at increased genetic risk for some of these disorders (Callicott et al., 2003;Drapier et al., 2008). In addition, prior studies have shown that working memory task performance measures are heritable phenotypes (Ando et al., 2001;Luciano et al., 2001;Polderman et al., 2006). Furthermore, the n-back task robustly activates areas (Owen et al., 2005) shown to be highly heritable in twin studies of brain structure (Schmitt et al., 2007). Although substantial heritability for certain features of brain structure has now been firmly established by twin studies, only a handful of neuroimaging studies has examined the heritability of task-related neural activity, as captured by fMRI. These studies, mostly involving a region-of-interest (ROI) approach, have had mixed findings, ranging from no genetic effect to strong genetic influences (Côté et al., 2007;Matthews et al., 2007;Polk et al., 2007;Blokland et al., 2008;Koten et al., 2009). This may be attributable to differences in task paradigms and analysis approaches and/or lack of statistical power due to small sample sizes.
Here we investigated the relative contributions of genetic (heritability) and environmental influences to individual variation in task-related brain activation across the brain using fMRI in a large genetically informative sample of identical [monozygotic (MZ)] and fraternal [dizygotic (DZ)] twins. This study is a substantive extension on our first analysis (Blokland et al., 2008), both in terms of the complexity of the voxel-by-voxel analyses and the sample size. A total of 319 healthy, young adult MZ and DZ twins performed the 0-and 2-back versions of an established spatial, numerical n-back task (Callicott et al., 1998(Callicott et al., , 2003Blokland et al., 2008). This task requires online monitoring, updating, and manipulation of remembered information, and is therefore assumed to place great demands on a number of key processes essential to working memory (Owen et al., 2005).

Materials and Methods
Participants. Twins between 20 and 30 years of age who had participated in the Brisbane Twin Cognition study at 16 years of age (Wright and Martin, 2004) were invited to participate in the present study. Before inclusion, twins were assessed for handedness; and were screened (by self-report) for their suitability for imaging; for significant medical, psychiatric, or neurological conditions, including head injuries; for a current or past diagnosis of substance abuse; and for current use of cognitionaffecting medication. Of the 366 participants thus far, 7 were excluded due to excessive head motion or incomplete datasets, and 40 because of inadequate task performance. The final sample of 319 twins included 75 MZ pairs (46 female, 29 male), 66 DZ pairs (30 female, 11 male, 25 opposite sex), and 37 unpaired twins (12 MZ, 25 DZ; 22 female, 15 male), aged 23.6 Ϯ 1.8 years (age range, 20 -28 years), and all were right handed. Table 1 describes the sample demographics. Seventy-one percent of twin pairs were scanned on the same day, with the remainder, on average, within 12 d of each other. The scanning session lasted 60 min, and each participant received a $100 gift voucher in appreciation of their time. The study was approved by the Human Research Ethics Committees of the Queensland Institute of Medical Research, University of Queensland, and Uniting Health Care. Written informed consent was obtained from each participant.
Genotyping. Details regarding initial determination of zygosity have been described previously (Wright and Martin, 2004;Blokland et al., 2008). Zygosity typing was later confirmed by genome-wide single nucleotide polymorphism genotyping (Illumina 610K chip).
Psychometric intelligence testing, birth information, and socioeconomic status. General intellectual ability was assessed at age 16 years as part of the Brisbane Twin Cognition study (Wright and Martin, 2004) using three verbal and two performance subtests from the Multidimensional Aptitude Battery (Jackson, 1984), a measure that shows good temporal stability (Jackson, 1984;Harrell et al., 1987;Luciano et al., 2003) and is highly correlated with the Wechsler Adult Intelligence Scale (Luciano et al., 2003). A full-scale intelligence quotient (FIQ) standardized score was derived from these subtests (Wright and Martin, 2004). FIQ for one DZ female was unavailable due to computer hardware failure. The mean interval between cognitive testing and MRI scanning was 7 years and 9 months (range, 3.8 -11.6 years). Birth information (gestational age and birth weight) was obtained from a parent, usually the mother, as part of the cognition study. Socioeconomic status was obtained from parental report and rated according to the Australian Socioeconomic Index 2006(McMillan et al., 2009. Experimental procedure. Imaging was conducted on a 4 tesla Bruker Medspec whole-body scanner (Bruker) in Brisbane, Australia. Functional images were acquired using a T2*-weighted gradient echo-planar imaging (EPI) sequence, sensitive to blood oxygen level-dependent (BOLD) contrast [interleaved; repetition time (TR) ϭ 2100 ms; echo time (TE) ϭ 30 ms; flip angle ϭ 90°; field of view (FOV) ϭ 230 ϫ 230 mm], and using a radio-frequency receive-transmit transverse electromagnetic head coil (M&R Devices) (Vaughan, 1999). Geometric distortions in the EPI images caused by magnetic field inhomogeneities at high field were corrected using a point-spread mapping approach (Zeng and Constable, 2002;Zaitsev et al., 2003). Over a continuous imaging run, we acquired 127 axial brain volumes, one volume every 2.1 s, with 36 coronal slices of 3 mm thickness (64 ϫ 64 matrix; voxel size, 3.6 ϫ 3.6 ϫ 3.0 mm), and with a 0.6 mm slice gap. In addition to the functional scans, 3D T1-weighted images were acquired (MPRAGE; TR ϭ 1500 ms; TE ϭ 3.35 ms; TI ϭ 700 ms; pulse angle ϭ 8°; coronal orientation; FOV ϭ 230 mm 3 ; 256 ϫ 256 ϫ 256 matrix; slice thickness ϭ 0.9 mm).
During functional imaging, the participants performed the 0-back and 2-back versions of the spatial n-back working memory task based on Callicott et al. (1998Callicott et al. ( , 2003. See Figure 1 in Blokland et al. (2008). In this task, a number (1-4, randomized) was presented in a fixed position in one of four white circles, positioned at the corners of a diamond-shaped square. Stimuli were projected using a digital light video projector and presented on a screen at the foot of the scanner bed, viewed through a mirror mounted on the head coil. A fiber-optic response box, with four buttons arranged in the same configuration as the numbers presented on the screen, was used for responses. Participants pressed one of the four buttons to match the target stimulus. For n ϭ 0 (i.e., 0-back), a simple button press in response to the number displayed was required. For n ϭ 2 (i.e., 2-back), participants pressed the button corresponding to the number presented two trials before the current one. Thus, the 2-back condition required both the maintenance of the last two numbers in memory and the updating of these encoded stimuli as each new stimulus was presented (Fletcher and Henson, 2001). While difficulty increased from 0-back to 2-back, the stimulus information and demands on response selection and execution were the same within levels. Task conditions were run in blocks with the level of the task shown on the screen, and the background color of the diamond-shaped square changing from blue (0-back) to yellow (2-back). Participants were scanned through 16 alternating blocks of the 0-back and 2-back conditions (i.e., 8 blocks/ condition). Each block consisted of 16 trials, with a stimulus presentation time of 200 ms and an interstimulus interval of 800 ms, resulting in a total experimental length of 4 min and 16 s (256 s). Participants were fully trained on the task before being positioned in the scanner, with each performing a minimum of four training blocks (two per condition). The importance of effort and commitment to the task was emphasized. To familiarize participants with the scanner and the response box, an additional set of practice trials were given once they were positioned in the magnet, with the pulse sequence running in the background. Task performance was measured as the percentage of correct responses (accuracy) and average response time (across correct trials) for each of the task conditions separately. The mean percentages correct in the 0-and 2-back conditions were 88.3 and 71.5, respectively (Table 1). On the 0-back condition, 98.1% of the sample had Ͼ50% accuracy, and 99.7% had Ͼ40% accuracy. On the 2-back condition, 84.6% of the sample had Ͼ50% accuracy, and 93.4% had Ͼ40% accuracy.
Image preprocessing. Images were processed and analyzed using Statistical Parametric Mapping software (SPM5, Wellcome Department of Cognitive Neurology) implemented in MATLAB (MathWorks). The first five EPI volumes were discarded to ensure that steady-state tissue magnetization was reached. Time-series volumes were realigned and unwarped using a robust rigid-body transformation procedure (Freire et al., 2002). A mean image generated during realignment was then coregistered with the participant's 3D T1 image, and the latter spatially normalized via nonlinear basis functions to the standard T1 template image in Although unpaired MZ and DZ twins did not contribute to the estimation of the genetic and environmental parameters, they did contribute to the estimation of mean and variance effects (i.e. they allowed a more accurate estimation of phenotypic correlations and phenotypic effects). b FIQ was measured using five subtests from the Multidimensional Aptitude Battery (Jackson, 1984), as close as possible to the twins' 16th birthday. The observed higher mean is likely due to the fact that the MAB test was created and normalized for Canadian samples, and therefore results on this test may differ when used in a different country.
In addition, the presence of an ascertainment bias cannot be excluded, as more intelligent and often more highly educated people tend to volunteer for these studies more frequently. However, the higher FIQ mean does not affect the representativeness of this sample because FIQ follows a normal distribution, with scores ranging from 85 to 146, thus showing good variability. c Gestational age, birth weight, and parental socioeconomic status (McMillan et al., 2009) were obtained from parental reports, either when the twins were 12 or 16 years. There were no significant mean or variance differences between co-twins or by zygosity. Males had slightly higher FIQ (d ϭ 0.34, p Ͻ 0.01) and birth weight (d ϭ 0.28, p Ͻ 0.05) than females. *p Ͻ 0.05; **p Ͻ 0.01.
MNI atlas space using the unified segmentation approach (Ashburner and Friston, 1999). The nonlinear transformations were next applied to the time-series volumes from which the mean was generated. Normalized volumes were then resampled to 3 mm 3 voxels and smoothed with an 8 ϫ 8 ϫ 8 mm full width at half-maximum isotropic Gaussian kernel. Global signal effects were estimated and removed using a voxel-level general linear model (Macey et al., 2004). High-pass (cutoff: 128 s) and low-pass (AR1 model) filtering were applied to discard signals of no interest. Image analysis. Image analysis was conducted in two stages. First, block design fixed-effects models were fitted at the single-subject level. Separate regressors were constructed for the 0-and 2-back conditions comprising a boxcar reference waveform convolved with a canonical hemodynamic response function. Second, the resulting single-subject 2-backϾ0-back t-contrast images were entered into a second-level group random-effects model (one-sample t test), regardless of zygosity. Using the Volumes Toolbox (http://sourceforge.net/projects/spmtools; authored by V. Glauche, Universität Freiburg, Freiburg, Germany), t scores were extracted from single-subject 2Ͼ0-back contrast images in each of 15,804 voxels comprising a brain mask created from the random-effects analysis (p Ͻ 0.05, family wise error (FWE) corrected; extent threshold, 25 voxels). Voxel activation values from the 2Ͼ0-back t-contrast images approximated normal distributions and did not require transformation. For each case, in each voxel, the square of the Mahalanobis distance was calculated to screen for and remove outliers. In addition, total gray matter volume was calculated by segmenting the 3D T1-weighted images in SPM5.
Genetic modeling. Using the statistical package Mx (Neale et al., 2002;Neale and Maes, 2004), maximum likelihood twin correlations were estimated, and univariate structural equation models examined the means and genetic (A) and environmental (E) sources of variance for taskrelated brain activation extracted from single-subject 2Ͼ0-back t-contrast images in each of the voxels specified by the group analysis. Age, sex, 2-back performance accuracy, and FIQ (estimated at age 16 years) were included as covariates, leaving their effects free to vary in each voxel. Correlations between genetic factors are fixed at 1 for MZ twin pairs, as they share 100% of their genes, and 0.5 for DZ pairs as they share, on average, 50% of their genes (see supplemental Fig. 1c, available at www.jneurosci.org as supplemental material). Thus, if the patterns and intensities of neural activity associated with the recruitment of working memory are significantly more similar in MZ twins than in DZ twins, this is strong evidence of heritability. By definition, nonshared environmental factors (e.g., illness, prenatal or postnatal traumas, peer groups), which also includes measurement error, are left uncorrelated. The significance of A was determined by testing whether dropping this parameter resulted in a significant (p Ͻ 0.05) decrease in the goodness-of-fit 2 statistic. Probability maps were height thresholded per voxel at an uncorrected p Ͻ 0.05, then cluster thresholded to correct for multiple statistical tests. At 10,000 iterations, a voxel-level threshold of p Ͻ 0.05, a mask image of 15,804 voxels, and an 8 mm smoothing kernel, Monte Carlo simulation(AlphaSim;http://afni.nimh.nih.gov/pub/dist/doc/program_ help/AlphaSim.html) determined a minimum cluster size of 147 voxels for cluster significance at p Ͻ 0.05. Cluster thresholding of p value maps was performed using FSL and AFNI command line tools (http://www. fmrib.ox.ac.uk/fsl/ and http://afni.nimh.nih.gov, respectively).
We assessed whether the AE model was appropriate in several steps. We compared the maximum likelihood twin correlation maps for MZ and DZ pairs. From the MZ and DZ twin correlations (r MZ and r DZ , respectively), we calculated initial estimates of heritability [h 2 ϭ 2*(r MZ Ϫ r DZ )] and common environmental influence [c 2 ϭ (2*r DZ ) Ϫ r MZ ] (Falconer and Mackay, 1996), and plotted the observed sampling distributions of h 2 and c 2 against the expected sampling distributions under the null hypothesis of no effect. Figure 1 shows that the differences between the observed and expected sampling distributions are consistent with the presence of genetic influences (heritability) and the absence of common environmental influences (C). The midline of the observed h 2 sampling distribution is well above zero, whereas the midline of the observed c 2 distribution is slightly below zero. The low DZ correlations, being less than half the MZ correlation and the fact that the midline of the observed c 2 distribution is slightly below zero suggested genetic dominance (D) in some regions. However, we did not have enough statistical power to estimate this separately in a structural equation model that includes additive genetic factors (A), dominance genetic factors (D), and unique environmental factors (E) (see supplemental Fig. 1b, available at www.jneurosci.org as supplemental material). In Ͻ2% of groupactivated voxels, the variance component D was significant at p Ͻ 0.05. When applying cluster thresholding to the D map, no clusters survived. Based on this result, it was decided to fit an AE model to all groupactivated voxels, with A including both additive and dominant genetic effects.

Genetic modeling
We first established a brain mask of commonly activated regions (2-backϾ0-back t contrast) identified in a group-level randomeffects analysis regardless of zygosity, and restricted our analyses to these regions (Fig. 2a). The group random-effects analysis showed the most significant increase in BOLD signal during the 2-back compared with the 0-back condition ( (Falconer and Mackay, 1996). h 2 and c 2 expected are normal distributions with a mean of zero and an expected sampling variance estimated as [4*((1 Ϫ r MZ 2 ) 2 /m ϩ (1 Ϫ r DZ 2 ) 2 /n)] for h 2 expected, and [(4*(1 Ϫ r DZ 2 ) 2) /n ϩ ((1 Ϫ r MZ 2 ) 2 /m)] for c 2 expected, where n and m refer to the numbers of DZ and MZ twin pairs, respectively, and r MZ and r DZ are set to zero under the null hypothesis of no heritability and no common environmental influence (Visscher, 2004 . Activation peaks were labeled using an image-based probabilistic brain atlas based on cytoarchitectonic cortical parcellation (Eickhoff et al., 2007). The activation pattern is consistent with the literature, including our previous study in a subsample of the current sample (Callicott et al., 1998(Callicott et al., , 2003Blokland et al., 2008).
Maximum likelihood twin correlations for task-related brain activation are shown in Figure 2b. For the regions activated by this task, overall MZ twin correlations were more than twice the size of the DZ correlations, suggesting that individual variation in WM activation is genetically influenced. DZ twin correlations in many voxels did not differ significantly from zero (for confidence intervals, see supplemental Fig. 2, available at www.jneurosci.org as supplemental material).
Based on our preliminary investigations (see Materials and Methods), we opted to fit a model that includes genetic (A), and unique environmental factors (E) to explain the variance in taskrelated brain activation (for confidence intervals, see supplemental Fig. 1c, available at www.jneurosci.org as supplemental material). The resulting genetic component p value map was height thresholded at an uncorrected p Ͻ 0.05, and cluster thresholded based on Monte Carlo simulation. The A and E esti-mates and the A clusters surviving this strict significance criterion are shown in Figure 3. At a cluster threshold of 147 voxels, three large clusters were significant, totaling 8224 voxels, with moderately high heritability estimates (ϳ40 -65%) in inferior frontal gyri (pars triangularis) (L: [Ϫ48, 24, 21]  Excluding task performance (i.e., 2-back performance accuracy) as a covariate had a negligible effect on the heritability estimates, with the minimum (0%), maximum (65%), and mean heritability (23%) across tested voxels remaining the same, and the number of voxels reaching significant heritability (i.e., surviving cluster thresholding) increasing by only 173 voxels, to 8397 voxels.
To assess whether task performance in this study was due to genetic factors, we fitted univariate ACE models (for confidence intervals, see supplemental Fig. 1a, available at www.jneurosci.org as supplemental material) to the performance measures with normal distributions: 0-back response time, 2-back performance, and FIQ. The distribution for 0-back accuracy showed a strong positive skew, reflecting the low level of difficulty for this condition, and this variable was therefore not modeled. Table 2 shows the model-fitting results and percentage of variance estimates for the performance phenotypes. Performance on the 2-back condition was found to be strongly influenced by genes; accuracy in particular (57%), but response time as well (34%).
Performance measures were significantly reproducible as well, with intraclass correlations between time points 1 and 2 in the moderate to high range (p Ͻ 0.01, two-tailed) at 0.73 for 0-back accuracy, 0.79 for 0-back reaction time (RT), 0.77 for 2-back accuracy, and 0.56 for 2-back RT, with a significant improvement in response accuracy in the 2-back condition (t ϭ Ϫ3.376, df ϭ 39, p Ͻ 0.05). This practice effect was stronger for females (t ϭ Ϫ2.859, df ϭ 19, p Ͻ 0.05) than for males (t ϭ Ϫ1.845, df ϭ 19, p ϭ 0.08), consistent with the finding that performance at the first time point was slightly lower in females than in males, and suggesting a "catch-up" effect.

Influences of sex, age, and other covariates on task-related brain activation
We also quantified and analyzed the effects of sex, age, 2-back performance accuracy, and FIQ on the BOLD response ( Fig. 4a; supplemental Table 4, available at www.jneurosci.org as supplemental material), which may mediate individual variation. As shown in Figure 4b, all covariates survived cluster thresholding, but sex had the strongest effect on brain activation (supplemental Table 1, available at www.jneurosci.org as supplemental material, shows that sex influenced task performance and FIQ as well). At a threshold of 147 voxels, one large cluster and five smaller clusters were significant, totaling 3613 voxels. Males had significantly stronger activation than females in several areas, but differences were most pronounced in inferior frontal gyrus (pars opercularis) (R: [51,12,27], [54,15,30]   , with a t-contrast score difference of 0.13-0.70. This is quite a large effect given a mean Ϯ SD voxel t-contrast score of 0.32 Ϯ 0.45 across the working memory network (range, Ϫ2.31 to ϩ3.77).
Higher accuracy on the 2-back condition predicted slightly higher activation in several regions. The effect was most pronounced in middle frontal gyrus (R: [30,6,60], [36,6,48], [42,9,48]); inferior frontal gyrus (pars opercularis) (R: [42,15,27], [48,9,21] , with a 0.004 -0.009 increase in t-contrast score difference per percent increase in performance accuracy. Sex differences in BOLD response may partly account for these response accuracy effects, as males, on average, performed better on the task. Compared with the other covariates, the effect of FIQ on WM task-related activation was small, with only positive effects surviving cluster thresholding. Higher FIQ predicted slightly higher activation in the thalamus (L: [Ϫ15, Ϫ27, 12]) and hippocampus (cornu ammonis; CA) (L: [Ϫ30, Ϫ33, 0], [Ϫ21, Ϫ42, 6]), at 0.004 -0.012 increase in t-contrast score difference per IQ point. These findings suggest that better task performance and higher IQ do not necessarily imply greater efficiency in the use of brain resources, which is consistent with the work of Klingberg et al. (2002).

Discussion
The heritability of task-related brain activation has not been estimated with any degree of certainty before, as the few previous twin studies have been limited in terms of sample size and analysis. Here we demonstrate for the first time, in the largest twin sample to date, using a voxel-level analysis, and consistent with our preliminary ROI analysis on a small subsample (Blokland et al., 2008), that there is a significant and substantial genetic influence on WM task-related activation across the brain, with genes accounting for up to 65% of the variance, averaging ϳ33%. Heritable areas include the dorsolateral prefrontal cortex, an area whose function is strongly affected in schizophrenia (Callicott et al., 1998) and other mental disorders. Our heritability estimates are comparable in magnitude to a small-sample ROI study in which genetic influences accounted for 38% of the variance in activation of the dorsal anterior cingulate cortex during an interference processing task (Matthews et al., 2007), as well as a recent family study where up to ϳ40% of the variance in resting-state functional brain connectivity was due to genes (Glahn et al., 2010). Importantly, we found significant heritability for the task component of interest, WM, whereas Koten et al. (2009) reported a small genetic influence on activation related to an unconstrained distraction phase of the n-back WM task, and in a very small sample (10 MZ twin pairs and their non-twin siblings). Additionally, while considerably lower than the heritability reported for large brain structure, our heritability estimates for task-related activation are of similar magnitude to those for small brain volumes (e.g., hippocampus) (Schmitt et al., 2007). Our findings are also in line with imaging studies showing an association between WM brain function and specific genetic polymor- Maximum-likelihood twin correlations, univariate genetic model fitting results, and variance component estimates from the full ACE and best fitting models for the performance measures and FIQ. The best fitting model is shown in bold. Nested submodels are compared to the full ACE model by testing whether dropping a parameter resulted in a significant increase in the goodness-of-fit 2 (the difference in Ϫ2 times the log likelihood (Ϫ2LL) of a model and a nested submodel follows a 2 distribution with degrees of freedom equal to the difference in the number of parameters). Estimates are corrected for age and sex; assumption testing supported homogeneity of means and variances across birth order and zygosity. Ϫ2LL ϭ minus 2*log-likelihood; ⌬ 2 ϭ change in 2 ; ⌬df ϭ change in degrees of freedom; CI, confidence interval; ns, not significant. *p Ͻ 0.05.
phisms, such as GRM3, COMT, and TPH2 (Egan et al., 2004;Tan et al., 2007;Reuter et al., 2008), and are congruent with behavioral studies where a significant proportion of the variance in WM performance has been attributed to genes (Ando et al., 2001;Luciano et al., 2001), with the heritability estimates found here for brain activation being comparable to those for performance (20 -65% vs 34 -57%). An alternative explanation that MZ co-twins have more similar task-related brain activation because they are more likely to use the same task strategy than DZ co-twins, as suggested by the differential association between WM performance and activation of brain areas (Kirchhoff and Buckner, 2006), is unlikely. In a post hoc analysis, we assessed strategy use from self-report for a subsample of 139 twins, with results as follows: ϳ48% used a numeric strategy, 19% used a spatial strategy, and the remainder used a combination of both or neither strategy. This distribution was approximately the same for MZ and DZ twins, but more importantly, MZ co-twins were no more alike in their strategy use (r ϭ 0.21, p Ͼ 0.05) than DZ co-twins (r ϭ 0.37, p Ͻ 0.01). Thus, these preliminary data provide no indication that WM brain ac- Figure 4. Covariate effect estimates for sex, age, 2-back performance accuracy, and FIQ. A, Standardized regression coefficients (␤ values) obtained from multiple regression of task-related activation on sex, age, 2-back performance accuracy, and FIQ in Mx (Neale et al., 2002). Positive effects (i.e., greater activation in males, in older participants, and in participants who performed better on the n-back task or with higher FIQ) are represented by hot colors and negative effects (i.e., greater activation in females, in younger participants, and in participants who performed worse on the n-back task or with lower FIQ) are represented by cold colors. B, Height-thresholded (p Ͻ 0.05) and cluster-thresholded (Ͼ147 voxels) p value maps corresponding to the regression coefficient maps of sex, age, 2-back performance accuracy, and FIQ. tivation from MZ twins is more similar because they adopt the same strategy. Strategy does not appear to be genetic, while there is a strong genetic influence on accuracy and RT. We controlled for possible effects of task performance (i.e., 2-back performance accuracy) on brain activation by including it as a covariate, along with age, sex, and FIQ, although excluding task performance as a covariate had a negligible effect on the heritability estimates and their significance. Because of the low phenotypic correlation between task performance and task-related brain activation, and similarly for FIQ and whole-brain gray matter volume (supplemental Figs. 4, 5, available at www.jneurosci.org as supplemental material), we did not explore whether a common set of genes could explain any covariation among these phenotypes. In the future, it would be interesting to extend the analysis further, in particular to investigate whether there is a genetic covariation between a voxellevel gray matter measure and voxel-level brain activation.
The spatial, numeric version of the n-back task used in this study recruits some specific brain regions that other n-back task versions may not (Owen et al., 2005). This means that heritability estimates found here may not be generalizable to other (e.g., verbal) n-back task versions or to other cognitive paradigms. We could theorize that spatial abilities may be evolutionarily older than verbal abilities and could therefore be more hard-wired in the brain and thus more strongly genetically influenced than verbal abilities. Also, while our twin correlations hinted at the possible presence of genetic dominance in some brain areas, we did not have sufficient power to estimate D separately, so A includes both additive and dominant genetic influences. In an even larger sample, it might be possible to estimate these influences separately. Congruent with studies of cognitive ability that show there is little common environmental variance in adults (Plomin and Spinath, 2004), our fMRI data did not provide any indication for common (shared) environmental influences.
Given the large unique environmental variance estimated in this and the few prior twin fMRI studies, we ascertained the extent to which this variance might be due to measurement error, which places a ceiling on heritability estimates. Prior studies reported mixed fMRI test-retest reliabilities, depending on sample composition, paradigm, brain region, magnetic field strength, scan interval, and analysis methods (Caceres et al., 2009). Our high test-retest reliability suggests it is unlikely that the large E variance component reflects only measurement error; unique environmental factors appear to be involved. Twins were screened for trauma, illness, and drug and medication use before inclusion, so these factors can largely be excluded. Unique environmental factors may include stochastic biological effects or idiosyncratic experiences, such as job stress or tertiary educational course. These factors could have a direct effect on the variance in activation, or an indirect effect, for example by influencing motivation, concentration, or nervousness during fMRI. New synapses may develop in response to unique environmental conditions, in addition to strengthening or discarding existing synapses. Heritability estimates were highest in areas with high test-retest reliabilities, showing the importance of reliable measures when searching for genetic effects. On a related note, it has been suggested that brain regions activated in some individuals only might be better candidates for genetic analysis, because genetic influences on brain activation in areas that are activated similarly among individuals might be underestimated (Koten et al., 2009). However, here we show in our large sample that individual variability within the group activation area is high, with voxel activation intensities approximating normal distributions. Regions outside the group activation area that are activated in just one or a few individuals would pose problems for genetic analysis, since the phenotypic variance in these regions would be too close to zero. This voxel-based analysis can inform ROI studies. Averaging across an ROI has the advantage of increased signal-to-noise ratio relative to the voxel level; however, the spatial variability of interest may be missed when drawing anatomically defined ROIs, as this assumes that functional divisions and heritability patterning follow anatomical divisions. Future heritability studies will be able to estimate regional heritability more accurately when averaging activation across regions identified here as heritable (and reliable). Genome-wide association studies can furthermore benefit from the reduction in statistical tests associated with analyzing ROIs compared with voxel-level phenotypes.
As sex and age may mediate individual variation, we also quantified their effects on the BOLD response. Our finding of relatively strong sex differences in the brain's response to a WM task confirms the results of Schweinsburg et al. (2005) and Bell et al. (2006), but contradicts those of Schmidt et al. (2009). One theory is that differences in activation are due to gender-specific processing strategies used to accomplish the same cognitive task (Schweinsburg et al., 2005;Clements-Stephens et al., 2009). Another theory, supported by a recent near-infrared spectroscopy study in the prefrontal cortex, is that females possess more efficient hemodynamics during WM (Li et al., 2010). Possibly, different sets of genes explain variance in brain activation for males and females. In a larger sample, heritability could be estimated separately for each sex. Furthermore, areas with significant sex effects partially overlapped with those found for age (age effects were much smaller than sex effects), consistent with prior studies showing differential developmental patterns for males and females for neural activation (Schweinsburg et al., 2005;Clements-Stephens et al., 2009), possibly reflecting the development of gender-specific task strategies. Other possible explanations for age effects include greater neural efficiency in older participants (Schweinsburg et al., 2005), early age-related changes in cerebrovascular characteristics (Kannurpatti et al., 2010), or age-related gray matter decrease (Sowell et al., 2003). The relative influences of genes and environment may differ with age, as shown in a twin study of brain structure (Lenroot and Giedd, 2008), and it may eventually be possible to estimate them separately for different age groups.
In summary, this is the first fMRI study to investigate genetic and environmental influences on task-related brain activation in a combined MZ-DZ twin sample voxel by voxel. It is a substantive extension on our first analysis (Blokland et al., 2008), both in terms of complexity of the analyses and sample size, and provides the strongest support so far for the hypothesis that WM brain activation is heritable. By establishing the heritability of cognitive brain function, this study provides the necessary evidence required before taskrelated brain activation can be considered as an endophenotype (Gottesman and Gould, 2003) for brain disorders. As the size of our twin sample is increased over the next several years, these genetic brain maps should facilitate discovery of gene variants influencing cognitive brain function through genome-wide association studies, which may provide us with even greater insight into human neurobiology and cognition, and could open up new avenues in the diagnosis and treatment of brain disorders.