Introduction

Specific reading deficits (dyslexias) affect approximately 8% of children, despite adequate intelligence, education, and social environment.1 The disorder begins in childhood, continues into adulthood,2 and has serious social impacts.3 Since early reports of its familiality,4, 5 systematic twin studies have shown that most of this familial aggregation is due to genetic differences,6 with a heritability of around 0.7.7, 8

Two forms of dyslexia have been recognized: surface dyslexia, affecting lexical processing and diagnosed by poor reading of irregular words such as ‘YACHT’, and phonological dyslexia, diagnosed by poor phonological decoding skill, for instance poor reading of nonwords such as ‘SLINT’. Most cases show deficits of both types.9 Quantitative behavioural modelling suggests that normal variance in reading is heritable and that, although most genes affect both forms of reading, some genes are specific for lexical or nonlexical processing.7, 10 So whereas many of the genes accounting for this trait may be expected to exert a general effect, some should be specific for lexical processing, and some for nonlexical processing.7

Linkage and association studies of affected sib-pairs and family segregation studies have recently been reviewed in this journal11 and elsewhere.12, 13 In summary, 11 regions have been linked to reading disorder, located on chromosomes 1p34–36,14, 15, 16 2p15–16,17, 18, 19, 20 2q22.3,21 3p12-q13,22 6p23–21.3,23, 24, 25, 26, 27, 28, 29, 30 6q11.2,31 7q32,19 11p15.5,32 15q21.1,24, 33, 34, 35, 36, 37 18p21,38, 39 and Xq27.338, 40 (see Table 1 for a cross-tabulation of linkage reports, and the diversity of phenotypes linked to each region).

Table 1 Previously reported linkages for reading traits in dyslexic samples

Thus, whereas nearly one dozen linkages to clinical dyslexia have been reported, five have yet to be assessed in an independent sample, others have failed to replicate at least once, and none have previously been studied in a normal sample. Our first goal was therefore to examine which, if any, of these clinical linkages could be replicated in unselected adolescents. Second, we examined the specificity of linkages to markers of surface and phonological dyslexia, the two major subtypes of dyslexia.41 A third goal was to examine genome-wide information for possible novel linkages.

Materials and methods

Participants

Twins were initially recruited through primary schools in the greater Brisbane area, media appeals, and word of mouth, as part of ongoing studies of melanoma risk factors42, 43 and cognition,44 and form approximately one-half of the full eligible birth cohort. The sample is representative of the Queensland population for intellectual ability.45 Informed consent was obtained from all participants and parents before testing. This paper concerns data collected during 2003.

Reading and spelling phenotypes and genotyping were available for 403 twin families. These comprised 214 DZ pairs, 85 DZ pairs with one extra sib, 23 DZ pairs with two extra sibs, and two DZ pairs with three extra sibs. An additional 54 MZ pairs with one extra sib, seven with two, and one with three sibs were included in the linkage analysis using the MZ option of Merlin. MZ pairs, although not genotyped, contributed to estimation of additive genetic and shared environment effects and place an upper limit on the estimate of variance owing to a linked QTL.46 Finally, three nontwin sib pairs and 14 single co-twin-sibling pairs (11 with one sib, three with two sibs) were included. Note that although parents were not phenotyped, their genotypes (where available) still contributed to IBD estimation for siblings.
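As a quick arithmetic check, the family structures listed above can be tallied to confirm that they sum to the 403 families reported. The labels below are informal names introduced here for illustration, and the reading of the "two extra sibs" and "three sibs" groups as DZ families is an assumption based on the sentence structure.

```python
# Family counts as listed in the text. Labels are illustrative only;
# treating the 23 and 2 families as DZ pairs is an assumption.
family_counts = {
    "DZ pair": 214,
    "DZ pair + 1 sib": 85,
    "DZ pair + 2 sibs": 23,
    "DZ pair + 3 sibs": 2,
    "MZ pair + 1 sib": 54,
    "MZ pair + 2 sibs": 7,
    "MZ pair + 3 sibs": 1,
    "nontwin sib pair": 3,
    "single co-twin + 1 sib": 11,
    "single co-twin + 2 sibs": 3,
}

total_families = sum(family_counts.values())
print(total_families)
```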

Reading and spelling assessment

Regular word, irregular word and nonword reading were assessed using the CORE,2 a reliable 120-word extended version of the Castles and Coltheart9 test with additional items added to increase the difficulty of this test for an older sample. Regular and irregular-word spelling were tested using 18 regular words and 18 irregular words from the CORE, respectively. These were presented verbally, untimed and in mixed order, the dependent variable being number of words spelled correctly to oral challenge. Nonlexical spelling was assessed by having subjects produce a regularized spelling for each of the 18 words given in the irregular spelling test. Each word was then presented verbally, and the letter string used for spelling was recorded and scored for phonological correctness from a list of acceptable regularizations. Words were repeated on request.

Each participant was contacted and interviewed over the telephone by a trained researcher in accordance with the instructions outlined above. If a blood sample for DNA analysis had not yet been obtained, this was also arranged at this time, with the subject's consent. Test scores on each of the three reading subtests and three spelling tests were calculated as a simple sum of correct items. Before analysis, all raw data were log-odds transformed to approximate normality.
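A log-odds transform of a sum score can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `log_odds` and the continuity correction of 0.5 are assumptions, since the text does not say how perfect or zero scores were handled.

```python
import math


def log_odds(correct: int, n_items: int, correction: float = 0.5) -> float:
    """Log-odds of a proportion-correct score.

    The continuity correction is an assumption made here so that scores of
    0 or n_items stay finite; the text does not state how such scores were
    treated.
    """
    if not 0 <= correct <= n_items:
        raise ValueError("correct must lie between 0 and n_items")
    p = (correct + correction) / (n_items + 2 * correction)
    return math.log(p / (1 - p))


# Hypothetical scores on an 18-item spelling subtest.
scores = [9, 15, 18]
transformed = [log_odds(s, 18) for s in scores]
print(transformed)
```

The transform stretches scores near the ceiling, which is where accuracy measures in an unselected sample tend to pile up.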

Zygosity testing and genotyping

DNA extraction, zygosity determination, and genome-scan acquisition are described in detail elsewhere.47 The genome scans consisted of 796 highly polymorphic microsatellite markers (31 X-linked) at an average spacing of 4.8 cM with locations determined from the sex-averaged deCODE map48, 49 and interpolation of unmapped markers. Marker heterozygosity ranged between 52.6 and 91.9%. Genotypes were available for both parents in 292 families, for one parent only in 76 families, and for neither parent in 35 families. Parents were typed for between 228 and 784 markers (mean of 398±103). For twins/siblings, the number of typed markers ranged from 211 to 788, with an average of 576 (±195) total markers.

Analyses

Univariate multipoint variance components (VC) linkage analysis was used at each marker to partition the phenotypic covariance matrix into genetic variance owing to the linked QTL, residual polygenic additive genetic variance, and unique environmental variance. Dominance effects (ADE model) did not appear likely as DZ correlations were not significantly lower than half the MZ correlations.50, 51, 52, 53
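The variance partition described above implies a particular expected covariance between two relatives at each marker. The sketch below uses the standard variance-components linkage parameterisation (QTL variance weighted by the estimated proportion of alleles shared IBD, polygenic variance weighted by the expected genome-wide sharing); the function and argument names are hypothetical, and this is not MERLIN's internal code.

```python
def expected_pair_covariance(pi_hat, var_qtl, var_poly, var_env, mz=False):
    """Expected 2x2 phenotypic covariance matrix for a pair of relatives.

    pi_hat   : estimated proportion of alleles shared IBD at the marker
               (ignored for MZ pairs, who share all alleles)
    var_qtl  : variance owing to the linked QTL
    var_poly : residual polygenic additive genetic variance
    var_env  : unique environmental variance
    """
    total = var_qtl + var_poly + var_env
    if mz:
        shared = var_qtl + var_poly
    else:
        # Full sibs and DZ twins share half of polygenic effects on average.
        shared = pi_hat * var_qtl + 0.5 * var_poly
    return [[total, shared], [shared, total]]


# Hypothetical variance components, for illustration only.
sib_cov = expected_pair_covariance(0.5, 0.2, 0.5, 0.3)
mz_cov = expected_pair_covariance(1.0, 0.2, 0.5, 0.3, mz=True)
print(sib_cov, mz_cov)
```

Because the MZ covariance equals the full genetic variance regardless of IBD sharing at the marker, MZ pairs constrain the total genetic variance and thereby cap the share attributable to any single QTL.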

VC were estimated by maximum-likelihood analysis of the transformed data using the MERLIN and MINX (Merlin for the X chromosome) software packages54 with sex and age specified as (linear) covariates, and with the MZ option in MERLIN specified to allow both members of an MZ pair to be included in the analysis with their nontwin sibling/s. Linear fits for the age covariate gave equivalent fits to higher order polynomial functions within the limited range of ages in this study (12–25 years, mean 18.3±2.7). Kosambi map units (derived from the deCODE map) were transformed to Haldane units for input to Merlin. Results are reported in Haldane units. Significance of additive QTL linkage effects was assessed at each marker by comparing log10 likelihood of models including and excluding the QTL effect.55
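The Kosambi-to-Haldane conversion mentioned above can be sketched with the two standard map functions: the Kosambi distance is first turned into a recombination fraction, which is then turned into a Haldane distance. Converting interval by interval and anchoring the first marker at its original position are assumptions of this sketch; the text does not describe how the conversion was carried out, and the function names are hypothetical.

```python
import math


def kosambi_to_haldane_cm(d_kosambi_cm: float) -> float:
    """Convert one inter-marker distance from Kosambi cM to Haldane cM."""
    # Kosambi map function: recombination fraction from distance in Morgans.
    theta = 0.5 * math.tanh(2.0 * d_kosambi_cm / 100.0)
    # Inverse Haldane map function: distance in cM from recombination fraction.
    return -50.0 * math.log(1.0 - 2.0 * theta)


def convert_map(positions_kosambi_cm):
    """Convert sorted marker positions interval by interval.

    The first marker keeps its original position (an assumption made here).
    """
    if not positions_kosambi_cm:
        return []
    converted = [positions_kosambi_cm[0]]
    for prev, pos in zip(positions_kosambi_cm, positions_kosambi_cm[1:]):
        if pos < prev:
            raise ValueError("positions must be sorted in ascending order")
        converted.append(converted[-1] + kosambi_to_haldane_cm(pos - prev))
    return converted


# Hypothetical marker positions in Kosambi cM.
haldane_positions = convert_map([0.0, 10.0, 25.0])
print(haldane_positions)
```

Haldane distances are never shorter than the corresponding Kosambi distances, so positions reported in Haldane units drift upward along a chromosome relative to the original map.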

Sex-specific differences in genetic distances arising from different female and male recombination rates can bias results in samples where parental genotypes are available only for a portion of the sample and the proportions of maternal and paternal genotypes differ significantly.56 In the present case, the percentages of cases in which only maternal or only paternal genotypes were recorded were 17 and 2, respectively, meaning that the sex-averaged map may bias LOD scores upward by around 5% in the worst case (regions with m:f distance ratios of 10).

Lander and Kruglyak's57 criterion for replication-level support (lod ≥1.44) was adopted. For assessing genome-wide significance, empirical P-values for significant and suggestive support for linkage were established for each phenotype using 1000 gene-dropping simulations in MERLIN, with suggestive support defined as the empirical LOD corresponding to one expected false positive per genome scan and significant linkage corresponding to one expected false positive per 20 genome scans.58 Empirical LOD scores corresponding to suggestive support were somewhat lower than the theoretical value of 2.2, being 1.66, 1.93, 1.89, 1.98, 1.86, and 1.89 for the regular, nonword, and irregular spelling measures and regular, nonword, and irregular reading measures, respectively (95% confidence interval on the empirical P-value corresponds to 0–0.003).
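The relationship between LOD scores, the likelihood-ratio chi-square, and simulation-based thresholds can be sketched as below. This is an illustration under stated assumptions, not MERLIN's procedure: the pointwise P-value assumes the usual 50:50 mixture of a point mass at zero and a 1-df chi-square for a single variance component, and `empirical_threshold` assumes that each simulated scan has already been reduced to a list of independent peak LODs (how peaks are declared independent is left open). All function names are hypothetical.

```python
import math


def lod_score(log10_lik_with_qtl: float, log10_lik_without_qtl: float) -> float:
    """LOD as the difference in log10 likelihoods of the two models."""
    return log10_lik_with_qtl - log10_lik_without_qtl


def lod_to_chi_square(lod: float) -> float:
    """Likelihood-ratio chi-square equivalent of a LOD score."""
    return 2.0 * math.log(10.0) * lod


def pointwise_p(lod: float) -> float:
    """Pointwise P-value under an assumed 50:50 mixture of 0 and chi-square(1)."""
    if lod <= 0:
        return 1.0
    return 0.5 * math.erfc(math.sqrt(lod_to_chi_square(lod) / 2.0))


def empirical_threshold(peaks_per_scan, expected_per_scan: float) -> float:
    """LOD exceeded, on average, expected_per_scan times per simulated scan."""
    pooled = sorted((p for scan in peaks_per_scan for p in scan), reverse=True)
    k = int(round(expected_per_scan * len(peaks_per_scan)))
    if k < 1 or k > len(pooled):
        raise ValueError("too few simulated peaks for the requested rate")
    return pooled[k - 1]


# Hypothetical peak LODs from four simulated null genome scans.
simulated = [[3.0, 1.0], [2.0, 0.5], [1.5, 0.2], [2.5, 0.1]]
suggestive = empirical_threshold(simulated, 1.0)
print(suggestive, lod_to_chi_square(1.44), pointwise_p(1.44))
```

With 1000 simulated scans, a rate of 1.0 picks out the 1000th largest pooled peak (suggestive support) and a rate of 0.05 picks out the 50th largest (significant linkage).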

Results

Table 2 provides the range, mean, and standard deviation for each measure, their heritability based on a VC analysis of this sample (reported by Bates et al2), and the magnitude of significant sex and age effects. Female subjects showed increased scores for irregular word reading and both irregular- and regular-word spelling. All measures increased with age. Univariate heritabilities ranged from 0.52 for nonword spelling to 0.76 for irregular word spelling as computed in Mx.

Table 2 Descriptive statistics and heritabilities (h2) for each of the measures, including significant mean differences between female and male scores (sex) and the unstandardized regression coefficient for age in years

Table 3 shows the phenotypic and genotypic intercorrelations between the six phenotypes, which are uniformly moderate to high in magnitude, and significant.

Table 3 Phenotypic (lower triangle) and genetic (upper triangle) correlations among the six reading and spelling scales (N=1082–1085)

Results of the autosomal linkage scan for reading and spelling are shown in Figure 1. More detailed plots for chromosomes where support for linkage exceeded or approached replication levels, as well as the novel suggestive evidence for linkage on chromosomes 4, 9, 10, and 17, are shown in Figure 2. Table 4 shows the maximum LOD score and associated markers and locations for each replicated linkage. In cases where a larger linkage peak was found at a marker or region other than originally reported, both the linkage at the original region and the marker of maximum linkage (and associated trait) are shown. Also shown in Table 4 are the markers and traits associated with linkages on chromosomes 4, 9, and 17.

Figure 1

Autosomal linkage plots of the reading (a) and spelling (b) measures. Empirical LOD scores for suggestive support for linkage57 were 1.98, 1.86, and 1.89 for regular, nonword, and irregular reading, respectively, and, for regular, nonword, and irregular spelling were 1.66, 1.93, and 1.89, respectively.

Figure 2

Individual linkage plots for chromosomes on which this unselected sample supported linkages previously reported in affected samples, and for suggestive novel linkages on chromosomes 4, 9, and 17. Note: the horizontal line marks the 1.44 lod criterion adopted for replication.

Table 4 Replicated and novel linkage regions for reading and spelling

The strongest support was found for linkages at 2q22.3, 3p12-q13 (DYX5), 6q11.2–q12 (DYX4), 7q32, 15q21 (DYX1), and 18p21 (DYX6). Strong support for linkage at 2q was found for regular-word spelling, exceeding genome-wide suggestive evidence for linkage (peak lod 2.18 at 222 cM, marker XRCC5). This peak, however, lies 70 cM distal to the original locus (marker D2S1399, 169 cM) reported by Raskind et al,21 and outside the region of overlap for the confidence intervals of the two peaks, based on a lod-drop of 1. It may still be that these peaks reflect the same gene, as linkage peaks for true QTLs can vary significantly in location owing to a number of mainly stochastic factors.59
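A lod-drop-of-1 support interval of the kind used above can be sketched as follows. The function names and the LOD curve are hypothetical, and the interval is resolved only to the nearest markers (no interpolation between markers), which is a simplification made here.

```python
def lod_drop_interval(positions_cm, lods, drop=1.0):
    """Marker-resolution support interval around the maximum LOD.

    Extends outward from the peak while the LOD stays within `drop`
    units of the maximum.
    """
    if not lods or len(positions_cm) != len(lods):
        raise ValueError("positions and lods must be non-empty and equal in length")
    peak = max(range(len(lods)), key=lods.__getitem__)
    cutoff = lods[peak] - drop
    left = peak
    while left > 0 and lods[left - 1] >= cutoff:
        left -= 1
    right = peak
    while right < len(lods) - 1 and lods[right + 1] >= cutoff:
        right += 1
    return positions_cm[left], positions_cm[right]


def intervals_overlap(a, b):
    """True if two (start, end) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical LOD curve; positions and values are illustrative only.
positions = [200, 205, 210, 215, 220, 225, 230]
lods = [0.3, 0.9, 1.4, 1.9, 2.18, 1.5, 0.8]
interval = lod_drop_interval(positions, lods)
print(interval, intervals_overlap(interval, (160, 180)))
```

Two peaks whose support intervals do not overlap are not thereby shown to reflect different loci, since the intervals are approximate and peak location varies between samples.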

For four regions (7q32–34, 15q21, 18p21, and Xq27), linkage support fell at the same or closest marker to that reported in the original finding. Support for linkage for 7q32–q34 reached a maximum lod of 2.03 for nonword spelling at marker D7S530 (140 cM), the same marker as reported in the original study.19 Irregular, regular, and nonword reading also supported linkage at the same marker with lods of 1.92, 1.13, and 1.21, respectively.

Replication-level support for linkage at 15q21 (DYX1) affecting regular word spelling (lod 1.89) fell at D15S994, our closest marker to the D15S132 satellite, which defined the peak identified by Schulte-Körne et al,34 but which was not included in our marker panel. This marker also defined the peak in the UK study. Similarly, the replication peak for linkage at 18p21 (the putative DYX6 locus) coincided with the strongest marker from the UK sample reported by Fisher et al38 (marker D18S464, 34.50 cM, lod=1.70), and our strongest signal (marker D18S478, 54.878 cM, lod=2.00) coincided with the strongest marker from the US sample. Finally, linkage at Xq27 was supported, with linkage peaking at the same markers (DXS1227 through DXS8091) implicated in the original report of de Kovel et al,40 with a peak lod of 1.09 at DXS9908 in the present sample.

For DYX5 (3p12-q13), support exceeded replication levels (max. lod 1.66 at marker D3S1292 for irregular reading), within 20 cM of the peak reported by Nopola-Hemmi et al.22 Further support for this being the same locus was provided by the overlap of the 1-lod-drop region of this linkage with the original peak, with a lod of 1.04 at marker D3S1289 for nonword reading (our closest marker to D3S3665, which defined the peak in the original report) and a lod of 1.19 for nonword spelling at marker D3S4542, some 20 cM proximal to the original locus.22 Together, these results are evidence for a gene or genes in the locus identified by Nopola-Hemmi et al.

Although peaking at the same marker previously reported by Grigorenko et al,15 support for linkage for DYX8 (chromosome 1: marker D1S234) fell short of the replication level with a peak lod of 1.2 for nonword reading. Support for linkage was low and flat across 6p, with no support for any effect at 6p23–21.3 (DYX2) for any phenotype for reading or spelling (peak LOD score 0.16). Support for linkage at 11p15.5 (putative DYX7) was also weak (max. lod 0.62 for irregular reading at marker D11S1338), with no peaks elsewhere on the chromosome. Similarly for the putative DYX3, two distinct peaks were present within 18 cM distal of the D2S337-D2S286 linkage identified by Francks et al20 at 2p15–16, with a lod of 0.83 at marker D2S1360 for nonword spelling; and a second peak of 1.04 at marker D2S2972 for regular word reading.

For DYX4 (6q11.2–q12), support at the original locus of maximum evidence for linkage exceeded replication levels (linkage for irregular word spelling, max. lod of 1.59 at D6S462), but a considerably stronger peak lay distal to this at D6S262 (lod=2.04, 130.7 cM), again for irregular word spelling. It is noteworthy that the small set of markers examined by Petryshen et al31 did not extend beyond 6q12, as they designed the marker set primarily to explore the 6p region. This leaves open the possibility that the true region of maximum linkage in their sample was more distal than reported, and that the present peak at D6S262 refines the locus as more distal than originally reported.

Two new regions exceeded the empirical criteria for suggestive evidence for linkage. The strongest of these novel linkages was found for irregular-word reading on chromosome 4 (marker AD4S403; 29.21 cM, lod=2.08; empirical threshold for suggestive support for linkage=1.89). The irregular-word spelling counterpart also supported linkage, albeit more weakly, peaking at a lod of 1.43 at marker D4S2633. Irregular-word spelling showed suggestive support at chromosome 17, peaking at marker D17S831 (5.89 cM) with a lod of 1.99 (exceeding the empirical threshold for suggestive evidence of linkage=1.89).

Discussion

Most previous studies of dyslexia have used patient samples. In contrast, we have administered reading and spelling tests by telephone to a sample of twins and their siblings without selection for any cognitive or other trait, that is, a sample of normally varying adolescents.

The present study of a normal sample supported nine of the 11 regions previously identified only in samples selected for dyslexia, with seven of these loci supported at the level of independent replication. In addition to replicating most prior reports of linkage from clinical samples, two new regions exceeded the empirical criteria for suggestive support. These novel effects were found for irregular-word reading on chromosome 4p15 (the stronger of the two) and for irregular-word spelling on chromosome 17p13.

For five of the seven replicated linkages (2q,21 6q11.2–q12,31 7q32,19 18p21,38 and Xq2740), the present paper forms the first replication in a sample outside those reported in the original papers. Replication maxima also fell close to the loci reported in original reports. For loci 18p21, 7q32, and Xq27, markers identical to those used in the original reports defined the maxima. In the case of 15q21, the linkage peak fell at the marker in our microsatellite array closest to that in the original report.34 Our results, then, suggest that loci identified in dyslexic groups play a role in normal reading and spelling, although further examination is needed.

Two linkage regions received little support in the present study: 6p21–23 and 11p. Given our sample size, failure to find evidence for linkage cannot be taken as strong evidence against the linkage. It may also be the case that the genes in these regions are selective for severe forms of reading problem not present in our general sample. In the case of 6p21–23, the failure to find linkage is compatible with several previous nonreplications at this locus.33, 34, 60 Chapman et al33 noted that in most cases of nonreplication, the tests of reading used were unspeeded, whereas successful replications such as that of Kaplan et al29 used timed measures of reading aloud. It is possible, then, that 6p is related to processes determining the speed, rather than accuracy of reading, or at least that the 6p linkage region is highlighted during speeded reading. Also, two candidate genes have now been reported for 6p23–21.3: KIAA031961, 62 and DCDC2.63 Because of the greater power of association studies, we are examining these genes in our sample at present.64 By contrast with 6p, the 11p15.5 region has a weaker background of previous support.32 The lack of evidence for linkage in this region might suggest that this linkage is related to ADHD and only appears in dyslexia samples owing to comorbidity of ADHD with dyslexia,65 in which case DYX7 would not, in fact, be related to reading-specific functions.

The present data shed some light on observed behavioural correlates of reading such as sex differences, on theoretical proposals for mechanisms underlying reading such as speed of lexical access as a determinant of reading skill,66 and on the range of languages and geographical regions in which chromosomal regions identified to date play a role in the development of reading. Support for linkage on the X chromosome and previous reports38, 40 suggest a mechanism for observed sex differences in dyslexia. The role of sex in determining reading disorder is an important and open topic, especially given recent reports that males are more at risk67 and that severe risk is more highly heritable in males at least before age 8 years.68 More research is needed in this area, given the contradictory state of reports from older samples,2, 69 but clinical risk rates are indisputably elevated for younger males.

The replication support for linkage at 2q22.3 focuses attention on the ability of genetic studies to dissect component tasks within reading. Raskind et al21 reported that linkage support at 2q was specific for speed of nonword reading (‘phonological decoding’), and was unrelated to accuracy on this task. This is the first report of such a within-task specificity, and as such was of interest in potentially dissecting the reading phenotype. Although our study did not contain pure speeded measures, instead focusing on accuracy, modest support for linkage was found proximal to the original locus for regular word reading accuracy with stronger evidence (lod 2.18 at marker XRCC5) distal to 2q22.3, again for a regular-word phenotype (spelling). Wolf's double-deficit hypothesis of reading suggests that lexical access speed is a basis for lexical skill66, 70 and would predict that a ‘speed’ endophenotype such as reported by Raskind et al21 should associate with regular word reading accuracy as found here. Clearly, more studies are needed, both to narrow the region of interest and to better understand the phenotype, and how speeded reading might be related to our spelling task.

Several of the loci, including 2p and 7q, have now been examined in a diverse range of language families and geographical regions. For instance, chromosome 2p has been linked to reading across a range of geographical boundaries and language families including Norwegian,17 Canadian,18 Finnish,19 and English20, 38 samples. Likewise, the replication of linkage support at 7q32 in an English-speaking unselected Australian sample suggests that it is not specific to the original sample, or language (the original report being Finnish19). This suggests that several of the genes identified may affect reading in ways common to most or all languages and that at least some linkages may operate across broad geographical regions.

The status of candidate genes has recently been reviewed,13 with support for candidates at DYX1 (DYX1C171), DYX2 (KIAA031961, 62 and DCDC263), and DYX5 (ROBO172) being involved in at least some cases of dyslexia. For most regions, then, candidate genes are unknown, but the finding that each of the above four genes plays a putative role in neuronal migration during the laying down of the nervous system may guide the search for candidates at the other loci reported here and elsewhere.

In summary, our data provide the first reported replication of five linkages: 2q22.3, 6q11.2, 7q32, 18p21, and Xq27. Additional support was found for the four well-established loci: 3p12-q13 (DYX5), 15q21.1 (DYX1), 1p34–36 (DYX8), and 2p15–16 (DYX3), with a clear failure to find support only in two cases (6p23–21.3 (DYX2) and 11p15.5 (DYX7)). This level of support in a normal, unselected sample for linkages previously reported in dyslexic patient samples, together with the novel linkages, is encouraging. Developments in positional candidates provide optimism that further studies can identify genes and contribute to a mechanism for the biological basis of both reading disorder and normal variation in reading.