Introduction

Autism (Mendelian Inheritance in Man (MIM) 209850) is a severe, complex neurodevelopmental disorder, characterized by impairments in reciprocal social interaction and communication, and restricted and stereotyped patterns of interests and behaviours.1 It predominantly affects males, with a sex ratio of approximately 4:1. The latest study in the United Kingdom indicates that the general population prevalence is approximately 38.9 per 10 000, with no apparent systematic geographic or socioeconomic variation.2 Autism spectrum disorders (ASDs) refer to a broader definition of autism, which includes classical and atypical autism, Asperger syndrome and pervasive developmental disorder not otherwise specified; these belong to the group of closely related pervasive developmental disorders.3 When the different ASDs are combined, the total prevalence can be as high as 116.1 per 10 000 children.2 Prevalence estimates have risen over the last two decades, likely because of an increased awareness of the disorder and because of broadening and improvement of the diagnostic tools.4 Although the aetiology of autism is not yet understood, it is a highly heritable disorder, with an age of onset usually before 3 years of age and symptoms persisting throughout life.5

Following the first published genome-wide linkage screen in 1998,6 several independent studies for autism have been conducted. Evidence for suggestive linkage has been found on almost all chromosomes, with little concordance between studies. However, one of the most consistently replicated loci is located on chromosome 7q. The Autism Susceptibility Locus 1 (AUTS1) region, identified by the International Molecular Genetic Study of Autism Consortium (IMGSAC) on chromosome 7q,6 was further established in two follow-up studies conducted by IMGSAC using additional families and markers.7, 8 It has also subsequently shown consistent positive linkage results in several other studies.9, 10, 11, 12, 13, 14, 15, 16 However, the significance of linkage and the precise location of the peak vary between these studies. Thus the peak of linkage in this area remains broad, spanning more than 40 Mb and containing over 200 mapped genes.

The human gene MET (proto-oncogene hepatocyte growth factor receptor; MIM 164860) is located on chromosome 7q31.2, under the linkage peak of interest for IMGSAC families (7q21.3-7q34). MET covers approximately 126 kb and encodes a high-affinity transmembrane receptor tyrosine kinase of the hepatocyte growth factor/scatter factor (HGF/SF). HGF is a mesenchymally derived growth factor that stimulates epithelial cell mitogenesis, motility, invasion and morphogenesis. MET and its ligand are expressed in numerous tissues, although predominantly in cells of epithelial and mesenchymal origin, respectively.17 The receptor comprises an N-terminal α-chain, located outside the membrane, a C-terminal β-chain that contains an extramembrane sequence, a single transmembrane domain and a cytoplasmic tyrosine kinase domain.18, 19 The kinase activity and the receptor's dimerization are essential for the active form of the receptor, and this activation will consequently trigger a number of signalling pathways in target cells.20

MET was primarily identified as a proto-oncogene21 and has a well documented role as a dominant oncogene in tumour development and progression, being overexpressed and/or deregulated in diverse human tumours.22, 23, 24, 25 Furthermore, MET signalling participates in the immune system regulation,26, 27, 28 embryogenesis20 and in the peripheral organ development and repair (such as gastrointestinal29). MET and HGF, both expressed in the developing nervous system, have been implicated in neuronal development,18, 30, 31, 32 more specifically in the cerebral cortex33, 34 and cerebellum.35 Impaired MET/HGF signalling interferes with interneuron migration and disrupts neuronal growth in the cortex,33, 34 and also leads to a decreased proliferation of granule cells, causing a parallel reduction in the size of the cerebellum.35 Interestingly, these features have been observed in the brains of autistic individuals.34, 36, 37, 38

Recently, Campbell et al38 reported genetic and molecular biological evidence pointing to a common functional promoter variant in the MET gene as a contributing risk factor for autism susceptibility. They showed significant overtransmission of the rs1858830 C allele in 204 autistic families (P=0.0005), confirmed in a replication sample of 539 families (P=0.001). Also, the C allele was significantly less prevalent in a group of 189 unrelated controls than in the cases or their parents. Expression studies determined that the C allele results in decreased MET promoter activity and altered binding of specific transcription factor complexes, suggesting a link between reduced gene expression and autism susceptibility.38 MET protein levels were also significantly decreased in ASD cases compared with controls and this was accompanied in ASD brains by increased mRNA expression for proteins involved in regulating MET signalling activity.39

In this study, the role of the MET gene in autism susceptibility was analyzed in the IMGSAC family collection following previous reports of association. The entire known variation within the MET locus was covered to evaluate whether any specific gene variants or haplotypes were associated with autism, thus seeking a better understanding of potential mechanisms underlying the previous positive findings for autism in the 7q region.

Materials and methods

Subjects

The collection and identification of families, assessment methods and inclusion criteria used by IMGSAC are described elsewhere.7 Briefly, after an initial screen, parents undertook the Autism Diagnostic Interview-Revised40 and the Vineland Adaptive Behaviour Scales.41 Probands were administered the Autism Diagnostic Observation Schedule42 and a medical examination was performed to exclude recognizable medical causes of autism, particularly tuberous sclerosis and neurofibromatosis. Karyotyping was undertaken where possible on all affected individuals and molecular genetic testing for Fragile X syndrome performed on one case per family. DNA was extracted from blood samples, buccal swabs or cell lines by use of a DNA purification kit (Nucleon, Manchester, UK) and standard techniques. Written informed consent was given by all parents/guardians and, where possible, by affected individuals. The study has been approved by the relevant ethical committees. For this experiment, a total of 1621 Caucasian individuals from 325 multiplex IMGSAC families and 10 IMGSAC trios were genotyped. The male:female ratio of the affected individuals is 3.3:1. Also, 82 Italian trios from the IMGSAC collection were used in the replication study.

A sample of 185 UK controls from ECACC (European Collection of Cell Cultures) and 88 Italian controls randomly chosen were used for the case–control study. The Italian cohort (samples and controls) used in this study is entirely independent from the Campbell et al study.38

Single nucleotide polymorphism (SNP) genotyping

SNP selection

With the intention of capturing the maximum amount of genetic variation in MET, genotyping data from CEPH (Centre D'étude du Polymorphisme Humain) individuals were downloaded from HapMap phase II (release 21) for the 7q31.2 region, including 5 kb upstream and downstream of the gene (chromosome 7: 115890847–116029719). Thirty-one haplotype-tagging SNPs covering the selected region were chosen using Tagger from Haploview v4.0 (ref. 43; r²>0.8 and minor allele frequency >0.05, aggressive tagging). In addition, four non-synonymous SNPs and rs1858830 (the promoter variant reported earlier by Campbell et al38) were also chosen (Figure 1). Thirty-four of these 36 SNPs were genotyped using the Sequenom MALDI-TOF iPLEX platform (Sequenom, San Diego, CA, USA) and two by a restriction fragment length polymorphism assay.

Figure 1

(a) Schematic representation of the 36 SNPs chosen to cover the MET locus and their respective locations. The 21 exons are indicated by solid blue boxes and numbered. The SNPs represented include the 31 haplotype-tagging SNPs, 4 non-synonymous variants (in green) and rs1858830 (promoter variant in light blue). (b) The graphical output from Haploview for the MET gene, including the markers tested and the haplotype blocks constructed. Analysis of the markers selected to cover the genetic variation within the MET locus showed four different LD blocks. D′ values are given (bright red corresponds to D′=1, with the colour tending towards white as D′ tends towards 0).

Sequenom iPLEX assay

The genotyping assay was designed using MassARRAY design software (Sequenom) and genotypes were obtained using the MassARRAY Typer system (version 3.1.4.0). The 34 SNPs were genotyped with sample and SNP genotyping success rates of 99 and 100%, respectively. The Picogreen dsDNA Quantitation Kit (Invitrogen, OR, USA) was used to confirm the concentration of all the DNA samples and 40 ng of DNA per sample was used in the genotyping assay. Two control individuals were included on each plate as genotyping controls for inter-plate reproducibility. Genotype Analyzer (Sequenom) was used to check the quality of genotypes and to assign alleles where possible. The data were exported and uploaded into the integrated genotyping system,44 an in-house database.

Restriction digest

The restriction fragment length polymorphism assay was selected for genotyping three markers: rs1858830, rs2237711 in IMGSAC cases (as they were incompatible with the Sequenom system) and rs38845 in the Italian cohort and in the ECACC controls. Primers were designed using Primer3 (http://frodo.wi.mit.edu/cgi-bin/primer3/primer3_www.cgi) and BLAT (Blast-like alignment tool) analysis was performed (using the University of California – Santa Cruz (UCSC) database) to confirm their specificity. Primer sequences used are as follows: forward (f)-GATTTCCCTCTGGGTGGTGC/reverse (r)-CAAGCCCCATTCTAGTTTCG for rs1858830, (f)-GGGCTATGTCCCATTTCTCA/(r)-GGCAGGGTAGTAGGTTGGAA for rs2237711 and (f)-CTGATTCTGCCCTCACTTCAG/(r)-AAGTATGTGTTAGTGAGACCGAAA for rs38845.

For rs1858830, the KOD Hot Start DNA Polymerase kit (Novagen, Osaka, Japan) was used owing to the high GC content in this region (85%). A touchdown PCR (95 °C for 30 s, 65 °C for 30 s and 72 °C for 30 s, followed by 10 cycles at decreasing annealing temperatures in decrements of 0.5 °C per cycle, then 25 cycles of 95 °C for 30 s, 60 °C for 30 s and 72 °C for 30 s) was performed in 15 μl reactions containing 50 ng of DNA and a final concentration of 2.0 mM MgSO4, 0.33 μM primer, 0.2 mM dNTPs, 5% DMSO and 0.05 U KOD Hot Start DNA polymerase (Novagen). The PCR for rs2237711 (35 cycles at 90 °C for 30 s, 62 °C for 30 s and 72 °C for 45 s) was performed in 15 μl reactions containing 50 ng of DNA and a final concentration of 1.25 mM MgCl2, 0.4 μM primer, 0.2 mM dNTPs and 0.25 U BIOTaq DNA polymerase (Bioline, Bath, UK). The touchdown PCR for rs38845 (95 °C for 60 s, 63 °C for 60 s and 72 °C for 30 s, followed by 13 cycles at decreasing annealing temperatures in decrements of 0.5 °C per cycle, then 28 cycles of 95 °C for 60 s, 56 °C for 30 s and 72 °C for 60 s) was performed in 15 μl reactions containing 50 ng of DNA and a final concentration of 2 mM MgCl2, 0.3 μM primer, 0.2 mM dNTPs and 0.25 U AmpliTaq Gold DNA polymerase (Applied Biosystems, Foster City, CA, USA). In total, 5 μl of the PCR products of rs1858830, rs2237711 and rs38845 were separated on 2% agarose gels to check the size and quantity of product, and the remaining 10 μl were incubated with EagI, HhaI and Hpy188I, respectively (5 U of EagI and 1 × NEB buffer 3, in a total volume of 20 μl; 1 U of HhaI and 1 × NEB buffer 4, supplemented with 1 × BSA, in a total volume of 20 μl; 2 U of Hpy188I and 1 × NEB buffer 4, in a total volume of 15 μl) at 37 °C for 3 h and visualized with SYBR-safe (Invitrogen) on a 3% agarose gel. Genotypes were identified and entered into the integrated genotyping system. As with the iPLEX assay, two control individuals were included on each plate as genotyping controls for inter-plate reproducibility.
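The touchdown programmes described above can be written out cycle by cycle as a check on the arithmetic. The following is a minimal sketch, not the authors' thermocycler programme: it assumes that the first listed cycle runs at the starting annealing temperature and that each of the subsequent touchdown cycles lowers it by one decrement. The function name and its parameters are illustrative.

```python
def touchdown_schedule(start_anneal, step, touchdown_cycles, final_anneal, final_cycles):
    """Return the annealing temperature (in °C) used in each PCR cycle.

    Assumed reading of the protocol: one initial cycle at `start_anneal`,
    then `touchdown_cycles` cycles each lowered by a further `step`,
    then `final_cycles` cycles at the fixed `final_anneal` temperature.
    """
    temps = [float(start_anneal)]
    for i in range(1, touchdown_cycles + 1):
        temps.append(round(start_anneal - step * i, 1))
    temps.extend([float(final_anneal)] * final_cycles)
    return temps


# rs1858830: start at 65 °C, 10 touchdown cycles of -0.5 °C, then 25 cycles at 60 °C
rs1858830_schedule = touchdown_schedule(65, 0.5, 10, 60, 25)

# rs38845: start at 63 °C, 13 touchdown cycles of -0.5 °C, then 28 cycles at 56 °C
rs38845_schedule = touchdown_schedule(63, 0.5, 13, 56, 28)

print(rs1858830_schedule[:11])
print(rs38845_schedule[:14])
```

Under this reading, the last touchdown cycle for rs1858830 lands on the 60 °C used for the remaining cycles, whereas for rs38845 it ends at 56.5 °C, half a degree above the final 56 °C annealing temperature.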

Statistical analysis

Error corrections

Mendelian consistency of SNP genotype data in family pedigrees was checked using PedCheck45 and any inconsistent genotypes were removed. Genotypes flanking double recombinants were detected using Merlin46 and disregarded if ambiguous, as these could be indicative of genotyping errors. All the SNPs were tested for Hardy–Weinberg equilibrium using a χ²-test in founders and probands separately.
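For illustration, a Hardy–Weinberg χ²-test for a single biallelic SNP can be sketched as follows. This is a generic textbook version with one degree of freedom, not the software used in the study; the function names and the genotype counts in the example are hypothetical.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg equilibrium from genotype counts.

    Returns (chi2, p_value). A monomorphic SNP carries no information
    about equilibrium, so (0.0, 1.0) is returned in that case.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 0.0, 1.0
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, chi2_sf_1df(chi2)


# Hypothetical genotype counts (AA, AG, GG) in founders
chi2, p_value = hwe_chi_square(120, 210, 95)
print(chi2, p_value)
```

Running the test separately in founders and probands, as described above, amounts to calling the function once per group with the corresponding genotype counts.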

Association analysis

The pairwise linkage disequilibrium (LD) map for MET was constructed from genotypes using the Haploview software. Association was assessed using the transmission disequilibrium test (TDT).47 TDT uses genotype data from nuclear families consisting of two parents and one or more affected individuals, and is robust to population stratification. A version robust to non-independent siblings was implemented using the DGCgenetics R library (http://www-gene.cimr.cam.ac.uk/clayton/software), allowing for multiple affected offspring from the same nuclear family to be used in this study. Allele frequencies were reported for all the parents, as well as allele transmission frequencies from parents to offspring. Parental transmissions were also examined for each SNP to consider parent-of-origin effects. A haplotype-based TDT analysis was performed using Transmit48 methodology implemented within the tdthap R library (http://cran.r-project.org). For each haplotype, risk estimates and their 95% confidence regions were estimated using a Bayesian method.47, 48
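The logic of the basic TDT can be sketched in a few lines. The example below is a simplified illustration for a biallelic SNP in parent–offspring trios; it is not the sibling-robust implementation used in this study, and the function names and example genotypes are hypothetical. Only heterozygous parents contribute: b counts transmissions of the allele of interest and c counts non-transmissions, and the statistic (b − c)²/(b + c) is compared with a χ² distribution with one degree of freedom.

```python
import math


def count_transmissions(trios, allele):
    """Count transmissions (b) and non-transmissions (c) of `allele`
    from heterozygous parents to affected offspring.

    Each trio is (father, mother, child), with genotypes given as
    two-character strings such as "AG". Assumes a biallelic SNP.
    """
    b = c = 0
    for father, mother, child in trios:
        parents = (father, mother)
        n_het = sum(1 for g in parents if g[0] != g[1])
        if n_het == 0:
            continue  # homozygous parents are uninformative
        # Copies of the allele that homozygous parents must have passed on
        from_hom = sum(1 for g in parents if g[0] == g[1] == allele)
        transmitted = child.count(allele) - from_hom
        if not 0 <= transmitted <= n_het:
            raise ValueError(f"Mendelian inconsistency in trio {father, mother, child}")
        b += transmitted
        c += n_het - transmitted
    return b, c


def tdt_statistic(b, c):
    """McNemar-type TDT chi-square (1 df) and its P-value."""
    if b + c == 0:
        raise ValueError("no informative transmissions")
    chi2 = (b - c) ** 2 / (b + c)
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical trios: (father, mother, affected child)
example_trios = [
    ("AG", "AA", "AA"),  # heterozygous father transmits A
    ("AG", "GG", "AG"),  # heterozygous father transmits A
    ("AG", "AG", "AG"),  # one A and one G transmitted
    ("AA", "GG", "AG"),  # uninformative
    ("AG", "GG", "GG"),  # heterozygous father transmits G
]
b, c = count_transmissions(example_trios, "A")
chi2, p_value = tdt_statistic(b, c)
print(b, c, chi2, p_value)
```

Because the comparison is made within families, between the alleles a heterozygous parent did and did not transmit, the statistic is unaffected by allele frequency differences between subpopulations, which is why the test is robust to population stratification.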

Logistic regression was applied in the analysis of both the alleles and genotypes in the case–control study. We assessed the robustness of results through permutation approaches by sampling and analyzing multiple (50 000) datasets consisting of one affected case per family from the multiplex families.

Correction for multiple testing in family-based study

Performing multiple statistical tests leads to an inflation in the occurrence of false positives and it is necessary to adjust the P-value significance threshold (usually 5%) to account for the number of independent tests. Because of the high LD within the region, a Bonferroni correction using the total number of SNPs would be too conservative.48 By considering the LD pattern, which identified four independent haplotype blocks, it is reasonable to interpret a P<0.0125 to be statistically significant. A separate permutation approach found a similar threshold, established for the principal analyses. However, secondary tests, such as assessing parent-of-origin effect, which we know require larger sample sizes to be meaningful, are not reflected in this threshold.
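The threshold above follows from dividing the nominal 5% level by the number of independent tests. A minimal sketch of the arithmetic is given below; the function name is illustrative, and the 36-SNP threshold is shown only for comparison with the block-based one.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


ALPHA = 0.05
per_block = bonferroni_threshold(ALPHA, 4)   # four independent LD blocks
per_snp = bonferroni_threshold(ALPHA, 36)    # all 36 SNPs treated as independent

print(f"block-based threshold: {per_block:.4f}")
print(f"SNP-based threshold:   {per_snp:.5f}")
```

Treating all 36 SNPs as independent would require P below roughly 0.0014, about nine times stricter than the block-based threshold, which illustrates why the SNP-based correction is considered too conservative when markers are in high LD.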

Bioinformatic analysis

A search for transcription factor binding sites was conducted using the online tools MatInspector49 and TFSearch (http://www.cbrc.jp/research/db/TFSEARCH.html). These software programmes utilize a large library of matrix descriptions for transcription factor binding sites to locate matches in DNA sequences.50

Results

Family-based studies

SNP genotyping and association analysis

A total of 36 SNPs were successfully genotyped across the entire MET locus, all of which were in Hardy–Weinberg equilibrium in both probands and founders (P>0.01). To test the potential association between MET and autism, a family-based analysis was performed using TDT. The P-values for all SNPs tested are reported in Table 1. One SNP (rs38845) showed statistically significant (P<0.01) transmission disequilibrium, with preferential transmission of the A allele to the affected offspring (P=0.0035, odds ratio=1.3 (95% confidence interval: 1.09, 1.54); Table 1). This odds ratio of 1.3 indicates that the condition is more likely in individuals who carry the A allele (conversely, carrying the G allele is protective). Parental transmissions were also examined for each SNP, but no evidence of parent-of-origin effects was found (P>0.12). Moreover, none of the four non-synonymous SNPs genotyped (rs34349517, rs35776110, rs33917957 and rs34589476) showed association with autism; all had very low heterozygosity (in particular rs34349517, which was monomorphic in this population). Genotypes of the two control individuals included to monitor inter-plate reproducibility were 100% concordant.
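As an illustration of how an odds ratio and its confidence interval relate to transmission counts, the sketch below uses the standard matched-pair estimate (transmitted over untransmitted) with a Wald interval on the log scale. This is a generic approach and an assumption on our part, not necessarily the estimator behind the values in Table 1, and the counts are hypothetical rather than the study's data.

```python
import math


def tdt_odds_ratio(b, c, z=1.96):
    """Odds ratio b/c with a Wald confidence interval on the log scale.

    b: transmissions of the allele from heterozygous parents
    c: non-transmissions of the allele from heterozygous parents
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if b <= 0 or c <= 0:
        raise ValueError("both counts must be positive")
    odds_ratio = b / c
    se_log = math.sqrt(1.0 / b + 1.0 / c)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts chosen only to give an odds ratio of 1.3
odds_ratio, lower, upper = tdt_odds_ratio(130, 100)
print(f"OR={odds_ratio:.2f} (95% CI: {lower:.2f}, {upper:.2f})")
```

An interval that excludes 1 corresponds to a nominally significant transmission distortion; with more informative transmissions the interval narrows around the same point estimate.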

Table 1 TDT results for the SNPs genotyped across MET

LD patterns and haplotype analysis

Analysis of the LD patterns across MET (using 31 haplotype-tagging SNPs, along with the 4 non-synonymous SNPs and the rs1858830 promoter variant) defined 4 LD blocks of 10, 3, 37 and 56 kb (Figure 1). Analysing haplotypes of SNPs in LD can offer more power to detect association than testing SNPs individually. Haplotype analysis was therefore performed to further characterize the transmission disequilibrium within the LD blocks. Transmission of haplotypes, including all markers within each block, was tested using Transmit48 (Table 2). Haplotype A–A–G–T–G for markers rs38845–rs9641562–rs10487353–rs38846–rs12535996 was overtransmitted to the affected offspring (P=0.007, odds ratio=1.34 (95% confidence interval: 1.08, 1.66)), revealing a significant transmission distortion in LD block 2. This suggests that this specific haplotype (in intron 1) could potentially increase susceptibility to autism. Statistical significance was not increased compared with single markers, and so the association obtained with the five-marker haplotype may be driven by the single-locus association with rs38845.

Table 2 Haplotype transmission disequilibrium results within each LD block for MET haplotypesa and for the haplotype combinations from Haploview, selected by Tagger

Given that the haplotype-tagging SNPs were chosen using aggressive tagging in Haploview v4.0 (ref. 43), the multi-marker haplotype combinations from the Tagger output were also tested (Table 2). The haplotype ATG with markers rs38845–rs38846–rs38849 was significant (P=0.01, odds ratio=1.33 (95% confidence interval: 1.11, 1.59)). This haplotype again contains the significant single marker and the A allele is overtransmitted to the cases, confirming once more the direction of association. This particular combination of markers tags a fourth SNP (rs39747). An overall view of the LD structure across the region reveals that this variant lies at the boundary of haplotype block 2, just before rs38845, but is not in complete LD with the latter (D′=0.95; r²=0.7).
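The two LD measures quoted above capture different things, which is why a pair of SNPs can have a high D′ and a lower r². The sketch below computes both from haplotype and allele frequencies using the standard definitions; the function name and the example frequencies are hypothetical and are not taken from the MET data.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and B at locus 2
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2


# Equal allele frequencies, alleles always found together: D' and r^2 are both 1
print(ld_measures(0.5, 0.5, 0.5))

# Unequal allele frequencies: no recombinant haplotype (D' = 1), yet r^2 is well below 1
print(ld_measures(0.3, 0.3, 0.5))
```

D′ reaches 1 whenever at least one of the four possible haplotypes is absent, whereas r² reaches 1 only when the two SNPs also have matching allele frequencies, so r² is the more relevant measure when asking whether one marker can stand in for another.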

TDT replication study

The SNP showing evidence for association (rs38845) and the promoter variant rs1858830 were further studied in an independent sample of 82 Italian trios. TDT analysis of the two SNPs did not provide significant results (Table 3).

Table 3 TDT results for rs38845 and rs1858830 using the 82 Italian trios. The allele frequencies shown are for the parents only

Case–control study

A case–control study was also conducted among patients with autism against unselected controls, using allele and genotype data from our SNP of interest (rs38845). The genotypes of 185 ECACC and 88 Italian controls were determined by restriction enzyme digestion. Case–control analysis was performed using logistic regression in IMGSAC probands against ECACC controls and Italian cases against Italian controls. Allelic tests of association applied to the Italian cohort indicated an increased risk of the A allele compared with the G (P=0.017 (95% confidence interval: 1.10, 2.59)). A genotype analysis suggested an increased risk for the AA/GA genotypes compared with GG (a dominant effect of A) of nearly 2.5 times (P=0.02 (95% confidence interval: 1.16, 5.33) (Table 4)). No significant allele or genotype frequency difference was detected in IMGSAC cases vs the sample of ECACC controls.

Table 4 Case–control studies for the rs38845 variant in the United Kingdom and Italian populations

Bioinformatic analysis of rs38845

The results from MatInspector49 predicted the creation of a binding site for an interferon-stimulated response element (interferon regulatory factor 1 (IRF1) (MIM *147575)) in the region only when the A allele of rs38845 is present. This prediction was confirmed by TFSearch.50 In addition, the TFSearch output showed that the protein C/EBP (CCAAT/enhancer-binding protein) is predicted to bind the same region only when the A allele is present. The binding of these elements could be of importance in the regulation of MET expression.

Discussion

MET has been implicated in brain circuit development and in peripheral functions (such as gastrointestinal repair and immune function),34, 38 which are consistent with observations in a proportion of autistic patients. As the gene is also located in a key area of linkage on chromosome 7, MET was examined as an appealing candidate gene for autism by performing a thorough analysis aiming to cover all its genetic variation. However, the earlier reported genetic association of the common C allele of rs1858830 in the promoter region38 failed to replicate in this study. A family-based analysis was carried out using TDT and Transmit, but there was no evidence of significant association for this marker in the family collection tested. Several explanations can be proposed for this incongruity. The lack of association found with the pre-ascertained variant might be because of the differences in the sample collection, such as the inclusion/exclusion criteria, the different size of family collections and the use of different analytical approaches. Furthermore, although our sample was mostly composed of multiplex families, and Campbell et al reported their association to be stronger in such families, heterogeneity in autism cohorts could still be a potential source of variation. The TDT approach used is robust to population stratification and although it only uses heterozygous parent transmissions, should be an efficient method to fine map susceptibility loci in a region of linkage such as the one on chromosome 7.

The family-based association analysis carried out indicated that there is a genetic association between the marker rs38845 and autism. The A allele is transmitted to the affected offspring more often than expected by chance, suggesting that it increases the risk that carriers of this allele will develop the disorder. In addition, the haplotype analysis revealed that the risk haplotype A–A–G–T–G for the combination of markers within LD block 2 (rs38845–rs9641562–rs10487353–rs38846–rs12535996) and the haplotype ATG (rs38845–rs38846–rs38849) were overtransmitted to the affected offspring. Therefore, these findings indicate that the A allele of rs38845 increases the risk of developing autism, possibly interacting with other susceptibility genes (not yet known), and with still undefined environmental factors. It is noteworthy that rs38845 is not in strong LD with the promoter SNP rs1858830 (Figure 1). We attempted to replicate this finding in an independent sample collection of 82 Italian trios and 88 Italian sex-matched controls. Using TDT, no evidence of association was detected with either rs38845 or rs1858830. Specifically for rs1858830, this result is not completely surprising as the association was first found in multiplex families, while the Italian cohort is composed of simplex families. However, case–control analysis in the Italian cohort revealed a positive association of rs38845, in the same direction. Although the sample size for the Italian case–control sample is arguably small, this result can be taken as a confirmation of the positive result obtained earlier by family-based analysis in the IMGSAC sample set.

Finding different associated variants within the same region (such as ours and Campbell et al, 2006) is often reported.51 When an association is replicated, it is frequently with a different phenotype of the disorder, with different polymorphisms in the same gene and even with different alleles of the same variant, such as examples reported in asthma.52 The lack of concordance between our association studies reflects once more the complexity in studying such a heterogeneous disorder as autism, and should not be taken to mean that the results are false positives. There are now several studies associating MET with psychiatric disorders, especially within the area upstream of the coding region.38, 39 Regulatory mechanisms not yet identified may be acting in this region and be responsible for alterations in gene expression.

Multiple upstream and downstream proteins regulate MET signalling, for example through the availability of the HGF ligand and of MET-activating coreceptors. For instance, CD44 forms a multimeric complex with MET and HGF and is required to activate signalling through the MET receptor tyrosine kinase.39 As the rs38845 polymorphism is located in intron 1, it may have a regulatory role in gene expression, affecting, for example, transcription rate or splicing. Alterations in MET expression may contribute to autism susceptibility. Bioinformatic analyses of this variant predicted that the nuclear factor IRF1 only binds the target DNA sequence when the A allele is present. As the A allele was associated with autism susceptibility in our families, this warrants further investigation.

IRF1 is a nuclear factor that binds the promoter region of both IFN-α and IFN-β genes,53 and IRF1 functions as a transcriptional activator for the type I IFN genes54 (in which IFN-α and IFN-β are included). Additionally, an earlier study showed that in human hepatocytes IFN-α upregulates 44 genes by 100% and downregulates 9 genes by 50% (including MET). Downregulation of MET is caused by IFN-α suppression of the MET promoter activity (through downregulation of Sp1), which results in an attenuation of HGF/MET-induced signals and cell proliferation.55 Recent literature has suggested that a reduced expression of the MET receptor tyrosine kinase could play a crucial role in the pathobiology of autism.38 Thus, we hypothesize that when the A allele is present at rs38845, IRF1 binds directly to the MET promoter DNA sequence, contributing to increased deregulation of HGF/MET-induced signalling in autistic patients by mechanisms not yet known.

In disorders with a complex genetic basis there has been a growing interest in LD, mainly because of the belief that association studies offer considerable power for mapping common disease genes.56 Nonetheless, the pursuit of common variants alone might not have been the most suitable approach for identifying the susceptibility variants in this specific region, as we should perhaps be looking for rare variants in parallel. Non-replication is a recurring problem in the genetics of ASDs, and it shows that going from a broad peak of linkage to a functional genetic variant remains a challenge for autism research today.

Overall, these results provide further evidence that MET may play a role in autism susceptibility; however, further studies will be essential to better clarify its part in the pathology of the disorder.