Long Human–Mouse Sequence Alignments Reveal Novel Regulatory Elements: A Reason to Sequence the Mouse Genome

Ross C. Hardison¹,², John Oeltjen³, and Webb Miller²,⁴,⁵

Departments of ¹Biochemistry and Molecular Biology and ⁴Computer Science and Engineering, and ²Center for Gene Regulation, The Pennsylvania State University, University Park, Pennsylvania 16802; ³Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030

This extract was created in the absence of an abstract.

The utility of sequencing entire genomes of bacteria and fungi is amply demonstrated. For instance, as the complete set of genes for each species is catalogued, one can ascertain the full complement of encoded proteins, obtain insights into the function of new proteins by sequence matches to known proteins, and measure the transcriptional levels of all genes in a genome under various environmental conditions or at different stages of the cell cycle (Boguski et al. 1996; Velculescu et al. 1997). The currently sequenced genomes consist primarily of coding regions with little sequence between the genes, and the amount of genetic information in each segment is usually quite high. Larger genomes from more complex organisms have a considerable amount of DNA between the genes and in introns that interrupt the coding regions, and one could question whether it is useful to determine the sequences of all of these noncoding regions. Indeed, the concerted efforts to determine partial sequences of normalized cDNA libraries have generated rich and very useful databases, such as the TIGR database (TDB) and dbEST (Adams et al. 1991; Boguski 1995). Efforts from Schuler and his colleagues to unite the several sequences from each set of cDNA clones representing a unique gene, the UniGene project, will organize this large amount of sequence data. As of late 1996, the UniGene database contained samples of sequences of almost 50,000 genes, which could represent a majority of human genes (Schuler et al. 1996). Of these UniGene clusters, 16,000 have been placed on the human genome map, which will greatly aid in positional cloning of interesting genes.

Although TDB, dbEST, and UniGene are extremely useful, they do have limitations. For instance, comparison of a long genomic DNA sequence with the expressed sequence tag (EST) databases is a very effective method for …
