The growth in the number of online metabolism and metabolomic resources over the last decade has paralleled the growth in online resources and online databases for genetic and metabolic diseases. As in most areas of medical science, the need to develop online disease databases arose from the need to disseminate rapidly and update continuously the information being acquired about known and newly discovered genetic diseases. Many of these genetic and/or metabolic disease resources started off as textbooks or simple downloadable lists. However, the opportunities and cost savings provided by Web distribution have led to a proliferation of disease-specific Web-enabled resources. In fact, the book you are reading [the Online Metabolic and Molecular Bases of Inherited Disease (OMMBID)] is a perfect example of a textbook on metabolic and genetic diseases that has moved to an online format. As a rule, most genetic and/or metabolic disease databases are quite specialized, with a specific focus on a single disease. However, a few resources are quite general and impressively comprehensive. This section will review a number of the better known or more useful genetic and/or metabolic disease databases. More are listed in Table 3.1-4.
Perhaps the best example of a comprehensive genetic-metabolic disease database is the Online Mendelian Inheritance in Man (OMIM). OMIM is a superbly researched encyclopedic resource that contains genetic (cloning, gene function, gene structure, mapping), phenotypic, historical, and clinical data on more than 5500 genetic and/or metabolic disorders (Hamosh et al, 14). Among those disorders, nearly 400 diseases have specific gene sequences associated with them and another 1900 disorders are characterized by a solid understanding of their molecular origins. Originally started in the early 1960s by Dr. Victor McKusick at Johns Hopkins, the Mendelian Inheritance in Man (MIM) was published in 12 print editions from 1966 to 1998. In 1987, the material in the MIM was made available online and renamed OMIM, and in 1995 it was ported to the Web by the National Center for Biotechnology Information (NCBI), where it has remained ever since. Curation and editing of OMIM occur primarily at Johns Hopkins Medical School, although many entries include contributions from scientists around the world. OMIM entries include not only detailed descriptions of diseases but also detailed descriptions of selected disease genes.
OMIM is searchable by MIM number, disease name, gene name, and plain English queries. The "limits" button allows users to restrict searches to disease or gene classes, certain data fields, certain types of records, and certain chromosomes. After a query is made, the OMIM search engine automatically ranks database matches according to the relevance and frequency with which the query term appears in a specific OMIM entry. High-scoring matches then are presented as a hyperlinked list. Clicking on a disease or gene name will send the user directly to the complete OMIM entry. Within each OMIM entry there is a menu on the left side that provides hyperlinks to subtopics within the OMIM page as well as databases outside OMIM. Typically, each disease entry contains a title, a list of synonyms, a short description of the disorder, and a summary of the clinical features, mode of inheritance, cytogenetics, diagnosis, pathogenesis, evolution, and genetic variability, along with relevant references. Below the pagination links on each page’s menu list are links to EntrezGene, sequence databases (RefSeq, GenBank, UniGene, etc.), a Clinical Synopsis, and a Gene Map table. The Gene Map lists the chromosome location, gene symbols, and methods of mapping and provides hyperlinks to the NCBI Map Viewer for more detailed views of the gene’s context within the chromosome. Some disease entries also contain links or "Linkouts" to external disease databases [such as the Cystic Fibrosis Mutation Database (CFMDB)]. OMIM entries for disease genes are similarly structured, with names, synonyms, or alternative symbols listed at the top of each page, followed by a short description of the gene as well as a summary of the cloning process, gene function, mapping, allelic variability, and related references. Links to EntrezGene, a Gene Map table, and Linkouts to external databases are provided for many disease gene entries.
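The idea of ranking matches by how often the query terms appear in an entry can be illustrated with a toy sketch. The code below is not OMIM's search engine, whose scoring is not described here. It assumes a simple term-frequency score, and the function names and the three short entry texts are hypothetical placeholders written for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_entries(query, entries):
    """Return (title, score) pairs sorted by descending score.

    The score is the total number of times any query term occurs in the
    entry title plus text; entries with a score of zero are dropped.
    """
    terms = set(tokenize(query))
    scored = []
    for title, text in entries.items():
        counts = Counter(tokenize(title + " " + text))
        score = sum(counts[term] for term in terms)
        if score > 0:
            scored.append((title, score))
    # Highest score first; ties are broken alphabetically by title.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical, hand-written stand-ins for entry text (not OMIM content).
entries = {
    "Phenylketonuria": (
        "Phenylketonuria is caused by deficiency of phenylalanine hydroxylase."
    ),
    "Cystic fibrosis": "Cystic fibrosis is caused by mutations in the CFTR gene.",
    "Phenylalanine hydroxylase": (
        "The gene encodes phenylalanine hydroxylase, which converts "
        "phenylalanine to tyrosine."
    ),
}

for title, score in rank_entries("phenylalanine hydroxylase", entries):
    print(f"{score:2d}  {title}")
```

A production search engine would normally also weight rare terms more heavily and normalize for entry length; the raw count is used here only to keep the sketch short.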
OMIM entries frequently contain a small lightbulb icon at the ends of paragraphs. Clicking on the lightbulbs allows users to extract relevant papers from PubMed that may relate to the material described in the paragraph. This link exploits the NCBI’s "neighboring" feature that uses keywords from the associated paragraph to generate a list of relevant search terms. In addition to its Gene Map tools, OMIM maintains something called Morbid Map, an alphabetical hyperlinked listing of all mapped disorders in OMIM. This table lists the disease name(s), associated gene(s), chromosome location, and OMIM number, all of which are hyperlinked to various NCBI resources and databases. The Morbid Map allows users to identify rapidly which genetic diseases are associated with which genes or chromosomes.
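A table such as the Morbid Map is essentially a list of disorder, gene, and location records that can be indexed by chromosome. The sketch below shows one way to build such an index. The four rows were typed in by hand for illustration and are not an extract of the Morbid Map; the field names and helper functions are assumptions made for this example.

```python
import re
from collections import defaultdict

# Illustrative rows entered by hand; the real table is far larger and
# also carries OMIM numbers and hyperlinks.
MORBID_MAP = [
    {"disorder": "Cystic fibrosis", "gene": "CFTR", "location": "7q31.2"},
    {"disorder": "G6PD deficiency", "gene": "G6PD", "location": "Xq28"},
    {"disorder": "Phenylketonuria", "gene": "PAH", "location": "12q23.2"},
    {"disorder": "X-linked hypophosphatemia", "gene": "PHEX", "location": "Xp22.11"},
]


def chromosome_of(location):
    """Extract the chromosome label from a cytogenetic band such as '12q23.2'."""
    match = re.match(r"(\d+|X|Y)[pq]", location)
    if match is None:
        raise ValueError(f"unrecognized location: {location!r}")
    return match.group(1)


def index_by_chromosome(rows):
    """Group disorder names by the chromosome on which their gene is mapped."""
    index = defaultdict(list)
    for row in rows:
        index[chromosome_of(row["location"])].append(row["disorder"])
    return dict(index)


by_chromosome = index_by_chromosome(MORBID_MAP)
print(by_chromosome["X"])
```

Indexing once and looking up by key is what makes it quick to answer questions such as which mapped disorders lie on a given chromosome.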
OMIM is an exceptional resource with well-written and well-researched articles on almost every genetic and inherited metabolic disease that has been formally characterized. OMIM also uses many of the best features of the Web as it employs extensive hyperlinking to PubMed, NCBI’s sequence databases, and NCBI’s gene-mapping tools. Perhaps the only limitation to OMIM is its complete lack of images or diagrams. Indeed, relative to metabolic or metabolomic databases that are highly visual and full of hundreds, if not thousands, of colorful pictures, OMIM is a rather dry-looking text-only database. It is hoped that this situation will be remedied soon, with OMIM incorporating information and images from at least a few of the metabolomic or metabolic pathway databases listed in Tables 3.1-1 and 3.1-2.
Other General Genetic and Metabolic Disease Databases
With the possible exception of the OMMBID (which is really a textbook), few online resources can match the comprehensiveness of OMIM. Nevertheless, there are a number of smaller general genetic and metabolic disease databases that are worth mentioning. They include MetaGene Online, the Japan Metabolic Disease Database (Takeuchi et al, 42), GeneDis, the National Organization for Rare Disorders (NORD) database, and the GeneTests database. These databases are discussed in detail below.
MetaGene Online (Table 3.1-4) is an online database of metabolic disorders developed by G. Frauendienst-Egger and F. K. Trefz in Germany. Although never formally published as a paper, it has been updated continuously since 1999. Currently, MetaGene Online has 416 metabolic disorders cataloged in its database. The database supports simple text querying, allowing complete and partial name searches of diseases, metabolites, and symptoms. MetaGene is also browsable through a hyperlinked list of all 416 diseases. Clicking on a disease name (from its disease list) or entering a specific disease name in the text search box will generate a MetaGene "disease card." Each disease card contains information on the disease name, synonyms, the MIM number, associated genes and/or enzymes, gene locus, hyperlinks to ExPASy and OMIM, remarks on the disease origins (autosomal, X-linked, dominant, recessive, etc.), phenotypic traits (which are hyperlinked and cross-referenced), changes in metabolite concentrations (increased, decreased) in various biofluids, possible therapies, pictures of individuals with the disease, and an extensive list of references. Unlike OMIM, the MetaGene database is not encyclopedic in nature. All the information is presented in a synoptic tabular format, with at most a few words attached to any field. Nevertheless, the breadth and depth of information, especially with respect to measured metabolite changes and other clinical diagnostic traits, make this unique database particularly valuable to biomedical researchers.
Table 3.1-4: Genetic and Metabolic Disease Databases
The Japan Metabolic Disease Database
The Japan Metabolic Disease Database (JMDBase) (Takeuchi et al, 42) is essentially a polymorphism or SNP database as opposed to a disease database. JMDBase catalogs the SNPs associated with genes that have been implicated in a number of polygenic or multifactorial metabolic diseases, such as diabetes, obesity, and metabolic syndrome. The intent of this resource is to improve the characterization of multigenic or polygenic metabolic disorders. Currently, JMDBase has cataloged 5914 SNPs from 401 disease-associated genes. The database can be searched by gene symbols, gene names or synonyms, phenotypes, text words in gene summaries, and dbSNP IDs. Users also can select different chromosomes to limit their searches. Database queries return hyperlinked lists of matching genes and gene summaries with matching keywords highlighted in the text. JMDBase’s gene summaries are typically 100- to 200-word descriptions that comment on the gene function and the potential association of the gene with different polygenic metabolic disorders. JMDBase is limited in its searching utilities, having a very strong genocentric focus, as opposed to a disease-centered focus. Furthermore, JMDBase does not include most of the better-known single-gene or monogenic metabolic disorders, such as phenylketonuria, alkaptonuria, and Tay-Sachs disease. Although these monogenic diseases are formally outside the scope of JMDBase, their exclusion limits this database’s utility among medical geneticists.
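The query behavior described above (a keyword search over gene summaries, an optional chromosome limit, and highlighting of matched keywords) can be sketched in a few lines. This is not JMDBase's implementation. The record layout and the `search` function are assumptions, and the three summaries are short placeholder sentences written for this example rather than text taken from the database.

```python
import re

# Hypothetical records: real gene symbols and chromosomes, but the
# summaries are placeholder text, not JMDBase content.
GENES = [
    {"symbol": "PPARG", "chromosome": "3",
     "summary": "Nuclear receptor studied in type 2 diabetes and obesity."},
    {"symbol": "LEP", "chromosome": "7",
     "summary": "Encodes leptin, a hormone studied in obesity."},
    {"symbol": "TCF7L2", "chromosome": "10",
     "summary": "Transcription factor studied in type 2 diabetes."},
]


def search(keyword, chromosome=None):
    """Return (symbol, highlighted summary) pairs for matching genes.

    Matching is case-insensitive on the summary text. When a chromosome
    is given, only genes on that chromosome are considered. Matched
    keywords are wrapped in '**' markers to mimic highlighting.
    """
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    hits = []
    for gene in GENES:
        if chromosome is not None and gene["chromosome"] != chromosome:
            continue
        if pattern.search(gene["summary"]):
            highlighted = pattern.sub(
                lambda m: f"**{m.group(0)}**", gene["summary"]
            )
            hits.append((gene["symbol"], highlighted))
    return hits


for symbol, summary in search("obesity", chromosome="7"):
    print(symbol, "-", summary)
```

Because the records are keyed by gene rather than by disease, a search of this kind naturally returns genes first, which mirrors the genocentric focus noted above.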
The GeneDis database (Table 3.1-4) is a very modest human genetic disease database that covers only 12 common genetic disorders (some of which are metabolic diseases). Each disease has a curated three- to five-page website describing the disease in detail and all of its known or associated genes and/or proteins. The sequences of the associated genes and proteins are provided as mutation tables that describe the types of associated mutations, their gene sequence locations, the phenotype, ethnic or subpopulation associations, and appropriate references. In some cases, the three-dimensional structure of the disease-related protein (experimental or homology modeled) also is provided. Unlike most other genetic disease databases, GeneDis is colorful and visually appealing. As this database grows in size and scope, it should attract a larger number of users.
The National Organization for Rare Disorders Database
The NORD database (Table 3.1-4) is a subscription-only online database that provides detailed clinical and support society information on rare (mostly genetic) disorders. Currently, the database contains information on 1100 diseases or disorders. Each NORD disease report contains information on the disease name or names, synonyms, and classification, followed by a well-written, two- to three-page summary covering a description, symptoms, causes, affected populations, related disorders, and standard and/or investigational therapies, along with a listing of support organizations affiliated with the disease or disorder. The NORD database is a text-only resource that is targeted primarily to clinicians, caregivers, and those who have the disease. As a result, the NORD database does not have a particularly strong molecular or genetic focus and does not link to the data or images found on other online sequence, structure, or metabolic pathway resources. Nevertheless, the NORD database is a superbly curated and clinically important online resource.
The GeneTests and GeneReviews Database
Similar to the NORD database, the GeneTests website is an online genetic disease resource targeted primarily to clinicians, caregivers, and patients. Originally started in 1993, it is freely available (unlike NORD) and is maintained and curated at the University of Washington in Seattle. In addition to providing a very current directory of clinics and laboratories performing genetic disease testing, the GeneTests website provides detailed reviews of genes, proteins, and their associated diseases through its GeneReviews section. Both the reviews and the clinical testing laboratory directory are searchable with a simple text interface that allows users to query by disease name, gene symbol, chromosomal locus, protein name, protein feature, or OMIM number. Users also can search the clinical directory on the basis of services, name of director, or location. Most gene or disease queries will return a page with three or four clickable options: (1) Testing, (2) Research, (3) Reviews, and (4) References. Clicking on "Testing" will return a hyperlinked table that lists testing laboratories, tests offered, and laboratory locations. Clicking on "Research" will provide a list of ongoing clinical trials and lists of laboratories conducting research on the gene or disease of interest. Clicking on "Reviews" will generate an expertly annotated GeneReview about three to four pages in length concerning the disease or gene of interest. Each GeneReview follows a style similar to that of a typical OMIM entry, beginning with a brief summary of the disease (or gene), detailed information on diagnosis, a clinical description with information on prevalence, information on disease management, genetic counseling information, a description of the molecular genetics, resources for patients, and a collection of references. GeneTests has cataloged 1700 clinics and testing laboratories that test for more than 1266 genetic diseases. Since 1997, a total of 342 reviews have been deposited into the GeneReviews database.
Specialized Metabolic and Genetic Disease Databases
As shown in Table 3.1-4, a number of disease-specific or locus-specific databases have emerged over the last decade. Some of them are PAHdb [the phenylalanine hydroxylase locus knowledgebase (Scriver et al, 35)], T1DBase [a type 1 diabetes database (Smink et al, 39)], PHEXdb [the X-linked hypophosphatemia database (Sabbagh et al, 32)], and G6PDdb [the glucose-6-phosphate dehydrogenase deficiency database (Kwok et al, 22)]. These are much smaller in scope and size than the general genetic and metabolic disease databases described above. In fact, some amount to no more than a few pages of online tables and text. Nevertheless, their existence underlines the fact that there is a real or perceived need to distribute and update this kind of information rapidly to the public, clinicians, and the biomedical research community.
These disease- or locus-specific databases typically provide much more detailed descriptions of a disease and its molecular genetics than what might be found in more comprehensive disease databases. This is the case because these online resources usually are hosted, curated, and edited by some of the world’s leading experts on those diseases. As a rule, most disease-specific databases provide information on the disease-associated gene or genes, the corresponding enzymes and their three-dimensional structures, extensive lists of clinical phenotypes, and information about animal models and their orthologous genes as well as detailed summaries and literature citations regarding any disease-associated mutations.
The Glucose-6-Phosphate Dehydrogenase Deficiency Database
The G6PDdb (Kwok et al, 22) is a relational database that is designed to facilitate research into favism (a glucose-6-phosphate dehydrogenase deficiency). Favism is an X-linked disorder in which the intake of fava beans, certain legumes, and various drugs causes a sudden destruction of red blood cells that may lead to hemolytic anemia. Favism is estimated to affect about 400 million persons worldwide, with the highest prevalence rates in sub-Saharan Africa, the Middle East, certain regions of Asia, and some parts of the Mediterranean. G6PDdb integrates recent mutational data from a number of public databases [GenBank, OMIM, the Human Gene Mutation Database (HGMD), the Human Genome Variation Database (HGVBase)] with mapped structural data from the Protein Data Bank (PDB) along with data from biochemically or phenotypically characterized patients with G6PD deficiencies. The developers of the G6PDdb have written software that automatically analyzes mutations that are likely to have a significant impact on the structure of the G6PD protein. As with most online databases, G6PDdb supports text queries, although they are limited to mutation names, mutation classes, and selected amino acid residue types. It also allows users to submit information on recently identified G6PD mutations.
The PHEXdb (Sabbagh et al, 32) is another integrated, locus-specific database; this one is designed to facilitate research into X-linked hypophosphatemia (XLH). This genetic disorder is a rare dominant disease caused by mutations to a phosphate-regulating gene called PHEX on the X chromosome. XLH affects phosphate homeostasis and can lead to growth retardation, rachitic and osteomalacic bone disease, hypophosphatemia, and problems with vitamin D metabolism. The database contains data on PHEX mutations; recent publications on XLH; cDNA and gene structure information on PHEX; a "working model" of PHEX activity; and data on murine homologues and clinical PHEX phenotypes. PHEXdb supports three types of queries, allowing users to search the database by mutation, phenotype, and references. It also allows users to submit data on recently identified PHEX mutations.
The Phenylalanine Hydroxylase Database
The PAHdb (Scriver et al, 35) is a relational locus-specific database that catalogs the mutations (both pathogenic and benign) in the human phenylalanine hydroxylase (PAH) gene locus. PAH is a gene that is necessary for the conversion of phenylalanine to tyrosine. Mutations in the PAH gene can cause a number of clinical disorders, including phenylketonuria (PKU) and hyperphenylalaninemia (HPA). Left untreated, PKU and HPA can cause brain damage and progressive mental retardation as a result of the accumulation of phenylalanine and its breakdown products. The incidence of PKU is about 1 in 15,000 births but ranges from 1 in 4500 births among the Irish to fewer than 1 in 100,000 births among the population of Finland. The PAH database not only catalogs known PAH mutations but also provides an annotated view of the genomic DNA and cDNA, a mutation map, information on the source of the mutation (population, name of submitter), and information on murine PAH orthologues as well as the predicted effect of the mutation on the PAH enzyme structure or expression. A clinical module within the PAHdb also provides information on PAH-associated disorders and their clinical phenotype as well as data on their inheritance and treatment. Users can search the PAHdb for mutations, mutation-haplotype associations, ethnicity or regional associations, phenotypes and genotypes, and in vitro expression data. As with PHEXdb and G6PDdb, PAHdb supports user submissions of recently identified PAH mutations. In both its content and its layout, the PAHdb is one of the most complete and comprehensive disease-specific websites.
The Type 1 Diabetes Database
Another comprehensive and well-designed disease-specific database is T1DBase (Smink et al, 39). This recently developed online resource was created to support the type 1 diabetes (T1D) research community. Diabetes mellitus is a metabolic disorder that is characterized by varying or persistent hyperglycemia (high levels of blood sugar) resulting from the defective secretion or action of insulin. Type 1, or juvenile, diabetes is primarily an autoimmune disorder in which the body makes antibodies that attack the insulin-producing islet cells in the pancreas. According to the World Health Organization, at least 171 million people worldwide have diabetes, with approximately 10 percent having type 1 diabetes. The causes of T1D are complex but may involve a combination of factors, including genetics, viruses, diet, chemicals, and environmental factors. The genetic contributions to T1D are of particular interest to the T1D research community as more than 30 candidate or susceptibility regions have been associated with the development of this disorder. T1DBase includes annotated genome sequence data for the human, rat, and mouse; clickable chromosome maps of T1D susceptibility regions; data on T1D susceptibility regions determined by genetic linkage and association studies; cultured beta-cell gene expression data (obtained under various conditions); beta-cell gene annotations; related T1D data on gene expression from different tissues and organs; and related T1D pathways from KEGG and BioCarta. T1DBase also has a rich variety of browsing and interactive viewing tools, including the GBrowse (Burren et al, 5) genome browser, Cytoscape (Shannon et al, 38) for visualizing and analyzing biological networks, and GESTALT (Glusman and Lancet, 12) for performing genome annotation. Like PAHdb, T1DBase is more appropriately called a disease knowledgebase than a simple database as it provides far more information than tables of sequences, facts, and figures.
A Summary of Genetic and Metabolic Disease Databases
Online disease databases are critical to providing information about and context to the many hundreds of inherited conditions that have been found or identified over the last quarter century. The number of disorders, combined with the rate at which new information is being acquired for many genetic diseases, means that most information about those diseases has had to migrate to the World Wide Web. This shift to "electronic publishing" or "electronic databasing" is important, as it allows the research community to stay much more current and have access to much more detailed data than would be available through standard print resources. This movement toward Web-based data exchange also has allowed a proliferation of database styles and database types, each of which is designed to appeal to different users with different needs. Some, such as OMIM and MetaGene Online, are very comprehensive and attempt to provide at least a little information about most kinds of inherited metabolic disorders. Others, such as PAHdb and T1DBase, are disease-specific and attempt to assemble all known genetic and clinical data about a single disease in a single electronic repository. The growth in both the number and the comprehensiveness of disease-specific databases is encouraging and suggests that more and more disease specialists are realizing that the Web is probably the best place to share their world-class knowledge with the rest of the biomedical community.