We analyzed the disorders included in MMBID7 (1995) with the aim of characterizing the genes and proteins so identified and of describing the burden of this group of diseases on human health. We also asked whether the genes in MMBID7, identified by their association with a human phenotype, are a representative sample of the entire human gene repertoire. Our results reflect the evolution and expansion of this field and put it in perspective with medicine as a whole. They also highlight areas where our understanding of disease is limited and where future studies will be of value.
We first examined the growth and contents of MMBID as a reflection of the expansion of the study of inborn errors of metabolism. Since its first edition in 1960, MMBID has enlarged considerably as new diseases have been identified and as the editors expanded their view of the types of disorders includible under the heading of inborn errors. This is reflected in a threefold increase in pages and chapters and a fivefold increase in disorders, contributors, weight, and thickness! The original editors (John B. Stanbury, James B. Wyngaarden, and Donald S. Fredrickson, with later assistance from Michael S. Brown and Joseph L. Goldstein) served through the fifth edition; the sixth and seventh editions were compiled by the current editorial team, which has continued to widen the scope of the book, reasoning that virtually all monogenic disorders result from a specific change in a protein that perturbs its function in specific homeostatic or developmental systems. The criteria set by the editors for inclusion in MMBID7 included a clinical phenotype plus a biochemical and/or molecular phenotype. On this basis, 460 genes were included in MMBID7. Coincident with this evolution of editorial perspective, there has been an increase in the number of biomedical professionals interested in the study of the biochemical and molecular basis of human disease. Together, these factors have led to a gradual transition of MMBID from a book devoted to small numbers of rare inherited metabolic diseases to one that has a view of medicine that embraces all diseases in terms of their biochemical and molecular bases. This evolution in our thinking about inborn errors was anticipated by A.E. Garrod who saw rare inborn errors as “merely extreme examples of variations of chemical behavior which are probably everywhere present.”16
Burden of Inborn Errors of Metabolism in Human Health
In a study performed nearly 20 years ago, Costa and colleagues assessed the impact of monogenic disorders in human health.15 They studied 351 confirmed monogenic disorders, selected in an unbiased way from the fifth edition (1978) of Mendelian Inheritance in Man.17 Using information from the literature, they characterized this set of disorders in terms of inheritance, age at onset, organ systems involved and impact on life span, reproductive capability, and permanent disability. Their results showed that most of the disorders they surveyed became apparent early in life, most involved multiple organ systems, and more than half reduced life span and reproductive capacity, and limited life opportunities.
For this review, we evaluated all the disorders described in MMBID7 by criteria developed from those used by Costa and colleagues.15 In contrast to their study, we were able to include more extensive molecular information and we made no attempt to review additional literature; all information was extracted from MMBID7. Of the 460 monogenic disorders in MMBID7, we excluded those with so little information that they could not be scored (n = 30). The definition of the categories and the scoring system for each disease is presented in Table 4-2. Because information pertinent to each of the categories was not included for every disease, the denominator may vary from one category to another.
Table 4-2: List of Categories Scored in Entries from MMBID7
|1. Mode of inheritance. Unknown, autosomal recessive, autosomal dominant, X-linked, mitochondrial, multifactorial or complex |
|2. Frequency. Unknown, common (more frequent than 1 in 10,000), rare (1 in 10,000 to 1 in 100,000), very rare (less frequent than 1 in 100,000) |
|3. Ethnic aggregation. Yes/No |
|4. Age at onset. Not reported; in utero; birth to 1 yr; 1 yr to puberty; early adulthood (puberty to 50 yr); late adulthood (>50 yr); no recognized clinical manifestations. We used the age at onset for the typical phenotype to classify each disorder. |
|5. Systems and organs involved in phenotype. See text. |
|6. Reduction of life expectancy. None; mild; moderate; severe; not reported. |
|7. Chromosome mapped. Yes/No |
|8. Gene cloned. Yes/No |
|9. Mutations described. Not described; point mutations; gross rearrangements; expansion alleles. |
|10. Molecular mechanism of inheritance pattern. Unknown; haploinsufficiency; gain of function; dominant negative; recessive loss of function; other. |
|11. Function of deficient gene product. Unknown; enzyme; transcription factor; receptor; hormone; channel protein; transmembrane transporter; extracellular transporter; modulator of protein function; extracellular matrix; intracellular structural protein. |
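The frequency categories in item 2 amount to a simple threshold rule. The following is a minimal sketch, assuming the thresholds are read as 1 in 10,000 and 1 in 100,000; the function name `classify_frequency`, the handling of values that fall exactly on a boundary, and the example inputs are illustrative assumptions, not part of the scoring system as published.

```python
from fractions import Fraction

# Assumed thresholds, read from Table 4-2 as "1 in 10,000" and "1 in 100,000".
COMMON_THRESHOLD = Fraction(1, 10_000)
RARE_THRESHOLD = Fraction(1, 100_000)


def classify_frequency(cases, population):
    """Return a Table 4-2 frequency class for `cases` per `population`.

    Passing None for either argument stands for "no frequency data".
    Boundary values are assigned to "rare" (an assumption of this sketch).
    """
    if cases is None or population is None:
        return "unknown"
    frequency = Fraction(cases, population)
    if frequency > COMMON_THRESHOLD:
        return "common"
    if frequency >= RARE_THRESHOLD:
        return "rare"
    return "very rare"


# Hypothetical inputs chosen only to exercise each branch.
print(classify_frequency(1, 2_500))     # common
print(classify_frequency(1, 40_000))    # rare
print(classify_frequency(1, 500_000))   # very rare
print(classify_frequency(None, None))   # unknown
```

Exact fractions are used rather than floating-point numbers so that a frequency lying exactly on a threshold is classified deterministically.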
Inheritance Patterns and Frequency
Information on mode of inheritance was available for 413 of the 430 disorders (Fig. 4-1A). Most (≈60 percent) were inherited as autosomal recessive traits while approximately 20 percent were inherited as autosomal dominants, with X-linked recessives and mitochondrial traits still lower. The threefold excess of recessives over dominants is the converse of large surveys of all Mendelian phenotypes (e.g., see the twelfth edition of MIM, v. 1, p. xviii).18 This difference likely derives from a traditional emphasis of MMBID on enzyme deficiencies, nearly all of which are inherited as recessive traits. For example, of the 44 chapters dealing with specific disorders in the first edition, 25 (57 percent) described enzyme deficiencies. Despite an expanding view of inborn errors, this tradition continues in MMBID7 where 76 (55 percent) of 138 chapters devoted to specific disorders involve enzyme deficiencies.
Distribution of MMBID7 disorders by (A) mode of inheritance and (B) frequency. See Table 4-2 for definition of categories.
Population frequency information was supplied for about two-thirds of the disorders in our study (n = 284). The lack of reliable frequency data for such a large number of disorders (n = 156 or 36 percent) is an indication of the difficulty in obtaining this information, particularly for rare diseases. Population-based screening is performed for only a few disorders (e.g., in the state of Maryland, only six inherited disorders are included in the Newborn Screening Program). The recent development of tandem mass spectrometry as a screening tool offers the promise of providing prospective frequency data on scores of disorders, but these results are not yet available.19–21 Thus, in the absence of direct measures, disease frequency is usually estimated by extrapolation from the number of recognized cases. This underestimates frequency, in part, because patients who are more mildly affected or whose phenotype otherwise varies from the “classical” are less likely to be recognized. Thus, there is consistent underestimation of disease frequency and a distorted view of phenotypic severity, skewed to the severe, easily recognized end of the spectrum. With these limitations, the range of frequencies for the diseases in MMBID7 was wide; the most common disorders had frequencies on the order of 0.0001 in a particular target population (e.g., cystic fibrosis in northern Europeans; type I Gaucher disease in Ashkenazi Jews), while more than 80 percent of the disorders were rare or very rare with frequencies <0.0001 (Fig. 4-1B).
We designated ethnic aggregation when the incidence of the disorder was higher in populations of substantial size (≥5 million) with a common biologic background. Only 14 percent of the disorders showed a clear predominance in well-characterized ethnic groups such as Ashkenazi Jews and Finns. Nearly all (>80 percent) of these were inherited as autosomal recessive traits.
Age at onset of clinical manifestations of a disease has medical and biological significance. Physicians categorize diseases by age at onset and organ system involvement. Indeed, these features determine what sort of doctor a patient is likely to see. The biological significance of age at onset for genetic disease is that it reflects the timing of the development of incongruence between the genetic program of an individual and the developmental, homeostatic, and environmental demands placed on that individual. For some disorders, compensatory mechanisms may postpone the onset of overt clinical manifestations for years or even decades (e.g., certain lysosomal storage diseases) while for others, the adaptive mechanisms are overwhelmed in short order (e.g., acute metabolic diseases of the newborn). A related biologic variable is the extent to which a particular mutation disrupts the protein’s function as a unit step in homeostasis. Thus, some patients with completely inactivating mutations at the ornithine transcarbamylase locus present within 24 to 48 h after birth, while others with some residual function do not come to medical attention until childhood or even young adult life.22
We enumerated age at onset for the diseases identified in MMBID7, scoring the age when clinical manifestations appear in the standard presentation and dividing life into five stages: in utero; birth to the first birthday; 1 year to puberty; early adulthood (puberty to 50 years); and late adulthood (>50 years). Of the 389 phenotypes we were able to analyze for this characteristic, the vast majority (85 percent) showed clinical manifestations in the prereproductive age (Fig. 4-2). Age at onset for the remainder was mainly during early adulthood. This result is similar to that of Costa et al.,15 who found that 25 percent of the disadaptive Mendelian phenotypes in their study were apparent at birth and over 90 percent by the end of puberty. Parsing the age at onset distribution according to mode of inheritance (Fig. 4-3) has little consequence for autosomal recessive and X-linked disorders, but autosomal dominant phenotypes have a bimodal age distribution with the modes occurring during the first year of life and early adulthood.
The relationship between age at onset of clinical symptoms and cumulative frequency of the disorder grouped by mode of inheritance.
Age at onset of clinical symptoms: A, all disorders; B, autosomal recessive; C, autosomal dominant; D, X-linked.
Thus, for most of the disorders in MMBID, the consequences for development and homeostasis are severe and disadaptive early in extrauterine life. This explains the more rapid invasion of genetics into pediatrics as compared to internal medicine,23 and predicts difficulty with treatment. The bimodal distribution of the autosomal dominants is of special interest (Fig. 4-3C). Some in the late onset group may reflect new age-related demands on homeostatic systems revealing previously asymptomatic functional deficits (e.g., Chap. 124). Other disorders in this group may be caused by partial alterations in protein function and homeostatic abilities that gradually produce cell damage leading to cell death and eventually to system failure (e.g., Chaps. 223 and 234).
Figure 4-4 shows a relationship between age at onset and frequency of disease. The figure reveals that when diseases are frequent, ages at onset vary widely, even while those of infancy and childhood predominate. But as frequency declines, the range of onsets narrows. If we compare Figs. 4-4A and 4-4C, we see that for the very rare disorders onsets in utero are reduced by half, those in infancy almost double, and those over 50 disappear altogether. It is perhaps what one expects. The more damaging the gene effect, the earlier the onset and the more subject to negative selection. On the other hand, a gene effect associated with disease of late onset is more likely to be conditional, dependent upon collaboration with others and with experiences of the environment, and so less subject to selection. A geneticist would say that the heritability of disease varies directly with severity and inversely with age at onset.
Age at onset for (A) common, (B) rare, and (C) very rare disorders.
Organs and Systems Affected
One difference among genes is their pattern of expression. For some, expression is exquisitely limited to one cell type (e.g., rhodopsin in photoreceptors; phenylalanine hydroxylase in hepatocytes), while others are broadly expressed, apparently required by nearly all cell types (so-called housekeeping genes). How does this difference in expression relate to involvement in disease? Or does it? One clue might come from an examination of the number and type of organs and systems affected in each genetic disease, with the caveat that pattern of gene expression is but one of several variables that determines pathophysiology and hence organ and system involvement.
We scored organ and system involvement for each disease according to abnormalities detected by clinical history, physical exam, and routine laboratory investigation (Table 4-2). For practical reasons, we limited the number of organs and systems scored to three, selecting those that cause the most severe problems for the patient and added an additional category for those disorders with ≥4 organs or systems involved. We divided the organs and systems into 20 categories: blood included plasma proteins and all blood cells; reticuloendothelial included spleen, lymph nodes, Kupffer cells, and macrophages; nervous system was scored for both central and peripheral neuropathies; genitourinary refers to kidneys and the genitourinary tract; digestive included alimentary tract but not liver and exocrine pancreas; circulatory included the heart and blood vessels; muscular was limited to striated muscle; integument included skin, nails, hair, and mammary glands; and limbs included developmental defects. In the category designated metabolic, we included disturbances in the total body water concentration of small molecules such as amino acids, sugars, and organic acids. The remaining categories are self-explanatory.
Our results showed that 70 percent of the phenotypes were multisystemic (Fig. 4-5). Metabolic was the most frequent (47 percent) followed closely by the nervous system (43 percent). None of the organs and systems we analyzed was immune from genetic disease (Fig. 4-6). Thus, pleiotropism is the rule for nearly all (>70 percent) the phenotypes in MMBID7. It is hardly surprising that most diseases affect more than one tissue. The multiplicity of affected cells and organs is simply a reflection of how tightly integrated the body is. Indeed, it is a tribute to the versatility of homeostasis that more are not involved.
Distribution of MMBID7 phenotypes by number of organs and systems affected.
Involvement of organs and systems in MMBID7 phenotypes.
Reduction of Life Expectancy
We scored the impact of the disorders in MMBID7 on life span by considering the typical life expectancy of the untreated patient with the most frequent form of the disease. Adequate information was available for 332 (77 percent) of the 430 disorders. We defined four categories: no reduction; mild, for those in which patients usually reach middle age; moderate, for those diseases where death occurs between age 10 and 30; and severe, for those where mortality occurs before age 10.
We found that about two-thirds of the disorders had an effect on lifespan (Fig. 4-7). Of these, about 75 percent were moderate to severe reductions with death before age 30. These results are consistent with the expectation that the disorders in which the gene effect is the most obtrusive and independent of modification have the earliest onset and the most disrupted phenotype.
Consequences of the MMBID7 disorders on life expectancy. See Table 4-2 for definition of categories.
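The two proportions quoted for life expectancy can be combined into a single overall figure. The short calculation below is a sketch that takes "about two-thirds" and "about 75 percent" at face value as exact fractions, which is an assumption; the result is therefore only as precise as those rounded figures.

```python
from fractions import Fraction

SCORED = 332                             # disorders with adequate life-span data
affected_share = Fraction(2, 3)          # "about two-thirds" had reduced life span
moderate_severe_share = Fraction(3, 4)   # "about 75 percent" of those affected

# Share of all scored disorders with death typically before age 30.
overall_share = affected_share * moderate_severe_share
approx_count = round(SCORED * overall_share)

print(f"{float(overall_share):.0%} of scored disorders, "
      f"roughly {approx_count} of {SCORED}")
```

On these rounded inputs, about half of the scored disorders fall in the moderate-to-severe range.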
The interval of medical history corresponding to the birth, development, and maturation of MMBID spans the molecular revolution. The first edition appeared in 1960, just 16 years after the description by Avery, MacLeod, and McCarty of DNA as the hereditary material,24 and just 7 years after the elucidation of the antiparallel double-stranded structure of DNA by Watson and Crick (1953).25 MMBID7, by contrast, appeared 35 years later in 1995, 5 years after the start of the Human Genome Project and 5 years before we expect to have a draft of the entire human sequence.26
Not surprisingly, the first edition of MMBID had virtually no molecular information; aside from X-linked traits, no genes were mapped to chromosomes and there was no information about causative mutations. By contrast, MMBID7, as emphasized by the insertion of “Molecular” into its title, has substantial molecular information. Of the 430 disorders in our study, nearly 70 percent were mapped, and for more than half, the responsible gene was cloned and disease-producing mutations identified. Of these, 94 percent were point mutations (nucleotide substitutions, insertions or deletions of less than 20 bp); 4 percent were gross rearrangements often affecting more than one gene, and 2 percent consisted of those diseases caused by expansion of short repeat sequences. This distribution of causative mutations is similar to other large collections of mutations (e.g., see the Cardiff Human Gene Mutation Database http://www.uwcm.ac.uk/uwcm/mg/hgmd0.html), so at the molecular level mutations associated with monogenic disease are indistinguishable from mutations in general.
Thus, our knowledge of the proximate cause of disease is increasing dramatically. The molecular information in MMBID8 is greater still and in future editions, we can reasonably expect a nearly complete enumeration of the genes and mutations responsible for monogenic disorders. Identification of the genes and alleles contributing to complex traits will come along at a slower pace. The paradox is that our ability to treat genetic disease is improving at a much slower rate (see Chap. 5). Proximate cause is but one piece of the puzzle, useful for diagnosis and as a starting point for understanding pathogenesis but of limited value for developing treatment.
Proteins do the work of the genes; understanding their function will provide more insight into disease, especially pathophysiology. Of the 430 disorders we analyzed, the nature of the protein product was identified for 348 (81 percent). We classified these according to function: enzymes, transporters, transcription factors, and so forth. Not surprisingly (and in part for the historical biases mentioned earlier), almost half were enzymes involved in intermediary metabolism. The distribution of the remaining categories is broad; that is, proteins performing virtually any function can be involved in disease (Fig. 4-8).
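Because not every category could be scored for every disorder, each percentage in this chapter rests on its own denominator. The sketch below tabulates the category-specific counts reported in the text against the 430 scored disorders; the dictionary layout and the function name `coverage` are illustrative choices, not part of the original analysis.

```python
TOTAL_SCORED = 430  # 460 disorders in MMBID7 minus the 30 that could not be scored

# Disorders with usable information in each category, as reported in the text.
scored = {
    "mode of inheritance": 413,
    "age at onset": 389,
    "reduction of life expectancy": 332,
    "nature of protein product": 348,
}


def coverage(counts, total):
    """Return the percentage of scored disorders with data, per category."""
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


for name, pct in coverage(scored, TOTAL_SCORED).items():
    print(f"{name}: {scored[name]}/{TOTAL_SCORED} = {pct}%")
```

The results for life expectancy and protein product round to the 77 percent and 81 percent quoted in the text.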
Functional classification of proteins identified as: A, the conserved set of orthologous proteins that carry out core biologic processes in S. cerevisiae and C. elegans (see reference 29); B, the primary defects in the disorders in MMBID7; C, the MMBID7 set adjusted so that the fraction that are enzymes in intermediary metabolism is set equal to the same category in the yeast/worm set.
How does this set of proteins, identified by their association with disease phenotypes, compare to the complete repertoire of human proteins? Are some gene products more often involved in disease or is the distribution simply a mirror of the protein products of our complete genome? Answers to these questions must wait for completion of the sequence and annotation of the human genome.
In an effort to foretell the answers to these questions, however, we took advantage of the recent completion of the yeast and worm whole-genome sequences.27,28 Chervitz and colleagues compared the full complement of the predicted protein sequences encoded by these genomes and identified a set of about 3000 orthologous proteins that carry out core biologic processes in each organism (Fig. 4-8A).29 These comprise about 40 percent of the yeast proteins and 20 percent of the worm proteins. The remainder appear to be involved in specialized functions related to the special biologic requirements of each organism: one a multicellular creature that utilizes coordinated patterns of gene expression to produce specialized cell types; the other a single cell that turns batteries of genes on and off to adapt to environmental variables. We reasoned that a similar core would hold for all organisms and asked how the human proteins identified by association with disease in MMBID compared to the core biologic set identified by the yeast-worm comparison (Fig. 4-8A). Again, the MMBID7 set has a greater percentage of enzymes, we think in part for historical and technical biases rather than for entirely biologic reasons (Fig. 4-8B). If we arbitrarily set the percentage encoding enzymes as equal in both (Fig. 4-8C), then the adjusted distribution of the remainder is more similar except for the extracellular proteins (not represented in the yeast/worm set) and the larger fraction of unclassified proteins in the MMBID7 set (perhaps reflecting more complexity and less understanding of the human organism). This general similarity of the kinds of proteins identified in the MMBID7 set and the yeast/worm set suggests that, at least for the core set of proteins, all functional categories are equally likely to be involved in disease. It will be of great interest to follow the course of this trend as the numbers increase. Similarly, it will be interesting to compare the disease involvement of the conserved core set with the specialized nonorthologous proteins characteristic of each type of organism.
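The adjustment behind Fig. 4-8C can be illustrated with a small calculation. The text does not state how the non-enzyme categories were rescaled once the enzyme fraction was fixed; the sketch below assumes they keep their relative proportions. The function name `match_enzyme_fraction` and every number in the example are hypothetical placeholders, not values from MMBID7 or from the yeast/worm comparison.

```python
def match_enzyme_fraction(dist, target, key="enzyme"):
    """Rescale a category distribution so that dist[key] equals `target`.

    All other categories are multiplied by a common factor so that they
    keep their relative proportions and the total stays at 1
    (proportional rescaling is an assumption of this sketch).
    """
    current = dist[key]
    if not (0 <= current < 1 and 0 <= target < 1):
        raise ValueError("fractions must lie in [0, 1)")
    scale = (1 - target) / (1 - current)
    return {k: (target if k == key else v * scale) for k, v in dist.items()}


# Hypothetical disease-set distribution (fractions sum to 1).
disease_set = {"enzyme": 0.50, "receptor": 0.10,
               "transporter": 0.15, "other": 0.25}

# Hypothetical enzyme fraction in the reference (core) set.
adjusted = match_enzyme_fraction(disease_set, target=0.30)

for category, fraction in adjusted.items():
    print(f"{category}: {fraction:.2f}")
```

After adjustment the non-enzyme categories can be compared directly between the two sets, because the difference in enzyme share no longer compresses them.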