The ultimate source of all genetic variation is mutation, namely an alteration in the DNA sequence that may be induced chemically or by external radiation such as x-rays or ultraviolet light, or that may occur spontaneously, often as a result of mistakes made during DNA replication. When the mutation occurs, for example, in the middle of an intervening sequence, or in a flanking region between genes, or in a third base-pair coding position that does not give rise to a changed amino acid (a synonymous substitution), it almost certainly has no functional consequence and so is selectively “neutral.” It may also be that some amino acid changes or changes in 5′ or 3′ untranslated regions of an expressed gene similarly have no functional consequence and so are effectively neutral. The fate of a neutral mutation is strictly determined by chance, sometimes referred to as “random genetic drift,” a phenomenon that is discussed in more detail later. Occasional mutations are functionally advantageous, and so give rise to a positive selective advantage that may be expressed, for example, as improved viability or improved fertility. Such mutations are the basis for the adaptive process underlying Darwin's theory of evolution by natural selection. If the selective advantage persists, then generally such mutations will increase in frequency in a population until they replace the version of the gene from which they were derived, and this is the basic process of gene substitution during evolution by natural selection.
The vast majority of new mutations in expressed genes, especially those that change an amino acid in a coding region, are likely to disrupt the function of a gene at least to some extent, and therefore lead to a selective disadvantage. This disadvantage will generally lead, often quite rapidly, to the disappearance of the mutant gene from the population. However, new mutations arise continuously in every generation, so a balance is achieved between mutation giving rise to new deleterious variants of a gene and selection removing them from the population. This balance is the source of nearly all of the rare inherited human diseases, including the “inborn errors of metabolism” first described by Garrod. Because mutation rates are generally very small and the selective disadvantages appreciable, the frequencies of genes maintained by the balance between mutation and selection are generally very low, and hence the corresponding diseases are comparatively rare.
A well-known example of a dominantly inherited disease maintained by mutation-selection balance is the colorectal cancer susceptibility familial adenomatous polyposis (FAP). This occurs with a frequency of approximately 1 in 8000 in those populations where there is essentially complete registration for the trait, and probably with a similar frequency in all human populations. Individuals with severely disruptive mutations in most of the first half of the corresponding gene, APC, develop hundreds to thousands of small adenomatous polyps in their large intestine, usually starting in their early teens. One or more of these polyps inevitably develops into a colorectal cancer, usually by the mid-twenties to thirties, which, if untreated, leads to early death, often during or before the normal reproductive period. Today, if families with FAP are monitored and a colectomy is performed early enough to prevent the development of colorectal cancer, FAP individuals can have an almost normal life expectancy and a good quality of life. However, under “primitive” conditions, including even those of 30 to 50 or more years ago, FAP was clearly a severe selective disadvantage. The low frequency of FAP in the population thus reflects the balance between this selective disadvantage and the continuing production of new mutations. (For a review of FAP see reference 6.)
The simple approximate formula relating mutation rates, selective disadvantages and gene frequencies was first derived by Danforth and then by J.B.S. Haldane, one of the great founders of mathematical population genetics, in the early 1920s. Consider a dominant disadvantageous condition determined by a gene A such that all heterozygotes Aa have the condition and, as a result, have a fitness 1 − s compared to 1 for normal aa individuals. Ignoring random fluctuation, it can be shown that the overall mutation rate μ to the dominant disadvantageous gene A, its selective disadvantage s, and the gene frequency p of A at equilibrium are related by the formula

$$p \approx \frac{\mu}{s} \tag{11-1}$$
From the Hardy-Weinberg Law, the frequency of the mutant heterozygotes Aa is 2pq = 2p(1 − p), which is approximately 2p when p is small and terms in p² can be ignored. Thus the frequency of affected individuals is approximately 2μ/s. If, for example, there are good estimates of the frequency I of affected individuals in a population and of their selective disadvantage s, then the mutation rate can be estimated from the rearranged formula

$$\mu \approx \frac{sI}{2}$$
For example, for FAP, assuming a population incidence of approximately 1 in 10,000 and a selective disadvantage of about 30 percent (s = 0.3), the mutation rate is estimated as

$$\mu \approx \frac{0.3 \times 10^{-4}}{2} = 1.5 \times 10^{-5}$$
which is a relatively high mutation rate, but it reflects the combined frequency of all mutations in the gene that can give rise to FAP. Even for an overall mutation rate as high as 10⁻⁵ and a selective disadvantage as small as 1 percent, the estimated frequency of affected individuals (2 × 10⁻⁵/0.01) is still only 2 per 1,000. Thus, the balance between mutation and selection against a dominant trait will always give rise to gene frequencies that are substantially less than 1 percent, the conventionally accepted threshold for defining a polymorphism, and usually at most a hundredth of this, namely about 1 in 10,000.
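The two worked numbers above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the simple approximations used in the text (a fully penetrant dominant, heterozygote fitness 1 − s, and p small enough that terms in p² are ignored); the function names are hypothetical helpers introduced only for illustration.

```python
# Hypothetical helper names; formulas follow the text's approximations for a
# fully penetrant dominant gene A (heterozygote fitness 1 - s, p**2 ignored).

def dominant_gene_frequency(mu: float, s: float) -> float:
    """Equilibrium frequency of the mutant gene A: p ~ mu / s."""
    return mu / s


def dominant_incidence(mu: float, s: float) -> float:
    """Frequency of affected heterozygotes Aa: 2p ~ 2 * mu / s."""
    return 2.0 * dominant_gene_frequency(mu, s)


def mutation_rate_from_incidence(incidence: float, s: float) -> float:
    """Rearranged balance formula: mu ~ s * incidence / 2."""
    return s * incidence / 2.0


# FAP example from the text: incidence ~1 in 10,000 and s ~ 0.3.
fap_mu = mutation_rate_from_incidence(1 / 10_000, 0.3)

# Extreme case from the text: mu as high as 1e-5, s as small as 0.01.
extreme_incidence = dominant_incidence(1e-5, 0.01)

print(f"estimated FAP mutation rate: {fap_mu:.2e}")
print(f"incidence for mu=1e-5, s=0.01: {extreme_incidence:.4f}")
```

The functions are deliberately trivial: the point is that, for a dominant, incidence and mutation rate are linked linearly through s, so an error in s carries straight through to the estimated mutation rate.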
If there is reliable information on the proportion of affected individuals Aa who are new mutants, because both their parents are normal aa, then the selective disadvantage can be estimated simply from the proportion of these “sporadic” cases. New mutations contribute about 2μ affected individuals per generation out of a total of about 2μ/s, so at equilibrium the proportion of sporadic cases is approximately s. This only applies, of course, to a fully penetrant trait.
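A simple way to see both results at once (equilibrium p ≈ μ/s and a sporadic fraction ≈ s) is to iterate the generation-by-generation recursion numerically. The sketch below assumes random mating, one-way mutation a → A, fitness 1 − s for both AA and Aa, and no drift; the function name `dominant_balance` is hypothetical. Because almost every affected person carries a single A gene, the fraction of A genes that are new mutations is, to first order, the fraction of affected individuals who are sporadic cases.

```python
def dominant_balance(mu: float, s: float, generations: int = 2000):
    """Iterate one-way mutation a -> A and selection against a dominant gene A.

    Assumptions (a sketch, not a general model): random mating, fitness
    1 - s for both AA and Aa, fitness 1 for aa, no drift. Returns the
    frequency of A and the fraction of A genes that are new mutations.
    """
    p = 0.0             # frequency of A at birth
    new_fraction = 0.0  # share of A genes that arose by mutation this generation
    for _ in range(generations):
        q = 1.0 - p
        mean_fitness = 1.0 - s * (1.0 - q * q)     # only aa escape selection
        p_selected = p * (1.0 - s) / mean_fitness  # frequency of A after selection
        new_mutants = mu * (1.0 - p_selected)      # a genes mutating to A
        p = p_selected + new_mutants
        new_fraction = new_mutants / p
    return p, new_fraction


mu, s = 1e-5, 0.3
p_eq, sporadic = dominant_balance(mu, s)
print(f"p = {p_eq:.3e} versus mu/s = {mu / s:.3e}")
print(f"proportion of new mutations = {sporadic:.3f} versus s = {s}")
```

With these illustrative values, p settles within a fraction of a percent of μ/s and the proportion of new mutations within a fraction of a percent of s, which is the basis for estimating s from the proportion of sporadic cases.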
The situation for a recessive trait is substantially different. Suppose the abnormality is recessive, affecting only aa homozygotes, whose frequency, following the Hardy-Weinberg Law, is q², where q is the frequency of the gene a. If the normal homozygote AA and the heterozygote Aa have identical fitnesses of 1, while the fitness of the mutant homozygote aa is 1 − s, then the balance between the loss of genes from the population by selection and their gain by mutation is achieved when the trait frequency q² is equal to μ/s, namely

$$q^2 \approx \frac{\mu}{s}$$
The gene frequency at equilibrium is therefore

$$q \approx \sqrt{\frac{\mu}{s}}$$
which will always be substantially greater than the corresponding equilibrium frequency for a dominant gene. This is because most of the mutant genes are present in heterozygotes, whereas selection eliminates only the affected homozygotes, which is a much less efficient process. For example, assuming a mutation rate of 10⁻⁶ and a recessive lethal, for which s = 1, the frequency of affected homozygotes would be 1 in 1,000,000 and the gene frequency of the recessive mutant therefore 1 in 1,000. For a dominant lethal with the same mutation rate, the gene frequency would itself be 1 in 1,000,000.
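The contrast between the recessive and dominant equilibria in this lethal example can be reproduced directly. This is a minimal sketch of the two approximations (q ≈ √(μ/s) for a recessive, p ≈ μ/s for a dominant); the function names are hypothetical.

```python
import math


def recessive_gene_frequency(mu: float, s: float) -> float:
    """Recessive balance: q**2 ~ mu / s, so q ~ sqrt(mu / s)."""
    return math.sqrt(mu / s)


def dominant_gene_frequency(mu: float, s: float) -> float:
    """Dominant balance: p ~ mu / s."""
    return mu / s


mu, s = 1e-6, 1.0  # the lethal example from the text
q = recessive_gene_frequency(mu, s)
p = dominant_gene_frequency(mu, s)
print(f"recessive: q = {q:.0e}, affected q^2 = {q * q:.0e}")
print(f"dominant:  p = {p:.0e}")
print(f"recessive gene is {q / p:.0f} times more frequent")
```

The ratio q/p = 1/√(μ/s) grows as the mutation rate falls, which is why recessive mutant genes are carried, mostly unseen in heterozygotes, at far higher frequencies than dominant ones.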
How then, with reasonable mutation rates, which certainly are not expected to be higher on average than 10⁻⁵ or 10⁻⁶ per gene per generation, can a recessive lethal such as cystic fibrosis (which, before recent improvements in medical treatment, would surely have been severe enough to preclude affected individuals from having any offspring) ever reach the high frequency of between 1 in 2,000 and 1 in 3,000, corresponding to a gene frequency of around 2 percent, that it has in Northern Europe? It must be emphasized that this high frequency is due largely to one particular mutant allele, ΔF508, among the very large number of possible CFTR mutations that can, in principle, give rise to cystic fibrosis. One answer is that the balance between mutation and selection for a true recessive is very sensitive to any departure from the assumption of recessiveness: because heterozygotes Aa are far more numerous than affected homozygotes aa, even a small effect of selection on heterozygotes outweighs selection against the homozygotes. This helps only if one assumes that there may, ironically, be a very small advantage to the heterozygote that somehow compensates for the disadvantage of the homozygote and so allows a relatively higher gene frequency to be attained. The basis for this explanation will become clearer after the forthcoming discussion of the classical mechanisms by which a polymorphism can be maintained at a high frequency purely by balancing selection. A totally different explanation, however, becomes possible if we take into account the extent to which chance, namely random drift, can occasionally raise the frequency of a mutant gene that is neutral with respect to selection, or very nearly so. That is the situation for heterozygotes carrying the cystic fibrosis ΔF508 mutation, and so perhaps it is pure chance that this particular mutation has increased to the extent it has in Northern Europe but not elsewhere.
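The size of the cystic fibrosis puzzle can be made concrete with the recessive balance formula. The sketch below converts an incidence of 1 in 2,000 to 1 in 3,000 into a gene frequency, and asks what mutation rate pure mutation-selection balance with s = 1 would require (μ = sq²). It also anticipates the heterozygote-advantage explanation, under an assumed fitness scheme of AA = 1, Aa = 1 + h, aa = 0, for which the stable equilibrium is q = h/(1 + 2h); that scheme and all function names are illustrative assumptions, not taken from the text.

```python
import math


def gene_frequency_from_incidence(incidence: float) -> float:
    """Hardy-Weinberg: incidence of aa = q**2, so q = sqrt(incidence)."""
    return math.sqrt(incidence)


def required_mutation_rate(incidence: float, s: float = 1.0) -> float:
    """Mutation rate needed if mutation-selection balance alone held: mu = s * q**2."""
    return s * incidence


def heterozygote_advantage_needed(q: float) -> float:
    """Assumed fitnesses AA = 1, Aa = 1 + h, aa = 0 give equilibrium
    q = h / (1 + 2h); solving for h gives h = q / (1 - 2q)."""
    return q / (1.0 - 2.0 * q)


def iterate_overdominance(h: float, q: float = 0.001, generations: int = 5000) -> float:
    """Numerical check of that equilibrium under the same assumed fitnesses."""
    for _ in range(generations):
        p = 1.0 - q
        mean_fitness = p * p + 2.0 * p * q * (1.0 + h)  # aa homozygotes leave no offspring
        q = p * q * (1.0 + h) / mean_fitness
    return q


for incidence in (1 / 2000, 1 / 3000):
    q_cf = gene_frequency_from_incidence(incidence)
    mu_needed = required_mutation_rate(incidence)
    print(f"incidence {incidence:.2e}: q = {q_cf:.4f}, mu needed = {mu_needed:.1e}")

h = heterozygote_advantage_needed(0.02)
print(f"heterozygote advantage for q = 0.02: h = {h:.4f}")
print(f"check by iteration: q = {iterate_overdominance(h):.4f}")
```

Under these assumptions the required mutation rate is several hundred times 10⁻⁶, far above any reasonable per-gene rate, whereas a heterozygote advantage of only about 2 percent would suffice; this is the sense in which the balance is so sensitive to departures from strict recessiveness.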
The counterargument is that this is highly unlikely in general, because the probability of any particular mutant increasing in frequency to such an extent by chance is necessarily small. But we must also take into account the inevitable ascertainment bias involved in identifying and analyzing such mutant traits. The chance of a disease being studied is, effectively, proportional to its frequency in the population, so the more common abnormal traits inevitably come to our notice first. There could be hundreds or thousands of genes that give rise to inborn errors comparable to cystic fibrosis, but it is cystic fibrosis, and perhaps one or two other conditions such as PKU, that have achieved this relatively high frequency in European populations and so been brought starkly to our attention. Had the original studies on genetic abnormalities been done, for example, in China, neither of these traits would have attracted any particular attention.
One consequence of this interpretation is that when the frequency of a rare inherited disease varies between populations by a factor of 10, or even 100, as it does, for example, for cystic fibrosis and PKU, the high frequencies will be associated with particular alleles that have, by chance, drifted up in frequency substantially compared with all other alleles in the population with similar effects. This is only likely to apply to recessive traits, because heterozygotes for the mutant gene are effectively neutral. (However, we shall later discuss an apparent exception to this notion for dominant genes with less than full penetrance and relatively mild selective disadvantages.) Dominant traits maintained by the balance between mutation and selection, and with a relatively severe disadvantage, are on the other hand unlikely to vary substantially in incidence from one population to another. In that case, drift is too weak to counteract the selective disadvantage of a new mutant present in a heterozygote, so the overall mutation rate of the gene and its average selective disadvantage in heterozygotes will, according to the formula given above, determine the population frequency. As neither the overall mutation rate nor the overall selective disadvantage is likely to vary substantially from one population to another, the incidence is also unlikely to vary. This is clearly the case, for example, for the dominantly inherited FAP syndrome.
The effect of chance, or stochastic variation, on gene frequencies in populations of finite size was first analyzed classically by R.A. Fisher and Sewall Wright, and its full understanding requires a complex mathematical development. The basic idea behind genetic drift can, however, be explained simply.
Suppose we start with a population of 10 genes, 5 of which are A and 5 of which are a. We then choose 10 new genes by picking A or a on each occasion with equal probability, for example, by tossing a coin. By chance, this new population of genes may contain, say, only three A's and seven a's. We now form the next population of 10 genes by choosing A each time with a probability of 3 in 10 and a with a probability of 7 in 10. One could imagine doing this with a biased coin, or by using a computer to choose a random number between 0 and 1 and assigning A if it is less than 0.3 and a otherwise. If this process is repeated many times, reflecting many generations of random mating among the 10 genes (or 5 individuals), eventually all 10 genes will be either A or a, and then, in the absence of mutation, no further change can take place. The allele that remains is then said to be fixed. For a pair of neutral alleles in a finite population, however large, the eventual outcome is bound to be that one or the other is fixed, and the probability that a given allele is fixed is simply equal to its initial frequency. So, for example, starting always with 10 genes, 5 of which are A and 5 a, over many such experiments run to fixation, A will be fixed half the time and a half the time. Had we instead started with 3 A genes and 7 a, A would be fixed 30 percent of the time and a 70 percent of the time. Taking this to its limit, a single new mutation in a population of N individuals (2N genes) has a gene frequency of 1/2N, and this is then the probability that it eventually becomes fixed. Clearly this probability becomes very small when the population size N is large, but it is never zero. That is why random fluctuations can increase the frequency of a gene purely by chance, although the probability that this happens for any particular gene is always small.
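The coin-tossing experiment described above is easy to simulate. This is a minimal sketch of neutral binomial (Wright-Fisher style) resampling, with a fixed random seed so the run is repeatable; the function names and the number of replicate experiments are illustrative choices, not part of the text.

```python
import random


def run_to_fixation(n_genes: int, count_A: int, rng: random.Random) -> bool:
    """Resample n_genes each generation, choosing A with probability equal to
    its current frequency (the 'biased coin'). Returns True if A is fixed."""
    while 0 < count_A < n_genes:
        freq_A = count_A / n_genes
        count_A = sum(1 for _ in range(n_genes) if rng.random() < freq_A)
    return count_A == n_genes


def fixation_fraction(n_genes: int, count_A: int, trials: int = 5000, seed: int = 1) -> float:
    """Fraction of independent replicate experiments in which A ends up fixed."""
    rng = random.Random(seed)
    fixed = sum(run_to_fixation(n_genes, count_A, rng) for _ in range(trials))
    return fixed / trials


half_start = fixation_fraction(10, 5)   # expected to be close to 0.5
third_start = fixation_fraction(10, 3)  # expected to be close to 0.3
print(f"start 5 A of 10: A fixed in {half_start:.3f} of runs")
print(f"start 3 A of 10: A fixed in {third_start:.3f} of runs")
```

Each generation draws a fresh random number per gene and compares it with the current frequency of A, exactly the "less than 0.3" rule in the text; over many replicates the fraction of runs ending with A fixed approaches the starting frequency of A.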
The mean time taken for the gene to be fixed, if it is, can be shown to be about 4N generations, where N is the population size, and so may be very long for large populations. The distribution of the frequencies of different mutants due to drift is heavily skewed: only a very small proportion of new mutants will reach polymorphic frequencies, while the vast majority hover around very low frequencies with a high probability of being lost.
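Both the fixation probability of a single new mutant (1/2N) and the mean time to fixation among those that do fix (about 4N generations) can be checked by following many new mutants in a small simulated population. This is a sketch under the same neutral binomial-resampling assumption as the coin-tossing example; N = 10 individuals, the number of trials, and the function names are illustrative choices.

```python
import random


def follow_new_mutant(n_individuals: int, rng: random.Random):
    """Track one new neutral mutant (frequency 1/2N) among 2N genes under
    binomial resampling until it is fixed or lost; return (fixed, generations)."""
    n_genes = 2 * n_individuals
    count, generations = 1, 0
    while 0 < count < n_genes:
        freq = count / n_genes
        count = sum(1 for _ in range(n_genes) if rng.random() < freq)
        generations += 1
    return count == n_genes, generations


def simulate(n_individuals: int = 10, trials: int = 10000, seed: int = 2):
    """Return the fraction of new mutants fixed and their mean time to fixation."""
    rng = random.Random(seed)
    fixation_times = []
    for _ in range(trials):
        fixed, generations = follow_new_mutant(n_individuals, rng)
        if fixed:
            fixation_times.append(generations)
    prob_fixed = len(fixation_times) / trials
    mean_time = sum(fixation_times) / len(fixation_times)
    return prob_fixed, mean_time


N = 10
prob_fixed, mean_time = simulate(N)
print(f"P(fixed) = {prob_fixed:.3f} (1/2N = {1 / (2 * N):.3f})")
print(f"mean time to fixation = {mean_time:.1f} generations (4N = {4 * N})")
```

Most simulated mutants are lost within a few generations, while the roughly 1 in 2N that fix take on average close to 4N generations, which illustrates the heavy skew described above.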
A particular form of chance effect is associated with populations that have gone through a “bottle-neck,” namely a severe reduction in size followed by a comparatively rapid expansion. In that case there will be a “founder” effect: any mutation that happens to be present in the reduced population at the time of the bottle-neck is likely to be there at a relatively high frequency, which will, on the whole, be retained when the population expands rapidly. Thus, populations that have gone through a bottle-neck may contain a relatively larger number of alleles that have apparently drifted up in frequency than would be expected in a large homogeneous population. This presumably accounts for the relatively large number of comparatively frequent mutant alleles in Ashkenazi Jewish populations, for example, for Tay-Sachs disease and for one or two alleles of the breast cancer gene BRCA1. A somewhat similar phenomenon is seen in Finnish populations.
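The founder effect can be illustrated with a single binomial calculation: how many copies of a rare allele end up among the genes of a small founding group. The numbers below (25 founders, so 50 genes, and an allele at 0.2 percent in the source population) are hypothetical and chosen only for illustration, as is the function name.

```python
from math import comb


def founder_distribution(allele_freq: float, founder_genes: int) -> list:
    """Binomial probability that exactly k copies of an allele with source-population
    frequency allele_freq are among founder_genes randomly chosen genes."""
    return [
        comb(founder_genes, k) * allele_freq**k * (1.0 - allele_freq) ** (founder_genes - k)
        for k in range(founder_genes + 1)
    ]


# Hypothetical illustration: 25 founders (50 genes); an allele at 0.2% beforehand.
freq, genes = 0.002, 50
dist = founder_distribution(freq, genes)
p_lost = dist[0]
p_enriched = 1.0 - p_lost  # carried at >= 1/50 = 2%, at least ten times its old frequency
mean_founder_freq = sum(k * pk for k, pk in enumerate(dist)) / genes

print(f"lost at the bottleneck: {p_lost:.3f}")
print(f"founder frequency >= 2%: {p_enriched:.3f}")
print(f"mean founder frequency: {mean_founder_freq:.4f}")  # unchanged on average
```

On average the allele's frequency is unchanged, but the outcome is all or nothing: about 90 percent of such alleles are lost, while roughly 1 in 10 start the expanding population at ten or more times their original frequency. Across many rare alleles, this produces the cluster of unusually frequent disease alleles described above.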
The first serious attempt to estimate human mutation rates was made by J.B.S. Haldane in the 1930s, using the mutation-selection balance formula (Eq. 11-1) for deleterious dominant genes (as modified for X-linked recessives) together with estimates of the incidence of the abnormality and its selective disadvantage, as already discussed. Apart from the obvious problems of getting good estimates of the trait incidence and its selective disadvantage, the difficulty with this approach, understandably not realized at the time, is that a mutation rate applied to a whole gene of perhaps 5000 to 10,000 base pairs is not in itself a meaningful concept. This is because, at the DNA level, different positions within the gene may have different mutation rates. In addition, mutations may vary considerably in their expression, and so in their selective effect, depending on the particular disruptive effect of each mutation. Haldane initially used hemophilia as his example, and it is now clear that different mutations disrupting the function of factor VIII can have varying consequences for clotting efficiency, leading not only to variability in disease manifestation but also to differences in the selective disadvantage associated with particular mutations. Thus the mutation rate is really the aggregate rate at which all mutations that give rise to the collective phenotype called hemophilia occur. This aggregate rate is, in general, an inextricable mixture of varying mutation rates at different positions in the gene and different selective disadvantages associated with different categories of mutations. The only general comment that can perhaps be made is that the larger the gene, the higher the expected overall mutation rate. This may explain the prominence of genetic diseases such as X-linked Duchenne muscular dystrophy and the inherited colorectal cancer susceptibility FAP, both associated with relatively large expressed genes.
There is one condition under which this confounding between mutation rates at the DNA level and different selective effects can be resolved, namely when a phenotype is produced by a set of mutations that all have the same, or more or less the same, effect and so can be assumed for practical purposes to have the same selective disadvantage. It can then be shown, by an extension of the classical mutation-selection balance theory (personal observation), that the relative frequency with which different categories of mutation are observed is proportional to their mutation rates. In that case, given accurate overall estimates of the incidence of the phenotype in the population and of its selective disadvantage (estimated, for example, from the proportion of sporadic cases), absolute estimates of the mutation rate at the DNA level can be obtained.
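The arithmetic of this category approach can be sketched as follows, assuming every category shares the same selective disadvantage: the overall rate μ = sI/2 is divided among categories in proportion to how often each is observed, and each share is then divided by the number of potential sites in that category. The category names, counts and site numbers below are purely hypothetical placeholders, not the published APC data, and the function name is likewise illustrative.

```python
def per_site_mutation_rates(incidence, s, observed_counts, potential_sites):
    """Sketch of the category approach: overall rate mu = s * incidence / 2 is shared
    among categories in proportion to how often each is observed, then divided by
    the number of potential sites in that category. Assumes all categories have
    the same selective disadvantage s."""
    total_mu = s * incidence / 2.0
    total_observed = sum(observed_counts.values())
    return {
        category: total_mu * (count / total_observed) / potential_sites[category]
        for category, count in observed_counts.items()
    }


# Purely hypothetical inputs for illustration; NOT the published APC data.
rates = per_site_mutation_rates(
    incidence=1 / 8000,
    s=0.3,
    observed_counts={"type_1": 200, "type_2": 100},
    potential_sites={"type_1": 1000, "type_2": 50},
)
for category, rate in rates.items():
    print(f"{category}: {rate:.2e} per site per generation")
```

Because the per-site rates, multiplied back by their site numbers, must sum to the overall rate, the method turns a single gene-level estimate into per-site rates that can differ by orders of magnitude between categories.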
The dominantly inherited cancer susceptibilities associated with tumor suppressor genes, and Knudson's classical hypothesis on the relationship between germline and somatic mutations, may satisfy these conditions. Thus, clear-cut florid polyposis arises almost entirely through truncating mutations in the APC gene that disrupt the function of the protein, specifically between approximately amino acids 200 and 1600 of the 2843 amino acids that make up the protein. Several hundred germline mutations have been sequenced, nearly all of which are truncating, due to either nonsense or frameshift mutations. These can be categorized into types with respect to their probable mutation rates, for example: single base pair transversions, each of which has been seen only once; mutations at CpG positions (at which the C is often methylated), in particular from CGA to TGA, which replace an arginine by a stop codon; and one or two individually extremely common mutations, notably at amino acid positions 1061 and 1309. For each defined category of mutations, it is possible to count the number of potential sites in that category at which truncating or frameshift mutations could occur. Using reliable estimates of the population incidence of FAP from Scandinavian registries (about 1 in 8000) and an estimate of the selective disadvantage of FAP of about 30 percent gives an estimate of the germline mutation rate at the DNA level for single base pair transversions of about 3 to 5 × 10⁻⁹ per base pair per generation.
Methylated CpG positions give a mutation rate estimate about 40 times as high as this, while the estimate for the 5 base pair deletion around amino acid position 1309, which accounts for some 10 to 15 percent of all germline mutations, is a thousandfold higher. This range of mutation rates illustrates the complexity of estimating the rate at the DNA level and emphasizes the distinctive value of the FAP/APC data for this purpose.6,7
These are perhaps the first valid direct estimates of human mutation rates at the DNA level. The approach still involves some averaging assumptions, because it is known, for example, that the 1309 mutation has a more severe effect than many others. A significant part of the observed phenotypic heterogeneity in FAP is due to the differential effects of mutations at different positions within the gene, which can, to some extent, be explained by the protein's multifunctional role. When mutations with different phenotypic effects occur with different frequencies, this can also explain patterns of heterogeneity in a trait caused by mutations in a given gene. Beyond this, heterogeneity may arise from differential environmental effects, from the effects of genetic modifiers (which, unless they are closely linked to the primary locus, should not segregate with FAP in families), and, often most troublesome, from mutations in different loci giving an essentially similar phenotype. This was the situation initially reported for xeroderma pigmentosum.
There is another aspect of the heterogeneous effects of different mutations within a given gene that is well illustrated by FAP. As already mentioned, only mutations within the region between amino acids 200 and 1600 give rise to classical polyposis. Mutations near the beginning of the gene, and occasional mutations beyond amino acid 1600, have been shown to give rise to a much milder phenotype, called AAPC (attenuated adenomatous polyposis coli), which is characterized by a much smaller number of polyps and a later age of onset. Recently, paralleling these observations, missense variants in the central portion of the gene have been described that have a clearly lower penetrance and may give rise to a somewhat milder phenotype. One of these, I1307K, has so far been found only in Ashkenazi Jews; among them it occurs in people with colorectal cancer or adenomas at a frequency three-fold or more higher than the control frequency of 5 to 10 percent in those with no overt colorectal cancer or adenomas.8,9 The second variant, E1317Q, has been found at a frequency of a few percent in patients with multiple adenomas and in patients with early-onset colorectal cancer, but has not so far been seen at all in relevant control populations.9 The selective disadvantage of these missense variants (presumably mainly with respect to fertility) will be minimal at worst, and possibly nonexistent. Thus, these mutations can drift up in frequency by chance in just the same way as recessive mutations such as the cystic fibrosis ΔF508 mutation. This explains their relatively high frequency in some populations and not others, and can account for mild, sporadic, early-onset cases without a family history. With a relatively low penetrance, perhaps ⅓ to ½, and one carrier parent, each sib is affected with a probability of half the penetrance, so a pair of sibs will both be affected with a probability of only 1 in 36 to 1 in 16; carriers will therefore mostly not show up as families, but usually only as sporadic cases.
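The sib-pair figures follow from a short probability calculation. This sketch assumes one heterozygous parent, independent transmission to each sib with probability ½, affection of carriers with probability equal to the penetrance, and no background (non-carrier) cases; exact fractions are used so the results read as "1 in n". The function name is hypothetical.

```python
from fractions import Fraction


def both_sibs_affected(penetrance: Fraction) -> Fraction:
    """One parent carries the variant: each sib inherits it with probability 1/2 and,
    if a carrier, is affected with probability `penetrance`; sibs are treated as
    independent and background (non-carrier) cases are ignored."""
    per_sib = Fraction(1, 2) * penetrance
    return per_sib ** 2


for penetrance in (Fraction(1, 3), Fraction(1, 2), Fraction(1)):
    print(f"penetrance {penetrance}: both sibs affected with probability {both_sibs_affected(penetrance)}")
```

Penetrances of ⅓ and ½ give 1/36 and 1/16, compared with 1/4 for a fully penetrant dominant, which is why such variants rarely produce recognizable affected sib pairs.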
Thus there are two quite contrasting explanations for genetically based, early-onset sporadic cancers. The first is that these are due to new mutations, but in that case the mutations must be severe enough for their selective disadvantage to account for the proportion of cases that are new mutations, since at equilibrium that proportion equals the selective disadvantage. The other extreme arises, as in the case of the 1307 and 1317 variants, when the effect is of sufficiently low penetrance and severity that disease in first-degree relatives has not been detected. The relatively moderate effects of these missense variants can also account for the fact that they are not seen at the somatic level, namely as mutations in sporadic cancers. Mutations that occur in the germline are, of course, present in every cell in the body, so they have a higher probability of contributing to the development of a tumor even if their selective advantage at the cellular level is relatively weak. Sporadic mutations, however, generally occur in just one cell and are therefore much more likely to contribute to the somatic evolution of a cancer if they have a higher selective advantage. The overall probability of success must be proportional to the number of cells that carry the mutation times its selective advantage; hence the difference between the mild germline mutations, which still act at the cellular level, and the more severe sporadic mutations, which in APC occur predominantly in a small central portion of the gene.
This notion may explain other situations, such as BRCA1 and BRCA2 mutations in breast cancer, where germline mutations confer a pronounced increase in the risk of breast cancer in genes that apparently should behave like tumor suppressor genes, but in which mutations are not found in sporadic cancers.
The overall frequency of these “subpolymorphic” missense variants, such as I1307K and E1317Q in APC, may be substantially higher than the combined frequency of the more severe truncating mutations. Thus, even though these missense variants may have a lower penetrance, their overall contribution to colorectal cancer susceptibility may at least equal, if not exceed, that of the severe mutations maintained by mutation-selection balance in the population. These missense variants thus appear to represent a new intermediate category of susceptibility between rare mutations kept at bay by their selective disadvantage and common polymorphisms associated with comparatively common chronic diseases, as in the case of HLA and disease associations.
The type of subpolymorphic tumor-predisposing variation represented by these missense variants in the APC gene might be found quite generally in a wide range of disease-causing genes. Thus, in any situation where severe nonfunctional mutations in a gene cause an obvious Mendelian disease susceptibility, there could also be polymorphic or subpolymorphic variants with a much smaller, but nevertheless significant, effect. Such variants may be particularly likely to occur when the relevant protein product functions as a dimer or in a complex with other proteins, so that missense mutations can have significant dominant negative or gain-of-function effects (see Bodmer6 for further background to these ideas).