Methods in research: mapping disease genes

A number of techniques are available for mapping and identifying genes involved in modifying susceptibility to disease, and these have been dramatically successful in recent years. Traditionally, one began by knowing the abnormal protein involved in aetiology and then identified the abnormal gene coding for that protein. The modern approach is termed "reverse genetics" because identification of the abnormal gene precedes characterisation of its abnormal product. The term "positional cloning" may also be used, to indicate that the disease gene is identified by first implicating a particular chromosomal location. Information about the location of a disease gene can be obtained from cytogenetic abnormalities and from genetic marker studies, which comprise linkage and association studies. Genes implicated as candidates, either from mapping data or through knowledge of their function, can be screened for polymorphisms, which can then be tested for association with disease.

Alzheimer's disease (AD) provides an example of many of these processes. AD-type pathology occurs in middle age in subjects with trisomy 21 (Down's syndrome), and this cytogenetic abnormality suggested that an AD gene might be situated on chromosome 21. The gene for amyloid precursor protein (APP) was mapped to chromosome 21 ("forward genetics"). Genetic linkage studies with chromosome 21 markers in families with presenile, autosomal dominant AD implicated the region of the APP locus ("reverse genetics"), and when the APP gene was screened, a few mutations were found which occurred in all the subjects with AD in some of these families and in none of the subjects without AD. In cases of senile onset AD, weak linkage and association were found with markers on chromosome 19, and one of the genes in this region codes for apolipoprotein E (ApoE). This protein was known to exist in three common forms, e2, e3 and e4, and when association studies were carried out with this polymorphism the e4 allele was found to be associated with a higher risk of AD, suggesting that ApoE itself is directly involved in pathogenesis. Other families with presenile AD showed linkage to markers on chromosome 14, and further linkage studies eventually narrowed the localisation. When genes in this region were screened, one was found to have mutations which again occurred only in subjects with AD and in no unaffected subjects. This gene was named presenilin 1 (PS1). Other researchers then searched for genes with coding sequences similar to PS1 and very soon afterwards found a second gene, on chromosome 1, which carried mutations in other cases of presenile AD; this was named presenilin 2 (PS2).

Cytogenetic abnormalities

This term refers to large chromosomal abnormalities which are detectable microscopically. If a disease occurs in association with a cytogenetic abnormality, the disease gene may lie in the region involved. For example, the gene involved in fragile-X mental retardation is situated at the "fragile site" near the tip of the long arm of the X chromosome, which can be visualised when cells are cultured in a folate-deficient medium. Families have been reported in which a cytogenetic abnormality cosegregates with mental illness, and some cytogenetic abnormalities are associated with high rates of psychosis, so these may point to regions of interest.

Linkage studies

Genetic linkage is the phenomenon whereby alleles at loci close together on the same chromosome will tend to be inherited together, because it will be rare for a crossover to occur between the loci at meiosis. The closer together the loci are, the less likely crossovers will be and the fewer recombinants will be observed. If loci are far apart or on different chromosomes then recombination will occur by chance in 50% of meioses. The recombination fraction ranges from 0 (tight linkage) to 0.5 (no linkage) and is a measure of genetic distance.

In the first pedigree, the disease cosegregates with an A allele of the marker. In the second pedigree, the disease initially cosegregates with a B allele until there is a recombination, after which it cosegregates with a C allele.

Linkage can be used to map disease genes by typing polymorphic DNA markers and seeing whether their alleles cosegregate with disease among related subjects. Linkage can be studied in multiply-affected families, in which case the strength of evidence in favour of linkage can be measured as the lod score. This is the logarithm (base 10) of the ratio of the likelihood of the observed genotypes given a recombination fraction less than 0.5 to the likelihood under non-linkage, i.e. with the recombination fraction equal to 0.5. Traditionally a lod of 3 or more, corresponding to odds of 1000:1 in favour of linkage, is taken as "significant" evidence for linkage (although this does not equate with the meaning of statistical significance in other contexts).
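The lod score calculation can be illustrated for the simplest case. The sketch below assumes phase-known, fully informative meioses, so that each meiosis can be scored directly as recombinant or non-recombinant; real pedigree analysis (unknown phase, missing genotypes, penetrance) requires full likelihood software and is not attempted here. The function names and the grid search over the recombination fraction are illustrative choices, not a standard API.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """Log10 likelihood ratio for linkage at recombination fraction theta.

    Assumes `meioses` phase-known, fully informative meioses, of which
    `recombinants` show a crossover between marker and disease locus.
    Likelihood under linkage:    theta**R * (1 - theta)**(N - R)
    Likelihood under no linkage: 0.5**N
    """
    if not 0.0 <= theta < 0.5:
        raise ValueError("theta must lie in [0, 0.5)")
    non_recombinants = meioses - recombinants
    log_linked = non_recombinants * math.log10(1.0 - theta)
    if recombinants > 0:
        if theta == 0.0:
            return float("-inf")  # any recombinant rules out theta = 0
        log_linked += recombinants * math.log10(theta)
    log_unlinked = meioses * math.log10(0.5)
    return log_linked - log_unlinked


def max_lod(recombinants: int, meioses: int, step: float = 0.01):
    """Return (theta, lod) maximising the lod over theta = 0, step, ... below 0.5."""
    thetas = [i * step for i in range(int(round(0.5 / step)))]
    return max(((t, lod_score(recombinants, meioses, t)) for t in thetas),
               key=lambda pair: pair[1])


# Hypothetical data: 10 informative meioses with no recombinants.
print(max_lod(0, 10))  # theta 0.0, lod about 3.01
# Hypothetical data: 10 informative meioses with one recombinant.
print(max_lod(1, 10))  # theta about 0.1, lod about 1.60
```

With these assumptions, ten informative meioses without a recombinant are just enough to reach the traditional threshold of 3, while a single recombinant among ten drops the maximum lod to about 1.6.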

It can be difficult to apply the lod score method to so-called "complex" diseases with non-Mendelian inheritance because the likelihood calculations require an exact mode of inheritance to be specified, and this is unknown for most psychiatric diseases. A common alternative approach is to examine allele-sharing between pairs of affected relatives, the simplest example being the sib-pair method. For a pair of siblings, we would expect by chance that they would share two alleles of a DNA marker identical by descent (i.e. copies of the same parental alleles) 25% of the time, one allele 50% of the time and no alleles 25% of the time. However, if the marker is linked to the disease gene, affected sib pairs will share alleles more often than expected. If parents are also genotyped, the inheritance of the marker alleles can be studied directly (identity-by-descent, IBD analysis), but even if the parents are unavailable one can use population allele frequencies to estimate whether increased allele-sharing is occurring (identity-by-state, IBS analysis). The strength of evidence in favour of linkage can be given by a chi-squared statistic or by a maximum likelihood score (MLS), the latter being similar to a lod score.

The second sib pair shares one allele IBD, while all the others share both alleles, suggesting a recessive gene may be quite closely linked to the marker.
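The chi-squared approach to affected sib-pair data can be sketched as a goodness-of-fit test of the observed IBD sharing counts against the 25% : 50% : 25% expected under no linkage. This is a minimal sketch assuming IBD status is known for every pair (i.e. parents genotyped); the counts in the example are hypothetical, and the MLS method is not implemented. Published analyses often use one-sided tests of excess sharing instead of this simple two-degree-of-freedom test.

```python
import math

# Probability that two siblings share 0, 1 or 2 alleles IBD under no linkage.
EXPECTED_IBD = (0.25, 0.50, 0.25)


def ibd_chi_squared(n0: int, n1: int, n2: int):
    """Pearson goodness-of-fit test of IBD sharing counts (2 df).

    n0, n1, n2: numbers of affected sib pairs sharing 0, 1 or 2 alleles IBD.
    Returns (chi2, p_value).
    """
    observed = (n0, n1, n2)
    total = sum(observed)
    expected = [p * total for p in EXPECTED_IBD]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 2 degrees of freedom the chi-squared survival function is exp(-x/2).
    p_value = math.exp(-chi2 / 2.0)
    return chi2, p_value


# Hypothetical sample of 100 affected sib pairs with excess sharing.
print(ibd_chi_squared(15, 45, 40))  # chi2 13.5, p about 0.001
```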

The lod score method, which requires specification of values for transmission model parameters, is termed parametric, whereas tests which do not involve model specification are termed non-parametric. The latter consist mainly of tests for increased allele-sharing between affected relatives. They are widely regarded as being more appropriate for studying complex diseases (although there is controversy about this), but they are poorer than the lod score method at providing a precise location for the disease gene.

The crucial thing to realise about linkage studies is that they rely on studying sets of related affected subjects and that they can detect a disease gene over a relatively large distance, so that only 200-300 markers are sufficient to screen the whole genome. This means that linkage studies can provide initial localisations for disease genes even when there is no prior information about the kind of gene which may be involved or its chromosomal location.
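Some rough arithmetic shows why a few hundred markers can cover the genome. The sketch below assumes a total genetic map length of about 3500 cM (an approximate figure, labelled as an assumption in the code) and pretends the markers are evenly spaced; it then converts the worst-case distance between a disease gene and its nearest marker into a recombination fraction using Haldane's map function, which assumes no crossover interference. Neither the map length nor the map function comes from the text above; both are illustrative choices.

```python
import math

# ASSUMPTION: approximate sex-averaged length of the human genetic map, in cM.
GENOME_LENGTH_CM = 3500.0


def average_spacing_cm(n_markers: int, genome_cm: float = GENOME_LENGTH_CM) -> float:
    """Mean distance between markers if they were spread evenly over the genome."""
    return genome_cm / n_markers


def haldane_theta(distance_cm: float) -> float:
    """Recombination fraction from map distance via Haldane's map function.

    theta = (1 - exp(-2d)) / 2, with d in Morgans (assumes no interference).
    """
    d_morgans = distance_cm / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


for n in (200, 300):
    spacing = average_spacing_cm(n)
    worst_case = spacing / 2.0  # a disease gene lying midway between two markers
    print(f"{n} markers: spacing {spacing:.1f} cM, gene within {worst_case:.1f} cM "
          f"of a marker, recombination fraction at most {haldane_theta(worst_case):.3f}")
```

Under these crude assumptions even a gene midway between two markers would show well under 10% recombination with the nearer one, which is close enough for linkage to be detectable.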

Association studies

If a disease gene is very tightly linked to a genetic marker, then the alleles of the disease gene and marker may be associated with each other. That is to say that a particular marker allele may be found more frequently among affected subjects than in the general population. Association may occur either because the disease gene and marker polymorphism are so close together that there have not been enough recombination events during human evolution for them to reach "linkage equilibrium", or because the marker alleles themselves influence susceptibility to disease (as occurs, for example, with ApoE and AD). "Spurious" associations can occur if the disease and marker alleles have different frequencies in different subpopulations - then if the disease and a particular marker allele are both commoner in a certain group they will be associated in the population whether or not the loci are close together. Association studies are generally carried out by measuring the frequency of marker alleles in a group of cases and matched controls, and it is important that ethnically homogeneous and well-matched groups are used if spurious associations are to be avoided. In addition to testing for association with disease, one may also wish to test for the association of a genetic polymorphism with factors such as drug response.

Allele D is found more frequently among cases than controls.
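A case-control association test can be sketched as a 2x2 table of allele counts (cases versus controls, allele D versus all other alleles), giving a Pearson chi-squared statistic with 1 degree of freedom and an odds ratio. The allele counts below are entirely hypothetical and are chosen to show how a spurious association arises: within each of two subpopulations, cases and controls have the same frequency of D, but because cases are drawn mostly from the subpopulation in which D is common, pooling the samples produces a strong apparent association. Counting alleles rather than individuals is one common choice, assumed here for simplicity.

```python
def allelic_test(case_d: int, case_other: int, control_d: int, control_other: int):
    """Pearson chi-squared (1 df, no continuity correction) and odds ratio for allele D.

    The 2x2 table has rows cases/controls and columns D/other alleles.
    Returns (chi2, odds_ratio); the odds ratio is infinite if a denominator cell is 0.
    """
    table = [[case_d, case_other], [control_d, control_other]]
    total = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    if case_other * control_d == 0:
        odds_ratio = float("inf")
    else:
        odds_ratio = (case_d * control_other) / (case_other * control_d)
    return chi2, odds_ratio


# Hypothetical allele counts (D, other). Within each subpopulation cases and
# controls have the same D frequency, but most cases come from subpopulation A.
subpop_a = {"cases": (160, 40), "controls": (40, 10)}   # D frequency 0.8
subpop_b = {"cases": (10, 40), "controls": (40, 160)}   # D frequency 0.2

for name, counts in (("A", subpop_a), ("B", subpop_b)):
    print(name, allelic_test(*counts["cases"], *counts["controls"]))  # chi2 0, OR 1

pooled_cases = (160 + 10, 40 + 40)
pooled_controls = (40 + 40, 10 + 160)
print("pooled", allelic_test(*pooled_cases, *pooled_controls))  # chi2 about 64.8, OR about 4.5
```

Analysing each subpopulation separately (or using ethnically homogeneous, well-matched samples) removes the artefact, which is the point made in the text about matching.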

Association is a very short-range phenomenon and will only be observable for loci which are very tightly linked. Additionally one should realise that even if a marker is very close to a pathogenic locus, there will not necessarily be any association - this will depend on the evolutionary history of the marker and disease polymorphisms. Thus association studies cannot be used for genome-screening because potentially one would need to study many thousands of markers. However they can be very useful when a narrow region has already been implicated, for example by linkage, or when one has a restricted range of candidate genes which one may wish to study.
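The short range of association follows from how linkage disequilibrium decays. In an idealised population (assumed here: large, randomly mating, with no selection, mutation or migration), the disequilibrium D between two loci shrinks each generation by a factor of (1 - theta), so after t generations D_t = D_0 * (1 - theta)**t. The generation count in the example is an arbitrary illustrative value, not an estimate for any real population.

```python
def remaining_disequilibrium(theta: float, generations: int) -> float:
    """Fraction of the initial linkage disequilibrium D_0 left after t generations.

    Assumes an idealised randomly mating population: D_t / D_0 = (1 - theta) ** t.
    """
    return (1.0 - theta) ** generations


# Illustrative: 1000 generations for loci at different recombination fractions.
for theta in (0.5, 0.01, 0.001, 0.0001):
    print(theta, remaining_disequilibrium(theta, 1000))
```

Only for loci with a recombination fraction of the order of 0.0001 does most of the original disequilibrium survive, which is why association is expected only between very tightly linked loci.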

Transmission disequilibrium test

This method of analysis (TDT) can be thought of as a form of association study which avoids spurious positive results produced by population stratification. It is similar to another test called haplotype relative risk (HRR) in that affected cases are studied together with their parents. Each parent transmits one marker allele to their affected offspring and does not transmit the other allele, so the non-transmitted alleles can serve as a control sample. In a TDT analysis one studies parents heterozygous for the marker and counts the number of times each allele is or is not transmitted to the affected subjects. If an allele is transmitted to affected offspring significantly more often than the 50% expected by chance, this is evidence for association which is robust to population stratification.

The A allele is transmitted to affected offspring four times out of five.
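The TDT statistic can be sketched using the usual McNemar-type form: with b the number of times heterozygous parents transmit a given allele to an affected child and c the number of times they transmit their other allele instead, (b - c)^2 / (b + c) is compared with a chi-squared distribution on 1 degree of freedom. The example reuses the four-out-of-five count from the caption purely as an illustration; the function name is hypothetical.

```python
import math


def tdt(transmitted: int, not_transmitted: int):
    """Transmission disequilibrium test for one marker allele.

    transmitted (b): transmissions of the allele from heterozygous parents
                     to affected offspring.
    not_transmitted (c): times those parents transmitted their other allele.
    Returns (chi2, p_value) with chi2 = (b - c)**2 / (b + c), 1 df.
    """
    n = transmitted + not_transmitted
    if n == 0:
        raise ValueError("no informative transmissions")
    chi2 = (transmitted - not_transmitted) ** 2 / n
    # Chi-squared survival function for 1 df: P(|Z| > sqrt(chi2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Four transmissions out of five, as in the caption above.
print(tdt(4, 1))  # chi2 1.8, p about 0.18
```

Four transmissions out of five are far from significant on their own, so a real TDT study needs many informative parents to give convincing evidence.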

Gene-mapping in psychiatry

The approaches outlined above have been very successful when applied to Alzheimer's disease and Huntington's disease, but have not yet produced definitive results for functional diseases. Large-scale genetic linkage projects are underway for the functional psychoses, and association studies of some candidate genes are also being carried out.

December 2000

Dave Curtis (