Model-free linkage analysis

Dave Curtis and Pak Sham, August 1995.

Curtis D and Sham PC. Model-free linkage analysis using likelihoods. Am J Hum Genet, 1995.

Problems with linkage analysis

The lod score method uses all available information and is powerful, but it can produce falsely negative lod scores if the transmission model is misspecified. This is especially so at small recombination fractions and in multipoint analyses.

Attempts to reduce reliance on model specification

  • Test several models - but multiple testing makes interpretation of significance difficult
  • Only consider affected subjects - discards information and may increase sensitivity to misspecification of marker genotype probabilities
  • Only use two-point analyses - but adjacent markers may be more informative in combination than either on their own
  • Use nonparametric methods - but these lack power and discard available information concerning the pattern of segregation of disease and markers

Developing a model-free method of analysis

We attempted to produce a new method of analysis with the following aims:
  • Use all available information concerning affection status, marker genotypes and pedigree structure
  • Produce a method which could detect a susceptibility locus close to a marker, without relying on an inflated estimate of recombination
We sought to be able to detect the presence of a genetic effect on susceptibility at a particular point on the genetic map where the pattern of haplotype segregation was completely known.

The general single locus linkage model

The situation of a biallelic susceptibility locus and its relation with a marker can be described using the following parameters:
  • f0, f1, f2 - penetrance values
  • q - frequency of allele conferring susceptibility
  • theta - recombination fraction between disease and marker
In addition, we may allow for the presence of another susceptibility locus with the same mode of transmission but which is unlinked to the marker:
  • alpha - proportion of families segregating the linked locus
For convenience, we can refer to the three penetrance values as a single vector with three elements, F, and we can refer to the combination of this vector with the allele frequency, q, as the transmission model T, which thus comprises four parameters: f0, f1, f2 and q. The situation where a locus has no effect on susceptibility may be referred to as T0, where f0=f1=f2 (or perhaps q=0 or q=1, implying the locus is monomorphic).
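
As an illustration of how the transmission model T = (f0, f1, f2, q) determines population prevalence, the following minimal Python sketch may help. It assumes Hardy-Weinberg genotype proportions, which the text does not state explicitly, and the function names are hypothetical.

```python
def genotype_frequencies(q):
    """Frequencies of 0, 1 and 2 copies of the susceptibility allele.

    Assumption: Hardy-Weinberg proportions for allele frequency q.
    """
    p = 1.0 - q
    return (p * p, 2.0 * p * q, q * q)


def prevalence(q, F):
    """Population prevalence implied by allele frequency q and penetrances F."""
    return sum(g * f for g, f in zip(genotype_frequencies(q), F))


def is_null_model(q, F, tol=1e-12):
    """True if the model is of type T0: equal penetrances or a monomorphic locus."""
    f0, f1, f2 = F
    flat = abs(f0 - f1) < tol and abs(f1 - f2) < tol
    monomorphic = q < tol or q > 1.0 - tol
    return flat or monomorphic
```

For example, prevalence(0.1, (0.0, 0.0, 1.0)) describes a fully penetrant recessive model in which only subjects carrying two copies of the allele are affected.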

Parameter estimation and hypothesis testing

1. Segregation analysis

The likelihood is maximised over the (four) transmission model parameters. Marker genotypes are to be ignored, so the recombination fraction between the disease and marker loci is fixed to 50%.

Testing for the presence of a susceptibility locus:

LR = L(D | T) / L(D | T0) 
   = L(D | f0,f1,f2,q) / L(D | f0=f1=f2 or q=0 or q=1)
   = L(D,M | f0,f1,f2,q,theta=0.5) / 
     L(D,M | f0=f1=f2 or q=0 or q=1,theta=0.5)

2. Classical linkage analysis

The likelihood is maximised over different values of theta, assuming a fixed transmission model. Testing for linkage:

LR = L(D,M | theta<0.5,{q,F}) / 
     L(D,M | theta=0.5,{q,F}) 
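
To make the likelihood ratio concrete, here is a toy Python sketch for the simplest textbook case: phase-known, fully informative meioses, where the likelihood depends only on the number of recombinants. This is an illustrative simplification, not the pedigree likelihood computed by a linkage program, and the function names are hypothetical.

```python
import math


def lod_score(recombinants, meioses, theta):
    """log10 of L(theta) / L(0.5) for phase-known, fully informative meioses."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    k, n = recombinants, meioses
    like_theta = theta ** k * (1.0 - theta) ** (n - k)
    like_null = 0.5 ** n
    if like_theta == 0.0:
        return float("-inf")  # theta=0 is impossible once a recombinant is seen
    return math.log10(like_theta / like_null)


def max_lod(recombinants, meioses, steps=500):
    """Maximise the lod over a grid of theta values from 0 to 0.5."""
    grid = [0.5 * i / steps for i in range(steps + 1)]
    best = max(grid, key=lambda t: lod_score(recombinants, meioses, t))
    return best, lod_score(recombinants, meioses, best)
```

With ten meioses and no recombinants, the lod at theta=0 is 10 x log10(2), about 3.01.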

3. Linkage analysis incorporating admixture

Classical linkage analysis can be extended to allow for locus heterogeneity by allowing alpha to be less than 1. Testing for linkage with admixture:

LR = L(D,M | theta<0.5,alpha>0,{q,F}) / 
     L(D,M | theta=0.5 or alpha=0,{q,F}) 
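
A possible way to compute the admixture likelihood is sketched below in Python. It assumes the usual mixture form, in which each family contributes alpha x L(theta) + (1 - alpha) x L(0.5); the text does not spell this formula out. The per-family likelihoods are taken as given (hypothetical inputs, all strictly positive), theta is held fixed, and only alpha is maximised, so a full test would repeat this over values of theta.

```python
import math


def admixture_loglik(alpha, linked_liks, unlinked_liks):
    """Natural-log likelihood of all families for a given alpha.

    linked_liks[i]   : likelihood of family i at the chosen theta
    unlinked_liks[i] : likelihood of family i at theta = 0.5
    """
    return sum(
        math.log(alpha * linked + (1.0 - alpha) * unlinked)
        for linked, unlinked in zip(linked_liks, unlinked_liks)
    )


def admixture_lod(linked_liks, unlinked_liks, steps=1000):
    """Grid-maximise over alpha; return (best alpha, log10 likelihood ratio)."""
    null = admixture_loglik(0.0, linked_liks, unlinked_liks)
    best_alpha, best = 0.0, null
    for i in range(1, steps + 1):
        alpha = i / steps
        loglik = admixture_loglik(alpha, linked_liks, unlinked_liks)
        if loglik > best:
            best_alpha, best = alpha, loglik
    return best_alpha, (best - null) / math.log(10.0)
```

Because alpha=0 is always among the values considered, the returned log10 ratio is never negative.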

4. Testing for an effect on susceptibility at a given position

To test a specified position, we fix theta and test for linkage using alpha as the sole linkage parameter. To make the test model-free, we no longer fix the transmission model parameters in advance. Testing for a susceptibility locus at position theta=t:

LR = L(D,M | alpha>0,theta=t,q,F) / 
     L(D,M | alpha=0,theta=t,q,F) 

Because there is one more free parameter in the numerator than in the denominator, log(LR) provides a test with one degree of freedom and is comparable with a standard lod score.
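
The relation between a lod score and a one degree of freedom test can be sketched in Python as follows. The conversion 2 x ln(10) x lod is standard; the p-value uses the plain chi-square distribution with one degree of freedom and ignores any correction for alpha lying on the boundary of its range under the null, which is a simplifying assumption here. The function names are hypothetical.

```python
import math


def lod_from_likelihoods(lik_alt, lik_null):
    """log10 of the likelihood ratio."""
    return math.log10(lik_alt / lik_null)


def chi_square_from_lod(lod):
    """Likelihood ratio statistic 2*ln(LR) expressed in terms of the lod."""
    return 2.0 * math.log(10.0) * lod


def p_value_1df(chi2):
    """Upper tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))
```

A lod of 1 corresponds to a chi-square statistic of about 4.6.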

Constraining transmission model parameters

As described, the test may have no power to detect linkage: if affected sib pairs are used, the transmission model parameters will take values such as f2=1, q=1 or f0=f1=f2=1, under which every subject is expected to be affected and the marker genotypes contribute nothing. In other samples this effect may be less extreme but may still reduce the power of the method.

Constraining to produce correct prevalence

To reduce the effect of selection bias which may yield unrealistic estimates of the transmission model parameters, we can constrain the transmission model parameters to yield the correct population prevalence for the disease, K, and this constraint can be denoted [q,F]. The test for linkage can then be written as follows:

LR = L(D,M | alpha>0,theta=t,[q,F]) / 
     L(D,M | alpha=0,theta=t,[q,F]) 

If we impose the additional constraint f0<=f1<=f2 then we can draw a polyhedron which encloses all possible values for F.

[Figure: polyhedron enclosing the allowable values of F]

This polyhedron has a vertex at the point (K,K,K), corresponding to T0, which models the locus having no effect on susceptibility.

At all other points there is a single value for q which produces the correct value of K, so that q becomes a function of F.
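
One way to recover q from F and K is sketched below in Python. It assumes Hardy-Weinberg genotype proportions, under which prevalence rises monotonically with q whenever f0<=f1<=f2, so a simple bisection finds the single solution. The function names are hypothetical.

```python
def prevalence(q, F):
    """Prevalence under Hardy-Weinberg proportions (an assumption of this sketch)."""
    f0, f1, f2 = F
    p = 1.0 - q
    return p * p * f0 + 2.0 * p * q * f1 + q * q * f2


def allele_frequency_for_prevalence(F, K, tol=1e-12):
    """Solve prevalence(q, F) = K for q in [0, 1] by bisection."""
    f0, f1, f2 = F
    if not f0 <= f1 <= f2:
        raise ValueError("penetrances must satisfy f0 <= f1 <= f2")
    if f0 == f2:
        raise ValueError("null model T0: q is not determined by K")
    if not f0 <= K <= f2:
        raise ValueError("K must lie between f0 and f2")
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if prevalence(mid, F) < K:
            lo = mid  # prevalence too low, so q must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For a fully penetrant recessive model with K=0.01, this gives q close to 0.1, since q squared must equal K.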

Further constraints on model parameters

To make the procedure less computationally demanding one may restrict consideration to a smaller subset of models: those represented in the figure by the dotted lines joining the Mendelian recessive model, at (0,0,1), through the null effect model, at (K,K,K), to the Mendelian dominant model at (0,1,1). It can be seen that these lines pass close to most points within the allowable volume.

If only the models indicated are used, then f0 and f2 both become functions of f1 and the transmission model is completely defined by the choice of f1.
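
A minimal Python sketch of this one-parameter family follows. It assumes the dotted lines are straight segments from (0,0,1) to (K,K,K) and from (K,K,K) to (0,1,1); the exact parameterisation used by the authors may differ, and the function name is hypothetical.

```python
def model_from_f1(f1, K):
    """Return (f0, f1, f2) on the assumed straight segments through (K, K, K)."""
    if not 0.0 < K < 1.0:
        raise ValueError("K must lie strictly between 0 and 1")
    if not 0.0 <= f1 <= 1.0:
        raise ValueError("f1 must lie between 0 and 1")
    if f1 <= K:
        # Recessive side: f0 equals f1, and f2 falls from 1 towards K.
        s = f1 / K
        return (f1, f1, 1.0 - s * (1.0 - K))
    # Dominant side: f2 equals f1, and f0 falls from K towards 0.
    u = (f1 - K) / (1.0 - K)
    return (K * (1.0 - u), f1, f1)
```

Under this assumption f1=0 gives the recessive model, f1=K the null effect model and f1=1 the dominant model, and every point in between satisfies f0<=f1<=f2.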


MFLINK is a program which automatically sets up appropriate data files and then calls MLINK to carry out the likelihood calculations. It will shortly be available from John Attwood's ftp site at in /pub/packages/dcurtis.


The method was applied to pedigree and sib pair data generated under a wide variety of transmission models.

Performance was compared with an IBD sib pair analysis and the lod score method using the correct transmission model.

When applied to the affected sib pair data set all methods gave similar results.

With the pedigree data, the methods sometimes gave similar results to each other, but for some transmission models the two likelihood-based methods were considerably more powerful than the sib pair method.

Dave Curtis