DNA variation in humans

DNA sequence

Normal human nuclei contain 22 pairs of autosomes and one pair of sex chromosomes (XX or XY). Within a pair of chromosomes or between the same chromosomes in different individuals the DNA code is almost, but not quite, identical.  

The human genome consists of approximately 3,000,000,000 DNA bases. About 5% of the genome is functional and about 1% consists of genes coding for proteins. There are approximately 21,000 genes.  

The portion of the genome containing coding sequences of genes is termed the exome.  


About 1 per 1,000 DNA bases varies from the normal "reference" sequence. Where the change is from one base to another this is termed a single nucleotide polymorphism (SNP). Common SNPs are used as markers for association studies.  

There can also be insertions or deletions of one or more DNA bases (indels). Large insertions and deletions are termed copy number variants (CNVs).  
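The distinction between these variant types can be sketched in code. The following is a minimal illustration only: the function name classify_variant and the 1,000-base cut-off for calling something a CNV are assumptions made for this example, since the text above says only that CNVs are "large".

```python
def classify_variant(ref: str, alt: str, cnv_threshold: int = 1000) -> str:
    """Classify a variant from its reference and alternate alleles.

    cnv_threshold is an assumed, illustrative cut-off (in bases) for
    treating an insertion or deletion as a copy number variant.
    """
    if len(ref) == 1 and len(alt) == 1:
        # One base replaced by another base
        return "SNP" if ref != alt else "no change"

    size = abs(len(ref) - len(alt))
    if size == 0:
        # Several bases replaced by the same number of bases
        return "other"

    kind = "insertion" if len(alt) > len(ref) else "deletion"
    if size >= cnv_threshold:
        return f"CNV ({kind})"
    return f"indel ({kind})"


print(classify_variant("A", "G"))
print(classify_variant("A", "ATT"))
print(classify_variant("ATT", "A"))
```

Gross chromosomal abnormalities are not covered by this sketch, because they cannot be described simply as a pair of short alleles.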

Gross chromosomal abnormalities such as trisomy, inversion and translocation also occur.  

There is far less variation in the exome than in the rest of the genome. This is understandable because variation outside the exome is expected to have little or no effect on the functioning of the organism and hence is not subject to selection pressure, whereas variation within the exome can alter a protein and so is more likely to be selected against. Even if a variation does occur in a coding region it may be such that the same amino acid sequence is produced (because of redundancy in coding - that is, the fact that different codons can code for the same amino acid). DNA changes which do not alter the amino acid sequence are termed "synonymous" and those which do are termed "non-synonymous".  
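The redundancy of the genetic code can be illustrated with a short sketch. It builds the standard genetic code table (DNA codons, with "*" marking a stop codon) and compares the amino acid encoded before and after a change. The function name classify_coding_change is hypothetical and chosen only for this example, and the sketch assumes both codons are in the same reading frame.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order
# (third base varying fastest). "*" denotes a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_change(ref_codon: str, alt_codon: str) -> str:
    """Report whether a codon change alters the encoded amino acid."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    return "synonymous" if ref_aa == alt_aa else "non-synonymous"


# GAA and GAG both code for glutamate (E), so this change is silent.
print(classify_coding_change("GAA", "GAG"))
# GAG codes for glutamate (E) but GTG codes for valine (V).
print(classify_coding_change("GAG", "GTG"))
```

Here the first change leaves the protein unaltered while the second substitutes one amino acid for another, even though each involves only a single base.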

Each subject will have approximately 3,000,000 variants. Many of these will be common and shared with other people. Others may be rare or unique.  
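The figure of approximately 3,000,000 variants per subject follows from the approximate numbers quoted earlier: a genome of about 3,000,000,000 bases with about 1 base per 1,000 differing from the reference. As a rough check (the variable names are chosen only for this illustration, and all the inputs are the approximate values given in the text):

```python
GENOME_BASES = 3_000_000_000   # approximate size of the human genome
BASES_PER_VARIANT = 1_000      # about 1 variant per 1,000 bases
CODING_PERCENT = 1             # about 1% of the genome codes for proteins

# Expected number of variants in one subject
expected_variants = GENOME_BASES // BASES_PER_VARIANT

# Approximate number of bases in protein-coding sequence
coding_bases = GENOME_BASES * CODING_PERCENT // 100

print(f"Expected variants per subject: {expected_variants:,}")
print(f"Approximate coding bases: {coding_bases:,}")
```

These are order-of-magnitude estimates rather than exact counts.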

In one subject, only approximately 12,500 variants occur in coding regions, of which approximately 10,000 are non-synonymous. However, most of these are common and have a neutral effect, and probably only 1,500 affect protein function. Additionally, a few hundred indels may also produce functional effects.  

Because different subjects have different sequence variants, 200 subjects may between them have 120,000 variants in the exome of which 50,000 are in coding regions. Of these, 25,000 are synonymous. The non-synonymous variants tend to be rare or unique and hence are presumably more likely to be exerting functional effects.  

Epigenetic effects

Epigenetic effects produce long-term modifications in gene expression which can be retained in daughter cells produced by mitosis. Thus they produce sustained changes in functioning of DNA without any modifications to the actual sequence of DNA bases.  

There are two main mechanisms - methylation of DNA bases and modification of histones. (Histones are the proteins DNA wraps round - DNA and histones together form chromatin.)  

Epigenetic processes account for the differentiation of different cell types. Cells of different tissue express different genes in spite of sharing the same DNA sequence.  

Epigenetic changes can occur in response to environmental factors including drug treatment. They may be a cause or effect of disease.  

It has been proposed that epigenetic processes may contribute to phenomena such as gene-environment interactions and incomplete penetrance.  

Updated January 2012  


Dave Curtis (d.curtis@ucl.ac.uk)