What is Genetic Sequencing?

Genetic sequencing is the determination of the order of the bases in DNA: { Adenine, Cytosine, Guanine, Thymine (Uracil in RNA) }. These bases, in triplets called codons, correspond to the amino acids that make up proteins, including enzymes, in a living body.
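
As a rough illustration of the codon idea, the sketch below splits a DNA string into triplets and looks each one up in a table. The table is deliberately partial (7 of the 64 codons of the standard genetic code), and the names CODON_TABLE, split_codons and translate are made up for this example.

```python
# Partial codon table (standard genetic code, written in the DNA alphabet).
# Only a handful of the 64 codons are listed; this is an illustration only.
CODON_TABLE = {
    "ATG": "Met",   # also the usual start codon
    "TGG": "Trp",
    "TTT": "Phe",
    "AAA": "Lys",
    "GGC": "Gly",
    "GCT": "Ala",
    "TAA": "Stop",
}


def split_codons(dna):
    """Split a DNA string into consecutive triplets, dropping any remainder."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return [dna[i:i + 3] for i in range(0, usable, 3)]


def translate(dna):
    """Map each codon to an amino acid name; '?' marks codons not in the table."""
    return [CODON_TABLE.get(codon, "?") for codon in split_codons(dna)]


if __name__ == "__main__":
    example = "ATGTTTGGCAAATAA"
    print(split_codons(example))
    print(translate(example))
```

A complete translator would need all 64 codons and a decision about which reading frame to start in; both are left out here.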

How is it done?

First, some background...

The size of the problem

The size of a DNA genome is on the order of 10^9 bases. The direct approach is to divide the problem, and the genome, into smaller pieces: divide the whole genome into pieces of O( 10^6 ) bases, and divide these into contiguous segments ('contigs') of O( 10^4 ) bases. This way, the sequencing of a particular gene may be done concurrently by different (teams of) scientists working on different parts.
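
The arithmetic of this two-level division can be sketched as follows. The sizes are the order-of-magnitude figures from the text, taken as exact round numbers purely for illustration; the function name locate is hypothetical.

```python
GENOME_SIZE = 10**9   # bases in the genome (order of magnitude only)
PIECE_SIZE = 10**6    # bases per large piece
CONTIG_SIZE = 10**4   # bases per contig

pieces_per_genome = GENOME_SIZE // PIECE_SIZE           # 1000 pieces
contigs_per_piece = PIECE_SIZE // CONTIG_SIZE           # 100 contigs per piece
total_contigs = pieces_per_genome * contigs_per_piece   # 100000 contigs


def locate(position):
    """Return (piece index, contig index within the piece, offset in the contig)
    for a 0-based base position in the genome."""
    piece, rest = divmod(position, PIECE_SIZE)
    contig, offset = divmod(rest, CONTIG_SIZE)
    return piece, contig, offset


if __name__ == "__main__":
    print(pieces_per_genome, contigs_per_piece, total_contigs)
    print(locate(123_456_789))
```

With these figures there are about 100,000 contigs in all, which is what makes it practical to hand different parts to different teams.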

The process (sort of):

Genes are cut into short segments, about 350 bases each, that are still long enough to appear on the contig only once. The segments are then sorted and separated (somehow) according to their 350-base sequence. This gives a set of segments 'A' sharing one uniquely identifying 350-base sequence, a set 'B' sharing another, and likewise 'C', 'D', and so on.
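
The two ideas in this step, cutting into fixed-length segments and requiring that a segment occur only once on the contig, can be sketched in a few lines. This is a toy model on a short made-up string with 4-base segments instead of 350; it says nothing about how the laboratory sorting is actually done, and all function names are hypothetical.

```python
def cut(contig, length):
    """Cut a contig into consecutive segments of the given length
    (the last segment may be shorter)."""
    return [contig[i:i + length] for i in range(0, len(contig), length)]


def count_occurrences(contig, segment):
    """Count occurrences of segment in contig, including overlapping ones."""
    count = 0
    start = contig.find(segment)
    while start != -1:
        count += 1
        start = contig.find(segment, start + 1)
    return count


def is_unique(contig, segment):
    """True if the segment appears exactly once on the contig."""
    return count_occurrences(contig, segment) == 1


def group_by_sequence(segments):
    """Sort segments into sets 'A', 'B', ... keyed by their sequence."""
    groups = {}
    for seg in segments:
        groups.setdefault(seg, []).append(seg)
    return groups


if __name__ == "__main__":
    toy_contig = "ACGTACGTTTGACC"   # made-up data
    segments = cut(toy_contig, 4)
    print(segments)
    print({seg: is_unique(toy_contig, seg) for seg in set(segments)})
    print(group_by_sequence(segments))
```

In the toy contig the segment ACGT occurs twice, so it would not serve as an identifier, while TTGA occurs once and would.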

A promoter is attached to the end(s) of the segments to facilitate the attachment of additional bases. Then two kinds of segments, say A and B, are added to a solution of single-stranded contigs (only one side of the double helix). In solution, segments A and B 'find' their places beside their complements on the contig: complementary bases pair through hydrogen bonds (A with T, G with C), so each segment binds most readily where every base has its partner. Next, the enzyme polymerase is added to initiate the polymerase chain reaction and extend the segments. It appends bases to the 'promoted' ends of segments A and B according to the pattern of the contig. After a time, the extended segments A+ and B+ are long enough that they overlap; that is, they share a common sequence of bases that is long enough to be unique on that contig.

  .--------------------------------------------------------------------------.
  | Example:                                                                 |
  |                                                                          |
  |  contig  ...GTGCCAGGTCTTGTCACAAAGTTC...ACGCGCGCGTACGTCACTGCACTCTCA...    |
  |    A        CACGGTCCAGaacagtgtttcaag...tgcgcgcgcatgc                     |
  |                                                                          |
  |  contig  ...GTGCCAGGTCTTGTCACAAAGTTC...ACGCGCGCGTACGTCACTGCACTCTCA...    |
  |    B                       tgtttcaag...tgcgcgcgcATGCAGTGACGTGAGA         |
  |                                                                          |
  |                                                                          |
  |    A        CACGGTCCAGaacagtgtttcaag...tgcgcgcgcatgc                     |
  |    B                       tgtttcaag...tgcgcgcgcATGCAGTGACGTGAGA         |
  |                            ^^^^^^^^^^^^^^^^^^^^^^^^^                     |
  |                            common sequence                               |
  `--------------------------------------------------------------------------'
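
The pairing and extension shown in the box can be mimicked with plain string operations. The sketch below uses only the first stretch of the contig from the example (the part before the '...') and segment A. It follows the box's layout, where the segment is written base-for-base beneath the contig, and ignores strand direction entirely; that is a simplifying assumption, and the function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand):
    """Return the base-for-base complement of a DNA strand (no reversal)."""
    return "".join(COMPLEMENT[base] for base in strand.upper())


def anneal(contig, segment):
    """Index on the contig where the segment pairs with its complement, or -1."""
    return contig.upper().find(complement(segment))


def extend(contig, segment, n_bases):
    """Append up to n_bases to the right end of the segment, following the
    contig. New bases are written in lower case, as in the example box."""
    start = anneal(contig, segment)
    if start == -1:
        raise ValueError("segment has no complementary site on this contig")
    end = start + len(segment)
    new_bases = complement(contig[end:end + n_bases]).lower()
    return segment.upper() + new_bases


if __name__ == "__main__":
    contig = "GTGCCAGGTCTTGTCACAAAGTTC"   # start of the contig in the box
    seg_a = "CACGGTCCAG"                  # segment A before extension
    print(anneal(contig, seg_a))
    print(extend(contig, seg_a, 14))
```

Segment B in the box grows toward the left instead; handling that would need a mirrored version of extend.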

When the common sequence appears, it can be concluded that A and B are indeed both present on the same contig, and that they are connected by that common sequence.
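
One simple way to test for such a common sequence in software is to look for the longest end of one extended segment that is also the beginning of the other. This is a sketch of that idea only, not the laboratory detection method; the two strings are made up, and min_overlap is a hypothetical parameter standing in for "long enough to be unique".

```python
def overlap_length(a, b, min_overlap=1):
    """Length of the longest suffix of a that is also a prefix of b,
    or 0 if no overlap of at least min_overlap bases exists."""
    a, b = a.upper(), b.upper()
    for n in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def merge(a, b, min_overlap=1):
    """Join a and b across their common sequence; None if they do not overlap."""
    n = overlap_length(a, b, min_overlap)
    if n == 0:
        return None
    return a.upper() + b.upper()[n:]


if __name__ == "__main__":
    a_plus = "GGATCCTTAGCAT"   # made-up extended segment A+
    b_plus = "TTAGCATCGGA"     # made-up extended segment B+
    print(overlap_length(a_plus, b_plus))
    print(merge(a_plus, b_plus))
    print(merge(a_plus, b_plus, min_overlap=8))
```

Raising min_overlap guards against short chance matches, at the cost of missing genuine but short overlaps.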

How is it detected? A chromatogram is taken using a method related to a four-component Fourier transform (one component from the fluorescence of each base).

The common sequence gives a larger detection amplitude because it occurs more often.


References:

Dr. Pat Gillevet, Biology Department, GMU.


prepared by Jonathan Steidel ( jsteidel at science dot gmu dot edu )