It's All in the Code!
A Celera lab technician works with a DNA sequencing machine. Photos courtesy of www.GenomeNewsNetwork.org/J. Craig Venter Institute.
Genes make each of us what we are. Each cell in our bodies contains our genome, a code of more than 3 billion letters contained in the molecule DNA. Sequencing DNA means figuring out what order the letters appear in -- their sequence. Knowing the sequence helps scientists figure out what kind of genetic information is carried in a particular section of DNA. Some sections contain genes; other sections don't. Some sections may show changes in sequence, called mutations, that can cause disease.
In the past, geneticists studied genes one at a time. Today, scientists study the whole code, which is like reading a complete story rather than trying to work out the plot from individual words.
Dr. J. Craig Venter.
How do you sequence DNA when such gigantic numbers are involved? If you figured out one letter per second, it would take you longer than a century to sequence a human's DNA, says J. Craig Venter. He led the team that first sequenced the complete genome of a free-living organism, the bacterium Haemophilus influenzae, which can cause ear infections. For that project, published in 1995, the team used new sequencing machines and computers to do the job of identifying each letter in the code. He later led one of the two efforts that produced the first drafts of the human genome sequence.
The DNA code is made up of four "bases," the letters of the genetic alphabet: A is for adenine, G is for guanine, C is for cytosine, and T is for thymine. When an organism's genome is sequenced, the result is a string of thousands to billions of these letters. A virus that infects the E. coli bacterium has around 5,000 base pairs, while the Pompeii worm's genome is estimated to be about 800 million. The human genome has over 3 billion. Yet humans are far from having the largest genome. The record for the largest known genome is currently held by a tiny, single-celled organism, an amoeba (Amoeba dubia), which has some 670 billion base pairs.
The first DNA sequencing methods were developed in the mid-1970s. Back then, scientists could only sequence a few base pairs of DNA per year -- not nearly enough to sequence a single gene. When the Human Genome Project began in 1990, only a few labs could sequence even 100,000 base pairs per year. Today, the latest production-scale sequencer can analyze millions of base pairs of DNA in a 24-hour period.
Since most genomes are too large for any machine to sequence all at once, scientists have to chop up the genome into manageable chunks. These pieces are sequenced and then fit together much like a giant jigsaw puzzle to form the complete genome.
But where does one chunk of the genome end and another begin? Computer programs called assemblers look for places where the end of one piece has the same sequence as the start of another. These overlaps let them put the genome back together in the proper order. As careful as scientists might be, errors can occur at many different points in the sequencing process. So how do scientists know if they've got the sequence right? To make sure, scientists typically will sequence a genome 6 to 10 times.
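To make the jigsaw idea concrete, here is a minimal sketch in Python of how overlapping pieces could be stitched back together. It is a toy illustration, not how any particular assembler works: the function names (`overlap`, `greedy_assemble`), the example pieces, and the minimum overlap of three letters are all assumptions made for this example. Real assemblers also have to cope with sequencing errors and repeated stretches of DNA, which this sketch ignores.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest end of `a` that matches the start of `b`.

    Returns 0 if the best match is shorter than `min_len` letters.
    """
    longest_possible = min(len(a), len(b))
    for k in range(longest_possible, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(pieces: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly join the two pieces that overlap the most.

    Stops when no two pieces overlap by at least `min_len` letters.
    Whatever is left is returned; ideally that is one complete sequence.
    """
    contigs = list(dict.fromkeys(pieces))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            break  # nothing left to join
        # Glue piece j onto piece i, writing the shared letters only once.
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Three made-up pieces cut from the made-up sequence ATGCGTACGTTAGC.
pieces = ["GTTAGC", "ATGCGTAC", "GTACGTTA"]
print(greedy_assemble(pieces))  # ['ATGCGTACGTTAGC']
```

In this example the pieces are handed over out of order, and the overlaps alone are enough to recover the original sequence. Sequencing the same genome several times gives the assembler more overlapping pieces to work with and more chances to catch mistakes.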
While automatic sequencing machines have sped up the task of revealing the genetic code, they do not tell scientists an organism's genetic secrets. That takes much more work, although the genome sequence can provide scientists with clues about where certain genes are. Genome maps also can help scientists navigate their way to locations of interest.
Scientists are constantly adding to a catalog of genomes that have been sequenced. Our Extreme 2008 scientists will "blast" the microbes they find at the hydrothermal vents. The word comes from BLAST, the Basic Local Alignment Search Tool, a program that compares a DNA sequence with sequences already on file. This means that they will try to match the microbes' DNA to sequences in the catalog on the way to sequencing their genomes. This is important not just "because it's there" -- for the sake of discovering something new -- but because new bacteria have the potential to help us find new medicines, fuel sources, and even foods.
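The matching step can be pictured with a small Python sketch. This is a toy stand-in, not the algorithm the real BLAST program uses: it simply breaks a stretch of DNA into short "words" of four letters and reports which catalog entry shares the most of them. The function names, the catalog entries (`microbe_a`, `microbe_b`), and their sequences are all made up for illustration.

```python
def words(seq: str, k: int) -> set[str]:
    """All overlapping k-letter words found in a DNA sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_match(query: str, catalog: dict[str, str], k: int = 4) -> tuple[str, float]:
    """Find the catalog entry that shares the most k-letter words with `query`.

    Returns the entry's name and the fraction of the query's words
    that also appear in that entry (1.0 means every word was found).
    """
    query_words = words(query, k)
    if not query_words:
        raise ValueError("query is shorter than k letters")
    if not catalog:
        raise ValueError("catalog is empty")
    best_name, best_score = "", -1.0
    for name, seq in catalog.items():
        score = len(query_words & words(seq, k)) / len(query_words)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


# A made-up two-entry catalog and a made-up stretch of DNA from a new sample.
catalog = {
    "microbe_a": "ATGCGTACGTTAGC",
    "microbe_b": "TTTTCCCCGGGGAAAA",
}
print(best_match("CGTACGTT", catalog))  # ('microbe_a', 1.0)
```

A high score suggests the new microbe is related to something already in the catalog. A low score against every entry hints that the scientists may have found something new.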
Sources:
Canadian Museum of Nature: The Geee! in Genome
Genome News Network: What's a Genome
National Human Genome Research Institute, National Institutes of Health
U.S. Department of Energy Human Genome Project Information Web Site













