Anyway, this got me thinking. What good is a genome sequence? What is it going to tell us about our favourite organism that good old-fashioned biological enquiry and lab work hasn’t been able to tell us so far? Whenever the idea of sequencing the tuatara genome is discussed, one of the major questions that comes back (especially from non-geneticists) is “why?” Even though genome sequencing is getting faster and cheaper by the day, it still requires huge resources of time and money and it’s not always obvious why it’s worth going to the effort.
By now you may be wondering what sequencing a genome actually means. The genome of an organism refers to all the DNA that makes up one set of its chromosomes – this includes all the genes, and all the pieces of DNA in between the genes. DNA is made up of 4 nucleotides, or bases – Adenine (A), Guanine (G), Cytosine (C) and Thymine (T). So sequencing a genome means determining the entire sequence of As, Gs, Cs and Ts for all of the chromosomes. Because genomes are so large (most mammalian genomes are a few billion bases long), they have to be sequenced in pieces (usually of a few hundred bases each) then put back together. Sequencing all the pieces is quick and easy, but putting them together takes huge amounts of computing power – like doing one giant jigsaw puzzle with several million pieces.
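To make the jigsaw analogy a bit more concrete, here is a toy sketch in Python of one simple way to stitch overlapping pieces together: repeatedly merge the two pieces with the longest overlap. The function names (`overlap`, `greedy_assemble`) and the example reads are made up for illustration, and real assemblers use far more sophisticated methods that cope with sequencing errors and repeats.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Toy assembler: keep merging the pair of reads with the longest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # no remaining overlaps: the pieces stay separate
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Three hypothetical short reads that overlap end to end
print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"]))
```

Even in this toy version every piece is compared with every other piece on each round, which hints at why assembling millions of reads is so demanding on computers.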
Having the DNA sequence is only just the beginning, however. The DNA sequence itself is meaningless until it is annotated – this involves figuring out which bits are the genes, what these genes are and eventually, what the genes do and how they interact. Annotation can take years and require huge amounts of bioinformatics manpower (and computer power). To give you some idea of what we’re talking about here, the cow genome sequence was finished a few months ago and involved more than 300 researchers in 25 countries, including 15 analysis teams to turn the raw data into meaningful knowledge.
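As a very rough illustration of the first step of "figuring out which bits are the genes", the Python sketch below scans a sequence for open reading frames: stretches that begin with a start codon (ATG) and run to a stop codon in the same reading frame. The function name `find_orfs`, the `min_codons` cut-off and the example sequence are assumptions made up for this post. Real annotation pipelines go much further, dealing with both DNA strands, introns and comparison against known genes.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end) positions of simple ORFs on the forward strand.

    Positions are 0-based and `end` is exclusive, so seq[start:end] is the
    ORF including its stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):  # three possible reading frames
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)


# Hypothetical 16-base sequence containing one short ORF
example = "CCATGGCGTACTGAAA"
for start, end in find_orfs(example):
    print(start, end, example[start:end])
```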
So what do we get out of all this investment of time and money? Once we have an annotated whole genome sequence we theoretically know all the genes an organism has, and where they are found on the chromosomes in relation to each other. And even if our genome sequence is only partially annotated, we still have a huge head start on finding the genes we are interested in. Most of the power in a genome sequence comes with being able to compare its structure with other genomes already sequenced, enabling us to work on a much larger scale than was previously possible and without having to “guess” which genes may be important to investigate in advance.
This kind of scaling up has revolutionised evolutionary biology, enabling us to spot patterns in genome evolution that wouldn’t be apparent if we were only studying individual genes. For instance, we can identify parts of the genome that are conserved across distantly related species – these may have some important functional role which means their DNA sequence hasn’t changed much over millions of years, and may even be regions of the genome previously regarded as “junk” or nonfunctional DNA. We can also study how genes are formed and lost, or identify genes that appear to have evolved faster or taken on a new function in a particular species. Genome sequences can also help us understand how species and populations are related to each other. Instead of just having a handful of genes or markers available to build phylogenetic trees we now have literally thousands of markers at our fingertips, enabling more powerful comparisons to be made.
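The idea of spotting conserved regions can be sketched in a few lines of Python. The example below assumes the sequences have already been aligned to the same length (a big job in itself, and not shown here) and simply reports stretches where every species has the same base. The function name `conserved_runs`, the `min_len` threshold and the three short sequences are invented for illustration; real analyses use statistical models of sequence evolution instead of exact matching.

```python
def conserved_runs(aligned, min_len=5):
    """Find runs of columns where all aligned sequences share the same base.

    `aligned` is a list of equal-length strings ('-' marks an alignment gap).
    Returns (start, end) pairs, 0-based with `end` exclusive.
    """
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")

    runs = []
    start = None
    for pos in range(length + 1):  # one extra step closes a run at the end
        same = (
            pos < length
            and aligned[0][pos] != "-"
            and len({s[pos] for s in aligned}) == 1
        )
        if same and start is None:
            start = pos
        elif not same and start is not None:
            if pos - start >= min_len:
                runs.append((start, pos))
            start = None
    return runs


# Three made-up aligned sequences standing in for three species
species = ["ACGTTTGACCAGT", "ACGTTTGATCAGA", "TCGTTTGAGCAGT"]
print(conserved_runs(species))
```

Lowering `min_len` picks up shorter matches too, at the cost of more stretches that are identical purely by chance.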
Whole genome sequences can also facilitate research where we only want to look at individual genes or particular regions of the genome to understand a particular biological trait, for instance resistance to a particular disease. For this type of more traditional genetics research having a whole genome is unnecessary, but it does provide a very convenient shortcut. Finding a particular gene or region of a chromosome and then sequencing it can be a laborious task requiring many hours in the lab. With genome sequencing becoming faster and cheaper by the day, it is fast becoming more cost-effective to sequence an entire genome upfront than to pick off small pieces of it to sequence as individual projects.
So there are a couple of good reasons to sequence whole genomes. Genome sequences allow researchers to work on a much larger scale than was previously possible to answer some fundamental questions about evolution, and they circumvent the need for a whole lot of laborious lab work to isolate specific genes or genomic regions. So perhaps the question should be “which genome should you sequence?” That might have to be a topic for another post…