Being so busy beating cancer one technical paper at a time, I often don’t get the opportunity to step back and see how our stuff relates to what other people are doing in foreign territories … like the humanities. So I was thrilled to be invited to team up with Barbara Zipser, a researcher in the history of medicine at RHUL. In a chapter in her forthcoming book we contrast stemmatics and textual criticism in philology with phylogenetic methods in biology. The following fragment is part of my bit of the bargain. Enjoy!
Unlike physics, biology does not have a strong mathematical theory to explain and predict observed phenomena. This may be one of the reasons why biology is so rich in metaphors. The Tree of Life connects all forms of life on earth. Conrad Waddington famously compared the development of cell types and tissues to marbles rolling down a grooved slope, the so-called epigenetic landscape. And inside the nucleus of almost every cell lies the organism’s genome, the Book of Life written in the language of DNA. Like a text written in a human language, DNA carries information: it can be transcribed into a different form (RNA instead of DNA), and that copy can in turn be translated (into proteins).
The idea that the genome can be read and edited pervades all molecular biology and forms one of the most powerful and suggestive metaphors of modern science.
Errors in the book of life
The step from a biological molecule to a written text seems large, but it is actually quite easy to make. DNA is built from four kinds of nucleotides, distinguished by their bases Adenine, Cytosine, Guanine and Thymine, which can be linked into long chains. In cells, two such chains pair up as the two strands of a double helix. The text of the book of life is a linear abstraction of the three-dimensional structure of DNA. The first step is to unwind the double helix into two linear strands lying side by side. Next, we notice that the pairings between them are not random: Adenine only pairs with Thymine, and Cytosine only pairs with Guanine. Thus the strands are complementary: if I know one, I can reconstruct the other. This allows us to concentrate on a single strand. If we now, in a final step, abbreviate each nucleotide of this strand by its first letter, we obtain a linear sequence of A’s, C’s, G’s and T’s: the text of the book of life.
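The complementarity rule is simple enough to write down directly. The following is a minimal Python sketch, assuming a clean strand made only of the letters A, C, G and T (real sequence files also contain symbols such as N for unknown bases, which this ignores). The function names are illustrative, not taken from any library. Because the two strands of the helix run in opposite directions, the partner strand is usually written reversed, which is what `reverse_complement` does.

```python
# Base-pairing rules from the text: A pairs with T, C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base partner of a strand (A<->T, C<->G)."""
    return "".join(PAIRS[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Partner strand read in its own direction.

    The two strands of the helix run in opposite directions, so the
    partner strand is conventionally written as the reversed complement.
    """
    return complement(strand)[::-1]


strand = "ACGTTG"
print(complement(strand))          # TGCAAC
print(reverse_complement(strand))  # CAACGT
# Complementing twice recovers the original strand: knowing one is enough.
assert complement(complement(strand)) == strand
```

The final assertion is the point of the metaphor: one strand carries all the information, so only one needs to be written down.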
For example, the beginning of the gene sequence of FOXP2, which is important for the development of speech in humans and only found in an altered form in apes, looks like this:
CTGATTTTGTGTACGATTGTCCACGGACGCCAAAACAATCACAGAG CTGCTTGATTTGTTTTAATTACCAGCACAAAATGCCATCAGTCTGG GACGTGATCGGGCAGAGGTGTACTCACA...
It is not visible in this very short example, but the genomic text is highly structured. For example, genes have precise start and stop positions. Around the genes, short sequence patterns mark the positions where proteins can bind to the DNA to switch genes on or off, allowing cells to react to different environments. Thus, even though the text is the same at all times in all cells of an organism, its ‘meaning’ is context-specific: a liver cell and a brain cell contain the same genome, but different tissues read out different parts of the text.
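To make "start and stop positions" concrete, here is a toy scan for an open reading frame: an ATG start codon followed, in the same three-letter reading frame, by one of the standard stop codons TAA, TAG or TGA. This is a deliberately simplified sketch, not a gene finder: genes in higher organisms are interrupted by non-coding stretches (introns) and are located with far more elaborate methods. The function name `first_orf` and the `min_codons` parameter are hypothetical, chosen for illustration.

```python
from typing import Optional, Tuple

# The three stop codons of the standard genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_orf(seq: str, min_codons: int = 2) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the first open reading frame in `seq`, or None.

    An ORF here is an ATG start codon followed, in the same reading frame,
    by a stop codon; `end` is exclusive and includes the stop codon.
    `min_codons` counts the codons before the stop, including the ATG.
    """
    seq = seq.upper()
    start = seq.find("ATG")
    while start != -1:
        # Walk codon by codon in the frame set by this ATG.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    return start, pos + 3
                break  # frame too short: give up on this ATG
        start = seq.find("ATG", start + 1)
    return None


toy = "CCATGAAATTTTAGCC"  # made-up sequence, not from FOXP2
span = first_orf(toy)
print(span, toy[span[0]:span[1]])  # (2, 14) ATGAAATTTTAG
```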
The basic process in genetics is the copying of the genetic DNA material, which happens every time cells divide. Mistakes in copying the genomic text in a single cell can have terrible consequences for the whole organism. For example, a copy mistake that activates an oncogene or de-activates a tumor suppressor gene can often lead to cancer.
Cancer is a distorted version of our self, a point powerfully made in Siddhartha Mukherjee’s Pulitzer Prize-winning book The Emperor of All Maladies. It took until the middle of the 20th century to realize that cancer does not invade the body from the outside but is often derived directly from the tissue in which it was first discovered. Both the normal growth of an organism and the abnormal growth of cancer can be traced back to the genome. Observations like these strengthen the book of life metaphor and make DNA the center of interest in molecular biology.
Compared to a normal cell, the genomes of cancer cells often look completely chaotic: some parts are missing, other parts are repeated many times, and the order of the text can be completely lost. Think of a vandalized collection of books in a foreign language that a novice librarian has poorly put back together. Currently, several international projects are under way to catalogue as many genomic changes as possible in as many cancers as possible. The hope is that this catalogue will point us to the drivers of cancer, the genomic aberrations that cause a cell to become a cancer cell. This is a difficult task, because not all aberrations are causative; some are just `passengers’, which arise because cancer cells are in general far less genomically stable than tightly regulated normal cells.
To add to this complexity, there is not always a single driver for each type of cancer; instead, different (combinations of) aberrations can cause very similar cancers. Untangling this complexity requires large sample sizes: cancer genome projects involve sequencing hundreds to thousands of genomes. This makes them a much larger effort than the Human Genome Project, which took more than a decade to complete. The hope is that in this mass of data the drivers of cancer will stand out as very frequently mutated genes, while genes hit only by random passenger mutations will be mutated much less often.
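The core intuition, that drivers should recur across many tumours while passenger mutations scatter randomly, can be sketched as simple counting. This is an illustration under strong assumptions: the gene names and tumour data below are hypothetical, and the threshold is arbitrary. Real driver-detection methods additionally model background mutation rates, gene length and other confounders; this sketch only counts.

```python
from collections import Counter


def mutation_frequencies(samples):
    """Fraction of samples in which each gene carries at least one mutation."""
    counts = Counter(gene for sample in samples for gene in set(sample))
    return {gene: n / len(samples) for gene, n in counts.items()}


def candidate_drivers(samples, threshold=0.2):
    """Genes mutated in at least `threshold` of samples, most frequent first."""
    freqs = mutation_frequencies(samples)
    hits = [(gene, f) for gene, f in freqs.items() if f >= threshold]
    return sorted(hits, key=lambda item: (-item[1], item[0]))


# Hypothetical toy data: each set lists the genes mutated in one tumour.
tumours = [
    {"GENE_A", "GENE_V"},
    {"GENE_A", "GENE_W"},
    {"GENE_A", "GENE_B"},
    {"GENE_B", "GENE_X"},
    {"GENE_A", "GENE_Y"},
]
print(candidate_drivers(tumours, threshold=0.3))
# [('GENE_A', 0.8), ('GENE_B', 0.4)]
```

With only five samples, a gene mutated twice (GENE_B) is hard to tell apart from chance, which is exactly why the projects need hundreds to thousands of genomes.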
Success in these large-scale international projects will depend largely on technological advances and meticulous book-keeping, less on scientific inspiration and eureka moments.
Evolution and the book of life: textual criticism of genomic sequences
Cancer exploits natural mechanisms that have developed during evolution to allow a species to better adapt to its environment. Evolution has several mechanisms for acting on the genome. Individual letters (nucleotides in the DNA) can be mutated and changed. These mutations are called single nucleotide polymorphisms (SNPs; pronounced `snips’). Counting the number of SNPs allows us to infer how related two genomic sequences are: the more SNPs, the further apart they are. Another evolutionary mechanism is recombination, where a region of the DNA is cut out and joined in at a different position. One of the effects of recombination can be a change in the number of copies of a DNA region, which can be lost or amplified. As a rule, humans carry two copies of each region, but each of us can naturally have more or fewer copies in particular regions. This variation is a very mild form of the chaos raging in a cancer cell. Finally, genes can be incorporated from one population into another, so-called gene flow. When not perverted by cancer, all these genomic changes happen naturally and contribute to the variety of body sizes, hair and eye colours and the rest of the phenotypic diversity we see in humans.
The mutations that persist over the course of evolution often confer an evolutionary advantage. In some sense, they are `improvements’ of the text. Even a few mutations can have a tremendous impact on the morphology and behaviour of related species. For example, the genomes of modern humans, chimpanzees and Neanderthals are almost identical. Humans and chimpanzees differ by only 1.2 percent of all base pairs in gene sequences, and Neanderthals are even closer to us.
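Counting SNPs between two sequences amounts to counting the positions at which they differ (the Hamming distance). The sketch below assumes the sequences have already been aligned to equal length, which in practice is a hard problem in its own right; it also ignores insertions, deletions and copy-number changes. The sequences are made up for illustration.

```python
from itertools import combinations


def snp_count(a: str, b: str) -> int:
    """Number of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def distance_table(seqs: dict) -> dict:
    """Pairwise SNP counts for a dict of named, aligned sequences."""
    return {
        (n1, n2): snp_count(seqs[n1], seqs[n2])
        for n1, n2 in combinations(sorted(seqs), 2)
    }


# Hypothetical aligned toy sequences, not real data.
seqs = {
    "seq1": "ACGTACGTAC",
    "seq2": "ACGTACGTTC",
    "seq3": "ACTTACGATC",
}
for pair, n in distance_table(seqs).items():
    print(pair, n)
# ('seq1', 'seq2') 1
# ('seq1', 'seq3') 3
# ('seq2', 'seq3') 2
```

Read as a tree, seq1 and seq2 are the closest relatives, and seq3 branched off earlier.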
DNA changes trace the evolutionary history of species. We don’t know all the details of the `Tree of Life’, but collections of genomic sequences allow researchers to estimate how long ago two species split from a common ancestor. In 2010 an international research consortium led by Svante Pääbo published a draft sequence of the Neanderthal genome. Their goal was to identify genomic features that distinguish modern humans from other hominin forms by comparing the human genome to the genomes of Neanderthals and apes. In humans and chimpanzees the DNA could be sampled from present-day populations. In Neanderthals, however, the DNA had to be retrieved from archeological and paleontological remains, which made it a challenge to prove the authenticity of the ancient DNA sequences.
Still, obtaining a Neanderthal genome is worth the effort, because genomic analyses can reach much further back in history than archeological analyses of excavated bones and artefacts. For example, the earliest known remains of anatomically modern humans are 195,000 years old. From genomic data, however, the split between ancestral human and Neanderthal populations could be dated to 370,000 years ago, extending the horizon by 175,000 years. So far, only around 100 genes, surprisingly few, have been identified that contributed to the evolution of modern humans since the split. Less surprisingly, several of these genes are involved in cognitive function and others in bone structure. Understanding the functions of these genes better may tell us something about what it means to be human, or at least not Neanderthal.
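The simplest reasoning behind such dating is a "molecular clock": if substitutions accumulate at a constant rate on each lineage, the fraction of differing sites grows roughly as twice the rate times the time since the split, because both lineages change independently. The sketch below assumes a strict clock and ignores repeated substitutions at the same site and rate variation, all of which real analyses must model. The input values are illustrative assumptions, not estimates from the Neanderthal study.

```python
def divergence_time(diff_fraction: float, rate_per_site_per_year: float) -> float:
    """Years since two lineages split, under a strict molecular clock.

    Both lineages accumulate substitutions independently after the split,
    so the observed difference is roughly 2 * rate * time.
    """
    if rate_per_site_per_year <= 0:
        raise ValueError("rate must be positive")
    return diff_fraction / (2 * rate_per_site_per_year)


# Illustrative inputs only: 1.2% of sites differ, and an assumed rate of
# 1e-9 substitutions per site per year on each lineage.
years = divergence_time(0.012, 1e-9)
print(f"{years:,.0f} years")  # 6,000,000 years
```

The answer is only as good as the assumed rate: halving the rate doubles the estimated split time, which is one reason such dates remain contested.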
Genomic data also allow us to address questions that are very hard to answer from archeological data alone. Did Neanderthals interbreed with anatomically modern humans? Substantial controversy surrounds this question: morphological features of present-day humans and early anatomically modern human fossils have been interpreted as evidence both for and against genetic exchange between Neanderthals and human ancestors. However, if Neanderthals mated with humans, this must have left traces in both the Neanderthal and the human genomes. Pääbo and his colleagues compared three Neanderthal genomes with five genomes of present-day humans from different parts of the world, including Africa, Asia and Europe. If Neanderthals are more closely related to present-day humans in certain parts of the world than in others, this would suggest that Neanderthals exchanged parts of their genome with the ancestors of these groups. Pääbo and his colleagues found that Neanderthals are equally close to Europeans and East Asians, but significantly closer to non-Africans than to Africans. This can be explained by Neanderthals exchanging genes with the ancestors of non-Africans.
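One standard way to formalize "closer to some present-day humans than to others" is the D-statistic, also called the ABBA-BABA test. It looks at sites where an outgroup (such as chimpanzee) carries the ancestral letter and the archaic genome carries a different, derived one, and asks which of two present-day genomes shares that derived letter more often. The following is a simplified sketch of the idea on made-up sequences, not the study's actual pipeline; real analyses work on genome-wide data and assess significance, for example with a block jackknife. In the setting described above, one would place an African genome in `h1`, a non-African genome in `h2`, the Neanderthal in `h3`, and chimpanzee as `outgroup`.

```python
def d_statistic(h1: str, h2: str, h3: str, outgroup: str) -> float:
    """ABBA-BABA (D) statistic for four aligned sequences.

    ABBA site: h1 matches the outgroup, h2 shares h3's derived letter.
    BABA site: h2 matches the outgroup, h1 shares h3's derived letter.
    D > 0 means h3 shares more derived letters with h2 than with h1.
    """
    abba = baba = 0
    for a, b, c, o in zip(h1, h2, h3, outgroup):
        if c == o or a == b:
            continue  # uninformative site
        if b == c and a == o:
            abba += 1
        elif a == c and b == o:
            baba += 1
    if abba + baba == 0:
        raise ValueError("no informative sites")
    return (abba - baba) / (abba + baba)


# Hypothetical toy sequences: 3 ABBA sites and 2 BABA sites.
h1 = "AAAAGGAAAA"
h2 = "GGGAAAAAAA"
h3 = "GGGGGGAAAA"
outgroup = "AAAAAAAAAA"
print(d_statistic(h1, h2, h3, outgroup))  # 0.2
```

If there had been no gene flow, ABBA and BABA sites would be expected to occur about equally often and D would hover around zero; a consistent excess of ABBA sites is what points to exchange between `h3` and the ancestors of `h2`.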
Statistics versus `the facts’
Svante Pääbo’s study, elegant and diligent as it may be, will not be the last word on human prehistory. The dominance of the genome in phylogenetic studies is not uncontested. DNA evidence can be contradicted by other, more classical sources of data, like the fossil record. On the question of Neanderthal-human interbreeding, the genomic evidence points to a period between 100,000 and 60,000 years ago in the Middle East. However, the archeological evidence for an overlap of the two populations at this time and place is very sparse. Archeologists and paleo-anthropologists favour a scenario in which interbreeding happened in Europe, possibly between 44,000 years ago (when modern humans first entered Europe) and 30,000 years ago (when the last Neanderthals died out).
What becomes visible here is the gap between two approaches to answering the same scientific questions. Geneticists and computational biologists prefer to analyze DNA, because it directly shows the traces of evolution. To them, DNA is an overwhelmingly superior source of information that can be analyzed much more precisely than the blurred phenotypes of ancient specimens. Archeologists, on the other hand, prefer to draw conclusions from the fossil record. While most of them agree that geneticists have made valuable contributions to human prehistory, they feel that genetic analyses rely too heavily on computational methods and mathematical statistics. Given their training, archeologists find genetic arguments hard to follow and far less solid, informative and convincing than `hard’ archeological facts.
In doubting statistics, archeologists are not completely wrong. Reconstructing genomic phylogenies relies on the statistical analysis of genomic data and, like all statistical analyses, crucially depends on mathematical assumptions that can sometimes be disputed. No statistical method reveals `the truth’; statistical methods only make estimates based on the likelihood of observed events and quantify the uncertainty in the data. As a result, statistical estimates from genomic data are seldom so clear-cut and convincing that they automatically beat conclusions drawn from other sources of information, such as the fossil record. Often it is a judgement call for the researcher whether or not to trust the statistical results of a phylogenetic genome analysis.
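What "quantifying the uncertainty" can look like in practice is illustrated by the bootstrap: resample the aligned sites with replacement many times, recompute the estimate each time, and report the spread. The same resampling idea underlies the bootstrap support values attached to branches of phylogenetic trees. This is a minimal sketch assuming aligned, equal-length sequences and independent sites (itself one of those disputable assumptions); the sequences, function names and defaults are illustrative.

```python
import random


def p_distance(a, b):
    """Fraction of aligned sites at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def bootstrap_interval(a, b, reps=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the p-distance, resampling sites."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    rng = random.Random(seed)  # fixed seed for reproducible results
    n = len(a)
    estimates = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]  # sites drawn with replacement
        estimates.append(p_distance([a[i] for i in idx], [b[i] for i in idx]))
    estimates.sort()
    lo = estimates[int(alpha / 2 * reps)]
    hi = estimates[int((1 - alpha / 2) * reps) - 1]
    return lo, hi


a = "ACGT" * 25               # 100 hypothetical sites
b = "TCGT" * 5 + "ACGT" * 20  # differs from a at 5 of them
print(p_distance(a, b))       # 0.05
print(bootstrap_interval(a, b))
```

The point estimate of 5 percent comes with an interval around it; with only 100 sites that interval is wide, and a reader who distrusts the independence assumption has good reason to distrust the interval too.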
The same is, of course, true for archeological `facts’. Facts are theory-bound: they need to be interpreted and put into a bigger context, otherwise they are useless. This requires no less experience and judgement than the analysis of statistical results. Finding one type of evidence more convincing than the other is a matter of education and training, not of the intrinsic scientific value of different types of data.
The story about Neanderthal-human mating is not the only example of this gap in scientific backgrounds and approaches. The science writer Carl Zimmer has collected several others in his essay `The Genome: An Outsider’s view’. In some of them, the fossil evidence clearly contradicts the statistical claims; in others, fossil evidence later validated claims from DNA data that had seemed far-fetched at first. Zimmer compares the clash between research styles to the parable of the blind men who each touched a different part of an elephant and came to blows because they could not agree on what they had felt.
A common theme emerging from these examples is the need for integrated approaches that combine different styles of research. In some areas, like cancer biology, interdisciplinary training and research are already well established. Other areas, like paleo-anthropology, will need to follow if they want to take advantage of the full range of data and insights.
Perhaps the abundance of metaphors in biology will turn out to be a strength when it comes to interdisciplinary research, because metaphors, if used carefully, can bridge the gaps between disciplines.