“Cancer evolves dynamically as clonal expansions supersede one another driven by shifting selective pressures, mutational processes, and disrupted cancer genes. These processes mark the genome, such that a cancer’s life history is encrypted in the somatic mutations present,”
write Nik-Zainal et al. in the abstract of their 2012 Cell paper ‘The life history of 21 breast cancers’. The key figure of their paper shows a phylogenetic tree of tumour development in a patient. The paper contains lots of computational work on analyzing and interpreting mutations based on deep-sequencing data, but (and this was a big surprise) the very last step of putting together the tree was done manually. Half the paper describes the reasoning that Peter Campbell and his group used to condense all the evidence they had gathered from genomic data into the tree, but there is no algorithm.
Obviously I got terribly excited when I saw this and started muttering to myself ‘I can automate this! I can put together an algorithm to build trees! That’s what I’m good at!’ Little did I know how difficult the problem was, how short the reads and how sparse the mutations. But my enthusiasm, if not my ignorance, must have been shared by computational biologists working in cancer genomics all over the world, because the last one to two years have seen quite a variety of approaches to assessing tumour heterogeneity and reconstructing phylogenies.
Describing cancer as an evolutionary process is not a new idea; it goes back at least to Nowell 1976. Mutations are thought to increase the fitness of cancer cells and make the population grow faster, out-competing normal cells and other, less fit cancer cell populations. Cancer evolution had already been widely reviewed before the technological advances in sequencing over the last few years brought renewed vigor to the field.
Cancer evolution is currently a very active field of research and will be for a while. This is why I want to use a series of posts to discuss some of the basic underlying concepts and ideas.
A toy example of cancer evolution
Let’s start with the toy example in Figure 1 above, which conceptually summarizes the evolution of a tumour from the first tumour-initiating mutations to the heterogeneous tissue at the time of sampling (it is similar to Figure 7 in Nik-Zainal et al.).
The tumour sample on the right contains some normal cells (grey circles) and three different cancer clones (i.e. cancer cells that share a genome). Each clone is characterized by sets of mutations the cell population acquired over time (indicated by the letters A, B, C and D). Early mutations (A) are shared by all clones, while later mutations (C, D) differentiate younger clones. (I am using italic letters for mutations and bold italics for clones.)
The evolutionary process leading to this heterogeneity is summarized in the left plot, where colors correspond to clones. The shapes show the expansion of clones over time. At the time of sampling three clones are still in the tumour (A, ABC, ABD), while a fourth one (AB) has been replaced by its descendants.
This cartoon is obviously very simplified! For example, most tumours will have many more very small clones. But even this simple toy example is already interesting enough to make some important observations.
A tree of clonal evolution
The cellular composition of the tumour and the evolutionary relationships between the different components can be summarized in a tree, like the one you see in Figure 2 on the right. The numbers in the nodes correspond to the percentage of cells in the sample that belong to this particular clone. The grey top node indicates the 20% of cells in the sample that are normal. The oldest clone is the green clone A, which is represented by 15% of cells in the sample. In the A population B mutations appeared and formed the AB clone, which is not present in the sample (0%) because its descendants ABC (25%) and ABD (40%) replaced it.
The way I have drawn the tree is also very simplistic and, for example, does not scale the edges according to mutation rates or evolutionary time passed. But as we will see (in later posts) even inferring such a simplified representation from data can be tricky!
This little example is already quite instructive. For example, we can see the difference between the frequencies of clones (A 15%, ABC 25%, ABD 40%) and the frequencies of the mutations characterizing them (A 80%, B 65%, C 25%, D 40%). The frequency of a mutation is the sum of the frequencies of all clones carrying it; for B, that is AB 0% + ABC 25% + ABD 40% = 65%. We can also see that ancestors and descendants can co-exist, which means that (some of) the inner nodes in the tree are populated. Finally, cancer trees have a well-defined root node (the normal cells without mutations) and generally have a clear directionality, because mutations accumulate over time, with child nodes keeping the parent mutations and adding more to them.
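To make the bookkeeping concrete, here is a minimal Python sketch that recomputes the mutation frequencies of the toy example from the clone frequencies. The dictionary layout and the function name `mutation_frequencies` are my own choices for illustration, and each letter stands for a whole set of mutations, as in the figures.

```python
# Clone frequencies from the toy example (percent of cells in the sample).
# Each clone is named by the mutation sets it carries; "" is the normal cells.
clone_freq = {"": 20, "A": 15, "AB": 0, "ABC": 25, "ABD": 40}


def mutation_frequencies(clone_freq):
    """Sum the frequencies of all clones that carry each mutation."""
    freqs = {}
    for clone, pct in clone_freq.items():
        for mutation in clone:  # iterate over the letters in the clone name
            freqs[mutation] = freqs.get(mutation, 0) + pct
    return freqs


mut_freq = mutation_frequencies(clone_freq)
print(mut_freq)  # {'A': 80, 'B': 65, 'C': 25, 'D': 40}
```

Going in this direction is easy. The hard part, as later posts will show, is going the other way: from observed mutation frequencies back to clones.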
Infinite sites assumption. As you can see, a key assumption underlying the analysis of mutation data is that each mutation appears only once and, furthermore, that once it appears it does not revert to its original state. This is called the infinite sites assumption, and from it follows that tumour evolution is a process of accumulating mutations: earlier clones have fewer mutations, and later clones have the early mutations plus some more.
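One consequence of this assumption can be checked mechanically. If every mutation arises once and is never lost, then for any two mutations the groups of clones carrying them are either nested or disjoint. The sketch below tests that condition on clones given as sets of mutations; the function name and the input format are assumptions for illustration, not taken from any published method.

```python
from itertools import combinations


def consistent_with_infinite_sites(clones):
    """Check the nested-or-disjoint condition on a list of clones.

    clones: list of sets, each holding the mutations one clone carries.
    """
    # For each mutation, collect the indices of the clones that carry it.
    carriers = {}
    for index, clone in enumerate(clones):
        for mutation in clone:
            carriers.setdefault(mutation, set()).add(index)

    # Any two carrier groups must be disjoint or one must contain the other.
    for a, b in combinations(carriers.values(), 2):
        if a & b and not (a <= b or b <= a):
            return False
    return True


# The toy example: normal cells, A, AB, ABC, ABD.
toy = [set(), {"A"}, {"A", "B"}, {"A", "B", "C"}, {"A", "B", "D"}]
ok_toy = consistent_with_infinite_sites(toy)

# A conflicting case: A alone, B alone, and A and B together would
# require one of the two mutations to arise twice or to be lost again.
conflict = [{"A"}, {"B"}, {"A", "B"}]
ok_conflict = consistent_with_infinite_sites(conflict)

print(ok_toy, ok_conflict)  # True False
```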
The intra-tumour phylogeny problem
Using genomics technologies we can sample (features of) the genomes of the heterogeneous cell mixture that is a tumour. These features can include single nucleotide variants, copy-number aberrations and CpG methylation.
The intra-tumour phylogeny problem is to infer a phylogenetic tree like the one in Figure 2 from this genomic sample. The problem comprises two sub-problems:
- Identify clones (= the nodes in the tree). If your data come from deep-sequencing a mixed population of clones, you will have to deconvolve this mixture to identify the clonal genomes. And if you have genomes of single cells from the tumour, you need to cluster them into clones.
- Relate the clones to each other (= the edges of the tree). Once you have the nodes, you need to connect them in a graph, in which they can be inner nodes or leaf nodes.
These tasks can be solved sequentially or jointly, and in the following posts I will discuss different methods to solve the intra-tumour phylogeny problem.
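As a rough illustration of the second sub-problem in its easiest form, the sketch below connects clones whose mutation sets are already known, under the infinite sites assumption: the parent of a clone is the largest other clone whose mutations are a proper subset of its own. The names `clones` and `build_clone_tree` are hypothetical, and this is not one of the published methods. Note that it is handed the extinct AB clone as input, which real data would not give you directly.

```python
# Clones of the toy example, each with the mutation sets it carries.
clones = {
    "normal": set(),
    "A": {"A"},
    "AB": {"A", "B"},
    "ABC": {"A", "B", "C"},
    "ABD": {"A", "B", "D"},
}


def build_clone_tree(clones):
    """Return a {child: parent} mapping; the root has no entry."""
    parent = {}
    for name, mutations in clones.items():
        # Ancestors carry a proper subset of this clone's mutations.
        ancestors = [
            other for other, other_mutations in clones.items()
            if other_mutations < mutations
        ]
        if ancestors:
            # The direct parent is the ancestor with the most mutations.
            parent[name] = max(ancestors, key=lambda n: len(clones[n]))
    return parent


tree = build_clone_tree(clones)
for child, par in tree.items():
    print(f"{par} -> {child}")
```

On the toy example this recovers the edges of Figure 2: normal to A, A to AB, and AB to both ABC and ABD.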
Thanks to Thomas Sakoparnig and Moritz Gerstung for comments on drafts of this post.