Quick recap: Last time we talked about tumor evolution and I presented a toy example to introduce key concepts. I also introduced the intra-tumor phylogeny problem: Given a sample of the genomes of clones in a tumour, reconstruct its ‘life history’. This problem consists of two sub-problems: (1) identification of clones, and (2) inferring evolutionary relationships between clones.
This problem falls into the general area of reconstructing phylogenetic trees — so how does inferring clonal trees compare to classical phylogenetic methods?
Classical phylogenetic trees
Joseph Felsenstein’s book Inferring Phylogenies is a classic in the field, and the methods he describes have all been used in cancer genomics (e.g. here and here in one of my own papers). Navin and Hicks show (conceptually) in a review from 2010 how different evolutionary scenarios can lead to different phylogenetic trees, so these methods could be very useful in differentiating between different theories of tumour evolution. Here is a link to their paper’s main figure with modes of evolution (a-e) and reconstructed phylogenies (f-j).
Clonal evolution trees
However, there are several reasons I believe that classical phylogenetic trees are not the best representation of clonal evolution in tumours. To start the discussion, the following figure compares classical phylogenetic trees to clonal evolution trees in the example we discussed in the last post.
Now, if you compare the panels of this figure, you will find:
- Classical phylogenetic trees do not infer clones, but place the taxa (in our case observable features of the tumour, like single-cell genomes or methylation patterns) as leaf nodes in the tree. In classical phylogenetic analysis you don’t need to cluster your observations, because you have the Mouse genome and compare it to the Human genome, instead of multiple individual genomes of mice and men.
- Inner nodes of classical phylogenetic trees are unobserved. The ancestral genomes at these inner nodes can be inferred in a second step. However, in a tumour, ancestors and descendants can co-exist, so the tree representation should allow inner nodes to be populated (like the right panel of the figure).
- Classical phylogenetic trees encode distances between taxa, and because distances are symmetric the trees are undirected. However, the accumulation of aberrations in cancer genomes gives clonal trees a directionality: The child nodes carry the parent aberrations plus some more. Clonal evolution is more about asymmetric subset relations than symmetric similarities.
- The problems classical methods face in a cancer setting become even more evident when thinking about data from deep-sequencing a mixed population of clones. Without deconvolving this mixture, the taxa are not even defined and there is nothing you can put into the tree.
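To make the contrast between symmetric distances and asymmetric subset relations concrete, here is a minimal sketch. The clone names and aberration labels (m1 to m4) are made up for illustration and are not the example from the figure; the parent rule (largest proper subset) is just one simple assumption that works when aberrations only accumulate and the data are noise-free.

```python
# Hypothetical clones, each described by the set of aberrations it carries.
clones = {
    "A": {"m1"},
    "B": {"m1", "m2"},
    "C": {"m1", "m3"},
    "D": {"m1", "m2", "m4"},
}


def hamming(x, y):
    """Symmetric distance: number of aberrations found in exactly one of the two clones."""
    return len(x ^ y)


def infer_parents(clones):
    """Assign each clone the largest clone whose aberrations are a proper subset of its own."""
    parents = {}
    for name, muts in clones.items():
        # `m < muts` is Python's proper-subset test on sets.
        ancestors = [(len(m), other) for other, m in clones.items() if m < muts]
        parents[name] = max(ancestors)[1] if ancestors else None
    return parents


parents = infer_parents(clones)

# The distance cannot tell siblings (B, C) from an ancestor-descendant pair (A, D):
# both pairs are two aberrations apart.
d_siblings = hamming(clones["B"], clones["C"])
d_lineage = hamming(clones["A"], clones["D"])
```

In this toy case the two distances are equal, although B and C are siblings while A is an ancestor of D. The subset relation recovers the direction (A is the root, B and C descend from A, D descends from B), which is the information a clonal tree is meant to carry.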
The need for new methods
There are some caveats to what I just said:
- You can cluster the leaves into clones by cutting the tree at different levels like in hierarchical clustering;
- While inner nodes are indeed never observed, edge lengths can be very small (or even zero) and thus effectively place leaves in the middle of the tree. For the plot above I (realistically!) assumed there was noise in the measurements of each cell. For perfect data, in contrast, the phylogenetic tree would have many edges of length 0, which means: the cells from each clone would have no distance between them and clone A would come to lie on top of an inner node — which would result in a phylogenetic tree pretty similar to the clonal tree I drew.
- Not all phylogenetic methods are distance-based and others like maximum parsimony or maximum likelihood might be more effective for cancer studies. And using an outgroup can help to establish directionality in any tree. But models of DNA evolution are generally time-reversible, whereas in cancer we can assume that somatic mutations don’t go away again, so I think there is still a difference here, no matter how you look at it. Working in directed models also has computational advantages and, for example, allows us to avoid time-consuming marginalization steps for the inner nodes.
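The first two caveats can be illustrated with a small sketch. It assumes binary single-cell genotypes over eight made-up loci and uses a plain single-linkage cut (cells are grouped if they are connected by steps of at most a given Hamming distance), which is one simple way to cut a tree at a level, not a method taken from any of the papers mentioned here.

```python
def hamming(a, b):
    """Number of loci at which two binary genotypes differ."""
    return sum(x != y for x, y in zip(a, b))


def cut_single_linkage(cells, threshold):
    """Group cells linked by chains of pairwise distances <= threshold.

    This gives the same groups as cutting a single-linkage dendrogram
    at height `threshold`.
    """
    parent = list(range(len(cells)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(cells)):
        for j in range(i + 1, len(cells)):
            if hamming(cells[i], cells[j]) <= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(cells)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical genotypes: clone A is the ancestor, B and C each add three aberrations.
A = (1, 1, 0, 0, 0, 0, 0, 0)
B = (1, 1, 1, 1, 1, 0, 0, 0)
C = (1, 1, 0, 0, 0, 1, 1, 1)

# Perfect data: two identical cells per clone.
perfect = [A, A, B, B, C, C]

# Noisy data: the second cell of each clone has one flipped locus.
noisy = [
    A, (0, 1, 0, 0, 0, 0, 0, 0),
    B, (1, 1, 1, 1, 1, 0, 0, 1),
    C, (1, 0, 0, 0, 0, 1, 1, 1),
]

clones_perfect = cut_single_linkage(perfect, threshold=0)
clones_noisy_strict = cut_single_linkage(noisy, threshold=0)
clones_noisy_relaxed = cut_single_linkage(noisy, threshold=1)
```

With perfect data, a cut at distance 0 already collapses the cells into three clones, which corresponds to the zero-length edges described above. With noise, the same cut leaves every cell on its own, and the clones only reappear once the cut is relaxed to tolerate one error. Choosing that level is exactly the clustering step that classical trees leave to the user.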
So, in summary, even if the classical approaches are not a perfect fit for tumor evolution, they might come close enough in some cases. How well they do against methods directly built on principles of tumor evolution is the topic of ongoing research (Edith and Ke will soon have something to say about that).
Importantly, the fourth point in the list above stands tall and strong: cancer studies are mostly done on a mixed population of cells, which needs to be deconvolved prior to evolutionary analysis.
The next post will be about recent approaches to infer tumor evolution from data.
Thanks to Edith Ross, Ke Yuan, Thomas Sakorparnig and Moritz Gerstung for feedback on drafts of this post.