Inferring tumour evolution 2 – Comparison to classical phylogenetics

Series on Tumor Evolution

Quick recap: Last time we talked about tumor evolution and I presented a toy example to introduce key concepts. I also introduced the intra-tumor phylogeny problem: given a sample of the genomes of clones in a tumour, reconstruct its ‘life history’. This problem consists of two sub-problems: (1) identifying the clones, and (2) inferring the evolutionary relationships between them.

This problem falls into the general area of reconstructing phylogenetic trees — so how does inferring clonal trees compare to classical phylogenetic methods?

Classical phylogenetic trees

Joseph Felsenstein’s book Inferring Phylogenies is a classic in the field, and the methods he describes have all been used in cancer genomics (e.g. here and here, in one of my own papers). Navin and Hicks also show, conceptually, in a 2010 review how different evolutionary scenarios can lead to different phylogenetic trees, so these methods could be very useful for distinguishing between competing theories of tumour evolution. Here is a link to their paper’s main figure with modes of evolution (a-e) and reconstructed phylogenies (f-j).


Clonal evolution trees

However, there are several reasons why I believe classical phylogenetic trees are not the best representation of clonal evolution in tumours. To start the discussion, the following figure compares classical phylogenetic trees to clonal evolution trees in the example we discussed in the last post.

Classical phylogenetic trees compared to clonal evolution trees. Left: the poly-clonal tumour from the last post. Middle: cells sampled from the tumour arranged as leaves in a phylogenetic tree. The bold letters are the genotypes of the cells (according to the example in the last post). The grey letters in the tree are inferred ancestral genomes. Right: A clonal tree representation of the same tumour where nodes are clones (not cells) and inner nodes can be populated.

Now, if you compare the panels of this figure, you will find:

  1. Classical phylogenetic trees do not infer clones; instead, they place the taxa (in our case observable features of the tumour, like single-cell genomes or methylation patterns) as leaf nodes in the tree. In classical phylogenetic analysis you don’t need to cluster your observations, because you have the mouse genome and compare it to the human genome, instead of multiple individual genomes of mice and men.
  2. Inner nodes of classical phylogenetic trees are unobserved. The ancestral genomes at these inner nodes can be inferred in a second step. However, in a tumour ancestors and descendants can co-exist so the tree representation should allow inner nodes to be populated (like the right panel of the figure).
  3. Classical phylogenetic trees encode distances between taxa, and because distances are symmetric the trees are undirected. However, the accumulation of aberrations in cancer genomes gives clonal trees a directionality: child nodes carry their parent’s aberrations plus additional ones. Clonal evolution is more about asymmetric subset relations than symmetric similarities.
  4. The problems classical methods face in a cancer setting become even more evident when thinking about data from deep-sequencing a mixed population of clones. Without deconvolving this mixture, the taxa are not even defined and there is nothing you can put into the tree.
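Point 3 can be made concrete with a small sketch. It is not the method of any particular paper: it assumes that mutations are never lost (an infinite-sites style assumption) and that each clone’s parent is the clone whose mutation set is the largest proper subset of its own. The clone names and mutation labels (`normal`, `A`, `B`, `C`, `m1` and so on) are hypothetical. The symmetric Hamming distance between mutation sets cannot tell parent from child, whereas the subset relation can:

```python
from typing import Dict, FrozenSet, Optional


def hamming(a: FrozenSet[str], b: FrozenSet[str]) -> int:
    """Symmetric distance: number of mutations present in exactly one of the sets."""
    return len(a ^ b)


def clonal_tree(clones: Dict[str, FrozenSet[str]]) -> Dict[str, Optional[str]]:
    """Assign each clone a parent: the clone whose mutation set is the largest
    proper subset of its own. Assumes mutations are never lost, so every
    ancestor's mutations are inherited by all its descendants.
    None marks the root."""
    parent: Dict[str, Optional[str]] = {}
    for name, muts in clones.items():
        # '<' on frozensets is the proper-subset test.
        candidates = [(len(other), other_name)
                      for other_name, other in clones.items()
                      if other < muts]
        parent[name] = max(candidates)[1] if candidates else None
    return parent


# Hypothetical clones; "normal" carries no somatic mutations.
clones = {
    "normal": frozenset(),
    "A": frozenset({"m1"}),
    "B": frozenset({"m1", "m2"}),
    "C": frozenset({"m1", "m3"}),
}

print(hamming(clones["A"], clones["B"]), hamming(clones["B"], clones["A"]))  # 1 1
print(clonal_tree(clones))  # {'normal': None, 'A': 'normal', 'B': 'A', 'C': 'A'}
```

Clone A ends up as the parent of B and C while still being an observed population, i.e. a populated inner node as in point 2. With noisy or incomplete genotypes the strict subset test breaks down, so real data would need a noise-tolerant version of it.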

The need for new methods

There are some caveats to what I just said:

  1. You can cluster the leaves into clones by cutting the tree at different levels, as in hierarchical clustering.
  2. While inner nodes are indeed never observed, edge lengths can be very small (or even zero) and thus effectively place leaves in the middle of the tree. For the plot above I (realistically!) assumed there was noise in the measurements of each cell. For perfect data, in contrast, the phylogenetic tree would have many edges of length 0: the cells from each clone would have no distance between them, and clone A would come to lie on top of an inner node, resulting in a phylogenetic tree quite similar to the clonal tree I drew.
  3. Not all phylogenetic methods are distance-based; others, like maximum parsimony or maximum likelihood, might be more effective for cancer studies. Using an outgroup can also help to establish directionality in any tree. But models of DNA evolution are generally time-reversible, whereas in cancer we can assume that somatic mutations don’t disappear again, so I think there is still a difference here, no matter how you look at it. Working in directed models also has computational advantages; for example, it allows us to avoid time-consuming marginalization steps for the inner nodes.
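Caveats 1 and 2 can be illustrated together. Cutting a single-linkage hierarchical clustering at height `h` yields the same groups as the connected components of the graph that links every pair of cells at distance at most `h`, so a few lines of union-find suffice. The binary single-cell genotypes below (one 0/1 call per mutation) and the function name `cut_single_linkage` are hypothetical; single linkage and Hamming distance are just one possible choice, and `h` is a free parameter the analyst has to pick:

```python
from itertools import combinations
from typing import Dict, List, Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of mutations on which two cells disagree."""
    return sum(x != y for x, y in zip(a, b))


def cut_single_linkage(cells: List[Sequence[int]], h: int) -> List[int]:
    """Cluster label per cell after cutting a single-linkage tree at height h,
    computed as connected components of the 'distance <= h' graph."""
    parent = list(range(len(cells)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(cells)), 2):
        if hamming(cells[i], cells[j]) <= h:
            parent[find(i)] = find(j)  # merge the two components

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    labels: Dict[int, int] = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(cells))]


# Hypothetical genotypes over seven mutations.
cells = [
    (1, 0, 0, 0, 0, 0, 0), (1, 0, 0, 0, 0, 0, 0),  # clone A
    (1, 1, 1, 1, 0, 0, 0), (1, 1, 1, 1, 0, 0, 0),  # clone B
    (1, 0, 0, 0, 1, 1, 1), (1, 0, 0, 0, 1, 1, 0),  # clone C; last cell has a dropout
]

print(cut_single_linkage(cells, h=0))  # [0, 0, 1, 1, 2, 3]
print(cut_single_linkage(cells, h=1))  # [0, 0, 1, 1, 2, 2]
```

With noise-free cells, `h = 0` would already separate the clones, because cells of one clone sit at distance 0 from each other (the zero-length edges of caveat 2). Here the dropout splits clone C at `h = 0`, `h = 1` recovers the three clones, and at `h = 2` the noisy cell already bridges clones A and C. In practice one could use SciPy’s `scipy.cluster.hierarchy.linkage(..., method="single")` followed by `fcluster(..., criterion="distance")` instead; note that SciPy’s `hamming` metric is a proportion, so the threshold would need rescaling.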

So, in summary, even if the classical approaches are not a perfect fit for tumor evolution, they might come close enough in some cases. How well they perform against methods built directly on principles of tumor evolution is a topic of ongoing research (Edith and Ke will soon have something to say about that).

Importantly, problem (4) stands tall and strong: cancer studies are mostly done on mixed populations of cells, which need to be deconvolved before any evolutionary analysis.
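Why the mixture matters can be sketched under strong simplifying assumptions, which are mine and not part of the post: all mutations are heterozygous in diploid regions (no copy-number changes), sequencing noise is ignored, tumour purity is known, and the clonal tree is already given. Each mutation’s expected variant allele frequency (VAF) is then half the fraction of sample cells that carry it. Going the other way, the cancer cell fraction (CCF) of a clone’s founding mutation is 2·VAF/purity, and a clone’s own fraction is its CCF minus the CCFs of its children. The tree, fractions, purity and function names below are hypothetical:

```python
from typing import Dict, List, Optional

Tree = Dict[str, Optional[str]]  # child clone -> parent clone (None for the root)


def children(tree: Tree) -> Dict[str, List[str]]:
    kids: Dict[str, List[str]] = {c: [] for c in tree}
    for child, par in tree.items():
        if par is not None:
            kids[par].append(child)
    return kids


def expected_vafs(tree: Tree, fractions: Dict[str, float], purity: float) -> Dict[str, float]:
    """Forward model: expected VAF of the mutation that founded each clone."""
    kids = children(tree)

    def ccf(clone: str) -> float:
        # Tumour cells carrying this clone's founding mutation:
        # the clone itself plus all of its descendants.
        return fractions[clone] + sum(ccf(k) for k in kids[clone])

    # Heterozygous diploid assumption: half the reads from a mutated cell show the variant.
    return {c: purity * ccf(c) / 2 for c in tree}


def infer_fractions(tree: Tree, vafs: Dict[str, float], purity: float) -> Dict[str, float]:
    """Inverse under the same assumptions, given a known tree."""
    kids = children(tree)
    ccf = {c: 2 * v / purity for c, v in vafs.items()}
    return {c: ccf[c] - sum(ccf[k] for k in kids[c]) for c in tree}


tree: Tree = {"A": None, "B": "A", "C": "A"}  # hypothetical tree
fractions = {"A": 0.2, "B": 0.5, "C": 0.3}    # hypothetical clone fractions, sum to 1
purity = 0.8                                  # hypothetical tumour purity

vafs = expected_vafs(tree, fractions, purity)
print({c: round(v, 3) for c, v in vafs.items()})  # {'A': 0.4, 'B': 0.2, 'C': 0.12}
print({c: round(f, 3) for c, f in infer_fractions(tree, vafs, purity).items()})
# {'A': 0.2, 'B': 0.5, 'C': 0.3}
```

The bulk VAFs alone do not say how many clones there are or how they are related; the tree had to be supplied. A negative inferred fraction would signal that the assumed tree is inconsistent with the data, because children’s CCFs cannot sum to more than their parent’s. Inferring the clustering and the tree jointly is the hard part that real deconvolution methods have to solve.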

The next post will be about recent approaches to infer tumor evolution from data.


Thanks to Edith Ross, Ke Yuan, Thomas Sakorparnig and Moritz Gerstung for feedback on drafts of this post.


6 thoughts on “Inferring tumour evolution 2 – Comparison to classical phylogenetics”

  1. Hi Florian, first of all, let me say that this is a very nice series of posts.

    I finally found some time to make some comments (for now just on this particular post).

    To start, I would like to restate that in my opinion points 1-3 are not conceptually problematic for classical phylogenetic analysis, as you mention in your caveats.

    1. Indeed, classical phylogenetic analysis is about clustering observations (=clades/branches/splits/nodes) at different hierarchical levels. And we often have multiple genes/genomes from the same species! This happens all the time when we work with closely related species (there are many examples; see, for example, our recent SysBio paper).

    2. Note that you are always talking about allele/haplotype/clone trees, which are not the same as gene (copy) trees. In cancer, gene (copy) trees correspond to cell trees; in that case ancestors and descendants never co-exist, and true trees are always binary. When we refer to the history of the alleles/haplotypes/clones, then, as you mention, classical phylogenetic analysis can just use zero branch lengths to represent contemporary internal nodes, but we can also use phylogenetic networks.

    3. Classical phylogenetic analysis provides directionality once the tree is rooted (this is what rooting is about). In fact, your classical tree in the middle panel is rooted, so the different events are already ordered! Models of evolution are usually time-reversible for computational reasons, one of them being that the root location can be ignored when calculating tree scores. In cancer it should be trivial to root the trees using the matched healthy samples. In addition, we do have asymmetric models of nucleotide substitution, and in the parsimony framework this assumption makes things even easier (Camin-Sokal parsimony).

    Now, it is true that in classical phylogenetic analysis we need to specify some taxa (alleles/clones) a priori, but this will not be a problem for single-cell data, and for pooled data we can always try to do variant calling and haplotyping. We also have species delimitation methods, which are quite intensive but could in theory also be used in this case, maybe with some modifications.

    Clearly, the type of data (pooled DNA vs. single cell; genotypes vs. phased haplotypes) will make a difference. For most classical phylogenetic analyses we need phased haplotypes, but because of the lack of recombination there are simple shortcuts to this (i.e., paternal and maternal chromosomes have the same history). But in “modern phylogenetic analysis” we also have specific methods for SNPs, in this case for inferring species/population trees (see, for example, …).

    In my opinion, the biggest challenge for classical phylogenetic analysis is that in cancer we in principle work with just one locus (complete linkage), and the sampling variance of the evolutionary process will be difficult to deal with.

    In summary, I do not see any clear advantages of “clonal evolution trees” over “classical phylogenetic trees”… but I am happy to keep discussing this 😉


  2. David Posada is correct.
    I favor classical phylogenetics, and extending classical methods to account for uncertainty, data types, and evolutionary models in cancer data sets.
    Classical phylogenies worked well for me in …
    There’s always uncertainty (in phasing, etc.), but I’d rather rely on classical phylogenetic theory and methods that are well developed and tested. In other words, I would favor building on existing phylogenetic theory and methods (and learning them) over coming up with ad-hoc approaches.
