Inferring tumour evolution 6 – What do we talk about when we talk about a clone?

Series on Tumor Evolution

What do you picture when you hear the word ‘clone’? A white-clad imperial stormtrooper from Star Wars: Attack of the Clones? Or a fluffy sheep called Dolly? Both are good choices. Both are good, solid, well understood clones. But what about cancer? This is where it gets difficult. In most talks (at least the ones I sit in) the word ‘clone’ is used very loosely, as if it were a trivial concept. My goal for today is to show that reality is more complex than the ‘plain vanilla’ version that is often described on some introductory slide.

I found some interesting comments to one of my recent posts trying to explain why ‘real’ evolutionary biologists have traditionally not been that interested in cancer. Erick Matsen wrote:

I was a little put off looking into the area when there was a recent pile of papers working on various ways to define clones (…). I’m not sure if I’m interested in entering such a “hot” field.

And David Posada wrote:

[L]ack of individuals: evolutionary inference is often made on populations of individuals, or on individuals from different species. Until the appearance of single-cell genomics (..) cancer data was on pooled individuals, which makes, in my opinion, evolutionary inference more complex and less powerful.

“Various ways to define clones” … “lack of individuals” … they have both spotted a key problem of cancer evolution studies: most data are from bulk sequencing a single tumor sample which pools all the different cells and clones in there. The ‘populations of individuals’ are the clones, but they are not a priori defined and need to be reconstructed from the mixture (see the initial posts in this series).

Inferring clonal evolution is indeed a hot field. As hot as it gets, actually. The ICGC pan-cancer project has a whole working group dedicated to the inference and characterization of tumor clones in 2500 bulk-sequenced samples. And a DREAM challenge on tumor phylogenies from bulk samples will soon open shop.

So, with all that activity on clonal evolution, do we at least understand well what a clone is?

The simple case: clonal and sub-clonal aberrations

Let’s start the discussion with individual aberrations (mutations, copy-number changes, basically anything you can do to a genome). An aberration is called clonal if it appears in all cells of a tumor. If it appears in fewer cells, it is called subclonal.

Here, clonality is a statement about cellular frequencies: 100% = clonal; <100% = sub-clonal (assuming you have already corrected for the number of normal cells in the sample). A popular way to assess clonality of a mutation is clustering by SNV frequency as we discussed in earlier posts.
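To make the frequency statement concrete, here is a minimal sketch of the correction for normal cells. It rests on assumptions that are not spelled out in this post: the mutated locus is diploid in both tumour and normal cells, the mutation sits on one of the two copies, and the fraction of tumour cells in the sample (called `purity` here) is known. The function names and the tolerance are hypothetical.

```python
def cellular_frequency(vaf, purity):
    """Fraction of tumour cells carrying a mutation.

    Assumes a diploid locus with the mutation on one of two copies, so a
    mutation present in every tumour cell is expected at VAF = purity / 2.
    """
    return 2.0 * vaf / purity


def is_clonal(vaf, purity, tolerance=0.05):
    """Call a mutation clonal if its cellular frequency is (close to) 100%.

    The tolerance is an arbitrary illustrative choice, not a recommended value.
    """
    return cellular_frequency(vaf, purity) >= 1.0 - tolerance


# A sample with 80% tumour cells: VAF 0.4 is clonal, VAF 0.2 is sub-clonal.
print(cellular_frequency(0.4, 0.8), is_clonal(0.4, 0.8))
print(cellular_frequency(0.2, 0.8), is_clonal(0.2, 0.8))
```

Copy-number changes break the diploid assumption, which is one reason real data needs more careful modelling than this.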

Clonality is also a statement about the order of appearance: if an alteration is clonal, we take this as evidence that it was already there in the very beginning of the tumour, whereas sub-clonal aberrations are thought to appear later in tumor development.

What do we talk about when we talk about a clone?

So much for aberrations. Now let’s talk about cells. A popular definition is the following:

[A clone is a] set of cells that share a common genotype owing to descent from a common ancestor.

In some contexts a clone is more restrictively defined as a set of genetically identical cells. (Merlo et al 2006)

Problem 1: clusters of mutations are not yet clones

Figure 1: A cartoon histogram of SNV frequencies showing four clusters. These four clusters are not necessarily four clones. Depending on their frequency and phylogeny there might be fewer clones than clusters.

As a first step, let’s just accept this as a good definition. Then we are left with a practical problem: linking the sets of mutations occurring at different (mostly subclonal) frequencies to sets of cells is not straightforward. I have gone through this inference problem in more detail in a previous post.

Quick recap: to infer clones from clusters you need a phylogeny that tells you how the clusters relate to each other. For example, if you have one cluster at frequency 50% and one at frequency 30%, then only the phylogeny can tell you if there is a cell having both sets of mutations or if they live on separate branches. The frequency of clusters plus a phylogeny can tell you which cell populations with shared genotype (= the clones) exist in a tumor.
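The 50% / 30% example can be worked through in a few lines. This is only a sketch with two clusters, called A and B here for illustration, and with cellular frequencies given as whole percentages of tumour cells. The two functions are hypothetical helpers, one per possible phylogeny.

```python
def clones_if_nested(freq_a, freq_b):
    """Cluster B arose in a cell that already carried cluster A.

    Returns the clone frequencies in percent, or None if this tree is
    impossible (a descendant cannot be more frequent than its ancestor).
    """
    if freq_b > freq_a:
        return None
    return {"A only": freq_a - freq_b, "A and B": freq_b, "neither": 100 - freq_a}


def clones_if_branching(freq_a, freq_b):
    """Clusters A and B arose on separate branches of the tree.

    Returns the clone frequencies in percent, or None if this tree is
    impossible (two disjoint populations cannot exceed 100% together).
    """
    if freq_a + freq_b > 100:
        return None
    return {"A only": freq_a, "B only": freq_b, "neither": 100 - freq_a - freq_b}


print(clones_if_nested(50, 30))     # a cell with both sets of mutations exists
print(clones_if_branching(50, 30))  # no cell carries both sets
```

Both trees are consistent with the same two frequencies, yet they predict different clones. Only when the two frequencies add up to more than 100% can the branching tree be ruled out from frequencies alone.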

In practice, it is really hard to infer unique phylogenies if all you have are allele frequencies (this is what David Posada means when he says ‘complex and less powerful’) and a lot of uncertainty remains (see e.g. the PhyloSub paper for a discussion).

This means: while it is not too difficult to cluster mutations into sets of equal (allele or cellular) frequencies, making the additional step to predict the genotypes and frequencies of cell populations (= clones) is very hard. At least for bulk sequenced data, which makes up almost all the data that is out there.

Problem 2: there are no two cells with identical genome in the tumour

But it is even worse than that. Not only are there practical problems, I think there are conceptual problems too. The definition above talks about ‘cells sharing a genotype’ or ‘genetically identical cells’, but do such cells actually exist? I don’t think so. Instead, I claim that with high likelihood no two cells in a tumour have a completely identical genome.

My thinking goes like this: The mutation rate of the human genome is notoriously hard to estimate, but I found some numbers for healthy tissue that give at least a first orientation: Bionumbers says 10^-11, Lynch 2009 says 10^-9 and Bozic and Nowak 2013 say 10^-9 to 10^-10.

Figure 2: Probability that genome stays identical

These little differences matter! A back-of-the-envelope calculation shows how likely it is that the genome stays identical through one cell division. Assuming that bases mutate independently and all have the same mutation rate, this probability is (1 - mutation rate)^(number of bases). Figure 2 plots this value for different mutation rates for a genome of 3 billion bases. For a mutation rate of 10^-10 the probability of staying unmutated is 74%, for 10^-9 it is 5% and for 10^-8 it is pretty much 0.
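The calculation is easy to reproduce. The sketch below uses the same assumptions as the text (independent bases, one shared mutation rate, 3 billion bases) and reads the rates as per base and per cell division. The function name is made up for illustration.

```python
import math

GENOME_SIZE = 3_000_000_000  # bases, as assumed in the text


def prob_genome_identical(mutation_rate, genome_size=GENOME_SIZE):
    """Probability that no base mutates in one cell division.

    Computes (1 - mutation_rate) ** genome_size; log1p keeps the result
    numerically accurate for very small mutation rates.
    """
    return math.exp(genome_size * math.log1p(-mutation_rate))


for rate in (1e-11, 1e-10, 1e-9, 1e-8):
    print(f"mutation rate {rate:.0e}: P(identical) = {prob_genome_identical(rate):.3g}")
```

Because the rate is tiny, the result is well approximated by exp(-rate × genome size), so the expected number of new mutations per division (rate × genome size) is the only quantity that matters.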

In cancer we will be on the right side of this plot. First of all, we expect mutation rates to be higher than in healthy tissue. In addition, copy-number changes and structural variation also contribute to the mutational load of a cancer genome, but are not covered by the mutation rates cited above.

As soon as the mutation rate is higher than a still healthy 10^-9 you can be pretty certain to see at least one mutation per cell division. This means: every cell in a tumor will have its own genome and there are no sets of cells with identical genomes. The more accurate sequencing technologies become, the more we will see of this diversity.

Defining a clone as a set of genetically identical cells sounds straightforward until you realize that you will end up with as many clones as cells.

Problem 3: All cells in a tumor descend from one renegade cell

This leaves us with the last remaining part of the definition of a clone: The descent from a common ancestor.

Cancers are generally thought to start with one renegade cell. All the heterogeneity and genetic diversity we observe develop out of this cell, which is the common ancestor of all cells in the tumor. So if descent from a common ancestor is the criterion for being a clone, then the whole tumor is a single clone.

This might be the reason that some people speak of subclones instead of clones. The subclone is a part of the tumor clone. But which part? If you define it by genetic identity, you will run into the same problems as discussed in the last section.

What now?

Here are some ideas, which are modifications of the definition we discussed above. Figure 3 shows a small toy example of the history of a tumor cell population. Circles are cells, colors correspond to clones, boxes are mutations.

Figure 3: A tree following the development of a tumor cell population (A – G) from a normal cell. The cell population consists of three clones (red, green, blue), sets of cells with no mutation occurring between them and their most recent common ancestors (cells 1-3).

To define a clone you need to bring different ideas together:

  1. Genetically identical cells. No two cells might be genetically identical in a tumor with a high mutation rate, but not all these changes might matter. If we want to describe the structure of the population we could restrict mutations to a predefined set (e.g. only known driver mutations) and then define a clone as a set of genetically identical cells based on only these markers. This will ensure that a clone is more than a single cell, but opens the door to arbitrariness in how the marker set is selected, something that apologists of genome-wide ‘unbiased’ approaches do not like.
  2. Identical by descent: The reason for genetic identity of a clone should be that all cells are descendants of the same ancestor (the cells numbered 1-3 for the three clones in Figure 3). If a mutation can only arise once during tumor development then all genetically identical cells will live on the same branch of the tree, but if mutations can appear more than once, there could be (though this is maybe not very likely) two clones with the same genome arising on two different branches of the tree.
  3. Maximality: By the definitions in items 1 and 2, {E,F,G} is a clone, but its subset {F,G} would also be a clone. To remove this redundancy we should demand that the set cannot be extended by other cells and still be a clone. Then only {E,F,G} is a clone and {F,G} is not.
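Items 1 and 3 can be sketched as a simple grouping step. Everything in the example is invented for illustration: the mutation names, the assignment of mutations to cells A to G, and the helper name. Only the fact that {E,F,G} forms a clone is taken from the text. Item 2 (identity by descent) is not checked, since that would need the cell lineage tree in addition to the genotypes.

```python
from collections import defaultdict


def clones_by_markers(cell_mutations, markers):
    """Group cells whose genotypes are identical on the chosen marker set.

    Every cell with the same marker genotype lands in the same group, so
    each group is maximal by construction (item 3).
    """
    groups = defaultdict(set)
    for cell, mutations in cell_mutations.items():
        genotype = frozenset(mutations & markers)  # ignore non-marker mutations
        groups[genotype].add(cell)
    return dict(groups)


# Hypothetical genotypes: colour-named markers plus one private mutation per cell.
cells = {
    "A": {"yellow", "red", "p1"},
    "B": {"yellow", "red", "p2"},
    "C": {"yellow", "green", "p3"},
    "D": {"yellow", "green", "p4"},
    "E": {"yellow", "blue", "p5"},
    "F": {"yellow", "blue", "p6"},
    "G": {"yellow", "blue", "p7"},
}
drivers = {"yellow", "red", "green", "blue"}
all_mutations = set().union(*cells.values())

by_drivers = clones_by_markers(cells, drivers)
by_everything = clones_by_markers(cells, all_mutations)
print(len(by_drivers), "clones on the driver set")
print(len(by_everything), "clones on the whole genome")
```

With the restricted marker set the seven cells fall into three clones. With every mutation counted, each cell is its own clone, which is exactly Problem 2.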

This definition applies to cells that are in the current cell population at the time of sampling. I am not sure what to do about historical clones like the yellow one that once lived in the tumor, but not anymore. We see the mutations that defined them in all their descendants, but how do we define them? And do we need to?

I hope I didn’t make things more complicated than they already are. At least for me it was helpful writing all this down.


9 thoughts on “Inferring tumour evolution 6 – What do we talk about when we talk about a clone?”

  1. Very nice post, as usual. Indeed, clonal definition in cancer has been, in my opinion, always subjective, being depending on the specific “genomic markers” used.

    To help in making formal definitions that make use of the tree topology and mutations, I suggest looking at the standard phylogenetic definitions of monophyletic (clades), paraphyletic and polyphyletic groups, and the concept of shared derived or ancestral characters (synapomorphies, plesiomorphies, ...). They might help at least to clarify your idea of “maximality”.

    Check for example:

    Best wishes,



    1. How about if a clone is paraphyletic? Take a look at:

      Here clone 1 (red) is paraphyletic regarding clone 2(green), so:

      1) is clone 1 still a clone?

      2) and would current clone delimiting methods still detect it?

      And now take a look at this other picture:

      Now the situation is also complex, as cell C from clone 1 is more related to clone 2 than to other cells in clone 1.

      Both situations are as normal as the one depicted in Figure 1…right?


      1. Hi David, thanks for these examples.

        I agree, these scenarios are as normal/expected as the one I drew.

        Regarding your two questions about the first example: Yes, clone 1 is still a clone and it would show up in the allele frequency spectrum.

        Assuming a diploid genome and no normal contamination I expect four bumps in the allele frequencies: yellow at 1, blue at 3/7, red at 4/7 (because clone 1 and 2 carry red) and green at 2/7. (In the second figure: yellow at 1, blue at 3/7, red at 4/7, green at 1/7).

        However, getting the phylogeny (and thus the genotypes of the clones) is much harder and without other evidence all methods I know would have a hard time deciding between your figure and a linear cascade yellow – red – blue – green (i.e. simply ordering the subpopulations by frequency).

        In your second example: C and D come from the same mother cell (and thus are much more closely related than C is to A and B), but in terms of (driver) mutations, D is very different from C, and I would argue that the acquired mutations are more important than the cell lineages.



  2. So there you have your definition of a (cancer) clone: a maximal set of cells carrying the same (arbitrary) set of (driver) mutations.

    It seems you do not need anything else. Basically a “cancer clone” seems to be a convenience, rather than a natural concept.



  3. I just read Eldredge and Gould’s paper on punctuated equilibria. What they say on p92 about speciation seems to me quite relevant to the discussion we had here:

    >> Biologists insisted that the biospecies is a “real” unit of nature, a population of interacting individuals, reproductively isolated from all other groups. Yet its reality seemed to hinge upon what Mayr calls its “non-dimensional” aspect: species are distinct at any moment in time, but the boundaries between forms must blur in temporal extension– a continuous lineage cannot be broken into objective segments.<<

    For cancer clones this means: We might be able to differentiate different cell populations in the sample we have (red, green, blue in my example above), but we cannot really say when and where in the history of the tumour they actually differentiated from yellow, because the boundaries are blurred. Does that make sense?


  4. Eldredge, Gould and Mayr are talking about the process of speciation, in particular using the biological species concept, which requires reproductive isolation. Molecular evolution is a different realm, even more so for somatic evolution, in which there is no sexual reproduction. I think the introduction of “punctuated equilibrium” in cancer is basically hand-waving and reflects a lack of formal evolutionary background. We do have models in molecular evolution to explain rate changes, within lineages and among lineages; we do not need to cite a theory for speciation.

    The concept of a (genetic) clone is straightforward. A clone is a set of genetically identical cells. A different issue is the “use” of the word clone in cancer, which often refers to a “maximal set of cells carrying the same (arbitrary) set of (driver) mutations”, as I mentioned above. Moreover, many people in cancer actually refer to “mutation clones”, which are stable in time and that occupy internal nodes in a tree. “Real” clones are always evolving (=changing), given the rate of somatic mutation per cell division, and most of the time will occupy tips in the tree, at more or less distance to the internal (now extinct) “real” clones.

    Lastly, we can differentiate different cell populations in the sample we have (red, green, blue in my example above), and using phylogenetics we can estimate (with more or less error) when and where in the history of the tumor the ancestor of red and blue differentiated from blue.


    1. About the ‘when and where’: what I meant is that in the mix of dividing cells over time it is hard to say “Look, that cell is still clone 1 and that daughter cell is now clone 2”.

      Punctuated equilibria: I agree. From what I understand of the history of evolution, the way ‘punctuated equilibria’ are used in cancer is completely incorrect – at least if you use the definition given by Eldredge and Gould. What Stratton and Garraway (who published papers on punct. equ. in cancer) mean are rather ‘Hopeful Monsters’ and other saltationist concepts. In particular, because the mechanisms they describe are Big Mutations (chromoplexy and chromothripsis) and have nothing to do with allopatric “speciation”. But ‘punctuated equilibrium’ just sounds way too sexy not to use it.

      Can I ask a general question: you think that none of the concepts from species evolution are applicable? To ask bluntly: Why can’t I just substitute ‘species’ for ‘clone’ and get a theory of cancer? (I understand about the apparent lack of sexual reproduction in tumours). Spatial isolation can definitely happen, because many tissues are highly structured (as is for example emphasized in the Big Bang paper). Allopatric ‘speciation’ would then mean: In an isolated, peripheral region of the tumour a small subpopulation of cells accumulates mutations and forms (one or many) new clone(s), which then might even invade the main tumour mass. That would be pretty close to what Eldredge and Gould describe (except that, yes, they talk about sexually reproducing species and highlight stasis between jumps).


      1. I mean that we do not need to think inter-species, but intra-species. Indeed molecular population genetics (i.e, evolution within species) has a lot to say about tumor development. We can and should substitute “allele” for “clone”.

