Duty Calls, Science

The wrong type of consensus — Response to Maley et al, Nature Reviews Cancer 2017

In a recent paper in Nature Reviews Cancer, Maley et al set out to define a consensus framework for classifying neoplasms. The paper’s premise is that such a theoretical framework is a necessary first step for developing new quantitative approaches. I disagree. I argue that the paper highlights the limited practical relevance of a purely intellectual exercise. Solid classification frameworks of clinical relevance need more detail and need to be grounded in applicability to real data in clinical practice.


For those of you in a hurry, let me sum up my claims:

  • This is a very good review of the field. Its particular strength is combining cancer evolution with the tissue microenvironment. You should definitely read it.
  • However, the review poses as something it is not: a classification scheme of clinical relevance.
  • The proposed classification scheme fails because (a) there is no practical way to classify patients with it, and (b) evidence of clinical impact is circumstantial and anecdotal.
  • The authors recognise all these problems, but dismiss them as areas of future research, rather than testing prototypes of their scheme on real data.
  • Methodological and measurement innovations happen as we speak – no one needed this framework to kick-start innovation.
  • Consensus on specific approaches will be much harder, much more interesting and much more useful than consensus on lofty ideas.

What do Maley et al propose?

Everyone knows that tumours have two compartments: the cell-autonomous one (the cancer cells) changes by clonal evolution, and the non-cell-autonomous one (the microenvironment) consists of a variety of genetically normal cells, including immune cells. Obviously, a comprehensive description of cancer needs to take both compartments into account.

Based on this distinction, Maley et al propose a set of four binary questions: two about the cancer cells called the “Evo-index” and two about the microenvironment called the “Eco-index”. The answers to these four questions result in 2^4=16 classes of cancers.

The four questions are:

  1. [Evo1: D] Is tumour diversity high or low?
  2. [Evo2: Δ] Is diversity changing slowly or rapidly over time?
  3. [Eco1: H] Are hazard levels high or low?
  4. [Eco2: R] Are resource levels high or low?
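To see what the 2^4=16 count amounts to, here is a minimal sketch that enumerates every combination of answers. The axis keys and the level labels are my own shorthand for the four questions above, not an official encoding from the paper.

```python
from itertools import product

# Shorthand for the four binary questions; "Delta" stands for the Δ axis.
# The level labels are assumptions taken from the wording of the questions.
AXES = {
    "D": ("low", "high"),        # Evo1: tumour diversity
    "Delta": ("slow", "rapid"),  # Evo2: change in diversity over time
    "H": ("low", "high"),        # Eco1: hazard levels
    "R": ("low", "high"),        # Eco2: resource levels
}

# One dict per class: every combination of one level per axis.
classes = [dict(zip(AXES, levels)) for levels in product(*AXES.values())]

print(len(classes))  # 16
for c in classes[:2]:
    print(c)
```

Enumerating the classes is the easy part. The code says nothing about how a real tumour would be assigned a level on any axis.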

So far, so good. At this highly conceptual level there has never been any dispute that I know of. That tumour diversity and interactions with the microenvironment are important is just oncological common sense in this day and age.

Still, it is good to see it spelt out that explicitly. Often the focus of papers is on either genomic evolution or the microenvironment, and I think it is a strength of Maley’s paper that they discuss both of them together.

I agree with the authors on many of their general points (after all, my group has been jointly analysing cancer genomes and their microenvironment for some years now), but, as I will describe below, I believe the way they market these ideas as a consensus framework on how to classify tumours is not very useful.

The paradox of choice

Consumer theories state that an abundance of choice leads to frustration, not happiness. And I am certainly frustrated when I look at Maley’s Table 1, which lists an abundance of choices for how to answer the four questions.

There are two main issues: First of all, Maley et al give me no guidance on which of these measures I should choose or even how they compare to each other. And second, even if by an act of divine inspiration I had found the right measure –say phylogenetic trees to measure genetic diversity– they give me no guidance on how to use the trees to answer the question. Which features of a phylogenetic tree will tell me whether D is high or low? Pray tell!

Just to make things even more complicated, the same measure applied to different types of data might yield contradictory results. For example, compare the measures of genetic heterogeneity in breast cancer derived from SNV data [Sohrab Shah’s work, D=high] and copy number data [Nick Navin’s work, D=low].
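A toy sketch makes the arbitrariness concrete. I use the Shannon index as a stand-in diversity measure (my choice for illustration, not one the paper prescribes), and the clone fractions and thresholds are invented numbers, not values from any of the studies mentioned above.

```python
import math

def shannon_index(clone_fractions):
    """Shannon diversity H = -sum(p * ln p) over normalised clone fractions."""
    total = sum(clone_fractions)
    ps = [f / total for f in clone_fractions if f > 0]
    return -sum(p * math.log(p) for p in ps)

def call_diversity(clone_fractions, threshold):
    """Binary D call; the threshold is a free parameter nobody has defined."""
    return "high" if shannon_index(clone_fractions) > threshold else "low"

# Invented clone fractions for one hypothetical tumour, as they might look
# when clones are called from two different data types.
snv_clones = [0.4, 0.3, 0.2, 0.1]  # hypothetical SNV-based clustering
cnv_clones = [0.9, 0.1]            # hypothetical copy-number-based clustering

for threshold in (1.0, 0.2):
    print(threshold,
          call_diversity(snv_clones, threshold),
          call_diversity(cnv_clones, threshold))
```

With a threshold of 1.0 the same hypothetical tumour is D=high on one data type and D=low on the other. With a threshold of 0.2 it is D=high on both. Nothing in the framework says which data type or which threshold to use.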

There are enough problems already if we concentrate on just one type of diversity. To make things worse, the authors lump genetic diversity together with epigenetic, functional and phenotypic diversity. The relationships between these different types of diversity are poorly understood. The authors provide no insight at all into how such a broad view of diversity can be made concrete and measurable.

Maley’s message is “someone should somehow measure some sort of diversity”. This is empty hand-waving. And even though I completely agree with the direction the authors wave their hands in, the proposed system is useless unless specific approaches have been defined.

When your tumour is a rainforest

The authors give the 16 classes lyrical names like ‘desert’, ‘garden’ or ‘rainforest’ – ok, whatever, everyone has a right to their own flowery metaphor.

But what does any of this mean for a patient? Since no one can place real tumours into the 16 classes, Maley et al have no way of showing that the classes actually matter.

Always cautious, they have included safeguards in the paper: “We will probably be able to drop some of the 16 possible tumour types (…) and focus on the subset of classes that present in the clinic.” I think it would have been good scientific practice to find out which of these deserts, gardens and rainforests are actually observed in real patients before presenting this whole theoretical framework to the world.

In clinical practice, several binary schemes are already in use. Examples include patient stratification by ER or HER2 status in breast cancer. These classifications are useful because (a) they reflect the biology of the disease, (b) they can be routinely measured, and (c) they have a direct impact on treatment.

For Maley’s system, (a) is hard to judge, because the classification is so unspecific. But I think there is some evidence that at least individual aspects of their big scheme reflect biology. However, (b) and (c) are definitely not true for Maley’s system.

As a system of clinical relevance, as promised in the abstract, Maley et al’s scheme is a failure. The clinical impact of the different classes, should they actually exist, is completely unverified. Of course, Maley et al cite many papers that provide some limited evidence for the clinical usefulness of one part of the classification or the other, but there is no evidence that the complete system is useful at all.

Thus, as it is presented, the classification scheme Maley et al propose is a purely intellectual exercise without practical relevance.

Eat your pudding!

Because of these limitations, Maley’s paper is a piece of armchair oncology. It is not a how-to guide for classifying tumours. It is a gleeful statement of “wouldn’t it be ace if we knew how to”.

I am an armchair oncologist myself, trying to fight the disease with the weapons of mathematics, statistics and computer science. I can see the temptation of trying to bring order into the cancer chaos. But at the end of the day, the only thing that counts is what you get out of the data. The proof of the pudding is in the eating.

If the authors themselves actually knew how to use their system, I am sure they would have tried. Large international efforts like TCGA or ICGC and the Pan Cancer Analysis of Whole Genomes contain the data to try out at least a prototype of this proposed classification scheme. But the authors did not get their hands dirty.

Will this classification be used? No. No one will use this classification. Ever. Not even the authors of the consensus statement. Because no one can.

Will this paper be cited? Yes. Thousands of times. After all, this is a very good review covering a lot of ground. That it fails as a classification system will not hurt its success as a review article – especially with such a glamorous author list.

The visibility of the paper must not distract from its problems: it lacks the specific details needed to make it a useful contribution to cancer research. The best thing about Maley’s approach is Box 1, which describes the real issues quite well. Given that the authors seem to understand the problems, I really don’t understand how this paper can be marketed as a consensus on how to classify tumours.

The authors try to avoid criticism by presenting all practical problems as topics of future research. Maley et al want to make us believe that their theoretical framework enables methodological and measurement innovations. I disagree: these innovations happen as we speak – no one needed this framework.

A valid framework needs to be built on measurements and evidence rather than precede them. Consensus on specific approaches will be much harder, much more interesting and much more useful than consensus on lofty ideas.

Even if this paper was just meant as a first step, it should have been bigger and clearer.

The real work goes on.



3 thoughts on “The wrong type of consensus — Response to Maley et al, Nature Reviews Cancer 2017

  1. I am obviously biased but that article could have done with more statisticians on the author list to offer the statistical perspectives on this topic.

