
Methods vs Insights #3: the data don’t fall from the sky

Methods vs Insights is back! In the first post on this topic, I distinguished between computational biologists whose emphasis is on the computation and those whose emphasis is on the biology. The boundaries between the two groups are blurred, and my own group has people with computational and biological backgrounds working on very similar problems.

Having The Talk

But when a new team member joins us fresh out of a computer science or statistics degree, I need to have The Talk with them. The Talk about how our work at a biomedical research institute differs from the work in a computer science department. The Talk about how to get into journals with an impact factor bigger than 5.

I generally start by sketching a plot on my whiteboard, which looks like this (yes, that’s true, my hand-drawn plots look as if they came fresh out of Illustrator):

The difference between biomedical research and methodological research.

Let’s start with the inner blue box. Methodological research (what editors of glamorous journals generally call ‘more suited for a more specialized journal’) often goes like this:

  1. Pick a high-profile paper that describes a big data set or some new type of data;
  2. Identify the weaknesses in their analysis;
  3. Put all your effort into figuring out a new method that fixes these issues and thus is faster and/or more accurate;
  4. Validate your method in simulations and apply it to the original data;
  5. Publish in BMC Bioinformatics, Bioinformatics, IEEE/ACM Transactions on Computational Biology and Bioinformatics, or the Annals of Applied Statistics.

As long as your methodological advance is unique, significantly improves the original analysis method, and doesn’t contribute to the deluge of marginal improvements, there is nothing wrong with this approach and you can have a very successful career.

My group does that type of research too. For example, we just published a cool combination of Hidden Markov Models and Nested Effect Models. We have two case studies in the paper, but don’t necessarily claim any new insights into biology. The stats journal we published in was much more interested in the identifiability of parameters, which is exactly what it should be interested in.

The data don’t fall from the sky

But I am generally not very satisfied with this model of research (apart from the fact that the journals you get into are not what anyone would call high-impact).

The problem is that the data seem to fall from the sky. There is no real interest in how and why people decided to create this particular data set. Purely methodological research is separated from the biological question that motivated generating the data. And for someone in a CS or Stats department it might be completely OK to be inspired by some biological problem and then go off to do their theoretical stuff without ever thinking of the biology again.

But what I tell my new starters is: that’s not how it works for us! We are computational biologists. Our work is not only motivated by biological problems, it actively tries to solve them by generating testable and tested results. If you work in my group, you need to start with an important biological problem and see the project through all the way to experimental validation. That is: the full workflow in the plot above, from left to right, not just the bit in the middle.

Experiments and computation — like a person’s left and right hands

Obviously, I am not the first to talk about this. (I am just rehearsing The Talk with you here.) For example, Michael B Yaffe wrote about the same distinction a few years ago in an editorial in Science Signaling:

We frequently receive manuscripts that are purely mathematical and computational in nature, without experimental data or tests of their conclusions. These papers are best suited for other journals.

This is the inner box of ‘methodological research’ in my plot above.

Instead, we seek those in which the experiments and computation partner naturally with each other like a person’s right and left hands, and the ultimate product of the analysis is novel insights into the biology, which is then further explored as a test of the method and its conclusions.

What Yaffe is talking about here is the whole big picture of biomedical research as shown in my plot above. This is what excites me and what I want my group to do.

It’s obviously a lot of work. It can be quite a culture shock to move from a CS or Stats department into a cancer research institute. For example, projects tend to take much longer. Having a nice idea, pulling an all-nighter, and submitting to NIPS a few minutes before the deadline doesn’t work anymore. My next posts in the series will explore the challenges and frustrations (and occasional joys) of computational biology further.


Other posts in the series Methods vs Insights.
