The Human Cell Atlas needs a pre-registered analysis plan

The Human Cell Atlas preprint came out some days ago on bioRxiv. It describes a project to collect all the cell types in the human body in one big reference map.

Our mission: To create comprehensive reference maps of all human cells—the fundamental units of life—as a basis for both understanding human health and diagnosing, monitoring, and treating disease. [from]

The contributors to the project are a who’s who of the leaders in single-cell genomics, and this will be a fantastic data set when it comes out. After all, in-depth analysis of resources like this provides the foundation of all biology, as you know.

I enjoyed reading the preprint. It puts the project into a historical perspective and discusses promises as well as limitations. It even references Borges’ ‘On Rigor in Science’. (I love well-read scientists!) And even if all that means nothing to you, it is still worth reading as a comprehensive summary of the current state-of-the-single-cell-art.

But I kept wondering, with a project like this, how do you know whether it is a success or not? How do you know that your reference map is really comprehensive and covers all (most?) of what it is supposed to find?



Science Stories – Reproducibility

If you think I am serious about reproducibility, you should see my wife.

In this movie by the Royal Society she explains the issue to David Spiegelhalter. That is Sir David Spiegelhalter, FRS, etc. etc.

Published on 22 Dec 2015. We need mathematical help to tell the difference between a real discovery and the illusion of one. Fellow of the Royal Society and future President of the Royal Statistical Society, Sir David Spiegelhalter visits Dr Nicole Janz to discuss reproducibility in scientific publications.

 Way to go!


Books, Science

Life out of sequence – Hallam Stevens’ data-driven history of bioinformatics


“How do people like you ever get last-author papers?” The person who asked me this question in 2008 during the interview for my current job was (and still is) a well-known stem cell biologist with decades of experience in science. But she still didn’t really know what to think of ‘people like me’: bioinformaticians and computational biologists. Aren’t bioinformaticians just service providers? Handy to have, but without any real scientific vision and contribution? She clearly worried about my ability to do independent research.

And she wasn’t alone. A couple of years later I interviewed for an EMBO fellowship, which I didn’t get because the panel (mostly cell biologists, no one computational or from genomics or medicine) thought my group was a “mathematical service unit” and my research was “overly driven by my collaborators”. I’m still not sure what a ‘mathematical service unit’ could be (proving theorems on demand, maybe?), but their comments showed me how far removed their research practice was from my own.

Even though bioinformatics is by now an established field, these personal experiences show that ‘old school’ biologists, who form the scientific establishment and direct mainstream research, are still very uncomfortable with ‘people like me’, who were trained in other disciplines, pursue biological questions different from their own, and use approaches not covered in classical biological training.


Hallam Stevens’ book Life Out Of Sequence: A Data-Driven History of Bioinformatics starts with the tension between old and new biology that ‘people like me’ experience every day and describes the way biology has been and is being changed by computational methods.



The hedgehog and the Quants


Nate Silver bashing everywhere I look. For example in the New York Times. Paul Krugman does it. And someone called Timothy Egan. ‘Creativity vs. Quants’ is the title of his op-ed – how silly! Does he really think we quantitative folks are mechanical calculation machines devoid of any creative thought? If you think quantitative work is not creative, you just haven’t done it yet.

Intimidation by quantification

Much more interesting, I thought, was Leon Wieseltier’s take in The New Republic. I really like Wieseltier’s phrase ‘intimidation by quantification’ – this is how my biological collaboration partners must feel when I bombard them with p-values.

Wieseltier discusses the old idea of the hedgehog and the fox (dating back to ancient Greece) that Silver had used to explain the fox logo of FiveThirtyEight: “The fox knows many things, but the hedgehog knows one big thing.”



Maximal Information Coefficient – just a messed-up estimate of mutual information?


Theory papers almost never make it into top journals, and this is why I have blogged about the paper ‘Detecting Novel Associations in Large Data Sets’ in Science by Reshef et al. before (here and here). The reception in the statistics community was mixed: while Terry Speed seemed to love it, Rob Tibshirani started to point out weaknesses. And now other people have joined the discussion.



Finding correlations in big data — ask the expert!

The paper Detecting Novel Associations in Large Data Sets stirred up quite some excitement, mostly in the stats/comp community, where many welcomed a theory paper in a prominent journal, but some voiced concerns about the quality of the results. I’ve posted about this previously.

Now another prominent journal, Nature Biotech, tries to explain what the fuss is about to a wider readership by interviewing 8 experts: Gustavo Stolovitzky, Peng Qiu, Eran Segal, Bill Noble, Olga Troyanskaya, Noah Simon & Rob Tibshirani, and Edward Dougherty.

No big surprises for me, but maybe a nice intro for people not from the network field.



“Detecting Novel Associations in Large Data Sets” — let the giants battle it out!

Computational and statistics papers usually don’t make it to glossy high-impact journals. “Your manuscript seems better suited for a more technical journal” is a regular response to submissions focussing on theory, not data.

But sometimes these papers make it through, usually to Science, which has a much better track record for theoretical papers than Nature. An encouraging recent example is Detecting Novel Associations in Large Data Sets by Reshef et al in Science:


Philosophy, Science

Here be dragons! Thomas Kuhn, Statistics and Systems Biology

Here be dragons!

Thomas Kuhn had physics in mind when he wrote The Structure of Scientific Revolutions, but his key ideas also apply to statistics and systems biology and can explain some of the confusion in the field.

Thomas Kuhn’s The Structure of Scientific Revolutions describes the history of science as phases of normal science separated by revolutions and paradigm shifts. During normal science, research is guided by a ruling paradigm, which identifies feasible problems and routes to tackle them. Normal science is a period of puzzle solving. The better your paradigm, the clearer the puzzle and the better your chances of solving it and making progress.
