Career, Science

#scidata16 follow-up

More publications, more grants, more awesome! Here is my #scidata16 talk on YouTube:

And here is Jonathan Page reporting on my talk in Naturejobs:

The problem lies in the fact that working reproducibly often requires some time investment, something which many scientists working in competitive fields claim they can’t afford. Florian Markowetz from the University of Cambridge counters these claims by saying “not to ask what you can do for reproducibility, but to ask what can reproducibility do for you!”

Indeed I do.



A visual summary of my keynote at #scidata16

I gave my talk on 5 selfish reasons to work reproducibly as a rabble-rousing conference opener at Publishing Better Science through Better Data 2016 (#scidata16) on Wednesday.

Throughout the talk cartoonist Royston Robertson was scribbling away on a huge sheet of paper to visually summarize our key statements.


Hmmm, interesting to have this record of what I said.

“Long CVs is what science is all about,” true and sad at the same time.

From now on, I want a personal cartoonist for every talk I give.



‘Five selfish reasons’ is one of Genome Biology’s Most Influential Articles of 2015

Genome Biology just sent an email around with 2015’s Most Influential Articles, according to

And, guess what, one of mine made the Top 10: Five selfish reasons to work reproducibly, from last December, so really a latecomer to the competition.

And so, my fellow scientists: ask not what you can do for reproducibility; ask what reproducibility can do for you! Here, I present five reasons why working reproducibly pays off in the long run and is in the self-interest of every ambitious, career-oriented scientist.

Now I just need one of my research papers to have the same impact as my opinions, and I’d be sorted …




First parasites, now online harassment – how has transparency harmed you lately?

An interesting post at Political Science Replication:

Getting the idea of transparency all wrong

Following an article in the New England Journal of Medicine, which portrayed scientists who re-use data as parasites, we now hear more on this from Nature. Apparently, data transparency is a menace to the public. The Nature comment “Don’t let transparency damage science” claims that the research community must protect authors from harassment by replicators. The piece further infects the discussion about openness with more absurd ideas that don’t reflect reality, and it leads the discussion backwards, not forward. 


Duty Calls, Science

I am a research parasite. Got a problem with that?

In case you wondered what’s wrong with biomedical research, just read this editorial on data sharing by Longo and Drazen in the New England Journal of Medicine, a leading journal in the field. What you will find is a desperate attempt to take data hostage and to enforce co-authorships for people who didn’t make any intellectual contributions.

But let’s take it one step at a time. What did Longo and Drazen actually say? They think there are major problems with sharing data fully, timely and openly.



Science Stories – Reproducibility

If you think I am serious about reproducibility, you should see my wife.

In this movie by the Royal Society she is explaining the issue to David Spiegelhalter. That is Sir David Spiegelhalter, FRS etc etc.

Published on 22 Dec 2015. We need mathematical help to tell the difference between a real discovery and the illusion of one. Fellow of the Royal Society and future President of the Royal Statistical Society, Sir David Spiegelhalter visits Dr Nicole Janz to discuss reproducibility in scientific publications.

Way to go!



“Five selfish reasons to work reproducibly” published


Wohoo! Genome Biology just published my piece on “Five selfish reasons to work reproducibly” (which I have talked about before).

And so, my fellow scientists: ask not what you can do for reproducibility; ask what reproducibility can do for you! Here, I present five reasons why working reproducibly pays off in the long run and is in the self-interest of every ambitious, career-oriented scientist.

Go check it out at

I am a bit sad, though, that they cut this über-geeky joke I used to illustrate how tightly the tools of reproducibility have to be linked with routine practice:




Sustaining reproducibility

Our paper on tumor evolution in ovarian cancer (see here) came with a nice knitr file to reproduce the survival results, which I used as an example in my recent talk about reproducibility (see here).

I thought that was a nice test scenario to see if I could reproduce the results I got more than a year ago.

How reproducible am I?

Downloading the Rnw from the journal webpage (link) was easy, but, of course, it didn’t run smoothly.

LaTeX failed and there were several R error messages.

The joys and frustrations of reproducibility

First of all, I had linked to a BibTeX file instead of just copying the bibliography into the Rnw, as I should have done.

Second, I ran into problems with the survival analysis, because one of the packages had changed.

rms::survplot() used to allow plotting a survfit object through the survplot.survfit() method. However, this method has been deprecated as of version 4.2.

Luckily I found an easy workaround: just use npsurv() instead of survfit().
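
A minimal sketch of that swap, not the code from the Rnw itself: it assumes the lung data set that ships with the survival package as a stand-in for the paper's data, and the model formula is purely illustrative.

```r
library(survival)  # provides Surv() and the example data set `lung` (stand-in data)
library(rms)       # provides npsurv() and survplot()

# Old pattern that relied on survplot.survfit(), shown only for comparison:
# fit <- survfit(Surv(time, status) ~ sex, data = lung)
# survplot(fit)

# Workaround: fit the same Kaplan-Meier curves with npsurv(),
# which returns an object that survplot() knows how to draw.
fit <- npsurv(Surv(time, status) ~ sex, data = lung)
survplot(fit)
```

The only change is the fitting function; the formula and the plotting call stay as they were.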

The updated Rnw is here on my webpage:

Together with a PDF so you can see what the output should look like.

Take-home message for me: Even with a knitr file I did myself, reproducibility is not a one-click thing.

To make reproducibility sustainable I would have to check all published analysis scripts at regular intervals (e.g. once every year or every 6 months). Am I prepared to do this? And for how long?



Is there an alternative to ‘Excel is the devil’?

In my last post I shared the slides for my talk “5 selfish reasons to work reproducibly”.

In my talk I stressed the importance of using scripts and code to make analyses reproducible, instead of clicking, cutting and pasting as you would have to do in a tool like Excel.

I had also submitted my slides to the F1000 slides collection and after a few days got a very polite email back, asking me to rethink the keywords I had chosen in the submission:

Thank you for your slides submission: “5 selfish reasons to work reproducibly”.

Just a quick note to say that keywords are displayed alongside your presentation and are often how users will find your submission, by searching our site.

With this in mind, we were wondering if you had an alternative to “Excel is the devil” which might be more likely to appear on search results. [my emphasis]

First of all, I am impressed by how seriously they take curating slides at F1000.

And, yeah, I might come up with some other keywords, even though I think ‘Excel is the devil’ remains quite accurate.

You can find the slides together with the new keywords (quite boring: Reproducible research, knitr, Sweave, Successful lab, Career advice) here:

Markowetz F.
Five selfish reasons to work reproducibly [v1; not peer reviewed].
F1000Research 2015, 4:207 (slide presentation)
(doi: 10.7490/f1000research.1000179.1)



Five selfish reasons for working reproducibly

And so, my fellow scientists: Ask not what you can do for reproducibility — ask what reproducibility can do for you!

The following is a summary of a talk I gave at my institute and at the Gurdon in Cambridge. My job was to motivate why working reproducibly is a good strategy for ambitious scientists. Right after my talk, Gordon Brown (CRUK CI) and Stephen Eglen (Cambridge DAMTP) presented tools and case studies of reproducible work.

All materials are on GitHub and below are my slides, thanks to SlideShare:



A serious case of replication frustration!

Pol Sci Rep

… and this time it doesn’t even have anything to do with breast cancer. Check out the latest news on replication in the social sciences on my wife’s new blog Political Science Replication:

“I don’t have a ready-made dataset.” – “We don’t have the R code for our paper available.” – “I’m travelling. I will definitely send the replication data when I can clean it up a bit.” These are just some of the answers I received when asking authors for their replication data in political science. Only few sent me replication-ready data and almost no one sent me their code or .do file. This is a serious case of replication frustration! *

Reading her post I realized that my own field, genomics, actually seems to put much more effort into reproducibility and replicability than the social and political sciences. Just think of all the big repositories for microarray and sequencing data. That should sort at least the availability of the original data. And the experimental data packages on Bioconductor can contain vignettes with all the code necessary to re-do the analysis (like in this example from my group).

I’m quite happy about these efforts, but then again … it’s still sometimes a pain in the neck to get full GWAS data sets or complete genome-wide RNAi screening data. And ‘we need to clean up the data first’ or ‘the first author is traveling’ are excuses I regularly hear when asking for data.

Still so much left to do, wherever you look …


PS: I wonder where she got the idea for the layout from …