A key component of my PhD thesis, all those years ago, was called Nested Effects Models (NEMs), and I feel lucky and privileged that over the years many other researchers have liked them enough to apply them in their work and extend the methodology. Before I show you two recent examples, here is the 5-minute summary of how NEMs work:
The Cancer Research UK Cambridge Institute will host the 8th Annual International Symposium, entitled ‘Unanswered Questions: Tumours at Cellular Resolution’ on 4-5 March 2016.
Organisers and Chairs: Greg Hannon, Florian Markowetz, John Marioni, Duncan Odom.
Hello, my fellow PIs, here is a question for you: Did you get trained well for your job?
Silly question, of course you did. Years of study and examinations culminating in a PhD have obviously trained you well in all things science.
But that’s not what I mean. Details of experiments and algorithms –what you learn in a PhD– are only a small part of a PI’s job. Once you start leading a group, the tough nuts to crack are people-problems.
The scariest picture I have seen this Halloween (or maybe even ever) is Ken Currie’s eerie portrait Three Oncologists:
The Three Oncologists are Professor RJ Steele, Professor Sir Alfred Cuschieri and Professor Sir David P Lane of the Department of Surgery and Molecular Oncology, Ninewells Hospital, Dundee.
In the Guardian Kathleen Jamie writes:
It’s a portrait, but far from flattering. (…) The three men are lit with a ghoulish inner light; they seem to be haunting the threshold between life and death. (…)
Furthermore, they hold their tools or means: Steele raises his gloved and bloodstained hands, Cuschieri holds a surgeon’s implement, Lane carries a paper. Whose sentence is written there?
As we grow more able to say the word “cancer” out loud and more of us survive it, thanks in no small part to our surgeons and physicians, this painting will become a historical record of an emotional state, as well as honouring three esteemed medics.
But it will still send a shiver down the spine.
It sure will.
Scientific labs are like black boxes. You rarely get a glimpse of how someone else has organised their group, what strategies they use to manage their team and how they keep everyone motivated.
This is why I think the PLOS Comp Bio Collection “About My Lab” is a great resource.
The collection ‘About My Lab’ was launched with the mission to share knowledge about lab organization and scientific management. Each Perspective article represents an interview with a Principal Investigator, who shares his or her experience of running a lab by discussing selected topics in an informal and personal style. By creating this collection at PLOS Computational Biology, a journal committed to open knowledge, the collection editors hope to create a dialog through which we all can learn from each other.
I feel very honoured they asked me to contribute and my article just came out yesterday. It’s called ‘You are not working for me; I am working with you.’
It wasn’t an interview, though. I had to write the whole thing myself. Here are the first few paragraphs; you can read the rest on the PLOS Comp Bio webpage.
Since 2009, I have led a cancer research group at the University of Cambridge; the current group includes ten scientists (five postdocs, five PhD students). In the following, I will share with you some of the lessons I learned over the years and some of the leadership strategies that work well for me. Key topics will be the integration of new lab members and the communication in the lab (in particular, how to make expectations explicit).
How the Lab Started
One of the papers that impressed me most as a PhD student was Eran Segal’s paper on module networks in yeast. When I prepared to start my own lab, at the end of my postdoc in 2008, I realised there had been almost no follow-up, and certainly nothing in cancer research. What an opportunity, I thought! So I wrote up a series of projects, which could easily have kept two postdocs busy for three years, on how to extend module networks and use them for data integration in cancer genomics. It was a great plan. Then I went to Intelligent Systems in Molecular Biology (ISMB) 2009 and heard Daphne Koller’s keynote. What a shocker: point by point, I could tick things off of my to-do list. Not only had Daphne’s lab thought of all my ideas for module networks, but they had implemented, tested, and improved them, and the papers had already begun to be published. Well, I thought, at least they are not doing this in the field of cancer. But then I saw one of Dana Pe’er’s publications, which killed my research program for good. I could have added some marginal improvements, sure, but that wouldn’t have been too exciting. So more than a year into my Principal Investigator (PI) position, I stood there empty-handed, without much of an idea what to do next. I had hit the ground, but I wasn’t running. What rescued me was the people in my group.
What makes a cancer deadly is not necessarily the growth at the location where it started (the primary tumour) but its spread through the body to other organs and tissues (called metastasis). Better understanding the metastatic process is one of the main reasons we are interested in inferring cancer evolution.
Today I would like to summarize and discuss two recent papers on cancer phylogenetics and metastasis. The first paper is the comprehensive review by Naxerova and Jain in Nature Reviews Clinical Oncology titled “Using tumour phylogenetics to identify the roots of metastasis in humans.” The second paper is an Opinion paper by Hong, Shpak and Townsend in Cancer Research titled “Inferring the origin of metastases from cancer phylogenies.”
I thought that was a nice test scenario to see if I could reproduce the results I got more than a year ago.
How reproducible am I?
Downloading the Rnw from the journal webpage (link) was easy, but -of course- it didn’t run through smoothly.
LaTeX failed and there were several R error messages.
The joys and frustrations of reproducibility
First of all, I had linked to a BibTeX file instead of just copying the bibliography into the Rnw as I should have done.
Second, I ran into problems with the survival analysis, because one of the packages had changed.
rms::survplot() used to allow plotting a survfit object through the survplot.survfit() function. However, this function has been deprecated as of version 4.2.
Luckily I found an easy workaround: just use npsurv() instead of survfit().
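Here is a minimal sketch of that workaround. It is not the code from my Rnw: as a stand-in I assume the `lung` dataset that ships with the survival package, and the variable name `fit` is just for illustration.

```r
# Stand-in example: the `lung` dataset from the survival package is an
# assumption for illustration, not the data used in the original analysis.
library(survival)
library(rms)

# Old approach: a survfit object, which survplot() no longer accepts
# fit <- survfit(Surv(time, status) ~ sex, data = lung)

# Workaround: fit the same Kaplan-Meier curves with rms::npsurv()
fit <- npsurv(Surv(time, status) ~ sex, data = lung)

# survplot() takes the npsurv object directly
survplot(fit)
```

The model formula stays exactly the same; only the fitting function changes, so the rest of the analysis script should not need touching.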
The updated Rnw is here on my webpage:
Together with a PDF so you can see what the output should look like.
Take-home message for me: Even with a knitr file I did myself, reproducibility is not a one-click thing.
To make reproducibility sustainable I would have to check all published analysis scripts at regular intervals (e.g. once every year or every 6 months). Am I prepared to do this? And for how long?
How do you know the leaders in your field? Because they get invited by Nature Medicine and Nature Biotechnology to a fancy place owned by the Volkswagen Foundation and write a report about it. For example, this one in the latest issue of Nature Medicine titled Toward understanding and exploiting tumor heterogeneity.
What did they discuss? Lots of things. But I already got stuck on the very first topic:
In my last post I shared the slides for my talk “5 selfish reasons to work reproducibly”.
In my talk I stressed the importance of using scripts and code to make analyses reproducible, instead of clicking, cutting and pasting as you would have to do in a tool like Excel.
I had also submitted my slides to the F1000 slides collection and after a few days got a very polite email back, asking me to rethink the keywords I had chosen in the submission:
Thank you for your slides submission: “5 selfish reasons to work reproducibly”.
Just a quick note to say that keywords are displayed alongside your presentation and are often how users will find your submission, by searching our site.
With this in mind, we were wondering if you had an alternative to “Excel is the devil” which might be more likely to appear on search results. [my emphasis]
First of all, I am impressed by how seriously they take curating slides at F1000.
And, yeah, I might come up with some other keywords, even though I think ‘Excel is the devil’ remains quite accurate.
You can find the slides together with the new keywords (quite boring: Reproducible research, knitr, Sweave, Successful lab, Career advice) here:
Five selfish reasons to work reproducibly [v1; not peer reviewed].
F1000Research 2015, 4:207 (slide presentation)
And so, my fellow scientists: Ask not what you can do for reproducibility — ask what reproducibility can do for you!
The following is a summary of a talk I gave in my institute and at the Gurdon in Cambridge. My job was to motivate why working reproducibly is a good strategy for ambitious scientists. Right after my talk, Gordon Brown (CRUK CI) and Stephen Eglen (Cambridge DAMTP) presented tools and case studies of reproducible work.
All materials are on github and below are my slides, thanks to slideshare:
Wohoo! My first set of citable slides.
I submitted the talk I gave at the ISMB 2015 network SIG to F1000. And they made it citable:
Functional analysis of interaction networks [v1; not peer reviewed].
F1000Research 2015, 4:221 (slide presentation)
To make sure that the animations are all visible I duplicated some slides several times and because they contain network plots the file is a massive 43 MB. I need to think of a better way to do this next time …
The book on Systems Genetics I have edited with Michael Boutros just came out! Wohoo!
You can get a look at the first copies at ISMB this year and I will shamelessly promote it in my talk at the Networks SIG on July 10th.
You can download the first introductory chapter written by Michael and me here.
Here is the promo text for your pleasure and enjoyment:
‘My professor demands to be listed as an author on many of my papers’ writes an anonymous scientist in the Guardian.
[T]here’s one instance where it’s acceptable for scientists to lie: when fraudulently claiming authorship of a paper.
Too often, researchers attach their names to reports when they have contributed nothing at all to the work.
The problem gets worse the higher up the academic ladder you go.
I think this is completely true.
The reasons are manifold:
Former postdoc Roland Schwarz of MEDICC fame has become a movie star.
Or at least the face of computational biology for the Wellcome Genome Campus:
In this film Roland Schwarz talks about his research using computers to model and understand evolution. This is one of a series of films providing a unique insight into different careers in the field of genomics.
Go here or watch it directly:
Marcia McNutt, Editor-in-Chief of Science, wrote a thoughtful Editorial about recommendation letters:
I noted an overall bias in the language used to describe the male candidates versus some of the female candidates. In some letters, women were described as “friendly,” “kind,” “pleasant,” “humble,” and frequently, “nice.”
[O]ne letter described how the candidate was so good to her elderly mother, yet still enjoyed life, spending time in nature with her husband and her animal friends.
Another letter reflected amazement that the candidate managed to balance so efficiently being a student, a scientist, and a mother.
But isn’t it good being nice, humble and having lots of animal friends?