Cell asked for help – here you go…

Recently, Cell editor-in-chief Emilie Marcus posted an article from the perspective of an editor, decrying the uptick in allegations of data mishandling flooding editors’ desks in recent years, and asking what to do about it. I’ve already opined on how Cell royally screwed up this process in the past, so instead let’s focus on the actual questions asked, and some solutions. Here are the specific questions posed by Dr. Marcus…


(1) At a time when there is increasing pressure to reduce the costs of publishing, how should journals, institutions, and funders apportion time and resources to addressing a burgeoning number of alerts to potential instances of misrepresentation or misconduct?

Charge less and spend more?  Scientific publishing is a multi-billion-dollar industry, with profit margins routinely in the region of 35%. All the labor (i.e., peer review) is essentially free, and the actual production costs are minimal now that everything is online.

The issue here is that Cell and the other publishers want it both ways – they like the title “gatekeepers of truth,” but they don’t want to shell out the cash required to ensure that what they’re peddling is actually true!  Seriously, the answer to this one is so simple, it really makes you question the sanity of anyone asking it – SPEND MORE MONEY!  Hire more people to scrutinize the data before it goes out the door.  In the US life sciences right now, we’re all bemoaning the glut of former grad students and postdocs struggling to find jobs. They have intimate knowledge of the subject matter. Hire them and you’d fix two problems in science.


(2) Are there ways to improve the signal-to-noise ratio that we haven’t thought of?

By signal-to-noise, one presumes Dr. Marcus is referring to the number of allegations that come in the door but then turn out not to be real problems with the data.  Again, this is a tractable problem.  More eyes on the data = easier to figure out what’s real.  Hire more trained eyes.

Another really simple solution is to USE THE TOOLS ALREADY AVAILABLE for plagiarism detection – things such as iThenticate and DejaVu.  It is shocking that we’ve had text-plagiarism software for well over a decade, but most journals simply don’t use it. Why? Because it costs money!  What’s interesting is that software tools are now also being developed to do the same thing for data and images (I know of some in the pipeline, but can’t mention specifics).

So, we are literally a couple of years away from the point when any submitted paper – both text and images – can be screened automatically by software. After that watershed moment, does anyone want to gamble on how long it will take for such tools to gain widespread adoption across the publishing industry?  Don’t hold your breath!
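For the curious, the core idea behind automated text screening is simple enough to sketch in a few lines. The following Python toy uses word-shingle overlap (Jaccard similarity) – a minimal, illustrative stand-in for what commercial tools like iThenticate do at scale. The threshold and function names are my own assumptions, not any vendor’s actual algorithm.

```python
# Toy sketch of text-similarity screening via word shingles.
# All names and thresholds here are illustrative, not a real product's API.

def shingles(text: str, n: int = 5) -> set:
    """Return the set of overlapping n-word windows in the text."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def similarity(doc_a: str, doc_b: str, n: int = 5) -> float:
    """Jaccard similarity between the two documents' shingle sets."""
    a, b = shingles(doc_a, n), shingles(doc_b, n)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def flag_for_review(submission: str, corpus: dict, threshold: float = 0.3) -> list:
    """IDs of prior papers whose similarity to the submission exceeds
    the threshold -- these would go to a human screener, not be
    auto-rejected."""
    return [doc_id for doc_id, text in corpus.items()
            if similarity(submission, text) > threshold]
```

The point is not the algorithm (real tools are far more sophisticated) but that the screening step is cheap, automatable, and leaves the final judgment to a trained human.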

The other answer to this question is to think differently about the signal-to-noise problem. In the old days, the only way to deal with it was to boost the signal and cut the noise.  Computing power and labor are now so cheap that it’s easier to just take everything – signal, noise, the lot – and look at it all.  If there’s some noise in there, who cares? It’s actually more expensive to expend effort up front trying to figure out what’s noise. Examine everything and THEN decide; you gain nothing by filtering out the noise first.


(3) Is there a process for investigating that would be more streamlined, coordinated and efficient?

Yes, see above: software and more pairs of eyes. Spend more money (does anyone see a theme developing here?).  As regards efficiency, the fact that it was necessary to ask all of these questions tells you that the current system is not working. Quite frankly, anything the publishers do differently will be more efficient than the current approach.


(4) Would allowing/requiring authors to post all the raw data at the time of publication help?

Yes to both. It simply beggars belief that in 2015, when I can fit the entire back catalog of thousands of journals on a pocket-sized hard disk costing <$100, journals are still on the fence about whether to let authors deposit all the data associated with a paper.  Hell, they’re still imposing limits on word count, pages, number of images, etc.

Data is, quite literally, the cheapest possible thing in the world that you can store. Journals need to get out of the last century and embrace scientists’ wish to both include all their own data, and to see all the data in other people’s papers.


(5) Should we require that all whistleblowers be non-anonymous to ensure accountability? What if we enact this policy and need to let a serious claim go unaddressed because the whistleblower refuses to reveal their identity?

“Facts should be viewed as such, regardless of where they come from.”  As a society, we are in grave danger when we attach relative importance to facts depending on the perceived importance of the messenger. I experienced this firsthand on PubPeer, when a scientist whose work I questioned went on a long diatribe about my own qualifications – as if that somehow changed the facts of the case.

Although the term used in Dr. Marcus’s question here was “accountability,” inherent in it is the assumption that anonymity equates to unreliability. There are anecdotes about the infamous Clare Francis being wrong much of the time, but that’s N=1, and I think we can do slightly better here. PubPeer has admirably demonstrated on thousands of occasions that anonymous reporters should be taken seriously, because they are very often right. Conversely, informants who use their real names are very often wrong. To the best of my knowledge, there is no hard evidence that the reliability of allegations correlates with whether the accuser is named.

The danger with a “named” approach is the slippery slope into the importance-of-the-messenger pratfall. Will a journal take an allegation more or less seriously if it comes from a postdoc versus a senior PI?  What about an undergrad?  What about a non-scientist member of the public? What if the accuser is a former employee of the paper’s author – does that somehow disqualify their opinion, or does it make their accusations more valid because they may have firsthand knowledge of the case?  All these examples lead to a simple conclusion – identity does not matter.

If a journal chooses to assign “importance” to a series of allegations based on who they came from, one must assume that a similar biased system will exist at the other end of the investigation, i.e., the journal may choose to take allegations seriously or not depending on the status of the scientist who is being accused.  If a journal had a policy stating “we don’t investigate Nobel prize winners”, that would be offensive. Why is ignoring anonymous reporters any less offensive?  Both strategies attach undue importance to the messenger, not the facts.


(6) Should we only consider concerns related to papers published in the last 5 years? Seems fine for “small things” like gel splicing, etc., but presumably, if a concern arose that some body of work was fraudulent, even if it was 10 years old, wouldn’t we want to correct the published record?

A possible strategy might be to investigate everything fully within a given time frame (say six years – the ORI statute of limitations), and then, for older papers, apply a graded approach depending on the author’s or group’s other papers.

For example, if a paper from 10 years ago is questioned, and the problem is unreported gel splicing, this may indeed represent an honest mistake by authors following then-accepted (and now universally acknowledged to be wrong) contemporary practices.

However, that same paper, juxtaposed against a backdrop of 20 similar papers all with problems – perhaps a few already retracted or corrected – suggests a pattern that may be indicative of misconduct or, at the very least, sloppy data-handling habits.


This last point is where another key solution comes in… COORDINATION.  None of the above proposals will work if each journal tries to implement them individually.  There has to be a database, shared between journals and publishers, to keep track of all these problems.  The simple idea: as soon as an allegation comes in, the journal’s investigator looks up the authors in the database and sees whether any other journal has an active investigation into the same authors. Right now, searching PubPeer, PubMed Commons, and Retraction Watch is a reasonable proxy for this, but far from comprehensive or perfect.
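To make the idea concrete, here is a hedged sketch of what such a shared registry lookup could look like. The schema, class names, and status values are entirely hypothetical – no such cross-publisher database exists today – but the workflow is exactly the one described above: report an allegation, then query for open cases against the same author.

```python
# Hypothetical shared registry of misconduct investigations.
# Every name here is an illustrative assumption, not an existing system.

from dataclasses import dataclass, field

@dataclass
class Investigation:
    journal: str
    author: str
    status: str  # e.g. "open", "closed", "retracted"

@dataclass
class SharedRegistry:
    records: list = field(default_factory=list)

    def report(self, journal: str, author: str, status: str = "open") -> None:
        """A journal logs a new allegation as it comes in the door."""
        self.records.append(Investigation(journal, author, status))

    def open_cases(self, author: str) -> list:
        """What an investigator would run on receiving a new allegation:
        are other journals already looking at this author?"""
        return [r for r in self.records
                if r.author == author and r.status == "open"]
```

The design choice worth noting is that the registry stores allegations, not verdicts – its job is coordination between investigators, not public accusation, so access would presumably be restricted to journal staff.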

Finally, another potential solution that has been overlooked is the role that funding agencies have to play in this process. In case anyone didn’t notice, the NIH open access mandate was introduced in 2008 and sat pretty much ignored for a couple of years. Then NIH came up with a simple carrot and stick – comply, or you won’t get funded.  Boom!  Now everyone is super careful to publish their work in journals that comply.

What if NIH were to draft a set of regulations for how journals ought to deal with these problems?  Not guidelines or recommendations, but actual RULES… Want to publish your NIH-funded work? It has to be in a journal that plays by the rules; otherwise you can’t list it on your biosketch.  Just watch how quickly all the journals would fall into line. Their bread and butter is packaging up science they didn’t pay for and selling it back to the public, so if the hand that feeds them the bread says “jump,” they will jump. A mass exodus of researchers from journals that don’t comply would be cool.  Maybe NIH could call the new mandate “Regulations On Biomedical Oversight Concerning Obsolete Publishing Practices.”