
By Rebecca Stauffer, AAPS
Retractions are rising in the medical research literature, even as more eyes examine peer-reviewed papers for accuracy. AI is powering an arms race in the world of research misconduct, making it easier both for scientific fraud to occur and for editors to identify and root it out.
These observations arose from the 2024 AAPS PharmSci 360 opening session, featuring a panel discussion with Ivan Oransky, M.D., Retraction Watch co-founder, and Colby Vorland, Ph.D., an assistant research scientist at the Indiana University School of Public Health-Bloomington who is part of a growing number of “sleuths” examining medical research for errors.
Expert: Publish or Perish Driving Paper Mills
Oransky began by offering his definition of a retraction.
“A retraction means that, generally, a journal, sometimes a publisher on behalf of a journal, or vice versa, an institution, or authors say ‘what you’re looking at is not reliable. You shouldn’t trust this in some way,’” he said. “All it means is: don’t base your next set of experiments, your research plan, or even a citation you are planning in a paper—don’t base it on that. That’s really all it means.”
He pointed out that while retractions carry a negative stigma, the definition itself is neutral.
Oransky said that last year was a “record year for retractions.” In early December of 2023, Nature reported that there were more than 10,000 retractions that year, based on resources including Retraction Watch.
“Turns out it was closer to 13,000,” Oransky said. “This piece ran in December and by the end of the year when all the dust settled, there were even more.”
While 13,000 retractions is an alarming number by itself, Oransky pointed to the rate of retractions as being of major concern, especially when it comes to citations.
In 2002, 1 in 5,000 papers was retracted, Oransky said. By 2023, the rate had risen to 1 in 500 papers. “Statistically speaking, there is a not-zero chance that one of the papers you’re citing has been retracted. That’s pretty significant.”
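To put that rate in perspective, a rough back-of-the-envelope calculation is sketched below; the reference-list size of 50 is a hypothetical figure for illustration, not one cited in the session.

```python
# Rough illustration of what the 1-in-5,000 vs. 1-in-500 retraction rates imply:
# the chance that a reference list contains at least one retracted paper,
# assuming retractions occur independently across citations.
n_refs = 50  # hypothetical reference-list size

for year, rate in [("2002", 1 / 5000), ("2023", 1 / 500)]:
    p_at_least_one = 1 - (1 - rate) ** n_refs
    print(f"{year}: {p_at_least_one:.1%} chance of citing at least one retracted paper")

# Prints roughly 1.0% for the 2002 rate and 9.5% for the 2023 rate.
```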
The growth in retractions also reflects improved screening, including the work of groups dedicated to unearthing concerning research.
The rise in retractions last year is tied to publishing giant Wiley’s 2021 purchase of journal publisher Hindawi. Special journal issues were a major part of Hindawi’s publishing strategy. These special issues, according to Oransky, were overrun with articles submitted via paper mills, so much so that more than 8,000 papers had to be retracted last year. This led to what he referred to as a “tsunami effect” in the publishing world, including a drop in Wiley’s stock price.
“Part of what’s happening is that there’s an entire industry now, one might say an illicit industry or at least a black market, of paper mills,” he said. “A paper mill, and I heard a really good definition recently, is an organization, a for-profit company really, set up to somehow falsify the scientific record.”
These paper mills sell papers, authorships, citations, and other signaling devices that scientists – and the public – use to decide who to trust and fund. AI is also becoming a factor in the rise of retractions, as it makes fraud easier.
Oransky believes that science’s publish-or-perish research culture is driving an increasing use of paper mills. The pressure to publish is the culprit behind the worst crimes in research, and Oransky said the mills are putting misconduct “on steroids.”
“The real problem is citations are the coin of the realm,” he said. “Never mind just the papers. It’s how often you are cited.”
Many university rankings are heavily based on a metric of citations that is easily gamed. “And, again, paper mills will provide that service to you,” he said. “ ‘You cite me, I cite you, and that person over there,’ and you have a whole ring, a sort of cartel of people citing each other.”
But these citation cartels must now contend with “sleuths.” These scientific avengers have taken on the role of examining the scientific literature for potential issues, often at the risk of defamation suits.
Research Fraud Dampens Scientific Enthusiasm
Vorland is one such sleuth, working with a team of researchers identifying potential issues with research data.
“Research integrity issues, particularly research misconduct which includes fabrication, falsification, and plagiarism, have existed as long as science has existed,” he said, explaining that even well-known scientific figures in history, including Gregor Mendel and Louis Pasteur, may have committed misconduct as we define it today during their research.
From the past, Vorland moved to present-day examples of misconduct, beginning with neurology researcher Matthew Schrag’s work as an image manipulation sleuth. Schrag found evidence of image manipulation in a seminal 2006 paper used in Alzheimer’s and Parkinson’s disease research, a finding suggesting that millions of dollars’ worth of research may have been misdirected for over 15 years.
Schrag was also involved with the team responsible for identifying possible evidence of research misconduct involving an NIH leader specializing in aging; more than 130 papers were flagged, displaying a range of issues. “This has probably caused millions of dollars of funding to go into certain areas where it wouldn’t have otherwise gone with this fake data,” Vorland said.
Research misconduct carries other consequences as well.
Vorland discussed an article in The Transmitter (Oransky is editor-in-chief of that publication) about a group of students who were unable to replicate research in their lab. The data they were attempting to replicate turned out to be fake.
“People within the lab, the students, some of them left the lab to go to other labs. Some of them left science completely, and in the end, some of the students, as the principal investigator put it, ‘lost some of that spark for science,’” he said. “One of the things I wanted to emphasize is that the cost of misconduct and errors in science goes beyond the harm to patients and the financial cost. It’s the cost of time and of redirecting resources that could go to other research questions. But it also involves the cost of losing early career talent.”
Vorland has found that one way to mitigate what he calls “honest errors” (e.g., statistical errors) is for researchers to work together. He has actually worked with other researchers to identify potential data errors in articles. In one case, his team contacted the authors of a paper, and after collegial discussion, the two groups of researchers submitted a joint paper with updated analysis.
“I really think this is a win-win. This is ideally how the process should go but unfortunately, it’s kind of the exception to the rule,” he said, recounting how his team contacted the authors of a paper in 2020 with concerns about statistical analysis, and a retraction was only published in 2023 after multiple outreaches to the publication’s editor.
“Most of the time it’s unfortunately easy to find errors in science. What I spend most of my time on is getting someone at a journal to care that there is an error, to help get it corrected.”
Tools to Ensure Credible Science
Oransky and Vorland concluded their presentations with a look at specific tools researchers can use to ensure integrity in their own research. Vorland noted that there are many approaches available to examine signatures of errors in text, images, and data, such as methods to identify various phrases and patterns of image manipulation.
In particular, he likes the checklist called INSPECT-SR that can be used to identify problematic randomized controlled trials in medical research literature. It is expected to be available in 2025.
Some journals are also using AI tools to look for textual signatures of problems or potential image manipulations. Other tools include the Problematic Paper Screener and Papermill Alarm.
To avoid citing retracted papers, Oransky said that a number of tools now offer automated retraction alerts using the Retraction Watch database. He recommends using Edifix, Zotero, and the LibKey Nomad browser extension.
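As a minimal sketch of how such a check could be automated in-house, the snippet below screens a reference list against a local export of the Retraction Watch database; the file name and column name are assumptions about the export format rather than details given in the session.

```python
# Minimal sketch: flag citations that appear in a local copy of the
# Retraction Watch database. "retraction_watch.csv" and the
# "OriginalPaperDOI" column are assumed names for the exported data.
import csv

def load_retracted_dois(path="retraction_watch.csv"):
    """Collect the DOIs of retracted papers from a CSV export."""
    retracted = set()
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            doi = (row.get("OriginalPaperDOI") or "").strip().lower()
            if doi:
                retracted.add(doi)
    return retracted

def flag_retracted(reference_dois, retracted):
    """Return the subset of a reference list that has been retracted."""
    return [doi for doi in reference_dois if doi.strip().lower() in retracted]

# Example usage with a placeholder DOI:
# retracted = load_retracted_dois()
# print(flag_retracted(["10.1000/example.doi"], retracted))
```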
Below is a transcript of the Q&A portion.
How can the peer review process be improved so there are fewer retractions?
Vorland: Unfortunately, a lot of us start from the assumption, when reviewing a paper, that we’re reviewing something that does not have errors and that there is no potential for fraud. I think in some cases we need to flip that a little bit and start from a more critical evaluation and really look at the numbers and look at plausibility.
If you’re a reviewer, you can contribute to a transparent research culture by requesting that journals establish data sharing policies and spend your time reviewing for journals that have these policies.
Oransky: I would even go a bit upstream. There are far more papers that should be retracted, or maybe should never have been published in the first place, than actually are retracted. We think it’s about 2% that should be retracted.
I would urge everyone to go upstream. I know that AAPS publishes four journals with Springer Nature, which does have its own dedicated team doing research integrity investigations. But make sure, and this is probably at the board level if I’m understanding the organization, that you’re not creating your own incentives to publish more and more and more, and that not too much revenue is tied up in those journals, which would require you to grow and grow and grow, because that’s where the upstream problems happen.
How often are retractions due to honest problems or mistakes that are part of the normal process of science?
Oransky: The data are pretty consistent that about two-thirds of the time retractions are due to something that is considered misconduct. Not honest error.
Only about 20% of the time in any given dataset are retractions what we would colloquially refer to as honest errors. For the rest, we can’t quite tell, as retraction notices have not been very consistent over the years. In fact, again, this gets back to the idea that retractions have earned their negative stigma.
There’s honest error. For example, ordering the wrong reagent or using the wrong reagent, or ordering the wrong mouse. People have done that, and it’s horrible and painful to have to realize it because nothing gets replicated, nothing gets reproduced. People have done the right thing; in fact, we have a category in Retraction Watch called “Doing the Right Thing” for authors who have retracted those papers, but that’s actually the minority of retractions.
What happens if you have cited an article that is later retracted?
Oransky: In fact, the answer is actually nothing. I mean that a little bit flippantly, in the sense that there is no punishment for that. I think the real question is ‘what should you do?’ What is best practice? I think it behooves authors to keep track of these things and see whether the retraction actually affected what they thought. Was it one of 12 references that said the same thing in your introduction? That’s a little bit different than if it was the central reference that led you down this path.
I think the example that [Vorland] gave about Alzheimer’s research, with a small number of trials leading down a devastatingly long dead end, is something we really should think about. The first step is to make sure that you’re not citing a retracted paper moving forward. I think you have to actually start there. It turns out, in the biomedical literature, more than 95% of the time when people cite retracted papers (papers already retracted at the time, not retracted later), they cite them as if they hadn’t been retracted. I want you to think about that for a second, especially those of you who have done expert witness testimony. If an attorney showed up to a case relying on a precedent that had been overturned, would you lose that case?
What are the current regulations for the use of AI in medical writing and general research publishing? And are there mechanisms to screen for AI use by journals?
Vorland: I know a lot of journals now have policies about the use of AI. The most important thing is to just disclose if you use it. Going forward, I think AI could be really useful in the scientific writing process, and also useful in helping to screen articles and help researchers identify potential issues before papers go out for publication. I’m pretty pro on the use of AI but the disclosure is the most important thing. So, if you use AI, definitely disclose it.
[Oransky] said it in his introduction as well: we know for a fact that a lot of papers are using AI because of the strings of prompts that are left in. There was another really clever method some investigators used to estimate the use of these large language models (LLMs) in scientific writing. LLMs tend to over-represent, to overuse, specific words when compared to the background of scientific writing. For example, one of those words, for some reason, is ‘delve.’ So the investigators looked at the prevalence of a bunch of scientific words, including “delve,” before LLMs came out and again after LLMs started coming out, and some words just shot up. They were able to estimate that, in their sample at least, as a lower bound, at least 10% of scientific abstracts probably had review by an LLM. I think this is a really big issue that we need to grapple with relatively quickly: what is an appropriate use of these tools? But disclosure is, in my opinion, the most important thing.
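The marker-word idea Vorland describes can be sketched in a few lines; the helper names and toy data below are illustrative only, and the study he cites uses a more involved estimate than this simplified lower bound.

```python
# Simplified sketch of the marker-word approach: compare how often a word
# like "delve" appears in abstracts written before vs. after LLMs became
# widely available. The excess share of abstracts containing the word is a
# crude lower bound on LLM involvement (for that single word).

def share_containing(abstracts, word):
    """Fraction of abstracts that contain the marker word."""
    word = word.lower()
    hits = sum(1 for text in abstracts if word in text.lower())
    return hits / len(abstracts) if abstracts else 0.0

def excess_share(pre_abstracts, post_abstracts, word="delve"):
    """Post-LLM share minus the pre-LLM baseline."""
    return share_containing(post_abstracts, word) - share_containing(pre_abstracts, word)

# Toy example (placeholder abstracts, not real data):
pre = ["We measured plasma concentrations of the drug.",
       "This trial assessed bioavailability in healthy adults."]
post = ["We delve into the pharmacokinetics of the formulation.",
        "This study delves into dissolution behavior."]
print(f"Excess 'delve' share: {excess_share(pre, post):.0%}")  # 100% in this toy case
```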
What can we, the stakeholders, researchers, universities, publishers, and funding agencies do to reduce academic misconduct, ultimately leading to fewer issues and retractions?
Oransky: I would say do whatever you can to cut down on publish-or-perish culture. One really smart idea I’ve seen is, instead of basing it all on an index, number of papers, impact factor, and all that, if someone’s applying for a role or applying for a promotion, tenure, etc., say ‘show me the three or five papers you’re most proud of, that you think are the strongest.’ Let the committee look at those and not all these metrics that I think have unintended consequences.
Vorland: I definitely agree with that. I would also offer that one thing a lot of cases of academic misconduct or errors have in common is that only one person is handling or preparing the data. I think having a culture within labs and within institutions of multiple checks, workflow checklists, and independent reviewers (ideally, professional biostatisticians) looking at the data as a double check would potentially cut down on a lot of these issues.