AI Assistant Tool QED Scores bioRxiv Preprint Quality

Along with running their labs, conducting research, and teaching, most scientists spend unseen evenings and weekends scrutinizing data and crafting feedback as peer reviewers for scientific journals.

“Everyone would agree that the way science publishing works at the moment is far from ideal,” said Oded Rechavi, a molecular biologist at Tel Aviv University. “The bottleneck is the reviewing of the work or evaluating the science, which is just extremely difficult, and currently is done by good, willing people dedicating their time.”

The peer-review process can be long and tedious: publishing a paper can take up to 18 months after the initial submission.¹ Partly out of frustration with this, Rechavi and his team created QED last year, an AI assistant for reviewing preprints. The tool is meant to help researchers gauge the quality of a non-peer-reviewed paper at a stage when traditional proxies such as journal rank and citation count are not available.

Building on this, Rechavi and his team have now developed a metric called the QED Score to rank the validity and originality of findings in a preprint. Using QED, they derived this score through blinded evaluations of more than 57,000 bioRxiv preprints to identify the top one percent of life science preprints posted in the last year. According to a white paper published this week, domain experts judged that QED Scores offered a reasonable estimate of a paper’s quality.

“A lot of research [is] coming every day, and so we need some proxies for quality,” said Pedro Beltrao, a biologist at ETH Zürich, who was not associated with QED Science, Rechavi’s company that developed QED. “Right now, a lot of people just use the journal in which a paper is published as a proxy, and maybe this [QED Score] could be an interesting proxy,” he added. However, he said that such a reduced metric should not be used for the evaluation of scientific careers.

QED Scores Help Evaluate Preprint Quality

Rechavi said that feedback from the research community poured in when they launched QED last year. The responses helped them refine the tool to identify the novelty of a paper, evaluate each of the authors’ claims, and point out gaps in the work.

“But now, we wanted to take [QED] to the next level,” said Rechavi. He noted that the committees that decide on researchers’ recruitment, promotions, or grant outcomes usually do not consider preprints and instead wait for peer-reviewed publications. Hoping to bridge this gap, Rechavi and his team developed QED Scores as a proxy for a preprint’s quality.

The AI-based metric evaluates the originality and validity of manuscripts while blinded to the authors’ identities, countries, and institutions, a particularly important step given the location and gender bias in peer review.² Using this pipeline, Rechavi and his team scored 57,455 bioRxiv preprints submitted between May 2025 and April 2026 to identify the top work produced in that period. “If you do relatively shallow work, reading a…paper would take you eight hours, and you need three reviewers,” said Rechavi. “It adds up to millions of expert hours, assuming you can find these experts.”

Figure: A bubble chart comparing the quality and volume of the top one percent of preprints by country.

Are QED Scores on Par with Expert Assessment?

Rechavi and his team also sought to validate the utility of QED Scores. To do so, they scored nearly 5,000 preprints, of which almost 3,000 had been published in journals. Comparing the QED Scores of these papers with the rank of the journal in which they were eventually published revealed a positive correlation, indicating that the AI-based review largely agreed with formal peer review.

However, there were some instances where QED Scores did not align with journal ranks: manuscripts with low QED Scores made it to high-ranking journals, and vice versa. For these cases, Rechavi and his team paired manuscripts with contradictory scores such that one paper had a high QED Score but was published in a low-ranking journal, and the other had a low QED Score but ended up in a high-ranking journal. They then sent these pairs of manuscripts to different experts to get their opinions on the relative quality of the preprints.

Bluma Lesch, a geneticist at Yale University, was one of the researchers who participated in this assessment. She received three pairs of manuscripts with explicit instructions not to check where they had been published, and had to pick the paper in each pair that she thought was of superior quality.

This exercise revealed that Lesch and other experts sided with QED Scores 75 percent of the time; that is, they thought that manuscripts with higher QED Scores were of superior quality even if their final journal rank was lower.
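The discordant-pair comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the team's actual pipeline: the records, the score and rank cutoffs, the all-combinations pairing, the expert picks, and the function names `discordant_pairs` and `agreement_rate` are all hypothetical, and the numbers are invented toy values rather than data from the white paper.

```python
from itertools import product

# Hypothetical toy records: (preprint_id, qed_score, journal_rank_percentile).
# All values are invented for illustration only.
preprints = [
    ("A", 0.91, 20),
    ("B", 0.35, 95),
    ("C", 0.88, 30),
    ("D", 0.40, 90),
    ("E", 0.75, 80),
]

def discordant_pairs(records, high_score=0.8, low_score=0.5, rank_cut=50):
    """Pair each high-score/low-rank paper with each low-score/high-rank paper.

    The cutoffs and the all-combinations pairing are assumptions.
    """
    high_score_low_rank = [r for r in records if r[1] >= high_score and r[2] < rank_cut]
    low_score_high_rank = [r for r in records if r[1] <= low_score and r[2] >= rank_cut]
    return list(product(high_score_low_rank, low_score_high_rank))

def agreement_rate(pairs, expert_choice):
    """Fraction of pairs in which the expert preferred the higher-scored paper."""
    agree = sum(
        1 for high, low in pairs
        if expert_choice[(high[0], low[0])] == high[0]
    )
    return agree / len(pairs)

pairs = discordant_pairs(preprints)

# Hypothetical blinded expert picks, keyed by (high-score id, low-score id).
choices = {
    ("A", "B"): "A",
    ("A", "D"): "A",
    ("C", "B"): "C",
    ("C", "D"): "D",
}

print(agreement_rate(pairs, choices))
```

With these toy inputs, the expert sides with the higher-scored paper in three of the four pairs. Paper E falls between the two score cutoffs and is therefore left out of the comparison.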

“Having used the tool…I wasn’t surprised that [the QED Score] generally aligns with how people would rate the overall quality of a paper,” said Lesch. She added that the imperfect alignment was not entirely unexpected either. “We all know that there are biases and imperfections in peer review.”

Of note, QED scored one of Lesch’s preprints in the top one percent, one of the few instances of a paper by a researcher involved in testing QED Scores achieving that level.³

Poonam Thakur, a neuroscientist at the Indian Institute of Science Education and Research, Thiruvananthapuram, also noted that the alignment of QED Scores with expert views was not unexpected. “If you train an AI [tool] to do a certain task, which is very specialized and specific, [it is not surprising] that it will start to perform well after very thorough training,” said Thakur, who was involved in some steps to refine QED. However, she anticipates that the AI tool will face some pushback from the community.

The fact that experts favored QED Scores over journal rankings most of the time is “very, very encouraging,” said Rechavi. “[It] makes me very happy.”

Figure: A comparison of the QED Scores of preprints with their eventual journal rank.

Using AI for Peer Review

Despite QED Scores aligning with expert reviews, Beltrao noted that some considerations remain, especially with using one metric to assess the quality of a body of work. Quantifying the value of scientific work has many dimensions, he said, “and I think their method is probably scoring one dimension.”

He noted another potential problem with the score. “As a scientist, nobody wants to be evaluated by a single metric,” said Beltrao. And while QED Scores could be integrated with other metrics to assess scientific output, he is skeptical that scientists would want that. However, he also acknowledged, “We just don’t have the capacity to read all the science that’s happening at [a] given time. It is unavoidable that we need proxies.”

Another concern with QED Scores could be the use of AI to review manuscripts. But according to Sandhya Koushika, a neuroscientist at the Tata Institute of Fundamental Research, Mumbai, this is already happening. “I have heard from other people who [are] getting reviews back that their review is an AI review,” said Koushika, who was also not involved in the development of QED.

She noted that if researchers are going to use AI to review manuscripts, “I think it is better to declare it and perhaps better to use some product like QED, which is free anyway, which is [developed] by biologists for biologists and biomedical sciences.”

Beltrao said that even his team recently received reviews, and “one of them looked clearly AI made.” He believes that journals must come up with strategies to restrict AI usage in peer review and to ensure clear disclosure when AI is used.

However, Rechavi emphasized that QED and QED Scores are not meant to replace peer review. Rather, he hopes that QED acts as a complementary tool in the process. “[Peer review] should be done. It’s great,” said Rechavi. “It’s just not always available, and it fails often.”

Is AI Replacing Critical Thinking?

With researchers uploading their preprints to QED to receive feedback before sending them for peer review, some people on social media pointed out that this was like outsourcing critical thinking to AI. However, Koushika does not agree. “The business of science is still critical thinking,” she said, noting that AI cannot conduct or troubleshoot biological experiments. “Critical thinking is part and parcel of everyday life. I don’t think we are outsourcing it, but I think we are helping the process along,” she said.

Both Thakur and Lesch noted that QED’s reviews sometimes suggest experiments that are not practically doable within tight time frames, which highlights the need for human oversight. “[I] hope that when people are using AI for giving some reviews, they do apply real practical jurisdiction of their own mind into what is actually possible,” said Thakur.

Overall, Rechavi accepted that the QED Score is not perfect. “We know that it doesn’t capture [everything] in the world, no one score can,” he said. “But the nice thing about AI evaluations is that it’s much easier to improve them.”

They plan to do exactly that, particularly in generating reviews and in scoring papers from fields outside the life sciences. “There are always ways to improve,” said Rechavi. “And we will improve, but we did the best we can at this particular stage.”

1. Andersen MZ, et al. Time from submission to publication varied widely for biomedical journals: a systematic review. Curr Med Res Opin. 2021;37(6):985-993.
2. Haffar S, et al. Peer review bias: A critical review. Mayo Clin Proc. 2019;94(4):670-676.
3. Tullius TW, et al. Protamine lacunae preserve the paternal chromatin landscape in sperm. bioRxiv. 2025.
