Hacker News

For one, this definitely seems like a submarine ad for the AI tool (ImageTwin, henceforth called AIT, to not give them too much more free press). Not sure if Nature has an angle (probably also free scans), but they gave the researcher free access to the AIT per the preprint disclaimer [0]. Also, the article doesn't really give you enough info to assess whether the AIT is better, because it's basically PR, so allow me:

For one, the fact-checking against the AIT per the preprint is "Duplications highlighted by ImageTwin.ai were evaluated as appropriate or inappropriate by one reviewer (the author)". Not a great way to reduce false positives given the conflict of interest (more free scans for the author!) and the fact he'd already reviewed most of the offenders.

Also, the article says the AIT missed four papers the author flagged. But in the preprint, the category "At least one inappropriate duplication was identified during the manual review; none were highlighted by the ImageTwin.ai software" has 34 members. Per the author's results text, out of 715 papers with images, the author caught inappropriate duplications in 34 papers the AIT missed, the AIT caught 57 that the author subsequently agreed were bad, and they both caught 24. But these numbers disagree with the Venn diagram shown in figure 3 and referenced in the conclusion, which the Nature article also references. So... am I missing something, or is that inconsistent? And were there any AIT-flagged papers that the author disagreed were problematic?
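To make the consistency check concrete, here's the tally implied by the results text; the figure-3 Venn counts aren't reproduced here, but they should sum the same way if the preprint is internally consistent:

```python
# Counts as stated in the preprint's results text (out of 715 papers with images).
author_only = 34  # inappropriate duplications the author caught that the AIT missed
ait_only = 57     # caught by the AIT, subsequently confirmed by the author
both = 24         # caught by both

# Papers with at least one confirmed inappropriate duplication:
total_flagged = author_only + ait_only + both
print(total_flagged)  # 115
```

If the three regions of the figure-3 Venn diagram don't match these numbers (or sum to 115), at least one of the two presentations is wrong.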

AIT charges a pretty non-trivial amount per scan: even at high volume it's still >$2 before you get to custom pricing [1]. It was "only" 2-3x faster than the researcher, and at least per the article its output still needs to be checked for false positives. Taking normal researcher pay and reasonable estimates for the rate at which they can review papers, the AIT looks pretty damn expensive, priced competitively with "research intern".
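Back-of-envelope version of that claim. The only sourced number is the >$2/scan floor from ImageTwin's pricing page [1] and the 2-3x speedup from the article; the reviewer pay and screening rate are my assumptions, so plug in your own:

```python
# Assumed numbers (NOT from the preprint): intern-ish pay and manual screening rate.
AIT_COST_PER_SCAN = 2.00   # USD, high-volume tier floor per ImageTwin pricing [1]
REVIEWER_HOURLY = 30.00    # USD/hr, assumed reviewer pay
PAPERS_PER_HOUR = 10       # assumed manual screening rate
SPEEDUP = 2.5              # article says the AIT was "only" 2-3x faster

human_cost = REVIEWER_HOURLY / PAPERS_PER_HOUR            # $3.00/paper, all manual
# The AIT's flags still need a human false-positive pass, at 1/SPEEDUP the time:
ait_cost = AIT_COST_PER_SCAN + human_cost / SPEEDUP       # $3.20/paper

print(f"human: ${human_cost:.2f}/paper, AIT: ${ait_cost:.2f}/paper")
```

With those assumptions the AIT route is actually slightly *more* expensive per paper than the all-manual one, which is the "priced competitively to research intern" point.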

This technology has existed for a really long time in the form of reverse image search. Google launched one in 2011 and later neutered it (probably not profitable enough), but Yandex has had a pretty good one since 2014 [2]. Overall this seems like a pretty sloppy preprint with an obvious conflict of interest, and a tool with no apparent innovation beyond commercializing a probably-underserved vertical.

[0] https://www.biorxiv.org/content/10.1101/2023.09.03.556099v2

[1] https://imagetwin.ai/pricing/

[2] https://www.searchenginewatch.com/2014/06/19/yandexs-sibir-r...


