This is a bot account controlled by User:ערן (a.k.a. Eran).
Copy & Paste detection
Among other tasks, this bot detects copy-and-paste edits and populates a database of possible copyvios, which is then used to create the CopyPatrol feed. It builds on the multi-year efforts of WP:Turnitin.
Recent edits to the English Wikipedia are scanned if they add more than a certain amount of text (+500 after removing wikicode) that wasn't there in the 2 previous revisions. The added text is sent to the plagiarism detection service iThenticate.
Edits whose text is similar to external sources are considered possible copyright violations and are reported in the CopyPatrol tool.
If the external source is a mirror of Wikipedia, it is removed either by iThenticate itself or afterwards by the bot (based on the EranBot blacklist).
If the source is a broken link, the bot removes it.
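The selection and filtering steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the bot's actual code: the function names, the simplified wikicode stripping, the line-based comparison against earlier revisions, the reading of "+500" as a character count, and the `is_alive` callback are all hypothetical.

```python
import re
from urllib.parse import urlparse

# The "+500" threshold from the text; treating it as characters is an assumption.
MIN_ADDED_CHARS = 500


def strip_wikicode(text):
    """Very rough wikicode removal (hypothetical, far simpler than a real parser)."""
    text = re.sub(r"<ref[^>]*>.*?</ref>", "", text, flags=re.S)    # inline references
    text = re.sub(r"\{\{[^{}]*\}\}", "", text)                     # non-nested templates
    text = re.sub(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]", r"\1", text)  # [[target|label]] -> label
    text = re.sub(r"'{2,}", "", text)                              # bold/italic quotes
    return text


def added_text(new_rev, previous_revs):
    """Keep the lines of new_rev that appear in none of the 2 previous revisions."""
    old_lines = set()
    for rev in previous_revs[-2:]:
        old_lines.update(line.strip() for line in rev.splitlines())
    return "\n".join(
        line for line in new_rev.splitlines()
        if line.strip() and line.strip() not in old_lines
    )


def should_scan(new_rev, previous_revs):
    """True if the edit adds enough new text (after stripping wikicode) to be scanned."""
    return len(strip_wikicode(added_text(new_rev, previous_revs))) > MIN_ADDED_CHARS


def filter_sources(urls, blacklist, is_alive):
    """Drop sources on the mirror blacklist and sources whose link is broken."""
    kept = []
    for url in urls:
        host = urlparse(url).netloc.lower()
        if any(host == domain or host.endswith("." + domain) for domain in blacklist):
            continue  # known Wikipedia mirror
        if not is_alive(url):
            continue  # broken link
        kept.append(url)
    return kept
```

Passing `is_alive` as a callback keeps the sketch free of network calls; a real check would have to request the URL.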
Each entry in the report page has the following fields:
Title - title of the edited page.
Diff - link to the relevant edit diff and page history.
Editor - the user who made the edit.
Source - link to the report page in iThenticate (titled "report") and links to possible sources of the edit (titled "compare").
Status - should be filled in manually with TP/FP (true positive / false positive).
The bot adds hints for edits that are possibly good:
citation - the added text mentioned in the source. Short text is OK (in the copyright sense), but long text is a violation (see also Wikipedia:Close paraphrasing).
Mirror? - the added text comes from a source that may be a mirror site of Wikipedia, e.g. the source seems to be an unknown mirror (one that doesn't appear in our blacklist but has attribution to Wikipedia). Editors can add such sites to the EranBot blacklist so that they don't appear in the future.
(CC) - the added text comes from a source that probably has a Creative Commons license.
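As a rough illustration of the report fields and hints described above, the sketch below models one report entry and derives the three hints with simple text heuristics. Everything here is assumed for illustration: the `ReportEntry` class, its field names, and the matching rules in `compute_hints` are hypothetical and are not the bot's real logic. In particular, treating "citation" as "the edit includes the source URL" is only one possible reading.

```python
from dataclasses import dataclass, field


@dataclass
class ReportEntry:
    """One row of the report page (hypothetical field names)."""
    title: str          # title of the edited page
    diff_url: str       # link to the edit diff
    editor: str         # user who made the edit
    report_url: str     # iThenticate report ("report")
    compare_urls: list  # possible sources ("compare")
    status: str = ""    # filled in manually: "TP" or "FP"
    hints: list = field(default_factory=list)


def compute_hints(added_text, source_url, source_text):
    """Guess hints for a possibly good edit using naive heuristics (assumptions)."""
    hints = []
    if source_url in added_text:
        hints.append("citation")  # assumed: the edit cites the matched source
    lowered = source_text.lower()
    if "wikipedia" in lowered:
        hints.append("Mirror?")   # assumed: the source attributes Wikipedia
    if "creative commons" in lowered or "creativecommons.org/licenses" in lowered:
        hints.append("(CC)")      # assumed: the source declares a CC license
    return hints
```

A reviewer would still set `status` by hand; the hints only suggest where a match may be harmless.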
Page triage
Special:NewPagesFeed (a.k.a. PageTriage) interacts with copyright bots such as EranBot.
Use Set Filter => "Copyvio" to review pages with potential copyright issues.
Currently it works only for the en/es/fr/cs Wikipedias (there may be potential to expand it to other languages; just ask!). It has been a great help for medical articles. Efforts to make it more functional are ongoing. The results are placed in the CopyPatrol tool, and the bot runs 24/7.
There is NO plan for this bot to make edits to mainspace. The concept has been discussed with the WMF legal team, who are happy with it.
The bot is based on Pywikibot, and its source code is available on GitHub. It is possible to run the bot on Wikipedias in other languages, but doing so requires requesting an iThenticate account.