Commons:Bots/Requests/YiFeiBot (13)
Operator: Zhuyifei1999 (talk · contributions · Statistics · Recent activity · block log · User rights log · uploads · Global account information)
Bot's tasks for which permission is being sought: Check every image to see whether it includes {{License template tag}} or {{No license since}}. If not, add the page to Category:Media without a license: needs history check
Automatic or manually assisted: Automatic unsupervised
Edit type (e.g. Continuous, daily, one time run): Daily
Maximum edit rate (e.g. edits per minute): 6 edits per minute
Bot flag requested: (Y/N): N
Programming language(s): python: pywikipedia
Zhuyifei1999 (talk) 06:09, 27 October 2013 (UTC)
Discussion
- Please make a test run. Is it possible to go through all of a user's uploads after finding one problematic file? This would reduce clutter on user talk pages. --EugeneZelenko (talk) 14:47, 27 October 2013 (UTC)
- On hold: the bot is buggy with uploads from before 2011. I'll do another test run after a large number (maybe 10M) of null edits are done. Also, going through all of a user's uploads after finding one problematic file is hard, but I'll try with some additional SQL queries. --Zhuyifei1999 (talk) 09:25, 28 October 2013 (UTC)
- I am confused as to why you think you need 10M null edits; I did not notice any issues with older uploads. This edit to {{GNU-Layout}} set things back for a while, so for the next few months we will have to look for files lacking both {{License template tag}} and {{GNU-Layout}}. As I mentioned before, the manual pipeline we use when dealing with images with no license is:
- Check for uploads done within the last few weeks (or since the last run) that do not have a license, i.e. that lack {{License template tag}} and {{GNU-Layout}} and are not already tagged with {{No license}}, {{Delete}}, {{Speedydelete}}, {{Remove this line and insert a license instead}} or placed in Category:Media without a license: needs history check. Add {{No license}} using VisualFileChange. VisualFileChange sends one user-friendly message per user, which is very important, since people often make the same upload mistake on tens or hundreds of files and we really do not want to add hundreds of templates to their user pages.
- All older files should be added to Category:Media without a license: needs history check, since they are most likely files that "lost" their license somehow. Do not tag those with {{No license}}, since many are by users who are no longer active, and it is not their fault that some vandal or inexperienced editor removed the license and nobody noticed. Deleting admins are supposed to check the edit history before deleting, but I suspect that few of them do.
- Alternatively, we can look at the number of edits to the file. Files with a single edit, or with edits only by the uploader and categorization (and other) bots, can be tagged with {{No license}}. --Jarekt (talk) 12:24, 28 October 2013 (UTC)
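The edit-count rule in the last point can be sketched in Python as follows. This is only an illustration of the rule as described, not the bot's actual code: the `Revision` type, the `is_bot` flag and the action names are hypothetical, and how a bot account is recognized is left as an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Revision:
    """One edit in a file page's history (hypothetical, simplified model)."""
    user: str
    is_bot: bool = False  # assumption: bot accounts are flagged by the caller


def choose_action(uploader: str, revisions: List[Revision]) -> str:
    """Pick how to handle a file that has no license template.

    Returns "no_license" when only the uploader and bots have edited the page,
    otherwise "history_check", because another human editor may have removed
    the license and the history needs a manual look.
    """
    other_humans = [
        rev for rev in revisions
        if rev.user != uploader and not rev.is_bot
    ]
    return "no_license" if not other_humans else "history_check"
```

A file with a single edit by its uploader yields `"no_license"`; as soon as a non-bot editor other than the uploader appears in the history, the result is `"history_check"`.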
- Changing to files with a single edit. Anyway, the 10M null edits are for files uploaded before {{License template tag}} was created: their entries in the templatelinks table were somehow never updated, which generates false positives in the SQL query (more than 19 out of 20). Sometimes even a null edit just before the tagging won't work, so I have to do the null edits. --Zhuyifei1999 (talk) 12:47, 28 October 2013 (UTC)
- {{License template tag}} was created over 2 years ago and is now transcluded on about 18,029,597 pages (compared to 19,089,184 files present). I did not observe any false positives due to a non-current templatelinks table when querying the database over the last year. The only exceptions were files using {{GNU-Layout}}. --Jarekt (talk) 17:24, 28 October 2013 (UTC)
- Oh? Special:Permalink/108061906 & Special:Permalink/108089514 (see also the dates before the edits) were caused by this, using this SQL query:
SELECT page_title
FROM page
WHERE page_namespace = 6
  AND page_is_redirect = 0
  AND page_id NOT IN (
    SELECT tl_from
    FROM templatelinks
    WHERE tl_title = "License_template_tag"
       OR tl_title = "No_license_since"
  )
--Zhuyifei1999 (talk) 09:08, 29 October 2013 (UTC)
- Test run done at [1] (5 of the edits were deleted). But the talk page of a bot uploader points to its operator's talk page, and the bot just flooded it. --Zhuyifei1999 (talk) 09:55, 6 November 2013 (UTC)
- But what is preventing the bot from analyzing a user's entire upload history before issuing warnings? --EugeneZelenko (talk) 15:07, 9 November 2013 (UTC)
- GROUP BY uploader & list of files per uploader (in a simpler way). Maybe I can figure it out tomorrow. --Zhuyifei1999 (talk) 15:46, 9 November 2013 (UTC)
- Fixed at [2] with another task running in the middle. --Zhuyifei1999 (talk) 14:42, 10 November 2013 (UTC)
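The grouping discussed above (collect every problematic file per uploader before notifying anyone) could look roughly like this in Python. It is a minimal sketch, not the code behind [2]: the function name and the `(uploader, title)` row shape are assumptions about what a query or API call would return.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_uploader(rows: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (uploader, file title) rows so each uploader gets one file list.

    With the files grouped like this, a single talk page message per uploader
    can list all affected files instead of one message per file.
    """
    grouped: Dict[str, List[str]] = defaultdict(list)
    for uploader, title in rows:
        grouped[uploader].append(title)
    return dict(grouped)


rows = [
    ("Alice", "Sunset.jpg"),
    ("Bob", "Map.png"),
    ("Alice", "Harbour.jpg"),
]
per_uploader = group_by_uploader(rows)
# per_uploader["Alice"] now holds both of Alice's files, in input order.
```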
- Could you please make a test run again, maybe with a bigger number of samples? My concern is subpages in the File: namespace like overlay.kml. These should be skipped by the bot as non-file pages (or maybe the API has a better way to detect whether there is a file or not). --EugeneZelenko (talk) 15:06, 11 November 2013 (UTC)
- Done by ignoring all overlay.kml files. (Got some accidental errors after the last edit; these are now fixed.) --Zhuyifei1999 (talk)
- Looks like the bot could still produce false positives (File:Shipka pass (Шипка) - central Monument 2.JPG).
- I think it would be a good idea to customize the edit summary on the user talk page depending on the number of problematic files. For a single file, "(and the listed ones)" should not be used. For many files, it would be a good idea to specify the number.
- Please repeat test run after modifications.
- EugeneZelenko (talk) 15:10, 12 November 2013 (UTC)
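The suggestion about count-dependent talk page summaries can be illustrated with a short Python sketch. The summary wording below is hypothetical and is not the text the bot actually used; only the idea (no "and others" wording for a single file, an explicit number for many) comes from the comment above.

```python
from typing import List


def talk_page_summary(titles: List[str]) -> str:
    """Build an edit summary that depends on how many files are affected.

    The wording is a placeholder; only the singular/plural split is the point.
    """
    if not titles:
        raise ValueError("at least one file title is required")
    first = titles[0].replace("_", " ")
    if len(titles) == 1:
        # Single file: name it and do not mention any other files.
        return f"File:{first} has no license tag"
    # Several files: state the number explicitly.
    return f"{len(titles)} files have no license tag (File:{first} and others)"
```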
- I don't think so. See diff
- ok. --Zhuyifei1999 (talk) 15:16, 12 November 2013 (UTC)
- Sorry for my own false positive; I should have looked into the file history. --EugeneZelenko (talk) 15:26, 12 November 2013 (UTC)
- Never mind. But on hold: too many HTTP Error 503: Service Unavailable responses are happening. --Zhuyifei1999 (talk) 11:29, 13 November 2013 (UTC)
- Those seem to have improved now (ran a bot cleanly 14h ago). --Dschwen (talk) 19:02, 13 November 2013 (UTC)
By the way, the query I used for a while was:
select /* SLOW_OK */ page_title
from page
where page_is_redirect = 0
  and page_namespace = 6
  and not exists (
    select *
    from templatelinks
    where tl_from = page_id
      and tl_namespace = 10
      and tl_title in ("License_template_tag", "GNU-Layout", "No_license", "Delete", "Speedydelete", "Remove_this_line_and_insert_a_license_instead")
    limit 1
  )
I think now that maybe we should start simple: if the bot could just put all the images meeting the above criteria (and not already in this category) into Category:Media without a license: needs history check or a per-day subcategory, that would be ideal. Sometimes I run into issues where one typo in some book template causes hundreds of images to lose their license. We do not want to send several hundred notifications to one poor uploader who probably did not break the template in the first place. So a multi-step approach with a human in the loop might work better. --Jarekt (talk) 15:57, 10 December 2013 (UTC)
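To make the "start simple" proposal concrete, here is a small Python sketch that runs a query of the same shape against a toy in-memory SQLite database and also skips files already in the tracking category. The tables are a simplified stand-in that only mirrors the column names used in the queries on this page; the `categorylinks` columns (`cl_from`, `cl_to`) and the sample rows are assumptions, and a real run would go against the wiki's database replica rather than SQLite.

```python
import sqlite3

# Templates whose presence means the file is licensed or already tagged.
LICENSE_MARKERS = (
    "License_template_tag", "GNU-Layout", "No_license", "Delete",
    "Speedydelete", "Remove_this_line_and_insert_a_license_instead",
)
TRACKING_CATEGORY = "Media_without_a_license:_needs_history_check"


def build_toy_db() -> sqlite3.Connection:
    """Create a tiny stand-in for the page/templatelinks/categorylinks tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_namespace INTEGER,
                           page_title TEXT, page_is_redirect INTEGER);
        CREATE TABLE templatelinks (tl_from INTEGER, tl_namespace INTEGER, tl_title TEXT);
        CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT);
    """)
    conn.executemany("INSERT INTO page VALUES (?, ?, ?, ?)", [
        (1, 6, "Licensed.jpg", 0),
        (2, 6, "Unlicensed.jpg", 0),
        (3, 6, "Already_listed.jpg", 0),
        (4, 0, "Some_gallery", 0),  # not in the File: namespace (6)
    ])
    conn.execute("INSERT INTO templatelinks VALUES (1, 10, 'License_template_tag')")
    conn.execute("INSERT INTO categorylinks VALUES (3, ?)", (TRACKING_CATEGORY,))
    return conn


def find_candidates(conn: sqlite3.Connection) -> list:
    """Return file titles with no license marker that are not yet categorized."""
    placeholders = ", ".join("?" for _ in LICENSE_MARKERS)
    sql = f"""
        SELECT page_title
        FROM page
        WHERE page_is_redirect = 0
          AND page_namespace = 6
          AND NOT EXISTS (
            SELECT 1 FROM templatelinks
            WHERE tl_from = page_id AND tl_namespace = 10
              AND tl_title IN ({placeholders})
          )
          AND NOT EXISTS (
            SELECT 1 FROM categorylinks
            WHERE cl_from = page_id AND cl_to = ?
          )
        ORDER BY page_title
    """
    params = (*LICENSE_MARKERS, TRACKING_CATEGORY)
    return [row[0] for row in conn.execute(sql, params)]


candidates = find_candidates(build_toy_db())
```

In this toy data set only `Unlicensed.jpg` qualifies: one file carries a license template, one is already in the tracking category, and one page is outside the File: namespace. A human can then review the category before any uploader is notified.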
- Sorry, I've been busy recently. I'll look into it this Friday. --Zhuyifei1999 (talk) 11:56, 11 December 2013 (UTC)
- Changing to add the category for every match. --Zhuyifei1999 (talk) 07:26, 14 December 2013 (UTC)
- Did another test run at [3], with two reruns for improvements. --Zhuyifei1999 (talk) 08:25, 14 December 2013 (UTC)
- Is it possible to create different edit summaries depending on history size? It's not clear what the bot did: added a category or a template. --EugeneZelenko (talk) 15:30, 14 December 2013 (UTC)
- Changed the summary, assuming you mean like that. --Zhuyifei1999 (talk) 06:31, 15 December 2013 (UTC)
- Looks much more informative to me. --EugeneZelenko (talk) 15:40, 15 December 2013 (UTC)
If there are no objections, I think the task should be approved. --EugeneZelenko (talk) 15:31, 19 December 2013 (UTC)
- Agree. --Jarekt (talk) 15:42, 19 December 2013 (UTC)