Commons:Bots/Work requests/Archive 7


Category:Incomplete deletion requests - missing subpage

Can someone create a bot which can go through the pages and categories in Category:Incomplete deletion requests - missing subpage (Currently 3 files and categories) and create appropriate sub-pages if they are missing or flush the cache if they are not? I think it is quite tedious to do manually. --Sreejith K (talk) 11:02, 25 March 2012 (UTC)

I did it manually for now. But it will be good to have a bot scan the category periodically. --Sreejith K (talk) 10:43, 26 March 2012 (UTC)

Unrealistically high lifetimes

Is there a way to detect such unrealistically high lifetimes based on the categories xxxx births and xxxx deaths (e.g. difference > 105)? --Leyo 16:11, 12 April 2012 (UTC)

A toolserver query should be able to do that. Maybe you can convince MZM to add a report to Commons:Database reports, especially since it could run at Wikipedia as well. --  Docu  at 16:26, 12 April 2012 (UTC)
Also, birth should be before death. Not sure if you're checking that. --Stefan4 (talk) 16:34, 13 April 2012 (UTC)
I think this is a good idea. We can probably add it to the creator template used on about 10% of Category:People by name pages. Of course that would not be much help to the rest of the categories. --Jarekt (talk) 03:04, 15 April 2012 (UTC)
Commons:Database reports/Unbelievable life spans --MZMcBride (talk) 16:44, 22 April 2012 (UTC)
Good work. BTW maybe the cut-off should be at 123: en:Jeanne_Calment reached 122. --  Docu  at 17:22, 22 April 2012 (UTC)
@MZMcBride: Thanks a lot.
@Docu: It seems that you've already corrected most cases. I just found a few that still needed to get fixed. --Leyo 22:35, 22 April 2012 (UTC)
I tried to do some of the categories (alphabetically by page title from "Category:A .." to "Category:I .." should be done). --  Docu  at 22:40, 22 April 2012 (UTC)

You should probably talk to en:User:WereSpielChequers, since he does some similar reporting across the Wikipedias (I did a one off once, not sure if I have the code though). Rich Farmbrough, 02:48 1 May 2012 (GMT).
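The two checks raised in this thread (a gap between the "xxxx births" and "xxxx deaths" categories above some cut-off, and a birth year later than the death year) can be sketched as below. This is only an illustration, not the query behind Commons:Database reports/Unbelievable life spans; the function name, the input format (a list of category names for one page) and the default cut-off of 122 years are assumptions.

```python
import re

# Category names are assumed to look like "1850 births" / "1920 deaths",
# with or without a leading "Category:" prefix.
BIRTH_RE = re.compile(r"^(?:Category:)?(\d{1,4}) births$")
DEATH_RE = re.compile(r"^(?:Category:)?(\d{1,4}) deaths$")


def lifespan_problem(categories, max_age=122):
    """Return a short problem description, or None if nothing looks wrong.

    `categories` is the list of category names on one page (hypothetical
    input format); `max_age` is the largest lifespan accepted as plausible.
    """
    birth = death = None
    for name in categories:
        b = BIRTH_RE.match(name)
        d = DEATH_RE.match(name)
        # Only the first birth and the first death category are used.
        if b and birth is None:
            birth = int(b.group(1))
        if d and death is None:
            death = int(d.group(1))
    if birth is None or death is None:
        return None  # nothing to compare
    if death < birth:
        return "death before birth"
    if death - birth > max_age:
        return "lifespan of %d years" % (death - birth)
    return None
```

With the default cut-off, a page categorised in "1875 births" and "1997 deaths" (a difference of 122) is not reported, while a difference of 123 or more is.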

Snowbound images

There are a number of images like File:Reste des Forts Rheineck.jpg and File:Halbzeug_1.jpg which contain "2001 SNOWBOUND, ALL RIGHTS RESERVED" in the author field of the EXIF metadata. This appears to be added by the image software, Snowbound, and it does not represent the true author or the true copyright status, since these have all been uploaded to Commons by their creators under a free license. Having the metadata like this is quite confusing to the viewer (it took me a while to figure out what was going on) and potentially quite misleading to reusers, since the file will retain that metadata once it is used outside of Commons. I found 90 images with that string using Google: [1]. Would it be possible to remove or clarify the metadata for all these images? Dominic (talk) 11:18, 20 April 2012 (UTC)

 Support Unfortunately I do not know how to work with EXIF data cleanup. --Jarekt (talk) 15:37, 23 April 2012 (UTC)
This sort of thing might be more common than I thought. There are more than 100 images with "1996-98 AccuSoft Inc., All rights reserved" in the author field, which should be stripped for the same reason. Dominic (talk) 04:45, 25 April 2012 (UTC)
See also Commons:Bots/Work_requests#Filmitadka_EXIF --Jarekt (talk) 15:46, 25 April 2012 (UTC)

Filmitadka EXIF

Per this thread, can we get a bot to modify all of the EXIF data from files in Category:Images from FilmiTadka to remove the passage mentioned in that thread? Sven Manguard Wha? 19:55, 12 April 2012 (UTC)

Ps. Here's a list of all the images containing the passage (or, at least, the string "FilmiTadka is here") in their metadata:

Extended content

Ilmari Karonen (talk) 20:38, 12 April 2012 (UTC)

 Support Unfortunately I do not know how to work with EXIF data cleanup. --Jarekt (talk) 15:36, 23 April 2012 (UTC)
How should the cleanup be done? Just remove the attack text? PS: see pyexiv2 --shizhao (talk) 14:46, 25 April 2012 (UTC)
I would say remove "Image title" text and in some files like File:Aashita_Dhawan.jpg do something to "Copyright holder" field so it does not try to display Commons template. --Jarekt (talk) 15:34, 25 April 2012 (UTC)
I have removed "Image title" from the EXIF (see File:Aanchal Kumar posing with her back at Tassel style lounge launch.jpg) and suppressed the old image, but "Image title" is still there, not removed :( --shizhao (talk) 15:34, 25 April 2012 (UTC)
I just opened the new image with XnView and it still has both IPTC and EXIF tags:
  • Copyright Notice: {{cc-by-sa-3.0|FilmiTadka}}
  • Caption: FilmiTadka, the Big Daddy to tame all the bitches and pompous assess has arrived...
  • EXIF:
  • Copyright: {{cc-by-sa-3.0|FilmiTadka}}
--Jarekt (talk) 15:45, 25 April 2012 (UTC)
I only removed "Image title" in the EXIF (FilmiTadka, the Big Daddy to tame all the bitches and pompous assess has arrived...). Does MediaWiki support IPTC? --shizhao (talk) 15:54, 25 April 2012 (UTC)
I do not know. Maybe File:Aanchal Kumar posing with her back at Tassel style lounge launch.jpg is fixed now (except for {{cc-by-sa-3.0|FilmiTadka}} in the Copyright field) but it takes some time for some process to update the EXIF data record in the database. --Jarekt (talk) 16:30, 25 April 2012 (UTC)
Does MediaWiki support IPTC? Yes, for a while now. -- RE rillke questions? 10:59, 1 May 2012 (UTC)
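The cleanup asked for in this thread and in the Snowbound request above comes down to picking out the metadata values that carry an unwanted passage before the file is rewritten and re-uploaded. The sketch below shows only that selection step on a plain dictionary of tag names and values. Reading and writing real EXIF/IPTC data would need a metadata library such as the pyexiv2 mentioned above, which is not shown here. The marker strings are quoted from the requests; the function name and the dictionary layout are assumptions.

```python
# Substrings that mark a metadata value as unwanted (quoted from the requests).
UNWANTED_MARKERS = (
    "SNOWBOUND, ALL RIGHTS RESERVED",
    "AccuSoft Inc., All rights reserved",
    "FilmiTadka is here",
)


def scrub_tags(tags, markers=UNWANTED_MARKERS):
    """Return (cleaned, removed) for a dict of metadata tag names and values.

    `cleaned` is a copy of `tags` without the unwanted values and `removed`
    lists the dropped tag names. The input dict itself is left untouched.
    """
    cleaned = {}
    removed = []
    for key, value in tags.items():
        text = str(value).lower()
        # Case-insensitive substring match against every marker.
        if any(marker.lower() in text for marker in markers):
            removed.append(key)
        else:
            cleaned[key] = value
    return cleaned, removed
```

Returning the list of removed tag names makes it easy to log what was stripped from each file, which helps when answering questions about the new file version.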

Category task

A Commons user looks for the quality images and adds the categories: Quality images of #### Oblast and Quality images of people in Russia. Is it possible to make this work automatically, based on the other categories stated in the images?--PereslavlFoto (talk) 12:35, 14 May 2012 (UTC)

Most of the tasks here are one-time tasks, not continuous tasks, which might need some specialized bot. One solution might be to use Template:Intersect categories. --Jarekt (talk) 18:55, 15 May 2012 (UTC)

Eurovision Song Contest

Move Category:Eurovision to Category:Eurovision Song Contest, all "Category:Eurovision x year" to "Category:Eurovision Song Contest x year" (including Category:Eurovision 2008 to check). This is uncontroversial, and a move to the real name, that would avoid confusion with the EBU, Junior Eurovision Song Contest and the Eurovision Dance Contest. J 1982 (talk) 21:18, 25 April 2012 (UTC)

Please use (the talk page of) User:CommonsDelinker/commands. Multichill (talk) 17:02, 17 May 2012 (UTC)

mushroom

I need the description pages of a larger series of images fixed, meaning: For each number of a new set of images I need the description replaced with the one from the corresponding number of an old set.

Background:
I uploaded better versions of Sowerby's mushroom drawings a while ago.
The old ones are:
File:Coloured Figures of English Fungi or Mushrooms - t. 1.png through File:Coloured Figures of English Fungi or Mushrooms - t. 438.png,
the new ones are:
File:Coloured Figures of English Fungi or Mushrooms - t. 1.jpg through File:Coloured Figures of English Fungi or Mushrooms - t. 438.jpg.
(Take care: at least numbers 409, 423, 427 and 437 are missing.)

Now copying the appropriate description from the old file over to the replacement for each of the *.png files would already be sufficient.
Some minor other things from the "nice to have" category would be:

I would appreciate any thoughts on implementation concerns and existing methods or tools that might apply. Examples of my own varied types of batch uploads from Flickr can be found here. Thanks -- (talk) 14:30, 9 December 2012 (UTC)

It's certainly possible. The bot itself wouldn't be too hard. The summarising could be harder, but if it sent all the notifications at midnight, it could group them by user before emailing him. The texts to be used would need to be carefully designed, though. Platonides (talk) 15:21, 9 December 2012 (UTC)
I believe that all of the issues raised as the result of User:MaybeMaybeMaybe uploading thousands of Flickr images in less than a week have been the result of not understanding or ignoring COM:SCOPE and COM:PEOPLE. Flickr users are unlikely to understand our policies and may not be aware of the applicable laws, so they are likely not in a position to judge whether the upload was questionable. Having a bot notify Flickr users that their images have been uploaded to Commons is only helpful in cases where the Flickr user objects to this upload. Since there is presently no policy which allows the Flickr user to have the image deleted from Commons, this is not helpful. In fact, people who ask for images to be deleted (whether photographer, uploader, or image subject) are often rebuffed. Unless there is a way for Flickr users to have their images deleted (consistently and without unnecessary process), this is actually making the situation worse. Delicious carbuncle (talk) 16:50, 11 December 2012 (UTC)
I don't see how it makes the situation worse, although perhaps it could be made even more helpful. Maybe we should write up a summary of how the most relevant Commons policies and guidelines apply to uploads from Flickr, and add a link to it in the notification. --Avenue (talk) 16:59, 13 December 2012 (UTC)

othver_versions

In the last few days, I uploaded several dozen images which contain "othver_versions" instead of "other_versions" in the {{Information}} template. Would somebody be ready to find and replace them with a bot? Thank You! --ŠJů (talk) 22:33, 2 January 2013 (UTC)

Removed the typo in ~170 files ranging from Dec. 23 to Dec. 27. ✓ Done. --McZusatz (talk) 11:15, 3 January 2013 (UTC)
Thank You! --ŠJů (talk) 02:37, 9 January 2013 (UTC)
This section was archived on a request by: McZusatz (talk) 09:01, 9 January 2013 (UTC)

Fix file extensions

Wrong extensions

1,625 files have the wrong file extension; of those, 913 can be blindly moved as they never hosted another MIME type (Reuploads==0). —Dispenser (talk) 16:34, 22 September 2012 (UTC)

Some (a small amount) lossy saved files in jpg format could be kept if the extension is .png and the artifacts are easily removable... --McZusatz (talk) 17:11, 22 September 2012 (UTC)
I am not sure how to accomplish this, so let's consider some options:
  • A bot can easily add the {{Rename}} template to the files, and some poor file-movers will get stuck with the task of moving them by hand.
  • There might be bots already written for this, but if there are, we should make sure the "move" is not just a reupload under a new name, since such moves do not move the edit history.
  • I wonder if one of the "useful_bots_that_you_can_request_services_from" would be able to help.
  • I looked into AutoWikiBrowser and it can be used to move files in semiautomatic mode, but I would recommend not using it, since the mover cannot tell what the current media type of the file in question is. The current media type might have changed since the list was made, since someone might have reuploaded the file.
  • Anything in pywikipediabot that might be useful?
Did I miss any of the options? --Jarekt (talk) 05:02, 23 September 2012 (UTC)
Yes, the power and easiness of JavaScript. It should take less than 3h to code a script that steps through the list (and checks whether someone moved it by hand before) instructing Commons Delinker and moving these files. But Filemoving is slow so running it will take a fair amount of time if you don't want to flood the API with simultaneous requests. -- Rillke(q?) 21:37, 24 September 2012 (UTC)

Recently I went through McZusatz' move log to generate the commands for the Delinker:

// Build CommonsDelinker commands from the entries of a move-log page.
var $pre = $('<pre>'),
    t = '';
$('.mw-logline-move').each(function(i, el) {
    try {
        var $el = $(el),
            // The first File: link is the old title, the second the new one.
            $as = $el.find('a[title^="File:"]'),
            f1 = $as.eq(0).attr('title').replace(/^File:/, ''),
            f2 = $as.eq(1).attr('title').replace(/^File:/, ''),
            ex1 = f1.replace(/^.+?\.(\w{2,5})$/, '$1'),
            ex2 = f2.replace(/^.+?\.(\w{2,5})$/, '$1');

        // Skip moves that did not change the extension.
        if (!ex1 || !ex2 || ex1 === ex2) return;
        t += '{{universal replace|' + f1 + '|' + f2 + '|reason=Tech-maintenance: Adjusting file extension to MIME format:' + ex1 + '→' + ex2 + '}}\n';
    } catch (ex) {
        // Log entries without two File: links throw above and are skipped.
    }
});
// Show the collected commands at the end of the page.
$pre.text(t).appendTo('body');

Anyone else who did such moves manually?

One problem remains: Delinker does not replace x with svg. Therefore I could not update the usage for some pages: User:Rillke/universalReplace -- Rillke(q?) 21:37, 24 September 2012 (UTC)

I also did a few manual moves and noticed that in a few cases someone else already beat me to it. --Jarekt (talk) 02:17, 25 September 2012 (UTC)
  • If there is consensus to move these files (what are the benefits/ which policy my bot could cite while moving?), I'll do it. But I've already one complaint I don't really understand on my talk page. -- Rillke(q?) 11:15, 28 September 2012 (UTC)
    • I do not fully understand the .ogv situation, but image files with extensions not matching their MIME types are wrong and potentially confusing for users. The current upload software does not allow that, but we have plenty of old files. I would assume that it would fall under aims #3 and #5 of Commons:File renaming policy. I do not know if you can get consensus on this page, since not enough people might be watching it. Maybe ask at Commons_talk:File_renaming to be sure. --Jarekt (talk) 16:11, 4 October 2012 (UTC)
      .ogv should not be used for audio-only files. Such .ogv files should be renamed to either .oga or, better, to .ogg, as .oga is still not that well supported. --McZusatz (talk) 17:00, 4 October 2012 (UTC)
    • Also most of the files are "broken" when opened up either in full resolution in the browser and/or downloaded in full resolution to the local hard drive. So it should not be a problem to move all the files which only have one file version in history. --McZusatz (talk) 16:58, 4 October 2012 (UTC)

Perhaps we should wait until bugzilla:40927 is resolved? Having files disappear and then manually fix/report errors of tens of files is not what I have in mind. -- Rillke(q?) 14:48, 16 October 2012 (UTC)

Seems this won't be fixed in the near future. --McZusatz (talk) 10:51, 23 October 2012 (UTC)
I think bugzilla:39221 has made things worse for a while, but now it's again safe enough to move files. --Nemo 20:54, 25 October 2012 (UTC)
Please have a look at MediaWiki talk:Gadget-AjaxQuickDelete.js/auto-errors, especially those internal_api_error_DBQueryErrors in the last few days. -- Rillke(q?) 21:15, 25 October 2012 (UTC)
Ok it seems to be fixed. If no one opposes I will start it slowly within the next few days. -- Rillke(q?) 22:20, 17 November 2012 (UTC)
Any updates? I'd like to continue reporting problems in Commons, but I lose interest if speeds are glacial. Dispenser (talk) 17:36, 5 December 2012 (UTC)
CommonsDelinker is down so replacing now is also not a good idea. The redirects left should work but there is bugzilla:42582. Also if you replace, you have to answer various enquiries at your talk page. It's on my/our TODO-list. -- Rillke(q?) 19:06, 6 December 2012 (UTC)
Would the 600+ files with zero global usage be safe to move? Dispenser (talk) 19:10, 14 December 2012 (UTC)
Some should now disappear from the list. Not sure how to proceed with those with "conflicts" as moving back will be impossible once they have the correct extension. -- Rillke(q?) 01:33, 18 December 2012 (UTC)
I think those need manual fix (i.e. either move, revert or convert). I finished fixing all files with a usage greater than 9 as CommonsDelinker is up and running again. Also all files with more than one conflict got fixed by now. --McZusatz (talk) 18:47, 19 December 2012 (UTC)
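The decision discussed in this thread (which files have an extension that does not match their MIME type, and which of those can be moved blindly because they were never reuploaded) can be sketched as follows. This is a rough illustration only: the MIME-to-extension table, the alias table and the function name are assumptions, and the actual move plus the CommonsDelinker command are left out.

```python
import os

# Assumed mapping from the detected MIME type to the preferred extension.
MIME_TO_EXT = {
    "image/jpeg": "jpg",
    "image/png": "png",
    "image/gif": "gif",
    "image/svg+xml": "svg",
    "image/tiff": "tif",
}
# Alternative spellings that are treated as already correct.
EXT_ALIASES = {"jpeg": "jpg", "jpe": "jpg", "tiff": "tif"}


def plan_move(title, mime, reuploads):
    """Return (new_title, blind_ok), or None if no move is needed.

    `blind_ok` is True only when the file was never reuploaded, i.e. it
    never hosted another MIME type (the "Reuploads==0" case above).
    """
    expected = MIME_TO_EXT.get(mime)
    if expected is None:
        return None  # unknown MIME type: leave it to a human
    stem, ext = os.path.splitext(title)
    current = ext[1:].lower()
    current = EXT_ALIASES.get(current, current)
    if current == expected:
        return None  # extension already fits the content
    return stem + "." + expected, reuploads == 0
```

Files for which `blind_ok` is False correspond to the "conflicts" mentioned above and would still need a manual decision (move, revert or convert).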

Double extensions

For those interested, I've posted to the village pump a report of over 4,800 file names with duplicate extensions. —Dispenser (talk) 04:20, 21 November 2012 (UTC)

Many of them can be fixed by a bot, I think.
 [...]
 File:BasílicaDeSanVicente20110619105253P1120393.JPG.jpg
 File:BasílicaDeSanVicente20110619105306P1120394.JPG.jpg
 [...]
--McZusatz (talk) 18:54, 19 December 2012 (UTC)
I can paste a JavaScript here that could do the job but I won't run it, as the benefits are relatively low. And before running it, it might be worth consulting one of our bureaucrats about their thoughts (as they are the only ones commenting on the bot requests anyway). -- Rillke(q?) 15:24, 21 December 2012 (UTC)
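For names like the two examples quoted above, the new title can be computed mechanically. The sketch below collapses a doubled extension only when both parts denote the same format, so that a name such as "Map.svg.png" (a PNG rendering of an SVG) is left alone. The table of known extensions and the function name are assumptions, and this is not the script Rillke refers to.

```python
import re

# Known extensions mapped to a canonical format name (assumed, not exhaustive).
KNOWN = {
    "jpg": "jpg", "jpeg": "jpg",
    "png": "png", "gif": "gif", "svg": "svg",
    "tif": "tif", "tiff": "tif",
}
# Two consecutive extensions at the end of the title, e.g. ".JPG.jpg".
DOUBLE_EXT_RE = re.compile(r"\.([A-Za-z]{3,4})\.([A-Za-z]{3,4})$")


def collapse_double_extension(title):
    """Return the title with one extension, or None if nothing should change."""
    m = DOUBLE_EXT_RE.search(title)
    if not m:
        return None
    first, second = m.group(1).lower(), m.group(2).lower()
    # Both parts must be known and must denote the same format.
    if first not in KNOWN or KNOWN[first] != KNOWN.get(second):
        return None
    # Keep the last extension exactly as it was written.
    return title[:m.start()] + "." + m.group(2)
```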

Template language subpages which don't use the template's layout page

Can we have a bot tag template language subpages which don't use the template's layout page? Example: {{PD-Art/hy}} doesn't use {{PD-Art/layout}}, it uses {{PD-Layout}} directly, and doesn't pass the parameters to the PD-Art layout page as it should. Rd232 (talk) 10:17, 8 January 2013 (UTC)

Templates like {{PD-Art/mk}} or {{PD-Art/nl}}? --Jarekt (talk) 15:56, 9 January 2013 (UTC)
yes. But not just for PD-Art, or I'd do it myself. This is a much wider problem for many templates, and I think it would be useful to get a handle on it with a maintenance category populated by a bot. Rd232 (talk) 16:09, 9 January 2013 (UTC)
OK, User:Jarekt/d has a list of all language templates that use {{PD-Layout}} but do not call any "/layout" templates. Often they do not call them because there is not one. Also some of the templates should not be using {{PD-Layout}}. --Jarekt (talk) 16:18, 9 January 2013 (UTC)
Great, thanks. Could you do the same for other Layout templates in Category:Style formatting templates, like {{CC-Layout}}? Rd232 (talk) 13:06, 10 January 2013 (UTC)
Sure, is it OK if I limit the list to only the templates that have /layout? We can tackle the rest latter if needed. --Jarekt (talk) 17:01, 10 January 2013 (UTC)
Yes, that's fine. Thanks. Rd232 (talk) 17:05, 10 January 2013 (UTC)
There was only one irregular license template using {{CC-Layout}}: {{Photos by the Norwegian Museum of Cultural History}} and I fixed it. I will check the other layout templates. --Jarekt (talk) 17:41, 10 January 2013 (UTC)
{{GNU-Layout}} is done too. --Jarekt (talk) 20:13, 13 January 2013 (UTC)
✓ Done--Jarekt (talk) 12:33, 15 January 2013 (UTC)
This section was archived on a request by: Jarekt (talk) 12:33, 15 January 2013 (UTC)
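The check requested in this section (does a language subpage such as {{PD-Art/hy}} call its own {{PD-Art/layout}}?) can be approximated on the raw wikitext. The sketch below is a simplification under stated assumptions: it matches the name case-insensitively, ignores redirects and space/underscore variants, and the function name is hypothetical. It is not the method used to build User:Jarekt/d.

```python
import re


def uses_own_layout(template_name, subpage_wikitext):
    """Return True if the wikitext transcludes "<template_name>/layout".

    `template_name` is the base template without the namespace prefix,
    for example "PD-Art".
    """
    pattern = (
        r"\{\{\s*(?:Template:)?"
        + re.escape(template_name)
        + r"/layout\s*[|}\n]"
    )
    return re.search(pattern, subpage_wikitext, re.IGNORECASE) is not None
```

A bot could add a maintenance category to every language subpage for which this returns False while a "/layout" page exists for the base template.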

Adding template

I'd need Template:Personality rights to be added to all the images in Category:Images by Georges Biard. Some of the pictures in this category already have the template, but most don't. 99% of the images feature people, most of whom are still alive. After the work is done, I'll remove myself the template from the few images that don't feature people, and from the minority of photos which show now-deceased people. JJ Georges (talk) 09:06, 11 January 2013 (UTC)

I could add the template. Where should it go exactly?Smallman12q (talk) 14:51, 12 January 2013 (UTC)
I think it would go in the license section below the actual license, but you should also check where current templates are usually placed. --Jarekt (talk) 16:04, 12 January 2013 (UTC)
Here is one example of a picture with the template. JJ Georges (talk) 18:18, 12 January 2013 (UTC)
✓ Done-Also reverted some vandalism in templates.Smallman12q (talk) 22:33, 12 January 2013 (UTC)
Source to check
# Python 2. `commons` is the logged-in Site2 object from the script below.
# License tags that the script below knows how to handle.
LICENSES = [
    u'{{self|author=Georges Biard|cc-by-sa-3.0}}',
    u'{{self|Cc-by-sa-3.0}}',
    u'{{self|cc-by-sa-3.0}}',
    u'{{Cc-by-sa-3.0|Georges Biard}}',
    u'{{cc-by-sa-3.0}}',
    u'{{Cc-by-sa-3.0}}',
]

x = commons.getcategorymembertexts("Category:Images by Georges Biard")
for m in x:
    # Report pages that carry none of the expected license tags.
    if not any(tag in x[m] for tag in LICENSES):
        print "no" + m
Source to run
#!/usr/bin/env python
# -*- coding: utf-8 -*-

from Site2 import Site2
from p import p
import sys

def projectreplace(text,license):
    return text.replace(license,license+'\r\n{{personality rights}}')

print "Encoding is: " + sys.getdefaultencoding()
print "UTF8 check: ☠"

commons = Site2("https://commons.wikimedia.org/w/api.php")
commons.login("smallbot",p.bP)
commons.settoken("edit")

x = commons.getcategorymembertexts("Category:Images by Georges Biard")

# License tags after which the template is inserted, tried in this order.
LICENSES = [
    u'{{self|author=Georges Biard|cc-by-sa-3.0}}',
    u'{{self|author=Georges Biard|Cc-by-sa-3.0}}',
    u'{{self|Cc-by-sa-3.0}}',
    u'{{self|cc-by-sa-3.0}}',
    u'{{Cc-by-sa-3.0|Georges Biard}}',
    u'{{cc-by-sa-3.0}}',
    u'{{Cc-by-sa-3.0}}',
]

for m in x:
    # Skip pages that already carry a personality rights template.
    if (u'{{Personality rights}}' not in x[m]
            and u'{{personality rights}}' not in x[m]
            and u'{{personality rights warning}}' not in x[m]):
        newtext = x[m]
        for tag in LICENSES:
            newtext = projectreplace(newtext, tag)

        # Only save pages whose text actually changed.
        if newtext != x[m]:
            print "Updating" + m.encode('utf-8', 'ignore')
            commons.edittext(m, newtext, u'[[Commons:Bots/Work_requests#Adding_template]]: Adding {{personality rights}} to files in [[:Category:Images by Georges Biard]].')

print "Done"
Thanks a lot. I'll remove the template from the few images which don't feature human beings. JJ Georges (talk) 18:57, 13 January 2013 (UTC)
This section was archived on a request by: McZusatz (talk) 10:06, 15 January 2013 (UTC)

Hi everybody. Please move articles in "Category:Files by User:Midi7" to "Category:Files by User:Miďonek", because user was renamed. Thanks, Érico Wouters msg 01:09, 22 December 2012 (UTC)

Hi everybody, thanks for the move. Could any bot replace all "Author" parameters in my files with a new one, [[User:Miďonek|Radim Holiš]]? There are different variants of my name now. My files are available in Category:Files by User:Miďonek. I'm sorry for my bad English, hope you understand me. Merry Christmas, --Miďonek (talk) 22:14, 23 December 2012 (UTC)

Hi everybody, did you read the introduction to this page which is the same as the edit notice. Great. HNY. -- Rillke(q?) 00:12, 1 January 2013 (UTC)

 Doing… the bot is currently running.  Hazard-SJ  ✈  04:00, 16 January 2013 (UTC)

✓ Done  Hazard-SJ  ✈  19:31, 22 January 2013 (UTC)
This section was archived on a request by:  Hazard-SJ  ✈  19:31, 22 January 2013 (UTC)

Change license from PD-Art to PD-Art|PD-old

The template of {{PD-Art}} changed. Can you please replace the template {{PD-Art}} with {{PD-Art|PD-old}} in categories where it is obviously a PD-old case:

Maybe it is a good idea to change this for all the painters in Category:Paintings by painter. Geagea (talk) 02:37, 4 December 2012 (UTC)

There are 158k files in Category:PD-Art (PD-old default) so that might take a while. Ideally all files using {{PD-Art}} would also be using {{Creator}}, and {{Creator}} adds different Category:Empty tag templates depending on the year of death of the author. So one can use CatScan2 to get a list of files that are both in Category:PD-Art (PD-old default) and transclude the {{Works of authors who died more than 100 years ago}} tag. For those files we can automatically replace {{PD-Art}} with {{PD-Art-100}}. Unfortunately many files do not use {{Creator}} templates. --Jarekt (talk) 13:15, 4 December 2012 (UTC)
When doing such replacements please don't use just the generic PD-old - at least use {{PD-old-70}}. But {{PD-Art|PD-old-auto|deathyear=XXXX}} always works, and {{PD-Art|PD-old-auto-1923|deathyear=XXXX}} for works published before 1923 is even better. Rd232 (talk) 19:52, 4 December 2012 (UTC)
I converted some {{PD-Art}} to {{PD-Art|PD-old-100}} for files with creator templates, but that trick worked only for ~7k files, so 151k to go. I will try to add some more based on intersections with categories like Category:16th-century paintings. --Jarekt (talk) 17:58, 7 December 2012 (UTC)

I am giving up on that task. First, I do not see the point of changing {{PD-Art}} to {{PD-Art|PD-old}} or {{PD-Art|PD-old-70}}. For the last several years {{PD-Art}} and {{PD-Art|PD-old}} gave the same results. If we want the template in files that use {{PD-Art}} to look like {{PD-Art|PD-old}}, then the simplest way is to change the template, not 150k files. Most {{PD-Art}} can be changed to {{PD-Art|PD-old-100}} or {{PD-Art|PD-old-100-1923}}. Unfortunately all the easy cases, like files using {{Creator}}, are done and we are stuck with the rest. I tried intersections with categories like Category:16th-century art but I am finding too many images which are from the 1800s or 1900s. I do not think there is a safe way to do that by a bot. The proper way to do it would be to add more {{Creator}} templates first. --Jarekt (talk) 17:22, 21 December 2012 (UTC)

Yes, it is shameful how unclear the Commons-rules are and if so, not even admins then stick to it. Commons requires paid group of people. What is with {{PD-Art-100}}, {{PD-Art-70}}? -- ΠЄΡΉΛΙΟ 18:34, 21 December 2012 (UTC)
all the easy cases, like files using {{Creator}} are done - I'm not sure how true that is. (i) Are all PD-Art files with creator templates tagged based on the creator info? (ii) in very many cases the {{Creator}} template isn't applied, but we know from the categorisation that it should be (for "works by" categories). Could we have a user-directable bot task to add creator templates based on categories? I mean, have a page User:Bot/creator-from-category, where commands can be added in a standard format category name / creator, and the bot processes these. The bot won't always be able to recognise the existing author info and remove it as redundant, but in those cases it can just add the creator template, since that's the key thing. Rd232 (talk) 18:55, 21 December 2012 (UTC)
Apart from the creator-license, for me it is to make different licenses according to the technical creating of the image total nonsense. But that's a personal view and only incidentally. -- ΠЄΡΉΛΙΟ 19:02, 21 December 2012 (UTC)

Note: I'm finding Help:VisualFileChange.js works very well for this task (mass changes for many similar files, e.g. files from an old book (example diff)). I've knocked off c. 400 files in less than an hour (including creating some Creator templates and getting used to the idea). Process: look in Category:PD-Art (PD-old default) and identify similar filenames, then find the relevant category, check there's nothing unexpected, and do "perform batch task" on it. Rd232 (talk) 01:41, 22 December 2012 (UTC)

Great, if more people get involved then we have a chance of tackling it. Rd232, you are right, I did not get all the files with creator templates. I did get all the ones with creator templates whose authors died more than 100 years ago, but the rest is much more tricky since the best way is to use some sort of {{PD-Art|PD-old-auto|deathyear=19??}}, but that is doable. You mentioned an approach of adding creator templates to files first; I tried it several times over the years and never found a good or fast way to do it. At the end of the day you need to replace the author string with a creator template. It is quite dangerous to just delete the author string and replace it, so one needs to look at the content of the author field, and that would require some semiautomatic approach, which would take way too long if you want to process many thousands of files. I will think more about it after I finish all the files with creator templates. --Jarekt (talk) 01:21, 24 December 2012 (UTC)

At the moment there are no files in Category:PD-Art (PD-old default) or Category:PD-Art (PD-old) that have a creator template. They all now use either {{PD-Art|PD-old-auto|deathyear=19??}} or {{PD-Art|PD-old-100}}. --Jarekt (talk) 19:41, 30 December 2012 (UTC)
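The replacement described in this thread, for files where the author's year of death is known from a creator template, can be sketched like this. It is a minimal illustration, not the code that was run: the function names are hypothetical, only a bare {{PD-Art}} is touched, and the caller is assumed to have verified that the work is old enough for a PD-old tag at all.

```python
import re

# Matches only a bare {{PD-Art}} with no parameters.
BARE_PD_ART_RE = re.compile(r"\{\{\s*PD-Art\s*\}\}", re.IGNORECASE)


def specific_pd_art(death_year, current_year):
    """Pick the more specific tag from the author's year of death."""
    if current_year - death_year > 100:
        return "{{PD-Art|PD-old-100}}"
    return "{{PD-Art|PD-old-auto|deathyear=%d}}" % death_year


def replace_bare_pd_art(wikitext, death_year, current_year):
    """Replace every bare {{PD-Art}} in the wikitext with the specific tag."""
    tag = specific_pd_art(death_year, current_year)
    # A function is used as replacement so the tag is inserted literally.
    return BARE_PD_ART_RE.sub(lambda m: tag, wikitext)
```

Tags that already carry a parameter, such as {{PD-Art|PD-old-70}}, and separate templates such as {{PD-Art-100}} are not matched by the pattern and stay unchanged.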

Narrow categories on a lot of images

I have a lot of images that are currently listed at User:Nyttend/Bloomington replacement, and all of them need to have an architecture category narrowed; bot help would be appreciated, since if I counted correctly, there are 1,465 images in total. Each section contains members of a different category:

  1. Please move images in Category:Vernacular architecture of Indiana to Category:Vernacular architecture of Bloomington, Indiana
  2. Please move images in Category:Bungalows in Indiana to Category:Bungalows in Bloomington, Indiana
  3. Please move images in Category:American craftsman style in Indiana to Category:American craftsman style in Bloomington, Indiana
  4. Please move images in Category:Italianate architecture in Indiana to Category:Italianate architecture in Bloomington, Indiana
  5. Please move images in Category:Queen Anne architecture in Indiana to Category:Queen Anne architecture in Bloomington, Indiana
  6. Please move images in Category:Tudor Revival architecture in Indiana to Category:Tudor Revival architecture in Bloomington, Indiana

As well, a lot of the bungalows are also in Category:Arts and Crafts houses in Indiana. I'd appreciate it if these bungalows could simultaneously be moved to Category:Arts and Crafts houses in Bloomington, Indiana.

I created the list page by going through these categories and removing all members that aren't in Bloomington, so as long as you instruct the bot only to edit the files that are listed on this page, you shouldn't worry about the bot changing categories for images of buildings in other cities. Since the whole thing is a simple category replacement, I doubt that this will be too difficult; Avicennasis' AvicBot performed a similar task for me in July without much difficulty. Note that the new categories don't yet exist, but if someone volunteers to help with this, I'll create them. Nyttend (talk) 19:28, 18 December 2012 (UTC)

I thought I could get VFC to handle this, but it was quicker for me to do something more complex :-) I'm letting this sort itself out slowly, so it will probably take about 8 hours from now to finish. Please ensure the categories are created. Thanks -- (talk) 17:09, 9 January 2013 (UTC)
✓ Done -- (talk) 09:40, 10 January 2013 (UTC)
This section was archived on a request by: Jarekt (talk) 19:21, 25 January 2013 (UTC)
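The request above is a category replacement restricted to an explicit list of files. A minimal sketch of that logic, working on wikitext held in memory, is shown below. The function names and the `pages` dictionary are assumptions, fetching and saving pages is left out, and this is not the script that was actually used.

```python
import re


def narrow_category(wikitext, old, new):
    """Replace [[Category:old]] with [[Category:new]], keeping any sort key."""
    pattern = re.compile(
        r"\[\[\s*Category\s*:\s*" + re.escape(old) + r"\s*(\|[^\]]*)?\]\]",
        re.IGNORECASE,
    )
    return pattern.sub(
        lambda m: "[[Category:" + new + (m.group(1) or "") + "]]", wikitext
    )


def process(pages, allowed_titles, old, new):
    """Return {title: new_text} for listed pages whose text actually changes.

    `pages` maps file titles to wikitext and `allowed_titles` is the set of
    titles taken from the list page; everything else is left untouched.
    """
    changes = {}
    for title, text in pages.items():
        if title not in allowed_titles:
            continue
        updated = narrow_category(text, old, new)
        if updated != text:
            changes[title] = updated
    return changes
```

Restricting the edits to `allowed_titles` is what keeps images of buildings in other Indiana cities in their current categories.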

I need a bot performing a null-edit on all content in this cat, the only way to filter-out already resolved issues. This cat is auto-placed by wiki software but not auto-removed. Only a null-edit removes items from this cat. Cat-a-lot doesn't work here. --Denniss (talk) 17:15, 1 January 2013 (UTC)

 Doing…, I've been running a script that does this on enwp for a while, so it's rather easy. Legoktm (talk) 03:03, 3 January 2013 (UTC)
Bump to prevent archiving. --Denniss (talk) 15:30, 8 January 2013 (UTC)
I did all the subcategories. I will also do galleries, but skip user talk pages - we expect broken file links there. --Jarekt (talk) 13:09, 22 January 2013 (UTC)
I did a null-edit to all the files and galleries. --Jarekt (talk) 13:13, 25 January 2013 (UTC)
This section was archived on a request by: Jarekt (talk) 13:13, 25 January 2013 (UTC)

Template to files

Hi, can you add {{Mediagrant|Události}} to files from Category:Rogar please.--Juandev (talk) 22:41, 10 January 2013 (UTC)

✓ Done with VFC. --McZusatz (talk) 00:23, 11 January 2013 (UTC)
This section was archived on a request by: Jarekt (talk) 02:15, 26 January 2013 (UTC)

HTML artifacts from geograph

Pages like:

include HTML from the source. If there was way to clean this up by bot that would be great. --  Docu  at 13:57, 20 January 2013 (UTC)

✓ Done. I succeeded with ~3-4k files. The rest will have to be done by hand:
--Jarekt (talk) 19:19, 25 January 2013 (UTC)
This section was archived on a request by: Jarekt (talk) 19:19, 25 January 2013 (UTC)

Great. Thanks! --  Docu  at 03:59, 28 January 2013 (UTC)

The last ones are also ✓ Done --  Docu  at 04:33, 28 January 2013 (UTC)

Move images to specific scientific category from Category:Photos by Jason Hollinger (uncategorized)

Hi. I imported some images from Flickr (about 2000) and most of them are very well tagged. Most image descriptions look like this: Description = Scientific Name: ''Lessingia filaginifolia''. The bot would need to check for the scientific name and check if there is a category with this name. If there is, move the image there (and remove Category:Photos by Jason Hollinger (uncategorized)). I will manually move all the ones which get left out. Thanks! Amada44  talk to me 18:11, 30 January 2013 (UTC)

User:JarektBot is working on it, using this code. --Jarekt (talk) 14:51, 6 February 2013 (UTC)
I did what I could. The rest will have to be done by hand. I also created Category:Photos by Jason Hollinger (create new taxon category) for images where I found scientific name, but no matching category. For those I added red-link category which will have to be created. --Jarekt (talk) 16:58, 6 February 2013 (UTC)
Since the scientific name has been found, I suggest using the bot to look up the genus in Wikipedia or Wikispecies and find the higher taxa in the taxoboxes. This way, the bot can find the most specific existing categories in Commons for those images, and even create the species category and categorise it in the right place.--Pere prlpz (talk) 18:35, 8 February 2013 (UTC)
That is possible, however I have never dealt with taxon categories before and I do not want to reinvent a wheel. Are there any existing bots or codes that do that? --Jarekt (talk) 21:27, 8 February 2013 (UTC)
I automatically created a bunch of taxa categories, so now there are 64 files in Category:Photos by Jason Hollinger (create new taxon category) with possibly correct scientific names. Some of them are pretty obscure and require multiple levels of categories to be created, others are just misspelled.... --Jarekt (talk) 15:38, 11 February 2013 (UTC)
Now I have fixed some spellings. Moreover, in some cases changed taxonomy is the reason for missing categories. Cladina is regarded as a synonym of Cladonia and Dentaria as a synonym of Cardamine. --Franz Xaver (talk) 14:31, 12 February 2013 (UTC)
Ohh, I completely missed the progress! Thanks to all who helped out here. I'll do the manual work! Thank you! Amada44  talk to me 10:27, 13 February 2013 (UTC)
This section was archived on a request by: Jarekt (talk) 18:46, 16 February 2013 (UTC)

Special:WantedCategories

Hi, could a bot create all red categories in Special:WantedCategories whose names start with "Rijksmonumenten" or "RCE suggested::", with the content [[Category:Rijksmonumenten categories to be classified]]? As it stands now, Special:WantedCategories has been useless for months for detecting new batch uploaders and systematic bad category naming. Thank you. --Foroa (talk) 18:02, 16 February 2013 (UTC)
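A rough sketch of the selection step in this request, assuming the wanted titles have already been fetched into a list. The two prefixes and the page content are taken from the request; the function name and the input format are hypothetical.

```python
PREFIXES = ("Rijksmonumenten", "RCE suggested::")
CONTENT = "[[Category:Rijksmonumenten categories to be classified]]"

def pages_to_create(wanted_titles):
    """Map each matching red category title to the wikitext it should receive."""
    pages = {}
    for title in wanted_titles:
        # Accept titles with or without the namespace prefix.
        name = title[len("Category:"):] if title.startswith("Category:") else title
        if name.startswith(PREFIXES):
            pages["Category:" + name] = CONTENT
    return pages
```

Actually saving the pages is left out; that part depends on the bot framework in use.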

✓ Done --Jarekt (talk) 04:12, 17 February 2013 (UTC)
Great, thank you, Special:WantedCategories will become exploitable again. --Foroa (talk) 11:37, 18 February 2013 (UTC)
No problem, you do enough good work with categories, the last thing you need is to create two thousand categories. The only thing worrying me is that someone might stop using them one day and we will be stuck with manually deleting a few thousand categories. I do not know if there is a way to mass delete such categories. --Jarekt (talk) 19:00, 18 February 2013 (UTC)
py bot can do that. You could also tag them for speedy deletion and I think there is a script that can delete them from there. --  Docu  at 19:03, 18 February 2013 (UTC)
Good thing to know. I can manually delete ~4-5 pages per minute with AutoWikiBrowser when I am logged in with my admin account, and so far that was all I needed. However I just might look into tagging for speedy deletion, if I run into a large batch. --Jarekt (talk) 19:08, 18 February 2013 (UTC)
This section was archived on a request by: Jarekt (talk) 19:00, 18 February 2013 (UTC)

Convert all interlaced JPGs

Marat in swapdeath due to convert memory leak for an interlaced JPG

I'm currently making a list of all interlaced big JPGs on Commons, for bugzilla:17645; completing it will take about a couple of months (unless I use more of Toolserver's resources). So far, out of 25 thousand JPGs above 5 MB, 3.5% are interlaced, so there should be over twenty thousand in total. Converting them is as simple as running convert -interlace none, so maybe someone with enough CPU and bandwidth can start preparing a bot for the task. --Nemo 10:44, 17 October 2012 (UTC)

Sounds great, but I don't think the 5 MB border is a good choice. It seems the problem is with pixel count? --McZusatz (talk) 19:49, 17 October 2012 (UTC)
It's an arbitrary choice: we have to start somewhere and under 5 MB they're unlikely to break. --Nemo 21:51, 17 October 2012 (UTC)
I already made lists of progressive JPEGs (and every other image type) that ImageMagick cannot thumbnail. Why convert images which thumbnail correctly? Dispenser (talk) 21:39, 17 October 2012 (UTC)
Why? «Do not use interlaced (a.k.a. progressive) JPEG compression» said by Tim Starling seems enough of a reason. Where are your lists? I found a faster way (still sub-optimal), in 4 days my list should be ready. --Nemo 21:51, 17 October 2012 (UTC)
I think I should be able to help with converting the images using the Wikimedia Polska toolserver; CPU power and bandwidth should not be a problem (but I can only get ca. 200 GiB of HDD space for that). odder (talk) 21:56, 17 October 2012 (UTC)
It shouldn't be hard for you to write and use a script which downloads, converts and uploads on the fly (curl can be piped to convert to start with), so storage is not a problem. --Nemo 08:40, 18 October 2012 (UTC)
Yes, but recoding them to baseline means a loss in quality. And if the original file is rendering fine you should not reupload a new one. --McZusatz (talk) 11:38, 18 October 2012 (UTC)
Loss in quality with convert? How so? And how do you propose to identify which files are rendering fine? And do you think it's wise to ignore Tim Starling's recommendation? --Nemo 15:36, 18 October 2012 (UTC)
You're holding Tim Starling's opinion too high. His opinion is why only now we're getting Lua script instead of years ago and the $100,000+/year in unnecessary hardware that's cost us. The <100 unthumbnailable progressive JPEGs were added to Category:Progressive mode JPGs to be saved in Baseline mode. Our time is better spent: 1) Fix large image support in ImageMagick 2) Changing the upload wizard 3) Fix other stuff I keep blabbing about (like #Fix file extensions or TIFF's "Metadata uses too much space" error :-). Dispenser (talk) 05:43, 19 October 2012 (UTC)
I'm not spending much time on this issue, you seem to be consuming a lot to say we shouldn't though. --Nemo 19:24, 19 October 2012 (UTC)
So convert is lossless? ok.
You can determine the broken files by finding Insufficient memory (case 4) as an error message in the thumbnail url. --McZusatz (talk) 08:33, 19 October 2012 (UTC)
Doesn't scale. --Nemo 19:24, 19 October 2012 (UTC)
Once multithreaded, it scales very well, and I had to include a throttle to hold it to an average of 2-3 images per second (admittedly a bit more than robots.txt allows). Plus it performs an end-to-end check, finds missing and corrupt images, and provides thumbnails for WikiMiniAtlas. Dispenser (talk) 21:24, 23 October 2012 (UTC)
The list is finished by now? --77.2.41.90 09:28, 21 October 2012 (UTC)
Not yet. --Nemo 14:31, 21 October 2012 (UTC)

Here's the list: tools:~nemobis/interlaced-exiftool.txt.

"Loss in quality with convert? How so?" This comment makes me very uneasy. Can you give just one example file showing what the bot will do!? -- ΠЄΡΉΛΙΟ 14:13, 23 October 2012 (UTC)
I'm fairly certain that convert performs a conversion from the DCT domain into the real space domain and then just recompresses as a non-progressive JPG. This will indeed lead to round-off errors causing a degradation in image quality. If you must do this nonsensical task, please do it right, or you will do more damage than good. In theory it should be possible to perform a lossless conversion from progressive to baseline by just changing the order the DCT coefficients are stored in the file. --Dschwen (talk) 14:28, 23 October 2012 (UTC)
My "How so?" wasn't meant to make anyone uneasy, I just asked why because I didn't know. Thanks Dschwen, I've asked and I was told that jpegtran -copy all -perfect in.jpg > out.jpg is what we want. It's completely lossless and also three times faster than convert in my test. --Nemo 20:42, 25 October 2012 (UTC)
With perfect you have to trap cases where jpegtran fails, just a heads up. But otherwise, yes, this is a viable solution.--Dschwen (talk) 21:21, 25 October 2012 (UTC)
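The lossless step and the failure trapping mentioned above could look roughly like this. It is only a sketch: the jpegtran switches are the ones quoted in the thread, while the function names, the log file format and the injectable `run` parameter are hypothetical and exist mainly so the logic can be exercised without jpegtran installed.

```python
import os
import subprocess

def jpegtran_command(src):
    # Same switches as quoted in the thread; the result is written to stdout.
    return ["jpegtran", "-copy", "all", "-perfect", src]

def convert_lossless(src, dst, failed_log, run=subprocess.run):
    """Write the converted JPEG to dst and return True on success.

    On a non-zero exit status the partial output is removed and src is
    appended to failed_log for later review.
    """
    with open(dst, "wb") as out:
        result = run(jpegtran_command(src), stdout=out)
    if result.returncode != 0:
        os.remove(dst)
        with open(failed_log, "a", encoding="utf-8") as log:
            log.write(src + "\n")
        return False
    return True
```

Keeping the failures in a separate list means they can be counted and inspected before anyone decides whether a lossy fallback is acceptable.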

Ping! Nemo 09:18, 26 November 2012 (UTC)

If there is a working script to convert I can run the job. I have a mostly idle bot and cable internet access --Slick (talk) 14:56, 26 November 2012 (UTC)
I believe the task is the following pseudocode using this list: tools:~nemobis/interlaced-exiftool.txt:
foreach (string line in lines)
{
    download(line) //original
    run jpegtran -copy all -perfect line.jpg > newfile.jpg
    upload(newfile)
    delete(line) //delete local original file
    delete(newfile) //delete local new file
}

Is this what you're looking for? Slick should be able to write the script. Smallman12q (talk) 00:59, 27 November 2012 (UTC)

Yes, I can write a script, but there are some questions I need help with:

  1. How do I get the file download URL by filename with the API? (Or is it necessary to parse the HTML page of the file?)
  2. How do I upload a new version of a file? I only know Pywikipediabot/upload.py, but this only "overrides" the whole page (IMHO) (and cannot create new version information).

Maybe there are already running bot with similar jobs, where can I get the source to study? --Slick (talk) 15:50, 27 November 2012 (UTC)

For 1. I found a solution: http://commons.wikimedia.org/w/api.php?action=query&titles=File:<FILENAME>&prop=imageinfo&format=xml&iiprop=url|size but not for 2. yet. --Slick (talk) 16:13, 27 November 2012 (UTC)
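For filenames with spaces, quotes or non-ASCII characters, the query URL above needs its parameters percent-encoded. A minimal sketch using only the Python standard library follows; the endpoint and parameters are copied from the URL quoted above, and the function name is hypothetical.

```python
from urllib.parse import urlencode

API = "http://commons.wikimedia.org/w/api.php"  # endpoint as given in the thread

def imageinfo_url(filename):
    """Build the imageinfo query URL for one file, encoding the title as UTF-8."""
    params = {
        "action": "query",
        "titles": "File:" + filename,
        "prop": "imageinfo",
        "format": "xml",
        "iiprop": "url|size",
    }
    return API + "?" + urlencode(params)
```

The download URL then has to be read from the response, which this sketch does not cover.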

Looking at the source of upload.py you'll see a class called UploadRobot. You set the ignoreWarning to true to overwrite. You can see an example here.
from upload import UploadRobot
bot = UploadRobot(name.fileUrl(), description=descrip, useFilename=name.fileUrl(), keepFilename=True, verifyDescription=False, ignoreWarning=True, targetSite = commons)
bot.run()

Smallman12q (talk) 01:02, 28 November 2012 (UTC)

Right. Also, as Dschwen wrote above: remember to write to a log file the list of images which failed conversion with jpegtran -perfect, for later consideration (we'll need to check those and decide what to do with them, depending on how many they are and what quality losses they'd have). --Nemo 09:12, 28 November 2012 (UTC)

Ok, will do. Can take some days to start ... --Slick (talk) 18:39, 28 November 2012 (UTC)

The bot is running now. But I exclude all files with a " in the filename. They have to be converted by hand or by another script, because I guess rewriting the script to work with these files would be a lot of trouble. The list of these unprocessed files can be found here. The source of the running script can be found here. --Slick (talk) 18:25, 29 November 2012 (UTC)

There is a problem with some files whose cause I can't find, e.g. with this file. The logfile says the upload completed successfully, but there is no new version. Any idea? --Slick (talk) 18:46, 29 November 2012 (UTC)

I cancelled the job because there is a problem with non-ASCII chars in general. Can anybody help me fix the script? If I cat the filenames file in the Linux console all chars are fine, but during upload they are converted wrongly, e.g. "File:Nørre_Nebel_-_Kirche7.jpg" becomes "File:Nørre_Nebel_-_Kirche7.jpg". I have no idea how to fix it. Maybe pywikipedia is not UTF-8 safe? --Slick (talk) 19:51, 29 November 2012 (UTC)

E.g. another file that does not work is File:Украина,_Киев_-_Флоровский_монастырь_04.jpg --Slick (talk) 20:34, 29 November 2012 (UTC)

Not sure if this will help, but have you added "# -*- coding: utf-8 -*-" to the top per PEP 0263? You have an encoding issue somewhere. Smallman12q (talk) 23:41, 29 November 2012 (UTC)
PEPs are for Python, aren't they? But this 'coding: utf-8' stuff might be more general... don't know. What I do know is pywikipedia is "utf8 safe" (if not, it is a bug and should be reported!) but I am not sure about your sh-script... Looks like passing from the sh script to Python does not work properly... I would not mix them and would therefore write a plain Python script instead of an sh-script; then you can easily access other Python scripts (like upload.py) from there. Greetings --DrTrigon (talk) 10:09, 30 November 2012 (UTC)
I cannot write scripts in Python (and I would rather not learn the language now). If it is necessary, please help me and write a plain Python script. IMHO the bash script is fine, so the bug is not in the bash script. Please help me to get a working solution. --Slick (talk) 14:25, 30 November 2012 (UTC)
bump --Slick (talk) 05:26, 4 December 2012 (UTC)
Learning a new language - especially Python (!) - is always a good thing, since it expands your horizon. Anyway, if you would rather not, you can try to get help from one of the (other) pywikipedia developers by writing a support request, e.g. to the pywikipedia-l mailing list. It is quite probable that someone there has time for your task. Sorry that I cannot give you another answer! Greetings --DrTrigon (talk) 09:38, 7 December 2012 (UTC)
Because there isn't a working script and I can't create one, I cannot support this job any further and revert to my first statement: If there is a working script to convert I can run the job. I have a mostly idle bot and cable internet access --Slick (talk) 15:19, 9 December 2012 (UTC)
Thank you nonetheless! In the meanwhile I've completed the list of the progressive JPGs below 5 MB, they're about half a million. Nemo 23:30, 12 December 2012 (UTC)
I had a look at the involved scripts. Your shell script looks fine. And the pywikipedia scripts also look fine (They all have the # -*- coding: utf-8 -*- in the header). I could reproduce the issue when console_encoding was not set to utf-8. So could you please check if /pywikipedia/user-config.py has console_encoding = 'utf-8' in it? --McZusatz (talk) 22:01, 18 January 2013 (UTC)
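The kind of garbling described in this thread is what typically happens when UTF-8 bytes are read back under a single-byte encoding. The sketch below reproduces that effect with Latin-1; whether Latin-1 was the exact encoding involved in Slick's setup is an assumption, and both function names are hypothetical.

```python
def misdecode_as_latin1(title):
    """Reproduce the garbling: UTF-8 bytes interpreted as Latin-1 text."""
    return title.encode("utf-8").decode("latin-1")

def repair_latin1_mojibake(garbled):
    """Undo the garbling; only valid if exactly that mix-up happened."""
    return garbled.encode("latin-1").decode("utf-8")
```

ASCII-only titles pass through both functions unchanged, which would explain why only some filenames were affected.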
Big Thanks. console_encoding was not set in my /pywikipedia/user-config.py. So I will add it and run some tests next days. --Slick (talk) 17:57, 19 January 2013 (UTC)
I did some tests. Looks good. (Will keep it running.) But there are a lot of images that are already non-progressive (Baseline DCT, Huffman coding) (e.g. [6], [7], [8]). Maybe the check script which created the list (this or this, which one?) did not work properly. My new script will ignore files like these, because there is no need to convert them IMHO. --Slick (talk) 21:36, 19 January 2013 (UTC)
Great! Thank you very much (and thanks McZusatz for finding the culprit). Yes, my script was very silly because I'm not a programmer, there's an error in 7th line where it assumes that the URL given by the API is between double quotes and doesn't contain quotes in itself. There should be way less such false positives when you go on in the list and I suppose you should have checked anyway for cases where conversion was null or impossible, so I hope my error doesn't give you too much additional work? Sorry, Nemo 10:39, 20 January 2013 (UTC)
On my talk page users asked how to save images so that they are already fine. Is there an explanatory page to link to, or is it possible to create a small howto? I guess more users will ask in the near future while the bot is replacing thousands of files. --Slick (talk) 16:57, 20 January 2013 (UTC)
And can anybody please add a link to this page to the bugzilla entry, so users can find more information? (I do not have an account there yet) --Slick (talk) 17:02, 20 January 2013 (UTC)
Rillke added the link; as replied on your talk, I've created Help:JPEG#Progressive JPEGs to explain everything. --Nemo 23:09, 20 January 2013 (UTC)
  • I got a question on my talk page[9]. Is it necessary to convert all progressive images or only those that cause trouble (e.g. with thumbnails)? I am only running the bot, I did not create the lists of files to convert. I don't know. --Slick (talk) 15:43, 24 January 2013 (UTC)
    Whether thumbnailing will fail or not is not within our control, and depends on several factors which are changed without notice, including how much RAM is assigned to the processes on the imagescaling servers. Tim Starling told us not to use progressive JPGs (at all), so it's wise to follow his suggestion. --Nemo 09:42, 25 January 2013 (UTC)
    I can't agree with you Nemo: a general principle says 'if it is not broken, don't fix it.' The obvious solution is to convert only problematic images, not to convert all images here. If we have to change settings on the server side and use a lower memory threshold, we'll have to find another solution, but it is not the case currently. Seriously, the obvious solution would be to recode the thumbnail process to work within memory limit for progressive images... Or to put one more server with more memory dedicated for handling images that failed once... Esby (talk) 10:23, 25 January 2013 (UTC)
I think it is enough to convert the progressive jpegs larger than 5MB. (The list you created first). All others have a very low chance of being rejected by the image scalers. --McZusatz (talk) 11:33, 25 January 2013 (UTC)
Ok, I will only process the first list (> 5 MB), unless there is another reason here. If there is another reason, tell me. --Slick (talk) 19:26, 25 January 2013 (UTC)
And I will think about running a permanent bot to check freshly uploaded images (>5 MB), unless there is a solution. --Slick (talk) 19:28, 25 January 2013 (UTC)
IMHO the only reasonable solution for future uploads is converting all existing images so that users know about the issue: the thumbnailing process is not going to change any time soon. Otherwise we could at least add a small informative template saying that the image is interlaced and linking the help page, although reducing memory usage on the thumbnailing servers would probably be a nice side-effect of the conversion. --Nemo 10:04, 26 January 2013 (UTC)
Once an upload is done, the first thumbnails have already been created. If the image is now converted (by bot), even more thumbnails have to be created, so you need even more resources. So the absolute best way is to convert nothing. On the other hand, if you do nothing, the user does not know about the problem and it can get bigger in future. So the best solution for reducing the load on thumbnailing is to inform the user before/during the upload IMHO. Maybe there is a way to detect progressive files during the upload and ask the user "Are you sure..?". But until this is solved, I agree with you that the best way should be to convert all (newly uploaded) progressive images, to inform the user and make him think about it (because I guess users do not like a bot touching their files). (And I agree with the others that converting ALL (older, smaller) progressive images is too much, but not impossible) --Slick (talk) 11:06, 26 January 2013 (UTC)
Informing the user during upload is not that easy and it's not going to happen soon. We also don't want to discourage newbie uploaders. I think the best solution is to add a simple template to all those interlaced images that we're not going to convert, just to link Help:JPEG as informative material for the reusers and the uploaders who keep them in watchlist.
We're just guessing here, but I'm not sure reupload increases resources, because the thumbs that must be (re)created are very few (120, 220, 800px, not much more and sometimes less) while the thumbs that can be requested (on which we'd save memory) are unlimited; in fact we have 6 millions used images but about 130 millions thumbnails.[10] --Nemo 11:30, 28 January 2013 (UTC)
  • The list with images >5MB is ✓ Done. Because there isn't a clear agreement about the other images, I will do this now: I will rewrite the bot. Then I will run it with the list containing possibly progressive images <5MB, but it will not convert them, just add Template:Use_baseline (if progressive). So please, update the template and add a link to Help:JPEG, maybe the bug # too! (Other check bots (maybe my future one) can add this template to other progressive images.) If there is an agreement on what to do (convert them or not) we can process all these files later. And please talk about what to do now to resolve disputes amicably. --Slick (talk) 17:54, 30 January 2013 (UTC)
  • I found an important bug so I can't process the list. If I download a file, sometimes an old version is returned. E.g., I ran the same command twice and got different files:
$ wget --no-cache -q "http://upload.wikimedia.org/wikipedia/commons/9/9b/%22The_Jacobite%22_approaching_Beasdale_Station_-_geograph.org.uk_-_1023985.jpg" -O - | exiftool - | grep "Encoding Process"
Encoding Process                : Progressive DCT, Huffman coding
$ wget --no-cache -q "http://upload.wikimedia.org/wikipedia/commons/9/9b/%22The_Jacobite%22_approaching_Beasdale_Station_-_geograph.org.uk_-_1023985.jpg" -O - | exiftool - | grep "Encoding Process"
Encoding Process                : Baseline DCT, Huffman coding

It looks like there is a bad cache in the round robin or something like that. I don't have a bugzilla account, so can anybody please post this or contact a cache admin? Example files are File:"The_Jacobite"_approaching_Beasdale_Station_-_geograph.org.uk_-_1023985.jpg and File:"Onion-skin"_renal_arteriole.jpg

--Slick (talk) 13:25, 2 February 2013 (UTC)
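The check above relies on exiftool. As an alternative illustration (not the method used in the thread), the same distinction can be read directly from the JPEG start-of-frame marker: SOF2 (0xFFC2) and the other progressive SOF variants indicate a progressive file, while SOF0 (0xFFC0) is baseline. The function name and return values below are hypothetical.

```python
import struct

PROGRESSIVE_SOF = {0xC2, 0xC6, 0xCA, 0xCE}  # progressive start-of-frame markers
NOT_SOF = {0xC4, 0xC8, 0xCC}                # DHT, JPG, DAC share the 0xCn range

def jpeg_encoding(data):
    """Return 'progressive', 'non-progressive', or None if no frame header is found."""
    if data[:2] != b"\xff\xd8":              # every JPEG starts with SOI
        return None
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            return None                      # not positioned on a marker
        marker = data[pos + 1]
        if marker == 0xFF:                   # fill byte before a marker
            pos += 1
            continue
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:
            pos += 2                         # standalone marker, no length field
            continue
        if marker in (0xD9, 0xDA):           # EOI or SOS reached before any SOF
            return None
        if 0xC0 <= marker <= 0xCF and marker not in NOT_SOF:
            return "progressive" if marker in PROGRESSIVE_SOF else "non-progressive"
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        pos += 2 + length                    # skip this segment
    return None
```

Only the header needs to be read for this, so fetching the first part of each file would normally be enough.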

So, as discussed on #wikimedia-tech: as long as you don't process files you've already converted, this should be ok now. It was a cache purging bug which is now fixed and didn't exist when the lists were made. I'll check how many small files you already converted, they can probably be purged by hand. I'm also editing the template. --Nemo 13:57, 2 February 2013 (UTC)
Ok, all works fine now. The bot that is running this job is NonProgressive and is waiting for confirmation. --Slick (talk) 14:35, 2 February 2013 (UTC)
I am not sure if this has a positive effect. The bot run will add a lot of noise to many pictures. The template suggests to upload a non interlaced image. But this was discussed not to do for very small images. Besides that it would be a waste of time for many contributors to reupload a (maybe recompressed) version and remove the template tag. Also many images (e.g. from geograph.org.uk) will most likely wear this tag forever.
I think it would be better to inform all uploaders and leave the JPEGs as they are. --McZusatz (talk) 15:05, 2 February 2013 (UTC)
"The template suggests to upload a non interlaced image." That's the goal as I understand Nemo. And we don't want to convert the smaller images ourselves, as discussed. After the job is finished we will have a fresh overview of how many images there are, and then we can discuss it again. Removing the template, if it is really not needed, is very fast. --Slick (talk) 15:14, 2 February 2013 (UTC)
IMO this is going way too far now, and is liable to cause more problems than it solves (as a result of image degradation in the conversion process). Progressive JPGs are problematic, but only if they are large. Future changes to server hardware and software might mean smaller files causing problems. However, it strikes me as absurd to think 1MB files will start causing trouble in the future, when 5MB files are consistently OK now - do we really think WMF's servers will degrade like that? And if they do, it's probably going to be associated with other more serious degradation needing developer fixing, not user workarounds.
There is zero benefit to converting small files - such as File:057332 cdaaab8b-by-Martin-Bodman.jpg - the thumbnails generate just fine, who cares if it's progressive or not? IMO even tagging such files with {{Use baseline}} is too much, why put a problem template on a file, when there isn't a problem with it? A tracking template which puts such files in Category:Progressive mode JPGs would be fine, but the problem template and Category:Progressive mode JPGs to be saved in Baseline mode should be reserved for those files where the conversion would actually be beneficial.
If many of the Geograph uploads are progressive, then thousands of bot-uploaded images will be tagged, yet that will not affect uploader behaviour at all, and will cause unnecessary labour when trying to fix the larger files. In short: Leave small files alone (such as <1MB).--Nilfanion (talk) 20:11, 2 February 2013 (UTC)
Small files are not going to be converted, as said above. The template suggests to convert only "large images", which is quite generic and left to the user's judgement: many will discover a bug in their photoshop or a checkbox they didn't notice in their GIMP and decide not to reupload but to be careful next time, and so on. It's just informative. If you think the suggestion is excessive, you can edit the template: I agree that it's a bit too big; it can always be made into an invisible template that adds just a category if that's the consensus. It doesn't have any use as "template for bigger interlaced JPEGs only". --Nemo 21:01, 2 February 2013 (UTC)
To find a consensus, what about this: just add all found (small) progressive files to Category:Progressive mode JPGs and write a short summary on this category to inform the user about. Move all files currently in Category:Progressive mode JPGs to be saved in Baseline mode to this category and remove the template. But then I like to get a definition what is not a small image but a "big progressive image that makes trouble" to add them to Category:Progressive mode JPGs to be saved in Baseline mode instead of Category:Progressive mode JPGs by bot. --Slick (talk) 21:47, 2 February 2013 (UTC)
Please stop your bot ASAP, there's no need to fill the problem cat with thousands of non-problematic images. To be on the safe side with those progressive JPG I'd adjust the filesize limit down to below 4MB so everything larger should be converted. All other images should just get an informal template. Please have a look at Template:Progressive mode JPG I just created with my limited skills. --Denniss (talk) 01:54, 3 February 2013 (UTC)
Stopped and waiting how to resume now. --Slick (talk) 08:01, 3 February 2013 (UTC)
{{Use baseline}} reads like a user message, not a file template. A file template needs to say "This file is a progressive JPG". The new template made by Denniss works, but I'd prefer it being much less intrusive (smaller text, not red) - or hidden entirely - as it's informational only, about something not-at-all important to that file and is likely to sit there forever. One template could be used for this - add a switch like |convert=yes when conversion is required, and the switch can make the template more prominent, alter the text to "change this file" and put it in the problem cat.--Nilfanion (talk) 08:35, 3 February 2013 (UTC)
I've incorporated the proposed changes in the old template, why should we use a new one? It's not a good idea to tell "this is big" or "this is not big", we don't have precise limits to use. Nobody is using the old category to find file in need of a fix, anyway: there is Category:Images without thumbnails for that. --Nemo 11:54, 3 February 2013 (UTC)
I've toned the template down, and directed it to the broader category. If the template is being used on all progressives, it should not say (or imply by its design) that the file it is on is bad. It should say large progressives can cause problems, and should be converted. It should not say "this file should be converted", regardless of whether it has problems or not. The category redirect is for the same reason - it should only be in a cat that says it needs to be saved as baseline, if it actually needs to be converted to baseline. The old cat can, and IMO should, be used to track the files which need to be converted as opposed to all progressives.-Nilfanion (talk) 19:08, 3 February 2013 (UTC)
  • Just for information ... sometimes I hate helping with this job. One person says: do this .. another says: no, do this. - the third says: not at all .. do this. I am a bit confused about the processes here. I like to help, but no one tells me what the right thing to do is so that I do not get in trouble. I do not like having to explain what I am doing when I do what I was asked to do. I don't know about your skills, so I don't know which comment is the important one and which one is right. Please talk about what to do and, when finished, write on my talk page to ask me to run the bot. I think I have a bit of knowledge in scripting, but I am not a ball to play around with. If you don't know what the job to be done is, do not request it. Over and out. --Slick (talk) 10:11, 3 February 2013 (UTC)
    You're right... --Nemo 11:54, 3 February 2013 (UTC)
    • I do not know if converting files is a good idea or not, but I think that putting a template in all them is worthless and quite disturbing. Furthermore, I can't find in these long discussion any consensus about doing it.--Pere prlpz (talk) 13:08, 3 February 2013 (UTC)
  • I support conversion of images that do not thumbnail correctly, but I am opposed to cluttering file description pages of images that display correctly with a notice about rather obscure technical detail. There is virtually no benefit to this and it will only irritate the majority of users (especially non-technical types). --Dschwen (talk) 15:18, 3 February 2013 (UTC)
Proposal:
  • Stop adding the template to those old images (As mentioned above it is unlikely to have any effect other than to irritate. Furthermore it is very unlikely that those files will need conversion to Baseline anytime)
  • Remove the template from the images
  • Decide on what to do with newly uploaded progressive images... (Proposal: Do a bot run every week and convert all progressive JPEGs greater than [insert number here] MB to baseline.) --93.132.125.226 15:38, 3 February 2013 (UTC)
+1, if number is 0.1. ;-) I'm all for the broadest automatic conversion possible, so that users almost never have to worry about all this. --Nemo 18:09, 3 February 2013 (UTC)
Users do not have to worry about them anyways. Don't fix it if it is not broken. --Dschwen (talk) 00:29, 4 February 2013 (UTC)
Whatever the number is, please do not forget to define what about images less than [number] MB too. Keep it untouched or add a template or just a hidden category? --Slick (talk) 19:02, 3 February 2013 (UTC)
Keep them untouched. It does not make sense to touch them. At the most you could use an invisible template or hidden category. --McZusatz (talk) 19:17, 3 February 2013 (UTC)
As I understand Nemo, the problem is not the missing thumbnail, but the resources needed on the servers to create thumbnails for progressive JPEGs. So the problem is not visible to the "normal" user! So the problem is progressive files in general, independent of size. So I can understand Nemo if he would like to convert all images. On the other hand I can understand the "normal" user, who sees a fine thumbnail and has no knowledge of the server load. The current progressive JPEG template may explain the wrong problem. It now explains "missing thumbnail for large progressive JPEGs". So currently this template is wrong on small images, right. It should be like "this progressive JPEG increases the server load. We have to convert it to save our server hardware." so everybody can understand the problem is independent of the size. Then there is no need to discuss a file size and we can convert all to save the hardware. Or we now find a usable file size (e.g. 3 MB), and files larger than that are converted by bot and smaller ones are kept untouched. But as I understand the original request and Nemo, this does not resolve the problem of server load. And we would like to solve this problem, or not? --Slick (talk) 11:48, 5 February 2013 (UTC)
Don't worry about the servers - the fact progressives take a bit longer to generate is not an issue worth fixing. An en-masse conversion of the small files will generate more work for the server (will need to create new thumbs).
The problem is when the progressive is so large that the server cannot correctly generate a thumbnail.--Nilfanion (talk) 13:53, 5 February 2013 (UTC)
As I said: This only makes sense for newly uploaded files. We should agree on one file size to be converted to baseline. Currently there are three proposals: 0.1; 3 and 5 Mb. --McZusatz (talk) 17:06, 5 February 2013 (UTC)

File size is a nonsensical criterion. The success or failure of thumbnailing is due to RAM limitations, which in turn are due to pixel count. I suggest reuploading anything above 20 megapixels. If you want to lower the limit, show me a file at your proposed limit that does not thumbnail correctly! As simple as that. --Dschwen (talk) 18:08, 5 February 2013 (UTC)

As multiple users said: There is no problem with the thousands and thousands of existing images. Please remove the bot spam from all these description pages. It is completely useless. It's like a template "warning, this image uses a lossy format, consider using PNG" or "warning, this image uses colors and may not be accessible for colorblind users". This is pointless. From what I know it's not possible to convert all images without losing quality. Even if the conversion is lossless it adds nothing. All it does is waste space and clutter our histories and watchlists. If there is a problem with a specific image being progressive you should fix that single image, but not all other images that don't have a problem. Create a template and a bot like Template:Rotate. Fix the broken images. Add a hint to the file upload form about large progressive images or simply block uploading large progressive images. --TMg 20:11, 5 February 2013 (UTC)

OK, I will summarize the opinions above:

  • Stop adding the template to old images - IMHO we agree on this point - the bot has already been stopped for some days.
  • Remove the template (as currently set) from the images - IMHO we agree on this point - I will do this ASAP.
  • Do not touch existing images smaller than 5 MB (those larger than 5 MB are already converted) - IMHO we agree on this point.
  • A limit in pixels is better than a limit in MB - IMHO that is the best solution, so we should agree here.

IMHO, we do not yet agree on these points:

  • What is the limit for taking any action: 20 megapixels, or less, or more?
  • Should we convert all progressive images above this limit automatically (by bot), or should we only convert images without valid thumbnails (because of the bug), in which case the limit is pointless?
  • Should we add all progressive images (independent of any limit in size or pixels) to a hidden category, without a visible template, perhaps to inform the user? (I think not, but I am not sure. It would still be better than adding a comment on every uploader's talk page.)

--Slick (talk) 16:15, 7 February 2013 (UTC)

The limit is pretty much pointless. Only convert if the thumbnail is broken. Do not touch images or image pages of progressive JPGs that correctly thumbnail at all. --Dschwen (talk) 16:54, 7 February 2013 (UTC)

A more productive exercise would be to simply monitor new uploads. If the file is both a progressive JPG and big - big defined as something like "over X MB" or "over X megapixels" - tag it with a template. A human can review and either (1) remove the tag as unnecessary, (2) convert the file and remove the tag or (3) change the tag to one saying conversion is needed (to be done by bot or another user). If the bot can reliably detect when thumbnail failure occurs itself, then human review can be skipped and the bot could either tag for conversion, or immediately convert.

This process ought to identify any problematic files, only convert files that need it, and any templates would be both temporary and only applied to files where there is a realistic chance of problems.--Nilfanion (talk) 02:01, 8 February 2013 (UTC)

OK, since there are no further opinions yet, I suggest the following (in summary):

  • The bot monitors new uploads, and IF a new image
    • is greater than 20 MPixel OR greater than 20 MByte (in both cases regardless of whether there is a valid thumbnail)
    • AND is a progressive JPEG
  • THEN (there is a realistic chance of problems, regardless of whether there is currently a valid thumbnail) the bot will add Template:Use baseline, so a human can review it (to convert the file by hand and/or just to remove the template)

Currently I do not know how many new images will match this, but I think there are not many. Only if there are many should we create an additional category, e.g. Category:Progressive mode JPGs to be saved in Baseline mode by bot, so that a human can move files there after review to have them converted automatically if necessary. Confirm? --Slick (talk) 10:18, 13 February 2013 (UTC)
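
The rule summarized above (progressive JPEG AND more than 20 MPixel OR more than 20 MByte) can be sketched in a few lines of Python. This is only an illustrative sketch, not the code of the bot discussed here: the function names `jpeg_frame_info` and `needs_baseline_review` are hypothetical, 1 MByte is assumed to mean 1,000,000 bytes, and the file is assumed to be available as raw bytes. It reads the first start-of-frame (SOF) segment of the JPEG, which holds the image dimensions and whose marker code shows whether the encoding is progressive.

```python
import struct

# Start-of-frame markers that carry the image dimensions.
# 0xC4 (DHT), 0xC8 (JPG) and 0xCC (DAC) are deliberately not in this set.
SOF_MARKERS = {0xC0, 0xC1, 0xC2, 0xC3, 0xC5, 0xC6, 0xC7,
               0xC9, 0xCA, 0xCB, 0xCD, 0xCE, 0xCF}
# Subset of SOF markers that indicate progressive DCT encoding.
PROGRESSIVE_SOF = {0xC2, 0xC6, 0xCA, 0xCE}


def jpeg_frame_info(data):
    """Return (is_progressive, width, height) from the first SOF segment,
    or None if the data does not look like a JPEG with a readable frame."""
    if data[:2] != b"\xff\xd8":          # missing SOI marker
        return None
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:            # lost marker synchronization
            return None
        marker = data[pos + 1]
        if marker == 0xFF:               # fill byte before a marker
            pos += 1
            continue
        if marker in (0x01, 0xD8) or 0xD0 <= marker <= 0xD7:
            pos += 2                     # standalone marker, no length field
            continue
        if marker in (0xD9, 0xDA):       # EOI or scan data before any SOF
            return None
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker in SOF_MARKERS:
            if pos + 9 > len(data):
                return None
            # Layout: FF Cn, length(2), precision(1), height(2), width(2)
            height, width = struct.unpack(">HH", data[pos + 5:pos + 9])
            return (marker in PROGRESSIVE_SOF, width, height)
        pos += 2 + length                # skip this segment
    return None


def needs_baseline_review(data, max_megapixels=20.0, max_megabytes=20.0):
    """Apply the proposed rule: progressive AND (> 20 MPixel OR > 20 MByte).
    The thresholds are the values proposed in the discussion."""
    info = jpeg_frame_info(data)
    if info is None:
        return False
    progressive, width, height = info
    too_big = (width * height > max_megapixels * 1_000_000
               or len(data) > max_megabytes * 1_000_000)
    return progressive and too_big
```

A caller would read the uploaded file with `open(path, "rb").read()` and tag the file page only when `needs_baseline_review` returns `True`. Whether the thumbnail is actually broken is a separate check that this sketch does not attempt.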

Yes. It would be nice to have a rough idea of how many files are affected. --93.132.81.103 13:51, 17 February 2013 (UTC)
I did a trial run with all images uploaded in the last 14 days. 32 images matched (greater than 20 MB or 20 MPixel, and progressive), i.e. about 2 images/day. Only one of them has a broken thumbnail (I have already added the template manually). I am still unsure what to do now. Start as suggested above or not? --Slick (talk) 08:04, 19 February 2013 (UTC)
I did some tests with progressive JPEGs recently, but I could not determine a precise limit in MP at which files begin to break. It seems to depend on the file structure itself, too. As only 2 files a day are filtered out by your bot, let it convert all the files it finds directly. IMO it is not worth more discussion and wasted time fiddling around with technical details, which are sometimes quite vague, or creating further templates/categories that increase the workload of contributors for this rather minor issue. --McZusatz (talk) 19:03, 19 February 2013 (UTC)
It is not hard to test if the files picked out through this threshold do thumbnail correctly. A bot that unnecessarily reuploads files that are not broken is unlikely to get approval. --Dschwen (talk) 19:21, 19 February 2013 (UTC) P.S.: progressive seems to be a new default setting in recent versions of the GIMP!!
Yes, it is possible to detect only files with broken thumbnails, but the chance of a false positive is a bit higher. So IMHO a human review is necessary before any automatic conversion. On the other hand, for (as currently known) ~1 image/14 days it is absolutely senseless to run and support a bot, IMHO. I would like to run the bot if it has a bit more to do (as suggested above: 20 MB/20 MPixel, convert or add a template), but I will not run a bot to identify only ~2 images/month (with broken thumbnails, where everybody can see there is trouble). --Slick (talk) 07:39, 20 February 2013 (UTC) --Slick (talk) 19:35, 20 February 2013 (UTC)
OK, I have thought about it again and there is a new goal for the bot, see here (bottom), so I would like to do as suggested by Dschwen (identify and convert only broken progressive JPEGs). I will rewrite the bot now, and once everything is running fine this task can be closed in my view. --Slick (talk) 19:35, 20 February 2013 (UTC)
This section was archived on a request by: McZusatz (talk) 20:31, 21 February 2013 (UTC)

request

Remove the {{rename}} template from all files in Category:Media renaming requests needing target (except Category:Al Jazeera files with bad file names); renaming is not appropriate for all of these files. Thank you --Steinsplitter (talk) 15:02, 20 February 2013 (UTC)

Can you provide more background on how so many images ended up with incorrect {{Rename}} templates? (if it is easier, you can write in German). --Jarekt (talk) 15:40, 20 February 2013 (UTC)
The images were uploaded by a bot (the bot uploaded the images with the "rename" template); see: 1. No valid reason is given for renaming the images. --Steinsplitter (talk) 15:48, 20 February 2013 (UTC)
(Translated from German:) I consider this a malfunction or misconfiguration of the bot; some images really do need renaming, but not all. Checking all the images by hand would be a huge effort. In addition, no valid reason was given for why the images should be renamed. Mass rename requests should generally go into a separate category, not the main category. --Steinsplitter (talk) 16:02, 20 February 2013 (UTC)
OK, I will remove {{Rename}} from files from Category:Files from Simpio96 stream. I also added Category:Files from Simpio96 stream to Category:Media renaming requests needing target. Please keep it there until filenames in this category are verified and possibly corrected. --Jarekt (talk) 17:54, 20 February 2013 (UTC)
✓ Done --Jarekt (talk) 20:32, 20 February 2013 (UTC)
I think we need at least another category to keep track of already renamed files. Otherwise files with "wrong" and "correct" names are stored in the same cat. and file movers are unable to efficiently move files. --McZusatz (talk) 21:11, 20 February 2013 (UTC)
I do not speak the language the filenames are written in, but most do not seem wrong to me. I think that you can just eyeball the names and change the strange ones. However, if additional categories are needed, they are easy to add. Possibly anybody with VisualFileChange can do it without a bot? --Jarekt (talk) 12:32, 21 February 2013 (UTC)
Thank You!--Steinsplitter (talk) 21:51, 20 February 2013 (UTC)
This section was archived on a request by: Jarekt (talk) 20:44, 25 February 2013 (UTC)

Dates on images

Some of the information templates in the above category have the date from Flickr. Either this should be removed (sample) or replaced by the date found elsewhere in the description (sample). --  Docu  at 02:00, 13 February 2013 (UTC)

✓ Done Good idea. --Jarekt (talk) 17:03, 5 March 2013 (UTC)
This section was archived on a request by: Jarekt (talk) 17:03, 5 March 2013 (UTC)

Shtooka typo in spelling files

Task summary: Replace Sktooka by Shtooka.

Task details: in Category:English pronunciation, replace the word Sktooka by the expression [http://shtooka.net/ Shtooka] (a link makes it easier to understand what Shtooka is).

Note: There are too many files in the category to do this replacement with VisualFileChange. --Dereckson (talk) 16:15, 25 February 2013 (UTC)
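
The text substitution requested above can be sketched as follows. This is a minimal illustration only, not the code of any bot mentioned in this thread: the function name `fix_shtooka` is hypothetical, and fetching and saving the file description pages is left out. It replaces the misspelled word as a whole word and reports how many replacements were made, so pages without the typo can be skipped.

```python
import re

# Match the misspelling only as a whole word.
TYPO = re.compile(r"\bSktooka\b")


def fix_shtooka(wikitext, link="[http://shtooka.net/ Shtooka]"):
    """Return (new_wikitext, number_of_replacements).
    The default link is the external link given in the task details."""
    # A function is used as the replacement so that the link text is
    # inserted literally, without any escape processing.
    return TYPO.subn(lambda match: link, wikitext)
```

A page would only be saved when the returned count is greater than zero. Swapping the `link` argument for a wikilink is a one-line change if the community prefers that form.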

I will file my bot to do this later today. Although, it is up to the community to decide if "Shtooka" is linked or not. Riley Huntley (talk) 18:20, 25 February 2013 (UTC)
Filed at Commons:Bots/Requests/RileyBot 2. Riley Huntley (talk) 18:29, 25 February 2013 (UTC)
My suggestion here would be to link this to Commons:Shtooka and write a short description there. This page could point out why using this program/site is a good choice for commons users and should of course include a link to the webpage. I prefer wikilinking over external linking in a case like this, where we are talking about hundreds of link occurrences. --Dschwen (talk) 18:54, 25 February 2013 (UTC)
  • That works for me, if we are going to go with the wikilink; we can go ahead with this task without waiting for Commons:Shtooka to be created since it will just be a red link until it is created (which shouldn't take long). Dereckson, would you be able to create the page? Riley Huntley (talk) 19:07, 25 February 2013 (UTC)
  • So there are about 3500 occurrences of the typo Sktooka. It is probably best to change that to the correctly spelled external link first. And then we can talk about wikilinking it. Changing everything to wikilinks would increase the size of the task by more than one order of magnitude. --Dschwen (talk) 22:22, 25 February 2013 (UTC)
  • Yeah, I'd remove them. The linking makes the term stand out enough, the apostrophes look a bit odd around the linked version. But that is probably a matter of taste. --Dschwen (talk) 00:51, 26 February 2013 (UTC)
That sounds good. --Jarekt (talk) 17:04, 5 March 2013 (UTC)
Agree, sounds good. --Dschwen (talk) 22:26, 6 March 2013 (UTC)
This section was archived on a request by: Riley Huntley (talk) 14:36, 8 March 2013 (UTC)