Wikipedia:Bot requests/Archive 9
This is an archive of past discussions on Wikipedia:Bot requests. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current main page.
Dead link bot?
Is it possible to make a bot that checks to see if links are dead? The Placebo Effect 01:38, 14 December 2006 (UTC)
- Yes, but it would be quite complicated. The bot would have to ensure that it wasn't a problem with your connection, and that it wasn't just a temporary server outage. —Mets501 (talk) 01:46, 14 December 2006 (UTC)
- It's possible, if distributed or tested on multiple hosts. --Jmax- 06:35, 14 December 2006 (UTC)
- This would actually be quite a good idea, I think. As above, it would probably be best to have clone code running on two servers, and to check each link twice at a 48-72 hour remove to rule out temporary network problems. Such a bot would be made redundant if the proposed WebCite was ever implemented, but until then I think it would be useful; after finding a dead link, it could post a warning notice to the article's talk page perhaps? Sounds like a good idea. - PocklingtonDan 14:49, 18 December 2006 (UTC)
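To make the double-check idea concrete, here is a minimal Python sketch; the HEAD-request approach, the timeout, and the 48-hour delay are illustrative assumptions rather than any existing bot's behaviour:

```python
import time
import urllib.error
import urllib.request

RECHECK_DELAY = 48 * 3600  # 48 hours between the two checks, per the idea above

def link_is_dead(url, timeout=30):
    """Return True if a HEAD request to the URL fails or returns an error."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        urllib.request.urlopen(req, timeout=timeout)
        return False
    except urllib.error.HTTPError as err:
        return err.code >= 400      # 4xx/5xx responses count as dead
    except (urllib.error.URLError, OSError):
        return True                 # DNS failure, refused connection, timeout

def confirm_dead(url):
    """Only report a link dead if it fails twice, RECHECK_DELAY apart."""
    if not link_is_dead(url):
        return False
    time.sleep(RECHECK_DELAY)       # a real bot would re-queue for a later run
    return link_is_dead(url)
```

Running the same check from two hosts, as suggested above, would additionally rule out problems local to one connection.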
- I could code something like this in Perl, using POE, if needed, but I wouldn't be able to run it long-term. --Jmax- 14:52, 18 December 2006 (UTC)
- The best way to do this is probably to check the links and then post the bot's summary of the links on the talk page. I can program in Java, but I have no clue how to make a bot that checks websites and evaluates them. The Placebo Effect 02:36, 21 December 2006 (UTC)
- The pywikipedia framework contains a script for doing just this, which I'll happily tweak to this purpose and leave running on my server. The only "problem" I can see is how it would grab the links to be checked, as grabbing every external link, even from an API or database dump, would take a fair while (I don't even want to imagine how many external links there are in total on Wikipedia). I'd still be willing to do this, but it's going to be a long project, not just an overnight AWB run! ShakingSpirittalk 03:19, 21 December 2006 (UTC)
- I was assuming it would check all the external links in an article, then post on the article's talkpage. The Placebo Effect 03:25, 21 December 2006 (UTC)
- Yup, should be easy to do that; my point was that going through every single article from A-Z checking links will take a fair amount of time, and isn't too 'friendly' to the server ^_^ ShakingSpirittalk 03:31, 21 December 2006 (UTC)
- I realize my request is probably unreasonable, but I just had the thought that perhaps after finding a deadlink the bot could find a link to a cached version (on google or the wayback machine or somewhere) and link to that instead. Vicarious 15:25, 26 December 2006 (UTC)
- Finding a cached dead link on an Internet archive such as WebCite is easy - the syntax is http://www.webcitation.org/query.php?url=deadlink (or http://www.webcitation.org/query.php?url=deadlink&date=date for a certain cached date). However, the bot would never know which version the author meant to cite - in the case of dynamically changing websites, that's a problem. That's why I made a proposal some time ago [1] to prospectively archive (cache) all cited URLs on Wikipedia, which is ridiculously easy using WebCite [2]. Writing a bot which prospectively adds "cached version" links to all cited links in new articles (thereby eliminating the problem of broken links in the first place) would make much more sense than just detecting broken links. I also proposed a policy change on citing sources suggesting that authors should add links to cached versions to their links [3] [4] as much as possible - but a bot would help to make this a quasi-standard. --Eysen 18:08, 26 December 2006 (UTC)
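For illustration, a tiny helper that builds those WebCite lookup URLs using exactly the query.php syntax given above; the function name is hypothetical, and the expected format of the date parameter is an assumption:

```python
from urllib.parse import urlencode

def webcite_query_url(dead_url, date=None):
    """Build a WebCite lookup URL for a dead link, per the syntax above."""
    params = {"url": dead_url}
    if date is not None:
        params["date"] = date  # date format expected by WebCite is assumed
    return "http://www.webcitation.org/query.php?" + urlencode(params)

# Example:
# webcite_query_url("http://example.com/page")
# -> "http://www.webcitation.org/query.php?url=http%3A%2F%2Fexample.com%2Fpage"
```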
- Couldn't the bot check the page history for when the link was added and assume that is the version to use? Vicarious 23:02, 26 December 2006 (UTC)
- To the best of my knowledge, there's no easy way to check when the link was added short of going through each edit in the page history and scraping it; a solution which is ugly, and wastes both the bot user's bandwidth and the server's resources. I have, however, come up with another idea ^_^ ShakingSpirittalk 00:38, 27 December 2006 (UTC)
- EDIT: I was wrong; you can grab the page history in a bandwidth- and parsing-friendly manner. Still, personally I don't think every dead link should be automatically replaced with an archived version, as sometimes the information the link contained is out of date - sometimes links go dead for a reason! I'd like to hear others' opinions ^_^ ShakingSpirittalk 00:44, 27 December 2006 (UTC)
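As a rough illustration of that history-grabbing approach (not necessarily the method ShakingSpirit had in mind), the MediaWiki API can return revision timestamps and content in a single request; this minimal Python sketch omits continuation and error handling, and the function name is hypothetical:

```python
import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"

def first_revision_with_link(title, url, limit=50):
    """Return the timestamp of the earliest fetched revision containing url."""
    query = urllib.parse.urlencode({
        "action": "query", "prop": "revisions", "titles": title,
        "rvprop": "timestamp|content", "rvdir": "newer",
        "rvlimit": limit, "format": "json",
    })
    with urllib.request.urlopen(API + "?" + query) as resp:
        data = json.load(resp)
    page = next(iter(data["query"]["pages"].values()))
    for rev in page.get("revisions", []):
        if url in rev.get("*", ""):   # "*" holds the revision wikitext
            return rev["timestamp"]
    return None                       # not found in the first `limit` revisions
```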
I would happily code something for this; however, I have concerns regarding WMF policy on using WebCite and other proprietary methods of caching web sites. -- Jmax- 09:22, 27 December 2006 (UTC)
Please look at Wikipedia:Dead external links — Iamunknown 01:50, 29 December 2006 (UTC)
Like the MMORPG project, we need a bot to add some tags for our banner. The banner is Template:SGames or {{SGames}} and is located here. We have a list of categories to put them in, and they are:
- Category:Real-time strategy computer games
- Category:Turn-based strategy computer games
- Category:Age of Discovery computer and video games
- Category:Free, open source strategy games
- Category:Panhistorical computer and video games
- Category:Abstract strategy games
- Category:Chess variants
- Category:Tic-tac-toe
- Category:Strategy computer games
- Category:Real-time tactical computer games
- Category:Economic simulation games
- Category:Strategy game stubs
- Category:City building games
- Category:God games
If someone could make a bot or teach us how, that would be great. Thanks, Clyde (talk) 02:00, 21 December 2006 (UTC)
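For reference, a minimal sketch of such a tagging run, written against the modern pywikibot library (the successor of the pywikipedia framework mentioned elsewhere on this page); the banner placement, the edit summary, and the decision to skip subcategories are assumptions drawn from this thread, not tested bot code:

```python
import pywikibot

BANNER = "{{SGames}}"
CATEGORIES = [
    "Category:Real-time strategy computer games",
    "Category:Turn-based strategy computer games",
    # ... the rest of the category list above ...
]

site = pywikibot.Site("en", "wikipedia")
for name in CATEGORIES:
    # recurse=False skips subcategories, as discussed below in this thread
    for page in pywikibot.Category(site, name).articles(recurse=False):
        talk = page.toggleTalkPage()
        text = talk.text if talk.exists() else ""
        if BANNER not in text:
            talk.text = BANNER + "\n" + text
            talk.save(summary="Tagging with " + BANNER + " for WikiProject")
```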
- I can do that for you Saturday - to confirm, you'd like {{SGames}} added to all articles in those categories? If there are subcategories, should I include them? ST47Talk 02:07, 22 December 2006 (UTC)
- Um, yes, the template is correct, and I found some unneeded subcategories, though I'm still going through them. I found that Category:Virtual toys doesn't fit. I also removed Category:Chess and Category:Strategy (they have too much irrelevant info), so if you find those as subcategories, don't add them. Actually, are you more experienced with this? Would it be better just to not add to any subcategories, and I'll personally add them later? What's your call?--Clyde (talk) 05:30, 23 December 2006 (UTC)
- Sorry for the delay; Comcast decided to stab my internet. I'll start that now, without subcategories just in case. ST47Talk 15:35, 27 December 2006 (UTC)
- Okay thanks.--Clyde (talk) 15:53, 28 December 2006 (UTC)
Image bot idea
Hello. I have had a bot idea. Would it be possible for a bot to scan through every image that comes under Category:Non-free image copyright tags - of which there are tens of thousands - and flag all those that are over a certain file size, set of dimensions, or resolution? This is because fair use only allows for a low-resolution image, and I have spotted a veritable crapload that are nowhere near being low resolution.
A very clever bot could then automatically downscale and overwrite the image with a lower res version, and automatically leave a note on the talk page of the original uploader.
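A hedged sketch of that flag-and-downscale step, using the Pillow imaging library; the pixel threshold is an arbitrary stand-in for "low resolution", which (as noted below) is subjective:

```python
from PIL import Image

MAX_PIXELS = 100_000  # illustrative cutoff, roughly 316x316

def downscale_if_large(path):
    """Downscale the image in place if it exceeds MAX_PIXELS; True if changed."""
    img = Image.open(path)
    w, h = img.size
    if w * h <= MAX_PIXELS:
        return False                    # already small enough; flag nothing
    scale = (MAX_PIXELS / (w * h)) ** 0.5
    img.thumbnail((int(w * scale), int(h * scale)))  # preserves aspect ratio
    img.save(path)
    return True
```

A real bot would of course upload the reduced file as a new version and notify the uploader, rather than overwriting local files.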
Is this even technically feasible? Proto::► 13:19, 27 December 2006 (UTC)
- It's possible, but I don't think it's feasible. Low resolution is quite subjective (relative to the original image, that is). Then again, I guess a 2 meg image is likely not "low resolution." ---J.S (T/C) 04:49, 29 December 2006 (UTC)