deidetected.com, a self-published source potentially used for harassment

This website, launched and run by the creator of the "Sweet Baby Inc detected" Steam curator, would fall under the definition of a self-published source on Wikipedia. The Steam curator has been linked to the harassment campaign against Sweet Baby Inc. by reputable sources like PC Gamer, The Verge, and multiple others.

Wikidata has a page for the website, and the website has been linked via the described at URL property by User:Kirilloparma on more than one, if not every, occasion. Even within the scope of that source, it is done in a very targeted way: the website seems to be added to Wikidata pages only when the game is recommended against at deidetected.com (e.g. The First Descendant, Abathor, and Valfaris: Mecha Therion, which deidetected recommends as "DEI FREE", do not have the property set). Based on that, the goal of harassment or POV pushing appears to be evident.

Does Wikidata have any guidelines that would explicitly allow or disallow this behavior or the coverage of deidetected.com at all? Daisy Blue (talk) 09:45, 14 September 2024 (UTC)

There is no policy on WD for blacklisting websites other than in malicious cases such as spam or malware. Trade (talk) 11:59, 14 September 2024 (UTC)
Having now read the property description for described at URL on its talk page, which explains that it's for "reliable external resources", I'm convinced the website has no place on Wikidata, as it's not a reliable source (at least not per the guidelines of Wikipedia (WP:RSSELF)). What is the best place to initiate its removal without starting a potential edit war? A bot would also do a more efficient job of removing it from all the pages. Daisy Blue (talk) 12:03, 14 September 2024 (UTC)
You might have more luck if you stopped bringing up Wikipedia guidelines and used the Wikidata ones instead. Trade (talk) 00:09, 15 September 2024 (UTC)
Wikidata itself cites the Wikipedia guidelines on self-published sources (and on original research). Daisy Blue (talk) 05:04, 15 September 2024 (UTC)
English Wikipedia policy is in many cases useful for deciding what should be done in Wikidata (e.g. which sources are reliable), but it should never be considered normative and has no more authority than the policies of any other project. GZWDer (talk) 06:37, 15 September 2024 (UTC)

This could be used to mass-undo 18 of the edits that introduced the links, but it's not progressing when I try. Daisy Blue (talk) 11:14, 15 September 2024 (UTC)

It seems like a low-quality, private website that doesn't add anything of value to our items. There are countless websites out there, but we generally don't add every single site via described at URL (P973) just for existing. IIRC, there were various cases in the past where users added unreliable websites to lots of items, which were then considered spam and deleted accordingly. And if the site's primary purpose is indeed purely malicious and causing harassment, there's really no point in keeping it. Best to simply put it on the spam blacklist and keep the whole culture war nonsense out of serious projects like Wikidata. Additionally, DEIDetected (Q126365310) currently has zero sources, indicating a clear lack of notability. --2A02:810B:5C0:1F84:45A2:7410:158A:615B 13:50, 15 September 2024 (UTC)

I've already nominated that item and Sweet Baby Inc detected for deletion, citing the same reason. For the curator specifically, one could stretch point 2 of Wikidata:Notability to argue against deletion, but I'm not sure what value it would bring to the project apart from enabling harassment and being used to justify other related additions. Daisy Blue (talk) 16:06, 15 September 2024 (UTC)
Just add this website to the spam blacklist; no one will be able to add links to this website on Wikimedia projects anymore. Midleading (talk) 17:18, 16 September 2024 (UTC)
What's the proper venue for proposing that? Also, seeing that you have a bot, could you suggest a quick way to mass-remove the remaining instances from Wikidata? I've already undone a number by hand, but it's not the greatest experience. Knowing how may also help in the future. Daisy Blue (talk) 18:24, 16 September 2024 (UTC)
On the home page of Meta-Wiki, click Spam blacklist and follow the instructions there.
To clean up links to this website, I recommend External links search. A WDQS search is likely to time out. I also recommend reviewing each case manually; sometimes the item should be nominated for deletion, but tools can't do that. Midleading (talk) 01:27, 17 September 2024 (UTC)
Thanks. I'll remove the rest by hand then. As for the Wikimedia spam blacklist, it says that "Spam that only affects a single project should go to that project's local blacklist". I'm not sure whether there have been any attempts to cite deidetected on Wikipedia or elsewhere. We can search for live references (there are none), but I don't think we can search through potentially reverted edits. Daisy Blue (talk) 07:33, 17 September 2024 (UTC)
Well, you may request that this website be banned on Wikipedia first; then you may find some users who agree with you. Midleading (talk) 08:45, 18 September 2024 (UTC)
I believe Wikipedia has the same policy, in that if it hasn't been abused (and I wouldn't know if it has been, specifically on Wikipedia), then there is no reason to block it. On Wikidata, as it stands now, the additions come from one user, Kirilloparma, who pushed back on my removals here but hasn't reverted them. Unless it becomes a sustained effort by multiple users, it will come down to whether Kirilloparma concedes that described at URL is for reliable sources and that the website is not a reliable source. Daisy Blue (talk) 12:14, 18 September 2024 (UTC)
For some reason Kirilloparma keeps making points on the subject on the Requests for deletions page rather than here (despite having been informed), now arguing that the short property description takes precedence over the property documentation on the talk page, which is dismissed as "outdated". Daisy Blue (talk) 09:29, 20 September 2024 (UTC)
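Midleading's External links search suggestion can also be run through the MediaWiki Action API. A minimal sketch, assuming the `list=exturlusage` query module on www.wikidata.org; the helper names are hypothetical, no request is actually sent here, and each hit would still need the manual review Midleading recommends.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://www.wikidata.org/w/api.php"


def exturlusage_url(domain: str, protocol: str = "https", limit: int = 500) -> str:
    """Build an API URL listing pages that link to `domain` (hypothetical helper)."""
    params = {
        "action": "query",
        "list": "exturlusage",
        "euquery": domain,        # search string, given without the protocol
        "euprotocol": protocol,   # restrict matches to links using this protocol
        "eunamespace": 0,         # main namespace, where items live
        "eulimit": limit,
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


def titles_from_response(response: dict) -> list[str]:
    """Extract page titles (item IDs) from a decoded exturlusage JSON reply."""
    return [hit["title"] for hit in response.get("query", {}).get("exturlusage", [])]


print(exturlusage_url("deidetected.com"))
```

The resulting titles are the items to inspect by hand; the API only finds links that are currently live, not ones already reverted.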
  • Wikidata has items for many websites even if those websites are worthy of criticism. Knowing that "Sweet Baby Inc detected" is linked to "DeiDetected" is useful information even if both of those sources are completely unreliable.
I don't see any link to deidetected.com within Wikidata being used for the purpose of harassment in a way that would justify putting it on a blacklist. ChristianKl 13:09, 26 September 2024 (UTC)
The whole purpose of that website is to incite harassment, so intentionally linking to it within Wikidata directly contributes to that problem. --2A02:810B:5C0:1F84:2836:F2FD:EE77:CF71 19:38, 28 September 2024 (UTC)
@ChristianKl: Quite frankly, your comment is insensitive, and I agree with the IP. Note that the OP did say that the only edits adding them have been to "recommended against" games' items, so I'm afraid your point does not stand. Other than information on the sites themselves, we really should not provide "described at" claims linking them to people. Such claims are arguably a gross violation of Wikidata:Living people.--Jasper Deng (talk) 19:41, 28 September 2024 (UTC)
What part of Wikidata:Living people do you believe is violated here, and by which edits?
Instead of focusing on what the OP said, why don't you look yourself to get an impression of what we talk about?
The OP asked for the item to be deleted. Currently DEIDetected (Q126365310) does link to Sweet Baby Inc detected (Q124830722). The described at URL (P973) claims on Sweet Baby Inc detected (Q124830722) seem to me to go to relatively neutral sources like Wired, saying things like "Although early efforts began on sites like notorious harassment hub Kiwi Farms last year, much of the misinformation about Sweet Baby has coalesced around Sweet Baby Inc Detected, a Steam curation group that bills itself as “a tracker for games involved with” the company." ChristianKl 13:40, 2 October 2024 (UTC)
I don't oppose the existence of these items or the existing claims you quoted. It is when these claims are added to particular games' items that it begins to create problems for the games' developers by inviting harassment targeted around their alleged ties to Sweet Baby and other organizations.--Jasper Deng (talk) 18:18, 2 October 2024 (UTC)
@Kirilloparma: Please do not reintroduce any of these links in the future. Doing so is a violation of Wikidata:Living people on the grounds of privacy.--Jasper Deng (talk) 19:47, 28 September 2024 (UTC)

I have boldly block-listed the domain on Wikidata. In accordance with the Wikimedia Foundation DEI principles, linking a low-quality harassment site in a way that causes LP violations is not appropriate. Exceptions, such as for items on articles covering the site, can be handled using edit requests. I request that the blacklisting stand unless an explicit consensus arises against it.--Jasper Deng (talk) 20:05, 28 September 2024 (UTC)
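For readers unfamiliar with the mechanism, a spam blacklist entry is a regular expression that is checked against URLs added in an edit. A minimal sketch, assuming the usual one-regex-per-line style of MediaWiki:Spam-blacklist; the entry and the wrapping pattern below are illustrative approximations, not the exact rule that was added or the SpamBlacklist extension's internal code.

```python
import re

# Hypothetical entry in the one-regex-per-line style; the dot is escaped so it
# only matches a literal "." and \b keeps look-alike domains from matching.
ENTRY = r"\bdeidetected\.com\b"

# Rough approximation of how an entry is anchored to the host part of a URL
# (scheme, then any subdomain characters, then the entry); assumed, not copied.
PATTERN = re.compile(r"https?://+[a-z0-9_\-.]*(" + ENTRY + r")", re.IGNORECASE)


def is_blocked(url: str) -> bool:
    """Return True if the URL's host matches the blacklist entry."""
    return PATTERN.search(url) is not None
```

With this anchoring, subdomains such as www.deidetected.com are caught, while a different domain that merely mentions the name in its query string is not.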

Dict: protocol

Do we have a server for the dict: protocol, as described in this blog post and at DICT?

Curiously, if I type dict:cheese in the search bar here, I am taken to https://www.wikidata.org/wiki/Special:GoToInterwiki/dict:cheese (and similar if I do so on en.Wikipedia, etc.*), which displays:

Leaving Wikidata

You are about to leave Wikidata to visit dict:cheese, which is a separate website.

Continue to https://www.dict.org/bin/Dict?Database=*&Form=Dict1&Strategy=*&Query=cheese

and not to a Wikidata entry (nor a Wiktionary page**). Can we get that changed?

[* doing so on fr.Wikipedia still takes me to an English definition; does it do so for people whose browsers use other languages?]

[** Also raised at wikt:Wiktionary:Grease pit/2024/September#Dict: protocol]. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:15, 18 September 2024 (UTC)

See phab:T31229.--GZWDer (talk) 16:02, 18 September 2024 (UTC)
It's kind of retro-sexy but also incredibly niche. Since the WMF offers servers to useful projects, you could set up a server that offers Wikidata or Wikipedia over gopher, telnet (BBS-style or non-interactive) or dict. I didn't intend to register a developer account, but for something fun like this I might change my mind. I can hardly think of a better way of procrastinating. Be warned though, I might be compelled to do the implementation in D (a language designed by Walter Bright and Andrei Alexandrescu, both C++ heavyweights) just because I want more experience with it. I reckon a single Docker instance will do, which doesn't require much formality, so this could be up and running without too much delay. The main thing would be agreeing on how the protocols should be queried. Infrastruktur (talk) 17:39, 19 September 2024 (UTC)
Had a quick look at it. I guess the dict protocol makes more sense for Wikipedia and Wiktionary than it does for Wikidata, as Wikidata doesn't have short definitions. It seems most dict servers serve a Unicode text file consisting of key-value pairs of dictionary entries and their definitions. If we don't expect much traffic, I think an approach where we skip compiling a dictionary and merely act as a gateway, transforming the first paragraph of Wikipedia articles into plain text and stripping out any templates, should be sufficient. I might also scrap the plan to use D for this and just use good old Python. It looks like lookups will also be exact, so no search suggestions or anything like that. No user authentication is required either, but people might like support for encryption.
On a related note, I use bang codes in my browser bar to look things up. If DuckDuckGo is configured, you can just type "!wd Q12345", "!wen Marco Polo" or "!mw gourd" to look things up quickly. Lots of dictionaries are supported [1]. Infrastruktur (talk) 17:45, 25 September 2024 (UTC)
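Infrastruktur's gateway idea maps fairly directly onto the DICT protocol (RFC 2229). A minimal sketch of the server's reply to a `DEFINE` command, assuming an in-memory mapping from headword to an already-stripped plain-text lead paragraph and a hypothetical database name `wikipedia`; fetching and cleaning the Wikipedia text, the TCP server itself, and other commands such as MATCH and SHOW DB are left out. Lookups are exact, as in the comment.

```python
CRLF = "\r\n"


def dot_stuff(text: str) -> list[str]:
    """Split text into lines, doubling a leading '.' as RFC 2229 text blocks require."""
    return [("." + line) if line.startswith(".") else line for line in text.splitlines()]


def define_response(word: str, entries: dict[str, str], db: str = "wikipedia",
                    db_desc: str = "Wikipedia lead paragraphs") -> str:
    """Format the reply to 'DEFINE <db> <word>' using exact-match lookup only."""
    text = entries.get(word)
    if text is None:
        return "552 no match" + CRLF
    lines = [
        "150 1 definitions retrieved",
        f'151 "{word}" {db} "{db_desc}"',   # header for one definition
        *dot_stuff(text),
        ".",                                # a lone dot terminates the text block
        "250 ok",
    ]
    return CRLF.join(lines) + CRLF
```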
@Infrastruktur: "Wikidata doesn't have short definitions" On cheese (L4517), for example, at L:L4517#S1, I can see "milk-based food product". But then we also have cheese (L331133), so I guess we would need to query all lexemes with a label matching the desired string. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:04, 25 September 2024 (UTC)
Forgot about lexemes. It might be easier to handle lexemes with the same name than Wikipedia disambiguation pages, where it's not clear which of the links is a definition of the word. This protocol is all new to me, so there are lots of things still to figure out. Infrastruktur (talk) 23:43, 25 September 2024 (UTC)
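Andy's suggestion of querying all lexemes whose lemma matches the requested string could be done against the Wikidata Query Service. A minimal sketch, assuming English lexemes (language item Q1860) and the lexeme vocabulary predefined on WDQS (`wikibase:lemma`, `ontolex:sense`, `skos:definition`); the function names are hypothetical, the escaping only handles quotes and backslashes, and a real gateway would still have to decide how to present several lexemes (such as L4517 and L331133) for one headword.

```python
from urllib.parse import urlencode

WDQS = "https://query.wikidata.org/sparql"


def sparql_string(text: str) -> str:
    """Quote text as a SPARQL string literal (basic escaping only)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def lexeme_gloss_query(lemma: str, lang_qid: str = "Q1860", lang_code: str = "en") -> str:
    """SPARQL returning every sense gloss of lexemes whose lemma equals `lemma`."""
    return f"""
SELECT ?lexeme ?sense ?gloss WHERE {{
  ?lexeme dct:language wd:{lang_qid} ;
          wikibase:lemma {sparql_string(lemma)}@{lang_code} ;
          ontolex:sense ?sense .
  ?sense skos:definition ?gloss .
  FILTER(LANG(?gloss) = "{lang_code}")
}}"""


def wdqs_url(query: str) -> str:
    """URL asking WDQS for JSON results (no request is sent here)."""
    return f"{WDQS}?{urlencode({'query': query, 'format': 'json'})}"
```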

aircraft engine (Q743004) and models (also many classes similar to aircraft engine (Q743004))

The class aircraft engine (Q743004) has problems in Wikidata. aircraft engine (Q743004) is a subclass of physical object (Q223557), which means that its instances are physical objects. But almost all the instances of aircraft engine (Q743004) are not physical objects, instead mostly being aircraft engine models like Poinsard (Q7207885).
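The problem follows from how instance of and subclass of combine: an item is an instance of every class reachable from its P31 value by following P279, the same path WDQS queries express as `wdt:P31/wdt:P279*`. A toy sketch of that inference, using the QIDs mentioned in the paragraph but a deliberately shortened hierarchy (the real chain from aircraft component up to physical object is assumed here to be a single step; it is longer on Wikidata).

```python
# Simplified toy slice of the ontology; not copied from Wikidata.
SUBCLASS_OF = {
    "Q743004": {"Q16693356"},   # aircraft engine -> aircraft component
    "Q16693356": {"Q223557"},   # aircraft component -> physical object (shortened)
}
INSTANCE_OF = {
    "Q7207885": {"Q743004"},    # Poinsard -> aircraft engine
}


def superclasses(cls: str) -> set[str]:
    """All classes reachable from `cls` via one or more P279 steps."""
    seen, stack = set(), [cls]
    while stack:
        for parent in SUBCLASS_OF.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def inferred_types(item: str) -> set[str]:
    """Classes an item is an instance of under P31 followed by P279*."""
    types: set[str] = set()
    for cls in INSTANCE_OF.get(item, ()):
        types |= {cls} | superclasses(cls)
    return types
```

Poinsard, an engine model, thus comes out as a physical object, which is the mismatch being described.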

What should be done to fix this problem? The simplest fix would be to just make aircraft engine (Q743004) no longer be a subclass of physical object (Q223557) and other classes that cause similar problems, perhaps by replacing aircraft engine (Q743004) subclass of (P279) aircraft component (Q16693356) with aircraft engine (Q743004) is metaclass for (P8225) aircraft component (Q16693356). But that would leave all the labels and descriptions for aircraft engine (Q743004) as they are, and they would not correspond to the actual intent of the class. Changing just the English label and descriptions would be possible but would create a difference in meaning between the labels in different languages. Adding an English value for Wikidata usage instructions (P2559) would help a bit but would not solve the mismatch. Changing all the descriptions doesn't seem immediately possible.

A variation of the first option would be to add a new class for aircraft engines, transfer all the labels and descriptions and aliases to the new class, correctly place the new class in the Wikidata ontology, give the new class appropriate label and description and aliases, and make the model instances also be subclasses of the new class.

Another option would be to make all the aircraft engine models that are currently instances of aircraft engine (Q743004) subclasses of it instead, perhaps also making the models instances of some suitable metaclass like engine model (Q15057021). This option probably requires many more statement changes than the previous approaches. If this is done, adding an appropriate English value for Wikidata usage instructions (P2559) on aircraft engine (Q743004) seems indicated.
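The statement changes in this option can be sketched as a rewrite rule on a toy in-memory representation (no actual edits are made). A minimal sketch, assuming items are stored as sets of (property, value) pairs; the function name is hypothetical, and `keep_as_instance` is an optional list of items, such as individual physical engines, that should stay instances.

```python
AIRCRAFT_ENGINE = "Q743004"   # aircraft engine
ENGINE_MODEL = "Q15057021"    # engine model, the metaclass suggested in the paragraph


def migrate(statements: dict, keep_as_instance=frozenset()) -> dict:
    """Rewrite 'P31 aircraft engine' as 'P279 aircraft engine' + 'P31 engine model'.

    `statements` maps an item ID to a set of (property, value) pairs; the input
    is not modified. Items listed in `keep_as_instance` are copied unchanged.
    """
    result = {}
    for item, claims in statements.items():
        new_claims = set(claims)   # copy so the caller's data stays intact
        if item not in keep_as_instance and ("P31", AIRCRAFT_ENGINE) in new_claims:
            new_claims.discard(("P31", AIRCRAFT_ENGINE))
            new_claims |= {("P279", AIRCRAFT_ENGINE), ("P31", ENGINE_MODEL)}
        result[item] = new_claims
    return result
```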

Given what I have seen in related classes, I expect that there are many classes that have the same problem so perhaps the best way forward is to consider the problem in general and come up with a general solution.

Does anyone have preferences between these approaches? Does anyone have a different approach to fix this problem? Does anyone know the best way to gather a community that could come up with a consensus decision? Peter F. Patel-Schneider (talk) 15:32, 18 September 2024 (UTC)

The standard solution would be to create a new item "aircraft engine model" as a metaclass for aircraft engine. Fgnievinski (talk) 23:50, 19 September 2024 (UTC)
We'd save a lot of work generating a parallel *_model hierarchy if something could be an instance of (P31) product model (Q10929058) and a subclass of (P279) (or some other property) its functional class. Vicarage (talk) 00:17, 20 September 2024 (UTC)
Care to give an example? Fgnievinski (talk) 02:03, 20 September 2024 (UTC)
Peter and I are working on an RfC. Vicarage (talk) 20:51, 21 September 2024 (UTC)
Wikidata:Requests for comment/object vs design class vs functional class for manufactured objects is now ready for comment. Vicarage (talk) 14:15, 26 September 2024 (UTC)
Sadly I think the first thing you need to do is make sure aircraft engine (Q743004) is used as a subclass of (P279), not as an instance of (P31), as it currently is 237 times. Once that's done, I don't see why you can't leave the subclass of (P279) aircraft component (Q16693356) alone and work up the chain to find the point where a design idea erroneously becomes a physical thing. Naively, it seems to me you can have design all the way down to a final instance of (P31) for a single object. But you know a lot more about ontology than I do. Really it should be called "aircraft_engine_model", like we have for weapon model (Q15142894), but I don't like the idea of doubling up by adding _model to every concept to have 2 parallel trees. I see we have engine model (Q15057021), so perhaps, as we do with weapons, we make the 237 aircraft engines use that, and keep looking for generic terms at the level above a specific design.

Vicarage (talk) 16:31, 19 September 2024 (UTC)

Yes, I'm coming around to this approach of moving lots of items from instance of to subclass of aircraft engine (Q743004), except that there might be a few instances that are actual physical objects (like aircraft engines in museum collections). That probably involves the most actual changes but the fewest changes to the Wikidata ontology. Peter F. Patel-Schneider (talk) 16:52, 19 September 2024 (UTC)
Those actual objects should, at a pinch, have P31 of physical_object directly, or, for museum items, item of collection or exhibition (Q18593264). I expect we can come up with useful reasons why we care about a physical instance of a manufactured design. Vicarage (talk) 16:58, 19 September 2024 (UTC)

Enabling the CampaignEvents Extension on Wikidata

The Campaigns Product team at the Wikimedia Foundation is proposing to enable the CampaignEvents extension on Wikidata by the second week of October.

This extension is designed to make it easier for organizers to manage community events and projects on the wikis, and it makes it easier for all contributors to discover and join events and projects on the wikis. Once it's enabled on Wikidata, you will have access to features that will help with planning, organizing, and promoting events/projects on Wikidata.

These features include:

  • Event Registration: A tool that helps organizers and participants manage event registration directly on the wiki.
  • Event List: A simple event calendar that shows all events happening on the wiki, particularly those using the Event namespace. It will also be expanded soon to have an additional tab to discover WikiProjects on a wiki.
  • Invitation Lists: A feature that helps organizers identify editors who might be interested in their events, based on their editor history.

Please note that some of these features, like Event Registration and the Invitation List, require users to have the Event Organizer right. When the extension is enabled on Wikidata, the Wikidata admins will be responsible for managing the Event Organizer right on Wikidata. This includes granting or removing the right, as well as establishing related policies and criteria, similar to how it’s done on Meta.


We invite you to help develop the criteria/policy for granting and managing this right on Wikidata. As a starting point for the discussion, we suggest the following criteria:

  1. No active blocks on the wiki.
  2. A minimum of 300 edits on Wikidata.
  3. Active on Wikidata for at least 6 months.
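If admins want to apply the starting criteria consistently, the data is available from the MediaWiki API. A minimal sketch, assuming one decoded user entry from `action=query&list=users&usprop=blockinfo|editcount|registration`; it reads "active for at least 6 months" as "account registered at least about 6 months ago", which is a simplification, and the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

MIN_EDITS = 300
MIN_AGE = timedelta(days=183)   # "at least 6 months", approximated by account age


def meets_starting_criteria(user: dict, now: datetime) -> bool:
    """Check one user entry from the API reply against the proposed criteria."""
    if "blockid" in user:                 # block fields appear only for blocked users
        return False
    if user.get("editcount", 0) < MIN_EDITS:
        return False
    registered = user.get("registration")
    if registered is None:                # some very old accounts lack this field
        return False                      # leave those to a manual check
    since = datetime.strptime(registered, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return now - since >= MIN_AGE
```

The additional criteria (a grant, or a planned event) are judgment calls and are deliberately not encoded.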


Additional criteria could include:

  1. The user has received a Wikimedia grant for an event.
  2. The user plans to organize a Wikidata event.


We would appreciate your input on two things:

  1. Please share your thoughts and any concerns you may have about the proposal to enable the CampaignEvents extension on Wikidata.
  2. Review the starting criteria listed above and suggest any changes or additions you think would be helpful.

Looking forward to your contributions - Udehb-WMF (talk) 16:00, 19 September 2024 (UTC)

300 edits may be too low; Wikidata edits are generally very granular, so it's easy to make a lot of them. Maybe set the minimum at 1000? ArthurPSmith (talk) 18:04, 19 September 2024 (UTC)
I think 300 or 1000 matters little. The right also doesn't give much room to mess up, so it is okay to have a low bar. Of the additional criteria, I think a grant is way too restrictive, but the plan to organize is a must. Why else would the right be needed? Ainali (talk) 18:22, 19 September 2024 (UTC)
I think the proposed criteria are reasonable. It is really hard to judge someone by the number of edits because of the tools we use on Wikidata. Perhaps we want to use a trial period for granting the right (at least for less experienced users). We could grant it temporarily for one year and renew it if it is still needed. --Ameisenigel (talk) 19:35, 19 September 2024 (UTC)
Hello! As a staff member of an affiliate, I'd suggest adding a criterion that waives the edit count for affiliate staff. In the case of Wikidata it's always useful if they know the platform before running an event, but organizing an event could be among the responsibilities of a new affiliate staff member. Other than that, the criteria seem to follow what other wikis are currently discussing or implementing. Scann (WDU) (talk) 12:48, 20 September 2024 (UTC)
That's an interesting point that makes me question why we need an extra limit at all. Couldn't this right just be added to what autoconfirmed users can do? If someone misbehaves, it wouldn't be much hassle to notice it and block them, and the harm they can do wouldn't be any worse than being able to create items or pages in the Wikidata namespace. Ainali (talk) 13:38, 20 September 2024 (UTC)
We could start off without an edit limit for Wikidata and see whether any problems arise that way. If problems arise, we can still increase the limit later.
If I remember right, there was in the past a grant-funded event that produced a few problems with bad edits. Does anyone remember more, and whether the people in question would have met the limits proposed here? ChristianKl 20:23, 23 September 2024 (UTC)
@ChristianKl: I think you mean Wikidata:Project chat/Archive/2023/12#Wikidata-related grant proposals and Wikidata:Administrators' noticeboard/Archive/2023/12#Recent crop of new Nigerian items. --Matěj Suchánek (talk) 07:44, 26 September 2024 (UTC)
@Udehb-WMF: What do you think about the case Matěj linked to? Should we assume that the WMF is capable of not repeating that mistake in future grants for events? If so, we wouldn't need a Wikidata edit count as a limit. ChristianKl 10:47, 26 September 2024 (UTC)
Thank you for your comment/question, @ChristianKl.
I would like to clarify that the Event Organizer right and the CampaignEvents extension are not limited to grantees or events funded through grants. These tools are designed to help any organizer, whether they are running events, WikiProjects, or other on-wiki collaborations, to manage organizing more easily on the wiki.
The community will decide who can use these tools on their wiki. That’s why we are having this discussion now. The idea behind the edit count, as one of the qualifying criteria for the right, is that it could help show a level of experience and engagement on Wikidata. The 300-edit threshold I suggested was just to start the discussion, but the community will ultimately decide on the final criteria.
Exceptions could also be made for affiliate staff members, similar to how it's handled on Meta, since they may need access to these tools to carry out their roles. -Udehb-WMF (talk) 11:05, 27 September 2024 (UTC)
With regard to the question on grants, the team confirmed that it has responded to the questions raised in the past: Grants talk:Programs/Wikimedia Community Fund/Rapid Fund/Wikimedia Awareness in Nafada (ID: 22280836) - Meta. The team has also been stringent in its grant request review process and remains open to further improvement. Feel free to share your input on the grants talk page or connect directly with VThamaini (WMF) at vthamaini@wikimedia.org. -Udehb-WMF (talk) 17:40, 27 September 2024 (UTC)
Thank you, @Ainali, for your comment/question.
The reason for the Event Organizer right, instead of giving it to all autoconfirmed users, is that this right grants extra abilities that are specifically useful for event or wikiproject organizers, but not necessary for all autoconfirmed users. These abilities include:
  • Sending mass emails to registered event participants using the event registration feature (see demo).
  • Collecting participant demographic data through the event registration feature (see demo).
  • Creating invitation lists based on user contribution history via the Invitation List feature (see demo).
As you can probably guess, the risk of abuse seems low with this right. However, it’s still important to give this right to people the community trusts: people who meet the community's defined criteria. This is why local admins are responsible for managing this right on each wiki. If the extension is enabled on Wikidata, only users with the Event Organizer right on Wikidata will have access to these extra features. -Udehb-WMF (talk) 11:02, 27 September 2024 (UTC)
Good idea. If we enable this extension, might we need to remove the Wikidata:Account creators group?--S8321414 (talk) 12:34, 25 September 2024 (UTC)
This is completely unrelated, since the event organizer role is only for use of the events extension. Account creator is nearly unused on Wikidata. --Ameisenigel (talk) 15:21, 25 September 2024 (UTC)
As a user who has used the Campaign extension on Meta, I'm happy to see it being enabled on Wikidata, especially with Wikidata's birthday approaching. Users will be able to use it for the upcoming birthday events on Wikidata, since the right allows users to create event pages and send mass messages to those who register for an event. I agree with @ChristianKl that there doesn't seem to be a need for a minimum edit count requirement. Many organizers may not have 300 edits or have been active for 6 months on Wikidata, and affiliate members requesting this right may not meet these criteria. An endorsement from their affiliate group should be considered instead. Other users can also request the right with supporting links explaining why they need it on Wikidata rather than Meta. As on Meta-Wiki, I believe all the events created by users will be listed at the Special:AllEvents page on WD, so they can easily be monitored and tracked.-❙❚❚❙❙ GnOeee ❚❙❚❙❙ 11:38, 27 September 2024 (UTC)

New ticket about making Wikidata horizontally scalable

Feel free to join the discussion about making Wikidata great and sustainable 🤩 https://phabricator.wikimedia.org/T375352 So9q (talk) 04:34, 23 September 2024 (UTC)

It's not a ticket about making it scalable. It's a ticket about wanting it to be scalable without understanding why Wikidata isn't. SPARQL-based databases don't scale horizontally the way a lot of other databases do. ChristianKl 08:10, 23 September 2024 (UTC)
Don't you think that's unnecessarily blunt? That said, graphs are very nice, but they don't scale into eternity either. And I'm not sure how well SQL scales beyond a single server. A layman's naive impression is that we might get two decades if we federate and get a better triplestore. But yeah, at some point, if we refuse to set hard guidelines for what we include (which I believe So9q has advocated for), we will eventually reach the point where graphs are simply no longer an option, so a fundamental change is inevitable. As the CouchDB docs say, "disks are cheap", but expanding from 3 indexes to a stupid number also has a cost; it certainly will scale, but it will also have lost some of its appeal. Infrastruktur (talk) 15:05, 23 September 2024 (UTC)
So9q wrote a post claiming that he knows what the community wants without having done the work of figuring out what the community wants; in a case like that I do think a blunt statement is warranted. I don't think people should write that way if they are just stating their own opinion.
As one of the lead CouchDB developers once explained to me, CouchDB has a philosophy of not giving you features that don't scale. If you ask them "Why doesn't CouchDB support feature X that MongoDB supports?", the standard answer is "Because there's no way to develop the feature so that it scales to really large datasets".
Disks are cheap, and some problems are solved by having more disks. Storing data on WikiCommons, for example, is solved by simply having more disks, and thus we could use "tabular data" more to offload some data from Wikidata. ChristianKl 17:40, 23 September 2024 (UTC)
Thanks for pointing that out. I will gladly copyedit the statements in question. Which are you referring to?
The issue here from my point of view is that very little discussion has happened here since 2019 about what the community wants.
Based on the very recent discussion about import-policy I conclude that the community does not want to limit the growth.
It wants the WMF to fix any scaling issues so we don't have to worry about technical limits or about choosing to import one set of information over another despite both being notable. So9q (talk) 09:05, 24 September 2024 (UTC)
I think statements about what the community wants in a Phabricator ticket should only be made if there's community consensus for a given position. You wrote "The Wikidata community does not want to bother or worry about technical limits". For my part, having more information about the technical limits so that we can optimize Wikidata to work better within the existing technical limits would be great.
Ideally, we would have a system that scales perfectly. Unfortunately, that's not possible. The fact that a system like Telegram can easily run on a NoSQL database and thus scale does not imply that this is possible for a triple store that can be queried with SPARQL. If you want Wikidata to scale horizontally in a way that makes it impossible to run SPARQL queries that currently run fine, there are likely going to be people in our community who think that this isn't worth it.
WMDE recently developed the "mul" datatype to reduce the number of unnecessary edits that get made and the amount of information that's stored in the database. That's a decision that allows us to have more data overall. ChristianKl 11:34, 24 September 2024 (UTC)
I'm not talking about the sparql database per se. I know they don't scale well.
The graph split can be viewed as a kind of manual sharding of the graph database, with the downside that it affects queries and thus the user, which is undesirable but hard to avoid in the case of Blazegraph (and perhaps any other graph database in existence). So9q (talk) 08:57, 24 September 2024 (UTC)
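The point that a split "affects queries" can be illustrated with a toy model. A minimal sketch, assuming, as a simplification of the real WDQS split rules, that every triple whose subject is a scholarly article (Q13442814) goes to a separate graph; the item IDs in the usage data are hypothetical placeholders. A query that joins across the split returns nothing on either graph alone and only works over the union, which is what federation has to supply.

```python
SCHOLARLY_ARTICLE = "Q13442814"   # scholarly article


def split_graph(triples):
    """Send triples whose subject is a scholarly article to a second graph."""
    articles = {s for s, p, o in triples if p == "P31" and o == SCHOLARLY_ARTICLE}
    main = [t for t in triples if t[0] not in articles]
    scholarly = [t for t in triples if t[0] in articles]
    return main, scholarly


def author_countries(*graphs):
    """Evaluate '?paper P50 ?author . ?author P27 ?country' over the given graphs."""
    triples = [t for graph in graphs for t in graph]   # union, as federation would give
    authors = {o for s, p, o in triples if p == "P50"}             # P50: author
    return {o for s, p, o in triples if p == "P27" and s in authors}  # P27: citizenship


data = [("Q_PAPER", "P31", "Q13442814"), ("Q_PAPER", "P50", "Q_AUTHOR"), ("Q_AUTHOR", "P27", "Q34")]
main, scholarly = split_graph(data)
print(author_countries(main), author_countries(scholarly), author_countries(main, scholarly))
```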
I think User:ASarabadani_(WMF)/Growth_of_databases_of_Wikidata would be a better place to discuss things. Vicarage (talk) 15:11, 23 September 2024 (UTC)
I disagree; the scalability issues reported on that page are a concern for the whole Wikidata community and wider ecosystem IMO.
Perhaps it should be moved to Meta, since a failure of the Wikidata MariaDB cluster would affect all wikis that are linked to Wikidata, which is all of them.
The technical and community health of Wikidata concerns all wikis and thus the whole movement. So9q (talk) 08:51, 24 September 2024 (UTC)
I followed up with two child tickets initiating a search for a replacement for the master-and-replicas MariaDB setup, which is outdated and does not scale horizontally for either read or write operations.
It also has issues like the lack of automated failover and of features like sharding, self-healing nodes, etc.
See https://phabricator.wikimedia.org/T375472 So9q (talk) 08:54, 24 September 2024 (UTC)
I got a response from the lead MediaWiki backend operations engineer, who declined the ticket and the subtickets I wrote. See my response.
As I note in the response, the MariaDB backend is NOT scalable, and offloading all the scholarly articles to a separate Wikibase (which has not been funded or approved by the board yet, see the proposal) is NOT a viable long-term solution.
Basically, our engineers are using a 2005-era database setup (a master on a single machine with a few replicas) that is not geared to big data at all. It's NOT best practice as of 2024, and it's not going to get any better by sticking our heads in the sand and hoping for good luck (which, along with a few optimizations to the table layout, is what the lead engineer seems to want).
Soon enough we will reach 100M items again once @Egon Willighagen imports millions more chemicals or someone imports all the named streets of the USA and Russia, all bridges in Sweden, etc.
We need the WMF board and tech team to consider ways forward NOW; according to @ASarabadani (WMF), time is running out for wikidatawiki.
I'm considering writing a letter to the new board alerting them to this precarious situation. You are very welcome to join me: write me an email through my user page or reach out to me on Telegram. So9q (talk) 10:52, 26 September 2024 (UTC)Reply
The WMF's database architect seems surprisingly pessimistic when it comes to scaling a SQL database horizontally. I just replied on Phabricator to one of his comments with a possible open-source drop-in replacement for MariaDB.
I urge the readers and users of Wikidata to ask themselves: if a community member can find a solution to the problem stated by @ASarabadani (WMF) in a few minutes of spare time browsing Wikipedia for open-source distributed SQL database engines, why has the highly paid WMF engineering team not done anything about this since the scalability issues became common knowledge? Why are they so negative towards community members pointing to possible solutions? Why are they so unwilling to reflect on their own architecture decisions?
What could be causing this? What has prevented a solution from being found since 2012? (They could have continuously projected the growth of Wikidata, tested their current setup with dummy data, and forecast long ago that we would outgrow a single-machine master MariaDB database.) Why did they fail to do that?
Imagine having a technical management and a team of lead engineers who would rather try to impose growth limits on our thriving community of 23k contributors (and millions of data consumers worldwide every month) than do their job and make sure the backend scales horizontally according to community needs and the vision of the foundation[1]. Is that what is going on?
I wonder whether the board knows about this situation and what consequences it will have. WDYT? So9q (talk) 12:04, 26 September 2024 (UTC)Reply
The fact that ASarabadani wrote the post suggests to me that he's considering ways forward. Writing a letter to the WMF board suggesting that he isn't considering the problems because he closed your tickets seems like unnecessary drama.
Basically, you claim that you have a better idea than ASarabadani of the kind of work that would be needed to change the present code base to software like MySQL Cluster. I find it highly unlikely that this is true. If you write a letter to the board, I would expect that you are not going to convince them that you understand the MediaWiki code base, and what would be required to make it horizontally scalable, better than ASarabadani does just because you read a few articles on Wikipedia about distributed SQL database engines.
Writing new software in COBOL is not "best practice". That doesn't mean that banks aren't still running on a lot of COBOL code. Changing legacy systems is not easy.
The scalability bottleneck that Wikidata had to deal with in 2019 was about the number of edits Wikidata is able to process per minute. It was not about the size of the SQL database. Focusing engineering resources on the SQL database would not have helped with resolving the bottleneck we had at that time.
When optimizing a system, it's important to understand the existing bottlenecks and focus on solving them. You make suggestions without having tried to understand the existing bottlenecks. ChristianKl12:37, 26 September 2024 (UTC)Reply
"The scalability bottleneck that Wikidata had to deal with in 2019 was about the number of edits Wikidata is able to process per minute. It was not about the size of the SQL database. Focusing engineering resources on the SQL database would not have helped with resolving the bottleneck we had at that time."
Are you sure? The master on a single server + replica setup helps scale read operations but not write operations. Moving to a distributed SQL database scales both write and read operations. So9q (talk) 13:08, 26 September 2024 (UTC)Reply
Changing the SQL database can only help scaling the write and read operations when the bottleneck is about the SQL database in the first place. When the bottleneck however is about the performance of the triple store, it doesn't help you at all. ChristianKl13:18, 26 September 2024 (UTC)Reply
"Writing new software in COBOL is not 'best practice'. That doesn't mean that banks aren't still running on a lot of COBOL code. Changing legacy systems is not easy."
I agree, but this situation is very different. I'm NOT talking about rewriting any code. The MediaWiki software is separated from the database. How the database distributes queries, handles sharding, etc. does not affect the code in any way, AFAIK. That is why it is a drop-in solution that could be tested out in a weekend by anyone who wants to. The only things you need are two networked machines, a good internet connection and a bit of Linux command-line know-how to load the data from the dumps and set up a Wikidata clone on a distributed database. So9q (talk) 13:12, 26 September 2024 (UTC)Reply
AFAIK doesn't bring you very far when you don't know what you are talking about. If you ask ChatGPT, which also doesn't understand all the roadblocks, it's able to give you a bunch of reasons why it would require a lot of work to change to MySQL Cluster, such as limits on transaction size. ASarabadani is going to know a lot of other reasons why it's hard to simply switch databases. ChristianKl13:25, 26 September 2024 (UTC)Reply
"When optimizing a system, it's important to understand the existing bottlenecks and focus on solving them. You make suggestions without having tried to understand the existing bottlenecks."
Are you sure? If I understood @ASarabadani (WMF)'s information correctly, the core problem is that the sheer size of the wikidatawiki tables makes it hard for the master and replicas to keep in RAM all the information needed to serve MediaWiki in a timely manner. Buying larger servers is not a solution because of the growth rate of the project. Distributing the load over multiple servers is the go-to industry solution for big data projects, which Wikidata seems to have become. So9q (talk) 13:17, 26 September 2024 (UTC)Reply
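Hash-based sharding is the generic technique behind "distributing the load over multiple servers". A minimal sketch, with hypothetical helper names (`shard_for`, `partition`, `shards_touched`); it is not how MariaDB, MySQL Cluster or any particular distributed SQL engine routes rows. It also makes the trade-off debated in this thread visible: a lookup of one item stays on one shard, but anything that joins across many items has to contact several.

```python
import hashlib
from collections import defaultdict


def shard_for(entity_id: str, num_shards: int) -> int:
    """Map an entity ID such as 'Q42' to a shard index in [0, num_shards)."""
    # Use a stable hash; Python's built-in hash() of str is salted per process.
    digest = hashlib.sha256(entity_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(entity_ids, num_shards):
    """Group entity IDs by the shard that would store them."""
    shards = defaultdict(list)
    for eid in entity_ids:
        shards[shard_for(eid, num_shards)].append(eid)
    return dict(shards)


def shards_touched(entity_ids, num_shards):
    """Set of shards a query must contact to read all the given entities."""
    return {shard_for(eid, num_shards) for eid in entity_ids}


if __name__ == "__main__":
    ids = [f"Q{n}" for n in range(1, 11)]
    print(partition(ids, 3))
    # A single-item read touches exactly one shard ...
    print(shards_touched(["Q42"], 3))
    # ... while a query joining many items usually touches several.
    print(shards_touched(ids, 3))
```

Adding shards spreads storage and single-item traffic, but it does nothing for queries whose cost comes from crossing many items, which is why the triple store question is separate.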
While ASarabadani used to work on Wikidata (at WMDE), he's now at the WMF and is the chief database architect for MediaWiki.
As such, the bottlenecks that Wikidata faces outside of MediaWiki currently aren't his job. That does not mean that Wikidata does not have other bottlenecks that come from the triple store. If you look at the evaluation documents for choosing a new triple store for Wikidata, you find that the number of triples those triple stores can store is unfortunately limited.
While there are technical solutions requiring a lot of work that might allow MediaWiki to be horizontally scalable, implementing them would not free the Wikidata community from having to worry about our triple count. You don't get 100x growth out of the available triple store technology. ChristianKl13:47, 26 September 2024 (UTC)Reply
Wikidata will never be horizontally scalable. Asking who are the POTUS and asking who are male humans have no semantic difference. If there are as many POTUS as there are male humans, Wikidata will not be able to give an answer to either question. Midleading (talk) 09:41, 25 September 2024 (UTC)Reply
Let's make one thing clear. Wikidata is a MediaWiki-run wiki. MediaWiki supports nothing but a relational (SQL) database. Such databases are known not to be horizontally scalable. Therefore, Wikidata simply cannot be completely horizontally scalable. I can't imagine the amount of work needed to implement support for (hybrid) NoSQL storage.
Note that this has actually nothing to do with the Wikidata Query Service split. These are, unfortunately, two different problems, which do have a common cause: Wikidata is becoming unsustainably large. This is the only thing we can do something about right now. --Matěj Suchánek (talk) 15:38, 26 September 2024 (UTC)Reply
There are many things that could be done. Currently, the knowledge about how various knowledge modeling decisions affect performance isn't readily available. Gathering that knowledge, writing it up and then bringing it up in relevant decisions would be helpful.
Initiatives like "mul" can free up capacity that we can use better otherwise. ChristianKl22:38, 26 September 2024 (UTC)Reply
We know long property chains are expensive, but they are also handled efficiently, so it all comes back to the size of the graph; ergo, federation solves the problem. In my experience, the community is unwilling to change its data model: even if you present good reasons why a change makes sense, they will refuse. You could, for example, insist that P131 only go as low as the municipality level. But when you include neighborhoods, the computational cost becomes unreasonable. Infrastruktur (talk) 16:36, 27 September 2024 (UTC)Reply
"But when you include neighborhoods the computational cost becomes unreasonable" how do you know?
Without good documentation about how costly various decisions happen to be, it's hard to know whether an individual decision is worth the computational cost or whether that cost is unreasonable. ChristianKl10:27, 1 October 2024 (UTC)Reply
"How do you know?" If we just want to illustrate it without diving into the matters, that's quick enough. Germany has 13,425 municipalities (according to Wikidata, anyway). If we ask for a count of P131* of Q183, that gives us over a million items and takes 30-50 seconds to run (three sample runs; I asked for a count to exclude data-transfer overhead). That leaves only 10-30 seconds of the 60-second query timeout for the rest of the query to do all the things it needs to do. If we ask for subclasses of watercraft, that yields over 10,000 items and doesn't even take a second to complete. I didn't bother to look into the distribution, but that might also be interesting to look into if someone has the time. Infrastruktur (talk) 14:37, 1 October 2024 (UTC)Reply
If our goal is to have as many items as possible on Wikidata, the computational cost we care about is how much space items take up in the database, not how fast queries run.
If the goal is to be able to run more queries, buying servers that mirror the Wikidata Query Service is easily possible while you can't get the capacity to store more items simply by buying more servers. ChristianKl15:11, 2 October 2024 (UTC)Reply
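The query cost in Infrastruktur's measurement above comes from the transitive path `P131*`: the engine has to visit every item below the root, so each extra tier of the hierarchy (such as neighborhoods under municipalities) adds to the work. A minimal sketch that mimics that traversal with a breadth-first search over a tiny hypothetical hierarchy (made-up IDs such as `C`, `S1`, `M1`, `N0`); it illustrates the growth only, not Blazegraph's actual evaluation strategy. The measured query was presumably something like `SELECT (COUNT(?item) AS ?n) WHERE { ?item wdt:P131* wd:Q183 }`, but the exact query was not posted.

```python
from collections import defaultdict, deque


def located_in_closure(root, edges):
    """Items reachable from root by following zero or more P131 links backwards.

    edges are (item, parent) pairs meaning "item P131 parent"; including the
    root itself mirrors the zero-hop case of the SPARQL path wdt:P131*.
    """
    children = defaultdict(list)
    for item, parent in edges:
        children[parent].append(item)

    seen = {root}
    queue = deque([root])
    while queue:  # breadth-first walk down the administrative hierarchy
        node = queue.popleft()
        for child in children[node]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Hypothetical toy hierarchy: country C -> 2 states -> 4 municipalities -> 8 neighborhoods
states = [("S1", "C"), ("S2", "C")]
municipalities = [("M1", "S1"), ("M2", "S1"), ("M3", "S2"), ("M4", "S2")]
neighborhoods = [(f"N{i}", f"M{i % 4 + 1}") for i in range(8)]

if __name__ == "__main__":
    print(len(located_in_closure("C", states + municipalities)))                  # 7
    print(len(located_in_closure("C", states + municipalities + neighborhoods)))  # 15
```

In this toy case, one extra tier more than doubles the number of items the path has to enumerate; with realistic fan-out the difference grows accordingly.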
I wonder how many users would be happy to have queries that ran in 2 or even 10 minutes, if they could be confident they wouldn't time out. This could be done with just more servers, but would be more useful if the server had an internal measure of task completion so it could abort early if the task was getting out of control, and that might require software changes. Vicarage (talk) 15:43, 2 October 2024 (UTC)Reply
"mul" can help, but it will be only a minor bit, in comparison with e.g. "Wikimedia category/template" descriptions. Just for consideration. --Matěj Suchánek (talk) 12:57, 29 September 2024 (UTC)Reply
@Matěj Suchánek Over the long term, I don't think the Query Service needs direct access to descriptions, and the job of serving descriptions could be moved to a separate server. If Wikifunctions works better in the future, it's possible that all these kinds of descriptions could be generated on a Wikifunctions-driven server and cached there. ChristianKl14:22, 2 October 2024 (UTC)Reply

Refs:

  1. share in the sum of all knowledge

datatype of P5143

Hi all. I have proposed changing the datatype of amateur radio callsign (P5143) from external ID to string, and I'd like to have a consensus about it before making any changes. If you are interested, go ahead and join the discussion on the property talk page. Thanks. Samoasambia 08:26, 25 September 2024 (UTC)Reply

A good way is to ping all the people who commented on the creation of the property when proposing to change it. ChristianKl12:45, 25 September 2024 (UTC)Reply
Thanks for the suggestion. I did that now. Samoasambia 15:23, 25 September 2024 (UTC)Reply

FBI file numbers

I’d like to add an FBI file number to a Wikidata profile (e.g. 100-HQ-34789 or 92-NY-1456). However, many FBI files were destroyed or are still classified, so I can’t link the file number to an external copy of the file in every case. I can provide a reference for each file number, though.

  1. Is there an existing property, such as “Described by Source” or “Inventory Number”, that could be used for these numbers? If so, would it be best to create a new Item for each FBI file?
  2. If not, would this be appropriate for a new property (something like “Federal Bureau of Investigation File Number”), even if the file numbers won’t link to an external database or site?

Thanks! Nvss132 (talk) 10:40, 26 September 2024 (UTC)Reply

I think (2) is preferred, but you should probably start a property proposal to have a more in-depth discussion about this. I'm not entirely clear what these file numbers identify - they are for individual people? Can one person have more than one file number? Anyway a property proposal discussion would be a good place to clarify current options or whether we really should create a new property for this. ArthurPSmith (talk) 13:17, 27 September 2024 (UTC)Reply
Thanks for responding. After researching this weekend, I don’t think creating a new property will work anymore. Not every FBI file maps to a specific Wikidata item. (For example, FBI file 100-HQ-4869 is on the funding of the Communist Party while file 100-HQ-365088 covers the sale of foreign publications in America.) Since these subjects won’t correspond to one Wikidata item, I think the best solution is to create an item for each file, treating them like individual works. In addition, this also lets people use Template:P1343 to link individual people described in the file who are not the main subject of the file, such as a spouse being described in someone’s FBI file or when a file covers multiple members of an organization. Nvss132 (talk) 00:04, 30 September 2024 (UTC)Reply

Dereferencing misattributed Israel CBS IDs

I can't figure out how to edit the pages, and the bot that originally made the errors seems to have been inactive for a couple of years now. I added a comment on the talk pages but would appreciate it if someone who knows how to do this would remove the properties.

https://www.wikidata.org/w/index.php?title=Talk:Q48195&oldid=2253153681 https://www.wikidata.org/w/index.php?title=Talk:Q121157&oldid=2253151994 Wissotsky (talk) 11:31, 26 September 2024 (UTC)Reply

You wouldn't have been able to edit those two items because of the protection ([2][3][4][5]) so it's possible the "edit" links don't appear. I removed them, the correct values were already on items for places in Israel. Everything with CBS IDs now seems to be somewhere in Israel or Israeli-occupied territories. Peter James (talk) 13:52, 26 September 2024 (UTC)Reply

RfC on object vs design class vs functional class for manufactured objects

@Peter F. Patel-Schneider and I have been in discussion over how we distinguish for manufactured items a physical object, its design, and the function it performs. We propose a series of constraints on their instance and subclass properties, and a simplification of the parochial set of something_type, something_model and something_family classes. We have used military items as exemplars, but the approach would have much wider application. We would appreciate your views at Wikidata:Requests for comment/object vs design class vs functional class for manufactured objects. (talk) 14:26, 26 September 2024 (UTC)Reply

ID property for the actual WPBSA site (snooker association)

It seems we have the WST.tv property, World Snooker Tour player ID (P4498), and the SnookerScores.net property, WPBSA SnookerScores player ID (P10857), but we do not have an ID property for wpbsa.com. wpbsa.com actually contains a significant amount of data: for example, Mark Allen on WPBSA has more than the same player on WST. Nux (talk) 19:07, 26 September 2024 (UTC)Reply

@Nux You can always propose a new property: Wikidata:Property proposal RVA2869 (talk) 12:55, 27 September 2024 (UTC)Reply
Thanks for the tip :).
Vote or discuss here: Wikidata:Property proposal/WPBSA com player ID :) --Nux (talk) 21:21, 27 September 2024 (UTC)Reply

Wikidata MOOC For Beginners (in English) - Starting October 1, 2024!

Hi everyone,

A rerun of the Wikidata Open Online Course will kick off on October 1, 2024, and will be available for the following 5 weeks. The previous iteration of the course saw a great turnout, with positive feedback from learners, including GLAM professionals and students.

Here’s what you can expect:

Course Structure

  • Chapter 1: The Wikimedia Movement and the Creation of Wikidata
  • Chapter 2: Understanding Knowledge Graphs and Queries
  • Chapter 3: Discovering Wikidata, Open Data, and the Semantic Web
  • Chapter 4: Contributing to Wikidata, the Community, and Data Quality
  • Chapter 5: Bonus Resources on Scientific Bibliography from Wikidata

Head over to Wikidata 101: An Introduction to enroll, and don’t hesitate to share it with your friends and colleagues. The course is hosted on learn.wiki, and you can sign up using the same credentials you use for Wikimedia projects.

If you have any questions, feel free to reach out to me directly.

Cheers, Mohammed Abdulai (WMDE) (talk) 19:44, 26 September 2024 (UTC)Reply

Duplicate entries due to ceb wiki?

Landau an der Isar (Q509536) and Landau an der Isar (Q32084506) seem to be the same but ceb.wiki has two articles. Magnus Manske (talk) 09:24, 27 September 2024 (UTC)Reply

I have merged the Cebuano pages into one because they are both about the same subject. But the WD items are about different concepts: the commune and the centre of the commune. Landau an der Isar is divided into seven settlements (? quarters?), and the main one shares the same name with the commune. See w:de:Landau an der Isar#Gemeindegliederung. --Wolverène (talk) 10:04, 27 September 2024 (UTC)Reply
@Magnus Manske There are a lot of bot created pages in ceb.wiki because of GeoNames see https://www.wikidata.org/wiki/Wikidata:WikiProject_Territorial_Entities/Geonames_and_CebWiki for more background. ChristianKl13:13, 27 September 2024 (UTC)Reply

Adding multiple statements to plant wikidata entries

Hi all,

I'm a plant enthusiast interested in enhancing Wikidata's plant entries. I'm contemplating adding statements to plant species that reflect required care and features of plants.

For example, to Goeppertia insignis (Q90458733) (Calathea orbifolia), I would add something like the following:

Property: Value
Cycle: Herbaceous Perennial
Watering: Average
Propagation: Division, Stem Propagation, Leaf Cutting, Air Layering Propagation
Flowers: Yellow Flowers
Sun: part shade, part sun/part shade
Leaf: Yes
Leaf Colour: green, purple
Growth Rate: Low
Maintenance: Moderate
Tropical: Yes
Indoors: Yes
Care Level: Medium

I believe these additions would be valuable for several reasons:
1. They would provide more detailed information for plant care.
2. They could facilitate SPARQL queries for plant selection based on specific criteria.
3. They might aid in botanical research and education.

Before proceeding, I have a few questions:

1. Are there existing properties in Wikidata that cover some of these aspects? If so, how can I find them?
2. If not, what is the process for proposing new properties?
3. Do you think these additions would be acceptable and valuable for Wikidata?
4. Are there any concerns or potential issues with adding this type of information?

I would greatly appreciate your feedback on the specific properties I've listed and any suggestions for improvement or additional properties to consider.

Thank you for your time and input! Inkpotmonkey (talk) 11:52, 28 September 2024 (UTC)Reply

Most of these are subjective, and therefore are not suitable for use in a database, unless they are rigorously defined and widely agreed-on by scientists.--Jasper Deng (talk) 22:16, 28 September 2024 (UTC)Reply
@Inkpotmonkey: Most of the proposed data sounds very subjective, which means it is hard to make it compatible with Wikidata. However, if you want to help, you may add properties like flower color (P2827), foliage type (P10906) and leaf morphology (P12616) together with reliable references. Samoasambia 08:39, 1 October 2024 (UTC)Reply
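To make point 2 of the original question concrete: once statements like flower color (P2827) exist, plant selection becomes a plain SPARQL filter. A minimal sketch, assuming taxa are found via instance of (P31) taxon (Q16521); the helper name is hypothetical and the item ID for a color has to be looked up before use.

```python
import re


def plants_by_flower_color_query(color_qid: str, limit: int = 100) -> str:
    """Build a WDQS SPARQL query for taxa whose flower color (P2827) is color_qid."""
    if not re.fullmatch(r"Q[1-9][0-9]*", color_qid):
        raise ValueError(f"not an item ID: {color_qid!r}")
    # Doubled braces are literal { } in the f-string.
    return f"""SELECT ?taxon ?taxonLabel WHERE {{
  ?taxon wdt:P31 wd:Q16521;
         wdt:P2827 wd:{color_qid}.
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {int(limit)}"""


if __name__ == "__main__":
    # Q943 is assumed here to be the item for "yellow"; verify before running.
    print(plants_by_flower_color_query("Q943"))
```

The printed query can be pasted into query.wikidata.org. Subjective attributes such as "care level" cannot be filtered this reliably, which is the concern raised above.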

European language levels

Hi, in Europe, which has 45+ languages to handle, we have a transnational language-level framework called the Common European Framework of Reference for Languages (Q221385), together with language levels (Q104381881), structured as:

I assigned:

  • A1 & A2 as sub-class of A,
  • B1 & B2 as sub-class of B,
  • C1 & C2 as sub-class of C.

But should A, B and C be *instance of* (P31) or *subclass of* (P279) Common Reference Levels for languages (Q104381881)?

See also WDQS https://w.wiki/BMKo . Yug (talk) 20:47, 28 September 2024 (UTC)Reply

@VIGNERON:. Yug (talk) 10:43, 29 September 2024 (UTC)Reply
I would suggest that all of the items listed above should be part of (P361) CEFR common reference level (Q104381881) instead of instance or subclass, but I won't claim to be an expert. Huntster (t @ c) 13:45, 29 September 2024 (UTC)Reply
I think Q104381881 should be renamed to "CEFR language level" (or something similar) so that having it as an "instance of" value would make sense. In addition, all of the levels could have part of (P361) Common European Framework of Reference for Languages (Q221385). Samoasambia 16:40, 29 September 2024 (UTC)Reply
Good catch, agreed on all points. Huntster (t @ c) 17:03, 29 September 2024 (UTC)Reply
@Yug, Huntster: I have now made the changes I proposed. I assigned the (now renamed) CEFR common reference level (Q104381881) as both an instance of and a subclass of value for the "group levels" (A, B, C), which looks a bit awkward. That's because otherwise the constraint checks on the "lower levels" (A1, A2, B1, etc.) would be triggered by their being a subclass of an item that is not a subclass of anything. Samoasambia 08:59, 1 October 2024 (UTC)Reply
@Samoasambia: I've removed rank (Q4120621) from CEFR common reference level (Q104381881) (since it's not really a rank in and of itself), and added it to each of the levels in place of CEFR common reference level (Q104381881) to avoid the issue you pointed out. Let me know if you disagree. Huntster (t @ c) 13:45, 1 October 2024 (UTC)Reply
Thanks Huntster, that seems to work well. Samoasambia 19:33, 1 October 2024 (UTC)Reply

Adding Nigerian politicians

Hello! I scanned a book about the Nigerian legislature called Nigeria Legislature 1861-2011 with lists of the members of the Nigerian Senate and House of Representatives. Many of them are not on Wikidata (or anywhere I can find online :/) so I wanted to add them. They come in the form of infoboxes that look like this [6]. I'm slowly compiling these infoboxes (there are a LOT) into a spreadsheet to add to Wikidata through QuickStatements. Unfortunately, I'm not extremely familiar with Wikidata so I wanted some help, feedback, and other comments about how I should go about this.

Right now, my CSV has columns for Name, Constituency, State, Date of Birth, and Education. I wish I could add an image for them but I'm not sure about the copyright of a book published by the Nigerian government. Fields for Date of Birth and Education can be pretty spotty, with Education in particular varying in specificity from specific subject details of a Ph.D to simply listing a diploma in a subject, if any is listed at all. Politicians from Oct-Dec 1983 in particular have sparse details likely due to the military coup in 1983.

Some questions I have about this:

1) Some names only have initials without full names. Is this okay?

2) Some list in their Education field a Grade III/II Teacher's Cert. I can't find anything related to this education on Wikidata (seems to be an old teacher credential used in the 1960s or so). What should I do here?

3) Right now, fields in the spreadsheet are the plain text from the infoboxes. I plan on using Pandas to transform it into properties and qualifiers that QuickStatements accepts. How would I go about adding "inner qualities" of a property? I'm not sure what the correct jargon for it is, but an example is on Leslie Lamport: under Doctor of Philosophy, it lists his academic major as mathematics.

Thanks for reading, and let me know any questions, comments, or concerns! Moon motif (talk) 02:23, 29 September 2024 (UTC)Reply

@Moon motif: Good questions. Have you looked into OpenRefine as a tool to convert your CSV into wikidata statements directly (no need to go through QuickStatements)? Initials instead of full names are fine; the description should disambiguate who they are. For education, we typically use educated at (P69) for the educational institution, with qualifiers (I assume that's what you mean by "inner qualities") for dates and degree attained. It's possible that Teachers' Training Certificate (Q98793260) or some other type of academic degree (Q189533) meets your needs for the degree; if not it's fine to add a new item as long as you're sure it's not a duplicate of something already here. ArthurPSmith (talk) 19:48, 30 September 2024 (UTC)Reply
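For the QuickStatements route mentioned in the question, a qualifier (the "inner quality") is written on the same tab-separated line as its statement, as extra property/value pairs, and a reference as `S`-prefixed pairs. The sketch below assumes the free-text education has already been matched by hand to item IDs; the column names, the helper names and every QID in the sample are hypothetical placeholders. It uses educated at (P69) with academic degree (P512) and academic major (P812) qualifiers, date of birth (P569), and stated in (P248) for the source; check the exact syntax against Help:QuickStatements before running a real batch.

```python
import csv
import io
from typing import List, Optional

# Hypothetical input: the QIDs below are placeholders, not real items.
SAMPLE_CSV = """name,description,birth_date,school_qid,degree_qid,major_qid
A. B. Example,Nigerian politician,1940-05-01,Q111111111,Q222222222,
"""


def qs_date(iso_date: str) -> str:
    """Convert 'YYYY-MM-DD' or 'YYYY' to a QuickStatements time value."""
    if len(iso_date) == 4:  # year only -> precision 9
        return f"+{iso_date}-00-00T00:00:00Z/9"
    return f"+{iso_date}T00:00:00Z/11"  # full date -> precision 11 (day)


def row_to_commands(row: dict, source_qid: Optional[str] = None) -> List[str]:
    """Turn one reconciled CSV row into QuickStatements V1 command lines."""
    ref = f"\tS248\t{source_qid}" if source_qid else ""  # "stated in" reference
    lines = ["CREATE", f'LAST\tLen\t"{row["name"]}"']
    if row.get("description"):
        lines.append(f'LAST\tDen\t"{row["description"]}"')
    lines.append("LAST\tP31\tQ5" + ref)  # instance of: human
    if row.get("birth_date"):
        lines.append(f"LAST\tP569\t{qs_date(row['birth_date'])}" + ref)
    if row.get("school_qid"):
        stmt = f"LAST\tP69\t{row['school_qid']}"  # educated at
        # Qualifiers follow the main value on the same line.
        if row.get("degree_qid"):
            stmt += f"\tP512\t{row['degree_qid']}"
        if row.get("major_qid"):
            stmt += f"\tP812\t{row['major_qid']}"
        lines.append(stmt + ref)
    return lines


if __name__ == "__main__":
    for row in csv.DictReader(io.StringIO(SAMPLE_CSV)):
        print("\n".join(row_to_commands(row)))
```

If you stay with pandas, `df.fillna("").to_dict("records")` yields row dictionaries in the same shape (the `fillna` keeps empty cells falsy instead of `NaN`). OpenRefine handles the same reconciliation and qualifier mapping interactively.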
Oh cool! Didn't hear about OpenRefine and it definitely looks like exactly what I need. And thanks for answering my questions! Moon motif (talk) 15:15, 1 October 2024 (UTC)Reply

Deprecation tag for database entries that were wrongly created due to scraping?

Take a look at Q23649754. There are currently four statements for identifiers that are meant exclusively for video games, not software (Can You Run it ID, HowLongToBeat ID, Lutris game ID and Rock Paper Shotgun game ID). However, because these sites scrape everything from Steam, the identifiers were created anyway. Trade (talk) 02:57, 29 September 2024 (UTC)Reply

I think these should not be deprecated, unless the website deprecates, redirects or deletes these identifiers themselves. Midleading (talk) 10:44, 29 September 2024 (UTC)Reply
It does create an annoying amount of constraint errors Trade (talk) 18:08, 29 September 2024 (UTC)Reply

Surname is a common Christian name

The entry for Sydney Walker Barnaby here is wrong. His surname is Barnaby, but on Commons it shows up as a given name. Meanwhile, I added it to Commons as a surname, but the given name derived from Wikidata still shows as a given name. Why? Broichmore (talk) 17:10, 29 September 2024 (UTC)Reply

  Fixed RVA2869 (talk) 17:47, 29 September 2024 (UTC)Reply

Official residence of a university president

We have official residence (Q481289) and official residence (Q11452137). Both seem to me to be too specific to cover the official residence of a university president. Do we have something more general, short of simply residence (Q699405)?

This came up for New York Building (Q130320815). - Jmabel (talk) 14:59, 30 September 2024 (UTC)Reply

@Jmabel: My first instinct was to say be bold and create one if you don't find an existing one. However, I question whether a class is really the best way to model this. I'm not sure that official residences of universities are a class with common features enough that instance of (P31) is the right relationship. Being an official residence seems less like an inherent characteristic of a building and more like a status temporarily conferred. I know well that many existing Wikidata classes similarly fit this description, but it doesn't seem ideal. I'd model this case as:
Daask (talk) 18:42, 30 September 2024 (UTC)Reply

Wikidata Weekly Summary #647

allow source(s) to be added to support claim of an "alias"

Currently, an alias is added with no ability to add a "reference" to support that claimed alias. How do I make this proposal? Thank you, -- Ooligan (talk) 17:53, 30 September 2024 (UTC)Reply

Labels and aliases are different from other properties in that they are mainly for the use of human editors, and are somewhat outside the graph database logic. Generally if you need to be more specific about the name of an item, the period it applies for, or its variants, and provide references, you should be using one of the "name" properties like name (P2561) or official name (P1448), and adding references to those, and repeating the names as aliases so human searches can see them. Vicarage (talk) 17:59, 30 September 2024 (UTC)Reply

Merge or not?

I have a feeling that Comptes Rendus de la Association Française pour l'Avancement des Sciences. (Q51458548) should be merged into Compte Rendu de l'Association Francaise Pour l'Avancement des Sciences (Q5780218). However, there are quite a few very similarly named scientific journals from this time period, so I'm not entirely sure, hence I haven't gone ahead and actually done anything. Please advise if you have access to more detailed information than I have. Thanks! Tommy Kronkvist (talk), 22:50, 30 September 2024 (UTC).Reply

The way forward would be to look at all the external ID properties and the information they store to see whether that matches. ChristianKl11:20, 1 October 2024 (UTC)Reply

Building a Health center project proposal

Can we partner in building a modern health center in Liberia, my country? I'm Michael M. Edwards from Liberia. 41.57.95.221 08:25, 1 October 2024 (UTC)Reply

Who do you mean with "we"? Wikidata is not an institution that builds hospitals. ChristianKl09:58, 1 October 2024 (UTC)Reply

Search items by properties

Hello, while developing Wikivoyage modules, I found that mw.wikibase does not have a method to search items by properties. How can this be implemented, or is it simply impossible? Thanks, Tmv (talk) 08:55, 1 October 2024 (UTC)Reply

@Tmv: I'm not sure about wikibase generally, but in Wikidata there's a haswbstatement filter for the search box that allows property-based searches. Put "haswbstatement:P18" in the search box and you'll get all the items with images, or put "haswbstatement:P31=Q5" in to find humans (i.e. a specific property value). This can be very useful combined with other search terms. ArthurPSmith (talk) 13:48, 1 October 2024 (UTC)Reply
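Scribunto modules cannot make HTTP requests, so a property-based search has to run outside Lua, for example in a bot or an external tool. A minimal sketch that builds a MediaWiki Action API search request using the `haswbstatement` keyword described above; the helper name is hypothetical, while `action=query`, `list=search`, `srsearch` and `srlimit` are standard API parameters.

```python
from typing import Optional
from urllib.parse import urlencode

API_URL = "https://www.wikidata.org/w/api.php"


def haswbstatement_search_url(prop: str, value: Optional[str] = None,
                              extra_terms: str = "", limit: int = 50) -> str:
    """Build an API URL that searches items by statement, e.g. P31=Q5."""
    query = f"haswbstatement:{prop}" + (f"={value}" if value else "")
    if extra_terms:  # ordinary search words can be combined with the keyword
        query = f"{query} {extra_terms}"
    params = {
        "action": "query",
        "list": "search",
        "srsearch": query,
        "srlimit": limit,
        "format": "json",
    }
    return f"{API_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Items that are instances of human (Q5) and match the word "Lamport".
    print(haswbstatement_search_url("P31", "Q5", extra_terms="Lamport"))
```

In the JSON response, matching items are listed under `query.search`, with the item ID as each result's `title`.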

Knowledge Graph Embedding

Select an NLP task for which an annotated dataset is available and a knowledge graph can be useful (e.g., Question Answering) – Embed the selected knowledge graph – Analyse the advantages of using the graph directly or its embeddings when performing the task. How can I do a project related to this? 194.210.175.150 13:58, 1 October 2024 (UTC)Reply
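One common starting point is a translation-based embedding such as TransE, which represents entities and relations as vectors and scores a triple (head, relation, tail) by how close head + relation lands to tail. For question answering, a question like "which country is X the capital of?" becomes ranking candidate tails for (X, capital of, ?). The sketch below uses hand-picked two-dimensional toy vectors with hypothetical names; in a real project they would be learned from knowledge-graph triples (for example a Wikidata subset), and the ranking would be compared against answering the same question by querying the graph directly.

```python
import math
from typing import Dict, List, Sequence


def transe_score(head: Sequence[float], relation: Sequence[float],
                 tail: Sequence[float]) -> float:
    """TransE plausibility: negative Euclidean distance between head + relation and tail."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def rank_tails(head: Sequence[float], relation: Sequence[float],
               candidates: Dict[str, Sequence[float]]) -> List[str]:
    """Order candidate tail entities from most to least plausible."""
    return sorted(candidates,
                  key=lambda name: transe_score(head, relation, candidates[name]),
                  reverse=True)


# Hypothetical toy embeddings chosen by hand, not trained.
entities = {
    "Paris": [1.0, 0.0],
    "Berlin": [0.0, 0.0],
    "France": [1.0, 1.0],
    "Germany": [0.0, 1.0],
}
capital_of = [0.0, 1.0]

if __name__ == "__main__":
    countries = {name: entities[name] for name in ("France", "Germany")}
    print(rank_tails(entities["Paris"], capital_of, countries))   # ['France', 'Germany']
    print(rank_tails(entities["Berlin"], capital_of, countries))  # ['Germany', 'France']
```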

Author Disambiguator now uses split graph

As of October 1, 2024, the Author Disambiguator tool has switched from using the original Wikidata Query Service to using the new split graph services. The tool defaults to using the "scholarly" graph to find authored items; however, this can be changed on a session-by-session basis using a new "Preferences" page. Check the box to switch to using the "main" subgraph instead of the scholarly one for authored works. Please let me know if you run into any problems; suggestions can also be submitted as a GitHub issue. ArthurPSmith (talk) 14:52, 1 October 2024 (UTC)
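For anyone writing a similar tool, the preference boils down to choosing which SPARQL endpoint receives the query. The sketch below is not the Author Disambiguator's code. The two endpoint URLs are assumptions based on the graph-split announcements and should be checked against the current Wikidata Query Service documentation. The query simply lists works whose author (P50) is a given item.

```python
from urllib.parse import urlencode

# Assumed endpoints for the two halves of the split graph (verify before use).
ENDPOINTS = {
    "main": "https://query-main.wikidata.org/sparql",
    "scholarly": "https://query-scholarly.wikidata.org/sparql",
}


def authored_works_query(author_qid: str) -> str:
    """SPARQL selecting works whose author (P50) is the given item."""
    return f"SELECT ?work WHERE {{ ?work wdt:P50 wd:{author_qid} . }}"


def query_url(author_qid: str, use_main_graph: bool = False) -> str:
    """Default to the scholarly graph; switch to main when the flag is set."""
    endpoint = ENDPOINTS["main" if use_main_graph else "scholarly"]
    return f"{endpoint}?{urlencode({'query': authored_works_query(author_qid)})}"
```

Keeping the default on the scholarly graph mirrors the behaviour described above, since most authored items are scholarly articles; the flag plays the role of the per-session checkbox.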

Lamia

Bartholomeus Anglicus's late medieval encyclopedia De proprietatibus rerum mentions (book 5, chapter 2, in Stephen Bateman's 1582 translation):

...a beaſt that is called Lamia, that hath as the Gloſe ſaith Super Tre. an head as a maide, and bodie like a grimme beaſt.

Which Lamia is the proper target of a link? Wikidata has Lamia (Q200073) and lamia in a work of fiction (Q59312503), but it's neither of those, because Bartholomeus clearly believed they were real. Marnanel (talk) 15:52, 1 October 2024 (UTC)

Merging multiple Wikidata entries into one

So, recently, I created a new page on the main Wikipedia entitled "LGBTQ themes in Western animation". The merged pages now redirect to the revised page. That's fine. However, the Wikidata entries for the now-merged pages, "LGBTQ themes in Western animation (Q96381090)", "LGBTQ themes in Western animation (Q104862909)", "LGBTQ themes in Western animation (Q104862902)", "LGBTQ themes in Western animation (Q104862898)" and "LGBTQ themes in Western animation (Q96381091)", still remain. I would like to merge them into "LGBTQ themes in Western animation (Q130371258)". How do I do that? Historyday01 (talk) 18:51, 1 October 2024 (UTC)

@Historyday01: Hi, I did it for you. But for the future you can find instructions at Help:Merge. Samoasambia 19:30, 1 October 2024 (UTC)
Thanks. I'll definitely keep that in mind going forward. Historyday01 (talk) 19:33, 1 October 2024 (UTC)
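When several duplicates have to be merged into one item, the Wikibase API action `wbmergeitems` (parameters `fromid`, `toid`, `token`) can be called once per duplicate. The sketch below only prepares the POST bodies; the helper name is hypothetical, and actually sending them requires a logged-in session and a CSRF token (for example from `action=query&meta=tokens`). Help:Merge remains the reference for the manual procedure.

```python
from typing import Dict, List


def merge_requests(duplicates: List[str], target: str, token: str) -> List[Dict[str, str]]:
    """Build one wbmergeitems POST body per duplicate, each merging into target."""
    if target in duplicates:
        raise ValueError("target must not be listed among the duplicates")
    return [
        {
            "action": "wbmergeitems",
            "fromid": qid,    # item being merged away
            "toid": target,   # item that receives the data
            "token": token,   # CSRF token of the logged-in user
            "format": "json",
        }
        for qid in duplicates
    ]
```

For the case above, `merge_requests(["Q96381090", "Q104862909", "Q104862902", "Q104862898", "Q96381091"], "Q130371258", token)` yields five requests, one per old item.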

Please delete Q57539376 and Wikidata:WikiProject sum of all paintings/Exhibitions/Salon de 1871

I created this item and list by mistake. There was no Salon in 1871 because of the Franco-Prussian War. Carl Ha (talk) 19:29, 1 October 2024 (UTC)

@Carl Ha: done, but please use Template:Delete or WD:RfD in future. --Wüstenspringmaus talk 10:25, 2 October 2024 (UTC)

What's the difference between Olympic sporting event (Q18536594) and Olympic sports discipline event (Q26132862)?

Apparently the latter includes the former, but I can't really figure out the difference, Strainu (talk) 21:47, 1 October 2024 (UTC)

The former seems to be for actual events within the discipline, like snowboarding at the 2010 Winter Olympics – women's halfpipe (Q263926), whereas the latter is for general disciplines, like snowboarding at the 2010 Winter Olympics (Q381127) — Martin (MSGJ · talk) 11:21, 2 October 2024 (UTC)

Unattended report

Hello, what next steps would you recommend when my user report has been left unattended by the administrators for almost a week, while the affected Wikidata items still have incorrect data? Flipping Switches (talk) 09:49, 2 October 2024 (UTC)

Plus, the user's tone has taken a nearly offensive turn. Flipping Switches (talk) 10:43, 2 October 2024 (UTC)
Does this relate to Wikidata:Administrators'_noticeboard#Report_concerning_User:Шкурба_Андрій_Вікторович? Probably better to keep the discussion in one place. Keep posting until you get a response — Martin (MSGJ · talk) 11:22, 2 October 2024 (UTC)

Vandalism by 114.5.110.202 on Oct 2, 2024

Hey there, it seems someone behind 114.5.110.202 vandalised some items: [7]. Can somebody with the right tools revert the edits, please?

--Frlgin (talk) 13:11, 2 October 2024 (UTC)

Done: Reverted & blocked. @Frlgin: Please report vandals on WD:AN next time. Thanks! --Wüstenspringmaus talk 13:17, 2 October 2024 (UTC)

Geopatronyme family name ID (P3370) now redirected to Filae

The new pattern is "http://www.filae.com/nom-de-famille/$1.html". And because the .html part is new, the automatic redirection to Filae returns a 404 error message. Rosenzweig (talk) 17:52, 2 October 2024 (UTC)
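A formatter URL is a template in which `$1` stands for the identifier value, so the `.html` suffix has to be part of the pattern itself for generated links to reach the right page. The minimal sketch below shows the substitution with the new pattern. The identifier values in the usage note are hypothetical, and percent-encoding with `urllib.parse.quote` is this sketch's own choice, not necessarily how Wikibase encodes values.

```python
from urllib.parse import quote

# New formatter URL pattern for P3370, as reported above.
FILAE_PATTERN = "http://www.filae.com/nom-de-famille/$1.html"


def format_external_id(pattern: str, value: str) -> str:
    """Substitute the identifier value for $1, percent-encoding unsafe characters."""
    return pattern.replace("$1", quote(value, safe=""))
```

With a hypothetical value `martin`, the result is `http://www.filae.com/nom-de-famille/martin.html`; a value containing a space such as `le goff` becomes `le%20goff` in the path.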