Boghog
Important molecular biology task force discussions
Hello, you signed up as a participant for the Molecular biology task force. As promised, we are keeping you up to date on important discussions as they arise. Currently, we are very interested in your thoughts on how orthologs are handled in Wikidata. Please chime in here if you have thoughts. There is also another discussion on whether genes and proteins should be combined in a single wikidata item or kept separate. Both these issues are critical for how we model genetic data, and it would be much better if we established solid community consensus now. Your input is appreciated! Cheers, Andrew Su (talk) 00:15, 23 June 2013 (UTC)
- Hi Andrew. Thanks for the heads up. Both discussions are interesting and important and I have responded to both. Cheers, Boghog (talk) 11:02, 23 June 2013 (UTC)
Thanks!
FYI, this comment was meant for you too... User_talk:Emw#Thank_you.2C_and_forge_ahead.21 As always, I'm enjoying working with you! Cheers, Andrew Su (talk) 20:25, 30 June 2013 (UTC)
The property PubChem CID (P662) that you supported is available now. --Tobias1984 (talk) 14:57, 2 July 2013 (UTC)
The property NCBI taxonomy ID (P685) that you supported is available now. --Tobias1984 (talk) 09:56, 10 July 2013 (UTC)
sitelinks
Hi Boghog, is there a reason that you move enwiki sitelinks to items that no longer fit the main article subject? Example: the enwiki fatty acid synthase is mainly about all the fatty acid synthases that exist, i.e. the whole family, not the human protein. --SCIdude (talk) 15:35, 14 February 2020 (UTC)
- @SCIdude: Thanks for your message. The scope of Gene Wiki articles is not only the human gene and the protein encoded by that gene, but also orthologs that exist in other species. Please also note that the redirect from the human FASN (gene) is to Fatty acid synthase. Furthermore, fatty acid synthase, subgroup (Q81924470), strictly speaking, is not a family of proteins but a chemical reaction catalyzed by a family of enzymes (see Enzyme Commission number). In order for Template:Infobox gene on Fatty acid synthase to pull data from Wikidata, the Wikidata page for fatty acid synthase (human) (Q419864) must link to Fatty acid synthase. Otherwise the infobox will not display any data. That is why I moved the enwiki sitelink from fatty acid synthase, subgroup (Q81924470) to fatty acid synthase (human) (Q419864). I had a similar problem with Insulin. I suspect that there are many more like this. I think the problem is that Wikidata assumes that there is a one-to-one correspondence between Wikidata items and Wikipedia articles, but this is not always the case in practice. Boghog (talk) 17:01, 14 February 2020 (UTC)
- fatty acid synthase, subgroup (Q81924470) is a family of proteins because that is the value of its instance of (P31) statement. This statement makes the item. The problem is rather that you do not understand that wiki sitelinks should be placed on the WD item that is the main subject of the article. That is not negotiable. If the infoboxes have problems then I have some ideas for improvements on the Wikidata side of things, but not by tweaking sitelinks. For example, if you talk about a catalyzed reaction, almost all enzyme families have an associated molecular function (P680) which you can use for filling the infobox. Why? Because enzymes can have multiple molecular function (P680) statements. An idea would be to have a model item of a family which would point to a human member of that family, or E. coli for bacterial families. Just be open and look at the item's statements, not the label. --SCIdude (talk) 17:55, 14 February 2020 (UTC)
- Maybe I should add that the main subject of an article is not what the name seems to suggest; it is what most sentences in the article are about. --SCIdude (talk) 18:10, 14 February 2020 (UTC)
- @SCIdude: a class of enzymes ≠ a class of proteins. The former is defined by function (what reaction the enzyme catalyzes) and the latter is defined by sequence homology. In Wikipedia, if there is a one-to-one correspondence between gene (and by extension protein family) and enzyme, then the two articles are merged. Currently Fatty acid synthase has (at least) three main topics: fatty acid synthase, subgroup (Q81924470) (a class of enzymes), fatty acid synthase (human) (Q419864) (the human protein), and the protein family (human + orthologs in other species). Right now, I agree that most of the article deals with the enzymatic function/mechanism, but there are also significant sections on the human gene/enzyme (regulation and clinical significance). It makes no sense to split one article that contains three closely interrelated topics unless the article grows extensively in size. So how are we going to fix Wikidata so that we can pull Wikidata into Template:Infobox gene, which expects human gene data, if the "main topic" is an enzyme class? Boghog (talk) 19:01, 14 February 2020 (UTC)
- I'm aware that a protein family is defined by homology. There is "group or class of proteins" for sets that are not defined this way, and we might need to make enzyme family a subclass of that. Usually in WD, if there is a protein family with enzymatic activity, it is an instance of both. As to Wikipedia articles being conglomerates of concepts, my proposal was to mimic that in WD by making a conglomerate item that links to all concept items (genes, proteins, complexes, families, activities, complex families) via main subject statements, and then to move all sitelinks to that. Infoboxes would need to handle that indirection, but at least the general problem would be solved. No one was interested, neither in WP nor in enwiki. --SCIdude (talk) 20:36, 14 February 2020 (UTC)
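For readers following the thread, the indirection proposed above could be sketched roughly as follows. This is only a toy, in-memory illustration and not how Template:Infobox gene or Wikidata actually work: the placeholder ID `Q_CONGLOMERATE`, the `kind` field, the `ITEMS` and `SITELINKS` tables, and the function `resolve_infobox_item` are all hypothetical names made up for the example. P921 is used here as the "main subject" property; the two Q-IDs are the ones mentioned in the discussion.

```python
# Toy in-memory model; no real Wikidata API is called.
MAIN_SUBJECT = "P921"  # Wikidata property "main subject"

# Hypothetical item store. "Q_CONGLOMERATE" is a made-up placeholder ID, and
# the "kind" field is an illustrative simplification, not a Wikidata property.
ITEMS = {
    "Q81924470": {
        "label": "fatty acid synthase, subgroup",
        "kind": "enzyme class",
        "claims": {},
    },
    "Q419864": {
        "label": "fatty acid synthase (human)",
        "kind": "human protein",
        "claims": {},
    },
    "Q_CONGLOMERATE": {
        "label": "Fatty acid synthase (article topics)",
        "kind": "conglomerate",
        "claims": {MAIN_SUBJECT: ["Q81924470", "Q419864"]},
    },
}

# Sitelink table: article title -> item that carries the enwiki sitelink.
SITELINKS = {"Fatty acid synthase": "Q_CONGLOMERATE"}


def resolve_infobox_item(title, wanted_kind):
    """Return the ID of the item an infobox should read, or None."""
    qid = SITELINKS.get(title)
    if qid is None:
        return None
    item = ITEMS[qid]
    if item["kind"] == wanted_kind:
        return qid  # direct hit: the sitelinked item is already the right kind
    # Indirection: follow "main subject" statements to the concept items.
    for target in item["claims"].get(MAIN_SUBJECT, []):
        if ITEMS[target]["kind"] == wanted_kind:
            return target
    return None


print(resolve_infobox_item("Fatty acid synthase", "human protein"))
```

In this sketch the article keeps a single sitelink on the conglomerate item, and an infobox that expects human gene/protein data follows the main subject statements to find the matching concept item instead of requiring the sitelink to sit on that item directly.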