"This has been an issue for more than six months": zh-pron audio files don't play in some circumstances
Day 3
for disaster ambiance:
Audio: | (file) |
Problem: zh-pron audio files don't play in some circumstances
(Summarizing previous conversation: [1])
"This has been a problem for a long time." - Tooironic
"has been an issue for more than six months"; "very frustrating"- Metaknowledge
[2] - an edit from April 2016, "A few days after" which Suzukaze-c says "the damn buttons stopped showing up"
"the problem has to do with its interactions with the collapsible element" - Justinrleung
"I tried it on my laptop using Opera and Firefox in Safe Mode. In my case, where there should be a play button etc, there's just a pale grey bar with nothing on it. This has been going on for months at least- I noticed this in December, ... I can play the file on my phone using the generic browser." - Geographyinitiative
Potential Solutions:
"What if we display the audio in the uncollapsed part, i.e. after the romanization in the main part instead of after the IPA in the collapsed part?" - Justinrleung
--Geographyinitiative (talk) 10:29, 1 April 2019 (UTC) (modified)
- To clarify, it was "a few days after" I officially implemented the hack in 2018. —Suzukaze-c◇◇ 06:22, 4 April 2019 (UTC)
- "The cause of your audio issue is a clash between the collapsible element and the audio. This is bug T130982." -Snaevar --Geographyinitiative (talk) 09:16, 8 April 2019 (UTC)
Please update the existing global script with this. This version includes the ability to detect the display form of the lemma from its headword, so that it doesn't need to be specified anymore. —Rua (mew) 16:38, 2 April 2019 (UTC)
The Wiktionary page [[ads]] is not loading. https://en.wiktionary.org/wiki/ads
I've tried this on multiple browsers. There seems to be a page there according to page source, but nothing renders. I've even tried it through a Glype proxy set to strip everything.
-- 70.51.201.106 06:19, 4 April 2019 (UTC)
- Could it be an adblocker? Perhaps it is being over-zealous. —Suzukaze-c◇◇ 06:19, 4 April 2019 (UTC)
- I use uBlock Origin and it is stating that it is not blocking anything on that page (I also have this domain and most other Wikimedia domains whitelisted), yet I am not seeing content (in Chrome). In Edge I do see the page load as normal. Even when I disable Chrome's internal ad blocker the page fails to load in Chrome. - TheDaveRoss 13:08, 4 April 2019 (UTC)
- I asked the kind folks in #wikimedia-tech, see Wiktionary:Information_desk#ads_blocked_by_AdBlock. - TheDaveRoss 14:11, 4 April 2019 (UTC)
Delimiters of affixes inadequately set
When I use the same code with {{suffix}} or {{affix}} in Aramaic entries as in Hebrew entries (where it yields Category:Hebrew words suffixed with ־ות (forming nouns)), the expected categorization does not happen, as exemplified by the page חָנוּתָא: the מַקָּף (maqqāp̄) hyphen is somewhere not set for Aramaic and a default hyphen-minus is used instead, which makes the categorization of Hebrew and Aramaic incongruent (at the same time the entries of Category:Aramaic roots use the maqqāp̄). Last time I tried, it did not work as expected with Persian either: when I use the templates with Arabic I can have a تَطْوِيل (taṭwīl), but not with Persian, which has a special template {{fa-suffix}} as a hack for the same result. Somewhere in some back end something must be changed. I assume that, instead of imposing the hyphen on all scripts whenever a language has no special settings, the scripts themselves should have delimiters set: taṭwīl for Arabic script and maqqāp̄ for Hebrew script. Fay Freak (talk) 22:33, 4 April 2019 (UTC)
- I agree that this ought to be set by script, but that would still present a problem for Aramaic, which is attested in multiple scripts, not just Hebrew. —Μετάknowledgediscuss/deeds 00:23, 5 April 2019 (UTC)
- Why, so the script needs to be detected for this purpose too, as it is anyway for formatting. Fay Freak (talk) 00:28, 5 April 2019 (UTC)
- @Fay Freak Currently Module:compound has special hyphen entries for Arabic (lang=ar), Persian/Farsi (lang=fa), Hebrew (lang=he) and Yiddish (lang=yi). I'm not sure why your experience with Persian doesn't work. As for Aramaic, the issue here as User:Metaknowledge points out is that there are multiple possible scripts, and script detection currently happens too late for it to be easily usable here. I may be able to fix that by checking for a given language if it has multiple scripts, and if so, manually calling script detection before linking; let me think about this. Benwing2 (talk) 06:57, 5 April 2019 (UTC)
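For readers following along, here is a minimal sketch of the idea under discussion: detect the script of the text first, then look up the affix delimiter by script rather than by language. It is written in Python, not the Lua of the actual modules. The names `SCRIPT_HYPHENS`, `detect_script` and `hyphen_for` are hypothetical, and the code-point ranges are a deliberate simplification (Syriac-script Aramaic, for instance, would need its own entry).

```python
# Hypothetical script -> affix delimiter table (an assumption for illustration).
SCRIPT_HYPHENS = {
    "Arab": "\u0640",  # tatweel / kashida
    "Hebr": "\u05be",  # maqaf
    "Latn": "-",       # hyphen-minus, also used as the fallback here
}

def detect_script(text):
    """Very rough script detection based on the first recognised character."""
    for ch in text:
        cp = ord(ch)
        if 0x0590 <= cp <= 0x05FF:
            return "Hebr"
        if 0x0600 <= cp <= 0x06FF:
            return "Arab"
    return "Latn"

def hyphen_for(text):
    """Pick the delimiter from the script of the text, not from the language code."""
    return SCRIPT_HYPHENS[detect_script(text)]
```

Under these assumptions an Aramaic term written in Hebrew script gets the maqaf without any Aramaic-specific setting, because only the script is consulted.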
- Persian and Arabic behave differently, example code:
{{affix|ar|هَرْج|ـِيّ}}
{{affix|fa|هرج|ـی}}
- equal in categorization to:
{{suffix|ar|هَرْج|يّ}}
{{suffix|fa|هرج|ی}}
- The first of each categorizes to Category:Arabic words suffixed with ي, the second to Category:Persian words suffixed with ـی. Whence this difference? @Benwing2
- Category:Uyghur words by suffix has the hyphen-minus.
- Fay Freak (talk) 11:00, 5 April 2019 (UTC)
- @Fay Freak I looked into this. This happens because the entry for Arabic in Module:languages/data2 has tatweel (Unicode 0x0640) listed as a character to remove (along with vowel diacritics) when forming entries, but Persian doesn't. What's the correct behavior? Should the tatweel character be present in the "Arabic words suffixed with FOO" category? Benwing2 (talk) 13:03, 5 April 2019 (UTC)
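The difference described above can be pictured with a small sketch. This is Python, not the module's Lua, and the table `ENTRY_NAME_STRIP` and function `make_entry_name` are hypothetical names; the character lists are an illustrative approximation, not a copy of Module:languages/data2.

```python
TATWEEL = "\u0640"
# U+064B..U+0652: the common Arabic vowel marks, shadda and sukun.
ARABIC_VOWEL_MARKS = {chr(cp) for cp in range(0x064B, 0x0653)}

# Hypothetical per-language sets of characters removed when forming entry names.
ENTRY_NAME_STRIP = {
    "ar": ARABIC_VOWEL_MARKS | {TATWEEL},  # assumed: Arabic also strips tatweel
    "fa": ARABIC_VOWEL_MARKS,              # assumed: Persian keeps tatweel
}

def make_entry_name(lang, text):
    """Drop every character that the language's (assumed) settings exclude."""
    strip = ENTRY_NAME_STRIP.get(lang, set())
    return "".join(ch for ch in text if ch not in strip)
```

With these assumed tables the Arabic suffix loses its tatweel while the Persian one keeps it, which matches the category names reported in the example above.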
- The problem with that all is that, if I see it correctly, this not only affects categorization but also the links and “allowed” entry names (which categorize as “terms spelled with X” if there is a character excluded), so the Persian links to ـی and the Arabic to ي; Sindhi seems to have the same settings as Arabic again, looking at that module data. Now I realize what’s up with هـ, that is written like that in the outside world, unlike the affixes which have only fictitious strokes, but we can’t link it with {{m}} etc.: Some talk about it there was in User talk:Wyang/Archive7#Some Arabic tracking categories but we didn’t realize the whole problem back then.
- My tendency is that the affixes in all Arabic script categories should have the taṭwīl in the appropriate places though I would also agree with all entries having none because “suffixed with”, “interfixed” with etc. contains the information already, whereas I assume there should be no affix pages with taṭwīl because people don’t expect to enter ـی instead of ی to get the information they want, they don’t see the reason by which there is differentiation (and I suspect the entries are rather arbitrarily entered for Persian with and without kešide, too, there not being any reason). They only see هـ as distinct perhaps, which could however also be on ه. I see that the ezâfe sign ی does not connect to the right but does this justify having the suffixes at ـی on separate pages, instead of only one page where the headlines do the differentiations? And yes indeed, now I find the Persian pages are arbitrarily named, looking into Category:Persian suffixes. So we have وش though it apparently connects to the right (or sometimes, in names, not?), and Category:Persian words suffixed with ـپرست though it apparently doesn’t connect to the right, and Category:Persian words suffixed with ـگی with entries linking to گی, whereas we have ـستان. This is some confusing state achieved with the template {{fa-suffix}}, which is apparently used to link to entries without the kešide where {{suffix}} would link with a kešide, and needs to be cleaned up sooner or later: I think Persian editors intended to have all suffixes on pages without kešide. Because فرهنگی# actually links to ی#Persian instead of ـی where the suffix entry is, but {{suffixsee}} does not work without manual correction (|1=) of the category name on pages without kešide. Here you see that the parallelism of linking+categorization either both with or without tatweel has gone wrong. The programming needs to be fixed, the Persian pages placed correctly, and {{fa-suffix}} deleted, in accordance with our recent turn towards general templates, @Benwing2.
- Maybe the module stuff, while needing to be rewritten as I have demonstrated by this, should be rewritten in the fashion that the information about which characters shouldn’t appear in entries are stored in variables for a whole script and this variable can be accessed by each language, with only additional language-specific characters or language-specific deductions from the general set, and parallelly different variables for which characters will appear in category names to make differentiation possible, maybe this would also save some memory, right @Erutuon? Instead of copying the same or inconsistent stuff across languages, “basic” data for a script? This would also mean that if a language is added it is mostly enough to add the script without making those entry_name exceptions, only write that the general applies (which should be stated for each language to prevent cases of unthought languages where what is thought to be a diacritic turns out, like in Yiddish contra Hebrew, as being part of a distinct character). Every Arabic dialect, arz, aec, and so on, such as in خاشوقة, has its own character list which is only a copy. Aramaic in Module:languages/data3/a has the same diacritics as Hebrew for its Hebrew script and uses the same delimiters, etc. etc. Fay Freak (talk) 14:46, 5 April 2019 (UTC)
- @Fay Freak Sorry, what you've written is a bit confusing ... If you can make a concrete list of exactly what you'd like to see done, I can see about implementing it. Benwing2 (talk) 18:38, 5 April 2019 (UTC)
- @Benwing2 That’s because the state of things is confusing.
- 1. if you look into Category:Persian words by suffix, some pages are created with taṭwīl, some without.
- 2. However the affix templates, {{suffix}} and {{affix}}, cannot be used to link from etymology sections to affix entries that do not contain a kešide in the page name, which are the majority.
- 3. Editors circumvent 2. by using {{fa-suffix}}, which should be deleted according to our recent turn towards general templates after the Modules and the page names are put into order.
- 4. the whole calamity arises by the character of the maqaf and the tatweel not being recognized as what is for example for English the hyphen-minus; they are not represented correctly by the current system. Module:languages/data2 seems to be not at all the place where tatweel and maqaf belong to?
- 5. Arabic suffixes are all without tatweel in the page name, and
- a) I argued that this is the state preferred by users for Arabic script entries of any Arabic script language, in particular as shown by the Persian editors so far: Most affix entries do not have a tatweel but via {{fa-suffix}} their categorizations contain them. The affixes should by this reasoning all be on pages without taṭwīl for Persian as for any other Arabic-script language.
- b) But I discard 5. a) seeing that English, Latin entries etc. have the affix entries, e.g. con-, entered with hyphens. It would be easier to implement that a tatweel equals a hyphen for these scripts and move all Arabic affixes (they are few) and Persian entries without tatweel to the respective places; similarly it seems with Hebrew which already handles the maqaf like English. I guess @Rua would agree here. I wonder why nobody has noted this inconsistency yet.
- 6. This leads me to the observation that the presumption of the current implementation that linked page names are mirrored in the category names, is wrong. شناسی not having a tatweel does not entail the categorization Category:Persian words suffixed with شناسی: Instead people chose Category:Persian words suffixed with ـشناسی. But this is solved if the tatweel is consistently treated as a hyphen in Latin entries.
- 7. An additional problem arises however because some Persian suffixes (even for letters which do connect to the right) connect to the right and some do not. The suffix in بوییدن (buyidan) connects to the right, the suffix in انسانشناسی (ensân-šenâsi) doesn’t. This can be solved, methinks, by the affix templates recognizing the presence of a zero-width non-joiner at the beginning of |2=, |3= and accordingly categorize Category:Persian words suffixed with ـیدن but Category:Persian words suffixed with شناسی. A zero-width-joiner then denotes the end of an affix, so to say.
- 8. Small problem: There are terms appearing with tatweel: هـ appears in this form in Arabic texts.
- 9. {{suffixsee}} does not work if the affix page name is ≠ the category’s affix name, as a side note, so it wouldn’t work unless شناسی is a page with zero-width-non-joiner in the page name. This can lead to an argument for zero-width-non-joiners in page names and category names, as it has already been noted that a zero-width-joiner then denotes the end of an affix, but without this one can link through {{suffixsee}} in the شناسی cases by the |1=. (Striked because of an inconsequent presumption that the affix template does not discard the ZWNJ in the category name) this will be fixed. Fay Freak (talk) 04:34, 8 April 2019 (UTC)
- 10. Taken together the easiest solution seems
- a) we should remove the tatweel (0x0640) from the Module:languages/data2 and all other language data linked there (Arabic dialects, Sindhi). This would also solve 8.
- b) recognize the tatweel as “the hyphen for Arabic script” and treat it as such in the affix templates. Also treat the maqaf as “the hyphen for Hebrew script”. The problem of Aramaic {{affix}} instances not employing the maqaf will be solved this way; Arabic script entails tatweel, Hebrew script Maqaf, in the links and the categories.
- c) also treat the zero-width non-joiner as a hyphen in the affix templates, to fix point 7.
- aa) If d) aa) is true, then the affix templates will not link to pages with ZWNJ but to the passed argument without ZWNJ, and the categorizations will not contain ZWNJs.
- bb) If d) bb) is true, then the affix templates will link to pages with ZWNJ, and the categorizations will contain ZWNJs.
- d)
- aa) Variant 1: move all affix entries in Arabic script to pages with tatweel unless they do not connect like شناسی.
- bb) Variant 2: move all affix entries in Arabic script to pages with tatweel, but those that do not connect like شناسی to the same with zero-width non-joiner. شناسی gets moved to شناسی (with ZWNJ).
- e) remove {{fa-suffix}}. Through a) to d) we have all to handle Persian suffixes as well as other Arabic script suffixes correctly.
- 11. Before the insight into the nature of the tatweel as better being treated as not a diacritic but as the hyphen, I had the side idea that the information about which characters shouldn’t appear in entries should perhaps be stored in variables more. Instead of repeating that which is under entry_name = for every entry maybe the assignment of a script automatically entails certain diacritics being treated as such, if not set otherwise as for example Yiddish needs diacritics in entry names as opposed to Hebrew. Fay Freak (talk) 22:52, 5 April 2019 (UTC)
Some Arabic-script affixes also have hyphens. I gathered all titles with Arabic-script characters and hyphens, and got the languages based on the lemma categories. (Not all of them are affixes.) The languages represented are Malay (15), Mozarabic (1), Ottoman Turkish (3), Urdu (2), Uyghur (79):
- Titles (click to toggle)
- -ئاي: Uyghur
- -ال:
- -ان: Malay
- -تا: Uyghur
- -تىكى: Uyghur
- -تىن: Uyghur
- -تە: Uyghur
- -تەك: Uyghur
- -خانا: Uyghur
- -دا: Uyghur
- -داش: Uyghur
- -دىكى: Uyghur
- -دىن: Uyghur
- -دە: Uyghur
- -دەك: Uyghur
- -راق: Uyghur
- -رەك: Uyghur
- -سز: Ottoman Turkish
- -سى: Uyghur
- -سیز: Ottoman Turkish
- -غا: Uyghur
- -غىچە: Uyghur
- -قا: Uyghur
- -قىچە: Uyghur
- -قە: Uyghur
- -كىچە: Uyghur
- -كە: Uyghur
- -لار: Uyghur
- -لق: Ottoman Turkish
- -لىر-: Uyghur
- -لىرى: Uyghur
- -لىق: Uyghur
- -لىك: Uyghur
- -لۇق: Uyghur
- -لۈك: Uyghur
- -لەر: Uyghur
- -لەر-: Uyghur
- -ماق: Uyghur
- -مىز: Uyghur
- -مۇ-: Uyghur
- -مەك: Uyghur
- -نى: Uyghur
- -نىڭ: Uyghur
- -نىڭكى: Uyghur
- -ىش: Uyghur
- -ىم: Uyghur
- -ىم-: Uyghur
- -ىمىز: Uyghur
- -ىڭ: Uyghur
- -ىڭلار: Uyghur
- -ىڭىز: Uyghur
- -ىڭىزلار: Uyghur
- -ىڭىزلەر: Uyghur
- -ي-: Uyghur
- -پ-: Uyghur
- -چى: Uyghur
- -چىلىك: Uyghur
- -چە: Uyghur
- -کو: Malay
- -ڭلار: Uyghur
- -ڭىز: Uyghur
- -ڭىزلار: Uyghur
- -ڭىزلەر: Uyghur
- -گىچە: Uyghur
- -گە: Uyghur
- -ۇش: Uyghur
- -ۇم: Uyghur
- -ۇڭ: Uyghur
- -ۇڭلار: Uyghur
- -ۈش: Uyghur
- -ۈم: Uyghur
- -ۈم-: Uyghur
- -ۈڭ: Uyghur
- -ۈڭلار: Uyghur
- ئا-: Uyghur
- ئابدۇل-: Uyghur
- ئاق-: Uyghur
- ئەبۇ-: Uyghur
- ا-: Urdu
- ال-:
- ان-: Urdu
- اونڠ-انيڠ: Malay
- ايک-: Malay
- اِيّارِ-: Mozarabic
- بى-: Uyghur
- بەت-: Uyghur
- تري-: Malay
- تق بر-: Malay
- تق تاهو مناري دکاتاکن لنتاي جوڠکڠ-جوڠکيت: Malay
- تل-أبيب:
- تيدق بر-: Malay
- دوي-: Malay
- س-: Malay
- سۈ-: Uyghur
- قى-: Uyghur
- قىش-ياز: Uyghur
- مڠ-: Malay
- نا-: Uyghur
- پاك-پاكىز: Uyghur
- ڤنچا-: Malay
- ک-: Malay
- کو-: Malay
- ڽه-: Malay
The ones without a language are redirects. — Eru·tuon 08:12, 6 April 2019 (UTC)
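A filter of the kind described above might look like the following sketch. It is Python and purely illustrative: the function name `arabic_titles_with_hyphen` and the chosen code-point ranges are assumptions, and it says nothing about how the list above was actually produced.

```python
import re

# Assumed ranges: the Arabic block and Arabic Supplement only.
ARABIC_CHAR = re.compile(r"[\u0600-\u06FF\u0750-\u077F]")

def arabic_titles_with_hyphen(titles):
    """Keep titles that contain a hyphen-minus and at least one Arabic-script character."""
    return [t for t in titles if "-" in t and ARABIC_CHAR.search(t)]
```

The language of each surviving title would still have to be looked up separately, for example from its lemma categories as mentioned above.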
- @Erutuon Thanks, I'll figure out what to do with those entries. Benwing2 (talk) 02:20, 8 April 2019 (UTC)
- @Fay Freak I have mostly rewritten Module:compound to have script-based hyphens instead of language-based hyphens. Just to clarify, you're asking to have two possible hyphens for Persian (and maybe other languages with the Arabic script), tatweel and ZWNJ, right? So the module will recognize either of them as a hyphen? The only tricky thing then is what to do with {{prefix}} and {{suffix}}, which can accept the affix without the hyphen and add the hyphen in order to generate the link and the category. When there are two hyphens, which one do we add? Maybe the correct thing is to always add tatweel, and to require that users on Persian pages either use {{affix}} (for which this problem doesn't exist) or explicitly specify the appropriate hyphen when using {{prefix}}/{{suffix}} (which is definitely allowed). Another possibility is to check for the existence of an affix page with the tatweel added, and if it doesn't exist, check for the existence of the affix page with ZWNJ added; if neither exists, fall back to tatweel. Checking for page existence is an "expensive" operation but I assume it will be OK because we should only be checking a limited number of times per page, and you can always avoid the problem entirely by explicitly specifying the hyphen, as I mention above. Benwing2 (talk) 02:31, 8 April 2019 (UTC)
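The existence-check fallback mentioned above could be sketched roughly as follows. This is Python rather than the module's Lua; `choose_suffix_page` is a hypothetical name, and `page_exists` is a stand-in callable for whatever (expensive) existence check the wiki provides.

```python
TATWEEL = "\u0640"
ZWNJ = "\u200c"

def choose_suffix_page(suffix, page_exists):
    """Pick the page title for a suffix that was given without any hyphen.

    Try tatweel first, then ZWNJ; if neither page exists, fall back to tatweel.
    `page_exists` is an assumed callable: title -> bool.
    """
    for hyphen in (TATWEEL, ZWNJ):
        candidate = hyphen + suffix  # the delimiter precedes a suffix in logical order
        if page_exists(candidate):
            return candidate
    return TATWEEL + suffix
```

At most two existence checks are made per affix, and none at all would be needed when the editor writes the hyphen explicitly.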
- @Benwing2 You have greatly developed the concept of the things. I wasn’t clear in expressing, if in thinking, what would be done with {{suffix}} as distinguished from {{affix}}. Apparently I had in mind that, since hyphens are only relevant in {{affix}}, by contrast in {{suffix}} a ZWNJ would not be handled (naturally) “as a hyphen” but as a sign serving the purpose of telling the module that this suffix does not connect, being discarded by the link and category (unless one chooses the variant where one moves these suffixes to pages with ZWNJ. But this is not necessary if one can discard and is rather theoretical anyway since people will be abhorred by pages names with ZWNJ). For {{affix}} the ZWNJ would also tell the module if the part is a suffix or if it is a prefix and it would also be discarded. As an alternative for {{affix}} one could also require that affixes use tatweel and discard this tatweel too when a ZWNJ is given at its tip, not interpreting a ZWNJ without tatweel as signifying that here is a suffix resp. prefix.
- One could of course also implement a parameter for the same effect: I don’t know if either alternative has to be considered disadvantageous as compared to the other. An implementation in the special affix templates is an issue of course relevant because one cannot use {{affix}} in all cases. {{circumfix}}, though I don’t know which Arabic script language would need it, cannot be replaced by {{affix}}. (And I mean the analogous for the other special affix templates if I talk about {{suffix}}, which you have already understood.)
- But hm, if one is to suggest parameters to note if an affix connects, why only for the special affix templates but not the general {{affix}}? {{affix}} could have something like |noconnectN=yes – or actually one would need |noconnectrightN=yes and |noconnectleftN=yes to set the connectability for both sides, or even |noconnecttopN=yes and |noconnectbottomN=yes for vertical script; one would then enter the affixes with kešide, by the position of which {{affix}} knows whether it is a suffix or prefix, and then if |noconnectrightN=yes is given the kešide gets removed? Strange, but possible. You have just found an additional tweak of fixing the things. It was just my biased idea that one would solve the things by handling a leading or trailing ZWNJ in the templates. Community might abhor new ZWNJs but be inclined towards parameters, or be content with ZWNJs since these are on all Persian keyboard layouts while protesting against additional parameters, or have no reason against any and want both for the sake of copiosity. My reservation against parameters for the exposed purpose is that it is script-specific and also a bit many possible parameters, see these examples for {{affix}}, hence I thought the parameters should handle ZWNJs as a part of the script, but this is a formalistic view. One says (?) also newbies are abhorred by too many parameters being around (rather than ZWNJs?). Fay Freak (talk) 04:34, 8 April 2019 (UTC)
- @Fay Freak OK, I'm confused again by all the text. I think I'm going to implement my first idea, that {{prefix}}/{{suffix}}/{{circumfix}} without hyphens assumes tatweel for Arabic script. If you want ZWNJ, add it explicitly (IMO you should add the hyphen explicitly in any case when dealing with Arabic script). I think ZWNJ is better than adding extra parameters. Benwing2 (talk) 04:59, 8 April 2019 (UTC)
- @Benwing2 Seems correct. I try to summarize with less useless distinctions:
- 1. So like for English a hyphen signifies if the part is a suffix or prefix, so will do the tatweel for all Arabic script languages, and so will the Arabic script suffixes be categorized with tatweel and have their entry names on a page with tatweel.
- 2. So like for English {{suffix}} adds a hyphen, so it adds a tatweel for Arabic script languages.
- 3. A ZWNJ in front of the content of {{suffix}} signifies “this end does not connect”, and these Arabic script suffixes will be categorized without tatweel and have their entry names on a page without tatweel, accordingly in such a case {{suffix}} does not add a hyphen. And the ZWNJ is not extant anywhere except in the wiki source code, in the linked page or category (“discarded”). Fay Freak (talk) 05:34, 8 April 2019 (UTC)
- 4. A ZWNJ in front in |3= (in the simple case) of {{affix}} signifies both “this end does not connect” and “this is a suffix, not a compound part” (the latter, because one wouldn’t use a tatweel because these suffixes do not connect). I alternatively suggested: If this is implemented, one could use ZWNJ+Tatweel and the tatweel is not being displayed by the affix template since the affix does not connect. The ZWNJ is not extant anywhere except in the wiki source code, in the linked page or category (“discarded”).
- For Hebrew script only 1 and 2 are relevant.
- I think now the thing will be easy to explain in documentations. Fay Freak (talk) 05:34, 8 April 2019 (UTC)
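The rules summarized above for a suffix argument in Arabic script can be read as the following sketch. It is Python and only one possible reading of points 2 to 4; `normalize_suffix` is a hypothetical name, not a function of Module:compound.

```python
TATWEEL = "\u0640"
ZWNJ = "\u200c"

def normalize_suffix(arg):
    """Return the form used for display, link and category of a suffix.

    - leading ZWNJ: the suffix does not connect; the ZWNJ is discarded,
      together with a tatweel directly after it, and nothing is added
    - leading tatweel: kept as written
    - neither: a tatweel is added, as a hyphen would be for Latin script
    """
    if arg.startswith(ZWNJ):
        rest = arg[len(ZWNJ):]
        if rest.startswith(TATWEEL):
            rest = rest[len(TATWEEL):]
        return rest
    if arg.startswith(TATWEEL):
        return arg
    return TATWEEL + arg
```

Under this reading the ZWNJ never reaches a page title or category name; it exists only in the wiki source as a marker.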
- @Fay Freak I think you mean that if a ZWNJ is specified in e.g. an argument to {{affix}}, that it signifies that the argument is an affix, but it will be displayed/linked/categorized without the ZWNJ (and without a tatweel)? I can implement that. Benwing2 (talk) 05:40, 8 April 2019 (UTC)
- @Benwing2 Yes. Seems not unreasonable, is it? That’s my idea of “how to use {{affix}} for affixes that do not connect”. The parameter idea we have already rejected, and we want to get rid of special templates like {{fa-suffix}}. Of course one could also be content with only {{suffix}} and the other special affix templates supporting the ZWNJ only for signifying that the affix does not connect, deciding “one can’t use {{affix}} for the suffixes”, but that with {{affix}} is thought one step further for the parallelism, I think. Fay Freak (talk) 05:52, 8 April 2019 (UTC)
Implementation of script-specific affix delimiters
@Fay Freak, Erutuon I implemented this. Erutuon, please watch out for module errors, and if you see a bunch, feel free to revert Module:compound. I haven't yet removed tatweel from the Arabic entry in Module:languages/data2; once we do that we should fix up the suffix categories and the suffixes themselves. Benwing2 (talk) 02:03, 9 April 2019 (UTC)
- @Benwing2: I made a page of affix templates containing right-to-left characters and discovered a module error in بختسیز (bahtsız). It's caused by the suffix -sız having a hyphen. The category Category:Ottoman Turkish words suffixed with -سیز also has an error message, and is placed in Category:Categories with incorrect name. So entries for hyphenated affixes and entries referencing hyphenated affixes need to be cleaned up, and their categories. — Eru·tuon 04:20, 9 April 2019 (UTC)
- @Erutuon: Hmmm. This is going to require some cleaning up. Maybe you can help? Can you make a list of all pages with affixes containing right-to-left characters as well as hyphens? I can find the affixes and affix cats themselves using categories like CAT:Uyghur suffixes and CAT:Uyghur words by suffix, but finding the words is harder because they will over time get removed from the categories as hyphen is no longer recognized as an affix character. We should also consult User:Oyunqi (for Uyghur) and User:Embryomystic (for Ottoman Turkish) to make sure that using tatweel for hyphen in those languages is correct. Benwing2 (talk) 04:42, 9 April 2019 (UTC)
- @Benwing2: Here's affix templates in which a positional parameter contains both a right-to-left character and a hyphen, if that's what you mean. I just had to filter the list that I linked to in my previous message. — Eru·tuon 04:58, 9 April 2019 (UTC)
- @Erutuon Thanks, that's what I wanted. I went ahead and added Latin hyphen as a possible hyphen character for Arabic scripts as a temporary hack to get rid of the errors, pending confirmation that replacing the hyphens with tatweel is the correct thing to do. Once we've decided what to do and implemented it, we can remove this hack. Benwing2 (talk) 05:07, 9 April 2019 (UTC)
- @Fay Freak See above discussion. What do you think? Currently there are a lot of Arabic-script etymologies, affixes and categories that use a Latin hyphen to denote the affix (in Uyghur, Ottoman Turkish, etc.). Should we convert all of these to tatweel and/or ZWNJ? Do any Arabic-script languages besides Persian make use of ZWNJ in suffixes? Benwing2 (talk) 02:19, 10 April 2019 (UTC)
- Probably Iranian languages in general do the same what Persian does. The users who deal with the languages will be able to handle all correctly since the ZWNJ has been implemented as a marker of non-connecting affixes without special restriction. Concerning moving all to tatweel-containing, that would be the most consistent and later-easiest-to-understand approach. What would be the alternative? Not to have any entries with the tatweel/kešide but only display it in the headword and link to affix entries without tatweel/kešide? But still categorize tatweel/kešide? We can have the tatweel everywhere or discard it partially like we discard ZWNJ because nobody links ZWNJ in pages names (but tatweel is of course not the same) – the question being tripartite, whether etymology sections, category names and entry names (and from the last, the links to entry names, which cannot be separated, since the point of this topic has become to fix the inconsistency) should use tatweel/maqaf at all. For having tatweel/maqaf on all three places speaks:
- 1. The analogy with Latin-script entries – then as well the Arabic entries should have tatweel and the Hebrew-script entries maqaf in the category names and the entry names (unless, for Arabic script, the suffixes do not connect).
- 2. This is the easiest from the technical implementation, I assume.
- 3. Also if we do not discard affix tatweel in entry names we can link entries like هـ with {{l}}, {{m}} and so on.
- 4. {{suffixsee}}/{{prefixsee}} etc. will work in any affix page of Hebrew- or Arabic-script languages.
- 5. It is the easiest to parse mentally and hence to understand for users: Link display = entry = category.
- 6. Probably it is also better for machine-readability and bottability.
- Just noticed by the way that the pages belonging to Category:Hebrew prefixes have inconsistent entry names while Category:Hebrew words by prefix has all with Maqaf, but Category:Yiddish prefixes has all consistent again as has Category:Yiddish words by prefix, same with Yiddish suffixes.
- For having a split approach speaks: Nothing, apart from not consistent former practice?
- The same way I do not see any reason to use hyphens for only some Hebrew and Arabic script languages. Additionally you see that in -سیز, to name only one, the hyphen is not even on the correct side in the page title.
- The same way I do not deem it advisable to do one thing for one language and one for the other in the same script. My original confusion arose because I couldn’t just use {{affix}} and {{affix}} and get the same for Persian as I get in Arabic, which is frustrating for polyglot endeavours. The only distinction here which has a ground here is between connecting and non-connecting suffixes.
- @ZxxZxxZ, Calak, Qehath, פֿינצטערניש, Ruakh, Isaacmayer9, Vtgnoq7238rmqco, Sinonquoi Any leaning to express?
- What we tend to do here is make the sign ـ for Arabic script languages and ־ for Hebrew script languages work like the hyphen - in Latin script languages in the templates {{affix}}, {{suffix}}, {{prefix}} and so on. These characters are going to be displayed in etymology sections and link to affix entries bearing the character in the name, except where the affix is written unconnected in Arabic script, and the category names will follow likewise. Currently there is wild growth; single languages have inconsistent entry and/or category names (some page names use the sign, some not, and category names and page names deviate from each other), and they are inconsistent with languages that use the same script (while Persian uses the kešide in category names, even if the suffix does not connect in script, for example Category:Persian words suffixed with ـشناسی, Arabic never uses it anywhere because the sign is excluded; Uyghur uses a hyphen instead of the tatweel everywhere, others have mixes), and templating does not work predictably: We can use {{suffix}} and {{affix}} for Arabic but not so for Persian, it uses {{fa-suffix}}. And {{suffixsee}} does not work without manual link correction on Persian suffix pages. Fay Freak (talk) 22:32, 10 April 2019 (UTC)
- @Fay Freak I don't quite understand what you wrote above but I take it you prefer the universal use of tatweel instead of Latin hyphen? This makes sense because the Latin hyphen is marked as LTR so it will appear on the wrong side of RTL text. One question though ... should we still use tatweel at the end of prefixes when the last character doesn't join to the left? Or should we use ZWNJ in that case? An example is the Arabic prefix لا. Tatweel with لا looks like لاـ, which some might object to, but ZWNJ is unlikely to be easily inputtable on Arabic-language keyboards. For now I'm going to rename all suffixes containing hyphens, where the non-joining issue doesn't occur because AFAIK all Arabic characters join to the right. Benwing2 (talk) 03:03, 11 April 2019 (UTC)
- @Fay Freak I don't understand entirely what the discussion is about, but in general I would support editing the affix template to automatically convert hyphens to their respective equivalents in Hebrew, Arabic, Yiddish, Persian, etc. based on the language code. פֿינצטערניש (Fintsternish), she/her (talk) 13:56, 11 April 2019 (UTC)
- @Fay Freak See above discussion. What do you think? Currently there are a lot of Arabic-script etymologies, affixes and categories that use a Latin hyphen to denote the affix (in Uyghur, Ottoman Turkish, etc.). Should we convert all of these to tatweel and/or ZWNJ? Do any Arabic-script languages besides Persian make use of ZWNJ in suffixes? Benwing2 (talk) 02:19, 10 April 2019 (UTC)
- @Erutuon Thanks, that's what I wanted. I went ahead and added Latin hyphen as a possible hyphen character for Arabic scripts as a temporary hack to get rid of the errors, pending confirmation that replacing the hyphens with tatweel is the correct thing to do. Once we've decided what to do and implemented it, we can remove this hack. Benwing2 (talk) 05:07, 9 April 2019 (UTC)
- @Benwing2: Here's affix templates in which a positional parameter contains both a right-to-left character and a hyphen, if that's what you mean. I just had to filter the list that I linked to in my previous message. — Eru·tuon 04:58, 9 April 2019 (UTC)
- @Erutuon: Hmmm. This is going to require some cleaning up. Maybe you can help? Can you make a list of all pages with affixes containing right-to-left characters as well as hyphens? I can find the affixes and affix cats themselves using categories like CAT:Uyghur suffixes and CAT:Uyghur words by suffix, but finding the words is harder because they will over time get removed from the categories as hyphen is no longer recognized as an affix character. We should also consult User:Oyunqi (for Uyghur) and User:Embryomystic (for Ottoman Turkish) to make sure that using tatweel for hyphen in those languages is correct. Benwing2 (talk) 04:42, 9 April 2019 (UTC)
I couldn't read all of the discussion. Just a note: it is not common to use tatweel to represent affixes in languages that use Arabic script, especially for Persian, in which the present stems of all verbs can act as affixes, so affixes are generally written as they are without any extra characters. --Z 08:01, 11 April 2019 (UTC)
- I think the basic issue is that the page name of a dictionary entry for a morpheme that's not written by itself is an artificial construct that's generally not present when the morpheme is in actual use- most of them would fail rfv if challenged as strictly spelled. The hyphen, or whatever else, is a convention we use to organize the entries in our dictionary. I doubt anyone searches for a hyphenated form unless they already know or guess that's how we list affixes. Chuck Entz (talk) 14:06, 11 April 2019 (UTC)
- @ZxxZxxZ, Chuck Entz There are two separate concepts w.r.t. hyphen-like characters, which are now properly separated in the code:
- The template hyphen, which is the character used in template code to denote an affix. For most languages, this is a regular hyphen, but for some (particularly right-to-left languages), a different character is used (Arabic tatweel, ZWNJ, Hebrew maqaf, etc.). There needs to be some character used (especially in the {{affix}} template), in order to indicate that a particular term is an affix (bound morpheme) rather than a regular term (free morpheme).
- The display hyphen, which is the character that actually appears in affix entry names on Wiktionary and in all displayed forms of the affix. For most languages, this is the same as the template hyphen, but it need not be. For example, for most East Asian languages (Chinese, Japanese, Korean, Thai, Lao, etc.), the template hyphen is the regular Latin hyphen, but the display hyphen is an empty string. For example, I can write
{{affix|ja|超-|tr1=chō-|t1=super-|音速|tr2=onsoku|t2=speed of sound|sort=ちょうおんそく}}
- and get
- Note that I put a hyphen in the wikicode after 超 to denote that it's a prefix (and hence the page will be categorized into Category:Japanese words prefixed with 超), but it doesn't appear in the displayed or linked form, or in the prefix category.
- The outcome of the above discussion with User:Fay Freak is that for Arabic scripts, there are (for the moment) three possible template hyphens: tatweel (which also maps to tatweel as a display hyphen), ZWNJ (which maps to a blank string as a display hyphen), and Latin - (which also maps to Latin - as a display hyphen). The presence of Latin - as a possible template and display hyphen character is solely because a lot of Arabic-script affix entries currently have it in them, and a lot of pages use it in the wikicode to link to those affix entries. I am planning on renaming the affix entries with a Latin hyphen in them, probably to instead contain a tatweel, and either remove Latin hyphen as a template hyphen character for Arabic scripts, or make it map to tatweel as a display hyphen. User:Fay Freak suggests that in many cases, it's correct to have a tatweel in the affix entry names, e.g. in cases like هـ; but an alternative is to write the affix entries without any display hyphen, like for East Asian languages. Either possibility can be supported in the code; but supporting two possible display hyphens for a given template hyphen is a bit tricky because you have to do something like check for the existence of the page with a tatweel appended, and if it doesn't exist, fall back to the form without the tatweel. Benwing2 (talk) 20:19, 13 April 2019 (UTC)
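The template-hyphen/display-hyphen split described above can be sketched roughly as follows. This is only an illustration in Python, not the actual module code: the table `DISPLAY_HYPHENS`, the function `display_form` and the script keys are hypothetical names, and the mappings simply restate what the comment above says (tatweel to tatweel, ZWNJ to nothing, Latin hyphen to Latin hyphen for Arabic script; Latin hyphen to nothing for East Asian scripts). The maqaf row is an assumption by analogy. Interfixes and the page-existence fallback are left out.

```python
TATWEEL = "\u0640"   # Arabic tatweel / kashida
ZWNJ = "\u200c"      # zero-width non-joiner
MAQAF = "\u05be"     # Hebrew maqaf

# Hypothetical table: script -> {template hyphen: display hyphen}.
DISPLAY_HYPHENS = {
    "Arab": {TATWEEL: TATWEEL, ZWNJ: "", "-": "-"},
    "Hebr": {MAQAF: MAQAF},   # assumed by analogy with tatweel
    "Jpan": {"-": ""},        # East Asian: hyphen in wikicode, none displayed
}

def display_form(term, script):
    """Return (displayed term, affix kind) for a term as typed in wikicode.

    The affix kind is "prefix", "suffix" or None. Strings are handled in
    logical order, so a hyphen-like character at the end marks a prefix
    even in right-to-left scripts.
    """
    hyphens = DISPLAY_HYPHENS.get(script, {"-": "-"})
    for template_hyphen, display_hyphen in hyphens.items():
        if len(term) <= len(template_hyphen):
            continue  # a bare hyphen is not an affix
        if term.endswith(template_hyphen):
            return term[: -len(template_hyphen)] + display_hyphen, "prefix"
        if term.startswith(template_hyphen):
            return display_hyphen + term[len(template_hyphen):], "suffix"
    return term, None
```

With this table, `display_form("超-", "Jpan")` gives the bare 超 marked as a prefix, while a Persian suffix typed with a leading ZWNJ loses the ZWNJ in its displayed form.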
Cleaning up lang-specific form-of templates, part #1
There are a lot of lang-specific form-of templates and they're messy and often duplicative. Here I just tackle a few of them with relatively few uses.
There are four languages handled here:
- Modern Greek templates like Template:Cretan dialect form of that are missing the el- prefix.
- Template:be-Taraškievica -> Template:be-Taraškievica spelling of.
- Obsoleting some barely-used Bulgarian templates in favor of more-used, more general equivalents.
- Cleaning up extra Hebrew templates. All these templates come in pairs, one of which uppercases the first letter and one of which doesn't. The normal way to handle this is to use |nocap=1, which makes sense here as the uppercase templates are much more common than the lowercase ones (the #uses above is misleading as the uppercase ones call the lowercase ones, so the #uses for the lowercase ones includes uses of the corresponding uppercase ones).
Benwing2 (talk) 06:51, 5 April 2019 (UTC)
- {{praenominal abbreviation of}}: Rename to {{la-praenominal abbreviation of}}. Matthias Buchmeier (talk) 09:31, 6 April 2019 (UTC)
- @Matthias Buchmeier Thanks, this is fixed. I also made the above changes for all but Hebrew, which is trickier. Benwing2 (talk) 02:19, 8 April 2019 (UTC)
When a linking template is permissibly used with the second parameter being a hyphen but a gloss parameter is present, which I think is not unbeseeming, it categorizes as a term request
Example in the descendants section of σμίλη (smílē).
* {{desc|arc|bor=1|-|t=scalpel, shoe-knife}}
*: {{desc|syc|ܙܡܝܻܠܝܳܐ|tr=zmēlyā}}
*: Christian Palestinian Aramaic: {{l|syc|ܝܙܡܝܠ}}
*: {{desc|sem-jar|[[אִיזְמֵיל]] / [[אִיזְמֵל]] / [[אִזְמֵל]]|tr=ʾizmēl}}, {{l|arc|אוּזְמֵל|tr=ʾuzmēl}}
** {{desc|ar|إِزْمِيل|bor=1|t=shoe-knife; chisel}}
* {{desc|he|אִזְמֵל|bor=1|tr=ʾizmēl|t=scalpel}}
vs.
* {{desc|arc|bor=1|-}}
*: {{desc|syc|ܙܡܝܻܠܝܳܐ|tr=zmēlyā}}
*: Christian Palestinian Aramaic: {{l|syc|ܝܙܡܝܠ}}
*: {{desc|sem-jar|[[אִיזְמֵיל]] / [[אִיזְמֵל]] / [[אִזְמֵל]]|tr=ʾizmēl}}, {{l|arc|אוּזְמֵל|tr=ʾuzmēl}}
** {{desc|ar|إِزْمِيل|bor=1|t=shoe-knife; chisel}}
* {{desc|he|אִזְמֵל|bor=1|tr=ʾizmēl|t=scalpel}}
Fay Freak (talk) 00:34, 7 April 2019 (UTC)
- {{desc}} is handled by Module:etymology/templates. The method of displaying annotations (gloss in this case) is pretty hacky: generate a link the usual way, but remove the term request text: {{desc|arc|-|bor=1|t=scalpel, shoe-knife}} yields → Aramaic: (“scalpel, shoe-knife”). I don't think parentheses make sense here but don't know what the thing should look like.
- Now the module removes the term request category. — Eru·tuon 01:27, 7 April 2019 (UTC)
trail
Can someone please check the entry for English trail? I do not see anything past the Translation section for the Verb. No Noun section is showing, even though it is still there. Leasnam (talk) 04:34, 8 April 2019 (UTC)
- Same, all under it cut. Fay Freak (talk) 04:36, 8 April 2019 (UTC)
- Fixed; the rest of the contents entered the last translation box due to botched wikicode. —Suzukaze-c◇◇ 04:42, 8 April 2019 (UTC)
- Oh, now I see, this IP edit is some exquisite vandalism, confusing that far. Fay Freak (talk) 04:54, 8 April 2019 (UTC)
- Thanks, All ! Leasnam (talk) 12:38, 8 April 2019 (UTC)
Feature request: update navigation popups
Can anyone update the Lupin/navigation popups, MediaWiki:Gadget-popups.js, so that it grabs/shows at least the first definition on a page? At present it seems to stop at/before the second header of a page, which means it only displays the first language header but not any definitions, which makes it next to useless for examining page content and useful only for examining diffs (when looking for vandalism in one's watchlist, recentchanges, etc). Or is there some better popup gadget with this functionality? - -sche (discuss) 20:08, 10 April 2019 (UTC)
- This gadget loads MediaWiki:Gadget-popups.js on Wikipedia. Pretty formidable bit of JavaScript. Maybe only Previewmaker.prototype.firstBit has to be changed though. It seems to determine the parts of the wikitext that are displayed. One complication would be the necessity of expanding templates, which popups currently don't do. — Eru·tuon 20:21, 10 April 2019 (UTC)
- Is it necessary to expand templates? Even if the popups didn't expand any templates, and only displayed their wikicode (or ignored/dropped any templates other than {{l}}, and either displayed it as {{l|en|wikicode}} or converted it to a simple wikilink), it would be an improvement, IMO (if they managed to display the text of the page down to the definitions). - -sche (discuss) 21:32, 10 April 2019 (UTC)
- I suppose it wouldn't hurt to remove most templates in an initial version. Many definitions would display well this way. But ideally a final version would have some way of displaying definitions with form-of templates or non-gloss definitions. — Eru·tuon 00:20, 11 April 2019 (UTC)
- Navigation popups are used mostly (almost exclusively?) by experienced editors. I don't think they would be daunted by the display of wikitext. DCDuring (talk) 00:34, 11 April 2019 (UTC)
- Perhaps, but that would probably require more dramatic changes to the gadget, since it does some kind of wikitext parsing right now. — Eru·tuon 17:53, 12 April 2019 (UTC)
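To make the "drop most templates in an initial version" idea concrete, here is a rough Python sketch of the text processing involved. It is not the gadget's code (the gadget is JavaScript, and `Previewmaker.prototype.firstBit` is not reproduced here); `first_definition` is a hypothetical helper that finds the first definition line, turns {{l|lang|term}} into a plain wikilink and drops every other template without expanding it.

```python
import re

# Matches an innermost template, i.e. one containing no nested braces.
TEMPLATE = re.compile(r"\{\{([^{}]*)\}\}")

def _simplify(match):
    parts = match.group(1).split("|")
    # Keep {{l|lang|term}} as a simple wikilink; drop all other templates.
    if parts[0].strip() == "l" and len(parts) >= 3:
        return "[[" + parts[2] + "]]"
    return ""

def first_definition(wikitext):
    """Return the first definition line with templates removed, or None."""
    for line in wikitext.splitlines():
        # Definitions start with a single "#"; skip examples ("#:"),
        # quotations ("#*") and subsenses ("##").
        if line.startswith("#") and not line.startswith(("#:", "#*", "##")):
            text = TEMPLATE.sub(_simplify, line[1:]).strip()
            if text:
                return text
    return None
```

A definition that consists only of a form-of template comes out empty and is skipped, which is exactly the limitation raised above: a final version would need some way of rendering those.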
Fixing an oversight with the id= parameter on affix templates
I figured this problem would come up sooner or later, and now it has. In the entry affectioned, it would be useful to specify a sense ID for the suffix -ed so that the user is sent to the right section. But if you give the id2= parameter on the affix template, then that also changes the categorization. In this case, the categorization is unwanted, because there is only one -ed suffix that actually derives new lemmas (the others are inflectional, so they don't get their own categories). It is not possible to specify a sense id without affecting the category name, and that's an oversight I didn't think of when I first introduced this mechanic.
The question, of course, is what to do about it now. My own idea is to introduce a cat idN= parameter (in the same spirit as {{head}}'s cat sc=), which specifies that the id is to be used both as the sense id and as the category qualifier, which is the behaviour that idN= still has right now. The new idN= will then specify only the sense id, but not change the category. In theory it's also possible to specify both, but then the category has a different sense id from the link, which may not be desired, so we should probably block that possibility until someone comes up with a legitimate reason to allow it.
Implementing this change will mean that we first have to change all existing idN= instances to cat idN=, on the affix templates and within all categories having an ID, to preserve the current behaviour of existing instances of the parameter. Once the idN= parameter is "orphaned", so to say, the new behaviour for this parameter can be implemented. —Rua (mew) 21:18, 10 April 2019 (UTC)
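The proposed split between idN= and cat idN= could behave roughly like the Python sketch below. This only illustrates the proposal and is not the affix module: `resolve_ids` is a hypothetical function, the "(id)" category qualifier format is an assumption, and the example id "adjectival" is made up.

```python
def resolve_ids(affix, sense_id=None, cat_id=None):
    """Return (id used for the link, category name) for one affix.

    sense_id models the proposed new idN=: it only affects the link target.
    cat_id models the proposed cat idN=: it affects both link and category.
    """
    if sense_id is not None and cat_id is not None:
        # The proposal blocks this combination until a legitimate use turns up.
        raise ValueError("give either idN= or cat idN=, not both")
    link_id = cat_id if cat_id is not None else sense_id
    category = "English words suffixed with " + affix
    if cat_id is not None:
        category += " (" + cat_id + ")"   # assumed qualifier format
    return link_id, category
```

For affectioned, `resolve_ids("-ed", sense_id="adjectival")` would send the link to the right sense while leaving the category as plain "English words suffixed with -ed".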
Isebaert 1977
I need to create a new template to cite the following as a reference in one of my entries:
Isebaert, Lambert (1977[79]). "Notes de lexicologie tokharienne II." Orbis 26: 381-387
How would I go about this? I'm reasonably sure how to make a new template, but I wanted to check first, seeing as I'm not too experienced with Wiktionary and this is my first time making a template. Would something like {{R:txb:Isebaert 1977}} work, displaying the above text of course? GabeMoore (talk) 20:22, 11 April 2019 (UTC)
- Yes, sounds good. Ideally you would use {{cite-book}} in your template, to get consistent formatting, e.g. {{cite-book|first=Lambert|last=Isebaert|pages=381-387|title=Notes de lexicologie tokharienne II.|worklang=fr|year=1977}}
- Isebaert, Lambert (1977) Notes de lexicologie tokharienne II. (overall work in French), pages 381-387
- – Jberkel 20:16, 13 April 2019 (UTC)
Korean interwiki links broken on "^" symbol
When adding translations to Korean terms with the symbol "^" (to force capitalisation of the transliteration), the links to the Korean Wiktionary are broken:
For example, 이탈리아 (Itallia), 이태리 (Itaeri), 이딸리아 (Ittallia) all have entries in the Korean Wiktionary. It has been the common practice that Korean transliterations can be capitalised (proper nouns, beginning of sentences); currently three languages with no capitalisation in the native scripts have exceptions. Korean is the weakest case: it hasn't been proven that it's the policy to capitalise romaja for proper nouns. In any case, the common practice is all over the place, used in usage examples throughout. We can review this, but please restore this functionality for the translation adder's interwiki links, which has recently disappeared. --Anatoli T. (обсудить/вклад) 07:05, 12 April 2019 (UTC)
- @Atitarev: Okay, I've modified MediaWiki:Gadget-TranslationAdder-Data.js to fix this, and it seems to work correctly now. (Fortunately this is one of the easier things to change about the translation adder.) — Eru·tuon 17:44, 12 April 2019 (UTC)
- @Erutuon: Thanks for that. The existing translations will need to be modified manually, again, I guess. Zero edit doesn't help. --Anatoli T. (обсудить/вклад) 22:58, 12 April 2019 (UTC)
- @Atitarev: Yeah. The translation adder doesn't modify existing translations, I think. — Eru·tuon 23:05, 12 April 2019 (UTC)
- I created this list of Korean translations with {{t}} from the latest dump that have an entry on the Korean Wiktionary, in case someone would like to switch them to {{t+}}. — Eru·tuon 00:04, 13 April 2019 (UTC)
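For readers wondering what kind of change this involves: the capitalisation marker has to be stripped from the term before the interwiki page title is built. The Python sketch below only illustrates that idea; the actual fix lives in MediaWiki:Gadget-TranslationAdder-Data.js and is not reproduced here, and both function names are hypothetical.

```python
from urllib.parse import quote

def interwiki_title(term):
    """Strip the "^" capitalisation marker, which is not part of the page name."""
    return term.replace("^", "")

def ko_wiktionary_url(term):
    """Build a Korean Wiktionary URL from a term as typed in the translation."""
    return "https://ko.wiktionary.org/wiki/" + quote(interwiki_title(term))
```

The marker still reaches the transliteration code through the original term; only the title used for the interwiki lookup is cleaned.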
Cleaning up lang-specific form-of templates, part #2
I gathered all the lang-specific form-of templates I could locate. There are 355 of them including aliases (321 not including aliases). It's a huge mess:
- they are inconsistently named;
- each one takes different parameters from every other one;
- the output formatting is inconsistent;
- each one categorizes idiosyncratically (or often not at all);
- the documentation is generally in a poor state.
Some of them do various sorts of language-specific things, but most of them don't: They simply do the equivalent of {{inflection of}} or one of the other general form-of templates, sometimes with additional categorization.
My belief is that we should replace them with general templates whenever possible, because the general templates (a) behave consistently, (b) take consistent parameters, and (c) are well-documented, all of which significantly reduce the cognitive load on editors trying to create or modify the wikitext for these pages. Note that, on top of this, most of these lang-specific form-of templates are for non-lemma forms, which can and should normally be autogenerated, either using the accelerated-creation gadget or by bot, and furthermore, the replacement text is often shorter than the original, now that I've created the {{infl of}} shortcut and made it possible to put the language code as the first parameter.
I went through all the lang-specific templates and classified them as follows:
- How many uses do they have?
- Are they replaceable with {{inflection of}} or another general form-of template without losing functionality?
- Do they add special categorization (not including categories like "FOO noun forms" that should already be handled by the headword)?
The list below is a subset, which includes only templates that (a) can be replaced with a general form-of template and (b) don't add special categorization.
In the "disposition" below, I assume that templates with >= 1,000 uses need to be deprecated rather than just deleted. I also think that with very high-use templates (which means maybe >= 20,000 uses or so), we should consider whether we want to deprecate them or just leave them alone; but none of the templates below are anywhere near this threshold (most of them are well below 1,000 uses).
Benwing2 (talk) 19:39, 13 April 2019 (UTC)
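The triage rules stated above can be summarised in a few lines of Python. This is a sketch only: `disposition` is a hypothetical helper, the thresholds are the approximate ones given above (1,000 and "maybe 20,000 or so"), and templates with complications are simply reported as kept for now.

```python
def disposition(uses, replaceable, categorizes):
    """Suggest what to do with one lang-specific form-of template."""
    if not replaceable or categorizes:
        # Outside the subset under discussion: needs language-specific handling.
        return "keep for now"
    if uses >= 20_000:
        return "discuss: deprecate or leave alone"
    if uses >= 1_000:
        return "deprecate"
    return "orphan and delete"
```

For example, a template with 144 uses that is replaceable and adds no categories would come out as "orphan and delete".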
- Support. - -sche (discuss) 18:09, 14 April 2019 (UTC)
- They're not listed here, but {{nl-noun form of}} and {{nl-adj form of}} are no longer needed either. WT:ACCEL now generates entries using generic templates like {{plural of}} for Dutch. The entry gemoedelijk has all inflections created via WT:ACCEL using the new format, and laadpaal equivalently for nouns. If you want to orphan the old templates and replace them with the new format, feel free. {{nl-verb form of}} could in theory receive the same treatment, but there are some forms like aanziet that {{inflection of}} doesn't really support yet. I'll have to work out how to add WT:ACCEL support for Dutch verbs before I can say anything conclusive about how to orphan this template. —Rua (mew) 18:19, 14 April 2019 (UTC)
- Support. Fay Freak (talk) 19:07, 14 April 2019 (UTC)
- @Rua I've actually classified all the form-of templates according to whether and how to orphan them (as I mention above); for the moment I only included the ones that don't have complications of various sorts. The section of my table for the Dutch templates reads as follows:
Aliased template | Canonical template | #Uses | Replaceable with {{inflection of}} etc. | Categorizes | Disposition |
---|---|---|---|---|---|
Template:nl-adj form of | Template:nl-adj form of | 4559 | No; needs some language-specific tags, has language-specific glossary links, has post-text when comp-of= or sup-of= is given | No | Keep |
Template:nl-noun form of | Template:nl-noun form of | 27827 | Yes | Yes | Keep for now due to #uses |
Template:nl-pronadv of | Template:nl-pronadv of | 144 | No | No | Keep |
Template:nl-verb form of | Template:nl-verb form of | 30619 | No due to archaic label and post-text when sub= is given | No | Keep |
- As you can see, my disposition for now is "keep" for all of them for various reasons. That doesn't mean they can't ultimately be orphaned as well, but there are complications. I already have a solution for the issue of categorization that I've implemented privately, but I haven't decided yet what to do in general for language-specific tags, language-specific glossary links, or post-text. One possibility for post-text is to create a separate (often language-specific) template that's placed after the call to {{inflection of}} or inserted using a |posttext= parameter to {{inflection of}}; I'm planning to do something of this sort for {{ga-lenition of}}, which has post-text like this:
{{#ifeq:{{{2|}}}|ts| {{qualifier|after {{m|ga||an}} and, in some dialects, after any ''l'' or ''n''}}|}}
- This happens when |2=ts, which occurs in only 4 or 5 calls to {{ga-lenition of}}. But this isn't a general solution, esp. when the occurrence of the post-text is common. Benwing2 (talk) 19:54, 14 April 2019 (UTC)
- The reasons for keeping {{nl-adj form of}} are not very pressing, so you can orphan it. Adding post-text to {{inflection of}} would completely break the auto-merging that WT:ACCEL does, so it needs a lot of thought. —Rua (mew) 20:07, 14 April 2019 (UTC)
Bug in #TABLE called on a list loaded from mw.loadData()?
@Erutuon, Rua Are either of you aware of the following: #TABLE on a list that's part of a structure loaded with mw.loadData() returns 0 instead of the correct number? That appears to be the case at least with the table returned by lang:getScriptCodes(). You can verify it in the console, e.g. by adding these lines to the beginning of the function export.request_script() in Module:script utilities:
if type(lang) == "string" then
    lang = require("Module:languages").getByCode(lang)
end
local scripts = lang:getScriptCodes()
error("#scripts:" .. #scripts)
Then call it from the console as
p.request_script("la")
and the output will be 0. If you modify the code as follows:
if type(lang) == "string" then
    lang = require("Module:languages").getByCode(lang)
end
local scripts = lang:getScriptCodes()
scripts = require("Module:table").shallowClone(scripts)
error("#scripts:" .. #scripts)
Then the output will be 1. This makes no sense to me but I have to assume it's some bug in the metatable that's set on data returned by mw.loadData(). If this applies generally, then it's probably the source of a ton of bugs in our existing code, YUCK. Note that for the structure that has #scripts == 0, you can still iterate over it using ipairs() and get the right values, but table.concat() returns a blank string. Benwing2 (talk) 22:25, 13 April 2019 (UTC)
- This is unfortunately the expected behavior. You can't get the length of frame.args or a table loaded with mw.loadData because they are both empty tables with metamethods that make them seem populated, and they don't have a __len metamethod, which in regular Lua 5.1 could be used to make # work. The __len metamethod seems not to function in Scribunto (demo). I wonder if there was some kind of security problem with it.
- The length function in Module:table is a more efficient workaround than shallowClone, because it doesn't involve creating a new table. — Eru·tuon 22:56, 13 April 2019 (UTC)
- @Erutuon Thanks. God, though, that's awful. There must be a ton of bugs floating around our code due to this damaged behavior. In general it's far from obvious whether a given table was loaded using loadData() or created normally. Benwing2 (talk) 23:17, 13 April 2019 (UTC)
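The pitfall can be mimicked outside Lua. The Python analogy below is only a model, not Scribunto code: `LoadedData` is a hypothetical stand-in for a table returned by mw.loadData (the visible object holds nothing itself and answers lookups from hidden data), and `length` shows the general idea of counting by probing successive indices, which needs neither a length operator nor a copy. How Module:table's own `length` is implemented is not confirmed here.

```python
class LoadedData:
    """Read-only proxy: lookups are answered from data hidden behind it."""

    def __init__(self, data):
        self._data = dict(data)

    def __getitem__(self, key):
        # A missing key yields None, playing the role of nil in Lua.
        return self._data.get(key)

def length(t):
    """Count consecutive entries t[1], t[2], ... up to the first missing one."""
    n = 0
    while t[n + 1] is not None:
        n += 1
    return n
```

One difference from Lua: Python's `len()` on such a proxy raises a TypeError, whereas `#` on the Lua proxy quietly returns 0, which is what makes the Lua version so easy to miss.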
- @Benwing2: Hmm, I just found a discussion on MediaWiki (Extension talk:Scribunto/Lua reference manual § __len metamethod) that indicates that the __len metamethod didn't exist in Lua 5.1. Looking at the changes for Lua 5.2, apparently the __len metamethod doesn't work for tables in Lua 5.1 (and there's no rawlen). So that's what's going on. (Presumably the type it did work for is userdata, which we don't have access to in Scribunto, and if we had access to debug.setmetatable, we could set it for most types.)
- It is a subtle pitfall. I encountered it before when revamping WT:LL. It took me a while to figure out what was happening. I think there aren't too many places where the length operator is used on such a table, but one place where it's relevant is Module:scripts, where #scripts <= 1 could be used to get the number of scripts, but scripts[2] is used instead. Unfortunately, adding __len probably involves fairly extensive changes to the Lua source code, so they're not likely to do it. (I wish they'd just install Lua 5.3, though it'd probably break things.) — Eru·tuon 23:39, 13 April 2019 (UTC)
- @Erutuon Thanks for the analysis. I've never much liked Lua and this certainly doesn't help. I wish they had gone with some other language (e.g. Python), but I understand Python isn't really set up for acting as an embedded language. Benwing2 (talk) 23:43, 13 April 2019 (UTC)
- I don't know how many times I've tripped over Lua's weirdnesses, like 1-based indexing. It was definitely a strange choice for a language (no built-in unicode support, for use on a multi-lingual project?). I hope we won't be stuck forever on 5.1. – Jberkel 12:04, 14 April 2019 (UTC)
- The choice makes sense to me because of Lua's low memory usage compared to Python or JavaScript and ease of embedding, and I'm pretty used to Lua's quirks because it was about the first programming language I learned. But I'm frustrated that we can't use libraries written in C (besides Lua's basic libraries that we have access to in Scribunto). The ease of creating such libraries is one of Lua's strengths, but we aren't taking advantage of it. For instance, a C-based library equivalent to mw.ustring like luautf8 is bound to be more efficient than the PHP-based version, and might reduce our memory usage given that we use string functions so heavily. LPeg would also be very useful, though it's hard to learn. But I gather there are reasons for not including C libraries. — Eru·tuon 19:53, 14 April 2019 (UTC)
- @Erutuon I would think there would be security issues with allowing unfettered access to C libraries, because there's no way to sandbox them. Benwing2 (talk) 20:04, 14 April 2019 (UTC)
- @Erutuon Your link to LPeg is broken, but from searching the web it looks like a custom regex replacement. Although it may be more powerful, I think that a regular Perl-compatible regex library would be sufficient for most uses and a lot easier to use, as most programmers are already familiar with Perl-compatible regexes as they're in most modern programming languages (Python, Java, JavaScript, etc.). Lua's patterns are crippled, which is quite annoying and inefficient for some uses. Benwing2 (talk) 20:08, 14 April 2019 (UTC)
- I agree that regular expressions would be much more usable for most people. It's been mentioned on Phabricator that typical regular expressions would be pretty unsafe to allow access to, but something like RE2 or Rust regex would be better candidates since they exclude some features found in other regex libraries so that resource usage is more predictable. Rust regex has some very neat features related to sets that I like, but its C interface isn't fully developed and RE2 seems to already have a PHP binding. — Eru·tuon 20:38, 14 April 2019 (UTC)
More language/family code requests
- Teojomulco Chatino (extinct language with no ISO 639-3 code)
- Zapotec (family) – different from Zapotecan (omq-zap). Zapotec is one branch of Zapotecan, Chatino is the other branch.
- Proto-Zapotec
- Proto-Zapotecan (should be omq-zap-pro to match the family code)
--Lvovmauro (talk) 05:07, 14 April 2019 (UTC)
- Done: I've added Teojomulco as omq-teo, Zapotec (family) as omq-zpc, and the proto languages as omq-zap-pro and omq-zpc-pro. - -sche (discuss) 18:38, 14 April 2019 (UTC)
- @-sche: Zapotec-family and Proto-Zapotec categories like Category:Isthmus Zapotec terms inherited from Proto-Zapotec don't seem to like {{auto cat}}. Is it because we now have a language family (CAT:Zapotec languages) with the same name as an individual language (CAT:Zapotec language)? —Mahāgaja · talk 18:19, 22 April 2019 (UTC)
- The various redundant data pages need to be updated (see Module:data consistency check). DTLHS (talk) 18:23, 22 April 2019 (UTC)
- Done. — Eru·tuon 20:40, 22 April 2019 (UTC)
- Could someone who knows these language families fix the module errors in nis? An example: "Proto-Zapotec (omq-zpc-pro) is not set as an ancestor of Amatlán Zapotec (zpo) in Module:languages/data3/z. The ancestor of Amatlán Zapotec is Proto-Zapotecan (omq-zap-pro)." It looks like a bunch of languages need to be moved from the Zapotecan family to the Zapotec family. — Eru·tuon 20:06, 28 April 2019 (UTC)
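To illustrate why moving a language between families would clear this kind of error, here is a rough Python sketch. It assumes, purely for illustration, that a language's ancestors are derived from the proto-languages of its family chain; the table names and the `check_inherited` helper are hypothetical and do not reflect how Module:languages actually stores its data.

```python
# Hypothetical family tables, using only the codes mentioned in this thread.
FAMILY_PARENT = {"omq-zpc": "omq-zap", "omq-zap": None}
FAMILY_PROTO = {"omq-zpc": "omq-zpc-pro", "omq-zap": "omq-zap-pro"}


def ancestor_chain(lang, lang_family):
    """Collect the proto-languages of the language's family and its parent families."""
    chain = []
    family = lang_family[lang]
    while family is not None:
        chain.append(FAMILY_PROTO[family])
        family = FAMILY_PARENT[family]
    return chain


def check_inherited(lang, source, lang_family):
    """Raise if `source` is not among the ancestors of `lang`."""
    if source not in ancestor_chain(lang, lang_family):
        raise ValueError(f"{source} is not set as an ancestor of {lang}")


# Amatlán Zapotec (zpo) assigned to the Zapotecan family: Proto-Zapotec is rejected.
before = {"zpo": "omq-zap"}
# After moving it to the Zapotec family, both proto-languages are accepted.
after = {"zpo": "omq-zpc"}
```

With the `before` assignment, `check_inherited("zpo", "omq-zpc-pro", before)` raises; with `after`, it passes, because the chain then contains both omq-zpc-pro and omq-zap-pro.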
How come the interwiki link to zh:教會 is missing from 教會 even though the page exists at zh.wikt? — justin(r)leung { (t...) | c=› } 06:00, 14 April 2019 (UTC)
Help with Module:fur-conj
I don't know any Friulian at all, but Module:fur-conj seems to be malfunctioning. {{fur-conj-jessi}} is using a host of positional parameters to specify all the irregular forms of this verb (which means 'to be', so we expect a bunch of irregular forms in a Romance language), but at jessi itself the table is displaying nothing but automatically generated regular forms. User:KarikaSlayer, who created the module and the templates that rely on it, seems not to be active at Wiktionary at the moment. Can anyone else figure out how to fix it so that the irregular forms listed at {{fur-conj-jessi}} actually get displayed in the entry? Thanks! —Mahāgaja · talk 10:34, 15 April 2019 (UTC)
- @Mahagaja It looks like the module was never finished. There's no support at all in the module for specifying the conjugation through positional parameters, like jessi is trying to do. I think the intention was to put all the irregular verbs in Module:fur-conj/data rather than in a template that calls the module (which IMO makes sense), but the only irregular verb implemented in the module is vê (“to see”). The module needs a good deal of work to get it working properly, I'm afraid. Benwing2 (talk) 11:12, 15 April 2019 (UTC)
- @Benwing2: OK, then I'm just gonna remove the template from the entry. No info at all is better than a template suggesting it's a regular verb. If anyone feels like cleaning up the module, they can add the template back in. —Mahāgaja · talk 11:27, 15 April 2019 (UTC)
¿¡
Can someone change Module:links to strip the characters ¿ and ¡ from in front of Spanish translations and links? Ultimateria (talk) 18:56, 15 April 2019 (UTC)
- @Ultimateria: This seems to already be done (but by Language:makeEntryName() in Module:languages). For instance, {{l|es|¿prueba?}} links to prueba (current output: ¿prueba?). Can you show where these characters aren't being stripped? — Eru·tuon 19:35, 15 April 2019 (UTC)
- Oh I see. I've been looking at User:Matthias Buchmeier/en-es-c and the like, and pages with these two characters have been coming up as redlinks. I thought they were redlinks in the translation tables as well. Ultimateria (talk) 20:04, 15 April 2019 (UTC)
Template:quote-web possibly broken
The page on the defensive shows the year as 2019, when it should be 2017. If I could fix things I would. --I learned some phrases (talk) 21:41, 17 April 2019 (UTC)
- So I fixed that page (it was easy) but there's probably a huge bunch of pages with similar mistakes (which I probably caused) --I learned some phrases (talk) 21:50, 17 April 2019 (UTC)
- If the date can't be parsed it will do stupid things like substitute the current date. DTLHS (talk) 21:58, 17 April 2019 (UTC)
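The failure mode described here (an unparseable date silently becoming the current date) can be sketched in Python. This is only an illustration of the general pattern, not the code of Template:quote-web; the accepted formats and all function names are assumptions made up for the example.

```python
from datetime import date, datetime
from typing import Optional

# Hypothetical list of accepted date formats.
FORMATS = ("%Y-%m-%d", "%Y")


def parse_year(raw: str) -> Optional[int]:
    """Return the year if `raw` matches one of FORMATS, else None."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).year
        except ValueError:
            continue
    return None


def year_with_silent_fallback(raw: str, today: date) -> int:
    """The problematic behaviour: fall back to the current year without warning."""
    year = parse_year(raw)
    return year if year is not None else today.year


def year_or_error(raw: str) -> int:
    """A stricter alternative: refuse to guess when the date cannot be parsed."""
    year = parse_year(raw)
    if year is None:
        raise ValueError(f"unparseable date: {raw!r}")
    return year
```

With the silent fallback, a malformed 2017 date viewed in April 2019 would display as 2019, which matches the symptom reported above. The strict variant would surface the bad input instead.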
Parenthesis in search
Excuse my ignorance, I need some help.
Why can't I get "pages ending at" to work at Special:AllPages?
And, is there a way to find at Special:Search an exact expression in parentheses, (xxxx), in pages, titles, cats... but NOT when it appears without parentheses, as xxxx? (I have tried Google too) --sarri.greek (talk) 18:25, 23 April 2019 (UTC)
- The "pages ending at" option still starts from the beginning, and maintains the ordering, so unless you have a small result you will still see a large list. Compare aardvark to ant with aardvark to aardvarks. Unfortunately search strips out all punctuation, so searching for something containing parentheses will not work. - TheDaveRoss 19:29, 23 April 2019 (UTC)
- Thank you TheDaveRoss. Stupid me, I was using ONLY the "pages ending at" field, thinking it is a reverse dictionary. As for parentheses... I will have to check all words. Thanks. sarri.greek (talk) 19:36, 23 April 2019 (UTC)
- If "(xxx)" is inside the source code (not the generated text), then you can search for it with insource:/\(xxx\)/. The parentheses need a backslash \ before them so that they are not interpreted as regular expression syntax. Do insource:"xxx" insource:/\(xxx\)/ to reduce search time. To search for "(xxx)" in the title, you can do intitle:/\(xxx\)/, or intitle:"xxx" intitle:/\(xxx\)/. I don't know of a way to search for parentheses in generated text, for instance in the output of {{l}}. — Eru·tuon 19:42, 23 April 2019 (UTC)
- Thank you Erutuon! Great help! --sarri.greek (talk) 23:05, 28 April 2019 (UTC)
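For anyone building such queries in a script, the escaping step might be sketched as follows in Python. The helper name `insource_query` is hypothetical, and Python's `re.escape` follows Python's own regex rules, which are assumed here to be close enough to the search engine's for simple literals like "(xxx)".

```python
import re


def insource_query(literal: str, keyword: str) -> str:
    """Build a search string: a plain keyword filter plus an escaped regex.

    `keyword` is the punctuation-free text used to narrow the search cheaply;
    `literal` is the exact text, parentheses included, to match in the source.
    """
    # Escape regex metacharacters such as ( and ), then the / delimiter itself.
    pattern = re.escape(literal).replace("/", r"\/")
    return f'insource:"{keyword}" insource:/{pattern}/'


query = insource_query("(xxx)", "xxx")
```

The resulting `query` has the same shape as the combined form suggested above: a quoted keyword first, then the regex with backslashes before the parentheses.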
Turning off “Expand all sections” doesn’t work on mobile Safari
Even if I switch this toggle off in Settings, every language section is expanded when I visit a word’s page.
Sorry if this isn’t the right way to report this, I’m new. I searched for similar issues but nothing came up. — This unsigned comment was added by Jackvlj (talk • contribs) at 20:14, 23 April 2019 (UTC).
I’m happy to look into this once pointed in the right direction.
Cleaning up lang-specific form-of templates, part #3
Above, I eliminated a number of lang-specific form-of templates and replaced them with generic templates such as {{inflection of}}. I explained there why it's a good idea to do this (e.g. the parameters are standardized, making it easier to understand and edit, and the appearance is also standardized). The following is the list of lang-specific form-of templates that I plan on deprecating or deleting in this round. This includes the majority of the remaining templates that can be eliminated in this fashion. The ones that remain are either quite high-use (which I define as >= 20,000 uses) or require additional thought. Note that some of them are found in a subpage of Module:accel; I'll fix this before deleting or deprecating the template.
Benwing2 (talk) 03:20, 24 April 2019 (UTC)
- The Lithuanian ones should be translated into English. We do not want or need foreign languages in our definitions, especially when there are English alternatives.
- The word "simple" in English should be removed. This word only has any relevance when discussing the different periphrastic constructions of English verbs, but in terms of inflection there are only present and past, there is nothing for "simple" to contrast with.
- Regarding the Latvian comparative and superlative templates, it is a bit strange. They are defined like lemmas, with glosses like on visasākais, have their own inflection table, and are even categorised as "superlative adjectives" which as a name is a lemma category (vs "adjective superlative forms" for non-lemmas). However, they are also placed in Category:Latvian non-lemma forms. I think these really need a complete overhaul beyond what a bot can do: removing all the glosses, which just repeat the lemma, moving inflections to the lemma, and recategorising them as "adjective form". —Rua (mew) 12:51, 24 April 2019 (UTC)
- @Rua I agree that we should avoid use of terms like "dalyvis" and "būdinys" to the extent possible. From reading the Wikipedia article, the term "dalyvis" is entirely unnecessary as it simply refers to adjectival participles. However, the situation with the other terms (which all refer to adverbial participles) is trickier. Wikipedia says these terms don't really have well-known English equivalents:
- As the name suggests, adverbial participles have the characteristics of an adverb and are used to describe the verb instead of the subject. There are three types of such participles: padalyvis ("sub-participle"), pusdalyvis ("half-participle") and būdinys ("descriptive participle"). These forms are not conjugatable, although the pusdalyvis has feminine and masculine genders for both singular and plural. These forms do not have equivalents in English or other languages (except Latvian), the given translations of these names are ad hoc.
- On the other hand, looking at various grammars, it appears that padalyvis participles can be rendered as simply "adverbial participles" (or "gerunds") in various tenses, while pusdalyvis is frequently rendered as "half-participle", (sometimes "semi-participle" or "special adverbial participle"). As for būdinys, it is often not mentioned at all, but one place [3] calls it a "manner-of-action participle" and another calls it a "verbal adverb". Benwing2 (talk) 05:15, 30 April 2019 (UTC)
- Even if people in general will not know what a "half-participle" is, it's still more transparent to the reader than "pusdalyvis". Some languages unfortunately do have grammatical intricacies that have to be described using terminology that not many will be familiar with. One can imagine that the average English language enthusiast doesn't know what all the cases in Hungarian mean, for example. But English, as the language of Wiktionary itself, has a unifying role as well. The delative case in Hungarian can be equated with other cases named delative in other languages. If we were to use the native grammatical terms, then such connections become impossible to see. In the case of Lithuanian in particular, comparisons with Latvian can be useful, so we should use the same terminology when possible. Where the forms are uniquely Lithuanian, we should still use English. —Rua (mew) 11:27, 30 April 2019 (UTC)
- @Rua I agree with you in general, the question is just what terminology to use. Latvian appears to have three adverbial participles, all of which are present active. Wiktionary terms them respectively the "variable" (in -dams, which can vary by gender and number, but not by case), the "invariable" in -ot (which is indeclinable), and the "object of perception" in -am (which is also indeclinable). I can't find any other references that use the "object of perception" terminology; in general, they just say "the indeclinable participle in -ot" and "the indeclinable participle in -am" (as in e.g. [4], which has a long discussion of these and points out that the indeclinable participle in -am (which Wiktionary calls the "object of perception" participle) has no counterpart in Lithuanian). The Latvian "variable" adverbial participle in -dams corresponds to the Lithuanian pusdalyvis participle in -damas, while the Latvian "invariable/indeclinable" adverbial participle in -ot corresponds to the present active padalyvis participle in -ant. The Lithuanian būdinys participles (there are two, ending in -te and -nai) do not seem to have equivalents in Latvian. In general, Lithuanian and Latvian participles don't map perfectly, and Lithuanian has many more (it's often said to have 13 participles). Interestingly, the conjugation entries in Wiktionary do give English names to these participles, e.g. sutrukdyti calls būdinys "manner of action" in agreement with [5], while it calls pusdalyvis "special", and very strangely calls padalyvis "half-participle", which is more commonly used for pusdalyvis. I am inclined to use the terms "gerund", "half-participle" (or "special participle") and "manner-of-action participle" to refer to respectively the padalyvis, pusdalyvis and būdinys participles, and fix the conjugation templates accordingly. But you see the problem here, where the Lithuanian native-grammar terminology is standard but the English terminology is not. 
Benwing2 (talk) 02:40, 1 May 2019 (UTC)
- @Benwing2 I've changed all uses of {{de-form-adj}} to {{inflection of|de}} so you know. Jonteemil (talk) 07:26, 14 November 2019 (UTC)
Dividing derived-terms columns into sections
Traditionally, my way of organising derived terms is by ordering them like this (seen on e.g. spaak, huis and lopen):
- Affixations
- Compounds with the headword as head (last element)
- Compounds with the headword as modifier (first element)
- Phrases, idioms, anything else
This ordering has always been somewhat implicit, but it's obviously destroyed by the sorting of {{col}}, so you have to use the unsorted versions. It would be nice if the list of derived terms could have sections to indicate this division and make it explicit. Naturally, only {{col-u}} should support this, since sorting the entries would mess everything up. Also, I don't think we should even keep the sorting versions around anyway, now that we have the nice substable {{sort}}. (Thanks @Erutuon!) So I'd rather discourage people from using them, and make this feature exclusive to the unsorted versions.
I suppose there could be parameters to specify a section header, with appropriate syntax to indicate that it isn't a word. Something like #Affixations# maybe. But such a freeform solution would lead to inconsistency across entries, whereas things like affixations, head-in-compounds and modifier-in-compounds are recurring things that will surely show up in many derived terms sections across languages. So maybe instead of allowing just anything to be entered between the # #, only certain predefined things can be added there, to encourage uniformity. —Rua (mew) 16:48, 24 April 2019 (UTC)
- @Rua In general I agree with your sentiments and I like the idea of allowing for section headers. I'm of two minds as to whether we should disallow arbitrary headers; I like the idea of enforcing uniformity, except that it's hard to anticipate all the language-specific headers required (e.g. affixations and compounds don't apply very well to Arabic, where instead you might want "form I derivations", "form II derivations", etc.). I also strongly disagree that we should be discouraging users from using the autosort versions. I'm not really sure what your objections are to autosorting; if this is for efficiency purposes, keep in mind that Lua's table.sort is implemented using quicksort, which is O(N ln N), i.e. effectively linear, and is in-place, requiring no extra memory. Forcing users to do a bunch of extra keystrokes to get sorting, and repeat this every time they add entries to the list, is asking for trouble and is not what we want to do. Benwing2 (talk) 23:10, 2 May 2019 (UTC)
- @Benwing2 I suppose an alternative solution, which I somehow missed, is to write the different "sections" directly in the wikitext, and have each followed by its own {{col}}. I think this would be cleaner than complicating the template further. However, I'm not sure what would be the best way to format these section headers. ; doesn't seem appropriate, as it's not a definition list. Using a heading would trip up scripts like User:Erutuon's that find nonstandard headers in entries. I guess plain bold text ''' is all that remains then, or maybe even completely unformatted text? —Rua (mew) 11:28, 6 May 2019 (UTC)
- Module:columns currently formats the header with <div class="term-list-header">...</div>. See sea § Derived terms for instance. The tag needs to be locatable by CSS or JavaScript. — Eru·tuon 17:57, 6 May 2019 (UTC)
- @Erutuon Oh, I didn't know that already existed! I'll use that then. I'm not a fan of the current formatting, but that's an implementation detail. —Rua (mew) 20:25, 6 May 2019 (UTC)
- Also pinging @Lingo Bingo Dingo, Lambiam as editors who often work with Dutch entries and have thus presumably already encountered this practice in Derived terms. —Rua (mew) 11:30, 6 May 2019 (UTC)
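The #Header# idea discussed in this thread could be prototyped roughly as below. This is a Python sketch under stated assumptions, not the behaviour of {{col-u}} or Module:columns: the allowed header names, the `split_sections` function and the optional per-section sorting are all hypothetical.

```python
# Hypothetical set of predefined section headers.
ALLOWED_HEADERS = {
    "Affixations",
    "Compounds (head)",
    "Compounds (modifier)",
    "Phrases",
}


def split_sections(args, sort_within=False):
    """Group template arguments into (header, terms) pairs.

    An argument of the form #Header# starts a new section; anything else is a
    term belonging to the current section. Unknown headers are rejected so that
    entries stay uniform. Section order is always preserved; sorting, if
    requested, happens only inside each section.
    """
    sections = [(None, [])]  # terms before the first header get header None
    for arg in args:
        if len(arg) > 2 and arg.startswith("#") and arg.endswith("#"):
            header = arg[1:-1]
            if header not in ALLOWED_HEADERS:
                raise ValueError(f"unknown section header: {header}")
            sections.append((header, []))
        else:
            sections[-1][1].append(arg)
    return [
        (header, sorted(terms) if sort_within else terms)
        for header, terms in sections
        if header is not None or terms
    ]


example = split_sections(
    ["#Affixations#", "term-b", "term-a", "#Phrases#", "term-c"],
    sort_within=True,
)
```

Sorting inside each section while keeping the section order fixed is one way both positions in the thread could coexist: the division stays explicit, and editors are not required to keep each section alphabetised by hand.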
Cyrillic letter replacements in the translation adder
The translation adder fails to convert Ossetian letters again. E.g. it should convert фæрсдзырд, with the Roman æ, to the Cyrillic ӕ; the correct normalised spelling is фӕрсдзырд (færsʒyrd, “adverb”). It does the right thing for the Chuvash ĕçхĕлтеш -> ӗçхӗлтеш (ĕçhĕlt̬eš, “adverb”) and fixes palochkas for Chechen. Can someone fix the æ/ӕ problem, please? Calling @Erutuon. --Anatoli T. (обсудить/вклад) 01:51, 25 April 2019 (UTC)
- Sorry, I am not really sure what the translation adder does in this area. Does it replace some Latin characters with Cyrillic ones in the appropriate languages? I see a bunch of "from" and "to" and "strip" replacements in MediaWiki:Gadget-TranslationAdder-Data.js, but it seems like these are for converting from displayed text to entry name, not for correcting incorrect characters. — Eru·tuon 03:35, 25 April 2019 (UTC)
- @Erutuon yes, sorry, I didn't explain it well. It changes from displayed to entry name and I find this feature very useful. I copy the corrected "displayed text" to the entry field. Perhaps this can be enhanced in the future to correct certain wrong characters for both entry and display terms but for now, I just need Ossetian æ/ӕ fixed, please. --Anatoli T. (обсудить/вклад) 06:24, 25 April 2019 (UTC)
- Hmm, it turns out I can't edit this page any more. I need to be able to make further changes, not just for Cyrillic scripts. --Anatoli T. (обсудить/вклад) 07:36, 25 April 2019 (UTC)
- @Atitarev: I made the change to the data for Ossetian. For some reason the exact same characters were in the "from" and "to" fields. — Eru·tuon 02:37, 26 April 2019 (UTC)
- @Erutuon: It works for me, thanks! You need to clear the cache in your browser. We'll have to do more of those but I want my access back. --Anatoli T. (обсудить/вклад) 03:06, 26 April 2019 (UTC)
- Anatoli, it's because you're not an interface admin, but the right should be given back to you. @Chuck Entz, SemperBlotto, can you please attend to this? —Μετάknowledgediscuss/deeds 03:02, 6 May 2019 (UTC)
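The "from"/"to" replacement and the bug found here (identical characters in both fields) can be illustrated with a short Python sketch. The table layout and function names are assumptions for the example and do not mirror MediaWiki:Gadget-TranslationAdder-Data.js; Unicode escapes are used so the Latin and Cyrillic look-alikes cannot be confused.

```python
# Hypothetical per-language replacement table.
# U+00E6 is Latin ae, U+04D5 is Cyrillic ae; U+00C6 and U+04D4 are the capitals.
REPLACEMENTS = {
    "os": {"\u00e6": "\u04d5", "\u00c6": "\u04d4"},
}


def normalize(lang: str, text: str) -> str:
    """Replace look-alike characters according to the table for `lang`."""
    table = REPLACEMENTS.get(lang, {})
    return text.translate(str.maketrans(table))


def noop_entries(table: dict) -> list:
    """List entries whose 'from' and 'to' are identical and so do nothing."""
    return [src for src, dst in table.items() if src == dst]


fixed = normalize("os", "\u0444\u00e6\u0440\u0441\u0434\u0437\u044b\u0440\u0434")
```

A check like `noop_entries` would have flagged the broken Ossetian entry, since a pair with the same character on both sides looks correct on screen but never changes anything.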
Middle Scots
Is it intentional that Middle Scots isn't listed as an ancestor of Scots? I feel like that's wrong. —Globins (yo) 03:32, 25 April 2019 (UTC)
- @Globins: We don't consider Middle Scots to be a separate language from Scots, that's why. We treat it as an "etymology-only" language, which means that in an Etymology section we can say a term comes from Middle Scots, but the entry itself is listed simply as Scots. A parallel case is Old Italian, which is an etymology-only variety of Italian. In such cases we cannot say that a Scots word is inherited from Middle Scots or that an Italian word is inherited from Old Italian. I agree it's unfortunate, but I don't know how to fix it, or if there's consensus that it's even a problem that needs fixing. —Mahāgaja · talk 08:00, 25 April 2019 (UTC)
- Technical barriers can be cleared, the issue is more that it doesn't make logical sense. If one says that Middle Scots is part of Scots, but then claims a Scots term inherits from Middle Scots, what you end up saying is that it inherited from itself. Since that's nonsensical, our templates don't allow it. —Rua (mew) 22:29, 28 April 2019 (UTC)
- Then maybe we should consider Middle Scots and Old Italian separate languages, because it isn't nonsensical (IMO) to say a modern Scots word is inherited from Middle Scots or that a modern Italian word is inherited from Old Italian. And it does strike me as inconsistent to allow {{der|sco|sco-smi}} and even {{bor|sco|sco-smi}} but not {{inh|sco|sco-smi}}. —Mahāgaja · talk 05:39, 29 April 2019 (UTC)
- It would make more sense to me if there were an etymology-only code for Modern Scots, let's say sco-mod, and etymology-only codes could be used in the first parameter, so that we could do {{inh|sco-mod|sco-smi}} to indicate that the Modern Scots term is inherited from a Middle Scots term. This is currently not possible; the first code has to be a regular language code. — Eru·tuon 04:19, 30 April 2019 (UTC)
- Could something like this be added without having to edit the template significantly? —Globins (yo) 16:49, 30 April 2019 (UTC)
- @Globins: I'm not entirely sure all that would be required, but in addition to edits to the module that handles the etymology templates, Middle Scots would have to be indicated as an ancestor of Modern Scots in Module:etymology languages/data. This would change the rules of the language data modules significantly, so would require more discussion (and careful consideration of any possible side-effects). — Eru·tuon 17:07, 30 April 2019 (UTC)
gebaptopatroj unrecognised POS category
This entry isn't being categorised as either a lemma or a non-lemma, because "pluralia tantum" is (correctly) not recognised as a part of speech by Module:headword. It appears that the Esperanto module is giving the wrong category as the part of speech; "nouns" is in there, just not as the POS category. —Rua (mew) 22:03, 28 April 2019 (UTC)
- @Rua: Fixed in Module:eo-headword. — Eru·tuon 23:16, 28 April 2019 (UTC)
[POLL] Handling of language-specific categorization in form-of entries
I am trying to clean up language-specific form-of entries, replacing language-specific templates with equivalent calls to {{inflection of}} (or to the wrapper templates {{adj form of}}, {{noun form of}}, {{verb form of}}, which take exactly the same parameters as {{inflection of}} and which I explain below). The reasons for this are fundamentally that
- each language-specific template does things differently, with its own use of parameters, its own conventions, its own abbreviations, etc.;
- the resulting display is inconsistent, with terms typically not linked and with varying use of initial caps and final punctuation;
- most of them aren't properly documented, and the only way to understand them is to read the template code;
- much of the template code is poorly written and full of bugs.
Some examples of what these conversions involve:
Non-lemma page | Lang-specific template | Lang-specific display | Equivalent with generic template | Generic display |
---|---|---|---|---|
rotem | {{de-form-adj|s|n|d|rot}} | Template:de-form-adj | {{adj form of|de|rot||str|dat|n|s}} | strong dative neuter singular of rot |
rotem | {{ca-verb form of|p = 1|n = pl|t = pres|m = sub|rotar}} | {{ca-verb form of|p = 1|n = pl|t = pres|m = sub|rotar}}[commented out] | {{verb form of|ca|rotar||1|p|pres|sub}} | first-person plural present subjunctive of rotar |
heisse | {{de-verb form of|heissen|3|s|k1}} | (deprecated template usage) Third-person singular subjunctive I of heissen. | {{verb form of|de|heissen||3|s|sub|I}} | third-person singular subjunctive I of heissen |
съм | {{bg-verb form of|person = third|number = singular|tense = present|mood = indicative|verb = съм}} | (deprecated template usage) Third-person singular present indicative form of съм. | {{verb form of|bg|съм||3|s|pres|ind}} | third-person singular present indicative of съм (sǎm) |
ⰿⰾⱑⰽⱁⰿⱐ | {{cu-form of|ⰿⰾⱑⰽⱁ|type = noun|case = instrumental|pl = singular|sc = Glag}} | (already deleted) | {{noun form of|cu|ⰿⰾⱑⰽⱁ||ins|s}} | instrumental singular of ⰿⰾⱑⰽⱁ (mlěko) |
αδελφοί | {{el-form-of-nounadj|αδελφός|c=nv|n=p}} | Nominative and vocative plural form of αδελφός (adelfós). | {{inflection of|el|αδελφός||nom//voc|p}} | nominative/vocative plural of αδελφός (adelfós) |
Note how each lang-specific template has its own set of parameters, which are often obscure (e.g. k1 for "subjunctive I").
I went through and made a list of all the templates that could be converted. One thing I discovered in the process of doing this is that a number of the templates add the non-lemma pages to categories of various sorts. I generally believe that most non-lemma categorization is unnecessary, but I didn't want to unilaterally remove the categorization without discussion. Instead, I proceeded as follows:
- In part 2 above I listed the ones that did NOT categorize in this fashion, got assent to convert them, and converted them.
- I centralized all the form-of category logic into Module:form of/cats. Despite being a module, this contains no code, only data, in the form of simple if/then/else conditions that mirror the logic embedded in the various lang-specific form-of templates. The idea is that the categories will be added when {{inflection of}} (or the variants {{noun form of}}, {{verb form of}}, {{adj form of}}) is called with certain tags, in exactly the circumstances where the corresponding lang-specific template would have added the same category. As an example, {{bg-noun form of|singular|definite|noun=аба}} when called on the page абата (abata) adds that non-lemma form to Category:Bulgarian noun definite forms, and in general adds all nouns with the definite tag to this category, so I made {{noun form of}} do the same when called with the def tag with lang code bg. That way, {{bg-noun form of|singular|definite|noun=аба}} could be converted to {{noun form of|bg|аба||def|s}} while not changing the categorization. In general, the intent here was to keep the same logic that was already present, and not add any new categories that weren't there already. My further plan was/is to eliminate as many of them as possible, through discussion. I believe that having them centralized makes it easier to understand the whole picture regarding which categories are present, which in turn makes it easier to decide in a holistic fashion what should stay and what should go.
- I listed in part 3 above the remaining lang-specific form-of templates that can be converted to a generic template once the categorization logic is in place.
(sorry for the lengthy explanation)
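To make the data-only idea concrete, here is a minimal sketch in Python (not the actual Lua in Module:form of/cats, whose structure may well differ). The table layout and the names `CATEGORY_RULES` and `form_of_categories` are hypothetical; only the Bulgarian rule itself is taken from the example above.

```python
# Hypothetical rule table: per language code, a list of
# (part of speech, tags that must all be present, category to add).
# Only the "bg" rule comes from the discussion; the layout is an assumption.
CATEGORY_RULES = {
    "bg": [
        ("noun", {"def"}, "Bulgarian noun definite forms"),
    ],
}


def form_of_categories(lang, pos, tags):
    """Return the categories the rules for `lang` assign to a form with these tags."""
    tag_set = set(tags)
    return [
        category
        for rule_pos, required_tags, category in CATEGORY_RULES.get(lang, [])
        # A rule fires only for the matching part of speech and only
        # when every required tag is present on the call.
        if rule_pos == pos and required_tags <= tag_set
    ]
```

Under this sketch, a call corresponding to {{noun form of|bg|аба||def|s}} would be `form_of_categories("bg", "noun", ["def", "s"])`, and removing a category amounts to deleting one entry from the table.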
User:Rua agrees with most of what I'm trying to do, and in particular wants to eliminate the lang-specific form-of templates and most of their categorization, but disagrees with my approach and in particular with adding categorization to {{inflection of}} / {{noun form of}} / etc.
Given this goal, there are various ways to proceed, and I'd like to see what people think:
- Keep the categorization mechanism already in place, convert the remaining convertible lang-specific templates to generic templates, and strive to eliminate most of the non-lemma categories through discussion. (In this approach, eliminating them is easy to do; just remove the corresponding entries from Module:form of/cats.) This is my preferred approach.
- Keep the categorization mechanism already in place but don't convert any more lang-specific templates to generic templates.
- Disable the categorization mechanism and leave all remaining lang-specific templates in place.
- Disable the categorization mechanism and go ahead and convert the remaining convertible lang-specific templates to generic templates. In the process this will disable en masse the non-lemma categorization these lang-specific templates provided.
- Disable the categorization mechanism and leave all remaining lang-specific templates in place, but then gradually eliminate categorization from them, maybe converting them afterwards to generic templates. I think User:Rua advocates an approach something like this. I don't quite understand her thoughts, but they appear to require multiple rounds of bot runs, first adding |nocat=1 and then later removing it.
- Another possibility I just thought of: Make {{inflection of}} not categorize at all unless either a part of speech is passed in using |p= / |POS=, or |cat=1 is set. That would guarantee that existing entries never categorize, but the newly converted entries that use {{verb form of}} / {{noun form of}} / {{adj form of}} would categorize (since they set |p=), and entries that are newly converted to {{inflection of}} can set |cat=1 if and only if they need categorization. As categorization is removed from individual languages, the non-lemma entries can be bot-converted from {{verb form of}} / {{noun form of}} / {{adj form of}} / {{inflection of|...|cat=1}} to plain {{inflection of}}. This solution might be acceptable to User:Rua.
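The opt-in rule in the last option could be sketched roughly as follows. This is a Python illustration only, not the module's actual code; the function name `should_categorize` and the representation of template parameters as a dict of strings are assumptions.

```python
def should_categorize(args):
    """Decide whether a form-of call opts in to categorization.

    `args` is a hypothetical dict of template parameters, e.g.
    {"p": "noun"} or {"cat": "1"}. Categorization happens only when a
    part of speech is given via p= / POS=, or when cat=1 is set.
    """
    has_pos = bool(args.get("p") or args.get("POS"))
    wants_cat = args.get("cat") == "1"
    return has_pos or wants_cat
```

With this rule, an existing {{inflection of}} call that passes none of these parameters would never categorize, while {{noun form of}} (which sets p=) and {{inflection of|...|cat=1}} would.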