Wiktionary:Grease pit/2024/May


Lithuanian collation

I've been cleaning up some coding errors in Lithuanian entries, and I noticed that seven years ago @エリック・キィ had ineffectually added |sort=skesti to the headword line of Lithuanian skę̃sti. What problem was it hoped to address? The word currently gets sorted (except when separated from its tagging as Lithuanian) between "ske" and "skė", and in particular before "ski"; it looks as though sorting was much worse when the entry was created. It's possible that the hope was that it would be sorted between Lithuanian skėrys and Lithuanian skėtis, as would be done by Mimer SQL. (Notifying Agamemenon, Apisite, BigDom, GabeMoore, Insaneguy1083, Helrasincke, Hippietrail, RichardW57, Sławobóg, 70.175.192.217.) I have removed the ineffectual parameter from the page. RichardW57m (talk) 13:13, 1 May 2024 (UTC)[reply]

And we can find successive dictionary entries Lithuanian skersvėjis, Lithuanian skėsti, skęsti, sketera,[1] showing the problems with promoting secondary collating differences to primary differences. --RichardW57m (talk) 14:50, 1 May 2024 (UTC)[reply]
Where is the design rationale for our current Lithuanian collation? As far as I can tell, if it was controlled from Module:languages/data/2, it was implemented on or after 2 December 2022, in Module:lt-sortkey, which was deleted on 7 January 2023. Essentially - why are secondary collation differences promoted to primary in Lithuanian, whereas they are simply ditched in French? That is, why do French e, é, è and ê sort the same, while Lithuanian e, ę and ė sort like completely different letters? Was it a conscious decision? I suspect the decision was taken by @Theknightwho, and it can always be justified by being better than what went before. --RichardW57m (talk) 17:25, 1 May 2024 (UTC)[reply]
@RichardW57m What do you mean by "problems with promoting secondary collating differences to primary differences"? Can you clarify? BTW I doubt User:Theknightwho intentionally made a decision to change Lithuanian sorting order. He did a lot of work restructuring the *handling* of sort keys, but AFAIK the intent was to preserve whatever sorting rules were already present. Benwing2 (talk) 23:58, 1 May 2024 (UTC)[reply]
@Benwing2 Richard is referring to the Unicode Collation Algorithm, which uses primary, secondary and tertiary weightings (secondary tiebreaks primary, and so on). I'd very much like us to use the UCA: implementing it would be a lot of work, but it would be a big improvement over the crude sort methods we generally use at the moment.
To answer @RichardW57m's question: the current sortkey isn't based on the UCA - with a handful of exceptions, our sortkey algorithms are very simple. Theknightwho (talk) 00:23, 2 May 2024 (UTC)[reply]
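For readers unfamiliar with the distinction, here is a minimal Python sketch of the two approaches, applied to the four words cited from Lalis's dictionary earlier in this thread. It is not the UCA and not Wiktionary's sortkey code: the function names are hypothetical, diacritics are found by Unicode decomposition, and marks are compared by code point, which is an arbitrary stand-in for real secondary weights.

```python
import unicodedata

def split_levels(word):
    """Split a word into base letters and the combining marks on each letter."""
    primary = []    # base letters only
    secondary = []  # for each base letter, the string of marks attached to it
    for ch in unicodedata.normalize("NFD", word.lower()):
        if unicodedata.combining(ch) and secondary:
            secondary[-1] += ch  # attach the mark to the preceding base letter
        else:
            primary.append(ch)
            secondary.append("")
    return primary, secondary

def multilevel_key(word):
    """UCA-like idea: compare all base letters first; marks only break ties."""
    primary, secondary = split_levels(word)
    return "".join(primary), tuple(secondary)

def promoted_key(word):
    """Secondary promoted to primary: each accented letter is its own letter."""
    primary, secondary = split_levels(word)
    return tuple(zip(primary, secondary))

# The order found in Lalis's dictionary, as quoted above.
LALIS_ORDER = ["skersvėjis", "skėsti", "skęsti", "sketera"]

print(sorted(LALIS_ORDER, key=multilevel_key))
# skersvėjis, skėsti, skęsti, sketera (same as the dictionary)
print(sorted(LALIS_ORDER, key=promoted_key))
# skersvėjis, sketera, skėsti, skęsti (ė and ę pushed after all plain e)
```

With the multilevel key, the accent only matters between words whose base letters are identical, which is why sketera stays after skęsti.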
@Benwing2, Theknightwho: Well, someone made a change between 28 November 2022 and 15 December 2022 (see [2]), but the history is hidden in the now deleted Module:lt-sortkey. Was it perhaps @Octahedron80? --RichardW57 (talk) 06:48, 2 May 2024 (UTC)[reply]
I'm afraid Theknightwho's answer isn't an answer to my question. While our collations seem to be defined entirely by what the UCA would term primary keys, they tend to attempt to approximate the native sorting orders. It hasn't always been done - our Roman script Pali sorting bears no relationship to the usual sort order, for example, being mostly based on the foremost sorting order for each script. Someone should be making a decision on how to do the approximation - but they may not actually understand the subtlety of the secondary level. --RichardW57 (talk) 06:48, 2 May 2024 (UTC)[reply]
@RichardW57 There's only one edit to Module:lt-sortkey, when it was created on Dec 1 2022 by User:Theknightwho. It looks like prior to that there was no sorting algorithm defined for Lithuanian. The Lithuanian dictionary at [3] sorts ę as if it were e (e.g. gesti is directly followed by gęsti) but treats ė as a distinct letter, hence gežtis is directly followed by gėbelėti. I think you should figure out the correct sort order according to standard dictionaries, and then we can implement it. Benwing2 (talk) 07:07, 2 May 2024 (UTC)[reply]
@Benwing2: There had been a prior algorithm, but it was subsumed in the stripping of the three stress accents to generate page names. I haven't dug into the history of this stripping, but it may explain some of the oddities of Lithuanian templates if it was a later feature on Wiktionary.
Am I missing a trick with the LKZ web site? I can't see how to get a dictionary page from it, only dictionary entries. Short of ordering books, I could only find Lalis's dictionary and introductory pages. --RichardW57m (talk) 08:40, 2 May 2024 (UTC)[reply]
@RichardW57m You can e.g. type just g in the search box and hit enter, and down the left rail you'll see a list of all the entries starting with g, in sorted order. Benwing2 (talk) 08:56, 2 May 2024 (UTC)[reply]
@Benwing2: Thanks. Curious. The 'standard' Lithuanian collation as defined by CLDR and demonstrated (today) at [4] has gėbelėti before gežtis. --RichardW57m (talk) 10:19, 2 May 2024 (UTC)[reply]
@Benwing2: But remember Theknightwho's assessment above of the UCA (let alone the CLDR Collation Algorithm) being a lot of work. Last time I looked, the latter wasn't clearly defined. --RichardW57m (talk) 10:31, 2 May 2024 (UTC)[reply]
This LKZ list may not be reliably sorted. We have four consecutive items gabumas, gabūnas, gabuoti, gaburdalas, but the distinctly non-empty lists of words starting 'bub' and 'būb' do not overlap! Likewise for words starting gabu and gabū. At https://zodynas.vz.lt/terminaiRaidec.php, I found the index headings/links ABCČDEĖFGHIJKLMNOPRSŠTUŪVZŽ, but 'E' and 'Ė' and also 'U' and 'Ū' pointed to the same lists! Possibly that's a case of the author and the software having different ideas about Lithuanian sorting. The list for 'U' and 'Ū' contained examples of both as initial letter. --RichardW57m (talk) 13:10, 2 May 2024 (UTC)[reply]
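The two orders reported in this thread (LKZ: gesti, gęsti, then gežtis before gėbelėti; CLDR: gėbelėti before gežtis) can be reproduced with a rough Python sketch. Everything here is an assumption for illustration: the helper names are hypothetical, only the e/ę/ė rules mentioned above are modelled, all other letters fall back to code point order, and ties are broken by the raw string. It is not a full Lithuanian collation.

```python
import unicodedata

E_OGONEK = "\u0119"  # ę
E_DOT = "\u0117"     # ė

def make_key(fold, after):
    """Build a sort key function.

    fold:  letters treated as identical to another letter at the primary level.
    after: letters treated as separate letters placed right after their base.
    """
    def key(word):
        word = unicodedata.normalize("NFC", word.lower())
        primary = []
        for ch in word:
            ch = fold.get(ch, ch)
            if ch in after:
                primary.append((ord(after[ch]), 1))  # separate letter after base
            else:
                primary.append((ord(ch), 0))
        return tuple(primary), word  # the word itself only breaks ties
    return key

# LKZ-like: ę sorts as e, but ė is a distinct letter following e.
lkz_like_key = make_key(fold={E_OGONEK: "e"}, after={E_DOT: "e"})
# CLDR-like (as far as the observation above goes): both fold to e.
cldr_like_key = make_key(fold={E_OGONEK: "e", E_DOT: "e"}, after={})

WORDS = ["gesti", "gęsti", "gežtis", "gėbelėti"]

print(sorted(WORDS, key=lkz_like_key))
# gesti, gęsti, gežtis, gėbelėti
print(sorted(WORDS, key=cldr_like_key))
# gėbelėti, gesti, gęsti, gežtis
```

The only difference between the two keys is whether ė is folded or kept as its own letter, which is enough to move gėbelėti from the end of the list to the start.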

References

  1. ^ 1915, Antanas Lalis, A Dictionary of the Lithuanian and English Languages, Chicago, page 325


Strange behaviour of translation template

The translations for verb play, sense "deal with a situation in a diplomatic manner", display OK, as you will see. However, if everything else in the "translations" section is deleted, leaving only the "deal with a situation in a diplomatic manner" part, then the translations become corrupted with funny characters, stuff like "Finnish: ⦃⦃t+¦fi¦hoitaa¦¦¦¦¦¦¦¦¦⦄⦄" etc., as you can see here in a test edit that I have now reverted. (Of course, I do not actually want to delete everything else in the section. I wanted to make another edit, which went wrong for the same reason, and in trying to work out where the problem lay, I successively deleted other sections, until nothing else was left, to arrive at the minimal case that exhibited the problem.) Any ideas? Mihia (talk) 17:39, 1 May 2024 (UTC)[reply]

@Mihia This is not a bug. The {{tt}} and {{tt+}} templates need to be surrounded by {{multitrans}} in order for them to work and not display "funny characters" (as you say). If you delete the surrounding call to {{multitrans}}, you need to convert {{tt}} back to {{t}} and {{tt+}} back to {{t+}}. But a better solution is just to not do this. You can also put the call to {{multitrans}} around each translation section instead of once around all of them, but that will partly negate the memory-saving benefits of {{multitrans}}. Benwing2 (talk) 23:53, 1 May 2024 (UTC)[reply]
To expand on Ben's point: the reason we originally used {{multitrans}} was to avoid hitting the 50MB memory limit on large pages (which is no longer much of a concern since the limit got raised to 100MB a few months ago), but it's still very useful because it helps with page loading times as well. water/translations would be totally unusable without it, for example. Theknightwho (talk) 00:26, 2 May 2024 (UTC)[reply]
Thanks. The way this is presently laid out, the start of "multitrans" is embedded within one "trans" block, and the end of it, "}}", I think I can now see, is embedded within another, so I must say that it is non-obvious to editors what is going on. It looks "obviously" as if each block is self-sufficient, and, e.g. that they can be reordered, but of course when I moved one block to the end this actually took it out of "multitrans" and broke it. Is there any way to lay this out more clearly? E.g. can "multitrans" be started at the very top on a separate line, and ended at the very bottom on a separate line? Mihia (talk) 09:29, 2 May 2024 (UTC)[reply]
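For anyone who does need to take a translation block out of {{multitrans}}, the conversion described above ({{tt}} back to {{t}}, {{tt+}} back to {{t+}}) is mechanical. The following Python sketch is only an illustration of that rewrite, not an existing Wiktionary tool: the function name is hypothetical, and it assumes the templates are written in the plain form `{{tt|` or `{{tt+|` with no extra whitespace.

```python
import re

# Matches only the opening of {{tt|...}} or {{tt+|...}}.
# The trailing "|" keeps other templates whose names merely begin with "tt" untouched.
_TT_OPEN = re.compile(r"\{\{tt(\+?)\|")

def tt_to_t(wikitext):
    """Rewrite {{tt|...}} as {{t|...}} and {{tt+|...}} as {{t+|...}}."""
    return _TT_OPEN.sub(r"{{t\1|", wikitext)

line = "* Finnish: {{tt+|fi|hoitaa}}"
print(tt_to_t(line))
# * Finnish: {{t+|fi|hoitaa}}
```

Lines that already use {{t}} or {{t+}} pass through unchanged, so the function can be run over a whole translation section.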

Occitan template requests

Can anyone create an Occitan past participle template and other essential templates? Other Romance languages such as Asturian, Catalan, French, Galician, Italian, Portuguese and Spanish already have past participle templates. Thank you in advance. Flummont (talk) 02:06, 2 May 2024 (UTC)[reply]

I guess you could make {{oc-pp}} as a copy of {{ast-pp}} to start with, as that does not use Lua and should be easy to adapt. But I can see that Occitan morphology is not always as simple as adding letters onto a fixed stem. Ultimately what needs to be created is Module:oc-headword, as a copy (😢) of Module:ca-headword or similar, with changes to the Catalan-specific logic there. @Benwing2 is the expert here. This, that and the other (talk) 02:49, 2 May 2024 (UTC)[reply]
@This, that and the other Conceptually this is not so hard, but unfortunately I don't know that much about Occitan; and a complicating factor is the 6 or so different dialects, each of which (conceivably) forms its feminine and plural according to its own rules. Benwing2 (talk) 03:03, 2 May 2024 (UTC)[reply]
Looking at oc:parlar#Conjugason, dialectal differences may not be an issue for past participles (at least in these four dialects, and for regular first group verbs). I was going to say "I'm sure Flummont knows more", but it seems that this editor does not actually speak Occitan. This, that and the other (talk) 04:13, 2 May 2024 (UTC)[reply]
I took a look at the conjugations there. Unfortunately they only have Lengadocian conjugations for -ir/-er/-re verbs but the regular ones seem to be like parlar. However, the irregular ones (e.g. oc:Template:Conjugason/oc/leng/-odre-t, oc:Template:Conjugason/oc/leng/-dire) look to be more complex and probably differ dialect-to-dialect. Benwing2 (talk) 04:24, 2 May 2024 (UTC)[reply]

Two issues with transliteration categories

This, that and the other (talk) 02:32, 2 May 2024 (UTC)[reply]

@This, that and the other I agree with your first statement. I'll have to look into what's going on with honey. Benwing2 (talk) 03:08, 2 May 2024 (UTC)[reply]
@Benwing2 I guess the hidden cat issue could be fixed by changing Module:category tree/poscatboiler#L-448 to return false, but I don't really understand the logic here. Umbrella categories don't contain entries, so why would they ever need to be hidden? This, that and the other (talk) 04:06, 2 May 2024 (UTC)[reply]
@This, that and the other I think I wrote that code more or less mechanically. But in this case making that change wouldn't fix the issue because the 'Requests for ...' categories are all raw, so the first arm of the if-statement would apply. We need to make a change somewhere in Module:category tree/poscatboiler/data/entry maintenance, which generates its own umbrella categories, to not hide such categories. Benwing2 (talk) 04:16, 2 May 2024 (UTC)[reply]
Maybe [5] will fix it. This, that and the other (talk) 04:40, 2 May 2024 (UTC)[reply]
@This, that and the other Looks good to me. Benwing2 (talk) 04:58, 2 May 2024 (UTC)[reply]