Wiktionary:Beer parlour/2022/June

The issue of Large Entries

We have a dilemma and a balance to strike. Being an online dictionary, we are able to include much more information than a paper one, information that is no doubt useful to some number of readers. However, the vast majority of readers are looking for definitions. When they are presented with a massive wall of text, it can be easy to get lost and not find the information they are looking for. I believe we should make more things collapsible as a solution. Any level of header (L2-L4) should be collapsible entirely, and L2's are already collapsible on mobile. I think the default should be that L2's are open, and so is the L3 with the headword and definition. I also wonder if we should follow suit of the collapsible nyms change and make collocations and usex's collapsible WITHIN the definition line. This is also in reaction to the [https://en.wiktionary.org/wiki/Wiktionary:Votes/2021-08/Prioritizing_definitions recent failed vote] to prioritize definitions and move etymologies to the bottom. We could still keep etys at the top, but they could remain hidden. I am sure there might be some downsides I am overlooking, so I am very interested in feedback. However, again, the upside would be that this would enable us to include yet more information valued by some readers without overloading others. Vininn126 (talk) 10:07, 1 June 2022 (UTC)[reply]

P.S. Polling a VERY small number of people (this is honestly probably an issue we should somehow ask the readership about), people seem to prefer having the information just there, even if it's a lot. Granted, this was a tiny pool of people; we should ask more. Vininn126 (talk) 15:58, 1 June 2022 (UTC)[reply]
I don’t know, if descendants lists and reference sections or even windy etymologies are unusually large they are already collapsed. Don’t wanna click all the time for any particular kind of section. In general, I also recommend having a large screen true to colours (not under 222 €), if not two or three, so you don’t perceive to get into this predicament. Polish mice are also very good to scroll large pages like bar. Fay Freak (talk) 20:12, 2 June 2022 (UTC)[reply]
It's less for me and more for readers. I have a large screen and ways of navigating, but not everyone does. Maybe we could include in the settings what is automatically collapsed or not. But again, it'd be something that we should ask readers about. Vininn126 (talk) 20:22, 2 June 2022 (UTC)[reply]
Can you give an example of an entry that badly needs it and can't be reduced any other way? I feel like a totally collapsed etymology section would like quite, so I would like to try some possible layouts first. - Sarilho1 (talk) 12:49, 3 June 2022 (UTC)[reply]
Some of the larger English entries come to mind. Or pages with many L2's. go is very annoying to navigate, even with good tools turned on. Polish behawiorysta is a page with very little semantic information, thus all the other information is obscuring it. Vininn126 (talk) 12:53, 3 June 2022 (UTC)[reply]
Thank you. Though frankly, I think the main problem with the two examples is the Pronunciation sections, in particular the audios. - Sarilho1 (talk) 14:42, 3 June 2022 (UTC)[reply]
So I reckon. They should be collapsed; this becomes more pressing if we include more dialectal pronunciations as we should for English. Using three lines for audio files in the same language is unwarranted in almost any case. Fay Freak (talk) 15:11, 3 June 2022 (UTC)[reply]
Well, imagine a similar situation with any entry. A person might be interested in only one thing, and having a ton of other stuff in the way might be a problem. So having them optionally collapsible might be convenient. Vininn126 (talk) 15:17, 3 June 2022 (UTC)[reply]
psst. — SURJECTION / T / C / L / 22:32, 3 June 2022 (UTC)[reply]
That's exactly the sort of thing I'm talking about. I just wonder if there'd be value in making this a togglable option for everyone. I also wonder if it'd be possible to get it to work for further reading and references. Finally, this would require it not being a json, but rather options to say what's automatically collapsed or not. Vininn126 (talk) 09:29, 4 June 2022 (UTC)[reply]

Different categories for older vs newer borrowings from Latin, and use of Vulgar Latin in etymology if lemmas are absent

So is it policy not to include a language in an etymology if you don't have the actual lemma for the term in the etymology? For example, if you know a Portuguese, French or Spanish term comes through Old Portuguese, Old French, Old Spanish but you don't actually have the word in the 'Old' language itself. Are we supposed to just skip it and go straight to Latin until we have it (or create a reconstruction, which I'm not too keen on)? Or in languages like Welsh or Albanian, which absorbed a lot from spoken Latin during Roman rule... are we not supposed to put borrowed from Vulgar Latin if we don't have that precise term? I noticed some of the ones I did that way were reverted. Most of the proto-Brythonic terms that came from Latin were likely taken from Vulgar Latin or Ecclesiastical in some cases (like ciwdod), but this was an organic process and not later medieval scholarly borrowings (those can be differentiated). How do we handle this? I think it's important to distinguish, through category, terms in these languages borrowed in ancient times in this organic fashion versus later scholarly borrowings. That's why I did things the way I did. Should I not do that? Word dewd544 (talk) 21:21, 5 June 2022 (UTC)[reply]

For inherited Romance vocabulary, I don't see how saying, for instance, 'from Old Portuguese' is beneficial. If we don't know the medieval form, we can simply say 'inherited from Latin', which implies the existence of a medieval form.
For Latin words that were, say, borrowed into Old Portuguese and then passed into the modern language, the best way to distinguish them would be to create Old Portuguese entries for them, so that they get categorized under 'Old Portuguese terms borrowed from Latin'. If on the other hand we don't know the Old Portuguese forms in the first place, are we really justified in positing medieval-era borrowings?
For languages such as Welsh, Albanian, etc. I have fiddled around, incidentally, with distinguishing ancient borrowings from modern ones in descendants sections, as in sagitta, contrarius, and Latinus. Nicodene (talk) 22:12, 5 June 2022 (UTC)[reply]
@User:Nicodene I see what you did in the descendants sections and I do like that. But I also meant how to distinguish them in the lemma entries themselves and through potentially different categories. That seems like a trickier endeavor. Like if someone was interested in seeing a list of words in Welsh that specifically came from Latin via Proto-Brythonic vs. a list of later Latin borrowings. Word dewd544 (talk) 14:19, 13 June 2022 (UTC)[reply]
@Word dewd544 I like your approach of adding {{bor|x|VL.|-}}. If I'm not mistaken, it already accomplishes all that we need it to. Is anyone against this specific usage? Nicodene (talk) 19:29, 13 June 2022 (UTC)[reply]
@User:Nicodene Some of the folks here who were against the general policy of not including VL. unless we have an explicit lemma or reconstruction in the etymology I guess. Also I believe @User:Mahagaja reverted most of the ones I did like that for Welsh. I can see where they're coming from but I'm still trying to figure out the best way to handle this. Also, Welsh is special and a bit different from say, Irish, in that it was a language that developed directly under the rule of the Roman empire (as proto-Brythonic), from the Romano-Britons. While Ireland was never ruled by Latin speakers and only got some of its words later, mostly via the church and missionaries. There was a British Latin vernacular apparently, and this is where the Welsh would have incorporated this vocabulary from. That was probably more likely spoken by the elites, who eventually abandoned it as Old English pushed them west and they lost contact with other speakers. In a parallel way, Albanian (probably) borrowed much of their Latin from the local emerging Eastern Romance, although that is contentious because it shows some features closer to Italo-Dalmatian at times rather than Romanian. Word dewd544 (talk) 23:55, 13 June 2022 (UTC)[reply]
Perhaps your reason for doing so did not occur to Mahagaja. Your approach has my vote. Nicodene (talk) 00:08, 14 June 2022 (UTC)[reply]
@User:Word dewd544 It’s possible to do exactly what you’re asking by using a hyphen in place of the lemma. For example: {{der|en|la|-}} renders simply as Latin, but still categorises things correctly. I have never encountered this being a problem (though I don’t know if some people have an issue with it). Theknightwho (talk) 21:29, 9 June 2022 (UTC)[reply]
I disagree and think that we ought to always provide all intermediate steps and remove some intermediate language codes if those languages aren't different enough from their mother or daughter language (which is why I think some codes, like Proto-Nuclear Polynesian, should be removed as too specific; or, alternatively, more pages could be created).
Anyway, as long as we don't have these intermediate steps we're incomplete in my opinion, and I would be happy to see some, in particular Polish and Russian, editors structurally introducing Old Polish and OES to their etymologies. Thadh (talk) 22:27, 9 June 2022 (UTC)[reply]
I do know about using the hyphen in place of a lemma and have done that many times. But as a matter of policy, some people seem to be against that unless we have an explicit lemma in the etymology. I was just thinking it could be useful in terms of categorization regardless. Word dewd544 (talk) 14:19, 13 June 2022 (UTC)[reply]
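
As a rough illustration of the hyphen convention discussed in this thread, the sketch below mimics in Python what a call like {{der|en|la|-}} is described as doing: showing only the language name while still producing a category. The function name, the tiny language table and the exact output strings are assumptions made for illustration; this is not the actual template or module code.

```python
# Hypothetical language-name table; the real data lives in Wiktionary's modules.
LANG_NAMES = {"en": "English", "la": "Latin", "cy": "Welsh", "VL.": "Vulgar Latin"}

# Category wording per template, following the "X terms ... from Y" pattern.
VERBS = {"der": "derived", "bor": "borrowed", "inh": "inherited"}


def render(template, dest, source, term):
    """Return (display text, category) for a simplified etymology template."""
    source_name = LANG_NAMES[source]
    # A hyphen in the term slot means "no lemma given": show the language only.
    text = source_name if term == "-" else f"{source_name} {term}"
    category = f"{LANG_NAMES[dest]} terms {VERBS[template]} from {source_name}"
    return text, category
```

Under these assumptions, render("bor", "cy", "VL.", "-") would give the display text "Vulgar Latin" together with a Welsh borrowing category, which is the kind of category-level distinction being asked for above.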

I created {{uder}} (= "unknown derivation"), which takes the same parameters as {{der}}, {{bor}} and {{inh}} but adds the page to a tracking category, just like for {{etyl}}. It is intended to replace {{etyl}}, to reduce the temptation of certain editors to mechanistically convert {{etyl}} to {{der}}. In fact I went ahead and bot-converted uses of {{etyl}} in French to use {{uder}}, according to the following procedure (note that {{m}} below actually stands for any of {{m}}, {{l}}, {{mention}} or {{link}}):

  1. {{etyl|DEST|SOURCE}} {{m|DEST|...}} becomes {{uder|SOURCE|DEST|...}}.
  2. {{etyl|DEST1|SOURCE}} {{m|DEST2|...}} becomes {{uder|SOURCE|DEST1|...}} if DEST1 is an etymology language that's either a child of DEST2 or an alias of DEST2.
  3. Remaining {{etyl|DEST1|SOURCE}} {{m|DEST2|...}} (i.e. with mismatched destination languages) are left alone, but my bot issues a warning when it runs.
  4. Remaining {{etyl|DEST|SOURCE}} not occurring before {{m|...}} are converted to {{uder|SOURCE|DEST|-}}.
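
The four rules above can be sketched in Python roughly as follows. This is only an illustrative approximation, not the bot's actual code: the regular expressions ignore nested templates, the CHILD_OR_ALIAS_OF table is a hypothetical stand-in for Wiktionary's language data, and the function name convert is made up. The parameter order follows the notation used in the list (the first {{etyl}} argument becomes the second {{uder}} argument).

```python
import re

# Hypothetical table: etymology-only code -> the full language it is a child
# or alias of. A real bot would consult Wiktionary's language data instead.
CHILD_OR_ALIAS_OF = {"VL.": "la", "LL.": "la"}

LINK = r"\{\{(?:m|l|mention|link)\|"
ETYL = r"\{\{etyl\|([^|{}]+)\|([^|{}]+)\}\}"

# {{etyl|A|B}} immediately followed by a link template {{m|C|...}}.
PAIR_RE = re.compile(ETYL + r"\s+" + LINK + r"([^|{}]+)\|([^{}]*)\}\}")
# {{etyl|A|B}} that is NOT followed by a link template.
BARE_RE = re.compile(ETYL + r"(?!\s+" + LINK + ")")


def convert(text):
    """Apply rules 1-4 to wikitext; return (new_text, warnings)."""
    warnings = []

    def fix_pair(match):
        first, second, link_lang, rest = match.groups()
        # Rules 1 and 2: languages agree, or `first` is a child/alias of link_lang.
        if first == link_lang or CHILD_OR_ALIAS_OF.get(first) == link_lang:
            return "{{uder|%s|%s|%s}}" % (second, first, rest)
        # Rule 3: mismatched languages are left untouched, with a warning.
        warnings.append("mismatched languages: " + match.group(0))
        return match.group(0)

    text = PAIR_RE.sub(fix_pair, text)
    # Rule 4: a stand-alone {{etyl}} gets a hyphen in the term slot.
    text = BARE_RE.sub(r"{{uder|\2|\1|-}}", text)
    return text, warnings
```

Running the pair rule before the stand-alone rule, and guarding the latter with a negative lookahead, is what keeps the mismatched cases of rule 3 from being swept up by rule 4.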

I am going to convert the remaining occurrences of {{etyl}} in other languages according to the same procedure unless someone objects. Thanks to User:Svartava for the original prodding to do this (granted, that was more than a year ago ...). Benwing2 (talk) 01:41, 6 June 2022 (UTC)[reply]

The main advantage to this approach is that it makes it possible to permanently retire {{etyl}} for all languages without losing any information. The main caveat is that it may just move the problem to a new template, with the same tendency for people to copy from other entries that have the wrong template. At least it will stop the people who have been using {{etyl}} because that's what they've always used. Chuck Entz (talk) 01:59, 6 June 2022 (UTC)[reply]
@Chuck Entz My other thought is that people who are abusing {{der}} as a catch-all because they don't want to be bothered to figure out inheritance vs. borrowing can be persuaded to use {{uder}}, which at least makes it explicit that the term needs reinvestigation. There are a lot of existing uses of {{der}} put there by users who either don't understand the {{der}} vs. {{bor}} vs. {{inh}} distinction or (just as often) know better but just can't be bothered. Benwing2 (talk) 02:24, 6 June 2022 (UTC)[reply]
@Benwing2, Chuck Entz: Thanks for creating it, @Benwing2. I don't think it's best to make it a copy of {{etyl}}, but ideas for a derivation template when the editor doesn't know what type of specific derivation it is, have been proposed before at BP discussions (see also Template:der?). I think it would be useful to have this kept even after etyl-cleanup is done. I created the tracking cat LANG undefined derivations, e.g. Category:French undefined derivations for this template. If you think categories like CAT/{{{1}}}/{{{2}}} are needed, feel free to restore them but I didn't think they're needed especially for some particular languages since they're findable from MediaWiki search. —Svārtava (t/u) • 04:01, 6 June 2022 (UTC)[reply]
@Svartava I think it might be best to have it categorize both into the etyl-cleanup categories and the one you created; eventually we can remove the former. Benwing2 (talk) 04:24, 6 June 2022 (UTC)[reply]
@Benwing2: Taking a look at the source, the special categories with both source and destination lang codes such as CAT:etyl cleanup/en/fr were only for English, so I restored that bit. I didn't restore Category:etyl cleanup no target since {{uder}} would generate a module error in case there is no lang code and it'd be picked up from CAT:E. The list of languages done with etyl cleanup is also not needed in this case since we want to allow {{uder}}/{{der?}} for all languages. Lastly, I didn't bring back CAT:etyl cleanup/LANGCODE since ultimately after {{etyl}} -> {{uder}} replacements are done, e.g. CAT:etyl cleanup/fr would be redundant to CAT:French undefined derivations. —Svārtava (t/u) • 05:10, 6 June 2022 (UTC)[reply]
Thanks all. “langname undefined derivations” I deem a very reasonable tracking category name. Probably, to make it clear that a page does not use {{etyl}}, {{uder}} must not simultaneously categorize under Category:etyl cleanup/langcode/langcode, else you yourself as the category creator get misled about the situation, like if you take a wikibreak and then look how things are going and don’t remember by heart what you have implemented or run. But by the way I estimate the remaining etyl cleanup at roughly two hundred man-hours, however partially requiring people with judgement particular to the connoisseur of a language’s morphology and relations. Fay Freak (talk) 05:33, 6 June 2022 (UTC)[reply]
@Fay Freak, Svartava What do you think about "LANG unclassified derivations" instead of "undefined"? This might be clearer in specifying that the issue is that the derivations need to be classified properly as inheritances/borrowings/neither. Benwing2 (talk) 05:46, 6 June 2022 (UTC)[reply]
@Benwing2: I agree either way. Feel free to change that. —Svārtava (t/u) • 05:48, 6 June 2022 (UTC)[reply]
Not sure that it is so, with previous knowledge of the categories’ purpose. If you don’t know what we are about then “unclassified” doesn’t tell more either, and it is longer to type. It is quite clear that the derivation is undefined. If you really want to be clear you call them underspecific derivations. Fay Freak (talk) 05:51, 6 June 2022 (UTC)[reply]
OK, I left it at "undefined" for now, let's see what others think. The issue of typing length shouldn't matter too much with autocompletion. Benwing2 (talk) 06:24, 6 June 2022 (UTC)[reply]
Now it occurs to me that unclassified could also mean that nobody knows it, as in a biological unclassified taxon, so it would look less like a maintenance category than it should. Fay Freak (talk) 06:55, 6 June 2022 (UTC)[reply]
I'm skeptical this will help, but I don't see any harm in it. Thadh (talk) 07:16, 6 June 2022 (UTC)[reply]

Wow, I had the exact same thought in April. Needless to say, I fully support this idea. 70.172.194.25 07:00, 6 June 2022 (UTC)[reply]

I'm also in support of this change (I think I've proposed the same thing back in October) and I agree with Fay Freak's point about preferring undefined over unclassified. Defined (to me) expresses something about us, Wiktionary: "we have not defined it yet"; classified (to me) speaks about the nature of the word itself: "it has not been determined which class this word belongs to". — Fytcha T | L | C 20:22, 9 June 2022 (UTC)[reply]
Whoa, I must have either read that and totally forgotten about it, or we both came up with the same idea independently (GMTA?). 70.172.194.25 03:32, 10 June 2022 (UTC)[reply]
Funniest thing is that we both wanted to call it {{der?}} :) — Fytcha T | L | C 10:10, 10 June 2022 (UTC)[reply]

Old Armenian derivations from "Middle Iranian"

See Category:etyl cleanup/xcl. The remaining 80 entries after my bot attempted cleanup are all or almost all cases where words are derived from "Middle Iranian". In reality there is no such thing as "Middle Iranian"; it's not even a well-defined family. So I have no idea what language the reconstructed forms are supposed to be. They look to all be created by User:Vahagn Petrosyan. Can you comment on where these came from and what language or proto-language they're supposed to represent? I have a hankering to comment them all out. Benwing2 (talk) 07:46, 6 June 2022 (UTC)[reply]

This has been discussed before. It is often impossible to determine from which Iranian language an Armenian term is borrowed — Parthian, Middle Persian, Middle Median or controversially also some other Iranian languages, all from the Middle Iranian period. It is quite common in Iranological and Armenological literature to use "Middle Iranian" in such cases. Why can't you do this? Vahag (talk) 08:32, 6 June 2022 (UTC)[reply]
@Vahagn Petrosyan I can do something like that, e.g. whenever I see 'und' as the language being linked to; but my larger point is that I have a hard time believing that you really can't determine at least to some extent which Middle Iranian language is being referenced. Whether this is standard practice in Iranological/Armenological literature makes no difference to what is the right thing to do. I know for example there is a massive difference in the phonology of Old Persian vs. Avestan, and if you carry that forward it should get even worse. I see for example that Wiktionary has a Proto-Medo-Parthian language, why can't you derive from that? And is there really so little phonological difference between Proto-Medo-Parthian and Middle Persian that you can't tell which is which? At the very least rename "Middle Iranian" to "Middle Western Iranian". Benwing2 (talk) 02:59, 7 June 2022 (UTC)[reply]
Western Iranian languages developed in parallel and often borrowed from each other. It is really often impossible to distinguish the source of the Armenian loanword. Compare Old Armenian հազար (hazar, thousand) ~ Parthian hazār, Middle Persian hazār "thousand", Old Armenian դէմ (dēm, face) ~ Parthian dēm, Middle Persian dēm "face", Old Armenian դրաւշ (drawš, flag) ~ Parthian drafš, Middle Persian drafš "flag". Proto-Medo-Parthian is not helpful, as it does not include Middle Persian, and also because the borrowing from the Parthian or Middle Median may not always be reconstructible to the Proto-Medo-Parthian stage. We can't rename "Middle Iranian" to "Middle Western Iranian", as Category:Terms derived from Middle Iranian also includes terms in languages which borrowed from Eastern Middle Iranian. Vahag (talk) 06:30, 7 June 2022 (UTC)[reply]
@Vahagn Petrosyan If Category:Terms derived from Middle Iranian includes Eastern Middle Iranian, then how can that possibly be of any use to users of Wiktionary? Linguistically it is total nonsense and IMO an embarrassment, just like etymologies that derive from "Native American". It needs to be cleaned up. Also IMO a better approach than the existence of a "Middle Iranian" etymology language, even if just for Western Middle Iranian, is to say something like Borrowed from an {{bor|xcl|ira}} source, compare {{cog|pal|...}} and {{cog|xpr|...}}; at least that way you avoid linguistic nonsense. Also I rewrote everything in Category:etyl cleanup/xcl that had a family (including "Middle Iranian" and "Old Iranian") as the destination and 'und' as the source; can you take a look at the remainder? Benwing2 (talk) 08:03, 7 June 2022 (UTC)[reply]
@Benwing2: You are nonsense “linguistically”. It is only saying two things, “Iranian”, which you find legitimate, and a periodization of the borrowing time, for the Iranian chronolects are marked by common isoglosses (it is questionable how much this word isogloss can be used for other-than-geographic boundaries, but I just do it now, as you know there are features spreading over multiple languages, e.g. the Slavic yer reduction), a defined subset of the Iranian languages. And we thereby avoid circumstantial and stereotypical phrases like “Iranian, compare x and y” with conspicuous linguistic precision, instead variating our wording to something like “borrowed from a Middle Iranian term attested in …” (the attestation uses to be spotty). I say “Iranian borrowing” for Arabic when I am not sure if a word is from the 700s or present around 600 already, but all of Parthian and Middle Persian and Proto-Kurdish as far as we see changed drastically at this time, as did the political situation in the Middle East, together with the fall of antiquity in the West; it is useful to note whether a word spread before or after the occurrences. Fay Freak (talk) 09:23, 7 June 2022 (UTC)[reply]
Distinguishing Old, Middle and New Iranian layers in Nebenüberlieferungen is an important task. For now we often cannot be more precise because of the spotty attestation of Iranian, intra-dialect borrowings and the late writing of the three main Caucasian languages. Thanks for cleaning up Category:etyl cleanup/xcl. I handled the remaining words. Vahag (talk) 09:40, 7 June 2022 (UTC)[reply]

"Proto-Baltic"

@Rua Speaking of linguistic nonsense, we have an etymology language bat-pro that stands for "Proto-Baltic". These etymologies need to be reviewed; probably we need to rename that code to "Proto-East Baltic" and fix the supposed "Proto-Baltic" ancestors of Old Prussian. Benwing2 (talk) 08:06, 7 June 2022 (UTC)[reply]

Wiktionary:Etymology scriptorium/2019/December § code for Proto-East-Baltic PUC 09:49, 7 June 2022 (UTC)[reply]
Agreed, I also think we're overdue to have PEB reconstructions. Thadh (talk) 14:17, 7 June 2022 (UTC)[reply]

The template {{wasei kango}} is used in the etymology section of Chinese entries, and adds Category:Chinese terms borrowed from Japanese. But it also adds Category:Wasei kango, which is a subcategory of Category:Japanese terms derived from Chinese. Should this latter categorization be changed to reflect the fact that the entries being referred to are Chinese? Most of the entries also have Japanese sections anyway, but the ones listed here do not. 70.172.194.25 13:41, 7 June 2022 (UTC)[reply]

A term can, normally speaking, not be at the same time a borrowing in language X from language Y, and a term in language Y derived from language X. There may be a few exceptions of terms that made the round trip (with a change in meaning), but these should be noted separately as such. In almost all cases there is no round trip, so this should not be triggered automatically by the template.  --Lambiam 09:17, 8 June 2022 (UTC)[reply]
We should make a distinction between Japanese terms that are wasei kango (“written in kanji, but made in Japan”), and Chinese terms that have been borrowed from such Japanese wasei kango. The documentation of the template {{wasei kango}} states that it is meant for use in Chinese entries. I think it is also of interest to mark Japanese wasei kango entries as such, which requires some modifications – either a separate template, or a parameter identifying the language.

Old Tamil in Brahmi Script

Given the changes in the Brahmi script with Unicode 14.0, and the lack of grandfathering of the old encoding for Tamil Brahmi, at some point we ought to convert the entries. The relevant changes are:

  1. Short e and o now have their own characters; they are no longer to be written as e and o plus virama.
  2. Pulli is now encoded distinctly from the original virama. Previously it was formally considered to be a stylistic distinction.

An example of the change would be from Unicode 13 𑀧𑁂𑁆𑀬𑀭𑁆 (peyar) and 𑀯𑁂𑁆𑀫𑁆 (vem) to Unicode 14 𑀧𑁳𑀬𑀭𑁰 and 𑀯𑁳𑀫𑁰.
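
A minimal Python sketch of the conversion described above, assuming that every virama in an Old Tamil Brahmi spelling stands for a pulli and that short e and o were previously written as e or o plus virama. The function name is made up, and the code points are given as I understand them from the Unicode Brahmi chart; they should be checked against the chart before any real conversion run.

```python
VIRAMA = "\U00011046"  # BRAHMI VIRAMA (old encoding of the pulli)
PULLI = "\U00011070"   # BRAHMI SIGN OLD TAMIL VIRAMA (added in Unicode 14)

# Old spelling: e/o followed by virama. New spelling: dedicated short vowels.
SHORT_VOWELS = {
    "\U0001100F" + VIRAMA: "\U00011071",  # letter E + virama -> letter short E
    "\U00011011" + VIRAMA: "\U00011072",  # letter O + virama -> letter short O
    "\U00011042" + VIRAMA: "\U00011073",  # vowel sign E + virama -> sign short E
    "\U00011044" + VIRAMA: "\U00011074",  # vowel sign O + virama -> sign short O
}


def to_unicode14(word):
    """Re-encode an Old Tamil Brahmi spelling from Unicode 13 to Unicode 14."""
    # Short vowels first, so their virama is consumed before the pulli pass.
    for old, new in SHORT_VOWELS.items():
        word = word.replace(old, new)
    # Any virama still left is treated as a pulli.
    return word.replace(VIRAMA, PULLI)
```

The order of the two passes matters: replacing the virama with the pulli first would destroy the vowel-plus-virama sequences that mark the short vowels.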

The Unicode standard does not grandfather the old encoding. This argues that we should convert the spellings now - which will ensure that they don't render properly! However, should the old encoding be allowed to linger? Most (all?) of us lack a font for the new encoding. I propose that we move the old spelling to the new spelling, and then we turn the hard redirects into soft redirects in slower time. Would oty-preuni14 be a suitable name for a template implementing the soft redirect?

When looking at the task, I found no citations for how the words were written. Could someone please advise on the source of these entries - at the moment I am tempted to RfV the lot, but I fear that would be counterproductive. Also, is Brahmi being used as a transliteration of Vatteluttu? Unfortunately, the original authors were IPs, so it is hard to ask them for citations. @AryamanA, Hk5183, 108.31.52.77, 98.179.127.59 -- RichardW57 (talk) 18:16, 8 June 2022 (UTC)[reply]

Incidentally, do we need to transliterate long and short Old Tamil 'e' and 'o' as ē/ō and e/o rather than e/o and ĕ/ŏ? --RichardW57 (talk) 18:37, 8 June 2022 (UTC)[reply]

In the absence of any response, I have gone ahead and made the change to the transliteration to Roman script. Orthographically long Old Tamil e and o are now marked with a macron, while the short vowels are unmarked. This corresponds to the convention for Tamil. --RichardW57 (talk) 13:30, 12 June 2022 (UTC)[reply]

Flood of audio pronunciation requests

As currently visible at Special:Contributions/Rodrigo5260, this user appears to be adding {{rfap}} to pretty much any and every entry they happen to land on. I happened to see a lot of activity from them on the Japanese pages on my Watchlist, but I see from their contributions that it's pretty indiscriminate with regard to term language.

This strikes me as odd and unhelpful, more as noise than anything. But I'm uncertain if that's just me.  :)

What do others think? Is this kind of blanket request useful, or should I ask them to stop? ‑‑ Eiríkr Útlendi │Tala við mig 00:01, 9 June 2022 (UTC)[reply]

@Eirikr: I personally oppose blanket requests on audio recordings and translations, even though there is no explicit policy on such restrictions. --Anatoli T. (обсудить/вклад) 02:34, 9 June 2022 (UTC)[reply]
Done. Equinox 02:42, 9 June 2022 (UTC)[reply]
I couldn't disagree more: this helps us find out where gaps are. Why wouldn't we want pronunciations on all entries? —Justin (koavf)TCM 04:33, 9 June 2022 (UTC)[reply]
@Koavf You're not thinking this through: we know where the gaps are: everywhere. If we look at English, which has much better coverage than most languages, there are 45,115 pages in Category:English terms with audio links out of well over a million total (about 4%). Used properly, {{rfap}} lets us know where people feel a need for audio. Used this way, it only lets us know about this person's browsing habits. It wouldn't be hard to bot-add the template to every entry without audio, but then we would end up flooding the categories with literally millions of requests; if everything is top priority, nothing is. Chuck Entz (talk) 05:47, 9 June 2022 (UTC)[reply]
But how do you decide what is "top priority"? There's no deadline on the dictionary: just keep on adding pronunciations as they come up. If we need some human-curated list, then we can make a request board or have some votes or something. Until then, all of the entries in all of the languages that are spoken could use a pronunciation (and an etymology and rhymes and hyphenation, etc. as applicable). —Justin (koavf)TCM 05:51, 9 June 2022 (UTC)[reply]
Yes, Chuck already said that basically all entries could use a pronunciation. It's pointless tagging all entries with the same tag, which just wastes space and pushes the real content further down. There are too many. Equinox 06:10, 9 June 2022 (UTC)[reply]
Then make it so it doesn't display and just adds entries to a tracking category. —Justin (koavf)TCM 16:44, 9 June 2022 (UTC)[reply]
If this were a social media site, you'd probably see +1 buttons below the requests. Right now it's binary. We could also create a list of requests sorted by number of page views / interwiki links or similar metric. I did something similar a while back while adding German recordings: the category was flooded by one user who added the tag to all the entries they came across. – Jberkel 06:33, 9 June 2022 (UTC)[reply]
Interesting idea. I'd support it if it needed a vote. (Almost any practical attempt to put any set of requests into priority order using something at least plausible seems useful to me. I'd been using just "taxlinks"/redlinks within Wiktionary, but pageviews would be more useful to prioritize entries with higher volumes of pageviews than the bulk of organism names.) DCDuring (talk) 18:35, 9 June 2022 (UTC)[reply]
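
The idea of ordering requests by page views could look something like the Python sketch below. The function name and the shape of the input (a list of entry titles plus a title-to-views mapping) are assumptions for illustration only; nothing here reflects an existing tool or real traffic figures.

```python
def prioritize(requests, page_views):
    """Order requested entries by page views, most viewed first.

    Entries with no recorded views count as zero; ties are broken
    alphabetically so that the result is stable between runs.
    """
    return sorted(requests, key=lambda title: (-page_views.get(title, 0), title))
```

Any other metric mentioned above, such as the number of interwiki links, could be swapped in by passing a different mapping.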
@Koavf: Given how narrowly our pronunciations are recorded, I'm not sure that we really want 20+ different pronunciations for every Latin script Pali term. --RichardW57 (talk) 07:03, 9 June 2022 (UTC)[reply]
How many do we want? —Justin (koavf)TCM 16:44, 9 June 2022 (UTC)[reply]
@Whoop whoop pull up: what fellow editor? It's best not to take anything he says seriously, especially when he refers to himself in the third person. Chuck Entz (talk) 06:19, 23 June 2022 (UTC)[reply]
@Chuck Entz What evidence do you have that Zumbacool is the same person as WF? Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 19:37, 23 June 2022 (UTC)[reply]
@Whoop whoop pull up Nothing explicit, but then, I have no explicit evidence that you're the same person who was using your account a few months ago. What you don't realize is that we have had an informal arrangement with WF for many years: he only uses one account at a time, and plays by the rules (with certain exceptions). After a decade of dealing with WF, I'm quite familiar with his editing patterns and can always spot his latest account if I'm paying attention.
Zumbacool is working on WF's projects using WF's methods and sources in the characteristic WF style, posting messages on talk pages of people that WF always deals with that would make no sense if he wasn't WF. He even tweaked Equinox recently on his talk page about his edit count, which only makes sense if you are aware of the vast number of edits that WF's accounts have made when taken together. In other words, if Zumbacool isn't WF, then you're basically casting aspersions on him as a WF impersonator. Chuck Entz (talk) 19:03, 24 June 2022 (UTC)[reply]
Ah, makes sense, then. Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 19:42, 24 June 2022 (UTC)[reply]
@Chuck Entz On that note, is User:Udutdut Wonderfool, as I suspect? Zumbacool seems to claim otherwise, though. —Svārtava (talk) • 16:17, 29 June 2022 (UTC)[reply]
No. WF is never overtly hostile or condescending, and is not very good at pretending to be a noob. This person has problems with their English and they seem to genuinely think their formatting is correct. I have no idea what would make anyone think they were the same person. Chuck Entz (talk) 04:13, 30 June 2022 (UTC)[reply]
@Chuck Entz: Whoop whoop pull up, after writing this message, removed Almostonurmind and Zumbacool from the list at User:AryamanA/Wonderfool with the edit summary “Please do not speculatively cast aspersions on editors.” Almost immediately, Fytcha re-added Zumbacool, writing “this one is for sure”. J3133 (talk) 07:39, 23 June 2022 (UTC)[reply]

Currencies and place names


I was wondering how best to tackle the proper noun sense of currencies at entries like dollar, pound, franc and so on, which could easily approach 50–100 senses once historical currencies and redenominations are taken into account. Our usual approach is obviously completely unwieldy.

Although we could just hive this stuff off into an appendix, I think a better approach would be to list the major ones (or not, if that would cause too many arguments), and then put the rest (as well as those which usually use a qualifier) in a collapsible box under the definition line. An illustrative example:

  1. the United States dollar
  2. the Canadian dollar
  3. the Australian dollar
  4. Any one of various other currencies:
  5. (historical) Any one of various former currencies:


There's no need for it to look exactly like this, as I've just used one of the templates we already have (combined with some HTML fudging as it's not designed to work like this), but I feel like this is a much better approach to "common" proper nouns.

The other obvious application is place names, and while it would need to be laid out differently (i.e. no columns), the same principle applies, as entries like Kingston or San Antonio are really unwieldy at the moment. Much better to break things down in a more user-friendly manner. Theknightwho (talk) 13:31, 9 June 2022 (UTC)[reply]

I'm not sure there is an objective way to identify which currencies are major and which aren't. How is the Canadian dollar more important than, say, a Belize dollar? Thadh (talk) 13:51, 9 June 2022 (UTC)[reply]
I agree, but it's a side issue. We have this problem with place names, too, where it's difficult to know which ones go under the "various other" bit. The US dollar does definitely count as major, as that's what people assume in countries that don't use the dollar, but it feels a bit uncomfortable to only give special prominence to that.
In any event, the formatting is the main issue here. Theknightwho (talk) 14:05, 9 June 2022 (UTC)[reply]
As we are concerned with word use, the point is: which of the various entities referred to as "[country] dollar" are referred to as "dollar" (and by whom/where). For the most part they are/were referred to as dollar only by English speakers in the country of issue. Dollar is used much more widely to refer to the US dollar, most tellingly by English speakers in places like India, the UK, Ireland, and Singapore, whose currency names do not include dollar. The Australian dollar may be referred to as dollar by English speakers in, say, PNG. All of the currencies that are referred to as dollar "only" in the issuing country could share a single definition.
If we accept this rationale, then only the US and Australian dollars would get their own definitions under dollar, the rest being defined as members of a class and appearing individually only under derived terms. DCDuring (talk) 19:08, 9 June 2022 (UTC)[reply]
I don't really understand your reasoning. If I'm going to Canada and I want to check how much money we have (so, whether or not I have to change any once I'm there), I'd also ask "How many dollars do we have?", probably omitting the "Canadian" part due to context. And while I can see how the US dollar is universally the prototypical dollar (even though that's a shame, but oh well), I don't see how the Australian dollar is better-known in PNG than, say, the Hong Kong dollar in Cantonia, the Brunei dollar in the Philippines or the Singaporean dollar in Malaysia. Thadh (talk) 19:35, 9 June 2022 (UTC)[reply]
Thanks - I was completely confused by that reasoning as well.
@DCDuring Almost all of these currencies refer to themselves by the name "dollar" on their notes in English, without including the name of the country (e.g. "five dollars"), including Singapore. Of the modern currencies, the only exceptions are Hong Kong, Brunei, Namibia and Taiwan. I simply don't understand how you could call the rest derived terms, and there is no requirement that senses must be used internationally.
Most of these are English-speaking countries, by the way, and there are plenty of contexts where you might use the term as a shorthand outside of the country in question, given the right circumstances (e.g. academia). I think you're making a hugely over-generalised assumption, and in any event the same could be said of a large number of place names, too.
That's also to say nothing of the fact that this isn't even taking redenominations into account. Currencies change all the time, and might still use the name dollar, and your logic completely breaks down when we bring up the peso, as none of them dominate. Theknightwho (talk) 19:41, 9 June 2022 (UTC)[reply]
My point is that all except the US dollar (and, probably, the Australian dollar) share the common characteristic of being called dollar only within the jurisdiction of issue, whereas the US dollar (& Aus$) have a different range of usage, the US dollar being the intended referent in most English-speaking countries that don't call their own official currency dollar. Thus, all but those two can share the same English definition: "Any local currency with an official name of, or containing, dollar", with something like {{lb|en|principally in respective jurisdiction of issue}}. (We should use jurisdiction because of Hong Kong dollar. This finesses Taiwan dollar as well.)
For dictionary purposes what appears on the currency is secondary to other usage, just as what appears on Kimberly-Clark products is lexicographically secondary to the way normal speakers use kleenex. In any event, dollar is very much like foot before its standardization in UK, a mere locally standardized unit of measure, or noon, a term whose exact meaning differs now by time zone and was formerly defined astronomically for each location.
I don't understand why all such "[country] dollar" names aren't derived directly from dollar, sometimes via Spanish dollar, sometimes via US dollar. US dollar certainly is derived from early use of dollar in other places (possibly other languages).
I think academics don't need Wiktionary to tell them how to use the term dollar. The referent in academic usage is usually made clear in the document in which the term is used. For a term of wide use by normal people academic usage is largely irrelevant unless the academic usage is with a truly distinct meaning. DCDuring (talk) 14:30, 10 June 2022 (UTC)[reply]
@DCDuring Your argument about the implied referent applies to almost all of our place name entries, as well as the vast majority of our detailed taxonomic entries. It also ignores the fact that currencies change while keeping the same name - something completely disguised by your argument, as well as the use of currencies in neighbouring jurisdictions (which is more common than you seem to realise in many parts of the world). I’m also not sure why inclusion would be a problem, particularly when the point of this discussion is about preventing clutter and not about whether or not we should include these in the first place. It’s frustrating to have this discussion go so wildly off-track. Theknightwho (talk) 19:59, 11 June 2022 (UTC)[reply]
Obviously all the [jurisdiction] dollar names are just normal entries, subject to normal RfD. I was addressing the question of how many definitions we need at [[dollar]] to cover all the usage without wasting screen space in the definition portion of the entry. That currencies are revalued (if that indeed is what you mean when you write "fact that currencies change while keeping the same name") is of no import to the lexical meaning of dollar in any reasonable definition. After all, currencies change in value constantly, even when governments say they don't. It is very easy to prevent clutter by putting all the "[jurisdiction] dollar" terms under derived terms. End of. DCDuring (talk) 23:55, 11 June 2022 (UTC)[reply]
I’m referring to the way that currencies get withdrawn and replaced (e.g. there have been at least 5 Argentine pesos), not their shift in value. In any event, the whole point of this post was to suggest a way we could include both without inaccurately glossing over the definition with a fudge. Putting them all under derived terms is not particularly helpful when they might be mixed with other terms which never get referred to as “dollar” alone. I also don’t see how the same logic doesn’t apply to place names. Theknightwho (talk) 00:13, 12 June 2022 (UTC)[reply]

Translations of Alternative Forms


@Fytcha Alternative forms are not included in the main form's translation box: see diff. Apparently, translation boxes on the pages of alternative forms are rare: see diff. Does this mean that alternative forms don't have translations in other languages? I say they very well may. See Xensi, Boao, Tatung, Taibei, Peking (these four examples were wiped out by Atitarev; please see the new example at CHamoru). (Note: Wiktionary already has separate translation sections for synonyms (not alt forms) like Hailanpao.) Let me know what you all think. --Geographyinitiative (talk) 23:18, 9 June 2022 (UTC) (modified)[reply]

withdrawn
@Geographyinitiative: First of all, I wasn't involved in your interaction with Fytcha. I have only demonstrated again, with the edits, what needs to happen. What has made you a competent editor? Your question has been answered since the day the template to be used for that purpose in this revision was created in 2008. If it's still not clear, alternative forms don't get translations. To avoid duplication, the main form usually houses the translations. --Anatoli T. (обсудить/вклад) 02:15, 10 June 2022 (UTC)[reply]
(edit conflict) And if you want to demonstrate a specific revision (you can't expect that entry used in discussion will not be modified, even if the discussion is incomplete), I will show the competent editor: Xensi. --Anatoli T. (обсудить/вклад) 02:23, 10 June 2022 (UTC)[reply]
Your viewpoint means these forms of these words don't have a more specific translation in the target languages, which is manifestly inaccurate. That template from 2008 may very well be a pile of shit, which is why I'm questioning it. I don't need approval from a website that ignored Wade-Giles for 20 years. --Geographyinitiative (talk) 02:19, 10 June 2022 (UTC)[reply]
You have to blame all Chinese Wiktionary editors for being so mean to you and Wade-Giles. I personally have nothing against WG. Something's wrong with you. Go to hell. --Anatoli T. (обсудить/вклад) 02:23, 10 June 2022 (UTC)[reply]
@Atitarev Your reaction is way out of line, and your aggressive tone from the start has been completely unnecessary. Aren't you an admin, too?
You've also completely missed the point, which is that languages other than Chinese may have direct equivalents to these alternative forms. Bundling them into the main translation box is obviously an inferior approach, particularly when you've not even bothered to do it properly. Theknightwho (talk) 03:46, 10 June 2022 (UTC)[reply]
Way out of line, huh? Did you check the original tone? I already answered that the decision on not translating all alternative forms was made long ago. All alt forms may have different connotations and usages in a given language but that may not apply to translations at all. Even if you think that "Beijing" is fundamentally different from "Peking", in many target languages it's either the same or one is more common and the other is obsolete. In German, both Beijing and Peking are translated as Peking, in Japanese as 北京(ペキン) (Pekin), and in Russian as Пеки́н (Pekín); any variations can be added with a {{qualifier}}. You can easily see that these translations are based on the original European name for Beijing (Peking) but they are also current. The template {{trans-see}} softly redirects users from the alternative term entry to where all translations are placed. There is no prejudice here or political bias, just centralising the information. --Anatoli T. (обсудить/вклад) 05:03, 10 June 2022 (UTC)[reply]
@Atitarev If you had bothered to read the discussion, you would have seen this is discussed in detail below. Consensus can change - this is not a court of law. Please try to keep up.
I also flatly disagree with your interpretation of the original post - there's nothing rude about it at all. Theknightwho (talk) 05:04, 10 June 2022 (UTC)[reply]
I agree that there's nothing rude about the original post whatsoever. Geographyinitiative's second post in this thread, however, is less polite, using some crass language and possibly expressing a willingness to override consensus, but at least it's not personally targeted like "Something's wrong with you. Go to hell.". I don't see how that response was called for, unless there's some history between the two users I'm missing. Overall, I don't understand why this topic has aroused such heat, since it seems like the kind of thing that can be discussed calmly. 70.172.194.25 05:29, 10 June 2022 (UTC)[reply]
  • I agree that there is some value in a list of Peking-equivalents as opposed to Beijing-equivalents. These words are so different-sounding in English that I'm not even sure I'd call them alternative forms, even though they come from the same Mandarin source and describe the same territory. IMO they're close to the border line between alternative forms and synonyms.
  • To play devil's advocate, though, what do you do when the situation with regard to usage is the opposite of what it is in English? For example, German Peking is the most commonly used form; it seems wrong to only list it in the translation box of Peking and not on Beijing, which someone is more likely to see.
  • We could also consider how we handle non-altform synonyms. For example, entire, complete, and total have more cognate translations than mixing-and-matching even though it may not be technically wrong to translate, e.g., English "complete" as Italian "intero". (Of course, most languages in the world do not even use the Latinate words for these concepts, but the ones that do are among the best-represented on Wiktionary.) 70.172.194.25 04:05, 10 June 2022 (UTC)[reply]
    These are all good points and excellent food for thought. My gut instinct is that we should be trying to capture the equivalent tone and context, rather than the cognate, so moving obsolete translations to the obsolete form might be sensible for languages that have undergone an equivalent shift, whereas that wouldn't be the case for languages which still primarily use the old form (like German with Peking), as the implications carried by English Peking simply aren't there (even aside from the issue of duplication).
    A compromise might be to have a middle ground function in the template, which says something along the lines of "See Beijing, but note the following exceptions:" (I'm sure there's a better way of phrasing it).
    On your final point, I like the way that German editors will frequently define words by listing a bunch of English equivalents so as to encircle the exact concept conveyed by the word. For example, ganz and gesamt are both defined in very similar ways, but the slight differences in word choice and word order are an effective way of demonstrating the difference without getting bogged down. We could do something similar with the translation boxes.
    Theknightwho (talk) 04:35, 10 June 2022 (UTC)[reply]
@70.172.194.25 There is definitely past history involving User:Geographyinitiative, although I haven't been following the specifics of it.
@Theknightwho I took a look at gesamt and I don't much like the definition with five similar English words. I really think this is unhelpful; much better to explicitly indicate the differences with a usage note. When I studied Spanish, for example, I had a textbook that spelled out all the ways to say "become" (hacerse, quedarse, tornarse, llegar a ser, etc.) and explained the differences explicitly. There's no other way I could have sorted out the differences, and Wiktionary currently does a much worse job of this. Benwing2 (talk) 05:50, 10 June 2022 (UTC)[reply]
While I am not that involved in the general outcome of this, I am strongly opposed to any proposal that would entail e.g. German Peking being removed from the translation section in English Beijing, a point that Atitarev expounded on in more detail and to which I entirely subscribe.
I also want to clarify a misunderstanding: @Geographyinitiative: In my diff that you're citing, I wrote "alt spellings" which you've paraphrased as "Alternative forms" in the OP. I want to point out that those are not the same: Alternative spellings always share the same pronunciation which is not necessarily true for alternative forms. — Fytcha T | L | C 10:09, 10 June 2022 (UTC)[reply]
@Fytcha I just want to point out that nobody is suggesting that we blindly try to match cognates while ignoring actual use, and I agree that it would be completely wrong to move German Peking to English Peking. Atitarev had pretty obviously not read the rest of the conversation when he responded, because me and 70 had already discussed that exact example and some possible ways forward, and his response that you agree with concerned points which either no-one had made or which had already been addressed. It's a bit frustrating to see the genuine merits of this proposal being ignored, simply because one user has personal problems with the person that proposed it. We're better than that. Theknightwho (talk) 19:02, 10 June 2022 (UTC)[reply]

Ding, dong, Template:etyl is dead


I cleaned up the last few hundred uses that my bot wouldn't touch; these were cases where the source in {{etyl}} mismatched the following {{m}}. Benwing2 (talk) 03:23, 10 June 2022 (UTC)[reply]

Shouldn't it be kept so that old revisions are still readable? 70.172.194.25 03:30, 10 June 2022 (UTC)[reply]
Harrumph, let's see what other people think, I don't particularly want people to be able to continue using it. Benwing2 (talk) 03:56, 10 June 2022 (UTC)[reply]
I'm generally supportive of leaving old templates so that old versions have some level of functionality (and that does come in useful from time to time), but if there's a contingent of hold-outs still using it then it might be better to resurrect it in a few months instead. Theknightwho (talk) 04:06, 10 June 2022 (UTC)[reply]
Some simple version of the template should remain, if possible. IMO, the requirement would be that entry histories would be readable. The long red messages make old versions of entries ugly and intimidating. Perhaps just showing {{temp|etyl}} (ie, no parameters) with a link to Talk:etyl or Documentation:etyl. Those pages would be restored to the last version before they were deleted. DCDuring (talk) 14:50, 10 June 2022 (UTC)[reply]
I propose keeping the template, marking it as obfuscated, and potentially adding an edit filter which flags use (or prevents it). It is very annoying when you go to an old revision and cannot parse what it says. - TheDaveRoss 14:54, 10 June 2022 (UTC)[reply]
I've recreated it in order to be able to view old revisions properly, along the same lines as {{context}}. If it causes issues it can certainly be deleted again. This, that and the other (talk) 03:12, 11 June 2022 (UTC)[reply]
See User:Mglovesfun/-eur for an example of what this looks like. I have no idea why every use of {{etyl}} is on its own line, but honestly it probably doesn't matter - especially if it makes it less likely that people will use the template anew. This, that and the other (talk) 03:17, 11 June 2022 (UTC)[reply]
@This, that and the other {{deprecated code}} was using <div> when it should have been using <span>. I fixed it and now things look more reasonable. Benwing2 (talk) 06:16, 11 June 2022 (UTC)[reply]
I don't really care about old revisions and am not particularly inclined to keep it - lots of templates are deleted all the time, and we don't and definitely shouldn't go keeping all those as deprecated just for old revisions' sake. —Svārtava (t/u) • 15:31, 10 June 2022 (UTC)[reply]

Great job. Link Count still says 566 wikilinks and 15 transclusions, but these are all out of mainspace. Thanks to everyone who helped clean out all of these. —Justin (koavf)TCM 04:35, 11 June 2022 (UTC)[reply]

Stricter attestation criteria for offensive entries


Hi, I would like to raise for discussion a proposed amendment to WT:ATTEST for how offensive entries are dealt with. Examples of such offensive entries include Apefrican, Buttswana, criminigger, cumskinned, faggotface, jaboon, koala fucker, Mexicunt, negro fatigue, nigdar, Norgay, piss drinker, Porntugal, San Fransicko, suspook, teenaper, Turd World, Vladimir Pootin, and West Undies (and this is just what's currently on or was recently on the RFD and RFV pages). Please help to refine the amendment, and comment on whether you feel this is a good idea or not. — Sgconlaw (talk) 14:14, 10 June 2022 (UTC)[reply]


If an entry is offensive to an individual, group of persons, or geographical location, it must have at least three quotations satisfying WT:ATTEST added to it within two weeks [one week?] of the entry being created or being nominated at RFD or RFV, whichever is later; otherwise it may be speedily deleted after that period.

An entry is considered as offensive if it:

  • denigrates a named individual in any way; or
  • denigrates an unnamed individual, group of persons, or geographical location on the basis of ancestry, ethnicity, gender or sex, religion, or sexual orientation.

The speedy deletion of the entry is without prejudice to its re-creation if WT:ATTEST can be satisfied as described above.


The rationales for the proposed amendment are as follows:

  • It is hard to tell whether such entries are genuine or hoaxes.
  • The (usually anonymous) editors who create such entries are essentially pushing the task of verifying these entries to other editors. We are not the Urban Dictionary. The amendment discourages editors from adding offensive entries unless they are willing to put in the effort of ensuring the entries are attested.
  • Due to the dubious nature of these entries, they are rightly challenged at RFV or RFD. However, this clutters up these fora, and uses up the time and effort of editors in discussing and verifying the entries which could be used more productively.
  • Arguably, the reputation of the project as a whole is lowered by the presence of such entries. There is no particular benefit in having many unattested offensive entries; only those which are properly attested within a short period of time deserve to remain.

Discussion

Agreed on all points. Many of these entries are nonce words as well and can be formed arbitrarily (one of the points of WT:SOP as well). — SURJECTION / T / C / L / 14:22, 10 June 2022 (UTC)[reply]
Categorizing single words as "SOP" sets a troubling precedent. Affixes such as anti- and -hood are inherently formulaic, yet we still document the words that can be formed with them. Binarystep (talk) 09:49, 11 June 2022 (UTC)[reply]

I'd like to just add to this by saying that a lot of the time these look to be repeat nonce words, rather than genuine words, too. Theknightwho (talk) 14:27, 10 June 2022 (UTC)[reply]

I believe that the main problem with this proposal is that the meaning of "denigrate" will inevitably be over-extended based on politics. --Geographyinitiative (talk) 14:29, 10 June 2022 (UTC)[reply]

It doesn't matter. At the end of the day, if qualifying quotations can be found, the entry will be kept (or can be recreated). This puts the onus on editors wishing to create the entries to do their homework, and not use a scattergun approach by creating numerous entries and then pushing the verification work to others. — Sgconlaw (talk) 14:32, 10 June 2022 (UTC)[reply]
I support this idea, and I would advocate for requiring citations to create the entry to begin with. If someone doesn't want to do that legwork they can create a request for the entry to be created. - TheDaveRoss 14:46, 10 June 2022 (UTC)[reply]
(edit conflict) I agree, but I'd actually raise the bar a little: one week after the creation of the entry, regardless of whether it gets an RFV or RFD. I don't see why an entry should sit there a week longer just because an RFV has been filed. That said, it might be a good idea to open a new forum (or make it a subtask of RFV) to ask others for help with finding quotations (especially for WDLs). I can see why it can be frustrating to have to come up with a third quote all by yourself when it's a widespread word. Actually scratch that, Dave makes a good point about using RE for that. Thadh (talk) 14:51, 10 June 2022 (UTC)[reply]
The prob w/ Dave Ross' idea is that this forms an INCREDIBLE barrier to entry for n00bs, which is the exact wrong direction for Wiktionary to go. No new entry w/o cites=gated community. --Geographyinitiative (talk) 11:54, 11 June 2022 (UTC)[reply]
@Geographyinitiative: note that this proposal deals only with denigratory or derogatory entries, not all entries. I feel that a higher standard is required for entries which are essentially used purely for insult, especially when it seems there are editors who deliberately create large numbers of such entries. Frankly, I don't think there's a great loss if n00bs who wish to engage in this sort of behaviour are dissuaded by the policy. — Sgconlaw (talk) 12:20, 11 June 2022 (UTC)[reply]
@Thadh: I proposed two weeks [or one week] after creation or after nomination for RFD or RFV, whichever is later. The latter was to cover the situation where an offensive entry goes unnoticed until after two weeks (or a week) after its creation. So under the proposal it's not necessary to wait till an entry has been nominated for RFD or RFV; an administrator who spots an unverified entry within two weeks (or a week) of its creation can go ahead and nuke it. — Sgconlaw (talk) 16:12, 10 June 2022 (UTC)[reply]
Okay, that wasn't made clear by the wording "whichever is later". Thanks for clearing this up! Thadh (talk) 16:24, 10 June 2022 (UTC)[reply]
A cleaner way to handle this might be to make the requirements explicitly the same as those for recreating an entry that was deleted through rfv, except that they shouldn't be as easy to speedy because deleted entries have a warning that comes up when you edit the deleted page. We probably should add a sentence to the page creation text notifying would-be entry creators. Chuck Entz (talk) 15:33, 10 June 2022 (UTC)[reply]
@Chuck Entz: remind me what these requirements are and where they are noted? — Sgconlaw (talk) 16:12, 10 June 2022 (UTC)[reply]
It seems to be one specific anon (using various IPs) who is mass-creating this kind of thing lately. Equinox 17:21, 10 June 2022 (UTC)[reply]
That might be the case, but my impression is that we’ve had this sort of problem on and off for some years now, so we might as well decide on a way of dealing with it. — Sgconlaw (talk) 17:38, 10 June 2022 (UTC)[reply]
Yes, but this week's anon is different from the anon that triggered the original discussion (Australia vs. US), and I suspect there will be others. Chuck Entz (talk) 19:38, 10 June 2022 (UTC)[reply]

TBH this is not a bad idea at all. If you're going to add rare offensive terms, you should be prepared to back them up with attestation instead of burdening other users with that work. I think it does make sense to apply this specifically to offensive terms since they are more inflammatory, sometimes made up, and are often low-quality entries with just the bare definition provided. And while Geographyinitiative above makes a decent point that "offensive" may end up being interpreted more broadly than intended, I honestly wouldn't mind a wide application of this rule. After all, there's no prejudice against recreating when quotations are found anyway. 70.172.194.25 19:15, 10 June 2022 (UTC)[reply]

The good news is that it is an easy thing to regulate: if someone deletes something which shouldn't be deleted, there is quick and easy recourse. Either anyone can add some citations, or other admins can undelete and create an RFV. I'd rather find out if it is a problem than ignore the problem which seems apparent already. - TheDaveRoss 19:19, 10 June 2022 (UTC)[reply]
@TheDaveRoss: yes, that's what I figured. There's very little downside to the proposal. It's not intended to act as a ban on offensive entries. If it is really felt that a particular entry should be included, then the editor(s) merely have to back it up with the required minimum number of quotations and it can be recreated or undeleted. On the other hand, the proposal hopefully dissuades editors who really can't be bothered to properly justify offensive entries from creating numerous ones and wasting the time of other editors who then have to deal with the entries at RFD or RFV. — Sgconlaw (talk) 21:46, 10 June 2022 (UTC)[reply]
There is no other burden than with any term, other than your personally feeling offended despite not being spoken to or about by the mere mention of a word.
Also this proposed rule discriminates against autistic users who have a hard time recognizing offense in the first place, here even more complicatedly only abstractly assumed from the possible uses of a word rather than its actual use which happens in lexicography, which anyone linguistically minded may barely feel.
I also respect those who enter the editorship of this dictionary by filling the gaps they perceive in the coverage of injurious terms. Laxer criteria attract editors—they don’t repel readers, who don’t search for bad entries. Who of the greatest Wiktionarians started there? Closing the gate after twenty years is cheap.
There are a great many things on the internet to be offended by, these terms being systematically entered into Wiktionary aren’t one. Fay Freak (talk) 21:28, 10 June 2022 (UTC)[reply]
Offence is not the operative criterion, though. Denigration is, which is a much more objective benchmark. Theknightwho (talk) 22:50, 10 June 2022 (UTC)[reply]
It is not clear from the formulation that this is exclusive and even if it were the concept of “denigration” hardly has a lesser compass. Lambiam reckons this “definition” likewise unhelpful below. Fay Freak (talk) 23:29, 10 June 2022 (UTC)[reply]
Note also the revealing misstep in wording of assuming entries denigrating. You’ll only act on what you made up in your mind. If you abstract the entries from the objects described by them you should be indifferent to the former. Fay Freak (talk) 23:32, 10 June 2022 (UTC)[reply]
To be clear: "entry" is being used as a shorthand for "term described by an entry".
I'm also confused by your point that people might not understand that a term is denigrating. No specific person in the Wiktionary community needs to be offended for us to recognise that - the point is that the meaning of the word is derogatory in some fashion. Theknightwho (talk) 03:46, 11 June 2022 (UTC)[reply]
You are really scraping the bottom of the barrel looking for reasons to oppose this, discriminating against autistic people... The whole project discriminates against illiterate people too, might as well shut it down. - TheDaveRoss 23:36, 10 June 2022 (UTC)[reply]
@TheDaveRoss: What if it is not the bottom of the barrel but the gorilla in the room? I can’t relate at all to this culture of being offended, but to those who can’t relate and are passed over by those who show much concern. And it has often happened in larger software projects that those codes of conduct or similar have made all too risky to that kind of people that fail sensitivity to those distinctions of social acceptedness—which is completely irrelevant to objective mission of the project as long as contexts can be caught by rough labels, but even these are controversial (Wiktionary:Requests for verification/English#niggershipTalk:niggership one showed haphazard application of the label “ethnic slur”, the largest contributor exaggerates the meaning of “slang”, another categorized all vaguely right-wing as “Neo-Nazism” and was rightly reverted by him; soon we will only discuss the interpretation of our rules instead of content if the former accretes on this basis—aye, I really like opposing expansion of rules in general, and this is a good enough example for general reservations; new rules, new problems, nothing of concern solved). Fay Freak (talk) 01:09, 11 June 2022 (UTC)[reply]
I am disappointed that you continue to misinterpret the meaning of "ethnic slur" despite having had it explained to you in depth, and I have absolutely no idea how you came to either of your other conclusions other than the fact that you didn't like the fact that you didn't get your own way. Particularly with neo-Nazism, it's blatantly incorrect to say that anyone "categorized all vaguely right-wing as “Neo-Nazism”", because the terms under discussion are either well-known to be neo-Nazist, or were being considered for removal from the category. You just seem to have an axe to grind. Theknightwho (talk) 03:44, 11 June 2022 (UTC)[reply]
@Theknightwho: If you call repeatedly dropping a reference to the synonymous Wikipedia article, whose definition I had already proven to be practically incomprehensible, an “explanation in depth”. deaf rather than deep is the term you aim at to describe the quality of your answer, that’s why you have “no idea”. The construction of my ideas is completely laid out to be traced. The claim stands that you, and WordyAndNerdy, misinterpret the meaning of “ethnic slur”, and so you will phantasize broader meanings of being “offensive” and “denigrating”. “Reference” and “allusion” can be understood in various grades of directness. Currently the fourth gloss of refer you refer to (now reading “To allude to, make a reference or allusion to”) is no real definition and must be replaced for using but itself and a synonym for definition. Fay Freak (talk) 09:26, 11 June 2022 (UTC)[reply]
@User:Fay Freak Nothing about the definition I gave you was incomprehensible - you simply didn’t realise that it is possible to use “refer” to mean “allude”, which means to refer to something indirectly. There is nothing circular about that - it just means the verb “refer” can be direct or indirect. Quite clearly that means that the definition of “ethnic slur” encompasses words which indirectly denigrate. This is not a difficult concept, and your prescriptive rules lawyer approach is not convincing to anyone.
Aside from that, are you seriously making the argument that it is impossible to know when a word is denigrating on a collaborative dictionary of all places? Is denigration some kind of special form of knowledge that is uniquely difficult to determine? How do you think we determine the meaning of any words at all? Theknightwho (talk) 13:10, 11 June 2022 (UTC)[reply]
@Theknightwho: I deny that “refer” means “allude”; even if it does, it is not clear that the Wikipedia article uses it in this unusual way, and rather it uses the stricter sense and e.g. niggership is not an ethnic slur under it. The correct word is apparently connote as opposed to denote, no such thing as “indirect reference”—if you search that you find analytic philosophy books with their usual made-up language; Indirect self-reference uses “indirect” not in the sense of “aside from” but “through a longer path directly”; like “rules lawyer” is a paradox and beside the point that editors are unable to work with the definitions to any advantage. I am not making an argument that it is impossible but that editors are incapable or that it is unnecessarily uncertain and hard, though “possible”. Perhaps I can, you don’t and WordyAndNerdy doesn’t and does not want as owned by her below as “subjective lines”, and unknown IPs will perform worse than you all. Note that it is a well-known fact that defining any term of law by a criterion “directness” is always to some degree controversial and vague, but you can’t drop it either and content yourself with offensiveness or denigration discovered over five corners. You will meet cases of doubt “is it denigrating (directly) enough?” even according to your broader framing. Fay Freak (talk) 16:07, 11 June 2022 (UTC)[reply]
@User:Fay Freak The Merriam-Webster dictionary defines “refer” in the intransitive sense to mean “to have a relation or connection”, which is precisely the way it is being used, and does not exclude indirect references. connote would not be a correct gloss in Wikipedia’s definition, because the usage is intransitive (“to refer to X”), and not transitive (“to connote X”).
There is also a pretty extreme irony in you arguing that you can dismiss a definition as “made up” while trying to argue that we should keep words that are themselves made up by those that use them. Rules lawyering is not something to aspire to - it is nonsense born out of working backwards from your conclusion, instead of using reason to work towards one. Do not conflate it with making a cogent argument. It also explains why you would argue against such a plainly common use of the word “refer”, which I and many other native speakers use frequently (whether or not you approve). Your prescriptivism has no place here.
You have also failed to provide any justification for saying that it is particularly difficult to judge that a term is being used in a derogatory way. You have simply expressed doubt, while conflating verbiage with making a substantive point. It’s hard not to see the double standard in your viewpoint, and you have yet to provide any reason for it. That’s aside from the fact that they’re very often added with that exact label or something very similar, which circumvents your entire point. You may call it a mistake, but the operative issue is that the person intended to add a derogatory term, and in the absence of citations we must take it at face value (as we do for the rest of it). This is, after all, a conversation about whether such terms are attestable.
I should also add that you are the only person that has made this about being offended. This has been pointed out to you several times. The issue is actually with phantom terms that are wasting our time, and those happen to more often be derogatory because they’re much more likely to be created by people as a prank (or at the very least without any genuine conviction that they exist or have ever existed in real use). Theknightwho (talk) 17:24, 11 June 2022 (UTC)[reply]
@Theknightwho: No u. You afford verbiage around the fact that you are unable to comprehend the use—mention distinction or English in general without the use of a dictionary. By their having been used they are not made up any more as to be fake, but the dictionaries do not prove that this is not a ghost meaning—the usage examples for the alleged sense are inexactly described with “to allude”. Basic words use to be not well defined but circumscribed, and glossing “to refer” with “to allude” is exactly the kind of no-definition that we have to avert in the long run. These words are not synonymous. You are back in the Middle Ages when the pronunciation of words was illustrated by their being “pronounced like” some other word, abaca was defined as a kind of flax (this exactly happened in Medieval Arabic glossaries regularly) and the like; the Medieval layman state of definition is still there in the dictionaries as their foundation, and some reputable source claiming a word having a certain sense does not absolve use from discerning it in the corpus: use—mention distinction, you have not understood it. Fay Freak (talk) 17:48, 11 June 2022 (UTC)[reply]
@User:Fay Freak It’s the first entry for the intransitive use, and the OED also gives the sense “to mention, allude or make reference to something”. I have also not failed to understand the difference between mention and use - I have simply pointed out that (now two) bodies of experts agree with me. The fact that you have not heard (or more likely, not perceived) a particular use does not mean that it does not exist, particularly when you openly dismiss evidence to the contrary, while failing to understand the difference between transitive and intransitive senses.
At this point, Occam's razor suggests it’s much more likely that you simply just don’t like it because it’s inconvenient to your original bad faith argument that we shouldn’t label a lot of slurs as slurs. I am wholly unconvinced that you are simply trying to be technically correct, because you have presented nothing that supports your position - just unreasonable scepticism in the face of overwhelming evidence.
You are correct that refer and allude are not synonymous, though, because “refer” is general while “allude” more specific. I’m not sure why you think I said otherwise.
Theknightwho (talk) 18:32, 11 June 2022 (UTC)[reply]
@Theknightwho: Then don’t define it that way. My edit to refer was still an improvement—and not a “removal” either since the single usage example of that definition line which used this term was moved by me to the first definition line (which was expanded). But as you start to see the difference between reference and allusion, or its possible meaning, you see how much room there is to see some vagueness within concepts of “denigration” and “offensiveness”, or at the periphery of the sets of terms to be covered by them.
I don’t think “just don’t like it” can apply since I don’t even remember having added terms of any connected kind nor plan to do so, and also because not liking also has its reason, and I endeavoured to uncover the reasons why I intuitively don’t like it, not having been convinced of a different stance about offensive terms, which as said I can’t relate to. (Somebody made something offensive on the internet, ugh! Yet formally he was right and a scientist.)
BTW, why not, if at all curtailing the Usenet quotery, restrict it to English, since for foreign languages we have too small editor communities altogether and the problem has not arisen there nor is there equivalent potential? For foreign languages we still have very usual slurs to cover. Fay Freak (talk) 21:29, 11 June 2022 (UTC)[reply]
@Fay Freak Your edit to refer was an incoherent mess that conflated the transitive sense (“I refer you to X” = “I bring your attention to X”) with the intransitive sense (“I refer to X” = “I make reference to X”) - they’re completely different things. Either you are not competent enough in English to be editing the entry, or (as I suspect) you were intentionally trying to remove a sense out of process because you didn't like it. There is no excuse for it, particularly when you have native speakers insisting it is correct. The correct venue is WT:RFV/E, which you very well know. Theknightwho (talk) 21:40, 11 June 2022 (UTC)[reply]
@Theknightwho: Transitivity variation does not automatically make a new sense. Still there is no definition and I don’t take your correctness claim in favour of evident nonsense for granted. I know English better than most native speakers and you are obviously on the lower end of them—why not? There is no rule that a native speaker trumps a non-native. It is all about the amount of input of language material, and despite perhaps having read more English than any other language this definition line is no explanation of the alleged sense to me—and objectively. What is the alleged sense? In the usage example there is no allusion. It is a complete nonce definition. Perhaps define the basic words sensefully before trying to restrict nonces, this would amount to greater reputation of Wiktionary! Fay Freak (talk) 21:49, 11 June 2022 (UTC)[reply]
@Fay Freak Your definition failed to capture either sense accurately, but feel free to take things to WT:RFV/E or WT:RFC if you perceive there to be a problem. Please do remember, though, that your inability (or unwillingness) to comprehend something does not mean that it is incomprehensible. Also, I recommend you write to the OED to inform them that their definition means they must be on the lower end of the English spectrum, too. I’m sure they’d be delighted to have your input. Theknightwho (talk) 22:03, 11 June 2022 (UTC)[reply]
My definition succeeded in capturing either sense accurately. Your failure to comprehend what it captures does not mean that it is incomprehensible. Conversely your claim of having subjectively comprehended a definition does not mean it is comprehensible, maybe you just fancied something together which is not there, or the definition here only fails higher requirements of those who need less vulgar concepts to content themselves with. So feel free, too, to take my version or both versions to WT:RFV/E or WT:RFC if it is all too hard for you. I’m also sure the OED would be delighted to have my input but they would have to pay for better definitions. Fay Freak (talk) 22:18, 11 June 2022 (UTC)[reply]
@Fay Freak You don’t get to remove a sense and then tell other people to take it to RFV, and if you think one sense doesn’t belong then the correct place is WT:RFD/E. This is a consensus project. Theknightwho (talk) 22:28, 11 June 2022 (UTC)[reply]
@Theknightwho: I have not removed any sense but combined definitions. You still have not shown what sense there would be. Will you show it if I RFV it? I will still have to combine it because your interpretation of what the quotes attest will be wrong; since it is impossible to prove a senseless definition, someone has to completely replace it or merge it. This is why it is no RFV matter. You are very detached from the meaning of all procedures which consensus has introduced. Neither RFV nor RFC are for evident nonsense in entries—OED having the same nonsense does not get you to preserve it. You just try to instigate me to abuse process, for the price of possibly keeping senseless definition. Fay Freak (talk) 22:51, 11 June 2022 (UTC)[reply]
Can this conversation please move elsewhere? I'm on the verge of collapsing it into a box, since it's not explicitly related to the main discussion at hand. Also, fyi there is an informal lemma policy where we do look at other dictionaries, especially OED, to determine if a word should be included. AG202 (talk) 22:58, 11 June 2022 (UTC)[reply]
@AG202 Please feel free. @Fay Freak I refer you to WT:RFD/E to make your case. Be sure to refer to this discussion! Theknightwho (talk) 23:05, 11 June 2022 (UTC)[reply]
@AG202: It is related in so far as the controversy about the proposal concerns how vaguely or indirectly a term (an offensive or denigrating term) might make reference and depreciate an “individual, group of persons, or geographical location on the basis of ancestry, ethnicity, gender or sex, religion, or sexual orientation.” A similar term like “(ethnic) slur” was likewise problematic, for comparison. I mean, in which fashion does niggerhood so, while nigger does undoubtedly in some usages? This is just one stupid and easy example, I am anxious about harder ones.
It makes sense to collapse this argument of increasing detail: meseems it can be after the words ”should be indifferent to the former.” Fay Freak (talk) 23:20, 11 June 2022 (UTC)[reply]
I don't think it's accurate to describe triple parentheses and ZOG as merely "vaguely right-wing", given their origins and usage. Binarystep (talk) 09:42, 11 June 2022 (UTC)[reply]
And I didn’t, this is more in the inner ballpark of “right-wing”, yet ))) ((( is not “Nazism”, and Nazism, given a clear historical picture, should be understood as a more clearly defined term than “offensive” and “denigrating”, yet editors even fail that. Fay Freak (talk) 09:57, 11 June 2022 (UTC)[reply]
))) ((( wouldn't belong in Category:en:Nazism, as it wasn't used during WWII, but it would certainly belong in Category:en:Neo-Nazism if such a category existed. Binarystep (talk) 10:07, 11 June 2022 (UTC)[reply]
@Binarystep Such a category now exists. It’s good to separate them, if nothing else to prevent the kind of blatant misrepresentation we both replied to (including the obvious lie that they weren’t referring to triple parentheses or ZOG, which were the only two terms mentioned in the linked discussion). Theknightwho (talk) 19:11, 11 June 2022 (UTC)[reply]
@Fay Freak My arguments have nothing to do with being personally offended. I firmly do not believe that Wiktionary is a better dictionary or lexical resource if we claim that literally any string of characters to which anyone ever has ascribed meaning is automatically part of the language. I get that you don't agree, but you can express your disagreement without doing so on the highest horse you can find. The vast majority of the arguments you have made in this discussion section have nothing to do with the policy question posed, and instead of furthering the discussion they make it much harder to follow, some kind of lexical Gish gallop. - TheDaveRoss 15:39, 13 June 2022 (UTC)[reply]
Many people find at least some terms offensive that are not denigrating. Conversely, some people may consider some clearly denigratory terms not offensive. To avoid some pointless discussion, it may be better to refer to this by “attestation criteria for denigratory entries”. The definition of “denigratory entry” would be the same as now (“This rule applies to entries that denigrate a specific individual in any way, or an individual, group of persons, or geographical location, etcetera, on the basis of ancestry, ethnicity, gender or sex, religion, or sexual orientation.”)  --Lambiam 22:21, 10 June 2022 (UTC)[reply]
@Lambiam: sure, I’ve no objection to that. — Sgconlaw (talk) 22:27, 10 June 2022 (UTC)[reply]
I think it would be nice if it covered some other dubious slang like the recently deleted daddy's carrot, but I'll take the improvement that merely including denigrating terms would bring. - TheDaveRoss 23:35, 10 June 2022 (UTC)[reply]
I agree with the proposal and I would add further that I'm very skeptical of terms that can be attested only on Usenet. I have never heard of Norgay, for example, or any of the other terms given at the top of this entry, yet someone has added 5 cites from 3 different Usenet groups. Usenet is (or more like was) a sort of subculture with its own idiolectal terms. If you search in Google for "Norgay" for example, you get a zillion hits for Tenzing Norgay, and if you search for "norgay" -tenzing you get 96 hits, none of which seem to refer to Norway except for one link to the Urban Dictionary entry and one other to a random website timetoast.com that for all I know made it up independently, as it also has similar terms like "Swedgay", "Dangay", "Germgay". So if such terms get kept due to Usenet cites, I would want them tagged with a "Usenet only" or similar label. Benwing2 (talk) 00:46, 11 June 2022 (UTC)[reply]
This is not particular to Usenet. Somewhere terms must be interconnected in their arisal for lexicalization to be achieved, rather than having been independently coined. But the connection uses to be invisible in the sources. And there is no provision to save use from occasionalisms other than counting attestation which you perceive as a method reduced to absurdity. Fay Freak (talk) 01:21, 11 June 2022 (UTC)[reply]
Reading the Norgay/West Undies/Buttswana/Porntugal discussion, actually one could add an additional criterion of something like, as a rough 3:30 AM negative formulation, that the word must be believed to be not coined independently or occasionally. WT:CFI requires “independence” of terms meaning independence from referring to a particular environment (controversial in the details), yet they also must be dependent in the sense of being back-coupled in the language communities and perhaps still causally “depend” on a single coiner. One could also just implement User:Fay Freak/Wiktionary:ATTEST 2021 and take the wording “live” seriously, as a word living merely on the occasion is even less than living in the familiar circle of thee and thy best friends (which we soothfast even already agree about to be not inclusionworthy but fail to reflect conceptually). Fay Freak (talk) 01:43, 11 June 2022 (UTC)[reply]
Re "Somewhere terms must be interconnected in their arisal for lexicalization to be achieved, rather than having been independently coined": I agree, and that's what I tried to argue at Talk:cowtastrophe: "The problem is that this word is not a real "trend": it's not being picked up by a speaker, then another, etc. We're simply lumping quotes together to fulfil the CFI, but this is artificial". PUC 21:30, 12 June 2022 (UTC)[reply]
I don't think the coinage of a word really matters all that much. Plenty of affixes (such as anti- or -able) can be used in such formulaic ways that some words surely only exist because of different writers independently creating them. Should we restrict them too? Binarystep (talk) 01:53, 13 June 2022 (UTC)[reply]
(They are already restricted, see non-Canadian which I argued to keep) AG202 (talk) 01:58, 13 June 2022 (UTC)[reply]
To my knowledge, they're only restricted if they're hyphenated. I also think non-Canadian should've been kept. Binarystep (talk) 02:27, 13 June 2022 (UTC)[reply]

Overall this seems to be a very good proposal. Vininn126 (talk) 12:34, 11 June 2022 (UTC)[reply]

If the concern is about e.g. some new account just showing up and adding a bunch of racist (or otherwise offensive) terms out of (presumably) racist (or otherwise objectionable) motivations, why not add a requirement that, no matter how well cited a new entry documenting an offensive word is, it can only be added if the user has a history of making numerous other good edits for non-offensive words? Like, maybe only allowed if the proportion of like, offensive words they've added to good positive contributions for non-offensive words, is sufficiently low. So, for example, because my account is new and I haven't made other contributions before this edit, I would not be allowed to create an entry for any offensive words. (If you are wondering why I'm making this suggestion when I haven't made any other edits: someone else mentioned the discussion to me, and I thought of this idea that I thought of as a compromise position, and they asked that I add it, because I thought of it and they didn't want to take claim of my idea. If my participation here is inappropriate, I apologize.) A potential difficulty I see with this idea of mine, is that I'm not sure whether it would be easy enough to measure such ratios... but still, maybe something along these lines could help? --Madaco1 (talk) 01:21, 13 June 2022 (UTC)[reply]

As mentioned in the comment I recently made, I don’t feel like this is implementable. How would they be tracked? Would they just be barred from adding the “offensive” label? Then they could just add the terms and then wait for someone else to add the labels later, and then we’re back to square one. Let alone the issue of new users with genuine intents documenting languages that aren’t covered here. That’s why I came up with this compromise after the multiple discussions that’ve been had. (CC: @Binarystep) AG202 (talk) 01:32, 13 June 2022 (UTC)[reply]
Yes, there's no way to automatically prevent new users from adding "offensive" material because there's no way to automatically identify such terms. Benwing2 (talk) 01:39, 13 June 2022 (UTC)[reply]
I'm aware they can't be automatically prevented, but they can be deleted and their creators can be banned. It'd be treated the same way as any other form of vandalism, which is what this form of trolling effectively is. Binarystep (talk) 01:44, 13 June 2022 (UTC)[reply]
@AG202: You never did respond to my points here, by the way. Binarystep (talk) 23:59, 13 June 2022 (UTC)[reply]
I didn't feel the need to; please don't tag me like that. I don't have to respond to everything, and part of me wishes I hadn't as much. AG202 (talk) 00:04, 14 June 2022 (UTC)[reply]
If your concern is whether new users would be allowed to add terms from LDLs, you could always rewrite my suggestion to only block IPs from creating pages for offensive terms from well-documented languages. I'm not sure this is even a problem to begin with, though. Binarystep (talk) 01:47, 13 June 2022 (UTC)[reply]

──────────────────────────────────────────────────────────────────────────────────────────────────── I have now created a vote at "Wiktionary:Votes/pl-2022-06/Attestation criteria for derogatory terms". — Sgconlaw (talk) 22:06, 13 June 2022 (UTC)[reply]

Attestation period

Assuming people are generally in favour of the proposal, should we stick with two weeks, reduce it to one week, or pick some other period to allow for attestation of such entries? — Sgconlaw (talk) 04:37, 11 June 2022 (UTC)[reply]

One week was my suggestion when I proposed this policy. I'm not set on that exact length of time though. I can see a rationale for allowing newer users a grace period to learn how to find and format CFI-compliant cites. WordyAndNerdy (talk) 05:08, 11 June 2022 (UTC)[reply]
The shorter timing should not start until the entry is identified as offensive. Additionally, it should not start until a putatively adequate set of quotations has been rejected. The word Dutchman strikes me as a good one to start thinking about. Should the offensive meaning 'A male white Afrikaner' be struck immediately under this rule? It lacks three attestations. Even if we strike that meaning, isn't there something offensive about the rider 'or I'm a Dutchman'?
As to the number of attestations, does this requirement also apply to words in less documented languages? I can easily imagine someone innocently adding a Sumerian or Pali word without being aware that it was an offensive term. And if they were aware, would they be required to provide three independent attestations? --RichardW57 (talk) 14:33, 12 June 2022 (UTC)[reply]
Thinking of words that would stand for immediate deletion, I can also think of the term Rupert for an officer. --RichardW57 (talk) 14:45, 12 June 2022 (UTC)[reply]
Is the dog's name Nigger offensive? Apparently it wasn't in the 1950's (source w:Nigger (dog)), and I don't have any quotations to show it's still being bestowed. RichardW57 (talk) 14:45, 12 June 2022 (UTC)[reply]
Would this even be needed as an entry on Wiktionary? Of all the things I've seen folks call encyclopedic here, this seems like a prime example. AG202 (talk) 14:51, 12 June 2022 (UTC)[reply]
For better or worse, it was the name of Lovecraft's cat as well, and was a somewhat common name for pets back then. We certainly shouldn't be referring to any specific animals, though. Theknightwho (talk) 15:55, 12 June 2022 (UTC)[reply]
LDLs only need one cite or reference and I would be very surprised if a person adding the term in Sumerian doesn't have a reference they could give. Thadh (talk) 08:24, 13 June 2022 (UTC)[reply]

More quotations to be required? Disallowance of certain sources?

@Sgconlaw I support this proposal, but I think more could be done. If our concern with the project is focused around having these types of words in general, then the amount of citations required needs to be increased, either in this proposal or one very soon after. "Arguably, the reputation of the project as a whole is lowered by the presence of such entries." Currently, there's nothing stopping anyone from easily finding solely 3 cites from Usenet, often from the most horrid places (see: the cites at Apefrican for example), leading to these low-quality entries being kept in the first place. We talk so much about how we're not Urban Dictionary, yet we give a presence to these one-off terms from vile, racist spaces? At least people know not to take Urban Dictionary seriously. By giving these words this kinda platform, we're only furthering the propagation of them, while they may have initially died out, being used only in those spaces. People take this website seriously (see: Petersonian and how the subreddit for it took its presence on Wiktionary as vilifying the term), so I wish that there were more done to actually increase its quality as a whole. And while I can see this current proposal at least stopping a few for now, I can easily see some editors keeping track of which offensive entries have been created, solely to add citations to them so that they aren't deleted. Overall, WT:ATTEST must be updated. AG202 (talk) 05:49, 11 June 2022 (UTC)[reply]

How do editors feel about increasing the minimum number of quotations required for derogatory entries from three to, say, five? — Sgconlaw (talk) 05:55, 11 June 2022 (UTC)[reply]
For me, I feel like it's less the number, but more the place. I think it'd be better if nonce offensive terms were required to be found in multiple sources. Most of these terms are found solely on Usenet and three vs five doesn't really feel like it'll change much. This would just be adding on to the requirement of having 3 separate authors, and that way, offensive terms that do have currency would be included, while the ones that clearly don't wouldn't be. AG202 (talk) 06:01, 11 June 2022 (UTC)[reply]
@AG202, Sgconlaw I agree with AG202 and I would suggest that as part of this proposal we simply disallow Usenet sources from counting as attestations for such words. That should get rid of most of the nonce words while allowing words like libtard and Rethuglican that do have currency in multiple sources. Benwing2 (talk) 06:06, 11 June 2022 (UTC)[reply]
Hmmm, interesting. Let’s see what others think. What do you feel is the justification for disallowing reference to Usenet in this case, which is generally permitted as a source? — Sgconlaw (talk) 09:10, 11 June 2022 (UTC)[reply]
@Sgconlaw Many of the terms listed above appear to be citable only in Usenet, which seems to be a magnet for people making up derogatory blends. On top of that, we tend not to allow postings on Twitter, Instagram or the like, and Usenet seems more similar to that in its self-curation than to a book, newspaper, magazine, etc. Benwing2 (talk) 17:26, 11 June 2022 (UTC)[reply]
I strongly disagree with this. Wiktionary's goal is allegedly to document "all words in all languages", and arbitrarily restricting our coverage of offensive terms runs counter to that. I'm not going to defend these terms or the racists who use them, but they exist whether we like it or not. If a term is citable and isn't SOP, it should be included. It's not our job to sanitize the English language. Binarystep (talk) 09:34, 11 June 2022 (UTC)[reply]
That's not what's happening. You seem obsessed with the idea we're being prescriptivist. We're not. That is a very different phenomenon. What's happening is we are trying to reduce the amount of vandalism as well as nonce words, i.e. words perceived as created for the situation by both the speaker and listener. With time nonce words might become real, but until then they are one-offs. Stop accusing everything of being prescriptivist. Vininn126 (talk) 12:38, 11 June 2022 (UTC)[reply]
Category:English nonce terms existed for a decade without any issues, so it's not like nonce words are banned or anything. Only a few terms are being targeted, and it's solely because of their offensiveness, which is what I take issue with. It's not Wiktionary's job to decide which words are too offensive to mention, and none of our policies support removing words for that reason. In fact, similar proposals have been overwhelmingly rejected in the past (see here for one example). Binarystep (talk) 15:20, 11 June 2022 (UTC)[reply]
There's a difference between these kinds of nonce words, and it's their frequency. Vininn126 (talk) 15:21, 11 June 2022 (UTC)[reply]
Tbf looking at that discussion, it doesn't look like it was overwhelmingly rejected? There were many posts from both sides and a lot of editors trying to find a middle ground; it's just that the discussion fizzled out without finding a common ground, as Wiktionary discussions tend to do multiple times on end. AG202 (talk) 15:46, 11 June 2022 (UTC)[reply]
I'm a bit late to the party, but I also oppose this: If three different people have used it, it's a word, that's our attestation criterion. I wouldn't be opposed to giving a label like "internet slang" or "Usenet slang", but dismissing the source doesn't seem right. Thadh (talk) 08:30, 13 June 2022 (UTC)[reply]
I think it could be a good compromise to have a label and category for Usenet-exclusive terms. Binarystep (talk) 09:39, 13 June 2022 (UTC)[reply]
@Thadh Once again, this is not a Usenet-exclusive issue. I also don't really feel comfortable assigning "Usenet slang" to other terms, like fandom slang, that are only citable on Usenet, as that could make them have a negative label. I just want that offensive terms be cited in more than one website/source, no matter what it is. Also, CFI is not that loose, it's definitely not as simple as "if three different people have used it, it's a word", otherwise we wouldn't have clarifications about being durably archived, what counts as being independent, the spanning a year requirement, and more that occurs in the RFD/RFV discussions. Like I've mentioned to other folks, it seems like y'all have this image of Wiktionary that's a noble view of what the website should be, but that's not what it is in reality. We definitely have significant standards, and definitely not every word that has been used by three people is included on the website, otherwise we'd be inching much closer to what folks here complain about Urban Dictionary. AG202 (talk) 11:26, 13 June 2022 (UTC)[reply]
Durably archived is an issue for future reference. If Urban Dictionary were durably archived and if it were apparent that the people writing it actually used the words they're describing, and that those words are used by three people or more, yes, we would include it. Thadh (talk) 13:42, 13 June 2022 (UTC)[reply]
@AG202:
I also don't really feel comfortable assigning "Usenet slang" to other terms, like fandom slang, that are only citable on Usenet, as that could make them have a negative label.
How is it negative? We have categories for dialects of English, English used by non-native speakers, Polari slang, thieves' cant, and various other designations that indicate these terms are only used by specific groups of people. Listing a term as Usenet-exclusive isn't derogatory, it's a simple statement of fact.
I just want that offensive terms be cited in more than one website/source, no matter what it is.
Treating Usenet as a single source is factually inaccurate.
Also, CFI is not that loose, it's definitely not as simple as "if three different people have used it, it's a word", otherwise we wouldn't have clarifications about being durably archived, what counts as being independent, the spanning a year requirement, and more that occurs in the RFD/RFV discussions.
CFI doesn't exclude words for moral reasons, though. That's the primary difference between the current status quo and this proposal.
Like I've mentioned to other folks, it seems like y'all have this image of Wiktionary that's a noble view of what the website should be, but that's not what it is in reality.
Should we not try to make the site better?
We definitely have significant standards, and definitely not every word that has been used by three people is included on the website, otherwise we'd be inching much closer to what folks here complain about Urban Dictionary.
Most terms on Urban Dictionary haven't been used by one person, much less three. Binarystep (talk) 23:07, 13 June 2022 (UTC)[reply]
I feel like I'm just going in circles. "Listing a term as Usenet-exclusive isn't derogatory, it's a simple statement of fact." Imho labels like "4chan-slang" do not give a positive light to me and some other folks, and with the rate that Usenet is at rn for me, I'm almost starting to feel the same. "Treating Usenet as a single source is factually inaccurate." this is up to interpretation, just like Twitter & Reddit are treated as one source in many conversations here, Usenet can be as well, especially with its community around these terms (I will not elaborate further please). "Should we not try to make the site better?" I don't find including any nonce derogatory that happens to pop up on Usenet three times making the site better, and that's just a fundamental difference between the two of us. "Most terms on Urban Dictionary haven't been used by one person" tbf, someone making an entry and uploading it to Urban Dictionary with a usage example is one person using it ¯\_(ツ)_/¯ I'm sure if we dug enough into the depths of Twitter, MySpace, and other social media, a lot of the terms that folks mention here from Urban Dictionary would be citable, it's just not worth it to do so. AG202 (talk) 23:13, 13 June 2022 (UTC)[reply]
"Listing a term as Usenet-exclusive isn't derogatory, it's a simple statement of fact." Imho labels like "4chan-slang" do not give a positive light to me and some other folks, and with the rate that Usenet is at rn for me, I'm almost starting to feel the same.
Why is that, though? It feels like you're seeing connotations that aren't there. Saying that a word is only used within a particular community isn't an insult.
"Treating Usenet as a single source is factually inaccurate." this is up to interpretation, just like Twitter & Reddit are treated as one source in many conversations here, Usenet can be as well, especially with its community around these terms (I will not elaborate further please).
Do all Usenet posts have the same author?
"Should we not try to make the site better?" I don't find including any nonce derogatory that happens to pop up on Usenet three times making the site better, and that's just a fundamental difference between the two of us.
My idea of making the site better includes making it as accurate as possible.
"Most terms on Urban Dictionary haven't been used by one person" tbf, someone making an entry and uploading it to Urban Dictionary with a usage example is one person using it ¯\_(ツ)_/¯
That's a definition, but not a use. Per Appendix:English dictionary-only terms, that wouldn't justify its inclusion.
I'm sure if we dug enough into the depths of Twitter, MySpace, and other social media, a lot of the terms that folks mention here from Urban Dictionary would be citable, it's just not worth it to do so.
Those terms are mentioned, not used. Even if we allowed online citations, they'd be treated as dictionary-only terms. I guarantee no one here would be able to find three independent uses of a term like "San Fernando Roulette" on any website, for instance. At best, you might find some "Did you know UD has a page for this?" and "This means this according to UD" comments. Binarystep (talk) 23:20, 13 June 2022 (UTC)[reply]
The connotations are there for me and other folks, as 4chan does not have the best rep, whether or not you see them is another thing. Clearly I am not! saying that Usenet posts all have the same author! Holy hell, I really feel like I'm being interrogated here when we clearly just disagree and I'm not going to change your opinion. "That's a definition, but not a use. Per Appendix:English dictionary-only terms, that wouldn't justify its inclusion." wouldn't justify its inclusion yes, but would still be "one person" using it. And then, idk I've been surprised at RFV efforts here, maybe not that specific word but others. AG202 (talk) 23:25, 13 June 2022 (UTC)[reply]
The connotations are there for me and other folks, as 4chan does not have the best rep, whether or not you see them is another thing.
I don't think we should ignore the history or usage of a word just because mentioning it would reflect badly on the word's users.
Clearly I am not! saying that Usenet posts all have the same author!
Exactly my point. By treating all Usenet posts as the same source, you're holding it to a much higher standard than any other "durably archived" source we use.
"That's a definition, but not a use. Per Appendix:English dictionary-only terms, that wouldn't justify its inclusion." wouldn't justify its inclusion yes, but would still be "one person" using it.
It may be a use in the literal sense, but it wouldn't count as one for the purposes of RFV, which is what I meant.
And then, idk I've been surprised at RFV efforts here, maybe not that specific word but others.
I mean... if a word can be proven to exist, then it should be included. Binarystep (talk) 23:44, 13 June 2022 (UTC)[reply]
🫤 This change would apply to all sources, so all Tweets, Reddit posts, NYT articles (though they usually don't have offensive terms, but are also durably archived) would count as one source respectively as I've already said :-////, so it's not just Usenet :-////. My same feelings as my most recent message far below apply here as well. AG202 (talk) 23:47, 13 June 2022 (UTC)[reply]
🫤 This change would apply to all sources, so all Tweets, Reddit posts, NYT articles (though they usually don't have offensive terms, but are also durably archived) would count as one source respectively as I've already said :-////, so it's not just Usenet :-////
This proposal came about because of Usenet, so I'm mentioning it as an example. My point is that you're lumping potentially dozens of authors together and treating them as the same person, solely because they used the same method of communication. Binarystep (talk) 23:53, 13 June 2022 (UTC)[reply]
I am treating them as one source, as I have said. You don't agree with that, and that's fine. You are going to vote oppose, and that's fine. Unfortunately, this conversation has not been fruitful. I am not arguing this further. AG202 (talk) 23:56, 13 June 2022 (UTC)[reply]

I am opposed to selectively requiring more or higher-quality citations for any one category of terms. Such a framework is just as likely to be employed to gatekeep non-derogatory terms that are viewed as less inclusion-worthy by some (fandom slang, obscure regional slang, social-justice and LGBT terminology, etc.) as it is to keep out nonsense pulled from the bowels of Urban Dictionary. Many terms seldom make it into print because they are rarely used outside specific contexts or communities.

I do agree with the principle of minimizing extremist and fringe material. I would support disallowing certain sources to be used as citations unless they are quoted by acceptable secondary sources (e.g. a white supremacist website quoted in an academic text). I would support updating CFI to allow for quotes deemed particularly offensive or unhelpful to be limited to citations pages. But I oppose proposals to redraw the criteria for inclusion along subjective lines. The lexicographer's job is to document language as it is used. And it isn't always used to noble ends. We cannot pick-and-choose which words we document without undermining our mission. It would also create a huge slippery slope.

That said I do support enforcing a time limit on RfV nominations of offensive terms (or at least terms of abuse against race, gender, religion, sexuality, etc.) but more for the purpose of vandalism reduction. WordyAndNerdy (talk) 10:23, 11 June 2022 (UTC)[reply]

@Binarystep @WordyAndNerdy "Such a framework is just as likely to be employed to gatekeep non-derogatory terms that are viewed as less inclusion-worthy by some (fandom slang, obscure regional slang, social-justice and LGBT terminology, etc.)" Imho, this is a slippery slope. I've been an ardent supporter of keeping those terms in Wiktionary and would strongly oppose any attempt to limit them. That's not my goal here. "We cannot pick-and-choose which words we document without undermining our mission." I don't get this argument. We already pick and choose, otherwise we wouldn't have WT:ATTEST, WT:CFI, WT:RFD, or WT:RFV in the first place. Every dictionary except maybe Urban Dictionary has inclusion criteria. Words used significantly on Twitter are still struggling to be covered here. Like we are not a 100% every-word-must-be-included dictionary. We have criteria and we can update it as we see fit. Giving a space to nonce derogatory words used in the most horrid spaces does not have to be our job, and we can update our guidelines as we see fit. That being said, that's part of why I said it's not the number, but the place where the words are being used. If the words are used elsewhere, then that's fine, they can be included, but if they're only used in white supremacist spaces in Usenet, then there's no reason why we need to give them the space that they're currently given here. For all the talk about us not being Urban Dictionary, these terms often make us look worse than them... AG202 (talk) 14:09, 11 June 2022 (UTC)[reply]
"Such a framework is just as likely to be employed to gatekeep non-derogatory terms that are viewed as less inclusion-worthy by some (fandom slang, obscure regional slang, social-justice and LGBT terminology, etc.)" Imho, this is a slippery slope.
Is it? There's already been a recent attempt to ban certain fandom slang terms for being "too niche".
We already pick and choose, otherwise we wouldn't have WT:ATTEST, WT:CFI, WT:RFD, or WT:RFV in the first place.
The issue here is that we'd be making an exception to our existing rules just to ban a handful of terms for reasons that have nothing to do with attestability or SOP-ness.
Words used significantly on Twitter are still struggling to be covered here.
And that's a problem. The solution isn't to restrict our already limited coverage even further.
Like we are not a 100% every-word-must-be-included dictionary.
Again, that's not a good thing. The more gaps we have in our coverage, the less useful the site becomes.
We have criteria and we can update it as we see fit.
That's true, which makes me wonder why we can't update our criteria in a way that makes our coverage more accurate instead.
That being said, that's part of why I said it's not the number, but the place where the words are being used. If the words are used elsewhere, then that's fine, they can be included, but if they're only used in white supremacist spaces in Usenet, then there's no reason why we need to give them the space that they're currently given here.
As long as Usenet is considered a "durably archived" source, we shouldn't make value judgements about which Usenet-exclusive terms are worth mentioning.
For all the talk about us not being Urban Dictionary, these terms often make us look worse than them...
The problem with Urban Dictionary isn't that it allows stupid terms, it's that it allows (and primarily consists of) terms that have literally never been used by anyone. A recurring nonce word used exclusively by racists is still a real word, unlike, say, "cincinatti ferris wheel". Binarystep (talk) 15:51, 11 June 2022 (UTC)[reply]
@Binarystep
Is it? There's already been a recent attempt to ban certain fandom slang terms for being "too niche".
I specifically defended that term (I was literally the first person to vote keep on it, and it's one RFD that's already heavily leaning on keeping the word), my proposal does not center around them, and I don't want it derailed.
The issue here is that we'd be making an exception to our existing rules just to ban a handful of terms for reasons that have nothing to do with attestability or SOP-ness.
We already make a TON of exceptions based on that? We don't include every celestial body, we don't include every place name, we don't include every number, we don't include pleeeease, we don't include Charizard, we don't include sarcastic usages, we don't include some company names, we don't include all chemical formulae, and the list goes on and on. Those are all words too that we exclude. I don't see why we can't make yet another exception for nonce offensive terms, just so that we don't give space to literally any nonce offensive term that Usenet racists make up. They would literally only need to be cited on 1-2 more sources to be included. It's not a blanket ban on every offensive term either, most of the ones we already have would stay, it'd just combat the recent wave of random nonce horrific offensive terms out of the depths of Usenet, that we can't even be sure if they're even used anymore. Wiktionary should be more inclusive, yes, but this is not one of those ways. We should also be thinking about the everyday user and what platform we're giving to words that would've otherwise never seen the light of day. AG202 (talk) 16:38, 11 June 2022 (UTC)[reply]
Why do Wiktionarians now think though that they are righter about Usenet quoting and offensive terms than fifteen years ago? Fay Freak (talk) 21:29, 11 June 2022 (UTC)[reply]
Wiktionary:Policies and guidelines#How are policies decided? - this link should help. Theknightwho (talk) 22:19, 11 June 2022 (UTC)[reply]
Whereby? You need help. Fay Freak (talk) 22:51, 11 June 2022 (UTC)[reply]
@Fay Freak Perhaps sealion#Verb might be more enlightening. Theknightwho (talk) 03:12, 12 June 2022 (UTC)[reply]
I oppose treating offensive words differently in this regard. I don't really want entries for offensive words to contain more quotations and I think it's awfully arbitrary to exclude Usenet for this but not other things. Andrew Sheedy (talk) 21:17, 12 June 2022 (UTC)[reply]
@Andrew Sheedy, @Benwing2, @Binarystep, @WordyAndNerdy, @Sgconlaw To make it clear, I am not advocating for the exclusion of Usenet here. I would just prefer that offensive terms require more than one website to show usage. Otherwise we will have an infinite amount of derogatory nonce terms that really bring down the quality of the website and continue to give them a platform to spread out more. This is the sixth conversation, at the very least, about this issue, and I've listened and talked with so many people and changed my proposal and approach so many times, but nothing seems to be changing, which is really unfortunate. There was a conversation that I read from two years ago about the image that we want to give our users and fellow editors, and I think that it's something that really needs to be taken into consideration. We have so so so so many policies about which words can and cannot be included at WT:CFI, but when it comes to offensive nonce terms that were made in the pits of the most vile, white supremacist places, but did not make it out of them, we're all of a sudden hesitant to require that they be cited a bit more aggressively, and honestly it hasn't sent the best message. It's truly sad and disappointing to me that there's more energy and time and resources being spent on preserving and debating words like Apefrican and Darky Cuntinent than getting words from actual African languages on here. Our coverage on them is so paltry, though I've been able to get more Yorùbá editors on here and increase coverage significantly, and I wish that instead of lengthy RFD, RFV, and Beer Parlour discussions on preserving words that were only used a few times in the most racist spaces, we could actually spend time on preserving some of our most impacted and endangered languages, which is why I joined this community in the first place. However, the longer I've been here, the less welcome I've felt. AG202 (talk) 21:44, 12 June 2022 (UTC)[reply]
Thanks for the clarification. I had indeed misunderstood you. I do agree. The last thing we want is our documentation of the language to be the cause of obscure racist slurs becoming mainstream. I was concerned that what we allowed or didn't would start to become somewhat arbitrary, but I think what you're describing would prevent that from being much of an issue. Andrew Sheedy (talk) 21:50, 12 June 2022 (UTC)[reply]
@AG202 My suggestion to exclude Usenet for derogatory terms was just one way of trying to cut down on the crap. I'm fine with a more general requirement that at least two different sources be provided. Benwing2 (talk) 23:05, 12 June 2022 (UTC)[reply]
@Sgconlaw I think we should move to a vote fairly soon; given the viewpoints expressed here, I think we will be able to get one that passes with a 2/3 majority. Only a small minority seem categorically opposed to such a thing. Benwing2 (talk) 23:05, 12 June 2022 (UTC)[reply]
To make it clear, I am not advocating for the exclusion of Usenet here. I would just prefer that offensive terms require more than one website to show usage.
You're still proposing that we hold offensive terms to a different standard. Raising the bar to exclude certain words is something I'll never agree with. For comparison, imagine if we added an addendum to WT:FICTION saying we only accepted terms from well-known fictional works, even if a more obscure term didn't violate policy whatsoever.
Otherwise we will have an infinite amount of derogatory nonce terms that really bring down the quality of the website and continue to give them a platform to spread out more.
It's not Wiktionary's job to prevent the spread of offensive terms, and racists will continue to be racist regardless of whether we document their slurs, something which I can personally attest to. Incidentally, every racist term I've been called would still be allowed on the site after this, given that they didn't originate from Usenet.
As for bringing down the quality of the site, I'd argue that refusing to document attestable terms simply because we don't like them (yes, they're objectively vile terms, but that's not a good reason to pretend they don't exist) does that far more than having pages for terms that no one's obligated to read.
We have so so so so many policies about which words can and cannot be included at WT:CFI, but when it comes to offensive nonce terms that were made in the pits of the most vile, white supremacist places, but did not make it out of them, we're all of a sudden hesitant to require that they be cited a bit more aggressively, and honestly it hasn't sent the best message.
We don't have any policies that justify excluding words for moral reasons. Aside from that, our policies are already overly restrictive, and have held us back as a result. Making them even more limiting is a step backwards.
It's truly sad and disappointing to me that there's more energy and time and resources being spent on preserving and debating words like Apefrican and Darky Cuntinent than getting words from actual African languages on here.
Our coverage isn't a zero-sum game. We could delete everything in Category:English ethnic slurs if you wanted, that wouldn't automatically lead to better documentation of LDLs. This argument is ultimately a non sequitur.
Our coverage on them is so paltry, though I've been able to get more Yorùbá editors on here and increase coverage significantly, and I wish that instead of lengthy RFD, RFV, and Beer Parlour discussions on preserving words that were only used a few times in the most racist spaces, we could actually spend time on preserving some of our most impacted and endangered languages, which is why I joined this community in the first place.
These discussions wouldn't be happening if some users weren't more focused on trying to reduce our coverage than expand it. I don't appreciate how you blame your opposition for something they didn't start in the first place. How many Yoruba terms could've been added in the time it took to come up with this proposal? Binarystep (talk) 23:36, 12 June 2022 (UTC)[reply]
I never said I wasn’t going to hold offensive terms to a different standard, that’s been the main point of my proposal. This discussion spawned because certain IPs were spamming offensive nonce terms which happened to be citable on Usenet. I never said I wanted to delete all of the ethnic slurs in the ethnic slur category, this is mainly to limit the creation of random derogatory nonce terms that have never been used elsewhere. I’ve literally said that I would just prefer that terms be cited on more than one website. That’s very far from saying that they should all be deleted, and I don’t appreciate that assumption, nor do I appreciate calling my experiences here a non sequitur as it’s what me and other users working on underrepresented languages have felt. It’s also not “refusing to document because we don’t like them”, otherwise I’d once again advocate for their full deletion, which I am not. I’ve also been one of the targets of one of the biggest slurs of all mankind that didn’t originate on Usenet, yet I’m not calling for its deletion because it’s clearly and evidently cited. And then finally, the part about “how many Yorùbá terms could’ve been added in the time it took to come up with this proposal?” frankly feels insulting and not an argument in good faith, as I have put an immense amount of effort into Yorùbá coverage here and we’ve increased the lemmas almost tenfold since starting, and I'm the one that spent weeks of my time creating modules and templates, let alone the work I’ve done with Jeju as well, so please don’t use that argument with me again or I will not engage with you further. I can use my time to call out the project for not giving as much support as it could be for underrepresented languages like these with the intent that it’ll make things easier in the long run. AG202 (talk) 01:01, 13 June 2022 (UTC)[reply]
I never said I wasn’t going to hold offensive terms to a different standard, that’s been the main point of my proposal.
And that's what I fundamentally disagree with. It also implies that a word's inclusion in Wiktionary is synonymous with its endorsement, which is problematic.
This discussion spawned because certain IPs were spamming offensive nonce terms which happened to be citable on Usenet.
There are other solutions, though. For one, we could ban IPs and new users from making pages for offensive terms.
I never said I wanted to delete all of the ethnic slurs in the ethnic slur category, this is mainly to limit the creation of random derogatory nonce terms that have never been used elsewhere.
I didn't say you did? If that's how that came off, I'm sorry about that. My intent was only to say that even the strongest possible approach to reducing coverage of offensive terms wouldn't automatically have a positive effect on other gaps in our coverage.
And then finally, the part about “how many Yorùbá terms could’ve been added in the time it took to come up with this proposal?” frankly feels insulting and not an argument in good faith, as I have put an immense amount of effort into Yorùbá coverage here and we’ve increased the lemmas almost tenfold since starting, and I'm the one that spent weeks of my time creating modules and templates, let alone the work I’ve done with Jeju as well, so please don’t use that argument with me again or I will not engage with you further.
My intent wasn't to imply that you're not putting effort into your contributions, but rather that this specific proposal doesn't solve the other issue you mentioned. My point was that these two situations have nothing to do with each other. Binarystep (talk) 01:13, 13 June 2022 (UTC)[reply]
And I’m fine with you disagreeing with it, that’s why this discussion is happening in the first place. I’ve tried hard to find a middle ground across the multiple discussions had and this seems to be it as I can’t appeal to everyone unfortunately. I’m sure more folks would object to banning IPs & new users from making offensive terms (also I feel like that’d be less implementable @Benwing2 can correct me on that). The two issues may not directly impact each other, but they do leave an image about the community. It’s hard for me to convince editors to come and help, as mentioned, I myself feel less welcome when we’re comfortable with giving such a space to those terms that are barely citable, which leads to less enthusiasm on my part and others’ parts to contribute to this project. AG202 (talk) 01:28, 13 June 2022 (UTC)[reply]
I’m sure more folks would object to banning IPs & new users from making offensive terms (also I feel like that’d be less implementable @Benwing2 can correct me on that)
Why? IPs can't edit the pages for most offensive terms, why should they be allowed to create new ones?
The two issues may not directly impact each other, but they do leave an image about the community.
It shouldn't. Wiktionary is a dictionary, and the inclusion of a word isn't the same thing as endorsement.
It’s hard for me to convince editors to come and help, as mentioned, I myself feel less welcome when we’re comfortable with giving such a space to those terms that are barely citable, which leads to less enthusiasm on my part and others’ parts to contribute to this project.
This is what I'm struggling to understand. From my perspective, Wiktionary choosing to define a word is only saying "this exists and someone said it". It doesn't mean the site approves of the word, its users, or their ideology. It makes more sense to me to go after racist users (hence not allowing IPs to troll the site with new slurs) than racist words. Binarystep (talk) 01:43, 13 June 2022 (UTC)[reply]
If Wiktionary were an “any word can go” website, then I wouldn’t be having this discussion, but we have tons upon tons of standards. We choose to preserve some words but then choose to delete others. I’m having another discussion about why United Nations should be kept, I had to fight tooth and nail to find appropriate cites for Mickey Mouse ring, internalized homophobia got deleted for being SOP (though I still think it’s needed but alas), but vile words that don’t have any coverage at all past one website get to stay? Wiktionary, whether we like it or not, implicitly approves certain words and phrases, and we don’t have to approve those ones automatically. Also as a side note, Wiktionary has led to the approval of terms for groups, Petersonian being sent to RFV started an issue in the related subreddit and they took it as a victory when it was kept. There’ve been callouts about Wiktionary’s reconstructions and coverage in different linguistic forums. There’ve been questions about who actually makes up Wiktionary’s editors. I’ve seen my own entry at yassification be used as justification for the word “existing” and being cited in multiple tweets. As with any dictionary (see: the RAE and elle, the outrage against Le Petit Robert and its inclusion of iel, and Merriam-Webster’s inclusion of the singular they), we do have an impact, and as such, I think that we could be a bit more strict with how we include those words. AG202 (talk) 01:56, 13 June 2022 (UTC)[reply]
If Wiktionary were an “any word can go” website, then I wouldn’t be having this discussion, but we have tons upon tons of standards.
We have consistent standards. Problems arise when you start banning words on an individual basis.
(I'd also argue that Wiktionary should be an "any word can go" website, given that our collaborative format would make it trivially easy for us to become the most accurate dictionary in existence.)
We choose to preserve some words but then choose to delete others.
Assuming they're valid words, that's not a good thing.
I’m having another discussion about why United Nations should be kept, I had to fight tooth and nail to find appropriate cites for Mickey Mouse ring, internalized homophobia got deleted for being SOP (though I still think it’s needed but alas), but vile words that don’t have any coverage at all past one website get to stay?
Again, none of those should be deleted. I'm well aware that we have numerous problems with our coverage, and a long history of deleting perfectly valid terms due to some problem with our CFI. The solution is to put an end to our excessive deletionism, not make it worse.
Also as a side note, Wiktionary has led to the approval of terms for groups, Petersonian being sent to RFV started an issue in the related subreddit and they took it as a victory when it was kept.
So? I don't see why that's our problem. I'm sure our coverage of ((( ))), Holohoax, and bix nood has led to some neo-Nazis feeling proud of themselves, but that doesn't mean we should pretend those terms don't exist just to stick it to them.
Honestly, Petersonian really doesn't feel like the best example, given that it's a completely neutral term whose inclusion doesn't communicate anything beyond "people talk about Jordan Peterson". His fans may as well celebrate the fact that he has a Wikipedia page.
There’ve been callouts about Wiktionary’s reconstructions and coverage in different linguistic forums.
There'll be criticism no matter what we do. I've seen some people say we have too much fandom slang while others say we have too little.
There’ve been questions about who actually makes up Wiktionary’s editors.
What do you mean by that?
I’ve seen my own entry at yassification be used as justification for the word “existing” and being cited in multiple tweets.
I mean, yassification is a word that exists. Whether that's a good thing isn't for us to decide.
As with any dictionary (see: the RAE and elle, the outrage against Le Petit Robert and its inclusion of iel, and Merriam-Webster’s inclusion of the singular they), we do have an impact, and as such, I think that we could be a bit more strict with how we include those words.
I doubt that our impact is that big in this case. Removing obscure slurs from Wiktionary isn't going to make anyone less racist. No one's getting "redpilled" by the dictionary. People will continue to be shitty regardless of whether we document examples of their shittiness. Binarystep (talk) 02:26, 13 June 2022 (UTC)[reply]
I could apply that last argument to a LOT of different issues in society today, but that’d get far too off-topic. Just because some folks will continue to be trash, doesn’t mean we should continue to cover these words without a more strict guideline. And while I agree that those words shouldn’t have been deleted, they were and so, I’m trying to build off of what Wiktionary currently has as its policies unless a very major change occurs. The consensus didn’t agree with me, so that’s what I build off of, hence why I’ve altered this proposal a ton. Also our policies are definitely not consistent. I’ve been confused on multiple occasions about which words fall under which policies or how to go about entries or what counts as “durably archived” for example. We definitely already ban certain terms on an individual basis, otherwise there wouldn’t be multiple sections at WT:CFI or WT:RFD. It feels like there’s a Wiktionary that you’re wanting that’s different from what’s actually going on. I want more coverage as well (minus these terms), but alas, I know that a policy that removes CFI, for example, would not be popular and would fail spectacularly. I’m trying to focus on what’s practical and what could maybe pass after talking about it with folks. Re: the demographics part, there’ve been questions about Wiktionary’s demographics and why we cover certain terms and languages and why we don’t cover others. Re: impact, our impact isn’t as big, but it’s definitely there, so we should be striving for quality and think a bit more about what we put out there. AG202 (talk) 02:50, 13 June 2022 (UTC)[reply]
Just because some folks will continue to be trash, doesn’t mean we should continue to cover these words without a more strict guideline.
No, the fact that these words exist means we should continue to cover them. The fact that racists exist isn't a reason to delete valid entries.
And while I agree that those words shouldn’t have been deleted, they were and so, I’m trying to build off of what Wiktionary currently has as its policies unless a very major change occurs.
How does it benefit anyone to make Wiktionary even more deletionist than it already is? The fact that our policies are flawed doesn't justify making them worse. Unless something changes, the best thing we can do is protect our existing coverage.
Also our policies are definitely not consistent. I’ve been confused on multiple occasions about which words fall under which policies or how to go about entries or what counts as “durably archived” for example.
Can you elaborate?
We definitely already ban certain terms on an individual basis, otherwise there wouldn’t be multiple sections at WT:CFI or WT:RFD.
We usually don't ban CFI-compliant words because we dislike them, which is what this proposal ultimately boils down to. Off the top of my head, I can only think of two comparable cases from RFD: the proposal to delete Kent State Gun Girl for being "non-notable" (which sadly passed despite having zero basis in policy), and the proposal to delete everypony for being "too niche".
I can understand deleting entries for being sum-of-parts, names of individuals, non-lexicalized trademarks, or terms coined in fiction that haven't entered general use. What I don't agree with is deleting terms that'd otherwise be kept, solely because they're offensive. All else being equal, the offensiveness of a term should not be the reason for its removal.
It feels like there’s a Wiktionary that you’re wanting that’s different from what’s actually going on.
Well, yeah. Is that not the case for you as well? Both of us want to see the site change in some way or another.
I want more coverage as well (minus these terms), but alas, I know that a policy that removes CFI, for example, would not be popular and would fail spectacularly.
How is less coverage the solution? Sure, outright abolishing CFI isn't feasible, but gradually improving it definitely is. Consider the recent CFI change allowing online citations on a case-by-case basis, which came after years of failed proposals to do pretty much the same thing. Consensus changes over time, and as traditional media becomes less relevant, our policies will likely change to reflect that. On the other hand, coming up with more reasons to delete valid entries will only lead to us becoming less accurate.
Re: impact, our impact isn’t as big, but it’s definitely there, so we should be striving for quality and think a bit more about what we put out there.
We should be striving for accuracy and completeness, or, in other words, "all words in all languages". Our format gives us the potential to become a far better resource than our stricter counterparts, and creating a more restrictive CFI would only accomplish the exact opposite.
Whatever our impact is, I can't see how it matters here. No one became racist because they read a slur in the dictionary. We're not making the world a worse place by describing the bad things that already exist. Additionally, as I said before, our decision to document a word isn't the same thing as us advocating for its usage. Our job is to describe what exists, not to decide what should exist. Binarystep (talk) 09:37, 13 June 2022 (UTC)[reply]
"The fact that racists exist isn't a reason to delete valid entries." I've addressed this and how I'm explicitly not calling for mass deletions. "Our job is to describe what exists, not to decide what should exist." this is not what happens with Wiktionary in reality though. I've definitely seen many many "CFI-compliant" words be deleted because Wiktionarians do not like them. I generally am considered an "inclusionist" by some other editors here, but even then, I don't feel like these terms are really needed without any major citations. I don't want them bulk-deleted, as I've mentioned, I want them to be cited on more than one website. "Additionally, as I said before, our decision to document a word isn't the same thing as us advocating for its usage." once again, this may be our explicit job, but it's not what happens implicitly. "Can you elaborate?" I've had multiple arguments about what is encyclopedic at RFD, what counts as durably archived (I was told that I would have to see if every newspaper at Mickey Mouse ring at the time had a print version before it could pass RFV), RFD closing guidelines, what should be the entry line for some languages, why we decide to strip diacritics from Yorùbá but not from Vietnamese, and more. Our policies are the opposite of consistent. I've ironically used this argument before "all words in all languages", at Wiktionary:Votes/2020-07/Removing_letter_entries_except_Translingual which was a much more sweeping proposal than this one, but the more I've been on this website, the more I realize that that's not the case. And so, I'm personally fine with requiring more than one website for offensive terms, though I'm very aware that you're not, which is also fine. AG202 (talk) 11:39, 13 June 2022 (UTC)[reply]
"The fact that racists exist isn't a reason to delete valid entries." I've addressed this and how I'm explicitly not calling for mass deletions.
Are you or are you not calling for the mass deletion of offensive terms that can only be cited on Usenet?
"Our job is to describe what exists, not to decide what should exist." this is not what happens with Wiktionary in reality though. I've definitely seen many many "CFI-compliant" words be deleted because Wiktionarians do not like them.
And that's a problem. You're acknowledging what's wrong with our current system, but your response is to accept it as unfixable.
I don't want them bulk-deleted, as I've mentioned, I want them to be cited on more than one website.
Aside from the inaccuracy of referring to Usenet as "one website", what happens if a word can't be cited elsewhere? Not all terms entered widespread usage.
"Additionally, as I said before, our decision to document a word isn't the same thing as us advocating for its usage." once again, this may be our explicit job, but it's not what happens implicitly.
How doesn't it? Because some neo-Nazis might think us having a page for their favorite slur validates their beliefs in some way? We can't control what other people think. Regardless of some people's opinions, our explicit purpose is to document words, not support them. If some people refuse to accept that, that's not our fault.
"Can you elaborate?" I've had multiple arguments about what is encyclopedic at RFD, what counts as durably archived (I was told that I would have to see if every newspaper at Mickey Mouse ring at the time had a print version before it could pass RFV), RFD closing guidelines, what should be the entry line for some languages, why we decide to strip diacritics from Yorùbá but not from Vietnamese, and more. Our policies are the opposite of consistent.
Everything you've described is a major problem, but, again, the solution isn't to move even further in that direction.
I've ironically used this argument before "all words in all languages", at Wiktionary:Votes/2020-07/Removing_letter_entries_except_Translingual which was a much more sweeping proposal than this one, but the more I've been on this website, the more I realize that that's not the case.
I agree that this site has the unfortunate tendency to contradict its mission statement, but that isn't a good reason to continue the trend. Binarystep (talk) 22:59, 13 June 2022 (UTC)[reply]
@Binarystep "Are you or are you not calling for the mass deletion of offensive terms that can only be cited on Usenet?" As I've said, if they are citable they are fine, this doesn't even affect that many words, mainly the ones that have been spammed. If they're not citable, then they'd be subject to RFV like other words are. "Regardless of some people's opinions, our explicit purpose is to document words, not support them. If some people refuse to accept that, that's not our fault." I don't think I will change your mind on this, as it's a very fundamental difference in our experiences, so I will leave it at that. "but your response is to accept it as unfixable." I don't know how long you've been here, but I have 100% tried to fix those issues as I've shown. I don't need to prove that to you further. AG202 (talk) 23:03, 13 June 2022 (UTC)[reply]
"Are you or are you not calling for the mass deletion of offensive terms that can only be cited on Usenet?" As I've said, if they are citable they are fine, this doesn't even affect that many words, mainly the ones that have been spammed. If they're not citable, then they'd be subject to RFV like other words are.
You didn't answer my question. If a derogatory term is only citable on Usenet, it would be deleted, correct?
"Regardless of some people's opinions, our explicit purpose is to document words, not support them. If some people refuse to accept that, that's not our fault." I don't think I will change your mind on this, as it's a very fundamental difference in our experiences, so I will leave it at that.
I don't intend to sound rude, but unless I'm mistaken, your experiences only prove that some people think Wiktionary endorses every word it documents. I'm not denying that some people may believe that, but that doesn't make them correct.
"but your response is to accept it as unfixable." I don't know how long you've been here, but I have 100% tried to fix those issues as I've shown. I don't need to prove that to you further.
I know you've tried to fix those issues in the past, which makes me wonder why you're bringing them up now as proof that the status quo is unchangeable. Binarystep (talk) 23:12, 13 June 2022 (UTC)[reply]
Yes, if they fail RFV they'd be deleted, but that does not mean that they're all going to be deleted with a snap of a finger, which is what "mass deletion" sounds like to me, that many many entries would be deleted, like the letter vote would've implied. "I don't intend to sound rude, but unless I'm mistaken, your experiences only prove that some people think Wiktionary endorses every word it documents." This is exactly what I was talking about with implicit impact, Wiktionary may not explicitly endorse certain terms, but having them here gives them power and people thinking that Wiktionary endorses them can create issues. If you don't agree, then fine, but I don't want to go back and forth on that point anymore either. I mainly brought up the status quo now as a rationale for being more stringent with these nonce offensive terms. If we're already more stringent on a lot of valid terms, then I don't know why we can't be even a bit strict with offensive nonce terms. That's another thing on which we fundamentally disagree, so that's that. AG202 (talk) 23:18, 13 June 2022 (UTC)[reply]
Yes, if they fail RFV they'd be deleted, but that does not mean that they're all going to be deleted with a snap of a finger, which is what "mass deletion" sounds like to me, that many many entries would be deleted, like the letter vote would've implied.
Then my statement is accurate. All offensive terms that can't be cited outside of Usenet would be deleted, which is the definition of mass deletion. The phrase doesn't imply a lack of due process.
"I don't intend to sound rude, but unless I'm mistaken, your experiences only prove that some people think Wiktionary endorses every word it documents." This is exactly what I was talking about with implicit impact, Wiktionary may not explicitly endorse certain terms, but having them here gives them power and people thinking that Wiktionary endorses them can create issues.
It's not Wiktionary's fault or responsibility what people think. I'm sure neo-Nazis feel proud of themselves because of our decision to include terms like ((( ))), 1488, bix nood, chimp out, Holohoax, Holocaustianity, electric Jew, and countless other vile epithets, yet I don't think you'd support deleting them.
I mainly brought up the status quo now as a rationale for being more stringent with these nonce offensive terms. If we're already more stringent on a lot of valid terms, then I don't know why we can't be even a bit strict with offensive nonce terms.
You want to hold offensive terms to a higher standard than everything else, solely because they're offensive. Wiktionary is a lot of things, both good and bad, but it's not censored. Binarystep (talk) 23:27, 13 June 2022 (UTC)[reply]
I never understood the "Wiktionary is not censored" portion, when we have very clear guidelines it is. Your other points have been addressed already in our many exchanges, so I won't rehash them. AG202 (talk) 23:29, 13 June 2022 (UTC)[reply]
I never understood the "Wiktionary is not censored" portion, when we have very clear guidelines it is.
There's a difference between removing a word for being SOP and removing a word for being objectionable. That's not to say that the former is always a good thing, but they're not the same situation.
Your other points have been addressed already in our many exchanges, so I won't rehash them.
They really haven't been addressed, though. Why should Wiktionary delete valid entries simply because of how some people feel about them? Why does that only apply to obscure slurs, but not more popular ones like yard ape? The same people feel validated in both scenarios, yet only one is worth censoring for the common good. Why is that? Binarystep (talk) 23:48, 13 June 2022 (UTC)[reply]
Me and other folks have stated how those nonce offensive terms bring down the quality of the website to us. They don't bring it down for you, and that's fine, we disagree. I've already addressed the other points to my satisfaction multiple times, and unfortunately, I don't think I'll ever be able to explain myself to your own satisfaction, so here as well, I will end my portion here. AG202 (talk) 23:58, 13 June 2022 (UTC)[reply]
Me and other folks have stated how those nonce offensive terms bring down the quality of the website to us. They don't bring it down for you, and that's fine, we disagree.
These terms bring down the quality of the English language. Unfortunately, they're still part of it, and I don't see why we should lie to our readers by pretending otherwise. Binarystep (talk) 00:02, 14 June 2022 (UTC)[reply]

From the discussion thus far, it looks like there isn’t a consensus for excluding a specific source like Usenet. Since that raises separate issues, I think the use or otherwise of Usenet as a source shouldn’t be made part of the present proposal. — Sgconlaw (talk) 03:06, 13 June 2022 (UTC)[reply]

@Sgconlaw I’m fine with separating the proposals, but once again, please please please, I’ve said a few times now that I’m not proposing the exclusion of Usenet. I am proposing increasing the number of sources required for offensive terms by 1~2. I don’t want folks once again getting confused based on a proposal that’s not being presented. Also, it does seem like there’s a general consensus on that proposal, minus one or two folks (whose opinions are also valid). AG202 (talk) 03:57, 13 June 2022 (UTC)[reply]
@AG202: thanks. I will figure out how to create a vote soon. I could add an option for requiring that for derogatory entries, the quotations must originate from more than one source (not just Usenet alone), and we could see if there is support for that. Personally, if what is proposed is a greater diversity of sources rather than questioning Usenet itself as a source (which requires a separate discussion) I have no problem with that. — Sgconlaw (talk) 05:24, 13 June 2022 (UTC)[reply]
@Sgconlaw Yes that'd be fine with me. I'd remove language about Usenet, maybe something like "Offensive terms must be cited in more than one source (website, book, television show, etc.) to be included on Wiktionary" with a caveat for LDLs obviously. AG202 (talk) 11:41, 13 June 2022 (UTC)[reply]
@AG202: to avoid doubt, I think it might be necessary to expressly mention that Usenet as a whole is considered as a single source, rather than each conversation therein, or each post with a distinct date, being regarded as a separate source. — Sgconlaw (talk) 11:46, 13 June 2022 (UTC)[reply]
@Sgconlaw That would be fine, maybe we could add more examples like for example "Twitter, Reddit, 4chan, Usenet all count as one source individually" just so that there's less focus on Usenet. AG202 (talk) 15:35, 13 June 2022 (UTC)[reply]
@AG202: sure, though don't we not use some of these websites like Reddit and Twitter because they aren't durably archived? We should only mention durably archived sites. — Sgconlaw (talk) 15:37, 13 June 2022 (UTC)[reply]
@Sgconlaw I do feel that Reddit & Twitter are on their way towards being included especially with the current RFVs shenanigans going on + the passed vote on this issue, so I think that it'd be good to include them now rather than later. AG202 (talk) 15:47, 13 June 2022 (UTC)[reply]
@AG202: OK, then. — Sgconlaw (talk) 16:15, 13 June 2022 (UTC)[reply]
The fact that this proposal gives more weight to a word used in 3 books than a word used in 50 Usenet posts is inherently problematic. Usenet is a medium, not a single source. The very basis of this proposal is an inaccurate oversimplification. Binarystep (talk) 23:01, 13 June 2022 (UTC)[reply]
If you don't see the difference between 3 books and 50 Usenet posts when it comes to the propagation of offensive terms, then I just think that's a different perspective and experience between the two of us, and there's unfortunately little point to try and go back and forth about this anymore. AG202 (talk) 23:04, 13 June 2022 (UTC)[reply]
What's the difference, then? How does the fact that a word made it into print make it more of a word, even if it's only been used three times in documented human history? Usage defines the validity of a word, not the medium it's used in. Binarystep (talk) 23:13, 13 June 2022 (UTC)[reply]
If the medium does not matter then we wouldn't have the durably archived clause. But anyways, yeah, a book feels more established when it comes to words being used, vs a racist community where they're literally just attaching any negative word to the name of an African country, using it three (or fifty, doesn't matter) times, and then bam we got an entry in Wiktionary. As the target of a lot of these new terms, it just doesn't feel right. AG202 (talk) 23:27, 13 June 2022 (UTC)[reply]
If the medium does not matter then we wouldn't have the durably archived clause.
I mean, the "durably archived" rule is keeping us stuck in the past and should probably be abolished, but that's another story.
But anyways, yeah, a book feels more established when it comes to words being used, vs a racist community where they're literally just attaching any negative word to the name of an African country, using it three (or fifty, doesn't matter) times, and then bam we got an entry in Wiktionary.
This isn't the Middle Ages. It's trivially easy for anyone with enough money to get a book published, the barrier of entry isn't nearly as high as people tend to assume. Aside from that, Wiktionary documents terms based on their usage (within "durably archived" sources), not based on whether they were used by a more elite class of people. Usenet is considered a "durably archived" source, exceptions shouldn't be made to get rid of terms we don't like.
As the target of a lot of these new terms, it just doesn't feel right.
Does a term become less offensive because its user had more resources available to them? How does it make a difference to either of us whether a term like dindu nuffin was used in a book or a neo-Nazi website? It's still the same word and it carries the same meaning. Ultimately, Wiktionary pretending a handful of slurs don't exist isn't going to make anyone less racist. Binarystep (talk) 23:38, 13 June 2022 (UTC)[reply]
🫤 and we're back several exchanges once again. I'm not going to change your mind. This issue, as I've mentioned is very near and dear to my heart and as such it's already very taxing to participate in (especially seeing how internet users continue to show their hate for Black people in very very novel ways). And now, my most important and heartfelt message has been utterly drowned out by back and forth exchanges that have led neither of us to shift at all. And so, I will bow out here and see how the vote goes in the end. AG202 (talk) 23:44, 13 June 2022 (UTC)[reply]
🫤 and we're back several exchanges once again. I'm not going to change your mind. This issue, as I've mentioned is very near and dear to my heart and as such it's already very taxing to participate in (especially seeing how internet users continue to show their hate for Black people in very very novel ways).
Censoring Wiktionary isn't going to end racism, and you're blaming it for something it has no involvement in. Twitter user "RaceRealist88" isn't going to plunge the depths of Wiktionary for a slur that was used between 1993 and 2005 on alt.fan.adolf-hitler, he's just going to say the N-word and call it a day. I feel pretty confident in saying that, given my experience dealing with that exact word. Binarystep (talk) 23:58, 13 June 2022 (UTC)[reply]
"and you're blaming it for something it has no involvement in." It has in my own and others' experiences :-/ which is what I've been trying to get at. I'm not going to get into detail about them here, but I have felt its direct impact, so I wish you'd at least respect that even if you disagree with the proposal. AG202 (talk) 00:02, 14 June 2022 (UTC)[reply]

Without looking thoroughly into things, the idea of "Disallowance of certain sources?" sounds very dangerous on its face. Descriptivism means the whole language and everywhere it is used. That doesn't mean "no standards", but it does mean that all sources must be allowed- including Chinese Communist Party mouthpieces, Putin/KGB/FSB, CIA, religious fundamentalists, cults, deviants, and everybody else. --Geographyinitiative (talk) 23:13, 13 June 2022 (UTC)[reply]

@Geographyinitiative For the umpteenth time, please, no source is being disallowed 🙏. I don't know how this conclusion keeps happening, but that's not the goal here. AG202 (talk) 23:19, 13 June 2022 (UTC)[reply]
AG202, I apologize if I have misunderstood anything. But the words "Disallowance of certain sources?" appear above. It causes my spider sense to tingle. I want the Iranian Revolutionary Guard, Juche, Turkmenbashi, the Green Book, whatever else to be allowable if relevant. If these get too extreme or "fringe" then, I would confine them to the Citations page. But use of the word "disallowance" was a mistake if you all mean that "please, no source is being disallowed 🙏". God bless. --Geographyinitiative (talk) 23:57, 13 June 2022 (UTC)[reply]
@Geographyinitiative I was not the one that came up with that header unfortunately, and it was an unfortunate misunderstanding of my original point. I never called for the disallowance of certain sources in this conversation as I had proposed it months back, and it got a rightful pushback. AG202 (talk) 00:00, 14 June 2022 (UTC)[reply]

I have now created a vote at "Wiktionary:Votes/pl-2022-06/Attestation criteria for derogatory terms". — Sgconlaw (talk) 22:06, 13 June 2022 (UTC)[reply]

(Replying to the above discussion) A user has tried to add Daily Stormer quotes to random entries in the past. Without firm, clear rules disallowing links to fringe and extremist sites, this is an issue that will metastasize throughout the wiki. We can implement editorial standards like every other dictionary that aims to balance accuracy, reliability, and accessibility, or we can be a digital bathroom stall for provocateurs to graffiti. "Wiktionary is not censored" is not carte blanche for anything to be included in any entry. The entry for head does not include an image of a severed head. The entry for brown doesn't include an image of feces. We make discretionary choices about what to include in entries all the time. That said, I don't believe we should disallow offensive quotations, only that we shouldn't lend visibility to fringe websites by linking them. Usenet or print books don't raise the same misgivings for me. I'm not opposed to the idea of limiting certain quotations to citations pages. But for me, at least, there can be a degree of academic distance in quoting a book or historical Usenet thread, but not in linking to a live website that exists solely to propagate fringe ideas or theories. WordyAndNerdy (talk) 09:32, 15 June 2022 (UTC)[reply]
@WordyAndNerdy A LOT of the offensive Usenet quotes have been from the past 5 years though (see: Apefrican for an example); I don't feel like those are really historical anymore. And if they are still live (@Equinox can correct me on that), then the same issue arises as linking to fringe websites. We link to the specific Google group with the Usenet group name, so imho this issue still applies to it. AG202 (talk) 16:20, 19 June 2022 (UTC)[reply]
One of the cites at turdler is from 2021, last year. This truly isn't really historical anymore and runs into the same issues with linking to fringe sites. AG202 (talk) 13:23, 20 June 2022 (UTC)[reply]

Mozarabic: what to do when the sources disagree?

Mozarabic is a long-extinct Romance language attested only in a few dozen compositions, where it is written in a rather haphazard way in Arabic or Hebrew script. That, in addition to centuries of copying errors by non-Mozarabophones, makes interpretation of any given text tricky, hence different scholars often arrive at different results.

As an example, we will consider a quote from kharja A1, which is cited on our entry for Mozarabic دلج (dalji) and which is, in fact, the only thing supporting the existence of that entry.

Jones (1988: 33; cited on that page) transcribes the relevant quote as ⟨yā ?nwāmni? dalji⟩, without giving a translation, and with nwāmni indicated as an uncertain reading.

Corriente (1993: 27–28; also cited on that page) transcribes it as ⟨yā nwāmin dalji⟩, which he translates as 'sweet name'. He takes this to phonetically represent a Mozarabic ya NWÉMNE DÓLČE, where the lowercase word is of Arabic origin, and the uppercase words of Latin origin, per Corriente's system. Judging by that and by his translation, we are dealing, etymologically speaking, with Arabic يا () + Latin nōmine and dulcem.

The problem is that, a decade and a half later, Corriente apparently changed his mind. He decided (2009: 120; see here) that the phrase really says ya ndá min tháljE, which he translates as 'oh you who are fresher than the snow', indicating all of the words as being Arabic in origin (and apparently claiming that Romance contributed a single vowel).

Needless to say, that completely undermines our entry for دلج (dalji), as well as the one for نوامن (nwāmni), which depends on the same quote. Incidentally, I have another reason to doubt that a supposed nwāmni (interpreted by Corriente as phonetically representing nwémne) could really derive from Latin nōmine: it shows diphthongization, as if derived from *nŏmine, with a short tonic vowel.

In any case, the larger question here is: what should we do when the aforementioned sources disagree? Perhaps we should only rely on phrases that all three of them agree on (of which there is, fortunately, a decent number).

It would also be helpful to find an additional modern source that transcribes and translates the Mozarabic kharjas. Post-1988, preferably, as there are serious issues with earlier attempts, which I won't get into right now.

Pinging @Santi2222, Ser be etre shi, and @Fay Freak.

- Nicodene (talk) 23:46, 10 June 2022 (UTC)[reply]

@Nicodene IMO you hit the nail on the head. There should be agreement among sources (or at least among a substantial number of them) when it comes to what are essentially reconstructions like this. Benwing2 (talk) 00:30, 11 June 2022 (UTC)[reply]
At least for Arabic I require that a reading has little to doubt, as I for pre-printing (1800) Arabic in accordance with European languages which start around 1500 (English, German, Polish, Russian) or even 1615 and 1650 (French, Dutch) I would only seek one occurrence excluding the presence of a ghost word. Here you don’t know the language and don’t get much meaning into the texts, and interestingly we even have Category:Undetermined lemmas but this is not even for this, maybe those shambles of languages aren’t a matter for a dictionary like this but only for the use in specialist fields. Alas, we can’t stop editors from including Mozarabic references and thus also entries, however it is easy to dismiss entries if one does not even know the rough spelling aimed at nor the language roughly. The benefit from deciphering these poems is really low anyway as we know Latin and copious descendants and Arabic so I judge that you really need to gather something exclusive to bother much about these uncertainties. In other words your time may be too valuable to care about those certain words as the yield won’t likely be of great significance in any case!
This “agreement” thing is misleading since it is usually just one author following another sequentially with no direct conclusion for our purposes. Some people learned working with Meroitic that you can’t trust anyone. Fay Freak (talk) 00:41, 11 June 2022 (UTC)[reply]
Regarding Corriente, I have not even mentioned my suspicion that he did not actually change his opinion that much but let his academic gofers write articles published under his name, understood as a kind of brand. In كرزية the three literature loci starting with “Corriente” have three different etymologies of which only the chronologically second one I could even make sense of, others include corrupted or phantasized citations of Iranian words – the secondary literature is often as bad as the medieval manuscripts in transmission but we savvy to filter the popular stuff. (We could open another can of strange “references” cited on Wiktionary but it is 3:54 AM in Germany.) Fay Freak (talk) 01:54, 11 June 2022 (UTC)[reply]
An initial ameliorative step would be to slam {{LDL}} on the dubious entries. The agreement on allowed single sources should then exist, and should allow for challenges. After all, there is a significant reason to suspect scribal errors at various levels. It then comes down to what we do with uncertain words - do we include them with a warning, which is useful in itself, or do we puristically exclude them because we aren't sure. In this particular case, what do people here make of the actual Arabic script text? Or has no-one here tried to read it?
Actually, using Google Books, I see that Jones gives two examples of the word دلج (dlj). Does this solve this particular matter?
Perhaps it would help to have some mechanism for indicating the reliability of quotations - including the level of fabrication. --RichardW57 (talk) 16:07, 12 June 2022 (UTC)[reply]
I don't know what the standard WT practice is in cases like this, but perhaps having an appendix with the kharjas (and plausible reconstructions) could be a work-around for words like the ones mentioned here. There are entries for other badly attested languages with headers like "Word" and definitions with the phrase "the meaning of this term is uncertain", but given that in Mozarabic the difficulties in interpretation are often at text level (and not word level) I would personally favor an appendix-style solution (in case we want to include dubious terms).--Santi2222 (talk) 18:26, 12 June 2022 (UTC)[reply]

Translingual entries and anagrams

If there is an English word or term whose letters can be re-arranged to make a translingual entry should that be a valid anagram for Wiktionary? I think it should because translingual terms are used in English. Others may disagree. Some contributors may think we shouldn't have anagrams at all. John Cross (talk) 11:26, 11 June 2022 (UTC)[reply]

Yes, just as it should for any other language that the term is used in. I have mixed feelings about Translingual entries, because there is an occasional tendency to assume that 3 (or sometimes even 2) languages using a term in the same way warrants lumping things together there. But the fact is that there are plenty of terms (primarily symbols and proper nouns) which do deserve entry there, and are clearly used in English. Theknightwho (talk) 19:24, 11 June 2022 (UTC)[reply]
I have a hunch that this might lead to a lot of not-very-English clutter in anagram sections, like taxonomic names which we all secretly know are Latin. Equinox 04:14, 12 June 2022 (UTC)[reply]
Like English altbier, betrail, librate, tablier, triable, trilabe → Translingual alberti.  --Lambiam 11:36, 12 June 2022 (UTC)[reply]
Since Translingual terms are, in principle, used in every language, we could have larger anagram sections for every Latin-script language, based on taxonomic names alone. This doesn't seem very productive. DCDuring (talk) 20:49, 12 June 2022 (UTC)[reply]
No, I don't think this is overly useful. Some people view Translingual terms as belonging to any language. I tend to see them as belonging to no language, but being used within a language. What this means practically is that most people would not consider most Translingual terms English, nor would you be able to use them in word games, which is where Anagrams are most useful. If you're playing Scrabble, it's not useful to know that the combination of letters you have can spell Poecilia. Andrew Sheedy (talk) 21:09, 12 June 2022 (UTC)[reply]
As a point of interest, one can find "poecilias" (lower case) in running English text. Many taxonomic names, both current and obsolete, have corresponding English names. DCDuring (talk) 22:28, 12 June 2022 (UTC)[reply]
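For reference, the anagram relation under discussion can be sketched in a few lines of Python. This is only an illustrative sketch, not the code that actually generates Wiktionary's anagram sections; the function names `letter_key` and `are_anagrams` are hypothetical, and the rule assumed here (case-insensitive, letters only) is an assumption rather than site policy. The word list is Lambiam's example from above.

```python
def letter_key(term: str) -> str:
    """Return the term's letters, case-folded and sorted, as an anagram key.

    Assumption: case and non-letter characters are ignored.
    """
    return "".join(sorted(ch for ch in term.casefold() if ch.isalpha()))


def are_anagrams(a: str, b: str) -> bool:
    """Two distinct terms are anagrams if their letter keys match."""
    return a.casefold() != b.casefold() and letter_key(a) == letter_key(b)


# Lambiam's example: English entries versus the Translingual entry "alberti".
english = ["altbier", "betrail", "librate", "tablier", "triable", "trilabe"]
translingual = "alberti"

matches = [word for word in english if are_anagrams(word, translingual)]
print(matches)
```

Under this rule every word in the list matches, which illustrates the concern raised above: any capitalised taxonomic name such as Poecilia would pair with each English entry sharing its letters unless Translingual terms are filtered out first.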

I feel we should adopt a policy with regard to internet quotations and settling. As was stated, there was a clear feeling that we should settle these kinds of issues in RFV. I propose we allow for the ability to create votes within an RFV thread. Vininn126 (talk) 13:25, 11 June 2022 (UTC)[reply]

This is currently also being discussed at WT:RFVE#creeper. @Fytcha @AG202 @WordyAndNerdy This, that and the other (talk) 02:31, 12 June 2022 (UTC)[reply]

Can we settle whether affixes in Arabic-script languages should be lemmatized with or without ـ , e.g. ـی vs. ی?

Currently, it’s a bit in a shambles:

Can we settle this for good? IMO, at least in the case of Persian, it’s better to lemmatize with ـ, since Arabic-script languages use orthographic spaces and hence there’s always an orthographic difference between e.g. Persian چه (če, what), always preceded by a space, and ـچه (-če, diminutive suffix), always written joined. Korean was somewhat recently revised to use hyphens in lemmatization for the same reason.

In addition, some Persian affixes are more commonly written spaced or with zero-width non-joiners (especially in formal writing), e.g. the verbal prefix می (mi). Currently we have no way to tell readers outside a cumbersome Usage Note that e.g. the prefix می (mi-) is usually written spaced or with a ZWNJ while ب (be-) is never spaced, whereas if lemmatization with ـ was consistently implemented, this would be obvious from the very title.

Thoughts?--Tibidibi (talk) 02:26, 12 June 2022 (UTC)[reply]

@Tibidibi Hello! I was told you might be gone for a while due to army service; good to see you back. I think we should include the tatweel character before the suffix or after the prefix if the affix attaches to the main word without a space or ZWNJ (which is always the case for Arabic at least). You enumerated some reasons why this makes sense for Persian, but IMO it should be done for Arabic as well, if for no other reason than that several characters look noticeably different in their independent vs. joined forms, and the tatweel forces the joined form, which visually helps signal that we're dealing with an affix. Benwing2 (talk) 03:18, 12 June 2022 (UTC)[reply]
@Benwing2 Could you code {{af}} so it links to tatwiled forms for Persian, please?--Saranamd (talk) 02:20, 5 July 2022 (UTC)[reply]
@Tibidibi I like what they're doing with Arabic at the moment (compare كَـ (ka-) vs ـكَ (-ka) under ك (k)). It looks like lemmatising with tatweel would make more sense for Persian, but I don't feel the need to do that for Arabic. Sartma (talk) 23:50, 12 June 2022 (UTC)[reply]
@Tibidibi, Sartma: In my opinion it's better to lemmatise affixes in all Arabic based (script) languages without the taṭwīl but use it on the correct side of the word in the headword, if those affixes are spelled together with a corresponding word (no space or ZWNJ), as is the case with the Persian suffix ـچه (-če) (the entry title is at چه). A hyphen in the transliteration should be used for terms written together or with a ZWNJ. So prefix می (mi-) (no taṭwīl but a hyphen in the transliteration), which needs a ZWNJ is also good as it is now. --Anatoli T. (обсудить/вклад) 01:02, 14 June 2022 (UTC)[reply]
@Atitarev This can easily lead to clutter on single-letter entries. Why do we distinguish between Arabic-script languages and Latin (or Cyrillic, etc.) script ones in this regard, when both scripts make use of orthographic spaces?—Tibidibi (talk) 09:20, 14 June 2022 (UTC)[reply]
@Tibidibi: I understand your idea better now. It may work. It might be difficult to engage all editors for all Arabic script-based languages, though. Perhaps, focusing on one, such as Persian? Anatoli T. (обсудить/вклад) 23:12, 14 June 2022 (UTC)[reply]
(Notifying Ariamihr, Dijan, Mazsch, Qehath, ZxxZxxZ):
If there aren’t any responses by this time next week, I will move the relevant entries to the tatwil-ed form.
@Benwing2 If this passes, can you modify the relevant code so that {{af|fa|آزاد|ـی}} links to the tatwil-ed form?—Tibidibi (talk) 09:42, 20 June 2022 (UTC)[reply]
@Tibidibi, Benwing2: How did you go? Any updates? I also think it would be beneficial to demonstrate sample edits (even if not approved, a revision can be used as a demo). --Anatoli T. (обсудить/вклад) 10:11, 7 August 2022 (UTC)[reply]
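As a rough illustration of the convention proposed in this thread (tatweel before a joined suffix, after a joined prefix, and no tatweel for affixes written spaced or with a ZWNJ), here is a minimal Python sketch. It is not the code behind {{af}}; the function `affix_title` and its parameters are hypothetical names introduced only for this example. The only facts relied on are the Unicode code points: tatweel is U+0640 and the zero-width non-joiner is U+200C.

```python
TATWEEL = "\u0640"  # ARABIC TATWEEL (kashida)
ZWNJ = "\u200c"     # ZERO WIDTH NON-JOINER


def affix_title(affix: str, kind: str, joined: bool = True) -> str:
    """Return a hypothetical entry title for an Arabic-script affix.

    kind   -- "prefix" or "suffix"
    joined -- True if the affix is written joined to its base word,
              False if it is normally written spaced or with a ZWNJ.
    """
    bare = affix.strip(TATWEEL + ZWNJ)
    if not joined:
        return bare            # e.g. the Persian verbal prefix mi-
    if kind == "suffix":
        return TATWEEL + bare  # tatweel on the side that joins the base
    if kind == "prefix":
        return bare + TATWEEL
    raise ValueError(f"unknown affix kind: {kind!r}")


# Persian diminutive suffix -če: always written joined.
print(affix_title("\u0686\u0647", "suffix"))
# Persian prefix mi-: usually spaced or with ZWNJ, so no tatweel.
print(affix_title("\u0645\u06cc", "prefix", joined=False))
# Persian prefix be-: never spaced.
print(affix_title("\u0628", "prefix"))
```

In logical (storage) order the tatweel precedes a suffix and follows a prefix, even though right-to-left rendering places it visually on the opposite side.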

Can we standardize morphophonemic/phonemic/phonetic/etc conventions for Middle Korean, Modern Standard Korean and Jeju?

Which level of "underlyingness" counts as phonemic seems to be not so well defined, and I often see "phonemic analysis" of Middle Korean and Modern Korean that just seems to follow Modern Korean morphophonemic orthography rules. I find this problematic, because it only allows underlying forms that are possible to write in hangul:

Hangul 짚다 짚어서 짚는
Analysis /cita/ /ciʌsʌ/ /cinɯn/
Pronunciation [ʨipt˭a] [ʨiʌsʌ] [ʨimnɯn]
Hangul 깁다 기워서 깁는
Analysis /kipta/ /kiwʌsʌ/ /kipnɯn/
Pronunciation [kipt˭a] [kiwʌsʌ] [kimnɯn]

Since they follow the same pattern, it would make more sense to analyze the latter verb as /kiw-/, but to me the fact that it does not means it is very heavily influenced by the orthography, which I think should be avoided.


Current Korean entries also give IPA transcriptions that I find poor. Take the example of 설화: [sʰʌ̹ɾβwa̠]

I don't know who the first person to write intervocalic /hw/ as [β] was, but I keep seeing it and I am tired of it. It seems to be derived from the fact that /hw/ is normally realized as [ɸ], and /h/ normally undergoes voicing intervocalically. The reason /hw/ is fricated is because of the strengthened air stream caused by the /h/, making it easier to fricate in places where it normally would produce an approximant. There's also the fact that initial /w/ is more strongly rounded than in other places, meaning that medial /hw/ will have even less chance of getting fricated. The most common pronunciation in my experience is [sʰʌɾʷa] with the /h/ completely dropped, or [sʰʌɾʱʷa], with the /l/ becoming breathy voiced. There are other problems with it also, like the fact that medial and doubled /l/ is written as [ɭ]. I believe that it is possible, and even recommended, to write apical lateral approximant as [ɭ], even if it is not strictly a true retroflex. However, given how other sounds are given such specific realizations with all the diacritics, I believe it makes more sense to write it as [l] with an apical diacritic under it. That would indeed fix the problem if it weren't for the fact that coda /l/ is also varied greatly even within Seoul Korean. Some people seem to have [ɹ] for final /l/ except before coronals and word finally. It would be misleading to say that specifically [ɭ] is the pronunciation of coda /l/ in Korean. If the purpose of IPA transcription was to help non-Korean speakers pronounce Korean words, then it does a terrible job at it, because anyone who knows IPA but not Korean will see [sʰʌ̹ɾβwa̠] and read it with a consonant cluster followed by a semivowel.


As I mentioned earlier, what actually counts as being phonemic is not well defined and there are different conventions between different linguists. I see multiple ways Korean is analyzed phonemically with different levels. For example:

Hangul 짚다 짚어서 짚는
Option 1 /cipʰta/ /cipʰʌsʌ/ /cipʰnɯn/
Option 2 /cipta/ /cipʰʌsʌ/ /cipnɯn/
Option 3 /cipt˭a/ /cipʰʌsʌ/ /cimnɯn/

Option 1 is I believe better described as "morphophonemic", because then we can make a distinction between it and the other two options, and it seems to be the most common convention. Morphophonemic analysis uses |pipes|, ||double pipes||, or //double slashes//, instead of /slashes/. Option 2 is basically what you get if you tell a Korean to pronounce something syllable by syllable. It is similar to Option 3, sans assimilation etc. Option 3 is like Option 2, but with assimilation etc rules applied. It assumes that what's pronounced the same, are phonemically the same, and it's basically phonetic hangul in IPA. I think Option 2 is the best option because it closely matches how people perceive pronunciation. Of course it also relies on orthography to some degree, although much less than the aforementioned Korean orthography based analysis.


In conclusion, I think we should come up with a standardized and consistent way to transcribe Koreanic words, including Middle Korean and Jeju, using their equivalents of whatever Modern Standard Korean would have. I believe implementing phonemic analysis for Middle Korean would be fairly straightforward, since the orthography is already phonemic, but it might not be a great idea to use IPA since you might be providing extra information of what is a reconstructed pronunciation. If we do use IPA, native transcription and non-native transcription (e.g. 동국정운 pronunciation) should probably not share the same system, and the latter might be better left untranscribed. I also want to propose a pitch accent analysis system for the Busan dialect, which is more toneme oriented than the phonetic approach we have right now, and it could possibly make analyzing pitch accent patterns of verbs easier. Jeannebluemonheo (talk) 12:55, 12 June 2022 (UTC)[reply]

Strong Support, we already implemented the morphophonemic change for Jeju a while back, see: 뜬 쉐가 울 넘나 (tteun swega ul neomna), and it would be amazing to finally have a phonemic transcription somewhere. AG202 (talk) 13:19, 12 June 2022 (UTC)[reply]

French pre/post-1990 spellings

@PUC I'm curious to understand why we seem to prefer pre-1990 French spellings but post-1996 German spellings. As an example, the French verb meaning "to know/to recognize (a person)" is lemmatized under the pre-1990 spelling connaître, and the post-1990 spelling connaitre redirects to it. The article Appendix:French spelling reforms of 1990 just says this:

Some [post-1990 spellings] are now more prevalent than the still correct pre-1990 spellings, but many less. On Wiktionary, French words with revised spellings are usually treated as alternative spellings, while the traditional spelling is the main article.

This doesn't give any explanation as to why Wiktionary prefers pre-1990 spellings. Benwing2 (talk) 23:12, 12 June 2022 (UTC)[reply]

I read that as saying that pre-1990 spellings are still commoner than post-1990 spellings, and to keep things simple, we uniformly standardise on the pre-1990 spelling. --RichardW57 (talk) 23:31, 12 June 2022 (UTC)[reply]

CAT:D pages added by User:Fish bowl

Hi. There are > 100 Talk pages in CAT:D added for speedy deletion by User:Fish bowl. I want to make sure these are correctly added. They are all tagged with either "copyright violation" (because someone asked "please translate the following" along with a quote) or "spam". The ones labeled "spam" in particular I'm not sure about. E.g. in Talk:麺, someone asked for a Cantonese pronunciation, which was answered by someone else, who added the pronunciation. Another example is Talk:㓃, which has a couple of topics, one of which asks whether the character is simplified or traditional, and another asks for clarification of the contexts of the various Mandarin readings. These don't seem obviously like spam to me, and I'm not sure why they're tagged. Benwing2 (talk) 00:57, 13 June 2022 (UTC)[reply]

Agreed. Fish bowl, can you give us an example of a copyright violation and the source that is being violated? (Note that I don't see any edits with an edit summary stating this; see https://en.wiktionary.org/w/index.php?title=Special:Contributions/Fish_bowl&offset=&limit=5000&target=Fish+bowl ) —Justin (koavf)TCM 01:32, 13 June 2022 (UTC)[reply]
I did see something tagged that way and the text was Chinese so I left it alone, as I didn't understand. I wonder if this is perhaps the same user who used to create huge numbers of rather useless talk pages saying "can it be added..."? (We have rfp, rfe, etc. templates for this.) Equinox 03:19, 13 June 2022 (UTC)[reply]
It is, yes. — SURJECTION / T / C / L / 14:47, 13 June 2022 (UTC)[reply]
they're spam. [1]Fish bowl (talk) 19:17, 13 June 2022 (UTC)[reply]
I'm not sure how this link (an edit of yours) supports the claim that someone else's edits are spam. I did note that you added all of these speedy deletion templates after the proposal that they be deleted failed to gain consensus, and that seems like a very bad faith use of the speedy deletion template. - TheDaveRoss 19:22, 13 June 2022 (UTC)[reply]
Every Chinese editor who I've talked to doesn't like this guy. I kind of don't give a fuck anymore about the "keep it 😠" opinions of non-Chinese editors 🤷🤪 —Fish bowl (talk) 20:11, 13 June 2022 (UTC)[reply]
Perhaps best to ignore them rather than create more work for others against consensus. - TheDaveRoss 20:13, 13 June 2022 (UTC)[reply]
I put in futile work answering too many of these in the past. (Did you? Would you like to try?) How hard is it to press "delete" 🤪 —Fish bowl (talk) 20:20, 13 June 2022 (UTC)[reply]
I didn't even mark them all (although I could 😳) This is just a small corner. —Fish bowl (talk) 20:24, 13 June 2022 (UTC)[reply]

Decades

@BD2412 When you go to 1360s, you see "deleted page 1360s Per RfD discussion on Decades". I recently created 1370s and 2160s with cites (1370s has stronger cites). What would the participants of the previous discussion think of a piecemeal creation of decades articles IF they have good cites? Thanks. --Geographyinitiative (talk) 10:20, 13 June 2022 (UTC)[reply]

I don't think this is a good idea. Such numbers are created in an entirely predictable way, so there is no need to have such entries at all, whether or not quotations can be found for them. — Sgconlaw (talk) 11:02, 13 June 2022 (UTC)[reply]
Regarding @Sgconlaw's statement, I am 100% neutral on the issue of whether these decade entries fall within Wiktionary's scope. If you want them, I'll work on them. If you don't want them, I'll delete them. However, I do think that Wiktionary:Criteria_for_inclusion#Numbers,_numerals,_and_ordinals (or somewhere on that page?) should talk about decade entries and reference the relevant discussion (sorry if it's there and I'm not seeing it). The 1370s entry is facially similar to the 1990s article, so there must be something I'm missing. --Geographyinitiative (talk) 11:16, 13 June 2022 (UTC) (modified)[reply]
Category:en:Decades shows extensive coverage of the 18th through 21st centuries, and barely anything else. Anyway, I see no principled reason to treat 1990s differently from how we treat 1370s, except recency bias. 98.170.164.88 15:13, 13 June 2022 (UTC)[reply]
Do you really not see the difference? What if you took it a little further back, say the BCE 278990s? That was a decade that happened (I assume), but since the goal of the project is not to be a calendar but to instead be a dictionary, there is perhaps less value in having "definitions" for highly predictable numeric constructions which are unlikely to be used in any manner other than the most narrow, literal ones. Similar to first being a tremendously useful and often used ordinal, but two-hundred-seventy-eight-thousand-nine-hundred-and-ninetieth being somewhat less so. - TheDaveRoss 15:21, 13 June 2022 (UTC)[reply]
You would not be able to find three independent quotations for BCE 278990s, so the comparison fails. By the way, I said there is not much reason to treat them differently. That doesn't rule out deleting 1770s, 1870s, and 1970s along with 1370s as all being predictable/non-CFI-worthy terms. Their content and usage is entirely analogous. 98.170.164.88 16:11, 13 June 2022 (UTC)[reply]
At this point I am fairly certain that I could find three cites for it on UseNet, it seems like everything which is possible to type has been typed there. But whether or not something is attestable isn't actually relevant to the argument, or the previous discussion. It is very easy to attest "the sky is blue", that isn't a counter-argument to the policy to exclude sum-of-parts terms. - TheDaveRoss 16:27, 13 June 2022 (UTC)[reply]
In all fairness the terms are not SoP, and this issue is dealt with at WT:CFI#Issues to consider. They're also no more formulaic than many other kinds of entry, such as plurals. Theknightwho (talk) 23:29, 13 June 2022 (UTC)[reply]

Here is the RfD discussion on these entries. I would suggest, rather than adding routinely generated entries for decades that are lexicologically unremarkable, we should add content to the ones that are lexicologically remarkable. For the last century, at least, each decade has its own cultural associations — the "roaring" twenties, the 1930s (global depression), 1940s (war and aftermath), 1950s (postwar boom), 1960s (counterculture movement), 1970s (disco and stagflation), 1980s (consumerism), 1990s (grunge vs. synth-pop and post-Cold War), 2000s (war on terror), etc. bd2412 T 16:23, 13 June 2022 (UTC)[reply]

This feels like recency bias, though. It's not like the twentieth century was the only time period to have associated culture or events. By the same token, shouldn't 1776 or 1770s be an entry, since it was the year/decade of the American Revolution? (Even used metaphorically: Alex Jones said "1776 will commence again"; "spirit of the 1770s" has been used.) Should 1492 or 1490s be an entry because of the discovery of the Americas ("spirit of 1492")? etc. We could draw a line in the sand and say that only things from 1900 and on are allowed, but that's pretty arbitrary. Maybe you're saying that for each decade we need to separately determine whether there is some significance beyond just referring to the mere time period. I'm not sure how you'd precisely draw that line, though, so maybe you can expand on that. 98.170.164.88 16:51, 13 June 2022 (UTC)[reply]
@BD2412: yeah, not keen on that idea. To me, all the "decade" years are simply SoP. — Sgconlaw (talk) 18:31, 13 June 2022 (UTC)[reply]

Just to guesstimate what we're talking about here, I'm thinking that if we went "full bore", it would be 100 entries per millennium, so if you get all of 2000 BC-AD 3000 (which would be hard?), that will be maximum 500 entries (depending on citations, which will be harder near each end). Then there will be decades outside that range that are the focus of sci-fi or scientific speculation or the focus of archaeology. Anyway, I doubt the whole collection, if confined to that which can be cited, would exceed 400 entries. Again, I am neutral on the issue. --Geographyinitiative (talk) 20:16, 13 June 2022 (UTC)[reply]

You should take into account that this amount should be multiplied by the number of WDLs we have. Thadh (talk) 22:48, 13 June 2022 (UTC)[reply]
This is another good question. Idk how many languages use the letter "s" here? Variants? Etc? Iran's calendar would have decades of its own: 300-400 pages of that in Farsi then too, I suppose. Again, I am neutral on the issue, but I would find it fun to do cites for these as I ran across them. --Geographyinitiative (talk) 22:55, 13 June 2022 (UTC)[reply]
Based on the cites in 2160s, it does seem SOP. We do seem to accept that several categories of unspaced but formulaic things are SOP, e.g. episode numbers (Talk:S01E01), Latin -que words (Talk:fasque) and Tzotzil -e words (Talk:antse), chemical formulas, and yes, decades (Talk:1700s). So 2160s should probably be deleted per that. A few decades have stronger arguments for inclusion, not because of specific cultural associations per se, but because those associations pull the period referred to as "the XXs" out of the actual period from XXX0 to XXX9. For example, a fair bit has been written about how (in English) the 60s refers to a cultural period from 1963-64 to 1970 and the 90s refers to a cultural period from 1998-99 to the early 2000s ("Blink-182? Didn’t go mainstream until 1999. Shrek? 2001. The Tony Hawk series? Debuted in 1999 and peaked in popularity around 2003. You’ll struggle mightily to find a cultural touchstone of “the 90s” that dates earlier than maybe late 1998"). - -sche (discuss) 21:49, 14 June 2022 (UTC)[reply]

The logic being applied here would justify the deletion of the vast majority of English plural forms. There are a finite number of decades for which this format will see any use, and it’s not that high. Theknightwho (talk) 21:54, 14 June 2022 (UTC)[reply]

So-called "wiki" is secret alt-right hive!!!!

We should probably be preparing some DAMAGE CONTROL... can you imagine what will happen when Twitter, Wired, and Salon find out that we have got 57 variants of the n-word? Especially with the recent IPs who keep adding stupid slurs like Buttswana. Presumably the answer is "well, they are words, and we are volunteers". Right. What are we really going to do? Equinox 13:52, 13 June 2022 (UTC)[reply]

Locking Fay Freak in the shoe cupboard might be a good start. Equinox 13:53, 13 June 2022 (UTC)[reply]
How many variants of the f-word do we have? bd2412 T 17:01, 13 June 2022 (UTC)[reply]
I am probably more right wing than most people on here, but I tell you that some of these words you guys find out there on the intertubes are f'n wild. But I think it really was worthwhile for Wiktionary to document the horrific term "Citations:niggership"- no other dictionary had this evil term, and now we know a little something about its 19th century roots. --Geographyinitiative (talk) 20:27, 13 June 2022 (UTC)[reply]
Collective nostalgie de la boue. – Jberkel 21:09, 13 June 2022 (UTC)[reply]
Meh. As long as we're describing offensive words as offensive, and not giving them undue prominence (e.g. when the "Synonyms" section of Jew was a long list of slurs that was bad), we're a dictionary defining words people have used. Of course, in cases where "words people have used" means "obscure/nonce slurs someone with a few usernames on usenet used in 2001 and 2002", or more recently "4chan op coinages cited via reddit/twitter", we should do better; if we get criticized for falling for some 4chan op invented word, I suppose we deserve it and may it impel us to improve our CFI. If we get criticized for documenting that people have used N-words for a few hundred years, meh. (As to the other point: I do think based on other factors that FF is, like you once suggested Dentonius was, an "entryist", but I'm not sure how many of these entries he was involved in making.) - -sche (discuss) 21:24, 14 June 2022 (UTC)[reply]
@- -sche The issue is surely the number of uses something has, and whether it actually lexicalised as a genuine term. There’s a difference between objecting to nonce words that have been independently coined a few times, and objecting to terms that just happen to be used only on Reddit and Twitter. I’m not sure it’s a good idea to conflate the two, particularly when I’m pretty sure you mean 👌, which has seen pretty widespread use. Theknightwho (talk) 21:38, 14 June 2022 (UTC)[reply]
Oh, no, I'm thinking of the various pukeskin, cumskin, etc type rare/nonce insults, and (as far as "coordinated"/"op" stuff) things like clovergender. I don't know whether the OK emoji is attestable, but I agree the gesture is a genuine white-supremacist signal, flashed by Stephen Miller in the White House etc (as you said at RFV, the "hoax" there is not "the gesture is white supremacist" but rather "the gesture isn't white-supremacist, it's just a joke somehow!"). - -sche (discuss) 22:04, 14 June 2022 (UTC)[reply]
Thanks - sorry for being a little prickly. It’s something about that term in particular. Equinox did mention the idea of having a post-ironic label, which I think would make sense for a bunch of these 4chan coinages (among others). The blurring of humour and sincerity is 4chan’s MO, after all. Theknightwho (talk) 22:19, 14 June 2022 (UTC)[reply]
Post-post-post-whatever is not meaningful. Whether we are serious or joking when we call someone an "XYZ", the word still has its meaning. The sarcasm is something beyond a dictionary. If I get on a video game server and call someone (sorry, I don't play these games, so I dunno) an "epic winner", and I mean they are actually shit, that isn't a new sense of "winner", that's just me taking the piss. There might be a very, very few words that are mostly used sarcastically, and not used honestly, but I'm not sure. That would be a usage note. Equinox 04:18, 17 June 2022 (UTC)[reply]
You all are very rude. It could not take long for the editorship to miss me, for the quality edits I create myself as well as attract from new editors, often from or for non-Western countries, who feel encouraged. That this project has reached higher agreement and refinement in presentation matters without becoming an echo chamber is also the work of my memorious distinctions, presented on many an occasion of possible controversy.
Alt-rights are still strawmen, whom we have barely encountered and whose agenda would barely withstand the hard reality of lexicography. Have I mentioned that extremist groups feel enticed by attempts to exclude them rather than assimilate them? We make all as boring as possible for them as well as for so-called vandals, whatever the distinction may be, and thus they are stripped of their essentials. Fay Freak (talk) 10:32, 16 June 2022 (UTC)[reply]
Every time I think "THE WHITE MAN IS THE REAL OPPRESSION VICTIM" I just check out your posts and I feel okay again. Here's a beer. Equinox 04:19, 17 June 2022 (UTC)[reply]

A content-neutral way to look at the problem with offensive terms

There are a number of ways that nonsense can enter the mainstream, but offensive nonsense gets eliminated fairly quickly. Wiktionary, on the other hand, only eliminates nonsense via processes that take time. That creates an incentive for people to add offensive nonsense here: it may get deleted, but fringe interests get a period of mainstream exposure that they wouldn't get elsewhere.

Our approach should be to neutralize this incentive by removing the conditions that create it. Chuck Entz (talk) 15:09, 13 June 2022 (UTC)[reply]

@Chuck Entz This is exactly what I've been trying to do with my proposals above. Whether we like it or not, we do have an impact. I'm getting frustrated with being told by the same folks that we don't have an impact here, even though I've seen it with my own eyes (and all it takes is a few news articles about this project). I would like to propose once again that offensive terms should require citation on at least two to three sources (websites, books, etc.). This would preserve the majority of offensive terms that we have while limiting the number of nonce terms that would've never seen the light of day otherwise. (CC: @Equinox since this is a similar issue to your point above) AG202 (talk) 15:29, 13 June 2022 (UTC)[reply]
One thing we could do is just flip the attestation requirement to be "up front" for all terms. The benefit there is that everything comes with citations providing evidence of its actual use, the downside is it is a huge barrier to creating new entries, especially for people who are less familiar with the practices and policies here. Anyone who saw any entry or definition about which they were skeptical could delete it and immediately create an RFV asking for evidence of usage. If someone added red and claimed it was a color, my guess is that nobody would feel compelled to delete/RFV it, but if someone added red and claimed it was yet another neo-Nazi word for Jew, well I can imagine lots of people would question the veracity of that definition and ask for further research. I don't think it is a great injustice, or a disservice to freedom-loving Wiktionary mirrors to put a short wait pending verification on less used or fringe terms. - TheDaveRoss 15:31, 13 June 2022 (UTC)[reply]
@TheDaveRoss For all terms? While this is more neutral, I feel that this would not be implementable nor practical unfortunately. The majority of the entries I create do have citations (ex: Jeju ᄒᆞ다 (hawda), Yorùbá ọ̀kan, and English yassification) as I try to focus on quality (and have experience dealing with some ... choice RFVs), so I'm very well-versed in the tasking process of finding and adding quotes. And so, I wouldn't feel comfortable putting that burden on new users, even though I would prefer that there were more citations on the website. It's just too much work and effort for often little return as I've found. AG202 (talk) 15:46, 13 June 2022 (UTC)[reply]
What it would functionally do is give the people with the ability to delete entries more power to do so at their discretion, pending a completed verification process. Since 98% of entries are non-controversial, it wouldn't change anything with those (since hardly anyone would be inclined to delete such entries). Some other number of entries are already deleted on sight, nothing changes there either. The difference would be in the small number of entries which are currently added and then immediately sent to RFV or RFD: it would now be allowable to delete those while sending them to RFV or RFD and then restore them pending successful outcomes. Very little would actually change except that dubious entries would have to wait slightly longer to be visible to the masses. - TheDaveRoss 16:13, 13 June 2022 (UTC)[reply]
I think I'm on board with this. I am certainly the guy Twitter hates, who thinks that "I am offended" is frequently weaponised, but there's no doubt we keep getting a lot of crappy unsubstantiated slurs lately and if the rule is just "you have to cite it if it looks rude" then yeah, um, I could go with that. I'm rather sick of these entries. Equinox 16:14, 13 June 2022 (UTC)[reply]
Within existing rules we can exercise discretion on offensive entries as follows:
  1. Speedily delete poorly formatted offensive entries and poorly worded offensive definitions
  2. RfV offensive definitions as soon as they are seen
  3. Withhold citation effort for offensive entries and definitions
  4. Promptly delete after 30 days
Why is this not sufficient? DCDuring (talk) 16:34, 13 June 2022 (UTC)[reply]
I think the biggest gain is that, in the case of trolls and other bad-faith adders, there isn't the validation of the terms sticking around. Also we wouldn't be propagating them out into the internet at large via the many mirror sites which just copy Wiktionary data directly and present it, sometimes without any context or caveat. - TheDaveRoss 18:17, 13 June 2022 (UTC)[reply]
What about putting offensive entries/definitions "on hold" with respect to inclusion in dumps or whatever APIs the mirror sites use. We could go further and suspend their visibility until they pass. I realize I am talking through my hat about the technical possibilities, but WP holds certain contributions in suspense until review. I am also amazed what kind of things are technically possible. DCDuring (talk) 18:54, 13 June 2022 (UTC)[reply]
That solution would be more difficult and solve half of the problem, not sure why it would be preferred. We don't have much control over what content of ours people choose to take, and without making actual software changes the best control we have is deletion. - TheDaveRoss 18:59, 13 June 2022 (UTC)[reply]
I assume that they take it all from dumps, probably not from the much larger diff files. Content excluded from dumps would probably not be on most mirrors, if my assumption is correct. That would be the objective. I just don't think that arbitrary Ptolemaic epicycles to our rules are better than mostly automated solutions. DCDuring (talk) 20:48, 13 June 2022 (UTC)[reply]
I still think the best solution would be to simply ban new users and IPs from adding offensive terms in the first place. I guarantee it'd filter out all the nonsense we've been getting lately. We already block those users from editing pages for offensive terms, so it seems odd that we'd allow them to add new ones. Binarystep (talk) 06:32, 16 June 2022 (UTC)[reply]
This seems like a good idea if it could be done, but how could it be done? (Manually block any new user who adds offensive terms?) - -sche (discuss) 10:27, 16 June 2022 (UTC)[reply]
That'd probably be the easiest way, though I wonder if some sort of filter would be feasible. Binarystep (talk) 10:33, 16 June 2022 (UTC)[reply]
The main hurdle I see to formalizing an "attestation up-front" requirement for offensive terms as an official rule is...the process of formalizing it; I can see it getting derailed in discussions of what is offensive; nonetheless, I'd support it. Any rule can be rules-lawyered or gamed; well-moderated sites' mods have some discretion (to block or delete things for violating the "spirit" of rules even if not the "letter", and to interpret things like "offensive"). We have some discretion inasmuch as "Creative invention or protologism" is a stock deletion rationale, and we could use that more often. I also like the idea that when we delete some offensive protologism (whether under any new rule or as a "Creative invention or protologism" now), if someone challenges that, the RFV can proceed for its usual month while the entry stays deleted until it's actually cited/RFV-passed; there's no reason an entry needs to be live for the RFV process to operate (as long as the definition to be cited is copied over to the RFV thread). - -sche (discuss) 10:27, 16 June 2022 (UTC)[reply]
I prefer putting offensive definitions on hold pending passing RfV rather than devising other special rules. It wouldn't be bad to do so for non-offensive definitions, so the overenthusiastic application of the 'offensive' label would do little harm. I'd also favor suspending existing uncited offensive definitions if some number of contributors greater than three agreed. DCDuring (talk) 12:52, 16 June 2022 (UTC)[reply]
You can talk all month about this, but what you really want to do is ban IPs. Equinox 04:20, 17 June 2022 (UTC)[reply]
Hey, hold on, are you talking about me?  :-)
—DIV (1.145.44.125 06:27, 29 June 2022 (UTC))[reply]
P.S. You could also just be me, or SemperBlotto, and block people who look like bad faith, instead of wringing your hands and letting them create 80 entries which we then spend the next entire year putting solicitously through RFV. It worked in the old days. Ask Blotto what he thinks. Best wishes, Equinox 04:28, 17 June 2022 (UTC)[reply]
If you don't like it, you're gonna be terrified when I finally die and you wonder why there is a huge sudden influx of bullshit you have to deal with. Is there a statistician in the house? Equinox 04:29, 17 June 2022 (UTC)[reply]
Actuarially speaking we can expect something like 4 million more edits out of you, get cracking. - TheDaveRoss 17:39, 17 June 2022 (UTC)[reply]

Shitgibbons

In the spirit of the season, I wanted to bring up the topic of shitgibbons. This is the name that was coined a few years ago for those tiresome insults like cockwomble, jizztrumpet, cuntwaffle and wankpuffin that get used as faux-Britishisms by fans of Benedict Cucumber Sandwich, but also covers such delightful words as fucknugget, shitlicker, turd burglar and so on.

Naturally, documenting this is of the highest priority, but unfortunately the evidence for the term is a bit scant. There are some blog posts that do seem to be by genuine linguists [2][3][4] and an opinion piece, as well as a few other bits and pieces scattered around the web, but it would be good to know if there's something a bit more concrete.

So I guess my question is (a) whether anyone knows a more formal term, and (b) whether this is a phenomenon that crops up in other languages, because I genuinely do think it's deserving of a category due to its usefulness in etymologies. At the moment the entries just say things like wank +‎ puffin, which is utterly useless to anyone who wants to know where the word came from, and says nothing of the wider linguistic phenomenon that it developed out of. Theknightwho (talk) 01:58, 14 June 2022 (UTC)[reply]

Tessier & Becker (2018). 98.170.164.88 02:31, 14 June 2022 (UTC)[reply]
Perfect. Thank you. Theknightwho (talk) 02:34, 14 June 2022 (UTC)[reply]
I salute the pair of pissdrinkers who wrote it. Nicodene (talk) 22:40, 18 June 2022 (UTC)[reply]
I think it's worth making a category for. Binarystep (talk) 03:05, 14 June 2022 (UTC)[reply]
I think "3-syllable words" should be a parent category. 98.170.164.88 04:41, 14 June 2022 (UTC)[reply]
If I'm understanding this correctly, the following entries belong in the category, in addition to the ones mentioned above and the ones recently created by WordyAndNerdy:
Not an exhaustive list, but I went through Category:English vulgarities and these were the ones that stood out.
I'm not sure if we count words that have the syllable structure but where there is a meaningful interpretation (e.g. "assmuncher"), so I excluded them in favor of entries where the second component was obviously there for prosody. Other terms I was uncertain about since they don't sound quite like the pattern to me: douche canoe and fuck-knuckle. 98.170.164.88 05:34, 14 June 2022 (UTC)[reply]
From Category:English derogatory terms: assmonkey, cockweasel, dickweasel, shitpuddle, twatwaffle. Possibly (but not vulgar): nutburger, scumbucket, sleazebucket, slimebucket. 98.170.164.88 06:06, 14 June 2022 (UTC)[reply]
It definitely would include assmuncher - turd burglar is similar in that it's also not just total nonsense. Thanks for these - I'll get them added. Even the seemingly milder ones still follow the pattern (e.g. nut is being used to mean "crazy person"). Theknightwho (talk) 11:33, 14 June 2022 (UTC)[reply]
I second this in that I don't think that just any 3-syllable vulgarity that is composed of a 1-syllable and a 2-syllable word is a shitgibbon, which seems to be @Theknightwho's working definition: dick muncher, jizz bucket, nutsucker, dicksucker, dickrider, even piss drinker. My personal opinion is that the second part must not be semantically relevant for it to be a shitgibbon. Pinging @-sche, DCDuring, Benwing2. — Fytcha T | L | C 03:00, 15 June 2022 (UTC)[reply]
No, that isn't my working definition. There are specific requirements:
  • It must be an insult.
  • The stress must be antibacchius.
  • The first word must be an expletive.
  • The second word must be a noun.
Theknightwho (talk) 03:10, 15 June 2022 (UTC)[reply]
Note that none of the examples in the original blog post had meaningful second parts as in "dicksucker". And I would tend to agree that "dicksucker", "asskisser", etc. is a different phenomenon than what is going on with "cuntwaffle". 98.170.164.88 03:17, 15 June 2022 (UTC)[reply]
That is true. There is definitely a pattern going on with words consisting of a 1 syllable noun + a 2 syllable agent noun, with an antibacchius stress pattern. In an overwhelming number of cases they're used as insults, and almost all use an expletive or pejorative for the first word. If not shitgibbons, they're very closely related. They're overwhelmingly derogatory, too, which does not apply to true shitgibbons which use innocuous nouns. Theknightwho (talk) 03:36, 15 June 2022 (UTC)[reply]
@Theknightwho: Thank you for changing these words back again; I wholly agree with the contents of Category:English shitgibbons now. — Fytcha T | L | C 10:15, 15 June 2022 (UTC)[reply]
Theknightwho keeps edit-warring at shitgibbon to restore their poorly-formatted version of the etymology section, including a confusing, unhelpful circular explanation of the term's origin. Someone ought to tell them to knock it off. I thought working on "shitgibbon" entries would be a fun distraction from the heavier CFI matters that have arisen lately, but I'm checking out again before this drives me up the wall. WordyAndNerdy (talk) 04:23, 16 June 2022 (UTC)[reply]
To give background to this, the issue is with the etymology beginning "Shitgibbon of shit + gibbon." However, I've already explained at Talk:shitgibbon why this makes sense in the context of the rest of the etymology section. The word "shitgibbon" does not define what it means for a word to be a shitgibbon, and it didn't inspire all of the others. The causal relationship is the other way around: it was coined by linguists precisely because it came about as a shitgibbon in the first place. There's also no issue with separating the senses with bulletpoints, either. We do that in other entries as well. Theknightwho (talk) 04:36, 16 June 2022 (UTC)[reply]
There was a time where I thought Wonderfool had got a girlfriend, bc a number of these stupid words got girl audio in short order. Personally I used to like to do a bit of audio now and then, but I realised that audio is something where AI will beat us in a few years (we can generate perfect speech from the IPA). As humans, we should spend our effort on doing things that only humans can do: writing convincing definitions based on our knowledge of language and novels is good. Although some Harry Potter fuck-wit will ruin them in six months. Equinox 04:24, 17 June 2022 (UTC)[reply]
WF hasn't ever got their other half to record audio. Zumbacool (talk) 09:21, 23 June 2022 (UTC)[reply]
Oh really? Then what is this? — Fytcha T | L | C 00:14, 5 July 2022 (UTC)[reply]
Lol, she's awesome Zumbacool (talk) 12:46, 5 July 2022 (UTC)[reply]

How to follow up on a move request

From what I understand, only admins can move pages. In April, I added a request to Wiktionary:Requests for moves, mergers and splits, which led to no discussion or other followup: Telchinis → Telchines and Hyadis → Hyades. These pages are currently located at non-lemma forms of the nouns, so moving them shouldn't be controversial ... the only question might be whether the singular Hyas or plural Hyades is better (both the cited dictionaries, L&S and Gaffiot, have the main entry at the nominative plural form). Is the best thing for me to do next making a Beer Parlour topic like this to get an admin's attention? Or just wait? Unlike with RFV, I don't understand what timeline to expect. Urszag (talk) 04:38, 14 June 2022 (UTC)[reply]

@Urszag Non-admins can move pages as well. Normally a redirect gets left behind, but if you have the right privileges (maybe it's the "mover" privilege?), you can disable that and not leave a redirect. I don't see why you shouldn't get that privilege; unless someone objects in the next couple of days, I'll give it to you. Benwing2 (talk) 07:54, 14 June 2022 (UTC)[reply]
Discussion moved to Wiktionary:Requests_for_moves,_mergers_and_splits#Renaming_Category:Disputed_terms_by_language_to_Category:Proscribed_terms_by_language.

Make Template:euphemistic spelling of categorize into a different category than Category:Euphemisms by language

I feel like there is an important difference between bullsh*t and at peace: the former is an attempt by the writer to adhere to the societal rule of not using profane language in certain settings and/or to distance themself from the profane nature of the underlying word while still conveying the exact profane word, whereas the latter is intended to evoke a different, milder feeling in the recipient than a non-euphemistic synonym would. I suggest renaming Template:euphemistic spelling of to Template:censored spelling of and making it categorize to Category:Censored spellings by language but I'm not married to this exact naming scheme; I'd be pleased with any scheme that clearly differentiates between the two. — Fytcha T | L | C 20:12, 14 June 2022 (UTC)[reply]

Good point. I agree these should be differentiated; at a minimum, looking at how {{rare spelling of|en|foobar}} categorizes into Category:English rare forms, I would've expected {{euphemistic spelling of}} to produce "Category:English euphemistic forms" (etc. m.m. for other languages). But "euphemistic" doesn't really seem like an intuitive way of describing bullsh*t, so I agree that something like "censored spelling" or "redacted spelling" (if "censored" is a loaded word) seems better. - -sche (discuss) 20:47, 14 June 2022 (UTC)[reply]
@-sche, Fytcha I agree with this, but should we use 'censored' or 'redacted'? Also there are some entries like fuggheaded and forkhead that are less obviously "censored" or "redacted". Benwing2 (talk) 00:05, 19 June 2022 (UTC)[reply]
@Benwing2: forkhead is much better categorized as a minced oath, no? Unless the claim is that it has the same pronunciation as fuckhead which I find doubtful. fuggheaded is tougher; either put it in a third category or bite the bullet and put it in the censored/redacted one. — Fytcha T | L | C 00:13, 19 June 2022 (UTC)[reply]
@Fytcha I think fuggheaded is more a euphemism than a censored/redacted spelling. Benwing2 (talk) 03:49, 21 June 2022 (UTC)[reply]
I would expect fugghead(ed) to also have a different pronunciation, so I feel like it could also be described as a minced oath, or else as a euphemism for... (creating a template {{euphemism for}}), rather than as just a euphemistic spelling of.... What do you think of that? Then putting it directly into the "Euphemisms" category would be more coherent. (Of course, we could even decide to keep a "Template:euphemistic form of" / "Category:Euphemistic forms", as well as "Category:Euphemisms" and "Category:Censored forms"/"Redacted forms" (not sure which is better; trying to judge their relative commonness in Google Books is of little help), although how well people would maintain distinctions between all three is perhaps questionable.) - -sche (discuss) 17:16, 21 June 2022 (UTC)[reply]
@-sche, Benwing2: To settle censored vs. redacted, I am slightly in favor of censored because it has more cognates which makes it in expectation slightly less confusing for ESL users. I propose the following scheme:
If we do this and get rid of {{euphemistic spelling of}} then I am actually quite confident that people will be using the intended templates. Tell me if you like this scheme or how you would like to have it changed. — Fytcha T | L | C 19:18, 21 June 2022 (UTC)[reply]
@Fytcha Sounds good to me. Benwing2 (talk) 02:54, 22 June 2022 (UTC)[reply]
Yeah, sounds reasonable. I suppose it probably does make more sense to treat minced oaths via a "form-of" template rather than just defining them as their base word with a "minced oath" label. - -sche (discuss) 23:27, 23 June 2022 (UTC)[reply]
@Benwing2, -sche: Thanks for confirming. I've implemented the first of the bullet points now.
What do you think about forms like p1ll and @$$? There's a whole range of these anti-filter spellings, most of them don't exist on Wiktionary but seem to be easily citable off of Usenet (think f@ggot and the likes). In my opinion, they should be Category:English leet and/or Category:English filter-avoidance spellings. Do you think creating another form-of template and category is justified? Otherwise I'll change them to {{lb|en|Internet slang|leet}} {{alternative spelling of|en|}}. Thanks for the help in this thus far. — Fytcha T | L | C 01:41, 24 June 2022 (UTC)[reply]
@Fytcha Depends on how many such spellings exist and/or we think will exist in Wiktionary. I suspect not too many, so it's not clear it's worth it to create another category. Benwing2 (talk) 06:26, 24 June 2022 (UTC)[reply]
@Fytcha I got rid of all the remaining terms using {{euphemistic spelling of}} except for fuq and fuk; not sure how to classify them. If you can figure this out, we can get rid of this template. Benwing2 (talk) 00:08, 26 June 2022 (UTC)[reply]
@Benwing2: Tough. I've just changed them to {{deliberate misspelling of}} for lack of a better alternative. — Fytcha T | L | C 00:16, 26 June 2022 (UTC)[reply]
Just chiming in to point out that there are a lot of spellings like this, even though most of them aren't on Wiktionary yet. Pretty much every term related to sex, drugs, or advertising has a handful of misspellings intended to trick spam filters. Binarystep (talk) 06:21, 26 June 2022 (UTC)[reply]
I'd support creating Category:English filter-avoidance spellings. Binarystep (talk) 06:18, 26 June 2022 (UTC)[reply]

🏁 as a translation of chequered flag

Do we want these? I've seen some back and forth on these in another article that I can't remember. Either way, it'd probably be helpful to write down the community consensus in WT:TRANS. Pinging @Equinox, Fay Freak because I think you've been involved in something like this in the past. — Fytcha T | L | C 12:51, 16 June 2022 (UTC)[reply]

I don’t remember taking a stance on them, for I ignore this by reason that it is easy and the harm from these entries is predictably low and they will add them anyway, and it may be an argument that people seek out emojis as translations when remembering their names (although stupidly since this is the kind of thing you should and can search offline much easier with many a setup), probably it is even SEO, so I would rather just do nothing. Fay Freak (talk) 14:03, 16 June 2022 (UTC)[reply]
A translation into what? What language is emoji? Is it the same language as memes or is that a different language? What about animal noises? Facial expressions? Is there an ISO code for Extremely Modern Old Egyptian? - TheDaveRoss 15:31, 16 June 2022 (UTC)[reply]
I agree - it's not a translation. It might be possible to substitute it in various contexts, but that doesn't mean it's being translated into "Translingual". That doesn't make any sense. Theknightwho (talk) 15:33, 16 June 2022 (UTC)[reply]
Previous discussion: Wiktionary:Information desk/2021/June § Translingual translations to emoji. J3133 (talk) 15:40, 16 June 2022 (UTC)[reply]
Definitely not. It's not a translation. Simply ask any professional translator. Equinox 03:36, 17 June 2022 (UTC)[reply]
We also need to punish and lock away the people who regularly add the fuckin ice-cream emoji as a "translation" of the word ice cream. Ummmm yeah what language is that, and what grammar does it use? "Picture of X" ain't the same as "translation of X". Equinox 05:54, 17 June 2022 (UTC)[reply]
Counterpoint: This is useful cross-referencing between entries that's worth including, in much the same way as it is useful to link em dash to —, cotangent to cot, and cat to Felis catus. The problem is that calling it a "Translingual translation" makes us look stupid. A slightly less dumb way to handle this would be to put a more structured list at the beginning of the list of translations:
But that doesn't deal with the fact that we'd still be labelling them "translations". Even smarter would be to put them in a separate section of the entry ("See also" perhaps? Anyone worked out what that's for yet?) This, that and the other (talk) 06:56, 17 June 2022 (UTC)[reply]
“See also” seems a good place. — Sgconlaw (talk) 12:03, 17 June 2022 (UTC)[reply]
I was absolutely gonna say the same thing as Sgconlaw: I believe it's useful to link the entries, but we shouldn't pretend they are synonyms, or translations. "See also" is fine. Equinox 12:17, 17 June 2022 (UTC)[reply]
I did it at chequered flag. This, that and the other (talk) 12:27, 17 June 2022 (UTC)[reply]
@This, that and the other: By the way: "anyone worked out what see also is for?" I think that's exactly what it is: stuff that we know is related, or worth linking, but isn't explicitly a hypernym, hyponym, synonym, meronym, example, or whatever you can make up! There may be a far-future time where we literally connect everything semantically (uh, once we get rid of IPs, and Wonderfool), but this is a good start for now. Equinox 12:20, 17 June 2022 (UTC)[reply]
That's pretty much how I see "See also". (P.S. All "Wiktionary:Entry layout" says about it is this: "The “See also” section is used to link to entries and/or other pages on Wiktionary, including appendices and categories. Don’t use this section to link to external sites such as Wikipedia or other encyclopedias and dictionaries.") — Sgconlaw (talk) 12:52, 17 June 2022 (UTC)[reply]
We should ban Translingual translations with an abuse filter... This, that and the other (talk) 12:28, 17 June 2022 (UTC)[reply]
It would be better to change the translation adder first. Chuck Entz (talk) 20:33, 17 June 2022 (UTC)[reply]
Make me an interface-admin and I'll gladly take care of it 😉 This, that and the other (talk) 08:29, 18 June 2022 (UTC)[reply]
[5]Fytcha T | L | C 12:31, 17 June 2022 (UTC)[reply]
Yeah, "See also" seems like a better place to put these than "Translations". - -sche (discuss) 20:23, 17 June 2022 (UTC)[reply]
Okay I've got CAT:Translingual translations almost empty. Hopefully someone here can figure out what to do with the last four entries. This, that and the other (talk) 14:18, 18 June 2022 (UTC)[reply]

I tried to import w:User:Abcormal/List of numbers in various languages to Appendix:Numerals in various languages, but an edit filter prevented me from importing it in its entirety, since I got an automated warning message saying that "this action has been identified as harmful." This list was deleted in an English Wikipedia AfD (Articles for Deletion) discussion. Although it may be beyond the scope of Wikipedia, this looks like it would be a very useful Wiktionary appendix. Could anyone help me import it? Many of the Wikipedia templates in there will also need to be removed, converted, and/or imported. Suomitaiga (talk) 22:44, 17 June 2022 (UTC)[reply]

@Suomitaiga I could speculate about what exactly set the abuse filter off, but what I see in the abuse filter logs wouldn't have worked, anyway.
  1. To start with, all the wikilinks are wrong: this is a different project, so you would have to add "w:" to the beginning of all the wikilinks. Either that, or use the {{w}} template instead.
  2. As you mentioned, most of the templates are wrong.
    1. We don't have things like "short description" or "cleanup lang"
    2. Wikipedia uses different templates for different languages and scripts. We use basically two:
      1. {{l|[language code]|[term to link to]|[display form]|[gloss]}} the gloss parameter can be replaced with |t=, and for cases where the correct transliteration isn't automatically provided, use |tr=
      2. {{m}}, which is identical except it displays in italics
      See WT:LOL for the list of language codes we use, and WT:LANGTREAT for explanations. Please note that you have to provide a language code.
    3. We don't have a "note" template
    4. There are differences in the templates that are shared between projects, though I couldn't give you a comprehensive rundown on them.
    5. I'm sure many of the cited references have their own dedicated reference templates. See Category:Reference templates by language
  3. This is a dictionary, so we have entries for almost all of the words. Please wikilink them or use the {{l}} or {{m}} templates I mentioned above. I see HTML markup within words for some languages, which we don't use. That complicates things.
Aside from that, I would recommend putting in the section headers and empty tables first, then adding content to one part at a time. The content is going to need to be reworked anyway, with the type of reworking depending on the language, so even without the abuse filters it's a good idea, anyway.
I should mention that the added templates will also add system overhead, so it may be a good idea to split the appendix into sub-pages. Chuck Entz (talk) 01:49, 18 June 2022 (UTC)[reply]
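
The two mechanical conversions described in the list above (prefixing wikilinks with "w:" and wrapping terms in {{l}} or {{m}}) could be sketched roughly as follows. This is only an illustrative Python sketch, not an existing Wiktionary tool: the function names to_interwiki and link_template are hypothetical, and the regular expression deliberately skips any link that already contains a colon (such as [[w:...]] or [[Category:...]]).

```python
import re

# Plain wikilinks: [[Target]] or [[Target|label]].
# Targets containing ":" (already prefixed, categories, files) are left alone.
WIKILINK = re.compile(r"\[\[([^\[\]|:]+)(\|[^\[\]]*)?\]\]")


def to_interwiki(text):
    """Prefix plain wikilinks with 'w:' so they point at Wikipedia."""
    return WIKILINK.sub(
        lambda m: f"[[w:{m.group(1)}{m.group(2) or ''}]]", text
    )


def link_template(lang, term, gloss=None, tr=None, mention=False):
    """Build an {{l}} call (or {{m}} if mention=True).

    lang is a Wiktionary language code (see WT:LOL); gloss goes in |t=
    and a manual transliteration in |tr=, as described in the list above.
    Hypothetical helper: it does not escape '|' or '=' inside the values.
    """
    parts = ["m" if mention else "l", lang, term]
    if gloss:
        parts.append(f"t={gloss}")
    if tr:
        parts.append(f"tr={tr}")
    return "{{" + "|".join(parts) + "}}"


print(to_interwiki("See [[Basque language|Basque]] and [[Latin]]."))
print(link_template("la", "unus", gloss="one"))
```

A pass like this would only cover the easy cases; the language-specific Wikipedia templates and the HTML markup inside words would still need reworking by hand.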
I fished the headers and the smallest of the tables out of the abuse logs as a start. We seem to have a different orthography for the languages in question, so they're all redlinks except for pages with sections only for unrelated languages.
Although most languages will yield much better results, this looks like a massive undertaking / time-sink in order to do it even half right. Chuck Entz (talk) 03:37, 18 June 2022 (UTC)[reply]
I feel like these would be better on their own pages, separated out by families with a central page linking to them. Also some corrections will need to be made for sure, but I do like the idea. AG202 (talk) 13:09, 21 June 2022 (UTC)[reply]
Yes, especially scripts should sometimes be fixed. It seems the lists were compiled based on one source per language... Thadh (talk) 13:24, 21 June 2022 (UTC)[reply]
I don't think referencing a Wikipedia user page history provides permanent attribution. You're going to have to edit a copy of the Wikipedia history list to make an attribution record for contributions collected on Wikipedia. The talk page of the Wikipedia original doesn't contain an invocation of Wikipedia Template:copied, and I would have no confidence that it would be honoured even if it did. User pages not contributing to the Wikimedia project risk summary removal. --RichardW57m (talk) 16:59, 21 June 2022 (UTC)[reply]

Sicilian phonemic transcriptions

(Notifying Inqvisitor, Scorpios90, Afc0703, A. T. Galenitis, SignorNic, 151.82.148.85, 151.18.206.223):

Many of our Sicilian entries have phonemic transcriptions that aren't actually phonemic. Some examples are:

/ɐ̠l.fɐ̠bˈbɛː.tu̞/ = alfabbetu
/çɪɾɪ(ɨ)ˈv(ʲ)ɛɖːu/ = ciriveddu
/ˌnku.nʊk.ˈkja.ɾɪ/ = ncununcchiari
/ˈcjum.mʊ/ = chiummu
/ˈpa.ʃɪ/ = paci
/sʊˈʃːaɾɪ/ = susciari
/ˈvɛːk.cju̞/ = vecchiu

These are really just phonetic transcriptions of varying accuracy, as they indicate various allophonic phenomena, such as stressed-open-syllable lengthening (ˈɛː), unstressed vowel reduction (ɪ, ɨ, ʊ), and synchronic palatalization (vʲɛ, cj).

Phonemic transcriptions are only meant to contain distinctive elements or things that are not 'predictable' in a language. The phonemic structure of the above words, for instance, would be something like:

/alfabˈbɛtu/
/tʃiriˈvɛɖɖu/
/nkunukˈkjari/
/ˈkjummu/
/ˈpatʃi/
/suˈʃari/
/ˈvɛkkju/

The accompanying phonetic transcriptions can include as many details as one likes. For the last five words, we would have perhaps something like:

[ˌŋku.nʊc.ˈcjäː.ɾɪ]
[ˈcjum.mʊ]
[ˈpäː.ʃɪ]
[sʊʃ.ˈʃäː.ɾɪ]
[ˈvɛc.cjʊ]

The above is only meant as an example; the important thing is to agree on some way or other of doing things. Ideally this would involve finding a source or two that describes Sicilian phonetics in detail.

Perhaps we could design an automated pronunciation module for Sicilian, such as the one that @Benwing2 has made for Italian, and then have a bot clean up the 1001 Sicilian entries with manually written pronunciations. Nicodene (talk) 01:19, 18 June 2022 (UTC)[reply]
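
As a first step toward the bot cleanup suggested above, one could flag entries whose slashed transcriptions contain symbols that the examples in this thread treat as allophonic. The following is a minimal Python sketch under stated assumptions: the ALLOPHONIC table is drawn only from the phenomena named above (vowel reduction, lengthening, palatalization, vowel diacritics) and is not an agreed phoneme key, and the function name allophonic_symbols is hypothetical.

```python
# Symbols treated as allophonic detail in the examples above.
# Assumption: this set is illustrative, not an agreed Sicilian IPA key.
ALLOPHONIC = {
    "ɪ": "unstressed vowel reduction",
    "ɨ": "unstressed vowel reduction",
    "ʊ": "unstressed vowel reduction",
    "ː": "stressed open-syllable lengthening",
    "ʲ": "synchronic palatalization",
    "\u0320": "retraction diacritic (combining minus sign below)",
    "\u031e": "lowering diacritic (combining up tack below)",
}


def allophonic_symbols(transcription):
    """Return flagged symbols in a /slashed/ transcription, in order of
    first appearance. Bracketed (phonetic) transcriptions are not checked."""
    if not (transcription.startswith("/") and transcription.endswith("/")):
        return []
    found = []
    for ch in transcription:
        if ch in ALLOPHONIC and ch not in found:
            found.append(ch)
    return found


for t in ["/ˈpa.ʃɪ/", "/ˈpatʃi/", "/sʊˈʃːaɾɪ/"]:
    flags = allophonic_symbols(t)
    print(t, "->", [f"{s}: {ALLOPHONIC[s]}" for s in flags] or "looks phonemic")
```

A check of this kind only lists candidates for review; deciding what counts as a phoneme, and rewriting the transcription, would still depend on whatever key is agreed on.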

@Nicodene I would agree with this. For some languages such as Russian we include some allophonic phenomena in the phonemic pronunciations but I prefer to separate the phonemic and phonetic variants. Unfortunately I don't know much at all about Sicilian pronunciation or how predictable it is from the spelling. Benwing2 (talk) 01:28, 18 June 2022 (UTC)[reply]
Also ncununcchiari has a pronunciation without the third /n/, and likewise the conjugation table leaves out the third n. I assume there is a mistake somewhere. Benwing2 (talk) 01:30, 18 June 2022 (UTC)[reply]
Yes, not sure what happened with the /n/.
For Russian, it seems we don't have phonemic transcriptions at all, only phonetic ones. Perhaps that was to avoid disagreements over what counts as a phoneme, since e.g. the unstressed vowel mergers complicate things. Also the old controversy over /ɨ/. Nicodene (talk) 02:21, 18 June 2022 (UTC)[reply]
@Nicodene I agree, I think we should firstly create a standard key for broad phonemic transcriptions. The module shouldn't be too hard since Sicilian orthography is mostly phonemic, just like Italian. The things left unwritten in the page title that should be given to the module are stress (vècchiu usually written vecchiu), ç/tʃ-distinction (çiùri, ciàula usually written ciuri, ciaula), the ɖɖ/dd-distinction (cavàddhu, addìu usually written cavaddu, addiu) and syntactic gemination (/a*/ "to" and /a/ "the", both written a).
I'm not sure I agree on whether we should add narrow transcriptions though, as the phonetic realization varies highly depending on the region. Quoting [this site]:
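... una certa uniformità del Siciliano scritto esiste già e non si vede la necessità di imporre anche una uniformità del Siciliano parlato. [... a certain uniformity of written Sicilian already exists, and there seems to be no need to also impose uniformity on spoken Sicilian.]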
Catonif (talk) 10:59, 8 July 2022 (UTC)[reply]
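The relationship described above between a fully marked respelling and the usual page title can be sketched in a few lines. This is only an illustration in Python (an actual Wiktionary module would be written in Lua), and it assumes that the respelling marks stress with a grave accent and writes the retroflex geminate as "ddh", as in the examples quoted above; the function name is hypothetical.

```python
import unicodedata

GRAVE = "\u0300"      # combining grave accent, used here as the stress mark
C_CEDILLA = "\u00e7"  # precomposed "ç"

def respelling_to_title(respelling: str) -> str:
    """Reduce a fully marked respelling to the usual written form.

    Assumed conventions (taken from the examples in the thread):
    stress is a grave accent, "ç" is usually written "c",
    and "ddh" is usually written "dd".
    """
    # Decompose so that accented vowels become base letter + combining mark.
    decomposed = unicodedata.normalize("NFD", respelling)
    # Drop only the grave accent; the cedilla is kept for now.
    without_stress = decomposed.replace(GRAVE, "")
    # Recompose so that "c" + cedilla becomes a single "ç" again.
    text = unicodedata.normalize("NFC", without_stress)
    return text.replace(C_CEDILLA, "c").replace("ddh", "dd")

for word in ["vècchiu", "çiùri", "ciàula", "cavàddhu", "addìu"]:
    print(word, "->", respelling_to_title(word))
```

The point of the sketch is that the mapping only works in one direction: the title can be derived from the respelling, but not the other way round, which is why the extra information would have to be passed to the module by hand.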
@Nicodene. I don't know if this discussion is supposed to have died, so sorry for the ping. This is my proposal for the new Sicilian IPA key. It's only a stub, and it's highly debatable since it's based on my subjective opinion on what should be treated as a phoneme and what not, so feedback is appreciated. I also wanted to write a couple of examples to clarify, but it got out of hand and now there are 24 examples. Catonif (talk) 21:17, 12 July 2022 (UTC)[reply]
@Catonif Actually, thanks for the reminder. There's so much to do around here that it can be easy to forget things.
I am far from an expert on Sicilian, so my feedback is from a more general perspective.
*/ɪ/ and */ʊ/ do not have phonemic status, since they are in complementary distribution with /i/ and /u/ and hence not distinctive. They can be represented in auto-generated phonetic transcriptions, if we eventually set those up. That would of course require selecting some specific variety of Sicilian as 'standard'.
/kj/ seems a good deal simpler than having both /ç/ and /c/. In that case I'd also suggest /ɡj/.
If we have a geminate phonemic /rr/, then the singleton counterpart should probably just be /r/ phonemically. In other words, length is the relevant distinction, as it is across the Sicilian consonant system.
Representing word-initial /ʃʃ/ is an interesting choice. I assume it behaves the same as the segment at the beginning of Italian sciogliere, which we have represented as single /ʃ/, with an implied rule of intervocalic gemination. To be fair, we should have done the same for the segment in the middle of e.g. coscia (= /ˈkɔʃa/, phonetically [ˈkɔʃ.ʃä]), since there is certainly no /ʃʃ/-/ʃ/ contrast in the standard language. Syllable structure isn't phonemic in either Italian or Sicilian, incidentally, but it seems people like adding the dots. Nicodene (talk) 21:55, 12 July 2022 (UTC)[reply]
@Nicodene Thank you for taking a look at the page. Those are good points, I'll now say my point of view and the reason why I chose to do differently. In general, the key was built up starting from the phonetic pronunciation, rather than starting from the orthography to define phonemes, which is why some things might look unusual.
About /c ɟ/, writing them phonemically as /kj gj/ is just rendering them as the orthography does and it's not giving them the love that they deserve. /c/ indeed is a realization of older /kj/ (eg. chiaru < La. clarum), but it is also the result of /pj/, in which the /p/ shifts its position back to the palate by assimilation of /j/, eg. cchiui (< La. plus) has its first consonant as [c] < [pj] < [pl], which was never in history and is nowhere now pronounced [kj]. The same goes for [ɟ] < [ʎ] < [lj] (figghiu) or [ɟ] < [jj] (tri jorna, independently /ʈɽi*/ and /jɔɾna/ and together /ʈɽi ɟɟɔɾna/), where a velar [g] seems kind of out of place. In spite of what I just claimed, some AIS maps show /c/ being realized as [kç] in a couple of varieties near Palermo, though inconsistently: other maps use /c/ for the same varieties. /ɟ/ on the other hand never finds itself in a velar position.
About /ç/, it's a phoneme in standard Sicilian, originating from [ç] < [fj] < [fl] (eg. çiumi < La. flumen) which apparently merged with /tʃ/ outside of West Sicily.
About using single /ʃ/ intervocalically, it seems like it could cause confusion, since while there is technically no distinction between /ʃʃ/ and /ʃ/ there is a distinction between [ʃʃ] and [ʃ] (/tʃ/). This could mean that while /ˈkɔʃa/ would represent "coscia" phonemically, it is also the exact phonetic representation of *cocia [ˈkɔʃa]. This is why I think we should use /ʃʃ/ and /tʃ/, which everybody can recognize, while /ʃ/ stays alone in the corner.
It's the same idea that got me into using the distinct and unambiguous /rr/ and /ɾ/, while /r/ doesn't make it very clear of which one of the two one is talking about, especially at the beginning of a word, if not by checking the key page. It is understandable enough to be acceptable, but if we have the choice to be clearer why not be. /ɾ/ technically doesn't appear word-initially in the current key, but we might want to change it, in case we open up to more 'non-standard' pronunciation like /ɾɪsˈpɛt.tʊ/ dispettu ("trick") not to confuse it with /rrɪsˈpet.tʊ/ rispettu ("respect") or /ˈɾat.tsɪ/ grazzi ("thank you") with /rˈrat.tsɪ/ razzi ("races") (not homophones, though they might look like some if we only use /r/).
Initial /ʃʃ-/ is written like this by analogy of all the other initial geminated consonants. Italian prefers /*ʃ-/ just like it prefers /*k-/ in qua, where Sicilian transcriptions on the other hand seem to always write it fully as /kk-/ (ccà). If /k/ can, /ʃ/ should too.
About /ɪ ʊ/ [i̞ u̞], I used that notation only because it is the one used everywhere; I guess we could just use /i u/.
The dots are mainly for readability. Catonif (talk) 12:15, 13 July 2022 (UTC)[reply]
I am a native Sicilian speaker and I am trying to create, add and correct a lot of misspelled Sicilian entries. I would like to create several templates which can be used to automatically produce the conjugation of as many verbs as possible. It would also be useful to understand the sounds of the Sicilian language and its dialects, to avoid misspellings and misunderstandings about sounds. In modern-day spoken Sicilian there are also a lot of incoming Italianisms, which are eroding the core of Sicilian pronunciation. Plus: Sicilian speakers are not literate in their own language, which is like a Ukrainian who has been taught to read and write only in Russian. As for ncununcchiari, that was a new entry created by me, but as you can notice it should be ncunucchiari; it was a typo (I'd like to erase a lot of nonsense ones like this, but I do not know how to do it). Sugnu cca pi dàrivi na manu, si vi serbi (I am here to help if needed). Σκορπιός (talk) 14:37, 1 August 2022 (UTC)[reply]
@Scorpios90 If you created a page with a typo in the headword, you can use {{d}}. This template will tell sysops to delete the page. Sartma (talk) 10:18, 2 August 2022 (UTC)[reply]

Removal of PWG from etymologies


Some users are doing this [[6]], essentially removing the Proto-West Germanic step in etymologies from view, and replacing it with a dercat marker which adds it to the category at the bottom of the page. Is this something we want to do? Leasnam (talk) 13:23, 18 June 2022 (UTC)[reply]

No, we would want the exact opposite: PG in the dercat, and PWG in the etymologies. Thadh (talk) 13:25, 18 June 2022 (UTC)[reply]
That's what I was thinking as well. Leasnam (talk) 13:27, 18 June 2022 (UTC)[reply]
@Leasnam, @Thadh:: thanks for bringing this up. I've noticed that @Hundwine is going through many Old English entries and curiously replacing Proto-West-Germanic with Proto-Germanic, and moving the former into dercat, essentially undoing many of the edits I've been doing (moving Proto-Germanic into dercat and adding in Proto-West-Germanic). I started to put them back in, but so did he, so I gave up for the sake of avoiding an edit war. But I am wondering what Hundwine's reasoning is (if you've got a minute, please let us know?). I, @Victar and a lot of other editors decided a while ago that for non-English lemmas, it's probably best practice to keep the chain as direct and short as possible to avoid inconsistencies in etymologies. DJ K-Çel (contribs ~ talk) 16:00, 21 June 2022 (UTC)[reply]
<<for non-English lemmas, it's probably best practice to keep the chain as direct and short as possible to avoid inconsistencies in etymologies>> - This is an outstanding proposal and I think it should be a practice. I wasn't aware, but this makes very good sense to me. Leasnam (talk) 17:40, 21 June 2022 (UTC)[reply]
To be honest, seeing how often the English etymologies are outdated, I wouldn't be opposed to making this a rule for English, too. Thadh (talk) 18:43, 21 June 2022 (UTC)[reply]
I wonder if something akin to {{desctree}} but for etymologies is possible. Would solve not only the inconsistency issue but would also remove redundancy (always nice) while not forcing the user to click through x articles just to find the desired chain. — Fytcha T | L | C 18:47, 21 June 2022 (UTC)[reply]
I'd have thunk {{desctree}} is for etys. I never understood why bijectivity is not automatically enforced. Nor can I make out, for that matter, what most "direct" is in terms of etymology chains. Do you mean immediate (like atomic)? — This unsigned comment was added by ApisAzuli (talkcontribs) at 09:59, 23 June 2022 (UTC).[reply]
I often thought about a template for each etymology, but could the template be programmed to know where in the chain to stop? What I mean is, if we created a template that showed the etymology of English 'stone' back to PIE, would the template at Middle English know to only display PIE to Middle English, but at modern English it would show PIE to English? And what about modern Scots? Would it be a wholly independent template/tree or would it share part of the English template? Leasnam (talk) 22:39, 23 June 2022 (UTC)[reply]
Bijectivity isn't always easy. We have blending of words and/or sources, as French/Latin to English and Pali/Sanskrit to Thai. We also have the issue of words that we can't show to satisfy CFI and homophones. We've also got problems like Pali dvebhāva with Thai meanings 'duplicity' and 'living twice'. I can document roughly the first meaning for Pali, but I'm only confident of finding the meaning of 'living twice' in Thai. (So far I only have it in the Royal Institute Dictionary.) The two meanings may both exist in Pali, but they would only be homophones. --RichardW57m (talk) 13:33, 24 June 2022 (UTC)[reply]
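On the question raised above of whether a shared etymology template could "know where to stop": one way to picture it is to store each stage's immediate parent once and derive every chain by walking upwards from the entry's own language. The following Python sketch is purely illustrative (the table, the function name and the placement of Scots under Middle English are assumptions made for the example, not how any existing template works), and it ignores the blending cases mentioned in the reply above, where a term has more than one source.

```python
# Hypothetical parent table for a single word family (e.g. 'stone').
# Each language stage records only its immediate ancestor.
PARENT = {
    "English": "Middle English",
    "Scots": "Middle English",  # assumption for illustration
    "Middle English": "Old English",
    "Old English": "Proto-West Germanic",
    "Proto-West Germanic": "Proto-Germanic",
    "Proto-Germanic": "Proto-Indo-European",
}

def chain_for(language: str) -> list[str]:
    """Return the chain from the oldest stage down to `language`."""
    chain = [language]
    # Walk upwards until a stage with no recorded parent is reached.
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain[::-1]

print(" > ".join(chain_for("Middle English")))
print(" > ".join(chain_for("English")))
print(" > ".join(chain_for("Scots")))
```

Under this layout the Middle English entry stops at Middle English simply because the walk starts there, and Scots shares everything above Middle English with English without needing a separate tree.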

All Words in All Languages


From the front page: "It aims to describe all words of all languages using definitions and descriptions in English." Should we really be advertising this motto everywhere? It's misleading and vague, there are too many disagreements. Something I think would be better, while still vague, is "It aims to be the most complete multilingual dictionary using definitions and descriptions in English". Vininn126 (talk) 19:31, 20 June 2022 (UTC)[reply]

What about this is misleading or vague? This is a clear and cogent mission statement for the project, straightforward and to the point. That participants disagree on what constitutes a "word" does not change the ultimate goal. ‑‑ Eiríkr Útlendi │Tala við mig 21:07, 20 June 2022 (UTC)[reply]
Well, it isn't true. We exclude most constructed languages - even those that see/saw real use (e.g. Interlingue, not to be confused with Interlingua). Theknightwho (talk) 21:24, 20 June 2022 (UTC)[reply]
How about "most words in most languages" as a compromise? 98.170.164.88 21:31, 20 June 2022 (UTC)[reply]
Several words in a number of languages! Vininn126 (talk) 21:37, 20 June 2022 (UTC)[reply]
Unfortunately, it's come up way too many times in discussions (including when I used to use it myself, but now I'm more disillusioned and more aware of how Wiktionary really works), to the point where it's really misleading at this rate, and people literally treat it as policy. Even without the issues of WT:CFI, RFDs sometimes come down to literally opinions of editors with no policy basis, with words being deleted. We don't include every possible string of characters that could have a meaning. It's time that we update that intro to actually reflect the practices of the website. We also definitely don't treat each language the same, but that gets more into another discussion. AG202 (talk) 21:41, 20 June 2022 (UTC)[reply]
I don't see any of this as really changing the basic mission of the project.
  • Re: conlangs, we don't exclude them outright as a matter of policy, we just organize them differently (in appendices, not in mainspace).
  • Re: "every possible string of characters that could have a meaning", no, that comes down to ideas about what is a "word" (i.e. "term" for purposes of CFI). For instance, we don't include English white house, which is a "string of characters that could have a meaning", since this is SOP -- easily discernible from its constituent parts. But we do include White House, which is also a "string of characters that could have a meaning", since this is not SOP -- not easily discernible from its constituent parts.
The suggested changes to the mission statement introduce ambiguity, as best I can see. ‑‑ Eiríkr Útlendi │Tala við mig 00:27, 21 June 2022 (UTC)[reply]
It's not even just SOP-ness, WT:CFI's alignment on place names, celestial objects, brand names, etc etc don't even align with each other, and as mentioned some RFDs literally boil down to whether editors like them or not (which is fine at this rate). Thus, saying that we include all words in all languages doesn't really feel like the goal anymore, and it definitely should not be encouraged to be used in policy discussions. AG202 (talk) 01:30, 21 June 2022 (UTC)[reply]
This is a very abstract discussion. "Concrete example" is a string of characters that does have a meaning. Do you feel that considering this meaningful string not inclusion-worthy fails our mission? Should we also include "eat dogfood", "three little mice" and "however poorly"? Otherwise, can you give a few concrete examples of how our policies betray our mission?  --Lambiam 11:21, 21 June 2022 (UTC)[reply]
Aside from the SOP question, we exclude plenty of proper nouns which are clearly not SOP. Generally, these are only included if idiomatic, despite the fact that their names are often not SOP.
To be clear, I don't think we should include brand names (unless there is some good reason to justify their inclusion), but that's beside the point. Theknightwho (talk) 12:35, 21 June 2022 (UTC)[reply]
Exactly ^ I specifically said it's not just SOP-ness as well. It's kinda weird to include every possible neighborhood and dividing line, but not include parks and beaches. We also exclude comets as it turns out, but we include meteor showers? Also Batman vs the deleted Spider-Man, also the deletion of non-Canadian even though WT:COALMINE should have applied, etc. etc. Some of the policies I'm fine with now, but our guidelines should definitely be updated. AG202 (talk) 12:59, 21 June 2022 (UTC)[reply]
@AG202: You are right that the deletion of non-Canadian was pretty weird. Unlike most WT:Idiom criteria, WT:COALMINE has actually been affirmed by a vote. It has been on my mind for a while now that we need a policy clearly stating that informal votes (RFD, BP) may not override voted-on policies. That would also have allowed me to speedily close Talk:everypony because "exclusively used within a niche group" is obviously of complete irrelevance to CFI. FYI @WordyAndNerdy. Fytcha T | L | C 14:05, 21 June 2022 (UTC)[reply]
@Fytcha On this point, it's frustrating to see slippery slope arguments being made time and again (e.g. with non-Canadian), when WT:CFI explicitly disallows them on the basis that terms need to be attestable. Theknightwho (talk) 00:27, 22 June 2022 (UTC)[reply]
Mission statements are meant to be aspirational and broad, so as to capture the spirit of the mission in a pithy manner. The Red Cross (in the USA anyhow) has the mission statement "... prevents and alleviates human suffering in the face of emergencies ...", but I don't think they actually expect to prevent all human suffering related to every emergency, even though the mission statement could be read that way. Their aim is to do as much of what they do as they can, within their scope, within their budget, and within reason. Same with all words in all languages, it is aspirational (we aren't done yet), and doesn't define the full scope of project (what is a word, what is a language, how much is all). It is still a pretty good mission statement. - TheDaveRoss 12:26, 21 June 2022 (UTC)[reply]
I agree it should be aspirational, but note they are lacking "all". And the problem is many people do join thinking it's literally everything, whereas you are claiming people understand it differently for the Red Cross. Perhaps they do, but we still deal with this issue regularly. My goal with the change is to remain aspirational yet not misleading. Vininn126 (talk) 12:28, 21 June 2022 (UTC)[reply]
IMO we can get away with this. Coca-Cola might advertise its drink as "the coolest, tastiest beverage" -- they aren't gonna say "hang on, should we say 'it depends on your mood, and various regional statistics'" -- no, bc they can get away with it. So can we. The spirit is there. If we limit the definitions of "word" and "language" we do it in good faith, and largely prevent gibberish by doing so. Equinox 20:21, 21 June 2022 (UTC)[reply]

Twitter


The Twitter account @WiktionaryUsers ("Writing on behalf of the http://en.wiktionary.org community") last posted in 2013. Do we know who has the password for the account, or can we usurp it? Is there another Twitter account promoting (en.)Wiktionary? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:09, 21 June 2022 (UTC)[reply]

Got a very vague feeling Wonderfool set this up after I'd been talking about how much I disliked Twitter. Equinox 12:31, 21 June 2022 (UTC)[reply]
'Twas indeed Wonderfool, who's no longer got access to the account. Zumbacool (talk) 09:09, 23 June 2022 (UTC)[reply]

Desktop Improvements update

Making this the new default

Hello. I wanted to give you an update about the Desktop Improvements project, which the Wikimedia Foundation Web team has been working on for the past few years. Our work is almost finished! 🎉

We would love to see these improvements become the default for readers and editors across all wikis. In the coming weeks, we will begin conversations on more wikis, including yours. 🗓️ We will gladly read your suggestions!

The goals of the project are to make the interface more welcoming and comfortable for readers and useful for advanced users. The project consists of a series of feature improvements which make it easier to read and learn, navigate within the page, search, switch between languages, use article tabs and the user menu, and more. The improvements are already visible by default for readers and editors on more than 30 wikis, including Wikipedias in French, Portuguese, and Persian.

The changes apply to the Vector skin only, although it will always be possible to revert to the previous version on an individual basis. Monobook or Timeless users will not notice any changes.

The newest features
  • Table of contents - our version is easier to reach, gives context for the page, and lets you navigate throughout the page without needing to scroll. It is currently being tested across our pilot wikis. It is also available for editors who have opted into the Vector 2022 skin.
  • Page tools - now, there are two types of links in the sidebar. There are actions and tools for individual pages (like Related changes) and links of the wiki-wide nature (like Recent changes). We are going to separate these into two intuitive menus.
How to enable/disable the improvements
Global preferences
  • It is possible to opt-in individually in the appearance tab within the preferences by selecting "Vector (2022)". Also, it is possible to opt-in on all wikis using the global preferences.
  • On wikis where the changes are visible by default for all, logged-in users can always opt-out to the Legacy Vector. There is an easily accessible link in the sidebar of the new Vector.
Learn more and join our events

If you would like to follow the progress of our project, you can subscribe to our newsletter. You can read the pages of the project, check our FAQ, write on the project talk page, and join an online meeting with us.

Thank you! SGrabarczuk (WMF) (talk) 16:59, 21 June 2022 (UTC)[reply]

Join us on Tuesday

Join an online meeting with the team working on the Desktop Improvements! It will take place on 28 June 2022 at 12:00 UTC and 19:00 UTC on Zoom. Click here to join. Meeting ID: 5304280674. Dial by your location. The following events will take place on 12 July and 26 July.

The meeting will not be recorded or streamed. Notes will be taken in a Google Docs file and copied to Etherpad. Olga Vasileva (the Product Manager) will be hosting this meeting. The presentation part will be given in English. At this meeting, both Friendly space policy and the Code of Conduct for Wikimedia technical spaces apply. Zoom is not subject to the WMF Privacy Policy.

We can answer questions asked in English and a number of other languages. If you would like to ask questions in advance, add them on the talk page or send them to sgrabarczuk(at)wikimedia.org. We hope to see you! SGrabarczuk (WMF) (talk) 21:44, 23 June 2022 (UTC)[reply]

The State of WT:RFDN


The page is getting way too unwieldy. It takes forever to load and even sending a reply using the auto-reply feature takes ages to load, and I can't even imagine trying to use it on mobile. The page length is over 500,000 bytes which is longer than any other article/page that I'm aware of. Maybe it's due time for another split like was done with WT:RFVCJK? Also, on another note, is it possible to suggest limiting RFDs/RFVs that involve a TON of words at one time? They're a pain to archive to begin with even with the bot, and when they start to involve some words that get deleted and some words that stay, they become almost impossible to keep track of, only making them stay longer. A very prominent example of this is with Wiktionary:Requests for deletion/Non-English § Old English pseudo-prefixes, where there are over 50 words in one RFD. While I get why it may be better to group them all together, it does make it harder to get through. AG202 (talk) 19:25, 21 June 2022 (UTC)[reply]

It would make sense to split by language, like Wiktionary:Requested entries. Anyone who cares about certain languages would be more likely to see all he should then. Fay Freak (talk) 19:53, 21 June 2022 (UTC)[reply]
That's not a good idea. In my eyes, this would ultimately lead to languages without active admins being completely ignored and in a state of anything-goes. We already have enough trouble with people adding nonsense in smaller languages, let's not make it easier for them to do. Thadh (talk) 20:56, 21 June 2022 (UTC)[reply]
It practically already happens if admins know there is currently hardly anyone reading these pages and understanding the language, I think of Albanian, but I have not thought that all thousands would get a page but we would split by a few dozens and some groups and “all the rest“, and also it would make sense to dynamically show the number of entries on the overview page so backlogs are spotted. Fay Freak (talk) 22:09, 21 June 2022 (UTC)[reply]
I love this heading, as it makes me think of somebody (probably with Brit accent) saying "the absolute state of it"! As you state, we previously broke out CJK languages for basically the same reason of size. Obviously what we really need is some awful "infinite scroll" Web design (NO, please don't, it screws up search and everything). Yeah, break it again... but bring some statistics telling which languages are the best ones to break out. Equinox 20:23, 21 June 2022 (UTC)[reply]
Splitting has proven reasonably practical. As Eq says, the big practical question is how to do the next split. By character set family? By language family? Other? DCDuring (talk) 20:34, 21 June 2022 (UTC)[reply]
It is also reasonable to see it as merely a problem of scale: suppose that Wiktionary got really popular overnight and suddenly WT:REE had to be broken into a load of pages by year, oh lol wait, that happened. But yeah, there might be other solutions. I am just extremely suspicious of any shit like LiquidThreads that claims to make things "easier" (and gives a superficially nicer-to-use user interface) but ends up making things hidden or demoted. Equinox 22:56, 21 June 2022 (UTC)[reply]
When we split WT:RFVN, my suggested splits were CJK and Romance. I suggest we do something similar here. Possibly "Romance" should include Latin, in which case it should maybe be named "Latinate" or "Italic" to make it clear it's not just Romance languages per se. Benwing2 (talk) 03:01, 22 June 2022 (UTC)[reply]
What about splitting it into periods of several months rather than years? Nicodene (talk) 07:26, 22 June 2022 (UTC)[reply]
That is pretty close to the vibe I was going for honestly. Glad that someone appreciates it. AG202 (talk) 03:04, 22 June 2022 (UTC)[reply]

To try to pull "action points" from the above (it's like being in the meeting at work, innit!): Benwing2 suggested sub-splitting by language, as Romance, or Latinate, or Italic; and Nicodene suggested temporal splitting by months instead of years. Which will we do and why? Do we have useful statistics to prefer one? Or should we try both and see what works? Equinox 12:02, 22 June 2022 (UTC)[reply]

I'd personally lean on the side of separating out CJKV at the very least for RFD. And then maybe doing the same for Romance/Italic and so on and so forth. AG202 (talk) 21:04, 22 June 2022 (UTC)[reply]
@AG202, Equinox, Nicodene For reference, I computed a few statistics (all are approximate):
  • There are about 265 level-2 headers in WT:RFDN, which increases to 376 counting level-3 and level-4 headers.
  • Splitting CJK out would remove about 66 entries.
  • Splitting "Italic" (Romance+Latin) out would remove about 64 entries.
  • Splitting both CJK and "Italic" out would remove about 130 entries, or about half of the entries.
Benwing2 (talk) 03:43, 23 June 2022 (UTC)[reply]
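The "about half" figure above can be checked with a few lines of arithmetic. The Python sketch below uses only the approximate counts quoted in the list and assumes that "entries" corresponds to the 265 level-2 headers; the variable and function names are made up for the example.

```python
# Approximate counts quoted in the list above.
TOTAL_L2_HEADERS = 265
CJK_ENTRIES = 66
ITALIC_ENTRIES = 64  # Romance + Latin

def share_removed(removed: int, total: int) -> float:
    """Fraction of the page's entries that a split would move elsewhere."""
    return removed / total

combined = CJK_ENTRIES + ITALIC_ENTRIES
remaining = TOTAL_L2_HEADERS - combined

print(f"CJK only:    {share_removed(CJK_ENTRIES, TOTAL_L2_HEADERS):.0%}")
print(f"Italic only: {share_removed(ITALIC_ENTRIES, TOTAL_L2_HEADERS):.0%}")
print(f"Both:        {share_removed(combined, TOTAL_L2_HEADERS):.0%} ({remaining} entries left)")
```

Each split on its own moves roughly a quarter of the page, and the two together move just under half of it.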
I'm in favour. Nicodene (talk) 04:28, 23 June 2022 (UTC)[reply]
Apart from Nicodene above, do our Latin and Romance editorbases actually overlap? If not, it may not be a good idea to put them together, I would imagine Latin and Greek might have a lot of overlap as well. Thadh (talk) 21:55, 23 June 2022 (UTC)[reply]
Even if they do, that doesn't mean they will in the future. Vininn126 (talk) 22:13, 23 June 2022 (UTC)[reply]
@AG202, Equinox, Nicodene, Vininn126 OK if no one objects in the next couple of days I'll split out the CJK entries; we can then decide whether to split out the Romance (and maybe Latin) entries. Benwing2 (talk) 03:44, 25 June 2022 (UTC)[reply]
That sounds good to me! AG202 (talk) 02:29, 29 June 2022 (UTC)[reply]
As a side note, it seems that WT:RFVN is somehow worse, and that's with the split into WT:RFVCJK. I honestly think that this is a pristine indicator of the state of Wiktionary when it comes to languages that don't fit under the major languages or that don't have a stable source of editors. It really does feel like the project doesn't cultivate as much of a welcoming environment as it should and that there isn't really a stream of new editors that want to deal with the inner workings like RFVs and RFDs. That's something that needs to change in the future if we want to fix issues like these long-term. AG202 (talk) 02:40, 29 June 2022 (UTC)[reply]
@AG202, Equinox, Nicodene, Vininn126 I have split out WT:RFDCJK. IMO we still need to split out Italic or Romance. User:AG202's point about WT:RFVN is also well-taken; we should do the split there too. Should the split be Romance or Italic, and if the former, should we split out "Classical" languages (Latin+Ancient Greek)? Does this latter split make any sense? Does it make more sense to group Ancient and Modern Greek (if we were to split out Modern Greek)? Benwing2 (talk) 01:14, 2 July 2022 (UTC)[reply]
Could just make one grouping for all of these. In my experience at least there is considerable overlap, not least because the languages in question had plenty of mutual borrowing. Nicodene (talk) 06:37, 2 July 2022 (UTC)[reply]
@Nicodene One grouping for all of what? Italic + Greek? Benwing2 (talk) 00:15, 3 July 2022 (UTC)[reply]
Yes, I think Italic + Greek is similar to Chinese-Japanese-Korean in terms of overlap. Nicodene (talk) 05:45, 3 July 2022 (UTC)[reply]
@AG202, Equinox, Nicodene, Vininn126 OK, unless there are objections over the next couple of days, I'm going to split out Italic+Greek into WT:RFDIG and WT:RFVIG. (Or should it be WT:RFDGI etc. to keep alphabetized? I prefer putting Italic first since it represents the lion's share of challenged terms.) Benwing2 (talk) 08:14, 4 July 2022 (UTC)[reply]
Heh, we're becoming the new "alphabet mafia". I think it's fine, as long as we make sure that there is some root "entry point" offering easy access to the language being sought. (Look at Wiktionary:Requested entries for a decent example. It's daunting once, but then you can bookmark it, or remember the shortcuts.) Ideally, it would be great to get some user feedback in a few months about whether the page splitting has helped or annoyed people, and proceed based on that. Remember to love your users. We're all users. Equinox 20:07, 5 July 2022 (UTC)[reply]
Ah!! In particular, when someone drops the RFV or RFD tag into an entry, I suppose we can't make it point to the correct page, can we...? (Because the pages use our special markup, and the language sectioning is quite arbitrary.) But if we can't, that's a very serious problem: you'd add an RFV or RFD tag and click it and end up... where? Not in the right place? Equinox 20:09, 5 July 2022 (UTC)[reply]
@Equinox: We can, because the RFD/RFV templates have the language code. See diff that I did following the split. — Fytcha T | L | C 20:15, 5 July 2022 (UTC)[reply]
Could you please also add "ko-ear" (code for Early Modern Korean) to those templates as well? AG202 (talk) 23:37, 5 July 2022 (UTC)[reply]
@AG202: Done. — Fytcha T | L | C 10:57, 6 July 2022 (UTC)[reply]
@Benwing2 That's fine with me, though I could easily see someone saying that it should be "Hellenic" instead of "Greek" in line with Italic, and since I'd assume that it'd include other Hellenic languages (and now I'm starting to agree a bit). AG202 (talk) 23:39, 5 July 2022 (UTC)[reply]
@AG202 Sure, I can do that. Benwing2 (talk) 02:22, 7 July 2022 (UTC)[reply]

On a related note, I've now revamped the RFD/RFV templates so that all the logic for which language goes to which subforum is now in one central location: Module:request-forum. I did this because, one, I noticed that there are some discrepancies in our templates (Special:Diff/67657314) and, two, because it is now much easier to update. I would appreciate having an extra set of experienced eyes (@Benwing2) go over my work to check if I forgot anything. — Fytcha T | L | C 11:53, 6 July 2022 (UTC)[reply]

@Fytcha It all looks good to me. Thanks for doing this. Benwing2 (talk) 02:21, 7 July 2022 (UTC)[reply]
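The idea of keeping the language-to-subforum logic in one central place can be illustrated as follows. This is a Python sketch, not the contents of Module:request-forum (which is a Lua module and is not reproduced here); the code sets, the function name and the "IG" shortcut are assumptions based on the split proposed earlier in this thread.

```python
# Hypothetical code sets; the real module's lists are not shown in this thread.
CJK_CODES = {"zh", "ja", "ko", "ko-ear"}
ITALIC_GREEK_CODES = {"la", "it", "grc", "el"}

def forum_for(kind: str, lang_code: str) -> str:
    """Return the shortcut of the non-English request page for a language.

    `kind` is "RFD" or "RFV"; anything not in a split-out group
    falls back to the general non-English page.
    """
    if kind not in {"RFD", "RFV"}:
        raise ValueError(f"unknown request type: {kind}")
    if lang_code in CJK_CODES:
        return f"WT:{kind}CJK"
    if lang_code in ITALIC_GREEK_CODES:
        return f"WT:{kind}IG"  # shortcut proposed above, assumed here
    return f"WT:{kind}N"

print(forum_for("RFD", "ko-ear"))  # WT:RFDCJK
print(forum_for("RFV", "la"))      # WT:RFVIG
print(forum_for("RFD", "sq"))      # WT:RFDN
```

With a single lookup like this, adding a code such as "ko-ear" or introducing a new split means changing one table rather than every RFD and RFV template separately.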

Is there any way to make these examples more relevant in policy? It seems like other than WT:COALMINE (which to be fair, was voted upon officially) and the examples that @Theknightwho has been mentioning, they really aren't brought up in RFD discussions/votes. And even with WT:COALMINE, it isn't even followed (see: Talk:non-French). Are they supposed to be used or not? I get that other than WT:COALMINE, they're supposed to be guidelines, but even then barely anyone is using them. Another example is Talk:internalized homophobia, which should've passed with WT:JIFFY as it seems like it came before that meaning of internalized, but it wasn't mentioned once (and I wasn't aware of it either). AG202 (talk) 11:47, 22 June 2022 (UTC)[reply]

I also want to have the discussion of why non-Canadian was deleted even though it is a coal mine. In my opinion, RFD and BP votes should never have the power to override formal votes. — Fytcha T | L | C 11:54, 22 June 2022 (UTC)[reply]
This page is full of people going on about non-Canadian coalmine but I don't see any noncanadian entry, what is the rationale?? Equinox 11:59, 22 June 2022 (UTC)[reply]
Theknightwho mentioned how there's evidence of it being used as a single word, no hyphen. I'd personally be hard-pressed to create an entry if I know that it's going to get deleted in the end. AG202 (talk) 12:01, 22 June 2022 (UTC)[reply]
We found durably archived uses online. [7] is one example. Theknightwho (talk) 12:01, 22 June 2022 (UTC)[reply]
[8] is another. Admittedly most results on GBooks are scannos. Theknightwho (talk) 12:08, 22 June 2022 (UTC)[reply]
[9] Cited. Theknightwho (talk) 12:11, 22 June 2022 (UTC)[reply]
I don't think it is really relevant whether nonCanadian exists as a cited article or not given that the very first page of Google Books already contains 3 valid citations (meaning it is patently attestable, a point already raised in Talk:non-French). But anyway, I went ahead and created it. Does that mean I can now restore non-Canadian against the result of the RFD? — Fytcha T | L | C 12:18, 22 June 2022 (UTC)[reply]
I'm going to be brave and recreate non-Canadian, because nonCanadian should be noted as an uncommon form. Theknightwho (talk) 19:24, 22 June 2022 (UTC)[reply]

Making it easier to add several descendants

At the moment, if you want to add several descendants in a given language, you have to write them out in a rather tedious way. For instance, the Sardinian descendants under Latin placēre are written like this:

{{desc|sc|piachere}}, {{l|sc|piaghere}}, {{l|sc|piazeri}}, {{l|sc|piaceri}}, {{l|sc|piasgè}}, {{l|sc|plexeri}}, {{l|sc|praxeri}}, {{l|sc|praxei}}, {{l|sc|prexeri}}, {{l|sc|prexei}}, {{l|sc|pregheri}}, {{l|sc|prexu}}, {{l|sc|paciurru}}, {{l|sc|piciurru}}, {{l|sc|peciurru}}

I propose to edit the {{desc}} template, or perhaps add a variant of it that works like this:

{{desc|sc|piachere|piaghere|piazeri|piaceri|piasgè|plexeri|praxeri|praxei|prexeri|prexei|pregheri|prexu|paciurru|piciurru|peciurru}}

This requires less than half as many characters and is far easier to work with.

As it happens, that is exactly how our template for alternative forms works (see, for instance, the entry for Catalan ovella). In theory, then, one can always create an entry for a given language, fill it with alternative forms, and then use {{desctree}} for it in the descendants section of the ancestor language. And in fact that is what I've resorted to lately. But one does not always have the time, inclination, or knowledge to create decent entries in a variety of languages, and it drastically slows down the process of adding descendants. Nicodene (talk) 08:20, 23 June 2022 (UTC)[reply]
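
The conversion proposed above can be sketched in a few lines of Python. This is only an illustration, not the script anyone on the project actually ran: the function name `merge_descendants` is made up, and it assumes the simplest shape, one {{desc}} followed by {{l}} calls in the same language, separated by ", ", each with exactly two unnamed parameters.

```python
import re

# Matches {{desc|LANG|TERM}} or {{l|LANG|TERM}} with exactly two unnamed
# parameters. Named parameters (anything containing "=") are rejected.
TEMPLATE = re.compile(r"\{\{(desc|l)\|([^|{}=]+)\|([^|{}=]+)\}\}")


def merge_descendants(line: str) -> str:
    """Collapse '{{desc|xx|a}}, {{l|xx|b}}, ...' into '{{desc|xx|a|b|...}}'.

    Hypothetical helper: returns the line unchanged whenever it does not
    fit that simple shape, so nothing is rewritten by guesswork.
    """
    parts = [part.strip() for part in line.split(", ")]
    matches = [TEMPLATE.fullmatch(part) for part in parts]
    if len(matches) < 2 or not all(matches):
        return line
    first, rest = matches[0], matches[1:]
    if first.group(1) != "desc" or any(m.group(1) != "l" for m in rest):
        return line
    lang = first.group(2)
    if any(m.group(2) != lang for m in rest):
        return line  # mixed languages: leave for a human to sort out
    terms = [m.group(3) for m in matches]
    return "{{desc|" + lang + "|" + "|".join(terms) + "}}"


old = "{{desc|sc|piachere}}, {{l|sc|piaghere}}, {{l|sc|piazeri}}"
print(merge_descendants(old))
```

Anything carrying qualifiers, glosses or other named parameters falls outside this pattern and is deliberately left alone.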

Why don't you just use {{desc|sc|piachere|alts=1}}? (assuming the template page itself uses the {{alt}} template). Thadh (talk) 08:23, 23 June 2022 (UTC)[reply]
Yes, that works too, but see the last paragraph. Nicodene (talk) 08:27, 23 June 2022 (UTC)[reply]
Ah, I see. Thadh (talk) 08:33, 23 June 2022 (UTC)[reply]
@Nicodene, Thadh I actually implemented this exact thing about a year ago and wrote a script to convert existing {{desc}} pages that are incompatible with the new format (i.e. that use numbered params 3 or 4) to the new format, but Victar vetoed and reverted it, as he was wont to do, with no clear explanation. I planned to bring it to the Beer Parlour to get consensus but never got around to it. I think it is time to resurrect that code. Benwing2 (talk) 06:22, 24 June 2022 (UTC)[reply]
It does seem like a natural change to make. Nicodene (talk) 09:15, 24 June 2022 (UTC)[reply]
Unless there are objections, in the next day or so I'll implement this, following a plan something like this:
  1. Introduce an |alt= param in {{desc}}.
  2. Convert existing uses of |3= in {{desc}} to |alt=, and existing uses of |4= in {{desc}} to |t=. (There aren't actually very many uses of either parameter, esp. |3=, compared with the number of uses of {{desc}}.)
  3. Disallow using |3= and |4= in {{desc}}, so I can find any remaining uses of these params that weren't caught in the previous step.
  4. After a day passes, introduce the new format, where |3=, |4=, |5=, etc. are additional terms, similar to {{alt}} and {{syn}}. Benwing2 (talk) 16:45, 26 June 2022 (UTC)[reply]
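Step 2 of the plan above could look roughly like the following Python sketch. It is an assumption-laden illustration, not the actual conversion script: the parameters of one {{desc}} call are modelled as a plain list of strings, `migrate_desc_params` is a hypothetical name, and the old format is assumed to have no unnamed parameters beyond the fourth.

```python
# Old numbered parameter -> new named parameter, as described in step 2.
RENAMES = {"3": "alt", "4": "t"}


def migrate_desc_params(params: list[str]) -> list[str]:
    """Rewrite |3= to |alt= and |4= to |t= in one {{desc}} call.

    `params` holds everything after the template name, e.g.
    ["la", "foo", "fooalt", "gloss"]. Hypothetical helper.
    """
    out = []
    position = 0
    for param in params:
        name, sep, value = param.partition("=")
        if not sep:
            # An unnamed parameter takes the next position number.
            position += 1
            name, value = str(position), param
        new_name = RENAMES.get(name.strip())
        if new_name is None:
            out.append(param)  # language, term, and other named params
        elif value.strip():
            out.append(f"{new_name}={value}")
        # An empty |3= or |4= carries no information and is dropped.
    return out


print(migrate_desc_params(["la", "foo", "fooalt", "gloss"]))
```

Once no call uses positions 3 and 4 any more, those slots are free to be reused for additional terms, which is why the plan disallows them for a day before introducing the new meaning.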
I like the proposal. For instance, I had found the way that Beijing, Peking and Pei-ching are formatted in the Descendants section of 北京 (Běijīng) as wasteful/silly/inefficient/whatever. --Geographyinitiative (talk) 17:03, 26 June 2022 (UTC)[reply]
@Nicodene I pushed the new code to production. You can now use multiple terms with {{desc}}. Note that you can now specify properties like |bor1=, |unk2=, |der3=, etc. on individual terms, as well as |bor=, |unk=, |der= etc. that apply to all terms. I also added a property |inh=/|inh1=/|inh2=/etc. to make it clearer that something is inherited (esp. if you mix inherited and borrowed terms). I will update the documentation appropriately. Benwing2 (talk) 00:33, 3 July 2022 (UTC)[reply]
Also I have a script to convert existing multi-term usages to the multi-term {{desc}} format, which I will run soon (there are about 25,000 pages that can be fixed up in this fashion). Benwing2 (talk) 00:36, 3 July 2022 (UTC)[reply]
@Nicodene I am wavering over whether the |inh=/|inh1=/|inh2=/etc. property should only add the word "inherited" after the term (or all terms), or should also add a ">" before the term (or language name, in case of all terms) with a tooltip "inherited". The ">" with tooltip "inherited" is currently used when |unc= is specified without |der=. Benwing2 (talk) 01:35, 3 July 2022 (UTC)[reply]
See User:Benwing2/test-desc for a couple of examples. Benwing2 (talk) 01:40, 3 July 2022 (UTC)[reply]
Thank you, that is a huge QOL improvement. No more having to spam {{l}}.
As for |inh=, I'm one of those who prefer to separate borrowings entirely, placing them together at the bottom of the descendants section rather than mixing them with inherited forms. So I don't have any particular opinion on that parameter. Nicodene (talk) 05:50, 3 July 2022 (UTC)[reply]
@Nicodene The other change I'm planning on making is to switch |q=, |q1=, |q2=, ... to place its text before rather than after the term, and introduce |qq=, |qq1=, |qq2=, ... to specify a qualifier that goes after the term. This use of |q= and |qq= is consistent with {{syn}}, {{ant}}, etc. and the general placement of |q= qualifiers before rather than after a term is the majority usage everywhere. When I make this switch, I'll first introduce |qq=, then switch existing usages of |q= to use |qq= (not too many of them since |q= has never been documented), then make |q= an error, then wait a few days and re-introduce |q= with the new semantics (similar to the change to multiple terms in {{desc}}). Benwing2 (talk) 07:33, 3 July 2022 (UTC)[reply]
I find it more intuitive to place qualifiers after, so I'm glad we'll have both options. Also thank you for adding the q1, q2, etc. functionality. Nicodene (talk) 12:45, 4 July 2022 (UTC)[reply]

@Benwing2: these are important improvements, thank you. Ideally, {{desc}} would work with dialect modules like {{alter}}, but I don't know if that's feasible. --Vahag (talk) 12:02, 3 July 2022 (UTC)[reply]

@Vahagn Petrosyan Are you suggesting that it work like {{alter}} in that any numbered params after a blank param are qualifiers or dialect tags? That could be done, using the same data modules as for {{alter}}. Should they be displayed at the end after a dash? Benwing2 (talk) 19:49, 3 July 2022 (UTC)[reply]
The blank parameter would work with several descendants only if you could use it multiple times, e.g. {{desc|hy|descendant1||dialecttag|dialecttag||descendant2||dialecttag|dialecttag|dialecttag}}. You can't do that with {{alter}}. The dialect tags should probably be displayed like the usual qualifiers given with |q= or |qq=. Vahag (talk) 20:15, 3 July 2022 (UTC)[reply]
I also would have needed this now and then; this also solves the largest part of the problem that {{desctree}} cannot fetch multiple {{alter}}. Fay Freak (talk) 21:58, 3 July 2022 (UTC)[reply]
I am also here to voice my agreement, since less clunky code entails fewer blunders in the actual data as the editors see better what they enter, and now reading this I also acknowledge Nicodene’s explicit argument that it makes copying between alternative forms and descendants more straightforward. Fay Freak (talk) 21:58, 3 July 2022 (UTC)[reply]
@Vahagn Petrosyan, Fay Freak Another, maybe less clunky solution for the dialect tags is to use a param |tag1=, |tag2=, etc. Then you would say
{{desc|hy|descendant1|tag1=dialecttag,dialecttag|descendant2|tag2=dialecttag,dialecttag,dialecttag}}
The tags are comma-separated with no space after the comma; that way, if for some reason you need to enter a literal comma, it won't be recognized as a tag separator as long as there is a space after it. In practice, since as you suggest the tags should be displayed like qualifiers, there probably isn't a need to have literal commas in any case since you can always use the qualifier params. Benwing2 (talk) 01:27, 4 July 2022 (UTC)[reply]
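The separator rule described above (a comma splits tags only when no space follows it) can be expressed with a negative lookahead. The Python below is a minimal sketch of that rule only; the function name `split_tags` is made up, and the real module may well handle this differently.

```python
import re


def split_tags(value: str) -> list[str]:
    """Split a |tagN= value on commas that are NOT followed by a space.

    A comma followed by a space is kept as a literal comma inside the tag.
    """
    return re.split(r",(?! )", value)


print(split_tags("ion,post-Classical"))
print(split_tags("rare, literary,archaic"))
```

In the second call the first comma is followed by a space, so "rare, literary" stays together as a single tag.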
I like that solution. Vahag (talk) 15:00, 4 July 2022 (UTC)[reply]
@Vahagn Petrosyan I implemented this. You can write something like this:
  • {{desc|hy|մուկ|մուկը|tag2=տփ}}
or equivalently this:
  • {{desc|hy|մուկ|մուկը<tag:տփ>}}
The latter form uses what I term inline modifiers; essentially the same syntax is supported by {{syn}}/{{ant}}/etc. along with {{col2}}/{{col3}}/{{col4}}/{{der2}}/{{der3}}/{{der4}}/etc.
Both produce this:
  • {{desc|hy|մուկ|մուկը<tag:տփ>}}
You can also attach the tag logically to the whole collection of descendants like this:
  • {{desc|hy|մուկ|մուկը|tag=տփ}}
which produces this:
  • {{desc|hy|մուկ|մուկը|tag=տփ}}
You can put multiple comma-separated tags as long as there isn't a space after the comma:
  • {{desc|grc|παραγίνομαι|tag=ion,post-Classical}}
which produces this:
  • {{desc|grc|παραγίνομαι|tag=ion,post-Classical}}
You can also mix term-specific and non-term-specific tags if you really want, e.g.:
  • {{desc|grc|ἐμεωυτοῦ<tag:ion>|παραγίνομαι<tag:ion,post-Classical>|tag=rare}}
which produces this:
  • {{desc|grc|ἐμεωυτοῦ<tag:ion>|παραγίνομαι<tag:ion,post-Classical>|tag=rare}}
Benwing2 (talk) 03:10, 6 July 2022 (UTC)[reply]
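To make the equivalence between |tag2= and the inline <tag:...> form concrete, here is a small Python sketch. It is not the module's code: `parse_term` and `collect_tags` are hypothetical names, modifier keys are assumed to be lowercase ASCII letters, and only the per-term tag is modelled (the collective |tag= is left out).

```python
import re

# A well-formed tail is one or more <key:value> groups and nothing else.
MODIFIER_TAIL = re.compile(r"(?:<[a-z]+:[^<>]*>)+")
MODIFIER = re.compile(r"<([a-z]+):([^<>]*)>")


def parse_term(raw: str) -> tuple[str, dict[str, str]]:
    """Split 'term<key:value>...' into the bare term and its modifiers."""
    start = raw.find("<")
    if start == -1:
        return raw, {}
    term, tail = raw[:start], raw[start:]
    if not MODIFIER_TAIL.fullmatch(tail):
        return raw, {}  # not well-formed, so treat it as a plain term
    return term, dict(MODIFIER.findall(tail))


def collect_tags(terms: list[str], named: dict[str, str]) -> list[tuple[str, str]]:
    """Pair each term with its tag, from an inline modifier or from |tagN=."""
    result = []
    for index, raw in enumerate(terms, start=1):
        term, modifiers = parse_term(raw)
        tag = modifiers.get("tag", named.get(f"tag{index}", ""))
        result.append((term, tag))
    return result


with_param = collect_tags(["մուկ", "մուկը"], {"tag2": "տփ"})
with_inline = collect_tags(["մուկ", "մուկը<tag:տփ>"], {})
print(with_param == with_inline)
```

Both spellings end up as the same list of (term, tag) pairs, which is the sense in which the two forms are interchangeable.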
@Benwing2: this is great, thank you! Is this the correct syntax if I don't want the form to be linked? Also, since {{syn}} etc. use the same syntax, could the same functionality be added to them? Vahag (talk) 13:48, 6 July 2022 (UTC)[reply]
@Vahagn Petrosyan: Yes, that is the right syntax. I added dialect tags to {{syn}} etc. using the same syntax as above (either with separate params |tag1=, |tag2=, ... or inline modifiers). Benwing2 (talk) 02:14, 7 July 2022 (UTC)[reply]
@Benwing2: Please update the template documentation for clarity. “Inline modifiers are supported” is hardly a comprehensive statement of what is supported; particularly concerning the tag syntax as opposed to qualifiers. I think you just added the syntax but not what can be filled into that syntax; and I see that there are added parameters |tag= and |tag1= not in the parameter list. Fay Freak (talk) 11:31, 13 September 2022 (UTC)[reply]

collocations

Hi. I made a few changes like here which are possibly an improvement. I plan to do something similar to these entries which have useful information. Any suggestions on a better format before I plow ahead with them? Zumbacool (talk) 14:30, 23 June 2022 (UTC)[reply]

Looks good to me. — Fytcha T | L | C 18:29, 23 June 2022 (UTC)[reply]
More is better. DCDuring (talk) 21:11, 23 June 2022 (UTC)[reply]
I think this is a good idea. Not fond of the original format. — Sgconlaw (talk) 06:29, 24 June 2022 (UTC)[reply]
Or even better, I hope to inspire someone else to make these changes. I'll be away from Wiktionary for the next 7 years. Zumbacool (talk) 09:04, 24 June 2022 (UTC)[reply]

How to merge history of dictionary entry content at the wrong lemma form

The entry sortilegum currently has a misplaced definition: the noun meaning "soothsayer" in Latin actually has the nominative singular form sortilegus. Sortilegum is just its accusative singular form (reference: Logeion sortilegus). What is the best way to move the definition to the correct page?

The page on Template:merge says "Mergers of dictionary entries are inappropriate since each spelling gets its own page." The issue here is not that I want the spellings to be treated the same, but that I want to keep proper attribution for who contributed the definitions while moving them to the correct form. Do I need to request a history merge for the entries? What should I do in the meantime? Urszag (talk) 18:00, 23 June 2022 (UTC)[reply]

@Urszag I would simply copy the content of the entry, and when saving it on the new page, leave a link to the old page in the edit summary: "Content copied from [[sortilegum]]". Since the old page will be converted to a non-lemma form, its revision history will remain accessible for attribution purposes. This, that and the other (talk) 02:26, 24 June 2022 (UTC)[reply]

Where should the "diminutive of" template go?

I have seen the template "diminutive of" placed in the following positions:

- definition line: this seems most common when it by itself is used as the full definition, e.g. many Dutch examples such as aandeelhoudertje.

- etymology section: In Latin, where I edit the most, I prefer to use the "af" template to specify which of the several Latin diminutive suffixes was used in the formation of the word. I guess both templates could be used alongside each other in this section.

- on the headword line, e.g. fabella. I copied this formatting from other entries where I had seen it, but I think I saw recently that this should not be used because it doesn't play nice with bots. Can anyone confirm that?

My current impression is that the best place would nearly always be the definition line, and a further explanation if needed can be added after a colon. For example, I edited tusculum to have the following definition line: "Diminutive of tūs: a small amount of incense". Is this the format I should follow in the future for entries that I work on? Urszag (talk) 18:25, 23 June 2022 (UTC)[reply]

Read the documentation. The definition line.
"This template is not meant to be used in etymology sections."
Ideally the headword should have a parameter for diminutives. Vininn126 (talk) 18:26, 23 June 2022 (UTC)[reply]
Templates that end in "of" are always meant for definition lines. — Fytcha T | L | C 18:28, 23 June 2022 (UTC)[reply]
Thanks, oops, I didn't check that. I guess the only thing I have left to wonder about is whether the formatting with a colon and then further explanation is standard and correct to follow.--Urszag (talk) 18:29, 23 June 2022 (UTC)[reply]
It's common. I'd say if you just want to state the obvious (i.e. repeat what is in the linked-to article anyway) then don't do it but if there is something non-obvious about that particular form then yeah, feel free — Fytcha T | L | C 18:37, 23 June 2022 (UTC)[reply]

WT:ETY lemma form clause

In my opinion, we should add the following clarification to WT:ETY (i.e. Wiktionary:Etymology) stating what should be obvious, but apparently isn't:

The etymology for a term should not discuss details regarding the particular lemma form used, unless there are noteworthy details to mention. For example, with languages that lemmatize verbs under infinitive forms, the verb etymology should not describe the etymology of the infinitive form (including an infinitive suffix), but of the verb in general. This is because the choice of which form to use as that lemma has no influence on how the word itself has developed (outside of some rare circumstances).

This does not apply to English where nouns and verbs are generally zero-marked in their lemma forms, but they aren't in most languages. In Finnish, verbs are lemmatized under the first infinitive forms (ending in -a, -da, -ta or their front-vowel equivalents), but our etymology sections shouldn't focus on them. tehdä has the etymology for the verb with the stem teke-, not for the (first) infinitive form "tehdä", and so on. Lots of Polish entries list -ować as a derivational affix for loanwords, yet from the inflectional table it is apparent that this suffix only appears in the infinitive form, and it seems the suffix does not appear at all in the actual verb stem. I'm sure there are other examples as well. — SURJECTION / T / C / L / 22:42, 23 June 2022 (UTC)[reply]

I agree on all counts (compare also less obvious examples like Ingrian obižoittaa).
I'm also wondering if we shouldn't all chip in and finally finish WT:ETY so it can become policy (this is a good example of a point that may ultimately need enforcing, but doesn't have a natural place to be noted). Thadh (talk) 22:50, 23 June 2022 (UTC)[reply]
WT:EL (official policy!) already even links to WT:ETY so yes, turning WT:ETY into an official policy should be a priority. — SURJECTION / T / C / L / 22:52, 23 June 2022 (UTC)[reply]
Note that this is also the reason it is standard in works discussing etymology to use stems as opposed to lemmas in languages where they are different. Basically all works discussing etymologies of Finnish verbs (at least those that are not written for consumption by the general public) use the verb stem (teke-), not the lemma (the (first) infinitive form, tehdä). — SURJECTION / T / C / L / 22:52, 23 June 2022 (UTC)[reply]
For languages with multiple infinitive morphemes, we would lose the frequency data of how often each one has been used constructively, see Romanian -i/-a/ (the second one is not filtered by function yet). Not sure if we care about this, just pointing it out.
Another point is that, if we want to analyze German abspecken as only ab- +‎ Speck, then analyzing formalisieren as formal +‎ -isieren would be inconsistent whereas formal +‎ -isier would be nonstandard (DWDS and Duden recognize the suffix as including the infinitive ending -en). — Fytcha T | L | C 23:50, 23 June 2022 (UTC)[reply]
If the Romanian verb suffixes have no difference in how they affect the verb stem (only the infinitive) and are parallel forms, then yes, on this argument, which one was used shouldn't be documented. But without knowing any better, I think -i and -a do have some differences in that regard.
We wouldn't write formal +‎ -isier in formalisieren even with this proposal. It is true that formalisier- is the stem, but we're linking to a suffix here and it makes sense to use the lemma in such a case. That doesn't mean we're documenting the infinitive form - we are still documenting the verb, but using the "lemma form" of the suffix to represent that suffix. — SURJECTION / T / C / L / 07:35, 24 June 2022 (UTC)[reply]
Basically the same as the above point, but in many fusional European languages, it's common for learners' resources to teach as "endings" portions of the verb that actually contain part of the verb stem. For example, Latin learners are often taught about verbs taking different "suffixes" -āre, -ēre, -ere, -īre, and -ō, -eō, -(i)ō, -iō depending on their conjugation class, but from a more morphology-oriented perspective, it makes more sense to regard the infinitive suffix as just -re and to treat the "thematic vowels" -ā-, -ē-, -ī- etc as part of the stem (and in some cases, that vowel is etymologically a suffix to the base root). These kinds of subtleties might not be apparent just from the guide "should not describe the etymology of the infinitive form (including an infinitive suffix)". I'm thinking in particular of the large class of denominative first-conjugation Latin verbs in -āre/-ō: arguably, these are not cases of zero derivation, but cases of suffixation with the theme vowel -ā- (a suffix which we represent awkwardly by citing the first-person singular form -ō, in line with our current convention for the lemma form of Latin verbs), but given that this theme vowel is not overtly present in the first-person singular citation form, an editor reading the revised proposal above might easily come to the false conclusion that "bellō" (a verb with the stem bellā-, derived from the noun bellum which has the distinct stem bello-) should be given as a zero-derived form since the -ō ending is "just an inflectional suffix".--Urszag (talk) 01:54, 24 June 2022 (UTC)[reply]
I'm also not sure it's right to say that Polish -ować "only appears in the infinitive form": the Polish infinitive ending seems to be -ć, so that only appears in the infinitive, but the rest of the paradigm is not the same as that of a verb without the suffix such as badać. I don't know what the best analysis of the actual derivational suffix would be (maybe -ow-?), but, as in Latin, this seems to be a case where there really is a derivational suffix involved, it's just that Wiktionary obscures the morphological structure of the word by combining the derivational suffix with the inflectional ending of the word category's citation inflected form.--Urszag (talk) 01:59, 24 June 2022 (UTC)[reply]
If the proposed text doesn't make my above point clear enough (about how using the lemma ("infinitive") forms of suffixes to represent those suffixes is fine), then it should be adjusted, perhaps by adding

However, representing morphological features, such as affixes, with their lemma forms is acceptable and generally preferred.

It's possible also that -ować does have some morphological representation in the actual stem. In that case, if I was wrong, it should be documented. — SURJECTION / T / C / L / 07:35, 24 June 2022 (UTC)[reply]
-ować is frequently used to turn even native nouns into verbs, e.g. bajer and bajerować, and frequently leaves traces in other suffixes, e.g. the -ow in biodegradowalny. You would almost never use -ć to mark a change, but -ować and other similar infinitive markers are used. This creates a situation where a word like autować is cited as being borrowed from English out, but it has a native-like suffix (one that isn't just morphological, as it's used with other native-like words). I think comparing German -en to -ować is a false comparison, as German -en is much closer to -ć, like Urszag pointed out. Vininn126 (talk) 11:21, 24 June 2022 (UTC)[reply]
(Notifying Hergilei, Tweenk, Shumkichi, Wrzodek, Asank neo, KamiruPL, BigDom, Hythonia, Tashi, Luxtaythe2nd, Max19582): Because this affects Polish entries quite a bit. Vininn126 (talk) 11:28, 24 June 2022 (UTC)[reply]

What about creating a new template, {{adapted}}, which would have an output like this:

{{adapted|pl|en|out|-ować}}
English out, adapted using the suffix -ować
Category:Polish terms borrowed from English Category:Polish words adapted using the suffix -ować

Or something similar? This would ensure that we do have the frequency data Fytcha was talking about, yet it's not really a suffixed term (the category is slightly different, and we'd probably put the adapted category as a daughter to the suffixed category). So, would that solve this problem? Thadh (talk) 11:48, 24 June 2022 (UTC)[reply]
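
For illustration, the output proposed above can be mocked up in Python. This is only a sketch of the idea under discussion, not an implementation of any existing template: the function `adapted` and the tiny `LANGUAGE_NAMES` table are made up, and the wording and category names are taken directly from the example above.

```python
# Tiny hypothetical lookup; a real template would use the site's language data.
LANGUAGE_NAMES = {"pl": "Polish", "en": "English"}


def adapted(dest: str, source: str, term: str, affix: str) -> tuple[str, list[str]]:
    """Return the display text and categories for an adapted borrowing."""
    dest_name = LANGUAGE_NAMES[dest]
    source_name = LANGUAGE_NAMES[source]
    text = f"{source_name} {term}, adapted using the suffix {affix}"
    categories = [
        # The term still counts as a borrowing...
        f"{dest_name} terms borrowed from {source_name}",
        # ...while the affix is tracked separately from ordinary suffixation.
        f"{dest_name} words adapted using the suffix {affix}",
    ]
    return text, categories


text, categories = adapted("pl", "en", "out", "-ować")
print(text)
for category in categories:
    print("Category:" + category)
```

The point of the second category is that the frequency data for the affix is preserved without the entry being counted as an ordinary suffixed term.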

I think this would be the best solution - it reduces clutter in the affixed category by removing borrowed words, while providing a more precise one. We could even set it up so that the adapted category shows up as a sort of subcategory of the affixed one. Vininn126 (talk) 11:50, 24 June 2022 (UTC)[reply]
(read what's in the brackets) Thadh (talk) 11:59, 24 June 2022 (UTC)[reply]
What's wrong with {{affix}} |langN=? The fact that it doesn't categorize terms as "borrowed"? — SURJECTION / T / C / L / 12:33, 24 June 2022 (UTC)[reply]
Yes. Vininn126 (talk) 12:34, 24 June 2022 (UTC)[reply]
I prompted it. If there is derivation, what makes it a direct borrowing? — SURJECTION / T / C / L / 12:37, 24 June 2022 (UTC)[reply]
It is borrowing a verb and just applying a verb suffix to make the word operate normally within the language - speakers using it still consider it a borrowing - and I do not think ignoring the suffix makes sense either, for reasons I stated above. If it were just -ć, then I would, but it's not. If there were a change in part of speech or meaning, then the derived meaning makes more sense, but with these words you're taking a foreign word and all of its meanings. Vininn126 (talk) 12:41, 24 June 2022 (UTC)[reply]
I see the point, but then again, there are also cases where Polish has borrowed verbs without adaptation. — SURJECTION / T / C / L / 12:56, 24 June 2022 (UTC)[reply]
That is true. These are probably cases where they were borrowed in a slightly different, unattested Old Polish form where sound changes further changed them to their modern form. That said, Old Polish had -ć instead of -ti already. In such cases I do not think applying a suffix would make any sense, seeing as Czech had the same/similar system for verb endings. Vininn126 (talk) 13:00, 24 June 2022 (UTC)[reply]
There is a difference between affixation and semantic matching of a suffix and a borrowed term, is the point. Thadh (talk) 15:44, 24 June 2022 (UTC)[reply]
If the consensus is to add a template like this, I can implement it. The parameters need to be thought out, though - I'd imagine not all languages do adaptation with just suffixes. — SURJECTION / T / C / L / 09:12, 25 June 2022 (UTC)[reply]
I was thinking of something like {{affix}} does: suffixes, infixes and prefixes are recognised by the combining hyphens.
I think we should wait though, seeing as we haven't had much input on this idea yet. Thadh (talk) 09:40, 25 June 2022 (UTC)[reply]
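Recognising the kind of affix from its combining hyphens, as suggested above, might be sketched like this in Python. The function name `affix_type` and the label for the doubly hyphenated case are assumptions for the example; the labels an actual template would use may differ.

```python
def affix_type(part: str) -> str:
    """Classify a template argument by where its combining hyphens sit."""
    if not part.strip("-"):
        return "term"  # empty or hyphens only: nothing to classify
    starts, ends = part.startswith("-"), part.endswith("-")
    if starts and ends:
        return "infix"   # e.g. -o-
    if starts:
        return "suffix"  # e.g. -ować
    if ends:
        return "prefix"  # e.g. ab-
    return "term"        # e.g. Speck


for part in ["-ować", "ab-", "-o-", "Speck"]:
    print(part, affix_type(part))
```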
Module:User:Surjection/adapted, Template:User:Surjection/adapted and Module:User:Surjection/category tree/poscatboiler/data/terms by etymology (to be merged to Module:category tree/poscatboiler/data/terms by etymology) exist now. If the idea needs changes, it's easier to change those than to restart from scratch. — SURJECTION / T / C / L / 13:25, 25 June 2022 (UTC)[reply]
I've thought about this and my main concern is introducing another etymology template and set of categories. We already have quite a lot of such templates, and with another one we're likely to have entries randomly categorized into either one for an indefinite amount of time, as we do with 'derived' vs. 'borrowed' categories. That said, I'm not opposed to an {{adapted}} template but I don't think it should just have the same format as {{affix}}; if that's the case we may as well just use {{affix}} itself and modify it appropriately. (Note, recently I added support for replacing |langN= with prefixing the term itself by the language code, so instead of {{affix|pl|out|-ować|lang1=en}} you can say {{affix|pl|en:out|-ować}}.) In the case of {{adapted}}, I think we'd at least want a way of specifying that affix X in the source language was replaced with Y in the destination language, or was removed entirely, which is common if the source language is inflected and the destination language respectively is or is not inflected. Benwing2 (talk) 16:39, 26 June 2022 (UTC)[reply]
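The language-prefix shorthand mentioned above ({{affix|pl|en:out|-ować}} instead of |lang1=en) amounts to splitting an optional code off the front of each part. The Python below is a rough sketch under stated assumptions: `split_lang_prefix` is a hypothetical name, and the pattern for what counts as a language code (two or three lowercase letters, optionally followed by hyphenated pieces) is a guess for the example, not the rule the module actually applies.

```python
import re

# Assumed shape of a language code: "en", "grc", "ine-pro", and so on.
LANG_PREFIX = re.compile(r"^([a-z]{2,3}(?:-[a-z]+)*):(.+)$")


def split_lang_prefix(part: str, default_lang: str) -> tuple[str, str]:
    """Return (language code, term) for one template argument.

    Parts without a recognisable prefix belong to the entry's own language.
    """
    match = LANG_PREFIX.match(part)
    if match:
        return match.group(1), match.group(2)
    return default_lang, part


print(split_lang_prefix("en:out", "pl"))
print(split_lang_prefix("-ować", "pl"))
```

With the prefix form, each part carries its own language, so the parts and their |langN= parameters cannot drift out of step when arguments are reordered.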

remove template data from doc pages

See {{blend}} for an example of template data. It is big, bulky, and ugly, and it turns out to be impossible to auto-generate from Lua (I tried). Whoever designed this system should be shot as it violates all the principles of good software design. Keeping the template data in sync with the rest of the documentation has to be done manually and as a result almost never happens, so I doubt this stuff is very useful. (Does anyone actually make use of it, and if so, how?) I propose simply removing the stuff from all doc pages where it currently exists. Benwing2 (talk) 06:17, 24 June 2022 (UTC)[reply]

Without it, the output produced by newbies using VisualEditor is liable to be even more cancerous, as it also has instructions on parameter spacing. —Fish bowl (talk) 06:27, 24 June 2022 (UTC)[reply]

Translations in persons

The non-lemma persons has a translation box. Usually I just remove these but here I could kind of see the benefit of having it. Thoughts? — Fytcha T | L | C 10:41, 24 June 2022 (UTC)[reply]

Are there languages in which the translation of the plural is different from the plural of the translated singular? There may be special cases, as with the irregular plurals of the Dutch suffix -man. But if not, I don’t get why this should get a special treatment.  --Lambiam 17:42, 24 June 2022 (UTC)[reply]

Should we consider changing the order of sections in reconstructed entries for proto languages (like PIE)?

I've noticed for PIE reconstructions, you first see a long list of descendants that are actually under the "Derived Terms" section, from various derived, often compound terms and not the actual main root itself. Only when you scroll down to the Descendants proper section below do you get the more direct descendants... which can be kind of misleading. I think it makes more sense to have those simple direct descendants appear above the ones from all the derived terms. I know we have Derived Terms above Descendants for other lemmas, but the difference is with PIE we actually tend to add descendants of those as well, unlike with other languages. Word dewd544 (talk) 23:01, 24 June 2022 (UTC)[reply]

Not sure. With the current layout, at least you know you may have to scroll down. With the inverse exception you still would scroll down. A reliable or predictable order of appearance is a value. Fay Freak (talk) 00:15, 26 June 2022 (UTC)[reply]
I'd support this change given the special nature and layout of these entries. It's confusing as it currently stands. This, that and the other (talk) 01:11, 27 June 2022 (UTC)[reply]
Yes, it'd be nice to immediately see the direct reflexes or whether there are any. Nicodene (talk) 03:48, 29 June 2022 (UTC)[reply]

Sanskrit romanizations

When discussed by scholars writing in languages like English or German, terms in this language are often only given in their romanized transliterated forms. This often makes it very difficult to find the entries for Sanskrit words on Wiktionary. Therefore I think it would be useful if we had Sanskrit romanizations that link to those entries. ᛙᛆᚱᛐᛁᚿᛌᛆᛌProto-NorsingAsk me anything 16:35, 27 June 2022 (UTC)[reply]

support. Makes a lot of sense. Sartma (talk) 23:11, 27 June 2022 (UTC)[reply]
A good example of this is the Etymologisches Wörterbuch des Altindoarischen. Another is the Metrically Restored Text of the Rigveda. ᛙᛆᚱᛐᛁᚿᛌᛆᛌProto-NorsingAsk me anything 12:46, 28 June 2022 (UTC) [reply]
I don’t know much about Sanskrit. Is there an internationally recognized system of romanization that can be adopted? — Sgconlaw (talk) 05:30, 29 June 2022 (UTC)[reply]
This has come up before multiple times, most recently not all that long ago: Wiktionary:Beer_parlour/2022/February#We_should_allow_romanized_Sanskrit_(IAST)_as_entries
Re: romanization systems, online dictionaries like Sanskrit Dictionary use IAST. ‑‑ Eiríkr Útlendi │Tala við mig 08:22, 29 June 2022 (UTC)[reply]
If there is an established system of romanization, I can't see any strong objection to the proposal. Seems to me to be akin to what is done with Japanese entries. — Sgconlaw (talk) 17:25, 29 June 2022 (UTC)[reply]
Well, there seems to be a majority opinion for having romanizations. Should a vote be held on it or can users just go ahead and start creating the entries (or better, have a bot do it)? (edit: forgot to sign) ᛙᛆᚱᛐᛁᚿᛌᛆᛌProto-NorsingAsk me anything 19:45, 29 June 2022 (UTC)[reply]
Probably best to let this discussion go on for a bit longer, and then put it to a formal vote since it will involve a major change to the dictionary. — Sgconlaw (talk) 20:48, 29 June 2022 (UTC)[reply]
There actually was a vote previously and the proposal failed, which I think makes it even more important to hold another one to "officially" overturn old consensus. 98.170.164.88 20:49, 29 June 2022 (UTC)[reply]
Was there a formal vote at Wiktionary:Votes, or are you referring to the previous Beer Parlour discussion? — Sgconlaw (talk) 21:02, 29 June 2022 (UTC)[reply]
@Sgconlaw: I was actually mistaken; there were three formal votes, not including various BP discussions and the like:
98.170.164.88 21:09, 29 June 2022 (UTC)[reply]
Hmmm. In that case it might be difficult to obtain a different consensus on the matter. All the more reason to let the discussion here go on for longer to gauge editors' opinions. — Sgconlaw (talk) 21:15, 29 June 2022 (UTC)[reply]
  • FWIW, I support any reasonable means to improve usability and discoverability for Wiktionary in general. For terms written in scripts other than the Latin alphabet (such as Sanskrit, Gothic, or Japanese), adding romanized entries as soft redirects is one way to support our user base -- and, as the English Wiktionary, we can only safely assume that our users understand English (to some degree) and are able to input using ASCII.
Others have pointed out before that, so long as an entry for a non-ASCII headword includes an alphabetical representation of the term somewhere, our search algorithms are likely to find that. So long as this works correctly for Sanskrit entries, that reduces the need for romanized entries.
In addition, I'd also like to point out that some users enter the term they're looking for directly into the URL, such as en.wiktionary.org/wiki/TERM. Terms entered this way do not invoke our search algorithm, and thus cannot return hits the same user-friendly way. Romanized entries do improve Wiktionary's overall utility, and considering WT:NOTPAPER, I see no particular negative impact from including romanized entries -- provided that these are only soft redirects and do not include any information that would require updating or other maintenance over time. ‑‑ Eiríkr Útlendi │Tala við mig 18:39, 30 June 2022 (UTC)[reply]
Yeah me too, I support this at least for Sanskrit for similar reasons to User:Eirikr. Benwing2 (talk) 00:13, 3 July 2022 (UTC)[reply]
Not sure that a bare soft redirect will be feasible, as I suspect there may be multiple Sanskrit entries with the same romanization. We will probably have to follow what is done for Japanese (see konnichi wa) and Mandarin (nǐ hǎo) (unless that’s what’s meant by a soft redirect?). — Sgconlaw (talk) 05:55, 3 July 2022 (UTC)[reply]
@Sgconlaw A soft redirect is a page that directs the user to go to another page, e.g. using {{alternative form of}}, {{romanization of}}, etc. When you say a "bare soft redirect" you may be thinking of a hard redirect, which uses #REDIRECT to redirect the page automatically. Benwing2 (talk) 01:45, 4 July 2022 (UTC)[reply]
@Benwing2: thanks. So I guess a page like konnichi wa is a soft redirect? — Sgconlaw (talk) 04:15, 4 July 2022 (UTC)[reply]
@Sgconlaw: Yes. Benwing2 (talk) 04:22, 4 July 2022 (UTC)[reply]
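Note: the distinction drawn above between hard and soft redirects can be sketched in a few lines of Python. This is only an illustration, not how MediaWiki or Wiktionary actually implements it. The function name `classify` and the sample page texts are hypothetical, and the list of soft-redirect templates is limited to the two named in the discussion.

```python
import re

# A hard redirect page begins with "#REDIRECT [[Target]]" (matched case-insensitively).
HARD_REDIRECT = re.compile(r"^\s*#REDIRECT\s*\[\[([^\]|#]+)", re.IGNORECASE)

# Templates named in the discussion that point the reader to another page.
# Assumption: only these two are checked; a real list would be longer.
SOFT_TEMPLATES = ("alternative form of", "romanization of")


def classify(wikitext: str) -> str:
    """Return 'hard', 'soft', or 'none' for a page's raw wikitext."""
    if HARD_REDIRECT.match(wikitext):
        return "hard"
    lowered = wikitext.lower()
    if any("{{" + name + "|" in lowered for name in SOFT_TEMPLATES):
        return "soft"
    return "none"


# Hypothetical sample pages, for illustration only.
hard_page = "#REDIRECT [[win-win]]"
soft_page = "==Japanese==\n# {{romanization of|ja|こんにちは}}"
plain_page = "==English==\n# A greeting."
```

A hard redirect sends the reader on automatically, while a soft redirect is an ordinary page whose definition line tells the reader where to go. That is why a soft redirect can list several targets when multiple entries share one romanization.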

Hyphen–dash entry title discussion


Not sure whether I should have added to the existing discussion at Wiktionary:Beer_parlour/2019/January#Hyphens_and_dashes_in_entry_titles (as I have already done) or started a new one. I don't feel that the 2019 discussion was resolved. —DIV (1.145.44.125 06:22, 29 June 2022 (UTC))[reply]

Are there any new developments? Looks like it went the way of the curly quotes discussion, which also comes up from time to time to say hello. – Jberkel 20:17, 29 June 2022 (UTC)[reply]
Less filling! Tastes great!
Less filling! Tastes great!
Benwing2 (talk) 01:26, 30 June 2022 (UTC)[reply]
I am strongly opposed to this becoming an argument about Unicode codepoints, because I feel that misses the point. My personal preference is that we should use en-dashes when we mean a dash and a hyphen when we mean a hyphen, with the appropriate redirects. There is a semantic difference that exists between them, irrespective of which Unicode character people sometimes use. In a similar vein to the current vote on misspellings, what matters is intention. However, it should only affect a relatively small number of entries anyway. Theknightwho (talk) 02:16, 30 June 2022 (UTC)[reply]
I would probably, in theory, prefer us to use hyphens and en-dashes in the way traditionally dictated by typography (as I've mentioned before, like Smith-Jones-Brown syndrome referring to 3 people, and Smith-Jones–Brown syndrome being the syndrome of Smith and of Jones-Brown). This is clearly a situation where we tend to be hamstringed by the computer keyboard: I personally find it annoying every time I have to type an em or en dash (and Google knows I often search for "em/en dash" to find the bastard, bc it's quicker than digging through Windows Character Map). I think our page lookup/search can handle it, also: if you type the "wrong" dash it will still suggest (or maybe even redirect to) the existing one. End of the day, we are still such a slapdash mess, with rules inconsistently followed, that I don't imagine people would use the dashes correctly even if we tried to enforce it; and imagine all the concomitant issues with links etc. I suppose I see this as a good idea whose time hasn't come yet, due to technical and user limitations. Equinox 03:28, 30 June 2022 (UTC)[reply]
(Mind you I fucking hate curly quotes so perhaps you shouldn't listen to me.) Equinox 03:28, 30 June 2022 (UTC)[reply]
I like curly quotes. I only dislike the misdirected instances, like "in the ‘burbs" (should be "in the ’burbs", of course). But either way it doesn't deter me from listening to you :-) —DIV (49.179.13.5 12:03, 6 August 2022 (UTC))[reply]
Sorry to be picky, but in your example shouldn't it be
Smith–Jones–Brown syndrome referring to 3 people, and Smith–Jones-Brown syndrome being the syndrome of Smith and of Jones-Brown
?
I wouldn't like to start up a big discussion of Unicode, but just FYI, my usual practice is (in Windows) to hold Alt while typing 0150 on the numerical keypad to get an en-dash (for em-dash use 0151). In Windows I also have the Character Map tool readily accessible.
FWIW, in Wikipedia it's available to click in the widget-thing under the text-editing window. On the other hand, a lot of people seem to manually type the encoding &​ndash; in Wikipedia.
What puzzles me is that there are literally no known cases (or no cited cases in these discussions) where en-dash is being used in entry titles on (English) Wiktionary. I would expect that it should work much like Wikipedia:
  • various authors create content
  • some care about en-dash and insert it appropriately,
  • others don't;
  • various readers/editors peruse Wiktionary
  • some notice the misused/absent en-dash and fix it,
  • some notice the misused/absent en-dash and either don't care enough or don't know enough to fix it themselves,
  • some don't notice.
Take the case of win-win, the current title. I would say that it should be possible for an editor to create win–win and at a bare minimum redirect it to win-win. And then — perhaps once that type of minimal usage on Wiktionary has been established — shift the content over to win–win, leaving a redirect at win-win.
—DIV (49.179.13.5 12:27, 6 August 2022 (UTC))[reply]
P.S. Any usage on other-language Wiktionaries that we could take a lead from? 49.179.13.5 12:27, 6 August 2022 (UTC)[reply]
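Note: the first step proposed above (create the en-dash title and redirect it to the existing hyphenated title) can be sketched as follows. This is a minimal illustration under stated assumptions, not an existing Wiktionary tool. The function names are hypothetical, and only the en dash is handled.

```python
HYPHEN = "\u002D"   # HYPHEN-MINUS, the character typed with the "-" key
EN_DASH = "\u2013"  # EN DASH


def hyphenated_title(title: str) -> str:
    """Return the title with every en dash replaced by a plain hyphen."""
    return title.replace(EN_DASH, HYPHEN)


def redirect_wikitext(title: str):
    """Return hard-redirect wikitext for an en-dash title, or None if
    the title contains no en dash and so needs no redirect."""
    target = hyphenated_title(title)
    if target == title:
        return None
    return "#REDIRECT [[" + target + "]]"


# The Windows Alt codes mentioned above are Windows-1252 byte values:
# 150 decodes to the en dash and 151 to the em dash.
alt_0150 = bytes([150]).decode("cp1252")
alt_0151 = bytes([151]).decode("cp1252")
```

For the later step of moving the content to the en-dash title, the same mapping would simply be applied in the opposite direction, with the redirect left at the hyphenated title.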

Citing le Reddit etc.


I just created fakeclaim with non-durably-archived citations (Reddit etc.) because "it's obviously a word" and we had that vote a few months ago; I suppose it might well pass through the new "editors' discussion" provision if challenged. Is this, in general, a good thing to do with net slang now? Equinox 19:49, 29 June 2022 (UTC)[reply]

Can we use any online source? Do we prefer centralized ones like Twitter, Reddit, YouTube comments, etc.? 98.170.164.88 19:53, 29 June 2022 (UTC)[reply]
I would cautiously suggest that "popular" sites are better (which in 2022 means social media and newspapers), bc they are more likely to stay online for a long time, and to be archived. However, that may be a distraction. I would like more Wikt user input on whether I should continue to do what I did, i.e. use "realistic" (non-spammy) but non-CFI cites to add plausible words that wouldn't pass otherwise. Equinox 03:30, 30 June 2022 (UTC)[reply]
To be honest, we should be using the Internet Archive (or some other archiving site) for every non-durably archived cite. Otherwise, loads of these s/cites will be dead in 5 years. Theknightwho (talk) 21:45, 30 June 2022 (UTC)[reply]
Yes and absolutely no. —Justin (koavf)TCM 05:46, 9 August 2022 (UTC)[reply]
Cites from popular sites that seem to archive and not periodically delete cites seem worth having if they clearly illustrate meaning or usage. Whether they should count for attestation is another matter. If no one challenges the entry, then it probably has some validity, however ephemeral. DCDuring (talk) 16:13, 30 June 2022 (UTC)[reply]
So we're good until WF decides to rules-lawyer us. Equinox 17:33, 30 June 2022 (UTC)[reply]
We've been allowing Usenet posts to be used to attest terms for over ten years. Allowing Twitter, Reddit, etc. posts to be used for attestation is functionally no different. Wiktionary has been built by the kind of original research that would never pass muster on Wikipedia. In my understanding WF projects have a certain degree of latitude in setting policies to suit their own unique needs. As to the question of whether we can start citing Reddit etc., I've been waiting for this shoe to drop for months. "It's better to ask forgiveness than permission," as the adage goes. WordyAndNerdy (talk) 22:19, 1 July 2022 (UTC)[reply]
Wonderfool projects seem to be pretty laissez-faire. 98.170.164.88 01:45, 2 July 2022 (UTC)[reply]
I honestly thought that was being used as an acronym for "Wikimedia Foundation." WordyAndNerdy (talk) 02:23, 2 July 2022 (UTC)[reply]
I created New Englishwoman with two durably archived quotations of New Englishwomen, one of hyphenated New-Englishwomen, and one web-based quotation of New Englishwoman. Under a super-strict interpretation of CFI, this might not pass, but I think it's reasonable. Are we allowed to RfV our own entries just to make sure the web quotations are considered sufficient? 98.170.164.88 23:35, 2 July 2022 (UTC)[reply]
I oppose allowing citations from Reddit and other similar anonymous comment sections (including UseNet) until we get back to the idea that citations are supposed to be evidence of widespread usage, and not evidence of any usage ever. Right now I firmly believe that some editors think it is a good idea to include terms which have been used literally three times ever in history, and that is just stupid. The door is too wide open at the moment and it is making Wiktionary a progressively worse dictionary.
Unrelatedly, I wish people would stop putting citations in the entry itself and put them on the citations page; the entry should only have a citation if it is for the purpose of demonstrating how the term is used. - TheDaveRoss 12:59, 8 August 2022 (UTC)[reply]
Since when is that the rule? I always put citations on the entry page. I aim for three. I create a citations page if I happen to have more handy. bd2412 T 06:00, 9 August 2022 (UTC)[reply]
It isn't a rule, it is my wish. It does have a lot of benefits - it keeps information which is not germane to understanding the term out of the entry, which keeps the entry smaller and easier to edit/read. A good usage example is a great thing to have in an entry, but having many citations which sometimes are clear usage examples and sometimes are very murky and difficult to parse doesn't add to the utility of the entry in my view. - TheDaveRoss 12:38, 9 August 2022 (UTC)[reply]
@TheDaveRoss: While the editing part of "which keeps the entry smaller and easier to edit/read" has some merit, the reading part doesn't because quotations are minimized to a button by default. And for the editing part: the slight discomfort while editing is the much lesser evil compared to having to switch to a different page where one has to again find the appropriate language and sense just to read some quotations. If we followed your approach, it would also not be clear whether switching to the citations page was even worth it; who knows whether a sense has quotations? All in all, a lot of wasted time for our readers if we did it like you wished. — Fytcha T | L | C 18:04, 9 August 2022 (UTC)[reply]
You are assuming that the reader has JavaScript enabled, and that they are reading locally, and that they are a human. Two of those are probably minority groups. The number of times that a reader will care about citations is vanishingly small compared to other use; it is only we editors who care about citations. - TheDaveRoss 18:35, 9 August 2022 (UTC)[reply]
@TheDaveRoss: If you've spent any significant time citing words, you'll know that "terms for which there can be found only three admissible quotations" and "terms which have been used literally three times ever in history" (quoting you here) are not the same by many orders of magnitude. Google Books returns 10,500 hits for Dachschaden, 7,330 hits for Fahrradkette, 191 hits for Fitnessteller (all these figures are massively highballed because Google Book hits become spurious no-previews relatively quickly) even though all these words have been uttered and written down millions of times by native speakers (the first one probably coming close to a billion). — Fytcha T | L | C 17:57, 9 August 2022 (UTC)[reply]
I bet you a nickel I have added more citations to Wiktionary than you have. My argument is that the CFI may work for words which are primarily used in spoken language, since showing up a few times on a UseNet is likely evidence that the word is being spoken thousands of times for every instance it is written. For words which are primarily written on UseNet the CFI fails entirely, since they may never actually be spoken at all, and we have the ability to find virtually every use ever. If a word has not been used or encountered thousands of times by thousands of people it probably shouldn't be in the dictionary. - TheDaveRoss 18:42, 9 August 2022 (UTC)[reply]
@TheDaveRoss: Sorry, the first part wasn't to imply that you haven't added a significant number of quotations to Wiktionary if that's what you got from it; it was more just like "if you've done X, you know about Y". — Fytcha T | L | C 18:45, 9 August 2022 (UTC)[reply]
I didn't think you had any ill intent, I am just feeling ornery. I am fully aware that you do a lot of good work here. - TheDaveRoss 19:18, 9 August 2022 (UTC)[reply]
Any sufficiently motivated person could "make fetch happen" by gaming print sources. CFI was designed to be a filter, not an impenetrable firewall. At a certain point the unwritten "no online sources except Usenet" policy began doing more damage to our collective lexicographic endeavour than it prevented. I'd say that point was at least ten years ago. WordyAndNerdy (talk) 05:09, 9 August 2022 (UTC)[reply]
There's a difference between "making fetch happen" by persuading other people to use it as a word with the assigned meaning, and doing so by merely going to a bunch of accessible messageboards and putting down sentences using "fetch" as an adjective, without anyone else ever taking it up and using it with that meaning. Even where scientists and philosophers coin words, we look to see if they have been adopted by others. bd2412 T 05:36, 9 August 2022 (UTC)[reply]
All someone would've had to do under the old CFI framework to game their protologism into mainspace is get a letter featuring their coinage printed in a local newspaper and then convince two friends to get similar letters printed 365 days later (or just send letters under two different pseudonyms). Thankfully, the number of trolls with that level of dedication is rather low. Most opt for forms of provocation that yield quicker gratification. Meanwhile the old CFI framework made it difficult-to-impossible to attest emerging language and ensured we were usually five years behind the curve in documenting slang. WordyAndNerdy (talk) 06:05, 9 August 2022 (UTC)[reply]
There has to be some middle ground. A word unproductively planted a few places by a single would-be wordsmith isn't really "slang". bd2412 T 06:12, 9 August 2022 (UTC)[reply]
The middle ground is that we have a community of would-be lexicographers who function as something of a peer-review process. It's possible to use Twitter's advanced search functions to look for tweets from a specific timeframe or with a specific level of likes. That makes it roughly possible to discern slang that gained some degree of lasting use from non-starters shared between a circle of friends for a brief time. It's not perfect, but it's better than what we had before. WordyAndNerdy (talk) 06:32, 9 August 2022 (UTC)[reply]

Blog post discussing Wiktionary's coverage of pejoratives


I found this blog post on Hacker News, which may be of interest: Compound pejoratives on Reddit – from buttface to wankpuffin. It reminds me of the discussion above about #Shitgibbons, as well as all the other recent debates about Internet sources, attestation requirements for offensive terms, etc. The Hacker News discussion thread is here. 98.170.164.88 20:47, 29 June 2022 (UTC)[reply]

From the blog: "Despite getting a lot less attention than Wikipedia, and having orders of magnitude fewer active editors, I find the breadth of its coverage and quality of its definitions to be very high". Well that's very nice. bd2412 T 05:40, 9 August 2022 (UTC)[reply]
I found it somewhat amusing that all the terms he mentioned as being surprisingly not on Wiktionary (due to how common they are) now have entries already. Andrew Sheedy (talk) 03:01, 10 August 2022 (UTC)[reply]
@Andrew Sheedy: Not all: we still do not have titfucker, gayass, soyboi, and gayboi. J3133 (talk) 08:03, 10 August 2022 (UTC)[reply]

Mariupol Greek in the Section for Translations


Could Mariupol Greek (grk-mar) be placed alongside Ancient Greek (grc) in the translation sections? Apisite (talk) 08:30, 30 June 2022 (UTC)[reply]

@Thadh What do you think, Thadh? --Apisite (talk) 09:07, 30 June 2022 (UTC)[reply]

I don't think it does it justice to place Mariupol Greek under (standard) Greek. Similarly, Pontic Greek isn't placed under Greek either. In any case, however, the decision should be made for all Hellenic varieties (Tsakonian, Cypriot, Italiot as well). Thadh (talk) 09:07, 30 June 2022 (UTC)[reply]
I think there's some merit in Apisite's proposal. Under the current policy, the Greek header is a weird paraphyletic grouping. It would be roughly equivalent to a situation where Italian and Latin are grouped together, but other Romance languages are treated separately; or where Old High German, Middle High German, and standard modern German are treated as one group, but Central Franconian, Alemannic, and Yiddish are not.
Maybe the groupings don't have to reflect a precise linguistic taxonomy. After all, their purpose is to make finding the translation more convenient for the reader. Still, it seems to suggest (perhaps unwittingly) that Ancient Greek has only one true modern descendant, when that's just not the case. 98.170.164.88 15:12, 30 June 2022 (UTC)[reply]
I understand. However, as far as I know, Ancient Greek has been put under Greek for no other reason than that people wanting to find an Ancient Greek translation might be looking under the letter 'G'. That is not the case for Mariupol Greek - if anything, people might be looking under the letter 'R' for Rumeika.
I see where the proposal is coming from, but I'm afraid we might get into a debate over which modern Hellenic variety is more important than the others, in order to make it the primary node. Thadh (talk) 15:37, 30 June 2022 (UTC)[reply]
I don't disagree. An example where the case for having a paraphyletic grouping is pretty strong is Arabic vs. Maltese. Currently most Arabic dialects (Hijazi, North/South Levantine, Egyptian, etc.) are grouped under the "Arabic" heading, but Maltese isn't, and I think this makes sense. Even if it would be technically correct to call Maltese a dialect of Arabic, nobody would expect to find it under 'A' instead of 'M'. I'm not sure about the various Hellenic languages, though, since they are called "Xyz Greek", which makes grouping them all under "Greek" make a bit more sense, but it's still debatable for sure. 98.170.164.88 15:49, 30 June 2022 (UTC)[reply]