Wiktionary:Beer parlour/2020/March

Template:sense vs Template:label for marking which sense synonyms apply to

As the edit summary says, diff "replaced deprecated {{sense}} w/ {{lb}}". I recall some discussions about how it is confusing to repeat a word's definition (in {{sense}}) in front of its antonyms, but was not aware {{sense}} was deprecated; is it? And using {{label}} outside a definition line, and to give a definition rather than a qualifier, seems wrong. (Pinging @BlueCaper as the person who made the edit.) Have I missed something? - -sche (discuss) 03:27, 2 March 2020 (UTC)[reply]

I don't think {{sense}} is deprecated, and I think the use of {{lb}} illustrated in that edit is mistaken. —Mahāgaja · talk 07:02, 2 March 2020 (UTC)[reply]

I try to follow the documentation associated with the templates. If it isn't mentioned there, it doesn't count. -Mike (talk) 17:35, 2 March 2020 (UTC)[reply]

Reverted. P U C 17:53, 2 March 2020 (UTC)[reply]

Wiktionary's "Vulgar" Latin pronunciation seems fundamentally flawed -- I'd like to get rid of it

I brought this up late last year on the talk page for the la-pronunc module, which generates pronunciations for Latin entries, but haven't followed up on it til now. I think that the current system used to generate "Vulgar Latin" pronunciations is not useful and can't feasibly be revised into a useful form, so it should be removed. I don't think my reasons require very much specialized knowledge of the history of Latin, so I'm posting this here to see whether other editors find my argument convincing after taking a look at the entries where Vulgar Latin pronunciations are currently implemented.

1. The pronunciation isn't implemented correctly. It uses phonemic slashes and phonetic brackets in an odd way that I don't understand: e.g. using both /eː/ and /i/ as phonemes corresponding to a single phone [e] (see bibo and cedo). Also, there seem to be a lot of bugs in the coding logic with the ordering and conditioning of sound changes: I don't think transcriptions like proscenium [prosˈke.nʲo.õ], cambiare [kam.βjˈa.re], zeugma [ˈzeo̯ɡ.ma], mulier [ˈmo.le.er], saeculum [ˈse.ɡo.lõ] and aether [ˈe.tʰer] are even correct as representations of whatever form of Latin the author had in mind. (I'd guess these were intended to come out something like [prosˈke.nʲo], [kamˈbja.re], [ˈzeu̯ɡ.ma], [ˈmo.ljer], [ˈsɛ.go.lo] and [ˈɛ.ter].)

2. In many areas, it isn't clear what the correct version of this pronunciation guide would even be. I can't tell whether it is meant to present a hypothesis about the pronunciation used by ordinary speakers of Latin in Rome in ancient times, or an approximation of features common across the majority of speakers in the Late Latin period. Either way, several features of the Wiktionary Vulgar Latin pronunciation seem unlikely to me, like the merger of /w/ and /b/ in word-initial position (which is found in only a few of the Romance languages descended from Latin) or the lenition of intervocalic /p t k/ to [b d g] (a sound change characteristic of Western Romance, but not consistently present in other branches of Romance, which suggests that this voicing occurred later than other "Vulgar Latin" sound changes like the loss of /h/, merger of intervocalic /w/ and /b/ in the middle of a word, or affrication of /t/ before i followed by a vowel). As evident from the preceding points, it definitely isn't an accurate representation of Proto-Romance, Proto-Italo-Western Romance, or Proto-Western Romance.

If nobody disagrees, I'm planning to contact an admin for help with removing all of these pronunciations. --Urszag (talk) 06:35, 3 March 2020 (UTC)[reply]

Oppose --victar (talk) 07:06, 3 March 2020 (UTC)[reply]

Since I haven't even posted this as a vote yet, I'd like to hear reasons/discussion rather than just a single word response. What value do you see in having these pronunciations, and having them generated rather than manually put in? How would you clean up the ones that are currently incorrect/buggy?--Urszag (talk) 07:11, 3 March 2020 (UTC)[reply]

If things are buggy, they can be fixed. No sense throwing the baby out with the bath water. --victar (talk) 18:57, 3 March 2020 (UTC)[reply]

I wouldn't agree to removing them if they are never to be replaced by more appropriate forms. Worst case scenario, the pronunciations have to be divided into different dialects. – Tom 144 (𒄩𒇻𒅗𒀸) 17:03, 3 March 2020 (UTC)[reply]

@Brutal Russian. P U C 15:41, 3 March 2020 (UTC)[reply]

I'm in favor of removing them. Urszag's point 1 can be fixed, but point 2 can't. There wasn't one single monolithic Vulgar Latin pronunciation, and trying to list them all would be both unwieldy and highly speculative. —Mahāgaja · talk 15:54, 3 March 2020 (UTC)[reply]
I support removing them. Presenting one phonemic and phonetic transcription without any qualifiers is a strange statement that all forms of Vulgar Latin throughout Romance-speaking regions of the Roman Empire through several centuries were pronounced roughly the same way, which is dubious. I don't object to adding transcriptions for particular regions and times, if there's enough evidence. I think it's best to immediately remove it and later maybe come up with a replacement, because of the significant errors and the fundamental flaw in how the information is currently presented. — Eru·tuon 18:18, 3 March 2020 (UTC)[reply]
Support - Vulgar Latin is not a linguistic variety, but a collection of disparate sociolinguistic features - it cannot be transcribed on principle. To use the baby analogy, it's a composite portrait of a couple of dozen babies that's being passed for a real baby they're all supposedly descended from. Even attempting to devise a common phonetic transcription of any single Proto-Romance branch would be...optimistic enough - in the literature, such transcriptions are only used to discuss particular phonetic developments in particular languages, and even then without implying any certainty. The Proto-Romance thing needs to exist under the reconstructed namespace, where there's potential for different phonemic shapes, but a phonetic transcription even into some very particular and well-attested variety such as proto-Florentine would end up surfacing as many transcriptions depending what theories one adopts. What can be attempted for Latin are variations in standard pronunciation over time and space. Perhaps the current "Vulgar Latin" could be converted into a stereotypical Pompeiian/Campanian-style variety? Brutal Russian (talk) 18:04, 4 March 2020 (UTC)[reply]

Splitting Aramaic cont.

Continuing the discussion started by @victar, here are some distinctive phonological and orthographic features of three conventional periodizations of Aramaic that I think should be considered by any proposal to split Aramaic on Wiktionary.

Old Aramaic (c. 1000 - 600 BCE)

Proto-Semitic *ś is retained as */ɬ/ and written ⟨𐤔⟩ (š).
Proto-Semitic *ṯ is retained as */θ/ and written ⟨𐤔⟩ (š).
Proto-Semitic *ṯ̣ is retained as */θʼ/ and written ⟨𐤑⟩ (ṣ).
Proto-Semitic *ḏ is retained as */ð/ and written ⟨𐤆⟩ (z).
Proto-Semitic *ṣ́ is backed to */k͡ʟ̥ʼ/ ~ */k͡xʼ/ and written ⟨𐤒⟩ (q).
Proto-Semitic *ḫ is retained as */x/ and written ⟨𐤇⟩ (ḥ).
Proto-Semitic *ḡ is retained as */ɣ/ and written ⟨𐤏⟩ (ʕ).

Ex.

Old Aramaic: 𐤔𐤋𐤔 f (šlš /⁠ṯalāṯ⁠/, “three”)

Old Aramaic: 𐤏𐤔𐤓 f (ʕšr /⁠ʕaśar⁠/, “ten”)

Old Aramaic: 𐤀𐤓𐤒 f (ʔrq /⁠ʔarḳ́⁠/, “earth”)

Imperial Aramaic (c. 600 - 300 BCE)

Old Aramaic š_ś is retained as */ɬ/ and written ⟨𐡔⟩ (š).
Old Aramaic š_ṯ is merged with /t/ and written ⟨𐡕⟩ (t).
Old Aramaic ṣ_ṯ̣ is merged with /tˤ/ and written ⟨𐡈⟩ (ṭ).
Old Aramaic z_ḏ is merged with /d/ and written ⟨𐡃⟩ (d).
Old Aramaic q_ṣ́ is deaffricated and voiced to */ɣˤ/ and written ⟨𐡏⟩ (ʕ).
Old Aramaic ḥ_ḫ is retained as */x/ and written ⟨𐡇⟩ (ḥ).
Old Aramaic ʕ_ḡ is retained as */ɣ/ and written ⟨𐡏⟩ (ʕ).

Ex.

Imperial Aramaic: 𐡕𐡋‎𐡕 f (tlt /⁠talāṯ⁠/, “three”) (with spirantization due to begadkefat)

Imperial Aramaic: 𐡏𐡔𐡓 f (ʕšr /⁠ʕaśar⁠/, “ten”)

Imperial Aramaic: 𐡀𐡓𐡏𐡀 f (ʔrʕʔ /⁠ʔarḡ̇ā⁠/, “earth”) (emphatic state)

Aramaic (c. 300 BCE - )

Imperial Aramaic š_ś is merged with /s/ and written ⟨ס⟩ (s).
Imperial Aramaic ʕ_q is merged with /ʕ/ and written ⟨ע⟩ (ʕ).
Imperial Aramaic ḥ_ḫ is merged with /ħ/ and written ⟨ח⟩ (ḥ).
Imperial Aramaic ʕ_ḡ is merged with /ʕ/ and written ⟨ע⟩ (ʕ).

Ex.

Aramaic: תלת f (tlt /⁠talāṯ⁠/, “three”) (with spirantization due to begadkefat)

Aramaic: עסר f (ʕsr /⁠ʕasar⁠/, “ten”)

Aramaic: ארעא f (ʔrʕʔ /⁠ʔarʕā⁠/, “earth”) (emphatic state)

Rhemmiel (talk) 07:48, 3 March 2020 (UTC)[reply]
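The correspondences listed above can be tabulated to check them against the three example words. This is only a rough sketch: it works on transliterated consonant skeletons, and it assumes that the mergers listed for Imperial Aramaic but not repeated in the last list (ṯ > t, ṯ̣ > ṭ, ḏ > d) simply carry over into the later period. The names OLD, IMPERIAL, LATER and spell are made up for illustration.

```python
import unicodedata


def nfc(text):
    """Normalize so that combining-mark variants of a letter compare equal."""
    return unicodedata.normalize("NFC", text)


def table(pairs):
    """Build a lookup table with normalized keys and values."""
    return {nfc(k): nfc(v) for k, v in pairs.items()}


# Proto-Semitic consonant -> transliterated spelling, per period (from the lists above).
OLD = table({"ś": "š", "ṯ": "š", "ṯ̣": "ṣ", "ḏ": "z", "ṣ́": "q", "ḫ": "ḥ", "ḡ": "ʕ"})
IMPERIAL = table({**OLD, "ṯ": "t", "ṯ̣": "ṭ", "ḏ": "d", "ṣ́": "ʕ"})
# Assumption: everything not mentioned for the later period is inherited from Imperial.
LATER = table({**IMPERIAL, "ś": "s"})


def spell(root, period):
    """Spell a list of Proto-Semitic consonants; unlisted consonants pass through."""
    return "".join(period.get(nfc(c), nfc(c)) for c in root)


three = ["ṯ", "l", "ṯ"]
ten = ["ʕ", "ś", "r"]
earth = ["ʔ", "r", "ṣ́"]

for name, period in (("Old", OLD), ("Imperial", IMPERIAL), ("Later", LATER)):
    print(name, spell(three, period), spell(ten, period), spell(earth, period))
```

The sketch ignores vowels and the emphatic-state ending (the final ʔ in ʔrʕʔ), so it only reproduces the consonantal stems of the examples.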

In defense of the thesaurus

https://getpocket.com/explore/item/the-thesaurus-is-good-valuable-commendable-superb-actually —Justin (koavf)❤T☮C☺M☯ 15:25, 3 March 2020 (UTC)[reply]

Use subpages for Japanese entries

Everyone, please take a look at our current 己 entry. I hope you can agree that it has bad layout. Different lexical items (き, つちのと, …) and different parts of each lexical item (Etymology, Pronunciation, …) are both laid out vertically. As a result, the user is lost in an ocean of headers, and the horizontal dimension is wasted.

A new solution is to move each lexical item into its own subpage (己/き, 己/つちのと, …) and transclude them on the main entry. Each subpage will contain only one lexical item so they won't be crammed. The main entry can use Lua to list subpages in a 2D format like Jisho.org or {{ja-see-kango}}.

Doing so also eases maintenance: Since the reading is in the page title, there is no need to pass it to every template that needs it. Also, {{DEFAULTSORT:sortkey}} works correctly, and the problem of not being able to categorize 避く twice in Category:Japanese shimo nidan verbs disappears.

What do you think of this approach?

(Notifying Eirikr, TAKASUGI Shinji, Nibiko, Atitarev, Suzukaze-c, Dine2016, Poketalker, Cnilep, Britannic124, Marlin Setia1, AstroVulpes, Tsukuyone, Aogaeru4, Huhu9001, 荒巻モロゾフ, Mellohi!): --Nyarukoseijin (talk) 04:13, 8 March 2020 (UTC)[reply]

Those "subpages" are not recognized software-side as actual subpages, though, since subpages in the main namespace are disabled and nobody but the WMF can change that. mellohi! (僕の乖離) 05:04, 8 March 2020 (UTC)[reply]

Well, we could ask for the devs to change that if we demonstrated local consensus, I believe. However, I continue to oppose per-language subpages, although perhaps we need to consider them again. I definitely oppose such a radical change to Japanese entries alone, as making Japanese deviate any more from the standard puts a strain on everybody who does the unthanked work of keeping the wiki as a whole tidy and functional. —Μετάknowledge^{discuss/deeds} 05:13, 8 March 2020 (UTC)[reply]

I think it does not really matter whether they are "software-side actual" subpages or not. -- Huhu9001 (talk)

The "subpages" still follow the standard entry layout. Entries like 歌/うた still have ==Japanese==, ==Okinawan==, ==Yaeyama== and ==Yonaguni== language headers on L2, ===Etymology===, ===Pronunciation===, ===Noun=== headers on L3, and the structure is pretty much the same. The only difference is the title, which is collectively defined by kanji and kana. Citing an entry by both spellings is better than citing it by either, because Japanese is full of homographs. --Nyarukoseijin (talk) 08:49, 8 March 2020 (UTC)[reply]

I don't really see the point. It doesn't matter where the content is stored, once it's transcluded it's subject to the limitations of the page it's transcluded into. Likewise, the difference in categorization only applies to the subpages themselves. To make it work you would have to have the categories link directly to the subpages, and not to the main page. If you transclude the categories along with the other content, the page they're transcluded into will be categorized exactly the same as if the categories were all generated on the page itself. To avoid categorizing the main page, you would have to separate the category wikitext from the rest of the content, which wouldn't be compatible with our existing template architecture. The same with "DEFAULTSORT". Chuck Entz (talk) 06:46, 8 March 2020 (UTC)[reply]

The "subpages" still follow the standard entry layout. Lua is powerful enough to take care of the rest, such as stripping the categories and putting the transcluded pages into a tabular format. --Nyarukoseijin (talk) 10:05, 8 March 2020 (UTC)[reply]
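To make "stripping the categories" concrete, here is a minimal sketch in Python rather than Lua, purely for illustration. It only removes literal category links from raw wikitext; categories emitted by templates at render time (which is most of them on Wiktionary, as Chuck Entz points out) would not be caught this way. The function name strip_categories is hypothetical.

```python
import re

# Matches [[Category:Name]] and [[Category:Name|sortkey]], plus one trailing newline.
# Links such as [[:Category:Name]] start with a colon and are left alone,
# because they are ordinary links rather than categorizations.
CATEGORY_LINK = re.compile(
    r"\[\[\s*Category\s*:[^\]\|]*(?:\|[^\]]*)?\]\]\n?",
    re.IGNORECASE,
)


def strip_categories(wikitext):
    """Return the wikitext with literal category links removed."""
    return CATEGORY_LINK.sub("", wikitext)


sample = (
    "==Japanese==\n"
    "[[Category:Japanese nouns]]\n"
    "See [[:Category:Japanese lemmas]].\n"
    "[[category:Japanese terms|つちのと]]"
)
print(strip_categories(sample))
```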

Don’t rely too much on Lua, which has little maintainability in the long term. If you think the layout is the problem, you can’t solve it by transclusion. — TAKASUGI Shinji (talk) 23:20, 8 March 2020 (UTC)[reply]

Lua is only a plus. It fetches definitions from the subpages and does nothing more.

For pronunciation and definitions of 己 – see the following entries.

己/き	[proper noun]the sixth of the ten Heavenly Stems
己/つちのと	[proper noun]the sixth of the ten Heavenly Stems
己/おのれ	[pronoun]reflexive pronoun: oneself [pronoun]first-person pronoun: I, me [pronoun]second-person pronoun: you [adverb]by oneself [interjection]An interjection expressing anger or chagrin
	…

If Lua breaks, we can easily fall back on a template implementation like:

For pronunciation and definitions of 己 – see 己/き, 己/つちのと, 己/おのれ, ….

--Nyarukoseijin (talk) 07:15, 9 March 2020 (UTC)[reply]

This is more like making the main entry a soft redirect. -- Huhu9001 (talk) 08:13, 9 March 2020 (UTC)[reply]

To me the one possible advantage is if DEFAULTSORT could be used effectively (though Chuck Entz seems to suggest that wouldn't work, either). Otherwise, I agree with others that there is no real point to doing this. Once subpages are transcluded, any issues of layout, organization, and maintenance remain just as the current status quo. (And by the way, personally I don't see vertical as opposed to horizontal layout as a problem.) Cnilep (talk) 00:46, 9 March 2020 (UTC)[reply]

DEFAULTSORT is not the only advantage. By citing entries by both kanji and kana, etymologies will have a better style. Compare:

Originally a compound of 土 (tsuchi, “earth, one of the Chinese five elements”) +‎ の (no, possessive particle) +‎ 弟 (oto, “younger brother”).
Originally a compound of 土/つち (tsuchi, “earth, one of the Chinese five elements”) +‎ の (no, possessive particle) +‎ 弟/おと (oto, “younger brother”).

Category pages benefit as well. The words are no longer listed in the order [reading 1], [kanji 1a], [kanji 1b], [reading 2], [kanji 2a], etc. Instead, you get [kanji 1a/reading 1], [kanji 1b/reading 1], [kanji 2a/reading 2], …. The kanji and readings are paired, just like printed dictionaries. --Nyarukoseijin (talk) 07:15, 9 March 2020 (UTC)[reply]
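A rough sketch of the category ordering described above, in Python for illustration only. It assumes titles of the form kanji/kana and sorts by raw code point, which only approximates 五十音順 (voiced and small kana sort next to their plain counterparts rather than being folded into them). The helper names are hypothetical.

```python
def split_title(title):
    """Split a 'kanji/kana' title into its two spellings."""
    kanji, sep, kana = title.partition("/")
    if not sep:
        raise ValueError(f"expected 'kanji/kana', got {title!r}")
    return kanji, kana


def category_order(titles):
    """Sort titles by reading first, then by kanji, keeping each pair together."""
    def key(title):
        kanji, kana = split_title(title)
        return (kana, kanji)
    return sorted(titles, key=key)


titles = ["己/つちのと", "己/おのれ", "土/つち", "己/き"]
for title in category_order(titles):
    print(title)
```

With this key, 己 shows up once under お, once under き and once under つ, which is the effect the proposal is after.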

To me, those don't appear to be advantages. The treatment of compounds is visually more difficult to parse, and understanding it may require previous knowledge of both Japanese and Wiktionary organization – not something we should expect every reader to have. And while the category scheme you envision is no more difficult, neither is it necessarily any better. YMMV — Cnilep (talk) 02:21, 10 March 2020 (UTC)[reply]

Difficult to parse: We can modify Module:links so that {{m|ja|土/つち}} produces 土/つち. This also applies to all templates that use Module:links, like {{com}}, so you don't need specialized templates like {{ja-compound}} for ruby.

Requiring knowledge of organization: We can modify {{ja-kanjitab}} to show the following on every kanji entry, so that readers will know 己/つちのと is not a single spelling.

Hiragana	つちのと
Kanji	己

Category scheme: The new scheme allows users to quickly find terms by 五十音順, because every term is followed by its reading. By contrast, please take a look at the current Category:Okinawan lemmas. Can you pair the kanji and kana without knowledge of the Okinawan language? Then compare

い

Note that we're allowed to categorize 魚 twice. This is impossible with the current status quo. --Nyarukoseijin (talk) 05:42, 10 March 2020 (UTC)[reply]

I don't think categorizing a word twice is a good practice. -- Huhu9001 (talk) 04:47, 11 March 2020 (UTC)[reply]

I didn't give a good example. Okinawan 魚/いう (iu) and 魚/いゆ (iyu) may be alternative forms, but Japanese 己(つちのと) (tsuchinoto) and 己(おのれ) (onore) are two different words. If we lemmatize them at 己, there's no way to make them appear under both つ and お. --Nyarukoseijin (talk) 05:02, 11 March 2020 (UTC)[reply]

Embedding the kanji in the page title is definitely a fascinating hack in order to circumvent the inability to categorize 避ける twice, but it personally feels like we would be trying to force MediaWiki categories to be something that they're not, or bending over backwards to twist it into our will.

Maybe we could revive indexes? —Suzukaze-c ◇◇ 06:04, 11 March 2020 (UTC)[reply]

It's not just for categories; it's mainly for reducing homographs. Having multiple terms on the same entry is terrible data management. The new approach ensures that most entries will have only one lexical item, so you can for example store the romaji in the pronunciation section, and every headword template can just fetch it by transcluding the whole page. No more ambiguities over which etymology section to target. --Nyarukoseijin (talk) 14:48, 11 March 2020 (UTC)[reply]

Re: "Having multiple terms on the same entry is terrible data management." -- wholeheartedly agree.

For those unfamiliar with Japanese orthography, I'd like to explain that this is an entirely different phenomenon from cases like English record, a noun and a verb both from the same ultimate root, or English slough with two distinct etymologies but one written form and multiple, albeit closely similar, pronunciations. (For that matter, the "muddy or marshy area" sense is missing the pronunciation /sloʊ/, used at least in California for the Elkhorn Slough on Monterey Bay.) Again, a single Japanese grapheme may sometimes have more than 10 distinct and unrelated pronunciations and etymologies. Pretty much every single kanji has at least two unrelated pronunciations and etymologies, one from Middle Chinese and one from Old Japanese. What we've had to do so far for Japanese entries here at Wiktionary is a bit like if Wikipedia didn't have disambiguation, and everything now at disambig pages like Slough_(disambiguation) were instead shoehorned into one page, and any incoming links could only reliably point at the top of the page. Frankly speaking, it's a mess. ‑‑ Eiríkr Útlendi │^{Tala við mig} 17:15, 11 March 2020 (UTC)[reply]

Yeah, this is what happens if English is written logographically. --Nyarukoseijin (talk) 02:54, 12 March 2020 (UTC)[reply]

I still think that storing content at the kana form is superior. Especially for 大和言葉, spelling is secondary to whatever kanji literati have あてはめた. I think that つち (土, tsuchi) +‎ の (no) +‎ おと (弟, oto) is not bad. (For certain words that are regularly written in kanji, like 今日 (kyō), I acknowledge that kanji should definitely be somewhere nearby.) —Suzukaze-c ◇◇ 07:34, 9 March 2020 (UTC)[reply]

This proposal comes close to replicating the basic organization of Japanese monolingual dictionaries.

The underlying challenge is that, in the written form that we must use for this text-based dictionary, most Japanese lemmata exist at the intersection between the orthography and the pronunciation.

That might seem like a "no shit, Sherlock" statement for editors used to other languages and simpler writing systems. However, for Japanese, the writing system is -- frankly -- a bit of a mess. It is gloriously flexible and expressive, but harder to pin down for dictionary purposes. Words and spellings have a much more tenuous relationship than in other languages.

Consider the entry structure at 柄. One grapheme, nine lemmata, eight of which have distinct pronunciations. Or take 生. Our entry isn't fully fleshed out yet, but my local monolingual KDJ lists fourteen different lemmata, each with distinct pronunciations. As a simpler example, take 雲. One grapheme, two lemmata, with distinct pronunciations.

And I'm not even getting into the interesting fluidity of Japanese orthography. Any avid reader of manga will recognize that a word as spoken may appear on the page as a completely different word as written, with ruby or some other mechanic to link the two. If a given written form is used often and widely enough for a given word, it becomes adopted into the lexicon, as a new intersection between speech and spelling.

Japanese presents challenges that simply aren't encountered with other languages. I respect the opinion that we should generally avoid language-specific divergences in Wiktionary layout and structure -- but I also think that we should make allowances where circumstances warrant. And Japanese lexicography demands a different approach than what we've been doing, if it is to be done well. There's a reason that no monolingual electronic Japanese dictionary organizes their entries as we have, and it's not because Wiktionary has discovered something special -- we're behind the curve when it comes to effective data organization for Japanese lexicography.

The proposed data structure much more effectively correlates graphemes and phonemes in the headword or "address" of an entry. Again, this comes close to replicating the basic organization of Japanese monolingual dictionaries, and I view this as a positive. ‑‑ Eiríkr Útlendi │^{Tala við mig} 09:04, 9 March 2020 (UTC)[reply]

@Eirikr: Thank you. I realized that my proposal was really about making the headword collectively defined by kanji and kana. It didn't have to be implemented with subpages or even slashes; other formats like 己:つちのと would serve equally well.

The only change required is to add the following text to WT:EL#Entry name:

For languages whose terms are identified by both logographic and phonographic spellings, place the character "/" between the two spellings. For example, use 歌/うた for the Japanese, Okinawan, Yaeyama, and Yonaguni word whose logographic spelling is 歌 and whose phonographic spelling is うた.

We can try to convince the community by comparing the format of English and Japanese dictionaries. --Nyarukoseijin (talk) 10:24, 9 March 2020 (UTC)[reply]

Don't use a vague phrase ("languages whose terms...") or you'll risk pulling Egyptian and other languages into this. —Μετάknowledge^{discuss/deeds} 16:39, 9 March 2020 (UTC)[reply]

OK.

@Suzukaze-c: The kana form alone is not enough to identify a word. つち could be either 土・地 or 槌・鎚・椎, and おと could be either 弟・乙, 音, or 遠・彼方. So, you still need a special mechanism to embed the kanji. If we use titles like 土/つち, we can use standard templates like {{com}}, {{synonym of}}, {{syn}}, {{ant}}, etc. with exactly the same syntax as western languages. --Nyarukoseijin (talk) 05:42, 10 March 2020 (UTC)[reply]

@Everyone: It is common practice to cite Japanese terms by both kanji and kana, and many templates use two parameters for that purpose (e.g. {{ja-r|己|つちのと}}). This leaves the problem of how to arrange the parameters when citing multiple terms. For example, {{ja-compound|土|つち|の||弟|おと}} groups parameters by term, and {{ja-vp|見る|見える|みる|みえる|c=見せる|ck=みせる}} groups parameters by orthography. The new headword format makes the syntax of Japanese templates more consistent and predictable ({{ja-compound|土/つち|の|弟/おと}}, {{ja-vp|見る/みる|見える/みえる|c=見せる/みせる}}) and more in line with the general norm of one parameter per word (e.g. {{compound|en|place|holder}}).
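The two parameter layouts mentioned above could be converted mechanically into the one-parameter-per-word form. The following is only a sketch (Python, hypothetical function names), assuming the positional parameters have already been extracted from the template call and that an empty reading means the term is written in kana only.

```python
def titles_from_pairs(params):
    """Layout grouped by term: kanji1, kana1, kanji2, kana2, ..."""
    if len(params) % 2:
        raise ValueError("expected an even number of parameters")
    pairs = zip(params[0::2], params[1::2])
    return [f"{kanji}/{kana}" if kana else kanji for kanji, kana in pairs]


def titles_from_halves(params):
    """Layout grouped by orthography: kanji1, kanji2, ..., kana1, kana2, ..."""
    if len(params) % 2:
        raise ValueError("expected an even number of parameters")
    half = len(params) // 2
    return [f"{kanji}/{kana}" for kanji, kana in zip(params[:half], params[half:])]


def render(template, titles):
    """Render a template call with one positional parameter per word."""
    return "{{" + template + "|" + "|".join(titles) + "}}"


print(render("ja-compound", titles_from_pairs(["土", "つち", "の", "", "弟", "おと"])))
print(render("ja-vp", titles_from_halves(["見る", "見える", "みる", "みえる"])))
```

Named parameters such as c= and ck= are left out of the sketch; they would need the same pairing step.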

@Eirikr Even if the proposal is passed, we probably need bots to update the thousands of existing entries and cross-references. So it'll probably be a huge project, comparable to Unified Chinese. I wonder if it is possible to do this first to a subset of entries, such as wago terms or terms with multiple readings. For entries with one or usually one reading, there is little ambiguity of which term is being described or cited, so changing the headword format is less helpful. According to the Introduction to the Second Edition of the Oxford English Dictionary:

Many of the 580,000 cross-references in the Dictionary are imprecise, citing headwords without parts of speech and homonym numbers, for example. It was impossible for the automatic cross-referencing system to determine which of two or more possible targets was the one proper to an ambiguous cross-reference of this sort, and so, on the whole, these have not been made more precise; in many cases, the intended target is obvious to the reader, and amplification would merely be fussy. […]

--Nyarukoseijin (talk) 06:12, 11 March 2020 (UTC)[reply]

I think starting with a subset would make sense, entries like 柄 with its eight readings and nine lemmata, or 生 with its fourteen readings (once the entry is built out).

As a query, the proposal above would successfully disambiguate, essentially with an index starting with the kanji. This is very like the behavior of monolingual JA electronic dictionaries when the user enters a kanji search string -- all readings that match that string are shown, and the user finds the one they want. For our purposes, we will keep the kanji spelling page, such as 柄, as the "main" page for that grapheme, and all of the individual entry pages for the readings of that kanji will be transcluded so that 柄 presents a unified view. Not unlike how Wiktionary:Beer parlour is the "main" view for the various Year/Month subpages.

Extending this functionality to kana strings, how would you gather all of the graphemes read as から (kara) or しょう (shō) to display on those entries? A user of a Japanese monolingual electronic dictionary would be accustomed to entering a kana string and seeing all of the graphemes that correspond to those phonemes. ‑‑ Eiríkr Útlendi │^{Tala við mig} 17:15, 11 March 2020 (UTC)[reply]

柄 will have {{ja-see|柄/え|柄/かい|柄/かび|柄/から|柄/がら|柄/つか|柄/つく}}, and から will have {{ja-see|空/から|柄/から|唐/から|殻/から|涸/から|幹/から|掛絡/から|蜾蠃/から|加羅/から}}. According to Shinji's comment above, Lua has little maintainability in the long term, so we shouldn't rely on it too much. Embedding the "search results" into spelling entries ensures that we can easily fall back to {{ja-see-also}} in case {{ja-see}} breaks. --Nyarukoseijin (talk) 02:54, 12 March 2020 (UTC)[reply]
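For what it's worth, those {{ja-see}} lists would not have to be maintained by hand: a bot could derive both the kanji-side and the kana-side list from the set of kanji/kana titles. A minimal sketch, in Python with hypothetical names, assuming every title contains exactly one slash:

```python
from collections import defaultdict


def build_indexes(titles):
    """Group 'kanji/kana' titles by their kanji spelling and by their kana spelling."""
    by_kanji = defaultdict(list)
    by_kana = defaultdict(list)
    for title in titles:
        kanji, _, kana = title.partition("/")
        by_kanji[kanji].append(title)
        by_kana[kana].append(title)
    return by_kanji, by_kana


def ja_see(titles):
    """Format a list of titles as a {{ja-see}} call, in the order given."""
    return "{{ja-see|" + "|".join(titles) + "}}"


titles = ["柄/え", "柄/から", "空/から", "殻/から"]
by_kanji, by_kana = build_indexes(titles)
print("柄:", ja_see(by_kanji["柄"]))
print("から:", ja_see(by_kana["から"]))
```

Because the result is plain wikitext stored on the spelling entry, it keeps working even if the Lua side breaks.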

Two questions:

1. Why kanji first?

2. How would kanji be chosen for words like 超える・越える?

Independently, the idea of treating something like 殻/から in a manner similar to vocalized Arabic for display and transliteration purposes (which I didn't notice well enough previously) is extremely intriguing. —Suzukaze-c ◇◇ 04:55, 12 March 2020 (UTC)[reply]

1. Sticking to one format makes titles predictable. Kanji first because that's the more frequent spelling for most terms.

2. That's not the concern of this proposal. The ultimate way is still to go database-like. At the current stage, just choose between 超える/こえる and 越える/こえる as you would choose between 超える and 越える, and make the other a redirect. (We have asymmetry between color and colour too.)

(3.) Theoretically you could make 殻/から link to 殻 and から/殻 link to から. But then you need to target specific Etymology sections. I once made {{ja-spellings}} and {{ja-see}} generate anchors like 殻#ja-から and から#ja-殻 and planned to make {{ja-r}} accommodate that, but I now find 殻/から a better way. --Nyarukoseijin (talk) 06:17, 12 March 2020 (UTC)[reply]

Since they are not subpages anyway, I prefer a style with brackets: 柄 (がら). It looks more natural. -- Huhu9001 (talk) 03:51, 12 March 2020 (UTC)[reply]

I personally don't think parentheses would be appealing as a page title. —Suzukaze-c ◇◇ 05:15, 12 March 2020 (UTC)[reply]

Many online dicts and almost all other wikiprojects use brackets. -- Huhu9001 (talk) 06:18, 12 March 2020 (UTC)[reply]

@Eirikr: Can you draft a proposal if you're interested? I'm not good at it. --Nyarukoseijin (talk) 03:52, 14 March 2020 (UTC)[reply]

@Nyarukoseijin, sorry, just saw this by pure chance. My responsibilities IRL are a bit pressing of late what with the whole COVID-19 mess, especially here in the US, and my schedule is not as forgiving as it was. I can't promise rapid progress. But I like the idea, and I feel that something like this is needed to improve our (EN WT) handling of Japanese, so I figure let's give it a go. I'll see about starting a draft and loop you in. It might not be today though (albeit already a couple weeks after your post...). Cheers, ‑‑ Eiríkr Útlendi │^{Tala við mig} 23:08, 31 March 2020 (UTC)[reply]

Redirect entries without the generic objects

For example, serve right should redirect to serve someone right (in this case all the more so, since the latter includes the non-idiomatic meaning); if such searches do deserve their own entries (as does duke (it) out), the existence of the corresponding parallel entry must be specified. --Backinstadiums (talk) 12:43, 8 March 2020 (UTC)[reply]

11-year-olds

According to OldPages, there are now a few entries which haven't been edited in eleven years. Either they are perfect, or so obscure that nobody would ever find them. Anyway, the winner is the entry maliferous, the first entry not to have been touched in 11 years. It was made by Jackofclubs (talk • contribs), a mediocre editor if there ever was one. I wonder what came of them... --Alsowalks (talk) 18:23, 8 March 2020 (UTC)[reply]

I added a usage example. I'm looking for a proper quotation now. Some etymology can certainly be added and a recording of pronunciation as well. —Justin (koavf)❤T☮C☺M☯ 18:30, 8 March 2020 (UTC)[reply]

Given that the word is archaic, is it really helpful to add a usage example that is otherwise very modern sounding? Andrew Sheedy (talk) 04:43, 9 March 2020 (UTC)[reply]

Wiktionary:Votes/2020-03/Letter entries to be Translingual

A new vote, to make all Letter entries Translingual. Input is appreciated, as well as links to any previous discussions — I think this may have been first proposed by @Rua, but I can't remember where. —Μετάknowledge^{discuss/deeds} 00:20, 9 March 2020 (UTC)[reply]

The letter b is read /biː/ in English, /beː/ in Dutch. In Dutch, the plural of b is b's, the plural of s is s'en. Where would all that information go? MuDavid 栘𩿠 (talk) 02:14, 9 March 2020 (UTC)[reply]

That information is not about the letter, but about the name of the letter, which can be found at bee. —Μετάknowledge^{discuss/deeds} 05:09, 9 March 2020 (UTC)[reply]

The names of the letters are another piece of information that (some) language-specific letter entries have, although those could very well also be put on a table inside an appendix. — sur jec tion ⟨?⟩ 10:47, 9 March 2020 (UTC)[reply]

@Metaknowledge: But that doesn't address the problem. Yes, bee is one possible spelling of the English name for the letter, but b is also a spelling, and both spellings have their inflections. It would be strange to list Dutch s'en as an inflection of es. Moreover, fully spelled-out versions of the letter names are much rarer in practice than the single-letter versions. —Rua (mew) 11:24, 9 March 2020 (UTC)[reply]

@Rua: That's a good point. We're still talking about nouns, here, though (which have inflected forms like plurals), not letters themselves. Do we want nouns under the 'Letter' L3 header? —Μετάknowledge^{discuss/deeds} 16:34, 9 March 2020 (UTC)[reply]

I'm not sure if there is much point in distinguishing them in this case. How many languages are there where a letter is not also a noun? If the majority of letters are also nouns, then we don't gain much by separating out just the letters; the nouns will flood the pages all the same. —Rua (mew) 17:19, 9 March 2020 (UTC)[reply]

Many letters are strictly language-specific: ჯ, ձ, xh. It would seem strange to classify them as translingual – although this was done for the Glagolitic letters such as Ⰷ. --Lambiam 11:46, 9 March 2020 (UTC)[reply]

You think of those as language-specific, but minority languages using those scripts would disagree. (Although I doubt any other language uses xh, I am also unconvinced it ought to have an entry as a letter.) —Μετάknowledge^{discuss/deeds} 16:34, 9 March 2020 (UTC)[reply]

@Metaknowledge: My proposal was actually to treat scripts as languages, so that letters are listed and categorised under their script rather than as translingual. —Rua (mew) 17:20, 9 March 2020 (UTC)[reply]

That's intriguing — I'd be curious to see what people think of it. Given this response, I'm thinking of pulling the vote anyway. —Μετάknowledge^{discuss/deeds} 22:31, 9 March 2020 (UTC)[reply]

That seems more straightforward, and it would solve those cases where a digraph is only used by one language (as discerned by the editor), so one avoids the embarrassment of calling it translingual. Fay Freak (talk) 23:28, 9 March 2020 (UTC)[reply]

I think that could be a good solution, provided pronunciation is given for all the languages that use the script. Which could be a lot (but could be put in a collapsible box by default)... Andrew Sheedy (talk) 04:16, 16 March 2020 (UTC)[reply]

Proposal to handle kanji spellings of Japanese given names

((Notifying Eirikr, TAKASUGI Shinji, Atitarev, Suzukaze-c, Poketalker, Cnilep, Marlin Setia1, AstroVulpes, Tsukuyone, Aogaeru4, Huhu9001, 荒巻モロゾフ): and @Nyarukoseijin) Unlike surnames, given names in Japanese have consistent and lexically relevant phonetic forms, with their kanji spellings being the opposite - kanji spellings can be assigned arbitrarily with no regard to phonetics, etymology, or lexical history of the name. (Attempting to figure out the etymologies of given names in Japanese can be a huge headache at times.) This leads to a phonetic given name being assigned many kanji spellings and a kanji spelling assigned to multiple phonetic given names. In my view, two men named Yoshiaki are still two men named Yoshiaki even if they spelled their names with two different kanji spellings. Category:Japanese given names is infested with kanji spellings of given names that make it difficult to navigate between phonetic given names.

As such, I propose that we come up with solutions that filter the phonetic given names (written in kana) from their kanji spellings. Like separating the kanji spellings into Category:Kanji spellings of Japanese given names or something like that, with the subcats of Category:Japanese given names being reserved only for phonetic forms. The kanji spellings could be cross-referenced on the kana form pages.

Previous discussion:

mellohi! (僕の乖離) 13:46, 9 March 2020 (UTC)[reply]

I agree with you basically. I’d like to make exceptions such as 太郎(たろう) (Tarō), 次郎(じろう) (Jirō), 三郎(さぶろう) (Saburō), and probably also 花子(はなこ) (Hanako), which are lexically established. — TAKASUGI Shinji (talk) 00:57, 10 March 2020 (UTC)[reply]

I'm not sure I understand the nature of the 'exception'. Would ハナ子, はな子, and 華子 be treated somehow differently from 花子, for example? If so, differently how? To the original proposal: is this proposal related only to the content and organization of Categories, or would something new (a new template? an argument in e.g. {{given name}}?) need to be added to entries? I'm not opposed in principle; I just don't understand. Cnilep (talk) 02:35, 10 March 2020 (UTC)[reply]

花子(はなこ) (Hanako) is used as an example, just like Jane Doe. — TAKASUGI Shinji (talk) 02:21, 25 March 2020 (UTC)[reply]

@TAKASUGI Shinji: I don't understand. With exceptions, the category page would be like

…はな、はなえ、はなこ、花子、はなみ、はね…

This looks unnatural and unnecessary. -- Huhu9001 (talk) 02:13, 27 March 2020 (UTC)[reply]

You never use ハナ子, はな子, or 華子 for “Jane Doe”. It’s always 花子(はなこ) (Hanako). — TAKASUGI Shinji (talk) 11:36, 27 March 2020 (UTC)[reply]

@TAKASUGI Shinji: So I guess you are saying 花子 is a Japanese version of the English term Jane Doe. But this has nothing to do with this topic. Jane Doe is not in cat:English names, cat:English given names or any such category. -- Huhu9001 (talk) 12:09, 27 March 2020 (UTC)[reply]

花子 is a given name, not a full name (ex. [4]). — TAKASUGI Shinji (talk) 05:39, 29 March 2020 (UTC)[reply]

@TAKASUGI Shinji: As I have stated above, this kind of usage is not related to the categories for names and thus has nothing to do with this topic. -- Huhu9001 (talk) 09:20, 29 March 2020 (UTC)[reply]

I see. — TAKASUGI Shinji (talk) 10:18, 29 March 2020 (UTC)[reply]

L2 of taxonomic components

In reaction to seeing Wiktionary:Requests for verification/Non-English#nematodes. The consensus seems to be that components of taxonomic names that (although structurally and grammatically conforming to Latin) are not attested in Latin texts should not get an L2 of Latin, but solely of Translingual (and possibly other languages in which the term is attested). See e.g. Wiktionary:Etymology scriptorium/2011/March and Talk:albifrons. AFAIK this has never been laid down as a rule, though, and it is a recurring issue. Should we add a clause to Wiktionary:Taxonomic names to this effect? --Lambiam 12:06, 11 March 2020 (UTC)[reply]

Yes, this should be recorded somewhere (go for it). Any word that exists solely in taxonomic names is Translingual. - -sche (discuss) 08:43, 12 March 2020 (UTC)[reply]

And while you are at it we should have "mul" headword templates or redirects to the corresponding Latin headword template for them. For nouns one might have to include genitive plurals in a micro declension table. DCDuring (talk) 17:03, 12 March 2020 (UTC)[reply]

Added. - -sche (discuss) 02:12, 18 March 2020 (UTC)[reply]

Also added to WT:ALA. - -sche (discuss) 02:14, 18 March 2020 (UTC)[reply]

CAT:en:List of sets, CAT:en:List of topics

@Rua Maybe you understand this. What is the purpose of these two categories, and what is the difference between the two of them? Each of them contains a subset of the total set of topic categories, with no rhyme or reason as to what is included and what is not. I'd like to fix them up so that all topic categories go in one or both, but first I need to understand their purpose. Benwing2 (talk) 06:05, 13 March 2020 (UTC)[reply]

Same request for the generic CAT:List of sets, CAT:List of topics. Benwing2 (talk) 06:07, 13 March 2020 (UTC)[reply]

This might have been related to the idea of keeping sets that contain a list of items that are each an example of the set's name (e.g. bones) separate from topics that contain words describing or relating to a field that is the topic's name (e.g. anatomy). If we're going to organise our categories that way, we should do it right. That requires some serious thought, including considering a new naming scheme so that people know what's what (I think it was @-sche who suggested something like Category:en:list:Bones for the set categories). —Μετάknowledge^{discuss/deeds} 06:47, 13 March 2020 (UTC)[reply]

Yeah, if we want anyone to be able to distinguish these categories, and know which entries go in which categories, they need to have [more] distinct names. I would go as far as (to arbitrarily pick a word to use as an example) "en:list:Birds" (for sparrow, crow, etc) and "en:topic:Birds" (for beak, birdfood, etc). - -sche (discuss) 02:03, 18 March 2020 (UTC)[reply]
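As a rough illustration of the naming scheme floated above, here is a minimal Python sketch. The function name `category_name` and the restriction to exactly two kinds ("list" and "topic") are assumptions made for the example; nothing like this exists in Wiktionary's modules as far as this thread shows, and the scheme itself is only a proposal.

```python
# Hypothetical helper: builds category names under the proposed
# "lang:kind:Label" scheme (e.g. "Category:en:list:Bones").
# The scheme is a suggestion from the discussion, not current practice.

ALLOWED_KINDS = ("list", "topic")


def category_name(lang_code: str, kind: str, label: str) -> str:
    """Return a category name such as 'Category:en:list:Birds'."""
    if kind not in ALLOWED_KINDS:
        raise ValueError(f"kind must be one of {ALLOWED_KINDS}, got {kind!r}")
    return f"Category:{lang_code}:{kind}:{label}"


# "sparrow" is an example of a bird, so it would go in the set ("list") category;
# "beak" merely relates to birds, so it would go in the topic category.
sparrow_category = category_name("en", "list", "Birds")
beak_category = category_name("en", "topic", "Birds")
```

The point of spelling the kind out in the name is that an editor can tell from the category title alone whether an entry should be a member of the set or merely related to the subject.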

Importing and exporting quotations

It'd be fun to import and export quotations from other Wiktionaries. We have, for example, 6800+ in Category:Spanish terms with quotations which we could ship off to es.wiktionary, and fr.wiktionary could probably fiddle with to make a list of all French words with quotations. How would that work? --Alsowalks (talk) 14:21, 13 March 2020 (UTC)[reply]

I'm assuming at least one other WT has something like Category:Terms with quotations by language, which would be a good place to start. --Alsowalks (talk) 14:22, 13 March 2020 (UTC)[reply]

Given the vast number of different formats for quotations used on this wiki, not to mention the problem of different senses, it wouldn't. DTLHS (talk) 16:09, 13 March 2020 (UTC)[reply]

Perhaps fun is not the right word. It'd be useful, though. --Alsowalks (talk) 20:44, 13 March 2020 (UTC)[reply]

"Unsimplified": Mainland China-related Entries displaying Traditional characters only

Hello all. I'm not sure if this should be a policy discussion or a technical discussion in the Grease Pit, but I think the change I want to propose is already mirrored in the functionality of other parts of the website, so I don't think there is an existing technical problem. I have been working on geography related topics in Wiktionary since 2017 and I am very frustrated that Wiktionary's entries for counties 县, towns 镇 and townships 乡 in mainland China, where simplified Chinese is the primary written script, all use only traditional characters in zh-div. See my most recent entry 苔菉 (Táilù) which reads "(～鎮) Tailu (a town in Lianjiang, Fuzhou, Fujian, China)". It ought to read something like, "(～鎮/镇) Tailu (a town in Lianjiang, Fuzhou, Fujian, China)" or perhaps even "(～镇/鎮) Tailu (a town in Lianjiang, Fuzhou, Fujian, China)". This should be possible to accomplish given that simplified characters automatically appear when traditional characters are added in zh-der or zh-l. The time for this rank absurdity is over my friends. Is this a website ready to be used by the world or is it still in beta? --Geographyinitiative (talk) 12:45, 14 March 2020 (UTC)[reply]

It's not a bad idea. But the status quo isn't "rank absurdity"; it's perfectly consistent with our choice to lemmatise at traditional forms. Many people have tried to tell you this, but I will try again: histrionics and rants will only make people less likely to side with you. (Also, Wiktionary is always "in beta" — that's how it works.) —Μετάknowledge^{discuss/deeds} 17:29, 14 March 2020 (UTC)[reply]

Capitalization of nationalities

In most Latin script based languages, nationalities are capitalized. This is the case in English, and it is reflected as such on en.Wikt. @Rua however seems to disagree with this general practice, and when I probed her for an explanation, she declined, and has now locked Reconstruction:Proto-Germanic/finnaz without any rationale. --{{victar|talk}} 22:23, 15 March 2020 (UTC)[reply]

This is hard, because it's an arbitrary orthographic distinction in a reconstructed language. We seem to follow Ringe in most matters on PGem, so what does he do in terms of capitalising names of nationalities? —Μετάknowledge^{discuss/deeds} 22:33, 15 March 2020 (UTC)[reply]

It reminds me of the considerations about Latin.

* Wiktionary:Tea room/2019/July § Boundaries of noun vs. proper noun in Latin, and use of capital vs. lowercase initial letters

I reasoned that for Latin it is most appropriate to capitalize the noun but the adjective not. Aptly the first example in the first discussion linked is Fennus. It would be odd to have it uncapitalized, the same in Proto-Germanic. Note particularly my third point there: “English is irregular in capitalizing both nouns and adjectives.” Fay Freak (talk) 22:41, 15 March 2020 (UTC)[reply]

@Metaknowledge: We capitalize reconstructed given names, so to capitalize one and not the other seems far more arbitrary. Most authors avoid reconstructing names, places, nationalities, etc., altogether, but I can try and dig up some examples, if that helps. To note though, {{R:gem:HGE}} does not capitalize any of its entries, but Orel is also Russian, and Russians do not capitalize nationalities in Cyrillic. --{{victar|talk}} 22:56, 15 March 2020 (UTC)[reply]

Names of nationals are essentially unlike given names. But note according to my logics Volscī is a proper noun, as Polish Niemcy (“Germany”, literally “Germans”), while Volscus is a noun – compare English Maya as a proper noun and as a noun –, and volscus is but an adjective. I think we are most consistent if we do it like this for Proto-Germanic. Fay Freak (talk) 23:14, 15 March 2020 (UTC)[reply]

@Fay Freak: That would be fine with me. --{{victar|talk}} 23:53, 15 March 2020 (UTC)[reply]

@Metaknowledge: Ringe does indeed capitalize nationalities, ex. *Rūmōnīz ({{R:gem:PIEPG|146}}), whereas we (Rua) do not. --{{victar|talk}} 23:53, 15 March 2020 (UTC)[reply]

This however does not lead us to believe the same for Proto-Slavic. Category:sla-pro:Nationalities has all in lowercase, presumably because Proto-Slavic orthography follows Old Church Slavonic Cyrillic orthography – and it is the same for all Cyrillic-written Slavic languages except Serbian because this mirrors the Latin orthography used concurrently, so we write Serbo-Croatian Чѝфутин (“Yahudi”). Fay Freak (talk) 22:51, 15 March 2020 (UTC)[reply]

Apart from Serbo-Croatian, Slovene, Macedonian, Polish, Czech and Slovak also capitalise nationalities (nouns referring to nationalities only, not language names or adjectives!). Russian, Ukrainian, Belarusian and Bulgarian don't. Just listed the major Slavic languages. In Romance languages, Italian, Romanian, Portuguese and Spanish don't capitalise at all. French capitalises only nouns referring to nationalities, not language names or adjectives. --Anatoli T. ^{(обсудить}/^вклад) 00:17, 16 March 2020 (UTC)[reply]

I already posted this purview of Slavic and Romance at the first discussion linked 🤓. Can somebody make an overview of the practices in the Germanic languages? Well it is basically Dutch, Afrikaans, Danish, Norwegian, Swedish, Faroese, and Icelandic, right, minority languages will copy the majority practice I presume (how Luxembourgish?). Perhaps one needs to document somewhere in a table the practices of all standard languages for reference. Fay Freak (talk) 00:57, 16 March 2020 (UTC)[reply]

Where is your Slavic purview? Anyway, Dutch/Afrikaans capitalise everything (nationality - nouns and adjectives). German capitalises ALL nouns but all adjectives (including nationality-related) are lower case (e.g. deutsch). North Germanic are all lower case. --Anatoli T. ^{(обсудить}/^вклад) 01:10, 16 March 2020 (UTC)[reply]

The passage “3. The writing tradition differs according to the country […]“ at Wiktionary:Tea room/2019/July § Boundaries of noun vs. proper noun in Latin, and use of capital vs. lowercase initial letters. Fay Freak (talk) 01:16, 16 March 2020 (UTC)[reply]

Translations of big fish in a small pond

Shouldn't translations of an entry be the same part of speech as the entry? In big fish in a small pond, the Mandarin and Spanish translations seem to be idioms or proverbs rather than nouns. — SGconlaw (talk) 13:57, 16 March 2020 (UTC)[reply]

You're right, I've removed the Spanish. Ultimateria (talk) 18:43, 16 March 2020 (UTC)[reply]

Oaths

What is the function of Category:Oaths by language? There are very few entries, and it looks like they're just various swear words and minced oaths added by an overzealous IP. Ultimateria (talk) 18:41, 16 March 2020 (UTC)[reply]

I think they're an attempt to fill the role of Category:Vulgarities by language by someone who doesn't know the terminology we use. Chuck Entz (talk) 02:37, 17 March 2020 (UTC)[reply]

Are things like by all that is good and holy and by my troth vulgarities? DCDuring (talk) 02:54, 17 March 2020 (UTC)[reply]

There may be a core of legitimate old examples, but the flood of new additions are merely vulgar or euphemized vulgar interjections. "Frig it" is not an oath as far as I'm concerned. Maybe I just need to do some reverting... Chuck Entz (talk) 03:08, 17 March 2020 (UTC)[reply]

Also, by God is/was apparently not considered blasphemous by most speakers (and editors) AFAICT. I feel we can't trust our modern atheistic sensibilities to correctly label and categorize minced oaths, euphemisms etc. DCDuring (talk) 15:22, 17 March 2020 (UTC)[reply]

Yeah, many entries were added by a user who seems to not quite grasp how we label things / how to label things; look at the edit history of like a cow pissing on a flat rock, for example. As Chuck says, it's possible there are some legitimate candidates for such a category (though whether there are enough to merit the category existing is another question), but there are (or were at various times recently) wrong entries. - -sche (discuss) 01:55, 18 March 2020 (UTC)[reply]

"Eye dialect" label

I think we should consider abolishing the "eye dialect" label and replacing it with something else, such as perhaps "pronunciation spelling", or whatever we can agree on. My feeling is that the term is largely unknown within the general population, and also there seems to be perennial uncertainty or lack of agreement about whether it should or should not include cases such as gerrit (= "get it"), goin', dwagon etc., that represent nonstandard or defective pronunciations. What do you think? Mihia (talk) 21:19, 17 March 2020 (UTC)[reply]

I think we should keep it, but we should only use it for things that are actually eye dialect: nonstandard spellings that represent the standard pronunciation, like sez. Your examples represent nonstandard pronunciations and so are not eye dialect. —Mahāgaja · talk 21:28, 17 March 2020 (UTC)[reply]

Just to mention again (you may already be aware, and apologies if it is repetitive) that this is not what eye dialect presently says in the "broad" definition. Mihia (talk) 23:32, 17 March 2020 (UTC)[reply]

As another data point, I see that the "broad" definition in our regular entry at [[eye dialect]] is at odds with the Appendix:Glossary definition at [[Appendix:Glossary#eye_dialect]].

I may be wrong, but I was under the impression that the Appendix:Glossary is (at least supposed to be) the set of definitions relevant to our various labels. It occurs to me to wonder if that is a correct understanding? ‑‑ Eiríkr Útlendi │^{Tala við mig} 23:53, 17 March 2020 (UTC)[reply]

When you click on the "eye dialect" label link, it takes you to the dictionary entry at eye dialect, so that is what users who don't know the meaning of the term will be reading. Mihia (talk) 00:01, 18 March 2020 (UTC)[reply]

@Eirikr FWIW, Appendix:Glossary seems to have only changed to its current, narrow definition in June of last year by the user who prefers that the narrow definition be the only definition used. (Who is a great lexicographer, and just seems to be in the minority here when it comes to ideas of how broadly the label can be used.) - -sche (discuss) 01:49, 18 March 2020 (UTC)[reply]

@Mihia: The discussion I'm always referring people to is Wiktionary:Tea room/2015/January § gub'mint. See especially the second part of the discussion, which starts with Dan Polansky's intervention. I think it encapsulates perfectly what the problem is. (Note that "Angr" is Mahagaja's former username.) P U C 23:55, 17 March 2020 (UTC)[reply]

If we have not properly resolved this since 2015, I think that is a good reason for abolishing the label altogether. Mihia (talk) 00:18, 18 March 2020 (UTC)[reply]

Yeah, I mean, it's clear that both newer users as they come along and several veteran editors continue to use the label and template broadly, so realistically we will not reach a situation where only "narrow-definition eye dialect" entries use the template/label. That's exactly the kind of situation where I would consider either retiring the template altogether ... or just continuing our longstanding broad/loose usage of it (which might entail revisiting the entries that have been switched to {{pronunciation spelling of}}) (until it was deleted, there was also {{pronunciation respelling of}}). - -sche (discuss) 01:49, 18 March 2020 (UTC)[reply]

Reconstruction:Proto-West Germanic/-ōjan

West Germanic *-ōjan is an extended form of the verbal suffix *-ōn, specific to North Sea Germanic languages. @Rua deleted it (now restored) and believes it shouldn't exist because it's a "Post-PWG innovation". If you follow that same logic, we would have to delete all the Proto-Nuclear-Indo-European entries that aren't shared with Anatolian, ex. *-éh₁yeti extended from *-yeti. I think having a North Sea Germanic label would be a great idea, but deleting it simply because it's a dialectal form of West Germanic makes no sense. @Mahagaja, Leasnam, Metaknowledge --{{victar|talk}} 18:17, 19 March 2020 (UTC)[reply]

I agree that regional PWG is still PWG. Besides, unless we set up a code for Proto-North Sea Germanic, where else are we going to put it? —Mahāgaja · talk 18:24, 19 March 2020 (UTC)[reply]

@Mahagaja: I had created two separate entries, *-ōjan and *-ōn, but Rua merged the former into the latter, even though they're two separate suffixes. --{{victar|talk}} 18:29, 19 March 2020 (UTC)[reply]

The innovation was not formed within a single language, but rather a continuum. It then spread to some degree through the continuum. But it's, per Ringe, a post-PWG innovation. Moreover, the two suffixes have the same function and are allomorphs, so they should not be treated as distinct lemmas. Rather, one would be a dialectal form of the other.

In any case, if we accept this allomorph as part of PWG, does that mean we have to include all those allomorphs in our inflection tables of class 2 and 3 weak verbs? That seems excessive. —Rua (mew) 19:40, 19 March 2020 (UTC)[reply]

Even if we assume it's a regional innovation, I think there is a very strong case of it still occurring quite early, at a point one could argue being still PWG, or at least a dialect of it, or "post-PWG", as you like to say.

To answer your question though, I would rather two tables, akin to PIE entries like *wódr̥, but maybe a combined *-ō(ja)n table in an option. What do you think? We also have to ask if the plural forms on the *-ōjan table should be merged, as per PNSG. --{{victar|talk}} 21:15, 19 March 2020 (UTC)[reply]

@Rua I see you edited *-ōjan to your liking but have not replied to the above. Can you please do so? Thanks. --{{victar|talk}} 16:40, 20 March 2020 (UTC)[reply]

Two tables is excessive, especially if we are going to have to do that for every class 2 and 3 weak verb. —Rua (mew) 17:14, 20 March 2020 (UTC)[reply]

@Rua Yeah, you said that. What other options would you support? --{{victar|talk}} 17:16, 20 March 2020 (UTC)[reply]

Violation of the rules (Edit warning) by Metaknowledge

I don't think I need to repeat myself. I wrote the details on my discussion page. If this is necessary, I will definitely write again. The administrator did not pay attention to this. -- Gnosandes (talk) 19:56, 19 March 2020 (UTC)[reply]

Suevic

This vote converted Frankish from a separate language to an etymology-only variety of Proto-West Germanic. However, I just discovered that Suevic is an etymology-only variety of the West Germanic language family, not of the proto-language. Generally, do we want to have such things as etymology-only varieties of families as opposed to (proto-)languages? And specifically, do we want Suevic to continue to be a variety of the West Germanic family, or do we want to convert it to a variety of Proto-West Germanic? —Mahāgaja · talk 20:19, 19 March 2020 (UTC)[reply]

I think the latter makes sense, making it an etym-only code for PWG. I created an entry for PWG *laiwarikā as an example of a possible Suebic borrowing. --{{victar|talk}} 23:14, 19 March 2020 (UTC)[reply]

I see your example with *laubiju, which illustrates the issue I've been mulling over of how to demonstrate what variety of PWG a borrowing comes from, be it Frankish, or Suebic. --{{victar|talk}} 23:30, 19 March 2020 (UTC)[reply]

Language of place names

I noticed Dumbrăveni among the new pages list this morning. It's a city name so explicitly allowed under the criteria for inclusion... but it was added as an English word. Should it have an English entry or only Romanian? Or Translingual, but Paris has separate entries for all the languages you might refer to Paris in. And México is in English as Mexico although I have seen an English-language book use the accented form. Is there a policy that covers this situation? Vox Sciurorum (talk) 11:55, 20 March 2020 (UTC)[reply]

No. The status quo seems to be that all place names are English. DTLHS (talk) 16:34, 20 March 2020 (UTC)[reply]

Roma and Ουάσιγκτον (Ouásigkton) are placenames, but I doubt they are English. --Lambiam 21:03, 20 March 2020 (UTC)[reply]

Fine, I should have said, "all places can have English entries". DTLHS (talk) 21:12, 20 March 2020 (UTC)[reply]

I don't think that's true, either. It's simply that it's quite hard to find a placename that hasn't been used in English text or on an English map enough to meet CFI — but it's theoretically possible. —Μετάknowledge^{discuss/deeds} 21:59, 20 March 2020 (UTC)[reply]

Does quoting a foreign word three times in English turn it into an English word? Or is there a phase where it's still considered foreign, and in time it is considered part of the language as people forget its origin? Vox Sciurorum (talk) 23:06, 20 March 2020 (UTC)[reply]

Quoting is insufficient; it must be used. See use-mention distinction. —Μετάknowledge^{discuss/deeds} 23:08, 20 March 2020 (UTC)[reply]

It appears as "Dumbrăveni" in English text so it is fine. I assume it is borrowed straight from Romanian so would also have a Romanian entry with their pronunciation. -Mike (talk) 21:18, 22 March 2020 (UTC)[reply]

I couldn’t figure out its grammatical gender in Romanian. --Lambiam 02:00, 23 March 2020 (UTC)[reply]

The sheer terror of English prepositions

Are we going to do WikiGrammar yet? (see Talk:get_around.) I think we are completely right in refusing to have entries for SoP stuff with a preposition stuck on it. On the other hand we are not making it easy for people to learn which preposition works with which verb. (It's not quite like Finnish where they are glued on the end...) Just a thought. WikiGrammar could also answer all those boring questions like "is it really okay to split an infinitive?" and this crazy bullshizz [5]. Equinox ◑ 01:44, 22 March 2020 (UTC)[reply]

I still think we should include collocations in some capacity. We'll never have the potential to be a good translation dictionary until we do. Andrew Sheedy (talk) 05:54, 22 March 2020 (UTC)[reply]

Agreed. And not only translation, many dictionary users are probably accustomed to looking up phrases and not being able to do so on Wiktionary might be a serious drawback. Crom daba (talk) 05:02, 28 March 2020 (UTC)[reply]

Examples of English verbs that cannot be adjectives

It is common that the simple past tense and past participle form of a verb can also be used as an adjective in English: e.g. covered. We have many such verb forms that do not include the adjective form as well: e.g. weatherized. I'm wondering if there is any pattern to verbs that can and cannot be adjectivize-ed like this so that I can add several of these adjective forms in one go. Does anyone know of tacit or explicit grammatical rules regarding this? Thanks. —Justin (koavf)❤T☮C☺M☯ 21:50, 23 March 2020 (UTC)[reply]

Why add an adjective section to the entry for the past (and present) participles if the meaning is completely predictable from the meaning of the verb? If there is a sense that is not predictable, that seems worthwhile. Usage examples for the participles showing their varied use seem worthwhile, too. That such a thing is predictable is exactly what makes it not worth a separate PoS section. IMHO, this is a case where ease of copying makes for a waste of user — and contributor — time. DCDuring (talk) 00:51, 24 March 2020 (UTC)[reply]

First off, that is why I'm asking. I'm trying to figure out if some verbs can act like adjectives like this and some can't: maybe some entries could use the adjective form and others are ungrammatical. Additionally, some of these adjectives may have non-obvious meanings or some verbs only have one sense that can be adjective-ized (still not sure how to hyphenate that). Secondly, I think it's totally worth it in as much as these entries are all "[verb]-ed: simple past tense and past participle form of [verb]", so I don't see any readers' time being wasted or any entries becoming unwieldy and long. —Justin (koavf)❤T☮C☺M☯ 01:18, 24 March 2020 (UTC)[reply]

I don't know if there's any way to predict, from just the form/etymology/pronunciation/sense of the word itself, whether or not it's been used as an adjective. One would have to check for citations where the word is clearly adjectival and not verbal. In particular, any that has -er and/or -est forms clearly deserves and benefits from an adjective section; this makes me realize that fallen (another one I looked at, which we labelled incomparable) has an -est form. - -sche (discuss) 03:18, 24 March 2020 (UTC)[reply]

All the transitive ones have 'adjective' meanings like "that is [VERB]ed". When such a past participle is used predicatively it is not readily distinguished from being a component of a passive use of the verb. "The door was closed" could refer to the action of a door having been shut or the state of the door as the result of the action. The usual tests for adjectivity include gradability, comparability, predicate use (not readily applicable as mentioned above), and novel semantics. Just about any past participle can be used gradably or comparably, though sometimes the use is uncommon.

More confusing are the denominal verbs like arm. A regular formation is the application of -ed to the noun to yield a meaning like "having arms or an arm" for most definitions of the noun arm. Then adding -ed to the verb arm yields a word that, for some meanings of arm (verb) is not readily distinguished from the meaning of arm(s) + -ed.

I just don't see that there is any simple rule to follow that won't lead to the addition of adjective PoS sections that are spurious.

A more laborious process would be to add a References header with links to OneLook and OED and look up whether other dictionaries have adjective definitions of the past participles. Then you could add the definitions, while carefully avoiding COPYVIO. DCDuring (talk) 03:39, 24 March 2020 (UTC)[reply]

If the simple past and past participle verb forms are not identical, only the latter can be used as an adjective (ate – eaten, drank – drunk, flew – flown, forsook – forsaken, went – gone, swore – sworn, wrote – written). --Lambiam 14:12, 24 March 2020 (UTC)[reply]

“A had chance” and “a been opportunity”, although understandable, sound peculiar to me. --Lambiam 14:22, 24 March 2020 (UTC)[reply]

I’d recommend the following boundaries, for both present and past participles:

If the term in question has an attestable meaning that is not entirely predictable from the verb, do include it.
Otherwise, if the term in question does not pass, attestably, both the gradability and the comparability test, do not include it.

As to the first one, I’d like to see it applied while erring on the side of inclusion. As to the second one, there may be a semantic reason for the lack of gradability/comparability (as for knocked up), which then offers an exemption. --Lambiam 14:41, 24 March 2020 (UTC)[reply]
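The two boundaries above can be sketched as a small decision function in Python. This is only one reading of the proposal, not an agreed policy: the class and field names are hypothetical, "does not pass both tests" is taken to mean that a term must pass both to qualify, and the semantic exemption is taken to mean that failing the tests for a semantic reason does not count against inclusion.

```python
from dataclasses import dataclass


@dataclass
class ParticipleEvidence:
    """Attestation findings for one participle (all field names are hypothetical)."""
    has_unpredictable_sense: bool     # attested meaning not entirely predictable from the verb
    passes_gradability: bool          # attested in graded use, e.g. with "very"
    passes_comparability: bool        # attested in comparative or superlative use
    semantic_exemption: bool = False  # the meaning itself rules out grading/comparison


def include_adjective_section(ev: ParticipleEvidence) -> bool:
    """Apply the two proposed boundaries in order."""
    # First boundary: an attestable, not entirely predictable meaning is enough.
    if ev.has_unpredictable_sense:
        return True
    # Exemption (assumed reading): a semantic reason for failing the tests
    # means the second boundary does not exclude the term.
    if ev.semantic_exemption:
        return True
    # Second boundary: otherwise the term must pass both tests.
    return ev.passes_gradability and ev.passes_comparability


# Illustrative inputs only; these flags are made up, not attestation results.
predictable_ungradable = ParticipleEvidence(False, False, False)
predictable_gradable = ParticipleEvidence(False, True, True)
```

Under this reading, `include_adjective_section(predictable_ungradable)` is false and `include_adjective_section(predictable_gradable)` is true; erring on the side of inclusion for the first boundary would be a matter of how generously `has_unpredictable_sense` is judged.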

Geographyinitiative's latest rant about Chinese

For those interested, expand the fold.

For @Geographyinitiative, your behavior in this pursuit is strange, not easily understood by others, and appears highly indicative of a disturbed mind. Please seek professional psychiatric help. This is a serious and sincere offering of advice. ‑‑ Eiríkr Útlendi │^{Tala við mig} 16:14, 26 March 2020 (UTC)[reply]

"Shaming Wiktionary", "Further shaming" threads

Shaming Wiktionary

Chinese is not a language, not even on Wikipedia: in the second sentence on Chinese language, it says "Chinese languages"

You all need to be shamed for allowing this website to pretend that Mandarin Chinese and Cantonese are the same language. Also, Hokkien/Min Nan is another language. Also, Middle English has its own header on Wiktionary, but isn't it funny that Chinese doesn't have, I don't know, maybe an Ancient Chinese header? or a Classical Chinese header? It almost seems like we want to pretend that there is a language with 10 existing unintelligible dialects that has never changed for 5000 years. Bizarre huh?

^[1]^[2]^[3]

In order to protect my account, I can not do any direct responses since I am directly challenging the status quo on this website.

Template:zh-pron is an abomination because it puts Mandarin on top. In "Translation" boxes on English word pages, Mandarin is put right before Min Dong and right after Hakka where it belongs: in alphabetical order.

Again, I can't do direct responses because of the danger to the existence of my account. But I can speak truth to the man.

--Geographyinitiative (talk) 13:15, 26 March 2020 (UTC)[reply]

Further shaming

You remember all that rich history and language from 5,000 years of uninterrupted Chinese history? Yeah, all of that is under ONE language header on Wiktionary. There was never a time when anyone in the borders of modern China used Chinese characters in a way that constituted something outside the scope of "Chinese" (and Mandarin is the de-facto form of Chinese, so it has to get the top spot!) Be ashamed --Geographyinitiative (talk) 15:27, 26 March 2020 (UTC)[reply]

What's more likely: that Wiktionary is being enthralled to a backwards political campaign to destroy non-Mandarin languages, or that there are some other languages that have cropped up beside Mandarin Chinese in the past 5000 years? Oh yeah, don't forget: for this one language header on Wiktionary, there are nine Wikipedia language versions! Look upon your shame, Wiktionary!

[1] Victor Mair (2018 January 3) “Hakka now an official language of Taiwan”: “Hakka thus joins Taiwanese / Hokkien / Hoklo and Mandarin as an official language of Taiwan.”
[2] Henning Klöter, “Language Policy in the KMT and DPP eras”:
Other terms for the same language include Hoklo (also spelled Holo; the etymology of these terms is uncertain), Taiwanese Min, and Taiwanese Hokkien. Hokkien reflects the Southern Min expression Hok-kien for Fujian. Its frequent use notwithstanding, the term Taiyu/Taiwanese has also been criticised as it suggests that Southern Min is the only local language of Taiwan. This is by no means the case. Other languages spoken in Taiwan are Mandarin Chinese
[3] “Taiwanese Hokkien/Southern Min”: “The Southern Min languages, or Min Nan, are a family of Chinese languages spoken in southern Fujian, in Taiwan (where it is known as Taiwanese) and in Southeast Asia (known as Hokkien in Singapore and Malaysia).”

It looks like you read the first paragraph of the Wikipedia article you linked; you should move on and read the second. It would answer several of your questions. - TheDaveRoss 16:30, 26 March 2020 (UTC)[reply]

Well, I support the idea to rearrange t:zh-pron in alphabetical order. By the way, any links for where these comments are from? 恨国党非蠢即坏 (talk) 02:31, 27 March 2020 (UTC)[reply]

I oppose this rearrangement. Mandarin pinyin is definitely more in demand than any other Chinese variety. --Anatoli T. (обсудить/вклад) 06:54, 27 March 2020 (UTC)[reply]

If we were to sort by "more in demand", Spanish L2 should have come before, say, Czech ones. 恨国党非蠢即坏 (talk) 09:59, 27 March 2020 (UTC)[reply]

The content above was made on this page, in this section, before the section was given a new header and the comments collapsed into that box.- -sche (discuss) 03:32, 27 March 2020 (UTC)[reply]

Thank you very much. 恨国党非蠢即坏 (talk) 05:44, 27 March 2020 (UTC)[reply]

I don't know what Geographyinitiative is mumbling about. There is no other site that gives so much respect to Chinese lects and so much information to users and learners. If he were truly interested, he would strive to add more Min Nan and other topolect content. Anyone who has seen Chinese entries in expanded mode knows how much information there is: various types of romanisation and regional readings. There are not only Chinese topolects (major Chinese/Sinitic dialects or languages) but also subdialects.

There are 69,994 Min Nan entries and counting. If something is (still) missing, it's because it's hard to add or nobody has the knowledge. Look at Min Nan 講／讲 (kóng / káng) (a very simple example).

Look at these usage examples:

我袂曉講閩南語。 [Hokkien, trad.]
我袂晓讲闽南语。 [Hokkien, simp.]

Góa bē-hiáu kóng Bân-lâm-gí. [Pe̍h-ōe-jī]

I can't speak Min Nan.

伊無佇厝。／伊无伫厝。 [Hokkien] ― I bô tī chhù. [Pe̍h-ōe-jī] ― He/she is not home.

There's a plethora of information for anyone who is really interested: traditional and simplified characters, POJ romanisation, and links to each word, which will have more information including the pronunciation of each character.

It's Geographyinitiative who should be shamed, not everyone who has been doing a great job here. --Anatoli T. (обсудить/вклад)

I agree that Chinese is not really a single language, but it is a lot more convenient and organized to have all the info on pronunciation across the various varieties, historical character forms, archaic meanings, etc. in one section. —Enervation (talk) 07:57, 11 September 2020 (UTC)[reply]

Part of speech "idiom"

Wiktionary:Entry_layout says "some POS headers are explicitly disallowed: ... Clitic, Gerund, Idiom", but there are many entries using this POS, e.g. 美輪美奐. Which one is right? 恨国党非蠢即坏 (talk) 09:52, 27 March 2020 (UTC)[reply]

I think we have essentially two definitions of idiom. The one that is explicitly prohibited as a header, because it's nonsensical, is idiom in the sense of a non-sum-of-parts term. Almost every term should be non-sum-of-parts. But the Chinese definition of idiom is a bit different and I recall User:Wyang or someone saying that it was hard to assign such terms to a specific part of speech, like noun, verb, or adjective. (I don't remember what discussion this was in.) So there are lots of Chinese entries with the Idiom header because it's hard to convert them to another header and part of speech category. — Eru·tuon 17:30, 27 March 2020 (UTC)[reply]

@Erutuon: We have "phrase" and "proverb". 恨国党非蠢即坏 (talk) 17:24, 1 April 2020 (UTC)[reply]

Learning Mandarin growing up, I remember the term "idiom" being used to describe such Mandarin phrases. I'd agree they are really like phrases (like 水落石出 (shuǐluòshíchū, “the truth is revealed”, literally “the water recedes and the rocks appear”)) or sometimes proverbs (like 平时不烧香，临时抱佛脚 (píngshí bù shāoxiāng, línshí bào fójiǎo, “never burning incense when all is well, but clasping Buddha's feet when one is in trouble”)), so either of those headings would be appropriate depending on the term. — SGconlaw (talk) 17:31, 1 April 2020 (UTC)[reply]

@Sgconlaw, that latter one reminds me of the English saying, there are no atheists in foxholes. :) ‑‑ Eiríkr Útlendi │Tala við mig 18:18, 1 April 2020 (UTC)[reply]

I think we should modify WT:EL to allow "Idiom" for certain languages (like Chinese). Another way we could do this is to change it to "Chengyu", "Xiehouyu", etc. to be more specific. @Tooironic, Suzukaze-c, Atitarev, thoughts? — justin(r)leung { (t...) | c=› } 03:59, 15 May 2020 (UTC)[reply]

Chinese terms can be labelled as "chengyu", "xiehouyu" or "proverb". If they are longer, "phrase" is allowed as a PoS. I have no objection to re-allowing "idiom" as a PoS. --Anatoli T. (обсудить/вклад) 04:06, 15 May 2020 (UTC)[reply]

I like the idea of using Chengyu as a part of speech header. We just need to be clear about how Wiktionary defines what is and what is not a chengyu. It can get complicated. Certainly, it would be incorrect to label them as 'adjectives', 'nouns', etc. ---> Tooironic (talk) 08:05, 15 May 2020 (UTC)[reply]

I notice we have the terms chengyu and xiehouyu in the Wiktionary as English terms, but I really wonder if they have been fully assimilated into the language. In any case, I don't think it would be good to use them as headings as they are not widely understood. However, they could be mentioned in usage notes. — SGconlaw (talk) 11:17, 15 May 2020 (UTC)[reply]

The explanatory paragraph in the entry for xiehouyu and the long definition for chengyu suggest that the entries' contributor(s) didn't think the terms were understood by very many. No other OneLook dictionary has an entry for either (not to say we shouldn't, with the appropriate label).

Further, these certainly seem like hyponyms of proverb to me. They might make useful category names, but seem completely inappropriate for PoS headings in an English-language dictionary of Chinese.

Aren't Proverb and Phrase adequate? We've forced English into the procrustean bed of the Latin parts of speech, with minimal accommodation (Determiner, Proverb, Phrase). Why shouldn't Chinese suffer the same? DCDuring (talk) 14:24, 15 May 2020 (UTC)[reply]

Vulgar Latin conjugation and declension tables

Hi guys,

I have noticed we have those neat templates VL-conj and VL-decl, along with whole declension / conjugation class breakdowns and a split between Eastern and Western forms. The contents of these templates are, however, never sourced, and their use is sometimes not justified (like presenting the Eastern declension of *fortia even though this lexeme is not attested in Eastern Romance). So far the templates don't even have proper documentation. Is there some policy that would apply here? Doesn't it fall a bit under original research? Michalite (talk) 10:09, 27 March 2020 (UTC)[reply]

We don't have a rule against original research here; in fact, we use original research in order to produce definitions all the time! But I agree that these are a bit questionable, and I understand your suspicion. In cases where it is not justified, there is a potential technical solution, but if you think they should be removed, that is a discussion in which all Latin editors should be invited to express their thoughts. —Μετάknowledge (discuss/deeds) 20:24, 28 March 2020 (UTC)[reply]

Why do Old English entries lack macrons? And why do Hebrew entries mark begadkefat consonants?

I've noticed two things: 1) Old English entries do not mark vowel length in the page names, neither by macron nor by acute accent; and 2) Hebrew terms always mark begadkefat, even though the concept is almost universal with the typical consonants and is one of the first things taught to learners. I propose that we rename the Old English pages to use acute accents, and that the Hebrew tag "terms beginning with a begadkefat letter" simply be removed. Starbeam2 (talk) 17:27, 27 March 2020 (UTC)[reply]

I can't speak to Hebrew, but the actual Old English texts that I've seen aren't marked for length, and length information isn't universally available. In such cases (including Latin), we have the page name without the macron, but show the macron in the headword. See WT:AANG and WT:AHE for more. Chuck Entz (talk) 17:51, 27 March 2020 (UTC)[reply]

To clarify, "the Hebrew tag 'terms beginning with a begadkefat letter'" means Category:Hebrew terms beginning with a begedkefet letter I think. — Eru·tuon 18:11, 27 March 2020 (UTC)[reply]

Erutuon is right; I forgot the exact name, but that was what I meant. Anyhow, Chuck Entz, the vast majority of Old English words have predictable length. While I understand that the actual texts did not mark macrons, most people learning Old English, Latin, or any pre-language for any reason aren't reading directly from the original manuscripts. That said, I understand it more easily knowing that Latin and other languages that originally didn't mark length also do this. Starbeam2 (talk) 19:09, 27 March 2020 (UTC)[reply]

How is Old English vowel length predictable? — Eru·tuon 19:20, 27 March 2020 (UTC)[reply]

Because the descendants and cognates typically have long vowels as well. It's obviously not foolproof, tho. Starbeam2 (talk) 20:18, 28 March 2020 (UTC)[reply]

That is not what people mean when they say "predictable", and is a terrible basis for making an Old English dictionary. —Μετάknowledge (discuss/deeds) 20:20, 28 March 2020 (UTC)[reply]

For Old English, you're proposing that we change the length mark from macron (stān) to acute (stán). Why? — Eru·tuon 23:01, 28 March 2020 (UTC)[reply]

I have also wondered about the usefulness of Category:Hebrew terms beginning with a begedkefet letter. What purpose does it serve? —Mahāgaja · talk 19:59, 27 March 2020 (UTC)[reply]

Formatting of terms that are both a plural form and a plurale tantum

Discussion moved from Wiktionary:Tea room/2020/March#Formatting of terms that are both a plural form and a plurale tantum.

How are terms that are both a plural form and a plurale tantum formatted? For example, heats is the plural of "heat" and also a plurale tantum meaning "a period of hot weather". This is a problem because those terms need the categories for both nouns and lemmas, and noun forms and non-lemmas; one head template is not able to do that.

I used two numbered headings ("Noun 1" and "Noun 2") but it seems that is nonstandard; my edits were reverted. I also looked at the Garinagu entry, which uses two noun headings but they are not numbered. J3133 (talk) 00:23, 28 March 2020 (UTC)[reply]

You should use two headers. There is no need to number them. DTLHS (talk) 00:26, 28 March 2020 (UTC)[reply]

Old Saxon adjective table (e.g. siok)

This template creates wrong forms. Clearly, the neuter and feminine forms are mixed up (cf. "neuter" genitive -aro and "feminine" genitive -es). But I'm not sure that's all there is to it, because in the weak plural there's a nominative-accusative distinction for all genders including the neuter. I can't positively rule out that such a change happened in Old Saxon by analogy with the other two genders, but the rule that nominative = accusative in the neuter is pretty much an Indo-European generality throughout the ages, so it surprises me. Please invert the neuter and feminine forms and double check the whole thing. Thank you. 178.4.151.167 09:35, 28 March 2020 (UTC)[reply]

I fixed the issues you mentioned and some other blatant inaccuracies. I also added a bunch of possible variant endings for various forms; however, the number of dat. sg. strong forms is currently maybe a bit high. I'm not sure which to prune; I already left out quite a few possible variant endings. I believe adjs. like blīthi also have variant declensions not accounted for by the current module. There's a fair bit of work to be done on this template. — Mnemosientje (t · c) 15:19, 31 March 2020 (UTC)[reply]

@Mnemosientje Thank you! 92.218.236.35 23:01, 15 April 2020 (UTC)[reply]

PWG compounds

I've noticed some PWG compounds are being reconstructed with an intermorphemic thematic vowel, like *godakund and *hīwarād. If these are newly created compounds, shouldn't these be *godkund and *hīwrād? Even if they're inherited from PG proper, syncopation should be taken into account, no? @Leasnam, Rua, Holodwig21 --{{victar|talk}} 21:07, 28 March 2020 (UTC)[reply]

Note that *bodaskapi has the vowel actually attested. The loss of short word/component-final vowels appears to be a post-PWG development. It occurred late enough that the lost vowels triggered umlaut. —Rua (mew) 21:59, 28 March 2020 (UTC)[reply]

@Rua: Noted. There are also examples of the retention of such intermorphemic vowels, but perhaps syncopation was also governed by certain syllabic rules. But again to my first point, shouldn't many newly constructed PWG compounds be lacking this thematic vowel, such as the two examples above? --{{victar|talk}} 22:39, 28 March 2020 (UTC)[reply]

Aside from the aforementioned, Ringe includes these compounds with the theme vowel still present in his PWG reconstruction: *feþruhamō, *fullalaistijan, *gaduling, *galīkanassī, *gamainiskapi, *ganuhtisam, *gasinþaskapi, *hagatusi (if it's actually a compound), *langasam, *lobasam, *nāhawisti, *skuldihaitijō. Every attested language has undergone regular loss of such vowels after heavy syllables, and this loss postdates PWG. The vowels were preserved in light syllables, even in the case of a as shown by forms like *bodaskapi and *lobasam. Their apparent loss in other cases must therefore be analogical. It is likely that there was some pressure to make the compounded forms match the isolated forms in PWG and post-PWG times, and also to make the light forms match the heavy ones in the southern languages. But given that there are preserved instances, there are really only two possibilities for PWG: either a was lost after heavy syllables already in PWG, or (the option Ringe appears to advocate) a was preserved and its loss coincides with that of i and u in that position, with further analogical pressure due to the lack of a in the isolated form. —Rua (mew) 22:56, 28 March 2020 (UTC)[reply]

I find it a bit funny that I had a similar conversation re Proto-Brythonic some years ago, and the argument being made there was all descendants exhibit syncopation, so it must go back to PB.

My second concern hasn't been addressed: my examples of *godakund and *hīwarād, as new constructions, should be moved to *godkund and *hīwrād. Objections? --{{victar|talk}} 17:37, 30 March 2020 (UTC)[reply]

Why would new constructions be different from inherited ones? —Rua (mew) 21:39, 30 March 2020 (UTC)[reply]

Because in my examples, their first elements have no thematic vowel. -- {{victar|talk}} 22:12, 30 March 2020 (UTC)[reply]

That doesn't mean they didn't have a thematic vowel in PWG. And as the examples I gave show, such nouns did keep their thematic vowel. —Rua (mew) 09:59, 31 March 2020 (UTC)[reply]

*gamainī +‎ *-skapi = *gamainiskapi makes sense. *hīw +‎ *rād = *hīwarād does not. If the word was inherited from PG, that's a different discussion altogether. --{{victar|talk}} 17:52, 31 March 2020 (UTC)[reply]

Wait, why wouldn't that make sense? Just because the thematic vowel isn't visible in the nominative doesn't mean speakers of the language aren't aware of it or include it in new compounds. Works the same in Gothic: 𐌴𐌹𐍃𐌰𐍂𐌽 (eisarn) + 𐌱𐌰𐌽𐌳𐌹 (bandi) yields 𐌴𐌹𐍃𐌰𐍂𐌽𐌰𐌱𐌰𐌽𐌳𐌹 (eisarnabandi), not *eisarnbandi. — Mnemosientje (t · c) 18:53, 31 March 2020 (UTC)[reply]

@Mnemosientje: Thanks for the examples from Gothic. Because that's a pretty unique trick, inserting thematic vowels in compounds when they're lost in the nominative -- we don't see that in Middle Iranian languages and we don't see that in Proto-Brythonic, as far as I know. I'd just say, though, that Gothic is far more archaic than PWG, so I don't think we can apply the same rules whole hog. --{{victar|talk}} 21:50, 31 March 2020 (UTC)[reply]

It is an interesting phenomenon, but I am not sure it is unique given how early Germanic worked. I think this is really the crux of the argument here: (1) the phenomenon existed in PGmc and persisted into Proto-Norse and Gothic; (2) There is proof of it surviving in inherited terms in PWgmc which renders it not altogether unlikely that the thematic vowels were also used analogically in new compounds; (3) There are indications from Latinizations that new compounds (such as the name I mentioned) may have used it as well, but this is difficult to verify and (4) it is fundamentally very difficult to actually know due to the nature of our sources, which postdate PWgmc by hundreds of years. (I should have a closer look at the runic corpora for West-Germanic, I feel like it might hold some interesting data.) All in all, I think it's safest to just go with Ringe's view, especially seeing as he seems to basically be the only published academic source and precedent for our PWgmc project in its current form here on Wiktionary. — Mnemosientje (t · c) 08:56, 1 April 2020 (UTC)[reply]

Note that in the Slavic languages, the thematic vowel -o- is still productive for forming compounds despite the ending having been lost a millennium ago. However, it has since lost its connection to the stem-forming vowel, and is now used for any first element in a compound regardless of its original inflection. This is comparable to the situation in Ancient Greek, where -o- was inserted for many nouns that were original consonant stems. —Rua (mew) 09:38, 1 April 2020 (UTC)[reply]

Also, just to point out, our attestations of Gothic are from the 4th century (though the manuscripts are from the 6th), the same time that later PWG was spoken. Gothic would definitely be no more archaic than PWG. Phonologically they seem quite similar. —Rua (mew) 09:43, 1 April 2020 (UTC)[reply]

An interesting point regarding Gothic is that it doesn't always have the thematic vowel, even though the thematic vowel is used in most new compounds. I am not sure what conditions cause omission of the thematic vowel, but apparently there are some. Compare 𐍅𐌴𐌹𐌽𐌳𐍂𐌿𐌲𐌺𐌾𐌰 (weindrugkja) with 𐍅𐌴𐌹𐌽𐌰𐍄𐌰𐌹𐌽𐍃 (weinatains), for example. Further examples: 𐌷𐌰𐌿𐌷𐌸𐌿𐌷𐍄𐍃 (hauhþuhts), 𐌷𐌰𐌿𐌷𐌷𐌰𐌹𐍂𐍄𐍃 (hauhhairts), 𐌰𐌹𐌽𐍈𐌰𐌸𐌰𐍂𐌿𐌷 (ainƕaþaruh), 𐍃𐌹𐌲𐌹𐍃𐌻𐌰𐌿𐌽 (sigislaun). — Mnemosientje (t · c) 12:00, 2 April 2020 (UTC)[reply]

Leaving aside the arguments made so far I could add that names may provide additional evidence of the use of thematic vowels in new WGmc compounds. For example, Radegund appears to be a Proto-WGmc formation, but preserves as -e- the thematic vowel of *rēdaz instead of yielding *Radgund. Early Frankish compound names appear to use thematic vowels across the board. Some of these names will be inherited, but some, like Radegund, appear to be new formations. Cf. w:List of Frankish kings and w:List of Frankish queens. — Mnemosientje (t · c) 10:41, 31 March 2020 (UTC)[reply]

@Mnemosientje: *gunþi only survived into OE in poetry, so, for what it's worth, I doubt *Rādagunþi was formed in PWG. --{{victar|talk}} 00:33, 1 April 2020 (UTC)[reply]

The element is found in affixed form in OS guthia and in compounds such as PWGmc *gunþifanō, OHG gundhamo, a whole host of compounds in OE (cf. Bosworth Toller or have a look at the list here) and beyond. I strongly doubt the term as simplex was not productive and well-known in proto-West-Germanic (predating the OE attestation and compound/affixed attestations by hundreds of years, and representing their common origin), and I do not think we need to assume a PGmc origin here. The fact it is only attested as simplex in OE is also misleading: the other continental early medieval Germanic languages have corpora that overwhelmingly skew towards religious or at least non-heroic/military texts. The Hildebrandslied is a rare exception and it indeed features a compound with gund. — Mnemosientje (t · c) 08:48, 1 April 2020 (UTC)[reply]

I'm not sure why you would assume that. There are plenty of words and affixes that solely exist in compounds. --{{victar|talk}} 09:03, 1 April 2020 (UTC)[reply]

Not really the main point of that post, but to address your concern: because there were proven routes of transmission of heroic stories between Anglo-Saxon England, early medieval Germany and Scandinavia from the Migration Period and onwards (as evidenced by the travel of various stories, such as the northward spread of the Theoderic, Ermanaric etc. legends from the Lombard kingdom and southern Germany) and there absolutely appears to have been a common poetic lexicon for heroic stories, especially within West-Germanic. Furthermore, the word is in e.g. OHG found in compounds where its meaning is clear simply by looking at the meanings of different words containing the same compound: if gundfano (fano is attested) means battle-banner and gundhamo (hamo is attested) means battle-shirt, and there likely existed many more compounds like this for the reasons I mentioned in this post and the previous one, it is a no-brainer that people would realize that gund- means battle, especially in light of the other argument I just made. So while there is no hard proof in that sense, as a historian and philologist I'd say it's a safe bet to say that the term would be recognized for what it meant.

However, that argument was an addendum to my main point: if the term existed as simplex in OE it is certain to have existed as simplex in PWGmc, and there is a good chance that while in OE it appears to have been mostly limited to poetry, it need not have been limited in the same way in PWgmc at all (especially as late PWgmc times, i.e. the Migration Period, appear to have been the time when the WGmc poetic tradition as we know it was born). It also very much appears to have been productive as an element in the creation of PWgmc compounds, not just Radegonde (which may be disputed), but also the aforementioned *gunthifano which I think is a clear-cut case of its productivity in compound formation. — Mnemosientje (t · c) 09:34, 1 April 2020 (UTC)[reply]

Yeah, this has all gone off on a tangent. Evidence points to PWG *gunþi existing, although you could maybe argue that OE gūþ is a backformation, but my point is that it seems to have clearly fallen out of favor as a word by the time PWG came about, making it less likely that it would have been used for the creation of novel compounds, personal names or otherwise, over other terms meaning "battle". --{{victar|talk}} 23:38, 1 April 2020 (UTC)[reply]

On the subject of names, I might add two more compounds that appear to be limited to WGmc and preserve the thematic vowel of the a-stems: the name Agilaþruþ found in the Griesheim fibula inscription (dated 6th century) and Alaguþ on the Schretzheim I inscription (going by Findell's naming; dated to c. 600). — Mnemosientje (t · c) 09:50, 1 April 2020 (UTC)[reply]

Literally no one is currently arguing that intermorphemic thematic vowels didn't exist in PWG, personal names or otherwise. What I'm exploring is their loss through syncopation and in newly formed constructions, and there isn't a lack of counterexamples, e.g. *Hrōþiland. --{{victar|talk}} 23:38, 1 April 2020 (UTC)[reply]

And I am not arguing against that either, I am arguing that these are apparently names formed in PWGmc (given the lack of cognates outside WGmc) which feature, as you say, intermorphemic thematic vowels. Which seems to me quite relevant to the discussion? (As for the word *gunthi, I guess I just disagree with your conclusion regarding its productivity in PWGmc, it seems to me that there is plenty of evidence (including known common WGmc compounds) it was still used in creating new compounds in PWGmc times. But I am not sure what more evidence I could add to prove that point.) — Mnemosientje (t · c) 11:48, 2 April 2020 (UTC)[reply]

about reconstructed terms

Hi, is there a technical answer on connecting reconstructed terms as interwikis across other Wiktionaries? So we have tr:YeniKurum:Ana Türkçe/ạ̄gu on the Turkish Wiktionary. Do we need to add interwikis by hand? ~ Z (m) 13:54, 30 March 2020 (UTC)[reply]

Do I understand you want to create a page Reconstruction:Proto-Turkic/ạ̄gu? Does the term have any known descendants? --Lambiam 08:16, 31 March 2020 (UTC)[reply]

@Lambiam No, not creating a new page. Another example: tr:YeniKurum:Ana Batı Cermence/-ōjan and Reconstruction:Proto-West Germanic/-ōjan are the same lemma. So do we have a way of connecting these pages without the need to create a new item on Wikidata, just as with the entry pages? ~ Z (m) 10:29, 31 March 2020 (UTC)[reply]

I’ve just added tr:YeniKurum:Ana Batı Cermence/-ōjan to our Reconstruction:Proto-West Germanic/-ōjan page by hand without bothering with wikidata. As there are only two entries in the YeniKurum namespace, it seems not productive at the moment to consider something more automated. Or do you have a grand plan for adding many more entries there? --Lambiam 12:10, 31 March 2020 (UTC)[reply]

"WT:NORM" tag (again)

There was a previous discussion about this, which I cannot now locate, in which, as I recall, it was agreed that this tag was useless clutter to ordinary editors, and it was unclear whether "advanced" editors were actually making any use of it. If this tag is useful to some people then that's absolutely fine, but if not, can we just get rid of it? Mihia (talk) 21:02, 31 March 2020 (UTC)[reply]

I find it slightly useful as a patroller. I definitely check an edit with that tag first, because it's more likely to be vandalism. —Μετάknowledge (discuss/deeds) 21:10, 31 March 2020 (UTC)[reply]

OK, fair enough, thanks. Mihia (talk) 21:48, 31 March 2020 (UTC)[reply]

@Metaknowledge: I would think the frequency of this tag desensitizes. And it is hardly comprehensible when the tag appears. It appears on white-space characters, particularly often when one adds some box template or image so that an extra line feed is counted, and it does not appear to be regulated how to place such side boxes. The tag also appears if there is a line feed between the first language section and {{also}}, which I did not know was wrong before observing the tagging behaviour, and I still don’t know the reason why (because of what rule, if you want) it is considered wrong if it is wrong. I do not remember a previous discussion, more than a mention. Fay Freak (talk) 22:42, 31 March 2020 (UTC)[reply]

I agree with WT:NORM that regulating white-space is good for sanity, but tagging every single edit that merely has bad whitespace ~somewhere~ in the entry definitely gets dull. —Suzukaze-c ◇◇ 22:27, 1 April 2020 (UTC)[reply]

I would find it much further from being useless if there were some indication of the specific location of the norm violation, or which norm was being violated. I can't be bothered to pore over the entry looking for suspicious-looking whitespace. I suppose that data must be processed somewhere in the system for the tag application to work properly; I don't know whether editors who are in the know have some way to access it, but as a casual editor this information is completely unclear to me.--Urszag (talk) 04:37, 15 May 2020 (UTC)[reply]
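To illustrate the kind of location report asked for above, here is a minimal sketch in Python, assuming only two checks: trailing whitespace on a line (an assumed rule, chosen for illustration) and a blank line between {{also}} and the first language section (the case described earlier in this thread). The function name is hypothetical, and this is not how the actual tag is implemented.

```python
import re

def find_whitespace_issues(wikitext: str) -> list[tuple[int, str]]:
    """Return (line number, description) pairs for suspicious whitespace.

    Line numbers are 1-based. Only two illustrative checks are made;
    this is not the real WT:NORM rule set.
    """
    issues = []
    lines = wikitext.split("\n")
    for number, line in enumerate(lines, start=1):
        # Check 1 (assumed rule): spaces or tabs at the end of a line.
        if line != line.rstrip():
            issues.append((number, "trailing whitespace"))
        # Check 2 (from the thread): an empty line sitting between
        # {{also|...}} and a level-2 language header such as ==English==.
        has_neighbours = 2 <= number < len(lines)
        if (line.strip() == "" and has_neighbours
                and lines[number - 2].startswith("{{also|")
                and re.match(r"==[^=]", lines[number])):
            issues.append(
                (number, "blank line between {{also}} and first language section"))
    return issues
```

Running it on an entry's wikitext would give a list such as `[(2, "blank line between {{also}} and first language section")]`, which points an editor at the exact line instead of leaving them to hunt for it.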
