User:LA2
LA2 is the username for Lars Aronsson, Sweden. See w:user:LA2.
For my cut-and-paste convenience:

==Swedish==
===Etymology===
{{compound|a|b|lang=sv}}
====Conjugation====
====Declension====
====Related terms====
====Usage notes====
===References===
* {{R:SAOL|åäö|%e5%e4%f6}}
* {{R:SAOB online|åäö}}
====Translations====
{{trans-top|}}
{{trans-mid}}
* Swedish: {{t|sv|}}
{{trans-bottom}}

===Adjective===
{{head|sv|adjective form}}
# {{sv-adj-form-abs-indef-n|}}
# {{sv-adj-form-abs-def-m|}}
# {{sv-adj-form-abs-def+pl|}}
# {{sv-adj-form-comp|}}
# {{sv-adj-form-sup-pred|}}
# {{sv-adj-form-sup-attr|}}
===Adverb===
{{head|sv|adverb}}

===Noun===
{{head|sv|noun form}}
# {{sv-noun-form-indef-gen|}}
# {{sv-noun-form-def|}}
# {{sv-noun-form-def-gen|}}
# {{sv-noun-form-indef-pl|}}
# {{sv-noun-form-indef-gen-pl|}}
# {{sv-noun-form-def-pl|}}
# {{sv-noun-form-def-gen-pl|}}

===Verb===
{{head|sv|verb form}}
# {{sv-verb-form-pre|}}
# {{sv-verb-form-past|}}
# {{sv-verb-form-sup|}}
# {{sv-verb-form-imp|}}
# {{sv-verb-form-inf-pass|}}
# {{sv-verb-form-pre-pass|}}
# {{sv-verb-form-past-pass|}}
# {{sv-verb-form-sup-pass|}}
# {{sv-verb-form-prepart|}}
# {{sv-verb-form-pastpart|}}
Diary
August 26, 2020: I start working on Appendix:Swedish corpus, based on my 2017 presentation.
June 2017: I submit a proposal for a presentation at the Wikimedia Central and East European conference in Warsaw in September. It is approved.
May 2017: I start to contribute to Ukrainian Wiktionary (my user page).
December 14, 2015: CodeCat is renaming several Swedish inflection templates for no apparent reason, leaving bewilderment and fatigue. For example, sv-noun-reg-er becomes {{sv-infl-noun-c-er}}.
October 2014: I start to contribute actively to Russian Wiktionary (my user page).
May 4, 2013: I should read these sometime:
- Ladislav Zgusta, Manual of Lexicography (1971; foreword signed 1968) Google Books
- C.C. Berg (professor at Leiden), Report on the Need for Publishing Dictionaries which do not to-date exist (booklet, between 1960 and 1962, published by CIPSH, Conseil International de la philosophie et des sciences humaines)
February 2013: I start to contribute actively to Danish Wiktionary (my user page).
January 24, 2013: I introduce {{sv-compound}} and category:Swedish compounds with maskin, as used for displaying Derived terms in maskin#Swedish. -- Bad idea.
November 19, 2012: Fun photo gallery: 10 Swedish words you won’t find in English: orka, harkla, hinna#Verb, blunda, mysa, vabba, duktig, jobbig, gubbe/gumma, mormor/farmor/morfar/farfar (actually 14).
August 27, 2012: I give up all hope about the Norwegian entries in en.wiktionary. Please remind me to stay away if any discussion should come up again.
April 18, 2011: To do: handgemäng, hägn, ohägn, hugnad, misshällighet
April 7, 2011: All the words from this article about common translation errors should be incorporated into Wiktionary.
April 3, 2011: I think I'm done with Swedish form entries for now. When the new XML dump arrived 20110402, Wiktionary contained 87,651 Swedish words. After parsing the XML dump I was able to generate 1521 new Swedish form entries. I have the machinery in place to fill in the missing form entries after each new dump. Now we need to expand the 20,000 Swedish gloss entries to a full Swedish vocabulary. But can that work be automated? How do we add the next 20,000 gloss entries without spending 3 minutes on each? (At 3 minutes each, that is 60,000 minutes: 1000 hours or 25 weeks of full-time work.)
March 20, 2011: When spannen#Swedish is the definite singular of spann (bucket) and definite plural of spann (set of horses), I'd like to indicate in the form entry which sense belongs to which form. Perhaps "senseid" is the way to do this. Both the form templates and the declension/conjugation templates would have to take the sense ID as an extra parameter. This would be a major change to the 80,000 existing Swedish entries.
March 18, 2011: I create Appendix:Swedish verbs.
March 10, 2011: The new XML database dump shows 80,000 Swedish entries, yet another giant leap forward. My simple script for generating missing form entries has evolved into one that reads the declension and conjugation table template calls and concludes which form entry templates should be called from where. For example, {{sv-noun-reg-ar|2=and}} in ande should generate {{sv-noun-form-def|ande}} in the page anden. If this form entry template call is found, fine. If not, the wanted form entry is saved to a file that a modified version of pagefromfile.py can read. If the page doesn't exist, it is created. If it exists, a ==Swedish== entry is appended at the bottom. If a Swedish entry already exists, because "anden" is also the definite form of and, this is logged and I have to edit the existing Swedish entry manually. At least for now, this happens a lot. In some cases, a verb form entry is also an adjective form. In some cases, the form entry exists but uses another template (form of, plural of, ...) or no template at all. Right now I have a backlog of 8,000 entries to go through, or 10 percent of the existing stock. Maybe I should automate the addition of adjective form entries to Swedish entries that don't have an adjective subheading already ... done.
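The decision procedure described above can be sketched roughly as follows. This is only an illustration, assuming the target page's wikitext is already available as a string (or `None` when the page does not exist). The function name `plan_form_entry` and the four action labels are made up for this sketch and are not part of pagefromfile.py.

```python
import re

# A level-2 ==Swedish== heading on a line of its own.
SWEDISH_L2 = re.compile(r"^==\s*Swedish\s*==[ \t]*$", re.MULTILINE)


def plan_form_entry(page_text, template_call):
    """Decide what to do with one wanted form entry (hypothetical helper).

    page_text     -- current wikitext of the target page, or None if missing
    template_call -- the wanted call, e.g. "{{sv-noun-form-def|ande}}"
    """
    if page_text is None:
        return "create"    # page doesn't exist: create it
    if template_call in page_text:
        return "present"   # the form entry is already there: fine
    if SWEDISH_L2.search(page_text) is None:
        return "append"    # no Swedish section yet: append one at the bottom
    return "manual"        # Swedish section exists: log it, edit by hand


if __name__ == "__main__":
    wanted = "{{sv-noun-form-def|ande}}"
    existing = "==Swedish==\n===Noun===\n# {{sv-noun-form-def|and}}\n"
    print(plan_form_entry(existing, wanted))
```

The example at the bottom is the "anden" case from the text: the page already has a Swedish section for the definite form of and, so the entry for ande ends up in the manual backlog.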
March 2, 2011: The most commonly used Norwegian templates are: {{no-noun-infl}} (733 calls), {{nn-noun-m1}} (351), {{nb-noun-m2}} (221), {{nn-noun-form}} (178), {{no-noun}} (125), {{nn-verb}} (101), {{no-noun-c}} (97), {{nb-noun-m1}} (87), {{nn-inf}} (85), {{no-verb}} (76), {{no-noun-m1}} (73), {{no-noun-n1}} (71), {{no-verb-1}} (68), {{no-verb-2}} (54), {{nn-noun-n1}} (51), {{no-noun-mu}} (48), {{no-adj-infl}} (47), {{no-noun-form}} (41), {{no-noun-irreg}} (40), {{no-adj-2}} (39), {{no-adj-1}} (33), {{nn-verb-form}} (32), {{nb-noun}} (32), {{nn-verb-1}} (30), {{no-adj}} (26), {{nn-noun-f2}} (24), {{nb-noun-n1}} (23), {{no-verb_form}} (22), {{nn-noun-irreg}} (21), {{nb-class1}} (18), {{nb-g}} (17), {{nb-noun-c}} (16), {{no-adj-3}} (15), {{no-noun-nu}} (13), {{nn-pers-pron}} (13), {{no-noun-n4}} (12), {{no-noun-n3}} (12), {{nn-noun-f1}} (12), {{nn-adj-2}} (11), {{nb-verb-1}} (11), {{no-noun-cu}} (10), {{nn-adj-table}} (10), {{nb-noun-n3}} (10), {{no-verb-4}} (9), {{nn-adj-1}} (9), {{nb-verb}} (9), {{no-noun-f}} (8), {{nn-verb-2}} (8), {{nb-adj-table}} (8), {{nn-verb-form-pre}} (7), {{nb-pers-pron}} (7), {{no-noun-f1}} (6), {{no-adv}} (6), {{no-adj-irreg}} (6), {{nn-noun-f3}} (6), {{nn-adj-3}} (6), {{nb-verb-2}} (6), {{nb-class2}} (6), {{nb-adj-2}} (6), {{no-verb-form}} (5), {{no-noun-reg-m}} (5), {{nn-g}} (5).
February 27, 2011: I don't speak French or Italian, but when I saw all these form entries (mostly created by Keenebot2 and SemperBlottoBot) for verbs using the primitive {{form of}}, I started to convert them to the more structured {{conjugation of}}. See Template talk:conjugation of#Stats. I have made the following translations of parameters:
February 25, 2011: In the XML database dump of 2011-02-05, the most common headings for Swedish entries (compare August 21, 2010) are:
61850 Swedish
43879 Noun
11033 Verb
5695 Adjective
5524 Declension
4499 Etymology
4115 Related terms
2774 Pronunciation
1777 Conjugation
1675 See also
1478 Proper noun
1006 Synonyms
666 Adverb
591 Derived terms
565 References
533 Usage notes
333 Antonyms
135 Pronoun
129 Abbreviation
125 Cardinal number
104 Interjection
90 Etymology 2
90 Etymology 1
86 Inflection
79 Preposition
76 Suffix
71 Compounds
53 Idiom
51 Conjunction
49 Phrase
39 Prefix
37 Ordinal number
25 Proverb
22 Descendants
18 Etymology 3
16 Hypernyms
12 Hyponyms
11 Phrases
11 Initialism
10 Homophones
10 Determiner
9
As a comparison, the most common headings for all languages (not counting the L2 headings for the language names themselves) are:
1235093 Verb
811866 Noun
272027 Etymology
267882 Pronunciation
254013 Adjective
234614 Anagrams
123356 Related terms
119880 Declension
91788 Synonyms
76466 Derived terms
66909 Translations
66639 References
62966 Proper noun
58676 See also
49495 Alternative forms
48712 Conjugation
36726 Adverb
33230 Participle
32812 Hanzi
26224 Han character
24396 Inflection
17626 Usage notes
17535 External links
16834 Antonyms
15105 Descendants
13497 Readings
13331 Kanji
10042 Etymology 1
10033 Etymology 2
8953 Hanja
7029 Pronoun
4578 Interjection
3809 Compounds
3623 Phrase
3610 Suffix
3452
2998 Preposition
2663 Prefix
2572 Quotations
2491 Mutation
2433 Letter
2380 Idiom
2169 Conjunction
1901
973 Statistics
728
The most common combinations and sequences for Swedish sections are:
36998 ((Swedish(Noun)))
8123 ((Swedish(Verb)))
3620 ((Swedish(Adjective)))
981 ((Swedish(Proper noun)))
918 ((Swedish(Etymology;Noun(Declension))))
699 ((Swedish(Noun(Declension))))
410 ((Swedish(Etymology;Noun(Declension;Related terms))))
372 ((Swedish(Adjective;Verb)))
359 ((Swedish(Verb(Conjugation;Related terms))))
340 ((Swedish(Etymology;Verb(Conjugation;Related terms))))
339 ((Swedish(Noun(Declension;Related terms))))
330 ((Swedish(Noun;Verb)))
249 ((Swedish(Pronunciation;Noun)))
220 ((Swedish(Etymology;Adjective(Declension))))
211 ((Swedish(Etymology;Noun(Declension)References)))
182 ((Swedish(Adverb)))
180 ((Swedish(Noun(Related terms))))
156 ((Swedish(Pronunciation;Noun(Declension))))
145 ((Swedish(Etymology;Proper noun)))
139 ((Swedish(Etymology;Noun)))
121 ((Swedish(Pronunciation;Noun(Declension;Related terms))))
120 ((Swedish(Etymology;Adjective(Declension;Related terms))))
114 ((Swedish(Noun(See also))))
109 ((Swedish(Proper noun(Related terms))))
106 ((Swedish(Etymology;Noun(Declension;See also))))
105 ((Swedish(Noun(Synonyms))))
104 ((Swedish(Etymology;Verb(Conjugation))))
99 ((Swedish(Pronunciation;Verb(Conjugation;Related terms))))
95 ((Swedish(Noun(Declension;See also))))
91 ((Swedish(Adjective;Adverb)))
88 ((Swedish(Verb(Conjugation))))
79 ((Swedish(Pronunciation;Noun(Related terms))))
79 ((Swedish(Noun(Declension;Related terms;See also))))
78 ((Swedish(Etymology;Pronunciation;Noun(Declension))))
72 ((Swedish(Abbreviation)))
71 ((Swedish(Adjective(Related terms))))
70 ((Swedish(Cardinal number)))
69 ((Swedish(Pronunciation;Adjective)))
62 ((Swedish(Noun(Declension;Synonyms))))
62 ((Swedish(Etymology;Noun(Declension;Related terms;See also))))
61 ((Swedish(Adjective(Declension;Related terms))))
57 ((Swedish(Adjective(Declension))))
52 ((Swedish(Etymology;Noun(Declension;Synonyms))))
43 ((Swedish(Etymology;Pronunciation;Noun(Declension;Related terms))))
42 ((Swedish(Verb(Conjugation;Related terms;See also))))
42 ((Swedish(Pronunciation;Verb)))
42 ((Swedish(Pronunciation;Etymology;Verb(Conjugation;Related terms))))
42 ((Swedish(Noun(Derived terms))))
42 ((Swedish(Etymology;Pronunciation;Noun)))
40 ((Swedish(Alternative forms;Proper noun)))
38 ((Swedish(Pronoun)))
38 ((Swedish(Etymology;Verb(Conjugation;Related terms;See also))))
38 ((Swedish(Etymology;Adjective)))
37 ((Swedish(Etymology;Adverb)))
February 8, 2011: English Wiktionary now contains more Swedish entries (78,985) than Swedish Wiktionary (76,119). The overlap is only 34,178 entries. Swedish Wiktionary has more gloss definitions and English Wiktionary has more form entries, many created by LA2-bot.
February 6, 2011: I should try to incorporate as much as possible of Wikipedia:Swedish Wikipedians' notice board/Terminology into Wiktionary.
February 4, 2011: I set up {{R:Rikstermbanken}} and create some entries that refer to it.
January 30, 2011: I set up {{R:Utrikes namnbok}} and create some entries that refer to it, mostly in Category:sv:Government.
January 20, 2011: How to extract a list of Swedish headwords from the Swedish Wiktionary:
wget -O - "http://toolserver.org/~daniel/WikiSense/CategoryIntersect.php?wikilang=sv&wikifam=.wiktionary.org&basecat=Svenska&basedeep=5&templates=&mode=al&go=Search&format=csv&userlang=en" | awk '-F\t' '$1==0 {print $2}' | tr _ ' ' | LC_COLLATE=sv_SE.utf8 sort
January 10, 2011: How to extract a list of Swedish headwords:
wget -O - "http://toolserver.org/~daniel/WikiSense/CategoryIntersect.php?wikilang=en&wikifam=.wiktionary.org&basecat=Swedish+language&basedeep=5&templates=&mode=al&go=Search&format=csv&userlang=en" | awk '-F\t' '$1==0 && $3!="Translation_requests_(Swedish)" && $3!="Translations_to_be_checked_(Swedish)" && $3!~/derived_from_Swedish/ {print $2}' | tr _ ' ' | LC_COLLATE=sv_SE.utf8 sort
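For readers who prefer Python to awk, the filtering step of the January 10 pipeline can be sketched as below. It assumes, as the awk program does, that each CSV line holds at least three tab-separated fields, with a page title in the second and a category name in the third; what the first field means is not stated here, only that it must equal 0. The names `headwords` and `swedish_key` are invented for this sketch, and the sort key is only a rough stand-in for `LC_COLLATE=sv_SE.utf8`. Unlike the shell pipeline, it also removes duplicate titles.

```python
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"

EXCLUDED_CATEGORIES = {
    "Translation_requests_(Swedish)",
    "Translations_to_be_checked_(Swedish)",
}


def swedish_key(word):
    # å, ä, ö sort after z; any other character (space, hyphen, ...)
    # gets -1 from find() and therefore sorts before the letters.
    return [SWEDISH_ALPHABET.find(ch) for ch in word.lower()]


def headwords(lines):
    """Return the sorted, de-duplicated titles that pass the awk-style filter."""
    words = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or fields[0].strip() != "0":
            continue
        title, category = fields[1], fields[2]
        if category in EXCLUDED_CATEGORIES or "derived_from_Swedish" in category:
            continue
        words.add(title.replace("_", " "))  # same as: tr _ ' '
    return sorted(words, key=swedish_key)


if __name__ == "__main__":
    sample = [
        "0\thund\tSwedish_nouns",
        "0\tfoo\tTranslation_requests_(Swedish)",
        "0\tsmorgasbord\tEnglish_terms_derived_from_Swedish",
        "14\tSwedish_nouns\tSwedish_language",
        "0\tgå_an\tSwedish_verbs",
        "0\tål\tSwedish_nouns",
    ]
    print(headwords(sample))
```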
November 19, 2010: I import {{R:runeberg.org}} from sv.wikipedia.
November 15, 2010: I think there are now 20,000 Swedish entries in en.wiktionary.org, which is twice as many as at the beginning of this year. This has been achieved mainly by adding form entries. Statistics here. I have added more word forms, based on word frequency lists (see corpus coverage in the August 31 entry below). I have focused less on including all definitions and all forms for every word. What I have tried to do is to create links between the entries, so compounds link to their component words. Hopefully, this will attract more users who then start to fill in the missing definitions (second usage of words) and forms. This philosophy, known as eventualism, is similar to creating stub articles in Wikipedia, hoping that later users will fill in more facts. I'm not a general subscriber to that idea, but it can be a useful approach in the early stages of a project. A useful Swedish dictionary probably needs 120,000 basic forms (and half a million form entries), which is ten times more than en.wiktionary has today and five times more than sv.wiktionary has.
September 18, 2010: There are 51,318 pages that call {{t}}, {{t+}} or {{t-}}
. The page with most translations is be (607 translations), followed by you (447), set (438), love (421). Halfway down the list we find words like toner and toadstool (4 translations each). The most translated words that don't yet have any Swedish translation (or where the translations didn't use these templates in the database dump of 2010-09-12) are: judge (161), 下 (156), heat (154), jump (153), spread (141), stroke (140), proper (137), cry (131), behind (130), desire (126), nose (125), round (123), article (122), double (121), taste (117), end (117), situation (116), shut up (116), male (116), Albanian (116), draft (112), chest (112), e-mail (110), truth (108), storm (108), squeeze (105), same (105), job (105), exit (105), 牛 (104), cheap (103), steer (102), prayer (100), entry (100), cinema (100), split (99), Gypsy (99), care (99), waste (98), sole (97), hook (97), chat (97), welcome (96), believe (96), coach (95), short (94), bend (94), herd (91), finish (91), sit (90), return (90), pickle (90), drill (90), dragon (90), cum (90), cherry (90), butt (90), British (90), masculine (88), correct (88), icon (87), gun (87), gentleman (87), freedom (87), beginning (87), separate (86), Moon (86), account (86), justice (85), I'm Jewish (85), definition (85), puzzle (84), atmosphere (84), corner (83), Macedonian (81), lime (81), lady (80), decline (80), damn (80), cardinal (79), plague (78), interest (78), dash (78), auxiliary (78), study (77), newspaper (77), hi (77), criminal (77), cement (77), bundle (77), bug (77), appropriate (77), agree (77), vacuum (76), swarm (76), reach (76), poetry (76), late (76), harmony (76), custom (76), chip (76), certainly (76), authority (76), rear (75), pumpkin (75), discharge (75), silk (74), dinner (74), crash (74), Commonwealth of Independent States (74), cheat (74), accept (74), walnut (73), transfer (73), grain (73), ceremony (73), abate (73), victim (72), vagina (72), type (72), prophet (72), increase (72), contact (72), 
constitution (72), constellation (72), budget (72), application (72), soldier (71), plot (71), painting (71), crew (71), brass (71), thunder (70), roast (70), psychology (70), communism (70), brake (70), witch (69), saddle (69), neighbour (69), vault (68), shallow (68), perfume (68), particle (68), harvest (68), electronic (68), coral (68), camp (68), amount (68), odd (67), occupation (67), how much (67), device (67), chamber (67), bust (67), association (67), airplane (67), track (66), stab (66), spice (66), pomegranate (66), crust (66), comfort (66), aeroplane (66), random (65), plough (65), no way (65), married (65), foundation (65), execution (65), channel (65), breath (65), arrest (65), studio (64), Myanmar (64), fail (64), enter (64), dish (64), actual (64), abrupt (64), wizard (63), Vladimir (63), substantial (63), splinter (63), reply (63), purple (63), paddle (63), nucleus (63), notice (63), illusion (63), how are you (63), deliver (63), dairy (63), counterfeit (63), blackmail (63), arrive (63), wardrobe (62), stuff (62), seat (62), not at all (62), deliberate (62), cylinder (62), crop (62), advertisement (62), zone (61), tower (61), source (61), sexuality (61), litter (61), gravity (61), fill (61), composition (61), business (61), bully (61), asshole (61), trial (60), sponge (60), sigh (60), resolution (60), orthography (60), mount (60), Java (60), implement (60), hood (60), half (60), habit (60), forever (60), anyway (60). Of course there can also be many definitions of be or you that don't have Swedish translations.
September 7, 2010: Some Unix/Linux shell commands:
To extract just one language (here: Swedish) from the XML database dump and remove the interlanguage links:
sed 's/<text.*>/\n/;s/<\/text>/\n==End==/' enwiktionary.xml | \
  sed '/^==\s*Swedish/,/^==[^=]/!d;/^==[^=]/d;/^\[\[[a-z][-a-z]*:/d'
To extract just the native language example sentences from the above (beware of the " and ' trick):
sed '/^#:[^:]/!d;s/^#:*\s*//;s/=.*//;s/'"'''"'//g;s/'"''"'//g;s/&[/a-z]*;//g'
To cut plain text into a list of words (I kept hyphens in words, but not digits; you might want to add »):
tr ' -&(-,.-?[]|' '\n' | sed '/^$/d'
To find the most frequent words:
sort | uniq -c | sort -nr
When all of the above are combined, I get a list of all words occurring in the Swedish example sentences, sorted by frequency. And so I can check that Wiktionary provides explanations for all or most of them. The Swedish example sentences constitute an 84 kbyte e-text, having 13,255 words of which 4819 are unique. Wiktionary has Swedish entries for 71.1 percent of the occurrences. This is rather low. Part of the explanation is that some text is in English, because the example sentences are incorrectly formatted and contain templates and URLs.
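The last two steps of the pipeline (cutting text into words and counting them) can be approximated in Python as follows. This is a sketch, not the script actually used: the separator set is meant to mirror the `tr` command's character ranges, and the function names `word_list` and `frequency_list` are invented here.

```python
import re
from collections import Counter

# Same separators as the tr command: the ASCII ranges space..&, (.., and ...?
# (the last one covers the digits), plus [, ], | and newline.
# Hyphens and apostrophes are not separators, so they stay inside words.
SEPARATORS = re.compile(r"[ -&(-,.-?\[\]|\n]+")


def word_list(text):
    """Cut plain text into words, dropping empty strings (like sed '/^$/d')."""
    return [word for word in SEPARATORS.split(text) if word]


def frequency_list(words):
    """Equivalent in spirit to: sort | uniq -c | sort -nr"""
    return Counter(words).most_common()


if __name__ == "__main__":
    words = word_list("Jag är 25 år, och e-post fungerar.")
    print(words)
    print(frequency_list(word_list("och jorden var öde och tom"))[:3])
```

As in the shell version, case is left untouched, so capitalized words at the start of a sentence count as separate words.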
September 4, 2010: Inserting the templates l and t:
python replace.py -family:wiktionary -lang:en -xml:enwiktionary.xml -summary:"l:sv, t:sv" -regex -recursive \
  '\[\[#Swedish\|([^\]]+)\]\]' '{{l|sv|\1}}' \
  '\[\[([^#\|\]]+)#Swedish\|[^\]]*\]\]' '{{l|sv|\1}}' \
  '(\* *Swedish:.*?)\[\[([^\]]*)\]\]' '\1{{t|sv|\2}}' \
  '(\* *Swedish:.*?){{l\|sv\|' '\1{{t|sv|' \
  '(\* *Swedish:.*?{{t[^}]*)}} {{([cfmnp](\|[cfmnp])*}})' '\1|\2'
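To see what these five pattern/replacement pairs do without running a bot, they can be tried on a string with Python's `re` module. The sketch below copies the patterns from the command above; the loop that repeats each substitution until the text stops changing is only my approximation of what `-recursive` does, and the function name `apply_replacements` and the `max_rounds` cap are invented for the example.

```python
import re

# The five pattern/replacement pairs from the command line, in order.
REPLACEMENTS = [
    (r"\[\[#Swedish\|([^\]]+)\]\]", r"{{l|sv|\1}}"),
    (r"\[\[([^#\|\]]+)#Swedish\|[^\]]*\]\]", r"{{l|sv|\1}}"),
    (r"(\* *Swedish:.*?)\[\[([^\]]*)\]\]", r"\1{{t|sv|\2}}"),
    (r"(\* *Swedish:.*?){{l\|sv\|", r"\1{{t|sv|"),
    (r"(\* *Swedish:.*?{{t[^}]*)}} {{([cfmnp](\|[cfmnp])*}})", r"\1|\2"),
]


def apply_replacements(text, max_rounds=100):
    """Apply each pair repeatedly until it no longer changes the text."""
    for pattern, replacement in REPLACEMENTS:
        regex = re.compile(pattern)
        for _ in range(max_rounds):  # safety cap, an assumption of this sketch
            new_text = regex.sub(replacement, text)
            if new_text == text:
                break
            text = new_text
    return text


if __name__ == "__main__":
    print(apply_replacements("See [[#Swedish|hund]] and [[katt#Swedish|katten]]."))
    print(apply_replacements("* Swedish: [[hund]] {{c}}"))
    print(apply_replacements("* Swedish: [[hund]], [[jycke]]"))
```

The third example shows why repetition matters: one pass converts only the first link on a translation line, and the second pass picks up the next one.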
August 31, 2010: The Swedish Bible of 1917 contains 769,316 words of text, using a vocabulary of 26,990 words and word forms, including some capitalized words at the beginning of sentences. Of this vocabulary, 3802 words or 14 % have Swedish entries in en.wiktionary. However, since these 14 % contain many of the most common words, they make up 74 % of the text. This number (74 %) is the definition of the dictionary's coverage of this corpus of text. If you pick a random page, line and word in the Bible, there's a 74 % chance that the word has a Swedish entry here. 74 % is a very low coverage for a dictionary, and a sign that we have a very long way to go.
Here's how it works on the first two verses: i begynnelsen skapade gud himmel och jord. och jorden var öde och tom, och mörker var över djupet, och guds ande svävade över vattnet. (Genesis 1:1-2) Of these 24 words, 5 are "och", 2 are "var", 2 are "över". These three words alone make up 9 of the 24 words or 37.5% of the text.
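The coverage figure is simply the share of running words that have an entry. A minimal sketch of the calculation on the two verses, with a toy dictionary that contains only "och", "var" and "över" (the function name `coverage` is my own choice for this example):

```python
from collections import Counter


def coverage(words, dictionary):
    """Share of running words (tokens) that have an entry in `dictionary`."""
    counts = Counter(words)
    total = sum(counts.values())
    covered = sum(n for word, n in counts.items() if word in dictionary)
    return covered / total if total else 0.0


# Genesis 1:1-2, lowercased and with punctuation removed.
VERSES = (
    "i begynnelsen skapade gud himmel och jord "
    "och jorden var öde och tom och mörker var över djupet "
    "och guds ande svävade över vattnet"
).split()

if __name__ == "__main__":
    # A toy "dictionary" holding only the three most frequent words.
    print(len(VERSES))                                # 24 words
    print(coverage(VERSES, {"och", "var", "över"}))   # 9 / 24 = 0.375
```

Three entries out of a vocabulary of 18 distinct words already cover more than a third of the text, which is the same effect that lets 14 % of the Bible's vocabulary cover 74 % of its text.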
Corpus | | Bible (1917) | Herr Arnes penningar | Swedish Wikipedia as of 2010-06-08 | Tankar i utvandringsfrågan | KB:s underlag till en nationell strategi... (2010) | Kulturutredningen (2009) | SvD Understreckare, Sept. 1–18, 2010 | Framtidens Internet by Jan Kallberg |
---|---|---|---|---|---|---|---|---|---|
Words in corpus | | 769,316 | 23,514 | 111,625,635 | 93,078 | 18,607 | 248,282 | 31,608 | 23,414 |
Unique words | | 26,990 | 3,303 | 3,412,039 | 14,516 | 4,017 | 23,050 | 8,815 | 5,086 |
Date of database dump | Swedish entries | Percent coverage of corpus | | | | | | | |
2010-08-12 | 10,987 | 72.6 | 75.4 | 55.6 | 66.1 | 49.2 | 58.3 | 63.9 | 70.1 |
2010-08-24 | 11,531 | 74.2 | 76.1 | 55.8 | 66.7 | 49.4 | 58.5 | 64.0 | 70.4 |
2010-09-01 | 14,678 | 84.8 | 84.7 | 59.7 | 73.1 | 55.0 | 65.0 | 69.2 | 76.0 |
2010-09-12 | 16,926 | 87.3 | 87.5 | 61.6 | 77.0 | 65.7 | 73.2 | 71.2 | 78.5 |
2010-09-23 | 17,836 | 87.5 | 88.1 | 62.9 | 78.4 | 70.3 | 76.4 | 73.9 | 80.2 |
2010-10-05 | 17,851 | 87.5 | 88.1 | 63.0 | 78.4 | 70.4 | 76.4 | 74.0 | 80.2 |
2010-10-15 | 17,885 | 87.5 | 88.1 | 63.2 | 78.4 | 70.5 | 76.4 | 74.1 | 80.3 |
2010-10-30 | 19,449 | 87.7 | 88.2 | 64.0 | 80.4 | 71.5 | 77.8 | 75.5 | 81.4 |
*2010-12-31 | 22,135 | 88.7 | 89.2 | 65.9 | 84.5 | 77.8 | 83.7 | 79.3 | 85.1 |
**2011-01-10 | 40,621 | 89.5 | 89.6 | 68.3 | 85.6 | 78.4 | 84.9 | 81.1 | 89.7 |
**2011-01-23 | 53,421 | 90.0 | 89.8 | 69.4 | 86.6 | 82.8 | 86.3 | 82.1 | 90.5 |
**2011-01-31 | 59,889 | 90.2 | 90.0 | 69.9 | 87.5 | 83.1 | 86.8 | 82.6 | 91.2 |
**2011-02-08 | 78,985 | 91.1 | 90.6 | 71.1 | 89.1 | 84.1 | 88.1 | 83.9 | 92.3 |
**2011-03-23 | 87,267 | 91.4 | 90.7 | 71.8 | 89.7 | 84.9 | 88.5 | 84.6 | 92.7 |
(The Wikipedia corpus used here contains some garbage that will never be covered by the dictionary, e.g. Wikipedia user names, occasional talk pages in English, and some remaining wiki markup, so the coverage percentage will inevitably be lower. It's still interesting to have a really large corpus to study.)
(* No database dump exists for 2010-12-31, but a preliminary dictionary was extracted.)
(** Dictionary generated by category wget. See diary entry for January 10, 2011.)
August 28, 2010: I think it would be helpful to know how common a word is. This can be determined by computing its rank in some large body of text, putting the most frequent word ("the" for English, "och" for Swedish) at position 1. This is what template {{rank}} does; for example, able has rank 391. But I think a logarithmic scale would be more informative than a linear one. Color graphics could indicate how "hot" a word is, but with the cool and neutral black, white and light-blue appearance of Wiktionary, the colors must be restricted to a very small area:
August 21, 2010: Many open issues:
- So far, only 10,000 entries in Swedish. Redefining templates is easier now than after many more entries have been created.
- How should templates be named? Is the -reg-/-irreg- part of the name really necessary? Can we do with fewer templates and shorter names?
- How do we create entries for all inflected forms? Can this be automated?
- Can conjugation/declension tables handle passive verbs? Subjunctives? All adjectives?
- Should template parameters be standardized? Now they are different everywhere: 2=, stem=, sg-def-gen=
- Can templates support irregular verbs, so avgå, tillstå can be based on gå, stå?
- Can templates support prefixed and suffixed words, e.g. "gå an/gick an" smarter than today?
- Should templates for Swedish words be standardized across languages of Wiktionary?
- Old spelling (elf/älf/älv) can be handled, but how should we handle giva/ge, hava/ha?
The most common headings in Swedish sections are:
10969 Swedish
6402 Noun
2618 Pronunciation
1705 Verb
1520 Related terms
1300 Adjective
1247 Proper noun
1013 Etymology
995 See also
789 Synonyms
533 Derived terms
319 Adverb
251 Usage notes
251 Antonyms
214 Alternative spellings
100 Etymology 2
100 Etymology 1
96 Inflection
88 Interjection
83 Pronoun
72 Compounds
72 Abbreviation
63 Cardinal number
58 Conjugation
54 Idiom
52 References
51 Preposition
48 Phrase
41 Alternative forms
39 Suffix
37 Ordinal number
31 Conjunction
25 Proverb
22 Verb form
22 Descendants
17 Etymology 3
16 Hypernyms
14 Homophones
12 Hyponyms
11 Phrases
The most common heading structures are listed below. "((" means heading level 2.
3158 ((Swedish(Noun)))
831 ((Swedish(Proper noun)))
660 ((Swedish(Verb)))
565 ((Swedish(Pronunciation;Noun)))
505 ((Swedish(Adjective)))
290 ((Swedish(Noun(Related terms))))
206 ((Swedish(Etymology;Noun)))
168 ((Swedish(Noun(Synonyms))))
168 ((Swedish(Noun(See also))))
156 ((Swedish(Pronunciation;Verb)))
142 ((Swedish(Pronunciation;Noun(Related terms))))
131 ((Swedish(Pronunciation;Adjective)))
121 ((Swedish(Verb(Related terms))))
112 ((Swedish(Etymology;Proper noun)))
101 ((Swedish(Proper noun(Related terms))))
81 ((Swedish(Adjective(Related terms))))
73 ((Swedish(Adverb)))
72 ((Swedish(Pronunciation;Verb(Related terms))))
72 ((Swedish(Etymology;Pronunciation;Noun)))
62 ((Swedish(Noun(Derived terms))))
57 ((Swedish(Pronunciation;Noun(See also))))
56 ((Swedish(Pronunciation;Noun(Derived terms))))
47 ((Swedish(Abbreviation)))
45 ((Swedish(Pronunciation;Adjective(Related terms))))
43 ((Swedish(Noun;Verb)))
42 ((Swedish(Pronunciation;Noun;Verb)))
41 ((Swedish(Verb(See also))))
37 ((Swedish(Alternative spellings;Proper noun)))
34 ((Swedish(Pronunciation;Noun(Synonyms))))
34 ((Swedish(Pronunciation;Adverb)))
34 ((Swedish(Alternative spellings;Noun(Related terms))))
33 ((Swedish(Phrase)))
32 ((Swedish(Adjective(See also))))
29 ((Swedish(Adjective;Noun)))
28 ((Swedish(Pronunciation;Verb(See also))))
28 ((Swedish(Etymology;Noun(Related terms))))
27 ((Swedish(Etymology;Verb)))
27 ((Swedish(Etymology;Adjective)))
26 ((Swedish(Verb(Synonyms))))
26 ((Swedish(Interjection)))
Starting to introduce ====Declension==== and ====Conjugation==== on a large scale will change this pattern.
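The parenthesized heading-structure notation used in these lists can be produced from wikitext along the following lines. The rules are inferred from the listed examples (an opening parenthesis per heading level, a semicolon between siblings, closing parentheses when moving back up), so this is a reconstruction under that assumption rather than the original script; the name `heading_structure` is invented.

```python
import re

# A wiki heading on its own line: ==Title== up to ======Title======.
HEADING = re.compile(r"^(={2,6})\s*(.+?)\s*\1[ \t]*$", re.MULTILINE)


def heading_structure(wikitext):
    """Encode the heading tree of one language section as a string."""
    out = []
    depth = 0
    for match in HEADING.finditer(wikitext):
        level = len(match.group(1))
        if level > depth:
            out.append("(" * (level - depth))   # going deeper
        elif level == depth:
            out.append(";")                     # sibling heading
        else:
            out.append(")" * (depth - level))   # back up, no semicolon
        out.append(match.group(2))
        depth = level
    out.append(")" * depth)                     # close everything at the end
    return "".join(out)


if __name__ == "__main__":
    entry = (
        "==Swedish==\n"
        "===Etymology===\nFrom x.\n"
        "===Noun===\n{{head|sv|noun}}\n"
        "====Declension====\n{{sv-noun-reg-ar}}\n"
        "===References===\n"
    )
    print(heading_structure(entry))
```

Counting the resulting strings over all Swedish sections (for instance with `collections.Counter`) would then give lists like the ones above.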
It seems I have a bot command that works:
python replace.py -family:wiktionary -lang:en -cat:'Swedish verbs' -summary:'Conjugation heading' -regex -dotall \
  '(===Verb===\s*({{infl[^\n]*}})?\s*)({{sv-verb-(irreg|reg-)[^\n]*}}\s*)(([^-=\[][^\n]*\n\s*)*)' '\1\5====Conjugation====\n\3' \
  '(====Verb====\s*({{infl[^\n]*}})?\s*)({{sv-verb-(irreg|reg-)[^\n]*}}\s*)(([^-=\[][^\n]*\n\s*)*)' '\1\5=====Conjugation=====\n\3'
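The effect of the first pattern can be checked on a sample entry with Python's `re` module. The pattern and replacement below are copied from the command; the sample wikitext and the function name `add_conjugation_heading` are made up for the illustration. Group 3 is the conjugation table call and group 5 the definition lines that follow it, so the replacement moves the table below the definitions and puts a ====Conjugation==== heading in front of it.

```python
import re

PATTERN = re.compile(
    r"(===Verb===\s*({{infl[^\n]*}})?\s*)"      # 1: heading and optional infl line
    r"({{sv-verb-(irreg|reg-)[^\n]*}}\s*)"      # 3: the conjugation table call
    r"(([^-=\[][^\n]*\n\s*)*)",                 # 5: following definition lines
    re.DOTALL,
)
REPLACEMENT = r"\1\5====Conjugation====\n\3"


def add_conjugation_heading(wikitext):
    """Apply the first replacement pair from the bot command to a string."""
    return PATTERN.sub(REPLACEMENT, wikitext)


if __name__ == "__main__":
    before = (
        "===Verb===\n"
        "{{infl|sv|verb}}\n"
        "{{sv-verb-reg-ar|tal}}\n"
        "# to speak\n"
    )
    print(add_conjugation_heading(before))
```

Running it a second time changes nothing, because the table call no longer follows directly after the ===Verb=== heading.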
August 20, 2010: In the database dump of 2010-08-12, there were 6341 calls to templates named sv-. Kinds are conj = conjugation table for verbs, decl = declension table for adjectives and nouns, form = referring from an inflected form to the main entry, infl = one-liner inflection pattern.
Calls | Template | Kind | Comment |
---|---|---|---|
813 | {{sv-noun-reg-er}} | decl | Since painted blue |
707 | {{sv-noun-reg-ar}} | decl | Since painted blue |
488 | {{sv-verb-reg-ar}} | conj | Since painted blue |
433 | {{sv-noun-reg-or}} | decl | Since painted blue |
418 | {{sv-noun-n-zero}} | decl | Since painted blue |
343 | {{sv-noun}} | decl | Since painted blue and renamed {{sv-decl-noun}}. Contains table layout and colours, serving as the base for other noun templates. |
257 | {{sv-adj-reg}} | decl | |
254 | {{sv-noun-unc-irreg-c}} | decl | Since painted blue |
250 | {{sv-noun-irreg-c}} | decl | Since painted blue |
218 | {{sv-verb-irreg}} | conj | Since painted blue. Contains table layout and colours, serving as the base for other verb templates. |
191 | {{sv-adv}} | infl | |
153 | {{sv-noun-reg-r-c}} | decl | Since painted blue |
137 | {{sv-verb-reg}} | infl | |
132 | {{sv-noun-unc-irreg-n}} | decl | Since painted blue |
108 | {{sv-adj-abs}} | decl | |
105 | {{sv-adj-peri}} | decl | |
102 | {{sv-verb-reg-er}} | conj | Since painted blue |
101 | {{sv-noun-c-zero}} | decl | Since painted blue |
96 | {{sv-noun-unc-n}} | decl | A redirect to {{sv-noun-unc-irreg-n}} |
92 | {{sv-noun-irreg-n}} | decl | Since painted blue |
86 | {{sv-adj}} | infl | |
84 | {{sv-noun-unc-c}} | decl | A redirect to {{sv-noun-unc-irreg-c}} |
78 | {{sv-verb-form-pre}} | form | |
72 | {{sv-noun-reg-n}} | decl | Since painted blue |
60 | {{sv-noun-form-indef-pl}} | form | |
56 | {{sv-verb-form-past}} | form | |
54 | {{sv-noun-form-def}} | form | |
33 | {{sv-verb-irr}} | infl | |
31 | {{sv-verb-form-sup}} | form | |
31 | {{sv-adj-form-abs-pl}} | form | |
30 | {{sv-adj-form-abs-indef-n}} | form | |
27 | {{sv-verb-form-imp}} | form | |
26 | {{sv-adj-form-abs-def}} | form | |
19 | {{sv-noun-form-indef-gen}} | form | |
18 | {{sv-adj-pastpart}} | decl | |
17 | {{sv-noun-reg-r-n}} | decl | Since painted blue |
16 | {{sv-noun-form-def-pl}} | form | |
15 | {{sv-verb-form-pastpart}} | form | |
14 | {{sv-verb-form-prepart}} | form | |
14 | {{sv-verb-ar}} | infl | A redirect to {{sv-verb-reg}} |
13 | {{sv-noun-form-indef-gen-pl}} | form | |
13 | {{sv-noun-ar}} | decl | A redirect to {{sv-noun-reg-ar}} |
13 | {{sv-adj-form-abs-def-m}} | form | |
11 | {{sv-adj-prepart}} | decl | |
11 | {{sv-adj-form-comp}} | form | |
10 | {{sv-noun-or}} | decl | A redirect to {{sv-noun-reg-or}} |
10 | {{sv-noun-form-def-gen}} | form | |
9 | {{sv-noun-form-def-gen-pl}} | form | |
9 | {{sv-adj-form-sup-pred}} | form | |
8 | {{sv-adv-form-sup}} | form | |
7 | {{sv-noun-un}} | decl | A redirect to {{sv-noun-unc-irreg-c}} |
7 | {{sv-adj-form-sup-attr-pl}} | form | |
6 | {{sv-adj-form-sup-attr-m}} | form | |
5 | {{sv-adv-form-comp}} | form | |
5 | {{sv-adj-form-sup-attr}} | form | |
4 | {{sv-noun-n}} | decl | A redirect to {{sv-noun-reg-n}} |
3 | {{sv-verb-form-pre-pass}} | form | |
3 | {{sv-verb}} | | Erroneous call, since replaced. |
2 | {{sv-verb-form-pres-pass}} | form | |
2 | {{sv-verb-form-inf-pass}} | form | |
2 | {{sv-adj-irreg}} | decl | |
2 | {{sv-adj-form-sup-pred-pl}} | form | |
1 | {{sv-noun-reg-}} | | Mentioned in {{sv-new-noun}} |
1 | {{sv-noun-proper-def-irreg}} | | Listed on Wiktionary:Swedish inflection templates |
1 | {{sv-noun-pl-irreg}} | | Listed on Wiktionary:Swedish inflection templates |
1 | {{sv-noun-form-adj}} | form | |
1 | {{sv-adj-small}} | decl | Called from {{sv-adj-decl}}, which is never used. |
1 | {{sv-adj-form-comp-pl}} | form | |
1 | {{sv-adj-abs-irreg}} | | Listed on Wiktionary:Swedish inflection templates |
August 19, 2010: There are currently 81 templates named sv-... (too many for my taste), having the following parts of their names:
Number of templates having this component in their name | Name component | Meaning |
---|---|---|
6 | abs | Absolute form of an adjective |
27 | adj | Adjective |
4 | adv | Adverb |
4 | ar | -ar plural declension of noun |
3 | attr | Superlative attributive form of an adjective |
5 | c | Common gender of noun (= utrum, n-gender) |
3 | comp | Comparative form of an adjective |
1 | custom | sv-verb-custom is a base/meta template |
2 | decl | Declension of nouns and adjectives |
6 | def | Definite form of nouns/adjectives |
2 | er | -er plural declension of noun |
30 | form | Inflected forms referring to the main entry |
4 | gen | Genitive form |
1 | imp | Imperative form of a verb |
4 | indef | Indefinite form of nouns/adjectives |
1 | inf | Infinitive passive form of a verb |
1 | irr | Irregular inflection |
8 | irreg | Irregular inflection |
2 | m | Masculine form of adjectives |
1 | mermest | Redirect shorthand for "peri" |
8 | n | Neuter gender (neutrum, t-gender) |
5 | new | Called from "nogomatch" |
1 | nogomatch | "You can create an entry..." |
30 | noun | Noun |
2 | or | -or plural declension of noun |
3 | pass | Passive form of a verb |
1 | past | Past tense form of a verb |
2 | pastpart | Past participle form of a verb |
2 | peri | Adjective comparison with mer/mest |
8 | pl | Plural |
2 | pre | Present tense form of a verb |
2 | pred | Superlative predicative form of an adjective |
2 | prepart | Present participle form of an adjective |
1 | pres | Present passive form of a verb |
2 | r | -r plural declension of noun |
11 | reg | Regular inflection |
1 | small | Smaller table layout, not used |
7 | sup | Superlative form of an adjective |
81 | sv | Swedish |
2 | un | Redirect synonym for abs or unc |
4 | unc | Uncountable noun (no plural forms) |
18 | verb | Verb |
2 | zero | Declension of nouns where plural = singular |
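A table like this can be derived from a plain list of template names by splitting each name on hyphens and counting, for every component, how many templates contain it. The sketch below assumes that this is how the components were defined; the function name `component_counts` and the four sample names are only for illustration.

```python
from collections import Counter


def component_counts(template_names):
    """Count, per name component, how many template names contain it."""
    counts = Counter()
    for name in template_names:
        # A set, so a component repeated within one name is counted once;
        # empty parts (from a trailing hyphen, as in "sv-noun-reg-") are skipped.
        counts.update({part for part in name.split("-") if part})
    return counts


if __name__ == "__main__":
    names = ["sv-noun-reg-er", "sv-noun-reg-ar", "sv-verb-form-pre", "sv-noun-reg-"]
    for component, number in sorted(component_counts(names).items()):
        print(number, "|", component)
```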