Wikipedia:Typo Team/moss

From Wikipedia, the free encyclopedia

The moss project seeks to find and remove the furry green typos that have been growing on Wikipedia articles. It uses a Python script named moss, written by User:Beland, to automatically find misspellings, English grammar mistakes, violations of the Wikipedia:Manual of Style, and confusing or broken wiki markup.

Death to typos!

QUICK LINK TO THE BEST PAGE FOR NEW PARTICIPANTS

About misspellings

How the lists are made

The moss spell checker is run against a recent set of database dumps, which are generated on the 1st and 20th of every month (but take a few days to process). All the articles in the English Wikipedia are examined. The following are ignored:

  • Text inside references, templates, tables, quotation marks, sections like "External links" and "Works", and some other weird places.
  • Capitalized words (which are presumed to be correctly spelled proper nouns)
  • Words that appear in titles in the English Wiktionary (which has definitions of all words in all languages, excluding proper nouns and systematic words like chemical names and large numbers)
  • Words that appear in titles in the English Wikipedia (which explains some things that don't appear in the dictionary)
  • Words that appear in titles in the Wikispecies (which has many technical words that don't appear in the dictionary or encyclopedia)
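The ignore rules above can be sketched in Python, the language moss is written in. This is a simplified illustration, not moss's actual code: the function names are hypothetical, the title list in the example is a tiny stand-in for the real Wiktionary, Wikipedia, and Wikispecies titles, and templates are stripped only one level deep.

```python
import re

# Hypothetical, simplified word pattern: ASCII letters plus internal ' and -.
WORD_RE = re.compile(r"[A-Za-z][A-Za-z'-]*")


def title_words(titles):
    """Collect every whole word that appears in a list of page titles."""
    words = set()
    for title in titles:
        words.update(WORD_RE.findall(title))
    return words


def strip_ignored(text):
    """Blank out a few of the regions moss skips (nested templates not handled)."""
    text = re.sub(r"<ref[^>]*/>", " ", text)  # self-closing references
    text = re.sub(r"<ref[^>]*>.*?</ref>", " ", text, flags=re.DOTALL)  # references
    text = re.sub(r"\{\{[^{}]*\}\}", " ", text)  # innermost templates only
    return re.sub(r'"[^"]*"', " ", text)  # text in straight double quotes


def candidate_typos(text, known_words):
    """Return the words that none of the ignore rules excuse."""
    found = []
    for word in WORD_RE.findall(strip_ignored(text)):
        if word[0].isupper():
            continue  # presumed to be a correctly spelled proper noun
        # Wikipedia titles start with a capital letter, so check that form too.
        if word in known_words or word[0].upper() + word[1:] in known_words:
            continue
        found.append(word)
    return found


known = title_words(["grew", "on", "page", "said", "in", "text", "Moss (plant)"])
sample = 'The mooss grew on Beland\'s page. He said "teh end" in {{lang|fr|bonjour}} text.'
print(candidate_typos(sample, known))  # ['mooss']
```

Only "mooss" survives: the capitalized words are skipped, "teh" is inside quotation marks, and "bonjour" is inside a template.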

Many mistakes are not (yet) caught:

  • Improper addition of 's (possessives are not added to Wiktionary, so these are excluded systematically)
  • Incorrect capitalization
  • Incorrect multi-word phrases
  • Wrong word used in context
  • Non-English words not tagged with {{lang}}, and English misspellings that happen to be the same as a word in another language. (These are counted as correct spellings if they are in the English Wiktionary, which lists words in all languages; only the definitions are restricted to English.)
  • Other situations listed in #False negatives below

2023 statistics

See also: Older statistics
Dump (moss version) | Parse failures (articles + articles with MOS:STRAIGHT violations) | TOTAL (instances) | A | BC | BW | C | D | H | HB | HL | L | ME | N | P | T/ | T1 | TE | TF | TS | U | Z
2023-01-01 (c2370a5) | 161163 + 29891 | 1187870 | 10615 | 83981 | 534264 | 8233 | 0 | 1498 | 4601 | 110 | 1975 | 179206 | 1905 | 5 | 2229 | 41525 | 6115 | 198814 | 97810 | 1428 | 13556
2023-01-20 (36ce94e) | 161298 + 29949 | 1182833 | 10598 | 83813 | 534411 | 8235 | 0 | 1525 | 4965 | 116 | 1958 | 178578 | 1889 | 6 | 2196 | 38722 | 6055 | 198441 | 96321 | 1402 | 13602
2023-02-01 (90a97fc) | 161048 + 29944 | 1180485 | 10602 | 83842 | 534121 | 8245 | 0 | 1500 | 5011 | 111 | 1936 | 178163 | 1862 | 6 | 2183 | 38247 | 6050 | 197047 | 96542 | 1392 | 13625
2023-02-20 (f606b45) | 161111 + 30009 | 1180176 | 10609 | 83664 | 534782 | 8249 | 0 | 1509 | 5224 | 108 | 1930 | 177709 | 1861 | 4 | 2071 | 37810 | 5997 | 196478 | 97105 | 1383 | 13683
2023-03-01 (75cbca7) | 161224 + 30095 | 1179378 | 10613 | 83570 | 534792 | 8206 | 0 | 1510 | 5286 | 100 | 1918 | 177568 | 1860 | 5 | 2076 | 37445 | 5970 | 196360 | 97010 | 1382 | 13707
2023-03-20 (56a3811) | 161344 + 30169 | 1177045 | 10566 | 83245 | 535523 | 8214 | 0 | 1509 | 5202 | 99 | 1911 | 176955 | 1861 | 5 | 2092 | 36281 | 5811 | 196309 | 96321 | 1361 | 13780
2023-04-01 | (no run)
2023-04-20 (57a4619) | 161810 + 30162 | 1178156 | 10577 | 83076 | 536215 | 8241 | 0 | 1541 | 5473 | 105 | 1904 | 175853 | 2043 | 5 | 2049 | 36561 | 5740 | 196528 | 96979 | 1370 | 13896
2023-05-01 (77de75d) | 162001 + 30150 | 1171871 | 10418 | 82887 | 536140 | 8170 | 0 | 1535 | 4633 | 98 | 1890 | 173066 | 2028 | 5 | 2050 | 36282 | 5781 | 195082 | 96960 | 1361 | 13485
2023-05-20 (73bb66d) | 162329 + 30138 | 1171817 | 10379 | 82480 | 536386 | 8161 | 0 | 1470 | 4913 | 88 | 1890 | 171905 | 2037 | 0 | 2064 | 36364 | 5817 | 195132 | 97814 | 1367 | 13550
2023-05-20 (d0a8560) | 163084 + 29893 | 1170266 | 10186 | 81955 | 529811 | 8192 | 0 | 1473 | 4902 | 89 | 1879 | 173759 | 2042 | 1 | 2064 | 38044 | 5842 | 194194 | 100920 | 1366 | 13547
2023-06-01 (040dd4d) | 163371 + 29818 | 1169150 | 10189 | 81451 | 529652 | 8200 | 0 | 1474 | 5163 | 90 | 1895 | 172815 | 2031 | 1 | 2052 | 37997 | 5827 | 193963 | 101375 | 1365 | 13610
2023-06-20 (50a82ce) | 163664 + 29771 | 1169732 | 10189 | 81086 | 529892 | 8232 | 0 | 1519 | 5624 | 86 | 1879 | 171891 | 2050 | 1 | 2059 | 38342 | 5785 | 194184 | 101817 | 1364 | 13732
2023-07-01 (8533535) | 163877 + 29747 | 1169420 | 10201 | 80978 | 529664 | 8242 | 0 | 1564 | 5806 | 83 | 1873 | 171484 | 2042 | 3 | 2061 | 38446 | 5814 | 193933 | 102073 | 1373 | 13780
2023-07-20 (9812c05) | 164115 + 29742 | 1170482 | 10174 | 80456 | 529875 | 8255 | 0 | 1553 | 5943 | 80 | 1872 | 171720 | 2036 | 3 | 2057 | 38956 | 5806 | 194057 | 102367 | 1361 | 13911
2023-08-01 (7468187) | 164308 + 29748 | 1170928 | 10136 | 80230 | 529739 | 8249 | 0 | 1549 | 6036 | 79 | 1873 | 171743 | 2037 | 5 | 2061 | 39182 | 5811 | 194411 | 102497 | 1351 | 13939
2023-08-20 (7170d29) | 164473 + 29635 | 1171932 | 10148 | 80137 | 529804 | 8263 | 0 | 1556 | 6132 | 80 | 1874 | 171627 | 2048 | 8 | 2062 | 39280 | 5856 | 194769 | 102930 | 1344 | 14014
Dump (moss version) | Parse failures (articles + articles with MOS:STRAIGHT violations) | TOTAL (instances) | A | BC | BW | C | D | H | HB | HL | L | ME | N | P | T+gcld3_broken | T/ | T1 | TS | U | Z
2023-09-01 (8c03bd1)* | 164600 + 29593 | 1173119 | 10135 | 80154 | 530301 | 8245 | 0 | 1567 | 5692 | 87 | 1875 | 171823 | 2061 | 9 | 200991 | 2057 | 39595 | 103147 | 1337 | 14043
2023-09-20 (8c03bd1)* | 164777 + 29611 | 1173098 | 10183 | 80123 | 530578 | 8240 | 0 | 1583 | 4775 | 85 | 1870 | 171711 | 2064 | 8 | 201138 | 2064 | 39874 | 103376 | 1339 | 14087
2023-10-01 (d531b95)* | 164779 + 29586 | 1173193 | 10164 | 80017 | 530906 | 8238 | 0 | 1577 | 4719 | 87 | 1860 | 171300 | 2061 | 9 | 201083 | 2047 | 39886 | 103784 | 1328 | 14127
2023-10-20 (9c53721)* | 164889 + 29667 | 1173548 | 10178 | 79977 | 531174 | 8243 | 138 | 1584 | 4762 | 87 | 1860 | 171070 | 2048 | 11 | 201277 | 2042 | 39910 | 103702 | 1323 | 14162
2023-11-01 (9c53721)* | 165069 + 29668 | 1174710 | 10164 | 79988 | 531412 | 8252 | 138 | 1577 | 4738 | 90 | 1844 | 171440 | 2033 | 11 | 201449 | 2059 | 40250 | 103724 | 1338 | 14203
2023-11-20 (1edb851)* | 165362 + 29748 | 1177078 | 10196 | 79995 | 531684 | 8262 | 138 | 1597 | 4859 | 93 | 1856 | 171957 | 2034 | 10 | 202060 | 2054 | 40847 | 103797 | 1323 | 14316
2023-12-01 (1edb851)* | 165429 + 29788 | 1179043 | 10208 | 79941 | 531789 | 8294 | 138 | 1610 | 4950 | 93 | 1867 | 172253 | 2028 | 12 | 202513 | 2056 | 41284 | 104336 | 1310 | 14361
2023-12-20 (1edb851)* | 165685 + 29862 | 1180181 | 10205 | 79762 | 531632 | 8362 | 138 | 1603 | 4895 | 103 | 1868 | 172415 | 2022 | 12 | 203189 | 2042 | 41499 | 104750 | 1301 | 14383

* Due to software issues, language detection wasn't working for this run.

2024 statistics

Dump (moss version) | Parse failures (articles + articles with MOS:STRAIGHT violations) | TOTAL (instances) | A | BC | BW | C | D | H | HB | HL | L | ME | N | P | T+gcld3_broken | T/ | T1 | TS | U | Z
2024-01-01 (1edb851)* | 165792 + 29766 | 1180781 | 10226 | 79927 | 531362 | 8352 | 0 | 1628 | 4917 | 100 | 1865 | 172474 | 2027 | 9 | 203478 | 2043 | 41749 | 104903 | 1301 | 14420
2024-01-20 (2caa23a)* | 165661 + 29837 | 1180491 | 10237 | 79493 | 531501 | 8345 | 0 | 1624 | 4127 | 103 | 1858 | 172622 | 2019 | 9 | 203838 | 2044 | 41878 | 105071 | 1298 | 14424
2024-02-01 (3242653)* | 165836 + 29834 | 1181230 | 10245 | 79246 | 531803 | 8337 | 0 | 1629 | 4120 | 103 | 1858 | 172799 | 2024 | 8 | 204049 | 2043 | 42002 | 105240 | 1287 | 14437
2024-02-20 (10d0c37)* | 165885 + 29901 | 1182750 | 10251 | 78915 | 531861 | 8343 | 1 | 1630 | 4043 | 114 | 1849 | 173461 | 2015 | 10 | 204251 | 2045 | 42357 | 105827 | 1286 | 14491
2024-03-01 (9ccfa0d)* | 166045 + 29975 | 1182428 | 10255 | 78805 | 531778 | 8362 | 0 | 1638 | 4041 | 112 | 1854 | 173370 | 2030 | 24 | 203994 | 2037 | 42461 | 105848 | 1299 | 14520
2024-03-20 (460959f)* | 166141 + 30055 | 1185611 | 10292 | 78621 | 532345 | 8424 | 0 | 1631 | 4237 | 116 | 1858 | 173672 | 2045 | 25 | 204545 | 2049 | 42870 | 106954 | 1278 | 14649
2024-04-01 (ce9f129)* | 166181 + 30054 | 1184405 | 10287 | 76464 | 533031 | 8419 | 0 | 1618 | 4309 | 114 | 1849 | 173577 | 2051 | 40 | 204408 | 2031 | 42961 | 107298 | 1258 | 14690
2024-04-20 (1ee7a35)* | 166362 + 30118 | 1177599 | 10275 | 67649 | 533534 | 8425 | 0 | 1617 | 4335 | 112 | 1848 | 173787 | 2063 | 40 | 204403 | 2012 | 43481 | 107996 | 1258 | 14764
2024-05-01 (6d3c9c7)* | 166292 + 30184 | 1175980 | 10277 | 66114 | 533831 | 8426 | 0 | 1643 | 4495 | 110 | 1845 | 173629 | 2064 | 1 | 204334 | 2020 | 43407 | 107675 | 1248 | 14861
2024-05-20 (489f6f1)*† | 144265 + 25968 | 1003795 | 8924 | 53789 | 453466 | 7619 | 0 | 1381 | 3715 | 90 | 1693 | 150497 | 1795 | 1 | 176951 | 1725 | 37151 | 92577 | 1120 | 11301
2024-06-01 (07eaceb)* | 166755 + 30248 | 1173354 | 10304 | 60088 | 534568 | 8460 | 0 | 1648 | 4461 | 105 | 2020 | 174740 | 2074 | 2 | 203514 | 1997 | 44495 | 108560 | 1241 | 15077
2024-06-20 (b1c7e7b)* | 166980 + 30276 | 1173538 | 10299 | 59845 | 534381 | 8444 | 0 | 1673 | 4501 | 102 | 1922 | 174948 | 2071 | 3 | 204346 | 2000 | 43905 | 108742 | 1227 | 15129
2024-07-01 (6787e3e)* | 167034 + 30300 | 1172833 | 10295 | 59766 | 533956 | 8440 | 0 | 1654 | 4345 | 101 | 1924 | 175086 | 2065 | 3 | 204357 | 1992 | 43915 | 108542 | 1227 | 15165

* Due to software issues, language detection wasn't working for this run.
† This run seems to have malfunctioned, possibly run on partial dumps.

Dump (moss version) | Parse failures (articles + articles with MOS:STRAIGHT violations) | TOTAL (instances) | A | BC | BW | C | D | H | HB | HL | L | ME | N | P | T/ | T1 | TE | TF | TS | U | Z
2024-07-20 (9c0d979)* | 167018 + 30354 | 1175268 | 10337 | 59894 | 533911 | 8455 | 0 | 1675 | 4304 | 102 | 1942 | 175528 | 1909 | 2 | 2015 | 44274 | 6018 | 199908 | 108530 | 1219 | 15245
2024-08-01 (027458a) | 167192 + 30364 | 1172497 | 10336 | 59874 | 533608 | 8473 | 0 | 1657 | 4315 | 100 | 1917 | 175240 | 1904 | 0 | 2011 | 43272 | 5990 | 199733 | 107535 | 1225 | 15307
2024-08-20 (a13c743) | 167561 + 30399 | 1170154 | 10336 | 59930 | 533732 | 8498 | 0 | 1661 | 4324 | 97 | 1911 | 174117 | 1902 | 1 | 2015 | 42363 | 5945 | 199740 | 106986 | 1224 | 15372
2024-09-01 (313f784) | 167769 + 30088 | 1169770 | 10346 | 60064 | 533615 | 8504 | 0 | 1652 | 4370 | 94 | 1916 | 173479 | 1894 | 0 | 2014 | 42271 | 5946 | 200037 | 106914 | 1223 | 15431
2024-09-20 (61a2a69) | 167769 + 30088 | 1170579 | 10346 | 60064 | 533615 | 8504 | 0 | 1652 | 5640 | 94 | 1915 | 173240 | 1894 | 0 | 2004 | 42244 | 5944 | 199857 | 106912 | 1223 | 15431
2024-10-01 (6afa51c) | 168227 + 30163 | 1174679 | 10337 | 60291 | 534111 | 8536 | 0 | 1648 | 8004 | 95 | 1942 | 173723 | 1892 | 1 | 2053 | 42304 | 5936 | 199891 | 107127 | 1235 | 15553

Typo classification legend

Reporting symbol | Explanation
Parse failure | Mismatched punctuation; the spell checker is unsure which words to ignore, so the whole page is skipped
A | mAth
BC | Bad Characters (not allowed by the Manual of Style)
BW | Bad Words (not allowed by the Manual of Style)
C | Chemistry words
D | DNA sequence
H | HTML/XML/SGML tag
HB | Known bad HTML tag, like <font>
HL | Bad HTML-like linking, like <http://...>
L | Probable Romanization (transLiteration)
ME | Probable coMpound, English (with and without dash); these need to be added to Wiktionary
N | A-Z plus numbers and hyphens
P | Patterns (e.g. rhyme schemes; Beland fixes these)
T/ | Suspected MOS:SLASH violation
T1 | Edit distance 1 from a common English word
TE | AI thinks it's trying to be English
TF | AI thinks it's trying to be a non-English language (Foreign to English Wikipedia), sorted by language (e.g. TF+el)
TS | Missing or extra whitespace or dash (or new compound). Currently included if there is a period (TS+DOT), comma (TS+COMMA), or extra space (TS+EXTRA). Missing bracket (TS+BRACKET) needs code improvements to be reliable, and the remaining TS cases need sorting.
U | URL
Z | Decimal fraction missing leading Zero
I | Definitely not English (International) due to accents or mixed with punctuation (other than hyphen)
MI | Probable coMpound, non-English (International) in English Wiktionary (both A-Z and non-ASCII characters, with and without dash)
ML | Probable coMpound, transLiteration
MW | Probable coMpound, found in non-English Wiktionary
R | Regular word (A-Z only) not near a common English word
T2 | Edit distance 2 from a common English word
T3 | Edit distance 3 from a common English word
W | Not in English Wiktionary, in non-English Wiktionary
  • red = Probably need to fix
  • yellow = Unsorted - need code improvements to sort into likely vs. unlikely typos or subtypes that can be usefully processed.
  • blue = Probably OK (but may need to verify)
  • bold = actively working on fixing
  • grey = no longer used
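To illustrate how a few of the codes above could be assigned, here is a hypothetical Python sketch. It covers only Z, N, T1 to T3, and R, checks them in an order chosen for the example, and uses plain Levenshtein edit distance (so two swapped letters count as distance 2). moss's real rules, word lists, and ordering may well differ.

```python
import re

LEADING_DOT_RE = re.compile(r"^\.\d+$")  # ".75" instead of "0.75"
LETTERS_AND_DIGITS_RE = re.compile(r"^(?=.*[A-Za-z])(?=.*\d)[A-Za-z0-9-]+$")


def edit_distance(a, b):
    """Levenshtein distance: each insertion, deletion, or substitution costs 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def classify(token, common_words):
    """Assign one of a few report codes to a token (simplified assumption)."""
    if LEADING_DOT_RE.match(token):
        return "Z"
    if LETTERS_AND_DIGITS_RE.match(token):
        return "N"
    best = min(edit_distance(token.lower(), word) for word in common_words)
    if best == 0:
        return None  # a common word, so not reported at all
    if best <= 3:
        return f"T{best}"
    return "R" if token.isascii() and token.isalpha() else "unsorted"


common = ["moss", "typo", "english"]
for token in [".75", "B12C3", "mosss", "tpyo", "zzzzzz"]:
    print(token, classify(token, common))
```

With this toy word list, "tpyo" lands in T2 rather than T1 because plain Levenshtein distance has no single-step transposition.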

Instructions for editors

As with a regular spell checker, a highlighted word is sometimes a real misspelling that should be changed, and sometimes a correct spelling that needs to be added to the spell checker's dictionary (in this case, the English Wiktionary and Wikispecies). For the lists below, here's how you can help:

  • For spelling mistakes: Click on the links to the individual Wikipedia articles, and edit them to correct the misspelling. Make sure this is actually a misspelling, and not a technical term that needs to be better explained, or an alternate spelling (possibly from a different regional variety of English).
  • For non-English words (including words from Old English and Middle English, since they are pronounced differently): Edit the article and use the {{lang}} or {{transl}} templates to mark all non-English passages. Template contents are ignored, so they will not show up in the next report. If you can define the word, it would still be helpful to add the non-English word to the English Wiktionary or the same-language Wiktionary if you speak that language. As of the March 20, 2019 dump, only words not found in any Wiktionary are reported by moss as misspellings. (The "home" Wiktionary for Old and Middle English words is the modern English one.)
    • For Early Modern English spellings, use {{lang|en-emodeng}}.
    • For languages that don't have an ISO 639 code (as often happens with historical languages), you can use an IETF language tag instead. Failing that, use the miscellaneous code "mis" and add an HTML comment indicating the language. For example: {{lang|mis|sharbe do kin ratz}}<!-- Old Runish -->
  • For incorrect spellings in direct quotes:
    • These shouldn't be picked up by the spell checker, as text in double quotes ("") is ignored. If one is reported, the article probably has mismatched punctuation.
    • Regardless of punctuation problems, you can add {{sic}} around the word or phrase. See Wikipedia:Manual of Style#Quotations for guidance.
  • For correct spellings that belong in the dictionary: Click on the word to add it to the English Wiktionary. Remember the word might not be English (though the definition must be) and be sure to check capitalization!
  • For correct spellings already in the dictionary: Delete from the list. Other editors have added these since the database dump was made. Unlike internal Wikipedia links, these entries do not automatically change color to show this.
  • For correct spellings not appropriate for Wiktionary:
    • For complicated chemical names:
      • If there is an article about this chemical, it's best to make a redirect. You may want to tag it {{R from systematic name}} or {{R from technical name}} if appropriate.
      • If there is no Wikipedia article, you can wrap the name in {{chem name}}; for example:
        • {{chem name|poly(1-phenylethene)}}
        This should not be used for chemical formulas such as H2O, for which {{H2O}} or {{chem2}} may be appropriate. For some common compounds there are specific templates available such as Template:CO2.
    • For DNA sequences, add {{DNA sequence}} around it.
    • For species, add the whole name to Wikispecies:Wikispecies:Requested articles#From_Wikipedia and it will be suppressed from future runs.
    • For proper nouns (including non-English titles) that aren't capitalized, put them inside a {{proper name}} template.
    • Use <code></code> or similar tags for computer programs; see Wikipedia:WikiProject_Computer_science/Manual_of_style#Code_samples.
    • For terms that are only relevant to one Wikipedia article (and for which the article makes clear the definition) consider creating a redirect to the article. As long as the "typo" word is in the title (as a whole word), it won't show up as a mistake in future spell checks.
    • {{IPA}} or {{respell}} can be used for word pronunciations. See Wikipedia:Manual of Style/Pronunciation for details.
    • For bird calls: Treat these as foreign-language words or words-as-words and put them in italics, following MOS:ITALICS. Put the call inside {{not a typo}} so it won't show up on moss spell check reports. (It doesn't matter if the double apostrophes that make the italics go inside or outside the template.)
    • For anything else, add {{not a typo}} around it (for example, nonsense series of letters used as examples in puzzles).
  • Correct or incorrect, when finished, delete the entry for the word from the lists on this page (or subpages), so work won't be duplicated. (There is no longer any need for strikethrough.)
  • If an article or section has generally bad grammar, and you don't have time to fix the whole thing, just add {{copyedit}} at the top of the article or {{copyedit|section}} at the top of the affected section. If it's just a sentence or two, {{copy edit inline}} or {{incomprehensible inline}} can go at the end of the problem passage.
  • If you see errors being reported from footnotes or bibliographies, check to make sure the section is titled with a standard name following MOS:APPENDIX conventions. Standard end-matter sections like "References", "Further reading", and "Works" are ignored.
  • If it helps to leave a message on the article's talk page asking if the word is correct or incorrect, you can use Template:Typo help like this when editing the bottom of the talk page (leave the section header blank; it will automatically be added):
    • {{subst:typo help|PUT WORD HERE}} -- ~~~~

Don't worry if you miss something; it will reappear in a future report if there are still mistakes.

Suggested edit summaries

If you want to help publicize this project, you can copy-and-paste these into your edit summary, if appropriate.

For Wikipedia edits:

Fix misspelling found by [[Wikipedia:Typo Team/moss]] – you can help!
Tag non-English text found by [[Wikipedia:Typo Team/moss]] – you can help!
Tag correct text as {{not a typo}} for automated spell checkers (including [[Wikipedia:Typo Team/moss]])
Fix mismatched quote marks found by [[Wikipedia:Typo Team/moss]] – you can help!

For Wiktionary edits:

Add word identified by [[w:Wikipedia:Typo Team/moss]] – you can help!

Wiktionary cheat sheet

Need to add a word to Wiktionary? The Wiktionary cheat sheet has copy-and-paste templates that make it easy for the types of words commonly encountered here, even if you've never done it before.

Misspellings - lists of things to fix

Likely misspellings by article (main listing)

This is the most efficient list to work on if all you want to do is fix misspellings. These listings try to list all the typos from a given article, so they can be fixed all at once. They also try to show only typos that legitimately need fixing. This is not perfect, so a few of the words found need to be added to Wiktionary or tagged as non-English, not a typo, etc. Only a few letters are updated on each run, to avoid stale listings, since the whole list takes far longer than two weeks to work through. (This also avoids duplicating recent work when listings are refreshed.)

See subpages due to length:

Notes:

  • For more cases that require investigation, see Category:Articles with unidentified words.
  • Due to length and an increased number of false positives, typo reports for dumps 2020-05-20 and later don't include T2+, T3+, and TS+BRACKET+.

Possible typos by length

(Updated from 2022-12-20 dump.)

Longest or shortest in certain categories are shown, sometimes just for fun and sometimes because they form a useful group. Feel free to delete articles that are fixed or tagged.

Likely chemistry words

These need to be checked by a chemist and marked as {{chem name}}.

Chemical formulas

(Updated from 2023-05-20 dump.)

Chemical formulas should be written with HTML subscripts or {{chem2}}; these listings identify those that incorrectly just use regular numbers.

Chemical formulas that use Unicode subscripts (which is against MOS:SUBSCRIPT) will be detected automatically by moss_entity_check.py.

Chemical formulas that use <sub>...</sub> are allowed by MOS:CHEM, but may show up in the main typo listings above. They can be converted to use {{chem2}} to be accepted by the spell checker, and {{chem2}} is also the way to fix listings of partial formulas.

Any "possible" listings that aren't chemical formulas can be cleared from this list by adding a redirect to an appropriate target (like Dy4 Systems). Most "known" listings that aren't chemical formulas can be fixed with {{proper name}}.

Redirects added for strings that are chemical formulas should be added to Category:Chemical formulas.
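A rough Python sketch of the kind of check behind these listings: treat a token as a possible formula if it consists only of element symbols and numbers, then show the HTML-subscript and {{chem2}} forms an editor might choose between. The element set is deliberately partial and the function names are hypothetical. Rejecting the non-element "R" keeps a string like V3R6 ("version 3 release 6?" in the list below) out, but it would also reject legitimate formulas that use R for a generic group.

```python
import re

# A deliberately partial set of element symbols, for illustration only.
ELEMENTS = {
    "H", "B", "C", "N", "O", "F", "Al", "Si", "P", "S", "Cl", "K", "Ca", "Ti",
    "V", "Cr", "Mn", "Fe", "Ni", "Cu", "Ga", "Ge", "Mo", "Sb", "Pb", "Bi",
}

FORMULA_RE = re.compile(r"^(?:[A-Z][a-z]?\d*)+$")  # symbol-number runs only
PART_RE = re.compile(r"([A-Z][a-z]?)(\d*)")         # one symbol and its count


def looks_like_formula(token):
    """True if the token is element symbols with at least one plain number."""
    if not FORMULA_RE.match(token) or not any(ch.isdigit() for ch in token):
        return False
    return all(symbol in ELEMENTS for symbol, _count in PART_RE.findall(token))


def _subscript(match):
    symbol, count = match.group(1), match.group(2)
    return symbol + (f"<sub>{count}</sub>" if count else "")


def with_html_subscripts(token):
    """Rewrite e.g. Fe2O3 as Fe<sub>2</sub>O<sub>3</sub>."""
    return PART_RE.sub(_subscript, token)


def with_chem2(token):
    """Wrap the formula in {{chem2}}, which renders the subscripts itself."""
    return "{{chem2|" + token + "}}"


for token in ["Fe2O3", "V3R6", "Al2Si2"]:
    if looks_like_formula(token):
        print(with_html_subscripts(token), with_chem2(token))
    else:
        print(token, "is not a formula by this heuristic")
```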

Most chemical articles

Articles with a large number of chemical formulas triggering the spell checker are listed here (manual check on 2022-06-20 dump; counts include potential typos other than formulas, mostly compound names):

Possible chemical formulas that don't use subscripts

Note: These are easier to find with a regular-expression "insource:" search, for example: insource:/Si6Al2/. -- Beland (talk) 02:32, 27 December 2022 (UTC)
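For checking many formulas, a small Python helper can build the corresponding search link. This is a sketch under assumptions: it uses the standard index.php?search= form on English Wikipedia, with ns0=1 to limit results to articles and fulltext=1 to show a results list instead of jumping to a page, and it does not escape regular-expression metacharacters in the pattern.

```python
from urllib.parse import urlencode


def insource_search_url(pattern):
    """Build an English Wikipedia search URL for an insource:/regex/ query."""
    params = {"search": f"insource:/{pattern}/", "ns0": 1, "fulltext": 1}
    return "https://en.wikipedia.org/w/index.php?" + urlencode(params)


print(insource_search_url("Si6Al2"))
```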

  • 11/6 - Ge9
  • 10/2 - N62B44
  • 7/6 - V2O7
  • 7/5 - Ac2S3
  • 7/1 - B3R2
  • 6/6 - Cu5
  • 6/5 - Ti3O5
  • 6/5 - S50B32
  • 6/5 - Bi2O2
  • 6/5 - Al63Cu24Fe13
  • 6/3 - Pr2C6H3
  • 6/3 - H3R17
  • 6/2 - Mn12O12
  • 6/2 - Ga2I3
  • 6/2 - C6R6
  • 5/5 - Si9O27
  • 5/5 - Pb9
  • 5/5 - No17
  • 5/5 - H3K18
  • 5/5 - Fe5Si3
  • 5/5 - Fe2O4
  • 5/5 - B18B4
  • 5/4 - Zr4
  • 5/4 - S6K2
  • 5/4 - Mo6S8
  • 5/4 - Fe4S3
  • 5/3 - V3R6 - version 3, release 6?
  • 5/3 - Pu2O3
  • 5/3 - K3V2
  • 5/3 - H3R26
  • 5/3 - Cf2O3
  • 5/2 - Np2O5
  • 5/2 - N62B48
  • 5/2 - Mn5Si3
  • 5/2 - Lv5
  • 5/2 - B12C3
  • 5/1 - Si4O13
  • 5/1 - Np2S3
  • 5/1 - B12Cl11
  • 4/4 - Ti22
  • 4/4 - Si4O10
  • 4/4 - Sb3O6
  • 4/4 - No16
  • 4/4 - Kr2
  • 4/4 - I4O9
  • 4/4 - H4R3
  • 4/4 - Gd3Ga5O12
  • 4/4 - Ga2Cl4
  • 4/4 - C6H5O7
  • 4/4 - C6H3Cl2
  • 4/4 - C2B2
  • 4/4 - C16H33
  • 4/4 - Bi4Ti3O12
  • 4/4 - Au75Si25
  • 4/4 - Al2Si2
  • 4/3 - W18O49
  • 4/3 - Tc3Cl9
  • 4/3 - R2B2
  • 4/3 - Pb10
  • 4/3 - No11
  • 4/3 - Ni6
  • 4/3 - H3R8
  • 4/3 - Ca3Al2
  • 4/3 - C5H3
  • 4/3 - C2B7H13
  • 4/3 - B6H10
  • 4/3 - B18C4
  • 4/2 - R2P2
  • 4/2 - Ni31Si12
  • 4/2 - H4K8
  • 4/2 - Cu4O3
  • 4/2 - Cr7C3
  • 4/2 - B5O6
  • 4/1 - Ti4N3
  • 4/1 - Ta5N6
  • 4/1 - Ta2Cl6
  • 4/1 - Sm2Co17
  • 4/1 - O2C6Cl4
  • 4/1 - Np3S5
  • 4/1 - Mg3Si2O5
  • 4/1 - Lv8
  • 4/1 - Ho5
  • 4/1 - H4H2
  • 4/1 - Ga2I4
  • 4/1 - Cr2Ge2Te6
  • 4/1 - C6S4
  • 4/1 - C50H10
  • 4/1 - C2P2
  • 4/1 - Ag6
  • 3/3 - V4R4 - version 4, release 4?
  • 3/3 - V4R3 - version 4, release 3?
  • 3/3 - Th4

Known chemical formulas that don't use subscripts

H2O
CO2
CS2

(Mostly not carbon disulfide.)

C2H2 zinc finger weirdness

These might be better written as Cys2His2; see Zinc finger#Classes. -- Beland (talk) 01:16, 18 June 2022 (UTC)

Remainder
Problem cases

Parsing problems (where noted) are probably resulting in words showing up in debug-spellcheck-ignored.txt that shouldn't. -- Beland (talk) 03:09, 27 December 2022 (UTC)

Repeating patterns

For rhyme schemes, they probably need to be re-styled to follow Wikipedia:WikiProject Poetry#Style for rhyme schemes. If this ends up making them all-caps, they won't show up here on the next run. For mixed-case rhyme scheme notations, use {{not a typo}} after making sure dashes, commas, and spaces follow the recommended style.

(All fixed as of 2022-12-20 dump!)

False positives

Is there a word that is correctly used in an article, but which shouldn't be added to Wiktionary? List it here, and Beland will fix the problem.

Archived solutions: Wikipedia:Typo Team/moss/Archive

False negatives

Is there a misspelled word in an article mentioned here that was not reported? Feel free to list it below and Beland will try to improve the code if appropriate.

These are currently over-ignored, but could be used to suggest correct spellings:

  • Wikipedia articles with {{R from misspelling}}, {{R from incorrect name}}, {{R from miscapitalisation}}, and redirects to these templates
  • Wiktionary entries that are known misspellings (e.g. wikt:anticiliary)
  • In cases where there are variant spellings of the same word or phrase, Wikipedia should probably pick one and stick to it except to mention the variants. This happens with:
    • Compound words - whether to use a space, dash, or nothing, as in "junebug" vs. "june bug" or "email" vs. "e-mail".
    • Words with multiple transliterations from another language (often there are multiple systems, no particular system, or a modern system different from historical systems).
    • Redirects with {{R from alternate spelling}} and redirects to that template.

Archived notes

See Wikipedia:Typo Team/moss/Archive.

For Wiktionary

Spell-checking Wiktionary itself

A new project has started to do that using moss software, at wikt:Wiktionary:Spell check.

Triaged for Wiktionary

Dictionary writers needed! And speakers of languages other than English!

Many words (English and otherwise) detected as potential typos have been manually triaged as legitimate words that need to be added to Wiktionary, and are listed at Wikipedia:Typo Team/moss/For Wiktionary. (Moved from this page due to length.) Many of the subpages under the misspelling main listing also have long lists of words to add to Wiktionary, which are sometimes bundled up and moved to the "For Wiktionary" subpage.

Wiktionary aims to have definitions for all words in all languages (with some exceptions), and acts as the primary database for the moss spell-checker.

Highest-frequency words missing from dictionary (a-m)

(updated 2022-12-20) Good candidates for words to add to the English Wiktionary (which provides English definitions for words in all languages, including all compound words), as it seems English Wikipedia readers will frequently encounter them. For each run, only words from half of the alphabet are shown, to avoid duplicate work from when new dumps are being processed.

Most of the words are not from English. To get them off this list, you can either add an entry to the English Wiktionary (which provides English definitions for words in all languages) or tag all instances of the word on the English Wikipedia with {{lang}}. Wiktionary does not accept Romanizations for some languages, so those cases must be tagged as {{transl}} or {{lang}}.

Legitimate misspellings are candidates for Wikipedia:Lists of common misspellings. If there is an obvious correction, adding that to Wikipedia:Lists of common misspellings/For machines will help editors who use automated tools to fix cases faster.

Translation and general cleanup

See Wikipedia:Typo Team/moss/not English.

Mismatched markup and punctuation

Errors in punctuation (mostly quotation marks) and wiki markup generally cause confusion for readers, and also prevent the spell checker from running on these articles.

Inches and feet should not use " and ', per Wikipedia:Manual of Style/Dates and numbers#Specific units; use letters instead. (See MOS:UNITS for general guidance.) Where conversions are needed, use {{convert}}, for example: 2 feet 3 inches (69 cm)
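A hypothetical Python helper for the feet-and-inches case: it finds shorthand such as 2'3" written with straight marks and suggests a {{convert}} call. The output unit (cm by default here) is an assumption the editor should adjust per article, and every match still needs a human check, since the same characters can be genuine quotation marks or apostrophes.

```python
import re

# Feet and inches written with straight marks, e.g. 2'3" or 2' 3".
FEET_INCHES_RE = re.compile(r"\b(\d+)'\s?(\d+)\"")


def suggest_convert(text, out_unit="cm"):
    """Replace feet-and-inches shorthand with a {{convert}} template call."""
    def to_template(match):
        feet, inches = match.group(1), match.group(2)
        return "{{convert|" + feet + "|ft|" + inches + "|in|" + out_unit + "}}"
    return FEET_INCHES_RE.sub(to_template, text)


print(suggest_convert("The cell was 2'3\" wide."))
```

The suggested {{convert|2|ft|3|in|cm}} should render like the example above, as 2 feet 3 inches (69 cm).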


WORK IN PROGRESS

  • Integrating these with main listings
  • Filter only unmatched " for now
    • Filter articles with non-ASCII quote marks to a separate list for JWB processing
    • Filter \d" and \d' to a separate sublist for inch/feet style conversion
  • Explain ✂ or skip snippets showing this
  • Bracketbot web UI seems to be down

-- Beland (talk) 19:03, 4 September 2019 (UTC)
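The filtering ideas in the work-in-progress list could be sketched like this in Python. It is hypothetical: the bucket names are made up, and it sorts a whole article into one bucket using the order the list suggests (non-ASCII quote marks first, then digit-followed-by-quote inch/feet marks, then an odd count of straight double quotes).

```python
import re

CURLY_QUOTES = "\u201c\u201d\u2018\u2019"  # “ ” ‘ ’
DIGIT_MARK_RE = re.compile(r"\d[\"']")     # \d" or \d', e.g. 12" or 5'


def quote_bucket(text):
    """Sort an article's wikitext into one of the work-in-progress lists."""
    if any(ch in text for ch in CURLY_QUOTES):
        return "non-ASCII quotes"  # separate list for JWB processing
    if DIGIT_MARK_RE.search(text):
        return "inch/feet marks"  # candidates for {{convert}}
    if text.count('"') % 2 == 1:
        return "unmatched double quote"
    return "ok"
```

Note that the \d' pattern also catches forms like 1990's, so that sublist still needs a human pass.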

Gender-neutral language

Manned

The word "manned" and related forms like "unmanned" are used in many articles, but are not gender-neutral as required by MOS:S/HE and the NASA style guide. Gender-neutral alternatives include:

  • Crewed, uncrewed
  • Staffed, unstaffed
  • Human spaceflight
  • Defended

Not all instances need to be changed.

  • Proper nouns should remain the same, like Manned Orbiting Laboratory
  • Titles of sources and quotes should remain unchanged.
  • Mentions of the term itself can stay, for example to say that "manned spaceflight" is another way of saying human spaceflight.
  • There seems to be consensus on unmanned aerial vehicle that this and related phrases (like unmanned aerial system) should remain intact, since "unmanned aerial vehicle" is much more common than "uncrewed aerial vehicle" at the moment. However, when using Wikipedia's voice, it is preferred to describe a UAV as "uncrewed" when not using the whole phrase.
  • Non-article pages that are retained for historical interest shouldn't be modified if they won't be visible to readers.
  • Redirects with this title should be left alone if they are redirecting readers to a gender-neutral title.

If the word is found in the names of articles and categories (except those with names directly related to UAVs), those should be renamed, and the links changed. Many articles have already been renamed, and the links just need to be updated. (Remember that to rename a category, all the articles in that category must be edited to change their pointers.)
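A hypothetical Python sketch for finding candidates: it flags lowercase "manned" and "unmanned" with a suggested replacement, skipping capitalized forms (which may be proper nouns, though this also misses sentence-initial uses) and the unmanned aerial vehicle/system phrases. Quotes, source titles, and discussions of the term itself still need a human decision, and "crewed" is only one of the alternatives listed above.

```python
import re

SUGGESTIONS = {"manned": "crewed", "unmanned": "uncrewed"}
# Lowercase only; the lookahead skips the UAV phrases that are kept intact.
MANNED_RE = re.compile(r"\b(?:un)?manned\b(?! aerial (?:vehicle|system))")


def flag_manned(text):
    """Return (word, suggestion) pairs for a human to review."""
    return [(m.group(0), SUGGESTIONS[m.group(0)]) for m in MANNED_RE.finditer(text)]


print(flag_manned("An unmanned aerial vehicle and an unmanned probe"))
```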

Borderline cases

These may need to be discussed before being changed.

  • Manned Venus flyby - Based on the NASA style guide, NASA probably would now refer to this as "human Venus flyby" but historical sources say "manned Venus flyby" so that's what the majority of editors commenting on the talk page currently favor. There is some question as to whether the scope of the article concerns a specific mission or this type of mission in general, which is related to the proper name exception (but then the title would be "Manned Venus Flyby"). Compare Colonization of Venus and Human mission to Mars. -- Beland (talk) 19:41, 21 May 2019 (UTC)
Discussion in progress on Talk:Manned Venus flyby. -- Beland (talk) 09:37, 5 January 2022 (UTC)

Objections in specific cases:

Marriage

Wikipedia:Writing about women § Marriage points out:

Ladies

Wikipedia:Writing about women § Girls, ladies prefers "women" to "ladies" except in set phrases or traditional titles (like first lady). Find all lowercase "ladies".

Instructional and presumptuous language

MOS:NOTE says to avoid the following phrases when they address the reader directly. Not all instances are problematic, such as those in direct quotations.

Internationally comprehensible spelling and vocabulary

MOS:COMMONALITY advises the use of vocabulary and spellings that are shared across national varieties of English, where possible. This section collects instances where an unshared term is being used which could be improved. For proper nouns and direct quotes, a translation or re-spelling into another dialect may be helpful.

Looks like it's wrapped up, with jail preferred except in proper nouns. Xurizuri (talk) 15:36, 21 December 2020 (UTC)

Currency style

Per MOS:CURRENCY:

  • For the UK, Irish, Australian, New Zealand, and South African pound, ₤ should be changed to £
  • ₤ is OK to use with Italian lira. Changing e.g. ₤100,000 to [[Italian lira|₤]]100,000 will prevent legitimate uses from showing up in automated reports, and also helps readers understand that this is not British pounds. (Mentions of Italian lira are increasingly rare because it has been replaced by the euro.)
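A hypothetical Python helper that lists the positions of ₤ signs not already written as [[Italian lira|₤]], so an editor can decide case by case whether each one is a pound (change to £) or a lira (add the link). It deliberately does not rewrite anything, since an unlinked ₤ may be either.

```python
LIRA_LINK_PREFIX = "[[Italian lira|"


def unlinked_lira_signs(text):
    """Return the index of every ₤ that is not inside [[Italian lira|₤]]."""
    positions = []
    start = text.find("₤")
    while start != -1:
        if not text[:start].endswith(LIRA_LINK_PREFIX):
            positions.append(start)
        start = text.find("₤", start + 1)
    return positions


wikitext = "cost ₤5 and [[Italian lira|₤]]100,000"
for i in unlinked_lira_signs(wikitext):
    print(wikitext[max(0, i - 20):i + 20])  # context for the human reviewer
```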

Find all problem cases for ₤

Caution: Not all problem pages show up reliably; if you do a search, fix all the pages in the results, and then do another search, you will probably get a fresh batch of problem pages. It may also take a minute or two for fixed pages to disappear from the results, due to lag updating the search index.

Work is in progress on detecting and fixing other MOS-related issues with numbers and currencies.

Small caps

Per MOS:SMALLCAPS, small caps are not to be used for years like "400 BC". Find all instances of known small caps issues...

HTML tags

Updated from 2024-04-01 dump.

You can do one of two things for these articles:

  • Remove, repair, or convert the HTML markup to wiki markup yourself.
  • Tag the article {{cleanup HTML}} and it will show up under Category:Articles with HTML markup but not on this list. Use the "tags" parameter to indicate which tags are present on the page; many editors find it hard to locate the offending HTML. For example: {{cleanup HTML|tags=table, cite}}
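Since many editors find it hard to locate the offending HTML, a short Python sketch can collect tag names for the "tags" parameter. The ALLOWED set is an assumption for illustration, not the actual list of tags the Manual of Style permits, and the regex does not skip tags inside <nowiki> sections or comments.

```python
import re

TAG_RE = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)\b[^>]*>")
# Hypothetical: tags treated as acceptable wiki markup in this sketch.
ALLOWED = {"ref", "references", "br", "sub", "sup", "code", "blockquote"}


def cleanup_html_tag(wikitext):
    """Build a {{cleanup HTML|tags=...}} call listing tags in first-seen order."""
    found = []
    for name in TAG_RE.findall(wikitext):
        name = name.lower()
        if name not in ALLOWED and name not in found:
            found.append(name)
    if not found:
        return ""
    return "{{cleanup HTML|tags=" + ", ".join(found) + "}}"


print(cleanup_html_tag("<table><tr><td>x</td></tr></table> <cite>Book</cite>"))
```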

How to clean up

See Category:Articles with HTML markup for instructions on how to find the offending tags and what to do about them.

Find all articles by tag

Can't wait for the next database dump? Want to look for or fix all instances of a specific tag? Use the links below!

Additional HTML problems are listed at Special:LintErrors.


Sometimes editors use angle brackets (< and >) for other purposes. Though these are not HTML markup, they often need to be fixed.

<<...>> find all can indicate:

  • French quotation marks rendered as <<quoted text>>. These should be normalized to "quoted text" or 'quoted text', even in quotations, per MOS:CONFORM.
  • A broken citation that should be converted to {{cite web}}.
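When the <<...>> turns out to be French-style quotation marks rather than a broken citation, the replacement is mechanical. A minimal Python sketch with a hypothetical helper name, assuming the quoted text stays on one line and does not itself contain >>:

```python
import re

# <<quoted text>>, tolerating spaces inside the brackets as in French typography.
DOUBLE_ANGLE_RE = re.compile(r"<<\s*(.+?)\s*>>")


def normalize_double_angle_quotes(text):
    """Turn <<quoted text>> into "quoted text" (straight quotes, per MOS:CONFORM)."""
    return DOUBLE_ANGLE_RE.sub(r'"\1"', text)


print(normalize_double_angle_quotes("He said <<bonjour>> twice."))
```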

Other weirdness:

  • <the> - find all - More French quoting style, bad linking, bad citation style, etc.
  • <blockquote> sometimes shows up on the reports if it is capitalized or all-caps on the article page. It should be all lowercase.

Known bad HTML tags (HB)

These are also included in the main listings.

Bad HTML-like linking (HL)

These are also included in the main listings. Angle brackets are not used for external links (per Wikipedia:Manual of Style/Computing § Exposed URLs); "tags" like <https> and <www> are actually just bad link formatting. See Wikipedia:External links#How to link for external link syntax; use {{cite web}} for footnotes.
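A hypothetical Python sketch of the simplest repair, turning <https://...> or <www...> into single-bracket external-link syntax. Prefixing https:// to bare www addresses is an assumption, and a URL that really belongs in a footnote should become {{cite web}} instead.

```python
import re

ANGLE_URL_RE = re.compile(r"<((?:https?://|www\.)[^<>\s]+)>")


def bracket_angle_links(text):
    """Convert <URL> into [URL], the wiki external-link syntax."""
    def to_wiki_link(match):
        url = match.group(1)
        if url.startswith("www."):
            url = "https://" + url  # assumption: the site serves HTTPS
        return "[" + url + "]"
    return ANGLE_URL_RE.sub(to_wiki_link, text)


print(bracket_angle_links("See <https://example.org/page>."))
```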

Unsorted (H)

Many of these can be replaced by {{var}} (for text to be replaced) or {{angbr}} (e.g. for linguistic notation). Enclose in <code>...</code> for inline software source code.

Need debugging

Notification of new dumps

"Likely misspellings by article" should always have work to do (if not, ping Beland to add more from the current dump). Some of the other sections occasionally have to wait for a new dump to produce a useful list, either because they are ranked by frequency or because a code change has been made to clean up noise in the next run. New runs are generally posted twice a month. The database snapshot from the first day of the month generally takes about 9-13 days to process, and the snapshot from the twentieth day of the month might take 4-6 days until it can be posted.

All that said, if you want to get a ping when results from a new dump are posted, you can add your name to the list below. If you are only interested in a particular section, include a note to that effect.
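The processing times quoted in this section translate into a rough posting window with simple date arithmetic. A Python sketch: the day ranges are the approximate figures given above, not guarantees (some runs are skipped entirely), and the function name is hypothetical.

```python
from datetime import date, timedelta

# Approximate processing times from this section, keyed by dump day of month.
PROCESSING_DAYS = {1: (9, 13), 20: (4, 6)}


def expected_posting_window(dump_date):
    """Return (earliest, latest) likely posting dates for a 1st- or 20th-of-month dump."""
    low, high = PROCESSING_DAYS[dump_date.day]
    return dump_date + timedelta(days=low), dump_date + timedelta(days=high)


print(expected_posting_window(date(2024, 10, 1)))
```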

moss code and data sources

moss is written in Python and is available on GitHub at https://github.com/cdbeland/moss

Data is obtained from XML database backup dumps.
