
Convert more wikis to numerical sorting
Closed, Resolved | Public | 3 Estimated Story Points

Description

Collecting requests from wikis asking for numerical sorting. We'll save up a batch, and then do them at the same time.

Before we start these -- let Danny & Johan know, so they can notify the wikis.

Bengali WP
Consensus discussion: https://bn.wikipedia.org/wiki/%E0%A6%89%E0%A6%87%E0%A6%95%E0%A6%BF%E0%A6%AA%E0%A6%BF%E0%A6%A1%E0%A6%BF%E0%A6%AF%E0%A6%BC%E0%A6%BE:%E0%A6%86%E0%A6%B2%E0%A7%8B%E0%A6%9A%E0%A6%A8%E0%A6%BE%E0%A6%B8%E0%A6%AD%E0%A6%BE#.E0.A6.AC.E0.A6.BF.E0.A6.B7.E0.A6.AF.E0.A6.BC.E0.A6.B6.E0.A7.8D.E0.A6.B0.E0.A7.87.E0.A6.A3.E0.A7.80.E0.A6.A4.E0.A7.87_.E0.A6.B8.E0.A6.82.E0.A6.96.E0.A7.8D.E0.A6.AF.E0.A6.BE.E0.A6.B0_.E0.A6.95.E0.A7.8D.E0.A6.B0.E0.A6.AE_.E0.A6.A0.E0.A6.BF.E0.A6.95_.E0.A6.B0.E0.A6.BE.E0.A6.96.E0.A6.BE.E0.A6.B0_.E0.A6.AC.E0.A7.8D.E0.A6.AF.E0.A6.AC.E0.A6.B8.E0.A7.8D.E0.A6.A5.E0.A6.BE
Talk page request: https://meta.wikimedia.org/wiki/User_talk:DannyH_(WMF)#Enable_numerical_sorting_on_bn.40wikipedia_and_bn.40wikisource

Bengali Wikisource
Consensus discussion: https://bn.wikisource.org/wiki/%E0%A6%89%E0%A6%87%E0%A6%95%E0%A6%BF%E0%A6%B8%E0%A6%82%E0%A6%95%E0%A6%B2%E0%A6%A8:%E0%A6%B8%E0%A7%8D%E0%A6%95%E0%A7%8D%E0%A6%B0%E0%A6%BF%E0%A6%AA%E0%A7%8D%E0%A6%9F%E0%A6%B0%E0%A6%BF%E0%A6%AF%E0%A6%BC%E0%A6%BE%E0%A6%AE#.E0.A6.AC.E0.A6.BF.E0.A6.B7.E0.A6.AF.E0.A6.BC.E0.A6.B6.E0.A7.8D.E0.A6.B0.E0.A7.87.E0.A6.A3.E0.A7.80.E0.A6.A4.E0.A7.87_.E0.A6.B8.E0.A6.82.E0.A6.96.E0.A7.8D.E0.A6.AF.E0.A6.BE.E0.A6.B0_.E0.A6.95.E0.A7.8D.E0.A6.B0.E0.A6.AE_.E0.A6.A0.E0.A6.BF.E0.A6.95_.E0.A6.B0.E0.A6.BE.E0.A6.96.E0.A6.BE.E0.A6.B0_.E0.A6.AC.E0.A7.8D.E0.A6.AF.E0.A6.AC.E0.A6.B8.E0.A7.8D.E0.A6.A5.E0.A6.BE
Talk page request: https://meta.wikimedia.org/wiki/User_talk:DannyH_(WMF)#Enable_numerical_sorting_on_bn.40wikipedia_and_bn.40wikisource

Czech WP
Consensus discussion: https://cs.wikipedia.org/wiki/Wikipedie:Pod_l%C3%ADpou#.C5.98azen.C3.AD_.C4.8Dl.C3.A1nk.C5.AF_v_kategori.C3.ADch_podle_.C4.8D.C3.ADsel
Talk page request: https://meta.wikimedia.org/wiki/User_talk:DannyH_(WMF)#Natural_number_sorting_on_cswiki

French WP
Consensus discussion: https://fr.wikipedia.org/wiki/Discussion_Projet:Cat%C3%A9gories#Mini-sondage_:_tri_automatique_des_nombres_dans_les_cat.C3.A9gories

Hebrew WP
Consensus discussion: https://he.wikipedia.org/wiki/%D7%95%D7%99%D7%A7%D7%99%D7%A4%D7%93%D7%99%D7%94:%D7%9E%D7%96%D7%A0%D7%95%D7%9F#.D7.A9.D7.99.D7.A0.D7.95.D7.99_.D7.A9.D7.99.D7.98.D7.AA_.D7.94.D7.9E.D7.99.D7.95.D7.9F_.D7.A7.D7.98.D7.92.D7.95.D7.A8.D7.99.D7.95.D7.AA_.D7.A2.D7.9D_.D7.9E.D7.A1.D7.A4.D7.A8.D7.99.D7.9D_.D7.91.D7.A9.D7.9E.D7.95.D7.AA
Request on talk page: https://meta.wikimedia.org/wiki/User_talk:DannyH_(WMF)#Enable_numerical_sorting_on_hewiki

Hungarian WP
Consensus discussion: https://hu.wikipedia.org/wiki/Wikip%C3%A9dia:Kocsmafal_(javaslatok)#Kateg.C3.B3ri.C3.A1k_numerikus_rendez.C3.A9se
Request on talk page: https://meta.wikimedia.org/wiki/User_talk:DannyH_(WMF)#Enable_numerical_sorting_in_hu.40wikipedia

Italian WP
Consensus discussion: https://it.wikipedia.org/w/index.php?title=Wikipedia:Bar/Discussioni/Ordine_alfabetico_di_default&diff=0&oldid=83416864
Request on talk page: https://meta.wikimedia.org/wiki/User_talk:DannyH_(WMF)#Numerical_sorting

Norwegian (Bokmål) WP
Consensus discussion: https://no.wikipedia.org/wiki/Wikipedia:Tinget#Numerisk_sortering_i_kategorier
Request on talk page: https://meta.wikimedia.org/wiki/User_talk:DannyH_(WMF)#Enable_numeric_sorting_at_no.wikipedia

Polish WP
Consensus discussion: https://pl.wikipedia.org/wiki/Wikipedia:Kawiarenka/Og%C3%B3lne#Zmiana_konfiguracji_.E2.80.93_w.C5.82.C4.85czenie_poprawnego_sortowania_numerycznego_artyku.C5.82.C3.B3w_na_stronach_kategorii
Request on talk page: https://meta.wikimedia.org/wiki/User_talk:DannyH_(WMF)#Numerical_sorting_on_pl.wp

Russian WP
Consensus discussion: https://ru.wikipedia.org/wiki/%D0%92%D0%B8%D0%BA%D0%B8%D0%BF%D0%B5%D0%B4%D0%B8%D1%8F:%D0%A4%D0%BE%D1%80%D1%83%D0%BC/%D0%9F%D1%80%D0%B5%D0%B4%D0%BB%D0%BE%D0%B6%D0%B5%D0%BD%D0%B8%D1%8F#.D0.A1.D0.BE.D1.80.D1.82.D0.B8.D1.80.D0.BE.D0.B2.D0.BA.D0.B0_.D1.87.D0.B8.D1.81.D0.B5.D0.BB_.D0.B2_.D0.BA.D0.B0.D1.82.D0.B5.D0.B3.D0.BE.D1.80.D0.B8.D1.8F.D1.85
Request on talk page: https://meta.wikimedia.org/wiki/User_talk:DannyH_(WMF)#Enable_numerical_sorting_in_ru.40wikipedia

Vietnamese WP
Consensus discussion: https://vi.wikipedia.org/wiki/Wikipedia:Th%E1%BA%A3o_lu%E1%BA%ADn/S%E1%BA%AFp_x%E1%BA%BFp_c%C3%A1c_th%E1%BB%83_lo%E1%BA%A1i_theo_gi%C3%A1_tr%E1%BB%8B_s%E1%BB%91_%C4%91%E1%BA%BFm_thay_v%C3%AC_theo_t%E1%BB%ABng_ch%E1%BB%AF_s%E1%BB%91_%C4%91%C6%A1n_thu%E1%BA%A7n
Request on talk page: https://meta.wikimedia.org/wiki/User_talk:DannyH_(WMF)#Enabling_numerical_sorting_on_vi.wikipedia

Event Timeline


You don't anticipate anything possibly going wrong in the transition, right?

While the script is running, the sorting on some category pages gets weird -- newly-updated pages get the new sorting while older pages are still in the old sorting, and using "next page" doesn't work as you'd expect.

But those problems get corrected as the script reaches that category, and when the script is done, everything works the way that it's supposed to.

The script took about four hours to run when we converted Swedish Wikipedia. I'd expect it to be shorter than that for Italian, because there are fewer pages.

English WP took six days, but that's the biggest by far, and it turns out the process is exponential rather than arithmetic. :)

Could a proper page be written describing how this subsystem works and what the expected impact will be? It seems like the implemented system is somewhat different from the announced system.

DannyH set the point value for this task to 3.
DannyH moved this task from Needs Discussion to Up Next (June 3-21) on the Community-Tech board.

Could a proper page be written describing how this subsystem works and what the expected impact will be?

@jeblad: Documentation can be found at https://www.mediawiki.org/wiki/Manual:$wgCategoryCollation. In most of these cases (except Hebrew and Norwegian), the wiki is switching from uca-<langcode> to uca-<langcode>-u-kn, in which case the only difference is the addition of the numeric sorting feature. For Hebrew and Norwegian, they are starting from uppercase collation. Norwegian can either upgrade to uca-no-u-kn (which is Unicode Collation Algorithm tailored for Norwegian + numeric sorting) or numeric (which is identical to what they have now, but with numeric sorting). Hebrew isn't supported by our IcuCollation class, so they can only upgrade to numeric (unless someone modifies IcuCollation to support Hebrew in the very near future).
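The naming rule described above can be sketched in a few lines. This is only an illustration in Python: the real setting is a PHP configuration value, and `numeric_variant` is a hypothetical helper name, not anything in MediaWiki. It also does not model the Norwegian option of moving from uppercase straight to uca-no-u-kn.

```python
def numeric_variant(current: str) -> str:
    """Return the numeric-sorting counterpart of a collation name.

    Hypothetical helper that only encodes the rule from the comment above:
    uca-<langcode> becomes uca-<langcode>-u-kn, and uppercase becomes numeric.
    """
    if current == "numeric" or current.endswith("-u-kn"):
        return current  # already a numeric collation
    if current.startswith("uca-"):
        return current + "-u-kn"
    if current == "uppercase":
        return "numeric"
    raise ValueError(f"no known numeric variant for {current!r}")


for name in ["uca-fr", "uca-pl", "uppercase"]:
    print(name, "->", numeric_variant(name))
```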

How will these three blocks be sorted?

  • foo 123 456 bar
  • foo 456 123 bar
  • foo 789 bar
  • foo 789

  • foo 123.456 bar
  • foo 456.123 bar
  • foo 789 bar
  • foo 789

  • foo 123,456 bar
  • foo 456,123 bar
  • foo 789 bar
  • foo 789

@jeblad: Numeric sorting only works for unbroken sequences of digits. Digits separated by commas, periods, or spaces are treated as separate numbers (and thus may still require DEFAULTSORT keys).
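A minimal Python sketch of that rule, for illustration only. It is not the MediaWiki implementation; `numeric_sort_key` is a made-up name, and uppercasing the text parts is an assumption that loosely mimics an uppercase-based collation. Each unbroken run of digits is compared as an integer, and anything between runs, including commas, periods, and spaces, is compared as text.

```python
import re


def numeric_sort_key(title: str) -> list:
    """Split a title into text and digit runs; digit runs compare as integers."""
    # With one capturing group, re.split alternates: text, digits, text, ...
    parts = re.split(r"(\d+)", title)
    key = []
    for i, part in enumerate(parts):
        if i % 2 == 1:
            key.append((0, int(part), ""))        # an unbroken digit run
        else:
            key.append((1, 0, part.upper()))      # text between digit runs
    return key


titles = ["foo 789 bar", "foo 456,123 bar", "foo 789", "foo 123,456 bar"]
print(sorted(titles, key=numeric_sort_key))

# "1,000" is read as the numbers 1 and 0, so in this sketch it sorts before "2".
print(sorted(["foo 2 bar", "foo 1,000 bar"], key=numeric_sort_key))
```

The second call shows why titles containing formatted numbers may still need DEFAULTSORT keys.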

Change 316486 had a related patch set uploaded (by Kaldari):
Switching 10 wikis to numeric category collation per T146675

https://gerrit.wikimedia.org/r/316486

Change 316486 merged by jenkins-bot:
Switching 10 wikis to numeric category collation per T146675

https://gerrit.wikimedia.org/r/316486

Mentioned in SAL (#wikimedia-operations) [2016-10-19T23:20:11Z] <dereckson@mira> Synchronized wmf-config/InitialiseSettings.php: Switching 10 more wikis to numeric category collation (T146675) (duration: 00m 59s)

Well, there is a problem. 99<019<101

Well, there is a problem. 99<019<101

Where are you seeing this?

Thanks @IKhitron! I've filed a bug for that: T148774. I think I know exactly how to fix this, so it shouldn't take long.
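For readers wondering how an order like 99 < 019 < 101 can arise at all, here is one possible mechanism, sketched in Python. It is an assumption made for illustration and is not confirmed as the cause of T148774 or as how MediaWiki builds sort keys. The sketch encodes each digit run as a string with a length prefix, so that plain string comparison sorts numbers correctly, and shows what happens when leading zeros are not stripped first. The function name and the `strip_zeros` flag are hypothetical.

```python
import re


def string_sort_key(title: str, strip_zeros: bool) -> str:
    """Encode digit runs as <2-digit length><digits> so string order is numeric."""
    def encode(match: re.Match) -> str:
        digits = match.group(0)
        if strip_zeros:
            digits = digits.lstrip("0") or "0"   # keep a single zero for "000"
        return f"{len(digits):02d}{digits}"
    return re.sub(r"\d+", encode, title.upper())


titles = ["101", "019", "99"]
naive = sorted(titles, key=lambda t: string_sort_key(t, strip_zeros=False))
fixed = sorted(titles, key=lambda t: string_sort_key(t, strip_zeros=True))
print(naive)  # "019" counts as a three-digit number here
print(fixed)  # "019" is treated as 19
```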

Btw, how do you sort

  • abc 20 5
  • abc 5 20
  • abc 5 80

? Can I assume that it will be sorted by first number and then by second?

@IKhitron: Yes. I'm not 100% sure there are no bugs with complicated sequences of numbers, but I just tested it locally and got the following sort order:

  • Abc 5 3
  • Abc 5 20
  • Abc 5 80
  • Abc 20 5
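Sorting "by the first number, then by the second" is the same as comparing tuples of integers. A small Python check, with a hypothetical helper `numbers_in`, reproduces the order listed above. It looks only at the digit runs, which is enough here because every title shares the same text prefix.

```python
import re


def numbers_in(title: str) -> tuple:
    """Return every unbroken digit run in the title as a tuple of integers."""
    return tuple(int(run) for run in re.findall(r"\d+", title))


titles = ["Abc 20 5", "Abc 5 20", "Abc 5 80", "Abc 5 3"]
print(sorted(titles, key=numbers_in))
# ['Abc 5 3', 'Abc 5 20', 'Abc 5 80', 'Abc 20 5']
```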

@IKhitron: The patch to fix leading zeros has been merged. It should get deployed to he.wiki next Thursday. Then we can rebuild the sortkeys on Thursday evening or Friday.

Thanks, and it will be deployed next Wednesday, @kaldari.

All the languages besides Norwegian are finished.

Change 317652 had a related patch set uploaded (by Kaldari):
Switch Norwegian Wikipedia to uca-no-u-kn category collation

https://gerrit.wikimedia.org/r/317652

Change 317652 merged by jenkins-bot:
Switch Norwegian Wikipedia to uca-no-u-kn category collation

https://gerrit.wikimedia.org/r/317652

Norwegian is finished.

For Hebrew and Norwegian, they are starting from uppercase collation. Norwegian can either upgrade to uca-no-u-kn (which is Unicode Collation Algorithm tailored for Norwegian + numeric sorting) or numeric (which is identical to what they have now, but with numeric sorting). Hebrew isn't supported by our IcuCollation class, so they can only upgrade to numeric (unless someone modifies IcuCollation to support Hebrew in the very near future)

I added he to IcuCollation in https://gerrit.wikimedia.org/r/318674

@kaldari, or maybe @DannyH? When can we expect the rerun? Thank you.

@IKhitron: Rerunning now. Should be done in a few hours. Sorry for the delay.