Stress marks should be stripped as part of the title normalization
Open, Lowest, Public

Description

Wiktionary (https://en.wiktionary.org/wiki/) can't find the Russian word with the stress mark "восто́рг" but finds an article for the spelling without the stress mark: "восторг".

These two are exactly the same word. All stress marks should be stripped, at least in Wiktionary.

This is a serious problem because people will accidentally create separate pages for spelling variants that readers perceive as the same word.

Event Timeline

Yurivict raised the priority of this task from to Needs Triage.
Yurivict updated the task description.
Yurivict subscribed.

I think this deserves high priority

Thanks for taking the time to report this!

Could you provide a list of steps to reproduce so someone else could see exactly the same problem, for example what you mean by "can't find the word" exactly? Also see https://www.mediawiki.org/wiki/How_to_report_a_bug for more information.
Thanks a lot!

Steps to reproduce:

  1. Append the word восторг to https://en.wiktionary.org/wiki/, open it in the browser, and see that this is an existing Wiktionary page
  2. Append the word восто́рг to https://en.wiktionary.org/wiki/, open it in the browser, and observe how Wiktionary offers to create the page

This is wrong: the expected behavior is that восто́рг should redirect to восторг. No page containing the stress mark should ever be created, yet Wiktionary offers to create one (!!!)

People can accidentally create duplicate pages for the same word, thinking that the page doesn't exist.
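
As a rough illustration of the requested behavior (not MediaWiki's actual title-handling code), the sketch below shows that the two titles differ only by the combining acute accent U+0301, and that removing it yields the title of the existing page. The helper name `strip_stress` is hypothetical.

```python
import unicodedata

COMBINING_ACUTE = "\u0301"  # combining acute accent, used as the stress mark


def strip_stress(title: str) -> str:
    """Hypothetical helper: drop combining acute accents from a title."""
    # Decompose first so a precomposed accented letter also exposes U+0301.
    decomposed = unicodedata.normalize("NFD", title)
    stripped = decomposed.replace(COMBINING_ACUTE, "")
    # Recompose so letters such as й keep their usual single-code-point form.
    return unicodedata.normalize("NFC", stripped)


stressed = "восто\u0301рг"  # the title with the stress mark
plain = "восторг"           # the title of the existing page

print(stressed == plain)                 # the raw titles are different strings
print(strip_stress(stressed) == plain)   # they match once the mark is removed
```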

Krenair renamed this task from Stress marks should be stripped as part of the wiktionary (or wikipedia) title normalization to Stress marks should be stripped as part of the title normalization. Aug 17 2015, 2:41 AM
Krenair set Security to None.
Krenair added a project: MediaWiki-General.
Krenair subscribed.
Aklapper triaged this task as Lowest priority. Aug 17 2015, 10:44 AM

the expected behavior is that восто́рг should redirect to восторг.

I would not call that expected behavior at all, as such stress marks are not part of how words are spelled. Plus many languages use (similar looking) accents as part of how words are spelled, so such a request would have to be a per-language/script setting anyway.
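
To make the per-language point concrete, here is a minimal sketch, assuming a hypothetical lookup table `STRIPPABLE_MARKS` and a hypothetical function `normalize_title`; neither exists in MediaWiki. The same code point (U+0301) is only a stress mark in Russian but is part of the spelling in Spanish, so any stripping rule needs to know the language.

```python
import unicodedata

# Hypothetical table: combining marks that are only pronunciation aids.
STRIPPABLE_MARKS = {
    "ru": {"\u0301"},  # combining acute marks stress, not spelling
    "es": set(),       # acute accents are part of Spanish spelling
}


def normalize_title(title: str, lang: str) -> str:
    """Strip only the marks that the given language treats as optional."""
    marks = STRIPPABLE_MARKS.get(lang, set())
    decomposed = unicodedata.normalize("NFD", title)
    kept = "".join(ch for ch in decomposed if ch not in marks)
    return unicodedata.normalize("NFC", kept)


print(normalize_title("восто\u0301рг", "ru"))  # Russian rule removes the stress mark
print(normalize_title("est\u00e1", "es"))      # Spanish rule leaves the accent alone
print(normalize_title("est\u00e1", "ru"))      # wrong language: the spelling changes
```

Applying the Russian rule to the Spanish word turns "está" into "esta", which is a different word, so a blanket rule would not be safe.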

I'm proposing to decline this request.

In Russian, a stress mark is added to show where the stress falls, but the canonical form has no stress mark.

It should be possible to write links with stress marks and expect the server to generate a redirect. This is the best behavior from the user's perspective.

If other languages differ, there should be different rules for them. That's why there are language sections in Wiktionary entries.

MediaWiki cannot apply a language-specific rule without knowing which language is expected. A page name in the URL alone does not carry that information, so you should not expect the server to generate a redirect from it.

The best way to tackle this issue is to use links in the form [[восторг|восто́рг]] (no ambiguity here).
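
An editor who has stressed words to link could produce such piped links mechanically. The following is only a sketch under the assumption that the stress mark is the combining acute accent U+0301; the function name `piped_link` is hypothetical.

```python
import unicodedata


def piped_link(word: str) -> str:
    """Build wikitext that links to the unstressed title but shows the stressed form."""
    decomposed = unicodedata.normalize("NFD", word)
    target = unicodedata.normalize("NFC", decomposed.replace("\u0301", ""))
    if target == word:
        # No stress mark present, so a plain link is enough.
        return f"[[{word}]]"
    return f"[[{target}|{word}]]"


print(piped_link("восто\u0301рг"))
print(piped_link("восторг"))
```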

In any case, this should be part of the policies of each Wiktionary project. I don't think it is directly relevant to MediaWiki.