add thai encoding aliases to encodings.aliases #61456

Closed
fomclyahoocom mannequin opened this issue Feb 20, 2013 · 11 comments
Labels
3.9 (only security fixes), topic-unicode, type-feature (a feature request or enhancement)

Comments

@fomclyahoocom
Mannequin

fomclyahoocom mannequin commented Feb 20, 2013

BPO 17254
Nosy @malemburg, @ezio-melotti, @btwood
PRs
  • gh-61456: Add Thai language codec aliases #15079
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2013-02-20.11:48:44.247>
    labels = ['type-feature', '3.9', 'expert-unicode']
    title = 'add thai encoding aliases to encodings.aliases'
    updated_at = <Date 2020-09-10.17:33:40.641>
    user = 'https://bugs.python.org/fomclyahoocom'

    bugs.python.org fields:

    activity = <Date 2020-09-10.17:33:40.641>
    actor = 'Benjamin Wood'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Unicode']
    creation = <Date 2013-02-20.11:48:44.247>
    creator = 'fomcl@yahoo.com'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 17254
    keywords = ['patch']
    message_count = 10.0
    messages = ['182489', '182493', '182513', '182522', '182528', '182529', '348902', '354287', '368192', '376689']
    nosy_count = 5.0
    nosy_names = ['lemburg', 'ezio.melotti', 'era', 'fomcl@yahoo.com', 'Benjamin Wood']
    pr_nums = ['15079']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue17254'
    versions = ['Python 3.9']

    @fomclyahoocom
    Mannequin Author

    fomclyahoocom mannequin commented Feb 20, 2013

    This is almost identical to: http://bugs.python.org/issue854511
    However, tis602, which is mentioned in the original bug report, is not an alias to cp874. Therefore, I propose the following:

    import encodings
    
    aliases = encodings.aliases.aliases
    more_aliases = {'ibm874'     : 'cp874',
                    'iso_8859_11': 'cp874',
                    'iso8859_11' : 'cp874',
                    'windows_874': 'cp874',
                   }
    aliases.update(more_aliases)

    @malemburg
    Member

    On 20.02.2013 12:48, albertjan wrote:

    > New submission from albertjan:
    >
    > This is almost identical to: http://bugs.python.org/issue854511
    > However, tis602, which is mentioned in the original bug report, is not an alias to cp874. Therefore, I propose the following:
    >
    > import encodings
    >
    > aliases = encodings.aliases.aliases
    > more_aliases = {'ibm874'     : 'cp874',
    >                 'iso_8859_11': 'cp874',
    >                 'iso8859_11' : 'cp874',
    >                 'windows_874': 'cp874',
    >                }
    > aliases.update(more_aliases)

    Please provide evidence that those encodings are indeed the same.

    Thanks,

    Marc-Andre Lemburg
    eGenix.com

    @fomclyahoocom
    Mannequin Author

    fomclyahoocom mannequin commented Feb 20, 2013

    Hi,
     
    I found this report that includes your name:
    http://mail.python.org/pipermail/python-bugs-list/2004-August/024564.html
     
    Other relevant websites:
    http://en.wikipedia.org/wiki/ISO/IEC_8859-11  # is wikipedia 'proof'?
    http://code.ohloh.net/file?fid=dhX2dJrRWGISzQAijawMU6qzWJQ&cid=YD58Y-grdtE&s=&browser=Default
    http://msdn.microsoft.com/en-us/goglobal/cc305142.aspx
    http://www.iso.org/iso/catalogue_detail?csnumber=28263  # non-free

    Regards,
    Albert-Jan


    @malemburg
    Member

    On 20.02.2013 15:40, albertjan wrote:

    albertjan added the comment:

    Hi,

    I found this report that includes your name:
    http://mail.python.org/pipermail/python-bugs-list/2004-August/024564.html

    Other relevant websites:
    http://en.wikipedia.org/wiki/ISO/IEC_8859-11 # is wikipedia 'proof'?
    http://code.ohloh.net/file?fid=dhX2dJrRWGISzQAijawMU6qzWJQ&cid=YD58Y-grdtE&s=&browser=Default
    http://msdn.microsoft.com/en-us/goglobal/cc305142.aspx
    http://www.iso.org/iso/catalogue_detail?csnumber=28263 # non-free

    Thanks.

    Something is wrong with your request, though:

    • we already have an iso8859_11 codec, so aliasing it to some
      other name is not possible

    • we already have a cp874 codec, so aliasing it to some
      other name is not possible

    • cp874 differs from iso8859_11 in a few places, so aliasing
      iso8859_11 to cp874 is not possible (see http://en.wikipedia.org/wiki/ISO/IEC_8859-11#Code_page_874)

    What we could do is add aliases 'x-ibm874' and 'windows_874' to
    'cp874'. I'm not sure whether 'ibm874' and 'x-ibm874' are the same
    thing. The references only mention 'x-ibm874'.
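
The difference between the two existing codecs can be checked directly in an interpreter. The following is a minimal sketch, assuming a CPython build that ships both the `cp874` and `iso8859_11` codecs; the sample bytes and variable names are chosen here for illustration and do not come from the thread.

```python
# Compare how the two existing Thai codecs decode the same bytes.
euro_byte = b"\x80"   # lies in the 0x80-0x9F range, where the codecs disagree
thai_byte = b"\xa1"   # THAI CHARACTER KO KAI in both code pages

cp874_euro = euro_byte.decode("cp874")       # EURO SIGN (U+20AC)
iso_euro = euro_byte.decode("iso8859_11")    # C1 control character U+0080

# The Thai letters themselves decode identically in both codecs.
same_thai = thai_byte.decode("cp874") == thai_byte.decode("iso8859_11")

print(repr(cp874_euro), repr(iso_euro), same_thai)
```

Because the same byte decodes to different characters, making one name an alias of the other would silently change the meaning of existing data.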

    @fomclyahoocom
    Mannequin Author

    fomclyahoocom mannequin commented Feb 20, 2013

    > Thanks.
    >
    > Something is wrong with your request, though:
    >
    > • we already have an iso8859_11 codec, so aliasing it to some
    >   other name is not possible
    >
    > • we already have a cp874 codec, so aliasing it to some
    >   other name is not possible
    >
    > • cp874 differs from iso8859_11 in a few places, so aliasing
    >   iso8859_11 to cp874 is not possible (see
    >   http://en.wikipedia.org/wiki/ISO/IEC_8859-11#Code_page_874)

    Sorry about that.

    > What we could do is add aliases 'x-ibm874' and 'windows_874' to
    > 'cp874'. I'm not sure whether 'ibm874' and 'x-ibm874' are the same
    > thing. The references only mention 'x-ibm874'.

    The following document says the following are aliases: x-IBM874, cp874, ibm874, ibm-874, 874
    http://www.java2s.com/Tutorial/Java/0180__File/DisplaysAvailableCharsetsandaliases.htm
    http://www.fileformat.info/info/charset/x-IBM874/index.htm
    In addition, it seems that 'windows_874' is used (that's the one that raised this issue for me), but I've also seen references to windows-874, windows874, and WIN874:
    http://doxygen.postgresql.org/encnames_8c_source.html
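
To see which of the names mentioned above a given interpreter already accepts, one can ask `codecs.lookup` for each of them. This is a small sketch; the `CANDIDATES` list and the `resolve` helper are hypothetical names introduced here, and the output depends on the Python version in use.

```python
import codecs

# Candidate names taken from this thread; whether each one resolves
# depends on the Python version.
CANDIDATES = ["cp874", "874", "ms874", "windows_874", "windows-874",
              "ibm874", "x-ibm874", "iso8859_11"]


def resolve(name):
    """Return the canonical codec name for `name`, or None if it is unknown."""
    try:
        return codecs.lookup(name).name
    except LookupError:
        return None


for name in CANDIDATES:
    print(f"{name:12} -> {resolve(name)}")
```

Note that a name resolving to `cp874` only shows that an alias exists; it says nothing about whether the vendor's code page of that name really has the same byte mapping.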

    @ezio-melotti ezio-melotti added topic-unicode type-feature A feature request or enhancement labels Feb 22, 2013
    @btwood
    Mannequin

    btwood mannequin commented Aug 2, 2019

    From what I can tell

    cp874 != ibm_874 != iso_8859_11

    What I can say is that the current cp874 is the implementation of the windows_874 code page. The codec file itself references the Microsoft code page, and also contains the appropriate characters (like EURO SIGN).

    https://github.com/python/cpython/blob/master/Lib/encodings/cp874.py
    """ Python Character Mapping Codec cp874 generated from 'MAPPINGS/VENDORS/MICSFT/WINDOWS/CP874.TXT' with gencodec.py.

    https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP874.TXT

    It seems appropriate to at least alias windows_874 with cp874. They are provably the same.

    If someone needs the IBM standard, they may have to write a different code page.
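
A claim like "they are provably the same" can be checked mechanically by comparing the codec against lines in the mapping-file format used by CP874.TXT. The sketch below is an illustration under assumptions: `mismatches` is a hypothetical helper, and `SAMPLE` is a three-line hand-copied excerpt rather than the full file, which would have to be downloaded separately.

```python
# Hand-copied excerpt in the "0xBYTE<TAB>0xCODEPOINT<TAB>#NAME" format
# (assumption: a real check would read the complete CP874.TXT instead).
SAMPLE = """\
0x80\t0x20AC\t#EURO SIGN
0xA0\t0x00A0\t#NO-BREAK SPACE
0xA1\t0x0E01\t#THAI CHARACTER KO KAI
"""


def mismatches(mapping_text, codec="cp874"):
    """Return (byte, expected, actual) tuples where the codec disagrees."""
    bad = []
    for line in mapping_text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue  # blank line, comment, or byte marked undefined
        byte, expected = int(fields[0], 16), chr(int(fields[1], 16))
        try:
            actual = bytes([byte]).decode(codec)
        except UnicodeDecodeError:
            actual = None  # the codec leaves this byte undefined
        if actual != expected:
            bad.append((hex(byte), expected, actual))
    return bad


print(mismatches(SAMPLE))                # cp874 agrees with the excerpt
print(mismatches(SAMPLE, "iso8859_11"))  # differs at byte 0x80
```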

    @btwood
    Mannequin

    btwood mannequin commented Oct 9, 2019

    I've created the codepage alias 874.

    This is only pending a merge into the mainline.

    Thanks.

    @csabella csabella added the 3.9 only security fixes label Jan 10, 2020
    @btwood
    Mannequin

    btwood mannequin commented May 5, 2020

    This is an easy alias to a valid codepage. I supplied proof that they are the same.

    I don't understand why this has been allowed to languish for 9 months.

    Did I miss something? Is there more work I need to do?

    Thanks

    @btwood
    Mannequin

    btwood mannequin commented Sep 10, 2020

    Bumping this again.

    I'd like to understand why this change cannot be, or has not been, approved.

    I added the technical info here to the github PR.

    Is there a shortage of reviewers? What can I do to help speed up the process?

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @jasonm23

    Please add windows_874 (aka windows-874) as an alias of cp874.

    To parse emails, I need to use:

    import encodings
    encodings.aliases.aliases['windows_874'] = 'cp874'
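
An alternative to editing the `encodings.aliases.aliases` dictionary is to register a codec search function through `codecs.register`. This is only a sketch of that approach: `thai_alias_search` and `_THAI_ALIASES` are hypothetical names, and the alias set is an assumption to be adjusted to whatever names actually appear in the incoming mail.

```python
import codecs

# Hypothetical alias set for illustration; extend as needed.
_THAI_ALIASES = {"windows_874", "windows874"}


def thai_alias_search(name):
    """Codec search function mapping extra names to the built-in cp874 codec."""
    # Normalize explicitly, since older Python versions pass hyphens through.
    normalized = name.lower().replace("-", "_").replace(" ", "_")
    if normalized in _THAI_ALIASES:
        return codecs.lookup("cp874")
    return None  # let other search functions handle everything else


codecs.register(thai_alias_search)

text = b"\xa1".decode("windows-874")
print(text)
```

On interpreters that already ship the alias, the built-in search function answers first and this one is never consulted for those names, so registering it should be harmless.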
    

    ambv added a commit that referenced this issue Apr 7, 2025
    Adding aliases for Thai language support. The current code page is an implementation of the windows code page.
    This will alias '874', 'ms874', and 'windows_874' to cp874, adding Thai language support for those users.
    
    Co-authored-by: Łukasz Langa <lukasz@langa.pl>
    @ambv ambv closed this as completed Apr 7, 2025
    seehwan pushed a commit to seehwan/cpython that referenced this issue Apr 16, 2025