
Encode to EBCDIC doesn't take into account conversion table irregularities #74771


Open
VladimirFilippov mannequin opened this issue Jun 7, 2017 · 3 comments
Labels
topic-unicode type-bug An unexpected behavior, bug, or error

Comments

@VladimirFilippov
Mannequin

VladimirFilippov mannequin commented Jun 7, 2017

BPO 30586
Nosy @vstinner, @ezio-melotti, @serhiy-storchaka

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2017-06-07.11:41:40.052>
labels = ['type-bug', 'invalid', 'expert-unicode']
title = "Encode to EBCDIC doesn't take into account conversion table irregularities"
updated_at = <Date 2017-06-07.15:45:17.575>
user = 'https://bugs.python.org/VladimirFilippov'

bugs.python.org fields:

activity = <Date 2017-06-07.15:45:17.575>
actor = 'Vladimir Filippov'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Unicode']
creation = <Date 2017-06-07.11:41:40.052>
creator = 'Vladimir Filippov'
dependencies = []
files = []
hgrepos = []
issue_num = 30586
keywords = []
message_count = 3.0
messages = ['295329', '295336', '295352']
nosy_count = 4.0
nosy_names = ['vstinner', 'ezio.melotti', 'serhiy.storchaka', 'Vladimir Filippov']
pr_nums = []
priority = 'normal'
resolution = 'not a bug'
stage = None
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue30586'
versions = ['Python 3.6']

@VladimirFilippov
Mannequin Author

VladimirFilippov mannequin commented Jun 7, 2017

These four characters are encoded incorrectly to EBCDIC (codec cp500): "![]|". The correct conversion table for these characters is described in https://www.ibm.com/support/knowledgecenter/SSZJPZ_11.3.0/com.ibm.swg.im.iis.ds.parjob.adref.doc/topics/r_deeadvrf_Conversion_Table_Irregularities.html

This code:
--------------------
text = '![]|'
print("ASCII: " + text.encode('ascii').hex())
res = text.encode('cp500')
print("EBCDIC: " + res.hex())
--------------------
produces this output on Python 3.6.1:
--------------------
ASCII: 215b5d7c
EBCDIC: 4f4a5abb
--------------------

Expected encoding (from IBM's table):
! - 5A
[ - AD
] - BD
| - 4F
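A small sketch comparing Python's cp500 output with the expected bytes, character by character. The `IBM_EXPECTED` dict is simply the four values listed above typed in by hand, and `compare_with_cp500` is a hypothetical helper for illustration, not part of any library.

```python
# Expected bytes for each character, copied by hand from the IBM values listed above.
IBM_EXPECTED = {"!": 0x5A, "[": 0xAD, "]": 0xBD, "|": 0x4F}


def compare_with_cp500(expected):
    """Return (char, cp500_byte, expected_byte) for every character that differs."""
    mismatches = []
    for ch, want in expected.items():
        got = ch.encode("cp500")[0]  # each of these characters encodes to a single byte
        if got != want:
            mismatches.append((ch, got, want))
    return mismatches


for ch, got, want in compare_with_cp500(IBM_EXPECTED):
    print(f"{ch!r}: cp500 gives 0x{got:02X}, IBM table expects 0x{want:02X}")
```

All four characters differ, which matches the `4f4a5abb` output above.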

Workaround: translate the encoded bytes after encoding:
res.translate(bytes.maketrans(b'\x4F\x4A\x5A\xBB', b'\x5A\xAD\xBD\x4F'))
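The `bytes.maketrans` workaround can be wrapped into a pair of helpers, with an inverse table so that the bytes can be turned back into cp500 bytes before decoding. This is a minimal sketch; `encode_fixed` and `decode_fixed` are hypothetical names.

```python
# Byte-level fix-up from the workaround: cp500 byte -> byte from IBM's table.
FIX_TABLE = bytes.maketrans(b"\x4F\x4A\x5A\xBB", b"\x5A\xAD\xBD\x4F")
# Inverse mapping: IBM-style bytes back to cp500 bytes before decoding.
UNFIX_TABLE = bytes.maketrans(b"\x5A\xAD\xBD\x4F", b"\x4F\x4A\x5A\xBB")


def encode_fixed(text):
    """Encode with cp500, then remap the four affected bytes."""
    return text.encode("cp500").translate(FIX_TABLE)


def decode_fixed(data):
    """Undo the remapping, then decode with cp500."""
    return data.translate(UNFIX_TABLE).decode("cp500")


encoded = encode_fixed("![]|")
print(encoded.hex())          # 5aadbd4f
print(decode_fixed(encoded))  # ![]|
```

The tables only move these specific bytes, so the mapping is not one-to-one over all 256 values: any character that cp500 already encodes as 0xAD or 0xBD ends up sharing a byte with `[` or `]`, and `decode_fixed` turns it into `[` or `]`.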

@VladimirFilippov VladimirFilippov mannequin added topic-unicode type-bug An unexpected behavior, bug, or error labels Jun 7, 2017
@serhiy-storchaka
Member

The cp500 codec in Python is generated from the table ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/EBCDIC/CP500.TXT .

There are several EBCDIC code pages. EBCDIC-compatible encodings supported in Python are: cp037, cp273, cp424, cp500, cp875, cp1026 and cp1140. Three of them, cp037, cp424 and cp1140, encode '!' to b'\x5A' and '|' to b'\x4F'.
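The comment above can be checked by scanning the listed codecs for the ones that map both `!` to 0x5A and `|` to 0x4F. A sketch; the helper names are hypothetical, and a codec that cannot encode a character is simply treated as a non-match.

```python
# EBCDIC-based codecs shipped with Python, as listed in the comment above.
EBCDIC_CODECS = ["cp037", "cp273", "cp424", "cp500", "cp875", "cp1026", "cp1140"]


def encodes_to(codec, ch, byte):
    """True if `ch` encodes to exactly the single byte `byte` under `codec`."""
    try:
        return ch.encode(codec) == bytes([byte])
    except UnicodeEncodeError:
        return False


def codecs_matching(expected):
    """Return the codecs that encode every character to its expected byte."""
    return [
        name
        for name in EBCDIC_CODECS
        if all(encodes_to(name, ch, byte) for ch, byte in expected.items())
    ]


print(codecs_matching({"!": 0x5A, "|": 0x4F}))
```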

@VladimirFilippov
Mannequin Author

VladimirFilippov mannequin commented Jun 7, 2017

According to ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/EBCDIC/CP037.TXT, however, the characters [ and ] have different codes there (not 0xAD and 0xBD):
0xBA 0x005B #LEFT SQUARE BRACKET
0xBB 0x005D #RIGHT SQUARE BRACKET
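Putting cp037 and cp500 side by side shows that cp037 gets `!` and `|` right but places the brackets at 0xBA and 0xBB, so neither codec yields the expected `5A AD BD 4F` sequence. A short sketch (`ebcdic_hex` is a hypothetical helper; `bytes.hex` with a separator needs Python 3.8 or later):

```python
def ebcdic_hex(text, codec_names=("cp037", "cp500")):
    """Map each codec name to the space-separated hex of `text` encoded with it."""
    return {name: text.encode(name).hex(" ") for name in codec_names}


for name, hexed in ebcdic_hex("![]|").items():
    print(name, hexed)
```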

It looks like ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/EBCDIC/CP500.TXT was created from https://www.ibm.com/support/knowledgecenter/SSZJPZ_11.3.0/com.ibm.swg.im.iis.ds.parjob.adref.doc/topics/r_deeadvrf_ASCII_to_EBCDIC.html
but this note on that page was ignored: "This translation is not bidirectional. Some EBCDIC characters cannot be translated to ASCII and some conversion irregularities exist in the table. For more information, see Conversion table irregularities." In addition, this line from CP500.TXT:
0xBB 0x007C #VERTICAL LINE
has no source in IBM's table.
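The mappings that Python actually uses can be read from the generated codec module. This sketch relies on `decoding_table`, a module-level string in `encodings.cp500` produced by Python's codec generator; it is an implementation detail rather than a documented API.

```python
import encodings.cp500

# 256-entry string: index = EBCDIC byte value, value = decoded character.
table = encodings.cp500.decoding_table

for byte in (0x4A, 0x4F, 0x5A, 0xBB):
    ch = table[byte]
    print(f"0x{byte:02X} -> U+{ord(ch):04X} {ch!r}")
```

The entry for 0xBB is `|`, which is the CP500.TXT line questioned above.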

Example from a z/OS mainframe:
-------------------
bash-4.3$ iconv -f 819 -t 1047 -T ascii.txt > ebcdic.txt
bash-4.3$ ls -T *.txt
t ISO8859-1 T=on ascii.txt
t IBM-1047 T=on ebcdic.txt
bash-4.3$ cat ascii.txt
![]|bash-4.3$ od -h ascii.txt
0000000000 21 5B 5D 7C
0000000004
bash-4.3$ cat ebcdic.txt
![]|bash-4.3$ od -h ebcdic.txt
0000000000 5A AD BD 4F
0000000004
-------------------
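The transcript converts to IBM-1047 (see the `ls -T` output), and Python's standard library does not include a cp1047 codec. A minimal sketch, assuming the only goal is to reproduce the four bytes shown by `od -h`, is to register a small custom codec that overrides `!`, `[`, `]` and `|` and falls back to cp500 for everything else. The codec name `cp500_ibmfix` and the helper functions are hypothetical, and this is not a full IBM-1047 implementation.

```python
import codecs

# Hypothetical codec name for this sketch; Python does not ship such a codec.
CODEC_NAME = "cp500_ibmfix"

# Characters whose bytes differ from cp500, using IBM's expected values from the report.
_OVERRIDES = {"!": 0x5A, "[": 0xAD, "]": 0xBD, "|": 0x4F}
_REVERSE = {byte: ch for ch, byte in _OVERRIDES.items()}


def _encode(text, errors="strict"):
    """Encode character by character: overrides first, cp500 for everything else."""
    out = bytearray()
    for ch in text:
        byte = _OVERRIDES.get(ch)
        if byte is not None:
            out.append(byte)
        else:
            out += ch.encode("cp500", errors)
    return bytes(out), len(text)


def _decode(data, errors="strict"):
    """Decode byte by byte: overridden bytes first, cp500 for everything else."""
    data = bytes(data)
    chars = []
    for byte in data:
        if byte in _REVERSE:
            chars.append(_REVERSE[byte])
        else:
            chars.append(bytes([byte]).decode("cp500", errors))
    return "".join(chars), len(data)


def _search(name):
    # codecs.lookup passes a normalized, lower-case name to each search function.
    if name == CODEC_NAME:
        return codecs.CodecInfo(_encode, _decode, name=CODEC_NAME)
    return None


codecs.register(_search)

print("![]|".encode(CODEC_NAME).hex())  # 5aadbd4f
```

Like the byte-translation workaround, this is lossy: whatever character cp500 maps to 0xAD or 0xBD now shares a byte with `[` or `]` and decodes as `[` or `]`. The per-character loop also means that a `UnicodeEncodeError` reports positions within a one-character string rather than within the original text, and no incremental or stream classes are provided.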
