Unicode - Overview: ABAP Systems
Application Server ABAP supports both Unicode and non-Unicode systems. Non-Unicode systems are
traditional ABAP systems in which one character is usually represented by one byte. Unicode systems are
ABAP systems that are based on Unicode representation and an appropriate operating system and database.
Before Unicode, SAP used various character encodings to represent the characters of different scripts, such as
ASCII or EBCDIC as single-byte code pages, or double-byte code pages:
ASCII (American Standard Code for Information Interchange) encodes every character with one byte.
This means that a maximum of 256 characters can be represented. (Strictly speaking, standard ASCII
uses only 7 bits per character and can therefore represent only 128 characters. The extension to 8
bits was introduced with code pages such as ISO-8859.) Examples of common code pages are ISO-8859-1
for Western European languages and ISO-8859-5 for Cyrillic script.
EBCDIC (Extended Binary Coded Decimal Interchange Code) also encodes each character using one byte,
and can therefore also represent 256 characters. For example, EBCDIC 0697/0500 is an IBM format
that has been used on the AS/400 platform (now known as IBM System i) for Western European languages.
Double-byte code pages require one or two bytes per character. This allows up to 65,536 characters
to be represented, of which only 10,000 to 15,000 characters are normally used. For example, the
code page SJIS is used for Japanese and BIG5 for traditional Chinese.
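The byte counts described above can be checked with a short sketch. The following Python snippet is an illustration only and is not part of AS ABAP. It assumes that the Python codec names latin_1, iso8859_5, cp500, shift_jis, and big5 are adequate stand-ins for the code pages named above. In particular, using cp500 for EBCDIC 0697/0500 is an assumption.

```python
# Illustrative sketch: bytes needed per character in several legacy code pages.
# The Python codec names are assumed stand-ins for the code pages in the text.
samples = [
    ("A", "ascii"),           # 7-bit ASCII
    ("\u00e9", "latin_1"),    # e with acute accent, ISO-8859-1
    ("\u0416", "iso8859_5"),  # Cyrillic capital ZHE, ISO-8859-5
    ("A", "cp500"),           # EBCDIC code page 500 (assumed stand-in)
    ("A", "shift_jis"),       # Latin letter in a double-byte code page
    ("\u3042", "shift_jis"),  # Hiragana A in SJIS
    ("\u4e2d", "big5"),       # CJK ideograph in BIG5
]


def byte_length(char, codec):
    """Return the number of bytes one character occupies in the given codec."""
    return len(char.encode(codec))


lengths = {(char, codec): byte_length(char, codec) for char, codec in samples}

for (char, codec), n in lengths.items():
    print(f"{char!r} in {codec}: {n} byte(s)")
```

In the single-byte code pages every sample occupies one byte, whereas the double-byte code pages use one byte for the Latin letter and two bytes for the Japanese and Chinese characters.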
Using these character sets, all languages can be handled individually in one AS ABAP. Difficulties arise if texts
from different incompatible character sets are mixed in one central system. The exchange of data between
systems with incompatible character sets can also lead to problems.
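The problem with mixing incompatible character sets can be sketched as follows. This Python snippet is a hypothetical illustration, not SAP code. The helper fits and the sample text are invented for this example. It shows that the same byte value stands for different characters in ISO-8859-1 and ISO-8859-5, and that a text mixing Western European and Cyrillic characters fits into neither code page.

```python
# Illustrative sketch: one byte, two meanings, depending on the code page.
raw = bytes([0xE9])

as_western = raw.decode("latin_1")     # interpreted as ISO-8859-1
as_cyrillic = raw.decode("iso8859_5")  # interpreted as ISO-8859-5

print(repr(as_western), repr(as_cyrillic))


def fits(text, codec):
    """Return True if every character of text can be encoded in the codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Hypothetical text that mixes Western European and Cyrillic characters.
mixed = "caf\u00e9 \u041c\u043e\u0441\u043a\u0432\u0430"

print(fits(mixed, "latin_1"))    # Cyrillic letters are missing in ISO-8859-1
print(fits(mixed, "iso8859_5"))  # accented Latin letters are missing in ISO-8859-5
print(fits(mixed, "utf_8"))      # a Unicode encoding covers both
```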
The solution to this problem is a character set that includes all characters at once. This is realized by
Unicode (ISO/IEC 10646). Several character representations are possible for the Unicode character
set, for example UTF-8, in which a character can occupy between one and four bytes.
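The range of one to four bytes per character mentioned above corresponds to the UTF-8 representation. The following Python sketch is an illustration only. It compares UTF-8 with UTF-16, which is included here purely for comparison and in which a character occupies two or four bytes.

```python
# Illustrative sketch: bytes per character in two Unicode representations.
samples = [
    "A",           # U+0041, Latin letter
    "\u00e9",      # U+00E9, e with acute accent
    "\u20ac",      # U+20AC, euro sign
    "\U0001F600",  # U+1F600, outside the Basic Multilingual Plane
]

utf8_lengths = [len(ch.encode("utf_8")) for ch in samples]
utf16_lengths = [len(ch.encode("utf_16_le")) for ch in samples]

for ch, n8, n16 in zip(samples, utf8_lengths, utf16_lengths):
    print(f"U+{ord(ch):04X}: UTF-8 {n8} byte(s), UTF-16 {n16} byte(s)")
```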
From Release 6.10, the SAP NetWeaver Application Server supports both Unicode and non-Unicode systems.
Non-Unicode systems are conventional ABAP systems, in which one character is usually represented by one
byte. Unicode systems are ABAP systems that are based on a Unicode character set and which have a
corresponding underlying operating system, including a database.
Before Release 6.10, many ABAP programming methods were based on the fact that one character
corresponds to one byte. Before a system is converted to Unicode, ABAP programs must therefore be modified
at all points where an explicit or implicit assumption is made about the internal length of a character.
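Why the assumption "one character corresponds to one byte" breaks can be sketched outside ABAP. The following Python snippet is an illustration only. UTF-8 is chosen for the example and is not meant as a statement about the internal representation used by a Unicode system. It shows that the character count and the byte count of a text differ, and that cutting a byte sequence at a fixed byte offset can split a character.

```python
# Illustrative sketch: character length and byte length are not the same.
text = "M\u00fcller"           # six characters, one of them non-ASCII
encoded = text.encode("utf_8")

char_count = len(text)         # number of characters
byte_count = len(encoded)      # number of bytes in UTF-8

# Cutting after two bytes splits the two-byte sequence of the second character.
truncated = encoded[:2]
try:
    truncated.decode("utf_8")
    clean_cut = True
except UnicodeDecodeError:
    clean_cut = False

print(char_count, byte_count, clean_cut)
```

Code that computes offsets or lengths in bytes and then applies them to character data contains exactly this kind of implicit assumption.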
ABAP supports this conversion using new syntax rules and new language constructs, whereby emphasis was
placed on retaining as much of the existing source code as possible. As a preparation for the conversion to
Unicode, but also independently of whether a system will actually be converted to Unicode, the checkbox
"Unicode checks active" can be selected in the program properties. The transaction UCCHECK supports the
activation of this check for existing programs. If this property is set, the program is identified as a Unicode
program. In a Unicode program, a stricter syntax check is performed than in non-Unicode programs. In some
cases, statements must also be extended with new additions. A syntactically correct Unicode program will
normally run with the same semantics and the same results in Unicode and non-Unicode systems. Exceptions
to this rule are low-level programs that query and evaluate the number of bytes per character. Programs that
are required to run in both systems should therefore also be tested on both platforms.
In a Unicode system, only Unicode programs can be executed. Before converting to a Unicode system, the
profile parameter abap/unicode_check should be set to "ON" so that only the execution of Unicode programs is
permitted. Non-Unicode programs can only be executed in non-Unicode systems. All language constructs that
have been introduced for Unicode programs can, however, also be used in non-Unicode programs.
Experience has shown that existing programs that were written without errors mostly fulfill the new
Unicode rules and therefore require very little modification. Conversely, most programs that require significant
changes do so because of an error-prone and therefore questionable programming style. Even if you are not
planning a conversion to a Unicode system, Unicode programs are preferable because they are more easily
maintained and less prone to errors. Just as outdated and dangerous language constructs are declared obsolete
and are no longer permitted in ABAP Objects, the rules for Unicode programs also offer increased security when
programming, for example when working with character fields and mixed structures. This applies particularly to
the storage of external data (for example using the file interface), which has been completely reviewed for use
in Unicode programs. When creating a new program, SAP therefore recommends that you always identify the
program as a Unicode program. Older programs can be converted to Unicode programs in stages.