0% found this document useful (0 votes)
268 views

Unicode Support in Oracle9i Database

Content: Existing Unicode Support in 8i New Unicode Features in 9i Character Semantics Support in 9i Unicode reliable data type as NCHAR in 9i VARCHAR2 vs. NVARCHAR2 for Unicode UTF-8 or UTF-16 for NCHAR Unicode Access Interface Unicode Migration and Compatibility

Uploaded by

rp.anbu
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
268 views

Unicode Support in Oracle9i Database

Content: Existing Unicode Support in 8i New Unicode Features in 9i Character Semantics Support in 9i Unicode reliable data type as NCHAR in 9i VARCHAR2 vs. NVARCHAR2 for Unicode UTF-8 or UTF-16 for NCHAR Unicode Access Interface Unicode Migration and Compatibility

Uploaded by

rp.anbu
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
You are on page 1/ 19

Unicode Support in Oracle9i Database

®
Topics

• Customer New Requirement


• Existing Unicode Support in 8i
• New Unicode Features in 9i
• Character Semantics Support in 9i
• Unicode reliable data type as NCHAR in 9i
• VARCHAR2 vs. NVARCHAR2 for Unicode
• UTF-8 or UTF-16 for NCHAR
• Unicode Access Interface
• Unicode Migration and Compatibility
• Conclusion
®
Requirements

• Consistent character length semantics


– Column definition close to visual length
– SQL standard recommends varchar2(10) to hold
10 characters
• Reliable Unicode data type
– To be independent of database character set
– To enable gradually migration to Unicode
– To enable third vendor/component for Unicode
• Easy programming for other environments
– Java, XML, Window NT
• High information density for performance
– Storage efficiency for Asian or European data
®
Existing Unicode Support in 8i

• UTF-8 as database character set


– UTF8 for ASCII based platform
– UTFE for EBCDIC based platform
• UTF-8 as client character set
• UTF-16 for OCI bind/define buffer
• UCHAR/UVARCHAR as UTF-16 in ProC
• UTF-16 for ODBC and OLEDB
• UTF-16 for JDBC
• Unicode binary sort

®
New Unicode features in Oracle9i

• Character semantics support for text column


• Reliable Unicode datatype as NCHAR
• UTF-16 support for Oracle Call Interface(OCI)
• Complete Unicode support for
ODBC/OLEDB /JDBC
• Unicode and ISO14651 based multilingual sort
• Unicode enabled Oracle utilities such as
SQL*Loader
• Unicode based locale builder for locale
customization

®
Length Semantics Support in Oracle9i
• A new semantics
– CHAR ( size [BYTE | CHAR] )
– VARCHAR2 ( size [BYTE | CHAR] )
• It meets Ansi SQL standard
– the size is defined in character in the standard, but
most vender implemented in byte
• It fulfills Customer’s requirement
– Portable database schema
– Character set independent
– Same data size across server, client, and third
middle tier
– Easy migration to Unicode support

®
Character Semantics Support in 9i - Cont.

• Character semantics column


– Explicit with quantifier: varchar2(30 char)
– Implicit with NLS_LENGTH_SEMANTICS setting to
‘char’ for varchar2(30)
• Same semantics support for PL/SQL variable
• Character length constraint checking
• SQL functions for different flavor of length
semantics in Unicode
– like/like2/like4/likec
– lengthb/length/length2/length4/lengthc
– substrb/substr/substr2/substr4/substrc
– compose/decompose

®
Character Semantics Support in 9i - Cont.

• UTF-16 semantics
– UTF8 encodes surrogate by a pair of three bytes
– It has the same semantics as UTF-16 and has the match
between varchar2(10 char) and wchar(10)
– It has the same binary sorting order as UTF-16
• UTF-32 semantics
– AL32UTF8 follows UTF-8 standard by encoding
surrogate in 4 bytes
– It has the same semantics as UTF-32 in coding point
and the same binary order
• Conversion between UTF8 and AL32UTF8
– AL32UTF8 can be used at client for the UTF-8
compliance

®
Reliable Unicode Data Type Support

• NCHAR, NVARCHAR2, NCLOB


- 8i NCHAR: any fixed width character set
- Defined in SQL standard
• Unicode Character set encoding
- UTF-8, UTF-16
- Independent on DB character set
• Character Length Semantics Only
- Avoid migration issues in the future
• Support Unicode in non-Unicode database

®
Inter-operability With Other
Data Types
• Explicit Conversion Functions
- TO_NCHAR()
- TO_CHAR()
- ROWIDTONCHAR()
- CHARTOROWID()
- TO_CLOB()
- TO_NCLOB()
- TO_NUMBER()
- TO_DATE()
- TO_TIMESTAMP()
- TO_TIMESTAMP_TZ()
- TO_YMINTERVAL()
…...

®
Inter-operability - cont.
• Implicit Conversion
- Between NCHAR and CHAR types
- Between NCHAR and NUMBER, DATE, ROWID, RAW,
CLOBs etc.
• Conversion Direction:
- Insert/select into/update/assignment operations:
convert to target
- Comparison, concatenation: SQL CHAR to SQL
NCHAR avoid any data loss
- SQL function: convert to first string parameter
• Makes migration to SQL NCHAR much
easier
®
Data Loss Exception Handling
• NLS Parameter:
- NLS_NCHAR_CONV_EXCP
- Dynamically changed in each session
- Effective for both explicit and implicit conversions
• Smoothness of operation vs. accuracy of
operation

®
SQL Unicode String
Processing
• Same level of support as CHAR
- Can use NCHAR same way as CHAR.
• SQL functions support for NCHAR
- SUBSTR, LENGTH, INSTR, LIKE, CONCAT,
LPAD/RPAD, LTRIM, RTRIM, NLS_SORT,
NLS_UPPER, NLS_LOWER etc.

- UNISTR, ASCIISTR
• Mixed type arguments
- CONCAT(nchar,char) - result type is based on first string
parameter
• Easy programming
®
Unicode Database vs. Unicode Data Type

• Codepoint semantics for UTF8 will make


Oracle database a virtual UTF-16 database
– There is no need to use NVARCHAR2 unless it is
for the storage compression for Asian data
– The migration effort is minimum as there is no
need to convert VARCHAR2 into NVARCHAR2
– It is recommended to use one-step migration for a
new system
• NCHAR/NVARCHAR2
– It allows incremental migration to Unicode
– It is always a Unicode column
– It can use UTF-16 encoding natively

®
NCHAR Choice between
UTF8 and AL16UTF16
• UTF-8
- ASCII compatible
- Internet friendly: HTML, XML etc.
- More space efficient for western languages
• UTF-16
- More space efficient for Asian languages
- Faster in string processing
- Supported by JAVA, WINDOWS etc.

®
Programming Interfaces
• OCI Unicode Support
- Support UTF-16 bind/define buffers
- Unicode meta data, SQL_TEXT, error
messages through mode parameter
- Unicode interface support independent on
server or client character set
- Character length semantics
• PL/SQL
• Pro*C/C++: Unicode support through UCHAR,
UVARCHAR
• JDBC
• ODBC/OLEDB
®
Migration, Conversion and
Compatibility
• Old NCHAR to 9i NCHAR migration
• Migration to Unicode Columns
ALTER TABLE tname MODIFY col (NCHAR(n))

• Convert whole database to Unicode


database

®
Migration, Conversion and
Compatibility
• Character length semantics
- Database schema
ALTER TABLE tname MODIFY col (CHAR(n CHAR))
- Modify application to be in sync with database length
semantics

Example: PL/SQL migration


1. Set NLS_LENGTH_SEMANTICS to CHAR
2. Apply %ROWTYPE, or explicit CHAR quantifier
3. Change substrb, lengthb and instrb to substr,
length and
instr

®
Summary

• A flexible and complete Unicode support


– Character semantics on UTF-8 or Unicode data type
– All major access interfaces support Unicode
• High performance by high information density
– UTF-8 for Western scripts
– UTF-16 for Asian scripts
• Easy programming
– Same length semantics between database and
other components
• Easy migration
– One step migration or gradual migration

You might also like