Commit 61bbeb8

markusicu authored and echeran committed
ICU-22723 download 76rc
1 parent 73626da commit 61bbeb8

1 file changed: +177 -43 lines changed

docs/download/76.md

Lines changed: 177 additions & 43 deletions
@@ -14,15 +14,29 @@ License & terms of use: http://www.unicode.org/copyright.html
1414

1515
# ICU 76
1616

17-
ICU is the [premier library for software internationalization](https://icu.unicode.org/#h.i33fakvpjb7o), used by a [wide array of companies and organizations](https://icu.unicode.org/#h.f9qwubthqabj).
17+
ICU is the [premier library for software internationalization](https://icu.unicode.org/#h.i33fakvpjb7o),
18+
used by a [wide array of companies and organizations](https://icu.unicode.org/#h.f9qwubthqabj).
1819

1920
## Release Overview
2021

21-
ICU 76 updates to [Unicode 16](https://www.unicode.org/versions/Unicode16.0.0/) (TODO: link to blog),
22+
ICU 76 updates to
23+
[Unicode 16](https://www.unicode.org/versions/Unicode16.0.0/)
24+
([blog](https://blog.unicode.org/2024/09/announcing-unicode-standard-version-160.html)),
2225
including new characters and scripts, emoji, collation & IDNA changes, and corresponding APIs and implementations.
23-
It also updates to [CLDR 46](https://github.com/unicode-org/cldr/blob/main/docs/site/downloads/cldr-46.md) (TODO: link to blog) locale data with new locales and various additions and corrections.
26+
27+
It also updates to
28+
[CLDR 46](https://cldr.unicode.org/downloads/cldr-46)
29+
([beta blog](https://blog.unicode.org/2024/09/unicode-cldr-46-beta-available-for.html))
30+
locale data with new locales, significant updates to existing locales,
31+
and various additions and corrections.
32+
For example, the CLDR and Unicode default sort orders are now very nearly the same.
33+
34+
Most of the java.time (Temporal) types can now be formatted directly
35+
using the existing ICU4J date/time formatting classes.
2436

2537
There are some new APIs to make ICU easier to use with modern C++ and Java patterns.
38+
Most of the C/C++ APIs added for this purpose are implemented as C++ header-only APIs,
39+
and usable on top of binary stable C APIs, which is a first for ICU.
2640

2741
The Java and C++ technology preview implementations of the (also in [tech preview](https://github.com/unicode-org/message-format-wg?tab=readme-ov-file#messageformat-2-technical-preview)) CLDR MessageFormat 2.0 specification have been updated to match recent changes.
2842

@@ -34,7 +48,7 @@ Please use the [icu-support mailing list](https://icu.unicode.org/contacts) and/
3448

3549
The initial release has library version number 76.1.
3650

37-
* Release date: 2024-10-TODO
51+
* Release date: _planned for_ 2024-10-24
3852
* [List of tickets fixed in ICU 76](https://unicode-org.atlassian.net/issues/?jql=project%20%3D%20ICU%20AND%20status%20%3D%20Done%20AND%20resolution%20in%20%28Fixed%2C%20%22Fixed%20by%20Other%20Ticket%22%29%20AND%20fixVersion%20%3D%2076.1%20ORDER%20BY%20component%20ASC%2C%20created%20DESC)
3953

4054
If there are maintenance releases, they will be 76.2, 76.3, etc. (During ICU 76 development, the library version number was 76.0.x.)
@@ -43,51 +57,168 @@ Note: There may be additional commits on the [maint/maint-76](https://github.com
4357

4458
## Common Changes
4559

46-
* [Unicode 16](https://www.unicode.org/versions/Unicode16.0.0/) (TODO: link to blog):
47-
* TODO
48-
* [CLDR 46](https://github.com/unicode-org/cldr/blob/main/docs/site/downloads/cldr-46.md) (TODO: link to blog):
49-
* TODO: new stuff
50-
* TODO: below is from 45
51-
* MessageFormat 2.0 tech preview being included into LDML.
52-
* Structural “under the hood” work and limited data bug fixes, but no new data collection.
53-
* Some time zones deprecated following IANA TZ database changes.
54-
* TODO: new stuff
55-
* TODO: below is from 75
56-
* New Unicode properties APIs for Identifier_Status and Identifier_Type, defined by UTS \#39 Unicode Security Mechanisms, [General Security Profile for Identifiers](https://www.unicode.org/reports/tr39/#General_Security_Profile). ([ICU-11396](https://unicode-org.atlassian.net/browse/ICU-11396))
57-
* Time zone data (tzdata) version 2024a (2024-jan). Note that pre-1970 data for a number of time zones has been removed, as has been the case in the upstream [tzdata](https://www.iana.org/time-zones) release since 2021b.
60+
* [Unicode 16](https://www.unicode.org/versions/Unicode16.0.0/)
61+
([blog](https://blog.unicode.org/2024/09/announcing-unicode-standard-version-160.html)):
62+
* Adds five modern-use scripts: Garay, Gurung Khema, Kirat Rai, Ol Onal, Sunuwar
63+
* Adds two historic scripts & almost 4000 additional Egyptian Hieroglyphs
64+
* Seven new emoji characters
65+
* Over 700 symbols from legacy computing environments
66+
* ICU line breaking improvements have been upstreamed into
67+
[UAX #14](https://www.unicode.org/reports/tr14/tr14-53.html#Modifications)
68+
* ICU 76 adds support for the new UCD property Modifier_Combining_Mark for
69+
[UAX #53](https://www.unicode.org/reports/tr53/) Arabic Mark Rendering
70+
* ICU 76 also adds support for the UCD property Indic_Conjunct_Break
71+
which was new in Unicode 15.1. ([ICU-22503](https://unicode-org.atlassian.net/browse/ICU-22503))
72+
* [IDNA](https://www.unicode.org/reports/tr46/tr46-33.html#Modifications):
73+
The handling of UseSTD3ASCIIRules was simplified.
74+
Some existing characters changed from disallowed (when that was only for compatibility with
75+
long-obsolete IDNA2003) to valid.
76+
* [CLDR 46](https://github.com/unicode-org/cldr/blob/main/docs/site/downloads/cldr-46.md)
77+
([beta blog](https://blog.unicode.org/2024/09/unicode-cldr-46-beta-available-for.html)):
78+
* Significant data updates across all locales
79+
* Locales which are now at modern coverage level: Nigerian Pidgin, Tigrinya
80+
* Locales which are now at moderate coverage level:
81+
Akan, Baluchi (Latin), Kangri, Tajik, Tatar, Wolof
82+
* New measurement units "night" and "light-speed"
83+
* Note: ICU 76 does not yet support `portion-per-1e9` (aka per-billion). (See [ICU-22781](https://unicode-org.atlassian.net/browse/ICU-22781))
84+
* [MessageFormat 2.0 tech preview updates](https://cldr.unicode.org/downloads/cldr-46#message-format-specification)
85+
* Language matching: Dropped the fallback mapping
86+
desired="uk" → supported="ru"
87+
(so that Ukrainian (uk) doesn’t fall back to Russian (ru))
88+
* [Collation](https://cldr.unicode.org/downloads/cldr-46#collation-data-changes):
89+
Significant changes to the CLDR root collation (CLDR default sort order)
90+
* Realigned With DUCET:
91+
The order of groups of characters which sort below letters is now the same.
92+
In both sort orders, non-decimal-digit numeric characters now sort after decimal digits,
93+
and the CLDR root collation no longer tailors any currency symbols
94+
(making some of them sort like letter sequences, as in the DUCET).
95+
_These changes eliminate sort order differences among almost all
96+
regular characters between the CLDR root collation and the DUCET._
97+
* Improved Han Radical-Stroke Order:
98+
The CLDR radical-stroke order now matches that of the Unicode Radical-Stroke Index;
99+
traditional vs. simplified forms of radicals are now distinguished on a lower level than the number of residual strokes.
100+
In alphabetic indexes for radical-stroke sort orders,
101+
only the traditional forms of radicals are now available as index characters.
102+
* Time zone data (tzdata) version 2024b (2024-sep). Note that pre-1970 data for a number of time zones has been removed, as has been the case in the upstream [tzdata](https://www.iana.org/time-zones) release since 2021b.
103+
* The Asia/Almaty time zone has become an alias following IANA TZ database changes.
104+
* CLDR added support for deprecated timezone codes by remapping:
105+
CST6CDT → America/Chicago, EST → America/Panama, EST5EDT → America/New_York,
106+
MST7MDT → America/Denver, PST8PDT → America/Los_Angeles
107+
(These IANA TZ changes were motivated by CLDR, see
108+
[CLDR-17111](https://unicode-org.atlassian.net/browse/CLDR-17111))
58109

59110
## ICU4C Specific Changes
60111

61-
* [API changes since ICU4C 75 (Markdown)](https://github.com/unicode-org/icu/blob/maint/maint-76/icu4c/APIChangeReport.md) / [(HTML)](https://htmlpreview.github.io/?https://github.com/unicode-org/icu/blob/maint/maint-76/icu4c/APIChangeReport.html)
62-
* TODO: new stuff
63-
* TODO: below is from 75
64-
* MessageFormat 2.0 tech preview new API ([ICU-22261](https://unicode-org.atlassian.net/browse/ICU-22261))
65-
* C: Require C11 (up from C99)
66-
* C++: Require C++17 (up from C++11)
67-
* Many changes for more robust string and buffer handling.
112+
* [API changes since ICU4C 75 (Markdown)](https://github.com/unicode-org/icu/blob/maint/maint-76/icu4c/APIChangeReport.md) / [(HTML)](https://htmlpreview.github.io/?https://github.com/unicode-org/icu/blob/maint/maint-76/icu4c/APIChangeReport.html)
113+
* A UnicodeString can now be converted to & from UTF-16 standard string_view types
114+
(std::u16string_view, and on Windows to/from std::wstring_view)
115+
and other UTF-16 types (string literals, standard string classes).
116+
Several other member functions have been widened to accept standard UTF-16 types as well.
117+
([ICU-22843](https://unicode-org.atlassian.net/browse/ICU-22843))
118+
* New APIs for colloquial iteration over the elements of a C++ UnicodeSet or a C USet. ([ICU-22876](https://unicode-org.atlassian.net/browse/ICU-22876))
119+
* For details and an example see the “C++ Header-Only APIs” section of the [Migration Issues](#migration-issues) below.
120+
* New APIs for colloquial use of C++ Collator / C UCollator with
121+
standard C++ algorithms (e.g., sort) & data structures (e.g., map).
122+
([ICU-22879](https://unicode-org.atlassian.net/browse/ICU-22879))
123+
(The UCollator wrappers are also C++ header-only APIs.)
124+
* Note: Some APIs were changed to accept a wider range of input types than before,
125+
but in the API change report it looks as if the old, stable signatures were removed
126+
and the wider signatures were added as “born stable”.
127+
For example, several UnicodeString constructors that take a raw pointer
128+
have been replaced with a signature that accepts such raw pointers but also additional input types.
129+
* Note: Similarly, the API change report appears to show removal+addition of
130+
certain UnicodeString::remove() and UnicodeString::removeBetween() overloads,
131+
but only the _expression_ of one of their default parameter values has changed.
132+
* Many changes for more robust string and memory handling.
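
The idea behind using a collator with standard algorithms and containers can be sketched without ICU: wrap a C-style three-way comparison function in a small predicate object. The following is a minimal, self-contained illustration, not ICU code. `ToyCollator`, `toy_compare`, `ToyLess`, `sortedWith`, and `countDistinct` are hypothetical names, and the "collation" here is only ASCII case folding. The real ICU 76 signatures are listed in the API change report.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a C collator object.
struct ToyCollator {
    bool ignoreCase;
};

// Hypothetical C-style three-way comparison: returns -1, 0, or 1.
int toy_compare(const ToyCollator *coll, const char *a, const char *b) {
    for (;; ++a, ++b) {
        unsigned char ca = static_cast<unsigned char>(*a);
        unsigned char cb = static_cast<unsigned char>(*b);
        if (coll->ignoreCase) {
            ca = static_cast<unsigned char>(std::tolower(ca));
            cb = static_cast<unsigned char>(std::tolower(cb));
        }
        if (ca != cb) return ca < cb ? -1 : 1;
        if (ca == 0) return 0;  // both strings ended at the same position
    }
}

// Predicate object that adapts the three-way comparison to "less than",
// which is what std::sort and std::map expect.
class ToyLess {
public:
    explicit ToyLess(const ToyCollator *coll) : coll_(coll) {}
    bool operator()(const std::string &a, const std::string &b) const {
        return toy_compare(coll_, a.c_str(), b.c_str()) < 0;
    }
private:
    const ToyCollator *coll_;
};

// Sorts a copy of the words in collation order.
std::vector<std::string> sortedWith(const ToyCollator *coll,
                                    std::vector<std::string> words) {
    std::sort(words.begin(), words.end(), ToyLess(coll));
    return words;
}

// Counts the keys that the collator treats as distinct, using an ordered map.
std::size_t countDistinct(const ToyCollator *coll,
                          const std::vector<std::string> &words) {
    ToyLess less(coll);
    std::map<std::string, int, ToyLess> counts(less);
    for (const auto &word : words) ++counts[word];
    return counts.size();
}
```

Because the predicate only stores a pointer and forwards to the C function, a wrapper of this shape can live entirely in a header.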
68133

69134
## ICU4J Specific Changes
70135

71-
* [API Changes since ICU4J 75](https://htmlpreview.github.io/?https://github.com/unicode-org/icu/blob/maint/maint-76/icu4j/APIChangeReport.html)
72-
* TODO: new stuff
73-
* TODO: below is from 75
74-
* MessageFormat 2.0 tech preview update ([ICU-22690](https://unicode-org.atlassian.net/browse/ICU-22690))
75-
* Performance (multi-threading / lock contention) improvement for BreakIterator.clone() and ULocale.getDefault(). ([ICU-22582](https://unicode-org.atlassian.net/browse/ICU-22582))
136+
* [API Changes since ICU4J 75](https://htmlpreview.github.io/?https://github.com/unicode-org/icu/blob/maint/maint-76/icu4j/APIChangeReport.html)
137+
* Most of the java.time (Temporal) types can now be formatted directly
138+
using the existing ICU4J date/time formatting classes. ([ICU-22853](https://unicode-org.atlassian.net/browse/ICU-22853))
139+
* New APIs for colloquial iteration over the elements of a UnicodeSet.
140+
In addition to the existing ranges(), strings(), and UnicodeSet-is-an-Iterable,
141+
there is a new codePoints() (returns an Iterable),
142+
and new methods that return Streams (e.g., codePointStream() & rangeStream()).
143+
([ICU-22845](https://unicode-org.atlassian.net/browse/ICU-22845))
76144

77145
## Known Issues
78146

79-
* TODO: new stuff
80-
* TODO: below is from 75
81-
* [ICU-22729](https://unicode-org.atlassian.net/browse/ICU-22729) udatpg_getBestPattern requires exact skeleton match in ICU 76
82-
* Due to a combination of an ICU bug fix and issues with CLDR availableFormats data, some skeletons in some languages yield inconsistent date/time formatting patterns.
147+
* None yet
83148

84149
## Migration Issues
85150

86-
* See [CLDR 46 migration issues](https://github.com/unicode-org/cldr/blob/main/docs/site/downloads/cldr-46.md#migration)
87-
* TODO: new stuff
88-
* TODO: below is from 75
89-
* ICU4C behavior for ill-formed locale IDs/language tags: uloc_getName(), uloc_getLanguage() and similar functions (and functions that rely on them) may fail with a U_ILLEGAL_ARGUMENT_ERROR when they used to fail only with a U_BUFFER_OVERFLOW_ERROR. (due to changes for [ICU-22520](https://unicode-org.atlassian.net/browse/ICU-22520))
90-
* On Linux, the configure script now defaults to "cc" rather than preferring "clang". If you want to choose clang, then configure for "Linux/clang". ([ICU-22556](https://unicode-org.atlassian.net/browse/ICU-22556))
151+
### IDNA Default Option Changed to Nontransitional Processing
152+
After all major browsers have switched to nontransitional processing,
153+
Unicode 15.1 (a year ago) changed the [UTS #46 spec](https://www.unicode.org/reports/tr46/#Processing)
154+
to declare transitional processing deprecated.
155+
156+
ICU 76 changes the "DEFAULT" API constants from 0 to UIDNA_NONTRANSITIONAL_TO_ASCII | UIDNA_NONTRANSITIONAL_TO_UNICODE.
157+
158+
ICU 76 does not change the behavior of using options value 0.
159+
(That would change the behavior of existing binaries linking with new ICU libraries.)
160+
However, when code is recompiled against a new version of ICU,
161+
and when it uses the DEFAULT constant, then it will pass these option flags into the factory method.
162+
163+
* In C/C++: unicode/uidna.h [UIDNA_DEFAULT](https://unicode-org.github.io/icu-docs/apidoc/dev/icu4c/uidna_8h.html#a726ca809ffd3d67ab4b8476646f26635aa1eb63014cdaf41c7ea6cf3abecf1169)
164+
* In Java: IDNA.java [DEFAULT](https://unicode-org.github.io/icu-docs/apidoc/dev/icu4j/com/ibm/icu/text/IDNA.html#DEFAULT)
165+
166+
See [ICU-22294](https://unicode-org.atlassian.net/browse/ICU-22294)
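
The difference between passing the literal 0 and passing the DEFAULT constant can be sketched without linking ICU. The following is a minimal, self-contained illustration, not ICU code. All `kSketch...` names and `describeProcessing` are hypothetical. The bit values 0x10 and 0x20 are assumed to mirror UIDNA_NONTRANSITIONAL_TO_ASCII and UIDNA_NONTRANSITIONAL_TO_UNICODE in unicode/uidna.h, so check the header before relying on them.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-ins for the option bits in unicode/uidna.h.
// The numeric values are assumed to match the real header.
constexpr uint32_t kSketchNontransitionalToAscii = 0x10;
constexpr uint32_t kSketchNontransitionalToUnicode = 0x20;

// What the DEFAULT constant evaluates to when compiling against each version.
constexpr uint32_t kSketchDefaultIcu75 = 0;
constexpr uint32_t kSketchDefaultIcu76 =
    kSketchNontransitionalToAscii | kSketchNontransitionalToUnicode;

// Describes the processing mode selected by an options value.
// The interpretation of the bits is the same in both versions:
// only the value of the DEFAULT constant differs.
std::string describeProcessing(uint32_t options) {
    bool toAscii = (options & kSketchNontransitionalToAscii) != 0;
    bool toUnicode = (options & kSketchNontransitionalToUnicode) != 0;
    if (toAscii && toUnicode) return "nontransitional";
    if (!toAscii && !toUnicode) return "transitional";
    return "mixed";
}
```

In this sketch, a caller that hard-codes 0 keeps transitional processing after upgrading, while a caller that is recompiled with the DEFAULT constant gets nontransitional processing. Code that needs one behavior regardless of the ICU version can pass the flags explicitly.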
167+
168+
### SimpleNumber::truncateStart() Removed
169+
ICU 75 renamed the still-draft SimpleNumber::truncateStart() to setMaximumIntegerDigits().
170+
ICU 76 removes the never-stable, original function.
171+
Same for the C API usnum_truncateStart().
172+
([ICU-22900](https://unicode-org.atlassian.net/browse/ICU-22900))
173+
174+
### C++ Header-Only APIs
175+
ICU 76 is the first version where we add what we call C++ header-only APIs.
176+
These are especially intended for users who rely on only binary stable DLL/library exports of C APIs
177+
(C++ APIs cannot be binary stable).
178+
179+
_Please test these new APIs and let us know if you find problems —
180+
especially if you find a platform/compiler/options combination
181+
where the call site does end up calling into ICU DLL/library exports._
182+
183+
Remember that regular C++ APIs can be hidden by callers defining `U_SHOW_CPLUSPLUS_API=0`.
184+
The new header-only APIs can be separately enabled via `U_SHOW_CPLUSPLUS_HEADER_API=1`.
185+
186+
([GitHub query for `U_SHOW_CPLUSPLUS_HEADER_API` in public header files](https://github.com/search?q=repo%3Aunicode-org%2Ficu+U_SHOW_CPLUSPLUS_HEADER_API+path%3Aunicode%2F*.h&type=code))
187+
188+
These are C++ definitions that are not exported by the ICU DLLs/libraries,
189+
are thus inlined into the calling code,
190+
and which may call ICU C APIs but not into ICU non-header-only C++ APIs.
191+
192+
The header-only APIs are defined in a nested `header` namespace.
193+
If entry point renaming is turned off (the main namespace is `icu` rather than `icu_76` etc.),
194+
then the new `U_HEADER_ONLY_NAMESPACE` is `icu::header`.
195+
196+
([Link to the API proposal which introduced this concept](https://docs.google.com/document/d/1xERVccTYsptzjfbjcj6HDtoKVF_mEKmslPsOiQzzaFg/view#heading=h.cf4bmhjgozry))
197+
198+
For example, for iterating over the code point ranges in a `USet` (excluding the strings):
199+
```c++
U_NAMESPACE_USE
using U_HEADER_NESTED_NAMESPACE::USetRanges;
LocalUSetPointer uset(uset_openPattern(u"[abcçカ🚴]", -1, &errorCode));
// Iterate over the code point ranges, unpacking each range's start and end.
for (auto [start, end] : USetRanges(uset.getAlias())) {
    printf("uset.range U+%04lx..U+%04lx\n", (long)start, (long)end);
}
// Iterate over each code point within each range.
for (auto range : USetRanges(uset.getAlias())) {
    for (UChar32 c : range) {
        printf("uset.range.c U+%04lx\n", (long)c);
    }
}
```
213+
214+
(Implementation note: On most platforms, when compiling ICU itself,
215+
the `U_HEADER_ONLY_NAMESPACE` is `icu::internal`,
216+
so that any such symbols that get exported differ from the ones that calling code sees.
217+
On Windows, where DLL exports are explicit,
218+
the namespace is always the same, but these header-only APIs are not marked for export.)
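
How a header-only wrapper can sit on top of a plain C API may be easier to see in a toy model. The following is a minimal sketch under stated assumptions, not ICU's implementation. `ToyRange`, `ToySet`, `toyset_getRangeCount`, `toyset_getRange`, the `toy::header` namespace, `ToySetRanges`, and `countCodePoints` are all hypothetical names. The two C functions stand in for exported, binary-stable library functions, and everything in `toy::header` is defined inline in the class body so that it compiles into the calling code.

```cpp
#include <cstdint>

// --- Hypothetical C API, standing in for exported library functions ---
extern "C" {
struct ToyRange {
    int32_t start;
    int32_t end;  // inclusive
};
struct ToySet {
    const ToyRange *ranges;
    int32_t count;
};
int32_t toyset_getRangeCount(const ToySet *set) { return set->count; }
ToyRange toyset_getRange(const ToySet *set, int32_t index) {
    return set->ranges[index];
}
}

// --- Header-only C++ layer: calls only the C functions above ---
namespace toy::header {

class ToySetRanges {
public:
    explicit ToySetRanges(const ToySet *set) : set_(set) {}

    // Minimal iterator: just enough for a range-based for loop.
    class Iterator {
    public:
        Iterator(const ToySet *set, int32_t index) : set_(set), index_(index) {}
        ToyRange operator*() const { return toyset_getRange(set_, index_); }
        Iterator &operator++() { ++index_; return *this; }
        bool operator!=(const Iterator &other) const {
            return index_ != other.index_;
        }
    private:
        const ToySet *set_;
        int32_t index_;
    };

    Iterator begin() const { return Iterator(set_, 0); }
    Iterator end() const { return Iterator(set_, toyset_getRangeCount(set_)); }

private:
    const ToySet *set_;
};

}  // namespace toy::header

// Caller code: sums the sizes of all ranges using structured bindings.
int64_t countCodePoints(const ToySet *set) {
    int64_t total = 0;
    for (auto [start, end] : toy::header::ToySetRanges(set)) {
        total += static_cast<int64_t>(end) - start + 1;
    }
    return total;
}
```

In this model, the only symbols the caller needs from the library at link time are the two C functions. The wrapper class exists only in the header.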
219+
220+
### Migration Issues Related to CLDR
221+
* See [CLDR 46 migration issues](https://cldr.unicode.org/downloads/cldr-46#migration)
91222
92223
## ICU4C Platform Support
93224
@@ -97,27 +228,30 @@ We routinely test on recent versions of Linux, macOS, and Windows.
97228
98229
We accept patches for other platforms.
99230
231+
For ICU 76, we have received a contribution to make ICU4C work again on z/OS,
232+
using a newer (clang-based) compiler. ([ICU-22714](https://unicode-org.atlassian.net/browse/ICU-22714) [icu/pull/3008](https://github.com/unicode-org/icu/pull/3008) + [ICU-22916](https://unicode-org.atlassian.net/browse/ICU-22916) [icu/pull/3208](https://github.com/unicode-org/icu/pull/3208))
233+
100234
Windows: The minimum supported version is Windows 7. (See [How To Build And Install On Windows](../userguide/icu4c/build.html#how-to-build-and-install-on-windows) for more details.)
101235
102236
## ICU4J Platform Support
103237
104-
ICU4J works on Java 8..17 (at least).
238+
ICU4J works on Java 8..21 (at least).
105239
106240
ICU4J should work on Android API level 21 and later but may require “[library desugaring](https://developer.android.com/studio/write/java8-support#library-desugaring)”.
107241
108242
## Download
109243
110-
Source and binary downloads are available on the git/GitHub tag page: TODO: https://github.com/unicode-org/icu/releases/tag/release-76-1
244+
Source and binary downloads are available on the git/GitHub tag page: https://github.com/unicode-org/icu/releases/tag/release-76-rc
111245
112246
See the [Source Code Setup](../devsetup/source/) page for how to download the ICU file tree directly from GitHub.
113247
114248
ICU locale data was generated from CLDR data equivalent to:
115249
116-
* TODO: fix/update
117-
* https://github.com/unicode-org/cldr/releases/tag/release-46-beta4
118-
* https://github.com/unicode-org/cldr-staging/releases/tag/release-46-beta4
250+
* https://github.com/unicode-org/cldr/releases/tag/release-46-beta3
251+
* https://github.com/unicode-org/cldr-staging/releases/tag/release-46-beta3
119252
120-
TODO: Maven dependency:
253+
[Maven dependency](https://central.sonatype.com/artifact/com.ibm.icu/icu4j):
254+
TODO
121255
```
122256
<dependency>
123257
<groupId>com.ibm.icu</groupId>
