[Feature #20724] Bump Unicode version to 16.0.0 #13117

Merged
merged 2 commits into ruby:master from unicode-to-16.0.0
Apr 18, 2025

@ima1zumi ima1zumi commented Apr 15, 2025

duerst commented Apr 15, 2025

Removing gsub may make this much slower. There should be no need for such big changes.


quick_checks = %w(NFD_QC NFC_QC NFKD_QC NFKC_QC)

File.foreach('enc/unicode/data/16.0.0/ucd/DerivedNormalizationProps.txt') do |line|

@nobu nobu Apr 17, 2025

Use InputDataDir and vpath, not the hardcoded path.

Suggested change
-File.foreach('enc/unicode/data/16.0.0/ucd/DerivedNormalizationProps.txt') do |line|
+vpath.foreach("#{InputDataDir}/DerivedNormalizationProps.txt") do |line|
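
For context, here is a minimal sketch of what a loop over DerivedNormalizationProps.txt might collect for the four quick-check properties. It reads from an inline sample rather than through `vpath`, and the sample lines, `parse_quick_checks`, and the result layout are illustrative assumptions, not the code in this PR.

```ruby
# Hypothetical sample lines in the "range ; property ; value # comment" layout.
SAMPLE = <<~DATA
  # comment line
  00C0..00C5    ; NFD_QC; N # L&   [6] LATIN CAPITAL LETTER A WITH GRAVE..
  0340..0341    ; NFC_QC; N # Mn   [2] COMBINING GRAVE TONE MARK..
  0300..0304    ; NFC_QC; M # Mn   [5] COMBINING GRAVE ACCENT..
  00A0          ; NFKD_QC; N # Zs       NO-BREAK SPACE
DATA

QUICK_CHECKS = %w(NFD_QC NFC_QC NFKD_QC NFKC_QC)

# Hypothetical helper: returns { property => [[codepoint_range, value], ...] }.
def parse_quick_checks(lines)
  result = Hash.new { |h, k| h[k] = [] }
  lines.each do |line|
    data = line.sub(/#.*/, '').strip          # drop trailing comment
    next if data.empty?                        # skip blank and comment-only lines
    range, prop, value = data.split(';').map(&:strip)
    next unless QUICK_CHECKS.include?(prop)    # ignore other properties
    first, last = range.split('..').map { |s| s.to_i(16) }
    result[prop] << [first..(last || first), value]
  end
  result
end

PARSED = parse_quick_checks(SAMPLE.each_line)
```

Because the helper takes any enumerable of lines, the caller decides where the file comes from, which is the point of the suggestion above.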

def self.to_nfkd_arr(string)
kompatibled_arr = string.each_char.flat_map { kompatible_one(it) }
decomposed_arr = kompatibled_arr.each.flat_map { decompose_one(it) }
ordered_arr = canonical_ordering(decomposed_arr)

ordered_arr is assigned but unused variable.

Suggested change
-ordered_arr = canonical_ordering(decomposed_arr)
+canonical_ordering(decomposed_arr)
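
A small sketch of why dropping the assignment is safe when the call is the last expression of the method: Ruby returns the value of the last expression either way. The `canonical_ordering` below is a hypothetical stand-in (a plain sort), not the real implementation.

```ruby
# Hypothetical stand-in; the real canonical_ordering reorders combining marks.
def canonical_ordering(arr)
  arr.sort
end

# Assigning to a local that is never read (triggers an "unused variable" warning with -w).
def with_assignment(arr)
  ordered_arr = canonical_ordering(arr)
end

# Same return value, no unused local.
def without_assignment(arr)
  canonical_ordering(arr)
end
```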

nobu commented Apr 18, 2025

diff --git a/common.mk b/common.mk
index 8d1ea8815a1..d8d6e2525e3 100644
--- a/common.mk
+++ b/common.mk
@@ -1245,7 +1245,12 @@ srcs-extra: $(EXTRA_SRCS)
 realclean-srcs-extra::
 	$(Q)$(RM) $(EXTRA_SRCS)
 
-LIB_SRCS = $(srcdir)/lib/unicode_normalize/tables.rb
+UNICODE_NORMALIZE_TABLES = $(srcdir)/lib/unicode_normalize/tables.rb
+encs: $(ALWAYS_UPDATE_UNICODE:yes=update-unicode_normalize-tables)
+
+update-unicode_normalize-tables: $(UNICODE_NORMALIZE_TABLES)
+
+LIB_SRCS = $(UNICODE_NORMALIZE_TABLES)
 
 srcs-lib: $(LIB_SRCS)
 

duerst commented Apr 18, 2025

unicode_normalize/tables.rb should not only be updated when ALWAYS_UPDATE_UNICODE=yes. It should e.g. be updated when template/unicode_norm_gen.tmpl is changed. And makefile-related changes should be independent of updates for Unicode 16.0.0.

@ima1zumi ima1zumi force-pushed the unicode-to-16.0.0 branch from 3685ad0 to 97f35c7 Compare April 18, 2025 07:17
@ima1zumi ima1zumi changed the title [WIP] Unicode 16.0.0 Unicode 16.0.0 Apr 18, 2025
@ima1zumi ima1zumi force-pushed the unicode-to-16.0.0 branch from 97f35c7 to f7b53e3 Compare April 18, 2025 07:49
@ima1zumi ima1zumi changed the title Unicode 16.0.0 [Feature #20724] Bump Unicode version to 16.0.0 Apr 18, 2025
@ima1zumi ima1zumi marked this pull request as ready for review April 18, 2025 07:52
@ima1zumi ima1zumi enabled auto-merge (rebase) April 18, 2025 09:08
@ima1zumi ima1zumi disabled auto-merge April 18, 2025 09:49

ima1zumi commented Apr 18, 2025

@duerst I removed the makefile changes from this PR.
template/unicode_norm_gen.tmpl and tables.rb depend on UnicodeData.txt and CompositionExclusions.txt. When we update Unicode in Ruby, we should run unicode_normalize/tables.rb. What do you think?

@ima1zumi ima1zumi merged commit ccc7493 into ruby:master Apr 18, 2025
81 checks passed

duerst commented Apr 18, 2025

@ima1zumi Thanks for all your work!

@ima1zumi @nobu 'template/unicode_norm_gen.tmpl' does NOT depend on UnicodeData.txt,... 'lib/unicode_normalize/tables.rb' depends on 'template/unicode_norm_gen.tmpl', 'UnicodeData.txt', and 'CompositionExclusions.txt'.
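
To make the dependency direction concrete, here is a rough sketch of a make-style staleness check: the generated file needs rebuilding when it is missing or older than any of its inputs. `stale?`, `TABLES`, and `DEPS` are hypothetical names, and the relative paths for the two data files are assumptions; this is not how the Ruby build system actually tracks it.

```ruby
# Generated file and its inputs, as listed in the comment above.
TABLES = 'lib/unicode_normalize/tables.rb'
DEPS = [
  'template/unicode_norm_gen.tmpl',
  'UnicodeData.txt',            # assumed relative path
  'CompositionExclusions.txt',  # assumed relative path
]

# Hypothetical helper: true when target is missing or older than any dependency.
def stale?(target, deps)
  return true unless File.exist?(target)
  target_time = File.mtime(target)
  deps.any? { |dep| File.mtime(dep) > target_time }
end
```

With this shape, a change to the template alone marks tables.rb as stale, independent of whether the Unicode data files changed.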
