Currently the confusable data doesn't fully cover CJK ideographs.
The Unihan database provides these:
- kZVariant, which describes logically unifiable ideographs that are separately encoded for other reasons
- kSpoofingVariant, which describes confusable ideograph relationships
- EquivalentUnifiedIdeograph.txt, which describes CJK radicals/strokes and their corresponding ideographs
Procedurally, I'd recommend that the Unicode security standards adopt these as confusables upstream, either by data or by algorithm; this crate could then support them automatically.
I don't know what it would take for Unicode to do this, or whether it wants to. I think Han confusables differ in a couple of ways (a lot of them show up in regular text, even side by side!), which might be why they are kept separate. We could still parse Unihan and support it here, though; Unihan isn't too complicated.
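To give a rough idea of what parsing Unihan could involve, here is a minimal sketch. It assumes the usual Unihan line layout (a `U+XXXX` code point, the field name, and the value, separated by tabs), and that variant values are space-separated `U+XXXX` entries which may carry a `<source` annotation. The names `parse_code_point` and `parse_unihan_field` are hypothetical and are not part of this crate.

```rust
use std::collections::BTreeMap;

/// Parse a code point written as `U+4E00` (the form assumed for Unihan data).
fn parse_code_point(s: &str) -> Option<char> {
    let hex = s.strip_prefix("U+")?;
    u32::from_str_radix(hex, 16).ok().and_then(char::from_u32)
}

/// Collect `source -> variants` for one field, e.g. "kZVariant" or
/// "kSpoofingVariant". Hypothetical helper, not an existing crate API.
fn parse_unihan_field(data: &str, field: &str) -> BTreeMap<char, Vec<char>> {
    let mut map = BTreeMap::new();
    for line in data.lines() {
        let line = line.trim();
        // Skip blank lines and `#` comments.
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        // Assumed layout: code point <TAB> field name <TAB> value.
        let mut cols = line.splitn(3, '\t');
        let (Some(cp), Some(name), Some(value)) = (cols.next(), cols.next(), cols.next()) else {
            continue;
        };
        if name != field {
            continue;
        }
        let Some(source) = parse_code_point(cp) else {
            continue;
        };
        // Drop any `<source` annotation before parsing each variant.
        let variants = value
            .split_whitespace()
            .filter_map(|v| parse_code_point(v.split('<').next().unwrap_or(v)));
        map.entry(source).or_insert_with(Vec::new).extend(variants);
    }
    map
}
```

Calling `parse_unihan_field(contents, "kSpoofingVariant")` on the contents of the relevant Unihan file would give a map that could be merged into the existing confusable tables. Whether those pairs should be treated exactly like other confusables is a separate question, given how often they appear together in ordinary text.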
cc @Manishearth