forked from huonw/unicode_names
-
Notifications
You must be signed in to change notification settings - Fork 11
Replace built-in MPH with a dependency on the 'phf' crate #21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
I am sorry. I was totally forgetting about this patch. |
I created #24 |
#12:
#24:
( #24 with a hacky patch to merge the PHFsdiff --git a/build.rs b/build.rs
index 8caa8c1..dedbffe 100644
--- a/build.rs
+++ b/build.rs
@@ -19,11 +19,6 @@ fn main() {
{
let mut generated_phf_path = out_dir.clone();
generated_phf_path.push("generated_phf.rs");
- generator::generate_phf(UNICODE_DATA, Some(&generated_phf_path), None);
- }
- {
- let mut generated_alias_path = out_dir;
- generated_alias_path.push("generated_alias.rs");
- generator::generate_aliases(NAME_ALIASES, &generated_alias_path);
+ generator::generate_phf(UNICODE_DATA, NAME_ALIASES, Some(&generated_phf_path), None);
}
}
diff --git a/generator/src/lib.rs b/generator/src/lib.rs
index 11b8103..bfad690 100644
--- a/generator/src/lib.rs
+++ b/generator/src/lib.rs
@@ -400,7 +400,7 @@ fn get_truncated_table_data(
(codepoint_names, cjk)
}
-pub fn generate_phf(unicode_data: &'static str, path: Option<&Path>, truncate: Option<usize>) {
+pub fn generate_phf(unicode_data: &'static str, name_aliases: &'static str, path: Option<&Path>, truncate: Option<usize>) {
let (codepoint_names, _) = get_truncated_table_data(unicode_data, truncate);
let mut ctxt = make_context(path);
@@ -408,6 +408,10 @@ pub fn generate_phf(unicode_data: &'static str, path: Option<&Path>, truncate: O
for &(character, name) in &codepoint_names {
builder.entry(name.as_bytes(), &format!("{:?}", character));
}
+ for Alias { code, alias, .. } in get_aliases(name_aliases).into_iter() {
+ let formatted = format!("'\\u{{{code}}}'");
+ builder.entry(alias.as_bytes(), &formatted);
+ }
write!(
&mut ctxt.out,
"pub static NAMES: phf::Map<&'static [u8], char> = {};",
@@ -432,13 +436,3 @@ pub fn generate(unicode_data: &'static str, path: Option<&Path>, truncate: Optio
fs::rename(path.with_extension("tmp"), path).unwrap()
}
}
-
-pub fn generate_aliases(name_aliases: &'static str, path: &Path) {
- let mut aliases = phf_codegen::Map::new();
- for Alias { code, alias, .. } in get_aliases(name_aliases).into_iter() {
- let formatted = format!("'\\u{{{code}}}'");
- aliases.entry(alias, &formatted);
- }
- let aliases = aliases.build().to_string().replace("(\"", "(b\"");
- writeln!(BufWriter::new(File::create(path).unwrap()), "{aliases}",).unwrap();
-}
diff --git a/src/lib.rs b/src/lib.rs
index 5768386..05e00dd 100755
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -86,11 +86,6 @@ mod generated_phf {
#[allow(dead_code)]
mod jamo;
-/// A map of unicode aliases to their corresponding values.
-/// Generated in generator
-#[allow(dead_code)]
-static ALIASES: phf::Map<&'static [u8], char> = include!(concat!(env!("OUT_DIR"), "/generated_alias.rs"));
-
mod iter_str;
static HANGUL_SYLLABLE_PREFIX: &str = "HANGUL SYLLABLE ";
@@ -291,16 +286,6 @@ pub fn name(c: char) -> Option<Name> {
}
}
-/// Get alias value from alias name, returns `None` if the alias is not found.
-fn character_by_alias(name: &[u8]) -> Option<char> {
- println!(
- "name {name}, {alias:?}",
- name = std::str::from_utf8(name).unwrap(),
- alias = ALIASES.get(name)
- );
- ALIASES.get(name).copied()
-}
-
/// Find the character called `name`, or `None` if no such character
/// exists.
///
@@ -378,7 +363,7 @@ pub fn character(search_name: &str) -> Option<char> {
let codepoint = match generated_phf::NAMES.get(search_name) {
Some(&codepoint) => codepoint,
- None => return character_by_alias(search_name),
+ None => return None,
};
// Now check that this is actually correct. Since this is a
@@ -587,9 +572,9 @@ mod tests {
#[test]
fn character_by_alias() {
- assert_eq!(super::character_by_alias(b"NEW LINE"), Some('\n'));
- assert_eq!(super::character_by_alias(b"BACKSPACE"), Some('\u{8}'));
- assert_eq!(super::character_by_alias(b"NOT AN ALIAS"), None);
+ assert_eq!(super::character("NEW LINE"), Some('\n'));
+ assert_eq!(super::character("BACKSPACE"), Some('\u{8}'));
+ assert_eq!(super::character("NOT AN ALIAS"), None);
}
#[bench]
|
So it seems pretty clear the built-in PHF is (predictably) still better than generic PHF even on aliases. Too bad. |
Closed
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Before:
After:
while running on i5-1240P (downclocked to 2GHz)
so performance is similar, but
character_10000
is noticeably slower, and the binary size grew by 5MB, which is not negligeable.@youknowone Could you try rewriting #12 based on this? I'd like to know the performance and size difference between your current code (keeps the custom MPH, and adds a
phf
MPH just for the aliases) and using a singlephf
MPH for both tables.