-
Notifications
You must be signed in to change notification settings - Fork 5.5k
ZJIT: Compile toregexp #14200
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ZJIT: Compile toregexp #14200
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, thank you
@@ -2823,6 +2842,14 @@ pub fn iseq_to_hir(iseq: *const rb_iseq_t) -> Result<Function, ParseError> { | |||
let insn_id = fun.push_insn(block, Insn::StringConcat { strings, state: exit_id }); | |||
state.stack_push(insn_id); | |||
} | |||
YARVINSN_toregexp => { | |||
let opt = get_arg(pc, 0).as_usize(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we call this something more descriptive and also store it in a way that's more conducive to examination than a raw int?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or at least, if we store it as a raw int---which we'll need to pass into codegen later---print it out more nicely as OR-ed flags?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we call this something more descriptive
I want this to be consistent with whatever name insns.def uses (currently opt
). Since we have to compare the interpreter implementation and the ZJIT implementation for checking compatibility, it'd be harder to maintain if the names have inconsistency.
If you need a description, we should add a comment. If you do need a rename of the variable, please rename that in insns.def first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, fair.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want this to be consistent with whatever name insns.def uses
That's the rule I've been following for naming things too. 👍🏻
print it out more nicely as OR-ed flags?
I added some fancier printing (I wasn't printing the options at all before), plus a comment about what opt is. Does that seem sufficient?
83a03da
to
08ac56e
Compare
08ac56e
to
23929ad
Compare
This comment has been minimized.
This comment has been minimized.
23929ad
to
41b24ec
Compare
zjit/src/hir.rs
Outdated
} | ||
|
||
let opt = *opt as u32; | ||
if opt & (ONIG_OPTION_MULTILINE | ONIG_OPTION_IGNORECASE | ONIG_OPTION_EXTEND) > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there are a bunch more of these like ONIG_OPTION_DONT_CAPTURE_GROUP
. Maybe something like this?
/// make a pair of stringified and int
macro_rules! strenum {
($name:ident) => {
(stringify!($name), $name)
};
}
static ALL_FLAGS: &[(&str, u32)] = &[
strenum!(ONIG_OPTION_NONE),
strenum!(ONIG_OPTION_IGNORECASE),
strenum!(ONIG_OPTION_EXTEND),
strenum!(ONIG_OPTION_MULTILINE),
strenum!(ONIG_OPTION_SINGLELINE),
strenum!(ONIG_OPTION_FIND_LONGEST),
];
/// iterate over ALL_FLAGS and call bitflag!
macro_rules! bitflags {
($writer:expr, $flag:expr) => {
let mut sep = "";
for (name, flag) in ALL_FLAGS {
if $flag & flag != 0 {
write!($writer, "{sep}{name}")?;
sep = "|";
}
}
};
}
impl std::fmt::Display for F {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
bitflags!(f, self.options);
Ok(())
}
}
Please pardon the Rust---I'm very much not an expert
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually this is cleaner I think
macro_rules! bitflags {
($writer:expr, $bitfield:expr, $($flag:ident),*) => {
let mut sep = "";
$(
if $bitfield & $flag != 0 {
write!($writer, "{sep}{}", stringify!($flag))?;
sep = "|";
}
)*
};
}
impl std::fmt::Display for F {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
bitflags!(f, self.options, ONIG_OPTION_NONE, ONIG_OPTION_IGNORECASE, ONIG_OPTION_EXTEND, ONIG_OPTION_MULTILINE, ONIG_OPTION_SINGLELINE, ONIG_OPTION_FIND_LONGEST);
Ok(())
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(but please get all the relevant flags from include/ruby/onigmo.h
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I don't think all of those onigmo flags match exactly what is in opt
here. I modeled this printing after
static char * | |
option_to_str(char str[OPTBUF_SIZE], int options) | |
{ | |
char *p = str; | |
if (options & ONIG_OPTION_MULTILINE) *p++ = 'm'; | |
if (options & ONIG_OPTION_IGNORECASE) *p++ = 'i'; | |
if (options & ONIG_OPTION_EXTEND) *p++ = 'x'; | |
*p = 0; | |
return str; | |
} |
/* see Regexp.options and Regexp.new */ | |
rb_define_const(rb_cRegexp, "IGNORECASE", INT2FIX(ONIG_OPTION_IGNORECASE)); | |
/* see Regexp.options and Regexp.new */ | |
rb_define_const(rb_cRegexp, "EXTENDED", INT2FIX(ONIG_OPTION_EXTEND)); | |
/* see Regexp.options and Regexp.new */ | |
rb_define_const(rb_cRegexp, "MULTILINE", INT2FIX(ONIG_OPTION_MULTILINE)); | |
/* see Regexp.options and Regexp.new */ | |
rb_define_const(rb_cRegexp, "FIXEDENCODING", INT2FIX(ARG_ENCODING_FIXED)); | |
/* see Regexp.options and Regexp.new */ | |
rb_define_const(rb_cRegexp, "NOENCODING", INT2FIX(ARG_ENCODING_NONE)); |
So 16 is ARG_ENCODING_FIXED
and 32 is ARG_NO_ENCODING
. Those don't match up with the ognimo options with the same values. NOENCODING with the n
flag, for example:
% ruby --dump=insn -e '/#{}/n'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,6)>
0000 putobject "" ( 1)[Li]
0002 putnil
0003 dup
0004 objtostring <calldata!mid:to_s, argc:0, FCALL|ARGS_SIMPLE>
0006 anytostring
0007 toregexp 32, 2
0010 leave
% ruby -e 'p Regexp::NOENCODING; p //n.options'
32
32
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ARG_NO_ENCODING is easy enough to display as n
. I'm not sure how best to print ARG_ENCODING_FIXED though since it seems like the option itself doesn't tell us whether it's an e, s, or u.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.......huh. That's a weird subset. I don't think we need to print as one-char; printing as OR-ed long-name is probably fine (like the function call flags). Thank you for investigating. I trust your judgement here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try printing it as the Ruby constant name like IGNORECASE|EXTENDED|MULTILINE|FIXEDENCODING|NOENCODING
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I've got something that works and I think covers all the possible options. I didn't end up using the macros you offered above since using the Ruby constant name made them somewhat less useful, but I'm open to suggestions if there's a more elegant way to do this.
0bb4382
to
a58a4a6
Compare
`toregexp` is fairly similar to `concatstrings`, so this commit extracts a helper for pushing and popping operands on the native stack. There's probably opportunity to move some of this into lir (e.g. Alan suggested a push_many that could use STP on ARM to push 2 at a time), but I might save that for another day.
617c0b4
to
912903c
Compare
toregexp
is fairly similar toconcatstrings
, so this commit extracts a helper for pushing and popping operands on the native stack.There's probably opportunity to move some of this into lir (e.g. Alan suggested a push_many that could use STP on ARM to push 2 at a time), but I might save that for another day.