Skip to content

Add MatchData#deconstruct/deconstruct_keys #6216

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 10, 2022

Conversation

palkan
Copy link
Contributor

@palkan palkan commented Aug 6, 2022

Feature #18821

The #deconstruct_keys behaviour is inferred from the Struct#deconstruct_keys, in particular:

  1. Both Symbols and Strings could be passed as the required keys, the type is preserved.
  2. If the number of the passed keys is greater than the number of the named groups, an empty Hash is returned.
  3. Collecting the keys stops as soon as an unknown key is encountered; the returned Hash contains all the keys collected so far.

IMO, 1) and 2) are questionable. But since we already do that for Structs, it makes sense to preserve the current behaviors and not introduce a new one. Alternatively, we can re-visit the Struct API (which I proposed once, but it was rejected).

/cc @baweaver

@k-tsj
Copy link
Member

k-tsj commented Sep 8, 2022

@palkan
Sorry for late review.
Could you update NEWS.md if possible? Then, I'll merge this.

  1. Both Symbols and Strings could be passed as the required keys, the type is preserved.
  2. If the number of the passed keys is greater than the number of the named groups, an empty Hash is returned.

These are not required. In case of Struct, I implemented them this way for performance.

@palkan palkan force-pushed the feat/match-data-deconstruct branch from 8019674 to be215ab Compare September 10, 2022 01:46
@palkan
Copy link
Contributor Author

palkan commented Sep 10, 2022

@k-tsj Thanks for the review! Updated.

re.c Outdated
return rb_hash_new_with_size(0);
}

names = rb_reg_names(RMATCH(match)->regexp);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is better to delay calling rb_reg_names until it is really needed.
Pseudo code is here:

if (NIL_P(keys)) {
  VALUE names = rb_reg_names(RMATCH(match)->regexp);
  ...
}

...

if (onig_number_of_names(RREGEXP_PTR(RMATCH(match)->regexp))) < RARRAY_LEN(keys)) {
  return rb_hash_new_with_size(0);
}

(In addition to that, I think rb_reg_names in if (NIL_P(keys)) { could be replaced by onig_foreach_name so that we can avoid rb_ary_new_capa call in rb_reg_names.)

@k-tsj
Copy link
Member

k-tsj commented Sep 10, 2022

@palkan Thanks for updating NEWS!
I added some review comments, could you check it?

@palkan palkan force-pushed the feat/match-data-deconstruct branch from be215ab to 8574e6f Compare September 12, 2022 00:28
Comment on lines +2290 to +2292
if (symbolize > 0) {
key = rb_str_intern(key);
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@k-tsj I decide to re-use the existing iterator function (from #named_captures), since we need almost the same functionality for #deconstruct_keys. WDYT?

(And we can probably go further and add symbolize_keys: trueto#named_captures`, which could be useful on its own)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@palkan It looks good. (IMHO, u3->state is preferable to u3->cnt)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. Updated

@palkan palkan force-pushed the feat/match-data-deconstruct branch from 8574e6f to 7cbc61c Compare September 20, 2022 16:12
@k-tsj k-tsj merged commit 4954c9f into ruby:master Oct 10, 2022
@palkan palkan deleted the feat/match-data-deconstruct branch October 11, 2022 18:12
tomstuart added a commit to tomstuart/wasminna that referenced this pull request Oct 20, 2022
I do have to refine MatchData to make this work, but @paracycle pointed
out that Ruby 3.2 will contain this anyway [0], so I’m happy for it to
live here until Christmas, at which point I’m sure I’ll still not have
finished this project.

[0] ruby/ruby#6216
mudge added a commit to mudge/re2 that referenced this pull request Oct 22, 2022
GitHub: #57

In order to support pattern matching in Ruby, add a deconstruct_keys
method to RE2::MatchData that will return all named matches if given nil
or a subset of matches if given an array of capturing group names as
symbols.

To match [Ruby's implementation of
this](ruby/ruby#6216) there are a few edge
cases:

1. If the number of keys given exceeds the number of named capturing
   groups, immediately return an empty hash.
2. Each key is added to the hash in turn and the first invalid key will
   cause the hash to be returned so it is possible to get a partial set
   of named captures.
mudge added a commit to mudge/re2 that referenced this pull request Oct 22, 2022
GitHub: #57

In order to support pattern matching in Ruby, add a deconstruct_keys
method to RE2::MatchData that will return all named matches if given nil
or a subset of matches if given an array of capturing group names as
symbols.

To match [Ruby's implementation of
this](ruby/ruby#6216) there are a few edge
cases:

1. If the number of keys given exceeds the number of named capturing
   groups, immediately return an empty hash.
2. Each key is added to the hash in turn and the first invalid key will
   cause the hash to be returned so it is possible to get a partial set
   of named captures.
mudge added a commit to mudge/re2 that referenced this pull request Oct 22, 2022
GitHub: #57

In order to support pattern matching in Ruby, add a deconstruct_keys
method to RE2::MatchData that will return all named matches if given nil
or a subset of matches if given an array of capturing group names as
symbols.

To match [Ruby's implementation of
this](ruby/ruby#6216) there are a few edge
cases:

1. If the number of keys given exceeds the number of named capturing
   groups, immediately return an empty hash.
2. Each key is added to the hash in turn and the first invalid key will
   cause the hash to be returned so it is possible to get a partial set
   of named captures.
mudge added a commit to mudge/re2 that referenced this pull request Oct 22, 2022
GitHub: #57

In order to support pattern matching in Ruby, add a deconstruct_keys
method to RE2::MatchData that will return all named matches if given nil
or a subset of matches if given an array of capturing group names as
symbols.

To match [Ruby's implementation of
this](ruby/ruby#6216) there are a few edge
cases:

1. If the number of keys given exceeds the number of named capturing
   groups, immediately return an empty hash.
2. Each key is added to the hash in turn and the first invalid key will
   cause the hash to be returned so it is possible to get a partial set
   of named captures.
tomstuart added a commit to tomstuart/wasminna that referenced this pull request Nov 15, 2022
This is part of Ruby 3.2.0-preview3 [0], so we no longer need to define
it ourselves.

[0] ruby/ruby#6216
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants