Skip to content

refactor: API for deriving customized versions of the base rules #2610

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 19 commits into from
Mar 13, 2025

Conversation

rickeylev
Copy link
Collaborator

@rickeylev rickeylev commented Feb 14, 2025

This implements a "builder style" API to allow arbitrary modification of rule, attr, etc
objects used when defining a rule. The net effect is users are able to use the base
definition for our rules, but define their own with the modifications they need, without
having to copy/paste portions our implementation, load private files, or patch source.

The basic way it works is a mutable object ("builder") holds the args and state that would
be used to create the immutable Bazel object. When build() is called, the immutable
Bazel object (e.g. attr.string()) is created. Builders are implemented for most objects
and their settings (rule, attrs, and supporting objects).

This design is necessary because of three Bazel behaviors:

  • attr etc objects are immutable, which means we must keep our own state
  • attr etc objects aren't inspectable, which means we must store the arguments for
    creating the immutable objects.
  • Starlark objects are frozen after initial bzl file evaluation, which means creation
    of any mutable object must be done at the point of use.

The resulting API resembles the builder APIs common in other languages:

r = create_py_binary_rule_builder()
r.attrs.get("srcs").set_mandatory(True)
r.attrs.get("deps").aspects().append(my_aspect)
my_py_binary = r.build()

Most objects are thin wrappers for managing a kwargs dict. As such, and because they're
wrapping a foreign API, they aren't strict in enforcing their internal state and the
kwargs dict is publicly exposed as an escape hatch.

As of this PR, no public API for e.g. create_py_binary_rule_builder() is exposed. That'll come in a separate PR (to add public access points under python/api).

Work towards #1647

@rickeylev rickeylev force-pushed the refactor.attr.builders branch from 24616e2 to f934e94 Compare February 14, 2025 02:59
@rickeylev
Copy link
Collaborator Author

The diff here is pretty big, so I'll summarize the changes:

In attributes.bzl, the dict[name, Attribute] dicts are changed to dict[name, <lambda to create builder>]. Similar for the other dicts of attributes elsewhere. This is "necessary" because of the immutability rules of bazel.

In rule_builders.bzl, all the various builder classes to wrap the inputs to rule, aspect, attr.XXX, etc etc are implemented. It's standard "object oriented" starlark code.

Most of the rest of the changes are just whitespace, some incidental moving of code, or notes.

I put "necessary" in quotes earlier because it's necessary in the sense of providing a mutable object. It's not necessary in the sense of: I think most things about a rule won't be modified. So it feels a bit wasteful.

With that in mind, I did prototype another idea: store both a built object and a callable to create a builder for it. e.g. struct(built=<Attribute> new_builder=lambda: FooBuilder(...)). This worked, but the API on the user side was really annoying (you had to convert the immutable to mutable, modify it, then reassign it where it came from). You can see how that lambda for the builder is still there.

I also prototyped using regular ol' dicts throughout. This largely eliminated the builder wrappers, but dicts also had to have a special @constructor key so e.g. attributes could call the right attr.xxx function. The UX of this was mixed. Just imagine a big nested dict of all the stuff that gets fed into rule(), transition(), attr.xxx, etc, etc. You end up doing a lot of manual "book keeping" interacting with them. Same as before, the lambdas for the "builders" are still there, they just create plain dicts instead of structs.

I also briefly explored using a visitor pattern, but no, that was really bad UX.

There's really no escaping having a function to create "builders" of some sort.

Polymorphic attributes (rule.cfg, attr.cfg take several different data types), and things like providers/aspects (lists of objects that aren't easily identified/introspected) were also complications.

Those complications eventually lead me to lean more into the builder pattern, my reasoning being: at least we control the API surface and can make it mostly ergonomic.

It leaves me pretty unhappy with the internal details of what attributes.bzl looks like, though. A bunch of dict[name, lambda] being merged into one another feels very meh.

In the end, I'm sort of happy with this? Not as happy as I was hoping to be. I'm tempted to shelve it and look for another way to expose a way to create a derived rule.

Maybe keep the current def create_xx_rule(...) functions, but expose functions to create each attribute. So if you want to overide, you do create_xx_rule(attrs={"srcs": create_srcs(...)}). Ironically, this means having a function for creating each attribute, which, yep, puts us back to where we started.

Copy link
Collaborator

@aignas aignas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few thoughts reading this.

Copy link
Collaborator Author

@rickeylev rickeylev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds like we both like 1a+2b (struct-api with everything mutable by default), so'll i'll clean things up to do it that way.

Thanks for looking and thinking!

@rickeylev rickeylev force-pushed the refactor.attr.builders branch from f934e94 to b9902c8 Compare March 4, 2025 04:21
@rickeylev rickeylev marked this pull request as ready for review March 4, 2025 04:22
@rickeylev
Copy link
Collaborator Author

I expanded this into the struct-with-mutables style API. Notable changes:

  • Per chat: removed Attr, Builder suffixes; split out attr_builders.bzl. Much nicer.
  • attrb, ruleb names: I found attr_builders.XXX to be pretty repetitively verbose. I accidentally typed attrb and then liked it. The "b" standing for "builder".
  • Added docs using the sphinxdocs TYPEDEF feature: preview: https://rules-python--2610.org.readthedocs.build/en/2610/api/rules_python/python/private/rule_builders.html#Rule
  • Added Value; it's like Optional, but without the .present() and won't fail if accessed.
  • Replaced Optional with Value for most attributes. Only allow_files and allow_single_file actually need Optional-type behavior (those two args are mutually exclusive, even if their values are compatible)
  • Added implementations for all other attributes
  • Added builders for toolchain type and exec group.

The main thing missing now are tests, I think.

Unfortunately, I accidentally amended somewhere in my commits and had to force-push, so the diff is "reset".

@rickeylev
Copy link
Collaborator Author

I started adding tests and three things became kind of annoying: polymorphic fields, Optional, and .extra_kwargs.

Label's allow_files is particularly annoying because its bool | list[str] and mutually exclusive with allow_single_file. Interacting with this is tedious: I can't append a value unless I know it's a list. I can't set it to a list because that could overwrite existing values. Then I have to ensure allow_single_file isn't set. So one must do:

if not obj.allow_files.present() or not types.is_list(obj.allow_files.get()):
  obj.allow_files.set([])
  obj.allow_single_file.clear()
obj.allow_files.get().append('.txt')

It took me 3 tries to get that code above right.

The default field has a similar problem; being a mutable collection or some non-collection value. e.g. LabelList's default is list | callable, similar for LabelStringDict, etc.

The extra_kwargs escape hatch also became awkward. e.g. foo.extra_kwargs[x] = y might be set, but foo.x.whatever() won't reflect that state at all. Which is kind of ok (it's an escape hatch to do arbitrary overrides), but always bothered me. Writing tests made it more salient that its confusing.


So I'm changing the API and impl to:

  • Mostly drop Value and Optional entirely. getters/setters are used instead (doc()/set_doc()). This makes it easier to "intercept" changes and keep the whole AttributeBuilder in a correct state. I also like that the conceptual overhead of Value/Optional go away.
  • Change self.extra_kwargs to self.kwargs. Under the hood, self.kwargs acts to hold the mutable state for the object (h/t to 2578 for this idea). This neatly unifies the "escape hatch" with the regular AttributeBuilder APIs. I also generally like that it's conceptually like Python's foo.__dict__, which holds the local state of foo. These builders are, essentially, trying to be convenience wrappers around the kwarg dicts that we want to be mutable.

In combination, it means being able to do e.g.

obj = LabelList()
obj.add_allow_files(".py")
obj.add_default("//some:target")
obj = StringKeyedLabelDict()
obj.add_default("key", "//foo:bar")

And the functions can transparently handle if the previous state was a a bool, function, etc. It also means None becomes the value which means "missing", which feels more natural. e.g. obj.allow_files() returns None and, under the hood, that state is used to keep it omitted from the final kwargs dict generation. So overall, it feels more ergonomic.

@rickeylev rickeylev changed the title wip: API for deriving customized versions of the base rules refactor: API for deriving customized versions of the base rules Mar 10, 2025
@rickeylev rickeylev mentioned this pull request Mar 10, 2025
Copy link
Collaborator

@aignas aignas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will look at the biggest files later.

Copy link
Collaborator

@aignas aignas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comments I have are not blocking. Thanks for designing this. This looks like an easy to maintain solution.

Copy link
Collaborator Author

@rickeylev rickeylev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: I started writing docs with some examples for a followup PR, and it looks pretty good. The main shortcoming I see now is that the implementation function for a rule is very opaque (ctx in, list of providers out). So, that'll be the next thing to figure out, but that's largely separate from "how to define a rule".

@rickeylev rickeylev enabled auto-merge March 13, 2025 23:12
@rickeylev rickeylev added this pull request to the merge queue Mar 13, 2025
Merged via the queue into bazel-contrib:main with commit 389431b Mar 13, 2025
4 checks passed
@rickeylev rickeylev deleted the refactor.attr.builders branch March 15, 2025 18:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants