Skip to content

test: Handle Filter flake with ctx errors #7119

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Apr 14, 2023
Merged

Conversation

Emyrk
Copy link
Member

@Emyrk Emyrk commented Apr 13, 2023

Fixes: #7114

@Emyrk Emyrk requested a review from johnstcn April 13, 2023 15:55
@johnstcn johnstcn requested a review from spikecurtis April 13, 2023 16:26
Copy link
Member

@johnstcn johnstcn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks right to me, I think this is a tricky flake to hit.
Can we add a test to assert this behaviour?

Copy link
Contributor

@spikecurtis spikecurtis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A variation on this bug probably exists in all the places in the code we call Authorize

@@ -137,6 +137,9 @@ func Filter[O Objecter](ctx context.Context, auth Authorizer, subject Subject, a
err := auth.Authorize(ctx, subject, action, o.RBACObject())
if err == nil {
filtered = append(filtered, o)
} else if ctx.Err() != nil {
// Exit early if the error comes from the context
return nil, ctx.Err()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code implicitly assumes that there are only 2 classes of errors that could be returned from auth.Authorize

  1. the authorization completed, but the subject is not authorized
  2. the context is Done

Prior to this change, this code implicitly assumed that there was only 1 class of errors.

It is difficult to be sure that the right answer is 2. Could it be 3 classes? 4?

Honestly, I think a better design would be for auth.Authorize to return two values --- one that tells you whether the subject is authorized, and another to tell you whether or not there was an error checking the authorization.

Failing that, it's generally better for error checking to be explicit and specific:

err := auth.Authorize(...)
if xerrors.Is(err, rbac.Unauthorized) {
    continue
}
if err != nil {
    return nil, err
}
filtered = append(filtered, o)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, thinking about this more, it's not about implicit vs explicit. It's about how the code reacts if Authorize() returns an unexpected error. By only handling the context case, we'd be the in the same boat again of swallowing the unexpected error and treating it like an unauthorized response.

If we instead handle the expected errors, and return any others, we're less likely to incorrectly ignore them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are the 2 classes of error returned by authorize:

coder/coderd/rbac/authz.go

Lines 326 to 333 in 6d41edc

results, err := a.query.Eval(ctx, rego.EvalParsedInput(astV))
if err != nil {
return ForbiddenWithInternal(xerrors.Errorf("eval rego: %w", err), subject, action, object, results)
}
if !results.Allowed() {
return ForbiddenWithInternal(xerrors.Errorf("policy disallows request"), subject, action, object, results)
}

Right now both are ForbiddenWithInternal. Maybe the correct solution here is to return something different on the Eval error.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then I can check for rbac.IsUnauthorizedError() on the Filter function loop

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it gets a bit more annoying in the prepared one.

EachQueryLoop:

🤔

@spikecurtis
Copy link
Contributor

This looks right to me, I think this is a tricky flake to hit. Can we add a test to assert this behaviour?

Yeah, a test would be good. We generate a list of say, 3 Objecters --- but the middle one is a little bomb that, as a side effect to it's RBACObject() call, cancels the context. Assert that we get the error back instead of a list of 2.

Copy link
Member Author

@Emyrk Emyrk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unit test added.
Correctly made 2 categories of errors.

I also added some code to clean up cancelled errors to be more uniform with expected context.Cancelled

Comment on lines +330 to +331
err = correctCancelError(err)
return xerrors.Errorf("evaluate rego: %w", err)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For authorize single items.

Comment on lines +442 to +443
err = correctCancelError(err)
return xerrors.Errorf("eval error: %w", err)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For prepared.

@Emyrk Emyrk requested review from johnstcn and spikecurtis April 14, 2023 15:37
@Emyrk
Copy link
Member Author

Emyrk commented Apr 14, 2023

Ugh a unit test is really hard to make for this.

The ctx is checked for cancel in a go routine: https://github.com/open-policy-agent/opa/blob/main/rego/rego.go#L2002-L2004

If the policy executes fast enough, then the ctx is ignored. Which means this is racy.

@Emyrk Emyrk changed the title test: Handle Fitler flake with ctx errors test: Handle Filter flake with ctx errors Apr 14, 2023
@Emyrk Emyrk merged commit 2137db0 into main Apr 14, 2023
@Emyrk Emyrk deleted the stevenmasley/rbac_context_err branch April 14, 2023 17:30
@github-actions github-actions bot locked and limited conversation to collaborators Apr 14, 2023
@spikecurtis
Copy link
Contributor

Ugh a unit test is really hard to make for this.

The ctx is checked for cancel in a go routine: https://github.com/open-policy-agent/opa/blob/main/rego/rego.go#L2002-L2004

If the policy executes fast enough, then the ctx is ignored. Which means this is racy.

The standard procedure for when your code needs to call some 3rd party thing that is non-deterministic, is to mock it out and have a test for each case/class of what the non-deterministic thing can return.

@Emyrk
Copy link
Member Author

Emyrk commented Apr 18, 2023

@spikecurtis yea, I thought mocking out the rego would be tough because they are all structs. But I just need to mock out our authz interface. I'll make this test work in another PR and open it again.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RBAC filtering truncates filterset instead of returning error if context expires
3 participants