Skip to content

S3/ASF parser: fix parsing header list shape type with multivalue headers #13022

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

bentsku
Copy link
Contributor

@bentsku bentsku commented Aug 18, 2025

Motivation

We got a report with #12999 that sending a GetObjectAttributes request would fail depending on the order of the fields being passed.

While debugging and printing out low level HTTP request on the Java side, I realized that Java was sending the request with multiple headers for the same key, like some others SDK (Go in particular)

GET /test-key?attributes= HTTP/1.1
[...]
Amz-Sdk-Invocation-Id: 5cb65548-6947-4436-b731-ee6d618bc91c
Amz-Sdk-Request: attempt=1; max=3
Authorization: ---
X-Amz-Date: 20220316T000334Z
X-Amz-Object-Attributes: Checksum
X-Amz-Object-Attributes: ObjectParts

When receiving headers that way, our web server will return them as a list of values, accessible via Headers.getlist().

Because the parser was calling Headers.get(), it would only get the first value, and in the user sample above, would not even return the second requested attribute.

This PR adapts the parser to use getlist in the header shape type is list, and keep the old automatic splitting on , that is still required.

Changes

  • add a parser unit test (thanks a lot for adding the headers override! This was so much easier to write a test for it!!!)
  • fix the parsing logic to handle multivalue headers
  • add an S3 test around multivalue headers (requests concatenates multivalue headers by default but urllib3 does not-
  • add an order S3 test because my first intuition was related to the order of returned fields

Testing

If you remove the parser fix, the tests now fail.
I also validated that the user-provided sample in the linked issue now works 👍

@bentsku bentsku added this to the 4.8 milestone Aug 18, 2025
@bentsku bentsku self-assigned this Aug 18, 2025
@bentsku bentsku added aws:s3 Amazon Simple Storage Service area: asf semver: patch Non-breaking changes which can be included in patch releases labels Aug 18, 2025
Copy link

github-actions bot commented Aug 18, 2025

S3 Image Test Results (AMD64 / ARM64)

    2 files    2 suites   8m 11s ⏱️
  517 tests 467 ✅  50 💤 0 ❌
1 034 runs  934 ✅ 100 💤 0 ❌

Results for commit 4df5e5c.

♻️ This comment has been updated with latest results.

Copy link

github-actions bot commented Aug 18, 2025

Test Results - Preflight, Unit

22 140 tests  +1   20 404 ✅ ±0   6m 43s ⏱️ +16s
     1 suites ±0    1 735 💤 ±0 
     1 files   ±0        1 ❌ +1 

For more details on these failures, see this check.

Results for commit 4df5e5c. ± Comparison against base commit 5bee08e.

♻️ This comment has been updated with latest results.

Copy link

Test Results (amd64) - Acceptance

7 tests   5 ✅  3m 4s ⏱️
1 suites  2 💤
1 files    0 ❌

Results for commit ac1d143.

Copy link

Test Results (amd64) - Integration, Bootstrap

    5 files      5 suites   2h 21m 10s ⏱️
4 991 tests 4 399 ✅ 592 💤 0 ❌
4 997 runs  4 399 ✅ 598 💤 0 ❌

Results for commit ac1d143.

@bentsku bentsku marked this pull request as ready for review August 18, 2025 19:48
Copy link

LocalStack Community integration with Pro

    2 files      2 suites   1h 43m 3s ⏱️
4 632 tests 4 192 ✅ 440 💤 0 ❌
4 634 runs  4 192 ✅ 442 💤 0 ❌

Results for commit ac1d143.

Copy link
Member

@thrau thrau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's a nice catch! LGTM

Copy link
Member

@alexrashed alexrashed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely great catch! I am just wondering about a super-edge case, which really doesn't block a merge here. 💯

# headers may contain a comma separated list of values (e.g., the ObjectAttributes member in
# s3.GetObjectAttributes), so we prepare it here for the handler, which will be `_parse_list`.
# Header lists can contain optional whitespace, so we strip it
# https://www.rfc-editor.org/rfc/rfc9110.html#name-lists-rule-abnf-extension
payload = [value.strip() for value in payload.split(",")]
# It can also directly contain a list of headers
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought (non-blocking): Is it also possible that there are multiple headers with the same key, some contain a single value, and some contain multiple values (comma-separated)?

Copy link
Contributor Author

@bentsku bentsku Aug 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also thought of this one 👀 I was thinking about splitting all the individual values in the list over , at first. It's a bit tricky, because I know for example Set-Cookie (which was one of the first case I had with multi values headers in localstack/rolo#29) can contain commas in the value, which is why they are sending it a multiple values with the same key instead of a comma-separated single header.

The RFC 2616 states the following:

Multiple message-header fields with the same field-name MAY be
present in a message if and only if the entire field-value for that
header field is defined as a comma-separated list [i.e., #(values)].
It MUST be possible to combine the multiple header fields into one
"field-name: field-value" pair, without changing the semantics of the
message, by appending each subsequent field-value to the first, each
separated by a comma. The order in which header fields with the same
field-name are received is therefore significant to the
interpretation of the combined field value, and thus a proxy MUST NOT
change the order of these field values when a message is forwarded.

So it's not super clear either. I'll give it a try, and depending on the result will update or merge.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried with curl, which treats multi value headers properly but couldn't get it to run against AWS because GetObjectAttributes needs to be signed, so I've also tried with awscurl but it doesn't manage multi value headers properly and just overwrite with the last one specified 😕

But reading the RFC, I'm thinking of actually joining all returned values from getlist over ,, then splitting them like the previous behavior, following this particular part: It MUST be possible to combine the multiple header fields into one "field-name: field-value" pair, without changing the semantics of the message, by appending each subsequent field-value to the first, each separated by a comma.

I'll update, I think this is probably never gonna happen, but it might simplify the code a little bit 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: asf aws:s3 Amazon Simple Storage Service semver: patch Non-breaking changes which can be included in patch releases
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants