-
Notifications
You must be signed in to change notification settings - Fork 1.1k
urequests: Expose HTTP response headers. #217
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
urequests: Expose HTTP response headers. #217
Conversation
Section 3.2 of RFC 7230 states that HTTP header names are case-insensitive. Further, section 3.2.2 states that HTTP header names do not need to be unique as long as the semantics of the message are retained when concatenating the corresponding values into a comma separated list. When non-unique header names are present, their order is to be preserved in the concatenation. This implementation attempts to comply with these requirements.
You are right that this PR add too much, in both the HttpHeaderDict class and the fact that all headers are stored in this dict. One way around the need for HttpHeaderDict is to just use a normal And then there could be an optional argument to the |
Also #194 |
That's maybe the easiest and a memory efficient solution. But how about instead letting the user provide a custom
This would require only very minimal changes to
|
A simpler headers access change was merged in 5854ae1 as part of #500. While it doesn't meet all the desired behavior here, hopefully it's enough for most users. At this stage it's still case-sensitive but if anyone has suggestions for a light-weight case-insensitive change a new PR would be very welcome. |
Besides lower vs. upper case: The said PR does not allow multiple headers of the same name, such as Anyways: I'm new to MicroPython, so I will not add a PR, nor am I sure about the following suggestion: Is there something like defaultdict available in MicroPython? But even if not, I guess it should be possible to add something similar in line 110 in 5854ae1? CC @pi-anl |
This is controlled by parse_headers param to request(), which defaults to True for compatibility with upstream requests. In this case, headers are available as .headers of Response objects. They are however normal (not case-insensitive) dict. If parse_headers=False, old behavior of ignore response headers is used, which saves memory on the dict. Finally, parse_headers can be a custom function which can e.g. parse only subset of headers (again, to save memory).
If a parse_header function is passed in it should receive any duplicated headers that may have been sent by the server. In fact, even with the fallback dict, if two copies of a header with different case are sent you'll be able to see both of them - this is different to cpython requests library where headers are exposed in a caseless dict - no duplicates will be visible there. Is there a standard compliant use case for duplicate headers you can describe where the current implementation doesn't work? I'm getting mixed information from a Google search on the topic, though it looks like RFC 6265 states you should not have multiple set-cookie headers, there should be one header with string separator between multiple cookies.
It does go on to note that in practice set-cookie is sometimes duplicated in violation of the spec and should be handled separately to cover this. In light of this it could make sense to explicitly look for duplicates of set-cookie, then perhaps combine them properly to expose to the user in the dict? Either way the existing parse_header callback should be usable to handle that already if needed. What are you proposing defaultdict to be used for? I don't see how that would help for duplicates or key case? |
Last in first out: defaultdict. The defaultdict Examples show how to parse_header callback: I will have a look at it. Sorry I didn't do it before, but as I said I'm new to MicroPython, and I struggle in finding docs. In fact, I didn't even find something about getting the headers at all before this PR here. Most important, my "standard compliant use case for duplicate headers". The "standard compliant" is where I most probably will be beaten :-) There's a discussion about that under the Cache-Control-answer on Stack Overflow I linked, btw. But anyways: PHP not only allows to set headers of the same type, it even has a Edit: See the MDN web docs I linked in my first answer:
|
Were you referring to the "Using list as the default_factory" example? Having that means you no longer have a "key: value" set of headers, you have a "key: [list of values]" which would be a break of convention for with other python http libraries. Speaking of which, the usage of this library generally tries to match (a subset of) the usage of the There's also only so much "working around poorly behaving servers" we can efficiently add to a library that needs to be lean for embedded use on a range of devices. |
Well, any web dev will look for sources like that: PHP if it's a backend developer, MDN for frontend devs (or similar, this is not about PHP or MDN, they're just examples). Both state that multiple headers are not only possible (PHP), they are even a "should" (MDN). No mention of concatenating them into one header. You asked for use cases, here you are. No user of the standards look up RFCs, they look up their respective languages' documentation. That being said, I see your point, specially in being somewhat "limited" by compatibility. That's one valid policy, bc why would the expectations of web dev be more important than the expectations of regular Python's devs? ;-) So, as defaultdict and its "list of values" approach isn't a valid return value in line 122 of 5854ae1, maybe it still is handy in gathering the data in line 110? And before returning it, each entry of that defaultdict is In other words: If multiple headers of the same type are supposed to come merged into one, why not merging them into one? |
I didn't see any links to where these other languages are promoting usage of duplicate headers? It's a real shame if they are, as this is why incompatibilities arise and developers are led astray, when anti-patterns that break spec are shown to people who don't know better.
It turns out the built-in python urllib3 (which underpins most python web libraries) used to merge keys this way 15 years ago, but found this caused more problems: https://bugs.python.org/issue1660009 Regardless of what individual language docs / examples might show, a quick Google shows many instances of the controversy over duplicate headers in many languages. Regardless of what servers should be doing, there are clearly some that don't follow spec already out there. I'm not sure what the best way to deal with these arguable broken servers here is... The existing Response.headers is "correct", in that it follows spec and is consistent with cpython requests. The optional headers callback function is non-standard, but is a workaround that should allow developers to deal with these broken servers - perhaps an example of this usage would be good to add to the module here? Alternatively, the multiple headers could be stored internally in a list and exposed some other way to users that need to deal with broken servers; I'm just not sure how common these are and whether it's worth adding code bulk to all users to benefit the few? |
For a project that I'm working on, I need to be able to extract some information from HTTP response headers. This capability is provided by the CPython
requests
library, but is not presently provided by the MicroPythonurequests
library. Hence, I have added it in a consistent way.This change is benficial from a functional standpoint, but there are some downsides with respect to the MicroPython project's goals. In particular, this increases the Python byte code size for the
urequests
module by about 2x (i.e. 4966 vs. 2498 bytes). Of course, this also increases the amount of memory used by theurequests
module at runtime since the HTTP header response headers are now held in memory. For my need this is acceptable and necessary, but it is disappointing to see that the memory consumption is about 5x the raw byte count for the header text. In one example that I looked at, this was 6816 vs. 1383 bytes.Given the above, I'm not necessarily expecting this pull request to be accepted, but I would appreciate feedback and discussion on how to work towards that. Are there any tricks known from working on other modules that I might be able to leverage here?
Here are a few observations that I have:
Since RFC 7230 states that HTTP headers are case-insensitive and do not require unique keys, I implemented a specialized
dict
class to handle this. This is certainly the most consequential decision with respect to byte code size and runtime memory usage. Most every HTTP library has some variation of this. It appears that often base classes in the CPythoncollections
module are used to simplify the implementation somewhat, but those are not available in MicroPython.Initially the HTTP header strings are read via the network connection a line at a time. Each string is then split to extract the key and value that are ultimately added to the header
dict
. An opportunity for improvement would be to some how reuse the initial strings instead of generating new ones along the way to building thedict
. Maybe this could be achieved usingmemoryview
instances, but it seems like it would quickly get complex and require a lot of code.Has any approach to turning on and off features been found with respect to other modules? The best solution might be to have a way to turn on or off HTTP response header handling based on the client's needs. As the
micropython-lib
project continues to get built out I'm guessing such an approach will become relevant in numerous places. Even with respect to theurequests
module, this could be relevant to adding capabilities like redirect or cookie handling.