Clarify `base64.a85(en,de)code` documentation for Adobe mode #134837
Comments
If you need to be more permissive about whitespace, could you call I changed this to a feature since the decoder is standard compliant, and you're asking for a behavior change, but even then I'm not sure if this is something we should make more flexible if there is an easy solution for users that want flexibility around whitespace. |
Yes, that works! In practice I do
In the end, I think it should simply be a documentation change, to make it explicit that I can make a PR for this. |
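One way to be permissive about surrounding whitespace without changing the decoder is to trim the input before decoding. The following is a minimal sketch under that assumption; `a85decode_adobe_stripped` is a hypothetical helper name, not part of `base64`, and it only handles whitespace before `<~` or after `~>`, not whitespace inside the delimiter.

```python
import base64


def a85decode_adobe_stripped(data: bytes) -> bytes:
    """Hypothetical helper: trim ASCII whitespace around the input, then decode.

    bytes.strip() with no argument removes leading and trailing ASCII
    whitespace, so the data handed to a85decode ends exactly with b'~>'.
    """
    return base64.a85decode(data.strip(), adobe=True)


payload = b"stream data"
framed = base64.a85encode(payload, adobe=True)  # framed with b'<~' and b'~>'

# Trailing end-of-line bytes after the delimiter are removed before decoding.
assert a85decode_adobe_stripped(framed + b"\r\n") == payload
```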
Actually there are a few things to be improved in the documentation:
|
Here is a quick solution to repair the broken pdfs: https://github.com/kevinveenbirkenbach/pdf-healer |
Title changed from "`base64.a85decode` throws exception on trailing whitespace in Adobe mode" to "Clarify `base64.a85(en,de)code` documentation for Adobe mode"
I changed this to a docs issue since it sounds like the work that needs to be done here is mostly around documenting the semantic differences when using adobe mode and expanding on what limitations it enforces. @dhdaines would you be interested in making a PR to expand the documentation? https://devguide.python.org/documentation/start-documenting/ |
Absolutely! I already started one, will submit it in the next few days. |
Interesting, didn't realize it was such a widespread issue! The bug has been fixed for a while in (shameless plug) PLAYA-PDF and also more recently in pdfminer.six. |
Bug report
Bug description:
It seems that whitespace is allowed everywhere by `base64.a85decode`, except after the end-of-data delimiter `b'~>'` in `adobe` mode.

While this behaviour is actually compliant with the very latest PDF standard, including errata, in practice it's quite surprising, and it also causes problems due to the legacy of decades of ambiguous PDF standards and implementations that emit and accept extra whitespace because of these ambiguities.

A separate but related issue is that some very broken PDF implementations have even been known to insert whitespace between the `~` and `>` bytes. It may be useful for "Adobe" mode to be tolerant of this as well.

Obviously, also, PostScript doesn't care about extra whitespace after `~>` in ASCII85 literal strings. (Note that the leading `<~` is only accepted in PostScript and not in PDF.)

Because `>` is a valid ASCII85 digit, an improved rule would be to only accept the regular expression `~\s*>\s*` at the end of input in Adobe mode.

CPython versions tested on:
3.11
Operating systems tested on:
Linux
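The rule proposed in the bug description can be sketched as a pre-processing step in front of the standard decoder. This is only an illustration, not how `base64.a85decode` is implemented: `normalize_adobe_eod` and `a85decode_adobe_tolerant` are hypothetical helper names, the input is assumed to be `bytes`, and the sketch relies on `~` not being an Ascii85 digit, so it can only appear as part of a delimiter.

```python
import base64
import re

# Hypothetical helpers, not part of the standard library.
# Pattern: "~", optional whitespace, ">", optional whitespace, anchored at
# the very end of the input (in a bytes pattern, \s is ASCII whitespace).
_EOD_RE = re.compile(rb"~\s*>\s*\Z")


def normalize_adobe_eod(data: bytes) -> bytes:
    """Rewrite a loosely formatted end-of-data marker to the canonical b'~>'."""
    return _EOD_RE.sub(b"~>", data)


def a85decode_adobe_tolerant(data: bytes) -> bytes:
    """Decode Adobe-framed Ascii85, tolerating whitespace in and after '~>'."""
    return base64.a85decode(normalize_adobe_eod(data), adobe=True)


payload = b"Hello, PDF"
framed = base64.a85encode(payload, adobe=True)  # framed with b'<~' and b'~>'

# Simulate a sloppy producer: whitespace inside and after the delimiter.
messy = framed[:-2] + b"~ \n>\r\n"
assert a85decode_adobe_tolerant(messy) == payload
```

Anchoring the pattern at the end of input matters because `>` is itself a valid digit: only a `>` that follows `~` at the very end is treated as part of the delimiter.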