Clarify `base64.a85(en,de)code` documentation for Adobe mode #134837

dhdaines · 2025-05-28T14:03:51Z

Bug report

Bug description:

It seems that whitespace is allowed everywhere by base64.a85decode, except after the end-of-data delimiter b'~>' in adobe mode:

>>> base64.a85decode(b"6#q'\\F`JTK<-N74;eT`QF!;`!@:O(oDf,~>", adobe=True)
b'Arthur "Two-Sheds" Jackson'
>>> base64.a85decode(b"  6  # q' \\     F`JTK<-N 7 4 ;eT`QF!;`!@:O(oDf,~>", adobe=True)
b'Arthur "Two-Sheds" Jackson'
>>> base64.a85decode(b"  6  # q' \\     F`JTK<-N 7 4 ;eT`QF!;`!@:O(oDf,  ")
b'Arthur "Two-Sheds" Jackson'
>>> base64.a85decode(b"  6  # q' \\     F`JTK<-N 7 4 ;eT`QF!;`!@:O(oDf,~>  ", adobe=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.11/base64.py", line 388, in a85decode
    raise ValueError(
ValueError: Ascii85 encoded byte sequences must end with b'~>'

While this behaviour is actually compliant with the very latest PDF standard, including errata, in practice it's quite surprising. It also causes problems because of the legacy of ~~centuries~~ decades of ambiguous PDF standards, and of implementations that emit and accept extra whitespace as a result of these ambiguities.

A separate but related issue is that some very broken PDF implementations have even been known to insert whitespace between the ~ and > bytes. It may be useful for "Adobe" mode to be tolerant of this as well.

Obviously, also, PostScript doesn't care about extra whitespace after ~> in ASCII85 literal strings. (Note that the leading <~ is only accepted in PostScript and not in PDF).

Because > is a valid ASCII85 digit, an improved rule would be to only accept the regular expression ~\s*>\s* at the end of input in Adobe mode.
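The rule above can be approximated today as a pre-processing step in user code. The following is a minimal sketch, not a proposed stdlib change: `a85decode_lenient` and `_A85_TRAILER` are hypothetical names, and it relies on the fact that `~` is not an Ascii85 digit, so it should only appear in the terminator.

```python
import base64
import re

# Hypothetical pattern: "~", optional whitespace, ">", optional whitespace,
# anchored at the end of the input.
_A85_TRAILER = re.compile(rb"~\s*>\s*$")


def a85decode_lenient(data: bytes) -> bytes:
    """Decode Adobe-mode Ascii85, tolerating whitespace in and after '~>'."""
    # Rewrite the (possibly whitespace-riddled) trailer to a clean b"~>".
    normalized, count = _A85_TRAILER.subn(b"~>", data)
    if count == 0:
        raise ValueError("no '~>' end-of-data marker found at end of input")
    return base64.a85decode(normalized, adobe=True)


print(a85decode_lenient(b"6#q'\\F`JTK<-N74;eT`QF!;`!@:O(oDf,~ >  \n"))
# b'Arthur "Two-Sheds" Jackson'
```

Whitespace inside the encoded body is already skipped by `a85decode` through its default `ignorechars`, so only the trailer needs normalizing here.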

CPython versions tested on:

3.11

Operating systems tested on:

Linux


emmatyping · 2025-05-28T14:40:49Z

If you need to be more permissive about whitespace, could you call .rstrip() on the input to the decoder? Or otherwise replace whitespace in the input?
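A sketch of the `.rstrip()` workaround suggested here, for readers who want to try it. `a85decode_rstrip` is a hypothetical wrapper name, not part of `base64`; it only handles whitespace after the final `~>`, not whitespace between `~` and `>`.

```python
import base64


def a85decode_rstrip(data: bytes) -> bytes:
    """Decode Adobe-framed Ascii85, ignoring whitespace after the final '~>'."""
    # bytes.rstrip() with no argument removes trailing ASCII whitespace.
    return base64.a85decode(data.rstrip(), adobe=True)


encoded = base64.a85encode(b"example", adobe=True)  # framed as <~ ... ~>
assert a85decode_rstrip(encoded + b" \r\n") == b"example"
```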

I changed this to a feature since the decoder is standard compliant, and you're asking for a behavior change, but even then I'm not sure if this is something we should make more flexible if there is an easy solution for users that want flexibility around whitespace.

dhdaines · 2025-05-29T14:01:44Z

> If you need to be more permissive about whitespace, could you call .rstrip() on the input to the decoder? Or otherwise replace whitespace in the input?

Yes, that works! In practice I do re.sub with ~\s*>\s*$: https://github.com/dhdaines/playa/blob/main/playa/ascii85.py#L8

> I changed this to a feature since the decoder is standard compliant, and you're asking for a behavior change, but even then I'm not sure if this is something we should make more flexible if there is an easy solution for users that want flexibility around whitespace.

In the end, I think it should simply be a documentation change, to make it explicit that adobe=True will throw a ValueError on trailing whitespace. As you say, it is standard compliant, and also changing the behaviour could create all sorts of confusion.

I can make a PR for this.

dhdaines · 2025-05-29T14:17:08Z

Actually there are a few things to be improved in the documentation:

- ASCII85 is formally defined in both the PostScript Language Reference and the PDF standard (ISO 32000-2).
- As mentioned above, PDF and PostScript do not agree on delimiters: the opening <~ is used in PostScript but not in PDF. This also means that the behaviour of a85encode in "Adobe mode" is not standards-compliant for PDF.
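To illustrate the delimiter point: `a85encode(..., adobe=True)` wraps its output in both `<~` and `~>`. A PDF writer that wants only the trailing marker could, as one possible approach, encode without Adobe framing and append the terminator itself. `a85encode_for_pdf` is a hypothetical helper name, not part of `base64`.

```python
import base64


def a85encode_for_pdf(data: bytes) -> bytes:
    """Ascii85-encode data with only the trailing '~>' end-of-data marker."""
    # Without adobe=True, a85encode emits no <~ ... ~> framing at all,
    # so the end-of-data marker is appended by hand.
    return base64.a85encode(data) + b"~>"


framed = base64.a85encode(b"hello", adobe=True)
pdf_style = a85encode_for_pdf(b"hello")
assert framed.startswith(b"<~") and framed.endswith(b"~>")
assert not pdf_style.startswith(b"<~") and pdf_style.endswith(b"~>")
```

The result still decodes with `a85decode(..., adobe=True)`, which requires the trailing `~>` but treats the leading `<~` as optional.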

kevinveenbirkenbach · 2025-05-31T20:07:14Z

Here is a quick solution to repair the broken pdfs: https://github.com/kevinveenbirkenbach/pdf-healer

emmatyping · 2025-05-31T20:58:08Z

I changed this to a docs issue since it sounds like the work that needs to be done here is mostly around documenting the semantic differences when using adobe mode and expanding on what limitations it enforces.

@dhdaines would you be interested in making a PR to expand the documentation? https://devguide.python.org/documentation/start-documenting/

dhdaines · 2025-06-02T00:51:21Z

> I changed this to a docs issue since it sounds like the work that needs to be done here is mostly around documenting the semantic differences when using adobe mode and expand on what limitations it enforces.
>
> @dhdaines would you be interested in making a PR to expand the documentation? https://devguide.python.org/documentation/start-documenting/

Absolutely! I already started one, will submit it in the next few days.

dhdaines · 2025-06-02T00:52:48Z

> Here is a quick solution to repair the broken pdfs: https://github.com/kevinveenbirkenbach/pdf-healer

Interesting, didn't realize it was such a widespread issue! The bug has been fixed for a while in (shameless plug) PLAYA-PDF and also more recently in pdfminer.six.

dhdaines added the type-bug An unexpected behavior, bug, or error label May 28, 2025

emmatyping added type-feature A feature request or enhancement stdlib Python modules in the Lib dir and removed type-bug An unexpected behavior, bug, or error labels May 28, 2025

emmatyping added docs Documentation in the Doc dir and removed type-feature A feature request or enhancement labels May 31, 2025

github-project-automation bot added this to docs issues May 31, 2025

github-project-automation bot moved this to Todo in docs issues May 31, 2025

emmatyping changed the title ~~base64.a85decode throws exception on trailing whitespace in Adobe mode~~ Clarify base64.a85(en,de)code documentation for Adobe mode May 31, 2025

emmatyping removed the stdlib Python modules in the Lib dir label Jun 1, 2025
