Skip to content

base64::encode() borks on some strings (0x03 End TX character?) #3194

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
psychogony opened this issue May 3, 2017 · 4 comments
Closed

base64::encode() borks on some strings (0x03 End TX character?) #3194

psychogony opened this issue May 3, 2017 · 4 comments

Comments

@psychogony
Copy link
Contributor

Basic Infos

Hardware

Hardware: Wemos D1 mini (hardware v2.1.0)
Core Version: 2.3.0

Description

I am base64 encoding some raw binary data (actually AES128 CBC cipher output).
base64::encode() appears to break on some data (other data works fine) adding non printable characters.

I haven't yet managed to pin this on the String libraries, lib64, or something else.
That said, swapping out the C lib64/base64 implementation makes the problem go away.
The same input string (hex escaped the same way) works fine when encoded with other platform implementations (eg. php base64_encode).
Looks like lib64 itself is no longer maintained...

Example minimal repo is below.
In this particular example, the problem appears to relate to the ETX character in the input.

Settings in IDE

N/A - not an IDE problem.
I am actually using PlatformIO (under Linux Mint).
Same issue reproducable with Arduino IDE (Linux version 1.81) using below example.

Sketch

Using lib64 backend

#include <Arduino.h>

// cores/esp8266/base64.h
// Actually http://platformio.org/platforms/espressif8266, version: 1.3.0 but origin git files unchanged
#include "base64.h"

String test = "\x0d\xa6\x8d\x16\x39\x40\xc0\x9b\x54\xba\xed\xd2\x01\xe7\xa8\xd6\x8f\x9b\x9f\x71\xdb\x7f\x98\x93\xd8\x02\x87\x8d\x6e\xfa\xcb\xd8\xa9\x03\x28\xe3\x31\xfe\x3f\xeb\xcd\x23\x03\x51\x71\xc7\x1c\x12\xd3\x6b\x8d\xd6\x86\x32\x5d\x36\xcc\x52\xa4\xe5\xbd\x63\x50\xd4\x07\xaa\x4c\xec\x64\x4c\x5d\x3f\x93\xea\xb1\x45\x30\xa9\x40\xa8";

  String strOut = base64::encode(test);

  Serial.println("\n");
  Serial.print("'");
  Serial.print(strOut.c_str());
  Serial.println("'");

// Output NOTE: the unprintable character / line break is part of output
// 'DaaNFjlAwJtUuu3SAeeo1o+bn3Hbf5iT2AKHjW76y9ipAyjjMf4/680jA1FxxxwS02uN1oYy
// XTbMUqTlvWNQ1AeqTOxkTF0/k+qxRTCpQKg='

// Closer examination of the string buffer reveals an unprintable character
// ie.
// <snip> 78 78 78 77 53 30 32 75 4e 31 6f 59 79 0a 58 54 62 4d 55 71 54 <snip>
// <snip> x  x  x  w  S  0  2  u  N  1  o  Y  y     X  T  b  M  U  q  T  <snip>

Using another third party implementation for backend (same hardware device/environment)

// https://github.com/adamvr/arduino-base64 (MIT license)
// Requires __AVR__ header guard for <pgmspace.h>, messing with function char constness and renaming .h file/functions due to name clash
#include "Base64.h"
String test = "\x0d\xa6\x8d\x16\x39\x40\xc0\x9b\x54\xba\xed\xd2\x01\xe7\xa8\xd6\x8f\x9b\x9f\x71\xdb\x7f\x98\x93\xd8\x02\x87\x8d\x6e\xfa\xcb\xd8\xa9\x03\x28\xe3\x31\xfe\x3f\xeb\xcd\x23\x03\x51\x71\xc7\x1c\x12\xd3\x6b\x8d\xd6\x86\x32\x5d\x36\xcc\x52\xa4\xe5\xbd\x63\x50\xd4\x07\xaa\x4c\xec\x64\x4c\x5d\x3f\x93\xea\xb1\x45\x30\xa9\x40\xa8";

  char out[base64_enc_len(test.length())];
  base64_encode(&out[0], test.c_str(), test.length());
  String strOut = String(out);

  Serial.println("\n");
  Serial.print("'");
  Serial.print(strOut.c_str());
  Serial.println("'");

// Output NOTE: No line break from unprintable character
// 'DaaNFjlAwJtUuu3SAeeo1o+bn3Hbf5iT2AKHjW76y9ipAyjjMf4/680jA1FxxxwS02uN1oYyXTbMUqTlvWNQ1AeqTOxkTF0/k+qxRTCpQKg='


@igrr
Copy link
Member

igrr commented May 3, 2017

This is fixed in master, see #2883.

@psychogony
Copy link
Contributor Author

Hi igrr,

Thanks for getting back to me so quickly (and thanks for all the work on esp8266/Arduino!).

Unfortunately this issue is not fixed by #2883.
The above commit is for the decoder, the issue I am having is with the encoder.
The encoder does not seem to be affected by the 'xtensa gcc char is unsigned' problem.
At least on my platform/setup anyway (-funsigned-char speced somewhere in toolchain?)

A deeper look suggests this is intended behavior for lib64.
Perhaps dates back to the original command line functionality of lib64.

Commenting out this code block fixes my issue.
Should the stepcount linebreaks stay in the repository for ESP8266 Arduino?

@igrr
Copy link
Member

igrr commented May 4, 2017

Apologies, i misread your post, i thought it was about decoding and ETX character appearing in the output.

I understand that newlines are not needed for some cases, but it's hard to guarantee that removal of newlines will not break applications which depend on them. How about introducing another function into lib64, which takes line width as an argument (with 0 meaning no line breaks)? And then implementing the existing one as a call to the new one, with line width set to the current value.

@psychogony
Copy link
Contributor Author

psychogony commented May 5, 2017

Yes - I had it wrong.
I believed the problem related to encoded bytes, when actually it relates to encoded byte length.
For me this bug manifests intermittently as the tail end of a long series of events (sensor readings -> rtc mem over multiple wake/sleep cycles -> AES encrypt bytes -> base64 encode -> send -> receiving server chokes).

Technically what lib64 is spitting out is not base64 encoded (\n is not in the base64 alphabet). This behavior breaks both uris and json. Reading up more on lib64 this may relate to historical pop/smtp server limitations. That said, you have a point that someone out there probably relies on this behavior.

I can just String = String.replace("\n", ""), but that seems like a massive waste of RAM and CPU cycles.
Should have time to provide a patch/pull request this weekend.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants