Skip to content

Fixes for poor HardwareSerial performance/crashes exacerbated by SDK1.5 #1320

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Jan 4, 2016

Conversation

alltheblinkythings
Copy link
Contributor

After SDK 1.5, libraries like NeoPixelBus which use the UART to generate timing signals stopped working correctly (logic analyser shows large intercharacter delays) as well as experiencing crashes in the IRQ handler. This series of commits should up many of these issues.

… FIFO.

Not sure why, but this reduces the occurrence rate of an occasional ~3.25 or
~7μs intercharacter delay, which was interfering with use of the UART to
generate precise timing pulses (such as driving WS2812 LEDs).
…RAM.

This is required per the non-OS SDK doc, which states:
  "Using non-OS SDK, please do not call any function defined with
  ICACHE_FLASH_ATTR in the interrupt handler."

This avoids an "Illegal instruction" exception with epc1 pointing at a valid
instruction (in flash) for one of the moved methods.
…RX if we can't allocate a buffer for them.

Prior to this change, the interrupt could fire during initialisation,
necessitating a deep check that the HardwareSerial structure had valid
_tx_buffer or _rx_buffer each time an interrupt occurred.

By keeping uart_t's and HardwareSerial's (txEnabled, _tx_buffer) and
(rxEnabled, _rx_buffer) in sync, we can remove this extra check, as
well as fixing a null pointer dereference if e.g. _tx_buffer allocation
failed and write() was subsequently called.
Prior to this change, if interrupts were disabled during a call to
HardwareSerial::write() when the circular buffer was full, or
HardwareSerial::flush() when the circular buffer was non-empty,
we would loop forever and trip a watchdog timeout.
Tracking _written was required on AVR devices to work around a hardware
limitation when implementing flush().  The ESP8266 doesn't have the same
issue, so we can remove the tracking and make write() more lightweight.

The cost is a minor increase in work done in flush() when write() was not
previously called, but this should be much rarer than individual character
writes.
@me-no-dev
Copy link
Collaborator

Can you please outline which parts of the code that you changed you think are actually related and causing the problems you describe. Going over the files, most edits seem to be more of your preferred coding style than actually changing anything in the logic.
The most in-my-eyes change is adding InterruptLock and postponing UART init after buffers initialization.
There have been some issues especially when printing inside interrupt routines, so I'm curious to what actually might be the cause and what you have done to overcome it.

@alltheblinkythings
Copy link
Contributor Author

Sure. But let me first start by saying two things:

  1. none of this is to change to a "preferred style", just trying to get things working consistently after the SDK1.5 change. I've tried very hard to stick with the same style as the previous code and to break things up into easily digestible chunks, and any change that changes the formatting was likely an error from cutting and paste. Is there something in particular you are referring to?
  2. I wasn't working to fix any particular issue with printing inside interrupt routines, just trying to get the Neopixel code working reliably. This said, I have noticed some issues when using the debug handlers (e.g. Serial.setDebugOutput(true)), which persist as of right now - and I am still testing some fixes (which I can add to this request, but wanted more soak time for and was going to propose separately after this). Do you have any issue numbers I can look at so I can see if they're the same thing?

With regard to the individual commits, none of the changes independently "solve" the issues seen with the NeoPixelBus usage. I've tried to put the reason for each one into the individual commit message, but here's some more info on the set as a whole:

  • with no commits, basically every character shows a gap after it and the precise timing is gone. We get random colours out of connected neopixels with simple test code set to mark every pixel black.
  • e147314 to transmit when FIFO is empty outside of the InterruptLock dramatically improves performance, but there's still a low rate of latency spikes - once per second, per a logic analyser. (I came up with this as a progressive revert of the previous change to add an InterruptLock to this block to prevent corruption. If you know why this would make such a difference and could help understand what's going on, let me know - I've not spotted anything obvious in the architecture manuals.)
  • e83dd4d to use lightweight tests seemed to reduce the rate of latency spikes to once per five seconds or so (as seen by logic analyser). But I started to see rare crashes traced to cbuf::getSize() or cbuf::room() being called from flash in the IRQ handler via tx_empty_irq.
  • cfe7ae1 is to address the crashes seen from the IRQ handler calling functions that are in flash. I identified the code to move by reading the functions in objdump output. I converted a number of calls to macros because this reduced the amount of code generated and executed in the critical path, as well as reducing the amount of IRAM usage. Some code moves from .h to .cpp to avoid "section conflict" errors at compile time. This commit seems to reduce intercharacter performance, too - my speculation is that this is due to HardwareSerial::write() now being forced into RAM. I thought maybe this could be the fix for everything, but when tried on its own we still have many visible glitches without e147314.
  • 4ef0466 is to switch to a polling mode for transmission if interrupts are disabled, and the system buffers are in such a state that we can't make progress without them. I was running code to exercise various code paths I was touching and observed this leading to crashes. It might help out with transmitting data from an interrupt handler (if you're seeing a WDT error) but doesn't fix the issues I mentioned with debug output.
    • I haven't yet had a chance to experiment and see if stuffing the TX FIFO more aggressively in a else if(_tx_buffer->write(c)) { _tx_empty_irq(); break; } code path would help out further - I'd want to sit down again with a logic analyser and look.

The remainder of the commits are to cut down the code size and number of instructions in the critical path:

  • 83398f6 came up as a cleanup when trying to diagnose the IRQ crash problem. It shaves some cycles from the IRQ handler and seems to help performance minimally, as seen from the logic analysers.
  • c8772cf is a correctness fix, stemming from my trying to remove duplicate code (some tests already done by the isTxEnabled() call in the uart_interrupt_handler) from the critical path. It shaves some cycles from the TX handler and fixes a memory leak on repeated start() and delayed crash if cbuf allocation fails. I don't expect it to fix anything that you would have seen unless you are calling HardwareSerial::end() while actively using the serial port.
  • 2375fb0 is a trivial cleanup of unneeded #includes. I noticed these when looking for other users of the cbuf code when splitting it into .cpp and .h.
  • bc9493e is to shorten the write() path. It came from diving into the use of _written to find out if there was a particular reason for its existence, and discovering it is seemingly just a carryover from AVR code.

@me-no-dev
Copy link
Collaborator

Wow what a lengthy response :) To be clear about the "coding style" I meant mostly the Macros you defined as the compiler will replace them in the code and produce the same exact code but with while(0) around it, but skipping those through the changes is not easy :)
If you have time and want to give something a shot on a side here is some code I use specially for debugging. It uses only the hardware buffer and does not touch interrupts at all.

extern "C" void system_set_os_print(uint8 onoff);
extern "C" void ets_install_putc1(void* routine);

static void _u0_putc(char c){
  while(((U0S >> USTXC) & 0x7F) == 0x7F);
  U0F = c;
}

void initSerial(){
  Serial.begin(115200);
  ets_install_putc1((void *) &_u0_putc);
  system_set_os_print(1);
}

This code enabled os_printf to work through it. Do some debugging and see if it will make a difference :)

@alltheblinkythings
Copy link
Contributor Author

The do { something_that_returns_void(); } while(0) is a style thing, I guess, but hardly personal! :-) It's the "safe" way to write void-function-like-macros that I spent years being told to use across various projects.

http://kernelnewbies.org/FAQ/DoWhile0 seems to be a reasonable explanation of why this is the "right" thing to do in macros. I've also seen another thread here recent where disabling some debugging macros caused bad code by not using this style for empty macro blocks.

It's probably not not strictly necessary here, given most of these are single statement expressions, but still protects against a class of errors that could come up.

For example, if one were to erroneously write UART_TRANSMIT_CHAR(uart_nr, c) + 1;, you'd expect a compile-time error, however without the braces this expands to USF(uart_nr) = (c) + 1;, which generates unexpected results instead. Yes, we could replace the macro with a statement-expression instead, like so: #define U_T_C(n, c) ({USF(n) = (c);}), but it's no longer emulating a void function, so at best, we get a "expression result unused" type warning with the "+ 1" example here, rather than an error.

Do you have better guidance on that which is used by this project? I'm wondering if I missed a style guide somewhere.

@me-no-dev
Copy link
Collaborator

There is no style guide :) the same as those defines did not change the logic. The situations you explain are not something that the code was about to get into. Those changes just make it difficult to see what you actually changed.
Have you tried what I pasted above?
And SDK1.5 sucks! I am firmly at 1.3 till a better alternative arrive.

igrr added a commit that referenced this pull request Jan 4, 2016
Fixes for poor HardwareSerial performance/crashes exacerbated by SDK1.5
@igrr igrr merged commit ce1b64b into esp8266:master Jan 4, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants