History Of Streams

prehistory (streams before a stream api)
streams 1
summer of streams, userland vs core
streams2 (one step forward, 2 steps back)
streams3
conclusion

prehistory

In node.js’s early days, back in 2010, the node.js website featured an example of a tcp echo server.

On feb 9th, 2010 the tcp stream’s events were named “receive” and “eof”. Shortly afterwards, by feb 18th, 2010, those events had been renamed to their current “data” and “end”. There was not yet a Stream class, just an EventEmitter, but it was recognised that files, tcp sockets, etc. were basically the same, and a convention had developed.

This is the birth of node’s most important api, the Stream. But things were just beginning to take form.

Although it never appeared on the website, in nodejs@0.2 the sys module (now the util module) featured a pump method. By the time of the first node conference there was a base class for Stream, and sys.pump(readStream, writeStream, callback) had become readStream.pipe(writeStream).

By node@0.4 stream was well cemented in the node.js api, but it wasn’t until pipe became chainable that it felt like the node.js streams we know today.

Around this time streams were beginning to get popular. When I discovered streams and realized they would be useful, I wrote event-stream, which applied streams to a use similar to Functional Reactive Programming.
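As a rough illustration of what “chainable” means here, the sketch below uses a current Node.js release, where pipe returns the destination stream it was given. The variable names are made up for the example.

```javascript
const { PassThrough } = require('stream');

// Three do-nothing streams, just so there is something to connect.
const source = new PassThrough();
const middle = new PassThrough();
const sink = new PassThrough();

// pipe() hands back its destination, so a second pipe() can be called
// on the result. This is the same as source.pipe(middle).pipe(sink).
const returnedByFirstPipe = source.pipe(middle);
const returnedByChain = returnedByFirstPipe.pipe(sink);

console.log(returnedByFirstPipe === middle); // true
console.log(returnedByChain === sink); // true
```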

I convinced substack and Max Ogden how useful streams were, and then they convinced everyone else.

Streams had been created by the core node developers, but they were then adopted by userspace - by people who maybe didn’t contribute much to node’s core, but did contribute greatly to node’s ecosystem.

These two groups had slightly different interpretations of what a stream was. Mainly, the core developers intended streams to only be applied to raw data. The core interpretation was that a stream could be buffers or strings - but the userland interpretation was that a stream could be anything that is serializable - anything that could become buffers or strings.

My argument was that you needed all the features of streams, passing items one at a time, possibly with an eventual end or error, whether it was a sequence of buffers, bytes, strings or objects. Why not use the same api? If javascript was strictly typed, we never would have had this argument, but javascript didn’t care and the streams code at that time was agnostic - it worked just the same with objects as it did with strings.
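To show what “agnostic” looks like in practice, here is a minimal sketch that uses a bare EventEmitter as a stand-in for a streams1-style stream. The helpers collect and play are hypothetical names invented for this example, not part of any node api.

```javascript
const { EventEmitter } = require('events');

// Record everything a classic-style emitter sends, whatever its type.
function collect(source) {
  const seen = [];
  source.on('data', (item) => seen.push(item));
  source.on('end', () => seen.push('<end>'));
  return seen;
}

// Emit each item as a 'data' event, then 'end'. No type checks anywhere.
function play(source, items) {
  for (const item of items) source.emit('data', item);
  source.emit('end');
}

const strings = new EventEmitter();
const objects = new EventEmitter();
const seenStrings = collect(strings);
const seenObjects = collect(objects);

play(strings, ['hello', 'world']);
play(objects, [{ id: 1 }, { id: 2 }]);
```

Nothing in the 'data' and 'end' convention inspects the payload, so the same two helpers serve strings and objects alike.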

problems with early node streams.

Streams were recognised as a powerful abstraction, but also as difficult to use. One main difficulty was that data started flowing through streams as soon as you created them. Once you had a stream, you had to pipe it somewhere before you did anything async.

This problem was exacerbated by the fact that the pause method was only advisory: it did not mean that data would stop flowing immediately.

There were also generally just too many things you had to get right for streams to work - not only did you have to implement the write and end methods and the 'data' and 'end' events, but there were also the (still controversial) destroy method and 'close' event.
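The hazard can be sketched with a bare EventEmitter standing in for an early stream (an assumption made for the sake of a small example): anything emitted before a listener is attached is simply gone.

```javascript
const { EventEmitter } = require('events');

const source = new EventEmitter();
const received = [];

// The source starts producing right away, before anyone is listening.
source.emit('data', 'first chunk');

// By the time a consumer attaches, the first chunk has been lost.
source.on('data', (chunk) => received.push(chunk));
source.emit('data', 'second chunk');
source.emit('end');

console.log(received); // [ 'second chunk' ]
```

In a real program the gap between creating the stream and attaching the listener would be an async call rather than two adjacent lines, but the outcome is the same.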

I had written pause-stream to handle the second case, and more importantly, through. through represented about 11 months’ worth of my own use of streams in userland, and I had finally figured out the pattern that was most general and easiest to use from a userland perspective. (Stream authors near core were more likely to implement streams that read from or wrote to some device - low level interfaces - but userland streams are mostly transformations.)

I had called the module “through” because I figured that input, output… throughput. It was later pointed out by Isaac that “through” was a common word and that “transform” stream was a better name. In hindsight, I agree. through finally made creating streams easy: you just had to provide two functions, and all the edge cases were handled for you. through currently has 1601 dependent modules, and its heir, through2 (same api but uses streams2), has 5341 dependents. (check with npm-dependents)

Interestingly, I started this a few weeks before Isaac began developing streams2 - streams2 addressed the problems in streams1, but made a much more radical departure from the api.
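The “two functions” pattern can be sketched roughly as below. This is a hypothetical, heavily simplified miniThrough written for illustration, not the real through module: it leaves out pausing, buffering, destroy and the other edge cases mentioned above.

```javascript
const { EventEmitter } = require('events');

// Hypothetical sketch: build a classic-style transform from two functions.
function miniThrough(onWrite, onEnd) {
  const stream = new EventEmitter();
  stream.readable = true;
  stream.writable = true;

  // The two user functions call this to send results downstream.
  stream.queue = (item) => stream.emit('data', item);

  stream.write = (item) => {
    onWrite.call(stream, item);
    return true; // this sketch never asks the writer to slow down
  };

  stream.end = () => {
    if (onEnd) onEnd.call(stream);
    stream.emit('end');
  };

  return stream;
}

// Usage: upper-case every chunk and add a marker just before the end.
const upper = miniThrough(
  function (chunk) { this.queue(String(chunk).toUpperCase()); },
  function () { this.queue('!'); }
);

const out = [];
upper.on('data', (chunk) => out.push(chunk));
upper.on('end', () => out.push('<end>'));

upper.write('hello');
upper.write('streams');
upper.end();

console.log(out); // [ 'HELLO', 'STREAMS', '!', '<end>' ]
```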

streams2 was paused until you called pipe, and was implemented around a read(n) function. read took an optional number of bytes, and would return a buffer with that many bytes in it. This was intended to be useful for implementing parsers, but to use it you had to have a handle on the stream directly, because pipe didn’t give you a way to specify a number of bytes.
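Here is a small sketch of read(n) as it behaves in a current Node.js release. The source stream and its contents are made up, and data is pushed in by hand so the example stays self-contained.

```javascript
const { Readable } = require('stream');

// A readable whose _read does nothing; the demo pushes data in manually.
const source = new Readable({ read() {} });
source.push(Buffer.from('hello world'));

// Ask for an exact number of bytes, as a parser reading a header might.
const header = source.read(5);
const rest = source.read(6);

console.log(header.toString()); // hello
console.log(JSON.stringify(rest.toString())); // " world"
```

Note that the caller needs the stream object itself to do this, which is the limitation described above: pipe offers no place to say how many bytes you want.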

streams2 was much more complicated than streams1, but worse, it was backwards compatible with streams1 (although streams1 was considered “legacy”).

streams1 was a readable 100 lines or so, and I read it many times, but by the time streams2 was ready to go into node@0.10, the main class, Readable, was over 800 lines.

During streams2 development the are-objects-data argument was settled: streams of objects were officially sanctioned in the api (which would otherwise have excluded their possibility).
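In a current Node.js release, the sanctioned form is the objectMode option. A minimal sketch, with made-up row data:

```javascript
const { Readable } = require('stream');

// objectMode tells the stream that each chunk is an arbitrary value,
// not a Buffer or string.
const rows = new Readable({ objectMode: true, read() {} });
rows.push({ id: 1, name: 'first' });
rows.push({ id: 2, name: 'second' });

// In object mode, read() hands back one object per call.
const firstRow = rows.read();
const secondRow = rows.read();

console.log(firstRow, secondRow);
```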

problems with streams2

All these changes in streams were causing their own problems, but one good move was developing streams2 in its own repository. This made it possible for those interested to depend on streams2 and test it out without having to use a fork of node.

Indeed, if you use through2 (and you very likely do use it, maybe because a module you use uses it) then you depend on that readable-stream repo. Although streams2 is built into node, the version in the repo is more cross version compatible. Modules that use readable-stream still work in old versions of node that may not have as modern a version of streams!

Once the initial difficulties of piping new streams and getting friendlier apis for stream creation were surmounted, new problems emerged. There is a growing awareness that errors need to be propagated somehow. In this (open) issue, it’s shown that a naive http file server will have a file descriptor leak.

To prevent that, you need to have a handle on both the source fs stream, and the destination stream that errored, not to mention a detailed understanding of how node streams function.

This is an open issue, and not even the iojs revolution could fix it. This would most surely be a breaking change, and so changing how core streams worked would be very difficult.

There is an approach in userland (that harkens back to sys.pump!)
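The general shape of such a helper can be sketched as follows. pumpSketch is a hypothetical name, and this is only an illustration of the idea (hold a handle on both ends, destroy both when either fails), not the api of any particular userland module. It assumes a Node.js release where streams have a destroy method.

```javascript
const { PassThrough } = require('stream');

// Hypothetical helper: pipe source into dest; if either side errors,
// destroy both and report the error exactly once.
function pumpSketch(source, dest, callback) {
  let done = false;
  function finish(err) {
    if (done) return;
    done = true;
    if (err) {
      source.destroy(); // e.g. releases a file descriptor
      dest.destroy();
    }
    callback(err);
  }
  source.on('error', finish);
  dest.on('error', finish);
  dest.on('finish', () => finish());
  return source.pipe(dest);
}

// Usage, with PassThrough streams standing in for a file and a response.
const file = new PassThrough();
const response = new PassThrough();
let reported = null;

pumpSketch(file, response, (err) => { reported = err; });

// Simulate the destination failing part-way through.
response.emit('error', new Error('client went away'));

console.log(file.destroyed, response.destroyed); // true true
```

With plain source.pipe(dest), the source would have been left open after the destination failed.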

There was one serious proposal to add error propagation to node streams, but the difficulty of providing backwards compatibility made it untenable.

streams 3

The version of streams in contemporary node@5 is considered streams3. It’s a significant refactor from streams2, but it is still backwards compatible. While it still supports streams2#read(n), that behavior is not activated until you trigger it by using the stream in a streams2 style. If you just use streams normally, via source.pipe(dest), then it works basically like streams1 (see streams3.pipe).

If streams1 is classic streams and streams2 is new streams then streams3 is vintage streams. It’s old, but it’s new. But the sad thing is that it’s still too complex, because of features picked up during the streams2 detour.
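In a current Node.js release this mode switching is visible through the readableFlowing property. The sketch below assumes a release recent enough to expose that property; the stream itself is a made-up example.

```javascript
const { Readable } = require('stream');

const source = new Readable({ read() {} });

// No consumer yet: the stream has not picked a mode.
const beforeConsumer = source.readableFlowing;

// Attaching a 'data' listener (which pipe does for you) starts the flow.
source.on('data', () => {});
const afterDataListener = source.readableFlowing;

// An explicit pause switches flowing off again.
source.pause();
const afterPause = source.readableFlowing;

console.log(beforeConsumer, afterDataListener, afterPause); // null true false
```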

stream 4? no.

If node streams teach us anything, it’s that it’s very difficult to develop something as fundamental as streams inside a “core”. You can’t change core without breaking things, because things simply assume core and never declare what aspects of core they depend on. Hence there is a very strong incentive to simply make core always be backwards compatible, and to focus only on performance improvements. This is still a pretty good thing, except that sometimes decisions get made that inadvertently have negative implications, and that isn’t apparent until it’s too late. In this situation, a clean break is necessary - node.js itself is a great example of this. node created an IO performance improvement over previous dynamic languages (perl, ruby, python) because node was able to abandon threads and blocking, and write an entire stack from scratch.

But usability problems with streams1 stopped us from seeing the need for error propagation until it was too late, and now node’s success has created its own obstacle to further improvement.

Streams are one of the best things about node, but they are still quite difficult to use, and they still have room for improvement. But that improvement cannot be made while they are a part of node.js core.

to be continued in part two.