Objection

Dec 7

We don’t need a healthcare platform

This text was triggered by discussions on Twitter in the wake of a Norwegian blog post I published about health platforms.  

I stated that we need neither Epic, nor OpenEHR, nor any other platform to solve our healthcare needs in the Norwegian public sector.  Epic people have not responded, but the OpenEHR crowd have been actively contesting my statements ever since.  And many of them don’t read Norwegian, so I’m writing this to them and any other English speakers out there who might be interested. I will try to explain what I believe would be a better approach to solve our needs within the Norwegian public healthcare system.

The Norwegian health department is planning several gigantic software platform projects to solve our health IT problems.  And while I know very little about the healthcare domain, I do know a bit about software.  I know for instance that the larger the IT project, the larger the chances are that it will fail. This is not an opinion, this has been demonstrated repeatedly.  To increase chances of success, one should break projects into smaller bits, each defined by specific user needs. Then one solves one problem at a time. 

I have been told that this cannot be done with healthcare, because it is so interconnected and complex. I’d say it’s the other way around. It’s precisely because it is so complex, that we need to break it into smaller pieces. That’s how one solves complex problems.  

Health IT professionals keep telling me that my knowledge of software from other domains is simply not transferable, as health IT is not like other forms of IT.  Well, let’s just pretend it is comparable to other forms of IT for a bit.  Let’s pretend we could use the best tools and lessons learned from the rest of the software world within healthcare.  How could that work?

The fundamental problem with healthcare seems to be that we want all our data to be accessible to us in a format that matches whatever “health context” we happen to be in at any point in time.  I want my blood test results to be available to me personally, and to any doctor, nurse or specialist who needs them. Yet what the clinicians need to see from my test results and what I personally get to see will most likely differ a lot.  We have different contexts, yet the views will need to be based on much of the same data.  How does one store common data in a way that enables its use in multiple specific contexts like this?

The fact that so many applications will need access to the same data points is possibly the largest driver towards this idea that we need A PLATFORM where all this data is hosted together. In OpenEHR there is a separation between a semantic modelling layer and a content-agnostic persistence layer.  So all datapoints can be stored in the same database(s) - even in the same tables/collections within those databases. The user can then query these databases and get any kind of data structure out, based on the OpenEHR archetype definitions defined in the layer on top.  So they provide one platform with all health data stored together in one place - yet the user can access data in the format they need, given their context.  I can see the appeal of this.  It solves the problem.
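To make that two-layer idea concrete, here is a toy sketch of what content-agnostic storage plus a typed projection on top might look like. This is not how OpenEHR is actually implemented - the class and method names are made up for illustration, and the archetype paths are simply borrowed from the blood pressure query shown later in this post:

```java
import java.util.Map;

public class GenericStorageDemo {
    // In a content-agnostic store, every datapoint is just a path and a value.
    // These paths are copied from the blood pressure query later in the post.
    public static final String SYSTOLIC_PATH =
            "data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/magnitude";
    public static final String DIASTOLIC_PATH =
            "data[at0001]/events[at0006]/data[at0003]/items[at0005]/value/magnitude";

    // A typed, context-specific view reconstructed from the generic rows.
    public record BloodPressure(double systolic, double diastolic) {}

    public static BloodPressure project(Map<String, Object> genericRows) {
        return new BloodPressure(
                (Double) genericRows.get(SYSTOLIC_PATH),
                (Double) genericRows.get(DIASTOLIC_PATH));
    }

    public static void main(String[] args) {
        // The persistence layer sees only generic paths; the semantics live in the layer on top.
        Map<String, Object> stored = Map.of(SYSTOLIC_PATH, 145.0, DIASTOLIC_PATH, 95.0);
        BloodPressure bp = project(stored);
        System.out.println(bp.systolic() + "/" + bp.diastolic()); // 145.0/95.0
    }
}
```

Notice how the meaning of the data lives entirely in those opaque path strings - which is exactly the debugging problem discussed below.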

However, there are many reasons not to want a common platform. I have mentioned one already - size itself is problematic. A platform encompassing “healthcare” will be enormous.  Healthcare contains everything from nurses in the dementia ward, to cancer patients, to women giving birth, orthopaedic surgeons, and families of children with special needs… the list goes on endlessly.  If we succeed in building a platform encompassing all of this, and the platform needs an update - can we go ahead with the update? We’d need to re-test the entire portfolio before daring to make any changes.  What happens if there is a problem with the platform (maybe after an upgrade)?  Then everything goes down.  The more things are connected, the riskier it is to make changes. And in an ever changing world, both within healthcare and IT, we need to be able to make changes safely. There can be no improvement without change.  Large platforms quickly become outdated and hated.

In the OpenEHR case, the fact that the persistence layer has no semantic structure necessarily introduces added complexity in optimising for context-specific queries.  Looking through the database for debugging purposes will be very challenging, as everything is stored in generic constructs like “data” and “event”.  Writing queries for data is so complex that the recommendation is not to do it by hand, but rather to create the queries with a dedicated query-builder UI.  Here is an example of a query for blood pressure:

let $systolic_bp="data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/magnitude"
let $diastolic_bp="data[at0001]/events[at0006]/data[at0003]/items[at0005]/value/magnitude"

SELECT
  obs/$systolic_bp, obs/$diastolic_bp
FROM
  EHR [ehr_id/value=$ehrUid] CONTAINS COMPOSITION [openEHR-EHR-COMPOSITION.encounter.v1]
      CONTAINS OBSERVATION obs [openEHR-EHR-OBSERVATION.blood_pressure.v1]
WHERE
  obs/$systolic_bp >= 140 OR obs/$diastolic_bp >= 90


This is, needless to say, a very big turn-off for any experienced programmer.

The good news, though, is that I don’t think we need a platform at all. We don’t need to store everything together. We don’t need services to provide our data in all sorts of context-dependent formats. We can both split health up into smaller bits, and simultaneously have access to every data point in any kind of contextual structure we want.  We can have it all. Without the platform.

Let me explain my thoughts.

Health data has the advantage of naturally lending itself to being represented as immutable data.  A blood test will be taken at a particular point in time. Its details will never change after that. Same with the test results. They do not change. One might take a new blood test of the same type, but this is another event entirely with its own attributes attached. Immutable data can be shared easily and safely between applications. 

Let’s say we start with blood tests. What if we created a public registry for blood test results?  Whenever someone takes a blood test, the results are sent to this registry. From there, any application with access can query for the results, or subscribe to test results of a given type. Some might subscribe to data for a given patient, others to tests of a certain type.  Any app that is interested in blood test results can receive a continuous stream of relevant events.  Upon receipt of an event, it can apply any context-specific rules, and store the data in whatever format is relevant for the given application.  Every app can have its own context-specific data store.
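A minimal sketch of such a registry, in Java, might look like this. All the names here (BloodTestRegistry, publish, subscribe) are made up for illustration - a real registry would be a networked service with authentication and access control, not an in-memory object:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

public class BloodTestRegistry {
    // Immutable event: once a test is taken, its details never change.
    public record BloodTestResult(String patientId, String testType, double value, long takenAtEpochMs) {}

    private record SubscriptionEntry(Predicate<BloodTestResult> filter, Consumer<BloodTestResult> handler) {}

    private final List<BloodTestResult> log = new ArrayList<>();          // the registry owns the data
    private final List<SubscriptionEntry> subscribers = new ArrayList<>();

    // Apps subscribe with a filter: by patient, by test type, or anything else.
    public void subscribe(Predicate<BloodTestResult> filter, Consumer<BloodTestResult> handler) {
        subscribers.add(new SubscriptionEntry(filter, handler));
    }

    // Labs publish results; every interested app gets its own copy of the event.
    public void publish(BloodTestResult result) {
        log.add(result);
        for (SubscriptionEntry s : subscribers) {
            if (s.filter().test(result)) s.handler().accept(result);
        }
    }

    public static void main(String[] args) {
        BloodTestRegistry registry = new BloodTestRegistry();
        List<BloodTestResult> dementiaWardStore = new ArrayList<>(); // each app keeps its own store
        registry.subscribe(r -> r.testType().equals("HbA1c"), dementiaWardStore::add);
        registry.publish(new BloodTestResult("patient-1", "HbA1c", 42.0, System.currentTimeMillis()));
        registry.publish(new BloodTestResult("patient-2", "CRP", 5.0, System.currentTimeMillis()));
        System.out.println(dementiaWardStore.size()); // only the HbA1c result was delivered
    }
}
```

Each subscriber keeps its own store, in its own format, and never needs to know anything about anyone else’s context.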

Repeat for all other types of health data.  

The beauty of an approach like this, is that it enables endless functionality, and can solve enormously complex problems, without anyone needing to deal with the “total complexity”. 

The blood test registry will still have plenty of complexity in it. There are many types of blood tests, many attributes that need to be handled properly, but it is still a relatively well defined concrete solution. It has only one responsibility, namely to be the “owner” of blood test results and provide that data safely to interested parties.  

Each application, in turn, need only concern itself with its own context.  It can subscribe to data from any registry it needs access to, and then store it in exactly the format it needs to be maximally effective for whatever use case it is there to support.  The data model in use for nurses in the dementia ward does not need to be linked in any way to the data model in use for brain surgeons.  The data store for each application will only contain the data that application needs, which in itself contributes to increased performance, as the stores themselves are much smaller. In addition they will be much easier to work with, debug and performance tune, since each one is completely dedicated to a specific purpose.

Someone asked me how I would solve an application for 

“Cancer histopathology reporting, where every single cancer needs its own information model? and where  imaging needs a different information model for each cancer and for each kind of image (CT, MRI, X-ray) +where genomics is going to explode that further”

Well I have no idea what kind of data is involved here.  I know very little about cancer treatment. But from the description given, I would say one would create information models for each cancer, and for each type of image and so on. The application would get whatever cancer-data is needed from the appropriate registries, and then transform the data into the appropriate structures for this context and visualise the data in a useful way to the clinician.   

We don’t need to optimize for storage space anymore - storage is plentiful and cheap, so the fact that the same information is stored in many places in different formats is not a problem.  As long as we, in our applications, can safely know that “we have all the necessary data available to us at this point in time”, we’re fine.  Having common registries for the various types of data solves this.  But these registries don’t need to be connected to each other. They can be developed and maintained separately.

Healthcare is an enormous field, with enormous complexity. But this does not mean we need enormous and complex solutions. Quite the contrary. We can create complex solutions, without ever having to deal with the total complexity. 

The most important thing to optimise for when building software is the user experience.  The reason we’re making the software is to help people do their job, or help them achieve some goal. Software is never an end in itself.  In healthcare, there are no jobs that involve dealing with the entirety of “healthcare”.  So we don’t need to create systems or platforms that do, either.  Nobody needs them.

Another problem in healthcare is that people have gotten used to the idea that software development takes forever. If you need an update to your application, you’ll have to wait for years, maybe decades, to see it implemented. Platforms like OpenEHR deal with this by letting the users configure the platform continually. As the semantic information is decoupled from the underlying code and storage, users can reconfigure the platform without needing to get developers involved.  While I can see the appeal of this too, I think it’s better to solve the underlying problem: software should not take years to update.

With DevOps becoming more and more mainstream, I see no reason we can’t use this approach for health software as well.  We need dedicated cross functional teams of developers, UXers, designers and clinicians working together on solutions for specific user groups.  They need to write well tested (automatically tested) code that can be pushed to production continuously, with changes and updates based on real feedback from the users on what they need to get their jobs done.  This is possible. And we now have more and more hard data showing that this approach not only gives better user experiences, but also reduces bugs and increases productivity.

The Norwegian public sector is planning on spending > 11 billion NOK on new health platform software in the next decade. 

We can save billions AND increase our chances of success dramatically by changing our focus - away from platforms, and on to concrete user needs and just making our health data accessible in safe ways.  We can do this one step at a time. We don’t need a platform.

Dec 7

Just coding it like it is

Our coding environments are getting too soft! Everywhere you integrate, programs are like

400 Bad Request

SOAPFaultException

So sensitive. Bloody snowflakes.  

Well, it’s not my problem that you can’t handle my JSON. I’m just coding it like it is, man!  I know what my message is, there’s no problem with my message. My message is superior to yours in every way.  If you can’t handle my data, if you can’t handle the truth, that’s your problem not mine.  

But is it though?  

We don’t argue like this about code, do we? Why not? Because in coding we actually have to be rational.  If we want our applications to be effective, we have to consider the message protocols and formats of the services we are integrating with.  Even if we don’t particularly like them.  It makes no sense to send off messages in the wrong format, and then get all superior and defensive about it when we get an exception.  

“Bad Request? BAD REQUEST? That’s the best request you’ve ever seen in your life! What’s wrong with you? Calling MY requests bad, shame on you, you entitled little shit”   

Ridiculous, right?
Right? 

Surprisingly, this is true for human communication too. You can’t just spew out messages the way _you_ like them, without any consideration of the audience, and then get angry when they respond in the only way available to them. 

Well, you can, but don’t expect great success with this approach. 

RTFM people, get to know the APIs you’re calling if you want to have any success calling them. 

Reuse? Refuse! 9 minute rant on reuse done wrong

Growing pains

“Being a grown-up is basically googling things and being tired all the time” I read somewhere.  Yup, that pretty much sums it up.  At least after getting kids, I just never seem to have enough time.  I’m exhausted, I’ve always got like 10 things going on at once.  And I’m not the only one.  Does it have to be this way? Don’t you sometimes wish you could just be bigger? That would be great, wouldn’t it? Like King Kong. That would make my life so much easier!  Arms 5 meters long, feet the size of walruses, a mouth big enough to bite the head off a fully grown manatee.

Wait… WAT? 

No, that obviously wouldn’t help at all. OK, I guess the extra size would mean I could just stomp on, or eat, anyone who imposes extra work on me.  But as for normal completion of the work at hand, added size gives no extra benefit whatsoever.  Less work might help.  More people might help.  Growing bigger, even if this were possible, will obviously not help at all.  It will likely make things a lot worse.  With your head towering five floors above, you will have trouble picking up on what’s actually going on closer to the ground.  Preparing meals for your family will be a bit of a hassle if you don’t even fit in the kitchen.  Unless your goal is to intimidate or harm, growing bigger is not such a great idea.

So why is this our go-to approach when scaling business and software?  In organisations: we start with a small team. A visionary manager - the head, an engineer - the arms and legs, a marketing person - the mouth. A well functioning organisation.  Everything is efficient and exciting. Then as more and more work comes in, we scale this structure. An engineer becomes an engineering department, the marketing/sales person becomes the sales department, and so on. Before long the organisation has turned into a bureaucratic Godzilla.

In software - we start with a simple view, some business logic and a persistence layer. A sensible division for a small, single-purpose application. Then as time goes on and more and more features are added, we keep and grow this structure.  Instead of dividing our code up by functionality delivered to users, our application’s package hierarchy still starts off with “views”, “controllers”, “models”, and so on, with few clues about what problem the application actually solves. If and when our monolith finally gets too big, we typically keep partitioning along the same lines.  We end up with a “frontend team” and a “backend team”, a “rules engine team” and so on.  You’ve gone from a giant Godzilla monolith to more of a Power Ranger Zord thing or whatever it is called (you know, when all the Power Rangers join forces and become this big awkward-looking super-creature?). I don’t know if this is that much of an improvement.

A lot has been said about Conway’s law, and what I’m talking about kind of relates to this principle.  But while Conway describes how (large) organisations end up creating software in their image, I’m talking about how small organisations and code bases grow into large dysfunctional bureaucracies. This is as big a problem.  Our default strategy for coping with more work, is to grow the existing structure, adding bureaucracy.  Instead of splitting the problem up and creating new distinct organisations to handle distinct problems. 

Everyone knows that large bureaucracies (in code or otherwise) tend to be slow and hard to work for/with.  Change is a nightmare.  Every change necessarily needs to fit into the whole in a way that doesn’t harm any of the other parts of the system.   Just thinking about a potential change can take months! Implementing it can take years, with costs in the millions.  If, on the other hand, your organisation is small and focused, only handling a limited number of tasks, it’s easier to reason about the effects any potential change will have.  Changes won’t impact other organisations, so decisions can be made locally and implemented immediately. No need to go up a long chain of management to get approval. 

Being a large bureaucracy isn’t all bad of course.  Losses in efficiency are mitigated by the fact that, as already mentioned, given their size, they can just stomp on, or eat, any competition.  Thus obviating the need for improving performance.  Large bureaucracies also help fund business management consultancies - who would otherwise go bankrupt.  And they give jobs to large swaths of management who would otherwise be out of a job.  These are noble pursuits of course.  You can’t say an action is dumb until you know the intent behind it.  If the intent of growing your current organisational structure (of code or people) is to
a) employ the maximum number of people and fund your buddies in the management consulting sector - or 
b) crush all competition by sheer size - or 
c) you find endless meetings and coordination fun and find quick paced development stressful -  then you might be on to something.  
But if you want efficiency, the ability to move fast, grab opportunities,  deliver exactly what your customers want, then you should take some time to think about how you’re growing.  

When a new requirement comes in for your software - don’t just blindly bolt it on to the existing application. Think about how this new feature will be used.  Will it be used by the same people at the same time?  Are you creating data structures where only half of the attributes are in use in any given setting?  Are you reusing components in a way where everyone relies on component X, but they all use it in a different way?  This is a code smell. This smells of King Kong and the BFG’s whizzpoppers.  Stop blindly growing your existing structure. Actively look for ways of creating new ones instead.  New ones that are free to evolve in whatever way they need to. Leaving you the ability to evolve what you’ve got in whatever way it needs to.  It might be easier in the short run to just keep adding features, but it is likely to slow you down in the long run.  And speed is of the essence if you want your business to stay alive in today’s fast moving marketplace.

If you’re good and fast enough you might just get lucky and get bought up by one of those large bureaucracies :-) 

Christin Gorman - OFFENTLIG IT V2.0

Norwegian 30 min presentation about improving public sector IT in Norway

5 approaches to concurrency in Java - which one is the best?: Christin Gorman, Eivind Barstad Waaler

Why and when blocking matters

So last year I worked on some Vertx.io code.  I’d never come across it before, but the documentation was great and it worked extremely well.  But I just couldn’t get myself to like using it.  I really disliked working with the asynchronous non-blocking APIs. They were so cumbersome to work with. In every way.  Like driving a Lamborghini - fast, great quality, but just not very easy to use.  In fact, downright impractical in many ways.  Do we really have to put ourselves through this to get performant code, I thought to myself.  Then I read this post on concurrency by Joe Armstrong. Hah! I thought to myself. I’m not the only one who reacts to this stuff.  Why can’t we do concurrency like Erlang on the JVM?  Why does blocking code have to be so problematic?  I couldn’t find any texts that really explained it in a simple and satisfactory way.  But after lots of reading, thinking and experimenting I have finally sorted out my understanding of the subject, and thought I’d share my findings in case there are others like me out there who feel confusion around the issue.  So here we go.

Let’s start at the beginning - bear with me.  You’ve got a CPU. It can process one instruction at a time. If you want multiple tasks running simultaneously - parallel execution - well, you can’t.  However, since each task consists of a list of instructions, the CPU can interleave instructions from one task with another, giving the impression of simultaneous execution at the cost of some reduction in performance: concurrent execution.  The instructions for each task are executed on what we call a thread.  The CPU scheduler allocates time to each thread according to various rules, so each one progresses in a more or less predictable manner.  Sometimes the instruction in a thread is simply: “wait for a response coming out of the end of this socket”. We then say the thread is blocked.  There is nothing for the CPU to do, so the CPU simply moves on to the next thread immediately.  Blocked threads do not block the CPU in any way - it always keeps moving. There is no blocking. I repeat: there is no blocking.  Blocking is simply a mechanism to signal that the CPU can move along to the next task.

for (int i = 0; i < numThreads; i++) {
    new Thread(() -> {
        try {
            Thread.sleep(1000);
        } catch (InterruptedException ignored) {
        }
    }).start();
}

(Thread.sleep throws a checked InterruptedException, hence the try/catch.)


Thread.sleep is a blocking call. But that doesn’t matter at all for the performance of this code.  This code will complete in 1 second whether numThreads = 1 or 1000.  

So why are we spending so much time talking about non blocking code? If the CPU is never blocked anyway - why is blocking a thread such a big deal? Well it isn’t, as long as you don’t have too many concurrently running tasks. If your web server (for instance) only ever has 500 concurrent users, you can have a thread pool with 500 threads, and you won’t ever have a problem with the threads blocking. True story.

But what if you have 10 000 concurrent users? Now you might have a problem. But the problem is _not_ that your threads are blocking. Blocking threads are not the problem.  The problem is that you can’t have that many threads.  Try the sleep code example above with numThreads = 10 000. You won’t see a performance degradation, you’ll see an OutOfMemoryError.  Each thread takes up a considerable amount of system resources.  If your concurrency strategy is based on threads, you are limited by the number of threads your system can handle.

Basically, your OS has implemented a concurrency mechanism that does not scale.  Why can the Erlang VM scale its concurrency mechanism when the OS can’t? I honestly don’t know enough about it to answer, but that’s the reality.  For whatever reason, your OS sucks at concurrency.  If you want scalable concurrency, you’re going to have to choose either a framework/library that helps you (like Vertx.io) or a VM that does (like BEAM - the Erlang VM).  But before you do, check what the limits are for your application and your servers.  If you’ve got a blocking Java implementation that works, and the number of concurrent operations does not exceed the number of threads your servers can handle, you’re sorted. There is no need for you to worry about blocking threads, ever. Don’t solve problems you don’t have.

But if you’ve only got one server, or a limited number of them, and need to scale your application to handle more than a couple of thousand simultaneous blocking operations, you need to base your concurrency strategy on something other than threads. You’ll still be using threads at a lower level, but at runtime you’ll need each thread to handle multiple tasks.  _This_ here, and only here, is where blocking becomes a problem.  If each thread is responsible for completing, say, 1000 tasks and one of them blocks execution - this means none of the other 999 are getting _any_ work done.  Blocking one task means blocking them all. Obviously blocking is disastrous in this case.
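A few lines of standard java.util.concurrent code illustrate the point. Here one deliberately slow task hogs a single-thread executor, so an otherwise instant task has to wait the full 200 ms for it to finish:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class OneThreadManyTasks {

    // Returns how long the "instant" task had to wait because another
    // task was blocking the single shared thread.
    public static long measureDelayMs() throws InterruptedException {
        ExecutorService loop = Executors.newSingleThreadExecutor();
        AtomicLong quickTaskRanAt = new AtomicLong();
        long start = System.nanoTime();

        // One task blocks the shared thread for 200 ms...
        loop.submit(() -> {
            try { Thread.sleep(200); } catch (InterruptedException ignored) { }
        });
        // ...so this task, which does almost nothing, can't even start until the sleep is over.
        loop.submit(() -> quickTaskRanAt.set(System.nanoTime()));

        loop.shutdown();
        loop.awaitTermination(5, TimeUnit.SECONDS);
        return (quickTaskRanAt.get() - start) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Quick task was delayed ~" + measureDelayMs() + " ms");
    }
}
```

With one thread per task, the quick task would have finished immediately; with many tasks per thread, one blocking call holds everyone else hostage.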

So what do you do in practice? There are a few different approaches out there to deal with this. One is the “single threaded event loop”, most famously used in Node.js. On the JVM, the already mentioned Vertx framework uses this strategy. Put simply, your application (like the CPU scheduler) keeps a list of tasks - units of runnable code - that are executed one by one in a single thread. If you need anything done, you just add that operation to the list and it will be executed in turn.  The tasks don’t take up much memory, so you can have practically any number of them. As opposed to the CPU scheduler - which can stop threads at will to switch execution over to the next thread waiting in line - the single threaded event loop completes each task before moving on to the next one. One at a time they get completed, as fast as can be. (UNLESS, of course, one of them blocks the thread.)

So how do you query a database? Or call a remote service? At the end of the day, if your program needs data from the database, it’s going to have to wait for that data to return. Something in your code has to be listening for the response back from the database.  At some level the application is “blocked” while it’s waiting for the data. At some level a socket is opened, and something is waiting for data to come out of it. What is doing the waiting for you if you’re not allowed to block the thread?
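To make the mechanism concrete, here is a minimal, stripped-down sketch of such a loop. This is nothing like the production implementations in Vertx or Node.js (no IO, no timers, a made-up addTask method) - it just shows the core idea of queued tasks running one at a time on a single thread:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class MiniEventLoop {
    private final Deque<Runnable> tasks = new ArrayDeque<>();

    // Anyone can enqueue work; nothing runs until the loop gets to it.
    public void addTask(Runnable task) {
        tasks.add(task);
    }

    // One thread, one task at a time, each run to completion before the next.
    public void run() {
        while (!tasks.isEmpty()) {
            tasks.poll().run();
        }
    }

    public static void main(String[] args) {
        MiniEventLoop loop = new MiniEventLoop();
        StringBuilder order = new StringBuilder();
        loop.addTask(() -> {
            order.append("A");
            loop.addTask(() -> order.append("C")); // tasks can schedule follow-up tasks
        });
        loop.addTask(() -> order.append("B"));
        loop.run();
        System.out.println(order); // ABC
    }
}
```

Note how the follow-up task “C” runs after “B”: work scheduled from inside a task goes to the back of the queue, which is exactly how the non-blocking example below will chain its steps.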

Let’s take a trivial web example. Starting with a traditional blocking implementation.  A servlet that queries some remote service for data before returning the result:

public Response getSomething(Request parameters) {
    validate(parameters); // throws exception if invalid
    String id = parameters.getParameter("id");
    Person person = getPersonFromRemoteLocation(id);
    return transformObjectSomehow(person);
}

Here, the “getPersonFromRemoteLocation” implementation will most likely open a socket, connect to a remote socket, send a request and block the thread until the response has been completely received. With non-blocking IO, you’ll still be opening sockets like before, still sending data over them like before, and still receiving data from them like before - it’s just that you do it all in small increments.  For the example above to work, you need, conceptually, to turn your one code block of consecutive operations into three separate tasks.
First you set up your request, open a socket and send it.  

(None of this is “real” code, I’m just trying to illustrate the process in a simple way.)

getSomething(Request parameters) {
    validate(parameters);
    String id = parameters.getParameter("id");
    SomeConnectionContext ctx = sendDataOnSocket(remoteAddress, id);
    addTask(() -> isResponseComplete(ctx));
}

This does not block - addTask returns immediately, without waiting for anything.  You’re just adding the next step of the operation - the polling for data off the socket - to the application’s list of tasks.

The polling task could look something like this.  It reads data from the socket and checks whether it is complete yet.

isResponseComplete(SomeConnectionContext ctx) {
    SomeConnectionContext updated = ctx.pollForMoreData();
    if (updated.isComplete()) {
        addTask(() -> resultHandler(updated.completedResponse()));
    } else {
        addTask(() -> isResponseComplete(updated));
    }
}

Polling the socket for data is not blocking - you just get all the data available so far, check if it’s complete - if not, keep it and try again.  If it is complete, call the result handler with the completed data.

Finally you’ve got the code that handles the result.

resultHandler(Person p){
   return translateToMyDomain(p);
}

Each of the three bits of code can be completed without blocking. Each bit can easily be interleaved with other tasks. Your thread will always be busy, always performing meaningful work.

Performance-wise this works great. But as mentioned already, there is a major drawback: the code becomes a lot more complex.  Instead of having one easy-to-read list of instructions, you’ve now got to split everything up.  Testing becomes more cumbersome, understanding what’s going on is more difficult, and if you forget yourself and make a blocking call - which is really easy to do, given that most of today’s libraries and frameworks are based on blocking calls - you’re undoing all the benefits of the async framework. Finally, depending on what kind of remote service you’re calling, it might not even give you a real performance benefit.  If you’re calling a database that can only handle 500 queries at a time, it doesn’t really help that your web application can do 1 000 000. The throughput of your system is only as fast as the slowest part of it.  At the end of the day, you need to weigh the pros and cons. Sometimes you really need the speed of a sports car, you really need a Lamborghini - like… when… hmmm… Damn it, I can’t think of a single reason you’d actually need a Lamborghini.  But let’s just say there is one.  For those cases, you just have to live with the inconveniences they come with.

There are other concurrency approaches out there for the JVM.  I’ve already written a tiny bit about Quasar in a previous post.  Instead of providing library-style mechanisms for concurrency like Vertx.io does, they are trying to solve the problem at more of a VM level.  Remember how blocking is not the problem when executing thousands of concurrent tasks? The problem is your limited supply of/room for threads. Quasar attempts to remove _this_ problem. “What if we could just have an endless supply of threads?”  With Quasar you write your code as before. With blocking code. But instead of running your code on a thread, it is run on a “fiber”.  The Fiber class has pretty much the same API as the Thread class, and a lot of common libraries have been rewritten to use fibers instead of threads. In an application using fibers, your code gets instrumented with the Quasar agent before it starts up. At each blocking call, the Quasar system, like the CPU scheduler, will switch execution over to another fiber.  The result is that you can write code that looks more or less like before.  You can have a function that constructs a query, calls the database, and then processes the results, all written with instructions following each other one line after another. But at runtime, the thread running the first part of the function will not be the same as the thread running the end of the function (just like .NET’s async/await framework). So ThreadLocals, and any other mechanism depending on the OS thread, need to be avoided.

I don’t have any experience with running Quasar in production, I’ve just been playing with it a little at home. Trying to get it to beat Vertx. Because I really wanted it to be better than Vertx.  But I can’t beat Vertx’s performance.  In addition, I keep messing up, getting error messages, and things crash.  Now I know full well that this may be because I suck, that I’m doing something wrong.  I have no doubt that’s the case.  But this still doesn’t change the fact that I got Vertx to perform astoundingly well, without fail, “straight out of the box”.  The documentation is great and it just works.  I still really hate the syntax, but I just have to admit that everything else is awesome.

So what’s my conclusion? Well, you know how we used to do all sorts of weird, complex things to optimise for limited memory? We don’t any more, because we have pretty much universally come to the conclusion that the benefits of simple code far outweigh the drawbacks of using more memory. Especially since memory is so cheap nowadays.  I hope some day we’ll see non-blocking code as another quaint example of how we used to have to work around limits in our coding environment.  At the end of the day, our applications and the tasks they perform will always block. When you clicked the link to this blog post, you had to wait for it to render on your screen.  The code responsible for handling a task should resemble that task as closely as possible for it to be easy to understand and work with.  Blocking code is by far the easiest to understand, and thus the easiest to keep correct.  
So… what concurrency approach do I recommend right now then? Should we go with Vertx despite its annoying API? Should we switch to fibers? Or should we just use lots of load-balanced, traditional blocking web servers? Well, for me the answer is easy: none of the above. Screw the JVM, I’m going with BEAM. I’m going to learn Erlang, Elixir, or LFE (lisp (flavoured (erlang))) 

May 5

Workers idling by the coffee machine

Sometimes you just can’t get everything done with your own employees; you need outside help. Contractors. Consultants. Your office space puts a limit on how many of them you can have at any time.  As they can be expensive, once they are on site you should ensure they produce as much value as possible. You don’t want them sitting around drinking coffee while waiting for decisions to be made.  Every minute they are billing should be spent producing value for you.  How do you achieve maximum contractor utilization? Multitasking, of course! We all love multitasking, don’t we? Ensure that there is ALWAYS some work to do. While a contractor is waiting for an answer to a question, she can pick up another task. When the answer comes in, the work can be picked up again by whichever consultant is free. Any one of them. If you limit a contractor’s work to just one task, there will be wasted time.  If you have them share the tasks between them, you can maximize their utility. Will they get exhausted and confused by all the different tasks? Who cares, they are contractors :-) To some degree we all do a bit of multitasking, of course.  The “agile” idea of cross-functional teams where anyone can do any task is to a large extent built on this idea.  It can work well up to a certain point. But there is a real problem with this approach: it becomes much harder to keep a good overview and understanding of what exactly is going on. The big picture.  Who is working on task A? Who can tell me how work on task B has progressed this last month? These questions become increasingly difficult to answer the more people are involved and the more the tasks are broken into tiny fragments.

I am of course talking about concurrent programming and threads.  Threads are expensive and we don’t have room for all that many of them, so we don’t want them to sit idle waiting for IO. Writing blocking code means threads sitting idle, waiting.  This is wasteful.  Writing non-blocking code, however, while efficient, all too often leads to code that is very hard to understand and work with.  The value the program is delivering -
 1) receiving a request from a user
 2) validating it
 3) retrieving values from external services
 4) merging them with the user input
 5) storing it all in a database
 6) returning the result back to the user
These kinds of flows have to happen in sequence, and they are hard to follow if everything is chunked up into small bits, all handled by random threads, with intermediate results put on queues and sent back and forth. This:

Result submitApplication(UserInput submission) {
    validate(submission);
    Address address = retrieveAddressFor(submission.userId);
    ContactDetails contact = retrieveContactDetailsFor(submission.userId);
    CompletedApplication application = CompletedApplication
        .withAddress(address)
        .withContactDetails(contact)
        .withUserInput(submission);

    return database.store(application);
}

Having it all there, in sequence, is a lot easier to understand than:


void submitApplication(UserInput submission) {
    validationQueue.offer(submission);
}

Where the hell did the data go? What happens next? How do I test this? What are the consequences of putting this data on the queue? It’s not obvious.

Or this monstrosity:

void submitApplication(UserInput submission) {
    validate(submission, (result) -> {
        if (!result.failed) {
            retrieveAddressFor(submission.userId, (addressResult) -> {
                if (!addressResult.failed) {
                    retrieveContactDetailsFor(submission.userId, (contactResult) -> {
                        if (!contactResult.failed) {
                            database.store(CompletedApplication
                                .withAddress(addressResult.data)
                                .withContactDetails(contactResult.data)
                                .withUserInput(submission), (dbresult) -> {
                                    // somehow find whatever context this request
                                    // originated in and send the result back to the user
                                });
                        }
                    });
                }
            });
        }
    });
}

It is perfectly possible to write decent applications where threads multitask efficiently, but these non-blocking approaches quickly get confusing and are often implemented poorly.  
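To be fair, the stock JDK does offer `CompletableFuture` for composing non-blocking steps with less nesting than raw callbacks. A minimal sketch of the same application flow, with hypothetical stand-ins for the lookups and the database (the real versions would do actual IO):

```java
import java.util.concurrent.CompletableFuture;

public class SubmitFlow {
    record Address(String value) {}
    record ContactDetails(String value) {}
    record Application(Address address, ContactDetails contact, String input) {}

    // Hypothetical async lookups; stand-ins for real non-blocking IO calls.
    static CompletableFuture<Address> retrieveAddressFor(String userId) {
        return CompletableFuture.supplyAsync(() -> new Address("Oslo"));
    }
    static CompletableFuture<ContactDetails> retrieveContactDetailsFor(String userId) {
        return CompletableFuture.supplyAsync(() -> new ContactDetails("555-0100"));
    }
    static CompletableFuture<String> store(Application app) {
        return CompletableFuture.supplyAsync(() -> "stored:" + app.input());
    }

    static CompletableFuture<String> submitApplication(String userId, String input) {
        // Fetch address and contact details concurrently,
        // then combine them with the user input and store the result.
        return retrieveAddressFor(userId)
            .thenCombine(retrieveContactDetailsFor(userId),
                (address, contact) -> new Application(address, contact, input))
            .thenCompose(SubmitFlow::store);
    }

    public static void main(String[] args) {
        System.out.println(submitApplication("user-1", "form-data").join());
    }
}
```

It reads better than nested callbacks, but it is still a chain of stages run by whatever thread happens to be free, with all the debugging and ThreadLocal caveats that implies.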

There must be another way. There is another way. There are several. Cloud computing, load balancing and so forth can provide one answer. We can keep our simple blocking implementations and simply run more of them. This way, the fact that one process is a bit wasteful doesn’t matter as much, since we can just add more of them to get the performance we need. The cost of computing resources keeps dropping, so this is a perfectly viable option in many cases.  Moore’s law has meant that the value of optimization in code has been steadily decreasing. It is often both cheaper and better to solve inefficiencies with hardware rather than software.  
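A plain blocking server of this kind is trivial to write with the JDK’s built-in `HttpServer` (a minimal sketch; the pool size is an arbitrary assumption). Each request simply occupies one pool thread while it runs, and you scale out by putting more instances behind a load balancer:

```java
import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;
import java.util.concurrent.Executors;

public class BlockingServer {
    public static HttpServer start(int port) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        // One OS thread per in-flight request: wasteful, but dead simple.
        server.setExecutor(Executors.newFixedThreadPool(200));
        server.createContext("/", exchange -> {
            byte[] body = "hello".getBytes();
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();
        return server;
    }
}
```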

But there are other ways. Why are we relying on “expensive external contractors”, OS threads, at all? Do we have to? What if we solved that problem? That’s what Erlang has done. And that’s what the guys at Parallel Universe are trying to do with Quasar. They make their own threads, lightweight threads. Quasar calls them Fibers.  We can’t get away from using OS threads at a lower level, but we can abstract them away in our programming environment. Fibers are cheap; you can make as many as you want, millions.  So there is no need to worry about any of them “idling by the coffee machine”. You can write blocking, simple, straightforward code, without any of the drawbacks. Lightweight threads are like having an unlimited supply of employees at a fixed total cost. Like outsourcing your programming to a low-cost country. …Except they speak your language, understand your domain and culture, share your goals and operate in your own timezone. So… quite different from outsourcing, when it comes down to it.  More like outsourcing to robots - small, cheap robots that do your work for free.

Quasar Fibers have pretty much the same interface as Threads, so you can port your thread-blocking code to fiber-blocking code with very little effort. The Comsat project has already provided fiber-based implementations of a whole range of technologies.  I think this looks pretty awesome! Why hadn’t I heard about this before?  I’m going to have to look into this in more detail.  I’ll get back to you with my findings before long, hopefully :-)