Increasing the speed of your DevOps teams

by Carl Weller on 28/03/2019 10:00

This is the third post in a series exploring Lean thinking and DevOps. In this post I cover one of the reasons work takes longer than it should.

For many years 'multi-tasking' has been touted as a virtue. We all do it, at an individual level throughout the day, and organisationally by assigning people to a mix of Business as Usual (BAU) and project work. Most organisations I work with have dozens (and in some cases hundreds) of projects on the go at any one time.

But there's a problem. Multi-tasking isn't good, and we generally don’t truly multi-task anyway. What people generally do is move items from "waiting" to "in progress" while they are focused on the task, and then back to "waiting" as they pick something else up. This tends to make delivery of any one item take longer than it would if we worked sequentially rather than in parallel.

If we take the example of three simple tasks, each five days long, we can deliver them sequentially or in parallel. In series these items will be delivered every five days, so that’s a delivery on the 5th, 10th and 15th day respectively.

Three simple tasks being delivered sequentially

In the worst case, delivering them in parallel (i.e. 'multi-tasking') will result in each item being delivered very close to the 15-day mark. And that's if we assume perfect efficiency and no productivity loss from context-switching. This is illustrated below.

Three simple tasks being delivered in parallel (multi-tasking)

In reality, each task may be delivered somewhat late, but in no case will two tasks be delivered on or before the 10th day.
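The arithmetic behind the two delivery patterns can be checked with a short sketch. This is only an illustration under stated assumptions: the function names are hypothetical, the "parallel" case is modelled as spending one day on each unfinished task in turn, and context-switching is assumed to cost nothing.

```python
def sequential_delivery_days(durations):
    """Finish each task before starting the next; return the day each task is delivered."""
    delivered = []
    elapsed = 0
    for days in durations:
        elapsed += days
        delivered.append(elapsed)
    return delivered


def round_robin_delivery_days(durations):
    """Spend one day on each unfinished task in turn (an assumed model of multi-tasking)."""
    remaining = list(durations)
    delivered = [0] * len(durations)
    day = 0
    while any(remaining):
        for i in range(len(remaining)):
            if remaining[i] > 0:
                day += 1
                remaining[i] -= 1
                if remaining[i] == 0:
                    delivered[i] = day
    return delivered


tasks = [5, 5, 5]  # three tasks, each five days long
print(sequential_delivery_days(tasks))   # [5, 10, 15]
print(round_robin_delivery_days(tasks))  # [13, 14, 15]
```

The total effort is 15 days either way. Working sequentially simply gets the first two items into the customer's hands much sooner.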

This logic scales up and down your organisation – just as multi-tasking slows down one person with three simple tasks to do, multi-tasking with larger tasks or projects has the same issues at an organisational level.

Multi-tasking is a way of ensuring high utilisation when there are variable waits between work items due to dependencies. I'd say it is a hangover from 20th Century manufacturing techniques, except most successful manufacturers either use statistical process control to remove sources of variation or use some form of synchronising mechanism to manage small variations (e.g. Eli Goldratt's Theory of Constraints or Toyota's Just-in-Time process).

By trying to do more you are going slower. This was proven by John Little in the early 1960s and is known in operations management and Agile circles as "Little's Law".

Little's Law is foundational in queuing theory. It is used in the allocation of toll-booths on new highways, in designing computer chips and, in fact, in anything where there is a flow of incoming work, one or more in-progress states, and a way for work to leave the 'system'. For those who are interested in researching this further here's a useful paper by Mr Little himself. Be aware it does get a little bit geeky in places.

An easy way to think about this is to visualise water flowing through a pipe. The most efficient process would be one where the water is entering and departing the pipe at the same rate (i.e. very low 'work in progress').

Visualisation of water flowing through a pipe as a metaphor for work in progress

However, what typically happens in knowledge working organisations is that we allow more work to enter the system than is leaving. One of the reasons for this is we have no useful ways of seeing how full the work system is. Most of the work is sitting in people's heads (or on hard-drives). It's not like an airport where, if more planes arrived than departed, you would start to see issues very quickly and then act to resolve them. Let's imagine we add a storage tank to our pipe to show where work sits while it's waiting to be done.

Throughput, work in, work in progress and work out

Using Little's Law we know the throughput (2 litres per minute) and we know the work in progress (10 litres). From this we can calculate that on average every work item (why not think of them as cups of water?) will take 5 minutes. The formula here is that average completion time = work in progress divided by throughput.

If we allow even more work to enter the system then we need a bigger storage tank. Let's say we allow another 10 litres of water in (even though we are still only getting through 2 litres of work every minute).

If we allow more work to enter the system we have more work in progress

Once the work system stabilises again, we can re-calculate average time to completion (Little's Law assumes a stable work system). In this case the average time to completion is now 10 minutes. That is: 20 litres divided by 2 litres per minute. The worst thing is, many organisations find that things start taking longer so they start them earlier! This adds more work in progress and makes the problem worse rather than better.

So, the more work in progress you have the longer everything takes. Now think about your in-flight projects. Each one of those is not a matter of minutes.

If we stop any new work entering the system until work in progress is down to 5 litres, then average time to completion will drop to 2.5 minutes. We are now much more responsive. If these are short projects instead of cups of water, it would give us the opportunity to quickly change direction as required by our customers.

Reducing work in makes the team more responsive
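For readers who like to see the numbers worked through, here is a minimal sketch of the calculation used in the water tank examples above: average completion time = work in progress divided by throughput. The function name is hypothetical, and the result only holds for a stable system, as Little's Law assumes.

```python
def average_completion_time(work_in_progress, throughput):
    """Little's Law: average completion time = work in progress / throughput."""
    if throughput <= 0:
        raise ValueError("throughput must be positive")
    return work_in_progress / throughput


# The three tank sizes from the post, all draining at 2 litres per minute.
for wip_litres in (10, 20, 5):
    minutes = average_completion_time(wip_litres, throughput=2)
    print(f"{wip_litres} litres in progress -> {minutes} minutes on average")
# 10 litres in progress -> 5.0 minutes on average
# 20 litres in progress -> 10.0 minutes on average
# 5 litres in progress -> 2.5 minutes on average
```

Notice that the throughput never changes. Only the amount of work allowed into the system does.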

And the most magical thing about this? Allowing work to enter the system is a policy decision. It is much easier to make such a management call than it is for people to learn how to work twice as fast (i.e. change the "work out" volume to 4 litres per minute).

Turning off the "on tap" is the fastest way to reduce work in progress and start seeing the average time to completion drop. If this is too much, then try to reduce incoming work. But remember: for work in progress to go down, more work has to be leaving the system than is entering it.
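To get a feel for how long it takes to bring work in progress down, here is a rough sketch using the water tank numbers. The function name is hypothetical, the rates are assumed to stay constant, and the "1 litre per minute" inflow is an illustrative figure of my own rather than one from the examples above.

```python
def minutes_until_wip_reaches(target, wip, work_in, work_out):
    """Minutes for work in progress to fall from wip to target (litres),
    assuming constant work_in and work_out rates in litres per minute."""
    if wip <= target:
        return 0.0
    net_outflow = work_out - work_in
    if net_outflow <= 0:
        return None  # work in progress never falls
    return (wip - target) / net_outflow


# Start with 20 litres in the tank and aim for 5 litres.
print(minutes_until_wip_reaches(5, wip=20, work_in=0, work_out=2))  # 7.5  (tap off)
print(minutes_until_wip_reaches(5, wip=20, work_in=1, work_out=2))  # 15.0 (tap half on, an assumed rate)
print(minutes_until_wip_reaches(5, wip=20, work_in=2, work_out=2))  # None (nothing changes)
```

With the tap fully off the tank reaches the target in half the time it takes with the tap half on, and if work keeps arriving as fast as it leaves, the target is never reached.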

So how do I start implementing this approach?

Well, my first post in this series, Introducing Lean thinking to DevOps, has information on creating a pull system and using work in progress limits. My second post in this series, How Lean can help DevOps teams be more responsive, covers how you might use a priority-setting meeting to continually replenish the input area of a team board.

Both posts use visual workflow boards. These can be physical or virtual (e.g. Trello, Jira, Azure DevOps). To be quite frank, a virtual board can vanish from sight on a hard-drive pretty easily and simply become a team workflow management tool rather than a highly visual way of communicating with stakeholders, so I prefer physical boards.

If you need to run a virtual board for offsite team members, please consider using a physical board as well. It will provide the level of transparency you need to get your work system under control. It will also help discussions with stakeholders about adding more work. Try having those conversations in front of your physical work board!

You will need to use some political capital to get stakeholders to try this approach as it is counter to the way knowledge working organisations tend to think about work.

In a future post I'll look at value adding and non-value adding work.

If you like these posts and/or would like to know more, please use the comments section below.

