Equinox IT Blog

Conquering Excel to build a Monte Carlo simulation for Agile project estimation

Conquering Excel to build a Monte Carlo simulation for Agile project estimation

Monte Carlo simulations can be used on Agile software development projects to manage uncertainty and reduce risk when determining the likely time cost and time estimates for the project. Monte Carlo simulations involve mathematical modelling and this is normally performed in Microsoft Excel or similar spreadsheet software.

I recently had the joy of building a fairly detailed Monte Carlo simulation using Microsoft Excel to model the expected costs of an Agile project for one of my clients. Along the way I learned a few things about Excel. So, if you are looking to build a Monte Carlo simulation for an Agile project and you’re wrestling with Microsoft Excel, then this article may help you avoid some of the challenges and pitfalls that I encountered.

My Monte Carlo scenario

My Monte Carlo simulation was used to better visualise the delivery risk and associated costs for the client. A number of the parameters of the cost model were included as probability distributions and then a random number was simulated within those distributions for the purposes of including within the overall cost.

How good is the Excel RAND() function at generating random numbers?

The first problem I faced was generating random numbers within the distributions for the simulation. Honestly I’m no statistics expert but there was enough material online to give me concern - see McCullough's paper On the accuracy of statistical procedures in Microsoft Excel 2007. However, this does appear to have been addressed since Excel 2010 as per Microsoft’s Excel release notes, although I couldn’t find any additional information about it. I decided to give it a go with the standard Excel RAND() function knowing that I would need to look at the results and empirically assess whether there actually was a problem or not after the event.

When does the worksheet_calculate method get called?

Part of the cost model implemented a facet of Brooks’ Law to simulate the overhead of adding individuals to teams and multiple teams to projects in an attempt to deliver in a shorter timeframe. Because numbers of people and teams are discrete, however, I couldn’t ‘just’ resolve the simultaneous equations this resulted in analytically. Instead I needed to implement the equivalent of a goal-seeking function as a simple method in Visual Basic for Applications (VBA).

My first problem was that I didn’t know where best to do this. The Worksheet_calculate method seemed like an obvious place but I didn’t know whether it was called at the right time to reference all the information I needed from the spreadsheet or not. To save you the time trying to work this out, I can confirm that worksheet_calculate is called AFTER Excel has finished executing all of the calculations in the spreadsheet.

Why do I get an “Out of Stack Space” error from my worksheet_calculate method?

So having determined that the Worksheet_calculate was the right place to implement my method, I trialled it and immediately started seeing some very peculiar behaviour.

Firstly I got an ‘Out of Stack Space’ error. It turns out that this is because at the end of the calculation I was performing in the worksheet_calculate method I was, unsurprisingly, writing out my result to a cell on the spreadsheet. This resulting value was then used elsewhere, and this triggered another calculation chain call in Excel, which in turn resulted in my worksheet_calculate method being called again, ad infinitum.

I resolved this by disabling and then re-enabling Application Events at the beginning and end of the method, like this:

Private Sub Worksheet_Calculate()
    Application.EnableEvents = False

    result = 1 + 1
    Sheet1.Range("A1").Value = result

    Application.EnableEvents = True
End Sub

It’s worth noting that you should be careful to turn events back on again, especially in error scenarios, as there is no way of doing this in Excel outside of VBA, with the exception of restarting Excel again as a whole.

It’s important to understand what this does because it might not be appropriate for your scenario. By disabling Application Events, the event that would result in the worksheet_calculate method being called is never created. The sequence in which this plays out is therefore:

  1. A cell used in a calculation is changed on the worksheet
  2. The worksheet is automatically recalculated by Excel
  3. Excel generates an event to say that the calculation has finished
  4. This event results in the worksheet_calculate method being called
  5. Our worksheet_calculate method disables events and then does some maths and outputs the result to a cell which is required by another calculation in the sheet
  6. Excel automatically recalculates the worksheet
    1. This all happens before our worksheet_calculate method finishes executing the write of output to a cell and so events are still disabled
    2. Because events are still disabled, the worksheet_calculate method is NOT called at the end of the calculation chain following our output being written to cell A1.
  7. Now our worksheet_calculate method is allowed to continue and re-enables events.

In my scenario it was okay to use this approach to dealing with the problem because the output of my calculation was not used, indirectly or otherwise, to affect the input values upon which my calculation was based so it wouldn’t need to be recalculated.

The result of my worksheet_calculate method doesn’t relate to the input values on the spreadsheet!

Now the worksheet_calculate method was completing execution and not crashing Excel by running out of stack space I could start looking at and testing the results. It became obvious pretty quickly that the output was wrong. I spent a while rechecking my method and the calculations and everything seemed to check out, so what was going wrong?

I eventually discovered that the values that my method was using as inputs did not marry up with the values in the cells on the spreadsheet where it was getting them from. After some more digging I eventually discovered one of the problems with what Microsoft calls Volatile functions in Excel.

What is a Volatile function in Excel and why do I care?

Excel has a small list of what it calls Volatile functions. These look like any other function but they behave subtly differently. Generally an Excel function is executed only when the inputs to it change which is done as part of the Calculation Chain. Volatile functions however will re-execute whenever anything changes anywhere. RAND() is one such Volatile function. If you have =RAND() as a formula in a cell you will notice that if you change any other cell on the spreadsheet, a new random number will be generated.

Normally this behaviour is exactly what I want, however, some of you may have already realised why this was a headache for me in this scenario. The problem is with an assumption I made earlier about disabling Application Events. The assumption was:

“In my scenario it was OK to use this approach to dealing with the problem because the output of my calculation was not used, indirectly or otherwise, to affect the input values upon which my calculation was based so it wouldn’t need to be recalculated.”

As I worded it above, I was and still am correct. In writing out the result of my calculation to the spreadsheet I was triggering another recalculation of the cells that relied on the output. Unfortunately, a side effect of that recalculation was that every Volatile function in the Workbook was re-executed which meant that the RAND() function generated a new random number, which was then eventually used to calculate one of the inputs to my worksheet_calculate method. Since I had disabled Application Events in my method however, it didn’t get called again as a result of the inputs changing so the old result remained written on the spreadsheet but with new input values specified. Disaster!

Writing your own User Defined Function (UDF) in Excel

Taking a step back from the problem I realised that in my case I didn’t actually want RAND() to regenerate random numbers all the time. I only wanted it to be ‘volatile’ just before each iteration of my simulation. The solution was therefore to create a non-volatile random number generating function.

Excel allows you to define your own functions by creating them in a VBA module. There is plenty of material on the web to explain how to do this once you know that you should search for User Defined Function (UDF) so I won’t bother elaborating on how to do it here.

The function I defined was simply looking up a random number from a table of random numbers defined in another spreadsheet. To ensure that each instance of a random number was different, one of the parameters to my function was an ID number that correlated with a specific random number from the table. This way I could ensure that the same number was used where it needed to be used multiple times in a single equation. It also meant that I could ensure that every separate probability distribution had its own random number.

To simplify my own life I decided to lazy populate the random number table whenever my function came across an ID that wasn’t already in the table.

Why doesn’t my User Defined Function seem to do anything?

I soon discovered that lazy population wasn’t an option. Excel is very specific about what a UDF is and is not allowed to do, and one of the things that it is definitely NOT allowed to do is to change the spreadsheet in any way. I even briefly tried to work around this by storing my random numbers using the Application.ExecuteExcel4Macro method to store them in memory but for the record, that’s also not allowed. Slightly irritatingly though, Excel doesn’t tell you this at the time. You will find that your UDF simply stops executing on the first line of code that violates one of these rules and returns the #VALUE error in the cell.

Why don’t my calculations use my new random numbers?

Finally, having created a very simple UDF that reads a number from a table and writing a separate function called by my Monte Carlo simulation code to re-generate all the random numbers in the table I was good to go and I hit my Simulate button. And I got absolutely no distribution of numbers whatsoever. I got a single point from my simulation over and over and over again. The reason was that although Excel re-executes formulas when the values upon which they rely change, this doesn’t apply to UDFs. Re-generating the table of random numbers did not trigger any of the cells using my UDF to re-execute and therefore they were still using the same random number they had returned the last time they were executed, even though the cell that the random number itself existed in had changed.

The final change I had to make therefore was to make my UDF Volatile. You can do this by simply putting Application.Volatile at the beginning of your function. Don’t worry, despite scare-mongering about Volatile functions earlier, it’s ok here because the result of the function executing won’t change because it’s simply a look up to a table. It would only change if the contents of the table changes and that will only happen when we intentionally re-generate the values.

Good luck

And that’s it. I’ve developed a new love-hate relationship with Excel as a result of this work but hopefully this article will save you some of the hate!

If you other experiences with modelling Monte Carlo simulations in Excel, then feel free to add these in the comments below.

Good luck with your Monte Carlo simulations and with using Excel for Agile project estimations.

Recorded webinar: Using agile techniques to manage risk more effectively

Subscribe by email