Defect management used to be a well-defined, well-understood and tightly controlled, albeit wasteful, process. However, in the world of Agile delivery we give accountability for the overall process of delivering working software to the delivery team as a whole, and not a lot is said about how, specifically, the team should operate.
This lack of clarity, and therefore certainty, crops up again and again when I’m teaching people about the various Agile methodologies, so I thought it would be worth writing down answers to some of the more common questions, the first of which is: “How do you run defect management on Agile projects?”
I’m going to cover three different scenarios. Whether you encounter these scenarios in your situation depends upon how widely and deeply your team has adopted Agile principles and processes. Lots of teams have very different processes because of how they are constrained by the wider organisation they work within, but hopefully, whatever the wider context, there should be something here for you.
Scenario 1 – Work in progress defects
Scenario: A user story is in the process of being delivered and while being tested internally in the team a defect is discovered.
The first scenario is what I’d expect to be the norm for most Agile delivery teams running any methodology that is delivering new features.
In this scenario the user story should go back to the developer to resolve the defect and then be re-tested until the defect is fixed. There’s often a debate here around whether the defect should be recorded in some way. In short, it depends. There is value to the team in knowing how much waste is occurring in the feedback loop between development and test so that it can be reduced, but that’s the only real value in recording the defect. So, unless the team is using those metrics, or relies on the information that would be recorded along with the ticket to resolve the defect, I wouldn’t bother.
When you encounter this type of defect I recommend that you don’t re-estimate the size of the story (the number of story points, for example). The user story shouldn’t be marked as "Done" either: the defect shows that it doesn’t yet work, so it’s not finished.
If you look at the metrics after the event, you will see one of two things: either the user story appears to have been significantly underestimated, or the team’s velocity has significantly dropped. Root cause analysis will likely confirm the latter conclusion, highlighting a possible process problem, since there shouldn’t have been a significant defect in the first place. This should be queried and addressed in the next retrospective at the latest.
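As a purely illustrative sketch (the sprint figures and the 20% threshold are invented, not from any real team), a simple check like this could flag a velocity drop worth raising in the retrospective:

```python
# Hypothetical story points completed per sprint; the last sprint looks suspicious.
velocities = [34, 36, 33, 35, 21]

def velocity_drop(history, window=4, threshold=0.8):
    """Return True if the latest sprint fell below `threshold` times
    the average of the preceding `window` sprints."""
    if len(history) < window + 1:
        return False  # not enough history to establish a baseline
    baseline = sum(history[-window - 1:-1]) / window
    return history[-1] < threshold * baseline

print(velocity_drop(velocities))  # True: 21 is well below the ~34.5 baseline
```

The threshold and window are judgement calls for each team; the point is simply that the signal is visible in the data you already collect, without re-estimating stories.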
Scenario 2 – A 'deferred' defect
Scenario: A user story is in the process of being delivered and for historical reasons or because it’s not pragmatic to fix all defects found during development, the story is considered “Done” with known defects being deferred.
In this instance, a new user story should be raised and put on the backlog to represent the defect. The theory is that if a defect has been 'deferred' then in reality it represents a sub-feature of the story that hasn’t been fully implemented. In effect, you’ve chosen to defer implementation of a specific acceptance criterion or sub-feature after testing against the requirements identified the defect.
Sometimes the specific requirement or acceptance criterion against which the defect was raised wasn’t clearly identified in the story in the first place, but was nevertheless treated as an expected requirement during testing. These are still defects, just defects in the requirements specification rather than in the implementation of the requirements. On a waterfall project these would likely have become Change Requests and gone through a formal process to update the scope. Thankfully that’s not a problem we have to contend with on an Agile project.
In either case the defect becomes another user story that is sized and prioritised along with everything else.
From a metrics and analysis standpoint, these defects should definitely be measured and assessed, as they are a clear indicator of process problems resulting in a reduction in quality, even if that reduction is intentional.
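To make that measurement concrete, here is a hypothetical sketch (the backlog items, the `origin` field and its values are all invented for illustration) of tracking how much of the backlog is rework from deferred defects:

```python
# Invented backlog items; 'origin' marks stories raised from deferred defects.
backlog = [
    {"id": "US-101", "points": 5, "origin": "feature"},
    {"id": "US-102", "points": 3, "origin": "deferred-defect"},
    {"id": "US-103", "points": 8, "origin": "feature"},
    {"id": "US-104", "points": 2, "origin": "deferred-defect"},
]

def rework_ratio(items):
    """Fraction of backlog story points that are rework from deferred defects."""
    total = sum(i["points"] for i in items)
    rework = sum(i["points"] for i in items if i["origin"] == "deferred-defect")
    return rework / total if total else 0.0

print(f"{rework_ratio(backlog):.0%} of backlog points are rework")
```

However you record it, a rising rework ratio over time is exactly the kind of trend worth bringing to a retrospective.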
Scenario 3 – Defects occurring during regression testing
Scenario: An application is going through separate regression testing or some other 'later phase' of testing and a defect is discovered.
Note that the important distinction between this scenario and scenario 1 is that the testing is decoupled from the implementation of the specific user story, either because the testing is ad-hoc after a user story is considered "Done" by the team or because, for whatever reason, certain types of testing have been deferred until after implementation. This is common for security testing or performance testing, but sometimes also includes regression testing, even in organisations with a relatively mature Agile adoption.
These defects are effectively technical debt and there is more than one way of dealing with them. Most teams don’t record user stories or anything else for these defects – they are 'down tools' moments, and the team swarms to resolve them so that flow can be maintained. This is the same response you would expect to a 'broken build' event on a team running Continuous Integration. The overall effect shows up as a reduction in team velocity, which should then be discussed at the next retrospective at the latest.
The other option is to record these defects and defer fixing them. However, this becomes a snowballing problem that will very quickly turn an iterative delivery system into a waterfall one, so in my opinion it should be avoided at all costs.
Tl;dr: Summary of scenarios
So in summary, your Definition of Done should cover quality levels, and therefore most defects should be dealt with under Scenario 1. In this scenario the developer should fix the defect immediately, and it should be tracked only if the metrics are used or the record is needed for communication. Don’t re-estimate stories, as this just disguises the problem as one of poor estimation; improve your quality instead.
If you’re deferring defects in one form or another (Scenarios 2 and 3), then measure this as waste in the system. Like defects found outside the delivery cadence, it is an indicator of process issues resulting in quality and productivity problems. If these root process problems aren’t addressed, they will start to erode the team’s ability to do iterative delivery. In the wider context of an IT department this then becomes a driver for creating separate 'support' or 'BAU' teams, so if you’re aiming for DevOps, this is a particularly important thing to look out for.
Defect management should be more disciplined on Agile projects
Hopefully it’s become clear in this post that just because you’re part of, or setting up, an Agile delivery team, that doesn’t mean you should be any less explicit or disciplined in implementing the processes surrounding defect management. If anything, it’s even more important than on a waterfall project because of the impact it can have. The significant difference, of course, is that the team itself defines that process rather than having one imposed on it, though a little guidance wouldn’t go amiss!
Just because the Scrum Guide doesn’t tell you what to do, doesn’t mean you should do nothing. We’re all professionals and we’ve done it all before one way or another, so in the absence of specific guidance, start with what you’re used to and then work out how to make the process more efficient. It’s all about continuous improvement after all.
If you like this post or want to talk about it then feel free to leave a comment, or click the social media share buttons at the top to let me know that I should keep writing more. Thanks!
Ray Cooke is a Development Manager and a Lean & Agile Business Transformation Coach, based in Equinox IT's Wellington office.