Topic 2

Recovering From Mistakes

Jubal Rife - 8 July 2018

An important and ever-present part of a software developer’s life is dealing with mistakes. How we respond to our own mistakes and the mistakes of others matters. Responding well to a mistake can create value that would otherwise go to waste. Responding poorly can create a hostile environment that discourages openness and progress.

There is a reasonable set of actions that can be taken in the face of a mistake:

  1. Make the mistake known
  2. Be kind to the one who made the mistake
  3. Correct the mistake quickly
  4. Attempt to prevent the mistake in the future
  5. Reset before continuing

The goal of these actions is to reduce harm to the team, the product, and the users while extracting as much value from a mistake as is possible.

1 Make the Mistake Known

Mistakes vary wildly in severity, from simple ones that result in a loss of face or mild annoyance to ones that damage the team as a whole or the core product experience. Regardless of severity, it is important to share what has happened with someone else. There are several reasons for this. A problem shared is a problem that is less likely to happen again; awareness is one of the most powerful tools against mistakes. A solution will also likely be found more quickly with more minds on the task; indeed, sharing may make an otherwise unfixable mistake fixable. If a mistake is invisible to others, there may be a temptation to hide it. This would be unprofessional and should be avoided. A hidden mistake is a trap for yourself and others. Sharing mistakes is the professional thing to do.

2 Be Kind to the One Who Made the Mistake

For a team to function, learn, and grow, there must be a safe environment for the passage of information. All failures are opportunities for growth. Belittling people for making mistakes damages the team’s ability to learn and grow, and this includes belittling yourself. Treat mistakes as a tool for moving the team forward rather than a tool for beating yourself or others about the head. If people on a team are not comfortable sharing mistakes, whether for fear of ridicule or of punishment, the team is more likely to suffer when someone makes a mistake that cannot be corrected alone.

3 Correct the Mistake Quickly

If a mistake can be corrected quickly, it should be. Doing so reduces the total damage done by the mistake, which lets you keep the benefits of having made it while paying less of the price.

4 Attempt to Prevent the Mistake in the Future

Code-based mistakes are generally prevented from happening again by creating automated tests that verify the code behaves correctly. I would recommend this for most code problems. Not all problems can be fixed with an automated solution; indeed, some problems have nothing to do with code whatsoever. For such problems, it is important to be creative. If a solution is impossible within the current domain, expand the domain. Perhaps the problem can be fixed further out with a convention or procedure, or with training and increased awareness. If there is no way to prevent the mistake, then perhaps a contingency plan could be implemented. Recognize the failure and devise a way to quickly recover.
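As a minimal sketch of the automated-test idea, suppose a function once crashed when given an empty list. The function name `average`, the imagined bug, and the chosen fix (returning 0.0) are all assumptions made up for illustration. The point is only that the test pins the corrected behavior down, so the same mistake is caught automatically if it ever returns.

```python
def average(values):
    """Return the arithmetic mean of values (hypothetical example function).

    An earlier, imagined version divided by len(values) unconditionally
    and raised ZeroDivisionError on an empty list. Returning 0.0 for that
    case is an assumed fix chosen for illustration.
    """
    if not values:
        return 0.0
    return sum(values) / len(values)


def test_empty_list_does_not_crash():
    # Regression test: reproduces the exact input that triggered the mistake.
    assert average([]) == 0.0


def test_normal_input_still_works():
    # Guards against the fix breaking the ordinary case.
    assert average([2, 4, 6]) == 4.0
```

Functions named `test_...` like these can be collected by a test runner such as pytest, or simply called directly. Either way, the test stays in the code base as a record of the mistake.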

By creating a way to prevent or quickly respond to a mistake, as a team you gain resilience. While the burden of a mistake rests on the one who made the mistake, the burden of the response to a mistake rests entirely on the team. If someone in the team deleted half of the database, you’d better have a backup of the database. If you don’t, then you’d better have a backup of the database next time someone deletes half of the database.
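One way to picture such a contingency plan is a helper that refuses to run a destructive statement until a backup exists. The sketch below assumes a SQLite database and uses Python's standard `sqlite3` module; the function name `backup_then_run` and its parameters are hypothetical, and a real team's database and backup tooling will likely differ.

```python
import sqlite3


def backup_then_run(db_path, backup_path, destructive_sql):
    """Copy the database to backup_path, then run destructive_sql.

    Hypothetical helper for illustration: if the backup step raises,
    the destructive statement is never executed.
    """
    source = sqlite3.connect(db_path)
    target = sqlite3.connect(backup_path)
    try:
        # Connection.backup copies the whole source database into target.
        source.backup(target)
        # Only after the backup succeeds do we touch the live data.
        source.execute(destructive_sql)
        source.commit()
    finally:
        source.close()
        target.close()
```

A call such as `backup_then_run("app.db", "app.bak.db", "DELETE FROM users WHERE id > 2")` (file and table names made up here) leaves a full copy behind, so a mistaken delete can be undone by restoring from the backup file.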

5 Reset Before Continuing

This is a step that is easily missed because it is not required for you to continue functioning. Depending on how stressful the experience was, and on how other team members responded to the mistake, you may be in a fight-or-flight response mode. Fight-or-flight is a state which is not well suited to critically solving abstract problems; it is a state which is suited to physically responding to a threat. We are more likely to clash with team members or make additional mistakes when in this state. If possible, take some time to reset yourself before returning to work after dealing with a stressful situation. Do some light exercise or have a snack. Take a walk, or just move away from your computer and sit peacefully for a bit. When your body has returned to a normal state, you can return to work. In this way you will avoid a snowballing of mistakes. This is especially important if the mistake occurred during a stressful operation which must still be completed. Unfortunately, some mistakes are urgent enough that they do not allow us to rest before they are resolved. For these mistakes, it is especially important to have people you can trust and rely on. With more eyes on the problem, you can alleviate some of the problems caused by being flustered.

Following these guidelines will encourage a safe environment for quickly and effectively responding to mistakes. Ideally, the frequency of mistakes will be reduced by preventive mechanisms, the severity of unpreventable mistakes will be reduced by contingency plans, and the team will build a healthy attitude toward sharing responsibility.