A guide to debugging
Become better at fixing bugs.
I finished reading the book Debugging: The 9 Indispensable Rules for Finding Even the Most Elusive Software and Hardware Problems.
In this article I want to share the 9 rules I learned. It also serves as a place for me to come back to whenever I find myself debugging, to remind myself of the 9 rules 😃
I also share knowledge from my own experience when it comes to debugging, and love bringing up analogies.
At the end, a bonus section covers a few things I learned about debugging from the book The Pragmatic Programmer. 🥳
Understand the system
The first rule is the most important one: understanding.
It’s important to have knowledge of what the software is supposed to do, how it’s designed, and in some instances, why it was designed that way. With such knowledge, it becomes easier and can even become obvious where the problem lies.
The essence of Understand the System is: read the manual.
Reading the company’s internal documentation, reading the documentation of the technology you are using, looking at the test suites, and reading code comments are some of the things you can do to better understand the software.
I remember someone on Twitter saying: Reading the documentation can save you hours of debugging.
When reading, make sure to read thoroughly and not skim. From my own experience: I once skimmed the answers I found on StackOverflow and didn’t find anything that helped me. After a while I decided to read them more thoroughly and, guess what, the answer was right in front of me! 🤣
Knowing the fundamentals of your field helps. If we lack the fundamental knowledge, we are much more likely to debug for longer and to run into new problems along the way. Take a tree, for instance: its fundamentals, the roots, are very strong, hence it doesn’t fall, or very rarely falls I should say. 🌳
Know how to work with the debugging tools. They are your eyes and ears into the system, so take your time to learn them; it will pay off greatly. Personally, as a Frontend Developer, once I learned how to work with the Browser DevTools, fixing bugs became much easier and faster. 🔥
Don’t trust your memory or instincts, and don’t rely on guesses or assumptions. Look things up. If you think something works in a certain way, prove it! If you do have an assumption while fixing a bug, verify whether it is right or wrong before acting on it.
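To make "prove it" concrete, here is a minimal sketch in Python. The assumption being tested ("round() always rounds halves up") and the helper name wrong_assumptions are my own illustration, not something from the book. The idea is simply to write the assumption down as data and let the system tell you where it is wrong.

```python
def wrong_assumptions(assumed):
    """Return the inputs for which the assumed result of round() is wrong."""
    wrong = []
    for value, expected in assumed.items():
        actual = round(value)
        if actual != expected:
            wrong.append(value)
    return wrong


# Assumption to check: "round() always rounds halves up."
assumed = {0.5: 1, 1.5: 2, 2.5: 3}

for value in wrong_assumptions(assumed):
    print(f"round({value}) is {round(value)}, not {assumed[value]}")
```

In Python 3, halves are rounded to the nearest even number, so two of the three assumed results turn out to be wrong. A one-minute check like this is cheaper than an hour of debugging on top of a false belief.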
Make It Fail
Reproducing the bug allows us to see the failure. Knowing precisely under what conditions the failure occurs gives us a better understanding of the possible causes. Knowing how to make it fail is also important for verifying that the bug is really fixed after you have implemented the solution.
Reproduce the bug more than once. Write down each step as you go and identify exactly when the bug occurs; that will help you recognize what causes it.
Sometimes the bug is intermittent and tough to reproduce consistently; perhaps it happens 1 out of 5 times.
Look at each time the bug happens, each time you successfully reproduce it, and capture the key information. Try to identify the difference between the passing and failing cases.
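One way to capture the difference between passing and failing cases is to run the suspect code many times and record every input together with its outcome. The sketch below is only an illustration: flaky_average, random_list and collect_cases are hypothetical names I made up, and the fixed seed is there so that a failing run can be replayed.

```python
import random


def flaky_average(values):
    """Hypothetical function under investigation; it fails only for some inputs."""
    return sum(values) / len(values)


def random_list(rng):
    """Hypothetical input generator: a list of 0 to 4 small integers."""
    return [rng.randint(1, 9) for _ in range(rng.randint(0, 4))]


def collect_cases(func, make_input, runs, seed=0):
    """Run func many times and split the inputs into passing and failing cases."""
    rng = random.Random(seed)  # fixed seed so the same inputs can be replayed
    passed, failed = [], []
    for _ in range(runs):
        data = make_input(rng)
        try:
            func(data)
            passed.append(data)
        except Exception as error:
            failed.append((data, repr(error)))
    return passed, failed


passed, failed = collect_cases(flaky_average, random_list, runs=50)
print(f"{len(failed)} of 50 runs failed")
for data, error in failed:
    print("failing input:", data, "error:", error)
```

Looking at the two groups side by side, the pattern stands out: every failing input is an empty list, and no passing input is.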
Personally, I like to start fixing a bug by making sure tests are in place, or by adding a test that describes the desired behavior (and therefore currently fails). That way I can be sure when I’ve completely fixed the bug. I can also run the test multiple times to make sure it is not flaky, meaning the failure is not intermittent.
Quit Thinking and Look
Don’t guess how something is failing and then implement a fix for your guess. It costs time and money and may break something else in the system. Instead, look and find the cause of the bug.
The bug we witness is the result of what went wrong, not necessarily the thing that went wrong itself; what we see is the undesired behavior. Just because you are able to reproduce a bug doesn’t mean you’ve found the cause of it. The book gives an example of this:
What we see when we note the bug is the result of the failure: I turned on the switch and the light didn’t come on. But what was the actual failure? Was it that the electricity couldn’t get through the broken switch, or that it couldn’t get through the broken bulb filament? (Or did I flip the wrong switch?) You have to look closely to see the failure in enough detail to debug it.
Find the exact cause and fix the bug. It’s much faster than guesswork that leads nowhere.
When looking into the system to see the failure, you often learn more about why the system might be failing, and that shows you where to look even deeper to get more detail.
Guessing and making assumptions is not bad in itself. It’s actually good, and your guesses may be close. Just make sure to prove them and look in detail for the cause of the bug. Don’t think too much or immediately start implementing a fix you think will solve the bug; as mentioned, that takes unnecessary time and money and may break something else in the system.
Divide and Conquer
This rule is the heart of debugging and actually involves finding the problem.
Originally, Divide and Conquer is an algorithm design paradigm. Such an algorithm recursively breaks down a problem into two or more smaller problems of the same or similar type, solves them, and combines the results into a solution for the bigger problem. Classic examples include merge sort, quicksort, and finding the closest pair of points.
It can also be a powerful approach to debugging. 🪄
Let’s say we have a large codebase that we aren’t really familiar with, and we have no idea where in the code the cause of the bug is. It would take an eternity if we were to set breakpoints and add logs everywhere. Well, Divide and Conquer to the rescue.
We divide the code in half, identify which half the problem is in, and then do the same thing to that half. We continue doing this until we’ve narrowed down exactly where the bug is caused.
Sometimes when debugging, you may come across other bugs that seem unrelated to the bug you are trying to fix. It’s better to fix such bugs right away; that way you get a clean look at the main bug you are trying to fix.
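The halving idea can be sketched as a binary search. In this illustration I assume the code is a pipeline of steps and that we have a check (is_valid) that tells us whether the data still looks right. The step functions and the names first_bad_step and is_valid are hypothetical. The sketch also assumes that once the data is broken, it stays broken in all later steps.

```python
def first_bad_step(steps, data, is_valid):
    """Binary search for the index of the step that first makes the data invalid.

    Assumes the input is valid, the final output is invalid, and broken data
    stays broken in every later step.
    """
    def run(count):
        # Apply only the first `count` steps to the original data.
        result = data
        for step in steps[:count]:
            result = step(result)
        return result

    low, high = 0, len(steps)  # valid after `low` steps, invalid after `high`
    while high - low > 1:
        middle = (low + high) // 2
        if is_valid(run(middle)):
            low = middle   # the problem is in the second half
        else:
            high = middle  # the problem is in the first half
    return high - 1


# Hypothetical pipeline: values should never become negative.
def double(values):
    return [v * 2 for v in values]

def add_one(values):
    return [v + 1 for v in values]

def subtract_ten(values):  # the step that introduces the problem
    return [v - 10 for v in values]

def halve(values):
    return [v / 2 for v in values]

steps = [double, add_one, subtract_ten, halve]

def is_valid(values):
    return all(v >= 0 for v in values)

bad_index = first_bad_step(steps, [1, 2, 3], is_valid)
print("problem introduced by:", steps[bad_index].__name__)
```

With four steps this saves little, but the number of checks grows only logarithmically: a pipeline of a thousand steps needs about ten checks instead of a thousand.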
Sometimes, fixing one problem does fix the other problem; the two really are the same bug.
Change One Thing at a Time ☝️
Change one thing at a time, not multiple things. Even if it seems like you changed something that didn’t have an effect, you don’t really know how it has affected other parts of the system.
If a change doesn’t seem to have an effect, undo it right away. Don’t let it stay there while you continue looking for the cause of the bug.
Find exactly what is failing, and fix only that thing.
Sometimes a bug exists because something has been changed in a system that previously worked. In that case it helps to go back through previous versions of the system to see exactly which change caused the bug to appear.
Keep an Audit Trail
When trying to find the cause of the bug, write down what you did, what order you did it in, and what happened as a result. It’s important to keep track of each step, in order to determine where to focus on during debugging. ✍️
The important information lies in the details; what we observe is only the result of the failure.
When writing, be consistent and specific in describing things.
Correlate the steps with the result. Instead of “It disappeared”, a better description would be “It disappeared after the form was submitted”. Chain the different events and see how they may relate.
Don’t trust your memory with a detail; write it down. You will never be able to remember exactly how things happened, in what order, and with what result.
Write it down. It’s better to write it electronically so you can make backup copies, attach it to bug reports, distribute it to others easily, and maybe even filter it with automated analysis tools later. Write down what you did and what happened as a result. Save your debug logs and traces, and annotate them with related events and effects that they don’t inherently record themselves. Write down your theories and your fixes. Write it all down.
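If you want to keep the trail electronically, even a tiny helper is enough. The sketch below is one possible shape, not a prescribed tool: the AuditTrail class, its record method and the file name debug-notes.jsonl are all my own assumptions. Each entry stores the step number, a timestamp, what was done, and what happened as a result, one JSON object per line so the notes are easy to filter later.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


class AuditTrail:
    """Append-only debugging notes, one JSON object per line."""

    def __init__(self, path):
        self.path = Path(path)
        self.step = 0

    def record(self, action, result):
        """Write down what was done and what happened as a result."""
        self.step += 1
        entry = {
            "step": self.step,
            "time": datetime.now(timezone.utc).isoformat(),
            "action": action,
            "result": result,
        }
        with open(self.path, "a", encoding="utf-8") as log:
            log.write(json.dumps(entry) + "\n")
        return entry


# A temporary directory is used here only to keep the example self-contained.
notes = AuditTrail(Path(tempfile.mkdtemp()) / "debug-notes.jsonl")
first = notes.record("Submitted the form with an empty email field",
                     "The error banner disappeared after the form was submitted")
second = notes.record("Submitted the form with a valid email",
                      "The banner stayed visible")
print(notes.path.read_text(encoding="utf-8"))
```

Because the action and the result are recorded together and in order, the notes already correlate each step with its outcome.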
Check the Plug 🔌
Checking the plug is about verifying that the things we assume are working actually work as they should. It seems silly, but a faulty assumption like this happens a lot. The problem can even be in the tools you are using: the framework, library, compiler, debugging tools, linter, etc.
The tools we use were also built by engineers, so they aren’t perfect themselves either.
Personally, this reminds me of working with TypeScript. Sometimes TypeScript keeps complaining even though the type error has been fixed; the solution in such cases is to restart the code editor or the TypeScript server.
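A simple way to check the plug in code is to ask the environment what is really running, instead of assuming. The following Python sketch is just an illustration: the describe_environment helper is a hypothetical name, and the json module is used only because it ships with Python. It reports which interpreter is executing and where each module was actually loaded from.

```python
import importlib
import sys


def describe_environment(module_names):
    """Report which interpreter and which module files are really in use."""
    report = {
        "python": sys.version.split()[0],
        "executable": sys.executable,
    }
    for name in module_names:
        module = importlib.import_module(name)
        # Not every module exposes these attributes, so fall back gracefully.
        version = getattr(module, "__version__", "unknown")
        location = getattr(module, "__file__", "built-in")
        report[name] = f"version {version}, loaded from {location}"
    return report


for key, value in describe_environment(["json"]).items():
    print(f"{key}: {value}")
```

If the path or version printed here is not the one you expected, for example an old copy of a library from another environment, you have found an unplugged cable before spending time on your own code.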
Get a Fresh View
This rule is about asking for help. Sometimes something that would’ve taken us hours to solve can take us minutes to solve when asking a teammate for help. Someone who comes at the problem from a different angle can give us great insights and new approaches to try. 👀
Rubber Ducking: By explaining the problem to someone else, step by step, you yourself can gain new insights, and in some instances spot where the problem lies.
By asking a teammate who is more experienced and understands the system better than we do, not only can we find the bug faster, but they can also help us implement a proper fix that won’t mess up the rest of the system.
Don’t be afraid of asking for help. It is not a sign of incompetence, but rather of true eagerness to get the bug fixed. If you ask the right person for help, you will get the bug fixed faster.
When asking for help, tell your teammate what is happening and show the bug. Don’t poison them with assumptions, guesses or theories that you may have; let them gain their own insight into the problem.
If You Didn’t Fix It, It Ain’t Fixed
If you follow the “Make It Fail” rule, you’ll know how to prove that you’ve fixed the problem. Do it! Don’t assume that the fix works; test it. No matter how obvious the problem and the fix seem to be, you can’t be sure until you test it.
When you think you’ve fixed an engineering design, take the fix out. Make sure it’s broken again. Put the fix back in. Make sure it’s fixed again.
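The "take the fix out, put it back in" routine can be sketched by running the same reproduction against the code with and without the fix. Everything in this example is hypothetical: the two cart_total variants stand in for the code before and after a fix, and reproduces_bug stands in for whatever steps make the bug appear.

```python
def cart_total_without_fix(prices, discount):
    """Hypothetical original code: the discount is subtracted per item."""
    return sum(price - discount for price in prices)


def cart_total_with_fix(prices, discount):
    """Hypothetical fixed code: the discount is subtracted once."""
    return sum(prices) - discount


def reproduces_bug(cart_total):
    """The reproduction: 10 + 20 with a discount of 5 should cost 25."""
    return cart_total([10, 20], discount=5) != 25


broken_without_fix = reproduces_bug(cart_total_without_fix)
broken_with_fix = reproduces_bug(cart_total_with_fix)
print("fix taken out, bug reproduces:", broken_without_fix)
print("fix put back, bug reproduces:", broken_with_fix)
```

Only when the first line says True and the second says False do we know that it was our change that fixed the problem, and not something else that happened to change at the same time.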
Personally, this is why I believe it’s so important to start with a failing test, one that describes the desired behavior which is not yet in place. That way you can truly ensure you’ve fixed the bug and that your software works as it should.
This is the bonus section with some things from the book The Pragmatic Programmer when it comes to debugging.
Rubber Ducking 🗣️
I did mention rubber ducking: getting someone and explaining the problem to them. But it doesn’t have to be a person; you can also explain the problem to something, like a teddy bear. The goal is to explain the problem to something or someone, which can help you gain better insights and trigger new approaches.
Reading the Error Message
Something that seems simple but is extremely powerful: read the error message. Read it thoroughly and try to understand what exactly is going wrong. Perhaps search for it online to see what the error message is about; that can give you another clue about what could be causing the bug. 🔍
The Pragmatic Programmer also introduces a checklist you can keep by your side when debugging. 📝
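An error message usually answers three questions: what kind of error it is, what the message says, and where it happened. As a small illustration in Python, the sketch below pulls exactly those three pieces out of an exception. The load_port function and the summarize_error helper are hypothetical names made up for this example.

```python
import traceback


def summarize_error(error):
    """Extract the type, the message, and the failing function from an exception."""
    frames = traceback.extract_tb(error.__traceback__)
    return {
        "type": type(error).__name__,
        "message": str(error),
        "function": frames[-1].name,  # innermost Python frame in the traceback
    }


def load_port(config):
    """Hypothetical function that fails on a malformed port value."""
    return int(config["port"])


try:
    load_port({"port": "80a"})
except ValueError as error:
    summary = summarize_error(error)
    for key, value in summary.items():
        print(f"{key}: {value}")
```

Reading the result slowly already points at the cause: a ValueError raised in load_port, with a message that quotes the exact value that could not be converted.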
Is the problem being reported a direct result of the underlying bug, or merely a symptom?
Is the bug really in the framework you're using? Is it in the OS? Or is it in your code?
If you explained this problem in detail to a coworker, what would you say?
If the suspect code passes its unit tests, are the tests complete enough? What happens if you run the tests with this data?
Do the conditions that caused this bug exist anywhere else in the system? Are there other bugs still in the larval stage, just waiting to hatch?