I’ve argued previously that rigid or otherwise over-prescriptive processes lead to sub-optimal performance of creative teams. But not everything about software development is creative. For all but the most esoteric of offerings, there will come a point when we need to deliver the fruits of our labour to users, and from that point on an element of “production” comes into play. In Agile methodologies, the point of release to production is, theoretically, brought forward to the end of the first iteration, even if no physical release is issued at that point.
Products in production need support, and part of that support is handling defects, known in software as bugs.
Bugs are a pain in the ass. They are unexpected (by definition) and nearly always have negative consequences. They are also often difficult to analyse and expensive to fix. We wish there were no bugs, because we strive for perfection. Perfect engineering solves a human problem with technology. Bugs in the technology represent not a solution but a new problem! If the new problem is severe enough, it will nullify the benefits of the solution. Owning technology which not only doesn’t solve the original problem but also introduces new problems makes customers sad, and not many engineers set out to make the world a sadder place.
We wish there weren’t bugs, but there always will be, because life’s like that. We shouldn’t stop striving for perfection, but we should have a robust plan for what happens when we fall short of it, so that our customers don’t get sad. Because bugs are so problematic, and their effects so critical, whatever process we define to deal with them should be highly consistent, responsive and efficient. In other words, a process quite apart from that of development.
In this blog post, I’m going to first give my personal take on what bugs are and why they happen, and then outline some guidelines which I believe can be used to define a reasonable process for handling them.
What a Bug isn’t
I was recently involved in a forum discussion where somebody said the following:
“[A] Defect is [the] developer not meeting the user functionality or […] requirement” and
“…defects are not user functionality but problems introduced by development team”
Whilst I categorically disagree with both the definitions and the underlying sentiments here, this is a very common viewpoint within our industry. Even if it’s not said out loud in certain organisations, it may very well be said behind closed doors, or held inside closed minds.
NOTE: I don’t mean to disparage the person who made these particular comments. It is quite possible that I misunderstood or misrepresented the context in which they were made, but I’d like to deal with the general phenomenon of the “blame culture”, which assuredly does exist. I could have found similar quotes almost anywhere, but perhaps not so succinctly stated!
Firstly, the idea that perfect requirements are delivered by some perfect mechanism to a team of imperfect developers, who may or may not be competent enough to deliver them perfectly on any given occasion, is a hideously outdated point of view. Usually it’s indicative of a culture with insufficient respect for developers. I won’t go into this further here, but respecting the technical experts on your team is something I feel very strongly about.
Secondly, the idea that requirements can be separated from implementation is a pure abstraction. It’s a very convenient abstraction which we use nearly all of the time — which is why we tend to forget that it doesn’t correspond to the physical reality. You can’t sell a requirement to your customers, and neither can you just sell them chaotic and unspecified “implementation” without agreeing on what that implementation should achieve. From a customer’s point of view, the product either delivers a feature (satisfactorily) or doesn’t. If it doesn’t, they don’t actually care on which side of the abstract line between requirements and implementation the failure occurred. As a team member, you should care, but only on a case-by-case basis, and only insofar as it helps you make a positive decision about how to improve the effectiveness of your team.
The comments quoted above are indicative of a blame culture. It’s clearly a case of a group of people who aren’t “developers”, or don’t consider themselves part of the “development team”, casting blame away from themselves and onto another group. If you are involved in delivering a product or a project, in any capacity, you should share responsibility and ownership of problems, not apportion them to others. Once again, your customers will only see the failures of your organisation as a whole, not of individual teams or people. By casting blame, you will only appear disloyal in the eyes of those you accuse.
So what is a bug?
My definition of a bug is very simple and highly generalised:
“A bug is any unexpected behaviour which somebody should know about.”
What? That’s it!? Let’s break it down a little bit:
“A bug is…” — the existence criterion
A bug must actually exist. A “worry” isn’t a bug (yet). But there’s no smoke without fire, so keep digging.
“…unexpected…” — the obviousness criterion and “…behaviour…” — the effect criterion
A bug must have an effect which wouldn’t be perfectly obvious to a user of the system. It’s tempting to write “negative” in place of “unexpected”, but “negative” is a little too loaded for my liking. It’s hard to think of a bug which doesn’t have some negative effect, and I rather suspect it’s possible to put a negative spin on all bugs without too much contortion, but I still prefer the looser-sounding term above.
Note that the word “behaviour” is important. Behaviour implies visibility to users, although not necessarily UX/UI-related visibility: for instance, a memory leak is “visible” when it causes system degradation. Something that is never visible is not a bug. “Coding conventions violated for xyz.java” is not a bug. It’s still bad, but it’s not a bug under my definition, because it does not constitute behaviour.
“…which somebody should know about.” — the relevance criterion
This may seem obvious, but if nobody is ever going to be interested in something, there’s no point telling them about it. Whether a piece of information is useful almost always comes down to whether it is actionable. Again, I’m talking in the loosest possible sense. The act of explaining to an angry customer why something has gone wrong is an action, even if the root cause of the issue is never addressed. Similarly, the action might simply be “remember this bug exists”, because it could have repercussions later.
It’s a common urban myth that the average person eats seven spiders a year in their sleep without ever realising it. Even if this were true (and I’m assured it isn’t), since we’re asleep we will never know when this has or hasn’t occurred and so there’s nothing we can do about it. The spider is not a bug, as it turns out.
Again, there’s a difference between a piece of information being true and it being a bug. Under my definition, a bug is something that somebody has an interest in knowing about.
Some examples (of bug reports)
(any similarity to real events is purely coincidental and also hilarious)
“I’m worried that if a power-cut occurs on the client while it is synchronising with the server, the server might burst into flames – or worse!” – not a bug (existence criterion), but it sounds pretty scary – try and prove it and get back to me!
“When I save a document over an existing document, and then click Okay to some warning dialog or other, it overwrites my old document! My novel – lost forever!” – not a bug (obviousness criterion). That dialog you so recklessly dismissed was warning you that your old document was about to be overwritten. Next time RTFM.
“If you’re using this product, you need to know that the team who wrote this product are all jerks. One of them pushed me over in the company canteen and the others all laughed and threw peas at my face.” – not a bug (effect and probably also relevance criterion). This is certainly a concern to raise with your HR department, but the effect of this bullying behaviour on the product, as it pertains to its users, is nil. Perhaps it raises some ethical concerns for your customers, but under my definition that’s outside the scope of product bugs.
“When the browser window is minimised, all the people in your photographs come alive and have adorable little tiny conversations with each other.” – probably not a bug (relevance criterion). You never actually see any of this happening, so as far as you’re concerned it might as well not be happening. However, if it makes your PC draw more power, then it might have an effect that you hadn’t considered (more on this in part 2).
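To make the way the criteria combine a little more concrete, here is a minimal Python sketch of a triage checklist. It is purely illustrative and assumes each criterion can be reduced to a yes/no answer, which in practice is a judgment call rather than a mechanical test. The `Report` class, its fields and the `triage` function are hypothetical names, not part of any real tool. A report has to pass every criterion to count as a bug, and the sketch names the first criterion it fails, checking them in the order they appear in the definition.

```python
from dataclasses import dataclass


@dataclass
class Report:
    """A hypothetical bug report reduced to yes/no answers for each criterion."""
    summary: str
    exists: bool      # existence: observed or reproduced, not merely feared
    unexpected: bool  # obviousness: not something a user would reasonably expect
    behaviour: bool   # effect: shows up as system behaviour, however indirectly
    relevant: bool    # relevance: somebody should know about it (it is actionable)


def triage(report: Report) -> str:
    """Return "bug" if every criterion passes, else name the first one that fails."""
    checks = [
        ("existence", report.exists),
        ("obviousness", report.unexpected),
        ("effect", report.behaviour),
        ("relevance", report.relevant),
    ]
    for name, passed in checks:
        if not passed:
            return f"not a bug ({name} criterion)"
    return "bug"


# The example reports, encoded under an assumed reading of each one,
# plus the memory leak from the definition section as a genuine bug.
examples = [
    Report("Server bursts into flames after a power cut",
           exists=False, unexpected=True, behaviour=True, relevant=True),
    Report("Old document overwritten after dismissing the warning",
           exists=True, unexpected=False, behaviour=True, relevant=True),
    Report("The team who wrote the product are all jerks",
           exists=True, unexpected=True, behaviour=False, relevant=False),
    Report("People in photos chat while the window is minimised",
           exists=True, unexpected=True, behaviour=True, relevant=False),
    Report("Memory leak degrades the system over time",
           exists=True, unexpected=True, behaviour=True, relevant=True),
]

for r in examples:
    print(f"{r.summary}: {triage(r)}")
```

The yes/no values given to each report are only one reading of them. Checking in definition order means the “jerks” report is rejected on the effect criterion before relevance is ever considered, which is consistent with the verdict given for that example.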
You may disagree with my definition of what a bug is. But stick with me for a moment, and let’s use my definition to see if we can figure out why bugs happen:
1) Software changes — a new behaviour is introduced where none previously existed.
2) Expectations change — a previously expected behaviour becomes unexpected.
3) Interest changes — something which was previously considered irrelevant becomes something which someone should know about.
It should be noted that only one of these scenarios permits bugs to be controlled by the non-clairvoyant developer writing code. Even within the first scenario, the root cause of a bug is rarely directly attributable to some code written as part of the software change. Other causes for bug creation in this category include, but are not limited to:
– Ambiguous specification (communications failure in link between voice-of-customer and developers)
– Incorrect or incomplete specification (communications failure in link between customer and voice-of-customer)
– Sleeper bugs (existing behaviour which did not originally constitute a bug is activated)
When considering code changes which do directly lead to bugs, the fault is rarely incompetence. Truly incompetent software developers don’t tend to last long as developers (we usually transition to Product Owners – woof, woof. Future employers: kidding ;))
Causes of bug creation other than pure incompetence include:
– When a system becomes sufficiently complex that it is practically impossible for a human to avoid unintended changes
– When external pressures (e.g. overwork, poor work environment, etc.) cause an otherwise avoidable decrease in performance
– Lack of investment in quality and related infrastructure (e.g. unit testing tools, training, etc.)
Part 2 of this post will look into my strategy for dealing with software bugs, and there I’ll return to the thought I started with: the challenge of dealing simultaneously with pre-production and post-production software.
As a final note, on the subject of both incompetence and dealing with the problem of bugs, I think it’s worth stating for the record that if you’re managing a development effort, even an individual’s incompetence is ultimately your problem. Somebody has to hire an incompetent individual; they don’t just walk in off the street and start bashing away at the nearest keyboard. By far the best solution to incompetent team members is not to hire them in the first place, but in the very rare cases where they exist and are causing a problem, the solution is still in your hands. Personally, I think of the problem as similar to a noisy work environment or overworked team members: there is a net detrimental effect on the team, and this has to be resolved to improve both efficiency and morale.
More in part 2…