1% bugs
Continuing on the 1% theme from yesterday, the worst kind of bugs to deal with are critical bugs that occur both intermittently and rarely.
On the one hand, a bug that only happens 1% of the time may be okay. If you can easily recover from it, it may not be worth prioritizing. Imagine a web form, for example, where you hit Submit, and about 1% of the time, you get an error message that says, “Oops, something went wrong, please try again.” If you don’t lose any data and hitting the button again works, the workaround costs the user a few seconds, while digging into the cause of the problem may take days. If you’re short on time, that kind of bug is often deferred.
But if it’s something more serious, such as a bug involving data loss, and it only happens 1% of the time, but isn’t reproducible (that is, if the same set of steps doesn’t always cause the bug to occur), then you’re dealing with a disastrous situation. Oftentimes your beta program has not uncovered the bug. A beta program involving even 500 users is expensive and difficult to coordinate properly. During the course of your beta, if you’re lucky, you may see the 1% bug 10 times and prioritize a fix. But if you’re unlucky, you may only see it once or not at all. If it only occurs once (and the beta tester actually reports it), your beta coordination team has to be very sharp to notice the issue and flag it appropriately. Most of the time it will be overlooked, and even if it isn’t, it’s quite possible for a developer to defer the bug if he or she can’t reproduce the problem.
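To put rough numbers on the lucky/unlucky scenario, here is a back-of-the-envelope sketch. It assumes every run through the buggy code path is an independent trial that fails 1% of the time, which is a simplification (real bugs often cluster around a particular setup). The trial counts and function names are made up for illustration.

```python
from math import comb

def prob_exactly(k, trials, rate=0.01):
    """Binomial probability of seeing the bug exactly k times in `trials` runs."""
    return comb(trials, k) * rate**k * (1 - rate) ** (trials - k)

def prob_at_most(k, trials, rate=0.01):
    """Probability of seeing the bug k times or fewer."""
    return sum(prob_exactly(i, trials, rate) for i in range(k + 1))

def prob_at_least(k, trials, rate=0.01):
    """Probability of seeing the bug k times or more."""
    return 1.0 - prob_at_most(k - 1, trials, rate)

# Hypothetical trial counts: how often the beta exercises the buggy path in total.
for trials in (100, 500, 1000):
    print(
        f"{trials:>5} trials: "
        f"never seen = {prob_at_most(0, trials):.3f}, "
        f"seen once or never = {prob_at_most(1, trials):.3f}, "
        f"seen 10+ times = {prob_at_least(10, trials):.3f}"
    )
```

If the beta only exercises the buggy path a hundred times in total, there is a better than one-in-three chance that nobody ever sees the bug, and it takes around a thousand trials before ten sightings is even the expected count.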
If you’re a mechanic working on a car, you’d much rather deal with a repair where there’s smoke coming out of the engine, because there you can see what’s wrong and know where the issue is. You’d rather not have to troubleshoot a problem where the driver says that sometimes the car stalls when shifting between first and second but only if it’s cold and only on Tuesdays. (That’s the kind of problem that is best suited for Car Talk.) The same applies to computer bugs.
Every program of any complexity that we use has bugs. I think users who aren’t involved with developing software often wonder why bugs are not all found and removed before a product is released. But the more complexity there is, the more variables, and consequently the harder it is for a developer to even experience the bug prior to release in the first place. (And developers spend enough time fixing the bugs they know about; they’re never going to fix the ones they don’t know about.)
One percent bugs are the most insidious, because they’re the easiest to miss or defer pre-release, yet once you roll out to the general public, it’ll quickly become apparent there’s an issue. Even if only 1% of users experience the problem, if you have a user base of a few million customers, then you’re talking about tens of thousands of support cases — enough to overwhelm the support center of even the largest of companies.
Good coding and unit testing practices can help, along with automated testing suites that hammer on the product at high rates of speed.
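As a minimal sketch of what "hammering" might look like, the harness below runs an operation thousands of times and records which runs failed. Everything here is hypothetical: flaky_submit is a stand-in for the real operation under test, with a 1% failure rate baked in, and each run gets its own seeded random generator so that a failure can be replayed later instead of being written off as unreproducible.

```python
import random

def flaky_submit(rng):
    """Hypothetical stand-in for the operation under test; fails about 1% of the time."""
    if rng.random() < 0.01:
        raise RuntimeError("Oops, something went wrong, please try again.")
    return "ok"

def hammer(operation, runs, base_seed=0):
    """Run `operation` repeatedly, each time with its own seeded RNG.

    Returns the seeds of the runs that raised, so each failure can be replayed.
    """
    failing_seeds = []
    for i in range(runs):
        seed = base_seed + i
        try:
            operation(random.Random(seed))
        except Exception:
            failing_seeds.append(seed)
    return failing_seeds

failures = hammer(flaky_submit, runs=10_000)
print(f"{len(failures)} failures in 10,000 runs")
if failures:
    print(f"first failing seed to replay: {failures[0]}")
```

The point of recording the seed is that a developer can rerun exactly the failing case, which takes away the usual excuse for deferring a bug that can't be reproduced. In a real product the randomness comes from timing, input data, and environment rather than a single generator, so capturing enough state to replay a failure is much harder than this sketch suggests.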
January 15th, 2009 at 9:55 am
Say you have Quicken 2009. Does that mean most of the bugs have been fixed from previous editions?
January 15th, 2009 at 12:21 pm
I’ve not used 2008 or 2009, so I don’t know personally.
The more new features that are introduced, the more new bugs are introduced. In general, minor updates (like going from 2.0 to 2.1) fix more bugs than they introduce, while major updates (like going from 2.0 to 3.0) introduce new bugs, but things vary.
In college I’d always want the latest version of anything because I thought it would always be better and more stable. These days I wait to read reviews before upgrading.
January 16th, 2009 at 9:42 am
Yet sometimes there is no choice and upgrades are forced upon the device. When the 1% errors occur (although I think the percentage is actually greater), the only option is to await a fix, which usually takes a long time and can be very frustrating to the customer, especially if the device was working prior to the update.
I agree with good coding practices, but I think in many cases pre-rollout testing needs to be more intense, with proven testers who have an avenue to support contacts, and it should ensure that all (if possible) variations of models are covered in testing.