Machars Blog: Prevention v. cure

James Whittaker
Developer testing, which I call prevention because the more bugs devs find the fewer I have to deal with, is often compared to tester testing, which I call detection. Detection is much like a cure: the patient has gotten sick and we need to diagnose and treat it before it sneezes all over our users. Users get cranky when they get app snot on them and it is advisable to avoid that situation to the extent possible.
Developer testing consists of things like writing better specs, performing code reviews, running static analysis tools, writing unit tests (running them is a good idea too), compilation, etc. Clearly developer testing is superior to detection for the following reasons:
1. An ounce of prevention is worth a pound of cure. For every bug kept out of the ecosystem we decrease testing costs and those [censored] testers are costing us a [censored] fortune. [editor note to author: the readers may very well detect your cynicism at this point, suggest tone-down. Author note to editor: I’m a tester and I can only contain my cynicism for a finite period, that period has expired]
2. Developers are closer to the bug and therefore can find it earlier in the lifecycle. The less time a bug lives, the cheaper it is to remove. Testers come into the game so late and that is another reason they cost so much.
Tester testing consists of mainly two activities: automated testing and manual testing. I’ll compare those two in a future post. For now, I just want to talk about prevention v. cure. Are we better to keep software from getting sick or should we focus on disease control and treatment?
Again the answer is obvious: prevention is superior so fire the testers. They come to the patient too late after the disease has run rampant and the cure is costly. What the heck are we thinking hiring these people in the first place?
Ok, re-hire the testers.
Perhaps you’ve noticed but the whole prevention thing isn’t working so well. Failures in software are running rampant. Before I talk about where we should invest our resources to reverse this trend, I want to talk about why prevention fails.
I see a number of problems, not the least of which is that good requirements and specifications seldom get written and when they do they often fall out-of-date as the focus shifts to writing and debugging code. We’re working on that problem in Visual Studio Team System but let’s not get ahead of ourselves. The question in front of us now is why prevention fails. It turns out, I have an opinion about this:
The developer-makes-the-worst-tester problem. The idea that a developer can find bugs in their own code is suspect. If they are good at finding bugs, then shouldn’t they have known not to write the bugs in the first place? This is why most organizations that care about good software hire a second set of eyes to test it. There’s simply nothing like a fresh perspective to detect defects. And there is no replacement for the tester attitude of “how can I break this” to complement the developer attitude of “how can I build this.”
The software-at-rest problem. Any technique, such as code reviews or static analysis, that doesn’t require the software to actually run necessarily analyzes the software at rest. In general this means techniques based on analyzing the source code, byte code or the contents of the compiled binary files. Unfortunately, many bugs don’t surface until the software is running in a real operational environment. Unless you run the software and provide it with real input many bugs will simply remain hidden.
The no-data problem. Software needs input and data to execute its myriad code paths. Which code paths actually get executed depends on the inputs applied, the software’s internal state (the values of the data structures and variables) and external influences like databases and data files. It’s often the accumulation of data over time that causes software to fail. This simple fact limits the scope of developer testing which tends to be short in duration…too short to catch these data accumulation errors.
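To make the no-data problem concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the `EventLog` class, its `ID_SPACE` limit and the `run_session` helper are invented for illustration and stand in for any component whose defect only shows up once enough data has piled up.

```python
class EventLog:
    """Hypothetical component: stores events keyed by a small wrapping id."""

    ID_SPACE = 256  # assumed defect: ids wrap around after 256 events

    def __init__(self):
        self._events = {}
        self._next_id = 0

    def append(self, payload):
        event_id = self._next_id % self.ID_SPACE
        self._events[event_id] = payload  # silently overwrites once ids wrap
        self._next_id += 1
        return event_id

    def count(self):
        return len(self._events)


def run_session(n_events):
    """Append n_events and report whether every event is still stored."""
    log = EventLog()
    for i in range(n_events):
        log.append(f"event-{i}")
    return log.count() == n_events


short_ok = run_session(10)    # a typical quick developer test
long_ok = run_session(1000)   # a longer run with accumulated data
print(short_ok, long_ok)      # True False
```

The ten-event run looks perfectly healthy. Only the long run, the kind that resembles real usage, exposes the lost records.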
Perhaps tools and techniques will one day emerge that allow developers to write code without introducing bugs. Certainly this is already the case for narrow classes of bugs like buffer overflows, which developer techniques can drive, and have driven, to near extinction. If this trend continues, the need for a great deal of testing will be negated. But we are a very long way, decades in my mind, from realizing that dream. Until then, we need a second set of eyes, running the software in an environment similar to real usage and using data that is as rich as real user data.
Who provides this second set of eyes? Software testers provide this service, using techniques to detect bugs and then skillfully reporting them so that they get fixed. This is a dynamic process of executing the software in varying environments, with realistic data and with as much input variation as can be managed in the short cycles in which testing occurs.
In part 3 of this blog series I will turn my attention to tester testing and talk about whether we should be doing this with automation or with manual testing.
Now that the testers are once again gainfully employed, what shall we do with them? Do we point them toward writing test automation or ask them to do manual testing?
First, let’s tackle the pros and cons of test automation. Automated testing carries both stigma and respect.
The stigma comes from the fact that tests are code and writing tests means that the tester is necessarily also a developer. Can a developer really be a good tester? Many can, many cannot but the fact that bugs in test automation are a regular occurrence means that they will spend significant time writing code, debugging it and rewriting it. One must wonder how much time they are spending thinking about testing the software as opposed to writing the test automation. It’s not hard to imagine a bias toward the latter.
The respect comes from the fact that automation is cool. One can write a single program that will execute an unlimited number of tests and find bugs while the tester sleeps. Automated tests can be run and then rerun when the application code has been churned or whenever a regression test is required. Wonderful! Outstanding! How we must worship this automation! If testers are judged based on the number of tests they run, automation will win every time. If they are judged based on the quality of tests they run, it’s a different matter altogether.
The kicker is that we’ve been automating for years, decades even, and we still produce software that readily falls down when it gets on the desktop of a real user. Why? Because automation suffers from many of the same problems that other forms of developer testing suffer from: it’s run in a laboratory environment, not a real user environment, and we seldom risk automation working with real customer databases because automation is generally not very reliable (it is software after all and one must question how much it gets tested). Imagine automation that adds and deletes records of a database: what customer in their right mind would allow that automation anywhere near their database? And there is one Achilles heel of automated testing that no one has ever solved: the oracle problem.
The oracle problem is a nice name for one of the biggest challenges in testing: how do we know that the software did what it was supposed to do when we ran a given test case? Did it produce the right output? Did it do so without unwanted side effects? How can we be sure? Is there an oracle we can consult that will tell us—given a user environment, data configuration and input sequence—that the software performed exactly as it was designed to do? Given the reality of imperfect (or nonexistent) specs this just is not a reality for modern software testers.
Without an oracle, test automation can only find the most egregious of failures: crashes, hangs (maybe) and exceptions. And the fact that automation is itself software often means that the crash is in the test case and not in the software! Subtle and/or complex failures are missed in their entirety.
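A small sketch in Python may help show the difference an oracle makes. The function under test (`apply_discount`), both checkers and the test cases are hypothetical, made up for this illustration. The expected values are worked out by hand, which is exactly the luxury real testers seldom have.

```python
def apply_discount(price, percent):
    """Hypothetical function under test, with a deliberate subtle bug:
    it subtracts the percentage as an absolute amount."""
    return price - percent  # should be price * (1 - percent / 100)


def crash_only_check(func, cases):
    """Oracle-free automation: a case 'passes' if it raises no exception."""
    failures = []
    for args, _expected in cases:
        try:
            func(*args)
        except Exception:
            failures.append(args)
    return failures


def oracle_check(func, cases):
    """With an oracle: compare each result with a known-correct answer."""
    return [args for args, expected in cases if func(*args) != expected]


# The expected values play the role of the oracle (assumed, computed by hand).
cases = [((200, 50), 100), ((100, 10), 90), ((80, 0), 80)]

print(crash_only_check(apply_discount, cases))  # []
print(oracle_check(apply_discount, cases))      # [(200, 50)]
```

The crash-only run reports nothing because nothing crashed. Even with an oracle, two of the three cases pass by coincidence, which says something about how much input variation it takes to catch a subtle failure.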
So where does that leave the tester? If a tester cannot rely on developer bug prevention or automation, where should she place her hope? The only answer can be in manual testing. That will be the topic of part four of this series.

Manual testing is human-present testing: a human tester using their brain, their fingers and their wit to create the scenarios that will cause software either to fail or to fulfill its mission. Manual testing often occurs after all the other types of developer and automated techniques have already had their shot at removing bugs. In that sense, manual testers are on a somewhat unlevel playing field. The easy bugs are gone; the pond has already been fished.
However, manual testing regularly finds bugs and, worse, users (who by definition perform manual testing) find them too. Clearly there is some power in manual testing that cannot be overlooked. We have an obligation to study this discipline in much more detail … there’s gold in them-thar fingers.
One reason human-present testing succeeds is that it offers the best chance to create realistic user scenarios, using real user data in real user environments, while still allowing for the possibility of recognizing both obvious and subtle bugs. It’s the power of having an intelligent human in the testing loop.
Perhaps it will be the case that developer-oriented techniques will evolve to the point that a tester is unnecessary. Indeed, this would be a desirable future for software producers and software users alike, but for the foreseeable future, tester-based detection is our best hope at finding the bugs that matter. There is simply too much variation, too many scenarios and too many possible failures for automation to track it all. It requires a brain-in-the-loop. This is the case for this decade, the next decade and perhaps a few more after that. We may look to a future in which software just works, but if we achieve that vision, it will be the hard work of the manual testers of this planet that made it all possible.
There are two main types of manual testing.
Scripted manual testing
Many manual testers are guided by scripts, written in advance, that guide input selection and dictate how the software’s results are to be checked for correctness. Sometimes scripts are specific: enter this value, press this button, check for that result and so forth. Such scripts are often documented in Microsoft Excel tables and require maintenance as features get updated through either new development or bug fixes. The scripts serve a secondary purpose of documenting the actual testing that was performed.
It is often the case that scripted manual testing is too rigid for some applications or test processes and testers take a less formal approach. Instead of documenting every input, a script may be written as a general scenario that gives some flexibility to the tester while they are running the test. At Microsoft, the folks that manually test Xbox games often do this, so an input would be “interact with the mirror” without specifying exactly the type of interaction they must perform.
Exploratory testing
When the scripts are removed entirely, the process is called exploratory testing. A tester may interact with the application in whatever way they want and use the information the application provides to react, change course, and generally explore the application’s functionality without restraint. It may seem ad hoc to some, but in the hands of a skilled and experienced exploratory tester, this technique can be powerful. Advocates would argue that exploratory testing allows the full power of the human brain to be brought to bear on finding bugs and verifying functionality without preconceived restrictions.
Testers using exploratory methods are also not without a documentation trail. Test results, test cases and test documentation are simply generated as tests are being performed instead of before. Screen capture and keystroke recording tools are ideal for this purpose.
Exploratory testing is especially suited to modern web application development using agile methods. Development cycles are short, leaving little time for formal script writing and maintenance. Features often evolve quickly so that minimizing dependent artifacts (like test cases) is a desirable attribute. The number of proponents of exploratory testing is large enough that its case no longer needs to be argued so I’ll leave it at that.
At Microsoft, we define several types of exploratory testing. That’s the topic I’ll explore in part five.
Ok, we're getting to the end of this thread and probably the part that most of you have asked about: exploratory testing, particularly how it is practiced at Microsoft.
We define four types of exploratory testing. This isn’t meant as a taxonomy, it’s simply for convenience, but it underscores that exploratory testers don’t just test, they plan, they analyze, they think and use any and all documentation and information at their disposal to make their testing as effective as possible.
Freestyle Exploratory Testing
Freestyle exploratory testing is ad hoc exploration of an application’s features in any order using any inputs without regard to what features have and have not been covered. Freestyle testing employs no rules or patterns, just do it. It’s unfortunate that many people think that all exploratory testing is freestyle, but that undersells the technique by a long shot as we’ll see in the following variations.
One might choose a freestyle test as a quick smoke test to see if any major crashes or bugs can be easily found or to gain some familiarity with an application before moving on to more sophisticated techniques. Clearly, not a lot of preparation goes into freestyle exploratory testing, nor should it. In fact, it’s far more ‘exploratory’ than it is ‘testing’ so expectations should be set accordingly.
There isn’t much experience or information needed to do freestyle exploratory testing. However, combined with the exploratory techniques below, it can become a very powerful tool.
Scenario-based Exploratory Testing
Traditional scenario-based testing involves a starting point of user stories or documented end-to-end scenarios that we expect our ultimate end user to perform. These scenarios can come from user research, data from prior versions of the application, and so forth, and are used as scripts to test the software. The added element of exploratory testing to traditional scenario testing widens the scope of the script to inject variation, investigation and alternative user paths.
An exploratory tester who uses a scenario as a guide will often pursue interesting alternative inputs or pursue some potential side effect that is not included in the script. However, the ultimate goal is to complete the scenario so these testing detours always end up back on the main user path documented in the script.
Strategy-based Exploratory Testing
If one combines the experience, skill and Jedi-like testing perception of the experienced and accomplished software tester with freestyle testing one ends up with this class of exploratory testing. It’s freestyle exploration but guided by known bug-finding techniques. Strategy-based exploratory testing takes all those written techniques (like boundary value analysis or combinatorial testing) and unwritten instinct (like the fact that exception handlers tend to be buggy) and uses this information to guide the hand of the tester.
These strategies are the key to being successful; the better the repertoire of testing knowledge, the more effective the testing. The strategies are based on accumulated knowledge about where bugs hide, how to combine inputs and data and which code paths commonly break. Strategic testing combines the experience of veteran testers with the free-range habits of the exploratory tester.
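Boundary value analysis is one of the written techniques mentioned above, and it is easy to sketch in Python. The names here (`boundary_values`, `is_valid_age`, `expected`) and the 18 to 65 age rule are assumptions invented for the example, not anything from a real product.

```python
def boundary_values(low, high):
    """Values just outside, on, and just inside each edge of [low, high]."""
    return [low - 1, low, low + 1, high - 1, high, high + 1]


def is_valid_age(age):
    """Hypothetical function under test. The assumed spec is 18..65 inclusive;
    the code has a deliberate off-by-one bug at the upper edge."""
    return 18 <= age < 65


def expected(age):
    """The assumed spec, used to judge each result."""
    return 18 <= age <= 65


suspects = [v for v in boundary_values(18, 65) if is_valid_age(v) != expected(v)]
print(suspects)  # [65]
```

Six well-chosen inputs find what a pile of mid-range values never would. A strategy-minded exploratory tester carries this habit in their head and applies it wherever a range shows up.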
Feedback-based Exploratory Testing
This category of testing starts out freestyle but as soon as test history is built up, the tester uses that feedback to guide future exploration. “Coverage” is the canonical example. A tester consults coverage metrics (code coverage, UI coverage, feature coverage, input coverage or some combination thereof) and selects new tests that improve that coverage metric. Coverage is only one such place where feedback is drawn. We also look at code churn and bug density, among others.
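As a rough sketch of coverage as feedback, the Python below picks the next test by how much new ground it would cover. The function `pick_next_test`, the feature names and the candidate tests are all hypothetical. A real tool would draw on measured coverage data rather than hand-written sets.

```python
def pick_next_test(covered, candidates):
    """Return the name of the candidate that adds the most
    not-yet-covered items, or None if no candidate adds anything."""
    best_name, best_gain = None, 0
    for name, items in candidates.items():
        gain = len(set(items) - covered)
        if gain > best_gain:
            best_name, best_gain = name, gain
    return best_name


# Features already exercised in earlier sessions (assumed data).
covered = {"open", "save"}

# Candidate tests and the features each one would touch (assumed data).
candidates = {
    "edit-and-save": {"open", "edit", "save"},
    "print-preview": {"open", "print", "preview"},
    "reopen": {"open"},
}

choice = pick_next_test(covered, candidates)
print(choice)  # print-preview
```

The same loop works for other feedback sources: swap the sets of features for churned files or bug-dense modules and the selection logic stays the same.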
I think of this as ‘last time testing’: the last time I visited this state of the application I applied that input, so next time I will choose another. Or, the last time I saw this UI control I exercised property A, this time I will exercise property B.
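Here is what ‘last time testing’ might look like as a tiny Python helper. The `ExplorationHistory` class, the state name and the input names are hypothetical, invented only to show the bookkeeping involved.

```python
from collections import defaultdict


class ExplorationHistory:
    """Remembers which inputs have been applied in each application state."""

    def __init__(self):
        self._tried = defaultdict(list)

    def record(self, state, applied_input):
        self._tried[state].append(applied_input)

    def next_input(self, state, candidates):
        """Return the first candidate not yet tried in this state,
        or None if all of them have been tried."""
        for candidate in candidates:
            if candidate not in self._tried[state]:
                return candidate
        return None


history = ExplorationHistory()
inputs = ["click", "right-click", "drag"]

first = history.next_input("login-page", inputs)
history.record("login-page", first)
second = history.next_input("login-page", inputs)
print(first, second)  # click right-click
```

The logic is trivial. The hard part in practice is recognizing that you are back in the same state and capturing what was applied there.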
Tools are very valuable for feedback-based testing so that history can be stored, searched and acted upon in real time. Unfortunately, few such tools exist.


Monday, October 13, 2008
