
Welcome to Machers Blog

Blogging the world of Technology and Testing to help people build their careers.

Tuesday, September 8, 2009

The Butterfly Model for Test Development (Part 1)

There is a dichotomy between the development and testing of software. This schism is illustrated by the plethora of development models employed for planning and estimating the development of software, as opposed to the scarcity of valid test development models. At first glance, the same models that underlay the software development process with forethought and diligence appear adequate for the more complex task of planning, developing, and executing thorough verification of the application.
Unfortunately, software development models were not intended to encapsulate the vagaries of software verification and validation, the two main goals of software testing. Indeed, software development models can be antithetical to the effective testing of software. It lies in the hands of software testing professionals, therefore, to define an effective model for software test development that complements and completes any given software development model.
One such test development model is the Butterfly Model, which I will explore in some detail in this paper. It should be understood that the butterfly model is neither separate nor integrated with the development model, but instead is a monitoring and modifying factor in the completion of the development model. While this may seem arbitrary and self-contradictory, it is my hope that the elaboration of the butterfly model presented herein will both explain and justify this statement.
In this paper I will present a modified view of the ubiquitous “V” software development model. On top of this modified model I will superpose the butterfly model of test development. Finally, I will reconcile the relationship between the models, clarifying the effects of each on the other and identifying the information portals germane to both, together or separately.

The Standard V Software Development Model
Nearly everyone familiar with modern software development knows of the standard V development model, depicted below.
In this standardized image of the V development model, both the design and test phases of development are represented as linear processes that are gated according to the specific products of specific activities. On the design side, system requirements beget software requirements, which then beget a software design, which in turn begets an implementation.
On the test side of development, the software design begets unit tests. Similarly, software requirements beget integration tests (with a little help from the system requirements). Finally, system requirements beget system tests. Acceptance testing, being the domain of the end user of the application, is deliberately omitted from this view of the V model.
It should be understood that the V model is simply a more expressive rearrangement of the waterfall model, with the waterfall’s timeline component mercifully eliminated and the level of system abstraction indicated by the vertical distance from the implementation. The V model is correct as far as it goes, in that it expresses most of the lineage required for the artifacts of successful software development. From an application development point of view, this depiction of the model is sufficient to convey the source associations of the major development cycle artifacts, including test artifacts.
Unfortunately, the application development viewpoint falls well short of the software test development vantage required to create and maintain effective test artifacts.
Rigor of Model Enforcement
Before launching into a discussion of the shortfalls of the V software development model, a side excursion to examine the appropriate level of rigor in enforcing the model is warranted. It needs to be recognized from the start that not all applications will choose to implement the V model in the same manner. Deciding how rigidly the model must be followed is largely a product of understanding the operational arena of the application.
For example, any certification requirements attached to the application will dictate the rigor of the model’s implementation. Safety-critical software in the commercial aerospace arena undergoes an in-depth certification review before being released for industry use. Applications in this arena therefore tailor their implementation of the V model toward fulfilling the objectives listed for each segment of the process in RTCA/DO-178B, the Federal Aviation Administration’s (FAA’s) selected guidelines for certification.
Similarly, medical devices containing software that affects safety must be developed using a version of the model that fulfills the certification requirements imposed by the Food and Drug Administration (FDA). As automotive embedded controller software continues to delve into applications that directly affect occupant safety (such as actuator-based steering), it can be expected that some level of certification requirement will be instituted for that arena as well.
Other arenas do not require anything approaching this level of rigor in their process. If the application cannot directly cause injury or the loss of life, or trigger the financial demise of a company, then it can most likely follow a streamlined version of the V model.
Web applications generally fall into this category, as do many e-commerce and home-computing applications. In fact, more applications fall into the second category than the first. That doesn’t exempt them from the need to follow the model, however. It simply modifies the parameters of their implementation of the model.
Where the V Model Leaves Off
The main issue with the V development model is not its depiction of ancestral relationships between test artifacts and their design artifact progenitors. Instead, there are three facets of the V model that are incomplete and must be accounted for. Just as in software development, we must define the problem before we can attempt to solve it.
First, the V model is inherently a linear expression of a nonlinear process. The very existence of the spiral model of software development should be sufficient evidence of the nonlinearity of software development, but this point deserves further examination.
Software design artifacts, just like the software program they serve, are created and maintained by people. People make mistakes. The existence of software testers bears witness to this, as does the amount of buggy software that still seems to permeate the marketplace, despite the best efforts of software development and testing professionals. When mistakes are found in an artifact, the error must be corrected. The act of correction, in a small way, is another iteration of the original development of the artifact.
The second deficient aspect of the V model is its implication of unidirectional flow from design artifacts into test artifacts. Any seasoned software developer understands that feedback within the development cycle is an absolute necessity. The arrows depicting the derivation of tests from the design artifacts should in reality be two-headed, although the left-pointing arrowhead would be significantly smaller than the right-pointing head.
While test artifacts are generally derived from their corresponding design artifacts, the fact that a test artifact must be so derived needs to be factored in when creating the design artifact in the first place. Functional requirements must be testable – they must be stated in such a manner as to be conducive to analysis, measurement, or demonstration. Vague statement of the requirements is a clear indicator of trouble down the road. Likewise, software designs need to be complete and unambiguous. The implementation methodology called for in the software design must be clear enough to drive the definition of appropriate test cases for the verification and validation of that design.
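As a rough illustration of the testability point, a tester might run a first-pass lint over requirement text that flags wording which cannot be measured, demonstrated, or analyzed. This is only a sketch: the list of vague terms below is a hypothetical example rather than any standard, and every flagged requirement still needs human judgment.

```python
import re

# Hypothetical list of wording that often signals an untestable requirement.
VAGUE_TERMS = [
    "as appropriate",
    "user-friendly",
    "fast",
    "adequate",
    "etc",
    "and/or",
    "if possible",
]


def find_vague_terms(requirement_text: str) -> list[str]:
    """Return the vague terms found in a requirement, in list order."""
    text = requirement_text.lower()
    hits = []
    for term in VAGUE_TERMS:
        # Word boundaries keep "fast" from matching inside "breakfast".
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text):
            hits.append(term)
    return hits


# Hypothetical requirement tags and text, for illustration only.
reqs = {
    "SRS-101": "The display shall refresh fast.",
    "SRS-102": "The display shall refresh at least every 100 ms.",
}
for tag, text in reqs.items():
    print(tag, find_vague_terms(text))
# SRS-101 ['fast']
# SRS-102 []
```

A keyword list cannot catch every ambiguity, but it cheaply surfaces the most obvious untestable phrasing before the requirement is handed off.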
If the implementation itself is to be part of the test ancestry, then it, too, must be concise and complete, with adequate commentary on the techniques employed in its construction but without ambiguity or self-contradiction.
It should be noted that this discussion of the second deficient aspect of the V model is predicated on a rigorous enforcement of the model’s dictates, such as is required for most aerospace applications. For less rigorous instances of the model, the absolutes listed above may not apply. This issue will be discussed further later in this paper.
The third deficient aspect of the V software development model is its encapsulation of test artifact ancestry solely within the domain of the design artifacts. As stated above, test artifacts are generally derived from their corresponding design artifacts. However, there are a multitude of other sources that must be touched upon to ensure success in generating a “complete” battery of tests for the software being developed.
A Closer View
The first issue mentioned with regard to the V development model is its essential linearization of a nonlinear process – software development. This problem is one of perception, really, or perhaps perspective. The root cause can be found in the fact that the V software development model is a simplified visualization tool that illustrates a complex and interrelated process. A more detailed view of a segment of the design leg (which segment is immaterial) is shown below.
In this expanded view of the design leg of the V, the micro-iterative feedback, depicted by the small black arrows within the overall gray feed-forward thrust, is visible. Each micro-iteration represents the accumulation of further data, application of a lesson learned, or even the bright idea someone dreamed up while singing in the shower. The point to be made here is this: The general forward-leaning nature of the legs of the V tends to disguise the frenzied iterations in thought, specification, and development required to create a useful application.
There are critical points along the software development stream that must be accounted for in any valid test development model. For example, any time there is a handoff of an artifact (or part of an artifact), the transacted artifact must be analyzed with respect to its contents and any flow-down effects caused by those contents [MARI99]. In the expanded view of the V development model shown above, the left edge of the broad arrow represents the genesis of a change in the artifact under development. This edge, where new or modified information is being introduced, is the starting point for all new micro-iterations. The right edge of the broad arrow is the terminus for each micro-iteration, where the new or modified information is fully incorporated in the artifact.
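Part of the flow-down analysis at a handoff can be mechanized if trace links between artifact items are recorded. The sketch below assumes a hypothetical dictionary mapping each item (system requirement, software requirement, design element, or test) to the items derived from it, and walks those links breadth-first to list everything a change might affect. The identifiers are invented for illustration.

```python
from collections import deque


def affected_items(trace_links: dict[str, list[str]], changed: str) -> list[str]:
    """Breadth-first walk of derived-from links starting at a changed item.

    Returns every downstream item (excluding the changed item itself) in the
    order it is first reached, so each one can be re-analyzed.
    """
    seen = {changed}
    order = []
    queue = deque([changed])
    while queue:
        item = queue.popleft()
        for child in trace_links.get(item, []):
            if child not in seen:  # guard against items reachable by two paths
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


# Hypothetical trace links: parent item -> items derived from it.
links = {
    "SYS-1": ["SRS-10", "SRS-11"],
    "SRS-10": ["SDD-3", "IT-7"],
    "SRS-11": ["SDD-4"],
    "SDD-3": ["UT-21"],
}
print(affected_items(links, "SRS-10"))  # ['SDD-3', 'IT-7', 'UT-21']
```

The walk only says which items might be affected; deciding whether each one actually needs to change is still the analyst's job.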
It should be further understood that micro-iterations can be independent of each other. In fact, most significant software development incorporates a maelstrom of independent micro-iterations that ebb and flow both concurrently and continuously throughout the overall development cycle.
The spiral model of software development, which many consider to be superior to the V model, is founded on an explicit understanding of the iterative nature of software creation. Unfortunately, the spiral model tends to be expressed on a macro scale, hiding the developmental perturbations needed for the production of useful design and test artifacts.
The Butterfly Model
Now that we have rediscovered the hidden micro-iterations in a successful process based on the V model, we need to understand the source of these perturbations. Further, we need to understand the fundamental interconnectedness of it all, to borrow an existential phrase.
Butterflies are composed of three pieces – two wings and a body. Each part represents a piece of software testing, as described hereafter.
Test Analysis
The left wing of the butterfly represents test analysis – the investigation, quantization, and/or re-expression of a facet of the software to be tested. Analysis is both the byproduct and foundation of successful test design. In its earliest form, analysis represents the thorough pre-examination of design and test artifacts to ensure the existence of adequate testability, including checking for ambiguities, inconsistencies, and omissions.
Test analysis must be distinguished from software design analysis. Software design analysis is constituted by efforts to define the problem to be solved, break it down into manageable and cohesive chunks, create software that fulfills the needs of each chunk, and finally integrate the various software components into an overall program that solves the original problem. Test analysis, on the other hand, is concerned with validating the outputs of each software development stage or micro-iteration, as well as verifying compliance of those outputs to the (separately validated) products of previous stages.
Test analysis mechanisms vary according to the design artifact being examined. For an aerospace software requirement specification, the test engineer would do all of the following, at a minimum:
• Verify that each requirement is tagged in a manner that allows correlation of the tests for that requirement to the requirement itself. (Establish Test Traceability)
• Verify traceability of the software requirements to system requirements.
• Inspect for contradictory requirements.
• Inspect for ambiguous requirements.
• Inspect for missing requirements.
• Check to make sure that each requirement, as well as the specification as a whole, is understandable.
• Identify one or more measurement, demonstration, or analysis methods that may be used to verify the requirement’s implementation (during formal testing).
• Create a test “sketch” that includes the tentative approach and indicates the test’s objectives.
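Several items in this list are mechanical enough to sketch in code: tag checking, traceability to system requirements, and confirming that at least one verification method has been identified. The `Requirement` record, its field names, and the `SRS-<number>` tag format are assumptions made for this example; real projects define their own conventions, and the inspection items still need a human reader.

```python
import re
from dataclasses import dataclass, field

# Hypothetical tag convention for software requirements, e.g. "SRS-42".
TAG_PATTERN = re.compile(r"^SRS-\d+$")
# Verification methods named in the text.
METHODS = {"measurement", "demonstration", "analysis"}


@dataclass
class Requirement:
    tag: str
    text: str
    parents: list[str] = field(default_factory=list)  # traced system requirement tags
    methods: set[str] = field(default_factory=set)    # planned verification methods


def analyze(reqs: list[Requirement], system_tags: set[str]) -> list[str]:
    """Return findings from the mechanical checks; an empty list means none."""
    findings = []
    seen = set()
    for r in reqs:
        # Test traceability: every requirement needs a unique, well-formed tag.
        if not TAG_PATTERN.match(r.tag):
            findings.append(f"{r.tag}: tag does not follow the convention")
        if r.tag in seen:
            findings.append(f"{r.tag}: duplicate tag")
        seen.add(r.tag)
        # Traceability of software requirements to system requirements.
        if not r.parents:
            findings.append(f"{r.tag}: no trace to a system requirement")
        for parent in r.parents:
            if parent not in system_tags:
                findings.append(f"{r.tag}: traces to unknown system requirement {parent}")
        # At least one measurement, demonstration, or analysis method.
        if not (r.methods & METHODS):
            findings.append(f"{r.tag}: no verification method identified")
    return findings


# Hypothetical requirements for illustration.
reqs = [
    Requirement("SRS-1", "(requirement text)", ["SYS-1"], {"measurement"}),
    Requirement("SRS-2", "(requirement text)", [], {"analysis"}),
    Requirement("SRS-3", "(requirement text)", ["SYS-9"], set()),
]
for finding in analyze(reqs, {"SYS-1", "SYS-2"}):
    print(finding)
```

Running the example reports that SRS-2 has no parent trace, and that SRS-3 traces to an unknown system requirement and has no verification method.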
Out of the items listed above, only the last two are specifically aimed at the act of creating test cases. The other items are almost mechanical in nature, where the test design engineer is simply checking the software engineer’s work. But all of the items are germane to test analysis, where any error can manifest itself as a bug in the implemented application.
Test analysis also serves a valid and valuable purpose within the context of software development. By digesting and restating the contents of a design artifact (whether it be requirements or design), test analysis offers a second look – from another viewpoint – at the developer’s work. This is particularly true with regard to lower-level design artifacts like detailed design and source code.
This kind of feedback has a counterpart in human conversation. To verify one’s understanding of another person’s statements, it is useful to rephrase the statement in question using the phrase “So, what you’re saying is…”. This powerful method of confirming comprehension and eliminating miscommunication is just as important for software development – it helps to weed out misconceptions on the part of both the developer and tester, and in the process identifies potential problems in the software itself.
It should be clear from the above discussion that the tester’s analysis is both formal and informal. Formal analysis becomes the basis for documentary artifacts of the test side of the V. Informal analysis is used for immediate feedback to the designer in order to both verify that the artifact captures the intent of the designer and give the tester a starting point for understanding the software to be tested.
In the bulleted list shown above, the first two analyses are formal in nature (for an aerospace application). The verification of software requirement tags is a necessary step in the creation of a test traceability matrix. The software-to-system requirements traceability matrix similarly depends on the second analysis.
The three inspection analyses listed are more informal, aimed at ensuring that the specification being examined is of sufficient quality to drive the development of a quality implementation. The difference is in how the analytical outputs are used, not in the level of effort or attention that goes into the analysis.
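Once every requirement carries a unique tag, assembling the test traceability matrix is largely bookkeeping. A minimal sketch, assuming each test case simply lists the requirement tags it verifies (a hypothetical format chosen for illustration); it also reports requirements with no tests and tests that cite tags that do not exist.

```python
def build_trace_matrix(requirement_tags: list[str], tests: dict[str, list[str]]):
    """Build a requirement-to-test traceability matrix.

    `tests` maps each test ID to the requirement tags it claims to verify.
    Returns (matrix, uncovered, dangling): `matrix` maps each tag to the test
    IDs citing it (in test ID order), `uncovered` lists tags with no test, and
    `dangling` lists (test ID, tag) pairs that cite an unknown tag.
    """
    matrix = {tag: [] for tag in requirement_tags}
    dangling = []
    for test_id in sorted(tests):
        for tag in tests[test_id]:
            if tag in matrix:
                matrix[tag].append(test_id)
            else:
                dangling.append((test_id, tag))
    uncovered = [tag for tag, test_ids in matrix.items() if not test_ids]
    return matrix, uncovered, dangling


# Hypothetical requirement tags and test IDs.
matrix, uncovered, dangling = build_trace_matrix(
    ["SRS-1", "SRS-2", "SRS-3"],
    {"TC-02": ["SRS-1"], "TC-01": ["SRS-1", "SRS-2"], "TC-03": ["SRS-7"]},
)
print(matrix)     # {'SRS-1': ['TC-01', 'TC-02'], 'SRS-2': ['TC-01'], 'SRS-3': []}
print(uncovered)  # ['SRS-3']
print(dangling)   # [('TC-03', 'SRS-7')]
```

An uncovered requirement or a dangling reference is exactly the kind of finding that should go back to the designer as informal feedback.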
Test Design
Thus far, the tester has produced a lot of analytical output, some semi-formalized documentary artifacts, and several tentative approaches to testing the software. At this point, the tester is ready for the next step: test design.
The right wing of the butterfly represents the act of designing and implementing the test cases needed to verify the design artifact as replicated in the implementation. Like test analysis, it is a relatively large piece of work. Unlike test analysis, however, the focus of test design is not to assimilate information created by others, but rather to implement procedures, techniques, and data sets that achieve the test’s objective(s).
The outputs of the test analysis phase are the foundation for test design. Each requirement or design construct has had at least one technique (a measurement, demonstration, or analysis) identified during test analysis that will validate or verify that requirement. The tester must now put on his or her development hat and implement the intended technique.
Software test design, as a discipline, is an exercise in the prevention, detection, and elimination of bugs in software. Preventing bugs is the primary goal of software testing [BEIZ90]. Diligent and competent test design prevents bugs from ever reaching the implementation stage. Test design, with its attendant test analysis foundation, is therefore the premier weapon in the arsenal of developers and testers for limiting the cost associated with finding and fixing bugs.
Before moving further ahead, it is necessary to comment on the continued analytical work performed during test design. As previously noted, tentative approaches are mapped out in the test analysis phase. During the test design phase of test development, those tentatively selected techniques and approaches must be evaluated more fully, until it is “proven” that the test’s objectives are met by the selected technique. If all tentatively selected approaches fail to satisfy the test’s objectives, then the tester must put his or her test analysis hat back on and start looking for more alternatives.
Test Execution
In the butterfly model of software test development, test execution is a separate piece of the overall approach. In fact, it is the smallest piece – the slender insect’s body – but it also provides the muscle that makes the wings work. It is important to note, however, that test execution (as defined for this model) includes only the formal running of the designed tests. Informal test execution is a normal part of test design, and in fact is also a normal part of software design and development.
Formal test execution marks the moment in the software development process where the developer and the tester join forces. In a way, formal execution is the moment when the developer gets to take credit for the tester’s work – by demonstrating that the software works as advertised. The tester, on the other hand, should already have proactively identified bugs (in both the software and the tests) and helped to eliminate them – well before the commencement of formal test execution!
Formal test execution should (almost) never reveal bugs. I hope this plain statement raises some eyebrows – although it is very much true. The only reasonable cause of unexpected failure in a formal test execution is hardware failure. The software, along with the test itself, should have been through the wringer enough to be bone-dry.
Note, however, that unexpected failure is singled out in the above paragraph. That implies that some software tests will have expected failures, doesn’t it? Yes, it surely does!
The reasons behind expected failure vary, but allow me to relate a case in point:
In the commercial jet engine control business, systems engineers prepare a wide variety of tests against the system (being the FADEC – or Full Authority Digital Engine Control) requirements. One such commonly employed test is the “flight envelope” test. The flight envelope test essentially begins with the simulated engine either off or at idle with the real controller (both hardware and software) commanding the situation. Then the engine is spooled up and taken for a simulated ride throughout its defined operational domain – varying altitude, speed, thrust, temperature, etc. in accordance with real world recorded profiles. The expected results for this test are produced by running a simulation (created and maintained independently from the application software itself) with the same input data sets.
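At its core, the pass/fail decision for a test of this kind is a sample-by-sample comparison of the controller's recorded outputs against the independently simulated expected results, within a tolerance. The sketch below assumes both traces are already time-aligned lists of numbers and uses a single absolute tolerance; a real test stand would likely need per-parameter tolerances and explicit time alignment, so treat it as an illustration only.

```python
def compare_run(actual: list[float], expected: list[float], tolerance: float) -> list[int]:
    """Return the indices of samples where |actual - expected| exceeds tolerance."""
    if len(actual) != len(expected):
        raise ValueError("actual and expected traces differ in length")
    return [
        i
        for i, (a, e) in enumerate(zip(actual, expected))
        if abs(a - e) > tolerance
    ]


# Hypothetical traces for one parameter over four samples.
expected = [0.0, 10.0, 20.0, 30.0]
actual = [0.0, 10.2, 19.9, 30.6]
print(compare_run(actual, expected, 0.5))   # [3]
print(compare_run(actual, expected, 0.05))  # [1, 2, 3]
```

The second call shows how an unrealistically tight tolerance turns ordinary small deviations into reported failures.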
Minor failures in the formal execution of this test are fairly common. Some are hard failures – repeatable on every single run of the test. Others are soft – only intermittently reaching out to bite the tester. Each and every failure is investigated, naturally – and the vast majority of flight envelope failures are caused by test stand problems. These can include issues like a voltage source being one twentieth of a volt low, or slight timing mismatches caused by the less exact timekeeping of the test stand workstation as compared to the FADEC itself.
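Sorting failures into hard and soft is easier when the same test is run several times. A minimal sketch, assuming each run's result has been reduced to the set of check names that failed (the names below are invented). The classification only says how repeatable a failure is; each one still has to be investigated to find its cause.

```python
def classify_failures(runs: list[set[str]]) -> dict[str, str]:
    """Classify failing checks across repeated runs of the same test.

    Each set in `runs` holds the check names that failed in one run. A check
    that failed in every run is "hard"; one that failed in only some runs is
    "soft" (intermittent).
    """
    if not runs:
        return {}
    all_failed = set().union(*runs)
    result = {}
    for check in sorted(all_failed):
        failures = sum(1 for failed in runs if check in failed)
        result[check] = "hard" if failures == len(runs) else "soft"
    return result


# Hypothetical results from three runs of the same test.
runs = [{"fan_speed", "egt"}, {"fan_speed"}, {"fan_speed", "fuel_flow"}]
print(classify_failures(runs))
# {'egt': 'soft', 'fan_speed': 'hard', 'fuel_flow': 'soft'}
```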
Some flight envelope failures are attributed to the model used to provide expected results. In such cases, hours and days of gut-wrenching analytical work go into identifying the minuscule difference between the model and the actual software.
A handful of flight envelope test failures are caused by the test parameters themselves. Tolerances may be set at unrealistically tight levels, for example. Or slight operating mode mismatches between the air speed and engine fan speed may cause a fault to be intermittently annunciated.
In very few cases have I seen the software being tested lie at the root of the failure. (I did witness the bugs being fixed, by the way!)
The point is this – complex and complicated tests can fail due to a variety of reasons, from hardware failure, through test stand problems, to application error. Intermittent failures may even jump into the formal run, just to make life interesting.
But the test engineer understands the complexity of the test being run, and anticipates potential issues that may cause failures. In fact, the test is expected to fail once in a while. If it doesn’t, then it isn’t doing its job – which is to exercise the control software throughout its valid operational envelope. As in all applications, the FADEC’s boundaries of valid operation are dark corners in which bugs (or at least potential bugs) congregate.
It was mentioned during our initial discussion of the V development model that the model is sufficient, from a software development point of view, to express the lineage of test artifacts. This is because testing, again from the development viewpoint, is composed of only the body of the butterfly – formal test execution. We testers, having learned the hard way, know better.

To be continued in Part 2
