
Welcome to Machers Blog

Blogging the world of technology and testing to help people build their careers.

Tuesday, September 8, 2009

The Butterfly Model for Test Development (Part-2)


A Swarm of Testing
We have now examined how test analysis, test design, and test execution compose the body of the butterflies in this test development model. In order to understand how the butterfly model monitors and modifies the software development model, we need to digress slightly and reexamine the V software development model itself.
In Figure 1, not only have the micro-iterations naturally present in the design cycle been included, but the major design phase segments (characterized by their outputs) have been separated into smaller arrows to clearly define the transition point from one segment to the next. The test side of the V has been similarly separated, to demarcate the boundaries between successful formal execution of each level of testing.
Figure 1. Complete Expanded V Development Model View
No micro-iterations on the test side of the V are shown in this depiction, although there are a few to be found – mostly around the phase segment transitions, where test execution documentary artifacts are formulated and preserved. The relative lack of micro-iterations on the test side of the V reflects the fact that it represents only the formal running of tests – the legwork of analysis and design is done elsewhere. The question, therefore, is: Where?
The answer to this all-important question is shown in Figure 2.
Figure 2. Illustration of the Butterfly Test Development Model
At all micro-iteration termini, and at some micro-iteration geneses, exists a small test butterfly. These tiny test insects each contribute to the overall testing effort, encapsulating the test analysis and design required by whatever minor change is represented by the micro-iteration.
Larger, heavier butterflies spring to life on the boundaries between design phase segments. These larger specimens carry with them the more formal analyses required to transition from one segment to the next. They also answer the call for coordination between the tests designed as part of their smaller brethren. Large butterflies also appear at the transition points between test phase segments, where documentary artifacts of test execution are created in order to claim credit for the formal execution of the test.
A single butterfly, by itself, is of no moment – it cannot possibly have much impact on the overall quality of the application and its tests. But a swarm of butterflies can blot out the sun, effecting great improvement in the product’s quality. The smallest insects handle the smallest changes, while the largest tie together the tests and analyses of them all.
The right-pointing lineage arrows, which show the roots of each test artifact in its corresponding design artifact, point to the moment in the software development model where the analysis and design of tests culminate in their formal execution.
Butterfly Thinking
“A butterfly flutters its wings in Asia, and the weather changes in Europe.” This adage offers insight into the chaotic (in the mathematical sense of the word) nature of software development. Events that appear minor and far removed from relevance can have a profound impact on the software being created. Many seemingly minor and irrelevant events are just that – minor and irrelevant. But some such events, despite their appearance, are not.
Identifying these deceptions is a key outcome of the successful implementation of the butterfly model. The following paragraphs contain illustrations of this concept.
Left Wing Thinking
The FADEC must assert control over the engine’s operation within 300 msec of a power-on event.
This requirement, or a variant of it, appears in every system specification for a FADEC. It is important because it specifies the amount of time available for a cold-start initialization in the software.
The time allotted is explicit. No more than three tenths of a second may elapse before the FADEC asserts itself.
The commencement of that time period is well defined. The nearly vertical rising edge of the FADEC power signal as it moves from zero volts (off) to the operational voltage of the hardware marks the start line.
But what the heck does “assert control” mean?
While analyzing this requirement statement, that question should jump right off the written page at the tester. In one particular instance, the FADEC asserted control by crossing a threshold voltage on a specific analog signal coming out of the box. Unfortunately, that wasn’t in the specification. Instead, I had to ask the senior systems engineer, who had performed similar tests hundreds of times, how to tell when the FADEC asserted itself.
In other words, I couldn’t create a test sketch for the requirement because I couldn’t determine what the end point of the measurement should be. The system specification assumed that the reader held this knowledge, although anyone who was learning the ropes (as I was at that point) had no reasonable chance of knowing. As far as I know, this requirement has never been elaborated.
As a counterpoint example, consider the mass-market application that, according to the verbally preserved requirements, had to be “compelling”. What the heck is “compelling”, and how does one test for it?
In this case, it didn’t matter that the requirement was ill suited for testing. In fact, the testers’ opinions on the subject weren’t even asked for. But the application succeeded, as evidenced by the number of copies purchased. Customers found the product compelling, and therefore the project was a success.
But doesn’t this violate the “must be testable” rule for requirements? Not really. The need to be “compelling” doesn’t constitute a functional requirement, but is instead an aesthetic requirement. Part of the tester’s analysis should weed out such differences, where they exist.
Right Wing Thinking
Returning to our power-up timing example, how can we measure the time between two voltage-based events? There are many possibilities, although most can’t handle the precision necessary for a 300 msec window. Clocks, watches, and even stopwatches would be hideously unreliable for such a measurement.
The test stand workstation also couldn’t be used. That would require synchronization of the command to apply power with the actual application of power. There was a lag in the actual application of power, caused by the software-driven switch that had to be toggled in the test stand’s circuitry. Worse yet, detection of the output voltage required the use of a digital voltmeter, which injected an even larger amount of uncertainty into the measurement.
But a digital oscilloscope attached to a printer would work, provided that the scope was fast enough. The oscilloscope was the measurement device (obviously). The printer was required to “prove” that the test passed. This was, after all, an application subject to FAA certification.
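The measurement logic itself is simple once the captured waveforms are in hand. The sketch below shows one way it might look, assuming the oscilloscope samples are available as (time, volts) pairs; the signal names and threshold voltages are illustrative assumptions, not values from any real FADEC specification.

```python
POWER_ON_THRESHOLD = 2.5   # volts: power signal rising edge (assumed value)
CONTROL_THRESHOLD = 4.0    # volts: "control asserted" level (assumed value)
LIMIT_SEC = 0.300          # the 300 msec requirement

def first_crossing(samples, threshold):
    """Return the time of the first sample at or above the threshold."""
    for t, v in samples:
        if v >= threshold:
            return t
    return None

def power_up_time(power_samples, control_samples):
    """Elapsed time from the power-on edge to control assertion, in seconds."""
    t_power = first_crossing(power_samples, POWER_ON_THRESHOLD)
    t_control = first_crossing(control_samples, CONTROL_THRESHOLD)
    if t_power is None or t_control is None:
        raise ValueError("expected edge not found in capture")
    return t_control - t_power

# Example capture: power edge at t=0.010 s, control asserted at t=0.250 s
power = [(0.000, 0.0), (0.010, 5.0), (0.020, 5.0)]
control = [(0.000, 0.0), (0.250, 4.5), (0.260, 4.5)]
elapsed = power_up_time(power, control)
assert elapsed <= LIMIT_SEC   # 0.240 s elapsed: requirement met
```

The scope's printout serves the same role as the `assert` here – an artifact proving the measured interval fell inside the window.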
As a non-certification counter example, consider the product whose requirements included the following statement:
Remove unneeded code where possible and prudent.
In other words, “Make the dang thing smaller”. The idea behind the requirement was to shrink the size of the executable, although eliminating unnecessary code is usually a good thing in its own right. No amount of pleading was able to change this requirement into a quantifiable statement, either.
So how the heck can we test for this? In this case, the tester might rephrase the requirement in his or her mind to read:
The downloadable installer must be smaller than version X.
This provides a measurable goal, albeit an assumed one. More importantly, it preserves the common thread between the two statements, which is that the product needed to shrink in size.
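Once rephrased that way, the check becomes trivially automatable. Here is a minimal sketch; the file names are hypothetical stand-ins for the real installer artifacts.

```python
import os
import tempfile

def installer_shrank(new_path, baseline_path):
    """True if the new installer is strictly smaller than the baseline."""
    return os.path.getsize(new_path) < os.path.getsize(baseline_path)

# Demonstration with stand-in files (a real test would point at the
# actual installer builds for version X and the new version):
with tempfile.TemporaryDirectory() as d:
    old = os.path.join(d, "installer_vX.bin")
    new = os.path.join(d, "installer_new.bin")
    with open(old, "wb") as f:
        f.write(b"\x00" * 1000)   # baseline: 1000 bytes
    with open(new, "wb") as f:
        f.write(b"\x00" * 900)    # new build: 900 bytes
    assert installer_shrank(new, old)
```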
Body Thinking
To be honest, there isn’t all that much thought involved in formally executing thoroughly prepared test cases. The main aspect of formal execution is the collection of “evidence” to prove that the tests were run and that they passed. There is, however, the need to analyze the recorded evidence as it is amassed.
For example, aerospace applications commonly must be unit tested. Each individual function or procedure must be exercised according to certain rules. The generally large number of modules involved in a certification means that the unit testing effort required is big, although each unit test itself tends to be small. Naturally, the project’s management normally tries to get the unit testing underway as soon as possible to ensure completion by the “drop-dead” date for unit test completion implied in the V model.
As the established date nears, the test manager must account for every modified unit. The last modification of the unit must predate the configured test procedures and results for that unit. All of the tests must have been peer reviewed prior to formal execution. And all of the tests must have passed during formal execution.
In other words, “dot the I’s and cross the T’s”. It is largely an exercise in bookkeeping, but that doesn’t diminish its importance.
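That bookkeeping is mechanical enough to sketch in code. The record layout below is illustrative – an assumption, not the format of any real configuration-management tool – but it captures the three rules just described.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class UnitRecord:
    name: str
    last_modified: datetime    # last change to the unit's source
    test_configured: datetime  # when its test procedures/results were configured
    peer_reviewed: bool        # tests reviewed prior to formal execution
    passed: bool               # tests passed during formal execution

def audit(units):
    """Return the names of units that fail any bookkeeping rule."""
    failures = []
    for u in units:
        ok = (u.last_modified < u.test_configured  # tests postdate the code
              and u.peer_reviewed
              and u.passed)
        if not ok:
            failures.append(u.name)
    return failures

units = [
    UnitRecord("fuel_ctrl", datetime(2009, 8, 1), datetime(2009, 8, 15), True, True),
    UnitRecord("ignition", datetime(2009, 8, 20), datetime(2009, 8, 15), True, True),
]
assert audit(units) == ["ignition"]  # modified after its tests were configured
```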
The Swarm Mentality
To better illustrate the swarm mentality, let’s look at an unmanned rocket project that utilized the myriad butterflies of this model to overwhelm bugs that could have caused catastrophic failure. This rocket was really a new version of an existing rocket that had successfully blasted off many, many times.
First, because the new version was to be created as a change to the older version’s software, a complete and thorough system specification analysis was performed, comparing the system specs for both versions. This analysis found that:
• The old version contained a feature that didn’t apply to the new version: a special extended calculation of the horizontal bias (BH), which allowed late-countdown holds (between five and ten seconds before launch) to be restarted within a few minutes. BH was known to be meaningless after either version left the launch pad, but the older version continued calculating it for up to 40 seconds after liftoff.
• The updated flight profile for the new version had not been included in the updated specification, although this omission had been agreed to by all relevant parties. That meant that discrepancies between the two versions’ early trajectory profiles were not available for examination. The contractors building the rocket didn’t want to change their agreement on this subject, so the missing trajectory profile information was marked as a risk to be targeted with extra-detailed testing.
Because of the fairly serious questions raised in the system requirements analysis, the test engineers decided to really attack the early trajectory operation of the new version. Because this was an aerospace application, they knew that the subsystems had to be qualified for flight prior to integration into the overall system. That meant that the inertial reference system (SRI) that provided the raw data required to calculate BH would work, at least within its intended use.
But how could they test the interaction of the SRI and the calculation of BH? The horizontal bias was also a product of the rocket’s acceleration, so they knew that they would have to at least simulate the accelerometer inputs to the control computer (it is physically impossible to make a vibration table approach the proper values for the rocket’s acceleration).
If they had a sufficiently detailed SRI model, they could also simulate the inertial reference system. Without a detailed simulation, they’d have to use a three-axis dynamic vibration table. Because the cost of using the table for an extended period of time was higher than the cost of creating a detailed simulation, they decided to go with the all simulation approach.
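The shape of that all-simulation approach can be sketched briefly: generate an accelerometer input stream for the flight profile and drive the downstream calculation with it. The acceleration profile and the 16-line integration below are purely illustrative assumptions; only the 72 msec data cycle comes from the account itself.

```python
def simulated_accel(t):
    """Horizontal acceleration (m/s^2) at time t, for an assumed ramp profile."""
    return 2.0 + 0.5 * t

def run_profile(duration, dt=0.072):
    """Integrate the simulated acceleration into horizontal velocity,
    stepping at the 72 msec data cycle; the velocity stream is what a
    BH-style calculation would consume."""
    velocity, t = 0.0, 0.0
    while t < duration:
        velocity += simulated_accel(t) * dt
        t += dt
    return velocity

v = run_profile(40.0)   # drive the inputs well past liftoff
assert v > 0.0
```

The point of the sketch is the trade the engineers faced: a software profile like this is cheap to extend past the old flight envelope, while a vibration table physically cannot reach the required accelerations.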
In the meantime, a detailed analysis of the software requirements for both versions revealed a previously unknown conceptual error. Every exception raised in the Ada software automatically shut down the processor – whether the exception was caused by a hardware or software fault!
The thinking behind this design was that exceptions would only arise from random hardware failures, from which the software couldn’t hope to recover. But clearly, software-raised exceptions were possible, even if they were improbable. So, the exception handling in the software spec was updated to differentiate between hardware- and software-based exceptions.
Examining the design of the software, the test engineers were amazed to discover that the horizontal bias calculations weren’t protected against Operand Error, which is automatically raised in Ada when a floating point real to integer conversion exceeds the available range of the integer container. BH was involved in just such a conversion!
The justification for omitting this protection was simple, at least for the older version of the rocket. The possible values of BH were physically limited in range so that the conversion couldn’t ever overflow. But the newer version couldn’t claim that fact, so the protection for Operand Error was put into the new version’s design. Despite the fact that this could put the 80% usage goal for the SRI computer at risk, the possibility that the computer could fail was simply too great.
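A sketch of that protection, in Python rather than Ada, looks like this: a range check guards the conversion, so an out-of-range value raises a recoverable error instead of halting the processor. The 16-bit limits mirror the integer container described in the Ariane inquiry report; the function and exception names are stand-ins of my own.

```python
INT16_MIN, INT16_MAX = -32768, 32767

class OperandError(Exception):
    """Stand-in for the Operand Error raised on an overflowing conversion."""

def to_int16_checked(value):
    """Convert a float to a 16-bit integer, raising OperandError on overflow
    (note: Python's int() truncates toward zero, unlike Ada's rounding)."""
    i = int(value)
    if not INT16_MIN <= i <= INT16_MAX:
        raise OperandError(f"{value} does not fit in 16 bits")
    return i

assert to_int16_checked(1234.5) == 1234
try:
    to_int16_checked(1.0e6)       # a BH value past the container's range
except OperandError:
    pass                          # handled, instead of shutting down the SRI
```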
Finally, after much gnashing of teeth, the test engineers convinced the powers that be to completely eliminate the prolonged calculation of horizontal bias, because it was useless in the new version. The combined risks of the unknown trajectory data, the unprotected conversion to integer, and the money needed to fund the accurate SRI simulation were too much for the system’s developers. They at last agreed that it was better to eliminate the unnecessary processing, even though it worked for the previous version.
As a result, the maiden demonstration flight for the Ariane 5 rocket went off without a hitch.
That’s right – I have been describing the findings of the inquiry board for the Ariane 5 in light of how a full and rigorous implementation of the butterfly model would have detected, mitigated, or eliminated them [LION96].
Ariane 4 contained an extended operation alignment function that allowed for late-countdown holds to be handled without long delays. In fact, the 33rd flight of the Ariane 4 rocket used this feature in 1989.
The Ariane 5 trajectory profile was never added to the system requirements. Instead, the lower values in the Ariane 4 trajectory data were allowed to stand.
The SRI computers (with the deficient software) were therefore never tested to the updated trajectory telemetry.
The missing Operand Error exception handling for the horizontal bias therefore never occurred during testing, causing the SRI computer to shut down.
The flawed concept of all exceptions being caused by random hardware faults was therefore never exposed.
SRI 1, the first of the dual redundant components, therefore halted on an Operand Error caused by the conversion of BH in the 39th second after liftoff. SRI 2 immediately took over as the active inertial reference system.
But then SRI 2 failed because of the same Operand Error in the following data cycle (72 msec in duration).
And therefore, Ariane 5 self-destructed in the 42nd second of its maiden voyage – all for lack of a swarm of butterflies.
The Butterfly Model within the V Model Context
The butterfly model of test development is not a component of the V software development model. Instead, the butterfly test development model is a superstructure imposed atop the V model that operates semi-independently, in parallel with the development of software.
The main relationship between the V model and the butterfly swarm of testing activity is timing, at least on the design side of the V. Test development is driven by software development, for software is what we are testing. Therefore, the macro and micro iterations of software development define the points at which test development activity is both warranted and required. The individual butterflies must react to the iterative software development activity that spawned them, while the whole of the swarm helps to shape the large and small perturbations in the software design stream.
On the test side of the V, the relationship is largely reversed – the software milestones of the V model are the results of butterfly activity on the design side. The differences between the models give latitude to both the developer and the tester to envision the act of testing within their particular operational context. The developer is free to see testing as the culmination of their development activity. The tester is likewise free to see the formal execution of testing as the end of the line – where all of the analytical and test design effort that shepherded the software design process is transformed into the test artifacts required for progression from development to delivery.
But the butterfly model does not entirely fall within the bounds of the V model, either. The third issue taken with the standardized V model stated that the roots of software testing lay mainly within the boundaries of the software to be tested. But proper performance of test analysis and design requires knowledge outside the realm of the application itself.
Testers in the butterfly model require knowledge of testing techniques, tools, methodologies, and technologies. Books and articles about test theory are hugely important to the successful implementation of the butterfly model. Similarly, software testing conferences and proceedings are valuable resources.
Testers in this test development model also need to keep abreast of technological advancements related to the application being developed. Trade journals and periodicals are valuable sources for such information.
In the end, the tester is required to not only know the application being tested, but also to understand (at some level) software testing, valid testing techniques, software testing tools and technologies, and even a little about human nature.
Next Steps
The butterfly model of test development is far from complete. The model as described herein is a first step toward a complete and usable model. Some of the remaining steps to finish it include:
• Creating a taxonomy of test butterflies that describes each type of testing activity within the context of the software development activity it accompanies.
• Correlating the butterfly taxonomy with a valid taxonomy of software bugs (to understand what the butterflies eat).
• Formally defining and elaborating the “objectives” associated with various testing activities.
• Creating a taxonomy of “artifacts” to better define the parameters of the model’s execution.
• Expanding visualization of the model to cover the spiral development model.
• Defining the framework necessary to achieve full implementation of the model.
• Identifying methods of automating significant portions of the model’s implementation.
The butterfly model for software test development is a semi-dependent model that represents the bifurcated role of software testing with respect to software development. The underlying realization that software development and test development are parallel processes that are separate but complementary is embodied by the butterfly model’s superposition atop the V development model.
Correlating the V model and butterfly model requires understanding that the standard V model is a high-level view of software development that hides the myriad micro-iterations all along the design and test legs of the V. These micro-iterations are the core of successful software development. They represent the incorporation of new knowledge, new requirements, and lessons learned – primarily during the design phase of software development, although the formation of test artifacts also includes some micro-iterative activity.
Tiny test butterflies occupy the termini of these micro-iterations, as well as some of their geneses. Larger, more comprehensive butterflies occupy phase segment transition points, where the nature of work is altered to reach toward the next goal of the software’s development.
The parts of the butterfly represent the three legs of successful software testing – test analysis, test design, and formal test execution. Of the three, formal execution is the smallest, although it is the only piece explicitly represented in the V model. Test analysis and test design, ignored in the V model, are recognized in the butterfly model as shaping forces for software development, as well as being the foundation for test execution.
Finally, the butterfly model is in its infancy, and there is significant work to do before it can be fully described. However, the visualization of a swarm of testing butterflies darkening the sky while they steer software away from error injection is satisfying – at last we have a physical phenomenon that represents the ephemeral act of software testing.

