Machars Blog

Tuesday, September 8, 2009

The A-B-C's of software testing models

Summary:-
This article provides a brief overview of testing methodologies in various software development models.

The Butterfly Model for Test Development (Part-2)

Theme:-

A Swarm of Testing
We have now examined how test analysis, test design, and test execution compose the body of the butterflies in this test development model. In order to understand how the butterfly model monitors and modifies the software development model, we need to digress slightly and reexamine the V software development model itself.
In Figure 1, not only have the micro-iterations naturally present in the design cycle been included, but the major design phase segments (characterized by their outputs) have been separated into smaller arrows to clearly define the transition point from one segment to the next. The test side of the V has been similarly separated, to demarcate the boundaries between successful formal execution of each level of testing.
Figure 1. Complete Expanded V Development Model View
No micro-iterations on the test side of the V are shown in this depiction, although there are a few to be found – mostly around the phase segment transitions, where test execution documentary artifacts are formulated and preserved. The relative lack of micro-iterations on the test side of the V is due to the fact that it represents only the formal running of tests – the leg work of analysis and design is done elsewhere. The question, therefore, is: Where?
The answer to this all-important question is shown in Figure 2.
Figure 2. Illustration of the Butterfly Test Development Model
At all micro-iteration termini, and at some micro-iteration geneses, exists a small test butterfly. These tiny test insects each contribute to the overall testing effort, encapsulating the test analysis and design required by whatever minor change is represented by the micro-iteration.
Larger, heavier butterflies spring to life on the boundaries between design phase segments. These larger specimens carry with them the more formal analyses required to transition from one segment to the next. They also answer the call for coordination between the tests designed as part of their smaller brethren. Large butterflies also appear at the transition points between test phase segments, where documentary artifacts of test execution are created in order to claim credit for the formal execution of the test.
A single butterfly, by itself, is of no moment – it cannot possibly have much impact on the overall quality of the application and its tests. But a swarm of butterflies can blot out the sun, effecting great improvement in the product’s quality. The smallest insects handle the smallest changes, while the largest tie together the tests and analyses of them all.
The right-pointing lineage arrows, which show the roots of each test artifact in its corresponding design artifact, point to the moment in the software development model where the analysis and design of tests culminate in their formal execution.
Butterfly Thinking
“A butterfly flutters its wings in Asia, and the weather changes in Europe.” This colloquialism offers insight into the chaotic (in the mathematical sense of the word) nature of software development. Events that appear minor and far removed from relevance can have a profound impact on the software being created. Many seemingly minor and irrelevant events are just that – minor and irrelevant. But some such events, despite their appearance, are not.
Identifying these deceptions is a key outcome of the successful implementation of the butterfly model. The following paragraphs contain illustrations of this concept.
Left Wing Thinking
The FADEC must assert control over the engine’s operation within 300 msec of a power-on event.
This requirement, or a variant of it, appears in every system specification for a FADEC. It is important because it specifies the amount of time available for a cold-start initialization in the software.
The time allotted is explicit. No more than three tenths of a second may elapse before the FADEC asserts itself.
The commencement of that time period is well defined. The nearly vertical rising edge of the FADEC power signal as it moves from zero volts (off) to the operational voltage of the hardware marks the start line.
But what the heck does “assert control” mean?
While analyzing this requirement statement, that question should jump right off the written page at the tester. In one particular instance, the FADEC asserted control by crossing a threshold voltage on a specific analog signal coming out of the box. Unfortunately, that wasn’t in the specification. Instead, I had to ask the senior systems engineer, who had performed similar tests hundreds of times, how to tell when the FADEC asserted itself.
In other words, I couldn’t create a test sketch for the requirement because I couldn’t determine what the end point of the measurement should be. The system specification assumed that the reader held this knowledge, although anyone who was learning the ropes (as I was at that point) had no reasonable chance of knowing. As far as I know, this requirement has never been elaborated.
As a counterpoint example, consider the mass-market application that, according to the verbally preserved requirements, had to be “compelling”. What the heck is “compelling”, and how does one test for it?
In this case, it didn’t matter that the requirement was ill suited for testing. In fact, the testers’ opinions on the subject weren’t even asked for. But the application succeeded, as evidenced by the number of copies purchased. Customers found the product compelling, and therefore the project was a success.
But doesn’t this violate the “must be testable” rule for requirements? Not really. The need to be “compelling” doesn’t constitute a functional requirement, but is instead an aesthetic requirement. Part of the tester’s analysis should weed out such differences, where they exist.
Right Wing Thinking
Returning to our power-up timing example, how can we measure the time between two voltage-based events? There are many possibilities, although most can’t handle the precision necessary for a 300 msec window. Clocks, watches, and even stopwatches would be hideously unreliable for such a measurement.
The test stand workstation also couldn’t be used. That would require synchronization of the command to apply power with the actual application of power. There was a lag in the actual application of power, caused by the software-driven switch that had to be toggled in the test stand’s circuitry. Worse yet, detection of the output voltage required the use of a digital voltmeter, which injected an even larger amount of uncertainty into the measurement.
But a digital oscilloscope attached to a printer would work, provided that the scope was fast enough. The oscilloscope was the measurement device (obviously). The printer was required to “prove” that the test passed. This was, after all, an application subject to FAA certification.
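The oscilloscope measurement described above can be sketched in code, assuming the scope exports both traces as sampled (time, voltage) data. This is only an illustration: the function names, the threshold voltages, and the idea that "assert control" means a rising crossing on a single analog signal are assumptions made for the example, not details taken from any FADEC specification.

```python
from typing import Optional, Sequence

MAX_LATENCY_S = 0.300  # the 300 msec limit from the requirement


def first_rising_crossing(times_s: Sequence[float],
                          volts: Sequence[float],
                          threshold_v: float) -> Optional[float]:
    """Return the time at which the trace first rises through threshold_v.

    The crossing is linearly interpolated between the two samples that
    bracket it. Returns None if the trace never crosses the threshold.
    """
    for i in range(1, len(volts)):
        v0, v1 = volts[i - 1], volts[i]
        if v0 < threshold_v <= v1:
            t0, t1 = times_s[i - 1], times_s[i]
            return t0 + (threshold_v - v0) * (t1 - t0) / (v1 - v0)
    return None


def startup_latency_s(times_s: Sequence[float],
                      power_v: Sequence[float],
                      control_v: Sequence[float],
                      power_on_threshold_v: float,
                      control_threshold_v: float) -> Optional[float]:
    """Time from the power-on edge to the (assumed) control-assertion edge."""
    t_power = first_rising_crossing(times_s, power_v, power_on_threshold_v)
    t_control = first_rising_crossing(times_s, control_v, control_threshold_v)
    if t_power is None or t_control is None:
        return None  # one of the two events was never captured
    return t_control - t_power


def meets_requirement(latency_s: Optional[float]) -> bool:
    """Pass only if both events were seen, in order, within the limit."""
    return latency_s is not None and 0.0 <= latency_s <= MAX_LATENCY_S
```

Note that the sketch is only as good as its end point. The analysis problem from the left wing remains: someone still has to say which signal, and which threshold, constitutes asserting control.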
As a non-certification counterexample, consider the product whose requirements included the following statement:
Remove unneeded code where possible and prudent.
In other words, “Make the dang thing smaller”. The idea behind the requirement was to shrink the size of the executable, although eliminating unnecessary code is usually a good thing in its own right. No amount of pleading was able to change this requirement into a quantifiable statement, either.
So how the heck can we test for this? In this case, the tester might rephrase the requirement in his or her mind to read:
The downloadable installer must be smaller than version X.
This provides a measurable goal, albeit an assumed one. More importantly, it preserves the common thread between the two statements, which is that the product needed to shrink in size.
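Once rephrased this way, the requirement reduces to a comparison that can be automated. The following is a minimal sketch, assuming the installer is a single file on disk and that the size of version X has been recorded somewhere; the function names and the baseline value are hypothetical.

```python
import os


def size_reduction_bytes(installer_path: str, baseline_size_bytes: int) -> int:
    """Bytes saved relative to the baseline; positive means the new build is smaller."""
    return baseline_size_bytes - os.path.getsize(installer_path)


def installer_shrank(installer_path: str, baseline_size_bytes: int) -> bool:
    """True only if the new installer is strictly smaller than the baseline."""
    return size_reduction_bytes(installer_path, baseline_size_bytes) > 0
```

The check is deliberately strict: an installer of exactly the same size as version X does not count as smaller.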
Body Thinking
To be honest, there isn’t all that much thought involved in formally executing thoroughly prepared test cases. The main aspect of formal execution is the collection of “evidence” to prove that the tests were run and that they passed. There is, however, the need to analyze the recorded evidence as it is amassed.
For example, aerospace applications commonly must be unit tested. Each individual function or procedure must be exercised according to certain rules. The generally large number of modules involved in a certification means that the unit testing effort required is big, although each unit test itself tends to be small. Naturally, the project’s management normally tries to get the unit testing underway as soon as possible to ensure completion by the “drop-dead” date for unit test completion implied in the V model.
As the established date nears, the test manager must account for every modified unit. The last modification of the unit must predate the configured test procedures and results for that unit. All of the tests must have been peer reviewed prior to formal execution. And all of the tests must have passed during formal execution.
In other words, “dot the I’s and cross the T’s”. It is largely an exercise in bookkeeping, but that doesn’t diminish its importance.
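Because this bookkeeping is mechanical, it lends itself to a script. The sketch below assumes each unit has a record carrying the relevant dates and the pass/fail result; the record layout and field names are invented for illustration and do not correspond to any particular configuration management tool.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class UnitTestRecord:
    unit_name: str
    unit_last_modified: datetime
    tests_configured: datetime           # test procedures and results placed under control
    peer_reviewed_on: Optional[datetime]  # None if no peer review took place
    executed_on: datetime                 # date of formal execution
    passed: bool


def audit_findings(record: UnitTestRecord) -> List[str]:
    """Return the bookkeeping problems for one unit; an empty list means clean."""
    findings = []
    if record.unit_last_modified >= record.tests_configured:
        findings.append("unit modified after its tests and results were configured")
    if record.peer_reviewed_on is None or record.peer_reviewed_on >= record.executed_on:
        findings.append("tests not peer reviewed prior to formal execution")
    if not record.passed:
        findings.append("formal execution did not pass")
    return findings


def audit(records: List[UnitTestRecord]) -> Dict[str, List[str]]:
    """Map each problem unit to its findings; clean units are omitted."""
    report = {}
    for record in records:
        findings = audit_findings(record)
        if findings:
            report[record.unit_name] = findings
    return report
```

An empty report from `audit` corresponds to every I dotted and every T crossed; anything else is a list of units the test manager still has to chase.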
The Swarm Mentality
To better illustrate the swarm mentality, let’s look at an unmanned rocket project that utilized the myriad butterflies of this model to overwhelm bugs that could have caused catastrophic failure. This rocket was really a new version of an existing rocket that had successfully blasted off many, many times.
First, because the new version was to be created as a change to the older version’s software, a complete and thorough system specification analysis was performed, comparing the system specs for both versions. This analysis found that:
• The old version contained a feature that didn’t apply to the new version. A special extended calculation of the horizontal bias (BH) that allowed for late-countdown (between five and ten seconds before launch) holds to be restarted within a few minutes didn’t apply to the new version of the rocket. BH was known to be meaningless after either version left the launch pad, but was calculated in the older version for up to 40 seconds after liftoff.
• The updated flight profile for the new version had not been included in the updated specification, although this omission had been agreed to by all relevant parties. That meant that discrepancies between the early trajectory profiles between the two versions were not available for examination. The contractors building the rocket didn’t want to change their agreement on this subject, so the missing trajectory profile information was marked as a risk to be targeted with extra-detailed testing.
Because of the fairly serious questions raised in the system requirements analysis, the test engineers decided to really attack the early trajectory operation of the new version. Because this was an aerospace application, they knew that the subsystems had to be qualified for flight prior to integration into the overall system. That meant that the inertial reference system (SRI) that provided the raw data required to calculate BH would work, at least as far as it was intended to.
But how could they test the interaction of the SRI and the calculation of BH? The horizontal bias was also a product of the rocket’s acceleration, so they knew that they would have to at least simulate the accelerometer inputs to the control computer (it is physically impossible to make a vibration table approach the proper values for the rocket’s acceleration).
If they had a sufficiently detailed SRI model, they could also simulate the inertial reference system. Without a detailed simulation, they’d have to use a three-axis dynamic vibration table. Because the cost of using the table for an extended period of time was higher than the cost of creating a detailed simulation, they decided to go with the all simulation approach.
In the meantime, a detailed analysis of the software requirements for both versions revealed a previously unknown conceptual error. Every exception raised in the Ada software automatically shut down the processor – whether the exception was caused by a hardware or software fault!
The thinking behind this problem was that exceptions should only address random hardware failures, where the software couldn’t hope to recover. Clearly, software exceptions were possible, even if they were improbable. So, the exception handling in the software spec was updated to differentiate between hardware and software based exceptions.
Examining the design of the software, the test engineers were amazed to discover that the horizontal bias calculations weren’t protected for Operand Error, which is automatically raised in Ada when a floating point real to integer conversion exceeds the available range of the integer container. BH was involved in just such a conversion!
The justification for omitting this protection was simple, at least for the older version of the rocket. The possible values of BH were physically limited in range so that the conversion couldn’t ever overflow. But the newer version couldn’t claim that fact, so the protection for Operand Error was put into the new version’s design. Despite the fact that this could put the 80% usage goal for the SRI computer at risk, the possibility that the computer could fail was simply too great.
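To make the failure mode concrete, here is a small sketch of a range-checked conversion and of one possible form of protection. It is written in Python rather than Ada, it assumes a 16-bit signed integer container (the width is not stated in this article), and the `OperandError` class is only a stand-in for the Ada exception of the same name. Saturating to the nearest representable value is shown as one option among several, not as the protection the engineers actually chose.

```python
import math

INT16_MIN, INT16_MAX = -32768, 32767  # assumed width of the integer container


class OperandError(ArithmeticError):
    """Stand-in for the Ada exception raised on an out-of-range conversion."""


def to_int16_checked(value: float) -> int:
    """Convert to an integer, raising OperandError if the value does not fit.

    This mimics the unprotected conversion: nothing here handles the error,
    so the caller decides what happens when it is raised. Fractions are
    truncated toward zero, which differs from Ada's rounding.
    """
    if not math.isfinite(value) or not INT16_MIN <= value <= INT16_MAX:
        raise OperandError(f"{value!r} does not fit in a 16-bit signed integer")
    return int(value)


def to_int16_saturating(value: float) -> int:
    """One possible protection: clamp out-of-range values to the nearest limit."""
    if math.isnan(value):
        raise OperandError("NaN has no integer value")
    return int(max(INT16_MIN, min(INT16_MAX, value)))
```

With the old flight profile, every value passed to `to_int16_checked` stays in range and the missing protection is invisible. Only a test that feeds in values from the new profile can make the exception appear.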
Finally, after much gnashing of teeth, the test engineers convinced the powers that be to completely eliminate the prolonged calculation of horizontal bias because it was useless in the new version. The combined risks of the unknown trajectory data, the unprotected conversion to integer, and the money needed to fund the accurate SRI simulation were too much for the system’s developers. They at last agreed that it was better to eliminate the unnecessary processing, even though it worked for the previous version.
As a result, the maiden demonstration flight for the Ariane 5 rocket went off without a hitch.
That’s right – I have been describing the findings of the inquiry board for the Ariane 5 in light of how a full and rigorous implementation of the butterfly model would have detected, mitigated, or eliminated them [LION96].
Ariane 4 contained an extended operation alignment function that allowed for late-countdown holds to be handled without long delays. In fact, the 33rd flight of the Ariane 4 rocket used this feature in 1989.
The Ariane 5 trajectory profile was never added to the system requirements. Instead, the lower values in the Ariane 4 trajectory data were allowed to stand.
The SRI computers (with the deficient software) were therefore never tested to the updated trajectory telemetry.
The Operand Error in the horizontal bias conversion therefore never occurred during testing, so the missing exception handling went unnoticed until the error shut down the SRI computer in flight.
The flawed concept of all exceptions being caused by random hardware faults was therefore never exposed.
SRI 1, the first of the dual redundant components, therefore halted on an Operand Error caused by the conversion of BH in the 39th second after liftoff. SRI 2 immediately took over as the active inertial reference system.
But then SRI 2 failed because of the same Operand Error in the following data cycle (72 msec in duration).
And therefore, Ariane 5 self destructed in the 42nd second of its maiden voyage – all for lack of a swarm of butterflies.
The Butterfly Model within the V Model Context
The butterfly model of test development is not a component of the V software development model. Instead, the butterfly test development model is a superstructure imposed atop the V model that operates semi-independently, in parallel with the development of software.
The main relationship between the V model and the butterfly swarm of testing activity is timing, at least on the design side of the V. Test development is driven by software development, for software is what we are testing. Therefore, the macro and micro iterations of software development define the points at which test development activity is both warranted and required. The individual butterflies must react to the iterative software development activity that spawned them, while the whole of the swarm helps to shape the large and small perturbations in the software design stream.
On the test side of the V, the relationship is largely reversed – the software milestones of the V model are the results of butterfly activity on the design side. The differences between the models give latitude to both the developer and the tester to envision the act of testing within their particular operational context. The developer is free to see testing as the culmination of their development activity. The tester is likewise free to see the formal execution of testing as the end of the line – where all of the analytical and test design effort that shepherded the software design process is transformed into the test artifacts required for progression from development to delivery.
But the butterfly model does not entirely fall within the bounds of the V model, either. The third issue taken with the standardized V model stated that the roots of software testing lay mainly within the boundaries of the software to be tested. But proper performance of test analysis and design requires knowledge outside the realm of the application itself.
Testers in the butterfly model require knowledge of testing techniques, tools, methodologies, and technologies. Books and articles about test theory are hugely important to the successful implementation of the butterfly model. Similarly, software testing conferences and proceedings are valuable resources.
Testers in this test development model also need to keep abreast of technological advancements related to the application being developed. Trade journals and periodicals are valuable sources for such information.
In the end, the tester is required to not only know the application being tested, but also to understand (at some level) software testing, valid testing techniques, software testing tools and technologies, and even a little about human nature.
Next Steps
The butterfly model of test development is far from complete. The model as described herein is a first step toward a complete and usable model. Some of the remaining steps to finish it include:
• Creating a taxonomy of test butterflies that describes each type of testing activity within the context of the software development activity it accompanies.
• Correlating the butterfly taxonomy with a valid taxonomy of software bugs (to understand what the butterflies eat).
• Formally defining and elaborating the “objectives” associated with various testing activities.
• Creating a taxonomy of “artifacts” to better define the parameters of the model’s execution.
• Expanding visualization of the model to cover the spiral development model.
• Defining the framework necessary to achieve full implementation of the model.
• Identifying methods of automating significant portions of the model’s implementation.
Summary
The butterfly model for software test development is a semi-dependent model that represents the bifurcated role of software testing with respect to software development. The underlying realization that software development and test development are parallel processes that are separate but complementary is embodied by the butterfly model’s superposition atop the V development model.
Correlating the V model and butterfly model requires understanding that the standard V model is a high-level view of software development that hides the myriad micro-iterations all along the design and test legs of the V. These micro-iterations are the core of successful software development. They represent the incorporation of new knowledge, new requirements, and lessons learned – primarily during the design phase of software development, although the formation of test artifacts also includes some micro-iterative activity.
Tiny test butterflies occupy the termini of these micro-iterations, as well as some of their geneses. Larger, more comprehensive butterflies occupy phase segment transition points, where the nature of work is altered to reach toward the next goal of the software’s development.
The parts of the butterfly represent the three legs of successful software testing – test analysis, test design, and formal test execution. Of the three, formal execution is the smallest, although it is the only piece explicitly represented in the V model. Test analysis and test design, ignored in the V model, are recognized in the butterfly model as shaping forces for software development, as well as being the foundation for test execution.
Finally, the butterfly model is in its infancy, and there is significant work to do before it can be fully described. However, the visualization of a swarm of testing butterflies darkening the sky while they steer software away from error injection is satisfying – at last we have a physical phenomenon that represents the ephemeral act of software testing.


The Butterfly Model for Test Development (Part-1)

Summary:-
There is a dichotomy between the development and testing of software. This schism is illustrated by the plethora of development models employed for planning and estimating the development of software as opposed to the scarcity of valid test development models. At first glance, the same models which serve to underlie the software development process with forethought and diligence appear to be adequate for the more complex task of planning, developing, and executing adequate verification of the application.
Unfortunately, software development models were not intended to encapsulate the vagaries of software verification and validation, the two main goals of software testing. Indeed, software development models can be antithetical to the effective testing of software. It lies in the hands of software testing professionals, therefore, to define an effective model for software test development that complements and completes any given software development model.
One such test development model is the Butterfly Model, which I will explore in some detail in this paper. It should be understood that the butterfly model is neither separate from nor integrated with the development model, but instead is a monitoring and modifying factor in the completion of the development model. While this may seem arbitrary and self-contradictory, it is my hope that the elaboration of the butterfly model presented herein will both explain and justify this statement.
In this paper I will present a modified view of the ubiquitous “V” software development model. On top of this modified model I will superpose the butterfly model of test development. Finally, I will reconcile the relationship between the models, clarifying the effects of each on the other and identifying the information portals germane to both, together or separately.

Theme:-
The Standard V Software Development Model
Nearly everyone familiar with modern software development knows of the standard V development model, depicted below.
In this standardized image of the V development model, both the design and test phases of development are represented as linear processes that are gated according to the specific products of specific activities. On the design side, system requirements beget software requirements, which then beget a software design, which in turn begets an implementation.
On the test side of development, the software design begets unit tests. Similarly, software requirements beget integration tests (with a little help from the system requirements). Finally, system requirements beget system tests. Acceptance testing, being the domain of the end user of the application, is deliberately omitted from this view of the V model.
It should be understood that the V model is simply a more expressive rearrangement of the waterfall model, with the waterfall’s time-line component mercifully eliminated and abstraction of the system indicated by the vertical distance from the implementation. The V model is correct as far as it goes, in that it expresses most of the lineage required for the artifacts of successful software development. From an application development point of view, this depiction of the model is sufficient to convey the source associations of the major development cycle artifacts, including test artifacts.
Unfortunately, the application development viewpoint falls well short of the software test development vantage required to create and maintain effective test artifacts.
Rigor of Model Enforcement
Before launching into a discussion of the shortfalls of the V software development model, a side excursion to examine the appropriate level of rigor in enforcing the model is warranted. It needs to be recognized from the start that not all projects will elect to implement the V model in the same manner. Deciding how rigidly the model must be followed is largely a product of understanding the operational arena of the application.
For example, any certification requirements attached to the application will dictate the rigor of the model’s implementation. Safety-critical software in the commercial aerospace arena undergoes an in-depth certification review prior to being released for industry use. Applications in this arena therefore tailor their implementation of the V model toward fulfillment of the objectives listed for each segment of the process in RTCA/DO-178B, the Federal Aviation Administration’s (FAA’s) selected guidelines for certification.
Similarly, medical devices containing software that affects safety must be developed using a version of the model that fulfills the certification requirements imposed by the Food and Drug Administration (FDA). As automotive embedded controller software continues to delve into applications that directly affect occupant safety (such as actuator based steering), it can be expected that some level of certification requirement will be instituted for that arena, as well.
Other arenas do not require anything approaching this level of rigor in their process. If the application cannot directly cause injury or the loss of life, or trigger the financial demise of a company, then it can most likely follow a streamlined version of the V model.
Web applications generally fall into this category, as do many e-commerce and home-computing applications. In fact, more applications fall into the second category than the first. That doesn’t exempt them from the need to follow the model, however. It simply modifies the parameters of their implementation of the model.
Where the V Model Leaves Off
The main issue with the V development model is not its depiction of ancestral relationships between test artifacts and their design artifact progenitors. Instead, there are three facets of the V model that are incomplete and must be accounted for. Just as in software development, we must define the problem before we can attempt to solve it.
First, the V model is inherently a linear expression of a nonlinear process. The very existence of the spiral model of software development should be sufficient evidence of the nonlinearity of software development, but this point deserves further examination.
Software design artifacts, just like the software program they serve, are created and maintained by people. People make mistakes. The existence of software testers bears witness to this, as does the amount of buggy software that still seems to permeate the marketplace, despite the best efforts of software development and testing professionals. When mistakes are found in an artifact, the error must be corrected. The act of correction, in a small way, is another iteration of the original development of the artifact.
The second deficient aspect of the V model is its implication of unidirectional flow from design artifacts into test artifacts. Any seasoned software developer understands that feedback within the development cycle is an absolute necessity. The arrows depicting the derivation of tests from the design artifacts should in reality be two headed, although the left-pointing arrowhead would be significantly smaller than the right-pointing head.
While test artifacts are generally derived from their corresponding design artifacts, the fact that a test artifact must be so derived needs to be factored in when creating the design artifact in the first place. Functional requirements must be testable – they must be stated in such a manner as to be conducive to analysis, measurement, or demonstration. Vague statement of the requirements is a clear indicator of trouble down the road. Likewise, software designs need to be complete and unambiguous. The implementation methodology called for in the software design must be clear enough to drive the definition of appropriate test cases for the verification and validation of that design.
If the implementation itself is to be part of the test ancestry, then it, too, must be concise and complete, with adequate commentary on the techniques employed in its construction but without ambiguity or self-contradiction.
It should be noted that this discussion of the second deficient aspect of the V model is predicated on a rigorous enforcement of the model’s dictates, such as is required for most aerospace applications. For less rigorous instances of the model, the absolutes listed above may not apply. This issue will be discussed further later in this paper.
The third deficient aspect of the V software development model is its encapsulation of test artifact ancestry solely within the domain of the design artifacts. As stated above, test artifacts are generally derived from their corresponding design artifacts. However, there are a multitude of other sources that must be touched upon to ensure success in generating a “complete” battery of tests for the software being developed.
A Closer View
The first issue mentioned with regard to the V development model is its essential linearization of a nonlinear process – software development. This problem is one of perception, really, or perhaps perspective. The root cause can be found in the fact that the V software development model is a simplified visualization tool that illustrates a complex and interrelated process. A more detailed view of a segment of the design leg (which segment is immaterial) is shown below.
In this expanded view of the design leg of the V, the micro-iterative feedback depicted by the small black arrows within the overall gray feed-forward thrust is visible. Each micro-iteration represents the accumulation of further data, application of a lesson learned, or even the bright idea someone dreamed up while singing in the shower. The point to be made here is this: The general forward-leaning nature of the legs of the V tends to disguise the frenzied iterations in thought, specification, and development required to create a useful application.
There are critical points along the software development stream that must be accounted for in any valid test development model. For example, any time there is a handoff of an artifact (or part of an artifact), the transacted artifact must be analyzed with respect to its contents and any flow-down effects caused by those contents [MARI99]. In the expanded view of the V development model shown above, the left edge of the broad arrow represents the genesis of a change in the artifact under development. This edge, where new or modified information is being introduced, is the starting point for all new micro-iterations. The right edge of the broad arrow is the terminus for each micro-iteration, where the new or modified information is fully incorporated in the artifact.
It should be further understood that micro-iterations can be independent of each other. In fact, most significant software development incorporates a maelstrom of independent micro-iterations that ebb and flow both concurrently and continuously throughout the overall development cycle.
The spiral model of software development, which many consider to be superior to the V model, is founded on an explicit understanding of the iterative nature of software creation. Unfortunately, the spiral model tends to be expressed on a macro scale, hiding the developmental perturbations needed for the production of useful design and test artifacts.
The Butterfly Model
Now that we have rediscovered the hidden micro-iterations in a successful process based on the V model, we need to understand the source of these perturbations. Further, we need to understand the fundamental interconnectedness of it all, to borrow an existential phrase.
Butterflies are composed of three pieces – two wings and a body. Each part represents a piece of software testing, as described hereafter.
Test Analysis
The left wing of the butterfly represents test analysis – the investigation, quantization, and/or re-expression of a facet of the software to be tested. Analysis is both the byproduct and foundation of successful test design. In its earliest form, analysis represents the thorough pre-examination of design and test artifacts to ensure the existence of adequate testability, including checking for ambiguities, inconsistencies, and omissions.
Test analysis must be distinguished from software design analysis. Software design analysis is constituted by efforts to define the problem to be solved, break it down into manageable and cohesive chunks, create software that fulfills the needs of each chunk, and finally integrate the various software components into an overall program that solves the original problem. Test analysis, on the other hand, is concerned with validating the outputs of each software development stage or micro-iteration, as well as verifying compliance of those outputs to the (separately validated) products of previous stages.
Test analysis mechanisms vary according to the design artifact being examined. For an aerospace software requirement specification, the test engineer would do all of the following, as a minimum:
• Verify that each requirement is tagged in a manner that allows correlation of the tests for that requirement to the requirement itself. (Establish Test Traceability)
• Verify traceability of the software requirements to system requirements.
• Inspect for contradictory requirements.
• Inspect for ambiguous requirements.
• Inspect for missing requirements.
• Check to make sure that each requirement, as well as the specification as a whole, is understandable.
• Identify one or more measurement, demonstration, or analysis methods that may be used to verify the requirement’s implementation (during formal testing).
• Create a test “sketch” that includes the tentative approach and indicates the test’s objectives.
Out of the items listed above, only the last two are specifically aimed at the act of creating test cases. The other items are almost mechanical in nature, where the test design engineer is simply checking the software engineer’s work. But all of the items are germane to test analysis, where any error can manifest itself as a bug in the implemented application.
Test analysis also serves a valid and valuable purpose within the context of software development. By digesting and restating the contents of a design artifact (whether it be requirements or design), test analysis offers a second look – from another viewpoint – at the developer’s work. This is particularly true with regard to lower-level design artifacts like detailed design and source code.
This kind of feedback has a counterpart in human conversation. To verify one’s understanding of another person’s statements, it is useful to rephrase the statement in question using the phrase “So, what you’re saying is…”. This powerful method of confirming comprehension and eliminating miscommunication is just as important for software development – it helps to weed out misconceptions on the part of both the developer and tester, and in the process identifies potential problems in the software itself.
It should be clear from the above discussion that the tester’s analysis is both formal and informal. Formal analysis becomes the basis for documentary artifacts of the test side of the V. Informal analysis is used for immediate feedback to the designer in order to both verify that the artifact captures the intent of the designer and give the tester a starting point for understanding the software to be tested.
In the bulleted list shown above, the first two analyses are formal in nature (for an aerospace application). The verification of system requirement tags is a necessary step in the creation of a test traceability matrix. The software to system requirements traceability matrix similarly depends on the second analysis.
The three inspection analyses listed are more informal, aimed at ensuring that the specification being examined is of sufficient quality to drive the development of a quality implementation. The difference is in how the analytical outputs are used, not in the level of effort or attention that goes into the analysis.
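The formal traceability checks described above lend themselves to a small sketch. The one below is only an illustration under stated assumptions: the tag format ("SRS-001"), the function name traceability_gaps, and the sample data are all hypothetical and do not come from any particular aerospace process.

```python
def traceability_gaps(requirements, tests):
    """Return (untested requirement tags, tests citing unknown tags).

    requirements: iterable of requirement tags (hypothetical format, e.g. "SRS-001").
    tests: dict mapping a test name to the list of requirement tags it verifies.
    """
    known = set(requirements)
    covered = {tag for tags in tests.values() for tag in tags}
    # Requirements that no test traces back to.
    untested = sorted(known - covered)
    # Tests that cite at least one tag missing from the specification.
    orphaned = sorted(name for name, tags in tests.items()
                      if any(tag not in known for tag in tags))
    return untested, orphaned


requirements = ["SRS-001", "SRS-002", "SRS-003"]
tests = {
    "test_startup": ["SRS-001"],
    "test_shutdown": ["SRS-002", "SRS-009"],  # SRS-009 is not in the specification
}
untested, orphaned = traceability_gaps(requirements, tests)
print(untested)  # ['SRS-003']
print(orphaned)  # ['test_shutdown']
```

An empty result for both lists would suggest that every requirement has at least one test and every test points at a real requirement, which is the minimum a test traceability matrix is meant to show.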
Test Design
Thus far, the tester has produced a lot of analytical output, some semi-formalized documentary artifacts, and several tentative approaches to testing the software. At this point, the tester is ready for the next step: test design.
The right wing of the butterfly represents the act of designing and implementing the test cases needed to verify the design artifact as replicated in the implementation. Like test analysis, it is a relatively large piece of work. Unlike test analysis, however, the focus of test design is not to assimilate information created by others, but rather to implement procedures, techniques, and data sets that achieve the test’s objective(s).
The outputs of the test analysis phase are the foundation for test design. Each requirement or design construct has had at least one technique (a measurement, demonstration, or analysis) identified during test analysis that will validate or verify that requirement. The tester must now put on his or her development hat and implement the intended technique.
Software test design, as a discipline, is an exercise in the prevention, detection, and elimination of bugs in software. Preventing bugs is the primary goal of software testing [BEIZ90]. Diligent and competent test design prevents bugs from ever reaching the implementation stage. Test design, with its attendant test analysis foundation, is therefore the premier weapon in the arsenal of developers and testers for limiting the cost associated with finding and fixing bugs.
Before moving further ahead, it is necessary to comment on the continued analytical work performed during test design. As previously noted, tentative approaches are mapped out in the test analysis phase. During the test design phase of test development, those tentatively selected techniques and approaches must be evaluated more fully, until it is “proven” that the test’s objectives are met by the selected technique. If all tentatively selected approaches fail to satisfy the test’s objectives, then the tester must put his or her test analysis hat back on and start looking for more alternatives.
Test Execution
In the butterfly model of software test development, test execution is a separate piece of the overall approach. In fact, it is the smallest piece – the slender insect’s body – but it also provides the muscle that makes the wings work. It is important to note, however, that test execution (as defined for this model) includes only the formal running of the designed tests. Informal test execution is a normal part of test design, and in fact is also a normal part of software design and development.
Formal test execution marks the moment in the software development process where the developer and the tester join forces. In a way, formal execution is the moment when the developer gets to take credit for the tester’s work – by demonstrating that the software works as advertised. The tester, on the other hand, should already have proactively identified bugs (in both the software and the tests) and helped to eliminate them – well before the commencement of formal test execution!
Formal test execution should (almost) never reveal bugs. I hope this plain statement raises some eyebrows – although it is very much true. The only reasonable cause of unexpected failure in a formal test execution is hardware failure. The software, along with the test itself, should have been through the wringer enough to be bone-dry.
Note, however, that unexpected failure is singled out in the above paragraph. That implies that some software tests will have expected failures, doesn’t it? Yes, it surely does!
The reasons behind expected failure vary, but allow me to relate a case in point:
In the commercial jet engine control business, systems engineers prepare a wide variety of tests against the system (being the FADEC – or Full Authority Digital Engine Control) requirements. One such commonly employed test is the “flight envelope” test. The flight envelope test essentially begins with the simulated engine either off or at idle with the real controller (both hardware and software) commanding the situation. Then the engine is spooled up and taken for a simulated ride throughout its defined operational domain – varying altitude, speed, thrust, temperature, etc. in accordance with real world recorded profiles. The expected results for this test are produced by running a simulation (created and maintained independently from the application software itself) with the same input data sets.
Minor failures in the formal execution of this test are fairly common. Some are hard failures – repeatable on every single run of the test. Others are soft – only intermittently reaching out to bite the tester. Each and every failure is investigated, naturally – and the vast majority of flight envelope failures are caused by test stand problems. These can include issues like a voltage source being one twentieth of a volt low, or slight timing mismatches caused by the less exact timekeeping of the test stand workstation as compared to the FADEC itself.
Some flight envelope failures are attributed to the model used to provide expected results. In such cases, hours and days of gut-wrenching analytical work go into identifying the minuscule difference between the model and the actual software.
A handful of flight envelope test failures are caused by the test parameters themselves. Tolerances may be set at unrealistically tight levels, for example. Or slight operating mode mismatches between the air speed and engine fan speed may cause a fault to be intermittently annunciated.
In very few cases have I seen the software being tested lie at the root of the failure. (I did witness the bugs being fixed, by the way!)
The point is this – complex and complicated tests can fail due to a variety of reasons, from hardware failure, through test stand problems, to application error. Intermittent failures may even jump into the formal run, just to make life interesting.
But the test engineer understands the complexity of the test being run, and anticipates potential issues that may cause failures. In fact, the test is expected to fail once in a while. If it doesn’t, then it isn’t doing its job – which is to exercise the control software throughout its valid operational envelope. As in all applications, the FADEC’s boundaries of valid operation are dark corners in which bugs (or at least potential bugs) congregate.
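The tolerance issue mentioned above can be made concrete with a minimal sketch. It assumes the test compares recorded values against the independent model's expected values sample by sample; the function name compare_to_model, the voltage figures, and the tolerance values are hypothetical and chosen only to mirror the "one twentieth of a volt low" example.

```python
def compare_to_model(actual, expected, tolerance):
    """Return the indices of samples whose deviation exceeds the tolerance."""
    return [i for i, (a, e) in enumerate(zip(actual, expected))
            if abs(a - e) > tolerance]


# Hypothetical samples: a supply reading that runs roughly 0.05 V low.
expected_volts = [28.00, 28.00, 28.00]
actual_volts = [27.95, 27.96, 27.94]

# An unrealistically tight tolerance flags every sample...
print(compare_to_model(actual_volts, expected_volts, tolerance=0.01))  # [0, 1, 2]
# ...while a looser one lets the same data pass.
print(compare_to_model(actual_volts, expected_volts, tolerance=0.10))  # []
```

The same data passes or fails depending only on the tolerance, which is why a failure of this kind points at the test parameters or the test stand rather than at the software under test.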
It was mentioned during our initial discussion of the V development model that the model is sufficient, from a software development point of view, to express the lineage of test artifacts. This is because testing, again from the development viewpoint, is composed of only the body of the butterfly – formal test execution. We testers, having learned the hard way, know better.

To be continued in Part-2

Understanding Metrics in Software Testing

Summary:-
Metrics are the means by which the software quality can be measured; they give you confidence in the product. You may consider these product management indicators, which can be either quantitative or qualitative. They are typically the providers of the visibility you need.
Theme:-
Metrics
Metrics usually fall into a few categories: project management (which includes process efficiency) and process improvement. People are often confused about what metrics they should be using. You may use different metrics for different purposes. For example, you may have a set of metrics that you use to evaluate the output of your test team. One such metric may be the project management measure of the number of bugs found. Others may be an efficiency measure of the number of test cases written, or the number of tests executed in a given period of time.
Ultimately, when you consider the value of a metric, you need to ask if it provides visibility into the software product's quality. Metrics are only useful if they help you to make sound business decisions in a timely manner. If the relevancy or integrity of a metric cannot be justified, don't use it. Consider, for example, how management analysis and control makes use of financial reports such as profit/loss, cash flow, ratios, job costing, etc. These reports help you navigate your business in a timely manner. Engineering metrics are analogous, providing data to help perform analyses and control the development process. However, your engineers may not be the right people to give you the metrics you need to help in making business decisions, because they are not trained financial analysts. As an executive, you need to determine what metrics you want and tell your staff to provide them.
For example, coverage metrics are essential for your team. Coverage is the measure of some amount of testing. You could have requirements coverage metrics, platform coverage metrics, path coverage metrics, scenario coverage metrics, or even test plan coverage metrics, to name a few. Cem Kaner lists over 100 types of coverage measures in his paper "Negligence and Testing Coverage." Before the project starts, it is important to come to agreement on how you will measure test coverage. Obviously, the more coverage of a certain type, the less risk associated with that type.
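As a rough illustration of one such measure, requirements coverage can be computed as the share of requirements touched by at least one test. This is only a sketch: the function name coverage and the requirement identifiers are made up for the example, and a real project would agree on its own definition before the project starts.

```python
def coverage(covered_items, all_items):
    """Fraction of items in all_items that also appear in covered_items."""
    all_items = set(all_items)
    if not all_items:
        raise ValueError("no items to cover")
    return len(all_items & set(covered_items)) / len(all_items)


# Hypothetical requirement identifiers.
requirements = {"REQ-1", "REQ-2", "REQ-3", "REQ-4"}
tested = {"REQ-1", "REQ-3", "REQ-4"}
print(f"{coverage(tested, requirements):.0%}")  # 75%
```

The same function works for platforms, scenarios, or test plan items, since each is just a set of things to cover.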
The goal is to choose metrics that will help you understand the state of your product. Wisely choose a handful of these metrics specific to your type of project and use them to give you visibility into how close the product is to release. The test group needs to be providing you plenty of useful information with these metrics.
Conclusion
The metrics provided by testing offer a major benefit to executives: visibility into the maturity and readiness for release or production, and visibility into the quality of the software product under development. This enables effective management of the software development process, by allowing clear measurement of the quality and completeness of the product.

End of document

Software testing metrics for a medium-sized project

Summary:- This article details the metrics one should collect for a typical medium-sized software testing project and how long these metrics should be collected during the project schedule.
Theme:-
IMHO, project size doesn't change your need to know what you're doing, which is what metrics are for. And I can't think of a point in a project when it's no longer necessary to know what's going on. Failing to know key measures, including the consequences after the project supposedly is done, is a major way in which small projects turn into big projects.
Basically, you always need measures of two things: (1) results you are getting, and (2) the causes of those results.
Results
Typically, the primary measure of results is whether the project is on time and in budget, which usually says more about the effectiveness of setting budgets and schedules than about the project itself. Poorly set budgets and schedules are the biggest reasons for overruns. Other results measures include size and quality of what has been produced.
Size may be measured in terms of KLOC (K for thousand, LOC for lines of code), function points, modules, objects, methods, or similar units which reliably describe physical size of software produced. Some people measure project size in number of requirements or pages of design. Other types of sizing measures include capacity, such as number of users or sites served, and database and transaction volumes. Project results involving hardware are also often sized with respect to numbers and capacities or capabilities of hardware components. A highway project ordinarily would be sized with respect to the length of the road involved. Although somewhat circular, many projects are sized by the budget and/or schedule.
Quality of results is typically measured in terms of defects, ordinarily as defect density, which is the number of defects relative to the physical size of the product, system or software. However, the way many folks measure defects can create as many issues as it addresses.
For instance, it's especially common for defect measures to include only coding errors, which reflect poorly on the developer and thereby create incentives for developers to pay more attention to avoiding accountability than actually doing a good job. Arguing about whether something is a defect is a pretty nonproductive use of everyone's time. "Coded as designed" and "user error" argument distractions can be prevented by making sure that defects also can be categorized as requirements, design, instructions and operational defects.
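A minimal sketch of both ideas follows: defect density as defects per KLOC, and a tally of defects by the category they were traced to. The function name defect_density, the category labels, and the numbers are illustrative assumptions, not data from a real project.

```python
from collections import Counter


def defect_density(defect_count, lines_of_code):
    """Defects per thousand lines of code (KLOC)."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return defect_count / (lines_of_code / 1000)


# Hypothetical defect log: each entry is the category the defect was traced to.
defects = ["coding", "requirements", "design", "coding",
           "requirements", "operational"]
by_category = Counter(defects)

print(defect_density(len(defects), 12_000))  # 0.5
print(by_category["requirements"])           # 2
```

Counting by category keeps requirements and design defects visible next to coding errors, so the measure describes the product rather than one group of people.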
Results value
In addition to these physical size and quality measures of results, it's essential to quantify results in terms of value, which is what stumps many people. Probably the simplest method used is the percentage of defined requirements that have been implemented.
Percentages alone don't tell the full story because all requirements are not created equal with respect to size or value, and there can be wide variations in how well a requirement has been satisfied and how adequately the requirements have been defined. That's why it's essential to use effective methods to discover the REAL business requirements -- deliverable whats that provide value when delivered (or met or satisfied).
Ultimately, value should be measured in money. Monetary benefits come from four sources. Cost savings mean eliminating or reducing existing expenditures (unfortunately the most common method is eliminating jobs). Cost avoidance means not having to incur an otherwise additional future expense. Revenue enhancement occurs when one sells more, charges more for what they sell, and/or collects more of what they charge. Revenue protection involves retaining existing sales, which includes compliance with laws and regulations necessary to stay operational.
Actually, value is a net figure, which also must take into account the investment cost of achieving the benefit return. Thus, value most often is measured as return on investment (ROI). Conventional ROI determinations are frequently unreliable because they tend to fall prey to 10 common but seldom recognized pitfalls. (See www.proveit.net for information about determining right, reliable and responsible "REAL ROI.")
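For reference, the conventional ROI arithmetic can be sketched as net benefit divided by investment cost. This is the textbook calculation only, not the "REAL ROI" method referred to above, and the benefit figures below are invented for the example.

```python
def roi(total_benefit, investment_cost):
    """Conventional return on investment: net benefit divided by cost."""
    if investment_cost <= 0:
        raise ValueError("investment_cost must be positive")
    return (total_benefit - investment_cost) / investment_cost


# Hypothetical annual benefits, one entry per source named in the text.
benefits = {
    "cost_savings": 40_000,
    "cost_avoidance": 15_000,
    "revenue_enhancement": 30_000,
    "revenue_protection": 5_000,
}
print(roi(sum(benefits.values()), 60_000))  # 0.5
```

A result of 0.5 means the benefits exceed the investment by 50 percent; a negative result means the investment was not recovered.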
Causes
In order to sustain and improve results, it's necessary to identify and measure the causes of those results. Basic causal measures are resource costs/effort and time duration of the project work. Size and complexity of the project, of course, are the biggest determinants of effort and duration; they also are major sources of risk, which is another causal factor to consider.
Usually it's helpful to measure causes and results with respect to life cycle stages, such as requirements, design, development, unit testing, integration testing, system testing, acceptance testing and production. Distinguishing new code from modified code can be helpful for understanding causes of results.
Similarly, causes of results can be identified with respect to factors such as development methodology, use of particular types of tools and techniques, platform and language, and staff skills and experience.
By measuring results associated with these various types of causal factors, it's usually possible to tell what's going well and what needs improvement. Moreover, these more granular measures give a quicker indication of how well improvements are working.

End of document

Measuring Defect Removal Accurately

Summary:
This article provides you details on test metrics at product, process and project level

PRODUCT
Test metric Definition Purpose How to calculate
Number of remarks
Definition: The total number of remarks found in a given time period, phase, or test type. A remark is a claim made by a test engineer that the application shows an undesired behavior. It may or may not result in software modification or changes to documentation.
Purpose: One of the earliest indicators to measure once testing commences; provides initial indications about the stability of the software.
How to calculate: Total number of remarks found.

Number of defects
Definition: The total number of remarks found in a given time period, phase, or test type that resulted in software or documentation modifications.
Purpose: A more meaningful way of assessing the stability and reliability of the software than the number of remarks. Duplicate remarks have been eliminated and rejected remarks have been excluded.
How to calculate: Only remarks that resulted in modifying the software or the documentation are counted.

Remark status
Definition: The status of a remark, which may vary depending upon the defect-tracking tool that is used. Broadly, the following statuses are available:
• To be solved: Logged by the test engineer and waiting to be taken over by the software engineer.
• To be retested: Solved by the developer, and waiting to be retested by the test engineer.
• Closed: The issue was retested by the test engineer and was approved.
Purpose: Track the progress with respect to entering, solving and retesting the remarks. During this phase, the information is useful to know the number of remarks logged, solved, waiting to be resolved and retested.
How to calculate: This information can normally be obtained directly from the defect tracking system based on the remark status.

Defect severity
Definition: The severity level of a defect indicates the potential business impact for the end user (business impact = effect on the end user x frequency of occurrence).
Purpose: Provides indications about the quality of the product under test. High-severity defects mean low product quality, and vice versa. At the end of this phase, this information is useful to make the release decision based on the number of defects and their severity levels.
How to calculate: Every defect has a severity level attached to it. Broadly, these are Critical, Serious, Medium and Low.

Defect severity index
Definition: An index representing the average of the severity of the defects.
Purpose: Provides a direct measurement of the quality of the product, specifically reliability, fault tolerance and stability.
How to calculate: Two measures are required to compute the defect severity index. A number is assigned against each severity level: 4 (Critical), 3 (Serious), 2 (Medium), 1 (Low). Multiply the number of defects at each severity level by the number for that level and add the totals; divide this by the total number of defects to determine the defect severity index.

Time to find a defect
Definition: The effort required to find a defect.
Purpose: Shows how fast the defects are being found. This metric indicates the correlation between the test effort and the number of defects found.
How to calculate: Divide the cumulative hours spent on test execution and logging defects by the number of defects entered during the same period.

Time to solve a defect
Definition: Effort required to resolve a defect (diagnosis and correction).
Purpose: Provides an indication of the maintainability of the product and can be used to estimate projected maintenance costs.
How to calculate: Divide the number of hours spent on diagnosis and correction by the number of defects resolved during the same period.

Test coverage
Definition: The extent to which testing covers the product’s complete functionality.
Purpose: This metric is an indication of the completeness of the testing. It does not indicate anything about the effectiveness of the testing. This can be used as a criterion to stop testing.
How to calculate: Coverage could be with respect to requirements, functional topic list, business flows, use cases, etc. It can be calculated based on the number of items that were covered vs. the total number of items.

Test case effectiveness
Definition: The extent to which test cases are able to find defects.
Purpose: This metric provides an indication of the effectiveness of the test cases and the stability of the software.
How to calculate: Ratio of the number of test cases that resulted in logging remarks vs. the total number of test cases.

Defects/KLOC
Definition: The number of defects per 1,000 lines of code.
Purpose: This metric indicates the quality of the product under test. It can be used as a basis for estimating defects to be addressed in the next phase or the next version.
How to calculate: Ratio of the number of defects found vs. the total number of lines of code (in thousands).
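Several of the product metrics above reduce to simple arithmetic. The sketch below shows three of them under stated assumptions: the weights follow the 4/3/2/1 scheme given for the defect severity index, while the function names and all sample counts are hypothetical.

```python
SEVERITY_WEIGHTS = {"Critical": 4, "Serious": 3, "Medium": 2, "Low": 1}


def defect_severity_index(counts):
    """Weighted average severity; counts maps severity level to defect count."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no defects to average")
    weighted = sum(SEVERITY_WEIGHTS[level] * n for level, n in counts.items())
    return weighted / total


def time_to_find_defect(test_hours, defects_found):
    """Hours of test execution and logging per defect entered."""
    return test_hours / defects_found


def case_effectiveness(cases_with_remarks, total_cases):
    """Share of test cases that resulted in at least one logged remark."""
    return cases_with_remarks / total_cases


# Hypothetical counts for one test phase.
counts = {"Critical": 1, "Serious": 2, "Medium": 4, "Low": 3}
print(defect_severity_index(counts))  # 2.1
print(time_to_find_defect(80, 10))    # 8.0
print(case_effectiveness(30, 120))    # 0.25
```

An index of 2.1 on this scale sits just above Medium, which is easier to track from phase to phase than four separate counts.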
PROJECT
Workload capacity ratio
Definition: Ratio of the planned workload and the gross capacity for the total test project or phase.
Purpose: This metric helps in detecting issues related to estimation and planning. It serves as an input for estimating similar projects as well.
How to calculate: Computation of this metric often happens at the beginning of the phase or project. Workload is determined by multiplying the number of tasks by their norm times. Gross capacity is the planned working time. The ratio is the workload divided by the gross capacity.

Test planning performance
Definition: The planned value related to the actual value.
Purpose: Shows how well estimation was done.
How to calculate: The ratio of the actual effort spent to the planned effort.

Test effort percentage
Definition: Test effort is the amount of work spent, in hours or days or weeks. Overall project effort is divided among multiple phases of the project: requirements, design, coding, testing and such.
Purpose: The effort spent in testing, in relation to the effort spent in the development activities, will give us an indication of the level of investment in testing. This information can also be used to estimate similar projects in the future.
How to calculate: This metric can be computed by dividing the overall test effort by the total project effort.

Defect category
Definition: An attribute of the defect in relation to the quality attributes of the product. Quality attributes of a product include functionality, usability, documentation, performance, installation and internationalization.
Purpose: This metric can provide insight into the different quality attributes of the product.
How to calculate: This metric can be computed by dividing the defects that belong to a particular category by the total number of defects.
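The project metrics can be sketched in the same way. The example below is a minimal illustration: the function names, the task list, and the effort figures are hypothetical, and the category labels are taken from the quality attributes listed above.

```python
from collections import Counter


def workload_capacity_ratio(tasks, gross_capacity_hours):
    """tasks: list of (number_of_tasks, norm_hours_per_task) pairs."""
    workload = sum(count * norm for count, norm in tasks)
    return workload / gross_capacity_hours


def effort_percentage(test_effort, total_project_effort):
    """Test effort as a percentage of the total project effort."""
    return 100 * test_effort / total_project_effort


def category_shares(categories):
    """Share of defects per category, given one category label per defect."""
    total = len(categories)
    return {name: n / total for name, n in Counter(categories).items()}


# Hypothetical plan: 20 tasks at 4 hours and 10 tasks at 2 hours, 125 hours available.
print(workload_capacity_ratio([(20, 4), (10, 2)], 125))  # 0.8
print(effort_percentage(300, 1200))                      # 25.0
print(category_shares(["functionality", "usability",
                       "functionality", "performance"]))
# {'functionality': 0.5, 'usability': 0.25, 'performance': 0.25}
```

A workload capacity ratio above 1.0 would mean more work was planned than the team has hours for, which is the kind of estimation problem this metric is meant to surface early.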

PROCESS
Should be found in which phase An attribute of the defect, indicating in which phase the remark should have been found. Are we able to find the right defects in the right phase as described in the test strategy? Indicates the percentage of defects that are getting migrated into subsequent test phases. Computation of this metric is done by calculating the number of defects that should have been found in previous test phases.
Residual defect density An estimate of the number of defects that may remain unresolved in the product phase. The goal is to achieve a defect level that is acceptable to the clients. We remove defects in each of the test phases so that few will remain. Estimating what remains is tricky. Released products provide a basis for estimation. For new versions, industry standards, coupled with project specifics, form the basis for estimation.
Defect remark ratio Ratio of the number of remarks that resulted in software modification vs. the total number of remarks. Provides an indication of the level of understanding between the test engineers and the software engineers about the product, as well as an indirect indication of test effectiveness. The number of remarks that resulted in software modification vs. the total number of logged remarks. Valid for each test type, during and at the end of test phases.
Valid remark ratio Percentage of valid remarks during a certain period. Valid remarks = number of defects + duplicate remarks + number of remarks that will be resolved in the next phase or release. Indicates the efficiency of the test process. Ratio of the total number of remarks that are valid to the total number of remarks found.
Bad fix ratio Percentage of the number of resolved remarks that resulted in creating new defects while resolving existing ones. Indicates the effectiveness of the defect-resolution process, plus indirect indications as to the maintainability of the software. Ratio of the total number of bad fixes to the total number of resolved defects. This can be calculated per test type, test phase or time period.
Defect removal efficiency The number of defects that are removed per time unit (hours/days/weeks). Indicates the efficiency of defect removal methods, as well as indirect measurement of the quality of the product. Computed by dividing the effort required for defect detection, defect resolution time and retesting time by the number of remarks. This is calculated per test type, during and across test phases.
Phase yield Defined as the number of defects found during the phase of the development life cycle vs. the estimated number of defects at the start of the phase. Shows the effectiveness of the defect removal. Provides a direct measurement of product quality; can be used to determine the estimated number of defects for the next phase. Ratio of the number of defects found to the total number of estimated defects. This can be used during a phase and also at the end of the phase.
Backlog development The number of remarks that are yet to be resolved by the development team. Indicates how well the software engineers are coping with the testing efforts. The number of remarks that remain to be resolved.
Backlog testing The number of resolved remarks that are yet to be retested by the test team. Indicates how well the test engineers are coping with the development efforts. The number of remarks that have been resolved but not yet retested.
Scope changes The number of changes that were made to the test scope. Indicates requirements stability or volatility, as well as process stability. Ratio of the number of changed items in the test scope to the total number of items.
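Several of the process metrics above are ratios over the same remark counts. As a minimal sketch, assuming the counts are exported from a defect tracker into a structure like the hypothetical `RemarkCounts` below, they could be computed as follows. The field names and sample numbers are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class RemarkCounts:
    total: int          # all remarks logged in the period
    defects: int        # remarks confirmed as defects
    led_to_change: int  # remarks that resulted in a software modification
    duplicates: int     # remarks that duplicate an earlier remark
    deferred: int       # remarks to be resolved in the next phase or release
    resolved: int       # remarks resolved in the period
    bad_fixes: int      # resolved remarks whose fix introduced a new defect


def defect_remark_ratio(c):
    """Remarks that led to a software modification vs. all remarks."""
    return c.led_to_change / c.total


def valid_remark_ratio(c):
    """Valid remarks = defects + duplicates + deferred remarks."""
    return (c.defects + c.duplicates + c.deferred) / c.total


def bad_fix_ratio(c):
    """Bad fixes vs. all resolved remarks."""
    return c.bad_fixes / c.resolved


def phase_yield(defects_found, defects_estimated):
    """Defects found in a phase vs. the estimate made at its start."""
    return defects_found / defects_estimated


# Hypothetical counts for one test phase
counts = RemarkCounts(total=200, defects=120, led_to_change=110,
                      duplicates=20, deferred=10, resolved=100, bad_fixes=5)
print(f"defect remark ratio: {defect_remark_ratio(counts):.2f}")
print(f"valid remark ratio:  {valid_remark_ratio(counts):.2f}")
print(f"bad fix ratio:       {bad_fix_ratio(counts):.2f}")
print(f"phase yield:         {phase_yield(40, 50):.2f}")
```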

End of document

Subject: Software testing metrics for a medium-sized project

Author: Robin F Goldsmith
Summary:- This article details the metrics one should collect for a typical medium-sized software testing project and how long they should be collected during the project schedule.
Theme:-
IMHO, project size doesn't change your need to know what you're doing, which is what metrics are for. And I can't think of a point in a project when it's no longer necessary to know what's going on. Failing to know key measures, including the consequences after the project supposedly is done, is a major way in which small projects turn into big projects.
Basically, you always need measures of two things: (1) results you are getting, and (2) the causes of those results.
Results
Typically, the primary measure of results is whether the project is on time and in budget, which usually says more about the effectiveness of setting budgets and schedules than about the project itself. Poorly set budgets and schedules are the biggest reasons for overruns. Other results measures include size and quality of what has been produced.
Size may be measured in terms of KLOC (K for thousand, LOC for lines of code), function points, modules, objects, methods, or similar units which reliably describe physical size of software produced. Some people measure project size in number of requirements or pages of design. Other types of sizing measures include capacity, such as number of users or sites served, and database and transaction volumes. Project results involving hardware are also often sized with respect to numbers and capacities or capabilities of hardware components. A highway project ordinarily would be sized with respect to the length of the road involved. Although somewhat circular, many projects are sized by the budget and/or schedule.
Quality of results is typically measured in terms of defects, ordinarily as defect density, which is the number of defects relative to the physical size of the product, system or software. However, the way many folks measure defects can create as many issues as it addresses.
For instance, it's especially common for defect measures to include only coding errors, which reflect poorly on the developer and thereby create incentives for developers to pay more attention to avoiding accountability than actually doing a good job. Arguing about whether something is a defect is a pretty nonproductive use of everyone's time. "Coded as designed" and "user error" argument distractions can be prevented by making sure that defects also can be categorized as requirements, design, instructions and operational defects.
Results value
In addition to these physical size and quality measures of results, it's essential to quantify results in terms of value, which is what stumps many people. Probably the simplest method used is the percentage of defined requirements that have been implemented.
Percentages alone don't tell the full story because all requirements are not created equal with respect to size or value, and there can be wide variations in how well a requirement has been satisfied and how adequately the requirements have been defined. That's why it's essential to use effective methods to discover the REAL business requirements -- deliverable whats that provide value when delivered (or met or satisfied).
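To illustrate why a raw percentage can mislead, the sketch below compares it with a value-weighted percentage. The weighting scheme, the `Requirement` type and the sample values are assumptions for illustration; the article does not prescribe a particular formula.

```python
from dataclasses import dataclass


@dataclass
class Requirement:
    name: str
    value: float      # relative business value (hypothetical weighting)
    implemented: bool


def percent_implemented(requirements):
    """Plain count-based percentage of implemented requirements."""
    done = sum(1 for r in requirements if r.implemented)
    return 100 * done / len(requirements)


def percent_value_delivered(requirements):
    """Percentage of total value carried by the implemented requirements."""
    total_value = sum(r.value for r in requirements)
    delivered = sum(r.value for r in requirements if r.implemented)
    return 100 * delivered / total_value


# Hypothetical requirements: three small ones done, the big one still open
reqs = [
    Requirement("export report", 1, True),
    Requirement("change password", 1, True),
    Requirement("print receipt", 2, True),
    Requirement("process payment", 16, False),
]
print(percent_implemented(reqs))      # 75.0
print(percent_value_delivered(reqs))  # 20.0
```

In this made-up case three of four requirements are done, yet only a fifth of the value has been delivered, because the one outstanding requirement carries most of it.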
Ultimately, value should be measured in money. Monetary benefits come from four sources. Cost savings mean eliminating or reducing existing expenditures (unfortunately the most common method is eliminating jobs). Cost avoidance means not having to incur an otherwise additional future expense. Revenue enhancement occurs when one sells more, charges more for what they sell, and/or collects more of what they charge. Revenue protection involves retaining existing sales, which includes compliance with laws and regulations necessary to stay operational.
Actually, value is a net figure, which also must take into account the investment cost of achieving the benefit return. Thus, value most often is measured as return on investment (ROI). Conventional ROI determinations are frequently unreliable because they tend to fall prey to 10 common but seldom recognized pitfalls. (See www.proveit.net for information about determining right, reliable and responsible "REAL ROI.")
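As a rough sketch, the conventional calculation nets the four benefit sources against the investment cost. This is the textbook ROI formula, not the "REAL ROI" method referenced above, and every figure below is hypothetical.

```python
def net_value(benefits, investment_cost):
    """Total monetary benefit minus the cost of achieving it."""
    return sum(benefits.values()) - investment_cost


def return_on_investment(benefits, investment_cost):
    """Net value expressed as a fraction of the investment."""
    return net_value(benefits, investment_cost) / investment_cost


# Hypothetical annual figures, one entry per benefit source
benefits = {
    "cost_savings": 50_000,
    "cost_avoidance": 20_000,
    "revenue_enhancement": 30_000,
    "revenue_protection": 10_000,
}
investment = 80_000

print(net_value(benefits, investment))                      # 30000
print(f"{return_on_investment(benefits, investment):.1%}")  # 37.5%
```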
Causes
In order to sustain and improve results, it's necessary to identify and measure the causes of those results. Basic causal measures are resource costs/effort and time duration of the project work. Size and complexity of the project, of course, are the biggest determinants of effort and duration; they also are major sources of risk, which is another causal factor to consider.
Usually it's helpful to measure causes and results with respect to life cycle stages, such as requirements, design, development, unit testing, integration testing, system testing, acceptance testing and production. Distinguishing new code from modified code can be helpful for understanding causes of results.
Similarly, causes of results can be identified with respect to factors such as development methodology, use of particular types of tools and techniques, platform and language, and staff skills and experience.
By measuring results associated with these various types of causal factors, it's usually possible to tell what's going well and what needs improvement. Moreover, these more granular measures give a quicker indication of how well improvements are working.

End of document

Software testing Metrics - Test Case Review Effectiveness

Summary:- This article provides a laundry list of metrics for test case review
Theme:-
Metrics are the means by which the software quality can be measured; they give you confidence in the product. You may consider these product management indicators, which can be either quantitative or qualitative. They are typically the providers of the visibility you need.
The goal is to choose metrics that will help you understand the state of your product.

Metrics for Test Case Review Effectiveness:

1. Major Defects Per Test Case Review
2. Minor Defects Per Test Case Review
3. Total Defects Per Test Case Review
4. Ratio of Major to Minor Defects Per Test Case Review
5. Total Defects Per Test Case Review Hour
6. Major Defects Per Test Case Review Hour
7. Ratio of Major to Minor Defects Per Test Case Review Hour
8. Number of Open Defects Per Test Case Review
9. Number of Closed Defects Per Test Case Review
10. Ratio of Closed to Open Defects Per Test Case Review
11. Number of Major Open Defects Per Test Case Review
12. Number of Major Closed Defects Per Test Case Review
13. Ratio of Major Closed to Open Defects Per Test Case Review
14. Number of Minor Open Defects Per Test Case Review
15. Number of Minor Closed Defects Per Test Case Review
16. Ratio of Minor Closed to Open Defects Per Test Case Review
17. Percent of Total Defects Captured Per Test Case Review
18. Percent of Major Defects Captured Per Test Case Review
19. Percent of Minor Defects Captured Per Test Case Review
20. Ratio of Percent Major to Minor Defects Captured Per Test Case Review
21. Percent of Total Defects Captured Per Test Case Review Hour
22. Percent of Major Defects Captured Per Test Case Review Hour
23. Percent of Minor Defects Captured Per Test Case Review Hour
24. Ratio of Percent Major to Minor Defects Captured Per Test Case Review Hour
25. Percent of Total Defect Residual Per Test Case Review
26. Percent of Major Defect Residual Per Test Case Review
27. Percent of Minor Defect Residual Per Test Case Review
28. Ratio of Percent Major to Minor Defect Residual Per Test Case Review
29. Percent of Total Defect Residual Per Test Case Review Hour
30. Percent of Major Defect Residual Per Test Case Review Hour
31. Percent of Minor Defect Residual Per Test Case Review Hour
32. Ratio of Percent Major to Minor Defect Residual Per Test Case Review Hour
33. Number of Planned Test Case Reviews
34. Number of Held Test Case Reviews
35. Ratio of Planned to Held Test Case Reviews
36. Number of Reviewed Test Cases
37. Number of Unreviewed Test Cases
38. Ratio of Reviewed to Unreviewed Test Cases
39. Number of Compliant Test Case Reviews
40. Number of Non-Compliant Test Case Reviews
41. Ratio of Compliant to Non-Compliant Test Case Reviews
42. Compliance of Test Case Reviews
43. Non-Compliance of Test Case Reviews
44. Ratio of Compliance to Non-Compliance of Test Case Reviews
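Most entries in the list above are counts, ratios or rates over the same review data. The sketch below shows one possible reading of metrics 3, 4, 5 and 35, aggregated over a set of reviews. The `ReviewRecord` type and the sample data are assumptions, and a team may just as reasonably compute these per individual review.

```python
from dataclasses import dataclass


@dataclass
class ReviewRecord:
    major_defects: int
    minor_defects: int
    hours: float       # time spent in the review


def total_defects_per_review(reviews):
    """Metric 3: average number of defects found per review."""
    total = sum(r.major_defects + r.minor_defects for r in reviews)
    return total / len(reviews)


def major_to_minor_ratio(reviews):
    """Metric 4: major defects vs. minor defects (None if no minor defects)."""
    major = sum(r.major_defects for r in reviews)
    minor = sum(r.minor_defects for r in reviews)
    return major / minor if minor else None


def total_defects_per_review_hour(reviews):
    """Metric 5: defects found per hour of review time."""
    total = sum(r.major_defects + r.minor_defects for r in reviews)
    hours = sum(r.hours for r in reviews)
    return total / hours


def planned_to_held_ratio(planned, held):
    """Metric 35: planned reviews vs. reviews actually held."""
    return planned / held


# Hypothetical data for three held reviews out of six planned
reviews = [
    ReviewRecord(major_defects=2, minor_defects=6, hours=1.5),
    ReviewRecord(major_defects=1, minor_defects=3, hours=1.0),
    ReviewRecord(major_defects=0, minor_defects=3, hours=1.5),
]
print(total_defects_per_review(reviews))       # 5.0
print(major_to_minor_ratio(reviews))           # 0.25
print(total_defects_per_review_hour(reviews))  # 3.75
print(planned_to_held_ratio(6, len(reviews)))  # 2.0
```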
End of document

Software Testing Metrics - Metrics Used by Software Testers

Summary:
This article provides details of the various types of metrics generally used by software testers
Theme:-
A software metric is a measure of some property of a piece of software or its specifications.

Since quantitative methods have proved so powerful in the other sciences, computer science practitioners and theoreticians have worked hard to bring similar approaches to software development. Tom DeMarco stated, “You can’t control what you can’t measure.”

Product quality measures can be captured in various ways. Here are some examples:
1. Customer satisfaction index

This index is surveyed before and after product delivery (and on an ongoing, periodic basis, using standard questionnaires). The following are analyzed:

- Number of system enhancement requests per year
- Number of maintenance fix requests per year
- User friendliness: call volume to customer service hotline
- User friendliness: training time per new user
- Number of product recalls or fix releases (software vendors)
- Number of production re-runs (in-house information systems groups)

2. Delivered defect quantities

They are normalized per function point (or per LOC) at product delivery (first 3 months or first year of operation) or on an ongoing basis (per year of operation), by level of severity and by category or cause, e.g. requirements defect, design defect, code defect, documentation/on-line help defect, defect introduced by fixes, etc.

3. Responsiveness (turnaround time) to users

- Turnaround time for defect fixes, by level of severity
- Time for minor vs. major enhancements; actual vs. planned elapsed time

4. Product volatility

- Ratio of maintenance fixes (to repair the system & bring it into compliance with specifications), vs. enhancement requests (requests by users to enhance or change functionality)

5. Defect ratios

- Defects found after product delivery per function point.
- Defects found after product delivery per LOC
- Ratio of pre-delivery defects to annual post-delivery defects
- Defects per function point of the system modifications

6. Defect removal efficiency

- Number of post-release defects (found by clients in field operation), categorized by level of severity
- Ratio of defects found internally prior to release (via inspections and testing), as a percentage of all defects
- All defects include defects found internally plus externally (by customers) in the first year after product delivery

7. Complexity of delivered product

- McCabe's cyclomatic complexity counts across the system
- Halstead’s measure
- Card's design complexity measures
- Predicted defects and maintenance costs, based on complexity measures

8. Test coverage

- Breadth of functional coverage
- Percentage of paths, branches or conditions that were actually tested
- Percentage by criticality level: perceived level of risk of paths
- The ratio of the number of detected faults to the number of predicted faults.

9. Cost of defects

- Business losses per defect that occurs during operation
- Business interruption costs; costs of work-arounds
- Lost sales and lost goodwill
- Litigation costs resulting from defects
- Annual maintenance cost (per function point)
- Annual operating cost (per function point)
- Measurable damage to your boss's career

10. Costs of quality activities

- Costs of reviews, inspections and preventive measures
- Costs of test planning and preparation
- Costs of test execution, defect tracking, version and change control
- Costs of diagnostics, debugging and fixing
- Costs of tools and tool support
- Costs of test case library maintenance
- Costs of testing & QA education associated with the product
- Costs of monitoring and oversight by the QA organization (if separate from the development and test organizations)

11. Re-work

- Re-work effort (hours, as a percentage of the original coding hours)
- Re-worked LOC (source lines of code, as a percentage of the total delivered LOC)
- Re-worked software components (as a percentage of the total delivered components)

12. Reliability

- Availability (percentage of time a system is available, versus the time the system is needed to be available)
- Mean time between failures (MTBF)
- Mean time to repair (MTTR)
- Reliability ratio (MTBF / MTTR)
- Number of product recalls or fix releases
- Number of production re-runs as a ratio of production runs
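The reliability measures in item 12 can be derived from a simple outage log. The sketch below assumes the common definitions of MTBF (operating time divided by number of failures) and MTTR (total repair time divided by number of repairs); the function names and the sample month are hypothetical.

```python
def availability_percent(available_hours, required_hours):
    """Time the system was available vs. time it was needed."""
    return 100 * available_hours / required_hours


def mean_time_between_failures(operating_hours, failure_count):
    """Assumed definition: operating time divided by number of failures."""
    return operating_hours / failure_count


def mean_time_to_repair(total_repair_hours, repair_count):
    """Assumed definition: total repair time divided by number of repairs."""
    return total_repair_hours / repair_count


def reliability_ratio(mtbf, mttr):
    """MTBF / MTTR, as listed above."""
    return mtbf / mttr


# Hypothetical month: 720 hours required, 3 failures, 6 hours of repair in total
required = 720
repair_hours = 6
failures = 3
available = required - repair_hours

mtbf = mean_time_between_failures(available, failures)
mttr = mean_time_to_repair(repair_hours, failures)
print(f"availability: {availability_percent(available, required):.2f}%")  # 99.17%
print(f"MTBF: {mtbf} h, MTTR: {mttr} h")                                   # 238.0 h, 2.0 h
print(f"reliability ratio: {reliability_ratio(mtbf, mttr)}")               # 119.0
```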

Metrics for Evaluating Application System Testing:

Metric = Formula

Test Coverage = Number of units (KLOC/FP) tested / total size of the system. (LOC represents Lines of Code)

Number of tests per unit size = Number of test cases per KLOC/FP (LOC represents Lines of Code).

Acceptance criteria tested = Acceptance criteria tested / total acceptance criteria

Defects per size = Defects detected / system size

Test cost (in %) = Cost of testing / total cost *100

Cost to locate defect = Cost of testing / the number of defects located

Achieving Budget = Actual cost of testing / Budgeted cost of testing

Defects detected in testing = Defects detected in testing / total system defects

Defects detected in production = Defects detected in production/system size

Quality of Testing = No of defects found during Testing/(No of defects found during testing + No of acceptance defects found after delivery) *100

Effectiveness of testing to business = Loss due to problems / total resources processed by the system.

System complaints = Number of third party complaints / number of transactions processed

Scale of Ten = Assessment of testing by giving rating in scale of 1 to 10

Source Code Analysis = Number of source code statements changed / total number of tests.

Effort Productivity:

Test Planning Productivity = No of Test cases designed / Actual Effort for Design and Documentation

Test Execution Productivity = No of Test cycles executed / Actual Effort for testing
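A few of the formulas above, written out as a minimal Python sketch. The function names and the sample figures are assumptions for illustration; the formulas themselves are taken from the list.

```python
def cost_of_testing_percent(cost_of_testing, total_cost):
    """Test cost (in %) = cost of testing / total cost * 100."""
    return 100 * cost_of_testing / total_cost


def cost_to_locate_defect(cost_of_testing, defects_located):
    """Cost of testing / number of defects located."""
    return cost_of_testing / defects_located


def achieving_budget(actual_cost, budgeted_cost):
    """Actual cost of testing / budgeted cost of testing."""
    return actual_cost / budgeted_cost


def quality_of_testing(found_in_testing, found_after_delivery):
    """Defects found in testing as a percentage of all defects found."""
    return 100 * found_in_testing / (found_in_testing + found_after_delivery)


# Hypothetical project figures
print(cost_of_testing_percent(12_000, 60_000))  # 20.0
print(cost_to_locate_defect(12_000, 150))       # 80.0
print(achieving_budget(12_000, 10_000))         # 1.2
print(quality_of_testing(90, 10))               # 90.0
```

An Achieving Budget value above 1 means testing cost more than was budgeted.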
End of document
