Monday, March 16, 2009
How to Automate Testing of Graphical User Interfaces
Tilo Linz, Matthias Daigl imbus GmbH, D-91096 Möhrendorf, Germany
This lecture discusses strengths and weaknesses of commercially available Capture-and-Replay GUI testing tools (CR-Tools) and presents a pragmatic and economic approach for testing Graphical User Interfaces using such tools. The results presented were developed within the ESSI Process Improvement Experiment (PIE) 24306 [EU1], [EU2] in 1997/98 at imbus GmbH, Germany [im1].
Today's software systems usually feature graphical user interfaces (GUIs). Because of the varied possibilities for user interaction and the number of control elements (buttons, pull-down menus, toolbars, etc.) available with GUIs, testing them is extremely time-consuming and costly. Manual GUI testing is labor-intensive, frequently monotonous, and not well liked by software engineers or software testers. A promising remedy is offered by automation, and several tools for computer-based GUI testing are already commercially available.
All CR-Tools currently commercially available [im2] are similar in function and operation:
Capture mode: During a test session, the Capture-and-Replay Tool (CR-Tool) captures all manual user interactions with the test object. All CR-Tools are object-oriented, i.e. they recognize each GUI element the user selects (such as a button, radio box, or toolbar) and capture all of its object characteristics (name, color, label, value), not just the X/Y coordinates of the mouse click.
Programming: The captured test steps are stored by the CR-Tool in a test script written in a language similar to C or Basic. Since the test script offers the full range of programming-language constructs (case distinctions, loops, subroutines), even complex test processes can be implemented.
Checkpoints: To determine whether the program being tested (SuT: Software under Test) is functioning correctly or contains errors, the tester can insert additional checkpoints into the test script, either during capture or while editing the script. In this way, layout-relevant characteristics of window objects (color, position, size, etc.) can be verified along with functional characteristics of the SuT (a mask value, the contents of a message box, etc.).
Replay mode: Once captured, tests can be replayed and are thus, in principle, repeatable at any time. The aforementioned object-oriented test capture permits GUI elements to be re-recognized when the test is repeated, even if the GUI has meanwhile been modified by a change in the software. If the test object behaves differently during a repeat test or if checkpoints are violated, the test fails and the CR-Tool records an error in the test object.
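Real CR-Tool scripts use the vendor's own C- or Basic-like language and drive the SuT through the tool's driver; the following Python sketch only illustrates the concepts just described (object-oriented recognition, checkpoints, replay). The widget model and all names here are hypothetical.

```python
# Illustrative sketch only: a tiny in-memory GUI stands in for the SuT
# as seen through a CR-Tool driver. All classes/names are invented.

class Widget:
    def __init__(self, name, label, enabled=True, value=""):
        self.name, self.label = name, label
        self.enabled, self.value = enabled, value

class FakeGui:
    """Stands in for the SuT's GUI, exposing objects by name."""
    def __init__(self):
        self.widgets = {
            "btnSave": Widget("btnSave", "Save"),
            "txtName": Widget("txtName", "Name"),
        }
    def find(self, name):
        # Object-oriented recognition: locate by name, not by coordinates.
        return self.widgets[name]
    def click(self, name):
        if not self.find(name).enabled:
            raise RuntimeError(f"{name} is disabled")
    def set_value(self, name, value):
        self.find(name).value = value

def checkpoint(actual, expected, what):
    """A checkpoint: the test fails (raises) if the SuT deviates."""
    if actual != expected:
        raise AssertionError(f"checkpoint failed on {what}: "
                             f"{actual!r} != {expected!r}")

def replay(gui):
    # A loop over test data -- something a raw capture cannot express.
    for name in ("Alice", "Bob"):
        gui.set_value("txtName", name)
        gui.click("btnSave")
        checkpoint(gui.find("txtName").value, name, "txtName value")
    checkpoint(gui.find("btnSave").label, "Save", "btnSave label")
```

A replayed run that satisfies every checkpoint passes silently; a deviation in any captured characteristic aborts the run with an error, which is exactly how CR-Tools report a failed test.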
Tool vendors promote CR-Tools as a fast and easy way to automate GUI testing. In reality, however, there are many traps and pitfalls that can impair the effectiveness of CR-Tool-based testing. To understand where and why, we have to take a closer look at GUIs and GUI testing.
Testing of the GUI
In a GUI-based software system, all functions are represented in the GUI by visible "interaction points" (IPs), and the activation of intended functions is achieved by pointing directly at their visible representations [Rau].
As a GUI tester, you first must be able to recognize all these IPs (no problem for a human tester). Then you have to perform a set of test cases to test the functionality of the software underlying the GUI (does the system really store the entered data on the database server?). These tests are specific to each product being tested, and this functional testing is an essential component of GUI testing. You also always have to verify that the visible representations of functions are consistent and make sense to the end user (why is the save button not labeled "save"?). These "style guide" tests are usually specific to a product line, to all the software products of a specific company, or to software running under a specific windowing system. Of course, in practice there is no strict borderline between these two types of tests (when I change a data field's value and close the mask, why doesn't the system automatically store the altered data?). Now, where are the traps?
Traps when Accessing GUI-Objects
As mentioned above, CR-Tools are supposed to recognize IPs, i.e. GUI objects, by object-oriented means and to re-recognize them even after the GUI layout has changed. However, you must make sure this is actually true for your tool. A test tool that recognizes GUI objects created by C++ development systems might not recognize objects within Oracle applications or, for example, OCX controls. In that case you would have to look for an additional interface kit offered by your test-tool vendor.
CR-Tools recognize GUI elements by their ID, which individualizes the specific element within its context, or via a combination of attribute values that uniquely identifies the element; they are therefore sensitive to changes of these values. However, software development tools (e.g. Microsoft's Visual C++) sometimes change the IDs of GUI objects without even notifying the developers (the idea is to relieve the developer of responsibility for assigning individual IDs to GUI elements). In this case, whenever a product is re-tested for a new release, the CR-Tool must be manually re-taught all changed GUI objects. A similar problem occurs when GUI objects are eliminated or redesigned due to functional changes in the new product release.
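A widely used defense against shifting IDs (not described in this paper, but common in later test tools) is an "object map": scripts refer to stable logical names, and recognition falls back to a combination of attributes when the recorded ID no longer matches. The sketch below assumes a hypothetical attribute representation.

```python
# Sketch of an object map: test scripts use logical names, so only the
# map must be repaired when a development tool reassigns IDs.
# All IDs, classes, and labels here are invented for illustration.

OBJECT_MAP = {
    "save_button": {"id": 1021, "class": "Button", "label": "Save"},
    "name_field":  {"id": 1007, "class": "Edit",   "label": "Name"},
}

def locate(gui_objects, logical_name):
    """Resolve a logical name to a concrete GUI object, surviving an
    ID reshuffle by falling back to class + label matching."""
    wanted = OBJECT_MAP[logical_name]
    # First try the recorded ID ...
    for obj in gui_objects:
        if obj["id"] == wanted["id"]:
            return obj
    # ... then fall back to attributes that survive ID changes.
    for obj in gui_objects:
        if obj["class"] == wanted["class"] and obj["label"] == wanted["label"]:
            return obj
    raise LookupError(f"cannot re-recognize {logical_name}")
```

If neither the ID nor the attribute combination matches (e.g. the object was redesigned), recognition fails and the tool must indeed be re-taught, just as the text describes.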
Traps when Testing Functionality
If the CR-Tool captures all your mouse clicks, it's likely that no real test is recorded, such as: press the 'save' button and wait until 'saved' appears in a message box. If you really want to test whether the data has been saved on the server, you have to look for the specific data on the server itself (a checkpoint a GUI-testing tool normally does not perform), or you must reload the data record and check that it is the same as the one stored before. Capturing the sequence enter data, store it, reload it, check that input and output values are the same is therefore a better approach, but there are even more potential traps:
Can you be sure that the reread data was not already stored on the server by a tester colleague? (You have to be sure of this; otherwise your test case makes no sense at all.) If you are sure, because during testing the system did not pop up an 'overwrite data yes/no' message, then you have a new problem: your test case is not reusable and must be rewritten into something like: enter data, store it, if the 'overwrite data yes/no' message appears press 'yes', reload it, check that input and output values are the same.
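The reusable variant of the test case just described can be sketched as follows. The `FakeApp` class and its methods are hypothetical stand-ins for a CR-Tool driver talking to the SuT; the point is the conditional handling of the dialog that may or may not appear.

```python
class FakeApp:
    """Minimal stand-in for the SuT: stores records keyed by 'key' and
    asks before overwriting an existing record. Invented for illustration."""
    def __init__(self):
        self.db, self.pending, self.prompt = {}, None, None
    def enter(self, record):
        self.pending = dict(record)
    def press(self, button):
        if button == "save":
            if self.pending["key"] in self.db:
                self.prompt = "overwrite data yes/no"  # ask first
            else:
                self.db[self.pending["key"]] = self.pending
        elif button == "yes" and self.prompt:
            self.db[self.pending["key"]] = self.pending
            self.prompt = None
    def message_box_present(self, text):
        return self.prompt == text
    def reload(self, key):
        return self.db[key]

def save_and_verify(app, record):
    """Enter data, store it, answer a possible overwrite prompt,
    reload, and compare input against output."""
    app.enter(record)
    app.press("save")
    # The trap from the text: this box may or may not appear,
    # depending on the SuT's history. Handle both branches.
    if app.message_box_present("overwrite data yes/no"):
        app.press("yes")
    if app.reload(record["key"]) != record:
        raise AssertionError("stored and reloaded data differ")
```

The same script now passes whether or not the record already exists, which is exactly what makes it reusable across test runs.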
Test cases depend on the SuT's history and current state; this problem is well known in test automation. In GUI test automation, however, it occurs with nearly every test case:
buttons are enabled/disabled depending on field-values,
data fields change color to gray and become read-only,
toolbars and menu entries change during operation,
the bitmap you stored as a reference bitmap contains system clock output,
sometimes a word processor runs in parallel during testing, eating up Windows resources, and sometimes not,
an email message popping up captures the focus and stops your test-run,
the application iconifies,
and other surprises.
Traps of Style Guide Testing
In order to give software a consistent look and feel, its designers try to establish appropriate layout rules in a so-called Style Guide. The list below gives some examples:
'OK' and 'Cancel' are always located at the bottom or on the right, but never at the top or on the left.
All fields must be reachable via the tab key in a consistent order. If fields are grouped together, the tab order must traverse all fields within the group before leaving it.
All fields must be perfectly aligned.
Texts must be legible and displayed with sufficient contrast.
A context-sensitive help function must be accessible for each button. These on-line help instructions must be useful and understandable.
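Rules like these can only be automated once they are made machine-checkable. A minimal sketch (assuming a hypothetical widget representation with pixel coordinates) of how two of the rules above might be formalized:

```python
# Each rule takes the widgets of one mask (dialog) and returns the
# violations it finds; running all rules over all masks is then cheap.
# The widget dictionaries are an assumed representation, not a real API.

def check_ok_cancel_position(widgets):
    """'OK' and 'Cancel' must sit on the bottom row or in the
    rightmost column of the mask."""
    violations = []
    max_y = max(w["y"] for w in widgets)   # bottom row
    max_x = max(w["x"] for w in widgets)   # rightmost column
    for w in widgets:
        if w["label"] in ("OK", "Cancel"):
            if w["y"] != max_y and w["x"] != max_x:
                violations.append(f"{w['label']} is neither bottom nor right")
    return violations

def check_field_alignment(widgets):
    """All input fields must share a common left edge."""
    fields = [w for w in widgets if w["class"] == "Edit"]
    edges = {w["x"] for w in fields}
    return [] if len(edges) <= 1 else [f"fields use {len(edges)} left edges"]
```

Note how much interpretation even these two rules required ("bottom or right" had to become concrete coordinates), which is precisely the formalization problem discussed below.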
The Style Guide must be followed not only for one mask, but for the complete application. An automated style guide test would therefore be extremely beneficial. Testing costs would be lowered considerably, because it makes little difference whether the CR-Tool tests one mask or all of them. Furthermore, these checks could run simultaneously ("free of charge") with the necessary functional tests. Considering the multitude of masks, the probability is very high that a product contains undiscovered style guide violations, so automated testing would certainly improve the quality of tested products. The value of using a CR-Tool here should be self-evident!
Nevertheless, typical style guide criteria are difficult to formulate and quantify, as the examples above show. Test automation without formalization of the Style Guide is doomed to failure from the beginning.
Making GUI Test-Automation Work
As discussed above, the capture mode of CR-Tools can only help to produce an initial prototype implementation of a test case. Most captured scripts will need additional script programming to obtain useful checkpoints and will require maintenance to make and keep them reusable. GUI test automation is therefore a programming task. On the other hand, it is a software specification task, too: the test cases must be defined before capturing, and their specification must be much more detailed than for manual testing. Any organization planning to automate testing must already have an established testing process in place; otherwise test development is doomed to failure. The diagram below shows the imbus Testing Model, which specifies a structured process from test planning to test specification, test implementation and finally (automated) testing.
Fig. 1: The testing process
Each of these steps is primarily defined by templates for the documents that are generated as the result of the process step. Examples of such templates are illustrated in the following sections.
GUI Test Specification
As described above, GUI tests can be divided into product-specific functional tests and universally valid style guide tests. The functional tests can be subdivided into categories such as performance tests, which must be implemented differently for each tested product but can have a similar definition in the test specification. If this is taken into consideration when organizing the test specification, a reusable specification template can be obtained. Figure 2 shows a good basis for a GUI test specification:
A template like this lists the test cases to be performed in each GUI test. But what is inside a test case? Each test case definition should answer the following questions and therefore consist of the following parts:
GUI Test Implementation
Once you have defined a method of specifying tests using templates like those shown above, you are one step closer to automated GUI testing. However, additional rules for test-script implementation are necessary in order to guarantee proper documentation and adequate modularization of the scripts.
As a minimum requirement, each test script should consist of the following six sections:
If all tests are programmed in this manner, then test modules can be obtained which are relatively easy to combine into test sequences. This not only supports the reusability of tests when testing subsequent product releases, but also the reusability of the test programs by other testers, in different test conditions, and/or for testing other products.
Building a Test Case Library
Implementing test scripts according to the rules described above requires relatively high expenditures for test programming, as well as corresponding know-how in CR-Tool programming. In order to "conserve" this knowledge and make it available to less qualified test-tool users, we started to develop a Test Case Library that complements the GUI Test Specification Template at the implementation level.
This library contains prototypes for test scripts (to ensure script implementation according to the rules described above), extensions of the test-script language, and, in particular, ready-to-run implementations of style guide test cases from our GUI test specification, running under Windows 95 and Windows NT. As this library grows, we expect eventually to cover the full set of style guide rules. Style guide conformance of a software product can then be checked by running the library's test scripts, and in the future a style guide written in plain text will no longer be needed. Additional work is still necessary here, however; in particular, tests checking the ergonomic aspects of a software's GUI still have to be programmed. The following are examples of style guide checks already implemented:
In the long term, such a library will only be used if it is constantly maintained and updated: new or modified tests in software projects must be periodically checked for reusability. All reusable tests must be polished and documented, and company-wide access to new versions of the test case library must be guaranteed. Therefore, in parallel with the implementation of the test case library, an appropriate process for library maintenance should be established. The following figure illustrates the steps needed:
Fig. 6: Test Library Maintenance Steps
Measurements of Expenditures
To determine how much more economical automated GUI testing really is compared to manual testing, we measured and compared the expenditures for both methods during our PIE [EU2].
The baseline project we chose for these measurements was the development of an integrated PC software tool for radio base stations (for GSM radio communication). This application provides a graphical interface for commissioning, parameterization, hardware diagnostics, firmware downloading, equipment database creation, and in-field and off-line diagnostics of multiple types of base stations, including a full-graphics editor for equipment definition. The application was developed using Microsoft Visual C++/Visual Studio and comprises approximately 100,000 lines of C++ code.
The table below (Figure 7) shows the measurement results:
V_m := expenditure for test specification. V_a := expenditure for test specification + implementation.
D_m := expenditure for a single, manual test execution.
D_a := expenditure for interpreting the results after automated testing. The execution time itself is not counted, since the CR-Tool runs the tests without supervision. V and D are given in hours of work.
E_n := A_a/A_m = (V_a + n*D_a) / (V_m + n*D_m).
Figure 7: "Break-even" of GUI Test Automation
The question of main interest is: how often must a specific test be repeated before automated testing becomes cheaper than manual testing? In the table above, this "break-even" point is represented by the factor N, defined by the equation E_N = A_a/A_m = 100%. The measurements taken within our experiments show that break-even can already be reached by the 2nd regression-test cycle (N_total = 2.03). This break-even, however, has two prerequisites: the tests must run completely without human interaction (e.g. overnight test runs), and no further test-script modifications may be necessary to rerun the tests in later releases of the product. As already mentioned in this lecture, this is not easy to achieve.
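Setting E_N = 100% in the formula above and solving for N gives N = (V_a - V_m) / (D_m - D_a): the extra implementation cost divided by the per-run saving. The sketch below computes this; the hour values used are illustrative assumptions, not the paper's measured figures.

```python
def break_even(v_m, v_a, d_m, d_a):
    """Number of test runs n at which automated testing catches up with
    manual testing, i.e. v_a + n*d_a == v_m + n*d_m, so
    n = (v_a - v_m) / (d_m - d_a)."""
    if d_a >= d_m:
        raise ValueError("automated runs must be cheaper per execution, "
                         "otherwise break-even is never reached")
    return (v_a - v_m) / (d_m - d_a)

# Assumed example figures: specification 2 h vs. specification plus
# implementation 6 h; manual run 3 h vs. interpreting an automated run 1 h.
n = break_even(v_m=2.0, v_a=6.0, d_m=3.0, d_a=1.0)
print(n)  # 2.0 test cycles
```

The guard clause reflects the second prerequisite from the text: if script maintenance pushes the per-run cost of automation (d_a) to or above the manual cost (d_m), automation never pays off.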
If all you do is buy a CR-Tool and begin capturing, your testing costs will rise to between 125% and 150% of manual testing costs (see E_1 in Fig. 7), and each test run will incur additional costs because of traps such as test-script maintenance. On the other hand, if you establish a complete framework for GUI test automation (in which the CR-Tool is a cornerstone, not the complete solution), then a decrease in costs to about 40% for a typical product test cycle (E_10) is realistic.
Putting GUI test automation to work is a software development effort, and its complexity calls for professionals: experienced software testers with a solid software development background, working within the framework of a well-established testing process. High investments are necessary, not only for tools, but also for test implementation and test maintenance. On the other hand, once these start-up investments are made, automated testing of graphical user interfaces can reduce your testing costs to about 40% of the cost of manual testing.