Testing for performance, part 3: Provide information
This final article in our three-part series on testing for performance looks at test execution and results reporting. As a reminder, the Testing for Performance series is broken into the following parts:
Assess the problem space: Understand your content, the system, and figure out where to start
Build out the test assets: Stage the environments, identify data, build out the scripts, and calibrate your tests
Provide information: Run your tests, analyze results, make them meaningful to the team, and work through the tuning process
As we look at the steps required to provide performance-related information, we will try to tie the artifacts and activities we use back to the work that we did in the earlier articles from this series. Often, as we execute our testing, our understanding of the problem space changes, becoming more (or sometimes less) detailed, and often we are required to reconsider some of the decisions we made before we got down into the details. Our tooling requirements can change, or our environment needs might expand or contract. As you execute your tests you learn a lot. It's important to recognize that it's OK for that learning to change your strategy, approach, tooling and data.
A common theme through this series has been how this material ties to Scott Barber's approach to performance testing. From Scott's nine heuristics for performance testing, in this article we look at some of the last heuristics: execute, analyze, and report:
Execute: Continually validate the tests and environment, running new tests and archiving all of the data associated with test execution
Analyze: Look at test results and collected data to determine requirement compliance, track trends, detect bottlenecks, or evaluate the effectiveness of tuning efforts
Report: Provide clear and intuitive reports for the intended audience so that critical performance issues get resolved
Those three heuristics capture the essence of gathering, understanding and reporting performance information. You can't have a tool make sense of your results for you. You also can't have a tool communicate those results to the various project stakeholders in a meaningful way. That's where performance testers become more than just tool-jocks (people who are really good with tools, but not really good at thinking about how or why they use them).
Make sure everything is in place
In part 2 of this series, we covered configuration and coordination in detail. However, when it comes to execution, there are continuous configuration and coordination activities that need to take place between each and every test run. I've seen a lot of project energy wasted because tests were invalidated due to a configuration or coordination error. This is where you pull out those checklists you created in part 2 and work through them on a regular basis.
Before each performance test, you want to make sure everything you want to test is in place and configured correctly. Some common questions to ask include the following:
Are all the code deployments the correct version?
Are all the database schemas the correct version?
Are all the application endpoints pointing to the right environments?
Are all the services pointing to the right test environments?
Is all the data you're going to need available in the environment you're testing?
If you can, you might want to automate some of this checking. In the past, I've had the performance tests themselves check some of the settings. I've also used simple Ruby scripts to check various settings. Whether it's manual or automated, make sure you know what you're testing. Most likely it will be one of the first questions asked if you think you've found a performance problem.
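As one illustration of automating the checklist questions above, here is a minimal sketch in Python. It assumes you can already collect the observed settings into a dictionary; how you collect them depends entirely on your system. The setting names, version strings, and URL are hypothetical placeholders, not values from any real project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mismatch:
    setting: str
    expected: str
    actual: str  # "<missing>" when the environment did not report the setting


def preflight(expected: dict[str, str], actual: dict[str, str]) -> list[Mismatch]:
    """Return every setting whose observed value differs from the checklist."""
    problems = []
    for setting, want in expected.items():
        got = actual.get(setting, "<missing>")
        if got != want:
            problems.append(Mismatch(setting, want, got))
    return problems


# Hypothetical checklist values; in practice these come from your own checklist.
EXPECTED = {
    "app.build": "2.4.1",
    "db.schema": "57",
    "payments.endpoint": "https://perf.example.test/payments",
}

# Hypothetical observed values, hard-coded here so the sketch runs on its own.
observed = {
    "app.build": "2.4.1",
    "db.schema": "56",
}

for m in preflight(EXPECTED, observed):
    print(f"MISMATCH {m.setting}: expected {m.expected}, found {m.actual}")
```

An empty result means the environment matches the checklist. Anything else is worth resolving before the run, since it is likely to be the first thing people ask about.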
Before you run each test, make sure you've coordinated any special schedule, environment or monitoring needs. For example, do the DBAs need to enable detailed transaction tracing in the database? Do you need to let your third-party service vendor know that you'll be running a test in their environment? Do you need to make sure that "other team" isn't running a performance test at the same time you are because you share a service?
Having a second checklist or a standard notification process for performance testing can help reduce embarrassing and frustrating oversights. Many mornings I've come in to find a full inbox because I failed to notify someone I was load testing. I've also had to re-run tests because the right people weren't watching the system at the right time.
There are also some tasks you may need to repeat in between tests. A common task I find I have to do for financial service applications is reset the test data. That could mean restoring databases, purging records, canceling pending transactions or updating user information back to a known state. In the past I've also had to come up with new IP address ranges between runs, wait for open sessions to time out, clear server caches, or restart services. Make sure you know what you need to "re-set" between runs so you can quickly and reliably make those updates.
Execute your tests and collect all the data
Executing your test could be as simple as scheduling some scripts and walking away, or it could be much more involved. Different tests can require different levels of interaction. Sometimes you might need to manually intervene during the test (clearing a file or log, kicking off additional scripts, or actively monitoring something), or you might just want to measure something manually using "wall-clock" time while the system is under load. In the past, inspired by James Bach's Log-Watch tool and a keynote I saw Thomas Dolby give on sonification, I've even collocated myself with servers to listen for various programmed sounds.
While tests are running, it can be helpful to monitor their execution. If there's a problem, many times I'll notice it soon enough to end the test run early and start it up again without having to reschedule execution. It can also be useful to watch streaming application logs, performance test data usage, and performance monitoring applications and utilities. Often, you'll find patterns while data is streaming that you may not see once you've collected it all and start analyzing it statically.
Once your test execution is complete, data gathering begins. I'll often download test logs from my performance tool, application and database logs and export data from monitoring tools for the time period my test ran, as well as any other data or information I think will help me with analysis. Save a snapshot of the data before you start to manipulate it. That way you can always go back to the original if you mess something up. (Not that you could ever corrupt a large dataset in Excel.)
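One way to make the "save a snapshot first" habit automatic is sketched below. It assumes all the raw files for a run have been gathered into a single directory. The function name, the folder naming scheme, and the SHA256SUMS file are illustrative choices, not a standard.

```python
import hashlib
import shutil
from datetime import datetime, timezone
from pathlib import Path


def snapshot(raw_dir: Path, archive_root: Path, run_id: str) -> Path:
    """Copy raw_dir into a new timestamped folder and record a checksum per file."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = archive_root / f"{run_id}-{stamp}"
    # copytree raises if the target already exists, so an earlier
    # snapshot is never silently overwritten.
    shutil.copytree(raw_dir, target)

    lines = []
    for path in sorted(p for p in target.rglob("*") if p.is_file()):
        # Reads each file fully into memory; fine for a sketch, but very
        # large logs would be better hashed in chunks.
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        lines.append(f"{digest}  {path.relative_to(target).as_posix()}")
    (target / "SHA256SUMS").write_text("\n".join(lines) + "\n")
    return target
```

Work on a separate copy and leave the snapshot untouched. The checksums give you a cheap way to confirm later that the archived originals have not changed.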
Before you get too far into your analysis, do a quick scan to make sure your data passes your heuristics for a successful run. For example, if you have an application log full of Java exceptions or a performance test log file with a 50% pass rate, you probably shouldn't spend too much time doing in-depth analysis. More than likely, something went wrong with the run. A little debugging and high-level troubleshooting should help you figure out what went wrong. Most often, you'll need to make a small change and re-run the entire test.
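A quick scan like this can be reduced to a couple of threshold checks. The sketch below is one possible shape for it. The 95% pass rate and the limit of 20 exception lines are made-up defaults you would replace with your own heuristics, and matching on the word "Exception" is only a crude stand-in for real log parsing.

```python
def count_exceptions(log_lines) -> int:
    """Count log lines that mention an exception (a deliberately crude match)."""
    return sum(1 for line in log_lines if "Exception" in line)


def run_looks_usable(passed: int, failed: int, exception_lines: int,
                     min_pass_rate: float = 0.95,
                     max_exceptions: int = 20) -> tuple[bool, list[str]]:
    """Return (usable, reasons); reasons explains every failed heuristic."""
    reasons = []
    total = passed + failed
    if total == 0:
        reasons.append("no transactions recorded")
    else:
        rate = passed / total
        if rate < min_pass_rate:
            reasons.append(f"pass rate {rate:.0%} is below {min_pass_rate:.0%}")
    if exception_lines > max_exceptions:
        reasons.append(f"{exception_lines} exception lines exceed limit of {max_exceptions}")
    return (not reasons, reasons)


# Example: the 50% pass rate case mentioned above.
usable, why = run_looks_usable(passed=500, failed=500, exception_lines=3)
print(usable, why)
```

If the check fails, the reasons list tells you where to start debugging before you invest in deeper analysis.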
Analyze results (and collect even more data)
If you appear to have usable data, then let the analysis begin! Analysis is very application-specific, but to give you a flavor of what I look for when analyzing results, let me walk you through what I normally do. (Most of my performance testing experience is with J2EE or .NET applications in financial services.)
The first thing I do is look at the errors logged by my performance test scripts. I'll often ask myself the following questions:
What were the errors and why did I get them?
How many did I get and is that within my error tolerance?
Do I need to debug any of those scripts and re-run my tests?
Are any of them based on functional defects? If so, has that issue been logged?
If I can't readily explain an error that occurs in my script, it goes on a list I maintain for issues to investigate. I might have a defect that occurs under load (or some other intermittent problem). I might have an environment issue I don't know enough about. Or I might have something wrong with my script or data that I just can't pinpoint yet. Keeping all these open issues in one place can sometimes help me identify patterns between them. I don't mind getting errors in my scripts; I just want to know where they are coming from.
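The first two questions (what were the errors, and are they within tolerance) lend themselves to a simple tally. This is a sketch only. It assumes each script error has already been reduced to a short label, and the 1% tolerance is a placeholder rather than a recommendation.

```python
from collections import Counter


def summarize_errors(errors: list[str], total_requests: int,
                     tolerance: float = 0.01):
    """Return (counts by label, overall error rate, whether rate is within tolerance)."""
    counts = Counter(errors).most_common()  # most frequent label first
    rate = len(errors) / total_requests if total_requests else 0.0
    return counts, rate, rate <= tolerance


# Hypothetical labels pulled from a performance test log.
counts, rate, ok = summarize_errors(
    ["HTTP 500", "timeout", "HTTP 500"], total_requests=1000
)
print(counts, f"{rate:.2%}", "within tolerance" if ok else "over tolerance")
```

Any label in the tally that you cannot explain is a candidate for the investigation list, even when the overall rate is within tolerance.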
Next I'll look into any errors I can identify in any of the application, server or database logs. Often, that means I need to work with someone else on the team -- a DBA, a programmer or someone hosting services or infrastructure. Working with the appropriate teammate, I'll again ask the questions from above. Again, it's OK if errors occur (I don't ever really expect a run to be 100% clean), but I want to know what the errors were and how frequently they occurred so I can account for that behavior when I report my results, or so I can formally log any defects or issues we may have uncovered.
Once all errors are accounted for and/or cataloged, if I think I still have results worth looking at, I'll start to look at the transactions that took place. I'm often interested in the number of key transactions that took place, the frequency and timing of the transactions, and any patterns in the data that I can find. Remember in part 1 when we defined a usage model and in part 2 when we calibrated to that model? Well, here is where I look to see if we actually executed something close to that model with our test. If for some reason we didn't get the workload that we were targeting or if I saw a concerning pattern in the transaction timing, I would want to investigate why that is.
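Checking whether the run matched the usage model can be as simple as comparing the share of each transaction type against its target share. The sketch below assumes the model is expressed as fractions that sum to 1. The transaction names and the five-point drift threshold are hypothetical.

```python
def mix_deviation(target_mix: dict[str, float],
                  observed_counts: dict[str, int]) -> dict[str, float]:
    """Observed share minus target share for each transaction in the model."""
    total = sum(observed_counts.values())
    if total == 0:
        raise ValueError("no transactions were observed")
    return {
        name: observed_counts.get(name, 0) / total - share
        for name, share in target_mix.items()
    }


def off_model(target_mix: dict[str, float], observed_counts: dict[str, int],
              max_drift: float = 0.05) -> list[str]:
    """Names of transactions whose share drifted more than max_drift from the model."""
    drift = mix_deviation(target_mix, observed_counts)
    return sorted(name for name, d in drift.items() if abs(d) > max_drift)


# Hypothetical usage model and observed transaction counts.
model = {"login": 0.50, "transfer": 0.25, "statement": 0.25}
seen = {"login": 500, "transfer": 400, "statement": 100}
print(off_model(model, seen))
```

A transaction that shows up here is a prompt to investigate, not proof of a problem. The cause could be script errors, data exhaustion, or pacing that needs recalibrating.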
If the generated workload looked correct, I would then move on to looking at system performance characteristics. I'd start with basics such as CPU and memory utilization for each server involved, and move on to things like average queue depth and average message time spent in queue and how load balancing was distributed for the run. I'm looking for anything that might be out of tolerance. Again, you may need to work with a cross-functional team to determine what those tolerances should be set to.
Finally, I'll analyze the transaction response times. Often it can be useful to look at the response times captured in your performance scripts alongside response times from other monitoring tools (for example, Introscope or WhatsUp Gold). I move most data into Excel so I can manipulate it at will. (Don't forget to save the original before you edit.) There I'll look for trends and patterns in the average, max, min and percentile response times. When I examine response times, I'm often looking for the time measured to either trend towards or away from a performance objective (target numbers, SLAs or hard limits like timeouts), or I'm looking for something inconsistent with past performance history.
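For readers who would rather script this summary than build it in a spreadsheet, here is a minimal sketch of the same statistics. It uses the nearest-rank method for percentiles, which is one of several common definitions. The choice of the 90th percentile and the 2.0-second objective are assumptions for illustration only.

```python
import math
import statistics


def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples: list[float], objective: float) -> dict:
    """Min, average, max and 90th percentile, plus a check against the objective."""
    p90 = percentile(samples, 90)
    return {
        "min": min(samples),
        "avg": statistics.fmean(samples),
        "max": max(samples),
        "p90": p90,
        "meets_objective": p90 <= objective,
    }


# Hypothetical response times in seconds for one transaction.
times = [0.8, 1.0, 1.2, 1.1, 0.9, 4.0, 1.0, 1.3, 0.7, 1.0]
print(summarize(times, objective=2.0))
```

In this sample the single 4.0-second outlier raises the maximum and the average but not the 90th percentile, which is one reason to look at the average, max, min and percentile together rather than any one of them alone.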
As I'm looking at the results, I'll often create a list of things that appear interesting to me or a list of any questions I might have. This list becomes useful as I think about the next test I'll want to run. It also helps me when I interact with other team members (since I can never seem to remember the minute details of the results when I need to).
Report your results
If there's one thing I know, it's that the performance tester becomes a very popular person once he starts executing tests. Everyone wants to know about those test results, and they want to know the minute the tests have been completed. My experience is that there is a lot of pressure to get preliminary results public. Often this creates a dynamic where the performance tester doesn't want to share results because he doesn't yet understand them and doesn't want people to act on bad data, while the project team demands results quickly, since performance testing is so critical and occurs so late in the project.
I recommend two steps that can help. First, make your raw data available as soon as you get it all pulled together. Get in the habit of publishing the data in a common location (file share, CM tool, SharePoint, wiki, etc.) where at least the technical stakeholders can get to the data they may need to review. Second, hold a review meeting for the technical team shortly after the run. Hold it after you've done your preliminary checks to see if you even have results worth looking at, but before you do any in-depth analysis. It's at this meeting that you might coordinate a cross-functional team to dig into the logs, errors and response times.
Once you have completed any in-depth analysis and have findings to share, pull them together in a quick document and call together another meeting. I try never to provide results without two things:
A chance for me to editorialize on the data so people don't draw their own conclusions without understanding the context for the test
A chance for people to ask questions in real time
In the results, I'll often include a summary of the test (model, scenarios, data, etc.), the version/configuration of the application tested, current results and how they trend to the targets, historical results, charts to illustrate key data, and a bulleted summary of findings, recommendations or next steps (if any). Only after a face-to-face meeting (or in preparation for a face-to-face meeting) will I send out the results in an email.
If (for whatever reason) you can't pull everyone together for a results review, send your findings first to key technical stakeholders (DBAs, programmers, infrastructure, etc.) so they can add their analysis and comments. Even this simple step, while slowing things down only a little, may save you hours of heartache from misunderstandings on your part or the part of the reviewers.
The more times you iterate through execution and results reporting with a particular team or for a particular application or system, the easier and faster the process becomes. You develop heuristics and shortcuts for most things, and the time from test completion to interim results can be as short as a few hours if not minutes. Like anything, the more you do it, the better your apperception and the faster you become.
Likewise, you can also become sloppy if you get stuck in a routine. Make sure you have effective safeguards in place. Reporting inaccurate results or drawing false conclusions will happen, and most people won't hold it against you if you do it once or twice. But don't let it become a habit. If you continue to make those mistakes because of shortcuts, you'll quickly lose the respect and trust of the team. Without that, the work becomes more difficult and you become less effective.
At the end of this phase
At the end of this phase you should have an initial set of results to review with your team. You may not have your final results, but you should have some idea of what types of errors you're getting, whether your scripts are calibrated correctly, some preliminary response data, and an idea of system utilization under load. Going forward you need to constantly prioritize the next test to run, creating any new test assets as you move along. If you've done a good job of collecting your initial data and analyzing the results as you get them, you should be able to adapt rather quickly.
Here is a possible summary of some of the work products from the execution and reporting phase:
Checklists to support test execution
Performance test logs and baselines
Logged defects or performance issues
Lists for further investigation:
Errors (script, data, application, etc.)
High response times (database transactions, application events, end-user response times, etc.)
Unpredictable behavior (queue depth, load balancing, server utilization, etc.)
Open questions on architecture or infrastructure
Interim or final performance test result documents
Updated documents from the assessment phase (strategy, diagrams, usage models and test ideas lists)
Updated assets from the build out phase (scripts, environments, tools, data, etc.)
If you were in a contract-heavy or highly formalized project environment, you might be done after this phase or you might carry work over into another iteration, statement of work, or formal tuning phase. If you were in a more agile environment, you are now ready to dig in, get dirty, and start performance tuning or debugging any of the more problematic issues identified. The important point to remember is that this phase is about manipulation, observation and making sense of the information you gather. You move freely along the continuum of data generation, data gathering and data analysis.