Essential for software testers

SUBSCRIBE: it's FREE for testers

August 2014, v2.0, number 28. £4 / €5

Testing in the lead

This issue of Professional Tester is sponsored by Neotys

Including articles by: Henrik Rexed (Neotys), Gregory Solovey and Anca Iorgulescu (Alcatel-Lucent), Nick Mayes (PAC UK), Normand Glaude (Protecode), Edwin van Vliet (Suprida), Sakis Ladopoulos (Intrasoft International)


From the editor

Keep testing in the lead

Is software testing as important as software requirements engineering, design, programming or project management?

No. It is much more important than all of these. Testing should be the framework upon which they are built. Their only aims should be first to facilitate the creation of all the right tests, then to ensure they all pass. Testing is not a safety net to rescue the incompetent from failure: it is the systematic, generic method applied throughout the lifecycle and by which success is achieved.

This issue is made possible by our sponsor Neotys: a performance testing tool vendor committed to testing principles. We recommend readers evaluate its product NeoLoad, which is available in a completely free edition limited only in the number of virtual users it simulates.

If you are one of the tens of thousands of testers who like PT, please help us to keep it free to read by considering the offerings of those who support it, and letting them know you appreciate that.

Edward Bishop, Editor

IN THIS ISSUE: Testing in the lead

4 Model answers: testing-led specification with Henrik Rexed

9 QA of testing: testing-led process with Gregory Solovey and Anca Iorgulescu

13 Get ready for testing in the front line: testing-led digital transformation strategy with Nick Mayes

19 Zero tolerance: testing-led project management with Sakis Ladopoulos

21 Open but hidden: testing-led third-party code management with Normand Glaude

24 Forget me not: testing-led test data management with Edwin van Vliet

Visit professionaltester.com for the latest news and commentary


Contact

Editor: Edward Bishop

[email protected]

Managing Director: Niels Valkering

[email protected]

Art Director: Christiaan van Heest

[email protected]

Sales: Rikkert van Erp

[email protected]

Publisher: Jerome H. Mol

[email protected]

[email protected]

Contributors to this issue: Henrik Rexed, Gregory Solovey, Anca Iorgulescu, Nick Mayes, Sakis Ladopoulos, Normand Glaude, Edwin van Vliet

Professional Tester is published by Professional Tester Inc

We aim to promote editorial independence and free debate: views expressed by contributors are not necessarily those of the editor nor of the proprietors. © Professional Tester Inc 2014. All rights reserved. No part of this publication may be reproduced in any form without prior written permission. 'Professional Tester' is a trademark of Professional Tester Inc.

Model answers

by Henrik Rexed

Henrik Rexed explains how to get the numbers nearly right first time: specifying accurate and testable performance requirements

Performance testing is often done in a way that is contrary to the principles of testing. An application is put under arbitrary load and its response times measured. Measurements that seem large, relative to one another or compared with arbitrary expectations, are investigated and addressed, and the same test is run again to demonstrate that the change has been effective.

But how do we know the right things are being measured under the right conditions? If not, there may have been no need for the changes. In fact the changes may well have worsened, or even caused, performance issues that will matter in production but have been missed by testing.

Just as functional testing has no meaning without trusted requirements, performance testing can do nothing to provide assurance unless what needs to be assured has been defined formally before testing starts. In both kinds of testing, the requirements will change for many reasons, including in the light of test results: but adequate investment in specifying the right requirements means that testing can provide a clear result (requirements are met, or not: that is, the application will or will not fail in production). The closer to right first time those specifications are, the more testing is empowered to save massive effort in management, operations and development, and the less testing costs.

When an application passes performance testing then fails in production, proving the testing to have been unrealistic, it is easy but wrong to blame the testing itself or the tools used to execute it. The real problem is test design without a correct basis. It is necessary to ask "what did we need to know which, if we had known it, would have allowed us to predict this failure before production?". In other words: for what should we have been testing?

This article will attempt to provide a generic answer to that question by defining a model minimum set of performance specifications suitable for use as a test basis, and explaining how to estimate accurate quantitative information, working from high-level business requirements, to populate that model set.


The availability of these accurate estimates gives performance testing, regardless of how it is carried out, a far greater chance of success.

Important user transactions

The method described here is for online user-facing applications and specifies performance from the user's point of view, by considering the time between action and response. Each different action available to the user (that is, the user transactions that, individually or in sequences, provide the functions of the application) must be treated separately. Many of them will require similar performance and can be specified in groups, but these are the least important, ie the least likely to cause performance or reliability failure. Attempting to specify the more important ones as a group will result in very inaccurate figures for most of them.

Which are the important user transactions to specify is usually obvious from a high-level understanding of the application: those user transactions that require complex data transactions or processing, especially with or by subsystems. However it is not always so simple for applications which perform transactions not triggered, or not immediately, by single or specific user action: ie "push" rather than "pull" transactions. Here a more detailed understanding of the technical design might be needed.

Application and transaction capacity

The capacity of an application is a set of values defining the maximum number of simultaneous users to whom it must deliver the performance levels (whose specification will be discussed below). These values are:

• session initiations per unit time (eg hour). This is the rate at which users are expected to begin interacting with the application and reflects the expected "busyness" or "traffic"

• concurrent users. This is related to session initiations but also takes into account the expected length of use sessions, that is the rate at which users stop interacting with the application

• user accounts. For applications which require users to be registered and to log in, this figure is the ceiling of the previous one: the maximum number of users who could be interacting with the application at any time. However many applications offer certain functions to unregistered users too. In this case the maximum number of user accounts (and where applicable the maximum number of accounts of each of the different types) is used to estimate the next figure

• concurrent user transactions (for each of the user transactions).

Obviously the correct entity to define required capacity is the acquirer, ie the business. Sometimes this is easy: for example where an application or user transaction is available to a limited group of employees, associates or customers whose growth is reasonably predictable.

For a public-facing application that aims to increase its usership, the key source of information is the business plan. This must contain estimates of ROI and therefore of market share, turnover, average sale price or similar which can be used to derive the expected usership and so the necessary capacity. Moreover, these estimates will tend toward the optimistic, which reduces the risk of performing testing under insufficient load.

Unless the usership of the application or a user transaction is truly closed, so that the maximum usership is known with good accuracy, capacity figures should be derived from the estimated maximum usership figures multiplied by 3. This is to assure against peak loads which may occur for multiple reasons, including:

• recovery: if the application becomes unavailable for functional or non-functional reasons (including transient conditions), usership is multiplied when it becomes available again

• coincidence/external/unknown: sometimes demand reaches high peaks for no predictable reason. Anyone who has had a job serving the public will recognize this phenomenon, for example every supermarket worker has experienced the shop suddenly becoming very busy in the middle of Tuesday afternoon, usually the quietest time. We need not be concerned here with its causes, but it is interesting to note that an external reason is not necessarily required: it can be explained purely mathematically

• transient conditions: network-layer events such as latency, packet loss, packets arriving out of order etc increase effective load by preventing threads from completing, holding database connections open, filling queues and caches and complicating system resource management. The same effects can also be caused by application or system housekeeping events such as web or memory cache management, backup, reconciliation etc. While the second group of events, unlike the first, is to some extent predictable and controllable, as we have already seen the production load is unpredictable, so we must assume that all can happen at the same time as peaks in production load.

Applying the "3 times rule" should mitigate these risks sufficiently, but a greater one remains: unexpected popularity. Suppose testing is based on expected capacity but usership then expands much faster than expected. As well as failing to serve the new users, perhaps losing a once-in-a-lifetime business opportunity, their presence may well cause the application to fail to serve existing customers, who may be even more valuable.

If such an event is considered possible and cost is no object, one would build the application and provision infrastructure to be capable of passing performance testing at extremely high load. Realistically, however, it must be mitigated by scalability testing, that is testing to assure against failing to increase the application's capacity economically when needed. For performance testing purposes we need to know the highest currently-expected capacity, and every effort should be made to help the business to estimate this accurately and commit to those estimates.
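As a purely illustrative sketch (the figures, field names and the choice of which values to multiply are assumptions, not taken from any real project), the derivation from business-plan estimates and the "3 times rule" might be captured like this:

# Sketch: deriving capacity figures from business-plan usership estimates.
# The x3 factor is the "3 times rule" peak-load allowance discussed above;
# it is applied only when the usership is not truly closed.

PEAK_FACTOR = 3

def capacity_figures(expected_session_initiations_per_hour,
                     expected_concurrent_users,
                     usership_is_closed=False):
    factor = 1 if usership_is_closed else PEAK_FACTOR
    return {
        "session_initiations_per_hour": expected_session_initiations_per_hour * factor,
        "concurrent_users": expected_concurrent_users * factor,
    }

# Hypothetical example: the business plan implies 12,000 visits per hour
# and 900 concurrent users at the expected maximum.
print(capacity_figures(12000, 900))
# {'session_initiations_per_hour': 36000, 'concurrent_users': 2700}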

Unfortunately many testers still find this impossible to accomplish and are forced to revert to an empirical method. This is possible only if the application is already in production or available for testing in a reasonably production-like environment. A preliminary load test is performed on the application using a simple virtual user script such as a typical visit or just random navigation (without infrequently-occurring user transactions). The same exercise is repeated on the important user transactions, again using a simple script which simply carries out a typical transaction without variation or mistakes. In all cases, the number of VUs is increased slowly until the performance requirements (discussed below) are not being met for – that is, too-slow responses are being experienced by – 25% of them. This is taken to be the point at which the application is reaching its capacity.

Obviously this figure is arbitrary and the method far from satisfactory. However it does stand a reasonable chance of being sufficiently accurate, used as a starting point for test design, to at least help limit waste. In fact, if the organizational, technical and project situation makes it cheap, it may well be worth doing even if accurate estimates based on the business plan are available, as a check which might show that the performance required by that plan is unfeasible so testing should not begin. Note that, because of the simplicity of the scripts, it must not be used the other way round: that is, do not say "the preliminary testing shows the current performance specifications can be met easily, so the business can consider making them more ambitious".
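A sketch of that empirical ramp-up, with everything tool-specific left as an assumption (run_load_step stands in for whichever load tool is in use and is expected to return one observed response time per virtual user):

# Sketch: increase the number of virtual users slowly until 25% of them
# experience responses slower than the performance requirement; that load
# is taken as the approximate capacity. run_load_step is a placeholder.

def estimate_capacity(run_load_step, max_response_time_seconds,
                      start_vus=10, step=10, ceiling=10000):
    vus = start_vus
    while vus <= ceiling:
        response_times = run_load_step(vus)
        too_slow = sum(1 for t in response_times if t > max_response_time_seconds)
        if too_slow / len(response_times) >= 0.25:
            return vus          # the application is judged to be reaching capacity
        vus += step
    return ceiling              # requirement still met at the highest load tried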


The worst situation of all is having to begin test design based on the testers' own estimates of needed capacity, or to proceed to test execution with no clear idea of these and having to find them out empirically. Only very good luck can prevent this leading to great inaccuracy, delay and waste in the testing, development and, usually, business effort.

User behaviour

Unusual action sequences, making many mistakes, breaking out of user transactions before they are completed and rapid repetition of actions have the effect of increasing the load caused by one user to that which would be caused by more than one "normally" behaving user. Long pauses between actions ("think times") increase concurrency and can also cause sudden changes in total load as actions of users working at different speeds become and cease to be simultaneous.

It is futile at the specification stage to try and define a “normal” user. What test design needs to know is the required range of speed (average pause time) and accuracy (tendency not to make mistakes). Note that in modern responsive web and mobile applications, pause times are shorter than for older, less responsive web applications: do not consider the time between form submissions, but that between clicks and keypresses on a form which may interact with the user after each of them via AJAX etc.

Now for the target audience, decide how much longer this time may be for the slowest user compared with the average user, and how much faster for the fastest user. Taking the time for the average user to be 0 arbitrary units, the range can now be expressed as, for example, -0.5 to +0.7.

The same approach is used for the tendency of users to make mistakes, where a user who makes the average number of mistakes per n actions is taken as 0 arbitrary units.

It is also vital to realise that no real user works at a constant rate or accuracy, due to being affected by interruptions, distractions etc. In the most extreme case, VUs must vary from the lowest to the highest points on the defined range within a single user journey.

Deciding the distribution of VUs of these different characteristics and how they should vary dynamically is the job of test design. In practice, the "average speed" user will be taken as working at the fastest speed the load injector can achieve. In the same way, initial VU scripts tend to assume no user errors. Both are fine, provided the necessary variation is then added. Not doing that is like trying to check the structural soundness of a building without going in but just by hammering on the front door.

Specifying the range of variation for which assurance against failure must be provided enables test design to find creative ways to do so. Without that specification, testing is degraded to guesswork.
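One possible reading of such a specification in code (a sketch only: the average pause and the range are the example figures from above, and the uniform distribution is an arbitrary choice, not a recommendation):

# Sketch: let a virtual user's think time drift across the specified relative
# range (-0.5 to +0.7 around the average) instead of staying constant.
import random

AVERAGE_PAUSE_SECONDS = 4.0        # hypothetical average think time
RELATIVE_RANGE = (-0.5, 0.7)       # the example range given in the text

def next_pause_seconds():
    offset = random.uniform(*RELATIVE_RANGE)
    return AVERAGE_PAUSE_SECONDS * (1 + offset)

# Successive calls return pauses anywhere between 2.0 and 6.8 seconds, so a
# single user journey can cover the whole specified range over time.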

Load levels and response times

Once the capacity figures are known, load levels based upon them are defined: "no load" (less than 40% of capacity), "low load" (40-80%) and "high load" (80-90%). There is no need to deal with higher loads: any exact load, including 100%, is practically impossible to achieve accurately by any means, therefore any test result obtained by an unknown approximation of it is meaningless. For the same reason, in testing it is best to aim for the middle of the band, that is 20%, 60% and 85% load. It might well be desired to apply load above capacity but there is no need to specify performance at that load; the aim is to predict other types of failure and their likely or possible impact. In estimating the capacity, we assume that performance failure will occur at loads higher than it.
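Given a capacity figure, the three target loads follow directly; a minimal sketch (the capacity figure is hypothetical):

# Sketch: the three test loads, aimed at the middle of each defined band.
LOAD_BANDS = {"no load": 0.20, "low load": 0.60, "high load": 0.85}

def test_loads(capacity_concurrent_users):
    return {name: round(capacity_concurrent_users * fraction)
            for name, fraction in LOAD_BANDS.items()}

print(test_loads(2700))
# {'no load': 540, 'low load': 1620, 'high load': 2295}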

For each of the important user transactions, and for each of the three load levels, maximum and average response times are decided. It must also be decided whether these should or should not include network transport times: in other words, whether the time should be measured from the requestor's point of view, from the time the request is made (eg user touches button) to the time the response is fully displayed, or from the responder's, from the time the request is fully received to the time the response has been fully sent.

There are arguments for both approaches. The transient network conditions are unknown and beyond control; including them in the specification necessarily includes error in the testing. Even if conditions are fairly stable, entropic events such as packet loss will always cause some VUs to report performance failure. On the other hand, these factors all affect the user experience, which is the most important consideration. Moreover, that consideration is really the only way to decide what are the desirable and tolerable response times. Beware of oversimplified statements along the lines of "x% of users abandon after y seconds delay". It is necessary to consider the sequence of actions taken by the user before the transaction and the amount and nature of the data to be displayed after it, especially when errors occur (such as the submission of incomplete or invalid data). All of these affect the user's expectations and behaviour.

There is one special case where the response time must be specified as purely the time taken by the responder: when that responder (a component such as a database or mainframe) is responding to more than one requestor. A typical example of this situation would be both web and mobile applications connected to the same external subsystem. While empirical testing including both working simultaneously may well be desirable and is made possible by modern developments in tools and environment provision, that will happen far too late to discover that a key component on which both depend cannot provide adequate performance. Testing of each requestor must assure that the response time under maximum load of that requestor is at most that needed to meet the user-experienced performance specifications, divided by the number of requestors which will be connected to the responder in production. This response time is the value that should be specified. Note that in almost all cases it will not be necessary to delve into system architecture: the only important point is that at which requests from multiple requestors converge. Subcomponents beyond this point can be considered as a single component.
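For example, with purely hypothetical figures: if the shared subsystem must respond within 1.2 seconds for the user-experienced budget to be met, and three requestors (say the web application, the mobile application and a hypothetical batch interface) will be connected to it in production, SCRTmax would be specified as 1.2 ÷ 3 = 0.4 seconds.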

The minimum set of performance specifications

In summary, we have defined the following values. For the application overall: maximum session initiations per unit time, maximum concurrent users and maximum number of user accounts. For the users: relative range of variation of time between actions and rate of making mistakes. Then for every important user transaction: maximum concurrent users carrying it out, maximum response time and minimum response time. Setting these values before test design begins gives performance testing clear targets and makes it more quickly effective and less wasteful. A template to record these values is shown in figure 1. When all the symbols have been replaced by quantitative values, minimum performance specification is complete.

Henrik Rexed is a performance specialist at Neotys (http://neotys.com)

Figure 1: performance specification template

Symbol | Definition | Note
SIR | maximum session initiation rate per second | of entire application
CU | maximum number of concurrent users |
UA | maximum number of user accounts | if applicable
UPmax | maximum user pause (time between actions, seconds) | relative to average
UPmin | minimum user pause (time between actions, seconds) | relative to average
UMmax | maximum number of user mistakes per n actions |
UMmin | minimum number of user mistakes per n actions |
SCRTmax | maximum response time of shared component connected to this and other apps (seconds) ÷ total number of apps connected to it | not including network transport time; for complex applications there may be more than one such responder component
IUT | number of identified important user transactions |
RTmax[n] | maximum response time of user transaction n (seconds) | for n = 1 to IUT
RTmin[n] | minimum response time of user transaction n (seconds) | for n = 1 to IUT
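The same template could equally be held in machine-readable form; the sketch below is only one possible shape, with field names following the symbols in figure 1:

# Sketch: figure 1 captured as a data structure, one PerformanceSpec per application.
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class TransactionSpec:
    max_concurrent_users: int      # concurrent users carrying out this transaction
    rt_max_seconds: float          # RTmax[n]
    rt_min_seconds: float          # RTmin[n]

@dataclass
class PerformanceSpec:
    sir_per_second: float                  # SIR
    cu_max: int                            # CU
    ua_max: Optional[int]                  # UA, if applicable
    up_max: float                          # UPmax, relative to average
    up_min: float                          # UPmin, relative to average
    um_max: float                          # UMmax, mistakes per n actions
    um_min: float                          # UMmin
    scrt_max_seconds: Optional[float]      # SCRTmax, excluding network transport
    transactions: Dict[str, TransactionSpec] = field(default_factory=dict)  # one entry per important user transaction (IUT)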

QA of testing

by Gregory Solovey and Anca Iorgulescu

Gregory Solovey and Anca Iorgulescu present their audit-based approach: replace retrospective test assessment with real-time test monitoring

Insufficient code coverage which does not increase with each release means there are test cases that should be in the automated regression suite but are not. This happens, despite the presence of an established testing process, because the problem is identified too late to do anything about it: coverage becomes visible only at the end of a release, when the full planned automation is delivered. During the release, low coverage is blamed on instability of the product code – it is difficult to automate testing when code keeps changing – but when the release is done, all resources move to the next one so there is often no opportunity to improve coverage even when the code has stopped changing.

In this article we present our approach to detecting and resolving test-related inconsistencies immediately as they appear, throughout the software lifecycle, using auditing.

Coverage doesn't cover it

Coverage, of code or requirements, is a fairly poor indicator of test quality. Consider the requirement 'If the processor occupancy reaches 80% or the memory usage reaches 70% during the boot sequence an overload warning should be generated'. As written, it could be "covered" with one test case, which clearly would not detect many possible defects.


Now assume the requirement is implemented by this code:

IF ((CPUOccupancy > 80% OR MemUsage > 70%) AND State eq "boot")

{ sendWarning( "Overload" ); }

The most commonly used code coverage criterion, decision coverage, can be achieved with two obvious tests:

CPUOccupancy = 80%; MemUsage = 69%; State eq “boot”

CPUOccupancy = 81%; MemUsage = 70%; State eq “boot”

Neither of these would detect the defects in the following incorrect versions of the first line:

IF ((CPUOccupancy > 80% OR MemUsage = 70%) AND State eq "boot")

IF ((CPUOccupancy > 80% OR MemUsage > 70%) OR State eq "boot")

Test completeness can only be achieved by using the right test design methods for all requirements.

Accepting that fact as a starting point, we can envisage a system that will detect all possible defects, shown in figure 1.

A series of audits, triggered after each lifecycle phase, can ensure that the test process meets the predefined guidelines.

Critical test quality gates

We check the quality of testing at the following reviews:

• requirements/architecture documents review

• design documents review

• test plans review

• test code review

• test coverage review (for each build released).

At each of these, the following series of steps is taken:

• generate metric(s) and upload them into the appropriate database

• perform audit to verify the new metrics values against prevailing standards

• communicate audit results to everyone with responsibility for this quality gate.

The audit detects any test degradation and everyone interested is notified immediately.

Metrics for requirements/architecture documents

Typically, these reviews focus on making sure that the customer requests are properly represented; test aspects can be given insufficient emphasis. However, the following two metrics are essential and must always be taken:

Traceability, the extent to which requirements have a unique ID for cross-reference with test cases, results and incidents.

Testability, the extent to which test cases meet the controllability and observability principles. To be counted, a test case must be executable (controllable) and the result of execution must be obtainable from an external interface (observable). If some requirements are not testable, the requirements document should identify an alternative means to test them, for example test harnesses.
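A minimal sketch of how the traceability metric might be computed from tool exports (the data shapes are assumptions, not any particular tool's API):

# Sketch: traceability = share of requirement IDs referenced by at least one test case.

def traceability(requirement_ids, test_cases):
    """requirement_ids: iterable of IDs; test_cases: iterable of dicts,
    each with a 'covers' list of requirement IDs."""
    covered = {rid for tc in test_cases for rid in tc.get("covers", [])}
    reqs = set(requirement_ids)
    return len(reqs & covered) / len(reqs) if reqs else 1.0

# Hypothetical example: two of three requirements are traced.
print(round(traceability(["R1", "R2", "R3"],
                         [{"covers": ["R1"]}, {"covers": ["R1", "R2"]}]), 2))   # 0.67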

Metrics for design documents

Typically, a design review aims to verify the adequacy of the transformation of the business logic into the implementation details. From a test quality perspective, a design document review has to report on two aspects: test harness implementation and presence of unit tests.

Test harness implementation verifies that, if the requirements document promises a test harness, the design document covers it. If it does, then we can be confident to some extent that this execution and automation will happen.

However, designers may, in their design process, discover ways in which some of the test harness could be replaced by unit test cases which could be executed by programmers.

Figure 1: attributes of a test process to detect all possible defects

• All relevant documents are reviewed (and amended if needed) and used by testing
• Every requirement is testable
• Every requirement is covered by a complete set of tests
• Every test is automated in parallel with product development
• Every test is executed for every build released


Figure 2: audit and notification system (diagram; its elements include the software lifecycle review points – requirements review, design review, test plan review, test code review and test build – raw data and metrics from the document, test, code, defect and project management systems, databases, the audit management system, connection manager, audit engine, notification subscription and notification generator, and outputs of immediate notifications, persisting issue reports and resolution reports to the group/team layer, the release/project layer and the dashboard)

Figure 3: test quality dashboard (diagram)



In this case the design of those parts can be replaced, in the design document, with full specifications of the tests and a commitment that they will be automated.

Presence of unit tests is the extent to which those unit tests have been specified in the design document. Importantly, its measurement does not include assessment of the completeness of the unit tests. That is not within the scope of a design review: the completeness of tests for specific requirements will be reported by the test plan review.

The design document should specify the tests at whatever level is necessary to explain how they will deliver testability. It aims to show that if a specific acceptance test fails, a specific unit test would necessarily fail: for example a test for null pointers, memory leaks, array indices out of bounds, stack overflow etc. Execution of that unit test could be done by any available method, including static analysis, dynamic analysis and load testing with utilization monitoring.

Metrics for test plans

Test traceability measures the extent to which the requirements are covered by reviewed test cases.

Completeness of reviewed test cases measures the extent to which they cover all the possible implementation errors that could be detected using standard test design techniques. It is assumed that all testers are trained in test design techniques and can verify the correctness and completeness of the presented test cases.

Metrics for test code

Testware maintainability measures the immunity of testware to production code changes. Ideally, a single change in the production code, APIs, or related interfaces should lead to a single change in the testware.
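An illustrative sketch of that ideal (not taken from the authors' framework; the endpoint, URL and field names are invented): if every test case reaches the application through one thin helper, a renamed endpoint or field changes the testware in exactly one place.

# Sketch: a single adapter is the only place that knows interface details,
# so one production interface change means one testware change.
import json, urllib.request

API_BASE = "http://localhost:8080"     # assumption: application under test

def get_warnings(warning_type):
    """Single point of contact with the interface, shared by many test cases."""
    url = f"{API_BASE}/warnings?type={warning_type}"
    with urllib.request.urlopen(url) as response:
        return json.load(response)

def test_overload_warning_reported():
    warnings = get_warnings("overload")
    assert any(w.get("type") == "overload" for w in warnings)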

Test code is a development product and needs to follow the same process and meet the same standards as production code. It is assumed and required that, within that process, lines of test code reviewed, defect density of test code and inspection rate of test code are measured.

Metrics for automated test coverage

Extent of test automation is the percentage of the test cases identified in the test plan which have been automated. The test automation code should be delivered at the same time as the code it will verify. This is especially important when code is released frequently, e.g. in continuous integration environments, since the test code is needed for sanity checking and regression testing, as well as to assure correctness of new functionality.

During automated test execution, for each software delivery, the code coverage metric can also be reported. Although we noted above that it is a poor indicator of test quality, code coverage is a good indicator of test teams' productivity. Measuring it here gives a useful comparison across different test projects or groups. It is important that the same coverage level is maintained across all of them.

Implementation of audits

These and other test-related metrics are held in databases such as requirements management, test management and review management tools. They are raw data, meaningful only when audited against specific guidelines which define their interpretation, typically using ranges. An audit and notification system (figure 2) performs this task and reports to subscribed personnel as needed. The reports are of two kinds: notification as soon as an issue is detected, and compiled information on persisting issues.
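A minimal sketch of the audit step (the guideline ranges and the notification hook are assumptions for illustration only):

# Sketch: audit newly uploaded metric values against guideline ranges and
# notify subscribers as soon as a value falls outside its range.

GUIDELINES = {                              # hypothetical acceptable ranges
    "traceability": (0.95, 1.00),
    "extent_of_test_automation": (0.90, 1.00),
    "code_coverage": (0.70, 1.00),
}

def audit(metrics, notify):
    """metrics: {name: value}; notify: callable taking a message string."""
    issues = []
    for name, value in metrics.items():
        low, high = GUIDELINES.get(name, (0.0, 1.0))
        if not low <= value <= high:
            issues.append(f"{name} = {value:.2f} is outside guideline [{low}, {high}]")
    for message in issues:
        notify(message)     # immediate notification
    return issues           # also retained for persisting-issue reports

# Example: audit({"traceability": 0.88, "code_coverage": 0.72}, print)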

The output of the audit system is also presented in a "test quality dashboard" (figure 3). Its purpose is to provide easy comparison of test quality for different releases, technologies, features and products.

Cost of implementation

Ensuring that the test process is always formal, complete, straightforward and maintainable in real time decreases the cost of testing dramatically, because it prevents test degradation from causing unnecessary rework to both tests and the product. When we prototyped the process described above, manually, at Alcatel-Lucent, with no change in how test design or automation itself was done, 70% code coverage was achieved. The existing test automation processes, with no quality feedback, achieved only 40% code coverage.

Gregory Solovey PhD is a distinguished member of technical staff at Alcatel-Lucent. He is currently leading the development effort of a test framework for continuous integration. Anca Iorgulescu is a software developer and agile coach at Alcatel-Lucent. She is responsible for the establishment and automation of quality processes and audits for wireless products.

Get ready for testing in the front line

by Nick Mayes

The latest forecast from PT's weatherman Nick Mayes: as software becomes more important, so do we

Testing is about to face a massive challenge. Two vital, yet divergent, business demands depend upon it.

The first is economic. In that respect, testing is the victim of its own success. All reasonably well run businesses now consider good testing necessary. Until recently, some of them considered it optional. We testers have won our most important victory.

But now that we are an integral part of our organization, we are required to contribute to its effort to reduce "lights on" operating costs. So, testing has to be done more efficiently, ie cheaper. Those who understand testing and those who do not still agree that this can be achieved by greater centralization of testing resources, optimization of low-cost delivery teams and the use of standard tools and methodologies. All these act to reduce the amount of testing effort required.

The second challenge is practical: the digital agenda of the business. As other contributors to this issue of PT imply, it is the nature of software to change, and so to change the world. In the world as software currently has it, time to market outprioritizes cost.

Any ambitious business must now modernize and innovate in areas such as mobile applications, self-service websites, social media analytics and multi-channel e-commerce platforms just to remain competitive. Put another way, the demands of its customers are influenced by its competitors, all online. Brand competition is now in real time.

But the digital strategy is usually led by sales and marketing, who often succeed in circumventing the CIO and undertaking their own development projects, often leveraging cloud-based tools and platforms.

Testers need to get their arms around both of these dynamics. In this article I will predict how test organizations will need to adapt to the market dynamics of digital and cloud computing and yet remain efficient.

Digital transformation

Everyone in business, in all sectors, is wrestling with this concept.

Digital technologies are revolutionizing how companies, established and new, interact with their customers, who are shopping on their mobile devices, sharing their views on products via social media and leaving data across the web that enables analysis of their behavior: not only their cash purchases, but the free resources they access and exchange: music, photos, videos and any other digital content. If you are in the cloud, the cloud knows you, and what you are likely to do and buy.


Advertorial

Why and how performance testing should be done earlier

Timely performance testing prevents technical debt

Agile development particularly needs early and frequent performance testing. Performance should be tested at the very first iteration and SLAs assessed precisely at every sprint. That way, developers know immediately that something in a particular build has caused deterioration. Learning of it several builds later makes isolating the cause a nightmare and debugging exponentially more expensive.

Delivering actionable insight to developers quickly

Agile developers need to know more than just that their code is causing performance issues: they need to know when their code started causing problems and on what story they were working when the issue started. It’s a huge pain for developers to be forced to go back and fix code for a story they worked on weeks or months ago. It also means they can’t spend time working on getting new features out the door. It is important to address performance issues early in the cycle so that important feedback can be delivered to the developers. This is crucial to saving costs.

But many teams find timely performance testing hard to achieve in practice. Functional testing and debugging inevitably takes priority and delays the start of performance testing, creating need for additional, wasteful “hardening iterations”.

Figure: performance testing in the waterfall method (discover, design, develop, test as one sequence) compared with the agile method (discover, design, develop and test within each sprint, Sprint #1 to Sprint #n)

Late performance testing causes late releases and late features

If performance testing is done near the end of a development cycle, there will be little time to fix defects detected. Just as with functional defects, that delays releases and/or causes features that users need to be removed from them, or the decision to release defective software with significant risk of performance failure in production. Worse still, if the performance defects are found to be fundamental, they may require painful architecture-level changes taking a very long time.

How can you change this situation?

1. Put performance SLAs on the task board

Performance needs to receive continuous attention. User stories that are written from the functional perspective (as a user, I can click the 'view basket' button and view the 'my basket' page) can be qualified with performance information (as one of 1,000 logged-in, active users I can click the 'view basket' button and view the 'my basket' page less than one second later). Doing this to all user stories could become cumbersome, but most instances of it can be replaced with application-wide stories (as a user every page I request loads in less than one second), by adding the SLAs to a list of constraints (including functional constraints) that must be tested for every story, or by including specific SLAs in acceptance test criteria so that a story cannot reach 'done' until they are shown to be achieved. This last approach works particularly well when changes made for a story affect a relatively small proportion of the codebase, so performance defects introduced will likely affect only a small section of application functionality. A sketch of such an acceptance check appears after this list.

2. Anticipate changes to performance test design

Provided testers stay engaged with the team and plan ahead, testing can stay ahead of the curve. One of the best things about agile for testers is that they learn about updates to development tasks in meetings with the developers themselves and can think immediately about how they will test the stories currently being coded. This thinking should include performance testing. Will new scripts and/or configurations be needed, or can existing ones be modified?

3. Get performance testing started faster

You can’t performance test early if setting up to do it takes too long. Choose a tool with time-saving features out of the box that help you get results fast: wizards, advanced pickers, automatic parameter handling etc.

4. Keep performance testing agile

You can’t performance test frequently if test design, maintenance and execution takes too long or causes process bottlenecks. Choose a very intuitive, recording-based tool to create scenarios 30-50% faster than by scripting, with content updating tools that identify when application changes affect test scenarios and re-record only the parts necessary.

5. Report performance defects effectively and efficiently

Agile teams don’t need to know that a test failed. They need to know exactly what in the application or infrastructure did not meet performance SLAs. Choose a tool with powerful, detailed comparison of test results and simple report generation to deliver immediately actionable information.

6. Collaborate on performance testing

Testing workflows are lumpy and performance testing is no exception. Agile is about everyone being able to help, when needed, with whatever is most urgent. Choose a tool with advanced sharing of all assets including virtual user profiles, load profiles, populations, monitoring and results so that flexible teams can work together, always with and on the latest versions.
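As a tool-agnostic sketch of the SLA-in-acceptance-criteria idea from point 1 above (the one-second figure and page name come from the example story; measure_response_times is a placeholder for whichever load tool drives the 1,000 virtual users):

# Sketch: an acceptance check that keeps the 'view basket' story out of 'done'
# until the one-second SLA is met under the stated load.

SLA_SECONDS = 1.0

def check_view_basket_sla(measure_response_times):
    times = measure_response_times(page="my basket", virtual_users=1000)
    worst = max(times)
    assert worst <= SLA_SECONDS, (
        f"'my basket' took {worst:.2f}s for the slowest of 1,000 users")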

Neotys performance testing webinar, 7th October 2014

Do you want to know how to get started with the new generation of advanced, accurate, fast performance testing? Please see Henrik Rexed's article on page 4 of this issue of Professional Tester, then attend our webinar "Specifying accurate and testable performance requirements". Our experts will explain how to get the numbers nearly right first time, so you can start performance testing even faster.

To register please visit: neotys.com/ptwebinar

Simply Powerful Load & Performance Testing

Figure: cost of change over time, across discovery, design, development, testing and production




As an example sector, retail is not the most important, but it is easy to understand. Online-only retailer ASOS, 14 years old, will probably make £1bn of sales in 2014, helped by its superb e-tail site. Marks and Spencer took 90 years to reach £1bn of sales. It recently spent £150m on a "new" website whose failure in production has damaged the company financially by a far greater amount.

At the same time, digitization continually brings about completely new business models and value propositions, driven by the digitization of services and their deployment from the cloud, connected devices, and the ubiquity of the (mobile) internet or social media functionalities. This digitization disrupts entire value chains, ecosystems and the competitive landscape in all industries and will continue to do so.

Companies in all industries in B2C and B2B markets are being challenged to adapt their business and technology strategies as well as their operational processes to these fundamental changes. Many talk explicitly about undergoing a 'digital transformation'. It is critical for survival for every business today to understand what benefits and challenges digitization provides to their industry in general and to their value proposition in particular, and to formulate an overarching digital vision for their company.

What does digital transformation mean for test organizations?

1. A raft of projects focused on testing systems to ensure the seamless flow of data across all customer channels and into the back end systems. For example, many retailers are looking to become "omni-channel", whereby customers are presented with a uniform experience with a retail brand, whether they are engaging with it in store, online, via a mobile device, through a loyalty card scheme or via a service offered by a business partner (such as a "click and collect" service at another store or location).

2. Increased focus on user experience testing. The interface will become an even more important battleground as businesses compete to provide the best and most innovative digital experience. Testing teams will have to work more closely with marketing teams and digital agencies that the business engages for design and development services.

The biggest challenge will be that the testing organization will find itself pulled in two directions. In order to address business risk brought on by digital business models and interactions with customers, application development and testing processes need to change. However, large enterprises formed application development and operations teams primarily to support internal-facing business applications linked to back end systems, ERP, and analytics platforms.

Externally facing applications, for example ecommerce websites or web portals, are usually developed as separate projects (either internally or by a provider). The organizational structures, roles, culture and mindset needed to work as an integrated team across application development and operations teams are relatively new to most enterprise IT organizations.

Many businesses are already seeing a separation in how testing is performed between the core business and digital projects, not just in terms of the processes and cycle times but also in the tooling and platforms. Freely available open source tools offer a cost-effective way to support quick-fire digital projects and may not lend themselves to the more industrial toolsets used in the core business operations.

The testing leadership team must decide where to strike a balance between control (demanding that all testing must be run and managed by a centralized organization) and flexibility (offering a menu of recommended/approved tools, processes and methodologies) to ensure that the business can move at the speed that it needs to while ensuring the quality of its applications.

Cloud 2.0

Many businesses aiming to embrace digital transformation will try to do so using cloud computing. See figure 1.

The adoption of cloud is entering into a second and more ambitious phase. Most companies have already deployed SaaS offerings around the edges of their core operations in areas such as salesforce automation and workforce management, while exploiting public cloud infrastructure-as-a-service to support short-term spikes in compute requirements.

But the next five years will see businesses extend their use of both private and public cloud environments to support more critical workloads. Large enterprises will start to move their ERP platforms towards private cloud delivery, while the maturing of cloud development platforms will see more organizations build their new applications for cloud delivery from the outset.

PAC believes that cloud will pose three new challenges for the testing organization in 2015 and beyond.

Firstly, testing organizations will look to extend their use of cloud-based testing environments in order to keep up with the pace of new development projects. Many testers have used the likes of AWS or Google to spin up platforms quickly to support short-term projects, which has proved much more cost effective than investing in new servers internally.


However, they will need to take a much closer look at the economic business case for cloud-based testing platforms as the workloads increase in both volume and complexity. An increasing amount of new development work is being driven and overseen by business line leaders, which makes it more challenging for the leadership team to understand the total amount and cost of testing being performed. The testing function needs to get a clear picture of the current bills in order to make the best judgment about how to make best use of cloud going forward.

Other considerations will increasingly come to the fore. While scalability is the major advantage of using cloud-based testing platforms, clients will have to ensure that they can scale back as easily and rapidly as they ramp up their usage. As the business becomes more willing to use cloud platforms to support more critical workloads, security will be an increasingly important topic.

The testing organization will need to check their cloud provider’s security policies and robustness, particularly if applications that support customer or corporate data are in play.

The second major challenge will be the integration of new cloud-based software into the existing on-premise systems and other SaaS offerings. The share of SaaS in total application spending will increase substantially in the next three years (figure 1).

Figure 1: cloud and SaaS spending. Source: PAC

TestExpo returns to London this October! Join us to explore the theme of "Defining the Digital Testing Strategy".

20% discount for Professional Tester readers! Just quote 'PTEXPO20' in the comments field on our online registration form. Visit: testexpo.co.uk

Discount valid on full price bookings made between 15th August and 30th September 2014 only. Not to be used in conjunction with other offers.



While this trend will remove some of the testing department's traditional burden of large on-premise systems implementations, the related integration work will be an increasingly important part of its workload. This will cover testing the interaction of the systems at a functional level, and also ensuring that data sharing between the applications runs smoothly.

The third big challenge will be to keep track of the raft of new cloud-based tools that will become available and to evaluate their commercial propositions, which can often be less attractive than first perceived. For example, a pay-as-you-use deal sounds great – particularly for those businesses for whom testing is a highly cyclical activity – but not so appealing if you have to commit to pay a baseline sum over a three-year minimum sign-up period.

One of the main beneficiaries of the proliferation of cloud-based testing tools will be small and medium-sized businesses that previously did not have the budget to invest in enterprise-class offerings. Larger businesses will also be able to push their incumbent tools and services suppliers for more flexible delivery and pricing models, but they need to pay particular attention to the scalability of cloud-based offerings, which may quickly hit a ceiling in terms of their cost attractiveness.

Test tools aaS 2.0

The two dominant players in the tools space are HP and IBM, positions that they owe in part to some major acquisitions in the past decade (Mercury and Rational respectively). Both HP and IBM view the SaaS model as a way to open up their tools to smaller accounts, and also to defend their market against emerging open source and niche tools vendors which clients are exploring as they look for specific functionality.

The challenge of selecting a testing tools supplier is further complicated by the ongoing growth of open source suppliers. Businesses have been aggressive in their adoption of open source tools to support performance and load testing, due to the major cost savings that can be made, but they tend to be deployed to support specific projects or as a complement to on-premise tool suites.

Staying in the lead

So how can testing ensure that it stays on top of the new demands of digital and cloud?

For some, it will be a long, hard fight to regain absolute control, while others will refocus on central operations and position themselves as an advisor to those parts of the business driving digital projects, offering guidance on best practice, tools and suppliers. This "two-speed approach" is one that is favored by businesses in sectors such as insurance where the legacy challenges in their back and middle office are so great that they have put in separate teams to ensure that their customer-facing services keep pace with the market.

Whichever path they follow, the key to the success of the testing leadership team will be stakeholder management. While the involvement of business lines in the testing process is nothing new, it is certainly becoming more important as more businesses adopt agile development and testing models, while the focus on digital transformation means that the business has increasingly demanding expectations of the look and feel of new applications. The lines of communication also need to be opened to the chief information security officer (CISO), as the focus of cyber attacks shifts from the network perimeter to the application layer.

All this is of course set against a background of a renewed focus on software quality. Organizations have become more dependent on the performance of their applications, which means that major failures have become front-page news. This doesn’t just apply to outages suffered by online businesses such as Twitter or iTunes, but also to banks, stock exchanges, retailers and government agencies who have seen their senior executives forced to face up to media barrages following high-profile system failures in the last 12 months.

No matter how the testing function handles the balancing act, quality will remain hugely important

Nick Mayes is a research director at Pierre Audoin Consultants (http://pac-online.com). This article is based on research from its latest report “Software Testing in 2015”

Zero tolerance
by Sakis Ladopoulos

Sakis Ladopoulos proves by indisputable logic that testing should control the project

Software is as software does

The technological innovations that have changed human life so much in the last few decades, and those that will do so even more in the near future, are in software. IT hardware is now seen as a mundane underlying entity needed to support software functionality. This sudden and dramatic change of emphasis is as important as the industrial revolution of the last century.

But the industrial revolution happened relatively slowly: with mistakes, but also with time for good principles and practices to be developed and refined. That is not the case for software, and those of us who care about how good software is and how well it is produced cannot help but be keenly aware of the flawed, too diverse, often chaotic processes taking place and the great dangers of that.

So there is a tendency to think about how good manufacturing practices could be applied to software development. That is a good idea: PT is fond of pointing out that adhering more closely to some of them, in particular formality, would benefit many software organizations.

The mistake many such thinkers make – and it is a bad one – is to draw an analogy between manufactured physical products and software. There is no such analogy. The two could not be less similar.

Software is not a car
A manufacturing process aims to achieve an optimal balance between quality, productivity and cost. It is sometimes appropriate and legitimate to compromise quality. For example if a plastic moulding machine can produce, per minute, either 10 perfect toy soldiers, or 25 of which on average 5 are faulty, no-one would hesitate to turn up the speed.

The concept of tolerance exists for the same reason. The goal is not to keep all products as close as possible to the centre of the tolerance range. It is to make products that are anywhere within the range as quickly and cheaply as possible. All those products are useful.

Trying to apply this logic to software development will lead first to numerous errors causing waste and/or failure and then to software that does not meet its requirements and is a lot worse than useless.

The term manufacturing is often used to mean mass production, but software is not mass produced, nor even produced in small quantities as in, for example, manufacture by handicraft. We do not aim to produce a program more than once. Despite what some developers might say, there is no such thing as “inventory” in a software process. There is only unfinished work.

A better analogy can be drawn with fabrication: consider for example a metal works commissioned to produce a unique staircase for a certain building, to the customer’s unique specifications. If the product does not meet all of those specifications it cannot be fitted nor used and will rightly be rejected.

Perfect is the enemy of good enough
This commonly-heard aphorism does not mean that it is OK for delivered software to meet only part of its specification. That would be a contradiction in terms: its specifications define, precisely, what is good enough. Rather it means that it is a dangerous mistake to continue to improve any work product beyond the point where it meets its specifications. A design document, once it can be shown to implement everything in the requirements document upon which it is based and to comply with all prevailing standards, must not be “improved” in ways not mandated by the requirements. If improvement is desired, the requirements must be improved first, and only then the design.

The same applies to all other work products including the delivered software. Before changing it, whatever is to be changed must be traced back until the correct product in which to make the change is found: all the way back to requirements if necessary. To do otherwise is to create discrepancy, making both the changed and unchanged items unfit for their purpose.

Software is quality
We have seen that software is not at all like physical products. So what is it like?

Here lies the problem. Software is not like anything else. It is by definition highly variable and totally impalpable. That is why there are so many theories and there is so little agreement on how best to produce it.

Be honest with yourself. However good your software skills, you don’t know much about software. No-one does or can. Software is more complex than chess, bigger than the universe, better than life.

So we should look closely at what little we do know and understand, to try to go back to basics and identify that of which we are sure. So: what actually is software? What exactly is the main deliverable of a software project? This obvious question has been answered many times, but no answer is or can be satisfactory: as Willis and Danley put it, trying to answer it is like trying to nail jello to a tree. So, it is still asked often. I almost always ask it when I interview job candidates.

My own answer is that software is quality. The word cannot be used as an adjective to describe software, nor as a noun by which to call an attribute of software. Quality is the subject of software, its essence, the reason it exists. Software is what it does. If it does what it should, things happen that we want to happen. Otherwise, it causes other things, or nothing, to happen. Desirable happenings are the true, tangible deliverable: the software is used by people as its buyer intended, to do the things its buyer wanted them to do. The consequences of that not happening are more tangible still.

Development is testing
It can be argued that the second law of thermodynamics dictates that it is impossible to write – or test – code without making mistakes. Imagine a good requirements document for a complex user-facing application, and a concerted effort to implement them within which very many mistakes are made. The program delivered does nothing other than print “Hello world” to its current output device. It meets all requirements! It is just very defective. To make it behave as it should, so that it makes desirable things happen, we simply need to repair its defects. To do that, we need first to identify them

Frequent PT contributor Sakis Ladopoulos is a test manager at Intrasoft International (http://intrasoft-intl.com), testing trainer and conference speaker

Open but hidden
by Normand Glaude

Normand Glaude on a testing responsibility and technique you may have missed

From where did the code you are testing come?

A few years ago, there was some debate about whether large enterprises, governments, etc. should buy and become dependent upon systems and services that included free software components. The debate was too late, because they already had and were. Concerns about quality and security, while valid, could not stand in the light of experience which shows that, while both have defects, so far, free software has proven to be at least as good as proprietary software at (i) not having them; (ii) having them found; and (iii) getting them fixed fast. Point (ii) is also logically obvious: general availability of the source code enables third parties to use a whole array of powerful test techniques not available without it.

So, the use of complete third-party free software components is generally beneficial, but it does create a problem. This problem is something that should send a shiver down your spine when you realise its implications: code reuse, which as every tester knows is a rich source of defects, even more so when the reused code is someone else’s.

Does the product you are testing include code copied and pasted from free software source, plumbed in, and maybe hacked about a bit? What if it is defective? What if some of those defects are security vulnerabilities? You may think that an update from the original provider will fix it, but that’s only if you keep up with the updates. Either way, that code is sticky. You need to know, urgently, what it is, from where it came, and if possible, from what version. I’ll discuss how that can be done below, but first let us look at the implications of its presence.

What to do about suspect third-party code
First, find the origin of the third-party code and try to track its latest version. If it is the same as your code, minus the plumbing and hacking (and you understand the plumbing and hacking and are certain your developers did it), chances are you are current and will not learn much from the history of the code. Document your finding for repeatability.

If the code is different, whether or not you understand why, research the history of known defects in the product from which the code came. The release history may well inform easy tests for the most critical defects. But remember that testing can show only presence, never absence, of defects: those tests passing does not mean that the code is OK.

So, you must also apply whatever structural test techniques are available to you, and to the greatest extent possible: use static and dynamic analysis not to assess the quality of the code, but to create test cases with expected outcomes of which you are confident, then implement and execute those test cases, using unit test frameworks, debugging tools or instrumentation code as necessary.
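To make that concrete, here is a minimal sketch of a characterization test. The routine parse_header and its import path are hypothetical stand-ins for whatever reused code you have identified in your own codebase; the expected values stand for behaviour you have verified yourself by inspection or dynamic analysis, not behaviour taken on trust from the upstream project:

```python
# A sketch only: parse_header and its location are hypothetical stand-ins for
# whatever reused routine you have identified in your own codebase.
import pytest

from myproduct.vendored import parse_header  # hypothetical module path


@pytest.mark.parametrize("raw, expected", [
    # Expected outcomes established by inspecting and exercising the code as it
    # behaves today, not assumed from the upstream documentation.
    ("Content-Type: text/html", ("content-type", "text/html")),
    ("X-Empty:", ("x-empty", "")),
])
def test_parse_header_pinned_behaviour(raw, expected):
    assert parse_header(raw) == expected


def test_parse_header_rejects_missing_separator():
    # Upstream release notes mention defects around malformed input;
    # we expect our copy to raise rather than return garbage.
    with pytest.raises(ValueError):
        parse_header("no separator here")
```

Once such tests exist, any upstream update or further local “hacking” that changes the pinned behaviour is caught immediately.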

Finally, revise your functional test plan in the light of the new information. Identify the tests most likely to exercise the suspect code. Trace them back to the requirements they were designed to assure. Now, using what you know about known defects in the product from which the code came, apply the fundamental technique of error guessing to design additional tests to assure those requirements.

Is your product legal?
The use of other people’s code comes with conditions, as defined in the licence of the product from which it came: many different licences exist, and by their nature they often prove to be incompatible with business objectives. The first thing to establish is what licences apply and what conditions they require. For example, the MIT licence is one of the most permissive: to paraphrase, it says “you can do anything you like with this code except hold anyone else responsible for it”, but most forms of it also include the following statement:

“The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.”

Figure 1: Protecode reporting


If your product contains even a small fragment of code from another product released under such a licence, but does not include the copyright notice, you are in breach.

Most free software licences include more conditions. For example, the very widespread GPL licence uses the popular “copyleft” method, which requires anything derived from anything it covers to inherit the same licence. In other words, if your product contains any GPL code and you distribute it, your whole product is also GPL and you must make its source code available! If you don’t want to do that, you need to replace the GPL code. Now – before anyone finds out.

You may even discover that your product contains proprietary code – source or executable – which is not validly licensed at all.

Some code is also affected by other legal restrictions, eg export licences. For example, if your product borrows encryption routines (one of the most frequently reused functions) it may be illegal to distribute it in certain countries.

Identifying reused code
So, how to find the reused code in your codebase? Some instances, especially more substantial ones, may be revealed by design documentation and change management information, or by talking to programmers.

But, it is likely that some instances will not have been recorded nor remembered. My company’s product Protecode (see figure 1) detects these by scanning the product, both source and compiled code. In a similar way to how an antivirus program looks for known code, Protecode compares the target code with its vast database containing details of hundreds of millions of files. Importantly, the comparison is based on structure, not text equivalence, so it can find even code which has undergone a great deal of change.

As well as telling you from what product and version range your code came and whether it has been modified, Protecode uses the U.S. Government’s National Vulnerability Database (see http://nvd.nist.gov retrieved 12th August 2014 1400hrs UTC) to alert you to known vulnerabilities in those product versions, and reports exactly what licences and other restrictions are in effect. Using it, testers can detect defects and risks associated with code reuse as soon as they arise, at any time in the lifecycle, including very early

Normand Glaude is chief operating officer at Protecode (http://protecode.com)


Forget me not
by Edwin van Vliet

The second part of Edwin van Vliet’s series on test data management

Testing needs to come out of denial about data protection

The best test data is production data. It contains all known variations. It is consistent with the production database and external data services. Its volume is perfectly correct for relevant, cost-effective volume testing and, because it contains accurate chronological data, also for relevant, cost-effective stress testing. Collecting and using it is very easy.

So, many test organizations use production data, including private personal data, for testing and ignore the many dangers. This situation has now continued for two decades. Everyone sane knows it must stop. Testing adequately without breaking the law is inconvenient, difficult and expensive. But testing that depends on breaking the law won’t become more widely adopted, used and integral to all work involving software, as testing should.

The letter of data protection law varies and is open to interpretation, but there is no need to consider the variations or interpretations because using these in any way would be disingenuous. The spirit of the law is perfectly clear and its purpose obvious: we are obliged to do all we can to prevent any private information relating to any individual becoming known to anyone who does not need to know it for essential, legitimate, operational business reasons. Testing is not operational.

So what does ‘all we can’ mean? Does it give us the freedom to allow a tester (or a developer or anyone else) who is legally bound, and trusted, not to reveal it to anyone else nor to misuse it, to use private information? No, because putting it in a development or test environment with software that is still being tested risks breach. The only production data that can be used legally for testing is that which will not reveal private information.

Please note the important difference between “will not” and “cannot”. Most approaches to making production data safe to use for testing could be defeated by someone prepared to invest great effort in doing so. As in all data security work, the likelihood of that (which is, usually, a function of the gain possible from misuse of the information obtained) must be taken into account when deciding what prevention measures to take. The same thinking applies to information which does not come under data protection law but is sensitive for business reasons and thus could be used illegitimately: for example an organization’s internal financial information.

Understanding how the available measures work, and therefore their inherent risk, should lead to improvement: better testing with less risk of breach.

Scrambling
Replace every alpha character in the data with ‘x’ and every numeric character with ‘0’. Replacing capital letters with lower case x may cause false incidents due to input validation code. If so, replace upper case alpha characters with ‘X’.

For example, ‘jane.doe@acme.org’ becomes ‘xxxx.xxx@xxxx.xxx’.

An email address is an excellent example of a data item suitable to be anonymized by this simple and very effective method. It is unlikely to be parsed and is therefore unlikely to cause false test incidents.

Note that if the genuine data contains many instances of the characters X, x and 0, it may be changed very little by application of the method.
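For the avoidance of doubt, here is a minimal sketch of the scrambling rule in Python; the character classes and the upper-case option follow the description above, and the sample values are invented:

```python
def scramble(value, keep_case=False):
    """Replace alphabetic characters with 'x' (or 'X') and digits with '0'.

    Punctuation and separators such as '@' and '.' are left alone, so the
    shape of the value survives but its content does not.
    """
    out = []
    for ch in value:
        if ch.isalpha():
            out.append("X" if keep_case and ch.isupper() else "x")
        elif ch.isdigit():
            out.append("0")
        else:
            out.append(ch)
    return "".join(out)


# Illustrative values, not real records:
print(scramble("jane.doe42@acme.org"))                  # xxxx.xxx00@xxxx.xxx
print(scramble("Jane.Doe42@acme.org", keep_case=True))  # Xxxx.Xxx00@xxxx.xxx
```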

Shuffling
Data in one or more fields is moved between records. To most intents and purposes the data remains similar to production data, but now no record is that of a real person.

Imagine a real database of people with addresses. Sort the data in the field ‘town/city’ pseudorandomly. Has the data been anonymized? Of course not. It is trivial to write a program that uses web services to derive the town/city from other address fields.

So shuffling must be applied to more fields. But that will make them invalid, that is, they are no longer addresses that really exist, and that may cause false test incidents. Depending on the functionality under test, and how it is being tested, it is often possible to avoid this by limiting the degree of change yet still achieve sufficient anonymization. Designing (based on the test item specification and design) and implementing an algorithm to do this may require significant effort and sophisticated tools.
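As an illustrative sketch only (the records and field names are invented, and a real implementation would work against the database rather than an in-memory list), the choice between shuffling fields independently and shuffling related fields as one block is exactly the trade-off described above:

```python
import random

# Illustrative records only; the field names are assumptions, not a real schema.
people = [
    {"name": "Ann",  "street": "1 High St", "town": "Leeds", "postcode": "LS1 1AA"},
    {"name": "Bob",  "street": "2 Low Rd",  "town": "York",  "postcode": "YO1 2BB"},
    {"name": "Cara", "street": "3 Mid Ln",  "town": "Hull",  "postcode": "HU3 3CC"},
]


def shuffle_fields(records, fields, together=True, seed=None):
    """Shuffle the given fields between records.

    together=True moves the fields as one block, so each address stays
    internally consistent (a real address, just attached to the wrong
    person); together=False shuffles each field independently, which
    anonymizes more strongly but produces addresses that do not exist
    and may therefore cause false test incidents.
    """
    rng = random.Random(seed)
    if together:
        blocks = [{f: r[f] for f in fields} for r in records]
        rng.shuffle(blocks)
        for record, block in zip(records, blocks):
            record.update(block)
    else:
        for f in fields:
            values = [r[f] for r in records]
            rng.shuffle(values)
            for record, value in zip(records, values):
                record[f] = value
    return records


shuffle_fields(people, ["street", "town", "postcode"], together=True, seed=1)
```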

Blurring
Purely numeric fields are adjusted numerically. For example the year in the date of birth of a real person may be randomized between the real year plus or minus 10, the month between 1 and 12, and the day between 1 and 28/29/30/31 according to the random month and year: in other words, the person’s date of birth is replaced with a random, but feasible, date of birth.


Whether this will cause false test incidents again depends on the test. It may be possible to prevent it by applying further constraints based on product or test specification. For example, the blurring algorithm can check to see whether the blur causes the known input equivalence partition in which the date lies to change: for example if a person aged 18 appears, after blurring, to be 17. If such a change is detected, the blur is redone, until a value that causes no change of partition is obtained.
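A minimal sketch of that redo-until-stable idea, assuming for illustration a single partition boundary at age 18 and a fixed reference date used by the system under test (both are assumptions, not part of the method itself):

```python
import calendar
import random
from datetime import date

rng = random.Random()

AGE_BOUNDARIES = (18,)             # assumed partition boundary, for illustration only
REFERENCE_DATE = date(2014, 8, 1)  # assumed "today" used by the system under test


def age_partition(dob):
    """Identify the age partition a date of birth falls into (0 = under 18, 1 = 18+ here)."""
    age = (REFERENCE_DATE - dob).days // 365  # rough age, good enough for partitioning
    return sum(1 for boundary in AGE_BOUNDARIES if age >= boundary)


def blur_dob(dob):
    """Replace a date of birth with a random but feasible one in the same partition."""
    while True:
        year = dob.year + rng.randint(-10, 10)
        month = rng.randint(1, 12)
        day = rng.randint(1, calendar.monthrange(year, month)[1])
        candidate = date(year, month, day)
        if age_partition(candidate) == age_partition(dob):
            return candidate  # same partition, so the blur does not distort test coverage


print(blur_dob(date(1990, 6, 15)))
```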

Again depending on whether they will cause false positives or reduce test coverage, easier but still effective methods are sometimes available. A good example is simply to set all values in the ‘day’ field to 1. In her paper Simple Demographics Often Identify People Uniquely (Carnegie Mellon University, 2000: see http://dataprivacylab.org/projects/identifiability/paper1.pdf retrieved 11th August 2014 1500hrs UTC), Latanya Sweeney shows that 87% of US citizens can likely be identified from their gender, zip code and date of birth, but taking away the day of the month reduces this to 3.7%. Obviously the full records of people who actually were born on the first of the month should be removed.

Replacing
Some data fields are designed to be unique: for example what is called in the UK ‘National Insurance Number’ and in the Netherlands ‘Burgerservicenummer’. This number, associated with any other information during testing, allows that information to be associated with the individual to whom it pertains.

For many tests, fields like this can be shuffled, but this is not sufficient to protect identity. Because the field data is unique, someone who knows what it is for a specific person and finds it in a given database knows that more information about that person is also in the database. If they have any other information about the person, eg name, it is a simple matter to make the search very narrow.

So, the only effective approach is to scramble and/or blur parts of the field data itself. But for many tests, it must be kept valid according to what is considered valid by that test and anything it invokes, or false incidents will result. For example, a Dutch BSN is nine digits. It is unique, so it must be replaced. But it is subject to restrictions defining whether it is or is not valid: a weighted ‘11 proof’ which ensures that at least 2 digits are different between any two valid BSNs. This creation algorithm, because it is known, enables a valid ‘fictive’ value to be created to replace the real value.
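As a sketch, the weights below are the commonly documented BSN form of the ‘11 proof’ (note the -1 weight on the final digit); check them against the official rules before building anything real on them:

```python
import random

rng = random.Random()

# Weights for the BSN form of the Dutch '11 proof' as commonly documented
# (note the -1 weight on the last digit). Verify against the official rules
# before relying on this in a real anonymization tool.
BSN_WEIGHTS = (9, 8, 7, 6, 5, 4, 3, 2, -1)


def is_valid_bsn(bsn):
    """Check the nine-digit weighted '11 proof' for a candidate BSN."""
    if len(bsn) != 9 or not bsn.isdigit():
        return False
    return sum(w * int(d) for w, d in zip(BSN_WEIGHTS, bsn)) % 11 == 0


def fictive_bsn():
    """Generate a random, structurally valid ('fictive') BSN to replace a real one."""
    while True:  # roughly 1 in 11 random candidates passes, so this returns quickly
        candidate = "".join(str(rng.randint(0, 9)) for _ in range(9))
        if is_valid_bsn(candidate):
            return candidate


replacement = fictive_bsn()
assert is_valid_bsn(replacement)
```

Note that a randomly generated value could still coincide with a real person’s number, so production-grade tools typically also track or reserve the values they issue.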

Chain depersonalization
Usually, anonymization must be applied consistently to avoid loss of data integrity and false positives. If similar or related data exists in multiple tables and databases, it needs to be anonymized in the same way in all of them. Many tests will depend on structure: for example suppose we shuffle surnames but the test item deals with family relations. We must use the same shuffling key values on everyone in the family, or more likely everyone who shares a surname, or even a similar surname.

In order to be able to undo or unravel part of the anonymization where necessary, many test data tools use ‘translation tables’ which record what key values have been applied to what records. Retaining these translation tables after the anonymization process is clearly dangerous: they must be kept highly secure, for example by allowing access only via the anonymization tool.
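A minimal sketch of the translation-table idea follows; the surname pool and the JSON file are illustrative, and in practice the table lives inside the anonymization tool behind strict access control:

```python
import json
import random

rng = random.Random()

# Illustrative pool only; a real tool would use a much larger dictionary or a
# deterministic pseudonym generator.
REPLACEMENT_POOL = ["Archer", "Baker", "Carter", "Dyer", "Fisher", "Mason"]


class TranslationTable:
    """Maps each original value to one pseudonym, consistently across tables,
    databases and refresh runs, so related records stay related."""

    def __init__(self, mapping=None):
        self.mapping = dict(mapping or {})

    def translate(self, original):
        if original not in self.mapping:
            self.mapping[original] = rng.choice(REPLACEMENT_POOL)
        return self.mapping[original]

    def save(self, path):
        # The saved table can undo the anonymization, so it must be protected
        # as strictly as the production data itself.
        with open(path, "w") as f:
            json.dump(self.mapping, f)

    @classmethod
    def load(cls, path):
        with open(path) as f:
            return cls(json.load(f))


table = TranslationTable()
assert table.translate("Jansen") == table.translate("Jansen")  # consistent every time
```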

The translation table can also be reused to anonymize further data for refresh purposes. Some organizations pass anonymized new production data to test environments frequently. To prevent degradation of test effectiveness and for security the translation table is refreshed too, but perhaps less frequently. This saves work because the new data is anonymized, in chosen respects, in exactly the same way as the old, enabling existing test cases and their associated test data to be kept valid without requiring maintenance

Edwin van Vliet is a senior consultant at Suprida. The first article in this series appeared in the July 2012 issue of Professional Tester (http://professionaltester.com/magazine/backissue/PT015/ProfessionalTester-July2012-vanVliet.pdf). Next: test data strategy

