Utku ÖZBEK
2006703363
Outline
- Introduction
- Group Testing For Reliability
- Application of this Reliability Model
  - Weather services application
  - Results of the application
- ASTRAR Group Testing
- Application of this Group Testing Model
  - A real-time stock-buy-sell web service application
  - Results of the application
- Conclusion
- References
- Question & Answer
Introduction
Software development is shifting from the product-oriented paradigm to the service-oriented paradigm.
Web Services (WS) are services offered through Web and Internet technology. Examples: a tax return service, a stock ranking service, and an equation-solving service.
The same service can be offered by many service providers, based on the same theories but different implementations. This raises a trustworthiness and dependability problem.
Introduction
Under SOA and WS, a system consists of a collection of loosely coupled services.
These services can make use of each other's services to achieve their own desired goals and end results.
Simple services can cooperate in this way to form a complex or composite service, dynamically and at runtime.
Introduction
History of WS testing:
- In phase one, WS were essentially tested like ordinary software.
- In phase two (2003-2005), testing also covered: the publishing, finding, and binding capabilities of an SOA (Service-Oriented Architecture); the asynchronous capabilities of WS; the SOAP (Simple Object Access Protocol) intermediary capability; and the quality of services.
- In phase three (2004 and beyond), testing also covered dynamic runtime capabilities, WS versioning, and WS orchestration testing, which invokes remote WS in a specific order to test their interoperability.
Introduction
History of WS testing:
- Both clients and service providers must be involved in WS testing.
- Several issues must be addressed during WS development, including: security, interoperability, UDDI (Universal Description, Discovery, and Integration) registration, and performance considerations.
Introduction
This presentation proposes:
- a Service-Oriented software Reliability Model (SORM). This model evaluates the reliability of WS in two steps: use highly efficient group testing to evaluate the reliability of atomic services, then evaluate the reliability of a composite service based on the reliabilities of the component services.
- a technique to test a large number of WS simultaneously: to determine the oracle and the correctness of the WS under test by majority voting, and to provide a quality ranking of the WS and the test cases.
Group Testing For Reliability
WebStrar provides:
- Web services testing
- Reliability assessment
- Ranking services
- Directory services
Group Testing For Reliability
WebStrar can take registrations from service providers and various kinds of service brokers.
It considers the registered services as atomic services and uses them to compose composite services (both notions are sketched below):
- An atomic service is a service agent, submitted by a service provider, that does not call other WS and thus should be treated as a unit that is not to be broken, like an atom.
- A composite service is a service agent, submitted by a service provider, that uses (calls) other WS.
- Both atomic and composite services can be provided by WebStrar directly to the clients.
Group Testing For Reliability
The group testing technique was originally developed for testing large samples of blood: it tests an entire group for contamination by applying one test.
Here it is used to test complex composite WS at runtime.
Group Testing For Reliability
Assume CSn is a composite service consisting of n services S1, S2, ..., Sn, where each Si can be an atomic service. Assume services S11, S12, ..., S1m are functionally equivalent to the service S1 in CSn.
- We can forward (broadcast) the input of S1 to S11, S12, ..., S1m.
- The results from all services, including that from S1, are voted on by a voting service.
- The voting is weighted based on the current reliabilities of the services under test. The voting service can set the initial weight of each incoming service to zero, while the existing service S1's weight is its reliability R(S1).
- The voting service detects faults by comparing the output of each service with the weighted-majority output. A disagreement indicates a fault. (A minimal sketch of this vote follows.)
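The sketch below shows one way such a weighted majority vote could work; these are assumed mechanics, since the actual voting service is stochastic and also considers output deviation. Each candidate output accumulates the reliabilities of the services that produced it, and the heaviest output wins.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

def weighted_majority_vote(
    outputs: Dict[str, Any],          # service name -> output for this input
    reliabilities: Dict[str, float],  # service name -> current reliability weight
) -> Tuple[Any, List[str]]:
    """Return the weighted-majority output and the services that disagree with it."""
    weight: Dict[Any, float] = defaultdict(float)
    for service, output in outputs.items():
        weight[output] += reliabilities.get(service, 0.0)  # new services start at weight 0
    # Ties (e.g., all weights zero) are broken arbitrarily in this sketch.
    majority = max(weight, key=weight.get)
    disagreeing = [s for s, o in outputs.items() if o != majority]
    return majority, disagreeing

# Example: S1 is the existing service with weight R(S1); S11..S13 are incoming.
outputs = {"S1": 42, "S11": 42, "S12": 41, "S13": 42}
reliabilities = {"S1": 0.9, "S11": 0.0, "S12": 0.0, "S13": 0.0}
majority, faults = weighted_majority_vote(outputs, reliabilities)
print(majority, faults)  # 42 ['S12'] -- the disagreement indicates a fault in S12
```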
Group Testing For Reliability
The reliability of the services is calculated using the formula

R(S, t + ∆t) = (M · R(S, t) + (k − f)) / (M + k)

where:
- R(S, t) is the reliability of service S at time point t;
- in the next ∆t time, k runs are executed and f disagreements have been detected;
- M is the total number of tests that the service has ever undergone.
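In code, the update is a running weighted average of the reliability history and the newly observed success rate. The sketch below assumes the formula above; the function name is illustrative.

```python
def update_reliability(r_prev: float, m_total: int, k_runs: int, f_failures: int) -> float:
    """Fold k new runs with f disagreements into a reliability history of m_total tests."""
    return (m_total * r_prev + (k_runs - f_failures)) / (m_total + k_runs)

# Example: R(S, t) = 0.90 over M = 100 tests; 10 new runs with 2 disagreements.
r_next = update_reliability(0.90, 100, 10, 2)
print(round(r_next, 3))  # 0.891 -- the reliability drops slightly after the two failures
```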
Group Testing For Reliability
The advantages of the model include:
- One of the toughest problems in software testing is constructing an oracle that can determine whether a fault has occurred. In this model, the voting service serves as the oracle, according to the majority principle.
- The model estimates the reliability of each incoming service while performing normal operation. In other words, the incoming services are tested in the real operational environment, at no extra time, if sufficient computing power is available.
- The model is dynamic, i.e., the data are collected and computed at runtime in real time. The reliability of each service involved in group testing is updated after each run or after a given period of time.
Group Testing For Reliability
One situation in which SORM would not work well is when there are no alternative services available.
In this case, the SOA is basically degraded to the traditional software architecture: the service is tested only by the service provider, in its development cycle.
However, this is an unlikely situation, because SOA is an open platform that allows and encourages cooperation and competition among service providers to create increasingly improved services.
Application of Reliability Models
An example to illustrate the application of the proposed service-oriented reliability model:
- Assume a space agency plans to launch a satellite on a specific date and from a specific location.
- The launch heavily depends on the weather conditions at the launch location, including rain, wind, and temperature.
- The example uses 10 independent weather services, each of which offers three component services: RainForecast, TempForecast, and WindForecast.
- The forecasts are given as probabilities.
Evaluation of Component Services
To build trust in the reliability of the component services, the space agency puts them in a group testing framework and sets their initial reliability to zero.
After a period of group testing, the space agency has a reliability estimate for each service. Table 2 shows a set of sample results obtained in the experiments.
Evaluation of Component Services
- The first column of the table lists the component services under test.
- The second column shows the highest reliability of the service in the given test period.
- Column 3 shows the forecast probabilities of heavy rain, extreme temperature, and strong wind, respectively, from the component services.
- Column 4 shows the adjusted forecast probabilities, obtained by taking the reliability of the service into account; these are the final evaluation values for the component services.
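The slides do not spell out the adjustment itself, but a natural reading (an assumption here, not stated in the original) is that the forecast probability is discounted by the service's reliability. For example, a RainForecast service with reliability R = 0.90 forecasting heavy rain with probability p = 0.80 would yield an adjusted value of roughly R × p = 0.72.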
Evaluation of Composite Services
To base the decision of whether to change the launch date on the most accurate weather-forecasting information, the space agency then constructed a composite service, as shown in Figure 3.
The decision is based on two factors:
- The numbers in the diamond boxes are the reliabilities of the best component services.
- The numbers on the branches are the probabilities forecast by the best service.
Evaluation of Composite Services
[Figure 3: the composite service constructed by the space agency]
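As a rough illustration of how the two factors might combine, the sketch below walks one path of such a decision tree, multiplying branch probabilities for the forecast outcome and component reliabilities for the confidence in that outcome. The structure and numbers are hypothetical; Figure 3's actual layout and decision logic are not reproduced in the slides.

```python
from typing import List, Tuple

def evaluate_path(path: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Each step is (branch_probability, component_reliability).
    Returns (probability of this outcome, confidence in the estimate)."""
    probability = 1.0
    confidence = 1.0
    for branch_prob, reliability in path:
        probability *= branch_prob   # the numbers on the branches
        confidence *= reliability    # the numbers in the diamond boxes
    return probability, confidence

# Hypothetical path: heavy rain (p=0.2, R=0.90), then strong wind (p=0.3, R=0.85).
p, c = evaluate_path([(0.2, 0.90), (0.3, 0.85)])
print(f"outcome probability {p:.2f}, confidence {c:.3f}")  # 0.06, 0.765
```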
Evaluation of Composite Services
Assume that the launch plan is made a year before the launch date, and that the composite service is up and running from day one.
At the beginning, the space agency has little data about the reliability of each service, and a weather forecast made a year before the launch date will not be accurate either.
However, by a month or a week before the launch, there are already sufficient data about the reliability of the services.
These reliability data can be reused in future applications: when the agency plans its next launch, or another event that needs weather forecasts, it already has the reliability data.
Results of the application
Design of Experiments (DOE) is an engineering technique that can be used to determine the extent of the impact of the parameters (factors) of a model on the final results.
The authors apply DOE to analyze the impact of the reliability of the component services on the reliability of the composite service. There are three factors in the example: the reliabilities of RainForecast, TempForecast, and WindForecast.
They use a 2-level DOE technique, i.e., high and low values for each factor: RainForecast (70%, 90%), TempForecast (90%, 99%), and WindForecast (85%, 95%).
The 3-factor, 2-level design generated an ANOVA (ANalysis Of VAriance) table.
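To make the design concrete, the sketch below enumerates the full 2³ factorial and estimates each factor's main effect (mean response at the high level minus mean at the low level). The response function is a stand-in assumption, since the slides do not give the composite-reliability formula.

```python
from itertools import product

# Factor levels from the slides: (low, high) reliabilities.
factors = {
    "RainForecast": (0.70, 0.90),
    "TempForecast": (0.90, 0.99),
    "WindForecast": (0.85, 0.95),
}

def composite_reliability(rain: float, temp: float, wind: float) -> float:
    # Placeholder response: all three forecasts must be correct (an assumption).
    return rain * temp * wind

names = list(factors)
runs = []  # (level pattern, response) for all 2^3 = 8 combinations
for levels in product((0, 1), repeat=3):
    values = [factors[n][lvl] for n, lvl in zip(names, levels)]
    runs.append((levels, composite_reliability(*values)))

for i, name in enumerate(names):
    high = [r for lv, r in runs if lv[i] == 1]
    low = [r for lv, r in runs if lv[i] == 0]
    effect = sum(high) / len(high) - sum(low) / len(low)
    print(f"main effect of {name}: {effect:+.4f}")
```

Under this placeholder response, RainForecast has the largest main effect (its low and high levels are furthest apart), which is consistent with the slides' later observation that the RainForecast service matters most.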
Results of the application
The F-value represents the significance of the impact of a model and its components.
In general, if a component generates a significance value "Prob > F-Value" of less than 0.05, the impact of the component is significant.
Results of the application
The experiment results in Table 3 also show that the significance values ("Prob > F-Value") of RainForecast, TempForecast, and WindForecast are all less than 0.0001, and thus they are all significant model components.
Results of the application
- The higher the component reliability, the higher the overall reliability.
- The impact of the RainForecast service is much more significant than that of the others.
- The space agency should therefore pay more attention to the quality of the rain-forecast service provider.
Results of the application
The evaluation process is dynamic and performed at runtime.
The vast number of WS available online makes it necessary to perform group testing, which, in turn, makes it possible to identify the correct service output without having to design an oracle.
ASTRAR Group Testing
ASTRAR is a technique to test a large number of WS simultaneously, to determine the oracle and the correctness of the WS under test by majority voting, and to provide a quality ranking of the WS and the test cases.
It can be used by WS providers, brokers, and clients:
- A WS provider or client can use the technique to find the best WS for composing new services or applications. For example, a WS provider can compose a digital imaging service using the Fast Fourier Transformation service as a component service.
- A WS broker can use the technique to evaluate the quality of WS applying for registration, to make sure that only WS with reasonable quality are offered to the public.
ASTRAR Group Testing
These techniques are used to rank different WS implementations based on the same specification, the same business logic, and the same input and internal states.
In other words, the WS under group testing should produce the same or close results if the same inputs are applied; e.g., various Fast Fourier Transformation WS should produce the same or close results for the same input.
ASTRAR Group Testing
The technique proposed here has the following advantages:
- It can test a large number of WS rapidly and rank them according to the test results.
- It can automatically create the oracle of test cases, i.e., the expected outputs for the given inputs.
- It can rank the effectiveness of test cases and thus apply the most effective test cases first, to eliminate unacceptable WS quickly.
- Most of the steps in the process can be completely automated, and this feature makes the process attractive for commercial applications.
ASTRAR Group Testing
A group testing technique, originally developed for testing a large number of blood samples and later used for software regression testing, is an attractive solution to the following situation:
- The Service-Oriented Architecture (SOA) based WS broker allows WS developers and providers to freely register WS and to compose complex WS from other WS dynamically.
- As a result, for each WS specification, many alternative implementations may be available.
ASTRAR Group Testing
ASTRAR can test a large number of WS at both the unit and integration levels. At each level, the testing process has two phases:
- Phase 1: Training Phase
- Phase 2: Volume Testing Phase
Phase 1: Training Phase
The process assumes that a reasonably large number of test inputs or test cases are available to test the WS concerned before the start of this phase.
1) Select a subset of WS randomly from the set of all WS to be tested. The size of the subset is decided experimentally.
2) Group testing: apply each test case in the given set of test cases to test all the WS in the selected subset.
3) Voting: for each test input, the outputs from the WS under test are voted on by a stochastic voting mechanism based on majority and deviation voting principles.
4) Failure detection and reliability computation: compare the majority output with each individual output. A disagreement indicates a component failure. A dynamic reliability model is used to compute the reliability of each WS based on the failure rate and other factors.
Phase 1: Training Phase
5) Oracle establishment: if a clear majority output is found, the output is used to form the oracle of the test case that generated it. A confidence level is defined based on the extent of the majority. The confidence level is also dynamically adjusted in Phase 2.
6) Test case ranking: test cases are ranked according to their fault-detection capability, which is proportional to the number of failures they detect. In Phase 2, the higher-ranked test cases are applied first, to eliminate the WS that fail them. (Steps 5 and 6 are sketched after this list.)
7) WS ranking: the stochastic voting mechanism not only finds a majority output, but also ranks the WS under group testing according to their average deviation from the majority output.
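A simplified sketch of steps 5 and 6 follows, under assumed mechanics: the oracle is the unweighted majority output, its confidence level is the fraction of services that agree with it, and test cases are ranked by the failures they have exposed. All names are illustrative.

```python
from collections import Counter
from typing import Any, Dict, Optional, Tuple

def establish_oracle(outputs: Dict[str, Any]) -> Tuple[Optional[Any], float]:
    """Form an oracle from a clear majority output; confidence = share of agreeing WS."""
    counts = Counter(outputs.values())
    output, votes = counts.most_common(1)[0]
    if votes <= len(outputs) // 2:  # no clear majority -> no oracle yet
        return None, 0.0
    return output, votes / len(outputs)

failures_detected: Counter = Counter()  # test case id -> number of failures it exposed

def run_test_case(case_id: str, outputs: Dict[str, Any]) -> None:
    oracle, confidence = establish_oracle(outputs)
    if oracle is None:
        return
    failing = [ws for ws, out in outputs.items() if out != oracle]
    failures_detected[case_id] += len(failing)

run_test_case("tc-1", {"WS-A": 3, "WS-B": 3, "WS-C": 7})   # exposes one failure
run_test_case("tc-2", {"WS-A": 1, "WS-B": 1, "WS-C": 1})   # all agree, exposes none
# Step 6: rank test cases by fault-detection capability.
print(failures_detected.most_common())  # [('tc-1', 1), ('tc-2', 0)]
```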
Phase 1: Training Phase
By the end of the training phase:
- the selected sample WS have been tested;
- the test cases are ranked by their capability so far in detecting failures;
- the oracles for the test cases are established, with respect to their confidence levels;
- the sample WS are ranked.
Phase 2: Volume Testing Phase
This phase continues to test the remaining WS and any newly arrived WS, based on the profiles and history (test case effectiveness, oracles, and WS ranking) obtained in the training phase.
Phase 2 continues to rank the WS, rank the test cases, and update the oracles:
1) Test cases were ranked by their capability to detect failures/faults in Phase 1. They are now divided into layers, with layer one having the highest capability.
2) Select the layer-one test cases and apply them in the next step.
3) For each layer of test cases, group-test all the WS.
Phase 2: Volume Testing Phase
4) If an oracle with an acceptable confidence level (e.g., greater than 50%) exists, no voting is necessary. Use the oracle to detect failures: determine whether each WS has produced a correct answer, then compute the failure rate and possibly the reliability of each WS using the given reliability model.
5) If no oracle with an acceptable confidence level exists, use the voting mechanism to detect failures, as described in Phase 1.
6) Update the confidence level of the oracles: an agreement between the oracle and the current test output increases the confidence level; a disagreement decreases it accordingly.
7) Update the ranking of the test cases by including the new number of failures detected.
Phase 2: Volume Testing Phase
8) Update the ranking of the WS and eliminate the WS that have an unacceptable failure rate or reliability. The elimination of unnecessary testing in this step saves testing time.
9) Select the next layer of test cases, and return to step 3.
By the end of Phase 2 group testing: all the available WS have been tested and a short list of WS is ranked; the test cases are updated and ranked; the oracles and their confidence levels are updated. (A condensed sketch of this loop follows.)
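Below is a condensed sketch of the Phase 2 loop under the same assumed mechanics as before: layered test cases, oracle-based failure detection when the oracle is confident enough, confidence updates, and elimination of WS whose observed reliability drops too low. Thresholds and names are illustrative, and the fallback to voting (step 5) is omitted.

```python
from typing import Any, Callable, Dict, List

def volume_testing_phase(
    ws_pool: Dict[str, Callable[[Any], Any]],   # WS name -> callable implementation
    layers: List[List[Any]],                    # test inputs; layer 1 = most effective
    oracles: Dict[Any, Any],                    # test input -> expected output
    confidence: Dict[Any, float],               # test input -> oracle confidence level
    min_reliability: float = 0.5,
) -> Dict[str, float]:
    """Return the surviving WS with their observed pass rates."""
    passes = {ws: 0 for ws in ws_pool}
    runs = {ws: 0 for ws in ws_pool}
    for layer in layers:                             # step 9: next layer, back to step 3
        for test_input in layer:
            if confidence.get(test_input, 0.0) <= 0.5:
                continue                             # step 5 would fall back to voting
            expected = oracles[test_input]
            for ws, impl in list(ws_pool.items()):   # step 3: group-test all WS
                runs[ws] += 1
                if impl(test_input) == expected:     # step 4: oracle detects failures
                    passes[ws] += 1
                    confidence[test_input] = min(1.0, confidence[test_input] + 0.01)
                else:                                # step 6: adjust oracle confidence
                    confidence[test_input] = max(0.0, confidence[test_input] - 0.01)
                # Step 8: eliminate WS whose reliability is unacceptable.
                if runs[ws] >= 5 and passes[ws] / runs[ws] < min_reliability:
                    del ws_pool[ws]
    return {ws: passes[ws] / runs[ws] for ws in ws_pool if runs[ws]}
```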
The same process can be applied at the integration testing level: if a composite WS consists of n different units of WS, the ASTRAR group testing technique can be applied to this composite WS by considering each composite as an individual WS in the group testing.
Application of this Group Testing Model
A real-time stock-buy-sell WS is used as an example to illustrate the application of the ASTRAR technique.
- The WS under development consist of a server WS and multiple client WS, residing in different locations. A client can send requests to the server, and the server responds to the requests.
- All WS under group testing implement the same specification.
- The WS server offers two functions, and the client WS can access these two functions.
- The database consists of objects of stock information, defined in the class Stock.
Application of this Group Testing Model
Each stock object is set to an initial value at a certain time point.
The evaluation engine then uses randomly generated purchase and sale information, or replays data from a past stock dump, to decide the price dynamically, once every minute.
Once the price changes, the other members (the percentage changes over a minute, a day, a month, and a year) of each stock object are computed and updated, as sketched below.
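The slides do not show the Stock class itself, so the sketch below is a guess at its shape: a price that the evaluation engine updates once per minute, plus the derived percentage-change members. Field names and the update rule are assumptions.

```python
import random
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class Stock:
    symbol: str
    price: float                                              # initial value at some time point
    history: Dict[str, float] = field(default_factory=dict)   # reference prices per period
    changes: Dict[str, float] = field(default_factory=dict)   # percentage changes per period

    def update_price(self, new_price: float) -> None:
        """Called by the evaluation engine once every minute."""
        self.price = new_price
        # Recompute the percentage-change members against the reference prices.
        self.changes = {
            period: 100.0 * (self.price - ref) / ref
            for period, ref in self.history.items()
        }

# The engine decides the new price from randomly generated purchase/sale pressure
# (or from replayed past stock data).
stock = Stock("ACME", price=100.0, history={"minute": 100.0, "day": 95.0})
pressure = random.uniform(-0.01, 0.01)   # net buy/sell effect this minute
stock.update_price(stock.price * (1 + pressure))
print(stock.changes)
```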
Results of the application
The size of the subset (the training size) is critical:
- The smaller the size, the cheaper (fewer test runs) the testing and ranking process.
- However, the smaller the size, the higher the probability that the training phase fails to find the correct oracle.
- An incorrect oracle will lead to an incorrect ranking of the WS under test, while an incorrect ranking of the test cases may result in more test runs in Phase 2 of the ASTRAR process.
Another factor that affects the testing cost is the target size, the number of WS to be ranked. For a given large number of WS to be tested, only a short list of the best WS needs to be ranked.
Results of the application
The authors proposed an efficient process to test a large number of web services designed based on the same specification.
The process is divided into two phases. In Phase 1 (the training phase), a selected number of WS are tested and their results are voted on. The purpose of the first phase is to establish the oracle and identify the most powerful test cases.
In Phase 2, no voting is applied, and the oracle created in Phase 1 is used to judge the correctness of the WS under test. Furthermore, the powerful test cases are applied first, so that incorrect WS can be eliminated in a few tests.
The experiment results reveal that the smaller the training size, the lower the cost. However, a small training size can lead to an incorrect oracle, and thus to an incorrect WS ranking. A small training size can also lead to an incorrect test case ranking, resulting in a higher test cost in Phase 2. Therefore, it is critical to select a reasonably sized training set in WS group testing.
As future work, the authors plan to explore the impact of the age of the test cases.
Conclusion
The authors have proposed:
- a Service-Oriented software Reliability Model (SORM), which generates voted information on the fly, without using an oracle;
- a technique to test a large number of WS simultaneously, which uses an oracle to test the correctness of new web services.
References
[1] W. T. Tsai, D. Zhang, Y. Chen, H. Huang, R. Paul, and N. Liao, "A Software Reliability Model for Web Services," 8th IASTED International Conference on Software Engineering and Applications, Cambridge, MA, November 2004, pp. 144-149.
[2] W. T. Tsai, X. Wei, Y. Chen, B. Xiao, R. Paul, and H. Huang, "Developing and Assuring Trustworthy Web Services," 7th IEEE International Symposium on Autonomous Decentralized Systems (ISADS), April 2005, pp. 43-50.
[3] W. T. Tsai, X. Wei, Y. Chen, B. Xiao, R. Paul, and H. Huang, "Adaptive Testing, Oracle Generation, and Test Case Ranking for Web Services," 29th Annual International Computer Software and Applications Conference (COMPSAC'05), 2005.
[4] W. T. Tsai, Y. Chen, R. Paul, N. Liao, and H. Huang, "Cooperative and Group Testing in Verification of Dynamic Composite Web Services," Workshop on Quality Assurance and Testing of Web-Based Applications, September 2004, pp. 170-173.
[5] W. T. Tsai, Y. Chen, and R. Paul, "Specification-Based Verification and Validation of Web Services and Service-Oriented Operating Systems," Proc. of IEEE WORDS, Sedona, February 2005.