Utku ÖZBEK
2006703363
Outline
- Introduction
- Group Testing For Reliability
- Application of this Reliability Model
  - Weather services application
  - Results of the application
- ASTRAR Group Testing
- Application of this Group Testing Model
  - A real-time stock-buy-sell web service application
  - Results of the application
- Conclusion
- References
- Question & Answer
Introduction
Software development is shifting from the product-oriented paradigm to the service-oriented paradigm.
Web Services (WS) are services offered through Web and Internet technology. Examples: a tax return service, a stock ranking service, and an equation-solving service.
The same service can be offered by many service providers, based on the same theories but different implementations. This raises a trustworthiness and dependability problem.
Introduction
Under SOA and WS, a system consists of a collection of loosely coupled services.
These services can make use of each other's services to achieve their own desired goals and end results.
Simple services can cooperate in this way to form a complex or composite service, dynamically and at runtime.
Introduction
History of WS testing:
- In phase one, WS were essentially tested like ordinary software.
- In phase two (2003-2005), testing also covered: the publishing, finding, and binding capabilities of an SOA (Service-Oriented Architecture); the asynchronous capabilities of WS; the SOAP (Simple Object Access Protocol) intermediary capability; and the quality of services.
- In phase three (2004 and beyond), testing also covered dynamic runtime capabilities, WS versioning, and WS orchestration testing, which invokes remote WS in a specific order to test their interoperability.
Introduction
History of WS testing:
- Both clients and service providers must be involved in WS testing.
- Several issues must be addressed during WS development, including: security, interoperability, UDDI (Universal Description, Discovery, and Integration) registration, and performance considerations.
Introduction
This presentation proposes:
- a Service-Oriented software Reliability Model (SORM). This model evaluates the reliability of WS in two steps: use highly efficient group testing to evaluate the reliability of atomic services, then evaluate the reliability of a composite service based on the reliabilities of the component services.
- a technique to test a large number of WS simultaneously: to determine the oracle and the correctness of the WS under test by majority voting, and to provide a quality ranking of the WS and the test cases.
Group Testing For Reliability
WebStrar provides:
- Web services testing
- Reliability assessment
- Ranking services
- Directory services
Group Testing For Reliability
WebStrar can take registrations from service providers and various kinds of service brokers.
It considers the registered services as atomic services and uses them to compose composite services (both notions are sketched below):
- An atomic service is a service agent, submitted by a service provider, that does not call other WS and thus should be treated as a unit that is not to be broken, like an atom.
- A composite service is a service agent, submitted by a service provider, that uses (calls) other WS.
- Both atomic and composite services can be provided by WebStrar directly to the clients.
Group Testing For Reliability
The group testing technique was originally developed for testing large samples of blood: it tests an entire group for contamination by applying one test.
Here it is used to test complex composite WS at runtime.
Group Testing For Reliability
Assume CSn is a composite service consisting of n services S1, S2, ..., Sn, where each Si can be an atomic service. Assume services S11, S12, ..., S1m are functionally equivalent to the service S1 in CSn.
- We can forward (broadcast) the input of S1 to S11, S12, ..., S1m.
- The results from all services, including that from S1, are voted on by a voting service.
- The voting is weighted based on the current reliabilities of the services under test. The voting service can set the initial weight of each incoming service to zero, while the existing service S1's weight is its reliability R(S1).
- The voting service detects faults by comparing the output of each service with the weighted-majority output. A disagreement indicates a fault. (A minimal sketch of this vote follows.)
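The sketch below shows one way such a weighted majority vote could work; these are assumed mechanics, since the actual voting service is stochastic and also considers output deviation. Each candidate output accumulates the reliabilities of the services that produced it, and the heaviest output wins.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

def weighted_majority_vote(
    outputs: Dict[str, Any],          # service name -> output for this input
    reliabilities: Dict[str, float],  # service name -> current reliability weight
) -> Tuple[Any, List[str]]:
    """Return the weighted-majority output and the services that disagree with it."""
    weight: Dict[Any, float] = defaultdict(float)
    for service, output in outputs.items():
        weight[output] += reliabilities.get(service, 0.0)  # new services start at weight 0
    # Ties (e.g., all weights zero) are broken arbitrarily in this sketch.
    majority = max(weight, key=weight.get)
    disagreeing = [s for s, o in outputs.items() if o != majority]
    return majority, disagreeing

# Example: S1 is the existing service with weight R(S1); S11..S13 are incoming.
outputs = {"S1": 42, "S11": 42, "S12": 41, "S13": 42}
reliabilities = {"S1": 0.9, "S11": 0.0, "S12": 0.0, "S13": 0.0}
majority, faults = weighted_majority_vote(outputs, reliabilities)
print(majority, faults)  # 42 ['S12'] -- the disagreement indicates a fault in S12
```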
Group Testing For Reliability
The reliability of the services is calculated using the formula

R(S, t + ∆t) = (M · R(S, t) + (k − f)) / (M + k)

where:
- R(S, t) is the reliability of service S at time point t;
- in the next ∆t time, k runs are executed and f disagreements have been detected;
- M is the total number of tests that the service has ever undergone.
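In code, the update is a running weighted average of the reliability history and the newly observed success rate. The sketch below assumes the formula above; the function name is illustrative.

```python
def update_reliability(r_prev: float, m_total: int, k_runs: int, f_failures: int) -> float:
    """Fold k new runs with f disagreements into a reliability history of m_total tests."""
    return (m_total * r_prev + (k_runs - f_failures)) / (m_total + k_runs)

# Example: R(S, t) = 0.90 over M = 100 tests; 10 new runs with 2 disagreements.
r_next = update_reliability(0.90, 100, 10, 2)
print(round(r_next, 3))  # 0.891 -- the reliability drops slightly after the two failures
```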
Group Testing For Reliability
The advantages of the model include:
- One of the toughest problems in software testing is constructing an oracle that can determine whether a fault has occurred. In this model, the voting service serves as the oracle, according to the majority principle.
- The model estimates the reliability of each incoming service while performing normal operation. In other words, the incoming services are tested in the real operational environment, at no extra time, if sufficient computing power is available.
- The model is dynamic, i.e., the data are collected and computed at runtime in real time. The reliability of each service involved in group testing is updated after each run or after a given period of time.
Group Testing For Reliability
One situation in which SORM would not work well is when there are no alternative services available.
In this case, the SOA is basically degraded to the traditional software architecture: the service is tested only by the service provider, in its development cycle.
However, this is an unlikely situation, because SOA is an open platform that allows and encourages cooperation and competition among service providers to create increasingly improved services.
Application of Reliability Models
An example to illustrate the application of the proposed service-oriented reliability model:
- Assume a space agency plans to launch a satellite on a specific date and from a specific location.
- The launch heavily depends on the weather conditions at the launch location, including rain, wind, and temperature.
- The example uses 10 independent weather services, each of which offers three component services: RainForecast, TempForecast, and WindForecast.
- The forecasts are given as probabilities.
Evaluation of Component Services
To build trust in the reliability of the component services, the space agency puts them in a group testing framework and sets their initial reliability to zero.
After a period of group testing, the space agency has a reliability estimate for each service. Table 2 shows a set of sample results obtained in the experiments.
Evaluation of Component Services
- The first column of the table lists the component services under test.
- The second column shows the highest reliability of the service in the given test period.
- Column 3 shows the forecast probabilities of heavy rain, extreme temperature, and strong wind, respectively, from the component services.
- Column 4 shows the adjusted forecast probabilities, obtained by taking the reliability of the service into account; these are the final evaluation values for the component services.
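The slides do not spell out the adjustment itself, but a natural reading (an assumption here, not stated in the original) is that the forecast probability is discounted by the service's reliability. For example, a RainForecast service with reliability R = 0.90 forecasting heavy rain with probability p = 0.80 would yield an adjusted value of roughly R × p = 0.72.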
Evaluation of Composite Services
To base the decision of whether to change the launch date on the most accurate weather-forecasting information, the space agency then constructed a composite service, as shown in Figure 3.
The decision is based on two factors:
- The numbers in the diamond boxes are the reliabilities of the best component services.
- The numbers on the branches are the probabilities forecast by the best service.
Evaluation of Composite Services
[Figure 3: the composite service constructed by the space agency]
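As a rough illustration of how the two factors might combine, the sketch below walks one path of such a decision tree, multiplying branch probabilities for the forecast outcome and component reliabilities for the confidence in that outcome. The structure and numbers are hypothetical; Figure 3's actual layout and decision logic are not reproduced in the slides.

```python
from typing import List, Tuple

def evaluate_path(path: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Each step is (branch_probability, component_reliability).
    Returns (probability of this outcome, confidence in the estimate)."""
    probability = 1.0
    confidence = 1.0
    for branch_prob, reliability in path:
        probability *= branch_prob   # the numbers on the branches
        confidence *= reliability    # the numbers in the diamond boxes
    return probability, confidence

# Hypothetical path: heavy rain (p=0.2, R=0.90), then strong wind (p=0.3, R=0.85).
p, c = evaluate_path([(0.2, 0.90), (0.3, 0.85)])
print(f"outcome probability {p:.2f}, confidence {c:.3f}")  # 0.06, 0.765
```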
Evaluation of Composite Services
Assume that the launch plan is made a year before the launch date, and that the composite service is up and running from day one.
At the beginning, the space agency has little data about the reliability of each service, and a weather forecast made a year before the launch date will not be accurate either.
However, by a month or a week before the launch, there are already sufficient data about the reliability of the services.
These reliability data can be reused in future applications: when the agency plans its next launch, or another event that needs weather forecasts, it already has the reliability data.
Results of the application
Design of Experiments (DOE) is an engineering technique that can be used to determine the extent of the impact of the parameters (factors) of a model on the final results.
The authors apply DOE to analyze the impact of the reliability of the component services on the reliability of the composite service. There are three factors in the example: the reliabilities of RainForecast, TempForecast, and WindForecast.
They use a 2-level DOE technique, i.e., high and low values for each factor: RainForecast (70%, 90%), TempForecast (90%, 99%), and WindForecast (85%, 95%).
The 3-factor, 2-level design generated an ANOVA (ANalysis Of VAriance) table.
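To make the design concrete, the sketch below enumerates the full 2³ factorial and estimates each factor's main effect (mean response at the high level minus mean at the low level). The response function is a stand-in assumption, since the slides do not give the composite-reliability formula.

```python
from itertools import product

# Factor levels from the slides: (low, high) reliabilities.
factors = {
    "RainForecast": (0.70, 0.90),
    "TempForecast": (0.90, 0.99),
    "WindForecast": (0.85, 0.95),
}

def composite_reliability(rain: float, temp: float, wind: float) -> float:
    # Placeholder response: all three forecasts must be correct (an assumption).
    return rain * temp * wind

names = list(factors)
runs = []  # (level pattern, response) for all 2^3 = 8 combinations
for levels in product((0, 1), repeat=3):
    values = [factors[n][lvl] for n, lvl in zip(names, levels)]
    runs.append((levels, composite_reliability(*values)))

for i, name in enumerate(names):
    high = [r for lv, r in runs if lv[i] == 1]
    low = [r for lv, r in runs if lv[i] == 0]
    effect = sum(high) / len(high) - sum(low) / len(low)
    print(f"main effect of {name}: {effect:+.4f}")
```

Under this placeholder response, RainForecast has the largest main effect (its low and high levels are furthest apart), which is consistent with the slides' later observation that the RainForecast service matters most.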
Results of the application
The F-value represents the significance of the impact of a model and its components.
In general, if a component generates a significance value "Prob > F-Value" of less than 0.05, the impact of the component is significant.
Results of the application
The experiment results in Table 3 also show that the significance values ("Prob > F-Value") of RainForecast, TempForecast, and WindForecast are all less than 0.0001, and thus they are all significant model components.
Results of the application
- The higher the component reliability, the higher the overall reliability.
- The impact of the RainForecast service is much more significant than that of the others.
- The space agency should therefore pay more attention to the quality of the rain-forecast service provider.
Results of the application
The evaluation process is dynamic and performed at runtime.
The vast number of WS available online makes it necessary to perform group testing, which, in turn, makes it possible to identify the correct service output without having to design an oracle.
ASTRAR Group Testing
ASTRAR is a technique to test a large number of WS simultaneously, to determine the oracle and the correctness of the WS under test by majority voting, and to provide a quality ranking of the WS and the test cases.
It can be used by WS providers, brokers, and clients:
- A WS provider or client can use the technique to find the best WS for composing new services or applications. For example, a WS provider can compose a digital imaging service using the Fast Fourier Transformation service as a component service.
- A WS broker can use the technique to evaluate the quality of WS applying for registration, to make sure that only WS with reasonable quality are offered to the public.
ASTRAR Group Testing
These techniques are used to rank different WS implementations based on the same specification, the same business logic, and the same input and internal states.
In other words, the WS under group testing should produce the same or close results if the same inputs are applied; e.g., various Fast Fourier Transformation WS should produce the same or close results for the same input.
ASTRAR Group Testing
The technique proposed here has the following advantages:
- It can test a large number of WS rapidly and rank them according to the test results.
- It can automatically create the oracle of test cases, i.e., the expected outputs for the given inputs.
- It can rank the effectiveness of test cases and thus apply the most effective test cases first, to eliminate unacceptable WS quickly.
- Most of the steps in the process can be completely automated, and this feature makes the process attractive for commercial applications.
ASTRAR Group Testing
A group testing technique, originally developed for testing a large number of blood samples and later used for software regression testing, is an attractive solution to the following situation:
- The Service-Oriented Architecture (SOA) based WS broker allows WS developers and providers to freely register WS and to compose complex WS from other WS dynamically.
- As a result, for each WS specification, many alternative implementations may be available.
ASTRAR Group Testing
ASTRAR can test a large number of WS at both the unit and integration levels. At each level, the testing process has two phases:
- Phase 1: Training Phase
- Phase 2: Volume Testing Phase
Phase 1: Training Phase
The process assumes that a reasonably large number of test inputs or test cases are available to test the WS concerned before the start of this phase.
1) Select a subset of WS randomly from the set of all WS to be tested. The size of the subset is decided experimentally.
2) Group testing: apply each test case in the given set of test cases to test all the WS in the selected subset.
3) Voting: for each test input, the outputs from the WS under test are voted on by a stochastic voting mechanism based on majority and deviation voting principles.
4) Failure detection and reliability computation: compare the majority output with each individual output. A disagreement indicates a component failure. A dynamic reliability model is used to compute the reliability of each WS based on the failure rate and other factors.
Phase 1: Training Phase
5) Oracle establishment: if a clear majority output is found, the output is used to form the oracle of the test case that generated it. A confidence level is defined based on the extent of the majority. The confidence level is also dynamically adjusted in Phase 2.
6) Test case ranking: test cases are ranked according to their fault-detection capability, which is proportional to the number of failures they detect. In Phase 2, the higher-ranked test cases are applied first, to eliminate the WS that fail them. (Steps 5 and 6 are sketched after this list.)
7) WS ranking: the stochastic voting mechanism not only finds a majority output, but also ranks the WS under group testing according to their average deviation from the majority output.
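A simplified sketch of steps 5 and 6 follows, under assumed mechanics: the oracle is the unweighted majority output, its confidence level is the fraction of services that agree with it, and test cases are ranked by the failures they have exposed. All names are illustrative.

```python
from collections import Counter
from typing import Any, Dict, Optional, Tuple

def establish_oracle(outputs: Dict[str, Any]) -> Tuple[Optional[Any], float]:
    """Form an oracle from a clear majority output; confidence = share of agreeing WS."""
    counts = Counter(outputs.values())
    output, votes = counts.most_common(1)[0]
    if votes <= len(outputs) // 2:  # no clear majority -> no oracle yet
        return None, 0.0
    return output, votes / len(outputs)

failures_detected: Counter = Counter()  # test case id -> number of failures it exposed

def run_test_case(case_id: str, outputs: Dict[str, Any]) -> None:
    oracle, confidence = establish_oracle(outputs)
    if oracle is None:
        return
    failing = [ws for ws, out in outputs.items() if out != oracle]
    failures_detected[case_id] += len(failing)

run_test_case("tc-1", {"WS-A": 3, "WS-B": 3, "WS-C": 7})   # exposes one failure
run_test_case("tc-2", {"WS-A": 1, "WS-B": 1, "WS-C": 1})   # all agree, exposes none
# Step 6: rank test cases by fault-detection capability.
print(failures_detected.most_common())  # [('tc-1', 1), ('tc-2', 0)]
```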
Phase 1: Training Phase
By the end of the training phase:
- the selected sample WS have been tested;
- the test cases are ranked by their capability so far in detecting failures;
- the oracles for the test cases are established, with respect to their confidence levels;
- the sample WS are ranked.
Phase 2: Volume Testing Phase
This phase continues to test the remaining WS and any newly arrived WS, based on the profiles and history (test case effectiveness, oracles, and WS ranking) obtained in the training phase.
Phase 2 continues to rank the WS, rank the test cases, and update the oracles:
1) Test cases were ranked by their capability to detect failures/faults in Phase 1. They are now divided into layers, with layer one having the highest capability.
2) Select the layer-one test cases and apply them in the next step.
3) For each layer of test cases, group-test all the WS.
Phase 2: Volume Testing Phase
4) If an oracle with an acceptable confidence level (e.g., greater than 50%) exists, no voting is necessary. Use the oracle to detect failures: determine whether each WS has produced a correct answer, then compute the failure rate and possibly the reliability of each WS using the given reliability model.
5) If no oracle with an acceptable confidence level exists, use the voting mechanism to detect failures, as described in Phase 1.
6) Update the confidence level of the oracles: an agreement between the oracle and the current test output increases the confidence level; a disagreement decreases it accordingly.
7) Update the ranking of the test cases by including the new number of failures detected.
Phase 2: Volume Testing Phase
8) Update the ranking of the WS and eliminate the WS that have an unacceptable failure rate or reliability. The elimination of unnecessary testing in this step saves testing time.
9) Select the next layer of test cases, and return to step 3.
By the end of Phase 2 group testing: all the available WS have been tested and a short list of WS is ranked; the test cases are updated and ranked; the oracles and their confidence levels are updated. (A condensed sketch of this loop follows.)
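Below is a condensed sketch of the Phase 2 loop under the same assumed mechanics as before: layered test cases, oracle-based failure detection when the oracle is confident enough, confidence updates, and elimination of WS whose observed reliability drops too low. Thresholds and names are illustrative, and the fallback to voting (step 5) is omitted.

```python
from typing import Any, Callable, Dict, List

def volume_testing_phase(
    ws_pool: Dict[str, Callable[[Any], Any]],   # WS name -> callable implementation
    layers: List[List[Any]],                    # test inputs; layer 1 = most effective
    oracles: Dict[Any, Any],                    # test input -> expected output
    confidence: Dict[Any, float],               # test input -> oracle confidence level
    min_reliability: float = 0.5,
) -> Dict[str, float]:
    """Return the surviving WS with their observed pass rates."""
    passes = {ws: 0 for ws in ws_pool}
    runs = {ws: 0 for ws in ws_pool}
    for layer in layers:                             # step 9: next layer, back to step 3
        for test_input in layer:
            if confidence.get(test_input, 0.0) <= 0.5:
                continue                             # step 5 would fall back to voting
            expected = oracles[test_input]
            for ws, impl in list(ws_pool.items()):   # step 3: group-test all WS
                runs[ws] += 1
                if impl(test_input) == expected:     # step 4: oracle detects failures
                    passes[ws] += 1
                    confidence[test_input] = min(1.0, confidence[test_input] + 0.01)
                else:                                # step 6: adjust oracle confidence
                    confidence[test_input] = max(0.0, confidence[test_input] - 0.01)
                # Step 8: eliminate WS whose reliability is unacceptable.
                if runs[ws] >= 5 and passes[ws] / runs[ws] < min_reliability:
                    del ws_pool[ws]
    return {ws: passes[ws] / runs[ws] for ws in ws_pool if runs[ws]}
```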
The same process can be applied at the integration testing level: if a composite WS consists of n different units of WS, the ASTRAR group testing technique can be applied to this composite WS by considering each composite as an individual WS in the group testing.
Application of this Group Testing Model
A real-time stock-buy-sell WS is used as an example to illustrate the application of the ASTRAR technique.
- The WS under development consist of a server WS and multiple client WS, residing in different locations. A client can send requests to the server, and the server responds to the requests.
- All WS under group testing implement the same specification.
- The WS server offers two functions, and the client WS can access these two functions.
- The database consists of objects of stock information, defined in the class Stock.
Application of this Group Testing Model
Each stock object is set to an initial value at a certain time point.
The evaluation engine then uses randomly generated purchase and sale information, or replays data from a past stock dump, to decide the price dynamically, once every minute.
Once the price changes, the other members (the percentage changes over a minute, a day, a month, and a year) of each stock object are computed and updated, as sketched below.
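The slides do not show the Stock class itself, so the sketch below is a guess at its shape: a price that the evaluation engine updates once per minute, plus the derived percentage-change members. Field names and the update rule are assumptions.

```python
import random
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class Stock:
    symbol: str
    price: float                                              # initial value at some time point
    history: Dict[str, float] = field(default_factory=dict)   # reference prices per period
    changes: Dict[str, float] = field(default_factory=dict)   # percentage changes per period

    def update_price(self, new_price: float) -> None:
        """Called by the evaluation engine once every minute."""
        self.price = new_price
        # Recompute the percentage-change members against the reference prices.
        self.changes = {
            period: 100.0 * (self.price - ref) / ref
            for period, ref in self.history.items()
        }

# The engine decides the new price from randomly generated purchase/sale pressure
# (or from replayed past stock data).
stock = Stock("ACME", price=100.0, history={"minute": 100.0, "day": 95.0})
pressure = random.uniform(-0.01, 0.01)   # net buy/sell effect this minute
stock.update_price(stock.price * (1 + pressure))
print(stock.changes)
```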
Results of the application
The size of the subset (the training size) is critical:
- The smaller the size, the cheaper (fewer test runs) the testing and ranking process.
- However, the smaller the size, the higher the probability that the training phase fails to find the correct oracle.
- An incorrect oracle will lead to an incorrect ranking of the WS under test, while an incorrect ranking of the test cases may result in more test runs in Phase 2 of the ASTRAR process.
Another factor that affects the testing cost is the target size, the number of WS to be ranked. For a given large number of WS to be tested, only a short list of the best WS needs to be ranked.
Results of the application
The authors proposed an efficient process to test a large number of web services designed based on the same specification.
The process is divided into two phases. In Phase 1 (the training phase), a selected number of WS are tested and their results are voted on. The purpose of the first phase is to establish the oracle and identify the most powerful test cases.
In Phase 2, no voting is applied, and the oracle created in Phase 1 is used to judge the correctness of the WS under test. Furthermore, the powerful test cases are applied first, so that incorrect WS can be eliminated in a few tests.
The experiment results reveal that the smaller the training size, the lower the cost. However, a small training size can lead to an incorrect oracle, and thus to an incorrect WS ranking. A small training size can also lead to an incorrect test case ranking, resulting in a higher test cost in Phase 2. Therefore, it is critical to select a reasonably sized training set in WS group testing.
As future work, the authors plan to explore the impact of the age of the test cases.
Conclusion
The authors have proposed:
- a Service-Oriented software Reliability Model (SORM), which generates voted information on the fly, without using an oracle;
- a technique to test a large number of WS simultaneously, which uses an oracle to test the correctness of new web services.
References
[1] W. T. Tsai, D. Zhang, Y. Chen, H. Huang, R. Paul, and N. Liao, "A Software Reliability Model for Web Services," 8th IASTED International Conference on Software Engineering and Applications, Cambridge, MA, November 2004, pp. 144-149.
[2] W. T. Tsai, X. Wei, Y. Chen, B. Xiao, R. Paul, and H. Huang, "Developing and Assuring Trustworthy Web Services," 7th IEEE International Symposium on Autonomous Decentralized Systems (ISADS), April 2005, pp. 43-50.
[3] W. T. Tsai, X. Wei, Y. Chen, B. Xiao, R. Paul, and H. Huang, "Adaptive Testing, Oracle Generation, and Test Case Ranking for Web Services," 29th Annual International Computer Software and Applications Conference (COMPSAC'05), 2005.
[4] W. T. Tsai, Y. Chen, R. Paul, N. Liao, and H. Huang, "Cooperative and Group Testing in Verification of Dynamic Composite Web Services," Workshop on Quality Assurance and Testing of Web-Based Applications, September 2004, pp. 170-173.
[5] W. T. Tsai, Y. Chen, and R. Paul, "Specification-Based Verification and Validation of Web Services and Service-Oriented Operating Systems," Proc. of IEEE WORDS, Sedona, February 2005.