Metamorphic Testing of Sensor Processing for Android Applications
By
Marco Peterson
A thesis submitted to the Faculty of the College of Graduate Studies of Virginia State University in partial fulfillment of the requirements for the degree of Master of Science in Computer Science in the School of Engineering, Science, and Technology
Virginia 2015
Approved by:
______________________________Dr. Kostadin Damevski (Advisor)
_______________________________Dr. Hui Chen (Committee Member)
_______________________________Dr. David Walter (Committee Member)
ABSTRACT
The field of Software Engineering has always strived to enable the creation of more
reliable and accurate software by implementing a range of software testing
techniques to ensure source code executes as intended. Traditional software testing
is done by evaluating results against an oracle, consisting of a set of acceptable
outputs for each test case. A test case is another program created to emulate real-world
inputs and scenarios a particular piece of software might encounter. This is an effective
method of testing and is an industry standard today; but, as we all
know, no program is without its bugs and glitches. Detecting these errors more
effectively has become one of the most pressing objectives for many computer
science industries. Perhaps the chief error detection obstacle software engineers
face today is known as the oracle problem. The oracle problem arises from one of
two situations. The first is when the answer to the problem the software under test is
solving is difficult to constrain. This issue occurs most often in machine learning
software, where a machine must perform a task without being explicitly programmed,
such as the self-driving car. In this case the software must learn how to complete a
task from the input of the world around it. The second situation is when it is either
impossible or too expensive to create a test for all reasonable inputs a piece of software
might encounter. Both situations leave the software developer without a means to
test their software effectively. In the case of sensor data calculations, it is very
difficult to calculate accurate results when given a wide range of possible sensor
inputs. The goal of this thesis is to evaluate the effectiveness of a technique known
as metamorphic testing on sensor-based applications on the Android platform in order to
address issues such as the oracle problem. Metamorphic testing is a software testing
technique that takes already existing test cases for a particular piece of software and
builds new test cases from them. This method essentially reuses test cases, applying
different mathematical properties until an error is found.
ACKNOWLEDGEMENTS
I would like to thank my advisor Dr. Kostadin Damevski for the continuous
support of my Master’s thesis and research. His patience, motivation, enthusiasm,
and immense knowledge paved the way for this research.
I would also like to thank Dr. Hui Chen for his help and expertise over the
course of my time in the Master’s program. I would further like to thank all the professors
and staff for their help and guidance over the entire span of my time at Virginia
State University.
Last but by no means least, I would like to acknowledge the support from my
friends and peers for all their help both directly and indirectly.
TABLE OF CONTENTS
List of Figures
List of Tables
1. Introduction
  1.0 Overview
  1.1 Aims and Objectives
  1.2 Research Questions
  1.3 Chapter Outline
2. Problem Statement/Hypothesis
  2.0 Problem Statement
  2.1 Hypothesis
3. Background/Related Works
  3.0 Traditional White-Box Testing
    3.0.1 Simulation Testing
    3.0.2 Symbolic Execution
  3.1 Path Explosion
  3.2 The Oracle Problem
  3.3 Machine Learning
  3.4 Metamorphic Testing
    3.4.1 List of Common Metamorphic Properties
    3.4.2 Stacking Metamorphic Tests
  3.5 Step Detection Algorithm
    3.5.1 Step Cycle Detection
    3.5.2 Calculating Steps Filter
  3.6 Fault Seeding and Detection
4. Design and Approach
  4.0 Android Frame Work
  4.1 Test Case Detection
    4.1.1 Detecting Android API Tests
    4.1.2 List of Android API Tests Searched for
    4.1.3 Detecting Developer Created Tests
    4.1.4 Test Case Detection Procedure
  4.2 Data Collection (SenSee)
    4.2.1 Data Collection Procedure
  4.3 Error Detection
  4.4 Applied Metamorphic Transforms
    4.4.1 Multiplicative Transforms
    4.4.2 Interpolating Transform
    4.4.3 Adding Avg Noise Transform
    4.4.4 Down Sampling Transform
    4.4.5 Semantical Transform
  4.5 Fault Seeding Study
5. Evaluation
  5.0 Study Recap
  5.1 Test Case Detection Results
  5.2 Initial Transform Results
  5.3 Fault Seeding/Error Detection Results
  5.4 Full Transform Taxonomy
  5.5 Discussion
  5.6 Limitations of Study
6. Summary
  6.0 Summary
  6.1 Recommendations for Future Research
Appendix A
Appendix B
Appendix C
Appendix D
Appendix E
Appendix F
Bibliography
LIST OF FIGURES
3.1 Rapid Growth of Conditional Possibilities
3.2 Simulation Testing Single Path Execution
3.3 Symbolic Execution Path Execution
3.4 Simple Cosine Test Case
3.5 Metamorphic Stacking
3.6 Accelerometer Sensor Data
3.7 Stride Diagram
3.8 Dynamic Threshold Leveling
4.1 Android Frame Work
4.2 t1 Test Case
4.3 t2 Test Case
4.4 Caller Method
4.5 Parsing Algorithm Output
4.6 Accelerometer Sensor Data with Tag Lines
4.7 SenSee Capture and Transform Diagram
4.8 Original 10 Step Data Set
4.9 Multiplicative Transform on 10 Step Data Set
4.10 Interpolating Transform on 10 Step Data Set
4.11 Add Average Noise Transform on 10 Step Data Set
4.12 Down Sampling Transform on 10 Step Data Set
4.13 Semantical Transform on 10 Step Data Set
LIST OF TABLES
5.1 Base Line Pedometer Results before Transforms
5.2 Pedometer Application Results for each Transform
5.3 Transforms Results after Introducing an Error
5.1 Questionnaire for evaluation
5.2 Distribution of the participants’ responses
CHAPTER 1 - INTRODUCTION
1.0 Overview
Reducing the cost of software development while improving software quality
is an important objective for the software industry. A study by Tassey estimated the
annual cost of software testing to be between $22.2 and $59.9 billion, with
over half of those costs arising from mitigation activities needed to correct errors
after a software product’s release [15]. Checking a product for faults is standard practice in
almost all fields, and is fundamentally important to product quality. This is
especially true in the field of software engineering for two reasons. The first is the
complexity of many modern software products. The second is the
potential consequences of a software failure. The production of reliable
software is one of the fundamental requirements for applying computers to today's
challenging problems [12]. As computer programs grow in size and complexity,
testing costs will only increase. More research is needed to reduce these costs by
developing new, more effective testing methods and approaches.
A novel testing technique that aims to improve upon the state of the practice
is metamorphic testing. It has been used to help improve software accuracy and
reliability in several fields including bioinformatics, genetic sequencing, and machine
learning. The focus of this thesis is applying this technique to sensor-based
applications, more specifically Android-based sensor applications. Many applications today use
sensor data to calculate some result, ranging from calculating blood
pressure and heart rate to docking ships with the International Space Station.
However, calculating a desired result from a set of raw sensor data is not easy,
especially if the mathematical procedure to do so does not already exist. This
problem becomes far more difficult when calculations are performed
using more than one sensor. Perhaps the best example of this is today’s weather
forecasting system. Thousands of sensor arrays recording everything from humidity
and temperature to wind speed are used in an attempt to predict the weather days in
advance, but the forecast is not always accurate. Weather forecasting is an example of an
oracle problem: the correct output for all possible sensor inputs and combinations is
impossible to calculate, so creating a computer program that accurately predicts the
weather one hundred percent of the time has proven to be equally impossible.
Addressing and testing for the oracle problem has become a fundamental goal for
computer scientists today.
Weather forecasting is one of the most complex sensor-based applications in
existence today; nonetheless, the basic principles remain the same. We apply
metamorphic testing on a smaller scale in an attempt to understand how
metamorphic properties can be used to improve both the source code, through error
detection, and the overall error threshold accuracy of the software. The tools we
created will also provide Android developers with a platform to perform metamorphic
testing on their own applications.
1.1 Aims and Objectives
The goal of this thesis is to evaluate a testing technique known as
Metamorphic Testing within the Android platform. The objective is to evaluate the
effectiveness of metamorphic testing in finding errors within Android source code as
well as to evaluate the current testing practices being used by Android developers.
1.2 Research Questions
• What testing methods are Android developers currently using?
• Can metamorphic testing be applied to sensor-based Android applications?
• How effective is metamorphic testing for detecting errors in Android source code?
• What metamorphic transforms are most effective in evaluating the first three questions?
• Can we find transforms that can be applied to other software outside of Android?
1.3 Chapter Outline
This thesis consists of six chapters. Chapter 1 presents the overall goal of the
thesis, including the research questions, aims, objectives, and overview. Chapter 2
presents the problem statement and hypothesis based on related work in this area
of research, and gives a brief history of software testing explaining where
metamorphic testing derived its concepts. Chapter 3 is the background chapter. It
provides an in-depth explanation of metamorphic testing and the methods used to
collect the sensor data used during this thesis. It also outlines the related works in
the fields of metamorphic testing, machine learning, and fault seeding. Chapter 4
provides a detailed explanation of the Android framework and the transforms used
during our evaluation. This chapter also provides a high-level explanation of how
we were able to capture and transform onboard Android sensor data. Chapter 5
explains the results of our evaluation as well as the study’s limits. Finally, Chapter 6
summarizes our work and provides recommendations for future research.
CHAPTER 2 – PROBLEM STATEMENT/HYPOTHESIS
2.0 Problem Statement
The conventional method to test software is to examine pairs of input data and
expected output data, then check whether the expected output is
produced when a given input is passed through the code being tested. If the output is
incorrect, then it is safe to say the program has a bug; but what if the output is
correct? Is the code now faultless? The answer is no, as even for a relatively simple
program, reliably finding all errors that may exist is a difficult task. As software
increases in complexity, many computer programs are tasked with problems for
which the correct output is difficult to express in all cases or with 100% confidence.
This is known as the oracle problem in software testing. Finding errors, logic
mistakes, and general bugs is inherently difficult if a developer does not know what
the final outcome should be once a program’s computations are complete. In
the movie “The Hitchhiker’s Guide to the Galaxy”, a computer attempts
to compute the meaning of life [21], generating the seemingly arbitrary answer of 42. But is that
answer correct? Perhaps the better question is how someone would test this
computer program for correctness. Metamorphic testing has been shown to be
effective by several studies [1] [1] [18] [19] in a wide range of testing applications,
especially for testing software that suffers from the oracle problem.
This thesis contains the methods needed to apply metamorphic testing to
sensor-based Android applications. The goal is to provide Android developers with a
new tool to further test and improve their applications, as well as to provide an
understanding of metamorphic testing and its properties so it can be applied to other problems.
2.1 Hypothesis
Metamorphic testing transforms can be used to test sensor-based Android applications in order to improve overall error detection and error threshold.
CHAPTER 3 – BACKGROUND/RELATED WORKS
3.0 Traditional White-Box Testing
The term white-box testing is used to describe a group of methods used for
testing a software product’s internal source code by constructing test cases. It is also known as
clear box testing or glass box testing (Beizer, 1995); these names indicate
that a developer has full visibility of the internal workings of the software product,
specifically, the logic and the structure of the code [8]. This visibility allows
developers to create test cases specifically designed to exercise a program’s
processing path and determine if it has reached an appropriate result. This method
is used to test a variety of source code functions such as data flow, decision
statements, networking connections, and program pathing. All of these examples
require the developer to evaluate the Software Under Test (SUT) using a predefined
set of inputs against the expected set of outputs.
There are two central “white-box” testing methods that can be applied when
creating a test case for a particular piece of software. The first is known as “Unit
Testing”. The more fundamental of the two, unit testing is used to
test one specific part of the code, usually a function or family of functions known as
modules or units. It has become good programming practice to construct a piece of
software from several separate modular functions, breaking a large
piece of code down into many small pieces of code that each perform a very specific
task contributing to the program as a whole. The primary goal of unit
testing is to take the smallest piece of testable software in the application, isolate it
from the remainder of the code, and
determine whether it behaves exactly as you expect [7].
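As a rough illustration of unit testing against an oracle, the sketch below checks one small, isolated function against a table of input and expected-output pairs. The function `count_peaks`, its threshold rule, and the sample values are hypothetical and chosen only for illustration; they are not taken from the applications studied in this thesis.

```python
def count_peaks(samples, threshold):
    """Hypothetical unit under test: count upward crossings of a threshold."""
    peaks = 0
    above = False
    for value in samples:
        if value > threshold and not above:
            # The signal has just risen above the threshold: one new peak.
            peaks += 1
            above = True
        elif value <= threshold:
            # The signal dropped back, so the next rise counts as a new peak.
            above = False
    return peaks


# The oracle: each entry pairs an input with the output we expect for it.
ORACLE = [
    (([0.0, 1.2, 0.1, 1.3, 0.2], 1.0), 2),
    (([0.0, 0.5, 0.4], 1.0), 0),
    (([1.5, 1.6, 0.2, 1.7], 1.0), 2),
]


def run_unit_tests():
    """Return a list of (input, expected, actual) for every failing case."""
    failures = []
    for (samples, threshold), expected in ORACLE:
        actual = count_peaks(samples, threshold)
        if actual != expected:
            failures.append(((samples, threshold), expected, actual))
    return failures


if __name__ == "__main__":
    print("failures:", run_unit_tests())
```

Note that the test is only as strong as its oracle table: inputs that are not listed are never checked, which is the gap the later sections address.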
The next type of testing method is integration testing. As its name
suggests, this tests the assembly of smaller pieces of code into a larger piece of
code after they have been verified to be correct through unit testing. This ensures
that all the modules in the system are working together as intended [10].
When constructing test cases for error detection, developers can choose to
implement them using a variety of approaches. The best approaches exercise all
possible inputs and conditions within a given program in an attempt to ensure no bug
is left undetected; this is called “Full Coverage”. However, testing with full coverage
approaches may not always be possible or practical. Methods such as Simulation
Testing and Symbolic Execution allow for deliberate and effective testing for some
software, but not all.
3.0.1 Simulation Testing
Perhaps the most basic form of software testing, simulation testing is the
simple process of feeding a predefined input into a program and evaluating the result
for accuracy. These tests are designed to mimic the operation of real-world scenarios, such
as the day-to-day operation of a bank, the running of an assembly line in a factory, or
the staff assignment of a hospital or call center [9]. However, simulation testing has a
fundamental flaw when it comes to testing software that has conditional statements.
Using this method you can only test one condition at a time. If your program has
multiple conditions with several layers of nested conditions, the number of possible
results grows very quickly, and testing for each of those results becomes more
difficult.
For example, if your program has an “If Statement”, a single test can execute only one of
the two possible
conditions, either the true condition or the false condition. Another test is
required to execute the other condition. Most software today has several if
statements within its source code, many of which are nested within each other.
Figure 3.1 illustrates how the number of possible conditions can grow rapidly.
Figure 3.1 – Rapid Growth of Conditional Possibilities
This is just an example of one conditional statement. Other conditional
statements such as “If Else Statements” can have more than just 2 possible
branches, further complicating the conditional logic of any given program.
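The growth described above can be made concrete with a small sketch. It assumes the simplest case, a sequence of independent two-way conditions, where every true/false combination is a distinct path; real programs with nested or dependent conditions may have fewer feasible paths, so the count here should be read as an upper bound. The function name `count_paths` is illustrative only.

```python
from itertools import product


def count_paths(num_conditions):
    """Count every true/false combination of independent two-way conditions."""
    return sum(1 for _ in product([True, False], repeat=num_conditions))


if __name__ == "__main__":
    # One simulation test covers exactly one of these combinations.
    for n in (1, 2, 3, 10, 20):
        print(f"{n} conditions -> {count_paths(n)} paths")
```

With 20 such conditions the count already exceeds one million, while each simulation test still exercises only a single combination.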
Furthermore, the same type of graph can be drawn to depict a program’s
overall structure. Complex programs will have individual functions that may or may
not be called during a particular test. These types of complexities make it very
difficult to achieve Full Coverage when
testing large complex software. Figure 3.2 depicts how simulation testing can only
execute one path at a time within a complex program.
Figure 3.2 – Simulation Testing Single Path Execution [11]
FSM = Finite State Machine (i.e. Computer Program)
3.0.2 Symbolic Execution
In an attempt to obtain full coverage for complex programs, James King
created the first automatic testing method called Symbolic Execution in 1976.
Symbolic Execution does away with concrete inputs (i.e. numbers) into a program.
Instead it supplies dynamic variables (or symbols) as inputs into the software being
tested, while keeping track of the conditions needed to travel along each path of the
source code [6]. This condition state tracking allows the symbols to dynamically
change in order to meet the conditions needed to explore and test another part of the
program.
For example, if the symbol encountered an “If Statement”, the value of the
symbol could change to satisfy the true condition. Since the current condition state
is recorded, the symbol variable can backtrack through the code, and then change
to satisfy the false condition. By repeating this process over and over, this method of
testing will ultimately achieve full coverage, as illustrated by Figure 3.3 [6].
Figure 3.3 – Symbolic Execution Path Execution [11]
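The bookkeeping behind this idea can be sketched in a few lines. This is a toy illustration under strong assumptions, not a real symbolic executor: the program is written by hand as a small decision tree (`PROGRAM`), and a brute-force search over a small integer range (`solve`) stands in for the constraint solver a real tool would use. All names are hypothetical.

```python
# Toy program as a decision tree: (description, predicate, if_true, if_false).
# Leaves are plain strings naming the result reached on that path.
PROGRAM = (
    "x > 10", lambda x: x > 10,
    ("x is even", lambda x: x % 2 == 0, "large-even", "large-odd"),
    "small",
)


def explore(node, path_condition=()):
    """Fork at every branch and yield (leaf, path_condition) for each path."""
    if isinstance(node, str):
        yield node, path_condition
        return
    description, predicate, if_true, if_false = node
    # Follow the true branch, recording that the condition must hold.
    yield from explore(if_true, path_condition + ((description, predicate, True),))
    # Backtrack and follow the false branch with the negated condition.
    yield from explore(if_false, path_condition + ((description, predicate, False),))


def solve(path_condition, candidates=range(-50, 51)):
    """Brute-force stand-in for a constraint solver: find one satisfying input."""
    for x in candidates:
        if all(predicate(x) == wanted for _, predicate, wanted in path_condition):
            return x
    return None


def generate_tests():
    """Map each reachable leaf to one concrete input that reaches it."""
    return {leaf: solve(condition) for leaf, condition in explore(PROGRAM)}


if __name__ == "__main__":
    for leaf, witness in generate_tests().items():
        print(f"{leaf}: x = {witness}")
```

Each path needs its own recorded condition and its own solver call, which is why the cost of this approach grows with the number of paths.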
Even though Symbolic Execution is able to achieve full coverage, it is only
able to do so for relatively small programs. As programs get larger, the number of paths
through their conditional statements grows exponentially, costing more memory to track current
paths and more time to execute. This eventually causes the testing method to
become impractical. This phenomenon is known as “Path Explosion”.
3.1 Path Explosion
Symbolic techniques have been shown to be very effective in path-based test
case generation; however, they fail to scale to large programs [16]. This is because
the number of possible execution paths to be considered symbolically is so large that
eventually only a small part of the program path space is actually explored [14].
There have been several studies and projects dedicated to increasing the number of
possible paths that methods such as these can handle, most notably in the field of model
checking [17], work that was recognized with the Turing Award in 2007 [35]. Today’s most advanced
software contains millions of lines of code with billions of possible paths. Only
time will tell if new developments in this field will keep up with the pace of ever
increasing path explosion. However, these methods of testing are not optimal for solving
other testing hurdles such as the oracle problem, as found in machine
learning. This is especially true if these machines contain large decision-making
processes with billions of possibilities.
3.2 The Oracle Problem
Traditional unit and integration testing methods are great for testing software
that has a known answer. Model testing is even better at automatically generating
full coverage tests for constrained software. Both of these testing methods still require
finding inputs that cause execution to reveal faults [5]. What if you do not know all
the possible input combinations or execution paths a program might take to produce
a result? Furthermore, what if you do not know what the answer should be? Applying
computers to solve unknown problems is one of the staples of the industry, but
testing such software is incredibly difficult and costly. This is known as the oracle
problem [5], and solving it has been a major issue for several fields of computer
science. After all, answering questions we do not know the answer to is a
fundamental requirement for scientific advancement. Solving the oracle problem
involves constructing some sort of test oracle or table of expected results that can be
compared against the program’s output for a given set of inputs [18]. Most of these types
of applications fall under the umbrella of machine learning.
3.3 Machine Learning
The basic definition of Machine Learning is getting computers to act without being
explicitly programmed, and over the past two decades Machine Learning has become one
of the mainstays of information technology [19]. These algorithms can be as simple as
the spam filter in your email learning which emails to send to your junk folder, or as
complex as the self-driving car; but they
all face the same fundamental problem. These computer applications do not start off
knowing all the answers to every problem they may face, hence the name “machine
learning”. When developing these applications, how do programmers know that the
software they have written will instruct a self-driving car to stop at a red light instead
of speeding through it? In situations like these, traditional testing measures cannot
be applied due to the large number of possible inputs and execution paths. Many of these
programs also lack a definitive result for the computation they are trying to execute. Here
Metamorphic Testing can be applied to the machine’s known set of rules to evaluate
whether the program will react in the desired manner when presented with a choice. The idea is
relatively simple, but extremely difficult to execute.
3.4 Metamorphic Testing
The concept of metamorphic testing was formally introduced to the world in
1998 by three professors from the University of Hong Kong: Dr. Chen, Dr. Cheung,
and Dr. Yiu [20]. They observed three fundamental problems with current white-box
testing methods. The first observation was that software which passes its
initial test cases is considered successful and is seldom investigated further for
errors. Second, no matter how much testing is done, a piece of software will most likely still
contain errors. Lastly, obtaining a test oracle to test against is unrealistic for many software
applications, especially in the development phase
[20]. Solving the oracle problem allows developers to tackle computing challenges
that we do not know the answer to. Perhaps chief among these is the challenge of
machine learning.
However, the aim of this thesis is to tackle the second observation made by
Dr. Chen and his colleagues, which states that almost all software contains errors.
These errors can be either logical errors that break the software in general, or
mathematical or algorithmic errors that cause the program to give an inaccurate or
inconsistent result. In order to solve this problem we
must address the first observation, which states that once a piece of software passes its first test
case it is seldom tested again for further errors. In most cases a tested program still
contains errors that the first test case did not reveal. Typically when this happens a
new unit test case is created in an attempt to find the error.
This is where metamorphic testing differs from traditional white-box unit
testing. Instead of making more test cases from scratch, metamorphic testing
derives new test cases from the existing passing ones by applying a transform to the
original input of the original test case. These transforms are typically a
mathematical operation or set of operations applied to the original data in order to
change the output result. The result should change in a predictable manner
based upon the transform applied. For example, if a transform adds three to every
number in your data set, the result should reflect the transform applied; if it does not,
you have found a potential error in your source code. The term metamorphic testing
comes from the fact that this method morphs existing input test data in order to
reevaluate the source code using the same test case. Figure 3.4, for example, uses a
simple cosine property to check a result.
Figure 3.4 – Simple Cosine Test Case
We know that cosine exhibits certain mathematical properties, so if we make
changes to the input we can predict the output. Such properties are called
metamorphic properties. This is a simple example of a metamorphic property that
can exist within a program.
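The cosine idea can be sketched in a few lines of Python. The exact property used in Figure 3.4 is not reproduced in the text, so the three identities below (evenness, periodicity, and the reflection about pi) are assumptions chosen only for illustration, and the function name is hypothetical.

```python
import math

def check_cosine_properties(x, tol=1e-9):
    """Return True if cos satisfies three known identities at x."""
    base = math.cos(x)
    # cos(-x) == cos(x): negating the input must not change the output.
    even = math.isclose(math.cos(-x), base, abs_tol=tol)
    # cos(x + 2*pi) == cos(x): shifting by one period must not change the output.
    periodic = math.isclose(math.cos(x + 2 * math.pi), base, abs_tol=tol)
    # cos(pi - x) == -cos(x): this change to the input flips the sign.
    reflected = math.isclose(math.cos(math.pi - x), -base, abs_tol=tol)
    return even and periodic and reflected

print(all(check_cosine_properties(x) for x in (0.0, 0.5, 1.0, 2.5)))
```

No expected value of cos(x) is ever written down; each check only compares the function against itself on a morphed input, which is what lets this style of test work without an oracle.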
The logic of metamorphic properties can be used to create new tests that
challenge your software's functionality and accuracy. For instance, we took a test
case that previously passed and morphed the input data in such a way that the output
values should not change. If the test now fails, then we have discovered an error in
the program. This is an example of the Semantically Equivalent property. There are
several metamorphic properties commonly used to produce similar tests (listed below).
Which metamorphic properties are feasible when creating a metamorphic test depends on
the computational techniques a program performs.
3.4.1 List of Common Metamorphic Properties
• Additive: Increase (or decrease) numerical values by a constant
• Multiplicative: Multiply numerical values by a constant
• Permutative: Randomly permute the order of elements in a set
• Invertive: Create the “opposite” of a set
• Inclusive: Add a new element to a set
• Exclusive: Remove an element from a set
• Compositional: Compose a set
• Noise-based: Include input values that will not affect the output
• Semantically Equivalent: Create inputs that have the same “meaning” as the original
• Heuristic: Create inputs that are “close” to the original
• Statistical: Create inputs that exhibit the same statistical properties
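As a minimal sketch of how the first three properties in this list translate into checks, consider a function that computes an arithmetic mean. The mean is our own stand-in for a program under test, not something taken from the pedometer, and the helper names are hypothetical.

```python
import random
from statistics import mean

def additive_holds(data, c, tol=1e-9):
    """Additive: adding c to every input should shift the mean by c."""
    return abs(mean([v + c for v in data]) - (mean(data) + c)) < tol

def multiplicative_holds(data, k, tol=1e-9):
    """Multiplicative: scaling every input by k should scale the mean by k."""
    return abs(mean([v * k for v in data]) - mean(data) * k) < tol

def permutative_holds(data, seed=0, tol=1e-9):
    """Permutative: reordering the inputs should leave the mean unchanged."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    return abs(mean(shuffled) - mean(data)) < tol

data = [2.0, 4.5, 7.0, 9.5]
print(additive_holds(data, 3), multiplicative_holds(data, 2), permutative_holds(data))
```

A program for which one of these checks fails either does not have that property, in which case the property is not feasible for it, or contains an error.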
3.4.2 Stacking Metamorphic Tests
The concept behind metamorphic stacking is simple. Take the data produced by one
transform, then apply another transform to it. Keep transforming the input data until
you have reached a desired threshold. This is where metamorphic testing shines: in its
ability to find changes or errors in code while improving overall software accuracy
and reliability.
For example, a developer could apply multiple noise-based transforms to
determine how much noise a particular application can handle before it starts to fail.
Similarly, we could then apply several averaging transforms to the input data in an
attempt to cancel out the noise, or apply an exclusive transform to simply remove
the noise from the data set. Methods like these help reduce possible errors that
might exist in your code while improving the overall accuracy and reliability of your
software. The idea is to keep retesting passing test cases until the software breaks.
The figure below details the transform flow.
Figure 3.5 Metamorphic Stacking
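Stacking can be sketched as plain function composition. In the example below the function under test is the range (max minus min) of a data set, and the helper names are hypothetical; the point is only that the expected effect of a stack of transforms can still be predicted from the effect of each transform.

```python
from functools import reduce

def add_constant(c):
    """Additive transform: shift every value by c."""
    return lambda data: [v + c for v in data]

def scale(k):
    """Multiplicative transform: multiply every value by k."""
    return lambda data: [v * k for v in data]

def stack(*transforms):
    """Apply the transforms left to right, feeding each result to the next."""
    def stacked(data):
        return reduce(lambda acc, transform: transform(acc), transforms, data)
    return stacked

def value_range(data):
    """Function under test in this sketch."""
    return max(data) - min(data)

data = [1, 4, 9]
morphed = stack(add_constant(3), scale(2))(data)
# Adding a constant leaves the range alone and scaling by 2 doubles it,
# so the stacked transform should exactly double the original range.
print(morphed, value_range(morphed) == 2 * value_range(data))
```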
Applying a transform is relatively simple, but how do you know which
transform to apply? Not every transform is going to fit every problem. As of right
now there is no industry standard for applying data transformations, mainly
because the field of computer science encompasses such a wide range of
industries. Many of these individual industries do have a set framework for finding
software errors, but these methods often cannot be applied to another industry. To
understand how we applied metamorphic testing to our Android application, you
must first understand the metamorphic properties of the software itself.
3.5 Step Detection Algorithm
This thesis uses a pedometer application as a test bed to evaluate whether
metamorphic testing can be applied to Android sensor data and, if so, to measure its
effectiveness. In order to do this we will be manipulating the metamorphic properties
within this application’s mathematical and logical algorithms. Exploring and applying
the correct properties requires an understanding of basic human step detection.
Most people are familiar with the basic function of a pedometer, which is to
count the number of steps you take. But how does it count steps? Not too
many years ago pedometers had physical balls that rolled back and forth to
determine steps. Every time the ball made a full back-and-forth cycle the pedometer
registered one step, but this system takes up a lot of space and is not very
accurate. Most pedometers today use a microelectromechanical system,
or MEMS [22]. MEMS use a series of accelerometers to detect and calculate when a full
step cycle has occurred. When running or walking your body moves in three
dimensions. Accelerometers measure the acceleration along each of the X, Y, and
Z axes [23]. The figure below depicts a sample of this data. The next section will
explain the math behind calculating a human step.
Figure 3.6 – Accelerometer Sensor Data
[Line chart “Sensor Data”: acceleration plotted against time for the X, Y, and Z axes]
3.5.1 Step Cycle Detection
Key Terms
Lead Leg – Leg in front of the runner.
Trail Leg – Leg behind the runner.
Stride Position – The position where your lead leg is extended out to the farthest point in front of your body.
Kick Position – The position where your trail leg is extended out to the farthest point behind your body.
Once this data is collected, it can be processed to determine when a human step cycle
has been completed. From there we can begin to count these cycles, thus giving us a
step counter.
Figure 3.7, shown below, should help explain the concept. We will start with the
most apparent axis in the data set, which is the Z axis or your “side-to-side”
movement. Since acceleration is a measure of the change in speed, not of constant
speed, your “side-to-side” motion will have the greatest range in the data set. When
running or walking a person generally swings their arms, creating a back-and-forth
sideways motion. Finding this axis is key when your pedometer axes are
not tied to a specific orientation. For example, many phones have pedometer
applications that function no matter how you orient your phone on your body. When
you start moving, the software first looks for the data that has the highest
acceleration oscillation and declares it the Z axis. This is called Peak Detection.
Next is the vertical acceleration, or the Y axis. When running, your body moves
in an “up-and-down” motion. As you transition from the “stride”
position (the position where your lead leg is extended out to the farthest point in
front of your body) to the “kick” position (the position where your trail leg is
extended out to the farthest point behind your body), your body is moving up, and thus
registering an acceleration force on the Y axis. At the top of this motion your
body will eventually slow, coming to a complete stop before it falls back down. The
height of this upward motion corresponds to a peak on the Y axis graph. Your body is
suspended in the air for a very brief period; during this time acceleration is zero,
so the Y axis line begins to fall. As you transition from the “kick” position back to
the “stride” position your body begins to accelerate downward. The Y axis graph will
again rise because you are again accelerating. It might seem counterintuitive for the
acceleration line graph to rise when you are accelerating downward, but acceleration
in any direction (up, down, left, right, forward, or back) is considered a positive
acceleration value. A step cycle is considered complete when you transition from the
kick position to the stride position and then back to the kick position.
The final axis is forward acceleration, the X axis. Conceptually you might think
that this would be the axis with the highest values, and if this were a measure of
overall movement then the forward axis would indeed have the highest range and thus
the highest peaks on our graph. However, since acceleration is the measure of
change in speed, the X axis has the least “back-and-forth” motion of the three axes.
As you run or walk, your forward acceleration increases as you transition from the
kick position to the stride position, because you are in the process of bringing your
lead leg out in front of you (commonly called striding out). When your lead leg hits
the ground, becomes your trail leg, and begins transitioning into the kick position,
the forward acceleration decreases. At the same time your vertical acceleration
increases, because at this point your body is moving farther up than it is moving
forward.
Figure 3.7 – Stride Diagram [23]
3.5.2 Calculating Steps Filter
Filtering the data serves two purposes: the first is to smooth out the
accelerometer data, and the second is to cancel out false positives. This is achieved
by using Dynamic Precision [24], the process of continuously updating the average of a
data set. In this case we have three data sets: the X, Y, and Z axes. In order to find
the average we first need to find the minimum and maximum values of a predefined
subset of the entire axis array, in our case every fifty samples. The average value is
equal to (Max + Min)/2. This average is called the dynamic threshold level. A step is
counted when the original axis line crosses the threshold line with a negative slope.
Figure 3.8 below is an example of how this method is applied to the Z axis values.
Figure 3.8 – Dynamic Threshold Leveling [23]
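The dynamic threshold rule described above can be sketched as follows. This is a simplified offline version written for illustration, not the pedometer's actual source: it assumes the threshold of each fifty-sample window is applied to the samples of that same window, and the function name and synthetic signal are hypothetical.

```python
def count_steps(samples, window=50):
    """Count downward crossings of a per-window dynamic threshold."""
    # One threshold per window: (Max + Min) / 2 of that window's samples.
    thresholds = []
    for start in range(0, len(samples), window):
        block = samples[start:start + window]
        thresholds.append((max(block) + min(block)) / 2)

    steps = 0
    for i in range(1, len(samples)):
        level = thresholds[i // window]
        # Negative slope through the threshold: above it, then at or below it.
        if samples[i - 1] > level and samples[i] <= level:
            steps += 1
    return steps

# Synthetic single-axis signal: ten identical up-and-down cycles.
signal = [0, 5, 10, 5] * 10
print(count_steps(signal, window=8))
```

Each cycle of the synthetic signal crosses its threshold of 5 once on the way down, so one step is counted per cycle.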
3.6 Fault Seeding and Detection
In order to evaluate metamorphic testing for error detection, we must
introduce some errors into the software under test, a practice known as fault seeding
[26]. In this case the software under test is an Android pedometer application. The
basic concept behind fault seeding is simple: insert a logical or mathematical error
into a piece of software, then run it through a test case. This helps developers
determine whether their test cases can effectively detect that particular type of
fault. These faults can either be introduced to the code manually or generated
automatically using techniques such as Dependency Graphs [25].
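A small sketch of the idea, using the dynamic threshold formula from Section 3.5.2 as the code under test. The seeded fault (a plus sign replaced by a minus sign) and the function names are our own illustrative choices, not one of the faults used in the study.

```python
def threshold(block):
    """Original code: dynamic threshold level of one window."""
    return (max(block) + min(block)) / 2

def threshold_seeded(block):
    """Same code with a manually seeded fault: '+' replaced by '-'."""
    return (max(block) - min(block)) / 2

def additive_relation_holds(fn, block, c=3, tol=1e-9):
    """Additive property: shifting all inputs by c should shift the result by c."""
    shifted = [v + c for v in block]
    return abs(fn(shifted) - (fn(block) + c)) < tol

block = [2, 8, 4, 10]
print(additive_relation_holds(threshold, block))         # relation holds
print(additive_relation_holds(threshold_seeded, block))  # fault is exposed
```

The metamorphic test never needs to know the correct threshold for the block; the seeded version is caught only because its output fails to move with the input.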
CHAPTER 4 – DESIGN AND APPROACH
4.0 Android Framework
The Android operating system has become one of the most popular
development platforms over the last few years, due in large part to its robust
libraries. Perhaps more importantly, its detailed documentation provides developers
with an in-depth understanding of how to use its vast library of functions and how to
test them, and it ships with a large suite of built-in test cases and functions.
Through this documentation [28] [29] and an understanding of Java, we were able to
construct not only a metamorphic testing framework for the Android platform, but also
a parsing algorithm that automatically checks Android applications for testing
functions, also known as test case detection. These two tools were created by carefully
taking advantage of some known Android functions and repurposing them to
generate an output that is useful to us.
The Android system adheres to the following framework: in order for any
application to receive data from any device sensor, that application must ask for
permission from the Android operating system. This is done by calling the
“registerListener” function from Android’s API (Application Programming Interface)
[27]. Two of this function’s parameters matter here. The first is the name of the
object you would like that sensor data forwarded back to. This name will be reused
elsewhere in the code to collect that particular type of sensor data. The second is
the type of sensor data your application needs. This is important because smartphones
today have a large assortment of sensors, ranging from GPS to microphones, and this
parameter specifies what sensor data the operating system forwards to the requesting
application.
Once an application has sensor permission from the Android operating system
we can then use that object name to receive data. Within that object is another
function from the Android API called “onSensorChanged” [30]. Android uses this
function to deliver new values every time the data changes. For example, whenever
your GPS location changes on your phone, that GPS data is sent to the
onSensorChanged functions of all applications that currently have permission to
access the GPS sensor. Since all new sensor data is sent to this function, it is here
that applications must perform any and all computations on sensor data, as well as
any tests. These tests and calculations can either be done by source code that
is native to the onSensorChanged function itself, or by other modular functions
that are called upon to perform the calculation tasks for any given application. This
also holds true for any tests or test functions that may exist. Figure 4.1 illustrates
how the Android framework operates.
Figure 4.1 – Android Framework
4.1 Test Case Detection
Now that we understand the framework that powers sensors, we can
repurpose it to evaluate whether sensor-based Android applications are taking advantage
of the testing libraries and tools provided by the Android API. We can also determine
whether the developers are implementing their own testing methods and, if so, what kind
of testing they are implementing. In order to detect Android test cases for sensor
applications we first need to determine if the app uses any sensor data from the
device itself. Mobile devices contain many sensors, but not all apps make use of
them. For example, an application that keeps track of the number of steps you take
throughout the day may use a device’s accelerometer or GPS sensors to calculate
steps, whereas an application that simply sends or receives messages (Facebook, for
example) will make no use of a device’s onboard sensors. To extract this information
from an application’s source code we used a SrcML.net [31] function called
GetDescendantsAndSelf<MethodCall>() [32]. This function parses
through a given source code looking for a specific function by name. In this case we
are looking for the Android function called getDefaultSensor [27]. By searching for
this function, SrcML can return the type and number of sensors any particular
application is using. If there is no sensor in use we can skip that application and
continue parsing the next one. The code used to complete this task can be found in
Appendix A.
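To illustrate the search by function name, the sketch below swaps in a plain regular expression over the Java source text in place of SrcML's parsed representation. It is not the Appendix A code, it would miss calls whose argument is not written as a literal Sensor.TYPE_ constant, and the function name and sample source are hypothetical.

```python
import re
from collections import Counter

# Matches calls such as getDefaultSensor(Sensor.TYPE_ACCELEROMETER).
SENSOR_CALL = re.compile(r"getDefaultSensor\s*\(\s*Sensor\.(TYPE_[A-Z_]+)\s*\)")

def sensors_used(java_source):
    """Return how many times each sensor type is requested in the source."""
    return Counter(SENSOR_CALL.findall(java_source))

sample = """
accel = sensorManager.getDefaultSensor(Sensor.TYPE_ACCELEROMETER);
gyro = sensorManager.getDefaultSensor( Sensor.TYPE_GYROSCOPE );
"""

found = sensors_used(sample)
print(dict(found) if found else "no sensors in use, skip this application")
```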
Once we know that an application makes use of a device’s sensors, the next
step is to check whether the application performs any internal tests during or after
the calculations performed on the incoming sensor data. For
example, the step counting application should test itself to see if the desired output
is being achieved when given a set of input sensor data. There are two different types
of testing scenarios we are looking for. The first is testing done
using Android’s built-in testing library. The Android API comes with a variety of
built-in testing functions that can be used to test a wide range of Android’s
functionalities. The second scenario is developer-created testing functions.
This is when a developer uses either traditional white-box testing or some other
testing strategy to create their own test cases. The ultimate goal is to detect both
developer-created test functions and Android’s built-in test functions; however, the
strategies used for detecting these two types are drastically different.
4.1.1 Detecting Android API Tests
We will start with the easier of the two scenarios to detect, which is
detecting test cases that have been built into the Android API. We already
know the names of the Android test functions and what they do thanks to the Android
API documentation. From this we can determine whether a developer has used one of
Android’s built-in testing libraries. This is done in much the
same way we find the getDefaultSensor function: simply change the name of the keyword
you’re looking for during the parsing process. In this experiment we searched for six
Android test functions to see if developers were taking advantage of these built-in
tools. The full list of Android tests we searched for can be found below. We
hypothesized that developers would attempt to use the provided testing methods
before building one from scratch.
4.1.2 List of Android API Tests searched for
• ActivityUnitTestCase - This class provides isolated testing of a single activity. The activity under test will be created with minimal connection to the system infrastructure, and you can inject mocked or nested versions of many of Activity's dependencies [27].
• ServiceTestCase - This test case provides a framework in which you can test Service classes in a controlled environment. It provides basic support for the lifecycle of a Service, and hooks with which you can inject various dependencies and control the environment in which your Service is tested [27].
• ApplicationTestCase - This test case provides a framework in which you can test Application classes in a controlled environment. It provides basic support for the lifecycle of an Application, and hooks by which you can inject various dependencies and control the environment in which your Application is tested [27].
• ProviderTestCase2 - This test case class provides a framework for testing a single Content Provider and for testing your app code with an isolated content provider. Instead of using the system map of providers that is based on the manifests of other applications, the test case creates its own internal map. It then uses this map to resolve providers given an authority. This allows you to inject test providers and to null out providers that you do not want to use [27].
• LoaderTestCase - A convenience class for testing Loaders. This test case provides a simple way to synchronously get the result from a Loader making it easy to assert that the Loader returns the expected result [27].
• ActivityInstrumentationTestCase2 - This class provides functional testing of a single activity. The activity under test will be created using the system infrastructure (by calling InstrumentationTestCase.launchActivity()) and you will then be able to manipulate your Activity directly [27].
4.1.3 Detecting Developer Created Tests
To find developer-created test cases we set the parsing algorithm to search
for the onSensorChanged function, in exactly the same way we search for the
getDefaultSensor function. We know that the onSensorChanged function is where all
Android applications receive incoming sensor data from the Android operating
system, so finding this function is the first step in detecting any developer-created
tests that may exist.
There are two ways a developer can implement a testing function in relation to the
onSensorChanged function. Either the testing source code or testing function call
exists natively within the onSensorChanged function itself (referred to as a t1
test case), or it is embedded in another function outside of onSensorChanged, which
performs the calculations on the sensor data and then calls the test function to
perform testing (referred to as a t2 test case).
The t1 test case is the simpler of the two. The calculations are done within the
onSensorChanged function, either by source code native to
onSensorChanged or by a call to some calculation function that exists
outside of the scope of onSensorChanged. However the calculations are done, the
testing function that is used to evaluate these calculations is called within the
onSensorChanged function itself. In this scenario we only need to determine the test
case’s parent function one level up, which in our case is easy because we already
know the parent is the onSensorChanged function. The parsing algorithm can then
return all the children of the onSensorChanged function; among them will be the
testing function or functions we are looking to detect. Figure 4.2 below illustrates
the flow of the sensor data from onSensorChanged, to calculations on the sensor data,
to the passing of calculated data to a test function.
Figure 4.2 – t1 Test Case
If the test case is embedded in another function that exists outside of the
onSensorChanged function, we refer to it as a t2 test case. This is when the sensor
data is passed to another function to perform the mathematical calculations, and
the test function for these calculations is called within the function performing the
math. This is a more realistic scenario, because when creating software almost
all of your code, especially computational code, is contained within a
function. It is also much harder to find where the testing function is located,
because we no longer know the name of its parent function. For the t1 test
case we relied on the Android API to tell us the name of the function, then
we simply searched for that function name when parsing the code. In this scenario the
developer could have named the calculation function anything. To solve this problem
we need to return all the functions called by the
onSensorChanged function, and then return all of the functions called within
those functions. Figure 4.3 below illustrates how the t2 test function is called
(embedded) by a calculation function.
Figure 4.3 – t2 Test Case
As you can see, the sensor data is simply passed from the onSensorChanged
function to the calculation function where it is processed, and then passed to the test
function.
The code to mine this information out of the Java code during parsing is below.
Figure 4.4 – Caller Method
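The two-level search can be sketched over a simple call graph. This is an illustration only, not the SrcML.net code of Figure 4.4: the call graph is assumed to be already extracted into a dictionary, the set of known test function names is an assumed input, and every function name other than onSensorChanged is hypothetical.

```python
def classify_tests(call_graph, test_functions, root="onSensorChanged"):
    """Split test functions into t1 (called by root) and t2 (one level deeper)."""
    children = call_graph.get(root, [])
    # t1: the test function is a direct child of onSensorChanged.
    t1 = [f for f in children if f in test_functions]
    # t2: the test function is called by a child (calculation) function.
    t2 = [
        grandchild
        for child in children
        if child not in test_functions
        for grandchild in call_graph.get(child, [])
        if grandchild in test_functions
    ]
    return {"t1": t1, "t2": t2}

call_graph = {
    "onSensorChanged": ["detectStep", "checkSensorValues"],
    "detectStep": ["lowPassFilter", "checkStepCount"],
}
tests = {"checkSensorValues", "checkStepCount"}
print(classify_tests(call_graph, tests))
```

Here checkSensorValues is reported as a t1 test case because onSensorChanged calls it directly, while checkStepCount is a t2 test case because it is only reachable through the calculation function detectStep.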
4.1.4 Test Case Detection Procedure
We used Microsoft Visual Studio with a SrcML.net plugin to write the
source code that powers our parsing program. We then applied the algorithm to a
body of thirty sensor-driven open source Android applications downloaded from
repositories such as GitHub [33]. The complete list of applications and their download
sources can be found in Appendix B. All applications were downloaded and stored in
a single folder that would serve as a root directory, or starting point, for our
algorithm. To execute the program we used Visual Studio’s “Run Tests” feature, at
which time the program would display the sensor types, the implementations of
onSensorChanged, and the child functions of onSensorChanged for each
application stored within the root directory. Figure 4.5 is an example of the output
displayed after the program has completed.
Figure 4.5 – Parsing Algorithm Output
4.2 Data Collection (SenSee)
SenSee is an Android application created by Virginia State staff and students
using the same rules applied for test case detection [34]. The basic principle behind
SenSee is to allow a user to perform a series of actions or tasks using Android sensor
data, while at the same time allowing him or her to record and tag those actions in
order to provide some ground truth for the data that is being collected. We used it to
establish the number of steps actually taken by an individual during our evaluation of
a pedometer application. Using SenSee’s tag feature we were able to identify
where each step or set of steps occurred when evaluating the sensor data,
effectively eliminating the oracle problem. Figure 4.6 below illustrates the real-world
step tags recorded when collecting sensor data.
Figure 4.6 – Accelerometer Sensor Data with Tag Lines
The framework that powers SenSee is very similar to the framework that
powered our test case detection algorithm discussed earlier in this thesis. The
difference is that SenSee doesn’t use the onSensorChanged function to search for
test cases; instead it uses it to hijack and manipulate sensor data that is sent to
any application that has permission to receive it. SenSee is a standalone application
that does not have to integrate with any other application or rely on outside code,
thus allowing us to perform two tasks. The first is to test the Android sensors
themselves. Because SenSee captures raw input data from the device sensors, developers
can see if specific sensors are producing the correct readings before using that
sensor data as input into another application. This is a simple quality control
measure. Inputting corrupt or incorrect sensor data will cause an application to
either crash or produce incorrect results. The second task SenSee allows us to do is
to control what data is sent to a particular application. This ability opens the door
for metamorphic testing on Android platforms and is the focus of this thesis.
Figure 4.7 – SenSee Capture and Transform Diagram
4.2.1 Data Collection Procedure
To collect data we used three participants, both male and female, using three
Android devices all running SenSee. This was done to ensure that sensor data could be
recorded across multiple Android devices, as well as to confirm that the pedometer app
being tested could handle both male and female walking postures. Our participants
walked a predefined number of steps while the Android device recorded all
accelerometer sensor data along the way. SenSee stores all recorded data as a CSV file,
which is then taken from the device and stored on a computer running a virtual copy of
SenSee via Android Studio, where it can then be fed into any Android application. In
our case we used an open source pedometer application.
4.3 Error Detection
The overall goal of this study is to detect errors using metamorphic testing,
but to do that we must first define what an error is. The term error in the field of
computer science can refer to many things, but we are focused on two types of
errors.
The first is a programming error. This is an error that exists within the code that
leads to bugs or unintended glitches. Almost all software contains errors within its
source code, with varying degrees of disruption to the overall function of the software.
In order to find these bugs you must first determine that they exist. This is harder to
achieve in some software than it is in others. Simple applications generally contain
fewer lines of code and have less dependency on external functions to operate, so
finding a programming error, if one exists, is much easier. Larger and more complex
pieces of software, the Windows operating system for example, can contain millions
of lines of code within thousands of functions that all depend upon one another to
perform correctly. Finding errors in environments such as these is far more difficult. If
these errors persist they can lead to dramatic fluctuations in our second type of
error.
Threshold error is the amount of incorrect results a particular piece of software can
produce before it is considered to have failed. For example, if software can be up to
20% incorrect and still be considered effective, that software has a 20% error
threshold; it must be correct at least 80% of the time to stay within that threshold.
This number can vary greatly depending on the application. Nuclear power
plants or flight control systems contain software that must meet a much stricter error
threshold, because the cost of failure can be catastrophic. In general, the amount of
threshold error a piece of software produces is a direct consequence of the number of
programming errors contained within its source code. To combat this we must
either detect or take steps to minimize any logical or computational errors that may
exist.
4.4 Applied Metamorphic Transforms
In order to detect these errors we applied a series of metamorphic
transforms to our Android application. Because our application is a step detection
application that uses a device’s onboard accelerometer sensor, we can use SenSee to
alter the data being received by the application itself. To better display the
transforms’ effects on our sensor data we will be comparing the results from one of
our data sets. This particular data set only recorded 10 steps, so it should be easier
to follow. The original accelerometer values for this data set are shown in Figure 4.8,
which shows the values for X, Y, and Z, as well as the positive average over all the
axes. This average is what the step detection algorithm uses to determine a step.
Figure 4.8 – Original 10 Step Data Set
[Line chart “Original Data”: X Axis, Y Axis, Z Axis, and Pos Avg plotted against sample index]
4.4.1 Multiplicative Transforms
The first transforms we applied were a series of multiplicative transforms; in
our case we multiplied the accelerometer input values by two. At first we multiplied
all three axes by two. This resulted in higher peaks across all the axes and thus a
higher average peak, causing the algorithm to count more steps than were actually
taken. Next we limited the multiplication to only one axis, in our case the Z axis. The
algorithm still counted a high number of steps, but 15% fewer than when multiplying all
three axes by a factor of two. This can be a powerful tool for source code error
detection. Multiplying data by a constant allows developers to stretch their algorithms
to the breaking point, providing insight into how much alteration or how many outlying
data points they can handle before failing.
Figure 4.9 – Multiplicative Transform on 10 Step Data Set
[Line chart “Multiply All Axis by 2”: X Axis, Y Axis, Z Axis, and Pos Avg plotted against sample index]
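The transform itself is small enough to sketch. The version below assumes each accelerometer reading is an (x, y, z) tuple and lets the caller choose which axes to scale; the function name and sample values are hypothetical, and SenSee's actual implementation is not shown in the text.

```python
def multiply_axes(samples, factor, axes=(0, 1, 2)):
    """Scale the chosen axes (0 = X, 1 = Y, 2 = Z) of each (x, y, z) sample."""
    return [
        tuple(v * factor if i in axes else v for i, v in enumerate(sample))
        for sample in samples
    ]

samples = [(1.0, 9.5, 2.0), (0.5, 10.0, -3.0)]
all_axes = multiply_axes(samples, 2)         # every axis doubled
z_only = multiply_axes(samples, 2, axes=(2,))  # only the Z axis doubled
print(all_axes)
print(z_only)
```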
4.4.2 Interpolating Transforms
This transform simply takes every adjacent pair of numbers
within the data array and averages them together. The resulting average is then
inserted in between those two numbers. This smoothed out the data, resulting in
smaller peaks, but not to such a degree that the algorithm could no longer perform
peak detection. The result was a higher degree of accuracy and a lower
threshold error across almost all data sets tested. While this transform can be an
invaluable tool to help developers eliminate noise or unwanted data from their data
sets, it is a poor tool for error detection because of its tendency to mitigate errors.
Figure 4.10 – Interpolating Transform on 10 Step Data Set
[Line chart “Interpolating Transform”: X Axis, Y Axis, Z Axis, and Pos Avg plotted against sample index]
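A minimal sketch of this transform for a single axis, assuming the data is a plain list of numbers (the function name is hypothetical):

```python
def interpolate(values):
    """Insert the average of every adjacent pair between the two values."""
    if not values:
        return []
    out = [values[0]]
    for prev, cur in zip(values, values[1:]):
        out.append((prev + cur) / 2)  # midpoint of the adjacent pair
        out.append(cur)
    return out

print(interpolate([0, 10, 4]))
```

A list of n values becomes a list of 2n - 1 values, which is why the sample index in the figure above runs to roughly twice that of the original data set.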
4.4.3 Adding Avg Noise Transforms
This transform finds the overall average of a particular set of data, in our case
the X, Y, and Z axes, then adds that average value to every number in the data set. This
raises the overall average of the data set as a whole, while flattening the data set at
the same time. This method doesn’t provide the same threshold accuracy improvement that
interpolating does, but it still produces a noticeable improvement. The effectiveness
of this transform as an error detection tool is largely based on the metamorphic
properties of the software in question. If your software relies on data that has a
wide range of both large and small numbers being a specific distance from each other,
this transform can be used to test how far apart or how close together those numbers
can be before your algorithm fails. For example, our pedometer’s peak detection
algorithm contains a statement that checks whether the last peak recorded is at least
two thirds as high as the current peak; if yes, it counts as a step, and if no, it is
discarded as walking motion noise.
Figure 4.11 – Add Average Noise Transform on 10 Step Data Set
[Line chart “Add Avg Noise”: X Axis, Y Axis, Z Axis, and Pos Avg plotted against sample index]
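The sketch below shows why this transform interacts with the two-thirds peak check. Both functions are our own reading of the description above rather than the pedometer's source, and the sample values are made up so that the effect is visible.

```python
def add_average(values):
    """Add the data set's overall average to every value."""
    avg = sum(values) / len(values)
    return [v + avg for v in values]

def passes_ratio_check(last_peak, current_peak):
    """Assumed form of the check: last peak at least two thirds of current."""
    return last_peak >= (2 / 3) * current_peak

values = [6, 2, 10, 2]         # peaks at index 0 and index 2
shifted = add_average(values)  # average is 5, so peaks become 11 and 15

print(passes_ratio_check(values[0], values[2]))    # 6 vs 10: rejected as noise
print(passes_ratio_check(shifted[0], shifted[2]))  # 11 vs 15: counted as a step
```

The absolute distance between the two peaks is unchanged, but their ratio moves closer to one, so a peak that was previously discarded as noise now passes the check.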
4.4.4 Down Sampling Transforms
Down sampling is perhaps the ultimate test for evaluating how effective a
sensor based algorithm is. In most cases it does nothing to improve the overall
threshold accuracy of a piece of software, but as an error detection tool it can
provide a great deal of information. This transform can be used to evaluate how much
data can be lost before an algorithm’s performance starts to decay. During our
testing we down sampled data sets by 50 percent, effectively halving the number of
accelerometer input values fed to the application. This greatly reduced the accuracy
of all results produced, but because of its ability to introduce unknowns into your
algorithm, it can be a great tool for error detection. It forces developers either
to do more with less data or to apply transforms that help improve overall threshold
accuracy, such as the interpolating transform.
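A minimal sketch of a 50 percent down sample is shown below. The text does not say which data points were removed, so dropping every second sample is an assumption, and the function name and parameter are hypothetical.

```python
def down_sample(values, keep_every=2):
    """Keep one sample out of every `keep_every` and drop the rest.

    keep_every=2 removes half of the samples (a 50 percent down sample).
    Which samples the original tool dropped is not stated; keeping the
    first of each group is an assumption made for this sketch.
    """
    if keep_every < 1:
        raise ValueError("keep_every must be at least 1")
    return values[::keep_every]


samples = list(range(10))      # stand-in for accelerometer readings
halved = down_sample(samples)  # keeps indices 0, 2, 4, 6, 8
print(len(samples), len(halved))
```

Feeding both `samples` and `halved` to the same step counter and comparing the two step counts shows how much data loss the algorithm tolerates.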
Figure 4.12 – Down Sampling Transform on 10 Step Data Set
[Chart: Down Sample 50%. Series: X Axis, Y Axis, Z Axis, Pos Avg]
4.4.5 Semantical Transforms
Perhaps the most straightforward transform for error detection, a semantic
transform simply applies a mathematical property to existing data in a manner that
should result in the exact same data. These methods can range from multiplying by 1
or adding 0, to applying sine or cosine properties, to applying a matrix transform
to your data set. The method in which you apply this transform can vary based on
testing needs, but the result should always be the same. If your data changes over
the course of this transform, your software is fundamentally flawed.
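As a sketch of what such transforms might look like, the example below multiplies by 1 and adds 0, and multiplies by the identity sin²θ + cos²θ = 1. The function names are hypothetical. The comparison uses a small tolerance rather than exact equality, because floating-point rounding can change the last digits of a value even when the underlying mathematics is an identity.

```python
import math


def times_one_plus_zero(values):
    """Multiply every value by 1 and add 0."""
    return [v * 1 + 0 for v in values]


def trig_identity(values, angle_degrees=45.0):
    """Multiply every value by sin^2 + cos^2, which equals 1 for any angle."""
    angle = math.radians(angle_degrees)
    factor = math.sin(angle) ** 2 + math.cos(angle) ** 2
    return [v * factor for v in values]


def same_data(original, transformed, tolerance=1e-9):
    """Compare two data sets value by value, allowing for rounding error."""
    return len(original) == len(transformed) and all(
        math.isclose(a, b, rel_tol=tolerance, abs_tol=tolerance)
        for a, b in zip(original, transformed)
    )


data = [9.81, -0.42, 3.14, 0.0]  # made-up accelerometer readings
print(same_data(data, times_one_plus_zero(data)))
print(same_data(data, trig_identity(data)))
```

If the software under test produces a different result for `data` and for either transformed copy, the difference points at the software rather than at the input.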
Figure 4.13 – Semantical Transform on 10 Step Data Set
[Chart: Semantic. Series: X Axis, Y Axis, Z Axis, Pos Avg]
4.5 Fault Seeding Study
In order to evaluate the effectiveness of metamorphic testing and its
transforms for error detection, we used a method called fault seeding. Fault seeding
is simply the introduction of known errors into software source code. In our case we
introduced a number of errors into the step detection algorithm within the
pedometer application. In order to evaluate against a wide range of possible real
world errors, we enlisted the help of several graduate students and professors within
the computer science department at VSU.
We gave our participants a set of instructions, which can be found in Appendix
D, asking them to introduce several computational or logical errors into a series of
functions that govern the pedometer’s step detection algorithm. Using the original
source code as a base case, we first recorded all the results produced by the original
code using both raw, unmodified input sensor data and morphed transform data. This
was simply a matter of recording the number of steps the algorithm calculated after
a particular transform or set of transforms was applied. These results were then
compared to the results of the corrupted code after the same set of transforms was
applied. The full list of results can be found in Appendix B.
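The comparison procedure described above can be sketched as follows. Everything in this example is a stand-in: the threshold-crossing step counter, the seeded fault (`>` changed to `>=`), the transforms, and the data are invented for illustration and are not the pedometer's actual code or any participant's fault.

```python
def count_steps_original(values, threshold=10.0):
    """Stand-in step counter: count each rise above the threshold."""
    steps, above = 0, False
    for v in values:
        is_peak = v > threshold
        if is_peak and not above:
            steps += 1
        above = is_peak
    return steps


def count_steps_seeded(values, threshold=10.0):
    """The same counter with a seeded fault: `>` was changed to `>=`."""
    steps, above = 0, False
    for v in values:
        is_peak = v >= threshold
        if is_peak and not above:
            steps += 1
        above = is_peak
    return steps


# Hypothetical transforms; "raw" leaves the input untouched.
transforms = {
    "raw": lambda values: list(values),
    "multiply_by_two": lambda values: [v * 2 for v in values],
    "shift_up": lambda values: [v + 0.5 for v in values],
}

data = [0.0, 12.0, 0.0, 9.5, 0.0, 12.0, 0.0]  # made-up sensor readings

# Record (original, seeded) step counts for every transform.
report = {}
for name, transform in transforms.items():
    morphed = transform(data)
    report[name] = (count_steps_original(morphed), count_steps_seeded(morphed))

# A transform detects the fault when the two step counts differ.
detecting = [name for name, (good, bad) in report.items() if good != bad]
print(report)
print(detecting)
```

In this made-up case the raw data gives the same step count for both versions, so a test on unmodified input alone would not notice the fault; only one of the transformed inputs exposes it.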
CHAPTER 5 – EVALUATION
5.0 Study Recap
Over the course of this endeavor we have created several unique tools and
methodologies for Android developers to find and create test cases for any given
sensor driven application. Our objective was to determine what testing strategies
are being deployed by indie developers today, to conclude whether metamorphic
testing is possible on Android platforms, and if so, to evaluate its effectiveness.
The final evaluation of these systems is outlined below.
5.1 Test Case Detection Results
After applying our parsing algorithm to a body of thirty open source
applications, we found that almost all of them fail to perform any sort of internal
testing. The complete results sheet of this analysis can be found in Appendix C. Only
three Android created test cases were found, as well as three user defined test cases.
All six of the detected test cases were located among three applications. Thus only
10% of the applications we tested contained some sort of internal testing
functionality. This may be due to the fact that our pool of applications is open
source. If we applied our algorithm to a body of paid closed source applications such
as “FaceBook” or “Clash of Clans”, my hypothesis is that we would detect far more
internal test cases.
To further validate our results we compared our findings to those of a much
larger test case detection study conducted at Singapore Management University [35].
Using a pool of over 600 Android applications collected from two online repositories,
F-Droid [37] and GitHub [33], these researchers concluded that only 14% of the
applications evaluated contained test cases. These findings are very close to our own
conclusion of 10%. This study went one step further and also found that only 9% of
the apps that have executable test cases have coverage above 40%. Since at most about
1.3% of all the apps studied (0.14 × 0.09) therefore exceed 40% coverage, the share
of open source Android applications whose test cases examine more than half of their
source code is smaller still, plausibly below 1%.
5.2 Initial Transform Results
In order to be certain our transforms were working as intended, we applied
them to a series of incoming sensor data and then checked the resulting data to
determine whether the correct mathematical operations had been applied. This
transformed data was then fed into our step detection application, and the number of
steps detected was recorded. During this stage we could evaluate what effect each
transform had on the overall threshold accuracy of the application, and thus its
overall performance. Some transforms greatly increased accuracy over all data sets
tested, while others had adverse effects. For example, transforms that applied data
averages to the data set as a whole tended to decrease the error threshold, which in
turn increased accuracy. Other transforms, such as adding noise, tended to decrease
overall accuracy. All of these results were used as a base case for our error
detection experiment. The complete transform analysis can be found in Appendix F.
Figure 5.1 – Base Line Pedometer Results before Transforms
Base Line Data
Data Set Name | # of Steps Actually Taken | Device Used | Steps Calculated When App Sensitivity = Default Sensitivity
Marcos10Hip.csv | 10 | Note 3 Phone | 14
Marcos50Hip.csv | 50 | Note 3 Phone | 49
Cece50StepsV2.csv | 50 | Note 3 Phone | 68
CeCe100StepsHip.csv | 50 | Galaxy S3 | 47
Cece50StepsHipS3.csv | 25 | Android Tablet | 21
Each “.csv” file contains several thousand points of accelerometer data recorded
using SenSee. The table above simply depicts the number of steps the pedometer
application calculated after each data set was processed, and compares it to the
number of steps actually taken. The next figure (Figure 5.2) illustrates the
application’s calculated steps using transformed data sets as input.
Figure 5.2 – Pedometer Application Results for each Transform
Calculated Steps After Transform
Data Set Name | Multiply All By Two | Multiply Z Axis By Two | Insert Noise | Convert To Rounded Noise | Semantic | Add Avg Noise | Down Sample 50% | Interpolating | Base Line Shift
Marcos10Hip.csv | 14 | 12 | 75 | 11 | 10 | 10 | 4 | 10 | 11
Marcos50Hip.csv | 98 | 72 | 450 | 53 | 49 | 56 | 42 | 52 | 54
Cece50StepsV2.csv | 70 | 60 | 251 | 44 | 44 | 39 | 18 | 44 | 40
Cece50StepsHipS3.csv | 86 | 70 | 336 | 49 | 47 | 51 | 29 | 47 | 52
accelerometer.csv | 68 | 51 | 5370 | 11 | 21 | 12 | 26 | 24 | 17
As you can see, some transforms, like insert noise, cause the accuracy of
calculated steps to fall dramatically for all data sets, while others, such as adding
average noise, increase overall accuracy for the given data sets. Now that we have
these transform results, we can use them as a new base line in order to detect errors
or changes either within the existing source code or in future iterations of it.
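To make the comparison concrete, the sketch below computes how far each calculated step count in the Marcos10Hip.csv row of Figure 5.2 lies from the 10 steps actually taken. The assignment of values to transform names follows the column order as read from the figure, which is an assumption, and the absolute-error measure is chosen here for illustration; it is not necessarily the accuracy measure used in the study.

```python
ACTUAL_STEPS = 10  # steps actually taken for Marcos10Hip.csv (Figure 5.1)

# Calculated steps per transform for Marcos10Hip.csv, in the column
# order read from Figure 5.2 (the column mapping is an assumption).
calculated = {
    "Multiply All By Two": 14,
    "Multiply Z Axis By Two": 12,
    "Insert Noise": 75,
    "Convert To Rounded Noise": 11,
    "Semantic": 10,
    "Add Avg Noise": 10,
    "Down Sample 50%": 4,
    "Interpolating": 10,
    "Base Line Shift": 11,
}


def absolute_errors(step_counts, actual):
    """Return the distance of every calculated step count from the truth."""
    return {name: abs(count - actual) for name, count in step_counts.items()}


errors = absolute_errors(calculated, ACTUAL_STEPS)
worst = max(errors, key=errors.get)
for name, error in errors.items():
    print(f"{name}: off by {error} steps")
print("Largest error:", worst)
```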
5.3 Fault Seeding/Error Detection Results
After determining a transform base line for our original source code, we then
evaluated our transforms by introducing errors or defects into the application’s
source code. Some of these errors were small in scale and were only detected with
transforms that altered the input data on a large scale, while other defects caused
the application to fail altogether. Figure 5.3 shows an example of how the transform
results are affected after an error has been introduced into the source code. In this
particular situation the error was small: a mathematical operation was changed from
addition to subtraction. The corrupted source code still calculated 10 out of 10
steps. Using traditional white-box testing this error may have gone unnoticed, but
after we applied several transforms to the input data, two of those transforms
(Inserting Random Noise and Average Noise) returned results that did not match our
base line transforms, thus revealing an error or change in the code.
Figure 5.3 – Transforms Results after Introducing an Error
Calculated Steps After Transform
Result | Multiply All By Two | Multiply Z Axis By Two | Insert Random Noise | Convert To Rounded Noise | Semantic | Add Avg Noise | Down Sample 50% | Interpolating | Base Line Shift
Original Transform Result | 14 | 12 | 75 | 11 | 10 | 10 | 4 | 10 | 11
Result with Error | 14 | 12 | 77 | 10 | 10 | 9 | 4 | 10 | 11
After discarding the Random Noise transform, due to the fact that it will almost
always produce a result different from the base line, we are left with one transform
that was able to detect this particular error. This highlights the fact that even
after applying a wide range of metamorphic transforms you still may not be able to
detect every error; however, this is a far better option than traditional white-box
testing. In this case a traditional unit test would more than likely have passed this
particular source code if it did not employ some sort of metamorphic functionality.
As a developer, if you want to increase the rate of error detection you are left with
two options. Either apply more transforms to your application’s input data (for
example, we could have applied twenty-five transforms instead of nine), or apply
transforms that better exercise your source code’s computations. The latter solution
requires developers to have a concrete understanding of how their source code works.
Once this understanding is achieved, you will know which transforms should be applied.
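The base line comparison described above, including the step of setting aside a transform whose output is random, might be sketched like this. The transform names, the step counts, and the function are illustrative only and are not the tooling used in the study.

```python
# Transforms whose output changes from run to run cannot be compared
# against a stored base line, so they are skipped (names are hypothetical).
NON_DETERMINISTIC = {"insert_random_noise"}


def changed_transforms(baseline, current, skip=NON_DETERMINISTIC):
    """List the transforms whose step count no longer matches the base line."""
    return sorted(
        name
        for name, expected in baseline.items()
        if name not in skip and current.get(name) != expected
    )


# Illustrative step counts recorded before and after a code change.
baseline = {
    "multiply_all_by_two": 14,
    "insert_random_noise": 75,
    "add_avg_noise": 10,
    "down_sample_50": 4,
}
current = {
    "multiply_all_by_two": 14,
    "insert_random_noise": 77,
    "add_avg_noise": 9,
    "down_sample_50": 4,
}

print(changed_transforms(baseline, current))
print(changed_transforms(baseline, current, skip=set()))
```

An empty list means none of the deterministic transforms noticed a change, which, as noted above, does not prove that the code is free of errors.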
Over the course of this project we have discovered several uses for our
particular transforms and how they may be best applied to other scenarios. We have
compiled this knowledge into a taxonomy that can be found below.
5.4 Full Transform Taxonomy
1) Multiplicative Transforms: Multiply numerical values by a constant
This transform is relatively simple, and you should know what the outcome should be. It is very good at testing for what are called limit errors in your software. If your program can only handle an 8 bit number and multiplying by a large constant results in a 9 bit number, your program will either discard the extra bit or fail altogether, depending on the machine. The same can be done by feeding numbers with many decimal places to your software to see how many decimal places your program can calculate before failing.
We applied this method to our app by multiplying all our accelerometer data points by a factor of two. We did not encounter any limit errors within the app; however, using this transform we discovered that the algorithm’s step detection becomes less and less reliable the higher the accelerometer values are.
2) Insert Random Noise Transforms: This Transform inserts a completely random noise value in between every original data point.
If your algorithm needs to cancel out unneeded or useless data, applying this transform is a good way to test whether your software can effectively handle the insertion of large spikes in your data. For example, you may need to ignore all data that is above or below a certain threshold. If some of the random data falls within the limits of that threshold, however, it will serve to corrupt your data. Determining the most effective threshold limit for such algorithms is where this metamorphic test shines.
The app we applied this transform to did not have any such method.
3) Convert to Rounded Noise Transforms: This Transform modifies all the existing array values by converting them to some value within plus or minus 1 of the original data.
Instead of inserting noise into the data set, this transform converts the existing data. The transform will only change each number to a value no more than one higher or lower than the original number. This transform is great for seeing how your system handles small fluctuations or errors in your data. Many applications require human input, and these inputs are not always the most accurate, so your system should be able to handle these errors.
4) Semantic Transforms: This Transform creates inputs that have the same “meaning” as the original.
This is a very simple but effective method to check whether your seemingly correct outputs are actually correct mathematically. Applying a mathematical function to your data that should result in the exact same output, such as a trigonometric identity involving Cos(45), is effective in finding “order of operations” errors and other common mathematical mistakes.
When we applied this transform to our data set the resulting data was the same; thus we were able to conclude the application had no obvious misuse of mathematical operations.
5) Interpolation: This Transform adds average noise to the data set in between each original data point.
This method works by finding the average of two consecutive numbers, then inserting that average value in between those numbers. This transform helps to guard against small errors or inconsistencies in your data, much like the “convert to rounded noise” transform. Thus this method will help determine whether your data set is reliable.
Applying this transform to our step detection app improved the app’s results by about 20% on average. So if software engineers want to make their products more reliable, this would be a good place to start.
6) Down Sampling Transforms: This Transform downsizes the array by deleting a certain percentage of the values.
This deletes a certain percentage of the data points within your data set; for example, we deleted 50% of the data points when testing on the step detection app. This can do many things. If your system collects data rapidly, at a rate of 500 data points a second for example, it may be able to handle a 50 percent cut in data points and still perform well. If your system doesn’t collect data rapidly, then the results will be more corrupted. So if a software engineer wants to know how many data points can be lost before the system starts becoming unreliable, this transform is a good tool to have. Knowing this allows the engineer either to increase the number of data points collected within a given time frame, or to combat the loss of data by using another transform such as interpolation.
7) Add Average Noise Transform: This Transform adds the average value of a data set to every point in the data set.
This transform is similar to the interpolation transform, but instead of inserting averages in between two data points, it adds the average value of the data set to each point in the data set.
9) Base Line Shift Transform: This Transform moves the base line of the input data based on defined Rise and Run values.
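Several of the transforms in this taxonomy can be sketched in a few lines each. The functions below are illustrative only: their names, parameters, and noise ranges are assumptions, and the base line shift in particular is one possible reading of "Rise and Run" (each sample is raised along a line with slope rise/run). None of this is the code used in the study.

```python
import random


def multiply(values, factor=2):
    """Multiplicative transform: scale every value by a constant."""
    return [v * factor for v in values]


def insert_random_noise(values, low, high, rng):
    """Insert one random value between every pair of neighbouring samples."""
    out = []
    for index, value in enumerate(values):
        out.append(value)
        if index < len(values) - 1:
            out.append(rng.uniform(low, high))
    return out


def convert_to_rounded_noise(values, rng):
    """Move every sample by a random amount of at most 1 in either direction."""
    return [v + rng.uniform(-1.0, 1.0) for v in values]


def interpolate(values):
    """Insert the average of each neighbouring pair between that pair."""
    out = []
    for left, right in zip(values, values[1:]):
        out.extend([left, (left + right) / 2])
    if values:
        out.append(values[-1])
    return out


def base_line_shift(values, rise, run):
    """Tilt the base line: sample i is raised by i * rise / run (assumed)."""
    return [v + index * rise / run for index, v in enumerate(values)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
data = [1.0, 3.0, 5.0]  # made-up readings

print(multiply(data))
print(interpolate(data))
print(base_line_shift(data, rise=1, run=2))
print(insert_random_noise(data, -5.0, 5.0, rng))
print(convert_to_rounded_noise(data, rng))
```

Passing a seeded `random.Random` instance keeps the two noise transforms repeatable within this sketch; with an unseeded generator their output differs on every run, which is why a random noise transform is hard to compare against a stored base line.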
5.5 Discussion
Most software testing practices today use a set of test cases constructed on
some predefined criteria in order to evaluate whether a piece of software’s processes
are being executed correctly. These methods take some input data, run it through the
program, then check whether the resulting output is correct; if it is, that test is
considered “passed”. The Android operating system uses the Java programming language,
which is known for its large library of functions, including testing functions that
use this “test case” framework. So we performed our own empirical study in order to
determine what testing techniques are being used by sensor driven Android
applications today.
5.6 Study Limitations
Although we successfully engineered a method for developers to apply
metamorphic testing to all sensor driven Android devices, our study did have some
limitations. The first pertains to our parsing algorithm. When searching for
developer created test cases, the algorithm returns all the functions within a source
code that may perform testing. However, to quickly analyze whether or not a function
is a testing function, we are relying on the programmer to name that function as a
test. If the testing function is not correctly named, it becomes much harder to
decipher the true purpose of that function, and doing so usually requires a manual
inspection of the code.
The second limitation involves the Android applications tested during the test
case detection study. Because many of these applications were pulled from open
source repositories such as GitHub, they are often created with no commercial
purpose in mind, and thus many of them require a very low degree of reliability and
accuracy. This may be why our parsing algorithm returned a very low number of test
cases. If this algorithm were applied to a set of commercial applications such as
“FaceBook” or “Google Maps”, we would expect to find a much higher number of test
cases. These applications are closed source, and as of the date of this study, the
means to get the source code for these applications are either illegal or very
expensive.
CHAPTER 6 - SUMMARY
6.0 Summary
As our technological advances increase, new problems will arise for which
there is no current answer. As these problems grow in size and complexity, so too will
the computer programs needed to compute them, but with this growth comes the
possibility of more software errors. Developing new tests to detect these errors will
become exponentially more difficult. Perhaps Edsger W. Dijkstra put it best by
stating “Program testing can be used to show the presence of bugs, but never to show
their absence” [12]. Creating a perfect program is nearly impossible, but by using
advanced testing techniques like metamorphic testing we can get pretty close. There
have been many advances in the field of static error detection, that is, for programs
with known inputs and outputs. These advances include methods like symbolic execution
and model checking, with the latter even winning a Turing Award in 2007 [35], but
there has been relatively little advancement in dynamic error detection. As we rely
more on computer systems to calculate more complex unknowns, the testing metrics
used to evaluate these systems must also evolve. Solving problems such as the oracle
problem will be key if we hope to produce reliable independent software.
Our objective was to evaluate and provide a means which Android developers
may use to better their applications through the use of metamorphic testing. This
study concluded that metamorphic testing is not only possible but feasible, and
provided a means to universally apply it to all sensor based Android applications.
6.1 Recommendations for Future Research
My recommendations for future research would be to expand on more
transforms that we did not get to cover in this study, and evaluate them on a more
complex Android application.
APPENDIX A – Parsing Algorithm Code
using System;
using System.Diagnostics;
using System.IO;
using System.Linq;
using ABB.SrcML.Data; // SrcML.NET types such as DataProject and NamespaceDefinition (namespace assumed)
using NUnit.Framework;

namespace CodeAnalysisToolkit
{
    [TestFixture]
    public class SimpleAnalyticsCalculator_Thesis
    {
        //------Test Case Class---------------------------------------------------
        [TestCase]
        public void CalculateSimpleProjectStats()
        {
            int NumOfApps = 30;

            //-----------Current Working Method to Get sub directories -----------
            // Get the list of top-level directories (one per app) in the specified directory.
            string[] TopDirectories = Directory.GetDirectories(
                @"C:\School\Grad School (Comp Sci)\Thesis\Apps\", "*.*", SearchOption.TopDirectoryOnly);

            // Display all the files.
            //for (int i = 0; i <= NumOfApps; i++)
            //{
            //    Console.WriteLine(TopDirectories[i]);
            //}

            // Print out all top sub directories for the specified path
            //foreach (string file in TopDirectories)
            //{
            //    Console.WriteLine(file);
            //}
            //----------End of Print Sub directory Method-------------------------

            // Parse each app with SrcML and report its sensors, sensor callbacks, and test classes.
            for (int i = 0; i < NumOfApps; i++)
            {
                var dataProject = new DataProject<CompleteWorkingSet>(TopDirectories[i],
                    Path.GetFullPath(TopDirectories[i]), "..//..//..//SrcML");

                Console.WriteLine();
                Debug.WriteLine("#############################################");
                Debug.WriteLine("Parsing " + TopDirectories[i]);

                dataProject.UpdateAsync().Wait();

                NamespaceDefinition globalNamespace;
                Assert.That(dataProject.WorkingSet.TryObtainReadLock(5000, out globalNamespace));

                DisplaySensorTypes(globalNamespace);
                //DisplayWhetherAppIsUnitTested(globalNamespace);
                DisplayCallsToOnSensorChanged(globalNamespace);
                //GetTypeForKeyword(globalNamespace);
                DisplayTestCaseClasses(globalNamespace);
            }
        }

        //-------Display Sensor Type Class----------------------------------------
        // Finds calls to getDefaultSensor(Sensor.TYPE_...) and prints the sensor type requested.
        private void DisplaySensorTypes(NamespaceDefinition globalNamespace)
        {
            var getDefaultSensorCalls = from statement in globalNamespace.GetDescendantsAndSelf()
                                        from expression in statement.GetExpressions()
                                        from call in expression.GetDescendantsAndSelf<MethodCall>()
                                        where call.Name == "getDefaultSensor"
                                        select call;

            foreach (var call in getDefaultSensorCalls)
            {
                if (call.Arguments.Any())
                {
                    var firstArg = call.Arguments.First();
                    var components = firstArg.Components;
                    if (components.Count() == 3 &&
                        components.ElementAt(0).ToString() == "Sensor" &&
                        components.ElementAt(1).ToString() == ".")
                    {
                        Debug.WriteLine("sensor " + components.ElementAt(2).ToString() + " found");
                    }
                }
            }
        }

        //-------Display If this class has a Unit test----------------------------
        private void DisplayWhetherAppIsUnitTested(NamespaceDefinition globalNamespace)
        {
            var testClasses = from klas in globalNamespace.GetDescendants<TypeDefinition>()
                              where klas.GetParentTypes(false).Any(t => t.Name == "ServiceTestCase")
                              select klas;

            if (testClasses.Count() == 0)
            {
                Debug.WriteLine("This File Does not contain any tests");
            }
            else
            {
                Debug.WriteLine("----- ");
                Debug.WriteLine("\r\n");
                Debug.WriteLine(testClasses.Count() + " TestClasses ");
                Debug.WriteLine("----- ");
                foreach (var testClass in testClasses)
                {
                    Debug.WriteLine(testClass.GetFullName() + " is a test class");
                }
            }
        }

        //-------Display If ActivityUnitTestCase test-----------------------------
        // Lists classes that extend one of the Android test case base classes.
        private void DisplayTestCaseClasses(NamespaceDefinition globalNamespace)
        {
            var testClasses = from klas in globalNamespace.GetDescendants<TypeDefinition>()
                              where klas.ParentTypeNames.Any(t =>
                                  t.Name.Contains("ActivityUnitTestCase") ||
                                  t.Name.Contains("ServiceTestCase") ||
                                  t.Name.Contains("ApplicationTestCase") ||
                                  t.Name.Contains("ProviderTestCase2") ||
                                  t.Name.Contains("LoaderTestCase") ||
                                  t.Name.Contains("ActivityInstrumentationTestCase2"))
                              select klas;

            if (testClasses.Count() == 0)
            {
                Debug.WriteLine("This File Does not contain any test case classes");
            }
            else
            {
                Debug.WriteLine("----- ");
                Debug.WriteLine("\r\n");
                Debug.WriteLine(testClasses.Count() + " Test Classes found ");
                Debug.WriteLine("----- ");
                foreach (var testClass in testClasses)
                {
                    Debug.WriteLine(testClass.GetFullName());
                    //foreach(var parent in testClass.ParentTypeNames)
                    //{
                    //    Debug.WriteLine("parent: " + parent);
                    //}
                }
            }
        }

        //-------Display Calls to OnSensorChanged Class---------------------------
        // Lists every implementation of onSensorChanged and the methods that call it.
        private void DisplayCallsToOnSensorChanged(NamespaceDefinition globalNamespace)
        {
            var senChangedMethods = from method in globalNamespace.GetDescendants<MethodDefinition>()
                                    where method.Name == "onSensorChanged"
                                    select method;

            if (senChangedMethods.Count() == 0)
            {
                Debug.WriteLine("This File Does not contain any Sensor Change Methods");
            }
            else
            {
                Debug.WriteLine("----- ");
                Debug.WriteLine("\r\n");
                Debug.WriteLine(senChangedMethods.Count() + " Implementations of " + senChangedMethods.First().GetFullName());
                Debug.WriteLine("----- ");

                int n = senChangedMethods.Count();
                for (int i = 0; i < n; i++)
                {
                    var senChangedMethod = senChangedMethods.ElementAt(i);
                    Debug.WriteLine("Implementations of onSensorChaged # " + (i + 1) + ": " + senChangedMethod.GetFullName());

                    // "GetCallsToSelf" returns the calls made to this method
                    var callsToSenChanged = senChangedMethod.GetCallsToSelf();
                    for (int j = 0; j < callsToSenChanged.Count(); j++)
                    {
                        var callerMethod = callsToSenChanged.ElementAt(j).ParentStatement
                            .GetAncestorsAndSelf<MethodDefinition>();
                        if (callerMethod.Any())
                        {
                            Debug.WriteLine(" Called by --> " + callerMethod.ElementAt(0).GetFullName());
                        }
                    }
                    //Debug.WriteLine("----- ");
                }
            } //End of Else does not Equal 0 Check
        }
    }
}
APPENDIX B
List of Apps Used in Test Case Detection Study
Android-Compass URL No Longer Available
Android-pedometer https://github.com/bagilevi/Android-pedometer
GlassSensorTest https://github.com/lnanek/GlassSensorTest
KineticSensors https://github.com/sebLopezCot/KineticSensors
My-StepCounter https://github.com/MichaelJames6/My-StepCounter
Pedometer https://github.com/phishman3579/Android-pedometer
TiltPong https://github.com/mah68/TiltPong
Tilt-snake Co URL No Longer Available
satstat https://github.com/mvglasow/satstat
cartsbusboarding https://github.com/carts-uiet/cartsbusboarding
ThermometerExtended2 https://github.com/mateuszbuda/ThermometerExtended2
Android-sensorium https://github.com/fmetzger/Android-sensorium
Community Compass https://bitbucket.org/alekseyt/compass/downloads
getback_gps https://github.com/ruleant/getback_gps
sosmobileclient https://github.com/52North/sosmobileclient
org.thecongers.mtpms https://github.com/kconger/org.thecongers.mtpms
SAnd https://github.com/kas70/SAnd
sensorreadout https://github.com/onyxbits/sensorreadout
pushup https://github.com/pjq/pushup
pushup_counter https://github.com/lyahdav/pushup_counter
Nhundredthings (Push up Counter) https://github.com/nkijak/nhundredthings
audio detection https://github.com/twrobel3/RightHear
AudioRecorder https://github.com/railskarthi/AudioRecorder
Android-AudioRecorder https://github.com/Uncodin/Android-AudioRecorder
Altimeter https://github.com/jkozerski/Altimeter
Altimeter https://github.com/efalk/Altimeter
face-recognition https://github.com/thelinmichael/face-recognition
Recognize Facial Expression https://github.com/chinmaykrishna/FacialRecognition
QRCodeReaderView https://github.com/dlazaro66/QRCodeReaderView
accelerometer-app to learn Eating Patterns https://github.com/analogjedi/accelerometer-app
APPENDIX C
Results from Test Case Detection Study
Sand App
Description: Uses your phone’s sensors (barometer and compass) to show your current orientation, height and air pressure.
Analytics Output
Parsing C:\School\Grad School (Comp Sci)\Thesis\Apps\SAnd-master
sensor TYPE_ORIENTATION found
sensor TYPE_PRESSURE found
-----
1 Implementations of com.platypus.SAnd.MainActivity.onSensorChanged
-----
Implementations of onSensorChaged # 1: com.platypus.SAnd.MainActivity.onSensorChanged
-----
1 Test Classes found
-----
com.platypus.SAnd.ApplicationTest
Conclusion:
onSensorChanged – No testing of sensor computation was performed within this function.
ApplicationTest – No testing was actually performed in this test call. Perhaps the developers had planned to perform some testing in the future, but in this version the function call is empty.
Cartsbusboarding App
Description: Communication Assisted Road Transportation System. Bus Boarding Event Detection Module.
Analytics Output
Parsing C:\School\Grad School (Comp Sci)\Thesis\Apps\cartsbusboarding-master
sensor TYPE_ACCELEROMETER found
-----
1 Implementations of in.ac.iitb.cse.cartsbusboarding.acc.AccListener.onSensorChanged
-----
Implementations of onSensorChaged # 1: in.ac.iitb.cse.cartsbusboarding.acc.AccListener.onSensorChanged
-----
2 Test Classes found
-----
in.ac.iitb.cse.cartsbusboarding.test.ApplicationTest
in.ac.iitb.cse.cartsbusboarding.test.acc.FeatureCalculatorTest
Conclusion
onSensorChanged – No testing of sensor computation was performed within this function.
ApplicationTest – No testing was actually performed in this test call. Perhaps the developers had planned to perform some testing in the future, but in this version the function call is empty.
FeatureCalculatorTest – This file does contain testing, even some degree of metamorphic testing, using the average and standard deviation of the sensor data to check the accuracy of its results.
APPENDIX D
Fault seeding instructions Sheet
Introduction: Your goal is to introduce some errors within the provided code. These errors can be both computational and logical. The purpose of this experiment is to identify your bug using a process called metamorphic testing, a process where we attempt to identify a fault that exists in a piece of software by transforming the properties of its input data. This is done by taking advantage of the mathematical properties that exist in most software, allowing us to transform the input data in a manner that will produce a predictable result. If the result is different, then we have detected a flaw. The errors that you introduce will help us determine whether our transforms are adequate for detecting real bugs and mistakes a developer may make. If you can create a bug we cannot detect, then we will have discovered a problem we had not foreseen, which will allow us to create a transform to detect it.
The Code: The code we have provided you is the step detection function for an Android pedometer application. This function works by adding up the X, Y, and Z values from the Android accelerometer sensor and storing the result in a value named “vSum”. This value also has some additional calculations applied to it to account for things like earth’s gravity and magnetic field. “vSum” is then divided by three and stored in a value called “v”. This “v” variable is used to calculate steps. There is a series of loops that checks whether “v” has reached a certain threshold; if yes, the algorithm counts a step, and if not, the algorithm considers this data to be motion noise and ignores it.
We have provided an Excel spreadsheet with the “v” value graphed in order to give a visual representation. Generally every peak represented on the graph should be a step counted by the algorithm.
Instructions: Make some changes to the existing code. You are free to add or remove any code, but remember the purpose is not to break the code to the point of uncompilability, but instead to introduce a bug that is either app breaking or subtle enough to get past a testing team. Either way, the code must compile in order for us to apply our transforms.
Examples:
• Change the constant values used for mathematical computation
• Change the conditions in for loops
• Delete or add condition statements
Excel Chart: This can also be found in the attached Excel Spread Sheet.
[Chart: v = vSum/3, plotted against sample number]
APPENDIX E
Complete Base Line Transform Analysis
Green boxes = Results that are more than 70% Accurate
BIBLIOGRAPHY
[1]T. Chen, S. Cheung and S. Yiu, Metamorphic Testing: A New Approach for Generating Next Test Cases. Hong Kong: Department of Computer Science Hong Kong University, 1998.
[2]G. Kaiser and F. Su, 'Finding Bugs in Machine Learning, Data Mining and Big Data Applications | Programming Systems Laboratory', Psl.cs.columbia.edu, 2015. [Online]. Available: http://www.psl.cs.columbia.edu/64/metamorphic-testing/. [Accessed: 17- May- 2015].
[3] Istqbexamcertification.com, 'What is Software Testing?', 2015. [Online]. Available: http://istqbexamcertification.com/what-is-a-software-testing/. [Accessed: 06- May- 2015].
[4] Istqbexamcertification.com, 'What is Test design? or How to specify test cases?', 2015. [Online]. Available: http://istqbexamcertification.com/what-is-test-design-or-how-to-specify-test-cases/. [Accessed: 10- May- 2015].
[5]E. Barr, M. Harman, P. McMinn, M. Shahbaz and S. Yoo, The Oracle Problem in Software Testing: A Survey. IEEE TRANSACTIONS ON SOFTWARE ENG, 2015.
[6]J. King, Symbolic Execution and Program Testing. IBM Thomas J. Watson Research Center, 1976.
[7]Msdn.microsoft.com, 'Unit Testing', 2015. [Online]. Available: https://msdn.microsoft.com/en-us/library/Aa292197%28v=VS.71%29.aspx. [Accessed: 19- May- 2015].
[8]agile.csc.ncsu.edu, 'White-Box Testing', 2015. [Online]. Available: http://agile.csc.ncsu.edu/SEMaterials/WhiteBox.pdf. [Accessed: 23- May- 2015].
[9]S. Webmaster, 'What is Simulation - Simulation Software Explained', Simul8.com, 2015. [Online]. Available: http://www.simul8.com/products/what_is_simulation.htm. [Accessed: 17- Jul- 2015].
[10] Softwaretestinghelp.com, 'What is Integration Testing and How It is Performed? - Software Testing Help', 2015. [Online]. Available: http://www.softwaretestinghelp.com/what-is-integration-testing. [Accessed: 20-Jun-2015].
[11] C. Pasareanu, 'Symbolic Execution and Model Checking for Testing', YouTube, 2015. [Online]. Available: https://www.youtube.com/watch?v=azTVEwxN8zM. [Accessed: 02-Jun-2015].
[12] E. Dijkstra, 'E.W. Dijkstra Archive: Structured programming (EWD268)', Cs.utexas.edu, 2015. [Online]. Available: https://www.cs.utexas.edu/users/EWD/transcriptions/EWD02xx/EWD268.html. [Accessed: 10-Jun-2015].
[13] P. Boonstoppel, C. Cadar and D. Engler, Attacking Path Explosion in Constraint-Based Test Generation. Computer Systems Laboratory, Stanford University.
[14] M. Chair, P. Schaumont and P. Plassmann, Strategies for Scalable Symbolic Execution-based Test Generation. Blacksburg, Virginia: Virginia Polytechnic Institute and State University Department of Computer Engineering, 2010.
[15] G. Tassey, The Economic Impacts of Inadequate Infrastructure for Software Testing. Gaithersburg: National Institute of Standards and Technology, 2002.
[16] J. Burnim and K. Sen, 'Heuristics for Scalable Dynamic Test Generation', in Automated Software Engineering, 2008. ASE 2008. 23rd IEEE/ACM International Conference on, pp. 443-446, September 2008.
[17] Cs.cmu.edu, 'Model Checking at CMU', 2015. [Online]. Available: https://www.cs.cmu.edu/~modelcheck/. [Accessed: 20-Jun-2015].
[18] X. Xie, J. Ho, C. Murphy, G. Kaiser, B. Xu and T. Chen, Testing and Validating Machine Learning Classifiers by Metamorphic Testing. National Institutes of Health, 2011.
[19] A. Smola and S. Vishwanathan, Introduction to Machine Learning. University of Cambridge, 2008.
[20] Z. Zhou, D. Huang, T. Tse, Z. Yang, H. Huang and T. Chen, Metamorphic Testing and Its Applications. Hong Kong: International Symposium on Future Software Technology, 2004.
[21] The Independent, '42: The answer to life, the universe and everything', 2011. [Online]. Available: http://www.independent.co.uk/life-style/history/42-the-answer-to-life-the-universe-and-everything-2205734.html. [Accessed: 20-Jul-2015].
[22] Compliantmechanisms.byu.edu, 'Introduction to Microelectromechanical Systems (MEMS) | Compliant Mechanisms', 2015. [Online]. Available: https://compliantmechanisms.byu.edu/content/introduction-microelectromechanical-systems-mems. [Accessed: 20-Jul-2015].
[23] N. Zhao, Full-Featured Pedometer Design Realized with 3-Axis Digital Accelerometer.
[24] D. Beyer, T. Henzinger and G. Theoduloz, Program Analysis with Dynamic Precision Adjustment. 2015.
[25] M. Harrold, J. Offutt and K. Tewary, An Approach to Fault Modeling and Fault Seeding Using the Program Dependence Graph.
[26] F. Grigorjev, N. Lascano and J. Staude, A Fault Seeding Experience. Motorola Global Software Group.
[27] Developer.Android.com, 'SensorManager | Android Developers', 2015. [Online]. Available: http://developer.Android.com/reference/Android/hardware/SensorManager.html. [Accessed: 20-Jul-2015].
[28] T. Fundamentals, 'Testing Fundamentals | Android Developers', Developer.Android.com, 2015. [Online]. Available: http://developer.Android.com/tools/testing/testing_Android.html. [Accessed: 23-Jun-2015].
[29] Vogella.com, 'Android application testing with the Android test framework - Tutorial', 2015. [Online]. Available: http://www.vogella.com/tutorials/AndroidTesting/article.html. [Accessed: 25-Jun-2015].
[30] Developer.Android.com, 'SensorEventListener | Android Developers', 2015. [Online]. Available: http://developer.Android.com/reference/Android/hardware/SensorEventListener.html. [Accessed: 20-Jul-2015].
[31] Srcml.org, 'What is SrcML.Net', 2015. [Online]. Available: http://www.srcml.org/about-srcml.html. [Accessed: 20-Jul-2015].
[32] GitHub, 'abb-iss/SrcML.NET', 2014. [Online]. Available: https://github.com/abb-iss/SrcML.NET/blob/master/ABB.SrcML.Data.Test/CodeParserTests.cs. [Accessed: 20-Jul-2015].
[33] GitHub, 'Build software better, together', 2015. [Online]. Available: https://github.com/. [Accessed: 20-Jul-2015].
[34] SenSee Application, 2015. [Online]. Available: https://play.google.com/store/apps/details?id=sysnetlab.Android.sdc&hl=en. [Accessed: 20-Mar-2015].
[35] Amturing.acm.org, 'Edmund Clarke - A.M. Turing Award Winner', 2015. [Online]. Available: http://amturing.acm.org/award_winners/clarke_1167964.cfm. [Accessed: 03-Jul-2015].
[36] P. Kochhar, F. Thung, N. Nagappan, T. Zimmermann and D. Lo, Understanding the Test Automation Culture of App Developers. Singapore Management University, 2015.
[37] F-droid.org, 'F-Droid | Free and Open Source Android App Repository', 2015. [Online]. Available: https://f-droid.org/. [Accessed: 23-Jul-2015].