
Metamorphic Testing of Sensor Processing for Android Applications

By

Marco Peterson

A thesis submitted to the Faculty of the College of Graduate Studies of Virginia State University in partial fulfillment of the requirements for the degree of Master of Science in Computer Science in the School of Engineering, Science, and Technology

Virginia, 2015

Approved by:

______________________________Dr. Kostadin Damevski (Advisor)

_______________________________Dr. Hui Chen (Committee Member)

_______________________________Dr. David Walter (Committee Member)


ABSTRACT

The field of software engineering has always strived to enable the creation of more reliable and accurate software by implementing a range of software testing techniques to ensure source code executes as intended. Traditional software testing is done by evaluating results against an oracle, a set of acceptable outputs for each test case. A test case is another program created to emulate the real-world inputs and scenarios a particular piece of software might encounter. This is an effective method of testing and is an industry standard today; but, as we all know, no program is without its bugs and glitches. Detecting these errors more effectively has become one of the most pressing objectives across the computing industry. Perhaps the chief error detection obstacle software engineers face today is known as the oracle problem. The oracle problem arises from one of two situations. The first is when the answer to the problem the software under test is solving is difficult to constrain. This issue occurs most often in machine learning software, where a machine must perform a task without being explicitly programmed, such as the self-driving car; in this case the software must learn how to complete a task from the input of the world around it. The second situation is when it is either impossible or too expensive to create a test for all reasonable inputs a piece of software might encounter. Both situations leave the software developer without a means to test their software effectively. In the case of sensor data calculations, it is very difficult to calculate accurate results when given a wide range of possible sensor inputs. The goal of this thesis is to evaluate the effectiveness of a technique known as metamorphic testing on sensor-based applications on the Android platform, in order to address issues such as the oracle problem. Metamorphic testing is a software testing technique that takes already existing test cases for a particular piece of software and builds new test cases from them, essentially reusing test cases and applying different mathematical properties until an error is found.


ACKNOWLEDGEMENTS

I would like to thank my advisor, Dr. Kostadin Damevski, for his continuous support of my Master's thesis and research. His patience, motivation, enthusiasm, and immense knowledge paved the way for this research.

I would also like to thank Dr. Hui Chen for his help and expertise over the course of my time in the Master's program, and all the professors and staff for their help and guidance over the entire span of my time at Virginia State University.

Last but by no means least, I would like to acknowledge the support of my friends and peers for all their help, both direct and indirect.


TABLE OF CONTENTS

List of Figures
List of Tables

1. Introduction
   1.0 Overview
   1.1 Aims and Objectives
   1.2 Research Questions
   1.3 Chapter Outline
2. Problem Statement/Hypothesis
   2.0 Problem Statement
   2.1 Hypothesis
3. Background/Related Works
   3.0 Traditional White-Box Testing
       3.0.1 Simulation Testing
       3.0.2 Symbolic Execution
   3.1 Path Explosion
   3.2 The Oracle Problem
   3.3 Machine Learning
   3.4 Metamorphic Testing
       3.4.1 List of Common Metamorphic Properties
       3.4.2 Stacking Metamorphic Tests
   3.5 Step Detection Algorithm
       3.5.1 Step Cycle Detection
       3.5.2 Calculating Steps Filter
   3.6 Fault Seeding and Detection
4. Design and Approach
   4.0 Android Framework
   4.1 Test Case Detection
       4.1.1 Detecting Android API Tests
       4.1.2 List of Android API Tests Searched For
       4.1.3 Detecting Developer Created Tests
       4.1.4 Test Case Detection Procedure
   4.2 Data Collection (SenSee)
       4.2.1 Data Collection Procedure
   4.3 Error Detection
   4.4 Applied Metamorphic Transforms
       4.4.1 Multiplicative Transforms
       4.4.2 Interpolating Transform
       4.4.3 Adding Avg Noise Transform
       4.4.4 Down Sampling Transform
       4.4.5 Semantical Transform
   4.5 Fault Seeding Study
5. Evaluation
   5.0 Study Recap
   5.1 Test Case Detection Results
   5.2 Initial Transform Results
   5.3 Fault Seeding/Error Detection Results
   5.4 Full Transform Taxonomy
   5.5 Discussion
   5.6 Limitations of Study
6. Summary
   6.0 Summary
   6.1 Recommendations for Future Research
Appendix A
Appendix B
Appendix C
Appendix D
Appendix E
Appendix F
Bibliography


LIST OF FIGURES

3.1 Rapid Growth of Conditional Possibilities
3.2 Simulation Testing Single Path Execution
3.3 Symbolic Execution Path Execution
3.4 Simple Cosine Test Case
3.5 Metamorphic Stacking
3.6 Accelerometer Sensor Data
3.7 Stride Diagram
3.8 Dynamic Threshold Leveling
4.1 Android Framework
4.2 t1 Test Case
4.3 t2 Test Case
4.4 Caller Method
4.5 Parsing Algorithm Output
4.6 Accelerometer Sensor Data with Tag Lines
4.7 SenSee Capture and Transform Diagram
4.8 Original 10 Step Data Set
4.9 Multiplicative Transform on 10 Step Data Set
4.10 Interpolating Transform on 10 Step Data Set
4.11 Add Average Noise Transform on 10 Step Data Set
4.12 Down Sampling Transform on 10 Step Data Set
4.13 Semantical Transform on 10 Step Data Set


LIST OF TABLES

5.1 Base Line Pedometer Results before Transforms
5.2 Pedometer Application Results for each Transform
5.3 Transform Results after Introducing an Error


CHAPTER 1 - INTRODUCTION

1.0 Overview

Reducing the cost of software development while improving software quality is an important objective for the software industry. A study by Tassey estimated the annual cost of software testing to be between $22.2 and $59.9 billion, with over half of those costs arising from mitigation activities to correct errors after a software product's release [15]. Checking a product for faults is standard practice in almost all fields and is fundamentally important to product quality. This is especially true in the field of software engineering, for two reasons. The first is the complexity required of many modern software products. The second is the potential consequences of a software failure. The production of reliable software is one of the fundamental requirements for applying computers to today's challenging problems [12]. As computer programs grow in size and complexity, testing costs will only increase. More research is needed to reduce these costs by developing new, more effective testing methods and approaches.

A novel testing technique that aims to improve upon the state of the practice is metamorphic testing. It has been used to help improve software accuracy and reliability in several fields, including bioinformatics, genetic sequencing, and machine learning. The focus of this thesis is applying this technique to sensor-based applications, more specifically Android-based sensor applications. Many applications today use sensor data to calculate some result, ranging from calculating blood pressure and heart rate to docking ships with the International Space Station. However, calculating a desired result from a set of raw sensor data is not easy, especially if the mathematical procedure to do so does not already exist. This problem becomes exponentially more difficult when you are performing calculations using more than one sensor. Perhaps the best example of this is today's weather forecasting system. Thousands of sensor arrays recording everything from humidity and temperature to wind speed are used in an attempt to predict the forecast days in advance, but it is not always accurate. Weather forecasting is an example of an oracle problem: all possible sensor inputs and combinations are impossible to enumerate, so creating a computer program that accurately predicts the weather one hundred percent of the time has proven equally impossible. Solving and testing for the oracle problem has become a fundamental goal for computer scientists today.

Weather forecasting is one of the most complex sensor-based applications in existence today; nonetheless, the basic principles remain the same. We apply metamorphic testing on a smaller scale in an attempt to understand how metamorphic properties can be used to improve both the source code, through error detection, and the overall error threshold accuracy of the software. The tools we created will also provide Android developers with a platform to perform metamorphic testing on their own applications.

1.1 Aims and Objectives

The goal of this thesis is to evaluate a testing technique known as metamorphic testing within the Android platform. The objective is to evaluate the effectiveness of metamorphic testing in finding errors within Android source code, as well as to evaluate the current testing practices being used by Android developers.


1.2 Research Questions

• What testing methods are Android developers currently using?

• Can metamorphic testing be applied to sensor-based Android applications?

• How effective is metamorphic testing for detecting errors in Android source code?

• Which metamorphic transforms are most effective in evaluating the first three questions?

• Can we find transforms that can be applied to other software outside of Android?

1.3 Chapter Outline

This thesis consists of six chapters. Chapter 1 presents the overall goal of the thesis, including the research questions, aims, objectives, and overview. Chapter 2 states the problem statement and hypothesis based on related work in this area of research, and gives a brief history of software testing, explaining where metamorphic testing derived its concepts. Chapter 3 is the background chapter; it provides an in-depth explanation of metamorphic testing and of the methods used to collect the sensor data used in this thesis, and it outlines related work in the fields of metamorphic testing, machine learning, and fault seeding. Chapter 4 provides a detailed explanation of the Android framework and the transforms used during our evaluation; this chapter also gives a high-level explanation of how we were able to capture and transform onboard Android sensor data. Chapter 5 presents the results of our evaluation as well as the study's limitations. Finally, Chapter 6 summarizes our work and provides recommendations for future research.


CHAPTER 2 – PROBLEM STATEMENT/HYPOTHESIS

2.0 Problem Statement

The conventional method to test software is to examine pairs of input data and expected output data, then check whether the expected output is achieved when a given input is passed through the code being tested. If the output is incorrect, then it is safe to say your program has a bug or error; but what if the output is correct? Is the code now faultless? The answer is no, as even for a relatively simple program, reliably finding all errors that may exist is a difficult task. As software increases in complexity, many computer programs are tasked with problems for which the correct output is difficult to express in all cases or with 100% confidence. This is known as the oracle problem in software testing. Finding errors, logic mistakes, and general bugs is inherently difficult if a developer does not know what the final outcome should be once a program's computations are complete. In the movie "The Hitchhiker's Guide to the Galaxy", a computer attempts to compute the meaning of life, generating an arbitrary answer of 42 [21]. But is that answer correct? Perhaps the better question is how someone would test this computer program for correctness. Metamorphic testing has been shown by several studies [1] [18] [19] to be effective in a wide range of testing applications, especially for testing software that exhibits the oracle problem.

This thesis presents the methods needed to apply metamorphic testing to sensor-based Android applications. The goal is to provide Android developers with a new tool to further test and improve their applications, as well as to provide an understanding of metamorphic testing and its properties so that it can be applied to other problems.

2.1 Hypothesis

Metamorphic testing transforms can be used to test sensor-based Android applications in order to improve overall error detection and error threshold.


CHAPTER 3 – BACKGROUND/RELATED WORKS

3.0 Traditional White-Box Testing

The term white-box testing describes a group of methods for testing a software product's internal source code by constructing test cases. Also known as clear-box testing or glass-box testing (Beizer, 1995), these names indicate that a developer has full visibility of the internal workings of the software product, specifically the logic and the structure of the code [8]. This visibility allows developers to create test cases specifically designed to exercise a software product's processing paths and determine whether it has reached an appropriate result. This method is used to test a variety of source code functions such as data flow, decision statements, networking connections, and program pathing. All of these examples require the developer to evaluate the Software Under Test (SUT) using a predefined set of inputs against the expected set of outputs.

There are two central white-box testing methods that can be applied when creating a test case for a particular piece of software. The first is known as "Unit Testing". The more fundamental of the two, unit testing is used to test one specific part of the code, usually a function or family of functions known as modules or units. It has become good programming practice to construct an overall piece of software from several separate modular functions, breaking a large piece of code down into many small pieces that each perform a very specific task contributing to the program as a whole. The primary goal of unit testing is to take the smallest piece of testable software in the application, isolate it from the remainder of the code, and determine whether it behaves exactly as you expect [7].

The next type of testing method is integration testing. Just as its name suggests, this tests the assimilation of smaller pieces of code into a larger piece of code after they have been verified to be correct through unit testing. This ensures that all the modules in the system are working together as intended [10].

When constructing test cases for error detection, developers can choose to implement them using a variety of approaches. The best approaches exercise all possible inputs and conditions within a given program in an attempt to ensure no bug is left undetected; this is called "Full Coverage". However, testing with full coverage approaches may not always be possible or practical. Methods such as Simulation Testing and Symbolic Execution allow for deliberate and effective testing for some software, but not all.

3.0.1 Simulation Testing

Perhaps the most basic form of software testing, simulation testing is the simple process of feeding a predefined input into a program and evaluating the result for accuracy. These tests are designed to mimic the operation of real-world scenarios, such as the day-to-day operation of a bank, the running of an assembly line in a factory, or the staffing of a hospital or call center [9]. However, simulation testing has a fundamental flaw when it comes to testing software that contains conditional statements. Using this method you can only test one condition at a time; if your program has multiple conditions with several layers of nested conditions, the number of possible results grows very quickly, and testing for each of those results becomes more difficult.

For example, if your program has an "If Statement", it can execute only one of the two possible conditions at a time, either the true condition or the false condition. Another test is required to execute the other condition. Most software today has several if statements within its source code, many of which are nested within each other. Figure 3.1 illustrates how these possible condition outcomes can grow rapidly.

Figure 3.1 – Rapid Growth of Conditional Possibilities
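As a small illustration (written for this discussion, not drawn from the thesis itself), the Java sketch below shows how three independent conditions already produce eight distinct execution paths, while a single simulated input exercises only one of them:

    // Illustrative only: three independent conditions yield 2^3 = 8
    // possible execution paths, but one simulated input exercises
    // exactly one of them.
    public class PathGrowth {
        static String classify(int a, int b, int c) {
            String path = "";
            path += (a > 0) ? "A" : "a";
            path += (b > 0) ? "B" : "b";
            path += (c > 0) ? "C" : "c";
            return path; // one of 8 distinct paths per run
        }

        public static void main(String[] args) {
            // A single simulated input covers exactly one of the 8 paths.
            System.out.println(classify(1, -2, 3)); // prints "AbC"
        }
    }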

This is just an example of one conditional statement. Other conditional statements, such as "If-Else" statements, can have more than just two possible branches, further complicating the conditional logic of any given program. Furthermore, the same type of graph can be drawn to depict a program's overall structure. Complex programs will have individual functions that may or may not be called during a particular test. These types of complexities make it very difficult to achieve full coverage when testing large, complex software. Figure 3.2 depicts how simulation testing can only execute one path at a time within a complex program.

Figure 3.2 – Simulation Testing Single Path Execution [11]

FSM = Finite State Machine (i.e. Computer Program)

3.0.2 Symbolic Execution

In an attempt to obtain full coverage for complex programs, James King created the first automated testing method of this kind, called symbolic execution, in 1976. Symbolic execution does away with concrete inputs (i.e., numbers) into a program. Instead, it supplies dynamic variables (or symbols) as inputs into the software being tested, while keeping track of the conditions needed to travel along each path of the source code [6]. This condition-state tracking allows the symbols to dynamically change in order to meet the conditions needed to explore and test another part of the program.

For example, if the symbol encountered an "If Statement", the value of the symbol could change to satisfy the true condition. Since the current condition state is recorded, the symbolic variable can backtrack through the code and then change to satisfy the false condition. By repeating this process over and over, this method of testing will ultimately achieve full coverage, as illustrated by Figure 3.3 [6].


Figure 3.3 – Symbolic Execution Path Execution [11]
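As a sketch of the idea (the function and values below are invented for illustration), the comments show the path conditions a symbolic executor would record while steering a symbolic input x down each branch:

    // Illustrative only: the comments show the path conditions a
    // symbolic executor would record for a symbolic input x while
    // driving execution down each branch in turn.
    public class SymbolicSketch {
        static int example(int x) {
            if (x > 10) {            // path condition: x > 10
                if (x % 2 == 0) {    // path condition: x > 10 && x % 2 == 0
                    return 1;        // a solver picks e.g. x = 12 to reach here
                }
                return 2;            // x > 10 && x % 2 != 0, e.g. x = 11
            }
            return 3;                // x <= 10, e.g. x = 0
        }
    }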

Even though symbolic execution is able to achieve full coverage, it is only able to do so for relatively small programs. As programs get larger, their conditional statements grow exponentially, costing more memory to track current paths and more time to execute, which eventually makes the testing method impractical. This phenomenon is known as "Path Explosion".

3.1 Path Explosion

Symbolic techniques have been shown to be very effective in path-based test case generation; however, they fail to scale to large programs [16]. This is because the number of execution paths to be considered symbolically is so large that eventually only a small part of the program path space is actually explored [14]. There have been several studies and projects dedicated to increasing the number of possible paths methods such as these can handle, most notably in the field of model checking [17], work that even won the Turing Award in 2007 [35]. Today's most advanced software contains millions of lines of code with billions of possible paths. Only time will tell whether new developments in this field will keep up with the pace of ever-increasing path explosion; however, these methods of testing are well suited to other testing hurdles, such as the oracle problems found in machine learning. This is especially true if these machines contain large decision-making processes with billions of possibilities.

3.2 The Oracle Problem

Traditional unit and integration testing methods are great for testing software that has a known answer. Model testing is even better at automatically generating full-coverage tests for constrained software. Both of these testing approaches still require finding inputs that cause execution to reveal faults [5]. But what if you did not know all the possible input combinations or execution paths a piece of software might take to produce a result? Furthermore, what if you do not know what the answer should be? Applying computers to solve unknown problems is one of the staples of the industry, but testing such software is incredibly difficult and costly. This is known as the oracle problem [5], and solving it has been a major issue for several fields of computer science. After all, answering questions we do not know the answer to is a fundamental requirement for scientific advancement. Solving the oracle problem involves constructing some sort of test oracle, or table of expected results, that can be compared against a given set of inputs [18]. Most applications of this type fall under the umbrella of machine learning.

3.3 Machine Learning

The basic definition of machine learning is getting computers to act without being explicitly programmed, and over the past two decades machine learning has become one of the mainstays of information technology [19]. These algorithms can be as simple as the spam filter in your email learning which emails to send to your junk folder, or as complex as the self-driving car; but they all face the same fundamental problem. These computer applications do not start off knowing all the answers to every problem they may face, hence the name "machine learning". When developing these applications, how do programmers know that the software they have written will instruct a self-driving car to stop at a red light instead of speeding through it? In situations like these, traditional testing measures cannot be applied, due to the large number of possible inputs and execution paths. Many of these programs also lack a definitive result for the computation they are trying to execute. Here metamorphic testing can be applied to the machine's known set of rules to evaluate whether the program will react in the desired manner when presented with a choice. The idea is relatively simple, but extremely difficult to execute.

3.4 Metamorphic Testing

The concept of metamorphic testing was formally introduced in 1998 by three professors from the University of Hong Kong: Dr. Chen, Dr. Cheung, and Dr. Yiu [20]. They observed three fundamental problems with the white-box testing methods of the time. The first observation was that software which passes its initial test cases is considered successful and is seldom investigated further for errors. Second, no matter how much testing is done, a piece of software will most likely still contain errors. Lastly, obtaining a test oracle to test against is unrealistic for many software applications, especially in the development phase [20]. Solving the oracle problem allows developers to tackle computing challenges that we do not know the answer to. Perhaps chief among these is the challenge of machine learning.

However, the aim of this thesis is to tackle the second observation made by Dr. Chen and his colleagues, which states that almost all software contains errors. These can be either logical errors that break the software in general, or mathematical and algorithmic errors that cause the program to give an inaccurate or inconsistent result. In order to solve this problem we must address the first observation, which states that once a piece of software passes its first test case, it is seldom tested again for further errors. In most cases a tested program still contains errors that the first test case did not reveal. Typically when this happens, a new unit test case is created in an attempt to find the error.

This is where metamorphic testing differs from traditional white-box unit testing. Instead of making more test cases from scratch, metamorphic testing derives new test cases from the existing passing ones by applying a transform to the input of the original test case. These transforms are typically a mathematical operation or set of operations applied to the original data in order to change the output result. The result should change in a predictable manner based upon the transform applied. For example, if a transform adds three to every number in your data set, the result should reflect the transform applied; if it does not, you have found a potential error in your source code. The term metamorphic testing comes from the fact that this method morphs existing input test data in order to reevaluate the source code using the same test case. Figure 3.4, for example, uses a simple cosine property to check a result.

Figure 3.4 – Simple Cosine Test Case

We know that cosine exhibits certain mathematical properties, so if we make changes to the input we can predict the output. Such cosine identities are what are called metamorphic properties. This is a simple example of a metamorphic property that can exist within a program.
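As a minimal sketch of such a test (not code from the thesis; myCos() is a hypothetical implementation under test), the identity cos(x) = cos(-x) can be checked mechanically over a set of passing inputs:

    // A minimal sketch of a metamorphic test built on the identity
    // cos(x) = cos(-x). myCos() delegates to Math.cos here only so the
    // sketch is runnable; in practice it would be the code under test.
    public class CosineMetamorphicTest {
        static double myCos(double x) {
            return Math.cos(x);
        }

        public static void main(String[] args) {
            double[] inputs = {0.0, 0.5, 1.0, 2.5, Math.PI};
            for (double x : inputs) {
                double original = myCos(x);
                double transformed = myCos(-x); // metamorphic follow-up input
                // The two outputs must agree; a mismatch signals a fault.
                if (Math.abs(original - transformed) > 1e-9) {
                    System.out.println("Potential error found at x = " + x);
                }
            }
        }
    }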


This logic of metamorphic properties can be implemented to create new tests that challenge your software's functionality and accuracy. For instance, suppose we took a test case that previously passed and morphed the input data in such a way that the output values should not change. If the test now fails, then we have discovered an error in the program. This is an example of the Semantically Equivalent property. There are several metamorphic properties commonly used to produce similar tests (listed below); the computational techniques a program performs determine which metamorphic properties are feasible when creating a metamorphic test. A short sketch of a few of these transforms follows the list.

3.4.1 List of Common Metamorphic Properties

• Additive: Increase (or decrease) numerical values by a constant

• Multiplicative: Multiply numerical values by a constant

• Permutative: Randomly permute the order of elements in a set

• Invertive: Create the “opposite” of a set

• Inclusive: Add a new element to a set

• Exclusive: Remove an element from a set

• Compositional: Compose a set

• Noise-based: Include input values that will not affect the output

• Semantically Equivalent: Create inputs that have the same "meaning" as the original

• Heuristic: Create inputs that are "close" to the original

• Statistical: Create inputs that exhibit the same statistical properties
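Below is a minimal Java sketch, written for this discussion rather than taken from the thesis's tooling, of three of the listed properties applied to a one-dimensional accelerometer trace:

    import java.util.Arrays;

    // A minimal sketch of three of the listed properties applied to a
    // one-dimensional accelerometer trace.
    public class MetamorphicTransforms {

        // Additive: shift every sample by a constant.
        static float[] additive(float[] data, float c) {
            float[] out = new float[data.length];
            for (int i = 0; i < data.length; i++) out[i] = data[i] + c;
            return out;
        }

        // Multiplicative: scale every sample by a constant.
        static float[] multiplicative(float[] data, float k) {
            float[] out = new float[data.length];
            for (int i = 0; i < data.length; i++) out[i] = data[i] * k;
            return out;
        }

        // Noise-based: add small perturbations that should not change
        // the step count reported by a robust pedometer.
        static float[] addNoise(float[] data, float amplitude) {
            float[] out = new float[data.length];
            for (int i = 0; i < data.length; i++) {
                out[i] = data[i] + (float) ((Math.random() - 0.5) * 2 * amplitude);
            }
            return out;
        }

        public static void main(String[] args) {
            float[] trace = {0.1f, 9.8f, 0.3f, 10.2f, 0.2f};
            System.out.println(Arrays.toString(additive(trace, 3f)));
            System.out.println(Arrays.toString(multiplicative(trace, 2f)));
            System.out.println(Arrays.toString(addNoise(trace, 0.05f)));
        }
    }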


3.4.2 Stacking Metamorphic Tests

The concept behind metamorphic stacking is simple: take a transformed output, then apply another transform, and keep transforming the input data until you have reached a desired threshold. This is where metamorphic testing shines, in its ability to find changes or errors in code while improving overall software accuracy and reliability.

For example, a developer could apply multiple noise-based transforms to determine how much noise a particular application can handle before it starts to fail. Similarly, we could then apply several averaging transforms to the input data in an attempt to cancel out the noise, or apply an exclusive transform to simply remove the noise from the data set. Methods like these help reduce possible errors that might exist in your code while improving the overall accuracy and reliability of your software, continuously testing passing test cases until the software breaks. The figure below details the transform flow.

Figure 3.5 – Metamorphic Stacking
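A minimal sketch of this loop follows; it reuses the addNoise transform sketched in Section 3.4.1, and checkStepCount() is a hypothetical stand-in for the pedometer test case under evaluation:

    // A minimal sketch of metamorphic stacking: apply one noise layer at
    // a time and re-run the same test after each layer.
    public class StackingSketch {

        // Hypothetical stand-in; a real run would execute the pedometer's
        // step-detection algorithm and compare its output here.
        static boolean checkStepCount(float[] data, int expected) {
            return true; // placeholder result
        }

        public static void main(String[] args) {
            float[] trace = {0.1f, 9.8f, 0.3f, 10.2f, 0.2f};
            int expectedSteps = 2;
            float[] t = trace.clone();
            // Stack noise transforms layer by layer to measure how much
            // noise the application tolerates before the test fails.
            for (int layer = 1; layer <= 10; layer++) {
                t = MetamorphicTransforms.addNoise(t, 0.05f);
                if (!checkStepCount(t, expectedSteps)) {
                    System.out.println("Failed after " + layer + " noise layers");
                    return;
                }
            }
            System.out.println("Survived 10 stacked noise layers");
        }
    }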


Applying a transform is relatively simple, but how do you know which transform to apply? Not every transform is going to fit every problem. As of right now there is no industry standard for applying data transformations, mainly because the field of computer science encompasses such a wide range of industries. Many of these individual industries do have a set framework for finding software errors, but those methods often cannot be applied to another industry. To understand how we applied metamorphic testing to our Android application, you must first understand the metamorphic properties of the software itself.

3.5 Step Detection Algorithm

This thesis uses a pedometer application as a test bed to evaluate whether metamorphic testing can be applied to Android sensor data and, if so, to measure its effectiveness. In order to do this we will be manipulating the metamorphic properties within this application's mathematical and logical algorithms. Exploring and applying the correct properties requires an understanding of basic human step detection.

Most people are familiar with the basic function of a pedometer, which is to count the number of steps you take. But how does it count steps? Not too many years ago, pedometers had physical balls that rolled back and forth to detect steps. Every time the ball made a full back-and-forth cycle, the pedometer registered one step; but this system takes up a lot of space and does not achieve a high level of accuracy. Most pedometers today use a microelectromechanical system, or MEMS [22]. MEMS devices use a set of accelerometers to detect and calculate when a full step cycle has occurred. When running or walking, your body moves in three dimensions, and the accelerometers measure the rate of acceleration along each of the X, Y, and Z axes [23]. Figure 3.6 below depicts a sample of this data. The next section explains the math behind calculating a human step.


Figure 3.6 – Accelerometer Sensor Data

(Line plot of acceleration values over time for the X, Y, and Z axes.)

3.5.1 Step Cycle Detection

Key Terms

Lead Leg – The leg in front of the runner.

Trail Leg – The leg behind the runner.

Stride Position – The position where your lead leg is extended out to the farthest point in front of your body.

Kick Position – The position where your trail leg is extended out to the farthest point behind your body.

Once this data is collected, it can be processed to determine when a human step cycle has been completed; from there we can begin to count these cycles, thus giving us a step counter.


Figure 3.7, illustrated below, should help explain the concept. We will start with the most apparent axis in the data set, which is the Z axis, or your "side-to-side" movement. Since acceleration is the measure of the change in speed, not a measure of constant speed, your "side-to-side" motion will have the greatest range in the data set. When running or walking, a person generally swings their arms, creating a back-and-forth sideways motion. Finding this axis is key when your pedometer's axes are not tied to a specific device orientation. For example, many phones have pedometer applications that function no matter how you orient your phone on your body. When you start moving, the software first looks for the data stream that has the highest acceleration oscillation and declares it the Z axis; this is called peak detection.

Next is the vertical acceleration, or the Y axis. When running, your body moves in an "up-and-down" motion. When you are running and transitioning from the stride position to the kick position, your body is moving up, and thus registering an acceleration force on the Y axis. At the top of this momentum your body will eventually slow, coming to a complete stop before it falls back down. The height of this upward motion corresponds to a peak on the Y axis graph. Your body is suspended in the air for a very brief period; during this time acceleration is zero, so the Y axis line begins to fall. As you transition from the kick position back to the stride position, your body begins to accelerate upward, and the Y axis graph rises again because you are again accelerating. It might seem counterintuitive for the acceleration line graph to rise when you are accelerating downward, but acceleration in any direction (up, down, left, right, forward, and back) is recorded as a positive acceleration value. A step cycle is considered complete when you transition from the kick position to the stride position and then back to the kick position.


The final axis is forward acceleration, the X axis. Conceptually you might think that this would be the value with the highest acceleration, and if this were a measure of overall movement, then yes, the forward axis would have the highest range and thus the highest peaks on our graph. However, since acceleration is the measure of change in speed, the X axis shows the least "back-and-forth" motion of the three axes. As you run or walk, your forward acceleration increases as you transition from the kick position to the stride position, because you are in the process of bringing your lead leg out in front of you (commonly called striding out). When your lead leg hits the ground, starts becoming your trail leg, and begins transitioning into the kick position, the forward acceleration slows down. At the same time, your vertical acceleration increases, because at this point your body is moving farther up than it is moving forward.

Figure 3.7 – Stride Diagram [23]

3.5.2 Calculating Steps Filter

Filtering the data serves two purposes: the first is to smooth out the accelerometer data, and the second is to cancel out false positives. This is achieved by using dynamic precision [24], the process of continuously updating the average of a data set. In this case we have three data sets: the X, Y, and Z axes. In order to find the average, we first need to find the minimum and maximum values of a predefined subset of the entire axis array, in our case every fifty samples. The average value is equal to (Max + Min) / 2. This average is called the dynamic threshold level. A step is counted if the original axis line crosses the threshold line with a negative slope. Figure 3.8 below is an example of how this method is applied to the Z axis values.

Figure 3.8 – Dynamic Threshold Leveling [23]
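A minimal sketch of this calculation (assuming a single axis of samples and the fifty-sample windows described above; this is an illustration, not the pedometer application's actual code) might look like this:

    // A minimal sketch of dynamic threshold step counting over one axis.
    public class DynamicThresholdCounter {
        static int countSteps(float[] axis) {
            int steps = 0;
            for (int start = 0; start < axis.length; start += 50) {
                int end = Math.min(start + 50, axis.length);
                // Dynamic threshold level for this window: (max + min) / 2.
                float max = Float.NEGATIVE_INFINITY;
                float min = Float.POSITIVE_INFINITY;
                for (int i = start; i < end; i++) {
                    max = Math.max(max, axis[i]);
                    min = Math.min(min, axis[i]);
                }
                float threshold = (max + min) / 2f;
                // Count a step each time the signal crosses the threshold
                // while falling (negative slope), as described in the text.
                for (int i = start + 1; i < end; i++) {
                    if (axis[i - 1] >= threshold && axis[i] < threshold) {
                        steps++;
                    }
                }
            }
            return steps;
        }
    }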

3.6 Fault Seeding and Detection

In order to evaluate metamorphic testing for error detection, we must introduce some errors into the software under test, a practice known as fault seeding [26]. In this case the software under test is an Android pedometer application. The basic concept behind fault seeding is simple: insert a logical or mathematical error into a piece of software, then run it through a test case. This helps a developer determine whether his or her test case can effectively detect that particular type of fault. These faults can either be introduced into the code manually or generated automatically using techniques such as Dependency Graphs [25].
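As a purely illustrative example (not one of the faults seeded in this study), a seeded fault could flip a single operator in the dynamic threshold calculation from Section 3.5.2:

    // Purely illustrative seeded fault, applied to the dynamic threshold
    // calculation of Section 3.5.2.
    public class SeededFault {
        // Correct computation of the dynamic threshold level.
        static float threshold(float max, float min) {
            return (max + min) / 2f;
        }

        // Seeded mutant: a single flipped operator. A test case (or
        // metamorphic transform) whose results do not change under this
        // mutation cannot detect this class of fault.
        static float seededThreshold(float max, float min) {
            return (max - min) / 2f;
        }
    }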


CHAPTER 4 – DESIGN AND APPROACH

4.0 Android Frame Work

The Android operating system has become one of the most popular development platforms over the last few years, due in large part to its robust libraries and, perhaps more importantly, its detailed documentation, which provides developers with an in-depth understanding of how to use its vast library of functions and how to test them, as well as a large suite of built-in test cases and functions. Through this documentation [28] [29] and an understanding of Java, we were able to construct not only a metamorphic testing framework for the Android platform, but also a parsing algorithm that automatically checks Android applications for testing functions, also known as test case detection. The creation of these two tools was done by carefully taking advantage of some known Android functions and repurposing them to generate output that is useful to us.

The Android system adheres to the following framework: in order for any application to receive data from any device sensor, that application must ask for permission from the Android operating system. This is done by calling the registerListener function from Android's API (Application Programming Interface) [27]. This function's two key parameters are, first, the object you would like the sensor data forwarded back to; this object will be reused elsewhere in the code to collect that particular type of sensor data. The second is the type of sensor data your application needs. This is important because smartphones today have a large assortment of sensors, ranging from GPS units to microphones, and this parameter specifies which sensor data the operating system forwards to the requesting application.


Once an application has sensor permission from the Android operating system, we can use that object to receive data. Within that object is another function from the Android API called onSensorChanged [30]. Android delivers new values to this function every time the sensor data changes. For example, whenever your GPS location changes, that GPS data is sent to the onSensorChanged function of every application that currently has permission to access the GPS sensor. Since all new sensor data is sent to this function, it is here that applications must perform any and all computations on sensor data, as well as any tests. These tests and calculations can either be done by source code native to the onSensorChanged function itself, or there may be other modular functions that are called upon to perform the calculation tasks for any given application. The same holds true for any tests or test functions that may exist. Figure 4.1 illustrates how the Android framework operates; a code sketch of this flow follows the figure.

Figure 4.1 – Android Framework
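The sketch below illustrates this framework; the class and member names are invented for illustration, and only getDefaultSensor, registerListener, and onSensorChanged come from the Android API itself:

    import android.hardware.Sensor;
    import android.hardware.SensorEvent;
    import android.hardware.SensorEventListener;
    import android.hardware.SensorManager;

    // A minimal sketch of the framework described above.
    public class StepSensorClient implements SensorEventListener {

        void start(SensorManager sensorManager) {
            Sensor accel = sensorManager.getDefaultSensor(Sensor.TYPE_ACCELEROMETER);
            // Ask the OS to forward accelerometer data back to this object.
            sensorManager.registerListener(this, accel,
                    SensorManager.SENSOR_DELAY_NORMAL);
        }

        @Override
        public void onSensorChanged(SensorEvent event) {
            // All new sensor values arrive here; any calculations or tests
            // must start from this method, natively or via helper functions.
            float x = event.values[0];
            float y = event.values[1];
            float z = event.values[2];
            // ... step-detection calculations would be invoked here ...
        }

        @Override
        public void onAccuracyChanged(Sensor sensor, int accuracy) {
            // Required by the interface; unused in this sketch.
        }
    }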


4.1 Test Case Detection

Now that we understand the framework that powers sensors, we can repurpose it to evaluate whether sensor-based Android applications are taking advantage of the testing libraries and tools provided by the Android API. We can also determine whether developers are implementing their own testing methods and, if so, what kind of testing they are implementing. In order to detect Android test cases for sensor applications, we first need to determine if the app uses any sensor data from the device itself. Mobile devices contain many sensors, but not all apps make use of them. For example, an application that keeps track of the number of steps you take throughout the day may use a device's accelerometer or GPS sensors to calculate steps, whereas an application that simply sends or receives messages (Facebook, for example) will make no use of a device's onboard sensors. To extract this information from an application's source code we used a SrcML.net [31] function called GetDescendantsAndSelf<MethodCall>() [32]. When used, this function parses through a given body of source code looking for a specific function by name. In this case we are looking for the Android function called getDefaultSensor [27]. By searching for this function, SrcML can return the type and number of sensors any particular application is using. If no sensor is in use, we can skip that application and continue parsing the next one. The code used to complete this task can be found in Appendix A.

Once we know that an application makes use of a device's sensors, the next step is to check whether the application performs any internal tests during or after any calculations performed on the incoming sensor data. For example, the step-counting application should test itself to see if the desired output is being achieved when given a set of input sensor data. There are two different types of testing scenarios we are looking for. The first is any testing done using Android's built-in testing library; the Android API comes with a variety of built-in testing functions that can be used to test a wide range of Android's functionality. The second scenario is locating developer-created testing functions.


This is when a developer uses either traditional white-box testing or some other testing strategy to create his or her own test cases. The ultimate goal is to detect both developer-created test functions and Android's built-in test functions; however, the strategies used for detecting these two types are drastically different.

4.1.1 Detecting Android API Tests

We will start with the easier of the two scenarios: detecting test cases that have been built into the Android API. Since we already know the names of the Android test functions, and what they do, thanks to the Android API documentation, we can determine whether a developer decided to use one of Android's built-in testing libraries. This is done much the same way we find the getDefaultSensor function; we simply change the name of the keyword being searched for during the parsing process. In this experiment we searched for six Android test functions to see if developers were taking advantage of these built-in tools. The full list of Android tests we searched for can be found below, followed by a brief usage sketch of one of these classes. We hypothesized that developers would attempt to use the provided testing methods before building one from scratch.

4.1.2 List of Android API Tests Searched For

• ActivityUnitTestCase - This class provides isolated testing of a single activity. The activity under test will be created with minimal connection to the system infrastructure, and you can inject mocked or nested versions of many of Activity's dependencies [27].

• ServiceTestCase - This test case provides a framework in which you can test Service classes in a controlled environment. It provides basic support for the lifecycle of a Service, and hooks with which you can inject various dependencies and control the environment in which your Service is tested [27].


• ApplicationTestCase - This test case provides a framework in which you can test Application classes in a controlled environment. It provides basic support for the lifecycle of an Application, and hooks by which you can inject various dependencies and control the environment in which your Application is tested [27].

• ProviderTestCase2 - This test case class provides a framework for testing a single Content Provider and for testing your app code with an isolated content provider. Instead of using the system map of providers that is based on the manifests of other applications, the test case creates its own internal map. It then uses this map to resolve providers given an authority. This allows you to inject test providers and to null out providers that you do not want to use [27].

• LoaderTestCase - A convenience class for testing Loaders. This test case provides a simple way to synchronously get the result from a Loader, making it easy to assert that the Loader returns the expected result [27].

• ActivityInstrumentationTestCase2 - This class provides functional testing of a single activity. The activity under test will be created using the system infrastructure (by calling InstrumentationTestCase.launchActivity()) and you will then be able to manipulate your Activity directly [27].
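As a brief usage sketch of the last class listed (MainActivity is a hypothetical activity under test; the JUnit 3 style shown matches the Android testing APIs of this era):

    import android.test.ActivityInstrumentationTestCase2;

    // MainActivity is a hypothetical activity; a real test would replace
    // it with the application's own activity class.
    public class MainActivityTest
            extends ActivityInstrumentationTestCase2<MainActivity> {

        public MainActivityTest() {
            super(MainActivity.class);
        }

        public void testActivityLaunches() {
            // getActivity() launches the activity via the system
            // infrastructure, as the documentation above describes.
            assertNotNull(getActivity());
        }
    }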

4.1.3 Detecting Developer Created Tests

To find developer-created test cases, we set the parsing algorithm to search for the onSensorChanged function, in exactly the same way we search for the getDefaultSensor function. We know that the onSensorChanged function is where all Android applications receive incoming sensor data from the Android operating system, so finding this function is the first step in detecting any developer-created tests that may exist.

There are two ways a developer can implement a testing function relative to the onSensorChanged function. The testing source code or testing function can exist natively within the onSensorChanged function itself (referred to as a t1 test case), or it can be embedded in another function outside of onSensorChanged that performs the calculations on the sensor data and is later called by that calculation function to perform testing (referred to as a t2 test case).


The t1 test case is the simpler of the two. The calculations are done within the onSensorChanged function, either by performing calculations in source code native to onSensorChanged or by calling some calculation function that exists outside the scope of onSensorChanged. However the calculations are done, the testing function used to evaluate them is called within the onSensorChanged function itself. In this scenario we only need to determine the test case's parent function one level up, which in our case is easy because we already know the parent is the onSensorChanged function. The parsing algorithm can then return all the children of the onSensorChanged function; among them will be the testing function or functions we are looking to detect. Figure 4.2 below illustrates the flow of the sensor data from onSensorChanged, to the calculations on the sensor data, to the passing of the calculated data to a test function.

Figure 4.2 – t1 Test Case

If the test case is embedded in another function that exists outside of the onSensorChanged function, we refer to it as a t2 test case. This is when the sensor data is passed to another function to perform the mathematical calculations, and the test function for those calculations is called within the function performing the math. This is the more realistic scenario, because when creating software, almost all of your code, especially computational code, is contained within a function. It is also much harder to find where the testing function is located, because we no longer know the name of its parent function. For the t1 test case we relied on the Android API to tell us what the name of the parent function was, then simply searched for that function name when parsing the code. In this scenario the developer could have named the calculation function anything. To solve this problem we need to return all the functions called by the onSensorChanged function, and then return all of the functions called within those functions. Figure 4.3 below illustrates how the t2 test function is called (embedded) by a calculation function.

Figure 4.3 – t2 Test Case

As Figure 4.3 shows, the sensor data is simply passed from the onSensorChanged function to the calculation function, where it is processed and then passed to the test function.
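A corresponding minimal Java sketch of a t2 test case follows; again, the names (AverageDetector, computeAverage, testAverage) are illustrative assumptions. Here the parser must walk one level deeper: it returns the callees of onSensorChanged, then the callees of those callees, in order to reach the test function.

import android.hardware.Sensor;
import android.hardware.SensorEvent;
import android.hardware.SensorEventListener;

// Hypothetical t2 test case: onSensorChanged hands the sensor data to a
// calculation function, and the test function is called from within the
// calculation function, two levels below onSensorChanged.
public class AverageDetector implements SensorEventListener {

    @Override
    public void onSensorChanged(SensorEvent event) {
        // The sensor data is handed off; no testing happens here.
        computeAverage(event.values[0], event.values[1], event.values[2]);
    }

    private float computeAverage(float x, float y, float z) {
        float v = (x + y + z) / 3.0f;
        // t2: the test is embedded in the calculation function.
        testAverage(x, y, z, v);
        return v;
    }

    private void testAverage(float x, float y, float z, float v) {
        float expected = (x + y + z) / 3.0f;
        if (Math.abs(expected - v) > 1e-6f) {
            throw new AssertionError("average calculation is inconsistent");
        }
    }

    @Override
    public void onAccuracyChanged(Sensor sensor, int accuracy) {
        // Not used in this sketch.
    }
}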

The code that mines this information out of the Java source during parsing is shown below in Figure 4.4.


Figure 4.4 – Caller Method

4.1.4 Test Case Detection Procedure

We used Microsoft Visual Studio with a SrcML.NET plugin to program the source code that powers our parsing program. We then applied the algorithm to a body of thirty sensor-driven open source Android applications downloaded from repositories such as GitHub [33]. The complete list of applications and their download sources can be found in Appendix B. All applications were downloaded and stored in a single folder that served as a root directory, or starting point, for our algorithm. To execute the program we used Visual Studio's "Run Tests" feature, at which time the program would display sensor types, implementations of onSensorChanged, and the children functions of onSensorChanged for each application stored within the root directory. Figure 4.5 is an example of the output displayed after the program has completed.


Figure 4.5 – Parsing Algorithm Output

4.2 Data Collection (SenSee)

SenSee is an Android application created by Virginia State University staff and students using the same rules applied for test case detection [34]. The basic principle behind SenSee is to allow a user to perform a series of actions or tasks using Android sensor data, while at the same time allowing him or her to record and tag those actions in order to provide some ground truth for the data being collected. We used it to establish the number of steps actually taken by an individual during our evaluation of a pedometer application. Using SenSee's tag feature we were able to identify where each step or set of steps occurred when evaluating the sensor data, thus effectively eliminating the oracle problem. Figure 4.6 below illustrates the real-world step tags recorded when collecting sensor data.


Figure 4.6 – Accelerometer Sensor Data with Tag Lines

The framework that powers SenSee is very similar to the framework that powered our test case detection algorithm discussed earlier in this paper. The difference is that SenSee doesn't use the onSensorChanged function to search for test cases; instead it uses it to hijack and manipulate the sensor data sent to any application that has permission to it. SenSee is a standalone application that does not have to integrate with any other application or rely on outside code, which allows us to perform two tasks. The first is to test the Android sensors themselves. Because SenSee captures raw input data from the device's sensors, developers can see whether specific sensors are producing correct readings before using that sensor data as input to another application. This is a simple quality control measure: inputting corrupt or incorrect sensor data will cause an application to either crash or produce incorrect results. The second task SenSee allows us to perform is to control what data is sent to a particular application. This ability opens the door to metamorphic testing on Android platforms and is the focus of this thesis.

Figure 4.7 – SenSee Capture and Transform Diagram

4.2.1 Data Collection Procedure

To collect data we used three participants, both male and female, using three Android devices all running SenSee. This was done to ensure that sensor data could be recorded across multiple Android devices, as well as to confirm that the pedometer app being tested could handle both male and female walking postures. Our participants walked a predefined number of steps while the Android device recorded all accelerometer sensor data along the way. SenSee stores all recorded data as a CSV file, which is then taken from the device and stored on a computer running a virtual copy of SenSee, via Android Studio, where it can be fed into any Android application; in our case we used an open source pedometer application.


4.3 Error Detection

The overall goal of this study is to detect errors using metamorphic testing, but to do that we must first define what an error is. The term error in the field of computer science can refer to many things, but we are focused on two types of errors.

The first is a programming error: an error that exists within the code and leads to bugs or unintended glitches. Almost all software contains errors within its source code, with varying degrees of disruption to the overall function of the software. In order to find these bugs you must first determine that they exist, and this is harder to achieve in some software than in others. Simple applications generally contain fewer lines of code and have less dependency on external functions to operate, so finding a programming error, if one exists, is much easier. Larger and more complex pieces of software, the Windows operating system for example, can contain millions of lines of code within thousands of functions that all depend upon one another to perform correctly. Finding errors in environments such as these is far more difficult, and if these errors persist they can lead to dramatic fluctuations in our second type of error.

Threshold error is the amount of incorrect results a particular software can handle before failing. For example, if a software can be up to 20% incorrect and still be considered effective, that software has a 20% error threshold, and thus must be correct 80% of the time or more to meet that threshold. This number can vary greatly depending on the software application: nuclear power plants and flight control systems contain software that must meet a much higher error threshold, because the cost of failure can be catastrophic. In general, the amount of threshold error a software produces is a direct consequence of the number of programming errors contained within its source code. To combat this we must either detect, or take steps to minimize, any logical or computational errors that may exist.


4.4 Applied Metamorphic Transforms

In order to detect these errors we applied a series of metamorphic transforms to our Android application. Because our application is a step detection application that uses a device's onboard accelerometer sensor, we can use SenSee to alter the data being received by the application itself. To better display the transforms' effects on our sensor data we will be comparing the results from one of our data sets. This particular data set recorded only 10 steps, so it should be easier to follow. The original accelerometer values for this data set are shown in Figure 4.8, which shows the values for X, Y, and Z, as well as the positive average over all the axes. This average is what the step detection algorithm uses to determine a step.

Figure 4.8 – Original 10 Step Data Set

[Line chart "Original Data": the X Axis, Y Axis, Z Axis, and Pos Avg values plotted over samples 1–136.]


4.4.1 Multiplicative Transforms

The first transforms we applied were a series of multiplicative transforms; in our case we multiplied the accelerometer input values by two. At first we multiplied all three axes by two, which resulted in higher peaks across all axes and thus a higher average peak, causing the algorithm to count more steps than were actually taken. Next we limited the multiplication to only one axis, in our case the z axis. The algorithm still counted a high number of steps, but 15% fewer than when multiplying all three axes by a factor of two. This can be a powerful tool for source code error detection. Multiplying data by a constant allows developers to stretch their algorithms to the breaking point, providing insight into how much alteration, or how many outlying data points, they can handle before failing.
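As a minimal illustration, the sketch below applies this multiplicative transform to an in-memory array of accelerometer samples; the class name and the samples[i] = {x, y, z} representation are our own assumptions rather than SenSee's actual code.

// Hypothetical multiplicative transform: scale the samples on the
// selected axes by a constant factor (factor = 2 in our experiments).
public final class MultiplicativeTransform {

    // samples[i] = {x, y, z} for the i-th accelerometer reading.
    public static float[][] multiplyAllAxes(float[][] samples, float factor) {
        float[][] out = new float[samples.length][3];
        for (int i = 0; i < samples.length; i++) {
            for (int axis = 0; axis < 3; axis++) {
                out[i][axis] = samples[i][axis] * factor;
            }
        }
        return out;
    }

    // Variant that scales a single axis (0 = x, 1 = y, 2 = z).
    public static float[][] multiplyOneAxis(float[][] samples, int axis, float factor) {
        float[][] out = new float[samples.length][3];
        for (int i = 0; i < samples.length; i++) {
            out[i] = samples[i].clone();
            out[i][axis] *= factor;
        }
        return out;
    }
}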

Figure 4.9 – Multiplicative Transform on 10 Step Data Set

[Line chart "Multiply All Axis by 2": the X Axis, Y Axis, Z Axis, and Pos Avg values after the transform, plotted over samples 1–136.]


4.4.2 Interpolating Transforms

This transform takes every adjacent pair of numbers within the data array and averages them together; the resulting average is then inserted in between those two numbers. This smoothed out the data, resulting in smaller peaks, but not to such a degree that the algorithm could no longer perform peak detection. The result was a higher degree of accuracy and a lower threshold error across almost all data sets tested. This transform can be an invaluable tool to help developers eliminate noise or unwanted data from their data sets, but it is a poor tool for error detection because of its tendency to mitigate errors rather than expose them.
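A minimal sketch of the interpolating transform, applied to a single axis, is shown below; the class name and the single-axis float array representation are assumptions for illustration.

import java.util.ArrayList;
import java.util.List;

// Hypothetical interpolating transform: the average of every adjacent
// pair of values is inserted between them, roughly doubling the number
// of samples while smoothing the peaks.
public final class InterpolatingTransform {

    public static float[] interpolate(float[] values) {
        if (values.length == 0) {
            return new float[0];
        }
        List<Float> out = new ArrayList<>();
        for (int i = 0; i < values.length - 1; i++) {
            out.add(values[i]);
            out.add((values[i] + values[i + 1]) / 2.0f); // inserted average
        }
        out.add(values[values.length - 1]);

        float[] result = new float[out.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = out.get(i);
        }
        return result;
    }
}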

Figure 4.10 – Interpolating Transform on 10 Step Data Set

[Line chart "Interpolating Transform": the X Axis, Y Axis, Z Axis, and Pos Avg values after the transform, plotted over samples 1–271 (interpolation roughly doubles the sample count).]

4.4.3 Adding Avg Noise Transforms

This transform finds the overall average of a particular set of data, in our case the X, Y, and Z axes, then adds that average value to every number in the data set. This raises the overall average of the data set as a whole, while flattening the data set at the same time. This method doesn't provide the same threshold accuracy result that interpolating does, but it still produces a noticeable improvement. The effectiveness of this transform as an error detection tool is largely based on the metamorphic properties of the software in question. If your software relies on data in which a wide range of both large and small numbers must be a specific distance from each other, this transform can be used to test how far apart or how close together those numbers can be before your algorithm fails. For example, our pedometer's peak detection algorithm contains a statement that checks whether the last peak recorded is at least two thirds as high as the current peak; if yes, it counts as a step, if no, it is discarded as walking motion noise.
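A minimal sketch of this transform is given below, again under the assumption of a single-axis float array; the class and method names are illustrative.

// Hypothetical add-average-noise transform: compute the mean of the
// data set and add it to every sample, raising the overall average
// while flattening the signal.
public final class AddAverageNoiseTransform {

    public static float[] addAverageNoise(float[] values) {
        float sum = 0.0f;
        for (float v : values) {
            sum += v;
        }
        float avg = sum / values.length; // assumes a non-empty data set

        float[] out = new float[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = values[i] + avg;
        }
        return out;
    }
}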

Figure 4.11 – Add Average Noise Transform on 10 Step Data Set

[Line chart "Add Avg Noise": the X Axis, Y Axis, Z Axis, and Pos Avg values after the transform, plotted over samples 1–136.]

4.4.4 Down Sampling Transforms


Down sampling is perhaps the ultimate test for evaluating how effective a sensor-based algorithm is. In most cases it does nothing to improve the overall threshold accuracy of a software, but as an error detection tool it can provide a great deal of information. This transform can be used to evaluate how much data can be lost before an algorithm's performance starts to decay. During our testing we down sampled data sets by 50 percent, effectively reducing the number of accelerometer input values being fed to the application by half. This greatly reduced the accuracy of all results produced, but because of its ability to introduce unknowns into your algorithm it can be a great tool for error detection, forcing developers to do more with less data or to apply transforms that help to improve overall threshold accuracy, such as the interpolating transform.
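The sketch below shows one plausible implementation of the 50% down sampling we used, keeping readings at a regular stride; the exact sampling scheme applied by SenSee is an assumption here.

// Hypothetical down-sampling transform: keep only a given fraction of
// the samples (keepFraction = 0.5 drops half of the readings).
public final class DownSamplingTransform {

    public static float[] downSample(float[] values, double keepFraction) {
        int kept = (int) Math.round(values.length * keepFraction);
        float[] out = new float[kept];
        double step = (double) values.length / kept;
        for (int i = 0; i < kept; i++) {
            out[i] = values[(int) (i * step)]; // sample at a regular stride
        }
        return out;
    }
}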

Figure 4.12 – Down Sampling Transform on 10 Step Data Set

[Line chart "Down Sample 50%": the X Axis, Y Axis, Z Axis, and Pos Avg values after the transform, plotted over samples 1–69.]

4.4.5 Semantical Transforms

Perhaps the most straightforward transform for error detection, semantic transforms simply apply a mathematical property to existing data in such a manner that should result in the exact same data. These methods can range from multiplying by 1 or adding 0, to applying sine or cosine properties, to applying a matrix transform to your data set. The method in which you apply this transform can vary based on testing needs, but the result should always be the same. If your data changes over the course of this transform, your software is fundamentally flawed.
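A minimal sketch of a semantic transform, using the multiply-by-one identity named above, follows; the class name is illustrative.

// Hypothetical semantic (identity) transform: apply an operation that
// should leave the data mathematically unchanged. If the program under
// test produces different output for this input, it is flawed.
public final class SemanticTransform {

    public static float[] multiplyByOne(float[] values) {
        float[] out = new float[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = values[i] * 1.0f; // identity operation
        }
        return out;
    }
}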

Figure 4.13 – Semantical Transform on 10 Step Data Set

[Line chart "Semantic": the X Axis, Y Axis, Z Axis, and Pos Avg values after the transform, plotted over samples 1–133.]

4.5 Fault Seeding Study

In order to evaluate the effectiveness of metamorphic testing and its transforms for error detection, we used a method called fault seeding. Fault seeding is simply the introduction of known errors into software source code; in our case we introduced a number of errors into the step detection algorithm within the pedometer application. In order to evaluate against a wide range of possible real-world errors, we enlisted the help of several graduate students and professors within the computer science department at VSU.


We gave our participants a set of instructions, which can be found in Appendix D, asking them to introduce several computational or logical errors into a series of functions that govern the pedometer's step detection algorithm. Using the original source code as a base case, we first recorded all the results produced by the original code using both raw unmodified input sensor data and morphed transform data. This was simply a matter of recording the number of steps the algorithm calculated after a particular transform or transforms were applied. These results were then compared to the results of the corrupted code after the same set of transforms were applied. The full list of results can be found in Appendix B.


CHAPTER 5 – EVALUATION

5.0 Study Recap

Over the course of this endeavor we have created several unique tools and methodologies that Android developers can use to find and create test cases for any given sensor-driven application. Our objective was to determine what testing strategies are being deployed by indie developers today, to conclude whether metamorphic testing is possible on Android platforms, and, if so, to evaluate its effectiveness. The final evaluation of these systems is outlined below.

5.1 Test Case Detection Results

After applying our parsing algorithm to a body of thirty open source applications, we found that almost all of them fail to perform any sort of internal testing. The complete results of this analysis can be found in Appendix C. Only three Android-created test cases were found, as well as three user-defined test cases. All six of the detected test cases were located among three applications; thus only 10% of the applications we tested contained some sort of internal testing functionality. This may be due to the fact that our pool of applications is open source. If we applied our algorithm to a body of paid, closed source applications such as "FaceBook" or "Clash of Clans", my hypothesis is that we would detect far more internal test cases.

To further validate our results we compared our findings to those of a much larger test case detection study conducted at Singapore Management University [36]. Using a pool of over 600 Android applications collected from two online repositories, F-Droid [37] and GitHub [33], these researchers concluded that only 14% of the applications evaluated contained test cases. These findings are very close to our own conclusion of 10%. That study went one step further and also found that only 9% of the apps that have executable test cases have coverage above 40%. This means that less than 1% of open source Android applications contain test cases that cover more than 40% of their source code.

5.2 Initial Transform Results

In order to be certain our transforms were working as intended, we applied them to a series of incoming sensor data and then checked the new results to determine whether the correct mathematical operations had been applied. This transformed data was then fed into our step detection application, and the number of steps detected was recorded. During this stage we could evaluate what effect each transform had on the overall threshold accuracy of the application, and thus on its overall performance. Some transforms greatly increased accuracy over all data sets tested, while others had adverse effects. For example, transforms that applied data averages to the data set as a whole tended to decrease the error threshold, which in turn increased accuracy. Other transforms, such as adding noise, tended to decrease overall accuracy. All of these results were used as a base case for our error detection experiment. The complete transform analysis can be found in Appendix E.


Figure 5.1 – Base Line Pedometer Results before Transforms

Base Line Data

Data Set Name         | # of Steps Actually Taken | Device Used    | Steps Calculated (App Sensitivity = Default)
Marcos10Hip.csv       | 10                        | Note 3 Phone   | 14
Marcos50Hip.csv       | 50                        | Note 3 Phone   | 49
Cece50StepsV2.csv     | 50                        | Note 3 Phone   | 68
CeCe100StepsHip.csv   | 50                        | Galaxy S3      | 47
Cece50StepsHipS3.csv  | 25                        | Android Tablet | 21

Each ".csv" file contains several thousand points of accelerometer data recorded using SenSee. The table above depicts the number of steps the pedometer application calculated after each data set was processed, and compares it to the number of steps actually taken. The next figure (Figure 5.2) shows the application's calculated steps using transformed data sets as input.


Figure 5.2 – Pedometer Application Results for each Transform

Calculated Steps After Transform

Data Set Name        | Multiply All By Two | Multiply Z Axis By Two | Insert Random Noise | Convert To Rounded Noise | Semantic | Add Avg Noise | Down Sample 50% | Interpolating | Base Line Shift
Marcos10Hip.csv      | 14 | 12 | 75   | 11 | 10 | 10 | 4  | 10 | 11
Marcos50Hip.csv      | 98 | 72 | 450  | 53 | 49 | 56 | 42 | 52 | 54
Cece50StepsV2.csv    | 70 | 60 | 251  | 44 | 44 | 39 | 18 | 44 | 40
Cece50StepsHipS3.csv | 86 | 70 | 336  | 49 | 47 | 51 | 29 | 47 | 52
accelerometer.csv    | 68 | 51 | 5370 | 11 | 21 | 12 | 26 | 24 | 17

As the table shows, some transforms, like insert random noise, cause the accuracy of the calculated steps to fall dramatically for all data sets, while others, such as adding average noise, increase overall accuracy for the given data sets. Now that we have these transform results, we can use them as a new base line in order to detect errors or changes either within the existing source code or in future iterations of it.

5.3 Fault Seeding/Error Detection Results

After determining a transform base line for our original source code, we then evaluated our transforms by introducing errors or defects into the application's source code. Some of these errors were small in scale and were only detected by transforms that altered the input data on a large scale, while other defects caused the application to fail altogether. Figure 5.3 shows an example of how the transform results are affected after an error has been introduced into the source code. In this particular situation the error was small: a mathematical operation had been changed from addition to subtraction. The corrupted source code still calculated 10 out of 10 steps. Using traditional white-box testing this error may have gone unnoticed, but by applying several transforms to the input data, two of those transforms (Insert Random Noise and Add Average Noise) returned results that did not match our base line transforms, thus revealing an error or change in the code.
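To make the seeded fault concrete, the hedged sketch below is based on the step value calculation described in Appendix D (vSum built from the three axes, then v = vSum/3); the participants' exact edit is not reproduced here, so this only illustrates the kind of one-operator change involved.

// Illustrative seeded fault, assuming the v = vSum / 3 calculation
// described in Appendix D.
public final class SeededFaultExample {

    // Original calculation.
    static float stepValue(float x, float y, float z) {
        float vSum = x + y + z; // sum of the three accelerometer axes
        return vSum / 3.0f;     // value used for peak/step detection
    }

    // Seeded fault: a single '+' changed to '-'. On raw input the app
    // still counted 10 out of 10 steps; only the noise-based transforms
    // produced results that differed from the base line.
    static float stepValueFaulty(float x, float y, float z) {
        float vSum = x + y - z;
        return vSum / 3.0f;
    }
}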

Figure 5.3 – Transforms Results after Introducing an Error

Calculated Steps After Transform

                          | Multiply All By Two | Multiply Z Axis By Two | Insert Random Noise | Convert To Rounded Noise | Semantic | Add Avg Noise | Down Sample 50% | Interpolating | Base Line Shift
Original Transform Result | 14 | 12 | 75 | 11 | 10 | 10 | 4 | 10 | 11
Result with Error         | 14 | 12 | 77 | 10 | 10 | 9  | 4 | 10 | 11

After discarding the random noise transform, due to the fact that it will almost always produce a result different from the base line, we are left with one transform that was able to detect this particular error. This highlights the fact that even after applying a wide range of metamorphic transforms you still may not be able to detect every error; however, this is a far better option than traditional white-box testing. In this case a traditional unit test would more than likely have passed this particular source code if it did not employ some sort of metamorphic functionality. As a developer, if you want to increase the rate of error detection you are left with two options: either apply more transforms to your application's input data (for example, we could have applied twenty-five transforms instead of nine), or apply transforms that better exercise your source code's computations. The latter solution requires developers to have a concrete understanding of how their source code works; only after this understanding is achieved do you know which transforms should be applied.

Over the course of this project we have discovered several uses for our particular transforms and how they may best be applied to other scenarios. We have compiled this knowledge into the taxonomy below.

5.4 Full Transform Taxonomy

1) Multiplicative Transforms: Multiply numerical values by a constant

This transform is relatively simple; you should know what the outcome will be. It is very good at testing for what are called limit errors in your software. If your program can only handle an 8-bit number and multiplying by a large constant results in a 9-bit number, your program will either dismiss the last bit or fail altogether, depending on the machine. The same can be done by feeding large decimal numbers into your software to see how many decimal points your program can calculate before failing.

We applied this method to our app by multiplying all our accelerometer data points by a factor of two. We did not encounter any limit errors within the app; however, using this transform we discovered that the algorithm's step detection becomes less and less reliable the higher the accelerometer values are.

2) Insert Random Noise Transforms: This transform inserts a noise value, chosen completely at random, in between every pair of existing data points.

If your algorithm needs to cancel out unneeded or useless data, applying this transform is a good way to test whether your software can effectively handle the insertion of large spikes in your data. For example, suppose you need to ignore all data that is above or below a certain threshold; if some of the random data falls within the limits of that threshold, it will serve to corrupt your data. Determining the most effective threshold limit for such algorithms is where this metamorphic test shines.

The app we applied this transform to did not have any such method.
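A minimal sketch of this transform is given below; the noise magnitude parameter, seed, and names are assumptions for illustration.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical insert-random-noise transform: a random value is
// inserted between every pair of existing data points.
public final class InsertRandomNoiseTransform {

    public static float[] insertNoise(float[] values, float maxMagnitude, long seed) {
        if (values.length == 0) {
            return new float[0];
        }
        Random rng = new Random(seed);
        List<Float> out = new ArrayList<>();
        for (int i = 0; i < values.length - 1; i++) {
            out.add(values[i]);
            // Random spike in [-maxMagnitude, +maxMagnitude).
            out.add((rng.nextFloat() * 2.0f - 1.0f) * maxMagnitude);
        }
        out.add(values[values.length - 1]);

        float[] result = new float[out.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = out.get(i);
        }
        return result;
    }
}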


3) Convert to Rounded Noise Transforms: This transform modifies all the existing array values by converting them to some value within plus or minus 1 of the original data.

Instead of inserting noise into the data set, this transform converts the existing data, changing each number to a value no more than one higher or lower than the original number. This transform is great for seeing how your system handles small fluctuations or errors in your data. Many applications require human input, and these inputs are not always the most accurate, so your system should be able to handle these errors.
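A sketch follows; drawing the offset uniformly from [-1, +1) is our reading of "plus or minus 1" and is an assumption.

import java.util.Random;

// Hypothetical convert-to-rounded-noise transform: replace each value
// with a value at most 1.0 above or below the original, simulating
// small fluctuations such as imprecise human input.
public final class RoundedNoiseTransform {

    public static float[] perturb(float[] values, long seed) {
        Random rng = new Random(seed);
        float[] out = new float[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = values[i] + (rng.nextFloat() * 2.0f - 1.0f); // offset in [-1, +1)
        }
        return out;
    }
}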

4) Semantic Transforms: This transform creates inputs that have the same "meaning" as the original.

This is a very simple but effective method to check whether your seemingly correct outputs are actually correct mathematically. Applying a mathematical function to your data that should result in the exact same output, such as multiplying by 1 or adding 0, is effective in finding "order of operations" errors and other common mathematical mistakes.

When we applied this transform to our data set the resulting data was the same; thus we were able to conclude the application had no obvious misuse of mathematical operations.

5) Interpolation: Transform that inserts the average of each adjacent pair of data points in between them.

This method works by finding the average of two consecutive numbers, then inserting that average value in between those numbers. This transform helps to guard against small errors or inconsistencies in your data, much like the convert-to-rounded-noise transform, and will thus help determine whether your data set is reliable.

Applying this transform to our step detection app improved the app's results by 20% on average. So if software engineers want to make their products more reliable, this would be a good place to start.


6) Down Sampling Transforms: This transform downsizes the array by deleting a certain percentage of the values.

This deletes a certain percentage of the data points within your data set; for example, we deleted 50% of the data points when testing the step detection app. This can reveal several things. If your system collects data rapidly, at a rate of 500 data points a second for example, it may be able to handle a 50 percent cut in data points and still perform well. If your system does not collect data rapidly, the results will be more corrupted. So if a software engineer wants to know how many data points can be lost before the system starts becoming unreliable, this transform is a good tool to have. Knowing this allows him/her to either increase the number of data points collected within a given time frame, or combat the loss of data by using another transform such as interpolation.

7) Add Average Noise Transform: This transform adds the average value of a data set to every point in the data set.

This transform is similar to the interpolation transform, but instead of inserting averages in between two data points, it adds the average value of the data set to each point in the data set.

8) Base Line Shift Transform: This transform moves the base line of the input data based on defined rise and run values.
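The text does not pin down the exact semantics of the rise and run values, so the sketch below assumes a linear base line ramp of rise/run per sample; treat it purely as one plausible reading.

// Hypothetical base line shift transform: move the signal's base line
// linearly, using rise/run as the slope applied per sample index.
public final class BaseLineShiftTransform {

    public static float[] shift(float[] values, float rise, float run) {
        float slope = rise / run; // assumes run != 0
        float[] out = new float[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = values[i] + slope * i;
        }
        return out;
    }
}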

5.5 Discussion

Most software testing practices today use a set of test cases constructed on some predefined criteria in order to evaluate whether a software's processes are being executed correctly. These methods take some input data, run it through the program, then check to see if the resulting output is correct; if it is, that test is considered "passed". The Android operating system uses the Java programming language, which is known for its large library of functions, including testing functions that use this "test case" framework. We therefore performed our own empirical study in order to determine what testing techniques are being used by current sensor-driven Android applications.

5.6 Study Limitations

Although we successfully engineered a method for developers to apply metamorphic testing to all sensor-driven Android devices, our study did have some limitations. The first pertains to our parsing algorithm. When searching for developer-created test cases, the algorithm returns all the functions within the source code that may perform testing. However, to quickly analyze whether or not a function is a testing function, we are relying on the programmer to name that function as a test. If the testing function is not named accordingly it becomes much harder to decipher the true purpose of that function, and this usually requires a manual inspection of the code.

The second limitation involves the Android applications tested during the test case detection study. Because many of these applications were pulled from open source repositories such as GitHub, they are often created with no commercial purpose in mind, and many of them require a very low degree of reliability and accuracy; this may be why our parsing algorithm returned a very low number of test cases. If this algorithm were applied to a set of commercial applications such as "FaceBook" or "Google Maps", we would expect to find a much higher number of test cases. These applications are closed source, however, and as of the date of this study the means to obtain their source code are either illegal or very expensive.


CHAPTER 6 - SUMMARY

6.0 Summary

As our technological advances increase, new problems will arise for which there is no current answer. As these problems grow in size and complexity, so too will the computer programs needed to solve them, but with this growth comes the possibility of more software errors. Developing new tests to detect these errors will become more and more difficult on an exponential scale, but perhaps Edsger W. Dijkstra put it best by stating "Program testing can be used to show the presence of bugs, but never to show their absence" [12]. Creating a perfect program is nearly impossible, but by adopting advanced testing metrics like metamorphic testing we can get pretty close. There have been many advances in the field of static error detection, for programs with known inputs and outputs. These advances include methods like symbolic execution and model checking, even winning a Turing Award in 2007 [35], but there has been relatively little advancement in dynamic error detection. As we rely on computer systems to calculate more and more complex unknowns, the testing metrics used to evaluate these systems must also evolve. Solving problems such as the oracle problem will be key if we hope to produce reliable independent software.

Our objective was to evaluate and provide a means which Android developers may use to better their applications through the use of metamorphic testing. This study concluded that metamorphic testing is not only possible but feasible, and provided a means to universally apply it to all sensor-based Android applications.


6.1 Recommendations for Future Research

My recommendation for future research would be to expand on the transforms that we did not get to cover in this study, and to evaluate them on a more complex Android application.


APPENDIX A – Parsing Algorithm Code

using System;
using System.Diagnostics;
using System.IO;
using System.Linq;
using ABB.SrcML.Data;
using NUnit.Framework;

namespace CodeAnalysisToolkit
{
    [TestFixture]
    public class SimpleAnalyticsCalculator_Thesis
    {
        //------ Test Case Class -------------------------------------------
        [TestCase]
        public void CalculateSimpleProjectStats()
        {
            int NumOfApps = 30;

            // Get the list of application sub-directories in the root directory.
            string[] TopDirectories = Directory.GetDirectories(
                @"C:\School\Grad School (Comp Sci)\Thesis\Apps\",
                "*.*", SearchOption.TopDirectoryOnly);

            for (int i = 0; i < NumOfApps; i++)
            {
                var dataProject = new DataProject<CompleteWorkingSet>(
                    TopDirectories[i],
                    Path.GetFullPath(TopDirectories[i]),
                    "..//..//..//SrcML");

                Console.WriteLine();
                Debug.WriteLine("#############################################");
                Debug.WriteLine("Parsing " + TopDirectories[i]);

                dataProject.UpdateAsync().Wait();

                NamespaceDefinition globalNamespace;
                Assert.That(dataProject.WorkingSet.TryObtainReadLock(5000, out globalNamespace));

                DisplaySensorTypes(globalNamespace);
                //DisplayWhetherAppIsUnitTested(globalNamespace);
                DisplayCallsToOnSensorChanged(globalNamespace);
                DisplayTestCaseClasses(globalNamespace);
            }
        }

        //------ Display Sensor Types --------------------------------------
        private void DisplaySensorTypes(NamespaceDefinition globalNamespace)
        {
            // Find every call to getDefaultSensor and report the sensor
            // constant (e.g. Sensor.TYPE_ACCELEROMETER) passed to it.
            var getDefaultSensorCalls =
                from statement in globalNamespace.GetDescendantsAndSelf()
                from expression in statement.GetExpressions()
                from call in expression.GetDescendantsAndSelf<MethodCall>()
                where call.Name == "getDefaultSensor"
                select call;

            foreach (var call in getDefaultSensorCalls)
            {
                if (call.Arguments.Any())
                {
                    var firstArg = call.Arguments.First();
                    var components = firstArg.Components;
                    if (components.Count() == 3 &&
                        components.ElementAt(0).ToString() == "Sensor" &&
                        components.ElementAt(1).ToString() == ".")
                    {
                        Debug.WriteLine("sensor " + components.ElementAt(2).ToString() + " found");
                    }
                }
            }
        }

        //------ Display whether this app has a unit test ------------------
        private void DisplayWhetherAppIsUnitTested(NamespaceDefinition globalNamespace)
        {
            var testClasses =
                from klas in globalNamespace.GetDescendants<TypeDefinition>()
                where klas.GetParentTypes(false).Any(t => t.Name == "ServiceTestCase")
                select klas;

            if (testClasses.Count() == 0)
            {
                Debug.WriteLine("This File Does not contain any tests");
            }
            else
            {
                Debug.WriteLine("-----");
                Debug.WriteLine(testClasses.Count() + " Test Classes");
                Debug.WriteLine("-----");
                foreach (var testClass in testClasses)
                {
                    Debug.WriteLine(testClass.GetFullName() + " is a test class");
                }
            }
        }

        //------ Display Android test case classes -------------------------
        private void DisplayTestCaseClasses(NamespaceDefinition globalNamespace)
        {
            // A class counts as a test case class if it extends one of the
            // Android testing framework base classes.
            var testClasses =
                from klas in globalNamespace.GetDescendants<TypeDefinition>()
                where klas.ParentTypeNames.Any(t =>
                    t.Name.Contains("ActivityUnitTestCase") ||
                    t.Name.Contains("ServiceTestCase") ||
                    t.Name.Contains("ApplicationTestCase") ||
                    t.Name.Contains("ProviderTestCase2") ||
                    t.Name.Contains("LoaderTestCase") ||
                    t.Name.Contains("ActivityInstrumentationTestCase2"))
                select klas;

            if (testClasses.Count() == 0)
            {
                Debug.WriteLine("This File Does not contain any test case classes");
            }
            else
            {
                Debug.WriteLine("-----");
                Debug.WriteLine(testClasses.Count() + " Test Classes found");
                Debug.WriteLine("-----");
                foreach (var testClass in testClasses)
                {
                    Debug.WriteLine(testClass.GetFullName());
                }
            }
        }

        //------ Display calls to onSensorChanged --------------------------
        private void DisplayCallsToOnSensorChanged(NamespaceDefinition globalNamespace)
        {
            var senChangedMethods =
                from method in globalNamespace.GetDescendants<MethodDefinition>()
                where method.Name == "onSensorChanged"
                select method;

            if (senChangedMethods.Count() == 0)
            {
                Debug.WriteLine("This File Does not contain any Sensor Change Methods");
            }
            else
            {
                Debug.WriteLine("-----");
                Debug.WriteLine(senChangedMethods.Count() + " Implementations of " +
                    senChangedMethods.First().GetFullName());
                Debug.WriteLine("-----");

                int n = senChangedMethods.Count();
                for (int i = 0; i < n; i++)
                {
                    var senChangedMethod = senChangedMethods.ElementAt(i);
                    Debug.WriteLine("Implementations of onSensorChanged # " + (i + 1) +
                        ": " + senChangedMethod.GetFullName());

                    // "GetCallsToSelf" returns the calls made to this method;
                    // walking up from each call's parent statement yields the
                    // caller method.
                    var callsToSenChanged = senChangedMethod.GetCallsToSelf();
                    for (int j = 0; j < callsToSenChanged.Count(); j++)
                    {
                        var callerMethod = callsToSenChanged.ElementAt(j).ParentStatement
                            .GetAncestorsAndSelf<MethodDefinition>();
                        if (callerMethod.Any())
                        {
                            Debug.WriteLine("  Called by --> " + callerMethod.ElementAt(0).GetFullName());
                        }
                    }
                }
            }
        }
    }
}


APPENDIX B

List of Apps Used in Test Case Detection Study

Android-Compass URL No Longer Available

Android-pedometer https://github.com/bagilevi/Android-pedometer

GlassSensorTest https://github.com/lnanek/GlassSensorTest

KineticSensors https://github.com/sebLopezCot/KineticSensors

My-StepCounter https://github.com/MichaelJames6/My-StepCounter

Pedometer https://github.com/phishman3579/Android-pedometer

TiltPong https://github.com/mah68/TiltPong

Tilt-snake Co URL No Longer Available

satstat https://github.com/mvglasow/satstat

cartsbusboarding https://github.com/carts-uiet/cartsbusboarding

ThermometerExtended2 https://github.com/mateuszbuda/ThermometerExtended2

Android-sensorium https://github.com/fmetzger/Android-sensorium

Community Compass https://bitbucket.org/alekseyt/compass/downloads

getback_gps https://github.com/ruleant/getback_gps

sosmobileclient https://github.com/52North/sosmobileclient

org.thecongers.mtpms https://github.com/kconger/org.thecongers.mtpms

SAnd https://github.com/kas70/SAnd

sensorreadout https://github.com/onyxbits/sensorreadout

pushup https://github.com/pjq/pushup

pushup_counter https://github.com/lyahdav/pushup_counter


Nhundredthings (Push up Counter) https://github.com/nkijak/nhundredthings

audio detection https://github.com/twrobel3/RightHear

AudioRecorder https://github.com/railskarthi/AudioRecorder

Android-AudioRecorder https://github.com/Uncodin/Android-AudioRecorder

Altimeter https://github.com/jkozerski/Altimeter

Altimeter https://github.com/efalk/Altimeter

face-recognition https://github.com/thelinmichael/face-recognition

Recognize Facial Expression https://github.com/chinmaykrishna/FacialRecognition

QRCodeReaderView https://github.com/dlazaro66/QRCodeReaderView

accelerometer-app to learn Eating Patterns https://github.com/analogjedi/accelerometer-app


APPENDIX C

Results from Test Case Detection Study

SAnd App

Description: Uses your phone's sensors (barometer and compass) to show your current orientation, height and air pressure.

Analytics Output

Parsing C:\School\Grad School (Comp Sci)\Thesis\Apps\SAnd-master

sensor TYPE_ORIENTATION found
sensor TYPE_PRESSURE found
-----
1 Implementations of com.platypus.SAnd.MainActivity.onSensorChanged
-----
Implementations of onSensorChanged # 1: com.platypus.SAnd.MainActivity.onSensorChanged
-----
1 Test Classes found
-----
com.platypus.SAnd.ApplicationTest

Conclusion:

onSensorChanged – No testing of sensor computation was performed within this function.

ApplicationTest – No testing was actually performed in this test class; perhaps the developers had planned to perform some testing in the future, but in this version the function body is empty.


Cartsbusboarding App

Description: Communication Assisted Road Transportation System. Bus Boarding Event Detection Module.

Analytics Output

Parsing C:\School\Grad School (Comp Sci)\Thesis\Apps\cartsbusboarding-master

sensor TYPE_ACCELEROMETER found
-----
1 Implementations of in.ac.iitb.cse.cartsbusboarding.acc.AccListener.onSensorChanged
-----
Implementations of onSensorChanged # 1: in.ac.iitb.cse.cartsbusboarding.acc.AccListener.onSensorChanged
-----
2 Test Classes found
-----
in.ac.iitb.cse.cartsbusboarding.test.ApplicationTest
in.ac.iitb.cse.cartsbusboarding.test.acc.FeatureCalculatorTest


Conclusion

onSensorChanged – No testing of sensor computation was performed within this function.

ApplicationTest – No testing was actually performed in this test class; perhaps the developers had planned to perform some testing in the future, but in this version the function body is empty.

FeatureCalculatorTest – This file does contain testing, even some degree of metamorphic testing, using the average and standard deviation of the sensor data to check the accuracy of its results.


APPENDIX D

Fault seeding instructions Sheet

Introduction: Your goal is to introduce some errors within the provided code. These errors can be both computational and logical. The purpose of this experiment is to identify your bug using a process called metamorphic testing, a process whereby we attempt to identify a fault that exists in a piece of software by transforming the properties of its input data. This is done by taking advantage of the mathematical properties that exist in most software, allowing us to transform the input data in a manner that will produce a predictable result. If the result is different, then we have detected a flaw. The errors that you introduce will help us determine whether our transforms are adequate for detecting real bugs and mistakes a developer may make. If you can create a bug we cannot detect, then we will have discovered a problem we had not foreseen, which will allow us to create a transform to detect it.

The Code: The code we have provided you is the step detection function for an Android pedometer application. This function works by adding up the X, Y, and Z values from the Android accelerometer sensor and storing the sum in a value named "vSum". This value also has some additional calculations applied to it to account for things like earth's gravity and magnetic field. "vSum" is then divided by three and stored in a value called "v". This "v" variable is used to calculate steps. There is a series of loops that checks whether "v" has reached a certain threshold; if yes, the algorithm counts a step, if not, the algorithm considers the data to be motion noise and ignores it.

We have provided an Excel spreadsheet with the "v" value graphed in order to give a visual representation. Generally, every peak represented on the graph should be a step counted by the algorithm.

Instructions: Make some changes to the existing code. You are free to add or remove any code, but remember the purpose is not to break the code to the point where it will not compile, but instead to introduce a bug that is either app breaking or subtle enough to get past a testing team; either way the code must compile in order for us to apply our transforms.

Examples:

• Change the constant values used for mathematical computation
• Change the conditions in for loops
• Delete or add conditional statements

Excel Chart: This can also be found in the attached Excel spreadsheet.

[Line chart "v = vSum/3": the computed "v" value (range roughly 210–280) plotted over the recorded samples; generally each peak corresponds to one step.]


APPENDIX E

Complete Base Line Transform Analysis

Green boxes = Results that are more than 70% Accurate


BIBLIOGRAPHY

[1]T. Chen, S. Cheung and S. Yiu, Metamorphic Testing: A New Approach for Generating Next Test Cases. Hong Kong: Department of Computer Science Hong Kong University, 1998.

[2]G. Kaiser and F. Su, 'Finding Bugs in Machine Learning, Data Mining and Big Data Applications | Programming Systems Laboratory', Psl.cs.columbia.edu, 2015. [Online]. Available: http://www.psl.cs.columbia.edu/64/metamorphic-testing/. [Accessed: 17- May- 2015].

[3] Istqbexamcertification.com, 'What is Software Testing?', 2015. [Online]. Available: http://istqbexamcertification.com/what-is-a-software-testing/. [Accessed: 06- May- 2015].

[4] Istqbexamcertification.com, 'What is Test design? or How to specify test cases?', 2015. [Online]. Available: http://istqbexamcertification.com/what-is-test-design-or-how-to-specify-test-cases/. [Accessed: 10- May- 2015].

[5]E. Barr, M. Harman, P. McMinn, M. Shahbaz and S. Yoo, The Oracle Problem in Software Testing: A Survey. IEEE TRANSACTIONS ON SOFTWARE ENG, 2015.

[6]J. King, Symbolic Execution and Program Testing. IBM Thomas J. Watson Research Center, 1976.

[7]Msdn.microsoft.com, 'Unit Testing', 2015. [Online]. Available: https://msdn.microsoft.com/en-us/library/Aa292197%28v=VS.71%29.aspx. [Accessed: 19- May- 2015].

[8]agile.csc.ncsu.edu, 'White-Box Testing', 2015. [Online]. Available: http://agile.csc.ncsu.edu/SEMaterials/WhiteBox.pdf. [Accessed: 23- May- 2015].

[9]S. Webmaster, 'What is Simulation - Simulation Software Explained', Simul8.com, 2015. [Online]. Available: http://www.simul8.com/products/what_is_simulation.htm. [Accessed: 17- Jul- 2015].


[10] Softwaretestinghelp.com, 'What is Integration Testing and How It is Performed? — Software Testing Help', 2015. [Online]. Available: http://www.softwaretestinghelp.com/what-is-integration-testing. [Accessed: 20- Jun- 2015].

[11]C. Pasareanu, 'Symbolic Execution and Model Checking for Testing', YouTube, 2015. [Online]. Available: https://www.youtube.com/watch?v=azTVEwxN8zM. [Accessed: 02- Jun- 2015].

[12]E. Dijkstra, 'E.W. Dijkstra Archive: Structured programming (EWD268)', Cs.utexas.edu, 2015. [Online]. Available: https://www.cs.utexas.edu/users/EWD/transcriptions/EWD02xx/EWD268.html. [Accessed: 10- Jun- 2015].

[13]P. Boonstoppel, C. Cadar and D. Engler, Attacking Path Explosion in Constraint-Based Test Generation. Computer Systems Laboratory, Stanford University.

[14]M. Chair, P. Schaumont and P. Plassmann, Strategies for Scalable Symbolic Execution-based Test Generation. Blacksburg, Virginia: Virginia Polytechnic Institute and State University Department of Computer Engineering, 2010.

[15]G. Tassey, The Economic Impacts of Inadequate Infrastructure for Software Testing. Gaithersburg: National Institute of Standards and Technology, 2002.

[16] J. Burnim and K. Sen, "Heuristics for Scalable Dynamic Test Generation," in Automated Software Engineering, 2008. ASE 2008. 23rd IEEE/ACM International Conference on, pp. 443-446, September 2008.

[17] Cs.cmu.edu, 'Model Checking at CMU', 2015. [Online]. Available: https://www.cs.cmu.edu/~modelcheck/. [Accessed: 20- Jun- 2015].

[18]X. Xie, J. Ho, C. Murphy, G. Kaiser, B. Xu and T. Chen, Testing and Validating Machine Learning Classifiers by Metamorphic Testing. National Institutes of Health, 2011.

[19]A. Smola and S. Vishwanathan, INTRODUCTION TO MACHINE LEARNING. University of Cambridge, 2008.


[20]Z. Zhou, D. Huang, T. Tse, Z. Yang, H. Huang and T. Chen, Metamorphic Testing and Its Applications. Hong Kong: International Symposium on Future Software Technology, 2004.

[21] The Independent, '42: The answer to life, the universe and everything', 2011. [Online]. Available: http://www.independent.co.uk/life-style/history/42-the-answer-to-life-the-universe-and-everything-2205734.html. [Accessed: 20- Jul- 2015].

[22] Compliantmechanisms.byu.edu, 'Introduction to Microelectromechanical Systems (MEMS) | Compliant Mechanisms', 2015. [Online]. Available: https://compliantmechanisms.byu.edu/content/introduction-microelectromechanical-systems-mems. [Accessed: 20- Jul- 2015].

[23] N. Zhao, Full-Featured Pedometer Design Realized with 3-Axis Digital Accelerometer.

[24]D. Beyer, T. Henzinger and G. Theoduloz, Program Analysis with Dynamic Precision Adjustment. 2015.

[25]M. Harrold, J. Offutt and K. Tewary, An Approach to Fault Modeling and Fault Seeding Using the Program Dependence Graph.

[26]F. Grigorjev, N. Lascano and J. Staude, A Fault Seeding Experience. Motorola Global Software Group.

[27] Developer.Android.com, 'SensorManager | Android Developers', 2015. [Online]. Available: http://developer.Android.com/reference/Android/hardware/SensorManager.html. [Accessed: 20- Jul-2015].

[28]T. Fundamentals, 'Testing Fundamentals | Android Developers', Developer.Android.com, 2015. [Online]. Available: http://developer.Android.com/tools/testing/testing_Android.html. [Accessed: 23-Jun- 2015].

[29] Vogella.com, 'Android application testing with the Android test framework - Tutorial', 2015. [Online]. Available: http://www.vogella.com/tutorials/AndroidTesting/article.html. [Accessed: 25-Jun- 2015].


[30] Developer.Android.com, 'SensorEventListener | Android Developers', 2015. [Online]. Available: http://developer.Android.com/reference/Android/hardware/SensorEventListener.html. [Accessed: 20-Jul- 2015].

[31] Srcml.org, 'What is SrcML.Net', 2015. [Online]. Available: http://www.srcml.org/about-srcml.html. [Accessed: 20- Jul- 2015].

[32] GitHub, 'abb-iss/SrcML.NET', 2014. [Online]. Available: https://github.com/abb-iss/SrcML.NET/blob/master/ABB.SrcML.Data.Test/CodeParserTests.cs. [Accessed: 20- Jul- 2015].

[33] GitHub, 'Build software better, together', 2015. [Online]. Available: https://github.com/. [Accessed: 20- Jul- 2015].

[34] SenSee Application, 2015. [Online]. Available: https://play.google.com/store/apps/details?id=sysnetlab.Android.sdc&hl=en. [Accessed: 20- Mar-2015].

[35] Amturing.acm.org, 'Edmund Clarke - A.M. Turing Award Winner', 2015. [Online]. Available: http://amturing.acm.org/award_winners/clarke_1167964.cfm. [Accessed: 03- Jul- 2015].

[36]P. Kochhar, F. Thung, N. Nagappan, T. Zimmermann and D. Lo, Understanding the Test Automation Culture of App Developers. Singapore Management University, 2015.

[37] F-droid.org, 'F-Droid | Free and Open Source Android App Repository', 2015. [Online]. Available: https://f-droid.org/. [Accessed: 23- Jul- 2015].
