Generating Program Inputs for Database Application Testing
Kai Pan, Xintao Wu (University of North Carolina at Charlotte)
Tao Xie (North Carolina State University)
26th IEEE/ACM International Conference on Automated Software Engineering
Nov 11, 2011, Lawrence, Kansas
Slide 2
Background
[Diagram: functional testing, test generation, program inputs]
Slide 3
Background
[Diagram: functional testing, test generation, program inputs, database states]
Slide 4
An Example
[Figure: program inputs, database]
Slide 5
Motivation
Slide 6
Benefits of using an existing database state
An existing state represents the characteristics of real-world objects, helping detect faults that could cause failures in real-world settings
It reduces the cost of generating new database records
Slide 7
Dynamic Symbolic Execution (DSE)
Execute the program both concretely and symbolically (also called concolic testing)
Collect constraints along the executed path as the path condition
Negate part of the path condition and solve the new path condition to steer execution down a new path
DSE tools exist for various programming languages, e.g., Pex for .NET from Microsoft Research
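A toy illustration of the execute, collect, negate, and solve loop described above. This is only a sketch: the brute-force `solve` stands in for a real constraint solver, branch decisions are recorded by hand instead of by symbolic instrumentation, and `program`, `PREDICATES`, and `explore` are hypothetical names that are not part of Pex.

```python
from itertools import product

def program(type_, zip_, trace):
    """Toy program under test; records each branch decision in `trace`."""
    trace.append(("type == 0", type_ == 0))
    trace.append(("zip + 1 > 5", zip_ + 1 > 5))

# Branch predicates as functions, so a path condition can be evaluated.
PREDICATES = {
    "type == 0": lambda t, z: t == 0,
    "zip + 1 > 5": lambda t, z: z + 1 > 5,
}

def solve(path_condition, domain=range(10)):
    """Brute-force stand-in for a constraint solver over a tiny domain."""
    for t, z in product(domain, repeat=2):
        if all(PREDICATES[name](t, z) == want for name, want in path_condition):
            return t, z
    return None

def explore(seed=(0, 0)):
    worklist, seen_paths, inputs = [seed], set(), []
    while worklist:
        t, z = worklist.pop()
        trace = []
        program(t, z, trace)          # concrete run that also collects the path
        path = tuple(trace)
        if path in seen_paths:
            continue
        seen_paths.add(path)
        inputs.append((t, z))
        # Negate one branch at a time, keeping the prefix that precedes it.
        for i in range(len(path)):
            flipped = list(path[:i]) + [(path[i][0], not path[i][1])]
            new_input = solve(flipped)
            if new_input is not None:
                worklist.append(new_input)
    return seen_paths, inputs

seen_paths, inputs = explore()
print(len(seen_paths), "paths covered by inputs", inputs)
```

The loop stops once every flipped path condition leads to a path that has already been seen.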
Slide 8
Motivation
Path Condition:
C1: Query construction constraints
Slide 10
Motivation
Path Condition:
C1: Query construction constraints
C2: Query/DB constraints
C3: Result manipulation constraints
Slide 11
Motivation
Path Condition:
C1: Query construction constraints
C2: Query/DB constraints
C3: Result manipulation constraints
C1 ^ C2 ^ C3
Slide 12
Motivation
Path Condition:
C1: Query construction constraints
C2: Query/DB constraints
C3: Result manipulation constraints
C1 ^ C2 ^ C3
A hard part
Slide 13
Motivation
How to derive high-covering program input values based on a given database state?
Slide 14
Outline
Background
Approach
Evaluation
Conclusion and future work
Slide 15
SQL query forms
Fundamental structure: SELECT, FROM, WHERE, GROUP BY, and HAVING clauses.
  SELECT select-list
  FROM from-list
  WHERE qualification
  (GROUP BY grouping-list)
  (HAVING group-qualification)
Slide 16
SQL query forms (contd)
Nested query: a query with another query embedded within it
A nested query can be unnested into an equivalent single-level canonical query by applying transformation rules
A nested query:
  SELECT S.sname
  FROM Sailors S
  WHERE EXISTS (SELECT *
                FROM Reserves R
                WHERE R.bid=103 AND R.sid=S.sid)
Its canonical form:
  SELECT S.sname
  FROM Sailors S, Reserves R
  WHERE R.sid=S.sid AND R.bid=103
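To make the unnesting concrete, the sketch below runs a nested query and its canonical form against an in-memory SQLite database. The table contents are made-up sample rows, chosen so that each sailor reserves boat 103 at most once.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT);
    CREATE TABLE Reserves (sid INTEGER, bid INTEGER);
    -- Made-up sample rows, for illustration only.
    INSERT INTO Sailors VALUES (1, 'Dustin'), (2, 'Lubber'), (3, 'Horatio');
    INSERT INTO Reserves VALUES (1, 103), (2, 102), (3, 103);
""")

NESTED = """
    SELECT S.sname FROM Sailors S
    WHERE EXISTS (SELECT * FROM Reserves R
                  WHERE R.bid = 103 AND R.sid = S.sid)
"""
CANONICAL = """
    SELECT S.sname FROM Sailors S, Reserves R
    WHERE R.sid = S.sid AND R.bid = 103
"""

nested_names = sorted(row[0] for row in db.execute(NESTED))
canonical_names = sorted(row[0] for row in db.execute(CANONICAL))
print(nested_names, canonical_names)
```

On this data both queries return the same names. If a sailor had several reservations for boat 103, the join form would return that name once per reservation, so comparing the two forms on such data would need DISTINCT.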
Slide 17
SQL query forms of focus
WHERE clause consisting of a disjunction of conjunctions
  SELECT C1, C2, ..., Ch
  FROM from-list
  WHERE (A11 AND ... AND A1n) OR ... OR (Am1 AND ... AND Amn)
Slide 18
Outline
Background
Approach
Evaluation
Conclusion and future work
Slide 19
Illustrative example
Slide 20
Apply DSE on the existing database
Step 1: DSE chooses type=0, zip=0
Executed query Q1:
  SELECT C.SSN, C.income, M.balance
  FROM customer C, mortgage M
  WHERE M.year=15 AND C.zipcode=1 AND C.SSN=M.SSN
Execution of Q1 returns zero records, so the loop body is not covered
Slide 21
Apply DSE on the existing database (contd)
Step 2: DSE flips type == 0 to type != 0, giving type=1, zip=0
Executed query Q2:
  SELECT C.SSN, C.income, M.balance
  FROM customer C, mortgage M
  WHERE M.year=30 AND C.zipcode=1 AND C.SSN=M.SSN
Execution of Q2 returns zero records, so the loop body is still not covered
Slide 22
Apply DSE on the existing database (contd)
However, an input like type=0, zip=27694 executes query Q3:
  SELECT C.SSN, C.income, M.balance
  FROM customer C, mortgage M
  WHERE M.year=15 AND C.zipcode=27695 AND C.SSN=M.SSN
Execution of Q3 returns one record {C.SSN = 001, C.income = 50000, M.balance = 20000}, covering Line14=true and Line18=false
Slide 23
Apply DSE on the existing database (contd)
Furthermore, an input like type=0, zip=28222 executes query Q4:
  SELECT C.SSN, C.income, M.balance
  FROM customer C, mortgage M
  WHERE M.year=15 AND C.zipcode=28223 AND C.SSN=M.SSN
Execution of Q4 returns one record {C.SSN = 002, C.income = 150000, M.balance = 30000}, covering Line14=true and Line18=true
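The four runs above can be reproduced with a small sketch. The method name `cal_stat`, the table layout, and the two stored records are assumptions reconstructed from the queries and results quoted on these slides. The Line04, Line14, and Line18 comments mark where the corresponding branches are assumed to sit.

```python
import sqlite3

def make_db():
    """In-memory database holding the two records quoted on the slides."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE customer (SSN TEXT PRIMARY KEY, zipcode INTEGER, income INTEGER);
        CREATE TABLE mortgage (SSN TEXT, year INTEGER, balance INTEGER);
        INSERT INTO customer VALUES ('001', 27695, 50000), ('002', 28223, 150000);
        INSERT INTO mortgage VALUES ('001', 15, 20000), ('002', 15, 30000);
    """)
    return db

def cal_stat(db, type_, zip_):
    """Hypothetical method under test; returns the set of branches it covered."""
    covered = set()
    if type_ == 0:                                   # Line04
        covered.add("Line04=true")
        years = 15
    else:
        covered.add("Line04=false")
        years = 30
    fzip = zip_ + 1
    rows = db.execute(
        "SELECT C.SSN, C.income, M.balance FROM customer C, mortgage M "
        "WHERE M.year = ? AND C.zipcode = ? AND C.SSN = M.SSN",
        (years, fzip),
    )
    for _ssn, income, balance in rows:               # Line14 (loop over records)
        covered.add("Line14=true")
        diff = income - 1.5 * balance
        if diff > 100000:                            # Line18
            covered.add("Line18=true")
        else:
            covered.add("Line18=false")
    return covered

db = make_db()
for inputs in [(0, 0), (1, 0), (0, 27694), (0, 28222)]:
    print(inputs, sorted(cal_stat(db, *inputs)))
```

Only the last two inputs reach the loop body, because their zip values match records that actually exist in the database.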
Slide 24
Assist DSE to generate program inputs
How to derive high-covering program input values based on a given database state?
Slide 25
Our idea: construct auxiliary queries
Auxiliary query:
  SELECT C.zipcode
  FROM customer C, mortgage M
  WHERE M.year=15 AND C.SSN=M.SSN
E.g., the result set includes fzip=27695. From fzip=zip+1, we derive zip=27694!
Slide 26
Our idea: construct auxiliary queries (contd)
Auxiliary query:
  SELECT C.zipcode
  FROM customer C, mortgage M
  WHERE M.year=15 AND C.SSN=M.SSN
E.g., the result set includes fzip=27695. From fzip=zip+1, we derive zip=27694!
This input covers Line14=true and Line18=false!
Slide 27
Our idea: construct auxiliary queries (contd)
Auxiliary query:
  SELECT C.zipcode
  FROM customer C, mortgage M
  WHERE M.year=15 AND C.SSN=M.SSN
E.g., the result set includes fzip=27695. From fzip=zip+1, we derive zip=27694!
This input covers Line14=true and Line18=false!
The auxiliary query acts like a constraint solver for program constraints + DB state constraints
Slide 28
Approach
Collect, from the program code, query construction constraints on the program variables used in the executed queries
Slide 29
Approach (contd)
Collect, from the program code, query construction constraints on the program variables used in the executed queries
Collect result manipulation constraints that compare against record values in the query's result set (such as if (diff>100000))
Slide 30
Construct auxiliary queries
For path Line04=true, Line14=true, construct the abstract query:
  SELECT C.SSN, C.income, M.balance
  FROM customer C, mortgage M
  WHERE M.year=15 AND C.zipcode=fzip AND C.SSN=M.SSN
Slide 31
Construct auxiliary queries
For path Line04=true, Line14=true, construct the abstract query:
  SELECT C.SSN, C.income, M.balance
  FROM customer C, mortgage M
  WHERE M.year=15 AND C.zipcode=fzip AND C.SSN=M.SSN
Our target
Slide 32
Construct auxiliary queries
Abstract query:
  SELECT C.SSN, C.income, M.balance
  FROM customer C, mortgage M
  WHERE M.year=15 AND C.zipcode=fzip AND C.SSN=M.SSN
Construct auxiliary query:
  SELECT C.zipcode
Slide 33
Construct auxiliary queries
Abstract query:
  SELECT C.SSN, C.income, M.balance
  FROM customer C, mortgage M
  WHERE M.year=15 AND C.zipcode=fzip AND C.SSN=M.SSN
Construct auxiliary query:
  SELECT C.zipcode
  FROM customer C, mortgage M
Slide 34
Construct auxiliary queries
Abstract query:
  SELECT C.SSN, C.income, M.balance
  FROM customer C, mortgage M
  WHERE M.year=15 AND C.zipcode=fzip AND C.SSN=M.SSN
Construct auxiliary query:
  SELECT C.zipcode
  FROM customer C, mortgage M
  WHERE M.year=15 AND C.SSN=M.SSN
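The construction steps above (select the column compared with the unknown variable, keep the FROM list, keep the remaining predicates) can be sketched as a small string-building function. `Conjunct` and `build_auxiliary_query` are hypothetical names introduced for illustration. The sketch assumes the WHERE clause is a single conjunction whose predicates have already been classified by hand.

```python
from typing import List, NamedTuple, Optional

class Conjunct(NamedTuple):
    text: str                       # predicate as written in the abstract query
    column: Optional[str] = None    # column compared against a program variable
    variable: Optional[str] = None  # that program variable, if any

def build_auxiliary_query(from_list: str, conjuncts: List[Conjunct]) -> str:
    # SELECT the columns whose stored values the unknown variables must match.
    select_list = [c.column for c in conjuncts if c.variable is not None]
    # Keep only the predicates that do not mention an unknown program variable.
    kept = [c.text for c in conjuncts if c.variable is None]
    query = "SELECT " + ", ".join(select_list) + " FROM " + from_list
    if kept:
        query += " WHERE " + " AND ".join(kept)
    return query

abstract_where = [
    Conjunct("M.year=15"),
    Conjunct("C.zipcode=fzip", column="C.zipcode", variable="fzip"),
    Conjunct("C.SSN=M.SSN"),
]
auxiliary = build_auxiliary_query("customer C, mortgage M", abstract_where)
print(auxiliary)
```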
Slide 35
Generate program input values
Run auxiliary query:
  SELECT C.zipcode
  FROM customer C, mortgage M
  WHERE M.year=15 AND C.SSN=M.SSN
fzip: 27695 or 28223
Slide 36
Generate program input values
Run auxiliary query:
  SELECT C.zipcode
  FROM customer C, mortgage M
  WHERE M.year=15 AND C.SSN=M.SSN
fzip: 27695 or 28223
zip: 27694 or 28222
Slide 37
Generate program input values
Input combinations: (type: 0 or !0) X (zip: 27694 or 28222)
type=0, zip=27694 covers Line04=true, Line14=true, but Line18=false
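A minimal sketch of this step: run the auxiliary query, invert fzip = zip + 1 to recover zip, and enumerate the input combinations. The schema and the two records are assumptions that mirror the values quoted on the slides, and `1` is used as an arbitrary representative of type != 0.

```python
import sqlite3
from itertools import product

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE customer (SSN TEXT PRIMARY KEY, zipcode INTEGER, income INTEGER);
    CREATE TABLE mortgage (SSN TEXT, year INTEGER, balance INTEGER);
    INSERT INTO customer VALUES ('001', 27695, 50000), ('002', 28223, 150000);
    INSERT INTO mortgage VALUES ('001', 15, 20000), ('002', 15, 30000);
""")

AUXILIARY = """
    SELECT C.zipcode
    FROM customer C, mortgage M
    WHERE M.year = 15 AND C.SSN = M.SSN
"""

# Values that the query variable fzip may take so that records are returned.
fzips = sorted(row[0] for row in db.execute(AUXILIARY))
# Invert the query construction constraint fzip = zip + 1.
zips = [fzip - 1 for fzip in fzips]
# 0 and 1 stand for the two sides of the branch on type (type == 0, type != 0).
type_values = [0, 1]
candidate_inputs = list(product(type_values, zips))
print(candidate_inputs)
```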
Slide 38
Approach (contd)
Not enough! Program variables in a branch condition evaluated after the query executes may be data-dependent on returned record values.
How to cover the true branch of Line18?
Slide 39
Approach (contd)
To cover path Line04=true, Line14=true, Line18=true, we need to extend the previous auxiliary query
Slide 40
Construct auxiliary queries
  SELECT C.zipcode
  FROM customer C, mortgage M
  WHERE M.year=15 AND C.SSN=M.SSN (----how to extend?----)
We extend the WHERE clause
Slide 42
Construct auxiliary queries
  SELECT C.zipcode
  FROM customer C, mortgage M
  WHERE M.year=15 AND C.SSN=M.SSN
    AND C.income - 1.5 * M.balance > 100000
We extend the WHERE clause
Slide 43
Generate program input values
Run auxiliary query:
  SELECT C.zipcode
  FROM customer C, mortgage M
  WHERE M.year=15 AND C.SSN=M.SSN
    AND C.income - 1.5 * M.balance > 100000
fzip=28223
Slide 44
Generate program input values
Run auxiliary query:
  SELECT C.zipcode
  FROM customer C, mortgage M
  WHERE M.year=15 AND C.SSN=M.SSN
    AND C.income - 1.5 * M.balance > 100000
fzip=28223
zip=28222
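The extended auxiliary query can be checked against the same assumed two-record database state. In this sketch the result manipulation constraint diff > 100000 is written directly over the columns that feed diff. The schema is an assumption based on the queries shown on the slides.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE customer (SSN TEXT PRIMARY KEY, zipcode INTEGER, income INTEGER);
    CREATE TABLE mortgage (SSN TEXT, year INTEGER, balance INTEGER);
    INSERT INTO customer VALUES ('001', 27695, 50000), ('002', 28223, 150000);
    INSERT INTO mortgage VALUES ('001', 15, 20000), ('002', 15, 30000);
""")

EXTENDED_AUXILIARY = """
    SELECT C.zipcode
    FROM customer C, mortgage M
    WHERE M.year = 15 AND C.SSN = M.SSN
      AND C.income - 1.5 * M.balance > 100000
"""

fzips = [row[0] for row in db.execute(EXTENDED_AUXILIARY)]
zips = [fzip - 1 for fzip in fzips]   # invert fzip = zip + 1
print(fzips, zips)
```

Record 001 is filtered out because 50000 - 1.5 * 20000 = 20000, while record 002 passes because 150000 - 1.5 * 30000 = 105000.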
Slide 45
Other issues (aggregate calculation)
Aggregate calculations involve multiple records
Extend the auxiliary query with GROUP BY and HAVING clauses.
Slide 46
Other issues (aggregate calculation)
  SELECT C.zipcode, sum(M.balance)
  FROM customer C, mortgage M
  WHERE M.year=15 AND C.SSN=M.SSN
    AND C.income - 1.5 * M.balance > 100000
  GROUP BY C.zipcode
  HAVING sum(M.balance) > 500000
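A sketch of how the GROUP BY / HAVING form selects zip codes whose qualifying records jointly satisfy an aggregate constraint. The three records below are made up for illustration, since the original two-record example cannot reach a balance sum of 500000.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE customer (SSN TEXT PRIMARY KEY, zipcode INTEGER, income INTEGER);
    CREATE TABLE mortgage (SSN TEXT, year INTEGER, balance INTEGER);
    -- Made-up records: one customer in 27695, two in 28223.
    INSERT INTO customer VALUES
        ('001', 27695, 1000000), ('002', 28223, 1000000), ('003', 28223, 1000000);
    INSERT INTO mortgage VALUES
        ('001', 15, 300000), ('002', 15, 300000), ('003', 15, 300000);
""")

AGGREGATE_AUXILIARY = """
    SELECT C.zipcode, sum(M.balance)
    FROM customer C, mortgage M
    WHERE M.year = 15 AND C.SSN = M.SSN
      AND C.income - 1.5 * M.balance > 100000
    GROUP BY C.zipcode
    HAVING sum(M.balance) > 500000
"""

groups = db.execute(AGGREGATE_AUXILIARY).fetchall()
print(groups)
```

Every record passes the WHERE filter, but only zip code 28223 accumulates a balance sum above 500000.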
Slide 47
Other issues (cardinality constraints)
  SELECT C.zipcode
  FROM customer C, mortgage M
  WHERE M.year=15 AND C.SSN=M.SSN
    AND C.income - 1.5 * M.balance > 100000
  GROUP BY C.zipcode
  HAVING COUNT(*) >= 3
Use a special DSE technique for dealing with input-dependent loops:
P. Godefroid and D. Luchaup. Automatic partial loop summarization in dynamic test generation. In ISSTA 2011.
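A sketch of the cardinality form, which finds zip codes for which the query returns at least three qualifying records. The five records are made up for illustration, and the schema is an assumption based on the queries shown on the slides.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE customer (SSN TEXT PRIMARY KEY, zipcode INTEGER, income INTEGER);
    CREATE TABLE mortgage (SSN TEXT, year INTEGER, balance INTEGER);
""")
# Made-up records: two qualifying customers in 27695, three in 28223.
customers = [("001", 27695), ("002", 27695), ("003", 28223), ("004", 28223), ("005", 28223)]
db.executemany("INSERT INTO customer VALUES (?, ?, 150000)", customers)
db.executemany("INSERT INTO mortgage VALUES (?, 15, 30000)", [(ssn,) for ssn, _ in customers])

CARDINALITY_AUXILIARY = """
    SELECT C.zipcode
    FROM customer C, mortgage M
    WHERE M.year = 15 AND C.SSN = M.SSN
      AND C.income - 1.5 * M.balance > 100000
    GROUP BY C.zipcode
    HAVING COUNT(*) >= 3
"""

fzips = [row[0] for row in db.execute(CARDINALITY_AUXILIARY)]
print(fzips)
```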
Slide 48
Outline
Background
Approach
Evaluation
Conclusion and future work
Slide 49
Research questions
RQ1 (Effectiveness): What is the percentage increase in code coverage achieved by the program inputs generated by Pex with our approach's assistance?
RQ2 (Cost): What is the cost of our approach's assistance?
Slide 50
Evaluation subjects
Two open source database applications
RiskIt: 4.3K LOC; database: 13 tables, 57 attributes, and >1.2 million records; 17 DB-interacting methods selected for testing
UnixUsage: 2.8K LOC; database: 8 tables, 31 attributes, and >0.25 million records; 28 DB-interacting methods selected for testing
Slide 51
Evaluation setup
Measurement for test generation
  effectiveness: code coverage
  cost: number of runs/paths, execution time
Procedure
  run Pex without our approach's assistance
  apply our algorithms to generate additional test inputs
Slide 56
Summary of evaluation results
RQ1: Effectiveness
  RiskIt: 26% higher block coverage over Pex only
  UnixUsage: 35% higher block coverage over Pex only
RQ2: Cost
  RiskIt: #runs/paths: 131 more over 1135 (Pex); execution time: 517 secs more over 1781 (Pex)
  UnixUsage: #runs/paths: 93 more over 1197 (Pex); execution time: 580 secs more over 1718 (Pex)
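The cost figures can be turned into relative overheads with a few lines of arithmetic. This assumes that "N more over M (Pex)" means N additional runs or seconds on top of the M used by Pex alone; `relative_overhead` is a hypothetical helper.

```python
# (extra, Pex-only baseline) pairs as quoted in the summary.
results = {
    "RiskIt": {"runs": (131, 1135), "secs": (517, 1781)},
    "UnixUsage": {"runs": (93, 1197), "secs": (580, 1718)},
}

def relative_overhead(extra, baseline):
    """Additional cost as a percentage of the Pex-only baseline."""
    return 100.0 * extra / baseline

overheads = {
    app: {metric: round(relative_overhead(*pair), 1) for metric, pair in metrics.items()}
    for app, metrics in results.items()
}
for app, metrics in overheads.items():
    print(app, metrics)
```

Under that reading, the extra runs amount to roughly 12% (RiskIt) and 8% (UnixUsage) of the Pex baseline, and the extra execution time to roughly 29% and 34%.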
Slide 58
Conclusion
A new approach that formulates auxiliary queries to bridge the gap between program constraints and DB constraints
The auxiliary queries act like a constraint solver for program constraints + DB constraints
Empirical evaluations on 2 open source DB applications show that our approach can assist DSE in generating program inputs effectively, achieving higher code coverage at low additional cost.
Slide 59
Future Work
Construct auxiliary queries directly from embedded complex queries (e.g., nested queries), rather than from their transformed normal forms.
Handle complex program contexts such as multiple queries.
Slide 60
Acknowledgment: This work was supported in part by the U.S. National Science Foundation under CCF-0915059 for Kai Pan and Xintao Wu, and under CCF-0915400 for Tao Xie.
Thank you! Questions?
Slide 61
Related Work
All previous related work addresses a different problem: constructing both program inputs and database states (from scratch)
M. Emmi, R. Majumdar, and K. Sen. Dynamic test input generation for database applications. In ISSTA, 2007.
K. Taneja, Y. Zhang, and T. Xie. MODA: Automated test generation for database applications via mock objects. In ASE, 2010.