38
Improving the Quality of Large-Scale Database-Centric Software Systems by Analyzing Database Access Code 1 Tse-Hsun(Peter) Chen Supervised by Ahmed E. Hassan

ICDE2015PhD - Improving the Quality of Large-Scale Database-Centric Software Systems by Analyzing Database Access Code

Embed Size (px)

Citation preview

Improving the Quality of Large-Scale Database-Centric Software Systems by

Analyzing Database Access Code

1

Tse-Hsun(Peter) Chen

Supervised by Ahmed E. Hassan

Common performance problems in database-access code

2

DBMSCode

• Inefficient data access

Inefficient data access

3

for each userId{…executeQuery(“select … where

u.id = userId”);}

Code

SQL select … from user where u.id = 1 select … from user where u.id = 2…

Common performance problems in database-access code

4

DBMSCode

• Inefficient data access• Unneeded data access

Unneeded data access

5

DBMSCode

Requireuser data in

the code

Actual request sent

Team TableUser Tablejoin

Common performance problems in database-access code

6

DBMSCode

• Inefficient data access• Unneeded data access• Overly-strict isolation level

Overly-strict isolation level

7

DBMSCode

Reading onlyuser name

set read-write transactionselect … from user

Cannot find problems by only looking at the DBMS side

8

DBMS

select … from user where u.id = 1 select … from user where u.id = 2…for each userId{

…executeQuery(“

select … where u.id = userId”);}

Database accesses are abstracted

9

DBMSCode

AbstractionLayers

Problems become more complex and frequent after adding abstraction layers

Accessing data using ORM incorrectly

10

@EntityClass User{

…}

for each userId{User u = findUserById(userId);u.getName();

}

A database entity class

Objects

SQL

select … from user where u.id = 1 select … from user where u.id = 2…

Research statement

There are common problems in how developers write database-access code, and these problems are hard to diagnose by only looking at the DBMS. In addition, due to the use of different database abstraction layers, these problems become even harder to find.

By finding problems in the database access code and configuring the abstraction layers correctly, we can significantly improve the performance of database-centric systems.

11

12

Detecting inefficientdata access code

Overview of the thesis

Detecting unneededdata access

Finding overly-strict isolation level

Finished work Future workUnder

submission

Detecting unneededdata access

13

Detecting inefficientdata access code

Overview of the thesis

Finding overly-strict isolation level

Finished work Future workUnder

submission

Inefficient data access detection framework

Inefficient data access detection and ranking framework

Ranked according to performance impact

Ranked inefficient

data access code

Source Code

detection

ranking

14

Inefficient data access detection framework

Inefficient data access detection and ranking framework

Ranked according to performance impact

Ranked inefficient

data access code

Source Code

detection

ranking

15

Focus on one type of inefficient data access: one-by-one processing

Detecting one-by-one processingusing static analysis

16

First find all the functions that read/write from/to DBMS

Class User{getUserById()…getUserAddress()…

}

Identify the positions of all loopsfor each userId{

foo(userId)}

Check if the loop contains any database-accessing function

foo (userId){getUserById(userId)

}

Inefficient data access detection framework

Inefficient data access detection and ranking framework

Ranked according to performance impact

Ranked inefficient

data access code

Source Code

detection

ranking

17

Inefficient data access detection framework

Inefficient data access detection and ranking framework

Ranked according to performance impact

Ranked inefficient

data access code

Source Code

detection

ranking

18

Assessing inefficient data access impact by fixing the problem

Execution

Response timefor u in users{

update u}

Code with inefficient data access

users.updateAll()

Code withoutinefficient data access

19

Execute test suite 30 times

Response time after the fix

Avg. % improvement

Execution

Execute test suite 30 times

Inefficient data access causes large performance impacts

0%

20%

40%

60%

80%

100%

One-by-one processing20

% im

pro

vem

en

t in

res

po

nse

tim

e

Detecting unneededdata access

21

Detecting inefficientdata access code

Overview of the thesis

Finding overly-strict isolation level

Finished work Future workUnder

submission

22

Detecting inefficientdata access code

Overview of the thesis

Finding overly-strict isolation level

Finished work Future workUnder

submission

Detecting unneededdata access

Intercepting system executions using byte code instrumentation

ORMDBMS

23

Intercept called functions in the code

Intercept ORM generated SQLs

Needed data

Requested data

diff

Mapping data access to called functions

24

@Entity@Table(name = “user”)public class User{

@Column(name=“name”)String userName;

@OneToManyList<Team> teams;

public String getUserName(){return userName;

}

public void addTeam(Team t){teams.add(t);

}

User.java

We apply static analysis to find which database column a function is reading/modifying

24

Identifying unneeded data access from intercepted data

Only user name is needed in the application logic

Needed user name in code

Requested id, name, address, phone number

25

26

Detecting inefficientdata access code

Overview of the thesis

Finding overly-strict isolation level

Finished work Future workUnder

submission

Detecting unneededdata access

Detecting unneededdata access

27

Detecting inefficientdata access code

Overview of the thesis

Finding overly-strict isolation level

Finished work Future workUnder

submission

A real life example of transaction abstraction problem

28

All functions in foo will be executed in transactions

@Transactionalpublic class foo{

int getFooVal(){return foo.val;

}…

}

This function does not need to be executed in transactions

Where to put the transaction may affect system behavior and performance

Plans for finding transaction problems related abstraction

29

Bug Reports

Categories of transaction-related bugs

We plan to empirically study bug reports to find root causes and implement a

detection framework

30

31

32

33

34

35

36

37

38

http://petertsehsun.github.io