38
Improving the Quality of Large-Scale Database-Centric Software Systems by Analyzing Database Access Code 1 Tse-Hsun(Peter) Chen Supervised by Ahmed E. Hassan

Improving the quality of large scale database-centric software systems by analyzing database access code

  • Upload
    sailqu

  • View
    272

  • Download
    2

Embed Size (px)

Citation preview

Page 1: Improving the quality of large scale database-centric software systems by analyzing database access code

1

Improving the Quality of Large-Scale Database-Centric Software Systems by

Analyzing Database Access Code

Tse-Hsun(Peter) Chen

Supervised by Ahmed E. Hassan

Page 2: Improving the quality of large scale database-centric software systems by analyzing database access code

2

Common performance problems in database-access code

DBMSCode

• Inefficient data access

Page 3: Improving the quality of large scale database-centric software systems by analyzing database access code

3

Inefficient data access

for each userId{ … executeQuery(“select … where u.id = userId”);}

Code

SQL select … from user where u.id = 1 select … from user where u.id = 2…

Page 4: Improving the quality of large scale database-centric software systems by analyzing database access code

4

Common performance problems in database-access code

DBMSCode

• Inefficient data access• Unneeded data access

Page 5: Improving the quality of large scale database-centric software systems by analyzing database access code

5

Unneeded data access

DBMSCode

Requireuser data in

the code

Actual request sent

Team TableUser Tablejoin

Page 6: Improving the quality of large scale database-centric software systems by analyzing database access code

6

Common performance problems in database-access code

DBMSCode

• Inefficient data access• Unneeded data access• Overly-strict isolation level

Page 7: Improving the quality of large scale database-centric software systems by analyzing database access code

7

Overly-strict isolation level

DBMSCode

Reading onlyuser name

set read-write transactionselect … from user

Page 8: Improving the quality of large scale database-centric software systems by analyzing database access code

8

Cannot find problems by only looking at the DBMS side

DBMS

select … from user where u.id = 1 select … from user where u.id = 2…for each userId{

… executeQuery(“select … where u.id = userId”);}

Page 9: Improving the quality of large scale database-centric software systems by analyzing database access code

9

Database accesses are abstracted

DBMSCode

AbstractionLayers

Problems become more complex and frequent after adding abstraction layers

Page 10: Improving the quality of large scale database-centric software systems by analyzing database access code

10

Accessing data using ORM incorrectly@EntityClass User{

…}

for each userId{ User u = findUserById(userId); u.getName();}

A database entity class

Objects

SQLselect … from user where u.id = 1 select … from user where u.id = 2…

Page 11: Improving the quality of large scale database-centric software systems by analyzing database access code

11

Research statement

There are common problems in how developers write database-access code, and these problems are hard to diagnose by only looking at the DBMS. In addition, due to the use of different database abstraction layers, these problems become even harder to find.

By finding problems in the database access code and configuring the abstraction layers correctly, we can significantly improve the performance of database-centric systems.

Page 12: Improving the quality of large scale database-centric software systems by analyzing database access code

12

Detecting inefficient data access code

Overview of the thesis

Detecting unneeded data access

Finding overly-strict isolation level

Finished work Future workUnder submission

Page 13: Improving the quality of large scale database-centric software systems by analyzing database access code

13

Detecting unneeded data access

Detecting inefficient data access code

Overview of the thesis

Finding overly-strict isolation level

Finished work Future workUnder submission

Page 14: Improving the quality of large scale database-centric software systems by analyzing database access code

Inefficient data access detection framework

Inefficient data access detection and ranking framework

Ranked according to performance impact

Ranked inefficient

data access code

Source Code

detection

ranking

14

Page 15: Improving the quality of large scale database-centric software systems by analyzing database access code

Inefficient data access detection framework

Inefficient data access detection and ranking framework

Ranked according to performance impact

Ranked inefficient

data access code

Source Code

detection

ranking

15

Focus on one type of inefficient data access: one-by-one processing

Page 16: Improving the quality of large scale database-centric software systems by analyzing database access code

Detecting one-by-one processingusing static analysis

16

First find all the functions that read/write from/to DBMS

Class User{getUserById()…getUserAddress()…

}

Identify the positions of all loopsfor each userId{

foo(userId)}

Check if the loop contains any database-accessing function

foo (userId){getUserById(userId)

}

Page 17: Improving the quality of large scale database-centric software systems by analyzing database access code

Inefficient data access detection framework

Inefficient data access detection and ranking framework

Ranked according to performance impact

Ranked inefficient

data access code

Source Code

detection

ranking

17

Page 18: Improving the quality of large scale database-centric software systems by analyzing database access code

Inefficient data access detection framework

Inefficient data access detection and ranking framework

Ranked according to performance impact

Ranked inefficient

data access code

Source Code

detection

ranking

18

Page 19: Improving the quality of large scale database-centric software systems by analyzing database access code

Assessing inefficient data access impact by fixing the problem

ExecutionResponse time

for u in users{ update u}

Code with inefficient data access

users.updateAll()

Code without inefficient data access 19

Execute test suite 30 times

Response time after the fix

Avg. % improvement

Execution

Execute test suite 30 times

Page 20: Improving the quality of large scale database-centric software systems by analyzing database access code

Inefficient data access causes large performance impacts

One-by-one processing0%

20%

40%

60%

80%

100%

20

% im

prov

emen

t in

resp

onse

tim

e

Page 21: Improving the quality of large scale database-centric software systems by analyzing database access code

21

Detecting unneeded data access

Detecting inefficient data access code

Overview of the thesis

Finding overly-strict isolation level

Finished work Future workUnder submission

Page 22: Improving the quality of large scale database-centric software systems by analyzing database access code

22

Detecting inefficient data access code

Overview of the thesis

Finding overly-strict isolation level

Finished work Future workUnder submission

Detecting unneeded data access

Page 23: Improving the quality of large scale database-centric software systems by analyzing database access code

Intercepting system executions using byte code instrumentation

ORMDBMS

23

Intercept called functions in the code

Intercept ORM generated SQLs

Needed data

Requested data

diff

Page 24: Improving the quality of large scale database-centric software systems by analyzing database access code

Mapping data access to called functions

24

@Entity@Table(name = “user”)public class User{

@Column(name=“name”)String userName;

@OneToManyList<Team> teams;public String getUserName(){

return userName;}

public void addTeam(Team t){ teams.add(t); }

User.java

We apply static analysis to find which database column a function is reading/modifying

24

Page 25: Improving the quality of large scale database-centric software systems by analyzing database access code

Identifying unneeded data access from intercepted data

Only user name is needed in the application logic

Needed user name in code

Requested id, name, address, phone number

25

Page 26: Improving the quality of large scale database-centric software systems by analyzing database access code

26

Detecting inefficient data access code

Overview of the thesis

Finding overly-strict isolation level

Finished work Future workUnder submission

Detecting unneeded data access

Page 27: Improving the quality of large scale database-centric software systems by analyzing database access code

27

Detecting unneeded data access

Detecting inefficient data access code

Overview of the thesis

Finding overly-strict isolation level

Finished work Future workUnder submission

Page 28: Improving the quality of large scale database-centric software systems by analyzing database access code

A real life example of transaction abstraction problem

28

All functions in foo will be executed in transactions

@Transactionalpublic class foo{

int getFooVal(){return foo.val;

}…

}

This function does not need to be executed in transactions

Where to put the transaction may affect system behavior and performance

Page 29: Improving the quality of large scale database-centric software systems by analyzing database access code

Plans for finding transaction problems related abstraction

29

Bug Reports

Categories of transaction-related bugs

We plan to empirically study bug reports to find root causes and implement a

detection framework

Page 30: Improving the quality of large scale database-centric software systems by analyzing database access code

30

Page 31: Improving the quality of large scale database-centric software systems by analyzing database access code

31

Page 32: Improving the quality of large scale database-centric software systems by analyzing database access code

32

Page 33: Improving the quality of large scale database-centric software systems by analyzing database access code

33

Page 34: Improving the quality of large scale database-centric software systems by analyzing database access code

34

Page 35: Improving the quality of large scale database-centric software systems by analyzing database access code

35

Page 36: Improving the quality of large scale database-centric software systems by analyzing database access code

36

Page 37: Improving the quality of large scale database-centric software systems by analyzing database access code

37

Page 38: Improving the quality of large scale database-centric software systems by analyzing database access code

38

http://petertsehsun.github.io