BugCache: Predicting Defects. Sung Kim • MIT, Tom Zimmermann • Saarland University, Jim Whitehead • UC Santa Cruz, Andreas Zeller • Saarland University

Predicting Faults from Cached History



29th International Conference on Software Engineering (ICSE 2007), ACM SIGSOFT Distinguished Paper Award winner.


Page 1: Predicting Faults from Cached History

BugCache
Predicting Defects

Sung Kim • MIT
Tom Zimmermann • Saarland University
Jim Whitehead • UC Santa Cruz
Andreas Zeller • Saarland University

Page 2: Predicting Faults from Cached History

The Problem

How should we allocate our resources for quality assurance?

Which files should we focus on?

Page 3: Predicting Faults from Cached History

The Problem

Which files are most bug-prone?

Page 4: Predicting Faults from Cached History

Where are bugs?

Temporal locality: Defective files are likely to have more defects soon. [Ostrand, Weyuker]

In modified files! [Nagappan et al.]

In new files! [Graves et al.]

Spatial locality: Near other bugs! [Zimmermann et al.]

Page 5: Predicting Faults from Cached History

Our Solution

• List of most bug-prone files
• Combine all bug occurrence models

Cache

Page 6: Predicting Faults from Cached History

Bug Cache

[Diagram: out of all files, the cache holds the 10% of files that are most defect-prone. Cache operations: load, pre-fetch, replacement. Nearby: co-changes.]

Page 7: Predicting Faults from Cached History

Outline

• BugCache Model
  • Cache update
  • Replacement Policies
  • Pre-fetch
• Evaluation
  • 7 open source projects
• Related Work
• Summary

Page 8: Predicting Faults from Cached History

Bug Cache

[Diagram: a change history over files A, B, C, labeled fix change, non-fix change, fix change. Cache operations shown: load if missed (twice) and pre-fetch.]

Page 9: Predicting Faults from Cached History

Cache Model

Cache size: 2

[Diagram: file labels A B C; one access is marked Miss; C is shown separately.]

Page 10: Predicting Faults from Cached History

Cache Update

Parameter: Block size (neighborhood size)

• Load missed files
• Load nearby files (spatial locality)

[Table: File vs. number of common changes with a given file, for files A, B, C, D. Surviving residue of the cell values: 140, 4B.]
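The cache update above can be illustrated with a minimal sketch. It assumes that "nearby" means "most often changed together in the same commit" and that the block size counts the missed file itself; the function names `co_change_counts` and `block_for` and the sample commits are hypothetical, not taken from the slides.

```python
from collections import Counter
from itertools import combinations


def co_change_counts(commits):
    """Count how often each pair of files was changed in the same commit."""
    counts = {}
    for files in commits:
        for a, b in combinations(sorted(set(files)), 2):
            counts.setdefault(a, Counter())[b] += 1
            counts.setdefault(b, Counter())[a] += 1
    return counts


def block_for(missed, counts, block_size):
    """Return the missed file plus its closest co-changed neighbours.

    Assumption: block_size is the total number of files loaded,
    including the missed file itself.
    """
    neighbours = counts.get(missed, Counter())
    # Highest co-change count first; ties broken by name for determinism.
    ranked = sorted(neighbours, key=lambda f: (-neighbours[f], f))
    return [missed] + ranked[: max(block_size - 1, 0)]


# Hypothetical change history: each inner list is one commit.
commits = [["A", "B"], ["A", "B", "C"], ["A", "D"], ["B", "C"]]
counts = co_change_counts(commits)
block = block_for("A", counts, block_size=2)
```

Here a miss on A with block size 2 would load A together with B, because B shares the most commits with A in the sample history.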

Page 11: Predicting Faults from Cached History

Cache Model

Cache size: 2
Block size: 2

[Diagram: access labels Hit, Miss, Miss, Hit; file labels A B C A D C B B A; cache contents C A; B is shown separately.]

Which one should be replaced?

Page 12: Predicting Faults from Cached History

Replacement Policies

• Least recently used (LRU): Unload the files that have the least recently found defect.
• Least frequently changed (CHANGE): Unload the files that have the fewest changes.
• Least frequent defects (BUG): Unload the files that have the fewest defects.

Parameter: Replacement Policy
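The three policies differ only in which per-file statistic they minimize, which the following sketch tries to make concrete. The `FileStats` record, the `pick_victim` function, and the sample numbers are hypothetical; the slides do not say how ties are broken, so breaking them by file name is an assumption.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    name: str
    last_defect: int  # index of the latest change in which a defect was found
    changes: int      # number of changes seen so far
    defects: int      # number of defects found so far


def pick_victim(cached, policy):
    """Return the name of the cached file that the given policy would unload."""
    keys = {
        "LRU": lambda f: f.last_defect,  # least recently found defect
        "CHANGE": lambda f: f.changes,   # fewest changes
        "BUG": lambda f: f.defects,      # fewest defects
    }
    if policy not in keys:
        raise ValueError(f"unknown replacement policy: {policy}")
    # Assumption: ties are broken by file name.
    return min(cached, key=lambda f: (keys[policy](f), f.name)).name


# Hypothetical cache contents.
cached = [
    FileStats("X", last_defect=3, changes=9, defects=2),
    FileStats("Y", last_defect=7, changes=2, defects=1),
    FileStats("Z", last_defect=5, changes=4, defects=0),
]
victims = {p: pick_victim(cached, p) for p in ("LRU", "CHANGE", "BUG")}
```

With these sample numbers each policy unloads a different file, which shows why the replacement policy is treated as a tunable parameter.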

Page 13: Predicting Faults from Cached History

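Cache Model

Cache size: 2
Block size: 2
Replacement: BUG

[Diagram: access labels Hit, Miss, Miss, Hit; file labels A B C A D C B B A; cache contents C A.]

[Inset table, captioned Block size: 1, Cache size: 2. Columns: File, LRU, CHANGE, BUG. Rows of values: (-5, 2, 2) and (-3, 3, 1); file labels B and C. The BUG column (2, 1) is repeated, followed by the labels (replace) and B.]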

Page 14: Predicting Faults from Cached History

Pre-fill and pre-fetch

• Pre-fill
  • Fill cache with largest files (LOC)
• Pre-fetch
  • Load changed files
  • Load added files
  • Unload deleted files

Parameter: Pre-fetch size
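A rough sketch of the two steps above follows. It makes several assumptions that the slides do not confirm: `pre_fill` and `pre_fetch` are hypothetical names, the pre-fetch size is read as a cap on how many added or changed files are loaded per change, added files are tried before changed ones, and eviction when the cache is full is left out for brevity.

```python
def pre_fill(loc_by_file, cache_size):
    """Start the cache with the largest files, measured in lines of code."""
    # Largest LOC first; ties broken by name for determinism.
    ranked = sorted(loc_by_file, key=lambda f: (-loc_by_file[f], f))
    return set(ranked[:cache_size])


def pre_fetch(cache, changed, added, deleted, pre_fetch_size):
    """Update the cache for one change: unload deleted files, then load
    up to pre_fetch_size added or changed files that are not yet cached.

    Assumption: added files are tried before changed files, and
    capacity-based eviction is not modelled here.
    """
    cache = set(cache) - set(deleted)
    # dict.fromkeys removes duplicates while keeping the order.
    candidates = [f for f in dict.fromkeys(list(added) + list(changed))
                  if f not in cache]
    return cache | set(candidates[:pre_fetch_size])


# Hypothetical project: file sizes in lines of code.
loc = {"a.c": 900, "b.c": 120, "c.c": 450}
cache = pre_fill(loc, cache_size=2)
cache_after = pre_fetch(cache, changed=["b.c"], added=["d.c"],
                        deleted=["c.c"], pre_fetch_size=1)
```

In this example the cache starts with the two largest files, then drops the deleted file and picks up the newly added one.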

Page 15: Predicting Faults from Cached History

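Cache Model

Cache size: 2
Block size: 2
Replacement: BUG
Pre-fetch size: 1

[Diagram: steps labeled Pre-fill, Pre-fetch, Pre-fetch; access labels Hit, Miss, Miss, Miss; file labels A B C A D C B B A; cache contents C A, with B and D shown separately.]

Hit rate = #Hits / #Defects = 25%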
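To make the hit-rate formula concrete, here is a deliberately simplified simulation. It is not the slide's example: it assumes a plain LRU cache with block size 1 and no pre-fill or pre-fetch, and the function name `hit_rate` and the sample sequence are hypothetical.

```python
from collections import OrderedDict


def hit_rate(defect_files, cache_size):
    """Replay a sequence of defect locations against an LRU cache.

    Each element of defect_files is the file in which a defect was found.
    A defect counts as a hit if its file is already cached.
    Returns #Hits / #Defects.
    """
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    for f in defect_files:
        if f in cache:
            hits += 1
            cache.move_to_end(f)  # mark as most recently used
        else:
            if len(cache) >= cache_size:
                cache.popitem(last=False)  # unload the least recently used file
            cache[f] = None  # load the missed file
    return hits / len(defect_files) if defect_files else 0.0


# Hypothetical sequence: miss, miss, hit, miss.
rate = hit_rate(["A", "B", "A", "C"], cache_size=2)
```

One hit out of four defects gives a hit rate of 0.25 for this made-up sequence.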

Page 16: Predicting Faults from Cached History

Evaluation

[Project logos: PostgreSQL, jEdit, Mozilla, Columba]

Page 17: Predicting Faults from Cached History

Hit Rates

Cache size = 10%
Block/pre-fetch size = 50% of the cache size
Replacement policy = LRU

Hit rate (%)
Project       File   Function
Subversion     67      43
PostgreSQL     76      59
Mozilla        85      55
JEdit          83      46
Eclipse        93      69
Columba        79      67
Apache 1.3     71      60

Page 18: Predicting Faults from Cached History

Exhaustive Evaluation

• Cache size: fixed to 10%
• Vary block size: 0% to 100% of cache size
• Vary pre-fetch size: 0% to 100% of cache size
• Vary replacement: LRU, CHANGE, BUG
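The sweep described above amounts to a grid search over three parameters. The sketch below shows one way to enumerate it; the 10-point step is an assumption (the slides do not state the granularity), and `parameter_grid`, `best_setting`, and the toy scoring function are hypothetical stand-ins for a real hit-rate evaluation.

```python
from itertools import product


def parameter_grid(step=10):
    """Enumerate settings with the cache size fixed to 10%.

    Block and pre-fetch sizes are percentages of the cache size.
    Assumption: both are varied in increments of `step` percent.
    """
    percents = range(0, 101, step)
    policies = ("LRU", "CHANGE", "BUG")
    return [
        {"cache": 10, "block": b, "pre_fetch": p, "replacement": r}
        for b, p, r in product(percents, percents, policies)
    ]


def best_setting(grid, evaluate):
    """Return the setting with the highest score under `evaluate`."""
    return max(grid, key=evaluate)


def toy_score(setting):
    """Stand-in for a real hit-rate measurement on a project history."""
    bonus = 1 if setting["replacement"] == "LRU" else 0
    return -abs(setting["block"] - 20) - setting["pre_fetch"] + bonus


grid = parameter_grid()
best = best_setting(grid, toy_score)
```

In practice `toy_score` would be replaced by a function that replays a project's change history under the given setting and returns its hit rate.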

Page 19: Predicting Faults from Cached History

Function Level Default vs Optimal Options

Cache size = 10% of all functions/methods

Hit rate (%)
Project       Default   Optimal
Subversion      43        46
PostgreSQL      59        59
Mozilla         55        55
JEdit           46        49
Eclipse         69        72
Columba         67        68
Apache 1.3      60        62

Page 20: Predicting Faults from Cached History

Function Level Optimal Hit Rates

Cache size = 10% of all functions/methods

Project       Functions   Hit rate   Block   Pre-fetch   Replace
Apache 1.3      2,113       62%       15%       17%       BUG
Columba         8,428       68%       57%       20%       BUG
Eclipse        33,214       72%       20%        4%       BUG
JEdit           5,489       49%       85%        8%       BUG
Mozilla         8,203       55%       41%       14%       LRU
PostgreSQL      8,659       59%       29%       17%       LRU
Subversion      3,693       46%       71%       14%       BUG

Page 21: Predicting Faults from Cached History

File Level Default vs Optimal Options

Cache size = 10% of all files

Hit rate (%)
Project       Default   Optimal
Subversion      67        73
PostgreSQL      76        79
Mozilla         85        88
JEdit           83        85
Eclipse         93        95
Columba         79        83
Apache 1.3      71        82

Page 22: Predicting Faults from Cached History

File Level Optimal Hit Rates

Cache size = 10% of all files

Project       Files   Hit rate   Block   Pre-fetch   Replace
Apache 1.3      154     82%       50%       0%        LRU
Columba       1,428     83%       59%       0%        BUG
Eclipse       3,330     95%       20%       0%        LRU
JEdit           420     85%       23%       0%        LRU
Mozilla         396     88%       23%       0%        LRU
PostgreSQL      598     79%       22%       0%        LRU
Subversion      255     73%       42%       0%        LRU

Page 23: Predicting Faults from Cached History

Related Work

[Bar chart, axis 0 to 100: BugCache (top 10%), Hassan et al. (top 10%), Ostrand et al. (top 20%), Khoshgoftaar et al. (top 20%), Khoshgoftaar et al. (top 10%)]

In previous work, 10% predicts 44%~78% and 20% predicts 71%~93%.

10% BugCache predicts 73%~95%.

Page 24: Predicting Faults from Cached History

Summary

Page 25: Predicting Faults from Cached History

BugCache
Predicting Defects

Sung Kim • MIT
Tom Zimmermann • Saarland University
Jim Whitehead • UC Santa Cruz
Andreas Zeller • Saarland University