Sahara: Guiding the Debugging of Failed Software Upgrades

Rekha Bachwani, Olivier Crameri, Ricardo Bianchini, Dejan Kostic, Willy Zwaenepoel


Presentation slides for my ICSM talk.


Page 1: Sahara icsm 2011

Sahara: Guiding the Debugging of Failed Software Upgrades

Rekha Bachwani, Olivier Crameri, Ricardo Bianchini, Dejan Kostic, Willy Zwaenepoel

Page 2: Sahara icsm 2011

Motivation

Modern software is complex and requires regular updates (once every few weeks)

Fix bugs

Patch security vulnerabilities

Software upgrade failures are frequent [SOSP'07]

5-10% of all upgrades fail

Upgrade failures can be catastrophic

Service disruption ($$)

User dissatisfaction

ICSM, 2011 Rekha Bachwani, Rutgers University 2

Page 3: Sahara icsm 2011

Motivation

Differences between vendor and user environments are a major source of failures [SOSP'07]

Broken dependencies

Incompatibilities with legacy systems

Testing in all user environments is impractical

The set of all possible environment settings is large

The set of possible user inputs is huge

Debugging software upgrade failures is hard

Incomplete user environment data

Unable to reproduce user conditions or failures

Page 4: Sahara icsm 2011

Approach

Integrate users in the testing environment

Test the upgrade in (many) user environments with their input

Collect data from (willing) users

Environment settings

Success or failure flags

Leverage data from the users to isolate the cause

Page 5: Sahara icsm 2011

Contributions

Sahara: Upgrade Debugging System

Simplifies debugging of environment-related failures

Prioritizes the set of routines to consider when debugging

Uses machine learning, and static and dynamic analyses

Evaluate Sahara with 3 applications (5 failures)

Three real upgrade failures in OpenSSH

One synthetic failure each in SQLite and uServer

Page 6: Sahara icsm 2011

Outline

Overview

Sahara: Debugging Failed Upgrades

Evaluation

Conclusion

Page 7: Sahara icsm 2011

Sahara - Key Idea

Upgrade failures are caused by user environments

Identify the suspect environment resources (SERs)

Identify the code affected by the SERs

Software behaved correctly before the upgrade

Identify the code deviations in the upgrade

The root cause is most likely in the code that is both affected by the suspect aspects of the environment and has deviated after the upgrade
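The two conditions above suggest a simple set intersection. A minimal sketch of that idea (the routine names are invented for illustration, not taken from the talk):

```python
# Sketch: prime suspects = routines affected by suspect environment
# resources (SERs) AND routines that deviated after the upgrade.
# All routine names below are hypothetical.

def prime_suspects(affected_by_sers, deviated_after_upgrade):
    """Intersect the two candidate sets to narrow the search."""
    return affected_by_sers & deviated_after_upgrade

affected = {"channel_setup", "packet_send", "auth_check"}
deviated = {"channel_setup", "session_close"}

print(prime_suspects(affected, deviated))  # {'channel_setup'}
```

Either set alone is large; their intersection is what makes the recommendation small.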

Page 8: Sahara icsm 2011

Sahara - Identifying Suspects

[Diagram: the vendor distributes the upgrade to many user sites; each site applies the upgrade and tests it.]

Page 9: Sahara icsm 2011

Sahara - Identifying Suspects

[Diagram: user sites report pass/fail labels and environment data back to the vendor; feature selection over these profiles yields the suspect environment resources (SERs), and static analysis maps the SERs to suspect routines.]
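The talk does not spell out the feature-selection algorithm; as one illustrative possibility (feature names, values, and scoring all invented), environment features can be ranked by how exclusively their values appear in failing profiles:

```python
# Sketch: rank environment features by how strongly their values
# separate failing from passing profiles. This is a simple
# correlation-style score, not necessarily the algorithm Sahara uses.

def rank_features(profiles, labels):
    """profiles: list of dicts (feature -> value); labels: 'pass'/'fail'."""
    scores = {}
    for f in set().union(*profiles):
        fail_vals = {p.get(f) for p, l in zip(profiles, labels) if l == "fail"}
        pass_vals = {p.get(f) for p, l in zip(profiles, labels) if l == "pass"}
        # Values seen only in failing profiles are suspicious.
        only_fail = fail_vals - pass_vals
        scores[f] = len(only_fail) / max(len(fail_vals), 1)
    return sorted(scores, key=scores.get, reverse=True)

profiles = [
    {"window_size": "2MB", "port_fwd": "yes"},    # failed
    {"window_size": "128KB", "port_fwd": "yes"},  # passed
    {"window_size": "128KB", "port_fwd": "no"},   # passed
]
labels = ["fail", "pass", "pass"]
print(rank_features(profiles, labels)[0])  # window_size
```

The top-ranked features become the SERs that static analysis then maps to code.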

Page 10: Sahara icsm 2011

Sahara – Identifying Deviations

[Diagram: starting from the suspect routines, the vendor produces instrumented code containing dynamic-analysis code; both the vendor and the user sites run the original and new versions of the instrumented code.]

Page 11: Sahara icsm 2011

Sahara – Identifying Deviations

[Diagram: dynamic analysis compares the runs of the original and new versions at the vendor and the user sites to produce the set of deviated routines.]

Page 12: Sahara icsm 2011

Sahara – Identifying Deviations

[Diagram: at the vendor, Suspect Routines ∩ Deviated Routines = Prime Suspects.]

Page 13: Sahara icsm 2011

Sahara - Summary

Identifies environment resources that caused the failure

Feature selection using feedback from many users

Isolates routines affected by the suspect environment

Def-use static analysis

Finds routines that have deviated after the upgrade

Dynamic source analysis [ICSM'04]

Combines results from static and dynamic analysis to produce the prime suspects
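The def-use step can be pictured as forward propagation: start from routines that read a suspect resource, then follow value flows to routines that consume what they define. A toy sketch under that assumption (all routine and resource names invented):

```python
# Toy def-use propagation: a routine is a suspect if it reads a suspect
# environment resource, or consumes a value derived from one.
# This simplifies real def-use analysis to a reachability walk.

def suspect_routines(reads_env, flows, sers):
    """reads_env: routine -> env resources it reads
       flows:     routine -> routines that consume values it defines
       sers:      suspect environment resources"""
    frontier = [r for r, res in reads_env.items() if res & sers]
    suspects = set(frontier)
    while frontier:
        r = frontier.pop()
        for consumer in flows.get(r, ()):
            if consumer not in suspects:
                suspects.add(consumer)
                frontier.append(consumer)
    return suspects

reads_env = {"parse_config": {"window_size"}, "auth_check": {"use_pam"}}
flows = {"parse_config": {"channel_setup"}, "channel_setup": {"send_packet"}}
print(sorted(suspect_routines(reads_env, flows, {"window_size"})))
# ['channel_setup', 'parse_config', 'send_packet']
```

Routines that never touch a suspect resource (here, auth_check) are pruned from the search.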

Page 14: Sahara icsm 2011

Outline

Overview

Sahara: Debugging Failed Upgrades

Evaluation

Conclusion

Page 15: Sahara icsm 2011

Experimental Setup

Upgrade deployment: environment data from 87 machines

Experiments

Application-specific configuration

Random

Modified 3 out of 8 real configurations to induce failures

Feature selection: 20 fail profiles, 67 success profiles

Failure correlation

Perfect (100%) – all failure-inducing profiles result in failure

Imperfect (60%) – 60% of failure-inducing profiles result in failure

Imperfect (20%) – 20% of failure-inducing profiles result in failure

Suspects: features within 30% of the top-ranked feature
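The last line can be read as a relative cutoff: keep every feature whose score is within 30% of the best score. A sketch with invented feature names and scores:

```python
# Sketch: keep features whose score is within a fraction of the top
# score (the talk uses 30%). Feature names and scores are made up.

def select_suspects(scores, slack=0.30):
    top = max(scores.values())
    return {f for f, s in scores.items() if s >= (1 - slack) * top}

scores = {"window_size": 0.9, "port_fwd": 0.7, "compression": 0.2}
print(sorted(select_suspects(scores)))  # ['port_fwd', 'window_size']
```

Widening the slack (as in the sensitivity analysis later) admits more SERs and therefore more prime suspects.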

Page 16: Sahara icsm 2011

Evaluation

Evaluate Sahara with three applications:

OpenSSH – 3 real upgrade failures

Upgrades every 3-6 months

50-70K lines of code

SQLite – 1 synthetic upgrade failure

67K lines of code

uServer – 1 synthetic upgrade failure

37K lines of code

Results for only two OpenSSH bugs are discussed next

Page 17: Sahara icsm 2011

OpenSSH bugs – Port Forwarding

Large data transfers abort with port forwarding

Regression bug in ssh version 4.7

Abort not reproducible at the vendor site

Reasons for the abort:

Users with port forwarding enabled issued large transfers

Default window size increased from 128KB to 2MB

Window size incorrectly advertised as packet size

sshd limits the maximum packet size to 256KB

Page 18: Sahara icsm 2011

Results – Port Forwarding

Sahara reduces the no. of routines by 2-3x over static analysis, 17-20x over dynamic analysis, and 9-10x over diff

Produces a small number of routines

Prime suspects always include the offending routine(s)

[Chart reconstructed as a table; no. of routines per configuration, columns: diff | Suspect Routines (static analysis) | Deviated Routines (dynamic analysis) | Prime Suspects (Sahara)]

Perfect (100%), SERs = 1:    65 | 12 | 124 | 6
Imperfect (60%), SERs = 1:   65 | 12 | 124 | 6
Imperfect (20%), SERs = 1:   65 | 12 | 124 | 6
Perfect (100%), SERs = 3:    65 | 22 | 124 | 7
Imperfect (60%), SERs = 3:   65 | 22 | 124 | 7
Imperfect (20%), SERs = 3:   65 | 22 | 124 | 7

Page 19: Sahara icsm 2011

OpenSSH bugs – X11 Forwarding

X forwarding won't start when executed in the background

Regression bug in sshd version 4.2

X11 forwarding is enabled and the X session is started in the background

Reasons for the failure:

X11 code was modified to destroy listeners whose session has ended

With the X11 session in the background, sshd closes the session

Page 20: Sahara icsm 2011

Results – X11 Forwarding

Sahara reduces the no. of routines by 3x over static analysis, 20x over dynamic analysis, and 15x over diff

Produces a small number of routines

Offending routine(s) always included in the prime suspects

[Chart reconstructed as a table; no. of routines per configuration, columns: diff | Suspect Routines (static analysis) | Deviated Routines (dynamic analysis) | Prime Suspects (Sahara)]

Perfect (100%), SERs = 1:    137 | 18 | 157 | 6
Imperfect (60%), SERs = 1:   137 | 18 | 157 | 6
Imperfect (20%), SERs = 1:   137 | 18 | 157 | 6
Perfect (100%), SERs = 3:    137 | 21 | 157 | 7
Imperfect (60%), SERs = 3:   137 | 20 | 157 | 6
Imperfect (20%), SERs = 3:   137 | 20 | 157 | 6

Page 21: Sahara icsm 2011

Results – Sensitivity Analysis (1/2)

Impact of the number of failure-inducing profiles

Default – 20 failure-inducing profiles

Case 1 – 30 failure-inducing profiles

Number of SERs reduces by at most 2 features

Number of prime suspects reduces by at most 1

Case 2 – 10 failure-inducing profiles

Number of SERs reduces by at most 1

Number of prime suspects reduces by at most 1

More profiles result in fewer SERs and prime suspects

Fewer profiles sometimes result in less noise

Page 22: Sahara icsm 2011

Results – Sensitivity Analysis (2/2)

Impact of feature selection accuracy

Default – suspects within 30% of the top-ranked feature

Case 1 – suspects within 50% of the top-ranked feature

Prime suspects increase by at most 2x

Case 2 – all configuration parameters are suspect

Prime suspects increase by 6-7x

Lower feature selection accuracy results in more SERs and prime suspects

Page 23: Sahara icsm 2011

Conclusion

Sahara leverages user feedback, machine learning, and program analyses

Produces accurate recommendations with a small set of routines

The recommended set always includes the offending routine and the culprit environment resource

Demonstrates that combining different techniques can be effective for debugging

Page 24: Sahara icsm 2011

Thanks for your time!

Questions?