SherLog: Error Diagnosis by Connecting Clues from Run-time Logs
Dacong (Tony) Yan
April 07, 2010

Introduction

Scenario - production run failure

Failure reproduction: re-run the failed execution to figure out what was going on with the program

Challenges

customers’ privacy concerns

difficulty in setting up the exact same execution environment

lack of a low-overhead logging mechanism for failure reproduction on multi-processors (why?)

Common Practice in Industry

customers send logs to vendors in case of failure

vendors analyze logs to find clues to the problem

Research Question

How can we locate the root cause of a failure by analyzing logs, even without reproducing the failed execution?

CSE 888, Dacong (Tony) Yan SherLog: Error Diagnosis by Connecting Clues from Run-time Logs 2/13

Approach

Idea

Ideal Goal: find out exactly what happened in the failed execution, i.e. the exact failure-inducing execution paths

Realistic Goal: identify the Must-Have, May-Have, and Must-Not-Have paths, and the states of variables on the possible paths
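To make the three path categories concrete, here is a minimal Python sketch. It is not SherLog's algorithm, which these slides do not describe; it brute-forces every path through a tiny acyclic control-flow graph and, as a simplification, classifies graph nodes rather than whole paths. The graph `CFG`, the message table `LOGS`, and all node names are hypothetical.

```python
# Hypothetical control-flow graph: node -> successors (illustrative only).
CFG = {
    "entry": ["check"],
    "check": ["fast", "slow"],
    "fast": ["done"],
    "slow": ["retry", "backoff", "fail"],
    "retry": ["done"],
    "backoff": ["done"],
    "fail": ["done"],
    "done": [],
}

# Hypothetical message printed when a node runs; nodes not listed print nothing.
LOGS = {"check": "checking", "slow": "slow path", "fail": "giving up", "done": "exit"}


def all_paths(node, goal, prefix=()):
    """Yield every path from node to goal in an acyclic graph."""
    prefix = prefix + (node,)
    if node == goal:
        yield prefix
        return
    for succ in CFG[node]:
        yield from all_paths(succ, goal, prefix)


def classify(observed):
    """Return (must, may, must_not) node sets for an observed message list."""
    # A path is consistent if the messages it would print equal the log.
    consistent = [
        p for p in all_paths("entry", "done")
        if [LOGS[n] for n in p if n in LOGS] == observed
    ]
    if not consistent:
        return set(), set(), set(CFG)
    on_all = set.intersection(*(set(p) for p in consistent))
    on_some = set.union(*(set(p) for p in consistent))
    return on_all, on_some - on_all, set(CFG) - on_some
```

With the log `["checking", "slow path", "exit"]`, two paths are consistent, so `retry` and `backoff` land in the may set, while `fast` and `fail` are ruled out. Exhaustive enumeration only works on toy graphs; it is used here purely to show what the three categories mean.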

Usage Scenario

The developer runs the tool to get an interesting path

queries or examines the values of certain interesting variables along the path

repeats the previous step until the root cause is found
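One way to picture the "query a variable along the path" step is to intersect the bounds that branch conditions on the chosen path place on that variable. The sketch below is an assumption made for illustration, not SherLog's actual value inference; the condition tuples and the function name `value_range` are hypothetical, and only integer comparisons against constants are handled.

```python
import math


def value_range(conditions, var):
    """Intersect the integer bounds that a path's conditions place on var.

    conditions: list of (variable, operator, constant) tuples that must have
    held for execution to follow the path (hypothetical representation).
    Returns (low, high), or None if the conditions contradict each other.
    """
    lo, hi = -math.inf, math.inf
    for name, op, const in conditions:
        if name != var:
            continue
        if op == ">":
            lo = max(lo, const + 1)
        elif op == ">=":
            lo = max(lo, const)
        elif op == "<":
            hi = min(hi, const - 1)
        elif op == "<=":
            hi = min(hi, const)
        elif op == "==":
            lo, hi = max(lo, const), min(hi, const)
    return (lo, hi) if lo <= hi else None  # None: the path is infeasible
```

For example, `value_range([("x", ">", 0), ("x", "<=", 10)], "x")` narrows `x` to the range 1 to 10, and a contradictory pair of conditions returns `None`, which signals that the candidate path could not have been taken.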

Design

Three main components:

Log Parsing: locates the source code lines printing the messages

Path Inference: infers the Must-Paths, May-Paths, and Pruned-Paths

Value Inference: infers the variable values on the paths
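The Log Parsing step can be pictured as matching each log message against the format strings of the program's logging calls. The Python sketch below assumes printf-style format strings limited to `%d` and `%s`; the call-site table, file names, and line numbers are hypothetical, and the slides do not say how SherLog itself performs the matching.

```python
import re

# Hypothetical logging call sites: (file, line, printf-style format string).
CALL_SITES = [
    ("net.c", 120, "connect to %s failed: errno %d"),
    ("net.c", 164, "retrying in %d seconds"),
    ("cache.c", 58, "evicting object %s"),
]

# Assumed mapping from the two supported specifiers to regex fragments.
SPEC_PATTERNS = {"%d": r"(-?\d+)", "%s": r"(\S+)"}


def format_to_regex(fmt):
    """Turn a printf-style format string into an anchored, compiled regex."""
    parts = re.split(r"(%[ds])", fmt)
    pattern = "".join(SPEC_PATTERNS.get(p, re.escape(p)) for p in parts)
    return re.compile("^" + pattern + "$")


def locate(message):
    """Return (file, line, captured values) for each call site that could print message."""
    hits = []
    for path, line, fmt in CALL_SITES:
        m = format_to_regex(fmt).match(message)
        if m:
            hits.append((path, line, m.groups()))
    return hits
```

Calling `locate("retrying in 5 seconds")` points at the hypothetical line 164 of `net.c` and recovers the value `"5"`. The captured values are the kind of clue that the later inference steps could use as known variable values.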

Evaluation

Methodology

manually reproduce and diagnose the failure

collect path summaries at runtime

compare the result of SherLog with the reproduction

Terminology

useful: SherLog infers a subset of the summarized information

complete: SherLog infers all the information necessary for debugging

Experimental Results

Overall Results

Case Studies

Three case studies to demonstrate the effectiveness of SherLog:

Case 1: ln of coreutils 4.5.1

Case 2: Squid web proxy cache server

Case 3: CVS Configuration Error

Squid Case Study

Performance

Discussion

What can we do with the results of SherLog? Can we automate these successive steps as well?

How helpful is the result of SherLog for debugging? Or more generally, how do we evaluate automated debugging tools?

How useful is SherLog when it is not complete?
