Spell: Streaming Parsing of System Event Logs
Min Du, Feifei Li
School of Computing,
University of Utah
Background
System Event Log
Exists practically on
every computer system!
Raw log entries:
Started service A on port 80
Started service B on port 90
Started service C on port 100
Executor updated: app-1 is now LOADING
Executor updated: app-2 is now LOADING
TaskSetManager: Starting task 0 in stage 2
TaskSetManager: Starting task 1 in stage 5
……
Each entry is printed by a statement in the source code, e.g.
printf("Started service %s on port %d", x, y);
Structured data (message/event type, log key, ……):
Started service * on port *
Executor updated: * is now LOADING
TaskSetManager: Starting task * in stage *
……
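To make the mapping from raw entries to message types concrete, here is a minimal sketch that assumes the message types are already known. The names `TEMPLATES`, `template_to_regex`, and `parse` are hypothetical and only illustrate the idea; Spell itself learns the message types from the stream rather than taking them as input.

```python
import re

# Message types from the slide; "*" marks a parameter position.
TEMPLATES = [
    "Started service * on port *",
    "Executor updated: * is now LOADING",
    "TaskSetManager: Starting task * in stage *",
]

def template_to_regex(template):
    """Turn a template into an anchored regex with one group per '*'."""
    parts = [re.escape(p) for p in template.split("*")]
    return re.compile("^" + "(.+?)".join(parts) + "$")

def parse(line):
    """Return (template, parameters) for the first matching template, else None."""
    for template in TEMPLATES:
        match = template_to_regex(template).match(line)
        if match:
            return template, list(match.groups())
    return None

print(parse("Started service A on port 80"))
```

Each parsed entry becomes a (message type, parameter list) pair, which is the structured form the later analysis steps consume.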
LOG ANALYSIS: structured data → anomaly detection
Message count vector:
Xu’SOSP09, Lou’ATC10, Lin’ICSE16, etc.
Build workflow model:
Lou’KDD10, Beschastnikh’ICSE14, Yu’ASPLOS16, etc.
LOG PARSING: system event log → structured data
Use source code as template to parse logs:
Xu’SOSP09
Problem: What if we don’t have source code?
Directly parse from raw system logs:
Makanju’KDD09, Fu’ICDM09, Tang’ICDM10, Tang’CIKM11, etc.
Problem: Offline batched processing, some very slow.
Our approach
Spell, a structured Streaming Parser for Event Logs using an
LCS (longest common subsequence) based approach.
Example:
Two log entries:
Temperature (41C) exceeds warning threshold
Temperature (42C, 43C) exceeds warning threshold
LCS:
Temperature * exceeds warning threshold
Naturally a message type!
printf("Temperature %s exceeds warning threshold")
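The example can be reproduced with a token-level LCS. The sketch below assumes entries are split on whitespace; `lcs_tokens` and `to_template` are hypothetical helper names, and marking each run of non-matching tokens with a single `*` is an illustrative choice rather than a statement of how Spell implements it.

```python
def lcs_tokens(a, b):
    """One longest common subsequence of two token lists (classic DP)."""
    n, m = len(a), len(b)
    # dp[i][j] = LCS length of a[i:] and b[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table from the top-left corner to recover the tokens.
    out, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out

def to_template(seq, lcs):
    """Copy seq, writing one '*' for each run of tokens that is not in lcs."""
    out, k = [], 0
    for tok in seq:
        if k < len(lcs) and tok == lcs[k]:
            out.append(tok)
            k += 1
        elif not out or out[-1] != "*":
            out.append("*")
    return out

a = "Temperature (41C) exceeds warning threshold".split()
b = "Temperature (42C, 43C) exceeds warning threshold".split()
common = lcs_tokens(a, b)
print(" ".join(to_template(b, common)))
```

Both entries reduce to the same template, even though the second one has two parameter tokens where the first has one.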
SPELL – Basic workflow
Add each new log entry into LCSMap in a streaming fashion. Update an existing message type if
length(LCS) > 0.5 * length(new log entry)
Otherwise, create a new LCSObject for the entry.
LCSMap walk-through:
new log entry: Temperature (41C) exceeds warning threshold
LCSMap:
LCSObject
LCSseq: Temperature (41C) exceeds warning threshold
lineIds: {0}
paramPos: {empty}
new log entry: Temperature (43C) exceeds warning threshold
LCSMap:
LCSObject
LCSseq: Temperature * exceeds warning threshold
lineIds: {0, 1}
paramPos: {1}
new log entry: Command has completed successfully
LCSMap:
LCSObject
LCSseq: Temperature * exceeds warning threshold
lineIds: {0, 1}
paramPos: {1}
LCSObject
LCSseq: Command has completed successfully
lineIds: {2}
paramPos: {empty}
new log entry: ……
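The walk-through can be sketched as a small streaming structure. The code is an illustration under stated assumptions: whitespace tokenization, a plain DP for the LCS, the first-found object winning ties, and hypothetical Python names (`lcs_seq`, `line_ids`, `param_pos`, `merge`) standing in for the slide's LCSseq, lineIds, and paramPos. It is not the authors' implementation.

```python
from dataclasses import dataclass, field

WILDCARD = "*"

def lcs(a, b):
    """One longest common subsequence of two token lists (plain DP)."""
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    out, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out

def merge(old_seq, tokens, common):
    """Rebuild the message type, writing '*' wherever either sequence has a gap."""
    out, i, j = [], 0, 0
    for tok in common:
        gap = False
        while old_seq[i] != tok:
            i, gap = i + 1, True
        while tokens[j] != tok:
            j, gap = j + 1, True
        if gap and (not out or out[-1] != WILDCARD):
            out.append(WILDCARD)
        out.append(tok)
        i, j = i + 1, j + 1
    if (i < len(old_seq) or j < len(tokens)) and (not out or out[-1] != WILDCARD):
        out.append(WILDCARD)  # trailing tokens that did not match
    return out

@dataclass
class LCSObject:
    lcs_seq: list                                   # tokens, '*' marks a parameter
    line_ids: list = field(default_factory=list)

    @property
    def param_pos(self):
        return [i for i, t in enumerate(self.lcs_seq) if t == WILDCARD]

class LCSMap:
    def __init__(self):
        self.objects = []

    def insert(self, line_id, line):
        tokens = line.split()
        best, best_common = None, []
        for obj in self.objects:
            constants = [t for t in obj.lcs_seq if t != WILDCARD]
            common = lcs(constants, tokens)
            if len(common) > len(best_common):
                best, best_common = obj, common
        # Rule from the slide: length(LCS) > 0.5 * length(new log entry)
        if best is not None and len(best_common) > 0.5 * len(tokens):
            best.lcs_seq = merge(best.lcs_seq, tokens, best_common)
            best.line_ids.append(line_id)
            return best
        obj = LCSObject(tokens, [line_id])
        self.objects.append(obj)
        return obj
```

Feeding the three entries from the walk-through in order (line ids 0, 1, 2) should leave two objects in `objects`, the first with `param_pos` equal to `[1]`.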
SPELL – Improvement on efficiency
To compute the LCS of two log entries, each of length 𝑶(𝒏):
Naïve way: Dynamic Programming
Time complexity:
To compare a log entry with an existing message type: 𝑂(𝑛²)
To compare a new log entry with 𝑂(𝑚) existing message types: 𝑂(𝑚𝑛²)
Can we do better?
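For reference, the quadratic cost follows from the standard LCS recurrence. The symbols are assumptions introduced here, not taken from the slides: $a$ and $b$ are the two token sequences, and $c_{i,j}$ is the LCS length of the first $i$ tokens of $a$ and the first $j$ tokens of $b$.

```latex
c_{i,j} =
\begin{cases}
0 & \text{if } i = 0 \text{ or } j = 0,\\
c_{i-1,j-1} + 1 & \text{if } a_i = b_j,\\
\max(c_{i-1,j},\, c_{i,j-1}) & \text{otherwise.}
\end{cases}
```

The table has $(n+1)^2$ cells and each is filled in constant time, which gives $O(n^2)$ per comparison and $O(mn^2)$ across $m$ message types.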
Observation. For a complex system,
number of log entries: millions
number of message types: hundreds
For example:
Blue Gene/L log:
4,457,719 log entries, 394 message types
Hadoop log used in Xu’SOSP09:
11,197,705 log entries, only 29 message types
For a majority of new log entries, their message types already exist in LCSMap!
Improvement 1: Prefix Tree
Existing message types:
A B C
A C D
A D
E F
Prefix tree:
ROOT
├─ A
│  ├─ B
│  │  └─ C
│  ├─ C
│  │  └─ D
│  └─ D
└─ E
   └─ F
New log entry: A B P C
Matches message type A B C (path ROOT → A → B → C)
Parameter: P
Time Complexity:
𝑶(𝒏) for each log entry
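A minimal sketch of the prefix-tree lookup follows. It assumes message types are stored without their `*` positions and that any token with no matching child is skipped as a parameter. `Node`, `PrefixTree`, `add`, and `lookup` are hypothetical names, and the length threshold from the basic workflow is left to the caller.

```python
class Node:
    def __init__(self):
        self.children = {}        # token -> Node
        self.message_type = None  # set on the node where a message type ends

class PrefixTree:
    def __init__(self):
        self.root = Node()

    def add(self, message_type):
        node = self.root
        for tok in message_type:
            node = node.children.setdefault(tok, Node())
        node.message_type = tuple(message_type)

    def lookup(self, entry):
        """Single pass over the entry: descend on a match, else skip the token."""
        node, params = self.root, []
        for tok in entry:
            child = node.children.get(tok)
            if child is None:
                params.append(tok)   # treated as a parameter
            else:
                node = child
        if node.message_type is None:
            return None
        return node.message_type, params

tree = PrefixTree()
for mt in (["A", "B", "C"], ["A", "C", "D"], ["A", "D"], ["E", "F"]):
    tree.add(mt)
print(tree.lookup(["A", "B", "P", "C"]))
```

Each token costs one dictionary lookup, so a lookup is linear in the entry length, in line with the 𝑶(𝒏) bound on the slide.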
Improvement 1: Prefix Tree
Problem:
ROOT
├─ D
│  └─ A
└─ A
   ├─ B
   │  └─ C
   └─ E
      └─ F
New log entry: D A P B C
Matches D A
Should be: A B C
Improvement 2: Simple Loop
Compare each message type with the new log entry, using a pointer 𝑷𝒎 on the message type and a pointer 𝑷𝒍 on the log entry
New log entry: [ D A P B C ]
Message type    Matched length
[ A B C ]       3
[ A E F ]       N/A
[ D A ]         2
Time complexity: 𝑶(𝒎𝒏)
𝒎: number of message types
𝒏: log entry length
For the remaining log entries, compare each one with every message type using simple DP.
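The simple loop can be sketched as a two-pointer subsequence check. The code assumes that 𝑷𝒍 advances on every token of the log entry while 𝑷𝒎 advances only on a match, and that a message type counts as matched only when 𝑷𝒎 reaches its end. `matched_length` and `simple_loop` are hypothetical names, and the 0.5 threshold is carried over from the basic workflow.

```python
def matched_length(message_type, entry):
    """Return len(message_type) if it is a subsequence of entry, else None."""
    pm = 0                                  # pointer into the message type
    for tok in entry:                       # pl walks the log entry
        if pm < len(message_type) and message_type[pm] == tok:
            pm += 1
    return pm if pm == len(message_type) else None

def simple_loop(message_types, entry):
    """Longest fully matched message type above the length threshold, else None."""
    best = None
    for mt in message_types:
        k = matched_length(mt, entry)
        if k is None or k <= 0.5 * len(entry):
            continue
        if best is None or k > len(best):
            best = mt
    return best

types = [list("ABC"), list("AEF"), list("DA")]
entry = list("DAPBC")
for mt in types:
    print(mt, matched_length(mt, entry))
print(simple_loop(types, entry))
```

On the slide's example this reproduces the matched lengths 3, N/A (here `None`), and 2, and selects A B C, the answer the prefix tree missed. Each message type is checked in one pass over the entry, which gives the 𝑶(𝒎𝒏) total.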
Evaluation
Methods to compare:
IPLoM (Makanju’KDD09):
Partition log file using 3-step heuristics (log entry length, etc.)
CLP (Fu’ICDM09):
Cluster similar logs together based on weighted edit distance
Log dataset:
Log type             Count        Message type ground truth
Los Alamos HPC log   433,490      Available online
BlueGene/L log       4,747,963    Available online
Evaluation - Efficiency
[Figure: two panels, x-axis: log size (× 10^5); left: Los Alamos, right: Blue Gene]
Evaluation - Effectiveness
[Figure: two panels, x-axis: log size (× 10^5); left: Los Alamos, right: Blue Gene]
Conclusion
Spell:
A streaming system event log parser
Using LCS
Prefix tree and simple loop to improve efficiency
Outperforms offline methods on large system log datasets
Thank you