Motivation – What do event logs look like?
PAGE 1
multi set table
Motivation – How are event logs used?
PAGE 2
• Most process discovery techniques
• Most conformance checking techniques
• …
• Data-aware process discovery
• Data-aware conformance checking
• Most enhancement techniques
• …
Of course, the world is not black & white!
Motivation – Using ProM on a standard computer
PAGE 3
~ 4-8 GB of working memory
www.xes-standard.org
10.1109/IEEESTD.2016.7740858
Source: 1849-2016 - IEEE Standard for eXtensible Event Stream (XES) for Achieving Interoperability in Event Logs and Event Streams, © IEEE
IEEE
XES – The event log standard
OpenXES – An (outdated) reference implementation
PAGE 5
OpenXES – Memory Layout
PAGE 6
XEvent
XID HashMap
UUID Node[m]
Entry
Key XAttribute
Value
OpenXES – Memory Layout
PAGE 7
XEvent
XID HashMap
UUID: 32 bytes
Node[m]
Entry
Key: k bytes
XAttribute: 32 + v bytes
Value: v bytes
OpenXES – Memory Layout
PAGE 8
XEvent
XID HashMap
UUID: 32 bytes
Node[m]: 16 + 4m + (64+k+v)m bytes
Entry: 32 + k + 32 + v bytes
Key: k bytes
XAttribute: 32 + v bytes
Value: v bytes
OpenXES – Memory Layout
PAGE 9
XEvent
XID: 16 + 32 bytes
HashMap: 48 + 16 + (68+k+v)m bytes
UUID: 32 bytes
Node[m]: 16 + 4m + (64+k+v)m bytes
Entry: 32 + k + 32 + v bytes
Key: k bytes
XAttribute: 32 + v bytes
Value: v bytes
OpenXES – Memory Layout
PAGE 10
XEvent: 24 + 48 + 64 + (68+k+v)m bytes
XID: 16 + 32 bytes
HashMap: 48 + 16 + (68+k+v)m bytes
UUID: 32 bytes
Node[m]: 16 + 4m + (64+k+v)m bytes
Entry: 32 + k + 32 + v bytes
Key: k bytes
XAttribute: 32 + v bytes
Value: v bytes
OpenXES – Memory Usage vs ‘Minimal’ Scenario
PAGE 11
[Figure: memory usage (GB) against the number of events in millions (n), in two panels (OpenXES and Minimal), for attribute sizes of 8 and 48 bytes and m = 3, 25, and 50 attributes]
Minimal scenario: n x m table of attributes (m) and events (n), no compression, no overhead
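The per-event byte counts from the memory-layout slides can be turned into a rough calculator. The following is only a sketch under the slides' own numbers; the class name, the method names, and the key size k = 24 used in main are assumptions made for illustration.

```java
/** Back-of-the-envelope estimate based on the byte counts from the memory-layout slides. */
final class EventMemoryEstimate {

    /** Bytes for one OpenXES XEvent: 24 + 48 + 64 + (68 + k + v) * m. */
    static long openXesBytesPerEvent(long k, long v, long m) {
        long header = 24;                        // the XEvent object itself
        long xid = 16 + 32;                      // XID wrapper plus UUID
        long mapFixed = 48 + 16;                 // HashMap plus Node[] array header
        long perAttribute = 4 + 32 + k + 32 + v; // array slot, Entry, key, XAttribute, value
        return header + xid + mapFixed + perAttribute * m;
    }

    /** Bytes for one event in the 'minimal' n x m table: only the m values, no overhead. */
    static long minimalBytesPerEvent(long v, long m) {
        return v * m;
    }

    public static void main(String[] args) {
        long k = 24;        // assumed key size in bytes (not taken from the slides)
        long v = 8;         // attribute size in bytes
        long m = 3;         // attributes per event
        long n = 1_000_000; // number of events
        System.out.println("OpenXES: " + n * openXesBytesPerEvent(k, v, m) + " bytes");
        System.out.println("Minimal: " + n * minimalBytesPerEvent(v, m) + " bytes");
    }
}
```

With these assumed values, the fixed and per-attribute overhead dominates the payload by more than an order of magnitude.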
XESLite – Several attempts to solve the issue
PAGE 12
Definition of XESLite
(1) having too much fun in programming
(2) being fed up with OOM exceptions
(3) disbelieving that 17 MB zipped XES
requires GBs of memory
24.02.2014 16:59 – fmannhardt.de
XESLite – Three methods & Assumptions
PAGE 13
Automaton (XL-AT)
In-Memory (XL-IM)
Database (XL-DB)
• no external software / hardware
• ~ 4-8 GB memory
• compatibility
XESLite – General ideas – Flyweight literals
PAGE 14
64 bytes – java.lang.String – concept:name
64 bytes – java.lang.String – concept:name
64 bytes – java.lang.String – concept:name
64 bytes – java.lang.String – concept:name
64 bytes – java.lang.String – concept:name
64 bytes – java.lang.String – concept:name
…..
XESLite – General ideas – Flyweight literals
PAGE 15
Google Guava (github.com/google/guava)
Interner<String> interner = Interners.newStrongInterner();
…
…
XAttribute createAttribute(String key, …) {
key = interner.intern(key); // reuse the single shared instance for this key
…
}
Disclaimer:
• Considerable overhead when there are many unique literals!
• Interned literals are not garbage-collected when they are deleted!
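To make the flyweight idea concrete without a Guava dependency, here is a minimal stand-in for a strong interner built on the JDK's ConcurrentHashMap. The class LiteralPool and its methods are hypothetical names; this is not how Guava or XESLite implement interning, only a sketch of the same behaviour.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Minimal strong interner: every distinct literal is stored once and shared. */
final class LiteralPool {
    private final ConcurrentMap<String, String> pool = new ConcurrentHashMap<>();

    /** Returns the canonical instance for the given literal. */
    String intern(String literal) {
        String existing = pool.putIfAbsent(literal, literal);
        return existing != null ? existing : literal;
    }

    /** Number of distinct literals held (they stay alive as long as the pool does). */
    int size() {
        return pool.size();
    }

    public static void main(String[] args) {
        LiteralPool pool = new LiteralPool();
        // new String(...) forces two separate objects with equal content
        String a = pool.intern(new String("concept:name"));
        String b = pool.intern(new String("concept:name"));
        System.out.println(a == b);      // both variables point to the same instance
        System.out.println(pool.size()); // one entry for one distinct literal
    }
}
```

Both disclaimers show up here as well: the map itself costs memory per distinct literal, and because it holds strong references, nothing is released until the pool is discarded.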
XESLite – General ideas – Sequential IDs
PAGE 16
XEvent: 24 + 48 + 64 + (68+k+v)m bytes
XID: 16 + 32 bytes
HashMap
UUID: 32 bytes
Node[m]
Entry
Key XAttribute
Value
XESLite – General ideas – Sequential IDs
PAGE 17
XEvent: 24 + 8 + 64 + (68+k+v)m bytes
long: 8 bytes
HashMap: 48 + 16 + (68+k+v)m bytes
40 bytes saved per event
Every little bit helps!
Disclaimer:
• No distributed events!
• Don’t assume the XID returns a real UUID
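A sequential identifier can be as simple as a process-wide counter. The sketch below assumes a single JVM and uses hypothetical names; it illustrates the idea and is not the XESLite ID factory.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hands out consecutive 8-byte identifiers instead of 32-byte UUIDs. */
final class SequentialIds {
    private static final AtomicLong NEXT = new AtomicLong();

    /** Returns the next identifier; unique only within this JVM. */
    static long next() {
        return NEXT.getAndIncrement();
    }

    public static void main(String[] args) {
        long first = SequentialIds.next();
        long second = SequentialIds.next();
        System.out.println(second - first); // consecutive identifiers
    }
}
```

Because uniqueness is only guaranteed inside one process, such identifiers cannot be merged across machines, which is the reason for the "no distributed events" disclaimer.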
XESLite – General Ideas – Compressed Traces
PAGE 18
What is a trace?
Idea: Delta compression!
OK, a rather idealized situation
LZ4 compression
(400 MB/s compression & several GB/s decompression)
Disclaimer:
• Random-access methods are slow
• Use iterator / foreach instead of get(i)!
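The slides do not say which values are delta-encoded, so the following sketch assumes a single column of timestamps within one trace and leaves out the LZ4 step. All names are hypothetical. It shows why sequential iteration is cheap while get(i) has to replay every earlier delta.

```java
import java.util.Arrays;

/** Delta encoding of one numeric column of a trace (assumed: timestamps in ms). */
final class DeltaTrace {

    /** Stores the first value as is and every later value as a difference. */
    static long[] encode(long[] values) {
        long[] deltas = new long[values.length];
        long previous = 0;
        for (int i = 0; i < values.length; i++) {
            deltas[i] = values[i] - previous;
            previous = values[i];
        }
        return deltas;
    }

    /** Sequential decoding: one pass, constant work per event. */
    static long[] decode(long[] deltas) {
        long[] values = new long[deltas.length];
        long running = 0;
        for (int i = 0; i < deltas.length; i++) {
            running += deltas[i];
            values[i] = running;
        }
        return values;
    }

    /** Random access has to sum all deltas up to index i. */
    static long get(long[] deltas, int i) {
        long running = 0;
        for (int j = 0; j <= i; j++) {
            running += deltas[j];
        }
        return running;
    }

    public static void main(String[] args) {
        long[] timestamps = {1000, 1060, 1120, 1180};
        long[] deltas = encode(timestamps);
        System.out.println(Arrays.toString(deltas));
        System.out.println(Arrays.equals(decode(deltas), timestamps));
    }
}
```

Small, repetitive deltas are what a general-purpose compressor such as LZ4 can then shrink further.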
XESLite – Automaton (XL-AT)
PAGE 19
multi set table
XESLite – Automaton (XL-AT)
PAGE 20
finite set of sequences
multiplicity
encode
similar problem
XESLite – Automaton (XL-AT)
PAGE 21
external information
finite set of words
research from the 1990s
minimal deterministic acyclic finite automaton
minimal perfect hashing
XESLite – Automaton (XL-AT) – Example
PAGE 22
(1) build minimal DAFA
Automata minimization is a well-researched problem
• Minimization of any DFA: O(n log n) with n states (Hopcroft 1974)
• Minimization for acyclic DFA can be done in linear time (Revuz 1992, Daciuk 2000)
XESLite – Automaton (XL-AT) – Example
PAGE 23
(2) build minimal perfect hashing scheme
Assign unique consecutive numbers
1..n to words accepted by the DAFA.
[Figure: words accepted by the DAFA, numbered 1 to 4]
XESLite – Automaton (XL-AT) – Example
PAGE 24
(2) build minimal perfect hashing scheme
[Figure: words accepted by the DAFA, numbered 1 to 4]
• Use lexicographical ordering
• Assign number based on predecessors
• Encode this scheme efficiently in the DAFA
XESLite – Automaton (XL-AT) – Example
PAGE 25
(2) build minimal perfect hashing scheme
[Figure: words accepted by the DAFA, numbered 1 to 4]
• For each state, remember the number of words accepted from that state
• Compute the number for a word w:
• Add the numbers of all states reached by a transition t that branches off the path of w and whose letter precedes the next letter of w.
• Add the number of final states passed.
3 (3)
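The numbering scheme above can be sketched in a few lines. For brevity this example uses a plain (unminimized) trie instead of a minimal DAFA; the per-state word counts and the lookup work the same way, since states merged by minimization accept identical sets of suffixes. Class and method names are hypothetical.

```java
import java.util.TreeMap;

/** Assigns the numbers 1..n to a set of words in lexicographical order. */
final class WordNumbering {

    private static final class State {
        final TreeMap<Character, State> next = new TreeMap<>(); // sorted by letter
        boolean isFinal;
        int words; // number of words accepted starting from this state
    }

    private final State root = new State();

    /** Adds a word and updates the counts along its path. */
    void add(String word) {
        if (numberOf(word) > 0) {
            return; // already present, counts must not change
        }
        State s = root;
        s.words++;
        for (char c : word.toCharArray()) {
            s = s.next.computeIfAbsent(c, x -> new State());
            s.words++;
        }
        s.isFinal = true;
    }

    /** Returns the number of the word, or -1 if it is not accepted. */
    int numberOf(String word) {
        State s = root;
        int number = 0;
        for (char c : word.toCharArray()) {
            if (s.isFinal) {
                number++; // a final state passed: a shorter word precedes this one
            }
            // all words behind transitions with a smaller letter precede this one
            for (State smaller : s.next.headMap(c).values()) {
                number += smaller.words;
            }
            s = s.next.get(c);
            if (s == null) {
                return -1;
            }
        }
        return s.isFinal ? number + 1 : -1; // + 1 for the final state of the word itself
    }

    public static void main(String[] args) {
        WordNumbering numbering = new WordNumbering();
        for (String trace : new String[] {"aed", "abd", "acbd", "abcd"}) {
            numbering.add(trace);
        }
        System.out.println(numbering.numberOf("acbd"));
    }
}
```

The number can then serve as an index into a lookup table that holds, for example, the multiplicity of each distinct trace.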
XESLite – Automaton (XL-AT) – Example
PAGE 26
lookup table
DAFA
Lucchesi 1992: Applications of Finite Automata Representing Large Vocabularies
Daciuk 2005: Dynamic Perfect Hashing with Finite-State Automata
3 (3)
XESLite – In-Memory (XL-IM)
Tabular view instead of the object graph of OpenXES
PAGE 27
XESLite – In-Memory (XL-IM)
Events consist only of identifiers
PAGE 28
XEvent: 12 + 8 + 4 bytes
long (ID): 8 bytes
Object (Storage): 4 bytes
XEvent: 24 + 48 + 64 + (68+k+v)m bytes
XID: 16 + 32 bytes
HashMap: 48 + 16 + (68+k+v)m bytes
UUID: 32 bytes
Node[m]: 16 + 4m + (64+k+v)m bytes
Entry: 32 + k + 32 + v bytes
Key: k bytes
XAttribute: 32 + v bytes
Value: v bytes
with trace compression
?? bytes
XESLite – In-Memory (XL-IM)
PAGE 29
Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009): VLDB 2009 Tutorial Column-Oriented Database Systems
+ Compression / packing of similar values
+ Many other optimizations possible
XESLite – In-Memory (XL-IM)
• Custom column-store-like in-memory data structure in Java
• No communication overhead with external tools
• Assumptions
• Fixed-width values for fast access (lookup table for literals – flyweights for free)
• Consistent attribute types (i.e., column types are enforced)
• Dynamic memory allocation in (compressed) blocks
PAGE 30
Block storing 2 integer values
Block storing 8 boolean values
Disclaimer:
• No real deletion, entries are only marked as deleted!
• Meta-attributes supported but inefficient!
• Spawns a compressor thread!
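The "fixed-width values plus lookup table for literals" assumption can be illustrated with a single dictionary-encoded column. This is a simplified sketch with hypothetical names; it has no blocks, no compression, and no compressor thread, so it should not be read as the XL-IM data structure itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** One string-typed column: each cell holds a fixed-width int id into a literal table. */
final class LiteralColumn {
    private final Map<String, Integer> idOf = new HashMap<>();  // literal -> id
    private final List<String> literalOf = new ArrayList<>();   // id -> literal
    private int[] cells = new int[16];                          // one int per event
    private int size;

    /** Appends the value for the next event (row). */
    void append(String value) {
        Integer id = idOf.get(value);
        if (id == null) {
            id = literalOf.size();
            idOf.put(value, id);
            literalOf.add(value);
        }
        if (size == cells.length) {
            cells = Arrays.copyOf(cells, size * 2); // grow the column
        }
        cells[size++] = id;
    }

    /** Constant-time access thanks to the fixed cell width. */
    String get(int row) {
        if (row < 0 || row >= size) {
            throw new IndexOutOfBoundsException("row " + row);
        }
        return literalOf.get(cells[row]);
    }

    int distinctLiterals() {
        return literalOf.size();
    }

    public static void main(String[] args) {
        LiteralColumn activity = new LiteralColumn();
        activity.append("Create Fine");
        activity.append("Payment");
        activity.append("Create Fine");
        System.out.println(activity.get(2));
        System.out.println(activity.distinctLiterals());
    }
}
```

Equal values in different rows resolve to the same String instance, which is the sense in which flyweights come "for free" in a tabular layout.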
XESLite – (Embedded) Database (XL-DB)
PAGE 31
As XL-IM, a tabular view instead of the object graph of OpenXES
MapDB: stored as key/value pairs
• On-disk storage (memory-mapped file)
• Uses operating system paging
• Caching mechanism for common attributes:
• concept:name
• time:timestamp
• lifecycle:transition
• Supports all OpenXES functionality!
Disclaimer:
• No real deletion, entries are only marked as deleted!
• Spawns multiple threads!
• MMAP files in temp folder might not be deleted!
Benchmark - Memory
PAGE 32
Road Fines
No difference XL-DB vs XL-IM
BPI 2011 vs Hospital Billing
Benchmark - Time
PAGE 33
Garbage collection
No difference? BTree!
Random-access
implementation detail
Conclusion
PAGE 34
• Discussion on requirements
• Multi set vs Table
• Storage requirements
• Three general ideas
• Flyweights
• Sequential IDs
• Compressed Traces
• Three XESLite implementations
• Automaton (XL-AT)
• In-Memory (XL-IM)
• Database (XL-DB)
• Details in technical report:
• BPM Center Report BPM-16-02