Discovery of Meaningful Rules in Time Series SIGKDD2015


What is a rule?

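• Quiet Night Thoughts (静夜思), Li Bai, Tang dynasty
• 床前明月光， (Before my bed, the bright moonlight;)
• 疑是地上霜。 (I take it for frost on the ground.)
• 举头望明月， (I raise my head to gaze at the bright moon,)
• 低头思故乡。 (and lower it, thinking of home.)

• Song of Qiupu (秋浦歌), Li Bai, Tang dynasty
• 炉火照天地， (The furnace fire lights up heaven and earth;)
• 红星乱紫烟。 (red sparks whirl through the purple smoke.)
• 赧郎明月夜， (The flushed smelters, on a bright moonlit night,)
• 歌曲动寒川。 (sing songs that stir the cold river.)

In both poems the character 明 (bright) is immediately followed by 月 (moon):

明 (bright) → 月 (moon)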

The Raven

• A poem by Edgar Allan Poe

“Once upon a midnight dreary, while I pondered weak and …”

chamber (a room) → door

chamber: antecedent

door: consequent

Rules in Time Series

• A major difference between text and time series is that the latter does not have a natural segmentation

• onceuponamidnightdrearywhileIponderedweak....

• qncexauponwamidmightmtdreerydwgileuIpponderediweek...

• dist(“chamber”, substring) ≤ t → door

• For example, with t = 2: chanbet → door ("chanbet" differs from "chamber" in two characters)
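The thresholded match above can be sketched in a few lines. This is only an illustration: the slides do not say which distance `dist` is, so Hamming distance is assumed here, and the names `hamming` and `antecedent_matches` and the toy stream are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ (assumed dist)."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def antecedent_matches(stream: str, antecedent: str, t: int) -> list:
    """Start indices of every substring within distance t of the antecedent."""
    n = len(antecedent)
    # Slide a window over the unsegmented stream; there are no word boundaries to rely on.
    return [
        i
        for i in range(len(stream) - n + 1)
        if hamming(stream[i:i + n], antecedent) <= t
    ]


# Toy stream: a corrupted "chamber" followed by "door".
stream = "xxchanbetdoorxx"
matches = antecedent_matches(stream, "chamber", t=2)
print(matches)
```

Because the stream has no natural segmentation, every window of the antecedent's length is tested; the rule fires wherever a window falls within the threshold.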

About lag

The consequent may not immediately follow the antecedent.

chamberdoor, chamberzdoor, chamberxydoor

So we need to define a parameter, maxlag, which is the maximum number of characters allowed between the antecedent and the consequent.

Example: if maxlag = 2, all of the above predictions are valid.
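A minimal sketch of the maxlag check, assuming exact string matching for simplicity and looking only at the first occurrence of the antecedent. The function name `rule_holds` is hypothetical.

```python
def rule_holds(stream: str, antecedent: str, consequent: str, maxlag: int) -> bool:
    """True if the consequent starts at most maxlag characters after the antecedent ends."""
    start = stream.find(antecedent)
    if start < 0:
        return False  # the antecedent never occurs, so the rule never fires
    end = start + len(antecedent)
    # Allowed start positions for the consequent: end, end + 1, ..., end + maxlag.
    for lag in range(maxlag + 1):
        pos = end + lag
        if stream[pos:pos + len(consequent)] == consequent:
            return True
    return False


for s in ["chamberdoor", "chamberzdoor", "chamberxydoor"]:
    print(s, rule_holds(s, "chamber", "door", maxlag=2))
```

With maxlag = 2 all three strings satisfy the rule; a gap of three characters, as in "chamberxyzdoor", would not.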

The formal definition of a rule in time series

“If we see a substring of length ρ that is within distance t of the word chamber, then we fire the rule and expect to see a similar substring to word door, within a learned distance, in the next maxlag time steps.”

What a rule looks like

Time Series Motif

The method is based on time series motifs, which have been studied extensively in the literature.

Definitions

DATA DISCRETIZATION

Find the minimum and maximum values of the series, then set bin boundaries uniformly spaced between them. The resulting bin width is (max - min) / cardinality.
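The binning described above can be sketched as follows. The function name `discretize` is hypothetical, and two details are assumptions the slide does not state: the maximum value is clamped into the last bin, and a constant series maps to bin 0.

```python
def discretize(series, cardinality):
    """Map each value to an integer bin in 0 .. cardinality - 1 using uniform bins."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0] * len(series)  # constant series: assumed to fall in a single bin
    width = (hi - lo) / cardinality
    # The maximum would land in bin `cardinality`, so clamp it into the last bin.
    return [min(int((x - lo) / width), cardinality - 1) for x in series]


print(discretize([0.0, 1.0, 2.0, 3.0, 4.0], cardinality=4))
```

Discretizing to a small integer alphabet is what makes it possible to count bits when scoring rules with MDL.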

• MDL is used as the scoring function; this use is a novel contribution of the paper

• Why MDL? Why not ED?

• The Euclidean distance does not allow us to compare the quality of consequents with different lengths.

• The Euclidean distance between two subsequences of length ρ can actually decrease when we expand to length ρ + 1 due to the (re)normalization of the data. So not only is the effect of length not linear, it is not even monotonic.
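The non-monotonic behaviour can be checked numerically. The sketch below assumes z-normalization with the population standard deviation; the two toy sequences are invented for illustration and do not come from the paper.

```python
import math


def znorm(x):
    """Z-normalize a sequence (population standard deviation assumed)."""
    mean = sum(x) / len(x)
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))
    if std == 0:
        return [0.0] * len(x)
    return [(v - mean) / std for v in x]


def znorm_dist(a, b):
    """Euclidean distance between the z-normalized versions of a and b."""
    za, zb = znorm(a), znorm(b)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(za, zb)))


a = [0, 1, 10]
b = [1, 0, 10]
d2 = znorm_dist(a[:2], b[:2])  # length 2: the two prefixes are mirror images
d3 = znorm_dist(a, b)          # length 3: the shared large value dominates after renormalization
print(d2, d3)
```

Extending both sequences by one shared point changes their means and standard deviations, so the earlier points are rescaled and the distance shrinks instead of growing.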

What is MDL

• MDL, or Minimum Description Length, is used to score a rule by how many bits it saves.

A hypothesis (green/bold) can be used to score subsequences by subtracting it from them (producing the small integers shown at top) and encoding the difference vector with Huffman encoding.

Here the left sequence requires 57 bits, whereas the right sequence requires 84.

The number of bits needed to store the sequence after encoding:
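A rough sketch of the bit counting, under stated assumptions: only the Huffman-coded difference vector is counted, and the cost of storing the code table and the hypothesis itself is ignored, so this is not the paper's full description-length formula. The names `huffman_bits` and `bits_given_hypothesis` and the toy sequences are hypothetical.

```python
import heapq
from collections import Counter


def huffman_bits(symbols):
    """Total bits to Huffman-encode the sequence (code table cost ignored)."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return len(symbols)  # a single distinct symbol still needs 1 bit per occurrence
    heap = list(freq.values())
    heapq.heapify(heap)
    total = 0
    # Each merge adds one bit to every symbol underneath it, so the sum of
    # merged weights equals the total encoded length.
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        total += a + b
        heapq.heappush(heap, a + b)
    return total


def bits_given_hypothesis(subsequence, hypothesis):
    """Bits needed to encode a discretized subsequence as its difference from a hypothesis."""
    diff = [s - h for s, h in zip(subsequence, hypothesis)]
    return huffman_bits(diff)


hypothesis = [1, 2, 3, 4, 5, 4, 3, 2]
close = [1, 2, 3, 4, 5, 4, 3, 3]  # nearly identical to the hypothesis
far = [5, 1, 4, 2, 8, 1, 7, 6]    # unrelated to the hypothesis
print(bits_given_hypothesis(close, hypothesis), bits_given_hypothesis(far, hypothesis))
```

A subsequence that resembles the hypothesis yields a difference vector dominated by zeros, which compresses into fewer bits than the difference vector of an unrelated subsequence.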

RULE DISCOVERY ALGORITHM

1. A scoring function

2. A search algorithm which repeatedly invokes this scoring function while searching for high quality rules

Rule Scoring

• For clarity, we begin by considering the case maxlag = 0

Motif-Based Rule Searching

• Efficient algorithms for discovering the top K motifs in a time series are well known. This paper uses the MK algorithm.
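To make concrete what a motif search returns, here is a brute-force sketch that finds the single closest pair of non-overlapping subsequences. It is not the MK algorithm, which finds the same pair far more efficiently; the function names and the toy series are hypothetical, and z-normalized Euclidean distance is assumed.

```python
import math


def znorm(x):
    """Z-normalize a window (population standard deviation assumed)."""
    mean = sum(x) / len(x)
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))
    if std == 0:
        return [0.0] * len(x)
    return [(v - mean) / std for v in x]


def top_motif(series, m):
    """Return (distance, i, j) for the closest pair of non-overlapping length-m windows."""
    windows = [znorm(series[i:i + m]) for i in range(len(series) - m + 1)]
    best = (math.inf, -1, -1)
    for i in range(len(windows)):
        # Start j at i + m so that overlapping (trivial) matches are excluded.
        for j in range(i + m, len(windows)):
            d = math.dist(windows[i], windows[j])
            if d < best[0]:
                best = (d, i, j)
    return best


# Toy series with the pattern [0, 3, 1, 4, 0] planted at positions 2 and 12.
series = [5, 5, 0, 3, 1, 4, 0, 7, 2, 9, 6, 8, 0, 3, 1, 4, 0, 5, 6, 2]
d, i, j = top_motif(series, m=5)
print(d, i, j)
```

The brute-force search compares every pair of windows, which is quadratic in the series length; this cost is the reason a dedicated motif discovery algorithm is used on real data.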

EXPERIMENT-Zebra finch

EXPERIMENT-Energy Disaggregation

Clothes Washer Clothes Dryer

Conclusion

1. Applied MDL to score time series rules.

2. The rule representation is expressive enough to allow rules with different-length antecedents, consequents, lags, and firing thresholds.

Future work

1. On some datasets, Dynamic Time Warping, in single or multi-dimensional cases, may be more robust than the Euclidean distance, but scaling it to massive datasets remains an issue.

2. It may be possible to generalize the rule representation to allow more expressive logical connectives.

3. There are currently no standard benchmarks for time series rule discovery.
