SQL Pattern Matching in Oracle 12c
Presenter
V.Hariharaputhran, www.puthranv.com
Who am I
Senior Oracle Developer/DBA with 12 years of experience
- Data Modeler / Developer
- SQL Tuning, Replication/Migration
Member of AIOUG since 2009
Blogger: www.puthranv.com
FB: Hariharaputhran Vaithinathan
Objectives
To make you an expert in Oracle analytical functions and a big data champion
Present a few key Oracle analytical functions
Present SQL pattern matching in an easy-to-follow way
On-Going Evolution of SQL
Analytical Functions
SQL Model Clause
SQL Pivot/Unpivot
SQL Pattern Matching
Trend: handling complex requirements in an easy and efficient way
Get me all the first names that contain JUAN
Oracle 8i: LIKE '%JUAN%'
Pick the first and last word of “SANGAM THE LARGEST INDEPENDENT ORACLE EVENT IN INDIA”
SQL> WITH TEMP AS
  (SELECT 'SANGAM THE LARGEST INDEPENDENT ORACLE EVENT IN INDIA' AS DAT FROM DUAL)
SELECT
  REGEXP_SUBSTR(DAT, '[^ ]+', 1) AS FIRST_WORD,
  REGEXP_REPLACE(DAT, '[[:print:]]* ([^ ]*)$', '\1') AS LAST_WORD
FROM TEMP
/

FIRST_WORD LAST_WORD
---------- ----------
SANGAM     INDIA
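If you want to try the two regular expressions outside the database, here is a minimal Python sketch of the same idea. It is not Oracle's engine. Python's `re` has no POSIX `[[:print:]]` class, so `.*` stands in for `[[:print:]]*`, which behaves the same on this single-line input. The function names are illustrative.

```python
import re

SENTENCE = "SANGAM THE LARGEST INDEPENDENT ORACLE EVENT IN INDIA"


def first_word(text):
    # Like REGEXP_SUBSTR(DAT, '[^ ]+', 1): the first run of non-space characters.
    # Returns None when nothing matches (Oracle returns NULL).
    m = re.search(r"[^ ]+", text)
    return m.group(0) if m else None


def last_word(text):
    # Like REGEXP_REPLACE(DAT, '[[:print:]]* ([^ ]*)$', '\1').
    # ".*" stands in for [[:print:]]* because Python's re has no POSIX classes.
    # Being greedy, it runs up to the LAST space, so group 1 is the final word,
    # and the whole matched string is replaced by that group.
    return re.sub(r".* ([^ ]*)$", r"\1", text)


print(first_word(SENTENCE), last_word(SENTENCE))  # SANGAM INDIA
```

With a single word (no space) the pattern does not match, so the input comes back unchanged, just as REGEXP_REPLACE returns its source string when there is no match.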
Oracle 10g: Regular Expressions
Oracle 12c: SQL Pattern Matching (MATCH_RECOGNIZE)
Big Data Analytics
What do you want to do with your data? Look for a pattern or trend.
Identify risk, increase revenue, reduce cost
Examples: fraud, unusual usage, buying patterns, money laundering
Recognize patterns and scale the analysis of big data: fast, faster, ASAP
But can we make it simple?
Self Join Vs Analytical
Have you done this? Self join (ROUTE is referenced four times):
SELECT ID_ROUTE, DISTANCE, 0 ROUTE_DIFF
FROM ROUTE
WHERE ID_ROUTE = (SELECT MIN(ID_ROUTE) FROM ROUTE)
UNION ALL
SELECT T1.ID_ROUTE, T1.DISTANCE, T1.DISTANCE - MAX(T2.DISTANCE) D
FROM ROUTE T2,
     (SELECT ID_ROUTE, DISTANCE FROM ROUTE) T1
WHERE T2.ID_ROUTE < T1.ID_ROUTE
GROUP BY T1.ID_ROUTE, T1.DISTANCE
ORDER BY ID_ROUTE;
Analytical function (ROUTE is referenced once):
SELECT ID_ROUTE, DISTANCE,
       DISTANCE - LAG(DISTANCE, 1, DISTANCE) OVER (ORDER BY ID_ROUTE) ROUTE_DIFF
FROM ROUTE
ORDER BY ID_ROUTE;
The two queries return the same ROUTE_DIFF only while DISTANCE never decreases as ID_ROUTE grows. The self join subtracts the largest earlier DISTANCE, whereas LAG subtracts the previous row's DISTANCE.
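The two queries can be compared side by side with Python's built-in sqlite3 module. This is a sketch, not Oracle. It assumes an SQLite build with window-function support (3.25 or later). The ROUTE rows are made-up sample data whose DISTANCE grows with ID_ROUTE, which is the case where the self join and LAG agree.

```python
import sqlite3

# Hypothetical sample rows: DISTANCE is cumulative, so it grows with ID_ROUTE.
ROUTES = [(1, 10), (2, 25), (3, 40), (4, 70), (5, 100)]

SELF_JOIN = """
SELECT ID_ROUTE, DISTANCE, 0 ROUTE_DIFF
FROM ROUTE
WHERE ID_ROUTE = (SELECT MIN(ID_ROUTE) FROM ROUTE)
UNION ALL
SELECT T1.ID_ROUTE, T1.DISTANCE, T1.DISTANCE - MAX(T2.DISTANCE) D
FROM ROUTE T2, (SELECT ID_ROUTE, DISTANCE FROM ROUTE) T1
WHERE T2.ID_ROUTE < T1.ID_ROUTE
GROUP BY T1.ID_ROUTE, T1.DISTANCE
ORDER BY ID_ROUTE
"""

ANALYTIC = """
SELECT ID_ROUTE, DISTANCE,
       DISTANCE - LAG(DISTANCE, 1, DISTANCE) OVER (ORDER BY ID_ROUTE) ROUTE_DIFF
FROM ROUTE
ORDER BY ID_ROUTE
"""


def run(query):
    # Fresh in-memory database per query, loaded with the sample rows.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE ROUTE (ID_ROUTE INTEGER PRIMARY KEY, DISTANCE INTEGER)")
        conn.executemany("INSERT INTO ROUTE VALUES (?, ?)", ROUTES)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


print(run(SELF_JOIN) == run(ANALYTIC))  # True
```

The analytic version reads ROUTE once and lets the window do the "previous row" lookup. The self join has to rebuild that relationship with a non-equi join and a GROUP BY.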
Buying Pattern
Customer  Buying_Pattern
Liza      X,Y,Z,X
Mark      X,Y,X
Lorena    X,Y,Y,X
SELECT customer, Buying_Pattern
FROM TBL_REG
WHERE REGEXP_LIKE(REPLACE(Buying_Pattern, ','), 'XYZX');
To find either pattern, use alternation:
REGEXP_LIKE(REPLACE(Buying_Pattern, ','), 'XYZX|XYX')
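The string-based approach can be sketched in Python. Stripping the commas mirrors REPLACE(Buying_Pattern, ','), and `re.search` is an unanchored match like REGEXP_LIKE. The dictionary simply holds the three sample rows from the slide. The function name is illustrative.

```python
import re

# The three sample rows from the slide: customer -> comma-separated purchases.
TBL_REG = {"Liza": "X,Y,Z,X", "Mark": "X,Y,X", "Lorena": "X,Y,Y,X"}


def customers_matching(pattern):
    # bought.replace(",", "") mirrors REPLACE(Buying_Pattern, ',');
    # re.search is unanchored, like REGEXP_LIKE.
    return [customer for customer, bought in TBL_REG.items()
            if re.search(pattern, bought.replace(",", ""))]


print(customers_matching("XYZX"))      # ['Liza']
print(customers_matching("XYZX|XYX"))  # ['Liza', 'Mark']
```

This only works because each customer's whole history is already packed into one string. The next slide shows the harder, more common case where every purchase is its own row.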
Pattern from a Stream of Rows
CUSTOMER  PRODUCT
Liza      X
Liza      Y
Liza      Z
Liza      X
Lorena    X
Lorena    Y
Lorena    Y
Lorena    X
Mark      X
Mark      Y
Mark      X
How do we find X, Y, Z, X when each purchase is a separate row?
LEAD Analytical Function
WITH PRDT_ORDER AS (
  SELECT BP.*,
         LEAD(PRODUCT)    OVER (PARTITION BY CUSTOMER ORDER BY BUY_DT) NEXT_PRDT,
         LEAD(PRODUCT, 2) OVER (PARTITION BY CUSTOMER ORDER BY BUY_DT) S_NEXT_PRDT,
         LEAD(PRODUCT, 3) OVER (PARTITION BY CUSTOMER ORDER BY BUY_DT) T_NEXT_PRDT
  FROM BUYING_PATTERN BP)
SELECT CUSTOMER, BUY_DT, PRODUCT
FROM PRDT_ORDER
WHERE PRODUCT = 'X'
  AND NEXT_PRDT = 'Y'
  AND S_NEXT_PRDT = 'Z'
  AND T_NEXT_PRDT = 'X';
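To check the LEAD query without an Oracle instance, you can run it against an in-memory SQLite database from Python. This assumes SQLite 3.25 or later for window functions. The BUY_DT values are hypothetical because the slide does not show them; only their order matters. Note that each extra step in the pattern needs one more LEAD column. That fixed-length limitation is what MATCH_RECOGNIZE removes.

```python
import sqlite3

# Products follow the slide's row stream; the dates are hypothetical.
PURCHASES = [
    ("Liza", "2015-01-01", "X"), ("Liza", "2015-01-02", "Y"),
    ("Liza", "2015-01-03", "Z"), ("Liza", "2015-01-04", "X"),
    ("Lorena", "2015-01-01", "X"), ("Lorena", "2015-01-02", "Y"),
    ("Lorena", "2015-01-03", "Y"), ("Lorena", "2015-01-04", "X"),
    ("Mark", "2015-01-01", "X"), ("Mark", "2015-01-02", "Y"),
    ("Mark", "2015-01-03", "X"),
]

QUERY = """
WITH PRDT_ORDER AS (
  SELECT BP.*,
         LEAD(PRODUCT)    OVER (PARTITION BY CUSTOMER ORDER BY BUY_DT) NEXT_PRDT,
         LEAD(PRODUCT, 2) OVER (PARTITION BY CUSTOMER ORDER BY BUY_DT) S_NEXT_PRDT,
         LEAD(PRODUCT, 3) OVER (PARTITION BY CUSTOMER ORDER BY BUY_DT) T_NEXT_PRDT
  FROM BUYING_PATTERN BP)
SELECT CUSTOMER, BUY_DT, PRODUCT FROM PRDT_ORDER
WHERE PRODUCT = 'X' AND NEXT_PRDT = 'Y' AND S_NEXT_PRDT = 'Z' AND T_NEXT_PRDT = 'X'
"""


def xyzx_buyers():
    # Load the purchases into a throwaway database and run the LEAD query.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE BUYING_PATTERN (CUSTOMER TEXT, BUY_DT TEXT, PRODUCT TEXT)")
        conn.executemany("INSERT INTO BUYING_PATTERN VALUES (?, ?, ?)", PURCHASES)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


print(xyzx_buyers())  # [('Liza', '2015-01-01', 'X')]
```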
What is SQL Pattern Matching?
Pattern recognition in sequences of rows.
First Look
SELECT *
FROM tbl_commodity
MATCH_RECOGNIZE (
  PARTITION BY commodity
  ORDER BY commodity_date
  MEASURES BEGN.commodity_date AS start_time,
           BEGN.commodity_price AS start_price,
           LAST(DOWN.commodity_date) AS bottom_time,
           LAST(DOWN.commodity_price) AS bottom_price,
           LAST(UP.commodity_date) AS end_time,
           LAST(UP.commodity_price) AS end_price
  ONE ROW PER MATCH
  AFTER MATCH SKIP TO LAST UP
  PATTERN (BEGN DOWN+ UP+)
  DEFINE
    DOWN AS DOWN.commodity_price < PREV(DOWN.commodity_price),
    UP AS UP.commodity_price > PREV(UP.commodity_price)
) MR
ORDER BY MR.commodity, MR.start_time;
Looks big? Complex? Too many keywords?
SQL Pattern Matching
MATCH_RECOGNIZE: In Simple Terms
1. Identify the pattern
2. Define the pattern
3. Identify the output (the column list)
4. Return the rows matching the pattern
5. Start looking for the next pattern
Identify a Pattern
Examples:
- Find both V and W shapes in trading history (price dips)
- Find large transactions occurring within a specified time
- Number of consecutive authentication failures
- Suspicious money transfer
Identify a Pattern (cont.)
Example: find large transactions occurring within a specified time
SQL Pattern Matching: Define Patterns
How to define the patterns?
Patterns are defined using regular expressions and matched against a sequence of rows.
SQL Pattern Matching: Building Regular Expressions
Concatenation: no operator
Quantifiers:
◦ * 0 or more matches
◦ + 1 or more matches
◦ ? 0 or 1 match
◦ {n} exactly n matches
◦ {n,} n or more matches
◦ {n,m} between n and m (inclusive) matches
◦ {,m} between 0 and m (inclusive) matches
Alternation: |
Grouping: ()
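Because row patterns use regular-expression syntax, each quantifier above has the same meaning as its counterpart in Python's `re` module. The table below is a quick self-check. Each character stands for one row mapped to a pattern variable; "D" and "U" are illustrative variable names. This exercises Python's engine, not Oracle's.

```python
import re

# Each quantifier from the slide, with symbol strings it should (True) or
# should not (False) match in full. "D"/"U" = rows mapped to illustrative
# pattern variables named D and U.
CASES = {
    "D*":     {"": True, "DDD": True},
    "D+":     {"": False, "D": True, "DDD": True},
    "D?":     {"": True, "D": True, "DD": False},
    "D{2}":   {"D": False, "DD": True, "DDD": False},
    "D{2,}":  {"D": False, "DD": True, "DDDD": True},
    "D{1,3}": {"": False, "DDD": True, "DDDD": False},
    "D{,2}":  {"": True, "DD": True, "DDD": False},
    "DU":     {"DU": True, "UD": False},              # concatenation
    "D|U":    {"D": True, "U": True, "DU": False},    # alternation
    "(DU)+":  {"DUDU": True, "DUD": False},           # grouping
}


def failures():
    # Collect every (pattern, text) pair where re.fullmatch disagrees.
    return [(pattern, text)
            for pattern, table in CASES.items()
            for text, expected in table.items()
            if bool(re.fullmatch(pattern, text)) != expected]


print(failures())  # []
```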
Single Dip: V Pattern
Example: V Pattern
[Figure: Gold Price over days, showing a single V-shaped dip]
SELECT * FROM tbl_commodity
MATCH_RECOGNIZE ( PARTITION BY commodity ORDER BY commodity_date
...
V Pattern (cont.)
PATTERN (BEGN DOWN+ UP+)
DEFINE
  DOWN AS (commodity_price < PREV(DOWN.commodity_price)),
  UP AS (commodity_price > PREV(UP.commodity_price))
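To make the matching steps concrete, here is a small Python sketch. It walks one partition's prices, already ordered by commodity_date, the way PATTERN (BEGN DOWN+ UP+) with AFTER MATCH SKIP TO LAST UP would, and returns the start, bottom and end price of each match. It is an emulation for illustration only. It is valid for these particular DEFINE conditions: DOWN and UP compare each row only with its previous row and can never both be true. Under those conditions, taking the longest DOWN run and then the longest UP run gives the same match that the greedy quantifiers prefer. The function name is hypothetical.

```python
def find_v_shapes(prices):
    """Emulate PATTERN (BEGN DOWN+ UP+), ONE ROW PER MATCH and
    AFTER MATCH SKIP TO LAST UP over one partition's ordered prices.
    Returns (start_price, bottom_price, end_price) per match."""
    matches = []
    n = len(prices)
    i = 0
    while i < n:
        # BEGN has no DEFINE condition, so any row can start a match.
        j = i + 1
        # DOWN+: each row must be below its previous row.
        while j < n and prices[j] < prices[j - 1]:
            j += 1
        bottom = j - 1
        # UP+: each row must be above its previous row.
        k = j
        while k < n and prices[k] > prices[k - 1]:
            k += 1
        end = k - 1
        if bottom > i and end > bottom:
            matches.append((prices[i], prices[bottom], prices[end]))
            i = end  # SKIP TO LAST UP: the last UP row becomes the next BEGN
        else:
            i += 1   # no match starting here; try the next row
    return matches


print(find_v_shapes([10, 8, 6, 7, 9, 8, 7, 10]))  # [(10, 6, 9), (9, 7, 10)]
```

Flat stretches (equal prices) satisfy neither DOWN nor UP, so they break a V, exactly as the strict `<` and `>` in the DEFINE clause do.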
Double Dip: W Pattern
SELECT * FROM tbl_commodity
MATCH_RECOGNIZE ( PARTITION BY commodity ORDER BY commodity_date
...
W Pattern (cont.)
[Figure: Gold Price over days, with the two dips and rises labelled D1, U1, D2, U2]
PATTERN (BEGN D1+ U1+ D2+ U2+)
DEFINE
  D1 AS (commodity_price < PREV(D1.commodity_price)),
  U1 AS (commodity_price > PREV(U1.commodity_price)),
  D2 AS (commodity_price < PREV(D2.commodity_price)),
  U2 AS (commodity_price > PREV(U2.commodity_price))
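The W pattern can be sketched the same way. Turn each row into a symbol relative to the previous row (D for a drop, U for a rise, F for flat), then run an ordinary regular expression over the symbol string. This is an illustrative emulation, not Oracle's implementation. It assumes each variable is compared with the previous row, as in the DEFINE clause above. The slide shows no AFTER MATCH clause for this query, so the sketch assumes SKIP TO LAST U2 by analogy with the V example. The names are hypothetical.

```python
import re

# D1+ U1+ D2+ U2+ over the symbol string; BEGN is the row before the first D.
W_PATTERN = re.compile(r"D+U+D+U+")


def classify(prices):
    # One symbol per row after the first, relative to the previous row:
    # D = lower (D1/D2 candidate), U = higher (U1/U2 candidate), F = flat.
    # Row 0 has no previous row, so it can only be a BEGN row.
    return "".join("D" if cur < prev else "U" if cur > prev else "F"
                   for prev, cur in zip(prices, prices[1:]))


def find_w_shapes(prices):
    """Return (begn_row, last_u2_row) index pairs for one partition,
    assuming AFTER MATCH SKIP TO LAST U2 (not shown on the slide)."""
    symbols = classify(prices)
    matches, pos = [], 0
    while (m := W_PATTERN.search(symbols, pos)):
        # Symbol s describes row s + 1, so the BEGN row is m.start()
        # and the last U2 row is m.end().
        matches.append((m.start(), m.end()))
        pos = m.end()  # the last U2 row becomes the next BEGN row
    return matches


print(find_w_shapes([10, 8, 9, 7, 8, 6, 7, 5, 6]))  # [(0, 4), (4, 8)]
```

The symbol trick works here only because every DEFINE condition looks at the current and previous row alone. Conditions that depend on the match so far (such as FIRST() or aggregates) need a real backtracking matcher.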
List the Output
The MEASURES section identifies the output columns of the query.
MEASURES
  BEGN.commodity_price AS start_price,          -- starting point of the pattern
  LAST(DOWN.commodity_price) AS bottom_price,   -- the bottom point
  LAST(UP.commodity_price) AS end_price         -- end point of the pattern
Return the Rows Matching the Pattern
What rows to return? Output one row each time we find a match to our pattern:
ONE ROW PER MATCH
Start Looking for the Next Pattern
What is the starting point for the next V? The last row of the first pattern becomes the first row of the next pattern:
AFTER MATCH SKIP TO LAST UP
Declarative Pattern Matching
Rows to be returned after a match:
  ONE ROW PER MATCH (default)
  ALL ROWS PER MATCH
  ALL ROWS PER MATCH WITH UNMATCHED ROWS
Skip options after a match (AFTER MATCH ...):
  SKIP PAST LAST ROW (default)
  SKIP TO NEXT ROW
  SKIP TO <VARIABLE>
  SKIP TO FIRST(<VARIABLE>)
  SKIP TO LAST(<VARIABLE>)
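You can see the effect of the skip option with a small Python emulation of the V pattern, using the same symbol trick (D for a drop, U for a rise). It compares AFTER MATCH SKIP TO LAST UP with the default SKIP PAST LAST ROW. The function name, the `skip` strings and the sample prices are illustrative, not Oracle syntax.

```python
import re

V_PATTERN = re.compile(r"(D+)(U+)")  # DOWN+ UP+; BEGN is the row before


def v_matches(prices, skip="TO LAST UP"):
    """Return (begn_row, last_down_row, last_up_row) triples for
    PATTERN (BEGN DOWN+ UP+). skip is an illustrative flag:
    "TO LAST UP" or "PAST LAST ROW" (Oracle's default)."""
    # D: row lower than the previous one, U: higher, F: flat.
    symbols = "".join("D" if cur < prev else "U" if cur > prev else "F"
                      for prev, cur in zip(prices, prices[1:]))
    matches, pos = [], 0
    while (m := V_PATTERN.search(symbols, pos)):
        # Symbol s describes row s + 1, so m.end(g) is the row index of the
        # last row in group g, and m.start() is the BEGN row.
        begn, bottom, end = m.start(), m.end(1), m.end(2)
        matches.append((begn, bottom, end))
        # SKIP TO LAST UP restarts with the last UP row as the next BEGN;
        # SKIP PAST LAST ROW restarts one row later.
        pos = end if skip == "TO LAST UP" else end + 1
    return matches


prices = [10, 8, 12, 9, 11]
print(v_matches(prices, "TO LAST UP"))     # [(0, 1, 2), (2, 3, 4)]
print(v_matches(prices, "PAST LAST ROW"))  # [(0, 1, 2)]
```

With SKIP PAST LAST ROW, row 2 (the peak at 12) can no longer serve as the BEGN of a second V, so only one match is reported.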
MATCH_RECOGNIZE: Functions
CLASSIFIER(): which pattern variable applies to which rows.
MATCH_NUMBER(): which rows are members of which match.
PREV(): access to a column/expression in a previous row.
NEXT(): access to a column/expression in a next row.
LAST(): last value within the pattern match.
Aggregates: COUNT(), AVG(), MAX(), MIN(), SUM()
CLASSIFIER / MATCH_NUMBER example:
MEASURES MATCH_NUMBER() AS Match_Num,
         CLASSIFIER() AS Pattern_Match
ALL ROWS PER MATCH
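To illustrate what MATCH_NUMBER() and CLASSIFIER() return under ALL ROWS PER MATCH, here is a Python emulation of the V pattern with AFTER MATCH SKIP TO LAST UP. It is a sketch that runs a plain regular expression over per-row symbols (D for a drop, U for a rise). It is valid only because DOWN and UP depend on the previous row alone. The function name is hypothetical.

```python
import re

V_PATTERN = re.compile(r"D+U+")  # DOWN+ UP+; the row before is BEGN


def all_rows_per_match(prices):
    """Emulate MATCH_NUMBER() AS Match_Num, CLASSIFIER() AS Pattern_Match
    with ALL ROWS PER MATCH and AFTER MATCH SKIP TO LAST UP.
    Returns (row_index, match_num, classifier) tuples."""
    # D: lower than the previous row, U: higher, F: flat.
    symbols = "".join("D" if cur < prev else "U" if cur > prev else "F"
                      for prev, cur in zip(prices, prices[1:]))
    rows, pos, match_num = [], 0, 0
    while (m := V_PATTERN.search(symbols, pos)):
        match_num += 1  # matches are numbered 1, 2, 3, ... in the order found
        rows.append((m.start(), match_num, "BEGN"))
        for s in range(m.start(), m.end()):
            # Symbol s describes row s + 1; inside a match it is D or U.
            rows.append((s + 1, match_num, "DOWN" if symbols[s] == "D" else "UP"))
        pos = m.end()  # SKIP TO LAST UP: that row is the next BEGN
    return rows


for row in all_rows_per_match([10, 8, 6, 7, 9, 8, 7, 10]):
    print(row)
```

Row 4 (price 9) is reported twice: as UP in match 1 and as BEGN in match 2. This is the kind of overlap that MATCH_NUMBER() and CLASSIFIER() make visible when you debug a pattern.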
Suspicious Money Transfer
Detect a suspicious money transfer pattern for an account:
◦ Three or more small-amount (<2K) money transfers within 30 days
◦ A subsequent large transfer (>=1M) within 10 days of the last small transfer
Report the account, the date of the first small transfer and the date of the last large transfer.
Source: http://docs.oracle.com/database/121/DWHSG/pattern.htm#DWHSG9303
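The rule above can be sketched in Python for a single account's transfers, ordered by day, as PATTERN (X{3,} Y) with the default AFTER MATCH SKIP PAST LAST ROW. This is not the query from the Oracle documentation. The function and parameter names are hypothetical, days are plain integers, and "within" is read as inclusive (<=), which is an assumption.

```python
def suspicious_transfers(transfers, small=2_000, large=1_000_000,
                         small_window=30, large_gap=10):
    """Emulate PATTERN (X{3,} Y) for ONE account (hypothetical helper).

    transfers: (day, amount) pairs ordered by day (days as plain integers).
    X: amount < small; Y: amount >= large. The X rows must span at most
    small_window days and Y must come at most large_gap days after the last
    X ("within" read as inclusive, an assumption). Uses the default
    AFTER MATCH SKIP PAST LAST ROW. Returns (first_small_day, large_day) pairs.
    """
    found = []
    i, n = 0, len(transfers)
    while i < n:
        j = i
        while j < n and transfers[j][1] < small:  # X{3,}: longest run of small transfers
            j += 1
        run = transfers[i:j]
        if (len(run) >= 3 and j < n and transfers[j][1] >= large  # Y follows the run
                and run[-1][0] - run[0][0] <= small_window
                and transfers[j][0] - run[-1][0] <= large_gap):
            found.append((run[0][0], transfers[j][0]))
            i = j + 1  # SKIP PAST LAST ROW: resume after the Y row
        else:
            i += 1     # no match starting at row i; try the next row
    return found


account_a = [(1, 500), (5, 1_500), (10, 800), (15, 1_200), (20, 2_000_000)]
print(suspicious_transfers(account_a))  # [(1, 20)]
```

To cover several accounts, group the rows by account first and call the function once per group. That is the job PARTITION BY does inside MATCH_RECOGNIZE.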
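SQL Pattern Matching: MATCH_RECOGNIZE
1. Source: FROM TBL_COMMODITY
2. Group and order the source: PARTITION BY COMMODITY ORDER BY COMMODITY_DATE
3. Declare the pattern: PATTERN (BEGN DOWN+ UP+)
4. Defining conditions: DEFINE DOWN AS (COMMODITY_PRICE < PREV(DOWN.COMMODITY_PRICE)), UP AS (COMMODITY_PRICE > PREV(UP.COMMODITY_PRICE))
5. Rows per match: ONE ROW PER MATCH
6. Query output: MEASURES BEGN.COMMODITY_PRICE AS START_PRICE, LAST(DOWN.COMMODITY_PRICE) AS BOTTOM_PRICE, LAST(UP.COMMODITY_PRICE) AS END_PRICE
7. Look for the next pattern: AFTER MATCH SKIP TO LAST UP
Put together, with the clauses in the order the syntax requires:
SELECT *
FROM TBL_COMMODITY
MATCH_RECOGNIZE (
  PARTITION BY COMMODITY
  ORDER BY COMMODITY_DATE
  MEASURES BEGN.COMMODITY_PRICE AS START_PRICE,
           LAST(DOWN.COMMODITY_PRICE) AS BOTTOM_PRICE,
           LAST(UP.COMMODITY_PRICE) AS END_PRICE
  ONE ROW PER MATCH
  AFTER MATCH SKIP TO LAST UP
  PATTERN (BEGN DOWN+ UP+)
  DEFINE DOWN AS (COMMODITY_PRICE < PREV(DOWN.COMMODITY_PRICE)),
         UP AS (COMMODITY_PRICE > PREV(UP.COMMODITY_PRICE))
) MR
ORDER BY MR.COMMODITY, MR.START_PRICE;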
Java vs SQL
Java vs. SQL: Stock Markets, Searching for ‘W’ Patterns in Trade Data
250+ Lines of Java and PIG
package pigstuff;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import org.apache.pig.EvalFunc;
import org.apache.pig.PigException;
import org.apache.pig.backend.executionengine.ExecException;
import org.apache.pig.data.BagFactory;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.DataType;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;
import org.apache.pig.impl.logicalLayer.FrontendException;
import org.apache.pig.impl.logicalLayer.schema.Schema;

/**
 *
 * @author nbayliss
 */
...
    private class V0Line {
        String state = null;
        String[] attributes;
        String prev = "";
        String next = "";

        public V0Line(String[] atts) { attributes = atts; }

        public String[] getAttributes() { return attributes; }

        public void setState(String state) { this.state = state; }

        public String setState(V0Line linePrev, V0Line lineNext) {
        ...
    private boolean eq(String a, String b) {
    ...
    private boolean gt(String a, String b) {
    ...
    public Tuple exec(Tuple input) throws IOException {
    ...
    @Override
    public Schema outputSchema(Schema input) {
        Schema.FieldSchema linenumber = new Schema.FieldSchema("linenumber", DataType.CHARARRAY);
        Schema.FieldSchema pbykey = new Schema.FieldSchema("pbykey", DataType.CHARARRAY);
        Schema.FieldSchema count = new Schema.FieldSchema("count", DataType.LONG);

        Schema tupleSchema = new Schema();
        tupleSchema.add(linenumber);
        tupleSchema.add(pbykey);
        tupleSchema.add(count);
        return new Schema(tupleSchema);
    }
}
SELECT first_x, last_z
FROM ticker MATCH_RECOGNIZE (
  PARTITION BY name ORDER BY time
  MEASURES FIRST(x.time) AS first_x,
           LAST(z.time) AS last_z
  ONE ROW PER MATCH
  PATTERN (X+ Y+ W+ Z+)
  DEFINE X AS (price < PREV(price)),
         Y AS (price > PREV(price)),
         W AS (price < PREV(price)),
         Z AS (price > PREV(price) AND z.time - FIRST(x.time) <= 7))
12 Lines of SQL
Source: Keith Laker presentation
Source: http://www.oracle.com/in/corporate/events/oracle-database-2197020-en-in.pdf
Fraud Detection
Summary
• Patterns are found in a sequence of rows
• Measure columns must have aliases
• MATCH_NUMBER() and ALL ROWS PER MATCH WITH UNMATCHED ROWS help identify what went wrong
• DISTINCT is not supported
• Test your pattern: beware of catastrophic backtracking
Q & A
Thanks for attending my session