DBMAN 3 (V 1.0)

Group By, Having
Cube, Rollup
OLTP vs OLAP
Data analysis


SELECT
Displayed order of suffixes
1. INTO
2. FROM
3. WHERE
4. GROUP BY
5. HAVING
6. UNION/MINUS
7. INTERSECT
8. ORDER BY


Grouping/Aggregate functions
• SUM - Sum
• AVG - Average
• MIN - Minimum
• MAX - Maximum
• COUNT - Number of non-null values (records)
• GROUP_CONCAT - Concatenated list of elements
• STDDEV - Standard deviation
• VARIANCE - Variance


Non-grouping usage
• select avg(sal) as Average from emp;
• select min(sal) from emp;
• select min(sal) from emp where sal>2000;
• select avg(distinct sal) as Average from emp;
• select count(sal) from emp;
• select count(comm) from emp where sal>2000;
• select comm from emp where sal>2000;
• select count(*) from emp where sal>2000;
• select avg(comm) from emp; → NULL values are not included!
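
The NULL behaviour of the aggregates can be checked on a tiny table. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL/Oracle; the four rows are a hypothetical mini version of emp, not the full course table.

```python
import sqlite3

# Hypothetical mini emp table (illustrative values only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, sal REAL, comm REAL)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [
        ("ALLEN", 1600, 300),
        ("WARD", 1250, 500),
        ("KING", 5000, None),  # no commission: NULL
        ("FORD", 3000, None),  # no commission: NULL
    ],
)

# COUNT(*) counts rows; COUNT(comm) counts only the non-NULL comm values.
count_all, count_comm = conn.execute(
    "SELECT COUNT(*), COUNT(comm) FROM emp"
).fetchone()

# AVG(comm) divides by the number of non-NULL values (2), not by the row count (4).
(avg_comm,) = conn.execute("SELECT AVG(comm) FROM emp").fetchone()

# Treating NULL as 0 changes the result; IFNULL is the SQLite/MySQL
# counterpart of Oracle's NVL.
(avg_comm_zero,) = conn.execute(
    "SELECT AVG(IFNULL(comm, 0)) FROM emp"
).fetchone()

print(count_all, count_comm, avg_comm, avg_comm_zero)
```

If the average should run over all employees, the NULLs have to be replaced explicitly, as in the last query.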


Grouping
• select distinct deptno from emp;
• select avg(sal) from emp where deptno=10;
• select avg(sal) from emp where deptno=20;
• select avg(sal) from emp where deptno=30;
→ select deptno, avg(sal) from emp group by deptno;


Grouping
IN THE SELECTION LIST (FIELD LIST) ONLY THE GROUPED FIELD(s) AND THE GROUPING FUNCTION(s) ARE ALLOWED!
(YES, IN MYSQL AS WELL!!!) (ONLY_FULL_GROUP_BY)

• select deptno, avg(sal) as Average, min(sal) as Minimum, count(*) as Num from emp group by deptno;


Grouping and suffixes
• select mgr, avg(sal) from emp group by mgr;
• select ifnull(mgr, "none") as boss, lpad(avg(sal), 15, '#') as "Averagesal" from emp group by mgr;
• HAVING vs. WHERE
• select mgr, avg(sal) from emp where ename like '%E%' group by mgr;
• select mgr, avg(sal) from emp where ename like '%E%' group by mgr having avg(sal)>1300;
• select mgr, avg(sal) as average from emp where ename like '%E%' group by mgr having avg(sal)>1300 order by average desc;
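
The difference between WHERE (filters records before grouping) and HAVING (filters groups after aggregation) can be sketched as follows. This is an illustration with Python's sqlite3 module and seven hypothetical rows, not the full emp table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, mgr INTEGER, sal REAL)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [
        ("ALLEN", 7698, 1600),
        ("JAMES", 7698, 950),
        ("WARD", 7698, 1250),    # no 'E' in the name: dropped by WHERE
        ("MILLER", 7782, 1300),
        ("JONES", 7839, 2975),
        ("BLAKE", 7839, 2850),
        ("CLARK", 7839, 2450),   # no 'E' in the name: dropped by WHERE
    ],
)

# WHERE removes individual records before the groups are formed.
where_only = conn.execute(
    "SELECT mgr, AVG(sal) FROM emp WHERE ename LIKE '%E%' "
    "GROUP BY mgr ORDER BY mgr"
).fetchall()

# HAVING removes whole groups, based on the aggregated value.
with_having = conn.execute(
    "SELECT mgr, AVG(sal) FROM emp WHERE ename LIKE '%E%' "
    "GROUP BY mgr HAVING AVG(sal) > 1300 ORDER BY mgr"
).fetchall()

print(where_only)
print(with_having)
```

Note that the group of manager 7782 has an average of exactly 1300, so the strict comparison in HAVING drops it.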


More complex grouping queries
• select min(max(sal)), max(max(sal)), round(avg(max(sal))) from emp group by deptno; -- In Oracle this works, in MySQL: "Invalid use of group function"
• select min(sal+nvl(comm,0)), mod(empno,3) from emp group by mod(empno,3) having min(sal+nvl(comm,0)) > 800;


More complex grouping queries
• select distinct job, substr(job, 2, 1) from emp;
• select avg(sal) as average, substr(job, 2, 1) from emp group by substr(job, 2, 1);
• select ename, sal, round(sal/1000) from emp;
• select round(sal/1000) as SalCat, count(sal) as Num from emp group by round(sal/1000);


More complex grouping queries (MySQL)
• select ename, round(datediff(curdate(), hiredate)/365.25) as diff from emp;
• select count(*), round(datediff(curdate(), hiredate)/365.25) as diff from emp group by round(datediff(curdate(), hiredate)/365.25);


More complex grouping queries (Oracle)
• select ename, hiredate, (to_char(sysdate, 'YYYY')-to_char(hiredate, 'YYYY')) as diff from emp;
• select count(*), (to_char(sysdate, 'YYYY')-to_char(hiredate, 'YYYY')) as diff from emp group by (to_char(sysdate, 'YYYY')-to_char(hiredate, 'YYYY'));
• OR: we could use months_between()


More complex grouping queries
• select distinct deptno, job from emp;
• select deptno, job, avg(sal), min(sal), max(sal) from emp group by deptno, job order by deptno, job;

Oracle-specific "extras":
– GROUP BY GROUPING SETS
– GROUP BY CUBE
– GROUP BY ROLLUP


GROUP BY
• Group by, Having – one-field use is "trivial": e.g. average salary for job or department
• Multiple fields: complex grouping, e.g. average salary for job AND department
• Still: only the grouped fields and the grouping functions are allowed in the selection list!!!


SELECT job, deptno, avg(sal) FROM emp GROUP BY job, deptno;

JOB        DEPTNO  AVG(SAL)
---------  ------  --------
CLERK          10      1300
MANAGER        10      2450
PRESIDENT      10      5000
ANALYST        20      3000
CLERK          20       950
MANAGER        20      2975
CLERK          30       950
MANAGER        30      2850
SALESMAN       30      1400


SELECT mgr, job, deptno, avg(sal) FROM emp GROUP BY job, deptno, mgr;

 MGR  JOB        DEPTNO  AVG(SAL)
----  ---------  ------  --------
7839  MANAGER        30      2850
7839  MANAGER        10      2450
7782  CLERK          10      1300
7698  SALESMAN       30      1400
7839  MANAGER        20      2975
7902  CLERK          20       800
7698  CLERK          30       950
      PRESIDENT      10      5000
7566  ANALYST        20      3000
7788  CLERK          20      1100


DISADVANTAGES OF A SINGLE GROUP BY

• Not flexible enough
• One grouping per query, thus multiple queries are needed even if the groupings are similar → Slower
• Aim: One query, multiple groupings → GROUPING SETS
• SELECT job, deptno, avg(sal) FROM emp GROUP BY GROUPING SETS ( (job, deptno) );


NVL – Type matching!
• SELECT nvl(mgr, 'Nope'), deptno, avg(sal) FROM emp GROUP BY GROUPING SETS ( (mgr, deptno) ); -- type mismatch: mgr is numeric, 'Nope' is a string
• SELECT nvl(to_char(mgr), 'Nope'), deptno, avg(sal) FROM emp GROUP BY GROUPING SETS ( (mgr, deptno) ); -- both arguments are strings
• SELECT nvl(mgr, 0), deptno, avg(sal) FROM emp GROUP BY GROUPING SETS ( (mgr, deptno) ); -- both arguments are numbers


GROUP BY GROUPING SETS

• We can define multiple groupings inside one query, sub-results can be cached
• E.g. performing an MGR, DEPTNO and a JOB, DEPTNO grouping in ONE query:

SELECT nvl(mgr, 0), deptno, nvl(job, 'Nope'), avg(sal) FROM emp GROUP BY GROUPING SETS ( (mgr, deptno), (deptno, job));


GROUP BY GROUPING SETS

• SELECT nvl(mgr, 0), nvl(deptno,0), nvl(job, 'NO'), avg(sal) FROM emp GROUP BY GROUPING SETS ( (mgr, deptno), (deptno, job), (mgr));

• SELECT nvl(mgr, 0), nvl(deptno,0), nvl(job, 'NO'), avg(sal) FROM emp GROUP BY GROUPING SETS ( (mgr, deptno), (deptno, job), (mgr), ());

Why do we have 0 for the mgr value ???


GROUPING
• Using the special "grouping function" GROUPING, we can determine whether a given field is used for the grouping in a record
• Grouping function: allowed in the selection list
• Special: it only works with a grouped field!


GROUPING
0 = TRUE ?

• With a simple single-field or multi-field GROUP BY, it always returns 0
• SELECT job, avg(sal), grouping(job) FROM emp GROUP BY job;
• SELECT deptno, job, avg(sal), grouping(job) FROM emp GROUP BY job, deptno;
• With grouping sets: GROUPING = 0 means that the field is used in the grouping for that record (1 means it is not)
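
Why the GROUPING flag is needed can be sketched without Oracle. SQLite has no GROUPING SETS, so the example below emulates GROUPING SETS ((mgr), (deptno)) with two plain GROUP BY queries joined by UNION ALL; the literal 0/1 columns play the role of GROUPING(mgr) and GROUPING(deptno). The three rows and the gmgr/gdeptno aliases are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emp (ename TEXT, mgr INTEGER, deptno INTEGER, sal REAL)"
)
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?, ?)",
    [
        ("KING", None, 10, 5000),   # real NULL: the president has no manager
        ("CLARK", 7839, 10, 2450),
        ("JONES", 7839, 20, 2975),
    ],
)

# Emulation of GROUP BY GROUPING SETS ((mgr), (deptno)).
# Columns: mgr, deptno, avg(sal), gmgr, gdeptno (0 = field used in the grouping).
rows = conn.execute(
    """
    SELECT mgr, NULL AS deptno, AVG(sal), 0 AS gmgr, 1 AS gdeptno
    FROM emp GROUP BY mgr
    UNION ALL
    SELECT NULL, deptno, AVG(sal), 1, 0
    FROM emp GROUP BY deptno
    ORDER BY 4, 1, 2
    """
).fetchall()

# A NULL in the mgr column is ambiguous on its own ...
null_mgr_rows = [r for r in rows if r[0] is None]
# ... only the flag tells the real NULL group apart from "mgr not grouped".
real_null_groups = [r for r in rows if r[0] is None and r[3] == 0]

for r in rows:
    print(r)
```

Three result rows have NULL in the mgr column, but only one of them is the group of employees without a manager; replacing NULL with 0 via nvl hides this difference, the flag does not.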


GROUPING

• SELECT mgr, deptno, job, avg(sal), GROUPING(mgr) as GMGR, GROUPING(deptno) as GDEPTNO, GROUPING(job) as GJOB FROM emp GROUP BY GROUPING SETS ( (mgr, deptno), (deptno, job), (mgr), ());


GROUPING
• SELECT
  CASE WHEN GROUPING(mgr)=0 THEN mgr ELSE 0 END as MGR,
  CASE WHEN GROUPING(deptno)=0 THEN deptno ELSE 0 END as DEPTNO,
  CASE WHEN GROUPING(job)=0 THEN job ELSE 'NO' END as JOB,
  avg(sal)
  FROM emp GROUP BY GROUPING SETS ( (mgr, deptno), (deptno, job), (mgr), ());


GROUPING_ID
• Unique identifier for each possible grouping column configuration
• SELECT mgr, deptno, job, avg(sal), GROUPING_ID(mgr, deptno, job) as GID FROM emp GROUP BY GROUPING SETS ( (mgr, deptno), (deptno, job), (mgr), ());
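
The GID value is the GROUPING flags of the listed fields read as one binary number, with the first argument as the most significant bit. A minimal sketch in plain Python (the helper name grouping_id is made up for the illustration):

```python
def grouping_id(args, grouped):
    """Combine the GROUPING() flags of args into one number.

    args    -- the fields passed to GROUPING_ID, in order
    grouped -- the fields of the grouping set that produced the record
    """
    gid = 0
    for col in args:
        flag = 0 if col in grouped else 1  # GROUPING(col): 0 = used in the grouping
        gid = gid * 2 + flag               # shift left, append the next bit
    return gid


args = ("mgr", "deptno", "job")
grouping_sets = [("mgr", "deptno"), ("deptno", "job"), ("mgr",), ()]
gids = {gs: grouping_id(args, gs) for gs in grouping_sets}

for gs, gid in gids.items():
    print(gs, format(gid, "03b"), gid)
```

For the query above this gives 1 for (mgr, deptno), 4 for (deptno, job), 3 for (mgr) and 7 for the grand total.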


GROUP BY GROUPING SETS
DRAWBACKS
• Too complicated, too long
• When do we need a query with three totally different grouping sets? What kind of caching can we do here?
• Usually, there are hierarchical relations between the grouping fields → more meaning, more caching → ROLLUP and CUBE
→ GROUPING and GROUPING_ID can be used the same way


CUBE
• GROUP BY CUBE (a, b, c) = GROUP BY GROUPING SETS ( (a, b, c), (a, b), (b, c), (a, c), (a), (b), (c), ( ) )
• CUBE(field1, field2): the two fields have the same rank, all combinations (subsets) of the fields are grouped
• CUBE(job, deptno): in addition to the simple two-field grouping, we get the job averages, the department averages, and the total average
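
The expansion of CUBE into grouping sets is just "every subset of the field list". A small sketch in plain Python (the function name cube is hypothetical; it only enumerates the sets and runs no query):

```python
from itertools import combinations


def cube(*fields):
    """Return all grouping sets generated by CUBE(fields), largest first."""
    sets = []
    for size in range(len(fields), -1, -1):
        sets.extend(combinations(fields, size))
    return sets


cube_abc = cube("a", "b", "c")
cube_job_deptno = cube("job", "deptno")

print(cube_abc)
print(cube_job_deptno)
```

With n fields CUBE produces 2^n grouping sets, so it grows quickly as fields are added.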


SELECT job, deptno, avg(sal) FROM emp GROUP BY CUBE(job, deptno);


ROLLUP
• GROUP BY ROLLUP (a, b, c) = GROUP BY GROUPING SETS ( (a, b, c), (a, b), (a), ( ) )
• ROLLUP(field1, field2): the first field is hierarchically more important; apart from the grand total, we only take the combinations where it is used
• ROLLUP(job, deptno): in addition to the simple two-field grouping, we get the job averages and the total average
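
ROLLUP drops fields from the right, one at a time, so its grouping sets are the prefixes of the field list. A small sketch in plain Python (the function name rollup is hypothetical; it only enumerates the sets):

```python
def rollup(*fields):
    """Return the grouping sets generated by ROLLUP(fields): every prefix."""
    return [fields[:n] for n in range(len(fields), -1, -1)]


rollup_abc = rollup("a", "b", "c")
rollup_job_deptno = rollup("job", "deptno")

print(rollup_abc)
print(rollup_job_deptno)
```

With n fields ROLLUP produces n + 1 grouping sets, and the order of the fields matters: ROLLUP(deptno, job) would give department subtotals instead of job subtotals.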


SELECT job, deptno, avg(sal) FROM emp GROUP BY ROLLUP(job, deptno);

JOB        DEPTNO    AVG(SAL)
---------  ------  ----------
CLERK          10        1300
MANAGER        10        2450
PRESIDENT      10        5000
ANALYST        20        3000
CLERK          20         950
MANAGER        20        2975
CLERK          30         950
MANAGER        30        2850
SALESMAN       30        1400
ANALYST                  3000
CLERK                  1037,5
MANAGER            2758,33333
PRESIDENT                5000
SALESMAN                 1400
                   2073,21429


MIXTURE OF GROUPINGS
• GROUP BY a, CUBE (b, c) = GROUP BY GROUPING SETS ( (a, b, c), (a, b), (a, c), (a) )
• GROUP BY a, ROLLUP (b, c) = GROUP BY GROUPING SETS ( (a, b, c), (a, b), (a) )
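
A mixed GROUP BY list behaves like a cross product: one grouping set is taken from each element of the list and the fields are concatenated. The sketch below reproduces the two expansions in plain Python; cube, rollup and concatenated are hypothetical helper names.

```python
from itertools import combinations, product


def cube(*fields):
    """All subsets of fields, largest first."""
    sets = []
    for size in range(len(fields), -1, -1):
        sets.extend(combinations(fields, size))
    return sets


def rollup(*fields):
    """All prefixes of fields, longest first."""
    return [fields[:n] for n in range(len(fields), -1, -1)]


def concatenated(*parts):
    """Cross product of several grouping-set lists, fields concatenated."""
    return [sum(combo, ()) for combo in product(*parts)]


# A plain field "a" is a list with the single grouping set (a).
mix_cube = concatenated([("a",)], cube("b", "c"))
mix_rollup = concatenated([("a",)], rollup("b", "c"))

print(mix_cube)
print(mix_rollup)
```

Because "a" is part of every resulting set, no grand total row is produced in either case.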


OLTP? OLAP?
• OLTP = Online Transaction Processing
• OLAP = Online Analytical Processing
• OLTP
– product » price
– invoice » amount
– client » name
• OLAP
– Product category × Region » Gross margin
– Product × Warehouse » Inventory
– Supplier × Time × Product » Return rate
– Tables are usually a result of grouping!


OLTP vs OLAP

               OLTP                                 OLAP
Application    Operational: ERP, CRM, legacy apps   Management Information System, Decision Support System
Typical users  Staff                                Managers, Executives
Horizon        Weeks, Months                        Years
Refresh        Immediate                            Periodic
Data model     Entity-relationship                  Multi-dimensional
Schema         Normalized                           Star
Emphasis       Update                               Retrieval


Star data model?
• The supervisor that gave the most discounts?
• The quantity shipped on a particular date, month, year or quarter?
• In which zip code did product A sell the most?


OLAP rules
• Automated data transfer
– Extract data from OLTP system(s)
– Transform/standardize, if necessary
– Import to OLAP database
– Build cubes (GROUP BY!)
– Produce reports
• Drilling
– Drill down: region → city → district
– Drill up: city → region → country
– Drill across: north region → south region → west


OLAP vs Group by
• Every dimension can be a result of a group by query
• Every data cube will be a result of group by queries
• One problem: missing/bad data points → We need trends and projections!


SELECT
Order of suffixes
1. FROM
2. WHERE
3. GROUP BY
4. HAVING
5. UNION/MINUS
6. INTERSECT
7. ORDER BY
8. INTO


BASIC PROBLEMS
• Functions: in the selection list
• Order by, group by: always executed after functions, so we might need sub-queries
• ROWNUM s*cks (later...)
• Solution: special functions that can work together with the ordering / grouping of records


RANK FUNCTIONS
• SELECT ROW_NUMBER() OVER (ORDER BY ENAME ASC) AS RNUM, ENAME FROM EMP;
• Simple rank functions:
  RANK() → 1, 2, 2, 4
  DENSE_RANK() → 1, 2, 2, 3
  PERCENT_RANK() → percentage, [0..1]
• NO PARAMETERS!
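
The difference between the rank functions shows up as soon as two records tie. The sketch below runs them side by side with Python's sqlite3 module, assuming the bundled SQLite is version 3.25 or newer (the first release with window functions); the four rows are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, sal INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [("KING", 5000), ("SCOTT", 3000), ("FORD", 3000), ("JONES", 2975)],
)

rows = conn.execute(
    """
    SELECT ename, sal,
           ROW_NUMBER()   OVER (ORDER BY sal DESC, ename) AS rnum,
           RANK()         OVER (ORDER BY sal DESC)        AS rnk,
           DENSE_RANK()   OVER (ORDER BY sal DESC)        AS dense_rnk,
           PERCENT_RANK() OVER (ORDER BY sal DESC)        AS pct_rnk
    FROM emp
    ORDER BY sal DESC, ename
    """
).fetchall()

names = [r[0] for r in rows]
ranks = [r[3] for r in rows]        # a gap follows the tie
dense_ranks = [r[4] for r in rows]  # no gap after the tie
pct_ranks = [r[5] for r in rows]    # (rank - 1) / (number of rows - 1)

for r in rows:
    print(r)
```

FORD and SCOTT share rank 2; RANK continues with 4, DENSE_RANK with 3, and ROW_NUMBER needs the extra ename sort key to number the tied records deterministically.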


LET'S TRY THOSE…
• SELECT ename, sal,
  RANK() OVER (ORDER BY sal desc)
  FROM emp;
• + DENSE_RANK(), PERCENT_RANK()


RANK WITHIN A GROUP
• SELECT deptno, ename, sal,
  RANK() OVER (PARTITION BY deptno ORDER BY sal) as RANG
  FROM emp;


RANK WITHIN A GROUP
• SELECT deptno, job, ename, sal,
  RANK() OVER (PARTITION BY deptno, job ORDER BY sal) as RANG
  FROM emp;
• + ORDER BY …


GROUPING FUNCTIONS WITH ANALYTIC CLAUSES
• SELECT ename, sal,
  SUM(SAL) OVER (order by sal) as MySAL
  FROM emp;
• Ordered list!
• SELECT ename, sal,
  AVG(SAL) OVER (order by sal) as MySAL
  FROM emp;
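
With ORDER BY inside OVER, SUM turns into a running total. One detail worth trying out: without an explicit window, records with equal sort values are peers and receive the same total. A sketch with Python's sqlite3 module (assuming SQLite 3.25 or newer; illustrative rows):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, sal INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [("SMITH", 800), ("WARD", 1250), ("MARTIN", 1250), ("ALLEN", 1600)],
)

rows = conn.execute(
    """
    SELECT ename, sal,
           -- default window: everything up to and including the peers
           SUM(sal) OVER (ORDER BY sal) AS range_sum,
           -- explicit row-based window with a unique sort key
           SUM(sal) OVER (ORDER BY sal, ename
                          ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
                         ) AS rows_sum
    FROM emp
    ORDER BY sal, ename
    """
).fetchall()

range_sums = [r[2] for r in rows]
rows_sums = [r[3] for r in rows]

for r in rows:
    print(r)
```

MARTIN and WARD both earn 1250, so the default window gives both of them 3300, while the row-based window adds them one by one.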


GROUPING FUNCTIONS WITH ANALYTIC CLAUSES
• SELECT deptno, ename, sal,
  SUM(SAL) OVER (partition by deptno order by ename) as MySum
  FROM emp
  ORDER BY deptno, ename;


GROUPING FUNCTIONS WITH ANALYTIC CLAUSES

• alter session set nls_date_format='YYYY-MM-DD';
• select ename, hiredate, sal from emp order by hiredate;
• select ename, hiredate, sal, sum(sal) over (order by hiredate) as TOTAL from emp order by hiredate;
• select ename, hiredate, sal, sum(sal) over (partition by to_char(hiredate, 'YYYY') order by hiredate) as TOTAL from emp order by hiredate;


SUBSET (Sliding window)
• SELECT ename, sal,
  avg(SAL) OVER (order by sal rows between 1 preceding and 2 following) as MyAvg
  FROM emp;
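
How the sliding window shrinks at the edges can be followed on five evenly spaced salaries. A sketch with Python's sqlite3 module (assuming SQLite 3.25 or newer; the names A to E and the values are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, sal INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [("A", 1000), ("B", 2000), ("C", 3000), ("D", 4000), ("E", 5000)],
)

rows = conn.execute(
    """
    SELECT ename, sal,
           AVG(sal) OVER (ORDER BY sal
                          ROWS BETWEEN 1 PRECEDING AND 2 FOLLOWING) AS my_avg
    FROM emp
    ORDER BY sal
    """
).fetchall()

my_avgs = [r[2] for r in rows]

for r in rows:
    print(r)
```

The first record has no preceding row and the last two have fewer than two following rows, so their averages are taken over three, three and two values instead of four.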


SUBSET (Sliding window)
• SELECT deptno, ename, sal,
  sum(SAL) OVER (partition by deptno order by sal rows between 0 preceding and 1 following) as MySum
  FROM emp;


SUBSET (Sliding window)

• We can use the RANGE keyword
• SELECT deptno, ename, sal,
  sum(SAL) OVER (order by sal range between current row and unbounded following) as MySum
  FROM emp;


OTHER ANALYTICAL FUNCTIONS
• FIRST_VALUE(), LAST_VALUE()
• RATIO_TO_REPORT() → Ratio compared to the sum value
  SELECT ename, sal, RATIO_TO_REPORT(sal) OVER () FROM emp ORDER BY sal desc;
  + PARTITION BY
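
RATIO_TO_REPORT is an Oracle function; what it computes can be sketched with a plain SUM over an empty window. The example uses Python's sqlite3 module (assuming SQLite 3.25 or newer), which has no RATIO_TO_REPORT, so this is an emulation on illustrative rows and not the Oracle call itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, sal INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [("KING", 5000), ("FORD", 3000), ("CLARK", 2000)],
)

# OVER () makes every record see the total of the whole result set;
# multiplying by 1.0 avoids integer division.
rows = conn.execute(
    """
    SELECT ename, sal, sal * 1.0 / SUM(sal) OVER () AS ratio
    FROM emp
    ORDER BY sal DESC
    """
).fetchall()

ratios = [r[2] for r in rows]

for r in rows:
    print(r)
```

Adding PARTITION BY inside OVER would make each ratio relative to the total of its own partition instead of the grand total.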
