24
A General Model for OLAP of Complex Data Jian Pei State University of New York at Buffalo, USA http://www.cse.buffalo.edu/faculty/jianpei/

A General Model for OLAP of Complex Datajpei/publications/golap-er03... · 2003-10-18 · GOLAP – A General OLAP Model • Base database – a set of objects • Grouping function

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: A General Model for OLAP of Complex Datajpei/publications/golap-er03... · 2003-10-18 · GOLAP – A General OLAP Model • Base database – a set of objects • Grouping function

A General Model for OLAP of Complex Data

Jian PeiState University of New York at Buffalo, USA

http://www.cse.buffalo.edu/faculty/jianpei/

Page 2: A General Model for OLAP of Complex Datajpei/publications/golap-er03... · 2003-10-18 · GOLAP – A General OLAP Model • Base database – a set of objects • Grouping function

Jian Pei: Mining Phenotypes and Pattern-based Clusters from Microarray Data 2

Outline

• Motivation• GOLAP – a general OLAP model• Applying GOLAP on complex data• Conclusions

Page 3: A General Model for OLAP of Complex Datajpei/publications/golap-er03... · 2003-10-18 · GOLAP – A General OLAP Model • Base database – a set of objects • Grouping function

Jian Pei: Mining Phenotypes and Pattern-based Clusters from Microarray Data 3

OLAP on Relational Data

9FallP1S212SpringP2S16SpringP1S1

MeasureDimensions

SalesSeasonProductStore

(S1,P1,s):6 (S1,P2,s):12 (S2,P1,f):9

(S1,*,s):9 (S1,P1,*):6 (*,P1,s):6 (S1,P2,*):12 (*,P2,s):12 (S2,*,f):9 (S2,P1,*):9(*,P1,f):9

(S1,*,*):9 (*,*,s):9 (*,P1,*):7.5 (*,P2,*):12 (*,*,f):9 (S2,*,*):9

(*,*,*):9

Operations:-Roll-up-Drill-down-Slice, dice, pivot (rotate)

Page 4: A General Model for OLAP of Complex Datajpei/publications/golap-er03... · 2003-10-18 · GOLAP – A General OLAP Model • Base database – a set of objects • Grouping function

Jian Pei: Mining Phenotypes and Pattern-based Clusters from Microarray Data 4

Why OLAP is Desirable?

• Multi-level, multi-dimensional summarization– Identify multi-level, multi-dimensional trends,

changes and exceptions• Can we conduct OLAP on complex data?

– Data types: strings, time series, sequences, XML documents, …

– “What are the major patterns among the gene expressions that are similar to the given new sample?”

Page 5: A General Model for OLAP of Complex Datajpei/publications/golap-er03... · 2003-10-18 · GOLAP – A General OLAP Model • Base database – a set of objects • Grouping function

Jian Pei: Mining Phenotypes and Pattern-based Clusters from Microarray Data 5

Gene Expression Matrix

w11 w12 w13

w21 w22 w23

w31 w32 w33gene

s

Samples/time

igr

isr

Page 6: A General Model for OLAP of Complex Datajpei/publications/golap-er03... · 2003-10-18 · GOLAP – A General OLAP Model • Base database – a set of objects • Grouping function

Jian Pei: Mining Phenotypes and Pattern-based Clusters from Microarray Data 6

Can We OLAP Gene Expression Data?

• Gene expression data – matrices– Oh, it can be treated as a relational table! ☺

• Syntax problem: what should be the measure?– SUM, MAX, MIN, AVG? They do not make sense! – The patterns are wanted

• Semantic problem: what should be the OLAP operations? – What is the meaning by generalizing (roll up) a

sample/gene?

Page 7: A General Model for OLAP of Complex Datajpei/publications/golap-er03... · 2003-10-18 · GOLAP – A General OLAP Model • Base database – a set of objects • Grouping function

Jian Pei: Mining Phenotypes and Pattern-based Clusters from Microarray Data 7

Good News, We Are Not Far Away

• Two major issues in defining an OLAP model– How to partition the data into summarization

units at various levels?– How to summarize the data?

• The summarization units for OLAP should yield to some nice hierarchical structure– What about a lattice? – It’s nice

Page 8: A General Model for OLAP of Complex Datajpei/publications/golap-er03... · 2003-10-18 · GOLAP – A General OLAP Model • Base database – a set of objects • Grouping function

Jian Pei: Mining Phenotypes and Pattern-based Clusters from Microarray Data 8

GOLAP – A General OLAP Model

• Base database – a set of objects• Grouping function

– Map a set of query objects in the base database to the smallest summarization unit covering the query set

– Containment: a summarization unit is still in the base database

– Monotonicity: Q1 ⊆ Q2 g(Q1) ⊆ g(Q2)– Closure: a summarization unit is self-closed

Page 9: A General Model for OLAP of Complex Datajpei/publications/golap-er03... · 2003-10-18 · GOLAP – A General OLAP Model • Base database – a set of objects • Grouping function

Jian Pei: Mining Phenotypes and Pattern-based Clusters from Microarray Data 9

Grouping Function and Class

• Class: a subset of objects S s.t. g(S) = S

A classA larger class

The whole base database itself is a class

Page 10: A General Model for OLAP of Complex Datajpei/publications/golap-er03... · 2003-10-18 · GOLAP – A General OLAP Model • Base database – a set of objects • Grouping function

Jian Pei: Mining Phenotypes and Pattern-based Clusters from Microarray Data 10

Grouping Function – Lattice

• The classes generated by a grouping function form a lattice

• Good news: containment, monotonicityand closure are sufficient to get a nice hierarchical structure!

• Member function: from class to the set of members

Page 11: A General Model for OLAP of Complex Datajpei/publications/golap-er03... · 2003-10-18 · GOLAP – A General OLAP Model • Base database – a set of objects • Grouping function

Jian Pei: Mining Phenotypes and Pattern-based Clusters from Microarray Data 11

Summarization Function

• A mapping from a set of objects to a summary– A set of sequences the sequential patterns– A set of time series the dominant pattern– A set of XML trees the frequent subtrees

Page 12: A General Model for OLAP of Complex Datajpei/publications/golap-er03... · 2003-10-18 · GOLAP – A General OLAP Model • Base database – a set of objects • Grouping function

Jian Pei: Mining Phenotypes and Pattern-based Clusters from Microarray Data 12

OLAP Operations

• Given– A grouping function– A summarization function

• OLAP operations– Summarize: return the summary of the smallest class

covering the query set– Roll up: return the summary of the smallest class

covering the query set and the current class– Drill down: return the summary of the smallest class

covering the current class except for the query set

Page 13: A General Model for OLAP of Complex Datajpei/publications/golap-er03... · 2003-10-18 · GOLAP – A General OLAP Model • Base database – a set of objects • Grouping function

Jian Pei: Mining Phenotypes and Pattern-based Clusters from Microarray Data 13

GOLAP Model and Data Warehouse

• GOLAP model (g, f)– g – grouping function– f – summarization function

• G-warehouse {(c, f(c))}– c is a class

• (g1, f1) and (g2, f2) are two GOLAP models. Then, ((g1,g2), (f1,f2)) is also a GOLAP model

• GOLAP on relational data is consistent with the traditional OLAP model

Page 14: A General Model for OLAP of Complex Datajpei/publications/golap-er03... · 2003-10-18 · GOLAP – A General OLAP Model • Base database – a set of objects • Grouping function

Jian Pei: Mining Phenotypes and Pattern-based Clusters from Microarray Data 14

Applying GOLAP on Complex Data

• How to find a meaningful grouping function?– Use clusters from hierarchical clustering

• What kind of hierarchical clustering can lead to a grouping function in GOLAP?– Each cluster contains a subset of objects– The hierarchy covers every object– The whole set of objects is the root cluster– Ancestor/descendant relation based on containment– For any two clusters c1 and c2, c1 ∩ c2 is a cluster if it

is not empty

Page 15: A General Model for OLAP of Complex Datajpei/publications/golap-er03... · 2003-10-18 · GOLAP – A General OLAP Model • Base database – a set of objects • Grouping function

Jian Pei: Mining Phenotypes and Pattern-based Clusters from Microarray Data 15

Fixing the Clustering Methods

• Many hierarchical clustering methods, but not all, satisfy the requirements– The requirement “c1 ∩ c2 is a cluster” may be

violated by some methods• Fix: make the non-empty intersections of

clusters as “intermediate clusters”

Page 16: A General Model for OLAP of Complex Datajpei/publications/golap-er03... · 2003-10-18 · GOLAP – A General OLAP Model • Base database – a set of objects • Grouping function

Jian Pei: Mining Phenotypes and Pattern-based Clusters from Microarray Data 16

GeneXplorer: A GOLAP System

• OLAP gene expression time series data• Use a hierarchical clustering

– Based on attraction tree – the index structure of G-data warehouse

• Coherent patterns as summarization• Basic operations

– Roll up– Drill down– Slice

Page 17: A General Model for OLAP of Complex Datajpei/publications/golap-er03... · 2003-10-18 · GOLAP – A General OLAP Model • Base database – a set of objects • Grouping function

Jian Pei: Mining Phenotypes and Pattern-based Clusters from Microarray Data 17

Towards Interactive Exploration of Gene Expression Patterns

• Mine hierarchical clusters of co-expressed genes and coherent patterns

Page 18: A General Model for OLAP of Complex Datajpei/publications/golap-er03... · 2003-10-18 · GOLAP – A General OLAP Model • Base database – a set of objects • Grouping function

Jian Pei: Mining Phenotypes and Pattern-based Clusters from Microarray Data 18

Indexing Clusters

Page 19: A General Model for OLAP of Complex Datajpei/publications/golap-er03... · 2003-10-18 · GOLAP – A General OLAP Model • Base database – a set of objects • Grouping function

Jian Pei: Mining Phenotypes and Pattern-based Clusters from Microarray Data 19

Interactive Exploration on Iyer’s Data Set

Page 20: A General Model for OLAP of Complex Datajpei/publications/golap-er03... · 2003-10-18 · GOLAP – A General OLAP Model • Base database – a set of objects • Grouping function

Jian Pei: Mining Phenotypes and Pattern-based Clusters from Microarray Data 20

Comparison with Other Methods

0.9960.9760.9810.974100.8000.8440.8240.70290.9990.9140.9970.99180.7190.9900.9760.96770.9840.9700.9890.95260.8550.8680.8550.95850.9680.8830.9840.98040.9970.9940.9930.98430.8870.9910.9110.95720.9550.8840.9560.9931

CAST(9)CLICK(7)Adapt(7)GeneXplorer(9)Pattern

Each cell represents the similarity between the pattern reported by different approaches and the corresponding pattern in the ground truth

Page 21: A General Model for OLAP of Complex Datajpei/publications/golap-er03... · 2003-10-18 · GOLAP – A General OLAP Model • Base database – a set of objects • Grouping function

Jian Pei: Mining Phenotypes and Pattern-based Clusters from Microarray Data 21

Other Features of GeneXplorer

• Model adjustment – GOLAP models as plug-ins– User can change the grouping function and

summarization function• Gene annotation panel

– Link patterns to ground truth from public annotations

– Pattern and object visualization

Page 22: A General Model for OLAP of Complex Datajpei/publications/golap-er03... · 2003-10-18 · GOLAP – A General OLAP Model • Base database – a set of objects • Grouping function

Jian Pei: Mining Phenotypes and Pattern-based Clusters from Microarray Data 22

Conclusions

• Problem: how to construct a general model for OLAP on complex data?

• Solution: GOLAP – a general model– Consistent with traditional OLAP on relational

data– Can handle complex data

• A case study: GeneXplorer

Page 23: A General Model for OLAP of Complex Datajpei/publications/golap-er03... · 2003-10-18 · GOLAP – A General OLAP Model • Base database – a set of objects • Grouping function

Jian Pei: Mining Phenotypes and Pattern-based Clusters from Microarray Data 23

Future Work

• Is it necessary to introduce new OLAP operations for complex data?– Data/application oriented or general?

• Efficient implementation of G-warehouse• Data integration based on general OLAP

on complex data

Page 24: A General Model for OLAP of Complex Datajpei/publications/golap-er03... · 2003-10-18 · GOLAP – A General OLAP Model • Base database – a set of objects • Grouping function

Jian Pei: Mining Phenotypes and Pattern-based Clusters from Microarray Data 24

Thank You!

http://www.cse.buffalo.edu/faculty/jianpei/