
Data Mining Query Language


DATA MINING QUERY LANGUAGES

DMQL: A Data Mining Query Language

* A data mining language must be designed to facilitate flexible and effective knowledge discovery.
* Having a query language for data mining may help standardize the development of platforms for data mining systems.
* But designing such a language is challenging because data mining covers a wide spectrum of tasks, and each task has different requirements.
* Hence, the design of a language requires a deep understanding of the limitations and underlying mechanisms of the various kinds of tasks.
* So how would you design an efficient query language?
* Based on the primitives discussed earlier.
* DMQL allows mining of different kinds of knowledge from relational databases and data warehouses at multiple levels of abstraction.
* It adopts an SQL-like syntax.
* Hence, it can be easily integrated with relational query languages.
* It is defined in a BNF grammar:
  o [ ] represents zero or one occurrence
  o { } represents zero or more occurrences
  o Words in sans serif represent keywords
* A DMQL can provide the ability to support ad hoc and interactive data mining.
* By providing a standardized language like SQL, the hope is to achieve an effect similar to the one SQL has had on relational databases:
  o Foundation for system development and evolution


  o Facilitate information exchange, technology transfer, commercialization, and wide acceptance

Design

DMQL is designed with the primitives described as follows:

* Syntax for DMQL
* Syntax for the specification of
  o task-relevant data
  o the kind of knowledge to be mined
  o concept hierarchies
  o pattern presentation and visualization
* Putting it all together: a DMQL query

Syntax of DMQL

    <DMQL> ::= <DMQL_Statement>; {<DMQL_Statement>}

    <DMQL_Statement> ::= <Data_Mining_Statement>
        | <Concept_Hierarchy_Definition_Statement>
        | <Visualization_and_Presentation>

    <Data_Mining_Statement> ::=
        use database <database_name> | use data warehouse <data_warehouse_name>
        {use hierarchy <hierarchy_name> for <attribute_or_dimension>}
        <Mine_Knowledge_Specification>
        in relevance to <attribute_or_dimension_list>
        from <relation(s)/cube(s)>
        [where <condition>]
        [order by <order_list>]
        [group by <grouping_list>]
        [having <condition>]
        {with [<interest_measure_name>] threshold = <threshold_value> [for <attribute(s)>]}

    <Mine_Knowledge_Specification> ::= <Mine_Char> | <Mine_Desc> | <Mine_Assoc> | <Mine_Class>

    <Mine_Char> ::= mine characteristics [as <pattern_name>] analyze <measure(s)>

    <Mine_Desc> ::= mine comparison [as <pattern_name>]
        for <target_class> where <target_condition>
        {versus <contrast_class_i> where <contrast_condition_i>}
        analyze <measure(s)>

    <Mine_Assoc> ::= mine associations [as <pattern_name>] [matching <metapattern>]

    <Mine_Class> ::= mine classification [as <pattern_name>]
        analyze <classifying_attribute_or_dimension>


    <Concept_Hierarchy_Definition_Statement> ::= define hierarchy <hierarchy_name>
        [for <attribute_or_dimension>] on <relation_or_cube_or_hierarchy>
        as <hierarchy_description> [where <condition>]

    <Visualization_and_Presentation> ::= display as <result_form> | {<Multilevel_Manipulation>}

    <Multilevel_Manipulation> ::= roll up on <attribute_or_dimension>
        | drill down on <attribute_or_dimension>
        | add <attribute_or_dimension>
        | drop <attribute_or_dimension>
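To make the clause order of a data mining statement concrete, the following Python sketch assembles a DMQL string from its parts. It is only an illustration: `build_dmql` and its parameters are hypothetical names introduced here, it covers a subset of the clauses, and it produces text rather than talking to any real DMQL engine.

```python
def build_dmql(database, knowledge, relevance, relations, where=None,
               hierarchies=(), thresholds=(), display=None):
    """Assemble a subset of the data mining statement clauses in grammar order.

    Hypothetical helper for illustration; it only builds a string.
    """
    lines = [f"use database {database}"]
    # {use hierarchy ...}: zero or more occurrences
    for name, attribute in hierarchies:
        lines.append(f"use hierarchy {name} for {attribute}")
    lines.append(knowledge)
    lines.append("in relevance to " + ", ".join(relevance))
    lines.append("from " + ", ".join(relations))
    # [where ...]: zero or one occurrence
    if where:
        lines.append(f"where {where}")
    # {with ... threshold = ...}: zero or more occurrences
    for measure, value in thresholds:
        lines.append(f"with {measure} threshold = {value}")
    if display:
        lines.append(f"display as {display}")
    return "\n".join(lines)


query = build_dmql(
    database="AllElectronics_db",
    knowledge="mine characteristics as customerPurchasing analyze count%",
    relevance=["C.age", "I.type"],
    relations=["customer C", "item I"],
    thresholds=[("noise", 0.05)],
    display="table",
)
print(query)
```

Optional clauses map to arguments that default to empty values, which mirrors the `[ ]` and `{ }` notation of the grammar.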

DMQL syntax for task-relevant data specification

* The names of the relevant database or data warehouse, the conditions, and the relevant attributes or dimensions must be specified:
  o use database <database_name> or use data warehouse <data_warehouse_name>
  o from <relation(s)/cube(s)> [where <condition>]
  o in relevance to <attribute_or_dimension_list>
  o order by <order_list>
  o group by <grouping_list>
  o having <condition>

Syntax for specifying the kind of knowledge to be mined

Characterization

    <Mine_Knowledge_Specification> ::=
        mine characteristics [as <pattern_name>]
        analyze <measure(s)>

  o Specifies that characteristic descriptions are to be mined
  o The analyze clause specifies aggregate measures
  o Example: mine characteristics as customerPurchasing analyze count%
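As a rough illustration of what an aggregate measure such as count% could report, the Python sketch below tallies generalized tuples and expresses each tally as a percentage of all task-relevant tuples. The sample tuples and the `characterize` function are hypothetical, and reading count% as "share of all tuples" is an assumption made for this sketch.

```python
from collections import Counter

# Hypothetical generalized tuples (age group, item type); not from the source.
tuples = [
    ("young", "TV"), ("young", "TV"), ("senior", "radio"), ("young", "phone"),
]


def characterize(rows):
    """Return {generalized tuple: (count, count%)} for the given rows."""
    counts = Counter(rows)
    total = sum(counts.values())
    return {row: (n, 100.0 * n / total) for row, n in counts.items()}


summary = characterize(tuples)
for row, (n, pct) in summary.items():
    print(row, n, f"{pct:.1f}%")
```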


Discrimination

    <Mine_Knowledge_Specification> ::=
        mine comparison [as <pattern_name>]
        for <target_class> where <target_condition>
        {versus <contrast_class_i> where <contrast_condition_i>}
        analyze <measure(s)>

  o Specifies that discriminant descriptions are to be mined; these compare a given target class of objects with one or more contrasting classes (hence the name comparison)
  o The analyze clause specifies aggregate measures
  o Example: mine comparison as purchaseGroups for bigSpenders where avg(I.price) >= $100 versus budgetSpenders where avg(I.price) < $100 analyze count

Association

    <Mine_Knowledge_Specification> ::=
        mine associations [as <pattern_name>]
        [matching <metapattern>]

  o Specifies the mining of patterns of association
  o Templates (metapatterns) can be provided with the matching clause
  o Example: mine associations as buyingHabits matching P(X: customer, W) and Q(X, Y) => buys(X, Z)
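One way to picture how a metapattern constrains the rules that are reported is the Python sketch below, which checks a candidate rule against the template P(X, W) and Q(X, Y) => buys(X, Z). The rule representation (lists of `(predicate, arguments)` pairs) and the function name are hypothetical choices for this sketch, and the type annotation `X: customer` is not checked.

```python
def matches_metapattern(antecedents, consequent):
    """Check a rule against the template P(X, W) and Q(X, Y) => buys(X, Z)."""
    if len(antecedents) != 2:
        return False
    (p_name, p_args), (q_name, q_args) = antecedents
    c_name, c_args = consequent
    # P and Q are placeholders, so any predicate name is accepted for them;
    # only the consequent predicate is fixed to "buys".
    if c_name != "buys":
        return False
    # Every predicate is binary and shares the same first variable X.
    if not all(len(args) == 2 for args in (p_args, q_args, c_args)):
        return False
    return p_args[0] == q_args[0] == c_args[0]


# Hypothetical candidate rules.
rule_ok = ([("age", ("X", "30..39")), ("income", ("X", "40K..49K"))],
           ("buys", ("X", "VCR")))
rule_bad = ([("age", ("X", "30..39"))], ("buys", ("X", "VCR")))

print(matches_metapattern(*rule_ok))   # True
print(matches_metapattern(*rule_bad))  # False
```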

Classification

    <Mine_Knowledge_Specification> ::=
        mine classification [as <pattern_name>]
        analyze <classifying_attribute_or_dimension>

  o Specifies that patterns for data classification are to be mined
  o The analyze clause specifies that classification is performed according to the values of <classifying_attribute_or_dimension>
  o For categorical attributes or dimensions, each value represents a class (such as low risk, medium risk, high risk)


  o For numeric attributes, each class is defined by a range (such as 20-39, 40-59, 60-89 for age)
  o Example: mine classification as classifyCustomerCreditRating analyze credit_rating

Syntax for concept hierarchy specification

To specify which concept hierarchies to use:

    use hierarchy <hierarchy> for <attribute_or_dimension>

Different syntax is used to define different types of hierarchies:

  o Schema hierarchies

    define hierarchy time_hierarchy on date as [date, month, quarter, year]

  o Set-grouping hierarchies

    define hierarchy age_hierarchy for age on customer as
        level1: {young, middle_aged, senior} < level0: all
        level2: {20, ..., 39} < level1: young
        level2: {40, ..., 59} < level1: middle_aged
        level2: {60, ..., 89} < level1: senior

  o Operation-derived hierarchies

    define hierarchy age_hierarchy for age on customer as
        {age_category(1), ..., age_category(5)} := cluster(default, age, 5) < all(age)

  o Rule-based hierarchies

    define hierarchy profit_margin_hierarchy on item as
        level_1: low_profit_margin < level_0: all
            if (price - cost) < $50
        level_1: medium_profit_margin < level_0: all
            if ((price - cost) > $50) and ((price - cost) <= $250)
        level_1: high_profit_margin < level_0: all
            if (price - cost) > $250
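The set-grouping and rule-based hierarchies above amount to functions that map a low-level value to a higher-level concept. The Python sketch below spells out that mapping; the function names are hypothetical, and the boundaries are copied from the definitions as written, so a margin of exactly $50 falls under none of the three rules.

```python
def age_level1(age):
    """Map a level-2 age value to its level-1 concept (set-grouping hierarchy)."""
    if 20 <= age <= 39:
        return "young"
    if 40 <= age <= 59:
        return "middle_aged"
    if 60 <= age <= 89:
        return "senior"
    return None  # outside the ranges listed in the hierarchy


def profit_margin_level1(price, cost):
    """Map an item to its level_1 concept (rule-based hierarchy)."""
    margin = price - cost
    if margin < 50:
        return "low_profit_margin"
    if 50 < margin <= 250:
        return "medium_profit_margin"
    if margin > 250:
        return "high_profit_margin"
    return None  # a margin of exactly 50 is not covered by the rules as written


print(age_level1(25), age_level1(59), age_level1(90))
print(profit_margin_level1(100, 70), profit_margin_level1(400, 200))
```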

Syntax for pattern presentation and visualization specification

DMQL provides syntax that allows users to specify the display of discovered patterns in one or more forms:


    display as <result_form>

<result_form> = rules, tables, crosstabs, pie or bar charts, decision trees, cubes, curves, or surfaces

To facilitate interactive viewing at different concept levels, the following syntax is defined:

    <Multilevel_Manipulation> ::= roll up on <attribute_or_dimension>
        | drill down on <attribute_or_dimension>
        | add <attribute_or_dimension>
        | drop <attribute_or_dimension>
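A roll-up replaces each value with its ancestor at a coarser level of a concept hierarchy and then re-aggregates. The Python sketch below does this for a schema hierarchy of the form [date, month, quarter, year]; the `roll_up` function, the label formats, and the sample dates are hypothetical and only illustrate the idea.

```python
from collections import Counter
from datetime import date


def roll_up(d, level):
    """Generalize a date to a level of the hierarchy [date, month, quarter, year]."""
    if level == "date":
        return d.isoformat()
    if level == "month":
        return f"{d.year}-{d.month:02d}"
    if level == "quarter":
        return f"{d.year}-Q{(d.month - 1) // 3 + 1}"
    if level == "year":
        return str(d.year)
    raise ValueError(f"unknown level: {level}")


# Hypothetical transaction dates.
sales = [date(2024, 1, 15), date(2024, 2, 3), date(2024, 5, 20), date(2023, 12, 31)]

by_quarter = Counter(roll_up(d, "quarter") for d in sales)
by_year = Counter(roll_up(d, "year") for d in sales)
print(by_quarter)
print(by_year)
```

Drilling down is the reverse step: the same data is regrouped at a finer level, for example from year back to quarter.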

Putting it all together: a DMQL query

    use database AllElectronics_db
    use hierarchy location_hierarchy for B.address
    mine characteristics as customerPurchasing
    analyze count%
    in relevance to C.age, I.type, I.place_made
    from customer C, item I, purchases P, items_sold S, works_at W, branch B
    where I.item_ID = S.item_ID and S.trans_ID = P.trans_ID
        and P.cust_ID = C.cust_ID and P.method_paid = "AmEx"
        and P.empl_ID = W.empl_ID and W.branch_ID = B.branch_ID
        and B.address = "Canada" and I.price >= 100
    with noise threshold = 0.05
    display as table

Other Data Mining Languages and Standardization Efforts

* Association rule language specifications
  o MSQL (Imielinski & Virmani '99)
  o MineRule (Meo, Psaila, and Ceri '96)
  o Query flocks based on Datalog syntax (Tsur et al. '98)
* OLE DB for DM (Microsoft 2000) and, more recently, DMX (Microsoft SQL Server 2005)
  o Based on OLE, OLE DB, OLE DB for OLAP, C#
  o Integrating DBMS, data warehouse, and data mining
* CRISP-DM (CRoss-Industry Standard Process for Data Mining)
  o Providing a platform and process structure for effective data mining
  o Emphasizing the deployment of data mining technology to solve business problems
* DMML (Data Mining Mark-up Language) by DMG (www.dmg.org)
  o Providing a platform and process structure for effective data mining

Hierarchy Specification

A hierarchy is a root member of an alternate hierarchy, which is always at generation 2 of a dimension. Member value expressions are not allowed as hierarchy arguments.

Alternate hierarchies are applicable to aggregate storage databases only.

The dimension of the hierarchy argument passed to a function must match the dimension of the other arguments passed to the function. If they do not match, an error is returned and the query is aborted.
