mukeshnt
DATA MINING QUERY LANGUAGES
DMQL: A Data Mining Query Language
+ A data mining language must be designed to facilitate flexible and effective knowledge discovery.
+ Having a query language for data mining may help standardize the development of platforms for data mining systems.
+ But designing a language is challenging because data mining covers a wide spectrum of tasks and each task has different requirements.
+ Hence, the design of a language requires a deep understanding of the limitations and underlying mechanisms of the various kinds of tasks.
+ So, how would you design an efficient query language?
+ Based on the primitives discussed earlier.
+ DMQL allows mining of different kinds of knowledge from relational databases and data
warehouses at multiple levels of abstraction
+ Adopts SQL-like syntax
+ Hence, it can be easily integrated with relational query languages
+ Defined in BNF grammar
o [ ] represents zero or one occurrence
o { } represents zero or more occurrences
+ Words in sans serif represent keywords
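The bracket and brace conventions map directly onto regular-expression quantifiers. The Python sketch below is illustrative only: `MINE_CHAR` and `parse_mine_char` are hypothetical names, the regex covers just the rule `mine characteristics [as <pattern_name>] analyze <measure(s)>`, and the comma-separated form for several measures is an assumption, not something the grammar states.

```python
import re

# Hypothetical regex for the single rule
#   mine characteristics [as <pattern_name>] analyze <measure(s)>
# (...)? plays the role of BNF [ ]  (zero or one occurrence)
# (...)* plays the role of BNF { }  (zero or more occurrences)
MINE_CHAR = re.compile(
    r"^mine\s+characteristics"
    r"(?:\s+as\s+(?P<name>\w+))?"                      # [as <pattern_name>]
    r"\s+analyze\s+"
    r"(?P<measures>[\w%]+(?:\s*,\s*[\w%]+)*)$"         # one measure, then zero or more
)

def parse_mine_char(text):
    """Return (pattern_name, [measures]), or None if the text does not match."""
    m = MINE_CHAR.match(text.strip())
    if m is None:
        return None
    # Assumption: several measures are written as a comma-separated list.
    measures = [s.strip() for s in m.group("measures").split(",")]
    return m.group("name"), measures

print(parse_mine_char("mine characteristics as customerPurchasing analyze count%"))
# ('customerPurchasing', ['count%'])
```

Leaving out the `as` clause yields `None` for the pattern name, which is the "zero occurrences" case of the optional bracket.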
+ A DMQL can provide the ability to support ad-hoc and interactive data mining
+ By providing a standardized language like SQL, we hope to
o achieve an effect similar to the one SQL has had on relational databases
o provide a foundation for system development and evolution
o facilitate information exchange, technology transfer, commercialization, and wide acceptance
Design
+ DMQL is designed with the primitives described as follows:
+ Syntax for DMQL
o Syntax for specification of task-relevant data
o the kind of knowledge to be mined
o concept hierarchy specification
o pattern presentation and visualization
+ Putting it all together: a DMQL query
Syntax of DMQL
<DMQL> ::= <DMQL_Statement>; {<DMQL_Statement>}
<DMQL_Statement> ::= <Data_Mining_Statement>
    | <Concept_Hierarchy_Definition_Statement>
    | <Visualization_and_Presentation>
<Data_Mining_Statement> ::=
    use database <database_name> | use data warehouse <data_warehouse_name>
    {use hierarchy <hierarchy_name> for <attribute_or_dimension>}
    <Mine_Knowledge_Specification>
    in relevance to <attribute_or_dimension_list>
    from <relation(s)/cube(s)>
    [where <condition>]
    [order by <order_list>]
    [group by <grouping_list>]
    [having <condition>]
    {with [<interest_measure_name>] threshold = <threshold_value> [for <attribute(s)>]}
<Mine_Knowledge_Specification> ::= <Mine_Char> | <Mine_Desc> | <Mine_Assoc> | <Mine_Class>
<Mine_Char> ::= mine characteristics [as <pattern_name>] analyze <measure(s)>
<Mine_Desc> ::= mine comparison [as <pattern_name>] for <target_class> where <target_condition>
    {versus <contrast_class_i> where <contrast_condition_i>}
    analyze <measure(s)>
<Mine_Assoc> ::= mine association [as <pattern_name>] [matching <metapattern>]
<Mine_Class> ::= mine classification [as <pattern_name>] analyze <classifying_attribute_or_dimension>
<Concept_Hierarchy_Definition_Statement> ::= define hierarchy <hierarchy_name>
    [for <attribute_or_dimension>] on <relation_or_cube_or_hierarchy>
    as <hierarchy_description> [where <condition>]
<Visualization_and_Presentation> ::= display as <result_form> | {<Multilevel_Manipulation>}
<Multilevel_Manipulation> ::= roll up on <attribute_or_dimension>
    | drill down on <attribute_or_dimension>
    | add <attribute_or_dimension>
    | drop <attribute_or_dimension>
DMQL: Syntax for task-relevant data specification
+ Names of the relevant database or data warehouse, conditions, and relevant attributes or dimensions must be specified
o use database <database_name> or use data warehouse <data_warehouse_name>
o from <relation(s)/cube(s)> [where <condition>]
o in relevance to <attribute_or_dimension_list>
o order by <order_list>
o group by <grouping_list>
o having <condition>
Syntax for specifying the kind of knowledge to be mined
+ Characterization
o Mine_Knowledge_Specification ::=
      mine characteristics [as <pattern_name>]
      analyze <measure(s)>
o Specifies that characteristic descriptions are to be mined
o Analyze specifies aggregate measures
o Example: mine characteristics as customerPurchasing analyze count%
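As a rough illustration of what an aggregate measure such as `count%` reports, the Python sketch below groups tuples by the relevant attributes and computes each group's count and share of the total. The function name `characterize` and the sample rows are made up for this example; a real characterization would also generalize attribute values along concept hierarchies first.

```python
from collections import Counter

def characterize(rows, attributes):
    """Group rows by the chosen attributes; report count and count% per group."""
    counts = Counter(tuple(row[a] for a in attributes) for row in rows)
    total = sum(counts.values())
    return {
        key: {"count": n, "count%": 100.0 * n / total}
        for key, n in counts.items()
    }

# Made-up sample data, for illustration only.
rows = [
    {"age": "young", "type": "TV"},
    {"age": "young", "type": "TV"},
    {"age": "senior", "type": "radio"},
    {"age": "young", "type": "radio"},
]
summary = characterize(rows, ["age", "type"])
print(summary[("young", "TV")])   # {'count': 2, 'count%': 50.0}
```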
+ Discrimination
o Mine_Knowledge_Specification ::=
      mine comparison [as <pattern_name>]
      for <target_class> where <target_condition>
      {versus <contrast_class_i> where <contrast_condition_i>}
      analyze <measure(s)>
o Specifies that discriminant descriptions are to be mined; these compare a given target class of objects with one or more contrasting classes (thus referred to as comparison)
o Analyze specifies aggregate measures
o Example: mine comparison as purchaseGroups for bigSpenders where avg(I.price) >= $100 versus budgetSpenders where avg(I.price) < $100 analyze count
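The comparison in the example can be pictured as splitting customers into a target class and a contrasting class and then counting each. The Python sketch below assumes the cutoff in the example is an average item price of 100; `compare_spenders` and the purchase data are hypothetical.

```python
from collections import defaultdict

def compare_spenders(purchases, cutoff=100):
    """Count customers in the target and contrast classes by average item price."""
    prices = defaultdict(list)
    for customer, price in purchases:
        prices[customer].append(price)
    counts = {"bigSpenders": 0, "budgetSpenders": 0}
    for item_prices in prices.values():
        avg = sum(item_prices) / len(item_prices)
        # Target class: avg >= cutoff; contrasting class: avg < cutoff.
        label = "bigSpenders" if avg >= cutoff else "budgetSpenders"
        counts[label] += 1
    return counts

# Made-up (customer, price) pairs, for illustration only.
purchases = [("ann", 250), ("ann", 50), ("bob", 20), ("cy", 100), ("bob", 90)]
result = compare_spenders(purchases)
print(result)   # {'bigSpenders': 2, 'budgetSpenders': 1}
```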
+ Association
o Mine_Knowledge_Specification ::=
      mine associations [as <pattern_name>]
      [matching <metapattern>]
o Specifies the mining of patterns of association
o Can provide templates (metapatterns) with the matching clause
o Example: mine associations as buyingHabits matching P(X: customer, W) and Q(X, Y) => buys(X, Z)
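A metapattern acts as a template that candidate rules must fit. The Python sketch below reads the example template as "two antecedent predicates about the same customer imply a `buys` predicate about that customer". That reading, the triple encoding of predicates, and the name `matches_metapattern` are all assumptions made for illustration.

```python
def matches_metapattern(antecedents, consequent):
    """Check a rule against the template P(X, W) and Q(X, Y) => buys(X, Z).

    Each predicate is a (name, subject, value) triple (hypothetical encoding).
    """
    if len(antecedents) != 2:
        return False
    subject = consequent[1]
    same_subject = all(pred[1] == subject for pred in antecedents)
    return same_subject and consequent[0] == "buys"

# Made-up rule, for illustration only.
rule_body = [("age", "X", "30..39"), ("income", "X", "40K..49K")]
rule_head = ("buys", "X", "VCR")
print(matches_metapattern(rule_body, rule_head))   # True
```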
+ Classification
o Mine_Knowledge_Specification ::=
      mine classification [as <pattern_name>]
      analyze <classifying_attribute_or_dimension>
o Specifies that patterns for data classification are to be mined
o The analyze clause specifies that classification is performed according to the values of <classifying_attribute_or_dimension>
o For categorical attributes or dimensions, each value represents a class (such as low-risk, medium-risk, high-risk)
o For numeric attributes, each class is defined by a range (such as 20-39, 40-59, 60-89 for age)
o Example: mine classifications as classifyCustomerCreditRating analyze credit_rating
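For a numeric classifying attribute, assigning a class is a range lookup. The Python sketch below uses the age ranges quoted above; `AGE_RANGES` and `age_class` are hypothetical names, and returning `None` for values outside every range is a choice made for this example.

```python
# Class ranges for age, taken from the example above (inclusive bounds).
AGE_RANGES = [(20, 39), (40, 59), (60, 89)]

def age_class(age):
    """Return the class label for an age, or None if no range covers it."""
    for low, high in AGE_RANGES:
        if low <= age <= high:
            return f"{low}-{high}"
    return None

print(age_class(25))   # 20-39
print(age_class(59))   # 40-59
```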
To specify what concept hierarchies to use
use hierarchy <hierarchy> for <attribute_or_dimension>
We use different syntax to define different types of hierarchies
o schema hierarchies
    define hierarchy time_hierarchy on date as [date, month, quarter, year]
o set-grouping hierarchies
    define hierarchy age_hierarchy for age on customer as
        level1: {young, middle_aged, senior} < level0: all
        level2: {20, ..., 39} < level1: young
        level2: {40, ..., 59} < level1: middle_aged
        level2: {60, ..., 89} < level1: senior
o operation-derived hierarchies
    define hierarchy age_hierarchy for age on customer as
        {age_category(1), ..., age_category(5)} := cluster(default, age, 5) < all(age)
o rule-based hierarchies
    define hierarchy profit_margin_hierarchy on item as
        level_1: low_profit_margin < level_0: all
            if (price - cost) < $50
        level_1: medium_profit_margin < level_0: all
            if ((price - cost) > $50) and ((price - cost) <= $250)
        level_1: high_profit_margin < level_0: all
            if (price - cost) > $250
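A rule-based hierarchy is essentially a set of conditions that map a tuple to a higher-level concept. The Python sketch below follows the profit-margin rules literally, reading the thresholds as $50 and $250; the function name `profit_margin_level` is hypothetical. Note that, taken as written, the rules leave a margin of exactly $50 unassigned, which the sketch reports as `None` rather than guessing.

```python
def profit_margin_level(price, cost):
    """Map an item to its level_1 concept using the rules as written."""
    margin = price - cost
    if margin < 50:
        return "low_profit_margin"
    if 50 < margin <= 250:
        return "medium_profit_margin"
    if margin > 250:
        return "high_profit_margin"
    # margin == 50 is covered by none of the three rules as stated.
    return None

print(profit_margin_level(100, 70))    # low_profit_margin
print(profit_margin_level(200, 100))   # medium_profit_margin
print(profit_margin_level(400, 100))   # high_profit_margin
```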
Syntax for pattern presentation and visualization specification
We have syntax which allows users to specify the display of discovered patterns in one or more forms
display as <result_form>
<result_form> = rules, tables, crosstabs, pie or bar charts, decision trees, cubes, curves, or surfaces
To facilitate interactive viewing at different concept levels, the following syntax is defined:
Multilevel_Manipulation ::= roll up on <attribute_or_dimension>
    | drill down on <attribute_or_dimension>
    | add <attribute_or_dimension>
    | drop <attribute_or_dimension>
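Rolling up and drilling down amount to moving one step along a concept hierarchy and re-aggregating. The Python sketch below uses the schema hierarchy [date, month, quarter, year] for a time dimension; the helper names (`generalize`, `roll_up`, `drill_down`, `counts_at`), the label formats, and the sample dates are all hypothetical.

```python
from collections import Counter
from datetime import date

# Schema hierarchy for time, ordered from the lowest to the highest level.
LEVELS = ["date", "month", "quarter", "year"]

def generalize(d, level):
    """Map a date to its value at the given level of the hierarchy."""
    if level == "date":
        return d.isoformat()
    if level == "month":
        return f"{d.year}-{d.month:02d}"
    if level == "quarter":
        return f"{d.year}-Q{(d.month - 1) // 3 + 1}"
    if level == "year":
        return str(d.year)
    raise ValueError(f"unknown level: {level}")

def roll_up(level):
    """Move one level up; stay put at the top."""
    return LEVELS[min(LEVELS.index(level) + 1, len(LEVELS) - 1)]

def drill_down(level):
    """Move one level down; stay put at the bottom."""
    return LEVELS[max(LEVELS.index(level) - 1, 0)]

def counts_at(dates, level):
    """Aggregate counts at the requested level."""
    return Counter(generalize(d, level) for d in dates)

# Made-up sample data, for illustration only.
sales_dates = [date(2024, 1, 15), date(2024, 2, 3), date(2024, 5, 20), date(2025, 1, 2)]
level = roll_up(roll_up("date"))            # date -> month -> quarter
by_quarter = counts_at(sales_dates, level)  # 2024-Q1: 2, 2024-Q2: 1, 2025-Q1: 1
```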
Putting it all together: a DMQL query
use database AllElectronics_db
use hierarchy location_hierarchy for B.address
mine characteristics as customerPurchasing
analyze count%
in relevance to C.age, I.type, I.place_made
from customer C, item I, purchases P, items_sold S, works_at W, branch B
where I.item_ID = S.item_ID and S.trans_ID = P.trans_ID
    and P.cust_ID = C.cust_ID and P.method_paid = "AmEx"
    and P.empl_ID = W.empl_ID and W.branch_ID = B.branch_ID and B.address = "Canada"
    and I.price >= 100
with noise threshold = 0.05
display as table
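Because DMQL adopts SQL-like syntax, the data-selection part of this query (the in relevance to, from, and where clauses) translates almost directly into SQL. The Python sketch below runs such a translation against an in-memory SQLite database. The table layouts and sample rows are invented, the alias `B` for branch and the reading of the price condition as `I.price >= 100` are assumptions, and the literal match on `B.address` ignores the generalization that location_hierarchy would provide. The mining step itself is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schemas and rows, for illustration only.
conn.executescript("""
CREATE TABLE customer   (cust_ID INTEGER, age INTEGER);
CREATE TABLE item       (item_ID INTEGER, type TEXT, place_made TEXT, price REAL);
CREATE TABLE purchases  (trans_ID INTEGER, cust_ID INTEGER, empl_ID INTEGER, method_paid TEXT);
CREATE TABLE items_sold (trans_ID INTEGER, item_ID INTEGER);
CREATE TABLE works_at   (empl_ID INTEGER, branch_ID INTEGER);
CREATE TABLE branch     (branch_ID INTEGER, address TEXT);

INSERT INTO customer   VALUES (1, 34), (2, 51);
INSERT INTO item       VALUES (10, 'TV', 'Japan', 450.0), (11, 'cable', 'USA', 9.0);
INSERT INTO purchases  VALUES (100, 1, 7, 'AmEx'), (101, 2, 7, 'Visa');
INSERT INTO items_sold VALUES (100, 10), (100, 11), (101, 10);
INSERT INTO works_at   VALUES (7, 3);
INSERT INTO branch     VALUES (3, 'Canada');
""")

# SQL counterpart of the task-relevant data specification.
rows = conn.execute("""
SELECT C.age, I.type, I.place_made
FROM customer C, item I, purchases P, items_sold S, works_at W, branch B
WHERE I.item_ID = S.item_ID AND S.trans_ID = P.trans_ID
  AND P.cust_ID = C.cust_ID AND P.method_paid = 'AmEx'
  AND P.empl_ID = W.empl_ID AND W.branch_ID = B.branch_ID
  AND B.address = 'Canada' AND I.price >= 100
""").fetchall()

print(rows)   # [(34, 'TV', 'Japan')]
```

Only the AmEx transaction contributes, and within it only the item priced at or above 100 survives the filter.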
Other Data Mining Languages & Standardization Efforts
+ Association rule language specifications
o MSQL (Imielinski & Virmani'99)
o MineRule (Meo, Psaila and Ceri'96)
o Query flocks based on Datalog syntax (Tsur et al.'98)
+ OLE DB for DM (Microsoft'2000)
o Based on OLE, OLE DB, OLE DB for OLAP
o Integrating DBMS, data warehouse and data mining
+ CRISP-DM (CRoss-Industry Standard Process for Data Mining)
o Providing a platform and process structure for effective data mining
o Emphasizing the deployment of data mining technology to solve business problems
+ OLE DB for DM (Microsoft'2000) and recently DMX (Microsoft SQL Server 2005)
o Based on OLE, OLE DB, OLE DB for OLAP, C#
o Integrating DBMS, data warehouse and data mining
+ DMML (Data Mining Mark-up Language) by DMG (www.dmg.org)
o Providing a platform and process structure for effective data mining
Hierarchy Specification
A hierarchy is a root member of an alternate hierarchy, which is always at generation 2 of a dimension. Member value expressions are not allowed as hierarchy arguments.
Alternate hierarchies are applicable to aggregate storage databases only.
The dimension of the hierarchy argument passed to a function must match the dimension of the other arguments passed to the function. If they do not match, an error is returned and the query is aborted.