DATA MINING QUERY LANGUAGES
DMQL: A Data Mining Query Language
A data mining language must be designed to facilitate flexible and effective knowledge discovery.
Having a query language for data mining may help standardize the development of
platforms for data mining systems. But designing such a language is challenging because data mining covers a wide spectrum of
tasks and each task has different requirements. Hence, the design of a language requires a deep understanding of the limitations and
underlying mechanisms of the various kinds of tasks.
So, how would you design an efficient query language?
Based on the primitives discussed earlier.
DMQL allows mining of different kinds of knowledge from relational databases and data warehouses at multiple levels of abstraction
Adopts SQL-like syntax; hence, it can be easily integrated with relational query languages
Defined in BNF grammar
  [ ] represents 0 or one occurrence
  { } represents 0 or more occurrences
  Words in sans serif represent keywords
A DMQL can provide the ability to support ad hoc and interactive data mining
By providing a standardized language like SQL, we hope to achieve an effect similar to the one SQL has had on relational databases:
  Foundation for system development and evolution
  Facilitate information exchange, technology transfer, commercialization and wide acceptance
DMQL Syntax
DMQL is designed with the primitives described as follows:
Syntax for DMQL for specification of
  task-relevant data
  the kind of knowledge to be mined
  concept hierarchy specification
  pattern presentation and visualization
Putting it all together: a DMQL query
Syntax of DMQL
<DMQL> ::= <DMQL_Statement>; {<DMQL_Statement>}
<DMQL_Statement> ::= <Data_Mining_Statement>
  | <Concept_Hierarchy_Definition_Statement>
  | <Visualization_and_Presentation>
<Data_Mining_Statement> ::=
  use database <database_name> | use data warehouse <data_warehouse_name>
  {use hierarchy <hierarchy_name> for <attribute_or_dimension>}
  <Mine_Knowledge_Specification>
  in relevance to <attribute_or_dimension_list>
  from <relation(s)/cube(s)>
  [where <condition>]
  [order by <order_list>]
  [group by <grouping_list>]
  [having <condition>]
  {with [<interest_measure_name>] threshold = <threshold_value> [for <attribute(s)>]}
<Mine_Knowledge_Specification> ::= <Mine_Char> | <Mine_Desc> | <Mine_Assoc> | <Mine_Class>
<Mine_Char> ::= mine characteristics [as <pattern_name>] analyze <measure(s)>
<Mine_Desc> ::= mine comparison [as <pattern_name>] for <target_class> where <target_condition>
  {versus <contrast_class_i> where <contrast_condition_i>}
  analyze <measure(s)>
<Mine_Assoc> ::= mine associations [as <pattern_name>] [matching <metapattern>]
<Mine_Class> ::= mine classification [as <pattern_name>] analyze <classifying_attribute_or_dimension>
<Concept_Hierarchy_Definition_Statement> ::=
  define hierarchy <hierarchy_name> [for <attribute_or_dimension>] on <relation_or_cube_or_hierarchy>
  as <hierarchy_description>
  [where <condition>]
<Visualization_and_Presentation> ::= display as <result_form>
  | {<Multilevel_Manipulation>}
<Multilevel_Manipulation> ::= roll up on <attribute_or_dimension>
  | drill down on <attribute_or_dimension>
  | add <attribute_or_dimension>
  | drop <attribute_or_dimension>
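To make the BNF notation concrete, here is a minimal Python sketch of a recognizer for a single production, the characterization statement. It is only an illustration: the function name `parse_mine_char` and the regex-based approach are assumptions, not part of DMQL or of any existing DMQL implementation. It shows how an optional `[ ... ]` element in the grammar can map to an optional group.

```python
import re

# Hypothetical recognizer for one production of the grammar:
#   <Mine_Char> ::= mine characteristics [as <pattern_name>] analyze <measure(s)>
# The optional [ ... ] element becomes an optional regex group.
MINE_CHAR = re.compile(
    r"^mine\s+characteristics"
    r"(?:\s+as\s+(?P<pattern_name>\w+))?"  # [as <pattern_name>]
    r"\s+analyze\s+(?P<measures>.+)$",     # analyze <measure(s)>
    re.IGNORECASE,
)


def parse_mine_char(statement):
    """Return (pattern_name, measures), or None if the statement does not match."""
    match = MINE_CHAR.match(statement.strip())
    if match is None:
        return None
    # Assumption: several measures are written as a comma-separated list.
    measures = [m.strip() for m in match.group("measures").split(",")]
    return match.group("pattern_name"), measures
```

A statement without the optional `as` clause still parses, with `pattern_name` set to `None`; a statement for a different kind of knowledge (for example `mine comparison ...`) is rejected.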
DMQL: Syntax for task-relevant data specification
Names of the relevant database or data warehouse, conditions and relevant attributes or dimensions must be specified
  use database <database_name> or use data warehouse <data_warehouse_name>
  from <relation(s)/cube(s)> [where <condition>]
  in relevance to <attribute_or_dimension_list>
  order by <order_list>
  group by <grouping_list>
  having <condition>
Syntax for specifying the kind of knowledge to be mined
Characterization
<Mine_Knowledge_Specification> ::= mine characteristics [as <pattern_name>] analyze <measure(s)>
  Specifies that characteristic descriptions are to be mined
  Analyze specifies aggregate measures
  Example: mine characteristics as customerPurchasing analyze count%
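As a rough illustration of what `analyze count%` could produce, the following Python sketch groups tuples by the relevant attributes and reports both the count and its share of the total. The function name `characterize` and the dictionary-based row format are assumptions made for this example; DMQL itself does not prescribe how the measure is computed.

```python
from collections import Counter


def characterize(rows, attributes):
    """Group rows by the given attributes and report (count, count%) per group.

    rows: list of dicts (hypothetical row format).
    attributes: attribute names playing the role of the "in relevance to" list.
    """
    counts = Counter(tuple(row[a] for a in attributes) for row in rows)
    total = sum(counts.values())
    return {key: (n, 100.0 * n / total) for key, n in counts.items()}
```

For instance, with four purchase tuples of which two have `age = "young"` and `type = "TV"`, the group `("young", "TV")` is reported with a count of 2 and a count% of 50.0.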
Discrimination
<Mine_Knowledge_Specification> ::= mine comparison [as <pattern_name>] for <target_class> where <target_condition> {versus <contrast_class_i> where <contrast_condition_i>} analyze <measure(s)>
  Specifies that discriminant descriptions are to be mined; these compare a given target class of objects with one or more contrasting classes (thus referred to as comparison)
  Analyze specifies aggregate measures
  Example: mine comparison as purchaseGroups for bigSpenders where avg(I.price) >= $100 versus budgetSpenders where avg(I.price) < $100 analyze count
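The comparison example can be sketched in Python as follows. This is a simplified reading, not the definitive semantics: it assumes that `avg(I.price)` is taken per customer over that customer's purchased items, and the function name `compare_spenders` and the `(customer_id, price)` input format are hypothetical.

```python
from collections import defaultdict


def compare_spenders(purchases, threshold=100.0):
    """Count customers in the target class and in the contrasting class.

    purchases: iterable of (customer_id, price) pairs (hypothetical format).
    Assumption: the average price is computed per customer.
    """
    prices = defaultdict(list)
    for customer_id, price in purchases:
        prices[customer_id].append(price)

    counts = {"bigSpenders": 0, "budgetSpenders": 0}
    for customer_prices in prices.values():
        average = sum(customer_prices) / len(customer_prices)
        if average >= threshold:   # target condition: avg(I.price) >= $100
            counts["bigSpenders"] += 1
        else:                      # contrast condition: avg(I.price) < $100
            counts["budgetSpenders"] += 1
    return counts
```

The resulting counts correspond to the `analyze count` measure for each class.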
Association
<Mine_Knowledge_Specification> ::= mine associations [as <pattern_name>] [matching <metapattern>]
  Specifies the mining of patterns of association
  Can provide templates (metapatterns) with the matching clause
  Example: mine associations as buyingHabits matching P(X: customer, W) ^ Q(X, Y) => buys(X, Z)
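One way to picture how a metapattern constrains the search is to enumerate the rule templates it admits. The Python sketch below assumes that the predicate variables P and Q are instantiated with attributes of the customer relation; the function name and the attribute names used with it are illustrative only.

```python
from itertools import combinations


def instantiate_metapattern(customer_attributes):
    """Enumerate templates of the form P(X, W) ^ Q(X, Y) => buys(X, Z).

    Assumption: P and Q range over distinct attributes of the customer relation.
    """
    return [
        f"{p}(X, W) ^ {q}(X, Y) => buys(X, Z)"
        for p, q in combinations(customer_attributes, 2)
    ]
```

With the hypothetical attributes `age`, `income` and `occupation`, this yields three candidate templates, such as `age(X, W) ^ income(X, Y) => buys(X, Z)`. A mining system would then search only for rules that fit one of these templates.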
Classification
<Mine_Knowledge_Specification> ::= mine classification [as <pattern_name>] analyze <classifying_attribute_or_dimension>
  Specifies that patterns for data classification are to be mined
  Analyze clause specifies that classification is performed according to the values of <classifying_attribute_or_dimension>
  For categorical attributes or dimensions, each value represents a class (such as low-risk, medium-risk, high-risk)
  For numeric attributes, each class is defined by a range (such as 20-39, 40-59, 60-89 for age)
  Example: mine classification as classifyCustomerCreditRating analyze credit_rating
To specify what concept hierarchies to use
  use hierarchy <hierarchy> for <attribute_or_dimension>
We use different syntax to define different types of hierarchies
Schema hierarchies
  define hierarchy time_hierarchy on date as [date, month, quarter, year]
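A schema hierarchy such as `time_hierarchy` orders the attributes from the most specific (date) to the most general (year). The following Python sketch shows one way a single date value could be generalized to each level; the function name `roll_up_date` and the string formats of the results are assumptions for illustration.

```python
from datetime import date


def roll_up_date(day, level):
    """Generalize a date to one level of [date, month, quarter, year]."""
    if level == "date":
        return day.isoformat()
    if level == "month":
        return f"{day.year}-{day.month:02d}"
    if level == "quarter":
        # Months 1-3 map to Q1, 4-6 to Q2, and so on.
        return f"{day.year}-Q{(day.month - 1) // 3 + 1}"
    if level == "year":
        return str(day.year)
    raise ValueError(f"unknown level: {level}")
```

For example, 17 May 2024 generalizes to `2024-05` at the month level and to `2024-Q2` at the quarter level.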
Set-grouping hierarchies
  define hierarchy age_hierarchy for age on customer as
    level1: {young, middle_aged, senior} < level0: all
    level2: {20, ..., 39} < level1: young
    level2: {40, ..., 59} < level1: middle_aged
    level2: {60, ..., 89} < level1: senior
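The set-grouping hierarchy above can be read as a lookup from raw ages (level 2) to named groups (level 1) to `all` (level 0). A minimal Python sketch, with the names `AGE_HIERARCHY` and `generalize_age` chosen only for this example:

```python
# level1 groups and the level2 values they cover, as in age_hierarchy.
AGE_HIERARCHY = {
    "young": range(20, 40),        # {20, ..., 39}
    "middle_aged": range(40, 60),  # {40, ..., 59}
    "senior": range(60, 90),       # {60, ..., 89}
}


def generalize_age(age, level):
    """Return the value of `age` at the requested hierarchy level (0, 1 or 2)."""
    if level not in (0, 1, 2):
        raise ValueError(f"unknown level: {level}")
    if level == 2:
        return age
    if level == 0:
        return "all"
    for group, ages in AGE_HIERARCHY.items():
        if age in ages:
            return group
    return None  # age is outside the ranges defined by the hierarchy
```

Ages outside 20 to 89 are not covered by the hierarchy as defined, so the sketch returns `None` for them at level 1.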
Operation-derived hierarchies
  define hierarchy age_hierarchy for age on customer as
    {age_category(1), ..., age_category(5)} := cluster(default, age, 5) < all(age)
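The `cluster(default, age, 5)` operation derives five age categories from the data. The source does not say which algorithm `default` denotes, so the Python sketch below swaps in simple equal-width binning purely as a stand-in; the function name `age_categories` is likewise hypothetical.

```python
def age_categories(ages, k=5):
    """Assign each age to age_category(1..k).

    Assumption: equal-width binning stands in for the unspecified
    default clustering algorithm.
    """
    low, high = min(ages), max(ages)
    width = (high - low) / k or 1  # avoid a zero width when all ages are equal
    result = {}
    for age in ages:
        # The maximum age would fall just past the last bin, so clamp it.
        index = min(int((age - low) / width), k - 1) + 1
        result[age] = f"age_category({index})"
    return result
```

A real clustering algorithm would place the boundaries according to the distribution of the data rather than at equal intervals.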
Rule-based hierarchies
  define hierarchy profit_margin_hierarchy on item as
    level_1: low_profit_margin < level_0: all
      if (price - cost) < $50
    level_1: medium_profit_margin < level_0: all
      if ((price - cost) > $50) and ((price - cost) <= $250)
    level_1: high_profit_margin < level_0: all
      if (price - cost) > $250
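The three rules translate directly into a small function. This Python sketch follows the conditions exactly as written in the hierarchy definition; the function name `profit_margin_level` is an assumption for illustration.

```python
def profit_margin_level(price, cost):
    """Return the level_1 concept for an item, following the rules as written."""
    margin = price - cost
    if margin < 50:
        return "low_profit_margin"
    if 50 < margin <= 250:
        return "medium_profit_margin"
    if margin > 250:
        return "high_profit_margin"
    # A margin of exactly $50 satisfies none of the rules as written.
    return None
```

Note that a margin of exactly $50 is covered by neither the low rule (`< $50`) nor the medium rule (`> $50`), so the sketch returns `None` in that case.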
Syntax for pattern presentation and visualization specification
We have syntax which allows users to specify the display of discovered patterns in one or more forms
  display as <result_form>
  <result_form> = rules, tables, crosstabs, pie or bar charts, decision trees, cubes, curves, or surfaces
To facilitate interactive viewing at different concept levels, the following syntax is defined:
<Multilevel_Manipulation> ::= roll up on <attribute_or_dimension>
  | drill down on <attribute_or_dimension>
  | add <attribute_or_dimension>
  | drop <attribute_or_dimension>
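The four manipulation operations can be pictured as changes to a view that records, for each displayed attribute, its current level in a concept hierarchy. The Python sketch below is one possible model under that assumption; the view representation and all function names are hypothetical.

```python
def roll_up(view, attribute, hierarchies):
    """Move one level up (more general), stopping at the top of the hierarchy."""
    top = len(hierarchies[attribute]) - 1
    new_view = dict(view)
    new_view[attribute] = min(view[attribute] + 1, top)
    return new_view


def drill_down(view, attribute):
    """Move one level down (more specific), stopping at the lowest level."""
    new_view = dict(view)
    new_view[attribute] = max(view[attribute] - 1, 0)
    return new_view


def add_attribute(view, attribute):
    """Add an attribute to the view at its lowest level."""
    new_view = dict(view)
    new_view.setdefault(attribute, 0)
    return new_view


def drop_attribute(view, attribute):
    """Remove an attribute from the view."""
    return {name: level for name, level in view.items() if name != attribute}
```

Here a view such as `{"date": 1}` with the hierarchy `["date", "month", "quarter", "year"]` means the patterns are currently shown at the month level; rolling up on `date` moves the view to the quarter level.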
Putting it all together: a DMQL query
  use database AllElectronics_db
  use hierarchy location_hierarchy for B.address
  mine characteristics as customerPurchasing
  analyze count%
  in relevance to C.age, I.type, I.place_made
  from customer C, item I, purchases P, items_sold S, works_at W, branch B
  where I.item_ID = S.item_ID and S.trans_ID = P.trans_ID
    and P.cust_ID = C.cust_ID and P.method_paid = "AmEx"
    and P.empl_ID = W.empl_ID and W.branch_ID = B.branch_ID and B.address = "Canada"
    and I.price >= 100
  with noise threshold = 0.05
  display as table
Other Data Mining Languages & Standardization Efforts
Association rule language specifications
  MSQL (Imielinski & Virmani '99)
  MineRule (Meo, Psaila and Ceri '96)
  Query flocks based on Datalog syntax (Tsur et al. '98)
OLE DB for DM (Microsoft 2000)
  Based on OLE, OLE DB, OLE DB for OLAP
  Integrating DBMS, data warehouse and data mining
CRISP-DM (CRoss-Industry Standard Process for Data Mining)
  Providing a platform and process structure for effective data mining
  Emphasizing the deployment of data mining technology to solve business problems
OLE DB for DM (Microsoft 2000) and, more recently, DMX (Microsoft SQL Server 2005)
  Based on OLE, OLE DB, OLE DB for OLAP, C#
  Integrating DBMS, data warehouse and data mining
DMML (Data Mining Mark-up Language) by DMG (www.dmg.org)
  Providing a platform and process structure for effective data mining
Hierarchy Specification
A hierarchy is a root member of an alternate hierarchy, which is always at generation 2 of a dimension. Member value expressions are not allowed as hierarchy arguments.
Alternate hierarchies are applicable to aggregate storage databases only.
The dimension of the hierarchy argument passed to a function must match the dimension of the other arguments passed to the function. If they do not match, an error is returned and the query is aborted.