AI Week 23
Machine Learning
Data Mining – Week 2

Lee McCluskey, room 2/07

Email lee@hud.ac.uk

http://scom.hud.ac.uk/scomtlm/cha2555/



Artform Research Group

Focus on one area: Data Mining involves discovering patterns in large databases or data warehouses for different purposes. It is the science of extracting meaningful information from (large) databases.

Applications - Market analysis and Retail, Decision support, Financial analysis, Discovering environmental trends

Two Types of Learning: Data Mining can be supervised (“Learning from Example”) or unsupervised (“Learning from Observation”)

Data Mining is often part of a larger process aimed at getting more out of data warehouses, and this involves data cleansing.

Data cleansing is the process of identifying and removing or correcting corrupted records in a database. This makes the data consistent with other similar data sets in the database. For example, the process may remove invalid post codes or spurious extreme values (e.g. -999999.999).
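The cleansing step can be pictured as a filter over records that also corrects what it can. This is a minimal illustration only: the record layout (`postcode` and `value` fields), the simplified postcode pattern and the sentinel test are assumptions made for the example, not part of any particular system.

```python
import re

# Hypothetical "missing value" code; -999999.999 is the example from the slide.
SENTINEL = -999999.999
# Simplified UK-style postcode check (an assumption; real validation is stricter).
POSTCODE_RE = re.compile(r"^[A-Z]{1,2}[0-9][A-Z0-9]? ?[0-9][A-Z]{2}$")


def cleanse(records):
    """Keep records with a plausible postcode and no sentinel value."""
    clean = []
    for rec in records:
        postcode = rec.get("postcode", "").strip().upper()
        value = rec.get("value")
        if not POSTCODE_RE.match(postcode):
            continue  # remove: invalid post code
        if value is None or value == SENTINEL:
            continue  # remove: missing or spurious extreme value
        clean.append({**rec, "postcode": postcode})  # correct: normalised case
    return clean


records = [
    {"postcode": "ab1 2cd", "value": 12.5},         # valid once upper-cased
    {"postcode": "XXXX", "value": 3.0},             # invalid post code
    {"postcode": "AB1 2CD", "value": -999999.999},  # sentinel value
]
print(cleanse(records))  # [{'postcode': 'AB1 2CD', 'value': 12.5}]
```

Only the first record survives, and it is corrected (upper-cased) rather than dropped.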


Association Rule Mining (ARM)

This is an “unsupervised learning activity” - briefly, looking for strong associations between features in data.

Definitions: A transactional database is a set of “transactions”, e.g. the details of individual sales.

A transaction can be thought of as an “item-set” where each item is an attribute-value pair:

{height = 6, temp = 20, weather = warm}

As a special case, we could have nominal item-sets:

{bread, cheese, milk}
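One simple way to hold item-sets in code is as Python `frozenset`s, which are hashable and support subset tests. The sketch below is an illustrative assumption, not a prescribed representation: attribute-value items become `(attribute, value)` tuples, nominal items become plain strings, and the helper name `contains` is hypothetical.

```python
# Attribute-value item-set: each item is an (attribute, value) pair.
weather_txn = frozenset({("height", 6), ("temp", 20), ("weather", "warm")})

# Nominal item-set (the special case): each item is just a name.
basket = frozenset({"bread", "cheese", "milk"})


def contains(transaction: frozenset, itemset: frozenset) -> bool:
    """A transaction contains an item-set if every item of the set appears in it."""
    return itemset <= transaction


print(contains(basket, frozenset({"bread", "milk"})))          # True
print(contains(weather_txn, frozenset({("weather", "cold")})))  # False
```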


Association Rule Mining (ARM): Important Definitions

An association rule is an expression

X => Y, where X and Y are item-sets and X ∩ Y = ∅ (they share no items).

The support of an association rule is the proportion of transactions in the database that contain X ∪ Y, that is

support(X => Y) = no. of transactions containing (X ∪ Y) / total no. of transactions

The confidence of an association rule is the probability that a transaction contains Y given that it contains X, that is

confidence(X => Y) = no. of transactions containing (X ∪ Y) / no. of transactions containing X
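The two definitions translate directly into counting. A minimal sketch, assuming transactions are stored as `frozenset`s of nominal items; the function names and the small basket data are made up for illustration, and returning 0.0 when X never occurs is a convention chosen here.

```python
from collections.abc import Sequence


def support(x: frozenset, y: frozenset, db: Sequence[frozenset]) -> float:
    """Proportion of transactions in db that contain X ∪ Y."""
    xy = x | y
    return sum(1 for t in db if xy <= t) / len(db)


def confidence(x: frozenset, y: frozenset, db: Sequence[frozenset]) -> float:
    """(no. of transactions containing X ∪ Y) / (no. containing X)."""
    n_x = sum(1 for t in db if x <= t)
    if n_x == 0:
        return 0.0  # convention chosen here when X never occurs
    xy = x | y
    return sum(1 for t in db if xy <= t) / n_x


# Made-up shop baskets, purely for illustration.
db = [
    frozenset({"bread", "cheese", "milk"}),
    frozenset({"bread", "milk"}),
    frozenset({"cheese", "wine"}),
    frozenset({"bread", "cheese"}),
]
x, y = frozenset({"bread"}), frozenset({"milk"})
print(support(x, y, db), confidence(x, y, db))
```

Here bread appears in three baskets and bread with milk in two, so the rule {bread} => {milk} has support 2/4 and confidence 2/3.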


Example: A trader deals in the following currencies in a series of 8 transactions:

1 Sterling Yen Dollar Euro

2 Dollar Euro Rand Sterling Ruble

3 Pesos Euro Ruble Rupee Yen

4 Rupee Sterling Ruble Euro Dollar

5 Sterling Dinars Rand Yen

6 Pesos Kroner Sterling Dollar

7 Ruble Rupee Kroner Sterling Pesos

8 Dollar Euro Sterling

What are the SUPPORT and CONFIDENCE of the following rules?

{Ruble} → {Rupee}

{Sterling, Euro} → {Ruble}

{Sterling, Euro} → {Ruble, Pesos}

Find an association rule from the set of transactions that has

- at least 2 items in its antecedent, and
- better support and better confidence than all the rules above.
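To check answers to the first question, the eight transactions can be typed in and the definitions applied directly. This is a sketch only; `count` and `rule_stats` are hypothetical helper names.

```python
# The eight transactions from the slide, one set of currencies each.
TRANSACTIONS = [
    {"Sterling", "Yen", "Dollar", "Euro"},
    {"Dollar", "Euro", "Rand", "Sterling", "Ruble"},
    {"Pesos", "Euro", "Ruble", "Rupee", "Yen"},
    {"Rupee", "Sterling", "Ruble", "Euro", "Dollar"},
    {"Sterling", "Dinars", "Rand", "Yen"},
    {"Pesos", "Kroner", "Sterling", "Dollar"},
    {"Ruble", "Rupee", "Kroner", "Sterling", "Pesos"},
    {"Dollar", "Euro", "Sterling"},
]


def count(items):
    """Number of transactions containing every currency in items."""
    return sum(1 for t in TRANSACTIONS if set(items) <= t)


def rule_stats(x, y):
    """Return (support, confidence) of the rule X -> Y."""
    n_xy = count(set(x) | set(y))
    n_x = count(x)
    return n_xy / len(TRANSACTIONS), (n_xy / n_x if n_x else 0.0)


for x, y in [({"Ruble"}, {"Rupee"}),
             ({"Sterling", "Euro"}, {"Ruble"}),
             ({"Sterling", "Euro"}, {"Ruble", "Pesos"})]:
    supp, conf = rule_stats(x, y)
    print(sorted(x), "->", sorted(y), f"support={supp:.3f} confidence={conf:.3f}")
```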


Aims of ARM

Given a transactional database D, the association rule problem is to find all rules that have supports and confidences greater than certain user-specified thresholds, denoted by minimum support (MinSupp) and minimum confidence (MinConf), respectively.

The aim is the discovery of the most significant associations between the items in a transactional data set. This process primarily involves discovering so-called frequent item-sets, i.e. item-sets whose support in the transactional data set is above MinSupp; rules are then formed from the frequent item-sets and kept only if their confidence is above MinConf.
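The aim above can be illustrated with a brute-force search on the currency data: find the frequent item-sets, then split each one into X => Y and keep the rules that meet MinConf. This is deliberately naive enumeration, not the Apriori algorithm or any other published method, and it is only practical for tiny data sets. The helper `mine_rules`, the `max_size` limit and the choice to treat both thresholds as inclusive (≥) are assumptions.

```python
from itertools import combinations

# The eight currency transactions from the exercise slide.
TRANSACTIONS = [
    {"Sterling", "Yen", "Dollar", "Euro"},
    {"Dollar", "Euro", "Rand", "Sterling", "Ruble"},
    {"Pesos", "Euro", "Ruble", "Rupee", "Yen"},
    {"Rupee", "Sterling", "Ruble", "Euro", "Dollar"},
    {"Sterling", "Dinars", "Rand", "Yen"},
    {"Pesos", "Kroner", "Sterling", "Dollar"},
    {"Ruble", "Rupee", "Kroner", "Sterling", "Pesos"},
    {"Dollar", "Euro", "Sterling"},
]


def mine_rules(db, min_supp, min_conf, max_size=3):
    """Brute-force ARM over item-sets of 2..max_size items.

    Returns (X, Y, support, confidence) tuples meeting both thresholds.
    """
    n = len(db)
    items = sorted(set().union(*db))

    def count(s):
        return sum(1 for t in db if s <= t)

    rules = []
    for size in range(2, max_size + 1):
        for combo in combinations(items, size):
            xy = frozenset(combo)
            supp = count(xy) / n
            if supp == 0 or supp < min_supp:
                continue  # not a frequent item-set, so no rule from it qualifies
            # Split the frequent item-set into antecedent X and consequent Y.
            for k in range(1, size):
                for lhs in combinations(combo, k):
                    x = frozenset(lhs)
                    conf = count(xy) / count(x)
                    if conf >= min_conf:
                        rules.append((x, xy - x, supp, conf))
    return rules


for x, y, s, c in mine_rules(TRANSACTIONS, min_supp=0.5, min_conf=0.8):
    print(sorted(x), "->", sorted(y), f"supp={s:.3f} conf={c:.3f}")
```

Real ARM algorithms avoid counting every candidate item-set; the brute-force loop is kept here only because it mirrors the two-stage description directly.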


Contrast: Classification Rule Mining (CRM)

The output of DM is a (set of) classification rule(s), WHERE classes are known a priori (supervised learning) and there is only one class on the RHS:

Features => C(1)

….

Features => C(n)


Classification Rule Mining

Size = medium, colour = green, shape = square => c1
Size = small, colour = red, shape = square => c1
Size = small, colour = blue, shape = circle => c1
Size = small, colour = green, shape = triangle => c2
Size = large, colour = white, shape = circle => c2

The aim is to find “hypotheses” that are

Characteristic – true of all members of a class

Discriminating – not true of ANY members of other classes
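Whether a candidate hypothesis is characteristic and discriminating can be checked mechanically against the five examples. In this sketch a hypothesis is any Python function from an example to True/False; the two candidate hypotheses are made up for illustration and do not come from the slides.

```python
# The five training examples from the slide: (attribute values, class).
EXAMPLES = [
    ({"size": "medium", "colour": "green", "shape": "square"}, "c1"),
    ({"size": "small", "colour": "red", "shape": "square"}, "c1"),
    ({"size": "small", "colour": "blue", "shape": "circle"}, "c1"),
    ({"size": "small", "colour": "green", "shape": "triangle"}, "c2"),
    ({"size": "large", "colour": "white", "shape": "circle"}, "c2"),
]


def is_characteristic(hypothesis, cls, examples=EXAMPLES):
    """True if the hypothesis holds for every member of class cls."""
    return all(hypothesis(x) for x, c in examples if c == cls)


def is_discriminating(hypothesis, cls, examples=EXAMPLES):
    """True if the hypothesis holds for no member of any other class."""
    return not any(hypothesis(x) for x, c in examples if c != cls)


def square_or_blue(x):
    """Hypothetical candidate for c1: shape = square OR colour = blue."""
    return x["shape"] == "square" or x["colour"] == "blue"


def is_small(x):
    """Hypothetical weaker candidate for c1: size = small."""
    return x["size"] == "small"


for h in (square_or_blue, is_small):
    print(h.__name__, is_characteristic(h, "c1"), is_discriminating(h, "c1"))
# prints:
# square_or_blue True True
# is_small False False
```

"size = small" fails both tests: the medium green square is in c1, and the small green triangle is in c2.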


Associative Classification (AC)

If we fuse ARM and CRM we get “Associative Classification”: use the association technique, but learn about particular items or item-sets.

Associative Classification is a branch of data mining that combines classification and association rule mining. In other words, it utilises association rule discovery methods on classification data sets.

Typically:

• Find association rules using ARM.

• Sift out the “Class Association Rules”: those that have the class of interest on their right-hand side.
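These steps can be sketched on the five examples from the classification slide by treating the class label as one more item. As a simplification, the sketch folds both steps together: it only scores candidate rules whose right-hand side is a single class item, which yields the same class association rules as mining everything and then sifting. Naive enumeration is used rather than any published AC algorithm; the function names, `max_len` and the thresholds are illustrative assumptions.

```python
from itertools import combinations

# Examples from the classification slide, written as item-sets.
# The class label is encoded as just another item ("class=c1").
ROWS = [
    ("medium", "green", "square", "c1"),
    ("small", "red", "square", "c1"),
    ("small", "blue", "circle", "c1"),
    ("small", "green", "triangle", "c2"),
    ("large", "white", "circle", "c2"),
]
TRANSACTIONS = [
    frozenset({f"size={s}", f"colour={c}", f"shape={sh}", f"class={k}"})
    for s, c, sh, k in ROWS
]


def count(itemset, transactions):
    """Number of transactions containing every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)


def class_association_rules(transactions, min_supp, min_conf, max_len=2):
    """Mine rules X => class, keeping only those with a class item on the RHS."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    classes = [i for i in items if i.startswith("class=")]
    features = [i for i in items if not i.startswith("class=")]
    rules = []
    for k in range(1, max_len + 1):
        for antecedent in combinations(features, k):
            x = frozenset(antecedent)
            n_x = count(x, transactions)
            if n_x == 0:
                continue  # antecedent never occurs, so there is no rule to score
            for cls in classes:
                n_xy = count(x | {cls}, transactions)
                supp, conf = n_xy / n, n_xy / n_x
                if supp >= min_supp and conf >= min_conf:
                    rules.append((x, cls, supp, conf))
    return rules


for x, cls, s, c in class_association_rules(TRANSACTIONS, 0.4, 1.0):
    print(sorted(x), "=>", cls, f"supp={s:.2f} conf={c:.2f}")
```

With these thresholds the only surviving rule is {shape=square} => class=c1.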


Example in Road Traffic Control


Data:

Numeric data records from individual CARS:

(date, time, position, actual speed, expected speed)

Textual data on INCIDENTS:

(date, time start, time cleared, position, severity, road type, area, incident category, cause, road-effect, traffic-effect, reporter, …)
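Before ARM can be applied, numeric records like these have to be turned into items. A minimal sketch, assuming both speeds are in the same units and that a car counts as "slow" when its actual speed is below half the expected speed; the `CarRecord` class, the `slow_ratio` threshold and the sample record are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CarRecord:
    """Hypothetical container for one numeric record from an individual car."""
    date: str
    time: str
    position: str
    actual_speed: float    # same units as expected_speed (assumed)
    expected_speed: float  # assumed > 0


def to_items(rec: CarRecord, slow_ratio: float = 0.5) -> frozenset:
    """Discretise a numeric record into nominal items for ARM.

    slow_ratio is a hypothetical threshold: the car counts as slow when
    actual_speed < slow_ratio * expected_speed.
    """
    ratio = rec.actual_speed / rec.expected_speed
    speed = "slow" if ratio < slow_ratio else "normal"
    return frozenset({f"position={rec.position}", f"speed={speed}"})


# Made-up example record at a hypothetical road segment.
rec = CarRecord("2010-03-01", "08:15", "segment-7", 20.0, 60.0)
print(sorted(to_items(rec)))  # ['position=segment-7', 'speed=slow']
```

The resulting item-sets can then be mined exactly like the currency transactions.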


Example in Road Traffic Control

• associations between variations in speeds and near-future incidents

• the effect of a particular type of incident (e.g. roadworks) on average speeds on nearby trunk roads

• looking for predictors of "heavy/slow traffic" incidents: look for associations with speed variations or accidents on roads downstream from the incident position (hence causing the incident)

• looking for associations between speeds around a bypass and a later "heavy traffic" incident within the town bypassed

• extraction of the roads that have most impact in causing congestion

• formulation of rules that can predict conditions after a period of roadworks or an incident (depending on the specific road, type of incident, etc.).
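The bypass example above amounts to measuring the confidence of a rule such as {slow traffic on the bypass} => {incident within W minutes}. A minimal sketch, assuming slow-traffic events have already been extracted as `(minute, road)` pairs and that incidents have already been filtered to the town in question and reduced to report times in minutes; the function name, window length and sample data are all hypothetical.

```python
def followed_by_incident(slow_events, incidents, road, window):
    """Score the rule {slow traffic on road} => {incident within window minutes}.

    slow_events: list of (minute, road) pairs when speeds dropped (assumed input)
    incidents:   list of minutes at which incidents were reported (assumed input)
    Returns (matches, occurrences of the antecedent, confidence).
    """
    times = [t for t, r in slow_events if r == road]
    matches = sum(
        1 for t in times if any(t < i <= t + window for i in incidents)
    )
    confidence = matches / len(times) if times else 0.0
    return matches, len(times), confidence


# Made-up data: minutes since midnight; road names are hypothetical.
slow = [(10, "bypass"), (50, "bypass"), (90, "bypass"), (30, "ring road")]
incidents = [25, 130]
print(followed_by_incident(slow, incidents, "bypass", window=45))
```

Two of the three bypass slowdowns are followed by an incident within 45 minutes, giving a confidence of 2/3 for this made-up data.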


Conclusions

Data Mining is a powerful set of techniques to help discover hidden knowledge.

It can be supervised or unsupervised.

ARM, CRM and AC are three important classes of technique used in DM.