Spatial Data Mining

Yang Yubin

Joint Laboratory for Geoinformation Science

The Chinese University of Hong Kong

yangyubin@cuhk.edu.hk

2004/09/09 Hong Kong Observatory Hong Kong Meteorological Society

Agenda

• Motivation and General Description
• Data Mining: Basic Concepts
• Data Mining Techniques
• Spatial Data Mining
• Spatial Data Mining Scenarios in Meteorology and Weather Forecasting
• Conclusions
• Questions & Discussions

Why do we need Data Mining?

• Large number of records (cases) (10^8 to 10^12 bytes)
  – One thousand (10^3) bytes = 1 kilobyte (KB)
  – One million (10^6) bytes = 1 megabyte (MB)
  – One billion (10^9) bytes = 1 gigabyte (GB)
  – One trillion (10^12) bytes = 1 terabyte (TB)
• High dimensional data (variables)
  – 10 to 10^4 attributes
• Only a small portion, typically 5% to 10%, of the collected data is ever analyzed
• We are drowning in data, but starving for knowledge!

Scientific Viewpoint

• Data collected and stored at enormous speeds (Gbyte/hour)
  – remote sensor on a satellite
  – telescope scanning the skies
  – scientific simulations generating terabytes of data
• Classical modeling techniques are infeasible
• Data reduction
• Cataloging, classifying, segmenting data
• Helps scientists in Hypothesis Formation

Current Situations (1)

• Great efforts for construction and maintenance of large information databases
• Data cannot be analyzed by standard statistical methods
  – numerous missing records
  – data are qualitative rather than quantitative
• We do not always know what information might be represented or how relevant it might be to the questions

Current Situations (2)

• The ways and means of using all this data lag far behind the increase in available data
  – Information:
    • can only be found with a lot of coincidence (internet)
    • is not explicitly available (company databases)
    • is only accessible to human eyes by using lots of processing power (astronomical, meteorological and earth observation data)
• This leads to a clear demand for means of uncovering the information and knowledge hidden in the massive quantities of data

What is Data Mining?

• Data mining is concerned with solving problems by analyzing existing data
• “Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns from huge amount of data”
• Alternative name: Knowledge Discovery in Databases (KDD)
  – A term that originated in the Artificial Intelligence (AI) field
  – KDD consists of several steps (one of which is Data Mining)

Data Mining vs. KDD

• Knowledge Discovery in Databases (KDD): The whole process of finding useful information and patterns in data

• Data Mining: Use of algorithms to extract the information and patterns derived by the KDD process

• Data mining is the core of the knowledge discovery process

KDD Process

• Selection: Obtain data from various sources.
• Preprocessing: Cleanse data.
• Transformation: Convert to common format. Transform to new format.
• Data Mining: Obtain desired results.
• Interpretation/Evaluation: Present results to user in meaningful manner.
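The five KDD steps can be sketched as a chain of small functions. This is only an illustration: the station records, the field names (`temp_c`, `temp_f`) and the per-station mean used as a stand-in "pattern" are all hypothetical and do not come from the talk.

```python
from statistics import mean

# Hypothetical raw records from two sources; None marks a missing reading.
source_a = [{"station": "A", "temp_c": 28.1}, {"station": "A", "temp_c": None}]
source_b = [{"station": "B", "temp_f": 86.0}, {"station": "B", "temp_f": 82.4}]

def select(*sources):
    """Selection: gather records from the various sources."""
    return [rec for src in sources for rec in src]

def preprocess(records):
    """Preprocessing: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]

def transform(records):
    """Transformation: convert every temperature to a common unit (Celsius)."""
    out = []
    for r in records:
        temp = r["temp_c"] if "temp_c" in r else (r["temp_f"] - 32) * 5 / 9
        out.append({"station": r["station"], "temp_c": round(temp, 1)})
    return out

def mine(records):
    """Data mining: a per-station mean stands in for a real mining algorithm."""
    stations = sorted({r["station"] for r in records})
    return {s: mean(r["temp_c"] for r in records if r["station"] == s)
            for s in stations}

def interpret(patterns):
    """Interpretation/Evaluation: present results in a readable form."""
    return [f"Station {s}: mean {t:.1f} C" for s, t in patterns.items()]

report = interpret(mine(transform(preprocess(select(source_a, source_b)))))
print("\n".join(report))
```

Each stage takes the previous stage's output, so any single step can be swapped out without touching the others.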

Data Mining: A KDD Process

– Data mining: core of knowledge discovery process

[Figure: KDD pipeline. Databases → Data Cleaning, Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation]

Typical Data Mining Architecture

[Figure: layered architecture. Databases and Data Warehouse (data cleaning & data integration, filtering) → Database or data warehouse server → Data mining engine → Pattern evaluation → Graphical user interface, alongside a Knowledge-base]

Data Mining: Confluence of Multiple Disciplines

[Figure: Data Mining at the center, surrounded by Database Systems, Statistics, Machine Learning, Visualization, Information Theory, Algorithms, and other disciplines]

Data Mining is:

• A “hot” word for a class of techniques that find patterns in data
• A user-centric, interactive process which leverages analysis technologies and computing power
• A group of techniques that find relationships that have not previously been discovered
• Not reliant on an existing database
• A relatively easy task that requires knowledge of the business problem/subject matter expertise

Experts and clients are needed to:

• Define and redefine problems

• Determine relevant aspects of the problem

• Supply the data

• Remove errors from the data

• Provide constraints on possible patterns

• Interpret patterns and possibly reject implausible ones

• Evaluate predicted effects…

Primary Data Mining Tasks (1)

• Descriptive Modeling
  – Finding a compact description for large dataset [Concept Description]
  – Clustering people or things into groups based on their attributes [Clustering]
  – Associating what events are likely to occur together [Association Rule]
  – Sequencing what events are likely to lead to later events [Sequential Pattern Analysis]
  – Discovering the most significant changes [Deviation Detection]

Primary Data Mining Tasks (2)

• Predictive Modeling
  – Classifying people or things into groups by recognizing patterns [Classification]
  – Forecasting what may happen in the future by mapping a data item to a real-valued prediction variable [Regression]

Concept Description

• Characterization: provides a concise and succinct summarization of the given collection of data

• Discrimination: provides descriptions comparing two or more collections of data

• can handle complex data types of the attributes

• a more automated process

Concept description: Characterization

Initial Relation

Name | Gender | Major | Birth-Place | Birth_date | Residence | Phone # | GPA
Jim Woodman | M | CS | Vancouver, BC, Canada | 8-12-76 | 3511 Main St., Richmond | 687-4598 | 3.67
Scott Lachance | M | CS | Montreal, Que, Canada | 28-7-75 | 345 1st Ave., Richmond | 253-9106 | 3.70
Laura Lee | F | Physics | Seattle, WA, USA | 25-8-70 | 125 Austin Ave., Burnaby | 420-5232 | 3.83
… | … | … | … | … | … | … | …

Generalization applied to each attribute

Name | Gender | Major | Birth-Place | Birth_date | Residence | Phone # | GPA
Removed | Retained | Sci, Eng, Bus | Country | Age range | City | Removed | Excl, VG, ..

Generalized Relation

Gender | Major | Birth_region | Age_range | Residence | GPA | Count
M | Science | Canada | 20-25 | Richmond | Very-good | 16
F | Science | Foreign | 25-30 | Burnaby | Excellent | 22
… | … | … | … | … | … | …

Counts by Gender and Birth_Region

Gender | Canada | Foreign | Total
M | 16 | 14 | 30
F | 10 | 22 | 32
Total | 26 | 36 | 62

Clustering

• Cluster: a collection of data objects
  – Similar to one another within the same cluster
  – Dissimilar to the objects in other clusters
• Clustering
  – Grouping a set of data objects into clusters based on the principle: maximizing the intra-class similarity and minimizing the interclass similarity
• Example
  – Land use: Identification of areas of similar land use in an earth observation database
  – City-planning: Identifying groups of houses according to their house type, value, and geographical location
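As one way to make the clustering principle concrete, here is a minimal k-means sketch. K-means is only one of many clustering methods and is chosen here for brevity; the house coordinates are hypothetical, and the "first k points" initialisation is a simplification that keeps the example deterministic.

```python
import math

def kmeans(points, k, iterations=10):
    """Very small k-means: returns (centroids, clusters)."""
    # Simplified, deterministic initialisation: use the first k points.
    centroids = list(points[:k])
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical house locations (x, y) forming two obvious groups.
houses = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 9.5), (0.5, 1.5), (8.5, 9.0)]
centroids, clusters = kmeans(houses, k=2)
print(centroids)
print(clusters)
```

In a city-planning setting the feature vector would also carry house type and value, suitably scaled, rather than location alone.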

Association rule

• Association (correlation and causality)
  – age(X, “20..29”) ^ income(X, “20..29K”) → buys(X, “PC”) [support = 2%, confidence = 60%]
• Association rule mining
  – Finding frequent patterns, associations, correlations among sets of items or objects in transaction databases, relational databases, and other information repositories
  – Frequent pattern: pattern (set of items, sequence, etc.) that occurs frequently in a database
• Motivation: finding regularities in data
  – What products were often purchased together?

Example: Association rule

• Itemsets A1, A2 ⊆ {a1, …, ak}
• Find all the rules A1 → A2 with min confidence and support
  – support, s: probability that a transaction contains A1 ∪ A2
  – confidence, c: conditional probability that a transaction having A1 also contains A2

Transaction-id | Items bought
10 | a1, a2, a3
20 | a1, a3
30 | a1, a4
40 | a2, a5, a6

Let min_support = 50%, min_conf = 50%:

a1 → a3 (50%, 66.7%)
a3 → a1 (50%, 100%)
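The support and confidence figures for this transaction table can be checked with a few lines of Python. The function names `support` and `confidence` are illustrative, not taken from any particular library.

```python
# The four transactions from the table above.
transactions = {
    10: {"a1", "a2", "a3"},
    20: {"a1", "a3"},
    30: {"a1", "a4"},
    40: {"a2", "a5", "a6"},
}

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for items in transactions.values() if itemset <= items)
    return hits / len(transactions)

def confidence(antecedent, consequent):
    """Estimate of P(consequent | antecedent) from the transactions."""
    return support(antecedent | consequent) / support(antecedent)

for lhs, rhs in [({"a1"}, {"a3"}), ({"a3"}, {"a1"})]:
    s = support(lhs | rhs)
    c = confidence(lhs, rhs)
    print(f"{sorted(lhs)} -> {sorted(rhs)}: support={s:.1%}, confidence={c:.1%}")
```

Both rules share the same support because support depends only on the combined itemset {a1, a3}; the confidences differ because a1 appears in three transactions while a3 appears in two.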

Sequential Pattern Analysis

• Given a set of sequences, find the complete set of frequent subsequences
• Applications of sequential pattern
  – Customer shopping sequences:
    • First buy computer, then CD-ROM, and then digital camera, within 3 months.
  – Weblog click streams
  – Telephone calling patterns

SID | sequence
10 | <a(abc)(ac)d(cf)>
20 | <(ad)c(bc)(ae)>
30 | <(ef)(ab)(df)cb>
40 | <eg(af)cbc>

Given support threshold min_sup = 2, <(ab)c> is a sequential pattern
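The claim about <(ab)c> can be verified by counting how many sequences contain it as a subsequence. In this sketch each sequence is written as a list of item sets, and a pattern matches when its elements are subsets of sequence elements appearing in the same order; the helper names are illustrative.

```python
# The sequence database from the table, one set per element.
sequences = {
    10: [{"a"}, {"a", "b", "c"}, {"a", "c"}, {"d"}, {"c", "f"}],
    20: [{"a", "d"}, {"c"}, {"b", "c"}, {"a", "e"}],
    30: [{"e", "f"}, {"a", "b"}, {"d", "f"}, {"c"}, {"b"}],
    40: [{"e"}, {"g"}, {"a", "f"}, {"c"}, {"b"}, {"c"}],
}

def contains(sequence, pattern):
    """True if pattern is a subsequence of sequence (greedy earliest match)."""
    pos = 0
    for element in pattern:
        # Skip forward to the first remaining element that covers this one.
        while pos < len(sequence) and not element <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True

def support_count(pattern):
    """Number of sequences in the database that contain the pattern."""
    return sum(contains(seq, pattern) for seq in sequences.values())

pattern = [{"a", "b"}, {"c"}]  # <(ab)c>
matching = [sid for sid, seq in sequences.items() if contains(seq, pattern)]
print(matching, support_count(pattern))
```

Sequences 10 and 30 contain the pattern, which meets min_sup = 2. A real miner would enumerate candidate patterns rather than test a single one.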

Deviation Detection

• Outlier analysis
  – Outlier: a data object that does not comply with the general behavior of the data
  – It can be considered as noise or exception but is quite useful in fraud detection, rare events analysis
• Trend and evolution analysis
  – Trend and deviation: regression analysis
  – Periodicity analysis
  – Similarity-based analysis
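A very simple form of outlier analysis flags values that lie far from the mean in units of standard deviation. The sketch below uses that z-score rule purely as an illustration; the readings and the threshold of 2.0 are hypothetical choices, and the slides do not prescribe a particular method.

```python
from statistics import mean, pstdev

# Hypothetical daily temperature readings with one unusual value.
readings = [21.0, 22.5, 21.8, 22.1, 35.0, 21.4, 22.0]

def find_outliers(values, threshold=2.0):
    """Return the values whose z-score magnitude exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]

outliers = find_outliers(readings)
print(outliers)
```

Because the outlier itself inflates the mean and standard deviation, robust alternatives based on the median are often preferred on real data.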

Classification and Regression

• Classification:
  – constructs a model (classifier) based on the training set and uses it in classifying new data
  – Example: Climate Classification, …
• Regression:
  – models continuous-valued functions, i.e., predicts unknown or missing values
  – Example: stock trends prediction, …

Classification (1): Model Construction

Training Data

NAME | RANK | YEARS | TENURED
Mike | Assistant Prof | 3 | no
Mary | Assistant Prof | 7 | yes
Bill | Professor | 2 | yes
Jim | Associate Prof | 7 | yes
Dave | Assistant Prof | 6 | no
Anne | Associate Prof | 3 | no

Training Data → Classification Algorithms → Classifier (Model):

IF rank = ‘professor’ OR years > 6 THEN tenured = ‘yes’

Classification (2): Prediction Using the Model

Testing Data (passed to the Classifier)

NAME | RANK | YEARS | TENURED
Tom | Assistant Prof | 2 | no
Merlisa | Associate Prof | 7 | no
George | Professor | 5 | yes
Joseph | Assistant Prof | 7 | yes

Unseen Data: (Jeff, Professor, 4)

Tenured?
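The rule learned on the model construction slide can be applied directly to the testing data and to the unseen record. The sketch hard-codes that rule rather than learning it, so it illustrates the prediction step only; the function and variable names are illustrative.

```python
def classify(rank, years):
    """The slide's rule: IF rank = 'professor' OR years > 6 THEN tenured = 'yes'."""
    return "yes" if rank.lower() == "professor" or years > 6 else "no"

# (name, rank, years, actual tenured label) as given on the slides.
training = [
    ("Mike", "Assistant Prof", 3, "no"),
    ("Mary", "Assistant Prof", 7, "yes"),
    ("Bill", "Professor", 2, "yes"),
    ("Jim", "Associate Prof", 7, "yes"),
    ("Dave", "Assistant Prof", 6, "no"),
    ("Anne", "Associate Prof", 3, "no"),
]
testing = [
    ("Tom", "Assistant Prof", 2, "no"),
    ("Merlisa", "Associate Prof", 7, "no"),
    ("George", "Professor", 5, "yes"),
    ("Joseph", "Assistant Prof", 7, "yes"),
]

def accuracy(rows):
    """Share of rows where the rule's prediction equals the actual label."""
    correct = sum(classify(rank, years) == label for _, rank, years, label in rows)
    return correct / len(rows)

print("training accuracy:", accuracy(training))
print("testing accuracy:", accuracy(testing))
print("Jeff tenured?", classify("Professor", 4))
```

The rule fits every training row but misclassifies Merlisa in the testing data, which is why a model is evaluated on data it was not built from. For Jeff the rule answers "yes" because his rank is Professor.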

Classification Techniques

• Decision Tree Induction
• Bayesian Classification
• Neural Networks
• Genetic Algorithms
• Fuzzy Set and Logic

Regression

• Regression is similar to classification
  – First, construct a model
  – Second, use model to predict unknown value
• Methods
  – Linear and multiple regression
  – Non-linear regression
• Regression is different from classification
  – Classification predicts categorical class labels
  – Regression models continuous-valued functions
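The two-step pattern (construct a model, then predict an unknown value) looks like this for simple linear regression fitted by ordinary least squares. The data points are hypothetical and the helper name `fit_line` is illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Step 1: construct the model from hypothetical observations.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.1, 5.9, 8.0, 10.0]
slope, intercept = fit_line(xs, ys)

# Step 2: use the model to predict a continuous value for an unseen x.
prediction = slope * 6.0 + intercept
print(slope, intercept, prediction)
```

Unlike the classifier, which returns a label such as "yes" or "no", the output here is a real number.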

Are All the “Discovered” Patterns Interesting?

• A data mining task may generate thousands of patterns, not all of them are interesting.
• Interestingness measures:
  – A pattern is interesting if it is easily understood by humans, valid on new or test data with some degree of certainty, potentially useful, novel, or validates some hypothesis that a user seeks to confirm
  – Objective vs. Subjective interestingness measures:
    • Objective: based on statistics and structures of patterns, e.g., support, confidence, etc.
    • Subjective: based on user’s belief in the data, e.g., unexpectedness, novelty, executability, etc.

Spatial Data Mining

• Spatial Patterns
  – Spatial outliers
  – Location prediction
  – Associations, co-locations
  – Hotspots, clustering, trends, …
• Primary Tasks
  – Mining Spatial Association Rules
  – Spatial Classification and Prediction
  – Spatial Data Clustering Analysis
  – Spatial Outlier Analysis
• Example: Unusual warming of Pacific ocean (El Niño) affects weather in USA…

Spatial Data Mining Results

• Understanding spatial data, discovering relationships between spatial and nonspatial data, construction of spatial knowledge bases, etc.
• In various forms
  – The description of the general weather patterns in a set of geographic regions is a spatial characteristic rule.
  – The comparison of two weather patterns in two geographic regions is a spatial discriminant rule.
  – A rule like “most cities in Canada are close to the Canada-US border” is a spatial association rule
    • near(x, coast) ^ southeast(x, USA) → hurricane(x), (70%)
  – Others: spatial clusters, …

What is Spatial Data?

• The data related to objects that occupy space
  – traffic, bird habitats, global climate, logistics, ...
• Object types:
  – Points, Lines, Polygons, etc.
• Used in/for: GIS (Geographic Information Systems), Meteorology, Astronomy, Environmental studies, etc.

Basic Concepts (1)

• Spatial data mining follows the same functions as data mining, with the end objective of finding patterns in geography, meteorology, etc.
• The main difference (spatial autocorrelation)
  – the neighbors of a spatial object may have an influence on it and therefore have to be considered as well
• Spatial attributes
  – Topological
    • adjacency or inclusion information
  – Geometric
    • position (longitude/latitude), area, perimeter, boundary polygon
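Spatial autocorrelation can be quantified in several ways; Moran's I is a common statistic, used here as an illustration (the slides do not name a specific measure). The sketch computes it on a small grid with rook adjacency (cells sharing an edge are neighbors, each with weight 1). Both grids are hypothetical.

```python
def morans_i(grid):
    """Moran's I for a rectangular grid using rook (edge-sharing) neighbors."""
    rows, cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    n = len(values)
    mean = sum(values) / n
    cross = 0.0      # sum over neighbor pairs of (x_i - mean) * (x_j - mean)
    weight_sum = 0   # total number of (directed) neighbor links
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    cross += (grid[r][c] - mean) * (grid[rr][cc] - mean)
                    weight_sum += 1
    variance_sum = sum((v - mean) ** 2 for v in values)
    return (n / weight_sum) * cross / variance_sum

# Similar values sit next to each other: positive autocorrelation.
clustered = [[1, 1, 0, 0]] * 4
# Every neighbor differs: negative autocorrelation.
checkerboard = [[(r + c) % 2 for c in range(4)] for r in range(4)]

print(morans_i(clustered), morans_i(checkerboard))
```

Values near +1 indicate that neighbors resemble each other, values near -1 that they differ, and values near 0 suggest no spatial structure. Methods that assume independent samples are unreliable in the first two cases.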

Basic Concepts (2)

• Spatial neighborhood
  – Topological relation
    • “intersect”, “overlap”, “disjoint”, …
  – Distance relation
    • “close_to”, “far_away”, …
  – Direction/orientation relation
    • “left_of”, “west_of”, …
• Global model might be inconsistent with regional models

[Figure: Global Model vs. Local Model]
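The three kinds of neighborhood relation can be sketched as simple predicates. This illustration assumes planar coordinates with x increasing eastward, points as `(x, y)` tuples and regions as axis-aligned rectangles `(xmin, ymin, xmax, ymax)`; real GIS software works with general polygons and geodetic distances.

```python
import math

def intersects(a, b):
    """Topological relation: do two axis-aligned rectangles share any point?"""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return not (ax1 < bx0 or bx1 < ax0 or ay1 < by0 or by1 < ay0)

def disjoint(a, b):
    """Topological relation: the rectangles have no point in common."""
    return not intersects(a, b)

def close_to(p, q, threshold):
    """Distance relation: Euclidean distance within a chosen threshold."""
    return math.dist(p, q) <= threshold

def west_of(p, q):
    """Direction relation: p lies west of q (smaller x coordinate)."""
    return p[0] < q[0]

# Hypothetical objects.
station, airport = (0.0, 0.0), (3.0, 4.0)
region_a, region_b, region_c = (0, 0, 2, 2), (1, 1, 3, 3), (5, 5, 6, 6)

print(close_to(station, airport, threshold=5.0))
print(west_of(station, airport))
print(intersects(region_a, region_b), disjoint(region_a, region_c))
```

Predicates like these are the building blocks of spatial association rules, where they appear as the conditions on the left-hand side.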

Applications

• NASA Earth Observing System (EOS): Earth science data
• National Inst. of Justice: crime mapping
• Census Bureau, Dept. of Commerce: census data
• Dept. of Transportation (DOT): traffic data
• National Inst. of Health (NIH): cancer clusters
• ……

Example: What Kind of Houses Are Highly Valued?—Associative Classification

Meteorological Data Mining

• Motivation
  – Many analysis methods must be applied to fast-growing data for climate studies
• Result
  – Appropriate presentation instruments (graphs, maps, reports, etc.) must be applied
• Examples
  – Spatial outliers can be associated with disastrous natural events such as tornadoes, hurricanes, and forest fires
  – Associations between disaster events and certain meteorological observations

Case Studies (1): Astronomy

• SKICAT (SKy Image Cataloging and Analysis Tool) (Caltech, US)
• The Palomar Observatory discovered 22 quasars with the help of data mining
• The Second Palomar Observatory Sky Survey (POSS-II)
  – decision tree methods
  – classification of galaxies, stars and other stellar objects
• About 3 TB of sky images were analyzed

Case Studies (2): NCAR & UCAR

• National Center for Atmospheric Research (NCAR) & University Corporation for Atmospheric Research (UCAR), US
  – http://www.ucar.edu/
• “Automatic Fuzzy Logic-based systems now compete with human forecasts”
• Richard Wagoner, Deputy Director at Research Applications Program (RAP), NCAR
• Intelligent Weather System (IWS)
  – Detection and forecast in the areas of en-route turbulence, en-route icing, ceiling/visibility, and convective hazards in the aviation community
  – Road winter maintenance, airport operations, and flash flood forecasting

Operational Application

• Prediction System: WIND-2
  – WIND: “Weather Is Not Discrete”
• Consists of three parts:
  – Data
    • Past airport weather observations, 30 years of hourly observations, time series of 300,000 detailed observations
    • Recent and current observations (METARs)
    • Model based guidance (knowledge of near-term changes, e.g., imminent wind-shift, onset/cessation of precipitation)
  – Fuzzy similarity-measuring algorithm
  – Prediction composition: predictions based on k nearest neighbors (k-nn, an instance-based method)
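The general idea of combining a fuzzy similarity measure with k nearest neighbors can be sketched as follows. This is not WIND-2's actual algorithm, whose details are not given here: the triangular membership function, the tolerances, the similarity-weighted average and the tiny archive of (temperature, wind speed) cases are all assumptions made for illustration.

```python
def fuzzy_similarity(a, b, tolerances):
    """Similarity in [0, 1]: 1 for identical cases, 0 once any feature
    differs by its tolerance or more (triangular membership, fuzzy AND = min)."""
    return min(max(0.0, 1.0 - abs(x - y) / tol)
               for x, y, tol in zip(a, b, tolerances))

def knn_forecast(current, archive, tolerances, k=3):
    """Similarity-weighted mean outcome of the k most similar past cases."""
    scored = sorted(
        ((fuzzy_similarity(current, features, tolerances), outcome)
         for features, outcome in archive),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in scored)
    if total == 0:
        return None  # no sufficiently similar past case
    return sum(sim * outcome for sim, outcome in scored) / total

# Hypothetical archive: ((temperature in C, wind in kt), visibility in km one hour later).
archive = [
    ((20.0, 5.0), 10.0),
    ((21.0, 6.0), 8.0),
    ((30.0, 20.0), 2.0),
    ((19.0, 4.0), 9.0),
]
current = (20.0, 5.0)
tolerances = (5.0, 10.0)

forecast = knn_forecast(current, archive, tolerances, k=3)
print(forecast)
```

The very different case (30.0, 20.0) gets similarity 0 and contributes nothing, reflecting the empirical premise that similar weather situations lead to similar outcomes.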

Operational Application

• Hybrid methods are used to predict weather
  – Dynamical approach: based upon equations of the atmosphere, uses finite element techniques
  – Empirical approach: similar weather situations lead to similar outcomes
• WIND runs in real-time for meteorologically different sites
• Data-mining/forecast process takes about one second

Case Studies (3): CrossGrid (EU)

• Objective
  – To develop, implement and exploit new Grid components for interactive compute and data intensive applications like flooding crisis team decision support systems, air pollution combined with weather forecasting
• Main tasks in Meteorological applications package
  – Data mining for atmospheric circulation patterns
    • Find a set of representative prototypes of the atmospheric patterns in a region of interest
  – Weather forecasting for maritime applications
  – Ocean wave forecasting by models of various complexity

SOM Application for Data Mining

• Data
  – ERA-15 using a T106L31 model (from 1978 to 1994) with 1.125° resolution
  – Terabytes
  – Comprises data from approx. 20 variables (such as temperature, humidity, pressure, etc.) at 30 pressure levels of a 360x360 nodes grid

[Figure: Downscaling Weather Forecasts. Adaptive Competitive Learning. Sub-grid details escape from numerical models]

Dept. of Applied Mathematics

Universidad de Cantabria

Santander, Spain

Case Studies (4): Typhoon Image Data Mining

• Objective
  – To establish algorithms and database models for the discovery of information and knowledge useful for typhoon analysis and prediction
  – Content-based image retrieval technology to search for similar cloud patterns in the past
  – Data mining technology to extract spatio-temporal pattern information which is meaningful from the meteorology viewpoints
• Result
  – Alignment of Multiple Typhoons, Explore by Projection to 2D Plane, Diurnal Analysis

Methods

• Archive of approximately 34,000 typhoon images for the northern and southern hemisphere
• Various data mining approaches
  – Principal component analysis (PCA), K-means clustering, self-organizing map (SOM), wavelet transform
• Retrieval of historical similar patterns from image databases to perform instance-based typhoon analysis and prediction
• Extracting the eigenvectors of the whole typhoon image collection

Case Studies (5): LEAD

• Linked Environments for Atmospheric Discovery
  – To accommodate the real time, on-demand, and dynamically-adaptive nature of mesoscale problems
    • Complexities: vastly disparate, high volume and bandwidth data
    • Tremendous computational demands
  – Used in accessing, preparing, assimilating, predicting, managing, mining/analyzing, and displaying a broad array of meteorological and related information
• Data Mining Solution Center: ITSC, The Univ. of Alabama in Huntsville, US
  – http://datamining.itsc.uah.edu/index.jsp

ADaM

• The Algorithm Development and Mining
  – Component architecture data mining toolkit
  – For geophysical phenomena detection and feature extraction
• Applications
  – Detecting tropical cyclones and estimating their maximum sustained wind speed
  – Mesocyclone Identification from RADAR
  – Detecting Cumulus Cloud Fields in GOES Images

ADaM (cont’d)

– Mesoscale Convective Systems Detection
  • EOS Special Sensor Microwave/Imager (SSM/I) Brightness Temperature Swaths from DMSP F13 and F14
– Rain Detection Using SSM/I
– Lightning Detection Using OLS
– Rain Accumulation Study

Case Studies (6): Rainfall Classification

University of Oklahoma, Norman

• To classify significant and interesting features within a two-dimensional spatial field of meteorological data
  – Observed or predicted rainfall
• Data source
  – Estimates of hourly accumulated rainfall
  – Using radar and raingage data
• “Attributes” for classification
  – Statistical parameters representing the distribution of rainfall amounts across the region
• Classification Method
  – Hierarchical cluster analysis

Many Others…

• JARtool Project (Fayyad et al., NASA )

• Identifying volcanoes on the surface of Venus from images transmitted by the Magellan spacecraft

• More than 30,000 high resolution Synthetic Aperture Radar(SAR) images of the surface of Venus from different angles

• The obtained accuracy was about 80%

What can we learn from those scenarios?

• Data Mining is a promising way for meteorological analysis

• Very strong interaction between scientists and the knowledge discovery system is necessary

• The users define features of the meteorological phenomena based on their expert knowledge

• The system extracts the instances of such phenomena

• Then, further analysis of phenomena is possible

Conclusions

• Data mining: discovering interesting patterns from large amounts of data
• A natural evolution of database technology, in great demand, with wide applications
• A KDD process includes data mining, and other steps
• Data Mining can be performed in a variety of information repositories
• Data mining tasks: characterization, discrimination, association, classification, clustering, outlier and trend analysis, etc.

And now discussion