22
GEOINFO 2006 Utilização da biblioteca TerraLib para algoritmos de agrupamento em Sistemas de Informações Geográficas Mauricio P. Guidini Carlos H. C. Ribeiro Nov 2006 Supervisor Use of the TerraLib library for clustering algorithms in Geographic Information Systems

GEOINFO 2006

Embed Size (px)

DESCRIPTION

GEOINFO 2006. Utilização da biblioteca TerraLib para algoritmos de agrupamento em Sistemas de Informações Geográficas. Use of the TerraLib library for clustering algorithms in Geographic Information Systems. Mauricio P. Guidini Carlos H. C. Ribeiro. Supervisor. Nov 2006. - PowerPoint PPT Presentation

Citation preview

Page 1: GEOINFO 2006

GEOINFO 2006

Utilização da biblioteca TerraLib para algoritmos de agrupamento em Sistemas de Informações Geográficas

Mauricio P. Guidini

Carlos H. C. Ribeiro Nov 2006

Supervisor

Use of the TerraLib library for clustering algorithms in Geographic Information

Systems

Page 2: GEOINFO 2006

25/10/2004

“... 3000 unregistered flights, with origin and destiny unkown by authorities, invaded the Brazilian airspace in the first ten months of this year. The Air Force calculates that about 30% of these flights were related to drug dealing ...

Translated from note from

Page 3: GEOINFO 2006

3

Data Mining in GIS

Objetive

To present the integration of a Data Mining algorithm (k-means) to TerraLib/TerraView, forming a Geographic Information System for Unknown Air Traffic analysis (GisTAD).

Page 4: GEOINFO 2006

4

SummaryData Mining

Clustering Algorithms

Air Traffic

K-means Implementation

Results

Aplication

Data Mining in GIS

Page 5: GEOINFO 2006

5

Data Mining Definition:

“A non-trivial process of identification of valid, new, useful standards implicitly present in large volumes of data”

Knowledge Discovery in Database (KDD) - Fayyad et al. (1996)

Data Mining in GIS

Page 6: GEOINFO 2006

6

How proceed DM?KDD process

Data Mining in GIS

Page 7: GEOINFO 2006

7

Clustering Algorithms

The clustering process tries to grouping the data into groups that have highly similar features, helping the understanding of the information that they hold.

A good clustering algorithm is characterized by the production of high level classes, where the intraclass similarity is high, and the interclass similarity is low. [Han & Kamber 2001]

Data Mining in GIS

Page 8: GEOINFO 2006

8

Data Mining in GIS

Major Categories Partitioning – k-means, k-medoids Hierarchical – CURE, BIRCH Density-based – DBSCAN, OPTICS Grid-based – STING Model-based

Others ANN – Kohonen network Incremental - Leader

Page 9: GEOINFO 2006

9

Data Mining in GIS

Air TrafficMovement of aircraft, national or foreign, that fly over national territory.

Unkown Air TrafficTo unidentified airplanes (flight plan), two lines of action can be taken[Bernabeu 2004]:

1.Intercept; or

2.Generate an Unkown Air Traffic Report

Page 10: GEOINFO 2006

10

Traffic RepresentationLine segments

Latitude (decimal degrees)Longitude (decimal degrees)Distance (miles)Heading

RestrictionsAcceptable deviations

Data Mining in GIS

Page 11: GEOINFO 2006

11

K-means algorithm

Data Mining in GIS

Precondition: set max deviation values to coordinates, distance and routeBegin: K=0 While criterion condition not satisfied (deviation in clusters) Increase K Arbitrarily choose K centers (among data objects) While centers change (k-means) (re)assign routes in cluster based on weights update centers values end movement intergroups deviation in groups ok Save resultsEnd

Page 12: GEOINFO 2006

12

Distance Measure

Data Mining in GIS

2222** mimimimi prprdistdistlonglongpesolatlatpeso

Minimize deviationsImprove cluster quality

coordmi paramlatlat coordmi paramlonglong and

Page 13: GEOINFO 2006

13

GIS Integration

TerraLib

TerraView

k-means

Data Mining in GIS

Page 14: GEOINFO 2006

14

Data preparation

8000 records

looking for information (what?)

Data Mining in GIS

Search space restrictionsSearch space restrictions

Page 15: GEOINFO 2006

15

Numeric Tests

to 500 records

GisTAD Tests

319 records 73 groups

Aprox. time = 40 sec.

Data Mining in GIS

Page 16: GEOINFO 2006

16

TerraView

Page 17: GEOINFO 2006

17

TerraView

Page 18: GEOINFO 2006

18

Page 19: GEOINFO 2006

19

Applications

Air Operations

Improper use of air space

Data Mining in GIS

Page 20: GEOINFO 2006

20

Page 21: GEOINFO 2006

21

Data Mining in GIS

Conclusion

Considering the problem proposed, the k-means algorithm is applicable, and returned a good set of clusters.

However, the number of records that must be clustered can make the application of the algorithm very time consuming.

Page 22: GEOINFO 2006

22

Future Work

Other partitioning algorithms should be implemented, to verify which one is the most efficient for the problem in analysis, considering any size of records to be clustered.

The algorithms to be tested are:

Kohonen neural network;

Leader algorithm.