
Spatial Data Mining Toolkit for Refining MSDS (aka TopoAssistant)


Page 1

Phase I Status Update, April 26, 2004. Architecture Technology Corporation, Specialists in Computer Architecture

Spatial Data Mining Toolkit for Refining MSDS (aka TopoAssistant)
TEC SBIR Phase I A03-129

Status Update

Ranga Ramanujan, 952-829-5864 (x120)
Sid Kudige, 952-829-5864 (x163)
Shashi Shekhar, 612-624-8307
Gene Proctor, 202-293-9701 (x113)

Page 2

Agenda

SBIR Review | 09:00 - 12:00 | Kudige
Lunch | 12:00 - 01:00 |
ATC R&D Overview | 01:00 - 01:45 | Ramanujan
Spatial Data Mining Research at UMN | 01:45 - 02:15 | Shekhar
Facility Tour | 02:15 - 02:45 | Proctor

Page 3

Outline

• SBIR goal, motivation and innovations
• Phase I results
• Phase I prototype demonstration
• Technical challenges
• Phase II technical approach
• Phase II work plan
• Summary

Page 4

Overall SBIR Goal

• Develop the TopoAssistant tool to assist Army topographers with refining feature data for “just-in-time” MSDS

Phase I Goal
• Develop the architecture and design of the TopoAssistant software tool
• Build a rapid prototype to establish implementation feasibility

Phase II Goal
• Build a full-scale operational prototype of TopoAssistant

Phase III Goal
• Transition TopoAssistant to a fielded system

Team
• Sid Kudige - PI
• Ranga Ramanujan - Technical Advisor
• Prof. Shashi Shekhar - Consultant
• Gene Proctor - Commercialization

Page 5

Motivation and Payoff

• The current process for refining MSDS feature data is time consuming and expensive
  • A study estimates 2,400 production hours for a DTOP 5 data set with a 15’ x 15’ cell size [Kabinier]
• The TopoAssistant tool will use innovative spatial data mining techniques to
  • Significantly automate feature data refinement
    • Detection of errors in source data
      • Prediction of positional errors
      • Prediction of extra/erroneous/missing features
      • Prediction of mislabeled features
    • Feature attribution
      • Prediction of missing features (categorical)
      • Prediction of erroneous/missing attribute values (numerical)
  • Support timely and cost-effective Army co-production and value adding for MSDS feature data

Page 6

TopoAssistant Innovations

• Novel approach for automating feature data refinement using spatial data mining techniques
  • Detection of errors
    • Spatial outlier detection
    • Statistical/empirical rules
    • Collocation-based rules
  • Feature attribution
    • Attribute/location prediction techniques
    • Collocation-based rules
• Open/extensible implementation architecture
  • Plug-in/add-on spatial data mining techniques
  • C/JMTK framework compliant
  • Seamless integration with commercial GIS products

Page 7

Outline

• SBIR goal, motivation and innovations
• Phase I results
• Phase I prototype demonstration
• Technical challenges
• Phase II technical approach
• Phase II work plan
• Summary

Page 8

Phase I Results

• Demonstrated TopoAssistant feasibility
  • Implementation feasibility: built prototype
  • Concept feasibility: designed prototype evaluation methodology for TEC datasets
  • Concept feasibility: applied spatial data mining techniques for
    • Detection of errors
      • Prediction of positional errors
      • Prediction of extra/erroneous/missing features
      • Prediction of mislabeled features
    • Feature attribution
      • Prediction of missing features
• Identified technical challenges and the Phase II approach for addressing them

Page 9

Implementation Feasibility: Phase I Prototype Architecture

[Figure: Phase I prototype architecture, showing three components: the back-end spatial database component, the front-end spatial data mining component and the data visualization component. Diagram elements: shapefile dataset; shapefile-to-SQL conversion (shp2pgsql); load SQL tables into Postgres/PostGIS; spatial joins using SQL queries; JDBC bridge; outlier detection/collocation package (Weka); convert SQL tables into shapefiles; visualize shapefiles as maps with ArcExplorer]

Page 10

Architecture Components

• Back-end spatial database component
  • PostGIS - spatially enables PostgreSQL tables (OGIS compliant)
  • shp2pgsql tool - shapefile to SQL table conversion
  • Bulk loader - loads SQL tables into the spatially enabled database
• Front-end data mining component
  • Weka - Java-based, open-source (GNU GPL) software that implements classical data mining techniques
  • Custom spatial data mining classes - spatial outlier detection/collocation pattern detection package implemented for Weka
  • pgsql2shp - converts SQL tables returned by outlier detection/collocation pattern detection operations into shapefiles

Page 11

Architecture Components

• Connector component - JDBC bridge
  • A Java client in Weka can access PostGIS “geometry” objects in the Postgres database using the JDBC extensions bundled with Postgres and PostGIS
  • JDBC bridge successfully tested on a test machine
• Map visualization component
  • ArcExplorer for shapefile visualization

Page 12

Prototype Evaluation Methodology

• Received the Korea dataset from TEC
• Reviewed the dataset using ArcExplorer
• Used the spatial database component to convert shapefiles to SQL scripts
• Loaded the tables into Postgres/PostGIS
• Formulated and ran SQL3/OGIS queries to mine outliers/collocation patterns and compute interest measures
• Converted the resulting tables into shapefiles
• Visualized the results using ArcExplorer
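The load/mine/export part of this methodology could be scripted roughly as follows. This is a minimal Python sketch, not the prototype's actual tooling. It assumes the PostGIS command-line loaders `shp2pgsql` and `pgsql2shp` and the `psql` client are on the PATH. It reuses the `TECKoreaDB` database name from the demonstration slide, and all shapefile, table and query names are hypothetical. `build_pipeline` only assembles the commands, and `run_pipeline` executes them. Viewing the output in ArcExplorer stays a manual step.

```python
import shlex
import subprocess
from typing import List, Optional, Tuple

# (argv, file that captures stdout, or None when output is not redirected)
Step = Tuple[List[str], Optional[str]]


def build_pipeline(shapefile: str, table: str, database: str,
                   query: str, out_shapefile: str,
                   srid: Optional[int] = None) -> List[Step]:
    """Assemble the load -> export commands; nothing is executed here."""
    convert = ["shp2pgsql", "-I"]        # -I: create a GiST index on the geometry column
    if srid is not None:
        convert += ["-s", str(srid)]     # -s: spatial reference ID of the input data
    convert += [shapefile, table]
    sql_file = f"{table}.sql"
    return [
        (convert, sql_file),                                          # shapefile -> SQL script
        (["psql", "-d", database, "-f", sql_file], None),             # load script into PostGIS
        (["pgsql2shp", "-f", out_shapefile, database, query], None),  # query result -> shapefile
    ]


def run_pipeline(steps: List[Step]) -> None:
    """Run each step in order; check=True stops at the first non-zero exit status."""
    for argv, stdout_file in steps:
        if stdout_file is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_file, "w") as out:
                subprocess.run(argv, stdout=out, check=True)


if __name__ == "__main__":
    steps = build_pipeline("roads.shp", "road_line_table", "TECKoreaDB",
                           "SELECT * FROM disconnected_road_outlier",
                           "disconnected_roads.shp")
    for argv, stdout_file in steps:
        print(shlex.join(argv) + (f" > {stdout_file}" if stdout_file else ""))
```

The mining SQL, such as the query fragments shown later, would run between the load and export steps. Separating command construction from execution keeps the sketch inspectable without a database.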

Page 13

TEC Dataset Overview

• Korea dataset
  • Latitude 37°15' to 37°30'
  • Longitude 128°23'51" to 128°23'52"
• Layers
  • Obstacles (Cut, embankment, depression)
  • Surface drainage (River, stream, island, common open water, ford, dam)
  • Slope
  • Soils (Poorly graded gravel, clayey sand, organic silt, disturbed soil)
  • Vegetation (Land subject to inundation, cropland, rice field, evergreen trees, mixed trees)
  • Transport (Roads, cart roads, railways)

Page 14

TEC Dataset Overview

• Visualized using ArcExplorer, except the elevation data
• Interpreted the feature sets in the TEC datasets using FACC, except the common open water feature (surface drainage layer)
• Pattern rich
  • Numerous spatial outliers
  • Collocation patterns
• Promising test dataset for spatial data mining

Page 15

Phase I Results

• Demonstrated TopoAssistant feasibility
  • Implementation feasibility: built prototype
  • Concept feasibility: designed prototype evaluation methodology for TEC datasets
  • Concept feasibility: applied spatial data mining techniques for
    • Detection of errors
      • Prediction of positional errors
      • Prediction of extra/erroneous/missing features
      • Prediction of mislabeled features
    • Feature attribution
      • Prediction of missing features
• Identified technical challenges and the Phase II approach for addressing them

Page 16

Detecting Errors via Spatial Outliers

• Motivation - improve map accuracy by detecting/predicting
  • Positional errors
  • Extra/erroneous/missing features
  • Mislabeled/misclassified features
• Spatial outlier detection techniques
  • Statistical/user-defined tests
  • Collocation patterns

Page 17

Spatial Outliers Detected

• Statistical/user-defined tests
  • Disconnected road
  • Overlapping road and river

Page 18

Statistical/Empirically Derived Outliers - Positional Error: Disconnected Roads

[Map: disconnected roads; legend: Road 1 to Road 6]

• 6 disconnected roads discovered
• Visual inspection may not reveal a disconnect without further zooming
• May be indicative of positional error
• Distance threshold is 0.001 units

Page 19

Statistical/Empirically Derived Outliers - Positional Error: Disconnected Roads

[Map: disconnected roads, with the six disconnects marked; legend: Road 1 to Road 6]

• 6 disconnected roads discovered
• Visual inspection may not reveal a disconnect without further zooming
• May be indicative of positional error
• Distance threshold is 0.001 units

Page 20

Disconnected Road: Magnified View

[Map: magnified view of disconnected Road 1]

Page 21

Disconnected Road: Magnified View

[Map: magnified view of disconnected Road 2]

Page 22

Disconnected Road: Magnified View

[Map: magnified view of Road 3, with two disconnects marked]

Page 23

Disconnected Road: Magnified View

[Map: magnified view of disconnected Road 3]

Page 24

Disconnected Road: Magnified View (Frontage Road Example)

[Map: magnified view of Road 4 (disconnected?), with the end point of the road geometry marked]

• Interesting because the end point of Road 4 does not visually appear to be close to the end point of the other road. Or is it?
• Afterthought: Road 4 resembles a frontage road

Page 25

Disconnected Road: Magnified View

[Map: magnified view of disconnected Road 5]

Page 26

Disconnected Road: Magnified View

[Map: magnified view of disconnected Road 6]

Page 27

Disconnected Road: Additional Outlier Discovered

[Map: disconnected Road 6, with an additional outlier marked]

Page 28

Detecting Disconnected Roads: Empirical Technique Used

• Determine and store the start point and end point of each road in the road table
• Calculate the distance from the start and end points of each road to the start and end points of every other road
• Flag as outliers the roads whose end points lie less than 0.001 units from an end point of another road, even though the two road geometries do not touch (are disjoint)
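A minimal Python sketch of this empirical test follows. It assumes each road is a plain polyline of (x, y) vertices, in the same units as the 0.001 threshold. The helper names are hypothetical. `disjoint` is a simple segment-intersection analogue of the disjoint predicate used in the spatial query, not a PostGIS call.

```python
from itertools import combinations
from math import dist
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float]
Polyline = List[Point]


def _orient(p: Point, q: Point, r: Point) -> float:
    """Twice the signed area of triangle pqr (>0 left turn, <0 right turn, 0 collinear)."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def _within_box(p: Point, q: Point, r: Point) -> bool:
    """True if r lies inside the bounding box of segment pq (used for collinear cases)."""
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))


def _segments_touch(a: Point, b: Point, c: Point, d: Point) -> bool:
    """True if segment ab and segment cd share at least one point."""
    o1, o2 = _orient(a, b, c), _orient(a, b, d)
    o3, o4 = _orient(c, d, a), _orient(c, d, b)
    if o1 * o2 < 0 and o3 * o4 < 0:
        return True  # proper crossing
    return ((o1 == 0 and _within_box(a, b, c)) or (o2 == 0 and _within_box(a, b, d))
            or (o3 == 0 and _within_box(c, d, a)) or (o4 == 0 and _within_box(c, d, b)))


def disjoint(g1: Polyline, g2: Polyline) -> bool:
    """No segment of g1 touches any segment of g2."""
    return not any(_segments_touch(g1[i], g1[i + 1], g2[j], g2[j + 1])
                   for i in range(len(g1) - 1) for j in range(len(g2) - 1))


def disconnected_roads(roads: Dict[str, Polyline], threshold: float = 0.001) -> Set[str]:
    """Flag roads whose end points lie within `threshold` of another road's end points
    although the two road geometries never touch."""
    flagged: Set[str] = set()
    for (id1, g1), (id2, g2) in combinations(roads.items(), 2):
        ends1, ends2 = (g1[0], g1[-1]), (g2[0], g2[-1])
        near = any(dist(e1, e2) < threshold for e1 in ends1 for e2 in ends2)
        if near and disjoint(g1, g2):
            flagged.update((id1, id2))  # the SQL flags R1 for both (R1, R2) orderings
    return flagged


roads = {
    "A": [(0.0, 0.0), (1.0, 0.0)],
    "B": [(1.0005, 0.0), (2.0, 0.0)],  # starts 0.0005 units past A's end: a near miss
    "C": [(0.0, 0.0), (0.0, 1.0)],     # shares A's start point: properly connected
    "D": [(5.0, 5.0), (6.0, 6.0)],     # far from everything
}
print(sorted(disconnected_roads(roads)))  # prints ['A', 'B']
```

Comparing all pairs is quadratic in the number of roads. That is acceptable for a sketch. A spatial index would be the natural next step, much as PostGIS would use one.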

Page 29

Detecting Disconnected Roads: Spatial Query Fragment

-- Start point and end point of every road
CREATE VIEW Road AS
SELECT T.id AS Road_id,
       T.the_geom AS Road_Geometry,
       ST_StartPoint(T.the_geom) AS Road_Start_Point,
       ST_EndPoint(T.the_geom) AS Road_End_Point
FROM Road_Line_Table T;

-- Roads with an end point within 0.001 units of another road's end point,
-- although the two road geometries do not touch
CREATE VIEW Disconnected_Road AS
SELECT R1.Road_id AS Disconnected_Road_id
FROM Road R1, Road R2
WHERE ST_Disjoint(R1.Road_Geometry, R2.Road_Geometry)
  AND (   ST_Distance(R1.Road_Start_Point, R2.Road_Start_Point) < 0.001
       OR ST_Distance(R1.Road_Start_Point, R2.Road_End_Point) < 0.001
       OR ST_Distance(R1.Road_End_Point, R2.Road_Start_Point) < 0.001
       OR ST_Distance(R1.Road_End_Point, R2.Road_End_Point) < 0.001);

-- Full records of the flagged roads
CREATE TABLE Disconnected_Road_Outlier AS
SELECT DISTINCT R.*
FROM Road_Line_Table R, Disconnected_Road D
WHERE R.id = D.Disconnected_Road_id;

Page 30

Detecting Disconnected Roads: Spatial Query Performance

• Machine used: 1.4 GHz Athlon with 512 MB RAM
• Total execution time: 4.5 minutes

Page 31

Statistical/Empirically Derived Outliers: Road Frequently Crossing River

[Map: road frequently crossing river; legend: River, Road; Roads 1 to 3 labeled]

• Visual inspection may not reveal the outlier without further zooming
• May be indicative of positional error
• Threshold = 0.001 units

Page 32

Statistical/Empirically Derived Outliers: Road Frequently Crossing River

[Map: road frequently crossing river, with three outliers marked; legend: River, Road; Roads 1 to 3 labeled]

• May be indicative of positional error

Page 33

Road Frequently Crossing River: Magnified View

[Map: magnified view of Road 1, with two outliers marked; legend: River, Road, Bridge]

Page 34

Road Frequently Crossing River: Magnified View

[Map: magnified view of Road 2, with one outlier marked; legend: River, Road, Bridge]

Page 35

Road Frequently Crossing River: Magnified View

[Map: magnified view of Road 3, with one outlier marked; legend: River, Road, Bridge]

Page 36

Detecting Road Frequently Crossing River: Empirical Technique Used

• Determine the intersections of roads and rivers
• Identify pairs of intersection locations
• If the two locations in a pair are less than 0.001 units apart, classify the road as an outlier
• Ensure that there is no bridge geometry feature between the two locations
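The same rule can be expressed as a Python sketch. It assumes the road/river intersections have already been reduced to (road id, location) points, and that bridges are available as points. The sketch treats a bridge as "between" two crossings when it lies within a hypothetical `bridge_tolerance` of the straight segment joining them. That is a simplification of the rule above, not part of the original technique.

```python
from itertools import combinations
from math import dist, hypot
from typing import List, Set, Tuple

Point = Tuple[float, float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to the closed segment ab."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return dist(p, a)  # degenerate segment
    # Projection parameter of p onto line ab, clamped to the segment.
    t = max(0.0, min(1.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length_sq))
    return hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def roads_crossing_river_frequently(crossings: List[Tuple[str, Point]],
                                    bridges: List[Point],
                                    threshold: float = 0.001,
                                    bridge_tolerance: float = 1e-4) -> Set[str]:
    """crossings: (road_id, location) pairs where a road meets a river.
    Flags the roads of any two distinct crossing locations closer than
    `threshold`, unless a bridge lies on the stretch between them."""
    flagged: Set[str] = set()
    for (road1, p1), (road2, p2) in combinations(crossings, 2):
        if p1 == p2 or dist(p1, p2) >= threshold:
            continue  # same location, or far enough apart to be plausible
        bridged = any(point_segment_distance(b, p1, p2) < bridge_tolerance for b in bridges)
        if not bridged:
            flagged.update((road1, road2))
    return flagged


crossings = [("R1", (0.0, 0.0)), ("R1", (0.0005, 0.0)),
             ("R2", (5.0, 5.0)), ("R2", (5.0004, 5.0)),
             ("R3", (9.0, 9.0))]
bridges = [(5.0002, 5.0)]  # a bridge between R2's two crossings
print(sorted(roads_crossing_river_frequently(crossings, bridges)))  # prints ['R1']
```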

Page 37

Detecting Road Frequently Crossing River: Spatial Query Fragment

-- Where each road intersects a river area
CREATE VIEW Road_River_Cross_Geometry AS
SELECT T.id AS Road_Cross_RiverID,
       ST_Intersection(T.the_geom, S.the_geom) AS Road_Cross_River
FROM Road_Line_Table T, River_Area_Table S
WHERE ST_Intersects(T.the_geom, S.the_geom);

-- Crossings lying within 0.001 units of another, separate crossing
CREATE VIEW Roads_Crossing_River_Frequently AS
SELECT R1.Road_Cross_RiverID AS Road_Cross_River_OutlierID
FROM Road_River_Cross_Geometry R1, Road_River_Cross_Geometry R2
WHERE ST_Disjoint(R1.Road_Cross_River, R2.Road_Cross_River)
  AND ST_Distance(R1.Road_Cross_River, R2.Road_Cross_River) < 0.001;

-- Full records of the flagged roads
CREATE TABLE Road_Crossing_River_Outlier AS
SELECT DISTINCT T.*
FROM Road_Line_Table T, Roads_Crossing_River_Frequently R
WHERE T.id = R.Road_Cross_River_OutlierID;

Page 38

Detecting Road Frequently Crossing River: Spatial Query Performance

• Machine used: 1.4 GHz Athlon with 512 MB RAM
• Total execution time: 5 minutes

Page 39

River Becoming Stream: Predicting Mislabeled Features

• Streams usually become rivers, but rivers rarely become streams unless a lake is nearby
• A river becoming a stream is a local spatial outlier

[Map: river becoming stream; legend: Stream, River]

Page 40

Detecting River Becoming Stream: Empirical Technique Used

• Determine the intersections of rivers and streams
• If there is no lake within 0.01 units of an intersection point, classify the river feature as an outlier
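This rule is sketched below in Python. The sketch assumes the river/stream junction points have already been computed. It also represents each lake by a single point, which is a simplification because a real lake is an area. All names are hypothetical.

```python
from math import dist
from typing import List, Set, Tuple

Point = Tuple[float, float]


def river_becoming_stream_outliers(junctions: List[Tuple[str, Point]],
                                   lakes: List[Point],
                                   threshold: float = 0.01) -> Set[str]:
    """junctions: (river_id, location) where a river feature meets a stream feature.
    A river is flagged when one of its junctions has no lake closer than `threshold`."""
    return {
        river
        for river, location in junctions
        if not any(dist(location, lake) < threshold for lake in lakes)
    }


junctions = [("river_a", (0.0, 0.0)), ("river_b", (1.0, 1.0))]
lakes = [(1.005, 1.0)]  # a lake 0.005 units from river_b's junction
print(sorted(river_becoming_stream_outliers(junctions, lakes)))  # prints ['river_a']
```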

Page 41

Phase I Results

• Demonstrated TopoAssistant feasibility
  • Implementation feasibility: built prototype
  • Concept feasibility: designed prototype evaluation methodology for TEC datasets
  • Concept feasibility: applied spatial data mining techniques for
    • Detection of errors
      • Prediction of positional errors
      • Prediction of extra/erroneous/missing features
      • Prediction of mislabeled features
    • Feature attribution
      • Prediction of missing features
• Identified technical challenges and the Phase II approach for addressing them

Page 42

Feature Attribution via Collocation

• Motivation - improve feature attribution by predicting missing features
• Approach - collocation patterns
• Collocation patterns detected
  • Cropland/rice fields: ends of roads/cart roads/rivers/streams
  • Road collocated with river/stream

Page 43

Detecting Collocation Patterns: Algorithmic Basis

• To calculate the degree of collocation, we use a measure called the interest measure
  • E.g., 96.5% of the cropland features are close to a road or river
  • The interest measure represents a conditional probability: given a cropland feature, the probability of finding a road or river nearby is 0.965
• Cropland not close to a road/river may predict a missing road or river feature
• Cropland not close to a road/river may also indicate a positional error in the cropland
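The interest measure can be written out as follows. Here C is the set of cropland/rice field features, F is the union of road, cart road, river and stream features, and d is the 0.001-unit distance threshold. The counts 192 and 199 are the ones reported on the cropland interest measure slide.

```latex
\[
I = P\bigl(\exists f \in F : \operatorname{dist}(c, f) < d \;\big|\; c \in C\bigr)
  = \frac{\bigl|\{\, c \in C : \exists f \in F,\ \operatorname{dist}(c, f) < d \,\}\bigr|}{|C|}
  = \frac{192}{199} \approx 0.965
\]
```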

Page 44

Predicting Missing Features using Collocation Patterns

• Cropland collocated with river, stream or road
  • Non-collocated cropland may predict missing river, stream or road features

[Map: cropland collocation; legend: River/stream, Cropland, Road, Non-collocated cropland]

Page 45

Spatial Outlier Detection using Collocation Patterns

• Cropland collocated with river, stream or road
  • A cropland outlier may also indicate a positional error in the cropland

[Map: cropland outliers; legend: River/stream, Cropland, Road, Cropland outlier]

Page 46

Cropland/Road/River: Interest Measure

Collocation pattern | Number of collocated cropland features | Interest measure (%)
Cropland with river | 90 | 46
Cropland with cart road | 97 | 55
Cropland with road | 118 | 60
Cropland with stream | 137 | 68
Cropland with road, cart road, river or stream | 192 | 96.5

Interest measure (%) = (collocated cropland / total cropland) × 100

• Total number of cropland features = 199
• Distance threshold = 0.001 units
• 96.5% of all cropland features are collocated with a road, cart road, river or stream

Page 47

Cropland/Road/River Collocation Pattern: Technique Used

Cropland pattern detected using collocation pattern detection techniques:
• Step 1: Cropland areas collocated with cart road/road determined
• Step 2: Cropland areas collocated with stream/river determined
• Step 3: Cropland areas collocated with cart road/road or stream/river determined

Cropland outliers are cropland areas that are not collocated with any road, cart road, stream or river feature.
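Start from the per-layer results of the four collocation queries, reduced here to sets of cropland feature ids. Steps 1 to 3 and the outlier set then come down to set unions and a difference. The following is a minimal Python sketch. The function and field names are hypothetical, and the input sets are assumed to be subsets of `all_cropland`.

```python
from typing import NamedTuple, Set


class CroplandCollocation(NamedTuple):
    near_roads: Set[int]     # Step 1: near a road or cart road
    near_water: Set[int]     # Step 2: near a river or stream
    near_any: Set[int]       # Step 3: near a road/cart road or river/stream
    outliers: Set[int]       # cropland near none of the four layers
    interest_percent: float  # |near_any| / |all cropland| * 100


def cropland_collocation(all_cropland: Set[int], near_road: Set[int],
                         near_cartroad: Set[int], near_river: Set[int],
                         near_stream: Set[int]) -> CroplandCollocation:
    step1 = near_road | near_cartroad
    step2 = near_river | near_stream
    step3 = step1 | step2
    outliers = all_cropland - step3
    interest = 100.0 * len(step3) / len(all_cropland)
    return CroplandCollocation(step1, step2, step3, outliers, interest)


result = cropland_collocation({1, 2, 3, 4, 5}, near_road={1}, near_cartroad={2},
                              near_river={3}, near_stream={1, 3})
print(sorted(result.outliers), result.interest_percent)  # prints [4, 5] 60.0
```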

Page 48

Cropland/Road/River Collocation Pattern: Spatial Query Fragment

-- Cropland and rice field areas within 0.001 units of a river area
CREATE TABLE Cropland_River_Collocate AS
SELECT DISTINCT C.*
FROM River_Area_Table R, Veg_Area_Table C
WHERE C.f_code_des IN ('Cropland', 'Rice Field')
  AND ST_Distance(C.the_geom, R.the_geom) < 0.001;

-- ... within 0.001 units of a stream
CREATE TABLE Cropland_Stream_Collocate AS
SELECT DISTINCT C.*
FROM Stream_Line_Table R, Veg_Area_Table C
WHERE C.f_code_des IN ('Cropland', 'Rice Field')
  AND ST_Distance(C.the_geom, R.the_geom) < 0.001;

-- ... within 0.001 units of a road
CREATE TABLE Cropland_Road_Collocate AS
SELECT DISTINCT C.*
FROM Road_Line_Table R, Veg_Area_Table C
WHERE C.f_code_des IN ('Cropland', 'Rice Field')
  AND ST_Distance(C.the_geom, R.the_geom) < 0.001;

-- ... within 0.001 units of a cart road
CREATE TABLE Cropland_Cartroad_Collocate AS
SELECT DISTINCT C.*
FROM Cartroad_Line_Table R, Veg_Area_Table C
WHERE C.f_code_des IN ('Cropland', 'Rice Field')
  AND ST_Distance(C.the_geom, R.the_geom) < 0.001;

Page 49

Cropland/Road/River Collocation Pattern: Spatial Query Performance

Collocation pattern | Execution time (minutes)
Cropland with stream | 6.3
Cropland with river | 2.2
Cropland with cart road | 1.8
Cropland with road | 3.2
Cropland with road, cart road, river or stream | 13.5

• Machine used: 1.4 GHz Athlon with 512 MB RAM
• Total execution time: 13.5 minutes

Page 50

Collocation Pattern: Roads with Rivers

[Map: road/river collocation; legend: River/Stream, Collocated roads, Non-collocated roads]

• Road collocated with river/stream
• Pondering whether it could be used to predict anything?
  • May predict missing streams

Page 51

Road with River: Interest Measure

Collocation pattern | Number of collocated features | Interest measure (%)
Road with stream | 153 of 239 | 64
Road with river | 96 of 239 | 40
Road with stream or river | 176 of 239 | 74
Cart road with stream | 97 of 136 | 71
Cart road with river | 44 of 136 | 32
Cart road with stream or river | 111 of 136 | 82
All roads with river or stream | 287 of 375 | 77

Interest measure (%) = (collocated roads / total roads) × 100

• 375 road features (239 roads and 136 cart roads)
• Distance threshold = 0.001 units
• 77% of all roads are collocated with a river or stream
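The "All roads" row pools the road and cart road rows. It combines 176 of the 239 roads and 111 of the 136 cart roads that are collocated with a river or stream:

```latex
\[
\frac{176 + 111}{239 + 136} = \frac{287}{375} \approx 0.765
  \quad\Rightarrow\quad 76.5\,\% \;(\text{reported as } 77\,\%)
\]
```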

Page 52

Detecting Road River Collocation Pattern: Technique Used

Roads collocated with rivers determined using collocation pattern detection techniques:
• Step 1: Roads collocated with rivers determined
• Step 2: Roads collocated with streams determined
• Step 3: Cart roads collocated with rivers determined
• Step 4: Cart roads collocated with streams determined
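All four steps share one test: is a linear feature within 0.001 units of any feature in another layer? The Python sketch below implements that test, assuming all features are polylines of (x, y) vertices. River areas are really polygons, so treating them as boundary lines here is a simplification. The function names are hypothetical. The sketch computes the minimum distance between two polylines, which is zero when they touch.

```python
from math import dist, hypot
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float]
Polyline = List[Point]


def _orient(p: Point, q: Point, r: Point) -> float:
    """Signed area test: >0 if r is left of pq, <0 if right, 0 if collinear."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def _within_box(p: Point, q: Point, r: Point) -> bool:
    """True if r lies inside the bounding box of segment pq."""
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))


def _segments_touch(a: Point, b: Point, c: Point, d: Point) -> bool:
    o1, o2 = _orient(a, b, c), _orient(a, b, d)
    o3, o4 = _orient(c, d, a), _orient(c, d, b)
    if o1 * o2 < 0 and o3 * o4 < 0:
        return True
    return ((o1 == 0 and _within_box(a, b, c)) or (o2 == 0 and _within_box(a, b, d))
            or (o3 == 0 and _within_box(c, d, a)) or (o4 == 0 and _within_box(c, d, b)))


def _point_segment_distance(p: Point, a: Point, b: Point) -> float:
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return dist(p, a)
    t = max(0.0, min(1.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length_sq))
    return hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def _segment_distance(a: Point, b: Point, c: Point, d: Point) -> float:
    """Minimum distance between segments ab and cd (0 if they touch)."""
    if _segments_touch(a, b, c, d):
        return 0.0
    # For non-touching 2-D segments the minimum is reached at one of the end points.
    return min(_point_segment_distance(a, c, d), _point_segment_distance(b, c, d),
               _point_segment_distance(c, a, b), _point_segment_distance(d, a, b))


def polyline_distance(g1: Polyline, g2: Polyline) -> float:
    return min(_segment_distance(g1[i], g1[i + 1], g2[j], g2[j + 1])
               for i in range(len(g1) - 1) for j in range(len(g2) - 1))


def collocated(features: Dict[str, Polyline], others: List[Polyline],
               threshold: float = 0.001) -> Set[str]:
    """Ids of `features` lying within `threshold` of at least one geometry in `others`."""
    return {fid for fid, g in features.items()
            if any(polyline_distance(g, o) < threshold for o in others)}


roads = {"r1": [(0.0, 0.0), (1.0, 0.0)], "r2": [(0.0, 5.0), (1.0, 5.0)]}
streams = [[(0.5, 0.0005), (0.5, 1.0)]]  # stream ends 0.0005 units from r1
road_with_stream = collocated(roads, streams)  # Step 2: roads collocated with streams
print(sorted(road_with_stream))                                 # prints ['r1']
print(f"{100 * len(road_with_stream) / len(roads):.0f} %")      # prints 50 %
```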

Page 53

Detecting Road River Collocation Pattern: Spatial Query Fragment

-- Roads within 0.001 units of a river area
CREATE TABLE Road_River_Collocate AS
SELECT DISTINCT R.*
FROM River_Area_Table T, Road_Line_Table R
WHERE ST_Distance(T.the_geom, R.the_geom) < 0.001;

-- Roads within 0.001 units of a stream
CREATE TABLE Road_Stream_Collocate AS
SELECT DISTINCT R.*
FROM Stream_Line_Table T, Road_Line_Table R
WHERE ST_Distance(T.the_geom, R.the_geom) < 0.001;

-- Cart roads within 0.001 units of a river area
CREATE TABLE Cartroad_River_Collocate AS
SELECT DISTINCT R.*
FROM River_Area_Table T, Cartroad_Line_Table R
WHERE ST_Distance(T.the_geom, R.the_geom) < 0.001;

-- Cart roads within 0.001 units of a stream
CREATE TABLE Cartroad_Stream_Collocate AS
SELECT DISTINCT R.*
FROM Stream_Line_Table T, Cartroad_Line_Table R
WHERE ST_Distance(T.the_geom, R.the_geom) < 0.001;

Page 54

Spatial Query Performance

Collocation pattern | Execution time (minutes)
Road with stream | 5.2
Road with river | 3.1
Cart road with stream | 1.6
Cart road with river | 2.3
All roads with river or stream | 12

• Machine used: 1.4 GHz Athlon with 512 MB RAM
• Total execution time: 12 minutes

Page 55

Other Possible Predictive Patterns

Candidate patterns:
• Predict crop type (rice) based on soil type (clay), soil wetness condition, slope, elevation and surface drainage (river/stream)
• Predict land cover type (deciduous) based on soil type (clay), soil wetness condition, slope, elevation and surface drainage (river/stream)
• Predict soil type based on slope, elevation, surface drainage and vegetation/land cover

Page 56

Phase I Schedule and Status

Task | Months active (of months 1 to 9 after start date) | Complete
1. Requirements-driven selection of mining algorithms | 2 | 100%
2. Definition of TopoAssistant software architecture | 4 | 80%
3. TopoAssistant rapid prototype | 4 | 85%
4. Detailed design of TopoAssistant (option) | 3 | To be started
5. Program management and final technical report | 9 | 60%

Page 57

Phase I Results

• Demonstrated TopoAssistant feasibility
  • Implementation feasibility: built prototype
  • Concept feasibility: designed prototype evaluation methodology for TEC datasets
  • Concept feasibility: applied spatial data mining techniques for
    • Detection of errors
      • Prediction of positional errors
      • Prediction of extra/erroneous/missing features
      • Prediction of mislabeled features
    • Feature attribution
      • Prediction of missing features
      • Prediction of erroneous/missing attribute values
• Identified technical challenges and the Phase II approach for addressing them

Page 58

Phase I Results Summary

• Concept feasibility of TopoAssistant established
• Implementation feasibility of TopoAssistant established
• Applied the Phase I prototype to TEC’s Korea dataset for
  • Detection of errors
  • Feature attribution

Page 59

Outline

• SBIR goal, motivation and innovations
• Phase I overview and results
• Phase I prototype demonstration
• Technical challenges
• Phase II technical approach
• Phase II work plan
• Summary

Page 60

Phase I Prototype Demonstration

- PostGIS/PostgreSQL description
- Convert the Obstacles shapefile into an SQL table
- Insert the table into the PostGIS TECKoreaDB database
- JDBC bridge demonstration: cart roads collocated with streams
- Convert the resulting table into a shapefile using pgsql2shp
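The cart-road/stream demonstration is a distance-threshold collocation test between line features. The sketch below shows that test in plain Python, assuming planar coordinates, polylines given as lists of (x, y) vertices keyed by hypothetical ids, and the strict `distance(...) < threshold` form used in the Phase I SQL. In the prototype this work is done inside PostGIS, so the code only illustrates the geometry.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Polyline = List[Point]  # at least two vertices


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:  # degenerate segment
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment's supporting line and clamp to the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def _orient(p: Point, q: Point, r: Point) -> int:
    """Sign of the cross product (q - p) x (r - p): 1 left, -1 right, 0 collinear."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)


def _within_box(p: Point, q: Point, r: Point) -> bool:
    """For collinear p, q, r: does q lie within the bounding box of segment pr?"""
    return (min(p[0], r[0]) <= q[0] <= max(p[0], r[0])
            and min(p[1], r[1]) <= q[1] <= max(p[1], r[1]))


def segments_intersect(a: Point, b: Point, c: Point, d: Point) -> bool:
    o1, o2 = _orient(a, b, c), _orient(a, b, d)
    o3, o4 = _orient(c, d, a), _orient(c, d, b)
    if o1 != o2 and o3 != o4:
        return True
    # Collinear touching or overlapping cases.
    return ((o1 == 0 and _within_box(a, c, b)) or (o2 == 0 and _within_box(a, d, b))
            or (o3 == 0 and _within_box(c, a, d)) or (o4 == 0 and _within_box(c, b, d)))


def segment_distance(a: Point, b: Point, c: Point, d: Point) -> float:
    if segments_intersect(a, b, c, d):
        return 0.0
    # For non-intersecting 2-D segments the closest pair involves an endpoint.
    return min(point_segment_distance(a, c, d), point_segment_distance(b, c, d),
               point_segment_distance(c, a, b), point_segment_distance(d, a, b))


def polyline_distance(l1: Polyline, l2: Polyline) -> float:
    return min(segment_distance(l1[i], l1[i + 1], l2[j], l2[j + 1])
               for i in range(len(l1) - 1) for j in range(len(l2) - 1))


def collocated(roads: Dict[str, Polyline], streams: Dict[str, Polyline],
               threshold: float) -> List[str]:
    """Ids of roads lying strictly closer than `threshold` to at least one stream."""
    return [rid for rid, road in roads.items()
            if any(polyline_distance(road, s) < threshold for s in streams.values())]


roads = {"cart_1": [(0.0, 0.0), (10.0, 0.0)], "cart_2": [(0.0, 5.0), (10.0, 5.0)]}
streams = {"stream_1": [(0.0, 0.0005), (10.0, 0.0005)]}
print(collocated(roads, streams, 0.001))  # ['cart_1']
```

The pairwise loop is quadratic in the number of segments; it is meant to make the collocation rule concrete, not to replace an indexed database query.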


Phase I Prototype Demonstration

Cropland collocated with road/river/cart road/stream demonstration

Road frequently crossing river demonstration
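"Road frequently crossing river" can be read as a count of segment intersections between a road and a river. A minimal sketch, assuming planar polylines and a user-chosen `max_crossings` limit; that parameter and the function names are hypothetical, since the slides do not say how "frequently" was defined in the prototype.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Polyline = List[Point]


def _orient(p: Point, q: Point, r: Point) -> int:
    """Sign of the cross product (q - p) x (r - p)."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)


def _within_box(p: Point, q: Point, r: Point) -> bool:
    return (min(p[0], r[0]) <= q[0] <= max(p[0], r[0])
            and min(p[1], r[1]) <= q[1] <= max(p[1], r[1]))


def segments_intersect(a: Point, b: Point, c: Point, d: Point) -> bool:
    o1, o2 = _orient(a, b, c), _orient(a, b, d)
    o3, o4 = _orient(c, d, a), _orient(c, d, b)
    if o1 != o2 and o3 != o4:
        return True
    return ((o1 == 0 and _within_box(a, c, b)) or (o2 == 0 and _within_box(a, d, b))
            or (o3 == 0 and _within_box(c, a, d)) or (o4 == 0 and _within_box(c, b, d)))


def count_crossings(road: Polyline, river: Polyline) -> int:
    """Number of intersecting (road segment, river segment) pairs.

    A crossing exactly at a shared vertex is counted once per adjacent
    segment, so this over-counts slightly in that special case.
    """
    return sum(1 for i in range(len(road) - 1) for j in range(len(river) - 1)
               if segments_intersect(road[i], road[i + 1], river[j], river[j + 1]))


def frequent_crossings(roads: Dict[str, Polyline], rivers: Dict[str, Polyline],
                       max_crossings: int) -> List[Tuple[str, str, int]]:
    """(road id, river id, crossings) for every pair crossing more than max_crossings times."""
    flagged = []
    for rid, road in roads.items():
        for vid, river in rivers.items():
            n = count_crossings(road, river)
            if n > max_crossings:
                flagged.append((rid, vid, n))
    return flagged


rivers = {"river_1": [(0.0, 0.0), (10.0, 0.0)]}
roads = {
    "road_1": [(0.0, 1.0), (1.0, -1.0), (2.0, 1.0), (3.0, -1.0), (4.0, 1.0)],  # zig-zags over the river
    "road_2": [(0.0, 5.0), (10.0, -5.0)],  # crosses once
}
print(frequent_crossings(roads, rivers, max_crossings=2))  # [('road_1', 'river_1', 4)]
```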


Phase I Summary

- Concept feasibility of TopoAssistant established
- Implementation feasibility of TopoAssistant established
- Applied the Phase I prototype to TEC’s Korea dataset for detection of errors and feature attribution
- Successfully demonstrated the Phase I prototype for detection of errors and collocation patterns



Technical Challenges Identified

- Modeling spatial patterns, interest measures
  - Local spatial outliers, edge effect
  - Numeric attribute prediction (e.g., load class of a bridge)
  - Similarity measure for extended objects
- Scalability
  - Multi-way spatial join, top-k querying techniques
- Automation of spatial data mining methods
  - Reduce the user’s burden of enumerating patterns and specifying thresholds
- Topographer’s GUI design
  - Present actionable information from data mining
  - Map comparison tools
- Data mining agent synthesis
  - Allow rapid construction of new mining agents
- Merging spatial datasets from disparate sources


Modeling Spatial Patterns: Edge Effect

Cropland near map edges should not automatically be classified as outliers: the road or river it is collocated with may lie beyond the map edge

No concept of spatial edges in classical data mining

[Figure: map excerpt with legend entries River/Stream, Cropland, Road, and Cropland outlier]
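A simple guard against this edge effect can be sketched under a few assumptions: cropland parcels are reduced to centroid points, roads and rivers are polylines, and the map sheet is an axis-aligned extent. If a centroid is at least the collocation threshold away from every map edge, any road or river within the threshold must lie on the map, so a missing collocation is a genuine outlier candidate; otherwise the result is only inconclusive. Function names and labels are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Polyline = List[Point]
Extent = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax) of the map sheet


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_edge(p: Point, extent: Extent) -> float:
    """Distance from a point inside the extent to the nearest map edge."""
    xmin, ymin, xmax, ymax = extent
    x, y = p
    return min(x - xmin, xmax - x, y - ymin, ymax - y)


def classify_cropland(centroids: Dict[str, Point], lines: List[Polyline],
                      extent: Extent, threshold: float) -> Dict[str, str]:
    labels = {}
    for cid, p in centroids.items():
        nearest = min((point_segment_distance(p, line[i], line[i + 1])
                       for line in lines for i in range(len(line) - 1)),
                      default=math.inf)
        if nearest < threshold:
            labels[cid] = "collocated"
        elif distance_to_edge(p, extent) < threshold:
            # A road or river within `threshold` could lie just off the sheet.
            labels[cid] = "edge: inconclusive"
        else:
            labels[cid] = "outlier"
    return labels


extent = (0.0, 0.0, 10.0, 10.0)
river = [(0.0, 5.0), (10.0, 5.0)]
centroids = {"c1": (5.0, 5.5), "c2": (5.0, 9.5), "c3": (5.0, 2.0)}
print(classify_cropland(centroids, [river], extent, threshold=1.0))
# {'c1': 'collocated', 'c2': 'edge: inconclusive', 'c3': 'outlier'}
```

Reporting the edge distance itself, rather than only a label, would match the "feedback to the user" idea in the Phase II approach.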


Modeling Spatial Patterns: Local Spatial Outliers

- Going downstream, streams usually become rivers, but rivers rarely become streams unless a lake is nearby
- A river becoming a stream is therefore a local spatial outlier
- It can be detected from the spatial neighborhood

[Figure: stream and river segments]
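The rule on this slide can be phrased as a neighborhood test on a directed drainage network. A minimal sketch, assuming each segment records its class (`"stream"` or `"river"`), its downstream neighbor, and the coordinates of its downstream end, and that lakes are given as points; the field names and the search radius are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


def river_to_stream_outliers(segments: Dict[str, dict], lakes: List[Point],
                             radius: float) -> List[Tuple[str, str]]:
    """Flag river -> stream transitions (going downstream) with no lake nearby."""
    flagged = []
    for sid, seg in segments.items():
        nxt: Optional[str] = seg["downstream"]
        if nxt is None:
            continue  # outlet: no downstream neighbor to compare with
        if seg["cls"] == "river" and segments[nxt]["cls"] == "stream":
            junction = seg["end"]
            if not any(math.dist(junction, lake) <= radius for lake in lakes):
                flagged.append((sid, nxt))
    return flagged


segments = {
    "a": {"cls": "stream", "downstream": "b", "end": (1.0, 0.0)},
    "b": {"cls": "river", "downstream": "c", "end": (2.0, 0.0)},
    "c": {"cls": "stream", "downstream": "d", "end": (3.0, 0.0)},
    "d": {"cls": "river", "downstream": "e", "end": (4.0, 0.0)},
    "e": {"cls": "stream", "downstream": None, "end": (5.0, 0.0)},
}
lakes = [(4.2, 0.0)]
print(river_to_stream_outliers(segments, lakes, radius=0.5))  # [('b', 'c')]
```

The d -> e transition is not flagged because the lake lies within the radius of its junction, which is the "unless a lake is nearby" exception.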


Modeling Spatial Patterns: Similarity Measures (Occasional vs. Persistent Collocations)

[Figure: examples of an occasional and a persistent collocation]


Scalability: Performance

- Machine used: 1.4 GHz Athlon with 512 MB RAM
- Total execution time:
  - Disconnected roads: 4.5 minutes
  - Road crossing river: 5 minutes
  - Cropland collocated with road/river: 13.5 minutes
  - Road-river collocation: 12 minutes
- Performance is satisfactory for the TEC Korea dataset
- Faster machines can be used for even better performance
  - Bigger maps possible
  - Outlier tests could be more elaborate


Scalability: Performance (Spatial Join over More than 2 Tables)

```sql
CREATE TABLE croplandcollocateroadriver AS
SELECT DISTINCT C.*
FROM trntrdlinetable T1, trntrnlinetable T, sdrareatable R1, sdrlinetable R,
     vegareatable C
WHERE (C.f_code_des = 'Cropland'   AND distance(C.the_geom, R.the_geom)  < 0.001) OR
      (C.f_code_des = 'Rice Field' AND distance(C.the_geom, R.the_geom)  < 0.001) OR
      (C.f_code_des = 'Cropland'   AND distance(C.the_geom, R1.the_geom) < 0.001) OR
      (C.f_code_des = 'Rice Field' AND distance(C.the_geom, R1.the_geom) < 0.001) OR
      (C.f_code_des = 'Cropland'   AND distance(C.the_geom, T.the_geom)  < 0.001) OR
      (C.f_code_des = 'Rice Field' AND distance(C.the_geom, T.the_geom)  < 0.001) OR
      (C.f_code_des = 'Cropland'   AND distance(C.the_geom, T1.the_geom) < 0.001) OR
      (C.f_code_des = 'Rice Field' AND distance(C.the_geom, T1.the_geom) < 0.001);
```

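Query execution takes 24 hours. The FROM clause lists all five tables without any join predicate, so the OR-ed distance tests are evaluated over the Cartesian product of the tables, and a plain distance() comparison cannot use a spatial index.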


Automation of Spatial Data Mining Methods: Choosing Right Threshold Values

| Collocation pattern | Number of collocated features | Interest measure (%) = (collocated roads / total roads) * 100 |
| --- | --- | --- |
| Road with stream | 153 of 239 | 64 % |
| Road with river | 96 of 239 | 40 % |
| Road with stream or river | 176 of 239 | 74 % |
| Cart road with stream | 97 of 136 | 71 % |
| Cart road with river | 44 of 136 | 32 % |
| Cart road with stream or river | 111 of 136 | 82 % |
| All roads with river or stream | 287 of 375 | 77 % |

At distance threshold = 0.001, 77 % of all roads are collocated with a river or stream


Automation of Spatial Data Mining Methods: Choosing Right Threshold Values

| Collocation pattern | Number of collocated features | Interest measure (%) = (collocated roads / total roads) * 100 |
| --- | --- | --- |
| Road with stream | 133 of 239 | 56 % |
| Road with river | 238 of 239 | 99 % |
| Road with stream or river | 239 of 239 | 100 % |
| Cart road with stream | 130 of 136 | 96 % |
| Cart road with river | 79 of 136 | 58 % |
| Cart road with stream or river | 136 of 136 | 100 % |
| All roads with river or stream | 375 of 375 | 100 % |

At distance threshold = 0.01, 100 % of all roads are collocated with a river or stream!
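The two tables differ only in the distance threshold, and the interest measure saturates as the threshold grows. The sketch below computes the tables' measure, (collocated roads / total roads) * 100, and sweeps it over candidate thresholds, which is one way a tool could show the user how sensitive a pattern is to this choice. It assumes each road's distance to its nearest river or stream has already been computed (for example by a PostGIS query); the road ids and distances in the usage lines are made up.

```python
from typing import Dict, Iterable, List, Tuple


def percent(hits: int, total: int) -> float:
    """Interest measure from the tables: (collocated roads / total roads) * 100."""
    return 100.0 * hits / total if total else 0.0


def interest_measure(nearest: Dict[str, float], threshold: float) -> float:
    """Percentage of features whose nearest river/stream is closer than `threshold`."""
    hits = sum(1 for d in nearest.values() if d < threshold)
    return percent(hits, len(nearest))


def sweep(nearest: Dict[str, float], thresholds: Iterable[float]) -> List[Tuple[float, float]]:
    """Interest measure at each candidate threshold (non-decreasing as the threshold grows)."""
    return [(t, interest_measure(nearest, t)) for t in thresholds]


print(round(percent(287, 375)))  # 77, the "All roads with river or stream" row at 0.001

# Made-up nearest-neighbor distances for four roads.
nearest = {"r1": 0.0004, "r2": 0.002, "r3": 0.005, "r4": 0.0009}
print(sweep(nearest, [0.001, 0.01]))  # [(0.001, 50.0), (0.01, 100.0)]
```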


Automation of Spatial Data Mining Methods

- Distance and other thresholds, e.g., the distance threshold for road-river collocation
- Collocation patterns: enumerating over many tables, e.g., trying all possible pairs and triplets of cropland and all other features


Topographer’s GUI Design

- Two layers drawn on top of each other obstruct the topographer’s view, e.g., a road and a river drawn on top of each other tend to obscure each other
- Mock-up GUI specs


Data Mining Agent Synthesis

- The present methodology is very manual
- How can we facilitate quick and convenient synthesis of new mining agents?


Merging Spatial Datasets from Disparate Sources

- Disparate FACCs (Feature and Attribute Coding Catalogue codes), e.g., Common Open Water in the TEC Korea dataset
- How can we merge spatial datasets from disparate sources?


Technical Challenges Summary

Concept feasibility of TopoAssistant established

Implementation feasibility of TopoAssistant established

Successful demonstration of TopoAssistant Phase I prototype

Confident of addressing all identified technical challenges in follow-on Phase II



Phase II Technical Approach

- Scalability challenge
- Automation of spatial data mining methods
- Topographer’s GUI design
- Modeling spatial patterns
- Data mining agent synthesis
- Merging spatial datasets


Approach to Scalability Challenge

- SDM level
  - Special algorithms, e.g., top-k, Apriori-like, MRF (Markov random field), SAR (spatial autoregression)
- Geometry engine level
  - Geometry simplification, e.g., convex hull, spline
  - Algorithms, e.g., plane sweep
- Database level
  - Filter and refine, e.g., MOBR (minimum orthogonal bounding rectangle)
  - Spatial indexing, multi-way spatial join algorithms (spatial star join)
- System level
  - Reduce JDBC overhead: send multiple queries across at once
  - Rewrite the SQL, e.g., tables versus views
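Filter and refine with MOBRs and a plane sweep can be sketched as follows for a two-layer join. To keep the refine step short, the left layer is reduced to points (e.g., cropland centroids) and the right layer holds polylines; that simplification and all function names are assumptions, not the Phase II design. Growing each point's box by the distance d guarantees that no pair within d is lost in the filter step; the exact distance test then removes false positives.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Polyline = List[Point]                   # at least two vertices
Box = Tuple[float, float, float, float]  # MOBR as (xmin, ymin, xmax, ymax)


def mobr(coords: List[Point], pad: float = 0.0) -> Box:
    """Minimum orthogonal bounding rectangle, optionally grown by `pad` on every side."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs) - pad, min(ys) - pad, max(xs) + pad, max(ys) + pad)


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def filter_step(points: Dict[str, Point], lines: Dict[str, Polyline],
                d: float) -> List[Tuple[str, str]]:
    """Plane sweep along x: candidate (point, line) pairs whose boxes overlap."""
    boxes = [(mobr([p], d), 0, k) for k, p in points.items()]
    boxes += [(mobr(g), 1, k) for k, g in lines.items()]
    boxes.sort(key=lambda item: item[0][0])  # the sweep line advances by xmin
    active: Tuple[list, list] = ([], [])     # boxes still crossing the sweep line, per side
    candidates = []
    for box, side, key in boxes:
        for s in (0, 1):  # retire boxes that end before the sweep line
            active[s][:] = [(b, k) for b, k in active[s] if b[2] >= box[0]]
        for other, other_key in active[1 - side]:
            if other[1] <= box[3] and box[1] <= other[3]:  # y-intervals overlap
                candidates.append((key, other_key) if side == 0 else (other_key, key))
        active[side].append((box, key))
    return candidates


def spatial_join(points: Dict[str, Point], lines: Dict[str, Polyline],
                 d: float) -> List[Tuple[str, str]]:
    """Filter with MOBRs, then refine with the exact point-to-polyline distance."""
    result = []
    for pk, lk in filter_step(points, lines, d):
        p, line = points[pk], lines[lk]
        if min(point_segment_distance(p, line[i], line[i + 1])
               for i in range(len(line) - 1)) < d:
            result.append((pk, lk))
    return sorted(result)


points = {"c1": (0.5, 0.2), "c2": (5.0, 5.0), "c3": (3.8, 5.0)}
lines = {"road": [(0.0, 0.0), (1.0, 0.0)], "river": [(4.0, 0.0), (4.0, 10.0)]}
print(spatial_join(points, lines, 0.3))  # [('c1', 'road'), ('c3', 'river')]
```

A multi-way query such as the cropland/road/river join could run this pairwise step once per layer and combine the results (union for OR-ed conditions, intersection for AND-ed ones); dedicated multi-way join algorithms aim to do better than that.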


Automation of Spatial Data Mining Methods

- Reduce the user’s burden of specifying thresholds
  - Sorting to help identify the top-k outliers
  - Statistical analysis to derive thresholds, e.g., flag values outside mean ± 3 × standard deviation
- Reduce the burden of enumerating candidate patterns
  - Enumeration agents, e.g., enumerate pairs, triplets, ... (subsets of a given set of features/layers)
  - Compute the interest measure for each candidate
  - Report interesting candidates
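An enumeration agent can be sketched as a loop over all pairs, triplets, and so on that contain a chosen anchor layer, scoring each candidate with an interest measure and reporting those above a cut-off. The measure used here, the fraction of anchor instances with a neighbor closer than d in every other layer, is modeled on the road/river ratio in the threshold tables. Reducing layers to points, the cut-off value, and all names are assumptions.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def collocation_ratio(layers: Dict[str, List[Point]], candidate: Sequence[str],
                      d: float) -> float:
    """Fraction of anchor-layer instances with a neighbor closer than d in every other layer."""
    anchor, *others = candidate
    instances = layers[anchor]
    if not instances:
        return 0.0

    def has_neighbor(p: Point, layer: str) -> bool:
        return any(math.dist(p, q) < d for q in layers[layer])

    hits = sum(1 for p in instances if all(has_neighbor(p, o) for o in others))
    return hits / len(instances)


def enumerate_patterns(layers: Dict[str, List[Point]], anchor: str, sizes: Sequence[int],
                       d: float, min_measure: float) -> List[Tuple[Tuple[str, ...], float]]:
    """Try every pattern of the given sizes that contains `anchor`; keep interesting ones."""
    others = sorted(name for name in layers if name != anchor)
    report = []
    for size in sizes:
        for combo in combinations(others, size - 1):
            candidate = (anchor, *combo)
            measure = collocation_ratio(layers, candidate, d)
            if measure >= min_measure:
                report.append((candidate, measure))
    return report


layers = {
    "cropland": [(0.0, 0.0), (10.0, 10.0)],
    "river": [(0.0, 0.5)],
    "road": [(0.5, 0.0), (10.0, 10.5)],
    "lake": [(50.0, 50.0)],
}
for candidate, measure in enumerate_patterns(layers, "cropland", sizes=(2, 3), d=1.0, min_measure=0.5):
    print(candidate, measure)
# ('cropland', 'river') 0.5
# ('cropland', 'road') 1.0
# ('cropland', 'river', 'road') 0.5
```

The number of candidates grows combinatorially with the number of layers, which is why Apriori-like pruning appears among the SDM-level scalability techniques.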


Topographer’s GUI Design

- Workflow modeling
  - Identify common topography tasks
  - Identify dependencies among topography tasks
  - Specify common workflows
- GUI design to facilitate common workflows
  - Interview-based
  - Wizards to guide new users


Modeling Spatial Patterns

- Similarity measures for extended objects
  - Buffer-based measure: intersection(object1, buffer(object2, d))
- Similarity measures incorporating non-geometric attributes
  - Measures of neighborhood homogeneity, e.g., entropy
- Spatial edge effect
  - Feedback to the user, e.g., distance of the pattern to the edge
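The buffer-based measure can be approximated without a full geometry engine by sampling object1 along its length and testing each sample against the buffer of object2. The sketch assumes polylines and returns length(intersection(object1, buffer(object2, d))) / length(object1), so an occasional collocation scores well below 1 and a persistent one close to 1. The entropy function scores neighborhood homogeneity from the class labels of a feature's neighbors. Names, the sampling step, and the normalization by length are assumptions; a production version could instead call a geometry engine (e.g., PostGIS `ST_Buffer`, `ST_Intersection`, `ST_Length`).

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Polyline = List[Point]  # at least two vertices


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def buffer_overlap_fraction(obj1: Polyline, obj2: Polyline, d: float, step: float) -> float:
    """Approximate length(intersection(obj1, buffer(obj2, d))) / length(obj1).

    obj1 is cut into pieces no longer than `step`; a piece counts as inside
    the buffer when its midpoint is within d of obj2 (midpoint rule).
    """
    inside = total = 0.0
    for a, b in zip(obj1, obj1[1:]):
        seg_len = math.dist(a, b)
        n = max(1, math.ceil(seg_len / step))
        piece = seg_len / n
        for i in range(n):
            t = (i + 0.5) / n
            mid = (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
            total += piece
            if min(point_segment_distance(mid, c, e) for c, e in zip(obj2, obj2[1:])) <= d:
                inside += piece
    return inside / total if total else 0.0


def neighborhood_entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (bits) of neighbor class labels: 0 means a homogeneous neighborhood."""
    n = len(labels)
    if n == 0:
        return 0.0
    h = -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())
    return h if h > 0 else 0.0


road = [(0.0, 0.0), (10.0, 0.0)]
river_partial = [(0.0, 0.5), (4.0, 0.5)]    # runs beside the road for a short stretch only
river_parallel = [(0.0, 0.5), (10.0, 0.5)]  # runs beside the road all the way
print(buffer_overlap_fraction(road, river_partial, d=1.0, step=1.0))   # 0.5
print(buffer_overlap_fraction(road, river_parallel, d=1.0, step=1.0))  # 1.0
print(neighborhood_entropy(["cropland", "road", "cropland", "road"]))  # 1.0
```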


Data Mining Agent Synthesis

- Graphical/interactive or scripting language to
  - Specify the SDM process/workflow
  - Visualize and evaluate data/patterns
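A scripting-language front end could describe a mining agent as an ordered list of named steps over a shared context. The sketch below is hypothetical (the class, step names, and the 70 % cut-off are invented for illustration; the counts come from the threshold table) and only shows the composition idea; a graphical editor could generate the same structure.

```python
from typing import Any, Callable, Dict, List, Tuple

Context = Dict[str, Any]
Step = Callable[[Context], Context]


class Workflow:
    """Hypothetical scripted mining agent: an ordered list of named steps."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.steps: List[Tuple[str, Step]] = []

    def then(self, label: str, step: Step) -> "Workflow":
        """Append a step and return self so steps can be chained."""
        self.steps.append((label, step))
        return self

    def run(self, ctx: Context) -> Context:
        """Run the steps in order, recording which ones executed."""
        trace: List[str] = []
        for label, step in self.steps:
            ctx = step(ctx)
            trace.append(label)
        return {**ctx, "trace": trace}


agent = (
    Workflow("road-river collocation")
    .then("measure", lambda c: {**c, "measure": 100.0 * c["hits"] / c["total"]})
    .then("threshold", lambda c: {**c, "interesting": c["measure"] >= c["min_measure"]})
)
result = agent.run({"hits": 287, "total": 375, "min_measure": 70.0})
print(result["interesting"], result["trace"])  # True ['measure', 'threshold']
```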


Merging Spatial Datasets

- Ontology mapping tools
- Design a custom converter
- Universal ontology, e.g., OGIS, GML
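A custom converter can be driven by an explicit mapping table from (source dataset, source feature description) to a common target class, with unmapped descriptions reported for review rather than silently dropped. In the sketch, the `f_code_des` column name comes from the Phase I tables; the second dataset, the target class names, and the mapping entries are hypothetical.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]

# Hypothetical mapping: (source dataset, source f_code_des) -> common target class.
MAPPING: Dict[Tuple[str, str], str] = {
    ("tec_korea", "Common Open Water"): "Inland Water",
    ("tec_korea", "River/Stream"): "River/Stream",
    ("dataset_b", "Lake"): "Inland Water",
    ("dataset_b", "Stream"): "River/Stream",
}


def convert(records: List[Record], source: str,
            mapping: Dict[Tuple[str, str], str]) -> Tuple[List[Record], List[Record]]:
    """Rename feature classes into the common ontology; return (converted, unmapped)."""
    converted, unmapped = [], []
    for rec in records:
        target = mapping.get((source, rec["f_code_des"]))
        if target is None:
            unmapped.append(rec)  # keep for manual review instead of dropping
        else:
            converted.append({**rec, "f_code_des": target, "source": source})
    return converted, unmapped


records = [{"id": 1, "f_code_des": "Common Open Water"}, {"id": 2, "f_code_des": "Swamp"}]
done, todo = convert(records, "tec_korea", MAPPING)
print(done)  # [{'id': 1, 'f_code_des': 'Inland Water', 'source': 'tec_korea'}]
print(todo)  # [{'id': 2, 'f_code_des': 'Swamp'}]
```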


Technical Approach Summary

Technical approach addresses all technical challenges identified

Confident of incorporating technical approach in full-scale Phase II prototype



Phase II Tasks

- Refine TopoAssistant architecture
- Build spatial data mining engine
- Build Topographer’s GUI
- Build predictive and map validation agents
- System evaluation and refinement


Refine the Architecture

- New modules: spatial data mining engine, geometry engine, visualization engine for map comparison
- Reuse of COTS and public-domain software
- Portable
- Extensible
- Interoperability with other tools used in the MSDS generation process


Preliminary TopoAssistant System Architecture

[Figure: preliminary TopoAssistant system architecture. Components: Datasets 1, 2, and 3; Data Conversion and Import Tool; Ontology; Spatial Database; Exporter; MSDS; Topographer’s GUI; Data Enhancement/Tailoring Agents; Verification/Error Detection Agents. Connection labels: Recommendations, Approvals, Approved Updates.]


Refined TopoAssistant System Architecture

[Figure: refined TopoAssistant system architecture. Components: Datasets 1, 2, and 3; Data Conversion and Import Tool; Ontology; Spatial Database; Exporter; MSDS; Topographer’s GUI; Multiple Map Visualization; Map Error Detection Agents; Map Feature Attribution Agents; Spatial Data Mining Engine; Geometry Engine. Connection labels: Recommendations, Approvals, Approved Updates.]

- New components: geometry engine, spatial data mining engine, and multiple map visualization
- Standard interfaces, plug-in components
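Standard interfaces with plug-in components could look like the following: every agent implements one small interface, and a registry runs whichever agents are installed. The interface, registry, and example agent are hypothetical sketches, not the C/JMTK component model.

```python
from typing import Dict, List, Protocol


class MiningAgent(Protocol):
    """Hypothetical standard interface that every plug-in agent implements."""

    name: str

    def run(self, layers: Dict[str, list]) -> List[str]:
        """Inspect the feature layers and return findings (here: plain strings)."""
        ...


class AgentRegistry:
    """Holds installed plug-ins; new agents are added without changing this code."""

    def __init__(self) -> None:
        self._agents: Dict[str, MiningAgent] = {}

    def register(self, agent: MiningAgent) -> None:
        if agent.name in self._agents:
            raise ValueError(f"duplicate agent name: {agent.name}")
        self._agents[agent.name] = agent

    def run_all(self, layers: Dict[str, list]) -> Dict[str, List[str]]:
        return {name: agent.run(layers) for name, agent in self._agents.items()}


class EmptyLayerCheck:
    """Example plug-in: reports layers that contain no features."""

    name = "empty-layer-check"

    def run(self, layers: Dict[str, list]) -> List[str]:
        return sorted(layer for layer, items in layers.items() if not items)


registry = AgentRegistry()
registry.register(EmptyLayerCheck())
print(registry.run_all({"roads": [1, 2], "lakes": []}))  # {'empty-layer-check': ['lakes']}
```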


Preliminary Implementation Architecture

[Figure: preliminary implementation architecture. An object-based software bus (e.g., COM) links the Database (through a C/JMTK wrapper over ArcSDE) with the Mining Agents, the GUI, and the Import/Export Tool, each built on C/JMTK (ArcObjects).]

- Preliminary approach: integration within the C/JMTK framework
- Seamless interoperability with commercial applications


Refined Implementation Architecture

[Figure: refined implementation architecture. An object-based software bus (e.g., COM) links the Database (through a C/JMTK wrapper over ArcSDE) with the Mining Agents, the GUI, the Import/Export Tool, and a new Spatial Data Mining Engine, each built on C/JMTK (ArcObjects).]

- New spatial data mining engine component
- Integration within the C/JMTK framework
- Seamless interoperability with commercial applications


Build Spatial Data Mining Engine

- Performance tuning
  - Identify bottlenecks
  - Address performance bottlenecks using new algorithms, data structures, and architecture refinement
- Modeling spatial patterns
  - Extend spatial data mining algorithms with new interest measures, e.g., buffer-based, entropy
- Build enumerative agents
- Build threshold-free methods
  - Sorting and statistical techniques
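Two threshold-free aids named in the Phase II approach can be sketched with the Python standard library: ranking features by how far a per-feature value lies from the mean and returning the top k, and deriving the cut-off from the data as mean ± 3 × standard deviation. What the per-feature values are is an assumption; they could be, for example, the difference between a feature's attribute and the average over its spatial neighbors. The segment ids and values in the usage lines are made up.

```python
import heapq
import statistics
from typing import Dict, List


def three_sigma_outliers(values: Dict[str, float], k: float = 3.0) -> List[str]:
    """Keys whose value lies outside mean +/- k * (population) standard deviation."""
    data = list(values.values())
    mu = statistics.mean(data)
    sigma = statistics.pstdev(data)
    if sigma == 0:
        return []
    return sorted(key for key, v in values.items() if abs(v - mu) > k * sigma)


def top_k_outliers(values: Dict[str, float], k: int) -> List[str]:
    """The k keys farthest from the mean, most extreme first (no cut-off needed)."""
    mu = statistics.mean(list(values.values()))
    return heapq.nlargest(k, values, key=lambda key: abs(values[key] - mu))


values = {f"seg{i}": 10.0 for i in range(19)}
values["seg19"] = 100.0
print(three_sigma_outliers(values))  # ['seg19']
print(top_k_outliers(values, 1))     # ['seg19']
```

With the population standard deviation no value can lie more than sqrt(n - 1) standard deviations from the mean, so the 3σ rule cannot flag anything in samples of 10 or fewer values; ranking by deviation still works there.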


Topographer’s GUI Design

- User interface design
  - Requirement analysis
  - Develop use-case scenarios
  - Mock-up GUI
  - Spiral development
- Build pattern evaluation tools to support
  - Visual inspection: allow map comparisons
  - Collaborative review
  - Statistical measures
  - Comparison with a gold standard, e.g., image data


Build Predictive and Validation Agents

- Build outlier detection agents
  - Single-layer outliers using statistical tests
  - Multi-layer outliers using collocation tests
  - Outliers via user-defined tests
- Build predictive agents
  - Build collocation-based predictive agents
  - Build attribute value prediction agents, e.g., via spatial autoregression (SAR) or Markov random field (MRF) Bayesian classifiers
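For reference, the spatial autoregressive (SAR) model in its standard form is shown below. How TopoAssistant would parameterize it (choice of neighborhood matrix, estimation method) is not specified here, so the row-normalized W and the plug-in prediction are assumptions.

```latex
\mathbf{y} = \rho W \mathbf{y} + X \boldsymbol{\beta} + \boldsymbol{\varepsilon},
\qquad \boldsymbol{\varepsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^{2} I)

\hat{\mathbf{y}} = (I - \hat{\rho} W)^{-1} X \hat{\boldsymbol{\beta}}
```

Here y is the n × 1 vector of the attribute being predicted, W is an n × n neighborhood (contiguity) matrix, X holds n × p explanatory attributes, and ρ measures the strength of spatial dependence. The second line follows from solving (I − ρW)y = Xβ + ε for the expected value of y; with ρ = 0 the model reduces to ordinary linear regression.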


System Evaluation and Refinement

- Obtain benchmark datasets from TEC
- Select test cases, e.g., visually identified prediction and validation scenarios (patterns)
- Compare results from agents with test cases
- Identify alpha testers from TEC
- Provide an alpha prototype to TEC topographers (Year 1)
- Iteratively refine the prototype based on TEC feedback
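Comparing agent output with test cases reduces to the overlap between the features an agent flags and the features in the benchmark. A minimal sketch using precision, recall, and F1 (standard metrics; choosing them for TopoAssistant's evaluation is an assumption), with hypothetical feature ids:

```python
from typing import Dict, Iterable


def score(flagged: Iterable[str], ground_truth: Iterable[str]) -> Dict[str, float]:
    """Precision, recall and F1 of an agent's flagged features against a test case."""
    flagged_set, truth_set = set(flagged), set(ground_truth)
    tp = len(flagged_set & truth_set)  # correctly flagged features
    precision = tp / len(flagged_set) if flagged_set else 0.0
    recall = tp / len(truth_set) if truth_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


print(score(["r1", "r2", "r3", "r4"], ["r2", "r3", "r5"]))
```

Precision tells topographers how much of their review effort an agent's output would waste, while recall tells them how many known problems it would miss.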


Phase II Work Plan

Task schedule over quarters Q1 to Q8 (# marks a quarter in which the task is scheduled):

1. Refine TopoAssistant architecture: #
2. Build spatial data mining engine: # # # #
3. Build Topographer’s GUI: # # # #
4. Build predictive and map validation agents: # # # #
5. System evaluation and refinement: # # # # # #
6. Program management and technical reports: # # # # # # # #

Important milestones:

- Deliver a mock-up GUI to TEC at the end of quarter 2 (month 6)
- Deliver alpha software to TEC for testing at the end of quarter 4 (end of Year 1)
- Refine the system based on TEC testing and feedback, and progressively deliver more mature versions at the end of quarters 5, 6, 7, and 8



Summary

- Established the concept and implementation feasibility of the TopoAssistant approach via Phase I results and the Phase I prototype demonstration
- Identified technical challenges to be addressed in follow-on Phase II
- Developed a technical approach to address the technical challenges
- Developed the Phase II task list and work plan
- Confident of incorporating our ideas in the full-scale Phase II prototype