Phase I Status Update April 26 2004 -- page 1 Architecture Technology CorporationSpecialists in Computer Architecture
Spatial Data Mining Toolkit for Refining MSDS
(aka TopoAssistant) TEC SBIR Phase I A03-129
Status Update
Ranga Ramanujan, 952-829-5864 (x120)
Sid Kudige, 952-829-5864 (x163)
Shashi Shekhar, 612-624-8307
Gene Proctor, 202-293-9701 (x113)
Agenda
SBIR Review                           09:00 - 12:00   Kudige
Lunch                                 12:00 - 01:00
ATC R&D Overview                      01:00 - 01:45   Ramanujan
Spatial Data Mining Research at UMN   01:45 - 02:15   Shekhar
Facility Tour                         02:15 - 02:45   Proctor
Outline
• SBIR goal, motivation and innovations
• Phase I results
• Phase I prototype demonstration
• Technical challenges
• Phase II technical approach
• Phase II work plan
• Summary
Overall SBIR Goal
• Develop TopoAssistant tool for assisting Army topographers with refinement of feature data for “just-in-time” MSDS
Phase I Goal
• Develop architecture and design of TopoAssistant software tool
• Build rapid prototype to establish implementation feasibility
Phase II Goal
• Build full-scale operational prototype of TopoAssistant
Phase III Goal
• Transition TopoAssistant to fielded system
Team
• Sid Kudige - PI
• Ranga Ramanujan - Tech. Advisor
• Prof. Shashi Shekhar - Consultant
• Gene Proctor - Commercialization
Motivation and Payoff
• Current process for refining MSDS feature data is time consuming and expensive
  • Study estimate of 2,400 production hours for DTOP 5 data set for 15’X15’ cell size [Kabinier]
• TopoAssistant tool will use innovative spatial data mining techniques to
  • Significantly automate feature data refinement
    • Detection of errors in source data: prediction of positional errors, prediction of extra/erroneous/missing features, prediction of mislabeled features
    • Feature attribution: prediction of missing features (categorical), prediction of erroneous/missing attribute values (numerical)
  • Support timely and cost-effective Army co-production and value adding for MSDS feature data
TopoAssistant Innovations
• Novel approach for automating feature data refinement using spatial data mining techniques
  • Detection of errors
    • Spatial outlier detection: statistical/empirical rules, collocation based rules
  • Feature attribution
    • Attribute/location prediction techniques: collocation based rules
• Open/extensible implementation architecture
  • Plug-in/add-on spatial data mining techniques
  • C/JMTK framework compliant
  • Seamless integration with commercial GIS products
Phase I Results
• Demonstrated TopoAssistant feasibility
  • Implementation feasibility: built prototype
  • Concept feasibility: designed prototype evaluation methodology for TEC datasets
  • Concept feasibility: applied spatial data mining techniques for
    • Detection of errors: prediction of positional errors, prediction of extra/erroneous/missing features, prediction of mislabeled features
    • Feature attribution: prediction of missing features
• Identified technical challenges and Phase II approach for addressing them
Implementation Feasibility: Phase I Prototype Architecture
[Figure: Phase I prototype architecture]
• Back-end spatial database component: shapefile dataset; shapefile to SQL conversion (shp2pgsql); load SQL tables into Postgres/PostGIS; spatial joins using SQL queries
• JDBC bridge
• Front-end spatial data mining component: outlier detection/collocation package (Weka); convert SQL tables into shapefiles
• Data visualization component: shapefiles; visualize shapefiles with ArcExplorer into maps
Architecture Components
• Back-end Spatial Database Component
  • PostGIS - spatially enables PostgreSQL tables; OGIS compliant
  • Shp2pgsql tool - shapefile to SQL table conversion
  • Bulk loader - loads SQL tables into the spatially enabled database
• Front-end Data Mining Component
  • Weka - Java based open-source software that implements classical data mining techniques
  • Custom spatial data mining classes - spatial outlier detection/collocation pattern detection package implemented for Weka
  • Pgsql2shp - converts SQL tables returned as a result of an outlier detection/collocation pattern detection operation into shapefiles
Architecture Components (continued)
• Connector Component - JDBC Bridge
  • Java client in Weka can access PostGIS “geometry” objects in the Postgres database using JDBC extensions bundled with Postgres and PostGIS
  • JDBC bridge successfully tested on test machine
• Map Visualization Component
  • ArcExplorer for shapefile visualization
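The data flow between these components can be sketched as three command lines. This is a minimal sketch that only builds the argument lists and runs nothing; the function name, file names and table names are placeholders that do not come from the slides, and the options shown should be checked against the installed PostGIS release.

```python
def pipeline_commands(shapefile, table, database, result_table, out_shapefile):
    """Return the prototype's data flow as a list of argv lists (nothing is executed)."""
    return [
        # 1. Shapefile -> SQL script. shp2pgsql writes SQL to stdout, which is
        #    assumed here to be redirected into "<table>.sql".
        ["shp2pgsql", shapefile, table],
        # 2. Load the SQL script into the spatially enabled database.
        ["psql", "-d", database, "-f", table + ".sql"],
        # 3. Mining result table -> shapefile, ready for viewing in ArcExplorer.
        ["pgsql2shp", "-f", out_shapefile, database, result_table],
    ]
```

Each list could be handed to `subprocess.run` once the placeholder names are replaced with real ones.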
Prototype Evaluation Methodology
• Received Korea dataset from TEC
• Reviewed dataset using ArcExplorer
• Leveraged spatial database component to convert shapefile to SQL script
• Loaded table in Postgres/PostGIS
• Formulated and ran SQL3/OGIS queries to mine outliers/collocation patterns and compute interest measures
• Converted resulting tables into shapefiles
• Visualized results using ArcExplorer
TEC Dataset Overview
• Korea dataset
  • Latitude 37deg15min to 37deg30min
  • Longitude 128deg23min51sec to 128deg23min52sec
• Layers
  • Obstacles (cut, embankment, depression)
  • Surface drainage (river, stream, island, common open water, ford, dam)
  • Slope
  • Soils (poorly graded gravel, clayey sand, organic silt, disturbed soil)
  • Vegetation (land subject to inundation, cropland, rice field, evergreen trees, mixed trees)
  • Transport (roads, cart roads, railways)
TEC Dataset Overview (continued)
• Visualized using ArcExplorer, except elevation data
• Interpreted feature sets in TEC datasets using FACC
  • Except common open water feature (surface drain layer)
• Pattern rich
  • Numerous spatial outliers
  • Collocation patterns
• Promising test dataset for spatial data mining
Detecting Errors via Spatial Outliers
• Motivation - improve map accuracy by detecting/predicting
  • Positional errors
  • Extra/erroneous/missing features
  • Mislabeled/misclassified features
• Spatial outlier detection techniques
  • Statistical/user defined tests
  • Collocation patterns
Spatial Outliers Detected
• Statistical/user defined tests
  • Disconnected road
  • Overlapping road and river
Statistical/Empirically Derived Outliers
Positional Error: Disconnected Roads
[Figure: map with the disconnects marked on Road 1 through Road 6]
• 6 disconnected roads discovered
• Visual inspection may not reveal disconnect without further zooming
• May be indicative of positional error
• Distance threshold is 0.001 units
Disconnected Road: Magnified Views
[Figures: magnified views of the disconnects on Road 1, Road 2, Road 3 (two views), Road 4, Road 5 and Road 6]
Disconnected Road: Magnified View, Frontage Road Example (Road 4)
• Interesting because the end-point of Road 4 does not visually appear to be close to the end-point of another road. Or is it?
• Afterthought: Road 4 resembles a frontage road
Disconnected Road: Additional Outlier Discovered
[Figure: magnified view of Road 6 with an additional outlier marked]
Detecting Disconnected Roads: Empirical Technique Used
• Determine and store start-point and end-point of each road in the road table
• Calculate the distance from the start-point and end-point of each road to the start-point and end-point of every other road
• Flag as outliers roads that do not touch each other but whose ends are less than 0.001 units apart
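The end-point test above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: roads are plain vertex lists, and "not touching" is approximated by requiring a non-zero gap between end points rather than a full geometry disjointness test. The function and variable names are hypothetical.

```python
from math import hypot

def near_miss_roads(roads, threshold=0.001):
    """roads: dict mapping road id -> list of (x, y) vertices.

    Returns ids of roads having an end point closer than `threshold` to an
    end point of another road without coinciding with it.
    """
    # Step 1: keep only the start point and end point of each road.
    ends = {rid: (pts[0], pts[-1]) for rid, pts in roads.items()}
    flagged = set()
    # Step 2: compare every road's end points with every other road's.
    for a, a_ends in ends.items():
        for b, b_ends in ends.items():
            if a == b:
                continue
            for p in a_ends:
                for q in b_ends:
                    gap = hypot(p[0] - q[0], p[1] - q[1])
                    # Step 3: a small but non-zero gap suggests a disconnect.
                    if 0 < gap < threshold:
                        flagged.add(a)
    return flagged
```

The double loop compares every pair of roads, so the cost grows quadratically with the number of roads, just as the self-join in the spatial query does.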
Detecting Disconnected Roads: Spatial Query Fragment
CREATE VIEW Road AS
SELECT T.id AS Road_id,
       T.the_geom AS Road_Geometry,
       startpoint ( T.the_geom ) AS Road_Start_Point,
       endpoint ( T.the_geom ) AS Road_End_Point
FROM Road_Line_Table T;
CREATE VIEW Disconnected_Road AS
SELECT R1.Road_id AS Disconnected_Road_id
FROM Road R1, Road R2
WHERE ( disjoint ( R1.Road_Geometry, R2.Road_Geometry ) = true )
  AND ( distance ( R1.Road_Start_Point, R2.Road_Start_Point ) < 0.001
     OR distance ( R1.Road_Start_Point, R2.Road_End_Point ) < 0.001
     OR distance ( R1.Road_End_Point, R2.Road_Start_Point ) < 0.001
     OR distance ( R1.Road_End_Point, R2.Road_End_Point ) < 0.001 ) ;
CREATE TABLE Disconnected_Road_Outlier AS
SELECT DISTINCT R.*
FROM Road_Line_Table R, Disconnected_Road D
WHERE R.id = D.Disconnected_Road_id ;
Detecting Disconnected Roads: Spatial Query Performance
• Machine used - 1.4 GHz Athlon with 512 MB RAM
• Total execution time - 4.5 minutes
Statistical/Empirically Derived Outliers: Road Frequently Crossing River
[Figure: map of roads and rivers with outliers marked on Road 1, Road 2 and Road 3]
• Road frequently crossing river
• Visual inspection may not reveal outlier without further zooming
• May be indicative of positional error
• Threshold = 0.001 units
Road Frequently Crossing River: Magnified Views
[Figures: magnified views of the outliers on Road 1 (two outliers), Road 2 and Road 3; legend: river, road, bridge]
Detecting Road Frequently Crossing River: Empirical Technique Used
• Determine intersections of roads and rivers
• Identify pairs of intersection locations
• If the distance between the two locations of a pair is less than 0.001 units, the road is classified as an outlier
• Ensure that there is no bridge geometry feature between the two locations
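A rough Python sketch of this test follows. It makes several simplifying assumptions that are not in the slides: crossings are reduced to points, only crossings of the same road are paired, and "no bridge between the two locations" is approximated as "no bridge within the threshold of their midpoint". All names are hypothetical.

```python
from math import hypot

def frequent_crossings(crossings, bridges, threshold=0.001):
    """crossings: dict road id -> list of (x, y) road/river crossing points.
    bridges: list of (x, y) bridge locations.

    Returns ids of roads with two distinct crossings closer than `threshold`
    and no bridge near the midpoint of the two crossings.
    """
    def d(p, q):
        return hypot(p[0] - q[0], p[1] - q[1])

    flagged = set()
    for rid, pts in crossings.items():
        for i, p in enumerate(pts):
            for q in pts[i + 1:]:
                if not (0 < d(p, q) < threshold):
                    continue  # crossings coincide or are far apart
                mid = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)
                # A bridge explains the double crossing, so skip the pair.
                if all(d(mid, b) >= threshold for b in bridges):
                    flagged.add(rid)
    return flagged
```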
Detecting Road Frequently Crossing River: Spatial Query Fragment
CREATE VIEW Road_River_Cross_Geometry AS
SELECT T.id AS Road_Cross_RiverID,
       intersection ( T.the_geom, S.the_geom ) AS Road_Cross_River
FROM Road_Line_Table T, River_Area_Table S
WHERE intersects ( T.the_geom, S.the_geom ) = true ;
CREATE VIEW Roads_Crossing_River_Frequently AS
SELECT R1.Road_Cross_RiverID AS Road_Cross_River_OutlierID
FROM Road_River_Cross_Geometry R1, Road_River_Cross_Geometry R2
WHERE disjoint ( R1.Road_Cross_River, R2.Road_Cross_River )
  AND distance ( R1.Road_Cross_River, R2.Road_Cross_River ) < 0.001 ;
CREATE TABLE Road_Crossing_River_Outlier AS
SELECT DISTINCT T.*
FROM Road_Line_Table T, Roads_Crossing_River_Frequently R
WHERE T.id = R.Road_Cross_River_OutlierID ;
Detecting Road Frequently Crossing River: Spatial Query Performance
• Machine used - 1.4 GHz Athlon with 512 MB RAM
• Total execution time - 5 minutes
River Becoming Stream: Predicting Mislabeled Features
• Streams usually become rivers but rivers rarely become streams unless a lake is nearby
• River becoming a stream is a local spatial outlier
[Figure: map showing a river turning into a stream]
Detecting River Becoming Stream: Empirical Technique Used
• Determine intersections of rivers and streams
• If there are no lakes within 0.01 units of the intersection points, classify the river feature as an outlier
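The lake-proximity rule can be illustrated as below. The technique as stated does not encode flow direction, so this sketch assumes its input already lists only the points where a river turns into a stream; lakes are reduced to representative points. The names are hypothetical.

```python
from math import hypot

def river_to_stream_outliers(junctions, lakes, lake_threshold=0.01):
    """junctions: list of (river_id, (x, y)) where a river turns into a stream.
    lakes: list of (x, y) lake locations.

    Returns ids of rivers whose junction has no lake within `lake_threshold`.
    """
    outliers = set()
    for river_id, (x, y) in junctions:
        lake_nearby = any(
            hypot(x - lx, y - ly) < lake_threshold for lx, ly in lakes
        )
        if not lake_nearby:
            outliers.add(river_id)  # no lake explains the transition
    return outliers
```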
Feature Attribution via Collocation
• Motivation - improve feature attribution by prediction of missing features
• Approach - collocation patterns
• Collocation patterns detected
  • Crop land/rice fields: ends of roads/cart roads/rivers/streams
  • Road collocated with river/stream
Detecting Collocation Patterns: Algorithmic Basis
• The degree of collocation is quantified with an interest measure
• E.g., 96.5 % of the cropland features are close to a road/river
• The interest measure represents a conditional probability, i.e., given a cropland feature, the probability of finding a road or river nearby is 0.965
• Cropland not close to road/river may predict a missing road or river feature
• Cropland not close to road/river may also indicate positional error of cropland
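As a worked example of the interest measure, the short sketch below computes it as a percentage from two counts. The function name is hypothetical; the counts in the usage note are the ones reported in the deck's cropland table.

```python
def interest_measure(collocated, total):
    """Percentage of features of one type that have the other type nearby."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * collocated / total
```

With 192 collocated cropland features out of 199, `round(interest_measure(192, 199), 1)` gives 96.5, which corresponds to the conditional probability of 0.965 quoted above.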
Predicting Missing Features using Collocation Patterns
• Cropland collocated with river, stream or road
• May predict missing river, stream or road features
[Figure: map of river/stream, cropland and road features with non collocated cropland marked]
Spatial Outlier Detection using Collocation Patterns
• Cropland collocated with river, stream or road
• Cropland outlier may also predict positional error of cropland
[Figure: map of river/stream, cropland and road features with a cropland outlier marked]
Cropland/Road/River: Interest Measure
Collocation pattern                                  Collocated cropland   Interest measure (%)
Cropland with river                                   90                   46 %
Cropland with cartroad                                97                   55 %
Cropland with road                                   118                   60 %
Cropland with stream                                 137                   68 %
Cropland with road or cartroad or river or stream    192                   96.5 %
Interest measure (%) = collocated cropland / total cropland * 100
• Total number of cropland features = 199
• Distance threshold = 0.001
• 96.5 % of all cropland features collocated with road or river
Cropland/Road/River Collocation Pattern: Technique Used
Cropland pattern detected using collocation pattern detection techniques
• Step 1: Cropland areas collocated with cart road/road determined
• Step 2: Cropland areas collocated with stream/river determined
• Step 3: Cropland areas collocated with cart road/road or stream/river determined
Cropland outliers are cropland areas which are not collocated with any road, cartroad, stream or river feature
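Step 3 and the outlier definition amount to a set union followed by a set difference. The sketch below assumes the four collocation results are available as collections of cropland ids; the function and parameter names are hypothetical.

```python
def cropland_outliers(all_cropland, near_road, near_cartroad, near_stream, near_river):
    """Return ids of cropland features that appear in none of the collocation results."""
    # Step 3: cropland collocated with at least one road/cartroad/stream/river.
    collocated = set(near_road) | set(near_cartroad) | set(near_stream) | set(near_river)
    # Outliers: everything that is left over.
    return set(all_cropland) - collocated
```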
Cropland/Road/River Collocation Pattern: Spatial Query Fragment
CREATE TABLE Cropland_River_Collocate AS
SELECT C.* FROM River_Area_Table R, Veg_Area_Table C
WHERE ( C.f_code_des = 'Cropland' AND distance ( C.the_geom, R.the_geom ) < 0.01 )
   OR ( C.f_code_des = 'Rice Field' AND distance ( C.the_geom, R.the_geom ) < 0.01 ) ;
CREATE TABLE Cropland_Stream_Collocate AS
SELECT C.* FROM Stream_Line_Table R, Veg_Area_Table C
WHERE ( C.f_code_des = 'Cropland' AND distance ( C.the_geom, R.the_geom ) < 0.001 )
   OR ( C.f_code_des = 'Rice Field' AND distance ( C.the_geom, R.the_geom ) < 0.001 ) ;
CREATE TABLE Cropland_Road_Collocate AS
SELECT C.* FROM Road_Line_Table R, Veg_Area_Table C
WHERE ( C.f_code_des = 'Cropland' AND distance ( C.the_geom, R.the_geom ) < 0.001 )
   OR ( C.f_code_des = 'Rice Field' AND distance ( C.the_geom, R.the_geom ) < 0.001 ) ;
CREATE TABLE Cropland_Cartroad_Collocate AS
SELECT C.* FROM Cartroad_Line_Table R, Veg_Area_Table C
WHERE ( C.f_code_des = 'Cropland' AND distance ( C.the_geom, R.the_geom ) < 0.001 )
   OR ( C.f_code_des = 'Rice Field' AND distance ( C.the_geom, R.the_geom ) < 0.001 ) ;
Cropland/Road/River Collocation Pattern: Spatial Query Performance
Collocation pattern                                  Execution time (minutes)
Cropland with stream                                  6.3
Cropland with river                                   2.2
Cropland with cartroad                                1.8
Cropland with road                                    3.2
Cropland with road or cartroad or river or stream    13.5
• Machine used - 1.4 GHz Athlon with 512 MB RAM
• Total execution time - 13.5 minutes
Collocation Pattern: Roads with Rivers
[Figure: map of rivers/streams with collocated and non collocated roads marked]
• Road collocated with river/stream
• Could this pattern be used to predict anything?
• May predict missing streams
Road with River: Interest Measure
Collocation pattern               Collocated features   Interest measure (%)
Road with stream                  153 of 239            64 %
Road with river                    96 of 239            40 %
Road with stream or river         176 of 239            74 %
Cartroad with stream               97 of 136            71 %
Cartroad with river                44 of 136            32 %
Cartroad with stream or river     111 of 136            82 %
All roads with river or stream    287 of 375            77 %
Interest measure (%) = (collocated roads / total roads) * 100
• 375 road features
• Distance threshold = 0.001 units
• 77 % of all roads collocated with river or stream
Detecting Road River Collocation Pattern: Technique Used
Roads collocated with rivers determined using collocation pattern detection techniques.
• Step 1: Roads collocated with rivers determined.
• Step 2: Roads collocated with streams determined.
• Step 3: Cart roads collocated with rivers determined.
• Step 4: Cart roads collocated with streams determined.
Detecting Road River Collocation Pattern: Spatial Query Fragment
CREATE TABLE Road_River_Collocate AS
SELECT DISTINCT R.*
FROM River_Area_Table T, Road_Line_Table R
WHERE distance ( T.the_geom, R.the_geom ) < 0.001;
CREATE TABLE Road_Stream_Collocate AS
SELECT DISTINCT R.*
FROM Stream_Line_Table T, Road_Line_Table R
WHERE distance ( T.the_geom, R.the_geom ) < 0.001;
CREATE TABLE Cartroad_River_Collocate AS
SELECT DISTINCT R.*
FROM River_Area_Table T, Cartroad_Line_Table R
WHERE distance ( T.the_geom, R.the_geom ) < 0.001;
CREATE TABLE Cartroad_Stream_Collocate AS
SELECT DISTINCT R.*
FROM Stream_Line_Table T, Cartroad_Line_Table R
WHERE distance ( T.the_geom, R.the_geom ) < 0.001;
Detecting Road River Collocation Pattern: Spatial Query Performance
Collocation pattern               Execution time (minutes)
Road with stream                   5.2
Road with river                    3.1
Cartroad with stream               1.6
Cartroad with river                2.3
All roads with river or stream    12
• Machine used - 1.4 GHz Athlon with 512 MB RAM
• Total execution time - 12 minutes
Other Possible Predictive Patterns
Candidate patterns
• Predict crop type (rice) based on soil type (clay), soil wetness condition, slope, elevation and surface drain (river/stream)
• Predict land cover type (deciduous) based on soil type (clay), soil wetness condition, slope, elevation and surface drain (river/stream)
• Predict soil type based on slope, elevation, surface drain, vegetation/landcover
Phase I Schedule and Status
Task                                                     Months marked (of 9)   Complete
1. Requirements Driven Selection of Mining Algorithms    2                      100 %
2. Definition of TopoAssistant Software Architecture     4                      80 %
3. TopoAssistant rapid prototype                         4                      85 %
4. Detailed Design of TopoAssistant (Option)             3                      To be started
5. Program Management and Final Technical Report         9                      60 %
Phase I Results
• Demonstrated TopoAssistant feasibility
  • Implementation feasibility: built prototype
  • Concept feasibility: designed prototype evaluation methodology for TEC datasets
  • Concept feasibility: applied spatial data mining techniques for
    • Detection of errors: prediction of positional errors, prediction of extra/erroneous/missing features, prediction of mislabeled features
    • Feature attribution: prediction of missing features, prediction of erroneous/missing attribute values
• Identified technical challenges and Phase II approach for addressing them
Phase I Results Summary
• Concept feasibility of TopoAssistant established
• Implementation feasibility of TopoAssistant established
• Applied Phase I prototype on TEC’s Korea dataset for
  • Detection of errors
  • Feature attribution
Outline
• SBIR goal, motivation and innovations
• Phase I overview and results
• Phase I prototype demonstration
• Technical challenges
• Phase II technical approach
• Phase II work plan
• Summary
Phase I Prototype Demonstration
• PostGIS/Postgres description
• Convert Obstacles shapefile into SQL table
• Insert table into PostGIS TECKoreaDB database
• JDBC bridge demonstration - cart roads collocated with streams
• Convert resulting table into shapefile using pgsql2shp
Phase I Prototype Demonstration
Cropland collocated with road/river/cart road/stream demonstration
Road frequently crossing river demonstration
Phase I Summary
• Concept feasibility of TopoAssistant established
• Implementation feasibility of TopoAssistant established
• Applied Phase I prototype on TEC’s Korea dataset for
  • Detection of errors
  • Feature attribution
• Successful demonstration of Phase I prototype for
  • Detection of errors
  • Collocation patterns
Technical Challenges Identified
• Modeling spatial patterns, interest measures
  • Local spatial outliers, edge effect
  • Numeric attribute prediction (load class of bridge etc.)
  • Similarity measure for extended objects
• Scalability
  • Multi-way spatial join, top-k querying technique
• Automation of spatial data mining methods
  • Reduce user burden to enumerate patterns, specify thresholds
• Topographer’s GUI design
  • Present actionable information from data mining
  • Map comparison tools
• Data mining agent synthesis
  • Allow rapid construction of new mining agents
• Merging spatial datasets from disparate sources
Modeling Spatial Patterns: Edge Effect
• Cropland on map edges should not necessarily be classified as an outlier, since the feature it is collocated with may lie beyond the edge
• No concept of spatial edges in classical data mining
[Figure: map of river/stream, cropland and road features with a cropland outlier at the map edge]
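One possible way to account for the edge effect, offered only as an illustration and not as the approach chosen in the deck, is to set aside candidate outliers that lie within the distance threshold of the map boundary, because their collocated feature could sit just outside the map. The sketch assumes point locations and a rectangular map extent; all names are hypothetical.

```python
def split_by_edge(outliers, bounds, threshold=0.001):
    """outliers: dict feature id -> (x, y); bounds: (min_x, min_y, max_x, max_y).

    Returns (confirmed, undetermined): outliers well inside the map, and
    outliers so close to the border that a neighbour may lie off the map.
    """
    min_x, min_y, max_x, max_y = bounds
    confirmed, undetermined = set(), set()
    for fid, (x, y) in outliers.items():
        # Distance from the feature to the nearest map border.
        edge_gap = min(x - min_x, max_x - x, y - min_y, max_y - y)
        if edge_gap < threshold:
            undetermined.add(fid)
        else:
            confirmed.add(fid)
    return confirmed, undetermined
```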
Modeling Spatial Patterns: Local Spatial Outliers
• Streams usually become rivers but rivers rarely become streams unless a lake is nearby
• River becoming a stream is a local spatial outlier
• Can be determined based on spatial neighborhood
[Figure: map showing a river turning into a stream]
Modeling Spatial Patterns: Similarity Measures
Occasional vs. Persistent Collocations
[Figure: examples of occasional and persistent collocations]
Scalability: Performance
• Machine used - 1.4 GHz Athlon with 512 MB RAM
• Total execution time
  • Disconnected roads - 4.5 minutes
  • Road crossing river - 5 minutes
  • Cropland collocated with road/river - 13.5 minutes
  • Road river collocation - 12 minutes
• Performance satisfactory for TEC Korea dataset
• Faster machines can be used for even better performance
  • Bigger maps possible
  • Outlier tests could be more elaborate
Scalability: Performance
Spatial Join over More than 2 Tables
CREATE TABLE croplandcollocateroadriver AS
SELECT DISTINCT C.*
FROM trntrdlinetable T1, trntrnlinetable T, sdrareatable R1, sdrlinetable R, vegareatable C
WHERE (C.f_code_des = 'Cropland' AND distance(C.the_geom,R.the_geom) < 0.001) OR
      (C.f_code_des = 'Rice Field' AND distance(C.the_geom,R.the_geom) < 0.001) OR
      (C.f_code_des = 'Cropland' AND distance(C.the_geom,R1.the_geom) < 0.001) OR
      (C.f_code_des = 'Rice Field' AND distance(C.the_geom,R1.the_geom) < 0.001) OR
      (C.f_code_des = 'Cropland' AND distance(C.the_geom,T.the_geom) < 0.001) OR
      (C.f_code_des = 'Rice Field' AND distance(C.the_geom,T.the_geom) < 0.001) OR
      (C.f_code_des = 'Cropland' AND distance(C.the_geom,T1.the_geom) < 0.001) OR
      (C.f_code_des = 'Rice Field' AND distance(C.the_geom,T1.the_geom) < 0.001);
Query execution takes 24 hours
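A back-of-the-envelope sketch helps show why a five-table join is so much slower than separate two-table joins: without any index or pruning, the single join has to consider the product of all table sizes, whereas pairwise joins consider only their sum of products. The function name is hypothetical, and the river and stream counts in the usage note are made-up placeholders; only the cropland, road and cartroad counts come from the deck.

```python
from math import prod

def candidate_tuples(cropland, others):
    """Compare worst-case tuple counts for one multi-way join vs. pairwise joins.

    cropland: number of cropland features.
    others: list with the number of features in each other table.
    Returns (multi_way, pairwise).
    """
    multi_way = cropland * prod(others)          # one join over all tables
    pairwise = sum(cropland * n for n in others)  # one join per table, then union
    return multi_way, pairwise
```

For example, `candidate_tuples(199, [239, 136, 50, 80])` compares roughly 2.6e10 candidate tuples with about 1e5, where 50 and 80 stand in for the unknown river and stream counts.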
Automation of Spatial Data Mining Methods: Choosing Right Threshold Values
Collocation pattern               Collocated features   Interest measure (%)
Road with stream                  153 of 239            64 %
Road with river                    96 of 239            40 %
Road with stream or river         176 of 239            74 %
Cartroad with stream               97 of 136            71 %
Cartroad with river                44 of 136            32 %
Cartroad with stream or river     111 of 136            82 %
All roads with river or stream    287 of 375            77 %
Interest measure (%) = (collocated roads / total roads) * 100
• 77 % of all roads collocated with river or stream with distance threshold = 0.001
Automation of Spatial Data Mining Methods: Choosing Right Threshold Values (continued)
Collocation pattern               Collocated features   Interest measure (%)
Road with stream                  133 of 239            56 %
Road with river                   238 of 239            99 %
Road with stream or river         239 of 239            100 %
Cartroad with stream              130 of 136            96 %
Cartroad with river                79 of 136            58 %
Cartroad with stream or river     136 of 136            100 %
All roads with river or stream    375 of 375            100 %
Interest measure (%) = (collocated roads / total roads) * 100
• 100 % of all roads collocated with river or stream with distance threshold = 0.01!
Automation of Spatial Data Mining Methods
- Distance/other thresholds
  - e.g., distance threshold for road-river collocation
- Collocation patterns/enumerating over many tables
  - e.g., trying all possible triplets and pairs out of cropland and all other features
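The enumeration burden described here can be illustrated with a short sketch. It assumes a hypothetical list of layer names and a hypothetical helper `enumerate_candidates`; it only generates the candidate pairs and triplets that include an anchor layer and does not evaluate them.

```python
from itertools import combinations


def enumerate_candidates(anchor, others, max_size=3):
    """Yield every candidate pattern made of `anchor` plus 1..max_size-1 other layers."""
    for extra in range(1, max_size):
        for combo in combinations(sorted(others), extra):
            yield (anchor, *combo)


# Hypothetical layer names, used only for illustration.
layers = ["road", "cart road", "river", "stream"]
candidates = list(enumerate_candidates("cropland", layers))

for candidate in candidates:
    print(candidate)
```

The number of candidates grows combinatorially with the number of layers, which is why enumerating them by hand quickly becomes impractical.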
Topographer’s GUI Design
- Two layers drawn on top of each other obstruct the topographer’s view
  - E.g., road and river layers drawn on top of each other tend to obstruct each other
- Mock-up GUI specs
Data Mining Agent Synthesis
- Present methodology is very manual
- How can one facilitate quick and convenient synthesis?
Merging Spatial Datasets from Disparate Sources
- Disparate FACCs
  - E.g., Common Open Water in TEC Korea dataset
- How can we merge spatial datasets from disparate sources?
Technical Challenges Summary
Concept feasibility of TopoAssistant established
Implementation feasibility of TopoAssistant established
Successful demonstration of TopoAssistant Phase I prototype
Confident of addressing all identified technical challenges in follow-on Phase II
Outline
- SBIR goal, motivation and innovations
- Phase I overview and results
- Phase I prototype demonstration
- Technical challenges
- Phase II technical approach
- Phase II work plan
- Summary
Phase II Technical Approach
- Scalability challenge
- Automation of spatial data mining methods
- Topographer’s GUI design
- Modeling spatial patterns
- Data mining agent synthesis
- Merging spatial datasets
Approach to Scalability Challenge
- SDM level
  - Special algorithms, e.g. top-k, Apriori-like, MRF, SAR
- Geometry engine level
  - Geometry simplification, e.g. convex hull, spline
  - Algorithms, e.g. plane-sweep
- Database level
  - Filter and refine, e.g. MOBR
  - Spatial indexing, multi-way spatial join algorithm (spatial star join)
- System level
  - Reduce JDBC overhead: send multiple queries across at once
  - Rewrite the SQL, e.g., tables versus views
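The filter-and-refine idea at the database level can be sketched as follows. This is an illustration under stated assumptions, not the prototype's implementation: features are hypothetical dictionaries of vertex lists, the "exact" distance is simplified to the minimum vertex-to-vertex distance, and all function names are invented for the example. The cheap bounding-rectangle test discards most pairs before the expensive distance computation runs.

```python
from math import hypot


def mbr(points):
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax) of a vertex list."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def mbr_distance(a, b):
    """Lower bound on the distance between any point in box a and any point in box b."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return hypot(dx, dy)


def vertex_distance(p, q):
    """Simplified exact step: minimum distance between the vertices of p and q."""
    return min(hypot(px - qx, py - qy) for px, py in p for qx, qy in q)


def collocated_pairs(left, right, threshold):
    """Return (left_name, right_name) pairs closer than `threshold`."""
    right_boxes = [(name, pts, mbr(pts)) for name, pts in right.items()]
    result = []
    for lname, lpts in left.items():
        lbox = mbr(lpts)
        for rname, rpts, rbox in right_boxes:
            # Filter: the box distance never exceeds the true distance,
            # so pairs failing this test cannot qualify.
            if mbr_distance(lbox, rbox) >= threshold:
                continue
            # Refine: run the expensive test only on surviving pairs.
            if vertex_distance(lpts, rpts) < threshold:
                result.append((lname, rname))
    return result


# Hypothetical features, for illustration only.
crops = {"c1": [(0.0, 0.0), (1.0, 0.0)], "c2": [(10.0, 10.0), (11.0, 10.0)]}
rivers = {"r1": [(1.0, 0.0005), (2.0, 0.0005)]}
pairs = collocated_pairs(crops, rivers, threshold=0.001)
print(pairs)
```

The sketch still loops over every pair of boxes; a spatial index would additionally avoid visiting boxes that are far apart.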
Automation of Spatial Data Mining Methods
- Reduce user burden to specify thresholds
  - Sorting to help identify top-k outliers
  - Statistical analysis to derive thresholds
    - E.g. outside (mean +/- 3 * standard deviation)
- Reduce burden to enumerate candidate patterns
  - Enumeration agents, e.g. enumerate pairs, triplets, ...
    - Subsets of a given set of features/layers
  - Compute interest measure for each candidate
  - Report interesting candidates
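The statistical threshold mentioned above (values outside mean +/- 3 * standard deviation) can be sketched in a few lines. The function name `three_sigma_outliers`, the use of the sample standard deviation, and the sample values are assumptions made for illustration.

```python
from statistics import mean, stdev


def three_sigma_outliers(values, k=3.0):
    """Return the values lying outside mean +/- k * sample standard deviation."""
    m = mean(values)
    s = stdev(values)  # needs at least two values
    low, high = m - k * s, m + k * s
    return [v for v in values if v < low or v > high]


# Hypothetical attribute values: nineteen typical readings and one extreme one.
values = [10.0] * 19 + [100.0]
outliers = three_sigma_outliers(values)
print(outliers)
```

Because the threshold is derived from the data itself, the user does not have to supply a cut-off value; note that very small samples can never exceed three standard deviations, so the rule needs a reasonable number of features.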
Topographer’s GUI Design
- Workflow modeling
  - Identify common topography tasks
  - Identify dependence among topography tasks
  - Specify common workflows
- GUI design to facilitate common workflows
  - Interview based
  - Wizards to guide new users
Modeling Spatial Patterns
- Similarity measures for extended objects
  - Buffer based measure
    - intersection(object1, buffer(object2, d))
- Similarity measures incorporating non-geometric attributes
- Measures of neighborhood homogeneity, e.g. entropy
- Spatial edge effect
  - Feedback to the user, e.g. distance of pattern to the edge
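As one possible reading of "neighborhood homogeneity, e.g. entropy", the sketch below computes the Shannon entropy of the feature labels found in a neighborhood. The function name and the sample labels are hypothetical; the slide does not specify how the measure would actually be defined.

```python
from collections import Counter
from math import log2


def neighborhood_entropy(labels):
    """Shannon entropy (in bits) of the feature labels in one neighborhood."""
    n = len(labels)
    if n == 0:
        raise ValueError("neighborhood must contain at least one feature")
    counts = Counter(labels)
    # Sum of p * log2(1 / p) over the label frequencies p = c / n.
    return sum((c / n) * log2(n / c) for c in counts.values())


# Hypothetical neighborhoods, for illustration only.
uniform = ["cropland", "cropland", "cropland", "cropland"]
mixed = ["cropland", "rice field", "cropland", "rice field"]
print(neighborhood_entropy(uniform), neighborhood_entropy(mixed))
```

Under this definition a neighborhood containing a single feature type scores zero, and the score rises as the mix of types becomes more even.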
Data Mining Agent Synthesis
- Graphical/interactive or scripting language
  - Specify SDM process/workflow
  - Visualize and evaluate data/patterns
Merging Spatial Datasets
- Ontology mapping tools
- Design custom converter
- Universal ontology, e.g. OGIS, GML
Technical Approach Summary
Technical approach addresses all technical challenges identified
Confident of incorporating technical approach in full-scale Phase II prototype
Phase II Tasks
1. Refine TopoAssistant architecture
2. Build spatial data mining engine
3. Build Topographer’s GUI
4. Build predictive and map validation agents
5. System evaluation and refinement
Refine the Architecture
- New modules: spatial data mining engine, geometry engine, visualization engine for map comparison
- Reuse of COTS, public domain software
- Portable
- Extensible
- Interoperability with other tools used in MSDS generation process
Preliminary TopoAssistant System Architecture
[Figure: block diagram. Components: Dataset 1, Dataset 2, Dataset 3, Data Conversion and Import Tool, Ontology, Spatial Database, Exporter, MSDS, Topographer’s GUI, Data Enhancement/Tailoring Agents, Verification/Error Detection Agents. Arrow labels: Recommendations, Approvals, Approved Updates.]
Refined TopoAssistant System Architecture
[Figure: block diagram. Components: Dataset 1, Dataset 2, Dataset 3, Data Conversion and Import Tool, Ontology, Spatial Database, Exporter, MSDS, Topographer’s GUI, Multiple Map Visualization, Map Error Detection Agents, Map Feature Attribution Agents, Spatial Data Mining Engine, Geometry Engine. Arrow labels: Recommendations, Approvals, Approved Updates.]
- New components: geometry engine, spatial data mining engine and multiple map visualization
- Standard interfaces, plug-in components
Preliminary Implementation Architecture
[Figure: component diagram. Components: Database with C/JMTK Wrapper (ArcSDE); Mining Agents, GUI, and Import/Export Tool, each with C/JMTK (ArcObjects); Object-Based Software Bus (e.g., COM).]
- Preliminary approach
- Integration within C/JMTK framework
- Seamless interoperability with commercial applications
Refined Implementation Architecture
[Figure: component diagram. Components: Database with C/JMTK Wrapper (ArcSDE); Mining Agents, GUI, Import/Export Tool, and Spatial Data Mining Engine, each with C/JMTK (ArcObjects); Object-Based Software Bus (e.g., COM).]
- New spatial data mining engine component
- Integration within C/JMTK framework
- Seamless interoperability with commercial applications
Build Spatial Data Mining Engine
- Performance tuning
  - Identify bottlenecks
  - Address performance bottlenecks using new algorithms, data structures and architecture refinement
- Modeling spatial patterns
  - Extend spatial data mining algorithms with new interest measures, e.g. buffer based, entropy
- Build enumerative agents
- Build threshold-free methods
  - Sorting and statistical techniques
Topographer’s GUI Design
- User interface design
  - Requirement analysis
  - Develop use-case scenario
  - Mock-up GUI
  - Spiral development
- Build pattern evaluation tools to support
  - Visual inspection: allow map comparisons
  - Collaborative review
  - Statistical measures
  - Comparison with gold standard, e.g. image data
Build Predictive and Validation Agents
- Build outlier detection agents
  - Single layer outliers using statistical tests
  - Multi-layer outliers using collocation tests
  - Outliers via user defined tests
- Build predictive agents
  - Build collocation based predictive agents
  - Build attribute value prediction agents, e.g. via spatial autoregression or MRF Bayesian classifiers
System Evaluation and Refinement
- Obtain benchmark datasets from TEC
- Select test cases, e.g. visually identified prediction and validation scenarios (patterns)
- Compare results from agents with test cases
- Identify alpha testers from TEC
- Provide alpha prototype to TEC topographers (Yr 1)
- Iteratively refine prototype based on TEC feedback
Phase II Work Plan
Task (# marks each scheduled quarter, Q1 to Q8)
1. Refine TopoAssistant architecture: #
2. Build spatial data mining engine: # # # #
3. Build Topographer’s GUI: # # # #
4. Build predictive and map validation agents: # # # #
5. System evaluation and refinement: # # # # # #
6. Program management and technical reports: # # # # # # # #

Important milestones
- Deliver mock-up GUI to TEC at the end of quarter 2 (month 6)
- Deliver alpha software to TEC for testing purposes at the end of quarter 4 (Year 1)
- Refine system based on TEC testing/feedback and progressively deliver more mature versions at the end of quarters 5, 6, 7 and 8
Summary
- Established concept and implementation feasibility of TopoAssistant approach via Phase I results and Phase I prototype demonstration
- Identified technical challenges to be addressed in follow-on Phase II
- Developed technical approach to address technical challenges
- Developed Phase II task list and work plan
- Confident of incorporating our ideas in the full-scale Phase II prototype