Metadata, Provenance and Web Service for Spatial Analysis -- the case of spatial weights

Copyright © 2013 by Luc Anselin, All Rights Reserved

Luc Anselin, Sergio Rey, Wenwen Li

GeoDa Center, School of Geographical Sciences and Urban Planning, Arizona State University

•Some Specific Project Goals

• Integrate and sustain a core set of composable, interoperable, manageable, and reusable CyberGIS software elements based on community-driven and open source strategies

•Challenge

•most current spatial analysis/spatial econometrics software is written for a single CPU

•rethink and rewrite analytical, algorithmic and processing facilities to integrate into a cyberinfrastructure

•address lack of interoperability

•Spatial Econometrics Workbench

•framework for supporting spatial econometric research in a cyberscience era (Anselin and Rey, IJGIS 2012)

•Leverage PySAL and CyberGIS

•Support scientific workflow

•PySAL

•open source library of Python routines for spatial analysis: geocomputation, spatial weights, spatial autocorrelation, spatial econometrics, regionalization

•http://pysal.org

•hosted on github

•PySAL Progress Report

•current version is 1.6 (7th release)

•3.5 years of on-time bi-annual releases

•20,000+ downloads (10,000 in 2012)

•recognized in open source scientific community - Anaconda

Anaconda for big data analytics

•Migrating to CyberGIS

•performance = need for parallelization + refined algorithms

•interoperability = provide functionality as web services

•replicability = need for metadata and provenance tracking

•Example: Spatial Weights

•includes spatial data source, type of weights (e.g., contiguity, distance), any standardization or manipulation (e.g., higher order)
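
The ingredients listed above (a weights type, a standardization, a higher-order manipulation) can be illustrated with a minimal pure-Python sketch. It does not use PySAL; the five-unit layout and the function names `row_standardize` and `second_order` are assumptions made for illustration only.

```python
# Hypothetical example: five units in a line, 0 - 1 - 2 - 3 - 4,
# with first-order contiguity stored as neighbor lists.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}


def row_standardize(neighbors):
    """Give each unit's neighbors equal weights that sum to 1."""
    weights = {}
    for i, js in neighbors.items():
        weights[i] = {j: 1.0 / len(js) for j in js} if js else {}
    return weights


def second_order(neighbors):
    """Neighbors of neighbors, excluding the unit itself and its
    first-order neighbors."""
    result = {}
    for i, js in neighbors.items():
        reached = set()
        for j in js:
            reached.update(neighbors[j])
        result[i] = sorted(reached - set(js) - {i})
    return result


w = row_standardize(neighbors)
w2 = second_order(neighbors)
print(w[1])   # {0: 0.5, 2: 0.5}
print(w2[2])  # [0, 4]
```

Each of these choices changes the resulting weights, which is why a weights file alone does not say how it was produced.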

•Lack of Interoperability

•different implementations

•no standards

•duplication of efforts

•hinders interoperability and workflow chaining

•Example: Weights Formats in PySAL

Example: PySAL spreg

•what do we know about south_k6.gwt and south_ep_k20.kwt?

•Conceptual Framework

•separate data source from operations

•data source: polygon or coordinate files with standard metadata (projection, origin, etc.)

•operations: weights metadata
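
One way to picture the separation of data source and operations is a small metadata record kept as two parts and merged only when serialized. The sketch below is an assumption about how such a record could be handled; the field names (`input1`, `weight_type`, `transform`, `parameters`) and their values are the ones that appear in the GET request in this deck, while `build_metadata` is a hypothetical helper.

```python
import json

# Data source: where the geometry comes from.
data_source = {"type": "shp", "uri": "http://toae.org/pub/NAT.shp"}

# Operations: how the weights are derived from that source.
operations = {
    "weight_type": "rook",
    "transform": "O",
    "parameters": {"p": 2, "k": 4},
}


def build_metadata(source, ops):
    """Combine a data-source description and an operations description
    into one record (hypothetical helper, not part of PySAL)."""
    record = {"input1": dict(source)}
    record.update(ops)
    return record


wmd = build_metadata(data_source, operations)
text = json.dumps(wmd, sort_keys=True)
restored = json.loads(text)
print(restored["input1"]["uri"])
```

Because the record round-trips through JSON unchanged, the same description can be stored next to the weights file and used later to recreate it.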

weights vocabulary

weights metadata structure (wmd)

•Web service implementation (OGC WPS)

•wraps PySAL weights module

•(re)creates weights object from information in wmd file

•makes weights object available as a file
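
The parse-then-dispatch idea behind such a service could be sketched as follows. This is not the service's actual code: the builder functions are placeholders that only describe what would be built (a real implementation would call the PySAL weights module), and `"knn"` as a `weight_type` value is an assumption, since the deck only shows `"rook"`.

```python
import json


def build_rook(source, parameters):
    # Placeholder: a real service would call PySAL on the data source.
    return {"kind": "rook", "source": source["uri"]}


def build_knn(source, parameters):
    # Placeholder; "knn" as a weight_type value is an assumption.
    return {"kind": "knn", "k": parameters.get("k", 4), "source": source["uri"]}


# Hypothetical dispatch table: weight_type -> builder function.
BUILDERS = {"rook": build_rook, "knn": build_knn}


def parse_wmd(text):
    """Parse wmd JSON text and check the fields the dispatcher needs."""
    wmd = json.loads(text)
    for key in ("input1", "weight_type"):
        if key not in wmd:
            raise ValueError("missing required field: " + key)
    return wmd


def dispatch(wmd):
    """Pick the builder named by weight_type and run it."""
    try:
        builder = BUILDERS[wmd["weight_type"]]
    except KeyError:
        raise ValueError("unsupported weight_type: " + str(wmd["weight_type"])) from None
    return builder(wmd["input1"], wmd.get("parameters", {}))


request = '{"input1": {"type": "shp", "uri": "http://toae.org/pub/NAT.shp"}, "weight_type": "rook"}'
result = dispatch(parse_wmd(request))
print(result["kind"])  # rook
```

Keeping the builders in a table means a new weights type only needs a new entry, not a change to the parser.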

Workflow

[Diagram components: wmd file (json), Weights Parser, Dispatcher, PySAL, Output, Weights, Metadata]

Illustration

•Generate Weights from Shapefile

•NAT.shp available on server

•output format = gal
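
For readers unfamiliar with the gal format, the sketch below writes and reads a simple GAL-style text file: a header with the number of observations, then for each observation one line with its id and neighbor count followed by one line listing the neighbor ids. It is a minimal illustration, not PySAL's reader or writer; integer ids and the function names `write_gal` and `read_gal` are assumptions.

```python
import io


def write_gal(neighbors, stream):
    """Write neighbor lists in a simple GAL-style layout."""
    stream.write("{}\n".format(len(neighbors)))
    for i in sorted(neighbors):
        js = neighbors[i]
        stream.write("{} {}\n".format(i, len(js)))
        stream.write(" ".join(str(j) for j in js) + "\n")


def read_gal(stream):
    """Read the layout written by write_gal (integer ids assumed)."""
    header = stream.readline().split()
    # Some GAL files use a longer header ("0 n name idvar"); take n from
    # the second token in that case, otherwise from the only token.
    n = int(header[1]) if len(header) > 1 else int(header[0])
    neighbors = {}
    for _ in range(n):
        i, _count = stream.readline().split()
        neighbors[int(i)] = [int(j) for j in stream.readline().split()]
    return neighbors


# Hypothetical 2 x 2 grid with rook contiguity.
grid = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
buffer = io.StringIO()
write_gal(grid, buffer)
buffer.seek(0)
round_trip = read_gal(buffer)
print(round_trip == grid)  # True
```

Note that nothing in this layout records the weights type or the data source, which is the gap the metadata file is meant to fill.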

•Get Request

• http://spatial.gdta.asu.edu/cgi-bin/wps.cgi?request=Execute&service=WPS&version=1.0.0&identifier=weights_ws&status=false&datainputs=[outputformat=gal;metadata={"input1":{"type":"shp","uri":"http://toae.org/pub/NAT.shp"},"weight_type":"rook","transform":"O", "parameters":{"p":2,"k":4}}]

metadata input
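
A request like the one above can be assembled programmatically rather than typed by hand. The sketch below only builds the URL and does not contact the server; the endpoint, query parameters and metadata fields are copied from the request shown, while `build_execute_url` is a hypothetical helper and percent-encoding the `datainputs` value is an assumption about what the server accepts.

```python
import json
from urllib.parse import urlencode, urlparse, parse_qs

ENDPOINT = "http://spatial.gdta.asu.edu/cgi-bin/wps.cgi"

metadata = {
    "input1": {"type": "shp", "uri": "http://toae.org/pub/NAT.shp"},
    "weight_type": "rook",
    "transform": "O",
    "parameters": {"p": 2, "k": 4},
}


def build_execute_url(endpoint, metadata, output_format="gal"):
    """Build a WPS Execute URL carrying the metadata as a JSON string."""
    datainputs = "[outputformat={};metadata={}]".format(
        output_format, json.dumps(metadata, separators=(",", ":"))
    )
    query = urlencode({
        "request": "Execute",
        "service": "WPS",
        "version": "1.0.0",
        "identifier": "weights_ws",
        "status": "false",
        "datainputs": datainputs,  # percent-encoded by urlencode
    })
    return endpoint + "?" + query


url = build_execute_url(ENDPOINT, metadata)

# Decode the query again to check that the metadata survives intact.
decoded = parse_qs(urlparse(url).query)
print(decoded["identifier"])
```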

Server Response

Sample gal output

http://spatial.gdta.asu.edu/wpsoutput/e66df128-14ed-11e3-bde9-0050455c0671.gal

metadata (wmd) file

http://spatial.gdta.asu.edu/wpsoutput/e66df128-14ed-11e3-bde9-0050455c0671.wmd

Performance Evaluation

•How does PySAL scale when the amount of input data increases?

•Is the overhead of web service framework acceptable?

•How does the web service framework scale in handling massive concurrent requests?

Scale-up vs. Scale-out solution

•Scale-up

•High-end computer

•Configuration

• Processor: 2 x 2.93 GHz Quad-Core Intel Xeon

• Memory: 16 GB 1066 MHz DDR3 ECC

• Software: Mac OS X Lion 10.7.4 (11E53)

•Scale-out:

•Web server cluster

Web Server Cluster

•Performance

•experiment using grid layout for N = 10,000 to N = 100,000

•rook contiguity and k nearest neighbors (k = 4)

•input shape files on server in Utah, web service on server at ASU

•Experiment 1

•Timing: average over 5 experiments

•web server overhead, data transfer and computation

•explore effect of data size
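
The computation part of such a timing experiment can be approximated locally. The sketch below builds rook neighbors for a regular grid in pure Python and averages the run time over 5 repetitions; it measures computation only (no web server overhead or data transfer), it is not PySAL's algorithm, and the grid sizes and function names are assumptions chosen to stay within the N range mentioned above.

```python
import time


def rook_grid(nrows, ncols):
    """Rook neighbors (shared edges) for cells of a regular grid,
    numbered row by row starting at 0."""
    neighbors = {}
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            js = []
            if r > 0:
                js.append(i - ncols)   # cell above
            if r < nrows - 1:
                js.append(i + ncols)   # cell below
            if c > 0:
                js.append(i - 1)       # cell to the left
            if c < ncols - 1:
                js.append(i + 1)       # cell to the right
            neighbors[i] = js
    return neighbors


def average_time(func, args, repeats=5):
    """Average wall-clock time of func(*args) over several runs."""
    total = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        total += time.perf_counter() - start
    return total / repeats


for side in (100, 200, 300):  # N = 10,000, 40,000 and 90,000 cells
    seconds = average_time(rook_grid, (side, side))
    print(side * side, round(seconds, 4))
```

Timings depend on the machine, so the printed numbers are only meaningful relative to each other.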

time for rook and KNN contiguity

•Experiment 2

•Scalability of web service framework

•High-end computer (8 cores)

•Cluster (4 computing nodes, 2 cores each)

•Total processing time

•Speed up
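
The deck does not define how speed-up is computed. A common convention, assumed here, is the ratio of a baseline processing time to the processing time with more workers, with efficiency as speed-up divided by the number of workers. The timings in the sketch are made-up placeholders, not results from the experiment.

```python
def speed_up(baseline_seconds, parallel_seconds):
    """Ratio of baseline time to parallel time (assumed definition)."""
    return baseline_seconds / parallel_seconds


def efficiency(baseline_seconds, parallel_seconds, workers):
    """Speed-up per worker; 1.0 would mean perfect scaling."""
    return speed_up(baseline_seconds, parallel_seconds) / workers


# Hypothetical timings in seconds, for illustration only.
baseline = 80.0
with_eight_workers = 16.0

s = speed_up(baseline, with_eight_workers)
e = efficiency(baseline, with_eight_workers, workers=8)
print(s, e)  # 5.0 0.625
```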

Total processing time

Speed-up

•Experiment 3

•Scalability of the cluster by adding more computing nodes

•Average response time

•128 concurrent requests

•Dataset: 10,000 polygons
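
The shape of this kind of measurement can be imitated without a server. In the sketch below, 128 simulated requests are submitted at once to a thread pool whose size stands in for the number of computing nodes; each request just sleeps for a fixed service time. The stand-in request, the service time and the function names are all assumptions, so the numbers say nothing about the actual cluster.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_request(service_time):
    """Stand-in for one web service call; it only sleeps."""
    time.sleep(service_time)


def average_response_time(n_requests, n_workers, service_time=0.005):
    """Average time from submission of the batch until each simulated
    request completes, including time spent waiting for a free worker."""
    submitted = time.perf_counter()

    def timed(_):
        fake_request(service_time)
        return time.perf_counter() - submitted

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        times = list(pool.map(timed, range(n_requests)))
    return sum(times) / len(times)


for workers in (2, 4, 8):
    print(workers, round(average_response_time(128, workers), 3))
```

With a fixed batch of concurrent requests, adding workers shortens the queue each request waits in, which is the effect the average response time is meant to capture.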

Scalability - cluster

Next Steps

•Towards a Standard

•refine specification: flexible, expandable, deal with edge cases

•improve performance (parallelization)

•implement seek operations on distributed files

•interoperability with other software

Thank you!
