DESCRIPTION
Stetl, Streaming ETL, is a toolkit for the transformation (ETL) of geospatial data. Stetl builds on existing ETL tools such as GDAL/OGR and XSLT. Stetl processing is driven by a configuration (.ini) file. Stetl is written in Python and is particularly suited to processing GML. Several INSPIRE transformations have been performed successfully with Stetl. This is an introductory presentation given at the OSGeo Bolsena Codesprint on June 4, 2013. Find more info, downloads and documentation on Stetl at http://stetl.org
Geospatial ETL with Stetl: “Taming Your Rich GML”
Just van den Broecke
OSGeo Bolsena Codesprint 2013, Bolsena, Italy
June 4, 2013
www.justobjects.nl
About Me
Independent Open Source Geospatial Professional
Secretary OSGeo Dutch Local Chapter
Member of the Dutch OpenGeoGroep
Just van den Broecke
just@justobjects.nl
www.justobjects.nl
OSGeo - Bolsena - 2010
BOLSENA 2012
ALLES VORBEI? (All over?)
We have a Problem
The Rich GML Problem
Rich GML = Complex Mess
INSPIRE, Dutch national datasets, AFIS-ALKIS-ATKIS
“Semi GML” e.g. Dutch Addresses & Buildings (BAG)
The Streetname!
Application Schema GML e.g. INSPIRE Addresses
Complex Model
Transformations
100+ MB GML files
Millions of objects
10s of millions of <elements>
Multiple transformation steps
Solution is Spatial ETL
A.K.A.
Thank You for your
Attention!
But what about... FOSS? ... Stetl?
FOSS ETL - Lower Level
Each Powerful by Itself
ogr2ogr
FOSS ETL - High Level
FOSS ETL - DIY ? (No!)
FOSS ETL - How to Combine?
[Figure: ogr2ogr + (tool logos) + (tool logos) = ?]
Example - 2011 INSPIRE-FOSS
http://inspire.kademo.nl/doc/design-etl.html
Good ideas, but hard to scale and reuse. Need a framework.
FOSS ETL - Add Python to Equation
[Figure: (ogr2ogr + tool logos) + Python = ?]
[Figure: (ogr2ogr + tool logos) + Python = Stetl]
Stetl = Simple Streaming Spatial Speedy ETL
Process Chain
[Diagram: Input → Filter(s) → Output, passing gml]
Stetl concepts
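The Input → Filter → Output chain can be sketched in a few lines of Python. This is an illustration only, not Stetl's actual code: the names run_chain, read_lines, to_upper and collect are hypothetical, and plain Python generators stand in for Stetl components.

```python
# Hypothetical sketch of an Input -> Filter -> Output chain (not Stetl's real API).

def read_lines(text):
    """Input: yield one record (here: one line) at a time."""
    for line in text.splitlines():
        yield line

def to_upper(records):
    """Filter: transform each record as it passes through."""
    for record in records:
        yield record.upper()

def collect(records):
    """Output: consume the stream; a real output would write to a file or database."""
    return list(records)

def run_chain(source, input_, filters, output):
    """Wire the components together: the input feeds the filters, the filters feed the output."""
    stream = input_(source)
    for f in filters:
        stream = f(stream)
    return output(stream)

result = run_chain("<a/>\n<b/>", read_lines, [to_upper], collect)
print(result)
```

Because every stage is a generator, a record reaches the output before the next one is read, which is the property the following slides call streaming.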
Speed: Streaming
[Diagram: Input → Filter → Output, with gml streaming through the chain]
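One way to stream a large GML file is to hand over each feature as soon as it has been parsed, instead of loading the whole document. The sketch below uses the standard library's iterparse; it is an assumption about how such streaming could be done, not Stetl's own implementation, and the function name stream_elements and the sample data are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

# Tiny stand-in for a 100+ MB GML file.
GML = b"""<gml:FeatureCollection xmlns:gml="http://www.opengis.net/gml">
  <gml:featureMember><city>Amsterdam</city></gml:featureMember>
  <gml:featureMember><city>Bolsena</city></gml:featureMember>
</gml:FeatureCollection>"""

def stream_elements(fileobj, local_name):
    """Yield each element with the given local tag name as soon as it is fully parsed."""
    for _event, elem in ET.iterparse(fileobj, events=("end",)):
        # Tags look like '{namespace}featureMember'; compare the local part only.
        if elem.tag.rsplit("}", 1)[-1] == local_name:
            yield elem
            elem.clear()  # release the element's children to limit memory use

cities = [member.findtext("city")
          for member in stream_elements(io.BytesIO(GML), "featureMember")]
print(cities)
```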
Speed: Going Native
[Diagram: Input → Filter → Output, passing gml]
ogr2ogr, sETL, sETL: calls to native C libs/programs
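"Going native" means handing the heavy lifting to an external program or C library. A minimal sketch of that idea with subprocess follows. The helper run_native is hypothetical, and because ogr2ogr may not be installed, the Python interpreter itself stands in for the native program here.

```python
import subprocess
import sys

def run_native(argv, data):
    """Pipe bytes into an external program and return what it writes to stdout."""
    completed = subprocess.run(argv, input=data, stdout=subprocess.PIPE, check=True)
    return completed.stdout

# Stand-in for a native tool such as ogr2ogr: a child process that upper-cases stdin.
child = [sys.executable, "-c",
         "import sys; sys.stdout.write(sys.stdin.read().upper())"]
out = run_native(child, b"<gml/>")
print(out)
```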
Example: GML to PostGIS
[Diagram: Reader → XML Splitter → ogr2ogr, passing gml]
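The last step of this chain, loading GML into PostGIS, comes down to one ogr2ogr call. The sketch below only builds the argument list, so it runs without GDAL/OGR installed. The function name, file path, database name and user are hypothetical; the -f option and the "PG:" connection string are standard ogr2ogr usage.

```python
import shlex

def ogr2ogr_to_postgis(gml_path, dbname, host="localhost", user=None):
    """Build the argument list for loading a GML file into PostGIS with ogr2ogr."""
    conn = "PG:dbname={} host={}".format(dbname, host)
    if user:
        conn += " user={}".format(user)
    # -f selects the output driver; the destination comes before the source.
    return ["ogr2ogr", "-f", "PostgreSQL", conn, gml_path]

cmd = ogr2ogr_to_postgis("input/top10nl.gml", "gisdb", user="gis")
print(" ".join(shlex.quote(part) for part in cmd))
# To execute (requires GDAL/OGR): subprocess.run(cmd, check=True)
```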
Example: INSPIRE Model Transform
[Diagram: ogr2ogr → XSLT → Writer, passing gml]
Example: deegree Store
[Diagram: ogr2ogr → XSLT → deegree Writer]
Process Chain - How?
Input Filters Output
Example: XML to Shape
The Source
Example: XML to Shape
The XSLT Script
Example: XML to Shape
XSLT Transform to GML
Example: XML to Shape
XML Input → XSLT Filter → ogr2ogr Output
Example: XML to Shape
The SETL Chain Config File
Process Chain: Reader → XSLT → ogr2ogr
Example: XsltFilter Python

    from util import Util, etree
    from filter import Filter
    from packet import FORMAT

    log = Util.get_log("xsltfilter")

    class XsltFilter(Filter):
        # Constructor
        def __init__(self, configdict, section):
            Filter.__init__(self, configdict, section, consumes=FORMAT.etree_doc, produces=FORMAT.etree_doc)

            self.xslt_file_path = self.cfg.get('script')
            self.xslt_file = open(self.xslt_file_path, 'r')
            # Parse XSLT file only once
            self.xslt_doc = etree.parse(self.xslt_file)
            self.xslt_obj = etree.XSLT(self.xslt_doc)
            self.xslt_file.close()

        def invoke(self, packet):
            if packet.data is None:
                return packet
            return self.transform(packet)

        def transform(self, packet):
            packet.data = self.xslt_obj(packet.data)
            log.info("XSLT Transform OK")
            return packet
Example Components
Input        Filters            Output
XMLFile      XSLT               GMLFile
ogr2gml      GMLSplitter        gml2ogr
LineStream   XMLValidator       WFS-T
deegree*     FeatureExtractor   deegree*
YourInput    YourFilter         YourOutput
Your Own Components

Step 1 - Define Class

    class MyFilter(Filter):
        # Constructor
        def __init__(self, configdict, section):
            Filter.__init__(self, configdict, section, consumes=FORMAT.etree_doc, produces=FORMAT.etree_doc)

        def invoke(self, packet):
            log.info("CALLING MyFilter OK!!!!")
            return packet

Step 2 - Config Class

    [etl]
    chains = input_xml_file|my_filter|output_std

    [input_xml_file]
    class = inputs.fileinput.XmlFileInput
    file_path = input/cities.xml

    # My custom component
    [my_filter]
    class = my.myfilter.MyFilter

    [output_std]
    class = outputs.standardoutput.StandardXmlOutput
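How a config like this could be turned into a chain of classes can be sketched with the standard library. The slides do not show how Stetl itself reads the file, so the helpers parse_chains and load_class are assumptions for illustration; only the config text is taken from the slide.

```python
import configparser
import importlib

CONFIG = """
[etl]
chains = input_xml_file|my_filter|output_std

[input_xml_file]
class = inputs.fileinput.XmlFileInput
file_path = input/cities.xml

[my_filter]
class = my.myfilter.MyFilter

[output_std]
class = outputs.standardoutput.StandardXmlOutput
"""

def parse_chains(cfg):
    """Return each chain as a list of section names, split on ',' and '|'."""
    chains = cfg.get("etl", "chains").split(",")
    return [[name.strip() for name in chain.split("|")] for chain in chains]

def load_class(dotted_name):
    """Import 'package.module.ClassName' and return the class object."""
    module_name, class_name = dotted_name.rsplit(".", 1)
    return getattr(importlib.import_module(module_name), class_name)

cfg = configparser.ConfigParser()
cfg.read_string(CONFIG)
chains = parse_chains(cfg)
classes = [cfg.get(section, "class") for section in chains[0]]
print(chains)
print(classes)
# load_class("my.myfilter.MyFilter") would import the custom filter if it were
# on the Python path; a stdlib class demonstrates the mechanism here.
print(load_class("collections.OrderedDict"))
```

Each section name in the chain doubles as the key for that component's settings, so adding a custom filter needs only a new section and a new name in the chain.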
Data Structures
✴ Components exchange Packets
✴ Packet contains data and status
✴ Data formats:
   xml_line_stream
   etree_doc
   etree_feature_array
   xml_doc_as_string
   any
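A packet carrying data plus status can be pictured as a small data class. The format names below come from the slide; the Packet fields, the end_of_stream flag and the compatibility rule are assumptions made for this sketch, not Stetl's actual definitions.

```python
from dataclasses import dataclass
from typing import Any

# Format names taken from the slide; everything else is illustrative only.
FORMATS = ("xml_line_stream", "etree_doc", "etree_feature_array",
           "xml_doc_as_string", "any")

@dataclass
class Packet:
    data: Any = None
    format: str = "any"
    end_of_stream: bool = False  # hypothetical status flag

def compatible(produces, consumes):
    """Assumed rule: formats must match, unless the consumer accepts 'any'."""
    return consumes == "any" or produces == consumes

packet = Packet(data="<city/>", format="xml_doc_as_string")
print(compatible(packet.format, "etree_doc"))          # False
print(compatible(packet.format, "xml_doc_as_string"))  # True
```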
deegree Integration
✴ Input
   DeegreeBlobstoreInput
✴ Output
   DeegreeBlobstoreOutput
   DeegreeFSLoaderOutput
   WFSTOutput
Cases
✴ INSPIRE Download Services
   publish to deegree store (WFS)
   GML files (for Atom Feed)
✴ National GML Datasets
   GML to PostGIS (Top10NL, BGT)
Top10NL Extract

    [etl]
    chains = input_sql_pre|schema_name_filter|output_postgres,
             input_big_gml_files|xml_assembler|transformer_xslt|output_ogr2ogr,
             input_sql_post|schema_name_filter|output_postgres

    # Pre SQL file inputs to be executed
    [input_sql_pre]
    class = inputs.fileinput.StringFileInput
    file_path = sql/drop-tables.sql,sql/create-schema.sql

    # Post SQL file inputs to be executed
    [input_sql_post]
    class = inputs.fileinput.StringFileInput
    file_path = sql/delete-duplicates.sql

    # Generic filter to substitute Python-format string values like {schema} in string
    [schema_name_filter]
    class = filters.stringfilter.StringSubstitutionFilter
    # format args {schema} is schema name
    format_args = schema:{schema}

    [output_postgres]
    class = outputs.dboutput.PostgresDbOutput
    database = {database}
    host = {host}
    port = {port}
    user = {user}
    password = {password}
    schema = {schema}

    # The source input file(s) from dir and produce gml:featureMember elements
    [input_big_gml_files]
    class = inputs.fileinput.XmlElementStreamerFileInput
    file_path = {gml_files}
    element_tags = featureMember
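The schema_name_filter step substitutes Python-format placeholders such as {schema} in the SQL text. A minimal sketch of that substitution follows. The helper names, the whitespace separator between multiple key:value pairs, the schema value and the SQL string are assumptions; the config above only shows the single pair schema:{schema}.

```python
def parse_format_args(spec):
    """Turn 'schema:top10nl host:localhost' into a dict (separator is an assumption)."""
    return dict(item.split(":", 1) for item in spec.split())

def substitute(text, format_args):
    """Replace {name} placeholders using Python's str.format."""
    return text.format(**format_args)

# Hypothetical SQL; the real sql/*.sql files are not shown in the slides.
sql = "DROP SCHEMA IF EXISTS {schema} CASCADE; CREATE SCHEMA {schema};"
args = parse_format_args("schema:top10nl")
print(substitute(sql, args))
```

Note that str.format treats every brace as a placeholder, so SQL containing literal braces would need them doubled.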
Case: INSPIRE DL Services - Dutch Addresses
[Diagram labels: Dutch Addresses + Buildings; Source <GML>; NLExtract; Stetl; deegree blobstore; deegree WFS; INSPIRE Addresses; Stetl; INSPIRE <GML>; Atom Feed]
Thank You !
stetl.org github.com/justb4/stetl
inspire-foss.org