Strabon: A Semantic Geospatial Database System

We present a new version of the data model stRDF and the query language stSPARQL for the representation and querying of geospatial data. The new versions of stRDF and stSPARQL use OGC standards to represent geometries where the original version of stSPARQL used linear constraints. In this sense stSPARQL is a subset of the recent standard GeoSPARQL proposed by OGC. We discuss the implementation of the system Strabon which is a storage and query evaluation module for stRDF/stSPARQL and the corresponding subset of GeoSPARQL. We study the performance of Strabon experimentally and show that it scales to very large data volumes.
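As a rough illustration of the WKT-based representation described above, the sketch below treats a spatial literal as a WKT string plus a datatype name and computes a planar area, in the spirit of a metric function such as strdf:area. It is a toy under stated assumptions: the class SpatialLiteral and the helpers parse_polygon and planar_area are hypothetical names, only single-ring POLYGON text is handled, and coordinate reference systems are ignored. According to the slides, Strabon pushes such functions down to the underlying DBMS rather than evaluating them like this.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SpatialLiteral:
    """Hypothetical stand-in for an stRDF spatial literal."""
    lexical: str    # WKT text, e.g. "POLYGON((0 0, 4 0, 4 3, 0 3, 0 0))"
    datatype: str   # datatype name as written on the slides, e.g. "strdf:WKT"


def parse_polygon(literal):
    """Return the ring of a single-ring WKT POLYGON as a list of (x, y) pairs."""
    if literal.datatype != "strdf:WKT":
        raise ValueError("only WKT literals are handled in this sketch")
    match = re.fullmatch(r"\s*POLYGON\s*\(\(([^()]*)\)\)\s*", literal.lexical)
    if match is None:
        raise ValueError("expected a single-ring POLYGON")
    ring = []
    for pair in match.group(1).split(","):
        x, y = pair.split()  # raises ValueError unless exactly two numbers
        ring.append((float(x), float(y)))
    if len(ring) < 4 or ring[0] != ring[-1]:
        raise ValueError("ring must be closed and have at least four points")
    return ring


def planar_area(ring):
    """Shoelace formula; assumes planar coordinates and a simple ring."""
    twice_area = sum(x1 * y2 - x2 * y1
                     for (x1, y1), (x2, y2) in zip(ring, ring[1:]))
    return abs(twice_area) / 2.0


rectangle = SpatialLiteral("POLYGON((0 0, 4 0, 4 3, 0 3, 0 0))", "strdf:WKT")
print(planar_area(parse_polygon(rectangle)))  # 4 x 3 rectangle -> 12.0
```

Keeping the lexical form and the datatype together mirrors how a typed RDF literal is written; a GML literal would need its own parser.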


  • 1. Strabon: A Semantic Geospatial DBMS. Kostis Kyzirakos, Manos Karpathiotakis, and Manolis Koubarakis. Dept. of Informatics and Telecommunications, National and Kapodistrian University of Athens, Greece

2. Outline
  • Introduction
  • The data model stRDF
  • The query language stSPARQL and a comparison to GeoSPARQL
  • The system Strabon for stSPARQL and GeoSPARQL
  • Experimental Evaluation
  • Related Work & Conclusions

3. Main idea
How do we represent and query geospatial information in the Semantic Web?
  • Develop appropriate vocabularies and ontologies
  • Extend RDF to take into account the geospatial dimension
  • Extend SPARQL to query the new kinds of data
  • Use Open Geospatial Consortium (OGC) and other geospatial industry standards

4. Example

5. National Observatory of Athens: Fire Products Ontology

6. Outline (current section: The data model stRDF)

7. The Data Model stRDF
stRDF extends RDF with:
  • Spatial literals encoded by Boolean combinations of linear constraints
  • New datatype for spatial literals (strdf:geometry)
  • Valid time of triples encoded by Boolean combinations of temporal constraints
stRDF (most recent version):
  • Spatial literals encoded in Well-Known Text/GML (OGC standards)
  • Valid time of triples ignored for the time being

8. Burnt Area Products
    @prefix strdf:.
    @prefix noa:.
    noa:ba_15 rdf:type noa:BurntArea ;
        noa:hasGeometry noa:ba_15g .
    noa:ba_15g noa:hasSerialization
        "MULTIPOLYGON(((393801.42 4198827.92, ..., 393008 424131)));"^^strdf:WKT .
  • Callouts: the quoted MULTIPOLYGON string is the spatial literal; strdf:WKT is the spatial data type.

9. The stRDF Data Model
    strdf:geometry rdf:type rdfs:Datatype ;
        rdfs:subClassOf rdfs:Literal .
    strdf:WKT rdf:type rdfs:Datatype ;
        rdfs:subClassOf strdf:geometry .
    strdf:GML rdf:type rdfs:Datatype ;
        rdfs:subClassOf strdf:geometry .

10. Outline (current section: The query language stSPARQL and a comparison to GeoSPARQL)

11.
stSPARQL: Geospatial SPARQL 1.1
We define a SPARQL extension function for each function defined in the OpenGIS Simple Features Access standard.
  • Basic functions
      • Get a property of a geometry (e.g., strdf:srid)
      • Get the desired representation of a geometry (e.g., strdf:AsText)
      • Test whether a certain condition holds (e.g., strdf:IsEmpty, strdf:IsSimple)
  • Functions for testing topological spatial relationships (e.g., strdf:equals, strdf:intersects)
      • OGC Simple Features Access, Egenhofer, RCC-8
  • Spatial analysis functions
      • Construct new geometric objects from existing geometric objects (e.g., strdf:buffer, strdf:intersection, strdf:convexHull)
      • Spatial metric functions (e.g., strdf:distance, strdf:area)
  • Spatial aggregate functions (e.g., strdf:union, strdf:extent)

12. stSPARQL: Geospatial SPARQL 1.1
  • Select clause
      • Construction of new geometries (e.g., strdf:buffer(?geo, 0.1))
      • Spatial aggregate functions (e.g., strdf:union(?geo))
      • Metric functions (e.g., strdf:area(?geo))
  • Filter clause
      • Functions for testing topological relationships between spatial terms (e.g., strdf:contains(?G1, strdf:union(?G2, ?G3)))
      • Numeric expressions involving spatial metric functions (e.g., comparing strdf:area(?G1) with 2*strdf:area(?G2))
      • Boolean combinations
  • Having clause
      • Boolean expressions involving spatial aggregate functions and spatial metric functions or functions testing for topological relationships between spatial terms (e.g., strdf:area(strdf:union(?geo))>1)
  • Updates

13. stSPARQL: An example (1/2)
Find coniferous forests that have been affected by fires.
    SELECT ?forest ?burntArea
    WHERE {
      ?burntArea rdf:type noa:BurntArea ;
                 noa:hasGeometry [ noa:hasSerialization ?baGeom ] .
      ?forest rdf:type noa:Region ;
              clc:hasLandCover noa:coniferousForest ;
              clc:hasGeometry [ clc:hasSerialization ?fGeom ] .
      FILTER(strdf:intersects(?baGeom, ?fGeom))
    }
  • Callout: strdf:intersects is a spatial function.

14. stSPARQL: An example (2/2)
Isolate the parts of the burnt areas that lie in coniferous forests.
    SELECT ?burntArea
           (strdf:intersection(?baGeom, strdf:union(?fGeom)) AS ?burntForest)
    WHERE {
      ?burntArea rdf:type noa:BurntArea ;
                 noa:hasGeometry [ noa:hasSerialization ?baGeom ] .
      ?forest rdf:type noa:Region ;
              clc:hasLandCover noa:coniferousForest ;
              clc:hasGeometry [ clc:hasSerialization ?fGeom ] .
      FILTER(strdf:intersects(?baGeom, ?fGeom))
    }
    GROUP BY ?burntArea ?baGeom
  • Callouts: strdf:union is a spatial aggregate; strdf:intersects is a spatial function.

15. The OGC Standard GeoSPARQL
  • Core
  • Topology Vocabulary Extension (parameter: relation family)
  • Geometry Extension (parameters: serialization, version)
  • Geometry Topology Extension (parameters: serialization, version, relation family)
  • Query Rewrite Extension (parameters: serialization, version, relation family)
  • RDFS Entailment Extension (parameters: serialization, version, relation family)
  • Serializations: WKT, GML
  • Relation families: Simple Features, RCC-8, Egenhofer

16. Outline (current section: The system Strabon for stSPARQL and GeoSPARQL)

17. Strabon Architecture
[Architecture diagram. Inputs: stRDF graphs and stSPARQL/GeoSPARQL queries. Strabon sits on Sesame v2.6.3 and consists of a Query Engine (Parser, Optimizer, Evaluator, Transaction Manager) and a Storage Manager (Repository, SAIL, RDBMS). Other labels: WKT, GML, GeneralDB, PostGIS.]

18. Storage Scheme
[Figure: how the example triples below are stored. Labelled tables: Triples, Dictionary, per-predicate tables (type_2, hasgeom_4, hasserial_7) with SUBJECT and OBJECT columns, and the value tables uri_values, label_values, datatype_values, and geo_values with ID and VALUE columns.]
    noa:ba_15 rdf:type noa:BurntArea .
    noa:ba_15 noa:hasGeometry noa:ba_15g .
    noa:ba_15g rdf:type noa:Geometry .
    noa:ba_15g noa:hasSerialization "MULTIPOLYGON(...)"^^strdf:WKT .

19. Query Processing
[Diagram: an stSPARQL/GeoSPARQL query passes through the Strabon Query Engine (Parser, Optimizer, Evaluator, Transaction Manager), then GeneralDB, then PostGIS.]
  • Parser generates abstract syntax tree
  • Abstract syntax tree mapped to internal algebra of Sesame
  • Standard optimizations performed
  • Evaluator produces corresponding SQL query
  • DBMS evaluates the SQL query
  • Post-processing

20. Query Processing (contd)
  • Deviate from the evaluation strategy of Sesame for SPARQL extension functions
      • Push the evaluation of extension functions to the underlying DBMS
      • Spatial predicates evaluated by PostGIS
      • Spatial joins now affect the query plan
      • Avoid Cartesian products
  • Results may be returned in well-known industry formats
      • KML/KMZ
      • GeoJSON
      • GML

21. Outline (current section: Experimental Evaluation)

22. Experimental Evaluation
  • Goal: evaluate the performance of Strabon vs other systems
  • Real workload based on geospatial linked datasets: 150 million triples
  • Synthetic workload: half a billion triples

23. Strabon vs other systems
  • Strabon over PostgreSQL (Strabon-PG)
  • Strabon over System X (Strabon-X)
  • Implementation over RDF-3X (Brodt et al.)
  • Parliament (BBN Technologies)
  • Naive implementation over Sesame

24.
Real world workload: Data
                        DBpedia      GeoNames     LGD          Pachube   SwissEx   CLC          GADM
Size                    7.1 GB       2.1 GB       6.6 GB       828 KB    33 MB     14 GB        146 MB
Triples                 58,722,893   17,688,602   46,296,978   6,333     277,919   19,711,926   255
Spatial terms           386,205      1,262,356    5,414,032    101       687       2,190,214    51
Distinct spatial terms  375,087      1,099,964    5,035,981    70        623       2,190,214    51
Points                  375,087      1,099,964    3,205,015    70        623       -            -
Linestrings             -            -            353,714      -         -         -            -
Polygons                -            -            1,704,650    -         -         2,190,214    51
Slide annotation: 1 - 79,831 vertices

25. Real world workload: Queries
[Two tables. The first classifies Query 1 to Query 8 under the columns Commonly Used, Spatial Selection, and Spatial Join. The second, "Queries with Spatial Joins", classifies Query 5 to Query 8 under the columns Points, Lines, and Polygons. The column positions of the individual marks did not survive in this transcript.]

26. Real world workload: Results
Cache state   System       Q1     Q2     Q3       Q4       Q5       Q6       Q7     Q8
Cold (sec.)   Naive        0.08   1.65   >8h      28.88    89       170      844    1.699
              Strabon-PG   2.01   6.79   41.39    10.11    78.69    60.25    9.23   702.55
              Strabon-X    1.74   3.05   1623     46.52    12.57    2409     >8h    57.83
              Parliament   2.12   6.46   >8h      229.72   1130     872      3627   3786
Warm (sec.)   Naive        0.01   0.03   >8h      0.79     43.07    88       708    1712
              Strabon-PG   0.01   0.81   0.96     1.66     38.74    1.22     2.92   648.1
              Strabon-X    0.01   0.26   1604.9   35.59    0.18     3196     >8h    44.72
              Parliament   0.01   0.04   >8h      10.91    358.92   483.29   2771   3502

27. Synthetic Workload
  • Workload based on a synthetic dataset
      • Dataset based on OpenStreetMap
      • 10 million triples (2 GB) up to half a billion triples (50 GB)
      • Implemented custom bulk loader
  • Triples with spatial literals: 1 up to 46 million triples
  • Response time of queries with various thematic and spatial selectivities

28. Synthetic Workload: Sample Data

29. Synthetic Workload: Sample stSPARQL Query
    SELECT *
    WHERE {
      ?tag geordf:key "1" .
      ?node geordf:hasTag ?tag .
      ?node geo:hasGeography1 ?geo .
      FILTER (strdf:inside(?geo, "POLYGON((-1 -1, 0.056568542 -1, 0.056568542 0.056568542, -1 0.056568542, -1 -1))"^^strdf:WKT))
    }

30. Real-world Workload: 100 million triples, warm caches
[Figure: two plots, Tag 1 and Tag 1024. x-axis: number of nodes in query region; y-axis: response time (sec).]

31.
Real-world Workload: 100 million triples, cold caches
[Figure: two plots, Tag 1 and Tag 1024. x-axis: number of nodes in query region.]

32. Strabon-PG: 500 million triples
[Figure: tags comparison. y-axis: response time (sec).]

33. Real-world Workload: Query Plans

34. Findings
  • Strabon over PostgreSQL outperforms the other systems with warm caches
  • Results with cold caches are mixed
  • The PostgreSQL optimizer needs to take spatial selectivity into account
      • PostGIS 2.0 moves towards it

35.
System                  Language              Index                            Geometries            CRS support   Geospatial function support
Strabon                 stSPARQL/GeoSPARQL*   R-tree-over-GiST                 WKT / GML support     Yes           OGC-SFA, Egenhofer, RCC-8
Parliament              GeoSPARQL*            R-Tree                           WKT / GML support     Yes           OGC-SFA, Egenhofer, RCC-8
Oracle 12c              GeoSPARQL             R-Tree, Quadtree                 WKT                   Yes           OGC-SFA
Brodt et al. (RDF-3X)   SPARQL                R-Tree                           WKT support           No            OGC-SFA
Perry                   SPARQL-ST             R-Tree                           GeoRSS GML            Yes           RCC-8
AllegroGraph            Extended SPARQL       Distribution sweeping technique  2D point geometries   Partial       Buffer, Bounding Box, Distance
OWLIM                   Extended SPARQL       Custom                           2D point geometries   No            Point-in-polygon, Buffer, Distance
Virtuoso                SPARQL                R-Tree                           2D point geometries   Yes           SQL/MM (subset)
uSeekM                  SPARQL                R-tree-over-GiST                 WKT support           No            OGC-SFA

36. Outline (current section: Related Work & Conclusions)

37. Future Work
  • Use even larger datasets
      • A test with 1 billion triples was successful
  • Implement the temporal part of stSPARQL
  • Develop a benchmark for geospatial RDF stores
  • Consider more systems
      • GeoSPARQL implementation of Oracle
      • Virtuoso
  • stSPARQL query processing in MonetDB
  • Go distributed!
      • Federated queries

38. Thanks! Any Questions?
Strabon: Manolis Koubarakis, Kostis Kyzirakos, Manos Karpathiotakis, Charalampos Nikolaou, Giorgos Garbis, Konstantina Bereta, Kallirroi Dogani, Stella Giannakopoulou and Panayiotis Smeros.
  • Web site:
  • Mercurial repository:
  • Trac:
  • Mailing list:
  • Real Time Fire Monitoring Service, National Observatory of Athens
  • Greek Linked Open Data
  • TELEIOS EU Project
  • SemSorGrid4Env EU Project
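
The sample query on slide 29 selects nodes whose geometry lies inside a rectangular query region. A minimal sketch of that kind of spatial selection, assuming point geometries and a simple polygon, is a ray-casting point-in-polygon test. The node names and coordinates below are invented for illustration, the function names are hypothetical, and points lying exactly on the boundary get no special treatment. In Strabon the predicate is evaluated by PostGIS, not by code like this.

```python
# Query region copied from the WKT literal in the sample query on slide 29.
REGION = [(-1.0, -1.0), (0.056568542, -1.0), (0.056568542, 0.056568542),
          (-1.0, 0.056568542), (-1.0, -1.0)]


def point_in_ring(point, ring):
    """Ray casting: toggle on each edge crossed by a ray running in +x from point."""
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > y) != (y2 > y):  # the edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def select_inside(nodes, ring):
    """Return the names of the nodes whose point lies inside the ring, sorted."""
    return sorted(name for name, pt in nodes.items() if point_in_ring(pt, ring))


# Invented sample nodes: two inside the region, one outside.
nodes = {"node_a": (0.0, 0.0), "node_b": (0.5, 0.5), "node_c": (-0.5, 0.01)}
print(select_inside(nodes, REGION))  # ['node_a', 'node_c']
```

Scanning every node like this is linear in the number of geometries; the comparison table on slide 35 lists the spatial indexes (for example R-tree-over-GiST) that the surveyed systems use to avoid such scans.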

