MANAGING MASSIVE AMOUNTS OF SPATIO-TEMPORAL DATA USING

Anita Graser

Center for Mobility Systems, AIT Austrian Institute of Technology

ABOUT

Anita Graser

Scientist @ AIT Austrian Institute of Technology

− QGIS user since 2008

− MSc in Geomatics 2010

− QGIS Project Steering Committee since 2013

− OSGeo Director 2015-17

− Moderator on GIS.StackExchange.com

− Author of “Learning QGIS” (1st ed. 2013), “QGIS Map Design” (2016) & “QGIS 2 Cookbook” (2016)

@underdarkGIS

Austria's largest non-university research institute

− Energy

− Health & Bioresources

− Digital Safety & Security

− Vision, Automation & Control

− Mobility Systems

− Low-Emission Transport

− Technology Experience

− Innovation Systems & Policy

AIT

1,300 employees

Application areas

− Road traffic → FCD, e.g. Waze, TomTom, Uber

− Air traffic → ADS-B, e.g. Flightradar

− Marine traffic → AIS, e.g. MarineTraffic

− Human movement → CDR, e.g. mobile network providers

→ Data-driven decision making

→ Technologically challenging

CONTEXT & MOTIVATION

11/07/2018

SPATIAL DATA

Source: Constantin Stanca, “High Performance and Scalable Geospatial Analytics on Cloud with Open Source”

SPATIAL RELATIONSHIPS

Source: Constantin Stanca, “High Performance and Scalable Geospatial Analytics on Cloud with Open Source”

SPATIAL FUNCTIONS

Source: Constantin Stanca, “High Performance and Scalable Geospatial Analytics on Cloud with Open Source”

Big geo data is anything which is crash ArcGIS

Small data is when is fit in RAM. Big is when is crash because is no fit in RAM

WHAT'S “MASSIVE” SPATIO-TEMPORAL DATA?

TRADITIONAL TOOLS

Scaling PostgreSQL and PostGIS: http://s3.cleverelephant.ca/2017-cdb-postgis.pdf

LOOKING FOR SCALABLE SOLUTIONS

− ESRI GIS Tools for Hadoop: https://github.com/Esri/gis-tools-for-hadoop

− LocationSpark: https://github.com/merlintang/SpatialSpark

− STARK - Spatio-Temporal Data Analytics on Spark: https://github.com/dbis-ilm/stark

− SpatialSpark: https://github.com/syoummer/SpatialSpark

− GeoSpark: https://github.com/DataSystemsLab/GeoSpark

− PySpark & Geopandas: https://github.com/sabman/PySparkGeoAnalysis

OPEN SOURCE & MATURE

https://projects.eclipse.org/wg/locationtech/projects

WHAT IS GEOMESA?

Source: Constantin Stanca, “High Performance and Scalable Geospatial Analytics on Cloud with Open Source”

Features

✓ Store gigabytes to petabytes of spatial data (tens of billions of points or more)

✓ Serve up tens of millions of points in seconds

✓ Ingest data faster than 10,000 records per second per node

✓ Scale horizontally easily (add more servers to add more capacity)

✓ Support Spark analytics

✓ Drive a map through GeoServer or other OGC Clients

GEOMESA

http://www.geomesa.org/documentation/user/introduction.html#what-is-geomesa

… making 2D/3D data sortable

→ Space-filling curves

SPATIO-TEMPORAL INDICES

http://doi.ieeecomputersociety.org/10.1109/TVCG.2014.2298017

GEOMESA Z-CURVE

http://www.geomesa.org/documentation/tutorials/geohash-substrings.html
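
To illustrate how a space-filling curve makes 2D data sortable, here is a minimal sketch of a Z-order (Morton) value computed by bit interleaving. It is not GeoMesa's implementation: the object and method names (ZCurveSketch, normalize, interleave, z2) and the default of 16 bits per dimension are assumptions chosen for the example.

```scala
object ZCurveSketch {
  // Map a coordinate in [min, max] to an integer cell index in [0, 2^bits - 1].
  def normalize(value: Double, min: Double, max: Double, bits: Int): Long = {
    val cells = 1L << bits
    val idx = ((value - min) / (max - min) * cells).toLong
    math.min(math.max(idx, 0L), cells - 1)
  }

  // Interleave the lowest `bits` bits of x and y:
  // x fills the even bit positions, y the odd ones.
  def interleave(x: Long, y: Long, bits: Int): Long =
    (0 until bits).foldLeft(0L) { (z, i) =>
      z | (((x >> i) & 1L) << (2 * i)) | (((y >> i) & 1L) << (2 * i + 1))
    }

  // Z-value of a lon/lat pair (hypothetical helper, `bits` per dimension).
  def z2(lon: Double, lat: Double, bits: Int = 16): Long =
    interleave(
      normalize(lon, -180.0, 180.0, bits),
      normalize(lat, -90.0, 90.0, bits),
      bits)
}
```

Sorting points by `ZCurveSketch.z2(lon, lat)` yields a one-dimensional order in which points that are close in space tend to be close in the sort order, which is the property a sorted key-value store needs for range scans.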

geomesa describe-schema -c geomesa.gdelt -f gdelt -u user -p password

INFO Describing attributes of feature 'gdelt'

globalEventId | String

eventCode | String

...

dtg | Date (Spatio-temporally indexed)

geom | Point (Spatially indexed)

User data:

geomesa.index.dtg | dtg

geomesa.indices | z3:4:3,z2:3:3,records:2:3

geomesa.table.sharing | false

GEOMESA COMMAND LINE

geomesa export -c geomesa.gdelt -f gdelt -u root -p GisPwd \
  -q "globalEventId='671867776'"

Using GEOMESA_ACCUMULO_HOME = /opt/geomesa

id,globalEventId:String,...,dtg:Date,*geom:Point:srid=4326

d9e...,671867776,...,2007-07-13T00:00:00.000Z,POINT (-97 38)

GEOMESA COMMAND LINE

geomesa export -c geomesa.gdelt -f gdelt -u root -p GisPwd \
  -q "CONTAINS(POLYGON ((0 0, 0 90, 90 90, 90 0, 0 0)),geom)" -m 3

Using GEOMESA_ACCUMULO_HOME = /opt/geomesa

id,globalEventId:String,...,dtg:Date,*geom:Point:srid=4326

139...,671713129,...,2017-07-10T00:00:00.000Z,POINT (5.43827 5.35886)

9e8...,671928676,...,2017-07-10T00:00:00.000Z,POINT (5.43827 5.35886)

d6c...,671817380,...,2017-07-09T00:00:00.000Z,POINT (5.43827 5.35886)

More complex queries & analyses → Spark(SQL)!

GEOMESA COMMAND LINE

GEOMESA

Source: Constantin Stanca, “High Performance and Scalable Geospatial Analytics on Cloud with Open Source”

Option #1: DataFrame API

import org.locationtech.geomesa.spark.jts._

import spark.implicits._

gdeltDf.where(st_contains(st_makeBBOX(0.0, 0.0, 90.0, 90.0), $"geom"))

Option #2: SparkSQL (with UDFs)

SELECT * FROM gdelt

WHERE st_contains(st_makeBBOX(0.0, 0.0, 90.0, 90.0), geom)

GEOMESA

Save dataframe to GeoMesa table

val df = spark.sql(sqlQuery)

val dsParams = Map(
  "accumulo.instance.id" -> "...",
  "accumulo.zookeepers" -> "...",
  "accumulo.user" -> "...",
  "accumulo.password" -> "...",
  "accumulo.catalog" -> "tablename")

df.write.format("geomesa").options(dsParams)
  .option("geomesa.feature", "featurename").save()

GEOMESA

Example: Trajectory from points sorted by time

val someDF = Seq(

(1, Timestamp.valueOf("2018-01-01 12:00:00"), 2.5, geomFactory.createPoint(new Coordinate(0, 0))),

(1, Timestamp.valueOf("2018-01-01 12:05:00"), 3.5, geomFactory.createPoint(new Coordinate(1, 1))),

(2, Timestamp.valueOf("2018-01-01 12:00:00"), 5.5, geomFactory.createPoint(new Coordinate(0, 0))),

(2, Timestamp.valueOf("2018-01-01 12:05:00"), 5.5, geomFactory.createPoint(new Coordinate(1, 1)))

).toDF("id", "t", "sog", "pt")

+--+-------------------+---+-----------+
|id|t                  |sog|pt         |
+--+-------------------+---+-----------+
|1 |2018-01-01 12:00:00|2.5|POINT (0 0)|
|1 |2018-01-01 12:05:00|3.5|POINT (1 1)|
|2 |2018-01-01 12:00:00|5.5|POINT (0 0)|
|2 |2018-01-01 12:05:00|5.5|POINT (1 1)|
+--+-------------------+---+-----------+

GEOMESA

Example: Trajectory from points sorted by time

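import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{collect_list, max}

someDF
  // running list of points per id, ordered by time
  .withColumn("collected", collect_list($"pt").over(Window.partitionBy("id").orderBy("t")))
  .groupBy("id")
  // the running lists are prefixes of one another, so max keeps the complete one
  .agg(max($"collected").as("collected"))
  .withColumn("line", st_makeLine($"collected"))
  .show(false)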

+--+--------------------------+---------------------+
|id|collected                 |line                 |
+--+--------------------------+---------------------+
|1 |[POINT (0 0), POINT (1 1)]|LINESTRING (0 0, 1 1)|
|2 |[POINT (0 0), POINT (1 1)]|LINESTRING (0 0, 1 1)|
+--+--------------------------+---------------------+

GEOMESA

Example: Trajectory from points sorted by time

spark.sql("""WITH windowed AS (

SELECT id, collect_list(first(pt)) OVER (PARTITION BY id ORDER BY t) line

FROM temp

GROUP BY id, t)

SELECT id, max(line), st_makeline(max(line))

FROM windowed

GROUP BY id""").show(false)

+--+--------------------------+--------------------------+
|id|max(line)                 |UDF:st_makeLine(max(line))|
+--+--------------------------+--------------------------+
|1 |[POINT (0 0), POINT (1 1)]|LINESTRING (0 0, 1 1)     |
|2 |[POINT (0 0), POINT (1 1)]|LINESTRING (0 0, 1 1)     |
+--+--------------------------+--------------------------+
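
The logic behind both trajectory examples (group by id, order by time, connect the points) can also be sketched on plain Scala collections, without Spark or GeoMesa. This is only an illustration of the technique: the names TrajectorySketch, Fix and trajectories are hypothetical, timestamps are simplified to epoch seconds, and the line is emitted as a WKT string instead of a geometry object.

```scala
object TrajectorySketch {
  // One position fix: object id, timestamp (epoch seconds), x/y coordinates.
  final case class Fix(id: Int, t: Long, x: Double, y: Double)

  // Print whole numbers without a trailing ".0", as in the WKT shown above.
  private def fmt(d: Double): String =
    if (d == d.toLong.toDouble) d.toLong.toString else d.toString

  // Group fixes by id, sort each group by time, emit one WKT LINESTRING per id.
  def trajectories(fixes: Seq[Fix]): Map[Int, String] =
    fixes.groupBy(_.id).map { case (id, group) =>
      val coords = group.sortBy(_.t).map(f => s"${fmt(f.x)} ${fmt(f.y)}")
      id -> coords.mkString("LINESTRING (", ", ", ")")
    }
}
```

The explicit `sortBy(_.t)` plays the role of `ORDER BY t` in the window definition: without it, the vertex order of the resulting line would depend on the input order.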

GEOMESA

http://www.geomesa.org/documentation/user/spark/sparksql_functions.html

Geometry Constructors

• st_geometryFromText

• st_makeBBOX

• st_makeLine

• st_makePoint

• st_makePolygon

• …

Geometry Accessors

• st_geometryN

• st_isValid

• st_pointN

• st_x

• …

Geometry Outputs

• st_asGeoJSON

• st_asText

• …

Spatial Relationships

• st_area

• st_centroid

• st_closestPoint

• st_contains

• st_covers

• st_crosses

• st_disjoint

• st_distance

• st_distanceSphere

• st_distanceSpheroid

• st_equals

• st_intersects

• st_length

• st_lengthSphere

• st_lengthSpheroid

• st_overlaps

• st_relate

• st_touches

• st_within

Geometry Processing

• st_bufferPoint

• st_convexHull

• …

GEOMESA-SPARK-SQL MODULE

BIG SPATIAL TECHNOLOGY STACK

ACCESSING GEOMESA IN GEOSERVER

GEOSERVER PREVIEW

CONSUMING WFS IN QGIS

EXAMPLE

TRAFFIC COUNTS

EXAMPLE

TRAVEL TIME

Based on similar trajectory search

EXAMPLE

TRAJECTORY PREDICTION

5 MIN 10 MIN 15 MIN

Graser, A., Schmidt, J., Widhalm, P. (2018). Predicting trajectories with probabilistic time geography and massive unconstrained movement data. GIScience Workshop on Analysis of Movement Data (AMD’18), 28 August 2018, Melbourne, Australia.

EXAMPLE

INTERACTIVE ANIMATION

http://www.geomesa.org/

CONTACT

Anita Graser

anita.graser@ait.ac.at

@underdarkGIS

anitagraser.com
