Giving MongoDB the way to play with the GIS community To make GIScience directly supported by the NoSQL Technology, so prepared for BIG DATA ERA Jiangsu Key Laboratory of Geographical Information Technology, Nanjing University. Cyber-Infrastructure and Geospatial Information Laboratory (CIGI), Department of Geography, School of Earth, Society and Environment, National Center for Supercomputing Applications (NCSA), University of Illinois at Urbana-Champaign, Urbana, Illinois, USA Jun. 25, 2014 Hanson Shuai Zhang [email protected]

Giving MongoDB a Way to Play with the GIS Community


DESCRIPTION

The Geographic Information System (GIS) industry is booming, especially with the continued reliance on online maps and the rise of location-aware mobile devices. GIS tech can be one of the key players in the mobile internet, big data, and the internet of things, and is an essential tool for the next generation of the global IT industry. Yet the GIS community is not prepared. With all the data available, GIS experts lack off-the-shelf solutions to manage the growing volume of spatial data. Relational spatial databases (RSDBs) were the leaders in this field for decades, but they have failed to innovate to handle massive volumes of data coming in at high velocity. Fortunately, MongoDB is a useful tool for this challenge, but it needs some tooling to connect it to the GIS tech ecosystem. To bridge the gap, we built a pipeline that complies with the architecture of the Geospatial Data Abstraction Library (GDAL), so that MongoDB can work with most popular GIS tools, such as OpenLayers, MapServer, GeoServer, QGIS, ArcGIS and others, with ease. In this talk, I'll go through this pipeline tool and showcase some examples of how you can use it in your next application.


Page 1: Giving MongoDB a Way to Play with the GIS Community

Giving MongoDB the way to play with the GIS community To make GIScience directly supported by the NoSQL Technology, so prepared for BIG DATA ERA

Jiangsu Key Laboratory of Geographical Information Technology, Nanjing University.

Cyber-Infrastructure and Geospatial Information Laboratory (CIGI),

Department of Geography, School of Earth, Society and Environment,

National Center for Supercomputing Applications (NCSA),

University of Illinois at Urbana-Champaign, Urbana, Illinois, USA

Jun. 25, 2014

Hanson Shuai Zhang

[email protected]

Presenter
Presentation Notes
Hello everyone, I'm Hanson. I come from China, and I'm a Ph.D. student majoring in GIS. Today I'm going to share some of my work related to MongoDB. As you may have noticed, MongoDB has a geospatial part, which is really great and, in some ways, quite revolutionary. But if you examine this geospatial part from the GIS tech point of view, you will find that MongoDB is actually quite lonely. GIS tech is the main field that deals with geospatial information, and after decades of development the GIS ecosystem has become very prosperous, with hundreds of libraries, tools and software packages to handle all kinds of geospatial problems. But among them you can hardly find any that cooperate directly with MongoDB. That's really a pity. The work we did here tries to change this awkward situation. The hope is to have the GIS ecosystem powered by MongoDB, the new generation of database technology, and meanwhile to give MongoDB a way to play with the GIS community. OK, let's start from a real-world example and see how we met MongoDB.
Page 2: Giving MongoDB a Way to Play with the GIS Community

Spatial Pyramid – View the world with multiple spatiotemporal scales

1

Real world example - Spatial Pyramid

Challenges with PostGIS

Handling with MongoDB cluster

Presenter
Presentation Notes
One of our projects needs to generate a spatial pyramid. So what is a spatial pyramid?
Page 3: Giving MongoDB a Way to Play with the GIS Community

Global

North America

Canada U.S.A.

Illinois

Champaign

UIUC Campus

Downtown

Chicago

New York

Asia

South Asia East Asia

China

Shanghai Beijing

Olympic Park

Xidan Street

Japan

Spatial Pyramid | Introduction

Presenter
Presentation Notes
A spatial pyramid is actually quite simple. Suppose we have millions of data points, for example geotagged photos and tweets distributed all over the world. We want to analyze these data at multiple scales, at different levels. That requires a recursive aggregation from the bottom up, which we call a spatial pyramid. The diagram here illustrates what a spatial pyramid can look like. But I have to mention three outstanding characteristics of the spatial pyramid. The data size is huge, and millions of records keep pouring in every day. It requires lots of spatial queries. And lots of data moves back and forth, with writes and reads at each scale, so it's a very typical IO-intensive task.
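To make the bottom-up aggregation concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the slide's pyramid follows a place hierarchy (Global, North America, Illinois, ...), whereas this sketch swaps in a regular grid that halves in resolution at each level, and the names `cell_of` and `build_pyramid` are hypothetical, not taken from the project.

```python
from collections import Counter

def cell_of(lon, lat, level):
    """Map a lon/lat pair to a (col, row) grid cell; level 0 is one global cell."""
    n = 2 ** level                                   # cells per axis at this level
    col = min(int((lon + 180.0) / 360.0 * n), n - 1)
    row = min(int((lat + 90.0) / 180.0 * n), n - 1)
    return col, row

def build_pyramid(points, max_level):
    """Count points per cell at the finest level, then roll the counts up."""
    pyramid = {max_level: Counter(cell_of(lon, lat, max_level) for lon, lat in points)}
    for level in range(max_level - 1, -1, -1):
        parent = Counter()
        for (col, row), count in pyramid[level + 1].items():
            parent[(col // 2, row // 2)] += count    # four child cells share one parent
        pyramid[level] = parent
    return pyramid

# ASSUMPTION: illustrative (lon, lat) points near Chicago, New York and Beijing.
points = [(-87.65, 41.90), (-73.98, 40.77), (116.40, 39.90)]
pyramid = build_pyramid(points, max_level=3)
```

Each level is computed from the level below it rather than from the raw points, which is why every scale involves both reads and writes, as described above.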
Page 4: Giving MongoDB a Way to Play with the GIS Community

OpenLayers

Internet

Leaflet

ArcJs

Spatial Pyramid | PostGIS in the Open Stack

LAN uDig

QGIS

GRASS

ArcGIS

Mapserver

GeoServer

ArcServer

PostGIS

Presenter
Presentation Notes
Well, at first, quite naturally, we decided to use PostGIS as our backend database server. PostGIS, built upon PostgreSQL, is a very famous open source spatial database and has a very wide range of support from the GIS ecosystem. So it was very convenient for us to build a solution on the PostGIS stack.
Page 5: Giving MongoDB a Way to Play with the GIS Community

Spatial Pyramid | Generator Architecture

Spatial Pyramid Generator Architecture Data Server

Spatial Pyramid Generator

PostGIS

HPC Cluster

Pyramid Model

Python OGR, MPI

PostgreSQL

What is ArcSDE 8?

2.3 hours !!

Presenter
Presentation Notes
So we built one and put all our data in the PostGIS database. But it turned out that PostGIS was extremely slow in this scenario: it took us 2.3 hours on average to finish 10 layers with a single thread. We knew that with multithreading and parallel computing it could be faster. But since the spatial pyramid is an IO-intensive task, we believed that the throughput of PostGIS would be the bottleneck for further performance tuning. Besides, our data keeps growing every day, and we need a high-performance spatial database with good scalability. So we started to search for new solutions for our project. And we found MongoDB.
Page 6: Giving MongoDB a Way to Play with the GIS Community

Spatial Pyramid | MongoDB Approach

Spatial Pyramid

Requests

Load Balance

MongoS

P

S

S

P

S

S

MongoS

Shard

Shard

C C C

Config

GDAL/OGR

15 minutes !!

Presenter
Presentation Notes
MongoDB was designed as a next-generation database with high throughput, high performance, and good scalability. And the most exciting thing about MongoDB is that it has a well-defined geospatial part. The spatial query in the spatial pyramid is actually quite simple, just a sort of nearby query. So we found that MongoDB fits our demands very nicely, and we decided to build our project upon a MongoDB cluster. We spent several weeks setting up a MongoDB cluster, importing all our data and sharding it, and another several weeks rewriting the spatial pyramid generator inside the MongoDB cluster, followed by lots of performance tuning. Finally we managed to bring the time down to about 15 minutes. That's acceptable. But today I'm not going to talk about the performance tuning part. What I want to talk about is the problem this MongoDB approach has. We use a library called GDAL as a fundamental library in our later spatial analysis, and MongoDB does not have GDAL support. So the workflow in our project is actually broken.
Page 7: Giving MongoDB a Way to Play with the GIS Community

Open Source – GDAL is released under an X/MIT style Open Source license

– supported by the Open Source Geospatial Foundation

A library for geospatial data formats – abstract data model conformed to OGC standards.

– 133 raster data formats, 79 vector data formats

Widely used by the GIS community – 88 software packages listed on gdal.org use GDAL

Basic Library for HPGC – We use GDAL as the basic tools to build high performance computing algorithms

Spatial Pyramid | GDAL Library

Presenter
Presentation Notes
So what is GDAL? Can we get rid of it? Yes, we can, but it's not a good idea. GDAL is an open source library whose main purpose is to load spatial data from all kinds of data formats. It has been widely used throughout the GIS community and serves as a basic element of the ecosystem. So algorithms written on top of GDAL gain good interoperability and an easy way to cooperate with other GIS tools in the stack. A better choice would be to keep both GDAL and MongoDB in our project and build a bridge between the two. Of course you can write an ad hoc program in some programming language to glue them together, but in fact there is a much more elegant way to handle this.
Page 8: Giving MongoDB a Way to Play with the GIS Community


Spatial Pyramid | GDAL Architecture

Presenter
Presentation Notes
GDAL actually has an extensible architecture for new data formats. So instead of gluing the two together, we could regard MongoDB as a new data format for GDAL and write a GDAL driver for MongoDB.
Page 9: Giving MongoDB a Way to Play with the GIS Community

GDAL Driver for MongoDB – Giving MongoDB the way to play with the GIS community

2

View MongoDB as a spatial database

Design GDAL Driver for MongoDB

Cooperate with other GIS tools

Presenter
Presentation Notes
So let's see how we did it. But before we go into the details, to get a better understanding we first have to go through how GDAL organizes its spatial data. I'll also talk about the challenges of writing a GDAL driver for MongoDB and the way we solved them.
Page 10: Giving MongoDB a Way to Play with the GIS Community

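| FID   | Geometry            | Name    | States   | Time Zone |
+-------+---------------------+---------+----------+-----------+
| 10001 | POINT(40.77, 73.98) | NYC     | New York | UTC-05:00 |
| 10002 | POINT(41.90, 87.65) | Chicago | Illinois | UTC-06:00 |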

Feature – a spatial object

Point

Line

Polygon

Geometries

Attributes, Non-Spatial Data

GDAL | spatial database structure

Spatial Relational Table

1

2

3

Presenter
Presentation Notes
So how does GDAL organize geospatial data? Well, there is a fundamental concept called a feature. Each feature is designed to represent a spatial object in the real world; let's say we use a point to represent a city. In the database each feature is stored as a row and is composed of three parts: the geometries, their spatial reference, and the attribute data. You put many features of the same theme in one table to form a spatial layer.
Page 11: Giving MongoDB a Way to Play with the GIS Community

GDAL | spatial database structure

https://lib.stanford.edu/gis

Tables – Layers

Rows – Features Where is

RDBS

Presenter
Presentation Notes
Then you overlay those layers to get a whole map. Here is the basic structure: we have tables for layers and rows for features. But the problem is that in MongoDB you can't find a table anywhere. So the first challenge is how to organize geospatial data in MongoDB.
Page 12: Giving MongoDB a Way to Play with the GIS Community

GDAL | Simple Feature Access

Presenter
Presentation Notes
Another thing is that GDAL follows an international standard called Simple Feature Access, which defines a geometry hierarchy, as illustrated in this diagram. Unfortunately there is no definition for representing these geometry types in JSON style. So we have a problem again: how do we represent these geometries in MongoDB?
Page 13: Giving MongoDB a Way to Play with the GIS Community

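| RDBMS     | GeoDatabase | MongoDB             |
+-----------+-------------+---------------------+
| Database  | Datasource  | Database            |
| Table     | Layer       | Collection          |
| Row(s)    | Feature(s)  | JSON Document       |
| Field(s)  | Field(s)    | Key:Value           |
| Index     | R tree      | Index               |
| Join      | Join        | Embedding & Linking |
| Partition | --          | Shard               |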

GDAL | Terminology

Presenter
Presentation Notes
The first challenge is very easy to handle. Yes, MongoDB doesn't have tables, but it has collections, and we can treat each JSON document as a row. Putting all these terms side by side, we can quickly see how to organize geospatial data in MongoDB. The second problem is not an easy one, and I'll talk about three approaches to work around it.
Page 14: Giving MongoDB a Way to Play with the GIS Community

WKT, Well-known text, originally defined by the Open Geospatial

Consortium (OGC) and described in their Simple Feature Access and

Coordinate Transformation Service specifications.

GDAL | WKT for Spatial data

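| Type       | Examples                                                                    |
+------------+-----------------------------------------------------------------------------+
| Point      | POINT (30 10)                                                               |
| LineString | LINESTRING (30 10, 10 30, 40 40)                                            |
| Polygon    | POLYGON ((30 10, 10 20, 20 40, 40 40, 30 10))                               |
|            | POLYGON ((35 10, 10 20, 15 40, 45 45, 35 10), (20 30, 35 35, 30 20, 20 30)) |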

In total, there are 18 distinct geometric objects that can be represented.

http://en.wikipedia.org/wiki/Well-known_text

Presenter
Presentation Notes
Well, the first approach lies in the standard itself. The standard defines a string format for representing geometries, called WKT. The table on this slide shows some WKT examples of how to represent a point, a linestring, and a polygon. So we can take advantage of this string format to represent geometries in the JSON document.
Page 15: Giving MongoDB a Way to Play with the GIS Community

GDAL | WKT for Spatial data

{
  "GEM": "POINT(41.90, 87.65)",
  "FID": 10002,
  "Name": "Chicago",
  "States": "Illinois",
  "Time Zone": "UTC-06:00"
}

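| FID   | Geometry            | Name    | States   | Time Zone |
+-------+---------------------+---------+----------+-----------+
| 10001 | POINT(40.77, 73.98) | NYC     | New York | UTC-05:00 |
| 10002 | POINT(41.90, 87.65) | Chicago | Illinois | UTC-06:00 |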

WKT

Geospatial Metadata collection

Presenter
Presentation Notes
So it becomes very simple. You just add a geometry field and use a WKT string to represent the geometry.
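As a rough illustration of the WKT approach, the Python sketch below turns one table row into a document with a WKT geometry field. The helper names `point_wkt` and `row_to_wkt_document` are hypothetical, the field name `GEM` is copied from the slide, and the coordinate values are taken from the slide unchanged.

```python
def point_wkt(x, y):
    """Format a point as WKT; WKT separates the two ordinates with a space."""
    return f"POINT ({x} {y})"

def row_to_wkt_document(fid, x, y, attributes, geometry_field="GEM"):
    """Build one document per feature: ID, WKT geometry string, then attributes."""
    doc = {"FID": fid, geometry_field: point_wkt(x, y)}
    doc.update(attributes)              # non-spatial attributes become plain keys
    return doc

doc = row_to_wkt_document(
    10002, 41.90, 87.65,
    {"Name": "Chicago", "States": "Illinois", "Time Zone": "UTC-06:00"},
)
```

Because the geometry is only a string, the database stores it like any other text value.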
Page 16: Giving MongoDB a Way to Play with the GIS Community

GDAL | WKT for Spatial data

U.S.A

States

Cities

Canada

Roads

G_sys_Metadata

MongoDB Cluster

NYC

Chicago

……

Database

Collection

WKT

Feature

Layer

Datasource

| c_name | coord_d | src  | type    | Extent  |
+--------+---------+------+---------+---------+
| Cities | 2       | 4326 | Point   | [p1,p2] |
| States | 2       | 4326 | Polygon | [p1,p2] |

No spatial Index

Presenter
Presentation Notes
So the overall solution can be described by this diagram. We have the database, a database technology term; its counterpart in GIS tech is the datasource. Each collection serves as a layer, and each JSON document is a feature with a field that holds the WKT string. We also have a metadata collection in each database, just as traditional spatial databases do. So here we go: we can organize all the geospatial information in MongoDB. But there is a problem. Right now you have no way to build a spatial index on the WKT field in MongoDB. And that's terrible.
Page 17: Giving MongoDB a Way to Play with the GIS Community

GDAL | GeoJSON for spatial data

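| FID   | Geometry            | Name    | States   | Time Zone |
+-------+---------------------+---------+----------+-----------+
| 10001 | POINT(40.77, 73.98) | NYC     | New York | UTC-05:00 |
| 10002 | POINT(41.90, 87.65) | Chicago | Illinois | UTC-06:00 |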

{
  "type": "Feature",
  "properties": {
    "FID": 10002,
    "Name": "Chicago",
    "States": "Illinois",
    "Time Zone": "UTC-06:00"
  },
  "geometry": { "type": "Point", "coordinates": [41.90, 87.63] }
}

GeoJSON

Geospatial Metadata collection

Presenter
Presentation Notes
So another way to work around it is GeoJSON. Since last year, as of version 2.4, MongoDB has used GeoJSON to store geospatial data. GeoJSON is a specification for encoding a variety of geospatial data in JSON style.
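A minimal Python sketch of the same feature encoded as GeoJSON might look like this. The helper name `to_geojson_feature` is hypothetical, and the coordinate list is copied from the slide unchanged; note that GeoJSON itself orders a position as longitude first, then latitude.

```python
import json

def to_geojson_feature(coordinates, properties):
    """Wrap a point position and its attributes in a GeoJSON Feature object."""
    return {
        "type": "Feature",
        "properties": dict(properties),
        "geometry": {"type": "Point", "coordinates": list(coordinates)},
    }

feature = to_geojson_feature(
    [41.90, 87.63],
    {"FID": 10002, "Name": "Chicago", "States": "Illinois", "Time Zone": "UTC-06:00"},
)
text = json.dumps(feature)              # plain JSON text, ready to store or share
```

Unlike the WKT string, the geometry here is structured JSON, so each coordinate is an addressable value inside the document.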
Page 18: Giving MongoDB a Way to Play with the GIS Community

U.S.A

States

Cities

Canada

Roads

G_sys_Metadata

MongoDB Cluster

NYC

Chicago

……

Database

Collection

GeoJSON

Feature

Layer

Datasource

| c_name | coord_d | src  | type    | Extent  |
+--------+---------+------+---------+---------+
| Cities | 2       | 4326 | Point   | [p1,p2] |
| States | 2       | 4326 | Polygon | [p1,p2] |

GDAL | GeoJSON for spatial data

Presenter
Presentation Notes
So we just follow the database structure of the WKT approach and change the document style to GeoJSON, and we get our second approach. Since MongoDB has native support for GeoJSON, the nice thing about this approach is that you can build a spatial index and send spatial queries as you want.
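As a sketch of what the index and a nearby query could look like, the Python below only builds the specification and filter documents as plain dicts, so it runs without a server. The function names and the `geometry` field name are assumptions; `2dsphere`, `$near`, `$geometry` and `$maxDistance` are MongoDB's own names, and the resulting values have the shape that a driver such as pymongo accepts in `create_index` and `find`.

```python
def sphere_index_spec(geometry_field="geometry"):
    """Key specification for a 2dsphere index on a GeoJSON field."""
    return [(geometry_field, "2dsphere")]

def near_query(lon, lat, max_distance_m, geometry_field="geometry"):
    """Filter for documents near a point, nearest first, within a distance in metres."""
    return {
        geometry_field: {
            "$near": {
                "$geometry": {"type": "Point", "coordinates": [lon, lat]},
                "$maxDistance": max_distance_m,
            }
        }
    }

# ASSUMPTION: illustrative point near Chicago, written longitude first as GeoJSON expects.
index_spec = sphere_index_spec()
query = near_query(-87.65, 41.90, 5000)
```

With a live connection, these two values would be handed to the collection's index-creation and find calls respectively.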
Page 19: Giving MongoDB a Way to Play with the GIS Community

World

Canada

U.S.A

Oceans

Rivers

Cities

MongoDB Cluster

States Rivers ……

Database

Collection

FeatureCollection

Layer

Dataset

Datasource

GDAL | FeatureCollection

{
  "type": "FeatureCollection",
  "crs": {…},
  "bbox": [….],
  "features": [
    { "type": "Feature", "geometry": { "type": "Point", "coordinates": […] }, "properties": { "prop0": "value0" } },
    …
  ]
}

Presenter
Presentation Notes
But there is still one thing left, because in the GeoJSON specification a document is not necessarily a feature. It can be a FeatureCollection as well, which means you have a lot of features in one document. And that changes the spatial database structure in MongoDB.
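To show how many features end up in a single document, here is a small Python sketch that wraps point features into one FeatureCollection and computes its bounding box. The helper name `feature_collection` and the sample features are assumptions, and the sketch handles point geometries only.

```python
def feature_collection(features):
    """Wrap point features in one FeatureCollection with a bbox of [west, south, east, north]."""
    xs = [f["geometry"]["coordinates"][0] for f in features]
    ys = [f["geometry"]["coordinates"][1] for f in features]
    return {
        "type": "FeatureCollection",
        "bbox": [min(xs), min(ys), max(xs), max(ys)],
        "features": list(features),
    }

def point_feature(lon, lat, name):
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {"Name": name},
    }

# ASSUMPTION: illustrative coordinates, longitude first.
layer = feature_collection([
    point_feature(-87.65, 41.90, "Chicago"),
    point_feature(-73.98, 40.77, "NYC"),
])
```

The whole layer now travels as one self-describing document, at the cost of the per-feature document granularity of the previous approach.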
Page 20: Giving MongoDB a Way to Play with the GIS Community

GDAL | Terminology

* FeatureCollection for GeoJSON format

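| RDBMS     | MongoDB             | GeoDatabase | WKT                 | GeoJSON             | FTCL*               |
+-----------+---------------------+-------------+---------------------+---------------------+---------------------+
| Database  | Database            | Datasource  | Datasource          | Datasource          | Datasource          |
| Table     | Collection          | Layer       | Layer               | Layer               | Dataset             |
| Row(s)    | JSON Document       | Feature     | Feature             | Feature             | Layer               |
| Index     | Index               | R tree      | --                  | Grid Index          | Grid Index          |
| Join      | Embedding & Linking | Join        | Embedding & Linking | Embedding & Linking | Embedding & Linking |
| Partition | Shard               | --          | Shard               | Shard               | Shard               |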

Presenter
Presentation Notes
So here is the terminology comparison table for the three approaches. You see that the WKT and GeoJSON approaches are alike, but the FeatureCollection approach has a slightly different structure: each document serves as a layer.
Page 21: Giving MongoDB a Way to Play with the GIS Community

GDAL | who is better?

*http://en.wikipedia.org/wiki/Well-known_text ** http://geojson.org/geojson-spec.html

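| Features         | WKT              | GeoJSON        | Feature Collection |
+------------------+------------------+----------------+--------------------+
| Structure        | Flexible & Tight | Semi-          | Semi- & un-        |
| Spatial Index    | NO               | Grid Index     | Grid Index         |
| Efficiency       | SLOW             | FAST           | MEDIUM             |
| Self-explanatory | NO               | YES with semi- | YES                |
| Easy-sharing     | MEDIUM           | MEDIUM         | CONVENIENT         |
| Geometry types   | ALL SFA, 18*     | LIMITED, 6**   | LIMITED, 6**       |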

Presenter
Presentation Notes
But which is better? It depends. The WKT approach is poor at spatial queries but fully supports the standard. GeoJSON has a spatial index but only a limited set of spatial data types. The FeatureCollection approach contains all the information in one document, so it is very convenient for sharing. OK, so here we go: we have three approaches to developing the GDAL driver for MongoDB to solve the problem in our project. However, the driver is by no means only for our project; it has a much broader impact.
Page 22: Giving MongoDB a Way to Play with the GIS Community

ogr2ogr – convert simple features data between file formats

– spatial or attribute selections, reducing the set of attributes,

– setting the output coordinate system or even reprojecting

– Extract, Transform, and Load (ETL) Tools for MongoDB Geospatial

GDAL | Load all sorts of spatial data

Presenter
Presentation Notes
The most direct result is that we can use the tools of the GDAL project. For example, there is a tool called ogr2ogr that converts spatial data between formats. With this tool MongoDB can directly load spatial data from all kinds of spatial data formats, for example from ESRI Shapefile, PostGIS, Oracle Spatial …
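As a hedged illustration, the Python sketch below assembles an ogr2ogr command line without running it. The general form `ogr2ogr -f <format> <destination> <source>` and the `-nln` option are standard ogr2ogr usage; the driver name `MongoDB`, the connection string and the file name are placeholders, since the talk does not state them.

```python
import shlex

def ogr2ogr_command(src, dst, dst_format, layer_name=None):
    """Build an ogr2ogr argument list: ogr2ogr -f <format> <dst> <src> [-nln <name>]."""
    cmd = ["ogr2ogr", "-f", dst_format, dst, src]
    if layer_name:
        cmd += ["-nln", layer_name]     # -nln sets the name of the output layer
    return cmd

# ASSUMPTION: the driver name, connection string and shapefile path are placeholders;
# check the driver's own documentation for the real values.
cmd = ogr2ogr_command(
    "airports.shp", "mongodb://localhost:27017/gis", "MongoDB", layer_name="airports"
)
print(shlex.join(cmd))                  # the command as it would be typed in a shell
```

Keeping the command as a list makes it easy to pass to `subprocess.run` once the real driver name and connection string are known.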
Page 23: Giving MongoDB a Way to Play with the GIS Community
Presenter
Presentation Notes
The second benefit is the algorithms and software built on the GDAL library. You do not have to make any changes to the algorithms; you just use MongoDB to organize your geospatial data. The algorithms will take advantage of the high-performance capability of MongoDB and fly.
Page 24: Giving MongoDB a Way to Play with the GIS Community

Work with various GIS software

Presenter
Presentation Notes
As I mentioned, GDAL is widely used across the GIS ecosystem and serves as a fundamental library for lots of other GIS software. So with the GDAL driver for MongoDB you suddenly have a number of GIS packages that can cooperate directly with MongoDB's geospatial part, for example ArcGIS, QGIS, MapServer and so on.
Page 25: Giving MongoDB a Way to Play with the GIS Community

MongoDB Works with QGIS

Presenter
Presentation Notes
Here is one experiment I did. I loaded global airport data from a shapefile into MongoDB using ogr2ogr, the tool I mentioned. Then I used QGIS, a popular desktop GIS package, to visualize the airports and calculate their Voronoi polygons. So in this map, each polygon represents the nearest service area of an airport. So you see, with the GDAL driver for MongoDB you can use lots of GIS software directly to visualize, analyze and publish the geospatial data stored in MongoDB.
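The idea behind "nearest service area" can be sketched in a few lines of Python: a location belongs to the Voronoi cell of whichever airport is closest to it. This is not how QGIS computes the polygons; it is a brute-force membership test using great-circle distance, and the airport codes and approximate coordinates are illustrative assumptions.

```python
from math import asin, cos, radians, sin, sqrt

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres on a sphere of radius 6371 km."""
    dlon = radians(lon2 - lon1)
    dlat = radians(lat2 - lat1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def nearest_site(lon, lat, sites):
    """Return the name of the site whose Voronoi cell contains (lon, lat)."""
    return min(sites, key=lambda name: haversine_km(lon, lat, *sites[name]))

# ASSUMPTION: approximate (lon, lat) positions for two airports.
airports = {"ORD": (-87.90, 41.98), "JFK": (-73.78, 40.64)}
owner = nearest_site(-88.24, 40.11, airports)   # a point near Champaign, Illinois
```

A real Voronoi diagram stores the cell boundaries once, so lookups do not have to scan every airport.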
Page 26: Giving MongoDB a Way to Play with the GIS Community

A step forward : MongoGIS – Mend the way for the GIS community to play with MongoDB

3

Evolution of spatial database Tech

Comparison of spatial database solutions

Roadmap to make the way

Presenter
Presentation Notes
So that's what we did to help MongoDB cooperate with the GIS ecosystem. But today I want to talk a little more than that. MongoDB is a great product, and for the GIS community it should not serve just as a container, a box for spatial data. It can have a much more significant destiny, and we should move a step forward. But in order to see that, let's first take a step backward and review the evolution of spatial database technology.
Page 27: Giving MongoDB a Way to Play with the GIS Community

GIS Application

Geometries Geometries Geometries files

FID

20th Century late 80s & early 90s

RDBMS for attribute data

File systems for geometry data.

A unique feature ID links the two

ESRI Shapefile is one of the most famous

Problems with data integrity, multiuser access and editing

1st Generation | Hybrid Solution

Standard SQL Geoprocessing

Attributes

Presenter
Presentation Notes
Spatial database technology has gone through three generations so far. Before spatial databases, GIS used files to manage all geospatial data. Later on, GIS could put some of its data, the attributes, in the database, while still maintaining the spatial part, the geometries, in files. ESRI Shapefile is one of the most famous formats and is still widely used today. But the problem with this generation is that when the data volume becomes huge, maintaining data integrity tends to be really complicated. And you also get problems with multiuser access and editing.
Page 28: Giving MongoDB a Way to Play with the GIS Community

IT

20th Century mid 90s

Attributes & Geometries in database

But geometry as binary large object

SDE as middleware from GIS vendors

Geometries are not understandable.

Poor integration, no spatial structure query language

2nd Generation | Spatial Database Engine

SDE

Attributes Geometries Geometries Geometries

blobs SQL

GIS Application

Presenter
Presentation Notes
So when database technology provided the binary large object (BLOB) for unstructured data, GIS quickly took advantage of it and developed the spatial database engine upon it. For the first time GIS could store all its data in the database. ESRI ArcSDE is a good example. But the problem is that the database has no idea what these binaries are, so you can't use SQL to do spatial queries.
Page 29: Giving MongoDB a Way to Play with the GIS Community

GIS

eBusiness

Geometries Attributes

E-SQL

20th Century late 90s

Spatial is a native Data Type

Attributes & geometries all in

Rich GIS functions built inside

Supported by major DB vendors

Spatial data queried using E-SQL

DB functionality fully supported

E-SQL

GIS GIS

eBusiness eBusiness

3rd Generation | Object-based Spatial Database

Presenter
Presentation Notes
The third generation came about because of object-relational database technology, which allows you to define new data types, including spatial data types. For the first time spatial data in the database were no longer treated as second-class citizens, and spatial databases got full support from database technology. PostGIS is the most famous example of this kind.
Page 30: Giving MongoDB a Way to Play with the GIS Community

BIG DATA Spreading

2008.9

Nature

2009.1

Google

2009.5

UN

Detecting influenza epidemics using search engine query data

Global Pulse Project

"Big Data for Development: Opportunities & Challenges": A Global Pulse White Paper

2009.12

Microsoft

The Fourth Paradigm: Data-Intensive Scientific Discovery

2011.2

Science

Dealing with data

highlight both the challenges posed by the data deluge and the opportunities that can be realized if we can better organize and access the data.

2012.3

The White House

Big Data Initiative

more than $200 million to big data research projects.

Presenter
Presentation Notes
We are marching into the fourth generation of spatial databases because of big data. In the era of big data, database technology tends to run on large-scale clusters, with high performance, high availability, and good scalability, which has been well discussed in the context of NoSQL technology. Geospatial data are no exception, especially when you see the wide spread of location-aware mobile devices, such as smartphones, tablets, and wearable sensors. So the GIS community needs a next-generation spatial database technology. But unfortunately, when you look around the world, you can find very few solutions, and in fact MongoDB is a leader in this direction.
Page 31: Giving MongoDB a Way to Play with the GIS Community

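Feature\Solutions

| Solution             | Cluster                                  | Implementation |
+----------------------+------------------------------------------+----------------+
| PostGIS As A Cluster | Shared Disk Failover                     | NAS            |
| PostGIS As A Cluster | File System Replication                  | DRBD           |
| PostGIS As A Cluster | Transaction Log Shipping                 | Streaming      |
| PostGIS As A Cluster | Trigger-Based Master-Standby Replication | Slony-I        |
| PostGIS As A Cluster | Statement-Based Replication Middleware   | pgpool-II      |
| PostGIS As A Cluster | Asynchronous Multi-Master Replication    | Bucardo        |
| MongoDB              |                                          | Sharding       |

| Feature               | NAS         | DRBD        | Streaming     | Slony-I    | pgpool-II         | Bucardo    | Sharding |
+-----------------------+-------------+-------------+---------------+------------+-------------------+------------+----------+
| Communication         | Shared Disk | Disk Blocks | WAL           | Table Rows | SQL               | Table Rows | oplog    |
| No Special Hardware   | ×           | √           | √             | √          | √                 | √          | √        |
| Data Synchronous      | Sync        | Sync        | Sync, Async   | Async      | Sync              | Async      | Sync     |
| Replication Method    | ×           | M-S         | M-S           | M-S        | M-M, M-S          | M-M, M-S   | M-M      |
| No Master Overhead    | √           | ×           | √             | ×          | √                 | √          | √        |
| Failover No Data Loss | √           | √           | With Sync On  | ×          | √                 | ×          | √        |
| Failover for HA       | Fast        | Fast        | Fast with Hot | Manual     | Hard to Re-attach | ×          | Fast     |
| Writes Scalability    | ×           | ×           | ×             | ×          | With M-M          | √          | Good     |
| Reads Scalability     | ×           | ×           | With Hot      | √          | √                 | √          | Good     |
| Parallel Query        | ×           | ×           | ×             | ×          | With M-M          | √          | √        |
| Complexity For Admin  | Low         | Low         | Low           | High       | Very High         | High       | Low      |
| Load Balancing        | ×           | ×           | ×             | ×          | √                 | ×          | √        |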

MongoDB as a High Performance Database

Presenter
Presentation Notes
In this table, from the perspective of next-generation technology, I compared the 7 methods listed on the PostgreSQL website that can be used to deploy a PostGIS cluster. The table seems a little complex, but let's focus on the colors. There are three colors in the table: green means very good in this category, yellow means it's OK, and red means you should pay close attention, otherwise you may get into lots of trouble. As you can see, none of these solutions meets the demands of a next-generation spatial database. They all have some problem here or there. And you'll get similar results if you review other popular existing spatial database solutions. But if you use these criteria to examine MongoDB, you get all green. That is to say, from the perspective of next-generation technology, MongoDB fits the needs quite nicely.
Page 32: Giving MongoDB a Way to Play with the GIS Community

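| Solutions             | OGC SFA | SQL/MM | GeoJSON | ArcSDE | PostGIS     | Oracle Spatial | MongoDB |
+-----------------------+---------+--------+---------+--------+-------------+----------------+---------+
| Spatial Data Types    | 17      | 18     | 6       | +++    | ++          | ++             | GeoJSON |
| Spatial Reference     | --      | --     | --      | +++    | +++         | +++            | WGS84   |
| Spatial Index         | --      | --     | --      | R tree | Gist, Rtree | R tree         | GeoHash |
| Geometry I/O          | √       | √      | --      | +++    | +++         | ++             | ×       |
| Geometry Accessors    | √       | √      | --      | +++    | ++          | ++             | ×       |
| Geometry Editors      | --      | --     | --      | +++    | ++          | +              | ×       |
| Topological Info      | --      | √      | --      | +++    | ++          | +++            | ×       |
| Spatial Measurements  | √       | √      | --      | +++    | ++          | ++             | ×       |
| Geo-processing        | √       | √      | --      | +++    | ++          | ++             | ×       |
| Spatial Relationships | √       | √      | --      | +++    | ++          | ++             | 4       |
| GIS Tech Ecosystems   | --      | --     | --      | +++    | +++         | +              | ×       |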

MongoDB as a spatial database

Presenter
Presentation Notes
But how about the geospatial functionality? Here is another table, in which I examined the richness of the spatial operators in today's most popular spatial database solutions. The first three columns are the related standards: Simple Feature Access, SQL/MM, and GeoJSON. You see that the GeoJSON specification, which MongoDB follows, defines only a limited set of spatial data types and no additional operators. In the next three columns, the spatial database solutions, a plus means "I have everything you defined, and more". ArcSDE, the spatial database solution provided by ESRI, the world's largest GIS vendor, has the richest spatial operators. What happens if we use those categories to examine MongoDB? Right, you get almost all red. For example, all the spatial databases support thousands of spatial references, but MongoDB supports only one. All the spatial databases have a rich geospatial ecosystem built on them, but you can hardly find any for MongoDB. That is to say, from the professional spatial database point of view, both the spatial functionality within MongoDB and the geospatial ecosystem built on it are incomplete. That's really a sad loss for the GIS community. MongoDB has great potential to serve as the next-generation spatial database, and the GIS community should not lose such a great opportunity to promote its power. It is time for the GIS community to help with MongoDB's geospatial part. But we do not have to start from scratch. There are a bunch of well-developed open source spatial libraries in the GIS ecosystem that can be harnessed to improve the MongoDB geospatial part. For example, the proj4 library can be used to deal with the spatial reference part, and the GDAL library can help bring up the geospatial ecosystem on MongoDB.
Page 33: Giving MongoDB a Way to Play with the GIS Community

GDAL driver for mongodb – The way that mongodb plays with the GIS community

– Work with the GDAL community to be included in the next release

– Open Source: https://github.com/mongogis/mongodb-gdal-driver

MongoGIS – The Next Generation Infrastructure for the GIS community

– MongoGIS group in the github: https://github.com/mongogis

– We may build it together!

MongoGIS in github

Presenter
Presentation Notes
Therefore, together with some of my colleagues, I set up a group called MongoGIS on GitHub, aiming to help MongoDB improve its geospatial part. You can also find the GDAL driver for MongoDB there. And of course, if you are interested, you are very welcome to join this group. It's our time to work with talented people from all over the world to build the next-generation infrastructure for the GIS community.
Page 34: Giving MongoDB a Way to Play with the GIS Community

Appreciate Your Time!

Sponsored by the China Scholarship Council for a one-year program at UIUC, Illinois, USA. Supported by the Scientific Research Foundation of the Graduate School of Nanjing University.

Great thanks go to Craig Wilson and Greg Steinbruner for their valuable advice.

Presenter
Presentation Notes
So that is what I got. Appreciate your time!