A. Jesse Jiryu Davis “The Weather of the Century”: Data Visualization With MongoDB And Python Senior Engineer, MongoDB @jessejiryudavis

The Weather of the Century


DESCRIPTION

The weather is everywhere and always. That makes for a lot of data. This talk will walk you through how you can use MongoDB to store and analyze worldwide weather data from the entire 20th century in a graphical application. We’ll discuss loading and indexing terabytes of data in a sharded cluster, and optimizing the schema design for interactive exploration. MongoDB also natively supports geospatial indexing and querying, and it integrates easily with open source visualization tools. You'll learn high-performance techniques for querying and retrieving geospatial data, and how to create a rich visual representation of global weather data using Python, Monary, and Matplotlib.


Page 1: The Weather of the Century

A. Jesse Jiryu Davis

“The Weather of the Century”: Data Visualization With MongoDB And Python

Senior Engineer, MongoDB
@jessejiryudavis

Page 2: The Weather of the Century

Serious MongoDB Talk

Database

Page 3: The Weather of the Century

Serious MongoDB Talk

Page 4: The Weather of the Century

This Talk

Page 5: The Weather of the Century

Where’s the data from?

Page 6: The Weather of the Century

Where’s the data from?

Page 7: The Weather of the Century

How Much Is There?

• 2.5 billion documents

• 4 TB (about 1.6 KB per document)

• “Medium data”
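The per-document figure follows from the two totals on this slide. A quick check, assuming decimal units (1 TB = 10^12 bytes), which the slide does not specify:

```python
# Sanity check of the slide's numbers; decimal terabytes are an assumption.
total_bytes = 4 * 10**12   # 4 TB
documents = 2.5 * 10**9    # 2.5 billion documents

bytes_per_document = total_bytes / documents
print(f"{bytes_per_document:.0f} bytes per document")
```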

Page 8: The Weather of the Century

What Does It Look Like?

0303725053947282013060322517+40779-073969FM-15+0048KNYC V0309999C00005030485MN0080475N5+02115+02005100975 ADDAA101000095AU100001015AW1105GA1025+016765999GA2045+024385999 GA3075+030485999GD11991+0167659GD22991+0243859GD33991+0304859...

{
  "st" : "u725053",
  "ts" : ISODate("2013-06-03T22:51:00Z"),
  "airTemperature" : {
    "value" : 21.1,
    "quality" : "5"
  },
  "atmosphericPressure" : {
    "value" : 1009.7,
    "quality" : "5"
  }
}

Station Identifier (»NYC Central Park«)
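The mapping from the raw record to the document can be sketched as fixed-width slicing. The offsets and scale factors below are inferred by lining up the sample record with the document on this slide; they are not taken from the talk's loader, and `parse_record` is a hypothetical name. The `position` field is an assumption borrowed from the GeoJSON slide.

```python
from datetime import datetime, timezone

# First 105 characters of the sample record shown on the slide.
RECORD = ("0303725053947282013060322517+40779-073969FM-15+0048KNYC "
          "V0309999C00005030485MN0080475N5+02115+02005100975")


def parse_record(line):
    """Slice a fixed-width weather record into a document-shaped dict.

    Offsets are inferred from the slide's sample, not from a format spec.
    """
    timestamp = datetime.strptime(line[15:27], "%Y%m%d%H%M")
    return {
        # The "u" prefix mirrors the station id shown on the slide.
        "st": "u" + line[4:10],
        "ts": timestamp.replace(tzinfo=timezone.utc),
        "position": {
            "type": "Point",
            # GeoJSON order is [longitude, latitude]; thousandths of a degree.
            "coordinates": [int(line[34:41]) / 1000, int(line[28:34]) / 1000],
        },
        # Stored in tenths: "+0211" becomes 21.1.
        "airTemperature": {"value": int(line[87:92]) / 10, "quality": line[92]},
        # Stored in tenths: "10097" becomes 1009.7.
        "atmosphericPressure": {"value": int(line[99:104]) / 10,
                                "quality": line[104]},
    }


doc = parse_record(RECORD)
print(doc)
```

Only the leading fixed-width part of the record is handled here; the variable-length groups after `ADD` are left out.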

Page 9: The Weather of the Century

{
  ts: ISODate("1991-01-01T00:00:00Z"),
  position: {
    type: "Point",
    coordinates: [
      -94.6,
      39.117
    ]
  },
  airTemperature: {
    value: 27,
    quality: "1"
  }
}

GeoJSON
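Storing the position as a GeoJSON Point is what lets MongoDB index it with a `2dsphere` index and answer proximity queries. A minimal sketch of such a query document, built as a plain Python dict; the helper name `near_query` and the 50 km radius are illustrative assumptions, and the commented PyMongo calls need a running server:

```python
def near_query(lon, lat, max_meters):
    """Build a $near filter on the GeoJSON `position` field."""
    return {
        "position": {
            "$near": {
                # GeoJSON order is [longitude, latitude].
                "$geometry": {"type": "Point", "coordinates": [lon, lat]},
                # Distance in meters when the field has a 2dsphere index.
                "$maxDistance": max_meters,
            }
        }
    }


query = near_query(-94.6, 39.117, 50_000)
print(query)

# With PyMongo and a running server (not executed here):
#   collection.create_index([("position", "2dsphere")])
#   for doc in collection.find(query): ...
```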

Page 10: The Weather of the Century

Visualization

Page 11: The Weather of the Century

Visualization Pipeline

MongoDB → PyMongo → Python dicts → NumPy → Matplotlib

SciPy

Page 12: The Weather of the Century

{
  ts: ISODate("1991-01-01T00:00:00Z"),
  position: {
    type: "Point",
    coordinates: [
      -94.6,
      39.117
    ]
  },
  airTemperature: {
    value: 45,
    quality: "1"
  }
}

Page 13: The Weather of the Century

import numpy
import pymongo

data = []
db = pymongo.MongoClient().my_database

for doc in db.collection.find(query):
    data.append((
        doc['position']['coordinates'][0],
        doc['position']['coordinates'][1],
        doc['airTemperature']['value']))

arrays = numpy.array(data)

Page 14: The Weather of the Century
Page 15: The Weather of the Century

# NumPy column access syntax.
lons = arrays[:, 0]
lats = arrays[:, 1]
temps = arrays[:, 2]

Page 16: The Weather of the Century

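import numpy
from scipy.interpolate import griddata
from matplotlib import pyplot

# Regular 1-degree grid: x is longitude, y is latitude.
xs = numpy.linspace(-180, 180, 361)
ys = numpy.linspace(-90, 90, 181)
grid_x, grid_y = numpy.meshgrid(xs, ys)

# Interpolate the scattered station temperatures onto the grid.
zs = griddata((lons, lats), temps,
              (grid_x, grid_y),
              method='linear')

pyplot.contour(xs, ys, zs)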

Magic!

Also magic!

Page 17: The Weather of the Century
Page 18: The Weather of the Century

from scipy.interpolate import griddata
from matplotlib import pyplot

xs = numpy.linspace(-180, 180, 361)
ys = numpy.linspace(-90, 90, 181)
grid_x, grid_y = numpy.meshgrid(xs, ys)
zs = griddata((lons, lats), temps,
              (grid_x, grid_y),
              method='linear')

pyplot.contour(xs, ys, zs)

Page 19: The Weather of the Century

Triangulation

Page 20: The Weather of the Century

Triangulation

Page 21: The Weather of the Century

What temperature?

Triangulation

Page 22: The Weather of the Century

Barycentric Interpolation

What temperature? 53

48

54

Weighted Average

51.1

Page 23: The Weather of the Century

Interpolation

51.1

Page 24: The Weather of the Century

Interpolation

Page 25: The Weather of the Century

Interpolation

Page 26: The Weather of the Century

Contours

Page 27: The Weather of the Century

Contours

Page 28: The Weather of the Century
Page 29: The Weather of the Century
Page 30: The Weather of the Century
Page 31: The Weather of the Century
Page 32: The Weather of the Century
Page 33: The Weather of the Century
Page 34: The Weather of the Century
Page 35: The Weather of the Century

import numpy
import pymongo

data = []
db = pymongo.MongoClient().my_database

for doc in db.collection.find(query):
    data.append((
        doc['position']['coordinates'][0],
        doc['position']['coordinates'][1],
        doc['airTemperature']['value']))

arrays = numpy.array(data)

Not terrifically fast

Page 36: The Weather of the Century

MongoDB-to-NumPy Performance

• Querying: 109k documents per second

• (On localhost)

• Can we go faster?

• Enter “Monary”

Page 37: The Weather of the Century

MongoDB → PyMongo → Python dicts → NumPy → Matplotlib

MongoDB → Monary → NumPy → Matplotlib

Monary by David Beach

Page 38: The Weather of the Century

import monary

monary_connection = monary.Monary()

# One array is returned per requested field.
arrays = monary_connection.query(
    db='my_database',
    coll='collection',
    query=query,
    fields=[
        'position.coordinates.0',
        'position.coordinates.1',
        'airTemperature.value'],
    types=[
        'float32',
        'float32',
        'float32'])

Page 39: The Weather of the Century

Monary

• PyMongo: 109k documents per second

• Monary: 817k documents per second
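To put the two rates side by side, here is a small calculation using only the figures quoted on the slides. The per-million timing is simple arithmetic on those rates, not a separate measurement.

```python
pymongo_rate = 109_000   # documents per second, from the slide
monary_rate = 817_000    # documents per second, from the slide

speedup = monary_rate / pymongo_rate
print(f"Monary speedup: {speedup:.1f}x")

for name, rate in [("PyMongo", pymongo_rate), ("Monary", monary_rate)]:
    print(f"{name}: {1_000_000 / rate:.1f} s per million documents")
```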

Page 40: The Weather of the Century

Visualization

Page 41: The Weather of the Century

Monary

• Author: David Beach

• Contributors from MongoDB, Inc.: Kyle Suarez, Matt Cotter, Anna Herlihy

• Mentors: A. Jesse Jiryu Davis, Jason Carey

Page 42: The Weather of the Century

Monary

Recent features:

• Easy installation

• Nested field access

• Aggregation

• Python 3

Page 43: The Weather of the Century

Monary

Future:

• Insert, update, remove

• SSL and authentication mechanisms

• Improved API and logging

• parallelCollectionScan

Page 44: The Weather of the Century

• MongoDB

• Python

• Monary

• NumPy

• SciPy

• Matplotlib

Page 45: The Weather of the Century

Thanks

Page 46: The Weather of the Century

Thank you

#MongoDBWorld

A. Jesse Jiryu Davis
Senior Python Engineer, MongoDB

Page 47: The Weather of the Century

Presents

1. http://bit.ly/century-links

2. October MongoDB certification exams: price *= 0.8, code “MongoDBBoston20”, university.mongodb.com

3. Ask The Experts!