The weather is everywhere and always. That makes for a lot of data. This talk will walk you through how you can use MongoDB to store and analyze worldwide weather data from the entire 20th century in a graphical application. We’ll discuss loading and indexing terabytes of data in a sharded cluster, and optimizing the schema design for interactive exploration. MongoDB also natively supports geospatial indexing and querying, and it integrates easily with open source visualization tools. You'll learn high-performance techniques for querying and retrieving geospatial data, and how to create a rich visual representation of global weather data using Python, Monary, and Matplotlib.
A. Jesse Jiryu Davis
“The Weather of the Century”: Data Visualization With MongoDB And Python
Senior Engineer, MongoDB
@jessejiryudavis
Serious MongoDB Talk
Database
Serious MongoDB Talk
This Talk
Where’s the data from?
How Much Is There?
• 2.5 billion documents
• 4 TB (about 1.6 KB per document)
• “Medium data”
What Does It Look Like?
0303725053947282013060322517+40779-073969FM-15+0048KNYC V0309999C00005030485MN0080475N5+02115+02005100975 ADDAA101000095AU100001015AW1105GA1025+016765999GA2045+024385999 GA3075+030485999GD11991+0167659GD22991+0243859GD33991+0304859...
{
  "st" : "u725053",
  "ts" : ISODate("2013-06-03T22:51:00Z"),
  "airTemperature" : {
    "value" : 21.1,
    "quality" : "5"
  },
  "atmosphericPressure" : {
    "value" : 1009.7,
    "quality" : "5"
  }
}
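The mapping from the raw record to the document can be sketched in a few lines. This is a minimal illustration, not the loader used in the talk: the character offsets are read off the sample record itself (they appear to match the fixed-width control section of NOAA's Integrated Surface Data format, but treat that as an assumption), `parse_header` is a hypothetical helper name, and only the fixed-width header is parsed, not the variable-length tail that carries temperature and pressure.

```python
from datetime import datetime, timezone

# Fixed-width header of the sample record (variable-length tail omitted).
RAW_HEADER = "0303725053947282013060322517+40779-073969FM-15+0048KNYC "


def parse_header(line):
    """Slice station, timestamp, and position out of one raw record.

    The offsets are an assumption inferred from the sample record.
    """
    timestamp = datetime.strptime(line[15:27], "%Y%m%d%H%M")
    return {
        # The "u" prefix mirrors the "st" value in the example document.
        "st": "u" + line[4:10],
        "ts": timestamp.replace(tzinfo=timezone.utc),
        "position": {
            "type": "Point",
            # GeoJSON order is [longitude, latitude]; the raw values are
            # signed integers in thousandths of a degree.
            "coordinates": [int(line[34:41]) / 1000, int(line[28:34]) / 1000],
        },
    }


doc = parse_header(RAW_HEADER)
print(doc["st"], doc["ts"].isoformat(), doc["position"]["coordinates"])
```

The station and timestamp recovered this way agree with the `st` and `ts` fields of the example document.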
Station Identifier (»NYC Central Park«)
{
  ts: ISODate("1991-01-01T00:00:00Z"),
  position: {
    type: "Point",
    coordinates: [
      -94.6,
      39.117
    ]
  },
  airTemperature: {
    value: 27,
    quality: "1"
  }
}
GeoJSON
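Storing `position` as a GeoJSON point is what lets MongoDB index and query it geospatially. The sketch below shows one way that could look. The helper names `ensure_geo_index` and `near_query` and the 50 km radius are illustrative assumptions, and `collection` stands for any PyMongo collection object, so the block itself needs no database connection.

```python
def ensure_geo_index(collection):
    """Create a 2dsphere index on `position` (the GeoJSON field)."""
    # "2dsphere" is the index type MongoDB uses for GeoJSON geometry.
    return collection.create_index([("position", "2dsphere")])


def near_query(lon, lat, max_meters):
    """Build a filter for documents within max_meters of a point."""
    return {
        "position": {
            "$near": {
                "$geometry": {"type": "Point", "coordinates": [lon, lat]},
                "$maxDistance": max_meters,
            }
        }
    }


# Stations within 50 km of the example document's coordinates.
query = near_query(-94.6, 39.117, 50_000)
print(query)
```

With a live collection `coll`, this would be used as `ensure_geo_index(coll)` once, then `coll.find(query)`.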
Visualization
Visualization Pipeline
MongoDB → PyMongo → Python dicts → NumPy → Matplotlib
SciPy
{
  ts: ISODate("1991-01-01T00:00:00Z"),
  position: {
    type: "Point",
    coordinates: [
      -94.6,
      39.117
    ]
  },
  airTemperature: {
    value: 45,
    quality: "1"
  }
}
import numpy
import pymongo

data = []
db = pymongo.MongoClient().my_database

# Collect (longitude, latitude, temperature) from each matching document.
for doc in db.collection.find(query):
    data.append((
        doc['position']['coordinates'][0],
        doc['position']['coordinates'][1],
        doc['airTemperature']['value']))

arrays = numpy.array(data)
# NumPy column access syntax.
lons = arrays[:, 0]
lats = arrays[:, 1]
temps = arrays[:, 2]
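The slides never show the `query` that is passed to `find`. One plausible shape, assuming the goal is a single global snapshot, is a one-hour window on the `ts` field. The helper name `hour_query` and the choice of a one-hour window are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def hour_query(year, month, day, hour):
    """Match observations with start <= ts < start + 1 hour (UTC)."""
    start = datetime(year, month, day, hour, tzinfo=timezone.utc)
    return {"ts": {"$gte": start, "$lt": start + timedelta(hours=1)}}


# The hour that contains the example documents' timestamp.
query = hour_query(1991, 1, 1, 0)
print(query)
```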
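from scipy.interpolate import griddata
from matplotlib import pyplot

xs = numpy.linspace(-180, 180, 361)
ys = numpy.linspace(-90, 90, 181)
# griddata takes the sample points as (x, y) = (lon, lat) and the
# target grid as 2-D coordinate arrays.
grid_x, grid_y = numpy.meshgrid(xs, ys)
zs = griddata((lons, lats), temps,
              (grid_x, grid_y),
              method='linear')

pyplot.contour(xs, ys, zs)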
Magic!
Also magic!
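from scipy.interpolate import griddata
from matplotlib import pyplot

xs = numpy.linspace(-180, 180, 361)
ys = numpy.linspace(-90, 90, 181)
# griddata takes the sample points as (x, y) = (lon, lat) and the
# target grid as 2-D coordinate arrays.
grid_x, grid_y = numpy.meshgrid(xs, ys)
zs = griddata((lons, lats), temps,
              (grid_x, grid_y),
              method='linear')

pyplot.contour(xs, ys, zs)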
Triangulation
What temperature?
Triangulation
Barycentric Interpolation
What temperature? (Readings at the triangle's corners: 53, 48, 54)
Weighted Average: 51.1
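The weighted average on this slide is barycentric interpolation: the query point's weights are its barycentric coordinates within the triangle, and the estimate is the weighted sum of the corner readings. The sketch below uses the three readings from the slide (53, 48, 54), but the corner coordinates are hypothetical because the slide's geometry is not recoverable from the transcript, so it does not reproduce the 51.1 figure.

```python
def barycentric_weights(p, a, b, c):
    """Return the weights (wa, wb, wc) of point p in triangle a, b, c."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    wa = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / det
    wb = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / det
    return wa, wb, 1.0 - wa - wb


def interpolate(p, corners, values):
    """Weighted average of the corner values at point p."""
    weights = barycentric_weights(p, *corners)
    return sum(w * v for w, v in zip(weights, values))


# Hypothetical station positions; readings taken from the slide.
corners = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
temps = [53.0, 48.0, 54.0]
centroid = (4.0 / 3.0, 1.0)
print(interpolate(centroid, corners, temps))
```

At the centroid all three weights are 1/3, so the result is the plain mean of the corner readings. At a corner the weight of that corner is 1 and the estimate equals its reading.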
Interpolation
Contours
import numpy
import pymongo

data = []
db = pymongo.MongoClient().my_database

for doc in db.collection.find(query):
    data.append((
        doc['position']['coordinates'][0],
        doc['position']['coordinates'][1],
        doc['airTemperature']['value']))

arrays = numpy.array(data)
Not terrifically fast
MongoDB-to-NumPy Performance
• Querying: 109k documents per second
• (On localhost)
• Can we go faster?
• Enter “Monary”
MongoDB → PyMongo → Python dicts → NumPy → Matplotlib
MongoDB → Monary → NumPy → Matplotlib
Monary by David Beach
import monary

monary_connection = monary.Monary()

# Each requested field comes back as its own NumPy array.
arrays = monary_connection.query(
    db='my_database',
    coll='collection',
    query=query,
    fields=[
        'position.coordinates.0',
        'position.coordinates.1',
        'airTemperature.value'],
    types=[
        'float32',
        'float32',
        'float32'])
Monary
• PyMongo: 109k documents per second
• Monary: 817k documents per second
Visualization
• Author: David Beach
• Contributors from MongoDB, Inc.: Kyle Suarez, Matt Cotter, Anna Herlihy
• Mentors: A. Jesse Jiryu Davis, Jason Carey
Monary
Recent features:
• Easy installation
• Nested field access
• Aggregation
• Python 3
Monary
• Insert, update, remove
• SSL and authentication mechanisms
• Improved API and logging
• parallelCollectionScan
Monary
Future:
• MongoDB
• Python
• Monary
• NumPy
• SciPy
• Matplotlib
Thanks
Thank you
#MongoDBWorld
A. Jesse Jiryu Davis
Senior Python Engineer, MongoDB
Presents
1. http://bit.ly/century-links
2. October MongoDB certification exams: price *= 0.8, code “MongoDBBoston20”, university.mongodb.com
3. Ask The Experts!