Tools of the Data Smithe’s Trade Joe Smithe, Tim Hunter, Tad Slawecki, Steve Ruberg

Smithe’s Trade Tools of the Datajoeseph/docs/IAGLR2016IJCTalk.pdf · CUAHSI Hydroserver, THREDDS, MapServer, GeoServer, and Deegree implement above Web services (accessibility)




Before we begin, a thank you:

Drew Gronewold, Tim Hunter, Steve Ruberg, Ron Muzzi, more…

Special thanks to the IJC for the invite


Recommendations to the IJC

● The end point: storage, access, analysis, presentation
  ○ Products of sensor technology infrastructure
  ○ Data from sensors to users, decision makers, etc.
● Some old technologies are fine
● Some new technologies are begging to be adopted
● Do what is socially sustainable and secure
  ○ Account for the retiring generations and the up-and-coming working ones
  ○ Adopt technologies with support from many people
    ■ Fair chance of hackers, greater chance of good programmers who can fix things fast


Labyrinths of data, hard to get around...


http://www.ibmbigdatahub.com/infographic/four-vs-big-data


http://s382.photobucket.com/user/Gandalf-lotr/media/Gandalfsfirework.jpg.html


http://corecanvas.s3.amazonaws.com/theonering-0188db0e/gallery/original/pippinmerry011128a.jpg


http://iihtofficialblog.blogspot.com/2014/07/5-vs-of-hadoop-big-data.html


Overview of Infrastructure Technology

DISCLAIMER: I HAVE NOT WORKED WITH ALL OF THESE TECHNOLOGIES. THIS IS MERELY A CATALOG OF TOOLS TO DISCUSS.


Overview

1. Target Platforms
2. Data Storage Formats
3. Data Management
4. Model Coupling/Combining
5. Probabilistic Modelling
6. Distributed processing
7. Modelling Services, Processing, Presentation
8. Visualization and interaction


Target Platforms

● Desktop
  ○ MS Windows
    ■ .NET
    ■ OneCore
  ○ Apple
    ■ OS X and Xcode
  ○ *nix
    ■ Various (Linux)
● Mobile or Tablet
  ○ Win Phone
  ○ Apple
    ■ iOS
  ○ Android
● Web
  ○ Microsoft
    ■ ASP.NET
  ○ Linux
    ■ LAMP
  ○ Other
    ■ WordPress
    ■ Drupal
    ■ Many more


Storage formats

● Plain text
  ○ “Future proof”
  ○ Growth can prove challenging
  ○ Examples: XML, WaterML, [other]ML, CSV
● Binary
  ○ Computers eat this stuff up, but humans don’t. Good to have transformers to create downloadable and ingestible copies
  ○ Examples: GRIB, NetCDF
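A minimal sketch of the kind of "transformer" mentioned above, turning binary records into a CSV copy that people can download and read. The record layout (station id, Unix timestamp, water temperature) is a made-up assumption for illustration; it is not GRIB or NetCDF, which need their own libraries.

```python
import csv
import io
import struct

# Hypothetical fixed-width record (an assumption, not a real format):
# station id (int32), Unix timestamp (int64), water temperature in C (float32).
RECORD = struct.Struct("<iqf")


def binary_to_csv(blob: bytes) -> str:
    """Turn packed binary records into human-readable CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["station_id", "unix_time", "water_temp_c"])
    for station_id, unix_time, temp in RECORD.iter_unpack(blob):
        writer.writerow([station_id, unix_time, f"{temp:.2f}"])
    return out.getvalue()


# Two made-up observations, packed the way a data logger might store them.
blob = RECORD.pack(45007, 1464782400, 12.5) + RECORD.pack(45007, 1464786000, 12.75)
csv_text = binary_to_csv(blob)
print(csv_text)
```

The binary copy stays the authoritative one; the CSV is a derived, disposable view of it.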

BluePenguino - Photobucket
http://culturepopped.blogspot.com/2014/12/the-legends-of-pac-man.html


Data management

● Data provenance (origin) - copies aren’t great, and version control systems offer limited help. Authoritative sources, and citations to them, mitigate noise and stray copies.
● Structured directories, even on the web
● Relational Database Management Systems (RDBMSs)
  ○ PostgreSQL (recommended), MySQL, SQLite
    ■ http://ask.metafilter.com/92162/MySQL-vs-PostgreSQL
  ○ Big Data - NoSQL, SciDB
  ○ Geospatial - PostGIS, SpatiaLite, MySQL Spatial
    ■ CUAHSI HydroServer, THREDDS, MapServer, GeoServer, and Deegree implement the above
    ■ Web services (accessibility)
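One way to keep provenance next to the data in a relational database, sketched with SQLite from the Python standard library. The table names, column names, station, and citation string are all hypothetical; PostgreSQL would take a similar schema.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE source (
        id       INTEGER PRIMARY KEY,
        name     TEXT NOT NULL,
        citation TEXT NOT NULL      -- how to cite the authoritative copy
    );
    CREATE TABLE observation (
        id            INTEGER PRIMARY KEY,
        source_id     INTEGER NOT NULL REFERENCES source(id),
        station       TEXT NOT NULL,
        observed_at   TEXT NOT NULL,
        water_level_m REAL NOT NULL
    );
    """
)

# Hypothetical source and observations, for illustration only.
conn.execute(
    "INSERT INTO source (id, name, citation) VALUES (1, ?, ?)",
    ("Example gauge network", "Hypothetical citation string"),
)
conn.executemany(
    "INSERT INTO observation (source_id, station, observed_at, water_level_m) "
    "VALUES (?, ?, ?, ?)",
    [(1, "station_a", "2016-06-01", 176.50), (1, "station_a", "2016-06-02", 176.52)],
)

# Every value returned carries the citation of where it came from.
rows = conn.execute(
    """
    SELECT o.station, o.observed_at, o.water_level_m, s.citation
    FROM observation AS o
    JOIN source AS s ON s.id = o.source_id
    ORDER BY o.observed_at
    """
).fetchall()
for row in rows:
    print(row)
```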


Data management - new tech to adopt

● GRAPH DATABASES
  ○ Fund them
  ○ Power++
    ■ Use the power of graphs to explore relationships between data points
    ■ Understand and investigate many-to-many, one-to-many, and many-to-one relationships with ease
  ○ http://cyanohub.earth.lsa.umich.edu/
  ○ For more: http://neo4j.com/developer/graph-db-vs-rdbms/ and http://mashable.com/2012/09/26/graph-databases/
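To give a feel for what "exploring relationships" means, here is a small sketch using a plain Python dictionary as the graph. It is not a graph database and does not use Neo4j or any other product's API; the sensors, agencies, and relationship labels are invented for illustration.

```python
from collections import defaultdict, deque

# Hypothetical (subject, relationship, object) triples.
edges = [
    ("sensor_1", "MEASURES", "lake_erie"),
    ("sensor_2", "MEASURES", "lake_erie"),
    ("sensor_2", "MEASURES", "lake_huron"),
    ("agency_x", "OPERATES", "sensor_1"),
    ("agency_y", "OPERATES", "sensor_2"),
]

# Undirected adjacency sets: many-to-many links need no join tables.
graph = defaultdict(set)
for src, _label, dst in edges:
    graph[src].add(dst)
    graph[dst].add(src)


def within_hops(start, max_hops):
    """Breadth-first search: everything reachable from start in <= max_hops."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen[neighbour] = seen[node] + 1
                queue.append(neighbour)
    seen.pop(start)
    return seen


# Who and what is connected to Lake Erie within two hops?
related = within_hops("lake_erie", 2)
print(sorted(related.items()))
```

In a relational schema the same question takes a join per hop; in a graph store it is a single traversal.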


Model coupling or combining

● Java-based Object Modelling System
● OpenMI (Open Modelling Interface, C# and Java)
  ○ GUIs - OpenMI Configuration Editor, Pipistrelle

A lot of specialized models focus on limited domains; by coupling them, we can attain a modelling domain that spans current problems.
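A toy sketch of the coupling idea: two single-purpose models share a common `step` interface, and the output of one becomes the input of the next. This is not the OpenMI or Object Modelling System API; the class names, variables, and numbers are assumptions made up for illustration.

```python
from typing import Protocol


class Component(Protocol):
    """Anything with a step() that reads and returns named quantities."""

    def step(self, inputs: dict) -> dict: ...


class RunoffModel:
    """Toy land model: a fixed fraction of precipitation becomes runoff."""

    def __init__(self, runoff_fraction: float) -> None:
        self.runoff_fraction = runoff_fraction

    def step(self, inputs: dict) -> dict:
        return {"runoff_mm": inputs["precip_mm"] * self.runoff_fraction}


class LakeLevelModel:
    """Toy lake model: level rises with runoff and falls with evaporation."""

    def __init__(self, level_mm: float) -> None:
        self.level_mm = level_mm

    def step(self, inputs: dict) -> dict:
        self.level_mm += inputs["runoff_mm"] - inputs["evap_mm"]
        return {"level_mm": self.level_mm}


def run_coupled(components: list, forcing: list) -> list:
    """Run the components in order each time step, passing outputs along."""
    history = []
    for forcing_step in forcing:
        state = dict(forcing_step)
        for component in components:
            state.update(component.step(state))
        history.append(state)
    return history


# Hypothetical forcing data for two time steps.
forcing = [
    {"precip_mm": 10.0, "evap_mm": 2.0},
    {"precip_mm": 0.0, "evap_mm": 3.0},
]
history = run_coupled([RunoffModel(0.5), LakeLevelModel(1000.0)], forcing)
print([step["level_mm"] for step in history])
```

Neither model knows about the other; the shared interface is what lets the combined domain span both.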


Probabilistic Modelling

● Bayesian hierarchical modelling is becoming a very popular approach in many problems where estimates are many but conclusions are few or divergent
  ○ JAGS
  ○ Stan
● Cha, Y. and C.A. Stow. 2014. A Bayesian network incorporating observation error to predict phosphorus and chlorophyll a in Saginaw Bay. Environmental Modelling & Software, 57: 90-100.
● Gronewold, A.D., J. Bruxer, D. Durnford, J. Smith, A. Clites, F. Seglenieks, T. Hunter, S. Qian, V. Fortin (Accepted, 2016). Hydrological drivers of record-setting water level rise on Earth’s largest lake system. Water Resources Research.
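The simplest relative of this idea can be sketched in a few lines: several independent estimates of one quantity, each with a known standard error, pooled under a flat prior. This is deliberately not a hierarchical model and does not use JAGS or Stan; it only shows how Bayesian pooling weights each estimate by its precision. The numbers are made up.

```python
import math


def pool_normal_estimates(estimates):
    """Combine independent (mean, std_dev) estimates of the same quantity.

    With a flat prior and normal errors of known size, the posterior is
    normal with a precision-weighted mean.
    """
    precisions = [1.0 / sd**2 for _, sd in estimates]
    total_precision = sum(precisions)
    mean = sum(m * p for (m, _), p in zip(estimates, precisions)) / total_precision
    return mean, math.sqrt(1.0 / total_precision)


# Three hypothetical estimates of the same quantity: (mean, std_dev).
estimates = [(10.0, 1.0), (12.0, 2.0), (11.0, 1.0)]
posterior_mean, posterior_sd = pool_normal_estimates(estimates)
print(round(posterior_mean, 3), round(posterior_sd, 3))
```

The noisier estimate (standard deviation 2.0) pulls the answer less than the other two, and the pooled uncertainty is smaller than any single one. A hierarchical model goes further by also estimating how much the sources themselves disagree.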


Distributed processing

● High Performance Computers (HPCs, formerly supercomputers)
● MapReduce (key/value pairs as input)
  ○ programming model, similar to the Message Passing Interface (MPI)
  ○ scalable
  ○ reputable fault tolerance (robust)
    ■ Apache Hadoop (an implementation)
    ■ R and Hadoop Integrated Programming Environment (RHIPE)
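The MapReduce programming model itself fits in a few lines of plain Python. This single-process sketch is not Hadoop or RHIPE and distributes nothing; it only shows the map, shuffle, and reduce phases over key/value pairs, using made-up station temperatures.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical input records: (station, water temperature in C).
records = [
    ("station_a", 12.5),
    ("station_b", 9.0),
    ("station_a", 14.0),
    ("station_b", 11.5),
]


def map_phase(record):
    """Emit key/value pairs; here one pair per record."""
    station, temp = record
    yield station, temp


def shuffle(pairs):
    """Group values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values for one key; here, the maximum temperature."""
    return key, reduce(max, values)


pairs = [pair for record in records for pair in map_phase(record)]
result = dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
print(result)
```

Because each map call and each per-key reduce call is independent, a real implementation can spread them across many machines and rerun any that fail.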


Modelling Services, Processing, Presentation

● Matlab, R, Python (Anaconda distribution), assisted with shell scripting
  ○ http://www.talyarkoni.org/blog/2013/11/18/the-homogenization-of-scientific-computing-or-why-python-is-steadily-eating-other-languages-lunch/
● Julia
● Web Development
  ○ PHP, JavaScript (and packages, more later)
  ○ Frameworks under Java, Python, Ruby on Rails
  ○ *.NET Frameworks (Microsoft)
  ○ Backbone.js, Django
  ○ Content Management Systems (CMSs) such as Drupal, CKAN


Fireworks (Visualization)

● Often cast as the data themselves...
● JavaScript Packages: jqPlot, Flot, Processing (language), Raphaël, D3 (successor to Protovis), Google Charts, and Dygraphs
● Apache Flex
● Mapping: OpenLayers, Google Earth/Maps
● Interfaces: CUAHSI HydroShare, QGIS (like ArcGIS), uDig
● Desktop plotting packages:
  ○ R: ggplot2, ggvis, rgl, and default packages
  ○ Python: Matplotlib, Plotly, Pychart...
    ■ https://wiki.python.org/moin/NumericAndScientific/Plotting


jpTheSmithe.com


Relevant parchments:

All from Environmental Modelling & Software:

● Web technologies for environmental big data (Open Access), Vitolo et al. (2015)
● Web based visualization of large climate data sets, J. R. Alder and S.W. Hostetler (2015)
● A review of open source software solutions for developing water resources web applications, Swain et al. (2015)

And we’ll probably do this again in 5-10 years... next year!
