IMPROVED DATA ANALYSIS TOOLS FOR THE THERMAL EMISSION SPECTROMETER
K. Rodriguez, J. Laura, R. Fergason, R. Bogle
What is TES
WHAT IS THIS ABOUT?
Why is TES important
Challenges in ingestion and modeling
• setup costs
• ingestion costs
• query times
Collected ~206 Million Spectra in its lifetime
Highest spectral resolution for infrared data
Ideal for analysis at global scales
519 Fields across ~206 million records
~200GB footprint
Nine tables, normalized
Two tools already exist: Vanilla and TES Data Tool
One is command-line based, the other is a web form. They get the job done, but we can always aim for improvement.
Increase productivity by reducing friction through more precise tools.
First Goal: Enable distributed analytics for the TES data set
Focus on spatio-temporal analysis. Bin data at different solar longitude (Ls) steps and at different lon/lat resolutions per pixel.
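The binning described above can be sketched with Pandas. This is a minimal illustration, not the project's actual code: the column names (`ls`, `lat`, `lon`, `thermal_inertia`), the function name `bin_observations`, and the default step sizes are assumptions chosen for the example.

```python
import numpy as np
import pandas as pd


def bin_observations(df, ls_step=10.0, deg_per_pixel=6.0, value="thermal_inertia"):
    """Average `value` inside (Ls step, latitude, longitude) bins.

    All column names are hypothetical; each bin is labeled by its lower edge.
    """
    lat_bin = np.floor((df["lat"] + 90.0) / deg_per_pixel) * deg_per_pixel - 90.0
    binned = df.assign(
        ls_bin=np.floor(df["ls"] / ls_step) * ls_step,
        # Keep lat == +90 inside the last latitude row instead of a new one.
        lat_bin=np.minimum(lat_bin, 90.0 - deg_per_pixel),
        # Wrap longitudes into [0, 360) before binning.
        lon_bin=np.floor((df["lon"] % 360.0) / deg_per_pixel) * deg_per_pixel,
    )
    return (
        binned.groupby(["ls_bin", "lat_bin", "lon_bin"])[value]
        .mean()
        .reset_index()
    )


# Made-up observations, only to exercise the function.
obs = pd.DataFrame({
    "ls": [171.0, 174.5, 183.0, 171.2],
    "lat": [1.0, 2.5, 1.0, -40.0],
    "lon": [10.0, 11.0, 10.0, 200.0],
    "thermal_inertia": [200.0, 300.0, 250.0, 100.0],
})
grid = bin_observations(obs)
print(grid)
```

Each row of `grid` is one map pixel for one Ls step, so changing `ls_step` or `deg_per_pixel` yields maps at a different temporal or spatial resolution.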
The Databases.
MONGODB
Popular document-based database. Has built-in distributed support.
POSTGRESQL
Traditional RDBMS store. Uses PostGIS for supporting spatial queries.
CASSANDRA
Popular columnar and key/value store. Works best for sparse data access.
The Data Center.
APACHE SPARK
Distributed computing engine for large-scale data volumes. Supports most databases.
DC/OS
Datacenter Operating System. Allows for the containerization of services like Spark over a pool of resources.
DC/OS Running with Cassandra and PostgreSQL
Progress so far…
Ingestion code is publicly available via the Python library plio on GitHub.
Reads in the original binaries as Pandas data frames. Scripts for loading files into Postgres and MongoDB will also be available soon.
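As a rough idea of what a loading script might do once the binaries are in a Pandas data frame, the sketch below turns a frame into batches of plain-dict documents. It does not use plio or a live database: the frame is a stand-in, and the column names and the helper `frame_to_documents` are hypothetical.

```python
import pandas as pd


def frame_to_documents(df, chunk_size=2):
    """Yield lists of plain-dict records, one list per chunk of rows."""
    for start in range(0, len(df), chunk_size):
        chunk = df.iloc[start:start + chunk_size]
        yield chunk.to_dict(orient="records")


# Stand-in for a data frame read from the original binaries
# (hypothetical column names and values).
frame = pd.DataFrame({
    "orbit": [1583, 1583, 1584],
    "ls": [171.0, 171.1, 171.3],
    "lat": [1.0, 1.2, -40.0],
    "lon": [10.0, 10.1, 200.0],
})

batches = list(frame_to_documents(frame))
# With pymongo, each batch could be passed to Collection.insert_many(batch);
# that step needs a running MongoDB server and is left out here.
print(len(batches), "batches")
```

Chunking keeps memory use bounded when a frame holds millions of records; the chunk size of 2 is only for the demonstration.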
Maps Created.
Mars Thermal Inertia from MongoDB. Year 25 at 6 lat/lon per pixel during Solar Longitudes (Ls) 170-220 at 10 Ls per step.
Future Work.
Maps are nice… What about analytics?
DATA MINING
Start to apply spatio-temporal data mining techniques to look for outliers in the data (Anomaly Detection).
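One simple starting point for outlier detection, offered only as an illustration and not as the planned method, is a robust z-score based on the median absolute deviation (MAD). The sketch assumes the input is a set of binned values, for example one map pixel across several Ls steps; the function name, the sample values, and the 3.5 threshold are assumptions.

```python
import numpy as np


def mad_outliers(values, threshold=3.5):
    """Flag values whose MAD-based modified z-score exceeds `threshold`."""
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    mad = np.median(np.abs(values - median))
    if mad == 0:
        # No spread around the median: nothing can be scored.
        return np.zeros(values.shape, dtype=bool)
    # 0.6745 scales the MAD to match a standard deviation for normal data.
    modified_z = 0.6745 * (values - median) / mad
    return np.abs(modified_z) > threshold


# Made-up binned values with one obvious outlier at the end.
cell_means = [210.0, 205.0, 198.0, 202.0, 207.0, 950.0]
flags = mad_outliers(cell_means)
print(flags)
```

The median and MAD are used instead of the mean and standard deviation because a single extreme value barely shifts them, so the outlier cannot mask itself.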
MORE DISTRIBUTION
Complete the distributed Cassandra and MongoDB deployments. Make Spark distributed.