LaTiS
https://github.com/dlindhol/LaTiS
Doug Lindholm
Laboratory for Atmospheric and Space Physics, University of Colorado Boulder
ESIP – July 8, 2014


Page 1: LaTiS https:// github/dlindhol/LaTiS

LaTiS
https://github.com/dlindhol/LaTiS

Doug Lindholm
Laboratory for Atmospheric and Space Physics
University of Colorado Boulder
ESIP – July 8, 2014

Page 2: LaTiS https:// github/dlindhol/LaTiS

Motivation - Get Data Into Analysis Code/Tools

Disparate Data

Unified Interface

Page 3: LaTiS https:// github/dlindhol/LaTiS

LaTiS Server Architecture

[Diagram] Native Data, Descriptors, Adapters, Filters, Writers, Client Applications, built around the LaTiS Data Model

• Descriptors: TSML
• Adapters: ASCII, Binary, JDBC, FITS, Web Service, Custom
• Filters: Subset, Constrain (sst > 20), Convert Units, Missing Values, Derived Products, Custom
• Writers: JSON, DAP2, Image, code snippet, CSV, Custom
• Client Applications: Web Browser, Excel, Analysis Tools, Programs, Web Service

Page 4: LaTiS https:// github/dlindhol/LaTiS

LaTiS Client Options

• Any OPeNDAP client. Available for most programming languages (Python, IDL, Matlab, ...).
• Analysis/visualization tools with built-in OPeNDAP support.
• Web browser: directly enter an HTTP URL query.
• wget, curl: command-line tools for making an HTTP request.
• Custom web applications (open source coming soon) that make AJAX requests to LaTiS to get JSON output and make interactive plots.
• Custom programming APIs that wrap a LaTiS call.
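
As a rough illustration of the last option, the sketch below wraps a LaTiS request in a small Python helper. The host, the dataset name, and the helper `latis_url` are hypothetical. The query follows the general DAP2 constraint-expression shape (a comma-separated projection followed by `&`-separated selections), and the suffix is assumed to select the output format. The exact syntax a given LaTiS server accepts should be checked against its documentation.

```python
from urllib.parse import quote

def latis_url(base, dataset, suffix, variables=(), selections=()):
    """Build a DAP2-style request URL (hypothetical helper, not part of LaTiS)."""
    projection = ",".join(variables)
    # Keep comparison operators and commas readable; escape everything else.
    parts = [projection] + [quote(s, safe="<>=!~,") for s in selections]
    query = "&".join(p for p in parts if p)
    url = f"{base}/{dataset}.{suffix}"
    return f"{url}?{query}" if query else url

# Hypothetical server and dataset names, for illustration only.
url = latis_url("http://localhost:8080/latis", "Sunspot_Number", "json",
                variables=["time", "ssn"], selections=["ssn>100"])
print(url)
```

The same string could be pasted into a browser or passed to wget or curl, which is why a single URL convention covers several of the client options above.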

Page 5: LaTiS https:// github/dlindhol/LaTiS

Related Technology Comparisons
• OPeNDAP
  – Both implement DAP2 protocol (standard service API)
  – OPeNDAP servers tend to be file centric
  – LaTiS presents “virtual” dataset via aggregation
  – LaTiS aims to be easier to install, configure, and extend
• NetCDF Common Data Model (CDM)
  – Multidimensional array centric
  – Coupled to NetCDF file format
  – Climate and forecast model (simulation) emphasis
• THREDDS Data Server
  – Built around NetCDF CDM
  – Provides OPeNDAP and other service interfaces
• TSDS
  – First generation of LaTiS built on NetCDF CDM
• VisAD
  – Essentially the same logical data model as LaTiS with a clunkier implementation based on old Java capabilities
  – LaTiS is implemented around modern paradigms like Functional Programming

Page 6: LaTiS https:// github/dlindhol/LaTiS

What do I mean by Data Model
• NOT a simulation or forecast (climate model)
• NOT a metadata model (ISO 19115)
• NOT a file format (NetCDF)
• NOT how the data are stored (RDBMS)
• NOT the representation in computer memory (data structure)

• Logical model
• What the data represent, conceptually
• How the data are used

Page 7: LaTiS https:// github/dlindhol/LaTiS

Data Abstractions

bits: 10110101000001001111001100110011111110

bytes: 00105e0 e6b0 343b 9c74 0804 e7bc 0804 e7d5 0804

int, long, float, double, scientific notation (Number): 1, -506376193, 13.52, 0.177483826523, 1.02e-14

array: 1.2 3.6 2.4 1.7 -3.2

Page 8: LaTiS https:// github/dlindhol/LaTiS

Scientific Data Abstractions

Multi-dimensional Arrays

Key Features:
- Single data type
- Access by index

Page 9: LaTiS https:// github/dlindhol/LaTiS

Relational Data

Relational Database

Table = Relation
Row = Tuple of Attributes, e.g. (0, 3.5, B)

Key Features:
- Supports different data types
- Well suited for access by value, e.g. time>2, class=A

But the relation is limited to a sequence of tuples:

time flux class

0 3.5 B

1 4.6 A

2 4.7 A

3 4.1 A

4 3.2 B
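
To make the contrast between access by index and access by value concrete, here is a minimal Python sketch that uses the table above as a plain list of tuples. The variable names are illustrative only and nothing here is LaTiS code.

```python
# The table from the slide, as a sequence of (time, flux, class) tuples.
rows = [(0, 3.5, "B"), (1, 4.6, "A"), (2, 4.7, "A"), (3, 4.1, "A"), (4, 3.2, "B")]

# Array-style access by index: the third row.
third = rows[2]

# Relational-style access by value: time > 2, and class = A.
late = [r for r in rows if r[0] > 2]
class_a = [r for r in rows if r[2] == "A"]

print(third)
print(late)
print(class_a)
```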

Page 10: LaTiS https:// github/dlindhol/LaTiS

LaTiS Unified Data Model
• Extends the Relational Model to add Functional relationships.
• Represents multi-dimensional domain of data grids.
• Access by value or index.

Example: time series of gridded surface winds

Time -> ((Lon, Lat) -> (U,V))

• Time: Independent Variable (domain)
• (Lon, Lat): Independent Variable
• (U,V): Dependent Variables (range)
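
The nested mapping Time -> ((Lon, Lat) -> (U,V)) can be pictured with ordinary nested dictionaries. This is only a sketch of the logical structure, with made-up sample values; it says nothing about how LaTiS stores or evaluates such a dataset.

```python
# Time -> ((Lon, Lat) -> (U, V)), with made-up sample values.
winds = {
    0: {(0.0, 10.0): (1.5, -0.5), (5.0, 10.0): (2.0, 0.0)},
    1: {(0.0, 10.0): (1.0, -1.0), (5.0, 10.0): (2.5, 0.5)},
}

# Access by value: evaluate the outer function at a time, then the inner one at a point.
u, v = winds[1][(5.0, 10.0)]

# Access by index: first time sample, second grid point (dicts keep insertion order).
first_time = list(winds)[0]
second_point = list(winds[first_time])[1]
by_index = winds[first_time][second_point]

print((u, v), by_index)
```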

Page 11: LaTiS https:// github/dlindhol/LaTiS

LaTiS Data Model
Only Three Variable Types:
  Scalar: single Variable
  Tuple: group of Variables
  Function: mapping from one Variable to another

Extend to capture higher-level, domain-specific abstractions
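
One way to picture the three variable types is as three small immutable classes. The sketch below is written in Python for brevity and is an assumption about shape only: the class fields and the lookup method are hypothetical and do not mirror the actual LaTiS (Scala) API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Scalar:
    """A single named value."""
    name: str
    value: object

@dataclass(frozen=True)
class Tuple:
    """A group of Variables."""
    variables: tuple

@dataclass(frozen=True)
class Function:
    """A mapping from one Variable (domain) to another (range)."""
    samples: tuple  # ordered (domain, range) pairs

    def __call__(self, domain):
        # Access by value: return the range Variable paired with this domain Variable.
        for d, r in self.samples:
            if d == domain:
                return r
        raise KeyError(domain)

def sample(t, flux, cls):
    return (Scalar("time", t), Tuple((Scalar("flux", flux), Scalar("class", cls))))

# time -> (flux, class), using two rows of the relational example.
series = Function((sample(0, 3.5, "B"), sample(1, 4.6, "A")))
result = series(Scalar("time", 1))
print(result.variables[0].value)
```

A domain-specific abstraction such as a time series or a spectrum would then be a particular arrangement of these three types rather than a new primitive.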

Page 12: LaTiS https:// github/dlindhol/LaTiS

Discipline Agnostic Data Access with LaTiS

Philosophy:
  Leave data in their native form
  Expose via a common interface
Software:
• Reusable adapters (software modules) to read common formats, extension points for custom formats
• XML dataset descriptors, map native data model to the LaTiS data model
• Open Source, community
Web services:
• Standard service interfaces, currently OPeNDAP
• Server side processing and output format options

Page 13: LaTiS https:// github/dlindhol/LaTiS

Implementing the Data Model

• The LaTiS Data Model is an abstract representation
• Can be represented several ways
  – UML
  – VisAD grammar
  – Java Interface (no implementation)
• Need an implementation in code
• Scientific data Domain Specific Language (DSL)
  – Expose an API that fits the application domain
• Scala programming language
  – http://www.scala-lang.org/

Page 14: LaTiS https:// github/dlindhol/LaTiS

Why Scala
• Evolution of Java
  – Use with existing Java code
  – Runs on the Java Virtual Machine (JVM)
  – Command line (REPL), script, or compiled
  – Statically typed (safer than dynamic languages)
  – Industrial strength (Twitter, LinkedIn, …)
• Object-Oriented
  – Encapsulation, polymorphism, …
  – Traits: interfaces with implementation, multiple inheritance, mix-ins
• Functional Programming
  – Immutable data structures
  – Functions with no side effects
  – Provable, parallelizable
• Syntactic sugar for Domain Specific Languages
• Operator “overloading”, natural math language for Variables
• Parallel collections

Page 15: LaTiS https:// github/dlindhol/LaTiS

Scala Implementation
• Dataset as a Scala collection
• Functional Programming Paradigms:
  – Function composition over object manipulation
  – Functions as first class citizens
    • a LaTiS Function can be used like a programming function
  – Immutable data structures
  – No side-effects: parallelizable, provable
  – Lazy evaluation: scalable
• Math and resampling mixed in
  – e.g. dataset3 = (dataset1 + dataset2) / 2
• Metadata encapsulated
  – enforce data consistency: unit conversions ...
  – track provenance
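
The expression dataset3 = (dataset1 + dataset2) / 2 combines operator overloading, immutability, and lazy evaluation. The sketch below shows one way those ideas fit together, written in Python rather than Scala. The `Dataset` class here is purely illustrative and is not the LaTiS class; it assumes both datasets share the same time samples and does not attempt resampling.

```python
class Dataset:
    """Immutable time series whose samples are produced lazily (illustrative only)."""

    def __init__(self, make_samples):
        # Zero-argument callable that returns a fresh iterator of (time, value) pairs.
        self._make_samples = make_samples

    @classmethod
    def of(cls, pairs):
        frozen = tuple(pairs)
        return cls(lambda: iter(frozen))

    def __iter__(self):
        return self._make_samples()

    def __add__(self, other):
        def samples():
            for (t1, a), (t2, b) in zip(self, other):
                if t1 != t2:
                    raise ValueError("domains differ; resampling is not sketched here")
                yield (t1, a + b)
        return Dataset(samples)

    def __truediv__(self, scalar):
        return Dataset(lambda: ((t, a / scalar) for t, a in self))

dataset1 = Dataset.of([(0, 3.5), (1, 4.5)])
dataset2 = Dataset.of([(0, 4.5), (1, 5.5)])
dataset3 = (dataset1 + dataset2) / 2   # nothing is computed until iteration
print(list(dataset3))
```

Because each operation returns a new object and defers the arithmetic until the samples are iterated, the inputs are never modified and long series need not be held in memory at once.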

Page 16: LaTiS https:// github/dlindhol/LaTiS

LaTiS Server Implementation
• RESTful web service API (OPeNDAP +)
• Java Servlet, build and deploy war file
• XML dataset descriptor (TSML) for each dataset
  – Specify Adapter to use
  – Map native data source to LaTiS data model
  – Define transformations as Processing Instructions
• Catalog to map dataset names to TSML
• Plugins: implement the Adapter, Filter or Writer interfaces or extend existing ones
• Properties file to map filter and writer names to implementing classes
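
The adapter, filter, and writer roles, and the idea of looking up filters and writers by name, can be sketched as a small pipeline. Everything below is a hypothetical Python stand-in: the function names, the two dictionaries that play the part of the properties file, and the `serve` function are assumptions for illustration, not the LaTiS interfaces.

```python
import json

def ascii_adapter(lines):
    """Adapter: turn whitespace-delimited text into (time, value) samples."""
    for line in lines:
        t, v = line.split()
        yield (int(t), float(v))

def stride(n):
    """Filter factory: keep every n-th sample."""
    return lambda samples: (s for i, s in enumerate(samples) if i % n == 0)

def csv_writer(samples):
    return "\n".join(f"{t},{v}" for t, v in samples)

def json_writer(samples):
    return json.dumps([list(s) for s in samples])

# Stand-ins for the properties file: names mapped to implementations.
FILTERS = {"stride": stride}
WRITERS = {"csv": csv_writer, "json": json_writer}

def serve(lines, filter_specs, suffix):
    """Read native data, apply the named filters in order, write the named format."""
    samples = ascii_adapter(lines)
    for name, arg in filter_specs:
        samples = FILTERS[name](arg)(samples)
    return WRITERS[suffix](samples)

native = ["0 3.5", "1 4.6", "2 4.7", "3 4.1", "4 3.2"]
print(serve(native, [("stride", 2)], "csv"))
```

Adding a new output format in this sketch means registering one more entry in `WRITERS`, which is the kind of extension the plugin and properties-file bullets describe.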

Page 17: LaTiS https:// github/dlindhol/LaTiS

Example – Serving an ASCII File

Sunspot data for October 2003

2003 10 01 75
2003 10 02 72
2003 10 03 59
2003 10 04 60
2003 10 05 53
2003 10 06 51
2003 10 07 50
2003 10 08 56
2003 10 09 58
2003 10 10 50
2003 10 11 44
2003 10 12 22
2003 10 13 12
2003 10 14 4
2003 10 15 17
2003 10 16 24
2003 10 17 37
2003 10 18 43
2003 10 19 43
2003 10 20 64
2003 10 21 66
2003 10 22 72
2003 10 23 68
2003 10 24 81
2003 10 25 89
2003 10 26 102
2003 10 27 141
2003 10 28 161
2003 10 29 167
2003 10 30 171
2003 10 31 156

TSML Dataset descriptor

<?xml version="1.0" encoding="UTF-8"?>
<tsml>
  <dataset name="Sunspot_Number" history="Read by LaTiS">
    <adapter class="latis.reader.tsml.AsciiAdapter"
             url="file:/data/latis/ssn.txt" />
    <time units="yyyy MM dd" />
    <integer name="ssn" />
  </dataset>
</tsml>
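
The descriptor says that each record holds a time in "yyyy MM dd" form followed by an integer named ssn. The Python sketch below reads records of that shape; it is an illustration of what the mapping implies, not a description of how `AsciiAdapter` actually parses the file. The function name `read_ssn` is hypothetical, and `%Y %m %d` is the Python equivalent of the "yyyy MM dd" pattern.

```python
from datetime import datetime

def read_ssn(lines):
    """Yield (date, ssn) samples from 'yyyy MM dd ssn' records."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        *date_fields, ssn = line.split()
        date = datetime.strptime(" ".join(date_fields), "%Y %m %d").date()
        yield (date, int(ssn))

# Three records copied from the sunspot listing above.
sample_lines = ["2003 10 01 75", "2003 10 02 72", "2003 10 14 4"]
for date, ssn in read_ssn(sample_lines):
    print(date.isoformat(), ssn)
```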

Page 18: LaTiS https:// github/dlindhol/LaTiS

Example – Serving an ASCII File

Page 19: LaTiS https:// github/dlindhol/LaTiS

Current Applications
• LASP Interactive Solar Irradiance Data Center (LISIRD)
  – Uses LaTiS to read, subset, reformat data, metadata
  – http://lasp.colorado.edu/lisird/
• Time Series Data Server (TSDS)
  – Common RESTful interface to NASA Heliophysics data
  – http://tsds.net/

Other LASP projects: MMS, MAVEN, database statistics, log files
External users?

Page 20: LaTiS https:// github/dlindhol/LaTiS

Capabilities – Data Reader Modules
• Operational:
  – ASCII (file, web service, system call), binary, NetCDF, Relational database, data “generators”
  – Time Series of scalars, vectors, and spectra
  – Arbitrarily long time series
• Prototyped:
  – HDF, CDF, FITS, GRIB, OPeNDAP (e.g. other LaTiS servers), NoSQL (MongoDB)
  – Nested 2D (gridded) data structures
• Planned:
  – Arbitrarily complex data structures

Page 21: LaTiS https:// github/dlindhol/LaTiS

Capabilities – Data Writer Modules

• Operational:
  – OPeNDAP, ASCII (e.g. csv), binary, JSON, Image (PNG), IDL code, HTML dataset landing page
• Prototyped:
  – NetCDF, HDF, IDL save file, interactive plot
• Planned:
  – GeoTIFF, …

Page 22: LaTiS https:// github/dlindhol/LaTiS

Capabilities – Data Filter Modules

• Operational:
  – Subset, aggregate, stride, thin, replace, integrate, bin average
• Prototyped:
  – FFT, min, max, unique, resampling, unit conversion
• Planned:
  – Coordinate system transformations
  – Make it easier to plug in custom computations
  – Track provenance
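
To give a feel for what two of the operational filters compute, the sketch below implements a subset and a bin average over (time, value) samples in Python. These are generic textbook versions under assumed semantics (inclusive bounds, bins aligned to multiples of the width); the LaTiS filters may define these details differently.

```python
def subset(samples, t_min, t_max):
    """Keep samples whose domain value lies in [t_min, t_max]."""
    return [(t, v) for t, v in samples if t_min <= t <= t_max]

def bin_average(samples, width):
    """Average values in consecutive bins of `width` along the domain."""
    bins = {}
    for t, v in samples:
        start = (t // width) * width   # left edge of the bin containing t
        bins.setdefault(start, []).append(v)
    return [(start, sum(vs) / len(vs)) for start, vs in sorted(bins.items())]

# Day index and sunspot number for the first six days of the October 2003 example.
ssn = [(0, 75), (1, 72), (2, 59), (3, 60), (4, 53), (5, 51)]

print(subset(ssn, 1, 3))
print(bin_average(ssn, 2))
```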

Page 23: LaTiS https:// github/dlindhol/LaTiS

Capabilities – Service Interface

• Operational:
  – OPeNDAP
  – Java Servlet, simply deploy war file (Tomcat, Glassfish)
• Prototyped:
  – Authentication
  – Single executable (jetty)
  – THREDDS Data Server (TDS) integration
• Planned:
  – Open Geospatial Consortium (OGC) standards
    • Web Map Server
    • Web Coverage Server

Page 24: LaTiS https:// github/dlindhol/LaTiS

Capabilities - Metadata

• Operational:
  – THREDDS catalog, static XML, browse
• Prototyped:
  – Semantic Web triple store (RDF, SPARQL)
  – Text search (Solr)
  – Modeling RDF triples (subject, predicate, object)
  – Track provenance, record Dataset modifications
• Planned:
  – Serve metadata in various schema (e.g. ISO 19115, SPASE)
  – Unique IDs, Digital Object Identifiers (DOI) for publishing

Page 25: LaTiS https:// github/dlindhol/LaTiS

Other Capabilities

• Operational:
  – Time API with formatting
  – Time conversions with leap seconds
• Prototyped:
  – Caching, improve performance
  – Parallel processing, multi-core
• Planned:
  – Big Data, Hadoop, Map Reduce
  – Workflow integration

Page 26: LaTiS https:// github/dlindhol/LaTiS

Source Code Management – Open Source

• Time Series Server (a.k.a. TSS1)
  – Core of Time Series Data Server (TSDS, tsds.net)
  – Built around Unidata Common Data Model
  – SourceForge: https://sourceforge.net/projects/tsds/
• LaTiS (a.k.a. TSS2)
  – New LaTiS data model, Scala implementation
  – GitHub: https://github.com/dlindhol/LaTiS
  – LASP internal development branch
  – Plug-ins as separate projects (e.g. data collections, math, custom readers/writers, …), keep core small

Page 27: LaTiS https:// github/dlindhol/LaTiS

My Background (i.e. bias)

• Astrophysicist by degree, software engineer by profession
• Data user and provider
• Scientific data applications developer:
  – astrophysics, atmospheric science, space science
• Holy Grail: common data model
• Favorite scientific data models:
  – VisAD (http://www.ssec.wisc.edu/~billh/visad.html)
  – Unidata Common Data Model (http://www.unidata.ucar.edu/software/netcdf-java/CDM/)
  – OPeNDAP (http://www.opendap.org/)

Page 28: LaTiS https:// github/dlindhol/LaTiS

Motivation – Stove Pipes

Page 29: LaTiS https:// github/dlindhol/LaTiS

Single Data Access Interface