
The Apache OODT Ecosystem: A Birds Eye View

Chris A. Mattmann
Senior Computer Scientist, NASA Jet Propulsion Laboratory
Adjunct Assistant Professor, Univ. of Southern California
Member, Apache Software Foundation


And you are?

• Apache Member involved in
  – OODT (VP, PMC), Tika (VP, PMC), Nutch (PMC), Incubator (PMC), SIS (Mentor), Lucy (Mentor), Gora (Champion), MRUnit (Mentor), and Airavata (Mentor)

• Senior Computer Scientist at NASA JPL in Pasadena, CA USA

• Software Architecture/Engineering Prof at Univ. of Southern California

22-Feb-12 NCAR-SEA-2012


Agenda

• Overview of OODT and its history
• How we got it to Apache
• How other projects can follow our model
• Existing successful deployments of OODT
• Pointers to papers and more information, including case studies


Lessons from 90’s era missions

• Increasing data volumes (exponential growth)
• Increasing complexity of instruments and algorithms
• Increasing availability of proxy/sim/ancillary data
• Increasing rate of technology refresh

… all of this while NASA Earth Mission funding was decreasing

A data system framework based on a standard architecture and reusable software components for supporting all future missions.


Enter OODT

Object Oriented Data Technology http://oodt.apache.org

Funded initially in 1998 by NASA’s Office of Space Science

Envisaged as a national software framework for sharing data across heterogeneous, distributed data repositories

OODT is both an architecture and a reference implementation providing
• Data Production
• Data Distribution
• Data Discovery
• Data Access

OODT is Open Source and available from the Apache Software Foundation


Apache OODT

• Entered “incubation” at the Apache Software Foundation in 2010

• Selected as a top level Apache Software Foundation project in January 2011

• Developed by a community of participants from many companies, universities, and organizations

• Used for a diverse set of science data system activities in planetary science, earth science, radio astronomy, biomedicine, astrophysics, and more

OODT Development & user community includes:

http://oodt.apache.org


Apache OODT Press


Why Apache and OODT?

• OODT is meant to be a set of tools to help build data systems
  – It’s not meant to be “turn key”
  – It attempts to exploit the boundary between bringing in capability vs. being overly rigid in science
  – Each discipline/project extends
• Apache is the elite open source community for software developers
  – Fewer than 100 projects have been promoted to top level (Apache Web Server, Tomcat, Solr, Hadoop)
  – Differs from other open source communities; it provides a governance and management structure


Governance Model+NASA=♥

• NASA and other government agencies have tons of process
  – They like that


Publicly accessible and searchable archives

• http://svnsearch.org/svnsearch/repos/ASF/search?path=%2Foodt

• http://mail-archives.apache.org/mod_mbox/oodt-dev/

• http://mail-archives.apache.org/mod_mbox/oodt-user/

• 100+ mailing list subscriptions


Great Metrics and Insight

• http://www.ohloh.net/p/oodt


Movement to the ASF

• Meeting held June 15, 2007 at JPL with ASF President Justin Erenkrantz
  – Develop plan moving forward to bring first NASA project to Apache
  – Discuss obstacles, sponsorship
  – Discuss outlook


2007: original goals

• Come up with incubation proposal
  – Chris Mattmann was one of the principal contributors to the proposal for the Tika project, and to other Incubation activities (Apache SIS)
  – Send out emails to the Incubator mailing list
    • Look for mentors
• Get sponsorship from ranking Apache PMC member or board member
  – Justin and others

• Top-level project versus sub project outlook heading out of incubation


OODT Incubator Planning

• Monthly Updates (for first 3 months, then quarterly)
  – Status
  – Progress
  – Community
  – Acceptance
• Plan for exiting incubation
  – How to have a solid user base
  – How to operate as a unit in the Apache way
  – Maintenance of user interest and community going forward


OODT’s next steps circa 2007

• JPL to tackle legal issues
  – Is OODT releasable as an Apache product?
  – http://www.apache.org/licenses/software-grant.txt
    • This needs to be signed by the responsible parties at JPL
  – Contributor License Agreement
    • Do we need a corporate one?
• In parallel to this
  – Draft OODT incubation proposal
  – Start identifying who would initially be interested
    • The more external, non-JPL people who are interested, the better

• Justin to get slides from other incubator people


…2 years later

• Worked it out with JPL legal
  – Turns out the ALv2 license is extremely friendly and is something that JPL (note not all of NASA) was amenable to
• Developed OODT incubator proposal
  – http://wiki.apache.org/incubator/OODTProposal
• Found willing Apache mentors besides Justin
  – Jean-Frederic Clere, Ross Gardler, Ian Holsman

• …Put OODT at Apache!


Apache OODT Community

• Includes PMC members from
  – NASA JPL, Univ. of Southern California, Google, Children’s Hospital Los Angeles (CHLA), Vdio, South African SKA Project
• Projects that are deploying it operationally at
  – Decadal-survey recommended NASA Earth science missions, NIH, NCI, CHLA, USC, and the South African SKA project
• Use in the classroom
  – My graduate-level software architecture and search engines courses


OODT Framework and PCS

[Figure: Object Oriented Data Technology Framework. Labeled components: OODT/Science Web Tools, Archive Client, Profile XML Data, Data System 1, Data System 2, Archive Service, Profile Service, Product Service, Query Service, Bridge to External Services, Navigation Service, Other Service 1, Other Service 2, Catalog & Archive Service (CAS), Process Control System (PCS).]

CAS has recently become known as Process Control System when applied to mission work.


Current PCS deployments

Orbiting Carbon Observatory (OCO-2) - spectrometer instrument
NASA ESSP Mission, launch date: TBD 2013
PCS supporting Thermal Vacuum Tests, Ground-based instrument data processing, Space-based instrument data processing and Science Computing Facility
EOM Data Volume: 61-81 TB in 3 yrs
Processing Throughput: 200-300 jobs/day

NPP Sounder PEATE - infrared sounder
Joint NASA/NPOESS mission, launch date: October 2011
PCS supporting Science Computing Facility (PEATE)
EOM Data Volume: 600 TB in 5 yrs
Processing Throughput: 600 jobs/day

QuikSCAT - scatterometer
NASA Quick-Recovery Mission, launch date: June 1999
PCS supporting instrument data processing and science analyst sandbox
Originally planned as a 2-year mission

SMAP - high-res radar and radiometer
NASA decadal study mission, launch date: 2014
PCS supporting radar instrument and science algorithm development testbed


Other PCS applications

Bioinformatics
National Institutes of Health (NIH) National Cancer Institute’s (NCI) Early Detection Research Network (EDRN)
Children’s Hospital LA Virtual Pediatric Intensive Care Unit (VPICU)

Technology Demonstration
JPL’s Active Mirror Telescope (AMT)
White Sands Missile Range

Earth Science
NASA’s Virtual Oceanographic Data Center (VODC)
JPL’s Climate Data eXchange (CDX)

Astronomy and Radio
Prototype work on MeerKAT with South Africans and KAT-7 telescope
Discussions ongoing with NRAO Socorro (EVLA and ALMA)


PCS Core Components

• All Core components implemented as web services
  – XML-RPC used to communicate between components
  – Servers implemented in Java
  – Clients implemented in Java, scripts, Python, PHP and web-apps
  – Service configuration implemented in ASCII and XML files
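To make the XML-RPC point concrete, here is a minimal sketch of what a call between components looks like on the wire, using only Python's standard `xmlrpc.client` module. The method name and argument are hypothetical and chosen for illustration; they are not taken from the OODT API.

```python
import xmlrpc.client

# Hypothetical method name, for illustration only (not an OODT API name).
METHOD = "example.getProductByName"


def encode_call(method, *params):
    """Serialize a method call into an XML-RPC request body."""
    return xmlrpc.client.dumps(params, methodname=method)


def decode_call(payload):
    """Parse an XML-RPC request body back into (method, params)."""
    params, method = xmlrpc.client.loads(payload)
    return method, params


# Build the XML document a client would POST to a server.
request_xml = encode_call(METHOD, "flux_redo.cal")
print(request_xml)

# A server does the reverse before dispatching to its handler.
print(decode_call(request_xml))
```

Against a live service, a Python client would normally wrap this in `xmlrpc.client.ServerProxy(url)`, which performs the same encoding and sends it over HTTP.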


Core Capabilities

• File Manager does Data Management
  – Tracks all of the stored data, files & metadata
  – Moves data to appropriate locations before and after initiating PGE runs and from staging area to controlled access storage
• Workflow Manager does Pipeline Processing
  – Automates processing when all run conditions are ready
  – Monitors and logs processing status
• Resource Manager does Resource Management
  – Allocates processing jobs to computing resources
  – Monitors and logs job & resource status
  – Copies output data to storage locations where space is available
  – Provides the means to monitor resource usage
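Two of these capabilities, "run when all conditions are ready" and "allocate jobs to computing resources", can be sketched in a few lines. This is a toy model under stated assumptions: the `Node` class, the most-free-slots placement rule, and all names are invented for illustration and do not describe how the OODT Workflow Manager or Resource Manager is actually implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A compute node with a fixed number of job slots (hypothetical model)."""
    name: str
    capacity: int
    running: list = field(default_factory=list)

    @property
    def free_slots(self):
        return self.capacity - len(self.running)


def conditions_ready(conditions):
    """A task may run only when every run condition holds."""
    return all(check() for check in conditions)


def allocate(job, nodes):
    """Place the job on the node with the most free slots, or return None."""
    candidates = [n for n in nodes if n.free_slots > 0]
    if not candidates:
        return None
    best = max(candidates, key=lambda n: n.free_slots)
    best.running.append(job)
    return best.name


nodes = [Node("node-a", capacity=2), Node("node-b", capacity=1)]
inputs_staged = lambda: True  # stand-in for a real pre-condition check

if conditions_ready([inputs_staged]):
    print(allocate("job-1", nodes))
```

Returning `None` when no slot is free leaves queueing policy to the caller, which keeps the placement rule separate from the scheduling rule.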


PCS Ingestion Use Case


File/Metadata Capabilities


PCS Processing Use Case


Advanced Workflow Monitoring


Resource Monitoring


How do we deploy PCS for a mission?

• We implement the following mission-specific customizations
  – Server Configuration
    • Implemented in ASCII properties files
  – Product metadata specification
    • Implemented in XML policy files
  – Processing Rules
    • Implemented as Java classes and/or XML policy files
  – PGE Configuration
    • Implemented in XML policy files
  – Compute Node Usage Policies
    • Implemented in XML policy files
• Here’s what we don’t change
  – All PCS Servers (e.g. File Manager, Workflow Manager, Resource Manager)
    • Core data management, pipeline process management and job scheduling/submission capabilities
  – File Catalog schema
  – Workflow Model Repository Schema


Server and PGE Configuration


What is the Level of Effort for personalizing PCS?

• PCS Server Configuration – “days”
  – Deployment specific
• Addition of New File (Product) Type – “days”
  – Product metadata specification
  – Metadata extraction (if applicable)
  – Ingest Policy specification (if remote pull or remote push)
• Addition of a New PGE – (initial integration, ~ weeks)
  – Policy specification
  – Production rules
  – PGE Initiation

* Estimates based on OCO and NPP experience


A typical PCS service (e.g., fm, wm, rm)


What’s PCS configuration?

• Configuration follows typical Apache-like server configuration
  – A set of properties and flags that are set in an ASCII text file that initialize the service at runtime
• Properties configure
  – The underlying subsystems of the PCS service
• For file manager, properties configure e.g.,
  – Data transfer chunk size
  – Whether or not the catalog database should use quoted strings for columns
  – What subsystems are actually chosen (e.g., database versus Lucene, remote versus local data transfer)

• Can we see an example?
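As a rough illustration of this style of configuration, the sketch below parses a small key=value properties text and reads back the kinds of settings listed above. The property keys are hypothetical placeholders, not real OODT property names, and the parser handles only the simplest properties syntax (comments and `key=value` lines).

```python
# Hypothetical property keys, for illustration only; consult the OODT
# documentation for the real names used by the file manager.
SAMPLE = """\
# Which catalog backend and data transfer mode to use
example.filemgr.catalog.backend=lucene
example.filemgr.datatransfer.mode=local
example.filemgr.datatransfer.chunk.size=1024
example.filemgr.catalog.quote.fields=false
"""


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


props = parse_properties(SAMPLE)
chunk_size = int(props["example.filemgr.datatransfer.chunk.size"])
quote_fields = props["example.filemgr.catalog.quote.fields"] == "true"
print(props["example.filemgr.catalog.backend"], chunk_size, quote_fields)
```

The point of the pattern is that switching a subsystem (for example, catalog backend or transfer mode) is a one-line text change read at service startup, with no code change.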


The concept of “production rules”

• Production rules are common terminology to refer to the identification of the mission specific variation points in
  – PGE pipeline processing
  – Product cataloging and archiving
• So far, we’ve discussed
  – Configuration
  – Policy

• Policy is one piece of the puzzle in production rules


Production rule areas of concern

1. Policy defining file ingestion
   1. What metadata should PCS capture per product?
   2. Where do product files go?
2. Policy defining PGE data flow and control flow
3. PGE pre-conditions
4. File staging rules
5. Queries to the PCS file manager service

1-5 are implemented in PCS (depending on complexity) as either:

1. Java Code
2. XML files
3. Some combination of Java code and XML files
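The first area of concern, an ingestion policy that says what metadata to capture per product and where product files go, might look something like the sketch below when expressed as XML and read with Python's standard `xml.etree.ElementTree`. The element names, attribute names, product type, and repository path are all invented for illustration; they are not the actual OODT policy schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical policy document; element and attribute names are assumptions.
POLICY_XML = """
<productTypes>
  <productType name="ExampleCube">
    <repository path="/data/archive/cubes"/>
    <metadata>
      <key>StartTime</key>
      <key>EndTime</key>
    </metadata>
  </productType>
</productTypes>
"""


def load_policy(xml_text):
    """Map each product type to its archive location and required metadata."""
    root = ET.fromstring(xml_text)
    policy = {}
    for product_type in root.findall("productType"):
        policy[product_type.get("name")] = {
            "repository": product_type.find("repository").get("path"),
            "metadata": [k.text for k in product_type.findall("metadata/key")],
        }
    return policy


policy = load_policy(POLICY_XML)
print(policy["ExampleCube"])
```

Keeping these answers in a policy file rather than in code is what lets a new product type be added without touching the servers.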


PCS Task Wrapper aka CAS-PGE

• Gathers information from the file manager
  • Files to stage
  • Input metadata (time ranges, flags, etc.)
• Builds input file(s) for the PGE
• Executes the PGE
• Invokes PCS crawler to ingest output product and metadata
• Notifies Workflow and Resource Managers about task (job) status
• Can optionally
  • Generate PCS metadata files
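The wrapper's sequence of steps can be sketched as a single function that takes each collaborator as a callable. This is only an outline of the control flow described on this slide; the function name, parameter names, and status strings are hypothetical, and the optional metadata-file generation step is left out.

```python
def run_task(query_inputs, write_pge_input, execute_pge, ingest, notify):
    """Run one wrapped job; every argument is a caller-supplied callable."""
    notify("started")
    try:
        files, metadata = query_inputs()                # ask the file manager
        config_path = write_pge_input(files, metadata)  # build PGE input file(s)
        outputs = execute_pge(config_path)              # execute the PGE
        for product in outputs:
            ingest(product)                             # crawler ingests outputs
    except Exception:
        notify("failed")
        raise
    notify("finished")
    return outputs


# Usage with stand-in callables that record what happened.
events = []
outputs = run_task(
    query_inputs=lambda: (["input_a.dat"], {"StartTime": "t0"}),
    write_pge_input=lambda files, metadata: "pge_input.cfg",
    execute_pge=lambda config_path: ["output_a.dat"],
    ingest=lambda product: events.append(("ingest", product)),
    notify=lambda status: events.append(("status", status)),
)
print(events)
```

Passing the collaborators in as callables keeps the wrapper independent of any particular science algorithm, which matches the idea that the PGE itself is not modified.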


Some relevant experience with NRAO: EVLA prototype

• Explore JPL data system expertise
  – Leverage Apache OODT
  – Leverage architecture experience
  – Build on NRAO Socorro F2F given in April 2011 and Innovations in Data-Intensive Astronomy meeting in May 2011
• Define achievable prototype
  – Focus on EVLA summer school pipeline
    • Heavy focus on CASApy, simple pipelining, metadata extraction, archiving of directory-based products
    • Ideal for OODT system


Architecture


Pre-Requisites

• Apache OODT
  – Version: 0.3
  – JDK6, Maven2.2.1

• Stock Linux box


Installed Services

• File Manager
  – http://ska-dc.jpl.nasa.gov:9000
• Crawler
  – http://ska-dc.jpl.nasa.gov:9020
• Tomcat5
  – Curator: http://ska-dc.jpl.nasa.gov:8080/curator/
  – Browser: http://ska-dc.jpl.nasa.gov/
  – PCS Services: http://ska-dc.jpl.nasa.gov:8080/pcs/services/
  – CAS Product Services: http://ska-dc.jpl.nasa.gov:8080/fmprod/
  – Workflow Monitor: http://ska-dc.jpl.nasa.gov:8080/wmonitor/
• Met Extractors
  – /usr/local/ska-dc/pge/extractors (Cube, Cal Tables)
• PCS package
  – /usr/local/ska-dc/pcs (scripts dir contains pcs_stat, pcs_trace, etc.)


Demonstration Use Case

• Run EVLA Spectral Line Cube generation
  – First step is ingest EVLARawDataOutput from Joe
  – Then fire off evlascube event
  – Workflow manager writes CASApy script dynamically
    • Via CAS-PGE
  – CAS-PGE starts CASApy
  – CASApy generates Cal tables and 2 Spectral Line Cube Images
  – CAS-PGE ingests them into the File Manager
• Gravy: UIs, Cmd Line Tools, Services


Results: Workflow Monitor


Results: Data Portal


Results: Prod Browser


Results: PCS Trace Cmd Line


Results: PCS Stat Cmd Line


Results: PCS REST Services: Trace

curl http://host/pcs/services/pedigree/report/flux_redo.cal


Results: PCS REST Service: Health

curl http://host/pcs/services/health/report

Read up on https://issues.apache.org/jira/browse/OODT-139
Read documentation on PCS services: https://cwiki.apache.org/confluence/display/OODT/OODT+REST+Services
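For scripted use, the two `curl` calls above can be mirrored in Python. The sketch below only builds the request URLs from the paths shown on the slides; `host` is the same placeholder the slides use, and the helper name `pcs_url` is made up for this example.

```python
from urllib.parse import quote


def pcs_url(base, *segments):
    """Join a base URL and path segments, percent-encoding each segment."""
    return base.rstrip("/") + "/" + "/".join(quote(s, safe="") for s in segments)


BASE = "http://host/pcs/services"  # "host" is a placeholder, as on the slides

trace_url = pcs_url(BASE, "pedigree", "report", "flux_redo.cal")
health_url = pcs_url(BASE, "health", "report")

print(trace_url)
print(health_url)
```

With a real hostname substituted in, either URL can be fetched with `urllib.request.urlopen(...)`; the format of the response body is not shown on the slides, so it is not assumed here.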


Results: RSS feed of prods


Results: RDF of products


Who’s doing what?

• Children’s Hospital Los Angeles
  – Improving upon XMLPS, and CAS (Andrew Hart + Ricky Nguyen will talk about this)
  – Supporting data analytics
• Google
  – Brian Foster working on command line improvements and data protocol push/pull
• SKA South Africa
  – Deploying file manager and crawler for use in KAT-7 pipeline ingestion
• NIH/NCI
  – Maintaining the XMLPS components, and CAS components
  – Helping with user interfaces
• Various JPL and NASA research projects
  – OPeNDAPps, XMLPS
• Various NASA missions
  – Workflow, PCS, services, OPSui, other web apps


Latest release: 0.3

• First appearance of PCS
  – Core, Services (JAX-RS)
• Web Applications
  – Balance (PHP), and Wicket (Java)-based apps for file management and workflow monitoring
• First release deployed to Maven Central
  – We did backport 0.2 there after this
  – Over 60 issues fixed in JIRA

• June 2011: recommended stable release


Working on: 0.4

• Operator Interface (OODT-157)
  – Andrew Hart and I will talk about this
• Workflow2 integration (OODT-215) and all of its sub-issues
  – Global workflow conditions, dynamic workflows, parallel/sequential model, new workflow engine, etc.
• OODT RADIX for super easy deployment (OODT-120)
  – Paul Ramirez and Cameron Goodale will discuss this
• Solr sync with File Manager (OODT-326)
• Improvements to XMLPS (OODT-333) and new crawler actions (OODT-33, OODT-34, OODT-35, OODT-36, OODT-37)
• Over 48 issues currently resolved
• Likely to come before end of Q2 2012


Some Grand Challenges I’m interested in: OODT can help!

• How do we handle 700 TB/sec of data coming off the wire when we actually have to keep it around?
  – Required by the Square Kilometre Array
• Joe scientist says: “I’ve got an IDL or Matlab algorithm that I will not change, and I need to run it on 10 years of data from the Colorado River Basin and store and disseminate the output products”
  – Required by the Western Snow Hydrology project
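To give a sense of scale for the first challenge, the arithmetic below converts the quoted 700 TB/sec into a daily volume. It assumes a sustained rate and decimal units (1 EB = 10^6 TB); both are simplifying assumptions made here, not statements from the talk.

```python
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 seconds

rate_tb_per_s = 700              # rate quoted above
tb_per_day = rate_tb_per_s * SECONDS_PER_DAY
eb_per_day = tb_per_day / 1_000_000   # decimal units: 1 EB = 10^6 TB

print(f"{tb_per_day:,} TB/day = {eb_per_day:.2f} EB/day")
```

At a sustained rate that comes to roughly 60 exabytes per day, which is why keeping the data around, rather than only processing it in flight, is framed as a grand challenge.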


Some Grand Challenges I’m interested in: OODT can help!

• How do we compare petabytes of climate model output data in a variety of formats (HDF, NetCDF, GRIB, etc.) with petabytes of remote sensing data to improve climate models for the next IPCC assessment?
  – Required by the 5th IPCC assessment and the Earth System Grid and NASA

• How do we catalog all of NASA’s current planetary science data?


Key Takeaway

OODT is already doing and/or preparing the world to handle all of these diverse use cases!

It’s a constantly evolving and improving framework – join up and help.

It’s free and open source from Apache and helping government demonstrate the public good


OODT Project Contact Info

• Learn more and track our progress at:
  – http://oodt.apache.org
  – WIKI: https://cwiki.apache.org/OODT/
  – JIRA: https://issues.apache.org/jira/browse/OODT
• Join the mailing list:
  – [email protected]
• Chat on IRC:
  – #oodt on irc.freenode.net
• Acknowledgements
  – Key Members of the OODT teams: Chris Mattmann, Daniel J. Crichton, Steve Hughes, Andrew Hart, Sean Kelly, Sean Hardman, Paul Ramirez, David Woollard, Brian Foster, Dana Freeborn, Emily Law, Mike Cayanan, Luca Cinquini, Heather Kincaid
  – Projects, Sponsors, Collaborators: Planetary Data System, Early Detection Research Network, Climate Data Exchange, Virtual Pediatric Intensive Care Unit, NASA SMAP Mission, NASA OCO-2 Mission, NASA NPP Sounder PEATE, NASA ACOS Mission, Earth System Grid Federation


Alright, I’ll shut up now

• Any questions?

• THANK YOU!
  – [email protected]
  – @chrismattmann on Twitter
