OPeNDAP in the Cloud: Optimizing the Use of Storage Systems Provided by Cloud Computing Environments
OPeNDAP: James Gallagher, Nathan Potter
and
NOAA/NODC: Deirdre Byrne, Jefferson Ogata, John Relph
26 June 2013
Cloud Systems Now*
•Providers: IBM, Microsoft, Amazon, Google, Rackspace, …
•Microsoft: Azure “…handles 100 petabytes of data a day”
•Amazon: “…hundreds of thousands of users”
•Netflix: “…stopped building its own data centers in 2008;” all in Amazon by 2012
•Snapchat: 4000 pictures per second; “…never owned a computer server.” (Google cloud)
*Quentin Hardy, “Google Joins a Heavyweight Competition in Cloud Computing,” NY Times, 3 December 2013
Why use OPeNDAP?
• The OPeNDAP request is smaller and contains just the data the person wants
• In cloud systems, cost is a function of data transferred as well as data stored, so smaller, targeted requests reduce costs
[Figure: OPeNDAP request, 4% download; full dataset, 100% download]
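To make the "smaller, targeted request" idea concrete, here is a minimal sketch of a DAP2 array constraint (`variable[start:stride:stop]`) and the share of a variable it selects. The server URL, the variable name `sst`, and the array shape are hypothetical and chosen only for illustration; they are not taken from the project.

```python
from math import prod

def hyperslab(ranges):
    """Format (start, stride, stop) triples as a DAP2 array constraint, e.g. [0:1:9]."""
    return "".join(f"[{start}:{stride}:{stop}]" for start, stride, stop in ranges)

def subset_url(base_url, variable, ranges):
    """Build a DAP2 binary-data (.dods) URL requesting one slab of one variable."""
    return f"{base_url}.dods?{variable}{hyperslab(ranges)}"

def fraction_requested(shape, ranges):
    """Fraction of the variable's elements covered by the slab (stop is inclusive)."""
    selected = prod(len(range(start, stop + 1, stride)) for start, stride, stop in ranges)
    return selected / prod(shape)

# Hypothetical variable sst(time=100, lat=180, lon=360); request the first 4 time steps.
shape = (100, 180, 360)
ranges = [(0, 1, 3), (0, 1, 179), (0, 1, 359)]
url = subset_url("http://example.org/opendap/sst.nc", "sst", ranges)
share = fraction_requested(shape, ranges)
```

With these made-up dimensions, the request covers 4 of 100 time steps, so `share` is the fraction of the variable that would be transferred instead of the whole file.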
NOAA Environmental Data Management Conceptual Cloud Architecture*
Potential locations of cloud-enabled OPeNDAP instances
*Adapted from NOAA Environmental Data Management Framework Draft v0.3, Appendix C - Dr. Jeff de La Beaujardière, NOAA Data Management Architect
Constraints
• No vendor lock-in!
• No stovepipes! Flexible storage method
• What will be the client of 2020?
• Hierarchical/human browsable
[Diagram labels: dataset; file, file, file]
Data stores: S3 and Glacier
• S3
  o Spinning disk with a flat file system
  o Designed to make web-scale computing easier
• Glacier
  o Near-line device with access times of 4 hours or more
  o Secure and durable storage
• EC2
  o EC2 was used to run the OPeNDAP data server
  o Linux
Using S3 as a Data Store
[Diagram: web requests (catalog or data request) are sent to S3 as HTTP GET & HEAD requests; S3 holds the catalog and the data and returns an XML or data file]
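As a rough illustration of reading S3 with plain HTTP GET and HEAD requests, the sketch below uses only Python's standard library. It assumes the object is publicly readable (no request signing), and the bucket URL shown in the comment is hypothetical.

```python
import urllib.request

def head_request(url):
    """Request that asks only for the object's headers (size, type), not its body."""
    return urllib.request.Request(url, method="HEAD")

def get_request(url):
    """Request that asks for the object's full body."""
    return urllib.request.Request(url, method="GET")

def object_size(url):
    """Return the object's size in bytes from a HEAD response, or None if not reported."""
    with urllib.request.urlopen(head_request(url)) as response:
        length = response.headers.get("Content-Length")
        return int(length) if length is not None else None

def fetch_object(url):
    """Return the object's bytes (an XML catalog or a data file) from a GET response."""
    with urllib.request.urlopen(get_request(url)) as response:
        return response.read()

# Hypothetical usage (performs network access, so it is not executed here):
# size = object_size("https://example-bucket.s3.amazonaws.com/catalog.xml")
```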
To enhance performance, data were accessed from S3 only when not already cached.
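The caching rule in the sentence above can be sketched as a read-through cache: look in a local cache first and contact the store only on a miss. The class name, the on-disk layout, and the injected `fetch` callable are illustrative assumptions, not the server's actual implementation.

```python
import hashlib
import tempfile
from pathlib import Path

class ReadThroughCache:
    """Serve objects from a local directory; call fetch(key) only on a cache miss."""

    def __init__(self, cache_dir, fetch):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.fetch = fetch      # hypothetical callable that reads one object from S3
        self.misses = 0         # number of times the remote store was contacted

    def _path(self, key):
        # Hash the key so any object name maps to a safe local file name.
        return self.cache_dir / hashlib.sha256(key.encode("utf-8")).hexdigest()

    def get(self, key):
        path = self._path(key)
        if path.exists():
            return path.read_bytes()    # cache hit: no remote access
        self.misses += 1
        data = self.fetch(key)          # cache miss: go to the store once
        path.write_bytes(data)
        return data

# Demo with a stand-in fetch function instead of a real S3 read.
with tempfile.TemporaryDirectory() as tmp:
    cache = ReadThroughCache(tmp, fetch=lambda key: b"bytes of " + key.encode("utf-8"))
    first = cache.get("data/sst.nc")
    second = cache.get("data/sst.nc")
    misses = cache.misses
```

The second `get` is answered from the local copy, so the stand-in store is contacted only once.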
OPeNDAP Catalog requests
[Diagram: a user catalog request reaches the OPeNDAP server on EC2 (catalog access); the server, with its catalog cache and data cache, reads the XML file from S3 and returns a THREDDS catalog or HTML]
OPeNDAP Data requests
[Diagram: a user data request reaches the OPeNDAP server on EC2 (data access); the server, with its catalog cache and data cache, reads the data file from S3 and returns a data slice]
Observations
• S3FS & Amazon's APIs: vendor lock-in
• XML catalogs were flexible:
• Support both direct web and…
• Subsetting server access
• Likely adaptable to other use-cases
• Easily support hierarchical structure
• Catalogs didn't need to be stored in S3
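To illustrate how an XML catalog can carry a hierarchical structure, the sketch below walks a small THREDDS-style catalog and lists its leaf datasets. The catalog content and dataset names are invented for the example; only the THREDDS InvCatalog namespace and the `dataset`/`urlPath` vocabulary are standard.

```python
import xml.etree.ElementTree as ET

THREDDS_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"

# Invented example catalog: nested <dataset> elements form the hierarchy.
CATALOG_XML = f"""<catalog xmlns="{THREDDS_NS}" name="example">
  <dataset name="sst">
    <dataset name="2012">
      <dataset name="sst_2012_01.nc" urlPath="sst/2012/sst_2012_01.nc"/>
      <dataset name="sst_2012_02.nc" urlPath="sst/2012/sst_2012_02.nc"/>
    </dataset>
    <dataset name="2013">
      <dataset name="sst_2013_01.nc" urlPath="sst/2013/sst_2013_01.nc"/>
    </dataset>
  </dataset>
</catalog>"""

def leaf_datasets(element, trail=()):
    """Yield (browse path, urlPath) for every dataset that points at a file."""
    for child in element.findall(f"{{{THREDDS_NS}}}dataset"):
        path = trail + (child.get("name"),)
        if child.get("urlPath") is not None:
            yield "/".join(path), child.get("urlPath")
        else:
            yield from leaf_datasets(child, path)   # descend into a container dataset

entries = list(leaf_datasets(ET.fromstring(CATALOG_XML)))
```

Because the hierarchy lives in the catalog rather than in the store, the flat S3 namespace can still be presented as a browsable tree.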
Glacier and Asynchronous Responses
• To use Glacier, a web service protocol must support asynchronous access! Glacier is a near-line device, not a spinning disk.
• Support via protocol is not enough: typical use cases cannot be met without caching ‘metadata’
  o To support web interfaces/clients, DAP metadata objects should be cached
  o To support smart clients, may need domain data in cache
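One way to picture asynchronous access is a toy model in which metadata is answered immediately from a cache while data requests are accepted first and fulfilled later. Everything here is hypothetical (class name, tick-based clock, use of HTTP status codes 200 and 202 as return values); it is a sketch of the pattern, not of Glacier's API or the DAP protocol.

```python
import itertools

class AsyncArchive:
    """Toy near-line store: a retrieval completes some ticks after it is requested."""

    def __init__(self, objects, metadata_cache, delay_ticks=2):
        self.objects = objects                  # key -> archived bytes
        self.metadata_cache = metadata_cache    # key -> cached metadata text
        self.delay_ticks = delay_ticks
        self.clock = 0
        self.jobs = {}                          # job id -> (key, tick when ready)
        self._ids = itertools.count(1)

    def tick(self):
        """Advance the simulated clock by one step."""
        self.clock += 1

    def get_metadata(self, key):
        """Metadata comes from the cache, so it is available at once (200 OK)."""
        return 200, self.metadata_cache[key]

    def request_data(self, key):
        """Start a retrieval; the caller gets a job id, not the data (202 Accepted)."""
        job_id = next(self._ids)
        self.jobs[job_id] = (key, self.clock + self.delay_ticks)
        return 202, job_id

    def poll(self, job_id):
        """Return (202, None) while the retrieval is pending, then (200, data)."""
        key, ready_at = self.jobs[job_id]
        if self.clock < ready_at:
            return 202, None
        return 200, self.objects[key]

archive = AsyncArchive({"sst.nc": b"binary data"}, {"sst.nc": "Float32 sst[time][lat][lon]"})
metadata = archive.get_metadata("sst.nc")
accepted, job = archive.request_data("sst.nc")
early = archive.poll(job)
archive.tick()
archive.tick()
late = archive.poll(job)
```

A browser-style client can keep working from the cached metadata while a programmatic client polls until the data arrives.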
Glacier Implementation
• Caching
  o Catalog
  o DAP metadata
• Support for programmatic and web clients
  o Web clients are the primary users of the DAP metadata because of their ‘click and browse’ behavior
• XML with an embedded XSL style sheet
  o Single response (XML)
  o Multiple target clients: smart and browser
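The "single response, multiple clients" idea can be sketched with an XML document that carries a standard `xml-stylesheet` processing instruction: a browser applies the XSL to render HTML, while a programmatic client parses the same XML directly. This assumes the style sheet is attached by reference; the element names and the style sheet path are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

def catalog_response(dataset_names, stylesheet_href):
    """Build one XML response usable by both browsers and programmatic clients."""
    root = ET.Element("catalog")
    for name in dataset_names:
        ET.SubElement(root, "dataset", name=name)
    body = ET.tostring(root, encoding="unicode")
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        # Browsers use this processing instruction to transform the XML for display.
        f'<?xml-stylesheet type="text/xsl" href={quoteattr(stylesheet_href)}?>\n'
        f"{body}"
    )

response = catalog_response(["sst_2012_01.nc", "sst_2012_02.nc"], "/xsl/catalog.xsl")

# A programmatic ("smart") client ignores the style sheet and reads the elements.
parsed = ET.fromstring(response.encode("utf-8"))
names = [dataset.get("name") for dataset in parsed.iter("dataset")]
```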
Comparison: S3 and Glacier*
•Glacier provides “secure and durable storage”
•S3 is “designed to make web-scale computing easier”
•These graphs: a tiny part of a complex cost model. They do not include the cost to move data out of the Amazon cloud, EC2 instances, etc.
*http://calculator.s3.amazonaws.com/calc5.html
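A very small cost sketch can show why both storage and transfer matter when comparing an online and a near-line tier. All prices below are placeholders, not Amazon's rates, and the per-GB retrieval charge for the near-line tier is an assumption; the linked calculator should be used for real figures. Unlike the graphs described above, this sketch does include a transfer-out term.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Prices:
    """Placeholder prices in USD; NOT actual Amazon rates."""
    storage_per_gb_month: float
    transfer_out_per_gb: float
    retrieval_per_gb: float = 0.0   # assumed extra charge for near-line retrieval

def monthly_cost(prices, stored_gb, transferred_gb):
    """Storage cost plus the cost of retrieving and moving data out, for one month."""
    storage = stored_gb * prices.storage_per_gb_month
    egress = transferred_gb * (prices.transfer_out_per_gb + prices.retrieval_per_gb)
    return storage + egress

online = Prices(storage_per_gb_month=0.10, transfer_out_per_gb=0.10)
nearline = Prices(storage_per_gb_month=0.01, transfer_out_per_gb=0.10, retrieval_per_gb=0.01)

stored_gb = 1000.0
full_download = monthly_cost(online, stored_gb, transferred_gb=1000.0)   # whole dataset
subset_download = monthly_cost(online, stored_gb, transferred_gb=40.0)   # 4% subset
archived_subset = monthly_cost(nearline, stored_gb, transferred_gb=40.0)
```

Even with invented numbers, the structure shows the two levers discussed in the talk: subsetting reduces the transfer term, and a near-line tier reduces the storage term at the price of slower, separately charged retrieval.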
Summary
• OPeNDAP server with minimal changes
• Data stored in S3 and Glacier
• Solution widely applicable: Web + Smart clients
• Complexity of the cost model: combination of both S3 and Glacier likely
• Modeling & monitoring of use required