EarthServer :: EGU 2013, Vienna :: ©2012 Peter Baumann The EarthServer Approach to Big Data Analytics EGU 2013 | Splinter Meeting Vienna, Austria, 2013-apr-10 Peter Baumann & the EarthServer Team Jacobs University | rasdaman GmbH Bremen, Germany [email protected] Supported by EU FP7 eInfrastructure, contract 283610 EarthServer


EarthServer :: EGU 2013, Vienna :: ©2012 Peter Baumann

The EarthServer Approach to Big Data Analytics

EGU 2013 | Splinter Meeting

Vienna, Austria, 2013-apr-10

Peter Baumann & the EarthServer Team

Jacobs University | rasdaman GmbH

Bremen, Germany

[email protected]

Supported by EU FP7 eInfrastructure, contract 283610 EarthServer


Roadmap

Motivation

Strictly Standards

Platform technology: the rasdaman analytics server

Conclusion


EarthServer: Big Earth Data Analytics

Scalable On-Demand Processing for the Earth Sciences
• EU funded, 3 years, 5.85 million EUR
• Platform: rasdaman (Array Analytics server)
• Distributed query processing, integrated data/metadata search, 3D clients

Strictly open standards: OGC WMS+WCS+WCPS; W3C XQuery; X3D

6 * 100+ TB databases for all Earth sciences + planetary science


„Big Data“: The 4+ Vs

Volume
• ngEO planning: 10^12 images under ESA custody
• ECHAM-T40 climate dataset object ~50 TB

Velocity
• NASA EOSDIS: 5 TB / day

Variety
• Regular / irregular / warped grids; point clouds; TINs; general meshes; ...

Veracity
• Quality information, such as 32-bit ocean color mask

...plus several more Vs in blogs:
• Value, Verisimilitude, Variability, Visualization, ...

[Doug Laney / Gartner & IBM]


A Standards-Based Approach

Core element in OGC: geographic feature
• = abstraction of some real-world phenomenon
• associated with a location relative to Earth

Special kind of feature: coverage
• = space-time varying multi-dimensional phenomenon
• Typical representative: raster image
• ...but there is more!

Typically, coverages are Big Data


The Unifying Coverage Concept

[Figure: class diagram rooted at «FeatureType» AbstractCoverage. Coverage types shown: MultiPointCoverage, MultiCurveCoverage, MultiSurfaceCoverage, MultiSolidCoverage, GridCoverage, RectifiedGridCoverage, ReferenceableGridCoverage. Annotations: n-D; variety of format encodings.]


Facing the Coverage Tsunami

[Figure labels: sensor feeds [OGC SWE]; coverage server]


Taming the Coverage Tsunami

[Figure labels: sensor feeds [OGC SWE]; coverage server]


Serving Coverages

GMLCOV & WCS: one generic schema for all coverage types; n-D; scalable; versatile processing
→ access & processing services

SWE SOS: high flexibility to accommodate all sensor types
→ data capturing

[Figure labels: SOS; WCS; coverage server]


Web Coverage Service (WCS)

Core: Simple & efficient access to multi-dimensional coverages

subset = trim | slice

WCS Extensions for additional functionality facets
• “band extraction”, scaling, reprojection, interpolation, query language, ...

Application Profiles define domain-oriented bundling
• EO, MetOcean, ...
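
The two kinds of WCS subsetting can be illustrated with a small request builder. This is a minimal sketch, assuming the WCS 2.0 key-value-pair encoding, in which a trim gives a low and a high bound for an axis (the axis is kept) and a slice gives a single point (the axis is dropped). The endpoint, the coverage name `AvgLandTemp`, and the axis names `Lat`, `Long`, `ansi` are hypothetical and not taken from the slides.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration.
ENDPOINT = "https://example.org/ows"


def get_coverage_url(coverage_id, subsets):
    """Build a WCS 2.0 GetCoverage request in KVP encoding.

    `subsets` maps an axis name to either a (low, high) tuple, which
    becomes a trim, or a single value, which becomes a slice.
    """
    params = [
        ("service", "WCS"),
        ("version", "2.0.1"),
        ("request", "GetCoverage"),
        ("coverageId", coverage_id),
    ]
    for axis, bounds in subsets.items():
        if isinstance(bounds, tuple):
            low, high = bounds
            params.append(("subset", f"{axis}({low},{high})"))  # trim
        else:
            params.append(("subset", f"{axis}({bounds})"))  # slice
    # Keep parentheses and commas readable; everything else is escaped.
    return ENDPOINT + "?" + urlencode(params, safe="(),")


# Trim along Lat and Long, slice at one point in time.
url = get_coverage_url(
    "AvgLandTemp",
    {"Lat": (40, 50), "Long": (10, 20), "ansi": '"2013-04-10"'},
)
print(url)
```

With this request a 3-D coverage would come back as a 2-D one, since the time axis is sliced away while the two spatial axes are only trimmed.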


OGC Web Coverage Processing Service (WCPS)

Raster Query Language: ad-hoc navigation, extraction, aggregation, analytics
• Time series
• Image processing
• Summary data
• Sensor fusion & pattern mining


The rasdaman Raster Analytics Server

Array DBMS for massive n-D raster data
• new database attribute type: array<celltype,extent>
• Data integration: rasters stored in standard database

Extending ISO SQL with array operators:

select img.green[x0:x1,y0:y1] > 130
from LandsatArchive as img

“tile streaming” architecture
• extensive optimization, hw/sw parallelization
• Scaling from laptop to cloud

In operational use

www.rasdaman.org
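
To make the array query on this slide concrete: it first trims the green band to the box [x0:x1,y0:y1] and then applies the comparison `> 130` to every cell, so the result is a Boolean array of the same extent as the box. The following is a minimal sketch of that behaviour on nested lists; the helper names and the sample data are hypothetical, and it says nothing about how rasdaman evaluates the query internally.

```python
def trim(band, x0, x1, y0, y1):
    """Return the cells with x0 <= x <= x1 and y0 <= y <= y1 (bounds inclusive)."""
    return [row[y0:y1 + 1] for row in band[x0:x1 + 1]]


def greater_than(band, threshold):
    """Apply the comparison to every cell, yielding a Boolean array."""
    return [[cell > threshold for cell in row] for row in band]


# Hypothetical 3x4 green band, indexed as green[x][y].
green = [
    [120, 131, 140, 90],
    [200, 130, 129, 135],
    [10, 250, 131, 100],
]

# Counterpart of: select img.green[0:1,1:2] > 130 from ...
mask = greater_than(trim(green, 0, 1, 1, 2), 130)
print(mask)
```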


Ex: Spatio-Temporal Image Services

[Diedrich et al 2001]


Configurable Tiling

Sample tiling strategies [Furtado]:
• regular
• directional
• area of interest

rasdaman storage layout language

insert into MyCollection
values ...
tiling area of interest [0:20,0:40], [45:80,80:85]
tile size 1000000
index d_index storage array compression zlib
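
One rough way to see why the tiling strategy matters: the number of tiles a query has to read depends on how tile boundaries line up with the access pattern. The sketch below counts the regular tiles touched by a query box. The function `tiles_hit` and the tile shapes are hypothetical illustrations, not part of rasdaman.

```python
from itertools import product


def tiles_hit(query, tile_shape):
    """List the grid indices of all regular tiles that a query box touches.

    `query` holds one inclusive (low, high) cell range per axis;
    `tile_shape` gives the tile edge length in cells along each axis.
    """
    ranges = [
        range(low // edge, high // edge + 1)
        for (low, high), edge in zip(query, tile_shape)
    ]
    return list(product(*ranges))


# The first area of interest from the statement above: [0:20,0:40].
query = [(0, 20), (0, 40)]

square = tiles_hit(query, (32, 32))    # generic square tiles
aligned = tiles_hit(query, (21, 41))   # tile cut to fit the area of interest

print(len(square), len(aligned))
```

In this toy case the square tiling reads two tiles for the query while the tile shaped like the area of interest reads one, which is the kind of saving an area-of-interest layout aims for.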


In-Situ Databases

Problem: Large-scale data centers sometimes object to import
• „cannot duplicate my 100 TB“

Approach: reference external files, use as tiles; cf. [Ailamaki et al 2010]

Challenges: efficiency, consistency, caching, ...
• Data simultaneously used by other actors in parallel!
• In future: optimized storage of hot spots in DB

insert into PanchromaticPics (GreyImage)
referencing /my/path/ext1.tif [1000:1999,3000:3999], …


Scalability: Parallel Query Processing

Local: Just-in-time compilation, Multi-threaded tile processing

Global: Cloud parallelization
• no single point of failure
• semantic distribution

Query over both coverages:

for $a in ( A ), $b in ( B )
return
  ( max( ($a.nir - $a.red) / ($a.nir + $a.red) )
  - max( ($b.nir - $b.red) / ($b.nir + $b.red) )
  )

Subquery for coverageA:

for $a in ( A )
return max(
  ($a.nir - $a.red) / ($a.nir + $a.red)
)

Subquery for coverageB:

for $b in ( B )
return max(
  ($b.nir - $b.red) / ($b.nir + $b.red)
)

[Owonibi 2012]
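
The split of one query into per-coverage subqueries can be mimicked in a few lines. This is a minimal sketch, assuming each coverage is just a list of (nir, red) cell pairs and using threads as stand-ins for the nodes that hold the coverages; the data and the function name `max_ndvi` are hypothetical, and the actual distribution logic of [Owonibi 2012] is not reproduced here.

```python
from concurrent.futures import ThreadPoolExecutor


def max_ndvi(cells):
    """Maximum of (nir - red) / (nir + red) over all cells of one coverage."""
    return max((nir - red) / (nir + red) for nir, red in cells)


# Hypothetical coverages: each cell is a (nir, red) pair.
coverage_a = [(80, 20), (60, 40), (50, 50)]
coverage_b = [(60, 40), (30, 70)]

# Each subquery reduces its coverage to a single number where the data
# lives; only the two scalars have to travel to be subtracted.
with ThreadPoolExecutor(max_workers=2) as pool:
    future_a = pool.submit(max_ndvi, coverage_a)
    future_b = pool.submit(max_ndvi, coverage_b)
    difference = future_a.result() - future_b.result()

print(difference)
```

The point of the decomposition is that the expensive part, the scan over all cells, runs independently per coverage, while the final subtraction needs only one value from each side.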


Building Service Alliances: UDFs

User-Defined Function (UDF) = external code, invoked through query

Anything can be interfaced: R, GDAL, Orfeo Toolbox, ... you name it!
• Local or remote

Integrated with rasdaman optimizer

select
  encode( orfeo.orthorectify( L0.red - L0.nir ), "png" )
from L0Scenes as L0


Visual Queries

[ESA VAROS project, 2010]


3D Database Visualization

Problem: coupling DB / visualization

Approach:
• deliver RGBA image to X3D client, transparency as height
• Feed directly into client GPU

select
  encode(
    { red:   (char) s.b7[x0:x1,x0:x1],
      green: (char) s.b5[x0:x1,x0:x1],
      blue:  (char) s.b0[x0:x1,x0:x1],
      alpha: (char) scale( d, 20 )
    },
    "png"
  )
from SatImage as s, DEM as d

[JacobsU, Fraunhofer 2012]


Rasdaman Impact

Has pioneered Array Database research
• first appearance in literature


Rasdaman Impact

Has established the research field of Array Databases

Registered GEOSS component for Big Data Analytics

OGC WCS Core, OGC WCPS reference implementation

Experience heavily drives OGC coverage & ISO Array standardization

rasdaman hits 2013+


Conclusion

Sensor, image, simulation, & statistics data = a main source of Big Data in Earth Sciences
• Petrol industry has „more bytes than barrels“

OGC W*S standards offer coverage interoperability
• www.ogcnetwork.net/wcs

Core paradigm: high-level integrated query language as client/server interface
• data + metadata search in same query
• Semantic interoperability through standardized syntax & semantics
• Visual clients can hide QL
• Manifold optimizations, parallelization on rasdaman server

EarthServer introduces Agile Analytics

www.earthserver.eu