SC5 1st Pilot Hangout

SC5 Hangout2 pilot 1 description


Page 1: SC5 Hangout2  pilot 1 description

SC5 1st Pilot Hangout


BASIC AIM

To demonstrate what can be achieved through the BDE platform in:

o Managing large volumes of climate / weather numerical data

o Ingestion / exporting of data

o Analytics potential

o Data lineage


Downscaling

Downscaling of climatic and / or meteorological data:

o Essential first step for any further analysis, assessment or processing in climate and related domains


BDE SC5 Pilot I - Architecture

[Architecture diagram. Labelled components: SC5 Pilot; Cassandra (metadata & data lineage); Hive/Hadoop (raw data & analytics); WRF Model; institutional resource connectors; NetCDF; interfaces and visualisation]


Current status

Operations

o Data ingestion (NetCDF files): both manually, for bootstrapping, and after downscaling

o Data export (NetCDF files): selection of variables / time slices

o Start and monitor WRF-based downscaling on institutional resources: if the requested results already exist, they are retrieved; if not, WRF is started

o Maintain data lineage records on the BDE platform, for monitoring and further analysis, using a subset of W3C PROV (http://www.w3.org/TR/prov-overview)
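The "retrieve if present, otherwise start WRF" behaviour and the lineage record-keeping listed above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the pilot's implementation: the in-memory `results` and `lineage` containers stand in for the platform's stores, `run_wrf` is a hypothetical placeholder for submitting a job, and the parameter names are invented. Only the terms `entity`, `activity`, `used`, `wasGeneratedBy` and `endedAtTime` are borrowed from W3C PROV.

```python
import hashlib
import json
from datetime import datetime, timezone

results = {}   # stand-in for the store of downscaled products
lineage = []   # stand-in for the lineage records


def request_key(params):
    """Derive a stable key from the downscaling parameters."""
    canonical = json.dumps(params, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def run_wrf(params):
    """Hypothetical placeholder for starting WRF on institutional resources."""
    return {"product_for": params["region"]}


def get_downscaled(params, runner=run_wrf):
    """Return stored results if they exist; otherwise run the model once."""
    key = request_key(params)
    if key in results:
        return results[key], False   # retrieved, nothing computed
    product = runner(params)
    results[key] = product
    # Record which activity generated the product and what it used.
    lineage.append({
        "entity": key,
        "wasGeneratedBy": {
            "activity": "wrf-downscaling",
            "used": params,
            "endedAtTime": datetime.now(timezone.utc).isoformat(),
        },
    })
    return product, True


# Invented request parameters, for illustration only.
params = {"region": "example-region", "start": "2000-01-01", "end": "2000-01-31"}
first, computed_first = get_downscaled(params)
second, computed_second = get_downscaled(params)
print(computed_first, computed_second)
```

Keying both the product and its lineage entry on the request parameters means the lineage record can still be consulted after the product itself has been removed from storage.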


Current status

o Support basic analytics on BDE: Hive queries

o Console-based UI: Python/Jupyter interface for demonstration
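To give a feel for the kind of aggregation query that basic analytics might involve, here is a small sketch run from Python. It uses an in-memory SQLite database purely as a stand-in for Hive so that the example runs anywhere. The table `t2m`, its columns and the sample values are all hypothetical. The query relies only on `substr`, `AVG`, `GROUP BY` and `ORDER BY`, which HiveQL also provides.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for a Hive table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t2m (day TEXT, tmax REAL)")  # hypothetical schema
conn.executemany(
    "INSERT INTO t2m VALUES (?, ?)",
    [
        ("2000-06-01", 24.0),   # invented sample values
        ("2000-06-02", 26.0),
        ("2000-07-01", 30.0),
        ("2000-07-02", 32.0),
    ],
)

# Monthly mean of daily maximum temperature.
query = """
SELECT substr(day, 1, 7) AS month, AVG(tmax) AS mean_tmax
FROM t2m
GROUP BY substr(day, 1, 7)
ORDER BY month
"""
rows = conn.execute(query).fetchall()
print(rows)
```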


Sample analytics

Climate-change indices / analytics (indicative)

o Number of summer days, frost days

o Tropical nights

o Monthly minimum value of daily maximum temperature

o Precipitation-based statistics

o Etc.
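The temperature-based indices above can be computed from daily minimum and maximum temperatures. The sketch below assumes the usual ETCCDI thresholds (summer day: daily maximum above 25 °C; frost day: daily minimum below 0 °C; tropical night: daily minimum above 20 °C); the slides do not state which thresholds the pilot uses. The function name, record layout and sample values are invented for illustration.

```python
from collections import defaultdict

# Thresholds in degrees Celsius, following the ETCCDI definitions (assumed).
SUMMER_DAY_TMAX = 25.0
FROST_DAY_TMIN = 0.0
TROPICAL_NIGHT_TMIN = 20.0


def climate_indices(records):
    """records: iterable of (date 'YYYY-MM-DD', tmin, tmax), temperatures in Celsius."""
    summer_days = frost_days = tropical_nights = 0
    monthly_tmax = defaultdict(list)
    for date, tmin, tmax in records:
        summer_days += tmax > SUMMER_DAY_TMAX
        frost_days += tmin < FROST_DAY_TMIN
        tropical_nights += tmin > TROPICAL_NIGHT_TMIN
        monthly_tmax[date[:7]].append(tmax)
    # Monthly minimum value of daily maximum temperature.
    txn = {month: min(values) for month, values in monthly_tmax.items()}
    return {
        "summer_days": summer_days,
        "frost_days": frost_days,
        "tropical_nights": tropical_nights,
        "monthly_min_of_tmax": txn,
    }


# Invented sample data: (date, daily minimum, daily maximum).
sample = [
    ("2000-01-10", -3.0, 4.0),
    ("2000-01-11", 1.0, 6.0),
    ("2000-07-01", 21.0, 33.0),
    ("2000-07-02", 18.0, 26.0),
    ("2000-07-03", 16.0, 24.0),
]
indices = climate_indices(sample)
print(indices)
```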

Analytics for other applications

o Comfort indices (temperature – humidity)

o Risk for forest fires (wind speed – temperature – humidity)

o Atmospheric pollution (wind speed – vertical gradient of temperature – heat fluxes)

o Etc.
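As one possible example of a temperature and humidity comfort index, the sketch below computes Thom's discomfort index, DI = T - 0.55 (1 - 0.01 RH) (T - 14.5), with T in degrees Celsius and RH in percent. The slides do not say which comfort index the pilot uses, so this choice is an assumption made for illustration, as are the function name and input values.

```python
def discomfort_index(temp_c, rel_humidity_pct):
    """Thom's discomfort index from air temperature (Celsius) and relative humidity (%)."""
    return temp_c - 0.55 * (1.0 - 0.01 * rel_humidity_pct) * (temp_c - 14.5)


# Invented inputs: a warm day at two humidity levels.
di_dry = discomfort_index(30.0, 20.0)
di_humid = discomfort_index(30.0, 80.0)
print(di_dry, di_humid)
```

At a fixed temperature above 14.5 °C the index rises with humidity, which is the behaviour a comfort indicator of this kind is meant to capture.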


Further pilot development

Investigation regarding transparent climate NetCDF transformation tailored to the WRF model, using the BDE integrator (esp. Spark)

Testing and further development regarding data lineage and downscaling parameterisation and execution


Expected added value

Scalability and ease in managing large data sets

Efficient use of institutional resources in performing downscaling computations

o Avoiding calculating products when not needed

Data lineage

o either for existing data in the database, or for data that are not present anymore

o reproducibility


Hands-on

The Jupyter notebook is accessible at:

o https://143.233.226.108

(please bypass the warnings)