bigdataeurope
SC5 1st Pilot Hangout
BASIC AIM
To demonstrate what can be achieved through the BDE platform in:
o Managing large volumes of climate / weather numerical data
o Ingestion / exporting of data
o Analytics potential
o Data lineage
Downscaling
Downscaling of climatic and / or meteorological data:
o Essential first step for any further analysis, assessment or processing in climate and related domains
BDE SC5 Pilot I - Architecture
[Architecture diagram. Labels shown:]
o Cassandra: metadata & data lineage
o Hive/Hadoop: raw data & analytics
o WRF model
o Institutional resource connectors
o NetCDF
o Interfaces and visualisation
o SC5 Pilot
Current status
Operations
o Data ingestion (NetCDF files): both manually, for bootstrapping, and after downscaling
o Data export (NetCDF files): selection of variables / time slices
o Start and monitor WRF-based downscaling on institutional resources: if the requested results already exist, they are retrieved; if not, WRF is started
o Maintain data lineage records on the BDE platform, for monitoring and further analysis: subset of W3C PROV, http://www.w3.org/TR/prov-overview
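The retrieve-or-run behaviour and the lineage bookkeeping listed above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the pilot's implementation: the in-memory `results` and `lineage` containers stand in for the platform's stores, `run_wrf` is a hypothetical placeholder for the connector that starts WRF, the request parameters are invented, and the record fields merely borrow W3C PROV terms.

```python
import hashlib
import json
from datetime import datetime, timezone

# Hypothetical in-memory stand-ins for the platform's result and lineage stores.
results = {}
lineage = []


def request_key(params):
    """Derive a stable key from the downscaling parameters."""
    canonical = json.dumps(params, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def run_wrf(params):
    """Placeholder (assumption) for starting WRF on institutional resources."""
    return "wrfout_{}_{}.nc".format(params["region"], params["start"])


def downscale(params):
    """Return a stored result if one matches the request, otherwise run WRF."""
    key = request_key(params)
    if key in results:
        return results[key], False  # reused: nothing is recomputed
    output = run_wrf(params)
    results[key] = output
    # Lineage record loosely modelled on PROV terms (entity, activity, used).
    lineage.append({
        "entity": output,
        "wasGeneratedBy": {"activity": "wrf-downscaling", "key": key},
        "used": dict(params),
        "endedAtTime": datetime.now(timezone.utc).isoformat(),
    })
    return output, True


# Illustrative request; the parameter names and values are made up.
params = {"region": "demo", "start": "2015-07-01", "resolution_km": 4}
first, computed_first = downscale(params)
second, computed_second = downscale(dict(params))
print(first, computed_first, computed_second, len(lineage))
```

Keying results on a hash of the sorted parameters means two requests with the same settings map to the same stored product, so the second call returns the existing file name and adds no new lineage record.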
Current status
o Support basic analytics on BDE: Hive queries
o Console-based UI
o Python/Jupyter interface for demonstration
Sample analytics
Climate-change indices / analytics (indicative)
o Number of summer days, frost days
o Tropical nights
o Monthly minimum value of daily maximum temperature
o Precipitation-based statistics
o Etc.
Analytics for other applications
o Comfort indices (temperature – humidity)
o Risk for forest fires (wind speed – temperature – humidity)
o Atmospheric pollution (wind speed – vertical gradient of temperature – heat fluxes)
o Etc.
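As an indication of what the temperature-based indices listed above involve, here is a small pure-Python sketch. It assumes the commonly used ETCCDI thresholds (summer day: daily maximum above 25 °C; frost day: daily minimum below 0 °C; tropical night: daily minimum above 20 °C); the slides do not state which thresholds the pilot uses, and the sample values are invented for illustration.

```python
from collections import defaultdict

# Assumed thresholds in degrees Celsius (ETCCDI-style definitions).
SUMMER_DAY_TMAX = 25.0
FROST_DAY_TMIN = 0.0
TROPICAL_NIGHT_TMIN = 20.0


def summer_days(tmax):
    """Count days whose daily maximum temperature exceeds 25 degrees C."""
    return sum(1 for t in tmax if t > SUMMER_DAY_TMAX)


def frost_days(tmin):
    """Count days whose daily minimum temperature is below 0 degrees C."""
    return sum(1 for t in tmin if t < FROST_DAY_TMIN)


def tropical_nights(tmin):
    """Count days whose daily minimum temperature exceeds 20 degrees C."""
    return sum(1 for t in tmin if t > TROPICAL_NIGHT_TMIN)


def monthly_min_of_daily_max(dates, tmax):
    """Map each 'YYYY-MM' month to the minimum of its daily maxima."""
    by_month = defaultdict(list)
    for day, t in zip(dates, tmax):
        by_month[day[:7]].append(t)
    return {month: min(values) for month, values in sorted(by_month.items())}


# Invented sample data: ISO dates with daily max / min temperatures.
dates = ["2015-01-30", "2015-01-31", "2015-07-01", "2015-07-02"]
tmax = [4.0, 6.5, 31.2, 26.0]
tmin = [-2.0, 0.5, 21.3, 18.9]

print(summer_days(tmax), frost_days(tmin), tropical_nights(tmin))
print(monthly_min_of_daily_max(dates, tmax))
```

On the platform itself such counts would more likely be expressed as Hive aggregations over the stored variables; the functions here only make the definitions concrete.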
Further pilot development
Investigation regarding transparent climate NetCDF transformation tailored to the WRF model, using the BDE integrator (esp. Spark)
Testing and further development regarding data lineage and downscaling parameterisation and execution
Expected added value
Scalability and ease in managing large data sets
Efficient use of institutional resources in performing downscaling computations
o Avoiding calculating products when not needed
Data lineage
o either for existing data in the database, or for data that are not present anymore
o reproducibility
Hands-on
The Jupyter notebook is accessible at:
o https://143.233.226.108
(please bypass the warnings)