
AMLIGHT, Simulation Datasets, and Global Data Sharing

Jean-Bernard Minster (1,2,4,6), John J. Helly (1,2), Steven M. Day (3,4), Raul Castro Escamilla (5), Philip Maechling (4), Thomas H. Jordan (4), Amit Chourasia (2,4), Mustapha Mokrane (6)


10/10/13 AMLIGHT, Big Data, Big Network, CICESE

“Open data”

  Many countries have adopted an open data policy, at least for research and education (e.g., US, France, UK, ZA).

  This often includes the output of numerical models and simulations.

  But because national laws differ, large international organizations discuss “principles” rather than “policy”.


Data Sharing Policy

  ICSU World Data Centers (1958-2007)

  Federation of Astronomical and Geophysical Data Analysis Services (1958-2007)

 “Full and Open access to data”

 “Long-term data stewardship and curation”


Data Sharing Principles

  Group on Earth Observations (GEO, 130+ nations) / Global Earth Observation System of Systems (GEOSS), 2010-present

 Equitable, unimpeded access to data for research and education

  Long-term data preservation

 Many exceptions (national security, privacy laws, commercial protection, ecological protection)


Data Sharing Policy

  ICSU World Data System Data Policy


 “Full and Open access to data”

 “Long-term data stewardship”


WDS Data Policy


Research Data Alliance and WDS (RDA/WDS, 2013)

 Include socio-economic, health, and other data in policy discussions

 Explore data publishing concepts and issues

 Collaboration with publishers


 These principles work for observational data in the natural sciences, especially environmental data, which can never be acquired again…

 Perhaps also for socio-economic and human-health data sets (with caveats, such as aggregation)…



[Figure: The Environmental Information System Tree (Francis Bretherton). Labeled elements: measurement systems (private sector, international networks); observations & data collection; integration & validation; quality assurance; models & analysis centers; archive; synthesized core products; distribution & use, including full & open distribution of public data and data buy; end users (public and private).]

 What about numerical simulation outputs?

  The issues are many and difficult, e.g.:
    Volume (can be enormous)
    Quality (how is it measured and controlled?)
    Metadata (what should be included?)
    Costs (is it cheaper to re-compute?)
    Needs (longitudinal studies vs. single-event studies)
    Requirements for data assimilation

  Examples: weather prediction, climate simulations, earthquake simulations, earthquake prediction algorithms

  This calls for a broad discussion


Minimalist Metadata (automatic capture)

  Code version

  Hardware platform (e.g., CPU, GPU, word length)

  Software platform (e.g., compiler, compiler options)

  Input and runtime options (workflow?)

  Other (author, etc.; Dublin Core)
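As a minimal sketch of automatic capture, the Python function below records the categories listed above for one run. The function name `capture_run_metadata`, its field names, and the example inputs are hypothetical. Python's standard `platform` module stands in for whatever build system would report the real compiler and options of a production simulation code, and only two Dublin Core elements (creator, date) are filled in.

```python
import json
import platform
import sys
from datetime import datetime, timezone


def capture_run_metadata(code_version, inputs, author="unknown"):
    """Collect minimalist metadata for one simulation run (hypothetical helper)."""
    return {
        # Code version, supplied by the caller (e.g. a release tag or commit id).
        "code_version": code_version,
        # Hardware platform: CPU architecture and pointer width of this Python build.
        "hardware": {
            "machine": platform.machine(),
            "processor": platform.processor(),
            "word_length_bits": 64 if sys.maxsize > 2**32 else 32,
        },
        # Software platform: OS plus the compiler that built this interpreter.
        # A real workflow would record the simulation code's own compiler and flags.
        "software": {
            "os": platform.platform(),
            "compiler": platform.python_compiler(),
        },
        # Input and runtime options, copied as given.
        "inputs": dict(inputs),
        # Other: two Dublin Core elements.
        "dublin_core": {
            "creator": author,
            "date": datetime.now(timezone.utc).isoformat(),
        },
    }


# Hypothetical example values, echoing the TeraShake time step and step count.
meta = capture_run_metadata("v1.0", {"dt_s": 0.011, "steps": 20000}, author="A. Researcher")
print(json.dumps(meta, indent=2))
```

Writing the record as JSON next to the output files keeps it machine-readable; JSON is an assumption here, not a format the slides prescribe.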

Even then, a future re-run might not reproduce the output exactly, and many numerical outputs become obsolete.
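One reason a re-run may not reproduce earlier output bit for bit is that floating-point addition is not associative: changing the order of a reduction, as can happen when a parallel code runs on a different number of cores or is built with different compiler optimizations, can change the last digits of a result. The short example below shows the effect on three numbers; it illustrates the general phenomenon and is not taken from the TeraShake or M8 codes.

```python
def sum_in_order(values):
    """Add values strictly left to right, with no compensation."""
    total = 0.0
    for v in values:
        total += v
    return total


values = [0.1, 0.2, 0.3]
forward = sum_in_order(values)             # (0.1 + 0.2) + 0.3
backward = sum_in_order(reversed(values))  # (0.3 + 0.2) + 0.1
print(forward, backward, forward == backward)  # 0.6000000000000001 0.6 False
```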


TeraShake Simulation (2004)



M8 Simulation (2010)


TeraShake vs. M8 comparison

Quantity               | TeraShake          | M8                          | Notes
Dimensions             | 600 × 300 × 80 km  | 810 × 405 × 85 km           |
Number of cells        | 2 × 10⁹            | 436 × 10⁹                   |
Time step              | 0.011 s            | 0.0023 s                    |
Steps (duration)       | 20,000 (180 s)     | 160,000 (368 s)             |
Cores                  | 240 (DataStar)     | 223,074 (CPU); 16,600 (GPU) |
Wall-clock time        | 5 days             | 24 h (CPU)*; 5 h (GPU)**    | *220 Tflop/s; **2.3 Pflop/s
Checkpoint interval    | Every 1,000th step | Every 20,000th step         |
Checkpoint size, each  | 150 GB             | 32 TB                       | M8: cannot transfer
Checkpoints, total     | 3 TB               | 192 TB                      | M8: one every 4 h
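The checkpoint rows can be cross-checked with a few lines of arithmetic: 20 TeraShake checkpoints of 150 GB give 3 TB, and six 32 TB M8 checkpoints (one every 4 hours over the 24-hour CPU run) give 192 TB. The sketch also estimates the best-case time to move a single M8 checkpoint; the 10 and 100 Gb/s link rates are illustrative assumptions, not figures from the slides, and decimal units (1 TB = 10¹² bytes) are assumed.

```python
TB = 1e12  # decimal terabyte, in bytes (assumed unit convention)


def total_checkpoint_bytes(bytes_per_checkpoint, n_checkpoints):
    """Total storage for n checkpoints of equal size."""
    return bytes_per_checkpoint * n_checkpoints


def ideal_transfer_hours(n_bytes, link_gbps):
    """Best-case transfer time: payload bits divided by line rate (no overhead)."""
    return n_bytes * 8 / (link_gbps * 1e9) / 3600


# TeraShake: 20,000 steps with a checkpoint every 1,000th step, 150 GB each.
terashake_n = 20_000 // 1_000
terashake_total = total_checkpoint_bytes(150e9, terashake_n)

# M8: 32 TB per checkpoint, one every 4 hours of the 24-hour CPU run.
m8_n = 24 // 4
m8_total = total_checkpoint_bytes(32 * TB, m8_n)

print(terashake_n, terashake_total / TB)  # 20 3.0
print(m8_n, m8_total / TB)                # 6 192.0

# Hypothetical link rates for moving ONE 32 TB M8 checkpoint.
for gbps in (10, 100):
    print(gbps, "Gb/s:", round(ideal_transfer_hours(32 * TB, gbps), 2), "hours")
```

The loop reports about 7.11 hours at 10 Gb/s and 0.71 hours (roughly 43 minutes) at 100 Gb/s. Real throughput is lower than line rate, which is consistent with the table's "cannot transfer" note for M8 checkpoints.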


TeraShake vs. M8 comparison (continued)

Output                           | TeraShake                          | M8                                                        | Notes
Surface velocity vector field    | All nodes, every step: 1.1 TB      | Every other node, every 20th step: 4.4 TB (out of 352 TB) | Resolution OK for visualization
Volume velocity field, total     | All nodes, all steps: 432 TB       | All nodes, all steps: 384 PB                              |
Volume velocity field, decimated | All nodes, every 10th step: 45 TB* | Every other node, every 20th step: 4.8 PB                 |
Typical visualization movie      | < 100 GB                           | < 100 GB                                                  | Interactive visualization possible

*No longer usefully readable because of tape read errors.
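The decimated entries follow from simple sampling arithmetic. Assuming the stored size scales linearly with the number of saved samples (no headers or compression), keeping every other node along each of the two surface axes and every 20th step retains 1/(2² × 20) = 1/80 of the data, which reproduces the M8 surface figure of 4.4 TB out of 352 TB. The function name `decimated_size` is hypothetical.

```python
def decimated_size(full_size, spatial_stride, spatial_dims, time_stride):
    """Size remaining after spatial and temporal decimation.

    Keeps every `spatial_stride`-th node along each of `spatial_dims` axes and
    every `time_stride`-th time step. Assumes size is proportional to the
    number of stored samples (no headers, no compression).
    """
    reduction_factor = spatial_stride ** spatial_dims * time_stride
    return full_size / reduction_factor


# M8 surface velocity field: 352 TB at full resolution on a 2-D surface,
# keeping every other node and every 20th step.
print(decimated_size(352, 2, 2, 20))  # 4.4
```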


So… what to save?

  Possible strategy: save only enough to allow interactive (user- or purpose-specific) visualization, and use checkpoints to restart partial calculations. This works for single-event simulations (e.g., a 1-day weather forecast or a single earthquake). AMLIGHT permits that.

  Save selected individual visualizations that characterize the run (small data sets). AMLIGHT makes this easy.

  For long-term longitudinal research, such as climate research or earthquake prediction algorithms, some output may require long-term curation by a trusted repository… This must be discussed on a case-by-case basis. AMLIGHT makes the data repository look proximal.
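The checkpoint-restart part of this strategy can be sketched as follows: to regenerate output for a window of time steps, restart from the last checkpoint at or before the window and recompute only up to its end. The sketch assumes a checkpoint at every multiple of the checkpoint interval (step 0 being the initial condition) and a solver that can restart from any of them; `restart_plan` and the example window are hypothetical.

```python
def restart_plan(first_step, last_step, checkpoint_interval):
    """Return (restart_step, steps_to_compute) for regenerating first_step..last_step.

    Assumes a checkpoint exists at every multiple of checkpoint_interval,
    including step 0 (the initial condition).
    """
    if checkpoint_interval <= 0:
        raise ValueError("checkpoint_interval must be positive")
    if not 0 <= first_step <= last_step:
        raise ValueError("need 0 <= first_step <= last_step")
    # Last checkpoint at or before the first step we need.
    restart_step = (first_step // checkpoint_interval) * checkpoint_interval
    # Steps restart_step + 1 .. last_step must be recomputed.
    steps_to_compute = last_step - restart_step
    return restart_step, steps_to_compute


# Hypothetical window in a TeraShake-like run (checkpoint every 1,000th step).
print(restart_plan(12_345, 12_900, 1_000))  # (12000, 900)
```

With TeraShake's interval of 1,000 steps, regenerating steps 12,345 to 12,900 costs 900 steps instead of 12,900 from the start. With M8's interval of 20,000 steps, up to 19,999 steps before the window may have to be recomputed, which is part of the trade-off between checkpoint frequency and checkpoint storage.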


TeraShake Visualization


Emmett McQuinn, Amit Chourasia

M8 Visualization

