SSRL/LCLS Users Meeting SLAC 2014-10-09 Filipe R. N. C. Maia Lab. of Mol. Biophysics Uppsala University Data collection at XFELs: the Data Deluge

SSRL/LCLS Users Meeting, SLAC

2014-10-09

Filipe R. N. C. Maia

Lab. of Mol. Biophysics

Uppsala University

Data collection at XFELs: the Data Deluge

Data Deluge

Special Issue: Dealing with Data, Science, February 2011

“Most scientific disciplines are finding the data deluge to be extremely challenging, and tremendous opportunities can be realized if we can better organize and access the data.”

-- Science Staff

Data Production vs. Storage

• “In 2007 the world produced more data than could fit in the world’s storage”.

• Data is growing at 58% while storage is growing at 40%.

• The sensor (or the experiment) is no longer the bottleneck.

Baraniuk, R.G. More is less: signal processing and the data deluge. Science (New York, N.Y.) 331, 717-9 (2011).
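The two growth figures quoted on this slide can be turned into a rough picture of how quickly the gap widens. The sketch below is only illustrative: it assumes both percentages are annual rates that compound steadily, which the slide does not state, and the function name `ratio_after` is made up for this example.

```python
import math

def ratio_after(years, data_growth=0.58, storage_growth=0.40):
    """Factor by which the data/storage ratio has grown after `years`.

    Assumes both rates are annual and compound steadily (an assumption,
    not something stated on the slide).
    """
    return ((1 + data_growth) / (1 + storage_growth)) ** years

# Years needed for the data/storage ratio to double under the same assumption.
doubling_time = math.log(2) / math.log(1.58 / 1.40)

for years in (1, 5, 10):
    print(f"after {years:2d} years: ratio grown by {ratio_after(years):.2f}x")
print(f"ratio doubles roughly every {doubling_time:.1f} years")
```

Under these assumptions the mismatch is modest in any single year but compounds, which is the sense in which storage, rather than the sensor, becomes the limiting factor.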

Fast X-ray Detectors

CXI detector, frame rate 120 Hz

AGIPD-1M, frame rate ~3500 Hz
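To get a feel for what these frame rates mean in terms of raw data volume, a back-of-the-envelope estimate can be sketched as follows. Only the frame rates come from the slide. The pixel counts (about 2.3 megapixels for the CXI detector, about 1 megapixel for AGIPD-1M) and the 2 bytes per pixel are assumptions made for illustration, and `data_rate_mb_per_s` is a hypothetical helper.

```python
def data_rate_mb_per_s(frame_rate_hz, n_pixels, bytes_per_pixel=2):
    """Uncompressed detector output in MB/s (1 MB = 1e6 bytes)."""
    return frame_rate_hz * n_pixels * bytes_per_pixel / 1e6

# Frame rates are from the slide; pixel counts and bytes per pixel are assumed.
cxi = data_rate_mb_per_s(120, 2_300_000)
agipd = data_rate_mb_per_s(3500, 1_000_000)

print(f"CXI detector: {cxi:.0f} MB/s")
print(f"AGIPD-1M:     {agipd:.0f} MB/s")
print(f"ratio:        {agipd / cxi:.1f}x")
```

Whatever the exact pixel counts and bit depths turn out to be, the estimate scales linearly with frame rate, so the jump from 120 Hz to about 3500 Hz dominates the result.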

Tales of a Particle Physicist

Siegfried Bethke working on the JADE detector in 1984.

Curry, A. Rescue of old data offers lesson for particle physicists. Science (New York, N.Y.) 331, 694-5 (2011).

The JADE Lead-Glass Electromagnetic Shower Detector

The PETRA ring at DESY

The outcome of the archaeological endeavor.

It should not be this hard!

The detector stopped, but is the experiment really over?

Data Availability

• “Science is driven by data.”

• “…making data widely available is an essential element of scientific research.”

• “it is a growing challenge to ensure that data… are appropriately described, standardized, archived and available to all.”

• “We must all accept that science is data and that data are science, and thus provide for, and justify the need for the support of, much improved data curation.”

Hanson, B., Sugden, A. & Alberts, B. Making data maximally available. Science (New York, N.Y.) 331, 649 (2011).

http://cxidb.org

Coherent X-ray Imaging Data Bank


Goals:

• Help to maximize results obtained from data

• Facilitate the reproducibility of results

• Allow theoreticians to test their ideas on real data

• Expand the community beyond those lucky few who got beamtime.

• Preserve datasets for future analysis.

Reproducibility

• CXIDB aims to foster reproducible research.

• Experimental data is crucial to reproducibility.

• CXIDB also maintains a list of relevant software.

Jon Claerbout, geophysics professor at Stanford

“An article about computational science in a scientific publication isn’t the scholarship itself, it’s merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures.”

-- Jon Claerbout

Maximize Efficiency of the Facilities

• 4th-generation light sources produce mountains of data.

• They are serial, as opposed to the parallel nature of synchrotrons.

• The result is a few groups with too much data, while others are data-starved.

• Sharing is a clear solution.

Challenges to Data Sharing

• Lack of adequate rewards.

• Not viewed as a valuable scholarly endeavour.

• Fear of being scooped.

• Certain disciplines manage to overcome this: astronomy, oceanography, genomics.