Netherlands eScience Center Optimizing discovery in the big science era www.esciencecenter.nl Prof. Dr. Wilco Hazeleger

Optimizing discovery in the big science era · 2015-03-12 · Optimizing discovery in the big science era ... NLeSC priority domains (demand-driven from science) I. Environment &


Page 1

Netherlands eScience Center

Optimizing discovery in the big science era

www.esciencecenter.nl

Prof. Dr. Wilco Hazeleger

Page 2

The world around us

• Science and society are intimately connected

• Science becomes increasingly problem-driven

• Science increasingly inter-, multi-, trans-disciplinary

Page 3

The Big Science era

Page 4

Mission

Enabling digitally enhanced research through efficient use of scientific software, data, and e-infrastructure

Page 5

Application Domains: Life Sciences & eHealth, Environment & Sustainability, Humanities & Social Sciences, Physical World & Beyond

e-Infrastructure: Computing, Networking, Storage & Visualization

Page 6

Organisation

Founding organisations (since 2011):

• NWO – Netherlands Organisation for Scientific Research (2.7 M€ p.a.)

• SURF – Dutch higher education and research partnership for ICT (2.7 M€ p.a.)

Collaborative projects between NLeSC, academic partners and industry, which include our digital scientists; funded in cash and in kind.

NLeSC research program on generic eScience concepts and tools.

Page 7

NLeSC priority domains (demand-driven from science)

I. Environment & Sustainability

Climate, ecology, energy, logistics, water management, agriculture & food

II. Life Sciences & eHealth

Next generation sequencing, biobanking, molecules & man

III. Humanities & Social Sciences

SMART cities, text analysis, eBusiness, creative technologies

IV. Physics and beyond

Astronomy, high-energy physics, advanced materials, engineering & manufacturing

Page 8

NLeSC eScience competences applied in science

Big data analytics: statistics, machine learning, visualisation, text mining

Optimized data handling: database optimization, structured & unstructured data, real-time data

Efficient computing: distributed & accelerated computing, efficient algorithms

Page 9

eScience Research Engineers = Digital Scientists

– broadly oriented scientists at the interface of research and IT

– collaborating with domain researchers to implement eScience concepts and tools

– mostly PhDs with domain knowledge and IT skills

– involved in projects, funded in cash and in kind

Page 10

eScience Technology Platform

Core of NLeSC expertise; promotes exchange and re-use of best practices

• Repository

– compute kernels, interfaces, libraries, tools, and scientific workflows

• Knowledge base

– professional coding standards, coding styles, unit and integration testing, and documentation

• Expertise center & meeting place

Page 11

Page 12

Collaborative Project Examples

Astronomy

Project Leader: Marco de Vos, ASTRON

Neuroimaging

Project Leader: Paul Tiesinga, Univ. of Nijmegen

eChemistry

Project Leader: Lars Ridder, Univ. of Wageningen

eScience Engineer: Marijn Sanders

Climatology

Project Leader: Henk Dijkstra, Univ. Utrecht

eScience Engineer: Jason Maassen

eEcology

Project Leader: Willem Bouten, Univ. Amsterdam

eFood Research

Project Leader: Wynand Alkema, Univ. Nijmegen

Life Sciences

Project Leader: Jan Willem Boiten, CTMM

Water Management

Project Leader: Prof. Nick van de Giesen, TU Delft

eHumanities

Project Leader: Guus Schreiber, Free Univ. Amsterdam

Green Genetics

Project Leader: Bernard de Geus, TTi Green Genetics

Page 13

Collaborative Project Examples

Massive Point Clouds

Project Leader: Peter van Oosterom, Delft University of Technology

Sim-City

Project Leader: Peter Sloot, UVA

SPuDisc

Project Leader: Maarten de Rijke, Univ Amsterdam

Summer in the City

Project Leader: Bert Holtslag, Univ. of Wageningen

eVisualization

Project Leader: Edwin Valentijn, Univ. Groningen

TwiNL

Project Leader: Antal van den Bosch, Univ. of Nijmegen

ODEX4All

Project Leader: Barend Mons, Leiden Univ.

eSiBayes

Project Leader: Willem Bouten, Univ. Amsterdam

AMUSE

Project Leader: Simon Portegies Zwart, Leiden Univ.

Via Appia

Project Leader: Henk Scholten, VU Amsterdam

Page 14

e-Ecology

NLeSC and UVA (Prof. W. Bouten)

Page 15

Annotation tool and learning

Page 16

Annotation tool and learning

Page 17

Acceleration data to behavior

Machine learning

– Labeled training set

– Trained model

Schema from Natural Language Processing with Python, by Steven Bird, Ewan Klein and Edward Loper, Copyright © 2009
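
The step from a labeled training set to a trained model that maps acceleration data to behavior can be sketched as follows. This is a minimal illustration, not the project's actual pipeline: the features (mean and standard deviation per window), the nearest-centroid classifier, the behavior labels and the sample values are all assumptions made for the example.

```python
from math import dist
from statistics import mean, pstdev


def features(window):
    """Summarize one window of acceleration magnitudes (hypothetical features)."""
    return (mean(window), pstdev(window))


def train(labeled_windows):
    """Build a model: one centroid in feature space per behavior label."""
    grouped = {}
    for window, label in labeled_windows:
        grouped.setdefault(label, []).append(features(window))
    return {
        label: tuple(mean(axis) for axis in zip(*feats))
        for label, feats in grouped.items()
    }


def predict(model, window):
    """Assign the label whose centroid is closest to the window's features."""
    f = features(window)
    return min(model, key=lambda label: dist(model[label], f))


# Made-up labeled training set: (acceleration samples in g, behavior label).
training_set = [
    ([1.00, 1.01, 0.99, 1.00], "resting"),
    ([1.02, 0.98, 1.00, 1.01], "resting"),
    ([0.40, 1.90, 0.30, 2.10], "flying"),
    ([0.50, 1.80, 0.45, 2.00], "flying"),
]

model = train(training_set)
print(predict(model, [0.99, 1.00, 1.01, 1.00]))
print(predict(model, [0.35, 2.05, 0.40, 1.95]))
```

A nearest-centroid rule is used here only because it fits in a few lines; any supervised classifier could take its place once the windows are labeled.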

Page 18

e-Food

NLeSC and Prof. Alkema (Radboud Univ Nijmegen) and Dr Tops (VU and WUR)

Page 19

Classifying compounds according to taste

Literature sources: 24 million scientific abstracts

Taste ontology (~500 terms), e.g. sweet, sour, bitter, umami, salty, ropiness, TASR1

Ingredient ontology (~40.000 terms; derived from ChEBI, plus a number of known food proteins), e.g. mannitol, sucrose, sorbitol, alpha-terpineol, 4-methylpentanoic acid, ethyl propionate, flavonoid, caffeine

Steps: tag abstracts with both ontologies, calculate compound ontology profiles, store
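
The tag, calculate and store steps can be illustrated with a small co-occurrence sketch. It is an assumption about how such profiles might be computed, not the project's method: the word-matching tagger, the co-occurrence counts and the four sample abstracts are invented for the example, and the term sets are tiny stand-ins for the two ontologies.

```python
from collections import Counter, defaultdict

# Tiny stand-ins for the two ontologies (terms taken from the slide).
TASTE_TERMS = {"sweet", "sour", "bitter", "umami", "salty"}
INGREDIENT_TERMS = {"mannitol", "sucrose", "sorbitol", "caffeine"}


def tag(abstract, terms):
    """Return the ontology terms that occur as whole words in an abstract."""
    words = {w.strip(".,;()").lower() for w in abstract.split()}
    return words & terms


def build_profiles(abstracts):
    """Count, per ingredient, how often each taste term co-occurs with it."""
    profiles = defaultdict(Counter)
    for abstract in abstracts:
        tastes = tag(abstract, TASTE_TERMS)
        for ingredient in tag(abstract, INGREDIENT_TERMS):
            profiles[ingredient].update(tastes)
    return profiles


def classify(profiles, ingredient):
    """Pick the most frequently co-occurring taste, or None if untagged."""
    counts = profiles.get(ingredient)
    return counts.most_common(1)[0][0] if counts else None


# Made-up abstracts, for illustration only.
abstracts = [
    "Sucrose is the reference sweet compound.",
    "Caffeine contributes a bitter note to coffee.",
    "Mannitol and sorbitol are sweet polyols.",
    "The bitter taste of caffeine was measured.",
]

profiles = build_profiles(abstracts)
print(classify(profiles, "caffeine"))
print(classify(profiles, "sucrose"))
```

At the scale on the slide the tagging step would need a proper text-mining tool; the point here is only the shape of a compound profile, a count of taste terms per compound.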

Page 20

Point clouds

NLeSC with TU-Delft

Page 21

Point clouds

• Set of data points in some coordinate system

• In 3D coordinate system, points defined by X, Y, Z

• Possible to have more attributes. Ex: Color (R, G, B)

Page 22

Actual Height Model of the Netherlands (AHN2):

• NL surface

• 640 billion points

• 6 – 10 points per m²

• 12 attributes

• 20 bytes/point

• 60000 files

LAS 11.64 TB

LAZ 1 TB

Massive point clouds for eSciences
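
The storage figures can be checked with a little arithmetic: 640 billion points at 20 bytes each is 1.28 × 10^13 bytes, which matches the quoted LAS size if "TB" is read as a binary terabyte (2^40 bytes). The sketch below also packs one 20-byte record. That record layout is an assumption modelled on LAS point data record format 0; the slides do not say which record format the AHN2 files use.

```python
import struct

POINTS = 640e9          # 640 billion points (from the slide)
BYTES_PER_POINT = 20    # from the slide
LAZ_TIB = 1.0           # assumption: the slide's "1 TB" is a binary terabyte

raw_bytes = POINTS * BYTES_PER_POINT
raw_tib = raw_bytes / 2**40     # convert bytes to binary terabytes (TiB)
print(f"uncompressed: {raw_tib:.2f} TiB")
print(f"LAS/LAZ ratio: about {raw_tib / LAZ_TIB:.1f}x")

# Assumed layout, modelled on LAS point data record format 0:
# x, y, z (int32), intensity (uint16), return/flag bits (uint8),
# classification (uint8), scan angle (int8), user data (uint8),
# point source id (uint16).
RECORD = struct.Struct("<iiiHBBbBH")
print(RECORD.size)
```

Under these assumptions the compressed LAZ files are roughly a factor of 11 to 12 smaller than the uncompressed LAS data.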

Page 23

Point cloud databases

• Loading

• Organization

• Indexing

• Clustering

• Blocking

• Compressing

• Querying

• Parallel processing

• Level of detail / Data pyramid
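
Blocking and querying, two of the topics listed on this slide, can be illustrated with a uniform grid: points are grouped into square blocks, and a rectangle query only visits the blocks it overlaps. This is a generic sketch, not the approach of any particular point cloud database; the block size, function names and sample points are assumptions.

```python
from collections import defaultdict

BLOCK_SIZE = 10.0  # block edge length in metres (arbitrary choice)


def block_key(x, y):
    """Map a coordinate to the integer index of its grid block."""
    return (int(x // BLOCK_SIZE), int(y // BLOCK_SIZE))


def build_blocks(points):
    """Group (x, y, z) points by the grid block they fall in."""
    blocks = defaultdict(list)
    for p in points:
        blocks[block_key(p[0], p[1])].append(p)
    return blocks


def query(blocks, xmin, ymin, xmax, ymax):
    """Return points inside a rectangle, visiting only overlapping blocks."""
    (bx0, by0), (bx1, by1) = block_key(xmin, ymin), block_key(xmax, ymax)
    hits = []
    for bx in range(bx0, bx1 + 1):
        for by in range(by0, by1 + 1):
            for x, y, z in blocks.get((bx, by), []):
                if xmin <= x <= xmax and ymin <= y <= ymax:
                    hits.append((x, y, z))
    return hits


# Made-up sample points (x, y, z).
points = [(1.0, 2.0, 0.5), (12.0, 3.0, 1.2), (25.0, 25.0, 3.1), (14.5, 8.0, 0.9)]
blocks = build_blocks(points)
print(sorted(query(blocks, 10.0, 0.0, 20.0, 10.0)))
```

The block is the unit that a database could also compress, index or hand to a parallel worker, which is why blocking sits alongside those topics in the list.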

Page 24

Point cloud databases

Massive point clouds for eSciences

Page 25

e-Watercycle

NLeSC and TU-Delft, Utrecht Univ (Prof. N. vd Giesen)

Page 26

Enabling digitally enhanced research through efficient use of scientific software, data and e-infrastructure

• Deals with data, data, data… and computing

• Domain-overarching solutions need cross-discipline expertise, well-defined interfaces, and standardization

– eScience technology platform: software & expertise

– Application in domain sciences

Page 27

Thank you