Data Centric Systems
David Turek
VP High Performance and Cognitive Computing
© 2016 International Business Machines Corporation
R&D in the IT Era
Theory/Knowledge
Experiment
Simulation
Massive improvements:
• Applicability & ease of use
• Simulation fidelity
• Scalability
• Throughput
thanks to Supercomputing
Source: Top500.org
Implementing Exact-Exchange in CPMD
>99% Parallel Efficiency to over 6.2M threads
Studying Li-Air batteries, 1736 atoms, 70 Ry cutoff
V. Weber, T. Laino, C. Bekas, A. Curioni, A. Bertsch, S. Futral IPDPS 13
ACM Gordon Bell Prize 2013
14.4 PFLOP/s at 73% of peak performance, with I/O
2 orders of magnitude improvement in
• scale of the problem (from 128 to 15K bubbles)
• time to solution
Compute specifics:
13 Trillion elements, 1.2TBytes compressed I/O
per time step, 6.4 M threads
IBM, ETHZ, TUM, LLNL
Success in Petascale computing: CFD can achieve Linpack-like sustained performance
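As a quick sanity check on the figures above, the sustained rate and the fraction of peak together imply a machine peak, and the element and thread counts imply a per-thread workload. The sketch below does only this arithmetic; the variable names are illustrative, and the derived values are implied by the slide's numbers rather than stated on it.

```python
# Back-of-the-envelope arithmetic from the figures quoted on this slide.
sustained_pflops = 14.4   # sustained performance, PFLOP/s
fraction_of_peak = 0.73   # 73% of peak
elements = 13e12          # 13 trillion elements
threads = 6.4e6           # 6.4 M threads

# Peak implied by "sustained = fraction * peak" (derived, not a quoted figure).
implied_peak_pflops = sustained_pflops / fraction_of_peak

# Average number of elements handled by each thread (derived).
elements_per_thread = elements / threads

print(f"implied peak: {implied_peak_pflops:.1f} PFLOP/s")
print(f"elements per thread: {elements_per_thread:,.0f}")
```

Roughly two million elements per thread gives a sense of how much work each thread needs in order to keep communication and I/O from dominating at this scale.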
ACM Gordon Bell Prize 2015
97% sustained scalability for a fully implicit solver
1.6M cores, 3.2M MPI processes, 602B DoF
IBM, UT Austin, NYU, CALTECH
Success in Petascale computing: Implicit linear solvers do scale!
New
Product
Opportunistic
Discovery
by Humans
Simulation
Experiments
R&D Today
But we cannot beat complexity with brute-force simulation. Traditional
discovery has limits: we need a new, data-driven, holistic approach.
Companies need to easily access
quickly growing and widely
diverse information sources.
• Highly unstructured/dark
• Current human-based
approach is not scalable
Domain related inference is largely
missing. Setting up and deploying the
right simulations is very hard.
• Human-capital intensive, not
scalable
Internal evidence and experiments
are driven primarily empirically,
often by brute force, and their results
are isolated from the wider knowledge
space.
Knowledge
Evidence & Experiments
Inference & Simulation
Create a knowledge space specific to the
technical area from all relevant
sources. Link it with company data.
Use knowledge space to
• Drastically augment internal know-how & modeling
• Focus on which experiment is relevant
• Embed results in knowledge base
Use inference on the knowledge space
& simulation on the models
• To augment the knowledge space
• Sharpen simulation models
• Make precise decisions
Cognitive Discovery
Drastically accelerate pace
of systematic discovery
and maximize ROI for R&D
Rapid and Precise Materials R&D
drives new value for our clients
Pharma, Materials, Engineering &
Manufacturing
Science,
Products &
Economics
Experimental
Results
Knowledge Inference & Simulation
Evidence & Experiments
Simulation
Document Ingestion: PDF
Domain Specific Knowledge
Graphs
Domain Specific ML +
Inference
NLQ + ML Driven
Simulations
Automatic Hypothesis
Discovery
Fully Automated
Reasoning
Fully Automated
Discovery
mature
Ideation
KNOWLEDGE EXTRACTION &
REPRESENTATION
INFERENCE DRIVEN
SIMULATIONS
AUTOMATED TECHNICAL
REASONING
Literature Review
Non-scalable, human-based outsourcing:
• Limited sources
• Non-systematic; limited re-use
1
Chemical/Physical/Eng. modeling & simulations
• Expert material scientists
• Empirical: no inference
• Trial and error based: no systematic knowledge buildup
2
Lab tests
Time/money costly
• Empirical (slow: many tests)
• No systematic knowledge buildup & connection
3
Years Months Months Months
INGESTION SIMULATION ANALYSIS
Pdf-parser:
• Parses the PDF code and presents the raw data of the PDF (text cells, embedded images and vector graphics) in a consumable format
Pdf-interpreter:
• Captures ground truth through a massive crowd-sourcing Big Data system
• Uses HPC for ML techniques (Deep Learning) to train automatic annotation models
Semantic-representation:
• Uses HPC & Big Data systems to obtain a semantic representation of the original text in JSON format
Billions of documents
Millions of concurrent users
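To make the three stages above concrete, here is a minimal sketch of how parsed text cells might be labeled and serialized to JSON. Everything in it is an assumption for illustration: the cell fields (`page`, `bbox`, `text`), the label names, and the rule-based `annotate` function, which merely stands in for the trained annotation models the slide describes. It is not the actual schema or pipeline.

```python
import json

# Hypothetical parser output: raw text cells with page number and bounding box.
# Field names are assumptions for illustration, not the actual schema.
raw_cells = [
    {"page": 1, "bbox": [72, 700, 540, 730], "text": "Corrosion of Steel Alloys"},
    {"page": 1, "bbox": [72, 640, 540, 690],
     "text": "We study crack growth in humid environments."},
    {"page": 1, "bbox": [72, 600, 540, 630], "text": "Table 1: Alloy compositions"},
]

def annotate(cell):
    """Stand-in for a trained annotation model: a trivial rule-based labeler."""
    text = cell["text"]
    if text.startswith("Table"):
        label = "caption"
    elif len(text.split()) <= 5 and not text.endswith("."):
        label = "title"
    else:
        label = "paragraph"
    return {**cell, "label": label}

def to_semantic_json(cells):
    """Label each cell and serialize the result, keeping the input order."""
    annotated = [annotate(c) for c in cells]
    doc = {"pages": sorted({c["page"] for c in annotated}), "elements": annotated}
    return json.dumps(doc, indent=2)

semantic = to_semantic_json(raw_cells)
print(semantic)
```

Keeping the layout information (page, bounding box) next to the semantic label lets downstream search and inference steps trace any extracted statement back to its place in the source document.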
Weeks
Deep
Search
Lab tests experiments data
Simulation
& Inference
Scientific literature & internal
reports
Design alloys to avoid catastrophic failure that can
lead to huge liabilities
• Corrosion
• Cracks
• Special environmental and deployment
conditions
DAYS
Knowledge
space
• Atomistic simulations
• Deep Learning based property prediction
• Typically HPC development is focused
on increased speed.
• The fastest calculation is the one
which you don’t run!
• Can we use machine learning to make
better decisions on which simulations
give the most value?
• Can we use machine learning to
improve resolution of information?
‘Cognitive’ workflow uses 1/3 of the calculations to achieve 4 orders of magnitude resolution increase
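The idea of choosing which simulations to run can be sketched as a simple budgeted selection loop. This is an illustrative assumption, not the workflow behind the figure above: `expensive_simulation` is a hypothetical stand-in for a costly calculation, and the distance to the nearest already-evaluated point is used as a crude proxy for model uncertainty where a real workflow would use a trained model. The one-third budget only mirrors the slide's number as an example and makes no claim about the resulting resolution.

```python
import math

def expensive_simulation(x):
    """Hypothetical stand-in for a costly simulation of one parameter x."""
    return math.sin(3 * x) + 0.5 * x

def pick_next(candidates, evaluated):
    """Choose the candidate farthest from any evaluated point.

    Distance to the nearest evaluated point serves as a crude proxy for
    uncertainty; a real workflow would query a trained surrogate model.
    """
    return max(candidates, key=lambda c: min(abs(c - e) for e in evaluated))

def run_budgeted(candidates, budget):
    """Run at most `budget` simulations, chosen one at a time."""
    remaining = list(candidates)
    first = remaining.pop(0)
    results = {first: expensive_simulation(first)}
    while remaining and len(results) < budget:
        nxt = pick_next(remaining, results.keys())
        remaining.remove(nxt)
        results[nxt] = expensive_simulation(nxt)
    return results

candidates = [i / 10 for i in range(30)]  # 30 candidate parameter values
budget = len(candidates) // 3             # run only a third of them
results = run_budgeted(candidates, budget)

print(f"ran {len(results)} of {len(candidates)} simulations")
print(sorted(results))
```

Each skipped candidate is a calculation that never has to run; the quality of the outcome then depends entirely on how well the selection rule identifies the informative points.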
On-prem, customer managed
(Bluemix Local)
IBM Cloud
private
X86, Power & Z X86 based systems
On-prem,
IBM
managed
Off-prem, IBM managed
(Bluemix Public or Dedicated)
Linux
4/11/2018 IBM Confidential
kube-arbitrator
GPFS/Parallel object store
Spectrum MPI
Spectrum LSF Conductor w/Spark Symphony
XLc/C/Fortran
Compute Accelerators (GPUs, AI, FPGA, etc.) // High Performance Network (RoCE, IB, RRC) // NVMe, Flash
Math libraries: ESSL, GPU, AI
AI frameworks (PowerAI, DLaaS)
Workflow Managers (TCaaS)
HPC, AI Applications
xCAT Provisioning
Ubiquity Storage drivers
Knowledge
Space
Simulation
Weeks
Evidence/Experiments
• Supercomputing
• Quantum and new computing paradigms
• Inference (ML)
Ingest data and create massive knowledge spaces
Link evidence with knowledge spaces. Drive deep search