HPC and Big Data convergence
Serge G. Petiton, Université Lille 1 and CNRS, France; Nahid Emad, University of Versailles, France; Laurent Bobelin, Université François Rabelais, France
Outline
• Introduction
• Big Data and/versus HPC
• The PageRank method and eigenvector computations
• The Map-Reduce method and distributed algorithms
• Convergence between HPC and Big Data
• Conclusion
1st FJG workshop, April 5th
Introduction
• HPC
  – simulation: size of data increasing, new domains, stability, …
  – global reduction problems and communication problems
  – and all the well-known problems discussed this week at CSE 2017 …
• Big Data:
  – enough data to avoid statistical models?
  – data ranking
  – map-reduce operations (indexing, …)
• HPC and Big Data: convergence? Cooperation? Mutual exchanges?
  – Joint problems: energy consumption, I/O minimization, resilience, …
  – Nevertheless, the intrinsic behaviors are quite different
“Convergence” of HPC and Big Data: what languages, programming paradigms, systems, … for what methods, sciences and future developments?
Statistics-model-free Big Data: the flu epidemic
Aggregation of Google search data to estimate current flu activity in near real-time.
A slide from Yutong Lu: the software stack of the Tianhe supercomputers.
MPI/OpenMP + Hadoop/Spark on the supercomputer software stack.
Hadoop and Spark on Tianhe-2, with MPI and OpenMP.
Also asking for smarter middleware on such machines.
BUT nowadays HPC and Big Data platforms are different:
• HPC storage is expensive and interconnected; Big Data storage uses local, cheap disks
• HPC and Big Data platforms are not at the same scale: billions of cores vs. thousands of machines
• HPC tries to make full use of CPUs/GPUs; Big Data is focused on I/O

Nowadays HPC and Big Data rely on different theoretical grounds:
• HPC performs, most of the time, a mix of task and data parallelism
• Inter-task communications are really different
• Big Data has a tendency to treat all nodes as equals; MapReduce, for example, has good scaling properties when all nodes perform a Map simultaneously
• Big Data relies on statistics and good probabilistic properties of the data distribution, while HPC code explicitly distributes data

Things are changing fast, and some Big Data environments (Hadoop, for example) now include more support for task parallelism, but there is still an enormous gap between the two.
What HPC/Big Data applications? And what programming paradigms? And what about data ranking and HPC with “Big Data”?
Nowadays HPC and Big Data stacks are different:
• Performance: Big Data exhibits larger latency compared to HPC; Big Data scaling often implies larger tasks
• HPC has been optimized for almost 50 years, versus about 10 years for Big Data
• HPC code is often optimized for its hardware platform, while Big Data is not that deep into the architecture
• The languages and philosophies used in the two may be different

Many projects have failed due to the complexity of integrating both stacks.

Some projects so far have addressed the problem, mainly based on Hadoop + MPI, with no real large adoption:
• Integrating HPC tools into the Hadoop environment (Hamster, …): most of the time not adequate due to application latency; disappointing results
• Deploying Hadoop on demand on HPC resources (HoD, …): most of the time not adequate because of hardware requirements
Flu epidemic: a proposed PageRank-like model (Nahid Emad)
PageRank-like model vs. PageRank model:
• an individual in a social graph ↔ a web page in the web graph
• a virus ↔ a random walker
• propagation of the virus ↔ walk of the walker
• the PageRank of an individual is the probability of being infected by the virus in the course of the epidemic ↔ the PageRank of a page is the probability of the presence of the walker on that page
With a mathematical formalism and a dominant-eigenvector computation, we rank the individuals with respect to the virus propagation.
Mathematical formalism
By the Perron–Frobenius theorem, λ = 1 is the largest eigenvalue of the matrix P. Then there is a stationary distribution for the final state of the epidemic spread: Px = x.
Let xi be the probability that individual i is infected during the epidemic; x = (x1, x2, …, xn), the stationary distribution (infection vector) for the whole population, is independent of the starting distribution and verifies Px = x.
The impact of the infection vector x in the social graph is similar to that of the PageRank vector in the web graph.
Problem → Solution
• Dangling individual → add a loop to itself
• Small world: non-uniqueness of the ranking vector → add a jumping vector to the random virus-propagation process
Computational algorithms

A = αP + (1 − α) v zᵀ

• A is the disease-transition matrix
• v is the teleportation vector
• z is the vector (1, …, 1)ᵀ
• α (< 1) is the damping factor
• 1 − α is the jumping rate: the probability for the virus to jump from any individual to any other individual in the social graph

Then, computation of the dominant eigenvector of A (very large, non-Hermitian and very sparse).
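As a sketch (not from the slides), the stationary infection vector can be computed matrix-free by power iteration: since z = (1, …, 1)ᵀ and x stays a probability vector, zᵀx = 1 and Ax = αPx + (1 − α)v, so A never has to be formed. The 4-individual column-stochastic matrix below is a made-up example:

```python
import numpy as np

# Made-up 4-individual propagation graph; P is column-stochastic:
# P[i, j] = probability that the virus moves from individual j to i.
P = np.array([
    [0.0, 0.5, 0.0, 1/3],
    [0.5, 0.0, 0.5, 1/3],
    [0.5, 0.5, 0.0, 1/3],
    [0.0, 0.0, 0.5, 0.0],
])
n = P.shape[0]
alpha = 0.85                      # damping factor
v = np.full(n, 1.0 / n)           # uniform teleportation vector

x = np.full(n, 1.0 / n)           # any starting distribution works
for _ in range(1000):
    # A @ x = alpha * P @ x + (1 - alpha) * v * (z^T x), with z^T x = 1
    x_new = alpha * (P @ x) + (1 - alpha) * v
    if np.linalg.norm(x_new - x, 1) < 1e-14:
        x = x_new
        break
    x = x_new

print(x)   # stationary infection vector; entries sum to 1
```

Because the damping factor bounds the contraction rate by α, convergence is geometric and independent of the starting vector, exactly as the slide states.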
Computational algorithms: MIRAM

[Figure: several co-methods run in parallel, each from its own initial guess w_k and subspace size m_k: initial guess w_k → projection of P_n onto K_{m_k} → solving P_{m_k} → return to ℂⁿ, with asynchronous information exchange between the co-methods.]
With n = 10^11, c = 10^3, m = 10^3, if iter = 100 and b = 10, MIRAM needs iter × b = 10^3 petaflop (1 exaflop) and b × 10 = 100 petabytes.
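For illustration only, here is a minimal dense-NumPy sketch of one such co-method: an explicitly restarted Arnoldi process (project onto K_m, solve the small m × m problem, restart with the Ritz vector). MIRAM would run several of these with different subspace sizes m_k and exchange the best restart vector asynchronously; the function name and the toy 50 × 50 stochastic matrix are invented for the example:

```python
import numpy as np

def arnoldi_dominant(A, w, m, restarts=20):
    """One IRAM-like co-method, explicitly restarted: build an Arnoldi
    basis of K_m(A, w), solve the small m x m eigenproblem, and restart
    from the dominant Ritz vector."""
    n = A.shape[0]
    x = w / np.linalg.norm(w)
    for _ in range(restarts):
        V = np.zeros((n, m + 1))
        H = np.zeros((m + 1, m))
        V[:, 0] = x
        for j in range(m):                      # Arnoldi process
            q = A @ V[:, j]
            for i in range(j + 1):              # modified Gram-Schmidt
                H[i, j] = V[:, i] @ q
                q -= H[i, j] * V[:, i]
            H[j + 1, j] = np.linalg.norm(q)
            if H[j + 1, j] < 1e-14:             # invariant subspace found
                break
            V[:, j + 1] = q / H[j + 1, j]
        k = j + 1
        evals, evecs = np.linalg.eig(H[:k, :k]) # small projected problem
        top = np.argmax(np.abs(evals))
        x = np.real(V[:, :k] @ evecs[:, top])   # Ritz vector back in R^n
        x /= np.linalg.norm(x)
    return evals[top].real, x

rng = np.random.default_rng(1)
M = rng.random((50, 50))
P = M / M.sum(axis=0)                           # toy column-stochastic matrix
lam, x = arnoldi_dominant(P, rng.random(50), m=5)
print(lam)   # close to 1, the Perron eigenvalue of a stochastic matrix
```

Production codes use implicit restarting (as in ARPACK) rather than this explicit restart, but the projection/solve/restart cycle matches the figure.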
Experiments: stochastic simulation using the infection vector
Time series of infection in a 7010-node power-law (BA) social graph, with ν = 0.2, δ = 0.24 and x = 5.
Experiments
Convergence behavior for the 281903 × 281903 Stanford matrix, α = 0.85.
Number of matrix–vector products for the 281903 × 281903 Stanford graph.
Big Data: microbiota (Nahid Emad with J.-M. Batto, INRA MGP)
• 2 kg: more bacteria than human cells (60·10^16)
• An unknown organ: the intestinal microbiota
• The amount of sequence generated has increased 10^9-fold in 20 years
bacteria · protists · archaea · viruses · fungi
Big Data: microbiota
• bacteria: ≈ 3000 genes
• parasites: ≈ 6000 genes
• viruses: ≈ 50 genes
• 100–1000 individuals
• up to 10 million genes
DNA preparation → get sequences → compare to reference → counting & analyzing
Big Data: microbiota
MetaProf → energy efficiency multiplied by 4.7 with the GPU implementation.
[Figure: a counting matrix of 10^6 genes by 800 samples, and the genes × genes correlation matrix derived from it]
Principal Coordinates Analysis applied to the matrices of distances between samples, concentrating the major variations of the samples into a small space, implies using many linear-algebra techniques: SVD computation of very large matrices.
Numerical methods/algorithms and HPC techniques have to be defined/adapted to increase data scalability.
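As an illustrative sketch (toy numbers, not the MetaProf data), classical PCoA squares the pairwise distances, double-centers them, and eigendecomposes the result; with Euclidean distances the leading coordinates reproduce the distances exactly:

```python
import numpy as np

# Toy samples-by-genes counting matrix (4 samples, 3 genes; invented numbers)
counts = np.array([
    [10.0, 0.0, 3.0],
    [ 9.0, 1.0, 2.0],
    [ 0.0, 8.0, 7.0],
    [ 1.0, 9.0, 6.0],
])
n = counts.shape[0]

# Euclidean distance matrix between samples
D = np.linalg.norm(counts[:, None, :] - counts[None, :, :], axis=-1)

# Classical PCoA: double-center the squared distances, then eigendecompose
J = np.eye(n) - np.ones((n, n)) / n             # centering matrix
B = -0.5 * J @ (D ** 2) @ J
evals, evecs = np.linalg.eigh(B)
order = np.argsort(evals)[::-1]                 # largest eigenvalues first
coords = evecs[:, order] * np.sqrt(np.maximum(evals[order], 0.0))

# coords[:, :2] are the principal coordinates of the 4 samples
print(coords[:, 0])
```

At the 10^6 × 800 scale of the slides, the dense eigendecomposition would be replaced by a (possibly randomized or distributed) truncated SVD, which is exactly the data-scalability point made above.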
PageRank is often part of a larger application.
Indexing: a Map-Reduce algorithm
PageRank ranks all the pages: a computation on a non-Hermitian sparse matrix of size several billion, and growing. The ranking is done before the queries.
There exist applications asynchronously mixing different approaches:
• HPC
• Indexing
• Data ontology
• …
We then have distributed, asynchronous, parallel computation.
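A minimal in-memory sketch of such a Map-Reduce indexing step (toy corpus and function names invented for the example): Map emits (word, doc_id) pairs and Reduce groups them into an inverted index:

```python
from collections import defaultdict

# Toy corpus: document id -> text (invented)
docs = {
    0: "big data meets hpc",
    1: "hpc needs fast networks",
    2: "big graphs need pagerank",
}

def map_phase(doc_id, text):
    """Map: emit one (word, doc_id) pair per word occurrence."""
    for word in text.split():
        yield word, doc_id

def reduce_phase(pairs):
    """Reduce: group the doc ids by word, yielding the inverted index."""
    index = defaultdict(set)
    for word, doc_id in pairs:
        index[word].add(doc_id)
    return index

pairs = [p for d, t in docs.items() for p in map_phase(d, t)]
index = reduce_phase(pairs)
print(sorted(index["hpc"]))   # → [0, 1]
```

In a real framework the pairs are shuffled by key across nodes between the two phases; that shuffle is what the map-reduce scaling argument above rests on.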
Applications mixing linear algebra (or HPC-like computation, including PageRank, …) and Map-Reduce (or other classical “Big Data” algorithms):
• in sequence
• in parallel, asynchronously
• iteratively
There exist new approaches for Map-Reduce (YARN, Tez, …).
There exist new languages and programming paradigms for HPC.
Both have to mix distributed and parallel computation, asynchronously or not.
We have to target post-petascale machines and future exascale computing.
We have to propose solutions adapted to supercomputers and data centers.
Geoscience: Kirchhoff migration
Map-Reduce, or other Big Data indexing or ontology analysis.
• Masnida Amami, Ali Setayesh, Nasrin Jaberi: Distributed Computing of Seismic Imaging Algorithms (Iran)
• N. B. Rizvandi, A. J. Boloori, N. Kamyabpour, et al.: MapReduce implementation of prestack Kirchhoff time migration (PKTM) on seismic data (Australia)
• A Spark version developed by researchers from Nanjing University (China)
Map-Reduce is used to optimize I/O and communications for some Kirchhoff migration methods. Nevertheless, the Map-Reduce is done before the Kirchhoff computation itself.
What about computational methods mixing Map-Reduce (or other typical Big Data algorithms) and “classical” HPC computation (linear algebra, FFT, convolution, …) during the computation?
• Mixing languages
• Data structures
• Platforms: clouds, grids, supercomputers, …
• Systems, scheduling, …
• I/O …
Starting from the Big Data side: first experiments, for some versions of Kirchhoff.
YARN containers: containers are clusters of nodes.
[Figure: YARN architecture: the ResourceManager, the Node Managers, and the Node Managers’ containers on the hardware nodes]
Distributed and parallel two-level programming, with DAGs.
Tez: a DAG of MapReduce tasks/jobs.
Some elements on YML
• The YML framework is dedicated to developing and running parallel and distributed applications on clusters, clusters of clusters, and supercomputers (schedulers and middleware would have to be optimized for more integrated computers; cf. “K” and OmniRPC, for example).
• Independent from systems and middleware
  – End users can reuse their code with another middleware
  – Currently the main system is OmniRPC
• Component approach
  – Components are defined in XML
  – Three types: Abstract, Implementation (in Fortran, C or C++, XMP, …), Graph (parallelism)
  – Reuse and optimization
• The parallelism is expressed through a graph description language named Yvette (after the river in Gif-sur-Yvette, where the ASCI lab was). LL(1) grammar, easy to parse.
• Deployed in France, Belgium, Ireland, Japan (T2K, K), China, Tunisia, and the USA (LBNL, TOTAL-Houston).
Graph (n dimensions) of components/tasks (YML)
[Figure legend: begin node, end node, graph node, dependence, generic component node]

par
  compute tache1(..); notify(e1);
//
  compute tache2(..); migrate matrix(..); notify(e2);
//
  wait(e1 and e2);
  par (i := 1; n) do
    par
      compute tache3(..); notify(e3(i));
    //
      if (i < n) then
        wait(e3(i+1)); compute tache4(..); notify(e4);
      endif;
    //
      compute tache5(..); control robot(..); notify(e5); visualize mesh(..);
    end par
  end do par
//
  wait(e3(2:n) and e4 and e5);
  compute tache6(..); compute tache7(..);
end par
Abstract component

<?xml version="1.0"?>
<component type="abstract" name="prodMat" description="Matrix-matrix product">
  <params>
    <param name="matrixBkk" type="Matrix" mode="in"/>
    <param name="matrixAki" type="Matrix" mode="inout"/>
    <param name="blocksize" type="integer" mode="in"/>
  </params>
</component>

Future:
<param name="conv" type="graph_param_float" mode="inout"/>
End users would be able to add some expertise here (libraries: ScaLAPACK, NAG, PETSc).
Implementation component

<?xml version="1.0"?>
<component type="impl" name="prodMat" abstract="prodMat" description="Implementation component of a matrix product">
  <impl lang="CXX">
    <header/>
    <source><![CDATA[
      int i, j, k;
      double **tempMat;
      // Allocation (tempMat allocated and zero-initialized)
      for (k = 0; k < blocksize; k++)
        for (i = 0; i < blocksize; i++)
          for (j = 0; j < blocksize; j++)
            tempMat[i][j] = tempMat[i][j] + matrixBkk.data[i][k] * matrixAki.data[k][j];
      for (i = 0; i < blocksize; i++)
        for (j = 0; j < blocksize; j++)
          matrixAki.data[i][j] = tempMat[i][j];
      // Deallocation
    ]]></source>
    <footer/>
  </impl>
</component>
End users may add some expertise here (we’ll see an example).
An implementation component may be developed using a parallel language (such as XMP):

<impl lang="XMP" nodes="CPU:(5,5)" libs="">
  <distribute>
    <param template="block,block" name="A(100,100)" align="[i][j]:(j,i)"/>
    <param template="block" name="Y(100);X(100)" align="[i]:(i,*)"/>
  </distribute>
Graph component of Block Gauss-Jordan Method
Tez and YML:
• share (more or less) the same DAG / graph-of-tasks vision
• each task, in both, may itself be parallel
• share the same structure, with a two-part scheduler: one DAG-based part and one responsible for resource allocation
• as Tez maintains a container pool, it can provide YML with a stable platform
It may not be perfect, but at least it can provide a prototype to get a better view of the inherent problems of scheduling.
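As a toy sketch of the DAG-based half of such a scheduler (task names and the compute/mapreduce tagging are invented), Kahn's algorithm releases a task once all of its dependencies have completed:

```python
from collections import deque

# Invented mixed DAG: task -> (kind, prerequisite tasks)
dag = {
    "index":    ("mapreduce", []),
    "pagerank": ("compute",   ["index"]),
    "stats":    ("mapreduce", ["index"]),
    "report":   ("compute",   ["pagerank", "stats"]),
}

def schedule(dag):
    """Kahn's algorithm: a task is released once all of its dependencies
    are done (the resource-allocation half of the scheduler is omitted)."""
    indeg = {t: len(deps) for t, (_, deps) in dag.items()}
    children = {t: [] for t in dag}
    for t, (_, deps) in dag.items():
        for d in deps:
            children[d].append(t)
    ready = deque(t for t, k in indeg.items() if k == 0)
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for c in children[t]:
            indeg[c] -= 1
            if indeg[c] == 0:
                ready.append(c)
    return order

print(schedule(dag))   # a valid topological order of the four tasks
```

The resource-allocation half would then map each released task onto the container pool, which is exactly what Tez could contribute to YML in the prototype described above.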
Graph (n dimensions) of components/tasks (YML)
[Figure legend: begin node, end node, graph node, dependence, generic component node]

par
  compute tache1(..); notify(e1);
//
  compute tache2(..); notify(e2);
//
  reduce tache8(..); notify(e8);
//
  wait(e1 and e2);
  par (i := 1; n) do
    par
      compute tache3(..); notify(e3(i));
    //
      if (i < n) then
        wait(e3(i+1)); compute tache4(..); notify(e4);
      endif;
    //
      wait(e8); map tache9(..);
    end par
  end do par
//
  wait(e3(2:n) and e4);
  compute tache6(..); compute tache7(..);
end par
par
  compute tache1(..); notify(e1);
//
  compute tache2(..); notify(e2);
//
  mapreduce tache8(..); notify(e8);
//
  wait(e1 and e2);
  par (i := 1; n) do
    par
      compute tache3(..); notify(e3(i));
    //
      if (i < n) then
        wait(e3(i+1)); compute tache4(..); notify(e4);
      endif;
    //
      wait(e8); mapreduce tache9(..);
    end par
  end do par
//
  wait(e3(2:n) and e4);
  compute tache6(..); compute tache7(..);
end par
This relies on information in the “implementation components”, such as “reduce”, “map”, or others.
It would be possible to avoid keywords such as “compute”, “mapreduce”, “migrate” or others and to specify that on the abstract components, but it would be less clear for the end user and/or programmer.
Then we may have graphs of tasks/components mixing HPC and Big Data computation.
The difficulties are to schedule these tasks/components efficiently, to minimize communications and energy consumption, and to achieve good resilience.
Conclusion
• Adapted high-level programming paradigms have to be developed, allowing schedulers to manage both the graphs of control tasks and the dataflow graphs, and allowing HPC and Big Data computation to be mixed.
• Multi-level programming is shared by HPC and Big Data (a graph of distributed “parallel” computations).
• The convergence of HPC and Big Data computation is an important challenge: the key point is to propose languages, programming paradigms and schedulers that can be adapted to all the different approaches of this convergence.
• Applications with huge data modified during computation, asking for operations such as map-reduce in parallel with classic HPC, will be more and more present in the important future exascale challenges.
• High-performance artificial intelligence (asking for HPC and huge amounts of data) would use such programming paradigms.