
Please note - OpenPOWER Foundation · Total Heap Memory x Degrees of Separation on Spark Disk CAPI/Flash ... Worklight®, X-Force® and System z® Z/OS, are trademarks of International Business Machines


Page 1

Please note

• IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion.

• Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision.

• The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract.

• The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.

• Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.

4/11/16 1

Page 2

Agenda

• What is Apache Spark?
• How does Spark perform on OpenPOWER?
• Leveraging OpenPOWER innovation under Spark
• Questions

Page 3

What is Apache Spark?

• Unified Analytics Platform
– Combine streaming, graph, machine learning and SQL analytics on a single platform
– Simplified, multi-language programming model
– Interactive and Batch

• In-Memory Design
– Pipelines multiple iterations on a single copy of data in memory
– Superior Performance
– Natural Successor to MapReduce


Fast and general engine for large-scale data processing

Spark Core API | R | Scala | SQL | Python | Java

Spark SQL | Streaming | MLlib | GraphX
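The "In-Memory Design" point can be illustrated with a small sketch in plain Python rather than Spark itself: an iterative algorithm reads its data once and then makes many passes over the same in-memory copy. The `load_points` helper, the toy data, and the training parameters are all hypothetical and chosen only for illustration; in a real Spark job the comparable step would usually be caching an RDD or DataFrame before iterating.

```python
import math


def load_points():
    """Hypothetical stand-in for an expensive read; counts how often it runs."""
    load_points.calls += 1
    # (feature, label) pairs for a toy one-dimensional classifier
    return [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]


load_points.calls = 0


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(points, iterations=100, step=0.5):
    """Logistic regression by gradient descent: every iteration rescans the data."""
    w = 0.0
    for _ in range(iterations):
        grad = sum((sigmoid(w * x) - y) * x for x, y in points) / len(points)
        w -= step * grad
    return w


cached = load_points()   # read once and keep the copy in memory
weight = train(cached)   # 100 passes over that same copy
print("loads:", load_points.calls, "weight:", round(weight, 3))
```

However many iterations run, `load_points.calls` stays at 1, which is the property the slide attributes to pipelining iterations over a single in-memory copy.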

Page 4

Measuring Performance of Spark on POWER

The following charts show performance results comparing multiple Spark workloads from SparkBench, using data sizes from 100GB to 10TB (https://github.com/SparkTC/spark-bench)

7-node cluster of Intel Haswell servers
• E5-2620 v3
• 12-core
• 256GB

vs. 7-node cluster of OpenPOWER servers
• POWER8 S812LC
• 10-core
• 256GB

• Machine Learning (Spark MLlib)
– Matrix Factorization
– Logistic Regression
– Support Vector Machine

• SQL (Spark SQL)

sqlContext.sql("SELECT COUNT(*) FROM orderTab").count()

sqlContext.sql("SELECT COUNT(*) FROM orderTab where bid>5000").count()

sqlContext.sql("SELECT * FROM oitemTab WHERE price>250").count()

sqlContext.sql("SELECT * FROM oitemTab WHERE price>500").count()

sqlContext.sql("SELECT * FROM orderTab r JOIN oitemTab s ON r.oid = s.oid").count()

• Graph (Spark GraphX)
– Page Rank
– Triangle Count
– Singular Value Decomposition++ (SVD++)
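To make the SQL workload concrete, the five queries listed on this slide can be tried on toy data. This is only a sketch: it uses Python's built-in `sqlite3` module as a stand-in for Spark SQL, and the table schemas, the `iid` column, and the rows are assumptions (the slide only names `orderTab`, `oitemTab`, `oid`, `bid`, and `price`). Counting the rows of each result mirrors the trailing `.count()` calls.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schemas and rows; only the table and column names used in
# the queries (orderTab, oitemTab, oid, bid, price) come from the slide.
conn.executescript("""
CREATE TABLE orderTab (oid INTEGER, bid INTEGER);
CREATE TABLE oitemTab (iid INTEGER, oid INTEGER, price REAL);
INSERT INTO orderTab VALUES (1, 4000), (2, 6000), (3, 9000);
INSERT INTO oitemTab VALUES
    (10, 1, 100.0), (11, 1, 300.0), (12, 2, 600.0), (13, 3, 700.0);
""")

queries = [
    "SELECT COUNT(*) FROM orderTab",
    "SELECT COUNT(*) FROM orderTab WHERE bid > 5000",
    "SELECT * FROM oitemTab WHERE price > 250",
    "SELECT * FROM oitemTab WHERE price > 500",
    "SELECT * FROM orderTab r JOIN oitemTab s ON r.oid = s.oid",
]

# Count the rows in each result set, as .count() does on the slide.
row_counts = [len(conn.execute(q).fetchall()) for q in queries]
for query, n in zip(queries, row_counts):
    print(n, "row(s):", query)
```

The first two queries are aggregates, so each result holds a single row and its row count is 1 regardless of table size; the scan and join queries return one row per match.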

Page 5

System Performance of Spark on POWER

4/6/2016 5

Chart: Relative System Performance (y-axis, 0 to 3) by Spark workload (x-axis)
X-axis entries: E5-2620 v3; 100GB Mat. Fact.; 100GB (in mem) LR; 1TB (in mem) LR; 1TB (50/50) LR; 1TB SVM; 10TB LR; 1TB 5 query; 2TB 5 query; 130GB Page Rank; 1TB Triangle Cnt; 1TB SVD++; AVERAGE
Workload groups: Machine Learning, SQL, Graph
Callout: 1.7X

Page 6

Price Performance of Spark on POWER

• Spend 33% less on infrastructure supporting the same amount of workload
• Spend the same on infrastructure but host 50% more workload

* Based on preliminary SoftLayer pricing targets; subject to change

Chart groups: Machine Learning, SQL, Graph
Callout: 1.5X
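The two bullets on this slide are consistent with the 1.5X figure if it is read as a price-performance ratio (work done per unit of spend, relative to the baseline). That reading is an assumption; the slide does not state it explicitly. A quick check of the arithmetic, with hypothetical function names:

```python
def spend_reduction_for_same_workload(ratio):
    """Fraction of spend saved when the workload is held constant."""
    return 1.0 - 1.0 / ratio


def extra_workload_for_same_spend(ratio):
    """Fraction of additional workload hosted when spend is held constant."""
    return ratio - 1.0


ratio = 1.5  # assumed price-performance advantage taken from the callout
saving = spend_reduction_for_same_workload(ratio)  # 1 - 1/1.5
extra = extra_workload_for_same_spend(ratio)       # 1.5 - 1
print(f"{saving:.0%} less spend, or {extra:.0%} more workload")
# prints "33% less spend, or 50% more workload"
```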

Page 7

POWER Advantages for Spark

• Streaming and SQL benefit from High Thread Density and Concurrency
– Processing multiple packets of a stream and different stages of a message stream pipeline
– Processing multiple rows from a query

• Machine Learning benefits from Large Caches and Memory Bandwidth
– Iterative Algorithms on the same data
– Fewer core pipeline stalls and overall higher throughput

• Graph also benefits from Large Caches, Memory Bandwidth and Higher Thread Strength
– Flexibility to go from 8 SMT threads per core to 4 or 2
– Manage Balance between thread performance and throughput
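The SMT flexibility mentioned on this slide trades thread count against per-thread resources. As a rough illustration, assuming the 10-core POWER8 configuration listed earlier in the deck and counting only hardware threads (the function name is hypothetical):

```python
def hardware_threads(cores, smt_mode):
    """Hardware threads exposed to the OS for a given SMT mode."""
    if smt_mode not in (2, 4, 8):
        raise ValueError("this sketch only covers the SMT modes named on the slide")
    return cores * smt_mode


CORES = 10  # assumption: the 10-core POWER8 configuration from the test cluster
threads_by_mode = {mode: hardware_threads(CORES, mode) for mode in (8, 4, 2)}
for mode, threads in threads_by_mode.items():
    print(f"SMT{mode}: {threads} hardware threads")
```

Running fewer threads per core leaves each thread a larger share of the core's resources, which is the balance between thread performance and throughput that the slide refers to.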


Page 8

Leveraging OpenPOWER Innovation


Chart: "Total Heap Memory x Degrees of Separation on Spark"
Y-axis: Run time (ms), 0 to 400,000
Series: Disk, CAPI/Flash

CAPI Flash for RDD Cache = 4X memory reduction at equal performance

RDMA for Spark Shuffle = 30% Better Response Time, Lower CPU Utilization, Lower Memory Footprint

CAPI Flash and RDMA can be Leveraged Transparently to Spark Applications

Page 9

Accelerating Spark with GPUs

• Adverse Drug Reaction Prediction built on Spark
• 25X Speed up for Building Model stage (using Spark MLlib Logistic Regression)
• Again, Transparent to the Spark Application
• Game changer for Personalized Medicine
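The 25X figure applies to the model-building stage; how much a whole application speeds up depends on what share of its run time that stage accounts for. The sketch below applies Amdahl's law with made-up stage shares (the shares are hypothetical and do not come from the presentation):

```python
def overall_speedup(stage_fraction, stage_speedup):
    """Amdahl's law: end-to-end speedup when only one stage is accelerated."""
    if not 0.0 <= stage_fraction <= 1.0:
        raise ValueError("stage_fraction must be between 0 and 1")
    return 1.0 / ((1.0 - stage_fraction) + stage_fraction / stage_speedup)


STAGE_SPEEDUP = 25.0  # from the slide: speedup of the model-building stage
for fraction in (0.5, 0.9, 1.0):  # hypothetical shares of total run time
    total = overall_speedup(fraction, STAGE_SPEEDUP)
    print(f"stage share {fraction:.0%}: overall speedup {total:.1f}X")
```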


Page 10

Summary

• Spark is a new disruptive technology for big data analytics
• OpenPOWER systems can provide leadership performance and economics for Spark deployments
• OpenPOWER innovations can provide additional acceleration and value to Spark


Page 11

Notices and Disclaimers (1 of 2)

Copyright © 2016 by International Business Machines Corporation (IBM). No part of this document may be reproduced or transmitted in any form without written permission from IBM.

U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM.

Information in these presentations (including information relating to products that have not yet been announced by IBM) has been reviewed for accuracy as of the date of initial publication and could include unintentional technical or typographical errors. IBM shall have no responsibility to update this information. THIS DOCUMENT IS DISTRIBUTED "AS IS" WITHOUT ANY WARRANTY, EITHER EXPRESS OR IMPLIED. IN NO EVENT SHALL IBM BE LIABLE FOR ANY DAMAGE ARISING FROM THE USE OF THIS INFORMATION, INCLUDING BUT NOT LIMITED TO, LOSS OF DATA, BUSINESS INTERRUPTION, LOSS OF PROFIT OR LOSS OF OPPORTUNITY. IBM products and services are warranted according to the terms and conditions of the agreements under which they are provided.

Any statements regarding IBM's future direction, intent or product plans are subject to change or withdrawal without notice.

Performance data contained herein was generally obtained in a controlled, isolated environments. Customer examples are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual performance, cost, savings or other results in other operating environments may vary.

References in this document to IBM products, programs, or services does not imply that IBM intends to make such products, programs or services available in all countries in which IBM operates or does business.

Workshops, sessions and associated materials may have been prepared by independent session speakers, and do not necessarily reflect the views of IBM. All materials and discussions are provided for informational purposes only, and are neither intended to, nor shall constitute legal or other guidance or advice to any individual participant or their specific situation.

It is the customer’s responsibility to insure its own compliance with legal requirements and to obtain advice of competent legal counsel as to the identification and interpretation of any relevant laws and regulatory requirements that may affect the customer’s business and any actions the customer may need to take to comply with such laws. IBM does not provide legal advice or represent or warrant that its services or products will ensure that the customer is in compliance with any law.

 


Page 12

Notices and Disclaimers (2 of 2)

Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products in connection with this publication and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. IBM does not warrant the quality of any third-party products, or the ability of any such third-party products to interoperate with IBM’s products. IBM EXPRESSLY DISCLAIMS ALL WARRANTIES, EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

The provision of the information contained herein is not intended to, and does not, grant any right or license under any IBM patents, copyrights, trademarks or other intellectual property right.

IBM, the IBM logo, ibm.com, Aspera®, Bluemix, Blueworks Live, CICS, Clearcase, Cognos®, DOORS®, Emptoris®, Enterprise Document Management System™, FASP®, FileNet®, Global Business Services®, Global Technology Services®, IBM ExperienceOne™, IBM SmartCloud®, IBM Social Business®, Information on Demand, ILOG, Maximo®, MQIntegrator®, MQSeries®, Netcool®, OMEGAMON, OpenPower, PureAnalytics™, PureApplication®, pureCluster™, PureCoverage®, PureData®, PureExperience®, PureFlex®, pureQuery®, pureScale®, PureSystems®, QRadar®, Rational®, Rhapsody®, Smarter Commerce®, SoDA, SPSS, Sterling Commerce®, StoredIQ, Tealeaf®, Tivoli®, Trusteer®, Unica®, urban{code}®, Watson, WebSphere®, Worklight®, X-Force® and System z® Z/OS, are trademarks of International Business Machines Corporation, registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at: www.ibm.com/legal/copytrade.shtml.


Page 13

Revolutionizing the Datacenter

Join the Conversation #OpenPOWERSummit

Thank You! Questions?