
Integrating Big Data for the Enterprise

Melli Annamalai, Product Manager

Rob Abbott, Consulting Engineer

Oracle Big Data Development, October 1, 2014

Copyright © 2014, Oracle and/or its affiliates. All rights reserved.

Safe Harbor Statement

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.

Program Agenda

1. Customer Use Cases
2. Oracle Big Data Connectors: Load, SQL Query, Analyze
3. Oracle Big Data SQL: Optimized SQL Access for Engineered Systems
4. Oracle Data Integrator: Comprehensive Data Integration
5. Oracle GoldenGate: Real-time Replication

Big Data Analytic Services
• Measure effect of new marketing campaign
• Quick access to weblogs in Hadoop, combine with data in database

Business Transformation
• Leading Spanish bank: > 13M customers interact via ATMs, web, mobile, branches; numerous acquisitions over the years
• Collect & unify all relevant information in ‘Data Pool’

Network Performance
• Clean and process network monitoring data on Hadoop
• Load into database

Usage-based Insurance
• Track driving parameters integrated with location data on Hadoop
• Load into database

Measure effect of marketing campaign
• Customer subscriptions in database
• Online activity in weblogs in Hadoop

Collect and unify in Data ‘Pool’
• > 13M customers interact via ATMs, web, mobile, branches; numerous acquisitions over the years

Usage-based insurance
• Track driving parameters integrated with location data on Hadoop
• Driver profiles, policies in database

Monitor network performance
• Clean and process network monitoring data in Hadoop
• Load into database for further analysis

Driving Business Value from Technology Innovation

Use the Right Tool for the Job and benefit from the Power of “AND”

Relational: Run the Business
• Integrate existing systems
• Support mission-critical tasks
• Protect existing expenditures
• Ensure skills relevance

Hadoop: Change the Business
• Disrupt competitors
• Disintermediate supply chains
• Leverage new paradigms
• Exploit new analyses

NoSQL: Scale the Business
• Serve data faster
• Meet mobile challenges
• Scale-out economically

Focus in this Session

Relational
• Data organized for fast query
• Structured schema
• Complex programming models
• Read, write, delete, update
• Access specific record

Hadoop
• Data in files
• Schema on read
• Simple programming model for large scale data processing
• Append only
• Sequential access of blocks

“The implementation of this Big Data solution will help CaixaBank remain at the forefront of innovation in the financial sector, delivering the best and most competitive services to our customers” – Juan Maria Nin, Chief Executive Officer, CaixaBank

Integrating Big Data

• Data Load
• Data Access
• Data Staging
• Data Preparation
• Data Reservoir
• Exploratory Analysis
• Deep Analytics
• Real-time Replication

Required features
• Fewer new interfaces
• Uniform access methods
• Easy to use
• Performance

Oracle Big Data Connectors

• Data Load: Oracle Loader for Hadoop
• Data Access: Oracle SQL Connector for HDFS
• R Analytics: Oracle R Advanced Analytics on Hadoop
• Oracle Data Integrator Knowledge Modules
• XML/XQuery: Oracle XQuery on Hadoop

[Diagram labels: XQuery; R Client]

• Optimized for Hadoop: maximise parallelism
• Fast performance
• Analyze data on Hadoop using familiar client tools

Certified Hadoop and Database Versions

Database versions (on any operating system*)
• 10.2.0.5 and greater
• 11.2.0.3 and greater
• 12c

Hadoop versions            Certified by
Apache Hadoop 2.x          Oracle
CDH 4.x (Cloudera)         Oracle
CDH 5.x (Cloudera)         Oracle
HDP 1.3 (Hortonworks)      Hortonworks
HDP 2.1 (Hortonworks)      Hortonworks

*Oracle SQL Connector for HDFS requires Hadoop client to be supported on the operating system

Analyze on Hadoop with Oracle Big Data Connectors

Oracle XQuery for Hadoop

Input formats: Text, Avro, JSON, XML

• Massively scalable XQuery processing in Hadoop
• XQueries processed in parallel with MapReduce
• Query XML with Hive with XML extensions
• Oozie integration

    for $ln in text:collection()
    let $f := tokenize($ln, ",")
    where $f[1] = 'x'
    return text:put($f[2])
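To make the query's logic concrete, here is a minimal Python sketch of the same filter-and-project step, run on a small in-memory list rather than on files in HDFS. The function name `select_second_field` and the sample lines are hypothetical and only for illustration; the sketch says nothing about how Oracle XQuery for Hadoop compiles or parallelizes the query.

```python
def select_second_field(lines, key="x", sep=","):
    """Yield the second field of every line whose first field equals `key`."""
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        # XQuery sequences are 1-indexed: $f[1] is fields[0], $f[2] is fields[1].
        # Lines with fewer than two fields are skipped in this sketch.
        if len(fields) >= 2 and fields[0] == key:
            yield fields[1]


# Hypothetical sample input standing in for text:collection().
sample = ["x,alpha,1", "y,beta,2", "x,gamma,3"]
result = list(select_second_field(sample))
print(result)
```

Each line is handled independently of the others, which is the property that lets a query of this shape be split across parallel map tasks.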

Oracle XQuery for Hadoop: Query Output Options

    for $ln in text:collection()
    let $f := tokenize($ln, ",")
    where $f[1] = 'x'
    return text:put($f[2])

[Diagram labels: Text; Avro; JSON; XML; Oracle NoSQL Database; Oracle Database]

Oracle R Advanced Analytics for Hadoop

• Pre-packaged predictive analytics algorithms
• Familiar R interface (for data scientists)
• Customer: Credit behavior evaluation
• Enabled faster analytics, simpler solution, and better behavior model

R Client
• R algorithms: Neural, GLM, LM, kMeans, NMF, LMF
• Data movement, sampling, statistics

Hadoop
• Parallel MapReduce calls

Oracle Loader for Hadoop: High Speed Load from Hadoop to Oracle Database

Oracle Loader for Hadoop

• Parallel load, optimized for Hadoop
• Automatic load balancing
• Convert to Oracle format on Hadoop
  – Save database CPU
• Load specific Hive partitions
• Kerberos authentication
• Load directly into In-Memory table

Input formats: JSON, log files, Hive, Text, Parquet, Avro, Sequence files, compressed files, and more …

Oracle Loader for Hadoop: Performance

• Extremely fast performance
• Sample numbers (on Oracle Engineered Systems)
  – 4.4 TB/hour end-to-end (load + Hadoop process)
  – 12+ TB/hour load time
• Much higher than typical customer requirements
• Optimized for Oracle Big Data Appliance and Oracle Exadata: InfiniBand connectivity

Oracle Loader for Hadoop: Concurrency

• Uses very few database CPU cycles
• Maximizes concurrency on database
• Enables large and continuous loads concurrently with applications

[Chart legend: Oracle Loader for Hadoop; External table load; External table load of Oracle Loader for Hadoop generated Data Pump files]

Oracle Loader for Hadoop: Automatic Load Balancing

Real data is skewed
• Evenly distributed load: Time = X
• When one task loads more rows than others: Time = 2…10 X or more

Oracle Loader for Hadoop

• Intelligent sampling to distribute load evenly across load processes
• Fine tune load properties for data distribution in current job
• Maintain repeatable load performance

Automatic Load Balancing

[Figure: two bar charts over R1 to R189 (vertical axis 0 to 14), labeled “Unbalanced load” and “Balanced load”]

Load time: >10x faster
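The idea of sampling the input to spread skewed data evenly across load tasks can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Oracle Loader for Hadoop's actual algorithm: the partition sizes are made up, the sampler takes every tenth row, and the assignment uses a simple greedy rule (largest estimated partition goes to the least-loaded task). All function names are hypothetical.

```python
import heapq
from collections import Counter


def estimate_sizes(keys, partitions, step):
    """Estimate rows per partition from a systematic sample (every `step`-th row)."""
    estimate = Counter({p: 0 for p in partitions})
    estimate.update(keys[::step])
    return estimate


def assign_round_robin(partitions, n_tasks):
    """Naive assignment that ignores partition sizes."""
    return {p: i % n_tasks for i, p in enumerate(partitions)}


def assign_balanced(estimate, n_tasks):
    """Greedy assignment: largest estimated partition to the least-loaded task."""
    heap = [(0, task) for task in range(n_tasks)]
    heapq.heapify(heap)
    assignment = {}
    for part, size in sorted(estimate.items(), key=lambda kv: (-kv[1], kv[0])):
        load, task = heapq.heappop(heap)
        assignment[part] = task
        heapq.heappush(heap, (load + size, task))
    return assignment


def max_task_rows(sizes, assignment, n_tasks):
    """Rows handled by the busiest task; the job finishes when this task does."""
    loads = [0] * n_tasks
    for part, rows in enumerate(sizes):
        loads[assignment[part]] += rows
    return max(loads)


# Hypothetical skewed data: partitions 0 and 4 hold almost all of the rows.
sizes = [800, 10, 10, 10, 800, 10, 10, 10]
partitions = list(range(len(sizes)))
keys = [p for p, n in enumerate(sizes) for _ in range(n)]
n_tasks = 4

unbalanced = max_task_rows(sizes, assign_round_robin(partitions, n_tasks), n_tasks)
estimate = estimate_sizes(keys, partitions, step=10)
balanced = max_task_rows(sizes, assign_balanced(estimate, n_tasks), n_tasks)
print(unbalanced, balanced)
```

In this toy setup the naive assignment puts both large partitions on the same task, so the busiest task handles twice as many rows as under the sampled assignment. The sketch never splits a single partition across tasks, so one dominant partition would still bound the load time.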

Customer Use Case

• > 5000 Hive partitions
• 1 TB of data
• High data skew
• Load into multiple target tables
• Achievable speed: 20 min, well exceeded their target
• Performance improvement with load balancing: 2-3x

Oracle Loader for Hadoop: Benefits

• Fast, parallel load of a variety of data formats
• Minimize impact on database during load
• Automatic load balancing
• Works with Kerberos

Oracle SQL Connector for HDFS: Oracle SQL access on Commodity Hardware

Oracle SQL Connector for HDFS

[Diagram labels: Hive; Text; Compressed files; External Table]

    create table customer_address (
      ca_customer_id    number(10,0),
      ca_street_number  char(10),
      ca_state          char(2),
      ca_zip            char(10))
    organization external (
      TYPE ORACLE_LOADER
      DEFAULT DIRECTORY DEFAULT_DIR
      ACCESS PARAMETERS (
        …
        PREPROCESSOR "HDFS_BIN_PATH:hdfs_stream")
      LOCATION ('addr1', 'addr2', 'addr3'))

• Parallel query and load
• Load into database or query in place
• Access text or Hive over text
• Access compressed data
• Access specific Hive partitions
• Kerberos authentication

Oracle SQL Connector for HDFS

• Includes tool to generate external table
• Performance on Engineered Systems
  – 15 TB/hour load time
• Query and load Oracle Data Pump files
  – Binary file in Oracle format
  – Uses fewer database CPU cycles during query/load

Oracle SQL Connector for HDFS: Hive Partitioned Tables

[Diagram label: External Table]

QUERY ONLY SPECIFIED PARTITIONS:

    T_DATE = TO_DATE('2013-10-01', 'YYYY-MM-DD') OR
    T_DATE = TO_DATE('2013-09-30', 'YYYY-MM-DD')

• Tool generates external table and view for each partition
• Create a UNION ALL view on all views
• Query
  – Individual view
  – UNION ALL view with Hive partition column WHERE clause to access only relevant views
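The per-partition view plus UNION ALL view pattern can be illustrated with a small, self-contained Python sketch. It uses SQLite from the standard library as a stand-in for Oracle Database, ordinary tables as stand-ins for the generated external tables, and hypothetical names (`sales_p0`, `sales_v0`, `sales_all`) and rows throughout. It shows only the shape of the views and the partition-column filter; it does not demonstrate how Oracle skips irrelevant views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical data: one entry per Hive partition, keyed by partition column value.
partitions = {
    "2013-09-30": [(1, 10.0), (2, 20.0)],
    "2013-10-01": [(3, 30.0)],
    "2013-10-02": [(4, 40.0), (5, 50.0)],
}

branches = []
for i, (t_date, rows) in enumerate(sorted(partitions.items())):
    table, view = f"sales_p{i}", f"sales_v{i}"
    # Stand-in for the external table generated for this partition.
    cur.execute(f"CREATE TABLE {table} (id INTEGER, amount REAL)")
    cur.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)
    # Each per-partition view exposes the partition column as a constant.
    cur.execute(
        f"CREATE VIEW {view} AS SELECT '{t_date}' AS t_date, id, amount FROM {table}"
    )
    branches.append(f"SELECT t_date, id, amount FROM {view}")

# One UNION ALL view over all per-partition views.
cur.execute("CREATE VIEW sales_all AS " + " UNION ALL ".join(branches))

# Filter on the partition column so only two partitions contribute rows.
selected = cur.execute(
    "SELECT id FROM sales_all WHERE t_date IN (?, ?) ORDER BY id",
    ("2013-10-01", "2013-09-30"),
).fetchall()
print(selected)
```

Because every branch carries its partition value as a constant, a predicate on that column decides per branch whether any rows can match at all.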

Oracle SQL Connector for HDFS: Benefits

• Fast access
  – Parallel access to data in Hadoop
• Query in-place from database
• Easy to use for Oracle developers
• Works with Kerberos

Oracle Big Data SQL: Optimized for Oracle Engineered Systems

• Big Data Appliance + Cloudera Hadoop
• Exadata + Oracle Database

Oracle Big Data SQL: Query All Data without Application Change or Data Conversion

[Diagram labels: Big Data Appliance + Cloudera Hadoop; Exadata + Oracle Database; Oracle Catalog; External Table; Hive metadata; HDFS Name Node; HDFS Data Node]

    create table customer_address (
      ca_customer_id    number(10,0),
      ca_street_number  char(10),
      ca_state          char(2),
      ca_zip            char(10))
    organization external (
      TYPE ORACLE_HIVE
      DEFAULT DIRECTORY DEFAULT_DIR
      ACCESS PARAMETERS (com.oracle.bigdata.cluster hadoop_cl_1)
      LOCATION ('hive://customer_address'))

Big Data SQL
• Query all data with Oracle SQL
• Smart Scan in Hadoop to optimize data requests

Oracle Big Data SQL: Copy to BDA

[Diagram labels: Big Data Appliance + Cloudera Hadoop; Exadata + Oracle Database; HDFS Data Node; External Table; Big Data SQL]

• Copy .dmp files to BDA
• Hive access to Oracle Data Pump files

    create table customer_address (
      ca_customer_id    number(10,0),
      ca_street_number  char(10),
      ca_state          char(2),
      ca_zip            char(10))
    organization external (
      TYPE ORACLE_DATAPUMP
      DEFAULT DIRECTORY DEFAULT_DIR
      LOCATION ('customer_address.dmp'))
    AS SELECT <…> FROM <……>

(The AS SELECT clause can be any Oracle SQL query.)

Oracle Big Data SQL: Copy to BDA

[Diagram labels: Big Data Appliance + Cloudera Hadoop; Exadata + Oracle Database; Copy files to BDA; Big Data SQL]

• Business critical data on Exadata
• Copy older data to BDA
  – Integrate with batch analysis in Hadoop
  – Infrequent query of archive data
• Query data in BDA or database with no application change

Oracle Big Data SQL: Copy to BDA, Example Use Case

[Diagram labels: Big Data Appliance + Cloudera Hadoop; Exadata + Oracle Database; Copy files to BDA; Big Data SQL]

• Most current data on Exadata
• Older online data in BDA
• Query all online data with no application change
• Steps
  – Copy older partitions to BDA
  – Create views on Exadata + BDA partitions
  – Drop older Exadata partitions
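The three steps can be walked through with a small Python sketch that uses SQLite from the standard library in place of Exadata and BDA. Everything here is a stand-in chosen for illustration: `orders_2013` and `orders_2014` play the Exadata partitions, `bda_orders_2013` plays the copy held on BDA, and the view `orders` is what the application queries. The sketch checks only that the application-facing query returns the same result before and after the move.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Starting point: both partitions live on the primary tier (hypothetical data).
cur.execute("CREATE TABLE orders_2013 (id INTEGER, amount REAL)")
cur.execute("CREATE TABLE orders_2014 (id INTEGER, amount REAL)")
cur.executemany("INSERT INTO orders_2013 VALUES (?, ?)", [(1, 10.0), (2, 20.0)])
cur.executemany("INSERT INTO orders_2014 VALUES (?, ?)", [(3, 30.0)])
cur.execute(
    "CREATE VIEW orders AS "
    "SELECT * FROM orders_2013 UNION ALL SELECT * FROM orders_2014"
)


def summarize():
    """The 'application query': row count and total amount through the view."""
    return cur.execute("SELECT COUNT(*), SUM(amount) FROM orders").fetchone()


before = summarize()

# Step 1: copy the older partition to the archive tier.
cur.execute("CREATE TABLE bda_orders_2013 AS SELECT * FROM orders_2013")

# Step 2: recreate the view over the current partition plus the archived copy.
cur.execute("DROP VIEW orders")
cur.execute(
    "CREATE VIEW orders AS "
    "SELECT * FROM bda_orders_2013 UNION ALL SELECT * FROM orders_2014"
)

# Step 3: drop the older partition from the primary tier.
cur.execute("DROP TABLE orders_2013")

after = summarize()
print(before, after)
```

The application keeps querying the same view name throughout, which is the sense in which the move requires no application change.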

Data Integration Platform: Oracle Data Integrator

Oracle Data Integration for Big Data and Hadoop

Comprehensive data integration platform designed to work with all data

• Oracle Data Integrator (Data Transformation)
• Enterprise Data Quality (Profile, Cleanse, Match and De-duplicate)
• Oracle GoldenGate (Data Replication)
• Enterprise Metadata Management (Lineage, Impact Analysis and Data Provenance)
• Data Service Integrator (Data Federation)

• Data Replication – Continuous data staging into Hadoop
• Data Transformation – Pushdown processing in Hadoop
• Data Federation – Query Hadoop SQL via JDBC
• Data Quality – Fix quality at the source or invoke Machine Learning in Hadoop
• Metadata Management – Lineage and Impact Analysis w/Hadoop

[Diagram labels: Fast Load; Synchronization; Realtime Staging; Pushdown Data Transformations]

Oracle Data Integration Can Help Right Now

[Diagram labels: Any Sources; Files; Staging; Temp; Detail; Prod; MR; Fast Load; SQL]

#1 – Tools not Spaghetti
• “ETL 101”: avoid complex, costly custom coding

#2 – Non-invasive Capture and Staging
• Move data without inefficient batch extracts

#3 – Processing is Taken to the Data
• No separate ETL engine needed
• Eliminate unnecessary data movement
• Reclaim latency and time from network overhead

#4 – Native Hadoop Execution
• Choose the right Hadoop language for your use case
• HiveQL, Pig, Spark, Storm, Java/MR2, etc.
• Template driven code gen keeps pace w/change on Hadoop platform

#5 – Native SQL Pushdown
• Optimize some join types within the Data Warehouse

#6 – Oracle Optimized
• OGG and ODI certified to run on the Oracle Appliances

Real-time Replication to Hadoop: Oracle GoldenGate

GoldenGate and Streaming Data

[Diagram labels: Sensors; Apps]

• Apps leverage DB transactions w/in realtime analytic streams
• Stage DB records for subsequent processing
• Open OGG APIs for capture of non-DBMS events

Non-invasive Capture and Staging
• Move data without batch extracts

Summary

• Fast, easy integration of all data in your Big Data solution
• Oracle Big Data Connectors
• Oracle Big Data SQL (on Oracle Engineered Systems)
• Oracle Data Integrator
• Oracle GoldenGate

[Photo caption: Intricate elephant sculptures throughout the base of the Chennakesava temple in Belur, India, symbolizing strength. The temple was built in 1117 CE.]

Questions?