Hot Technologies of 2012

Page 1: Hot Technologies of 2012

Hot Technologies of 2012

Page 2: Hot Technologies of 2012

HOST:  Eric  Kavanagh  

Page 3: Hot Technologies of 2012

     THIS  YEAR  WAS…  

Page 4: Hot Technologies of 2012

ANALYTIC  PLATFORMS  

•  Analytic Platforms represent the next major phase in the evolution of Business Intelligence and Analytics

•  These platforms should foster collaboration and transparency

•  Users should be enabled to access and analyze the data they want, quickly and effectively

Page 5: Hot Technologies of 2012

ANALYST:  

Mark  Madsen  CEO,  Third  Nature  Inc.  

ANALYST:  

John  O’Brien  Principal  &  CEO,  Radiant  Advisors  

GUEST:  

Walter Maguire Director of Analytics, ParAccel

THE LINE UP

Page 6: Hot Technologies of 2012

INTRODUCING  

Mark  Madsen  

Page 7: Hot Technologies of 2012

© Third Nature Inc.

Philosophical question

When  modeling  a  data  warehouse,  is  it  best  to:    

A.  Choose  each  data  element  in  your  schema  based  on  usefulness  /usage  

or  B.  Keep  every  element  in  the  source  data?    

Page 8: Hot Technologies of 2012


It would be logical to keep all the data in one place.

I need that data now.

The common situation for analysts

It will take 6 months

Page 9: Hot Technologies of 2012

Analytics embiggens the data volume problem

Many of the processing problems are O(n²) or worse, so even moderate data can be a problem for DW models & architectures
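To make the quadratic growth concrete, here is a minimal sketch that counts the row-to-row comparisons an all-pairs analysis (for example, a naive similarity join) has to perform. The function name and the row counts are illustrative assumptions, not figures from the talk.

```python
def pairwise_comparisons(n: int) -> int:
    """Number of distinct row pairs an all-pairs algorithm must examine."""
    return n * (n - 1) // 2


# Each 10x increase in rows costs roughly 100x more comparisons.
for rows in (1_000, 10_000, 100_000):
    print(f"{rows:>7} rows -> {pairwise_comparisons(rows):>13,} comparisons")
```

The point is the ratio, not the absolute numbers: data that grows modestly can still push a quadratic workload past what a warehouse designed for set-based reporting queries handles comfortably.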

Page 10: Hot Technologies of 2012

Big changes for data warehousing workloads

Much of the analytics data is being read, written and processed interactively with people waiting, or in real time machine to machine contexts. The results of analytic processing can – often do – feed back into the system from which they originate. Our DW design point was not changing tables, ephemeral patterns, large data movement – it was a pub-sub model.

Page 11: Hot Technologies of 2012

What do we mean by analytics platform?

Analytics <> BI: different usage model and workload

Deployment environment? What sort?
▪  Batch
▪  Real time

Development or exploration environment? For what?
▪  The process of model building
▪  Exploratory analysis
▪  Analytic data management

A real analytics production workflow Hatch, CIKM 2011

Page 12: Hot Technologies of 2012

Analytic platform design goals

1.  Decouple the analytic platform from the data warehouse: it can be a part of the delivery layer, or the integration layer, or both.

2.  Support the analytic development and maintenance processes, preferably without unsupported data copying.

3.  Support the production deployment processes.

Don’t try to force-fit “offload” and “merge” patterns. To the extent you can do all of this without moving data around, it’s a big win.

Page 13: Hot Technologies of 2012

Be  suspicious  of  anyone  who  says  Hadoop  is  the  only  answer  

Page 14: Hot Technologies of 2012

IT reality is multiple data stores, distributed platform

Separate, purpose-built databases and processing systems for different types of data and query / computing workloads are the new norm for information delivery. Delivery must be separated.

   

 

 

[Figure labels: Information delivery layer; Platform layer; Data Warehouse; multiple separate data stores, each drawn as the same small sample table; Source Environments: Databases, Documents, Flat Files, XML, Queues, ERP, Applications]

Sample table repeated in each data store:

1  Marge Inovera   $150,000  Statistician
2  Anita Bath      $120,000  Sewer inspector
3  Ivan Awfulitch  $160,000  Dermatologist
4  Nadia Geddit    $36,000   DBA

Page 15: Hot Technologies of 2012

INTRODUCING  

John  O’Brien  

Page 16: Hot Technologies of 2012

© Copyright 2012 Radiant Advisors. All Rights Reserved v1.00.000

ROLE OF ANALYTIC DBMS IN MODERN BI ARCHITECTURES

Hot Technologies – December 5, 2012
John O’Brien, Radiant Advisors

Page 17: Hot Technologies of 2012

ROLE OF ANALYTIC DATABASES
Modern BI Architectures

Data persistence for optimized BI workloads
•  2-tier versus 3-tier debate
•  Why 3-tier will be next generation

Integrating semantics “in” or “above” data
•  Cross database versus data virtualization debate
•  Why an evolving combination will be next generation

Predictions for 2013 and 2014

Page 18: Hot Technologies of 2012

MIXED WORKLOAD CAPABILITIES
Modern BI Architectures

3-Tier BI Architecture

                Key Value Store (Hadoop)   Analytic Database Technologies   EDW RDBMS
Accessibility:  Programming                SQL, MDX, UDF                    SQL Access
Workload:       Flexible, Scalable         Analytic Optimized               Reference Data Mgmt
Maturity:       Emerging                   Accepted                         Mature

Key Value Store (Hadoop): Discovery Oriented, Highest Scalability, Lowest Cost, Schema-less, Without Context

Page 19: Hot Technologies of 2012

MIXED WORKLOAD CAPABILITIES
Modern BI Architectures

2-Tier BI Architecture

[Diagram labels: Hadoop (Programming, Batch Oriented); Highest Scalability; Lowest Cost; Flexibility; Schema-less; Without Context; Analytic Workloads; EDW RDBMS; Broad SQL Accessibility by Users]

What’s not to like about this?
•  While possible, analytic execution will be slower performing and more time consuming to develop and manage in Hadoop stores for BI teams
•  Broad accessibility of BI tools will be a limitation

Page 20: Hot Technologies of 2012

SEMANTIC INTEGRATION AT DATA
Modern BI Architectures

[Diagram labels: Hadoop; MapReduce; Integration; HCatalog / Hive-QL; Analytic DBMS: Columnar storage, In-memory access, Document stores, Text Analysis, Graph Analysis, ROLAP/MOLAP; Links; Gateways; EDW (RDBMS); SQL; text; BI tools (today); Semantic Projections; Semantic Discovery]

Know when pulling data into ADBMS is ok

Page 21: Hot Technologies of 2012

SEMANTIC INTEGRATION ABOVE DATA
Modern BI Architectures

[Diagram labels: Future BI tools; SQL / Data Virtualization; Services; Semantic Discovery; text; In-memory; MapReduce; HCatalog; Hadoop; Analytic DBMS; EDW]

Where should semantic knowledge live in the architecture?

Page 22: Hot Technologies of 2012

THINGS TO KEEP AN EYE ON
Modern BI Architectures

1.  Expect modern BI architectures to evolve in coming years as technologies pave the way

2.  Adoption of R and PMML for analytic models to become portable across platforms

3.  How vendors push-down execution code in Hadoop or pull-through data into analytic databases

4.  Polyglot persistence will optimize on multiple storage engines with service layer access

Page 23: Hot Technologies of 2012

INTRODUCING  

Walter  Maguire  

Page 24: Hot Technologies of 2012

Enabling Big Data Applications
Walter Maguire, Director of Analytics

Copyright 2012 ParAccel, Inc.

Page 25: Hot Technologies of 2012

ParAccel Analytic Platform is…

…built for high performance, interactive analytics.

[Diagram labels: Integrated Analytics; Basic Analytics; Advanced Analytics; On Demand Integration; Database; Teradata; Hadoop; Streaming Data; Applications; Parallel Processing; Data Scale; Analytic Scale; User Scale; Interactive Scale; ParAccel Analytic Platform; Analytic Engine; Columnar; Compression; Compiled; SQL Optimization; Plan Optimization; Execution Optimization; Comms Optimization; I/O Optimization; In-Memory Option Available]

Page 26: Hot Technologies of 2012

ParAccel technology is the first to deliver on Cooperative Analytic Processing

[Diagram labels: SQL-Based Business Intelligence and Reporting Tools; Advanced Analytics; Analytic Applications; Machine Data; Operational Data; 3rd Party Info Provider; Streaming Data; Logs; ParAccel Analytic Platform; On Demand Integration; Enterprise Data Warehouse; Hadoop; Big Data Apps; Embedded Analytics]

Page 27: Hot Technologies of 2012

ParAccel ODI Services makes our platform the analytic engine for entire ecosystems.

1.  Share both data and processes in both directions
2.  Transform incoming data for analytic performance
3.  Interact with many programming languages (Java, Python, more)
4.  Persist or stream data through analytic processing
5.  Rapidly build new On Demand Integration modules

[Diagram labels: Machine Data; Operational Data; 3rd Party Info Provider; Streaming Data; Logs; ParAccel Analytic Platform; On Demand Integration Services; Enterprise Data Warehouse; Hadoop; Big Data Apps; Embedded Analytics]

Page 28: Hot Technologies of 2012

One Size Does Not Fit All: Why an Ecosystem?

[Diagram labels, in three groups:
Reporting, Dashboards, Static Analysis, OLAP
Analytics, Data Mining, Dynamic Analysis, Complexity
Archiving, Filtering, Text Search, Text Analytics, Transformation]

Copyright 2011 ParAccel, Inc.

Page 29: Hot Technologies of 2012

The Best Way to Do Analytics on Hadoop Data

Create a high-performance, node-to-node, bi-directional connection between Hadoop and an analytic platform that is capable of sharing both data and processes, so that the analytic platform becomes an extension of the Hadoop cluster and you can utilize the lingua franca of analytics, SQL.

Page 30: Hot Technologies of 2012

Read from Hadoop:

    INSERT INTO mytable
    SELECT * FROM HadoopIn(with hfs_name('hadoopfile')
                           mr_job('xyz')
                           pa_schema('mytable'));

Write to Hadoop:

    SELECT num_rows
    FROM HadoopOut(on (select * from mytable)
                   WITH hdfs_name('hadoopfile'));

Page 31: Hot Technologies of 2012

What’s Next for the Hadoop ODI? HCatalog Integration

•  Apache HCatalog is a table and storage management layer for Hadoop. It provides a table abstraction over HDFS files for various data processing tools.

•  ODI scan filters: UDF filters from the SQL will be pushed down to Hadoop as partition filters.

Greatly simplify investigative workflow on large volumes of data in Hadoop before bringing it into ParAccel

Simplify development of Hadoop to ParAccel integrations
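The idea of pushing a SQL filter down as a partition filter can be sketched in a few lines. This is a conceptual illustration only: the partition layout, the `scan` helper and the sample rows are all hypothetical and do not represent the ParAccel or HCatalog APIs. It shows why filtering on the partition key lets the scan skip whole partitions instead of reading every row.

```python
from datetime import date

# Hypothetical partition layout: one partition per day (illustrative only).
PARTITIONS = {
    date(2012, 12, 1): ["row-a", "row-b"],
    date(2012, 12, 2): ["row-c"],
    date(2012, 12, 3): ["row-d", "row-e", "row-f"],
}


def scan(partitions, partition_filter):
    """Read only the partitions whose key passes the pushed-down filter."""
    scanned, rows = [], []
    for key in sorted(partitions):
        if partition_filter(key):  # partitions that fail are never opened
            scanned.append(key)
            rows.extend(partitions[key])
    return scanned, rows


# Equivalent of a "WHERE day >= 2012-12-02" predicate applied at the source.
scanned, rows = scan(PARTITIONS, lambda day: day >= date(2012, 12, 2))
print(f"scanned {len(scanned)} of {len(PARTITIONS)} partitions, {len(rows)} rows")
```

Applying the predicate at the source means less data crosses the wire before analysis starts, which is the benefit the slide points to for investigative work on large Hadoop data sets.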

Page 32: Hot Technologies of 2012

ODI Services Architecture Overview

[Diagram labels: Leader Node (ODI Services, Service Mgmt., Service Context); four Compute Nodes, each running ODI Services; Perl, Python, Java, Bash, R, etc.]

Page 33: Hot Technologies of 2012

ODI Services Architecture Overview

[Diagram labels: Leader Node (ODI Services, Service Mgmt., Service Context); four Compute Nodes, each running ODI Services; Perl, Python, Java, Bash, R, etc.]

•  Job Progress & Status
•  Installation
•  Logging
•  Balancing
•  Optimization

Page 34: Hot Technologies of 2012

ODI Services Architecture Overview

[Diagram labels: Leader Node (ODI Services, Service Mgmt., Service Context); four Compute Nodes, each running ODI Services; Perl, Python, Java, Bash, R, etc.]

•  Job Progress & Status
•  Installation
•  Logging
•  Balancing
•  Optimization

STDIN, STDOUT, STDERR, Metadata Mgmt Framework

Page 35: Hot Technologies of 2012

ODI Services Architecture Overview

[Diagram labels: Leader Node (ODI Services, Service Mgmt., Service Context); four Compute Nodes, each running ODI Services; Perl, Python, Java, Bash, R, etc.]

•  Job Progress & Status
•  Installation
•  Logging
•  Balancing
•  Optimization

•  Command line executable
•  3rd party interpreter (e.g. Perl, Python, Java VM)

STDIN, STDOUT, STDERR, Metadata Mgmt Framework

Page 36: Hot Technologies of 2012

Developing  and  Deploying  ODIs  

Write command line executable or interpreted script

Test with ODI Services test harness

Load to leader node

Leader node distributes ODI across the compute nodes
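As a rough illustration of the first step, the sketch below shows the general shape of an interpreted script that reads rows on standard input, writes transformed rows to standard output, and reports status on standard error. The slides name STDIN, STDOUT and STDERR as the channels but do not describe the row protocol, so the tab-delimited format, the `run` function and the upper-casing transformation are assumptions made for illustration, not the ODI Services protocol.

```python
import io


def run(stdin, stdout, stderr) -> int:
    """Stream rows from stdin to stdout, reporting progress on stderr."""
    count = 0
    for line in stdin:
        fields = line.rstrip("\n").split("\t")  # assumed tab-delimited rows
        if len(fields) > 1:
            fields[1] = fields[1].upper()  # stand-in transformation
        stdout.write("\t".join(fields) + "\n")
        count += 1
    stderr.write(f"processed {count} rows\n")
    return 0


# Local dry run with in-memory streams; a deployed script would instead
# call run(sys.stdin, sys.stdout, sys.stderr).
source = io.StringIO("1\talice\n2\tbob\n")
out, err = io.StringIO(), io.StringIO()
exit_code = run(source, out, err)
print(out.getvalue(), end="")
```

Keeping the logic in a function that takes the three streams as arguments makes the script easy to exercise locally before it is loaded and distributed to the compute nodes.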

 

Page 37: Hot Technologies of 2012

Developing  and  Deploying  ODIs  

•  Enables a spectrum of use cases from fast prototyping to one-off and production data loads/unloads

•  No need to code to C++ APIs or be exposed to any complexity

•  Fast development

•  Handles parallelism for you

•  Simple protocol

•  Logging

•  Monitoring progress

Page 38: Hot Technologies of 2012

ODI  services:  examples  

Event  Capture  

Smart  Meter  Logging  

RFID  Tag  Capture  

Tweets,  Facebook,  consolidated  social  streams  

Web  services  (Salesforce,  Eloqua,  Omniture,  etc.)  

Enterprise Semi-Structured sources (Outlook, Gmail, Zendesk, etc.)

Embedded business processes (ex: call center, distribution routing)

Page 39: Hot Technologies of 2012

Cooperative Analytic Processing is the Future

[Diagram labels: SQL-Based Business Intelligence and Reporting Tools; Advanced Analytics; Analytic Applications; Machine Data; Operational Data; 3rd Party Info Provider; Streaming Data; Logs; ParAccel Analytic Platform; On Demand Integration; Enterprise Data Warehouse; Hadoop; Big Data Apps; Embedded Analytics]

Page 40: Hot Technologies of 2012
Page 41: Hot Technologies of 2012

The Archive Trifecta:
•  Inside Analysis: www.insideanalysis.com
•  SlideShare: www.slideshare.net/InsideAnalysis
•  YouTube: www.youtube.com/user/BloorGroup

THANK  YOU!