Big Data Analytics Eddie Toh Regional Server Product Marketing Manager - Intel

AWS Enterprise Day | Big Data Analytics

Page 1: AWS Enterprise Day | Big Data Analytics

Big Data Analytics Eddie Toh Regional Server Product Marketing Manager - Intel

Page 2: AWS Enterprise Day | Big Data Analytics

Big Data – Volume, Velocity, Variety (& Value)

7.9 ZB by 2015 3x more bits in digital universe than stars in the physical universe

450 Billion Business transactions per day by 2020 (IDC)

Therapies tailored to a person's genome
Decoding the human genome:
•  From 10 years to hours
•  On track to hit <$1000 per person

Explosive growth, 30 TB/month billing data
Radical overhaul of customer service:
•  Self service, real time access
•  30x performance increase

$600 B Potential value to US healthcare

90% of Data In the world was created in the last 2 years.

100 years Worth of video uploaded to YouTube every 10 days

>5 Billion People calling, texting, tweeting & browsing on cell phones

“In God we trust, all others bring data” — NASA, Johnson Space Center

How Will Businesses Manage a 50x Data Growth by 2020 in an Affordable Way?

Page 3: AWS Enterprise Day | Big Data Analytics

Sources of Big Data

MACHINE GENERATED
HUMAN GENERATED
BUSINESS GENERATED

EDGE
SCALE UP
DISTRIBUTED

REQUIRES DIFFERENT APPROACHES

Page 4: AWS Enterprise Day | Big Data Analytics

Hadoop?

The best thing since…

Page 5: AWS Enterprise Day | Big Data Analytics

Hadoop Framework

Open Source | Proprietary

HDFS | Lustre | GlusterFS Hadoop Compatible File Systems

YARN (+MapReduce) Distributed Processing Framework

HBase

Zookeeper Coordination

Flume Log Collector

Sqoop Data Transfer

Hive Query

Oozie Workflow

Mahout Machine Learning

Pig Scripting

R Stats

HCatalog Metadata

Deployment

Upgrade

Configuration

Unified Logging

Tuning

Alerts

Resource Monitor

Job Profiler

Security Controls

Heat Map

Rhino (Security)

High Availability and Disaster Recovery

HBase Explorer

Recommendation Engine | Behavior Model | Vertical Accelerators

Analytics Workbench

Connectors: Netezza, Oracle, SAP, SQLServer, Teradata, DB2

Kafka Event Bus

Lucene, Solr Search

Tribeca Graph Mining

Gryphon Low-latency SQL-92

Spark/Shark In-memory

SLURM Scheduler
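
As a rough illustration of the MapReduce processing model named in the framework stack above, the following Python sketch runs the map, shuffle, and reduce steps in a single process. It is not Hadoop code: the function names (`run_job`, `cdr_mapper`, `sum_reducer`) and the sample call records are hypothetical, and a real cluster would distribute each phase across nodes under YARN.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every input record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups, reducer):
    """Collapse each key's list of values into a single result."""
    return {key: reducer(key, values) for key, values in groups.items()}


def run_job(records, mapper, reducer):
    """Run map, shuffle, and reduce in sequence on an in-memory data set."""
    return reduce_phase(shuffle(map_phase(records, mapper)), reducer)


# Hypothetical input: (subscriber_id, call_seconds) tuples.
def cdr_mapper(record):
    subscriber, seconds = record
    yield subscriber, seconds


def sum_reducer(key, values):
    return sum(values)


sample = [("A", 60), ("B", 30), ("A", 15)]
totals = run_job(sample, cdr_mapper, sum_reducer)
print(totals)
```

The mapper and reducer are the only job-specific parts; the grouping step in between is generic, which is what lets a framework parallelize it.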

Page 6: AWS Enterprise Day | Big Data Analytics

Big Data Use Cases Across Industries

Education

Financial Services

Page 7: AWS Enterprise Day | Big Data Analytics

Telco - China Mobile Group Guangdong
Hadoop & Xeon optimized Big Data storage & analytics

•  Challenge: Deliver real time access to Call Data Records (CDR) for billing self service
•  Solution: Chose Hadoop + Xeon over RDBMS to remove data access bottlenecks, increase storage, and scale system
•  Benefits: Lower TCO, 30x performance increase, stable operation, analytics on subscriber usage for targeted promotions
•  Data Characteristics:
    •  30TB billing data/month
    •  Real-time retrieval of 30 days CDRs
    •  300k records/second, 800k insert speed/sec
    •  15 analytics queries

Analytics
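
To put the data characteristics above in perspective, here is a back-of-envelope calculation in Python. The assumptions are mine, not the slide's: decimal terabytes, a 30-day month, billing data arriving evenly over time, and the 800k inserts/sec figure treated as a rate that could be sustained for a full day (the slide does not say whether it is a peak or an average).

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_ingest_mb_per_s(tb_per_month, days_in_month=30):
    """Average ingest rate in MB/s, assuming decimal units and even arrival."""
    bytes_per_month = tb_per_month * 10**12
    seconds_per_month = days_in_month * SECONDS_PER_DAY
    return bytes_per_month / seconds_per_month / 10**6


def daily_ceiling(rate_per_second):
    """Upper bound on operations per day if the rate held for 24 hours."""
    return rate_per_second * SECONDS_PER_DAY


avg_mb_s = average_ingest_mb_per_s(30)    # 30 TB of billing data per month
insert_ceiling = daily_ceiling(800_000)   # 800k inserts per second

print(f"Average ingest: {avg_mb_s:.2f} MB/s")
print(f"Insert ceiling: {insert_ceiling:,} records/day")
```

Under these assumptions, 30 TB/month averages out to roughly 11.6 MB/s, so the stated insert speed leaves ample headroom for bursts.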

Page 8: AWS Enterprise Day | Big Data Analytics

Government - Smart Traffic Intelligent Transport System Hadoop for Predictive Analytics

Crime prevention, Info sharing & Predictive Traffic Analytics

Machine Generated Data:
•  Embedded HBase client in camera for real-time inserts of structured/unstructured data
•  30,000+ camera data collection points
•  2 billion HBase records
•  Petabytes of traffic data
•  Terabytes of images
•  1 week of data mining

Results:
•  Automated queries for traffic violation
•  Crime Prevention: ID fake licenses in <1 minute
•  Traffic Routing

App Servers

Regional Data Collection

Distributed Processing Across District Nodes

Derived Analytics Services

Crime Prevention | Citizen Traffic Services

Page 9: AWS Enterprise Day | Big Data Analytics

Options For Hadoop Deployment

On-Premise (or private cloud)
•  Limited scalability
•  Internal IT resources to manage cluster
•  CapEx – HW, DC space, power & cooling

On AWS (public cloud)
•  Scalability
•  Flexibility
•  Easy to deploy to multiple locations
•  Additional resources on demand
•  OpEx

Hybrid Cloud model
•  Provides bursting capacity
•  Flexibility
•  Scalability
•  IT still needs to manage on-premise cluster

Security Is Addressed In All Models

Page 10: AWS Enterprise Day | Big Data Analytics

“Where do I start…?”
1.  What is your business problem?
2.  Do you have a (lots of) data problem?
3.  Will big data analytics work for my business problem?

Speak To AWS Today!