Big Data Analytics
Eddie Toh, Regional Server Product Marketing Manager, Intel
Big Data – Volume, Velocity, Variety (& Value)
7.9 ZB of data by 2015: 3x more bits in the digital universe than stars in the physical universe
450 billion business transactions per day by 2020 (IDC)
Therapies tailored to a person's genome. Decoding the human genome: • From 10 years to hours • On track to hit <$1000 per person
Explosive growth: 30 TB/month of billing data. Radical overhaul of customer service: • Self-service, real-time access • 30x performance increase
$600B potential value to US healthcare
90% of the data in the world was created in the last 2 years.
100 years' worth of video uploaded to YouTube every 10 days
>5 billion people calling, texting, tweeting & browsing on cell phones
“In God we trust, all others bring data” — NASA, Johnson Space Center
How Will Businesses Manage 50x Data Growth by 2020 Affordably?
Sources of Big Data
MACHINE GENERATED
HUMAN GENERATED
BUSINESS GENERATED
EDGE
SCALE UP
DISTRIBUTED
REQUIRES DIFFERENT APPROACHES
Hadoop?
The best thing since…
Hadoop Framework
Open Source Proprietary
HDFS | Lustre | GlusterFS Hadoop Compatible File Systems
YARN (+MapReduce) Distributed Processing Framework
HBase
ZooKeeper Coordination
Flume Log Collector
Sqoop Data Transfer
Hive Query
Oozie Workflow
Mahout Machine Learning
Pig Scripting
R Stats
HCatalog Metadata
Deployment
Upgrade
Configuration
Unified Logging
Tuning
Alerts
Resource Monitor
Job Profiler
Security Controls
Heat Map
Rhino (Security)
High Availability and Disaster Recovery
HBase Explorer
Recommendation Engine
Behavior Model
Vertical Accelerators
Analytics Workbench
Connectors Netezza, Oracle, SAP, SQLServer, Teradata, DB2
Kafka Event Bus
Lucene, Solr Search
Tribeca Graph Mining
Gryphon Low-latency SQL-92
Spark/Shark In-memory
SLURM Scheduler
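The framework slide lists YARN (+MapReduce) as the distributed processing layer. As a rough illustration of the MapReduce model only, here is a single-process Python sketch that counts words with explicit map, shuffle, and reduce steps. It does not use Hadoop's Java API. The function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the sample records are hypothetical. On a real cluster, YARN would schedule each stage as parallel tasks across many nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(record: str) -> Iterator[Tuple[str, int]]:
    # Mapper (hypothetical name): emit (word, 1) for every word in one record.
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: group every emitted value by its key, which is the job the
    # framework performs between the map and reduce stages.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reducer (hypothetical name): combine all values for a single key.
    return key, sum(values)


def run_job(records: Iterable[str]) -> Dict[str, int]:
    # On a cluster each stage runs in parallel on many nodes; here the
    # stages simply run one after another in a single process.
    mapped = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    sample = ["call sms call", "data call", "sms data data"]
    print(run_job(sample))  # {'call': 3, 'data': 3, 'sms': 2}
```

The map and reduce functions hold only per-record and per-key logic, with no shared state. That separation is what lets a framework split the input and the key space across machines.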
Big Data Use Cases Across Industries
Education
Financial Services
Telco: China Mobile Group Guangdong
Hadoop & Xeon-optimized Big Data storage & analytics
• Challenge: Deliver real-time access to Call Data Records (CDRs) for billing self-service
• Solution: Chose Hadoop + Xeon over an RDBMS to remove data access bottlenecks, increase storage, and scale the system
• Benefits: Lower TCO, 30x performance increase, stable operation, analytics on subscriber usage for targeted promotions
• Data characteristics: 30 TB of billing data/month; real-time retrieval of 30 days of CDRs; 300k records/second; 800k inserts/second; 15 analytics queries
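To put the 30 TB/month figure in perspective, here is a back-of-the-envelope sizing sketch. It assumes a 30-day month, a uniform arrival rate, decimal units (1 TB = 10^6 MB), and a replication factor of 3 (the HDFS default). The slide does not state the deployment's actual replication setting. The function name `cdr_sizing` is hypothetical.

```python
from typing import Tuple

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def cdr_sizing(tb_per_month: float, days_per_month: int = 30,
               window_days: int = 30, replication: int = 3) -> Tuple[float, float, float]:
    """Rough sizing for a CDR store (hypothetical helper).

    Assumes a uniform arrival rate, decimal units (1 TB = 10**6 MB), and an
    assumed HDFS-style replication factor (default 3).
    Returns (TB per day, average MB per second, raw TB stored for the window).
    """
    tb_per_day = tb_per_month / days_per_month
    avg_mb_per_s = tb_per_day * 10**6 / SECONDS_PER_DAY
    window_raw_tb = tb_per_day * window_days * replication
    return tb_per_day, avg_mb_per_s, window_raw_tb


if __name__ == "__main__":
    per_day, rate, stored = cdr_sizing(30)
    # Prints: 1.0 TB/day, 11.6 MB/s average, 90 TB raw for 30 days
    print(f"{per_day:.1f} TB/day, {rate:.1f} MB/s average, {stored:.0f} TB raw for 30 days")
```

This is an average only. The slide's per-second record and insert figures cannot be converted to bytes without a record size, and the slide does not give one.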
Analytics
Government: Smart Traffic Intelligent Transport System
Hadoop for Predictive Analytics
Crime prevention, info sharing & predictive traffic analytics
Machine-generated data:
• Embedded HBase client in cameras for real-time inserts of structured/unstructured data
• 30,000+ camera data collection points
• 2 billion HBase records
• Petabytes of traffic data
• Terabytes of images
• 1 week of data mining
Results:
• Automated queries for traffic violations
• Crime prevention: identify fake licenses in <1 minute
• Traffic routing
App Servers
Regional Data Collection
Distributed Processing Across District Nodes
Derived Analytics Services
Crime Prevention, Citizen Traffic Services
Options For Hadoop Deployment
On-premise (or private cloud) • Limited scalability • Internal IT resources to manage the cluster • CapEx: hardware, data center space, power & cooling
On AWS (public cloud) • Scalability • Flexibility • Easy to deploy to multiple locations • Additional resources on demand • OpEx
Hybrid cloud model • Provides bursting capacity • Flexibility • Scalability • IT still needs to manage the on-premise cluster
Security Is Addressed In All Models
“Where do I start…?”
1. What is your business problem?
2. Do you have a (lots of) data problem?
3. Will big data analytics work for my business problem?
Speak To AWS Today!