Big Data Analysis in the Cloud - AWS Korea (정윤진, Solutions Architect)
In the next 30 minutes
1. What is big data
2. Big data on AWS
3. How customers are using AWS
Where is this data coming from?
Human generated
Machine generated
Tweet
Surf the internet
Buy and sell products
Upload images and videos
Play games
Check in at restaurants
Search for cafes
Find deals
Watch content online
Look for directions
Use social media
Machine generated
Networks and security devices
Mobile phones
Cell phone towers
Smart grids
Smart meters
Telematics from cars
Sensors on machines
Videos from traffic and security cameras
What is it used for?
Data for competitive advantage
Customer segmentation
Financial modeling
System analysis
Line-of-sight
Replacing human decisions
Business intelligence
Innovating new business and revenue models
Generation
Collect
Store
Collaboration & sharing
Analysis and Computation
Lower cost, increased throughput
Constraint
Very high barrier to turning data into information.
Infrastructure capacity
Technical Skills
Questions to ask
Cheap experimentation
Amazon Web Services Cloud
Elastic and highly scalable
+ No upfront capital expense
+ Only pay for what you use
+ Available on-demand
= Remove constraints
Remove constraints = More experimentation
More experimentation = More innovation
More innovation = Competitive edge
Amazon Web Services
Removes constraints
Focus on your data
Leave undifferentiated heavy lifting to us
HOW
Generation
Collect
Store
Collaboration & sharing
Analysis and Computation
How to move your data into AWS
[Diagram components: Corporate data center, AWS Cloud, Virtual Private Cloud, VPN, Internet, Direct Connect, Storage Gateway, AWS Import/Export, S3, EMR, Redshift]
[Diagram components: Clickstream data from 500+ websites and VoD platform, Corporate data center, AWS Import/Export, Amazon Simple Storage Service (S3), Amazon Elastic MapReduce, BI Users]
More than 25 Million Streaming Members
50 Billion Events Per Day
30 Million plays every day
2 billion hours of video in 3 months
4 million ratings per day
3 million searches
Device location, time, day, week, etc.
Social data
10 TB of streaming data per day
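The per-day totals above are easier to compare as average per-second rates. The sketch below does only that conversion; it assumes decimal units (1 TB = 10**12 bytes, 1 MB = 10**6 bytes) and a perfectly even load across the day, neither of which the slide states.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def per_second(per_day: float) -> float:
    """Convert a per-day total into an average per-second rate."""
    return per_day / SECONDS_PER_DAY


# Figures taken from the slide.
events_per_sec = per_second(50e9)   # 50 billion events per day
plays_per_sec = per_second(30e6)    # 30 million plays per day

# Assumption: decimal units, so 10 TB = 10 * 10**12 bytes.
stream_mb_per_sec = per_second(10 * 10**12) / 10**6

print(f"{events_per_sec:,.0f} events/s on average")
print(f"{plays_per_sec:,.1f} plays/s on average")
print(f"{stream_mb_per_sec:,.1f} MB/s of streaming data on average")
```

Real traffic peaks well above these averages, so they are a floor for capacity planning, not a target.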
What is S3?
Highly scalable data storage
Access via APIs
Fast (850K requests per sec)
Highly available & durable (99.999999999% durability)
Economical ($0.095 per GB)*
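A rough way to read the durability and price figures is sketched below. It assumes the durability figure applies per object per year and that losses are independent. The slide's price footnote is not preserved, so the billing period is left open and the result is simply dollars for one period at the quoted rate. The 10,000 GB input is an illustrative volume, not a number from this slide.

```python
from decimal import Decimal

DURABILITY = Decimal("0.99999999999")  # 99.999999999%, from the slide
PRICE_PER_GB = Decimal("0.095")        # USD per GB, from the slide


def expected_objects_lost(object_count: int) -> Decimal:
    """Expected objects lost in one period, assuming independent losses."""
    return object_count * (1 - DURABILITY)


def storage_cost_usd(gigabytes: int) -> Decimal:
    """Cost of holding `gigabytes` for one billing period at the slide's rate."""
    return gigabytes * PRICE_PER_GB


# Illustrative inputs (assumptions, not from the slide).
lost = expected_objects_lost(10_000_000)
cost = storage_cost_usd(10_000)

print(f"Expected objects lost per period for 10 million objects: {lost}")
print(f"Cost for 10,000 GB for one period: ${cost:.2f}")
```

Decimal is used instead of float because subtracting 0.99999999999 from 1 in binary floating point introduces rounding noise at exactly the scale that matters here.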
Web store
Velocity of data
Amazon DynamoDB
“Who buys video games?”
Per day:
3.5 billion records
13 TB of click stream logs
71 million unique cookies
Results:
500% return on ad spend
17,000% reduction in procurement time
What is EMR?
Map-Reduce engine
Integrated with tools
Hadoop-as-a-service
Massively parallel
Cost effective
AWS wrapper
Integrated with AWS services
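The map-reduce model behind the service can be illustrated with a small in-memory sketch. This is not the EMR or Hadoop API: the `map_reduce` helper, the mapper and reducer names, and the `cookie_id page` log format are all hypothetical and exist only to show the map, shuffle, and reduce phases.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_reduce(
    records: Iterable[str],
    mapper: Callable[[str], Iterator[Tuple[str, int]]],
    reducer: Callable[[str, List[int]], int],
) -> Dict[str, int]:
    """Run map, shuffle (group by key), and reduce in a single process."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for record in records:                 # map phase
        for key, value in mapper(record):
            groups[key].append(value)      # shuffle: group values by key
    # reduce phase: one reducer call per key
    return {key: reducer(key, values) for key, values in groups.items()}


def page_view_mapper(line: str) -> Iterator[Tuple[str, int]]:
    """Emit (page, 1) for a hypothetical log line of the form 'cookie_id page'."""
    _cookie_id, page = line.split()
    yield page, 1


def count_reducer(_key: str, values: List[int]) -> int:
    """Sum the counts emitted for one key."""
    return sum(values)


# Hypothetical clickstream sample.
clicks = ["c1 /home", "c2 /games", "c1 /games", "c3 /home", "c2 /games"]
views = map_reduce(clicks, page_view_mapper, count_reducer)
print(views)
```

On a real cluster the map and reduce calls run on many machines in parallel and the shuffle moves data between them; the per-record and per-key logic keeps this shape.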
Source: http://nerds.airbnb.com/redshift-performance-cost
Table size      Query type          Hive                   Redshift
3 billion rows  Simple range query  1680 seconds (28 min)  360 seconds (6 min)
1 million rows  2 complex joins     182 seconds            8 seconds
$13.60/hour on Redshift versus $57/hour on Hive
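The benchmark figures reduce to a few ratios, worked out below. The numbers come from the slide; the variable and function names are illustrative.

```python
# (Hive seconds, Redshift seconds) per benchmark, as reported on the slide.
benchmarks = {
    "3 billion rows, simple range query": (1680, 360),
    "1 million rows, 2 complex joins": (182, 8),
}

HIVE_USD_PER_HOUR = 57.00
REDSHIFT_USD_PER_HOUR = 13.60


def speedup(hive_seconds: float, redshift_seconds: float) -> float:
    """How many times faster the Redshift run was than the Hive run."""
    return hive_seconds / redshift_seconds


for name, (hive_s, redshift_s) in benchmarks.items():
    print(f"{name}: {speedup(hive_s, redshift_s):.2f}x faster on Redshift")

cost_ratio = HIVE_USD_PER_HOUR / REDSHIFT_USD_PER_HOUR
print(f"Hourly cost: Hive setup is {cost_ratio:.2f}x the Redshift setup")
```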
Every day is crucial and costly
Challenge: To run a virtual screen with a higher accuracy algorithm & 21 million compounds
Metric                 Count
Compute Hours of Work  109,927 hours
Compute Days of Work   4,580 days
Compute Years of Work  12.55 years
Ligand Count           ~21 million ligands
Using Cycle Computing and Amazon Web Services
3 hours at $4,828.85/hr
Instead of $20+ million in infrastructure
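The compute metrics and the run cost follow from simple arithmetic, sketched here. It assumes a 365-day year and reads the price as three hours billed at the quoted hourly rate; the average-concurrency figure further assumes the whole workload ran inside those three hours.

```python
from decimal import Decimal

COMPUTE_HOURS = 109_927             # from the slide
RUN_HOURS = 3                       # wall-clock duration, from the slide
USD_PER_HOUR = Decimal("4828.85")   # from the slide

compute_days = COMPUTE_HOURS / 24
compute_years = compute_days / 365  # assumption: 365-day year

# Assumption: all 109,927 compute hours ran within the 3-hour window.
avg_parallel_hours_per_hour = COMPUTE_HOURS / RUN_HOURS

run_cost = RUN_HOURS * USD_PER_HOUR

print(f"Compute days:  {compute_days:,.0f}")
print(f"Compute years: {compute_years:.2f}")
print(f"Average compute hours per wall-clock hour: {avg_parallel_hours_per_hour:,.0f}")
print(f"Total run cost: ${run_cost:,.2f}")
```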
Open web index.
3.4 billion records.
Available to all.
1000 Genomes project
Sample architecture
[Diagram components: Users, Game traffic, Game instances, DB instances, Proxy farms, Analysis, Amazon EMR, Amazon Redshift, Amazon DynamoDB, Amazon Glacier]
Thank you!
aws.amazon.com/big-data
May 21st, COEX Intercontinental, Seoul
One-day free training
Walkthrough of services
http://aws.amazon.com/apac/awsday/seoul/