DESCRIPTION
In this talk, we dive into the Netflix Data Science & Engineering architecture. Not just the what, but also the why. Some key topics include the big data technologies we leverage (Cassandra, Hadoop, Pig + Python, and Hive), our use of Amazon S3 as our central data hub, our use of multiple persistent Amazon Elastic MapReduce (EMR) clusters, how we leverage the elasticity of AWS, our data science as a service approach, how we make our hybrid AWS / data center setup work well, and more.
What is Netflix’s data warehouse?
a) Cassandra
b) Teradata
c) Hive
d) S3
DSE Platform
S3
Chukwa
Aegisthus
Sting
What is Netflix’s data warehouse?
a) Cassandra
b) Teradata
c) Hive
d) S3
S3
99.999999999% durability
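To make the 99.999999999% figure concrete, here is a small arithmetic sketch. It assumes the number is S3's designed annual per-object durability, which is how AWS describes it, and that losses are independent. The object count is a hypothetical example, not a Netflix figure.

```python
from fractions import Fraction

# Durability figure from the slide, kept exact as a fraction (99.999999999%).
# Assumption: it is an annual, per-object figure.
DURABILITY = Fraction(99_999_999_999, 100_000_000_000)


def expected_annual_losses(object_count: int) -> Fraction:
    """Expected number of objects lost per year, assuming independent losses."""
    return object_count * (1 - DURABILITY)


def years_per_single_loss(object_count: int) -> Fraction:
    """Average number of years between single-object losses."""
    return 1 / expected_annual_losses(object_count)


# Hypothetical store holding ten million objects.
print(expected_annual_losses(10_000_000))
print(years_per_single_loss(10_000_000))
```

Under these assumptions, a store of ten million objects would expect to lose a single object about once every 10,000 years.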
High SLA
Query
HDFS ?
“Data Science as a Service”
• Execution Service / Genie
• Event Service
• Metadata Service
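The execution service idea can be sketched as a thin layer that accepts a job and picks a cluster for it, so users never address a cluster directly. This is a toy in-memory illustration, not Genie's actual API: the class names, the tag-matching rule, and the cluster names are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """A hypothetical compute cluster, described only by a name and tags."""
    name: str
    tags: frozenset
    jobs: list = field(default_factory=list)


class ExecutionService:
    """Toy job router: sends each job to the first cluster carrying the tag."""

    def __init__(self, clusters):
        self._clusters = list(clusters)

    def submit(self, job_name: str, required_tag: str) -> str:
        for cluster in self._clusters:
            if required_tag in cluster.tags:
                cluster.jobs.append(job_name)
                return cluster.name
        raise LookupError(f"no cluster is tagged {required_tag!r}")


# Hypothetical cluster names and tags, chosen only for this example.
service = ExecutionService([
    Cluster("high-sla", frozenset({"sla"})),
    Cluster("query", frozenset({"adhoc"})),
])
print(service.submit("nightly_etl", "sla"))       # routed to the high-sla cluster
print(service.submit("explore_titles", "adhoc"))  # routed to the query cluster
```

Because callers name a tag and not a cluster, a cluster could be swapped out behind the service without changing any job submission code.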
Super SLA Cluster Job
Super SLA
High SLA Cluster Job
High SLA
S3
Query Cluster Job
Query
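One way to read the diagram: every cluster is compute only and S3 holds the data, so clusters can be added or retired without moving anything. The sketch below imitates that with a plain dictionary standing in for S3. `SharedStore`, `ComputeCluster`, and the key names are hypothetical and exist only for illustration.

```python
class SharedStore:
    """Stand-in for S3: a key/value store that outlives any cluster."""

    def __init__(self):
        self._objects = {}

    def put(self, key, value):
        self._objects[key] = value

    def get(self, key):
        return self._objects[key]


class ComputeCluster:
    """Hypothetical cluster that keeps no data of its own."""

    def __init__(self, name, store):
        self.name = name
        self._store = store

    def write(self, key, value):
        self._store.put(key, value)

    def read(self, key):
        return self._store.get(key)


store = SharedStore()
high_sla = ComputeCluster("high-sla", store)
query = ComputeCluster("query", store)

# One cluster writes a table partition; the key is made up for the example.
high_sla.write("warehouse/plays/part-0", ["row-1", "row-2"])
del high_sla  # retire the cluster; the data stays in the shared store

# A newly added cluster and the existing query cluster both see the same data.
super_sla = ComputeCluster("super-sla", store)
print(query.read("warehouse/plays/part-0"))
print(super_sla.read("warehouse/plays/part-0"))
```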