spark-summit
Telmo Oliveira, Toon
Using Spark in the Cloud: A Devops perspective
Requirements
• Seamless transition
• Ensure data anonymity
• Move fast, optimise later
• Ensure multi-tenancy
• As little disturbance as possible to the DS team
• Cluster timeouts
• Autoscaling
• Spot instances
• Well documented API
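The features listed above can be pictured as fields of a cluster request. The following is only a minimal sketch: `ClusterSpec`, its field names, and `to_request_body` are hypothetical and not tied to any particular provider's API; the slides do not show the actual request format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ClusterSpec:
    """Hypothetical cluster request; field names are illustrative only."""

    name: str
    min_workers: int            # lower bound for autoscaling
    max_workers: int            # upper bound for autoscaling
    idle_timeout_minutes: int   # shut the cluster down after this much idle time
    use_spot_instances: bool = True

    def __post_init__(self):
        # Reject specs that no autoscaler could satisfy.
        if not 0 < self.min_workers <= self.max_workers:
            raise ValueError("need 0 < min_workers <= max_workers")
        if self.idle_timeout_minutes <= 0:
            raise ValueError("idle_timeout_minutes must be positive")


def to_request_body(spec: ClusterSpec) -> str:
    """Serialise the spec as JSON, as one might send it to a REST API."""
    return json.dumps(asdict(spec), sort_keys=True)


spec = ClusterSpec("adhoc-analysis", min_workers=2, max_workers=8,
                   idle_timeout_minutes=30)
body = to_request_body(spec)
```

Validating the bounds before sending the request keeps configuration mistakes out of the cluster manager, which matters when clusters are created from scripts rather than by hand.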
Infrastructure as code
• Repeatability
• Fast deployment
• Resilience
• Documentation
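Repeatability comes from describing the desired state and letting a tool work out the difference from what exists. The sketch below illustrates that idea only; it is not how any specific tool is implemented, and the resource names and attributes are hypothetical.

```python
def plan(desired: dict, actual: dict) -> dict:
    """Return the changes needed to move `actual` to `desired`."""
    shared = desired.keys() & actual.keys()
    return {
        "create": sorted(desired.keys() - actual.keys()),
        "delete": sorted(actual.keys() - desired.keys()),
        "update": sorted(k for k in shared if desired[k] != actual[k]),
    }


# Hypothetical resources, keyed by an illustrative "type.name" address.
desired = {
    "s3_bucket.logs": {"versioning": True},
    "ec2_instance.scheduler": {"type": "t3.medium"},
}
actual = {
    "s3_bucket.logs": {"versioning": False},
    "ec2_instance.old": {"type": "t2.micro"},
}

first_plan = plan(desired, actual)

# After the changes are applied, the real state matches the description,
# so planning again finds nothing to do.
converged = dict(desired)
second_plan = plan(desired, converged)
```

Running the plan a second time against a converged environment yields no changes, which is the property that makes redeployment safe to repeat.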
Terraform
• S3 Buckets
• EC2 instances
• Network topology
• Log management
• RDS instances
• IAM roles/policies
Ansible
• User management
• Databases and ACLs
• Custom app deployment
Architecture Overview
Airflow
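Airflow runs pipelines as directed acyclic graphs of tasks. The sketch below uses only the Python standard library to show the dependency ordering such a scheduler has to respect; it does not use the Airflow API, and the task names are hypothetical since the slides do not list the actual pipeline steps.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish before it starts.
# Task names are hypothetical and chosen only for illustration.
dependencies = {
    "anonymise": {"extract"},
    "train": {"anonymise"},
    "validate": {"anonymise"},
    "report": {"train", "validate"},
}

# static_order() yields every task after all of its prerequisites.
run_order = list(TopologicalSorter(dependencies).static_order())
```

Tasks with no dependency between them, such as `train` and `validate` here, may come out in either order, and a real scheduler could run them in parallel.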
• External Hive metastore
• Send logs to S3
• Authorisation
• i3.2xlarge nodes
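Two of the settings above can be expressed as Spark properties. The sketch below builds `--conf` arguments for `spark-submit`. It rests on assumptions: the metastore is reached over a Thrift URI, "logs" means Spark event logs, and the host and bucket names are hypothetical. The slides do not show the configuration actually used.

```python
from typing import List


def spark_submit_conf(metastore_uri: str, log_bucket: str) -> List[str]:
    """Build spark-submit --conf arguments (illustrative values only)."""
    conf = {
        # Use a Hive catalog backed by an external metastore service.
        "spark.sql.catalogImplementation": "hive",
        "spark.hadoop.hive.metastore.uris": metastore_uri,
        # Write Spark event logs to S3 through the s3a filesystem.
        "spark.eventLog.enabled": "true",
        "spark.eventLog.dir": f"s3a://{log_bucket}/spark-events",
    }
    args: List[str] = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


# Hypothetical host and bucket names.
args = spark_submit_conf("thrift://metastore.example.internal:9083",
                         "example-logs")
```

Keeping the metastore outside the cluster means table definitions survive when short-lived clusters are torn down.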
Future plans
• Streaming
• Real time services
• Improve CI/CD
What’s all this for?
Thanks to the team
• Aemro Amare
• Barend Garvelink
• Bert Jan Katsman
• Kliment Markovski
• Miquel Monreal
• Stanislava Potupchik
Questions?