Mesos @ Atlassian MesosCon14 Ryan Thomas Development Team Lead

Mesos @ Atlassian - MesosCon14

Mesos in development at Atlassian

Page 1: Mesos @ Atlassian - MesosCon14

Mesos @ Atlassian

MesosCon14 Ryan Thomas

Development Team Lead

Overview

• Why?

• Design Considerations

• Service Design

• Platform Design

• Current Status / Challenges

Why?

• Atlassian applications are traditionally single-tenanted monolithic Java applications

• Decompose shared services out of the applications

• Facilitate quick service development cycles

• Utilise current delivery pipeline

Design Considerations

• This must run in AWS

• Inside of a VPC (with no public routes)

• Single logical point of external ingress through Riverbed Stingrays

• Single logical point of external egress through squid proxies

• Accessible from our data centres over ADCs

• Service SOE must be decoupled from service itself

Service Design

• Container based service SOE

• Services are packaged as “slugs”

• Services must be stateless

• No guarantee consecutive requests go to the same service node

• Services declaratively described in a Service Descriptor
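
To make the idea of a declarative Service Descriptor concrete, here is a minimal sketch. The slides do not show the real descriptor format, so every field name here (`slug_url`, `instances`, `health_check_path` and so on) is a hypothetical stand-in, and the validation rules are assumptions chosen for illustration only.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ServiceDescriptor:
    """Hypothetical declarative description of a service (all fields are assumptions)."""
    name: str
    slug_url: str                       # where the packaged "slug" can be fetched from
    instances: int = 2                  # services are stateless, so any instance can serve any request
    cpus: float = 0.5
    memory_mb: int = 512
    health_check_path: str = "/healthcheck"
    environment: dict = field(default_factory=dict)

    def validate(self) -> list:
        """Return a list of human-readable problems; an empty list means valid."""
        problems = []
        if not self.name:
            problems.append("name must not be empty")
        if self.instances < 1:
            problems.append("instances must be at least 1")
        if self.cpus <= 0 or self.memory_mb <= 0:
            problems.append("cpus and memory_mb must be positive")
        if not self.health_check_path.startswith("/"):
            problems.append("health_check_path must start with '/'")
        return problems
```

Because the descriptor only states what the service needs, the platform stays free to decide how and where to run it, which is one way to keep the service SOE decoupled from the service itself.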

Platform Design

Platform Iteration 1

• Container Manager: Mesos + Marathon + mesos-docker-executor

• Service Locator: Netflix Eureka!

• Service Manager, Slug Builder, Load Balancer Manager: All In House

• All platform services in Docker containers

• Worked fine in development on a single machine

• Doesn’t work in reality due to MESOS-809

• Services require Eureka-client for registration / heartbeat
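
The registration / heartbeat requirement in the last bullet can be illustrated with a toy lease-based registry. This is a sketch of the general pattern only, not Eureka's API: the class name, the method names and the 30-second lease are all assumptions made for the example.

```python
import time


class HeartbeatRegistry:
    """Toy in-memory registry: instances stay listed only while they keep heartbeating."""

    def __init__(self, lease_seconds=30.0, clock=time.monotonic):
        self.lease_seconds = lease_seconds
        self._clock = clock          # injectable so the expiry logic can be tested
        self._last_seen = {}         # (service, address) -> time of last heartbeat

    def heartbeat(self, service, address):
        """Register an instance, or renew its lease if it is already registered."""
        self._last_seen[(service, address)] = self._clock()

    def lookup(self, service):
        """Return the addresses of instances whose lease has not expired."""
        now = self._clock()
        return sorted(
            addr
            for (svc, addr), seen in self._last_seen.items()
            if svc == service and now - seen <= self.lease_seconds
        )
```

The cost this pattern puts on service authors is visible in the sketch: every service has to call `heartbeat` on a schedule, which means shipping a client library inside each service.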

Platform Iteration 2

• Platform services provisioned on “bare metal”

• Service Locator: Marathon

• Generally worked well

• The edge would health-check its pool members and keep those still starting up from serving traffic

• Service upgrades required a service interruption
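
The health-check gating described above can be sketched as follows. The slides do not say how the edge decided a member was ready, so the `PoolMember` class and the rule of requiring two consecutive passing checks are assumptions made for the example.

```python
class PoolMember:
    """A load-balancer pool member that must pass several checks before taking traffic."""

    def __init__(self, address, required_passes=2):
        self.address = address
        self.required_passes = required_passes   # assumed threshold, not from the talk
        self.consecutive_passes = 0

    def record_check(self, passed):
        """Record one health-check result; any failure resets the streak."""
        self.consecutive_passes = self.consecutive_passes + 1 if passed else 0

    @property
    def in_rotation(self):
        """True once the member has passed enough consecutive checks."""
        return self.consecutive_passes >= self.required_passes


def routable(pool):
    """Return the addresses that should currently receive traffic."""
    return [member.address for member in pool if member.in_rotation]
```

A member that has just been launched starts with zero passes, so it receives no traffic until it has proven itself, and a single failed check takes it back out of rotation.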

Platform Iteration 3

• Container Manager: Mesos + Aurora

• Service Locator: Aurora

• Aurora provides a lot of the functionality we were planning on building into our Service Manager

• Seamless, rolling upgrades

• Service health checks

• Notifications / service ownership / quotas

• Aurora setup & deployment was a little more involved than Marathon

• Load Balancer Manager & Service Manager need to talk Thrift to Aurora
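
As a rough illustration of what a rolling upgrade involves, the sketch below upgrades instances in small batches and stops as soon as a batch fails its health check. It is not Aurora's updater and does not use Aurora's Thrift API; the function name, its parameters and the stop-on-failure policy are assumptions made for the example.

```python
def rolling_upgrade(instances, new_version, batch_size, is_healthy):
    """Upgrade instances in batches; stop at the first unhealthy batch.

    instances: dict mapping instance id -> running version (mutated in place).
    is_healthy: callable taking an instance id and returning True or False.
    Returns True if every instance reached new_version, False if aborted.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    pending = [i for i, version in instances.items() if version != new_version]
    for start in range(0, len(pending), batch_size):
        batch = pending[start:start + batch_size]
        for instance_id in batch:
            # Stands in for restarting the instance on the new slug.
            instances[instance_id] = new_version
        if not all(is_healthy(instance_id) for instance_id in batch):
            return False  # leave the remaining instances on the old version
    return True
```

Since only one batch is out of service at a time, the rest of the fleet keeps serving requests, which is what removes the service interruption that upgrades previously required.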

Current Status / Challenges

• Not serving production traffic yet

• How do we handle multiple regions with region-agnostic services & clients?

• Our applications have not historically been stateless; how can we run them on Mesos?

• Service testing - spin up the world vs. testing against APIs vs. just do it live

Mesos Hackathon

Team forming / announcements @ 5:40pm (here)

Starts 9am in the Ontario Room tomorrow

HipChat Room bit.ly/1q36ScB