Handling 1 Billion Requests/hr with Minimal Latency Using Docker

David Spitzer


Continuous Growth

• 07/14 (IPO): Matomy goes public on the London Stock Exchange

• 10/14 (Publicis): Publicis Groupe becomes main shareholder

• 11/14 (Mobile): Matomy acquired mobile programmatic company

• 04/15 (Email): Acquired data-driven email technology company

• 11/15 (Video): Acquired video programmatic company Optimatic

• 02/16 (Dual listing): Matomy shares commence trading on the Tel-Aviv Stock Exchange

What Does Mobfox Do?

• We’re a mobile advertising Supply Side Platform (SSP)

• We deliver ads to mobile devices

• As a Supply Side Platform, we partner with Demand Side Platforms (DSPs)

[Diagram: Apps → Mobfox SSP → DSPs. Connected to more than 120 DSPs]

The Request Lifecycle @ Mobfox

• Receive an Ad Request

• Building the Request Context

• Run the Auction

• Validate and Filter the Responses

• Determine the Winner and Serve the Ad

• Monitor Impressions/Clicks
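The lifecycle above can be sketched as a concurrent fan-out with a deadline. This is only an illustration in Python's asyncio, not Mobfox's RxJava code: `ask_dsp`, the DSP names, delays, prices, and the bid floor are all hypothetical stand-ins, and the 300 ms default is the response budget quoted elsewhere in the slides.

```python
import asyncio

DEADLINE_S = 0.300  # response budget taken from the slides (300 ms)


async def ask_dsp(name, delay_s, price):
    """Hypothetical stand-in for one HTTP bid request to a DSP."""
    await asyncio.sleep(delay_s)  # simulates network plus bidder latency
    return {"dsp": name, "price": price}


async def run_auction(dsps, bid_floor, deadline_s=DEADLINE_S):
    """Fan out to all DSPs, keep answers that arrive in time, pick the top bid."""
    tasks = [asyncio.create_task(ask_dsp(*dsp)) for dsp in dsps]
    done, pending = await asyncio.wait(tasks, timeout=deadline_s)

    # Bidders that missed the deadline are cancelled rather than awaited.
    for task in pending:
        task.cancel()
    if pending:
        await asyncio.gather(*pending, return_exceptions=True)

    # Validate and filter: drop failed calls and bids below the floor.
    bids = [task.result() for task in done if task.exception() is None]
    valid = [bid for bid in bids if bid["price"] >= bid_floor]

    # Determine the winner: highest price, or None if nobody qualified.
    return max(valid, key=lambda bid: bid["price"], default=None)


def demo():
    dsps = [
        ("fast-low", 0.01, 0.80),
        ("fast-high", 0.02, 1.50),
        ("below-floor", 0.01, 0.10),
        ("too-slow", 1.00, 9.99),  # would win, but misses the deadline
    ]
    return asyncio.run(run_auction(dsps, bid_floor=0.50, deadline_s=0.2))
```

The point of the sketch is that the slowest bidder never delays the response: whatever has not answered by the deadline is simply left out of the auction.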

RTB Auctions – Retrieving the Best Ad

• RTB stands for Real Time Bidding

• It’s an industry standard published by the IAB

• It uses simple but standardized JSON payloads over HTTP

• Because it is so simple, it's easy to implement, but very resource intensive

• Highest bidder wins
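To make "JSON payloads over HTTP" and "highest bidder wins" concrete, here is a minimal sketch. The field names (`id`, `imp`, `bidfloor`, `banner`, `tmax`, `seatbid`, `bid`, `price`) follow the OpenRTB 2.x object model, but this is a tiny illustrative subset, and the helper names and sample values are assumptions rather than anything from the talk.

```python
import json


def build_bid_request(request_id, bid_floor, width, height, tmax_ms=300):
    """Serialize a minimal OpenRTB-2.x-style bid request (illustrative subset)."""
    request = {
        "id": request_id,
        "imp": [{
            "id": "1",
            "bidfloor": bid_floor,
            "banner": {"w": width, "h": height},
        }],
        "tmax": tmax_ms,  # time budget in milliseconds granted to bidders
    }
    return json.dumps(request)


def highest_bid(response_bodies, bid_floor):
    """Parse bid responses and return (seat, price) of the highest valid bid."""
    best = None
    for body in response_bodies:
        try:
            response = json.loads(body)
        except json.JSONDecodeError:
            continue  # malformed response: ignore this bidder
        if not isinstance(response, dict):
            continue
        for seatbid in response.get("seatbid", []):
            for bid in seatbid.get("bid", []):
                price = bid.get("price")
                if not isinstance(price, (int, float)) or price < bid_floor:
                    continue  # missing price or below the floor
                if best is None or price > best[1]:
                    best = (seatbid.get("seat"), price)
    return best
```

For example, `highest_bid` applied to two well-formed responses and one broken body returns the seat and price of the best bid at or above the floor, and `None` if no response qualifies.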

RTB (Real Time Bidding) at the Mobfox SSP

• 16 billion bid requests per hour

• Ingesting 1.9B rows/h! And smiling :)

Mobfox Today

• Response Time: max. 300ms! How to respect 300ms?

• 1200 Servers: how to manage?

• Ad Requests @ peak-time: processing 1 billion/hour with RxJava & Docker! 16 billion ad requests per day!

• Outgoing Requests @ peak-time: amounts to 160B outgoing bid requests per day!

• Database Volume: that amounts to 38TB of data per day!

• 16,000 apps are making money with Mobfox

• Smart request throttling and statistical CPM calculation algorithms we are proud of

• Managed by 20 talented and awesome people at our R&D Vienna office
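A quick back-of-the-envelope conversion puts these headline numbers on a per-second scale. The inputs are the figures quoted on the slides; treating 1 TB as 10^12 bytes and spreading the daily totals evenly over 24 hours are simplifying assumptions.

```python
SECONDS_PER_HOUR = 3_600
SECONDS_PER_DAY = 86_400

# Figures quoted on the slides.
peak_ad_requests_per_hour = 1_000_000_000
ad_requests_per_day = 16_000_000_000
outgoing_bid_requests_per_day = 160_000_000_000
data_bytes_per_day = 38 * 10**12  # assumes decimal terabytes

# Peak incoming rate: about 277,778 ad requests per second.
peak_ad_requests_per_second = peak_ad_requests_per_hour / SECONDS_PER_HOUR

# Daily average outgoing rate: about 1.85 million bid requests per second.
avg_bid_requests_per_second = outgoing_bid_requests_per_day / SECONDS_PER_DAY

# Average fan-out: 10 outgoing bid requests per incoming ad request.
avg_fan_out = outgoing_bid_requests_per_day / ad_requests_per_day

# Daily average data rate: about 440 MB per second.
avg_data_mb_per_second = data_bytes_per_day / SECONDS_PER_DAY / 10**6
```

Each incoming ad request turns into roughly ten outgoing bid requests on average, which is why the outgoing side dominates the load.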

Docker Magic!

Our Current Docker Setup

• Fully running on Amazon’s ECS (Elastic Container Service)

• Including stateful services

• Stateless services run on Spot Instances using Spotinst

• They have fallback support to on-demand instances

• Spotinst has dedicated ECS support

• Container scheduling is done by ECS

• AutoScaling managed by Spotinst

[Diagram: tools used for orchestration, autoscaling, and maintenance: ECS, CloudFormation, Spotinst, AWS Lambda, Rundeck, Bash]

• We run on stock ECS Optimized Amazon Linux AMIs

• We try not to add any non-container components to the host unless absolutely necessary

• We use many different kinds of instances depending on the workload

• We employ multiple high-speed Docker Registry Proxies for fast provisioning of new images/containers

• No complex configuration management tools

• We use simple shell scripts in combination with EC2 tags to dynamically configure the hosts

• We have a job scheduler that also runs maintenance tasks on the long-running instances

• For service discovery we use AWS Load Balancers, Route 53, and Consul
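The idea of configuring hosts from EC2 tags can be sketched as follows. The slides say this is done with simple shell scripts; the Python below is only a stand-in for illustration. The tag names (`ecs-cluster`, the `attr:` prefix) are hypothetical, and fetching the tags from EC2 is left out. `ECS_CLUSTER` and `ECS_INSTANCE_ATTRIBUTES` are settings read by the ECS container agent from `/etc/ecs/ecs.config`.

```python
import json


def render_ecs_config(tags):
    """Turn a dict of instance tags into lines for /etc/ecs/ecs.config.

    Assumed (hypothetical) tag convention:
      ecs-cluster  -> name of the ECS cluster the host should join
      attr:<name>  -> custom instance attribute <name>
    All other tags are ignored.
    """
    lines = ["ECS_CLUSTER=" + tags["ecs-cluster"]]

    prefix = "attr:"
    attributes = {
        key[len(prefix):]: value
        for key, value in tags.items()
        if key.startswith(prefix)
    }
    if attributes:
        # The agent expects the custom attributes as one JSON object.
        lines.append(
            "ECS_INSTANCE_ATTRIBUTES=" + json.dumps(attributes, sort_keys=True)
        )
    return "\n".join(lines) + "\n"
```

With this convention, re-tagging an instance is enough to move it to another cluster or workload class on its next boot, with no configuration management tool involved.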

Development and Testing

• We achieve a very high level of similarity in our dev and test stacks compared to production

• Development runs on either Docker for Mac or native Docker on Linux

• Canary releases based on ECS and CloudFormation

• Most of our Docker images are home-grown

• Static configurations, simple template processing (we want to change that)

• Continuous Integration Testing through Jenkins

• Jenkins builds our containers

• Jenkins runs multiple tests in parallel

The History of Our Docker Setup

• Mobfox was founded in 2011 in Vienna by then 17-year-old Julian Zehetmayr

• It turned into a very successful startup

• Mobfox had a large publisher base and offices in London, Paris and Vienna

• It already made millions in revenue and still had high potential for growth

• It was a startup rollercoaster

However… Mobfox had some issues …

• No version control

• One developer (the founder) for backend and frontend

• Everything was written in PHP

• … spaghetti PHP

• … no indentation PHP

• … no exception handling PHP

• There was no deployment system

• Servers were ordered and provisioned manually

Overcoming the Issues of a Startup

• A small but great team was hired

• They immediately started putting Mobfox on solid ground

• Matomy bought Mobfox in November 2014 and liked the team so much that they decided to keep and expand the Vienna office

• It’s 2017 and we’re still there :)

Dealing with the DevOps Challenges – The Path to Docker

• In January 2015, they hired their first DevOps guy (me)

• We were in dire need of a good systems architecture

• We already had a lot of traffic

• We knew that we wanted to be able to run something as close to the production stack as possible in development

• After many considerations we decided to base it all on Docker

Deciding on Docker in 2015

• DockerCon San Francisco 2014 had just ended

• Docker and its Ecosystem finally gave a clear picture

• We did a few test setups locally and in production

• A couple months later, all of our services were running inside containers

• It was a great learning experience

• We made some good and some bad decisions

The Docker Ecosystem

[Figure: ecosystem landscape grouped into Dev Tools, Official Repositories, Operating Systems, Big Data, Service Discovery, Build / Continuous Integration, Configuration Management, Consulting & Training, Management, Storage, Clustering & Scheduling, Networking, Infrastructure & Service Providers, Security, Monitoring & Logging]

Source: “Intro to Docker at the 2016 Evans Developer Relations Conference,” Slideshare - https://www.slideshare.net/ManoMarks/intro-to-docker-at-the-2016-evans-developer-relations-conference

Problems with Docker in 2015

• Best practices for Container architectures really weren’t around

• Docker Swarm was still in beta

• Docker Compose was just announced

• Orchestration tools of the time saw Docker just as an execution engine with totally different usage patterns

• We decided to use docker-compose for development and “Maestro-Ng” for orchestration in production

Lessons Learned

• Docker was the right decision

• Handling 1bn requests per hour is hard on any platform

• Keep it as simple as possible, but not simpler

How Do We Handle that Amount of Traffic?

• We use the best hardware or virtual instance type for the job

• We don’t put multiple containers with the same affinity on the same instance

• But keep chatter local as long as possible

• We optimize our apps as much as possible

• We use Docker host networking when we have to deal with a lot of connection setups

• We know the pitfalls – by now

Sharing the Lessons Learned – They Might Apply to You!

Bridge Networking

• Don’t be afraid of it

• It’s fine for most services that don’t receive or establish a lot of new connections

• The overhead is otherwise minimal

Host Networking

• Whenever a lot of connection setups are happening

• But disable Netfilter Connection Tracking!

• No Docker Swarm Mode 😭

• Otherwise big performance gain! 🎉

In addition …

• Choose your TCP congestion algorithm wisely!

• Have a look at Google BBR!

• The internal DNS proxy of the Docker daemon can’t handle many concurrent requests! Don’t use it for massive parallel queries of external systems!

• When auto-scaling, use fast registries or registry proxies to minimize spin-up time of your new containers
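One way to keep massive parallel lookups away from a slow DNS path is to cache resolved addresses inside the application. The slides do not say how Mobfox solved this, so the sketch below is just one possible mitigation. `CachingResolver`, its 30-second TTL, and the injectable `resolve` and `clock` parameters are assumptions made for illustration, and the class is not thread-safe as written.

```python
import socket
import time


class CachingResolver:
    """Resolve hostnames through `resolve`, remembering answers for `ttl_s` seconds."""

    def __init__(self, resolve=socket.gethostbyname, ttl_s=30.0, clock=time.monotonic):
        self._resolve = resolve
        self._ttl_s = ttl_s
        self._clock = clock
        self._cache = {}  # hostname -> (address, expiry time)

    def lookup(self, hostname):
        now = self._clock()
        hit = self._cache.get(hostname)
        if hit is not None and hit[1] > now:
            return hit[0]  # still fresh: no DNS query is sent

        address = self._resolve(hostname)
        self._cache[hostname] = (address, now + self._ttl_s)
        return address
```

With thousands of requests per second going to the same small set of bidder hostnames, even a short TTL removes almost all lookups from the hot path.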

Lessons learned – Running Java Inside Containers

• Java up until version 8 has no knowledge of cgroups

• It’ll happily take 1/4th of the host’s memory size for itself, even with container memory limits set

• Always specify memory limits for Java explicitly

• There’s a nasty memory leak issue with the Hotspot VM + G1 Garbage Collector + Containers

• There’s no real fix yet, just workarounds

• With only a container memory limit set, Docker by default allows the same amount of swap on top, so memory + swap = 2 × the limit

• We had jobs causing heavy disk I/O because they weren’t started with -Xmx parameters
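The mismatch between the JVM default and a container limit is easy to see with numbers. In the sketch below the 1/4 rule comes from the slide, while the 64 GiB host, the 4 GiB container limit, and the 75% heap fraction are assumed example values. The fraction is a placeholder that leaves room for non-heap memory, not a recommendation from the talk.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def default_max_heap_bytes(host_memory_bytes):
    """Heap ceiling a cgroup-unaware JVM picks by default: 1/4 of host memory."""
    return host_memory_bytes // 4


def xmx_flag(container_limit_bytes, heap_fraction=0.75):
    """Build an explicit -Xmx flag sized relative to the container limit.

    heap_fraction is an assumed value; the rest of the limit is left for
    metaspace, thread stacks, and other non-heap memory.
    """
    heap_mib = int(container_limit_bytes * heap_fraction) // MIB
    return "-Xmx%dm" % heap_mib


host_memory = 64 * GIB      # assumed host size
container_limit = 4 * GIB   # assumed container memory limit

default_heap = default_max_heap_bytes(host_memory)  # 16 GiB
oversized = default_heap > container_limit          # True: 16 GiB vs 4 GiB
explicit_flag = xmx_flag(container_limit)           # "-Xmx3072m"
```

In this example the JVM would size its heap for 16 GiB inside a 4 GiB container, and the overflow ends up in swap or gets the process killed, which matches the heavy disk I/O described above.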

What Will the Future Bring? A Wish List

• More standardization for (Linux) Containers

• Container Network Interface (CNI) standard in Docker?

• More swappable container engines (rkt, ..)

• Fewer breaking changes

• Advancement of Docker Swarm Mode

• Practical CaaS (Container as a Service) solutions, also for bigger setups

• Better support for stateful services in AWS ECS

• Native service discovery in ECS

• Generic service discovery solutions that work out of the box with various Docker-based setups

Thank You! Questions?