Atom Data Pipeline: Processing 200B Events with Node.js and Docker on AWS
Shimon Tolts, General Manager, Data Solutions
About ironSource: Hypergrowth
● 800M People Reached Each Month
● 4,200 Apps Installed Every Minute with the ironSource Platform
● 200B Registered & Analyzed Data Events Every Month
[Chart: data events per month, Jun 2015 to May 2016; y-axis from 0 to 200B]
Our Business Challenge
We needed a way to manage this data:
Collect, Process, Store
Collection
● Multi-region layer with latency-based routing
● Low latency from client to Atom servers
● High availability: AWS regions do fail!
● Store raw data + headers upon receipt
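The last bullet, keeping the raw payload and headers exactly as received, can be sketched as a small Node.js handler. This is a minimal sketch and not Atom's actual code: `toRawRecord`, `createCollector`, the record fields, and the `sink` object are hypothetical names chosen for illustration.

```javascript
const http = require('http');

// Build the record that is persisted before any processing happens.
// Field names are illustrative assumptions, not Atom's real schema.
function toRawRecord(headers, body, receivedAt) {
  return {
    receivedAt: receivedAt.toISOString(),
    headers: Object.assign({}, headers), // copy so later mutation cannot alter the stored record
    body: body,                          // the untouched string the client sent
  };
}

// Create (but do not start) an HTTP server that pushes every request into `sink`.
// `sink` is any object with a push(record) method, e.g. an array or a queue wrapper.
function createCollector(sink) {
  return http.createServer((req, res) => {
    const chunks = [];
    req.on('data', (chunk) => chunks.push(chunk));
    req.on('end', () => {
      const body = Buffer.concat(chunks).toString('utf8');
      sink.push(toRawRecord(req.headers, body, new Date()));
      res.writeHead(204); // acknowledge quickly; enrichment happens downstream
      res.end();
    });
  });
}
```

Calling `createCollector(records).listen(8080)` would start it on a port of your choice; persisting first and processing later means a failed enrichment step never loses the original event.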
Data Enrichment
● Enrich data before storing it in your Data Lake and/or Warehouse
  ○ IP to country
  ○ Currency conversion
  ○ Decrypt data
  ○ User-Agent parsing: OS, browser, device...
● Any custom logic you would like! Fully extensible
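One way to picture these enrichment steps is as a chain of small functions, each taking an event and returning an enriched copy. The sketch below assumes that shape; the function names, the event fields, and the toy lookup tables (stand-ins for a real GeoIP database and exchange-rate feed) are all hypothetical.

```javascript
// Toy lookup tables standing in for a GeoIP database and an FX feed (assumptions).
const COUNTRY_BY_PREFIX = { '81.2.': 'GB', '8.8.': 'US' };
const USD_RATE = { USD: 1, EUR: 1.25 };

function ipToCountry(event) {
  const ip = String(event.ip || '');
  const prefix = Object.keys(COUNTRY_BY_PREFIX).find((p) => ip.startsWith(p));
  return Object.assign({}, event, { country: prefix ? COUNTRY_BY_PREFIX[prefix] : null });
}

function convertCurrency(event) {
  const rate = USD_RATE[event.currency];
  if (rate === undefined || typeof event.amount !== 'number') return event;
  return Object.assign({}, event, { amountUsd: event.amount * rate });
}

// Deliberately crude User-Agent sniffing; a real pipeline would use a maintained parser.
function parseUserAgent(event) {
  const ua = String(event.userAgent || '');
  const os = /Android/.test(ua) ? 'Android'
    : /iPhone|iPad/.test(ua) ? 'iOS'
    : /Windows/.test(ua) ? 'Windows' : 'Other';
  const browser = /Chrome\//.test(ua) ? 'Chrome'
    : /Firefox\//.test(ua) ? 'Firefox'
    : /Safari\//.test(ua) ? 'Safari' : 'Other';
  return Object.assign({}, event, { os: os, browser: browser });
}

// Run the event through each step in order; custom logic is just another function.
function enrich(event, steps) {
  return steps.reduce((current, step) => step(current), event);
}
```

Custom logic then plugs in as one more step, for example `enrich(event, [ipToCountry, parseUserAgent, (e) => Object.assign({}, e, { isMobile: e.os === 'Android' || e.os === 'iOS' })])`.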
Data Targets
● Near real-time data insertion: 1 minute!
● Stream data to Google Storage and/or AWS S3
● Smart insertion of data into AWS Redshift
  ○ Set the number of parallel COPYs
  ○ Configure priority on tables
● BigQuery: stream data in using batch file imports (saves 20% cost)
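The Redshift bullets (a cap on parallel COPYs plus per-table priority) suggest a small scheduling step before each load cycle. The following is a sketch of one possible policy, not the product's real scheduler: `planCopies`, the job fields, the rule that a lower `priority` number loads first, and the one-COPY-per-table rule are all assumptions made for illustration.

```javascript
// pending: [{ table: 'events', priority: 1, files: 12 }, ...]
// running: names of tables that currently have a COPY in flight.
// Returns the tables to start loading now, never exceeding maxParallel in total.
function planCopies(pending, maxParallel, running) {
  const busy = new Set(running);
  const freeSlots = Math.max(0, maxParallel - busy.size);
  return pending
    .filter((job) => !busy.has(job.table))   // assumed rule: one COPY per table at a time
    .sort((a, b) => a.priority - b.priority  // lower number loads first (assumption)
      || b.files - a.files)                  // tie-break: larger backlog first
    .slice(0, freeSlots)
    .map((job) => job.table);
}
```

Because `filter` returns a new array, the caller's `pending` list is left untouched and the function can be called again on the next cycle.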
Micro-Services Architecture
● Everything is a service
● Decoupling
● Distributed systems
● Separate lifecycle
● Communication using RESTful / Queue / Streams
Docker
● Linux container
● Save provisioning time
● Infrastructure as code
● Dev-Test-Production: identical container
● Ship easily
Cloud infrastructure
● Pay as you go - (grow)
● SaaS services
● Auto Scaling groups
● DynamoDB
● RDS *SQL
● Redshift data warehouse
Continuous Integration
● From commit to production
● Jenkins commit hook
● Git branching model
● AWS dynamic slaves
● Unit tests
● Docker builds
● Updating live environment
Diagram
Starting Point
Pre-baked images - AMIs
Supervisor
Nginx reverse proxy
Node.js * cpu-count
Provisioning time * instances
Bash provisioning scripts
Minimum Viable Product
Infrastructure as code
Nginx
Node.js * cpu-count
Supervisor
Docker Hub
No Bash scripts!
No provisioning time * instances
https://github.com/ironSource/docker-config/blob/bb6be85b97132cbdd10084305ee1ee2f414b0b50/Dockerfile
Interactive Cycle
Nginx
Supervisor
Infrastructure as code
Node.js * cpu-count
Docker Hub
No Bash scripts!
No provisioning time * instances
https://github.com/ironSource/docker-config/blob/c4bbad11a323fd6e36ff31505c43e7c8dc51b1eb/Dockerfile-iojs-cluster
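"Node.js * cpu-count" means one Node.js process per CPU core. A common way to get that inside a single container is Node's built-in `cluster` module; whether the linked Dockerfile does exactly this is an assumption, and `forkWorkers`, `main`, and the port default are hypothetical names for this sketch.

```javascript
const cluster = require('cluster');
const http = require('http');
const os = require('os');

// Fork `count` workers through a cluster-like object and return them.
// Taking the object as a parameter keeps the function easy to test with a fake.
function forkWorkers(clusterApi, count) {
  const workers = [];
  for (let i = 0; i < count; i++) {
    workers.push(clusterApi.fork());
  }
  return workers;
}

// Entry point: call main() from the script the container starts.
function main() {
  // isPrimary exists on newer Node versions; isMaster is the older name.
  const isPrimary = cluster.isPrimary || cluster.isMaster;
  if (isPrimary) {
    forkWorkers(cluster, os.cpus().length);
    // Replace any worker that dies so capacity stays at one process per core.
    cluster.on('exit', () => cluster.fork());
  } else {
    http.createServer((req, res) => res.end('ok'))
      .listen(process.env.PORT || 8080); // workers share the listening port
  }
}
```

With this approach the primary process takes over the restart-on-crash role for the workers.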
User Data
https://github.com/ironSource/docker-config/blob/2f4ccc7c277850de928cc432f47b2fc58fb8732a/Dockerfile-nodejs-cluster
Docker Compose Example #1 (Using ‘extends’):
docker-common.yml
docker-compose.yml
https://stash.ironsrc.com/projects/INFRA-IB/repos/ironbeastcompserter/browse/docker-compose.yml
User Data
Docker Compose Example #2 (Using ‘links’):