
Docker Meetup, 21-09-2015 (files.meetup.com/9585192/Docker Meetup - DataSinks - 21-09-2015.pdf)


DataSinks CONFIDENTIAL

Big Data and Small Services: the story of a whale bus in a data lake

Farzad Senart - President & CTO, co-founder at DataSinks

@fsenart @DataSinks

France in XXI Century - Jean-Marc Côté

Team

Farzad, President & CTO

29 years old, in a relationship

When I’m not coding, I’m coding

Lionel, CEO

29 years old, in a relationship, fiancé

You’d never guess what... all of us are looking at

Christophe, Chief Commercial Officer

28 years old, in a relationship, father

I really admire Ch******he

Yves, Chief Customer Success Officer

34 years old, in a relationship

The most beautiful arrangement: a pile of things poured out at random

Fantastic Four

Mission

Leverage the Internet of ANYthing while playing LEGO®

Data Value Chain as a Service

Discovery: IoT - Wearables - Physical stores - Websites - Databases - Flat files

Ingress: Structured - Semi-structured - Pull - Push - Real time - Batch - Scalable - Highly available

Persistence: Durable - Schema-free - Secured - Scalable - Highly available

Enrichment: Third-party data - Open data - In-house data

Processing: Modular - Third-party API - Real time - Batch - Scalable - Highly available

Analysis: SQL - Data mart - Data lake - Machine learning

Egress: Structured - Semi-structured - Pull - Push - Real time - Batch - Scalable - Highly available

Application: Sentiment analysis - Churn prediction - Recommendation - Forecasting - BI - Analytics - CRM - ERP - Databases - Flat files

Security: Datensparsamkeit (data minimization) - Data governance - Audit trail - Encryption
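The value chain is pitched as modular stages snapped together "while playing LEGO®". A minimal Go sketch of that composition idea, assuming each stage is a plain function over a batch of records. Record, Stage, Chain, Normalize, Enrich and DropEmpty are hypothetical names for illustration, not DataSinks' implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// Record is a hypothetical unit of data moving through the chain.
type Record struct {
	Store, Text, Region string
}

// Stage is one link of the value chain (ingress, enrichment, processing, ...).
type Stage func([]Record) []Record

// Chain composes stages left to right into a single Stage.
func Chain(stages ...Stage) Stage {
	return func(in []Record) []Record {
		for _, s := range stages {
			in = s(in)
		}
		return in
	}
}

// Normalize is a hypothetical ingress stage: trims and lower-cases free text.
func Normalize(in []Record) []Record {
	out := make([]Record, len(in))
	for i, r := range in {
		r.Text = strings.ToLower(strings.TrimSpace(r.Text))
		out[i] = r
	}
	return out
}

// Enrich is a hypothetical enrichment stage joining in-house reference data by store.
func Enrich(regions map[string]string) Stage {
	return func(in []Record) []Record {
		out := make([]Record, len(in))
		for i, r := range in {
			r.Region = regions[r.Store]
			out[i] = r
		}
		return out
	}
}

// DropEmpty is a hypothetical processing stage that filters out records with no text.
func DropEmpty(in []Record) []Record {
	var out []Record
	for _, r := range in {
		if r.Text != "" {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	pipeline := Chain(Normalize, Enrich(map[string]string{"s1": "Paris"}), DropEmpty)
	res := pipeline([]Record{{Store: "s1", Text: "  Great SHOP "}, {Store: "s2", Text: "   "}})
	fmt.Println(res) // [{s1 great shop Paris}]
}
```

Because every stage shares one signature, stages can be reordered, swapped or inserted without touching their neighbours.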

Infrastructure overview (diagram): data sources feed DataSinks, which provides the Data Value Chain as a Service on top of Platform as a Service and Infrastructure as a Service, serving Big Data applications.

Example

Our journey: Microservices

Services connect directly to each other willy-nilly (Netflix, Hailo, etc.).

Our journey: Microservices

“New services can build on what's out there, without anyone knowing to send anything to them directly.” Fred George, ØREDEV 2013

(Diagram: three services connected through a shared bus, and a Need → Choose → Solution decision flow.)

Our journey: Microservices

AWS Kinesis - AWS SNS - AWS SQS
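The bus idea from the Fred George quote can be sketched in-process: publishers emit to a topic without knowing who listens, so a new service only has to subscribe. This toy Go version stands in for a managed bus such as AWS Kinesis, SNS or SQS; Bus, Subscribe and Publish are hypothetical names and are not the API of any AWS service.

```go
package main

import (
	"fmt"
	"sync"
)

// Bus is a toy in-process message bus: publishers send to a topic name and
// never learn who, if anyone, is listening.
type Bus struct {
	mu   sync.RWMutex
	subs map[string][]func(string)
}

func NewBus() *Bus { return &Bus{subs: make(map[string][]func(string))} }

// Subscribe registers a handler for a topic. A new service can do this at any
// time without any change to existing publishers.
func (b *Bus) Subscribe(topic string, h func(string)) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.subs[topic] = append(b.subs[topic], h)
}

// Publish synchronously delivers msg to every current subscriber of topic and
// returns how many handlers received it.
func (b *Bus) Publish(topic, msg string) int {
	b.mu.RLock()
	hs := append([]func(string){}, b.subs[topic]...) // copy so handlers run unlocked
	b.mu.RUnlock()
	for _, h := range hs {
		h(msg)
	}
	return len(hs)
}

func main() {
	bus := NewBus()
	bus.Subscribe("order.created", func(m string) { fmt.Println("billing saw", m) })
	// Later, an analytics service plugs in; the publisher is unchanged.
	bus.Subscribe("order.created", func(m string) { fmt.Println("analytics saw", m) })
	n := bus.Publish("order.created", "order-42")
	fmt.Println("delivered to", n) // delivered to 2
}
```

The topic names and messages are made up; a managed bus adds durability, retries and cross-host delivery that this sketch ignores.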

Our journey: Aggregation

(Diagram: CPU, memory, and disk on each machine, with much of it WASTED; aggregating workloads onto shared machines turns that waste into FREE capacity.)

Our journey: Aggregation

(Diagram: three Mesos Masters backed by a ZooKeeper quorum; Hadoop and Marathon schedulers; Mesos Slaves running a Hadoop task tracker in a Mesos Executor (Task #1: ./ruby X, Task #2) and Docker Executors (./X, java -jar X.jar).)
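Why aggregation reclaims the "WASTED" capacity can be shown with a toy first-fit bin-packing sketch: tasks share hosts instead of each getting a dedicated one. This is a swapped-in illustration, not how Mesos allocates; in Mesos, masters offer resources to framework schedulers such as Hadoop and Marathon, which accept or decline them. Resources, FirstFit and all the sizes below are hypothetical.

```go
package main

import "fmt"

// Resources is a simplified resource vector: CPU cores, memory GB, disk GB.
type Resources struct{ CPU, Mem, Disk float64 }

// fits reports whether demand b fits in spare capacity a.
func (a Resources) fits(b Resources) bool {
	return b.CPU <= a.CPU && b.Mem <= a.Mem && b.Disk <= a.Disk
}

func (a Resources) minus(b Resources) Resources {
	return Resources{a.CPU - b.CPU, a.Mem - b.Mem, a.Disk - b.Disk}
}

// FirstFit places each task on the first host with enough spare capacity,
// opening a new host of size `host` when none fits, and returns the number of
// hosts used. Tasks larger than a whole host are not handled in this sketch.
func FirstFit(host Resources, tasks []Resources) int {
	var spare []Resources // remaining capacity per open host
	for _, t := range tasks {
		placed := false
		for i := range spare {
			if spare[i].fits(t) {
				spare[i] = spare[i].minus(t)
				placed = true
				break
			}
		}
		if !placed {
			spare = append(spare, host.minus(t))
		}
	}
	return len(spare)
}

func main() {
	host := Resources{CPU: 4, Mem: 16, Disk: 100}
	tasks := []Resources{
		{1, 2, 10}, {1, 4, 10}, {2, 4, 20},
		{1, 2, 10}, {2, 8, 30}, {1, 2, 10},
	}
	fmt.Println("one host per task:", len(tasks)) // one host per task: 6
	fmt.Println("aggregated hosts:", FirstFit(host, tasks)) // aggregated hosts: 2
}
```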

Our journey: Scaling

Cluster: leverage existing AWS EC2 patterns (AWS Auto Scaling policy, AWS CloudWatch metrics, AWS SNS).

Services: leverage AWS Elastic Load Balancer, AWS Lambda, AWS SNS.
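At its core, the cluster scaling loop (a CloudWatch metric driving an Auto Scaling policy) is a threshold rule. A minimal sketch of such a rule, assuming a single utilization metric and one-instance steps. Policy, Desired and the thresholds are hypothetical; in the setup described, AWS Auto Scaling makes this decision, not application code.

```go
package main

import "fmt"

// Policy is a hypothetical step-scaling rule evaluated against a metric such
// as average cluster CPU utilization, in percent.
type Policy struct {
	ScaleOutAbove, ScaleInBelow float64 // thresholds in percent
	Min, Max                    int     // bounds on the instance count
}

// Desired returns the new instance count for the current count and the latest
// metric sample: +1 above the high threshold, -1 below the low one, and the
// result is always clamped to [Min, Max].
func (p Policy) Desired(current int, metric float64) int {
	next := current
	switch {
	case metric > p.ScaleOutAbove:
		next++
	case metric < p.ScaleInBelow:
		next--
	}
	if next < p.Min {
		next = p.Min
	}
	if next > p.Max {
		next = p.Max
	}
	return next
}

func main() {
	p := Policy{ScaleOutAbove: 75, ScaleInBelow: 25, Min: 2, Max: 10}
	fmt.Println(p.Desired(3, 90))  // 4
	fmt.Println(p.Desired(3, 10))  // 2
	fmt.Println(p.Desired(2, 10))  // 2 (already at Min)
	fmt.Println(p.Desired(10, 90)) // 10 (already at Max)
}
```

Keeping a gap between the two thresholds avoids flapping between scale-out and scale-in on a noisy metric.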

Our journey: Logging

6 million ways to log in Docker: in app, container collector, collector container, host syslog, syslog container, file collector, etc.

Our setup: a customized AWS ECS AMI. Applications just log (log.Println("Leverage Docker logging driver")), the Docker logging driver forwards the output to AWS CloudWatch Logs, and the AMI is customized with an AWS cloud-init script (GitHub).

Our journey: Security

No perfect solution. AWS IAM roles are shared among containers, and AWS EC2 remains the most granular type of resource. A possible future without hypervisors? A possible future with Docker containers as first-class citizens?

Our journey: Continuous integration

GitHub - Docker Automated Builds - AWS API Gateway - AWS Lambda
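The CI chain ends in a Lambda function behind API Gateway that reacts to a finished Docker automated build. A sketch of the decision such a handler might make, assuming the webhook JSON carries `repository.repo_name` and `push_data.tag`. These field names are modeled on Docker Hub's webhook payload and should be checked against the current documentation. HubEvent, ImageToDeploy and the repository name are hypothetical, and the talk does not say which language the authors' Lambda functions used.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// HubEvent keeps only the fields this sketch needs from an automated-build
// webhook. The field names are assumed from Docker Hub's payload format.
type HubEvent struct {
	PushData struct {
		Tag string `json:"tag"`
	} `json:"push_data"`
	Repository struct {
		RepoName string `json:"repo_name"`
	} `json:"repository"`
}

// ImageToDeploy parses the webhook body and returns the image reference
// (repo:tag) that a deploy step, such as an ECS service update, would roll out.
func ImageToDeploy(body []byte) (string, error) {
	var ev HubEvent
	if err := json.Unmarshal(body, &ev); err != nil {
		return "", err
	}
	if ev.Repository.RepoName == "" || ev.PushData.Tag == "" {
		return "", fmt.Errorf("webhook payload missing repo_name or tag")
	}
	return ev.Repository.RepoName + ":" + ev.PushData.Tag, nil
}

func main() {
	body := []byte(`{"push_data":{"tag":"latest"},"repository":{"repo_name":"datasinks/ingress"}}`)
	img, err := ImageToDeploy(body)
	fmt.Println(img, err) // datasinks/ingress:latest <nil>
}
```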

Our journey: CRON jobs

Workaround: leverage AWS Data Pipeline to start tasks with the AWS ECS API. Limitation: a limited scheduling interval. Future managed CRON service?

Data flow: Collect

(Diagram: three Lambda functions, two ECS services, and Kinesis.)
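Kinesis routes each record by partition key. The key is hashed with MD5 into a 128-bit integer, and the record goes to the shard whose hash key range contains that value. A Go sketch, assuming n shards with equal-sized ranges. Real streams expose each shard's own HashKeyRange, which can be uneven after splits or merges. ShardFor is a hypothetical helper and the keys are made up.

```go
package main

import (
	"crypto/md5"
	"fmt"
	"math/big"
)

// ShardFor returns the index (0..n-1) of the shard a partition key lands on,
// assuming n >= 1 shards whose hash key ranges split the 128-bit space evenly.
func ShardFor(partitionKey string, n int) int {
	sum := md5.Sum([]byte(partitionKey))          // 16-byte MD5 digest
	h := new(big.Int).SetBytes(sum[:])            // digest as an unsigned 128-bit integer
	space := new(big.Int).Lsh(big.NewInt(1), 128) // size of the hash key space, 2^128
	// With equal ranges, shard i owns [i*2^128/n, (i+1)*2^128/n),
	// so the index is floor(h * n / 2^128).
	idx := new(big.Int).Mul(h, big.NewInt(int64(n)))
	idx.Div(idx, space)
	return int(idx.Int64())
}

func main() {
	for _, key := range []string{"sensor-1", "sensor-2", "store-42"} {
		fmt.Println(key, "-> shard", ShardFor(key, 4))
	}
}
```

Records sharing a partition key always land on the same shard, which is what preserves per-key ordering; a skewed key distribution therefore means skewed shard load.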

Data flow: Dispatch

(Diagram: SNS, five Lambda functions, two ECS services, and Kinesis feeding a Lambda function.)

Data flow: Process

(Diagram: the Dispatch components plus three SQS queues and three more ECS services.)
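In the Process stage, queues sit between dispatch and the ECS services. A sketch of the consumer side, with a Go channel standing in for an SQS queue and goroutines standing in for ECS tasks; Drain is a hypothetical helper. Real SQS differs in ways that matter. Standard queues deliver at least once, so handlers should be idempotent. A received message is also hidden only for a visibility timeout and must be deleted explicitly after processing, or it reappears.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// Drain runs `workers` goroutines that consume messages from queue until it is
// closed, applying handle to each one. It stands in for an ECS service whose
// tasks poll a single queue; results are sorted because workers finish in any order.
func Drain(queue <-chan string, workers int, handle func(string) string) []string {
	var (
		mu      sync.Mutex
		results []string
		wg      sync.WaitGroup
	)
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for msg := range queue {
				out := handle(msg)
				mu.Lock()
				results = append(results, out)
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	sort.Strings(results)
	return results
}

func main() {
	q := make(chan string, 3)
	for _, m := range []string{"b", "a", "c"} {
		q <- m
	}
	close(q)
	fmt.Println(Drain(q, 2, strings.ToUpper)) // [A B C]
}
```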

Awesomeness: cluster management API simplicity and completeness, AWS services integrations

Hopes: managed CRON services, managed service discovery, increased ecs-agent velocity

Feedback


DataSinks

@DataSinks