DataSinks CONFIDENTIAL
Big Data and Small Services: the story of a whale bus in a data lake
Farzad Senart - President & CTO, co-founder at DataSinks
@fsenart @DataSinks
Team
Farzad, President & CTO
29 years old, in a relationship
When I’m not coding, I’m coding
Lionel, CEO
29 years old, in a relationship, fiancé
You’d never guess what... All of us are looking at
Christophe, Chief Commercial Officer
28 years old, in a relationship, father
I really admire Ch******he
Yves, Chief Customer Success Officer
34 years old, in a relationship
The most beautiful arrangement: a pile of things poured out at random
Fantastic Four
Data Value Chain as a Service
Discovery: IoT - Wearables - Physical stores - Websites - Databases - Flat files
Ingress: Structured - Semi-structured - Pull - Push - Real time - Batch - Scalable - Highly available
Persistence: Durable - Schema-free - Secured - Scalable - Highly available
Enrichment: Third-party data - Open data - In-house data
Processing: Modular - Third-party API - Real time - Batch - Scalable - Highly available
Analysis: SQL - Data mart - Data lake - Machine learning
Egress: Structured - Semi-structured - Pull - Push - Real time - Batch - Scalable - Highly available
Application: Sentiment analysis - Churn prediction - Recommendation - Forecasting - BI - Analytics - CRM - ERP - Databases - Flat files
Security: Datensparsamkeit (data minimization) - Data governance - Audit trail - Encryption
Data Value Chain as a Service
Platform as a Service
Infrastructure as a Service
Infrastructure overview
[Diagram: several Data sources and several Big Data Applications, connected through DataSinks]
Microservices
Our journey
Services connect directly to each other willy-nilly (Netflix, Hailo, etc.).
New services can build on what's out there, without anyone knowing to send anything to them directly. (Fred George, ØREDEV 2013)
[Diagram: Services attached to a shared Bus, exchanging Need, Solutions and Choose Solution messages]
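The bus idea above can be sketched in a few lines of Go. This is only an in-memory illustration under stated assumptions, not the DataSinks implementation: the `Message` and `Bus` types, the "need" and "solution" kinds, and the provider names are all hypothetical. The point it shows is that a publisher never addresses a service directly, so a new subscriber can be added without touching existing ones.

```go
package main

import "fmt"

// Message is a hypothetical envelope; Kind is "need" or "solution" here.
type Message struct {
	Kind string
	Body string
}

// Bus is a minimal synchronous in-memory bus: every published message
// is delivered to every subscriber, in subscription order.
type Bus struct {
	subscribers []func(Message)
}

// Subscribe registers a handler that will see all future messages.
func (b *Bus) Subscribe(handler func(Message)) {
	b.subscribers = append(b.subscribers, handler)
}

// Publish delivers m to every handler subscribed at the time of the call.
func (b *Bus) Publish(m Message) {
	for _, handler := range b.subscribers {
		handler(m)
	}
}

// provider returns a handler that answers every need with a solution.
// It publishes back on the bus and knows nothing about who is listening.
func provider(bus *Bus, name string) func(Message) {
	return func(m Message) {
		if m.Kind == "need" {
			bus.Publish(Message{Kind: "solution", Body: name + " for " + m.Body})
		}
	}
}

// collectSolutions publishes one need and returns the solutions offered.
func collectSolutions(bus *Bus, need string) []string {
	var offers []string
	bus.Subscribe(func(m Message) {
		if m.Kind == "solution" {
			offers = append(offers, m.Body)
		}
	})
	bus.Publish(Message{Kind: "need", Body: need})
	return offers
}

func main() {
	bus := &Bus{}
	bus.Subscribe(provider(bus, "A"))
	bus.Subscribe(provider(bus, "B"))
	fmt.Println(collectSolutions(bus, "offer"))
}
```

Choosing among the collected solutions is left out; any selection rule could be applied to the returned slice.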
Our journey
[Diagram: Mesos cluster. Labels: Mesos Master (x3), Zookeeper quorum, Hadoop Scheduler, Marathon Scheduler, Mesos Slave (x2), Hadoop task tracker, Mesos Executor, Docker Executor (x2), Task #1, Task #2, ./ruby X, ./X, java -jar X.jar, Aggregation]
Scaling
Our journey
Cluster
Leverage existing AWS EC2 patterns: AWS AutoScaling Policy, AWS CloudWatch metrics, AWS SNS
Services
Leverage AWS Elastic Load Balancer, AWS Lambda, AWS SNS
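As a rough illustration of metric-driven scaling, the sketch below computes a new desired count from a single metric value. It is an assumption-laden toy, not AWS's algorithm and not necessarily what the deck's Lambda does: the `ScalingPolicy` type, its thresholds, and the one-step adjustment are all hypothetical.

```go
package main

import "fmt"

// ScalingPolicy is a hypothetical threshold policy: add one unit when the
// metric is above ScaleOutAbove, remove one when it is below ScaleInBelow.
type ScalingPolicy struct {
	ScaleOutAbove float64
	ScaleInBelow  float64
	Min, Max      int
}

// DesiredCount returns the next desired count, clamped to [Min, Max].
func (p ScalingPolicy) DesiredCount(current int, metric float64) int {
	next := current
	switch {
	case metric > p.ScaleOutAbove:
		next++
	case metric < p.ScaleInBelow:
		next--
	}
	if next < p.Min {
		next = p.Min
	}
	if next > p.Max {
		next = p.Max
	}
	return next
}

func main() {
	// Thresholds are illustrative values, e.g. a CPU utilisation percentage.
	p := ScalingPolicy{ScaleOutAbove: 75, ScaleInBelow: 25, Min: 1, Max: 4}
	fmt.Println(p.DesiredCount(2, 90), p.DesiredCount(2, 50), p.DesiredCount(2, 10))
}
```

In a real setup the metric would come from an alarm notification and the result would be applied through the relevant AWS API; both steps are deliberately left out here.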
Logging
Our journey
6 million ways to log in Docker
In app, container collector, collector container, host syslog, syslog container, file collector, etc.
Customized AWS ECS AMI
log.Println("Leverage Docker logging driver")
AWS CloudWatch Logs, AWS cloud-init script, GitHub
Security
Our journey
No perfect solution
AWS IAM roles are shared among containers. AWS EC2 remains the most granular type of resource. A possible future without hypervisors? A possible future with Docker containers as first-class citizens?
Continuous integration
Our journey
GitHub
Docker Automated Builds
AWS API Gateway
AWS Lambda
CRON jobs
Our journey
Workaround
Leverage AWS Data Pipeline. Start tasks with AWS ECS API. Limited scheduling interval. Future managed CRON service?
Dispatch
Data flow
[Diagram: SNS, Lambda (x5), ECS (x2), Kinesis, Lambda]
Process
Data flow
[Diagram: SNS, Lambda (x2), SQS (x3), ECS (x3), Lambda (x3), ECS (x2), Kinesis, Lambda]
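One possible reading of the Process slide is a queue sitting between a dispatcher and a pool of workers. The Go sketch below is an in-memory analogy only, with no AWS calls: a buffered channel stands in for a queue, goroutines stand in for worker tasks, and the `process` function and its upper-casing "work" are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// process enqueues events, lets a pool of workers drain the queue, and
// returns the results sorted (workers finish in no particular order).
func process(events []string, workers int) []string {
	// The buffered channel plays the role of the queue.
	queue := make(chan string, len(events))
	for _, e := range events {
		queue <- e
	}
	close(queue)

	results := make(chan string, len(events))
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Each worker pulls until the queue is empty.
			for e := range queue {
				results <- strings.ToUpper(e) // placeholder for real work
			}
		}()
	}
	wg.Wait()
	close(results)

	var out []string
	for r := range results {
		out = append(out, r)
	}
	sort.Strings(out)
	return out
}

func main() {
	fmt.Println(process([]string{"b", "a", "c"}, 2))
}
```

The queue decouples the rate at which events arrive from the rate at which workers can handle them, which is the property the sketch is meant to show.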
Awesomeness
Cluster management, API simplicity and completeness, AWS services integrations
Hopes
Managed CRON services, managed service discovery, increased ecs-agent velocity
Feedback