Modern Approaches of Customer’s Dream Distribution Across the Cluster
Evgenij Kozhevnikov, Samara
AUGUST 4, 2015
About me
• 1+ years of production experience in Big Data
  – Edmunds.com
  – BigData CC
• 3+ years of development experience in Big Data
  – Hadoop
  – Spark
  – Storm
  – Akka
• 6+ years of development experience
  – Java EE, IBM WebSphere
  – Spring
Successful Business - Growing Business
Growing Business – Growing Load
Our software should be ready to grow with the business
1. Pay for what you need now, not for what you plan
2. Growth should not require any changes to the application
3. Where there is one growing app, there will be several
Our software should be ready to grow with the business
• Caching proxy
• CDN
• NoSQL
• Distributed cache
Does Edmunds need a cluster solution?
1. What trends do we have now?
2. Is the quality of the vehicle catalog good enough?
3. Is our advertising effective?
4. What results do we get from A/B testing?
5. What can we recommend to our clients?
6. Where is the car that the client needs?
7. How many leads were sent to the dealer?
8. Is the dealer successful?
9. Are our visitors real people, not robots?
10. What revenue do we have this year?
11. Are we growing?
12. Are our dealers growing?
It’s not a competitive advantage: all competitors do that.
Does Edmunds need a cluster solution?
Growing amount of data
Growing number of tasks
• Fast access to the entire data set is needed
• Historical data is as important as new data
• Hardware resources must be dynamically extensible
• Several independent applications must be able to run on the same cluster
• Each application run requires a specific amount of resources
• A convenient monitoring tool and a fault-tolerant system are needed
• Code should be readable, and distributed algorithms should be maintainable
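A few of the requirements above (dynamically extended hardware, several independent applications on one cluster, a specific amount of resources per run) can be illustrated with a toy allocator. This is only a minimal sketch under assumptions: the `Node` and `Cluster` classes, the first-fit policy, and all names and sizes are hypothetical and are not taken from the talk or from any real resource manager.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    free_cores: int
    free_mem_gb: int


@dataclass
class Cluster:
    nodes: list = field(default_factory=list)
    allocations: list = field(default_factory=list)  # (app, node name) pairs

    def add_node(self, node):
        # Hardware can be added while applications are already running.
        self.nodes.append(node)

    def allocate(self, app, cores, mem_gb):
        """Reserve resources on the first node that fits; return its name or None."""
        for node in self.nodes:
            if node.free_cores >= cores and node.free_mem_gb >= mem_gb:
                node.free_cores -= cores
                node.free_mem_gb -= mem_gb
                self.allocations.append((app, node.name))
                return node.name
        return None  # the request has to wait until resources appear


cluster = Cluster([Node("node-1", free_cores=8, free_mem_gb=32)])
first = cluster.allocate("etl-job", cores=6, mem_gb=24)      # fits on node-1
second = cluster.allocate("report-job", cores=4, mem_gb=16)  # no room yet
cluster.add_node(Node("node-2", free_cores=8, free_mem_gb=32))
third = cluster.allocate("report-job", cores=4, mem_gb=16)   # fits on node-2
print(first, second, third)
```

The application code does not change when `node-2` joins; only the pool of resources the allocator can see grows.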
Hadoop-based solutions: MapReduce, YARN
MapReduce across YARN
[Diagram: a cluster of eight nodes]
[Diagram: the cluster with a ResourceManager, a NameNode, and worker nodes]
[Diagram: Hadoop Client, Active ResourceManager, Standby ResourceManager, MR Application Master, NameNode, and two DataNodes, each running an MR Executor]
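The MapReduce programming model that these executors run can be sketched in plain Python, here applied to one of the earlier questions ("How many leads were sent to the dealer?"). This is a single-process illustration, not Hadoop code: the `leads` records, the dealer identifiers, and the function names are assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical lead records: (dealer_id, lead_id).
leads = [
    ("dealer-a", 1),
    ("dealer-b", 2),
    ("dealer-a", 3),
    ("dealer-c", 4),
    ("dealer-a", 5),
]


def map_phase(record):
    """Emit one (dealer_id, 1) pair per lead."""
    dealer_id, _lead_id = record
    yield dealer_id, 1


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the counts collected for one dealer."""
    return key, sum(values)


def run_job(records):
    mapped = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


leads_per_dealer = run_job(leads)
print(leads_per_dealer)  # {'dealer-a': 3, 'dealer-b': 1, 'dealer-c': 1}
```

On a real cluster the map and reduce calls run in parallel on different nodes and the shuffle moves data over the network; the sketch only shows the shape of the computation.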
Hadoop-based solutions: Spark, YARN
Spark across YARN
[Diagram: Spark Client, Active ResourceManager, Standby ResourceManager, Spark Application Master, NameNode, and two DataNodes, each running a Spark Executor]
Mesosphere-based solutions: Spark, Mesos
Spark across Mesos
[Diagram: Spark Client, Active Mesos Master, Standby Mesos Master, Spark Scheduler, NameNode, and two DataNodes, each running a Spark Executor]
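Mesos differs from YARN in that the master offers free resources to framework schedulers (such as the Spark Scheduler in the diagram), and each framework decides what to accept. A heavily simplified sketch of one offer round follows; the `Framework` class, the `offer_round` function, the cores-only resource model, and the fixed offer order are all assumptions for illustration and do not reflect the real Mesos API or its allocation policy.

```python
class Framework:
    """A toy framework scheduler that accepts cores for its pending tasks."""

    def __init__(self, name, cores_per_task, pending_tasks):
        self.name = name
        self.cores_per_task = cores_per_task
        self.pending_tasks = pending_tasks

    def on_offer(self, offered_cores):
        """Return the number of cores accepted from the offer (0 means decline)."""
        tasks = min(self.pending_tasks, offered_cores // self.cores_per_task)
        self.pending_tasks -= tasks
        return tasks * self.cores_per_task


def offer_round(free_cores_by_node, frameworks):
    """Offer each node's free cores to every framework in turn."""
    launched = []
    for node, free in free_cores_by_node.items():
        for framework in frameworks:
            used = framework.on_offer(free)
            if used:
                launched.append((node, framework.name, used))
                free -= used
        free_cores_by_node[node] = free  # whatever nobody accepted stays free
    return launched


nodes = {"node-1": 8, "node-2": 4}
frameworks = [Framework("spark", 4, 1), Framework("storm", 2, 3)]
launched = offer_round(nodes, frameworks)
print(launched)
print(nodes)
```

In this run both frameworks end up sharing `node-1`, and two cores on `node-2` remain free for a later offer.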
What next?
1. Myriad
• YARN on Mesos
• Efficient access to Hadoop resources
• Dynamic nature of Mesos
2. Kubernetes
• Resource manager for Docker-based infrastructure
• Solution from Google
3. Akka Cluster
• Efficient model for vertical and horizontal scaling
• Freedom to choose how work is distributed
4. Task-specific tools
• Apache Storm
• Hive/Pig/Cascading…
• NoSQL solutions
• Kafka/Sqoop/Flume…
• Chef/Puppet/Ansible…
• Docker/Rocket/CoreOS
• Data Science