2. [email protected] Overview: Introduction, Architecture, Security,
Container, High availability
3. Introduction
First released in 2009 at the University of California, Berkeley.
A framework for using datacenter resources efficiently.
Combines CPU, storage, memory, etc. into one big shared virtual resource.
A distributed systems kernel.
About 10,000 lines of C++ code.
4. Introduction - Definitions
Master - scheduler
Slaves - worker nodes
Frameworks - applications running on Mesos
Executors - run tasks on the slaves
Executor task - a job running on a slave
Resource offer - slave resources that the frameworks may use
8. Resource Allocation
1) Slave 1 reports to the master that it has 4 CPUs and 4 GB of memory free. The master then invokes the allocation policy module, which tells it that framework 1 should be offered all available resources.
2) The master sends a resource offer describing what is available on slave 1 to framework 1.
9. Resource Allocation
3) The framework's scheduler replies to the master with information about two tasks to run on the slave, including the resources to use for the first task and for the second task.
4) Finally, the master sends the tasks to the slave, which allocates appropriate resources to the framework's executor, which in turn launches the two tasks.
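The four steps above can be sketched as a toy simulation. This is only an illustration, not Mesos code: the `Resources` class, the function names, and the two task sizes (2 CPUs / 1 GB and 1 CPU / 2 GB) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    cpus: int
    mem_gb: int


def framework_scheduler(offer):
    """Hypothetical framework 1: answers an offer with the tasks that fit into it."""
    wanted = [("task-1", Resources(2, 1)), ("task-2", Resources(1, 2))]  # assumed sizes
    accepted, cpus, mem = [], offer.cpus, offer.mem_gb
    for name, need in wanted:
        if need.cpus <= cpus and need.mem_gb <= mem:
            accepted.append((name, need))
            cpus -= need.cpus
            mem -= need.mem_gb
    return accepted


def run_offer_cycle(slave_free):
    # Steps 1-2: the slave reports its free resources; the allocation policy
    # (here: "offer everything to framework 1") turns them into one offer.
    offer = slave_free
    # Step 3: the framework's scheduler replies with the tasks it wants to run.
    tasks = framework_scheduler(offer)
    # Step 4: the master forwards the tasks; the slave reserves their resources.
    remaining = Resources(
        slave_free.cpus - sum(need.cpus for _, need in tasks),
        slave_free.mem_gb - sum(need.mem_gb for _, need in tasks),
    )
    return tasks, remaining


launched, remaining = run_offer_cycle(Resources(cpus=4, mem_gb=4))
print([name for name, _ in launched], remaining)
```

Whatever the framework does not use (here 1 CPU and 1 GB) stays with the slave and can be offered again in a later cycle.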
10. Resource Allocation - DRF
Resource offer decisions are made by the resource allocation module in the master.
In a heterogeneous environment, resource allocation is difficult.
What is a fair share when user A requires 1 CPU and 4 GB RAM per task and user B requires 3 CPUs and 1 GB RAM per task?
Mesos uses Dominant Resource Fairness (DRF).
11. DRF
A modified fair-share algorithm.
The goal is that each framework receives a fair share of the resource it needs most.
Dominant resource: the resource type a framework demands most, relative to the total available.
Dominant share: the share of its dominant resource that is allocated to the framework, i.e. its highest share across all resource types.
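12. DRF - Example
Resource offer: 9 CPUs, 18 GB RAM
Tasks of user A: 1 CPU, 4 GB RAM each; dominant resource (DR) = RAM
Tasks of user B: 3 CPUs, 1 GB RAM each; dominant resource (DR) = CPU
DRF equalizes the dominant shares (DS): A runs 3 tasks (12 of 18 GB RAM) and B runs 2 tasks (6 of 9 CPUs), so each framework ends up with a dominant share of 2/3.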
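The example can be reproduced with a small progressive-filling loop: always give the next task to the user with the lowest dominant share. This is a minimal sketch, not the Mesos allocator. The function name `drf_allocate` and the data layout are assumptions, and unlike the simplest textbook variant it drops a user whose next task no longer fits and keeps going with the others.

```python
from fractions import Fraction


def drf_allocate(capacity, demands):
    """Hand out tasks one at a time to the user with the lowest dominant share."""
    used = {res: 0 for res in capacity}
    alloc = {user: {res: 0 for res in capacity} for user in demands}
    tasks = {user: 0 for user in demands}

    def dominant_share(user):
        # Largest fraction of any single resource type held by this user.
        return max(Fraction(alloc[user][res], capacity[res]) for res in capacity)

    active = set(demands)
    while active:
        # sorted() makes ties deterministic (alphabetical order).
        user = min(sorted(active), key=dominant_share)
        demand = demands[user]
        if any(used[res] + demand[res] > capacity[res] for res in capacity):
            active.remove(user)  # this user's next task does not fit any more
            continue
        for res in capacity:
            used[res] += demand[res]
            alloc[user][res] += demand[res]
        tasks[user] += 1
    return tasks, {user: dominant_share(user) for user in demands}


tasks, shares = drf_allocate(
    capacity={"cpus": 9, "mem_gb": 18},
    demands={"A": {"cpus": 1, "mem_gb": 4}, "B": {"cpus": 3, "mem_gb": 1}},
)
print(tasks, shares)
```

With the numbers from the slide, user A ends up with 3 tasks and user B with 2, and both dominant shares are 2/3.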
13. DRF - Example
Framework 1: 1 CPU, 4 GB RAM
Framework 2: 3 CPUs, 1 GB RAM
Buggy tasks can be killed by Mesos.
A framework can have a guaranteed allocation, in which case none of its tasks should be killed.
14. RA Master Configuration
Name: default / example
allocation_interval: default 1s
framework_sorter: default drf
user_sorter: default drf
offer_timeout: example 5 minutes
roles: default -, example marathon,jenkins
weights: default -, example marathon=2,jenkins=1
15. RA Slave Configuration
Name: default / example
attributes: example ssd:true;rack:2
default_role: default *
resources: example cpus(jenkins):1;disk(jenkins):10000;cpus(marathon):3;mem(marathon):2000
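To make the structure of the resources string easier to see, here is a small parser that groups the entries by role. It is a sketch for illustration only: `parse_resources` is a hypothetical helper, not part of Mesos, and it handles only scalar values of the form name(role):value, not ranges or sets.

```python
import re
from collections import defaultdict

# name, optional "(role)", then a scalar value, e.g. "cpus(jenkins):1"
_ITEM = re.compile(r"^(?P<name>\w+)(?:\((?P<role>[^)]+)\))?:(?P<value>[\d.]+)$")


def parse_resources(spec, default_role="*"):
    """Group a 'name(role):value;...' string by role (scalar values only)."""
    by_role = defaultdict(dict)
    for item in spec.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing ';'
        match = _ITEM.match(item)
        if match is None:
            raise ValueError(f"unsupported resource item: {item!r}")
        role = match.group("role") or default_role
        by_role[role][match.group("name")] = float(match.group("value"))
    return dict(by_role)


parsed = parse_resources(
    "cpus(jenkins):1;disk(jenkins):10000;cpus(marathon):3;mem(marathon):2000"
)
print(parsed)
```

Entries without an explicit role fall back to the default role, which mirrors the default_role setting of `*` on the slide.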
16. Mesos Security
Default configuration = no security.
Master (name: example)
authenticate_slaves: true
credentials: /etc/mesos.pw
authenticators: crammd5
authenticate: true
Slave (name: example)
credential: /etc/mesos.pw
17. Framework Security
1) Frameworks can (re-)register with authorized roles.
2) Frameworks can launch tasks/executors as authorized users.
3) Authorized principals can shut down frameworks through the /shutdown HTTP endpoint.
18. Security ACLs
Subjects: principals, usernames
Actions: register_frameworks, run_tasks, shutdown_frameworks
Objects: roles, users, framework_principals
A set of subjects can perform an action on a set of objects.
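As an illustration of "subjects, action, objects", the snippet below builds an ACL document and prints it as JSON. The layout follows my understanding of the JSON accepted by the master's ACL option and should be checked against the documentation of the Mesos version in use. The helper `acl_entry`, the principals (marathon, jenkins, admin) and the user nobody are assumptions chosen for the example.

```python
import json


def acl_entry(principals, object_key, objects):
    """One rule: these principals (subjects) may act on these objects."""
    return {
        "principals": {"values": principals},
        object_key: {"values": objects},
    }


acls = {
    # Which principal may register a framework with which role.
    "register_frameworks": [
        acl_entry(["marathon"], "roles", ["marathon"]),
        acl_entry(["jenkins"], "roles", ["jenkins"]),
    ],
    # Which principal may run tasks as which operating system user.
    "run_tasks": [
        acl_entry(["marathon", "jenkins"], "users", ["nobody"]),
    ],
    # Which principal may shut down the frameworks of which principals.
    "shutdown_frameworks": [
        acl_entry(["admin"], "framework_principals", ["marathon", "jenkins"]),
    ],
}

acls_json = json.dumps(acls, indent=2)
print(acls_json)
```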
20. Extract of the Mesos API
URL - Function
master:5050/help - REST documentation
master:5050/metrics/snapshot - Metrics of the cluster
master:5050/master/tasks.json - List Mesos tasks
master:5050/master/redirect - 307 redirect to the leading master
master:5050/master/shutdown - Shut down a framework
master:5050/registrar(1)/registry - Content of the current registry
slave:5051/files/browse.json?path=pathOnSlave - Browse files in the sandbox
slave:5051/files/read.json?path=stdoutOnSlave - Read stdout from the sandbox
slave:5051/system/stats.json - Local system metrics
21. Resource Isolation
Mesos supports Docker containers and Mesos containers.
Resource isolation with cgroups or POSIX.
24. Mesos Task States
TaskState  Int  Description
TASK_STARTING  0
TASK_RUNNING  1
TASK_FINISHED  2  TERMINAL: The task finished successfully
TASK_FAILED  3  TERMINAL: The task failed to finish successfully
TASK_KILLED  4  TERMINAL: The task was killed by the executor
TASK_LOST  5  TERMINAL: The task failed but can be rescheduled
TASK_STAGING  6  Initial state
TASK_ERROR  7  TERMINAL: The task description contains an error
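A framework that tracks status updates usually needs to know whether a state is terminal. The table above can be mirrored as an enum. This is a sketch with the integer values taken from the table, and the helper `is_terminal` is an assumption, not a Mesos API.

```python
from enum import IntEnum


class TaskState(IntEnum):
    TASK_STARTING = 0
    TASK_RUNNING = 1
    TASK_FINISHED = 2
    TASK_FAILED = 3
    TASK_KILLED = 4
    TASK_LOST = 5
    TASK_STAGING = 6  # initial state
    TASK_ERROR = 7


# States marked TERMINAL in the table: no further updates are expected.
TERMINAL_STATES = frozenset({
    TaskState.TASK_FINISHED,
    TaskState.TASK_FAILED,
    TaskState.TASK_KILLED,
    TaskState.TASK_LOST,
    TaskState.TASK_ERROR,
})


def is_terminal(state):
    """Accepts a TaskState or its integer value."""
    return TaskState(state) in TERMINAL_STATES


print(TaskState(5).name, is_terminal(5))
```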