Mesos Python framework O. Sallou, DevExp 2016 CC-BY-SA 3.0

Creating a Mesos python framework


Interacting with Mesos: 2 choices

Python API:

- Not compatible with Python 3
- Easy to implement
- Bindings over the C++ library (libmesos)

HTTP API:

- HTTP calls with a persistent connection and streaming
- Recent
- Language independent

Workflow

Register => listen for offers => accept/decline offer => listen for job status

Messages use Protobuf [0]; the HTTP interface also supports JSON.

See the Mesos protobuf definitions [1] to read or create messages.

[0] https://developers.google.com/protocol-buffers/

[1] https://github.com/apache/mesos/blob/master/include/mesos/mesos.proto
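For the HTTP API, the JSON side can be sketched as follows, assuming the v1 scheduler API: the framework POSTs a JSON SUBSCRIBE call to /api/v1/scheduler on the master and keeps the connection open, and Mesos streams events back in RecordIO format (each record is its length in bytes, a newline, then the JSON payload). The helper names subscribe_body and parse_recordio are hypothetical, and the actual HTTP connection is left out; this only builds the request body and splits the event stream.

```python
import json


def subscribe_body(user, name, failover_timeout=None):
    """Build the JSON body of a SUBSCRIBE call (hypothetical helper)."""
    framework_info = {"user": user, "name": name}
    if failover_timeout is not None:
        framework_info["failover_timeout"] = failover_timeout
    return json.dumps({"type": "SUBSCRIBE",
                       "subscribe": {"framework_info": framework_info}})


def parse_recordio(buffer):
    """Split a RecordIO byte buffer into decoded JSON events.

    Each record is b"<length>\\n<payload>". Returns (events, rest), where
    rest holds an incomplete trailing record to prepend to the next chunk.
    """
    events = []
    while True:
        newline = buffer.find(b"\n")
        if newline < 0:
            break  # the length prefix itself is not complete yet
        length = int(buffer[:newline])
        start = newline + 1
        if len(buffer) < start + length:
            break  # payload not fully received yet
        events.append(json.loads(buffer[start:start + length].decode("utf-8")))
        buffer = buffer[start + length:]
    return events, buffer
```

Keeping the unparsed remainder matters because HTTP chunks do not have to line up with record boundaries.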

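Simple example

Python API

Register

```python
import mesos.native
from mesos.interface import mesos_pb2

# mesos_pb2.XXX() read/use/write protobuf Mesos objects
framework = mesos_pb2.FrameworkInfo()
framework.user = ""  # Have Mesos fill in the current user.
framework.name = "Example Mesos framework"
framework.failover_timeout = 3600 * 24 * 7  # 1 week, in seconds
framework.principal = "godocker-mesos-framework"

# Optionally, restart from a previous run
# (previous_framework_id is a placeholder for the ID saved by that run)
previous_framework_id = None
if previous_framework_id:
    mesos_framework_id = mesos_pb2.FrameworkID()
    mesos_framework_id.value = previous_framework_id
    framework.id.MergeFrom(mesos_framework_id)

# The executor must be defined before it is given to the scheduler
executor = mesos_pb2.ExecutorInfo()
executor.executor_id.value = "sample"
executor.name = "Example executor"

# We will create our scheduler class MesosScheduler in the next slide
mesosScheduler = MesosScheduler(1, executor)

# Let's declare a framework, with a scheduler to manage offers.
# Master found through ZooKeeper on its default port 2181
# (assuming the masters register under the /mesos znode)
driver = mesos.native.MesosSchedulerDriver(
    mesosScheduler,
    framework,
    "zk://127.0.0.1:2181/mesos")

driver.start()
driver.join()  # Block until the driver is stopped or aborted
```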

When scheduler ends...

When the scheduler stops, Mesos kills any remaining tasks once "failover_timeout" has elapsed without the framework re-registering.

Setting the FrameworkID lets a restarted framework keep the same context: Mesos keeps its tasks and sends their status messages to the re-registered framework.
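To reuse the FrameworkID after a restart, the ID received in registered() has to be stored somewhere. A minimal sketch, assuming a plain local file is good enough (save_framework_id, load_framework_id and the file path are hypothetical, not part of the Mesos API):

```python
import os
import tempfile


def save_framework_id(path, framework_id):
    """Store the ID received in registered() so a restarted scheduler can reuse it."""
    with open(path, "w") as f:
        f.write(framework_id)


def load_framework_id(path):
    """Return the saved framework ID, or None on a first run."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        value = f.read().strip()
    return value or None


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "framework_id")
    print(load_framework_id(path))  # None on a first run
    save_framework_id(path, "example-framework-id")
    print(load_framework_id(path))  # example-framework-id
```

In the scheduler, registered() would call save_framework_id(path, frameworkId.value), and at startup a non-None result of load_framework_id(path) would go into framework.id.value. This only helps if the restart happens before failover_timeout expires.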

Scheduler skeleton

```python
import logging

import mesos.interface
from mesos.interface import mesos_pb2


class MesosScheduler(mesos.interface.Scheduler):

    def registered(self, driver, frameworkId, masterInfo):
        logging.info("Registered with framework ID %s" % frameworkId.value)
        self.frameworkId = frameworkId.value

    def resourceOffers(self, driver, offers):
        '''
        Receive offers; an offer describes a node
        with available resources (cpu, mem, etc.)
        '''
        for offer in offers:
            logging.debug('Mesos:Offer:Decline')
            driver.declineOffer(offer.id)

    def statusUpdate(self, driver, update):
        '''
        Receive status info from submitted tasks
        (switch to running, failure of node, etc.)
        '''
        logging.debug("Task %s is in state %s" %
                      (update.task_id.value,
                       mesos_pb2.TaskState.Name(update.state)))

    def frameworkMessage(self, driver, executorId, slaveId, message):
        logging.debug("Received framework message")
        # usually, nothing to do here
```

Messages are asynchronous

Status updates and offers arrive as asynchronous callbacks. The scheduler runs in a separate thread.

Apart from registration, you are never the initiator of the requests: you receive callback messages whenever something changes on the Mesos side (a job switches to running, a node fails, …).
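Because callbacks run in the driver's thread, any state they share with the rest of the program needs thread-safe handling. A minimal sketch, assuming a design where callbacks only enqueue events and the main thread consumes them; QueueingScheduler and simulate_driver_thread are hypothetical, and the class is shown without the mesos.interface.Scheduler base so it runs without the bindings. It is written for Python 3; under Python 2, which the native bindings require, the module is named Queue.

```python
import queue
import threading
from types import SimpleNamespace


class QueueingScheduler(object):
    """Hypothetical scheduler whose statusUpdate() only enqueues events.

    In a real framework it would subclass mesos.interface.Scheduler.
    """

    def __init__(self):
        self.updates = queue.Queue()  # thread-safe hand-off to the main thread

    def statusUpdate(self, driver, update):
        # Keep the callback short: record the event and return immediately.
        self.updates.put((update.task_id.value, update.state))


def simulate_driver_thread(scheduler, task_id, state):
    """Call statusUpdate() from another thread, as the Mesos driver would."""
    fake_update = SimpleNamespace(task_id=SimpleNamespace(value=task_id),
                                  state=state)
    worker = threading.Thread(target=scheduler.statusUpdate,
                              args=(None, fake_update))
    worker.start()
    worker.join()
```

The main thread can then block on scheduler.updates.get() and handle each (task_id, state) pair at its own pace, without locking inside the callback.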

Submit a task

```python
import logging


# MesosScheduler.resourceOffers, replacing the skeleton's version
def resourceOffers(self, driver, offers):
    for offer in offers:
        # Get available cpu and mem for this offer
        offerCpus = 0
        offerMem = 0
        for resource in offer.resources:
            if resource.name == "cpus":
                offerCpus += resource.scalar.value
            elif resource.name == "mem":
                offerMem += resource.scalar.value
            # We could check for other resources here
        logging.debug("Mesos:Received offer %s with cpus: %s and mem: %s"
                      % (offer.id.value, offerCpus, offerMem))

        # Check that the offer has enough resources: the sample task
        # requests 2 CPUs and 3000 MB of memory
        if offerCpus < 2 or offerMem < 3000:
            driver.declineOffer(offer.id)
            continue

        sample_task = create_a_sample_task(offer)
        array_of_task = [sample_task]
        driver.launchTasks(offer.id, array_of_task)
```

Mesos supports any custom resource definition on nodes (gpu, slots, disk, …), using scalar or range values.

When a task is launched, the requested resources are removed from the available resources of the selected node. Subsequent offers won't propose those resources again until the task is over (or killed).
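The bookkeeping described above can be pictured with a toy model: a node's scalar resources as a dict, and a launch that either fits and leaves the remainder, or does not fit. This is a hypothetical illustration of the idea, not Mesos code; remaining_after_launch and the "gpus" custom resource are assumptions.

```python
def remaining_after_launch(available, requested):
    """Return the scalar resources left on a node after launching a task,
    or None if the task does not fit.

    Both arguments map resource names ("cpus", "mem", custom ones such as
    "gpus") to scalar amounts. The input dict is not modified.
    """
    remaining = dict(available)
    for name, amount in requested.items():
        if remaining.get(name, 0) < amount:
            return None  # not enough of this resource: decline the offer instead
        remaining[name] = remaining.get(name, 0) - amount
    return remaining
```

For example, a node offering {"cpus": 4.0, "mem": 8192.0, "gpus": 1.0} that runs the sample task (2 CPUs, 3000 MB) would be offered {"cpus": 2.0, "mem": 5192.0, "gpus": 1.0} until that task ends.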

Define a task

```python
import uuid

from mesos.interface import mesos_pb2


def create_a_sample_task(offer):
    task = mesos_pb2.TaskInfo()

    # The container part (native or docker)
    container = mesos_pb2.ContainerInfo()
    container.type = mesos_pb2.ContainerInfo.DOCKER

    # Let's add a volume
    volume = container.volumes.add()
    volume.container_path = "/tmp/test"
    volume.host_path = "/tmp/incontainer"
    volume.mode = mesos_pb2.Volume.RW

    # The command to execute, if not using the image entrypoint
    command = mesos_pb2.CommandInfo()
    command.value = "echo hello world"
    task.command.MergeFrom(command)

    # Unique task identifier, chosen by the framework (here a random UUID)
    task.task_id.value = str(uuid.uuid4())
    # The slave where the task is executed
    task.slave_id.value = offer.slave_id.value
    task.name = "my_sample_task"

    # The resources/requirements.
    # Resources have names; cpus, mem, disk and ports are available
    # by default, and one can define custom ones per slave node
    # and get them by their name here
    cpus = task.resources.add()
    cpus.name = "cpus"
    cpus.type = mesos_pb2.Value.SCALAR
    cpus.scalar.value = 2

    mem = task.resources.add()
    mem.name = "mem"
    mem.type = mesos_pb2.Value.SCALAR
    mem.scalar.value = 3000  # in MB, about 3 GB
```

Define a task (next)

```python
    # Now the Docker part (continuing create_a_sample_task)
    docker = mesos_pb2.ContainerInfo.DockerInfo()
    docker.image = "debian:latest"
    docker.network = mesos_pb2.ContainerInfo.DockerInfo.BRIDGE
    docker.force_pull_image = True

    # Let's map some ports; ports are resources like cpu and mem.
    # We will map container port 80 to an available host port.
    # For simplicity, pick the first port of this offer; we skip the
    # checks here and suppose there is at least one port
    offer_port = None
    for resource in offer.resources:
        if resource.name == "ports" and resource.ranges.range:
            offer_port = resource.ranges.range[0].begin
            break

    # Map host port offer_port to port 80 in the container
    docker_port = docker.port_mappings.add()
    docker_port.host_port = offer_port
    docker_port.container_port = 80

    # Merge only once DockerInfo is complete: MergeFrom copies the message,
    # so port mappings added afterwards would be lost
    container.docker.MergeFrom(docker)

    # We tell Mesos that we reserve this port; Mesos removes it
    # from the next offers until the task completes
    mesos_ports = task.resources.add()
    mesos_ports.name = "ports"
    mesos_ports.type = mesos_pb2.Value.RANGES
    port_range = mesos_ports.ranges.range.add()
    port_range.begin = offer_port
    port_range.end = offer_port

    task.container.MergeFrom(container)
    return task
```
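The port selection above skips the checks. A minimal sketch of what those checks could look like, assuming the offered port ranges have first been converted to (begin, end) tuples; pick_port is a hypothetical helper, and the taken set matters when several tasks are launched from the same offer.

```python
def pick_port(ranges, taken=()):
    """Pick the first port in the offered inclusive (begin, end) ranges
    that is not already taken, or return None if none is left."""
    for begin, end in ranges:
        for port in range(begin, end + 1):
            if port not in taken:
                return port
    return None
```

With the protobuf offer, the input list could be built as [(r.begin, r.end) for res in offer.resources if res.name == "ports" for r in res.ranges.range]. A None result means the offer should be declined rather than used.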

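Task status

```python
import json
import logging

from mesos.interface import mesos_pb2


# MesosScheduler.statusUpdate, replacing the skeleton's version
def statusUpdate(self, driver, update):
    '''
    Receive status info from submitted tasks
    (switch to running, failure of node, etc.)
    '''
    logging.debug("Task %s is in state %s" %
                  (update.task_id.value, mesos_pb2.TaskState.Name(update.state)))

    if update.state == mesos_pb2.TASK_RUNNING:
        # Switched to RUNNING; with Docker, update.data may carry
        # container info as JSON (it can also be empty)
        if update.data:
            container_info = json.loads(update.data)

    if update.state in (mesos_pb2.TASK_FINISHED, mesos_pb2.TASK_FAILED,
                        mesos_pb2.TASK_KILLED, mesos_pb2.TASK_LOST,
                        mesos_pb2.TASK_ERROR):
        # Over or failure
        logging.error("Task %s is over or failed" % update.task_id.value)
```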
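The terminal states checked in this callback are TaskState values 2, 3, 4, 5 and 7. The table below lists values 0 to 7 of the TaskState enum from mesos.proto; newer Mesos releases add further states, which this sketch does not cover. TASK_STATES, TERMINAL_STATES and is_terminal are hypothetical names, useful when logging raw numbers or testing without the bindings.

```python
# Values 0-7 of the TaskState enum in mesos.proto (later releases add more).
TASK_STATES = {
    0: "TASK_STARTING",
    1: "TASK_RUNNING",
    2: "TASK_FINISHED",
    3: "TASK_FAILED",
    4: "TASK_KILLED",
    5: "TASK_LOST",
    6: "TASK_STAGING",
    7: "TASK_ERROR",
}

# States after which the task will not run again.
TERMINAL_STATES = {"TASK_FINISHED", "TASK_FAILED", "TASK_KILLED",
                   "TASK_LOST", "TASK_ERROR"}


def is_terminal(state):
    """True when a task in this state is over (success or failure)."""
    return TASK_STATES.get(state) in TERMINAL_STATES
```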

Want to kill a task?

```python
from mesos.interface import mesos_pb2


def resourceOffers(self, driver, offers):
    # ….
    task_id = mesos_pb2.TaskID()
    task_id.value = my_unique_task_id  # placeholder: the ID set at launch
    driver.killTask(task_id)
```
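Here killTask() is called from inside a callback, where driver is at hand. When the decision to kill is made elsewhere (a web handler, say), one hypothetical pattern is to collect the task IDs and send the kills from the next callback. KillRequests is an assumption for illustration, not part of the Mesos API; the caveat is that kills wait until the next callback fires.

```python
import threading


class KillRequests(object):
    """Hypothetical helper: collect task IDs to kill from any thread and
    send them later from a scheduler callback."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = set()

    def request(self, task_id):
        """Ask for a task to be killed; duplicate requests are merged."""
        with self._lock:
            self._pending.add(task_id)

    def flush(self, kill):
        """Call kill(task_id) for every pending ID; return how many were sent."""
        with self._lock:
            pending, self._pending = self._pending, set()
        for value in sorted(pending):
            kill(value)
        return len(pending)
```

Inside resourceOffers(), kill would wrap the driver, for example a function that builds a mesos_pb2.TaskID with the given value and passes it to driver.killTask(). Taking a plain callable keeps the helper testable without a running driver.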

A framework

- Quite easy to set up
- Many logs on the Mesos side for debugging
- Shares the same resources with other frameworks
- Different executors (docker, native, …)
- In a few lines of code