Hadoop on Dockers

Page 1: Hadoop on Dockers

Hadoop on Dockers

Page 2: Hadoop on Dockers

What is Docker?

• Docker is a tool designed to make it easier to create, deploy, and run applications by using containers.

• At a high level, Docker is a Linux utility that can efficiently create, ship, and run containers.

• Docker containers wrap a piece of software in a complete file system that contains everything needed to run: code, runtime, system tools, and system libraries.

• Docker enables you to quickly, reliably, and consistently deploy applications regardless of environment.
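As a minimal sketch of that workflow (assuming a machine with Docker installed; the image and command are only illustrative):

```shell
# Fetch an image from a registry (Docker Hub by default)
docker pull ubuntu:20.04

# Start a container from it; --rm cleans the container up on exit
docker run --rm ubuntu:20.04 echo "hello from a container"

# List currently running containers
docker ps
```

The same commands behave the same on any host running Docker, which is what makes deployments consistent across environments.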

Page 3: Hadoop on Dockers

What are containers?

• Linux containers are self-contained execution environments -- with their own, isolated CPU, memory, block I/O, and network resources

• Feels like a virtual machine, but sheds all the weight and startup overhead of a guest operating system

• Containers allow a developer to package up an application with all of the parts it needs, such as libraries and other dependencies, and ship it all out as one package.

Page 4: Hadoop on Dockers

Container vs Virtual Machines

• Containers and virtual machines have similar resource isolation and allocation benefits -- but a different architectural approach allows containers to be more portable and efficient.

VIRTUAL MACHINES

Virtual machines include the application, the necessary binaries and libraries, and an entire guest operating system -- all of which can amount to tens of GBs.

CONTAINERS

Containers include the application and all of its dependencies -- but share the kernel with other containers, running as isolated processes in user space on the host operating system. Docker containers are not tied to any specific infrastructure: they run on any computer, on any infrastructure, and in any cloud.

Page 5: Hadoop on Dockers

Containers:

Page 6: Hadoop on Dockers

Virtual Machines

Page 7: Hadoop on Dockers

Dockers

Page 8: Hadoop on Dockers

Advantages

• Containers running on a single machine share the same operating system kernel; they start instantly and use less RAM. Images are constructed from layered file systems and share common files, making disk usage and image downloads much more efficient.

• Docker containers are based on open standards, enabling containers to run on all major Linux distributions and on Microsoft Windows -- and on top of any infrastructure.

• Containers isolate applications from one another and the underlying infrastructure, while providing an added layer of protection for the application.

• Eliminate environment inconsistencies

• Distribute and share content

• Simply share your application with others without worrying about the environment

Page 9: Hadoop on Dockers

Advantages (cont.)

• Scale quickly

• Docker makes it easy to identify issues, isolate the problem container, quickly roll back to make the necessary changes, and then push the updated container into production.

• Docker lets you bundle an application image once and run it anywhere ("Build once, run anywhere").
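The roll-back workflow above can be sketched with image tags (container and image names here are hypothetical, not from the original deck):

```shell
# A problem shows up in the running app: stop and remove that container
docker stop myapp && docker rm myapp

# Roll back by starting a container from the previous image tag
docker run -d --name myapp myrepo/myapp:v1

# After fixing the issue, push the updated image for redeployment
docker push myrepo/myapp:v3
```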

Page 10: Hadoop on Dockers

Dockerfile and Image

Page 11: Hadoop on Dockers

Example:
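The example on this slide is an image in the original deck; a minimal Dockerfile along the same lines (the base image, package, and paths are illustrative assumptions) could look like:

```dockerfile
# Base image that the new image builds on
FROM ubuntu:20.04

# Install a runtime dependency (illustrative)
RUN apt-get update && apt-get install -y openjdk-8-jdk

# Copy the application into the image
COPY ./app /opt/app

# Command executed when a container starts
CMD ["/opt/app/start.sh"]
```

`docker build -t myapp:1.0 .` turns this Dockerfile into a tagged image; each instruction produces one layer of the layered file system mentioned on the Advantages slide.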

Page 12: Hadoop on Dockers

HADOOP ON DOCKERS

• Docker is the New Quick Start Option for Apache Hadoop and Cloudera

http://blog.cloudera.com/blog/2015/12/docker-is-the-new-quickstart-option-for-apache-hadoop-and-cloudera/
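The linked post describes a single-command QuickStart; the invocation it documents is along these lines (check the post for the exact, current flags):

```shell
# --privileged is needed by some services inside the image; the fixed
# hostname matches the configuration baked into the QuickStart image
docker run --hostname=quickstart.cloudera --privileged=true -t -i \
  cloudera/quickstart /usr/bin/docker-quickstart
```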

Page 13: Hadoop on Dockers

HADOOP ON DOCKERS

Page 14: Hadoop on Dockers

Some Challenges

• Which container manager to choose? Swarm, Kubernetes, AWS ECS, Mesos?

• How to handle storage configuration? Overlay filesystems, Flocker, Convoy?

• Which network configurations?

• Software compatibility? Which OS (Linux distribution, e.g., Ubuntu), which build of Hadoop, which application layer -- and how to make sure all of these work together.

• Maintenance: availability, multi-container setups, upgrades, patches, backups?

Page 15: Hadoop on Dockers

Page 16: Hadoop on Dockers

References:

https://www.youtube.com/watch?v=pGYAg7TMmp0
https://www.youtube.com/watch?v=biJTvobZm1A
https://www.youtube.com/watch?v=YFl2mCHdv24