DESCRIPTION
Imagine it's eight o'clock on a Thursday morning and you awake to see a bulldozer outside your window, ready to plow over your data center. Normally you might consult the Encyclopedia Galactica to discern the best course of action, but your copy is likely out of date. And while the Hitchhiker's Guide to the Galaxy (HHGTTG) is a wholly remarkable book, it doesn't cover the nuances of cloud computing. That's why you need the Hitchhiker's Guide to Cloud Computing (HHGTCC), or at least to attend this talk to understand the state of open source cloud computing. Specifically, this talk will cover infrastructure-as-a-service, platform-as-a-service and developments in big data, and how to take advantage of these technologies more effectively using open source software. Technologies covered in this talk include Apache CloudStack, Chef, CloudFoundry, NoSQL, OpenStack, Puppet and many more.
Hitchhiker’s Guide to the Open Cloud
Linux Foundation Collaboration Summit 2013
Mark R. Hinkle, Sr. Director, Open Source Solutions, Citrix Systems Inc.
@mrhinkle [email protected]
Mark Hinkle, Sr. Director, Open Source Solutions
• Dedicated to the success of the Apache CloudStack, Open Daylight & Xen Project communities on Citrix's behalf
• Runs BuildACloud.org learning activities all over the world
• Joined Citrix via the Cloud.com acquisition, July 2011
• Grew the Zenoss Core open source project to 100,000 users, 1.5 million downloads
• Former LinuxWorld Magazine Editor-in-Chief
• Open Management Consortium organizer
• Author - “Windows to Linux Business Desktop Migration” – Thomson
• NetDirector Project - Open Source Configuration Management
• Sometimes author and blogger at SocializedSoftware.com
• NetworkWorld Open Source Subnet
Hitchhiker’s Guide to the Open Cloud by @mrhinkle
Why Open Source and Cloud Computing?
• User-driven context from solving real problems
• Lower barrier to participation
• Larger user base, users helping users
• Aggressive release cycles stay current with the state of the art
• Open source innovating faster than commercial
• Open data, open standards, open APIs
Quick Cloud Computing Overview or the Obligatory “What is the Cloud?” Explanation
Infinite Improbability Drive
Five Characteristics of Cloud
1. On-Demand Self-Service
2. Broad Network Access
3. Resource Pooling
4. Rapid Elasticity
5. Measured Service
Cloud Computing Service Models
USER CLOUD a.k.a. SOFTWARE-AS-A-SERVICE
Single application, multi-tenancy, network-based, one-to-many delivery of applications; all users have the same access to features.
Examples: Salesforce.com, Google Docs, Red Hat Network/RHEL

DEVELOPMENT CLOUD a.k.a. PLATFORM-AS-A-SERVICE
Application developer model. The application is deployed to an elastic service that autoscales, with low administrative overhead. No concept of virtual machines or operating system. Code it and deploy it.
Examples: VMware CloudFoundry, Google AppEngine, Windows Azure, Rackspace Sites, Red Hat OpenShift, ActiveState Stackato, AppFog

SYSTEMS CLOUD a.k.a. INFRASTRUCTURE-AS-A-SERVICE
Servers and storage are made available in a scalable way over a network.
Examples: EC2, Rackspace CloudFiles, OpenStack, CloudStack, Eucalyptus, OpenNebula
Deployment Models: Public, Private & Hybrid
Building Open Source Clouds
Cloud Architecture
Hypervisors
Open Source
• Xen Project, Xen Cloud Platform (XCP)
• KVM – Kernel-based Virtual Machine
• VirtualBox* - Oracle-supported virtualization solutions
• OpenVZ* - Container-based, similar to Solaris Containers or BSD jails
• LXC – User space chrooted installs
Proprietary
• VMware
• Citrix XenServer (based
• Microsoft Hyper-V
• OracleVM (based on OS Xen)
Open Virtual Machine Formats
Open Virtualization Format (OVF) is an open standard for packaging and distributing virtual appliances or, more generally, software to be run in virtual machines.
Formats for hypervisors/cloud technologies:
• Amazon - AMI
• KVM – QCOW2
• VMware – VMDK
• Xen Project – IMG
• VHD – Virtual Hard Disk - Hyper-V
Sourcing Cloud Appliances
Tool/Project | What you can do with them
BitNami | BitNami provides free, ready-to-run environments for your favorite open source web applications and frameworks, including Drupal, Joomla!, WordPress, PHP, Rails, Django and many more.
BoxGrinder | BoxGrinder is a set of projects that help you grind out appliances for multiple virtualization and cloud providers.
Oz | Command-line tool that can create images for common Linux distributions to run on KVM.
SUSE Studio | SUSE Studio supports building and deploying directly to cloud services such as Amazon EC2.
Scale-Up or Scale-Out
Vertical Scaling (Scale-Up): Allocate additional resources to VMs; requires a reboot; no need for distributed app logic; single point of OS failure.
Horizontal Scaling (Scale-Out): Application needs logic to work in a distributed fashion (e.g. HAProxy and Apache, Hadoop).
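The "logic to work in a distributed fashion" that scale-out requires can be illustrated with a very small request router. This is only a sketch of round-robin distribution, the simplest policy a load balancer such as HAProxy can apply; the class name and worker names are hypothetical and nothing here reflects HAProxy's actual configuration or API.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand each incoming request to the next backend in a fixed rotation."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one backend is required")
        self._backends = list(backends)
        self._rotation = cycle(self._backends)

    def route(self, request):
        # Pick the next worker in the rotation; workers share no state,
        # which is what lets the pool grow by simply adding machines.
        backend = next(self._rotation)
        return backend, request


# Hypothetical worker names; scaling out means adding entries to this list.
balancer = RoundRobinBalancer(["web-1", "web-2", "web-3"])
assignments = [balancer.route(f"GET /page/{i}")[0] for i in range(6)]
print(assignments)
```

Scaling up, by contrast, would leave this list at a single entry and make that one machine larger.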
Compute Clouds (IaaS)
Project | Year Started | License | Virtualization Technologies
Apache CloudStack | 2008 | Apache | XenServer, Xen Cloud Platform, KVM, VMware (Hyper-V developing)
Eucalyptus | 2006 | GPL | Xen, KVM, VMware (commercial version)
OpenNebula | 2005 | Apache | Xen, KVM, VMware
OpenStack | 2010 (developed by NASA, by Anso Labs previously) | Apache | VMware ESX and ESXi, Xen, Xen Cloud Platform, KVM, LXC, QEMU and VirtualBox
Numerous companies are building cloud software on OpenStack including Nebula, Piston Inc., CloudScaling
OpenStack – Ecosystem of Projects
• Compute “Nova”
• Object Storage “Swift”
• Image Service “Glance”
• Dashboard “Horizon”
• Identity Services “Keystone” API
• Quantum networking fabric: REST API and plug-ins (Open vSwitch Quantum plug-ins)
• Advanced cloud and networking services accessing the Quantum API: Firewall Service, Gateway Service
• Enterprise message queue based on RabbitMQ (ESB)
• Hypervisors and storage: KVM, VMware, Xen Cloud Platform; Ceph, Gluster
20+ collective projects hosted at: https://launchpad.net/openstack
Cloud APIs
• jclouds
• libcloud
• deltacloud
• fog
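The libraries listed above all aim to put one programming interface in front of many cloud providers. The sketch below illustrates that idea only; `ComputeDriver`, `InMemoryDriver` and `deploy_web_tier` are hypothetical names invented for this example and are not the API of jclouds, libcloud, deltacloud or fog.

```python
from abc import ABC, abstractmethod


class ComputeDriver(ABC):
    """Hypothetical provider-neutral interface for compute resources."""

    @abstractmethod
    def create_node(self, name, size):
        ...

    @abstractmethod
    def list_nodes(self):
        ...


class InMemoryDriver(ComputeDriver):
    """Stand-in for a real provider driver; it only records nodes locally."""

    def __init__(self, provider):
        self.provider = provider
        self._nodes = []

    def create_node(self, name, size):
        node = {"provider": self.provider, "name": name, "size": size}
        self._nodes.append(node)
        return node

    def list_nodes(self):
        return list(self._nodes)


def deploy_web_tier(driver, count):
    # Application code depends only on the interface, not on the provider.
    return [driver.create_node(f"web-{i}", "small") for i in range(1, count + 1)]


# The same deployment function runs against two different "providers".
for driver in (InMemoryDriver("cloudstack"), InMemoryDriver("openstack")):
    deploy_web_tier(driver, 2)
    print(driver.provider, [n["name"] for n in driver.list_nodes()])
```

Swapping providers then means constructing a different driver, while `deploy_web_tier` stays unchanged.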
Cloud Computing Storage
Project | Description
Ceph | Distributed file storage system developed by DreamHost
GlusterFS | Scale-out NAS system aggregating storage over Ethernet or InfiniBand
OpenStack Storage | Long-term object storage system
Riak CS | Riak CS is open source software designed to provide simple, available, distributed cloud storage at any scale. Riak CS is S3-API compatible and supports per-tenant reporting for billing and metering use cases.
Sheepdog | Distributed storage for KVM hypervisors
Platform-as-a-Service (PaaS)
Project | Year Started | Sponsors | Languages/Frameworks
CloudFoundry | 2011 | VMware | Spring for Java, Ruby for Rails and Sinatra, node.js, Grails, Scala on Lift and more via partners (e.g. Python, PHP)
Cloudify | 2012 | GigaSpaces | [Groovy for deployment recipes]
OpenShift ** | 2011 | Red Hat | Java, Ruby, PHP, Perl and Python
Stackato* | 2012 | ActiveState | Java, Python, PHP, Ruby, Perl, Node.js, others
WSO2 Stratus | 2010 | WSO2 | JBoss, Java EE6
Software Defined Networking (SDN)
Overview of Software Defined Networking
• Application Layer: Business Applications
• (APIs between the application and control layers)
• Control Layer: SDN Control Software, Network Services
• Control Data Plane Interface (e.g. OpenFlow)
• Infrastructure Layer: Network Devices
Cloud Promise, Reality and Networks
Cloud Promise | Cloud Reality
Centralized Configuration and Automation | Without true virtualization, network devices must still be manually configured.
Instant Self-Service Provisioning | In a physical network, it could take a long time for a network engineer to provision new services.
Elasticity and Scalability | By horizontally scaling up the physical network, elasticity is lost.
Designed for Failure | Failover can be automated and physical network limitations can be alleviated.
Source: Midokura
OpenFlow
OpenFlow enables networks to evolve by giving a remote controller the power to modify the behavior of network devices through a well-defined "forwarding instruction set". The growing OpenFlow ecosystem now includes routers, switches, virtual switches, and access points from a range of vendors.
Image from http://www.openflow.org/documents/openflow-wp-latest.pdf
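To make the "forwarding instruction set" idea concrete, here is a heavily simplified model of a switch whose flow table is populated by a remote controller. It is a sketch under stated assumptions: the field names (`dst_ip`, `tcp_port`), the action strings and the table-miss behavior are illustrative choices, and the real OpenFlow protocol defines a much richer match structure and message format.

```python
from dataclasses import dataclass, field


@dataclass
class FlowEntry:
    priority: int
    match: dict   # header field -> required value; an absent field is a wildcard
    action: str   # illustrative action label, e.g. "output:2" or "drop"


@dataclass
class Switch:
    flow_table: list = field(default_factory=list)

    def install(self, entry):
        # In a real deployment the controller sends this over the network.
        self.flow_table.append(entry)
        self.flow_table.sort(key=lambda e: e.priority, reverse=True)

    def forward(self, packet):
        # The highest-priority entry whose match fields all agree wins.
        for entry in self.flow_table:
            if all(packet.get(k) == v for k, v in entry.match.items()):
                return entry.action
        # Assumed table-miss policy: ask the controller what to do.
        return "send_to_controller"


switch = Switch()
switch.install(FlowEntry(10, {"dst_ip": "10.0.0.5"}, "output:2"))
switch.install(FlowEntry(100, {"dst_ip": "10.0.0.5", "tcp_port": 22}, "drop"))
print(switch.forward({"dst_ip": "10.0.0.5", "tcp_port": 80}))
print(switch.forward({"dst_ip": "10.0.0.5", "tcp_port": 22}))
```

The switch itself holds no policy; changing network behavior means installing different entries from the controller.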
Software Defined Networking (SDN)
Project | Description
Floodlight | The Floodlight controller is an enterprise-class, Apache-licensed, Java-based OpenFlow controller.
Indigo | Indigo is an open source project to support OpenFlow on a range of physical switches. By leveraging hardware features of Ethernet switch ASICs, Indigo supports high rates for high port counts, up to 48 10-gigabit ports. Multiple gigabit platforms with 10-gigabit uplinks are also supported.
Open Daylight | Linux Foundation Collaborative Project based on Cisco One Controller and
OpenStack Networking “Quantum” | Pluggable, scalable, API-driven network and IP management
Open vSwitch | Open vSwitch is an open source (ASL 2.0), multilayer virtual switch designed to enable massive network automation through programmatic extension, while still supporting standard management interfaces and protocols (e.g. NetFlow, sFlow, SPAN, RSPAN, CLI, LACP, 802.1ag).
Big Data
Deep Thought
1 Billion Facebook Users - October 2012
[Chart: Facebook users in millions (axis 0 to 1,200), quarterly from Dec-04 to Sep-12]
Source: Benphoster.com
Twitter at 400M Tweets Per Day – June 2012
[Chart: tweets in millions (axis 0 to 450), every two months from Jan-07 to May-12]
Source: TheBigDataGroup.com
Big Data and Storage Infrastructure
Data is growing faster than storage capacity and computing power. Legacy systems hold organizations back; storage software must include multi-petabyte capacity, support potentially billions of objects, and provide application performance awareness and agile provisioning.
- Gartner, Big Data Challenges for the IT Infrastructure Team
Big Data Landscape
Source: BigDataGroup.com
Open Source NoSQL Databases
Name | Type | Description
Apache Cassandra | Wide Column Store/Families | API: many » Query Method: MapReduce, Replication: , Written in: Java, Concurrency: eventually consistent, Misc: like "Big-Table on Amazon Dynamo alike", initiated by Facebook
CouchDB | Document Store | API: Memcached API+protocol (binary and ASCII), most languages, Protocol: Memcached REST interface for cluster conf + management, Written in: C/C++ + Erlang (clustering), Replication: Peer to Peer, fully consistent, Misc: Transparent topology changes during operation, provides memcached-compatible caching buckets
HBase | Wide Column Store/Families | API: Java / any writer, Protocol: any write call, Query Method: MapReduce Java / any exec, Replication: HDFS Replication, Written in: Java
Hypertable | Wide Column Store/Families | API: Thrift (Java, PHP, Perl, Python, Ruby, etc.), Protocol: Thrift, Query Method: HQL, native Thrift API, Replication: HDFS Replication, Concurrency: MVCC, Consistency Model: Fully consistent, Misc: High performance C++ implementation of Google's Bigtable.
MongoDB | Document Store | API: BSON, Protocol: C, Query Method: dynamic object-based language & MapReduce, Replication: Master Slave & Auto-Sharding, Written in: C++, Concurrency
Redis | Key Value / Tuple Store | API: Tons of languages, Written in: C, Concurrency: in memory and saves asynchronously to disk after a defined time. Append-only mode available. Different kinds of fsync policies. Replication: Master / Slave, Misc: also lists, sets, sorted sets, hashes, queues.
Riak | Key Value / Tuple Store | API: JSON, Protocol: REST, Query Method: MapReduce term matching, Scaling: Multiple Masters; Written in: Erlang, Concurrency: eventually consistent (stronger than MVCC via Vector Clocks)
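The Riak row mentions vector clocks as the mechanism behind its eventual consistency. The sketch below shows the general technique only: each replica keeps a per-node counter, and two versions conflict when neither clock has seen everything the other has. The function names and node labels are hypothetical, and this is not Riak's implementation.

```python
def increment(clock, node):
    """Return a copy of the clock with this node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a, b):
    """True if clock a has seen every event recorded in clock b."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a, b):
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a is newer"
    if descends(b, a):
        return "b is newer"
    # Neither side has seen the other's writes: a genuine conflict that
    # the application (or a merge rule) has to resolve.
    return "concurrent"


base = increment({}, "node1")        # first write, accepted by node1
left = increment(base, "node1")      # follow-up write via node1
right = increment(base, "node2")     # independent write via node2

print(compare(left, base))
print(compare(left, right))
```

The first comparison reports that `left` supersedes `base`; the second reports a conflict, because `left` and `right` were written without knowledge of each other.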
MapReduce
[Diagram: Problem Data → Master Node → Worker Node 1, Worker Node 2, Worker Node 3 (Map) → Solution Data (Reduce)]
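The map and reduce steps in the diagram can be sketched with a word count, the customary introductory example. This runs in a single process and only mimics the data flow; the function names are hypothetical and nothing here uses Hadoop's API. Each input chunk stands for the share of the problem data that one worker node would receive.

```python
from collections import defaultdict


def map_phase(chunk):
    """Worker step: emit a (word, 1) pair for every word in one chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group all emitted values by key so each key is reduced once."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine every value emitted for one key into a single result."""
    return key, sum(values)


def map_reduce(chunks):
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


# Three chunks, as if split across three worker nodes.
chunks = ["the open cloud", "the cloud guide", "open source"]
counts = map_reduce(chunks)
print(counts)
```

Because `map_phase` looks at one chunk at a time and `reduce_phase` at one key at a time, both steps can be spread across machines without changing the logic.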
Apache Hadoop
Overview
• Handles large amounts of data
• Stores data in native format
• Delivers linear scalability at low cost
• Resilient in case of infrastructure failures
• Transparent application scalability
Facts
• Apache top-level open source project
• One framework for storage and compute
  – Scalable storage in the Hadoop Distributed File System (HDFS)
  – Compute via the MapReduce distributed processing platform
• Domain Specific Language (DSL) - Java
Hadoop Architecture
Hadoop Common
• HDFS: Distributes & replicates data across machines
• MapReduce: Distributes & monitors tasks; parallel programming; handles large data blocks
Data Warehouse
• Hive: Data warehouse that provides SQL interface. Ad hoc projection of data structure to unstructured
Non-Relational DB
• HBase: Column-oriented, schema-less distributed DB modeled after Google’s BigTable. Random real-time read/write.
Scripting
• Pig: Platform for manipulating and analyzing large data sets. Scripting language for analysts.
Machine Learning
• Mahout: Machine learning libraries for recommendations, clustering, classifications and item sets.
Management
• Chukwa
• ZooKeeper
Big Data Summary
• Quantity of machine-created data increasing drastically (examples: networked sensor data from mobile phones and GPS devices)
• Data manipulation moving from batched to real-time
• Cloud services giving everyone Big Data tools
• Consumer company speed and scale requirements driving efficiencies in Big Data storage and analytics
• New and broader number of data sources being meshed together
• Big Data apps mean using Big Data is faster and easier
Cloud Management Tools
Automation in the Cloud
Meat Cloud Cloud Operations
4 Types of Management Tools
• Provisioning: Installation of operating systems and other software
• Configuration Management: Sets the parameters for servers; can specify installation parameters
• Orchestration/Automation: Automates tasks across systems
• Monitoring: Records errors and health of IT infrastructure
Management Toolchains
Configuration
Patching and
Provisioning
Monitoring
Toolchain (n): A set of tools where the output of one tool becomes the input of another tool
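The definition above maps directly onto function composition: each tool takes the previous tool's output as its input. The sketch below uses four hypothetical stand-in functions (no real provisioning or monitoring tool is invoked) and placeholder values such as `vm-01` and `httpd` purely for illustration.

```python
from functools import reduce


def generate_image(spec):
    # Stand-in for an image builder: derive an image name from the spec.
    return {**spec, "image": f"{spec['os']}-base"}


def provision(state):
    # Stand-in for a provisioning tool: boot a machine from the image.
    return {**state, "host": "vm-01"}


def configure(state):
    # Stand-in for configuration management: record installed packages.
    return {**state, "packages": ["httpd"]}


def monitor(state):
    # Stand-in for monitoring: register a check against the new host.
    return {**state, "checks": [f"ping {state['host']}"]}


def run_toolchain(tools, initial):
    """Feed the output of each tool into the next one, in order."""
    return reduce(lambda state, tool: tool(state), tools, initial)


result = run_toolchain(
    [generate_image, provision, configure, monitor], {"os": "linux"}
)
print(result)
```

Note that `monitor` only works because `provision` ran first and supplied `host`; that ordering dependency is what makes the tools a chain rather than a collection.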
Provisioning
Project | Installation Targets
Apache Provisionr (incubating) | Can provision 10s to 1000s of machines on various clouds.
Cobbler | Distributed virtual infrastructure using koan (kickstart over a network) to PXE boot VMs for Red Hat, openSUSE, Fedora, Debian, Ubuntu VMs
Crowbar | (Bare metal provisioning)
Juju | Public clouds - Amazon Web Services, HP Cloud; private OpenStack clouds; bare metal via MAAS.
Salt Cloud | Tool to provision “salted” VMs that can then be updated by a central server via ZeroMQ
Configuration Management Tools
Project | Year Started | Language | License | Client/Server
Cfengine | 1993 | C | Apache | Yes
Chef | 2009 | Ruby | Apache | Chef Solo – No; Chef Server - Yes
Puppet | 2004 | Ruby | GPL | Yes & standalone
Salt | 2011 | Python | Apache | Yes
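Configuration management tools of this kind are commonly described as converging a machine toward a declared desired state, so that re-running them changes nothing once the machine already matches. The sketch below shows that idempotent convergence over a plain dictionary; the `converge` function and the setting names are hypothetical and do not correspond to the syntax of Cfengine, Chef, Puppet or Salt.

```python
def converge(current, desired):
    """Return (new_state, changes) needed to make current match desired."""
    new_state = dict(current)
    changes = []
    for key, wanted in desired.items():
        found = new_state.get(key)
        if found != wanted:
            # Only touch settings that differ from the declared state.
            changes.append((key, found, wanted))
            new_state[key] = wanted
    return new_state, changes


# Hypothetical server settings.
desired = {"ntp": "installed", "sshd_port": 22, "motd": "Don't Panic"}
server = {"sshd_port": 2222, "motd": "Don't Panic"}

server, first_run = converge(server, desired)
server, second_run = converge(server, desired)

print(first_run)
print(second_run)
```

The first run reports two changes; the second reports none, which is the property that makes it safe to apply the same configuration repeatedly.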
Automation/Orchestration Tools
Project | Description
Ansible | Ansible's SSH-key based access allows contributors to the Fedora Project to assist in automating infrastructure while having access limited appropriately.
Capistrano | Utility and framework for executing commands in parallel on multiple remote machines, via SSH. It uses a simple DSL that allows you to define tasks, which may be applied to machines in certain roles.
RunDeck | Rundeck is an open-source process automation and command orchestration tool with a web console.
Func | Func provides a two-way authenticated system for generically executing tasks; integrates with Puppet and Cobbler.
MCollective | The Marionette Collective, a.k.a. MCollective, is a framework to build server orchestration or parallel job execution systems.
Salt | Execute arbitrary shell commands or choose from dozens of pre-built modules of common (or complex) commands.
Scalr | Provides scaling across multiple cloud computing platforms; integrates with Chef.
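Several of the tools above share one pattern: define a task once, then run it in parallel on every machine holding a given role. The sketch below mimics that pattern with a thread pool. The inventory, host names and `restart_service` task are hypothetical, and the task only returns a string where a real tool would run a command over SSH or a message bus.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical inventory: role name -> hosts holding that role.
INVENTORY = {
    "web": ["web-1", "web-2"],
    "db": ["db-1"],
}


def restart_service(host):
    # Stand-in for a remote command; no network access takes place.
    return f"{host}: restarted"


def run_on_role(role, task, inventory=INVENTORY):
    """Run one task concurrently on every host that has the given role."""
    hosts = inventory.get(role, [])
    with ThreadPoolExecutor(max_workers=max(1, len(hosts))) as pool:
        # map() returns results in the same order as the host list.
        return list(pool.map(task, hosts))


print(run_on_role("web", restart_service))
print(run_on_role("db", restart_service))
```

Targeting roles instead of host names means that adding a machine to a role is an inventory change only; the task definitions stay the same.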
Conceptual Automated Toolchain
• Bootstrapped Image: CloudStack, OpenStack
• Configuration: Puppet, Chef
• Start/Stop Services: RunDeck, Capistrano, MCollective
• Provision: Cobbler, SUSE Studio
• Monitoring: Nagios, Zenoss, Cacti
• Generate Images: SUSE Studio, BoxGrinder
Netflix Open Source ToolBag for AWS
ASGARD ASTYANAX EDDA
EUREKA PRIAM SIMIAN ARMY
http://netflix.github.com
Goodbye and thanks for all the fish!
Questions? Slides Can be Viewed and Downloaded at: http://www.slideshare.net/socializedsoftware/
Copyright 2012-2013 Mark R. Hinkle. Available under the CC BY-SA license, some rights reserved.
Contact Me
Professional: [email protected] Personal: [email protected]
Phone: 919.228.8049
Personal: http://www.socializedsoftware.com
Twitter: @mrhinkle
Mark R. Hinkle Senior Director, Open Source Solutions Citrix Systems Inc. Open Source Enthusiast
Appendix
Additional Resources
• DevOps Toolchains Group
• Software Defined Networking: The New Norm for Networks (Whitepaper)
• DevOps Wikipedia Page
• NoSQL-Database.org – Ultimate Guide to the Non-Relational Universe
• Open Cloud Initiative
• NIST Cloud Computing Platform
• Open Virtualization Format Specs
• Clouderati Twitter Account
• Planet DevOps
• Nicira Whitepaper – It’s Time to Virtualize the Network
• Why Open vSwitch FAQ
Monitoring Tools
Project | License | Type of Monitoring | Collection Methods
Cacti / RRDTool | GPL | Performance | SNMP, syslog
Graphite | Apache 2.0 | Performance | Agent
Nagios | GPL | Availability | SNMP, TCP, ICMP, IPMI, syslog
Zabbix | GPL | Availability/Performance and more | SNMP, TCP/ICMP, IPMI, Synthetic Transactions
Zenoss | GPL | Availability, Performance, Event Management | SNMP, ICMP, SSH, syslog, WMI