How Adobe Has Built An OpenStack Cloud
Jun Park (Ph.D., MBA), Solutions Architect at Adobe
Arghya Banerjee, Sr. Systems Engineer at Adobe
OpenStack Utah Meetup, Sept 24
Swiss Cheese Model
Flaws in defense layers: if aligned, the flaws allow an accident to occur (figure from Wikipedia)
Two More Factors That Complicate Things
Space-Time Continuum (Einstein); Interactions: Higgs Field & Boson
Figures from Wikipedia and YouTube
Our Template
Time
Components
Dependencies
OpenStack Survey, May 2015
The most common architecture: Ubuntu + KVM + OVS + Ceph
Adobe OpenStack Architecture
Diagram: VM1, VM2, and VM3, each with two interfaces (eth0, eth1)
Private Networks: VXLAN-based
External Provider Networks: VLAN-based
Adobe Network Firewall
Adobe Corporate Networks
Storage: Ceph RBD
Adobe OpenStack Architecture
Diagram: VM1 (eth0, eth1) -> Linux Bridge -> Open vSwitch -> bond0 -> physical VLANs
External Provider Networks: VLAN-based
Adobe Network Firewall
Adobe Corporate Networks
Volume Management in OpenStack
Glance API Server: a set of images, e.g., Image1: Ubuntu Trusty
Cinder API Server:
1. Copy: Image1 -> Volume1: Ubuntu Trusty, a copy-on-write (COW) Ceph volume
2. Snapshot: Volume1 -> Snapshot1: Ubuntu Trusty, the base volume for all three VMs
3. Volumes: individual COW volumes from the base: New Volume1 for VM1, New Volume2 for VM2, New Volume3 for VM3
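The three steps above rely on copy-on-write: every VM volume shares the base's blocks and stores only the blocks that VM overwrites. A minimal in-memory sketch of that idea, assuming a simplified block model; `Snapshot` and `CowVolume` are hypothetical names, not Cinder or Ceph RBD APIs (real RBD clones share data at the object level):

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class Snapshot:
    """Read-only base, e.g. the Ubuntu Trusty snapshot (hypothetical model)."""
    blocks: Tuple[bytes, ...]


@dataclass
class CowVolume:
    """Copy-on-write clone: reads fall through to the parent snapshot
    until a block is overwritten; only overwritten blocks are stored."""
    parent: Snapshot
    written: Dict[int, bytes] = field(default_factory=dict)

    def read(self, index: int) -> bytes:
        # A private copy wins; otherwise share the parent's block.
        return self.written.get(index, self.parent.blocks[index])

    def write(self, index: int, data: bytes) -> None:
        if not 0 <= index < len(self.parent.blocks):
            raise IndexError(index)
        self.written[index] = data  # only the modified block is copied

    def private_blocks(self) -> int:
        return len(self.written)


# The snapshot stands in for steps 1-2 (copy the image, snapshot it);
# step 3 clones one COW volume per VM.
base = Snapshot(blocks=(b"boot", b"root", b"home"))
volumes = {f"VM{i}": CowVolume(base) for i in (1, 2, 3)}
volumes["VM1"].write(2, b"vm1-data")

print(volumes["VM1"].read(2))  # b'vm1-data'
print(volumes["VM2"].read(2))  # b'home' (still shared with the base)
print(sum(v.private_blocks() for v in volumes.values()))  # 1
```

Three VMs cost one base plus a single private block here, which is why creating many VMs from one snapshot is cheap in both time and space.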
Live Demo
Possible Combinations
Diagram: combinations of bare metal, VMs, and containers (e.g., containers in containers, containers in VMs)
Mesos Cluster Via Heat
Diagram: VM1 (Host1): Mesos master with Marathon and ZooKeeper; VM2 (Host2): Mesos slave1 running an HTTP server; VM3 (Host3): Mesos slave2 running an HTTP server
Master: Ubuntu-mesos image (available via diskimage-builder) -> post-configuration for the master -> start services
Slaves: Ubuntu-mesos image -> post-configuration for each slave using the Mesos master IP -> start services
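A Heat stack for this layout can pass the master's IP address into each slave's post-configuration. The sketch below builds such a HOT template as a Python dict and prints it as JSON, which Heat also accepts. The image, flavor, and network defaults, the service names, and the `/etc/mesos/zk` path (as used by Mesosphere's Ubuntu packages) are assumptions, not details from the talk; `get_param`, `get_attr`, `str_replace`, and the `first_address` attribute of `OS::Nova::Server` are standard HOT features of the Juno/Kilo era.

```python
import json

# Hypothetical post-configuration scripts. Service names and /etc/mesos/zk
# assume Mesosphere's Ubuntu packages are baked into the "ubuntu-mesos" image.
MASTER_SCRIPT = """#!/bin/bash
service zookeeper restart
service mesos-master restart
service marathon restart
"""

SLAVE_SCRIPT = """#!/bin/bash
echo "zk://MASTER_IP:2181/mesos" > /etc/mesos/zk
service mesos-slave restart
"""


def server(script, params=None):
    """An OS::Nova::Server resource that runs `script` at first boot."""
    if params is None:
        user_data = script
    else:
        # Heat substitutes each params key inside the template string.
        user_data = {"str_replace": {"template": script, "params": params}}
    return {
        "type": "OS::Nova::Server",
        "properties": {
            "image": {"get_param": "image"},
            "flavor": {"get_param": "flavor"},
            "networks": [{"network": {"get_param": "network"}}],
            "user_data_format": "RAW",
            "user_data": user_data,
        },
    }


def mesos_template(num_slaves=2):
    """HOT template: one Mesos master VM plus `num_slaves` slave VMs."""
    master_ip = {"get_attr": ["mesos_master", "first_address"]}
    resources = {"mesos_master": server(MASTER_SCRIPT)}
    for i in range(1, num_slaves + 1):
        # The get_attr reference makes Heat create the master before each slave.
        resources[f"mesos_slave{i}"] = server(SLAVE_SCRIPT, {"MASTER_IP": master_ip})
    return {
        "heat_template_version": "2014-10-16",
        "parameters": {
            "image": {"type": "string", "default": "ubuntu-mesos"},
            "flavor": {"type": "string", "default": "m1.medium"},
            "network": {"type": "string", "default": "private"},
        },
        "resources": resources,
    }


print(json.dumps(mesos_template(), indent=2))
```

Saved to a file, the output could be launched with the Heat CLI of that era, e.g. `heat stack-create -f mesos.json mesos`. Expressing the master IP as a `get_attr` dependency lets Heat order the boots instead of the scripts polling for the master.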
Mesos Cluster with Marathon
Diagram: a request to run a micro-service goes via the REST API to Marathon on the Mesos master (with ZooKeeper); Marathon runs it as an HTTP server on Mesos Slave1 and Mesos Slave2
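A request like the one in the diagram is a POST of an app definition to Marathon's `/v2/apps` endpoint. A minimal sketch that builds (but does not send) such a request with the standard library; the host name is hypothetical, and the field names follow Marathon's 0.x-era v2 API (`ports` was later superseded by `portDefinitions`), so they may differ in other releases:

```python
import json
import urllib.request

# Hypothetical Marathon address on the Mesos master VM; 8080 is Marathon's default port.
MARATHON_URL = "http://mesos-master.example:8080"


def run_microservice_request(app_id, instances=2):
    """Build (but do not send) a Marathon request that runs a simple HTTP server."""
    app = {
        "id": app_id,
        # Marathon passes the host port it assigned to the task as $PORT0.
        "cmd": "python3 -m http.server $PORT0",
        "cpus": 0.25,
        "mem": 64,
        "instances": instances,
        "ports": [0],  # 0 asks Marathon to choose a free port
        "constraints": [["hostname", "UNIQUE"]],  # at most one task per slave host
    }
    return urllib.request.Request(
        MARATHON_URL + "/v2/apps",
        data=json.dumps(app).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = run_microservice_request("/http-server")
print(req.get_method(), req.full_url)  # POST http://mesos-master.example:8080/v2/apps
# Sending it would be: urllib.request.urlopen(req)
```

With two slaves and the `UNIQUE` hostname constraint, the two instances land on different slaves, matching one HTTP server per slave.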
eBay’s CI Approach With Mesos
Diagram: Marathon runs on the Mesos master (with ZooKeeper); a Jenkins master is created on Mesos Slave1 via Marathon's REST API, and the Jenkins master then creates Jenkins slaves on Mesos Slave2 via API
Takeaways From Mesos Demo
• Flexible & powerful
• No external dependencies
• Towards maximizing efficiency and productivity
• Good hints for better services? Murano, Magnum, and so on…
Heat Templates In Magnum
Time
Components
Dependencies
What Happened At Networking?
Jun ‘13: This bug was introduced with OVS Mega Flow
Dec ‘13: OVS 2.0.1 released: Mega Flow, multiprocessing
Apr ‘14: Ubuntu 14.04 Trusty released with OVS 2.0.1
Jul ‘14: Bug reported against OVS 2.0.1 in Ubuntu 14.04
Aug ‘14: OVS 2.3.0, OVS 2.1.3, and OVS 2.0.2 released; Open vSwitch (OVS) bug fix in all OVS 2.x
May ‘15: Fix cherry-picked onto OVS 2.0.2 in Ubuntu 14.04.2
A new bug: OVS sporadically crashes when adding a port (https://bugs.launchpad.net/ubuntu/+source/openvswitch/+bug/1336555 and 1449012)
Neutron security group O(N^2) issue: enhancement patch not yet integrated (e.g., 270 secs to 3 secs for 25K rules)
Restarting agents re-establishes entire flows: fix ready, not added
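Where the O(N^2) comes from: with a rule that admits traffic from other members of the same security group, each of N ports needs one entry per other member, so the agent manages N(N-1) entries. The toy count below illustrates that simplifying assumption; it is not Neutron's code, and the set-based variant only shows the general idea of referencing one shared set of member addresses (Neutron's iptables driver later used ipset this way), not the specific patch on the slide.

```python
def pairwise_rules(n_ports):
    """One entry per (port, other member) pair: grows as N * (N - 1)."""
    return n_ports * (n_ports - 1)


def set_based_rules(n_ports):
    """One rule per port pointing at a shared member set, plus N set
    entries (one address per member): grows linearly as 2 * N."""
    rules = n_ports
    set_entries = n_ports
    return rules + set_entries


for n in (10, 100, 160):
    print(n, pairwise_rules(n), set_based_rules(n))
# 10 90 20
# 100 9900 200
# 160 25440 320
```

With only 160 ports in one group, the pairwise count already reaches 25,440 entries, the same order of magnitude as the 25K rules on the slide, and every membership change touches entries on every port.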
What Happened At Networking?
Dec ‘13: OVS 2.0.1 released: Mega Flow, multiprocessing
Apr ‘14: Ubuntu 14.04 Trusty released with OVS 2.0.1
May ‘14: OpenStack Summit Atlanta (Icehouse)
Nov ‘14: OpenStack Summit Paris (Juno)
May ‘15: OpenStack Summit Vancouver (Kilo); fix cherry-picked onto OVS 2.0.2 in Ubuntu 14.04
A new bug: OVS sporadically crashes when adding a port (https://bugs.launchpad.net/ubuntu/+source/openvswitch/+bug/1336555 and 1449012)
• Some companies reverted from OVS to Linux Bridge!
• Some pundits spread FUD about Neutron!
What Happened At Storage?
Apr ‘14: Ubuntu 14.04 Trusty released with Ceph Firefly 0.79
May ‘14: Ceph failover instability with Firefly (Hammer?)
July ‘15: Ubuntu 14.04 updates with Ceph Firefly 0.80.10
Ceph operational instability, Cinder scalability issue
Cinder is stuck when Ceph is stuck (e.g., use a local drive for copying an image)
Enhancement solution not yet integrated (e.g., APIs stacked up -> multiprocessing)
What Happened At Data Node?
Timeline: Dec ‘13, Apr ‘14, May ‘14, Nov ‘14, May ‘15, July ‘15
Apr ‘14: Ubuntu 14.04 Trusty released with Kernel…
Ubuntu 14.04 kernel issues:
• Kernel XFS deadlock bug; bug fix
• Kernel memory bug, security issue
• Security patch, KVM security issue
Our Workarounds
Networks
• Understand OVS and find a stable OVS
• Cherry-pick for Neutron scalability: firewall rules
• Our own out-of-band rate limiting on networks, e.g., 200 Mbps
• Set up the right MTU size on the OVS structure
• Turn off GRO/LRO on hosts
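Why the MTU matters: VXLAN private networks wrap each guest frame in extra headers, so a guest that keeps the default 1500-byte MTU on a 1500-byte underlay produces packets that no longer fit. The arithmetic below assumes VXLAN over IPv4 with no VLAN tag on the inner frame; it illustrates the calculation and does not reflect Adobe's actual settings.

```python
# Bytes that VXLAN over IPv4 adds inside the physical (underlay) MTU.
OUTER_IPV4 = 20
OUTER_UDP = 8
VXLAN_HEADER = 8
INNER_ETHERNET = 14  # the guest's own Ethernet header travels as payload


def vxlan_guest_mtu(physical_mtu):
    """Largest guest (VM) MTU whose encapsulated packet still fits the underlay."""
    overhead = OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER + INNER_ETHERNET  # 50 bytes
    return physical_mtu - overhead


print(vxlan_guest_mtu(1500))  # 1450
print(vxlan_guest_mtu(9000))  # 8950 with jumbo frames on the underlay
```

The two options are therefore to lower the guest MTU to 1450 or to raise the underlay MTU so guests can keep 1500.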
Storage
• Cinder scalability
• Ceph stability: Hammer, reconfigure towards optimal
How To Test at Scale
• Emulate the future production environment
  - Create hundreds of VMs, inject workloads, and destroy them all
  - Recycle this entire test over and over again
  - Findings: dead tokens stacked up
• Test each component's scalability
  - Neutron: OVS
  - Cinder: Ceph
  - Nova: KVM
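The recycle loop can be automated so that leaks show up as a trend across cycles rather than as a one-off number. A minimal sketch, assuming a hypothetical `Cloud` interface (`create_vm`, `run_workload`, `delete_vm`, and `count_tokens` are illustrative names, not OpenStack SDK calls) and a toy in-memory backend that, like the stacked-up dead tokens, issues a token per call and never purges them:

```python
from typing import List, Protocol


class Cloud(Protocol):
    """Hypothetical client interface; these are not real OpenStack SDK calls."""

    def create_vm(self, name: str) -> str: ...
    def run_workload(self, vm_id: str) -> None: ...
    def delete_vm(self, vm_id: str) -> None: ...
    def count_tokens(self) -> int: ...


def churn(cloud: Cloud, vms_per_cycle: int, cycles: int) -> List[int]:
    """Create, load, and destroy VMs repeatedly; record tokens left after each cycle."""
    token_counts = []
    for cycle in range(cycles):
        ids = [cloud.create_vm(f"scale-{cycle}-{i}") for i in range(vms_per_cycle)]
        for vm_id in ids:
            cloud.run_workload(vm_id)
        for vm_id in ids:
            cloud.delete_vm(vm_id)
        token_counts.append(cloud.count_tokens())
    return token_counts


def keeps_growing(samples: List[int]) -> bool:
    """Flag a resource that grows every cycle even though all VMs were deleted."""
    return len(samples) > 1 and all(b > a for a, b in zip(samples, samples[1:]))


class LeakyFakeCloud:
    """Toy backend: one token per API call, expired tokens never purged."""

    def __init__(self) -> None:
        self.vms = set()
        self.tokens = 0
        self.next_id = 0

    def _auth(self) -> None:
        self.tokens += 1

    def create_vm(self, name: str) -> str:
        self._auth()
        self.next_id += 1
        vm_id = f"vm-{self.next_id}"
        self.vms.add(vm_id)
        return vm_id

    def run_workload(self, vm_id: str) -> None:
        self._auth()

    def delete_vm(self, vm_id: str) -> None:
        self._auth()
        self.vms.discard(vm_id)

    def count_tokens(self) -> int:
        return self.tokens


cloud = LeakyFakeCloud()
samples = churn(cloud, vms_per_cycle=100, cycles=5)
print(samples)                 # [300, 600, 900, 1200, 1500]
print(keeps_growing(samples))  # True
print(len(cloud.vms))          # 0: VMs are cleaned up, tokens are not
```

In Keystone deployments of that era that stored UUID tokens in SQL, expired tokens stayed in the database until `keystone-manage token_flush` was run, which is one common cause of this pattern.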
Have We Done Enough?
4?
3?
It's not that I'm so smart, it's just that I stay with problems longer.
- Albert Einstein
New Efforts In OpenStack
• OpenStack Product Working Group: links up contributors and users
• Governance/DefCore Committee: defining OpenStack core
• Large Deployment Team: operational issues for large deployments
• Open Virtual Network (OVN): in-kernel conntrack, DPDK, etc.; will run atop OVS
Milestones
• Murano: application catalog service: Cloud Foundry, Kubernetes, Jenkins, Tomcat, etc.
• Magnum: Docker Swarm, Kubernetes, and Mesos (for our live demo)
• Advanced Networking: DVR, load balancer, IPv6