CloudStack Best Practices In PPTV DeanWei

CloudStack Best Practice in PPTV


DESCRIPTION

PPTV uses CloudStack 3.0.2 in its production environment. There are currently more than 150 hosts, and apps are migrated to the cloud every day (10 hosts per day). By the end of 2013, there will be more than 1000 hosts in the CloudStack environment.


Page 1: CloudStack Best Practice in PPTV

CloudStack Best Practices In PPTV, Dean Wei

Page 2: CloudStack Best Practice in PPTV

About Me

OPS Architect at PPTV

• 3 years of experience in software development and design

• 6 years of experience as a technical consultant (infrastructure architecture design, integration, solutions, capacity planning, and performance tuning) for top insurance companies (AIG, ASR, ACE, Fortis, SNS REAAL, Chubb, GEL, SBI)

• 1 year of experience in ASP (Application Service Provider) platform architecture design, security, performance analysis and optimization, and operations

• Current focus: automated operations architecture, cloud platform building, large-scale distributed system operations, performance analysis and optimization, continuous delivery, and system performance tuning

SINA WEIBO (DeanWei) : http://weibo.com/deanw

Page 3: CloudStack Best Practice in PPTV

Agenda

Why Cloud?

What is CloudStack?

How to Build?

Page 4: CloudStack Best Practice in PPTV

Overview

Why Use Cloud?

Why CloudStack?

What is CloudStack?

How to Build a Cloud-Based Infrastructure Platform?

CloudStack Best Practices In PPTV

Deployment Architecture

Network Considerations And Design

Storage Considerations And Design

Services Offering Considerations And Design

Troubleshooting Best Practices

Performance Tuning

Page 5: CloudStack Best Practice in PPTV

Background And Challenge

Page 6: CloudStack Best Practice in PPTV

The Original Infrastructure Provisioning Processes

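App Ops requests resources

IDC looks up the CMDB

IDC initializes the OS; IDC installs the VM software

IDC creates the VM

Monitoring team updates Zabbix monitoring

App Ops updates the CMDB

App Ops installs the application

App Ops installs the middleware

App Ops initializes the VM

Tools team adjusts the release configuration

Change control approval; migrate to the environment; re-cable and migrate to the production environment; application goes live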

Page 7: CloudStack Best Practice in PPTV

Problems

A. Ties up a large number of people

B. A large number of manual steps

C. Built one server at a time

D. Non-Self Service

E. Not usable out of the box

F. Non-elastic

G. Path dependence

H. Long time for building

I. Many points of failure

Page 8: CloudStack Best Practice in PPTV

Five Characteristics of Clouds

A. On-Demand Self-Service

B. Scalable

C. Resource Pooling

D. Rapid Elasticity

E. Measured Service

Cloud technology can solve our current problems!

Page 9: CloudStack Best Practice in PPTV

Cloud-based Infrastructure Provisioning Processes

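App Ops requests an application environment

Ops accesses the Services UI

Ops selects the application's most recent snapshot template

Resources are automatically allocated and registered

Select available resources

(Verify resource allocation) (Select the application template and resource size)

Press "Launch"

(Resource allocation, automatic VM creation, monitoring registration, etc.)

(Available resources and when to use them)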

ERP CRM app

APP

App1

APP2

o Out of the box

o Parallel building

o Self Service

o One-button for All

o Elastic

Provisioned when needed

Page 10: CloudStack Best Practice in PPTV

Cloud Still Requires Architectural Design

Cloud computing isn't a magical solution; apps need to be able to scale out

Design your architecture with the end in mind

Make your infrastructure easily replicable

Page 11: CloudStack Best Practice in PPTV

Popular Cloud Software Platform

Page 12: CloudStack Best Practice in PPTV

Why CloudStack?

Open Source: Apache 2.0

CloudStack users (it is proven and has a good track record)

It is very easy to install and get up and running

Fewer man-hours for implementation

Easy to integrate and customize

Matches our requirements at this stage

Page 13: CloudStack Best Practice in PPTV

What is CloudStack?

Open source Infrastructure as a Service (IaaS) solution.

Programmable Data Center orchestrator

Hypervisor agnostic

Supports scalable storage (Ceph, Swift, NFS)

Supports complex enterprise networking (e.g. firewall, load balancer, VPN, VPC…)

Multi-tenant

Page 14: CloudStack Best Practice in PPTV

Core Components

Hosts: servers onto which services will be provisioned

Primary Storage: VM disk storage

Cluster: a grouping of hosts and their associated storage

Pod: collection of clusters in the same failure boundary

Network: logical network associated with service offerings

Secondary Storage: template, snapshot and ISO storage

Zone: collection of pods, network offerings and secondary storage

Management Server Farm: management and provisioning tasks

[Diagram: a Zone containing CloudStack Pods; each Pod contains Clusters of Hosts running VMs, with a Network and Primary Storage; Secondary Storage belongs to the Zone]
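The containment hierarchy described on this slide can be sketched as plain data classes. This is only an illustrative model, not CloudStack's own data model; the class names, fields, and sample values below are assumptions chosen to mirror the definitions above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Host:
    name: str
    vms: List[str] = field(default_factory=list)


@dataclass
class Cluster:
    name: str
    primary_storage: str  # VM disk storage for the hosts in this cluster
    hosts: List[Host] = field(default_factory=list)


@dataclass
class Pod:
    name: str
    clusters: List[Cluster] = field(default_factory=list)


@dataclass
class Zone:
    name: str
    secondary_storage: str  # templates, snapshots and ISOs
    pods: List[Pod] = field(default_factory=list)

    def hosts(self) -> List[Host]:
        """Flatten the zone -> pod -> cluster -> host hierarchy."""
        return [h for p in self.pods for c in p.clusters for h in c.hosts]


# Hypothetical sample layout, not PPTV's real environment.
zone = Zone(
    name="zone1",
    secondary_storage="nfs://secondary.example/export",
    pods=[
        Pod("pod1", [
            Cluster("cluster1", "local", [
                Host("host1", ["vm1", "vm2"]),
                Host("host2"),
            ]),
        ]),
    ],
)
host_names = [h.name for h in zone.hosts()]
vm_count = sum(len(h.vms) for h in zone.hosts())
```

Walking the tree from the zone downward is how capacity questions ("how many hosts or VMs are in this zone?") map onto the hierarchy.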

Page 15: CloudStack Best Practice in PPTV

Two Types of Storage

[Diagram: Pod 1 contains Cluster 1 (Host 1, Host 2) and Primary Storage behind an L2 switch; Secondary Storage is reached through an L3 switch]

Primary Storage

• Stores disk volumes for VMs in a cluster

• Configured at cluster level

• Close to hosts for better performance

• Each cluster has at least one primary storage

• Requires high IOPS (can be expensive)

Secondary Storage

• Stores all templates, ISOs and snapshots

• Configured at zone level

• A zone can have one or more secondary storages

• High-capacity, low-cost commodity storage

Page 16: CloudStack Best Practice in PPTV

Deployment Architecture

[Diagram: the Internet and the Management Server Cluster connect through an L3 switch to Zone 1; Zone 1 holds Secondary Storage and Pod 1 … Pod N; each Pod has an L2 switch and Cluster 1 … Cluster N; each Cluster holds Host 1, Host 2 and Primary Storage]

Hypervisor is the basic unit of scale.

A cluster consists of one or more hosts running the same hypervisor.

All hosts in a cluster have access to shared (primary) storage.

A pod is one or more clusters, usually with L2 switches.

An availability zone has one or more pods and has access to secondary storage.

One or more zones represent the cloud.

Page 17: CloudStack Best Practice in PPTV

Software Architecture

[Diagram: Management Server software architecture. Labels recovered from the figure:]

Clients: UI, Cloud Portal, CLI, Other Clients

REST API: OAM&P API, End User API, EC2 API, Pluggable Service API Engine, Other APIs

ACL & Authentication: Accounts, Domains, and Projects; ACL, limits checking

Security Adapters; Account Management Connectors

Orchestration Engine: drives long-running VM operations; syncs between managed resources and DB; generates events

Management Server: Resource Management, Cluster Management, Job Management, Deployment Planning, Network Gurus, Network Elements, Hypervisor Gurus, Database Access, Alert & Event Management, DB

Plugin API

Resource API: Hypervisor Resources, Network Resources, Storage Resources, Image Resources, Snapshot Resources

Services API: Console Proxy Management, Template Access, HA, Usage Calculations, Additional Services

Event Bus, Message Bus, Usage Server

Page 18: CloudStack Best Practice in PPTV

Data And Control Flow

[Diagram: a Management Server controls the cloud across Data Center 1, Data Center 2 and Data Center 3; each data center has an SSVM, a CPVM and a VR; SSVMs transfer templates, ISOs and snapshots; VRs connect to the Internet]

Management Servers control all resources, both virtual and physical

SSVMs (Secondary Storage VMs) are deployed to transfer data between zones

CPVMs (Console Proxy VMs) are deployed to transfer VNC console traffic

VRs (Virtual Routers) are deployed for traffic to the public Internet

The Management Server is never in the data path

Page 19: CloudStack Best Practice in PPTV

How to build A Cloud-based infrastructure Platform?

An infrastructure management platform consists of:

Provisioning

Configuration Management

Services Orchestration

Monitoring And Alert

How to build ?

Architecture

A programmable infrastructure architecture

Open Source ToolChains

Page 20: CloudStack Best Practice in PPTV

An Infrastructure Management Platform Consists Of

Provisioning

Installation of operating systems and other software

Configuration Management

Sets the parameters for servers, can specify initialized parameters

Services Orchestration

Automate tasks across systems

Monitoring And Alert

Records errors and health of infrastructure

Alert Services

Page 21: CloudStack Best Practice in PPTV

A Programmable Infrastructure Architecture

Page 22: CloudStack Best Practice in PPTV

Open Source Provisioning Tools

Tool | Year Started | License | Installation Targets
Kickstart | ? | GPL | Most .deb- and RPM-based Linux distros
Cobbler (plus koan for PXE boot of VMs) | 2007 | GPL | Red Hat, openSUSE, Fedora, Debian, Ubuntu
Spacewalk | 2008 | GPL | Fedora, CentOS
Crowbar | 2011 | Apache | (Bare-metal provisioning)

Page 23: CloudStack Best Practice in PPTV

Open Source Configuration Management Tools

Tool | Year Started | Language | License | Client/Server
Cfengine | 1993 | C | Apache | Yes
Chef | 2009 | Ruby | Apache | Chef Solo: No; Chef Server: Yes
Puppet | 2004 | Ruby | GPL | Yes
Salt | 2011 | Python | Apache | Yes

Page 24: CloudStack Best Practice in PPTV

Open Source Monitoring Tools

Tool | License | Type of Monitoring | Collection Methods
Cacti / RRDTool | GPL | Performance | SNMP, syslog
Nagios | GPL | Availability | SNMP, TCP, ICMP, IPMI, syslog
Zabbix | GPL | Availability/Performance and more | SNMP, TCP/ICMP, IPMI, Synthetic Transactions
Zenoss | GPL | Availability, Performance, Event Management | SNMP, ICMP, SSH, syslog, WMI

Page 25: CloudStack Best Practice in PPTV

Open Source Automation/Orchestration Tools

Tool | Year Started | Language | License | Client/Server | Support Organization
Capistrano | 2006 | Ruby | MIT | Yes | None
ControlTier/RunDeck | 2010 | Java | Apache | Yes | DTO Solutions
Func | 2007 | Python | GPL | Yes | Fedora Project
MCollective | 2009 | Ruby | Apache | Yes | PuppetLabs
Salt | 2011 | Python | Apache | Yes | SaltStack Inc. ?

Page 26: CloudStack Best Practice in PPTV

Provisioning Activity Flow And Open Source Tools

Provisioning Activity | Tasks | Tools
Bootstrapping | VM Image Launch, OS Install | Cobbler, CloudStack
Configuration | System Configuration | Puppet, Zabbix
Command and Control | Application Services Orchestration And Management | ControlTier, Services Portal

Page 27: CloudStack Best Practice in PPTV

Automated Tools Chain in PPTV

Bootstrapped Image: Cobbler/CloudStack

Configuration: Puppet

Services Orchestration: ControlTier/Zabbix agent

Provision: Cobbler/CloudStack/Koan

Monitoring: Zabbix, Cacti

Generate Images: BoxGrinder

CMDB: CMDBuild/RackTables

Page 28: CloudStack Best Practice in PPTV

Cloudstack In PPTV

CS Version : 3.0.2

Hypervisor : KVM

Host OS : CentOS 6.2

KVM Guest OS : CentOS 5.8

Multiple management servers are deployed in the multi-line/BGP IDC

Deployed to all core IDCs and used for the non-VOD business

More than 150 hosts

Primary storage : local Storage

Secondary Storage : Local NFS Server and GlusterFS

Network : Basic Network

Monitoring : Zabbix

System configuration management : Puppet

Services Orchestration management : ControlTier/Services Portal

Patches for the performance, integration and stability

Workaround for some issues

Page 29: CloudStack Best Practice in PPTV

Deployment Architecture

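[Diagram: management farm and zones by IDC]

BGP IDC: BGP/Multi-line Management Farm (Management Server), BGP Zone

Guangzhou Telecom IDC: GZTB Zone

Shanghai Telecom IDC: SHTB Zone

Beijing Netcom IDC: BJCB Zone

Chengdu Telecom IDC: CDTB Zone

Shenyang Telecom IDC: SYCB Zone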

Page 30: CloudStack Best Practice in PPTV

Management Server Deployment Architecture

[Diagram: User API and Admin API requests go through a Load Balancer to Management Server1 and Management Server2, backed by MySQL with replication to a slave; the management servers manage Infrastructure Resources in Zone1, Zone2 and Zone3]
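Requests sent to the User API or Admin API through the load balancer are signed with the caller's API key and secret key. The following is a minimal sketch of that signing scheme as described in the CloudStack developer documentation (sort parameters, URL-encode, lowercase, HMAC-SHA1, base64). The endpoint host, API key, and secret key below are hypothetical placeholders, and the helper names `sign_request` and `build_url` are introduced only for illustration.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def sign_request(params: dict, secret_key: str) -> str:
    """Return the base64-encoded HMAC-SHA1 signature for the given parameters."""
    # 1. Sort parameters by (lowercased) name and URL-encode each value.
    pairs = sorted(params.items(), key=lambda kv: kv[0].lower())
    encoded = "&".join(f"{k}={quote(str(v), safe='')}" for k, v in pairs)
    # 2. Lowercase the whole string before signing.
    to_sign = encoded.lower()
    # 3. HMAC-SHA1 with the secret key, then base64-encode the digest.
    digest = hmac.new(secret_key.encode(), to_sign.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


def build_url(endpoint: str, params: dict, secret_key: str) -> str:
    """Build a signed request URL; the signature itself is URL-encoded."""
    pairs = sorted(params.items(), key=lambda kv: kv[0].lower())
    query = "&".join(f"{k}={quote(str(v), safe='')}" for k, v in pairs)
    signature = quote(sign_request(params, secret_key), safe="")
    return f"{endpoint}?{query}&signature={signature}"


# Hypothetical endpoint and credentials, for illustration only.
params = {"command": "listZones", "apikey": "HYPOTHETICAL-API-KEY", "response": "json"}
url = build_url("http://mgmt.example:8080/client/api", params, "hypothetical-secret")
```

Because every management server shares the same database, a signed request can be handled by whichever server the load balancer selects.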

Page 31: CloudStack Best Practice in PPTV

Network Considerations And Design

Using Basic Network

Custom network offering for the basic network (DHCP only)

Disable iptables for performance (requires modifying the source code)

Disable security groups

Multi-zone design for primary storage performance

Page 32: CloudStack Best Practice in PPTV

Storage Considerations And Design

Primary Storage

• Use local storage

• A cluster maps to a host

• A local disk serves only one VM instance

• Back up VM instances as templates on a schedule

• Use the shared storage type

• Separate application data and log data into Root Volume and Data Volume

Secondary Storage

• Local NFS server: back up data using inotify and rsync; network card bonding; uplink to 10G; manual failover

• GlusterFS over NFS

[Diagram: Pod 1 contains Cluster 1 with Host 1 and its Primary Storage behind an L2 switch; Secondary Storage is reached through an L3 switch]
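The inotify-plus-rsync backup of the NFS secondary storage can be approximated as "detect a change, then mirror". The sketch below is not the PPTV script: it swaps inotify for a simple modification-time comparison so that it runs with only the standard library, and it only builds the rsync command rather than executing it. The destination `backup.example:/export/secondary/` and the file name are hypothetical.

```python
import os
import tempfile
from typing import Dict, List


def snapshot(root: str) -> Dict[str, float]:
    """Map every file under root to its modification time."""
    state = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            state[path] = os.stat(path).st_mtime
    return state


def rsync_command(src: str, dest: str) -> List[str]:
    """Build an rsync invocation that mirrors the contents of src into dest."""
    # -a preserves permissions and timestamps; --delete removes files
    # from dest that no longer exist in src.
    return ["rsync", "-a", "--delete", src.rstrip("/") + "/", dest]


# Demo on a throwaway directory standing in for the NFS export.
with tempfile.TemporaryDirectory() as root:
    before = snapshot(root)
    with open(os.path.join(root, "template.qcow2"), "w") as fh:
        fh.write("demo")
    after = snapshot(root)
    needs_sync = before != after
    command = rsync_command(root, "backup.example:/export/secondary/")
```

With real inotify events the comparison step disappears, and the rsync command runs whenever the kernel reports a write or delete under the export.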

Page 33: CloudStack Best Practice in PPTV

Services Offering Considerations And Design

Disable HA

A disk offering is bound to a specified disk

A compute offering is bound to a specified host and disk

Page 34: CloudStack Best Practice in PPTV

Provisioning Processes Best Practices

A. Install the host OS with Cobbler

B. Install the CS agent and apply system settings with Puppet

C. Install and configure monitoring with Puppet

D. The services orchestration system triggers scripts to register the host with CS

E. The services orchestration system triggers a script to generate disk offerings and compute offerings for the host

F. The services orchestration system registers the host in the CMDB

G. The host goes live
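Step E can be sketched as a function that produces the API parameters for one disk offering and one compute offering per local disk. This is an assumption about how such a script might look, not the PPTV implementation: it assumes offerings are pinned through CloudStack host tags and storage tags, and the tag naming scheme, host name, and sizes are hypothetical. The parameter names follow the `createDiskOffering` and `createServiceOffering` API commands; check them against the API reference for the CloudStack version in use.

```python
from typing import Dict, List


def offerings_for_host(host: str, disks: List[str], cpu: int, mhz: int,
                       mem_mb: int, disk_gb: int) -> List[Dict[str, str]]:
    """Return API parameter sets binding offerings to one host and its disks."""
    calls = []
    for disk in disks:
        tag = f"{host}-{disk}"  # hypothetical storage tag, one per local disk
        calls.append({
            "command": "createDiskOffering",
            "name": f"disk-{tag}",
            "displaytext": f"{disk_gb} GB data disk on {tag}",
            "disksize": str(disk_gb),
            "tags": tag,              # storage tag: selects the disk
        })
        calls.append({
            "command": "createServiceOffering",
            "name": f"compute-{tag}",
            "displaytext": f"{cpu} x {mhz} MHz, {mem_mb} MB on {tag}",
            "cpunumber": str(cpu),
            "cpuspeed": str(mhz),
            "memory": str(mem_mb),
            "storagetype": "local",   # primary storage is local disk
            "offerha": "false",       # HA is disabled in this design
            "hosttags": host,         # host tag: selects the host
            "tags": tag,              # storage tag: selects the disk
        })
    return calls


# Hypothetical host with two local disks.
calls = offerings_for_host("kvm-host-01", ["sdb", "sdc"],
                           cpu=4, mhz=2000, mem_mb=8192, disk_gb=200)
```

Each parameter set would then be signed and sent to the management server, and the matching tags would have to be set on the host and its storage pools when the host is registered in step D.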

Page 35: CloudStack Best Practice in PPTV

Troubleshooting Best Practices

Analyse Log files

Management Log : /var/log/cloud/management/

Agent Log : /var/log/cloud/agent/

Adjust log4j level for debugging

Source Code

Data Models
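A first pass over the management or agent log usually means counting entries per level and finding which loggers emit the warnings and errors. The sketch below assumes a log4j layout of timestamp, level, and bracketed logger name at the start of each entry; the sample lines are hypothetical and only imitate that layout.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Assumed layout: "YYYY-MM-DD HH:MM:SS,mmm LEVEL [logger] (thread) message"
LINE_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}\s+"
    r"(TRACE|DEBUG|INFO|WARN|ERROR|FATAL)\s+\[([^\]]+)\]"
)


def summarize(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count entries per level and WARN/ERROR/FATAL entries per logger."""
    levels, noisy = Counter(), Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # stack-trace continuation lines carry no header
        level, logger = match.groups()
        levels[level] += 1
        if level in ("WARN", "ERROR", "FATAL"):
            noisy[logger] += 1
    return levels, noisy


# Hypothetical sample entries, not taken from a real PPTV log.
SAMPLE = [
    "2012-11-01 10:00:00,001 DEBUG [cloud.agent.Agent] (Agent-Handler-1:null) Received response",
    "2012-11-01 10:00:01,120 WARN  [cloud.storage.StorageManager] (Job-Executor-2:job-17) Pool is low on space",
    "2012-11-01 10:00:02,450 ERROR [cloud.vm.VirtualMachineManager] (Job-Executor-3:job-18) Failed to start instance",
    "com.cloud.exception.InsufficientCapacityException: Unable to create a deployment",
    "2012-11-01 10:00:03,000 INFO  [cloud.agent.Agent] (Agent-Handler-1:null) Connected",
]
levels, noisy = summarize(SAMPLE)
```

To run it against a real file, pass an open file object from the log directories listed above, for example `summarize(open(path))`, where `path` is whichever log file is being inspected.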

Page 36: CloudStack Best Practice in PPTV

Performance Tuning

BIOS Settings for KVM Host

For Dell PowerEdge servers:

A. Set the Power Management Mode to Maximum Performance.

B. Set the CPU Power and Performance Management Mode to Maximum Performance.

C. Processor Settings: set Turbo Mode to enabled.

D. Processor Settings: set C States to disabled.

Page 37: CloudStack Best Practice in PPTV

Performance Tuning (contd)

CS Tuning

NFS Server Tuning

Use NFSv4

noatime,nodiratime,noacl,data=writeback,commit=15

IDE/SATA parameters

NIC & TCP/IP

Use GlusterFS

Management Server Tuning

Increase Worker Process Number

Turn off stats collectors

Tuning Allocation Algorithm

Tuning Direct Agent Load Size

MySQL DB tuning

JVM Tuning

Heap Size Tuning

Use CMS GC Algorithm

Page 38: CloudStack Best Practice in PPTV

Performance Tuning (contd)

KVM Tuning

CPU

Disable KSM in KVM Host

Disable tickless mode in KVM guest

PIN CPU in KVM host

Memory

THP in KVM Host

echo 'yes' > /sys/kernel/mm/redhat_transparent_hugepage/khugepaged/defrag

echo 'always'> /sys/kernel/mm/redhat_transparent_hugepage/enabled

echo 'never'> /sys/kernel/mm/redhat_transparent_hugepage/defrag

Network performance issue in CentOS 6.2

Workaround: blacklist vhost-net. Edit /etc/modprobe.d/blacklist-kvm.conf and add vhost-net to it.

Linux kernel parameters tuning

TCP Buffer Tuning
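The transparent huge page (THP) settings above are easy to lose after a reboot, so a small check can confirm that a KVM host still matches them. This is a sketch under assumptions: it uses the Red Hat specific sysfs path quoted on this slide, it assumes sysfs marks the active choice with square brackets (for example `always madvise [never]`), and the function names are illustrative.

```python
from pathlib import Path
from typing import Dict, Optional, Tuple

THP_ROOT = Path("/sys/kernel/mm/redhat_transparent_hugepage")

# Values written by the echo commands on this slide.
EXPECTED = {"enabled": "always", "defrag": "never", "khugepaged/defrag": "yes"}


def active_value(raw: str) -> str:
    """Return the bracketed choice from a sysfs value, or the stripped text."""
    for token in raw.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return raw.strip()


def check_thp(root: Path = THP_ROOT,
              expected: Dict[str, str] = EXPECTED) -> Dict[str, Tuple[Optional[str], bool]]:
    """Map each setting to (current value or None if missing, matches expected)."""
    report = {}
    for rel, want in expected.items():
        path = root / rel
        got = active_value(path.read_text()) if path.exists() else None
        report[rel] = (got, got == want)
    return report
```

Calling `check_thp()` on a host returns one entry per setting; any entry whose second element is `False` points at a value that differs from the slide, or at a kernel that does not expose the Red Hat path.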

Page 39: CloudStack Best Practice in PPTV

Q&A