Infrastructure: Building physical in a virtual world


Who am I?

Infrastructure Operations @ HootSuite

Chris Maxwell, Lead Operations Engineer, @[email protected]

Previously

Coral Princess, 2010. Left: bow thrusters, core network. Right: improvised cooling.

Princess Cruises – Drydock / datacenter refit team

Why should I listen to you?

Just a guy who’s been in the trenches a long time.

•  Learned to code in C long ago. BSD kernel hacking, secure messaging, managed security appliances, nomadic file systems.

•  >1000 wireless access points deployed to 14 cruise ships

•  6 Cisco core network replacements from Nortel Passport

•  First live-voyage core network replacement (Diamond Princess)

•  Built 22 broadband wireless towers (of 75)

•  Regional Voice-over-IPX (DSP on OS/2 over Novell!)

Why HootSuite went physical

“Unique” workload:

•  95% write

•  12TB dimension

•  I/O bound

•  Noisy neighbours

•  Pre-PIOPS (AWS 100io/vol)

•  Need >68GB

•  No lock-in

What is “cloud”

Not a cloud definition slide

•  Just datacenter best practices from 1998 (infrastructures.org)

•  Gold disk deploy - AMI

•  Version Control - config mgmt

•  Automate everything - APIs

“Cloud is like cutting your legs off at the knee - stop trying to walk somewhere, just clone a new server in place.” – me

Compromising

Balancing best vs. budget

•  We chose software routers. OpenBSD + OpenBGPD on Dell

•  We chose Cisco core switching

•  We chose software firewalls. OpenBSD + PF on Dell

•  We chose CloudStack on VMware

•  We chose SAN + iSCSI

Compromising

We chose software routers. OpenBSD + OpenBGPD on Dell

•  OpenBSD is secure, OpenBGPD is stable

•  Scales to 1.5-2 Gbps per host, depending on packet size

•  Redundant pairs instead of internally redundant (live upgrades!)

•  Ops team understands BSD tools

•  Added support for Intel X520 (82599) 10GE NICs

•  Much lower cost than hardware routers
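The packet-size dependence is typical of software forwarding, which tends to be limited by packets per second rather than bits per second. A minimal sketch of that arithmetic follows; the packet rate `ASSUMED_PPS` is a hypothetical figure chosen for illustration, not a measurement from this deployment.

```python
# Hypothetical per-host forwarding budget in packets per second.
# This is an assumption for illustration, not a number from the talk.
ASSUMED_PPS = 250_000


def throughput_gbps(pps, packet_bytes):
    """Throughput in Gbps for a given packet rate and packet size."""
    return pps * packet_bytes * 8 / 1e9


if __name__ == "__main__":
    # Same packet rate, very different bit rates.
    for size in (64, 512, 1000, 1500):
        print(f"{size:>5} B packets: {throughput_gbps(ASSUMED_PPS, size):.2f} Gbps")
```

With a fixed packet budget, small packets yield a fraction of the throughput that large packets do, which is why a single Gbps figure for a software router needs a packet-size qualifier.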

Compromising

We chose Cisco core switching

•  Cisco is solid. Cisco engineers can be hired

•  OSPF with millisecond timers = sub-second convergence

•  Wanted 10Gig in the network core

•  Needed minimal port count

•  Ops team has Cisco experience.

Compromising

We chose software firewalls. OpenBSD + PF on Dell

•  OpenBSD is secure, PF is stable

•  Scales to 1-1.5 Gbps per host, depending on states/rules (~300k)

•  CARP + pfsync is great! We run Active+Standby, alternating Masters.

•  Redundant pairs instead of internally redundant (live upgrades!)

•  Ops team understands BSD tools. Scripts sync security groups from AWS to PF tables.
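The security-group sync mentioned above could look roughly like the following sketch. It is not the team's actual script: the function names are hypothetical, and the input is assumed to be shaped like one entry of an EC2 `describe_security_groups` response (`IpPermissions` containing `IpRanges` with `CidrIp`). The sample addresses are documentation ranges.

```python
import ipaddress


def cidrs_from_security_group(group):
    """Collect the unique IPv4 CIDRs used by a security group's ingress rules."""
    networks = set()
    for perm in group.get("IpPermissions", []):
        for ip_range in perm.get("IpRanges", []):
            # Validate and normalise, e.g. "10.0.0.5/24" becomes "10.0.0.0/24".
            networks.add(ipaddress.ip_network(ip_range["CidrIp"], strict=False))
    return [str(net) for net in sorted(networks)]


def render_pf_table_file(cidrs):
    """Render a PF table file body: one address or network per line."""
    return "".join(f"{cidr}\n" for cidr in cidrs)


# Hypothetical sample data in the assumed input shape.
sample_group = {
    "GroupName": "web",
    "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
         "IpRanges": [{"CidrIp": "203.0.113.0/24"}, {"CidrIp": "198.51.100.7/32"}]},
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "198.51.100.7/32"}]},
    ],
}

if __name__ == "__main__":
    print(render_pf_table_file(cidrs_from_security_group(sample_group)), end="")
```

A file rendered this way can be loaded into an existing table with `pfctl -t <table> -T replace -f <file>`, which swaps the table contents without reloading the whole ruleset.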


Compromising

We chose CloudStack on VMware

•  2012: CloudStack more mature than OpenStack

•  Wanted VMware hypervisor for core data services (MySQL, Mongo)

•  We use vMotion + HA on core services

•  Did not want vendor lock-in, layered CloudStack for future options

•  Original plan was mixed VMware + XenServer, but small Ops team

Compromising

We chose SAN + iSCSI

•  We chose iSCSI for flexibility:

•  We need snapshots. Most backups are sync+snap

•  We like live migration of virtual machines

•  We tolerate latency penalty of SAN for snapshot flexibility

•  We run RAID-6 (2 parity disks)

    •  Tolerate 2 disk failures per slice before data loss

    •  Painful on write: 5,000 writes → 30,000 read + write

    •  Remote equipment: time to replacement is not instant
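The 5,000 → 30,000 figure follows from the usual RAID-6 small-write penalty of 6: each random write reads the old data block and both parity blocks, then writes all three back. A minimal sketch of that arithmetic, with hypothetical names, assuming small random writes and no write-back cache or full-stripe writes absorbing the cost:

```python
# RAID-6 small-write penalty: 3 reads (old data, P, Q) + 3 writes (new data, P, Q).
RAID6_READS_PER_WRITE = 3
RAID6_WRITES_PER_WRITE = 3


def backend_ios(front_end_writes):
    """Back-end disk I/Os generated by a number of small random writes."""
    reads = front_end_writes * RAID6_READS_PER_WRITE
    writes = front_end_writes * RAID6_WRITES_PER_WRITE
    return reads + writes


if __name__ == "__main__":
    print(backend_ios(5_000))  # 30000
```

For a 95%-write workload this multiplier applies to almost every request, which is why the parity choice is listed as a compromise rather than a free win.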

SJC Stack – Core Network

BGP, OSPF, PF, on OpenBSD and Cisco

Routers, switches, firewalls

SJC Stack – Private Cloud

CloudStack, VMware, iSCSI

Switches, servers, storage

Network Overview

“no default” routing

Network Overview

AS 31931

Multiple carriers, many paths

Thank You!

Chris Maxwell, @[email protected]