Chuck Hoppenrath – Cloud & Infrastructure Architect

LHC1158BU

#VMworld #LHC1158BU

Architecture and Tooling for the Brownfield Hybrid Cloud

VMworld 2017 Content: Not for publication or distribution

Disclaimer

• This presentation may contain product features that are currently under development.

• This overview of new technology represents no commitment from VMware to deliver these features in any generally available product.

• Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.

• Technical feasibility and market demand will affect final delivery.

• Pricing and packaging for any new technologies or features discussed or presented have not been determined.


Agenda

1 Who is Scripps Networks Interactive

2 Current On-Prem / Cloud Deployments

3 Learning to Love Criticism

4 Design End Point infrastructure

5 There Ain’t No Such Thing as a Free Lunch


Who is Scripps Networks Interactive?


Who is SNI?

Linear Non-Linear


Current On-Prem / Cloud Deployments


On Premises Infrastructure – Knoxville Datacenter

• 55 VMs, 60 instances, 850 user databases (over 1,000 including system databases)

• ~900 TB (VM and physical server mounts), block & file

• ~1,300 VMs

• ~20 TB used only for Hyper-V SQL workloads

• vRealize Automation v7


US Offices

Knoxville

• 2 ESXi servers
• ~14 TB EMC storage

International Operations

• Compute: UCS domain, single chassis, 4 blades
• Storage: EMC VNX, block / file, ~60 TB
• 40 VMs – General Services


New York – Disaster Recovery

~60 TB block & file

5 ESXi hosts

6 General Service servers

Knoxville

20 “warm” servers run constantly – nightly backups are restored daily, with SQL logs replayed in 15-minute increments

Disaster Recovery Environment

~40 VMs


Public Cloud Infrastructure / Usage

5 years

2 years


Amazon Web Services

A typical month of traffic will see our AWS based websites and services deliver:

170M visits

850M page views

22M video plays


AWS EC2 Growth

Amazon EC2

2,900 instances

400 TB EBS storage


AWS S3 Growth

Amazon S3

5 PB storage

1.7B objects


Other Clouds

Azure IaaS builds are minimal: ~40 VMs and 1 production system.

We extensively leverage Azure Data Factory and use Power BI for analytics dashboarding.


The Bi-Modal Capability Gap is Widening

• Investment in on-premises infrastructure has lagged for the past several years – all hardware is due for refresh. Storage is in extended support, with compute shortly behind.

• The business is used to the self-service capabilities that public cloud offers. There is a strong preference toward building that capability on prem as well.

• Perception of the environment continues to be a struggle:

Unreliable!

Too Slow!

Expensive!

It’s not AWS!


Learning to Love Criticism


What Follows Does Not Describe a Failing Team…

Never had to fail apps over to the current DR site in New York

SLA for DR exceeds the business-requested RTO and easily meets RPO for apps currently in DR

Delivers highly performant systems without a budget refresh in several years and through significant staff churn


Learning to Love Criticism

No one likes hearing their infrastructure isn’t good enough. That message will get delivered through:

Reduced Budgets

Reduced Workloads

• Disaster Recovery Table Top Exercise

• Business Impact Analysis Audit

• Infrastructure and Operations Assessment


Disaster Recovery Tabletop Exercise

Business Led

Departments: 8

Participants: 60

Scope: An escalating DR scenario played out over several hours

Purpose: Identify key systems and methods of communication needed during a disaster


Disaster Recovery Tabletop Exercise

When asked if IT could easily shut down non-critical systems within the VMware environment to consolidate to a minimal set of hosts, the answer was no – neither vCenter nor the CMDB contained reliable enough data to allow that to happen quickly or reliably.

Once the DR scenario involved Knoxville going offline, access to what limited systems we had in DR in New York City involved a VPN connection from London back across the Atlantic.

Knoxville was operating as a hub to a multi-site, multi-cloud operation with no graceful failover mechanism in place.
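The consolidation question from the tabletop can be made concrete with a minimal sketch. It assumes an inventory in which every VM carries a trustworthy criticality flag and a memory footprint, which is exactly the data vCenter and the CMDB could not supply. The record fields, the sample VMs, and the host size are all hypothetical, and sizing by memory alone is a deliberate simplification.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class VM:
    name: str
    critical: bool   # hypothetical flag; would have to come from a trusted CMDB
    memory_gb: int


def consolidation_plan(vms, host_memory_gb):
    """Split VMs into keep/shutdown lists and estimate hosts needed by memory alone."""
    keep = [vm for vm in vms if vm.critical]
    shutdown = [vm for vm in vms if not vm.critical]
    needed_gb = sum(vm.memory_gb for vm in keep)
    hosts = math.ceil(needed_gb / host_memory_gb) if keep else 0
    return keep, shutdown, hosts


# Illustrative inventory only; names and sizes are made up.
inventory = [
    VM("payroll-db", True, 64),
    VM("scheduling-app", True, 32),
    VM("dev-sandbox", False, 16),
    VM("test-web", False, 8),
]

keep, shutdown, hosts = consolidation_plan(inventory, host_memory_gb=64)
print([vm.name for vm in shutdown], hosts)
```

The logic is trivial once the flag exists; the hard part surfaced by the exercise is keeping that flag accurate.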


Business Impact Analysis

Business Led

Departments: 15

Participants: 97

Scope: Extensive auditing of key business processes to understand their interdependencies and their criticality as well as RTO / RPO


Business Impact Analysis

Sections indicate major business departments

Not all of these processes directly involve internal IT

Business process needs to be recovered in 3 days (or less)

Business process needs to be recovered in 4 – 7 days

Business process needs to be recovered in 10 days or more
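As a rough illustration of how the legend above could be applied to BIA results, the sketch below maps a requested recovery time to one of the three bands. The process names and day counts are hypothetical. The legend as extracted leaves 8 to 9 days uncovered, so the sketch reports those values as unclassified instead of guessing a band.

```python
def recovery_tier(rto_days):
    """Map a requested recovery time in days to the legend band on the BIA slide."""
    if rto_days <= 3:
        return "3 days or less"
    if 4 <= rto_days <= 7:
        return "4-7 days"
    if rto_days >= 10:
        return "10 days or more"
    # Values between the bands (for example 8 or 9 days) are not in the legend.
    return "not covered by legend"


# Hypothetical processes, for illustration only.
processes = {"Ad trafficking": 1, "Royalty reporting": 5, "Archive requests": 14}

for name, days in processes.items():
    print(f"{name}: {recovery_tier(days)}")
```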


Business Impact Analysis

• The BIA has identified over 100 mission-critical processes.

• Those processes translate to over 60 apps that IT needs to enable DR protection for in the short term.

That is a ~5X increase in DR app targets at a minimum!

The New York DR location is undesirable for several reasons:

• Hardware deployed there for DR is not scalable
• Facilities are close to max in terms of power and cooling
• Issues with water infiltration into server room
• The building caught on fire a few months back
• New York City is a place you DR from – not to…
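A back-of-envelope check on the ~5X figure: dividing the new target by the stated growth factor gives the implied size of the current DR scope. The slides do not state that baseline directly, so the result is a derived estimate and not a reported number.

```python
target_apps = 60       # "over 60 apps" from the BIA, treated here as a lower bound
increase_factor = 5    # "~5X increase" from the slide

# Implied number of apps protected by DR today (an estimate, not a stated figure).
implied_current_apps = target_apps / increase_factor
print(implied_current_apps)
```

Under those two inputs the current DR scope works out to roughly a dozen apps.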


Round Tower Technologies – On Prem Assessment

It quickly became apparent that the challenges in the on-prem environment in Knoxville stemmed from adherence to legacy technical decisions as well as process-related issues.

IT Led

Departments: 10

Participants: 50

Scope: Recommendations for deployed systems optimization. Strategy recommendations for upcoming hardware refresh & capability enhancement


On Prem Assessment

• Ditch Hyper-V

• Seriously consider HCI for hardware refresh

• Integrate vRA with ServiceNow

• Fix Change Control Process lag

• LOTS of optimizations to current infrastructure (over 200 pages of data + recommendations)


Disparity in Change Control by Environment

Most of the perceived speed challenges with on prem infrastructure can be tied back to this process…

• Slow approval process (as long as 2 weeks)
• Every change (including test / dev) must go to CCB
• Limited integration with orchestration tools
• Each VM deployment is a separate change request
• Process is not well understood or documented

• Limited CCB participation
• CMDB updated on batch pull from management tools, not as changes are made
• Development teams and customers see no benefit from engagement in CCB process


The Butcher’s Bill

DR Table Top BIA Round Tower

Service Scalability

Location Criticality

Process Consistency


Design Your End State Infrastructure Like No One is Watching

…but everyone has opinions…


What Capabilities Will Get Us There?

Diagram: capability map, grouped by area.

ITaaS
• Capacity Management
• Service Catalog
• VDI Capable
• Hybrid cloud Elasticity
• Intelligent Log management

Virtual Networking and isolation
• Micro-segmentation
• Improved RTO
• On demand Scale out network services
• Security follows workload to cloud

Cloud & Automation
• IaaS -> PaaS self service
• Automated DRaaS
• Hybrid cloud orchestration
• Operational consistency
• $ Cost transparency
• Standards & Tiers

Application dependency mapping
• Application dependencies

ITSM Integration

Primary Datacenter: Servers, Storage, NSX

VMware Cloud on AWS: vSphere, vSAN, NSX


Primary Datacenter

Servers Storage NSX

VMware Cloud on AWS

vSphere vSAN NSX

The only specific service that is being called out – SNI has identified VMC on AWS as its intended DR target in 2018.

As the service matures and rolls out to more regions, its use cases will only grow…

The addition of NSX as a network transport / extension / security layer to our primary datacenter is being driven by multiple requirements.

NSX does require a level of integration with networking that is some cause for concern within IT


Service Catalog

• Make this more than simple OS deployments
• Multi-Machine Blueprints and full application stacks as single catalog items
• Usable by more than just internal IT
• Inclusive of Day 2 self service operations (backups / restores / snapshots)

Intelligent Log management

• Host quantity reducing, uptime demands increasing
• Leverage log data to become proactive versus reactive
• Reduction in break fix time as well as predictive failure alerting
• vROPS, vRNI, Log Insight


Micro-segmentation

• Dev groups capability to fully isolate multiple app stacks
• Control and contain east-west traffic flow as much as possible
• Legacy OS / Applications containment (Win2K3)

Security follows workload to cloud

• Security policy defined once and follows workload (on prem, VMC, AWS)
• Single panel security policy definition and enforcement vs. Cloud specific
• VERY attractive for systems containing PII, SOX, or Talent data
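To illustrate what "defined once and follows workload" means in practice, here is a minimal sketch of tag-based policy evaluation. It is a generic model and not the NSX API: the `Workload` and `Rule` types, the tags, and the ports are all hypothetical. The point it shows is that the decision depends only on tags, so moving a workload between locations does not change the outcome.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    tags: frozenset
    location: str   # e.g. "on-prem", "VMC", "AWS"; never consulted by the policy


@dataclass(frozen=True)
class Rule:
    src_tag: str
    dst_tag: str
    port: int


def is_allowed(rules, src, dst, port):
    """Default-deny: traffic passes only if some rule matches both tags and the port."""
    return any(
        r.src_tag in src.tags and r.dst_tag in dst.tags and r.port == port
        for r in rules
    )


# Hypothetical policy, written once.
rules = [Rule("web", "app", 8443), Rule("app", "db", 1433)]

web = Workload("web01", frozenset({"web"}), "AWS")
app = Workload("app01", frozenset({"app"}), "on-prem")
db_onprem = Workload("db01", frozenset({"db", "pii"}), "on-prem")
db_moved = Workload("db01", frozenset({"db", "pii"}), "VMC")

print(is_allowed(rules, app, db_onprem, 1433))  # same answer before the move...
print(is_allowed(rules, app, db_moved, 1433))   # ...and after it
print(is_allowed(rules, web, db_moved, 1433))   # east-west traffic with no rule is denied
```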


Automated DRaaS

• DRaaS should cover both application onboarding as well as test / failover
• Fully orchestrated testing as well as failover / failback
• Repeatable, measurable, implementable by support staff

$ Cost transparency

• If it is treated as free it is not valued…
• Shows the value IT budget delivers to the business
• Who are the largest consumers / at what monetary rate
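The "largest consumers / at what monetary rate" question amounts to a showback calculation. The sketch below is one simple way to do it, assuming flat monthly rates per resource unit. The rates, department names, and usage figures are hypothetical placeholders and not SNI data.

```python
from collections import defaultdict

# Hypothetical monthly rates; real figures would come from the IT budget.
RATE_PER_VCPU = 20.0
RATE_PER_GB_RAM = 5.0
RATE_PER_TB_STORAGE = 50.0


def showback(usage_records):
    """Return (department, monthly cost) pairs, largest consumer first."""
    totals = defaultdict(float)
    for rec in usage_records:
        totals[rec["dept"]] += (
            rec["vcpus"] * RATE_PER_VCPU
            + rec["ram_gb"] * RATE_PER_GB_RAM
            + rec["storage_tb"] * RATE_PER_TB_STORAGE
        )
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Illustrative usage records only.
usage = [
    {"dept": "Digital", "vcpus": 40, "ram_gb": 160, "storage_tb": 10},
    {"dept": "Finance", "vcpus": 8, "ram_gb": 32, "storage_tb": 2},
    {"dept": "Digital", "vcpus": 4, "ram_gb": 16, "storage_tb": 1},
]

report = showback(usage)
for dept, cost in report:
    print(f"{dept}: ${cost:,.2f} per month")
```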


ITSM Integration

Application dependency mapping

We cannot accurately perform DR activities, optimize necessary infrastructure workflows, or accurately inform customers of outages without accurate and automated app dependency mapping tools.

These app mappings, and all activities performed by IT, need to be reflected in our IT Service Platform and in our CMDB.

These processes should be thorough but lightweight.

It’s All Connected!!!!!
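One way to see why dependency data matters for outage communication is a small impact-analysis sketch: given a map of which component depends on which, find everything affected when one component fails. The component names and the map itself are hypothetical, and a real mapping tool would discover these relationships automatically instead of relying on a hand-written dictionary.

```python
from collections import deque

# Hypothetical map: each key depends on the components in its list.
DEPENDS_ON = {
    "website": ["cms", "video-api"],
    "cms": ["sql-cluster"],
    "video-api": ["object-store"],
    "reporting": ["sql-cluster"],
}


def impacted_by(failed, depends_on):
    """Return every component that directly or transitively depends on `failed`."""
    # Invert the map so we can walk from a dependency to its dependents.
    dependents = {}
    for app, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(app)

    impacted, queue = set(), deque([failed])
    while queue:
        current = queue.popleft()
        for app in dependents.get(current, ()):
            if app not in impacted:
                impacted.add(app)
                queue.append(app)
    return impacted


print(sorted(impacted_by("sql-cluster", DEPENDS_ON)))
```

The same traversal, run in the other direction, gives the set of components that must be recovered before a given app can be failed over.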


How Does This Address Our Identified Gaps?

Process Consistency

Service Scalability

Location Criticality


Long Term Objectives...

2018, 2019, 2020

This is a LOT of Work

We will pick off the most urgent of these capabilities (DR and BC related) and build competency slowly. As the Service Catalog gets fleshed out, a roadmap of capabilities will grow and be deployed on a regular schedule.


TANSTAAFL

The Moon may in fact be a harsh mistress but she won’t give you a free lunch...


Hardware Refresh

We have the opportunity to refresh the complete hardware stack at SNI – compute, storage, and tooling.

What that leaves you with is the proverbial White Page problem. Where to start?


Traditional 3 Tier

Hyperconverged – vSAN ReadyNodes

Hyperconverged – Proprietary Solution

SAFE

QUICK

New UCS Blades

All-Flash VMAX

Coordinate w/ Network Team on refresh


Technology Assessment Lab

Anybody got a better acronym than TAL?


TAL Design Philosophy

Make use of extra hardware available to IT staff to build a representative lab environment

Build the lab completely segregated from all production systems and free from all Change Control processes

Make the lab usable by networking, infrastructure, and IT customers for POC and upgrade testing

Waterfall replaced hardware freed from technology upgrades into the lab


Lab Buildout

Diagram labels: vRNI; vRA; NSX; AWS cloud VPC; HQ; WAN speed; Remote; VMC on; VPC; Microsoft Azure; VPN Gateway


Technology Assessments


NSX

IT was asked what it could do to increase availability and flexibility to avoid previous downtime incidents.

Most extended outages were due to upgrade processes going longer than expected.

The ability to move workloads to VMC, perform on prem maintenance, and then migrate back is very attractive.

• Traffic Optimization
• Micro Segmentation
• Simple Load Balancing
• API Network Constructs


NSX Impact on Infrastructure & Networking Teams

Infrastructure + Networking

We don’t work like that!


vRealize Network Insight

vRNI network reporting will let us see how much traffic flows in / out of our VMware environment


vRealize Network Insight

vRNI path reporting (with integrated network statistics!) makes troubleshooting so much easier…


vRNI+AWS

Flow data from AWS Instances

AWS Security Group identification


Backup

SNI is seeking options… what interesting things can be done with the bits once they are protected?

Diagram labels: VMC on; S3 Bucket; Azure; VM VM VM


Data Protection

Knoxville

User file shares are currently stored on storage / servers local to the user’s location.

For remote site locations, data is replicated round robin for duplication offsite.

However, local file shares are difficult to access in the event of Knoxville being unavailable…


Automation – vRealize Automation + ServiceNow Integration


VMware Cloud on AWS

LightHouse and Early Access Program


The Puzzle Pieces Seemed to Fall into Place…


VMware Cloud on AWS as a Forcing Function

It’s not AWS!

Forcing a strong look at / upgrade to vSphere 6.5 and the HTML5 client.

Also forcing usage of Content Libraries between Knoxville and VMC.


Disaster Recovery Solutions – SRM as a Service

VMC on AWS

vSphere vSphere

Knoxville

Scalable

Cost Effective

Geographically Desirable

Unlikely to flood or catch on fire


VMware Cloud on AWS – Compelling Price Structure

2018, 2019, 2020

SNI Current Infrastructure Refresh Plan


VMware Cloud on AWS – Compelling Price Structure

2018, 2019, 2020

Infrastructure Refresh + VMC?


The Long Long Wait for VXLAN!

• Availability
• EASY Migration
• Improved SLA Delivery
• Gee Whiz


What’s Next?

Tentative Timeline & Next Steps


Timeline: 2017, 2018, 2019

• Begin NSX / vRNI rollout in prod
• Tier II App Capability Test in VMC
• 10 Gb Direct Connect to VMC provisioned
• Finalize 2018 Budgets
• vRA / SN Integration
• Replace 1/3 Hosts & Storage in Knoxville
• Change Mgmt Workshop
• Full VMC SRM Test
• Confirm Tier 1 Apps for DR
• Replace Remote Office HW
• DR Migration to VMC Complete
• Tier II App Failover to VMC Complete
• US East Extension of VMC – Hybrid MAM App testing
• VMC extended to Ireland – International Usage?
• Replace 1/3 Hosts in Knoxville?

Difficult to predict this far out.

Here there be Dragons!


Next Steps

Internal communication and roadshow

• Our plan is to consolidate Hyper-V workloads onto VMware – will require maintenance windows to accomplish. LOTS of customer handholding
• Coordinate DR testing for new environment in conjunction with planned quarterly tests
• vRNI ability to reach into AWS will be of interest to everyone
• LOTS of training – NSX blurring a lot of lines
• HCI blurs even more


Thank You

Chuck Hoppenrath – [email protected]
