© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Nick Matthews, Partner Solutions Architect, AWS Warby Warburton, Technical Marketing Engineering, Palo Alto Networks November 29, 2016 GPST401 Advanced Tips for EC2 Networking and High Availability

AWS re:Invent 2016: Advanced Tips for Amazon EC2 Networking and High Availability (GPST401)

Page 1: AWS re:Invent 2016: Advanced Tips for Amazon EC2 Networking and High Availability (GPST401)

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Nick Matthews, Partner Solutions Architect, AWS

Warby Warburton, Technical Marketing Engineering, Palo Alto Networks

November 29, 2016

GPST401

Advanced Tips for EC2 Networking and High Availability

Page 2

We know how to build web applications

Page 3

What if it’s not that simple?

AWS provides many services to improve availability

Page 4

Common non-webby applications

• Business applications

• Legacy applications

• Requirements for third-party services

• Security

• Networking

• Load Balancing

• Storage

• Must use IP addresses

Page 5

Tips and Pointers

Page 6

DNS and Auto-Scaling Design

• Old DNS problems are still DNS problems

  • Caching, TTL, client support

• ELB pointers

  • IP addresses change

  • Performs source NAT

  • Session stickiness for HTTP/S

  • Supports TCP

  • Minimum failover time is 7-12 seconds

• Route 53 pointers

  • Multi-Region

  • Separate from Auto Scaling

  • Better for UDP, non-NAT, and simpler workloads

  • Minimum failover time is 10-20 seconds

• Auto Scaling

  • Publish and use custom metrics when appropriate

  • Lifecycle hooks can assist with instance provisioning

[Icons: Amazon Route 53, Elastic Load Balancing, Auto Scaling]
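The pointer about publishing custom metrics can be sketched as follows. This is a minimal sketch and not the presenters' code: the helper names, the namespace "Custom/Firewall", and the metric name are hypothetical, and `cloudwatch` is assumed to be a boto3 CloudWatch client (or any object with a compatible `put_metric_data` method) supplied by the caller.

```python
from datetime import datetime, timezone


def build_metric_datum(name, value, group_name, unit="Count"):
    """Build one entry for the MetricData list of CloudWatch put_metric_data."""
    return {
        "MetricName": name,
        # Tag the datum with the group so an alarm can be scoped to that group.
        "Dimensions": [{"Name": "AutoScalingGroupName", "Value": group_name}],
        "Timestamp": datetime.now(timezone.utc),
        "Value": float(value),
        "Unit": unit,
    }


def publish_metric(cloudwatch, namespace, datum):
    """Send one datum; `cloudwatch` is assumed to be a boto3 CloudWatch client."""
    cloudwatch.put_metric_data(Namespace=namespace, MetricData=[datum])


# Hypothetical usage (all names are illustrative only):
#   import boto3
#   publish_metric(boto3.client("cloudwatch"), "Custom/Firewall",
#                  build_metric_datum("ActiveSessions", 1200, "fw-asg"))
```

A CloudWatch alarm on the published metric can then drive the group's scaling policy; the builder is kept separate from the API call so it can be tested without AWS access.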

Page 7

Lambda is glue

• Lambda helps fill gaps

  • Handles Availability Zone degradation gracefully

• Event driven or scheduled

  • 1 minute minimum frequency

  • 1M requests and 400,000 GB-seconds in the free tier

• Use cases

  • Adding interfaces in Auto Scaling groups

  • Adding and removing IP addresses in Route 53

  • Automated failure detection and remediation

  • Detecting new Elastic Load Balancing IP addresses

[Icon: Lambda]
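The last use case, detecting new Elastic Load Balancing IP addresses, comes down to resolving the load balancer's DNS name on a schedule and comparing the result with the previous run. A minimal sketch, assuming the previous set of addresses is stored somewhere by the caller; the function names are hypothetical and only the Python standard library is used.

```python
import socket


def resolve_ips(hostname, port=443):
    """Return the set of IP addresses the hostname currently resolves to."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


def diff_ips(previous, current):
    """Compare two sets of addresses and report what appeared and what vanished."""
    return {
        "added": sorted(current - previous),
        "removed": sorted(previous - current),
    }


# Hypothetical usage inside a scheduled function (the ELB name is illustrative):
#   changes = diff_ips(stored_ips, resolve_ips("my-elb.example.com"))
#   if changes["added"] or changes["removed"]:
#       ...  # update firewall rules, then store the new set
```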

Page 8

Networking Tips

• Internet gateways are highly available and don’t have bandwidth limits

• There is one Virtual private gateway per VPC which supports many Direct Connect virtual interfaces and VPN connections

• For Direct Connect, availability and bandwidth are dependent on the port speeds and BGP routing policy

• For VPN, availability is automatically managed with 2 connections which are multi-gigabit in throughput

• Subnets, IP addresses, Elastic Network Interfaces, and NAT Gateways are local to one Availability Zone

Page 9

Basic High Availability Designs

Page 10

High Availability Methods

• Agent-based solutions

• DNS

  • Route 53

  • Elastic Load Balancing Sandwich

  • Auto Scaling Group Size 1

• Networking

  • Floating Elastic Network Interface

  • Floating Elastic IP address

  • Route shifting

Page 11

Agent-based solutions

[Diagram labels: Host-based Security (two hosts), Central Monitoring and Control]

Use Cases

• Highly elastic applications

• DevOps + DevSecOps

• Host IDS / IPS

Design Notes

• Can inspect encrypted data

• Scales with application

• Requires trust in user or application space

• Requires application compatibility

• Increases host overhead

Failover Time

• Variable

Page 12

DNS Options

Page 13

Route 53 or DNS

Use Cases

• Multi-region applications

• Stateless web front ends

• Applications utilizing UDP

Design Notes

• Client must support DNS

• Application is tolerant of DNS caching

• Inbound only

• Multiple routing policies to use

• Outbound return may be asymmetric

Failover Time

• 20+ seconds

[Diagram labels: example.com, Internet, Route 53, AZ 1, AZ 2]
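One of the routing policies mentioned above is failover routing. A minimal sketch of the request body for a primary/secondary pair follows; the helper names, record name, addresses, and health check ID are hypothetical, and the resulting dictionary is meant for the `ChangeBatch` argument of a boto3 Route 53 client's `change_resource_record_sets` call, which is not invoked here.

```python
def failover_record(name, ip, role, set_id, health_check_id=None, ttl=10):
    """Build one UPSERT change for a failover A record."""
    if role not in ("PRIMARY", "SECONDARY"):
        raise ValueError("role must be PRIMARY or SECONDARY")
    record = {
        "Name": name,
        "Type": "A",
        "SetIdentifier": set_id,  # distinguishes records that share a name
        "Failover": role,
        "TTL": ttl,  # a low TTL limits how long clients cache a dead address
        "ResourceRecords": [{"Value": ip}],
    }
    if health_check_id:
        record["HealthCheckId"] = health_check_id
    return {"Action": "UPSERT", "ResourceRecordSet": record}


def change_batch(changes, comment="failover pair"):
    """Wrap individual changes in the ChangeBatch structure."""
    return {"Comment": comment, "Changes": list(changes)}


# Hypothetical usage (IDs and names are illustrative only):
#   route53.change_resource_record_sets(
#       HostedZoneId="Z123EXAMPLE",
#       ChangeBatch=change_batch([
#           failover_record("example.com.", "203.0.113.10", "PRIMARY", "az1", "hc-az1"),
#           failover_record("example.com.", "203.0.113.20", "SECONDARY", "az2"),
#       ]))
```

Even with a low TTL, clients and resolvers that ignore it will keep using the old address, which is why the design notes require the application to tolerate DNS caching.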

Page 14

Elastic Load Balancing Sandwich

Use Cases

• Web Proxies, WAF

• Inbound web security

Design Notes

• Stickiness is available for HTTP/S

• Use X-Forwarded-For for source visibility

• Set a low TTL for faster failover

• Health check the device instead of a pass-through health check

• May require a worker node to prepare instances for auto-scaling

Failover Time

• 8+ seconds

[Diagram labels: Internet, example.com, Elastic Load Balancing (ELB), Auto Scaling, Proxy, WAF, or Firewall, inside.example.com, Elastic Load Balancing (ELB), Auto Scaling, Web Servers]
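Because the load balancers perform source NAT, the web servers only see the original client address in the X-Forwarded-For header, where each hop appends the address it received the request from. A minimal sketch of reading it; the function name is hypothetical, and it assumes the leftmost entry is the client, which only holds if the first hop is trusted to set the header.

```python
def client_ip(x_forwarded_for):
    """Return the leftmost address in an X-Forwarded-For header, or None."""
    parts = [part.strip() for part in x_forwarded_for.split(",") if part.strip()]
    # Entries are appended left to right: client, then each intermediate hop.
    return parts[0] if parts else None


# Hypothetical header as seen by a web server behind both load balancers:
#   client_ip("203.0.113.7, 10.0.1.25, 10.0.2.40")
```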

Page 15

Auto Scaling Group Size of 1

Use Cases

• Simple HA

• Tolerant to minutes of interruptions

• Management consoles

Design Notes

• Effective cost reduction

• Aware of EC2 failures

• Optional addition of ELB health checks

Failover Time

• Minutes, dependent on instance boot time and ELB monitoring
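A group of size 1 is simply an Auto Scaling group whose minimum, maximum, and desired capacity are all 1, so a failed instance is replaced rather than scaled. A minimal sketch of the parameters; the helper name and all resource names are hypothetical, and the dictionary is intended for a boto3 Auto Scaling client's `create_auto_scaling_group` call, which is not invoked here.

```python
def single_instance_group(name, launch_config, subnet_ids, elb_name=None,
                          grace_seconds=300):
    """Build create_auto_scaling_group parameters for a group pinned at one instance."""
    params = {
        "AutoScalingGroupName": name,
        "LaunchConfigurationName": launch_config,
        "MinSize": 1,
        "MaxSize": 1,
        "DesiredCapacity": 1,
        # Listing subnets in several Availability Zones lets the replacement
        # instance launch elsewhere if one zone is impaired.
        "VPCZoneIdentifier": ",".join(subnet_ids),
        "HealthCheckType": "EC2",
    }
    if elb_name:
        # Optional ELB health checks also catch application-level failures.
        params["LoadBalancerNames"] = [elb_name]
        params["HealthCheckType"] = "ELB"
        params["HealthCheckGracePeriod"] = grace_seconds
    return params


# Hypothetical usage:
#   autoscaling.create_auto_scaling_group(
#       **single_instance_group("mgmt-console", "mgmt-lc", ["subnet-a", "subnet-b"]))
```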

Page 16

Networking Options

Page 17

Networking Options

Design considerations:

• VPC API calls are eventually consistent

  • Test it!

• Relies upon user or partner built monitoring

  • Can happen ‘on box’ or ‘off box’

  • Who’s monitoring the monitor?

• Routes, interfaces, and EIPs point to one instance

  • Does this meet your scaling requirement?

Page 18

Floating Elastic Network Interface

Use Cases

• Stateful Applications

• Clustering

• Virtual IP emulation

Design Notes

• Inbound and Outbound Traffic

• Attach EIPs to the border instances for inbound traffic

• Monitoring between instances is required

• Single Availability Zone only

Failover Time

• Timing is subject to the attach-network-interface API request
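The failover step for a floating ENI can be sketched as detach, wait, attach. This is a minimal sketch under stated assumptions: `ec2` is a boto3 EC2 client (or a compatible object) supplied by the caller, the function name and default values are hypothetical, and the health monitoring that decides when to fail over is out of scope. The polling loop is there because VPC API calls are eventually consistent.

```python
import time


def move_eni(ec2, eni_id, standby_instance_id, device_index=1,
             poll_seconds=1.0, max_polls=30):
    """Detach an ENI from its current instance and attach it to the standby."""
    eni = ec2.describe_network_interfaces(
        NetworkInterfaceIds=[eni_id])["NetworkInterfaces"][0]
    attachment = eni.get("Attachment")
    if attachment:
        # Force the detach because the failed instance may not respond.
        ec2.detach_network_interface(
            AttachmentId=attachment["AttachmentId"], Force=True)
    # Wait until the interface reports "available" before attaching it again.
    for _ in range(max_polls):
        eni = ec2.describe_network_interfaces(
            NetworkInterfaceIds=[eni_id])["NetworkInterfaces"][0]
        if eni["Status"] == "available":
            break
        time.sleep(poll_seconds)
    else:
        raise TimeoutError("ENI %s did not become available" % eni_id)
    response = ec2.attach_network_interface(
        NetworkInterfaceId=eni_id,
        InstanceId=standby_instance_id,
        DeviceIndex=device_index)
    return response["AttachmentId"]
```

Both instances must sit in the same Availability Zone as the interface, which is the single-zone limitation listed in the design notes.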

Page 19

Floating Elastic IP Address

Use Cases

• Similar to Floating ENI

• EIPs are more granular to move

Design Notes

• Monitoring between instances is required

• Costs begin after remapping EIPs over 100 times per month

• EIPs can move between Availability Zones, but will change private addresses

Failover Time

• Timing is subject to the associate-address API request
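Moving a floating Elastic IP address is a single re-association call. A minimal sketch, assuming `ec2` is a boto3 EC2 client (or a compatible object) passed in by the caller; the function name and the identifiers in the usage comment are hypothetical.

```python
def move_eip(ec2, allocation_id, standby_eni_id, private_ip=None):
    """Re-associate an Elastic IP with the standby's network interface."""
    params = {
        "AllocationId": allocation_id,
        "NetworkInterfaceId": standby_eni_id,
        # Allow the address to move even though it is still associated
        # with the failed instance.
        "AllowReassociation": True,
    }
    if private_ip:
        # Choose which private address on the interface receives the EIP.
        params["PrivateIpAddress"] = private_ip
    return ec2.associate_address(**params)["AssociationId"]


# Hypothetical usage:
#   move_eip(ec2, "eipalloc-0123", "eni-standby")
```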

Page 20

Route Shifting

Use Cases

• Active-passive solutions in different Availability Zones

• Inline security services

Design Notes

• Outbound traffic

• No clustering or synchronization

• Monitoring between instances is required

• Multiple Availability Zones

Failover Time

• Timing is subject to the replace-route API request
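Route shifting repoints a route in each affected route table at the standby device's network interface. A minimal sketch, assuming `ec2` is a boto3 EC2 client (or a compatible object) passed in by the caller; the function name and identifiers are hypothetical, and failure detection is left to the monitoring the design notes call for.

```python
def shift_routes(ec2, route_table_ids, destination_cidr, standby_eni_id):
    """Point destination_cidr at the standby ENI in every listed route table."""
    shifted = []
    for route_table_id in route_table_ids:
        ec2.replace_route(
            RouteTableId=route_table_id,
            DestinationCidrBlock=destination_cidr,
            NetworkInterfaceId=standby_eni_id)
        shifted.append(route_table_id)
    return shifted


# Hypothetical usage: send all outbound traffic through the standby firewall.
#   shift_routes(ec2, ["rtb-az1", "rtb-az2"], "0.0.0.0/0", "eni-standby-fw")
```

Since the API is eventually consistent, a monitor may want to read the route tables back after the call before declaring the failover complete.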

Page 21

Transit VPC

Use Cases

• Connecting VPCs within a region and across accounts

• Centralize resources back on-premises

Design Notes

• Utilizes Cisco CSR

• Uses tags to automate VPC connectivity with Lambda

• Bandwidth bottleneck at approximately 1.5-2 Gbps

Failover Time

• BGP and DPD timers are 30 seconds

Page 22

Palo Alto VM-Series Auto-Scaling Firewall

Warby Warburton

Page 23

VM-Series Next-Generation Firewall on AWS

• Protect your AWS deployment from advanced cyberattacks

• Enforce policy consistency with centralized management

• Automate deployment and policy updates so security keeps pace with the business

[Diagram labels: AZ1b, Web1, DB1, Subnet1, Subnet2]

Page 24

Native AWS and PAN-OS/VM-Series Services Used

AWS Services

• CloudFormation Template: Automates full use case deployments

• S3: AWS service where bootstrapping files are stored

• CloudWatch: Consumes metrics and makes intelligent scale in/out decisions

• Lambda: Code as a service that pushes custom metrics to CloudWatch via XML API

• Auto Scaling Groups (ASG): The firewalls are members of a group that scales in/out based on custom metrics

PAN-OS/VM-Series Services

• PAN-OS Bootstrapping: Automates creation of fully configured firewall

• PAN-OS API: Enables delivery of custom metric to CloudWatch

• Panorama: Optional but highly recommended to simplify VM-Series management

Page 25

[Diagram labels: Region 1, AZ1, AZ2, External ELB, ASG1, ASG2, Internal ELB, Web ASG]

1. CFT deploys base topology

2. Initial firewalls are bootstrapped from S3

• Bootstrapping adds FWs to Panorama

Page 26

[Diagram labels: Region 1, AZ1, AZ2, External ELB, ASG1, ASG2, Internal ELB, Web ASG]

3. Standard metrics sent to CloudWatch

4. Alarm triggers ASG scale out

Page 27

[Diagram labels: Region 1, AZ1, AZ2, External ELB, ASG1, ASG2, Internal ELB, Web ASG]

5. Lambda function collects PAN-OS metrics via API

6. Custom metrics sent to CloudWatch

7. Alarm triggers FW ASG scale events

• Bootstrapping continues to add FWs to Panorama

• Lambda function removes FWs from Panorama

Page 28

[Diagram labels: Region 1, AZ1, AZ2, External ELB, Internal ELB, IELB VIP 1, IELB VIP 2, IELB VIP 3, IELB VIP 4, ASG1, ASG2, ASG3, ASG4, Web ASG]

8. Lambda function monitors for ELB VIP changes

9. Lambda function deploys new ASG with NAT rule for new VIP

Page 29

Advanced High Availability Designs

Page 30

Advanced High Availability Methods

• Overlay networks

• Services VPC

• Availability Zone VPN Mesh

Page 31

Overlays

Use Cases

• When simpler topologies don’t meet requirements

• Multicast

• Abstraction frameworks

Design Notes

• Limitless responsibility

• Security redesign

• Visibility and complexity problems

• Outbound only unless extended outside of the VPC

Failover Time

• Variable

Page 32

Services VPC

Use Cases

• Centralized firewalls, IDS/IPS

• WAN, Security or Shared Services

• Multiple VPCs

Design Notes

• Device must support VPN + NAT

• VPN overhead on devices

• VGW outbound is active/passive and has multi-gigabit bandwidth

• Supports multiple Availability Zones

• Scales to 8-10 VPCs due to VGW IP address overlap without VRFs

• Requires BGP policy design for symmetric routing

Failover Time

• BGP and DPD timers are 30 seconds

[Diagram labels: Internet, FW (AZ1), FW (AZ2), VPN, VGW, Application VPC (two shown)]

Diagram notes

• Incoming could be EIP, DNS, or Route 53

• Advertising 0.0.0.0/0 down, VPC advertising CIDR up

• AZ1 routes have a shorter path than AZ2
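The note that AZ1 routes have a shorter path than AZ2 describes one common way to keep routing symmetric: make the standby firewall's advertisement less attractive by lengthening its AS path. The sketch below is a deliberately simplified model that compares AS path length only; real BGP best-path selection evaluates other attributes first, and the function names, AS number, and next-hop labels are hypothetical.

```python
def prepend(as_path, local_as, times):
    """Return a copy of as_path with local_as prepended `times` more times."""
    return [local_as] * times + list(as_path)


def best_route(routes):
    """Pick the route with the shortest AS path, or None if none are left."""
    if not routes:
        return None
    return min(routes, key=lambda route: len(route["as_path"]))


# Hypothetical advertisements of 0.0.0.0/0 from the two firewalls:
az1 = {"next_hop": "fw-az1", "as_path": [65001]}
az2 = {"next_hop": "fw-az2", "as_path": prepend([65001], 65001, 2)}

preferred = best_route([az1, az2])   # the AZ1 firewall while it is healthy
after_failure = best_route([az2])    # the AZ2 firewall once AZ1 is withdrawn
```

The same preference has to be applied in the return direction, or traffic will leave through one firewall and come back through the other.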

Page 33

Availability Zone Overlay Mesh

Use Cases

• Encryption in transit

• Applications manage their own high availability

Design Notes

• Single or multiple devices per Availability Zone

• A device failure is equivalent to an Availability Zone failure

• Centralized management is recommended

• Cost of devices may be high

Failover Time

• Variable, depending on routing protocol and tunnel

[Diagram labels: Full Mesh VPN, Internet, WAN, On Premises, Production VPC, Staging VPC, Development VPC, DMZ VPC; AZ1/AZ2 with FW/FW shown four times]

Page 34

Case Studies

Page 35

Customer #1

Scale

• Using VPCs to segment production and development and different organizations – 8 VPCs total

Application mix

• Traffic will be a mix of TCP, UDP, and HTTP

Security

• Firewalls are required between VPCs and to the Internet

• Need centralized control

• Require 1 Gbps of private connectivity to on-premises

Page 36

Customer #1 – Services VPC with Direct Connect

[Diagram labels: Internet, FW (AZ1), FW (AZ2), VPN, VGW, Application VPC (two shown), Direct Connect, Private Virtual Interface, WAN, Datacenter]

• Security Groups within the VPC

• Spoke VPC routes point towards the VPN

• On-premises (RFC 1918) routes point towards Direct Connect

• Traffic to the Internet or other applications goes through the firewall

Page 37

Customer #2

Scale

• 2 Gbps to a single VPC

• Requires high availability and backup for failure

Application

• Mix of lift and shift applications and web applications

Security

• AWS is an ‘untrusted datacenter’; IPS to and from on-premises

• Use AWS Internet, but only for patches and AWS API calls

Page 38

Customer #2 – Encrypted Direct Connect and Outbound Proxy

• Instances have proxies set for outbound HTTP traffic

• Routes to on-premises split between firewalls with VPN connections

• Multiple firewalls and routes to handle load

• Firewalls handle approximately 1.5 Gbps

• Use ENI shifting for additional outbound high availability

[Diagram labels: Internet, AZ1, AZ2, VPN over Direct Connect, Backup VPN, Direct Connect, WAN, Datacenter, FW (four shown), Application Subnets, Outbound Proxy Subnets, URL]

Page 39

Closing Thoughts

• Pick the right design for your use case

• Think about inbound vs. outbound, scale, and auto-scaling

• Mix and match designs to meet requirements

• May require segmenting your applications

• Start simple, grow as you need

• Migrate from one design pattern to another

Page 40

Remember to complete your evaluations!

Page 41

Thank you!