Future-proofing Application Delivery at Yelp
Building and tuning traffic management for large web-scale applications
Hello! Who are these guys anyway?
Sarguru Mohan
Site Reliability Engineer
Yelp
Kris Beevers
Founder
NS1
Intelligent DNS & Traffic Management
A broad look at modern DNS & traffic management
What’s changed / changing?
Application delivery & traffic automation are deeply intertwined
• Today’s application architectures -- like Yelp’s -- are distributed, dynamic, and driven by real-time conditions
• As elastic infrastructure shifts, traffic needs to move with it
• Deep integration between traffic management and the application is key
What makes DNS a good place in the stack to take control over traffic?
• Ubiquitous: every client speaks DNS to find your application
• No application changes: you’re already using DNS to direct traffic
• Early: first indication a user needs to interact with your infrastructure, first opportunity to impact traffic
• Simple: mapping names to services – lightweight idea with lots of flexibility
Automate DNS & traffic management like everything else
• Software development & operations are increasingly intertwined -- DNS should be tightly integrated with deployments, scaling, testing & burn-in, etc.
• Demand comprehensive API control
– Zone files aren’t granular or expressive enough for modern traffic management
– Change propagation is a key consideration -- how long does an automated change take to propagate across global authoritative DNS (independent of TTL)?
Beware the pitfalls!
• Caching / TTL impose limitations
– Can’t control every connection like L7
– DNS is better for global load balancing than for local
– But! Can still impact availability /
• Modern DNS traffic management doesn’t translate across providers
– (Yet)
– Different approaches / semantics
– Lose zone transfer & traditional approaches for redundancy
– Think about this up front -- today’s internet demands DNS network redundancy
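The caching / TTL pitfall can be put into rough numbers. The sketch below is an illustration only: it assumes resolvers honor the TTL and that their cache refresh times are spread evenly across one TTL window, which real resolver populations only approximate. The function names are hypothetical.

```python
def worst_case_shift_seconds(propagation_s: float, ttl_s: float) -> float:
    """Upper bound on how long a TTL-honoring resolver can keep serving
    the old answer: it may have cached the record just before the change
    reached the authoritative servers, then holds it for a full TTL."""
    return propagation_s + ttl_s


def stale_fraction(elapsed_s: float, ttl_s: float) -> float:
    """Estimated share of resolver caches still holding the old answer
    `elapsed_s` seconds after the change is live everywhere.

    Assumption (not from the talk): cache expiry times are uniformly
    distributed over one TTL window."""
    if ttl_s <= 0:
        return 0.0
    return max(0.0, 1.0 - elapsed_s / ttl_s)


if __name__ == "__main__":
    # Hypothetical numbers: 5 s provider propagation, 60 s TTL.
    print(worst_case_shift_seconds(5, 60))
    print(stale_fraction(15, 60))
```

Under these assumptions, the TTL sets the floor on how quickly DNS alone can move traffic, which is why DNS suits global steering better than per-connection decisions.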
DNS based traffic management enables powerful automation
• Health checks & geo-routing: still important, but we can do better
• Load shedding: optimize infrastructure utilization with thin provisioning
• Real-time performance management: traffic routing based on RUM telemetry -- bust the geo approximation & optimize real perf metrics
• Flexibility around datacenter / infrastructure elasticity, migration, expansion: weighting, stickiness, etc.
• Network-based control: route specific ISPs/prefixes
• Data ingestion, aggregation, and propagation drive global traffic automation
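The techniques listed above (health checks, load shedding, performance-based routing, weighting) can be pictured as a chain of filters applied to the candidate answers for each DNS query. The following is a minimal sketch of that idea, not NS1's or Yelp's implementation: the `Endpoint` fields, thresholds, and function names are all hypothetical, and load shedding is simplified to a hard cutoff at a single watermark.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Endpoint:
    address: str
    up: bool = True        # result of the latest health check
    load: float = 0.0      # current utilization, 0.0 to 1.0
    rtt_ms: float = 0.0    # latency reported by RUM telemetry
    weight: float = 1.0    # relative share of traffic


def filter_up(endpoints: List[Endpoint]) -> List[Endpoint]:
    """Drop unhealthy endpoints; fail open if everything looks down."""
    healthy = [e for e in endpoints if e.up]
    return healthy or list(endpoints)


def shed_load(endpoints: List[Endpoint], high_watermark: float = 0.9) -> List[Endpoint]:
    """Drop endpoints at or above the watermark; fail open if all are hot."""
    below = [e for e in endpoints if e.load < high_watermark]
    return below or list(endpoints)


def prefer_fastest(endpoints: List[Endpoint], tolerance_ms: float = 10.0) -> List[Endpoint]:
    """Keep endpoints within `tolerance_ms` of the best measured latency."""
    best = min(e.rtt_ms for e in endpoints)
    return [e for e in endpoints if e.rtt_ms <= best + tolerance_ms]


def answer(endpoints: List[Endpoint], rng: Optional[random.Random] = None) -> str:
    """Run the filter chain, then pick one survivor by weight."""
    if not endpoints:
        raise ValueError("no endpoints configured")
    rng = rng or random.Random()
    candidates = prefer_fastest(shed_load(filter_up(endpoints)))
    weights = [e.weight for e in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0].address
```

Each filter fails open, so a telemetry glitch that marks every endpoint as down or overloaded still yields an answer rather than an empty response.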
TLDR
Lots going on in modern DNS & traffic management.
Let’s look at Yelp’s use case.
Yelp’s Mission: Connecting people with great local businesses.
Yelp Stats
As of Q1 2016
90M 102M
And some more stats
6 datacenters 5 continents
Goals
• Availability
• Performance
Challenges
• Balance traffic across multiple edges.
• Actively monitor edges and respond to events.
• Respond to changes in elastic infrastructure
The “edge” stack
Why is DNS critical here?
• Public-facing DNS records resolve to the CDN’s anycast address.
• Users are routed to the nearest PoP.
• The CDN is configured to route requests to our “backend” DNS records.
What’s behind the edge
• Load Balancers
• Webservers
• 100s of micro-services
• SmartStack for service discovery
What’s behind the edge
• Hardware boxes.
• EC2 instances.
• Auto Scaling Groups.
• Spot Fleets.
DNS & Elastic Infrastructure
• Elastic Infra demands Intelligent DNS
• Intelligent = Fast
• Intelligent = Flexible
• Intelligent = API Driven
Let’s Talk About Infra Automation
So how do we launch these AMIs?
• AWS Web console? !!!
• Shell script?
• clops
Terraform
• Declarative Infrastructure
• “Self Documenting”
• Infra changelog
• Reproducible Infrastructure
Terraform at Yelp
• Base AWS Infrastructure.
• Front-end Infrastructure
• Datastores
• Datapipeline
• Batch systems
• DNS
DNS @ Pre-terraformic days
• BIND
• Web UI
• Script reading YAML for traffic management.
Terraform at Yelp
• 4 custom providers
– aws wrapper
– git
– nsone
– ddns
• 16 custom resources
Terraforming DNS
• So we wrote a Go API client.
• And a custom Terraform provider too!
Why not X?
• Web UI??
• BIND??
• Yet Another Script??
Why Terraform?
• Common tool
• Source of Truth
• Declaratively describe your complete infrastructure.
• Remote states <3
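The "declarative" and "source of truth" points can be illustrated with a small planning step: compare the records you want against the records that exist, and emit only the changes needed. This is a conceptual sketch in Python, not Terraform's implementation or the provider's code; the record layout and the `plan` function are hypothetical.

```python
from typing import Dict, List, Tuple

RecordKey = Tuple[str, str]             # (fully qualified name, record type)
RecordSet = Dict[RecordKey, List[str]]  # key -> list of answers
Change = Tuple[str, RecordKey, List[str]]


def plan(desired: RecordSet, actual: RecordSet) -> List[Change]:
    """Return the create/update/delete actions that turn `actual` into
    `desired`. Answer order is ignored when comparing records."""
    changes: List[Change] = []
    for key in sorted(desired.keys() | actual.keys()):
        want, have = desired.get(key), actual.get(key)
        if have is None:
            changes.append(("create", key, want))
        elif want is None:
            changes.append(("delete", key, have))
        elif sorted(want) != sorted(have):
            changes.append(("update", key, want))
    return changes


if __name__ == "__main__":
    # Example data uses documentation-only names and addresses.
    desired = {("www.example.com", "A"): ["192.0.2.1", "192.0.2.2"]}
    actual = {("www.example.com", "A"): ["192.0.2.1"]}
    for change in plan(desired, actual):
        print(change)
```

Because the desired state lives in version control, the plan doubles as a reviewable changelog: an empty plan means the live zone already matches the source of truth.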
Show me the Code!
So how does automation help here?
• Test/deploy new features
• IP Fencing
• ALIAS records
• CDN gets DDoSed?
– Let go of it!
– Keep your TTLs reasonable.
– Use multiple CDNs.
• DNS provider gets DDoSed?
– Use multiple DNS providers.
– Automation to keep them in sync.
– vendor_agnostic_tools++
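Keeping several DNS providers in sync is easier when each one sits behind the same small interface. The sketch below shows one possible shape for that, under stated assumptions: `DNSProvider`, `InMemoryProvider`, and `sync` are hypothetical names, and the in-memory class stands in for real provider API clients, which are not modeled here.

```python
from typing import Dict, Iterable, List, Optional, Protocol, Tuple

RecordKey = Tuple[str, str]             # (fully qualified name, record type)
RecordSet = Dict[RecordKey, List[str]]  # key -> list of answers


class DNSProvider(Protocol):
    """Minimal vendor-agnostic interface (hypothetical)."""
    def list_records(self) -> RecordSet: ...
    def put_record(self, key: RecordKey, answers: List[str]) -> None: ...
    def delete_record(self, key: RecordKey) -> None: ...


class InMemoryProvider:
    """Stand-in for a real provider client, for illustration and tests."""
    def __init__(self, records: Optional[RecordSet] = None) -> None:
        self._records: RecordSet = dict(records or {})

    def list_records(self) -> RecordSet:
        return {k: list(v) for k, v in self._records.items()}

    def put_record(self, key: RecordKey, answers: List[str]) -> None:
        self._records[key] = list(answers)

    def delete_record(self, key: RecordKey) -> None:
        self._records.pop(key, None)


def sync(desired: RecordSet, providers: Iterable[DNSProvider]) -> int:
    """Push `desired` to every provider; return the number of changes made."""
    changes = 0
    for provider in providers:
        actual = provider.list_records()
        for key, answers in desired.items():
            if sorted(actual.get(key, [])) != sorted(answers):
                provider.put_record(key, answers)
                changes += 1
        for key in actual.keys() - desired.keys():
            provider.delete_record(key)
            changes += 1
    return changes
```

Running `sync` a second time should report zero changes, which makes it a cheap drift check between providers. Provider-specific traffic-management features do not fit such a lowest-common-denominator interface, which is the portability pitfall raised earlier in the talk.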
Problems
• Terraform config is verbose and unexpressive for some zones compared to BIND
• Sometimes Terraform’s remote state refresh step is not fast enough.
Future of DNS at Yelp
• Automated traffic distribution management.
• Load shedding.
• Continuous Integration and Delivery.
Code
https://github.com/bobtfish/terraform-provider-nsone
https://github.com/bobtfish/go-nsone-api
Thanks!!
@sargru90
@YelpCareers
@YelpEngineering
github.com/yelp
engineeringblog.yelp.com
@beevek
@nsoneinc
github.com/ns1
ns1.com/blog