Jimmy Halim, jhalim@cloudflare.com
IDNOG3, Jakarta, 28 July 2016
Building & Managing 80+ PoPs
Overview of CloudFlare
● 4+ million zones/domains
● 43+ billion DNS queries/day
● How?
  ○ Orange cloud
  ○ Globally distributed network in 80+ locations, still growing fast!
  ○ Anycast routing
Protect and accelerate any website online
Benefits of the orange cloud
● Direct visitors to the nearest entry point
  ○ Fast!
    ■ Fewer hops
    ■ Reduced latency
    ■ Improved performance
● Save bandwidth!
  ○ Fewer requests to the origin
    ■ Typically 50% of the resources on any given web page are cacheable
  ○ Mitigate malicious visitors or DDoS attacks
    ■ Stop them before they reach the origin web server
● Resiliency
  ○ 80+ locations!
Grey cloud vs orange cloud
Building like crazy
1 new PoP per week!
Strategic Planning
● Agreement/Negotiation
● Location
  ○ Peering Exchanges
  ○ Cost
  ○ Support
● Size
  ○ Traffic analysis
    ■ Number of Racks
    ■ Equipment types
    ■ Transits/Peering Exchanges
      ● How many?
      ● How big are the pipes?
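Once traffic analysis yields a peak figure, "How many?" and "How big are the pipes?" reduce to simple arithmetic. A minimal sketch, assuming a hypothetical planning rule that every port stays at or below a target utilization at peak; the function name, the 50% target and the traffic numbers are illustrative, not CloudFlare's actual planning values.

```shell
#!/bin/sh
# Hypothetical sizing helper: how many ports of a given speed keep peak
# traffic at or below a target utilization on every port?
# Usage: ports_needed <peak_mbps> <port_mbps> <max_util_percent>
ports_needed() {
  peak=$1
  port=$2
  util=$3
  usable=$(( port * util / 100 ))   # usable Mbps per port at the target
  if [ "$usable" -le 0 ]; then
    echo "usable capacity per port is zero" >&2
    return 1
  fi
  # Integer ceiling division: (a + b - 1) / b
  echo $(( (peak + usable - 1) / usable ))
}

# Example: 35 Gbps peak on 10G ports kept at or below 50% utilization
ports_needed 35000 10000 50
```

With these inputs it prints 7. A lower target utilization raises the port count but leaves more headroom when traffic shifts in from a neighbouring PoP.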
Challenges
● Installation
  ○ Regulation
    ■ Import policy
  ○ Transits
    ■ Different carriers have different setups/policies
  ○ Language barriers
● Human factors
  ○ Configuration errors!
    ■ Anycast
● Traffic turnup
  ○ How to ensure it has no negative impact
    ■ No outages please!
Solutions
● An out-of-band network is a must!
  ○ Acts as the last resort
  ○ Upgrades/downgrades
  ○ Maintenance
● Configuration templates
  ○ Auto-configuration
    ■ Anycast!
  ○ Peer review
● Global Network Engineering
  ○ Round-the-clock deployment
    ■ Fewer bottlenecks
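Templates are what make auto-configuration and peer review practical: one template renders every PoP, so reviewers check one change instead of many hand-edited configs. A minimal sketch, assuming a hypothetical prefix-list name, hostname and RFC 5737 documentation prefixes; it emits Junos-style `set` lines, but the slides do not show the real template contents.

```shell
#!/bin/sh
# Hypothetical anycast template: render the same prefix list for any PoP.
# Usage: render_anycast <hostname> <prefix-list-name> < prefixes.txt
render_anycast() {
  host=$1
  list=$2
  echo "set system host-name $host"
  while read -r prefix; do
    [ -n "$prefix" ] || continue   # skip blank lines
    echo "set policy-options prefix-list $list $prefix"
  done
}

printf '192.0.2.0/24\n198.51.100.0/24\n' | render_anycast edge01-cgk ANYCAST-V4
```

Rendered output differs between PoPs only in its inputs. Diffing it before and after a template change therefore gives the peer reviewer a small, concrete change to approve.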
Testing with providers
● Circuit testing
  ○ Point-to-point extended ping test
    ■ Test all physical ports
  ○ Failover testing
    ■ Redundancy
      ● Do not create a blackhole instead!
● Use a testing prefix
  ○ Global versus domestic reachability
    ■ RIPE Atlas measurements
    ■ Public route servers
  ○ Correct BGP configuration
    ■ It does what it is supposed to do
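A script can apply the point-to-point extended ping test so that every physical port gets the same pass/fail rule. A minimal sketch, assuming Linux iputils `ping` output (a summary line containing "N% packet loss") and a hypothetical 0.1% loss budget; the function names and threshold are illustrative.

```shell
#!/bin/sh
# Extract the loss percentage from ping's summary line (read from stdin).
ping_loss() {
  sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Succeed only if measured loss is at or below the budget in $1 (percent).
# A missing summary line (e.g. ping never ran) counts as a failure.
circuit_ok() {
  loss=$(ping_loss)
  awk -v l="$loss" -v max="$1" 'BEGIN { exit !(l != "" && l + 0 <= max + 0) }'
}

# Real use, once per port (large packets, many probes):
#   ping -c 1000 -s 1472 192.0.2.1 | circuit_ok 0.1 && echo PASS || echo FAIL
echo '1000 packets transmitted, 1000 received, 0% packet loss, time 999ms' |
  circuit_ok 0.1 && echo PASS || echo FAIL
```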
Traffic Turnup
● Do not announce all prefixes in one go!
  ○ Start with a few prefixes
  ○ Check the routing to these few prefixes
    ■ Global traffic analysis
      ● No big drop of traffic in other locations
      ● Traffic comes from the expected countries
  ○ Monitor for 24 hours
    ■ Confirm that no anomalies are observed
      ● At the new location
      ● Globally
  ○ Announce all prefixes
    ■ In batches
    ■ Repeat the same steps above for each batch!
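Two small helpers can support the batch-and-verify loop. One splits the prefix list into announcement batches. The other flags existing PoPs whose traffic fell by more than a threshold after a batch went out. A minimal sketch: the batch size, the 20% drop threshold, the PoP codes and the traffic figures are all hypothetical, and the operator still performs the actual announcements and the 24-hour monitoring.

```shell
#!/bin/sh
# Split prefixes (one per line on stdin) into numbered batches of size $1.
make_batches() {
  xargs -n "$1" | awk '{ printf "batch %d: %s\n", NR, $0 }'
}

# Flag PoPs whose traffic dropped by more than $1 percent.
# Input lines: <pop> <mbps_before> <mbps_after>
flag_drops() {
  awk -v max="$1" '$2 > 0 && ($2 - $3) * 100 / $2 > max + 0 { print $1 }'
}

printf '192.0.2.0/24\n198.51.100.0/24\n203.0.113.0/24\n' | make_batches 2
# After each batch: monitor (e.g. 24 hours), then compare per-PoP traffic.
printf 'sin 8000 7600\nhkg 5000 3500\n' | flag_drops 20
```

Nearby PoPs normally lose some traffic once the new PoP starts attracting visitors, so the threshold is a judgment call. It should catch big, unexpected drops, not the expected shift.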
Traffic Turnup
● Get the providers involved
  ○ Especially if the PoP is single-homed
  ○ Inform them of the schedule
    ■ So they understand what to expect
    ■ To troubleshoot and fix problems faster!
  ○ Their users may spot problems sooner
Managing 80+ PoPs
● 80+ locations
● 500+ transit/exchange ports
● 500+ network devices
● Countless alerts!
Challenges
Building a Resilient Network
● Stable hardware and software
● Automatic configuration templates/peer review
● Solid monitoring system
● Network automation
● Global network engineering
Hardware and Software
● Proper evaluation and testing
  ○ Fits requirements
  ○ Bug-free
  ○ Scalable
● Global standardization
  ○ Same hardware models
  ○ Same software versions
● No mass software upgrades!
  ○ Small PoPs first
  ○ Deploy in batches
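Both "same software versions" and "small PoPs first, deploy in batches" are easy to check mechanically. A minimal sketch with hypothetical inputs: a device/version inventory and a PoP/traffic list, with made-up names, versions and numbers.

```shell
#!/bin/sh
# List devices not running the standard version $1.
# Input lines: <device> <version>
nonstandard() {
  # Append "" to force a string comparison, so "15.1" and "15.10" differ.
  awk -v want="$1" '$2 "" != want "" { print $1 }'
}

# Order PoPs from least to most traffic and group them into waves of $1.
# Input lines: <pop> <mbps>
upgrade_waves() {
  sort -k2,2n | awk '{ print $1 }' | xargs -n "$1"
}

printf 'edge01-cgk 15.1R5\nedge02-cgk 14.2R7\n' | nonstandard 15.1R5
printf 'lhr 90000\nsin 40000\ncgk 2000\nbkk 5000\n' | upgrade_waves 2
```

Each output line of `upgrade_waves` is one upgrade wave. The smallest PoPs go first, so a bad release surfaces where it affects the least traffic.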
Solid Monitoring System
● Reduce unwanted alerts
  ○ Only receive relevant alerts
  ○ Silence PoPs/ports during maintenance
● Monitor the performance of transit providers
  ○ Detect packet loss on their backbones
  ○ Automatically provide related traceroutes
  ○ Take actions based on severity
    ■ Disable the PoP automatically
    ■ Disable traffic on the affected transit provider automatically
    ■ Suggest actions to take
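Maintenance silencing and severity-based actions can be expressed as two small rules. A minimal sketch under loud assumptions. The maintenance-list format, the 5% and 50% loss thresholds, and the device and transit names are all hypothetical. The commands are only printed (a dry run) in the Salt syntax from the NAPALM-Salt examples in this deck. It also assumes that `anycast.disable` is what takes a PoP out of rotation, which the slides do not spell out.

```shell
#!/bin/sh
# Succeed if a PoP, or one of its ports, is in the maintenance list on stdin.
# Hypothetical list format: "<pop>" silences the whole PoP,
# "<pop>:<port>" silences a single port.
in_maintenance() {
  grep -qxF -e "$1" -e "$1:$2"
}

# Map transit packet loss (percent) to an action; thresholds are hypothetical.
# Commands are printed, not executed (dry run).
# Usage: loss_action <loss_percent> <device> <transit_name>
loss_action() {
  awk -v loss="$1" -v dev="$2" -v transit="$3" 'BEGIN {
    if (loss + 0 >= 50)     printf "salt %s anycast.disable\n", dev
    else if (loss + 0 >= 5) printf "salt %s transit.disable %s\n", dev, transit
    else if (loss + 0 > 0)  printf "suggest: review traceroutes for %s on %s\n", transit, dev
  }'
}

if printf 'sin\nlhr:xe-0/0/1\n' | in_maintenance lhr xe-0/0/1; then
  echo "alert silenced"
fi
loss_action 7.5 edge01-cgk transit-a
```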
Alerts Channel and Dashboard
Network Automation
● Open source recipe: napalm-salt
  ○ RIPE 72: Network Automation with Salt and NAPALM, Mircea Ulinic, CloudFlare
Network Automation | NAPALM-Salt (examples)
● salt "edge*" net.cli "show version"
● salt -G "os:junos" net.cli "show chassis hardware"
● salt -G "os:iosxr" net.arp
● salt-run net.find [target_device]
● salt-run net.find [mac_address]
● salt-run bgp.neighbors [bgp_asn]
● salt [target_device] [anycast.disable | anycast.enable]
● salt [target_device] [transit.disable | transit.enable] [transit_name]
Global Network Engineering
● Follow-the-sun approach
  ○ San Francisco -> Singapore -> London -> San Francisco
● Doing everything
  ○ Technical operations
  ○ Network engineering
  ○ Network expansion projects
  ○ New PoP deployments
  ○ Peering
● Very fast response to network issues and escalations
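The follow-the-sun rotation makes ownership a function of the clock. A minimal sketch, assuming hypothetical equal 8-hour shifts and ignoring daylight saving time. Each shift then falls roughly in 08:00-16:00 local standard time (SGT is UTC+8, London UTC+0, San Francisco UTC-8). The real shift boundaries are not given in the slides.

```shell
#!/bin/sh
# Which office owns the shift at a given UTC hour (0-23)?
# Hypothetical shifts, handing over San Francisco -> Singapore -> London:
#   00-07 UTC Singapore, 08-15 UTC London, 16-23 UTC San Francisco
on_shift() {
  h=$1
  if [ "$h" -lt 0 ] || [ "$h" -gt 23 ]; then
    echo "invalid hour: $h" >&2
    return 1
  fi
  if [ "$h" -lt 8 ]; then
    echo "Singapore"
  elif [ "$h" -lt 16 ]; then
    echo "London"
  else
    echo "San Francisco"
  fi
}

on_shift "$(date -u +%H)"
```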
Statistics
Statistics for Indonesia
Q&A