21 - IDNOG03 - Jimmy Halim (Cloudflare) - Brief Introduction of CloudFlare, the routing and benefit...


Jimmy Halim
IDNOG3
jhalim@cloudflare.com
Jakarta, 28 July 2016

Building & Managing 80+ PoPs

Overview of CloudFlare

● 4+ million zones/domains
● 43+ billion DNS queries/day
● How?
○ Orange cloud
○ Globally distributed network in 80+ locations (still growing fast!)
○ Anycast routing

Protect and accelerate any website online

Benefits of the orange cloud
● Directs visitors to the nearest entry point
○ Fast!
■ Fewer hops
■ Reduced latency
■ Improved performance
● Saves bandwidth!
○ Fewer requests to the origin
■ Typically 50% of the resources on any given web page are cacheable
○ Mitigates malicious visitors and DDoS attacks
■ Stops them before they reach the origin web server
● Resiliency
○ 80+ locations!
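Anycast works because every PoP announces the same prefixes, and each visitor's network keeps the best BGP route it hears. "Nearest" therefore means nearest in routing terms, not necessarily geographically. The sketch below illustrates this under a strong simplifying assumption: only AS-path length is compared. The PoP codes and AS paths are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """One announcement of the anycast prefix, as heard by a visitor's network."""
    pop: str          # PoP that originated the announcement (hypothetical codes)
    as_path: tuple    # ASNs the announcement traversed, nearest neighbor first


def best_route(routes):
    """Pick the route with the shortest AS path.

    Deliberately simplified best-path selection: only AS-path length is
    compared, and ties fall back to the PoP code so the result is
    deterministic. Real BGP also weighs local preference, MED, IGP cost, etc.
    """
    if not routes:
        raise ValueError("no route to the anycast prefix")
    return min(routes, key=lambda r: (len(r.as_path), r.pop))


# The same anycast prefix heard from three PoPs. ASNs 64496-64511 are the
# documentation range (RFC 5398); 13335 is Cloudflare's own ASN.
seen = [
    Route("SIN", (64496, 64500, 13335)),
    Route("CGK", (64497, 13335)),
    Route("LAX", (64496, 64498, 64499, 13335)),
]
print(best_route(seen).pop)  # CGK: a two-AS path beats three and four
```

In practice this is why a visitor can occasionally land on a geographically distant PoP: the route with the best BGP attributes wins, not the closest city.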

Grey cloud vs orange cloud

Building like crazy

1 new PoP per week!

Strategic Planning

● Agreement/Negotiation
● Location
○ Peering exchanges
○ Cost
○ Support
● Size
○ Traffic analysis
■ Number of racks
■ Equipment types
■ Transits/peering exchanges
● How many?
● How big are the pipes?
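Once traffic analysis gives an expected peak, "how big are the pipes?" comes down to simple arithmetic. A minimal sketch, assuming a fixed port speed, a target peak utilization, and one spare port for redundancy; all three defaults are hypothetical planning values, not figures from the talk.

```python
import math


def ports_needed(peak_gbps, port_gbps=10, max_utilization=0.5, redundancy=1):
    """Estimate how many ports of one speed a PoP needs toward a provider.

    peak_gbps: expected peak traffic from the traffic analysis
    port_gbps: speed of a single port (hypothetical default: 10G)
    max_utilization: keep each port below this fraction at peak (hypothetical 50%)
    redundancy: extra ports so a single failure does not congest the rest
    """
    usable_per_port = port_gbps * max_utilization
    return math.ceil(peak_gbps / usable_per_port) + redundancy


print(ports_needed(12))  # 12 / 5 = 2.4 -> 3 ports, plus 1 spare = 4
```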

Challenges

● Installation
○ Regulation
■ Import policy
○ Transits
■ Different carriers have different setups/policies
○ Language barriers
● Human factors
○ Configuration errors!
■ Anycast
● Traffic turnup
○ How to ensure it has no impact
■ No outages please!

Solutions

● An out-of-band network is a must!
○ Acts as the last resort
○ Upgrades/downgrades
○ Maintenance
● Configuration templates
○ Automatic configuration
■ Anycast!
○ Peer review
● Global Network Engineering
○ Round-the-clock deployment
■ Fewer bottlenecks
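Configuration templates keep every PoP, and in particular every anycast-related stanza, identical in shape, and the rendered text is what goes to peer review. The following is a minimal sketch using Python's string.Template. The stanza syntax and field names are hypothetical and do not match any vendor's exact format.

```python
from string import Template

# Hypothetical, vendor-neutral BGP neighbor stanza. A real template would be
# written in each platform's own syntax (for example Junos or IOS XR).
NEIGHBOR = Template(
    "neighbor $peer_ip\n"
    "  remote-as $peer_asn\n"
    "  group $group\n"
    '  description "$description"\n'
)


def render_neighbors(neighbors):
    """Render one stanza per neighbor record.

    Template.substitute raises KeyError when a placeholder has no value, so
    an incomplete record fails loudly instead of producing a partial config.
    """
    return "".join(NEIGHBOR.substitute(n) for n in neighbors)


print(render_neighbors([
    {"peer_ip": "192.0.2.1", "peer_asn": 64500,
     "group": "transit", "description": "Transit A"},
]))
```

Failing on missing fields, rather than substituting blanks, is the property that makes automatic configuration safe to combine with human peer review: reviewers only ever see complete output.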

Testing with providers

● Circuit testing
○ Point-to-point extended ping test
■ Test all physical ports
○ Failover testing
■ Redundancy
● Do not create a blackhole instead!
● Use a test prefix
○ Global versus domestic
■ RIPE Atlas measurements
■ Public route servers
○ Correct BGP configuration
■ It does what it is supposed to do
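An extended ping test is only useful with an agreed pass/fail rule. The sketch below assumes the results are already collected as a list of round-trip times, with None for a lost probe, and summarizes a run over one port. The zero-loss default and the optional latency ceiling are hypothetical acceptance criteria to agree with each provider.

```python
from statistics import mean


def evaluate_ping(rtts_ms, max_loss_pct=0.0, max_avg_ms=None):
    """Summarize an extended ping run over one port of a new circuit.

    rtts_ms: one entry per probe, the round-trip time in ms or None if lost
    max_loss_pct: acceptance threshold (hypothetical default: zero loss)
    max_avg_ms: optional latency ceiling agreed with the provider
    """
    if not rtts_ms:
        raise ValueError("no probes were sent")
    replies = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(replies)) / len(rtts_ms)
    avg_ms = mean(replies) if replies else None
    latency_ok = max_avg_ms is None or (avg_ms is not None and avg_ms <= max_avg_ms)
    return {
        "sent": len(rtts_ms),
        "loss_pct": loss_pct,
        "avg_ms": avg_ms,
        "passed": loss_pct <= max_loss_pct and latency_ok,
    }


print(evaluate_ping([10.0, 12.0, None, 14.0]))
# {'sent': 4, 'loss_pct': 25.0, 'avg_ms': 12.0, 'passed': False}
```

Running it once per physical port covers "test all physical ports". Failover testing still needs a link to be taken down on purpose, to confirm that traffic moves to the redundant path rather than into a blackhole.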

Traffic Turnup

● Do not announce all prefixes in one go!
○ Start with a few prefixes
○ Check the routing to these few prefixes
■ Global traffic analysis
● No big drop of traffic in other locations
● Traffic comes from the right countries
○ Monitor for 24 hours
■ Confirm that no anomalies are observed
● At the new location
● Globally
○ Announce all prefixes
■ In batches
■ Repeat the same steps above!
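The two global checks on the Traffic Turnup slide can be expressed as a small function run after each batch. This sketch makes two assumptions: per-location traffic and per-country shares come from an existing traffic-analysis system, and both thresholds are hypothetical. Because some traffic is expected to move to the new PoP, the drop threshold is a judgment call rather than a fixed rule.

```python
def turnup_anomalies(before_gbps, after_gbps, new_pop, new_pop_countries,
                     expected_countries, max_drop=0.3, max_foreign_share=0.05):
    """List anomalies seen after announcing a batch of prefixes at a new PoP.

    before_gbps, after_gbps: traffic per existing location, same unit
    new_pop: code of the PoP being turned up (excluded from the drop check)
    new_pop_countries: share of the new PoP's traffic per source country
    expected_countries: countries the new PoP is meant to serve
    max_drop, max_foreign_share: hypothetical thresholds (30% and 5%)
    """
    anomalies = []
    # 1. No big drop of traffic in other locations.
    for pop, before in sorted(before_gbps.items()):
        if pop == new_pop or before <= 0:
            continue
        drop = (before - after_gbps.get(pop, 0.0)) / before
        if drop > max_drop:
            anomalies.append(f"{pop}: traffic dropped {drop:.0%}")
    # 2. Traffic at the new PoP comes from the right countries.
    for country, share in sorted(new_pop_countries.items()):
        if country not in expected_countries and share > max_foreign_share:
            anomalies.append(
                f"{new_pop}: {share:.0%} of traffic from unexpected country {country}")
    return anomalies


before = {"SIN": 100.0, "HKG": 80.0}   # hypothetical Gbps before the batch
after = {"SIN": 60.0, "HKG": 78.0}     # hypothetical Gbps after the batch
print(turnup_anomalies(before, after, "CGK", {"ID": 0.9, "BR": 0.1}, {"ID"}))
```

An empty list after the 24-hour watch is the signal to announce the next batch. Any entry is a reason to withdraw and investigate first.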

Traffic Turnup

● Get the providers involved
○ Especially if the location is single-homed
○ Inform them of the schedule
■ Get them to understand what to expect
■ Troubleshoot and fix problems faster!
○ Their users may spot problems sooner

Managing 80+ PoPs

● 80+ locations
● 500+ transit/exchange ports
● 500+ network devices
● Countless alerts!

Challenges

Building a Resilient Network

● Stable hardware and software
● Automatic configuration templates/peer review
● Solid monitoring system
● Network automation
● Global network engineering

Hardware and Software

● Proper evaluation and testing
○ Fits requirements
○ Bug-free
○ Scalable
● Global standardization
○ Same hardware models
○ Same software versions
● No mass software upgrades!
○ Small PoPs first
○ Deploy in batches
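"Small PoPs first, deploy in batches" translates into an ordering plus a stop condition. The following is a minimal sketch. It assumes that traffic volume is the measure of PoP size and that the operator's own tooling supplies the upgrade and health-check callbacks, which are hypothetical here.

```python
def upgrade_batches(pop_traffic, batch_size=3):
    """Order PoPs from least to most traffic and split them into batches.

    pop_traffic: mapping of PoP code to typical traffic (any consistent unit)
    batch_size: hypothetical number of PoPs upgraded per maintenance window
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    ordered = sorted(pop_traffic, key=lambda pop: (pop_traffic[pop], pop))
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]


def roll_out(batches, upgrade, healthy):
    """Upgrade batch by batch and stop at the first unhealthy batch.

    upgrade, healthy: hypothetical callbacks taking a PoP code.
    Returns (PoPs completed, failing batch or None).
    """
    done = []
    for batch in batches:
        for pop in batch:
            upgrade(pop)
        if not all(healthy(pop) for pop in batch):
            return done, batch  # halt before touching any larger PoP
        done.extend(batch)
    return done, None


traffic = {"LHR": 300, "CGK": 20, "SIN": 250, "MNL": 15, "SFO": 400}  # hypothetical
print(upgrade_batches(traffic, batch_size=2))
# [['MNL', 'CGK'], ['SIN', 'LHR'], ['SFO']]
```

Stopping at the first unhealthy batch means a software bug surfaces on the smallest PoPs, where the blast radius is lowest.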

Solid Monitoring System

● Fewer unwanted alerts
○ Only relevant alerts are raised
○ Silence PoPs/ports during maintenance
● Monitor the performance of transit providers
○ Detects packet loss on their backbones
○ Automatically provides related traceroutes
○ Takes actions based on severity
■ Automatically disables the PoP
■ Automatically disables traffic on the affected transit provider
■ Suggests actions to take
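The two monitoring ideas above, silencing during maintenance and acting on severity, can be combined in one alert handler. This is only a sketch. The alert fields, the maintenance-window format, and especially the mapping from severity to action are hypothetical, chosen to mirror the three actions listed on the slide.

```python
from datetime import datetime, timezone

# Hypothetical mapping from severity to the automatic action, mirroring the
# three actions on the slide. Unknown severities fall back to a suggestion.
ACTIONS = {
    "critical": "disable_pop",
    "major": "disable_transit",
    "minor": "suggest_action",
}


def handle_alert(alert, maintenance, now=None):
    """Return the action for one alert, or None if it is silenced.

    alert: dict with 'pop', 'port' (may be None) and 'severity'
    maintenance: list of (pop, port, start, end) tuples with aware datetimes;
        port None silences the whole PoP, and end is exclusive
    """
    if now is None:
        now = datetime.now(timezone.utc)
    for pop, port, start, end in maintenance:
        same_target = alert["pop"] == pop and port in (None, alert["port"])
        if same_target and start <= now < end:
            return None  # planned work: do not page anyone
    return ACTIONS.get(alert["severity"], "suggest_action")
```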

Alerts Channel and Dashboard

Network Automation | NAPALM-Salt (examples)
● salt "edge*" net.cli "show version"
● salt -G "os:junos" net.cli "show chassis hardware"
● salt -G "os:iosxr" net.arp
● salt-run net.find [target_device]
● salt-run net.find [mac_address]
● salt-run bgp.neighbors [bgp_asn]
● salt [target_device] [anycast.disable | anycast.enable]
● salt [target_device] [transit.disable | transit.enable] [transit_name]
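The first three examples differ only in how devices are targeted. salt "edge*" matches minion IDs against a shell-style glob, while -G "os:junos" matches a grain. The Python sketch below is a simplified illustration of those two matching rules, not Salt's own implementation, and the minion IDs and grains are hypothetical. Note that anycast.* and transit.* are not stock Salt modules; they are presumably custom execution modules.

```python
from fnmatch import fnmatchcase


def glob_target(minions, pattern):
    """Minion IDs matching a shell-style glob, as in: salt "edge*" ..."""
    return sorted(m for m in minions if fnmatchcase(m, pattern))


def grain_target(minions, expr):
    """Minion IDs whose grain matches 'key:value', as in: salt -G "os:junos" ...

    The value part may itself contain glob characters (for example "os:ios*").
    """
    key, _, value = expr.partition(":")
    return sorted(m for m, grains in minions.items()
                  if fnmatchcase(str(grains.get(key, "")), value))


# Hypothetical inventory: minion ID -> grains.
minions = {
    "edge01.sin": {"os": "junos"},
    "edge02.cgk": {"os": "iosxr"},
    "core01.sfo": {"os": "junos"},
}
print(glob_target(minions, "edge*"))      # ['edge01.sin', 'edge02.cgk']
print(grain_target(minions, "os:junos"))  # ['core01.sfo', 'edge01.sin']
```

fnmatchcase is used instead of fnmatch so that matching is case-sensitive on every platform.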

Global Network Engineering

● Follow-the-sun approach
○ San Francisco -> Singapore -> London -> San Francisco
● Handles everything
○ Technical operations
○ Network engineering
○ Network expansion projects
○ New PoP deployments
○ Peering

● Very fast response to network issues and escalation
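A follow-the-sun rota is essentially a lookup from the current UTC time to the office on duty. The following is a minimal sketch. It assumes three equal 8-hour shifts whose hypothetical UTC boundaries roughly match office hours in each city; the talk does not give the real hand-over times.

```python
from datetime import datetime, timezone

# Hypothetical UTC shift boundaries, listed in hand-over order:
# San Francisco -> Singapore -> London -> San Francisco.
SHIFTS = [
    ("San Francisco", 16, 24),
    ("Singapore", 0, 8),
    ("London", 8, 16),
]


def on_duty(when):
    """Return the office on duty at the timezone-aware datetime `when`."""
    hour = when.astimezone(timezone.utc).hour
    for office, start, end in SHIFTS:
        if start <= hour < end:
            return office
    raise RuntimeError("shift table does not cover every hour")


print(on_duty(datetime(2016, 7, 28, 3, tzinfo=timezone.utc)))  # Singapore
```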

Statistics

Indonesian Statistics

Q&A