35
Technical Debt: An Anycast Story Tom Strickx Cloudflare, London HKNOG 7.0 Hong-Kong

Technical Debt: An Anycast Story - HKNOG › wp-content › uploads › 2018 › 12 › ... · Configuration changes using Saltstack Agenda Change monitoring 3. 4. Our Anycast Network

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Technical Debt: An Anycast Story - HKNOG › wp-content › uploads › 2018 › 12 › ... · Configuration changes using Saltstack Agenda Change monitoring 3. 4. Our Anycast Network

Technical Debt: An Anycast Story

Tom StrickxCloudflare, London

HKNOG 7.0Hong-Kong

Page 2: Technical Debt: An Anycast Story - HKNOG › wp-content › uploads › 2018 › 12 › ... · Configuration changes using Saltstack Agenda Change monitoring 3. 4. Our Anycast Network

Tom Strickx● Network Hooligan at Cloudflare (Network Software Engineer)● Contributor at NAPALM Automation and Saltstack● https://tom.strickx.com

Ichabond

@tstrickx2

Page 3: Technical Debt: An Anycast Story - HKNOG › wp-content › uploads › 2018 › 12 › ... · Configuration changes using Saltstack Agenda Change monitoring 3. 4. Our Anycast Network

● Anycast introduction● Our technical debt● Configuration changes using Saltstack● Change monitoring

Agenda

3

Page 4: Technical Debt: An Anycast Story - HKNOG › wp-content › uploads › 2018 › 12 › ... · Configuration changes using Saltstack Agenda Change monitoring 3. 4. Our Anycast Network

4

Page 5: Technical Debt: An Anycast Story - HKNOG › wp-content › uploads › 2018 › 12 › ... · Configuration changes using Saltstack Agenda Change monitoring 3. 4. Our Anycast Network

Our Anycast Network

● ± 250 IPv4 prefixes

● ± 15 IPv6 prefixes

● Announced globally (150+ locations)

5

Page 6: Technical Debt: An Anycast Story - HKNOG › wp-content › uploads › 2018 › 12 › ... · Configuration changes using Saltstack Agenda Change monitoring 3. 4. Our Anycast Network

6

Page 7: Technical Debt: An Anycast Story - HKNOG › wp-content › uploads › 2018 › 12 › ... · Configuration changes using Saltstack Agenda Change monitoring 3. 4. Our Anycast Network

Technical Debt

7

Page 8: Technical Debt: An Anycast Story - HKNOG › wp-content › uploads › 2018 › 12 › ... · Configuration changes using Saltstack Agenda Change monitoring 3. 4. Our Anycast Network

Technical Debt

● Few Tier 1 transit providers

● Prepends to steer traffic to proper location (± 10 PoPs)

● Eventually normalized globally

History

8

Page 9: Technical Debt: An Anycast Story - HKNOG › wp-content › uploads › 2018 › 12 › ... · Configuration changes using Saltstack Agenda Change monitoring 3. 4. Our Anycast Network

Technical DebtIssues

9

Page 10: Technical Debt: An Anycast Story - HKNOG › wp-content › uploads › 2018 › 12 › ... · Configuration changes using Saltstack Agenda Change monitoring 3. 4. Our Anycast Network

Technical Debt

● Authoritative DNS targeted (eg. 173.245.58.0/24)

● Single location attracts all traffic due to missing prepends

Incidents

10

Page 11: Technical Debt: An Anycast Story - HKNOG › wp-content › uploads › 2018 › 12 › ... · Configuration changes using Saltstack Agenda Change monitoring 3. 4. Our Anycast Network

Technical Debt

● RPKI

● Shorter AS-path

● /24 everything

Solutions

11

Page 12: Technical Debt: An Anycast Story - HKNOG › wp-content › uploads › 2018 › 12 › ... · Configuration changes using Saltstack Agenda Change monitoring 3. 4. Our Anycast Network

Technical Debt

● Staggered deployment (6 stages)

● As quickly as possible globally

● Extensive internal and external monitoring

Resolution

12

Page 13: Technical Debt: An Anycast Story - HKNOG › wp-content › uploads › 2018 › 12 › ... · Configuration changes using Saltstack Agenda Change monitoring 3. 4. Our Anycast Network

Technical DebtChange

13

Page 14: Technical Debt: An Anycast Story - HKNOG › wp-content › uploads › 2018 › 12 › ... · Configuration changes using Saltstack Agenda Change monitoring 3. 4. Our Anycast Network

Global Rollout

14

Page 15: Technical Debt: An Anycast Story - HKNOG › wp-content › uploads › 2018 › 12 › ... · Configuration changes using Saltstack Agenda Change monitoring 3. 4. Our Anycast Network

Global Rollout

● Automation and orchestration

● Open source

● Python, Jinja2 & YAML

● Highly scalable

● Very fast

● Vendor neutral

● Across our fleet: servers and network equipment

Saltstack

15

Page 16: Technical Debt: An Anycast Story - HKNOG › wp-content › uploads › 2018 › 12 › ... · Configuration changes using Saltstack Agenda Change monitoring 3. 4. Our Anycast Network

Global Rollout

● Make sure we know what we’re changing

● Adjust configuration if needed

● Add confidence

Prechecks

16

Page 17: Technical Debt: An Anycast Story - HKNOG › wp-content › uploads › 2018 › 12 › ... · Configuration changes using Saltstack Agenda Change monitoring 3. 4. Our Anycast Network

Global RolloutActual Change

17

● All in Python (config generation)

● Both Junos and EOS

● Concurrently

● Done globally within

± 2 minutes

Page 18: Technical Debt: An Anycast Story - HKNOG › wp-content › uploads › 2018 › 12 › ... · Configuration changes using Saltstack Agenda Change monitoring 3. 4. Our Anycast Network

Global Rollout

● Internal metrics

● External metrics

Metrics

18

Page 19: Technical Debt: An Anycast Story - HKNOG › wp-content › uploads › 2018 › 12 › ... · Configuration changes using Saltstack Agenda Change monitoring 3. 4. Our Anycast Network

Global Rollout

● Stored in Clickhouse or Prometheus

● Visualized with Grafana

● Flows

● SNMP data

● Request data

Internal metrics

19

Page 20: Technical Debt: An Anycast Story - HKNOG › wp-content › uploads › 2018 › 12 › ... · Configuration changes using Saltstack Agenda Change monitoring 3. 4. Our Anycast Network

Global Rollout

● Developed at Yandex

● Column-oriented DBMS

● Open source

● 3 PB on disk

● 100 Gbps insertion

Clickhouse

20

Page 21: Technical Debt: An Anycast Story - HKNOG › wp-content › uploads › 2018 › 12 › ... · Configuration changes using Saltstack Agenda Change monitoring 3. 4. Our Anycast Network

Global Rollout

● Stores flow data

● Stores request data

Clickhouse

21

Clickhouse query

Result

Page 22: Technical Debt: An Anycast Story - HKNOG › wp-content › uploads › 2018 › 12 › ... · Configuration changes using Saltstack Agenda Change monitoring 3. 4. Our Anycast Network

Global Rollout

● Time-series database

● Monitoring platform

● Open source

Prometheus

22

Page 23: Technical Debt: An Anycast Story - HKNOG › wp-content › uploads › 2018 › 12 › ... · Configuration changes using Saltstack Agenda Change monitoring 3. 4. Our Anycast Network

Global Rollout

● Prepend length

● PromQL

Prometheus

23

Begin rollout Rollout complete

All external looking-glasses

Page 24: Technical Debt: An Anycast Story - HKNOG › wp-content › uploads › 2018 › 12 › ... · Configuration changes using Saltstack Agenda Change monitoring 3. 4. Our Anycast Network

Global Rollout

● Analytics

● Time-series visualization

● Multitude of plugins

● Open source

Grafana

24

Page 25: Technical Debt: An Anycast Story - HKNOG › wp-content › uploads › 2018 › 12 › ... · Configuration changes using Saltstack Agenda Change monitoring 3. 4. Our Anycast Network

Internal Metrics

● RPS / colo

● Traffic shift during change

● Real-time information

25

Global Rollout

Page 26: Technical Debt: An Anycast Story - HKNOG › wp-content › uploads › 2018 › 12 › ... · Configuration changes using Saltstack Agenda Change monitoring 3. 4. Our Anycast Network

Global Rollout

● Stored in Prometheus or raw ingestion

● Visualized with Grafana

● RIPE Atlas

● Looking glasses

External metrics

26

Page 27: Technical Debt: An Anycast Story - HKNOG › wp-content › uploads › 2018 › 12 › ... · Configuration changes using Saltstack Agenda Change monitoring 3. 4. Our Anycast Network

Global Rollout

● Global probes

● Ping, Traceroute, DNS query

● REST API

● Determine routing before and after change

RIPE Atlas

27

Page 28: Technical Debt: An Anycast Story - HKNOG › wp-content › uploads › 2018 › 12 › ... · Configuration changes using Saltstack Agenda Change monitoring 3. 4. Our Anycast Network

Global Rollout

● Routeviews

● AS57335 (http://dfz.watch/) looking glasses (Thanks Aaron!)

● IX looking glasses (We need more! With APIs!)

● Collect / scrape into Prometheus

● Visualize with Grafana

Looking Glasses

28

Page 29: Technical Debt: An Anycast Story - HKNOG › wp-content › uploads › 2018 › 12 › ... · Configuration changes using Saltstack Agenda Change monitoring 3. 4. Our Anycast Network

Global Rollout

● Scrape metrics

● BeautifulSoup for scraping

● Aggregate data

Looking Glasses

29

Page 30: Technical Debt: An Anycast Story - HKNOG › wp-content › uploads › 2018 › 12 › ... · Configuration changes using Saltstack Agenda Change monitoring 3. 4. Our Anycast Network

Global Rollout

● Insert into Prometheus

● python_client

Looking Glasses

30

Page 31: Technical Debt: An Anycast Story - HKNOG › wp-content › uploads › 2018 › 12 › ... · Configuration changes using Saltstack Agenda Change monitoring 3. 4. Our Anycast Network

Global RolloutLooking Glasses

31

● Grafana visualization

● Track change in

near-real-time

Page 32: Technical Debt: An Anycast Story - HKNOG › wp-content › uploads › 2018 › 12 › ... · Configuration changes using Saltstack Agenda Change monitoring 3. 4. Our Anycast Network

Global RolloutCombined

32

● Traffic

● Prepends

Perturbance

Delay

Page 33: Technical Debt: An Anycast Story - HKNOG › wp-content › uploads › 2018 › 12 › ... · Configuration changes using Saltstack Agenda Change monitoring 3. 4. Our Anycast Network

Global RolloutTakeaways

33

Page 34: Technical Debt: An Anycast Story - HKNOG › wp-content › uploads › 2018 › 12 › ... · Configuration changes using Saltstack Agenda Change monitoring 3. 4. Our Anycast Network

Global Rollout

● Negligible customer impact

● Route fluctuations for ± 2 minutes

● Took 1 hour to complete change, 2 days to prepare

● Instantly detected and resolved minor issues

● Heavily reliant on open source tooling and community

Takeaways

34