Simple, Scalable and Secure Networking for Data Centers with Project Calico


Simple, Scalable and Secure Networking for Data Centers

Emma Gordon, 13/11/2015

The Brains of the New Global Network

Introductions: myself, Metaswitch and Project Calico

In this Talk

- Evolution of Docker Networking
- Why Calico?
- Quick Demo
- Thoughts on Security in the new world of micro-services

Metaswitch Networks | Proprietary and confidential | 2014

I have chosen to focus on Docker because there are a lot of exciting changes there at the moment, but much of this talk is relevant to other setups (OpenStack, Mesos, Kubernetes, etc.) where we also have Calico integrations.

What's New in Docker Networking?

- Libnetwork in Docker 1.9 (released last week!)
- Pluggable architecture
- Different network drivers available
- Default is bridge
- Container Network Model
- Network Isolation

What's New in Docker Networking?

Multi Host Networking: using the overlay network driver, which uses VXLAN.

Virtual L2 segments, implemented in software by virtual switch

[Diagram: three Linux hosts, each with a vSwitch, performing encap / de-encap (& flooding!)]

VXLAN packet layout: Outer MAC | Outer IP | Outer UDP | VXLAN | Inner MAC | Inner IP | Inner TCP/UDP | Payload Data

- Router services required to hop between tenants
- NAT required for public Internet access
- On/off-ramp required to get to NAS, etc.

- Encapsulation adds overhead
- Overlay networks are complicated to configure and diagnose
- Scaling is a challenge
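As a rough illustration of the overhead point, the sketch below adds up the header sizes from the packet layout above. It assumes an IPv4 outer header with no options, no VLAN tag, and a 1500-byte underlay MTU; the constant and function names are illustrative only.

```python
# Header sizes in bytes for a VXLAN-encapsulated frame.
# Assumptions: IPv4 outer header without options, no 802.1Q tag.
OUTER_MAC = 14   # outer Ethernet header
OUTER_IP = 20    # outer IPv4 header
OUTER_UDP = 8    # outer UDP header
VXLAN = 8        # VXLAN header
INNER_MAC = 14   # inner Ethernet header carried inside the tunnel


def vxlan_overhead() -> int:
    """Bytes added in front of every inner Ethernet frame."""
    return OUTER_MAC + OUTER_IP + OUTER_UDP + VXLAN


def inner_ip_mtu(underlay_mtu: int = 1500) -> int:
    """Largest inner IP packet that fits in one underlay IP packet."""
    return underlay_mtu - OUTER_IP - OUTER_UDP - VXLAN - INNER_MAC


if __name__ == "__main__":
    print(vxlan_overhead())   # bytes of extra headers per frame
    print(inner_ip_mtu())     # usable IP MTU inside the overlay
```

Under these assumptions every frame carries 50 extra bytes, and workloads inside the overlay see a smaller MTU than the physical network offers.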

There are times when this is required (when a specific L2 function is needed), but in general it feels like something simpler is called for!

Virtual Networking Requirements

- Workloads need to communicate with one another
- Enforce policy (who can talk to whom)
- Base requirement for IP connectivity

Mainline use case: unicast IP. What if we focused on this 80% use case?

What if we built a data center like the internet?

[Diagram: applications, each with its own IP address, connected through routers that exchange routes over BGP]

What is the best example of a truly large-scale model that we can think of? The Internet! BGP as deployed today is often complicated, but that's a policy overlay; the protocol itself is simple, scalable and high performing.

What if we built a data center like the internet?

[Diagram: the same picture, with each router now a compute node hosting VMs / LXCs (the applications) and exchanging routes over BGP]

This is Project Calico!

What is Calico?

An (Apache licensed) open source project to enable networking of workloads in a data center / cloud environment.

Objectives:

- Simple: don't demand that users be networking experts
- Scale: thousands of servers, 100ks of workloads
- Open: open source and open standards

Technical Details

Architecture components:

- Orchestrator plug-in
- etcd: distributed, highly available datastore
- Felix agent: forwarding table update, security policy
- BIRD: route distribution, network integration
- Linux kernel: layer 3 forwarding and ACL enforcement

Build on and contribute to many existing open source projects.

[Diagram: compute nodes connected by any physical fabric (L2, L3, MPLS, ...). On each compute node, the cloud OS / orchestration system drives the Calico plugin; Felix and BIRD program routes and ACLs in the Linux kernel; each workload (VM / container) attaches through its interfaces (eth0, eth1).]

Calico uses standard Linux routing, iptables, etc., i.e. features that are already in the Linux kernel.
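To make "standard Linux routing" a little more concrete, here is a minimal sketch of longest-prefix-match lookup, the rule that layer 3 forwarding follows. The routing table, device names and addresses are hypothetical, and this is an illustration in Python rather than how Calico or the kernel implements it.

```python
import ipaddress

# Hypothetical routing table on one compute node: (prefix, action).
ROUTES = [
    ("192.168.1.2/32", "dev workload-a"),  # local workload, host route
    ("192.168.1.4/32", "dev workload-b"),  # local workload, host route
    ("192.168.1.0/24", "via 10.0.0.2"),    # assumed to be learned from a BGP peer
    ("0.0.0.0/0", "via 10.0.0.254"),       # default route
]


def lookup(dst: str) -> str:
    """Return the action of the most specific route that contains dst."""
    addr = ipaddress.ip_address(dst)
    matches = [
        (ipaddress.ip_network(prefix), action)
        for prefix, action in ROUTES
        if addr in ipaddress.ip_network(prefix)
    ]
    # Longest prefix wins; the default route guarantees at least one match.
    _, action = max(matches, key=lambda m: m[0].prefixlen)
    return action


if __name__ == "__main__":
    print(lookup("192.168.1.2"))  # delivered to the local workload
    print(lookup("192.168.1.7"))  # forwarded to another compute node
```

Because forwarding is plain IP routing, the same lookup explains what tools such as traceroute and ping will show.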

Life Before and After Calico

Before Calico: Scale challenges above a few hundred servers / thousands of workloads.
After Calico: Scale to millions of workloads with minimal CPU and network overhead.

Before Calico: Troubleshooting connectivity issues can take hours.
After Calico: What is happening is obvious: traceroute, ping, etc., work as expected.

Before Calico: On/off ramps + NAT to break out of the overlay.
After Calico: Path from workload to non-virtual device or public internet (or even between data centers) is just a route.

Before Calico: High availability / load balancing across links requires LB function (virtual or physical) and/or app-specific logic.
After Calico: Equal Cost Multi-Path (ECMP) & Anycast just work, enabling scalable resilience and full utilization of physical links.

Before Calico: CCNA or equivalent required to understand end-to-end networking and deploy applications.
After Calico: Basic IP networking knowledge only required.

Scale testing, latest results:
- Docker: 100k containers across 1k hosts; 50k containers in 120s (not timed for 100k)
- OpenStack: 500 hosts with 20 VMs each (a patch in testing should remove a bottleneck and get us to 1k). We have a customer deployed with 140 hosts and a churn of 2 VMs per second.
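For a feel of what those figures imply, the arithmetic can be worked through as below. This is only a sketch: it assumes "50k containers in 120s" refers to the time taken to start them, and the function names are illustrative.

```python
def per_host(workloads: int, hosts: int) -> float:
    """Average number of workloads placed on each host."""
    return workloads / hosts


def start_rate(workloads: int, seconds: float) -> float:
    """Average number of workloads started per second."""
    return workloads / seconds


if __name__ == "__main__":
    print(per_host(100_000, 1_000))         # Docker test: containers per host
    print(round(start_rate(50_000, 120)))   # Docker test: containers per second
    print(500 * 20)                         # OpenStack test: total VMs
```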

Demo

Part 2: Security!!

Remember 3-tier architectures?

Web/app/data

Getting Medieval

Like a medieval fortress: outer walls, inner walls, and the castle on the hill where you keep your crown jewels.

Fast forward to the present

New architecture - application specific machines replaced with general purpose servers. Commodity Hardware. Easily interchangeable. Centralized monitoring. Multiply redundant.

On top of this physical infrastructure I deploy virtualized application services. I set up my virtual network on top of the physical network and, SURPRISE, it's the same architecture!

Imagine building an application for a big enterprise: pretty progressive, pretty with-it when it comes to technology, with fully virtualized compute and storage. Even so, the ops manager hands you a form that more or less looks like this.

Increased complexity

Fill in all our services. Which ones need access from the Internet? Those go in the Web tier. Which ones access data? Those are in the Data tier. Anything else? App tier! Web can't access Data directly.

Let's consider just services, not micro-services: fill in all the ports and IPs that I need to open on the firewalls. I forgot one and the application couldn't connect. Raise a ticket! 2 days later, the firewalls are updated and I'm back in business. Then, a few weeks later, I wanted to add a new service: raise a ticket!

Do you want to stand up a second application stack in this environment? Maybe a third? So this is where we are today, before you even add micro-services.

The first part of the challenge with micro-services: fast rate of change.

Do you put your foot down and say that you have to go through security to modify the network, so start early? If you do, congratulations, now you're the bottleneck to innovation in your software company. Or do you open wide the gates?

Resource Fungibility

The 2nd problem: resource fungibility.

Each host is mutually interchangeable with any other. Any service instantiated in your data center should be deployable on any host, and you should be able to scale any service across your whole data center.

For example, Tectonic (CoreOS + Kubernetes) and Mesosphere have in common the vision of a datacenter operating system: an environment that takes care of the detail that your application is distributed across hundreds or thousands of servers. That vision requires fungibility, so that they can autoscale their applications and have service instances deployed by a scheduler which packs things for maximum utilization of the expensive hardware it takes to run a datacenter.

[back one slide] This is not fungible. I've divided my datacenter into zones and I'm dependent on the firewalls to enforce security.

Tear down the walls?

Tear down the walls at the zoo and let the animals roam together? But now what do we do about security?

So, that's the problem:
- Things are moving fast and will move faster
- People want the datacenter to present fungible resources

The opportunity?

But it's not all bad news for micro-services from the perspective of network security.

Micro-services have a really useful property: they are compartmentalized.

Building applications this way naturally forces you to break down large, monolithic applications into constituent parts and isolate them. That means that if one gets compromised, your attacker gets only a small amount of information, and only a small amount of power to subvert the rest of your system.

They're also, by definition, small! The goal is for each service to do one thing and do it well. This makes it much easier to analyze an application from a security perspective. Each service does something simple and probably only needs to talk to a few different things.

The opportunity?

The dream: break the application up into compartments that are easy to understand and easy to analyze, and then isolate them. Containers give you isolation within the host OS, and what I'm talking to you about is network isolation.

Now, of course, we can't have a functioning application if the containers are completely isolated: they need to communicate, but only over a few specific ports to specific destinations.

Firewall every instance of every micro-service, opening just the ports it needs to communicate over and to just the specific addresses it needs.

So now attackers need to do more than just breach your castle walls: make them fight tooth and nail, room by room, container by container. Every service instance exposes just the minimum surface.
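The idea of opening just the ports each service needs can be sketched as a default-deny allow list. The service names, ports and the `is_allowed` helper below are hypothetical, chosen only to illustrate the principle; they are not Calico's policy model.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    source: str  # name of the service allowed to connect
    port: int    # TCP port it may connect to


# Hypothetical policy: each micro-service lists the only inbound traffic it accepts.
POLICY = {
    "web": {Rule("internet", 443)},
    "orders": {Rule("web", 8080)},
    "database": {Rule("orders", 5432)},
}


def is_allowed(source: str, destination: str, port: int) -> bool:
    """Default deny: traffic passes only if an explicit rule permits it."""
    return Rule(source, port) in POLICY.get(destination, set())


if __name__ == "__main__":
    print(is_allowed("web", "orders", 8080))    # explicitly permitted
    print(is_allowed("web", "database", 5432))  # no rule, so denied
```

With a policy like this, a compromised "web" instance still cannot reach "database" directly, which is the room-by-room property described above.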


The Distributed Firewall

[Diagram: two container hosts (10.0.0.1 and 10.0.0.2) attached to the network fabric via eth0. Each host routes to its own containers, whose eth0 interfaces have the addresses 192.168.1.2 to 192.168.1.7.]

The problem of resource fungibility, that containers need to run anywhere in the data center, means that the firewall needs to be closely coupled to the service instance. It needs to be where the container is and needs to live and die with the service.

To properly isolate each service, we need to instantiate a per-instance firewall right there: in the container host's Linux kernel, tied to the particular virtual interface for that instance.

The scale of containers and rate of change means that this needs to be automated. You can't provision these manually based on opening tickets.
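As a sketch of what a per-instance firewall tied to a virtual interface could look like, the function below builds iptables commands for one workload. It only generates command strings and does not run them. The interface name and the rule layout are assumptions for illustration; this is not the chain structure that Calico itself programs.

```python
from typing import List, Tuple


def firewall_commands(interface: str, allowed: List[Tuple[str, int]]) -> List[str]:
    """Build iptables commands for traffic forwarded to one workload interface.

    interface: host-side virtual interface of the workload (hypothetical name).
    allowed:   (source CIDR, TCP destination port) pairs to accept.
    """
    commands = [
        # Let replies to connections that are already established through.
        f"iptables -A FORWARD -o {interface} -m conntrack "
        "--ctstate ESTABLISHED,RELATED -j ACCEPT"
    ]
    for source, port in allowed:
        commands.append(
            f"iptables -A FORWARD -o {interface} -s {source} "
            f"-p tcp --dport {port} -j ACCEPT"
        )
    # Anything not explicitly accepted above is dropped.
    commands.append(f"iptables -A FORWARD -o {interface} -j DROP")
    return commands


if __name__ == "__main__":
    for command in firewall_commands("veth-web1", [("192.168.1.3/32", 8080)]):
        print(command)
```

Because the rules are keyed on the workload's own interface, they can be created when the instance starts and deleted when it stops, wherever in the data center it happens to run.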

Project Calico architecture

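[Diagram: on each host, Felix programs routes and iptables in the kernel for the local workloads (eth0 at 192.168.1.2, 192.168.1.4 and 192.168.1.7), while BIRD distributes routes via a route reflector.]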

How can Calico do this?

In order to program the firewall, Felix needs to know the policy for each container. An etcd cluster is used as the data store (security policy is written to it by the orchestrator's Calico plugin).

Felix programs local routes and sets up iptables. iptables uses chains of rules to configure complex firewall behavior in the kernel.

Each workload has its own virtual interface for the firewall rules to be associated with, so the firewalls are in the host kernel and tied to individual workloads.

Felix watches etcd for changes as services are created, destroyed and rescaled, so it will automatically create/update/remove the firewall rules as the workloads change.
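The create/update/remove behaviour can be sketched as a reconciliation step between the desired policy and what is currently installed. In this sketch plain dictionaries stand in for the datastore and for the rules in the kernel; it does not use any etcd client API, and the `reconcile` function is a hypothetical illustration, not Felix's code.

```python
from typing import Dict, FrozenSet, List, Tuple

# workload id -> set of allowed (source, port) pairs
Policy = Dict[str, FrozenSet[Tuple[str, int]]]


def reconcile(installed: Policy, desired: Policy) -> List[Tuple[str, str]]:
    """Return the (action, workload) steps that bring installed in line with desired."""
    steps: List[Tuple[str, str]] = []
    for workload in sorted(desired.keys() - installed.keys()):
        steps.append(("create", workload))   # new workload appeared
    for workload in sorted(desired.keys() & installed.keys()):
        if desired[workload] != installed[workload]:
            steps.append(("update", workload))   # policy changed
    for workload in sorted(installed.keys() - desired.keys()):
        steps.append(("remove", workload))   # workload was destroyed
    return steps


if __name__ == "__main__":
    installed = {"a": frozenset({("web", 80)}), "b": frozenset({("web", 80)})}
    desired = {"b": frozenset({("web", 443)}), "d": frozenset()}
    print(reconcile(installed, desired))
```

Running a step like this each time the watched data changes keeps the firewall rules in line with the workloads without anyone opening a ticket.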

We welcome your feedback and contributions

- Website: www.projectcalico.org
- GitHub: github.com/projectcalico
- Mailing list: projectcalico.org/contact/
- Freenode IRC: #calico
- Slack community: https://calicousers.slack.com
- Twitter: @projectcalico
