
Tarmak Documentation
Release 0.3

Jetstack

Jul 10, 2018

Contents

1 Introduction
    1.1 What is Tarmak?
    1.2 Design goals
    1.3 Architecture overview
    1.4 Tools used under the hood

2 User guide
    2.1 Getting started with AWS
    2.2 Configuration Options

3 Command-line tool reference
    3.1 Commands
    3.2 Resources

4 Developer guide
    4.1 Building Tarmak
    4.2 Release Checklist
    4.3 Setting up a Puppet Development Environment

5 Proposals
    5.1 Terraform Provider
    5.2 Custom Vault Auth Provider for EC2 Instances - A Proposal

6 Known issues
    6.1 An alias with the name arn:aws:kms:<region>:<id>:alias/tarmak/<environment>/secrets already exists
    6.2 AWS key pair is not matching the local one

7 Troubleshooting
    7.1 Calico

8 Examples
    8.1 Kube2IAM

9 Deploying into an existing AWS VPC

10 Vault Setup and Configurations
    10.1 Certificate Authorities (CAs)
    10.2 Init Tokens
    10.3 Purpose of Node Unique Tokens
    10.4 Expiration of Tokens and Certificates
    10.5 Certificate Roles on Kubernetes CA

11 Vault Helper In Tarmak

12 Accepting CentOS Terms

Tarmak is an open-source toolkit for Kubernetes cluster lifecycle management. It is focused on best-practice cluster security, management and operation.

Tarmak’s underlying components are the product of Jetstack’s work with its customers to build and deploy Kubernetes in production at scale.

Note: Please note that current releases of Tarmak are alpha (unless explicitly marked). Although we do not anticipate breaking changes, at this stage this cannot be absolutely guaranteed.

CHAPTER 1

Introduction

1.1 What is Tarmak?

Tarmak is a toolkit for Kubernetes cluster lifecycle management. It focuses on best practice cluster security and cluster management/operation. It has been built from the ground up to be cloud provider-agnostic and hence provides a means for consistent and reliable cluster deployment and management, across clouds and on-premises environments.

Tarmak and its underlying components are the product of Jetstack’s work with its customers to build and deploy Kubernetes in production at scale.

1.2 Design goals

1.2.1 Goals

• Build and manage cluster deployments that are as similar as possible across different cloud and on-premises environments.

• Combine tried-and-tested and well-understood system tools throughout the stack to provide production-ready and ready-to-use clusters.

• Follow security best practices.

• Support for a fully automated CI/CD operation.

• Provide minimally invasive upgrades, which can be predicted using dry-runs.

• Have a testable code base that follows KISS and DRY, for example by avoiding convoluted bash scripts that are environment- and operating system-specific.

• Provide a tool-independent CLI experience that simplifies common tasks and makes it quick and easy to investigate cluster status and health.

• Allow customisation of parts of the code in order to follow internal standards.

1.2.2 Non-goals

• Reinventing the wheel

1.3 Architecture overview

Todo: A high-level architecture diagram is coming soon!

1.3.1 Tarmak configuration resources

The Tarmak configuration uses Kubernetes’ API tooling and consists of several different resources. While the Tarmak-specific resources (Providers and Environments) are defined by the Tarmak project, Clusters are derived from a draft version of the Cluster API. This is a community effort to have a standardised way of defining Kubernetes clusters. By default, the Tarmak configuration is located in ~/.tarmak/tarmak.yaml.
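
To make the hierarchy concrete, the following is a minimal sketch of how these resources can sit together in tarmak.yaml. It is illustrative only: the fields shown are drawn from the configuration examples later in this document, the names are placeholders, and the Provider block is omitted for brevity.

apiVersion: api.tarmak.io/v1alpha1
kind: Config
environments:
- location: eu-west-1          # cloud region for this Environment
  metadata:
    name: example              # Environment name, must be unique
  project: example-project
  provider: aws                # the Provider this Environment is associated with
clusters:
- name: cluster                # one Cluster resource per Kubernetes cluster (or hub)
  kubernetes:
    podSecurityPolicy:
      enabled: true
...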

Note: Although we do not anticipate breaking changes in our configuration, at this stage this cannot be absolutely guaranteed. Through the use of the Kubernetes API tooling, we have the option of migrating between different versions of the configuration in a controlled way.

Providers

A Provider contains credentials for and information about cloud provider accounts. A single Provider can be used for many Environments, while every Environment has to be associated with exactly one Provider.

Currently, the only supported Provider is Amazon. An Amazon Provider object references credentials to make use of an AWS account to provision resources.

Environments

An Environment consists of one or more Clusters. If an Environment has exactly one cluster, it is called a Single Cluster Environment. A Cluster in such an environment also contains the Environment-wide tooling.

For Multi-Cluster Environments, these components are placed in a special hub Cluster resource. This enables reuse of bastion and Vault nodes throughout all Clusters.

Clusters

A Cluster resource represents exactly one Kubernetes cluster, the only exception being the hub in a Multi-Cluster Environment. Hubs do not contain a Kubernetes cluster, as they are just where the Environment-wide tooling is placed.

All instances in a Cluster are defined by an InstancePool.

Stacks

The Cluster-specific Terraform code is broken down into separate, self-contained Stacks. Stacks share Terraform outputs via the remote Terraform state. Some Stacks depend on others, so the order in which they are provisioned is important. Tarmak currently uses the following Stacks to build environments:

• state: contains the stateful resources of the Cluster (data stores, persistent disk volumes)

• network: sets up the necessary network objects to allow communication

• tools: contains the Environment-wide tooling, like bastion and CI/CD instances

• vault: spins up a Vault cluster, backed by a Consul key-value store

• kubernetes: contains Kubernetes’ master, worker and etcd instances

Fig. 1: This is what a single-cluster production setup might look like. While the dev environment allows for multiple clusters (e.g. each with different features and/or team members), the staging and production environments consist of a single cluster each. The same AWS account is used for the dev and staging environments, while production runs in a separate account.

InstancePools

Every Cluster contains InstancePools that group instances of a similar type together. Every InstancePool has a name and role attached to it. Other parameters allow us to customise the instances regarding size, count and location. A sketch of a worker InstancePool follows the list of roles below.

These roles are defined:

• bastion: Bastion instance within the tools stack. Has a public IP address and allows Tarmak to connect to other instances that only have private IP addresses.

• vault: Vault instance within the vault stack. Has persistent disks that back a Consul cluster, which backs Vault itself.

• etcd: Stateful instances within the kubernetes stack. etcd is the key-value store backing Kubernetes and potentially other components, for example overlay networks such as Calico.

• master: Stateless Kubernetes master instances.

• worker: Stateless Kubernetes worker instances.
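
As a concrete illustration, a worker InstancePool entry in tarmak.yaml might look roughly like the following. This is a sketch assembled from the configuration examples later in this document; the image, size, counts and zone are placeholders.

- image: centos-puppet-agent
  type: worker                 # one of the roles listed above
  size: medium
  minCount: 3
  maxCount: 3
  metadata:
    name: worker
  subnets:
  - metadata:
      zone: eu-west-1a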

1.4 Tools used under the hood

Tarmak is backed by tried-and-tested tools, which act as the glue and automation behind the Tarmak CLI interface. These tools are pluggable, but at this stage we use the following:

1.4.1 Docker

Docker is used to package the tools necessary and run them in a uniform environment across different operating systems. This allows Tarmak to run on Linux and macOS (as well as potentially Windows in the future).

1.4.2 Packer

Packer helps build reproducible VM images for various environments. Using Packer, we build custom VM images containing the latest kernel upgrades and a supported Puppet version.

1.4.3 Terraform

Terraform is a well-known tool for infrastructure provisioning in public and private clouds. We use Terraform to manage the lifecycle of resources and store cluster state.

1.4.4 Puppet

As soon as instances are spun up, Tarmak uses Puppet to configure them. Puppet is used in a ‘masterless’ architecture, so as to avoid the complexity of a full Puppet master setup. All the services are configured in such a way that, once converged, the instance can run independently of Puppet.

Why Puppet over other means of configuration (i.e. bash scripts, Ansible, Chef)? The main reason is its testability (at various levels) as well as the concept of explicit dependency definition, which allows a tree of dependencies to be built, helping predict the changes with a dry-run.

1.4.5 Systemd

Tarmak uses Systemd units and timers. Units are used to maintain the dependencies between services, while timers enable periodic application execution - e.g. for certificate renewal.

CHAPTER 2

User guide

2.1 Getting started with AWS

In this getting started guide, we walk through how to initialise Tarmak with a new Provider (AWS) and a new Environment, and then provision a Kubernetes cluster. This will comprise Kubernetes master and worker nodes, etcd clusters, Vault and a bastion node with a public IP address (see Architecture overview for details of cluster components).

2.1.1 Prerequisites

• Docker

• An AWS account that has accepted the CentOS licence terms

• A public DNS zone that can be delegated to AWS Route 53

• Optional: Vault with the AWS secret backend configured

2.1.2 Overview of steps to follow

• Initialise cluster configuration

• Build an image (AMI)

• Create the cluster

• Destroy the cluster

2.1.3 Initialise configuration

Simply run tarmak init to initialise configuration for the first time. You will be prompted for the necessary configuration to set up a new Provider (AWS) and Environment. The list below describes the questions you will be asked.

Note: If you are not using Vault’s AWS secret backend, you can authenticate with AWS in the same way as the AWS CLI. More details can be found at Configuring the AWS CLI.

• Configuring a new Provider

– Provider name: must be unique

– Cloud: Amazon (AWS) is the default and only option for now (more clouds to come)

– Credentials: Amazon CLI auth (i.e. env variables/profile) or Vault (optional)

– Name prefix: for state buckets and DynamoDB tables

– Public DNS zone: will be created if not already existing, must be delegated from the root

• Configuring a new Environment

– Environment name: must be unique

– Project name: used for AWS resource labels

– Project administrator mail address

– Cloud region: pick a region fetched from AWS (using Provider credentials)

• Configuring new Cluster(s)

– Single or multi-cluster environment

– Cloud availability zone(s): pick zone(s) fetched from AWS

Once initialised, the configuration will be created at $HOME/.tarmak/tarmak.yaml (default).

2.1.4 Create an AMI

Next we create an AMI for this environment by running tarmak clusters images build (this is the step that requires Docker to be installed locally).

% tarmak clusters images build
<output omitted>

2.1.5 Create the cluster

To create the cluster, run tarmak clusters apply.

% tarmak clusters apply
<output omitted>

Warning: The first time this command is run, Tarmak will create a hosted zone and then fail with the following error.

* failed verifying delegation of public zone 5 times, make sure the zone k8s.jetstack.io is delegated to nameservers [ns-100.awsdns-12.com ns-1283.awsdns-32.org ns-1638.awsdns-12.co.uk ns-842.awsdns-41.net]

When creating a multi-cluster environment, the hub cluster must be applied first. To change the current cluster use the flag --current-cluster. See tarmak cluster help for more information.

You should now change the nameservers of your domain to the four listed in the error. If you only wish to delegate a subdomain containing your zone to AWS without delegating the parent domain, see Creating a Subdomain That Uses Amazon Route 53 as the DNS Service without Migrating the Parent Domain.

To complete the cluster provisioning, run tarmak clusters apply once again.

Note: This process may take 30-60 minutes to complete. You can stop it by sending the signal SIGTERM or SIGINT (Ctrl-C) to the process. Tarmak will not exit immediately. It will wait for the currently running step to finish and then exit. You can complete the process by re-running the command.

2.1.6 Destroy the cluster

To destroy the cluster, run tarmak clusters destroy.

% tarmak clusters destroy
<output omitted>

Note: This process may take 30-60 minutes to complete. You can stop it by sending the signal SIGTERM or SIGINT (Ctrl-C) to the process. Tarmak will not exit immediately. It will wait for the currently running step to finish and then exit. You can complete the process by re-running the command.

2.2 Configuration Options

After generating your tarmak.yaml configuration file there are a number of options you can set that are not exposed via tarmak init.

2.2.1 Pod Security Policy

Note: For cluster versions greater than 1.8.0 this is applied by default. For cluster versions before 1.6.0 it is not applied.

To enable Pod Security Policy for an environment, add the following to the configuration file under the kubernetes field of that environment:

kubernetes:
  podSecurityPolicy:
    enabled: true

The configuration file can be found at $HOME/.tarmak/tarmak.yaml (default). The Pod Security Policy manifests can be found within the tarmak directory at puppet/modules/kubernetes/templates/pod-security-policy.yaml.erb

2.2.2 Cluster Autoscaler

Tarmak supports deploying Cluster Autoscaler when spinning up a Kubernetes cluster. The following tarmak.yaml snippet shows how you would enable Cluster Autoscaler.

kubernetes:
  clusterAutoscaler:
    enabled: true
...

The above configuration would deploy Cluster Autoscaler with an image of gcr.io/google_containers/cluster-autoscaler using the recommended version based on the version of your Kubernetes cluster. The configuration block accepts two optional fields of image and version allowing you to change these defaults. Note that the final image tag used when deploying Cluster Autoscaler will be the configured version prepended with the letter v.

The current implementation will configure the first instance pool of type worker in your cluster configuration to scale between minCount and maxCount. We plan to add support for an arbitrary number of worker instance pools.
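
For example, overriding the image and version while letting the first worker instance pool scale could look roughly like the snippet below. The version and counts shown are placeholders, not recommendations.

kubernetes:
  clusterAutoscaler:
    enabled: true
    image: gcr.io/google_containers/cluster-autoscaler   # default image
    version: 1.3.0                                       # placeholder; the deployed tag becomes v1.3.0
instancePools:
- type: worker
  minCount: 3
  maxCount: 10
...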

Overprovisioning

Tarmak supports overprovisioning to give a fixed or proportional amount of headroom in the cluster. The technique used to implement overprovisioning is the same as described in the cluster autoscaler documentation. The following tarmak.yaml snippet shows how to configure fixed overprovisioning. Note that cluster autoscaling must also be enabled.

kubernetes:
  clusterAutoscaler:
    enabled: true
    overprovisioning:
      enabled: true
      reservedMillicoresPerReplica: 100
      reservedMegabytesPerReplica: 100
      replicaCount: 10
...

This will deploy 10 pause Pods with a negative PriorityClass so that they will be preempted by any other pending Pods. Each Pod will request the specified number of millicores and megabytes. The following tarmak.yaml snippet shows how to configure proportional overprovisioning.

kubernetes:
  clusterAutoscaler:
    enabled: true
    overprovisioning:
      enabled: true
      reservedMillicoresPerReplica: 100
      reservedMegabytesPerReplica: 100
      nodesPerReplica: 1
      coresPerReplica: 4
...

The nodesPerReplica and coresPerReplica configuration parameters are described in the cluster-proportional-autoscaler documentation.

The image and version used by the cluster-proportional-autoscaler can also be specified using the image and version fields of the overprovisioning block. These values default to k8s.gcr.io/cluster-proportional-autoscaler-amd64 and 1.1.2 respectively.
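
For instance, setting those fields explicitly (here simply restating the defaults) looks like this:

kubernetes:
  clusterAutoscaler:
    enabled: true
    overprovisioning:
      enabled: true
      nodesPerReplica: 1
      coresPerReplica: 4
      image: k8s.gcr.io/cluster-proportional-autoscaler-amd64
      version: 1.1.2
...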

2.2.3 Logging

Each Kubernetes cluster can be configured with a number of logging sinks. The only sink currently supported is Elasticsearch. An example configuration is shown below:

apiVersion: api.tarmak.io/v1alpha1
kind: Config
clusters:
- loggingSinks:
  - types:
    - application
    - platform
    elasticsearch:
      host: example.amazonaws.com
      port: 443
      logstashPrefix: test
      tls: true
      tlsVerify: false
      httpBasicAuth:
        username: administrator
        password: mypassword
  - types:
    - all
    elasticsearch:
      host: example2.amazonaws.com
      port: 443
      tls: true
      amazonESProxy:
        port: 9200
...

A full list of the configuration parameters is shown below:

• General configuration parameters

– types - the types of logs to ship. The accepted values are:

* platform (kernel, systemd and platform namespace logs)

* application (all other namespaces)

* audit (apiserver audit logs)

* all

• Elasticsearch configuration parameters

– host - IP address or hostname of the target Elasticsearch instance

– port - TCP port of the target Elasticsearch instance

– logstashPrefix - Shipped logs are in a Logstash compatible format. This field specifies the Logstash index prefix

– tls - enable or disable TLS support

– tlsVerify - force certificate validation (only valid when not using the AWS ES Proxy)

– tlsCA - Custom CA certificate for Elasticsearch instance (only valid when not using the AWS ES Proxy)

– httpBasicAuth - configure basic auth (only valid when not using the AWS ES Proxy)

* username

* password

– amazonESProxy - configure AWS ES Proxy

* port - Port to listen on (a free port will be chosen for you if omitted)

Setting up an AWS hosted Elasticsearch Cluster

AWS provides a hosted Elasticsearch cluster that can be used for log aggregation. This snippet will set up an Elasticsearch domain in your account and create a policy along with it that will allow shipping of logs into the cluster:

variable "name" {default = "tarmak-logs"

}

variable "region" {default = "eu-west-1"

}

provider "aws" {region = "${var.region}"

}

data "aws_caller_identity" "current" {}

data "aws_iam_policy_document" "es" {statement {actions = [

"es:*",]

principals {type = "AWS"

identifiers = ["arn:aws:iam::${data.aws_caller_identity.current.account_id}:root",

]}

}}

resource "aws_elasticsearch_domain" "es" {domain_name = "${var.name}"elasticsearch_version = "6.2"

cluster_config {instance_type = "t2.medium.elasticsearch"

}

ebs_options {ebs_enabled = truevolume_type = "gp2"volume_size = 30

}

access_policies = "${data.aws_iam_policy_document.es.json}"}

data "aws_iam_policy_document" "es_shipping" {

(continues on next page)

12 Chapter 2. User guide

Tarmak Documentation, Release 0.3

(continued from previous page)

statement {actions = [

"es:ESHttpHead","es:ESHttpPost","es:ESHttpGet",

]

resources = ["arn:aws:es:${var.region}:${data.aws_caller_identity.current.account_id}:domain/

→˓${var.name}/*",]

}}

resource "aws_iam_policy" "es_shipping" {name = "${var.name}-shipping"description = "Allows shipping of logs to elasticsearch"

policy = "${data.aws_iam_policy_document.es_shipping.json}"}

output "elasticsearch_endpoint" {value = "${aws_elasticsearch_domain.es.endpoint}"

}

output "elasticsearch_shipping_policy_arn" {value = "${aws_iam_policy.es_shipping.arn}"

}

Once terraform has been successfully run, it will output the resulting AWS Elasticsearch endpoint and the policy that allows shipping to it:

Apply complete! Resources: 2 added, 0 changed, 0 destroyed.

Outputs:

elasticsearch_endpoint = search-tarmak-logs-xyz.eu-west-1.es.amazonaws.com
elasticsearch_shipping_policy_arn = arn:aws:iam::1234:policy/tarmak-logs-shipping

Both of those outputs can then be used in the tarmak configuration:

apiVersion: api.tarmak.io/v1alpha1
clusters:
- name: cluster
  loggingSinks:
  - types: ["all"]
    elasticsearch:
      host: ${elasticsearch_endpoint}
      tls: true
      amazonESProxy: {}
  amazon:
    additionalIAMPolicies:
    - ${elasticsearch_shipping_policy_arn}

2.2.4 OIDC Authentication

Tarmak supports authentication using OIDC. The following snippet demonstrates how you would configure OIDC authentication in tarmak.yaml. For details on the configuration options, visit the Kubernetes documentation here. Note that if the version of your cluster is less than 1.10.0, the signingAlgs parameter is ignored.

kubernetes:
  apiServer:
    oidc:
      clientID: 1a2b3c4d5e6f7g8h
      groupsClaim: groups
      groupsPrefix: "oidc:"
      issuerURL: https://domain/application-server
      signingAlgs:
      - RS256
      usernameClaim: preferred_username
      usernamePrefix: "oidc:"
...

For the above setup, ID tokens presented to the apiserver will need to contain claims called preferred_username and groups representing the username and groups associated with the client. These values will then be prepended with oidc: before authorisation rules are applied, so it is important that this is taken into account when configuring cluster authorisation.
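
Because of this prefix, authorisation rules have to reference the prefixed names. As an illustration (the group and binding names here are hypothetical and not part of Tarmak), a standard Kubernetes ClusterRoleBinding granting cluster-admin to a group from the groups claim could look like:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: oidc-cluster-admins
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: "oidc:cluster-admins"   # value of the groups claim, prefixed with oidc: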

2.2.5 Jenkins

You can install Jenkins as part of your hub. This can be achieved by adding an extra instance pool to your hub. This instance pool can be extended with an annotation tarmak.io/jenkins-certificate-arn. The value of this annotation will be an ARN pointing to an Amazon certificate. When you set this annotation, your Jenkins will be secured with HTTPS. You need to make sure your SSL certificate is valid for jenkins.<environment>.<zone>.

- image: centos-puppet-agent
  maxCount: 1
  metadata:
    annotations:
      tarmak.io/jenkins-certificate-arn: "arn:aws:acm:eu-west-1:228615251467:certificate/81e0c595-f5ad-40b2-8062-683b215bedcf"
    creationTimestamp: null
    name: jenkins
  minCount: 1
  size: large
  type: jenkins
  volumes:
  - metadata:
      creationTimestamp: null
      name: root
    size: 16Gi
    type: ssd
  - metadata:
      creationTimestamp: null
      name: data
    size: 16Gi
    type: ssd

...

2.2.6 Tiller

Another configuration option allows Tiller, the server-side component of Helm, to be deployed. Tiller listens for requests on the loopback device only. This makes sure that no other Pod in the cluster can speak to it, while Helm clients are still able to access it using port forwarding through the API server.

As the Helm and Tiller minor versions need to match, the tarmak configuration also allows the deployed version to be overridden:

kubernetes:
  tiller:
    enabled: true
    version: 2.9.1

Warning: Tiller is deployed with the full cluster-admin ClusterRole bound to its service account and therefore has quite far-reaching privileges. Also consider Helm’s security best practices.

2.2.7 Prometheus

By default Tarmak will deploy a Prometheus installation and some exporters into the monitoring namespace.

This can be completely disabled with the following cluster configuration:

kubernetes:
  prometheus:
    enabled: false

Another possibility would be to use the Tarmak provisioned Prometheus only for scraping exporters on instances that are not part of the Kubernetes cluster. Using federation, those metrics could then be integrated into an existing Prometheus deployment.

To have Prometheus only monitor nodes external to the cluster, use the following configuration instead:

kubernetes:
  prometheus:
    enabled: true
    mode: ExternalScrapeTargetsOnly

Finally, you may wish to have Tarmak only install the exporters on the external nodes. If this is your desired configuration, then set the following mode in the yaml:

kubernetes:
  prometheus:
    enabled: true
    mode: ExternalExportersOnly

2.2.8 API Server

It is possible to let Tarmak create a public endpoint for your API server. This can be used together with Secure public endpoints.

kubernetes:
  apiServer:
    public: true

2.2.9 Secure public endpoints

Public endpoints (Jenkins, bastion host and, if enabled, the apiserver) can be secured by limiting access to a list of CIDR blocks. This can be configured at the environment level for all public endpoints and, if wanted, can be overridden for a specific public endpoint.

Environment level

This can be done by adding an adminCIDRs list to an environments block. If nothing has been set, the default is 0.0.0.0/0:

environments:
- contact: [email protected]
  location: eu-west-1
  metadata:
    name: example
  privateZone: example.local
  project: example-project
  provider: aws
  adminCIDRs:
  - x.x.x.x/32
  - y.y.y.y/24

Jenkins and bastion host

The environment level can be overridden for Jenkins and the bastion host by adding allowCIDRs to the instance pool block:

instancePools:
- image: centos-puppet-agent
  allowCIDRs:
  - x.x.x.x/32
  maxCount: 1
  metadata:
    name: jenkins
  minCount: 1
  size: large
  type: jenkins

API Server

For the API server you can override the environment level by adding allowCIDRs to the kubernetes block.

Warning: For this to work, you need to set your API Server public first.

kubernetes:
  apiServer:
    public: true
    allowCIDRs:
    - y.y.y.y/24

2.2.10 Additional IAM policies

Additional IAM policies can be added by adding their ARNs to the tarmak.yaml config. You can add additional IAM policies to the cluster and instance pool blocks. When you define additional IAM policies on both levels, they will be merged when applied to a specific instance pool.

Cluster

You can add additional IAM policies that will be applied to all the instance pools of the whole cluster.

apiVersion: api.tarmak.io/v1alpha1
clusters:
- amazon:
    additionalIAMPolicies:
    - "arn:aws:iam::xxxxxxx:policy/policy_name"

Instance pool

It is possible to add extra policies to only a specific instance pool.

- image: centos-puppet-agent
  amazon:
    additionalIAMPolicies:
    - "arn:aws:iam::xxxxxxx:policy/policy_name"
  maxCount: 3
  metadata:
    name: worker
  minCount: 3
  size: medium
  subnets:
  - metadata:
      zone: eu-west-1a
  - metadata:
      zone: eu-west-1b
  - metadata:
      zone: eu-west-1c
  type: worker

CHAPTER 3

Command-line tool reference

Here are the commands and resources for the tarmak command-line tool.

3.1 Commands

3.1.1 kubectl

Run kubectl on clusters (Alias for $ tarmak clusters kubectl).

Usage:

$ tarmak kubectl

3.1.2 init

• Initialises a provider if not existing.

• Initialises an environment if not existing.

• Initialises a cluster.

Usage:

$ tarmak init

3.2 Resources

Tarmak has three resources that can be acted upon - environments, providers and clusters.

Usage:

$ tarmak [providers | environments | clusters] [command]

3.2.1 Providers

Providers resource sub-command.

list

List providers resource.

Usage:

$ tarmak providers list

init

Initialise providers resource.

Usage:

$ tarmak providers init

3.2.2 Environments

Environments resource sub-command.

list

List environments resource.

Usage:

$ tarmak environments list

init

Initialise environments resource.

Usage:

$ tarmak environments init

3.2.3 Clusters

Clusters resource sub-command.

list

List clusters resource.

Usage:

$ tarmak clusters list

init

Initialise cluster resource.

Usage:

$ tarmak clusters init

kubectl

Run kubectl on clusters resource.

Usage:

$ tarmak clusters kubectl

ssh <instance_name>

Secure Shell into an instance on clusters.

Usage:

$ tarmak clusters ssh <instance_name>

apply

Apply changes to a cluster (by default applies infrastructure (Terraform) and configuration (Puppet) changes).

Usage:

$ tarmak clusters apply

Flags:

--infrastructure-stacks [state,network,tools,vault,kubernetes]
    target exactly one piece of the infrastructure (aka terraform stack). This implies (--infrastructure-only)
--infrastructure-only [default=false]
    only apply infrastructure (aka terraform)
--configuration-only [default=false]
    only apply configuration (aka puppet)
--dry-run [default=false]
    show changes only, do not actually execute them

destroy

Destroy the infrastructure of a cluster

Usage:

$ tarmak clusters destroy

Flags:

--infrastructure-stacks [state,network,tools,vault,kubernetes]
    target exactly one piece of the infrastructure (aka terraform stack). This implies (--infrastructure-only)
--force-destroy-state-stack [default=false]
    force destroy the state stack, this is irreversible
--dry-run [default=false]
    show changes only, do not actually execute them

instances [ list | ssh ]

Instances on Cluster resource.

list

Lists nodes of the context.

ssh

Alias for $ tarmak clusters ssh.

Usage:

$ tarmak clusters instances [list | ssh]

server-pools [ list ]

list

List server pools on Cluster resource.

Usage:

$ tarmak clusters server-pools list

images [ list | build ]

list

List images on Cluster resource.

build

Build images of Cluster resource.

Usage:

$ tarmak clusters images [list | build]

debug [ terraform shell | puppet | etcd | vault ]

Used for debugging.

terraform shell

Debug terraform via shell.

Usage:

$ tarmak clusters debug terraform [shell]

puppet

Debug puppet.

Usage:

$ tarmak clusters debug puppet []

etcd

Debug etcd.

Usage:

$ tarmak clusters debug etcd [status|shell|etcdctl]

vault

Debug vault.

Usage:

$ tarmak clusters debug vault [status|shell|vault]

CHAPTER 4

Developer guide

Here we will walk through how to compile the Tarmak CLI and documentation from source.

4.1 Building Tarmak

4.1.1 Prerequisites

• Go (for the CLI)

• Python 2.x (for documentation)

• virtualenv and virtualenvwrapper (for documentation)

4.1.2 Building Tarmak binary

First we will clone the Tarmak repository and build the tarmak binary. Make sure you have your $GOPATH set correctly. The last line may change depending on your architecture.

mkdir -p $GOPATH/src/github.com/jetstack
cd $GOPATH/src/github.com/jetstack
git clone git@github.com:jetstack/tarmak.git
cd tarmak
make build
ln -s $PWD/tarmak_$(uname -s | tr '[:upper:]' '[:lower:]')_amd64 /usr/local/bin/tarmak

You should now be able to run tarmak to view the available commands.

$ tarmak
Tarmak is a toolkit for provisioning and managing Kubernetes clusters.

Usage:
  tarmak [command]

Available Commands:
  clusters      Operations on clusters
  environments  Operations on environments
  help          Help about any command
  init          Initialize a cluster
  kubectl       Run kubectl on the current cluster
  providers     Operations on providers
  version       Print the version number of tarmak

Flags:
  -c, --config-directory string   config directory for tarmak's configuration (default "~/.tarmak")
  -h, --help                      help for tarmak
  -v, --verbose                   enable verbose logging

Use "tarmak [command] --help" for more information about a command.

4.1.3 Building Tarmak documentation

To build the documentation run the following.

cd $GOPATH/src/github.com/jetstack/tarmak/docs
make html

Or using docker:

cd $GOPATH/src/github.com/jetstack/tarmak/docs
make docker_html

You can now open _build/html/index.html in a browser or serve the site with a web server of your choice.

4.1.4 Updating puppet subtrees

Puppet modules are maintained as separate repositories, which get bundled into tarmak using git subtree. To pull the latest changes from the upstream repositories, run make subtrees.

4.2 Release Checklist

This is a list to collect manual tasks/checks necessary for cutting a release of Tarmak:

• Ensure release references are updated (don’t forget to commit)

make release VERSION=x.y.z

• Tag release commit with x.y.z and push to GitLab and GitHub

• Update the CHANGELOG using the release notes

# relnotes is the golang tool from https://github.com/kubernetes/release/tree/master/toolbox/relnotes
relnotes -repo tarmak -owner jetstack -doc-url=https://docs.tarmak.io -htmlize-md -markdown-file CHANGELOGX.md x.y(-1).z-1..x.y.z

• Branch out minor releases into release-x.y

After the release job has run:

• Make sure we update the generated releases page

4.3 Setting up a Puppet Development Environment

In order to develop the Puppet modules for configuring Tarmak instances we need to set up our environment properly. The following instructions will walk through the process on a fresh Ubuntu 16.04 LTS instance.

Install Ruby development tools (http://www.nokogiri.org/tutorials/installing_nokogiri.html):

sudo apt-get install build-essential patch ruby-dev zlib1g-dev liblzma-dev
sudo gem install bundler

To test your environment, verify a module:

cd puppet/modules/kubernetes
make verify

Install the latest version of vagrant:

wget https://releases.hashicorp.com/vagrant/2.1.1/vagrant_2.1.1_x86_64.deb
sudo dpkg -i vagrant_2.1.1_x86_64.deb

Install vagrant-libvirt (https://github.com/vagrant-libvirt/vagrant-libvirt). You should now be able to run acceptance tests:

make acceptance

To keep any VMs around for debugging purposes, use the following commands instead of the acceptance target:

BEAKER_provision=yes BEAKER_destroy=no bundle exec rake beaker
bundle exec rake beaker:ssh

CHAPTER 5

Proposals

This section should contain design and implementation proposals for Tarmak. Once the proposal has been implemented it should be preserved to serve as a concrete reference for why certain decisions were made. It also allows us to check if the implementation has drifted from the proposal.

5.1 Terraform Provider

This proposal suggests how to approach the implementation of a Terraform provider for Tarmak, to make Tarmak <-> Terraform interactions within a Terraform run more straightforward.

5.1.1 Background

Right now the Terraform code for the AWS provider (the only one implemented) consists of multiple separate stacks (state, network, tools, vault, kubernetes). The main reason for having these stacks is to enable Tarmak to do operations in between parts of the resource spin up. Examples for such operations are (list might not be complete):

• Bastion node needs to be up for other instances to check into Wing.

• Vault needs to be up and initialised, before PKI resources are created.

• Vault needs to contain a cluster’s PKI resources, before Kubernetes instances can be created (init-tokens).

The separation of stacks comes with some overhead for preparing a Terraform apply (pull state, lock stack, plan run, apply run). Terraform also can’t make use of parallel creation of resources that are independent of each other.

5.1.2 Objective

An integration of these stacks into a single stack could lead to a substantial reduction of execution time.

As Terraform runs in a container, it is quite isolated from the Tarmak process:

• Requires some Terraform refactoring

• Should be done before implementing multiple providers

5.1.3 Changes

Terraform code base

Terraform resources

The proposal is to implement a Tarmak provider for Terraform, with at least these three resources.

tarmak_bastion_instance

A bastion instance

Input:
- Bastion IP address or hostname
- Username for SSH

Blocks until Wing API server is running.

tarmak_vault_cluster

A vault cluster

Input:
- List of Vault internal FQDNs or IP addresses

Blocks until Vault is initialised & unsealed.

tarmak_vault_instance_role

This creates (once per role) an init token for such instances in Vault.

Input:
- Name of Vault cluster
- Role name

Output:
- init_token per role

Blocks until the init token is set up.

5.1.4 Notable items

Communication with the process in the container

The main difficulty is communication with the Tarmak process, as Terraform is run within a Docker container with no communication available to the main Tarmak process (stdin/out is used by the Terraform main process).

The proposal suggests that all terraform-provider-tarmak resources block until the point when the main Tarmak process connects, using another exec, to a so-called tarmak-connector executable that speaks via a local Unix socket to the terraform-provider-tarmak.

This provides a secure and platform-independent channel between Tarmak and terraform-provider-tarmak.

<<Tarmak>> -- launches -- <<terraform|container>>

  stdIN/OUT -- exec into ---- <<exec terraform apply>>
                              <<subprocess terraform-provider-tarmak>>
                                   |  listens
                              unix socket
                                   |  connects
  stdIN/OUT -- exec into ---- <<exec tarmak-connector>>

The protocol on that channel should be using Golang’s net/rpc.

Initial proof-of-concept

An initial proof-of-concept has been explored to test what the Terraform model looks like. Although it does not really implement anything at this point, it might serve as a starting point.

5.1.5 Out of scope

This proposal is not suggesting that we migrate features that are currently done by the Tarmak main process. The reason for that is that we don’t want Terraform to become involved in the configuration provisioning of e.g. the Vault cluster. This proposal should only improve the control we have from Tarmak over things that happen in Terraform.

5.2 Custom Vault Auth Provider for EC2 Instances - A Proposal

This proposal suggests writing a custom authentication provider for Vault. This provider would allow access privileges, especially to the PKI secrets provider, to be locked down per EC2 Instance, which is not currently possible.

5.2.1 Background

Glossary

• IAM Role (iam-role) - an IAM principal. Our instances have an IAM role attached which they use as a service account. https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html

• IAM Policy (iam-policy) - the authz rules (and rules documents) in AWS.

• Vault Token (vault-token) - common identity document given by all vault auth providers. Has a list of policy (documents).

• Vault Policy (vault-policy) - both the policy language itself (giving authorization for vault operations), a document of those policy statements, and the list of such documents named on a vault-token. vault-policy has its own type, which we’ll call policy

• Vault PKI Role (pki-role) - vault-policy isn’t very powerful with respect to PKI operations, e.g. it doesn’t let you lock PKI certificate issuance down to specific CNs etc. The PKI secrets provider has its own mechanism for this: the PKI role. pki-roles live at paths and access is controlled by vault-policy. pki-roles are implemented as vault generic secrets. https://www.vaultproject.io/api/secret/pki/index.html#create-update-role

Problem

We would like to have individual Vault policies per EC2 instance, so that they can’t issue themselves certificates for anyone else’s FQDN.

See:

• https://github.com/jetstack/tarmak/issues/120

• https://github.com/jetstack/tarmak/issues/34

Possibilities

Due to several Vault limitations, especially in the free version, it’s not possible to:

• Add a vault-policy (name) to a token after it’s been created (by vault ec2 login)

• Specify a vault-policy expressive enough to re-use one policy for all EC2 Instances (i.e. no variable back-references)

• Vault Premium has a different policy language called Sentinel, which looks like it could do this

• Use the k8s auth provider, as we also want to use this mechanism with etcd Instances, etc, which do not run Kubelet

5.2.2 Objective

Solution

Our proposed solution is a custom Vault auth provider. This will auth EC2 instances using their IID like the current AWS auth provider does, but it will also generate new vault-policies and pki-roles on the fly, which will e.g. limit their cert creation power to just that instance’s DNS name.

This vault auth plugin will serve exactly one environment (i.e. all kubernetes clusters in the same (provider, region))

On login, our provider will:

• Auth the Instance like the AWS provider’s EC2 mode does (can we simply defer to that code?)

• Match the iam-role attached to the Instance against our Provider’s config, using its ARN

• Make a pki-role from the configured template for that iam-role

• Make a vault-policy (document) from the configured template for that iam-role, including templating in the name of the pki-role

– This should have a unique name based on the AWS Instance ID / boot ID / etc

• Like any other auth provider, ultimately make and return a vault-token (from a template?), including the unique vault-policy just made

5.2.3 Changes

Configuration

Config of the provider will be through Vault paths as normal.

config/client

Global configuration of the plugin.

Fields:

cloud_provider=[aws|gce]
aws_access_id= (use instance role if empty)
aws_secret_id= (use instance role if empty)
aws_region= (detect from metadata service if empty)
gce_* (equivalents...)

roles/<rolename>

Tell Vault about the IAM roles in use by the instances.

Recall that each instance we bring up has an IAM role attached depending on its type, e.g. etcd, master, or worker. We can easily use this to tell the different instance types apart, as they need different policy templates.

Note: we have to store this information at a vault path, which means coming up with yet another set of symbolic names, and a name for this type of thing. I’ll loosely call them “roles”, as there’s a 1:1 mapping with the IAM Roles they’re modelling. Recall that the vault server is shared between clusters. We won’t do any namespacing in the path, so we should strongly encourage (or enforce?) that names are e.g. alice_cluster-etcd_role

Fields:

iam_role_arn="arn:aws:iam::$account:role/$role"base_path="..." # base path to cluster's vault secrets, such that the kubernetes PKI→˓ lives at {{base_path}}/pki/k8s. E.g. "dev-cluster"

templates/<rolename>/<templatename>

Recall that we need to dynamically create two types of thing from templates:

• vault-policies; to attach to tokens, which amongst other things point at pki-roles

• pki-roles; to actually limit EC2 Instances’ PKI power. These are stored as generic secrets

Semantics:

• These templates are to be golang templates, and substitute at least: {{.InstanceHash}}, {{.FQDN}}, {{.InternalIPv4}}, {{.BasePath}}

• Secrets are to be specified in JSON

• Policies are to be specified in vault free-edition policy-language

• path is where the rendered template should be written during the log-in process, relative to the base_path

– e.g. /pki/k8s/roles/:name would result in a pki-role at /alice_cluster/pki/k8s/roles/kubelet

– e.g. such a role for the kubelet would be created by a “worker” role.

Fields:

type="policy|generic"path="relative/path/of/template/output"template="<golang template of either policy document or JSON-encoded generic secret>"

5.2.4 Notable items

Concerns

• Huge part of security critical code in our hands

• Clean up of roles and templates once they are no longer used

• Work will be needed for each additional cloud provider we want to support.

5.2.5 Out of scope

• AWS auth provider’s IAM mode

CHAPTER 6

Known issues

This document summarises some of the known issues users may come across when running Tarmak and how to deal with them.

6.1 An alias with the name arn:aws:kms:<region>:<id>:alias/tarmak/<environment>/secrets already exists

If you lose your terraform state file after spinning up a cluster with Tarmak, terraform cannot delete anything that was in that state file. On the next run of Tarmak, terraform will try to recreate the resources required for your cluster. One such resource is AWS KMS aliases, which need to be unique and cannot be deleted through the AWS console. In order to delete these aliases manually based on the error above you can run:

aws kms delete-alias --region <region> --alias-name alias/tarmak/<environment>/secrets

6.2 AWS key pair is not matching the local one

If you run into the following error when running tarmak clusters apply:

FATA[0004] error preparing container: error validating environment: 1 error occurred:

* aws key pair is not matching the local one, aws_fingerprint=<aws_fingerprint> local_fingerprint=<local_fingerprint>

then there is a mismatch between your key pair’s public key stored by AWS and your local key pair. To remedy this you must either create a new key pair and upload it to AWS manually, or delete your existing key pair through the AWS console and re-run tarmak clusters apply.

CHAPTER 7

Troubleshooting

7.1 Calico

Calico, as the overlay network policy plugin, plays a vital role in cluster communications.

7.1.1 Install calicoctl

The easiest way of accessing the calico manifest store is using calicoctl in the controller pod:

# exec into calico-controller container of the cluster
kubectl exec -n kube-system -t -i $(kubectl get pods -n kube-system -l k8s-app=calico-policy -o go-template='{{range .items}}{{.metadata.name}}{{end}}') /bin/sh

# download calicoctl
apk --update add curl
curl -O -L https://github.com/projectcalico/calicoctl/releases/download/v3.1.1/calicoctl
chmod +x calicoctl

# request node objects
./calicoctl get nodes

# leave container
exit

CHAPTER 8

Examples

This section covers extra add-ons you can install on Tarmak kubernetes clusters.

8.1 Kube2IAM

Kube2IAM is an extension to kubernetes that will allow you to give fine-grained AWS access to pods without exposing the instance’s full IAM role to every pod. More information about the project can be found on the project page.

8.1.1 Prerequisite

Make sure HELM is activated on the Tarmak cluster. You also need to make sure you can connect to the cluster with your HELM install.

helm versionClient: &version.Version{SemVer:"v2.9.1", GitCommit:→˓"20adb27c7c5868466912eebdf6664e7390ebe710", GitTreeState:"clean"}Server: &version.Version{SemVer:"v2.9.1", GitCommit:→˓"20adb27c7c5868466912eebdf6664e7390ebe710", GitTreeState:"clean"}

8.1.2 Setup

Create instance IAM policy

Every instance that will run the kube2iam pod needs to have a specific IAM policy attached to the IAM instance role of that instance.

The following Terraform project creates an IAM policy that will give instances the ability to assume roles. The assume role is restricted to only have access to roles deployed in a specific path. By doing this, we can limit the number of roles an instance can assume to only the roles that it really needs. The Terraform project has 2 inputs, aws_region

and cluster_name. The project also has 2 outputs defined: the ARN and the path of the IAM policy. The ARN is what you need to give to Tarmak and the path is needed to be able to deploy your roles for the pods in the correct path.

terraform {}

provider "aws" {
  region = "${var.aws_region}"
}

variable "aws_region" {
  description = "AWS Region you want to deploy it in"
  default     = "eu-west-1"
}

variable "cluster_name" {
  description = "Name of the cluster"
}

data "aws_caller_identity" "current" {}

resource "aws_iam_policy" "kube2iam" {
  name        = "kube2iam_assumeRole_policy_${var.cluster_name}"
  path        = "/"
  description = "Kube2IAM role policy for ${var.cluster_name}"

  policy = "${data.aws_iam_policy_document.kube2iam.json}"
}

data "aws_iam_policy_document" "kube2iam" {
  statement {
    sid = "1"

    actions = [
      "sts:AssumeRole",
    ]

    effect = "Allow"

    resources = [
      "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/kube2iam_${var.cluster_name}/*",
    ]
  }
}

output "kube2iam_arn" {
  value = "${aws_iam_policy.kube2iam.arn}"
}

output "kube2iam_path" {
  value = "/kube2iam_${var.cluster_name}/"
}

You can run the Terraform project the following way:

terraform init
terraform apply -var cluster_name=example -var aws_region=eu-west-1

Attach instance policy

Add the created IAM policy ARN to your tarmak config. You can do this by adding additional IAM policies.

Deploy kube2iam

With HELM it is easy to deploy kube2iam with the correct settings.

You can deploy it with the following command:

helm upgrade kube2iam stable/kube2iam \
  --install \
  --version 0.9.0 \
  --namespace kube-system \
  --set=extraArgs.host-ip=127.0.0.1 \
  --set=extraArgs.log-format=json \
  --set=updateStrategy=RollingUpdate \
  --set=rbac.create=true \
  --set=host.iptables=false

We set iptables to false and host-ip to 127.0.0.1 as Tarmak already creates the iptables rule and forwards it to 127.0.0.1:8181. Specific kube2iam options can be found in the documentation of kube2iam.

8.1.3 Usage

Now that kube2IAM is installed on your system, you can start creating roles and policies to give your pods access to AWS resources.

An example creation of an IAM policy and role:

terraform {}

provider "aws" {
  region = "${var.aws_region}"
}

variable "aws_region" {
  description = "AWS Region you want to deploy it in"
  default     = "eu-west-1"
}

variable "cluster_name" {
  description = "Name of the cluster"
}

variable "instance_iam_role_arn" {
  description = "ARN of the instance IAM role"
}

resource "aws_iam_role" "test_role" {
  name = "test_role"
  path = "/kube2iam_${var.cluster_name}/"

  assume_role_policy = "${data.aws_iam_policy_document.test_role.json}"
}

data "aws_iam_policy_document" "test_role" {
  statement {
    sid = "1"

    actions = [
      "sts:AssumeRole",
    ]

    principals {
      type        = "AWS"
      identifiers = ["${var.instance_iam_role_arn}"]
    }
  }
}

resource "aws_iam_role_policy" "test_role_policy" {
  name = "test_policy"
  role = "${aws_iam_role.test_role.id}"

  policy = "${data.aws_iam_policy_document.test_role_policy.json}"
}

data "aws_iam_policy_document" "test_role_policy" {
  statement {
    sid = "1"

    actions = [
      "s3:ListAllMyBuckets",
    ]

    resources = [
      "*",
    ]
  }
}

output "test_role" {
  value = "${aws_iam_role.test_role.arn}"
}

Now you can run this Terraform project the following way:

terraform init
terraform apply -var cluster_name=example -var aws_region=eu-west-1 -var instance_iam_role_arn=arn:aws:iam::xxxxxxx:role/my-instance-role

When you create a role, you need to make sure you deploy it in the correct path and also add an assume role policy to it. That assume role policy needs to grant access to the role ARN that is attached to the instances. In our example Terraform project above we solved that by adding a variable for the instance_iam_role_arn and the cluster_name.

Warning: Make sure you use the IAM role ARN attached to your worker Kubernetes instances as input for instance_iam_role_arn. You can retrieve the IAM role ARN through the AWS console. Don’t confuse this with the IAM policy created earlier.

With the output test_role, you can add that ARN as an annotation to your Pod/Deployment/ReplicaSet etc. In the following example we spin up a pod to list all buckets:

apiVersion: v1
kind: Pod
metadata:
  name: aws-cli
  labels:
    name: aws-cli
  annotations:
    iam.amazonaws.com/role: role-arn
spec:
  containers:
  - name: aws-cli
    image: fstab/aws-cli
    command:
    - "/home/aws/aws/env/bin/aws"
    - "s3"
    - "ls"

CHAPTER 9

Deploying into an existing AWS VPC

Tarmak has experimental support for deploying clusters into an existing AWS VPC.

To enable this, you will need to note down the IDs for the VPC and subnets you want to deploy to.

For example, if we have the following infrastructure (notation in terraform):

provider "aws" {}

data "aws_availability_zones" "available" {}

resource "aws_vpc" "main" {cidr_block = "10.0.0.0/16"enable_dns_support = trueenable_dns_hostnames = true

tags {Name = "test_vpc"

}}

resource "aws_eip" "nat" {vpc = true

}

resource "aws_subnet" "public" {count = "${length(data.aws_availability_zones.available.names)}"vpc_id = "${aws_vpc.main.id}"cidr_block = "${cidrsubnet(cidrsubnet(aws_vpc.main.cidr_block, 3, 0), 3,

→˓count.index)}"availability_zone = "${data.aws_availability_zones.available.names[count.index]}"

tags {Name = "public_${data.aws_availability_zones.available.names[count.index]}"

}}

(continues on next page)

45

Tarmak Documentation, Release 0.3

(continued from previous page)

resource "aws_subnet" "private" {count = "${length(data.aws_availability_zones.available.names)}"vpc_id = "${aws_vpc.main.id}"cidr_block = "${cidrsubnet(aws_vpc.main.cidr_block, 3, count.index + 1)}"availability_zone = "${data.aws_availability_zones.available.names[count.index]}"

tags {Name = "private_${data.aws_availability_zones.available.names[count.index]}"

}}

resource "aws_internet_gateway" "main" {vpc_id = "${aws_vpc.main.id}"

}

resource "aws_nat_gateway" "main" {count = "${length(aws_subnet.public)}"depends_on = ["aws_internet_gateway.main"]allocation_id = "${aws_eip.nat.id}"subnet_id = "${aws_subnet.public.*.id[count.index]}"

}

resource "aws_route_table" "public" {vpc_id = "${aws_vpc.main.id}"

}

resource "aws_route" "public" {route_table_id = "${aws_route_table.public.id}"destination_cidr_block = "0.0.0.0/0"gateway_id = "${aws_internet_gateway.main.id}"

}

resource "aws_route_table" "private" {vpc_id = "${aws_vpc.main.id}"

}

resource "aws_route" "private" {route_table_id = "${aws_route_table.private.id}"destination_cidr_block = "0.0.0.0/0"nat_gateway_id = "${aws_nat_gateway.main.id}"

}

resource "aws_route_table_association" "public" {count = "${length(data.aws_availability_zones.available.names)}"subnet_id = "${aws_subnet.public.*.id[count.index]}"route_table_id = "${aws_route_table.public.id}"

}

resource "aws_route_table_association" "private" {count = "${length(data.aws_availability_zones.available.names)}"subnet_id = "${aws_subnet.private.*.id[count.index]}"route_table_id = "${aws_route_table.private.id}"

}

Run tarmak init as normal. Before running the apply stage, add the following annotations to your cluster's network configuration (located in ~/.tarmak/tarmak.yaml):

network:
  cidr: 10.99.0.0/16
  metadata:
    creationTimestamp: null
    annotations:
      tarmak.io/existing-vpc-id: vpc-xxxxxxxx
      tarmak.io/existing-public-subnet-ids: subnet-xxxxxxxx,subnet-xxxxxxxx,subnet-xxxxxxxx
      tarmak.io/existing-private-subnet-ids: subnet-xxxxxxxx,subnet-xxxxxxxx,subnet-xxxxxxxx

Now you can run tarmak clusters apply and continue as normal.

CHAPTER 10

Vault Setup and Configurations

Vault is a tool developed by Hashicorp that securely manages secrets - handling leasing, key revocation, key rolling, and auditing.

Vault is used in Tarmak to provide a PKI (public key infrastructure). Vault is run on several high-availability instances that serve Tarmak clusters in the same environment (single or multi-cluster).

10.1 Certificate Authorities (CAs)

Vault is used as a Certificate Authority for Tarmak's Kubernetes Clusters. Four CAs are needed to provide multiple roles for each cluster as follows:

• Etcd cluster that serves API server (etcd-k8s)

– server role for etcd daemons

– client role for kube-apiserver

• Etcd cluster that serves as overlay backend (etcd-overlay)

– server role for etcd daemons

– client role for overlay daemons

• Kubernetes API (k8s)

– master role for Kubernetes master components (kube-apiserver, kube-controller-manager, kube-scheduler)

– worker role for Kubernetes worker components (kube-proxy, kubelet)

– admin role for users requesting access to Kubernetes API as admin

– more specific bespoke roles that limit access e.g. read-only, namespace-specific roles

• Verifying aggregated API calls (k8s-api-proxy)

– single role for Kubernetes components (kube-apiserver-proxy)

– verified through a custom API server

– CA stored in the configmap extension-apiserver-authentication in the kube-system namespace

10.2 Init Tokens

Tokens are used as the main authentication method in Vault and provide a mapping to one or more policies. On first boot, each instance generates its own unique token via a given token - the init token. These init-tokens are role dependent, meaning the same init-token is shared only with instances of the same role. Once generated, the init token is erased by all instances in favour of their own new unique token, making the init token no longer accessible on any instance. Unlike the init-tokens, generated tokens are short lived and so need renewal regularly.

10.3 Purpose of Node Unique Tokens

Every instance type has a unique set of policies which need to be realised in order to be able to execute its specific operations. With unique tokens, each instance is able to uniquely authenticate itself against Vault with the required policies that its token maps to. This means renewing and revocation can be controlled on a per-instance basis. With this, each instance generates a private key and sends a Certificate Signing Request (CSR) containing the policies needed. Vault then verifies the CSR by ensuring it matches the requirements of the policy - if successful, it returns a signed certificate. Instances can only obtain certificates from CSRs because of the permissions that their unique token provides. Upon receiving it, the instance will store the signed certificate locally to be shared with its relevant services and start or restart all services which depend on it.

10.4 Expiration of Tokens and Certificates

Both signed certificates and tokens issued to each instance are short lived, meaning they need to be renewed regularly. Two Systemd timers, cert.timer and token-renewal.timer, are run on each instance and will renew its certificate and token at a default interval of 24 hours. This ensures all instances always have valid certificates. If an instance were to go offline or the Vault server became unreachable for a sufficient amount of time, certificates and tokens will no longer be renewable. If a certificate expires it will become invalid and will cause the relevant operation to be halted until its certificates are renewed. If an instance's unique token is not renewed, it will no longer ever be able to authenticate itself against Vault and so will need to be replaced.

10.5 Certificate Roles on Kubernetes CA

etcd-client: Certificates with client flag - short ttl.

etcd-server: Certificates with client and server flag - short ttl.

admin: Allowed to get admin domain, certified for server certificates - long ttl.

kube-apiserver: Allowed to get any domain name certified for server certificates - short ttl.

worker: Allowed to get “kubelet” and “system:node” domains certified for server and client certificates - short ttl.

admin (kube-scheduler, kube-controller-manager, kube-proxy): Allowed to get system:<rolename> domains (i.e. system:kube-scheduler) certified for client certificates - short ttl.

kube-apiserver-proxy: Allowed to get “kube-apiserver-proxy” domain, certified for client certificates - short ttl.

CHAPTER 11

Vault Helper In Tarmak

Vault is used in Tarmak to provide a PKI (public key infrastructure). vault-helper is a tool designed to facilitate and automate the PKI tasks required. Each Kubernetes instance type in a Tarmak cluster needs signed certificates from Vault to operate. These certificates need to be stored locally and regularly renewed.

It is essential for the Vault stack to be executed and completed before the Kubernetes stacks as they rely on communication from Vault. With the Vault stack completed and the connection to the Vault server established, the vault-helper package is used to mount all backends (Etcd, Etcd overlay, K8s, K8s proxy and secrets generic) to Vault if they have not been already. Mounts with incorrect default or max lease TTLs will be re-tuned accordingly.

These backends serve as the CAs for the Kubernetes components. Roles, and then policies for these roles, are written to each CA as described here. The init-token policies and roles are then also written to Vault. This whole process is idempotent.

Vault is now set up correctly for each CA, tokens, roles and policies.

The vault-helper binary is stored on all cluster instances (etcd, worker and master). Two Systemd timers are run on every cluster instance in order to renew both tokens and certificates every day. These will trigger oneshot services (token-renewal.service, cert.service) to be fired, executing the locally stored vault-helper binary to renew certificates and tokens. When executing either renew-token or cert, vault-helper will recognise if an init-token is present in a local file, generating a new unique token to be stored and deleting the init-token. The cert subcommand will ensure a private key is generated, if one does not exist, before sending a CSR to the Vault server. The returned signed certificate is then stored locally, replacing any previous certificates.

CHAPTER 12

Accepting CentOS Terms

Tarmak uses CentOS images for machines in clusters. Before being able to use these images on your AWS account you must accept the terms of use.

1. Visit the image page on the AWS marketplace. Click “Continue to Subscribe” in the top right.

2. Select “Manual Launch” so that we only need to accept the licence rather than creating instances, and click on “Accept Software Terms”.

You should see this confirmation:

After you see this screen you’re done. Unfortunately, sometimes it can take as long as a few hours for your account to be permitted to run instances with the images.
