Percona Live, 2018-11-06
Monitoring Kubernetes with Prometheus
Henri Dubois-Ferriere (@henridf)
Hello.
Henri Dubois-Ferriere
Technical Director, Sysdig
Doing “observability” for many, many years, from networking to web apps via many startups.
PhD in CS from EPFL
Repatriate from San Francisco to Switzerland
Outline
● Kubernetes
● Prometheus
● Kubernetes metrics & sources
● Deployment
Why monitor?
● Know about outages before users tell me
● Understand my production environment (or try…)
● Plan/trend/forecast
Kubernetes
Kubernetes
- Container orchestration system
- aka “OS for your cluster”
- Abstracts away the underlying infra
- declarative APIs with control loops
https://commons.wikimedia.org/wiki/File:Kubernetes.png
Prometheus
Prometheus
❏ Started at SoundCloud in 2012
❏ Motivated by challenges with monitoring dynamic
environments
❏ Open-sourced in 2015; now the second CNCF “graduated” project (after Kubernetes)
More than a TSDB
https://prometheus.io/assets/architecture.png
It’s all about the pull
- Prom scrapes targets to get metrics
- Nice side effect: you know when a target is down
- Needs to know what to scrape
What should Prometheus scrape?
- Service discovery provides answer
- Azure, Consul, GCE, K8S, EC2, ...
- Can also watch a file containing target list
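As a sketch of the file-based option, a minimal file_sd_configs stanza in prometheus.yml; the job name and file path here are illustrative, not prescribed:

```yaml
scrape_configs:
  - job_name: 'file-targets'            # hypothetical job name
    file_sd_configs:
      - files:
          - /etc/prometheus/targets.yml  # hypothetical path; re-read when the file changes
# targets.yml would contain entries like:
# - targets: ['10.0.0.1:9100', '10.0.0.2:9100']
#   labels:
#     env: prod
```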
Dimensional data model

Query: http_requests_total{code="200", method="get"}
- http_requests_total: metric name
- {code="200", method="get"}: selector (aka filter)

Response:
http_requests_total{code="200", method="get", route="/api/users"} 1528706829.115 1741
http_requests_total{code="200", method="get", route="/api/objects"} 1528706829.115 1920
- code, method, route: label/value pairs (aka dimensions)
- 1528706829.115: timestamp; 1741 / 1920: value
Metadata discovery
- SD also provides metadata
- Metadata can be mixed in with metrics
- Powerful relabelling feature for label manipulation at
ingest
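As a flavor of relabelling, a hedged sketch of a pod scrape job that keeps only annotated pods and copies SD metadata into metric labels (the prometheus.io/scrape annotation is a common convention, not a requirement):

```yaml
scrape_configs:
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Only scrape pods annotated prometheus.io/scrape=true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Copy SD metadata into metric labels
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod_name
```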
Instrumentation
Off-the-shelf or write your own
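Under the hood, both routes produce the same thing: the Prometheus text exposition format served on /metrics. A toy sketch of what one sample looks like (the helper below is purely illustrative; real apps should use an official client library such as prometheus_client for Python or the Go client):

```python
# Toy renderer for the Prometheus text exposition format.
# Illustrative only -- not the official client library.

def render_metric(name, labels, value):
    """Render one sample: metric_name{label="value",...} value."""
    label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f"{name}{{{label_str}}} {value}"

lines = [
    "# HELP http_requests_total Total HTTP requests.",
    "# TYPE http_requests_total counter",
    render_metric("http_requests_total", {"code": "200", "method": "get"}, 1741),
]
print("\n".join(lines))
```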
Kubernetes metrics
Monitoring resources and methods
- For resources like memory, queues, CPUs, disks, ...
  - USE Method: Utilization, Saturation, Errors
  - http://www.brendangregg.com/usemethod.html
- For services
  - RED Method: Request rate, Error rate, Duration
  - https://www.weave.works/blog/the-red-method-key-metrics-for-microservices-architecture/
node_exporter: node metrics
- Host metrics: CPU, memory, disk, network, ...
- Not K8S-specific, but useful as a reference point and for cluster totals
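Two typical node_exporter queries (hedged: metric names follow node_exporter 0.16+ naming, e.g. node_cpu_seconds_total; older versions used node_cpu):

```promql
# Per-node CPU utilization (fraction of time not idle)
1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))

# Free space per filesystem, as a fraction
node_filesystem_avail_bytes / node_filesystem_size_bytes
```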
cAdvisor: container metrics
- Runs in the kubelet (usually, for now...)
- Resource stats about running containers
- Mostly container- and node-level labels...
- (k8s: plus namespace and pod_name)
Sample cAdvisor metric queries
Percent of total cluster memory used:
sum(container_memory_rss) / sum(machine_memory_bytes)

Memory used by Kubernetes namespace:
sum(container_memory_rss) by (namespace)

Top 5 pods by network I/O:
topk(5, sum by (pod_name) (rate(container_network_transmit_bytes_total[5m])))
Kube-state metrics

$ kubectl get deploy my-app -o yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: my-app
  ...
spec:
  replicas: 4
  ...
status:
  replicas: 4
  ...
Metrics created by kube-state-metrics, with the label set taken from this deployment:

kube_deployment_spec_replicas{deployment="my-app", ...}
kube_deployment_status_replicas{deployment="my-app", ...}
Sample kube-state-metrics queries
Deployments with issues:
kube_deployment_spec_replicas != kube_deployment_status_replicas_available

Top 10 longest-running pods (“reverse uptime”):
topk(10, sort_desc(time() - kube_pod_created))
Kube core service metrics
- API Server
- etcd3
- kube-dns
- scheduler, controller-manager
Metrics recap
Source                                    | Deployment mode | How many              | Metrics about
node_exporter                             | daemonset       | 1 per node            | node resources
cAdvisor                                  | inside kubelet  | 1 per node            | container resources
kube-state-metrics                        | deployment      | singleton             | k8s object state
etcd, API Server, controller manager, ... | core service    | singleton or HA group | itself
Deploying
Monitoring from the inside
- Monitoring runs inside the thing being monitored?
- Yes. It’s fine, really. Really, it’s fine.
- (And being outside has its own challenges)
Deployment outline
- Metrics services
  - node_exporter
  - kube-state-metrics
  - (cAdvisor usually enabled out of the box)
- Prometheus running
  - Storage
  - Read access to the API server (for service discovery)
  - Service discovery config for the above
  - Service discovery config for apps/services
Helm-based install

helm fetch stable/prometheus
vi prometheus/values.yaml # configure install
helm upgrade -i # or manually deploy yaml
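The values.yaml knobs most relevant to the pieces above look roughly like this (key names as in the stable/prometheus chart at the time; verify against your chart version):

```yaml
nodeExporter:
  enabled: true          # daemonset, 1 per node
kubeStateMetrics:
  enabled: true          # singleton deployment
server:
  persistentVolume:
    enabled: true        # storage for the TSDB
    size: 8Gi            # illustrative size
```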
Prometheus operator
- Uses Kubernetes API facilities to make Prometheus “native”
- New Prometheus-related objects: `kubectl get prometheus`
- PrometheusRule, ServiceMonitor, AlertManager, AlertingSpec, ...
- Prometheus configuration abstracted via all these objects
- Young but promising
- Consider the more direct route first (hand-rolled or Helm), and the Operator once you are familiar with the challenges of the direct route
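For a flavor of these objects, a ServiceMonitor sketch telling the Operator which Services to scrape (the name, label selector, and port name are illustrative):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app            # hypothetical name
spec:
  selector:
    matchLabels:
      app: my-app          # hypothetical Service label
  endpoints:
    - port: metrics        # named port on the Service
      interval: 30s
```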
Thank You.
Henri Dubois-Ferriere (@henridf)
Pointers
- Prometheus SD for Kubernetes:
https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config
- KSM metrics: https://github.com/kubernetes/kube-state-metrics/tree/master/Documentation
- Prometheus Helm chart: https://github.com/helm/charts/tree/master/stable/prometheus
- Prometheus operator: https://github.com/coreos/prometheus-operator
- “A deep dive into Kubernetes metrics” blog series:
https://blog.freshtracks.io/a-deep-dive-into-kubernetes-metrics-66936addedae