Microservices on GKE at Mercari · 2018-06-01

Microservices On GKE At Mercari

GCPUG Tokyo Kubernetes Engine Day

@deeeet

Background

Start with Monolith

Small overhead for cross domains 👍
Reusable code across domains 👍
Effective operation by SRE team 👍

3 scalabilities

Growth of business
Growth of features
Growth of organization

Huge Monolith

Difficult to understand the effect of a change 👎
Difficult to test 👎
Difficult to on-board 👎
Difficult to isolate failure 👎
Difficult to scale independently 👎
Difficult to try new technologies 👎

Growth of business Growth of features Growth of organization

Unclear ownership 😩
Communication overhead 😩

Velocity is stalled ☔

Microservices

Microservices is a software development technique that structures an application as a collection of loosely coupled services with the smallest autonomous boundary.

Technical benefit

Easy to test 👍
Easy to deploy 👍
Easy to on-board 👍
Easy to isolate failure 👍
Easy to scale independently 👍

Organization benefit

Clear ownership 😁
Minimum communication overhead 😁

Deliver new features faster ☀

How do we move to microservices?

Gateway pattern
Strangler pattern

Gateway pattern

[Diagram, built up over several slides: an API Gateway is placed in front of the Mercari API, Service A, Service B, and Service X.]

Multiple services on a single endpoint
SSL termination
DDoS protection
Common AuthZ/AuthN
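The "multiple services on a single endpoint" idea can be sketched as a small routing table. This is only an illustration under assumptions: the path prefixes, the backend names, and the `pick_backend` helper are hypothetical and are not taken from the talk or from Mercari's gateway.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str   # URL path prefix owned by one backend
    backend: str  # hypothetical backend name


# Hypothetical route table: prefixes and backend names are illustrative only.
ROUTES = [
    Route("/service-a/", "service-a"),
    Route("/service-b/", "service-b"),
    Route("/service-x/", "service-x"),
]

# Anything that has not been extracted yet stays on the monolith.
DEFAULT_BACKEND = "mercari-api"


def pick_backend(path: str) -> str:
    """Return the backend for a request path, using longest-prefix match."""
    matches = [r for r in ROUTES if path.startswith(r.prefix)]
    if not matches:
        return DEFAULT_BACKEND
    return max(matches, key=lambda r: len(r.prefix)).backend
```

Clients keep talking to one endpoint; only the table changes when a new service appears.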

Strangler pattern

[Diagram, built up over several slides: the API Gateway sits in front of the Mercari API and Services A, B, and X. Functions X, Y, and Z start alongside the Mercari API. Facade C is added, then Function X, Function Y, and Function Z appear under Service C one at a time. Facade C is then dropped, and in the last slide Function Z appears under a new Service D.]
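The migration sequence in the diagrams can be sketched as a facade that sends each function either to the monolith or to the new service. This is a minimal sketch under assumptions: the `Facade` class, the function names `x`, `y`, `z`, and the string-returning handlers are hypothetical stand-ins, not code from the talk.

```python
from typing import Callable, Dict

Handler = Callable[[str], str]


class Facade:
    """Routes each function either to the monolith or to the new service."""

    def __init__(self, monolith: Dict[str, Handler]) -> None:
        self._monolith = dict(monolith)
        self._migrated: Dict[str, Handler] = {}

    def migrate(self, name: str, new_handler: Handler) -> None:
        """Switch one function over to its new implementation."""
        if name not in self._monolith:
            raise KeyError(f"unknown function: {name}")
        self._migrated[name] = new_handler

    def call(self, name: str, payload: str) -> str:
        """Use the migrated handler if there is one, else the monolith."""
        handler = self._migrated.get(name, self._monolith[name])
        return handler(payload)

    def fully_migrated(self) -> bool:
        """True once every function has moved and the facade can be removed."""
        return set(self._migrated) == set(self._monolith)


# Hypothetical handlers standing in for Functions X, Y, and Z.
monolith = {n: (lambda p, n=n: f"monolith:{n}:{p}") for n in ("x", "y", "z")}
facade = Facade(monolith)
facade.migrate("x", lambda p: f"service-c:x:{p}")
```

Callers never change while functions move one by one; once `fully_migrated()` is true, the facade itself can go away.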

Current Status

API Gateway

Service A

Service B

Service X

Mercari API

Technical Stack

[Diagram, built up over several slides. Components: Google Cloud Load balancing, API Gateway, Authority, Service A, Service B, Service X, Mercari API, Sakura, GCP, Kubernetes Engine.]

Annotations added as the diagram builds:

Cloud Resources, Managed Services
Container
Over HTTP
SSL termination, DDoS protection, Cloud Armor?
Routing to microservices
Protocol transformation (HTTP to gRPC)
Common logging & tracing
Request buffering
Common AuthZ/AuthN
Managed DB

Another important takeaway is that even though all of these listed items are important, ultimately the most critical thing is observability. As I like to say: observability, observability, observability.

- Matt Klein, Seeking SRE (Chapter 6)

[Diagram, built up over several slides: Service A calls Service B over the network.]

Questions raised for each call:

Logging? Tracing? (Observability)
AuthN and AuthZ? API limit?
Load balancing? Request timeout? Request retry with backoff? Circuit breaking?
Different protocols...

[Diagrams: Services A, B, C, and D together; Service B shown three times.]
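Two of the concerns listed above, retry with backoff and circuit breaking, can be sketched as follows. This is a generic illustration, not the mechanism used in the talk; `retry_with_backoff`, `CircuitBreaker`, and all default values are hypothetical.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_with_backoff(call: Callable[[], T], attempts: int = 3,
                       base_delay: float = 0.1,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry `call`, doubling the delay (plus jitter) after each failure."""
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay))
    raise ValueError("attempts must be at least 1")


class CircuitBreaker:
    """Fails fast after `threshold` consecutive failures until `reset_after` seconds pass."""

    def __init__(self, threshold: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open")
            self.opened_at = None  # half-open: let one trial call through
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # a success closes the circuit again
        return result
```

Writing this in every service, in every language, is the repetition the questions above point at.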

How do we use GCP?

[Diagram: Google Cloud Load balancing, API Gateway, Authority, and Service X, with GCP and Kubernetes Engine highlighted.]

How do we use GKE?

Cluster strategy
GCP project strategy
Node pool strategy
Namespace strategy

Cluster strategy

asia-northeast1
us-west1
europe-west1

Each region has its own cluster

Production Cluster
Development Cluster

Testing/QA is done in the development cluster

All services run in 1 cluster
No special cluster for a specific service

In the future: 1 region, 1 cluster, like Google Borg

GCP project strategy

GCP project: GKE Production (IAM: SRE)
Production Cluster

GCP project: GKE Development (IAM: SRE + α)
Development Cluster

1 cluster for 1 GCP project

Only SRE can access cluster nodes

Node pool strategy

GCP project: GKE Production
Production Cluster

n1-standard-16 node pool: Normal applications
n1-highmem-16 node pool: Machine learning workloads

Auto scaling: Enabled
Automatic node repair: Enabled
Preemptible: Enabled (only in US)

Namespace strategy

Each service has its own Kubernetes namespace

GCP project: GKE Production
Production Cluster

Namespace: Service A (RBAC: Team X), Pods: A, A, A
Namespace: Service B (RBAC: Team Y), Pods: B, B

Each team can only access its own Kubernetes namespace
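One way to picture "one namespace per service, one team per namespace" is to generate the Kubernetes manifests from the service name. The sketch below builds plain dictionaries in the shape of `Namespace`, `Role`, and `RoleBinding` objects; the role name `developer`, the chosen resources and verbs, and the group name are assumptions for illustration, not Mercari's actual policy.

```python
from typing import Dict, List

RBAC_API = "rbac.authorization.k8s.io"


def namespace_rbac(service: str, team_group: str) -> List[Dict]:
    """Build Namespace, Role, and RoleBinding manifests for one service."""
    namespace = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": service},
    }
    role = {
        "apiVersion": f"{RBAC_API}/v1",
        "kind": "Role",
        "metadata": {"name": "developer", "namespace": service},
        # Hypothetical permission set; a real policy would be narrower or wider.
        "rules": [{
            "apiGroups": ["", "apps"],
            "resources": ["pods", "pods/log", "deployments", "services"],
            "verbs": ["get", "list", "watch", "create", "update", "patch", "delete"],
        }],
    }
    binding = {
        "apiVersion": f"{RBAC_API}/v1",
        "kind": "RoleBinding",
        "metadata": {"name": "developer", "namespace": service},
        # The Role is namespaced, so the team gets no rights elsewhere.
        "subjects": [{"kind": "Group", "name": team_group, "apiGroup": RBAC_API}],
        "roleRef": {"kind": "Role", "name": "developer", "apiGroup": RBAC_API},
    }
    return [namespace, role, binding]
```

For example, `namespace_rbac("service-a", "team-x@example.com")` yields three manifests that could be serialized to YAML and applied.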

[Diagram: Google Cloud Load balancing, API Gateway, Authority, and Service X on GCP and Kubernetes Engine.]

How do we use GCP services?

How do we limit access to GCP services? Each service should be allowed to access only its own GCP resources.

[Diagram, built up over several slides]

GCP project: GKE Production (IAM: SRE)
Production Cluster
Namespace: Service A (RBAC: Team X), Pods: A, A, A
Namespace: Service B (RBAC: Team Y), Pods: B, B

GCP project: Service A (IAM: Team X + SRE), Cloud SQL
GCP project: Service B (IAM: Team Y + SRE), Spanner

Each service has its own GCP project
Service resources live in the service's own GCP project
Each namespace has its own service account for its own GCP project
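The mapping between a service, its namespace, its GCP project, and its service account can be written down as a small data structure. This is a sketch under assumptions: the `example-<service>-<env>` project naming and the helper names are hypothetical; only the `NAME@PROJECT_ID.iam.gserviceaccount.com` email format is standard GCP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceLayout:
    namespace: str        # Kubernetes namespace in the shared cluster
    gcp_project: str      # the service's own GCP project
    service_account: str  # identity used by pods in that namespace


def layout_for(service: str, env: str = "prod") -> ServiceLayout:
    """Derive the per-service layout from the service name."""
    project = f"example-{service}-{env}"  # hypothetical naming convention
    account = f"{service}@{project}.iam.gserviceaccount.com"
    return ServiceLayout(namespace=service, gcp_project=project,
                         service_account=account)


def may_access(layout: ServiceLayout, resource_project: str) -> bool:
    """A service may only touch resources in its own GCP project."""
    return layout.gcp_project == resource_project
```

With this layout, Service A's account has no path to Service B's Spanner instance unless someone grants it explicitly.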

GCP project creation...? Setting up Spanner or Cloud SQL...?

Infrastructure as Code

Cloud SQL instance creation

Spanner instance creation

mercari/microservices-terraform (private repository)

Just create a PR to create a new GCP project

Terraform plan on CI

Terraform apply on CI

The tool for notifying Terraform results is open source: https://github.com/mercari/tfnotify

Common parts (GCP project creation, PagerDuty setup) can be bootstrapped
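The plan-on-PR, apply-on-merge flow can be sketched as a function a CI job might use to choose its Terraform command. The event names, the `master` branch, and the `terraform_command` helper are assumptions; the talk does not say which CI system or branch names are used. The Terraform flags shown (`-input=false`, `-no-color`, `-auto-approve`) are standard CLI flags.

```python
from typing import List


def terraform_command(event: str, branch: str) -> List[str]:
    """Pick the Terraform command for a CI run.

    Pull requests only show the plan; merges to the main branch apply it.
    """
    if event == "pull_request":
        return ["terraform", "plan", "-input=false", "-no-color"]
    if event == "push" and branch == "master":
        return ["terraform", "apply", "-input=false", "-auto-approve", "-no-color"]
    return []  # nothing to do for other branches or events
```

The returned list could be passed to `subprocess.run`, with the output piped to a notifier so reviewers see the plan on the PR.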

Logging...? Stackdriver

How do we limit access to Stackdriver Logging? Each team should be allowed to access only its own service's logs.

[Diagram, built up over several slides: the same project layout as before (GCP project: GKE Production with the Production Cluster; GCP projects Service A and Service B), now with Stackdriver and two BigQuery boxes connected by two sinks.]

Create BigQuery for each service
Create a BigQuery sink for each service

BigQuery sink creation
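A per-service sink can be described by three fields: a name, a destination, and a filter. The sketch below builds such a definition; the naming scheme and the `sink_for` helper are hypothetical, and the filter assumes the `k8s_container` resource type with a `namespace_name` label, which may differ from what the cluster in the talk emitted.

```python
from typing import Dict


def sink_for(service: str, project: str) -> Dict[str, str]:
    """Describe a log sink that exports one namespace's logs to BigQuery."""
    # BigQuery dataset names allow letters, digits, and underscores only.
    dataset = service.replace("-", "_") + "_logs"
    return {
        "name": f"{service}-to-bigquery",
        "destination": (
            f"bigquery.googleapis.com/projects/{project}/datasets/{dataset}"
        ),
        # Assumed filter: only logs from containers in the service's namespace.
        "filter": (
            'resource.type="k8s_container" AND '
            f'resource.labels.namespace_name="{service}"'
        ),
    }
```

Because each dataset sits behind its own access controls, a team can query its own logs without seeing other services' logs.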

GCP and k8s Ecosystem

Just create an Ingress and DNS records are created automatically

with Cloud DNS

Disaster recovery: take backups of your cluster and restore them in case of loss

with Cloud Storage

Non GCP?

vs. Container Builder: Notification or integration with GitHub

vs. Stackdriver Monitoring: Integration with external services like CDN or AWS

vs. Stackdriver Error Reporting: Notification and integration with GitHub

vs. ??: GCP does not have chaos as a service

Conclusion

Mercari ❤

@deeeet
