31
Copyright©2017 NTT Corp. All Rights Reserved. Akihiro Suda <[email protected]> NTT Software Innovation Center Parallelizing CI using Docker Swarm-Mode Open Source Summit Japan (June 1, 2017) Last update: June 1, 2017

Parallelizing CI using Docker Swarm-Mode

Embed Size (px)

Citation preview

Page 1: Parallelizing CI using Docker Swarm-Mode

Copyright©2017 NTT Corp. All Rights Reserved.

Akihiro Suda <[email protected]>NTT Software Innovation Center

Parallelizing CI using Docker Swarm-Mode

Open Source Summit Japan (June 1, 2017) Last update: June 1, 2017

Page 2: Parallelizing CI using Docker Swarm-Mode

2Copyright©2017 NTT Corp. All Rights Reserved.

•Software Engineer at NTT Corporation

•Several talks at FLOSS community• FOSDEM 2016• ApacheCon Core North America 2016, etc.

• Docker Moby Project core maintainer

Docker project transited into Moby Project (April, 2017).Now Docker products are "downstreams" of Moby Project.

: ≒ :

github.com/AkihiroSuda

Page 3: Parallelizing CI using Docker Swarm-Mode

3Copyright©2017 NTT Corp. All Rights Reserved.

A problem in Docker/Moby project: CI is slow

https://jenkins.dockerproject.org/job/Docker-PRs/buildTimeTrendcapture: March 3, 2017

120 min

red valley = test failed immediately L

Page 4: Parallelizing CI using Docker Swarm-Mode

4Copyright©2017 NTT Corp. All Rights Reserved.

How about other FLOSS projects?

https://builds.apache.org/view/All/job/Pr

eCommit-HDFS-Build/buildTimeTrend

https://builds.apache.org/view/All/job/Pr

eCommit-YARN-Build/buildTimeTrend

https://amplab.cs.berkeley.edu/jenkins/jo

b/SparkPullRequestBuilder/buildTimeTrend

capture: March 3, 2017200 min

260 min

240 min

https://grpc-

testing.appspot.com/job/gRPC_pull_requests_linux/buildTimeTrend

550 min

(for ease of visualization, picked up some projects that use Jenkins from various categories)

Page 5: Parallelizing CI using Docker Swarm-Mode

5Copyright©2017 NTT Corp. All Rights Reserved.

•Blocker for reviewing/merging patches

•Discourages developers from writing tests

•Discourages developers from enabling additional testing features

• e.g. `go test –race` (race detector) is 2-20x slower

Why slow CI matters?

source: https://golang.org/doc/articles/race_detector.html

Page 6: Parallelizing CI using Docker Swarm-Mode

6Copyright©2017 NTT Corp. All Rights Reserved.

Why slow CI matters?

Can result in poor implementation quality &

slow development cycle

Page 7: Parallelizing CI using Docker Swarm-Mode

7Copyright©2017 NTT Corp. All Rights Reserved.

Solution: Parallelize CI ?

Task

CPUs

Machine

General

$ parallel –-max-procs N ...

Go

$ go test –parallel N ...

Java (Maven)

$ mvn –-threads N ...

Python (nose)

$ nosetests –-processes N ...

Page 8: Parallelizing CI using Docker Swarm-Mode

8Copyright©2017 NTT Corp. All Rights Reserved.

But just doing parallelization is not enough

•No isolation• Concurrent test tasks may race for certain shared resources(e.g. files under `/tmp`, TCP port, ...)

•Poor scalability• CPU/RAM resource limitation• I/O parallelism limitation

Solution: Parallelize CI ?

Page 9: Parallelizing CI using Docker Swarm-Mode

9Copyright©2017 NTT Corp. All Rights Reserved.

•Docker provides isolation•Swarm-mode provides scalability

Solution: Docker (in Swarm-mode)

Distribute across Swarm-modeParallelize & Isolateusing Docker containers

✓Isolation ✓Scalability

Page 10: Parallelizing CI using Docker Swarm-Mode

10Copyright©2017 NTT Corp. All Rights Reserved.

•For ideal isolation, each of the test functions should be encapsulated into independent containers

•But this is not optimal in actual due to setup/teardown code

Challenge 1: Redundant setup/teardown

func TestMain(m *testing.M) {

setUp()

m.Run()

tearDown()

}

func TestFoo(t *testing.T)

func TestBar(t *testing.T)

redundantly executed

setUp()

testFoo.Run()

tearDown()

container

setUp()

testBar.Run()

tearDown()

container

Page 11: Parallelizing CI using Docker Swarm-Mode

11Copyright©2017 NTT Corp. All Rights Reserved.

•Solution: execute a chunk of multiple test functions sequentially in a single container

Optimization 1: Chunking

setUp()

testFoo.Run()

tearDown()

container

setUp()

testBar.Run()

tearDown()

containerchunk

setUp()

testFoo.Run()

testBar.Run()

tearDown()

container

Page 12: Parallelizing CI using Docker Swarm-Mode

12Copyright©2017 NTT Corp. All Rights Reserved.

• Test sequence is typically executed in lexicographic order (ABC...)

• Observation: long-running test functions concentrate on a certain portion

• because testing similar scenarios

Challenge 2: Makespan non-uniformity

Execution Order

TestApple1TestApple2

TestBanana1

TestBanana2

TestCarrot1TestCarrot2

Page 13: Parallelizing CI using Docker Swarm-Mode

13Copyright©2017 NTT Corp. All Rights Reserved.

Challenge 2: Makespan non-uniformity

0

10

20

30

40

50

60

70

80

90

100

0 500 1000 1500N-th test (in lexicographic ordering)

Makespan (seconds)

Example: Docker itself

DockerSuite.TestBuild*

DockerSwarmSuite.Test*

Page 14: Parallelizing CI using Docker Swarm-Mode

14Copyright©2017 NTT Corp. All Rights Reserved.

Makespans of the chunks are likely to result in non-uniformity

Challenge 2: Makespan non-uniformity

Test1

Test2

Test3

Test4

Test1

Test2Test3

Test4non-uniformity(wasted time for container 1)

speed-up

1 2

Chunks for containersSequence

Page 15: Parallelizing CI using Docker Swarm-Mode

15Copyright©2017 NTT Corp. All Rights Reserved.

Solution: shuffle the chunks• No guarantee for optimal schedule though

Optimization 2: Shuffling

Test1

Test2

Test3

Test4

Sequence

21Test1

Test2

Test3

Test4

Chunks(Optimal)

1

Test2Test3

Test4

Test12

Chunks(Unoptimized)

21Test1 Test2

Test3Test4

Chunks(Shuffled)

Page 16: Parallelizing CI using Docker Swarm-Mode

16Copyright©2017 NTT Corp. All Rights Reserved.

• RPC: Funker (github.com/bfirsh/funker)• FaaS-like architecture• Workloads are automatically balanced via Docker's built-in LB• No explicit task queue; when a worker is busy, the master just retries

• Deployment: `docker stack deploy` with Compose file• Easily portable to other orchestrators as well (e.g. Kubernetes)

Implementation

master Built-in LBClient worker.2

worker.1

worker.3

Funker

Docker Swarm-mode cluster(typically on cloud, but even ok to use localhost as a

single-node cluster)

(on CI / laptop)

`docker stack deploy`

Page 17: Parallelizing CI using Docker Swarm-Mode

17Copyright©2017 NTT Corp. All Rights Reserved.

• Testing Docker itself requires `docker run –-privileged`

• But Swarm-mode lacks support for `--privileged` at the moment

• moby/moby#24862

• Workaround: Bind-mount `/var/run/docker.sock` into service containers, and execute `docker run –-privileged` within them

Implementation (Docker-specific part)

worker.2

worker.1

worker.3

privileged_worker.2

privileged_worker.1

privileged_worker.3

`docker run --privileged`Swarm

Non-swarm on Swarm

Page 18: Parallelizing CI using Docker Swarm-Mode

18Copyright©2017 NTT Corp. All Rights Reserved.

•Evaluated my hack against the CI of Docker itself• Of course, this hack can be applicable to CI of other software as well

•Target: Docker 16.04-dev (git:7fb83eb7)• Contains 1,648 test functions

•Machine: Google Compute Enginen1-standard-4 instances (4 vCPUS, 15GB RAM)

Experimental result

Page 19: Parallelizing CI using Docker Swarm-Mode

19Copyright©2017 NTT Corp. All Rights Reserved.

Experimental result

•20 times faster at maximum with 10 nodes•But even effective with a single node!

1h22m7s(traditional testing)

4m10s10 nodesCost: 10x

Speed-up: 19.7x

15m3s

Cost: 1xSpeed-up: 5.5x

1 node1 node

Page 20: Parallelizing CI using Docker Swarm-Mode

20Copyright©2017 NTT Corp. All Rights Reserved.

Detailed result

1(Chunk size: 1648)

10(165)

30(55)

50(33)

70(24)

1

1h22m7s(=traditional)

15m3s N/A (more than 30m)2 12m7s 10m12s 11m25s 13m57s

5 10m16s 6m18s 5m46s 6m28s

10 8m26s 4m31s 4m10s 4m20s

Containers running in parallel

Fastestconfiguration

5.5x (BCR 5.5x)

19.7x (BCR 2.0x)

8.1x (BCR 4.0x)

14.2x (BCR 2.8x)

Nodes

Time: average of 5 runs, Graph driver: overlay2

Best BCR (benefit-cost ratio)

Page 21: Parallelizing CI using Docker Swarm-Mode

21Copyright©2017 NTT Corp. All Rights Reserved.

What if no optimization techniques?

Nodes ParallelizeParallelize

+ Chunking

Parallelize +

Chunking+

Shuffling(= previous slide)

1

more than 30m

14m58s 15m3s2 10m1s 10m12s5 7m32s 5m46s

10 6m9s 4m10sSignificantly faster 1.5x faster

See the previous slide for the number of containers running in parallel

Page 22: Parallelizing CI using Docker Swarm-Mode

22Copyright©2017 NTT Corp. All Rights Reserved.

Scale-out vs Scale-up?

Nodes Total vCPUs Total RAM Containers ResultScaleout 10

40 150GB 504m10s

Scaleup 1 19m17s

with both chunking and shuffling

Scale-out wins• Better I/O parallelism, mutex contention, etc.

Page 23: Parallelizing CI using Docker Swarm-Mode

23Copyright©2017 NTT Corp. All Rights Reserved.

PR (merged): docker/docker#29775

The code is available!

bash

$ cd $GOPATH/src/github.com/docker/docker

$ make build-integration-cli-on-swarm

$ ./hack/integration-cli-on-swarm/integration-cli-on-swarm ╲

-push-worker-image your-docker-registry/worker:latest ╲

-replicas 50 ╲

-shuffle

Page 24: Parallelizing CI using Docker Swarm-Mode

24Copyright©2017 NTT Corp. All Rights Reserved.

Yes, generalized & simplified version available(some Docker-specific hacks were eliminated)

github.com/osrg/namazu-swarm

Is it applicable to other software as well?

"Namazu Swarm"• Namazu (鯰) means a catfish in Japanese

• Our related project: github.com/osrg/namazu(A tool for reproducing flaky tests and injecting faults)

• Unrelated: www.namazu.org (text search engine)

Page 25: Parallelizing CI using Docker Swarm-Mode

25Copyright©2017 NTT Corp. All Rights Reserved.

Easy to get started

Just write a Dockerfile with two labels!

vi DockerfileFROM your-projectADD tests.txt /

LABEL ╲net.osrg.namazu-swarm.v0.master.script="cat /tests.txt" ╲net.osrg.namazu-swarm.v0.worker.script="sh -e -x"

~~

Read a chunk of test IDs from stdin, and execute them

Emit all the test IDs (== commands, typically) to stdout

Page 26: Parallelizing CI using Docker Swarm-Mode

26Copyright©2017 NTT Corp. All Rights Reserved.

Easy Integration with CI and Clouds

Docker Swarm-mode cluster

(typically on cloud, but even ok to use localhost as a single-node cluster)

Travis CI

Circle CI

Jenkins

(Laptop)

Planned – Kubernetes (e.g. GKE, ACS..)

Planned - ECS

Namazu Swarm itself is tested on Travis

Page 27: Parallelizing CI using Docker Swarm-Mode

27Copyright©2017 NTT Corp. All Rights Reserved.

Experimental result

Traditional 1 Node(10 containers)

2 Nodes(30 containers)

5 Nodes(50 containers)

10 Nodes(90 containers)

56m7s 19m34s 17m19s 9m50s 7m52s

7.1x faster (Cost: 10x)

2.9x faster (Cost: 1x)

Apache ZooKeeper

Your own application• Your report is welcome

Page 28: Parallelizing CI using Docker Swarm-Mode

28Copyright©2017 NTT Corp. All Rights Reserved.

•Record and use the past execution history to optimize the schedule, rather than just shuffling

•Investigate deeply why scale-out wins• related: moby/moby#33254

Future work

Page 29: Parallelizing CI using Docker Swarm-Mode

29Copyright©2017 NTT Corp. All Rights Reserved.

•Mitigate extra overhead of pushing/pulling the image to/from the registry

• Can take a few minutes, depending on the network condition and the amount of the cached layers on each of the nodes

• FILEgrain: github.com/AkihiroSuda/filegrain• My lazy-pull extension for OCI Image Spec• Experimental result for `java:8` image

• P2P image distribution? e.g. IPFS

Future work

Image format Workload PullTraditional Docker/OCI Any 633 MB (100%)

FILEgrainsh 4 MB (0.6%)java –version 87 MB (14%)javac Hello.java 136 MB (22%)

Still POC..your contribution is welcome

Page 30: Parallelizing CI using Docker Swarm-Mode

30Copyright©2017 NTT Corp. All Rights Reserved.

•Docker Swarm-mode is effective for parallelizing CI jobs

•Introduced some techniques for optimal scheduling

Recap

1h22m7s(traditional testing)

4m10s10 nodesCost: 10x

Speed-up: 19.7x

15m3s

Cost: 1xSpeed-up: 5.5x

1 node1 node

Page 31: Parallelizing CI using Docker Swarm-Mode

31Copyright©2017 NTT Corp. All Rights Reserved.

•You can easily apply the tool to your software as wellgithub.com/osrg/namazu-swarm

Recap