25
Copyright©2017 NTT Corp. All Rights Reserved. Akihiro Suda <[email protected]> NTT Software Innovation Center Parallelizing CI using Docker Swarm-Mode Open Source Summit Japan (June 1, 2017) Last update: May 17, 2017

Parallelizing CI using Docker Swarm-Mode · •Loads are automatically balanced via Docker's built-in LB •No explicit task queue •Could be easily portable to other container orchestrators

  • Upload
    others

  • View
    9

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Parallelizing CI using Docker Swarm-Mode · •Loads are automatically balanced via Docker's built-in LB •No explicit task queue •Could be easily portable to other container orchestrators

Copyright©2017 NTT Corp. All Rights Reserved.

Akihiro Suda <[email protected]>

NTT Software Innovation Center

Parallelizing CI using Docker Swarm-Mode

Open Source Summit Japan (June 1, 2017) Last update: May 17, 2017

Page 2: Parallelizing CI using Docker Swarm-Mode · •Loads are automatically balanced via Docker's built-in LB •No explicit task queue •Could be easily portable to other container orchestrators

2Copyright©2017 NTT Corp. All Rights Reserved.

https://github.com/AkihiroSuda

• Software Engineer at NTT Corporation

• Several talks at FLOSS community

• FOSDEM 2016

• ApacheCon Core North America 2016, etc.

• Docker Moby Project core maintainer

Who am I

: ≒ :

Docker transited into Moby Project (April, 2017)

Page 3: Parallelizing CI using Docker Swarm-Mode · •Loads are automatically balanced via Docker's built-in LB •No explicit task queue •Could be easily portable to other container orchestrators

3Copyright©2017 NTT Corp. All Rights Reserved.

A problem in Docker/Moby project: CI is slow

https://jenkins.dockerproject.org/job/Docker-PRs/buildTimeTrend

capture: March 3, 2017

120 min

red valley = test failed immediately

Page 4: Parallelizing CI using Docker Swarm-Mode · •Loads are automatically balanced via Docker's built-in LB •No explicit task queue •Could be easily portable to other container orchestrators

4Copyright©2017 NTT Corp. All Rights Reserved.

How about other FLOSS projects?

https://builds.apache.org/view/All/job/PreCo

mmit-HDFS-Build/buildTimeTrend

https://builds.apache.org/view/All/job/PreCo

mmit-YARN-Build/buildTimeTrend

https://amplab.cs.berkeley.edu/jenkins/job/

SparkPullRequestBuilder/buildTimeTrend

capture: March 3, 2017200 min

260 min

240 min

https://grpc-

testing.appspot.com/job/gRPC_pull_requ

ests_linux/buildTimeTrend

550 min

(for ease of visualization, picked up some projects that use Jenkins from various categories)

Page 5: Parallelizing CI using Docker Swarm-Mode · •Loads are automatically balanced via Docker's built-in LB •No explicit task queue •Could be easily portable to other container orchestrators

5Copyright©2017 NTT Corp. All Rights Reserved.

•Blocker for reviewing/merging patches

•Discourages developers from writing tests

•Discourages developers from enabling additional testing features

• e.g. race detector (go test –race)

Why slow CI matters?

Page 6: Parallelizing CI using Docker Swarm-Mode · •Loads are automatically balanced via Docker's built-in LB •No explicit task queue •Could be easily portable to other container orchestrators

6Copyright©2017 NTT Corp. All Rights Reserved.

Why slow CI matters?

Poor implementation quality &Slow development cycle

Page 7: Parallelizing CI using Docker Swarm-Mode · •Loads are automatically balanced via Docker's built-in LB •No explicit task queue •Could be easily portable to other container orchestrators

7Copyright©2017 NTT Corp. All Rights Reserved.

•go test –parallel N

•mvn –-threads N

•parallel

• ...

Solution: Parallelize CI ?

Page 8: Parallelizing CI using Docker Swarm-Mode · •Loads are automatically balanced via Docker's built-in LB •No explicit task queue •Could be easily portable to other container orchestrators

8Copyright©2017 NTT Corp. All Rights Reserved.

But just doing parallelization is not enough

•No isolation

• Concurrent test tasks may race for certain shared resources(e.g. files under `/tmp`, TCP port, ...)

•Poor scalability

• CPU/RAM resource limitation

• I/O parallelism limitation

Solution: Parallelize CI ?

Page 9: Parallelizing CI using Docker Swarm-Mode · •Loads are automatically balanced via Docker's built-in LB •No explicit task queue •Could be easily portable to other container orchestrators

9Copyright©2017 NTT Corp. All Rights Reserved.

•Docker provides isolation

•Swarm-mode provides scalability

Solution: Docker (in Swarm-mode)

Distributeacross Swarm-mode

Parallelize & Isolateusing Docker containers

✓Isolation ✓Scalability

Page 10: Parallelizing CI using Docker Swarm-Mode · •Loads are automatically balanced via Docker's built-in LB •No explicit task queue •Could be easily portable to other container orchestrators

10Copyright©2017 NTT Corp. All Rights Reserved.

• For ideal isolation, each of the test functions should be encapsulated into independent containers

• But this is not optimal in actual due to setup/teardown code

Challenge 1: Redundant setup/teardown

func TestMain(m *testing.M) {

setUp()

m.Run()

tearDown()

}

func TestFoo(t *testing.T)

func TestBar(t *testing.T)

redundantly executed

setUp()

testFoo.Run()

tearDown()

container

setUp()

testBar.Run()

tearDown()

container

Page 11: Parallelizing CI using Docker Swarm-Mode · •Loads are automatically balanced via Docker's built-in LB •No explicit task queue •Could be easily portable to other container orchestrators

11Copyright©2017 NTT Corp. All Rights Reserved.

• Solution: execute a chunk of multiple test functions sequentially in a single container

Optimization 1: Chunking

setUp()

testFoo.Run()

tearDown()

container

setUp()

testBar.Run()

tearDown()

containerchunk

setUp()

testFoo.Run()

testBar.Run()

tearDown()

container

Page 12: Parallelizing CI using Docker Swarm-Mode · •Loads are automatically balanced via Docker's built-in LB •No explicit task queue •Could be easily portable to other container orchestrators

12Copyright©2017 NTT Corp. All Rights Reserved.

• Test sequence is executed in lexicographic order (ABC...)

• Observation: long-running test functions concentrate on a certain portion

• because testing similar scenarios

Challenge 2: Makespan non-uniformity

Execution Order

TestApple1TestApple2

TestBanana1

TestBanana2

TestCarrot1

TestCarrot2

Page 13: Parallelizing CI using Docker Swarm-Mode · •Loads are automatically balanced via Docker's built-in LB •No explicit task queue •Could be easily portable to other container orchestrators

13Copyright©2017 NTT Corp. All Rights Reserved.

Challenge 2: Makespan non-uniformity

0

10

20

30

40

50

60

70

80

90

100

0 500 1000 1500N-th test

Makespan (seconds)

Example: Docker itself

DockerSuite.TestBuild*

DockerSwarmSuite.Test*

Page 14: Parallelizing CI using Docker Swarm-Mode · •Loads are automatically balanced via Docker's built-in LB •No explicit task queue •Could be easily portable to other container orchestrators

14Copyright©2017 NTT Corp. All Rights Reserved.

Makespans of the chunks are likely to result in non-uniformity

Challenge 2: Makespan non-uniformity

Test1

Test2

Test3

Test4

Test1

Test2

Test3

Test4non-uniformity(wasted time for container 1)

speed-up

1 2

Chunks for containersSequence

Page 15: Parallelizing CI using Docker Swarm-Mode · •Loads are automatically balanced via Docker's built-in LB •No explicit task queue •Could be easily portable to other container orchestrators

15Copyright©2017 NTT Corp. All Rights Reserved.

Solution: shuffle the chunks

• No guarantee for optimal schedule though

Optimization 2: Shuffling

Test1

Test2

Test3

Test4

Test1

Test2

Test3

Test4

Test1

Test2

Test3

Test4

1 2 1 2

Chunks Chunks (Shuffled)Sequence

Page 16: Parallelizing CI using Docker Swarm-Mode · •Loads are automatically balanced via Docker's built-in LB •No explicit task queue •Could be easily portable to other container orchestrators

16Copyright©2017 NTT Corp. All Rights Reserved.

• Based on Funker (github.com/bfirsh/funker)

• FaaS-like architecture

• Loads are automatically balanced via Docker's built-in LB

• No explicit task queue

• Could be easily portable to other container orchestrators

Implementation

masterBuilt-in

LB

Client

worker.2

worker.1

worker.3

Funker!

Docker Swarm-mode cluster(typically on cloud)

(typically on laptop)

Page 17: Parallelizing CI using Docker Swarm-Mode · •Loads are automatically balanced via Docker's built-in LB •No explicit task queue •Could be easily portable to other container orchestrators

17Copyright©2017 NTT Corp. All Rights Reserved.

•Evaluated my hack against the CI of Docker itself• Of course, this hack can be applicable to CI of other

software as well

•Target: Docker 16.04-dev (git:7fb83eb7)• Contains 1,648 test functions

•Machine: Google Computing Enginen1-standard-4 instances (4 vCPUS, 15GB RAM)

Experimental result

Page 18: Parallelizing CI using Docker Swarm-Mode · •Loads are automatically balanced via Docker's built-in LB •No explicit task queue •Could be easily portable to other container orchestrators

18Copyright©2017 NTT Corp. All Rights Reserved.

Experimental result

1h22m7s(traditional testing)

4m10s

1 node

10 nodesCost: x10

Speed-up: x19.7

15m3sCost: x1

Speed-up: x5.5

1 node

•20 times faster at maximum with 10 nodes• But even effective with a single node!

Page 19: Parallelizing CI using Docker Swarm-Mode · •Loads are automatically balanced via Docker's built-in LB •No explicit task queue •Could be easily portable to other container orchestrators

19Copyright©2017 NTT Corp. All Rights Reserved.

Detailed result

Best BCR (benefit-cost

ratio)

1(Chunk size:

1648)

10(165)

30(55)

50(33)

70(24)

1

1h22m7s(=traditional)

15m3sN/A (more than 30m)

2 12m7s 10m12s 11m25s 13m57s

5 10m16s 6m18s 5m46s 6m28s

10 8m26s 4m31s 4m10s 4m20s

Containers running in parallel

Fastest

configuration

x5.5 (BCR x5.5)

x19.7 (BCR x2.0)

x8.1 (BCR x4.0)

x14.2 (BCR x2.8)

No

de

s

Time: average of 5 runs, Graph driver: overlay2

Page 20: Parallelizing CI using Docker Swarm-Mode · •Loads are automatically balanced via Docker's built-in LB •No explicit task queue •Could be easily portable to other container orchestrators

20Copyright©2017 NTT Corp. All Rights Reserved.

What if no optimization techniques?

Nodes ParallelizeParallelize

+ Chunking

Parallelize +

Chunking+

Shuffling(= previous slide)

1

more than 30m

14m58s 15m3s

2 10m1s 10m12s

5 7m32s 5m46s

10 6m9s 4m10s

Significantly faster x1.5 faster

See the previous slide for the number of containers running in parallel

Page 21: Parallelizing CI using Docker Swarm-Mode · •Loads are automatically balanced via Docker's built-in LB •No explicit task queue •Could be easily portable to other container orchestrators

21Copyright©2017 NTT Corp. All Rights Reserved.

Scale-out vs Scale-up?

NodesTotal

vCPUsTotal RAM Containers Result

Scaleout 10

40 150GB 50

4m10s

Scaleup 1 19m17s

with both chunking and shuffling

Scale-out wins• Better I/O parallelism, mutex contention, etc.

Page 22: Parallelizing CI using Docker Swarm-Mode · •Loads are automatically balanced via Docker's built-in LB •No explicit task queue •Could be easily portable to other container orchestrators

22Copyright©2017 NTT Corp. All Rights Reserved.

•Record and use the past execution history to optimize the schedule, rather than just shuffling

•Investigate deeply why scale-out wins

Future work

Page 23: Parallelizing CI using Docker Swarm-Mode · •Loads are automatically balanced via Docker's built-in LB •No explicit task queue •Could be easily portable to other container orchestrators

23Copyright©2017 NTT Corp. All Rights Reserved.

PR (merged): https://github.com/docker/docker/pull/29775

The code is available!

$ cd $GOPATH/src/github.com/docker/docker

$ make build-integration-cli-on-swarm

$ ./hack/integration-cli-on-swarm/integration-cli-on-swarm ╲

-replicas 50 ╲

-shuffle ╲

-push-worker-image your-docker-registry/worker:latest

Page 24: Parallelizing CI using Docker Swarm-Mode · •Loads are automatically balanced via Docker's built-in LB •No explicit task queue •Could be easily portable to other container orchestrators

24Copyright©2017 NTT Corp. All Rights Reserved.

•Yes, with just a few line of modifications

•Planning to split this hack into an independent generic test tool

Is it applicable to other software as well?

Page 25: Parallelizing CI using Docker Swarm-Mode · •Loads are automatically balanced via Docker's built-in LB •No explicit task queue •Could be easily portable to other container orchestrators

25Copyright©2017 NTT Corp. All Rights Reserved.

•Docker Swarm-mode is effective for parallelizing CI jobs

•Introduced some techniques for optimal scheduling

Conclusion

1h22m7s(traditional testing)

4m10s

1 node

10 nodesCost: x10

Speed-up: x19.7

15m3sCost: x1

Speed-up: x5.5

1 node