Transcript
Page 1: Escaping Automated Test Hell - One Year Later

Main sponsor

Escaping Automated Test Hell

Wojciech Seliga

One year later...

Page 2: Escaping Automated Test Hell - One Year Later

About me

• Coding for 30 years

• Agile Practices (inc. TDD) since 2003

• Dev Nerd, Tech Leader, Agile Coach, Speaker

• 5+ years with Atlassian (JIRA Development Team Lead)

• Spartez Co-founder

Page 3: Escaping Automated Test Hell - One Year Later

A year ago - recap

Page 4: Escaping Automated Test Hell - One Year Later

18 000 tests on all levels

Very slow and fragile feedback loop

Page 5: Escaping Automated Test Hell - One Year Later

Serious performance and reliability issues

Page 6: Escaping Automated Test Hell - One Year Later

Feedback Speed

Test Quality

Page 7: Escaping Automated Test Hell - One Year Later

Test Code is Not Trash

Design

Maintain

Refactor

Share

Review

Prune

Respect

Discuss

Restructure

Page 8: Escaping Automated Test Hell - One Year Later

Optimum Balance

Page 9: Escaping Automated Test Hell - One Year Later

Optimum Balance

Isolation

Page 10: Escaping Automated Test Hell - One Year Later

Optimum Balance

Isolation Speed

Page 11: Escaping Automated Test Hell - One Year Later

Optimum Balance

Isolation Speed Coverage

Page 12: Escaping Automated Test Hell - One Year Later

Optimum Balance

Isolation Speed Coverage Level

Page 13: Escaping Automated Test Hell - One Year Later

Optimum Balance

Isolation Speed Coverage Level Access

Page 14: Escaping Automated Test Hell - One Year Later

Optimum Balance

Isolation Speed Coverage Level Access Effort

Page 15: Escaping Automated Test Hell - One Year Later

Dangerous to tamper with

Page 16: Escaping Automated Test Hell - One Year Later

Dangerous to tamper with

Quality / Determinism

Page 17: Escaping Automated Test Hell - One Year Later

Dangerous to tamper with

Maintainability

Quality / Determinism

Page 18: Escaping Automated Test Hell - One Year Later

Splitting the codebase is a key aspect of a short test feedback loop

Page 19: Escaping Automated Test Hell - One Year Later

Now

Page 20: Escaping Automated Test Hell - One Year Later

People - Motivation

Page 21: Escaping Automated Test Hell - One Year Later

Shades of Red

Page 22: Escaping Automated Test Hell - One Year Later

Pragmatic CI Health

Page 23: Escaping Automated Test Hell - One Year Later

Build Tiers and Policy

Tier A1 - green soon after all commits

Tier A2 - green at the end of the day

Tier A3 - green at the end of the iteration

unit tests and functional* tests

WebDriver and bundled plugins tests

supported platforms tests, compatibility tests

Page 24: Escaping Automated Test Hell - One Year Later

Wallboards: Constant

Awareness

Page 25: Escaping Automated Test Hell - One Year Later

Training

• assertThat over assertTrue/False and assertEquals

• avoiding races - Atlassian Selenium with its TimedElement

• Unit tests over functional tests

• Brownbags, blogs, code reviews

Page 26: Escaping Automated Test Hell - One Year Later

Quality

Page 27: Escaping Automated Test Hell - One Year Later

Automatic Flakiness Detection Quarantine

Re-run failed tests and see if they pass
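
The re-run idea above can be sketched in a few lines. This is only an illustration, not the mechanism used on the JIRA builds: the function names `classify` and `triage`, the default of three re-runs, and the use of plain Python callables as tests are all assumptions.

```python
from typing import Callable, Dict, List, Tuple


def classify(test: Callable[[], None], reruns: int = 3) -> str:
    """Return 'pass', 'flaky' or 'fail' for a zero-argument test callable."""
    try:
        test()
        return "pass"
    except AssertionError:
        pass  # first run failed, so try again below
    for _ in range(reruns):
        try:
            test()
            return "flaky"  # failed once, passed on a re-run
        except AssertionError:
            continue
    return "fail"  # failed on every attempt


def triage(tests: Dict[str, Callable[[], None]],
           reruns: int = 3) -> Tuple[List[str], List[str]]:
    """Split failing tests into quarantine candidates and real failures."""
    quarantine, failures = [], []
    for name, test in tests.items():
        verdict = classify(test, reruns)
        if verdict == "flaky":
            quarantine.append(name)
        elif verdict == "fail":
            failures.append(name)
    return quarantine, failures
```

A test that fails on its first run but passes on a re-run lands in the quarantine list; a test that fails every time stays a real failure.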

Page 28: Escaping Automated Test Hell - One Year Later

Quarantine - Healing

Page 29: Escaping Automated Test Hell - One Year Later

SlowMo - expose races

Page 30: Escaping Automated Test Hell - One Year Later

Selenium 1

Page 31: Escaping Automated Test Hell - One Year Later

Selenium 1

Page 32: Escaping Automated Test Hell - One Year Later

Selenium ditching

Sky did not fall in

Page 33: Escaping Automated Test Hell - One Year Later

Ditching - benefits

• Freed build agents - better system throughput

• Boosted morale

• Gazillions of developer hours saved

• Money saved on infrastructure

Page 34: Escaping Automated Test Hell - One Year Later

Ditching - due diligence

• conducting the audit - analysis of the coverage we lost

• determining which tests need to be rewritten (e.g. security related)

• rewriting the tests

Page 35: Escaping Automated Test Hell - One Year Later

Flaky Browser-based Tests

Races between test code and asynchronous page logic

Playing with "loading" CSS class does not really help

Page 36: Escaping Automated Test Hell - One Year Later

Races Removal with Tracing

// in the browser:
function mySearchClickHandler() {
    doSomeXhr().always(function() {
        // This executes when the XHR has completed (either success or failure)
        JIRA.trace("search.completed");
    });
}
// In production code JIRA.trace is a no-op

// in my page object:
@Inject
TraceContext traceContext;

public SearchResults doASearch() {
    Tracer snapshot = traceContext.checkpoint();
    getSearchButton().click(); // causes mySearchClickHandler to be invoked
    // This waits until the "search.completed"
    // event has been emitted, *after* previous snapshot
    traceContext.waitFor(snapshot, "search.completed");
    return pageBinder.bind(SearchResults.class);
}
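
The checkpoint-then-wait idea on this slide can be modelled without a browser. The sketch below is a hypothetical Python stand-in, not Atlassian's implementation: the class borrows the slide's names (`checkpoint`, `wait_for`, `trace`), and storing events in a list guarded by a `threading.Condition` is an assumption made for illustration.

```python
import threading


class TraceContext:
    """Records trace events and lets a test wait for ones emitted after a checkpoint."""

    def __init__(self):
        self._events = []
        self._cond = threading.Condition()

    def trace(self, key):
        # Called by the "page" when an asynchronous operation completes.
        with self._cond:
            self._events.append(key)
            self._cond.notify_all()

    def checkpoint(self):
        # Remember how many events exist now; older events are ignored later.
        with self._cond:
            return len(self._events)

    def wait_for(self, snapshot, key, timeout=5.0):
        # Block until `key` shows up after the snapshot; False on timeout.
        with self._cond:
            return self._cond.wait_for(
                lambda: key in self._events[snapshot:], timeout=timeout)
```

The point of the snapshot is that an event left over from an earlier search cannot satisfy the wait; only an event emitted after the click counts.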

Page 37: Escaping Automated Test Hell - One Year Later

Speed

Page 38: Escaping Automated Test Hell - One Year Later

Can we halve our build times?

Speed

Page 39: Escaping Automated Test Hell - One Year Later

Parallel Execution - Theory

End of Build

A1

Batches

Start of Build

Page 40: Escaping Automated Test Hell - One Year Later

Parallel Execution

End of Build

A1

Batches

Start of Build

Page 41: Escaping Automated Test Hell - One Year Later

Parallel Execution - Reality Bites

End of Build

A1

Batches

Start of Build

Agent availability

Page 42: Escaping Automated Test Hell - One Year Later

Dynamic Test Execution Dispatch - Hallelujah

Page 43: Escaping Automated Test Hell - One Year Later

Dynamic Test Execution Dispatch - Hallelujah
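
The slides do not spell out how dynamic dispatch works, so the following is a toy model under a stated assumption: instead of pre-assigning a fixed batch to each agent, every agent pulls the next test from a shared queue as soon as it is free. The function names and the sample numbers are hypothetical; `ready` holds the time at which each agent becomes available.

```python
import heapq


def static_batches_finish(durations, ready):
    """Finish time when tests are pre-split into one fixed batch per agent."""
    agents = len(ready)
    batches = [durations[i::agents] for i in range(agents)]
    return max(r + sum(b) for r, b in zip(ready, batches))


def dynamic_dispatch_finish(durations, ready):
    """Finish time when each free agent pulls the next test from a shared queue."""
    free_at = list(ready)
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)      # earliest available agent
        heapq.heappush(free_at, start + d)  # busy until this test ends
    return max(free_at)


durations = [5] * 8   # eight tests, five minutes each
ready = [0, 20]       # the second agent only becomes available after 20 minutes
print(static_batches_finish(durations, ready))    # 40
print(dynamic_dispatch_finish(durations, ready))  # 30
```

With a late agent, the fixed batch assigned to it delays the whole build, while the shared queue lets the early agent absorb the extra work.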

Page 44: Escaping Automated Test Hell - One Year Later

"You can't manage what you can't measure."

W. Edwards Deming

Page 45: Escaping Automated Test Hell - One Year Later

"You can't manage what you can't measure."

W. Edwards Deming

If you believe just in it

you are doomed.

Page 46: Escaping Automated Test Hell - One Year Later

You can't improve something if you can't measure it

Page 47: Escaping Automated Test Hell - One Year Later

You can't improve something if you can't measure it

Profiler, Build statistics, Logs, statsd → Graphite

Page 48: Escaping Automated Test Hell - One Year Later

Anatomy of Build*

Compilation

Packaging

Executing Tests

Page 49: Escaping Automated Test Hell - One Year Later

Anatomy of Build*

Compilation

Packaging

Executing Tests

Fetching Dependencies

Page 50: Escaping Automated Test Hell - One Year Later

Anatomy of Build*

Compilation

Packaging

Executing Tests

Fetching Dependencies

*Any resemblance to maven build is entirely accidental

Page 51: Escaping Automated Test Hell - One Year Later

Anatomy of Build*

Compilation

Packaging

Executing Tests

Fetching Dependencies

*Any resemblance to maven build is entirely accidental

SCM Update

Page 52: Escaping Automated Test Hell - One Year Later

Anatomy of Build*

Compilation

Packaging

Executing Tests

Fetching Dependencies

*Any resemblance to maven build is entirely accidental

SCM Update

Agent Availability/Setup

Page 53: Escaping Automated Test Hell - One Year Later

Anatomy of Build*

Compilation

Packaging

Executing Tests

Fetching Dependencies

*Any resemblance to maven build is entirely accidental

SCM Update

Agent Availability/Setup

Publishing Results

Page 54: Escaping Automated Test Hell - One Year Later

JIRA Unit Tests Build

Compilation (7min)

Page 55: Escaping Automated Test Hell - One Year Later

JIRA Unit Tests Build

Compilation (7min)

Packaging (0min)

Page 56: Escaping Automated Test Hell - One Year Later

JIRA Unit Tests Build

Compilation (7min)

Packaging (0min)

Executing Tests (7min)

Page 57: Escaping Automated Test Hell - One Year Later

JIRA Unit Tests Build

Compilation (7min)

Packaging (0min)

Executing Tests (7min)

Publishing Results (1min)

Page 58: Escaping Automated Test Hell - One Year Later

JIRA Unit Tests Build

Compilation (7min)

Packaging (0min)

Executing Tests (7min)

Fetching Dependencies (1.5min)

Publishing Results (1min)

Page 59: Escaping Automated Test Hell - One Year Later

JIRA Unit Tests Build

Compilation (7min)

Packaging (0min)

Executing Tests (7min)

Fetching Dependencies (1.5min)

SCM Update (2min)

Publishing Results (1min)

Page 60: Escaping Automated Test Hell - One Year Later

JIRA Unit Tests Build

Compilation (7min)

Packaging (0min)

Executing Tests (7min)

Fetching Dependencies (1.5min)

SCM Update (2min)

Agent Availability/Setup (mean 10min)

Publishing Results (1min)

Page 61: Escaping Automated Test Hell - One Year Later

Decreasing Test Execution Time to ZERO alone would not let us achieve our goal!
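
The arithmetic behind this claim, using the stage timings from the "JIRA Unit Tests Build" slides and the halving goal from the "Speed" slide. The variable names are illustrative only.

```python
# Stage timings in minutes, as listed on the "JIRA Unit Tests Build" slides.
before = {
    "Agent Availability/Setup": 10.0,  # mean
    "SCM Update": 2.0,
    "Fetching Dependencies": 1.5,
    "Compilation": 7.0,
    "Packaging": 0.0,
    "Executing Tests": 7.0,
    "Publishing Results": 1.0,
}

total = sum(before.values())                      # 28.5 minutes
target = total / 2                                # 14.25 minutes if the build is halved
without_tests = total - before["Executing Tests"] # 21.5 minutes

print(total, target, without_tests)  # 28.5 14.25 21.5
```

Even with test execution at zero, the build would still take 21.5 minutes, well above the 14.25-minute target, so the other stages have to shrink too.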

Page 62: Escaping Automated Test Hell - One Year Later

Agent Availability/Setup

• starved builds due to busy agents building very long builds

• time synchronization issue - NTPD problem

Page 63: Escaping Automated Test Hell - One Year Later

• Proximity of SCM repo

• shallow git clones are not so fast and lightweight + generating extra git server CPU load

• git clone per agent/plan + git pull + git clone per build (hard links!)

• Stash was thankful (queue)

SCM Update - Checkout time

Page 64: Escaping Automated Test Hell - One Year Later

• Proximity of SCM repo

• shallow git clones are not so fast and lightweight + generating extra git server CPU load

• git clone per agent/plan + git pull + git clone per build (hard links!)

• Stash was thankful (queue)

SCM Update - Checkout time

2 min → 5 seconds
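
One way to read the "clone per agent/plan + pull + clone per build" bullet is as a two-level checkout: a long-lived cache clone on the agent that is refreshed with `git pull`, and a cheap local clone of that cache for each build (`git clone --local` hard-links object files when it can). The helper below only assembles the command lines; its name, arguments and directory layout are assumptions, not the actual Bamboo setup.

```python
def checkout_commands(remote_url, cache_dir, build_dir, cache_exists):
    """Return the git commands for one build as lists of arguments."""
    commands = []
    if cache_exists:
        # Refresh the long-lived per-agent/plan clone.
        commands.append(["git", "-C", cache_dir, "pull"])
    else:
        # First build on this agent: pay for the full clone once.
        commands.append(["git", "clone", remote_url, cache_dir])
    # Per-build working copy cloned from the local cache (hard links, no network).
    commands.append(["git", "clone", "--local", cache_dir, build_dir])
    return commands
```

Each list can be passed to `subprocess.run(cmd, check=True)`; only the pull touches the remote server once the cache exists.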

Page 65: Escaping Automated Test Hell - One Year Later
Page 66: Escaping Automated Test Hell - One Year Later

• Fix Predator

• Sandboxing/isolation agent trade-off:

rm -rf $HOME/.m2/repository/com/atlassian/*

into

find $HOME/.m2/repository/com/atlassian/ -name "*SNAPSHOT*" | xargs rm

• Network hardware failure found (dropping packets)

Fetching Dependencies

Page 67: Escaping Automated Test Hell - One Year Later

• Fix Predator

• Sandboxing/isolation agent trade-off:

rm -rf $HOME/.m2/repository/com/atlassian/*

into

find $HOME/.m2/repository/com/atlassian/ -name "*SNAPSHOT*" | xargs rm

• Network hardware failure found (dropping packets)

Fetching Dependencies

1.5 min → 10 seconds
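
The trade-off above keeps released artifacts in the local Maven repository and deletes only SNAPSHOT files, so far less has to be downloaded again. A rough Python equivalent of the `find ... | xargs rm` command is sketched below; the function name is hypothetical, and like the shell version it removes matching files while leaving directories in place.

```python
from pathlib import Path


def purge_snapshots(repo_root: Path) -> int:
    """Delete every file whose name contains SNAPSHOT under repo_root."""
    removed = 0
    # Materialise the matches first so deletion does not disturb the directory walk.
    for path in list(repo_root.rglob("*SNAPSHOT*")):
        if path.is_file():
            path.unlink()
            removed += 1
    return removed
```

Called as `purge_snapshots(Path.home() / ".m2/repository/com/atlassian")`, it returns the number of files removed.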

Page 68: Escaping Automated Test Hell - One Year Later

Compilation

• Restructuring multi-pom maven project and dependencies

• Maven 3 parallel compilation FTW: -T 1.5C*

*optimal factor thanks to scientific trial-and-error research

Page 69: Escaping Automated Test Hell - One Year Later

Compilation

• Restructuring multi-pom maven project and dependencies

• Maven 3 parallel compilation FTW: -T 1.5C*

*optimal factor thanks to scientific trial-and-error research

7 min → 1 min

Page 70: Escaping Automated Test Hell - One Year Later

Unit Test Execution

• Splitting unit tests into 2 buckets: good and legacy (much longer)

• Maven 3 parallel test execution (-T 1.5C)

3000 poor tests(5min)

11000 good tests(1.5min)

Page 71: Escaping Automated Test Hell - One Year Later

Unit Test Execution

• Splitting unit tests into 2 buckets: good and legacy (much longer)

• Maven 3 parallel test execution (-T 1.5C)

7 min → 5 min

3000 poor tests(5min)

11000 good tests(1.5min)

Page 72: Escaping Automated Test Hell - One Year Later

Functional Tests

• Selenium 1 removal did help

• Faster reset/restore (avoid unnecessary stuff, intercepting SQL operations for debug purposes - building stacktraces is costly)

• Restoring via Backdoor REST API

• Using REST API for common setup/teardown operations

Page 73: Escaping Automated Test Hell - One Year Later

Functional Tests

Page 74: Escaping Automated Test Hell - One Year Later

Publishing Results

• Server log allocation per test → now using Backdoor REST API (was Selenium)

• Bamboo DB performance degradation for rich build history - to be addressed

Page 75: Escaping Automated Test Hell - One Year Later

Publishing Results

• Server log allocation per test → now using Backdoor REST API (was Selenium)

• Bamboo DB performance degradation for rich build history - to be addressed

1 min → 40 s

Page 76: Escaping Automated Test Hell - One Year Later

Unexpected Problem

• Stability Issues with our CI server

• The bottleneck changed from I/O to CPU

• Too many agents per physical machine

Page 77: Escaping Automated Test Hell - One Year Later

JIRA Unit Tests Build Improved

Compilation (1min)

Page 78: Escaping Automated Test Hell - One Year Later

JIRA Unit Tests Build Improved

Compilation (1min)

Packaging (0min)

Page 79: Escaping Automated Test Hell - One Year Later

JIRA Unit Tests Build Improved

Compilation (1min)

Packaging (0min)

Executing Tests (5min)

Page 80: Escaping Automated Test Hell - One Year Later

JIRA Unit Tests Build Improved

Compilation (1min)

Packaging (0min)

Executing Tests (5min)

Publishing Results (40sec)

Page 81: Escaping Automated Test Hell - One Year Later

JIRA Unit Tests Build Improved

Compilation (1min)

Packaging (0min)

Executing Tests (5min)

Fetching Dependencies (10sec)

Publishing Results (40sec)

Page 82: Escaping Automated Test Hell - One Year Later

JIRA Unit Tests Build Improved

Compilation (1min)

Packaging (0min)

Executing Tests (5min)

Fetching Dependencies (10sec)

SCM Update (5sec)

Publishing Results (40sec)

Page 83: Escaping Automated Test Hell - One Year Later

JIRA Unit Tests Build Improved

Compilation (1min)

Packaging (0min)

Executing Tests (5min)

Fetching Dependencies (10sec)

SCM Update (5sec)

Agent Availability/Setup (3min)*

Publishing Results (40sec)

Page 84: Escaping Automated Test Hell - One Year Later

Improvements Summary

Tests              Before    After    Improvement %
Unit tests         29 min    17 min   41%
Functional tests   56 min    34 min   39%
WebDriver tests    39 min    21 min   46%
Overall            124 min   72 min   42%

* Additional ca. 5% improvement expected once new git clone strategy is consistently rolled-out everywhere
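
The percentages in the summary can be re-derived from the before/after minutes as (before - after) / before, rounded to the nearest whole percent. The dictionary and function names below are illustrative only.

```python
# (before, after) build times in minutes, from the Improvements Summary slide.
timings = {
    "Unit tests": (29, 17),
    "Functional tests": (56, 34),
    "WebDriver tests": (39, 21),
}


def improvement(before, after):
    """Relative reduction in build time, as a whole percentage."""
    return round(100 * (before - after) / before)


for name, (before, after) in timings.items():
    print(name, improvement(before, after))  # 41, 39, 46

total_before = sum(b for b, _ in timings.values())  # 124
total_after = sum(a for _, a in timings.values())   # 72
print("Overall", improvement(total_before, total_after))  # 42
```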

Page 85: Escaping Automated Test Hell - One Year Later

The Quality Follows

Page 86: Escaping Automated Test Hell - One Year Later

The Quality Follows

Page 87: Escaping Automated Test Hell - One Year Later

The Quality Follows

Page 88: Escaping Automated Test Hell - One Year Later

But that's still bad

We want CI feedback loop in a few minutes maximum

Page 89: Escaping Automated Test Hell - One Year Later

Splitting The Codebase

Page 90: Escaping Automated Test Hell - One Year Later

Resistance against splitting

The last attempt: Magic Machine

Decide with high confidence (e.g. > 95%) which subset of tests to run based on the committed changes

Page 91: Escaping Automated Test Hell - One Year Later

Magic Machine

• Looking at Bamboo history (analysing correlation between changes and failures)

• Matching: package test/prod code and transitive imports

• Code instrumentation (Clover, Emma, AspectJ)

• Run most often failing first
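
Two of the signals listed above, the correlation between past changes and failures and running the most frequently failing tests first, can be combined in a small sketch. It is a simplification under stated assumptions: the build history is given as (changed files, failed tests) pairs, the function names are hypothetical, and nothing here estimates the 95% confidence the slide asks for.

```python
from collections import Counter, defaultdict


def build_index(history):
    """Count, per changed file, how often each test failed in the same build."""
    failures_by_file = defaultdict(Counter)
    for changed_files, failed_tests in history:
        for changed in changed_files:
            failures_by_file[changed].update(failed_tests)
    return failures_by_file


def select_tests(index, changed_files):
    """Tests historically correlated with these files, most often failing first."""
    score = Counter()
    for changed in changed_files:
        score.update(index.get(changed, {}))
    # Sort by descending failure count, then by name for a stable order.
    return [test for test, _ in sorted(score.items(), key=lambda kv: (-kv[1], kv[0]))]
```

A file with no recorded history yields an empty selection, which is exactly the case where a real system would have to fall back to running everything.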

Page 92: Escaping Automated Test Hell - One Year Later

Inevitable Split - Fears

• Organizational concerns - understanding, managing, integrating, releasing

• Mindset change - if something worked for 10 years, why change it?

• We damned ourselves with big buckets for all tests - where do they belong?

Page 93: Escaping Automated Test Hell - One Year Later

Magic Machine strikes back

With heavy use of brain, common sense and expert judgement

Page 94: Escaping Automated Test Hell - One Year Later

Splitting code base

• Step 0 - JIRA Importers Plugin (3 years ago)

• Step 1 - New Issue View and Navigator

JIRA 6.0

Page 95: Escaping Automated Test Hell - One Year Later

We are still escaping hell. Hell sucks in your soul.

Page 96: Escaping Automated Test Hell - One Year Later

Conclusions

• Visibility and problem awareness help

• Maintaining a huge testbed is difficult and costly

• Measure the problem

• No prejudice - no sacred cows

• Automated tests are not a one-off investment; they are a continuous journey

• Performance is a damn important feature

Page 97: Escaping Automated Test Hell - One Year Later

Do you want to help?

We are hiring in Gdańsk

• Principal Java Developer

• Development Team Lead

• Java and Scala Developers

• UX Designer

• Front-End Developer

• QA Engineer

Visit us at the booth or apply at http://www.atlassian.com/company/careers

Page 98: Escaping Automated Test Hell - One Year Later

• Turtle - by Jonathan Zander, CC-BY-SA-3.0

• Loading - by MatthewJ13, CC-SA-3.0

• Magic Potion - by Koolmann1, CC-BY-SA-2.0

• Merlin Tool - by L. Mahin, CC-BY-SA-3.0

• Choose Pills - by *rockysprings, CC-BY-SA-3.0

Images - Credits

Page 99: Escaping Automated Test Hell - One Year Later

Thank You!