HOW WE’VE BUILT YAHOO FANTASY FOOTBALL
Alex Florescu
Yahoo UK
#droidconit, April 10th, 2015
OVERVIEW
Intro
Principles & practices
Testing
Internationalisation
Instrumentation & A/B testing
Performance
INTRO
London team started in January 2014
Fantasy Football (Fantasy Calcio) launched in July 2014
Android / iOS / Web clients + back-end team
THE APP
100k+ MAUs (on Android), ★★★★☆
Premier League, Campionato Italiano, Ligue 1, Bundesliga, La Liga, MLS
KEY PRINCIPLES
Automate everything
Short release cycle
Performance, stability, quick changes
Track and measure everything
Data-driven product decisions
Stress and enforce principles, not process
ENGINEERING PRACTICES - CI
CI pipeline from day one
CD up to internal deployment
Unit testing & UI testing
Automatic APK generation and signing
Compile time configs for dev, dogfood & production builds
ENGINEERING PRACTICES - CI
Git flow: Work on a branch, do a pull request to merge
Short-lived branches, keep PRs small
Master always builds, always shippable
All code must be reviewed
Compile-time feature toggles “disable” code that is not ready
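As an illustration (not code from the talk), such a toggle is often a boolean constant generated into BuildConfig per build type, so dev and dogfood builds compile the feature in while production compiles it out. A minimal sketch in Java, assuming a hypothetical FEATURE_NEW_LINEUP flag declared with Gradle's buildConfigField:

    import android.view.View;

    public final class LineupScreen {
        // BuildConfig.FEATURE_NEW_LINEUP is generated at compile time, so the
        // disabled branch is a constant and the unfinished UI never shows.
        public void bindNewLineupButton(View button) {
            button.setVisibility(
                    BuildConfig.FEATURE_NEW_LINEUP ? View.VISIBLE : View.GONE);
        }
    }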
TESTING
CI without automated testing is …
Different levels of testing
On commit hook: Robolectric suite
Next stage, smoke suite of UI tests
Nightly: full suite of UI tests, performance tests, monkey tests
ROBOLECTRIC TESTING
Robolectric tests run on JVM, no devices needed
Slower than plain JUnit tests, but significantly faster than UI tests
Very useful as unit tests
With architectures such as MVP, can also be acceptance tests
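A minimal sketch of such a test in Java (the activity and view ids are hypothetical, and Robolectric.setupActivity reflects the Robolectric 2.x/3.x API of the time):

    import static org.junit.Assert.assertEquals;

    import android.widget.TextView;
    import org.junit.Test;
    import org.junit.runner.RunWith;
    import org.robolectric.Robolectric;
    import org.robolectric.RobolectricTestRunner;

    @RunWith(RobolectricTestRunner.class)
    public class TeamActivityTest {
        @Test
        public void showsTheTeamNameOnLaunch() {
            // Runs on the JVM: no emulator or device required.
            TeamActivity activity = Robolectric.setupActivity(TeamActivity.class);
            TextView name = (TextView) activity.findViewById(R.id.team_name);
            assertEquals("My Team", name.getText().toString());
        }
    }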
ROBOLECTRIC PROBLEMS
Not all Android framework functionality is replicated
Differences between JVM and Dalvik VM
Difficult to test complex user flows over multiple screens
Custom views sometimes problematic
OUR NUMBERS
700+ tests
50-60% coverage (higher in biz logic, lower in UI)
2 min to run, 6 min for a full build from scratch
UI TESTS
Good:
Proper integration tests
Run on device
Most closely resemble real user flows
Can catch device specific issues
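For example, a smoke-level Espresso test might look like the sketch below (activity and ids are hypothetical; the imports use today's AndroidX test artifacts, for which the 2015 support-library packages were the equivalent):

    import static androidx.test.espresso.Espresso.onView;
    import static androidx.test.espresso.action.ViewActions.click;
    import static androidx.test.espresso.assertion.ViewAssertions.matches;
    import static androidx.test.espresso.matcher.ViewMatchers.isDisplayed;
    import static androidx.test.espresso.matcher.ViewMatchers.withId;

    import androidx.test.ext.junit.runners.AndroidJUnit4;
    import androidx.test.rule.ActivityTestRule;
    import org.junit.Rule;
    import org.junit.Test;
    import org.junit.runner.RunWith;

    @RunWith(AndroidJUnit4.class)
    public class CreateLeagueSmokeTest {
        @Rule
        public ActivityTestRule<HomeActivity> activityRule =
                new ActivityTestRule<>(HomeActivity.class);

        @Test
        public void createLeagueScreenOpens() {
            // Drives the real UI on a device, as a user would.
            onView(withId(R.id.create_league_button)).perform(click());
            onView(withId(R.id.league_name_input)).check(matches(isDisplayed()));
        }
    }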
UI TESTS
Bad:
Synchronisation problems (e.g. Button “OK” not found; see the sketch after this list)
Brittle, hard to maintain
Very slow to run
Require a device lab to be set up for CI
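One common mitigation for the synchronisation failures above (a sketch of a standard technique, not necessarily what the team used) is to expose pending async work to Espresso through a CountingIdlingResource, so tests wait instead of failing with “view not found”:

    import androidx.test.espresso.idling.CountingIdlingResource;

    public final class BusyTracker {
        public static final CountingIdlingResource NETWORK =
                new CountingIdlingResource("network");

        // Production code brackets async work with these calls; tests register
        // NETWORK via IdlingRegistry so Espresso blocks until the counter is zero.
        public static void begin() { NETWORK.increment(); }
        public static void end()   { NETWORK.decrement(); }
    }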
SMOKE SUITE VS FULL SUITE
Even small suites can take hours to run because of sync issues
For sanity checking, a smoke suite will do
Relatively fast (10-15 min) & simple UI tests
Ensure the app launches and all screens open
FULL SUITE
For enhanced testing, a nightly full suite
In-depth user flow tests, can run for hours
Make sure someone checks it daily! Should be a release blocker
CI PIPELINE
MONKEY TESTING
Useful for stability testing
Catches crashes and memory leaks
Could be included in automated nightly runs
Make sure the monkey is restricted to your app
Lock the monkey in the app (e.g. SureLock)
Consider removing certain features when monkey runs
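The platform can tell you when the monkey is driving the UI, which makes the last point cheap to implement; a minimal sketch:

    import android.app.ActivityManager;

    public final class MonkeyGuard {
        // True while the monkey tool is driving input; use it to fence off
        // destructive actions (sign-out, sharing, purchases) during runs.
        public static boolean isMonkeyRunning() {
            return ActivityManager.isUserAMonkey();
        }
    }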
TRACKING TESTING
Coverage useful for analysis (e.g. what areas get the least testing and why?), but should not enforce a coverage target
Reasonable to expect acceptance tests with features
Enforce testing through code review
Tests are code!
Refactoring, good architecture and documentation still apply
I18N, L10N …
Translation: strings only
Localisation: adapting content for language, culture and region
Internationalisation: designing a product to allow localisation
CALCIO, SOCCER, FUßBALL…
We shipped to 20+ locales from day one
Challenges:
All strings need to be translated
Number formatting, currency formatting etc. (see the sketch after this list)
Support, reviews, release notes
Testing load increased — UI issues with some locales only
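For the formatting challenge, the platform does the heavy lifting as long as formatting is never hand-rolled; a minimal sketch in Java (class and method names are illustrative):

    import java.text.NumberFormat;
    import java.util.Locale;

    public final class Formats {
        // "1,234.5" for en_GB but "1.234,5" for it_IT
        public static String points(double value, Locale locale) {
            return NumberFormat.getNumberInstance(locale).format(value);
        }

        // Currency symbol, placement and separators all vary by locale
        public static String price(double amount, Locale locale) {
            return NumberFormat.getCurrencyInstance(locale).format(amount);
        }
    }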
I18N — DEALING WITH IT
Externalise all strings and enforce no lint errors on build
Collect all strings early for translation before they block release
Have standard release notes saved & translated for emergencies
Some test devices permanently on tricky locales
I13N — INSTRUMENTATION
What
Collecting data to understand how an app performs and how it is used
Why
Key to understanding what the users are doing
WHAT TO INSTRUMENT
Time spent in app
Buttons tapped
Loading time, network performance
Anything you want!
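Since the talk names Flurry, an event with parameters might be logged roughly as below (event and parameter names are made up; check your SDK version for exact signatures):

    import com.flurry.android.FlurryAgent;
    import java.util.HashMap;
    import java.util.Map;

    public final class Tracker {
        public static void teamCreated(int playersPicked, long elapsedMs) {
            Map<String, String> params = new HashMap<>();
            params.put("players_picked", String.valueOf(playersPicked));
            params.put("elapsed_ms", String.valueOf(elapsedMs));
            FlurryAgent.logEvent("team_created", params);
        }
    }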
WHAT TO DO WITH DATA
How long does it take a user to create a team?
What are the best triggers for a user to sign in?
How often do users share something with friends?
Signs of frustration: e.g. repeating identical action
I13N CHALLENGES
Collecting the data is the easy part (and it’s not easy)
Don’t reinvent the wheel, use 3rd party tools for this
We use Flurry
Real challenge:
What does user engagement mean? How do you measure it?
A/B TESTING — WHY?
What makes users more likely to invite or share with friends?
What makes users more likely to be engaged? Happy?
What features do we add or remove?
Is a new feature supporting our high level goals?
Goal: maximum user satisfaction and engagement with minimum number of features
EXPERIMENTS
Build an MVP (minimum viable product) of your new feature
Enable the feature in a test bucket (e.g. only for 10% of users)
Collect bucket-aware data for all users and compare results across the test and control buckets
Results can be used to guide product decisions
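Bucket assignment has to be deterministic so a user sees the same variant on every launch. The talk doesn't show Yahoo's scheme; a common sketch hashes a stable user id together with the experiment name:

    import java.nio.charset.StandardCharsets;
    import java.util.zip.CRC32;

    public final class Buckets {
        // Same user + experiment always maps to the same bucket, and
        // different experiments are independent of each other.
        public static boolean inTestBucket(String userId, String experiment, int percent) {
            CRC32 crc = new CRC32();
            crc.update((experiment + ":" + userId).getBytes(StandardCharsets.UTF_8));
            return crc.getValue() % 100 < percent;
        }
    }

    // e.g. if (Buckets.inTestBucket(userId, "share_prompt", 10)) showSharePrompt();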
EXPERIMENT EXAMPLE
Hypothesis: A prompt to share the newly created league will increase the number of shares
EXPERIMENT RESULTS
Successful!
71% of users that see the prompt share the league
EXPERIMENT EXAMPLE
Hypothesis: A tutorial will increase the number of completed teams
EXPERIMENT RESULTS
Team completion was actually unaffected: hypothesis rejected
But users were significantly more likely to complete the team in the same session
EXPERIMENTS
“Guesses” are not necessarily right
“Obvious” improvements may not be
Used correctly, real world data provides proof
PERFORMANCE
Caring is measuring
What numbers we track
Cold start time
FPS
Automated measurements (e.g. nightly build to track progress)
Track production numbers — this is what matters
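Cold start can be measured with nothing more than the clock: record a timestamp as early as possible in Application.onCreate and report the delta once the first screen is visible. A sketch (class names are illustrative; real numbers would go to your instrumentation backend, not logcat):

    import android.app.Application;
    import android.os.SystemClock;

    public class FantasyApp extends Application {
        public static long startMs;

        @Override
        public void onCreate() {
            super.onCreate();
            startMs = SystemClock.elapsedRealtime();
        }
    }

    // In the launcher activity:
    //   @Override
    //   public void onWindowFocusChanged(boolean hasFocus) {
    //       super.onWindowFocusChanged(hasFocus);
    //       if (hasFocus) {
    //           long coldStartMs = SystemClock.elapsedRealtime() - FantasyApp.startMs;
    //           // report coldStartMs alongside device and region metadata
    //       }
    //   }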
PERFORMANCE
Numbers will vary wildly in different regions
Slower networks, older devices
When we started monitoring, our worldwide average load time was ~2-3x our US/UK one
WRAP-UP
CI & automated testing are key for quality and stability
Instrument everything, use data to experiment and guide product
A/B testing can confirm product hypotheses
You should localise your apps, but know what you’re getting into
Performance needs prod monitoring and on-going measurement
Q & A
yahoo-mep.tumblr.com
www.florescu.org
@flor3scu