The post release technologies of Crysis 3 (Slides Only) - Stewart Needham


DESCRIPTION

For AAA games there is now a consumer expectation that the developer has a post-release strategy. This strategy goes beyond DLC content: users expect to receive bug fixes, balancing updates, game mode variations and constant tuning of the game experience. So how can you architect your game technology to facilitate all of this? Stewart explains the unique patching system developed for Crysis 3 Multiplayer, which allowed the team to hot-patch almost any asset or data used by the game. He also details the telemetry, server and testing infrastructure required to support this, along with some interesting lessons learned.


THE POST-RELEASE TECHNOLOGIES OF CRYSIS 3

Twitter: @coolbeenz

Email: stewart@crytek.com

INTRODUCTION: Job done?

CONTENTS

1. The reasoning: Why, What, How

2. Data Patching: Asset systems, Patch paks, Multiplayer flow, Handling failure & messaging

3. Telemetry: Collection, Storage, Syncing, Analysing, Matchmaking telemetry case study

4. Release-Debug: Other production mechanisms for gathering data

5. Summary: Lessons learned and future developments

6. Questions? Over to you...

PART 1: THE REASONING

Post-Release Technologies... “What are THEY for?”

TWEAKING the gameplay

IMPROVING the game experience

DIAGNOSING the cause of problems

FIXING bugs

FACILITATING themed weekends

Post-Release Technologies... “What EXACTLY are THEY?”

POST-RELEASE TECHNOLOGIES = DATA PATCHING + RELEASE DEBUG + TELEMETRY

Post-Release Technologies... “WHY DO WE NEED THEM?”

BECAUSE THINGS DO NOT ALWAYS GO TO PLAN

THE CRYSIS 3 TEST SCHEDULE (T200 = EA Worldwide Tech 200):
T200 (X360): 27th Sept
T200 (PC): 4th Oct
T200 (PS3): 11th Oct
Closed Alpha: 2nd Nov
T200 (X360): 8th Nov
T200 (PS3): 22nd Nov
T200 (PC): 29th Nov
Open Beta: 29th Jan

Despite alphas, betas and numerous large-scale tests, things will still slip through the net. The players are your most thorough QA.

AS A WAY TO DEPLOY ASSET FIXES RAPIDLY

... for certification failures

... on discovering copyrighted content

... when players are abusing an exploit

BECAUSE CERTIFICATION COSTS TIME & MONEY

[Timeline, December 2012 to March 2013: open-beta cert, open-beta live, final cert, RTM, release, Day 10 cert, Day 10 live]

40% of commits during cert & RTM were assets & data

BECAUSE WE WANT PEOPLE TO KEEP PLAYING THE GAME

SELL YOUR THEMED WEEKENDS

SO THAT WE CAN REACT TO FEEDBACK AND BUILD A COMMUNITY

PART 2: DATA PATCHING

CRYENGINE ASSET FILE SYSTEM - OVERVIEW

A virtual file system: files can be loose or part of asset package (.pak) files

Files referenced using paths, e.g. objects/level_specific/airport/architecture/terminal/main.cgf

Platform-agnostic API: files can be stored in memory, on media or on HDD

CRYENGINE ASSET FILE SYSTEM - PAK FILES

A collection of files: these are essentially zip archives of a folder hierarchy

Anti-tamper mechanisms: paks are digitally signed and encrypted in mastered builds

Stack-based searching: paks are searched in order of most recently opened

gEnv->pCryPak->OpenPak("objects1.pak");
gEnv->pCryPak->OpenPak("objects2.pak");
gEnv->pCryPak->OpenPak("objects3.pak");

gEnv->pCryPak->FOpen("objects/level_specific/airport/architecture/terminal/main.cgf", "rbx");

Search order: objects3.pak, then objects2.pak, then objects1.pak
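Stack-based searching can be modelled in a few lines to show why the most recently opened pak wins. This is a minimal sketch, not CryPak's implementation: `Pak`, `PakStack` and `Resolve` are hypothetical names, and a pak is reduced to an in-memory map from virtual path to contents.

```cpp
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a .pak archive: virtual path -> file contents.
struct Pak {
    std::string name;
    std::map<std::string, std::string> files;
};

// Minimal virtual file system with stack-based searching: the most recently
// opened pak is searched first, then older paks in reverse open order.
class PakStack {
public:
    void OpenPak(Pak pak) { paks_.push_back(std::move(pak)); }

    // Returns the name of the pak that services a request, or nullopt.
    std::optional<std::string> Resolve(const std::string& path) const {
        for (auto it = paks_.rbegin(); it != paks_.rend(); ++it) {
            if (it->files.count(path) != 0) return it->name;
        }
        return std::nullopt;
    }

private:
    std::vector<Pak> paks_;
};
```

If objects1.pak and objects2.pak both contain main.cgf and objects3.pak does not, the request is checked against objects3.pak first and then serviced by objects2.pak.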

Contents generally organised by type: objects, animation, scripts, music, sounds, etc.

Some created for specific loading: level loading, MPModeSwitch.pak

Some created for streaming: .dds0, .chr, .cgf, .cga

PATCH PAKS: A SIMPLE WAY TO OVERRIDE ANY EXISTING ASSET?

... Create a patch.pak containing updated versions of specific assets

... Mount this new pak file: mount it last or mark it with a special ‘priority’ flag

... New assets will be prioritised: any subsequent file requests will be serviced by these patched files first

... Patching happens at the asset system level, so individual game subsystems are oblivious

... Only suitable for Title Updates and DLC, as we need to hardcode the loading of this pak file in a new executable
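The ‘priority’ flag alternative can be sketched as a two-pass lookup: flagged paks first, then everything else, each pass newest first. This is a hypothetical model rather than CryPak's API; `MountedPak`, `PatchAwareVfs` and the `priority` field are assumed names, and the file paths in the usage are made up.

```cpp
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical pak model: virtual path -> contents, plus a priority flag.
struct MountedPak {
    std::string name;
    std::map<std::string, std::string> files;
    bool priority = false;  // assumed flag: search before all normal paks
};

class PatchAwareVfs {
public:
    void Mount(MountedPak pak) { paks_.push_back(std::move(pak)); }

    // Priority paks are searched first (newest first), then normal paks
    // (newest first). Callers only ever ask for a path, so game subsystems
    // never need to know a patch is involved.
    std::optional<std::string> Read(const std::string& path) const {
        for (bool wantPriority : {true, false}) {
            for (auto it = paks_.rbegin(); it != paks_.rend(); ++it) {
                if (it->priority != wantPriority) continue;
                auto file = it->files.find(path);
                if (file != it->files.end()) return file->second;
            }
        }
        return std::nullopt;
    }

private:
    std::vector<MountedPak> paks_;
};
```

With the flag, a patch pak still overrides a pak that happens to be mounted after it, which is what lets the patch be mounted at any point in start-up.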

ON DEMAND PATCHING: DOWNLOADING & APPLYING PATCH PAKS TRANSPARENTLY

“Why do we need to support a variable number of patch paks?”

... Differing lifetimes: Double XP Weekend vs level setup fixes

... Separate hot/cold assets: weapon balancing vs player stats fixes

... Risk reduction: smaller files mean less chance of failure

ON DEMAND PATCHING: CRYSIS 3 IMPLEMENTATION DETAILS

Multiplayer only

Process hidden within the transition to MP: we already show a loading screen and re-initialise most game systems anyway

Self-imposed limitations to reduce risk: cache size of 2 MB (X360 only); patch paks un-mounted on returning to single player

Regularly check for new updates, so that players can be informed if they need to re-enter MP

It all starts with a file called Permissions.xml...

ON DEMAND PATCHING: DOWNLOAD PAKS INTO MEMORY OVER HTTP

MULTIPLAYER FLOW: OVERVIEW

User selects Multiplayer -> Log in to Online Services -> TCR Reqs -> Download Permissions.xml -> Check Cache -> Download Patch1.pak -> Download Patch2.pak -> Mount paks -> Init game systems

MULTIPLAYER FLOW: POINTS OF FAILURE

MULTIPLAYER FLOW: TCR REQS

User selects Multiplayer -> Log in to Online Services -> TCR Reqs

TCR requirements:
Online play checks: cannot proceed unless allowed online
Need extra storage to cache paks: requires an extra 2 MB in the save game

How do we handle these? Hook into existing handling.

MULTIPLAYER FLOW

TCR Reqs -> Download Permissions.xml -> Check Cache -> Download Patch1.pak -> Download Patch2.pak

What can go wrong? Failing to download: general networking failures, bespoke networking configurations

How do we handle these? Abort! No patches, no telemetry

MULTIPLAYER FLOW

Download Permissions.xml -> Check Cache -> Download Patch1.pak -> Download Patch2.pak -> Mount paks

What can go wrong? Failing to download: MD5 checks, timeouts, general networking failures

How do we handle these? Cache paks (anti-tamper checks); continue to download in the background; provide help with manual download?

MULTIPLAYER FLOW

Download Permissions.xml -> Check Cache -> Download Patch1.pak -> Download Patch2.pak -> Mount paks

Failing to download: implement a configurable timeout
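The cache check, MD5 verification and configurable timeout can be combined into one acquisition loop. This is a sketch under stated assumptions, not the Crysis 3 code: `PatchEntry`, `AcquirePatches` and `Fetcher` are hypothetical names, the manifest stands in for whatever Permissions.xml actually contains, and the HTTP download plus MD5 computation are hidden behind a stub so the control flow stays visible.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical manifest entry, standing in for whatever Permissions.xml lists.
struct PatchEntry {
    std::string name;
    std::string md5;  // expected digest, hex encoded
};

struct AcquireResult {
    std::vector<std::string> toMount;  // paks that are safe to mount
    std::vector<std::string> skipped;  // failed, timed out or failed the MD5 check
};

// Downloader stub: given a pak name and a time budget, returns the MD5 of the
// downloaded bytes, or nullopt on a network error or timeout. A real version
// would issue the HTTP request and hash the payload.
using Fetcher = std::function<std::optional<std::string>(
    const std::string& name, std::chrono::milliseconds timeout)>;

AcquireResult AcquirePatches(const std::vector<PatchEntry>& manifest,
                             const std::map<std::string, std::string>& cachedMd5s,
                             const Fetcher& fetch,
                             std::chrono::milliseconds timeout) {
    assert(timeout.count() > 0);
    AcquireResult result;
    for (const PatchEntry& entry : manifest) {
        // Cache hit: the cached copy already matches the expected digest.
        auto cached = cachedMd5s.find(entry.name);
        if (cached != cachedMd5s.end() && cached->second == entry.md5) {
            result.toMount.push_back(entry.name);
            continue;
        }
        // Cache miss or stale copy: download within the configurable timeout.
        std::optional<std::string> digest = fetch(entry.name, timeout);
        if (digest && *digest == entry.md5) {
            result.toMount.push_back(entry.name);
        } else {
            // Carry on without this pak rather than blocking entry to MP.
            result.skipped.push_back(entry.name);
        }
    }
    return result;
}
```

A skipped pak leaves that client with a different set of mounted paks, which is exactly the mismatch the version-code isolation has to cope with.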

But... “WON’T THIS RESULT IN PLAYERS HAVING MISMATCHING SETS OF PATCHES?”

YES. But it is OK, because we have a plan...

1. ISOLATE PLAYERS: This is basically using the same checks used to isolate people running old builds (Retail & Development)

Client A, Client C and Server 1: version 0xA5BC
Client B, Client D and Server 2: version 0x3370

The version code is used as a matchmaking filter & during context establishment.

1. ISOLATE PLAYERS: XOR in the MD5s of each patch pak to create a unique version code

Matchmaking version = 0xA4BC (Exe) XOR 0x96CC XOR 0x0100 (patch paks P1 and P2) = 0x3370
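The XOR step itself is small. This is a hypothetical illustration, not the shipped code: `FoldMd5` and `MatchmakingVersion` are made-up names, and because a real MD5 digest is 128 bits while the slide shows 16-bit codes, the sketch assumes each digest is XOR-folded down to 16 bits before being combined.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

using Md5Digest = std::array<std::uint8_t, 16>;

// Hypothetical helper: XOR-fold a 128-bit MD5 digest down to 16 bits so it
// can be combined with a 16-bit version code like the ones on the slide.
std::uint16_t FoldMd5(const Md5Digest& digest) {
    std::uint16_t folded = 0;
    for (std::size_t i = 0; i < digest.size(); i += 2) {
        folded ^= static_cast<std::uint16_t>((digest[i] << 8) | digest[i + 1]);
    }
    return folded;
}

// XOR each mounted pak's code into the executable's version code. XOR is
// order independent, so clients that mounted the same set of paks agree on
// the result regardless of download order.
std::uint16_t MatchmakingVersion(std::uint16_t exeVersion,
                                 const std::vector<std::uint16_t>& pakCodes) {
    std::uint16_t version = exeVersion;
    for (std::uint16_t code : pakCodes) {
        version = static_cast<std::uint16_t>(version ^ code);
    }
    return version;
}
```

`MatchmakingVersion(0xA4BC, {0x96CC, 0x0100})` reproduces the slide's 0x3370, while a client holding only the 0x0100 pak gets 0xA5BC. Because XOR is its own inverse, a code mixed in twice cancels out, so each pak should contribute exactly once.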

2. COMMUNICATE: Let players know that they are matchmaking against a reduced pool

DATA PATCHING FUTURE DEVELOPMENTS: ASSET DELTAS

Full file must be deployed for a small modification: some of our XML files can be up to 500 KB in size

Text-based assets: XML & Lua files can easily have a delta injected after asset loading


Patch XML nodes: add, remove or modify at a node level

More complicated but huge savings: extra tools & build steps are required, but XML patches are reduced in size to 1-2% of the original
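Node-level patching can be illustrated without an XML library by operating on an already-parsed tree. Everything here is hypothetical: `Node`, `NodePatch` and `ApplyPatch` are made-up names, nodes are addressed by a path of child indices (a real tool might use XPath instead), and the weapon data in the usage is invented.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical, library-free stand-in for a parsed XML element.
struct Node {
    std::string tag;
    std::map<std::string, std::string> attributes;
    std::vector<Node> children;
};

enum class PatchOp { Add, Remove, Modify };

// One node-level patch. `path` lists child indices from the root: for Add it
// addresses the parent, for Remove and Modify the target node itself.
struct NodePatch {
    PatchOp op;
    std::vector<std::size_t> path;
    Node node;              // Add: node to append
    std::string attribute;  // Modify: attribute name
    std::string value;      // Modify: new attribute value
};

// Returns false if the path does not resolve; the tree is then unchanged.
bool ApplyPatch(Node& root, const NodePatch& patch) {
    Node* parent = nullptr;
    Node* target = &root;
    for (std::size_t index : patch.path) {
        if (index >= target->children.size()) return false;
        parent = target;
        target = &target->children[index];
    }
    switch (patch.op) {
        case PatchOp::Add:
            target->children.push_back(patch.node);
            return true;
        case PatchOp::Remove:
            if (parent == nullptr) return false;  // the root cannot be removed
            parent->children.erase(parent->children.begin() +
                                   static_cast<std::ptrdiff_t>(patch.path.back()));
            return true;
        case PatchOp::Modify:
            target->attributes[patch.attribute] = patch.value;
            return true;
    }
    return false;
}
```

A balancing tweak then ships as one small `NodePatch` instead of the full file, which is where the size reduction comes from.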

DATA PATCHING FUTURE DEVELOPMENTS: RE-DIRECT HTTP REQUESTS

Current permissions.xml end-point fixed: this makes testing new patches difficult

Need a way to redirect the request externally: using build-version, SKU-ID, tags etc.

Added bonus: could use this to patch net-tests, fix dev builds etc.
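One way to make the Permissions.xml end-point redirectable is an ordered rule list matched against build version, SKU and tags. This is a hypothetical sketch of that idea, not a described Crytek system: `ClientInfo`, `RedirectRule`, `ResolvePermissionsEndpoint`, the build and SKU strings and the example.com URLs are all invented for illustration.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical description of the client requesting Permissions.xml.
struct ClientInfo {
    std::string buildVersion;
    std::string skuId;
    std::vector<std::string> tags;
};

// One redirect rule; an empty optional matches any value.
struct RedirectRule {
    std::optional<std::string> buildVersion;
    std::optional<std::string> skuId;
    std::optional<std::string> tag;
    std::string endpoint;
};

bool HasTag(const ClientInfo& client, const std::string& tag) {
    for (const std::string& t : client.tags) {
        if (t == tag) return true;
    }
    return false;
}

// First matching rule wins; otherwise fall back to the fixed default end-point.
std::string ResolvePermissionsEndpoint(const ClientInfo& client,
                                       const std::vector<RedirectRule>& rules,
                                       const std::string& defaultEndpoint) {
    for (const RedirectRule& rule : rules) {
        if (rule.buildVersion && *rule.buildVersion != client.buildVersion) continue;
        if (rule.skuId && *rule.skuId != client.skuId) continue;
        if (rule.tag && !HasTag(client, *rule.tag)) continue;
        return rule.endpoint;
    }
    return defaultEndpoint;
}
```

Keeping the default as the final fallback means retail clients behave exactly as before, while net-test or dev builds can be pointed at a staging manifest.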

DATA PATCHING FUTURE DEVELOPMENTS: DIFFERENTIATE GAME-CHANGING PAKS

Some patches are not gameplay critical: for example cosmetic asset changes or players’ personal stats configurations

Exclude these from any filtering: basically, do not XOR this pak’s MD5 into the matchmaking version

Matchmaking version = 0xA4BC (Exe) XOR 0x0100 = 0xA5BC (the 0x96CC pak is excluded)
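Differentiating game-changing paks only needs a per-pak flag in the XOR loop. A minimal sketch, assuming a hypothetical `gameplayCritical` flag on each pak (the slides do not say where such a flag would be stored) and 16-bit codes as on the slide; `PatchPak` and `FilteredMatchmakingVersion` are made-up names.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical per-pak record: a 16-bit code derived from the pak's MD5 and
// an assumed flag marking whether the pak changes gameplay.
struct PatchPak {
    std::uint16_t md5Code;
    bool gameplayCritical;
};

// Only gameplay-critical paks contribute to the matchmaking version, so
// players who differ only in cosmetic or stats-config paks still match.
std::uint16_t FilteredMatchmakingVersion(std::uint16_t exeVersion,
                                         const std::vector<PatchPak>& paks) {
    std::uint16_t version = exeVersion;
    for (const PatchPak& pak : paks) {
        if (pak.gameplayCritical) {
            version = static_cast<std::uint16_t>(version ^ pak.md5Code);
        }
    }
    return version;
}
```

With the slide's values, marking the 0x96CC pak as non-critical yields 0xA5BC, the same version as a client that never downloaded that pak.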

PART 3: TELEMETRY COLLECTION

TELEMETRY COLLECTION - CLIENT OVERVIEW

Collection and uploading via HTTP: simple API to push data from files or memory

Compressed and streamed: data zipped up and streamed asynchronously

No guarantees: fire & forget. Upload may fail for numerous reasons

TELEMETRY COLLECTION - SERVER OVERVIEW

Storage of files received only: organised by date, platform and type

No complex processing on the server: no requirements for immediate results

Anonymous data: any usernames & accounts are salted and hashed

TELEMETRY COLLECTION - SYNCING DATA

Server data kept for a fixed time period: data deleted after seven days

Rsync-ed daily to internal servers: downloaded to Crytek servers

Analysed locally, then ultimately discarded

TELEMETRY COLLECTION - PROCESSING

Turning raw telemetry into useful data: achieved with a mixture of Python & Excel

Manually triggered and collated: considered the weakest link in the chain

Processing is slow and intensive: optimising has never been a high priority

So... “HOW DO YOU HANDLE HUNDREDS OF THOUSANDS OF CLIENTS UPLOADING SIMULTANEOUSLY?”

SAMPLE PLAYERS: Sample deterministically at the client end

bool shouldUpload = (Hash(username) % denominator) < numerator;

Example for user coolbeenz with numerator 100 and denominator 1000: Hash("coolbeenz") = 0x12345678; 0x12345678 % 1000 = 896; 896 < 100? NO, so do not upload.

Select a large denominator and do not change it: this sets the amount the sampling ratio can be incremented by

Choose a numerator to give you the desired sampling ratio, e.g. 100/1000 = 10%

Vary the numerator to meet changing sampling demands: the individual users being sampled remain consistent
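The sampling rule translates directly into code. The slides leave `Hash` unspecified, so this sketch assumes 32-bit FNV-1a as a stand-in; `Fnv1a32` and `ShouldUpload` are illustrative names. Any hash works as long as every platform computes the same value for the same username, which is why `std::hash` (whose output varies between standard library implementations) is avoided here.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// 32-bit FNV-1a, used only as a stand-in for the unspecified Hash().
std::uint32_t Fnv1a32(const std::string& text) {
    std::uint32_t hash = 2166136261u;  // FNV offset basis
    for (unsigned char c : text) {
        hash ^= c;
        hash *= 16777619u;  // FNV prime; unsigned overflow wraps mod 2^32
    }
    return hash;
}

// Deterministic client-side sampling: the same user always gets the same
// answer for a given numerator/denominator pair.
bool ShouldUpload(const std::string& username, std::uint32_t numerator,
                  std::uint32_t denominator) {
    assert(denominator > 0 && numerator <= denominator);
    return (Fnv1a32(username) % denominator) < numerator;
}
```

Because the test is `bucket < numerator`, raising the numerator from 100 to 200 keeps every previously sampled user in the set and only adds new ones, which is what keeps the sampled population consistent.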

And... “WHAT KIND OF TELEMETRY DO YOU COLLECT?”

CRYSIS 3 MATCHMAKING TELEMETRY

Matchmaking was one of the top 5 complaints, based on MyCrysis forum feedback

Find a session fast, but find a good session: this essentially boils down to ping times

For consoles & PC: PC also has a quick match option as well as a server browser

QUANTIFYING THE BLACK BOX

Tricky to balance and impossible to predict: requires constant re-evaluation, even with adaptive algorithms

User experience feedback is not good enough: you know people are not happy, but why exactly?

Create a system which is data driven:
Server side: used Blaze servers. Rule based, highly configurable, including relaxation criteria
Client side: the rules and times used can be configured and therefore data patched

If we are going to collect telemetry, we need to be able to action a response

CRYSIS 3 MATCHMAKING TELEMETRY: SO WHERE DO WE START?

Decide what questions need answering:
Q. How many times does a player matchmake?
Q. What kind of ping times do players get during that session?
Q. How long does it take a player to get into a session successfully?
Q. What is the most popular method of joining a session?
Q. What is the average matchmaking time?

CRYSIS 3 MATCHMAKING TELEMETRY: IMPLEMENT AN APPROPRIATE TELEMETRY SOLUTION

Need a solution that does not result in GBs of data, but that is still flexible enough to answer a range of questions

Collect a series of timestamped events in XML: timestamps based on a zero base time, but still store a server timestamp for collating multiple clients’ data

Also collect metadata for each event, e.g.
<AttemptConnection Method="MatchMake" Timestamp="0.000" />
Other Method values: “GameBrowser”, “Join Session in progress”, “Friend Invite”, “Join Squad”
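Recording events in the slide's format takes little more than string formatting. This is a hypothetical sketch: `MatchmakingTelemetry`, the `<Matchmaking ServerTime=...>` wrapper element and the explicit `elapsed` parameter are assumptions made for illustration, and attribute values are assumed not to need XML escaping.

```cpp
#include <chrono>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Hypothetical recorder for the event format on the slide. Event timestamps
// are seconds relative to a zero base time; the server time is stored once
// so that several clients' logs can be collated afterwards.
class MatchmakingTelemetry {
public:
    explicit MatchmakingTelemetry(std::string serverTime)
        : serverTime_(std::move(serverTime)) {}

    // `elapsed` is the time since the zero base; passed in explicitly so the
    // recorder does not depend on a real clock.
    void Record(const std::string& event, const std::string& method,
                std::chrono::milliseconds elapsed) {
        char stamp[32];
        std::snprintf(stamp, sizeof stamp, "%.3f", elapsed.count() / 1000.0);
        events_.push_back("<" + event + " Method=\"" + method +
                          "\" Timestamp=\"" + stamp + "\" />");
    }

    std::string ToXml() const {
        std::string xml = "<Matchmaking ServerTime=\"" + serverTime_ + "\">\n";
        for (const std::string& e : events_) xml += "  " + e + "\n";
        return xml + "</Matchmaking>\n";
    }

private:
    std::string serverTime_;
    std::vector<std::string> events_;
};
```

Relative timestamps keep each event short, and the single server timestamp is enough to line different clients' logs up on one timeline.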

Collect time-stamped events with metadata to answer:
Q. How many times does a player matchmake?
Q. How long does it take a player to get into a session successfully?
Q. What is the most popular method of joining a session?
Q. What is the average matchmaking time?
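Once the event logs are reduced to one record per player, several of these questions become simple aggregates. A hypothetical analysis sketch: `MatchmakingSession`, `Summarise` and the choice to compute the "under N seconds" percentage over successful sessions only are assumptions, since the slides do not state which denominator was used.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical per-player summary extracted from the event logs.
struct MatchmakingSession {
    double firstAttempt;          // seconds from the zero base time
    std::optional<double> joined; // empty if the player never got in
    int attempts;                 // number of matchmaking attempts
};

struct MatchmakingSummary {
    double averageSeconds = 0.0;   // over successful sessions only
    double percentUnder = 0.0;     // % of successful sessions under the threshold
    double averageAttempts = 0.0;  // over all sessions
};

MatchmakingSummary Summarise(const std::vector<MatchmakingSession>& sessions,
                             double thresholdSeconds) {
    MatchmakingSummary s;
    if (sessions.empty()) return s;
    double totalTime = 0.0;
    std::size_t succeeded = 0;
    std::size_t fast = 0;
    int totalAttempts = 0;
    for (const MatchmakingSession& m : sessions) {
        totalAttempts += m.attempts;
        if (!m.joined) continue;
        double t = *m.joined - m.firstAttempt;
        totalTime += t;
        ++succeeded;
        if (t < thresholdSeconds) ++fast;
    }
    s.averageAttempts = static_cast<double>(totalAttempts) / sessions.size();
    if (succeeded > 0) {
        s.averageSeconds = totalTime / succeeded;
        s.percentUnder = 100.0 * fast / succeeded;
    }
    return s;
}
```

Calling `Summarise(sessions, 5.0)` gives the kind of "percentage of players matched in under 5 seconds" figure quoted in the results.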

RESULTS: MATCHMAKING TELEMETRY

The results were very insightful and resulted in several iterative improvements

Initially 65% of players took less than 5 seconds to find a match; eventually this was increased to 82%

1 in 15 matchmaking requests fail: still not perfect, but there are many external factors at play

The most surprising result was that there were still 2 major bugs in the client-side code. One of these was fixed with a data patch. Win!

[Chart: Time to matchmake. X-axis: time (s); y-axis: % users matched]

[Charts: How do players join a session? Console: Quick Match, Join Squad - Already In Game, Join Squad - Lobby, Private Game, Join Friends Game. PC: Server Browser, Quick Match, Join Squad - Already In Game, Join Squad - Lobby, Join Friends Game]

FUTURE DEVELOPMENTS: MATCHMAKING TELEMETRY

Automate the analysis of the telemetry: the manual process meant delays in turning around changes

Utilise A/B testing: results change over time, so they can be skewed by a different player pool

User actions telemetry: we did not collect all user action events, for example when the user backed out

PART 4: RELEASE DEBUG

DEBUG SCREENS: Ensure you can gather the info you need in large-scale public testing

ERROR CODES: Embed error codes as well as user-friendly (TCR) messaging
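Pairing a friendly message with a short quotable code can be as simple as one formatting function. This is a hypothetical sketch: the `MpError` values, the "MP-" prefix and the message texts are invented for illustration and are not Crysis 3's actual codes or TCR wording.

```cpp
#include <cstdio>
#include <string>

// Hypothetical error codes; the numbering scheme is an assumption.
enum class MpError : int {
    PermissionsDownloadFailed = 101,
    PatchMd5Mismatch = 102,
    PatchDownloadTimeout = 103,
};

// Player-facing (TCR-friendly) text plus a short code that QA and community
// staff can ask players to quote back when reporting a problem.
std::string FormatError(MpError error) {
    const char* message = "Unable to connect to online services.";
    switch (error) {
        case MpError::PermissionsDownloadFailed:
            message = "Unable to retrieve online configuration. Please try again later.";
            break;
        case MpError::PatchMd5Mismatch:
        case MpError::PatchDownloadTimeout:
            message = "Some game updates could not be downloaded. You may be matched with fewer players.";
            break;
    }
    char code[16];
    std::snprintf(code, sizeof code, "MP-%03d", static_cast<int>(error));
    return std::string(message) + " (Error " + code + ")";
}
```

Several codes can share one friendly message while still being distinguishable in bug reports, as the two patch-download failures do here.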

SUMMARY

Start early: think ahead, the technology involved is complex and cannot be bolted on

Collecting telemetry is easy; turning it into useful information is difficult

Have the ability to scale collection: be able to balance server load and fail safe

Make it easy to test: don’t underestimate the amount of testing required in development

Automate as much as you can: any manual elements of the system become its weakest point

Get buy-in from management: it is difficult to justify continued support when the returns are not directly financial

That is it! “Do you HAVE ANY QUESTIONS?”

THANK YOU FOR LISTENING: Any feedback, positive or negative, welcomed

