Quality Assurance and Game Testing. Post. doc., Center for Computer Games Research, IT University...


Post. doc., Center for Computer Games Research, IT University of Copenhagen

Doesn't pay much, but offers the opportunity for "blue sky research"

General expertise with the user experience, game testing, game design, etc.

Primer on empirical research methods

Game Testing 101: General principles

Game testing during the production cycle, with an introduction to several key methods

Focus: Theory, Practice & Tools

Scientific theory

Empirical research approaches

Empirical game studies

All technology testing is based on empirical research and evaluation methods

To understand what games testing really is, you must understand empirical research approaches

If not: You end up blindly using methods you do not understand

Science: Any systematic knowledge or practice.

Science generally refers to a way of acquiring knowledge through the scientific method, as well as the organized body of knowledge gained through such research.

Adheres to positivist philosophy: The only authentic knowledge is scientific knowledge

Science = Logic + Observation

Three types of science:
▪ Natural science: The study of natural phenomena
▪ Social science: The study of human behavior and societies
▪ Formal science: Mathematics, including statistics and logic; uses a priori rather than empirical methods

The first two are empirical sciences, the third a mixture; however, all feed into each other

▪ A priori = deductive knowledge (independent of experience)

▪ A posteriori = Inductive knowledge (dependent on experience)

Experimental science: Another term for empirical sciences

Applied science: Application of scientific research to specific human needs – such as game testing!

The two are often combined

Empirical sciences
▪ Knowledge obtained from observable phenomena
▪ Reproducible: Phenomena must be reproducible under experimental conditions by other scientists in order to be validated

Careful, objective and systematic study of an area of knowledge

Must follow the scientific method

The scientific method: A body of techniques for investigating phenomena and acquiring knowledge

Collection of data through observation and experimentation, and the formulation and testing of hypotheses

Evidence must be observable, empirical and measurable, subject to principles of reasoning

Empirical research must follow:
1. Define the question
2. Gather information and resources (observe)
3. Form hypothesis
4. Perform experiment and collect data
5. Analyze data
6. Interpret data and draw conclusions that serve as a starting point for new hypotheses
7. Redo entire cycle if necessary
8. Publish results
9. Retest (frequently done by other scientists)

Alternative: Explorative approach. Similar requirements on objectivity and reasoning, but forgoes hypothesis forming.

Example
1. Are there any bugs with this feature of game X?
2. Get the game, set up a lab and assimilate knowledge from other test cases
3. Hypothesis: There probably are some bugs in our game ....
4. Run tests and collect test data
5. Analyze data
6. Interpret test data and draw conclusions: We found X number of bugs. Do we have reason to believe all the bugs have been found?
7. Redo entire cycle if necessary
8. Publish results to bug database and get designers to fix them
9. Retest to see if bugs have been fixed

Game testing should always follow the scientific method!

A hypothesis defines an expected relationship between variables, which can be empirically tested.

For example:
▪ Eliminating the minimap in StarCraft will increase player engagement
▪ Making the bazooka do more damage will balance the weapons in this game
▪ There are no bugs in this level
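A hypothesis such as the minimap example can be checked by comparing a measured outcome between two groups of players. The following is a minimal sketch, assuming engagement is captured as one rating per playtester; the ratings, the variable names and the choice of a permutation test are illustrative assumptions, not data or methods from the lecture.

```python
import random
from statistics import mean

def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Returns (observed_difference, p_value).
    """
    rng = random.Random(seed)
    observed = mean(group_b) - mean(group_a)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign ratings to the two groups.
        rng.shuffle(pooled)
        diff = mean(pooled[n_a:]) - mean(pooled[:n_a])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction so the p-value is never exactly zero.
    return observed, (extreme + 1) / (n_permutations + 1)

# Hypothetical engagement ratings (1-10), one per playtester.
with_minimap = [6, 5, 7, 6, 5, 6, 7, 5]
without_minimap = [7, 8, 6, 8, 7, 9, 7, 8]

diff, p_value = permutation_test(with_minimap, without_minimap)
print(f"mean difference: {diff:.2f}, p = {p_value:.4f}")
```

A small p-value would suggest the difference is unlikely to be due to chance alone; it says nothing about whether the sample is representative of the player population.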

Empirical research methods come in two forms:

Quantitative methods: Collect numerical data, strictly objective, analyzed using statistical methods

Qualitative methods: Collect data in the form of text, images, sounds etc. Drawn from observations, interviews, documentary evidence etc., and analyzed using qualitative data analysis methods (e.g. content coding)

Data and analysis can be subjective: Relies on researcher experience

Qualitative: More appropriate in early stages of research (exploratory research) and for theory building
▪ Qualitative methods apply well in real-world settings, but lack validity and control
▪ Problem with subjective interpretation of the data

Examples
▪ Case study: Observations carried out in a real-world setting
▪ Action research: Applying a research idea in practice, evaluating the results and modifying the idea (a cross between experiment and case study)

Quantitative: Appropriate when theory is well developed. Theory testing and refinement

Examples:
▪ Experiment: Apply treatment, measure results. This is the only method that can demonstrate a causal relationship between variables. Associated with the scientific method
▪ Survey: Asking rated questions in an interview
▪ Historical data: Patterns in investments

Most quality research includes both types of methods

Method selection is critical to success of any project

Selection must be driven by state of knowledge

Determining whether a hypothesis/theory is supported – easy with bugs, hard with game balancing

Quantitative data analysis: Use of statistical methods to identify patterns and relationships in the data

Qualitative data analysis: More subjective, relies on the researcher’s knowledge to identify patterns, extract themes and make generalizations
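One simple way to make coded qualitative material comparable is to tally how many participants raised each theme. The sketch below assumes interview segments have already been assigned codes by a researcher; the participant labels and code names are hypothetical.

```python
from collections import Counter

# Hypothetical coded interview segments: (participant, code assigned by the researcher).
coded_segments = [
    ("P1", "controls_confusing"),
    ("P1", "story_engaging"),
    ("P2", "controls_confusing"),
    ("P3", "difficulty_spike"),
    ("P3", "controls_confusing"),
    ("P4", "story_engaging"),
]

def code_frequencies(segments):
    """Count how many distinct participants mentioned each code."""
    # A set removes repeated mentions of the same code by the same participant.
    seen = {(participant, code) for participant, code in segments}
    return Counter(code for _, code in seen)

frequencies = code_frequencies(coded_segments)
for code, count in frequencies.most_common():
    print(f"{code}: mentioned by {count} participant(s)")
```

The counts only summarize the coding; the quality of the result still depends on the researcher's judgement when assigning codes.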

Data is objective – otherwise it is information

Processed (refined) information is termed knowledge

Generally: Data → Information → Knowledge

Foundational principle for all IT industries

QA is a knowledge acquisition process

Summarized: QA is the empirical process of acquiring data, refining the data into information, and converting it to knowledge that can be implemented by company stakeholders (design, marketing etc.)

A reason why companies hire lots of testers during crunch time ...

QA in the games industry

Components of game testing

General purposes of game testing

Testing phases: Intro

Purpose of game testing:

To see how specific components of, or the entirety of, a game is played by people

The litmus test that allows developers to evaluate the state of the game and the quality of the gaming experience

The Company

Sort of the company ...

QA is not a part of the main company by necessity: Keeping QA separate eliminates bias

QA is viewed as a necessary evil – low pay and crappy conditions are common

QA informs what is wrong in games under development (causing frustration)

Many forget QA can also tell what is good (causing happiness)

General software industry: QA takes 8-12% of total resources

Games industry: less than 1% ....

General software industry: QA throughout production

Games industry: QA often relegated to a secondary position in the production pipeline

Result:

Digital games have horrible quality compared to e.g. desktop applications

Non-technical game testing falls within HCI

HCI: Human-Computer Interaction

Mixture of computer science, psychology etc.

Many different types of measures – quantitative and qualitative

20+ years of use in the software industry

Purpose: Technical, content, functional

Phase: Positioning in the development cycle

Testing method: e.g. usability, bug hunting

Game feature: The element being tested

Technical
▪ Issues relating to the game engine itself and hardware
▪ Well-established methods common to software development

Functional
▪ Bug hunting, stability, integrity of game assets, gameplay, localization issues, controls, interface

Content
▪ Presentation, graphics, level design, game story, user experience

Game production cycle has 3 general steps:
▪ Pre-production
▪ Production
▪ Post-production

Most game companies follow agile development
▪ Sprints and Scrums
▪ Rapid iterations of game elements
▪ Requires QA to follow the same iterative nature

Pre-production
▪ Focus group tests
▪ Benchmarking

Production
▪ Metrics
▪ Bug hunting
▪ Playtesting
▪ Usability testing
▪ Game test labs

Post-production
▪ Post-mortems & managing communities

Important phase, but often overlooked

Testing of design and concepts: Story, character, world, artwork

Two typical methods:
▪ Focus groups
▪ Benchmarking

Popular method, but problematic

Good use can lead to valuable insights, bad use to disaster

Good for generating ideas, player impressions, norms/values of the audience

Bad for providing concrete feedback to specific issues

Intensive design: Few units but lots of variables

Central weakness: Non-representativeness of the group participants

Analytical selection: Group participants should display the characteristics required to illustrate the case at hand

Size: 3-12 participants

Fewer ruins the interaction, more makes the group impossible to manage

Testers: build a good tester database

Screen people before adding them

Cover target audience and outside it

Types of participants

Internal: From the company
▪ Literature advises against using people we know in focus groups
▪ But some internal testers can be treated as expert testers

External: From outside the company
▪ Fans and non-fans: The problem with bias

Practical considerations for running focus groups

Homogeneous or heterogeneous structure?

Should participants know each other?
▪ Less likely to speak freely if they do
▪ Easier to get people to talk if they do

Group size

Small groups: Good for digging deep into associations of players
▪ Low degree of moderation, loose structure

Large groups: Good for gathering many different perspectives
▪ High degree of moderation, tight structure

Running a focus group

Prepare in advance: interview guide, purpose

Decide loose or tight structure
▪ Loose structure: harder to compare across groups
▪ Tight structure: less chance of new knowledge

Visual aids should be ready

The moderator

Monitors and moderates the focus group

Incredibly important: must be a good listener and highly attentive to the participants and the social interaction

Usually teamed with an observer

A note of warning:

Focus groups are often run by marketing, game testing by QA – this is BAD!

Consumer testing (on e.g. the box art) should be run by marketing, but NOT game test focus groups

Little used in the industry, early-phase

A form of requirements analysis

Methodical evaluation of competing games, recording what works and what does not

Provides the minimum benchmark the new game must meet

Vast majority of testing during this phase

Early testing of game controls and specific game elements

Later testing of alpha builds, mechanics, story etc.

Iterative test pattern following agile development

E.g. on a bi-weekly basis:
▪ Define tests needed
▪ Run tests on newest builds
▪ Collate and analyze data
▪ Deliver reports
▪ Log results in test database

Numerical data drawn from client installs or servers

Tracking what players do when they play, e.g.
▪ Who shoots whom, where and when? (heatmaps)
▪ Which areas of the map do players explore?
▪ Is the balancing between weapons working?

Immensely useful in games!

Logging the X,Y (Z) position of the player in real time and what people do in that time

Technique developed extensively by Microsoft Game Labs for Halo III (used by others before though)

Questions this can answer, e.g.:
▪ Which areas of the level map do players utilize?
▪ What do players do with their time? (is it what we thought?)
▪ Does the level promote the behavior we anticipated?
▪ Where do players experience problems progressing through a level?
▪ Do our players move through the map as we expected?
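Position logging of this kind is usually turned into a heatmap by counting samples per grid cell. The following is a minimal sketch, assuming 2D positions in world units and a square grid; the sample coordinates and the cell size are made up for illustration.

```python
from collections import Counter

def build_heatmap(positions, cell_size):
    """Count logged (x, y) positions per square grid cell."""
    heatmap = Counter()
    for x, y in positions:
        # Floor division maps a world coordinate to its grid cell index.
        cell = (int(x // cell_size), int(y // cell_size))
        heatmap[cell] += 1
    return heatmap

# Hypothetical position samples (world units), e.g. one per second per player.
logged_positions = [
    (3.2, 4.1), (3.9, 4.8),
    (12.5, 4.0), (13.1, 4.4), (13.7, 4.9),
    (25.0, 18.3),
]

heatmap = build_heatmap(logged_positions, cell_size=10)
for cell, visits in heatmap.most_common():
    print(cell, visits)
```

Cells with no entries are areas no sampled player visited, which is often as informative as the hot spots.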

Logging the weapons used, the target, effect and the position when used.

Weapon balance is a key aspect of multi-player games, notably PvP.

Questions this can answer, e.g.:
▪ Is this weapon too hard or too easy to use?
▪ Is a specific weapon too effective, or too ineffective? (balancing)
▪ Are there specific maps where specific weapons are too effective?
▪ Do players use all the weapons the game offers?
▪ Do players use the weapons as the developers intended? (if not, how can we use this?)
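Several of these weapon questions reduce to simple ratios over the combat log. The sketch below assumes each log entry records the weapon, whether the shot hit and whether it killed the target; the log format and the three ratios are illustrative assumptions, not a format described in the lecture.

```python
from collections import defaultdict

# Hypothetical combat log entries: (weapon, hit_landed, target_killed).
combat_log = [
    ("bazooka", True, True),
    ("bazooka", True, True),
    ("bazooka", False, False),
    ("pistol", True, False),
    ("pistol", True, False),
    ("pistol", False, False),
    ("pistol", True, True),
    ("shotgun", True, True),
]

def weapon_stats(log):
    """Return per-weapon usage share, accuracy and kills per use."""
    uses = defaultdict(int)
    hits = defaultdict(int)
    kills = defaultdict(int)
    for weapon, hit, kill in log:
        uses[weapon] += 1
        hits[weapon] += hit    # True counts as 1, False as 0
        kills[weapon] += kill
    total = len(log)
    return {
        weapon: {
            "usage_share": uses[weapon] / total,
            "accuracy": hits[weapon] / uses[weapon],
            "kills_per_use": kills[weapon] / uses[weapon],
        }
        for weapon in uses
    }

stats = weapon_stats(combat_log)
for weapon, values in stats.items():
    print(weapon, values)
```

A weapon with a high usage share and high kills per use is a candidate for being too effective; a weapon that is almost never used may be too weak or too hard to find. Grouping the same log by map would address the per-map question.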

Logging how players fare in the face of game challenges – and other players

Balancing player tasks and challenge levels is one of the hardest design tasks

Questions this can answer, e.g.:
▪ How long do players survive on this map?
▪ Do players ever complete the map objectives?
▪ Is this map favorable to a specific side/team?
▪ Are there any patterns in the way people play the game?
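Survival time and side balance can be summarized per map from match records. This is a minimal sketch, assuming each record stores the map, the winning team and each player's survival time; the map name, team names and numbers are hypothetical.

```python
from collections import Counter
from statistics import median

# Hypothetical match records: (map_name, winning_team, survival_seconds_per_player).
matches = [
    ("canyon", "red", [42, 95, 130, 61]),
    ("canyon", "red", [38, 77, 150, 90]),
    ("canyon", "blue", [55, 88, 120, 64]),
    ("canyon", "red", [47, 102, 99, 70]),
]

def map_summary(records, map_name):
    """Summarize win share per team and median survival time for one map."""
    wins = Counter()
    survival = []
    for name, winner, times in records:
        if name == map_name:
            wins[winner] += 1
            survival.extend(times)
    played = sum(wins.values())
    if played == 0:
        raise ValueError(f"no matches recorded for map {map_name!r}")
    win_share = {team: count / played for team, count in wins.items()}
    return win_share, median(survival)

win_share, median_survival = map_summary(matches, "canyon")
print(win_share, median_survival)
```

With only four matches a lopsided win share proves nothing; a real analysis would need far more matches before concluding that a map favors one side.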

Metrics can inform WHAT players are doing

Metrics cannot inform WHY players are doing something

For this we need other types of tests

Bug hunting is a heavily structured process of locating game flaws and reporting them

Bug hunting is important because of the myriad opportunities for conflicts in the game code, objects etc.

Usually done by professional game testers
▪ Lousy job, huge turnover, lots of issues here

Two overall purposes

Finding new bugs
▪ Hmm, I wonder what happens if I try running into this door whilst firing the bazooka?

Trying to recreate old bugs that may have been fixed
▪ Hold ”up” when entering or leaving a room and any currently-held items will be dropped
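Both purposes above, finding new bugs and retesting old ones, map naturally onto a bug record with a status that changes over time. The following is a minimal sketch of such a record; the class, the status names and the build labels are hypothetical and do not describe any particular bug database.

```python
from dataclasses import dataclass, field
from enum import Enum

class Status(Enum):
    OPEN = "open"
    FIXED_PENDING_RETEST = "fixed, pending retest"
    VERIFIED = "verified"

@dataclass
class BugReport:
    title: str
    steps_to_reproduce: list
    build_found: str
    status: Status = Status.OPEN
    history: list = field(default_factory=list)

    def mark_fixed(self, build):
        """A developer claims the bug is fixed; it still needs a retest."""
        self.status = Status.FIXED_PENDING_RETEST
        self.history.append(f"fix claimed in {build}")

    def retest(self, build, reproduced):
        """Record a retest: reopen the bug if it still occurs."""
        self.status = Status.OPEN if reproduced else Status.VERIFIED
        outcome = "reproduced" if reproduced else "not reproduced"
        self.history.append(f"retest on {build}: {outcome}")

bug = BugReport(
    title="Held items are dropped when changing rooms",
    steps_to_reproduce=["Pick up any item", "Hold 'up' while entering or leaving a room"],
    build_found="build-041",  # hypothetical build label
)
bug.mark_fixed("build-042")
bug.retest("build-042", reproduced=True)   # the fix did not hold, so the bug reopens
bug.mark_fixed("build-043")
bug.retest("build-043", reproduced=False)  # now verified
print(bug.status.value, bug.history)
```

Keeping the retest history on the record makes it visible when a bug keeps coming back across builds.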

”The design equivalent of bug hunting” (Rouse, 2003)

When players see if the game is fun and try to find faults in the mechanics themselves
▪ AI, controls, balancing, etc.

Playtesting is typically done with stable builds, feedback gathered via structured questionnaires and game logs

Playtesting is a form of user experience analysis:

Evaluating the impact of the factors that determine the experience of using a product

Focused on content and functionality

More than 20 years of method history within the software and consumer products industries

Established methods that are beginning to be adapted to the unique nature of digital games

[Diagram: testing methods grouped as: Focus groups, Benchmarking | Usability testing, (Technical QA) | Focus groups, Bug hunting, Playtesting | Bug hunting, Playtesting, Benchmarking]

Playtesting must provide specific, structured feedback

”The third boss was hard to kill” - is not specific

”The third boss on the second map was hard to kill – I had no idea what to do or whether I had missed a special weapon. Perhaps I should have used the big rock, but I was not sure” – is specific

“Problem: Players do not find the rocket launcher for killing [boss name] on [map name]. It is not obvious to them that they should sidetrack to locate this weapon before the encounter” – is specific and structured
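The difference between vague and structured feedback can be made explicit by giving feedback a fixed shape and checking which parts are missing. The sketch below is one possible shape; the field names are hypothetical, and the example values are taken from the feedback examples above.

```python
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class PlaytestFeedback:
    problem: str
    map_name: Optional[str] = None
    encounter: Optional[str] = None
    likely_cause: Optional[str] = None

def missing_details(feedback):
    """List the fields a tester left empty."""
    return [f.name for f in fields(feedback) if getattr(feedback, f.name) in (None, "")]

vague = PlaytestFeedback(problem="The third boss was hard to kill")
structured = PlaytestFeedback(
    problem="Players do not find the rocket launcher before the boss fight",
    map_name="second map",
    encounter="third boss",
    likely_cause="It is not obvious that players should sidetrack to locate the weapon",
)

print(missing_details(vague))
print(missing_details(structured))
```

A questionnaire built around the same fields nudges testers toward the structured form without requiring them to write a full report.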

Key to working with playtesters:

Knowing when to take their opinions seriously, and when not to

Understanding the biases they operate under

Need different kinds of playtesters
▪ Internal playtesters: experts, but subjective
▪ Professional playtesters: game testers who also provide feedback on gameplay etc. Typically hard-core gamers only
▪ Amateur playtesters: show us how players will react when meeting the game
▪ Non-gamers: locate gameplay issues that are non-intuitive and overlooked by experienced gamers

Some people to avoid as playtesters
▪ People you know personally
▪ Your boss
▪ Hard-core fans of the game/company
▪ Idiots: ”The fourth type of people that you do not want to have testing your game is idiots. Idiots tend to say idiotic things and have idiotic opinions, and as a result will not be of much help to you” (Rouse).

But who are the idiots? The people who disagree with the designer?

Early playtesting
▪ Best done by experienced people who can overlook obvious flaws
▪ Generally guided playtesting

Late playtesting
▪ Good with inexperienced testers, when the game is tuned and balanced
▪ Ideally all kinds of testers at alpha and beta
▪ Generally unguided playtesting: players roam around

Interactive products need to facilitate the tasks the user is performing with them

The goals: Accessibility, utility and ease of use (and satisfaction)

In software, usability traditionally applied to interface design (quick, intuitive, easy)

Usability: In the domain of user-centered design

In games, pure usability is not enough (but really nice!)

The interactive experience must also be fun, immersive and engaging

Usability methods therefore adapted to gaming

Purpose: To find out how players interact with the game and track functional problems as well as content problems

In practice, there are many different ways of running usability tests, e.g.
▪ Expert test
▪ Think-aloud test
▪ Task completion test
▪ Interface and functionality test

User position

Observer position

Data gathered from usability testing:
▪ Screen capture
▪ Metrics
▪ Comments from the player
▪ Error rates
▪ Behavior analysis
▪ Survey data
▪ Interview data
▪ And yet more ...

Task completion tests

Small samples are used: 6-10 people in a single test round

Rationale: 10 people locate 80-95% of the interface problems a group of 30 would find

Session times: 60-90 minutes

Two main phases

1) Player spends some unstructured time, familiarizing themselves with the game and controls

2) Player goes through a specific set of pre-defined tasks

Tasks based on pre-planned use cases

Can be done on a paper mock-up, prototype or alpha/beta build

Observer sits either next to the player or in a neighbouring room

Often followed by a structured survey or interview
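The small-sample rationale above can be illustrated with a common problem-discovery model, in which each tester independently finds any given problem with some probability p, so that n testers are expected to find a share 1 - (1 - p)^n of the problems. The lecture does not state which model its figures come from; the model and the detection probabilities below are assumptions chosen for illustration.

```python
def share_of_problems_found(n_testers, p_detect):
    """Expected share of problems found by n testers, assuming each tester
    independently finds any given problem with probability p_detect."""
    return 1 - (1 - p_detect) ** n_testers

def relative_coverage(small_group, large_group, p_detect):
    """Share of the large group's findings that the small group is expected to match."""
    return (share_of_problems_found(small_group, p_detect)
            / share_of_problems_found(large_group, p_detect))

# Assumed per-tester detection probabilities; real values vary by product and task.
for p in (0.15, 0.20, 0.25):
    ratio = relative_coverage(10, 30, p)
    print(f"p = {p:.2f}: 10 testers find {ratio:.0%} of what 30 would find")
```

Under these assumptions the ratio rises quickly with p, which is why adding testers beyond a small group yields diminishing returns for the more visible problems.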

Heuristic evaluation: Much used in interface design

Heuristics are design principles, e.g.
▪ Games should be easy to learn, hard to master
▪ Game interfaces should be intuitive to the player

Heuristics often used with expert usability tests, notably early in the design phase

Problems found earlier are easier to fix

There is still no good list of heuristics for games

Heuristic: Things the player needs to see should stand out. (i.e. everything the player needs to see needs to be big enough to be perceived)

The only object in this car chase game that can damage you is the tiny red dot just to the left of your car.

The object is too small for some players to see. The challenge should be evading the bullet, not seeing it in the first place.

Post-mortems
▪ Documentation of the experiences during the game production, for new productions
▪ Rounded tests of the game, setting benchmarks for the next game

Community feedback
▪ Collecting metrics for e.g. rankings, score boards

Running support and updating
▪ Running tests of patch updates
▪ Client data: Fantastic wealth of metrical data!
▪ MMOs: Running tests of new content, data mining on player activities etc.

Game testing is carried out in game testing labs

Laboratories provide a controlled setting; however, they also remove the player from his/her natural environment

Fundamental assumption of lab-based research:

Conditions within correspond to conditions without

3 general types, often mixed

Focus group labs: Any multi-media capable room

Playtesting labs: Banks of computers with wall partitions, open setups for multi-player

Usability labs: Two rooms connected by one-way mirrors and cameras. Tester in one room, observers in the other

[Figure: example test lab layouts. Labels: networked PCs, one-way mirrors, Rooms 1-3, observation areas, ceiling microphones, CONE (180 degree WTW screen), high-end PCs, coffee maker, motion capture suits, VR gloves, stereoscopic goggles, head mounted displays]

QA is vital in the games industry for numerous reasons, e.g.

- In meeting high consumer standards and minimizing return and support costs

- “No longer confined to the production domain of bug-hunting, testers are expanding into the territory of usability and focus group testing to help ensure higher customer expectations are met” [Gamasutra] (and the expectations are rising dramatically).

- QA is a breeding ground for talent of all types and a key internal recruitment resource

Game testing is not game design by committee

Input that comes from game testing should not be used and implemented without careful consideration

Testing may indicate something should change, but if your gut instinct says no, think about it first

You cannot please everyone

Game testers usually have more experience than focus group participants, so their opinions should matter more
▪ Use them for more than bug hunting

QA has low status in the industry: game designers do not always listen or care
▪ No easy way to solve this
▪ Studios such as Lionhead are changing this perspective in the industry

Game characters/avatars have:
▪ Visual development
▪ Behaviour that emphasises the character theme
▪ Audio

Game characters/avatars often lack:
▪ Distinct, defined personalities
▪ Backgrounds
▪ In-depth integration into the game world

Of course there are exceptions!

“… Full character design, but with a necessarily one-dimensional personality so that the player can flesh out its motivations. The trick is to strike a balance between establishing the actor’s personality without letting that personality disturb the player” [Guard].

“At the end of the day, a game character shouldn't have anything more than superficial personality traits since, whatever the point of view, the player needs to retain as much control as possible.” [Rolling & Morris].

Question: Could characters with personalities, backgrounds etc. be useful in game design?

Hypothesis: Game characters with distinct personalities and backgrounds different from the player will prevent players from being entertained by and utilizing the characters

Follow-up questions:
1. If the hypothesis is disproven, how broad is the solution space for the design of complex characters?
2. Furthermore, are there any character elements that should be avoided in order not to alienate players?

Empirical testing of game character-player interaction

Focus on multi-player games, across digital/tabletop formats
▪ Non-digital RPG
▪ Digital RPG
▪ Digital RPG with GM

Focus on Role-Playing Games (RPGs) – obvious target genre for character development

Potential problem: Need a measure for checking that players comprehend their characters (otherwise hypothesis cannot be disproven/proven)

Assumptions:

1. Laboratory conditions do not affect playstyle

2. Sample is representative of the population

3. Variables known and controlled

Character design

Recall: Requirements on logical and consistent approach

Approach:
▪ Used the EPAQ model as a foundation for defining personality/background elements
▪ Integrated characters via game story
▪ Used popular D20 system to define stats/rules-components
▪ All characters designed using same template

The EPAQ Model

Describes personality via adjectives/behaviors on a sliding scale

Agency-Communion form the core of the scale; both have positive and negative features
▪ E.g. degree of self-assuredness
▪ Degree of dependency on others

Unmitigated Communion and Unmitigated Agency are unhealthy extremes
▪ E.g. cannot be happy without others being happy
▪ Egoistical to the detriment of others

A character: Old-fashioned army lieutenant

C character: Compassionate reporter

UA character: Egomaniacal, cynical wartime cameraman

UC character: Self-sacrificing politician

MIX character: Conflicted mascot of a major soft-drink company

Obtaining player-character personality differences:

Obtained by comparing EPAQ point scores of characters and players

Across 4 components of EPAQ model (UA, UC, C, A)

Analyzed total of approx. 140 player-character pairs across 3 game setups
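A player-character personality difference of this kind can be computed per component and then summed. The sketch below assumes each profile is a set of point scores on the four components and uses absolute differences; the scores and the distance measure are assumptions for illustration, since the lecture gives neither the scale ranges nor the exact comparison used.

```python
COMPONENTS = ("A", "C", "UA", "UC")

def personality_difference(player_scores, character_scores):
    """Per-component absolute differences and their sum for one player-character pair."""
    per_component = {
        c: abs(player_scores[c] - character_scores[c]) for c in COMPONENTS
    }
    return per_component, sum(per_component.values())

# Hypothetical point scores; the real EPAQ scale ranges are not given here.
player = {"A": 24, "C": 30, "UA": 10, "UC": 14}
character = {"A": 30, "C": 18, "UA": 12, "UC": 14}

per_component, total = personality_difference(player, character)
print(per_component, total)
```

Repeating this for every pair gives one difference score per player-character pair, which can then be related to the experience measures.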

Measures:

FUN model [modified from Newman 2005]
▪ Multi-component measure of the quality of the gaming experience. Includes immersion/engagement

SYMPA (player-character engagement)

Experience (for each platform)

Group Dynamics (player-player)

Other questionnaires (various character-evaluation aspects)

Coding of transcribed verbal & chat communication

Game logs of chat and behavior

Recording (audio-visual) of in-game behavior

Performed an initial pilot experiment
▪ Used to verify experiment setup and procedure

Qualitative and quantitative methods used

Questionnaires, recordings, game logs, transcription and coding, interviews, focus groups …

Correlation, STDEV, multi-variate statistics, ANOVA, cluster methods, factor analysis

Data evaluation:
▪ Internal consistency of questionnaire constructs
▪ Variance across results
▪ Correlation of results
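Internal consistency of a questionnaire construct is commonly checked with Cronbach's alpha, and the relationship between two constructs with Pearson's correlation coefficient. The lecture does not say which statistics were used for these steps, so both choices are assumptions, and all scores below are hypothetical.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a questionnaire construct.

    item_scores: one list per questionnaire item, each holding one score per respondent.
    """
    k = len(item_scores)
    # Total score per respondent across all items.
    totals = [sum(respondent) for respondent in zip(*item_scores)]
    item_variance_sum = sum(variance(item) for item in item_scores)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical answers: three items of one construct, five respondents each.
construct_items = [
    [4, 5, 3, 4, 2],
    [4, 4, 3, 5, 2],
    [5, 5, 2, 4, 1],
]
alpha = cronbach_alpha(construct_items)

# Hypothetical per-respondent scores on two constructs.
construct_x = [3, 4, 2, 5, 1]
construct_y = [4, 4, 2, 5, 2]
r = pearson_r(construct_x, construct_y)

print(f"alpha = {alpha:.3f}, r = {r:.3f}")
```

An alpha well below the commonly used threshold of about 0.7 would suggest the items do not measure the same thing, in which case correlations involving that construct should be read with caution.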

Qualitative: Interviews etc.
▪ Formed a second avenue of information, acting as a qualifier on quantitative methods

No correlation between the personality of the character, that of the player and the FUN or SYMPA constructs

Indicates that (adult) players are not negatively impacted by playing characters with personalities different from (or similar to) their own.

No indication that a complex character ruins the gaming experience in any of the three formats investigated

Hypothesis has been disproven

Interviews show players comprehended their characters (qualitative methods good for this type of problem)

Game format impacts the way that players utilize characters
▪ Games activate/promote different behavior and character element use
▪ Players need to be prompted, or have the opportunity, to activate the elements of characters in order to engage with these elements.

Broad character activation increases engagement with the gameplaying activity

Other results:

Player-character relationship is a key influence on the gaming experience
▪ Strong correlation between SYMPA and FUN across formats

Experiment limitations:

Have looked at MPGs, not SPGs

Have looked at RPGs, not FPS or similar games

Not a comparative study!
• Conclusions do not say that characters with complex psychologies are better than characters without

Next iteration of experiments could ideally be a comparative study
