Benjamin Good*, Salvatore Loguercio, Max Nanis, Andrew Su The Scripps Research Institute http://genegames.org/cure/ Rocky 2013 THE CURE: A GAME WITH THE PURPOSE OF GENE SELECTION FOR BREAST CANCER SURVIVAL PREDICTION

The Cure: A Game with the Purpose of Gene Selection for Breast Cancer Survival Prediction

DESCRIPTION

Keynote presentation for the Rocky Bioinformatics conference 2013. It is about The Cure: http://genegames.org/cure/


Page 1: The Cure: A Game with the Purpose of Gene Selection for Breast Cancer Survival Prediction

Benjamin Good*, Salvatore Loguercio, Max Nanis, Andrew Su

The Scripps Research Institute

http://genegames.org/cure/

Rocky 2013

THE CURE: A GAME WITH THE PURPOSE OF GENE SELECTION FOR BREAST CANCER SURVIVAL PREDICTION

Page 2

A QUESTION

How would you get 150 PhD-level scientists to work together on the same problem?

Without any money?

Page 3

TRAIL MAP

Games Survival Prediction

The Cure

Page 4

WHY GAMES?

It is estimated that 9 billion hours are spent playing Solitaire every year

Luis von Ahn. Google Tech Talk: Human Computation. 2006. (Shortly after receiving a $500,000 ‘Genius Grant’ for this work.)

Page 5

Seven million hours of human labor

Empire State Building

ONE YEAR SOLITAIRE = 1,285 EMPIRE STATE BUILDINGS

Page 6

McGonigal J. Reality Is Broken: Why Games Make Us Better and How They Can Change the World. New York: Penguin Press; 2011.

What if we could use a tiny fraction of that human effort to achieve another purpose?

[Bar chart, hours of human effort: Empire State Building, 7M; one year of Solitaire, 9B; one year of games, 150B]

150 billion hours gaming each year

Page 7

PURPOSES

Label all images on the Web

Find objects inside images

Teach computers English

Tag songs

Rate image quality

Computer science

Build ontologies

Tag Malaria parasites in blood smears

Map connections between neurons

Align DNA and protein sequences

Assemble genomes

Design RNA molecules

Figure out how proteins fold

Biology

Link genes with diseases

Develop better treatments for breast cancer

Page 8

GAMES WITH A PURPOSE

The Cure

MOLT

Page 9

TRAIL MAP

Games Survival Prediction

The Cure

Page 10

INFERRING SURVIVAL PREDICTORS

find patterns

make predictions on new samples

10 year survival? Yes / No

van't Veer, Laura J., et al. "Gene expression profiling predicts clinical outcome of breast cancer." Nature 415.6871 (2002): 530-536.

Page 11

INFERRING SURVIVAL PREDICTORS

find patterns

make predictions

1) select genes

2) infer predictor from data (e.g. decision tree, SVM, etc.)

Out of the 25,000+ genes, which small set works together the best?

10 year survival? Yes / No
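A rough sense of why the gene selection step cannot be solved by trying every combination: the sketch below counts the candidate gene sets. The subset size of 5 is an assumption chosen only for illustration; the slides do not fix a set size.

```python
import math

N_GENES = 25_000   # from the slide: "25,000+ genes"
SUBSET_SIZE = 5    # assumption: illustrative size of a "small set"

# Number of distinct gene sets of that size a brute-force search would visit.
n_subsets = math.comb(N_GENES, SUBSET_SIZE)
print(f"{n_subsets:.2e} candidate gene sets of size {SUBSET_SIZE}")
```

Each candidate would still need a predictor trained and evaluated on it, which is why heuristics, prior knowledge, or human judgment are used to narrow the search.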

Page 12

PROBLEM: GENE SELECTION INSTABILITY

instability: different methods, different datasets produce different gene sets for the same phenotype [1]

[1] Griffith, Obi L., et al. "A robust prognostic signature for hormone-positive node-negative breast cancer." Genome Medicine 5.10 (2013).
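One common way to quantify this kind of instability is the Jaccard index between gene sets produced by different methods or datasets. The sketch below uses placeholder gene names (hypothetical, not taken from any published signature).

```python
def jaccard(set_a, set_b):
    """Jaccard index: shared genes divided by all genes in either set."""
    a, b = set(set_a), set(set_b)
    if not a and not b:
        return 1.0  # two empty sets are treated as identical
    return len(a & b) / len(a | b)

# Hypothetical signatures from two methods for the same phenotype.
signature_method_1 = {"GENE_A", "GENE_B", "GENE_C"}
signature_method_2 = {"GENE_B", "GENE_C", "GENE_D"}

overlap = jaccard(signature_method_1, signature_method_2)
print(f"Jaccard overlap: {overlap:.2f}")
```

A value near 0 means the two methods picked almost entirely different genes; a value of 1 means identical sets.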

Page 13

PROBLEM: THE VALIDATION GAP

training data, test data

validation

validation: predictive signatures often perform worse on independent data created for validation.

Photograph by Richard Hallman, National Geographic Adventure Blog

Page 14

find patterns

make predictions

ADDING PRIOR KNOWLEDGE TO THE DISCOVERY ALGORITHM

<10 yr survival

>10 yr survival

Page 15

EX.) NETWORK GUIDED FORESTS

Use network to find good gene combinations

Dutkowski & Ideker (2011) Protein Networks as Logic Functions in Development and Cancer. PLoS Computational Biology

Page 16

BUT MOST KNOWLEDGE IS NOT STRUCTURED

[Chart: number of articles added to PubMed per year, 2000 to 2012; y-axis from 500,000 to 1,000,000]

112 publications/hour (37 more by the end of this talk)
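The two figures above can be sanity-checked with simple arithmetic. This sketch assumes the rate is constant around the clock, which the slide does not state.

```python
PUBS_PER_HOUR = 112        # from the slide
PUBS_BEFORE_TALK_ENDS = 37 # from the slide

# Annual total implied by the hourly rate (assumption: constant rate, 365 days).
pubs_per_year = PUBS_PER_HOUR * 24 * 365

# Talk time remaining implied by "37 more by the end of this talk".
minutes_remaining = PUBS_BEFORE_TALK_ENDS / PUBS_PER_HOUR * 60

print(f"{pubs_per_year:,} publications per year")
print(f"about {minutes_remaining:.0f} minutes of talk remaining")
```

The implied annual total falls within the range of the chart's y-axis.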

>160,000 publications linked to “breast cancer” since 2000 http://tinyurl.com/brsince2000

Page 17

HOW CAN WE USE UNSTRUCTURED KNOWLEDGE FOR GENE SELECTION?

Need an intelligent system that is good at reading and hypothesizing

Like you

Page 18

TRAIL MAP

Games Survival Prediction

The Cure

Page 19

THE CURE: http://genegames.org/cure/

Page 20

education level?

cancer knowledge?

biologist?

Page 21
Page 22

PLAY = GENE SELECTION

Alternate turns picking a gene from a “board” of 25

Your hand

Opponent's hand
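The turn structure can be sketched as follows. The hand size of 5 and the random picks are assumptions for illustration only; in the real game the picks come from the players, and the slide states only that turns alternate on a board of 25.

```python
import random

def play_board(board, hand_size=5, seed=0):
    """Alternate turns drawing genes from a shared board.

    hand_size and the random choice are hypothetical stand-ins
    for human decisions.
    """
    rng = random.Random(seed)
    remaining = list(board)
    hands = {"you": [], "opponent": []}
    order = ["you", "opponent"]
    turn = 0
    while any(len(hand) < hand_size for hand in hands.values()):
        player = order[turn % 2]
        pick = rng.choice(remaining)
        remaining.remove(pick)  # a gene can be held by only one player
        hands[player].append(pick)
        turn += 1
    return hands

board = [f"gene_{i}" for i in range(25)]  # placeholder gene names
hands = play_board(board)
print(hands["you"], hands["opponent"])
```

Because a picked gene leaves the board, the two hands never share a gene.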

Page 23

SCORING

Cure Server

Score reflects accuracy of decision tree created with just the selected genes on real training data
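A minimal sketch of the scoring idea, under loud assumptions: the slide says only "decision tree", so a one-split tree (a decision stump) stands in for whatever learner the Cure server runs, and the expression values and gene names are toy data.

```python
from typing import Dict, List, Sequence

def stump_accuracy(values: Sequence[float], labels: Sequence[int]) -> float:
    """Best training accuracy of a single rule 'expression > t' on one gene."""
    n = len(labels)
    best = 0.0
    for t in sorted(set(values)):
        hits = sum((v > t) == bool(y) for v, y in zip(values, labels))
        # The rule may point either way (high expression = survival, or the reverse).
        best = max(best, hits / n, (n - hits) / n)
    return best

def score_hand(expr: Dict[str, List[float]], labels: List[int], hand: List[str]) -> float:
    """Score a hand as the best stump accuracy over the selected genes only."""
    return max(stump_accuracy(expr[gene], labels) for gene in hand)

# Toy training data (hypothetical): 4 samples, label 1 = survived 10 years.
expr = {
    "GENE_A": [1.0, 1.2, 3.1, 3.3],
    "GENE_B": [2.0, 2.1, 2.0, 2.2],
}
labels = [0, 0, 1, 1]

print(score_hand(expr, labels, ["GENE_B"]))
print(score_hand(expr, labels, ["GENE_A", "GENE_B"]))
```

A hand that contains an informative gene scores higher than one without it, which is the feedback the game gives players.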

Page 24

PLAY WITH KNOWLEDGE: GENE ONTOLOGY

Page 25

PLAY WITH KNOWLEDGE: GENE RIFS

Page 26

YOU WIN!

Page 27
Page 28

COMMUNITY BOARD VIEW, CHOOSE OPEN BOARD

You beat this one

The community finished this board (e.g. 11 different players completed it)

This board is still open

Page 29

BOARDS

• 25 genes each

• randomly selected from 1,250 genes that passed an unsupervised filter for minimum expression level and variance for a particular dataset [1],[2]

• 4 different 100-board rounds completed, each with some overlap

• 3,731 distinct genes used in the game

[1] Curtis, Christina, et al. "The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups." Nature (2012)

[2] Griffith, Obi L., et al. "A robust prognostic signature for hormone-positive node-negative breast cancer." Genome Medicine (2013)
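The board construction described above can be sketched as an unsupervised filter followed by random sampling. The thresholds, gene names, and the choice to sample each board independently are assumptions; the slides give only the board size, the pool size, and the number of boards per round.

```python
import random
from statistics import mean, pvariance

def passes_filter(values, min_mean, min_var):
    """Unsupervised filter: keep a gene only if it is expressed and variable enough.

    min_mean and min_var are hypothetical thresholds.
    """
    return mean(values) >= min_mean and pvariance(values) >= min_var

def make_boards(genes, n_boards=100, board_size=25, seed=0):
    """Draw each board independently, so genes may recur across boards."""
    rng = random.Random(seed)
    return [rng.sample(genes, board_size) for _ in range(n_boards)]

# Placeholder pool standing in for the 1,250 genes that passed the filter.
pool = [f"gene_{i}" for i in range(1250)]
boards = make_boards(pool)

print(len(boards), "boards of", len(boards[0]), "genes")
print(passes_filter([5.0, 9.0, 7.0], min_mean=6.0, min_var=1.0))
```

With 100 boards of 25 genes drawn from a pool of 1,250, some genes necessarily appear on more than one board, which matches the note that occurrences are counted across boards.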

Page 30

PLAYERS

[Chart: new player registrations per month, Sep-12 to Aug-13; y-axis 0 to 250; stacked by degree: PhD, MD, MSc, BA, none, did not state, other]

[Chart: %PhD per month, Sep-12 to Aug-13; y-axis 0 to 0.4]

http://io9.com/these-cool-games-let-you-do-real-life-science-486173006

1,077 Players registered (one year)

Sage DREAM7 challenge, game announcement

Page 31

PLAYER DEMOGRAPHICS

[Bar chart: Cancer knowledge? Categories: no, ns, yes; y-axis 0 to 700]

[Bar chart: Are you a Biologist? Categories: no, ns, yes; y-axis 0 to 800]

[Bar chart: Most recent degree. Labels as extracted: graduate_degree, undergraduate, none, bachelors, masters, md, none, ns, other, phd; y-axis 0 to 350]

Page 32

GAMES PLAYED

• 9,904 games (non-training)

[Chart: total games played per player; x-axis player, 0 to 800; y-axis total games played, log scale 1 to 1000; top player labeled PhD]

[Bar chart: games played, top 20 players; y-axis 0 to 800; bars labeled by degree: PhD, MD, MS]

Page 33

GENE RANKINGS FROM GAMES

find patterns

make predictions

<10 yr survival

>10 yr survival

Page 34

GENE RANKINGS FROM GAMES

• For each gene:

1. O = number of times it appeared in a game (some genes occur on multiple boards, all boards are played multiple times, all occurrences are counted)

2. S = number of times it was selected by a player

3. F = S/O

• Games can be filtered based on player data

• We can estimate an empirical P value for each value of O, S

• P reflects the chances of getting S or more by chance given O

Examples (all games):

• B-cell lymphoma 2 gene:

O = 13, S = 10, F = 10/13 = 0.77, P < 0.0001

• Alanine and arginine rich domain containing protein:

O = 33, S = 3, F = 3/33 = 0.09, P = 0.91
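The ranking statistics can be sketched as below. The slides describe an empirical P value; this sketch swaps in a closed-form binomial tail instead, with an assumed per-appearance pick probability `p_pick` of 0.2 (hypothetical, not stated in the slides), so it will not reproduce the slide's P values exactly.

```python
from math import comb

def selection_frequency(o: int, s: int) -> float:
    """F = S / O: fraction of appearances in which the gene was selected."""
    return s / o

def binomial_tail(o: int, s: int, p_pick: float) -> float:
    """P(X >= s) for X ~ Binomial(o, p_pick).

    Stand-in for the empirical P value; p_pick is an assumed chance
    that a random player picks the gene each time it appears.
    """
    return sum(
        comb(o, k) * p_pick**k * (1 - p_pick) ** (o - k)
        for k in range(s, o + 1)
    )

P_PICK = 0.2  # assumption for illustration

# The two examples from the slide: (O, S)
for name, o, s in [("B-cell lymphoma 2", 13, 10), ("AARD", 33, 3)]:
    f = selection_frequency(o, s)
    p = binomial_tail(o, s, P_PICK)
    print(f"{name}: F = {f:.2f}, binomial tail P = {p:.2g}")
```

The qualitative picture matches the slide: a gene picked in 10 of 13 appearances is very unlikely under random play, while one picked in 3 of 33 is not.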

Page 35

GENES SELECTED BY ALL PLAYERS

9,904 GAMES, P<0.001, 60 GENES

Top 10 enriched disease annotations n genes

adj. P < 2.43e-06

background = 3,731 genes used in any game

Top 10 genes

Wang, Jing, et al. "WEB-based GEne SeT AnaLysis Toolkit (WebGestalt): update 2013." Nucleic acids research (2013).

Page 36

GENES SELECTED BY PEOPLE WITH PHDS AND KNOWLEDGE OF CANCER

2,373 GAMES, P<0.001, 82 GENES

Top 10 genes

Top 10 enriched disease annotations n genes

adj. P < 5.76e-08

“Expert Gene Set”

Page 37

GENES SELECTED BY PEOPLE: WITHOUT PHDS, WITH NO KNOWLEDGE OF CANCER, THAT ARE NOT BIOLOGISTS

3,607 GAMES, P<0.001, 10 GENES

• Gene set not significantly enriched with any disease annotations

Top 10 genes

Page 38

SELF REPORTING SEEMED TO WORK...

Page 39

EVEN WITHOUT FILTERING, THE DATA CONTAINS THE KNOWLEDGE

• “All Players” still contained significant cancer signal.

Page 40

PROBLEM: GENE SELECTION INSTABILITY

instability: different methods, different datasets produce different gene sets for the same phenotype [1]

[1] Griffith, Obi L., et al. "A robust prognostic signature for hormone-positive node-negative breast cancer." Genome Medicine 5.10 (2013).

Page 41

GENE SET OVERLAPS, SOME BUT NOT MUCH

http://bioinformatics.psb.ugent.be/webtools/Venn/

“Expert Gene Set”

Page 42

PROBLEM: THE VALIDATION GAP

training data, test data

validation

validation: predictive signatures often perform worse on independent data created for validation.

Photograph by Richard Hallman, National Geographic Adventure Blog

Page 43

CLASSIFIER PERFORMANCE WITH DIFFERENT GENE GROUPS, DIFFERENT DATASETS

X-axis: test set performance, Griffith 2013 data

Y-axis: test set performance, Metabric training, Oslo test

The only difference between points is the genes used to build the SVM classifier

10 year survival: Yes / No

“Expert Gene Set”

Page 44

SUMMARY

Plusses

• 1 year

• 1,000 players, 150 PhDs

• 10,000 games

• “expert knowledge” captured through an open game

• New gene ranking method with results competitive with established approaches

• Game is now in use in an undergraduate class

Minuses

• Did not make a significantly better breast cancer survival predictor

• Game could have been better in many ways

• no beginning, middle or end

• random guessing can win

• easy to cheat

Page 45

NEXT STEPS

• More fun

• More learning for novices

• More control for experts

• More data

Page 46

THE END

More information at: http://genegames.org/cure/ [email protected] @bgood

Thanks to:

Players!!!!

Andrew Su

Salvatore Loguercio

Max Nanis

Karthik Gangavarapu

We are hiring! Looking for postdocs, programmers interested in crowdsourcing and bioinformatics. Contact: [email protected]

Page 47

GAMES WITH A PURPOSE of collecting expert-level knowledge

The Cure

MOLT

Loguercio, Salvatore, et al. "Dizeez: an online game for human gene-disease annotation." PLoS One (2013)

Khatib, Firas, et al. "Algorithm discovery by protein folding game players." Proceedings of the National Academy of Sciences (2011)

Page 48

HUMAN GUIDED FOREST (HGF)

http://i9606.blogspot.com/2012/04/human-guided-forests-hgf.html

Let CURE players build decision modules

Page 49

WHY DID YOU SIGN UP? (83 RESPONSES)

[Bar chart: Why did you sign up for The Cure? (select all that apply). Options: To help breast cancer research; To learn something; To have fun playing a game. Y-axis: 0.0% to 90.0%]

Page 50

WAS THE GAME FUN?

[Bar chart. Options: Yes, it was very fun; A little bit entertaining; No, not at all. Y-axis: percent, 0 to 0.8]

Page 51

DO YOU KNOW ANYONE THAT HAS OR HAD BREAST CANCER?

Have you known or do you currently know anyone that has or has had breast cancer?

[Chart. Options: Yes; No]

Page 52

DID YOU LEARN ANYTHING FROM PLAYING?

[Bar chart. Options: Yes, I felt like I learned a lot; Yes, I learned a little bit; No, I did not learn anything. Y-axis: 0 to 60]

Page 53

MY KNOWLEDGE OF BREAST CANCER IS:

[Bar chart; y-axis 0 to 0.6. Answer options:]

• I am an expert in breast cancer

• I have helped conduct cancer research as part of my job

• I know some biology and have some understanding of what cancer is

• I know a little biology, but nothing specific to cancer

• Nothing, I do not know a thing about it

Page 54

AGE?

Which category below includes your age?

[Chart. Categories: 17 or younger; 18-20; 21-29; 30-39; 40-49; 50-59; 60 and above]

Page 55

GENDER?

What is your gender?

[Chart. Categories: Female; Male]

Page 56

TRAINING LEVELS

Page 57

the decision tree created using the feature “makes milk” is 100% correct on training data, you win!

Page 58

TRAINING INTERFACE

Choose the feature that best distinguishes mammals from other creatures

Page 59

TRAINING INTERFACE

the decision tree created using the feature “has hair” is 94% correct on training data, you win!

Page 60

OVERLAP OF SIGNIFICANT GENE SETS FROM DIFFERENT CURE GAME FILTERS

No Expertise (3,607 games)

PhD & Cancer Knowledge (2,373 games)

Biologist (4,913 games)

PhD or MD (3,070 games)

Cancer Knowledge (4,660 games)

Page 61

MOST RANDOM GENE EXPRESSION SIGNATURES ARE SIGNIFICANTLY ASSOCIATED WITH BREAST CANCER OUTCOME

Venet et al. (2011). PLoS Comp. Bio.

Still need to pick gene sets

Feature selection challenge still relevant

Very useful grain of salt in interpreting these results.