Semantic Parsing for Cancer Panomics Hoifung Poon 1

Semantic Parsing for Cancer Panomics - Yoav Artzi · 2020. 9. 17. · David Heckerman Tony Gitter Lucy Vanderwende Kristina Toutanova Chris Quirk Ankur Parikh. Precision Medicine

Semantic Parsing for

Cancer Panomics

Hoifung Poon

1

Overview

2

… ATTCGGATATTTAAGGC …

… ATTCGGGTATTTAAGCC …

……

……

Disease Genes

Drug Targets

…… KB

High-Throughput Data

Overview

3

… ATTCGGATATTTAAGGC …

… ATTCGGGTATTTAAGCC …

……

……

Disease Genes

Drug Targets

……KB

Infer cancer driver

mutations

High-Throughput Data

4

… ATTCGGATATTTAAGGC …

… ATTCGGGTATTTAAGCC …

……

……

Disease Genes

Drug Targets

…KB

Extract Pathways

from Pubmed

Overview

High-Throughput Data

Grounded

Unsupervised

Semantic Parsing

Collaborators

5

David Heckerman

Tony Gitter Lucy Vanderwende

Kristina Toutanova Chris Quirk

Ankur Parikh

Precision Medicine

7

Before Treatment 15 Weeks

Vemurafenib on BRAF-V600 Melanoma

Vemurafenib on BRAF-V600 Melanoma

8

Before Treatment 15 Weeks 23 Weeks

Traditional Biology

10

Targeted Experiments Discovery

One

hypothesis

Genomics

11

High-Throughput Experiments Discovery

… ATTCGGATATTTAAGGC …

… ATTCGGGTATTTAAGCC …

… ATTCGGATATTTAAGGC …

… ATTCGGGTATTTAAGCC …

… ATTCGGATATTTAAGGC …

… ATTCGGGTATTTAAGCC …

Many

hypotheses

?

… ATTCGGATATTTAAGGC …

… ATTCGGGTATTTAAGCC … Healthy

Disease (e.g., Alzheimer's, Cancer)

Genome-Wide Association Studies (GWAS)

2000

2010

“Genetic diagnosis of diseases would be

accomplished in 10 years and that

treatments would start to roll out perhaps

five years after that.”

“A Decade Later, Genetic Maps Yield Few New Cures”

New York Times, June 2010.

12

Key Challenges

Human genome: 3 billion base pairs

Potential variations: > 10 million mutations

Combination: > 10^1000000 (a 1 followed by 1 million zeros)

Machine learning problem

Atomic features: > 10 million

Feature combination: Too many to enumerate
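
A back-of-the-envelope check of the combination count, as a minimal sketch. It assumes each of the 10 million variants is simply present or absent, which is a simplification introduced here and not something the slide states. Under that assumption the number of combinations is 2^(10 million), comfortably above the slide's 10^1000000 lower bound.

```python
import math

def log10_subsets(num_variants: int) -> float:
    """Return log10(2**num_variants), the number of present/absent combinations."""
    return num_variants * math.log10(2)

# "> 10 million" on the slide; the exact figure is used here as a lower bound.
NUM_VARIANTS = 10_000_000

exponent = log10_subsets(NUM_VARIANTS)
print(f"2^{NUM_VARIANTS:,} is roughly 10^{exponent:,.0f}")
```

Working in log space avoids building the multi-million-digit integer, which is the same reason the feature combinations cannot be enumerated directly.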

13

Genomics

14

Discovery

… ATTCGGATATTTAAGGC …

… ATTCGGGTATTTAAGCC …

… ATTCGGATATTTAAGGC …

… ATTCGGGTATTTAAGCC …

… ATTCGGATATTTAAGGC …

… ATTCGGGTATTTAAGCC …

How to Scale Discovery?

High-Throughput Experiments

Cancer

Hundreds of mutations

Most are “passenger”, not driver

Can we identify likely drivers?

15

… ATTCGGATATTTAAGGC …

… ATTCGGGTATTTAAGCC … Normal cells

Tumor cells

Panomics

16

… ATTCGGATATTTAAGGC …

Genome Transcriptome Epigenome

……

Pathway Knowledge

Genes work synergistically in pathways

17

Why Hard to Identify Drivers?

Complex diseases → Synergistic perturbation of multiple pathways

Cancer: 6 → 8 “hallmarks”

Promote growth

Avoid suicide

Evade immune attack

Induce blood vessels

Invade neighboring tissues

18

19

Hanahan & Weinberg [Cell 2011]

Why Does Cancer Come Back?

Subtypes with alternative pathway profile

Compensatory pathways can be activated

20

EphA2 EphB2

Ovarian Cancer

Why Does Cancer Come Back?

Subtypes with alternative pathway profile

Compensatory pathways can be activated

21

EphA2 EphB2

Ovarian Cancer

X

A Grammar of Cancer?

Cancer → Anti-Apoptosis & ProGrowth & …

Anti-Apoptosis → Deactivate TP53

Anti-Apoptosis → Activate BCL-2

22

Infer Cancer Driver Mutations

23

Gene A: DNA → mRNA → Protein → Protein Active

Steps: Transcription, Translation, Activation

… ATTCGGATATTTAAGGC …

What’s the level of activity?

Is change caused by mutation?

24

Gene A: DNA → mRNA → Protein → Protein Active

Gene B: DNA → mRNA → Protein → Protein Active

Gene C: DNA → mRNA → Protein → Protein Active

Transcription Factor

Protein Kinase

Pathway Knowledge

25

Gene A: DNA → mRNA → Protein → Protein Active

Gene B: DNA → mRNA → Protein → Protein Active

Gene C: DNA → mRNA → Protein → Protein Active

Transcription Factor

Protein Kinase

Pathway Knowledge ?

27

Gene A: DNA → mRNA → Protein → Protein Active

Gene B: DNA → mRNA → Protein → Protein Active

Gene C: DNA → mRNA → Protein → Protein Active

Transcription Factor

Protein Kinase

Pathway Knowledge !

Approach: Graph HMM

28

Gene A: DNA → mRNA → Protein → Protein Active

Transcription Factor

Protein Kinase

Gene B: DNA → mRNA → Protein → Protein Active

Gene C: DNA → mRNA → Protein → Protein Active

Extract Pathways from Pubmed

29

… ATTCGGATATTTAAGGC …

… ATTCGGGTATTTAAGCC …

……

……

Disease Genes

Drug Targets

…… KB

High-Throughput Data

PubMed

22 million abstracts

Two new abstracts every minute

Adds 2000-4000 every day
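
The two growth figures can be cross-checked with a line of arithmetic. This is only a consistency check on the slide's own numbers, not an independent measurement of PubMed.

```python
ABSTRACTS_PER_MINUTE = 2  # figure quoted on the slide

# Convert the per-minute rate to a per-day rate.
per_day = ABSTRACTS_PER_MINUTE * 60 * 24
print(per_day)
```

Two abstracts per minute works out to 2,880 per day, which sits inside the quoted range of 2000-4000.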

30

… VDR+ binds to SMAD3 to form … (PMID: 123)

… JUN expression is induced by SMAD3/4 … (PMID: 456)

……

31

Extract Pathways from Pubmed

32

Involvement of p70(S6)-kinase activation in IL-10

up-regulation in human monocytes by gp41 envelope

protein of human immunodeficiency virus type 1 ...

[Figure: tree over the trigger words Involvement, up-regulation, and activation and the mentions IL-10, human monocyte, gp41, and p70(S6)-kinase]

Extract Complex Knowledge

33

Involvement of p70(S6)-kinase activation in IL-10

up-regulation in human monocytes by gp41 envelope

protein of human immunodeficiency virus type 1 ...

[Figure: tree with typed nodes. Involvement, up-regulation, and activation are REGULATION; IL-10, gp41, and p70(S6)-kinase are PROTEIN; human monocyte is CELL]

Extract Complex Knowledge

34

Involvement of p70(S6)-kinase activation in IL-10

up-regulation in human monocytes by gp41 envelope

protein of human immunodeficiency virus type 1 ...

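[Figure: event structure with typed nodes and labeled edges]

Involvement (REGULATION): Theme = up-regulation, Cause = activation

up-regulation (REGULATION): Theme = IL-10 (PROTEIN), Cause = gp41 (PROTEIN), Site = human monocyte (CELL)

activation (REGULATION): Theme = p70(S6)-kinase (PROTEIN)

Extract Complex Knowledge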

35

Involvement of p70(S6)-kinase activation in IL-10

up-regulation in human monocytes by gp41 envelope

protein of human immunodeficiency virus type 1 ...

[Figure: the full event structure for this sentence, with typed nodes (REGULATION, PROTEIN, CELL) and labeled edges (Theme, Cause, Site), presented as the output of semantic parsing]

Extract Complex Knowledge

Semantic Parsing

Bottleneck: Annotated Examples

GENIA (BioNLP Shared Task 2009-2013)

1999 abstracts

MeSH: human, blood cell, transcription factor

Can we breach the annotation bottleneck?

36

Free Lunch #1:

Distributional Similarity

Similar context → Probably similar meaning

Annotation as latent variables

Textual expression → Recursive clusters

Unsupervised semantic parsing
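
The "similar context, probably similar meaning" idea can be illustrated with a small co-occurrence sketch. This is not the unsupervised semantic parsing model itself: it uses plain context-count vectors and cosine similarity as a stand-in, and the four sentences are made-up toy data.

```python
from collections import Counter
import math

def context_vectors(sentences, window=2):
    """Map each token to a Counter of the tokens seen within `window` positions."""
    vectors = {}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, tok in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            context = tokens[lo:i] + tokens[i + 1:hi]
            vectors.setdefault(tok, Counter()).update(context)
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Toy sentences invented for illustration only.
SENTENCES = [
    "il-10 induces stat3 expression",
    "il-10 up-regulates stat3 expression",
    "il-10 induces jun expression",
    "patients received vemurafenib daily",
]

vecs = context_vectors(SENTENCES)
sim_related = cosine(vecs["induces"], vecs["up-regulates"])
sim_unrelated = cosine(vecs["induces"], vecs["received"])
print(round(sim_related, 3), sim_unrelated)
```

Here "induces" and "up-regulates" share most of their contexts and score high, while "induces" and "received" share none. Grouping expressions this way is the intuition behind treating clusters as latent annotations.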

37

Poon & Domingos, “Unsupervised Semantic Parsing”.

EMNLP-2009 (Best Paper Award).

Problem Formulation

Dependency tree → Semantic parse

Probability

Parsing

Learning

38

Prior: Favor fewer parameters

Free Lunch #2:

Existing KBs

Many KBs available

Gene/Protein: GenBank, UniProt, …

Pathways: NCI, Reactome, KEGG, BioCarta, …

Annotation as latent variables

Textual expression → Table, column, join, …

Grounded unsupervised semantic parsing

39

Poon, “Grounded Unsupervised Semantic Parsing”. ACL-13.

Natural-Language Interface

to Database

Get flight from Toronto to San Diego stopping at DTW

SELECT flight.flight_id
FROM flight, city, city city2, flight_stop, airport_service, airport_service as2
WHERE flight.from_airport = airport_service.airport_code
  AND flight.to_airport = as2.airport_code
  AND airport_service.city_code = city.city_code
  AND as2.city_code = city2.city_code
  AND city.city_name = 'toronto'
  AND city2.city_name = 'san diego'
  AND flight_stop.flight_id = flight.flight_id
  AND flight_stop.stop_airport = 'dtw'

Answers

40
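
To see the query produce answers, here is a minimal sketch that runs it against an in-memory SQLite database. The four tables contain only the columns the query mentions, and every row is invented toy data, not the real ATIS database. The second city alias is written city2 throughout so that the FROM and WHERE clauses agree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical miniature schema: only the columns the query touches.
conn.executescript("""
CREATE TABLE city (city_code TEXT, city_name TEXT);
CREATE TABLE airport_service (city_code TEXT, airport_code TEXT);
CREATE TABLE flight (flight_id INTEGER, from_airport TEXT, to_airport TEXT);
CREATE TABLE flight_stop (flight_id INTEGER, stop_airport TEXT);
INSERT INTO city VALUES ('c1', 'toronto'), ('c2', 'san diego');
INSERT INTO airport_service VALUES ('c1', 'yyz'), ('c2', 'san');
INSERT INTO flight VALUES (1, 'yyz', 'san'), (2, 'yyz', 'san');
INSERT INTO flight_stop VALUES (1, 'dtw'), (2, 'ord');
""")

QUERY = """
SELECT flight.flight_id
FROM flight, city, city city2, flight_stop, airport_service, airport_service as2
WHERE flight.from_airport = airport_service.airport_code
  AND flight.to_airport = as2.airport_code
  AND airport_service.city_code = city.city_code
  AND as2.city_code = city2.city_code
  AND city.city_name = 'toronto'
  AND city2.city_name = 'san diego'
  AND flight_stop.flight_id = flight.flight_id
  AND flight_stop.stop_airport = 'dtw'
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
```

Both toy flights go from Toronto to San Diego, but only flight 1 stops at DTW, so it is the only row returned.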

Clusters → KB Elements

Entity: Table, Column, Cell

Relation: Relational join

Priors:

Favor lexical similarity

Favor short relational joins

41

GUSP: Key Ideas

Leverage target database

42

Job ID Company System

001 IBM Unix

002 Roche IBM

003 Microsoft Windows

……

Prior: Favor Unix → System

Bootstrap learning

with lexical prior

JOB
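
One way to picture the lexical prior is as a lookup from a question word to the columns whose cells contain it. The sketch below is an illustration under that assumption, using the three JOB rows shown on the slide. The function name and the exact-match rule are hypothetical, not GUSP's actual prior.

```python
# The JOB table from the slide (the elided rows are not filled in).
JOB = {
    "columns": ["Job ID", "Company", "System"],
    "rows": [
        ("001", "IBM", "Unix"),
        ("002", "Roche", "IBM"),
        ("003", "Microsoft", "Windows"),
    ],
}

def lexical_candidates(word, table):
    """Return the columns that contain `word` as a cell value (case-insensitive)."""
    word = word.lower()
    hits = []
    for col_idx, column in enumerate(table["columns"]):
        if any(row[col_idx].lower() == word for row in table["rows"]):
            hits.append(column)
    return hits

print(lexical_candidates("Unix", JOB))
print(lexical_candidates("IBM", JOB))
print(lexical_candidates("Linux", JOB))
```

"Unix" matches only the System column, which is the Unix → System preference on the slide. "IBM" matches both Company and System, so a lexical prior alone cannot settle it, which is presumably why it serves only to bootstrap learning.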

GUSP: Key Ideas

Leverage target database

43

Flight ID From Airport ……

Flight

Airport Code Airport Name ……

Airport

Foreign Key

GUSP: Key Ideas

Leverage target database

44

Flight Airport

GUSP: Key Ideas

Leverage target database

45

Flight

Days Fare Airline

Airport

GUSP: Key Ideas

Leverage target database

46

Flight Airport

flight BWI

Days Fare Airline

?

Flight

Days Fare Airline

Airport

GUSP: Key Ideas

Leverage target database

47

Prior: Favor shorter join

Leverage schema

to guide learning

Flight

Days Fare Airline

Airport

flight BWI
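
The "favor shorter join" prior can be pictured as a shortest-path search over the schema graph. The sketch below uses breadth-first search. The table names come from the slide, but the edges between them are hypothetical foreign-key links chosen for illustration, and BFS is a stand-in for however GUSP actually scores join paths.

```python
from collections import deque

# Hypothetical foreign-key links between the tables named on the slide.
SCHEMA_EDGES = {
    "flight": ["airport", "days", "fare", "airline"],
    "airport": ["flight", "fare"],
    "fare": ["flight", "airport", "airline"],
    "airline": ["flight", "fare"],
    "days": ["flight"],
}

def shortest_join(start, goal, edges):
    """Breadth-first search for the join path with the fewest tables."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in edges.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no join path connects the two tables

print(shortest_join("flight", "airport", SCHEMA_EDGES))
print(shortest_join("days", "airline", SCHEMA_EDGES))
```

For "flight BWI", the direct flight-airport join is preferred over a detour such as flight, fare, airport, because BFS returns the path with the fewest hops.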

Free Lunch #3:

Dependency Parses

Start from syntactic parse

Rich resources and available parsers

Intractable structure learning → Tree HMM

Exact inference is linear-time

Need to handle syntax-semantics mismatch
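
Why exact inference is linear-time on a tree can be seen from a max-product (Viterbi-style) pass: each node is processed once, bottom-up, and then the best states are read off top-down. The sketch below is a generic tree Viterbi, not GUSP's implementation. The state names, the three-edge tree, and all log scores are made up for illustration.

```python
def best_tree_labeling(children, root, states, emit, trans):
    """Max-product on a tree; returns (best log score, {node: state}).

    emit(node, state) and trans(parent_state, child_state) return log scores.
    Every node is visited once, so the cost is linear in the number of nodes
    (times len(states) ** 2 for the parent/child state pairs).
    """
    best = {}    # (node, state) -> best log score of the subtree rooted at node
    choice = {}  # (child, parent_state) -> best child state given the parent

    def up(node):
        for child in children.get(node, []):
            up(child)
        for s in states:
            score = emit(node, s)
            for child in children.get(node, []):
                cand = max(states, key=lambda c: trans(s, c) + best[(child, c)])
                choice[(child, s)] = cand
                score += trans(s, cand) + best[(child, cand)]
            best[(node, s)] = score

    def down(node, state, labels):
        labels[node] = state
        for child in children.get(node, []):
            down(child, choice[(child, state)], labels)

    up(root)
    root_state = max(states, key=lambda s: best[(root, s)])
    labels = {}
    down(root, root_state, labels)
    return best[(root, root_state)], labels

# Toy fragment of a dependency tree; states and scores are hypothetical.
CHILDREN = {"get": ["flight"], "flight": ["toronto", "dtw"]}
STATES = ["NULL", "E:flight", "V:city.name", "V:airport.code"]
GOOD = {("get", "NULL"), ("flight", "E:flight"),
        ("toronto", "V:city.name"), ("dtw", "V:airport.code")}

def emit(node, state):
    return 0.0 if (node, state) in GOOD else -5.0

def trans(parent_state, child_state):
    return -1.0 if parent_state == child_state else 0.0

score, labels = best_tree_labeling(CHILDREN, "get", STATES, emit, trans)
print(score, labels)
```

The bottom-up pass stores one score per (node, state) pair, so the work grows with the number of tree nodes and not with the number of possible labelings.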

48

Syntax-Semantics Mismatch

49

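[Figure: dependency tree of "Get flight from Toronto to San Diego stopping at DTW", with nodes get, flight, from, toronto, to, san, diego, stopping, at, dtw]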

Introduce Complex States

Raising

Sinking

Implicit

53

Raising

54

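[Figure: dependency tree of "Get flight from Toronto to San Diego stopping at DTW", with nodes get, flight, from, toronto, to, san, diego, stopping, at, dtw]

E:flight

E:flight:R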

Sinking

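[Figure: dependency tree of "Get flight from Toronto to San Diego stopping at DTW", with nodes get, flight, from, toronto, to, san, diego, stopping, at, dtw]

55

E:flight:R

V:city.name + E:flight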

Implicit

56

Give me the fare (of the flight) from Seattle to Boston

fare

E:fare

fare

E:fare + E:flight

Experiment: Dataset

ATIS

Questions and ATIS database

Dev. / Test: Follow ZC07 [Zettlemoyer & Collins 2007]

Gold SQLs: Use at evaluation only

Gold logical forms in ZC07: Not used

Evaluate on question-answering accuracy

57

Experiment: Systems

LEXICAL: Lexical-trigger prior only

Supervised learning

ZC07: Zettlemoyer & Collins [2007]

FUBL: Kwiatkowski et al. [2011]

GUSP-SIMPLE: Simple states only

GUSP++: All states

58

Results

59

System Accuracy

ZC07 84.6

FUBL 82.8

GUSP++ 83.5

Ablation

60

System Variant Accuracy

LEXICAL 33.9

GUSP-SIMPLE 66.5

GUSP++ 83.5

GUSP++ without Raising 75.7

GUSP++ without Sinking 77.5

GUSP++ without Implicit 76.2

Pathway Extraction

More to leverage from KB:

Semantic relations in KB likely occur in

semantic parse of some sentence

Priors:

Favor a parse w. relations in KB

Penalize a parse w. relations not in KB
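
The two KB priors can be sketched as a simple additive score over the relations in a candidate parse. This is an illustration only: the toy KB reuses the gene names from the PubMed snippets earlier in the deck, and the reward and penalty weights, the pair representation, and the function name are all hypothetical.

```python
# Hypothetical toy KB of (regulator, target) pairs.
KB_RELATIONS = {("SMAD3", "JUN"), ("VDR", "SMAD3")}

def kb_prior(parse_relations, kb, reward=1.0, penalty=1.0):
    """Add `reward` for each parse relation found in the KB, subtract `penalty` otherwise."""
    score = 0.0
    for relation in parse_relations:
        score += reward if relation in kb else -penalty
    return score

parse_a = [("SMAD3", "JUN")]   # relation already in the KB
parse_b = [("SMAD3", "TP53")]  # relation not in the KB

print(kb_prior(parse_a, KB_RELATIONS), kb_prior(parse_b, KB_RELATIONS))
```

When two parses of the same sentence are otherwise equally likely, this score tips the choice toward the one whose relations the KB already supports.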

61

Distant-Supervision

Existing work: Binary relation, classification

Mintz et al. [2009]

Riedel et al. [2010]

Hoffmann et al. [2011]

Krishnamurthy & Mitchell [2012]

Etc.

Our approach: Generalize distant supervision

to semantic parsing

62

Parikh, Poon, Toutanova. In progress.

http://literome.azurewebsites.net

63

Literome

Poon et al., “Literome: PubMed-Scale Genomic Knowledge

Base in the Cloud”, Bioinformatics 2014.

PubMed-Scale Extraction

Preliminary pass:

2 million instances

13,000 genes, 870,000 unique interactions

Applications:

UCSC Genome Browser, MSR Interactions Track

Cancer expression profile modeling

Validate de novo pathway prediction

Etc.

64

Big Mechanism

$42-million program for 12 teams

Reading, Assembly, Explanation

Domain: Cancer signaling pathways

We are funded

PI: Andrey Rzhetsky

Co-PI w. James Evans, Ross King

65

We Have Digitized Life

66

Next: Digitize Medicine

67

Knock down genes A, B, C → Cure

Summary

Precision medicine is the future

Infer cancer driver mutations

Graphical model: Pathways + Panomics data

Extract pathways from Pubmed

Semantic parsing grounded in KBs

Literome: KB for genomic medicine

68

Summary

69

… ATTCGGATATTTAAGGC …

… ATTCGGGTATTTAAGCC …

……

……

Disease Genes

Drug Targets

…… KB

High-Throughput Data