
Page 1: Text-to-Picture Synthesis (pages.cs.wisc.edu/~jerryzhu/pub/ttpCMU08.pdf)

Text-to-Picture Synthesis

Xiaojin Zhu

Department of Computer Sciences, University of Wisconsin–Madison

Joint work with Chuck Dyer, Andrew Goldberg, Mohamed Eldawy, Bradley Strock, Lijie Heng, Art Glenberg

Page 2:

Outline

prior work

our first system

our second system

one application

Page 3:

Humans switch modalities

Page 4:

Computers switch modalities, too

Page 5:

Text-to-Picture (TTP) synthesis

Convert general natural language text into meaningful pictures.

The girl rides the bus to school in the morning.

Page 6:

Applications of Text-to-Picture

Literacy development: young children, second-language learners

Assistive devices: people with learning disabilities

Universal language

Document summarization

Image authoring tool

Page 7:

Three qualities of a Text-to-Picture system

Page 8:

Prior work 1: “Writing with Symbols”

Rebus symbols (www.widgit.com)

Writing with Symbols (www.mayer-johnson.com)

Page 9:

Prior work 2: CarSim

A bus accident in southern Afghanistan last Thursday claimed 20 victims. Additionally, 39 people were injured in the accident, which occurred early Thursday morning twenty kilometers north of the city Kandahar. The bus was on its way from Kandahar towards the capital Kabul when it left the road while overtaking and overturned, said general Salim Khan, assistant head of police in Kandahar. The state of some of the injured was said to be critical.

[Johansson, Berglund, Danielsson and Nugues. IJCAI 2005]

Page 10:

Prior work 3: WordsEye

The lawn mower is 5 feet tall. John pushes the lawn mower. The cat is 5 feet behind John. The cat is 10 feet tall.

[Coyne and Sproat. SIGGRAPH 2001] (www.wordseye.com)

Page 11:

Our TTP system

Our TTP system should handle general text automatically.

Page 12:

Approaches to Text-to-Picture

1 “Canned” pictures

2 Model-based

3 Concatenative (our system)

First the farmer gives hay to the goat. Then the farmer gets milk from the cow.

Page 13:

Components of our TTP systems

1 Keyphrase selection

2 Image selection

3 Layout

4 Evaluation

Page 14:

Our first TTP system

Page 15:

Step 1: Keyphrase selection

Problem: decide which words to draw.

TextRank keyword summarization [Mihalcea and Tarau 2004].

Graph on nouns, proper nouns, adjectives.

Edges for word co-occurrence.

Random walk on the graph.

Stationary distribution (PageRank) as word importance.

Important difference: word picturability is used as the teleporting probability (see the sketch below).
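As a concrete illustration of the teleporting idea, here is a minimal sketch (not the authors' code) of a picturability-biased PageRank over a word co-occurrence graph, implemented with plain NumPy power iteration; the co-occurrence matrix and per-word picturability scores are assumed to be given.

```python
import numpy as np

def picturability_pagerank(cooccurrence, picturability, damping=0.85,
                           tol=1e-8, max_iter=200):
    """Rank candidate words by a random walk on their co-occurrence graph,
    teleporting in proportion to each word's picturability (a TextRank
    variant sketched for illustration, not the system's actual code).

    cooccurrence: (n, n) array of co-occurrence counts between words
    picturability: length-n array of Pr(picturable) per word
    """
    n = cooccurrence.shape[0]
    deg = cooccurrence.sum(axis=1)
    P = np.full((n, n), 1.0 / n)            # words with no edges jump uniformly
    nz = deg > 0
    P[nz] = cooccurrence[nz] / deg[nz, None]
    teleport = picturability / picturability.sum()
    r = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        r_new = damping * (r @ P) + (1 - damping) * teleport
        if np.abs(r_new - r).sum() < tol:
            break
        r = r_new
    return r                                 # stationary distribution = word importance
```

Keyphrases are then the top-ranked words, so picturable words are favored even when a plain co-occurrence PageRank would rank an abstract word higher.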

Page 16:

Word picturability

We want to select picturable words.

[Scatter plot: Google web page hits (x-axis, roughly 10^7 to 10^9) vs. the ratio (image hits) / (web page hits) (y-axis, 0 to 0.02). Plotted words include abstract terms such as comparison, conjunction, curiosity, deceit, dignity, interpretation, regrets, relaxation, topic, and trust, and concrete terms such as airplane, nebula, cat, heel, lemon, nail, pants, president, teacher, and tunnel.]

Page 17:

Word picturability model

Labels provided by five human annotators, e.g.:
  0 1 0 0 0   writ
  1 1 1 1 1   yolks
  1 1 1 1 1   zebras
  1 0 1 0 1   zigzag

253 candidate features from Google, Yahoo!, Flickr search

Best feature: x = log(Google image hits / Google web page hits)

Logistic regression with forward feature selection:

Pr(picturable | x) = 1 / (1 + exp(−2.78x − 15.4))
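The fitted model above is simple enough to apply directly; the sketch below just evaluates it, assuming natural log (the slide does not state the base) and raw hit counts as inputs.

```python
import math

def picturability_probability(image_hits, web_hits):
    """Pr(picturable | x) = 1 / (1 + exp(-2.78*x - 15.4)),
    with x = log(image hits / web page hits), as reported on the slide.
    Natural log is assumed here; both counts must be positive."""
    x = math.log(image_hits / web_hits)
    return 1.0 / (1.0 + math.exp(-2.78 * x - 15.4))
```

As an illustrative example, a word with 8 million image hits and 1.2 billion web page hits scores roughly 0.8, i.e., likely picturable.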

Page 18:

Step 2: Image selection

Problem: find the best image for a word.

Collect top image search results

Segmentation

Cluster image segments by color

Select the image containing the segment at the center of the largestcluster.
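A rough sketch of this selection step, under the assumptions that each candidate image has already been segmented, that each segment is summarized by its mean RGB color, and that k-means (here via scikit-learn) stands in for whatever clustering the system actually uses.

```python
import numpy as np
from sklearn.cluster import KMeans

def select_image(segments, n_clusters=5):
    """segments: list of (image_id, mean_rgb) pairs, one entry per segment
    across all candidate images.  Returns the id of the image whose segment
    lies closest to the centroid of the largest color cluster."""
    colors = np.array([rgb for _, rgb in segments], dtype=float)
    km = KMeans(n_clusters=min(n_clusters, len(segments)), n_init=10).fit(colors)
    largest = np.bincount(km.labels_).argmax()        # most "typical" appearance
    members = np.where(km.labels_ == largest)[0]
    center = km.cluster_centers_[largest]
    best = members[np.argmin(np.linalg.norm(colors[members] - center, axis=1))]
    return segments[best][0]
```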

Page 19:

Step 3: Image layout

Problem: put the images together.

minimum overlap

important words at center

close in text, close in picture

Stochastic optimization.
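The three criteria above can be wrapped into a single cost and minimized by a simple randomized search; the weights and the accept-only-improvements rule below are illustrative choices, not the system's actual optimizer.

```python
import random

def overlap(a, b):
    """Overlapping area of two axis-aligned boxes given as (x, y, w, h)."""
    dx = min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0])
    dy = min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1])
    return max(dx, 0.0) * max(dy, 0.0)

def cost(boxes, importance, text_pos, w_overlap=1.0, w_center=0.5, w_prox=0.1):
    """Penalize overlap, off-center important words, and picture distance
    between words that are close in the text (canvas assumed to be 1 x 1)."""
    centers = [(x + w / 2.0, y + h / 2.0) for x, y, w, h in boxes]
    c = 0.0
    for i in range(len(boxes)):
        c += w_center * importance[i] * ((centers[i][0] - 0.5) ** 2 +
                                         (centers[i][1] - 0.5) ** 2)
        for j in range(i + 1, len(boxes)):
            c += w_overlap * overlap(boxes[i], boxes[j])
            d2 = ((centers[i][0] - centers[j][0]) ** 2 +
                  (centers[i][1] - centers[j][1]) ** 2)
            c += w_prox * d2 / (1.0 + abs(text_pos[i] - text_pos[j]))
    return c

def layout(sizes, importance, text_pos, iters=5000, seed=0):
    """Randomly perturb one box at a time, keeping only improving moves."""
    rng = random.Random(seed)
    boxes = [[rng.random(), rng.random(), w, h] for w, h in sizes]
    best = cost(boxes, importance, text_pos)
    for _ in range(iters):
        i = rng.randrange(len(boxes))
        old_x, old_y = boxes[i][0], boxes[i][1]
        boxes[i][0], boxes[i][1] = rng.random(), rng.random()
        new = cost(boxes, importance, text_pos)
        if new < best:
            best = new
        else:
            boxes[i][0], boxes[i][1] = old_x, old_y
    return boxes
```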

Page 20:

Step 4: Evaluation

The large chocolate-colored horse trotted in the pasture.

The brown horse runs in the grass.

Page 21:

Evaluation

(reference) The large chocolate-colored horse trotted in the pasture.

(user) The brown horse runs in the grass.

[Figure: greedy word alignment between the two sentences, with per-pair similarity scores of 1 for exact matches and around 0.9 for synonym matches.]

synonyms

greedy word alignment

F-measure from precision and recall
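A minimal sketch of the scoring idea: greedily align user words to reference words using a similarity function (assumed supplied, e.g., 1.0 for identical words and a high score such as 0.9 for WordNet synonyms), then compute precision, recall, and their F-measure.

```python
def alignment_f_score(reference, user, sim):
    """reference, user: lists of tokens; sim(r, u) -> similarity in [0, 1].
    Greedily match word pairs in order of decreasing similarity, each word
    used at most once, and turn the matched totals into precision/recall/F."""
    if not reference or not user:
        return 0.0
    pairs = sorted(((sim(r, u), i, j)
                    for i, r in enumerate(reference)
                    for j, u in enumerate(user)), reverse=True)
    used_r, used_u, total = set(), set(), 0.0
    for s, i, j in pairs:
        if s <= 0 or i in used_r or j in used_u:
            continue
        used_r.add(i)
        used_u.add(j)
        total += s
    precision = total / len(user)
    recall = total / len(reference)
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```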

Page 22:

User study

story text → TTP picture → user text (TTP) → F-score (TTP)

story text → original illustration → user text (illustration) → F-score (illustration)

(Users see only the picture and write down what they think the text says; the F-score compares their text against the original story text.)

Page 23:

User study results

[Two scatter plots of per-story F-scores. Left: TTP F-score vs. illustration F-score ("TTP vs. original illustration"; 6 users, 40 children's stories). Right: TTP F-score vs. photo F-score ("News photo vs. TTP + photo"; 8 users, 20 news stories).]

Page 24:

Our first system is far from perfect

The girl loved the dog. The girl loved the dog’s soft eyes and warm nose and big paws. The girl wished she had a dog.

“A girl’s pet puts its paw on her nose.”
“The dog walked up to the girl and sniffed her.”
“The dog bit the girl in her nose and ran away.”
“The girl’s nose smelled the dog and monkey as they walked away.”
“The girl walked her dog and saw a hairy man with a big nose.”
“The girl monkey nose smells dog paw prints.”

Page 25:

Our second TTP system

Page 26:

ABC layout

Inspired by pilot user study

3 positions and an arrow

Positions ≈ semantic roles
  A = “who”
  B = “what action” / “when”
  C = “to whom” / “for what”

Function words omitted

Advantages

Structure helps disambiguate icons (verb vs. noun)

Learnable by casting as a sequence tagging problem

Page 27:

ABC layout prediction as sequence tagging

Given input sentence, assign {A, B, C, O} tags to words

The/O girl/A rides/B the/B bus/B to/O school/C in/O the/O morning/B

Page 28:

Obtaining training data for layout predictor

Web-based “pictionary”-like tool to create ABC layouts for 571 sentences from school texts, children’s books, and news headlines

For 48 texts, 3 annotators: tag agreement = 77%, Fleiss’ kappa = 0.71

Page 29:

Chunking by Semantic Role Labeling

Note: We actually work at chunk level; word level is too fine-grained.

Obtain semantically coherent chunks as basic units in the pictures

Assign PropBank semantic roles using ASSERT [Pradhan et al. 2004]

We use SRL as is, with the model provided with ASSERT

PropBank roles define chunks to be placed in layout

Example:

The boy (Arg0) | gave (Target) | the ball (Arg1) | to the girl (Arg2) | yesterday (ArgM-TMP)

Page 30:

Sequence tagging with linear-chain CRFs

Goal: Tag each chunk with a label in {A,B,C,O}

Input: Chunk sequence x and features

Output: Most likely tag sequence y

x = The boy (Arg0) | gave (Target) | the ball (Arg1) | to the girl (Arg2) | yesterday (ArgM-TMP)
y =    A            |   B           |    B            |     C              |    B

Note: Each chunk described by PropBank and other features

Page 31:

Sequence tagging with linear-chain CRFs

Probabilistic model:

p(y|x) = (1/Z(x)) exp( Σ_{t=1}^{|x|} Σ_{k=1}^{K} λ_k f_k(y_t, y_{t−1}, x, t) )

Different factorizations of λ_k f_k(y_t, y_{t−1}, x, t):

Model 1: Tag sequence ignored; one weight for each tag-feature

Model 2: HMM-like; weights for transitions and emissions

Model 3: General linear-chain; one weight per tag-tag-feature
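For concreteness, here is a small sketch of decoding the most likely tag sequence under a general linear-chain model (Model 3 style), assuming the per-chunk emission scores and tag-to-tag transition scores (i.e., the summed weighted features) have already been computed.

```python
import numpy as np

TAGS = ["A", "B", "C", "O"]

def viterbi(emission, transition):
    """emission: (T, 4) array, score of each tag for each chunk.
    transition: (4, 4) array, score of moving from tag i to tag j.
    Returns the highest-scoring tag sequence for the T chunks."""
    T, K = emission.shape
    score = np.full((T, K), -np.inf)
    back = np.zeros((T, K), dtype=int)
    score[0] = emission[0]
    for t in range(1, T):
        for j in range(K):
            cand = score[t - 1] + transition[:, j] + emission[t, j]
            back[t, j] = int(np.argmax(cand))
            score[t, j] = cand[back[t, j]]
    path = [int(np.argmax(score[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(back[t, path[-1]])       # follow back-pointers
    return [TAGS[i] for i in reversed(path)]
```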

Page 32:

CRF Features

Binary predicate features evaluated for each SRL chunk:

1 PropBank role label of the chunk
  e.g., Arg0? Arg1? ArgM-LOC?

2 Part-of-speech tags of all words in the chunk
  e.g., Contains JJ? NNP? RB?

3 Features related to the type of phrase containing the chunk
  e.g., NP? PP? Is the chunk inside a VP?

4 Lexical features: 5000 frequent words and WordNet supersenses
  e.g., Contains ’girl’? ’pizza’? verb.consumption?
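A sketch of how such binary predicates might be assembled for one chunk; the chunk dictionary and the feature names here are illustrative assumptions, with the role label, POS tags, phrase type, words, and supersenses presumed to come from the SRL and preprocessing steps.

```python
def chunk_features(chunk):
    """chunk: dict with keys 'role' (e.g. 'Arg0'), 'pos' (list of POS tags),
    'phrase' (e.g. 'NP'), 'in_vp' (bool), 'words' (lowercased tokens), and
    'supersenses' (e.g. {'noun.person'}).  Returns the set of active
    binary feature names for the CRF."""
    feats = {"role=" + chunk["role"], "phrase=" + chunk["phrase"]}
    if chunk["in_vp"]:
        feats.add("inside_VP")
    feats.update("pos=" + p for p in chunk["pos"])
    feats.update("word=" + w for w in chunk["words"])        # frequent-word lexicon in practice
    feats.update("supersense=" + s for s in chunk["supersenses"])
    return feats
```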

Page 33:

CRF Experimental Results

To choose the model and the CRF’s regularization parameter, we ran 5-fold cross-validation.

[Plot: cross-validation accuracy and macro-averaged F1 (y-axis, roughly 0.71 to 0.78) vs. the CRF regularization variance (x-axis, 10^-1 to 10^1) for Models 1, 2, and 3.]

Best accuracy and macro-averaged F1 achieved with Model 3, σ² = 1.0

Accuracy is similar to that of human annotators

Page 34:

User Study: Is ABC layout more useful than linear layout?

Subjects: 7 non-native English speakers, 12 native speakers

90 test sentences from important TTP application domains

Each subject saw 45 linear pictures and 45 ABC pictures

User study overall protocol

original sentence → SymWriter icons → ABC layout → user text → BLEU/METEOR (ABC)

original sentence → SymWriter icons → Linear layout → user text → BLEU/METEOR (Linear)

Page 35:

Sample picture and guesses: Linear layout

“we sing a song about a farm.”
“i sing about the farm and animals”
“we sang for the farmer and he gave us animals.”
“i can’t sing in the choir because i have to tend to the animals.”

Page 36:

Sample picture and guesses: ABC layout

“they sing old mcdonald had a farm.”
“we have a farm with a sheep, a pig and a cow.”
“two people sing old mcdonald had a farm”
“we sang old mcdonald on the farm.”

Original: We sang Old MacDonald had a farm.

Page 38:

Results of user study

ABC layout allows non-native speakers to recover more meaning

However, the linear layout is better for native speakers
  Familiar with left-to-right structure of English
  Can guess the meaning of obscure function-word icons

More complex layout does not require additional processing time

Page 39:

One application: Improving children’s reading comprehension

Page 40:

Physical activity can enhance young children’s reading comprehension

The Indexical Hypothesis (Glenberg, Gutierrez and Levin 2004)
  Young readers may not “index” (map) words to objects
  Consequently, they fail to derive meaning from text
  New instructional method: manipulating toys according to text

Physical manipulation results in better memory for and comprehension of the text.

But: real toys are a practical hassle

Page 41:

Computer manipulation

Will manipulating images on a computer have the same effect as manipulating physical toys?

Computer images generated manually (TTP in the future).

53 1st and 2nd grade children.

Three conditions:
  physical manipulation (PM)
  computer manipulation (CM)
  re-read without manipulation (CR)

Memory/comprehension test after each story.

Page 42:

Example story

http://www.cs.wisc.edu/zhu/space2/psych/new/interface.php

Page 43:

CM ≥ PM ≥ CR

Measure: proportion correct for the memory/comprehension test questions

Condition   N    Correct
CM          20   .89 ± .06
PM          14   .84 ± .11
CR          19   .80 ± .10

CM > CR significant at p = 0.01

CM as good as PM: opens up many doors

Page 44:

Conclusions

1 Text-to-Picture is an interesting and complex research topic.

2 We have some preliminary ideas, much still needs to be done.

Funding acknowledgment: NSF IIS-0711887, Wisconsin Alumni Research Foundation.

Page 45:

Backup Slides

Page 46:

Why not use manual rules from PropBank to ABC?

PropBank roles are verb-specific

Arg0 is typically the agent, but Arg1, Arg2, etc. do not generalize

For example, Arg1 can map to either B or C:

Bob[Arg0] → Sue[Arg2], with gave[Target] and book[Arg1] in between: here Arg1 lands in position B

Bob[Arg0] → car[Arg1], with drove[Target] in between: here Arg1 lands in position C

Other issues

Best position of modifiers like ArgM-LOC depends on usage

Sentences with multiple verbs need special treatment

Bottom line

Mapping from semantic roles to layout positions is non-trivial!

Page 47:

CRF Experimental Results

Relative importance of the types of features

Lexical > PropBank labels > phrase tags > part-of-speech tags

Learned feature weights make intuitive sense

Preferred tag transitions: A → B, B → C

Preferred in A: noun phrases (not nested in verb phrase)

Preferred in B: verbs and ArgM-NEGs

Preferred in C: supersense noun.objects, Arg4s, and ArgM-CAUs

Error analysis reveals mistakes similar to those made by human annotators. Accuracy is similar to inter-annotator agreement.

Conclusion

The CRF model can predict the layouts about as well as humans.
