
Page 1

Cross-Domain Scruffy Inference

Kenneth C. Arnold Henry Lieberman

MIT Mind Machine Project
Software Agents Group, MIT Media Laboratory

AAAI Fall Symposium on Common Sense Knowledge
November 2010

Kenneth C. Arnold, Henry Lieberman (MIT) Cross-Domain Scruffy Inference CSK 2010 1 / 16

Page 2

Vision

Informal (“scruffy”) inductive reasoning over non-formalized knowledge.
Use multiple knowledge bases without tedious alignment.


Page 3

We’ve Done This. . .

ConceptNet and WordNet (Havasi et al. 2009)
Topics and Opinions in Text (Speer et al. 2010)
Code and Descriptions of Purpose (Arnold and Lieberman 2010)

but how does it work?


Page 4

This Talk

Background
Blending is Collective Matrix Factorization.
Singular vectors rotate.
Other blending layouts work too.


Page 5

Background

Matrix Representations of Knowledge


Page 6

Background

Factored Inference

Filling in missing values is inference.


Page 7

Background

Factored Inference

Represent each concept i and each feature j by k-dimensional vectors c_i and f_j such that, when A(i, j) is known,

A(i, j) ≈ c_i · f_j.

If A(i, j) is unknown, infer it as c_i · f_j. Equivalently, stack the c_i as rows of C and the f_j as rows of F; then

A ≈ CFᵀ.
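This factorization can be sketched with a truncated SVD. A minimal illustration (the matrix and the rank are made up for the example, not taken from the talk):

```python
import numpy as np

# A small concept-by-feature matrix; rows are concepts, columns are
# features. The data here is illustrative only.
A = np.array([
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
])

# Rank-k truncated SVD gives C (concept vectors) and F (feature vectors)
# with A ≈ C @ F.T.
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
C = U[:, :k] * s[:k]          # fold singular values into the concept side
F = Vt[:k, :].T

A_hat = C @ F.T               # every entry of A_hat is an inferred value
print(np.round(A_hat, 2))
```

Every cell of `A_hat` is defined, including cells that were zero or unknown in `A`; reading those cells off is the "filling in missing values" step.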


Page 8

Collective Factorization

Quantifying factorization quality

Quantify the “≈” in A ≈ CFᵀ as a divergence:

D(CFᵀ | A)

Minimizing this loss ensures that the factorization fits the data.
Many loss functions are possible; e.g., SVD minimizes squared error:

D(Â | A) = Σᵢⱼ (âᵢⱼ − aᵢⱼ)², where Â = CFᵀ.
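The optimality of SVD under this loss (the Eckart–Young theorem) is easy to check numerically. A sketch with synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 5))   # synthetic data, illustrative only
k = 2

# Squared-error divergence between an approximation and the data.
def d_sq(A_hat, A):
    return ((A_hat - A) ** 2).sum()

# Rank-k truncated SVD...
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_svd = (U[:, :k] * s[:k]) @ Vt[:k, :]

# ...beats an arbitrary rank-k factorization on this loss (Eckart-Young).
A_rand = rng.normal(size=(6, k)) @ rng.normal(size=(k, 5))
assert d_sq(A_svd, A) <= d_sq(A_rand, A)
print(d_sq(A_svd, A))
```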


Page 9

Collective Factorization

Collective Matrix Factorization

An analogy. . .
Let people p rate restaurants r, represented by positive or negative values in a |p| × |r| matrix A.
Restaurants also have characteristics c (e.g., “serves vegetarian food”, “takes reservations”), represented by matrix B.
Incorporating characteristics may improve rating prediction.
Use the same restaurant vectors to factor both preferences and characteristics:

A ≈ PRᵀ,  B ≈ RCᵀ


Page 10

Collective Factorization

Collective Matrix Factorization

A ≈ PRᵀ,  B ≈ RCᵀ

(A is person by restaurant, B is restaurant by characteristics)

Collective Matrix Factorization (Singh and Gordon 2008) gives a framework for solving this type of problem.
Spread out the approximation loss:

αD(PRᵀ | A) + (1 − α)D(RCᵀ | B)

At α = 1, factors as if characteristics were just patterns of ratings.
At α = 0, factors as if only qualities, not individual restaurants, mattered for ratings.
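Under a squared-error divergence, one standard way to minimize this weighted objective is alternating least squares. A minimal sketch on synthetic data (the ALS scheme is an assumption for illustration, not Singh and Gordon's own solver, which uses Newton updates):

```python
import numpy as np

rng = np.random.default_rng(0)
n_people, n_rest, n_char, k = 8, 6, 4, 2

# Synthetic ratings A (people x restaurants) and characteristics
# B (restaurants x characteristics); illustrative data only.
A = rng.normal(size=(n_people, n_rest))
B = rng.normal(size=(n_rest, n_char))
alpha = 0.7

# Alternating least squares for
#   alpha*||P R^T - A||^2 + (1 - alpha)*||R C^T - B||^2
P = rng.normal(size=(n_people, k))
C = rng.normal(size=(n_char, k))
R = rng.normal(size=(n_rest, k))

def loss():
    return (alpha * ((P @ R.T - A) ** 2).sum()
            + (1 - alpha) * ((R @ C.T - B) ** 2).sum())

losses = [loss()]
for _ in range(30):
    # With R fixed, P and C are independent least-squares problems.
    P = np.linalg.lstsq(R, A.T, rcond=None)[0].T
    C = np.linalg.lstsq(R, B, rcond=None)[0].T
    # With P and C fixed, the restaurant vectors solve one stacked,
    # sqrt-weighted least-squares problem over both matrices.
    M = np.vstack([np.sqrt(alpha) * P, np.sqrt(1 - alpha) * C])
    Z = np.vstack([np.sqrt(alpha) * A, np.sqrt(1 - alpha) * B.T])
    R = np.linalg.lstsq(M, Z, rcond=None)[0].T
    losses.append(loss())

print(losses[0], losses[-1])   # loss never increases across iterations
```

Each substep is an exact minimization over one factor, so the weighted loss is non-increasing from iteration to iteration.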


Page 11

Collective Factorization

Blending is a CMF

A ≈ PRᵀ,  B ≈ RCᵀ

(A is person by restaurant, B is restaurant by characteristics)

Can also solve with Blending:

Z = [ αAᵀ  (1 − α)B ] ≈ R [ P ; C ]ᵀ

(αAᵀ and (1 − α)B sit side by side; P is stacked on top of C.)

If the decomposition is an SVD, the loss is separable by component:

D(R [ P ; C ]ᵀ | Z) = D(RPᵀ | αAᵀ) + D(RCᵀ | (1 − α)B)

⇒ Blending is a kind of Collective Matrix Factorization
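The separability claim can be checked numerically. A sketch, assuming squared-error loss and a plain SVD of the stacked matrix (all data synthetic):

```python
import numpy as np

rng = np.random.default_rng(1)
n_people, n_rest, n_char, k, alpha = 5, 7, 3, 2, 0.6
A = rng.normal(size=(n_people, n_rest))   # ratings (synthetic)
B = rng.normal(size=(n_rest, n_char))     # characteristics (synthetic)

# Blend: stack the weighted matrices so restaurants index the rows.
Z = np.hstack([alpha * A.T, (1 - alpha) * B])

# One SVD of Z factors both datasets with a shared restaurant matrix R.
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
R = U[:, :k] * s[:k]
PC = Vt[:k, :].T               # stacked [P; C]
P, C = PC[:n_people], PC[n_people:]

# Squared-error loss on the blend separates into per-matrix terms.
total = ((R @ PC.T - Z) ** 2).sum()
part = (((R @ P.T - alpha * A.T) ** 2).sum()
        + ((R @ C.T - (1 - alpha) * B) ** 2).sum())
assert np.isclose(total, part)
```

The equality holds exactly because the columns of Z partition into the αAᵀ block and the (1 − α)B block, so every squared-error term lands in exactly one of the two sums.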


Page 12

Blended Data Rotates the Factorization

Veering


Page 13

Blended Data Rotates the Factorization

Blended Data Rotates the Factorization

What happens at an intersection point?
Consider blending X and Y. Start with X ≈ ABᵀ; what happens as you add in Y?
First, add in the new space that only Y covered.
Now the data is off-axis, so rotate the axes to align with the data.

[Figure: data axes rotating as the blend changes; panels labeled a = 1, θ = 0π; a = 1, θ = 0.25π; a = 1, θ = 0.5π; a = 0, θ = 0.5π]
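The rotation can be seen numerically. A toy sketch, assuming a simple weighted-overlay blend (1 − α)X + αY, which is not necessarily the exact parameterization behind the figure:

```python
import numpy as np

# Two toy 2x2 "datasets" whose principal directions disagree;
# these matrices are illustrative, not the ones behind the figure.
X = np.array([[2.0, 0.0], [0.0, 0.5]])   # dominant direction on axis 0
Y = np.array([[0.5, 0.5], [0.5, 0.5]])   # dominant direction at 45 degrees

def top_right_vec(M):
    """First right singular vector of M."""
    return np.linalg.svd(M)[2][0]

angles = []
for alpha in np.linspace(0.0, 1.0, 11):
    v = top_right_vec((1 - alpha) * X + alpha * Y)
    angles.append(np.degrees(np.arctan2(abs(v[1]), abs(v[0]))))

# The top singular vector veers from X's axis toward Y's 45-degree axis.
print([round(a, 1) for a in angles])
```

As α moves from 0 to 1, the top singular vector swings continuously from the axis of X toward the off-axis direction of Y, which is exactly the axis rotation the slide describes.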


Page 14

Blended Data Rotates the Factorization

Veering

“Veering” is caused by singular vectors of the blend rotating betweencorresponding singular vectors of the source matrices.

[Figure: singular values of the blend as a function of α (0 to 2); curves labeled by the angle between corresponding singular vectors: θU = 1.1, θU = 0.26, θU = 1.3]
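The raw material for a plot like this is just the singular-value spectrum of the blend at each weight. A sketch with synthetic matrices and an assumed parameterization X + αY:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(5, 5))   # synthetic source matrices,
Y = rng.normal(size=(5, 5))   # illustrative only

# Track every singular value of the blend X + alpha*Y as alpha grows
# (an assumed parameterization; the figure's exact one isn't given here).
alphas = np.linspace(0.0, 2.0, 9)
svals = np.array([np.linalg.svd(X + a * Y, compute_uv=False)
                  for a in alphas])
print(np.round(svals, 2))     # each row: spectrum at one alpha
```

Plotting each column of `svals` against α reproduces the kind of curves in the figure; veering shows up where curves approach each other and the corresponding singular vectors trade directions.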


Page 15

Layout Tricks

Bridge Blending

[Figure: bridge blend layout. Rows: English concepts, then French features; columns: English features, then French concepts. X = English ConceptNet (English concepts × English features); Y = En ↔ Fr dictionary (English concepts × French concepts); the bottom-right block is the French ConceptNet, transposed (French features × French concepts).]

General bridge blend:

[ X  Y ; 0  Z ] ≈ [ U_XY ; U_0Z ] [ V_X0  V_YZ ]ᵀ = [ U_XY V_X0ᵀ  U_XY V_YZᵀ ; U_0Z V_X0ᵀ  U_0Z V_YZᵀ ]

(“;” separates the rows of a block matrix.)

Again, the loss factors:

D(Â | A) = D(U_XY V_X0ᵀ | X) + D(U_XY V_YZᵀ | Y) + D(U_0Z V_X0ᵀ | 0) + D(U_0Z V_YZᵀ | Z)

V_YZ ties the factorizations of X and Z together through the bridge data Y.
Could use a weighted loss in the empty corner.
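A sketch of assembling and factoring the bridge layout, with a zero block in the empty corner (toy sizes and synthetic data; a plain truncated SVD stands in for whatever solver is actually used):

```python
import numpy as np

rng = np.random.default_rng(2)
# Toy stand-ins for the bridge layout: X (English concepts x English
# features), Y (English concepts x French concepts, the dictionary
# "bridge"), Zm (French features x French concepts). Sizes are arbitrary.
X = rng.normal(size=(4, 5))
Y = rng.normal(size=(4, 3))
Zm = rng.normal(size=(6, 3))

# Assemble the bridge blend with a zero block in the empty corner.
top = np.hstack([X, Y])
bottom = np.hstack([np.zeros((6, 5)), Zm])
A = np.vstack([top, bottom])

k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k = U[:, :k] * s[:k]
V_k = Vt[:k, :].T

U_XY, U_0Z = U_k[:4], U_k[4:]   # row blocks: English concepts, French features
V_X0, V_YZ = V_k[:5], V_k[5:]   # column blocks: English features, French concepts

# V_YZ is shared by Y (through U_XY) and by Zm (through U_0Z),
# so the bridge data Y couples the two factorizations.
Y_hat = U_XY @ V_YZ.T
Z_hat = U_0Z @ V_YZ.T
print(Y_hat.shape, Z_hat.shape)
```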


Page 16

Summary

Summary

Blending is a Collective Matrix Factorization.
“Veering” indicates singular vectors rotating between datasets.

What’s next?
CMF permits many objective functions, even different ones for different input data. What’s appropriate for commonsense inference?
Incremental?
Can CMF do things we thought we needed 3rd-order for?
