
Property of Natali Ruchansky

Matrix Completion with Queries

Natali Ruchansky, Mark Crovella, Evimaria Terzi


Can you guess the picture?


What about now?


And now?

Salvador Domingo Felipe Jacinto Dalí i Domènech

How did you do it?

Input Image / Available Information / Our Estimate

• For most there is too little information to recognize shapes or patterns: "I’m not sure. Arbitrary guess."

• Recognize human features (ear, eyebrow shape, and facial contour): "I know, a human face." (not Van Gogh)

• "I know this mustache! My friend Salvador Dali!"

How Much and Which Information?

So the question is, if we start at this image:

How much and which information do I need to add so that my particular algorithm can infer the image?

abracadabra

If we can answer "How much and which information do I need to add so that my particular algorithm can infer the image?", we can:

1. Choose which information to add, tailored to the particular reconstruction algorithm.

2. Reconstruct based on this information.

The example of reconstructing Dali is an instance of the problem of Matrix Completion:

Given a partially-observed matrix M, fill in the missing entries.

In particular, the version applied to real-world data is Low Rank Matrix Completion:

Given a partially-observed matrix M of low rank r, fill in the missing entries.
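To make the problem statement concrete, here is a minimal NumPy sketch that builds a small rank-r matrix and hides most of its entries. The sizes, the seed, the 30% observation rate, and the names `M`, `mask`, and `M_observed` are illustrative assumptions, not values from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, r = 6, 5, 2                      # illustrative sizes (assumed)

# A rank-r matrix can be written as an (n x r) factor times an (r x m) factor.
X = rng.standard_normal((n, r))
Y = rng.standard_normal((r, m))
M = X @ Y

# Boolean mask of observed positions; True means the entry is known.
mask = rng.random((n, m)) < 0.3

# Partially observed input: unknown entries are marked with NaN.
M_observed = np.where(mask, M, np.nan)

print("rank of M:", np.linalg.matrix_rank(M))
print("observed entries:", int(mask.sum()), "of", n * m)
```

The completion task is then to recover the NaN positions of `M_observed` using only the entries where `mask` is True.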


Completion of what?

• Yelp users rate restaurants

But a given user has not visited all restaurants … So the matrix is partially observed.

[Figure: users × restaurants matrix]

• Traffic counters measure traffic on roads

But counters do not exist on all roads … So the matrix is partially observed.

[Figure: source × destination matrix]

• Biologists measure interaction of proteins

But they cannot exhaustively run all experiments … So the matrix is partially observed.

[Figure: protein × protein matrix]


https://www.telegeography.com/telecom-maps/global-traffic-map.1.html

And many more instances of partially observed data…

Statistical Matrix Completion

Traditional approaches assume:

1. A random distribution of observations
2. At least n r log(n) observations

With (at least) these assumptions, statistical matrix completion methods pose the problem as an optimization and find the best solution to match the visible information.

[Figure: input meets assumptions → reconstruction]

The challenge with these assumptions is that in real data:

1. The distribution is often not random
2. Very few entries are actually known.

[Figure: bar chart. Required n r log(n): 2.5e8. Known ratings: 9e7, i.e. ≈160,000,000 fewer entries.]
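The gap on this slide is simple arithmetic, sketched below. The slide does not state its n or r, so `n = 480_000` and `r = 40` are hypothetical values picked only because they put n r log(n) near the 2.5e8 shown; the natural logarithm is also an assumption. Only the 9e7 known ratings is taken from the slide.

```python
import math

def required_observations(n: int, r: int) -> float:
    """Order-of-magnitude requirement n * r * log(n) (natural log assumed)."""
    return n * r * math.log(n)

n, r = 480_000, 40          # hypothetical values, not stated on the slide
known = 9e7                 # known ratings, as read off the slide

required = required_observations(n, r)
shortfall = required - known
print(f"required ~ {required:.2e}, known = {known:.0e}, shortfall ~ {shortfall:.2e}")
```

Under these assumed values the shortfall is on the order of 1.6e8 entries, the same order as the "≈160,000,000 fewer entries" on the slide.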

[Figure: real observed data → best guess. The guess matches on Ω (the observed entries), not elsewhere.]

Our Question.

How can we design one querying and matrix completion algorithm that minimizes the reconstruction error and the number of queries?

We call this the Active Completion problem.

Number of queries: fixed to budget b.

With great power…

Many data owners are in the powerful position to add additional observations:

• Yelp can ask some users to rate some restaurants
• Cities can install traffic counters
• Biologists can experiment with a particular protein pair

How to make the most of the limited budget of queries?

The Answer

We construct an algorithm called Order&Extend that is the first to integrate a querying strategy into its matrix completion algorithm.

It is able to select a small number of queries needed to find an accurate completion.

Our Approach

The key to our approach is viewing matrix completion through a sequence of linear systems.

This allows us to identify:

1. Parts of the matrix that can be recovered given the observations
2. Other parts that cannot due to insufficient information
3. The additional entries needed to recover those areas.

Note this means our algorithm will not do this: [figure]. It will only estimate the parts it can.

MC as Linear Systems

Write the data $M = XY$ as a product of factors, where $M$ is $n \times m$, $X$ is $n \times r$, and $Y$ is $r \times m$.


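For rank 2:

$$M_{ij} = x_{i1} y_{1j} + x_{i2} y_{2j}$$
$$M_{i'j} = x_{i'1} y_{1j} + x_{i'2} y_{2j}$$

[Figure: entries $M_{ij}$ and $M_{i'j}$ of $M$, rows $x_i$ and $x_{i'}$ of $X$, column $y_j$ of $Y$]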

With $x_i$, $x_{i'}$, $M_{ij}$, and $M_{i'j}$ known and $y_j$ unknown: two equations in two variables.

Solve for $y_j$.

Iteratively solve systems of this form to fill in $X$ and $Y$, then multiply to get the estimate $\tilde{M} = XY$.
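A single step of this process can be sketched as a 2 × 2 linear solve. The numbers below are hypothetical, and `y_true` exists only to generate the two "observed" entries; the sketch assumes the two rows of X are already known and linearly independent.

```python
import numpy as np

# Hypothetical rank-2 example: rows x_i and x_i' of X are already known.
x_i = np.array([1.0, 2.0])
x_ip = np.array([3.0, 1.0])

# Ground-truth column y_j, used here only to generate the two observations.
y_true = np.array([0.5, -1.0])
M_ij, M_ipj = x_i @ y_true, x_ip @ y_true

# Stack the two equations  M_ij = x_i . y_j  and  M_i'j = x_i' . y_j.
A = np.vstack([x_i, x_ip])     # 2 x 2 coefficient matrix
b = np.array([M_ij, M_ipj])    # the two observed entries

y_j = np.linalg.solve(A, b)    # two equations, two unknowns
print(y_j)
```

Once `y_j` is known, the same kind of system can be set up with the roles swapped to recover an unknown row of X from two known columns of Y.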


How do we know when and what we need to query?


Incomplete Systems

$$M_{ij} = x_{i1} y_{1j} + x_{i2} y_{2j}$$
$$M_{i'j} = x_{i'1} y_{1j} + x_{i'2} y_{2j}$$

Two equations in two variables.

Two equations in three variables: $M_{i'j}$ was not observed in the input data.

Query: what is the value of $M_{i'j}$?

Two equations in two unknowns, so we can solve for $y_j$.
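The check "is this system incomplete, and what would complete it?" can be sketched as a count of usable equations. This is an illustration only: the function name `entries_to_query` is hypothetical, and picking the first missing rows is an arbitrary choice made here, not the selection rule used by Order&Extend.

```python
import numpy as np

def entries_to_query(mask_col, solved_rows, r):
    """Return row indices to query so that column j has r usable equations.

    mask_col    : boolean array, True where M[i, j] is observed
    solved_rows : indices i whose factor row x_i is already known
    r           : rank, i.e. the number of unknowns in y_j
    """
    solved_rows = list(solved_rows)
    have = [i for i in solved_rows if mask_col[i]]         # usable equations
    missing = [i for i in solved_rows if not mask_col[i]]  # candidate queries
    deficit = max(0, r - len(have))
    return missing[:deficit]   # empty list: solvable without any query

mask_col = np.array([True, False, False, True])
print(entries_to_query(mask_col, solved_rows=[0, 1], r=2))  # one entry is missing
print(entries_to_query(mask_col, solved_rows=[0, 3], r=2))  # system is complete
```

In the first call only one of the two needed entries is observed, so the sketch asks for the other; in the second call both are observed and nothing is queried.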

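Unstable systems

$$X y = M$$

$$\begin{pmatrix} 1 & 1/2 \\ 1/2 & 1/3 \end{pmatrix} y = \begin{pmatrix} 3/2 \\ 1 \end{pmatrix}$$

$$\begin{pmatrix} 1 & 1/2 \\ 1/2 & 1/3 \end{pmatrix} y' = \begin{pmatrix} 3/2 \\ 5/6 \end{pmatrix}$$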

$$y = \begin{pmatrix} 0 \\ 3 \end{pmatrix}, \qquad y' = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$$
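The slide's numbers can be checked with a short NumPy sketch: changing one entry of M by 1/6 moves the solution from (0, 3) to (1, 1). The condition number is printed only as a familiar reference point; the talk notes that the paper distinguishes its notion of instability from ill-conditioning, so this is not the paper's stability check.

```python
import numpy as np

X = np.array([[1.0, 1/2],
              [1/2, 1/3]])
M = np.array([3/2, 1.0])
M2 = np.array([3/2, 5/6])      # second entry changed by 1/6

y = np.linalg.solve(X, M)
y2 = np.linalg.solve(X, M2)

print("y  =", np.round(y, 6))
print("y' =", np.round(y2, 6))
print("change in M:", np.linalg.norm(M - M2))
print("change in y:", np.linalg.norm(y - y2))
print("condition number of X:", np.linalg.cond(X))
```

The change in y is more than ten times the change in M, which is why a small error in an observed or queried entry can badly distort the recovered factors.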

In the paper…

1. How can we detect unstable systems?
2. How can we mitigate unstable systems?

Minimizing Queries

Encountering an incomplete or unstable system: the algorithm needs to query.

How can we also keep the number of queries asked to a minimum?

By manipulating the order in which we solve the systems. (Hence the ‘order’ in Order&Extend)

Takeaway

Observed data is typically:

- not random
- sparse

…But we can query! (minimally!)

Option 1: Independent

Query Limit = 1

Decide what to query independently of how you complete.

Option 2: Integrated

Who is guessing? (normal person / an artist)

Decide what to query based on how you complete.

Our algorithm Order&Extend is the first one composed of

1. a querying strategy, tailored to
2. a completion algorithm

This integrated nature enables Order&Extend to:

- carefully select a small number of queries, so that the completion algorithm can
- recover the matrix with high accuracy.

It also allows Order&Extend to output partial completions under strict limits on the number of allotted queries.

A Flavor

For full and accurate completion (of internet traffic data), Order&Extend asks 13k queries…

…while other algorithms do not achieve comparable error even with <40k queries.

Read the paper!

Deeper discussion of:

• Matrix completion as a sequence of linear systems
• Sequence of linear systems as graph propagation
• Predicting unstable systems
  • distinction from ill-conditioning
• Efficient computation of stability checks
• Finding a good solving-order
  • through the lens of graph propagation

Experiments:

• Comparison with Matrix Completion algorithms
  • extended with a querying ability
• Approximate low-rank
• Exact low-rank


Thank you.

from the book Dali’s Mustache

(and read the paper)
