Property of Natali Ruchansky
Matrix Completion with Queries
Natali Ruchansky, Mark Crovella, Evimaria Terzi
Can you guess the picture?
What about now?
And now?
Salvador Domingo Felipe Jacinto Dalí i Domènech
How did you do it?
For each Input Image: Available Information, then Our Estimate.
First image. Available information: for most there is too little information to recognize shapes or patterns. Our estimate: I’m not sure. Arbitrary guess.
Second image. Available information: recognizable human features (ear, eyebrow shape, and facial contour). Our estimate: I know, a human face. (not Van Gogh)
Third image. Our estimate: I know this mustache! My friend Salvador Dali!
How Much and Which Information?
So the question is, if we start at this image:
How much and which information do I need to add so that my particular algorithm can infer the image?
abracadabra
If we can answer: How much and which information do I need to add so that my particular algorithm can infer the image?
Then we can:
1. Choose which information to add, tailored to the particular reconstruction algorithm.
2. Reconstruct based on this information.
The example of reconstructing Dali is an instance of the problem of Matrix Completion:
Given a partially-observed matrix M, fill in the missing entries.
In particular, the version applied to real world data is Low Rank Matrix Completion:
Given a partially-observed matrix M of low rank r, fill in the missing entries.
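To make the problem statement concrete, here is a minimal sketch of what a low-rank, partially observed input could look like. The sizes, the random seed, the 50% observation rate, and the names `mask` and `M_observed` are all assumptions chosen for illustration; they do not come from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, r = 6, 5, 2                      # hypothetical sizes, for illustration only

# Build a matrix of rank r as a product of two thin factors.
X = rng.standard_normal((n, r))
Y = rng.standard_normal((r, m))
M = X @ Y

# Boolean mask of observed positions; True means the entry is visible.
mask = rng.random((n, m)) < 0.5

# The partially observed input: missing entries are marked with NaN.
M_observed = np.where(mask, M, np.nan)

print("rank of M:", np.linalg.matrix_rank(M))
print("observed entries:", int(mask.sum()), "of", M.size)
```

The completion task is then to recover the NaN positions of `M_observed` using only the visible entries and the knowledge that the rank is r.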
Completion of what?
• Yelp users rate restaurants
  But a given user has not visited all restaurants … So the matrix (users × restaurants) is partially observed.
• Traffic counters measure traffic on roads
  But counters do not exist on all roads … So the matrix (source × destination) is partially observed.
• Biologists measure interaction of proteins
  But they cannot exhaustively run all experiments … So the matrix (protein × protein) is partially observed.
https://www.telegeography.com/telecom-maps/global-traffic-map.1.html
And many more instances of partially observed data…
Statistical Matrix Completion
Traditional approaches assume:
1. A random distribution of observations
2. At least n r log(n) observations
With (at least) these assumptions, statistical matrix completion methods pose the problem as an optimization and find the best solution to match the visible information.
[Figure: an input that meets the assumptions, and its reconstruction]
The challenge with these assumptions is that in real data:
1. The distribution is often not random
2. Very few entries are actually known.
Example: required n r log(n): 2.5e8; known ratings: 9e7; that is ≈160,000,000 fewer entries than required.
[Figure: real observed data and the best guess, which match on Ω, not elsewhere]
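One way to make "match on Ω, not elsewhere" measurable is to compute the error separately on the observed positions Ω and on the rest. The sketch below does that for a deliberately naive estimate (copy what is observed, guess 0 elsewhere). It is not a statistical completion method, and the function name `split_error`, the sizes, and the 30% observation rate are assumptions for illustration.

```python
import numpy as np

def split_error(M_true, M_est, mask):
    """Relative Frobenius error on the observed set (mask True) and off it."""
    diff = M_true - M_est
    err_on = np.linalg.norm(diff[mask]) / np.linalg.norm(M_true[mask])
    err_off = np.linalg.norm(diff[~mask]) / np.linalg.norm(M_true[~mask])
    return err_on, err_off

rng = np.random.default_rng(1)
M_true = rng.standard_normal((8, 2)) @ rng.standard_normal((2, 8))  # rank 2
mask = rng.random((8, 8)) < 0.3        # hypothetical sparse observation pattern

# A deliberately naive estimate: copy what is observed, guess 0 elsewhere.
M_est = np.where(mask, M_true, 0.0)

err_on, err_off = split_error(M_true, M_est, mask)
print(f"error on observed entries:   {err_on:.3f}")
print(f"error on unobserved entries: {err_off:.3f}")
```

An estimate can fit Ω perfectly and still be useless off Ω, which is why error on the visible entries alone says little about the quality of a completion.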
Our Question.
How can we design one querying and matrix completion algorithm that minimizes (1) the reconstruction error and (2) the number of queries?
We call this the Active Completion problem.
In the budgeted version, (1) is minimized with the number of queries fixed to a budget b.
With great power…
Many data owners are in the powerful position to add additional observations:
• Yelp can ask some users to rate some restaurants
• Cities can install traffic counters
• Biologists can experiment with a particular protein pair
How can we make the best use of the limited budget of queries?
The Answer
We construct an algorithm called Order&Extend that is the first to integrate a querying strategy into its matrix completion algorithm.
It is able to select a small number of queries needed to find an accurate completion.
Our Approach
The key to our approach is viewing matrix completion through a sequence of linear systems.
This allows us to identify:
1. Parts of the matrix that can be recovered given the observations
2. Other parts that cannot due to insufficient information
3. The additional entries needed to recover those areas.
Note that this means our algorithm will not guess at parts it cannot recover; it will only estimate the parts it can.
MC as Linear Systems
[Figure: the matrix M, with dimensions n and m, equals the product of X and Y, which share the inner dimension r]
Write the data M = XY as a product of factors.
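Written entry by entry, the factorization says that each entry of M is the inner product of row i of X with column j of Y. The index k below is introduced here only for the summation:

\[
M_{ij} \;=\; \sum_{k=1}^{r} x_{ik}\, y_{kj}
\]

Each observed entry therefore gives one linear equation that couples the r unknowns in row i of X with the r unknowns in column j of Y.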
For rank 2:
\[ M_{ij} = x_{i1} y_{1j} + x_{i2} y_{2j} \]
\[ M_{i'j} = x_{i'1} y_{1j} + x_{i'2} y_{2j} \]
[Figure: entries M_ij and M_i'j of M, rows x_i and x_i' of X, and column y_j of Y]
When M_ij, M_i'j, x_i, and x_i' are known and y_j is unknown, these are two equations in two variables, so we can solve for y_j.
Iteratively solve systems of this form to fill in X and Y, then multiply to get the estimate \( \tilde{M} = XY \).
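A single step of this process can be sketched as a 2×2 linear solve. All numeric values below are hypothetical and exist only to produce a consistent example; the variable names (`x_i2` for row i', `x_k` for a further known row) are likewise assumptions of this sketch.

```python
import numpy as np

# Hypothetical rank-2 data, used only to generate a consistent example.
x_i = np.array([1.0, 2.0])       # row i of X (assumed already known)
x_i2 = np.array([3.0, 1.0])      # row i' of X (assumed already known)
y_true = np.array([0.5, -1.0])   # column j of Y (the quantity to recover)

M_ij = x_i @ y_true              # observed entry M_ij
M_i2j = x_i2 @ y_true            # observed entry M_i'j

# Stack the two equations into a 2x2 system  A y_j = b  and solve it.
A = np.vstack([x_i, x_i2])
b = np.array([M_ij, M_i2j])
y_j = np.linalg.solve(A, b)

# With y_j known, any other known row of X yields an estimate in column j.
x_k = np.array([2.0, 2.0])
M_kj_estimate = x_k @ y_j
print("recovered y_j:", y_j, "estimated M_kj:", M_kj_estimate)
```

Repeating such solves, alternating between columns of Y and rows of X, is one way to read "iteratively solve systems of this form".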
How do we know when and what we need to query?
Incomplete Systems
\[ M_{ij} = x_{i1} y_{1j} + x_{i2} y_{2j} \]
\[ M_{i'j} = x_{i'1} y_{1j} + x_{i'2} y_{2j} \]
If M_ij and M_i'j are both known, these are two equations in two variables.
If M_i'j was not observed in the input data, it becomes a third unknown: two equations in three variables.
Query: what is the value of M_i'j?
After the query there are two equations in two unknowns, so we can solve for y.
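The detect-then-query step can be sketched as follows. This is a simplified illustration, not the procedure from the paper: it always uses the first r known rows of X, and the `query` callable is a hypothetical oracle that returns the true value of a requested entry.

```python
import numpy as np

def solve_column(X_rows, M, mask, j, r, query):
    """Solve for column j of Y from r known rows of X, querying missing entries.

    X_rows : dict mapping row index -> known row of X (length r)
    query  : callable (i, j) -> value of M[i, j]; a hypothetical oracle
    Returns (y_j, number_of_queries_asked). M and mask are updated in place.
    """
    rows = list(X_rows)[:r]
    asked = 0
    b = np.empty(r)
    for k, i in enumerate(rows):
        if not mask[i, j]:             # entry missing: the system is incomplete
            M[i, j] = query(i, j)      # ask for exactly the entry we need
            mask[i, j] = True
            asked += 1
        b[k] = M[i, j]
    A = np.vstack([X_rows[i] for i in rows])
    return np.linalg.solve(A, b), asked

# Usage with hypothetical rank-2 data: M_i'j (row 1) is unobserved.
X = np.array([[1.0, 2.0], [3.0, 1.0]])
Y = np.array([[0.5], [-1.0]])
M_full = X @ Y
mask = np.array([[True], [False]])
M = np.where(mask, M_full, np.nan)
y, asked = solve_column({0: X[0], 1: X[1]}, M, mask, 0, 2,
                        lambda i, j: M_full[i, j])
print("queries asked:", asked, "recovered y:", y)
```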
Unstable systems
X y = M
\[
\begin{pmatrix} 1 & 1/2 \\ 1/2 & 1/3 \end{pmatrix} y = \begin{pmatrix} 3/2 \\ 1 \end{pmatrix}
\quad\Rightarrow\quad
y = \begin{pmatrix} 0 \\ 3 \end{pmatrix}
\]
\[
\begin{pmatrix} 1 & 1/2 \\ 1/2 & 1/3 \end{pmatrix} y' = \begin{pmatrix} 3/2 \\ 5/6 \end{pmatrix}
\quad\Rightarrow\quad
y' = \begin{pmatrix} 1 \\ 1 \end{pmatrix}
\]
A small change in M (from 1 to 5/6 in the second entry) changes the solution from (0, 3) to (1, 1).
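The two systems on this slide can be checked numerically. The sketch below solves both and compares the relative change in the right-hand side with the relative change in the solution; the variable names are assumptions, and the comparison is a generic sensitivity check, not the stability test described in the paper.

```python
import numpy as np

X = np.array([[1.0, 1/2],
              [1/2, 1/3]])
M = np.array([3/2, 1.0])
M2 = np.array([3/2, 5/6])     # same system, second entry changed by 1/6

y = np.linalg.solve(X, M)
y2 = np.linalg.solve(X, M2)

# Relative size of the perturbation going in and of the change coming out.
change_in = np.linalg.norm(M2 - M) / np.linalg.norm(M)
change_out = np.linalg.norm(y2 - y) / np.linalg.norm(y)

print("y  =", y)
print("y' =", y2)
print(f"relative change in M: {change_in:.3f}, in y: {change_out:.3f}")
```

If an observed entry carries even a little noise, solving a system like this one can propagate a much larger error into the rest of the completion.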
Unstable systems
In the paper…
1. How can we detect unstable systems?
2. How can we mitigate unstable systems?
Minimizing Queries
Encountering an incomplete or unstable system means the algorithm needs to query.
How can we also keep the number of queries asked to a minimum?
By manipulating the order in which we solve the systems. (Hence the ‘order’ in Order&Extend)
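A toy model shows why the solving order can change the number of queries. It rests on simplifying assumptions that are not taken from the talk: a row or column can be solved once it has r observed entries shared with already-solved columns or rows, any shortfall is paid for with queries, stability is ignored, and the seed rows are treated as known. The function name `count_queries` is hypothetical.

```python
import numpy as np

def count_queries(mask, seed_rows, order, r):
    """Count queries needed to solve rows/columns in the given order (toy model).

    mask      : boolean array, True where an entry of M is observed
    seed_rows : rows of X assumed known at the start
    order     : list of ("row", i) or ("col", j) in solving order
    """
    solved_rows, solved_cols = set(seed_rows), set()
    queries = 0
    for kind, idx in order:
        if kind == "col":
            # Observed entries in this column that lie in solved rows.
            support = sum(1 for i in solved_rows if mask[i, idx])
            solved_cols.add(idx)
        else:
            # Observed entries in this row that lie in solved columns.
            support = sum(1 for j in solved_cols if mask[idx, j])
            solved_rows.add(idx)
        queries += max(0, r - support)   # shortfall must be queried
    return queries

mask = np.array([[True, False],
                 [True, True],
                 [False, True]])
order_a = [("col", 0), ("row", 1), ("col", 1), ("row", 2)]
order_b = [("col", 1), ("row", 1), ("col", 0), ("row", 2)]
queries_a = count_queries(mask, seed_rows=[0], order=order_a, r=1)
queries_b = count_queries(mask, seed_rows=[0], order=order_b, r=1)
print("order A:", queries_a, "queries; order B:", queries_b, "queries")
```

On the same observations, the first order needs no queries in this model while the second needs one.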
Takeaway
Observed data is typically:
- not random
- sparse
…But we can query (minimally!) to obtain an estimate.
Option 1: Independent
Query Limit = 1
Decide what to query independently of how you complete.
Option 2: Integrated
Who is guessing? A normal person, or an artist?
Decide what to query based on how you complete.
Our algorithm Order&Extend is the first one composed of
1. a querying strategy
tailored to
2. a completion algorithm
This integrated nature enables Order&Extend to:
- carefully select a small number of queries, so that the completion algorithm can
- recover the matrix with high accuracy.
It also allows Order&Extend to output partial completions under strict limits on the number of allotted queries.
A Flavor (of internet traffic data)
For full and accurate completion, Order&Extend asks 13k queries…
…while other algorithms do not achieve comparable error even with <40k queries.
Read the paper!
Deeper discussion of:
• Matrix completion as a sequence of linear systems
• Sequence of linear systems as graph propagation
• Predicting unstable systems
  • distinction from ill-conditioning
• Efficient computation of stability checks
• Finding a good solving-order
  • through the lens of graph propagation
Experiments:
• Comparison with Matrix Completion algorithms
  • extended with a querying ability
• Approximate low-rank
• Exact low-rank
Thank you.
from the book Dali’s Mustache
(and read the paper)