Approximation algorithms for Path-Planning and Clustering problems on graphs (Thesis Proposal) Shuchi Chawla Carnegie Mellon University



Two classes of Graph Optimization problems

Optimization problems on graphs arise in many fields

Typically NP-hard

We consider two classes of problems motivated by machine learning and AI:

Path-planning – Construct a “good” path, given a map

Clustering – Divide objects into groups based on similarity


Path-planning Problems


A Robot Navigation Problem

Task: Deliver packages to certain locations
  Faster delivery => greater happiness; “reward”

Want a path with short length and large reward

Classic formulation – Traveling Salesman
  Find the shortest tour covering all locations

Some complicating constraints
  Limited battery power – robot may die before finishing task
  Packages have different deadlines for delivery
  Preference to the larger reward packages

An alternate formulation – Orienteering
  Construct a path of length ≤ D
  Visit as many locations (reward) as possible
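To make the Orienteering formulation concrete, here is a minimal brute-force sketch in Python. It is not an algorithm from the talk: it assumes points in the Euclidean plane, enumerates every simple path from the start, and is only usable on tiny instances. The function name and the instance are made up for illustration.

```python
from itertools import permutations
from math import dist


def orienteering_brute_force(points, rewards, start, budget):
    """Return (best_reward, best_path) over all simple paths that begin at
    `start` and have total Euclidean length at most `budget`.

    Exhaustive search, exponential in the number of points (illustration only).
    """
    others = [v for v in range(len(points)) if v != start]
    best_reward, best_path = rewards[start], [start]
    for k in range(1, len(others) + 1):
        for order in permutations(others, k):
            path = [start, *order]
            length = sum(dist(points[a], points[b])
                         for a, b in zip(path, path[1:]))
            if length <= budget:
                reward = sum(rewards[v] for v in path)
                if reward > best_reward:
                    best_reward, best_path = reward, path
    return best_reward, best_path


# Hypothetical instance: two cheap nearby nodes and one valuable far node.
points = [(0, 0), (1, 0), (2, 0), (0, 5)]
rewards = [0, 1, 1, 10]
print(orienteering_brute_force(points, rewards, start=0, budget=2))
print(orienteering_brute_force(points, rewards, start=0, budget=5))
```

With a budget of 2 the best path stays near the start; with a budget of 5 it is better to go straight to the far node, which shows how sharply the best reward can change with the length bound.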


Path-planning in the real-world: Motivation

Given graph (metric) G, construct a path satisfying some constraints and optimizing some function.

Some applications: Robotics, Assembly analysis, Manufacturing, Production planning

A trade-off between time and reward
  maximize reward with bounded length
  minimize length with reward quota
  some combination of both


A time-reward trade-off

Impose a reward quota and minimize length
  Metric TSP: collect all points
  k-Path: collect at least k reward

Budget the path length and maximize reward
  Orienteering: hard bound on path length
  Time Window: visit node v within [R_v, D_v]

Optimize a combination of reward and length
  Prize Collecting TSP: min (length + reward left)
  Discounted Reward TSP: max reward; reward decreases with time


A time-reward trade-off

Impose a reward quota and minimize length
  Metric TSP: 1.5 [Christofides 76]
  k-Path: 2 + δ [Chaudhuri Godfrey Rao+ 03]

Budget the path length and maximize reward
  Orienteering: 3
  Time Window: 3 log² n [Bansal Blum C Meyerson 04]

Optimize a combination of reward and length
  Prize Collecting TSP: 2 [Goemans Williamson 95]
  Discounted Reward TSP: 6.75 + δ [Blum C Karger+ 03]

[Blum C Karger Meyerson Minkoff Lane 03]


Orienteering and k-Path

Orienteering: length ≤ D; maximize reward

k-Path: reward ≥ k; minimize length

Complementary problems

Series of results on k-TSP (related to k-Path)

[BRV99] [Garg99] [AK00] [CGRT03] …

best approx: (2 + δ)

None for Orienteering until recently!


Why is Orienteering difficult?

First attempt – Use distance-based approximations to approximate reward

Let OPT(d) = max achievable reward with length d

A 2-approx for distance implies that ALG(d) ≥ OPT(d/2)

However, we may have OPT(d/2) << OPT(d) Bad trade-off between distance and reward!
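A small sketch of why this trade-off can be bad, under an assumption that is not from the talk: all nodes sit on the nonnegative half-line and the path starts at position 0, so a path of length d collects exactly the nodes at position at most d. The function name and the instance are hypothetical.

```python
def opt_reward_on_ray(positions, rewards, d):
    """Maximum reward of a path of length at most d that starts at 0 on the
    nonnegative half-line: walking right to position d collects every node
    located at a position <= d, and no path of length d can reach further.
    """
    return sum(r for x, r in zip(positions, rewards) if x <= d)


# Hypothetical instance: four reward nodes at distance 10, one at distance 1.
positions = [10, 10, 10, 10, 1]
rewards = [1, 1, 1, 1, 1]

print(opt_reward_on_ray(positions, rewards, 10))  # OPT(d) with d = 10
print(opt_reward_on_ray(positions, rewards, 5))   # OPT(d/2)
```

Here OPT(d) is 5 while OPT(d/2) is 1, and moving more reward out to distance 10 makes the ratio as large as desired. A guarantee of the form ALG(d) ≥ OPT(d/2) therefore says little about reward.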

[Figure: the paths OPT(d) and APPROX, both starting at s]


Why is Orienteering difficult?

Second attempt – approximate subparts of the optimal path and shortcut other parts

If we stray away from the optimal path by a lot, we may not be able to cover reward that’s far away

Approximate the “extra” length taken by a path over the shortest path length: the Min-Excess Path Problem

[Figure: the paths OPT and APPROX from s to t]


The Min-Excess Problem

Given graph G, start and end nodes s, t, rewards on the nodes, and a target reward k, find a path P that collects reward at least k and minimizes the excess ε(P) = ℓ(P) – d(s,t)

At optimality, this is exactly the same as the k-Path objective of minimizing ℓ(P)

However, approximation is different: Min-Excess is strictly harder than k-Path

We give a (2 + δ)-approximation for Min-Excess

[Blum, C, Karger, Meyerson, Minkoff, Lane, FOCS’03]

Our algorithm returns a path with length at most

d(s,t) + (2 + δ) ε(P), where ε(P) is the optimal excess
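The difference between approximating length and approximating excess can be seen with plain arithmetic. The sketch below is only an illustration with made-up numbers: it compares the length guaranteed by a factor-r approximation on the total length with the length guaranteed by a factor-r approximation on the excess.

```python
def bound_from_length_approx(opt_length, r):
    """Length guaranteed by an r-approximation on total path length."""
    return r * opt_length


def bound_from_excess_approx(opt_length, dist_st, r):
    """Length guaranteed by an r-approximation on excess:
    d(s,t) + r * (opt_length - d(s,t)).
    """
    excess = opt_length - dist_st
    return dist_st + r * excess


# Hypothetical numbers: s and t are 100 apart, the optimal path has length 101.
print(bound_from_length_approx(101, 2))       # 202
print(bound_from_excess_approx(101, 100, 2))  # 102
```

When the optimal path is almost a shortest path, the excess guarantee is much tighter. For any r ≥ 1 it is never weaker than the length guarantee, because d(s,t) + r(ℓ – d(s,t)) ≤ rℓ.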


A 3-approximation to Orienteering

There exists a path from s to t that collects the optimal reward and has length at most D

Given a 3-approximation to min-excess:
  1. Divide the path into 3 “equal-reward” parts (hypothetically)
  2. Approximate the part with the smallest excess
  ⇒ 3-approximation to orienteering

Excess of path P between u and v: ε(P) = d_P(u,v) – d(u,v)

Excess of one part ≤ (ε1 + ε2 + ε3)/3; can afford an excess up to (ε1 + ε2 + ε3)

[Figure: the path OPT from s to t divided into parts 1, 2, 3; the part between v1 and v2 is replaced by the path APPROX]

Using an r-approx for Min-excess (r ∈ Z+), we get an r-approximation for s-t Orienteering

Open: Given an r-approx for min-excess (r ∈ R+), can we get an r-approx to Orienteering?

[Blum C Karger + 03]
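The averaging argument can be checked numerically. The sketch below is an illustration under assumptions that are not in the slides: points lie in the Euclidean plane, the “optimal” path and its cut points are simply given, and the min-excess approximation is modeled only by its worst-case guarantee (excess at most r times the excess of the chosen part).

```python
from math import dist


def path_length(pts):
    return sum(dist(a, b) for a, b in zip(pts, pts[1:]))


def excess(pts):
    """Length of the path minus the straight-line distance between its ends."""
    return path_length(pts) - dist(pts[0], pts[-1])


def stitched_length_bound(pts, cuts, r):
    """Cut the path at the indices in `cuts`, pick the part with the smallest
    excess, and bound the length of: shortest path from s to that part's
    start, an r-approximate min-excess path across the part, and a shortest
    path from the part's end to t.
    """
    bounds = [0, *cuts, len(pts) - 1]
    parts = [pts[i:j + 1] for i, j in zip(bounds, bounds[1:])]
    best = min(parts, key=excess)
    u, v = best[0], best[-1]
    return dist(pts[0], u) + dist(u, v) + r * excess(best) + dist(v, pts[-1])


# Hypothetical optimal path from s = (0, 0) to t = (6, 0), cut into 3 parts.
opt = [(0, 0), (1, 2), (2, 0), (3, 1), (4, 0), (5, 0.5), (6, 0)]
print(path_length(opt), stitched_length_bound(opt, cuts=[2, 4], r=3))
```

With r equal to the number of parts, the stitched bound never exceeds the length of the original path: r times the smallest part excess is at most the sum of the part excesses, and the triangle inequality covers the rest. This is why the new path still fits in the budget D while keeping about a 1/r fraction of the reward.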


The next step: Deadline-TSP

Every vertex v has a deadline D(v); find a path that maximizes the number of nodes v visited before D(v)

Arises in scheduling, production planning

If the last node on the path has the min deadline, use Orienteering to approximate the reward

Don’t need to bother about deadlines of other nodes

Does OPT always have a large subpath with the above property? NO!

There are many subpaths of OPT with the above property that together contain all the reward

[Bansal Blum C Meyerson 04]
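A minimal sketch of the Deadline-TSP objective and of the observation about the last node, assuming Euclidean points, unit speed, and a start at time 0. The helper names and the instance are hypothetical.

```python
from itertools import accumulate
from math import dist


def arrival_times(points):
    """Arrival time at each node of the path, starting at time 0, unit speed."""
    steps = [dist(a, b) for a, b in zip(points, points[1:])]
    return [0.0, *accumulate(steps)]


def deadline_reward(points, deadlines):
    """Number of nodes on the path that are reached by their deadline."""
    return sum(1 for t, dl in zip(arrival_times(points), deadlines) if t <= dl)


path = [(0, 0), (3, 0), (3, 4)]          # arrival times 0, 3, 7
print(deadline_reward(path, [10, 9, 8]))  # last node has the smallest deadline
print(deadline_reward(path, [10, 2, 8]))  # middle node has a tight deadline
```

In the first call the last node has the minimum deadline and is reached in time, so every earlier node is automatically on time as well; only the total length matters, which is exactly an Orienteering constraint. In the second call that property fails and the middle node is missed.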


A segmentation of OPT

[Figure: OPT drawn against the axes Time and Deadline]


Deadline-TSP

Segment graph into many parts, approximate each using Orienteering and patch them together

How do we find such a segmentation without knowing the optimal path?

In order to avoid double-counting of reward, segments should be node-disjoint

Our result – There exists a segmentation based only on deadlines, such that the resulting solution is a (3 log n)-approximation

Open: Is there a segmentation based on other properties (e.g. distance from the root) that gives a constant approximation?


An overview of our results

Problem                            Approximation                        Reference
Min-Excess                         2 + δ                                [FOCS 03]
Orienteering                       3                                    [STOC 04]
Discounted-Reward TSP              6.75 + δ                             [FOCS 03]
Deadline TSP                       3 log n                              [STOC 04]
Time-Window Problem                3 log² n                             [STOC 04]
Time-Window Problem – bicriteria   reward: log 1/ε; deadlines: 1 + ε    [STOC 04]


Future Directions

Better approximations
  Can we get a constant factor for Time-Windows?
  Special metrics such as trees or planar graphs
  Hardness of approximation?

Asymmetric Path-planning
  The graph is directed; still obeys triangle inequality
  Polylog-approximations and lower bounds for distance
  Need entirely different ideas for asymmetric Orienteering; is it log-hard?

Group Path-planning
  Reward is associated with “groups” of nodes
  Visit at least one node in a group to obtain reward


Future Directions

Stochastic Path-planning
  Closer to Robot Navigation; the graph is a Markov Decision Process
  Each edge is an “action” associated with a probability distribution

The goal: Give a “strategy” to accomplish a given task as fast as possible
  Best action could be history dependent
  Can we write down the best strategy in polynomial time?
  Approximate it in poly-time or even in NP?

[Figure: example actions with outcome probabilities (0.2, 0.7, 0.1) and (0.3, 0.2, 0.5)]


Coming up next: Correlation Clustering


Natural Language Processing

In order to understand the article automatically, need to figure out which entities are one and the same

Is “his” in the second line the same person as “The secretary” in the first line?


Real-World Clustering Problems

A wide variety of clustering problems
  Co-reference Analysis
  Web document clustering
  Co-authorship (Citeseer/DBLP)
  Computer Vision

Typical characteristics:
  No well-defined “similarity metric”
  Number of clusters is unknown
  No predefined topics – desirable to figure them out as part of the algorithm


Cohen, McCallum & Richman’s idea

“Learn” a similarity measure based on context

[Figure: graph on the mentions “Mr. Rumsfield”, “his”, “he”, “The secretary” and “Saddam Hussein”; each edge marks strong similarity or strong dissimilarity]


Consistent clustering:
  + edges inside clusters
  – edges between clusters

[Figure: a good clustering of the mentions “Mr. Rumsfield”, “his”, “he”, “The secretary” and “Saddam Hussein”; legend: strong similarity, strong dissimilarity]


Inconsistencies or “mistakes”

Consistent clustering:
  + edges inside clusters
  – edges between clusters

[Figure: a good clustering of the mentions “Mr. Rumsfield”, “his”, “he”, “The secretary” and “Saddam Hussein”; legend: strong similarity, strong dissimilarity]


No consistent clustering!

Goal: Find the most consistent clustering

[Figure: a good clustering of the mentions “Mr. Rumsfield”, “his”, “he”, “The secretary” and “Saddam Hussein”, with its mistakes marked; legend: strong similarity, strong dissimilarity]


Correlation Clustering

Given a graph with positive (similar) and negative (dissimilar) edges, find the most consistent clustering

NP-hard [Bansal, Blum, C, FOCS’02]

Two natural objectives:
  Maximize agreements: (# of +ve inside clusters) + (# of –ve between clusters)
  Minimize disagreements: (# of +ve between clusters) + (# of –ve inside clusters)

Equivalent at optimality, but different in terms of approximation
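The two objectives can be sketched directly from their definitions. The representation below (a dict from node pairs to +1 or –1, and a dict from nodes to cluster ids) is an assumption made for illustration, as is the function name.

```python
def count_agreements_disagreements(sign, cluster_of):
    """sign: dict mapping a pair (u, v) to +1 (similar) or -1 (dissimilar).
    cluster_of: dict mapping each node to its cluster id.
    Returns (agreements, disagreements) of the clustering.
    """
    agree = disagree = 0
    for (u, v), s in sign.items():
        same_cluster = cluster_of[u] == cluster_of[v]
        # A + edge agrees when inside a cluster, a - edge when between clusters.
        if (s > 0) == same_cluster:
            agree += 1
        else:
            disagree += 1
    return agree, disagree


# Hypothetical triangle: a~b and b~c are similar, a and c are dissimilar.
sign = {("a", "b"): +1, ("b", "c"): +1, ("a", "c"): -1}
print(count_agreements_disagreements(sign, {"a": 0, "b": 0, "c": 0}))
print(count_agreements_disagreements(sign, {"a": 0, "b": 1, "c": 2}))
```

Agreements and disagreements always sum to the number of labeled edges, which is why the two objectives have the same optimal clusterings. Their approximation ratios differ because the ratio is measured against different quantities.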


Overview of results

                               Max Agree                    Min Disagree
Unweighted (complete) graphs   PTAS [Bansal Blum C 02]      17433 [Bansal Blum C 02]
                                                            4 [Charikar Guruswami Wirth 03]
                                                            APX-hard [CGW 03]
Weighted graphs                1.3048 [CGW 03]              O(log n) [CGW 03] [Immorlica Demaine 03] [Emanuel Fiat 03]
                               1.3044 [Swamy 04]
                               hardness 116/115 [CGW 03]    hardness 29/28 [CGW 03]


Minimizing Disagreements [Bansal, Blum, C, FOCS’02]

Goal: approximately minimize number of “mistakes”
Assumption: The graph is unweighted and complete

A lower bound on OPT: Erroneous Triangles

Consider a triangle with two + edges and one – edge: an “Erroneous Triangle”
  Any clustering disagrees with at least one of these edges

If there are several edge-disjoint erroneous ∆s, then any clustering makes a mistake on each one

D_opt ≥ maximum fractional packing of erroneous triangles
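A minimal sketch of how this lower bound can be computed in a weak form: greedily collect edge-disjoint erroneous triangles. The greedy packing is a stand-in chosen for illustration, not the fractional packing used in the analysis, and it may be smaller than the maximum. The edge representation (a dict keyed by frozenset pairs) is an assumption.

```python
from itertools import combinations


def greedy_erroneous_triangle_packing(nodes, sign):
    """sign: dict mapping frozenset({u, v}) to +1 or -1 for every pair.
    Returns a list of edge-disjoint triangles with exactly one - edge.
    Any clustering makes at least one mistake per returned triangle.
    """
    used_edges = set()
    packed = []
    for a, b, c in combinations(nodes, 3):
        edges = [frozenset(p) for p in ((a, b), (b, c), (a, c))]
        if any(e in used_edges for e in edges):
            continue  # keep the packing edge-disjoint
        if sum(1 for e in edges if sign[e] < 0) == 1:
            packed.append((a, b, c))
            used_edges.update(edges)
    return packed


# Hypothetical instance: a-b and b-c are +, every other pair is -.
nodes = ["a", "b", "c", "d"]
sign = {frozenset(p): -1 for p in combinations(nodes, 2)}
sign[frozenset(("a", "b"))] = +1
sign[frozenset(("b", "c"))] = +1
print(greedy_erroneous_triangle_packing(nodes, sign))
```

The size of the returned list is a valid lower bound on the minimum number of disagreements, since the triangles share no edges and each one forces a mistake on one of its own edges.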


Using the lower bound: δ-clean clusters

Relating erroneous triangles to mistakes
  In special cases, we can “charge-off” disagreements to erroneous triangles

“Clean” clusters
  each vertex has few disagreements incident on it
  “few” is relative to the size of the cluster
  # of disagreements ≲ # of erroneous triangles

Clean cluster: all vertices are good

[Figure: a cluster with a “good” vertex and a “bad” vertex]


Using the lower bound: δ-clean clusters

Relating erroneous triangles to mistakes
  In special cases, we can “charge-off” disagreements to erroneous triangles

δ-clean clusters
  each vertex in cluster C has fewer than δ|C| positive and δ|C| negative mistakes
  # of disagreements ≲ # of erroneous triangles
  A high density of positive edges ⇒ we can easily spot them in the graph

Possible solution: Find a δ-clean clustering, and charge disagreements to erroneous triangles

Caveat: It may not exist
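The cleanliness condition can be written as a short check. This sketch follows the wording on the slide (fewer than δ|C| positive mistakes and fewer than δ|C| negative mistakes per vertex); the published definition may differ in details, and the graph representation and function name are assumptions.

```python
def is_delta_clean(cluster, nodes, positive, delta):
    """cluster: set of nodes; nodes: all nodes of the complete graph.
    positive: set of frozenset({u, v}) pairs labeled +; all other pairs are -.
    A vertex's positive mistakes are + edges leaving the cluster, its
    negative mistakes are - edges inside the cluster.
    """
    limit = delta * len(cluster)
    for v in cluster:
        pos_mistakes = sum(1 for u in nodes
                           if u not in cluster and frozenset((u, v)) in positive)
        neg_mistakes = sum(1 for u in cluster
                           if u != v and frozenset((u, v)) not in positive)
        if pos_mistakes >= limit or neg_mistakes >= limit:
            return False
    return True


# Hypothetical instance: a + clique on {0, 1, 2, 3} plus one + edge from 0 to 4.
nodes = range(5)
cluster = {0, 1, 2, 3}
positive = {frozenset((u, v)) for u in cluster for v in cluster if u < v}
positive.add(frozenset((0, 4)))
print(is_delta_clean(cluster, nodes, positive, 0.5))
print(is_delta_clean(cluster, nodes, positive, 0.25))
```

With δ = 0.5 the single stray + edge at vertex 0 is tolerated; with δ = 0.25 the limit drops to one mistake per vertex and the cluster is no longer clean.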


Using the lower bound: δ-clean clusters

Caveat: A δ-clean clustering may not exist

An almost-δ-clean clustering: all clusters are either δ-clean or contain a single node

An almost-δ-clean clustering always exists – trivially

We show: there is an almost-δ-clean clustering, OPT(δ), that is almost as good as OPT

Nice structure helps us find it easily.


OPT(δ) – clean or singleton

OPT(δ): All clusters are δ-clean or singleton

[Figure: Optimal Clustering → Imaginary Procedure → OPT(δ); labels: “bad” vertices, few new mistakes]


Finding clean clusters

[Figure: the clusterings OPT(δ) and ALG, with the clean clusters marked]

Charging-off mistakes
  1. Mistakes among clean clusters – charge to erroneous ∆s
  2. Mistakes among singletons – no more than corresponding mistakes in OPT(δ)


A summary of results

                               Max Agree                    Min Disagree
Unweighted (complete) graphs   PTAS [Bansal Blum C 02]      17433 [Bansal Blum C 02]
                                                            4 [Charikar Guruswami Wirth 03]
                                                            APX-hard [CGW 03]
Weighted graphs                1.3048 [CGW 03]              O(log n) [CGW 03] [Immorlica Demaine 03] [Emanuel Fiat 03]
                               1.3044 [Swamy 04]
                               hardness 116/115 [CGW 03]    hardness 29/28 [CGW 03]


Future Directions

Better combinatorial approximation
  The current best algorithms have a large running time – they employ an LP with O(n²) variables

Improving the lower bound
  Erroneous cycles – one negative edge and remaining positive
  The gap of this lower bound is between 2 and 4 [Charikar Guruswami Wirth 03]
  Can we obtain a 2-approximation?

A good “iterative” approximation
  On few changes to the graph, quickly recompute a good clustering


Future Directions

Clustering with small clusters
  Given that all clusters in OPT have size at most k, find a good approximation
  Is this NP-hard?
  Different from finding best clustering with small clusters, without guarantee on OPT

Clustering with few clusters
  Given that OPT has at most k clusters, find an approximation

Maximizing Correlation
  (number of agreements) – (number of disagreements)
  Can we get a constant factor approximation?


Timeline

Plan to finish in a year

Summer 04: Stochastic/Time-dependent path-planning; Clustering with constraints

Fall 04: Asymmetric/group path-planning; Combinatorial/streaming algo for clustering

Spring 05: Wrap-up; writing; job search!


Questions?