Theoretical Computer Science in a Nutshell David Pritchard CCC Second Stage 2008


Theoretical Computer Science in a Nutshell

David Pritchard

CCC Second Stage 2008

Outline

Theoretical Computer Science
  What’s the deal with research?
  Models, Techniques and Algorithms
Distributed Computing Model
  Motivation & Definition
A Randomized, Distributed Algorithm (get friendly with your Cycle Space)

Theoretical Computer Science (TCS) in a Nutshell

You may already know about algorithms and data structures (in the “RAM model”)
  (BFS, DFS, Dijkstra, Floyd-Warshall, Euclidean, quicksort, binary search, flows…)
This is only the tip of the iceberg in TCS
TCS’s flavour: mathy (cool ideas and proofs) but applicable to real problems

One-slide TCS Taxonomy

Algorithms/data structures in many models
  Sequential (RAM, FSA, Turing machines)
  Parallel (dual-core, parallel RAM [PRAM])
  Distributed (cluster/distributed computing)
Complexity: P, NP, coNP, PH, PP, #P, …
Approximation and randomized algorithms
Cryptography, quantum, geometry…

Why study TCS?

Immediately applicable bits (Google maps, credit cards, operations research)
Determine fundamental limitations on our power of computation (halting problem)
My view:
  combines most interesting parts of mathematics and computer programming
  lots of room for creativity
  natural field to study if you like contest problems (but in research, you don’t know if there’s a nice answer)


Part 2: Distributed Computing Model

Distributed Computing Model

Graph (V, E) = network of “computers”
Nodes store data and perform computations
Edges relay messages between nodes

Distributed Computing Model

Goal: want the graph to compute properties of its initially unknown shape
  e.g. shortest path from 5F1 to 308
  e.g. max flow (bandwidth) from 5F1 to 308
Motivating situations: internet, ad-hoc wireless networks, cellular telephone networks, sensor nets, social networks

[Figure: example network; two of the nodes are labelled 5F1 and 308]

Formal Definition of Model

Unique ID (1, 2, 3…) for each node. #1 is leader
Initially nodes only know their own ID and the ID of each of their neighbours
In each round every node can send an O(log |V|)-bit message to each neighbour
  Messages are received next round
Node has ∞ storage & power between rounds
Need to design a local program, a copy of which will run at each node, to achieve goal


Formal Definition of Complexity

The time complexity of an algorithm is the number of rounds that elapse before termination

The message complexity is the total number of messages that are sent

We don’t care about time/space requirements at individual nodes

Distributed Computing 101

What if IDs are “ugly”?
  e.g. in sensor network, (3D8, 1FE…) instead of (1, 2…)
  or if graph is not connected
  Need a leader election algorithm
How can we communicate to all nodes?
How can we count # nodes?
How can we adapt to edge/node failures?

Basic Problem: Spanning Tree

Required: mark a subset of edges so that there is exactly one path from each node to the leader
  “Mark”: each node keeps a list of which of its adjacent edges are in the tree
Each non-leader must know its parent
  Again, each node stores parent and child IDs

Solution: (Breadth-First) Spanning Tree Algorithm

1. Initialize only leader to be in the tree
2. In each round, at node v,
     if this is the first round v is in the tree,
       send msg to each neighbour asking to join tree
     else if (v not in tree) and v got msg from u
       add v and uv to tree & set u to be the parent of v
3. (Stop when all nodes are in tree)
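The rounds above can be sketched as a sequential Python simulation. The adjacency-dict input, the name `bfs_spanning_tree`, and the smallest-ID tie-break when several join messages reach a node in the same round are assumptions made for illustration; the slide does not specify them.

```python
def bfs_spanning_tree(adj, leader=1):
    """Simulate the join protocol round by round; return (parent, rounds).

    adj maps each node ID to a list of neighbour IDs (assumed connected).
    """
    parent = {leader: None}        # nodes currently in the tree
    new_members = [leader]         # nodes whose first round in the tree is now
    rounds = 0
    # The simulator stops using global knowledge of the node count;
    # a real node would need a separate termination signal.
    while new_members and len(parent) < len(adj):
        # Each new member sends a "join" message to every neighbour.
        inbox = {}
        for u in new_members:
            for v in adj[u]:
                inbox.setdefault(v, []).append(u)
        rounds += 1
        # Messages arrive in the next round; nodes outside the tree join.
        new_members = []
        for v, senders in inbox.items():
            if v not in parent:
                parent[v] = min(senders)   # tie-break: smallest ID (assumption)
                new_members.append(v)
    return parent, rounds
```

For example, on `{1: [2, 3], 2: [1, 3, 4], 3: [1, 2, 4], 4: [2, 3, 5], 5: [4]}` every node ends up with a parent one step closer to node 1, and the number of rounds equals the distance of the farthest node from the leader.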

Illustration of Spanning Tree Construction

Legend: computer; leader; edge w/ msg sent; tree edge (head=parent)
Done! Now… can use tree to broadcast msg from leader to all nodes, or census

Distributed Census Algorithm

Each leaf node reports “1” to parent
For each nonleaf, sum reports from children, add 1, and send to parent

[Figure: example tree annotated with the count each node reports to its parent]
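A minimal sketch of the census, assuming the spanning tree is given as a `parent` dict (node to parent ID, `None` for the leader). The function name and the wave-by-wave simulation are illustrative choices, not taken from the slide.

```python
def census(parent):
    """Sum reports up the tree; return (node count seen by leader, rounds used)."""
    children = {v: [] for v in parent}
    for v, p in parent.items():
        if p is not None:
            children[p].append(v)
    pending = {v: len(children[v]) for v in parent}   # reports still awaited
    total = {v: 1 for v in parent}                    # each node counts itself
    ready = [v for v in parent if pending[v] == 0]    # leaves report first
    rounds = 0
    while ready:
        next_ready, sent = [], False
        for v in ready:
            p = parent[v]
            if p is None:
                continue                  # the leader has nobody to report to
            total[p] += total[v]          # parent adds the child's report
            pending[p] -= 1
            sent = True
            if pending[p] == 0:           # all children heard: report next round
                next_ready.append(p)
        if sent:
            rounds += 1
        ready = next_ready
    leader = next(v for v, p in parent.items() if p is None)
    return total[leader], rounds
```

On the tree `{1: None, 2: 1, 3: 1, 4: 2, 5: 4}` the leader learns there are 5 nodes after 3 rounds, which is the height of that tree.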

Time Analysis

Construction of T, broadcast, and census take time proportional to height(T)
Also proportional to diameter
  Diam of network := max distance between any 2 nodes
Compare: sequential model always has time complexity >= |E| due to reading input
  Diam can be much smaller than |V|, |E|
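Why height(T) and Diam are proportional: for a breadth-first tree, height(T) is the largest distance from the leader, so height(T) <= Diam, and any two nodes are joined through the leader by a path of length at most 2·height(T), so Diam <= 2·height(T). The sketch below checks this on a path graph; the helper names are made up for the example.

```python
from collections import deque

def bfs_dist(adj, s):
    """Hop distances from s to every reachable node."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def height_and_diameter(adj, leader):
    """Return (height of a BFS tree rooted at leader, diameter of the graph)."""
    height = max(bfs_dist(adj, leader).values())
    diam = max(max(bfs_dist(adj, s).values()) for s in adj)
    return height, diam

# Path 1-2-3-4-5: a leader in the middle halves the tree height.
path = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
h, d = height_and_diameter(path, 3)
assert h <= d <= 2 * h
```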


Part 3: Randomized (Distributed) Algorithm for Cut Edges/Pairs

Types of Cuts in Graphs

A cut is a part of a connected graph that, when deleted, makes it disconnected
Cut edge (“bridge”): a single edge whose deletion disconnects the graph
Cut pair: two edges, neither a cut edge, whose joint deletion disconnects the graph

Motivation to find these: want to attack or reinforce a network

Part 3 Summary

I’ll show you a simple new approach that lets you find cut edges and cut pairs
Yields O(E)-time RAM algorithm
  Older algorithms match this, but are complex
Yields O(Diam)-time distributed alg’s
  Beats previous best. In publication.

Tools: randomization, cycle space

The Cycle Space

An even graph has even degree at each vertex
For graph (V, E), the cycle space is “all subsets F of E such that (V, F) is an even graph”
If F is in the cycle space, we call F a binary circulation
The cycle space is a vector space (algebra 101)

[Figure: example graph with a binary circulation F marked in red]
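The definition translates directly into a degree-parity check. This is a small sketch; representing an edge subset as a list of `(u, v)` tuples is an assumption for illustration.

```python
from collections import Counter

def is_binary_circulation(F):
    """True iff every vertex touched by the edge set F has even degree in F."""
    degree = Counter()
    for u, v in F:
        degree[u] += 1
        degree[v] += 1
    # Vertices not touched by F have degree 0, which is even.
    return all(d % 2 == 0 for d in degree.values())

triangle = [(1, 2), (2, 3), (3, 1)]
path = [(1, 2), (2, 3)]
assert is_binary_circulation(triangle)
assert not is_binary_circulation(path)
```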

Examples of Binary Circulations

For this graph, some binary circulations:
  Φ1 shown in green
  Φ2 shown in red
  Φ3 shown in orange
  another one is the empty graph (Φ ≡ 0)


Get To Know Your Cycle Space

Lemma 1: If F1 and F2 are binary circulations, so is F1 xor F2

Lemma 2: If e is a cut edge and F is a binary circulation, then e is not in F

Lemma 3: If {e, f} is a cut pair, and F is a binary circulation, then either (1) both e, f are in F or (2) neither e nor f is in F
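The three lemmas can be sanity-checked by brute force on a small graph. The sketch below enumerates every edge subset, so it is only meant for tiny examples; the test graph (two triangles joined by a bridge) and all function names are illustrative. It assumes a cut pair means two non-bridge edges whose joint removal disconnects the graph.

```python
from itertools import combinations

def connected(nodes, edges):
    """Depth-first search connectivity test."""
    adj = {v: [] for v in nodes}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(nodes)

def is_even(F):
    deg = {}
    for u, v in F:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return all(d % 2 == 0 for d in deg.values())

def check_lemmas(nodes, edges):
    circs = [frozenset(F) for r in range(len(edges) + 1)
             for F in combinations(edges, r) if is_even(F)]
    circ_set = set(circs)
    # Lemma 1: closed under xor (symmetric difference of edge sets).
    lemma1 = all((a ^ b) in circ_set for a in circs for b in circs)
    bridges = [e for e in edges
               if not connected(nodes, [x for x in edges if x != e])]
    # Lemma 2: no circulation contains a cut edge.
    lemma2 = all(e not in F for e in bridges for F in circs)
    pairs = [(e, f) for e, f in combinations(edges, 2)
             if e not in bridges and f not in bridges
             and not connected(nodes, [x for x in edges if x not in (e, f)])]
    # Lemma 3: a circulation contains both edges of a cut pair or neither.
    lemma3 = all((e in F) == (f in F) for e, f in pairs for F in circs)
    return len(circs), bridges, len(pairs), lemma1, lemma2, lemma3
```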

b-bit circulations

To denote many binary circulations (Φ1,…,Φb) at once:
b-bit circulation: function Φ: E → {0,1}^b where the i-th bit of Φ(e) is Φi(e)
e.g. for edges e* and f* & Φ1, Φ2, Φ3 as before, Φ(e*)=001, Φ(f*)=111

[Figure: the example graph with edges e* and f* labelled]
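A small sketch of the packing, writing bits left to right in the order Φ1, Φ2, … as in the 001 / 111 example. The graph used here (a square 1-2-3-4 with diagonal 1-3) and the three circulations are made up for illustration; they are not the ones in the slide's figure.

```python
def pack(circulations, edges):
    """Map each edge to a bit string whose i-th character is '1'
    iff the edge belongs to the i-th binary circulation."""
    return {e: "".join("1" if e in F else "0" for F in circulations)
            for e in edges}

edges = [(1, 2), (2, 3), (3, 4), (1, 4), (1, 3)]
phi1 = {(1, 2), (2, 3), (1, 3)}            # triangle 1-2-3
phi2 = {(1, 3), (3, 4), (1, 4)}            # triangle 1-3-4
phi3 = {(1, 2), (2, 3), (3, 4), (1, 4)}    # outer square
phi = pack([phi1, phi2, phi3], edges)
```

Here the diagonal `(1, 3)` gets the label `110` because it lies in the first two circulations but not the third.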

Constructing Binary Circulations

For spanning tree T, E\T = “non-tree edges”
Claim: for any T and subset S of E\T, a unique subset S’ of T exists so that S ∪ S’ is a circulation
  S ∪ S’ is “unique completion” of “partial circulation” S
Corollary: given b-bit values on each non-tree edge, there exists a unique assignment of values to tree edges that makes a b-bit circulation
Next: proof/implementation of claim

[Figure: spanning tree T with a partial circulation S on non-tree edges and its completion S’]

Binary Circulation Construction (Completion)

Fixed: which edges of E\T to include
Idea: for uv in T, v a leaf: conservation at v determines if uv should be included
repeat! height(T) distributed rounds
at end each v knows incident Φ values

[Figure: tree T & edges of E\T to include or exclude; leaf edges uv marked “Must include” or “Must exclude”]
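A sequential sketch of the completion step, assuming the tree is a `parent` dict (`None` for the root) and S is a set of non-tree edges given as `(u, v)` tuples. Processing the deepest nodes first plays the role of repeatedly peeling leaves; the function name and data layout are illustrative only.

```python
def complete_circulation(parent, S):
    """Return S', the tree edges (v, parent[v]) that make S ∪ S' even."""
    parity = {v: 0 for v in parent}      # degree parity from edges chosen so far
    for u, v in S:
        parity[u] ^= 1
        parity[v] ^= 1

    def depth(v):
        d = 0
        while parent[v] is not None:
            v, d = parent[v], d + 1
        return d

    S_prime = set()
    # Deepest first: when v is handled, all edges below v are already decided,
    # so conservation at v forces the decision for the edge to its parent.
    for v in sorted(parent, key=depth, reverse=True):
        p = parent[v]
        if p is not None and parity[v]:
            S_prime.add((v, p))          # must include: fixes odd degree at v
            parity[p] ^= 1
    return S_prime

parent = {1: None, 2: 1, 3: 1, 4: 2, 5: 4}
```

With this tree, the single non-tree edge `(2, 3)` is completed by the tree edges `(2, 1)` and `(3, 1)`, which together form the cycle 1-2-3.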

Random Binary Circulations

Where randomness comes into play:
  include each non-tree edge w/ indep. prob. ½
  then, compute completion
Fact: Pr[we obtain Φ*] = 2^{-(E-V+1)}, for any binary circulation Φ*
  (T has V-1 edges, so there are E-V+1 independent coin flips)
  So all binary circulations are equally likely
Distributed implementation easy
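A sketch of the sampling step together with a check of the count behind the Fact: distinct choices of non-tree edges give distinct circulations, so a connected graph has exactly 2^{E-V+1} of them. The `parent`-dict tree, the example graph, and the function names are assumptions for illustration.

```python
import random
from itertools import combinations

def complete(parent, S):
    """Unique completion of the non-tree edge set S into a binary circulation."""
    parity = {v: 0 for v in parent}
    for u, v in S:
        parity[u] ^= 1
        parity[v] ^= 1

    def depth(v):
        d = 0
        while parent[v] is not None:
            v, d = parent[v], d + 1
        return d

    chosen = set()
    for v in sorted(parent, key=depth, reverse=True):   # deepest nodes first
        if parent[v] is not None and parity[v]:
            chosen.add((v, parent[v]))
            parity[parent[v]] ^= 1
    return frozenset(S) | chosen

def random_circulation(parent, non_tree, rng):
    """Flip a fair coin per non-tree edge, then complete along the tree."""
    S = [e for e in non_tree if rng.random() < 0.5]
    return complete(parent, S)

def all_circulations(parent, non_tree):
    """Enumerate the completion of every subset of non-tree edges."""
    return {complete(parent, S)
            for r in range(len(non_tree) + 1)
            for S in combinations(non_tree, r)}

# Graph with V = 5 nodes and E = 6 edges: 4 tree edges plus 2 non-tree edges.
parent = {1: None, 2: 1, 3: 1, 4: 2, 5: 4}
non_tree = [(2, 3), (3, 4)]
```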

Application 1: Cut Edges

Folklore: for circ. Φ & cut edge e, Φ(e)=0.
Conversely, with a little work we can show:
  For random binary (resp. b-bit) Φ, if e is not a cut edge, Pr[Φ(e)=0] = ½ (resp. (½)^b)

[Figure: vertex set split into S and V\S; edge e lies in the cut δ(S)]

Application 1: Cut Edges

Distributed algorithm:
  Get random b-bit circulation Φ, b = 3lg(V)
  Output that each e is a cut edge if Φ(e)=0
Analysis:
  For cut edge e, Φ(e)=0
  For non-cut edge e, Pr[Φ(e)=0] = 2^{-b} = V^{-3}
  Union bound over at most E < V^2 edges: correct with prob. at least 1-1/V
  O(D) distributed time, using BFS tree
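A sequential Python sketch of this Monte Carlo cut-edge test. It assumes a connected simple graph whose edges are `(u, v)` tuples with u < v, rounds lg(V) up to an integer, and uses a BFS tree from a caller-chosen root; these choices and all names are illustrative, not prescribed by the slides.

```python
import math
import random
from collections import deque

def random_circulation(nodes, edges, root, b, rng):
    """Random b-bit label per non-tree edge, completed along a BFS tree."""
    adj = {v: [] for v in nodes}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    parent, order, queue = {root: None}, [root], deque([root])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in parent:
                parent[w] = u
                order.append(w)
                queue.append(w)
    phi, acc = {}, {v: 0 for v in nodes}   # acc[v]: xor of labels decided at v
    for u, v in edges:
        if parent.get(u) != v and parent.get(v) != u:   # non-tree edge
            phi[(u, v)] = rng.getrandbits(b)
            acc[u] ^= phi[(u, v)]
            acc[v] ^= phi[(u, v)]
    for v in reversed(order[1:]):          # children before parents
        p = parent[v]
        phi[(min(v, p), max(v, p))] = acc[v]   # conservation at v
        acc[p] ^= acc[v]
    return phi

def find_cut_edges(nodes, edges, root, rng):
    """Report every edge whose label is 0; wrong only with small probability."""
    b = 3 * math.ceil(math.log2(len(nodes)))
    phi = random_circulation(nodes, edges, root, b, rng)
    return sorted(e for e in edges if phi[e] == 0)

# Two triangles joined by the bridge (3, 4).
nodes = [1, 2, 3, 4, 5, 6]
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
result = find_cut_edges(nodes, edges, 1, random.Random(2008))
```

The bridge `(3, 4)` is always reported; a non-cut edge is reported only if its random label happens to be 0.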

Application 2: Cut Pairs

WLOG G has no cut edges
With a little work we can show:
  For random b-bit circulation Φ, Pr[Φ(e)=Φ(f)] is 1 if {e,f} is a cut pair, 2^{-b} otherwise

[Figure: vertex set split into S and V\S; edges e and f cross the cut δ(S)]

Application 2: Cut Pairs

Sketch of algorithm:
  Generate a 5lg(V)-bit random circulation Φ
  Sort all edges using Φ(e) as key for e
  Output “cut pairs are {{e,f} | Φ(e)=Φ(f)}”
Each pair is correct with probability 1-V^{-5}
Thus probability of failure < E^2 V^{-5} < 1/V
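A sequential sketch of the sort-and-group step. It assumes a connected simple graph with no cut edges, edges as `(u, v)` tuples with u < v, and lg(V) rounded up; it uses Python's built-in sort rather than a linear-time sort, and returns groups of edges with equal labels (every pair inside a group is reported as a cut pair). All names are illustrative.

```python
import math
import random
from collections import deque
from itertools import groupby

def random_circulation(nodes, edges, root, b, rng):
    """Random b-bit label per non-tree edge, completed along a BFS tree."""
    adj = {v: [] for v in nodes}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    parent, order, queue = {root: None}, [root], deque([root])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in parent:
                parent[w] = u
                order.append(w)
                queue.append(w)
    phi, acc = {}, {v: 0 for v in nodes}
    for u, v in edges:
        if parent.get(u) != v and parent.get(v) != u:   # non-tree edge
            phi[(u, v)] = rng.getrandbits(b)
            acc[u] ^= phi[(u, v)]
            acc[v] ^= phi[(u, v)]
    for v in reversed(order[1:]):          # children before parents
        p = parent[v]
        phi[(min(v, p), max(v, p))] = acc[v]
        acc[p] ^= acc[v]
    return phi

def cut_classes(nodes, edges, root, rng):
    """Group edges by label; groups of size >= 2 are reported."""
    b = 5 * math.ceil(math.log2(len(nodes)))
    phi = random_circulation(nodes, edges, root, b, rng)
    by_label = sorted(edges, key=lambda e: phi[e])      # sort with Φ(e) as key
    groups = [list(g) for _, g in groupby(by_label, key=lambda e: phi[e])]
    return [sorted(g) for g in groups if len(g) >= 2]

# Two triangles sharing vertex 3: any two edges of one triangle form a cut pair.
nodes = [1, 2, 3, 4, 5]
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (3, 5), (4, 5)]
expected = [[(1, 2), (1, 3), (2, 3)], [(3, 4), (3, 5), (4, 5)]]
```

The output is wrong only when two edges that do not form a cut pair receive the same random label.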

Cut Pairs: Details

Cut pairs can be described more compactly by cut classes
  Idea: if {e,f} and {f,g} are cut pairs, so is {e,g}
To get linear-time sequential algorithm use linear-time sort, e.g. radix sort

Major distributed hurdle: not easy to find all edge pairs {e,f} with Φ(e)=Φ(f)!

In Closing

Notice that we’re gambling
  Gives wrong answer with probability ~1/|V|
  Always fast, usually correct: Monte Carlo
We can convert it to one which checks output for correctness and starts over in the event of an error
  Always correct, usually fast: Las Vegas
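The conversion pattern can be sketched generically. Everything here is hypothetical: `las_vegas`, its `max_tries` guard, and the toy `flaky_answer` stand-in (right 3 times out of 4) only illustrate "check, and start over on error"; they are not the checker used for the cut algorithms.

```python
import random

def las_vegas(monte_carlo, is_correct, rng, max_tries=1000):
    """Rerun a Monte Carlo procedure until its output passes verification.

    Returns (verified answer, number of attempts used).
    """
    for attempt in range(1, max_tries + 1):
        answer = monte_carlo(rng)
        if is_correct(answer):
            return answer, attempt
    raise RuntimeError("no verified answer within max_tries")

def flaky_answer(rng):
    """Toy stand-in: returns the right value 42 with probability 3/4."""
    return 42 if rng.random() < 0.75 else rng.randrange(42)

answer, attempts = las_vegas(flaky_answer, lambda a: a == 42, random.Random(1))
```

The returned answer is always verified; only the number of attempts is random, and its expectation is 1/p when each run succeeds with probability p.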