Theoretical Computer Science in a Nutshell
David Pritchard
CCC Second Stage 2008
Outline
Theoretical Computer Science
  What’s the deal with research?
  Models, Techniques and Algorithms
Distributed Computing Model
  Motivation & Definition
A Randomized, Distributed Algorithm (get friendly with your Cycle Space)
Theoretical Computer Science (TCS) in a Nutshell
You may already know about algorithms and data structures in the “RAM model” (BFS, DFS, Dijkstra, Floyd-Warshall, Euclidean, quicksort, binary search, flows…)
This is only the tip of the iceberg in TCS
TCS’s flavour: mathy (cool ideas and proofs) but applicable to real problems
One-slide TCS Taxonomy
Algorithms/data structures in many models
  Sequential (RAM, FSA, Turing machines)
  Parallel (dual-core, parallel RAM [PRAM])
  Distributed (cluster/distributed computing)
Complexity: P, NP, coNP, PH, PP, #P, …
Approximation and randomized algorithms
Cryptography, quantum, geometry…
Why study TCS?
Immediately applicable bits (Google maps, credit cards, operations research)
Determine fundamental limitations on our power of computation (halting problem)
My view:
  combines the most interesting parts of mathematics and computer programming
  lots of room for creativity
  natural field to study if you like contest problems (but in research, you don’t know if there’s a nice answer)
Part 2: Distributed Computing Model
Distributed Computing Model
Graph (V, E) = network of “computers”
Nodes store data and perform computations
Edges relay messages between nodes
Distributed Computing Model
Goal: want the graph to compute properties of its initially unknown shape
  e.g. shortest path from 5F1 to 308
  e.g. max flow (bandwidth) from 5F1 to 308
Motivating situations: internet, ad-hoc wireless networks, cellular telephone networks, sensor nets, social networks
[Figure: example network; two of the nodes are labelled 5F1 and 308]
Formal Definition of Model
Unique ID (1, 2, 3…) for each node; #1 is the leader
Initially nodes only know their own ID and the ID of each of their neighbours
In each round every node can send an O(log |V|)-bit message to each neighbour
Messages are received next round
Node has ∞ storage & power between rounds
Need to design a local program, a copy of which will run at each node, to achieve the goal
Formal Definition of Complexity
The time complexity of an algorithm is the number of rounds that elapse before termination
The message complexity is the total number of messages that are sent
We don’t care about time/space requirements at individual nodes
Distributed Computing 101
What if IDs are “ugly”?
  e.g. in a sensor network, (3D8, 1FE…) instead of (1, 2…)
  or if the graph is not connected
  Need a leader election algorithm
How can we communicate to all nodes?
How can we count # nodes?
How can we adapt to edge/node failures?
Basic Problem: Spanning Tree
Required: mark a subset of edges so that there is exactly one path from each node to the leader
  “Mark”: each node keeps a list of which of its adjacent edges are in the tree
Each non-leader must know its parent
  Again, each node stores parent and child IDs
Solution: (Breadth-First) Spanning Tree Algorithm
1. Initialize only the leader to be in the tree
2. In each round, at node v:
   if this is the first round in which v is in the tree, send a msg to each neighbour asking it to join the tree
   else if v is not in the tree and v got a msg from a neighbour u, add v and edge uv to the tree and set u to be the parent of v
3. (Stop when all nodes are in the tree)
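The rounds above can be simulated sequentially. The following is a minimal sketch, not the slides' own code: the function name `build_bfs_tree`, the adjacency-dictionary input, and the rule "pick the smallest sender ID when several join requests arrive" are all assumptions made for illustration.

```python
def build_bfs_tree(adj, leader):
    """Simulate the round-based tree construction; returns (parent, rounds)."""
    parent = {leader: None}          # nodes currently in the tree
    joined_last_round = [leader]     # nodes whose first round in the tree is now
    rounds = 0
    while joined_last_round:
        # Each newly joined node sends a "join" message to every neighbour.
        inbox = {}
        for u in joined_last_round:
            for v in adj[u]:
                inbox.setdefault(v, []).append(u)
        rounds += 1
        # Messages arrive next round; a node not yet in the tree picks one sender.
        newly_joined = []
        for v, senders in inbox.items():
            if v not in parent:
                parent[v] = min(senders)   # tie-break by smallest ID (assumption)
                newly_joined.append(v)
        joined_last_round = newly_joined
    return parent, rounds


# Hypothetical 5-node network with node 1 as leader.
adj = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3, 5], 5: [4]}
parent, rounds = build_bfs_tree(adj, leader=1)
```

In this sketch the loop stops once a round adds no new node, which plays the role of step 3; a real distributed implementation would need an explicit termination signal.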
Illustration of Spanning Tree Construction
[Figure legend: computer; leader; edge w/ msg sent; tree edge (head = parent)]
Done! Now… can use the tree to broadcast a msg from the leader to all nodes, or to take a census
Distributed Census Algorithm
Each leaf node reports “1” to its parent
For each nonleaf, sum the reports from children, add 1, and send to parent
[Figure: census on an example tree; leaves report 1 and the partial counts accumulate towards the leader]
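A sequential sketch of the census, assuming the tree is given as a hypothetical `parent` dictionary (with `None` for the leader); the function name `census` is likewise an assumption. It mirrors the rule "sum the children's reports, add 1, send to parent".

```python
def census(parent):
    """Count nodes by summing reports up the tree; parent[leader] is None."""
    children = {v: [] for v in parent}
    root = None
    for v, p in parent.items():
        if p is None:
            root = v
        else:
            children[p].append(v)
    pending = {v: len(children[v]) for v in parent}  # reports still awaited
    total = {v: 1 for v in parent}                   # each node counts itself
    ready = [v for v in parent if pending[v] == 0]   # leaves report first
    while ready:
        v = ready.pop()
        p = parent[v]
        if p is None:
            continue              # the leader keeps the final count
        total[p] += total[v]      # parent adds the child's report
        pending[p] -= 1
        if pending[p] == 0:       # all children heard from: p can report
            ready.append(p)
    return total[root]


node_count = census({1: None, 2: 1, 3: 1, 4: 2, 5: 4})
```

Each node reports exactly once, so the message count is |V| - 1, and the number of distributed rounds is bounded by the height of the tree.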
Time Analysis
Construction of T, broadcast, and census take time proportional to height(T)
  Also proportional to the diameter
  Diam of network := max distance between any 2 nodes
Compare: the sequential model always has time complexity >= |E| due to reading the input
Diam can be much smaller than |V|, |E|
Part 3: Randomized (Distributed) Algorithm for Cut Edges/Pairs
Types of Cuts in Graphs
A cut is a part of a connected graph that, when deleted, makes it disconnected
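Cut edge (“bridge”): a single edge whose deletion disconnects the graph [Figure: example]
Cut pair: a pair of edges whose deletion together disconnects the graph [Figure: example]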
Motivation to find these: want to attack or reinforce a network
Part 3 Summary
I’ll show you a simple new approach that lets you find cut edges and cut pairs
Yields an O(E)-time RAM algorithm
  Older algorithms match this, but are complex
Yields O(Diam)-time distributed algorithms
  Beats previous best. In publication.
Tools: randomization, cycle space
The Cycle Space
An even graph has even degree at each vertex
For a graph (V, E), the cycle space is “all subsets F of E such that (V, F) is an even graph”
If F is in the cycle space, we call F a binary circulation
The cycle space is a vector space (algebra 101)
[Figure: example graph with a binary circulation F marked in red]
Examples of Binary Circulations
[Figure: an example graph and some of its binary circulations: Φ1 shown in green, Φ2 shown in red, Φ3 shown in orange]
Another one is the empty graph (Φ ≡ 0)
Get To Know Your Cycle Space
Lemma 1: If F1 and F2 are binary circulations, so is F1 xor F2
Lemma 2: If e is a cut edge and F is a binary circulation, then e is not in F
Lemma 3: If {e, f} is a cut pair and F is a binary circulation, then either (1) both e and f are in F or (2) neither e nor f is in F
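Lemma 1 can be checked on a small example. The sketch below is illustrative only: the graph (a square 1-2-3-4 plus the diagonal 1-3), the edge encoding as sorted tuples, and the helper name `is_even` are assumptions, not taken from the slides.

```python
from collections import Counter


def is_even(edge_set):
    """True if every vertex touched by edge_set has even degree."""
    degree = Counter()
    for u, v in edge_set:
        degree[u] += 1
        degree[v] += 1
    return all(d % 2 == 0 for d in degree.values())


# Two binary circulations of the hypothetical graph.
square = frozenset({(1, 2), (2, 3), (3, 4), (1, 4)})
triangle = frozenset({(1, 2), (2, 3), (1, 3)})

# Python's ^ on sets is the symmetric difference, i.e. edge-wise xor.
combined = square ^ triangle
```

Here `combined` is the triangle on vertices 1, 3, 4, which is again an even graph, as Lemma 1 predicts.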
b-bit circulations
To denote many binary circulations (Φ1, …, Φb) at once:
b-bit circulation: a function Φ: E → {0,1}^b where the ith bit of Φ(e) is Φi(e)
e.g. for edges e* and f* and Φ1, Φ2, Φ3 as before, Φ(e*) = 001, Φ(f*) = 111
[Figure: the example graph with edges e* and f* labelled]
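A small sketch of how b binary circulations bundle into one b-bit label per edge. The membership sets below are made up so that they reproduce the values Φ(e*) = 001 and Φ(f*) = 111 quoted above; the example graph itself is not recoverable from the slide text, and the function name `pack` is an assumption.

```python
def pack(circulations, edge):
    """Concatenate the bits Phi_1(e), ..., Phi_b(e) into one b-character string."""
    return "".join("1" if edge in phi_i else "0" for phi_i in circulations)


# Hypothetical membership: which of the two labelled edges each circulation uses.
phi1 = {"f*"}
phi2 = {"f*"}
phi3 = {"e*", "f*"}

label_e = pack([phi1, phi2, phi3], "e*")
label_f = pack([phi1, phi2, phi3], "f*")
```

In practice the label would more likely be stored as an integer so that xor acts on all b circulations at once; the string form is used here only to match the slide's notation.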
Constructing Binary Circulations
For spanning tree T, E\T = “non-tree edges”
Claim: for any T and any subset S of E\T, a unique subset S’ of T exists so that S ∪ S’ is a circulation
  S ∪ S’ is the “unique completion” of the “partial circulation” S
Corollary: given b-bit values on each non-tree edge, there exists a unique assignment of values to tree edges that makes a b-bit circulation
Next: proof/implementation of claim
[Figure: spanning tree with the chosen non-tree edges S and their completion S’ marked]
Binary Circulation Construction (Completion)
Fixed: which edges of E\T to include
Idea: for uv in T with v a leaf, conservation (even degree) at v determines whether uv should be included
Repeat! Takes h(T) distributed rounds, where h(T) is the height of T
At the end each v knows its incident Φ values
[Figure: tree T with the edges of E\T marked to include or exclude; at each step the tree edge uv above the current leaf v is labelled “Must include” or “Must exclude”]
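The completion step can be sketched sequentially as follows. This is an illustration under assumptions: the tree is given as a hypothetical `parent` dictionary, non-tree edges as vertex pairs, and nodes are processed deepest first in place of the slide's repeated leaf peeling.

```python
def complete_circulation(parent, chosen_non_tree):
    """Return the tree edges S' such that chosen_non_tree | S' is an even graph."""
    parity = {v: 0 for v in parent}      # parity of selected edges at each node
    for u, v in chosen_non_tree:
        parity[u] ^= 1
        parity[v] ^= 1

    def depth(v):
        d = 0
        while parent[v] is not None:
            v = parent[v]
            d += 1
        return d

    completion = set()
    # Deepest nodes first, so every child is settled before its parent.
    for v in sorted(parent, key=depth, reverse=True):
        p = parent[v]
        if p is not None and parity[v] == 1:
            completion.add((v, p))       # conservation at v forces edge vp in
            parity[v] ^= 1
            parity[p] ^= 1
    return completion


# Hypothetical tree (leader 1) with a single non-tree edge 3-4.
tree = {1: None, 2: 1, 3: 1, 4: 2, 5: 4}
s_prime = complete_circulation(tree, {(3, 4)})
```

With the non-tree edge 3-4 selected, the completion is the tree path between 3 and 4, so S ∪ S’ is the cycle 1-2-4-3-1. The leader needs no check: the total degree is even, so its parity is forced to zero.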
Random Binary Circulations
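Where randomness comes into play:
  include each non-tree edge with independent probability ½
  then, compute the completion
Fact: Pr[we obtain Φ*] = (½)^(E-V+1), for any binary circulation Φ*
  (there are E-V+1 non-tree edges, and each circulation is the completion of exactly one subset of them)
  So all binary circulations are equally likely
Distributed implementation is easy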
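The count behind this fact (a connected graph has exactly 2^(E-V+1) binary circulations) can be checked by brute force on a small graph. The sketch below is only a sanity check; the function name and the example graph are assumptions, and the enumeration is exponential in E.

```python
from itertools import combinations


def count_binary_circulations(n, edges):
    """Count edge subsets in which every vertex 0..n-1 has even degree."""
    count = 0
    for k in range(len(edges) + 1):
        for subset in combinations(edges, k):
            degree = [0] * n
            for u, v in subset:
                degree[u] += 1
                degree[v] += 1
            if all(d % 2 == 0 for d in degree):
                count += 1
    return count


# Hypothetical connected graph: a 4-cycle 0-1-2-3 plus the chord 0-2.
edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 0)]
n = 4
found = count_binary_circulations(n, edges)
expected = 2 ** (len(edges) - n + 1)
```

For this graph the four circulations are the empty set, the two triangles, and the outer 4-cycle.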
Application 1: Cut Edges
Folklore: for a circulation Φ and cut edge e, Φ(e) = 0
Conversely, with a little work we can show: for random binary (resp. b-bit) Φ, if e is not a cut edge, Pr[Φ(e)=0] = ½ (resp. (½)^b)
[Figure: a cut δ(S) between S and V\S, containing edge e]
Application 1: Cut Edges
Distributed algorithm:
  Get a random b-bit circulation Φ, b = 3 lg(V)
  Output that each e is a cut edge if Φ(e) = 0
Analysis:
  For cut edge e, Φ(e) = 0
  For non-cut edge e, Pr[Φ(e)=0] = 2^(-b) = V^(-3)
  Union bound: correct with probability at least 1 - 1/V
  O(D) distributed time, using a BFS tree
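A sequential sketch of this cut-edge test, assuming nodes are numbered 0..n-1 with node 0 as leader and the graph is connected. The function names, the edge-list input, and the optional `b` and `rng` parameters are assumptions for illustration; the output is Monte Carlo, so a non-cut edge is misreported with probability 2^(-b).

```python
import math
import random
from collections import deque


def random_circulation(n, edges, b, rng):
    """Return {edge_index: b-bit int} forming a random b-bit circulation."""
    adj = {v: [] for v in range(n)}
    for i, (u, v) in enumerate(edges):
        adj[u].append((v, i))
        adj[v].append((u, i))
    # BFS tree from the leader; parent_edge[v] is the tree edge towards the root.
    parent, parent_edge, order = {0: None}, {}, []
    queue = deque([0])
    while queue:
        u = queue.popleft()
        order.append(u)
        for v, i in adj[u]:
            if v not in parent:
                parent[v], parent_edge[v] = u, i
                queue.append(v)
    tree_edges = set(parent_edge.values())
    phi = {}
    acc = [0] * n    # xor of the labels already fixed on edges at each node
    for i, (u, v) in enumerate(edges):
        if i not in tree_edges:
            phi[i] = rng.getrandbits(b)   # independent fair bits per non-tree edge
            acc[u] ^= phi[i]
            acc[v] ^= phi[i]
    # Complete bottom-up: conservation at v fixes the label of v's parent edge.
    for v in reversed(order):
        if parent[v] is not None:
            phi[parent_edge[v]] = acc[v]
            acc[parent[v]] ^= acc[v]
    return phi


def find_cut_edges(n, edges, b=None, rng=None):
    """Report every edge whose label is all zeros."""
    rng = rng or random.Random()
    if b is None:
        b = 3 * max(1, math.ceil(math.log2(n)))   # b = 3 lg(V), as on the slide
    phi = random_circulation(n, edges, b, rng)
    return [edges[i] for i in range(len(edges)) if phi[i] == 0]


# Hypothetical graph: two triangles joined by the bridge (2, 3).
edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 5), (5, 3)]
bridges = find_cut_edges(6, edges, b=64, rng=random.Random(1))
```

The usage line passes b = 64 rather than the default so that a false positive is vanishingly unlikely on such a tiny graph; with the default b = 3 lg(V) the error bound is the 1/V from the analysis above.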
Application 2: Cut Pairs
WLOG G has no cut edges
With a little work we can show: for a random b-bit circulation Φ, Pr[Φ(e)=Φ(f)] is 1 if {e,f} is a cut pair, 2^(-b) otherwise
[Figure: a cut δ(S) between S and V\S, containing edges e and f]
Application 2: Cut Pairs
Sketch of algorithm:
  Generate a 5 lg(V)-bit random circulation Φ
  Sort all edges using Φ(e) as the key for e
  Output “cut pairs are {{e,f} | Φ(e)=Φ(f)}”
Each pair is classified correctly with probability at least 1 - V^(-5)
Thus probability of failure < E^2 · V^(-5) < 1/V
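The sort-and-group step can be sketched as below, assuming the labels Φ(e) are already available as a dictionary from edge to integer and that the graph has no cut edges (so no label is zero). The function names and the sample labels are hypothetical; Python's comparison sort stands in for the sort in the algorithm sketch.

```python
from itertools import combinations, groupby


def cut_classes(phi):
    """Group edges whose labels coincide; phi maps edge -> integer label."""
    by_label = sorted(phi, key=lambda e: phi[e])   # sort edges using phi(e) as key
    classes = []
    for _, group in groupby(by_label, key=lambda e: phi[e]):
        members = list(group)
        if len(members) >= 2:                      # singletons yield no pairs
            classes.append(members)
    return classes


def cut_pairs(phi):
    """List every pair {e, f} with phi(e) == phi(f)."""
    return [pair for cls in cut_classes(phi) for pair in combinations(cls, 2)]


# Made-up labels for five edges named a..e.
labels = {"a": 5, "b": 3, "c": 5, "d": 7, "e": 3}
classes = cut_classes(labels)
pairs = cut_pairs(labels)
```

Returning classes rather than pairs keeps the output linear in E even when one class contains many edges.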
Cut Pairs: Details
Cut pairs can be described more compactly by cut classes
  Idea: if {e,f} and {f,g} are cut pairs, so is {e,g}
To get a linear-time sequential algorithm, use a linear-time sort, e.g. radix sort
Major distributed hurdle: not easy to find all edge pairs {e,f} with Φ(e)=Φ(f)!
In Closing
Notice that we’re gambling
  Gives the wrong answer with probability ~1/|V|
  Always fast, usually correct: Monte Carlo
We can convert it to an algorithm which checks its output for correctness and starts over in the event of an error
  Always correct, usually fast: Las Vegas