25
CSE 326 Randomized Data Structures David Kaplan Dept of Computer Science & Engineering Autumn 2001

5.4 randomized datastructures

Embed Size (px)

Citation preview

CSE 326Randomized Data Structures

David Kaplan

Dept of Computer Science & EngineeringAutumn 2001

Randomized Data Structures

CSE 326 Autumn 20012

Motivation for

Randomized Data StructuresWe’ve seen many data structures with good average case performance on random inputs, but bad behavior on particular inputs

e.g. Binary Search Trees

Instead of randomizing the input (since we cannot!), consider randomizing the data structure

No bad inputs, just unlucky random numbers Expected case good behavior on any input

Randomized Data Structures

CSE 326 Autumn 20013

Average vs. Expected TimeAverage (1/N) xi

Expectation Pr(xi) xi

Deterministic with good average time If your application happens to always (or often) use the

“bad” case, you are in big trouble!

Randomized with good expected time Once in a while you will have an expensive operation, but

no inputs can make this happen all the time

Like an insurance policy for your algorithm!

Randomized Data Structures

CSE 326 Autumn 20014

Randomized Data Structures Define a property (or subroutine) in an algorithm Sample or randomly modify the property Use altered property as if it were the true

property

Can transform average case runtimes into expected runtimes (remove input dependency).

Sometimes allows substantial speedup in exchange for probabilistic unsoundness.

Randomized Data Structures

CSE 326 Autumn 20015

Randomization in Action Quicksort Randomized data structures

Treaps Randomized skip lists

Random number generation Primality testing

Randomized Data Structures

CSE 326 Autumn 20016

Treap Dictionary Data Structure

Treap is a BST binary tree property search tree

property

Treap is also a heap heap-order

property random priorities

1512

1030

915

78

418

67

29

prioritykey

Randomized Data Structures

CSE 326 Autumn 20017

Treap Insert Choose a random priority Insert as in normal BST Rotate up until heap order is restored

67

insert(15)

78

29

1412

67

78

29

1412

915

67

78

29

915

1412

Randomized Data Structures

CSE 326 Autumn 20018

Tree + Heap… Why Bother? Insert data in sorted order into a treap …

What shape tree comes out?

67

insert(7)

67

insert(8)

78

67

insert(9)

78

29

67

insert(12)

78

29

1512

Treap Delete Find the key Increase its value to Rotate it to the fringe Snip it off

delete(9)

67

78

29

915

1512

78

67

9

915

1512

78

67

915

9

1512

78

67

915

1512

9

78

67

915

1512

Randomized Data Structures

CSE 326 Autumn 200110

Treap Summary Implements Dictionary ADT

insert in expected O(log n) time delete in expected O(log n) time find in expected O(log n) time but worst case O(n)

Memory use O(1) per node about the cost of AVL trees

Very simple to implement little overhead – less than AVL trees

Randomized Data Structures

CSE 326 Autumn 200111

Perfect Binary Skip List Sorted linked list # of links of a node is its height The height i link of each node (that has one)

links to the next node of height i or greater

8

2

11

10

1913 20

22

2923

Randomized Data Structures

CSE 326 Autumn 200112

Find in a Perfect Binary Skip List Start i at the maximum height Until the node is found or i is one and

the next node is too large: If the next node along the i link is less than

the target, traverse to the next node Otherwise, decrease i by one

Runtime?

Randomized Data Structures

CSE 326 Autumn 200113

Randomized Skip List IntuitionIt’s far too hard to insert into a perfect skip

list

But is perfection necessary?

What matters in a skip list? We want way fewer tall nodes than short ones Make good progress through the list with each

high traverse

Randomized Data Structures

CSE 326 Autumn 200114

Randomized Skip List Sorted linked list # of links of a node is its height The height i link of each node (that has one)

links to the next node of height i or greater There should be about 1/2 as many height i+1

nodes as height i nodes

2 19 23

8

13

292010

22

11

Randomized Data Structures

CSE 326 Autumn 200115

Find in a RSL Start i at the maximum height Until the node is found or i is one and

the next node is too large: If the next node along the i link is less than

the target, traverse to the next node Otherwise, decrease i by one

Same as for a perfect skip list!

Runtime?

Randomized Data Structures

CSE 326 Autumn 200116

Insertion in a RSL Flip a coin until it comes up heads; that

takes i flips. Make the new node’s height i. Do a find, remembering nodes where we go

down Put the node at the spot where the find ends Point all the nodes where we went down (up

to the new node’s height) at the new node Point the new node’s links where those

redirected pointers were pointing

RSL Insert Example

2 19 23

8

13

292010 11

insert(22)with 3 flips

2 19 23

8

13

292010

22

11

Runtime?

Randomized Data Structures

CSE 326 Autumn 200118

Range Queries and IterationRange query: search for everything that falls between two values

find start point walk list at the bottom output each node until the end point

Iteration: successively return (in order) each element in the structure

start at beginning, walk list to end Just like a linked list!

Runtimes?

Randomized Data Structures

CSE 326 Autumn 200119

Randomized Skip ListSummary

Implements Dictionary ADT insert in expected O(log n) find in expected O(log n)

Memory use expected constant memory per node about double a linked list

Less overhead than search trees for iteration over range queries

Randomized Data Structures

CSE 326 Autumn 200120

Random Number GenerationLinear congruential generator

xi+1 = Axi mod M

Choose A, M to minimize repetitions M is prime (Axi mod M) cycles through {1, … , M-1} before

repeating Good choices:

M = 231 – 1 = 2,147,483,647A = 48,271

Must handle overflow!

Randomized Data Structures

CSE 326 Autumn 200121

Some Properties of PrimesIf:

P is prime 0 < X < P

Then: XP-1 = 1 (mod P) the only solutions to X2 = 1 (mod P) are:

X = 1 and X = P - 1

Randomized Data Structures

CSE 326 Autumn 200122

Checking Non-PrimalityExponentiation (pow function, Weiss 2.4.4) can be modified to detect non-primality (composite-ness)

// Computes x^n mod(M)

HugeInt pow(HugeInt x, HugeInt n, HugeInt M) {

if (n == 0) return 1;

if (n == 1) return x;

HugeInt squared = x * x % M;

// 1 < x < M - 1 && squared == 1 --> M isn’t prime!

if (isEven(n)) return pow(squared, n/2, M);

else return (pow(squared, n/2, M) * x);

}

Randomized Data Structures

CSE 326 Autumn 200123

Checking PrimalitySystematic algorithm

For prime P, for all A such that 0 < A < PCalculate AP-1 mod P using powCheck at each step of pow and at end for primality conditions

Randomized algorithmUse one random A If the randomized algorithm reports failure, then P isn’t prime. If the randomized algorithm reports success, then P might be

prime. At most (P-9)/4 values of A fool this prime check if algorithm reports success, then P is prime with probability > ¾

Each new A has independent probability of false positive We can make probability of false positive arbitrarily small

Randomized Data Structures

CSE 326 Autumn 200124

Evaluating Randomized Primality TestingYour probability of being struck by lightning this

year: 0.00004%Your probability that a number that tests prime 11

times in a row is actually not prime: 0.00003%

Your probability of winning a lottery of 1 million people five times in a row: 1 in 2100

Your probability that a number that tests prime 50 times in a row is actually not prime: 1 in 2100

Randomized Data Structures

CSE 326 Autumn 200125

Cycles

Prime numbers

Random numbers

Randomized algorithms