Randomized Algorithms
Pasi Fränti, 9.10.2015
Treasure island
A treasure worth 20,000 awaits on one of two islands, but the DAA expedition does not know which one. Each trip costs 5,000.
Map for sale: 3,000.
To buy or not to buy
Buy the map:
20,000 − 5,000 − 3,000 = 12,000
Take a chance:
20,000 − 5,000 = 15,000 (right island first)
20,000 − 5,000 − 5,000 = 10,000 (wrong island first)
Expected result: 0.5 ∙ 15,000 + 0.5 ∙ 10,000 = 12,500
Taking the chance is better in expectation (12,500 > 12,000).
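The comparison can be checked with a few lines of Python (values taken from the slide):

```python
treasure, trip, map_price = 20000, 5000, 3000

with_map = treasure - trip - map_price   # map shows the island: guaranteed
lucky    = treasure - trip               # right island on the first trip
unlucky  = treasure - 2 * trip           # wrong island first, then right one
expected_without_map = 0.5 * lucky + 0.5 * unlucky

print(with_map, expected_without_map)  # 12000 12500.0
```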
Three types of randomization
1. Las Vegas
   - Output is always a correct result
   - Result is not always found
   - Probability of success p
2. Monte Carlo
   - Result is always found
   - Result can be inaccurate (or even false!)
   - Probability of success p
3. Sherwood
   - Balances the worst-case behavior
Las Vegas
Dining philosophers
Who eats?
Las Vegas
Input: Bit-vector A[1,n]
Output: Index of any 1-bit from A

LasVegas(A, n) → index
    REPEAT
        k ← Random(1, n);
    UNTIL A[k]=1;
    RETURN k

Example: A = 1 0 0 1 0 0 0 0 0 0 1 0 …
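The pseudocode above translates directly to Python (0-based indexing instead of the slide's A[1,n]); note that the loop would never terminate on an all-zero vector:

```python
import random

def las_vegas_search(A):
    """Probe random positions until a 1-bit is found.
    The returned index is always correct, but the running time is random."""
    n = len(A)
    while True:
        k = random.randrange(n)      # k <- Random(1, n), 0-based here
        if A[k] == 1:
            return k

A = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0]
print(las_vegas_search(A))           # prints 0, 3 or 10
```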
8-Queens puzzle
INPUT: Eight chess queens and an 8×8 chessboard
OUTPUT: Setup where no queens attack each other
8-Queens brute force
Brute force
• Try all positions
• Mark illegal squares
• Backtrack at dead ends
• 114 setups in total

Random
• Select positions randomly
• If dead end, start over

Randomized
• Select k rows randomly
• Remaining rows by brute force
[Board diagrams: number of free squares remaining after each placement — where next?]
Pseudo code

8-Queens(k)
    FOR i=1 TO k DO                      // k queens randomly
        r ← Random(1, 8);
        IF Board[i,r] = TAKEN THEN RETURN Fail;
        ELSE ConquerSquare(i, r);
    FOR i=k+1 TO 8 DO                    // rest by brute force
        r ← 1; found ← NO;
        WHILE (r ≤ 8) AND (NOT found) DO
            IF Board[i,r] ≠ TAKEN THEN
                ConquerSquare(i, r); found ← YES;
            r ← r+1;
        IF NOT found THEN RETURN Fail;

ConquerSquare(i, j)
    Board[i,j] ← QUEEN;
    FOR z=i+1 TO 8 DO
        Board[z,j] ← TAKEN;
        Board[z,j−(z−i)] ← TAKEN;
        Board[z,j+(z−i)] ← TAKEN;
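A runnable sketch of the randomized 8-Queens in Python (0-based board; the names `place`, `eight_queens` and `solve` are mine). Like the slide's pseudocode, the brute-force phase is greedy without backtracking, so a Las Vegas restart loop is added on top of a single attempt:

```python
import random

FREE, QUEEN, TAKEN = 0, 1, 2

def place(board, row, col):
    """ConquerSquare: put a queen and mark the attacked squares below."""
    board[row][col] = QUEEN
    for z in range(row + 1, 8):
        for c in (col, col - (z - row), col + (z - row)):
            if 0 <= c < 8:
                board[z][c] = TAKEN

def eight_queens(k):
    """One attempt: k queens on random squares, the rest greedily.
    Returns the queen's column for each row, or None at a dead end."""
    board = [[FREE] * 8 for _ in range(8)]
    cols = []
    for row in range(k):                     # k queens randomly
        c = random.randrange(8)
        if board[row][c] != FREE:
            return None                      # Fail
        place(board, row, c)
        cols.append(c)
    for row in range(k, 8):                  # rest by brute force
        for c in range(8):
            if board[row][c] == FREE:
                place(board, row, c)
                cols.append(c)
                break
        else:
            return None                      # Fail
    return cols

def solve(k=3):
    """Las Vegas: if a dead end is reached, start over."""
    while True:
        cols = eight_queens(k)
        if cols is not None:
            return cols
```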
Probability of success

s = processing time in case of success
e = processing time in case of failure
p = probability of success
q = 1−p = probability of failure

Expected processing time:
t = p∙s + q∙(e + t)
⇒ (1−q)∙t = p∙s + q∙e
⇒ t = s + (q/p)∙e

Example: s = e = 1, p = 1/6:
t = 1 + (5/6)/(1/6) ∙ 1 = 1 + 5 = 6
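The formula t = s + (q/p)∙e can be sanity-checked by simulation; with s = e = 1 the expected total time is just the expected number of attempts, 1/p:

```python
import random

def mean_attempts(p, trials=100_000):
    """Average number of unit-cost attempts until the first success."""
    total = 0
    for _ in range(trials):
        attempts = 1
        while random.random() >= p:   # failure with probability q = 1 - p
            attempts += 1
        total += attempts
    return total / trials

print(mean_attempts(1/6))  # close to 6, as the formula predicts
```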
Experiments with varying k

 k      s      e      t      p
 0    114      -    114    100%
 1   39.6      -   39.6    100%
 2   22.5   36.7   25.2     88%
 3   13.5   15.1   29.0     49%
 4   10.3    8.8   35.1     26%
 5    9.3    7.3   46.9     16%
 6    9.1    7.0   53.5     14%
 7    9.0    7.0   56.0     13%
 8    9.0    7.0   56.0     13%

Fastest expected time: k = 2.
Random Swap Clustering
Two centroids, but only one cluster.
One centroid, but two clusters.
Swap-based clustering
Clustering by Random Swap
RandomSwap(X) → (C, P)
    C ← SelectRandomRepresentatives(X);
    P ← OptimalPartition(X, C);
    REPEAT T times
        (Cnew, j) ← RandomSwap(X, C);
        Pnew ← LocalRepartition(X, Cnew, P, j);
        (Cnew, Pnew) ← Kmeans(X, Cnew, Pnew);
        IF f(Cnew, Pnew) < f(C, P) THEN
            (C, P) ← (Cnew, Pnew);
    RETURN (C, P);
P. Fränti and J. Kivijärvi, "Randomised local search algorithm for the clustering problem", Pattern Analysis and Applications, 3 (4), 358–369, 2000.
Select random neighbor
Accept only if it improves
1. Random swap:
   cj ← xi,  j ← Random(1, M),  i ← Random(1, N)

2. Re-partition vectors from the old cluster:
   pi ← arg min(1≤k≤M) d(xi, ck)²  for all i with pi = j

3. Create the new cluster:
   pi ← arg min(k ∈ {pi, j}) d(xi, ck)²  for all i = 1, …, N
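A simplified, runnable sketch of Random Swap in plain Python: for brevity it re-partitions globally instead of using the O(N)-time LocalRepartition, uses squared Euclidean distance, and takes MSE as the objective function f. All function names here are mine.

```python
import random

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def optimal_partition(X, C):
    """Assign every vector to its nearest centroid."""
    return [min(range(len(C)), key=lambda j: dist2(x, C[j])) for x in X]

def centroids(X, P, M):
    """Recompute each centroid as the mean of its cluster."""
    C = []
    for j in range(M):
        members = [X[i] for i in range(len(X)) if P[i] == j]
        if members:
            C.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            C.append(random.choice(X))   # re-seed an empty cluster
    return C

def mse(X, C, P):
    return sum(dist2(X[i], C[P[i]]) for i in range(len(X))) / len(X)

def random_swap(X, M, T=200, kmeans_iters=2):
    """Swap one centroid to a random vector, re-partition, fine-tune with
    a couple of k-means iterations, keep the swap only if f improves."""
    C = random.sample(X, M)
    P = optimal_partition(X, C)
    for _ in range(T):
        Cn = list(C)
        Cn[random.randrange(M)] = random.choice(X)  # the swap
        Pn = optimal_partition(X, Cn)               # global re-partition
        for _ in range(kmeans_iters):               # local fine-tuning
            Cn = centroids(X, Pn, M)
            Pn = optimal_partition(X, Cn)
        if mse(X, Cn, Pn) < mse(X, C, P):
            C, P = Cn, Pn
    return C, P
```

On easy data (well-separated clusters) a few dozen swaps already reach a good solution.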
Clustering by Random Swap
Choices for swap
Swap is made from a centroid-rich area to a centroid-poor area.
O(M) clusters to be removed × O(M) clusters where to add
= O(M²) different choices in total
Select a proper centroid for removal:
– M clusters in total: premoval = 1/M.
Select a proper new location:
– N choices: padd = 1/N
– M of them significantly different: padd = 1/M
In total:
– M² significantly different swaps.
– Probability of each is pswap = 1/M².
– Open question: how many of these are good?
– Theorem: α of them are good for both addition and removal.
Probability for successful swap

Probability of not finding a good swap in T iterations:
q = (1 − α²/M²)^T

Estimated number of iterations:
T = log q / log(1 − α²/M²)
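For illustration, the exact T from the formula above and its simpler upper-bound approximation can be compared numerically (the values M = 15, α = 2, q = 0.001 are made up for this example):

```python
import math

def iterations_needed(M, alpha, q):
    """Exact T solving q = (1 - alpha^2/M^2)^T, together with the
    upper bound T <= (M^2/alpha^2) * ln(1/q)."""
    exact = math.log(q) / math.log(1 - alpha**2 / M**2)
    bound = (M**2 / alpha**2) * math.log(1 / q)
    return exact, bound

exact, bound = iterations_needed(M=15, alpha=2, q=0.001)
print(round(exact), round(bound))  # the bound slightly overestimates
```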
Clustering by Random Swap
Iterated T times.

Upper limit: since ln(1−x) ≤ −x,
T = ln q / ln(1 − α²/M²) ≤ −ln q ∙ M²/α² = ln(1/q) ∙ M²/α²

Lower limit similarly; resulting in:
T = Θ((M²/α²) ∙ ln(1/q))
Bounds for the iterations

Number of iterations needed:
T = O((M²/α²) ∙ ln(1/q))

Time complexity of a single step:
t = O(αN)

Total time:
t∙T = O(αN ∙ (M²/α²) ∙ ln(1/q)) = O((N∙M²/α) ∙ ln(1/q))
Total time complexity
[Plot: probability of success p (%) as a function of iterations T (0–300), dataset Bridge]
[Plot: MSE as a function of time (log scale, 0.1–1000) for Random Swap and Repeated k-means]
Time-distortion performance
Monte Carlo
Input: Bit-vector A[1,n], max iterations z
Output: An index of any 1-bit in A
MonteCarlo(A, n, z) → index
    i ← 0;
    REPEAT
        k ← Random(1, n);
        i ← i+1;
    UNTIL (A[k]=1) OR (i>z);
    RETURN k    // may be a wrong answer (A[k]=0) if the limit z was reached

Example: A = 1 0 0 1 0 0 0 0 0 0 1 0 …
Monte Carlo
Potential problems to be considered:
• Detecting prime numbers
• Calculating the integral of a function
To appear in the future…
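As a preview of the integration example, here is a minimal Monte Carlo sketch: an answer is always produced, but it is only approximately correct (the function name is mine):

```python
import random

def monte_carlo_integral(f, a, b, n=100_000):
    """Estimate the integral of f over [a, b] as the average of f at n
    uniformly random points, scaled by the interval length."""
    total = sum(f(random.uniform(a, b)) for _ in range(n))
    return (b - a) * total / n

# Integral of x^2 over [0, 1] is exactly 1/3
print(monte_carlo_integral(lambda x: x * x, 0.0, 1.0))
```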
Sherwood
• Used in Quicksort
• With a naïve pivot, already-sorted data always hits the worst case:
  partition sizes N−1, N−2, N−3, … ⇒ O(N²)
Selection of pivot element
Naïve: first or last item
[Diagram: already-sorted input; every partition is one-sided]
Selection of pivot element
Random item
[Diagram: with probability 50% a random pivot lands in the middle half, so both partitions have at most 75% of the items] ⇒ O(N log N)
• Worst case can still happen
• But with probability (1/n)^n
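A compact Sherwood-style quicksort in Python: by choosing the pivot at random, the expected O(N log N) time no longer depends on the input order (this list-based version trades the usual in-place partitioning for readability):

```python
import random

def quicksort(a):
    """Quicksort with a random pivot (Sherwood randomization)."""
    if len(a) <= 1:
        return a
    pivot = random.choice(a)                 # random pivot element
    left  = [x for x in a if x < pivot]
    mid   = [x for x in a if x == pivot]
    right = [x for x in a if x > pivot]
    return quicksort(left) + mid + quicksort(right)

print(quicksort([50, 33, 7, 5, 2, 1]))  # [1, 2, 5, 7, 33, 50]
```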
Simulated dynamic linked list
1. Sorted array
   - Search efficient: O(log N)
   - Insert and Delete slow: O(N)
2. Dynamically linked list
   - Insert and Delete fast: O(1)
   - Search inefficient: O(N)
Simulated dynamic linked list
Example

i     | 1 | 2 |  3 | 4 | 5 |  6 | 7
Value | 2 | 4 | 15 | 1 | 5 | 21 | 7
Next  | 2 | 5 |  6 | 1 | 7 |  0 | 3

Head = 4 (Next = 0 marks the end of the list).
Linked list simulated by the array: 1 → 2 → 4 → 5 → 7 → 15 → 21
SEARCH(A, x)
    i := A.HEAD;
    max := A[i].VALUE;
    FOR k := 1 TO √N DO
        j := RANDOM(1, N);
        y := A[j].VALUE;
        IF (max < y) AND (y ≤ x) THEN
            i := j; max := y;
    RETURN LinearSearch(A, x, i);
Simulated dynamic linked list
Divide-and-conquer with randomization
[Diagram: √N random breakpoints; the linear search starts from the biggest breakpoint ≤ x and proceeds to the value searched]
Analysis of the search
• Divide into √N segments
• Each segment has N/√N = √N elements (on average)
• Linear search within one segment
• Expected time complexity: √N + √N = O(√N)
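The search can be sketched in Python with the slide's example list, converted to 0-based indices with −1 as the end marker (so head index 4 becomes 3):

```python
import math
import random

# Array-simulated linked list 1 -> 2 -> 4 -> 5 -> 7 -> 15 -> 21
VALUE = [2, 4, 15, 1, 5, 21, 7]
NEXT  = [1, 4, 5, 0, 6, -1, 2]
HEAD  = 3

def search(value, nxt, head, x):
    """Sherwood search: probe sqrt(N) random cells, keep the largest
    value not exceeding x as the starting breakpoint, then follow the
    links linearly from there. Expected time O(sqrt(N))."""
    n = len(value)
    i, best = head, value[head]
    for _ in range(math.isqrt(n)):
        j = random.randrange(n)
        if best < value[j] <= x:
            i, best = j, value[j]
    while i != -1 and value[i] < x:    # linear search from the breakpoint
        i = nxt[i]
    return i if i != -1 and value[i] == x else -1
```

The random probes only shorten the final linear walk; correctness never depends on which cells were probed.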
Experiment with students
Data (N=100) consists of the numbers 1..100:
1 2 3 4 … 99 100
Select √N = 10 breaking points.
Searching for… 77