Copyright 2004-2006 Curt Hill Balance in Binary Trees Impact on Performance


Balance in Binary Trees

Impact on Performance


Tree Shape and Performance

• A tree that is balanced has excellent performance

• O(log₂ N) for:
  – Searches
  – Insertions
  – Deletions
• Only a hash table can beat this performance
  – But it has its own issues


What is balance?

• The notion is that the two sub-trees are of about the same size

• Thus a search eliminates half the tree in each examination

• Perfect balance:
  – For each node in the tree, the sizes of the two sub-trees differ by at most one
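The perfect-balance rule above can be checked mechanically. The following is a minimal sketch in Python; the `Node` class and the function names are illustrative assumptions, not part of the original slides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Hypothetical node type used only for this illustration.
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def size_if_perfect(node: Optional[Node]) -> int:
    """Return the number of nodes in the subtree, or -1 if any node
    in it has sub-tree sizes that differ by more than one."""
    if node is None:
        return 0
    left = size_if_perfect(node.left)
    right = size_if_perfect(node.right)
    if left < 0 or right < 0 or abs(left - right) > 1:
        return -1
    return left + right + 1


def is_perfectly_balanced(root: Optional[Node]) -> bool:
    return size_if_perfect(root) >= 0


# The seven-key tree rooted at 4, and a three-node chain.
balanced = Node(4, Node(2, Node(1), Node(3)), Node(6, Node(5), Node(7)))
chain = Node(1, None, Node(2, None, Node(3)))
print(is_perfectly_balanced(balanced), is_perfectly_balanced(chain))
```

The check visits every node once, so it runs in time proportional to the size of the tree.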


Probabilities
• What is the likelihood that a randomly built tree will have good performance characteristics?
• This is a difficult question
• The shape of a tree is dependent on the entry order of the nodes to be inserted
• Example:
  – Consider the integers 1-7 as the items to put in a tree
  – There are 7! = 5040 ways to order their input
    • 7 ways to choose first
    • 6 ways to choose second
    • etc.


What do we want?

[Figure: binary search tree with root 4, children 2 and 6, and leaves 1, 3, 5, 7]

A search must look at no more than 3 nodes


Example Continued
• There are two really bad ways to order the input:
  – In ascending order or descending order
  – There are only two of these but there are several others that are just as bad
  – Consider 1 7 6 5 4 3 2 or 1 2 3 7 6 5 4
• Bad in this case means that every node has at most one child


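What do we not want?
[Figure: two degenerate trees on the keys 1 to 7. One is the chain 1, 2, 3, 4, 5, 6, 7, labeled "Arrival in ascending order". The other is a different single-path tree, labeled "Equally bad".]
A search may have to look at as many as 7 nodes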


Negative Combinatorics
• There are two ways to choose the first item: the smallest or the largest
  – Each subsequent item also provides two ways:
  – The smallest remaining item
  – The largest remaining item
  – Therefore 2 * 2 * 2 * 2 * 2 * 2 * 1
  – That gives 64 orderings that produce a list
  – This is a 1.27% chance of a list
• A search of such a tree may look at as many as 7 nodes


Positive Combinatorics
• There is only one way to choose the root, it must be the 4
• There are two ways to choose the second: 2 or 6
• There are three ways to choose the third
  – If 2 was picked: the 6 or either child of 2 (1 or 3)
  – If 6 was picked: the 2 or either child of 6 (5 or 7)
• It gets exciting after that


Positive Combinatorics (continued)
• Sub-cases of the last three choices need to be examined
• These do not work well in this kind of presentation
• I believe that there are 80 out of 5040 (about 1.6%) permutations that yield a perfectly balanced tree
• However, most possibilities fall somewhere in between, with maximum path lengths between 3 and 7
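Both counts can be confirmed by brute force, since 5040 permutations is a small search space. The sketch below is an illustration in Python, not the author's method: it builds an unbalanced binary search tree for every ordering of 1 to 7 and tallies the resulting heights. The function names are assumptions made for the example.

```python
from collections import Counter
from itertools import permutations


def bst_height(keys):
    """Height, counted in nodes, of the unbalanced BST produced by
    inserting the distinct keys in the given order."""
    left, right = {}, {}          # child links, keyed by parent key
    root = keys[0]
    height = 1
    for k in keys[1:]:
        node, depth = root, 1
        while True:
            links = left if k < node else right
            if node not in links:
                links[node] = k   # k becomes a new leaf at depth + 1
                break
            node = links[node]
            depth += 1
        height = max(height, depth + 1)
    return height


def height_distribution(n=7):
    """Map each tree height to the number of insertion orders producing it."""
    return Counter(bst_height(p) for p in permutations(range(1, n + 1)))


dist = height_distribution(7)
total = sum(dist.values())
for h in sorted(dist):
    print(h, dist[h], f"{100 * dist[h] / total:.2f}%")
```

Height 7 corresponds to the list-shaped trees and height 3 to the perfectly balanced tree; every other ordering lands on heights 4, 5, or 6.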


Summary
• The worst case is a linked list, which is bad
  – The worst case is not very likely
• The best case is perfectly balanced
  – The best case is more likely, but still unlikely
• Empirical studies indicate that the average path length of an unbalanced tree is only about 39% longer than that of a perfectly balanced tree
• Balancing is hard and slows insertions and deletions
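The 39% figure is consistent with the standard analysis of trees built from random insertion orders: the expected cost of a successful search is 2(1 + 1/n)H_n − 3 comparisons, where H_n is the n-th harmonic number, and the ratio to a perfectly balanced tree tends toward 2 ln 2 ≈ 1.386. A minimal sketch in Python, with illustrative function names that are not from the slides:

```python
import math


def random_bst_avg(n):
    """Expected comparisons for a successful search in a BST built by
    inserting n distinct keys in uniformly random order."""
    harmonic = sum(1.0 / i for i in range(1, n + 1))
    return 2 * (1 + 1 / n) * harmonic - 3


def perfect_avg(k):
    """Average comparisons in a perfectly balanced tree with k full
    levels, i.e. n = 2**k - 1 nodes."""
    n = 2**k - 1
    return ((k - 1) * 2**k + 1) / n


for k in (3, 8, 16):
    n = 2**k - 1
    print(n, round(random_bst_avg(n) / perfect_avg(k), 3))

print("limit:", round(2 * math.log(2), 3))
```

The ratio approaches the limit slowly: for 7 keys it is only about 1.2, and for 65,535 keys it is roughly 1.36.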


When to Balance

• In most cases an unbalanced tree will perform quite adequately

• If the application fulfills the following two criteria then balancing could be considered
  – The data is large and the search performance impacts the program
  – The number of searches is large compared to insertions and deletions


Perfectly balanced trees

• Definition:
  – For each node the number of nodes in the left and right sub-trees differ by at most 1
• Balancing a tree is a recursive process that involves nodes from the leaves to the root
• It is usually the case that control information measuring the balance is placed in each node


Balance Again

• Balancing occurs in insertion and deletion, but not searches

• It is somewhat intricate so perfect balance is seldom used

• The ratio of searches to inserts and deletes must be very high

• Is there another definition of balance that gives good performance with less rebalancing?


Height Balanced

• Also known as AVL balance
  – Adelson-Velsky and Landis developed it and proved its desirability
• Definition:
  – The tree is balanced if for each node the heights of the two sub-trees differ by at most one

• It is the height of the tree that determines the worst case search
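The height-based definition can be tested the same way as perfect balance, comparing heights instead of sizes. This is a minimal sketch in Python; the `Node` class and helper names are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Hypothetical node type used only for this illustration.
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def height_if_avl(node: Optional[Node]) -> int:
    """Return the height of the subtree in nodes, or -1 if any node
    in it has sub-tree heights that differ by more than one."""
    if node is None:
        return 0
    left = height_if_avl(node.left)
    right = height_if_avl(node.right)
    if left < 0 or right < 0 or abs(left - right) > 1:
        return -1
    return 1 + max(left, right)


def is_avl(root: Optional[Node]) -> bool:
    return height_if_avl(root) >= 0


# Sub-tree sizes at the root are 3 and 1 (not perfectly balanced),
# but the heights are 2 and 1, so the tree is AVL balanced.
lopsided = Node(5, Node(2, Node(1), Node(4)), Node(7))
chain = Node(4, None, Node(5, None, Node(7)))
print(is_avl(lopsided), is_avl(chain))
```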


Digression on Search

• Consider searching an array
• On average the search requires ½N comparisons
• The worst case is N comparisons to find the last one or to show not found
• The average and worst case are quite different
• This is not the case for trees


Searching Trees
[Figure: binary search tree with root 4, children 2 and 6, and leaves 1, 3, 5, 7]

More than half the nodes are leaves at maximum depth. Worst case is three probes, but average case is only slightly less than three probes.
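The contrast between the array and the tree can be made concrete by counting probes for each of the seven keys. In this Python sketch a sorted list searched from its midpoint stands in for the perfectly balanced tree (the midpoint plays the role of the root); the function names are illustrative assumptions.

```python
def linear_probes(items, target):
    """Comparisons made by a front-to-back scan of the array."""
    for count, item in enumerate(items, start=1):
        if item == target:
            return count
    return len(items)


def tree_probes(items, target):
    """Comparisons made when the sorted list is searched like a
    perfectly balanced tree, probing the midpoint at each step."""
    lo, hi, probes = 0, len(items) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if items[mid] == target:
            return probes
        if target < items[mid]:
            hi = mid - 1
        else:
            lo = mid + 1
    return probes


keys = list(range(1, 8))
linear = [linear_probes(keys, k) for k in keys]
tree = [tree_probes(keys, k) for k in keys]
print("array: average", sum(linear) / 7, "worst", max(linear))
print("tree:  average", round(sum(tree) / 7, 2), "worst", max(tree))
```

For the array the average is 4 probes against a worst case of 7, while for the tree the average is 17/7, about 2.43, against a worst case of 3.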


AVL Trees Again
• Adelson-Velsky and Landis proved:
  – Worst case of an AVL tree is only 45% worse than perfectly balanced
  – Average case: insignificantly different from perfectly balanced
• Every perfectly balanced tree is also AVL balanced
• Far fewer rebalances, thus cheaper to construct
  – For the most part rebalancing occurs when really needed


Construction

• Consider the construction of the following tree

• Four types of rebalancing operation
  – RR single
  – LL single
  – LR double
  – RL double

• Add: 4 5 7 2 1 3 6


After 2 inserts

[Figure: root 4 with right child 5]

Still perfectly balanced


Insert 7

[Figure: root 4 with right child 5; 5 has right child 7]

Neither perfect nor AVL, rebalance is needed


Rotate Right

[Figure: root 4 with right child 5; 5 has right child 7]

Rebalance is needed – RR Single


After Rotate

[Figure: root 5 with children 4 and 7]
After rebalance


Insert 2

[Figure: root 5 with children 4 and 7; 4 has left child 2]
No problem


Insert 1

[Figure: root 5 with children 4 and 7; 4 has left child 2; 2 has left child 1]
Unbalanced the other way – do an LL single


Rebalance

[Figure: root 5 with children 2 and 7; 2 has children 1 and 4]
Rebalance complete – not perfect but AVL


Insert 3

[Figure: root 5 with children 2 and 7; 2 has children 1 and 4; 4 has left child 3]
A rebalance is again needed, but different


After Rotation
[Figure: root 4 with children 2 and 5; 2 has children 1 and 3; 5 has right child 7]
This requires LR double


Insert 6

[Figure: root 4 with children 2 and 5; 2 has children 1 and 3; 5 has right child 7; 7 has left child 6]
This requires RL double


Rotate 6-7

[Figure: root 4 with children 2 and 5; 2 has children 1 and 3; 5 has right child 6; 6 has right child 7]
This requires RL double


Rotate 5-6-7

[Figure: root 4 with children 2 and 6; 2 has children 1 and 3; 6 has children 5 and 7]
Now complete
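The whole construction can be replayed in code. What follows is a minimal sketch of AVL insertion in Python, written for illustration under the assumption that each node stores its own height; it is not taken from the slides, and every name in it is hypothetical. Naming conventions vary: here the RR case (insertion into the right sub-tree of a right child) is repaired by the function called `rotate_left`, and the LL case by `rotate_right`.

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right, self.height = key, None, None, 1


def height(node):
    return node.height if node else 0


def update(node):
    node.height = 1 + max(height(node.left), height(node.right))
    return node


def rotate_left(node):
    """Promote the right child; repairs the RR case."""
    r = node.right
    node.right, r.left = r.left, node
    update(node)
    return update(r)


def rotate_right(node):
    """Promote the left child; repairs the LL case."""
    l = node.left
    node.left, l.right = l.right, node
    update(node)
    return update(l)


def insert(node, key):
    """Insert key and return the (possibly new) root of this subtree."""
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    else:
        node.right = insert(node.right, key)
    update(node)
    balance = height(node.left) - height(node.right)
    if balance > 1:                      # left side too tall
        if key >= node.left.key:         # LR double: rotate the child first
            node.left = rotate_left(node.left)
        return rotate_right(node)        # LL single, or second half of LR
    if balance < -1:                     # right side too tall
        if key < node.right.key:         # RL double: rotate the child first
            node.right = rotate_right(node.right)
        return rotate_left(node)         # RR single, or second half of RL
    return node


def levels(root):
    """Keys of the tree, one list per level, left to right."""
    out, row = [], [root] if root else []
    while row:
        out.append([n.key for n in row])
        row = [c for n in row for c in (n.left, n.right) if c]
    return out


root = None
for key in [4, 5, 7, 2, 1, 3, 6]:
    root = insert(root, key)
print(levels(root))  # [[4], [2, 6], [1, 3, 5, 7]]
```

Inserting the keys in the order used above triggers each of the four cases once (RR at 7, LL at 1, LR at 3, RL at 6) and ends with the tree rooted at 4.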


The problem of balancing
• Implementing it requires extra information in each node
  – It measures the height of the sub-trees
• Even with an AVL tree there is substantial work to be done at insertion and deletion time
• Thus the ratio of searches to insertions and deletions needs to be high
  – Just not as high as for perfect balance


Synonyms

• AVL trees are sometimes loosely called Fibonacci trees; strictly, a Fibonacci tree is the AVL tree with the fewest nodes for its height
• The fact that heights may disagree by one leads to a strangely asymmetric tree
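The Fibonacci connection comes from counting the fewest nodes an AVL tree of a given height can have: such a tree has a root, one sparsest sub-tree of height h − 1, and one of height h − 2. A minimal sketch of that recurrence in Python, with an illustrative function name that is not from the slides:

```python
def min_avl_nodes(h):
    """Fewest nodes in an AVL tree of height h (height counted in nodes),
    using N(h) = N(h - 1) + N(h - 2) + 1 with N(0) = 0 and N(1) = 1."""
    if h <= 0:
        return 0
    prev, cur = 0, 1
    for _ in range(h - 1):
        prev, cur = cur, prev + cur + 1
    return cur


print([min_avl_nodes(h) for h in range(1, 8)])  # [1, 2, 4, 7, 12, 20, 33]
```

Each value is one less than a Fibonacci number. An AVL tree of height 5 therefore needs only 12 nodes, whereas a perfectly balanced tree holds 12 nodes in height 4.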


Is this balanced?

[Figure: an asymmetric binary search tree holding the keys 1 through 12]