Copyright 2004-2006 Curt Hill
Balance in Binary Trees
Impact on Performance
Tree Shape and Performance
• A tree that is balanced has excellent performance
• O(log₂ N) for:
  – Searches
  – Insertions
  – Deletions
• Only a hash table can beat this performance
  – But it has its own issues
What is balance?
• The notion is that the two sub-trees are of about the same size
• Thus a search eliminates about half the tree with each examination
• Perfect balance:
  – For each node in the tree, the sizes of the two sub-trees differ by at most one
Probabilities
• What is the likelihood that a randomly built tree will have good performance characteristics?
• This is a difficult question
• The shape of a tree is dependent on the entry order of the nodes to be inserted
• Example:
  – Consider the integers 1-7 as the items to put in a tree
  – There are 7! = 5040 ways to order their input
    • 7 ways to choose first
    • 6 ways to choose second
    • etc.
What do we want?
[Figure: binary search tree with root 4, children 2 and 6, and leaves 1, 3, 5, 7]
A search must look at no more than 3 nodes
Example Continued
• There are two really bad ways to order the input:
  – Ascending order or descending order
  – There are only two of these, but several other orders are just as bad
  – Consider 1 7 6 5 4 3 2 or 1 2 3 7 6 5 4
• Bad in this case means that every node has at most one child
What do we not want?
[Figure: two degenerate trees on the keys 1 to 7, each a single chain of seven nodes. One is labeled "Arrival in ascending order", the other "Equally bad".]
A search may have to look at as many as 7 nodes
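To make the contrast between the two shapes concrete, the sketch below builds a plain binary search tree with no rebalancing and counts how many nodes a search examines. The names `Node`, `insert`, `probes`, and `build` are illustrative and do not come from the slides.

```python
class Node:
    """One node of a plain (unbalanced) binary search tree."""
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key without any rebalancing and return the subtree root."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root


def probes(root, key):
    """Count the nodes examined while searching for key."""
    count = 0
    node = root
    while node is not None:
        count += 1
        if key == node.key:
            break
        node = node.left if key < node.key else node.right
    return count


def build(keys):
    """Build a tree by inserting the keys in the given order."""
    root = None
    for k in keys:
        root = insert(root, k)
    return root


good = build([4, 2, 6, 1, 3, 5, 7])   # one order that gives the balanced tree
bad = build([1, 2, 3, 4, 5, 6, 7])    # ascending arrival: effectively a linked list

print(max(probes(good, k) for k in range(1, 8)))  # 3
print(max(probes(bad, k) for k in range(1, 8)))   # 7
```

The same seven keys give a worst case of 3 probes or 7 probes depending only on arrival order.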
Negative Combinatorics
• There are two ways to choose the first item: the smallest or the largest
  – Each subsequent item also provides two ways:
    • The next item in ascending order (the smallest remaining)
    • The last item (the largest remaining)
  – Therefore 2 * 2 * 2 * 2 * 2 * 2 * 1
  – That is 64 orderings that produce a list
  – This is a 1.27% chance of a list
• In such a tree a search may have to look at all 7 nodes
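The count above can be written out as one short calculation. At each of the first six steps the next key must be either the smallest or the largest key not yet inserted, and the seventh key is forced:

```latex
\[
\underbrace{2 \cdot 2 \cdot 2 \cdot 2 \cdot 2 \cdot 2}_{\text{six free choices}} \cdot 1
  = 2^{6} = 64,
\qquad
\frac{64}{7!} = \frac{64}{5040} \approx 0.0127 = 1.27\%.
\]
```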
Positive Combinatorics
• There is only one way to choose the root: it must be the 4
• There are two ways to choose the second: 2 or 6
• There are three ways to choose the third
  – If 2 was picked: the 6 or any descendant of 2
  – If 6 was picked: the 2 or any descendant of 6
• It gets exciting after that
Positive Combinatorics
• Sub-cases of the last three choices need to be examined
• These do not work well in this kind of presentation
• I believe that 80 of the 5040 permutations (about 1.6%) yield a perfectly balanced tree
• However, most orderings fall somewhere in between, with maximum paths between 3 and 7
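Since the sub-cases are awkward to enumerate by hand, a brute-force check is one way to test the counts. The sketch below tries all 7! insertion orders and tallies the height of the resulting unbalanced tree. The function name `height_of_insertion_order` and the dictionary-based representation are assumptions made for brevity, not anything from the slides.

```python
from collections import Counter
from itertools import permutations


def height_of_insertion_order(keys):
    """Height (in nodes) of the unbalanced BST built by inserting keys in order."""
    left, right = {}, {}          # child links, indexed by the parent's key
    root = keys[0]
    height = 1
    for key in keys[1:]:
        node, depth = root, 1
        while True:
            children = left if key < node else right
            depth += 1            # depth of the slot below `node`
            if node in children:
                node = children[node]
            else:
                children[node] = key
                break
        height = max(height, depth)
    return height


histogram = Counter(height_of_insertion_order(p) for p in permutations(range(1, 8)))
total = sum(histogram.values())
for h in sorted(histogram):
    print(h, histogram[h], f"{100 * histogram[h] / total:.2f}%")
```

With seven keys, height 3 is only possible for the perfectly balanced tree and height 7 only for a list, so the first and last rows of the output correspond to the best and worst cases: 80 and 64 orderings out of 5040.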
Summary
• The worst case is a linked list, which is bad
  – The worst case is not very likely
• The best case is perfectly balanced
  – The best case is more likely, but still unlikely
• Empirical studies indicate that the average path length of an unbalanced tree is only about 39% longer than that of a perfectly balanced tree
• Balancing is hard and slows insertions and deletions
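The 39% figure agrees with a standard analytical result for trees built from keys arriving in random order, stated here from general knowledge rather than from the slides. For large N the average search path is about 2 ln N nodes, against about log₂ N for a perfectly balanced tree, so the ratio is:

```latex
\[
\frac{2 \ln N}{\log_2 N} = 2 \ln 2 \approx 1.386
\]
```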
When to Balance
• In most cases an unbalanced tree will perform quite adequately
• If the application fulfills the following two criteria then balancing could be considered
  – The data is large and the search performance impacts the program
  – The number of searches is large compared to insertion and deletion
Perfectly balanced trees
• Definition:
  – For each node, the numbers of nodes in the left and right sub-trees differ by at most 1
• Balancing a tree is a recursive process that involves nodes from the leaves to the root
• Control information that measures the balance is usually stored in each node
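The definition translates directly into a recursive check on sub-tree sizes. The following is a minimal sketch; `Node`, `size_if_perfect`, and `is_perfectly_balanced` are hypothetical names chosen for this example.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def size_if_perfect(node):
    """Return the subtree size, or -1 if some node violates perfect balance."""
    if node is None:
        return 0
    left = size_if_perfect(node.left)
    if left < 0:
        return -1
    right = size_if_perfect(node.right)
    if right < 0 or abs(left - right) > 1:
        return -1
    return left + right + 1


def is_perfectly_balanced(root):
    return size_if_perfect(root) >= 0


# Root 5 with children 2 and 7, where 2 has children 1 and 4.
lopsided = Node(5, Node(2, Node(1), Node(4)), Node(7))
# Root 4 with children 2 and 6 and leaves 1, 3, 5, 7.
full = Node(4, Node(2, Node(1), Node(3)), Node(6, Node(5), Node(7)))

print(is_perfectly_balanced(lopsided))  # False: sub-tree sizes 3 and 1 at the root
print(is_perfectly_balanced(full))      # True
```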
Balance Again
• Balancing occurs in insertion and deletion, but not searches
• It is somewhat intricate so perfect balance is seldom used
• The ratio of searches to inserts and deletes must be very high for it to pay off
• Is there another definition of balance that gives good performance with less rebalancing?
Height Balanced
• Also known as AVL balance
  – Named for Adelson-Velsky and Landis, who developed it and proved its desirable properties
• Definition:
  – The tree is balanced if, for each node, the heights of the two sub-trees differ by at most one
• It is the height of the tree that determines the worst-case search
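The height-based definition can be checked the same way as a size-based one, with heights in place of node counts. This is a sketch under the assumption that height is counted in nodes (an empty tree has height 0). The names `height_if_avl` and `is_avl_balanced` are illustrative.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def height_if_avl(node):
    """Return the subtree height in nodes, or -1 if some node breaks the AVL rule."""
    if node is None:
        return 0
    left = height_if_avl(node.left)
    if left < 0:
        return -1
    right = height_if_avl(node.right)
    if right < 0 or abs(left - right) > 1:
        return -1
    return 1 + max(left, right)


def is_avl_balanced(root):
    return height_if_avl(root) >= 0


# Sub-tree sizes at the root are 3 and 1, but the heights are 2 and 1.
tree = Node(5, Node(2, Node(1), Node(4)), Node(7))
# A chain 4, 5, 7 of right children: heights 0 and 2 at the root.
chain = Node(4, None, Node(5, None, Node(7)))

print(is_avl_balanced(tree))   # True
print(is_avl_balanced(chain))  # False
```

The first tree is not perfectly balanced by node count, yet it passes the height test, which is why the height rule triggers fewer rebalances.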
Digression on Search
• Consider searching an array sequentially
• On average a successful search requires about ½N comparisons
• The worst case is N comparisons, to find the last item or to show that the item is not present
• The average and worst case are quite different
• This is not the case for trees
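The ½N figure follows from a one-line average, assuming a sequential search in which the target is equally likely to be at any of the N positions:

```latex
\[
\frac{1}{N} \sum_{i=1}^{N} i = \frac{N + 1}{2} \approx \tfrac{1}{2} N,
\qquad
\text{worst case} = N .
\]
```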
Searching Trees
[Figure: binary search tree with root 4, children 2 and 6, and leaves 1, 3, 5, 7]
More than half the nodes are leaves at maximum depth. Worst case is three probes, but average case is only slightly less than three probes.
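For the seven-node tree the average can be computed directly, assuming each key is equally likely to be the target. One node is reached in one probe, two in two probes, and four in three probes:

```latex
\[
\frac{1 \cdot 1 + 2 \cdot 2 + 4 \cdot 3}{7} = \frac{17}{7} \approx 2.43,
\qquad
\text{worst case} = 3 .
\]
```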
AVL Trees Again
• Adelson-Velsky and Landis showed:
  – Worst case of an AVL tree is only 45% worse than perfectly balanced
  – Average case: insignificantly different from perfectly balanced
• Every perfectly balanced tree is also AVL balanced
• Far fewer rebalances are needed, thus cheaper to construct
  – For the most part rebalancing occurs only when really needed
Construction
• Consider the construction of the following tree
• Four types of rebalancing operation
  – RR single
  – LL single
  – LR double
  – RL double
• Add: 4 5 7 2 1 3 6
After 2 inserts
[Figure: root 4 with right child 5]
Still perfectly balanced
Insert 7
[Figure: chain 4, 5, 7, each node the right child of the one before]
Neither perfect nor AVL, rebalance is needed
Rotate Right
[Figure: chain 4, 5, 7, each node the right child of the one before]
Rebalance is needed – RR Single
After Rotate
[Figure: root 5 with left child 4 and right child 7]
After rebalance
Insert 2
[Figure: root 5 with children 4 and 7; 2 is the left child of 4]
No problem
Insert 1
[Figure: root 5 with right child 7 and a left chain 4, 2, 1]
Unbalanced the other way: do an LL single
Rebalance
[Figure: root 5 with children 2 and 7; 2 has children 1 and 4]
Rebalance complete: not perfect but AVL
Insert 3
[Figure: root 5 with children 2 and 7; 2 has children 1 and 4; 3 is the left child of 4]
A rebalance is again needed, but of a different kind
After Rotation
[Figure: root 4 with children 2 and 5; 2 has children 1 and 3; 7 is the right child of 5]
This required an LR double
Insert 6
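[Figure: root 4 with children 2 and 5; 2 has children 1 and 3; 7 is the right child of 5 and 6 is the left child of 7]
This requires RL double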
Rotate 6-7
[Figure: root 4 with children 2 and 5; 2 has children 1 and 3; on the right, a chain 5, 6, 7 of right children]
This requires RL double
Rotate 5-6-7
[Figure: root 4 with children 2 and 6; 2 has children 1 and 3; 6 has children 5 and 7]
Now complete
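The construction sequence 4 5 7 2 1 3 6 can be replayed in code. The sketch below is one common way to write AVL insertion, not necessarily the formulation the slides have in mind: each node stores its own height (an assumption; a balance factor would also work), and a `log` list records which of the four rebalancing operations fired for which key. All names are illustrative.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.height = 1           # height in nodes of the subtree rooted here


def height(node):
    return node.height if node else 0


def update(node):
    node.height = 1 + max(height(node.left), height(node.right))


def rotate_left(a):
    """Single rotation that lifts a's right child above a (used for RR)."""
    b = a.right
    a.right, b.left = b.left, a
    update(a)
    update(b)
    return b


def rotate_right(a):
    """Single rotation that lifts a's left child above a (used for LL)."""
    b = a.left
    a.left, b.right = b.right, a
    update(a)
    update(b)
    return b


def insert(node, key, log):
    """Insert key, rebalance on the way back up, and record rotations in log."""
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key, log)
    elif key > node.key:
        node.right = insert(node.right, key, log)
    else:
        return node               # duplicates are ignored in this sketch
    update(node)
    balance = height(node.left) - height(node.right)
    if balance > 1:               # left side too tall
        if key < node.left.key:
            log.append((key, "LL single"))
        else:
            log.append((key, "LR double"))
            node.left = rotate_left(node.left)
        return rotate_right(node)
    if balance < -1:              # right side too tall
        if key > node.right.key:
            log.append((key, "RR single"))
        else:
            log.append((key, "RL double"))
            node.right = rotate_right(node.right)
        return rotate_left(node)
    return node


def preorder(node):
    return [] if node is None else [node.key] + preorder(node.left) + preorder(node.right)


root, log = None, []
for key in [4, 5, 7, 2, 1, 3, 6]:
    root = insert(root, key, log)

print(log)             # [(7, 'RR single'), (1, 'LL single'), (3, 'LR double'), (6, 'RL double')]
print(preorder(root))  # [4, 2, 1, 3, 6, 5, 7]
```

The log shows one operation of each type, in the same order as the walk-through, and the preorder listing corresponds to the final tree with root 4.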
The problem of balancing
• Implementation requires extra information in the nodes
  – It measures the height of the sub-trees
• Even with an AVL tree there is substantial work to be done at insertion and deletion time
• Thus the ratio of searches to insertions and deletions needs to be high
  – Just not as high as for perfect balance
Synonyms
• AVL trees are sometimes loosely called Fibonacci trees; strictly, a Fibonacci tree is the AVL tree with the fewest nodes for its height
• The fact that heights may differ by one can lead to a strangely asymmetric tree
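The asymmetry is most extreme in the sparsest AVL tree of each height, where every node has one sub-tree of height h-1 and one of height h-2. This shape is what is usually meant by a Fibonacci tree, and its node count follows a Fibonacci-like recurrence. A small sketch, with the hypothetical name `min_avl_nodes` and height counted in nodes:

```python
def min_avl_nodes(height):
    """Fewest nodes an AVL tree of the given height (counted in nodes) can have."""
    if height <= 0:
        return 0
    if height == 1:
        return 1
    # Root, plus one sub-tree of height-1 and one of height-2, both as sparse as possible.
    return 1 + min_avl_nodes(height - 1) + min_avl_nodes(height - 2)


print([min_avl_nodes(h) for h in range(1, 7)])  # [1, 2, 4, 7, 12, 20]
```

A height-5 tree of this kind has 12 nodes, while a perfectly balanced tree of the same height could hold up to 31.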
Is this balanced?
[Figure: binary search tree on the keys 1 to 12]