dorthy-newman
Lecture # 30
Data Organization and Binary Search
Data Organization
Problem
• Huge amounts of information
• How do I find
– Information that I know I want
– Information related to what I want
• How do I understand
– Particular pieces of information
– The whole collection of information
Limitations
• Screen space
• Network bandwidth
– Bandwidth - how much information can be transmitted per second
• Human attention
Kinds of things to organize
• Menu items
– MS Word - about 150 menu items
• Text
– Pages in a book - 500
– Documents on the WWW - gazillions
• Images
– All of the pictures created in a commercial advertising company
Kinds of things to organize
• Sounds
– Sound tracks to all TV and radio news broadcasts
• Video
– A complete collection of classic movies
• Structured information (records)
– People
– Cars
– Students
– Electronic appliance parts
A question of scale
• 10 things
• 100 things - menu
• 1,000 things - files on your computer
• 10,000 things - students at a university
• 1,000,000 things - books in a library
• gazillion things - WWW pages
Three ways to find things
• Lists – arrays
• Trees – organize into categories
• Search – describe what you want and have the computer find it
The Phone Book Challenge
• How long will it take to find “Bill Lund” in the BYU Directory?
• How long will it take to find “422-8766” in the BYU Directory?
What Algorithm did you use to search the phone book?
• Where did you start?
• How many steps did it take?
• Is there a more efficient way?
Binary search - for “Goodrich”
• Step 1: Lower = 0, Upper = 10, Guess = (0+10)/2 = 5
• Step 2: Lower = 0, Upper = 5, Guess = (0+5)/2 = 2 (rounded down)
• Step 3: Lower = 2, Upper = 5, Guess = (2+5)/2 = 3 (rounded down)
• Step 4: Lower = 3, Upper = 5, Guess = (3+5)/2 = 4
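The Lower/Upper/Guess steps for “Goodrich” can be sketched in Python. This is a minimal illustration, not the lecture's own code: the function name binary_search and the names after “Goodrich” in the list are hypothetical, chosen so that the list has 11 entries (positions 0 to 10). Unlike the slides, this common variant moves Lower or Upper one position past a wrong guess, which guarantees the loop ends.

```python
def binary_search(sorted_items, target):
    """Return (index, guesses); index is None if target is absent."""
    lower, upper = 0, len(sorted_items) - 1
    guesses = []
    while lower <= upper:
        guess = (lower + upper) // 2  # integer division rounds down
        guesses.append(guess)
        if sorted_items[guess] == target:
            return guess, guesses
        if sorted_items[guess] < target:
            lower = guess + 1  # target is after the guess
        else:
            upper = guess - 1  # target is before the guess
    return None, guesses


# Hypothetical sorted list of 11 last names (positions 0..10).
names = ["Anderson", "Bilinski", "Clark", "Garcia", "Goodrich", "Hansen",
         "Jones", "Lund", "Miller", "Smith", "Young"]

index, guesses = binary_search(names, "Goodrich")
print(index, guesses)  # 4 [5, 2, 3, 4]
```

With this sample list the guesses are 5, 2, 3, 4, the same sequence as in the slides.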
Binary search
• If there are 64 things in a list, how many times can you divide that list in half?
– 32, 16, 8, 4, 2, 1
– 6 times
Binary search
• If there are 1024 things in a list, how many times can you divide that list in half?
– 512, 256, 128, 64, 32, 16, 8, 4, 2, 1
– 10 times
Binary search
• If the size of the list doubles, how many more steps are required in a binary search?
– 1
Binary search
• If there are N items in a list, then binary search takes about log2(N) steps
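The halving counts above can be checked with a short Python sketch. The helper name halvings is hypothetical; it simply counts how many times a list size can be halved before one item is left, and compares that with log2(N).

```python
import math


def halvings(n):
    """Count how many times n can be halved (rounding down) before reaching 1."""
    steps = 0
    while n > 1:
        n //= 2
        steps += 1
    return steps


for n in (64, 1024, 2048):
    print(n, halvings(n), math.log2(n))
# 64 6 6.0
# 1024 10 10.0
# 2048 11 11.0
```

Doubling the list from 1024 to 2048 adds exactly one step.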
Binary search
• Estimating log2(N)
– Count the number of digits and multiply by 2.5
– A rough rule of thumb; the exact values below are about 10, 20, and 30
• 1000
– 4*2.5 = 10 steps
• 1,000,000
– 7*2.5 = 17-18 steps
• 1,000,000,000
– 10*2.5 = 25 steps
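To see how close the digit-counting estimate comes, the sketch below compares it with log2(N) rounded up. The function names estimate_steps and exact_steps are hypothetical helpers for illustration.

```python
import math


def estimate_steps(n):
    """Slide rule of thumb: number of decimal digits times 2.5."""
    return len(str(n)) * 2.5


def exact_steps(n):
    """log2(n), rounded up to a whole number of steps."""
    return math.ceil(math.log2(n))


for n in (1000, 1_000_000, 1_000_000_000):
    print(n, estimate_steps(n), exact_steps(n))
# 1000 10.0 10
# 1000000 17.5 20
# 1000000000 25.0 30
```

The estimate is close for small lists and runs somewhat low as N grows, which is fine for a quick mental check.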
Provo/Orem phone book
• How long to find “Bill Lund”?
– About 5,000 entries in the BYU Directory
– log2(5000) is roughly 4*2.5 = 10 steps by the digit rule (the exact value is about 12.3)
How to find a phone number
• 920-3231
– 1 step
• 130-2313
– 11 steps
• Average?
– 5 steps
• Average N?
– N/2
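Finding a phone number in a list that is not sorted by number means checking entries one at a time. The sketch below counts those checks. The function name linear_search_steps is hypothetical, and so is the list: it places 920-3231 first and 130-2313 last to match the step counts above, and the 555-01xx entries are made-up fillers.

```python
def linear_search_steps(items, target):
    """Return how many entries are checked before target is found, or None."""
    for steps, item in enumerate(items, start=1):
        if item == target:
            return steps
    return None


# Hypothetical list of 11 numbers in name order, so the numbers are unsorted.
numbers = ["920-3231", "232-0312", "555-0101", "123-3123", "823-1242",
           "555-0102", "238-1234", "555-0103", "555-0104", "555-0105",
           "130-2313"]

print(linear_search_steps(numbers, "920-3231"))  # 1
print(linear_search_steps(numbers, "130-2313"))  # 11

# Average over every entry: (N + 1) / 2, which is roughly N/2.
average = sum(linear_search_steps(numbers, n) for n in numbers) / len(numbers)
print(average)  # 6.0
```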
Provo/Orem phone book
• How many steps to find a phone number?
– 5,000/2 = 2,500 average
• How can we improve this?
Sort the phone book by phone number
• What if I want to search on both name and number?
Using an Index
• Last Name index, built in sorted order
– Anderson, Bilinski, Clark, Garcia
• Phone number index, built in sorted order
– 123-3123, 130-2313, 232-0312, 238-1234
Search for Goodrich (Last Name index)
• Lower = 0, Upper = 10, Guess = 5: lower
• Lower = 0, Upper = 5, Guess = 2: above
• Lower = 2, Upper = 5, Guess = 3: above
• Lower = 3, Upper = 5, Guess = 4: above
Search for 823-1242 (Phone number index)
• Lower = 0, Upper = 10, Guess = 5: above
• Lower = 5, Upper = 10, Guess = 7: below
• Lower = 5, Upper = 7, Guess = 6: MATCH
Using an Index
• What about first name or city?
– Another index
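One way to picture several indices over the same data is the Python sketch below. It is an illustration under stated assumptions, not the lecture's implementation: the helper names build_index and lookup are hypothetical, and the pairing of last names with phone numbers in records is invented for the example. Each index is a sorted list of keys plus the position of the matching record, so the records themselves are stored only once.

```python
from bisect import bisect_left

# Hypothetical records; the name/number pairings are made up.
records = [
    {"last": "Clark", "phone": "920-3231"},
    {"last": "Anderson", "phone": "232-0312"},
    {"last": "Goodrich", "phone": "823-1242"},
    {"last": "Garcia", "phone": "123-3123"},
    {"last": "Bilinski", "phone": "238-1234"},
]


def build_index(records, field):
    """Return (sorted keys, record positions) for one field."""
    pairs = sorted((rec[field], pos) for pos, rec in enumerate(records))
    keys = [key for key, _ in pairs]
    positions = [pos for _, pos in pairs]
    return keys, positions


def lookup(index, records, key):
    """Binary-search the index for key; return the record or None."""
    keys, positions = index
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return records[positions[i]]
    return None


by_name = build_index(records, "last")
by_phone = build_index(records, "phone")

print(lookup(by_name, records, "Goodrich")["phone"])  # 823-1242
print(lookup(by_phone, records, "823-1242")["last"])  # Goodrich
```

Adding a search by first name or city would mean one more call to build_index on that field, with no change to the records.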
Data Organization Summary
• What are we organizing for?
• Scale
– 10 - 1,000 - 1,000,000 - 1,000,000,000
• Lists
– Unsorted: N/2 steps on average
– Sorted: log2(N) steps
– To estimate log2(N), count the digits and multiply by 2.5
• To access in many ways
– Use many indices into the same data