kellie-cameron
COMP 103
Bitsets
2
Sets, and more Sets!
Unsorted Array, Sorted Array, Linked List: O(n) for at least one of contains, add, remove
Binary Search Tree: O(log n) for everything, if balanced
Can we do even better?
BitSets, Hash tables: O(1) !!! Same cost, regardless of size!
3
More operations on Sets
Operations on single elements:
  contains
  add, remove
Operations on whole sets:
  size (cardinality) of a set
  iterate through the set
  intersection (values common to two sets)
  union (values in either of two sets)
  set difference (values in one set but not the other)
  test for equality/subset
Depending on the application, we may need any or all of these operations to be fast!
4
eg. Set Intersection
Unsorted arrays / linked lists:
  Algorithm: ?
  Cost: # comparisons =
Sorted arrays / linked lists:
  Algorithm: ?
  Cost: # comparisons =
Example sets:
  B X D T Q W E V Z C R F
  F Y U J H I M X O K P T
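One possible answer to the two "Algorithm: ?" questions, as a minimal sketch. The class and method names are hypothetical, and each array is assumed to hold no duplicates. The unsorted version may compare every item of one array with every item of the other; the sorted version makes a single merge-style pass over both arrays.

```java
import java.util.ArrayList;
import java.util.List;

class ArrayIntersection {
    // Unsorted inputs: up to a.length * b.length comparisons.
    static List<Character> intersectUnsorted(char[] a, char[] b) {
        List<Character> ans = new ArrayList<>();
        for (char x : a) {
            for (char y : b) {
                if (x == y) {      // found x in b, no need to look further
                    ans.add(x);
                    break;
                }
            }
        }
        return ans;
    }

    // Sorted inputs: each loop step advances i, j, or both,
    // so there are fewer than a.length + b.length steps.
    static List<Character> intersectSorted(char[] a, char[] b) {
        List<Character> ans = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                i++;               // a[i] cannot be in b
            } else if (a[i] > b[j]) {
                j++;               // b[j] cannot be in a
            } else {
                ans.add(a[i]);     // common value
                i++;
                j++;
            }
        }
        return ans;
    }
}
```

For the two example sets above, the common values are X, T and F. With 12 items in each set, the unsorted version may need up to 144 comparisons, while the sorted version needs fewer than 24 loop steps.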
5
Set Intersection
Binary Search Trees:
Algorithm:
Cost:
Ex: Work out algorithms and costs for the other “whole set” operations, using different Set implementations.
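One possible approach for balanced trees, sketched with java.util.TreeSet (a balanced tree from the standard library) rather than the course's own BST class. The class and method names are hypothetical. It walks the smaller set and looks each item up in the larger one, so for sets of sizes m and n (m ≤ n) the cost is about m lookups of O(log n) each.

```java
import java.util.TreeSet;

class TreeIntersection {
    // Assumes both sets use the natural ordering of E.
    static <E extends Comparable<E>> TreeSet<E> intersect(TreeSet<E> a, TreeSet<E> b) {
        TreeSet<E> small = (a.size() <= b.size()) ? a : b;
        TreeSet<E> large = (small == a) ? b : a;
        TreeSet<E> ans = new TreeSet<>();
        for (E item : small) {             // m iterations
            if (large.contains(item)) {    // O(log n) per lookup
                ans.add(item);
            }
        }
        return ans;
    }
}
```

Other answers are possible, for example iterating both trees in sorted order and merging as with sorted arrays.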
6
Bit Sets
If the range of possible elements for a set is:
  discrete
  finite
  not too big
... then can use an array of booleans:
  one cell for each possible element
  true if that element is in the set
  false if that element is not in the set

a,e,i,o,u:
  a b c d e f g h i j k l m n o p q r s t u v w x y z
  ✔ ✗ ✗ ✗ ✔ ✗ ✗ ✗ ✔ ✗ ✗ ✗ ✗ ✗ ✔ ✗ ✗ ✗ ✗ ✗ ✔ ✗ ✗ ✗ ✗ ✗
b,d,f,h,k,l,t:
  a b c d e f g h i j k l m n o p q r s t u v w x y z
  ✗ ✔ ✗ ✔ ✗ ✔ ✗ ✔ ✗ ✗ ✔ ✔ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✔ ✗ ✗ ✗ ✗ ✗ ✗
g,j,p,q,y:
  a b c d e f g h i j k l m n o p q r s t u v w x y z
  ✗ ✗ ✗ ✗ ✗ ✗ ✔ ✗ ✗ ✔ ✗ ✗ ✗ ✗ ✗ ✔ ✔ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✔ ✗
7
BitSet Implementation
public class BitSet {
    // data[v] is true exactly when value v is in the set
    private boolean[] data;

    public BitSet(int maxItems) {
        data = new boolean[maxItems];
    }
    public boolean contains(int value) {
        if (value < 0 || value >= data.length)
            return false;   // out-of-range values are never in the set
        return data[value];
    }
    public void add(int value) {
        if (value >= 0 && value < data.length)
            data[value] = true;
    }
    public void remove(int value) {
        if (value >= 0 && value < data.length)
            data[value] = false;
    }
}
Exercise: Extend add and remove to return booleans.
Or signal an error. Or use array list and expand as needed.
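A possible solution sketch for the exercise, mixing two of the suggested options. The class name SimpleBitSet and the choice of exception are assumptions made for illustration: add and remove return true only when the set actually changed, and add signals an error for values outside the range instead of silently ignoring them.

```java
class SimpleBitSet {
    private final boolean[] data;

    SimpleBitSet(int maxItems) {
        data = new boolean[maxItems];
    }

    boolean contains(int value) {
        return value >= 0 && value < data.length && data[value];
    }

    // Returns true if value was not already present.
    // Throws IllegalArgumentException if value cannot be stored.
    boolean add(int value) {
        if (value < 0 || value >= data.length) {
            throw new IllegalArgumentException("value out of range: " + value);
        }
        boolean changed = !data[value];
        data[value] = true;
        return changed;
    }

    // Returns true if value was present and has been removed.
    // An out-of-range value is simply "not present".
    boolean remove(int value) {
        if (value < 0 || value >= data.length) {
            return false;
        }
        boolean changed = data[value];
        data[value] = false;
        return changed;
    }
}
```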
8
BitSets: Costs
set.contains('f')   Cost: O(1)
set.add('y')        Cost: O(1)
set.remove('u')     Cost: O(1)
Intersection:
  for (i=0…N)
    ans.data[i] = set1.data[i] && set2.data[i]
  Cost: O(N) (number of possible values!!) but NOT item comparisons!!
Other operations: union, difference, equal, subset, …?
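A sketch of how the intersection loop and some of the "other operations" might look on plain boolean arrays. The class and method names are hypothetical, and both arrays are assumed to have the same length N. Each operation is one pass over all N cells, regardless of how many items the sets actually contain.

```java
class BoolSetOps {
    static boolean[] intersection(boolean[] a, boolean[] b) {
        boolean[] ans = new boolean[a.length];
        for (int i = 0; i < a.length; i++) {
            ans[i] = a[i] && b[i];     // in both sets
        }
        return ans;
    }

    static boolean[] union(boolean[] a, boolean[] b) {
        boolean[] ans = new boolean[a.length];
        for (int i = 0; i < a.length; i++) {
            ans[i] = a[i] || b[i];     // in either set
        }
        return ans;
    }

    static boolean[] difference(boolean[] a, boolean[] b) {
        boolean[] ans = new boolean[a.length];
        for (int i = 0; i < a.length; i++) {
            ans[i] = a[i] && !b[i];    // in a but not in b
        }
        return ans;
    }

    // True if every item of a is also in b.
    static boolean isSubset(boolean[] a, boolean[] b) {
        for (int i = 0; i < a.length; i++) {
            if (a[i] && !b[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Equality of two such sets can be tested with java.util.Arrays.equals(a, b).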
9
BitSets are the best
Very Fast!
Can be improved:
  boolean[] data uses just one bit in each memory location
  could use every bit, by thinking of the whole array as an int/long!
    0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 ….
  can then operate on sets using bitwise operations: & and |.
But:
  Values must be integers or characters (to index into an array)
  Number of possible values must be not too large (especially for intersection, union, iteration)
Eg: Days, months, timetable hours
Note: Java might use less space than a full memory location per boolean, e.g. one byte per boolean.
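A minimal sketch of the "use every bit" idea, assuming the possible values are 0 to 63 so the whole set fits in one long. The class name LongBitSet is hypothetical. Bit v of the long plays the role of data[v], and intersection and union each become a single bitwise operation instead of a loop.

```java
class LongBitSet {
    private long bits;   // bit v is 1 exactly when value v is in the set

    private static boolean inRange(int v) {
        return v >= 0 && v < 64;
    }

    boolean contains(int v) {
        return inRange(v) && (bits & (1L << v)) != 0;
    }

    void add(int v) {
        if (inRange(v)) bits |= (1L << v);      // switch bit v on
    }

    void remove(int v) {
        if (inRange(v)) bits &= ~(1L << v);     // switch bit v off
    }

    LongBitSet intersection(LongBitSet other) {
        LongBitSet ans = new LongBitSet();
        ans.bits = this.bits & other.bits;      // in both sets
        return ans;
    }

    LongBitSet union(LongBitSet other) {
        LongBitSet ans = new LongBitSet();
        ans.bits = this.bits | other.bits;      // in either set
        return ans;
    }

    int size() {
        return Long.bitCount(bits);             // number of 1 bits
    }
}
```

For larger ranges, the standard library's java.util.BitSet provides the same kind of operations (get, set, clear, and, or) over an expandable sequence of bits.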
10
O(1) Sets with big values?
What about:
  Sets of objects (including strings, URLs, email addresses)?
  Sets of floating point numbers (double)?
Need a way to compute an array index for an object, eg:
  add("A sentence that belongs in a set of sentences")
“Hashing”: the number is the “hash code” of object
[Figure: the sentence is passed through a hash function, which gives 581; cell 581 of a boolean array with cells 0 … N is marked ✔]
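A small sketch of how an array index might be computed from an object, assuming Java's built-in hashCode() is used as the hash function and the table has tableSize cells. The class and method names are hypothetical, and the slide's value 581 is only an illustration; this code is not claimed to produce it. Math.floorMod is used because hashCode() can be negative.

```java
class HashIndex {
    // Maps any object to a cell number in the range 0 .. tableSize - 1.
    static int indexFor(Object item, int tableSize) {
        return Math.floorMod(item.hashCode(), tableSize);
    }
}
```

Calling HashIndex.indexFor("A sentence that belongs in a set of sentences", 1000) gives some index between 0 and 999. Two different objects can land in the same cell, which this sketch does not deal with.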