Concrete collections II



HashSet

• hash codes are used to organize elements in the collections, calculated from the state of an object
  – hash codes are not necessarily unique
• so-called buckets are used to handle hash collisions
  – initial bucket count is recommended to be between 75% and 150% of the expected element count
  – automatic rehashing when the load factor threshold (default 75%) is reached
    • then, the bucket count is doubled
    • for example,

      new HashSet (101, 0.75f) // initial capacity, load factor (a float)
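A minimal sketch of the constructor call above, assuming a set of strings. The class name HashSetCapacityDemo and the sample words are illustrative and not part of the course material. The load factor parameter is a float, so the literal needs the f suffix.

```java
import java.util.HashSet;
import java.util.Set;

class HashSetCapacityDemo {
    // Builds a small set with an explicit initial capacity and load factor.
    static Set<String> build() {
        Set<String> words = new HashSet<>(101, 0.75f); // initial capacity, load factor
        words.add("alpha");
        words.add("beta");
        words.add("alpha"); // equal to an existing element, so the set is unchanged
        return words;
    }

    public static void main(String[] args) {
        Set<String> words = build();
        System.out.println(words.size());           // 2
        System.out.println(words.contains("beta")); // true
    }
}
```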


HashSet (cont.)

• iteration in (seemingly) random order
• however, LinkedHashSet can preserve insertion order
• a general problem involving search structures:
  – hashCode should not change when called multiple times on the same object
  – changing the state of a set element or a map key while it is stored may corrupt the data structure
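Both points can be tried out with a short sketch. The class and method names are hypothetical, and a mutable ArrayList stands in for any element whose hash code depends on its state. The lookup result shown assumes the OpenJDK HashSet implementation.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class SetOrderAndMutationDemo {
    // LinkedHashSet iterates in insertion order, whatever the hash codes are.
    static List<String> insertionOrder() {
        Set<String> names = new LinkedHashSet<>();
        names.add("carol");
        names.add("alice");
        names.add("bob");
        return new ArrayList<>(names);
    }

    // Mutates an element while it is stored in a HashSet, then looks it up again.
    static boolean foundAfterMutation() {
        Set<List<String>> set = new HashSet<>();
        List<String> element = new ArrayList<>();
        element.add("a");
        set.add(element);  // filed under the hash code of ["a"]
        element.add("b");  // state change: the list now has a different hash code
        return set.contains(element); // the lookup uses the new hash code
    }

    public static void main(String[] args) {
        System.out.println(insertionOrder());     // [carol, alice, bob]
        System.out.println(foundAfterMutation()); // false: the element is "lost"
    }
}
```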


On defining hash functions

• hashCode can be any integer, positive or negative
• definitions of equals and hashCode must be compatible so that:

    if x.equals (y), then also x.hashCode () == y.hashCode ()

• equals compares two objects for "equality": do they have the same state
  – the default implementation in Object compares the objects for "identity": are they the same object
• hash code is calculated recursively on every significant field and referenced object, and then combined into one integer result

• the programmer must determine what is significant


On defining hash functions (cont.)

• e.g., see Item.hashCode () in the Ex. 2-3 TreeSetTest

    return 13 * description.hashCode() + 17 * partNumber;

• a compatible Item.equals:

    if (this == otherObj) return true;
    if (otherObj == null) return false;
    if (getClass() != otherObj.getClass()) return false;
    Item other = (Item) otherObj; // cast is safe: the classes are identical
    return description.equals(other.description)
        && partNumber == other.partNumber;

• writing really good hash functions may involve mathematical/theoretic problems; see, e.g., [Bloch, 2001] for advice on writing Java hash functions
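The two fragments above can be assembled into a complete class. This is a sketch only: the constructor, the field types (a String description and an int partNumber) and the ItemHashDemo driver are assumptions, since the slides show just the method bodies.

```java
import java.util.HashSet;
import java.util.Set;

class Item {
    private final String description; // assumed field type
    private final int partNumber;     // assumed field type

    Item(String description, int partNumber) {
        this.description = description;
        this.partNumber = partNumber;
    }

    @Override
    public boolean equals(Object otherObj) {
        if (this == otherObj) return true;
        if (otherObj == null) return false;
        if (getClass() != otherObj.getClass()) return false;
        Item other = (Item) otherObj;
        return description.equals(other.description)
            && partNumber == other.partNumber;
    }

    @Override
    public int hashCode() {
        // Uses exactly the fields that equals compares, so the two stay compatible.
        return 13 * description.hashCode() + 17 * partNumber;
    }
}

class ItemHashDemo {
    // Adds two equal items and one different item; returns the resulting set size.
    static int distinctCount() {
        Set<Item> parts = new HashSet<>();
        parts.add(new Item("Toaster", 1234));
        parts.add(new Item("Toaster", 1234)); // same state, same hash code: rejected
        parts.add(new Item("Widget", 4562));
        return parts.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctCount()); // 2
    }
}
```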


TreeSet

• TreeSet defines a sorted collection, i.e., implements SortedSet

• elements implement the interface Comparable <T>:

    public int compareTo (T other) {
      return ... // negative, 0, or positive if this is <, =, > other
    }

• otherwise, TreeSet must use a Comparator <Item>:

    SortedSet <Item> sortByComparator =
      new TreeSet <Item> (
        new Comparator <Item> () {
          public int compare (Item a, Item b) {
            .. return descrA.compareTo (descrB);
          }});
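A self-contained sketch of both orderings. The Part class, its fields and the sample data are hypothetical stand-ins for the Item class of the example; here the natural order is by part number and the comparator orders by description.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

class Part implements Comparable<Part> {
    final String description;
    final int partNumber;

    Part(String description, int partNumber) {
        this.description = description;
        this.partNumber = partNumber;
    }

    @Override
    public int compareTo(Part other) { // natural order: by part number
        return Integer.compare(partNumber, other.partNumber);
    }
}

class TreeSetOrderDemo {
    static void fill(SortedSet<Part> parts) {
        parts.add(new Part("Toaster", 1234));
        parts.add(new Part("Widget", 4562));
        parts.add(new Part("Modem", 9912));
    }

    // Collects the descriptions in the iteration order of the set.
    static List<String> descriptions(SortedSet<Part> parts) {
        List<String> result = new ArrayList<>();
        for (Part p : parts) result.add(p.description);
        return result;
    }

    static List<String> byPartNumber() {
        SortedSet<Part> parts = new TreeSet<>(); // uses Part.compareTo
        fill(parts);
        return descriptions(parts);
    }

    static List<String> byDescription() {
        SortedSet<Part> parts = new TreeSet<>(new Comparator<Part>() {
            public int compare(Part a, Part b) {
                return a.description.compareTo(b.description);
            }
        });
        fill(parts);
        return descriptions(parts);
    }

    public static void main(String[] args) {
        System.out.println(byPartNumber());  // [Toaster, Widget, Modem]
        System.out.println(byDescription()); // [Modem, Toaster, Widget]
    }
}
```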


TreeSet (cont.)

• iteration is done in sorted order

• comparing strings while ignoring case is a common requirement

– the String class defines a Comparator <String> object String.CASE_INSENSITIVE_ORDER

– may result in an unsatisfactory ordering (since locales are not considered)
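A short sketch contrasting the natural String order with String.CASE_INSENSITIVE_ORDER; the class name and the sample words are illustrative. Note that under the case-insensitive comparator "apple" and "APPLE" count as the same element, so the set keeps only the first one added.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

class CaseInsensitiveSetDemo {
    static void fill(SortedSet<String> words) {
        words.add("apple");
        words.add("Banana");
        words.add("APPLE");
    }

    static List<String> naturalOrder() {
        SortedSet<String> words = new TreeSet<>(); // compares char values: upper case first
        fill(words);
        return new ArrayList<>(words);
    }

    static List<String> ignoringCase() {
        SortedSet<String> words = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        fill(words);
        return new ArrayList<>(words);
    }

    public static void main(String[] args) {
        System.out.println(naturalOrder()); // [APPLE, Banana, apple]
        System.out.println(ignoringCase()); // [apple, Banana]
    }
}
```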


Concrete Maps

• associative table consisting of key-value pairs
• two concrete implementations: HashMap, TreeMap

    value = map.get (keyObject) // or null if not found
    map.put (keyObject, valueObject) // insert key-value

• put returns the old value
  – returning null means that there was no previous definition for that key (or that the key was mapped to null)
• HashMap accepts a null keyObject and null valueObjects
• TreeMap accepts null valueObjects, but rejects a null keyObject under natural ordering
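The return values of put and get can be checked with a small sketch; the class name, the helper putTwice and the stock data are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class MapPutDemo {
    // Puts the same key twice and returns what each put call returned.
    static Integer[] putTwice(Map<String, Integer> map) {
        Integer first = map.put("widget", 10);  // no previous value
        Integer second = map.put("widget", 25); // replaces 10
        return new Integer[] { first, second };
    }

    public static void main(String[] args) {
        Map<String, Integer> stock = new HashMap<>();
        Integer[] old = putTwice(stock);
        System.out.println(old[0]);              // null
        System.out.println(old[1]);              // 10
        System.out.println(stock.get("widget")); // 25
        System.out.println(stock.get("gadget")); // null: key not present

        stock.put(null, 0); // HashMap accepts a null key
        System.out.println(stock.containsKey(null)); // true
    }
}
```

Because get also returns null for a key that is mapped to null, containsKey is the reliable way to test for presence.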


Concrete Maps (cont.)

• maps are not collections, but you can get views on keys, values, and entries:

    Set <K> keys = map.keySet ();
    Set <Map.Entry <K,V>> entries = map.entrySet ();
    Collection <V> values = map.values (); // a multiset

• note that keys and entries may be (sorted) sets, but values are Collections (i.e., multisets or bags)
• however, generally collection views can be sets or multisets, depending on the context
• views are collections and thus potentially more powerful than iterators
• a view can be handled as one unit, provide extra operations, and potentially allow multiple traversals


Concrete Maps (cont.)

• for example, you can enumerate all keys of a map:

    Set <String> keys = map.keySet ();
    for (String key : keys) {
      // do something with key
    }

• if you want both keys and values:

    for (Map.Entry<String, Empl> entry : staff.entrySet()) {
      String key = entry.getKey ();
      Empl value = entry.getValue ();
      // do something with key and value
    }

• can remove (but cannot add) entries through iterators of such view collections for maps
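A sketch of removal through a view iterator, assuming a TreeMap of names to numbers; all names here are hypothetical. Removing via the key view's iterator also deletes the entry from the map, while adding through the key view is rejected.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

class MapViewRemovalDemo {
    static Map<String, Integer> sample() {
        Map<String, Integer> staff = new TreeMap<>();
        staff.put("amy", 1);
        staff.put("bob", 2);
        staff.put("carl", 3);
        return staff;
    }

    // Removes every key starting with the prefix through the key view's iterator.
    static void removeKeysWithPrefix(Map<String, Integer> map, String prefix) {
        Iterator<String> it = map.keySet().iterator();
        while (it.hasNext()) {
            if (it.next().startsWith(prefix)) {
                it.remove(); // writes through to the underlying map
            }
        }
    }

    // Tries to add a key through the key view; the view does not support it.
    static boolean canAddThroughKeyView(Map<String, Integer> map) {
        try {
            map.keySet().add("zoe");
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> staff = sample();
        removeKeysWithPrefix(staff, "b");
        System.out.println(staff);                       // {amy=1, carl=3}
        System.out.println(canAddThroughKeyView(staff)); // false
    }
}
```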


Special Map Classes

• IdentityHashMap uses the value returned by System.identityHashCode (anObject) to organize the keys
  – correspondingly, == is used to test the equality of the objects (i.e., object states are ignored)
• the hash value typically reflects a low-level physical address/memory reference
  – as far as is reasonably practical, the original hashCode defined by Object returns distinct integers for distinct objects (uniqueness is not guaranteed)
  – may be implemented by converting the internal address of the object into an integer
• useful for object traversal, where only the identity of objects is meaningful
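The difference from HashMap shows up when two distinct objects have equal state. In this sketch (class and method names are illustrative) two separately created strings with the same contents are used as keys.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

class IdentityMapDemo {
    // Inserts two equal-but-distinct String objects as keys and returns the map size.
    static int countKeys(Map<String, Integer> map) {
        String a = new String("key");
        String b = new String("key"); // a.equals(b) is true, a == b is false
        map.put(a, 1);
        map.put(b, 2);
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(countKeys(new HashMap<String, Integer>()));         // 1
        System.out.println(countKeys(new IdentityHashMap<String, Integer>())); // 2
    }
}
```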


Collections and algorithms

• lists can also be handled by algorithms provided in the Collections class
• note the 's' at the end of Collections!
• for example, picking a winning combination:

    List <Integer> numbers = ...;
    Collections.shuffle (numbers);
    List <Integer> winning = numbers.subList (0, 7);
    Collections.sort (winning);
    System.out.println (winning);

• when implementing your own algorithms or queries, consider returning a copy of values, or a view into the original collection to prevent unwanted modifications of a shared data structure
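One way to follow this advice for the lottery example is sketched below. The method draw and its parameters are hypothetical. Since subList returns a view, sorting it would reorder the front of the original list, so the sketch copies the view first and hands out a read-only wrapper.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class LotteryDemo {
    // Draws 'count' distinct numbers from 1..highest and returns them sorted.
    static List<Integer> draw(int highest, int count, Random random) {
        List<Integer> numbers = new ArrayList<>();
        for (int i = 1; i <= highest; i++) {
            numbers.add(i);
        }
        Collections.shuffle(numbers, random);
        // Copy the sublist view so that sorting does not touch 'numbers' itself.
        List<Integer> winning = new ArrayList<>(numbers.subList(0, count));
        Collections.sort(winning);
        return Collections.unmodifiableList(winning); // callers cannot modify the result
    }

    public static void main(String[] args) {
        System.out.println(draw(49, 7, new Random())); // seven sorted numbers, varies per run
    }
}
```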


Collections and algorithms (cont.)

• e.g., sorting: Collections.sort (aList, aComparator)
  – copies the specified list into an array, sorts the array, and then iterates over the list, updating each element from the array
  – stable sort that preserves the order of equal values
• Collections.reverseOrder () returns a comparator that imposes the reverse of the natural ordering
• of course, algorithms may presuppose that given collections allow and support appropriate operations
  – a List is modifiable if it supports the set method
  – a List is resizable if it supports the add and remove methods
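These points can be illustrated with a brief sketch; the class, the helper names and the data are made up. Sorting by length keeps equally long words in their original relative order, and the list from Arrays.asList serves as an example of a list that is modifiable but not resizable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class SortDemo {
    // Sorts a copy by word length; the stable sort keeps ties in input order.
    static List<String> byLength(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        Collections.sort(copy, new Comparator<String>() {
            public int compare(String a, String b) {
                return Integer.compare(a.length(), b.length());
            }
        });
        return copy;
    }

    // Sorts a copy in descending natural order.
    static List<Integer> descending(List<Integer> values) {
        List<Integer> copy = new ArrayList<>(values);
        Collections.sort(copy, Collections.reverseOrder());
        return copy;
    }

    // Probes whether add and remove are supported (restores the list afterwards).
    static boolean isResizable(List<String> list) {
        try {
            list.add("probe");
            list.remove(list.size() - 1);
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(byLength(Arrays.asList("bob", "amy", "carol", "dan"))); // [bob, amy, dan, carol]
        System.out.println(descending(Arrays.asList(3, 1, 2)));                   // [3, 2, 1]
        System.out.println(isResizable(Arrays.asList("a", "b")));                 // false
        System.out.println(isResizable(new ArrayList<String>()));                 // true
    }
}
```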


Collections and algorithms (cont.)

• binary search finds value or locates insertion point (the list must already be sorted according to aComp):

    i = Collections.binarySearch (aList, key, aComp)

  – if not found, returns -(insertion point) - 1, which is always negative (e.g., insertion point 0 gives -1):

    if (i < 0) aList.add (-i-1, key); // -i-1 == -(i+1) recovers the insertion point

• the Collections class provides many simple algorithms, e.g.:
  – obj = Collections.max (aCollection, aComparator)
  – min, copy, fill, replaceAll, indexOfSubList, lastIndexOfSubList, reverse, rotate, etc.
  – the code becomes more readable
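The insertion-point idiom can be sketched as follows. The helper insertSorted and the sample values are hypothetical, and Comparator.naturalOrder assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class BinarySearchDemo {
    // Inserts key into an already sorted list unless it is present.
    // Returns the raw result of binarySearch for inspection.
    static int insertSorted(List<Integer> sorted, Integer key, Comparator<Integer> comp) {
        int i = Collections.binarySearch(sorted, key, comp);
        if (i < 0) {
            sorted.add(-i - 1, key); // decode the insertion point
        }
        return i;
    }

    public static void main(String[] args) {
        Comparator<Integer> natural = Comparator.naturalOrder();
        List<Integer> list = new ArrayList<>(Arrays.asList(10, 20, 40));

        System.out.println(insertSorted(list, 30, natural)); // -3: insertion point 2
        System.out.println(list);                            // [10, 20, 30, 40]
        System.out.println(insertSorted(list, 5, natural));  // -1: insertion point 0
        System.out.println(insertSorted(list, 20, natural)); // 2: found, nothing inserted
        System.out.println(Collections.max(list, natural));  // 40
    }
}
```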