GS Collections and Java 8
Bangalore Open Java Users Group Meeting
April 26, 2014
Disclaimer: This material is for educational purposes only.
4/26/2014 © 2014 Goldman Sachs. All rights reserved.
• Open source collections framework developed at Goldman Sachs
• Rich Lambda-Ready Iteration API
What is GS Collections?
• GS Collections Features and Comparison
• Lambda-Ready Iteration Examples in GS Collections and Java 8
• GSC API > Java 8 Stream API
• Memory Optimization
• GS Collections @ opensource
Agenda
GS Collections and Java 8
GS Collections Features & Comparison
(With Java 8, Guava and Trove)
• Container Implementations
    • Generified Object Containers
    • Primitive Containers
• Utility Classes
    • Iterate
    • MapIterate
    • ArrayIterate
    • StringIterate
• Lambda-Ready Functional Interfaces
    • Predicate, Function, Procedure
    • Primitive Predicates, Functions, Procedures with combinations of primitive/object types
• Container Interface Hierarchies
    • Readable
    • Mutable
    • Immutable
    • FixedSize
    • Unmodifiable Collections
• Iteration Styles
    • Eager (Serial, Parallel)
    • Lazy (Serial, Parallel)
• Container Types
    • List
    • Set, SortedSet
    • Map, SortedMap
    • BiMap
    • Stack
    • Multimap
    • Interval
• API Style
    • Rich base API w/ 90+ methods
    • Object-Oriented – API directly available on the appropriate types (List, Set, etc.)
    • Functional (API leveraging lambdas/immutability)
    • Static Utility
GS Collections Features
Features GS Collections Java 8 Guava Trove
Rich API
Memory Optimized Alternatives for List, Set, Map
Primitive Collections
Immutable Collections
New Containers Bag, Multimap, BiMap, Interval
IntStream, LongStream, DoubleStream
Multiset, Multimap, BiMap
Iteration Styles
    GS Collections: Eager/Lazy, Serial/Parallel
    Java 8: Lazy, Serial/Parallel
    Guava: Lazy, Serial
    Trove: Eager, Serial
Comparison with other Java Collections Frameworks
GS Collections and Java 8
Lambda-Ready Iteration Examples
GS Collections’ internal iteration patterns work really
well with Java 8 lambdas and method references!
RichIterable interface: 50+ Unique Methods, 70+ Total Methods (w/overloads)
λ aggregateBy λ flatCollect notEmpty
λ aggregateInPlaceBy λ forEach λ partition
λ allSatisfy λ forEachWith λ reject
λ anySatisfy λ forEachWithIndex λ rejectWith
appendString getFirst λ select
asLazy getLast λ selectWith
chunk λ groupBy size
λ collect λ groupByEach toArray
λ collectIf λ injectInto toBag
λ collectWith isEmpty toList
contains iterator λ toMap
containsAll makeString toSet
containsAllArguments λ max λ toSortedList / toSortedListBy
containsAllIterable λ maxBy λ toSortedMap
λ count λ min λ toSortedSet / toSortedSetBy
λ detect λ minBy zip
λ detectIfNone λ noneSatisfy zipWithIndex
How rich and lambda ready is your iterable?
JDK 7 Collect
List<Person> people = new ArrayList<>();
people.add(person1);
people.add(person2);
people.add(person3);
List<Address> addresses = new ArrayList<>();
for (Person person : people) {
    addresses.add(person.getAddress());
}
GS Collections Collect With Anonymous Inner Class
MutableList<Person> people = Lists.mutable.of(person1, person2, person3);
MutableList<Address> addresses = people.collect(new Function<Person, Address>() {
    public Address valueOf(Person person) {
        return person.getAddress();
    }
});
Internal Iteration Example - Collect
GS Collections Collect With JDK 8 Lambda and Method Reference
MutableList<Person> people = Lists.mutable.of(person1, person2, person3);
MutableList<Address> addressesLambda = people.collect(person -> person.getAddress());
MutableList<Address> addressesMethodReference = people.collect(Person::getAddress);
GS Collections Collect With JDK 7 Anonymous Inner Class
MutableList<Person> people = Lists.mutable.of(person1, person2, person3);
MutableList<Address> addresses = people.collect(new Function<Person, Address>() {
    public Address valueOf(Person person) {
        return person.getAddress();
    }
});
Internal Iteration Example - Collect
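To make the idea of internal iteration concrete, here is a minimal JDK-only sketch of what a collect-style method does: the caller supplies a function and the loop lives inside the helper. The class InternalIterationSketch, its collect helper, and the nested Person and Address types are hypothetical stand-ins, and java.util.function.Function is used in place of the GS Collections Function interface. It is an illustration, not the library's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

final class InternalIterationSketch {
    // Hypothetical stand-ins for the Person and Address types on the slides.
    static final class Address {
        final String city;
        Address(String city) { this.city = city; }
    }

    static final class Person {
        private final Address address;
        Person(Address address) { this.address = address; }
        Address getAddress() { return this.address; }
    }

    // The loop is written once here; callers only supply the transformation.
    static <T, R> List<R> collect(Iterable<T> source, Function<? super T, ? extends R> function) {
        List<R> result = new ArrayList<>();
        for (T each : source) {
            result.add(function.apply(each));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Person> people = Arrays.asList(
                new Person(new Address("Bangalore")),
                new Person(new Address("Panaji")));
        List<Address> addresses = collect(people, Person::getAddress);
        System.out.println(addresses.size()); // prints 2
    }
}
```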
• A stream is a sequence of elements from a source that supports aggregate operations.
• In Java 8, the JCF collections have a stream() method.
• Streams support internal iteration and pipelining.
default Stream<E> stream() {
    return StreamSupport.stream(spliterator(), false);
}
Preface: Java 8 Stream
Stream<Address> stream =
people.stream().map(Person::getAddress);
JDK 8 Map With Stream and Method Reference
List<Address> addresses =
people.stream().map(Person::getAddress).collect(Collectors.toList());
JDK 8 Stream Example - Map
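The stream example above can be run end to end with a small self-contained program. This is a sketch under assumptions: the nested Person class and its getCity accessor are hypothetical and replace the Person/Address pair from the slides to keep the example to one type.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class StreamMapExample {
    // Hypothetical Person type; getCity stands in for getAddress on the slides.
    static final class Person {
        private final String name;
        private final String city;
        Person(String name, String city) { this.name = name; this.city = city; }
        String getName() { return this.name; }
        String getCity() { return this.city; }
    }

    static List<String> citiesOf(List<Person> people) {
        // map() is an intermediate operation: it only describes the transformation.
        Stream<String> cities = people.stream().map(Person::getCity);
        // collect() is the terminal operation that actually produces the List.
        return cities.collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Person> people = Arrays.asList(
                new Person("Asha", "Bangalore"),
                new Person("Ravi", "Panaji"));
        System.out.println(citiesOf(people)); // prints [Bangalore, Panaji]
    }
}
```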
Iteration Style: Eager (collect)
GSC Example:
    List<Address> addresses = people.collect(Person::getAddress);
JDK Example: (none)
Iteration Style: Lazy (collect / map)
GSC Example:
    LazyIterable<Address> addresses = people.asLazy().collect(Person::getAddress);
JDK Example:
    Stream<Address> addresses = people.stream().map(Person::getAddress);
Iteration Style: Lazy (to list)
GSC Example:
    List<Address> addresses = people.asLazy().collect(Person::getAddress).toList();
JDK Example:
    List<Address> addresses = people.stream().map(Person::getAddress).collect(Collectors.toList());
Eager vs Lazy Iteration
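Lazy evaluation can be observed directly with the JDK Stream API: the function passed to map() does not run until a terminal operation is invoked. The sketch below counts invocations to show this; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class LazyStreamExample {
    // Returns {calls seen before the terminal operation, calls seen after it}.
    static int[] countMapCalls(List<String> names) {
        AtomicInteger calls = new AtomicInteger();
        Stream<Integer> lengths = names.stream().map(name -> {
            calls.incrementAndGet();
            return name.length();
        });
        int before = calls.get(); // the pipeline is only described, not executed
        lengths.collect(Collectors.toList()); // terminal operation triggers the work
        int after = calls.get();
        return new int[] {before, after};
    }

    public static void main(String[] args) {
        int[] counts = countMapCalls(Arrays.asList("a", "bb", "ccc"));
        System.out.println(counts[0] + " then " + counts[1]); // prints 0 then 3
    }
}
```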
MutableList<Person> people = Lists.mutable.of(person1, person2, person3);
Selecting people whose age is greater than 18
MutableList<Person> adults = people.select(person -> person.getAge() > 18);
Sorting people by their last name
MutableList<Person> sortedByName = people.toSortedListBy(person -> person.getLastName());
MutableList<Person> sortedByName = people.toSortedListBy(Person::getLastName);
Finding the oldest person
Person oldestPerson = people.maxBy(person -> person.getAge());
Person oldestPerson = people.maxBy(Person::getAge);
Lambda Ready API
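For comparison, the same three queries can be written against the JDK 8 Stream API. This is a sketch with a hypothetical nested Person class; note that Stream.max returns an Optional, so the oldest person has to be unwrapped by the caller.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

final class StreamQueriesExample {
    // Hypothetical Person type with only the accessors these queries need.
    static final class Person {
        private final String lastName;
        private final int age;
        Person(String lastName, int age) { this.lastName = lastName; this.age = age; }
        String getLastName() { return this.lastName; }
        int getAge() { return this.age; }
    }

    static List<Person> adults(List<Person> people) {
        return people.stream().filter(person -> person.getAge() > 18).collect(Collectors.toList());
    }

    static List<Person> sortedByLastName(List<Person> people) {
        return people.stream()
                .sorted(Comparator.comparing(Person::getLastName))
                .collect(Collectors.toList());
    }

    static Optional<Person> oldest(List<Person> people) {
        return people.stream().max(Comparator.comparingInt(Person::getAge));
    }

    public static void main(String[] args) {
        List<Person> people = Arrays.asList(
                new Person("Rao", 17), new Person("Iyer", 42), new Person("Das", 30));
        System.out.println(adults(people).size());                        // prints 2
        System.out.println(sortedByLastName(people).get(0).getLastName()); // prints Das
        System.out.println(oldest(people).get().getLastName());            // prints Iyer
    }
}
```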
Multimap<String,Person> peopleByState = people.groupBy(person -> person.getState());
Grouping people by state
Multimap<String,Person> peopleByState = people.groupBy(Person::getState);
Key (state) Values (person)
"Karnataka" person1, person2
"Punjab" person3, person4, person5
"Goa" person6
MutableListMultimap<String,Person> peopleByState = people.groupBy(Person::getState);
MutableList<Person> punjabis = peopleByState.get("Punjab");
Lambda Ready API
GSC groupBy
Multimap<String,Person> peopleByState = people.groupBy(Person::getState);
JDK 8 groupBy
Map<String, List<Person>> peopleByState = people.stream().collect(Collectors.groupingBy(Person::getState));
GSC groupBy vs Stream groupBy
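A runnable version of the JDK side of this comparison might look like the sketch below. The nested Person class and the namesByState helper are hypothetical; the helper maps each person to a name so the grouped values are easy to print.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class GroupingByExample {
    // Hypothetical Person type carrying just a name and a state.
    static final class Person {
        private final String name;
        private final String state;
        Person(String name, String state) { this.name = name; this.state = state; }
        String getName() { return this.name; }
        String getState() { return this.state; }
    }

    // Groups people by state and keeps only their names as values.
    static Map<String, List<String>> namesByState(List<Person> people) {
        return people.stream().collect(
                Collectors.groupingBy(Person::getState,
                        Collectors.mapping(Person::getName, Collectors.toList())));
    }

    public static void main(String[] args) {
        List<Person> people = Arrays.asList(
                new Person("person1", "Karnataka"),
                new Person("person2", "Karnataka"),
                new Person("person3", "Punjab"));
        Map<String, List<String>> byState = namesByState(people);
        System.out.println(byState.get("Karnataka")); // prints [person1, person2]
        // A plain Map returns null for a key that was never grouped.
        System.out.println(byState.get("Goa"));       // prints null
    }
}
```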
Partition
PartitionMutableList<Person> partition = people.partition(person -> person.getAge() >= 18);
MutableList<Person> adults = partition.getSelected();
MutableList<Person> children = partition.getRejected();
Partition in Stream API*
Map<Boolean, List<Person>> partition = people.stream()
    .collect(Collectors.partitioningBy(person -> person.getAge() >= 18));
List<Person> adults = partition.get(Boolean.TRUE);
List<Person> children = partition.get(Boolean.FALSE);
GSC partition vs Stream partition
*Partitioning using the Stream API returns a Map with Boolean keys. Getting the selected and rejected lists requires calls to Map.get() with a Boolean key, unlike the straightforward calls to getSelected() and getRejected() on the GSC PartitionMutableList.
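The Stream API side of the footnote can be tried with a minimal sketch. To keep it short, the example partitions plain Integer ages rather than Person objects; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class PartitioningByExample {
    // Splits ages into adults (>= 18, key true) and children (key false).
    static Map<Boolean, List<Integer>> partitionByAdult(List<Integer> ages) {
        return ages.stream().collect(Collectors.partitioningBy(age -> age >= 18));
    }

    public static void main(String[] args) {
        Map<Boolean, List<Integer>> partition = partitionByAdult(Arrays.asList(12, 18, 40));
        // Both halves are reached through Map.get with a Boolean key.
        System.out.println(partition.get(Boolean.TRUE));  // prints [18, 40]
        System.out.println(partition.get(Boolean.FALSE)); // prints [12]
    }
}
```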
GS Collections and Java 8
GSC API > Java 8 Stream API
MutableList<Person> adults = people.select(person -> person.olderThan(18));
MutableList<Person> drivers = people.selectWith(Person::olderThan, 16);
MutableList<Person> voters = people.selectWith(Person::olderThan, 18);
MutableList<Person> eligibleToContestElections = people.selectWith(Person::olderThan, 25);
No “with” methods in Stream API
List<Person> drivers = people.stream()
    .filter(person -> person.olderThan(16))
    .collect(Collectors.toList());
List<Person> voters = people.stream()
    .filter(person -> person.olderThan(18))
    .collect(Collectors.toList());
List<Person> eligibleToContestElections = people.stream()
    .filter(person -> person.olderThan(25))
    .collect(Collectors.toList());
GSC API > Stream API: “with” methods
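The shape of a "with" method can be approximated on top of the JDK with a two-argument predicate. The sketch below is an assumption-laden illustration: selectWith here is a hypothetical static helper built on java.util.function.BiPredicate, not the GS Collections method, and the nested Person class is a stand-in. It shows why the extra parameter lets one method reference be reused with different thresholds.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;
import java.util.stream.Collectors;

final class SelectWithSketch {
    // Hypothetical Person type with the olderThan(int) method used on the slides.
    static final class Person {
        private final int age;
        Person(int age) { this.age = age; }
        boolean olderThan(int threshold) { return this.age > threshold; }
    }

    // Hypothetical helper: the parameter is passed alongside each element,
    // so the predicate itself does not need to capture any state.
    static <T, P> List<T> selectWith(List<T> items, BiPredicate<T, P> predicate, P parameter) {
        return items.stream()
                .filter(each -> predicate.test(each, parameter))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Person> people = Arrays.asList(
                new Person(15), new Person(17), new Person(20), new Person(30));
        System.out.println(selectWith(people, Person::olderThan, 16).size()); // prints 3
        System.out.println(selectWith(people, Person::olderThan, 18).size()); // prints 2
        System.out.println(selectWith(people, Person::olderThan, 25).size()); // prints 1
    }
}
```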
• Problem statement: Every person has multiple addresses, each in a different state.
• Group people by state where a person appears as a value mapped to every state he has an address in.
Key (state) Values (person)
"Karnataka" person1, person2
"Punjab" person1, person3
"Goa" person2
GSC API > Stream API: groupByEach
No iteration pattern for this use case in Stream API
Map<String, List<Person>> peopleByStates = new HashMap<>();
for (Person person : people) {
    MutableList<String> states = person.getStates();
    for (String state : states) {
        if (peopleByStates.get(state) == null) {
            peopleByStates.put(state, new ArrayList<>());
        }
        peopleByStates.get(state).add(person);
    }
}
return peopleByStates;
GS Collections groupByEach
Multimap<String, Person> peopleByStates = people.groupByEach(Person::getStates);
GSC API > Stream API: groupByEach
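The hand-written JDK loop above can be made self-contained, and slightly shorter, with Map.computeIfAbsent from Java 8. This is a sketch only: the nested Person class is hypothetical and getStates returns a plain List<String> instead of a MutableList.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class GroupByEachSketch {
    // Hypothetical Person type: one person, several states.
    static final class Person {
        private final String name;
        private final List<String> states;
        Person(String name, String... states) {
            this.name = name;
            this.states = Arrays.asList(states);
        }
        String getName() { return this.name; }
        List<String> getStates() { return this.states; }
    }

    // Registers each person under every state they have an address in.
    static Map<String, List<Person>> groupByEachState(List<Person> people) {
        Map<String, List<Person>> peopleByStates = new HashMap<>();
        for (Person person : people) {
            for (String state : person.getStates()) {
                peopleByStates.computeIfAbsent(state, key -> new ArrayList<>()).add(person);
            }
        }
        return peopleByStates;
    }

    public static void main(String[] args) {
        List<Person> people = Arrays.asList(
                new Person("person1", "Karnataka", "Punjab"),
                new Person("person2", "Karnataka", "Goa"),
                new Person("person3", "Punjab"));
        Map<String, List<Person>> result = groupByEachState(people);
        System.out.println(result.get("Karnataka").size()); // prints 2
        System.out.println(result.get("Goa").size());       // prints 1
    }
}
```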
GS Collections and Java 8
Memory Optimization
• UnifiedMap – built without using Entry objects.
• UnifiedSet – not built using a map.
• Empty should be empty.
• Primitive Collections.
• Memory efficient containers for small-sized collections.
Memory Optimization
• Stream API can be used with JDK collections like HashSet and HashMap
• But the memory usage of these collections is much higher than that of GSC UnifiedSet and UnifiedMap
• With GS Collections, you get a rich API along with memory savings!
Why does it matter?
• For every put, HashMap creates an Entry object.
• UnifiedMap stores keys and values in alternate slots on a single array.
• Consecutive memory locations are faster to access.
UnifiedMap
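The idea of storing keys and values in alternate slots of a single array, with no per-entry object, can be illustrated with a toy map. Everything below is a hypothetical sketch: it resolves collisions with plain linear probing for brevity, supports neither null keys nor removal, and makes no claim to match how UnifiedMap is actually implemented.

```java
final class AlternatingSlotMapSketch<K, V> {
    // Slot 2*i holds a key and slot 2*i + 1 holds its value; no Entry objects.
    private Object[] table = new Object[16 * 2];
    private int size;

    int size() { return this.size; }

    @SuppressWarnings("unchecked")
    V put(K key, V value) {
        if ((this.size + 1) * 2 > this.table.length / 2) {
            this.resize(); // keep the table at most half full so probing terminates
        }
        int index = indexFor(key, this.table);
        V old = (V) this.table[index + 1];
        if (this.table[index] == null) {
            this.table[index] = key;
            this.size++;
        }
        this.table[index + 1] = value;
        return old;
    }

    @SuppressWarnings("unchecked")
    V get(Object key) {
        int index = indexFor(key, this.table);
        return this.table[index] == null ? null : (V) this.table[index + 1];
    }

    // Linear probing: returns the key slot that holds this key or the first empty one.
    private static int indexFor(Object key, Object[] table) {
        int capacity = table.length / 2;
        int slot = (key.hashCode() & 0x7fffffff) % capacity;
        while (table[2 * slot] != null && !table[2 * slot].equals(key)) {
            slot = (slot + 1) % capacity;
        }
        return 2 * slot;
    }

    private void resize() {
        Object[] old = this.table;
        Object[] bigger = new Object[old.length * 2];
        for (int i = 0; i < old.length; i += 2) {
            if (old[i] != null) {
                int index = indexFor(old[i], bigger);
                bigger[index] = old[i];
                bigger[index + 1] = old[i + 1];
            }
        }
        this.table = bigger;
    }

    public static void main(String[] args) {
        AlternatingSlotMapSketch<String, Integer> map = new AlternatingSlotMapSketch<>();
        map.put("one", 1);
        map.put("two", 2);
        map.put("one", 11); // overwrites the value in place
        System.out.println(map.get("one") + " " + map.size()); // prints 11 2
    }
}
```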
[Chart: Mutable Map memory usage, Size (Mb) vs. Elements, comparing JDK HashMap, GSC UnifiedMap, Trove THashMap and JDK Hashtable]
Save 50% Memory with the GSC UnifiedMap
• HashSet uses HashMap as its backing collection.
• For every add, an Entry object is created with the element as key and null value.
• UnifiedSet uses an array as its backing collection.
UnifiedSet
[Chart: Mutable Set memory usage, Size (Mb) vs. Elements, comparing JDK HashSet, GSC UnifiedSet and Trove THashSet]
Save 400% Memory with the GSC UnifiedSet
• What are primitives?
    • Not everything in Java is an Object.
    • Primitive types are not references.
    • A primitive variable holds its value directly, so it is much more efficient.
• Why primitive collections?
    • Reduced memory usage
    • Improved performance
    • Eliminates the need to depend on multiple libraries – PCJ, Trove etc.
• What primitive collections are available in GSC?
    • List, Set, Map (all primitive/object combinations)
    • Stack
    • Bag
• For which primitive types?
    • All eight: boolean, byte, char, double, float, int, long, short
Primitive Collections
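To show where the savings come from, here is a minimal sketch of a growable list backed by an int[] array, so no Integer wrapper is allocated per element. IntListSketch is a hypothetical class written for this illustration; it is not the GSC IntArrayList and offers only a handful of operations.

```java
import java.util.Arrays;

final class IntListSketch {
    // Values are stored directly in an int[]; nothing is boxed.
    private int[] items = new int[8];
    private int size;

    void add(int value) {
        if (this.size == this.items.length) {
            this.items = Arrays.copyOf(this.items, this.size * 2); // grow by doubling
        }
        this.items[this.size++] = value;
    }

    int get(int index) {
        if (index < 0 || index >= this.size) {
            throw new IndexOutOfBoundsException("Index: " + index + ", Size: " + this.size);
        }
        return this.items[index];
    }

    int size() { return this.size; }

    long sum() {
        long total = 0;
        for (int i = 0; i < this.size; i++) {
            total += this.items[i];
        }
        return total;
    }

    public static void main(String[] args) {
        IntListSketch list = new IntListSketch();
        for (int i = 0; i < 100; i++) {
            list.add(i);
        }
        System.out.println(list.size() + " " + list.sum()); // prints 100 4950
    }
}
```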
[Chart: IntList memory usage, Size (Mb) vs. Elements, comparing JDK ArrayList, GSC IntArrayList and Trove TIntArrayList]
Save Memory with Primitive Collections: IntList
• GS Collections on github: https://github.com/goldmansachs/gs-collections
• GS Collections Kata on github:
https://github.com/goldmansachs/gs-collections-kata
• GS Collections on Maven Central Repo:
http://search.maven.org/#search%7Cga%7C1%7Cg%3A%22com.goldmansachs%22
GSC@opensource
• Reducing Code with Lambdas
• Anagrams Evolution
Appendix
• Converted nearly all of our anonymous inner classes in tests to lambdas and method references.
• 8% reduction of code in the test module.
Reducing Code with Lambdas
• Anagram group: A group of words which contain exactly the same letters, but in a different order.
• Problem statement: Read a word list, and print out all the anagram groups that meet a size criterion. Inputs:
1. Word List
2. Size: the minimum size of anagram group to print out
Anagrams: Java Collections Tutorial
10: [least, setal, slate, stale, steal, stela, taels, tales, teals, tesla]
12: [apers, apres, asper, pares, parse, pears, prase, presa, rapes, reaps, spare, spear]
11: [alerts, alters, artels, estral, laster, ratels, salter, slater, staler, stelar, talers]
Anagrams: Sample Output
• Approach: Group the list of words by their alphagram.
• Alphagram: An alphagram of a word consists of the word's letters arranged in alphabetical order.
• For example, the alphagram of alphagram is aaaghlmpr.
• If two words have the same alphagram, they are anagrams.
Anagrams with GSC
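The alphagram approach described above can be sketched with the JDK alone. In this illustration the alphagram is a plain String rather than a dedicated Alphagram class, and the grouping uses Collectors.groupingBy instead of the GSC groupBy; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class AnagramSketch {
    // The alphagram of a word is its letters sorted alphabetically.
    static String alphagram(String word) {
        char[] letters = word.toCharArray();
        Arrays.sort(letters);
        return new String(letters);
    }

    // Groups words by alphagram and keeps groups with at least minSize members.
    static List<List<String>> anagramGroups(List<String> words, int minSize) {
        return words.stream()
                .collect(Collectors.groupingBy(AnagramSketch::alphagram))
                .values().stream()
                .filter(group -> group.size() >= minSize)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(alphagram("alphagram")); // prints aaaghlmpr
        List<String> words = Arrays.asList("least", "slate", "dog", "stale");
        System.out.println(anagramGroups(words, 2)); // prints [[least, slate, stale]]
    }
}
```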
private static final Procedure<? super String> PRINT = new Procedure<String>() {
@Override
public void value(String each) {
System.out.println(each);
}
};
private static final int SIZE_THRESHOLD = 10;
private static final Function<RichIterable<String>, Integer> ITERABLE_SIZE_FUNCTION = new ListSizeFunction<>();
private static final Comparator<RichIterable<String>> ASCENDING_ITERABLE_SIZE = Comparators.byFunction(ITERABLE_SIZE_FUNCTION);
private static final Comparator<RichIterable<String>> DESCENDING_ITERABLE_SIZE = Collections.reverseOrder(ASCENDING_ITERABLE_SIZE);
private static final Predicate<RichIterable<String>> ITERABLE_SIZE_AT_THRESHOLD = new Predicate<RichIterable<String>>() {
@Override
public boolean accept(RichIterable<String> each) {
return each.size() >= SIZE_THRESHOLD;
}
};
private static final Function<RichIterable<String>, String> ITERABLE_TO_FORMATTED_STRING =
new Function<RichIterable<String>, String>() {
@Override
public String valueOf(RichIterable<String> list) {
return list.size() + ": " + list;
}
};
private static final Function<String, Alphagram> ALPHAGRAM_FUNCTION =
new Function<String, Alphagram>() {
@Override
public Alphagram valueOf(String string) {
return new Alphagram(string);
}
};
@Test
public void anagramsWithMultimap() {
this.getWords().groupBy(ALPHAGRAM_FUNCTION)
.multiValuesView()
.select(ITERABLE_SIZE_AT_THRESHOLD)
.collect(ITERABLE_TO_FORMATTED_STRING)
.forEach(PRINT);
}
Before Lambdafying
@Test
public void anagramsWithMultimap() {
this.getWords().groupBy(Alphagram::new)
.multiValuesView()
.select(iterable -> iterable.size() >= SIZE_THRESHOLD)
.collect(iterable -> iterable.size() + ": " + iterable)
.forEach((Procedure<String>) System.out::println);
}
After Lambdafying
class Alphagram {
    private final char[] key;

    private Alphagram(String string) {
        this.key = string.toCharArray();
        Arrays.sort(this.key);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (o == null || this.getClass() != o.getClass()) {
            return false;
        }
        Alphagram alphagram = (Alphagram) o;
        return Arrays.equals(this.key, alphagram.key);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(this.key);
    }

    @Override
    public String toString() {
        return new String(this.key);
    }
}
Alphagram
@Test
public void anagramsWithMultimap() {
this.getWords().groupBy(ALPHAGRAM_FUNCTION)
.multiValuesView()
.select(ITERABLE_SIZE_AT_THRESHOLD)
.collect(ITERABLE_TO_FORMATTED_STRING)
.forEach(PRINT);
}
Anagrams
@Test
public void anagramsWithMultimap() {
this.getWords().groupBy(ALPHAGRAM_FUNCTION)
.multiValuesView()
.select(ITERABLE_SIZE_AT_THRESHOLD)
.collect(ITERABLE_TO_FORMATTED_STRING)
.forEach(PRINT);
}
private static final Function<String, Alphagram> ALPHAGRAM_FUNCTION =
new Function<String, Alphagram>() {
@Override
public Alphagram valueOf(String string) {
return new Alphagram(string);
}
};
Anagrams
@Test
public void anagramsWithMultimap() {
this.getWords().groupBy(word -> new Alphagram(word))
.multiValuesView()
.select(ITERABLE_SIZE_AT_THRESHOLD)
.collect(ITERABLE_TO_FORMATTED_STRING)
.forEach(PRINT);
}
Anagrams
@Test
public void anagramsWithMultimap() {
this.getWords().groupBy(Alphagram::new)
.multiValuesView()
.select(ITERABLE_SIZE_AT_THRESHOLD)
.collect(ITERABLE_TO_FORMATTED_STRING)
.forEach(PRINT);
}
Anagrams
@Test
public void anagramsWithMultimap() {
this.getWords().groupBy(Alphagram::new)
.multiValuesView()
.select(ITERABLE_SIZE_AT_THRESHOLD)
.collect(ITERABLE_TO_FORMATTED_STRING)
.forEach(PRINT);
}
private static final Predicate<RichIterable<String>> ITERABLE_SIZE_AT_THRESHOLD =
new Predicate<RichIterable<String>>() {
@Override
public boolean accept(RichIterable<String> each) {
return each.size() >= SIZE_THRESHOLD;
}
};
Anagrams
@Test
public void anagramsWithMultimap() {
this.getWords().groupBy(Alphagram::new)
.multiValuesView()
.select(iterable -> iterable.size() >= SIZE_THRESHOLD)
.collect(ITERABLE_TO_FORMATTED_STRING)
.forEach(PRINT);
}
Anagrams
@Test
public void anagramsWithMultimap() {
this.getWords().groupBy(Alphagram::new)
.multiValuesView()
.select(iterable -> iterable.size() >= SIZE_THRESHOLD)
.collect(ITERABLE_TO_FORMATTED_STRING)
.forEach(PRINT);
}
private static final Function<RichIterable<String>, String> ITERABLE_TO_FORMATTED_STRING =
new Function<RichIterable<String>, String>() {
@Override
public String valueOf(RichIterable<String> list) {
return list.size() + ": " + list;
}
};
Anagrams
@Test
public void anagramsWithMultimap() {
this.getWords().groupBy(Alphagram::new)
.multiValuesView()
.select(iterable -> iterable.size() >= SIZE_THRESHOLD)
.collect(iterable -> iterable.size() + ": " + iterable)
.forEach(PRINT);
}
Anagrams
@Test
public void anagramsWithMultimap() {
this.getWords().groupBy(Alphagram::new)
.multiValuesView()
.select(iterable -> iterable.size() >= SIZE_THRESHOLD)
.collect(iterable -> iterable.size() + ": " + iterable)
.forEach(PRINT);
}
private static final Procedure<? super String> PRINT =
new Procedure<String>() {
@Override
public void value(String each) {
System.out.println(each);
}
};
Anagrams
@Test
public void anagramsWithMultimap() {
this.getWords().groupBy(Alphagram::new)
.multiValuesView()
.select(iterable -> iterable.size() >= SIZE_THRESHOLD)
.collect(iterable -> iterable.size() + ": " + iterable)
.forEach((Procedure<String>) System.out::println);
}
Anagrams