Go Back   CORTEX Forums > Vendors and Service Provders > Open Source Analytics > Open Source News and Opinion
Register Blogs FAQ Members List Calendar Search Today's Posts Mark Forums Read

Roll your own high-performance Java collections classes

This is a discussion on Roll your own high-performance Java collections classes within the Open Source News and Opinion forums, part of the Open Source Analytics category; The Java collections framework is great. You can create maps, sets, lists with various element types, various performance characteristics (e.g. if you want O(1) insert, use a linked list), iterate ...


Reply
 
LinkBack Thread Tools Search this Thread Display Modes
Old 4th June 2011, 07:29 AM   #1
News Bot
 
Join Date: Nov 2007
Posts: 15,085
Latest News Headlines is on a distinguished road
Post Roll your own high-performance Java collections classes

The Java collections framework is great. You can create maps, sets, lists with various element types, various performance characteristics (e.g. if you want O(1) insert, use a linked list), iterate over them, and you can decorate them to give them other behaviors.

But suppose that you want to create a high-performance, memory efficient immutable list of integers? You'd write

List list =

Collections.unmodifiableList(

new ArrayList(

Arrays.asList(1000, 1001, 1002)));

There will be 6 objects allocated in the JVM: three Integer objects, an array Object[3] to hold the Integers, an ArrayList, and an UnmodifiableRandomAccessList. Not to mention the Arrays.ArrayList and Integer[3] used to construct the list and quickly thrown away.

The resulting list is no longer high-performance. A call to say 'int n = list.get(2)' requires 3 method calls (UnmodifiableRandomAccessList.get, ArrayList.get, Integer.intValue) and 3 indirections. And the sheer number of objects created reduces the chance that a given stretch of code will be able to operate solely from the contents of L1 cache.

So, what next? Should I write my own class, like this?

public class UnmodifiableNativeIntArrayList

implements List

{

...

}

Well, maybe. But there are rather a lot of variations to cover, and each one needs to be hand-coded and tested.

Do I use library code? I searched and turned up Apache Commons Primitives, Primitive Collections for Java (PCJ), and GNU Trove (trove4j). Of these, only GNU Trove is still active.

None of the libraries supports features such as maps with two or more keys, unmodifiable collections, synchronized collections, flat collections (similar to Apache Flat3Map). It's not surprising that they don't: each combination of features would require its own class, so the size of the jar file would grow exponentially.

So, I'd like to propose an alternate approach. You configure a factory, specifying the precise kind of collection you would like, and the factory generates the collection class in bytecode. You can use the factory to quickly create as many instances of the collection as you wish. The collection implements the Java collections interfaces, plus additional interfaces that allow you to efficiently access the collection without boxing/unboxing.

The above example would be written as follows:

// Initialize the factory when the program is loaded.

// Then the bytecode gets generated just once.

static final Factory factory =

new FactoryBuilder()

.list()

.elementType(Integer.TYPE)

.modifiable(false)

.factory();



int[] ints = {1000, 1001, 1002};

IntList list = factory.createIntList(ints);

Variants are expressed as FactoryBuilder methods:
  • FactoryBuilder FactoryBuilder.list()
  • FactoryBuilder FactoryBuilder.map()
  • FactoryBuilder FactoryBuilder.set()
  • FactoryBuilder FactoryBuilder.keyType(Class...) (for maps only)
  • FactoryBuilder FactoryBuilder.valueType(Class...) (for maps only)
  • FactoryBuilder FactoryBuilder.elementType(Class...) (for list and set only)
  • FactoryBuilder FactoryBuilder.sorted(boolean) (cf. the difference between Set and SortedSet)
  • FactoryBuilder FactoryBuilder.deterministic(boolean) (cf. the difference between HashMap and
  • LinkedHashMap)
  • FactoryBuilder FactoryBuilder.modifiable(boolean)
  • FactoryBuilder FactoryBuilder.fixedSize(boolean) (cf. the difference between Flat3Map and Map)
  • FactoryBuilder FactoryBuilder.synchronized(boolean)

And so forth. Additional variants could be added as the project evolved. Templates could be fine-tuned for particular combinations of variants.

The projects I mentioned above clearly use a template system, and we could use and extend those templates. The janino facility can easily convert the generated java code into bytecode. And the JVM would be able to apply JIT (just-in-time compilation) to these classes; in fact, these classes would be more amenable to compilation, because they would be compact and final.

The existing projects have invested a lot of effort designing high-performance collections. I'd like to build on that work; this project could even be an extension to those projects.

I'd like to hear if you're interested in working with me on this.


More from Julian Hyde on Open Source OLAP Blog ...
Latest News Headlines is offline  
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiTweet this Post!
Reply With Quote
Reply

Bookmarks

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is On
Trackbacks are On
Pingbacks are On
Refbacks are On


Similar Threads
Thread Thread Starter Forum Replies Last Post
MicroStrategy Announces New High Performance Business Intelligence Course Latest News Headlines Other International Vendors 0 13th May 2011 01:42 AM
CBE Group to drive performance, increase efficiency of collections with SAS® Analytic Latest News Headlines SAS Forum 0 15th March 2011 01:09 AM
Slides from High-Performance Analytics webinar now available admin Analytic News Feeds 0 15th April 2010 09:25 AM
Webinar: High-Performance Analytics with R and Microsoft HPC Server admin Analytic News Feeds 0 19th March 2010 03:31 AM
Improved collections classes for Mondrian's query execution process Latest News Headlines Open Source News and Opinion 0 17th February 2010 01:17 PM


All times are GMT +11. The time now is 07:55 AM.

© The Business Intelligence Group

Search Engine Optimization by vBSEO