| |
| ||||||
Roll your own high-performance Java collections classesThis is a discussion on Roll your own high-performance Java collections classes within the Open Source News and Opinion forums, part of the Open Source Analytics category; The Java collections framework is great. You can create maps, sets, lists with various element types, various performance characteristics (e.g. if you want O(1) insert, use a linked list), iterate ... |
![]() |
| | LinkBack | Thread Tools | Search this Thread | Display Modes |
| | #1 |
| News Bot Join Date: Nov 2007
Posts: 15,085
![]() | The Java collections framework is great. You can create maps, sets, lists with various element types, various performance characteristics (e.g. if you want O(1) insert, use a linked list), iterate over them, and you can decorate them to give them other behaviors. But suppose that you want to create a high-performance, memory efficient immutable list of integers? You'd write List list = Collections.unmodifiableList( new ArrayList( Arrays.asList(1000, 1001, 1002))); There will be 6 objects allocated in the JVM: three Integer objects, an array Object[3] to hold the Integers, an ArrayList, and an UnmodifiableRandomAccessList. Not to mention the Arrays.ArrayList and Integer[3] used to construct the list and quickly thrown away. The resulting list is no longer high-performance. A call to say 'int n = list.get(2)' requires 3 method calls (UnmodifiableRandomAccessList.get, ArrayList.get, Integer.intValue) and 3 indirections. And the sheer number of objects created reduces the chance that a given stretch of code will be able to operate solely from the contents of L1 cache. So, what next? Should I write my own class, like this? public class UnmodifiableNativeIntArrayList implements List { ... } Well, maybe. But there are rather a lot of variations to cover, and each one needs to be hand-coded and tested. Do I use library code? I searched and turned up Apache Commons Primitives, Primitive Collections for Java (PCJ), and GNU Trove (trove4j). Of these, only GNU Trove is still active. None of the libraries supports features such as maps with two or more keys, unmodifiable collections, synchronized collections, flat collections (similar to Apache Flat3Map). It's not surprising that they don't: each combination of features would require its own class, so the size of the jar file would grow exponentially. So, I'd like to propose an alternate approach. You configure a factory, specifying the precise kind of collection you would like, and the factory generates the collection class in bytecode. You can use the factory to quickly create as many instances of the collection as you wish. The collection implements the Java collections interfaces, plus additional interfaces that allow you to efficiently access the collection without boxing/unboxing. The above example would be written as follows: // Initialize the factory when the program is loaded. // Then the bytecode gets generated just once. static final Factory factory = new FactoryBuilder() .list() .elementType(Integer.TYPE) .modifiable(false) .factory(); int[] ints = {1000, 1001, 1002}; IntList list = factory.createIntList(ints); Variants are expressed as FactoryBuilder methods:
And so forth. Additional variants could be added as the project evolved. Templates could be fine-tuned for particular combinations of variants. The projects I mentioned above clearly use a template system, and we could use and extend those templates. The janino facility can easily convert the generated java code into bytecode. And the JVM would be able to apply JIT (just-in-time compilation) to these classes; in fact, these classes would be more amenable to compilation, because they would be compact and final. The existing projects have invested a lot of effort designing high-performance collections. I'd like to build on that work; this project could even be an extension to those projects. I'd like to hear if you're interested in working with me on this. More from Julian Hyde on Open Source OLAP Blog ... |
| | |
![]() |
| Bookmarks |
| Thread Tools | Search this Thread |
| Display Modes | |
| |
Similar Threads | ||||
| Thread | Thread Starter | Forum | Replies | Last Post |
| MicroStrategy Announces New High Performance Business Intelligence Course | Latest News Headlines | Other International Vendors | 0 | 13th May 2011 01:42 AM |
| CBE Group to drive performance, increase efficiency of collections with SAS® Analytic | Latest News Headlines | SAS Forum | 0 | 15th March 2011 01:09 AM |
| Slides from High-Performance Analytics webinar now available | admin | Analytic News Feeds | 0 | 15th April 2010 09:25 AM |
| Webinar: High-Performance Analytics with R and Microsoft HPC Server | admin | Analytic News Feeds | 0 | 19th March 2010 03:31 AM |
| Improved collections classes for Mondrian's query execution process | Latest News Headlines | Open Source News and Opinion | 0 | 17th February 2010 01:17 PM |
| | |
| | |