
What's new in 4.2.0

18th June 2011, 01:37 AM   #1
News Bot
 

Dear Kettle fans,

Instead of pointing to the impressive list of changes in JIRA, I took the time out to build a high-level overview of all the new big-ticket items that are going to be in the upcoming version 4.2 of Kettle (Pentaho Data Integration). Allow me to share it with you:
  • The Excel Writer step offers advanced Excel output functionality to control
  • Graphical performance and progress feedback for transformations
  • The Google Analytics step allows download of statistics from your Google analytics account
  • The Pentaho Reporting Output step makes it possible for you to run your (parameterized) Pentaho reports in a transformation. It allows for easy report bursting of personalized reports.
  • The Automatic Documentation step generates (simple) documentation of your transformations and jobs using the Pentaho Reporting API.
  • The Get repository names step retrieves job and transformation information from your repositories.
  • The LDAP Writer step
  • The Ingres VectorWise (streaming) bulk loader step
  • The Greenplum (streaming) bulk loader step (for gpload)
  • The Talend Job Execution job entry
  • Health Level 7 (HL7): HL7 Input step, HL7 MLLP Input and HL7 MLLP Acknowledge job entries
  • The PGP File Encryption, Decryption & validation job entries facilitate encryption and decryption of files using PGP.
  • The Single Threader step for parallel performance tuning of large transformations
  • Allow a job to be started at a job entry of your choice (continue after fixing an error)
  • The MongoDB Input step (including authentication)
  • The ElasticSearch bulk loader
  • The Get ID from slave server step to get a globally unique Integer ID, for example for clustered transformations.
  • The XML Input Stream (StAX) step to read huge XML files at optimal performance and flat memory usage by flattening the structure of the data.
  • New Get ID from Slave Server step allows multi-host or clustered transformations to get globally unique integer IDs: http://wiki.pentaho.com/display/EAI/...m+Slave+Server
  • Carte improvements:
    1. reserve next value range from a slave sequence service
    2. allow parallel (simultaneous) runs of clustered transformations
    3. list (reserved and free) socket reservations service
    4. new options in XML for configuring slave sequences
    5. allow time-out of stale objects using environment variable KETTLE_CARTE_OBJECT_TIMEOUT_MINUTES
  • Memory tuning of the logging back-end with KETTLE_MAX_LOGGING_REGISTRY_SIZE, KETTLE_MAX_JOB_ENTRIES_LOGGED and KETTLE_MAX_JOB_TRACKER_SIZE, allowing for flat memory usage for never-ending ETL in general and jobs specifically.
  • Repository Import/Export
    1. Export at the repository folder level
    2. Export and Import with optional rule-based validations
    3. Import command line utility allows for rule-based (optional) import of lists of transformations, jobs and repository export files: http://wiki.pentaho.com/display/EAI/...+Documentation
  • ETL Metadata Injection:
    1. Retrieval of rows of data from a step to the “metadata injection” step
    2. Support for injection into the “Excel Input” step
    3. Support for injection into the “Row normaliser” step
    4. Support for injection into the “Row Denormaliser” step
  • The Multiway Merge Join step (experimental) allows for any number of data sources to be joined using one or more keys using an inner or a full outer join algorithm.
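
To make the last item a little more concrete, here is a minimal Python sketch of what joining any number of sources on one key looks like. It is not how the Kettle step is implemented: the function name `multiway_join`, the `how` argument and the (key, value) row layout are all assumptions made for illustration, and it assumes keys are unique within each source.

```python
from typing import Any, Dict, Hashable, List, Sequence, Tuple

Row = Tuple[Hashable, Any]


def multiway_join(sources: Sequence[Sequence[Row]], how: str = "inner") -> List[Tuple]:
    """Join any number of (key, value) sources on their key.

    Hypothetical helper for illustration only; assumes keys are unique
    within each source and can be sorted across sources.
    """
    if how not in ("inner", "full_outer"):
        raise ValueError("how must be 'inner' or 'full_outer'")

    # Index every source by key so each lookup is a dictionary access.
    indexes: List[Dict[Hashable, Any]] = [dict(src) for src in sources]
    if not indexes:
        return []

    key_sets = [set(idx) for idx in indexes]
    if how == "inner":
        keys = set.intersection(*key_sets)  # key must appear in every source
    else:
        keys = set.union(*key_sets)         # key may appear in any source

    # One output row per key; a source without that key contributes None.
    return [(k, *(idx.get(k) for idx in indexes)) for k in sorted(keys)]
```

With three small sources, the inner join keeps only the keys present everywhere, while the full outer join keeps every key and pads the gaps with `None`.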
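
The idea of reserving value ranges from a sequence service, mentioned above for Carte and the Get ID from Slave Server step, can be sketched roughly as follows. This is an in-process illustration only and not Carte's API: the class names `SequenceService` and `IdClient` and the `increment` parameter are assumptions. Each client reserves a whole block of IDs at once and then hands them out locally, so it only needs to contact the central service when its block runs out.

```python
import threading
from typing import Tuple


class SequenceService:
    """Hypothetical central service that hands out non-overlapping ID ranges."""

    def __init__(self, start: int = 1, increment: int = 1000) -> None:
        self._next = start
        self._increment = increment
        self._lock = threading.Lock()

    def reserve_range(self) -> Tuple[int, int]:
        """Reserve the next block of IDs and return it as (first, last), inclusive."""
        with self._lock:
            first = self._next
            self._next += self._increment
            return first, self._next - 1


class IdClient:
    """Hands out IDs locally; asks the service only when its range is used up."""

    def __init__(self, service: SequenceService) -> None:
        self._service = service
        self._current = 0
        self._last = -1  # empty range forces a reservation on first use

    def next_id(self) -> int:
        if self._current > self._last:
            self._current, self._last = self._service.reserve_range()
        value = self._current
        self._current += 1
        return value
```

Because the ranges never overlap, two clients sharing one service never produce the same ID, even though most calls to `next_id` involve no communication at all. The trade-off is that IDs are unique but not gap-free: unused values in a reserved range are simply lost.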
Beyond this list there’s, as mentioned, a long list of bug fixes and small improvements to the various steps and job entries. It’s impossible to thank the complete community for all the contributions they’ve made to make this release a smashing success. If you think it feels more like a 5.0 version, please remember that we’re pretty conservative about version numbering. As long as we don’t break our own Java API we won’t go to another major version.

Also remember you can try out all these new features right now by using a CI build, or once the RC1 build is posted on SourceForge later on. Please help our QA team by posting any issues you might find in JIRA.

Last but certainly not least, let’s not forget to mention the upcoming exciting features of the new Pentaho BI Server version 4. I won’t spoil the surprise for you but I can tell you that certain things in that new release are looking really (really!) nice. Next Thursday (Europe – 13:00 GMT/UTC, 9:00am EST, Americas – 1:00pm EST, 10:00am PST) you can join us for a web conference with live demo. Please register here if you are interested.

Have fun with the new Pentaho software releases!

Regards,
Matt



More from Matt Casters on Data Integration (Pentaho) Blog...