Pentaho Data Integration version 3.0, change log versus version 2.5, November 14th, 2007
Pentaho Data Integration
version 3.0
Making a fresh start
Key differences with version 2.5.0
List of changes on November 14th, 2007
Compiled by Matt Casters, mcasters (at) pentaho.org
Please send any additional changes you find to this address.
Index
1. Changes summary
1.1. Preface
1.2. Overview
2. General changes
2.1. New core API
2.1.1. Data and meta-data split
2.1.2. Lazy conversion
2.2. Improved preview and debug
2.3. Revised Spoon GUI
2.3.1. New icons
2.3.2. Customization of Spoon
2.4. Mappings on steroids
2.5. New steps
2.5.1. CSV Input
2.5.2. Fixed Input
2.5.3. Access Input
2.5.4. LDAP Input
2.5.5. Append streams
2.5.6. Abort
2.5.7. Regex Evaluation
2.5.8. Mondrian Input
2.5.9. Closure generator
2.5.10. Get files rows count
2.6. New Job entries
2.6.1. XSL transformation
2.6.2. Success
2.6.3. XSD validator
2.6.4. DTD validator
2.6.5. Write to log file
2.6.6. Copy files
2.6.7. Put a file with FTP
2.6.8. Unzip files
2.7. Databases
2.8. Slave server improvements
2.8.1. Remote execution of jobs
2.8.2. Improved partitioning support
3. Source code improvements
3.1. A few extra lines of code
3.2. Core committers
3.3. Translators
3.4. Commit stats
3.5. Feature requesters
1. Changes summary
1.1. Preface
Creating Pentaho Data Integration version 3.0 was a challenge in every sense of the word. When you
start the new and improved Spoon, you will notice the new look of the icons. However, the changes
in the core API are much bigger than what meets the eye at first glance.
At a certain point during development you come to the realization that certain aspects of the existing
software design are not turning out quite the way you wanted or envisioned at the start. This realization is
at the core of most evolutions in software development, and PDI 3.0 is no exception. We tried to get rid
of every bottleneck and inconsistency we could think of. Now that we have in fact removed them, we
believe that we will be able to evolve even faster in the future than we already do.
In this release we have again managed to attract more people from the community to help out with
coding, documentation, translation, packaging (w32 and OSX installers), bug reporting and a lot more.
This document was written as a special “thank you” note to all the people involved and to keep
everyone informed about the enormous progress we are making.
1.2. Overview
These are the most notable changes that have been made:
● New core API
  • Split of data and meta-data, offering greater flexibility and performance
  • Centralized data conversion API with accurate and enhanced error reporting
  • Simplified and unified preview and debug architecture (with listeners)
  • Advanced data storage engine
    • Normal storage: native data types (as before)
    • Binary string storage: for lazy conversion (see below)
    • Indexed storage: the data is a simple integer index that references a table of contents
  • Improved step and job plug-in system with annotations:
    http://wiki.pentaho.org/display/EAI/Annotated+step+plugin+development
● Improved transformation preview and debug
  • Preview with “Get more rows” functionality
  • Transformation start, pause, resume and stop
  • Debugger with conditions (breakpoint pause)
● New high-performance text file handling
  • Lazy conversion helps avoid String data conversion when it is not really needed
  • New high-performance “CSV Input” step with non-blocking I/O (NIO) support
  • New high-performance “Fixed Input” step with NIO and parallel reading support
● Revised Spoon GUI
  • New set of icons with a consistent color scheme
  • First steps towards full XUL support (menu, toolbar, ...)
  • More possibilities for OEM customization of Spoon's look and feel
● Mappings on steroids
  • Allows zero or more input streams
  • Allows zero or more output streams
  • Can be parameterized using variables
  • Supports global field renames on input and output
● Other new steps
  • Access Input: reads MS-Access MDB files directly
  • LDAP Input: reads information from an LDAP server
  • Append streams: appends two streams in order
  • Abort: aborts a transformation if one or more rows are read
  • Regex Evaluation: evaluates regular expressions
  • Mondrian Input: reads data from a Mondrian server using MDX
  • Closure generator: generates transitive closures for use with Mondrian
  • Get files rows count: counts the number of rows in files
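The three storage types in the overview, together with lazy conversion, can be illustrated with a minimal Java sketch. It does not reproduce the PDI API: the names `StorageSketch`, `ValueMetaSketch`, `StorageType` and `getInteger` are hypothetical. The sketch assumes that row data is a plain `Object[]` while separate metadata records how each value is stored, so a field read as raw bytes from a text file is only parsed into a number when something asks for it, and an indexed value is just an integer pointing into a table of distinct values.

```java
import java.nio.charset.StandardCharsets;

public class StorageSketch {

    // Hypothetical storage types mirroring the three kinds listed in the change log.
    enum StorageType { NORMAL, BINARY_STRING, INDEXED }

    // Hypothetical field metadata: it describes how to interpret a raw value in a row.
    static final class ValueMetaSketch {
        final String name;
        final StorageType storageType;
        final Object[] index; // table of distinct values, used only for INDEXED storage

        ValueMetaSketch(String name, StorageType storageType, Object[] index) {
            this.name = name;
            this.storageType = storageType;
            this.index = index;
        }

        // Returns the value as a Long, converting it only at the moment it is requested.
        Long getInteger(Object value) {
            if (value == null) {
                return null;
            }
            switch (storageType) {
                case NORMAL:
                    // Already stored as a native Java type.
                    return (Long) value;
                case BINARY_STRING:
                    // Lazy conversion: the raw bytes read from the file are parsed on demand.
                    String text = new String((byte[]) value, StandardCharsets.UTF_8);
                    return Long.parseLong(text.trim());
                case INDEXED:
                    // The row only holds an int index into the table of distinct values.
                    return (Long) index[(Integer) value];
                default:
                    throw new IllegalStateException("Unknown storage type " + storageType);
            }
        }
    }

    public static void main(String[] args) {
        ValueMetaSketch id = new ValueMetaSketch("id", StorageType.NORMAL, null);
        ValueMetaSketch amount = new ValueMetaSketch("amount", StorageType.BINARY_STRING, null);
        ValueMetaSketch year = new ValueMetaSketch("year", StorageType.INDEXED,
                new Object[] { 2006L, 2007L });

        // The row data is a plain Object[]; the metadata above is kept separately.
        Object[] row = { 42L, "  1500".getBytes(StandardCharsets.UTF_8), 1 };

        System.out.println(id.getInteger(row[0]));     // 42
        System.out.println(amount.getInteger(row[1])); // 1500
        System.out.println(year.getInteger(row[2]));   // 2007
    }
}
```

In this sketch a value that is never requested is never parsed, which is the saving lazy conversion aims for when rows are merely passed from one step to the next.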
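The listener-based preview and debug architecture from the overview can be pictured in the same hypothetical style. In the following sketch, `DebugSketch`, `RowListener` and `StepSketch` are illustrative names, not classes from the PDI source. A step notifies its registered listeners for every row it writes, a preview listener simply collects those rows, and a debug listener evaluates a breakpoint condition and pauses the step the first time the condition holds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class DebugSketch {

    // Hypothetical callback, invoked for every row a step writes to its output.
    interface RowListener {
        void rowWritten(Object[] row);
    }

    // Hypothetical step that pushes rows to its listeners and honours a pause flag.
    static final class StepSketch {
        private final List<RowListener> listeners = new ArrayList<>();
        private volatile boolean paused = false; // volatile so another thread could pause it

        void addRowListener(RowListener listener) { listeners.add(listener); }
        void pause() { paused = true; }
        boolean isPaused() { return paused; }

        // Writes rows until the input is exhausted or a listener pauses the step;
        // returns the number of rows written.
        int run(List<Object[]> input) {
            int written = 0;
            for (Object[] row : input) {
                if (paused) {
                    break;
                }
                written++;
                for (RowListener listener : listeners) {
                    listener.rowWritten(row);
                }
            }
            return written;
        }
    }

    public static void main(String[] args) {
        List<Object[]> input = new ArrayList<>();
        for (long i = 1; i <= 10; i++) {
            input.add(new Object[] { i, "customer-" + i });
        }

        StepSketch step = new StepSketch();

        // Preview: collect every row that passes through the step.
        List<Object[]> preview = new ArrayList<>();
        step.addRowListener(preview::add);

        // Debugger: pause as soon as the breakpoint condition holds.
        Predicate<Object[]> breakpoint = row -> (Long) row[0] == 4L;
        step.addRowListener(row -> {
            if (breakpoint.test(row)) {
                step.pause();
            }
        });

        int written = step.run(input);
        System.out.println("Rows written before pause: " + written); // 4
        System.out.println("Paused: " + step.isPaused());            // true
        System.out.println("Preview size: " + preview.size());       // 4
    }
}
```

Running `main` pauses after the fourth row, so the preview holds four rows. A real transformation runs its steps in separate threads, which this single-threaded sketch deliberately ignores.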