| |
| ||||||
Dynamic de-normalization of attributes stored in key-value pair tablesThis is a discussion on Dynamic de-normalization of attributes stored in key-value pair tables within the Data Integration News Feeds forums, part of the Data Integration Forum category; Dear Kettlers, A couple of years ago I wrote a post about key/value tables and how they can ruin the day of any honest person that wants to create BI ... |
![]() |
| | LinkBack | Thread Tools | Search this Thread | Display Modes |
| | #1 |
| News Bot Join Date: Nov 2007
Posts: 15,067
![]() | Dear Kettlers, A couple of years ago I wrote a post about key/value tables and how they can ruin the day of any honest person that wants to create BI solutions.* The obvious advice I gave back then was to not use those tables in the first place if you’re serious about a BI solution. However, there are occasions where you need to query a source system and get some report going on them.* Let’s take a look at an example : mysql> select * from person;+----+-------+----------+| id | name* | lastname |+----+-------+----------+|* 1 | Lex** | Luthor** ||* 2 | Clark | Kent**** ||* 3 | Lois* | Lane**** |+----+-------+----------+3 rows in set (0.00 sec)mysql> select * from person_attribute;+----+-----------+---------------+------------+| id | person_id | attr_key***** | attr_value |+----+-----------+---------------+------------+|* 1 |******** 1 | GENDER******* | M********* ||* 2 |******** 2 | GENDER******* | M********* ||* 3 |******** 3 | GENDER******* | F********* ||* 4 |******** 1 | NATURE******* | EVIL****** ||* 6 |******** 2 | NATURE******* | KIND****** ||* 7 |******** 3 | NATURE******* | KIND****** ||* 8 |******** 1 | PEOPLE_SAVED* | 0********* ||* 9 |******** 2 | PEOPLE_SAVED* | 394239324* || 10 |******** 3 | PEOPLE_SAVED* | 263403**** || 11 |******** 1 | PEOPLE_HARMED | 983439**** || 12 |******** 2 | PEOPLE_HARMED | 0********* || 13 |******** 3 | PEOPLE_HARMED | 29******** |+----+-----------+---------------+------------+12 rows in set (0.00 sec)mysql> select * from attribute_description;+----+---------------+-------------------------+-----------+| id | attr_key***** | attr_description******* | attr_type |+----+---------------+-------------------------+-----------+|* 1 | GENDER******* | Gender (M/F)*********** | String*** ||* 2 | NATURE******* | Nature***************** | String*** ||* 3 | PEOPLE_SAVED* | Number of people saved* | Integer** ||* 4 | PEOPLE_HARMED | Number of people harmed | Integer** |+----+---------------+-------------------------+-----------+It was Pentaho partner OpenBI that gave me the use-case but the data is obviously fictive. In the real-world case, the “person_attribute” table contains over one hundred million rows of data. The challenge is that you want to have the following columns in your query: person ID, name, last name, gender, nature, number of saved people and number of people harmed.* However, it has to be implemented in such a way that if a new attribute is added to table “attribute_description” a new column will appear in the output.* It has to be 100% maintenance free. Fortunately, we’ve done some interesting things recently with dynamic transformation to allow us to create the following transformation: ![]() This transformation reads the metadata from the “attribute_description” table and injects that into the following template: ![]() The Inject step is instructed to specify the list of columns to denormalize and take the output from that specific step: ![]() As you can see in the next screen shot image from the previewing of data, all defined attributes are de-normalized correctly with the appropriate data types: ![]() Now we can add an attribute to see if our transformation takes it into account: mysql> insert into attribute_description(attr_key, attr_description, attr_type) values(’NUM_FRIENDS’, ‘Number of friends’, ‘Integer’);After updating our tables like this we can now preview the exact same step.* As you can tell from the following preview screen shot the extra column is automatically being picked up: ![]() Now obviously, from a maintenance perspective this helps a great deal in when you are building a data warehouse.* However, it also allows us to start thinking about doing reporting and analyses on these kinds of dynamic data sources.* How can we explain to Pentaho Metadata that attributes should be picked up automatically?* How can we refresh Mondrian schema information on-the-fly when the new attribute is introduced?* While the answers to those questions are not yet 100% clear, I’m sure it’s going to be an interesting discussion. Until next time, Matt P.S. Please note you’ll need a very recent build of PDI 4.2.0-M2 to be able to pass along data from a dynamic transformation. More from Matt Casters on Data Integration (Pentaho) Blog... |
| | |
![]() |
| Bookmarks |
| Thread Tools | Search this Thread |
| Display Modes | |
| |
Similar Threads | ||||
| Thread | Thread Starter | Forum | Replies | Last Post |
| League Tables | Latest News Headlines | Latest News | 0 | 13th May 2011 07:18 AM |
| 4 Key Tweeting Attributes of Twitter Legend, Guy Kawasaki | Latest News Headlines | Other International Vendors | 0 | 17th July 2010 03:20 AM |
| Post hires NAB pair | Latest News Headlines | 2010 Q2 News Headlines | 0 | 6th May 2010 11:19 AM |
| Loading a dimension table with SCD1 and SCD2 attributes | Latest News Headlines | DWH Tip Feeds | 0 | 25th November 2009 11:03 AM |
| The tspair package for finding top scoring pair classifiers in R. | admin | Analytic News Feeds | 0 | 18th November 2009 11:38 AM |
| | |
| | |