Go Back   CORTEX Forums > Best Practices > Subject Matter Expertise > Data Integration Forum > Data Integration News Feeds
Register Blogs FAQ Members List Calendar Search Today's Posts Mark Forums Read

Dynamic de-normalization of attributes stored in key-value pair tables

This is a discussion on Dynamic de-normalization of attributes stored in key-value pair tables within the Data Integration News Feeds forums, part of the Data Integration Forum category; Dear Kettlers, A couple of years ago I wrote a post about key/value tables and how they can ruin the day of any honest person that wants to create BI ...


Reply
 
LinkBack Thread Tools Search this Thread Display Modes
Old 24th May 2011, 01:05 AM   #1
News Bot
 
Join Date: Nov 2007
Posts: 15,067
Latest News Headlines is on a distinguished road
Post Dynamic de-normalization of attributes stored in key-value pair tables

Dear Kettlers,

A couple of years ago I wrote a post about key/value tables and how they can ruin the day of any honest person that wants to create BI solutions.* The obvious advice I gave back then was to not use those tables in the first place if you’re serious about a BI solution.

However, there are occasions where you need to query a source system and get some report going on them.* Let’s take a look at an example :

mysql> select * from person;+----+-------+----------+| id | name* | lastname |+----+-------+----------+|* 1 | Lex** | Luthor** ||* 2 | Clark | Kent**** ||* 3 | Lois* | Lane**** |+----+-------+----------+3 rows in set (0.00 sec)mysql> select * from person_attribute;+----+-----------+---------------+------------+| id | person_id | attr_key***** | attr_value |+----+-----------+---------------+------------+|* 1 |******** 1 | GENDER******* | M********* ||* 2 |******** 2 | GENDER******* | M********* ||* 3 |******** 3 | GENDER******* | F********* ||* 4 |******** 1 | NATURE******* | EVIL****** ||* 6 |******** 2 | NATURE******* | KIND****** ||* 7 |******** 3 | NATURE******* | KIND****** ||* 8 |******** 1 | PEOPLE_SAVED* | 0********* ||* 9 |******** 2 | PEOPLE_SAVED* | 394239324* || 10 |******** 3 | PEOPLE_SAVED* | 263403**** || 11 |******** 1 | PEOPLE_HARMED | 983439**** || 12 |******** 2 | PEOPLE_HARMED | 0********* || 13 |******** 3 | PEOPLE_HARMED | 29******** |+----+-----------+---------------+------------+12 rows in set (0.00 sec)mysql> select * from attribute_description;+----+---------------+-------------------------+-----------+| id | attr_key***** | attr_description******* | attr_type |+----+---------------+-------------------------+-----------+|* 1 | GENDER******* | Gender (M/F)*********** | String*** ||* 2 | NATURE******* | Nature***************** | String*** ||* 3 | PEOPLE_SAVED* | Number of people saved* | Integer** ||* 4 | PEOPLE_HARMED | Number of people harmed | Integer** |+----+---------------+-------------------------+-----------+It was Pentaho partner OpenBI that gave me the use-case but the data is obviously fictive. In the real-world case, the “person_attribute” table contains over one hundred million rows of data.

The challenge is that you want to have the following columns in your query: person ID, name, last name, gender, nature, number of saved people and number of people harmed.* However, it has to be implemented in such a way that if a new attribute is added to table “attribute_description” a new column will appear in the output.* It has to be 100% maintenance free.

Fortunately, we’ve done some interesting things recently with dynamic transformation to allow us to create the following transformation:



This transformation reads the metadata from the “attribute_description” table and injects that into the following template:



The Inject step is instructed to specify the list of columns to denormalize and take the output from that specific step:



As you can see in the next screen shot image from the previewing of data, all defined attributes are de-normalized correctly with the appropriate data types:



Now we can add an attribute to see if our transformation takes it into account:
mysql> insert into attribute_description(attr_key, attr_description, attr_type) values(’NUM_FRIENDS’, ‘Number of friends’, ‘Integer’);
Query OK, 1 row affected (0.06 sec)

mysql> insert into person_attribute(person_id, attr_key, attr_value) values(1, ‘NUM_FRIENDS’, ‘5′);
Query OK, 1 row affected (0.00 sec)

mysql> insert into person_attribute(person_id, attr_key, attr_value) values(2, ‘NUM_FRIENDS’, ‘34′);
Query OK, 1 row affected (0.00 sec)

mysql> insert into person_attribute(person_id, attr_key, attr_value) values(3, ‘NUM_FRIENDS’, ‘12′);
Query OK, 1 row affected (0.00 sec)

After updating our tables like this we can now preview the exact same step.* As you can tell from the following preview screen shot the extra column is automatically being picked up:



Now obviously, from a maintenance perspective this helps a great deal in when you are building a data warehouse.* However, it also allows us to start thinking about doing reporting and analyses on these kinds of dynamic data sources.* How can we explain to Pentaho Metadata that attributes should be picked up automatically?* How can we refresh Mondrian schema information on-the-fly when the new attribute is introduced?* While the answers to those questions are not yet 100% clear, I’m sure it’s going to be an interesting discussion.

Until next time,

Matt

P.S. Please note you’ll need a very recent build of PDI 4.2.0-M2 to be able to pass along data from a dynamic transformation.



More from Matt Casters on Data Integration (Pentaho) Blog...
Latest News Headlines is offline  
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiTweet this Post!
Reply With Quote
Reply

Bookmarks

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is On
Trackbacks are On
Pingbacks are On
Refbacks are On


Similar Threads
Thread Thread Starter Forum Replies Last Post
League Tables Latest News Headlines Latest News 0 13th May 2011 07:18 AM
4 Key Tweeting Attributes of Twitter Legend, Guy Kawasaki Latest News Headlines Other International Vendors 0 17th July 2010 03:20 AM
Post hires NAB pair Latest News Headlines 2010 Q2 News Headlines 0 6th May 2010 11:19 AM
Loading a dimension table with SCD1 and SCD2 attributes Latest News Headlines DWH Tip Feeds 0 25th November 2009 11:03 AM
The tspair package for finding top scoring pair classifiers in R. admin Analytic News Feeds 0 18th November 2009 11:38 AM


All times are GMT +11. The time now is 05:56 PM.

© The Business Intelligence Group

Search Engine Optimization by vBSEO