Search This Blog

Sunday, January 10, 2010

Different granularity in a single dimension

SQL Server 2005 | SQL Server 2008

Handling different granularity (for example actual and plan values) can get a little bit complicated. Of course there are standard methods, like splitting up the less granular data in order to meet the finer granularity. Or you could use a parent-child structure as this allows you to store data at different levels in the tree-like structure. Or you could supply ‘unknown’ elements to map the less granular information.

For this post I want to show a different approach. Usually for each dimension we are linking all fact tables that refer to this dimension to the same key in the dimension table (the dimension’s primary key). However, the data source view also allows us to link facts to different key columns in the same dimension table.

In my simple scenario I have time dimension (called DimDate) and two fact tables: Order and Order Plan. The orders are on a daily basis while the order plan is on a monthly basis. We want to link both fact tables to the same time dimension as shown below:

image_thumb11

The link from the fact order table to the time dimension DimDate (marked as 1 in the sketch above) is the usual link from the fact table to the primary key of the DimDate table. The time dimension is on daily granularity and so are the order facts. But for the plan value fact table FactOrderPlan, the link to the time dimension is realized by using two key columns: Year (as the year number, eg. 2008) and month (as the month number, eg. 11 for November), so the link in the data source view looks like this:

image1_thumb

For our cube it is important to specify the right granularity attribute. While the order table is linked to the Day (granularity attribute), we link the order plan fact table to the month attribute and define the proper key mappings for that.

image_thumb1

Now, the dimension usage looks like this:

image_thumb3

We also defined proper attribute relationship for the time dimension accordingly to build up a year->quarter->month->day hierarchy.

So let’s take the first look at the cube created by this method:

image_thumb5

This first look is as expected. As long as we are at a common granularity level shared by both fact tables, we can see the values correctly. Also, the aggregation of both fact sources works fine (although they are at different granularity).

Now, let’s drill down to the day level which is not present in our planning data:

image_thumb7

The behavior here is exactly the same as for other unrelated dimensions! The value of the nearest matching hierarchy is taken for the levels below. Sometimes this behavior of the cube confuses the users, but we can still change this behavior by changing the parameter IgnoreUnrelatedDimensions:

image_thumb12

Now, the planning values below the month granularity level have disappeared:

image_thumb10

However, if you want to create a calculated measure that is also based on the planning values, you should be aware of the fact, that the values are simply not existing any more at the day level. For example, let’s define a calculated measure PlanFulFillment using the following expression:

[Measures].[Amount] / [Measures].[Amount Plan]

At the day level, the measure Amount Plan does not exist, so this results in computation errors:

image_thumb14

You could still use the non-empty behavior for the calculate measure (set to “Amount Plan”) in order to have these computations disappear. However, if you want to refer to the monthly value, you can simple use the ValidMeasure MDX function that is always helpful in conjunction with IgnoreUnrelatedDimensions=false. So after defining our calculated measure as [Measures].[Amount] / ValidMeasure([Measures].[Amount Plan]) the result looks like this (at day level, the monthly values for the planning data is taken)

image_thumb16

So, surprisingly enough (at least for me), everything behaves exactly like we wanted it to do and this makes the approach to an alternative in some scenarios. Again, please check your attribute relationship carefully and also spend some time on testing the result as the approach can get dangerous for more complicated attributes structures.

I also checked this design with more attributes and parallel hierarchies in the time dimension (for example calendar week) and more fact tables (for example production plan) and the aggregation was still correct. Having IgnoredUnrelatedDimensions set to false is helpful here to clearly see, which fact is selected at the right granularity level.

Following is an example with three fact tables (one at day level, one at month level and one at week level):

image_thumb2

12 comments:

  1. I want to say thank you for your great post.

    While I think this is a great method, being new to BI I am also curious on what you meant by "use a parent-child structure as this allows you to store data at different levels in the tree-like structure."

    Can you elaborate or point me to any write-up on the above method?

    TIA

    ReplyDelete
  2. very good job, congratulations. so keep! ;)
    [bryq]

    ReplyDelete
  3. I have followed these instructions exactly, but I cannot get it to work. When I go to define the relationship and I select "granularity - Month", I still only get "Month" under measure group columns, not "Month" and "Year".

    This means I get the total over all years appearing next to each month - which is not what I want!

    I think the problem is in how the attribute relationships are defined, but I cannot find a good example. Are you able to shed any light on this?

    ReplyDelete
  4. Miles, you will need to create attributes with the corresponding keys in the dimension. For example, you can create an attribute MonthOfYear (Key columns Year, Month). If you now link your fact table to that dimension and choose MonthOfYear (in the define relationship dialog) as the granularity attribute you will see both key columns to be mapped to the corresponding measure group fields.
    Of course you will need to have the corresponding fields in your measure group (maybe as calculated columns) too and you should have defined this relationship in the data source view (although I don't think that this is required here).
    Hope this helps.

    ReplyDelete
  5. Excellent article Hilmar :)
    Just one doubt, usually we get a warning when we select an attribute which is not the key as the granularity attribute (like month in this case). Would that warning be still present if we follow this approach?

    Cheers,
    Jason Thomas

    ReplyDelete
  6. Hi Jason,
    the warning will still be there but this is one of the cases in which the warning just means something like "Are you sure that you wanted to do this?" to prevent zhe developer from accidently doing something wrong. In our case you can ignore this warning. There are situations like this in BIDS.
    Cheers, Hilmar

    ReplyDelete
  7. Hilmar,

    Could you go into more detail on how to do the step below. How exactly are you creating an attribute with 2 key columns? Are you concatenating this within the db (ie. 201011) or are you doing this within BIDS?

    "Miles, you will need to create attributes with the corresponding keys in the dimension. For example, you can create an attribute MonthOfYear (Key columns Year, Month). If you now link your fact table to that dimension and choose MonthOfYear (in the define relationship dialog) as the granularity attribute you will see both key columns to be mapped to the corresponding measure group fields.
    Of course you will need to have the corresponding fields in your measure group (maybe as calculated columns) too and you should have defined this relationship in the data source view (although I don't think that this is required here)."

    JD

    ReplyDelete
  8. JD, There is no need to concatenate the keys in a database view or in the data source view of your olap solution (although this would also be an option). You can define the key columns in the attribute's properties. Just do a right click on the attribute and choose 'properties'. In the property window, locate the properties for the key, name and value column. By clicking the ellipsis in the input box for the key column, a dialog will open where you can define any number of key columns (and also the order of the key columns).
    Please note, that attributes with more than one key column need to have the name property defined (with only one key column, name defaults to the key column).
    Hope this helps.
    Hilmar

    ReplyDelete
  9. I just implemented a solution using this approach and as another few tips on it I can offer the following:

    1. It is better than storing, for example, year data on month granularity because it allows for analysis over months without spikes. In my current project we need to find outliers on the different levels of the Date dimension and having Year data on Month skews the distribution significantly.

    2. Although the approach is imho very suitable for many scenarios, it could be less intuitive for business users. It is important to talk to them and agree that this is what they expect. In many older implementations aggregated data is stored on leaf level because of the way they have built them with referential integrity in mind.

    3. A calculated measure which does MG1.Measure+MG2.Measure+...+MGN.Measure can effectively combine the different measure group measures and allow browsing all the data together.

    4. If you define a diamond-shaped relationship between the attributes in your dimension and then have a measure group linking to a non-leaf attribute in a relationship chain it will have troubles aggregating to levels which are not in that chain. In example, if you have Date->Week->Year and Date->Month->Quarter->Year, then if you have a MG related to the Week attribute SSAS will not know how to aggregate the Week to Month or Quarter. Just something to consider.

    ReplyDelete
  10. Hi Hilmar,

    I had a quick question about non-key attribute. I am following the steps as you have mentioned above, but still, I am not able to slice the fact on the non key attribute. I have a two columns X, Y in my dimension table which can uniquely identify fact columns X and Y.

    Can you suggest me the solution for that. Column X acts like parent and Y is child in terms of hierarchy.

    Help would be appreciated. Thanks

    ReplyDelete
  11. Hi,

    To the previous post of July 25th, when I create non key attribute hierarchy, It's not aggregating and I get repeating parent name for different child rather than child belonging to same parent falling under one tree structure

    ReplyDelete
  12. Great article. Created this with productgroup, productId and day, month and year and it works perfectly

    ReplyDelete