Search This Blog

Saturday, January 30, 2010

Do not set error configuration for KeyDuplicate to IgnoreError

SQL Server 2005 | SQL Server 2008

Some posts ago I wrote about the error message ‘A duplicate attribute key has been found when processing’. At the end of the post I suggested not to set the error configuration for the KeyDuplicate to IgnoreError (unless you are in a prototyping scenario).

Some people asked me about this (as it seems to be a nice and easy solution, similar to ‘on error resume next’ in VBA which I also wouldn’t recommend).

The main reason for me is that I definitely prefer getting a processing error instead having wrong values in the cube which could be the consequence.

For example, let’s take a look at a date dimension:

DateID Month Year
1 January 2008
2 February 2008
3 March 2008
4 April 2008
5 May 2008
6 June 2008
7 July 2008
8 August 2008
9 September 2008
10 October 2008
11 November 2008
12 December 2008
13 January 2009
14 February 2009
15 March 2009
16 April 2009
17 May 2009
18 June 2009
19 July 2009

For this simple dimension you can see that months appear more than once (once per year). This means, that the following attribute relationship is wrong (assuming that we used each of the columns above as key for the respective attribute):

image_thumb1

Having set the KeyDuplicate to ‘ReportAndStop’ results in SSAS failing to process the dimension, so we are instantly aware that something is wrong. Here is the setting in BIDS (has to be set per dimension):

image_thumb3

And here is the error message if you do a ‘Process Full’ on the dimension:

Errors in the OLAP storage engine: A duplicate attribute key has been found when processing: Table: 'DimDate_x0024_', Column: 'Month', Value: 'April'. The attribute is 'Month'.

With this error message you may fix the problem and get things right. However, if you set the KeyDuplicate to ‘IgnoreError’ you will get no feed back from the system at all. However, if you take a look at the dimension you will find that each month is only associated with one of the years (in my case 2008).

image_thumb6

A user might not realize this and although I have fact values for all years, my cube now only shows values for 2008:

image_thumb5

The correct values in my case would be as follows:

image_thumb7

So in this situation we would have totally wrong values in the cube without even getting an error message and this is the main reason for me to have KeyDuplicate set to ‘ReportAndStop’. In my trivial example, the error would be easily detected by the users but as cubes are getting more complex you might not notice some missing and some wrong values. Another source for an attribute relationship violation could be a modeling error for an SCD-2 dimension. If, for example, you relied on the attribute relationship between product and product group and forgot to model this properly although the data source keeps historic changes, than you might see the sales on the wrong product group. Again, with the ‘ReportAndStop’ option you would immediately know that there is an issue with your dimension.

Sunday, January 10, 2010

Different granularity in a single dimension

SQL Server 2005 | SQL Server 2008

Handling different granularity (for example actual and plan values) can get a little bit complicated. Of course there are standard methods, like splitting up the less granular data in order to meet the finer granularity. Or you could use a parent-child structure as this allows you to store data at different levels in the tree-like structure. Or you could supply ‘unknown’ elements to map the less granular information.

For this post I want to show a different approach. Usually for each dimension we are linking all fact tables that refer to this dimension to the same key in the dimension table (the dimension’s primary key). However, the data source view also allows us to link facts to different key columns in the same dimension table.

In my simple scenario I have time dimension (called DimDate) and two fact tables: Order and Order Plan. The orders are on a daily basis while the order plan is on a monthly basis. We want to link both fact tables to the same time dimension as shown below:

image_thumb11

The link from the fact order table to the time dimension DimDate (marked as 1 in the sketch above) is the usual link from the fact table to the primary key of the DimDate table. The time dimension is on daily granularity and so are the order facts. But for the plan value fact table FactOrderPlan, the link to the time dimension is realized by using two key columns: Year (as the year number, eg. 2008) and month (as the month number, eg. 11 for November), so the link in the data source view looks like this:

image1_thumb

For our cube it is important to specify the right granularity attribute. While the order table is linked to the Day (granularity attribute), we link the order plan fact table to the month attribute and define the proper key mappings for that.

image_thumb1

Now, the dimension usage looks like this:

image_thumb3

We also defined proper attribute relationship for the time dimension accordingly to build up a year->quarter->month->day hierarchy.

So let’s take the first look at the cube created by this method:

image_thumb5

This first look is as expected. As long as we are at a common granularity level shared by both fact tables, we can see the values correctly. Also, the aggregation of both fact sources works fine (although they are at different granularity).

Now, let’s drill down to the day level which is not present in our planning data:

image_thumb7

The behavior here is exactly the same as for other unrelated dimensions! The value of the nearest matching hierarchy is taken for the levels below. Sometimes this behavior of the cube confuses the users, but we can still change this behavior by changing the parameter IgnoreUnrelatedDimensions:

image_thumb12

Now, the planning values below the month granularity level have disappeared:

image_thumb10

However, if you want to create a calculated measure that is also based on the planning values, you should be aware of the fact, that the values are simply not existing any more at the day level. For example, let’s define a calculated measure PlanFulFillment using the following expression:

[Measures].[Amount] / [Measures].[Amount Plan]

At the day level, the measure Amount Plan does not exist, so this results in computation errors:

image_thumb14

You could still use the non-empty behavior for the calculate measure (set to “Amount Plan”) in order to have these computations disappear. However, if you want to refer to the monthly value, you can simple use the ValidMeasure MDX function that is always helpful in conjunction with IgnoreUnrelatedDimensions=false. So after defining our calculated measure as [Measures].[Amount] / ValidMeasure([Measures].[Amount Plan]) the result looks like this (at day level, the monthly values for the planning data is taken)

image_thumb16

So, surprisingly enough (at least for me), everything behaves exactly like we wanted it to do and this makes the approach to an alternative in some scenarios. Again, please check your attribute relationship carefully and also spend some time on testing the result as the approach can get dangerous for more complicated attributes structures.

I also checked this design with more attributes and parallel hierarchies in the time dimension (for example calendar week) and more fact tables (for example production plan) and the aggregation was still correct. Having IgnoredUnrelatedDimensions set to false is helpful here to clearly see, which fact is selected at the right granularity level.

Following is an example with three fact tables (one at day level, one at month level and one at week level):

image_thumb2