
Monday, December 21, 2009

A different approach to modeling units

SQL Server 2005 | SQL Server 2008

While units are built-in functionality in some OLAP databases, in SSAS we need to take care of them on our own. Usually I model units as a dimension so that the facts are linked to the unit they belong to. However, values in different units usually must not be aggregated (like adding up meters and liters), and therefore we would set the IsAggregatable property of the unit attribute to false. The user then has to choose a unit before any result is displayed. In some cases, though, the situation is not that clear-cut. Local currency values, for example, are aggregatable as long as the currency is consistent. As long as the selection (no matter how the filtering was done) leads to a single currency, it is possible to display the result in local currency. Whenever more than one currency is involved, the result cannot be shown.

Taking the usual approach with the non-aggregatable unit dimension you may find a situation like below:

image_thumb

For this example we have stores aggregated to some kind of store group with sales. As the currency is a non-aggregatable dimension, you don't see totals for the rows (just for the columns). Assuming we set our default member for the currency unit dimension to some currency not included (or to unknown, like I did), you don't see any value at all if the currency unit is not included in the query:

image_thumb1

So everything's fine with this approach. But usually we don't analyze by currency, and if we simply put it on a filter, we might miss the values of the other currencies, like in the following screenshot of the same sample cube (the value for 'Other' gives no hint that there might be other sales here too):

image_thumb2

So the idea could be to tell the cube to display the currency values as long as the displayed cell contains only one currency unit. For other cells we can only display a warning. The following screenshot shows the final result:

image_thumb3

Please note that although we don't have the currency unit included in this pivot table, on either the axes or the filter, the cube knows that certain cells contain only EUR values and therefore can be aggregated, while other cells contain more than one local currency and therefore cannot be aggregated.

To explain how this can be done, let’s first look at the model of our sample cube:

image_thumb5

The currency unit is linked to the fact table using its primary key UnitID. For the cube we include this key as a measure… hey, this sounds weird, why would we use the surrogate key as a measure? Well, we even use it twice: once aggregated by the Min function and once aggregated by the Max function.
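To see why this works, here is the same idea as a relational sketch (table and column names like FactSale, StoreGroup and CurrencyID are assumptions, chosen to match the measure names below):

SELECT StoreGroup,
       MIN(CurrencyID) AS [Minimum CurrencyID],
       MAX(CurrencyID) AS [Maximum CurrencyID],
       SUM(AmountLocal) AS AmountLocal
FROM FactSale
GROUP BY StoreGroup
-- MIN equals MAX exactly when the group contains a single currency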

image_thumb6

Now, for any cell where "Minimum CurrencyID" equals "Maximum CurrencyID" we can be sure that there is only one currency unit in this cell. This can be used in a calculated member. I set the visibility of the original measure (AmountLocal) to hidden and add a calculated member like this:

CREATE MEMBER CURRENTCUBE.[Measures].Amount
AS iif([Measures].[Minimum CurrencyID]=[Measures].[Maximum CurrencyID],[Measures].[AmountLocal],NULL),
FORMAT_STRING = iif([Measures].[Minimum CurrencyID]=[Measures].[Maximum CurrencyID],
strtomember("[Currency].[Unit].&["+CStr([Measures].[Minimum CurrencyID])+"]").properties("Format String"),
""),
VISIBLE = 1 , ASSOCIATED_MEASURE_GROUP = 'Fact Sale' ;

As you can see we also take the format string (e.g. “$”#,##0.00) from the currency table to display each currency properly.

For the above screenshot I modified this code a little bit and returned the value 0 instead of NULL. I also used the text "multiple units" as the default format string for this value 0. I prefer the NULL value though, as it is "more" accurate.
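For reference, that variant could look like this (a sketch mirroring the member above; note how the literal text is embedded in quotes within the format string):

CREATE MEMBER CURRENTCUBE.[Measures].Amount
AS iif([Measures].[Minimum CurrencyID]=[Measures].[Maximum CurrencyID],[Measures].[AmountLocal],0),
FORMAT_STRING = iif([Measures].[Minimum CurrencyID]=[Measures].[Maximum CurrencyID],
strtomember("[Currency].[Unit].&["+CStr([Measures].[Minimum CurrencyID])+"]").properties("Format String"),
'"multiple units"'),
VISIBLE = 1 , ASSOCIATED_MEASURE_GROUP = 'Fact Sale' ;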

As long as you're using Excel 2007 compliant format strings (as described in one of my previous posts), everything should display properly in Excel too, as shown in the following screenshot:

image_thumb11

Other clients can also leverage the formatting defined in our cube. For example, in Reporting Services you can use the above formats by referring to the FORMATTED_VALUE property instead of the VALUE property. So instead of

=Fields!Amount.Value

we would use:

=Fields!Amount.FormattedValue

Here’s a simple report based on the sample cube:

image_thumb8

Sunday, November 29, 2009

A duplicate attribute key has been found when processing…

SQL Server 2005 | SQL Server 2008

This post is about a common error message during dimension processing that I've been asked about quite a few times, so I thought it would be worth posting about it. The error message says that a duplicate attribute key has been found when processing, as shown in the following screenshot for a test cube (I just processed one dimension here):

image_thumb3

Here’s the full error message:

Errors in the OLAP storage engine: A duplicate attribute key has been found when processing: Table: 'dbo_Product', Column: 'ProductGroup', Value: ''. The attribute is 'Product Group'.

If you got to this article because you just ran into this problem, you probably don't want to read much about the background but only want a solution. Unfortunately there are at least three possible reasons for this error message:

Reason 1 (likely): The most likely reason for this error is that you have NULL values in your attribute key column. If you simply created the attribute by dragging it from the source view, BIDS only sets the key column (the name and value columns default to the key column in this case). So, for example, if you have a column 'Product Group' in your source table and drag it onto your dimension, the product group (a text field) automatically becomes the key for this attribute. The attribute is listed in the error message (in the example above it is 'Product Group').

Solution: Try to avoid NULL values in your attribute key column (for example by using a DSV named query and the T-SQL COALESCE function). When your source is a data warehouse, it's also good practice to avoid NULL values there, as they complicate the queries to the data warehouse.
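For example, a named query in the DSV could replace the NULL values like this (a sketch; the ProductID column and the replacement text are assumptions):

SELECT ProductID,
       COALESCE(ProductGroup, N'No product group') AS ProductGroup
FROM dbo.Product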

Reason 2 (likely): You defined an attribute relationship between two attributes of the dimension, but the data in your source tables violates the relationship. The error message gives you the name of the conflicting attribute (text part 'The attribute is…'). The attribute has a relationship to another attribute, but for the value stated in the error message ('Value: …') there are at least two different values in the attribute that the relationship refers to. If you have BIDS Helper installed, you can also see the error details and all violating references using the 'Dimension Health Check' function.

Solution: You may solve the error by making the key of the attribute unique. For example:
Errors in the OLAP storage engine: A duplicate attribute key has been found when processing: Table: 'DimDate_x0024_', Column: 'Month', Value: 'April'. The attribute is 'Month'.
In this example, the Month attribute violates an attribute relationship (maybe Month -> Year) for the month April, meaning that April appears in more than one year. By adding the year to the key of the month attribute you would make the relationship unique again.
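In BIDS you would do this by adding the year as a second key column to the Month attribute. Relationally, the idea corresponds to a composite or computed key, for example (a sketch; the CalendarYear and MonthNumberOfYear column names are assumptions):

-- one possible unique month key: 200904 for April 2009
SELECT CalendarYear * 100 + MonthNumberOfYear AS MonthKey,
       [Month]  -- the name column stays as it is (e.g. 'April')
FROM DimDate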

Reason 3 (not that likely): You have an attribute with separate key and name source fields. When you check the data, you see that keys appear more than once with different entries in their name columns (note that it's not a problem if the key appears more than once as long as the name is the same). In this case you will usually also see the key value in the error message, for example:
Errors in the OLAP storage engine: A duplicate attribute key has been found when processing: Table: 'dbo_Product2', Column: 'ProductCode', Value: '1'. The attribute is 'Product Name'.
This means that the attribute 'Product Name' uses the source column 'ProductCode' as its key, and for the product code 1 there is more than one name.

Solution: Use a unique key column (unique with respect to the name column).
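To quickly find such conflicts in the source data, a grouping query helps (a sketch using the table and column names from the example message; the ProductName column is an assumption):

-- list all key values that map to more than one name
SELECT ProductCode
FROM dbo.Product2
GROUP BY ProductCode
HAVING COUNT(DISTINCT ProductName) > 1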

Long explanation Reason 1:

In this case our attribute is defined by one single source column (acting as key, name and value information) from the data source view. When processing a dimension, SSAS runs SELECT DISTINCT queries on the underlying source table, so a duplicate key should be impossible even if the key value appears multiple times. Just think of a date dimension like the following one (just years and months):

image_thumb15

In this case the year (2009) appears in multiple rows. However, defining an attribute year (using the year column as the key) does not give a conflict, as it is queried using a distinct query (so 2009 only appears once). So again, how could we get a duplicate result when using a SELECT DISTINCT query? Here is what my product table looked like:

image_thumb16

As you can see, the ProductGroup column has one row with an empty string and another row with a NULL value. When SSAS queries this attribute during processing, it runs the following SQL query (which can be captured using the Profiler):

SELECT DISTINCT [dbo_Product].[ProductGroup] AS [dbo_ProductProductGroup0_0]
FROM [dbo].[Product] AS [dbo_Product]

The result of the query looks like this:

image_thumb

Now, with the default NULL processing for our dimension attribute being set to 'Automatic', meaning Zero (for numerical values) or Blank (for texts), the NULL value above is converted to an empty string. So the result set has two rows with an empty string, and that causes the error.

image_thumb17

So the problem can be avoided if you don’t have null values in your column. This explains the first reason described above.

Long explanation Reason 2:

I blogged about attribute relationships before, and you may want to read this post about defining the key for attributes in an attribute relationship.

Long explanation Reason 3:

Let’s take a look at the following modified product table.

image_thumb2

The ProductID column is unique while the ProductCode is not. If we now define the ProductName attribute as follows we will also get a duplicate key error:

image_thumb4

The reason here is that for ProductCode 1 two names are found (and therefore the SELECT DISTINCT returns two rows with ProductCode 1), so ProductCode is not a good key here. The problem would disappear if the ProductName in the third row were also 'A' (as in the first row), or if the ProductCode in the third row were something other than 1 or 2.

However, this reason occurs rather seldom because usually, if we have a key and a name in our dimension, the source comes from some kind of master data table and therefore should be unique. But for type 2 slowly changing dimensions you must not use the business key as the key column (as there may be many rows with the same business key).

Another way to “solve” duplicate key errors (although not recommended) is to set the “KeyDuplicate” property for the error processing of the dimension to “IgnoreError” as shown below:

image_thumb18

However, this is definitely not recommended except for prototyping scenarios. This is clearly explained here: http://technet.microsoft.com/en-us/library/bb630297.aspx.

Sunday, November 15, 2009

How to define Excel-compliant format strings for a cube

SQL Server 2005 | SQL Server 2008

Cube-defined formats are not necessarily displayed properly in Excel 2007. Sometimes the format seems to be ignored completely although it works fine in the cube browser.

For a simple example I created a cube based on the following source table, which acts as dimension and fact at the same time:

UnitID FormatString SampleValue
1 #,##0.00 12345.678
2 #,##0.00€ 12345.678
3 #,##0.00\€ 12345.678
4 #,##0.00 EUR 12345.678
5 #,##0.00 \E\U\R 12345.678
6 #,##0.00" EUR" 12345.678
7 #,##0.00"€" 12345.678

The idea is simply to try out different format strings. I used the first two columns for my sample dimension and the first and last columns for my fact table, giving a very simple cube:

image_thumb9

In order to use the format string to format the sample value, I used this simple cube script:

scope ([Dim Unit].[Unit Format].[Unit Format]);
format_string(this) = [Dim Unit].[Unit Format].currentmember.name;
end scope;

Now, let’s browse our cube using the cube browser:

image_thumb2

As you can see, each of our approaches for formatting the value worked as expected. Now let's take a look at the same cube using Excel 2007:

image_thumb8

As you can see, not all format strings are understood by Excel. Simply using an unescaped € or EUR confuses Excel: the value is displayed without any format. And although the cube browser understands even these formats, it's dangerous to use unescaped characters in your format string, as many of them have a special meaning (like HH for the hour in a 24-hour representation). So it's always a good idea to escape these characters. As you can see, both the backslash and the quotation marks work fine with Excel, so it's up to you which one you prefer (or how other clients interpret the format string…).

Sunday, November 8, 2009

Simple Banding function for KPI Status or KPI Trend

SQL Server 2008

Although the way of defining the KPI status and trend is very flexible, it is also somewhat cumbersome if you simply want to rate a KPI against its target value. In this case you would have to calculate the status of the KPI as a value between -1 (worst) and +1 (best).

In many cases we find simple case statements here:

case
when KPIVALUE("TestKPI") >= 0.95 * KPIGOAL("TestKPI") then 1
when KPIVALUE("TestKPI") < 0.7 * KPIGOAL("TestKPI") then -1
else 0
end

This results in the KPI status being -1, 0 or +1. But if you want to blend smoothly between these values, you have to use a formula that is a little bit more complex. Especially if you need to do this computation for many KPIs, it's not really nice.

I had some examples for AS stored procedures (sprocs) in this blog before. So here is a very simple one to calculate a linear approximation between the boundaries:

public static double KPIStatusBanding(double bound1, double bound2, double actual)
{
    // identical bounds: no banding possible
    if (bound1 == bound2) return 0;

    // relative position of the actual value between worst (bound1 = 0)
    // and best (bound2 = 1); works for ascending (bound1 < bound2)
    // as well as descending (bound1 > bound2) bands
    double fraction = (actual - bound1) / (bound2 - bound1);

    if (fraction <= 0) return -1;      // worse than or equal to the worst bound
    else if (fraction >= 1) return 1;  // better than or equal to the best bound
    else return 2 * fraction - 1;      // linear blend between -1 and +1
}

The parameters are as follows:

bound1 worst value
bound2 best value
actual actual value

image_thumb1

If bound1 < bound2, bigger values are better; if bound1 > bound2, lower values are better. With this function the calculation is quite easy. First, let's test the function itself with a simple MDX query:

with
member test as ASStatistics!KPIStatusBanding(100,200,175)
select test on 0
from [Adventure Works]

This results in a status value of 0.5: the actual value 175 lies 75% of the way from the worst bound (100) to the best bound (200), and 2 * 0.75 - 1 = 0.5.

In order to use the same boundaries as in my first example, I would use this KPI status expression:

ASStatistics!KPIStatusBanding(
0.7 * KPIGOAL("TestKPI"),
0.95 * KPIGOAL("TestKPI"),
KPIVALUE("TestKPI"))

image_thumb11

As we can see from the KPI browser, the values now get blended smoothly:

image_thumb3

Role-playing time dimensions and semi-additive measures

My last two posts have been about semi-additive measures, so this is part 3 of my semi-additive measures mini-series. This post is about role-playing time dimensions together with semi-additive measures. The good news is that this shouldn't happen too often. At least it took me some time to invent an example. However, since I couldn't use the Adventure Works sample database, I had to create my own data.

This is my scenario: I measure the number of cars in certain car parks (dimension) over time (dimension). I modeled the time as date and hour (one of my next posts will be about how to separate date and time into separate dimensions), so I'm getting hourly snapshots of my car parks. The model looks somewhat like this:

image_thumb4

Since my car parks are located all over the world, I want to see the number of cars that were in the car park by local time and by GMT, thus explaining the two links from the fact table to the date dimension. In my SSAS project I defined the DimDate dimension as a time dimension, so we have role-playing time dimensions (the roles being local time and GMT). The number of cars is our semi-additive measure (using the average aggregation function for the time dimension).
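As a relational sketch, the fact table could look like this (all names are assumptions based on the model above):

-- hourly snapshot fact table with two role-playing date keys
CREATE TABLE FactUtilization (
    CarParkID   int NOT NULL,  -- references the car park dimension
    LocalDateID int NOT NULL,  -- references DimDate (role: local time)
    GMTDateID   int NOT NULL,  -- references DimDate (role: GMT)
    HourOfDay   int NOT NULL,
    Cars        int NOT NULL   -- semi-additive: averaged over time, summed otherwise
);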

Just to get familiar with the test data I generated, I first started with two separate cubes, one for the local time and one for GMT. In this step we don't have multiple time dimensions per cube, so it's our normal and safe situation.

image_thumb7

The result reflects the time shift: in the GMT cube you can see the same values, just shifted by 6 and 9 hours respectively. As each cube has just one time dimension, the aggregation works just the way we would expect it to. For example, the local time cube takes the average over the local time and sums up the values across the car parks.

Now, what happens if we put both dimensions and our parking fact table into one cube? As we know, it depends on which time dimension comes first. If the local date is the first time dimension, my result looked like this:

image_thumb10

The aggregation across the local time is correct (average) as expected, while the aggregation across GMT looks strange. If you remember my post about the effects of semi-additive measures on other dimensions, you will not be surprised. What you can see from the picture above is that

  • The aggregation across GMT is correct for each single car park (it averages the details) although GMT is not treated as a time dimension.
  • The aggregation of the total across multiple car parks is wrong (it also averages the details).
  • The grand total cannot be easily explained (it is neither the sum nor the average along one of the two axes).

How can that behavior be explained? Let's take a look at the details:

image31

For this table I simply put the local time on the columns and the GMT on the rows. I also filtered for the two car parks we analyzed above. Now what you can see is that you get multiple local time values for each GMT hour. Looking at GMT 0:00 you will see 2239 cars in Denver and 2052 in Toronto, but both at different local times (16:00 and 18:00). As our cube takes the local time as the time dimension, it averages the number of cars in both car parks, resulting in 2146 cars as (2239+2052) / 2 = 2145.5, rounded to 2146. This explains why we see averages here while aggregating different car parks.

So, if we take a look again at the table above, this explains the averages:

image_thumb19

The numbers 2239 and 2052 in the first line result from different car parks but also from different local times, and that's why we see the average as a total here.

This also explains the total of 1932, as it is the average across all sums of our GMT data (in the picture above it would be the average of the last line, which is not completely visible).

Ok, some things to remark:

  1. The behavior in our example above is exactly the same if the second time dimension is not a role-playing one but a separate dimension (its own source table and its own database dimension definition). Therefore it makes no difference here how you model your dimensions.
  2. In order to get proper aggregates you could use cube scripts (but keep in mind that you can only override existing cube cells, so there are drawbacks with multi-selects, for example when using Microsoft Excel).
  3. The aggregation problem would not exist if you used separate measure groups for each of the role-playing dimensions (i.e. FactUtilizationLocal and FactUtilizationGMT) or separate cubes.
  4. The above data has nothing to do with real car parks; it was randomly created (using a noised sine curve).

Semi-additive measures, multiple measure groups and time dimensions

My last post was about the effects of semi-additive measures on dimensions other than the time dimension. Now let's focus on the situation of having more than one dimension of type 'time' in our cube.

As we've seen in many posts, this may result in aggregations that are not what we would expect. The effect becomes clearly visible with semi-additive measures (although you should always set the type of your time dimension in order to provide useful information to the OLAP clients). Only the first dimension is treated as a time dimension, while all following time dimensions are treated as normal dimensions. If, for example, a specific measure is set to aggregate as 'average of children', only the first time dimension is considered as the one to compute the average for.

Now, let's get a little bit more specific about that 'first' time dimension. In this context, 'first' relates to the order of the dimensions in a cube, which can be changed by drag'n'drop on the first tab of the cube file editor (lower left corner):

image_thumb1

Now, imagine we have two dimensions of type 'time' in our cube. Let's name them 'Local Date' and 'GMT Date'. The 'Local Date' dimension comes first. The question of which of those dimensions is actually considered the time dimension of a measure group depends on the dimension usage:

  • If the measure group is only linked to the 'Local Date' dimension, this becomes the time dimension for the measure group
  • If the measure group is only linked to the 'GMT Date' dimension, this becomes the time dimension for the measure group
  • If the measure group is linked to both dimensions, the 'Local Date' dimension becomes the time dimension for the measure group (because it appears first in the dimension list of the cube)

In this sense, 'linked' really means the dimension usage as defined on the dimension usage tab of the cube file editor. Links that are only created in the data source view but not modeled in the cube have no influence on choosing the first time dimension. For example, if the measure group 'Utilization GMT' is linked to both time dimensions (Local Date and GMT Date) in the data source view but only to 'GMT Date' in the dimension usage, then the 'GMT Date' dimension is the time dimension for that measure group, even though the 'Local Date' dimension appears in front of the 'GMT Date' dimension. Sounds confusing? Ok, here is the quintessence:

You may have multiple dimensions of type 'time' in your cube, but you should avoid measure groups that have more than one of them in their dimension usage.

In any case, having more than one time dimension in a cube can be error-prone, even if you follow the above recommendation. It would have been better if Microsoft had made the time dimension an attribute of the measure group instead of simply choosing the one that appears first in the cube definition. If somebody changes the order for any reason, the aggregation behavior in one or more of your measure groups might change too.

You might not face a situation with more than one time dimension too often. But what about role-playing time dimensions and semi-additive measures? My next post will be about this topic.

Sunday, August 23, 2009

Effects of semi-additive measures on other dimensions

Semi-additive measures are measures that don't use the same aggregation method along all of the cube's dimensions. In SSAS the time dimension plays an important role here. For example, if you choose the 'AverageOfChildren' aggregation method, the measure is averaged over time but summed up over all other dimensions. When do we need this? Well, usually semi-additive measures are used when working with snapshot data like stock levels, account balances etc.

For an example, let's look at a car park. We want to analyze the number of cars that are in one of our two park houses at a certain time. For our example we use a very simple fact table looking like this (I simplified the time to just two days, so imagine we're counting the cars at 1pm):

1_thumb1

As you can see, someone also wanted to analyze the weather at each park house and each day so that we can easily see the effect the weather has on the parking behavior (obviously more people took the car when the weather was bad...).

Based on the fact data above we use a very simple data model looking like this:

2_thumb

Now let's do some analysis on that model. First we analyze by date and park house:

3a_thumb2

This clearly shows the semi-additive behavior of our measure: for each row (combining the two days) the value is averaged, while it gets summed up across the park houses (columns) as expected. At day 1, 1pm we had 10 cars in car park P1. At the same time on day 2 there were 30. Of course we cannot add these values together, but the average makes sense. So the meaning of the grand total is 'at 1pm there was an average of 20 cars in our car park P1'.

Now, what would happen if we start analyzing by the weather? As the weather is not our time dimension, you might expect values to add up as a simple sum. So let's try:

3_thumb

Hmm, what happened here? The sum would have been 75, but our OLAP query results in a value of 50. What happened, and what is the total? You can easily see that it is neither the sum nor the average of the above values.

In order to clarify this behavior, let's also include the time in our analysis:

4_thumb1

Using this view we can clearly see that SSAS made no mistake. The total number of cars in our park houses at average weather is 25 (the average of 20 and 30, line one in the grid above), so this is correct. On the other hand, we had 30 cars in our park houses at day 1 (20 in a park house with average weather that day and 10 in a park house with good weather) and 70 at day 2, giving a total average of 50. So this is also correct.

So the total for the weather is not the average or the sum of its detailed values, but the average of the summed-up values.

This is a very simple example of how the semi-additive aggregation of a measure also influences the aggregation along non-time dimensions.

With this in mind you are now well prepared to look at the following analysis based on the same source data:

5_thumb1

This time we left the time dimension completely out of the analysis, resulting in averages along both other dimensions (park house and weather), as all values we actually see are already averages.

Surely, it's less confusing when looking at a single point in time like in the example below, where everything sums up neatly:

6_thumb1

Sunday, August 9, 2009

Calculating the proper ISO week for the time dimension

There are several definitions of calendar weeks, and T-SQL offers many of those calculations out of the box. When you're doing projects in Europe you will frequently find the ISO week definition for the calendar week as of ISO 8601. Week 01 in the ISO week calendar is defined as the week containing the year's first Thursday; weeks start with Monday and end with Sunday. The notation described in the ISO standard looks like this, for example: 2009-W28-6. This means the sixth day of the ISO week (i.e. Saturday, as the ISO week starts with Monday) of week 28 in the year 2009.

Fortunately, T-SQL in SQL Server 2008 knows how to compute the ISO week number using the DATEPART function. For example,

declare @dt date;
set @dt=CAST('2009-7-11' as DATE)
select DATEPART(ISO_WEEK,@dt)

results in 28 (week 28).

Now for the day of the week we can again use DATEPART, this time with the WEEKDAY parameter:

select DATEPART(WEEKDAY,@dt)

Depending on your server settings this will most likely return 7 (as Saturday is the 7th day of the week when starting with Sunday=1). We can override the first-day setting by using the following statement:

set datefirst 1;

After doing so, the result of the above statement would be 6 as required. If you don't want to use the DATEFIRST setting, you could also correct the day number manually using the modulus operator %:

select (DATEPART(weekday,@dt)+5) % 7+1

So far we have the week and the day. The remaining part is the year. You might expect this to be very easy, as the T-SQL YEAR function extracts the year from a given date. But the year of the calendar week might differ from the calendar year; for example, December 29, 2008 belongs to ISO week 1 of 2009 (2009-W1-1).

Therefore we need to compute the proper year. A way to do so is to shift the date to the Thursday of its ISO week and take that day's calendar year:

select DATEPART(year,DATEADD(d,-(DATEPART(weekday,@dt)+5) % 7+3,@dt))

Putting it all together, we end up with the following computation:

cast(DATEPART(year,DATEADD(d,-(DATEPART(weekday,@dt)+5) % 7+3,@dt)) as nvarchar)
+N'-W'+CAST(DATEPART(ISO_WEEK,@dt) as nvarchar)
+N'-'+cast((DATEPART(weekday,@dt)+5) % 7+1 as nvarchar)

In order to use this expression in your data source view (DSV) computations, you would have to replace the @dt variable with the specific date field, and you have to make sure that the DATEFIRST parameter is set to 7 (the default), i.e.

select @@datefirst

should return the value 7.
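For example, assuming your source table has a date column named DateValue (the column name is just an assumption), the corresponding DSV computed column expression would be:

cast(DATEPART(year,DATEADD(d,-(DATEPART(weekday,DateValue)+5) % 7+3,DateValue)) as nvarchar)
+N'-W'+CAST(DATEPART(ISO_WEEK,DateValue) as nvarchar)
+N'-'+cast((DATEPART(weekday,DateValue)+5) % 7+1 as nvarchar)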

Wednesday, August 5, 2009

Book recommendation: Expert Cube Development with Microsoft SQL Server 2008 Analysis Services

So here is my first book recommendation in this blog. The authors (Chris Webb, Alberto Ferrari and Marco Russo) are all well-known Analysis Services experts, and they did a pretty good job writing this book. It is fully packed with information, tips, tricks and recommendations. The authors not only cover the core Analysis Services features but also provide useful links to other tools like BIDS Helper or MDX Studio.

clip_image002_thumb1

You can find the book and more information about its content in the Amazon book store.

Saturday, July 25, 2009

How to create a time attribute relationship (2)

When building your time attribute relationships you have to consider the following questions:

  • How will you normally navigate the date hierarchies?
  • What formatting options do you usually use in your reports, e.g. for the month: full name (January, ...), short name (Jan, ...), or number (1, ...)?
  • What other information do you need (e.g. holidays, day number of year)?

When building up your time dimension, you should

  • Set up proper hierarchies reflecting the commonly used navigation routes
  • Remove attribute hierarchies of attributes that also appear as a level in a user hierarchy
  • Use proper keys for your attributes and set up the attribute relationships accordingly
  • Use rigid instead of flexible relationships whenever possible (usually this is true for date attributes...)
  • Name your attributes using a defined naming schema. The attribute type property gives good guidance. For example you can use names like DayOfMonth (1 to 31), DayOfWeek (1 to 7), DayOfYear (1 to 365) etc. Don't get confused by the many options available; usually you will only need a few of them. There is a good overview at http://technet.microsoft.com/en-us/library/ms175662.aspx.
  • Set the proper attribute type for each of your attributes
  • Set the dimension type property to "time" (important if you're also using semi-additive measures, as they require a time dimension)

Below is an example of a date dimension I recently used. Keep in mind though that this is just an example. You might not need an ISO week or quarters (I usually try to avoid them...), or you may even want to add semesters or incorporate a manufacturing calendar.

image_thumb2

Attribute               | Key                                                 | Name                                            | Sort by | Type
Date                    | DateID (e.g. 20090711)                              |                                                 |         | Date
Day Format              | DateID                                              | DayFormat (localized formatted date like 07/11) | key     | Date
Day of Month            | DateID                                              | DayOfMonth (1 to 31)                            | key     | DayOfMonth
Day of Year             | DayOfYear (computed)                                | DayOfYear (1 to 365)                            | key     | DayOfYear
Day Type                | Year, Month, DayOfMonth                             | DayType (e.g. Holiday)                          | key     | IsHoliday
Day Short Name          | Year, Month, DayOfMonth                             | DayShortName (e.g. Sa)                          | key     | Days
Day Name                | Year, Month, DayOfMonth                             | DayName (e.g. Saturday)                         | key     | Days
Fiscal Month Of Year    | FiscalYear (e.g. 2009), MonthOfFiscalYear (1 to 12) | MonthOfYear (e.g. 7)                            | key     | FiscalMonthOfYear
Fiscal Month Short Name | FiscalYear, MonthOfFiscalYear                       | MonthShortName (e.g. Jul)                       | key     | FiscalMonthOfYear
Fiscal Month Name       | FiscalYear, MonthOfFiscalYear                       | MonthName (e.g. July)                           | key     | FiscalMonths
Fiscal Quarter Name     | FiscalYear, QuarterOfFiscalYear                     | QuarterName (e.g. Q2)                           | key     | FiscalQuarters
Fiscal Year             | FiscalYear                                          | FiscalYear (e.g. 2009)                          | name    | FiscalYears
ISO Week                | ISOYear, ISOWeek (1 to 53)                          | ISOWeekName (29/09)                             | key     | Iso8601WeekOfYear
ISO Year                | ISOYear                                             | ISOYear                                         | name    | Iso8601Years
Month Of Year           | Year, MonthOfYear (1 to 12)                         | MonthOfYear (1 to 12)                           | key     | MonthOfYear
Month Short Name        | Year, MonthOfYear (1 to 12)                         | MonthShortName                                  | key     | Months
Month Name              | Year, MonthOfYear (1 to 12)                         | MonthName                                       | key     | Months
Quarter Name            | Year, QuarterOfYear (1 to 4)                        | QuarterName (e.g. Q2)                           | key     | Quarters
Week Of Year            | Year, WeekOfYear (1 to 52)                          | WeekOfYear (1 to 52)                            | key     | WeekOfYear
Year                    | Year                                                | Year                                            | name    | Years

Of course this is just one way of doing it; you might start with this approach and fine-tune it to your needs later. You may notice that I used the same source columns as names for the fiscal month and the calendar month (and other related attributes). This is done in order to provide different keys for the fiscal periods so that, for example, the months sort accordingly. For the standard calendar you want to see your months as January, February, ..., December, while for a fiscal calendar with a fiscal year starting April 1 you want to see the fiscal months in the correct order April, May, June, ..., December, January, February, March. The separate attributes allow different keys, so the sorting is not complicated.
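As an illustration of the different keys, the fiscal month key can be derived from the calendar month in the DSV, for example for a fiscal year starting April 1 (a sketch; the DimDate table and DateValue column names are assumptions):

-- April (4) becomes fiscal month 1, March (3) becomes fiscal month 12
SELECT ((MONTH(DateValue) + 8) % 12) + 1 AS MonthOfFiscalYear
FROM DimDate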

You will also notice the ISO Week / ISO Year attributes which I've included for the European countries. I'll give some more information about this in a later post.

In this situation you could build natural hierarchies like the following ones:

image_thumb5

The following screenshot demonstrates how you can leverage the attribute relationships, for example in Microsoft Excel 2007:

image_thumb11

Here, we simply display some more attributes alongside the day information.

The result may then look like this:

image_thumb10

If you've struggled with date dimension attributes and relationships like I did, I'm very interested in the ideas you developed for setting up your date dimension, and I encourage you to leave me a comment.

Saturday, July 11, 2009

How to create a time attribute relationship (1)

Thanks to Google Analytics I saw that many page hits on my blog result from keyword searches for "create time attribute relationship". And in fact, it seems to be more complicated to do proper attribute relationship modeling for the time dimension than for most other dimensions. The reason is simple: the time dimension involves many related attributes (e.g. months can be written as a number, a short name or a long name) and parallel hierarchies (like calendar and fiscal views or calendar weeks).

In this blog I have had two posts about attribute relationships so far (here and here). These two posts are the background for the current post, as they explain why you should set proper attribute relationships and how you can achieve this even if your data doesn't really reflect such a relationship (as with the month number of the year in relation to the year number). So if you haven't read those two posts, I recommend you start with them.

When discussing the date attribute relationships with other consultants, you will find many different preferences and dislikes. For example: do you want the month as a number (so you can easily use it in pivot tables), or is it better to only use the month's name and put the month's number in the key? Actually, the latter is how AdventureWorks handles the topic, so let's take a look at that model:

image_thumb3

image2

The diagram in AdventureWorks looks pretty simple as many attributes have been hidden from the user. Keep in mind, however, that every design of the attribute relationships has its consequences for usage in pivot tables or queries. For example, think of the need to display the 365 days of the year as rows, showing which month they belong to in a second column.

One approach for that could be to use the following MDX query:

with member [Measures].[monthname] as [Date].[Month of Year].currentmember.name
select [Measures].[monthname] on 0,
[Date].[Day of Year].[Day of Year]
on 1
from [Adventure Works]

But since the attributes "Day of Year" (1, 2, 3, ... 365) and "Month of Year" (January, February etc.) have no direct relationship, the query shows an output like this:

image_thumb31

In the diagram above, attributes are displayed in their own box if they act as a hierarchy level or if they have other attribute relationships associated with them. As in SQL 2005, the arrow points to the less detailed level (e.g. an arrow from month to year, because the year is less detailed).

Anyway, there is nothing wrong with the attribute hierarchies above. It's just something you need to be aware of, and if you start building your time dimension, Adventure Works might be a good starting point.

As you can easily explore all the settings (keys, names, hierarchies) in the Adventure Works sample database, I'll give another example of how to set up a time dimension in the next post.

Monday, May 25, 2009

Liquid browsing

Last week I had the opportunity to talk with Peter Leweke from Liquiverse about their new data browser and their focus on turning data from lists into meaning or, as they say on their homepage, "Life beyond lists". It's really exciting to play with the beta release I got. Liquiverse will announce their new release of LiquidCSV soon and it's definitely worth taking a look.

I've included a screenshot taken from an export of some AdventureWorks human resources data.

liquid_thumb1

What makes the browser really interesting is how fast and dynamically elements flow when changing the sort order or applying filters. Also, when you move your mouse over an area with many data points, they magically drift away so you can easily address every single item of data. It's hard to explain; you will have to experience it.

Now, what does this have to do with OLAP? Today Liquiverse doesn't offer an interface to MDX or XMLA data sources, but the API will be available in 2009 and it shouldn't be too difficult to include other data sources or to apply drill-down effects using the API. So I'm really looking forward to what's going on at Liquiverse.

Saturday, May 23, 2009

Accessing duplicate members in dimensions

In the last post we had an issue with multiple members sharing the same name but having different keys. Every approach of addressing such members with MDX like

[Dimension].[Hierarchy].[Name] or [Dimension].[Hierarchy].&[Key]

fails because this fetches only one particular member, while the measure data may be associated with one of the other members sharing the same name.

Ok, usually you shouldn't be in this situation. If, for example, you're using the business key (instead of the surrogate key) as the attribute key, you only get one member and all the fact data is associated with that member. Therefore the above member expressions work fine.

But with parent-child hierarchies you cannot do this easily, and writing the MDX from the previous post in order to filter all elements with a given name is somewhat painful. And if you're keeping historical elements in the parent-child hierarchy, you may want to find all elements with a given name.

The last post showed a solution to achieve this in MDX by filtering the elements by name, but this is somewhat painful, especially with many elements. So for today's post we're trying a different approach with a stored procedure.

The stored procedure simply takes a hierarchy and a string with comma-separated member names and returns all matching members in the correct order. This may also be helpful for report development if you need to provide a specific subset of elements (say cost centers, regions, branches etc.) in a given order. Usually you can put those elements in a report parameter (maybe a hidden one) and use the strtoset function to expand the elements to a set. But as stated above, this doesn't work with elements that share the same name, and it might also be difficult to achieve a manually defined sort order.
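For comparison, this is the usual report pattern (the parameter name AccountList is hypothetical) that works only with unique member names:

select {[Measures].[Amount]} on 0,
StrToSet(@AccountList, CONSTRAINED) on 1
from [Adventure Works]

It works fine as long as names are unique, but it can never return two members that share the same name.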

So, here's how the function works in the Adventure Works cube:

select {[Measures].[Amount]} on 0,
ASTools.GetMembers([Account].[Accounts],'Assets,Current Assets, Other Assets') on 1

from [Adventure Works]

The result looks like this:

image_thumb1

Note that although we just specified 'Other Assets' once, both accounts with that name are now in the list. This is the behavior we wanted to achieve. As stated in the last post, in reality the two accounts are not the same, but for our example we pretend they are, as the situation would be exactly identical with slowly changing parent-child structures.

For clarification, take a look at the following query:

select {[Measures].[Amount]} on 0,
{
[Account].[Accounts].[Assets],
[Account].[Accounts].[Current Assets],
[Account].[Accounts].[Other Assets]
} on 1

from [Adventure Works]

In this case the account 'Other Assets' is specified by its member unique name, which refers to only one member of the dimension. As expected, the result looks like this (note the missing $75K):

image_thumb2

Finally, here is the source code for the procedure:

using System;
using System.Collections.Generic;
using System.Text;
using Microsoft.AnalysisServices.AdomdServer;

namespace SSASTools
{
    public class SSASLibrary
    {
        public Set GetMembers(Hierarchy h, String memberList)
        {
            SetBuilder sb = new SetBuilder();
            Expression ex = new Expression();

            // split the comma-separated list of member names
            String[] members = memberList.Split(',');

            foreach (String cmembername in members)
            {
                // find all members of the hierarchy carrying the given name
                ex.ExpressionText = "filter(" + h.UniqueName + ".allmembers, "
                    + h.UniqueName + ".currentmember.name='" + cmembername.Trim() + "')";
                Set s = ex.CalculateMdxObject(null).ToSet();

                // append every match, preserving the order of the input list
                foreach (Tuple t in s)
                {
                    sb.Add(t);
                }
            }

            return sb.ToSet();
        }
    }
}

Saturday, May 9, 2009

Parent-child dimensions, historical values (SCD) and keys

Usually working with historical values in an SSAS cube is not really complicated. Think of a customer whose ZIP code changed over time and which is modeled as a slowly changing dimension of type 2:

ID Customer ZIP Current
1 John Doe 10000 false
2 John Doe 10200 false
3 John Doe 13041 true

If we don't include the ZIP code in the MDX query, we just see the sales. For example, let's take a look at the following query:

select {[Date].[Year].[Year]} on 0,
{[Customer].[Customer].[John Doe]} on 1
from [Sample Cube]
where {[Measures].[Sales]}

The result of this statement would look somewhat like this:

Customer 2004 2005 2006
John Doe $3,400 $2,000 $1,999

The sales of the customer are related to the ID (fact-to-dimension relationship), so we can see the aggregated sales for John Doe in total and we can still filter sales per ZIP code over time (let's assume John Doe moved to a new house with a new ZIP code twice).

But why don't we see John Doe three times in this example? Well, if you don't, it's because you used the customer name field (not the ID) as the key for the attribute Customer. When reading the dimension attributes, SSAS sends a query that looks somewhat like this:

select distinct Customer as Key, Customer as Name, ... from DimCustomer

So the key determines whether two rows in the source table make up the same dimension member or not. Generally speaking, it's good practice not to use the artificial surrogate key as the key for your dimension attribute, especially when you're working with slowly changing dimensions.

But what about parent-child dimensions? In this situation, the primary key of your dimension table will automatically be the key for your dimension members, as each node of the parent-child tree has its own unique key. If you want to keep historical values in the parent-child dimension (like changes in other attributes or in the location within the tree), you'll end up with different members sharing the same name but having different keys.
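For the Adventure Works accounts dimension this means SSAS reads the members with a query somewhat like this (sketched, analogous to the customer query above):

select distinct AccountKey as Key, AccountDescription as Name, ParentAccountKey as Parent
from DimAccount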

Ok, this sounds confusing, so let's start with an example. I'm using the Adventure Works DW 2008 database as a source. Looking at the base dimension table, we'll find something like this:

image_thumb1

Of course, this is not the same account (one is for assets, one for expenditures), but for our example let's pretend that the two accounts with the name "Other Assets" were created because of a slowly changing dimension, maybe because the account with the ID 24 moved from the parent account 2 to the parent account 79. Again, this is not what was intended in the dimension table, but for our example the situation would be exactly the same.

Just like in our example with the customer John Doe above, when selecting "Other Assets" we want to see all fact values that are associated with one of the two accounts. So we might try this MDX query:

select
[Measures].[Amount] on 0,
[Account].[Accounts].[Other Assets] on 1

from [Adventure Works]

The result looks like this:

image_thumb2

Now, is this really the value for every amount that is associated with at least one of the two keys (24 and 84) above? Not really. In fact, we just saw the values of one of the two rows. The reason is that, because of the key columns being different, we really have two separate members in the dimension, both named "Other Assets". Each member expression just resolves to one member (no matter if you're using the name, the key, or MDX functions like strtomember, nametoset etc.). So the expression [Account].[Accounts].[Other Assets] just resolves to one of the two dimension members, and therefore the displayed value just reflects the value on that one member.

In order to show the difference, let's rewrite the query using a filter expression:

select {[Measures].[Amount]} on 0,

filter (
[Account].[Accounts].members,[Account].[Accounts].currentmember.name="Other Assets"
) on 1

from [Adventure Works]

Now, the result shows both accounts with the given name:

image_thumb3

Here you can clearly see what was missing in our previous query.

Of course, we could as well aggregate both values (never do this with your real balance sheet though...) using the aggregate function:

with member [Account].[Accounts].[Other MemberAgg] as
Aggregate(
filter (
[Account].[Accounts].members,[Account].[Accounts].currentmember.name="Other Assets"
)
)

select {[Measures].[Amount]} on 0,
[Account].[Accounts].[Other MemberAgg]
on 1

from [Adventure Works]

This kind of problem applies to several situations. Just think of a situation where you want to remove the account "Other Assets" from your query using the set minus operation:

select {[Measures].[Amount]} on 0,
[Account].[Accounts].members - [Account].[Accounts].[Other Assets]
on 1

from [Adventure Works]

Without thinking about the different keys, you might be surprised to still find "Other Assets" in the result of the query above:

image_thumb4

As expected, we just eliminated one of the two accounts from the list by addressing it with the member expression above. In order to get the desired result, we again have to use the filter expression:

select {[Measures].[Amount]} on 0,
[Account].[Accounts].members -

filter (
[Account].[Accounts].members,[Account].[Accounts].currentmember.name="Other Assets"
)
on 1

from [Adventure Works]

image_thumb5

In the next post we'll continue thinking about those duplicate members and how to work with them efficiently.