Sunday, November 8, 2009

Simple Banding function for KPI Status or KPI Trend

SQL Server 2008

Although the way of defining the KPI status and trend is very flexible it is also somehow circumstantial if you simply want to rate a KPI by its target value. In this case you would have to calculate the status of the KPI as being between –1 (worse) and +1 (best).

In many cases we find simple case statements here:

case
when KPIVALUE("TestKPI") >= 0.95 * KPIGOAL("TestKPI") then 1
when KPIVALUE("TestKPI") < 0.7 * KPIGOAL("TestKPI") then -1
else 0
end

This results in the KPI status being –1, 0 or +1. But if you want to blend smoothly between these values you have to use a formula that is a little bit more complex. Especially  if you need to this computation for many KPIs it’s not really nice.

I had some examples for AS stored procedures (sprocs) in this blog before. So here is a very simple one to calculate a linear approximation between the boundaries:

public static double KPIStatusBanding(double bound1, double bound2, double actual)
{
    double factor=0;

    if (bound1 < bound2) factor = 1;
    else if (bound1 > bound2) factor = -1;
    else return 0;

    if (actual < bound1) return -factor;
    else if (actual > bound2) return factor;
    else return factor*2 * (actual - bound1) / (bound2 - bound1) - 1;
}

The parameters are as follows:

bound1 worst value
bound2 best value
actual actual value

 

image

If bound1 < bound2 bigger values are better, if bound1 > bound2, lower values are better. With this function the calculation is quite easy. First let’s test the function itself with simple MDX queries:

with
member test as ASStatistics!KPIStatusBanding(100,200,175)
select test on 0
from [Adventure Works]

This results in a status value of 0.5.

In order to use the same boundaries as in my first example, I would use this KPI status expression:

ASStatistics!KPIStatusBanding(
0.7 * KPIGOAL("TestKPI"),
0.95 * KPIGOAL("TestKPI"),
KPIVALUE("TestKPI"))

image

As we can see from the KPI browser the values now gets approximated smoothly:

image

Role-playing time dimensions and semi-additive measures

My last two posts have been about semi-additive measures, so this is part 3 of my semi-additive measures mini series. This post is about role-playing time dimensions together with semi-additive measures. The good news is, that this shouldn't happen too often. At least it took me some time to invent an example. However, since I couldn't use the Adventure Works samples database I had to create my own data.

This is my scenario: I measure the number of cars in certain car parks (dimension) over the time (dimension). I modeled the time as date and hour (one of my next posts will be about how to separate date and time to separate dimensions), so I'm getting hourly snapshots of my car parks. The model looks somewhat like this:

image

Since my car parks are located all over the world, I want to see the number of cars that were in the car park by local time and by GMT thus explaining the two links from the fact table to the date dimension. In my SSAS project I defined the DimDate dimension as a time dimension. So we have role playing time dimensions (roles are local time and GMT). The number of cars is our semi additive measure (using the average aggregation function for the time dimension).

Just to get familiar with the data I generated as my test data I first started with two separate cubes, one for the local time and one for GMT. In this step we don't have multiple time dimensions per cube so it's our normal and safe situation.

image

The result reflects the time shift, therefore in the GMT cube you can see the same values just shifted by 6 and 9 hours respectively. As each cube has just one time dimension the aggregation is just the way we would expect it to be. For example, the local time cube takes the average over the local time and sums up the values across the car parks.

Now, what happens, if we put both dimensions and our parking fact table into one cube? As we know, it depends on the question, which time dimension comes first. If the local date is the first time dimension my result looked like this:

image

The aggregation across the local time is correct (average) as expected while the aggregation across GMT looks strange. If you remember my post about the effects of semi-additive measures on other dimensions you will not be surprised. What you can see from the picture above is that

  • The aggregation across the GMT is correct for each car park (it averages the details) although GMT is not treated as a time dimension
  • The aggregation of the total across multiple car parks is wrong (it also averages the details).
  • The grand total cannot be easily explained (neither the sum nor the average of one of the two axis).

How can that behavior be explained? Let's take a look at the details:

image

For this table I simply put the local time on the columns and the GMT on the rows. I also filtered for our two car parks we analyzed above. Now what you can see is that you get multiple local time values for each GMT hour. When looking at GMT 0:00 you will see 2239 cars in Denver and 2052 in Toronto but both at different local time (16:00 and 18:00). As our cube takes the local time is the time dimension, it averages the number of cars in both car parks, resulting in 2146 cars as (2239+2052) / 2 = 2146. This explains the reason why we see averages here while aggregating different car parks.

So, if we take a look again at the table above, this explains the averages:

image

The numbers 2239 and 2052 in the first line are resulting from different car parks but also from different local times and that's why we see the average as a total here.

This also explains the total of 1932 as it is the average across all sums of our GMT data (in the picture above it would be the average of the last line which is not completely visible).

Ok, some things to remark:

  1. The behavior in our example above is exactly the same if the second time dimension is not a role playing one but a real dimension (own data source and own database dimension definition). Therefore it makes no difference here how you model your dimensions.
  2. In order to get proper aggregates you could use cube scripts (but keep in mind that you can only override existing cube cells, so there are drawbacks with multi-selects for example using Microsoft Excel).
  3. The aggregation problem would not exist if you would use separate measure groups for each of the role playing dimensions (ie. FactUtilizationLocal and FactUtilizationGMT) or separate cubes.
  4. The above data has nothing to do with real car parks but it is randomly created (using a noised sine curve).

Semi-additive measures, multiple measure groups and time dimensions

My last post was about the effects of semi-additive measures on dimensions other than the time dimension. Now let's focus on the situation of having more than one dimension of type 'time' in our cube.

As we've seen from many posts this may result in our aggregations are not what we would expect. The effect becomes clearly visible with semi-additive measures (although you should always set the type of your time dimension in order to provide useful information to the OLAP-clients). Only the first dimension is treated as a time-dimension while all following time dimensions are treated as normal dimensions. If, for example, a specific measure is set to aggregate as 'average of children' this will only consider the first time dimension as the one to compute the average for.

Now, let's get a little bit more detailed on that 'first' time dimension. In this context, 'first' relates to the order of the dimensions in a cube which can be changed by drag'n'drop on the first tab of the cube file editor (lower left corner):

image

 

Now, imagine we have two dimensions of type 'time' in our cube. Let's name them 'Local Date' and 'GMT Date'. The 'Local Date' dimension comes first. The questions which of those dimensions is actually considered the time dimension of a measure group depends on the dimension usage:

  • If the measure group is only linked to the 'Local Date' dimension, this becomes the time dimension for the measure group
  • If the measure group is only linked to the 'GMT Date' dimension, this becomes the time dimension for the measure group
  • If the measure group is linked to both dimensions, the 'Local Date' dimension becomes the time dimension for the measure group (because it appears first in the dimension list of the cube)

In this sense, 'linked' really means the dimension usage as being defined on the dimension usage tab of the cube file editor. Links that are only created in the date source view but not modeled in the cube have no influence in choosing the first time dimension. For example, if the measure group 'Utilization GMT' is linked to both time dimensions (Local Date and GMT date) in the date source view but only to 'GMT Date' in the dimension usage, then the 'GMT Date' dimension would be the time dimension for the measure group even if the 'Local Date' dimension appears in front of the 'GMT Date' dimension. Sounds confusing? Ok, here is the quintessence:

You may have multiple dimensions of type 'time' in your cube, but you should avoid measure groups that have more than one of them in their dimension usage.

In any case, having more than one time dimension in a cube can be error-prone, even if you follow the above recommendation. It would have been better if Microsoft put the 'time dimension' as an attribute to the measure group instead of simple choosing the one that first appears in the cube definition. If somebody changes the order for any reason the aggregation behavior in one or more of your measure groups might change too.

You might not going to face a situation with more than one time dimension too often. But what about role playing time dimensions and semi-additive measures? My next post will be about this topic.

Sunday, August 23, 2009

Effects of semi-additive measures on other dimensions

Semi-additive measures are measures that don't use the same aggregation method along all of the cube's dimensions. In SSAS the time dimension plays an important role here. For example, if you choose the 'AverageOfChildren' aggregation method, the measure is averaged over the time but summed up over all other dimensions. When do we need this? Well, usually semi-additive measures are used when working with snapshot data like stock levels, balances of accounts etc.

For an example, let's look at a car park. We want to analyze the number of cars that are in one of our two park houses at a certain time. For our example we use a very simple fact table looking like this (I simplified the time to just 2 days, so imagine we're counting the cars at 1pm):

1

As you can see, someone also wanted to analyze the weather at each park house and each day so that we can easily see the effect the weather has on the parking behavior (obviously more people took the car when the weather was bad...).

Based on the fact data above we use a very simple data model looking like this:

 

2 

 

Now let's do some analysis on that model. First we analyze by date and park house:

3a

This clearly shows the semi-additive behavior of our measure: For each row (combining the two days) the value is averaged while it gets summed up across the park houses (columns) as expected. At day 1, 1pm we had 10 cars in car park P1. At the same time on day 2 there were 30. Of course we cannot add these values together but the average makes sense. So the meaning of the grand total is 'at 1pm there was an average of 20 cars in our car park P1'.

Now, what would happen if we start analyzing by the weather? As the weather is not our time dimension you might expect values to add up as a simple sum. So let's try:

3

Hmm, what happened here? The sum would have been 75 but our OLAP query results in a value of 50. What had happened and what is the total? You can easily see that it is neither the sum nor the average of the above values.

In order to clarify this behavior, let's also include the time in our analysis:

4

Using this view we can clearly see that SSAS made no mistake. The total number of cars being in our park houses at an average weather is 25 (average of 20 and 30, line one in the grid above), so this is correct. On the other hand, we had 30 cars in our park houses at day 1 (20 in a park house with average weather at that day and 10 in a park house with good weather) and 70 at day 2 giving a total average of 50. So this is also correct.

So the total for the weather is not the average or the sum of it's detailed values but the average of the summed up values.

This is a very simple example of how semi-additive aggregation of a measure also influences aggregation against non-time dimensions as well.

With this in mind you are now well prepared to look at the following analysis based on the same source data:

5

This time we left the time dimension completely out of the analysis resulting in averages along both other dimensions (park house and weather) as all values we actually see are already averages.

Surely, it's less confusing when looking at a single point of time like in the example below, where everything sums up neatly:

6

Sunday, August 9, 2009

Calculating the proper ISO week for the time dimension

There are some definitions for calendar weeks and T-SQL offers many of those calculations out of the box. When you're doing projects in Europe you will frequently find the ISO week definition for the calendar week as of ISO 8601. Week 01 in the ISO week calendar is defined is week with the year's first Thursday in it and weeks start with Monday and end with Sunday. The notation described in the ISO looks for exmaple like this 2009-W28-6. This would mean the sixth day of the ISO week (i.e. Saturday as the ISO week starts with Monday) of week 28 in the year 2009.

 

Fortunately T-SQL 2008 knows how to compute the ISO week number using the datapart function, for example

declare @dt date;
set @dt=CAST('2009-7-11' as DATE)
select DATEPART(ISO_WEEK,@dt)

results in 28 (week 28).

Now for the day of the week we can again use datepart, this time with the WEEKDAY parameter:

select DATEPART(WEEKDAY,@dt)

Depending on your machine settings this will most likely return 7 (as Saturday is the 7th day of the week when starting with Sunday=1). We can override the first day setting by using the following statement:

set datefirst 1;

 

After doing so the result of the above statement would be 6 as required. If you don't want to use the datefirst setting you could as well correct the day number manually using the modulus operator %:

select (DATEPART(weekday,@dt)+5) % 7+1

 

So up till now we have our week and the day. The remaining thing is the year. You might expect this to be very easy as the T-SQL year function extracts the year from a given date. But the year of the calendar week might differ from the calendar year, for example December 29, 2008 would be in the ISO week 1 of 2009 (2009-W1-1).

Therefore we need to compute the proper week. A way to do so is like this:

select DATEPART(year,DATEADD(d,-(DATEPART(weekday,@dt)+5) % 7+3,@dt))

 

Putting it all together we end up with the following computation:

cast(DATEPART(year,DATEADD(d,-(DATEPART(weekday,@dt)+5) % 7+3,@dt)) as nvarchar)
+N'-W'+CAST(DATEPART(ISO_WEEK,@dt) as nvarchar)
+N'-'+cast((DATEPART(weekday,@dt)+5) % 7+1 as nvarchar)

 

In order to use this expression in your data source views (DSV) computation you would have to replace the @dt variable with the specific date field and you have to make sure that the datefirst parameter is set to 7 (default), i.e.

select @@datefirst

should return the value 7.

Wednesday, August 5, 2009

Book recommendation: Expert Cube Development with Microsoft SQL Server 2008 Analysis Services

So here is my first book recommendation in this blog. The authors (Chris Webb, Alberto Ferrari and Marco Russo) are all well known Analysis Services experts and they did a pretty good job in writing this book. It is fully packed with information, tips, tricks and recommendations. The authors also reference not only the core Analysis Services features but also provide useful links to other tools like BIDS Helper or MDX Studio.

clip_image002

You can find the book and more information about its content in the Amazon book store.

Saturday, July 25, 2009

How to create time attribute relationship (2)

Building your time attribute relationship you have to consider the following questions:

  • How will I normally navigate the date hierarchies?
  • What formatting options do you usually use in your reports, eg. for the month: fullname (January, ...), shortname (Jan, ...), number (1, ...)
  • What other information do you need (eg. holidays, day number of year)?

When building up your time dimension, you should

  • Set up proper hierarchies giving the commonly used navigation routes
  • Remove attribute hierarchies of attributes that are also appear as a level in a hierarchy
  • Use proper keys for your attributes and set up the attribute relationship accordingly
  • Use rigid instead of flexible relationship whenever possible (usually this is true for date attributes...)
  • Name your attributes using a defined naming schema. The attribute type property gives a good advice. For example you can use names like DayOfMonth (1 to 31), DayOfWeek (1 to 7), DayOfYear (1 to 365) etc. Don't get confused with the many options available. Usually you will only need few of them. There is a good overview at http://technet.microsoft.com/en-us/library/ms175662.aspx.
  • Set the proper attribute type for each of your attributes
  • Set the dimension type property to "time" (important if you're also using semiadditive measures as they require a time dimension)

 

Below is an example of a date dimension I recently used. Keep in mind though that this is just an example. You might not need an ISO week or quarters (I usually try to avoid them...) or you may even want to see semesters or incorporate a manufacturing calendar.

 

image

 

Attribute Key Name Sort by Type
Date DateID (eg. 20090711)     Date
Day Format DateID DayFormat (localized formatted date like 07/11) key Date
Day of Month DateID DayOfMonth (1 to 31) key DayOfMonth
Day of Year DayOfYear (computed) DayOfYear (1 to 365) key DayOfYear
Day Type Year
Month
DayOfMonth
DayType (eg. Holiday) key IsHoliday
Day Short Name Year
Month
DayOfMonth
DayShortName (e.g. Sa) key Days
Day Name Year
Month
DayOfMonth
DayName (eg. Saturday) key Days
Fiscal Month Of Year FiscalYear (eg. 2009)
MonthOfFiscalYear (1 to 12)
MonthOfYear (eg. 7) key
FiscalMonthOfYear
Fiscal Month Short Name FiscalYear (eg. 2009)
MonthOfFiscalYear (1 to 12)
MonthShortName (eg. Jul) key
FiscalMonthOfYear
Fiscal Month Name FiscalYear (eg. 2009)
MonthOfFiscalYear (1 to 12)
MonthName (eg. July) key FiscalMonths
Fiscal Quarter Name FiscalYear
QuarterOfFiscalYear
QuarterName (eg. Q2) key FiscalQuarters
Fiscal Year FiscalYear FiscalYear (eg. 2009) name FiscalYears
ISO Week ISOYear
ISOWeek (1 to 53)
ISOWeekName (29/09) key Iso8601WeekOfYear
ISO Year ISOYear ISOYear name Iso8601Years
Month Of Year Year
MonthOfYear (1 to 12)
MonthOfYear (1 to 12) key MonthOfYear
Month Short Name Year
MonthOfYear (1 to 12)
MonthShortName key Months
Month Name Year
MonthOfYear (1 to 12)
MonthName key Months
Quarter Name Year
QuarterOfYear (1 to 4)
QuarterName (eg. Q2) key Quarters
Week Of Year Year
WeekOfYear (1 to 52)
WeekOfYear (1 to 52) key WeekOfYear
Year Year Year name Years

 

Of course this is just one way of doing it and you might start using this approach and finetune it to your needs later. You may notice that I used the same source columns as names for fiscal month and month (and other related attributes). This is done in order to provide different keys for the fiscal periods in order to sort for example the months accordingly. For the standard calendar you want to see your months as January, February, ..., December while for a fiscal calendar with a fiscal year start of April 1 you want to see the fiscal months in the correct order April, May, June, ... December, January, February, March. The separate attributes allow different keys and so the sorting is not complicated.

You will also notice the ISO Week / ISO Year attributes which I've included for the European countries. I'll give some more information about this in a later post.

In this situation you could build natural hierarchies like the following ones:

image

The following screenshot demonstrates how you can leverage the atttribute relations for example in Microsoft Excel 2007:

image

Here, we simple display some more attributes alongside with the day information.

The result may then look like this

image

 

If you've struggled with date dimension attributes and relationship too like I did, I'm very interested in the ideas you developed for setting up your date dimension and I encourage you to leave me a comment.