SQL Server 2008 | SQL Server 2008 R2

OLAP is perfect for analyzing fact records that are mapped to dimensions. However, what can be done, if the reference to the dimension changes depending on the selected period? I’m not talking about slowly changing dimensions here, but really about the period of selection.

Think of the following example: A manager wants to analyze the number of visited customers for his sales force. He wants to see, how many customers have been visited once, twice and so on during the last three months (or any other time period).

In order to do so, we need a dimension “Visits” (with the number of visits, for example 0, 1, 2, …) but we cannot map the visits against this dimension because this mapping depends on the selected period. For example, one single customer was visited once in January and once in February. If the selected period is January, the we need to see one customer being visited once. If the selected period is January and February together, we would need to see the customer being visited twice.

In SQL Server 2008 we can solve this by not linking our Visits dimension to the measure groups. The calculations are then performed by a cube script. For our example, we’re using a very simple data model:

The fact table FactVisits contains one row per visit (linked to date and customer). Note that the Visits dimension is not linked to the fact table. The source table for the visit dimension looks like this.

For the cube, we need to calculate the entries for the Visits dimension. To do so, I created an additional measure (Customer Count) based on some other value I found in the source fact table (the value will be overwritten using the cube script later). Usually you would create an additional column in the data source view (value 0) to source this measure from. Here is the definition of my Customer Count measure:

In order to calculate the number of customers for each visit count, I created the following cube script:

scope ([Visits].[Visit].[Visit],[Customer].[Customer].[Customer],[Measures].[Customer Count]);

this=iif(([Measures].[Visit Count])=CInt([Visits].[Visit].currentmember.properties("KEY")),1,NULL);

end scope;

The script is computed on leaf level of the visits (visit count dimension). In the cube script, the reference to [Visits].[Visit].currentmember.properties("KEY") results in the VisitID (number of visits). If the number of visits (measure [Visit Count]) is equal to the number of visits found in the visit dimension, this is counted as one. Note, that I used NULL instead of 0. This makes it easier to analyze, which customers have been visited for a given number of times, as null values can be suppressed (see last screenshot of this post).

So, let’s check the results up to this point.

The calculation now works for every single line but currently the totals are still wrong. The correct value for the totals can be computed using a dynamic set (to allow Excel multi selects) like this:

CREATE HIDDEN DYNAMIC SET CURRENTCUBE.[DynaVisits] AS [Visits].[Visit].[Visit];

([Visits].[Visit].[All])=Sum(existing [DynaVisits]);

I also want to count more than ten visits on a special dimension element (ID 999 in the table above). This can be achieved with the following cube script:

scope ([Visits].[Visit].&[999],[Customer].[Customer].[Customer],[Measures].[Customer Count]);

this=iif(([Measures].[Visit Count])>10,1,NULL);

end scope;

Testing the results (here using Microsoft Excel) shows the correct calculation for the grand total as well as for other levels of the visit count hierarchy:

The way our calculations works can be best observed when looking at a single customer.

In this case we had two visits of our customer in 2008 and one visit in 2009 giving a total of three visits for 2008 and 2009. Note that the actual fact is shown as a dimensional property here.

We can also apply multi select filters in Excel. Here is another sample screenshot filtering only customers with less than 4 visits during the last three months in 2009 (this is the ‘(multiple items)’ filter for the calendar). The data is now also analyzed by the sales representative (attribute of the customer). Note, that the grand totals are still correct:

As mentioned before it is also possible to see which customers are listed here by including the customers in the pivot table. This is shown in the following screenshot (filter to visit count = Zero).

However, the drill through action would still return all customers (due to the definition of my measure) as the value is only based on a computation. So the drill through should be disabled here.

In this case, I simply checked the measure in order to enable the drill through action:

Of course, the example is still a very simplified one. Usually you would need to know the customer base, so you’re not counting new customers as not being visited during the last years.

Interesting topic.

ReplyDeleteI named this "Event-based histograms" (see here - http://tomislavpiasevoli.spaces.live.com/blog/cns!5BB64CF526505D83!325.entry) because it's a histogram afterall, a one not based on a dimension but on a fact table instead (as you made distinction in the introduction).

Regards,

T

Great post! I'm surprised they've not given you a MVP yet.

ReplyDelete