Search This Blog

Monday, May 25, 2009

Liquid browsing

Last week I had the opportunity to talk with Peter Leweke from Liquiverse about their new data browser and their focus to turn data from lists into a meaning or as they say on their homepage "Life beyond lists". It's really exciting to play with the beta release I got. Liquiverse will announce their new release of LiquidCSV soon and it's definitely worth taking a look.

I've included a screenshot taken from an export of some AdventureWorks' human resources data.

liquid_thumb1

What makes the browser really interesting is the way, how fast and dynamically elements flow when changing sort order or applying filters. Also, when you move your mouse over some area with many data points, they magically drift away so you can easily address every single item of data. It's hard to explain - you will have to experience it.

Now, what does this have to do with OLAP? Today Liquiverse doesn't offer an interface to MDX or XMLA data sources but the API will be available in 2009 and it shouldn't be too difficult to include other data sources or to apply drill down effects using the API. So, I'm really looking forward to what's going on at Liquiverse.

Saturday, May 23, 2009

Accessing duplicate members in dimensions

In the last post we had an issue with multiple members sharing the same name but with different key attributes. Every approach of addressing such members with MDX like

[Dimension].[Hierarchy].[Name] or [Dimension].[Hierarchy].&[Key]

fails because this fetches only one particular member while the measure data may be associated with one of the other members sharing the same key.

Ok, usually you shouldn't be in this situation. If for example you're using the business key (instead of the surrogate key) as the attribute key, you only get one member and all the fact data is associated to that member. Therefor the above member expressions work fine.

But with parent-child attributes you cannot do this easily and writing the MDX from the previous post in order to filter all elements with a given name is somehow painful. And if you're keeping historical elements in the parent-child hierarchy you may want to find all elements with a given name.

The last post showed a solution to achieve this using MDX by filtering the elements by name, but this is somehow painful especially with many elements. So for today's post we're trying a different approach with a stored procedure.

The stored procedure simply takes a hierarchy and a string with comma separated member names and returns all matching members in the correct order. This may also be helpful for report development if you need to provide a specific subset of elements (say cost centers, regions, branches etc.) in a given order. Usually you can put those elements in a report parameter (maybe a hidden one) and use the strtoset-function to expand the elements to a set. But as stated above, this doesn't work with elements that share the same name and it might also be difficult to achieve a manually given sort order.

So, here's how the function works in the Adventure Works cube:

select {[Measures].[Amount]} on 0,
ASTools.GetMembers([Account].[Accounts],'Assets,Current Assets, Other Assets') on 1

from [Adventure Works]

The result looks like this:

image_thumb1

Note that although we just specified 'Other Assets', both accounts with that name are now in the list. This is the behavior we wanted to achieve. As stated in the last post, in reality, the two account are not the same, but for our example, we pretend they are as the situation would be exactly identical with slowly changing parent-child structures.

For clarification, take a look at the following query:

select {[Measures].[Amount]} on 0,
{
[Account].[Accounts].[Assets],
[Account].[Accounts].[Current Assets],
[Account].[Accounts].[Other Assets]
} on 1

from [Adventure Works]

In this case the account 'Other Assets' is specified by it's member's unique name which refers to only one member of the dimension. As expected, the result looks like this (note the missing $75K):

image_thumb2

Finallly, here is the source code for the procedure:

using System;
using System.Collections.Generic;
using System.Text;
using Microsoft.AnalysisServices.AdomdServer;

namespace SSASTools
{
public class SSASLibrary
{
public Set GetMembers(Hierarchy h, String memberList)
{
SetBuilder sb = new SetBuilder();
Expression ex = new Expression();

String[] members = memberList.Split(',');

foreach (String cmembername in members)
{
ex.ExpressionText = "filter(" + h.UniqueName + ".allmembers, " + h.UniqueName + ".currentmember.name='"+ cmembername.Trim() + "')";
Set s = ex.CalculateMdxObject(null).ToSet();

foreach (Tuple t in s) {
sb.Add(t);
}
}

return sb.ToSet();
}
}
}

Saturday, May 9, 2009

Parent-child dimensions, historical values (SCD) and keys

Usually working with historical values in an SSAS cube is not really complicated. Think of a customer whose ZIP code changed over time which is modeled as a slowly changing dimension of type 2:

ID Customer ZIP Current
1 John Doe 10000 false
2 John Doe 10200 false
3 John Doe 13041 true

If we don't include the ZIP-code in the MDX query, we just see the sales. For example, let's take a look at the following query:

select {[Date].[Year].[Year]} on 0,
{[Customer].[Customer].[John Doe]} on 1
from [Sample Cube]
where {[Measures].[Sales]}

The result of this statement would look somewhat like this:

Customer 2004 2005 2006
John Doe $3,400 $2,000 $1,999

The sales of the customer are related to the ID (fact to dimension relationship), so we can see the aggregated sales for John Doe in total and we can still filter sales per ZIP code over the time (let's assume John Doe moved to a new house with a new ZIP twice).

But why don't we see John Doe three times in this example? Well, if you don't, it's because you used the customer name field (not the id) as the key for the attribute Customer. When reading the dimension attributes from the dimension, SSAS sends a query that looks somewhat similar like this:

select distinct Customer as Key, Customer as Name, ... from DimCustomer

So the key determines if two rows in the source table make the same dimension member or not. Generally speaking it's good practice not to use the artificial surrogate key as the key for your dimension attribute, especially when you're working with slowly changing dimensions.

But what about parent-child dimensions? In this situation, the primary key of your dimension table will automatically be the key for your dimension members as each node of the parent-child tree has its unique key. If you want to keep historical values in the parent child dimension (like changes in other attributes or the location within the tree) you'll end up with different members sharing the same name but having different keys.

Ok, this sounds confusing, so let's start with an example. I'm using Adventure Works DWH 2K8 as a source. When looking in the base dimension table we'll find something like this:

image_thumb1

Of course, this is not the same account (one is for assets, one for expenditures) but for our example let's pretend that the two accounts with the name "Other Assets" have been created because of a slowly dimension change, maybe the account with the id 24 has moved from the parent account 2 to the parent account 79. Again, this is not what was intended in the dimension table but for our example, the situation would be exactly the same.

Just like in our example with the customer John Doe above, when selecting "Other Assets" we want to see all fact values that are associated to one of the two accounts. So we might try this MDX query:

select
[Measures].[Amount] on 0,
[Account].[Accounts].[Other Assets] on 1

from [Adventure Works]

The result looks like this:

image_thumb2

Now, is this really the value for every amount that is associated with at least one of the two keys (24 and 84) above? Not really. In fact we just saw the values on one of the two rows. The reason is that because of the key column being different we really have two separate members in the dimension, both named "Other Assets". Each member expression just resolves to one member (no matter if you're using name or key or MDX functions like strtomember, nametoset etc.). So the expression [Account].[Accounts].[Other Assets] just resolves to one of the two dimension members and therefor the displayed value just reflects the value on that one member.

In order to show the difference, let's rewrite the query using a filter expression:

select {[Measures].[Amount]} on 0,

filter (
[Account].[Accounts].members,[Account].[Accounts].currentmember.name="Other Assets"
) on 1

from [Adventure Works]

Now, the result shows both accounts with the given name:

image_thumb3

Here you can clearly see what was missing in our previous query.

Of course, we could as well aggregate both values (never do this with your real balance sheet though...) using the aggregate function:

with member [Account].[Accounts].[Other MemberAgg] as
Aggregate(
filter (
[Account].[Accounts].members,[Account].[Accounts].currentmember.name="Other Assets"
)
)

select {[Measures].[Amount]} on 0,
[Account].[Accounts].[Other MemberAgg]
on 1

from [Adventure Works]

This kind of problem applies to several situations. Just think of a situation where you want to remove the account "Other Assets" from your query using the set minus operation:

select {[Measures].[Amount]} on 0,
[Account].[Accounts].members - [Account].[Accounts].[Other Assets]
on 1

from [Adventure Works]

Without thinking about different keys you might have been surprised to still find "Other Assets" in the result of the query above:

image_thumb4

As expected, we just eliminated one of the two accounts from the list by addressing it with the member expression above. In order to get the desired result we also have to use the filter expression:

select {[Measures].[Amount]} on 0,
[Account].[Accounts].members -

filter (
[Account].[Accounts].members,[Account].[Accounts].currentmember.name="Other Assets"
)
on 1

from [Adventure Works]

image_thumb5

In the next post we'll continue thinking about those duplicate members and how to work with them efficiently.