Saturday, July 11, 2009

How to create a time attribute relationship (1)

Thanks to Google Analytics I saw that many page hits on my blog result from keyword searches for "create time attribute relationship". And in fact, it seems to be more complicated to do a proper attribute relationship modeling with the time dimension than with most other dimensions. The reason for that is simple: The time dimension involves many related attributes (e.g. months can be written as number, short- and longname) and parallel hierarchies (like calender and fiscal view or calender weeks).

In this blog I had two posts about attribute relationship so far (here and here). These two posts are the background for the current post as they provide information why you should set proper attribute relationship and how you can achieve this even if your data doesn't really reflect such a relationship (as with the month number of the year in relationship to the year number). So if you haven't read those two posts, I recommend you start with them.

When discussing with other consultants about the date attribute relationship there are many different favors and dislikes. For example: Do you want the month as a number (so you can easily use in pivot tables) or is it better to only use the month's name and put the month's number in the key? Actually the latter is how AdventureWorks handles the topic, so let's take a look at that model:

image

 image

The diagram in AdventureWorks looks pretty simple as many attributes have been hidden from the user. Keep in mind, however, that every design of the attribute relationship has it's consequences for usage in pivot tables or queries. For example, think of a need to display the 365 days of the year as rows and showing what month they belong to as a second column.

One approach for that could be to use the following MDX query:

with member monthname as [Date].[Month of Year].currentmember.name
select monthname on 0,
[Date].[Day of Year].[Day of Year]
on 1
from [Adventure Works]

But since the attributes "Day of Year" (1, 2, 3, ... 365) and "Month of Year" (January, February etc.) have no direct relationship, the query shows an output like this:

image

In the diagram above attributes are displayed in their own box if they act as a hierarchy level or if they have other attribute relationships associated with them. As in SQL 2005 the arrow points to the less detailed level (e.g. arrow from month to year because the year is less detailed).

Anyway, there is nothing wrong with the attribute hierarchies above. It's just something you need to be aware of and if you start building your time dimension, Adventure Works might be a good approach.

As you can easily explore all the settings (keys, names, hierarchies) in the Adventure Works sample database, I'll give another example of how to set up a time dimension in the next post.

Monday, May 25, 2009

Liquid browsing

Last week I had the opportunity to talk with Peter Leweke from Liquiverse about their new data browser and their focus to turn data from lists into a meaning or as they say on their homepage "Life beyond lists". It's really exciting to play with the beta release I got. Liquiverse will announce their new release of LiquidCSV soon and it's definitely worth taking a look.

I've included a screenshot taken from an export of some AdventureWorks' human resources data.

liquid

 

What makes the browser really interesting is the way, how fast and dynamically elements flow when changing sort order or applying filters. Also, when you move your mouse over some area with many data points, they magically drift away so you can easily address every single item of data. It's hard to explain - you will have to experience it.

Now, what does this have to do with OLAP? Today Liquiverse doesn't offer an interface to MDX or XMLA data sources but the API will be available in 2009 and it shouldn't be too difficult to include other data sources or to apply drill down effects using the API. So, I'm really looking forward to what's going on at Liquiverse.

Saturday, May 23, 2009

Accessing duplicate members in dimensions

In the last post we had an issue with multiple members sharing the same name but with different key attributes. Every approach of addressing such members with MDX like

[Dimension].[Hierarchy].[Name] or [Dimension].[Hierarchy].&[Key]

fails because this fetches only one particular member while the measure data may be associated with one of the other members sharing the same key.

Ok, usually you shouldn't be in this situation. If for example you're using the business key (instead of the surrogate key) as the attribute key, you only get one member and all the fact data is associated to that member. Therefor the above member expressions work fine.

But with parent-child attributes you cannot do this easily and writing the MDX from the previous post in order to filter all elements with a given name is somehow painful. And if you're keeping historical elements in the parent-child hierarchy you may want to find all elements with a given name.

The last post showed a solution to achieve this using MDX by filtering the elements by name, but this is somehow painful especially with many elements. So for today's post we're trying a different approach with a stored procedure.

The stored procedure simply takes a hierarchy and a string with comma separated member names and returns all matching members in the correct order. This may also be helpful for report development if you need to provide a specific subset of elements (say cost centers, regions, branches etc.) in a given order. Usually you can put those elements in a report parameter (maybe a hidden one) and use the strtoset-function to expand the elements to a set. But as stated above, this doesn't work with elements that share the same name and it might also be difficult to achieve a manually given sort order.

So, here's how the function works in the Adventure Works cube:

select {[Measures].[Amount]} on 0,
ASTools.GetMembers([Account].[Accounts],'Assets,Current Assets, Other Assets') on 1

from [Adventure Works]

   

The result looks like this:

image

Note that although we just specified 'Other Assets', both accounts with that name are now in the list. This is the behavior we wanted to achieve. As stated in the last post, in reality, the two account are not the same, but for our example, we pretend they are as the situation would be exactly identical with slowly changing parent-child structures.

For clarification, take a look at the following query:

select {[Measures].[Amount]} on 0,
{
[Account].[Accounts].[Assets],
[Account].[Accounts].[Current Assets],
[Account].[Accounts].[Other Assets]
} on 1

from [Adventure Works]

 

In this case the account 'Other Assets' is specified by it's member's unique name which refers to only one member of the dimension. As expected, the result looks like this (note the missing $75K):

image

 

Finallly, here is the source code for the procedure:

 

using System;
using System.Collections.Generic;
using System.Text;
using Microsoft.AnalysisServices.AdomdServer;

namespace SSASTools
{
    public class SSASLibrary
    {
        public Set GetMembers(Hierarchy h, String memberList)
        {
            SetBuilder sb = new SetBuilder();
            Expression ex = new Expression();

            String[] members = memberList.Split(',');

            foreach (String cmembername in members)
            {  
                ex.ExpressionText = "filter(" + h.UniqueName + ".allmembers, " + h.UniqueName + ".currentmember.name='"+ cmembername.Trim() + "')";
                Set s = ex.CalculateMdxObject(null).ToSet();

                foreach (Tuple t in s) {               
                    sb.Add(t);
                }         
            }

            return sb.ToSet();
        }
    }
}

Saturday, May 9, 2009

Parent-child dimensions, historical values (SCD) and keys

Usually working with historical values in an SSAS cube is not really complicated. Think of a customer whose ZIP code changed over time which is modeled as a slowly changing dimension of type 2:

ID Customer ZIP Current
1 John Doe 10000 false
2 John Doe 10200 false
3 John Doe 13041 true

 

If we don't include the ZIP-code in the MDX query, we just see the sales. For example, let's take a look at the following query:

select {[Date].[Year].[Year]} on 0,
{[Customer].[Customer].[John Doe]} on 1
from [Sample Cube]
where {[Measures].[Sales]}

 

The result of this statement would look somewhat like this:

Customer 2004 2005 2006
John Doe $3,400 $2,000 $1,999

 

The sales of the customer are related to the ID (fact to dimension relationship), so we can see the aggregated sales for John Doe in total and we can still filter sales per ZIP code over the time (let's assume John Doe moved to a new house with a new ZIP twice).

But why don't we see John Doe three times in this example? Well, if you don't, it's because you used the customer name field (not the id) as the key for the attribute Customer. When reading the dimension attributes from the dimension, SSAS sends a query that looks somewhat similar like this:

 

select distinct Customer as Key, Customer as Name, ... from DimCustomer

 

So the key determines if two rows in the source table make the same dimension member or not. Generally speaking it's good practice not to use the artificial surrogate key as the key for your dimension attribute, especially when you're working with slowly changing dimensions.

But what about parent-child dimensions? In this situation, the primary key of your dimension table will automatically be the key for your dimension members as each node of the parent-child tree has its unique key. If you want to keep historical values in the parent child dimension (like changes in other attributes or the location within the tree) you'll end up with different members sharing the same name but having different keys.

Ok, this sounds confusing, so let's start with an example. I'm using Adventure Works DWH 2K8 as a source. When looking in the base dimension table we'll find something like this:

 

image

 

Of course, this is not the same account (one is for assets, one for expenditures) but for our example let's pretend that the two accounts with the name "Other Assets" have been created because of a slowly dimension change, maybe the account with the id 24 has moved from the parent account 2 to the parent account 79. Again, this is not what was intended in the dimension table but for our example, the situation would be exactly the same. 

Just like in our example with the customer John Doe above, when selecting "Other Assets" we want to see all fact values that are associated to one of the two accounts. So we might try this MDX query:

 

select
[Measures].[Amount] on 0,
[Account].[Accounts].[Other Assets] on 1

from [Adventure Works]

 

The result looks like this:

image

Now, is this really the value for every amount that is associated with at least one of the two keys (24 and 84) above? Not really. In fact we just saw the values on one of the two rows. The reason is that because of the key column being different we really have two separate members in the dimension, both named "Other Assets". Each member expression just resolves to one member (no matter if you're using name or key or MDX functions like strtomember, nametoset etc.). So the expression [Account].[Accounts].[Other Assets] just resolves to one of the two dimension members and therefor the displayed value just reflects the value on that one member.

In order to show the difference, let's rewrite the query using a filter expression:

select {[Measures].[Amount]} on 0,

filter (
  [Account].[Accounts].members,[Account].[Accounts].currentmember.name="Other Assets"
  ) on 1

from [Adventure Works]

 

Now, the result shows both accounts with the given name:

image

Here you can clearly see what was missing in our previous query. 

Of course, we could as well aggregate both values (never do this with your real balance sheet though...) using the aggregate function:

with member [Account].[Accounts].[Other MemberAgg] as
Aggregate(
filter (
  [Account].[Accounts].members,[Account].[Accounts].currentmember.name="Other Assets"
  )
)

select {[Measures].[Amount]} on 0,
[Account].[Accounts].[Other MemberAgg]
on 1

from [Adventure Works]

 

This kind of problem applies to several situations. Just think of a situation where you want to remove the account "Other Assets" from your query using the set minus operation:

 

select {[Measures].[Amount]} on 0,
[Account].[Accounts].members - [Account].[Accounts].[Other Assets]
on 1

from [Adventure Works]

 

Without thinking about different keys you might have been surprised to still find "Other Assets" in the result of the query above:

image

 

As expected, we just eliminated one of the two accounts from the list by addressing it with the member expression above. In order to get the desired result we also have to use the filter expression:

select {[Measures].[Amount]} on 0,
[Account].[Accounts].members -

filter (
  [Account].[Accounts].members,[Account].[Accounts].currentmember.name="Other Assets"
  )
on 1

from [Adventure Works]

 

image

 

In the next post we'll continue thinking about those duplicate members and how to work with them efficiently.

Tuesday, April 21, 2009

Stress testing SSAS

Usually we're using load testing tools as a basis for performance tuning or in order to give reliable hardware recommendations. You can find two approaches for load testing at the Codeplex website.

 

The ASLoadSim project uses Visual Studio 2005 Team Edition for Software Testers and provides a good toolkit for automatic load testing. But even if you're not planning to use the Team Edition you may still want to have a look at the best practices document "SSAS Load Testing Best Practices" that is also included in this project's download.

The other source for stress testing available from the Codeplex website are the ASCMD stress testing scripts, which are basically a set of MDX queries and Windows command files to execute the queries. This is a very simple but still useful approach to stress testing the server and monitor the behavior of the server under heavy load.

You still need to provide some test queries for both projects. While both projects support features for clearing the cache in order to get comparable results, only the ASLoadSim includes random variables for the query builder. You may ask why you should think about random elements in your test queries as you've already cleared the cache for each run. But without those random elements you'll probably still have effects caused by the operation system IO cache or even by hardware caches built into the harddrive controllers or harddrives themselves.

So it's a good idea to provide some sort of random selection for your load test queries and since MDX offers support for VBA functions like Rnd() it should not be too difficult to design queries that randomly select elements from dimensions.

A very easy way to do so, is to use the topcount function on elements that are sorted randomly. The following query does this:

 

with
set random_customers as topcount(order([Customer].[Customer].[Customer],vba!rnd(1)),10)

select {} on 0,
random_customers on 1

from [Adventure Works]

 

image

 

Each run of the query should return 10 random customers from our Adventure Works Customer dimension. If you're only interested in a single random customer, it's also not difficult. We just have to consider that

Int ((upperbound - lowerbound + 1) * Rnd() + lowerbound)

 

results in a random number between lowerbound and upperbound. Because our level and set indexing is zero based, we have lowerbound=0 and upperbound=count-1, which simplifies the above formula to

Int (count* rnd())

 

So, a query, returning just a random customer could look like this:

select {} on 0,
[Customer].[Customer].[Customer].item(
vba!rnd() * [Customer].[Customer].[Customer].Count
)

on 1
from [Adventure Works]

 

Of course, in order to build your load test queries, you would include some other attributes, measures and calculated measures as well, but the samples above demonstrate how you can randomly pick dimension members.

Monday, April 13, 2009

SQL Server 2008 SP1 with Report Builder 2.0 as Click Once Application

As most of you might have already heard, SQL Server 2008 SP1 is out and can be downloaded from the Microsoft site. The build number is 2531 (so the complete version number is now 10.0.2531.0).

Among the bug fixes and improvements (all listed in KB 968369) you will also find the Click-Once version of the Report Builder (being a stand-alone application in the RTM release), so it's easier to deploy the Report Builder now. After the installation you will find the button for launching the Report Builder just where it was in SQL Server 2005.

Sunday, April 12, 2009

Performance Counters for SSAS 2008

Vidas Matelis posted an overview of the SSAS performance counters at ssas-info.com. You can find the complete list here:

http://www.ssas-info.com/analysis-services-articles/58-ssas-2008/1113-analysis-services-2008-performance-counters

 

As with other categories of performance counters you can easily get lost with the "countless counters".

The Analysis Services Performance Guide names these counters to start your performance analysis with:

 

MSOLAP: Processing

  • Rows read/sec

MSOLAP: Proc Aggregations

  • Temp File Bytes Written/sec
  • Rows created/Sec
  • Current Partitions

MSOLAP: Threads

  • Processing pool idle threads  (I sometimes find "query pool idle threads" more significant, maybe you want to monitor both)
  • Processing pool job queue length (I sometimes find "query pool job queue length" more significant, maybe you want to monitor both)
  • Processing pool busy threads (I sometimes find "query pool busy threads" more significant, maybe you want to monitor both)

SQL-Server: Memory Manager

  • Total Server Memory
  • Target Server Memory

Process

  • Virtual Bytes – msmdsrv.exe
  • Working Set – msmdsrv.exe
  • Private Bytes – msmdsrv.exe
  • % Processor Time – msmdsrv.exe and sqlservr.exe

Logical Disk:

  • Avg. Disk sec/Transfer – All Instances

Processor:

  • % Processor Time – Total

System:

  • Context Switches / sec

 

So it should be clear that you need to monitor the server machine comprehensively and not only the Analysis Services Process.

However, I picked some SSAS counters from the list that are a good one to start your exploration when focusing on the SSAS processes.

 

Storage Engine Query

I think that this is a really interesting category of counters to start with. Here you find information about the number of queries processed per second, the caching rate of the queries, the average time per query etc. There are really a lot of counters. So this category gives you a good overview of the servers' workload. For the beginning you might want to look at the following counters:

Object Counter Description
MSAS 2008:Storage Engine Query Current measure group queries Current number of measure group queries being actively worked on.
MSAS 2008:Storage Engine Query Measure group queries/sec Rate of measure group queries
MSAS 2008:Storage Engine Query Queries answered/sec Rate of queries answered.
MSAS 2008:Storage Engine Query Bytes sent/sec Rate of bytes sent by server to clients, in response to queries.
MSAS 2008:Storage Engine Query Queries from cache direct/sec Rate of queries answered from cache directly.
MSAS 2008:Storage Engine Query Queries from cache filtered/sec Rate of queries answered by filtering existing cache entry.
MSAS 2008:Storage Engine Query Queries from file/sec Rate of queries answered from files.
MSAS 2008:Storage Engine Query Avg time/query Average time per query, in milliseconds. Response time based on queries answered since the last counter measurement.
MSAS 2008:Storage Engine Query Dimension cache lookups/sec Rate of dimension cache lookups.
MSAS 2008:Storage Engine Query Dimension cache hits/sec Rate of dimension cache hits.
MSAS 2008:Storage Engine Query Measure group cache lookups/sec Rate of measure group cache lookups.
MSAS 2008:Storage Engine Query Measure group cache hits/sec Rate of measure group cache hits.
MSAS 2008:Storage Engine Query Aggregation lookups/sec Rate of aggregation lookups.
MSAS 2008:Storage Engine Query Aggregation hits/sec Rate of aggregation hits.

 

Connections

This category gives information about the number of connections, sessions and request which is also important to understand the workload and to see when bottlenecks occur.

Object Counter Description
MSAS 2008:Connection Current connections Current number of client connections established.
MSAS 2008:Connection Requests/sec Rate of connection requests. These are arrivals.
MSAS 2008:Connection Current user sessions Current number of user sessions established.

 

MDX

There are really a lot of counters about MDX. Just to name a few of them:

Object Counter Description
MSAS 2008:MDX Number of cell-by-cell evaluation nodes Total number of cell-by-cell evaluation nodes built by MDX execution plans
MSAS 2008:MDX Number of bulk-mode evaluation nodes Total number of bulk-mode evaluation nodes built by MDX execution plans
MSAS 2008:MDX Total cells calculated Total number of cell properties calculated

 

Memory

Memory is always important. Here you can also query the amount of memory being allocated by the aggregation cache.

Object Counter Description
MSAS 2008:Memory Memory Usage KB Memory usage of the server process. Same as perfmon counter for Process\PrivateBytes.
MSAS 2008:Memory AggCacheKB Current memory allocated to aggregation cache, in KB.
MSAS 2008:Memory Quota KB Current memory quota, in KB. Memory quota is also known as a memory grant or memory reservation.
MSAS 2008:Memory Quota Blocked Current number of quota requests that are blocked until other memory quotas are freed.

 

Aggregations

If your cubes rely on aggregations it might be interesting to know if they can be held in memory or if the are written to a temporary file. So you might also want to look at the following aggregation counters:

Object Counter Description
MSAS 2008:Proc Aggregations Temp File Bytes Writes/sec Usage of temporary file
MSAS 2008:Proc Aggregations Current partitions Current number of partitions being processed.
MSAS 2008:Proc Aggregations Memory size bytes Size of current aggregations in memory. This count is an estimate.
MSAS 2008:Proc Aggregations Temp file bytes written/sec Rate of writing bytes to a temporary file. Temporary files are written when aggregations exceed memory limits.

 

Processing

Processing time is also very important when considering performance especially when you're processing your cube regularly over the day while users are also making their queries. Knowing about the processings may also make the exploration of performance issues more easy. If you're just looking at the average query time for instance, you should also check what the server does in the meantime.
Object Counter Description
MSAS 2008:Processing Rows read/sec Rate of rows read from all relational databases.
MSAS 2008:Processing Total rows read Count of rows read from all relational databases.
MSAS 2008:Processing Rows converted/sec Rate of rows converted during processing.
MSAS 2008:Processing Total rows converted Count of rows converted during processing.
MSAS 2008:Processing Rows written/sec Rate of rows written during processing.
MSAS 2008:Processing Total rows written Count of rows written during processing.

 

Threads

Object Counter Description
MSAS 2008:Threads Query pool job queue length Nonzero values means that there are more queries than query threads. You may increase the number of threads (but only if CPU utilization is not too high because otherwise this would only result in more context switches and degrade performance)
MSAS 2008:Threads Query pool busy threads The number of busy threads in the query thread pool
MSAS 2008:Threads Query pool idle threads The number of idle threads in the query thread pool

 

I completely left out the counters for Caches, Datamining, Locks, Indexes and Proactive Caching here but these are important too and after investigating the above counters you will want to look at the more detailed counters as well. Just check Vidas' blog post for the complete list.