Search This Blog

Loading...

Sunday, March 18, 2012

SSRS bar charts tips and tricks

SQL Server 2008R2 | SQL Server 2012

This post shows some tricks with SSRS report charts that I stumbled over some weeks ago.

I start with a very simple bar chart showing some values along a full year (12 months):

image

Here are some variants of this chart together with some details how the chart was created using SQL Server Reporting Services:

image Highlight plan values

Here the last three months contain the plan values. The shading of the bars is done by an expression. The vertical line is not the vertical axis, but a line chart series. How can this line be a true vertical? Like with xy scatter charts, you can specify each point’s x and y coordinates separately. The x coordinate goes into the category field for the series. So in this case there are two points: (9.5,0) and (9.5,1). This line is on the secondary vertical axis with min/max set to 0 and 1.
Since the labels on the horizontal axis are automatically shown as scalar values, if one of the chart’s series contains x/y data, I had to create these labels for the horizontal axis using a second chart which is placed below the bar chart. 
image Another option for highlighting plan values

In this chart I didn’t use a line to separate the planning values but instead I inserted some space. One option to do so is to add an extra row of data with value zero. But by doing so, the gap between the two regions becomes too wide (in the example on the left, the gap is smaller than the width of a bar).
The gap can be created by correcting the category value for the data rows that correspond to the planning data.
In my case, I hard-coded the start for the planning data (usually it would depend on other data columns or report parameters). My category expression looks like this:

=Fields!Month.Value+
iif(Fields!Month.Value>9,0.3,0)


Note, that for all bars that correspond to the planning data, a value of 0.3 is added to the horizontal position.
In order to get a proper placement of the labels, the labels again are created using a separate chart as the data labels for bars of height 0. In order to explain this, the following screenshot enlarges the bars and shows the axis of the second chart:

image
image Plan values as a stepped line

The plan values are shown as a stepped line here. Bars that rise above the stepped line are showing a better actual compared to the plan. The stepped line is a simple basic series type. However, usually the steps would be located in the center of the bars like this:

image

In order to correct this, you can simply take off 0.5 from the x coordinates of this series, so in my case the category field’s expression looks like this:
=Fields!Month.Value - 0.5
In order to create a proper stepped line for the last bar, the last row of data for the stepped line has to be duplicated (one data point more than for the bar chart needed).
image  Shaded areas in the chart background

This effect is relatively simple. I used an area chart in the background for the months with the plan values. Again, the area chart’s x coordinates have to be corrected by 0.5 in order to fit to the bar chart’s positions.
The formula for the background area chart results in blank for all months without the shading (don’t use zero, because then you would create a linear slope between September and October in this example).
image  Colored horizontal axis

This chart type was once proposed by Prof. Hichert for a visual separation of actual and plan values. Here, the horizontal axis is hidden. The colored axis is formed by two area chart series. In order to prevent overlapping with the bar chart, they have to be on the same vertical axis as the actual values but with a negative value which as to be scaled depending on the maximum of the values for the bar chart.
image Empty horizontal axis

This is just a variation of the previous chart and again one of the examples of Prof. Hichert. Instead of using two different colors for the area chart forming the horizontal axis, here I’m using different fill colors.

These are just some of the endless possibilities to create charts in reporting services.

Sunday, January 8, 2012

Sparse Snapshots in DAX / BISM Tabular

SQL Server 2012/Denali | PowerPivot

In my last post I wrote about using delta values instead of full snapshots. However, the amount of data is identical if we using a sparse snapshots instead. For this purpose, the source data from my last post has just to be converted to absolute values as shown below:

Only delta values

 

Only snapshots (changes)

t3 image t2

So the amount of data is not really changed. But with the sparse snapshots the computations get a lot more difficult. If we have delta values (left table) we can simply sum up all deltas to a given date and this works for all of our related tables (for example products). With sparse snapshots (right table) we have to find out the last value per product and then add the results up to the total, so we have to do a calculation per product before.

Before we get to the calculation, here is the very simple source data model that I used:

t7

It took me some time to figure out the following solution and I’m pretty sure that there must be an easier method. So feel free to experiment and write comments.

Here is the final code:

Stock:=
    SumX(
        Summarize(
            'Product'
            ,  'Product'[Product]
            , "LastStock"
            , calculate(
                Sum('Stock'[StockLevel])
                , dateadd(
                    LastDate('Date'[Date])
                    ,-floor(
                        LastDate('Date'[Date])
                        -
                        MaxX(
                            Filter(
                                Summarize(
                                    DatesBetween('Date'[Date], date(2000,1,1), LastDate('Date'[Date]))
                                    ,'Date'[Date]
                                    , "X"
                                    ,  calculate(Sum('Stock'[StockLevel]), ALLEXCEPT('Date', 'Date'[Date]))
                                )
                                , not isblank([X])
                            )
                            , 'Date'[Date]
                        )
                        , 1
                    )
                    ,DAY
                )
            )
        )
        ,
        [LastStock]
    )

This is how are data looks like without the computation from above:

t5

Since the data is sparse we only see the stock at dates where there are changes. For the aggregation I used the Sum function (which doesn’t make much sense here for the date aggregates) and you also see that the row total over the products only takes products into account when there are changes.

And here is the resulting Excel pivot table using the calculation from above:

t6

As you can see the value for Quickerstill starts with 20 then drops to 18 at January 3, then to 15 at January 5 etc. Also the totals are correct now.

The remaining part of this post is about the formula from above, so it’s up to you to decide if you want to continue reading. The most important point here is that the calculation is much easier when working with delta rows or delta rows with intermediate snapshots (for example each first day of a month, quarter, year).

 

Details of the calculation

The first question is about the way we’re doing the calculation. Wouldn’t it be easier and faster to have them on the rows and therefore persisted (is ‘persisted’ the right term when talking about an in-memory database??)? Well, if you look at the screenshots above, the only reason we’re doing all this is because we need calculated values on “rows” that do not exist. In the last screenshot there is no row for January 31, but the value has to be computed: For Notate it’s the value of January 1 and for Quickerstill it’s the value of January 21. So we have to use a calculated measure in the model.

To makes things easier here, let’s start with a single product (Quickerstill). I want to calculate the last date for which I have a value. To do this I filter the date range from a very early date (2000/1/1 here) to the last date in the current context to those values, where the sum of the stock amount (any aggregate would do) is not blank, then take the biggest date (max function). This is the code:

Step1:=
    MaxX(
        Filter(
            Summarize(
                DatesBetween('Date'[Date], date(2000,1,1), LastDate('Date'[Date]))
                ,'Date'[Date]
                , "X"
                ,  calculate(Sum('Stock'[StockLevel]), ALLEXCEPT('Date', 'Date'[Date]))
            )
            , not isblank([X])
        )
        , 'Date'[Date]
    )

If I include the original measure and this calculation in a pivot table, this is the result:

t8

For each day our calculation returns the last date with data. For example the calculated date for January 12 is January 6 as this was the last day with data before January 12. It seems we are already close to the solution but please keep in mind that the calculation from above would not work for more than one product (for example if the products are not filtered) as the date for each product has to be different. If we filter the pivot table from above for product Notate, the results would look totally different. Here are the first rows:

t9

But let’s ignore this for a second and keep the filter on Quickerstill. The next task would be to calculate the stock level at the calculated date. This sounds easy and the following formula was my first approach:

Step2 (with error):=
    calculate(
        Sum('Stock'[StockLevel])
            ,MaxX(
                    Filter(
                        Summarize(
                            DatesBetween('Date'[Date], date(2000,1,1), LastDate('Date'[Date]))
                            ,'Date'[Date]
                            , "X"
                            ,  calculate(Sum('Stock'[StockLevel]), ALLEXCEPT('Date', 'Date'[Date]))
                        )
                        , not isblank([X])
                    )
                    , 'Date'[Date]
                )
    )

The code in blue is the exactly the code from above which gives the filter context for the calculation of the sum of the stock level values. However, this results in an error:

Semantic Error: A function ‘MAXX‘ has been used in a True/False expression that is used as a table filter expression. This is not allowed.

I love the ‘This is not allowed’ here. Actually there is no reason why it isn’t and I’m not feeling like I’m doing something illegal here. Maybe it’s also just a limitation of the CTP 3 beta release that I am currently working with. I also tried to wrap the MaxX function in a DateAdd with 0 days but this is also ‘not allowed’.

But there is a way which can be found after some time of experimenting. I take the number of days between the last date from the context and the date from the calculation in blue and use this result to correct my last date. Sounds confusing? Let’s start with the number of days. Here is the next step of the calculation:

Step2:=
    floor(
        LastDate('Date'[Date])
        -

        MaxX(
            Filter(
                Summarize(
                    DatesBetween('Date'[Date], date(2000,1,1), LastDate('Date'[Date]))
                    ,'Date'[Date]
                    , "X"
                    ,  calculate(Sum('Stock'[StockLevel]), ALLEXCEPT('Date', 'Date'[Date]))
                )
                , not isblank([X])
            )
            , 'Date'[Date]
        )

        , 1
    )

Again the code in blue is the last date calculation from above (step 1). Let’s take a look at the result of this calculation:

t10

As you see, the new calculation tells us, how many days we have to go back in time to find a value. For example for January 14 we get a value of 8 meaning we have to go back 8 days to January 6 to find a value. Now we can wrap this in the calculation of the stock value and for some reason that I don’t understand, this is not illegal anymore although I’m doing exactly the same as I did before. Here is the calculation:

Step3:=
    calculate(
        Sum('Stock'[StockLevel])
        , dateadd(
            LastDate('Date'[Date])
            ,-floor(
                LastDate('Date'[Date])
                -
                MaxX(
                    Filter(
                        Summarize(
                            DatesBetween('Date'[Date], date(2000,1,1), LastDate('Date'[Date]))
                            ,'Date'[Date]
                            , "X"
                            ,  calculate(Sum('Stock'[StockLevel]), ALLEXCEPT('Date', 'Date'[Date]))
                        )
                        , not isblank([X])
                    )
                    , 'Date'[Date]
                )
                , 1
            )
            ,DAY
        )
    )

Again the code from the last step is colored blue (step 2). Now we’re pretty close to the final formula. Let’s check the result when including both products:

t11

Now the result for each product is already working correctly. But the total is still not correct as you can see from the line marked in red. For January 2 the last date with data is the same for both products (January 1), therefore the value is correct. But for January 3 there are different days for the last stock value, so we only see one product in the total.

However, the remaining part is not difficult. We simply summarize (group) by product and take the sum. Again, the code in blue is the last code from the step before (step 3):

Stock:=
    SumX(
        Summarize(
            'Product'
            ,  'Product'[Product]
            , "LastStock"
            , calculate(
                Sum('Stock'[StockLevel])
                , dateadd(
                    LastDate('Date'[Date])
                    ,-floor(
                        LastDate('Date'[Date])
                        -
                        MaxX(
                            Filter(
                                Summarize(
                                    DatesBetween('Date'[Date], date(2000,1,1), LastDate('Date'[Date]))
                                    ,'Date'[Date]
                                    , "X"
                                    ,  calculate(Sum('Stock'[StockLevel]), ALLEXCEPT('Date', 'Date'[Date]))
                                )
                                , not isblank([X])
                            )
                            , 'Date'[Date]
                        )
                        , 1
                    )
                    ,DAY
                )
            )
        )
        ,
        [LastStock]
    )

Now, this does the trick and we end up with the final screenshot as shown at the beginning of this post.

If you like you can play with the formula by downloading the sample work book here (right click on the link, then choose ‘Save as…’). You will need the PowerPivot Add-In CTP 3 or later in order to open the workbook.

Saturday, November 19, 2011

Stock levels as delta rows in DAX / BISM Tabular

SQL Server Denali | PowerPivot

A common way to compress large amounts of snap shot data is to store delta values instead of each snapshot value. This makes sense, if the data does not change every day. But for a good query performance you might not want to aggregate data over long periods.Therefore it makes sense to have a regular absolute snapshot value in the data and to use deltas between those snapshots. The work needed to create the periodic snapshots is usually done in the ETL process.

For my example, I’m using the following data table with absolute snapshot values as the first row of the month and deltas afterwards:

p1

For example, for our product Quickerstill we start with a stock level of 20 boxes at the beginning of January 2011. Then, at January 3, 2011 we sold two of them (-2) and again on January 5 another 3 boxes (-3).

The goal is to create a measure that gives us the current stock level at each date (also on the dates between the delta rows).

In order to do so, we need a ‘real’ date dimension, so we have a separate date table that is linked to our facts. This is how this simple model looks like:

p2

 

The measure DeltaStock is the original stock value from our table above (mixture of snapshots and deltas).

In order to perform the desired computation we can simply use the month-to-date formula as each day’s value computes as the sum from the first day of the month to the current day (including all deltas).

This is the simple formula we’re using to compute the stock at each day:

Stock:=TOTALMTD(SUM([DeltaStock]),'Date'[Date])

It’s amazing how simple this calculation is. Let’s take a look at the result. In order to see the effect, I added the original stock column together with the new computed column.

p3

As you can see, we now have a stock for each day that computes correctly from the mixture of snapshot and deltas values.

Of course we could also do the calculation without the absolute snapshot values in between. In this case we have to aggregate the values from the very beginning up to the current date. First let’s take a look at the source data without the absolute snapshots:

t1

The data is pretty much the same as in the first approach. Only the rows for the absolute snapshots are missing (apart from the first initial values).

In this case the calculation would look like this:

Stock:=
  SumX(
     DATESBETWEEN(
        'Date'[Date]
        , date(2000,1,1)
        , lastdate('Date'[Date])
     )
     ,calculate(
         Sum('Stock'[DeltaStock])
         ,ALLEXCEPT('Date','Date'[Date])
      )
  )

The calculation is still pretty simple. However I would prefer the option with the snapshot values in between for performance reasons and because these snapshots can be easily created in ETL (if they are not delivered from the source system).

Just two more remarks before I finish this post. The first one is about the DatesBetween range in the formula above. I’m using 2000/1/1 as the start date. However if you take a look at the returned dates only the existing rows from our date table are returned if they are matching this date range. So we could also write 1900/1/1 without risking to end up with a lot of rows.

The other remark is about the future time. Since our calculation takes the last value as the value for all the future, you will find values for all entries of the date dimension. This might not be wanted. In this case you can wrap the calculation from above inside an if statement to check the date:

StockClipped:=
  if(
    firstdate('Date'[Date])>Now()
    , Blank()
    ,SumX(
       DATESBETWEEN(
          'Date'[Date]
            , date(2000,1,1)
            , lastdate('Date'[Date])
         )
         ,calculate(
           Sum('Stock'[DeltaStock])
           ,ALLEXCEPT('Date','Date'[Date])
       )
     )
  )

With this modification (which also works with the formula for the absolute intermediate snapshots from above) values are only shown for periods that are over or have at least started. So the formula would return a value for the full year 2011 once the year has started. If you only want to see values for periods that have ended, you can replace the function ‘firstdate’ with a ‘lastdate’.

Sunday, October 23, 2011

Excel Services Scorecard as Windows Desktop Gadget

SharePoint 2010 | SQL Server 2008 | SQL Server 2008R2 | SQL Server 2012 (Denali)

Early this year I stumbled across a very interesting Windows desktop gadget, that is capable of showing an Excel Services element on the Windows desktop. Here, element can be a certain named region, a pivot table or a chart in an Excel Services document. The technology for showing this element is the Excel Services REST API (REST stands for Representational State Transfer).

You can find the article about the desktop gadget as well as the download link for the gadget itself here. Installation and configuration of the gadget is pretty well explained on the linked site, so I can keep this short here.

After adding the gadget to your desktop, the gadget still needs to be configured.

p1

By clicking on the gadget configuration icon, the configuration dialog is displayed:

p2

Workbook The URL to your Excel Services workbook, for example
http://srv1/PowerPivot/Gadget.xlsx
Show in gadget Here you can pick from any named region, pivot table or chart that should be displayed in the “expanded” state of the gadget (flyout)
Thumbnail Here you can pick from any named region, pivot table or chart that should be displayed in the normal state of the gadget. This is what you see on your desktop first
Refresh refresh interval of the gadget

So the simple idea is to create a nice pivot table and to use this as the basis for the gadget. For my example I created a simple pivot table based on the Finance perspective of the Adventure Works OLAP cube. For the pivot table I have the year on the filter, the sales amount and the operating profit KPI as the data and the departments on the rows.

p3

If we use this large table in our gadget, the thumbnail view gets pretty much “microscopic”:

p6

So, for the thumbnail (default) view you should choose a much smaller area. The flyout (detail view) is much better, but the KPI indicators and the filters are not shown:

p5

So, here are a few tips and tricks you can use to make the gadget look nicer.

 

1. Use time filters for current year, current month etc.

Of course we want our gadget to always show the latest values. Because of the auto refresh interval, we don’t have to care about this. But usually we will also use a time filter to restrict the data to a specific week, month etc. Since we don’t see the filter, the idea is to have this set automatically to the current time.

This can be easily done in the Excel Pivot Table by using date filters. Unfortunately, these filters don’t work in the filter area, so we have to place the time hierarchy on the rows or columns. Then we can apply a filter (for example current year) as shown here:

p7

For Adventure Works this would not result in any data since the sample date is only available for the years shown in the screenshot. However, in real life scenarios, this would be a good choice for the filter.

 

2. For thumbnail view, convert the pivot table to formulas

Formulas are much easier to handle and to format, compared to a pivot table. Remembering the very narrow available space for the thumbnail view, having full control of the layout is important. You may even want to hard-code the thumbnail view. Let me explain what I mean. In order to have the connection name at a single place, we start in a blank sheet by putting the connection name in a named cell called “OLAPConnection” (feel free to choose a different name):

p8

As the next step, let’s construct the current date member. From Management Studio, we can see that the MDX name of the time members looks like this (may be different in your cube, this example is taken from the Adventure Works sample database):

  • [Date].[Calendar].[Calendar Year].&[2001]
  • [Date].[Calendar].[Month].&[2001]&[1]
  • [Date].[Calendar].[Date].&[20010101]

Here we have January 1, 2001. This is easily constructed using Excel’s time functions. So we simply add these fields to our Excel sheet

Label/Name Cell Formula
Today B3 =Now()
Year B4 ="[Date].[Calendar].[Calendar Year].&[" & Year(B3) & "]"
Month B5 ="[Date].[Calendar].[Month].&[" & Year(B3) & "]&[" & Month(B3) & "]"
Day B6 ="[Date].[Calendar].[Date].&[" & 10000*Year(B3)+100*Month(B3)+Day(B3) &"]"

Of course you could also add fields for the previous month, the week etc., just depending on the needs of your scorecard. I named the cells B4 as MDXYear, B5 as MDXMonth, B6 as MDXDay. This is how the result looks like:

p9

I had to fake the current date in order to see some values. Therefore I replaced the formula for today with this one:

=Date(2004, Month(Now()), Day(Now()))

Of course, you won’t want to do this in a real life scenario but since the sample dataset contains no data for 2011 I had to use this “time machine formula”.

Before you start wondering what all this is good for, let’s query some data. For example, let’s assume that we want to see the operating profit (which is on the account ‘Operating Profit’) for the current year. So this would by our Excel formula:

=CUBEVALUE(OLAPConnection,"[Measures].[Amount]","[Account].[Accounts].[Operating Profit]",MDXYear)

In order to show this value in our gadget, I placed it on a new sheet and adjusted the column width and height a little bit.

p10

In order to get a nice flyout, I also created a simple pivot chart in the Excel sheet, showing the operational profit during the year. After saving the Excel file, our gadget now looks like this:

p12

And here is the flyout:

p13

 

3. (Conditional) background colors and fonts are preserved in the REST API

In order to include a kind of traffic light approach, we already found out that Excel indicators are not yet supported in the REST API. However, conditional formatting is supported, so you can easily create a scorecard like the follow:

p14

Also KPI indicators can be created using special characters, for example from the WingDings font as shown in the following example:

p15

For this example, I used a separate table of banding ranges and an Excel VLookup to find the appropriate color for the indictor. Here, 1=green, 2=yellow, 3=red. In the Excel cell I used conditional formatting to choose the text color appropriately. But as we always want to display a certain element (here, the diamond from the WingDings font), I used a custom format for each of the cells, so the number (1, 2 or 3) is not shown but only one character. The corresponding character for the diamond is “u”, so the custom format looks like this:

p20

And of course, there is a lot more you can do with all these formatting, cube functions etc.

 

3. Make changes to the source code of the gadget

You can extract the gadget or modify the source code of the gadget yourself. After installing the gadget, the extracted sources can be found here:

%LOCALAPPDATA%\Microsoft\Windows Sidebar\Gadgets

After modifying the sources, the gadgets need to be switched off and on in order for the changes to apply. The main file is the gadget.html here. For example you could change the link “By Excel Services” to point on your SharePoint server. To do so, a simple change in the source is needed:

Before:
<tr><td id="dockedTitle" width="100%">
<a id="leftDockedTitleLink" href="
http://blogs.msdn.com/cumgranosalis/pages/excel-services-windows-7-gadget.aspx">By Excel Services</a>
</td><td>

After:
<tr><td id="dockedTitle" width="100%">
<a id="leftDockedTitleLink" href="
http://srv1/PowerPivot/Forms/AllItems.aspx">All reports</a>
</td><td>

I also changed the background color here, so this is how the result looks like:


p17

As you can see, you can do a lot of interesting things with this simple but very powerful desktop gadget.

 

4. Use parameters

If you want to use the same Excel Services file for more than one user you may want to pass parameters from each individual instance of the desktop gadget to the Excel Services file. This is also possible, however there are some things to take care of.

The syntax for passing a parameter is

Ranges('cellname')=value

Here, cellname is a named cell in Excel that we want to pass the value to. This is how it is entered in the desktop gadget, if our named cell is named ‘value1’ and we want to pass ‘xyz’:

z1

Multiple parameters may be passed by separating them with an ampersand, for example

Ranges('value1')=xyz&Ranges('value2')=42&Ranges('value3')=01

This could be our corresponding view in the desktop gadget:

z2

I just added a field that concatenates all three parameters to show that the Excel Services sheet is recomputed based on the passed values.

As you can see, all parameters are passed to the gadget. However, numeric parameters are considered as numbers in Excel, so 01 was changed to 1. If you want to prevent this, add a ' in front of each text.

This is how the parameters should look like when you want to make sure, text is passed as text (and not converted to a number or date) in Excel:

Ranges('value1')='xyz&Ranges('value2')=42&Ranges('value3')='01

You should also be aware, that this string is passed to the URL “as it is”. So you have to encode all characters that are not allowed in a URL.

For example, if you want to pass the year 2006 from AdventureWorks, the unique name of the date member would be:

[Date].[Calendar].[Calendar Year].&[2006]

In order to pass this to the gadget, you have to encode blanks, the square brackets and the ampersand. So the result would look like this:

Ranges('parDate')=%5BDate%5D.%5BCalendar%5D.%5BCalendar%20Year%5D.%26%5B2006%5D

This is very difficult to read. Therefore I recommend just to pass the key (2006 in this case) as the parameter and to construct the unique name in Excel using a formula like

="[Date].[Calendar].[Calendar Year].&[" & parDate & "]"

Of course, there are a lot more things you can do with this simple, yet powerful Windows desktop gadget. Like with most other gadget you may also place more than one instance of the gadget on your desktop, so you can have different scorecards. You can also add links to your Excel sheet, so that you can jump directly to a dashboard for a specific key performance indicator. Just start playing with the gadget and see how powerful und useful it is.

Sunday, September 25, 2011

Custom Aggregates in DAX / BISM Tabular (part 2)

SQL Server Danali | PowerPivot

This post combines two ideas

  1. The calculation of a moving average (last 30 days) as shown in a previous post
  2. Custom aggregations

What I want to show here today, is how we can influence the way our calculation works on higher levels. For this purpose I’m showing three different approaches:

  1. The monthly (quarterly, yearly) total is defined as the last day’s moving average of the given date range.
    This corresponds to the original approach from my post about moving averages
  2. The monthly (quarterly, yearly) total is defined as the average of all the moving averages on day level.
  3. The monthly (quarterly, yearly) total is defined as the average of the sales amount of all days of that period (the moving average is only to be calculated on date level)

Of course, there are many more possibilities but I think the methods shown here, can be used for many other aggregation requirements.

Here are the three aggregations side by side.

image

Although the values on higher levels of aggregation differ a lot, the daily values are identical (we just wanted to change the rollup method, not the calculation on a detail level).

Let’s start with the first one:

 

1. The monthly (quarterly, yearly) total is defined as the last day’s moving average of the given date range

This is shown here for our measure “Sales Amount(30 avg)”:

image

As shown in my previous post, the formula may look like this:

Sales Amount (30d avg):=
AverageX(
  Summarize(
    datesinperiod(
       'Date'[Date]
       , LastDate('Date'[Date])
       , -30
       , DAY
    )
    ,'Date'[Date]
    , "SalesAmountSum"
    , calculate(Sum('Internet Sales'[Sales Amount]), ALLEXCEPT('Date','Date'[Date]))
  )
  ,[SalesAmountSum]
)

Since we use the expression LastDate(‘Date’[Date]) the last date of each period is used for the moving average – exactly what we wanted.

 

Average at month level as the average of the moving averages (day level)

In this approach, the monthly aggregate has to be calculated as the average of all the daily moving averages of that month. The picture shows what this means:

image

This might look pretty difficult. However, for our calculation we simply have to wrap the existing calculation in another Summarize – function.

This is the formula I used:

Sales Amount (30d avg 2):=
AverageX(
  summarize(
    'Date'
    , 'Date'[Date]
    , "DayAvg"
   
AverageX(
      Summarize(
        datesinperiod(
          'Date'[Date]
          ,LastDate('Date'[Date])
          , -30
          , DAY
        )
        ,'Date'[Date]
        , "SalesAmountSum"
        , calculate(Sum('Internet Sales'[Sales Amount]), ALLEXCEPT('Date','Date'[Date]))
      )
      ,[SalesAmountSum]
    )
  )
  ,[DayAvg]
)

The blue part of the formula is exactly the same is in the first calculation. We’re just wrapping this in an additional Average function. Why does this still work on a day-level? Simply because the outer average computes the average of a single value in this case.

So, with just a minor change to the formula, we changed the method of aggregation quite a lot.

 

3. The monthly (quarterly, yearly) total is defined as the average of the sales amount of all days of that period

This sounds quite simple but in this case we have to distinguish two calculation paths:

  • day level
  • month level and above

The following picture shows the calculation for the monthly aggregate:

image

In order to split our computation paths we need to find out if we are on day level or not. Usually the IsFiltered(…) DAX function can be used for this purpose. However, since we have some columns with date granularity (date, day of month, day of year) in our date dimension, we would have to write something like IsFiltererd(‘Date’[Date]) || IsFiltered(‘Date’[Day of Month]) || …

To simplify this, I simple used a count of days in the following code. If we count only one day, we’re on day level. Of course, the count is the more expensive operation, but for this example, I leave it that way (the date table is not really big).

 

Sales Amount (30d avg 3):=
if (
  Count('Date'[Date])=1,
  AverageX(
    Summarize(
      datesinperiod(
        'Date'[Date]
        ,LastDate('Date'[Date])
        , –30
        , DAY
      )
      , 'Date'[Date]
      , "SalesAmountSum"
      , calculate(Sum('Internet Sales'[Sales Amount]), ALLEXCEPT('Date','Date'[Date]))
    )
    ,[SalesAmountSum]
  )
  , 
  AverageX(
    Summarize(
      'Date'
      ,'Date'[Date]
      ,"SalesAmountAvg"
      , calculate(Sum('Internet Sales'[Sales Amount]), ALLEXCEPT('Date','Date'[Date]))
    )
    ,[SalesAmountAvg]
  )
)

Again the blue part of the formula is exactly the same as in our first approach. This part is taken whenever we’re on a day level. On higher levels, the aggregate is computed as a simple (not moving) average of all daily values.

So, using the concepts of my previous post we were able to change the aggregation method to meet very sophisticated requirements.

Sunday, September 11, 2011

Custom Aggregates in DAX / BISM Tabular (part 1)

SQL Server Denali | PowerPivot

Custom aggregates can be created using cube scripts in BISM multidimensional (SSAS OLAP cubes). How can we do this with BISM tabular? In many cases, simple DAX calculations can solve this for us.

I’m referring to the example of my previous post about semi additive measures. Let’s say we’re monitoring the stock of two products we’re selling. For the totals we want to see the average stock over time. At least once a month we’re taking a snapshot of the stock. If we have more than one snapshot per month, the monthly total computes as the average of those snapshot. For aggregation above the month level we want to take the average of the monthly averages. At first, this looks like we only have to use average as the aggregation function. But the average of averaged values is not identical to the average of all values. Let’s take a look at this source data table:

t1

For product Notate we have a single measurement of 50 pieces in February. For product Quickerstill we have 8 distinct measurements with an average of 50 pieces in February. However, when we look at the total average for Quickerstill, the 8 distinct measurements in February result in a higher weight of the February average and therefore in a higher total average of 44 instead of 30:

t2

The Average Stock measure in this example is the same semi additive measure as in my previous post and computed like this:

AVG Stock:=Sum([Stocklevel])/DISTINCTCOUNT([Date])

The requirement for the custom aggregate means, that we also want to see a total average of 30 for product Quickerstill (20+50+20=90, 90/3=30). This requirement is somewhat unusual as the computation above gives the “correct” average of all values. One interpretation is that the weight for computing the average of two or more months is not influenced by the number of measurements within the month.

We can achieve this in a way that is very similar to the semi additive calculations from my last post. Here is the resulting formula:

Special AVG Stock:=AVERAGEX(SUMMARIZE('Stock',[Year],[Month],"AvgStock",AVERAGE([Stocklevel])),[AvgStock])

This formula simply summarizes the average stock at a grouping level of year and month. Then, in a second step, it takes these values and computes the average of them. By doing so, we have broken the aggregation into two layers. First we average by month, then we take the average of those values.

Here is the resulting table using the new aggregate:

t3

And after expanding the February values (2nd month) we clearly see the how our custom aggregate works:

p1

Of course, this is just a simple custom aggregate but it is remarkable that we didn’t need any kind of cube script with scope-statements to achieve this but only a very simple DAX expression.

Sunday, September 4, 2011

Moving Averages in DAX vs. MDX

SQL Server 2008 | SQL Server 2008R2 | SQL Server Denali | PowerPivot

Yes, I’m a supporter of equal rights of DAX and MDX. And like many others, I can’t wait to have BISM multidimensional (aka OLAP Cubes) supporting DAX so that we can use project Crescent on top of all those nice cubes. But back to my topic. My last post was about moving averages in DAX and I was so sure I blogged about calculating them in MDX before… but I didn’t. This is not fair.

On the other hand, Mosha Pasumansky, the godfather of MDX, wrote an excellent and very complete article about this topic and I can only suggest reading it. It doesn’t only cover simple moving averages but also weighted and exponential ones. Also Bill Pearson wrote a very good step-by-step guide about this topic. You can find it here and I can only suggest reading it.

So, basically there is no need for me to write another article about this. Therefore this will be a very short blog post… ah, I just remembered something I may write about. Mosha and Bill both investigated on the calculation of moving averages within a query. In the context of a specific query, things are sometimes easier compared to the situation where you create a cube measure that has to work under different query conditions. For example, you cannot be sure which hierarchy has been used.

The first thing that comes into mind is the wizard for adding time intelligence. This wizard does a pretty good job. The main result is a short piece of code that is inserted into the cube script. This piece of code looks similar to the following example:

Scope(
       {
         [Measures].members
       }
) ;

/*Three Month Moving Average*/ 
  (
    [Date].[Calendar Date Calculations].[Three Month Moving Average],
    [Date].[Month Name].[Month Name].Members,
    [Date].[Date].Members
  ) 
    =
  Avg(
       ParallelPeriod(
                       [Date].[Calendar].[Month],
                       2,
                       [Date].[Calendar].CurrentMember
       )
       : [Date].[Calendar].CurrentMember
       *
       [Date].[Calendar Date Calculations].[Current Date]
  ) ; 
End Scope ;

The result can be intuitively used by end users as they simply have to choose in which kind of view the data should appear (actual, three month moving average or any other calculation generated by the wizard, for example year-to-day or year-over-year growth). Also, this computation is focusing on the data dimension, not the specific measure, so it can be used for any measure in the cube.

In my last post I used a DAX calculation that computed the moving average based on the last date in the current interval. We can do pretty much the same in MDX by “translating” the DAX formula to MDX. Here is the calculation for a cube calculated member:

CREATE MEMBER CURRENTCUBE.[Measures].SalesAmountAvg30d AS 
Avg(
  LASTPERIODS( 
       30
       , tail(descendants([Date].[Calendar].currentmember,[Date].[Calendar].[Date]),1).item(0)
  )
  , [Measures].[Internet Sales Amount]
);

After defining this measure we can use it in a query or within a pivot table. Here’s the result from a query:

select {[Measures].[Internet Sales Amount], [Measures].[SalesAmountAvg30d]} on 0,
descendants([Date].[Calendar].[Calendar Year].&[2003])  on 1
from [Adventure Works]

t1

If you compare these values with the values from my last post you see that the values are absolutely identical (just the order of the values differs because of the way I wrote the query). Here are both definitions side by side:

MDX

DAX

Avg(
LASTPERIODS( 
   30
   , tail(
       descendants(
         [Date].[Calendar].currentmember
         ,[Date].[Calendar].[Date]
       )
       ,1
     ).item(0)
   )
   , [Measures].[Internet Sales Amount]
);
AverageX(
Summarize(
  datesinperiod('Date'[Date] 
   , LastDate('Date'[Date]),-30,DAY) 
   ,'Date'[Date] 
   , "SalesAmountSum"
   , calculate( 
     Sum('Internet Sales'[Sales Amount]),
     ALLEXCEPT('Date','Date'[Date])
     )

   )
   ,[SalesAmountSum]
)

Again, the idea (approach) is the same in both cases, therefore both definitions are similar. However, in my opinion the DAX syntax is a bit harder to read in this case. Especially the CALCULATE(…, ALLEXCEPT(…)) makes it harder to understand. In MDX, we rely on attribute relationship for this purpose but in DAX we need to “break” the context manually.

Now, let’s do some performance tests. In order to compare performance I used these queries

 

MDX

DAX

with
member SalesAmountAvg AS 
  Avg(
    LASTPERIODS( 
      30
      , tail(
        descendants(
          [Date].[Calendar].currentmember
          ,[Date].[Calendar].[Date]),1
        ).item(0)
    )
    , [Measures].[Internet Sales Amount]
  )
 

select
{
  [Measures].[Internet Sales Amount]
  , SalesAmountAvg
} on 0,
descendants([Date].[Calendar].[All Periods],,LEAVES) on 1

from [Adventure Works]

define
measure 'Internet Sales'[SalesAmountAvg] =
  AverageX(
    Summarize(
      datesinperiod('Date'[Date] 
      , LastDate('Date'[Date]),-30,DAY) 
      ,'Date'[Date] 
      , "SalesAmountSum"
      , calculate( 
          Sum(
          'Internet Sales'[Sales Amount])
          ,ALLEXCEPT(
          'Date','Date'[Date])
        )
    )
   ,[SalesAmountSum]
)


evaluate (
  addcolumns(
    values('Date'[Date])
    ,"Internet Sales Amount"
    , SumX(relatedtable
       ('Internet Sales'),[Sales Amount])
    ,"SalesAmountAvg",
    'Internet Sales'[SalesAmountAvg]
  )
)

Both queries return exactly the same results (you may add an “order by ‘Date’[Date]” at the end of the DAX query in order to have the dates returned in the same order as from the MDX query).

For the MDX queries I cleared the cache before running the queries. I changed the number of days (number of days to include in the average, written bold, red in the queries above) and got the following results. For number of days = 0 I took out the calculation and left only the sales amount as aggregate. Time was measured in seconds using SQL Server Management Studio (on a virtual machine, old hardware).

 

n=0

n=10

n=30

n=50

n=100

n=1000

MDX

1

3

3

3

3

4

DAX

0

9

9

9

9

12

t2

When looking at these results I was somewhat surprised. Not about the situation that the DAX query took longer to execute. Please keep in mind that I’m running the queries on an early preview of the product so I suppose there is still a lot of debugging and internal logging going on here. We will have to wait for the final product to make a comparison. What surprises me was the fact that the DAX query time did not go up significantly with higher values of n. For the MDX engine I was pretty sure that it would perform this way because we have mature and a well built cache behind it. So, although we’re increasing the number of computed cells dramatically (with higher values for n), the MDX query performance should almost be constant as we have a lot of overlapping calculations here. But also the current DAX engine performs in the same way that shows how very well the DAX engine is implemented. This is a pretty good result and we can expect a lot of performance from this new DAX query engine.