Sunday, September 25, 2011

Custom Aggregates in DAX / BISM Tabular (part 2)

SQL Server Denali | PowerPivot

This post combines two ideas:

  1. The calculation of a moving average (last 30 days) as shown in a previous post
  2. Custom aggregations

What I want to show here today is how we can influence the way our calculation works on higher levels. For this purpose I’m showing three different approaches:

  1. The monthly (quarterly, yearly) total is defined as the last day’s moving average of the given date range.
    This corresponds to the original approach from my post about moving averages
  2. The monthly (quarterly, yearly) total is defined as the average of all the moving averages on day level.
  3. The monthly (quarterly, yearly) total is defined as the average of the sales amount of all days of that period (the moving average is only to be calculated on date level)

Of course, there are many more possibilities, but I think the methods shown here can be used for many other aggregation requirements.

Here are the three aggregations side by side.

[image]

Although the values on higher levels of aggregation differ a lot, the daily values are identical (we just wanted to change the rollup method, not the calculation on a detail level).

Let’s start with the first one:

 

1. The monthly (quarterly, yearly) total is defined as the last day’s moving average of the given date range

This is shown here for our measure “Sales Amount (30d avg)”:

[image]

As shown in my previous post, the formula may look like this:

Sales Amount (30d avg):=
AVERAGEX(
  SUMMARIZE(
    DATESINPERIOD(
      'Date'[Date]
      , LASTDATE('Date'[Date])
      , -30
      , DAY
    )
    , 'Date'[Date]
    , "SalesAmountSum"
    , CALCULATE(SUM('Internet Sales'[Sales Amount]), ALLEXCEPT('Date', 'Date'[Date]))
  )
  , [SalesAmountSum]
)

Since we use the expression LASTDATE('Date'[Date]), the last date of each period is used for the moving average, which is exactly what we wanted.
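To see this in action, here is a sketch of a minimal test query. The column names 'Date'[Calendar Year] and 'Date'[Month Name] are my assumptions about the date table of an Adventure Works-style model:

EVALUATE
ADDCOLUMNS(
  SUMMARIZE('Date', 'Date'[Calendar Year], 'Date'[Month Name])
  -- at month granularity, LASTDATE('Date'[Date]) resolves to the month's last
  -- day, so each monthly value equals that day's 30-day moving average
  , "Avg30d", 'Internet Sales'[Sales Amount (30d avg)]
)
ORDER BY 'Date'[Calendar Year], 'Date'[Month Name]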

 

2. The monthly (quarterly, yearly) total is defined as the average of all the moving averages on day level

In this approach, the monthly aggregate has to be calculated as the average of all the daily moving averages of that month. The picture shows what this means:

[image]

This might look pretty difficult. However, for our calculation we simply have to wrap the existing calculation in another SUMMARIZE function.

This is the formula I used:

Sales Amount (30d avg 2):=
AVERAGEX(
  SUMMARIZE(
    'Date'
    , 'Date'[Date]
    , "DayAvg"
    , AVERAGEX(
        SUMMARIZE(
          DATESINPERIOD(
            'Date'[Date]
            , LASTDATE('Date'[Date])
            , -30
            , DAY
          )
          , 'Date'[Date]
          , "SalesAmountSum"
          , CALCULATE(SUM('Internet Sales'[Sales Amount]), ALLEXCEPT('Date', 'Date'[Date]))
        )
        , [SalesAmountSum]
      )
  )
  , [DayAvg]
)

The inner AVERAGEX expression is exactly the same as in the first calculation. We’re just wrapping it in an additional average. Why does this still work on day level? Simply because the outer average computes the average of a single value in this case.

So, with just a minor change to the formula, we changed the method of aggregation quite a lot.

 

3. The monthly (quarterly, yearly) total is defined as the average of the sales amount of all days of that period

This sounds quite simple, but in this case we have to distinguish between two calculation paths:

  • day level
  • month level and above

The following picture shows the calculation for the monthly aggregate:

[image]

In order to split our computation paths, we need to find out whether we are on day level or not. Usually the IsFiltered(…) DAX function can be used for this purpose. However, since we have several columns with date granularity (date, day of month, day of year) in our date dimension, we would have to write something like IsFiltered('Date'[Date]) || IsFiltered('Date'[Day of Month]) || …
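Written out, such a test might look like the following sketch (the column names [Day of Month] and [Day of Year] are taken from the enumeration above; any further date-granularity columns in your model would add more terms):

IsDayLevel:=
ISFILTERED('Date'[Date])
  || ISFILTERED('Date'[Day of Month])
  || ISFILTERED('Date'[Day of Year])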

To simplify this, I simply used a count of days in the following code. If we count only one day, we’re on day level. Of course, the count is the more expensive operation, but for this example I’ll leave it that way (the date table is not really big).

 

Sales Amount (30d avg 3):=
IF(
  COUNT('Date'[Date]) = 1
  , AVERAGEX(
      SUMMARIZE(
        DATESINPERIOD(
          'Date'[Date]
          , LASTDATE('Date'[Date])
          , -30
          , DAY
        )
        , 'Date'[Date]
        , "SalesAmountSum"
        , CALCULATE(SUM('Internet Sales'[Sales Amount]), ALLEXCEPT('Date', 'Date'[Date]))
      )
      , [SalesAmountSum]
    )
  , AVERAGEX(
      SUMMARIZE(
        'Date'
        , 'Date'[Date]
        , "SalesAmountAvg"
        , CALCULATE(SUM('Internet Sales'[Sales Amount]), ALLEXCEPT('Date', 'Date'[Date]))
      )
      , [SalesAmountAvg]
    )
)

Again, the first branch of the IF is exactly the same as in our first approach. This branch is taken whenever we’re on day level. On higher levels, the aggregate is computed as a simple (not moving) average of all daily values.
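As a side note, assuming HASONEVALUE is available in your DAX version, the day-level test could also be written with it; as far as I can tell this sketch is functionally equivalent to the COUNT test above:

Sales Amount (30d avg 3b):=
IF(
  HASONEVALUE('Date'[Date])  -- TRUE when exactly one date is in the filter context
  , AVERAGEX(
      SUMMARIZE(
        DATESINPERIOD('Date'[Date], LASTDATE('Date'[Date]), -30, DAY)
        , 'Date'[Date]
        , "SalesAmountSum"
        , CALCULATE(SUM('Internet Sales'[Sales Amount]), ALLEXCEPT('Date', 'Date'[Date]))
      )
      , [SalesAmountSum]
    )
  , AVERAGEX(
      SUMMARIZE(
        'Date'
        , 'Date'[Date]
        , "SalesAmountAvg"
        , CALCULATE(SUM('Internet Sales'[Sales Amount]), ALLEXCEPT('Date', 'Date'[Date]))
      )
      , [SalesAmountAvg]
    )
)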

So, using the concepts from my previous post, we were able to change the aggregation method to meet very sophisticated requirements.

Sunday, September 11, 2011

Custom Aggregates in DAX / BISM Tabular (part 1)

SQL Server Denali | PowerPivot

Custom aggregates can be created using cube scripts in BISM multidimensional (SSAS OLAP cubes). How can we do this with BISM tabular? In many cases, simple DAX calculations can solve this for us.

I’m referring to the example of my previous post about semi additive measures. Let’s say we’re monitoring the stock of two products we’re selling. For the totals we want to see the average stock over time. At least once a month we’re taking a snapshot of the stock. If we have more than one snapshot per month, the monthly total computes as the average of those snapshots. For aggregation above the month level we want to take the average of the monthly averages. At first, this looks like we only have to use average as the aggregation function. But the average of averaged values is not identical to the average of all values. Let’s take a look at this source data table:

[image]

For product Notate we have a single measurement of 50 pieces in February. For product Quickerstill we have 8 distinct measurements with an average of 50 pieces in February. However, when we look at the total average for Quickerstill, the 8 distinct measurements in February result in a higher weight of the February average and therefore in a higher total average of 44 instead of 30:

[image]
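To see where the 44 comes from: the monthly averages for Quickerstill are 20, 50 and 20 pieces. Assuming one snapshot each in January and March plus the eight snapshots in February, the overall average computes as (1×20 + 8×50 + 1×20) / 10 = 440 / 10 = 44, whereas the unweighted average of the monthly averages is (20 + 50 + 20) / 3 = 30.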

The Average Stock measure in this example is the same semi additive measure as in my previous post and is computed like this:

AVG Stock:=Sum([Stocklevel])/DISTINCTCOUNT([Date])

The requirement for the custom aggregate means that we also want to see a total average of 30 for product Quickerstill (20+50+20=90, 90/3=30). This requirement is somewhat unusual, as the computation above gives the “correct” average of all values. One interpretation is that the weight of each month in the average of two or more months should not be influenced by the number of measurements within that month.

We can achieve this in a way that is very similar to the semi additive calculations from my last post. Here is the resulting formula:

Special AVG Stock:=
AVERAGEX(
  SUMMARIZE(
    'Stock'
    , [Year]
    , [Month]
    , "AvgStock"
    , AVERAGE([Stocklevel])
  )
  , [AvgStock]
)

This formula simply summarizes the average stock at a grouping level of year and month. Then, in a second step, it takes these values and computes the average of them. By doing so, we have broken the aggregation into two layers. First we average by month, then we take the average of those values.
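To inspect the intermediate layer, the inner SUMMARIZE can be run on its own; this sketch assumes the 'Stock' table and the columns used in the measure above:

-- returns one row per year/month with the monthly average stock level;
-- the measure then simply averages the AvgStock column of this table
EVALUATE
SUMMARIZE(
  'Stock'
  , [Year]
  , [Month]
  , "AvgStock"
  , AVERAGE([Stocklevel])
)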

Here is the resulting table using the new aggregate:

[image]

And after expanding the February values (2nd month) we clearly see how our custom aggregate works:

[image]

Of course, this is just a simple custom aggregate, but it is remarkable that we didn’t need any kind of cube script with scope statements to achieve this, only a very simple DAX expression.

Sunday, September 4, 2011

Moving Averages in DAX vs. MDX

SQL Server 2008 | SQL Server 2008R2 | SQL Server Denali | PowerPivot

Yes, I’m a supporter of equal rights for DAX and MDX. And like many others, I can’t wait for BISM multidimensional (aka OLAP cubes) to support DAX so that we can use Project Crescent on top of all those nice cubes. But back to my topic. My last post was about moving averages in DAX, and I was so sure I had blogged about calculating them in MDX before… but I didn’t. This is not fair.

On the other hand, Mosha Pasumansky, the godfather of MDX, wrote an excellent and very complete article about this topic, covering not only simple moving averages but also weighted and exponential ones. Bill Pearson also wrote a very good step-by-step guide about this topic. I can only suggest reading both.

So, basically there is no need for me to write another article about this. Therefore this will be a very short blog post… ah, I just remembered something I may write about. Mosha and Bill both investigated the calculation of moving averages within a query. In the context of a specific query, things are sometimes easier than when you create a cube measure that has to work under different query conditions. For example, you cannot be sure which hierarchy has been used.

The first thing that comes to mind is the wizard for adding time intelligence. This wizard does a pretty good job. The main result is a short piece of code that is inserted into the cube script. This piece of code looks similar to the following example:

Scope(
       {
         [Measures].members
       }
) ;

/*Three Month Moving Average*/ 
  (
    [Date].[Calendar Date Calculations].[Three Month Moving Average],
    [Date].[Month Name].[Month Name].Members,
    [Date].[Date].Members
  ) 
    =
  Avg(
       ParallelPeriod(
                       [Date].[Calendar].[Month],
                       2,
                       [Date].[Calendar].CurrentMember
       )
       : [Date].[Calendar].CurrentMember
       *
       [Date].[Calendar Date Calculations].[Current Date]
  ) ; 
End Scope ;

The result can be used intuitively by end users, as they simply have to choose in which kind of view the data should appear (actual, three month moving average, or any other calculation generated by the wizard, for example year-to-date or year-over-year growth). Also, this computation focuses on the date dimension, not a specific measure, so it can be used for any measure in the cube.

In my last post I used a DAX calculation that computed the moving average based on the last date in the current interval. We can do pretty much the same in MDX by “translating” the DAX formula to MDX. Here is the calculation for a cube calculated member:

CREATE MEMBER CURRENTCUBE.[Measures].SalesAmountAvg30d AS 
Avg(
  LASTPERIODS( 
       30
       , tail(descendants([Date].[Calendar].currentmember,[Date].[Calendar].[Date]),1).item(0)
  )
  , [Measures].[Internet Sales Amount]
);

After defining this measure we can use it in a query or within a pivot table. Here’s the result from a query:

select {[Measures].[Internet Sales Amount], [Measures].[SalesAmountAvg30d]} on 0,
descendants([Date].[Calendar].[Calendar Year].&[2003])  on 1
from [Adventure Works]

[image]

If you compare these values with the ones from my last post, you’ll see that they are absolutely identical (just the order differs because of the way I wrote the query). Here are both definitions, first the MDX and then the DAX version:

MDX:

Avg(
  LASTPERIODS(
    30
    , Tail(
        Descendants(
          [Date].[Calendar].CurrentMember
          , [Date].[Calendar].[Date]
        )
        , 1
      ).Item(0)
  )
  , [Measures].[Internet Sales Amount]
);

DAX:

AVERAGEX(
  SUMMARIZE(
    DATESINPERIOD(
      'Date'[Date]
      , LASTDATE('Date'[Date])
      , -30
      , DAY
    )
    , 'Date'[Date]
    , "SalesAmountSum"
    , CALCULATE(
        SUM('Internet Sales'[Sales Amount])
        , ALLEXCEPT('Date', 'Date'[Date])
      )
  )
  , [SalesAmountSum]
)

Again, the idea (approach) is the same in both cases, so both definitions are similar. However, in my opinion the DAX syntax is a bit harder to read in this case. Especially the CALCULATE(…, ALLEXCEPT(…)) makes it harder to understand. In MDX we rely on attribute relationships for this purpose, but in DAX we need to “break” the context manually.
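To spell out what that context manipulation does, here is the inner expression again with comments (my reading of it, not an official description):

CALCULATE(
  SUM('Internet Sales'[Sales Amount])  -- the aggregate to compute per day
  , ALLEXCEPT('Date', 'Date'[Date])    -- clear every filter on the 'Date' table
                                       -- except the one on the [Date] column, so a
                                       -- day near a month boundary can still "see"
                                       -- sales of the neighboring month
)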

Now, let’s do some performance tests. In order to compare performance, I used these queries:

 

MDX:

WITH
MEMBER [Measures].SalesAmountAvg AS
  Avg(
    LASTPERIODS(
      30
      , Tail(
          Descendants(
            [Date].[Calendar].CurrentMember
            , [Date].[Calendar].[Date]
          )
          , 1
        ).Item(0)
    )
    , [Measures].[Internet Sales Amount]
  )
SELECT
{
  [Measures].[Internet Sales Amount]
  , [Measures].SalesAmountAvg
} ON 0,
Descendants([Date].[Calendar].[All Periods], , LEAVES) ON 1
FROM [Adventure Works]

DAX:

DEFINE
MEASURE 'Internet Sales'[SalesAmountAvg] =
  AVERAGEX(
    SUMMARIZE(
      DATESINPERIOD(
        'Date'[Date]
        , LASTDATE('Date'[Date])
        , -30
        , DAY
      )
      , 'Date'[Date]
      , "SalesAmountSum"
      , CALCULATE(
          SUM('Internet Sales'[Sales Amount])
          , ALLEXCEPT('Date', 'Date'[Date])
        )
    )
    , [SalesAmountSum]
  )
EVALUATE (
  ADDCOLUMNS(
    VALUES('Date'[Date])
    , "Internet Sales Amount"
    , SUMX(RELATEDTABLE('Internet Sales'), [Sales Amount])
    , "SalesAmountAvg"
    , 'Internet Sales'[SalesAmountAvg]
  )
)

Both queries return exactly the same results (you may add an ORDER BY 'Date'[Date] at the end of the DAX query in order to have the dates returned in the same order as from the MDX query).
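For completeness, a sketch of the DAX query with the ordering appended:

EVALUATE (
  ADDCOLUMNS(
    VALUES('Date'[Date])
    , "Internet Sales Amount"
    , SUMX(RELATEDTABLE('Internet Sales'), [Sales Amount])
    , "SalesAmountAvg"
    , 'Internet Sales'[SalesAmountAvg]
  )
)
ORDER BY 'Date'[Date]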

For the MDX queries I cleared the cache before running them. I changed the number of days to include in the average (the value 30 in the queries above) and got the following results. For number of days = 0 I took out the calculation and left only the sales amount as aggregate. Time was measured in seconds using SQL Server Management Studio (on a virtual machine, old hardware).

 

       n=0   n=10   n=30   n=50   n=100   n=1000
MDX      1      3      3      3       3        4
DAX      0      9      9      9       9       12

[image]

When looking at these results I was somewhat surprised. Not by the fact that the DAX query took longer to execute. Please keep in mind that I’m running the queries on an early preview of the product, so I suppose there is still a lot of debugging and internal logging going on here. We will have to wait for the final product to make a fair comparison.

What surprised me was the fact that the DAX query time did not go up significantly with higher values of n. For the MDX engine I was pretty sure that it would perform this way, because we have a mature and well-built cache behind it. So, although we’re increasing the number of computed cells dramatically (with higher values for n), the MDX query performance should be almost constant, as we have a lot of overlapping calculations here. But the current DAX engine performs in the same way, which shows how well it is implemented. This is a pretty good result, and we can expect a lot of performance from this new DAX query engine.