Search This Blog

Tuesday, June 17, 2008

Calculating Quantiles in MDX

Microsoft MDX offers a huge set of mathematical function, which can be further enhanced by using Excel-functions. Excel-functions may be called as Excel!functionname but the performance for this approach is not really good. Besides that, not all of your every-day functions are available through MDX. Before using Excel-functions or writing your own SSAS stored procedures it's worth taking a look at the built-in capabilities of MDX.

In this post we're talking about the calulcation of quantiles. Quantiles are used e.g. for computing risk measures like the so called Value-At-Risk (VaR). The Value-At-Risk is usually a 1% or 5% quantile of the P&L distribution over a set of scenarios (typically created by a monte carlo simulation).

In the following MDX-sample we calculate a simple quantile over a set of days (listed in a dimension "Date"). In order to compute the p% quantile of a set of n values we first have to sort the values and take the element at position (n-1)*p%+1. Usually we will not end up with an integer here but we don't want to complicate things here too much.

So let's try this code in the Adventure Works cube script:

CREATE MEMBER CURRENTCUBE.[MEASURES].[Quantil5Prc] AS
max(bottomcount([Date].[Date].[Date],
int((Count([Date].[Date].[Date])-1)*0.05)+1,
[Measures].[Gross Profit]),[Measures].[Gross Profit]),
NON_EMPTY_BEHAVIOR = { [Measures].[Sales Amount], [Measures].[Total Product Cost] },
VISIBLE = 1 ;

We use a bottomcount here to get the smallest p% elements. By using the max-function we can easily get the biggest of those values (which is actually the last if we had sorted the complete set). The expression int(Count([Date].[Date].[Date]) gives the count of days in our date dimension. Be aware that if you have the unknown member enabled for the dimension this element is also counted so you should disable the unknown member or reduce the count by 2 instead of 1.

For Adventure Works I found 1158 days in the Date dimension. Calculating the quantile for single elements or sets of elements executes pretty fast. Of course, query performance depends on the value of p%. For calculating a 90% quantile with the above formula the same query takes longer than a 5% quantile. So it's better to provide a low p% and switch to topcount instead of sorting too many rows.

The simple test query I used looked like this:

select {[MEASURES].[Quantil5Prc]} on 0
from [Adventure Works]

The query results in the value of 986.6533$ meaning that only in 5% of all days we have a gross profit below this value. Of course, you could put [Product].[Category].[Category] for example on the rows to get a more detailed result.

For further enhancements think of putting the p% value in a separate dimension with given numbers and a reasonable default member. Then you could easily switch the percentage of the quantile or compare different quantiles in one pivot table.

4 comments:

  1. Michael Vincent WallerJune 29, 2010 at 9:40 PM

    How you could you do subseting here:

    (Count([Date].[Date].[Date])
    gives the count of days in our date dimension.


    So, Say I wanted to the Value at Risk for the last 260 days for one calculated measure, and Value at Riks for last 90 days in another calculated measure.

    Thanks so much! This is a great post!!

    ReplyDelete
  2. Michael, you can replace the reference to [Date].[Date].[Date] with a set, for example [Date].[Date].&[20100101]:[Date].[Date].&[20100629]
    Depending on the query context you could also use the keyword EXISTING to limit the set to a where condition.

    ReplyDelete
  3. Hello Hilma,

    could you please post an example for a 95% quantile? Would you just take TOPCOUNT instead of BOTTOMCOUNT?

    I want a 95% quantile for measure and a set per month and product.

    ReplyDelete
  4. Hi Marco, It's exactly like you wrote. You should use a topcount (with p% = 5%) to get the 95% percentile.
    The calculation would look somewhat like this:

    max(topcount([Date].[Date].[Date],
    int((Count([Date].[Date].[Date])-1)*0.05)+1

    In order to see the calculation by product, you should place the products on the rows. This makes the calculation from above beeing performed by product.

    Also, if you want to be aware of filters on the date Dimension, you should remember adding existing before the [Date].[Date].[Date] and for Excel you should consider a dynamic set as I explained in this post: http://ms-olap.blogspot.com/2010/02/solution-von-ssas-2008-multi-selects-in.html

    ReplyDelete