<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-4879885044062498968</id><updated>2012-01-30T13:54:11.644+01:00</updated><title type='text'>Microsoft OLAP Blog by Hilmar Buchta</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://ms-olap.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://ms-olap.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Hilmar Buchta</name><uri>http://www.blogger.com/profile/07497529483535542290</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://2.bp.blogspot.com/_rzeWIeWVsxA/ScDkLmopJCI/AAAAAAAABAg/sdlcn6fz96M/S220/Hilmar+20090307_Klein.jpg'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>77</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-4879885044062498968.post-9011266809880315258</id><published>2012-01-08T13:30:00.001+01:00</published><updated>2012-01-08T13:30:04.906+01:00</updated><title type='text'>Sparse Snapshots in DAX / BISM Tabular</title><content type='html'>&lt;p align="right"&gt;SQL Server 2012/Denali | PowerPivot&lt;/p&gt;  &lt;p&gt;In &lt;font style="background-color: #ffff00"&gt;&lt;/font&gt;&lt;a 
href="http://ms-olap.blogspot.com/2011/11/sql-server-denali-powerpivot-common-way.html"&gt;my last post&lt;/a&gt; I wrote about using delta values instead of full snapshots. However, the amount of data is identical if we use sparse snapshots instead. For this purpose, the source data from my last post just has to be converted to absolute values as shown below:&lt;/p&gt;  &lt;table border="1" cellspacing="0" cellpadding="2" width="716"&gt;&lt;tbody&gt;     &lt;tr&gt;       &lt;td valign="top" width="311"&gt;         &lt;p align="center"&gt;&lt;strong&gt;Only delta values&lt;/strong&gt;&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="91"&gt;&amp;#160;&lt;/td&gt;        &lt;td valign="top" width="312"&gt;         &lt;p align="center"&gt;&lt;strong&gt;Only snapshots (changes)&lt;/strong&gt;&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="311"&gt;&lt;a href="http://lh4.ggpht.com/-DjIQlDAvq0k/TwmL_ulMEuI/AAAAAAAAC68/x6QKK-s5_g0/s1600-h/t33.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="t3" border="0" alt="t3" src="http://lh6.ggpht.com/-1gQHF8B_c_E/TwmMAkJWHtI/AAAAAAAAC7E/whnolDhkntA/t3_thumb1.png?imgmax=800" width="298" height="384" /&gt;&lt;/a&gt;&lt;/td&gt;        &lt;td width="91"&gt;&lt;a href="http://lh5.ggpht.com/-3oYq6dVxQek/TwmMBYdGjlI/AAAAAAAAC7M/mcwzmfzVbi8/s1600-h/image3.png"&gt;&lt;img style="display: block; float: none; margin-left: auto; margin-right: auto" title="image" alt="image" src="http://lh4.ggpht.com/-nqDtIszXlN4/TwmMCcg9jyI/AAAAAAAAC7U/QUSltvs_fUA/image_thumb1.png?imgmax=800" width="72" height="62" /&gt;&lt;/a&gt;&lt;/td&gt;        &lt;td valign="top" width="312"&gt;&lt;a 
href="http://lh5.ggpht.com/-OjnC_4t2dF4/TwmMDQTCDQI/AAAAAAAAC7c/BGJ_whMaKU8/s1600-h/t23.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="t2" border="0" alt="t2" src="http://lh4.ggpht.com/-Z0qzwOWKeEU/TwmMEVC56SI/AAAAAAAAC7k/7mwM23bXxdA/t2_thumb1.png?imgmax=800" width="296" height="383" /&gt;&lt;/a&gt;&lt;/td&gt;     &lt;/tr&gt;   &lt;/tbody&gt;&lt;/table&gt;  &lt;p&gt;So the amount of data does not really change, but with sparse snapshots the computations get a lot more difficult. If we have delta values (left table), we can simply sum up all deltas up to a given date, and this works for all of our related tables (for example products). With sparse snapshots (right table), we have to find the last value per product and then add the results up to the total, so we first have to do a calculation per product.&lt;/p&gt;  &lt;p&gt;Before we get to the calculation, here is the very simple source data model that I used:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-hByZ3XfUHys/TwmMFcyv7WI/AAAAAAAAC7s/w30MNbBCXCw/s1600-h/t74.png"&gt;&lt;img style="display: inline" title="t7" alt="t7" src="http://lh6.ggpht.com/-RJwK1l0eirY/TwmMGqKSzkI/AAAAAAAAC70/FZ3QXkRERXY/t7_thumb2.png?imgmax=800" width="563" height="171" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;It took me some time to figure out the following solution and I’m pretty sure that there must be an easier method. 
So feel free to experiment and write comments.&lt;/p&gt;  &lt;p&gt;Here is the final code:&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;Stock:=      &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; SumX(       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; Summarize(       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; 'Product'       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; ,&amp;#160; 'Product'[Product]       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; , &amp;quot;LastStock&amp;quot;       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; , calculate(       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; Sum('Stock'[StockLevel])       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; , dateadd(       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; LastDate('Date'[Date])       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; ,-floor(       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; 
LastDate('Date'[Date])       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; -       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; MaxX(       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; Filter(       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; Summarize(       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; DatesBetween('Date'[Date], date(2000,1,1), LastDate('Date'[Date]))       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; ,'Date'[Date]       &lt;br 
/&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; , &amp;quot;X&amp;quot;       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; ,&amp;#160; calculate(Sum('Stock'[StockLevel]), ALLEXCEPT('Date', 'Date'[Date]))       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; )       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; , not isblank([X])       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; )       &lt;br 
/&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; , 'Date'[Date]       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; )       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; , 1       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; )       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; ,DAY       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; )       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; )       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; )       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; ,       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; [LastStock]       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; )&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;This is what our data looks like &lt;em&gt;without&lt;/em&gt; the computation from above:&lt;/p&gt;  
&lt;p&gt;&lt;a href="http://lh6.ggpht.com/-0zjGnbWOYbw/TwmMHBT4uoI/AAAAAAAAC74/PFVxX8qByXQ/s1600-h/t510.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="t5" border="0" alt="t5" src="http://lh6.ggpht.com/-_pEtOy5qpEM/TwmMH0TN3LI/AAAAAAAAC8A/acf2XhjIbc4/t5_thumb6.png?imgmax=800" width="334" height="202" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;Since the data is sparse, we only see the stock on dates where there are changes. For the aggregation I used the Sum function (which doesn’t make much sense here for the date aggregates), and you can also see that the row total over the products only takes products into account when there are changes.&lt;/p&gt;  &lt;p&gt;And here is the resulting Excel pivot table using the calculation from above:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh4.ggpht.com/-_efmd6Vvsz8/TwmMIxYbjXI/AAAAAAAAC8M/MZcRVO8IFSU/s1600-h/t65.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="t6" border="0" alt="t6" src="http://lh4.ggpht.com/-ln6rUJ3uB7Y/TwmMKMGm2tI/AAAAAAAAC8U/BB2Ihb6zJN4/t6_thumb3.png?imgmax=800" width="333" height="565" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;As you can see, the value for Quickerstill starts at 20, then drops to 18 on January 3, then to 15 on January 5, and so on. The totals are now correct as well. &lt;/p&gt;  &lt;p&gt;The remaining part of this post is about the formula from above, so it’s up to you to decide if you want to continue reading. The most important point here is that the calculation is much easier when working with delta rows or delta rows with intermediate snapshots (for example each first day of a month, quarter, year). 
&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;u&gt;Details of the calculation&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;The first question is about the way we’re doing the calculation. Wouldn’t it be easier and faster to have the values on the rows and therefore persisted (is ‘persisted’ the right term when talking about an in-memory database?)? Well, if you look at the screenshots above, the only reason we’re doing all this is that we need calculated values on “rows” that do not exist. In the last screenshot there is no row for January 31, but the value has to be computed: for Notate it’s the value of January 1, and for Quickerstill it’s the value of January 21. So we have to use a calculated measure in the model.&lt;/p&gt;  &lt;p&gt;To make things easier here, let’s start with a single product (Quickerstill). I want to calculate the last date for which I have a value. To do this, I filter the date range from a very early date (2000/1/1 here) to the last date in the current context down to those dates where the sum of the stock amount (any aggregate would do) is not blank, and then take the latest date (MaxX function). 
This is the code:&lt;/p&gt;  &lt;p&gt;&lt;font color="#0000ff" face="Courier New"&gt;&lt;font color="#000000"&gt;Step1:=&lt;/font&gt;       &lt;br /&gt;&lt;font color="#333333"&gt;&amp;#160;&amp;#160;&amp;#160; MaxX(        &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; Filter(         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; Summarize(         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; DatesBetween('Date'[Date], date(2000,1,1), LastDate('Date'[Date]))         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; ,'Date'[Date]         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; , &amp;quot;X&amp;quot;         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; ,&amp;#160; calculate(Sum('Stock'[StockLevel]), ALLEXCEPT('Date', 'Date'[Date]))         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; )         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; , not isblank([X])         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; )         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; , 'Date'[Date]         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; )&lt;/font&gt;&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;If I include the original measure and this calculation in a pivot table, this is the 
result:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh3.ggpht.com/-tA56RR2SVxI/TwmMLCjxFjI/AAAAAAAAC8c/TbghmyBkUs4/s1600-h/t84.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="t8" border="0" alt="t8" src="http://lh5.ggpht.com/-URTJB_M78oo/TwmMMdvaOuI/AAAAAAAAC8k/Z4tT4uWdKCs/t8_thumb2.png?imgmax=800" width="264" height="537" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;For each day, our calculation returns the last date with data. For example, the calculated date for January 12 is January 6, as this was the last day with data before January 12. It seems we are already close to the solution, but please keep in mind that the calculation from above would not work for more than one product (for example if the products are not filtered), as the last date with data can be different for each product. If we filter the pivot table from above for the product Notate, the results look totally different. Here are the first rows:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-qL9XptDQ4EM/TwmMNF0k-_I/AAAAAAAAC8s/6gqp_qlVP0I/s1600-h/t93.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="t9" border="0" alt="t9" src="http://lh4.ggpht.com/-rRK40SGttOw/TwmMOTbtncI/AAAAAAAAC80/EF2RiFDK07E/t9_thumb1.png?imgmax=800" width="265" height="289" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;But let’s ignore this for a second and keep the filter on Quickerstill. The next task is to calculate the stock level at the calculated date. 
This sounds easy and the following formula was my first approach:&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;Step2 (with error):=      &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; calculate(       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; Sum('Stock'[StockLevel])       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; ,&lt;font color="#0000ff"&gt;MaxX(        &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; Filter(         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; Summarize(         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; DatesBetween('Date'[Date], date(2000,1,1), LastDate('Date'[Date]))         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; ,'Date'[Date]         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; , &amp;quot;X&amp;quot;         &lt;br 
/&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; ,&amp;#160; calculate(Sum('Stock'[StockLevel]), ALLEXCEPT('Date', 'Date'[Date]))         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; )         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; , not isblank([X])         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; )         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; , 'Date'[Date]         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; )         &lt;br /&gt;&lt;/font&gt;&amp;#160;&amp;#160;&amp;#160; )&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;The code in blue is exactly the code from above, which gives the filter context for the calculation of the sum of the stock level values. However, this results in an error:&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;font color="#ff0000"&gt;Semantic Error: A function ‘MAXX‘ has been used in a True/False expression that is used as a table filter expression. 
This is not allowed.&lt;/font&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;I love the ‘This is not allowed’ here. Actually there is no reason why it isn’t and I’m not feeling like I’m doing something illegal here. Maybe it’s also just a limitation of the CTP 3 beta release that I am currently working with. I also tried to wrap the MaxX function in a DateAdd with 0 days but this is also ‘not allowed’.&lt;/p&gt;  &lt;p&gt;But there is a way which can be found after some time of experimenting. I take the number of days between the last date from the context and the date from the calculation in blue and use this result to correct my last date. Sounds confusing? Let’s start with the number of days. Here is the next step of the calculation:&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;&lt;font color="#000000"&gt;Step2:=        &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; floor(         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; LastDate('Date'[Date])         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; -&lt;/font&gt;&lt;font color="#00ff00"&gt;        &lt;br /&gt;&lt;/font&gt;&lt;font color="#0000ff"&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; MaxX(        &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; Filter(         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; Summarize(         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; DatesBetween('Date'[Date], date(2000,1,1), LastDate('Date'[Date]))         &lt;br 
/&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; ,'Date'[Date]         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; , &amp;quot;X&amp;quot;         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; ,&amp;#160; calculate(Sum('Stock'[StockLevel]), ALLEXCEPT('Date', 'Date'[Date]))         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; )         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; , not isblank([X])         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; )         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; , 'Date'[Date]         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; )&lt;/font&gt;       &lt;br /&gt;&lt;font color="#000000"&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; , 1        &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; )&lt;/font&gt;&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;Again the code in blue is the last date calculation from above (step 1). 
Let’s take a look at the result of this calculation:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-l3iVk2BYngA/TwmMPlcqidI/AAAAAAAAC88/paqS8l0EKPE/s1600-h/t104.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="t10" border="0" alt="t10" src="http://lh6.ggpht.com/-mF8qnm0sI_A/TwmMQ8NRLkI/AAAAAAAAC9E/LZ2M-X4yces/t10_thumb2.png?imgmax=800" width="312" height="472" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;As you can see, the new calculation tells us how many days we have to go back in time to find a value. For example, for January 14 we get a value of 8, meaning we have to go back 8 days to January 6 to find a value. Now we can wrap this in the calculation of the stock value and, for some reason that I don’t fully understand, this is not illegal anymore although I seem to be doing exactly the same as before (possibly because DateAdd returns a table of dates, which Calculate accepts as a filter, whereas MaxX returns a single value). Here is the calculation:&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;Step3:=      &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; calculate(       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; Sum('Stock'[StockLevel])       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; , dateadd(       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; LastDate('Date'[Date])       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; ,-&lt;font color="#0000ff"&gt;floor(        &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; LastDate('Date'[Date])         &lt;br 
/&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; -         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; MaxX(         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; Filter(         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; Summarize(         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; DatesBetween('Date'[Date], date(2000,1,1), LastDate('Date'[Date]))         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; ,'Date'[Date]         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; , &amp;quot;X&amp;quot;         &lt;br 
/&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; ,&amp;#160; calculate(Sum('Stock'[StockLevel]), ALLEXCEPT('Date', 'Date'[Date]))         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; )         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; , not isblank([X])         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; )         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; , 'Date'[Date]         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; )         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; , 1         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; )         &lt;br /&gt;&lt;/font&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; ,DAY       &lt;br 
/&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; )       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; )&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;Again the code from the last step is colored blue (step 2). Now we’re pretty close to the final formula. Let’s check the result when including both products:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh5.ggpht.com/-uYGkNp8bGaU/TwmMRU92cqI/AAAAAAAAC9M/qRsOCz2Bchk/s1600-h/t113.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="t11" border="0" alt="t11" src="http://lh3.ggpht.com/-QAP7I-7Ea0o/TwmMSZzM9vI/AAAAAAAAC9U/oozrPt7KuUo/t11_thumb1.png?imgmax=800" width="395" height="302" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;Now the result for each product is already working correctly. But the total is still not correct as you can see from the line marked in red. For January 2 the last date with data is the same for both products (January 1), therefore the value is correct. But for January 3 there are different days for the last stock value, so we only see one product in the total.&lt;/p&gt;  &lt;p&gt;However, the remaining part is not difficult. We simply summarize (group) by product and take the sum. 
Again, the code in blue is the last code from the step before (step 3):&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;Stock:=      &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; SumX(       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; Summarize(       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; 'Product'       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; ,&amp;#160; 'Product'[Product]       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; , &amp;quot;LastStock&amp;quot;       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; , &lt;font color="#0000ff"&gt;calculate(        &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; Sum('Stock'[StockLevel])         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; , dateadd(         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; LastDate('Date'[Date])         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; ,-floor(         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; 
LastDate('Date'[Date])         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; -         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; MaxX(         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; Filter(         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; Summarize(         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; DatesBetween('Date'[Date], date(2000,1,1), LastDate('Date'[Date]))         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; ,'Date'[Date]    
     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; , &amp;quot;X&amp;quot;         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; ,&amp;#160; calculate(Sum('Stock'[StockLevel]), ALLEXCEPT('Date', 'Date'[Date]))         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; )         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; , not isblank([X])         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; )         &lt;br 
/&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; , 'Date'[Date]         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; )         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; , 1         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; )         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; ,DAY         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; )         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; )         &lt;br /&gt;&lt;/font&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; )       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; ,       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; [LastStock]       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; )&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;Now, this does the trick and we end up with the final screenshot as shown at the 
beginning of this post.&lt;/p&gt;  &lt;p&gt;If you like, you can play with the formula by downloading the sample workbook &lt;a href="https://skydrive.live.com/view.aspx?cid=61F98448A5E17D57&amp;amp;resid=61F98448A5E17D57%21743" target="_blank"&gt;here&lt;/a&gt; (right-click on the link, then choose ‘Save as…’). You will need the &lt;a href="http://www.microsoft.com/download/en/details.aspx?id=26721" target="_blank"&gt;PowerPivot Add-In CTP 3&lt;/a&gt; or later in order to open the workbook.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4879885044062498968-9011266809880315258?l=ms-olap.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ms-olap.blogspot.com/feeds/9011266809880315258/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://ms-olap.blogspot.com/2012/01/sparse-snapshots-in-dax-bism-tabular.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/9011266809880315258'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/9011266809880315258'/><link rel='alternate' type='text/html' href='http://ms-olap.blogspot.com/2012/01/sparse-snapshots-in-dax-bism-tabular.html' title='Sparse Snapshots in DAX / BISM Tabular'/><author><name>Hilmar Buchta</name><uri>http://www.blogger.com/profile/07497529483535542290</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://2.bp.blogspot.com/_rzeWIeWVsxA/ScDkLmopJCI/AAAAAAAABAg/sdlcn6fz96M/S220/Hilmar+20090307_Klein.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh6.ggpht.com/-1gQHF8B_c_E/TwmMAkJWHtI/AAAAAAAAC7E/whnolDhkntA/s72-c/t3_thumb1.png?imgmax=800' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4879885044062498968.post-1501153200009126263</id><published>2011-11-19T12:19:00.001+01:00</published><updated>2011-11-19T12:19:41.904+01:00</updated><title type='text'>Stock levels as delta rows in DAX / BISM Tabular</title><content type='html'>&lt;p align="right"&gt;SQL Server Denali | PowerPivot&lt;/p&gt;  &lt;p&gt;A common way to compress large amounts of snapshot data is to store delta values instead of each snapshot value. This makes sense if the data does not change every day. For good query performance, however, you might not want to aggregate deltas over long periods. Therefore, it makes sense to store a regular absolute snapshot value in the data and to use deltas between those snapshots. The work needed to create the periodic snapshots is usually done in the ETL process.&lt;/p&gt;  &lt;p&gt;For my example, I’m using the following data table with absolute snapshot values as the first row of the month and deltas afterwards:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-h07OAY1E-sc/TseQqdep5VI/AAAAAAAAC54/far4L-xra1U/s1600-h/p13.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="p1" border="0" alt="p1" src="http://lh3.ggpht.com/-FYQsuHwVUlI/TseQr30I-eI/AAAAAAAAC6A/h3oCRNY_3s0/p1_thumb1.png?imgmax=800" width="251" height="415" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;For example, for our product Quickerstill we start with a stock level of 20 boxes at the beginning of January 2011. 
Then, on January 3, 2011, we sold two of them (-2), and on January 5 another three boxes (-3).&lt;/p&gt;  &lt;p&gt;The goal is to create a measure that gives us the current stock level at each date (including the dates between the delta rows).&lt;/p&gt;  &lt;p&gt;In order to do so, we need a ‘real’ date dimension, so we have a separate date table that is linked to our facts. This is what this simple model looks like:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh5.ggpht.com/-gRkZmvqABIU/TseQsyAmHtI/AAAAAAAAC6I/9U1YlyZAm1g/s1600-h/p210.png"&gt;&lt;img style="display: inline" title="p2" alt="p2" src="http://lh5.ggpht.com/-JbJ6qeSeGL8/TseQunf7OuI/AAAAAAAAC6Q/lNPQZsrqwCo/p2_thumb6.png?imgmax=800" width="591" height="234" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;The measure DeltaStock is the original stock value from our table above (a mixture of snapshots and deltas).&lt;/p&gt;  &lt;p&gt;To perform the desired computation, we can simply use the month-to-date function, because each day’s value is the sum of all rows from the first day of the month (the snapshot) up to the current day (including all deltas).&lt;/p&gt;  &lt;p&gt;This is the simple formula we’re using to compute the stock for each day:&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;Stock:=TOTALMTD(SUM([DeltaStock]),'Date'[Date])&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;It’s amazing how simple this calculation is. Let’s take a look at the result. 
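&lt;/p&gt;  &lt;p&gt;As a quick hand check of what the month-to-date sum should produce, here is the arithmetic for Quickerstill. It uses only the three rows described above and assumes that the snapshot row of 20 boxes is dated January 1, 2011 (the screenshot may contain further rows):&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;Jan 1: 20       &lt;br /&gt;Jan 2: 20 (no row for this day)       &lt;br /&gt;Jan 3: 20 - 2 = 18       &lt;br /&gt;Jan 4: 18 (no row for this day)       &lt;br /&gt;Jan 5: 20 - 2 - 3 = 15&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;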
In order to see the effect, I added the original stock column together with the new computed column.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh3.ggpht.com/-Tyt76X5y3Dc/TseQvaYW69I/AAAAAAAAC6Y/NV0enUGvjdo/s1600-h/p310.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="p3" border="0" alt="p3" src="http://lh5.ggpht.com/-eUVAmTagTvM/TseQwvubg2I/AAAAAAAAC6g/UTBX3RLwqCk/p3_thumb6.png?imgmax=800" width="206" height="555" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;As you can see, we now have a stock level for each day, computed correctly from the mixture of snapshot and delta values.&lt;/p&gt;  &lt;p&gt;Of course, we could also do the calculation without the absolute snapshot values in between. In this case we have to aggregate the values from the very beginning up to the current date. First, let’s take a look at the source data without the absolute snapshots:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh3.ggpht.com/-efc1HYAE6pc/TseQxRgq3kI/AAAAAAAAC6o/JNwtlYR6ONA/s1600-h/t13.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="t1" border="0" alt="t1" src="http://lh5.ggpht.com/-FpyamtavD6U/TseQzBSxKfI/AAAAAAAAC6w/W64gKpPwEGw/t1_thumb1.png?imgmax=800" width="260" height="311" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;The data is pretty much the same as in the first approach. 
Only the rows for the absolute snapshots are missing (apart from the initial values).&lt;/p&gt;  &lt;p&gt;In this case the calculation would look like this:&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;Stock:=      &lt;br /&gt;&amp;#160; SumX(       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160; DATESBETWEEN(       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; 'Date'[Date]       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; , date(2000,1,1)       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; , lastdate('Date'[Date])       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160; )       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160; ,calculate(       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; Sum('Stock'[DeltaStock])       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; ,ALLEXCEPT('Date','Date'[Date])       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; )       &lt;br /&gt;&amp;#160; )&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;The calculation is still pretty simple. However, I would prefer the option with the snapshot values in between, both for performance reasons and because these snapshots can easily be created in ETL (if they are not delivered by the source system).&lt;/p&gt;  &lt;p&gt;Just two more remarks before I finish this post. The first one is about the DatesBetween range in the formula above. I’m using 2000/1/1 as the start date. However, DatesBetween only returns dates that actually exist as rows in our date table and fall within this range. So we could also write 1900/1/1 without risking ending up with a large number of rows.&lt;/p&gt;  &lt;p&gt;The other remark concerns future dates. 
Since our calculation takes the last value as the value for all the future, you will find values for all entries of the date dimension. This might not be wanted. In this case you can wrap the calculation from above inside an if statement to check the date:&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;StockClipped:=      &lt;br /&gt;&amp;#160; if(       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; firstdate('Date'[Date])&amp;gt;Now()       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; , Blank()       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; ,SumX(       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; DATESBETWEEN(       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; 'Date'[Date]       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; , date(2000,1,1)       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; , lastdate('Date'[Date])       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; )       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; ,calculate(       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; Sum('Stock'[DeltaStock])       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; ,ALLEXCEPT('Date','Date'[Date])       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; )       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160; )       &lt;br /&gt;&amp;#160; )&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;With this modification (which also works with the formula for the absolute intermediate snapshots from above) values are only shown for periods that are over or have at least started. 
So the formula would return a value for the full year 2011 once the year has started. If you only want to see values for periods that have ended, you can replace the function ‘firstdate’ with a ‘lastdate’.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4879885044062498968-1501153200009126263?l=ms-olap.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ms-olap.blogspot.com/feeds/1501153200009126263/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://ms-olap.blogspot.com/2011/11/sql-server-denali-powerpivot-common-way.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/1501153200009126263'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/1501153200009126263'/><link rel='alternate' type='text/html' href='http://ms-olap.blogspot.com/2011/11/sql-server-denali-powerpivot-common-way.html' title='Stock levels as delta rows in DAX / BISM Tabular'/><author><name>Hilmar Buchta</name><uri>http://www.blogger.com/profile/07497529483535542290</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://2.bp.blogspot.com/_rzeWIeWVsxA/ScDkLmopJCI/AAAAAAAABAg/sdlcn6fz96M/S220/Hilmar+20090307_Klein.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh3.ggpht.com/-FYQsuHwVUlI/TseQr30I-eI/AAAAAAAAC6A/h3oCRNY_3s0/s72-c/p1_thumb1.png?imgmax=800' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4879885044062498968.post-6319737781040218720</id><published>2011-10-23T15:44:00.001+02:00</published><updated>2011-10-23T15:44:58.824+02:00</updated><title type='text'>Excel Services Scorecard as Windows 
Desktop Gadget</title><content type='html'>&lt;p align="right"&gt;SharePoint 2010 | SQL Server 2008 | SQL Server 2008R2 | SQL Server 2012 (Denali)&lt;/p&gt;  &lt;p align="left"&gt;Earlier this year I stumbled across a very interesting Windows desktop gadget that is capable of showing an Excel Services &lt;em&gt;element&lt;/em&gt; on the Windows desktop. Here, an &lt;em&gt;element&lt;/em&gt; can be a named region, a pivot table or a chart in an Excel Services document. The technology for showing this element is the Excel Services REST API (REST stands for Representational State Transfer).&lt;/p&gt;  &lt;p align="left"&gt;You can find the article about the desktop gadget as well as the download link for the gadget itself &lt;a href="http://blogs.msdn.com/b/cumgranosalis/archive/2009/11/03/interoducing-the-excel-services-gadget.aspx" target="_blank"&gt;here&lt;/a&gt;. Installation and configuration of the gadget are well explained on the linked site, so I can keep this short here.&lt;/p&gt;  &lt;p&gt;After adding the gadget to your desktop, the gadget still needs to be configured. 
&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-ztAroAiIVkw/TqQZvGlhdqI/AAAAAAAAC1Y/RU9Od2J2OnQ/s1600-h/p13.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="p1" border="0" alt="p1" src="http://lh4.ggpht.com/-wD2tNC2WuTM/TqQZwiOlY4I/AAAAAAAAC1g/NM6v3Z3S79U/p1_thumb1.png?imgmax=800" width="198" height="189" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;By clicking on the gadget configuration icon, the configuration dialog is displayed:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-xTA_-6CKkkQ/TqQZx4yqb3I/AAAAAAAAC1o/JhNf_KDdigc/s1600-h/p22.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="p2" border="0" alt="p2" src="http://lh4.ggpht.com/-wM18LZD6MUQ/TqQZy5LABuI/AAAAAAAAC1w/U-gL3MuFNkw/p2_thumb.png?imgmax=800" width="202" height="244" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;table border="1" cellspacing="0" cellpadding="2" width="798"&gt;&lt;tbody&gt;     &lt;tr&gt;       &lt;td valign="top" width="144"&gt;Workbook&lt;/td&gt;        &lt;td valign="top" width="652"&gt;The URL to your Excel Services workbook, for example          &lt;br /&gt;http://srv1/PowerPivot/Gadget.xlsx&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="144"&gt;Show in gadget&lt;/td&gt;        &lt;td valign="top" width="652"&gt;Here you can pick from any named region, pivot table or chart that should be displayed in the “expanded” state of the gadget (flyout)&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="144"&gt;Thumbnail&lt;/td&gt;        &lt;td valign="top" width="652"&gt;Here you can pick from any named region, pivot table or chart that should be displayed in the normal state of the gadget. 
This is what you see on your desktop first&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="144"&gt;Refresh&lt;/td&gt;        &lt;td valign="top" width="652"&gt;Refresh interval of the gadget&lt;/td&gt;     &lt;/tr&gt;   &lt;/tbody&gt;&lt;/table&gt;  &lt;p&gt;So the simple idea is to create a nice pivot table and to use this as the basis for the gadget. For my example, I created a simple pivot table based on the Finance perspective of the Adventure Works OLAP cube. The pivot table has the year in the filter area, the sales amount and the operating profit KPI as the data, and the departments on the rows.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh4.ggpht.com/-2MvC67tkNgE/TqQZz6Nk3TI/AAAAAAAAC14/S6EIH78WlkE/s1600-h/p33.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="p3" border="0" alt="p3" src="http://lh5.ggpht.com/-rVLCAmVitUI/TqQZ1Kdw6rI/AAAAAAAAC2A/OmIg1pv03h4/p3_thumb1.png?imgmax=800" width="478" height="180" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;If we use this large table in our gadget, the thumbnail view becomes practically “microscopic”:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh5.ggpht.com/-zCKIMqrozU0/TqQZ2UZjjmI/AAAAAAAAC2I/RQXzldA3BEA/s1600-h/p63.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="p6" border="0" alt="p6" src="http://lh3.ggpht.com/-CMTHH37rhx0/TqQZ3pmgOGI/AAAAAAAAC2Q/9n0WfF9ONdg/p6_thumb1.png?imgmax=800" width="226" height="167" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;So, for the thumbnail (default) view you should choose a much smaller area. 
The flyout (detail view) is much better, but the KPI indicators and the filters are not shown:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh5.ggpht.com/-EzVVyBw3j8w/TqQZ4n2qXSI/AAAAAAAAC2Y/CZUyDJm7mSE/s1600-h/p58.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="p5" border="0" alt="p5" src="http://lh6.ggpht.com/-6qnov9YuUKg/TqQZ53nGq2I/AAAAAAAAC2g/qC-0MblK8dI/p5_thumb3.png?imgmax=800" width="388" height="150" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;So, here are a few tips and tricks you can use to make the gadget look nicer.&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;u&gt;1. Use time filters for current year, current month etc.&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;Of course we want our gadget to always show the latest values. Because of the auto refresh interval, we don’t have to care about this. But usually we will also use a time filter to restrict the data to a specific week, month etc. Since we don’t see the filter, the idea is to have this set automatically to the current time.&lt;/p&gt;  &lt;p&gt;This can be easily done in the Excel Pivot Table by using date filters. Unfortunately, these filters don’t work in the filter area, so we have to place the time hierarchy on the rows or columns. 
Then we can apply a filter (for example, the current year) as shown here:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh3.ggpht.com/-kfeRYqvBq_0/TqQZ66LbOsI/AAAAAAAAC2o/VvCduJNvGOQ/s1600-h/p74.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="p7" border="0" alt="p7" src="http://lh6.ggpht.com/-LRMJTYXrsW4/TqQZ8rFHg4I/AAAAAAAAC2w/tpOHP1sJ_V0/p7_thumb2.png?imgmax=800" width="326" height="417" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;For Adventure Works this would not return any data, since the sample data is only available for the years shown in the screenshot. However, in real-life scenarios, this would be a good choice for the filter.&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;u&gt;2. For thumbnail view, convert the pivot table to formulas&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;Formulas are much easier to handle and to format than a pivot table. Given the very limited space available for the thumbnail view, having full control of the layout is important. You may even want to hard-code the thumbnail view. Let me explain what I mean. 
In order to have the connection name at a single place, we start in a blank sheet by putting the connection name in a named cell called “OLAPConnection” (feel free to choose a different name):&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-nxB7uX6etJo/TqQZ9drAPUI/AAAAAAAAC24/v-Ywg6QHAKY/s1600-h/p83.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="p8" border="0" alt="p8" src="http://lh5.ggpht.com/-MKsUj-hMXZ8/TqQZ-aZvpSI/AAAAAAAAC3A/JRFXp1Y6Kms/p8_thumb1.png?imgmax=800" width="592" height="87" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;As the next step, let’s construct the current date member. From Management Studio, we can see that the MDX name of the time members looks like this (may be different in your cube, this example is taken from the Adventure Works sample database):&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;&lt;font face="Courier New"&gt;[Date].[Calendar].[Calendar Year].&amp;amp;[2001]&lt;/font&gt; &lt;/li&gt;    &lt;li&gt;&lt;font face="Courier New"&gt;[Date].[Calendar].[Month].&amp;amp;[2001]&amp;amp;[1]&lt;/font&gt; &lt;/li&gt;    &lt;li&gt;&lt;font face="Courier New"&gt;[Date].[Calendar].[Date].&amp;amp;[20010101]&lt;/font&gt; &lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;Here we have January 1, 2001. This is easily constructed using Excel’s time functions. 
So we simply add these fields to our Excel sheet&lt;/p&gt;  &lt;table border="1" cellspacing="0" cellpadding="2" width="791"&gt;&lt;tbody&gt;     &lt;tr&gt;       &lt;td valign="top" width="91"&gt;&lt;strong&gt;Label/Name&lt;/strong&gt;&lt;/td&gt;        &lt;td valign="top" width="40"&gt;&lt;strong&gt;Cell&lt;/strong&gt;&lt;/td&gt;        &lt;td valign="top" width="658"&gt;&lt;strong&gt;Formula&lt;/strong&gt;&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="91"&gt;Today&lt;/td&gt;        &lt;td valign="top" width="40"&gt;B3&lt;/td&gt;        &lt;td valign="top" width="658"&gt;&lt;font face="Courier New"&gt;=Now()&lt;/font&gt;&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="91"&gt;Year&lt;/td&gt;        &lt;td valign="top" width="40"&gt;B4&lt;/td&gt;        &lt;td valign="top" width="658"&gt;&lt;font face="Courier New"&gt;=&amp;quot;[Date].[Calendar].[Calendar Year].&amp;amp;[&amp;quot; &amp;amp; Year(B3) &amp;amp; &amp;quot;]&amp;quot;&lt;/font&gt;&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="91"&gt;Month&lt;/td&gt;        &lt;td valign="top" width="40"&gt;B5&lt;/td&gt;        &lt;td valign="top" width="658"&gt;&lt;font face="Courier New"&gt;=&amp;quot;[Date].[Calendar].[Month].&amp;amp;[&amp;quot; &amp;amp; Year(B3) &amp;amp; &amp;quot;]&amp;amp;[&amp;quot; &amp;amp; Month(B3) &amp;amp; &amp;quot;]&amp;quot;&lt;/font&gt;&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="91"&gt;Day&lt;/td&gt;        &lt;td valign="top" width="40"&gt;B6&lt;/td&gt;        &lt;td valign="top" width="658"&gt;&lt;font face="Courier New"&gt;=&amp;quot;[Date].[Calendar].[Date].&amp;amp;[&amp;quot; &amp;amp; 10000*Year(B3)+100*Month(B3)+Day(B3) &amp;amp;&amp;quot;]&amp;quot;&lt;/font&gt;&lt;/td&gt;     &lt;/tr&gt;   &lt;/tbody&gt;&lt;/table&gt;  &lt;p&gt;Of course you could also add fields for the previous month, the week etc., just depending on the needs of your scorecard. 
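&lt;/p&gt;  &lt;p&gt;As a minimal sketch of such an additional field, the member for the previous month could be built with the same functions as above. The name MDXPrevMonth is only an assumption for this illustration and is not part of the sample workbook; the formula can go into any free cell:&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;=&amp;quot;[Date].[Calendar].[Month].&amp;amp;[&amp;quot; &amp;amp; Year(Date(Year(B3),Month(B3)-1,1)) &amp;amp; &amp;quot;]&amp;amp;[&amp;quot; &amp;amp; Month(Date(Year(B3),Month(B3)-1,1)) &amp;amp; &amp;quot;]&amp;quot;&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;Excel’s Date function normalizes month 0 to December of the previous year, so for a date in January 2004 the formula returns [Date].[Calendar].[Month].&amp;amp;[2003]&amp;amp;[12].&lt;/p&gt;  &lt;p&gt;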
I named cell B4 MDXYear, B5 MDXMonth and B6 MDXDay. This is what the result looks like:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-K6G_hn306bE/TqQZ_Gs1zbI/AAAAAAAAC3I/SgWy6NauTjg/s1600-h/p93.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="p9" border="0" alt="p9" src="http://lh5.ggpht.com/-E6HsWpt4B1o/TqQaAcrTFiI/AAAAAAAAC3Q/3wVizKSk54A/p9_thumb1.png?imgmax=800" width="554" height="137" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;I had to fake the current date in order to see some values. Therefore I replaced the formula for today with this one:&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;=Date(2004, Month(Now()), Day(Now()))&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;Of course, you won’t want to do this in a real-life scenario, but since the sample dataset contains no data for 2011, I had to use this “time machine formula”.&lt;/p&gt;  &lt;p&gt;Before you start wondering what all this is good for, let’s query some data. For example, let’s assume that we want to see the operating profit (which is on the account ‘Operating Profit’) for the current year. 
So this would be our Excel formula:&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;=CUBEVALUE(OLAPConnection,&amp;quot;[Measures].[Amount]&amp;quot;,&amp;quot;[Account].[Accounts].[Operating Profit]&amp;quot;,MDXYear)&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;In order to show this value in our gadget, I placed it on a new sheet and adjusted the column width and height a little bit.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh4.ggpht.com/-txBfXuNvMpc/TqQaBGRzt2I/AAAAAAAAC3Y/3bKSiVWt1kY/s1600-h/p106.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="p10" border="0" alt="p10" src="http://lh4.ggpht.com/-B4tPTM-nOa8/TqQaCCTHTnI/AAAAAAAAC3g/ROteAyLtEWs/p10_thumb2.png?imgmax=800" width="168" height="83" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;In order to get a nice flyout, I also created a simple pivot chart in the Excel sheet, showing the operating profit during the year. 
After saving the Excel file, our gadget now looks like this:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh4.ggpht.com/-pO8z3pAJ_Co/TqQaDZtOZ8I/AAAAAAAAC3o/ir64_wDPLCs/s1600-h/p123.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="p12" border="0" alt="p12" src="http://lh4.ggpht.com/-N3yv0vDzuAw/TqQaER0-5bI/AAAAAAAAC3w/UKwSIuHsGEU/p12_thumb1.png?imgmax=800" width="213" height="154" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;And here is the flyout:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh4.ggpht.com/-fULoC-PGd0k/TqQaHLGLEkI/AAAAAAAAC34/ORxQ5WD3_6M/s1600-h/p133.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="p13" border="0" alt="p13" src="http://lh6.ggpht.com/-fOt675O-wAA/TqQaIURUr5I/AAAAAAAAC4A/CGjpT4eLwD0/p13_thumb1.png?imgmax=800" width="298" height="270" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;u&gt;3. (Conditional) background colors and fonts are preserved in the REST API&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;In order to include a kind of traffic light approach, we already found out that Excel indicators are not yet supported in the REST API. 
However, conditional formatting is supported, so you can easily create a scorecard like the following:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh3.ggpht.com/-cOIhUN0x66w/TqQaJHiuLxI/AAAAAAAAC4I/c3tzAO7BJN0/s1600-h/p147.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="p14" border="0" alt="p14" src="http://lh4.ggpht.com/-UUY8l3RQmgA/TqQaKQKuOrI/AAAAAAAAC4Q/JHeWVngRwUQ/p14_thumb3.png?imgmax=800" width="170" height="209" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;KPI indicators can also be created using special characters, for example from the Wingdings font, as shown in the following example:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh4.ggpht.com/-x1Ma0EDRlUs/TqQaLeCisCI/AAAAAAAAC4Y/L4PtP5V8Uzo/s1600-h/p154.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="p15" border="0" alt="p15" src="http://lh4.ggpht.com/-3PjUcm44l9U/TqQaMUAGF0I/AAAAAAAAC4g/HyZ9kSr94k0/p15_thumb2.png?imgmax=800" width="155" height="200" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;For this example, I used a separate table of banding ranges and an Excel VLookup to find the appropriate color for the indicator. Here, 1=green, 2=yellow, 3=red. In the Excel cell I used conditional formatting to choose the text color appropriately. But as we always want to display a certain element (here, the diamond from the Wingdings font), I used a custom format for each of the cells, so that the number (1, 2 or 3) is not shown but only a single character. 
The corresponding character for the diamond is “u”, so the custom format looks like this:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-HTDxee7Vke0/TqQaNJDxDrI/AAAAAAAAC4o/NfQ0769ZTZU/s1600-h/p203.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="p20" border="0" alt="p20" src="http://lh6.ggpht.com/-udsLgRRSXW4/TqQaOhTb8LI/AAAAAAAAC4w/tp49rIeAa0E/p20_thumb1.png?imgmax=800" width="365" height="327" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;And of course, there is a lot more you can do with all this formatting, the cube functions, etc.&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;u&gt;3. Make changes to the source code of the gadget&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;You can extract the gadget or modify the source code of the gadget yourself. After installing the gadget, the extracted sources can be found here: &lt;/p&gt;  &lt;p&gt;%LOCALAPPDATA%\Microsoft\Windows Sidebar\Gadgets&lt;/p&gt;  &lt;p&gt;After modifying the sources, the gadget needs to be switched off and on again for the changes to apply. The main file here is gadget.html. For example, you could change the link “By Excel Services” to point to your SharePoint server. 
To do so, a simple change in the source is needed:&lt;/p&gt;  &lt;p&gt;Before:    &lt;br /&gt;&lt;font face="Courier New"&gt;&amp;lt;tr&amp;gt;&amp;lt;td id=&amp;quot;dockedTitle&amp;quot; width=&amp;quot;100%&amp;quot;&amp;gt;      &lt;br /&gt;&amp;lt;a id=&amp;quot;leftDockedTitleLink&amp;quot; href=&amp;quot;&lt;/font&gt;&lt;font face="Courier New"&gt;http://blogs.msdn.com/cumgranosalis/pages/excel-services-windows-7-gadget.aspx&amp;quot;&lt;/font&gt;&lt;font face="Courier New"&gt;&amp;gt;By Excel Services&amp;lt;/a&amp;gt;      &lt;br /&gt;&amp;lt;/td&amp;gt;&amp;lt;td&amp;gt;&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;After:    &lt;br /&gt;&lt;font face="Courier New"&gt;&amp;lt;tr&amp;gt;&amp;lt;td id=&amp;quot;dockedTitle&amp;quot; width=&amp;quot;100%&amp;quot;&amp;gt;      &lt;br /&gt;&amp;lt;a id=&amp;quot;leftDockedTitleLink&amp;quot; href=&amp;quot;&lt;/font&gt;&lt;font face="Courier New"&gt;http://srv1/PowerPivot/Forms/AllItems.aspx&amp;quot;&lt;/font&gt;&lt;font face="Courier New"&gt;&amp;gt;All reports&amp;lt;/a&amp;gt;      &lt;br /&gt;&amp;lt;/td&amp;gt;&amp;lt;td&amp;gt;&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;I also changed the background color here, so this is what the result looks like:&lt;/p&gt;  &lt;p&gt;   &lt;br /&gt;&lt;a href="http://lh6.ggpht.com/-XX-_gkNGur8/TqQaPjPrbyI/AAAAAAAAC44/sIgSUVIp3mA/s1600-h/p172.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="p17" border="0" alt="p17" src="http://lh4.ggpht.com/-Z4mFlMp5-t4/TqQaRCLz40I/AAAAAAAAC5A/lLnTvU10uWU/p17_thumb.png?imgmax=800" width="198" height="244" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;As you can see, you can do a lot of interesting things with this simple but very powerful desktop gadget.&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;u&gt;4. 
Use parameters&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;If you want to use the same Excel Services file for more than one user you may want to pass parameters from each individual instance of the desktop gadget to the Excel Services file. This is also possible, however there are some things to take care of.&lt;/p&gt;  &lt;p&gt;The syntax for passing a parameter is &lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;Ranges('cellname')=value&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;Here, cellname is a named cell in Excel that we want to pass the value to. This is how it is entered in the desktop gadget, if our named cell is named ‘value1’ and we want to pass ‘xyz’:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-WvhQnLQ71xI/TqQaSFIVHsI/AAAAAAAAC5I/fvA87NJNfwk/s1600-h/z12.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="z1" border="0" alt="z1" src="http://lh6.ggpht.com/-rrMVzbFu3Lg/TqQaUpAz6jI/AAAAAAAAC5Q/bPzf-82lH2Q/z1_thumb.png?imgmax=800" width="244" height="214" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;Multiple parameters may be passed by separating them with an ampersand, for example&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;Ranges('value1')=xyz&amp;amp;Ranges('value2')=42&amp;amp;Ranges('value3')=01&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;This could be our corresponding view in the desktop gadget:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh5.ggpht.com/-SLhhLFI9rJQ/TqQaVePzlvI/AAAAAAAAC5Y/7_j3wISuoLg/s1600-h/z23.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="z2" border="0" alt="z2" src="http://lh3.ggpht.com/-U2lbaiFmu5E/TqQaWJ-WU5I/AAAAAAAAC5g/glwRVYjf898/z2_thumb1.png?imgmax=800" width="102" height="109" 
/&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;I just added a field that concatenates all three parameters to show that the Excel Services sheet is recomputed based on the passed values.&lt;/p&gt;  &lt;p&gt;As you can see, all parameters are passed to the gadget. However, numeric parameters are treated as numbers in Excel, so 01 was changed to 1. To prevent this, put an apostrophe (') in front of each text value. &lt;/p&gt;  &lt;p&gt;This is what the parameters should look like when you want to make sure that text is passed as text (and not converted to a number or date) in Excel:&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;Ranges('value1')=&lt;font color="#ff0000"&gt;'&lt;/font&gt;xyz&amp;amp;Ranges('value2')=42&amp;amp;Ranges('value3')=&lt;font color="#ff0000"&gt;'&lt;/font&gt;01&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;You should also be aware that this string is passed to the URL “as is”. So you have to encode all characters that are not allowed in a URL.&lt;/p&gt;  &lt;p&gt;For example, if you want to pass the year 2006 from AdventureWorks, the unique name of the date member would be:&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;[Date].[Calendar].[Calendar Year].&amp;amp;[2006]&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;In order to pass this to the gadget, you have to encode blanks, the square brackets and the ampersand. So the result would look like this:&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;Ranges('parDate')=%5BDate%5D.%5BCalendar%5D.%5BCalendar%20Year%5D.%26%5B2006%5D&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;This is very difficult to read. 
Therefore, I recommend passing just the key (2006 in this case) as the parameter and constructing the unique name in Excel using a formula like &lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;=&amp;quot;[Date].[Calendar].[Calendar Year].&amp;amp;[&amp;quot; &amp;amp; parDate &amp;amp; &amp;quot;]&amp;quot;&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;Of course, there are a lot more things you can do with this simple, yet powerful Windows desktop gadget. As with most other gadgets, you may also place more than one instance of the gadget on your desktop, so you can have different scorecards. You can also add links to your Excel sheet, so that you can jump directly to a dashboard for a specific key performance indicator. Just start playing with the gadget and see how powerful and useful it is.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4879885044062498968-6319737781040218720?l=ms-olap.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ms-olap.blogspot.com/feeds/6319737781040218720/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://ms-olap.blogspot.com/2011/10/excel-services-scorecard-as-windows.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/6319737781040218720'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/6319737781040218720'/><link rel='alternate' type='text/html' href='http://ms-olap.blogspot.com/2011/10/excel-services-scorecard-as-windows.html' title='Excel Services Scorecard as Windows Desktop Gadget'/><author><name>Hilmar Buchta</name><uri>http://www.blogger.com/profile/07497529483535542290</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='http://2.bp.blogspot.com/_rzeWIeWVsxA/ScDkLmopJCI/AAAAAAAABAg/sdlcn6fz96M/S220/Hilmar+20090307_Klein.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh4.ggpht.com/-wD2tNC2WuTM/TqQZwiOlY4I/AAAAAAAAC1g/NM6v3Z3S79U/s72-c/p1_thumb1.png?imgmax=800' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4879885044062498968.post-7952576105842938567</id><published>2011-09-25T15:33:00.001+02:00</published><updated>2011-09-25T15:33:38.183+02:00</updated><title type='text'>Custom Aggregates in DAX / BISM Tabular (part 2)</title><content type='html'>&lt;p align="right"&gt;SQL Server Denali | PowerPivot&lt;/p&gt;  &lt;p align="left"&gt;This post combines two ideas:&lt;/p&gt;  &lt;ol&gt;   &lt;li&gt;     &lt;div align="left"&gt;The calculation of a moving average (last 30 days) as shown in a &lt;font style="background-color: #ffff00"&gt;&lt;/font&gt;&lt;font style="style"&gt;&lt;a href="http://ms-olap.blogspot.com/2011/08/moving-average-in-dax-bism-tabular.html" target="_blank"&gt;previous post&lt;/a&gt;&lt;/font&gt;&lt;font style="background-color: #ffff00"&gt;&lt;/font&gt;&lt;/div&gt;   &lt;/li&gt;    &lt;li&gt;     &lt;div align="left"&gt;Custom aggregations&lt;/div&gt;   &lt;/li&gt; &lt;/ol&gt;  &lt;p align="left"&gt;What I want to show here today is how we can influence the way our calculation works on higher levels. For this purpose I’m showing three different approaches:&lt;/p&gt;  &lt;ol&gt;   &lt;li&gt;     &lt;div align="left"&gt;The monthly (quarterly, yearly) total is defined as the last day’s moving average of the given date range.        
&lt;br /&gt;This corresponds to the original approach from my post about &lt;font style="background-color: #ffff00"&gt;&lt;/font&gt;&lt;font style="style"&gt;&lt;a href="http://ms-olap.blogspot.com/2011/08/moving-average-in-dax-bism-tabular.html" target="_blank"&gt;moving averages&lt;/a&gt;&lt;/font&gt;&lt;font style="background-color: #ffff00"&gt;&lt;/font&gt;&lt;/div&gt;   &lt;/li&gt;    &lt;li&gt;     &lt;div align="left"&gt;The monthly (quarterly, yearly) total is defined as the average of all the moving averages on day level.&lt;/div&gt;   &lt;/li&gt;    &lt;li&gt;     &lt;div align="left"&gt;The monthly (quarterly, yearly) total is defined as the average of the sales amount of all days of that period (the moving average is only to be calculated on date level)&lt;/div&gt;   &lt;/li&gt; &lt;/ol&gt;  &lt;p align="left"&gt;Of course, there are many more possibilities, but I think the methods shown here can be used for many other aggregation requirements.&lt;/p&gt;  &lt;p align="left"&gt;Here are the three aggregations side by side. &lt;/p&gt;  &lt;p align="left"&gt;&lt;a href="http://lh6.ggpht.com/-lLBfTpsmMdg/Tn8tWXyxE4I/AAAAAAAAC00/cWg72JC6BY4/s1600-h/image18.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh3.ggpht.com/-QElL2xLhfxM/Tn8tZtv1MfI/AAAAAAAAC04/y-H0JvKgVuw/image_thumb10.png?imgmax=800" width="499" height="557" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p align="left"&gt;Although the values on higher levels of aggregation differ a lot, the daily values are identical (we just wanted to change the rollup method, not the calculation on a detail level).&lt;/p&gt;  &lt;p align="left"&gt;Let’s start with the first one:&lt;/p&gt;  &lt;p align="left"&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;u&gt;1. 
The monthly (quarterly, yearly) total is defined as the last day’s moving average of the given date range&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;This is shown here for our measure “Sales Amount (30d avg)”:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh4.ggpht.com/-hqBwaJarshE/Tn8tdX1jTXI/AAAAAAAAC08/cE4RXC2aNuQ/s1600-h/image4.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh4.ggpht.com/-cFLP3q11ziI/Tn8tgAXMh9I/AAAAAAAAC1A/3Gjul_cx-PI/image_thumb2.png?imgmax=800" width="457" height="455" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;As shown in my &lt;font style="background-color: #ffff00"&gt;&lt;/font&gt;&lt;font style="style"&gt;&lt;a href="http://ms-olap.blogspot.com/2011/08/moving-average-in-dax-bism-tabular.html" target="_blank"&gt;previous post&lt;/a&gt;&lt;/font&gt;&lt;font style="background-color: #ffff00"&gt;&lt;/font&gt;, the formula may look like this:&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;Sales Amount (30d avg):=      &lt;br /&gt;AverageX(       &lt;br /&gt;&amp;#160; Summarize(       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; datesinperiod(       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; 'Date'[Date]       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; , LastDate('Date'[Date])       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; , -30       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; , DAY       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; )       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; ,'Date'[Date]       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; , &amp;quot;SalesAmountSum&amp;quot;       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; , calculate(Sum('Internet Sales'[Sales Amount]), ALLEXCEPT('Date','Date'[Date]))       
&lt;br /&gt;&amp;#160; )       &lt;br /&gt;&amp;#160; ,[SalesAmountSum]       &lt;br /&gt;)&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;Since we use the expression LastDate('Date'[Date]), the last date of each period is used for the moving average, which is exactly what we wanted.&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;u&gt;2. The monthly (quarterly, yearly) total is defined as the average of all the moving averages on day level&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;In this approach, the monthly aggregate has to be calculated as the average of all the daily moving averages of that month. The picture shows what this means:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-s1ATnhVGKYI/Tn8tjmjcFwI/AAAAAAAAC1E/PpSkrpIoXAs/s1600-h/image8.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh4.ggpht.com/-Fm_r9WYecng/Tn8tmLfQBKI/AAAAAAAAC1I/znpYPWd_JJM/image_thumb4.png?imgmax=800" width="464" height="445" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;This might look pretty difficult. However, for our calculation we simply have to wrap the existing calculation in another Summarize function. 
&lt;/p&gt;  &lt;p&gt;This is the formula I used:&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;Sales Amount (30d avg 2):=      &lt;br /&gt;AverageX(       &lt;br /&gt;&amp;#160; summarize(       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; 'Date'       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; , 'Date'[Date]       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; , &amp;quot;DayAvg&amp;quot;       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; , &lt;/font&gt;&lt;font face="Courier New"&gt;&lt;font color="#0000ff"&gt;AverageX(        &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; Summarize(         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; datesinperiod(         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; 'Date'[Date]         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; ,LastDate('Date'[Date])         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; , -30         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; , DAY         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; )         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; ,'Date'[Date]         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; , &amp;quot;SalesAmountSum&amp;quot;         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; , calculate(Sum('Internet Sales'[Sales Amount]), ALLEXCEPT('Date','Date'[Date]))         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; )         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; ,[SalesAmountSum]         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; )         &lt;br /&gt;&lt;/font&gt;&amp;#160; )       &lt;br /&gt;&amp;#160; 
,[DayAvg]       &lt;br /&gt;)&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;The blue part of the formula is exactly the same as in the first calculation. We’re just wrapping this in an additional Average function. Why does this still work on a day level? Simply because the outer average computes the average of a single value in this case.&lt;/p&gt;  &lt;p&gt;So, with just a minor change to the formula, we changed the method of aggregation quite a lot.&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;u&gt;3. The monthly (quarterly, yearly) total is defined as the average of the sales amount of all days of that period&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;This sounds quite simple, but in this case we have to distinguish two calculation paths:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;day level &lt;/li&gt;    &lt;li&gt;month level and above &lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;The following picture shows the calculation for the monthly aggregate:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh3.ggpht.com/-y16ZLsoHdcE/Tn8tpxY4J4I/AAAAAAAAC1M/j8DTvKJ5Y8U/s1600-h/image14.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh3.ggpht.com/-nUIOwGI0UMg/Tn8tsIRPEOI/AAAAAAAAC1Q/7Bkm9QhojVw/image_thumb8.png?imgmax=800" width="462" height="436" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;In order to split our computation paths, we need to find out whether we are on day level or not. Usually the IsFiltered(…) DAX function can be used for this purpose. However, since we have several columns with date granularity (date, day of month, day of year) in our date dimension, we would have to write something like IsFiltered('Date'[Date]) || IsFiltered('Date'[Day of Month]) || …&lt;/p&gt;  &lt;p&gt;To simplify this, I simply used a count of days in the following code. 
If we count only one day, we’re on day level. Of course, the count is the more expensive operation, but for this example I’ll leave it that way (the date table is not really big). &lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;Sales Amount (30d avg 3):=      &lt;br /&gt;if (       &lt;br /&gt;&amp;#160; Count('Date'[Date])=1,       &lt;br /&gt;&amp;#160; &lt;font color="#0000ff"&gt;AverageX(        &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; Summarize(         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; datesinperiod(         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; 'Date'[Date]         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; ,LastDate('Date'[Date])         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; , -30         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; , DAY         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; )         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; , 'Date'[Date]         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; , &amp;quot;SalesAmountSum&amp;quot;         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; , calculate(Sum('Internet Sales'[Sales Amount]), ALLEXCEPT('Date','Date'[Date]))         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; )         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; ,[SalesAmountSum]         &lt;br /&gt;&amp;#160; )         &lt;br /&gt;&lt;/font&gt;&amp;#160; ,&amp;#160; &lt;br /&gt;&amp;#160; AverageX(       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; Summarize(       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; 'Date'       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; ,'Date'[Date]       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; ,&amp;quot;SalesAmountAvg&amp;quot;       &lt;br 
/&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; , calculate(Sum('Internet Sales'[Sales Amount]), ALLEXCEPT('Date','Date'[Date]))       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; )       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; ,[SalesAmountAvg]       &lt;br /&gt;&amp;#160; )       &lt;br /&gt;)       &lt;br /&gt;&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;Again the blue part of the formula is exactly the same as in our first approach. This part is taken whenever we’re on a day level. On higher levels, the aggregate is computed as a simple (not moving) average of all daily values.&lt;/p&gt;  &lt;p&gt;So, using the concepts of my previous post we were able to change the aggregation method to meet very sophisticated requirements. &lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4879885044062498968-7952576105842938567?l=ms-olap.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ms-olap.blogspot.com/feeds/7952576105842938567/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://ms-olap.blogspot.com/2011/09/custom-aggregates-in-dax-bism-tabular_25.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/7952576105842938567'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/7952576105842938567'/><link rel='alternate' type='text/html' href='http://ms-olap.blogspot.com/2011/09/custom-aggregates-in-dax-bism-tabular_25.html' title='Custom Aggregates in DAX / BISM Tabular (part 2)'/><author><name>Hilmar Buchta</name><uri>http://www.blogger.com/profile/07497529483535542290</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='http://2.bp.blogspot.com/_rzeWIeWVsxA/ScDkLmopJCI/AAAAAAAABAg/sdlcn6fz96M/S220/Hilmar+20090307_Klein.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh3.ggpht.com/-QElL2xLhfxM/Tn8tZtv1MfI/AAAAAAAAC04/y-H0JvKgVuw/s72-c/image_thumb10.png?imgmax=800' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4879885044062498968.post-6933959773230639592</id><published>2011-09-11T15:30:00.001+02:00</published><updated>2011-09-11T15:30:25.005+02:00</updated><title type='text'>Custom Aggregates in DAX / BISM Tabular (part 1)</title><content type='html'>&lt;p align="right"&gt;SQL Server Denali | PowerPivot&lt;/p&gt;  &lt;p&gt;Custom aggregates can be created using cube scripts in BISM multidimensional (SSAS OLAP cubes). How can we do this with BISM tabular? In many cases, simple DAX calculations can solve this for us.&lt;/p&gt;  &lt;p&gt;I’m referring to&lt;font style="background-color: #ffff00"&gt;&lt;/font&gt;&lt;font style="style"&gt; the example of my &lt;a href="http://ms-olap.blogspot.com/2011/08/semi-additive-measures-in-dax-bism.html" target="_blank"&gt;previous post&lt;/a&gt;&lt;/font&gt; about semi additive measures. Let’s say we’re monitoring the stock of two products we’re selling. For the totals we want to see the average stock over time. At least once a month we’re taking a snapshot of the stock. If we have more than one snapshot per month, the monthly total is computed as the average of those snapshots. For aggregation above the month level we want to take the average of the monthly averages. At first, this looks like we only have to use average as the aggregation function. But the average of averaged values is not identical to the average of all values. 
Let’s take a look at this source data table:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh4.ggpht.com/-7jzjb7lo3jI/Tmy3zJjnliI/AAAAAAAAC0U/V81lNYaCJeA/s1600-h/t12.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="t1" border="0" alt="t1" src="http://lh3.ggpht.com/-2UquQVY6P4M/Tmy30mnQf6I/AAAAAAAAC0Y/95-0Df8L1xE/t1_thumb.png?imgmax=800" width="244" height="233" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;For product Notate we have a single measurement of 50 pieces in February. For product Quickerstill we have 8 distinct measurements with an average of 50 pieces in February. However, when we look at the total average for Quickerstill, the 8 distinct measurements in February result in a higher weight of the February average and therefore in a higher total average of 44 instead of 30:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh6.ggpht.com/--2dyI6QNypI/Tmy31ez0pGI/AAAAAAAAC0c/wqfDPFTM9X0/s1600-h/t23.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="t2" border="0" alt="t2" src="http://lh6.ggpht.com/-Mu7cfW1S-yI/Tmy32X4g1II/AAAAAAAAC0g/2xo5xxDIsLA/t2_thumb1.png?imgmax=800" width="393" height="135" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;The Average Stock measure in this example is the same semi additive measure as in &lt;font style="background-color: #ffff00"&gt;&lt;/font&gt;&lt;font style="style"&gt;&lt;a href="http://ms-olap.blogspot.com/2011/08/semi-additive-measures-in-dax-bism.html" target="_blank"&gt;my previous post&lt;/a&gt;&lt;/font&gt; and computed like this:&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;AVG Stock:=Sum([Stocklevel])/DISTINCTCOUNT([Date])&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;The requirement for the custom aggregate 
means that we also want to see a total average of 30 for product Quickerstill (20+50+20=90, 90/3=30). This requirement is somewhat unusual as the computation above gives the “correct” average of all values. One interpretation is that the weight for computing the average of two or more months is not influenced by the number of measurements within the month. &lt;/p&gt;  &lt;p&gt;We can achieve this in a way that is very similar to the semi additive calculations from my last post. Here is the resulting formula:&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;Special AVG Stock:=AVERAGEX(SUMMARIZE('Stock',[Year],[Month],&amp;quot;AvgStock&amp;quot;,AVERAGE([Stocklevel])),[AvgStock])&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;This formula simply summarizes the average stock at a grouping level of year and month. Then, in a second step, it takes these values and computes the average of them. By doing so, we have broken the aggregation into two layers. First we average by month, then we take the average of those values.&lt;/p&gt;  &lt;p&gt;Here is the resulting table using the new aggregate:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh3.ggpht.com/-MCgngMe2B-A/Tmy33dOdUPI/AAAAAAAAC0k/WI5_Oe8PRLQ/s1600-h/t37.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="t3" border="0" alt="t3" src="http://lh6.ggpht.com/-4v66qyFv0Wc/Tmy34VPUziI/AAAAAAAAC0o/WL_JLZvUlLo/t3_thumb3.png?imgmax=800" width="460" height="146" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;And after expanding the February values (2nd month) we clearly see how our custom aggregate works:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh5.ggpht.com/-MXzYNFQ_npg/Tmy36aGDNJI/AAAAAAAAC0s/zvHEeJC7p1c/s1600-h/p19.png"&gt;&lt;img style="display: inline" title="p1" alt="p1" src="http://lh6.ggpht.com/-IbIIaZ8-SFk/Tmy37966qqI/AAAAAAAAC0w/r6dqW7f4sEQ/p1_thumb4.png?imgmax=800" 
width="570" height="278" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;Of course, this is just a simple custom aggregate but it is remarkable that we didn’t need any kind of cube script with scope-statements to achieve this but only a very simple DAX expression.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4879885044062498968-6933959773230639592?l=ms-olap.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ms-olap.blogspot.com/feeds/6933959773230639592/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://ms-olap.blogspot.com/2011/09/custom-aggregates-in-dax-bism-tabular.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/6933959773230639592'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/6933959773230639592'/><link rel='alternate' type='text/html' href='http://ms-olap.blogspot.com/2011/09/custom-aggregates-in-dax-bism-tabular.html' title='Custom Aggregates in DAX / BISM Tabular (part 1)'/><author><name>Hilmar Buchta</name><uri>http://www.blogger.com/profile/07497529483535542290</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://2.bp.blogspot.com/_rzeWIeWVsxA/ScDkLmopJCI/AAAAAAAABAg/sdlcn6fz96M/S220/Hilmar+20090307_Klein.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh3.ggpht.com/-2UquQVY6P4M/Tmy30mnQf6I/AAAAAAAAC0Y/95-0Df8L1xE/s72-c/t1_thumb.png?imgmax=800' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4879885044062498968.post-5879602050157242047</id><published>2011-09-04T15:54:00.001+02:00</published><updated>2011-09-04T15:54:24.700+02:00</updated><title 
type='text'>Moving Averages in DAX vs. MDX</title><content type='html'>&lt;p align="right"&gt;SQL Server 2008 | SQL Server 2008R2 | SQL Server Denali | PowerPivot&lt;/p&gt;  &lt;p&gt;Yes, I’m a supporter of equal rights for DAX and MDX. And like many others, I can’t wait to have BISM multidimensional (aka OLAP Cubes) supporting DAX so that we can use project Crescent on top of all those nice cubes. But back to my topic. My&lt;font style="background-color: #ffff00"&gt;&lt;/font&gt;&lt;font style="style"&gt; &lt;a href="http://ms-olap.blogspot.com/2011/08/moving-average-in-dax-bism-tabular.html" target="_blank"&gt;last post&lt;/a&gt;&lt;/font&gt;&lt;font style="background-color: #ffff00"&gt;&lt;/font&gt; was about moving averages in DAX and I was so sure I blogged about calculating them in MDX before… but I didn’t. This is not fair.&lt;/p&gt;  &lt;p&gt;On the other hand, Mosha Pasumansky, the godfather of MDX, wrote an &lt;a href="http://sqlblog.com/blogs/mosha/archive/2007/09/04/moving-averages-in-mdx.aspx" target="_blank"&gt;excellent and very complete article&lt;/a&gt; about this topic and I can only suggest reading it. It doesn’t only cover simple moving averages but also weighted and exponential ones. Bill Pearson also wrote a very good step-by-step guide about this topic. You can find it &lt;a href="http://www.databasejournal.com/features/mssql/article.php/3411651/MDX-in-Analysis-Services-Mastering-Time--Moving-Averages---Another-Approach.htm" target="_blank"&gt;here&lt;/a&gt;.&lt;/p&gt;  &lt;p&gt;So, basically there is no need for me to write another article about this. Therefore this will be a very short blog post… ah, I just remembered something I may write about. Mosha and Bill both looked at the calculation of moving averages within a query. In the context of a specific query, things are sometimes easier than in the situation where you create a cube measure that has to work under different query conditions. 
For example, you cannot be sure which hierarchy has been used. &lt;/p&gt;  &lt;p&gt;The first thing that comes to mind is the wizard for adding time intelligence. This wizard does a pretty good job. The main result is a short piece of code that is inserted into the cube script. This piece of code looks similar to the following example:&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;Scope(      &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; {       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; [Measures].members       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; }       &lt;br /&gt;) ;&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;/*Three Month Moving Average*/&amp;#160; &lt;br /&gt;&amp;#160; (       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; [Date].[Calendar Date Calculations].[Three Month Moving Average],       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; [Date].[Month Name].[Month Name].Members,       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; [Date].[Date].Members       &lt;br /&gt;&amp;#160; )&amp;#160; &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; =       &lt;br /&gt;&amp;#160; Avg(       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; ParallelPeriod(       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; [Date].[Calendar].[Month],       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; 2,       &lt;br 
/&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; [Date].[Calendar].CurrentMember       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; )       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; : [Date].[Calendar].CurrentMember       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; *       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; [Date].[Calendar Date Calculations].[Current Date]       &lt;br /&gt;&amp;#160; ) ;&amp;#160; &lt;br /&gt;End Scope ;&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;End users can use the result intuitively, as they simply have to choose in which kind of view the data should appear (actual, three month moving average or any other calculation generated by the wizard, for example year-to-date or year-over-year growth). Also, this computation focuses on the date dimension, not on a specific measure, so it can be used for any measure in the cube.&lt;/p&gt;  &lt;p&gt;In my last post I used a DAX calculation that computed the moving average based on the last date in the current interval. We can do pretty much the same in MDX by “translating” the DAX formula to MDX. 
Here is the calculation for a cube calculated member:&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;CREATE MEMBER CURRENTCUBE.[Measures].SalesAmountAvg30d AS&amp;#160; &lt;br /&gt;Avg(       &lt;br /&gt;&amp;#160; LASTPERIODS(&amp;#160; &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; 30       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; , tail(descendants([Date].[Calendar].currentmember,[Date].[Calendar].[Date]),1).item(0)       &lt;br /&gt;&amp;#160; )       &lt;br /&gt;&amp;#160; , [Measures].[Internet Sales Amount]       &lt;br /&gt;);&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;After defining this measure we can use it in a query or within a pivot table. Here’s the result from a query:&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;select {[Measures].[Internet Sales Amount], [Measures].[SalesAmountAvg30d]} on 0,      &lt;br /&gt;descendants([Date].[Calendar].[Calendar Year].&amp;amp;[2003])&amp;#160; on 1       &lt;br /&gt;from [Adventure Works]&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-LaNxgSNbUMM/TmODA8EPilI/AAAAAAAAC0A/LSMi7vyKxFA/s1600-h/t13.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="t1" border="0" alt="t1" src="http://lh6.ggpht.com/-SL33CdMjkUE/TmODCd3lrEI/AAAAAAAAC0E/bL2ly_eWCZM/t1_thumb1.png?imgmax=800" width="311" height="318" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;If you compare these values with the values from my &lt;a href="http://ms-olap.blogspot.com/2011/08/moving-average-in-dax-bism-tabular.html" target="_blank"&gt;last post&lt;/a&gt; you see that the values are absolutely identical (just the order of the values differs because of the way I wrote the query). 
Here are both definitions side by side:&lt;/p&gt;  &lt;table border="1" cellspacing="0" cellpadding="2" width="710"&gt;&lt;tbody&gt;     &lt;tr&gt;       &lt;td valign="top" width="359"&gt;         &lt;p align="center"&gt;&lt;strong&gt;MDX&lt;/strong&gt;&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="349"&gt;         &lt;p align="center"&gt;&lt;strong&gt;DAX&lt;/strong&gt;&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="359"&gt;&lt;font face="Courier New"&gt;Avg(            &lt;br /&gt;LASTPERIODS(&amp;#160; &lt;br /&gt;&amp;#160;&amp;#160; 30             &lt;br /&gt;&amp;#160;&amp;#160; , tail(             &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; descendants(             &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; [Date].[Calendar].currentmember             &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; ,[Date].[Calendar].[Date]             &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; )             &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; ,1             &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160; ).item(0)             &lt;br /&gt;&amp;#160;&amp;#160; )             &lt;br /&gt;&amp;#160;&amp;#160; , [Measures].[Internet Sales Amount]             &lt;br /&gt;);&lt;/font&gt;&lt;/td&gt;        &lt;td valign="top" width="349"&gt;&lt;font face="Courier New"&gt;AverageX(            &lt;br /&gt;Summarize(             &lt;br /&gt;&amp;#160; datesinperiod('Date'[Date]&amp;#160; &lt;br /&gt;&amp;#160;&amp;#160; , LastDate('Date'[Date]),-30,DAY)&amp;#160; &lt;br /&gt;&amp;#160;&amp;#160; ,'Date'[Date]&amp;#160; &lt;br /&gt;&amp;#160;&amp;#160; , &amp;quot;SalesAmountSum&amp;quot;             &lt;br /&gt;&lt;/font&gt;&lt;font color="#ff0000"&gt;&lt;font color="#000000" face="Courier New"&gt;&amp;#160;&amp;#160; , 
calculate(&amp;#160; &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160; Sum('Internet Sales'[Sales Amount]),               &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160; ALLEXCEPT('Date','Date'[Date])               &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160; )&lt;/font&gt;&lt;strong&gt;&lt;font face="Courier New"&gt;                &lt;br /&gt;&lt;/font&gt;&lt;/strong&gt;&lt;/font&gt;&lt;font face="Courier New"&gt;&amp;#160;&amp;#160; )            &lt;br /&gt;&amp;#160;&amp;#160; ,[SalesAmountSum]             &lt;br /&gt;)&lt;/font&gt;&lt;/td&gt;     &lt;/tr&gt;   &lt;/tbody&gt;&lt;/table&gt;  &lt;p&gt;Again, the approach is the same in both cases, so both definitions are similar. However, in my opinion the DAX syntax is a bit harder to read in this case. The CALCULATE(…, ALLEXCEPT(…)) part in particular makes it harder to understand. In MDX, we rely on attribute relationships for this purpose, but in DAX we need to “break” the context manually.&lt;/p&gt;  &lt;p&gt;Now, let’s do some performance tests. 
In order to compare performance I used these queries&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;table border="1" cellspacing="0" cellpadding="2" width="703"&gt;&lt;tbody&gt;     &lt;tr&gt;       &lt;td valign="top" width="359"&gt;         &lt;p align="center"&gt;&lt;strong&gt;MDX&lt;/strong&gt;&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="342"&gt;         &lt;p align="center"&gt;&lt;strong&gt;DAX&lt;/strong&gt;&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="359"&gt;         &lt;p&gt;&lt;font face="Courier New"&gt;with              &lt;br /&gt;member SalesAmountAvg AS&amp;#160; &lt;br /&gt;&amp;#160; Avg(               &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; LASTPERIODS(&amp;#160; &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &lt;strong&gt;&lt;font color="#ff0000"&gt;30&lt;/font&gt;&lt;/strong&gt;               &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; , tail(               &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; descendants(               &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; [Date].[Calendar].currentmember               &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; ,[Date].[Calendar].[Date]),1               &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; ).item(0)               &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; )               &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; , [Measures].[Internet Sales Amount]               &lt;br /&gt;&amp;#160; )               &lt;br /&gt;&amp;#160;&lt;/font&gt;&lt;/p&gt;          &lt;p&gt;&lt;font face="Courier New"&gt;select              &lt;br /&gt;{               &lt;br /&gt;&amp;#160; [Measures].[Internet Sales Amount]               &lt;br /&gt;&amp;#160; , SalesAmountAvg               &lt;br /&gt;} on 0,               
&lt;br /&gt;descendants([Date].[Calendar].[All Periods],,LEAVES) on 1               &lt;br /&gt;              &lt;br /&gt;from [Adventure Works]&lt;/font&gt;&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="342"&gt;         &lt;p&gt;&lt;font face="Courier New"&gt;define              &lt;br /&gt;measure 'Internet Sales'[SalesAmountAvg] =               &lt;br /&gt;&amp;#160; AverageX(               &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; Summarize(               &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; datesinperiod('Date'[Date]&amp;#160; &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; , LastDate('Date'[Date]),&lt;strong&gt;&lt;font color="#ff0000"&gt;-30&lt;/font&gt;&lt;/strong&gt;,DAY)&amp;#160; &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; ,'Date'[Date]&amp;#160; &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; , &amp;quot;SalesAmountSum&amp;quot;               &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; , calculate(&amp;#160; &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; Sum(               &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; 'Internet Sales'[Sales Amount])               &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; ,ALLEXCEPT(               &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; 'Date','Date'[Date])               &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; )               &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; )               &lt;br /&gt;&amp;#160;&amp;#160; ,[SalesAmountSum]               &lt;br /&gt;)&lt;/font&gt;&lt;/p&gt;          &lt;p&gt;           &lt;br /&gt;&lt;font face="Courier New"&gt;evaluate (              &lt;br /&gt;&amp;#160; addcolumns(               &lt;br 
/&gt;&amp;#160;&amp;#160;&amp;#160; values('Date'[Date])               &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; ,&amp;quot;Internet Sales Amount&amp;quot;               &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; , SumX(relatedtable               &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; ('Internet Sales'),[Sales Amount])               &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; ,&amp;quot;SalesAmountAvg&amp;quot;,               &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; 'Internet Sales'[SalesAmountAvg]               &lt;br /&gt;&amp;#160; )               &lt;br /&gt;)&lt;/font&gt;&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;   &lt;/tbody&gt;&lt;/table&gt;  &lt;p&gt;Both queries return exactly the same results (you may add an “order by ‘Date’[Date]” at the end of the DAX query in order to have the dates returned in the same order as from the MDX query).&lt;/p&gt;  &lt;p&gt;For the MDX queries I cleared the cache before running the queries. I changed the number of days (number of days to include in the average, written bold, red in the queries above) and got the following results. For number of days = 0 I took out the calculation and left only the sales amount as aggregate. 
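&lt;/p&gt;  &lt;p&gt;The n=0 baseline queries are not shown above. As a minimal sketch, assuming the baseline is simply each of the queries above with the SalesAmountAvg member or measure removed (an assumption for illustration, not necessarily the exact text that was timed), they would look roughly like this:&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;/* MDX baseline (assumed): only the sales amount, no moving average */      &lt;br /&gt;select      &lt;br /&gt;{ [Measures].[Internet Sales Amount] } on 0,      &lt;br /&gt;descendants([Date].[Calendar].[All Periods],,LEAVES) on 1      &lt;br /&gt;from [Adventure Works]&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;/* DAX baseline (assumed): only the sales amount, no moving average */      &lt;br /&gt;evaluate (      &lt;br /&gt;&amp;#160; addcolumns(      &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; values('Date'[Date])      &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; ,&amp;quot;Internet Sales Amount&amp;quot;      &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; , SumX(relatedtable('Internet Sales'),[Sales Amount])      &lt;br /&gt;&amp;#160; )      &lt;br /&gt;)&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;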
Time was measured in seconds using SQL Server Management Studio (on a virtual machine, old hardware).&lt;/p&gt;  &lt;table border="1" cellspacing="0" cellpadding="2" width="501"&gt;&lt;tbody&gt;     &lt;tr&gt;       &lt;td valign="top" width="72"&gt;         &lt;p align="right"&gt;&lt;strong&gt;&amp;#160;&lt;/strong&gt;&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="71"&gt;         &lt;p align="right"&gt;&lt;strong&gt;n=0&lt;/strong&gt;&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="76"&gt;         &lt;p align="right"&gt;&lt;strong&gt;n=10&lt;/strong&gt;&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="77"&gt;         &lt;p align="right"&gt;&lt;strong&gt;n=30&lt;/strong&gt;&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="78"&gt;         &lt;p align="right"&gt;&lt;strong&gt;n=50&lt;/strong&gt;&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="66"&gt;         &lt;p align="right"&gt;&lt;strong&gt;n=100&lt;/strong&gt;&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="59"&gt;         &lt;p align="right"&gt;&lt;strong&gt;n=1000&lt;/strong&gt;&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="72"&gt;&lt;strong&gt;MDX&lt;/strong&gt;&lt;/td&gt;        &lt;td valign="top" width="71"&gt;         &lt;p align="right"&gt;1&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="76"&gt;         &lt;p align="right"&gt;3&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="77"&gt;         &lt;p align="right"&gt;3&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="78"&gt;         &lt;p align="right"&gt;3&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="66"&gt;         &lt;p align="right"&gt;3&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="59"&gt;         &lt;p align="right"&gt;4&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" 
width="72"&gt;&lt;strong&gt;DAX&lt;/strong&gt;&lt;/td&gt;        &lt;td valign="top" width="72"&gt;         &lt;p align="right"&gt;0&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="78"&gt;         &lt;p align="right"&gt;9&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="80"&gt;         &lt;p align="right"&gt;9&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="82"&gt;         &lt;p align="right"&gt;9&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="74"&gt;         &lt;p align="right"&gt;9&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="68"&gt;         &lt;p align="right"&gt;12&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;   &lt;/tbody&gt;&lt;/table&gt;  &lt;p&gt;&lt;a href="http://lh3.ggpht.com/-WjJJfaCHK3Q/TmODCwhAxsI/AAAAAAAAC0I/g31nP_5VRvE/s1600-h/t27.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="t2" border="0" alt="t2" src="http://lh6.ggpht.com/-ZrtqYaHwLtc/TmODD2jzuEI/AAAAAAAAC0M/pYDCXdJYk0o/t2_thumb3.png?imgmax=800" width="446" height="286" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;When looking at these results I was somewhat surprised, though not by the fact that the DAX query took longer to execute. Please keep in mind that I’m running the queries on an early preview of the product so I suppose there is still a lot of debugging and internal logging going on here. We will have to wait for the final product to make a comparison. What surprised me was that the DAX query time did not go up significantly with higher values of n. For the MDX engine I was pretty sure that it would behave this way because we have a mature and well-built cache behind it. 
So, although we’re increasing the number of computed cells dramatically (with higher values for n), the MDX query performance should be almost constant as we have a lot of overlapping calculations here. But the current DAX engine behaves in the same way, which shows how well it is implemented. This is a pretty good result and we can expect a lot of performance from this new DAX query engine.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4879885044062498968-5879602050157242047?l=ms-olap.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ms-olap.blogspot.com/feeds/5879602050157242047/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://ms-olap.blogspot.com/2011/09/moving-averages-in-dax-vs-mdx.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/5879602050157242047'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/5879602050157242047'/><link rel='alternate' type='text/html' href='http://ms-olap.blogspot.com/2011/09/moving-averages-in-dax-vs-mdx.html' title='Moving Averages in DAX vs. 
MDX'/><author><name>Hilmar Buchta</name><uri>http://www.blogger.com/profile/07497529483535542290</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://2.bp.blogspot.com/_rzeWIeWVsxA/ScDkLmopJCI/AAAAAAAABAg/sdlcn6fz96M/S220/Hilmar+20090307_Klein.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh6.ggpht.com/-SL33CdMjkUE/TmODCd3lrEI/AAAAAAAAC0E/bL2ly_eWCZM/s72-c/t1_thumb1.png?imgmax=800' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4879885044062498968.post-7013545170546531204</id><published>2011-08-21T15:26:00.001+02:00</published><updated>2011-08-21T15:26:49.318+02:00</updated><title type='text'>Moving Average in DAX / BISM Tabular</title><content type='html'>&lt;p align="right"&gt;SQL Server Denali | PowerPivot&lt;/p&gt;  &lt;p&gt;Alberto Ferrari already wrote about &lt;a href="http://sqlblog.com/blogs/alberto_ferrari/archive/2011/01/26/powerpivot-stocks-exchange-and-the-moving-average.aspx" target="_blank"&gt;calculating moving averages in DAX by using a calculated column&lt;/a&gt;. I’d like to present a different approach here by using a calculated measure. For the moving average I’m calculating a daily moving average (over the last 30 days) here.&lt;/p&gt;  &lt;p&gt;For my example, I’m using the PowerPivot workbook which can be downloaded as part of the SSAS Tabular Model Projects from the &lt;a href="http://msftdbprodsamples.codeplex.com/releases/view/55330" target="_blank"&gt;Denali CTP 3 samples&lt;/a&gt;.&lt;/p&gt;  &lt;p&gt;In this post, I’m developing the formula step by step. 
However, if you are in a hurry, you might want to jump directly to the &lt;strong&gt;final results&lt;/strong&gt; below.&lt;/p&gt;  &lt;p&gt;With calendar year 2003 on the filter, date on columns and sales amount (from table Internet Sales) in the details, the sample data looks like this:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh5.ggpht.com/--V1ZluW_4NA/TlEHIZ5G4RI/AAAAAAAACy0/ja1DrD6_tQU/s1600-h/t12.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="t1" border="0" alt="t1" src="http://lh5.ggpht.com/-rNVxdzvStdU/TlEHJRhylDI/AAAAAAAACy4/m_ad-iD7-7U/t1_thumb.png?imgmax=800" width="187" height="244" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;In each row’s context, the expression 'Date'[Date] gives the current context, i.e. the date for this row. But from a calculated measure we cannot refer to this expression (as there is no current row for the Date table). Instead, we have to use an expression like LastDate('Date'[Date]).&lt;/p&gt;  &lt;p&gt;So, in order to get the last thirty days we can use this expression:&lt;/p&gt; &lt;font face="Courier New"&gt;DatesInPeriod('Date'[Date],LastDate('Date'[Date]),-30,DAY)    &lt;br /&gt;&lt;/font&gt;  &lt;p&gt;We can now summarize our internet sales for each of those days by using the summarize function:&lt;/p&gt; &lt;font face="Courier New"&gt;Summarize(    &lt;br /&gt;&amp;#160; DatesInPeriod('Date'[Date],LastDate('Date'[Date]),-30,DAY)     &lt;br /&gt;&amp;#160; ,'Date'[Date]     &lt;br /&gt;&amp;#160; , &amp;quot;SalesAmountSum&amp;quot;     &lt;br /&gt;&amp;#160; , Sum('Internet Sales'[Sales Amount])     &lt;br /&gt;)&lt;/font&gt;   &lt;p&gt;And finally, we’re using the DAX function AverageX to compute the average of those 30 values:&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;Sales Amount (30d avg):=AverageX(      &lt;br /&gt;&amp;#160; Summarize(       
&lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; DatesInPeriod('Date'[Date],LastDate('Date'[Date]),-30,DAY)       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; ,'Date'[Date]       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; , &amp;quot;SalesAmountSum&amp;quot;       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; , Sum('Internet Sales'[Sales Amount])       &lt;br /&gt;&amp;#160; )       &lt;br /&gt;&amp;#160; ,[SalesAmountSum]       &lt;br /&gt;)&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;This is the calculation that we are using in our Internet Sales table as shown in the screenshot below:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-lAzEs7uI3-I/TlEHKbwDBLI/AAAAAAAACy8/TIxFuzYyLwM/s1600-h/t23.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="t2" border="0" alt="t2" src="http://lh4.ggpht.com/-1rSIJMb3nXY/TlEHLoEgJKI/AAAAAAAACzA/zd3mRyYlM4k/t2_thumb1.png?imgmax=800" width="414" height="300" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;When adding this calculation to the pivot table from above, the result looks like this:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh4.ggpht.com/-bLLTmUmYvpw/TlEHOWtVmEI/AAAAAAAACzE/3mJ43VhDkKo/s1600-h/image4.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh4.ggpht.com/-iRN_4unJUDw/TlEHQe0BJsI/AAAAAAAACzI/KFd8SLZXPUw/image_thumb2.png?imgmax=800" width="249" height="443" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;Looking at the result it seems that we don’t have any data prior to January 1, 2003: The first value for the moving average is identical to the day value (there are no rows before that date). 
The second value for the moving average is actually the average of the first two days and so on. This is not quite correct but I’ll get back to this problem in a moment. The screenshot shows the computation of the moving average at January 31 as the average of the daily values from January 2 to 31.&lt;/p&gt;  &lt;p&gt;Our calculated measure also works fine when filters are applied. In the following screenshot I used two product categories for the data series:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-7QgAoFU33so/TlEHVPQUF7I/AAAAAAAACzM/sbB9eia8AZA/s1600-h/t33.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="t3" border="0" alt="t3" src="http://lh6.ggpht.com/-q6lO5L7lros/TlEHXRgHMVI/AAAAAAAACzQ/6tflm-QioXY/t3_thumb1.png?imgmax=800" width="616" height="389" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;How does our calculated measure work on higher aggregation levels? In order to find out, I’m using the Calendar hierarchy on the rows (instead of the date). For simplicity I removed the semester and quarter levels using Excel’s pivot table options (Show/Hide fields option).&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh4.ggpht.com/-A6Flw0nQPZc/TlEHZVdRc-I/AAAAAAAACzU/y7oGern332k/s1600-h/t47.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="t4" border="0" alt="t4" src="http://lh5.ggpht.com/-2n8Gpd7Di20/TlEHbCw7rYI/AAAAAAAACzY/LeTnYanjrgE/t4_thumb3.png?imgmax=800" width="404" height="326" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;As you can see, the calculation still works fine. Here, the monthly aggregate is the moving average for the last day of the specific month. 
You can see this clearly for January (the value of 14,215.01 also appears in the screenshot above as the value for January 31). If this was the business requirement (which sounds reasonable for a daily average), then the aggregation works fine on a monthly level (otherwise we will have to fine-tune our calculation, which will be the topic of an upcoming post).&lt;/p&gt;  &lt;p&gt;But although the aggregation makes sense on a monthly level, if we expand this view to the day level you’ll see that our calculated measure simply returns the sales amount for that day, not the average of the last 30 days anymore:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh4.ggpht.com/-efl5IDjK9Pc/TlEHb_LDGGI/AAAAAAAACzc/c4zGEiZhJKE/s1600-h/t53.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="t5" border="0" alt="t5" src="http://lh6.ggpht.com/-MZ26YRyJCtg/TlEHdLdMduI/AAAAAAAACzg/8DgKsLtNg44/t5_thumb1.png?imgmax=800" width="327" height="425" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;How can this be? 
The problem results from the context in which we calculate our sum, as highlighted in the following code:&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;Sales Amount (30d avg):=AverageX(      &lt;br /&gt;&amp;#160; Summarize(       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; datesinperiod('Date'[Date],LastDate('Date'[Date]),-30,DAY)       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; ,'Date'[Date]       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; , &amp;quot;SalesAmountSum&amp;quot;       &lt;br /&gt;&lt;font style="background-color: #ffff00"&gt;&amp;#160;&amp;#160;&amp;#160; , Sum('Internet Sales'[Sales Amount])&lt;/font&gt;       &lt;br /&gt;&amp;#160; )       &lt;br /&gt;&amp;#160; &lt;/font&gt;&lt;font face="Courier New"&gt;,[SalesAmountSum]      &lt;br /&gt;)&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;Since we evaluate this expression over the given date period, the only context that is overwritten here is 'Date'[Date]. In our hierarchy we’re using different attributes from our dimension (Calendar Year, Month and Day Of Month). As this context is still present, the calculation is also filtered by those attributes. And this explains why the current day’s context is still present for each line. 
To get things clear, as long as we evaluate this expression outside of a date context, everything is fine as the following DAX query shows when being executed by Management Studio on the Internet Sales perspective of our model (using the tabular database with the same data):&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;evaluate (      &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; Summarize(       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; datesinperiod('Date'[Date],date(2003,1,1),-5,DAY)       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; ,'Date'[Date]       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; , &amp;quot;SalesAmountSum&amp;quot;       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; , Sum('Internet Sales'[Sales Amount])       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; )       &lt;br /&gt;)       &lt;br /&gt;&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;Here, I reduced the time period to 5 days and also set a fixed date as LastDate(…) would result in the last date of my date dimension table for which no data is present in the sample data. Here is the result from the query:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh5.ggpht.com/-v-05pYYhHUM/TlEHd0JgEdI/AAAAAAAACzk/zoRDNM8tZWI/s1600-h/t62.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="t6" border="0" alt="t6" src="http://lh6.ggpht.com/-vZe909atTAk/TlEHe8D187I/AAAAAAAACzo/1U3k8heLHyQ/t6_thumb.png?imgmax=800" width="214" height="140" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;However, after setting a filter to 2003, no data rows outside of 2003 will be included in the sum. This explains the remark above: It looked like we only have data starting from January 1, 2003. 
And now we know why: the year 2003 was in the filter (as you can see in the very first screenshot of this post) and was therefore present when calculating the sum. Now all we have to do is get rid of those additional filters, because we’re already filtering our results by Date. The easiest way to do so is to use the Calculate function and apply ALL(…) to all attributes for which we want to remove the filter. As we have several of those attributes (Year, Month, Day, Weekday, …) and we want to remove the filter from all of them except the date attribute, the shortcut function ALLEXCEPT is very useful here.&lt;/p&gt;  &lt;p&gt;If you have an MDX background, you may wonder why we don’t get a similar problem when using SSAS in OLAP mode (BISM Multidimensional). The reason is that our OLAP database has attribute relationships, so after setting the date (key) attribute, the other attributes are &lt;em&gt;automatically&lt;/em&gt; changed too and we don’t have to take care of this (see my &lt;a href="http://ms-olap.blogspot.com/2010/03/effects-of-attribute-relationship.html" target="_blank"&gt;post here&lt;/a&gt;). 
But in the tabular model we don’t have attribute relationships (not even a true key attribute) and therefore we need to eliminate unwanted filters from our calculations.&lt;/p&gt;  &lt;p&gt;So here we are with the …&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;u&gt;Final results&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;Sales Amount (30d avg):=&lt;/font&gt;&lt;font face="Courier New"&gt;AverageX(      &lt;br /&gt;&amp;#160; Summarize(       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; datesinperiod('Date'[Date],LastDate('Date'[Date]),-30,DAY)       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; ,'Date'[Date]       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; , &amp;quot;SalesAmountSum&amp;quot;       &lt;br /&gt;&lt;strong&gt;&lt;font color="#ff0000"&gt;&amp;#160;&amp;#160;&amp;#160; , calculate(Sum('Internet Sales'[Sales Amount]), ALLEXCEPT('Date','Date'[Date]))          &lt;br /&gt;&lt;/font&gt;&lt;/strong&gt;&amp;#160; )       &lt;br /&gt;,[SalesAmountSum]       &lt;br /&gt;)&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;And this is our final pivot table in Excel:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh4.ggpht.com/-kAy9ZQBmaG8/TlEHhqW4wXI/AAAAAAAACzs/t54zplaQKJM/s1600-h/t74.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="t7" border="0" alt="t7" src="http://lh5.ggpht.com/-zrAPqOecQDo/TlEHjUcOJiI/AAAAAAAACzw/R3vD61_oZnA/t7_thumb2.png?imgmax=800" width="301" height="549" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;To illustrate the moving average, here is the same extract of data in a chart view (Excel):&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh4.ggpht.com/-vS64mDNa62Y/TlEHkyffHxI/AAAAAAAACz0/EcHb6-8JTGg/s1600-h/t83.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 
0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="t8" border="0" alt="t8" src="http://lh6.ggpht.com/-Y3G3s6P7G8g/TlEHl6LRAKI/AAAAAAAACz4/Tmd99n5Skx0/t8_thumb1.png?imgmax=800" width="674" height="311" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;Although we filtered our data on 2003 the moving average for the first 29 days of 2003 correctly takes the corresponding days of 2002 into account. You will recognize the values for January 30 and 31 from our first approach as these were the first days for which our first calculation had a sufficient amount of data (full 30 days).&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4879885044062498968-7013545170546531204?l=ms-olap.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ms-olap.blogspot.com/feeds/7013545170546531204/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://ms-olap.blogspot.com/2011/08/moving-average-in-dax-bism-tabular.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/7013545170546531204'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/7013545170546531204'/><link rel='alternate' type='text/html' href='http://ms-olap.blogspot.com/2011/08/moving-average-in-dax-bism-tabular.html' title='Moving Average in DAX / BISM Tabular'/><author><name>Hilmar Buchta</name><uri>http://www.blogger.com/profile/07497529483535542290</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://2.bp.blogspot.com/_rzeWIeWVsxA/ScDkLmopJCI/AAAAAAAABAg/sdlcn6fz96M/S220/Hilmar+20090307_Klein.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' 
url='http://lh5.ggpht.com/-rNVxdzvStdU/TlEHJRhylDI/AAAAAAAACy4/m_ad-iD7-7U/s72-c/t1_thumb.png?imgmax=800' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4879885044062498968.post-7146308963125318456</id><published>2011-08-13T17:17:00.001+02:00</published><updated>2011-08-13T17:17:29.061+02:00</updated><title type='text'>Semi additive measures in DAX / BISM Tabular</title><content type='html'>&lt;p align="right"&gt;SQL Server Denali | PowerPivot&lt;/p&gt;  &lt;p&gt;Semi additive measures, i.e. measures that have to be aggregated differently over different dimensions, are commonly used in BI solutions. One example could be stock levels. Of course we don’t want to sum them up over time, but only over product, location etc. Over time, a different aggregation is used, for example the average or the last value.&lt;/p&gt;  &lt;p&gt;The following example shows how to implement some of the most commonly used semi additive measures in DAX.&lt;/p&gt;  &lt;p&gt;In my example I’m using PowerPivot (Denali edition), but the same calculations can be used in a BISM Tabular model in Visual Studio.&lt;/p&gt;  &lt;p&gt;In order to keep things simple, I’m using just a short table of test data:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh5.ggpht.com/-p6xSwMNQ6Lc/TkaVS4hnepI/AAAAAAAACx8/4gyZcI7xkyw/s1600-h/p18.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="p1" border="0" alt="p1" src="http://lh4.ggpht.com/-YpKkqqU90pU/TkaVUYg4m8I/AAAAAAAACyA/b_vxhgnvI9U/p1_thumb4.png?imgmax=800" width="222" height="450" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;As you can see, we only have two products with monthly stock levels in 2010 and 2011.&lt;/p&gt;  &lt;p&gt;Although not needed for my semi additive measures, I created additional columns in my PowerPivot sheet for convenience: Year, Month, 
Day (using the corresponding DAX functions of the same names). I also set the newly created columns, as well as the Stocklevel column, to hidden (it makes no sense to sum up the stock level). Although the date information is kept in the same table as the data to keep things simple for this example, I encourage you to build a separate date dimension table (the same idea as a date dimension in a multidimensional model).&lt;/p&gt;  &lt;p&gt;Finally, I created a hierarchy named ‘Calendar’ on my newly created date columns:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh3.ggpht.com/-z7cQdU_c8KY/TkaVVOhMgjI/AAAAAAAACyE/Fivz2yqvkqA/s1600-h/p113.png"&gt;&lt;img style="display: inline" title="p1" alt="p1" src="http://lh5.ggpht.com/-TcwRnTxy19U/TkaVWI7exeI/AAAAAAAACyI/SxAxVsxwuu8/p1_thumb7.png?imgmax=800" width="369" height="326" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;Now we’re ready for the semi additive measures. &lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;u&gt;Average (over time)&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;Let’s start with an easy one, the average over time. Since we can easily compute the distinct count of our date values, we can simply add up the stock level and divide it by the distinct count. In my example the formula looks like this:&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;Avg Stock:=Sum([Stocklevel])/DISTINCTCOUNT([Date])&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;u&gt;Last value (over time)&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;In order to compute the last value, the DAX function LASTDATE comes in handy. Here is the formula:&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;Last Stock:=CALCULATE(SUM([Stocklevel]),LASTDATE('Stock'[Date]))&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;u&gt;Min/Max value (over time)&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;For min/max, things get a little trickier. 
In the approach I’m showing here, I’m grouping the table by date using the SUMMARIZE function and the SUM aggregation. Then I’m using the function MINX or MAXX to find the minimal or maximal value.&lt;/p&gt;  &lt;p&gt;Here are the two formulas:&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;Min Stock:=MINX(SUMMARIZE('Stock','Stock'[Date],&amp;quot;SumByDate&amp;quot;,SUM('Stock'[Stocklevel])),[SumByDate])&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;Max Stock:=MAXX(SUMMARIZE('Stock','Stock'[Date],&amp;quot;SumByDate&amp;quot;,SUM('Stock'[Stocklevel])),[SumByDate])&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;To understand these formulas, it helps to restore the PowerPivot workbook to an SSAS server in tabular mode. After doing so, we can show the result of the inner SUMMARIZE function using this DAX query:&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;evaluate(      &lt;br /&gt;SUMMARIZE('Stock','Stock'[Date],&amp;quot;SumByDate&amp;quot;,SUM('Stock'[Stocklevel]))       &lt;br /&gt;)&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;Here’s the result:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh4.ggpht.com/-fE4dCP_728M/TkaVYNShqSI/AAAAAAAACyM/_j9bHsI7F9I/s1600-h/t45.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="t4" border="0" alt="t4" src="http://lh6.ggpht.com/-B_54HmEr7d0/TkaVY9EUl1I/AAAAAAAACyQ/MUamjPyME3I/t4_thumb1.png?imgmax=800" width="150" height="244" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;The MinX or MaxX function simply takes the lowest/highest value from this table.&lt;/p&gt;  &lt;p&gt;Now let’s see how this looks in Excel. 
The following screenshot shows the calculations in my PowerPivot sheet:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh5.ggpht.com/-JdAjgShwksk/TkaVaNJygKI/AAAAAAAACyU/CBFZYw8fUKs/s1600-h/t24.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="t2" border="0" alt="t2" src="http://lh6.ggpht.com/-sA4g8UUY49A/TkaVcQTmNdI/AAAAAAAACyY/0pY2ILmkTWo/t2_thumb2.png?imgmax=800" width="600" height="242" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;Here’s the result in Excel&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh4.ggpht.com/-mQ6QHHgNzZE/TkaVddCkahI/AAAAAAAACyc/rsBrteDp2GM/s1600-h/t18.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="t1" border="0" alt="t1" src="http://lh6.ggpht.com/-jlmJwG7LKAw/TkaVerIlU-I/AAAAAAAACyg/CBvvna6YKIA/t1_thumb4.png?imgmax=800" width="294" height="343" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;And of course, the aggregations also work correctly when filtering the data as shown below (single select on product and multi select on months):&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh5.ggpht.com/-4NOjBTMDdFY/TkaVfQPBpzI/AAAAAAAACyk/JdiydwprEkk/s1600-h/t32.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="t3" border="0" alt="t3" src="http://lh5.ggpht.com/-79Q9yZ6oX9c/TkaVge9N5iI/AAAAAAAACyo/lLE_2Zjnyzc/t3_thumb.png?imgmax=800" width="244" height="127" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;Another cool feature is that besides DAX we can still use standard MDX to query our SSAS tabular model, for example:&lt;/p&gt;  &lt;p&gt;&lt;font 
face="Courier New"&gt;select      &lt;br /&gt;&lt;/font&gt;&lt;font face="Courier New"&gt;{[Measures].[Avg Stock],[Measures].[Last Stock],      &lt;br /&gt;&lt;/font&gt;&lt;font face="Courier New"&gt;[Measures].[Min Stock],[Measures].[Max Stock]} on 0,      &lt;br /&gt;&lt;/font&gt;&lt;font face="Courier New"&gt;[Stock].[Calendar].[Year] on 1      &lt;br /&gt;&lt;/font&gt;&lt;font face="Courier New"&gt;from [Model]&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh5.ggpht.com/-VDL2M7dvEfo/TkaVhKJelhI/AAAAAAAACys/qcbaqajE82M/s1600-h/t54.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="t5" border="0" alt="t5" src="http://lh3.ggpht.com/-wJbFOzq8-ic/TkaVh0IFgzI/AAAAAAAACyw/pHcDMVXYYqs/t5_thumb2.png?imgmax=800" width="347" height="87" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;A final word about empty (missing) rows: the calculations above need an explicit value of zero to indicate that there is no stock in that month. 
If the value is left blank (no source data row at all), the month itself is treated as missing (interpretation more like we didn’t have this product in our portfolio at all).&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4879885044062498968-7146308963125318456?l=ms-olap.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ms-olap.blogspot.com/feeds/7146308963125318456/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://ms-olap.blogspot.com/2011/08/semi-additive-measures-in-dax-bism.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/7146308963125318456'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/7146308963125318456'/><link rel='alternate' type='text/html' href='http://ms-olap.blogspot.com/2011/08/semi-additive-measures-in-dax-bism.html' title='Semi additive measures in DAX / BISM Tabular'/><author><name>Hilmar Buchta</name><uri>http://www.blogger.com/profile/07497529483535542290</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://2.bp.blogspot.com/_rzeWIeWVsxA/ScDkLmopJCI/AAAAAAAABAg/sdlcn6fz96M/S220/Hilmar+20090307_Klein.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh4.ggpht.com/-YpKkqqU90pU/TkaVUYg4m8I/AAAAAAAACyA/b_vxhgnvI9U/s72-c/p1_thumb4.png?imgmax=800' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4879885044062498968.post-1031599316736775559</id><published>2011-07-30T13:30:00.001+02:00</published><updated>2011-07-30T13:30:30.949+02:00</updated><title type='text'>Parallel hierarchies in a parent-child structure</title><content type='html'>&lt;p 
align="right"&gt;SQL Server 2005 | SQL Server 2008 | SQL Server 2008R2&lt;/p&gt;  &lt;p&gt;This post is about a problem I faced some years ago. The source system was SAP with user defined hierarchies, in this case within the cost center and cost type tables. Parallel hierarchies are well supported in SQL Server BI but in this case, users in SAP could define multiple hierarchies on their own and they wanted these hierarchies to be also available in the OLAP cube. For example, costs associated with the cost center 1000 should be analyzed as shown below:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh3.ggpht.com/-YsrOxlDvL4I/TjPrN8dE8dI/AAAAAAAACwA/2KQeQj1QyBI/s1600-h/hierarchy28.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="hierarchy" border="0" alt="hierarchy" src="http://lh5.ggpht.com/-jZ8qYk9HgD8/TjPrOxN6rmI/AAAAAAAACwE/gGTOyw0VCgE/hierarchy_thumb14.png?imgmax=800" width="344" height="224" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;All costs that are booked on cost center 1000 have to appear in the hierarchy as shown in the sketch. And end-users may also be able to create new hierarchies (for example to analyze a certain project). Of course there may be better ways to model this but in this case we had basically two tables for the cost centers:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;table CC (Cost Center) &lt;/li&gt;    &lt;li&gt;table CCG (Cost Center Group) &lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;Table CC contains all cost centers (for example the above cost center 1000) together with additional information (like name, responsible person etc.) while table CCG contains the hierarchy. 
In CCG we basically find two columns:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;Node name &lt;/li&gt;    &lt;li&gt;Parent node name &lt;/li&gt; &lt;/ul&gt;  &lt;table border="1" cellspacing="0" cellpadding="2" width="340"&gt;&lt;tbody&gt;     &lt;tr&gt;       &lt;td valign="top" width="171"&gt;&lt;strong&gt;Node&lt;/strong&gt;&lt;/td&gt;        &lt;td valign="top" width="167"&gt;&lt;strong&gt;Parent Node&lt;/strong&gt;&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="171"&gt;1000&lt;/td&gt;        &lt;td valign="top" width="167"&gt;Internal_HR_DE&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="171"&gt;Internal_HR_DE&lt;/td&gt;        &lt;td valign="top" width="167"&gt;HR_DE&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="171"&gt;HR_DE&lt;/td&gt;        &lt;td valign="top" width="167"&gt;Germany&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="171"&gt;1000&lt;/td&gt;        &lt;td valign="top" width="167"&gt;Marketing_DE&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="171"&gt;Marketing_DE&lt;/td&gt;        &lt;td valign="top" width="167"&gt;Germany&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="171"&gt;Germany&lt;/td&gt;        &lt;td valign="top" width="167"&gt;Corporate&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="171"&gt;Marketing_DE&lt;/td&gt;        &lt;td valign="top" width="167"&gt;Marketing&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="171"&gt;Marketing&lt;/td&gt;        &lt;td valign="top" width="167"&gt;Corporate&lt;/td&gt;     &lt;/tr&gt;   &lt;/tbody&gt;&lt;/table&gt;  &lt;p&gt;Facts (in this case actual or planned costs) are associated with the cost center number (for example 1000). Usually, a parent-child hierarchy is the natural choice for such a dynamic structure, where we do not know the number of levels. 
However, parent-child may only be used if each node has at most one parent. But here we find the cost center 1000 having two parents (Internal HR Costs DE and Marketing_DE). The same situation exists with the Marketing_DE node (having parents Marketing and Germany).&lt;/p&gt;  &lt;p&gt;The solution I’m presenting here is to create additional nodes until each node only has one parent. This is possible as each node of a parent-child hierarchy in SSAS has a name and a key property. So, the name will be identical, while the key will be different. In order to show the process, let’s add internal keys to each of the hierarchy elements.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-XZoFcrq0kZU/TjPrPqos9xI/AAAAAAAACwI/F3bpv5H2Tlg/s1600-h/hierarchy18.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="hierarchy1" border="0" alt="hierarchy1" src="http://lh4.ggpht.com/-zBSyeeq9j8g/TjPrQcI322I/AAAAAAAACwM/GGwYEhgr0HY/hierarchy1_thumb4.png?imgmax=800" width="322" height="248" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;What we have to do now is to create additional nodes for every node that has more than one parent. Let’s start with the ‘Marketing_DE’ node:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh3.ggpht.com/-1_H9JJ8vLrE/TjPrRP5rzqI/AAAAAAAACwQ/DPKaY-W4Rb4/s1600-h/hierarchy210.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="hierarchy2" border="0" alt="hierarchy2" src="http://lh5.ggpht.com/-663wHTzADzQ/TjPrSFiJghI/AAAAAAAACwU/hO9taxmy-xY/hierarchy2_thumb6.png?imgmax=800" width="402" height="226" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;The additional node gets a new (internal) key, in this example the number 8. 
But there is still a node with multiple parents: the cost center 1000. Let’s also transform this into separate nodes:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh4.ggpht.com/-NrWuGiMxb3c/TjPrS1XlFGI/AAAAAAAACwY/ooEqgSBfr7c/s1600-h/hierarchy33.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="hierarchy3" border="0" alt="hierarchy3" src="http://lh4.ggpht.com/-1jw6mEZhiwQ/TjPrT5TliWI/AAAAAAAACwc/MEHfSOQXiAI/hierarchy3_thumb1.png?imgmax=800" width="400" height="217" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;After this step, each node has at most one parent and therefore the structure can be modeled as an SSAS parent-child hierarchy.&amp;#160; &lt;/p&gt;  &lt;p&gt;But now, we have to think about the fact rows. Without the hierarchy, facts would have been associated to the cost center by using the internal key (surrogate key), so for example 1000 € that are booked on cost center 1000 would appear in the fact table like&lt;/p&gt;  &lt;table border="1" cellspacing="0" cellpadding="2" width="500"&gt;&lt;tbody&gt;     &lt;tr&gt;       &lt;td valign="top" width="125"&gt;&lt;strong&gt;DateKey&lt;/strong&gt;&lt;/td&gt;        &lt;td valign="top" width="125"&gt;&lt;strong&gt;CostCenterKey&lt;/strong&gt;&lt;/td&gt;        &lt;td valign="top" width="125"&gt;&lt;strong&gt;…&lt;/strong&gt;&lt;/td&gt;        &lt;td valign="top" width="125"&gt;&lt;strong&gt;Amount&lt;/strong&gt;&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="125"&gt;…&lt;/td&gt;        &lt;td valign="top" width="125"&gt;…&lt;/td&gt;        &lt;td valign="top" width="125"&gt;…&lt;/td&gt;        &lt;td valign="top" width="125"&gt;…&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="125"&gt;20110630&lt;/td&gt;        &lt;td valign="top" width="125"&gt;1&lt;/td&gt;        &lt;td valign="top" 
width="125"&gt;…&lt;/td&gt;        &lt;td valign="top" width="125"&gt;1000&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="125"&gt;…&lt;/td&gt;        &lt;td valign="top" width="125"&gt;…&lt;/td&gt;        &lt;td valign="top" width="125"&gt;…&lt;/td&gt;        &lt;td valign="top" width="125"&gt;…&lt;/td&gt;     &lt;/tr&gt;   &lt;/tbody&gt;&lt;/table&gt;  &lt;p&gt;But now we have to associate this single fact row with three rows in the dimension table (as cost center 1000 now appears three times). Therefore we have to use a many-to-many approach, so we add another table, a so-called bridge table, with the following rows:&lt;/p&gt;  &lt;table border="1" cellspacing="0" cellpadding="2" width="290"&gt;&lt;tbody&gt;     &lt;tr&gt;       &lt;td valign="top" width="136"&gt;&lt;strong&gt;CostCenterKey&lt;/strong&gt;&lt;/td&gt;        &lt;td valign="top" width="152"&gt;&lt;strong&gt;CostCenterDimKey&lt;/strong&gt;&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="136"&gt;…&lt;/td&gt;        &lt;td valign="top" width="152"&gt;…&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="136"&gt;1&lt;/td&gt;        &lt;td valign="top" width="152"&gt;1&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="136"&gt;1&lt;/td&gt;        &lt;td valign="top" width="152"&gt;9&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="136"&gt;1&lt;/td&gt;        &lt;td valign="top" width="152"&gt;10&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="136"&gt;…&lt;/td&gt;        &lt;td valign="top" width="152"&gt;…&lt;/td&gt;     &lt;/tr&gt;   &lt;/tbody&gt;&lt;/table&gt;  &lt;p&gt;For technical reasons, our fact table has to be linked to a dimension (of flat cost centers), which is also used by the bridge table. 
This is shown in the following image:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-N5q4lteOLBw/TjPrUXrx93I/AAAAAAAACwg/SZqJlKCgDuA/s1600-h/hierarchy43.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="hierarchy4" border="0" alt="hierarchy4" src="http://lh3.ggpht.com/-qThR18LoLlY/TjPrVQ2uuAI/AAAAAAAACwk/SB2PWkGqv7M/hierarchy4_thumb1.png?imgmax=800" width="486" height="114" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;The most difficult part here is to “normalize” the parent-child structure. One way to do this is to use a stored procedure. Here is the code I used. Within this procedure, the following tables are used:&lt;/p&gt;  &lt;table border="1" cellspacing="0" cellpadding="2" width="658"&gt;&lt;tbody&gt;     &lt;tr&gt;       &lt;td valign="top" width="250"&gt;masterdata.Costcenter&lt;/td&gt;        &lt;td valign="top" width="406"&gt;the flat table of cost centers (only leaf-level). 
The key field is the cost center number (for example 1000 for our cost center from above)&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="250"&gt;masterdata.CostcenterGroup&lt;/td&gt;        &lt;td valign="top" width="406"&gt;the hierarchy structure as shown above&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="250"&gt;ods.CostcenterGroupExpanded&lt;/td&gt;        &lt;td valign="top" width="406"&gt;Output table: the expanded tree containing the fields of the table masterdata.Costcenter plus the following additional fields:          &lt;br /&gt;          &lt;br /&gt;          &lt;table border="1" cellspacing="0" cellpadding="2" width="401"&gt;&lt;tbody&gt;             &lt;tr&gt;               &lt;td valign="top" width="111"&gt;CostcenterKey&lt;/td&gt;                &lt;td valign="top" width="289"&gt;the new generated surrogate key&lt;/td&gt;             &lt;/tr&gt;              &lt;tr&gt;               &lt;td valign="top" width="111"&gt;ParentKey&lt;/td&gt;                &lt;td valign="top" width="289"&gt;the key of the parent node&lt;/td&gt;             &lt;/tr&gt;              &lt;tr&gt;               &lt;td valign="top" width="111"&gt;Level&lt;/td&gt;                &lt;td valign="top" width="289"&gt;a technical field used during iteration&lt;/td&gt;             &lt;/tr&gt;           &lt;/tbody&gt;&lt;/table&gt;       &lt;/td&gt;     &lt;/tr&gt;   &lt;/tbody&gt;&lt;/table&gt;  &lt;p&gt;Here is the code:&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;CREATE PROCEDURE [dbo].[ExpandCostcenterGroup]      &lt;br /&gt;AS       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; SET NOCOUNT ON       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; truncate table ods.costcenterGroupExpanded       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; declare @level int       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; declare @affectedrows int       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; 
declare @totalrowcount int       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; set @level=0       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; insert into ods.costcenterGroupExpanded(costcentergroup,Parentgroup,Description1,Description2,Responsibility,AccountingArea)&amp;#160;&amp;#160;&amp;#160; &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160; Select distinct costcenterGroup, Parent,Description1,Description2,Responsibility,AccountingArea from masterdata.costcentergroup       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; -- Initialize all keys&amp;#160;&amp;#160;&amp;#160; &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; update ods.costcenterGroupExpanded       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; set ParentKey=(select min(costcenterkey) from ods.costcenterGroupExpanded where costcenterGroup=c.Parentgroup)&amp;#160;&amp;#160;&amp;#160; &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; from ods.costcenterGroupExpanded as c       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; where not c.ParentGroup is null&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; set @affectedrows=1&amp;#160;&amp;#160;&amp;#160; &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; while @affectedrows&amp;gt;0       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; begin       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; Set @level=@level+1       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; set @totalrowcount=(select Count(*) from ods.costcenterGroupExpanded)       &lt;br 
/&gt;&amp;#160;&amp;#160;&amp;#160; &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; insert into&amp;#160; ods.costcenterGroupExpanded(costcentergroup,Parentgroup,ParentKey,&amp;quot;level&amp;quot;,Description1,Description2,Responsibility,AccountingArea)       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; select distinct cparent.costcentergroup, cparent.Parentgroup,cparent.costcenterKey,@level,cparent.Description1,cparent.Description2,cparent.Responsibility,cparent.AccountingArea       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; from ods.costcenterGroupExpanded as cparent inner join ods.costcenterGroupExpanded as cchild on       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; cparent.Parentgroup=cchild.costcenterGroup       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; where cparent.ParentKey!=cchild.costcenterKey       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; and cchild.&amp;quot;Level&amp;quot;=@level-1       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; set @affectedrows=@@rowcount       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; end       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; return&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;GO&lt;/font&gt;     &lt;br /&gt;&lt;/p&gt;  &lt;p&gt;To keep things simple, I truncate the output table CostcenterGroupExpanded here. However, there is a drawback with this approach: The surrogate keys may change after changes of the imported source tables. This will result in a problem for example for the Excel users. 
If you’re using filters like ‘show this element only’, only the key is stored.&lt;/p&gt;  &lt;p&gt;In order to avoid this you will need to store the mapping and the assigned surrogate key separately. Here it is necessary not only to store the combination of cost center/parent cost center/surrogate key but the whole branch up to the root instead. If you look at the example above you will find two entries of ‘Cost Center 1000’ –&amp;gt; ‘Marketing_DE’, so this is not unique. You have to store the full path up to the root for each node (not only for the leaf-nodes) to make it unique:&lt;/p&gt;  &lt;table border="1" cellspacing="0" cellpadding="2" width="525"&gt;&lt;tbody&gt;     &lt;tr&gt;       &lt;td valign="top" width="384"&gt;&lt;strong&gt;Path&lt;/strong&gt;&lt;/td&gt;        &lt;td valign="top" width="139"&gt;&lt;strong&gt;Given Surrogate Key&lt;/strong&gt;&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="384"&gt;1000 –&amp;gt; Internal_HR_DE –&amp;gt; HR_DE –&amp;gt; Germany –&amp;gt; Corporate&lt;/td&gt;        &lt;td valign="top" width="139"&gt;1&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="384"&gt;1000 –&amp;gt; Marketing_DE –&amp;gt; Germany –&amp;gt; Corporate&lt;/td&gt;        &lt;td valign="top" width="139"&gt;9&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="384"&gt;1000 –&amp;gt; Marketing_DE –&amp;gt; Marketing –&amp;gt; Corporate&lt;/td&gt;        &lt;td valign="top" width="139"&gt;10&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="384"&gt;Marketing_DE –&amp;gt; Germany –&amp;gt; Corporate&lt;/td&gt;        &lt;td valign="top" width="139"&gt;7&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="384"&gt;Marketing_DE –&amp;gt; Marketing –&amp;gt; Corporate&lt;/td&gt;        &lt;td valign="top" width="139"&gt;8&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="384"&gt;…&lt;/td&gt;        
&lt;td valign="top" width="139"&gt;&amp;#160;&lt;/td&gt;     &lt;/tr&gt;   &lt;/tbody&gt;&lt;/table&gt;  &lt;p&gt;To store the full path up to the root level, I recommend using a hash code (MD5 for example), as this is easier to handle than a long list of node names. In this case our additional key store table would look like this:&lt;/p&gt;  &lt;table border="1" cellspacing="0" cellpadding="2" width="527"&gt;&lt;tbody&gt;     &lt;tr&gt;       &lt;td valign="top" width="379"&gt;&lt;strong&gt;PathMD5&lt;/strong&gt;&lt;/td&gt;        &lt;td valign="top" width="146"&gt;&lt;strong&gt;Given Surrogate Key&lt;/strong&gt;&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="379"&gt;417913d10ef49f5ff90db9db9f3d2569&lt;/td&gt;        &lt;td valign="top" width="146"&gt;1&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="379"&gt;8e27be6b156a52016e01dc049bc39126&lt;/td&gt;        &lt;td valign="top" width="146"&gt;9&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="379"&gt;52b1bcaec016e09d4086f37e63814aa5&lt;/td&gt;        &lt;td valign="top" width="146"&gt;10&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="379"&gt;…&lt;/td&gt;        &lt;td valign="top" width="146"&gt;&amp;#160;&lt;/td&gt;     &lt;/tr&gt;   &lt;/tbody&gt;&lt;/table&gt;  &lt;p&gt;The sample code above does not manage this key store table, so the keys may change a lot on each load. 
But for practical purposes you will have to add this key management to make sure the same node always gets the same key.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4879885044062498968-1031599316736775559?l=ms-olap.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ms-olap.blogspot.com/feeds/1031599316736775559/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://ms-olap.blogspot.com/2011/07/parallel-hierarchies-in-parent-child.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/1031599316736775559'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/1031599316736775559'/><link rel='alternate' type='text/html' href='http://ms-olap.blogspot.com/2011/07/parallel-hierarchies-in-parent-child.html' title='Parallel hierarchies in a parent-child structure'/><author><name>Hilmar Buchta</name><uri>http://www.blogger.com/profile/07497529483535542290</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://2.bp.blogspot.com/_rzeWIeWVsxA/ScDkLmopJCI/AAAAAAAABAg/sdlcn6fz96M/S220/Hilmar+20090307_Klein.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh5.ggpht.com/-jZ8qYk9HgD8/TjPrOxN6rmI/AAAAAAAACwE/gGTOyw0VCgE/s72-c/hierarchy_thumb14.png?imgmax=800' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4879885044062498968.post-5847695325952291857</id><published>2011-07-03T17:43:00.000+02:00</published><updated>2011-07-13T17:47:40.700+02:00</updated><title type='text'>Same measure in different granularity</title><content type='html'>&lt;p align="right"&gt;SQL Server 2005 | SQL Server 2008 
| SQL Server 2008R2&lt;/p&gt;  &lt;p&gt;In an &lt;a href="http://ms-olap.blogspot.com/2010/01/different-granularity-in-single.html"&gt;earlier post&lt;/a&gt; I wrote about handling different granularity in a dimension-fact relationship. This time I want to get back to this topic from the end-user perspective. To illustrate this, I use a very simple data model here:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh4.ggpht.com/-h7_2-XFzs6o/Th29zh2AuHI/AAAAAAAACKw/_7zZh4fMTqg/s1600-h/Unbenannt5_thumb3%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="Unbenannt5_thumb3" border="0" alt="Unbenannt5_thumb3" src="http://lh3.ggpht.com/-1o-LouZFidc/Th290QokpII/AAAAAAAACK0/Zf5lxiOt_kY/Unbenannt5_thumb3_thumb.png?imgmax=800" width="192" height="244" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;For this model, we have daily revenue data and monthly revenue plan data. Here is how the measure groups are linked to the dimensions:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh3.ggpht.com/-377CnpyLghI/Th2904J2dTI/AAAAAAAACK4/J6DIPFrTBKg/s1600-h/Unbenannt6_thumb1%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="Unbenannt6_thumb1" border="0" alt="Unbenannt6_thumb1" src="http://lh3.ggpht.com/-TtIjgPIJIOA/Th291labLMI/AAAAAAAACK8/Fu2EzNX95BQ/Unbenannt6_thumb1_thumb.png?imgmax=800" width="484" height="130" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;In this example, each measure group has only one measure: revenue. Since the measures form a dimension of their own in the cube, measure names have to be distinct, so I named the measures “Revenue” and “Plan Revenue”. 
&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh4.ggpht.com/-oTS_SJSJql4/Th292HO6hUI/AAAAAAAACLA/CafwKC9fZiI/s1600-h/Unbenannt7_thumb1%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="Unbenannt7_thumb1" border="0" alt="Unbenannt7_thumb1" src="http://lh5.ggpht.com/-L8NKu4saBDs/Th292sKhAwI/AAAAAAAACLE/c9C-DHUwRbE/Unbenannt7_thumb1_thumb.png?imgmax=800" width="140" height="87" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;If your cube contains more data from other measure groups, things can soon get confusing. This post is about two common solutions that reduce the number of visible measure groups.&lt;/p&gt;  &lt;p&gt;Two approaches are shown here:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;Combining both measures in one measure group by using a calculated measure &lt;/li&gt;    &lt;li&gt;Introducing a scenario dimension &lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;Before we start, one remark. The last post was about using translations in the cube to add more flexibility regarding the naming of the measures. Since the measures are clearly associated with a measure group, you could be tempted to use the same name for both measures (which is possible with translations). 
This is what the result would look like in the cube browser:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh5.ggpht.com/-fxN6Tn_sRzo/Th293P-_DRI/AAAAAAAACLI/5iT58Rgr6Ng/s1600-h/Unbenannt8_thumb1%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="Unbenannt8_thumb1" border="0" alt="Unbenannt8_thumb1" src="http://lh4.ggpht.com/-icXLcOQVBhI/Th293ywjz8I/AAAAAAAACLM/PedNQN_K_fw/Unbenannt8_thumb1_thumb.png?imgmax=800" width="152" height="181" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;However, when using a Pivot client like Excel you usually don’t see the measure group anymore. A simple analysis could look like this:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh5.ggpht.com/-qsqq940B4eo/Th294oIpZ4I/AAAAAAAACLQ/WTGTNtKxEnc/s1600-h/Unbenannt9_thumb%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="Unbenannt9_thumb" border="0" alt="Unbenannt9_thumb" src="http://lh6.ggpht.com/-JpJFmmCIWC8/Th295r--_PI/AAAAAAAACLU/Vu9Uk4KTaGo/Unbenannt9_thumb_thumb.png?imgmax=800" width="192" height="244" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;So from your Pivot table you cannot tell which is the actual and which is the plan revenue. Therefore I recommend not using the same name for different measures in different measure groups.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;u&gt;Combining both measures in one measure group by using a calculated measure&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;One possible option is to add a calculated measure to the Revenue measure group. In order to do so, I renamed the plan revenue to “Plan Revenue Internal” and set its visibility to hidden. 
Then the calculation could be as shown below:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh5.ggpht.com/-aACCiMcjRLo/Th296AQbo1I/AAAAAAAACLY/S0EsEPGsdz8/s1600-h/Unbenannt10_thumb%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="Unbenannt10_thumb" border="0" alt="Unbenannt10_thumb" src="http://lh6.ggpht.com/-2Wf75VGSbLI/Th2969utsDI/AAAAAAAACLc/6zb9rWQ5ZHA/Unbenannt10_thumb_thumb.png?imgmax=800" width="244" height="202" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;This is what the cube looks like in the cube browser:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-hiHGSPlgplg/Th297jQ6dzI/AAAAAAAACLg/gAYLhAv4hsQ/s1600-h/Unbenannt11_thumb1%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="Unbenannt11_thumb1" border="0" alt="Unbenannt11_thumb1" src="http://lh4.ggpht.com/-7VNB7PoGzf4/Th298SSd9oI/AAAAAAAACLk/7fT8pSjN960/Unbenannt11_thumb1_thumb.png?imgmax=800" width="244" height="166" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;By “copying” the plan measure into the revenue measure group you only have one visible measure group left, while both values are still distinguished by name.&lt;/p&gt;  &lt;p&gt;Since this post started with the different granularity, what does this look like at the detail level? 
The following screenshot shows the date dimension at the day level:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh5.ggpht.com/-k0WlfWCcIzs/Th298-lEDyI/AAAAAAAACLo/uEyufFECAU4/s1600-h/Unbenannt12_thumb3%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="Unbenannt12_thumb3" border="0" alt="Unbenannt12_thumb3" src="http://lh4.ggpht.com/-HBuyJwFQcYo/Th299rDcB1I/AAAAAAAACLs/pb9ZUK8NfeY/Unbenannt12_thumb3_thumb.png?imgmax=800" width="299" height="409" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;Assuming you have set the “IgnoreUnrelatedDimensions” property of the plan revenue measure group to false, only actual values are displayed at the day level and no plan values, because the plan data exists only at the month level. So this approach works really well.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;u&gt;Introducing a scenario dimension&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;Another common way of handling this situation is to use a scenario dimension. The scenario dimension contains two entries: Actual and Plan. The data from the fact table FactOrder is linked to the scenario member ‘Actual’ while the data from the FactPlan table is linked to the scenario member ‘Plan’. 
This can be easily done in the data source view (DSV) as shown below for the plan table:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-uqGiypHDFHc/Th29-Pb46SI/AAAAAAAACLw/e_i5LwYdHmo/s1600-h/image_thumb%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="image_thumb" border="0" alt="image_thumb" src="http://lh4.ggpht.com/-XjE8yFUnLCc/Th29_F9SDHI/AAAAAAAACL0/MJH1JpGhqEo/image_thumb_thumb.png?imgmax=800" width="244" height="199" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;The fact tables are linked to the scenario dimension in the data source view:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-KZFxwVBeEV0/Th29_lqikhI/AAAAAAAACL4/j_LZ6pWH9cA/s1600-h/t1_thumb%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="t1_thumb" border="0" alt="t1_thumb" src="http://lh6.ggpht.com/-krhn0CS2Nn8/Th2-AGqcjjI/AAAAAAAACL8/I7aDV--ujMc/t1_thumb_thumb.png?imgmax=800" width="206" height="244" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;The scenario dimension should be set to IsAggregatable=False (as it makes no sense to aggregate actual and plan data). We should also provide a default element. 
This is shown in the screenshot below:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-QZKTjOotfGg/Th2-A5U13xI/AAAAAAAACMA/1K8lw8dScr0/s1600-h/image_thumb1%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="image_thumb1" border="0" alt="image_thumb1" src="http://lh4.ggpht.com/-8ApPqgcCG3I/Th2-BsckeRI/AAAAAAAACME/blmtNM8WH9c/image_thumb1_thumb.png?imgmax=800" width="244" height="166" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;For the cube, we only want to have one single measure revenue which is neither of the two existing measures. Therefore we make both existing measures invisible. In order to distinguish them from visible measures, I prefixed them with an underscore:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh3.ggpht.com/-6HxJaHnb3yQ/Th2-CNwdqFI/AAAAAAAACMI/q3qmeSVgoU0/s1600-h/image_thumb2%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="image_thumb2" border="0" alt="image_thumb2" src="http://lh3.ggpht.com/-xUb8FU4Y33w/Th2-C1qu78I/AAAAAAAACMM/8tFK7B9Nkz0/image_thumb2_thumb.png?imgmax=800" width="152" height="89" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;Both measures have their Visible-property set to false. The only visible measure in this case has to be a calculated measure that takes the data from one of the two invisible measures depending on the chosen scenario. 
Since our scenarios are distinct and not aggregatable, we can simply add both measures (as one of the two is always zero for any single scenario):&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh3.ggpht.com/-MEbSRF_8XKo/Th2-Dccg1MI/AAAAAAAACMQ/IauMIEnWYkY/s1600-h/image_thumb3%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="image_thumb3" border="0" alt="image_thumb3" src="http://lh3.ggpht.com/-tevtBqXDp0Q/Th2-EF-h-RI/AAAAAAAACMU/urcM_88MwkY/image_thumb3_thumb.png?imgmax=800" width="244" height="173" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;With this model for handling the different granularity, the cube shows only one measure:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh3.ggpht.com/-qHvP9DdwSrQ/Th2-E50eF5I/AAAAAAAACMY/eePBfqk2VLA/s1600-h/image_thumb6%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="image_thumb6" border="0" alt="image_thumb6" src="http://lh4.ggpht.com/-VcMiug9YR6Y/Th2-FVS16TI/AAAAAAAACMc/G8y_Nff50n0/image_thumb6_thumb.png?imgmax=800" width="150" height="173" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;If the scenario dimension is not used, actual values are shown. 
You can still easily display both values by putting the scenario dimension on one of the axes:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh5.ggpht.com/-GYfQDiolbRc/Th2-GNKHA4I/AAAAAAAACMg/vC__I8GWpPA/s1600-h/image_thumb5%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="image_thumb5" border="0" alt="image_thumb5" src="http://lh6.ggpht.com/-0nzseH9s2Bo/Th2-G8VZUPI/AAAAAAAACMk/CBePI1Ivuro/image_thumb5_thumb.png?imgmax=800" width="309" height="469" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;If you compare this screenshot with the one from above you will see the same values; only the presentation is a little different (the first option had two measures, the second has one measure and a scenario dimension).&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;u&gt;Summary&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;Both options shown here enable us to have the same measure in two or more measure groups of different granularity. The second option with the scenario dimension looks a little tidier from a technical perspective. However, sometimes the first option is easier to use, especially when you add other calculated measures with formulas that combine two or more scenarios (for example a measure like ‘Actual to plan ratio’). Such measures cannot be assigned to a single scenario. On the other hand, choosing the scenario dimension can dramatically reduce the number of visible measures in your cube, which makes the cube easier to understand for end-users. 
So depending on the requirements and the structure of the data, one of the two options will be the best to choose.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4879885044062498968-5847695325952291857?l=ms-olap.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ms-olap.blogspot.com/feeds/5847695325952291857/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://ms-olap.blogspot.com/2011/07/same-measure-in-different-granularity_3629.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/5847695325952291857'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/5847695325952291857'/><link rel='alternate' type='text/html' href='http://ms-olap.blogspot.com/2011/07/same-measure-in-different-granularity_3629.html' title='Same measure in different granularity'/><author><name>Hilmar Buchta</name><uri>http://www.blogger.com/profile/07497529483535542290</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://2.bp.blogspot.com/_rzeWIeWVsxA/ScDkLmopJCI/AAAAAAAABAg/sdlcn6fz96M/S220/Hilmar+20090307_Klein.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh3.ggpht.com/-1o-LouZFidc/Th290QokpII/AAAAAAAACK0/Zf5lxiOt_kY/s72-c/Unbenannt5_thumb3_thumb.png?imgmax=800' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4879885044062498968.post-5395803096995719641</id><published>2011-06-05T17:41:00.000+02:00</published><updated>2011-07-13T17:42:36.676+02:00</updated><title type='text'>Valid characters for measure names</title><content type='html'>&lt;p align="right"&gt;SQL Server 2005 | SQL Server 2008 | 
SQL Server 2008R2&lt;/p&gt;  &lt;p&gt;Measure names in a cube should be clear and easy to understand to avoid confusing the end-users. However, not all characters are allowed in the names of cube measures. Here is a list of characters that are not allowed: &lt;/p&gt;  &lt;p align="center"&gt;. , ; ' ` : / \ * | ? &amp;quot; &amp;amp; % $ ! + = ( ) [ ] { } &amp;lt; &amp;gt;&lt;/p&gt;  &lt;p&gt;Some of these characters are allowed for calculated measures, although I wouldn’t recommend using them as this reduces the readability of your MDX queries (see below).&lt;/p&gt;  &lt;p&gt;For example, you cannot give the name “Avg. Sales Amount” to a built-in cube measure (because of the dot). Also, end-users sometimes want to see the unit with the measure name, and one frequently used notation for this is square brackets, as in “Utilization [%]” (here both the square brackets and the percent sign are not allowed).&lt;/p&gt;  &lt;p&gt;On the other hand, many special characters are allowed. The following characters are just a few examples from the long list of symbols available as normal text:&lt;/p&gt;  &lt;p align="center"&gt;Ø © α € £ ¥ ® ™ ± ≠ ≤ ≥ ÷ × ∞ µ&lt;/p&gt;  &lt;p&gt;For example, it’s quite common to use the Ø-sign as a symbol for the average and you can name your measure “Revenue Ø”. 
In order to do so, one simple way is to open Microsoft Word and use the Insert Symbol dialog as shown below:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh3.ggpht.com/-atLllqlABLY/Th280e3A54I/AAAAAAAACKA/51BAwtp_aKo/s1600-h/image_thumb4%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="image_thumb4" border="0" alt="image_thumb4" src="http://lh4.ggpht.com/-7v54GCG9aOg/Th281Ecag8I/AAAAAAAACKE/eAn4geJ1TnQ/image_thumb4_thumb.png?imgmax=800" width="244" height="177" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;After inserting the symbol into a blank Word document you can simply copy it to the clip board and then paste the symbol to the measure name:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh4.ggpht.com/-JZSYGpC-U6I/Th281v8AXbI/AAAAAAAACKI/12WmbF8EsZc/s1600-h/Unbenannt_thumb1%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="Unbenannt_thumb1" border="0" alt="Unbenannt_thumb1" src="http://lh6.ggpht.com/-6hp7GobDDGA/Th282Wy_XOI/AAAAAAAACKM/Ykkj1orhQdo/Unbenannt_thumb1_thumb.png?imgmax=800" width="100" height="62" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;The new measure name is also displayed correctly in OLAP clients like Microsoft Excel:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh5.ggpht.com/-FZw76Cj5pzs/Th282-mQVDI/AAAAAAAACKQ/ODWwFMr2cOk/s1600-h/Unbenannt1_thumb1%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="Unbenannt1_thumb1" border="0" 
alt="Unbenannt1_thumb1" src="http://lh5.ggpht.com/-6Lyy1GOJUBM/Th283V82dVI/AAAAAAAACKU/iTu6kWnSBRg/Unbenannt1_thumb1_thumb.png?imgmax=800" width="164" height="193" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;However, the special symbols might be more complicated to enter in situations where you cannot easily pick the measure from a list. For example, in MDX you also have to enter these characters:&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;select {[Measures].[Revenue],&lt;font color="#ff0000"&gt;[Measures].[Revenue Ø]&lt;/font&gt;} on 0,       &lt;br /&gt;[Dim Date].[Calendar Week].[Year] on 1       &lt;br /&gt;from [OLAPSample1]&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;For this reason I recommend measure names that consist only of plain letters and numbers, for example [AvgRevenue] instead of [Revenue Ø].&lt;/p&gt;  &lt;p&gt;But how can we help our end-users to see the measures in the way they expect? One simple solution is to use the cube translation for this purpose (an Enterprise Edition feature; you cannot do this on a Standard Edition). 
Within the cube translation you are not restricted to specific characters, so you can also use all of the above characters, while still keeping a simple technical name for the measure.&lt;/p&gt;  &lt;p&gt;To show this, I added some more measures to the cube:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh4.ggpht.com/-UTzrC4h0cto/Th283w4f2xI/AAAAAAAACKY/IlEIybLSWLw/s1600-h/Unbenannt2_thumb1%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="Unbenannt2_thumb1" border="0" alt="Unbenannt2_thumb1" src="http://lh3.ggpht.com/-WFNegdZZWwU/Th284u6LvTI/AAAAAAAACKc/u5xaTuwFOC4/Unbenannt2_thumb1_thumb.png?imgmax=800" width="142" height="130" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;On the translation tab of the cube we can now add the displayed names for those measures:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh3.ggpht.com/-EviymBDtvDE/Th285Fi1xHI/AAAAAAAACKg/ecSlRrlgIq4/s1600-h/Unbenannt31_thumb1%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="Unbenannt31_thumb1" border="0" alt="Unbenannt31_thumb1" src="http://lh5.ggpht.com/-Gj_bi6s6okQ/Th2858E82lI/AAAAAAAACKk/raBxomk62FI/Unbenannt31_thumb1_thumb.png?imgmax=800" width="374" height="228" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;And as expected, Excel correctly displays the “translated” measure names:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh5.ggpht.com/-sH5fewz4NBk/Th286SGC6YI/AAAAAAAACKo/zXoYXDqHj00/s1600-h/Unbenannt31_thumb1%25255B4%25255D%25255B2%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; 
margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="Unbenannt31_thumb1[4]" border="0" alt="Unbenannt31_thumb1[4]" src="http://lh3.ggpht.com/-lf1kdrB9MYI/Th286yVWqjI/AAAAAAAACKs/1UHXc8fAWYM/Unbenannt31_thumb1%25255B4%25255D_thumb.png?imgmax=800" width="374" height="228" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;Using translations for the measure names also gives you the flexibility to rename measures if required without changing all the queries and calculations that are based on this cube (all queries use the built-in names that are now hidden from the end-user). So translations are a good way to add a façade to your cube’s measures while also gaining some flexibility in the naming of the measures.&lt;/p&gt;  &lt;p&gt;Although we can now be very flexible with the naming of the measures, I still recommend keeping things simple. The unit of the measure is usually clear from the name and the format of the values (for example a percentage value). In this case you should avoid showing the unit with the measure name, as it is redundant.&lt;/p&gt;  &lt;p&gt;The next post will be about handling similar measures in different measure groups of different granularity (for example revenue as actual and as planned value). In this case you might ask yourself how to name these measures properly. With translations you might be tempted to give two measures the same display name (which is possible!). 
Without anticipating too much of the next post, I don’t recommend doing this, as it can be very confusing to work with such a cube.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4879885044062498968-5395803096995719641?l=ms-olap.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ms-olap.blogspot.com/feeds/5395803096995719641/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://ms-olap.blogspot.com/2011/06/valid-characters-for-measure-names.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/5395803096995719641'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/5395803096995719641'/><link rel='alternate' type='text/html' href='http://ms-olap.blogspot.com/2011/06/valid-characters-for-measure-names.html' title='Valid characters for measure names'/><author><name>Hilmar Buchta</name><uri>http://www.blogger.com/profile/07497529483535542290</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://2.bp.blogspot.com/_rzeWIeWVsxA/ScDkLmopJCI/AAAAAAAABAg/sdlcn6fz96M/S220/Hilmar+20090307_Klein.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh4.ggpht.com/-7v54GCG9aOg/Th281Ecag8I/AAAAAAAACKE/eAn4geJ1TnQ/s72-c/image_thumb4_thumb.png?imgmax=800' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4879885044062498968.post-303226579894313270</id><published>2011-05-01T11:50:00.001+02:00</published><updated>2011-07-13T17:51:24.685+02:00</updated><title type='text'>Profit calculation for churn prevention data mining models (part 3 of 3)</title><content type='html'>&lt;p align="right"&gt;SQL 
Server 2005 | SQL Server 2008 | SQL Server 2008R2&lt;/p&gt;  &lt;p&gt;The last two posts were about cost optimization for a churn prevention campaign. We analyzed the following four options:&lt;/p&gt;  &lt;ol&gt;   &lt;li&gt;Trivial option (only using average return rate and average customer value) &lt;/li&gt;    &lt;li&gt;Data Mining (using average customer value but individual return rate based on a mining model) &lt;/li&gt;    &lt;li&gt;Value driven approach (using average return rate but individual customer value) &lt;/li&gt;    &lt;li&gt;Combination of methods 2 and 3 (using individual return rate and value) &lt;/li&gt; &lt;/ol&gt;  &lt;p&gt;As mentioned in the last post, the results may differ a lot from case to case. In particular, the effectiveness of options 2 and 3 depends heavily on the information available for data mining (for option 2) and on the variance of the customer value (for option 3). If both methods give some improvement, the combination can be expected to be the best choice.&lt;/p&gt;  &lt;p&gt;However, so far we have always looked at a scenario where the trivial approach gave almost no insight. What I mean is that for the trivial approach there is not much difference between giving no customer a voucher and giving every customer a voucher (costs between $12,500 and $13,750 in this example). In other words, the line chart showing the costs for the trivial approach was almost a horizontal line. 
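&lt;/p&gt;  &lt;p&gt;Those two end points are simple arithmetic. As a minimal sketch in Python (the function and variable names are only illustrative, and the customer base of 1,000 is the assumption used throughout this series), the costs of giving nobody or everybody a voucher could be computed like this:&lt;/p&gt;
&lt;pre&gt;
```python
# Trivial model: every customer is treated the same (average values only).
# The numbers are the ones quoted in this series; the names are illustrative.

def trivial_costs(customers, profit, churn_rate, churn_rate_with_voucher, voucher_cost):
    """Return (costs with no vouchers, costs with vouchers for everyone)."""
    # No voucher: we lose the profit of every customer who does not return.
    no_vouchers = customers * churn_rate * profit
    # Voucher for all: fewer customers are lost, but every voucher costs money.
    all_vouchers = customers * (churn_rate_with_voucher * profit + voucher_cost)
    return no_vouchers, all_vouchers


# 1,000 customers, $25 profit, 50% churn, 15% churn with voucher, $10 voucher
no_vouchers, all_vouchers = trivial_costs(1000, 25.0, 0.50, 0.15, 10.0)
print(round(no_vouchers, 2), round(all_vouchers, 2))
```
&lt;/pre&gt;
&lt;p&gt;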
Here is the corresponding chart from the first post (click on the image to see a larger version):&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh3.ggpht.com/-0WE1S9KTVLA/Th2-6A69l5I/AAAAAAAACMo/nf-Vv04vdqA/s1600-h/image_thumb%25255B2%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="image_thumb" border="0" alt="image_thumb" src="http://lh6.ggpht.com/-QNtFIKA-9Bc/Th2-69r1SNI/AAAAAAAACMs/fJZMWaK7z1k/image_thumb_thumb.png?imgmax=800" width="244" height="153" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;Now, let’s change our basic parameters a little. The following table shows the old and the new parameters:&lt;/p&gt;  &lt;table border="1" cellspacing="0" cellpadding="2" width="500"&gt;&lt;tbody&gt;     &lt;tr&gt;       &lt;td valign="top" width="246"&gt;&amp;nbsp;&lt;/td&gt;        &lt;td valign="top" width="131"&gt;         &lt;p align="right"&gt;&lt;strong&gt;Old scenario&lt;/strong&gt;&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="121"&gt;         &lt;p align="right"&gt;&lt;strong&gt;New scenario&lt;/strong&gt;&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="246"&gt;         &lt;p align="left"&gt;Avg. Customer Profit&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="131"&gt;         &lt;p align="right"&gt;$25&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="121"&gt;         &lt;p align="right"&gt;$50&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="246"&gt;         &lt;p align="left"&gt;Avg. 
Churn Rate&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="131"&gt;         &lt;p align="right"&gt;50%&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="121"&gt;         &lt;p align="right"&gt;80%&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="246"&gt;         &lt;p align="left"&gt;Voucher costs (prevention costs)&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="131"&gt;         &lt;p align="right"&gt;$10&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="121"&gt;         &lt;p align="right"&gt;$5&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="246"&gt;         &lt;p align="left"&gt;Overall Churn Rate with prevention&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="131"&gt;         &lt;p align="right"&gt;15%&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="121"&gt;         &lt;p align="right"&gt;5%&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;   &lt;/tbody&gt;&lt;/table&gt;  &lt;p&gt;Here are the resulting charts (I’ve copied the chart for the old scenario from my last post):&lt;/p&gt;  &lt;table border="1" cellspacing="0" cellpadding="2" width="500"&gt;&lt;tbody&gt;     &lt;tr&gt;       &lt;td valign="top" width="250"&gt;         &lt;p align="center"&gt;&lt;strong&gt;Old scenario&lt;/strong&gt;&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="250"&gt;         &lt;p align="center"&gt;&lt;strong&gt;New scenario&lt;/strong&gt;&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="250"&gt;&lt;a href="http://lh3.ggpht.com/-t9pV-tay6O8/Th2-7Wc4EqI/AAAAAAAACMw/ZIN6AXZg9wc/s1600-h/image_thumb1%25255B2%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb1" border="0" 
alt="image_thumb1" src="http://lh4.ggpht.com/-Sx5tzWo08gM/Th2-7yejpKI/AAAAAAAACM0/VzLF1RHNUGA/image_thumb1_thumb.png?imgmax=800" width="244" height="153" /&gt;&lt;/a&gt;&lt;/td&gt;        &lt;td valign="top" width="250"&gt;&lt;a href="http://lh5.ggpht.com/-R7N-e-qrn-8/Th2-8lp1c5I/AAAAAAAACM4/mQ5JdP31QpU/s1600-h/image_thumb2%25255B2%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb2" border="0" alt="image_thumb2" src="http://lh6.ggpht.com/-xsxa13_KvR0/Th2-9BqYRqI/AAAAAAAACM8/-eSltObtxGc/image_thumb2_thumb.png?imgmax=800" width="244" height="153" /&gt;&lt;/a&gt;&lt;/td&gt;     &lt;/tr&gt;   &lt;/tbody&gt;&lt;/table&gt;  &lt;p&gt;In the old scenario (left chart) we clearly see the effect of the optimization, as the purple line (combined method) has a significant minimum compared to the blue line (trivial approach). In the new scenario (right chart), however, the minimum of all three methods is almost identical. The trivial method (blue line) has its minimal costs at the right end point of the line, and although the other methods perform better in the middle range of the chart, they cannot deliver values that are much lower than the right end point of the blue line (for my sample data, even the combined model performs only about 1.2% better than the trivial model in this case).&lt;/p&gt;  &lt;p&gt;In general, a good optimization is easier to achieve if the trivial method gives no clear decision (the line is almost horizontal). In the new scenario, the profit per customer is high compared to the prevention costs and the prevention campaign is extremely efficient. In such a case you cannot expect much optimization from a value-driven or data-mining-driven approach.&lt;/p&gt;  &lt;p&gt;So the first thing to remember from this post is to check the trivial approach first. 
This is the approach that is almost instantly available in most situations, even if the values are just estimates. If the result looks more like the blue line in the left chart (almost horizontal), it is very likely that you can achieve a significant optimization. If it looks more like the blue line in the right chart (clearly falling or rising), you might only want to pursue further improvements if the costs of obtaining the churn score or the individual value are not too high. Otherwise you risk running an expensive optimization project only to find that no optimization is possible.&lt;/p&gt;  &lt;p&gt;There is still one open question from the last post, and that is the value of the voucher. Up to this point, our success rate for making customers return by using a voucher was a fixed average value based on a test sample. Of course, this is not realistic: customers with a high value might even be annoyed by a cheap voucher. Also, customers with a high churn score (likely to go away) might not respond to our voucher campaign in the same way as customers with a low churn score (customers that are likely to return whether or not there is a voucher). So it’s time to add more realism to the model. To do so, we have to analyze the data from our customer test sample (first post) in more detail. Here we only analyze by customer value (an advanced model could also analyze by churn score, although that would require a bigger test sample). 
The following table shows the range for the customer profit, the voucher that was associated with that range and the share of customers who did not return although they received the voucher:&lt;/p&gt;  &lt;table border="1" cellspacing="0" cellpadding="2" width="611"&gt;&lt;tbody&gt;     &lt;tr&gt;       &lt;td valign="top" width="125"&gt;         &lt;p align="center"&gt;&lt;strong&gt;Profit Range&lt;/strong&gt;&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="179"&gt;         &lt;p align="center"&gt;&lt;strong&gt;Number of customers in sample&lt;/strong&gt;&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="132"&gt;         &lt;p align="center"&gt;&lt;strong&gt;Value of Voucher&lt;/strong&gt;&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="173"&gt;         &lt;p align="center"&gt;&lt;strong&gt;Churn rate (not returning)&lt;/strong&gt;&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="125"&gt;         &lt;p align="right"&gt;0-10&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="179"&gt;         &lt;p align="right"&gt;78&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="132"&gt;         &lt;p align="right"&gt;0&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="173"&gt;         &lt;p align="right"&gt;31%&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="125"&gt;         &lt;p align="right"&gt;10-15&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="179"&gt;         &lt;p align="right"&gt;55&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="132"&gt;         &lt;p align="right"&gt;3&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="173"&gt;         &lt;p align="right"&gt;21%&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="125"&gt;         &lt;p align="right"&gt;15-25&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" 
width="179"&gt;         &lt;p align="right"&gt;447&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="132"&gt;         &lt;p align="right"&gt;5&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="173"&gt;         &lt;p align="right"&gt;17%&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="125"&gt;         &lt;p align="right"&gt;25-35&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="179"&gt;         &lt;p align="right"&gt;275&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="132"&gt;         &lt;p align="right"&gt;10&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="173"&gt;         &lt;p align="right"&gt;10%&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="125"&gt;         &lt;p align="right"&gt;&amp;gt;35&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="179"&gt;         &lt;p align="right"&gt;145&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="132"&gt;         &lt;p align="right"&gt;15&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="173"&gt;         &lt;p align="right"&gt;7%&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;   &lt;/tbody&gt;&lt;/table&gt;  &lt;p&gt;With this extra information, we can calculate the costs for the voucher and the probability to prevent the customer from going away more precisely. I’m still using the sort order from my combined model. Here is the result in the chart view. 
In order to make the difference easier to see, I changed the minimum of the y-axis (don’t be fooled by the different presentation):&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-J8awXdviJ_s/Th2--AnkcoI/AAAAAAAACNA/ySGr8w2PFwQ/s1600-h/image_thumb4%25255B2%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="image_thumb4" border="0" alt="image_thumb4" src="http://lh6.ggpht.com/-87tpfgjIJOA/Th2--_fJ_RI/AAAAAAAACNE/iTTf5da6Z9g/image_thumb4_thumb.png?imgmax=800" width="609" height="379" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;The newly created model is named ‘Variable Model’ in this plot (cyan line). Because of the different bands, the line is not as smooth as for the other models. However, it turns out that this approach is the best one for my sample data. You’ll also notice that the end points of the cyan line differ from the other lines’ end points because the costs for the vouchers are no longer constant. 
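&lt;/p&gt;  &lt;p&gt;To make the banding a little more concrete, here is a rough sketch in Python. The band limits, voucher values and churn rates are taken from the table above; everything else is an assumption for illustration only (the names, the rule that a profit exactly on a limit falls into the lower band, and the simplified cost formula of expected lost profit plus voucher value). It is not the exact calculation behind the chart:&lt;/p&gt;
&lt;pre&gt;
```python
from bisect import bisect_left

# Upper limits of the profit bands and, per band, the voucher value and the
# churn rate observed in the test sample (values from the table above).
BAND_LIMITS = [10, 15, 25, 35]
VOUCHER_VALUE = [0, 3, 5, 10, 15]
CHURN_WITH_VOUCHER = [0.31, 0.21, 0.17, 0.10, 0.07]


def band_index(profit):
    # Assumption: a profit exactly on a limit belongs to the lower band.
    return bisect_left(BAND_LIMITS, profit)


def cost_with_voucher(profit):
    """Expected cost for one customer who receives the voucher of its band."""
    i = band_index(profit)
    # Assumption: expected lost profit plus the value of the voucher.
    return CHURN_WITH_VOUCHER[i] * profit + VOUCHER_VALUE[i]


print(band_index(30.0), round(cost_with_voucher(30.0), 2))
```
&lt;/pre&gt;
&lt;p&gt;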
Here are the detailed results from all the approaches:&lt;/p&gt;  &lt;table border="0" cellspacing="0" cellpadding="0"&gt;&lt;tbody&gt;     &lt;tr&gt;       &lt;td valign="bottom" width="135"&gt;         &lt;p align="right"&gt;&lt;b&gt;&lt;/b&gt;&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="bottom" width="89"&gt;         &lt;p align="right"&gt;&lt;b&gt;Trivial Model&lt;/b&gt;&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="bottom" width="87"&gt;         &lt;p align="right"&gt;&lt;b&gt;Profit Model&lt;/b&gt;&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="bottom" width="127"&gt;         &lt;p align="right"&gt;&lt;b&gt;Churn Score Model&lt;/b&gt;&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="bottom" width="113"&gt;         &lt;p align="right"&gt;&lt;b&gt;Combined Model&lt;/b&gt;&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="bottom" width="106"&gt;         &lt;p align="right"&gt;&lt;b&gt;Variable Model&lt;/b&gt;&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="bottom" width="135"&gt;         &lt;p align="left"&gt;&lt;b&gt;Minimal costs&lt;/b&gt;&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="bottom" width="89"&gt;         &lt;p align="right"&gt;12,500.00&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="bottom" width="87"&gt;         &lt;p align="right"&gt;11,761.15&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="bottom" width="127"&gt;         &lt;p align="right"&gt;11,817.60&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="bottom" width="113"&gt;         &lt;p align="right"&gt;11,228.95&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="bottom" width="106"&gt;         &lt;p align="right"&gt;10,999.31&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="bottom" width="135"&gt;         &lt;p align="left"&gt;&lt;b&gt;Improvement&lt;/b&gt;&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="bottom" width="89"&gt;         &lt;p align="right"&gt;0.00&lt;/p&gt;       &lt;/td&gt;        &lt;td 
valign="bottom" width="87"&gt;         &lt;p align="right"&gt;738.85&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="bottom" width="127"&gt;         &lt;p align="right"&gt;682.40&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="bottom" width="113"&gt;         &lt;p align="right"&gt;1,271.05&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="bottom" width="106"&gt;         &lt;p align="right"&gt;1,500.69&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="bottom" width="135"&gt;         &lt;p align="left"&gt;&lt;b&gt;Improvement %&lt;/b&gt;&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="bottom" width="89"&gt;         &lt;p align="right"&gt;0.0%&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="bottom" width="87"&gt;         &lt;p align="right"&gt;5.9%&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="bottom" width="127"&gt;         &lt;p align="right"&gt;5.5%&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="bottom" width="113"&gt;         &lt;p align="right"&gt;10.2%&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="bottom" width="106"&gt;         &lt;p align="right"&gt;12.0%&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;   &lt;/tbody&gt;&lt;/table&gt;  &lt;p&gt;After these three posts, it’s time for a short summary:&lt;/p&gt;    &lt;p&gt;&lt;strong&gt;Summary&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;For cost optimization it’s important to first analyze the actual situation. Optimization is not always possible, and before starting an expensive project it’s better to look at the parameters (as shown above, see the remarks on the ‘trivial model’). If you decide to optimize, data mining is not the only option. A value-driven approach is as important as a mining model in many scenarios, unless all your customers share the same value. And it’s important to know your parameters as well as possible. In my example, the test sample of customers with and without vouchers was very important. 
Also keep in mind that the success rate of a voucher (or any other method of prevention) is not a constant, but depends at least on the customer value (usually computed from past orders) and the likelihood of the customer turning away (churn score, usually computed by a mining model). The combination of all these parameters is the key to making the optimization methods more efficient.&lt;/p&gt;  &lt;p&gt;I should also add a word of warning that applies to all the approaches used here: we always try to model the future behavior of the customers based on data from the past. The first four approaches are all based on the same modeling idea (only the last model with the variable voucher is based on different preconditions). The only difference is the subset of customers that are addressed (different sort order). However, in all these cases the effect of the optimization may be different in reality, and with every model it is important to validate and constantly refine the model.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4879885044062498968-303226579894313270?l=ms-olap.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ms-olap.blogspot.com/feeds/303226579894313270/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://ms-olap.blogspot.com/2011/05/profit-calculation-for-churn-prevention.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/303226579894313270'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/303226579894313270'/><link rel='alternate' type='text/html' href='http://ms-olap.blogspot.com/2011/05/profit-calculation-for-churn-prevention.html' title='Profit calculation for churn prevention 
data mining models (part 3 of 3)'/><author><name>Hilmar Buchta</name><uri>http://www.blogger.com/profile/07497529483535542290</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://2.bp.blogspot.com/_rzeWIeWVsxA/ScDkLmopJCI/AAAAAAAABAg/sdlcn6fz96M/S220/Hilmar+20090307_Klein.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh6.ggpht.com/-QNtFIKA-9Bc/Th2-69r1SNI/AAAAAAAACMs/fJZMWaK7z1k/s72-c/image_thumb_thumb.png?imgmax=800' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4879885044062498968.post-8218921298757239627</id><published>2011-04-09T18:59:00.001+02:00</published><updated>2011-07-13T17:54:33.540+02:00</updated><title type='text'>Profit calculation for churn prevention data mining models (part 2 of 3)</title><content type='html'>&lt;p align="right"&gt;SQL Server 2005 | SQL Server 2008 | SQL Server 2008R2&lt;/p&gt;  &lt;p&gt;In part 1 of this mini series of data mining posts I showed how data mining can be used to optimize a churn prevention campaign by selecting the optimal subset of customers to receive a voucher. For this optimization I used a data mining model (without going into detail about the model itself). However, developing such a model is not always an easy task and can result in significant additional costs. This post first looks at an “alternative” method which you might find easier to implement as a first step. In fact, when facing any kind of optimization or prediction problem, data mining should not be the only option you take into account.&lt;/p&gt;  &lt;p&gt;With our data mining approach we were able to transform the average churn rate of the customers (used by the trivial model) into a customer-specific churn rate. By sorting the customers by this individual churn rate we could first address those customers who were most likely not to return. 
This made our investment more efficient. &lt;/p&gt;  &lt;p&gt;For the next approach, we will ignore this individual churn score and go back to the average churn rate of 50% from the example of my last post. But now we will be looking at a customer-specific profit. Depending on your data, profit may be replaced by some other value measure for the customer (revenue, customer lifetime value etc.). &lt;/p&gt;  &lt;p&gt;In the same way as in the previous approach, we can simply sort our customers by their value in descending order and address the customers with the highest value first. Again, the warning from my last post also applies here: the customers with the highest value might not respond to the $10 voucher in the same way as customers with a lower value. I will get back to this point later.&lt;/p&gt;  &lt;p&gt;In the same way as before we can calculate the costs per customer:&lt;/p&gt;  &lt;table border="1" cellspacing="0" cellpadding="2" width="691"&gt;&lt;tbody&gt;     &lt;tr&gt;       &lt;td valign="top" width="250"&gt;&lt;strong&gt;Option&lt;/strong&gt;&lt;/td&gt;        &lt;td valign="top" width="439"&gt;         &lt;p align="center"&gt;&lt;strong&gt;Costs&lt;/strong&gt;&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="250"&gt;Option 1: Customer gets no voucher&lt;/td&gt;        &lt;td valign="top" width="439"&gt;[average churn rate] x [customer profit]&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="250"&gt;Option 2: Customer gets voucher&lt;/td&gt;        &lt;td valign="top" width="439"&gt;[average churn rate] x (1 - [prevention success rate]) x [customer profit]          &lt;br /&gt;+ [cost of a single voucher]           &lt;br /&gt;=           &lt;br /&gt;[overall churn rate with prevention] x [customer profit]           &lt;br /&gt;+ [cost of a single voucher]           &lt;br /&gt;&lt;/td&gt;     &lt;/tr&gt;   &lt;/tbody&gt;&lt;/table&gt;  &lt;p&gt;For my 
sample data, the chart (now including the trivial approach, the churn score approach and the new profit based approach) looks like this:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh4.ggpht.com/-cdyqdJOiM8M/Th2_qsCspDI/AAAAAAAACNI/xFbRZLYjkt0/s1600-h/image_thumb1%25255B2%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="image_thumb1" border="0" alt="image_thumb1" src="http://lh5.ggpht.com/-qbJxbvZrCic/Th2_rsr5UTI/AAAAAAAACNM/2jpfvMEHU5w/image_thumb1_thumb.png?imgmax=800" width="609" height="379" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;As you can see, for my test data both approaches (green and red line) perform almost identically. However, this depends heavily on your customer base, and I adjusted my data so as not to favor either of the two approaches. If there is a low variance in your customer values (almost the same value for every customer), the method based on the value will perform almost as badly as the trivial approach (as it gives no benefit then). If the variance is high, the value based method might easily outperform a purely data mining driven score calculation. &lt;/p&gt;  &lt;p&gt;Another good idea would be to combine both methods. Instead of ordering the customers by value &lt;em&gt;or&lt;/em&gt; churn score we could sort them by the product [customer value] x [churn score]. 
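&lt;/p&gt;  &lt;p&gt;As a small illustration of this sort order, here is a sketch in Python. The four customers and their numbers are made up for this example, and sorting in descending order (highest expected loss first) is my assumption, in line with the value based sort described earlier:&lt;/p&gt;
&lt;pre&gt;
```python
# Hypothetical sample data: (customer id, profit, churn score).
customers = [
    ("A", 40.0, 0.20),
    ("B", 15.0, 0.90),
    ("C", 30.0, 0.60),
    ("D", 10.0, 0.30),
]


def priority(customer):
    _, profit, churn_score = customer
    # Product of value and churn score: the expected loss without a voucher.
    return profit * churn_score


# Address the customers with the highest expected loss first.
ranked = sorted(customers, key=priority, reverse=True)
ranked_ids = [customer[0] for customer in ranked]
print(ranked_ids)
```
&lt;/pre&gt;
&lt;p&gt;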
In this case both parameters are variables and the calculation looks somewhat like this:&lt;/p&gt;  &lt;table border="1" cellspacing="0" cellpadding="2" width="691"&gt;&lt;tbody&gt;     &lt;tr&gt;       &lt;td valign="top" width="250"&gt;&lt;strong&gt;Option&lt;/strong&gt;&lt;/td&gt;        &lt;td valign="top" width="439"&gt;         &lt;p align="center"&gt;&lt;strong&gt;Costs&lt;/strong&gt;&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="250"&gt;Option 1: Customer gets no voucher&lt;/td&gt;        &lt;td valign="top" width="439"&gt;[churn score] x [customer profit]&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="250"&gt;Option 2: Customer gets voucher&lt;/td&gt;        &lt;td valign="top" width="439"&gt;[churn score] x (1 - [prevention success rate]) x [customer profit]          &lt;br /&gt;+ [cost of a single voucher]&lt;/td&gt;     &lt;/tr&gt;   &lt;/tbody&gt;&lt;/table&gt;  &lt;p&gt;For my sample data the chart output of all four methods shows that the combination is superior (which should be expected in most real world scenarios):&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-QmKWncFnCho/Th2_selbT_I/AAAAAAAACNQ/DKzjJg1dOwA/s1600-h/image_thumb3%25255B2%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="image_thumb3" border="0" alt="image_thumb3" src="http://lh3.ggpht.com/-xLlIuI5Lqnc/Th2_tuVjUPI/AAAAAAAACNU/XIpalsms65g/image_thumb3_thumb.png?imgmax=800" width="609" height="379" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;In my example the optimizations using only profit (value) or only churn score give an improvement of about 5.5% to 6%, while the combination (purple line in the chart) gives about 10.2%. &lt;/p&gt;  &lt;p&gt;For all of these samples we used the same parameters. 
How does the whole picture change if the parameters are changed (for example the average return rate)? And what may happen if we also adjust the last fixed parameter in this model, the value of the voucher itself? This will be the topic of the third and last part of this series.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4879885044062498968-8218921298757239627?l=ms-olap.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ms-olap.blogspot.com/feeds/8218921298757239627/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://ms-olap.blogspot.com/2011/04/profit-calculation-for-churn-prevention.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/8218921298757239627'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/8218921298757239627'/><link rel='alternate' type='text/html' href='http://ms-olap.blogspot.com/2011/04/profit-calculation-for-churn-prevention.html' title='Profit calculation for churn prevention data mining models (part 2 of 3)'/><author><name>Hilmar Buchta</name><uri>http://www.blogger.com/profile/07497529483535542290</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://2.bp.blogspot.com/_rzeWIeWVsxA/ScDkLmopJCI/AAAAAAAABAg/sdlcn6fz96M/S220/Hilmar+20090307_Klein.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh5.ggpht.com/-qbJxbvZrCic/Th2_rsr5UTI/AAAAAAAACNM/2jpfvMEHU5w/s72-c/image_thumb1_thumb.png?imgmax=800' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4879885044062498968.post-4824325987363234862</id><published>2011-02-27T16:30:00.001+01:00</published><updated>2011-02-27T16:30:17.251+01:00</updated><title type='text'>CeBIT 2011 in Hannover from March 1-5, 2011</title><content type='html'>&lt;p&gt;The CeBIT in Hannover, Germany is a great opportunity to learn about new technology, share information and discuss new topics. As in previous years, we (ORAYLIS GmbH) will be present, and I would be glad to welcome you at our booth, which is located within the Microsoft area: Fair Hall 4, Booth 26 / P30. I will be at the booth from March 2 (Wednesday) to March 5 (Saturday).&lt;/p&gt;  &lt;p&gt;Hope to see you at the CeBIT!&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4879885044062498968-4824325987363234862?l=ms-olap.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ms-olap.blogspot.com/feeds/4824325987363234862/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://ms-olap.blogspot.com/2011/02/cebit-2011-in-hannover-from-march-1-5.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/4824325987363234862'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/4824325987363234862'/><link rel='alternate' type='text/html' href='http://ms-olap.blogspot.com/2011/02/cebit-2011-in-hannover-from-march-1-5.html' title='CeBIT 2011 in Hannover from March 1-5, 2011'/><author><name>Hilmar Buchta</name><uri>http://www.blogger.com/profile/07497529483535542290</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='http://2.bp.blogspot.com/_rzeWIeWVsxA/ScDkLmopJCI/AAAAAAAABAg/sdlcn6fz96M/S220/Hilmar+20090307_Klein.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4879885044062498968.post-4316781450971455177</id><published>2011-02-27T16:18:00.001+01:00</published><updated>2011-07-13T17:56:33.447+02:00</updated><title type='text'>Profit calculation for churn prevention data mining models (part 1 of 3)</title><content type='html'>&lt;p align="right"&gt;SQL Server 2005 | SQL Server 2008 | SQL Server 2008 R2&lt;/p&gt;  &lt;p&gt;Data Mining is frequently used to optimize costs in marketing scenarios. Most marketing campaigns result in costs and we want these campaigns to be as efficient as possible. &lt;/p&gt;  &lt;p&gt;To illustrate this process let’s assume we’re operating a web shop. Customers need to register before they can order goods and therefore we know how many customers are returning. While we also spend some effort on winning new customers, for this example we will focus on keeping the existing ones and making them buy more in our shop. &lt;/p&gt;  &lt;p&gt;From our web shop system we know that 50% of our customers place only one single order and do not return (within a given period of time, for example within a year). Marketing suggests offering our existing customers a benefit in order to make them return, let’s say a $10 voucher. &lt;/p&gt;    &lt;p&gt;&lt;strong&gt;The Simple Approach&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;While we will analyze more sophisticated models later, for now we will take a look at a very simple approach that does not take differences between individual customers into account. Therefore, we are looking at average values. For example, the average profit for each order is $25. &lt;/p&gt;  &lt;p&gt;Having just the information from above, we cannot decide if the idea of the marketing department is a good one. But we could run a test sample. Let’s say we give the voucher to a random sample of 500 customers. 
Now we can compare customers who got the voucher with customers who didn’t. After this test we find out that&lt;/p&gt;  &lt;table border="0" cellspacing="0" cellpadding="2" width="789"&gt;&lt;tbody&gt;     &lt;tr&gt;       &lt;td valign="top" width="324"&gt;Group 1 (Customers with voucher):&lt;/td&gt;        &lt;td valign="top" width="463"&gt;15% do not return&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="325"&gt;Group 2 (Customers without the voucher):&lt;/td&gt;        &lt;td valign="top" width="463"&gt;50% do not return (as we already knew)&lt;/td&gt;     &lt;/tr&gt;   &lt;/tbody&gt;&lt;/table&gt;  &lt;p&gt;So, based on our test sample, the voucher is pretty effective at keeping customers returning. We can also calculate the success rate using this formula:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-Bd-dM8Nf8cw/Th3AGt3q-tI/AAAAAAAACNY/S7k_4-gEm5g/s1600-h/Unbenannt6%25255B2%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="Unbenannt6" border="0" alt="Unbenannt6" src="http://lh3.ggpht.com/-bSLNTMeWZR4/Th3AHRIcZjI/AAAAAAAACNc/JiWLCWCxA1k/Unbenannt6_thumb.png?imgmax=800" width="409" height="48" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;The success rate of 70% is impressive, but is it also sensible from a cost perspective? To answer this, let’s look at two options that we have:&lt;/p&gt;  &lt;ol&gt;   &lt;li&gt;We give vouchers to &lt;strong&gt;none&lt;/strong&gt; of our customers &lt;/li&gt;    &lt;li&gt;We give vouchers to &lt;strong&gt;all&lt;/strong&gt; of our customers &lt;/li&gt; &lt;/ol&gt;  &lt;p&gt;For both options, we want to calculate the total costs, including costs from losing existing customers. For all of the following examples we assume that we only have 1,000 customers and that we are looking at a specific time frame (for example a year). 
Let’s start with the first option. If we don’t give vouchers to any of our customers, the only costs come from losing customers and therefore losing profit. Since 50% of our customers are not returning and the average profit from each customer was $25, we lose 1,000 x 50% x $25 = $12,500 in this scenario. Now let’s look at the second option: From our test sample above we know that only 15% of the customers do not return after receiving a voucher. In the same way as for the first option we can calculate the loss value as 1,000 x 15% x $25 = $3,750. But for this option we also need to add the costs for the vouchers, which are 1,000 x $10 = $10,000. So the total costs for option 2 are $3,750 + $10,000 = $13,750. To sum things up:&lt;/p&gt;  &lt;table border="0" cellspacing="0" cellpadding="2" width="497"&gt;&lt;tbody&gt;     &lt;tr&gt;       &lt;td valign="top" width="66"&gt;&lt;strong&gt;Option&lt;/strong&gt;&lt;/td&gt;        &lt;td valign="top" width="265"&gt;&lt;strong&gt;&lt;/strong&gt;&lt;/td&gt;        &lt;td valign="top" width="164"&gt;         &lt;p align="right"&gt;&lt;strong&gt;Costs&lt;/strong&gt;&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="69"&gt;1&lt;/td&gt;        &lt;td valign="top" width="263"&gt;vouchers to none of our customers&lt;/td&gt;        &lt;td valign="top" width="164"&gt;         &lt;p align="right"&gt;$ 12,500&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="72"&gt;2&lt;/td&gt;        &lt;td valign="top" width="261"&gt;vouchers to all of our customers&lt;/td&gt;        &lt;td valign="top" width="164"&gt;         &lt;p align="right"&gt;$ 13,750&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;   &lt;/tbody&gt;&lt;/table&gt;  &lt;p&gt;Of course there would also be the option to offer half of our customers a voucher, but this would just mean that we take the average of the two cost values above as shown in the following chart:&lt;/p&gt;  &lt;p&gt;&lt;a 
href="http://lh3.ggpht.com/-wqDxjr-gJ5o/Th3AIK1BziI/AAAAAAAACNg/nxTrfnchI08/s1600-h/image_thumb11%25255B2%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="image_thumb11" border="0" alt="image_thumb11" src="http://lh5.ggpht.com/-BUMwtUXiLjU/Th3AI1AI8zI/AAAAAAAACNk/5HMF_DtlEuk/image_thumb11_thumb.png?imgmax=800" width="609" height="379" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;In this chart, option 1 corresponds to the left endpoint of the line while option 2 corresponds to the right endpoint of the line. So, although our vouchers are very successful, they don’t make sense here as they result in higher costs. Of course we could think of changing the value of the voucher, but we would have to run another test to see how our customer base responds to the new voucher level (I’ll get back to this point in part 3 of this series). In most cases, the random sample method from above would have been done using different groups with different values for the vouchers. The effect on our success rate is usually not linear, so it’s worth finding the best value for a voucher. But we’ll keep things simple here and assume that the $10 voucher is already the best we could do.&lt;/p&gt;    &lt;p&gt;&lt;strong&gt;Data Mining&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;Did I mention data mining in the title? So far we haven’t done any data mining on our data. But maybe data mining gives us an option to optimize the costs in our scenario? &lt;/p&gt;  &lt;p&gt;The main difference between our analysis from above and the data mining approach (and some other options that I will come to) is that now we are looking at the individual customer. In the first approach we just had an average customer drop-off rate of 50%. With data mining, we’re trying to predict the individual rate/score per customer. 
To do so, we have to set up a mining model that is trained with the knowledge we have about our customers (for example from the web registration form) and the individual buying behavior in the past. This model can then be used to predict returning customers. As this post is about cost optimization, we’re not actually building the model here but we just look at the output. So let’s assume that every customer now has a churn score associated with the customer record and that the model is validated and tested and performs well (see my other posts about back testing mining models).&lt;/p&gt;  &lt;p&gt;We can now calculate the individual costs per customer for our two options from above:&lt;/p&gt;  &lt;table border="1" cellspacing="0" cellpadding="2" width="691"&gt;&lt;tbody&gt;     &lt;tr&gt;       &lt;td valign="top" width="250"&gt;&lt;strong&gt;Option&lt;/strong&gt;&lt;/td&gt;        &lt;td valign="top" width="439"&gt;         &lt;p align="center"&gt;&lt;strong&gt;Costs&lt;/strong&gt;&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="250"&gt;Option 1: Customer gets no voucher&lt;/td&gt;        &lt;td valign="top" width="439"&gt;churnscore x [average profit]&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="250"&gt;Option 2: Customer gets voucher&lt;/td&gt;        &lt;td valign="top" width="439"&gt;churnscore x (1 - [prevention success rate]) x [average profit]          &lt;br /&gt;+ [cost of a single voucher]&lt;/td&gt;     &lt;/tr&gt;   &lt;/tbody&gt;&lt;/table&gt;  &lt;p&gt;For the first option, the calculation is simple. We only take the expected value of the profit loss here. If the individual churn score is for example 30% (0.3) then the costs in this calculation are 30% x $25 = $7.50. Please note that although we are looking at a data mining approach here, I’m still using the average profit of the customer. 
We will change this later to see the additional effect.&lt;/p&gt;  &lt;p&gt;For the second option, we’re using our prevention success rate (70% from above). Even with the voucher, some customers will still not be returning. And as in the example from above, we have to add the costs of the voucher itself. If the individual churn score is 30% (0.3 as in the example above), then this case would result in the following costs: 30% x (100%-70%) x $25 + $10 = $12.25.&lt;/p&gt;  &lt;p&gt;For my example I’m using randomly generated test data (with a modified random function). As long as our average rates from above match the scores we’re using here, it is obvious that both approaches give the same result for the two decisions from above: no customer gets a voucher and every customer gets a voucher. So how do we use our churn score now? The answer is simple: Instead of addressing, for example, a random 50% of the customers, we could now address the 50% of the customers with the highest churn scores, as we assume that they are more likely to go away.&lt;/p&gt;  &lt;p&gt;The chart from above may now look like this:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh3.ggpht.com/-2Dmmhycn3X8/Th3AJxY-s1I/AAAAAAAACNo/aJOWA-Q3_14/s1600-h/image_thumb1%25255B2%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="image_thumb1" border="0" alt="image_thumb1" src="http://lh3.ggpht.com/-edNyE9kp_vA/Th3AMAMphdI/AAAAAAAACNs/8RrulPj6Cu8/image_thumb1_thumb.png?imgmax=800" width="609" height="379" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;You can clearly see the effect of the optimization here (green line). As we’re talking about costs, the lower the better. The green line showing our data mining approach has a minimum at approx. 22%. The value at that point is $11,818. 
So, using the data mining approach we could save $682 based on our sample set of 1,000 customers, compared to the total costs of $12,500 for the trivial approach. This is an improvement of about 5.45%. Remember, though, that we’re only looking at sample data here, so for your scenario the output may be totally different. Also, customers with a high churn score might not respond to our vouchers in the same way as customers with a lower churn score.&lt;/p&gt;  &lt;p&gt;In the next part of this post we’ll focus more on the profit value of this scenario.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4879885044062498968-4316781450971455177?l=ms-olap.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ms-olap.blogspot.com/feeds/4316781450971455177/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://ms-olap.blogspot.com/2011/02/profit-calculation-for-churn-prevention.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/4316781450971455177'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/4316781450971455177'/><link rel='alternate' type='text/html' href='http://ms-olap.blogspot.com/2011/02/profit-calculation-for-churn-prevention.html' title='Profit calculation for churn prevention data mining models (part 1 of 3)'/><author><name>Hilmar Buchta</name><uri>http://www.blogger.com/profile/07497529483535542290</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://2.bp.blogspot.com/_rzeWIeWVsxA/ScDkLmopJCI/AAAAAAAABAg/sdlcn6fz96M/S220/Hilmar+20090307_Klein.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' 
url='http://lh3.ggpht.com/-bSLNTMeWZR4/Th3AHRIcZjI/AAAAAAAACNc/JiWLCWCxA1k/s72-c/Unbenannt6_thumb.png?imgmax=800' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4879885044062498968.post-8613995589783051514</id><published>2011-01-31T10:18:00.001+01:00</published><updated>2011-01-31T10:18:32.789+01:00</updated><title type='text'>BOL Community Article about Back-Testing Data Mining Results</title><content type='html'>&lt;p align="right"&gt;SQL Server 2008 | SQL Server 2008 R2&lt;/p&gt;  &lt;p align="left"&gt;In addition to my blog posts about back-testing Data Mining Results (see &lt;a href="http://ms-olap.blogspot.com/2010/09/do-you-trust-your-data-mining-results.html"&gt;part 1&lt;/a&gt;, &lt;a href="http://ms-olap.blogspot.com/2010/10/do-you-trust-your-data-mining-results.html"&gt;part 2&lt;/a&gt; and &lt;a href="http://ms-olap.blogspot.com/2010/12/do-you-trust-your-data-mining-results.html"&gt;part 3&lt;/a&gt;) I also wrote a Books Online Community Article that covers the topic in more detail. The article can be downloaded here: &lt;a title="http://msdn.microsoft.com/en-us/library/gg557481.aspx" href="http://msdn.microsoft.com/en-us/library/gg557481.aspx"&gt;http://msdn.microsoft.com/en-us/library/gg557481.aspx&lt;/a&gt;. The sample data and solution are available for download. 
The link is provided in the article.&lt;/p&gt;  &lt;p align="left"&gt;You will find all the Community Articles for SQL Server 2008 here: &lt;a title="http://msdn.microsoft.com/en-us/library/cc872864(SQL.100).aspx" href="http://msdn.microsoft.com/en-us/library/cc872864(SQL.100).aspx"&gt;http://msdn.microsoft.com/en-us/library/cc872864(SQL.100).aspx&lt;/a&gt;&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4879885044062498968-8613995589783051514?l=ms-olap.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ms-olap.blogspot.com/feeds/8613995589783051514/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://ms-olap.blogspot.com/2011/01/bol-community-article-about-back.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/8613995589783051514'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/8613995589783051514'/><link rel='alternate' type='text/html' href='http://ms-olap.blogspot.com/2011/01/bol-community-article-about-back.html' title='BOL Community Article about Back-Testing Data Mining Results'/><author><name>Hilmar Buchta</name><uri>http://www.blogger.com/profile/07497529483535542290</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://2.bp.blogspot.com/_rzeWIeWVsxA/ScDkLmopJCI/AAAAAAAABAg/sdlcn6fz96M/S220/Hilmar+20090307_Klein.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4879885044062498968.post-907684633757256565</id><published>2011-01-16T10:43:00.001+01:00</published><updated>2011-07-13T17:59:23.760+02:00</updated><title type='text'>Cumulated Gains Chart and Lift Chart in SSRS</title><content type='html'>&lt;p 
align="right"&gt;SQL Server 2008 | SQL Server 2008 R2&lt;/p&gt;  &lt;p&gt;           &lt;p&gt;This post is about reproducing some of the mining charts that are built into Business Intelligence Development Studio using Reporting Services. A common use case might be that you periodically train your model and the results of the training should be published as html reports.&lt;/p&gt;    &lt;p&gt;For my example I use the Adventure Works Targeted Mailing mining structure. We want to plot the results from the “TM Decision Tree Model”. First let’s start with the lift chart. In BIDS, the chart for “Bike Buyer=1” looks like this:&lt;/p&gt;    &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-bsSWesZCsjc/Th3AtuTOh8I/AAAAAAAACNw/hbVkZdveQhw/s1600-h/image9_thumb6%25255B5%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image9_thumb6" border="0" alt="image9_thumb6" src="http://lh3.ggpht.com/-mwMYndYCFS0/Th3AulzDWLI/AAAAAAAACN0/LuCdjvlyQsY/image9_thumb6_thumb%25255B1%25255D.png?imgmax=800" width="644" height="384" /&gt;&lt;/a&gt; &lt;/p&gt;    &lt;p&gt;In order to reproduce the chart, we’re going to use the SystemGetLiftTable stored function in SSAS. 
Since we need to do some computations, I prefer to use SQL, so we have to define a linked server first:&lt;/p&gt;    &lt;p&gt;&lt;font face="Courier New"&gt;EXEC sp_addlinkedserver        &lt;br /&gt;@server='SSAS_AW',         &lt;br /&gt;@srvproduct='',         &lt;br /&gt;@provider='MSOLAP',         &lt;br /&gt;@datasrc='Adventure Works DW 2008R2'&lt;/font&gt;&lt;/p&gt;    &lt;p&gt;The query for loading the results from the lift chart looks like this:&lt;/p&gt;   &lt;font face="Courier New"&gt;SELECT percentile,      &lt;br /&gt;VALUE,       &lt;br /&gt;CASE       &lt;br /&gt;WHEN percentile &amp;gt;= 100 * CONVERT(FLOAT, totalattributevaluecases) /       &lt;br /&gt;totalcases       &lt;br /&gt;THEN 100       &lt;br /&gt;ELSE percentile / (( CONVERT(FLOAT, totalattributevaluecases) /       &lt;br /&gt;totalcases ))       &lt;br /&gt;END idealmodel       &lt;br /&gt;FROM Openquery(ssas_aw,       &lt;br /&gt;'CALL SystemGetLiftTable([TM Decision Tree], 2, ''Bike Buyer'', 1)') AS       &lt;br /&gt;derivedtbl_1&lt;/font&gt;     &lt;p&gt;For the chart, we use the percentile of the lift table as category group (x-axis) and choose these entries for the values (y-axis):&lt;/p&gt;    &lt;ul&gt;     &lt;li&gt;Sum(Value)        &lt;br /&gt;This is the actual lift curve for our mining model &lt;/li&gt;      &lt;li&gt;Sum(Percentile)        &lt;br /&gt;As this is the same value as on the x-axis, this gives the random guess model line (linear) &lt;/li&gt;      &lt;li&gt;Sum(IdealModel)        &lt;br /&gt;The CASE expression above reflects the curve for the ideal model (a linear function rising from 0 to 100 while the percentile goes from 0 to 100 * totalattributevaluecases / totalcases, and constant at 100 afterwards) &lt;/li&gt;   &lt;/ul&gt;    &lt;p&gt;This is what the chart looks like in the SSRS designer:&lt;/p&gt;    &lt;p&gt;&lt;a href="http://lh4.ggpht.com/-4vfjiM5XHMQ/Th3Avh7U7VI/AAAAAAAACN4/Oo_0go1dJws/s1600-h/image_thumb1%25255B5%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; 
padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb1" border="0" alt="image_thumb1" src="http://lh4.ggpht.com/-SNIWACqLo98/Th3AwuxoeNI/AAAAAAAACN8/xlPsSPxxwik/image_thumb1_thumb%25255B1%25255D.png?imgmax=800" width="644" height="273" /&gt;&lt;/a&gt; &lt;/p&gt;    &lt;p&gt;There are different names for this kind of a chart. For this post I’m using the naming conventions as shown here: &lt;a title="http://www2.cs.uregina.ca/~hamilton/courses/831/notes/lift_chart/lift_chart.html" href="http://www2.cs.uregina.ca/~hamilton/courses/831/notes/lift_chart/lift_chart.html"&gt;http://www2.cs.uregina.ca/~hamilton/courses/831/notes/lift_chart/lift_chart.html&lt;/a&gt;, so the chart is labeled “Cumulated Gains Chart” instead of lift chart (BIDS).&lt;/p&gt;    &lt;p&gt;So, here is the resulting Cumulated Gains Chart from my SSRS report:&lt;/p&gt;    &lt;p&gt;&lt;a href="http://lh5.ggpht.com/-mf6mZXjDaCo/Th3Axm9xrBI/AAAAAAAACOA/IZrLJiNYfOo/s1600-h/image_thumb3%25255B5%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb3" border="0" alt="image_thumb3" src="http://lh6.ggpht.com/-33u1yeganLc/Th3AyiFWLmI/AAAAAAAACOE/yEIHPaDR6ls/image_thumb3_thumb%25255B1%25255D.png?imgmax=800" width="644" height="358" /&gt;&lt;/a&gt; &lt;/p&gt;    &lt;p&gt;As you can see, the chart looks pretty much the same as the chart that was displayed in BIDS.&lt;/p&gt;    &lt;p&gt;Another useful chart is the actual lift factor of the model, often referred to as the ‘lift chart’. 
For this chart, we use the same SSAS function but we need to compute the lift in our query: &lt;/p&gt;    &lt;p&gt;&lt;font face="Courier New"&gt;SELECT percentile,        &lt;br /&gt;VALUE / percentile AS lift,         &lt;br /&gt;1 AS randomguess         &lt;br /&gt;FROM Openquery(ssas_aw,         &lt;br /&gt;'CALL SystemGetLiftTable([TM Decision Tree], 2, ''Bike Buyer'', 1)') AS         &lt;br /&gt;derivedtbl_1&lt;/font&gt; &lt;/p&gt;    &lt;p&gt;This is what our chart looks like: &lt;/p&gt;    &lt;p&gt;&lt;a href="http://lh5.ggpht.com/-R98u9xCvplo/Th3Azc5-jGI/AAAAAAAACOI/BYwD4TMMmTE/s1600-h/image_thumb5%25255B5%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb5" border="0" alt="image_thumb5" src="http://lh6.ggpht.com/-1Ygd3RZ0sXw/Th3A0H1EoxI/AAAAAAAACOM/KEpL7Tm3Sb4/image_thumb5_thumb%25255B1%25255D.png?imgmax=800" width="644" height="382" /&gt;&lt;/a&gt; &lt;/p&gt;    &lt;p&gt;Finally, we also want to reproduce the classification matrix (often referred to as a confusion matrix). 
This is what the matrix looks like in BIDS:&lt;/p&gt;    &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-409kj0oXxTk/Th3A0spqJjI/AAAAAAAACOQ/6acELZGqQoU/s1600-h/image_thumb6%25255B5%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb6" border="0" alt="image_thumb6" src="http://lh5.ggpht.com/-AZ-2aFdH-eA/Th3A1ZyucuI/AAAAAAAACOU/tyUvHyRyQR8/image_thumb6_thumb%25255B1%25255D.png?imgmax=800" width="244" height="62" /&gt;&lt;/a&gt; &lt;/p&gt;    &lt;p&gt;To get these results, we use the function SystemGetClassificationMatrix as shown below:&lt;/p&gt;    &lt;p&gt;&lt;font face="Courier New"&gt;SELECT predictedvalue,        &lt;br /&gt;SUM(CASE actualvalue         &lt;br /&gt;WHEN 0 THEN [COUNT]         &lt;br /&gt;ELSE 0         &lt;br /&gt;END) actual_0,         &lt;br /&gt;SUM(CASE actualvalue         &lt;br /&gt;WHEN 1 THEN [COUNT]         &lt;br /&gt;ELSE 0         &lt;br /&gt;END) actual_1         &lt;br /&gt;FROM Openquery(ssas_aw,         &lt;br /&gt;'CALL SystemGetClassificationMatrix( [TM Decision Tree], 2, ''Bike Buyer'')'         &lt;br /&gt;) AS derivedtbl_1         &lt;br /&gt;GROUP BY predictedvalue&lt;/font&gt; &lt;/p&gt;    &lt;p&gt;The second parameter of the function SystemGetClassificationMatrix selects the data set:      &lt;br /&gt;1 – training data       &lt;br /&gt;2 – test data       &lt;br /&gt;3 – both (training and test data)&lt;/p&gt;    &lt;p&gt;I used a simple matrix on my report to display the result from the query above and applied some coloring in order to distinguish error cases from correct cases:&lt;/p&gt;    &lt;p&gt;&lt;a href="http://lh5.ggpht.com/-naoJPNYNtXM/Th3A1390olI/AAAAAAAACOY/jZiDw0B7LOw/s1600-h/image_thumb8%25255B5%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; 
border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb8" border="0" alt="image_thumb8" src="http://lh5.ggpht.com/-AwL8NozLV1s/Th3A2s8lteI/AAAAAAAACOc/UzbPSSP6IH0/image_thumb8_thumb%25255B1%25255D.png?imgmax=800" width="244" height="79" /&gt;&lt;/a&gt;&lt;/p&gt;    &lt;p&gt;To complete this post, I’d like to point out that you can also query your model cases directly, using a query like&lt;/p&gt;    &lt;p&gt;SELECT *, IsTestCase() As TestCase FROM [TM Decision Tree].CASES&lt;/p&gt;    &lt;p&gt;This query returns all the cases from the TM Decision Tree model with an additional column “TestCase” that contains true if the case belongs to the test data set. In order to run this query you need to enable the drill-through option for the mining model. You can find out more about the SSAS mining functions in this post: &lt;a title="http://www.bogdancrivat.net/dm/archives/14" href="http://www.bogdancrivat.net/dm/archives/14"&gt;http://www.bogdancrivat.net/dm/archives/14&lt;/a&gt;&lt;/p&gt;&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4879885044062498968-907684633757256565?l=ms-olap.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ms-olap.blogspot.com/feeds/907684633757256565/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://ms-olap.blogspot.com/2011/01/cumulated-gains-chart-and-lift-chart-in.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/907684633757256565'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/907684633757256565'/><link rel='alternate' type='text/html' href='http://ms-olap.blogspot.com/2011/01/cumulated-gains-chart-and-lift-chart-in.html' title='Cumulated Gains Chart and Lift Chart in 
SSRS'/><author><name>Hilmar Buchta</name><uri>http://www.blogger.com/profile/07497529483535542290</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://2.bp.blogspot.com/_rzeWIeWVsxA/ScDkLmopJCI/AAAAAAAABAg/sdlcn6fz96M/S220/Hilmar+20090307_Klein.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh3.ggpht.com/-mwMYndYCFS0/Th3AulzDWLI/AAAAAAAACN0/LuCdjvlyQsY/s72-c/image9_thumb6_thumb%25255B1%25255D.png?imgmax=800' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4879885044062498968.post-6002320867550255248</id><published>2010-12-06T10:32:00.001+01:00</published><updated>2011-07-13T18:01:21.449+02:00</updated><title type='text'>Do you trust your Data Mining results? – Part 3/3</title><content type='html'>&lt;p align="right"&gt;SQL Server 2005 | SQL Server 2008 | SQL Server 2008 R2&lt;/p&gt;  &lt;p&gt;This is the third and last post of my Data Mining back testing mini series. In my first two posts I assumed that no action is taken based on the prediction. Although the mining process marked some of our contracts with a high probability of being cancelled, we did nothing to prevent our customers from doing so.&lt;/p&gt;  &lt;p&gt;Of course, this is not a real-life scenario. Usually our goal is churn prevention (or some other action based on the mining results). However, any action we take will (hopefully) change the behavior of our customers. Or to be more precise, we expect that fewer customers actually cancel their contracts. This is always a challenge for our mining process as a whole, as the prevention may “dry out” our mining source data. After a while we might not have any non-influenced customers left with a high probability of cancelling the contract. 
A test group would help but in many scenarios this is not wanted (because the customers are just too important to risk losing them merely for the sake of some statistical validation). Another method would be to leave the influenced customers out of the training set. This doesn’t disturb our prediction, but how do we then know whether our prevention is successful enough?&lt;/p&gt;  &lt;p&gt;For our back testing process, we can also add the expected prevention rate to our Monte Carlo procedure. In this case we’re not only trying to validate the model but also the prevention rate. On the other hand, if our test fails we’re not sure if it’s the model or our assumed prevention rate that is actually wrong.&lt;/p&gt;  &lt;p&gt;For the following I assume that every customer with a churn probability greater than 35% receives prevention benefits. From the past we know that 90% of our preventions are successful. Here are the results:&lt;/p&gt;  &lt;p&gt;Old model (no prevention included):    &lt;br /&gt;&lt;a href="http://lh5.ggpht.com/-hLvdfECqJEI/Th3BPwxUlWI/AAAAAAAACOg/dVORw6s6Ql4/s1600-h/image_thumb31%25255B2%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb31" border="0" alt="image_thumb31" src="http://lh5.ggpht.com/-_MJiefk749w/Th3BQuItetI/AAAAAAAACOk/xX0Yg4FHpC4/image_thumb31_thumb.png?imgmax=800" width="434" height="194" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;New model (including prevention):    &lt;br /&gt;&lt;a href="http://lh6.ggpht.com/-uH9kFSSg7tA/Th3BR9Jpz1I/AAAAAAAACOo/c2l8thVdNko/s1600-h/image_thumb1%25255B2%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb1" border="0" alt="image_thumb1" 
src="http://lh5.ggpht.com/-TBX0olgjQ-M/Th3BSlF9DlI/AAAAAAAACOs/8zo33w_NMY4/image_thumb1_thumb.png?imgmax=800" width="443" height="196" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;As expected, the curves have shifted to the left. In the new model (including the prevention) the number of customers cancelling the contract is significantly lower than in the old model.&lt;/p&gt;  &lt;p&gt;We can still use the same methods as shown in the first post to derive the threshold value for our test. These are the curves for the alpha and beta errors (again, for beta we choose an alternative model that is off by 3%).&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh3.ggpht.com/--ZPDTdCbUGA/Th3BTessvLI/AAAAAAAACOw/zlIEKWODRGs/s1600-h/image_thumb3%25255B2%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb3" border="0" alt="image_thumb3" src="http://lh4.ggpht.com/-FyQuI0wX6QI/Th3BUFW6KEI/AAAAAAAACO0/_ebq5GwOesQ/image_thumb3_thumb.png?imgmax=800" width="444" height="250" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;Again, here are some values for the threshold T, alpha and beta:&lt;/p&gt;  &lt;table border="1" cellspacing="0" cellpadding="0" width="295"&gt;&lt;tbody&gt;     &lt;tr&gt;       &lt;td valign="top" width="99"&gt;         &lt;p align="center"&gt;&lt;strong&gt;T&lt;/strong&gt;&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="97"&gt;         &lt;p align="center"&gt;&lt;strong&gt;Alpha&lt;/strong&gt;&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="97"&gt;         &lt;p align="center"&gt;&lt;strong&gt;Beta&lt;/strong&gt;&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="99"&gt;         &lt;p align="right"&gt;1580&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="97"&gt;         &lt;p align="right"&gt;88.7&lt;/p&gt;       &lt;/td&gt;        &lt;td 
valign="top" width="97"&gt;         &lt;p align="right"&gt;0.0&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="99"&gt;         &lt;p align="right"&gt;1600&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="97"&gt;         &lt;p align="right"&gt;75.6&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="97"&gt;         &lt;p align="right"&gt;0.1&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="99"&gt;         &lt;p align="right"&gt;1620&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="97"&gt;         &lt;p align="right"&gt;56.7&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="97"&gt;         &lt;p align="right"&gt;0.4&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="99"&gt;         &lt;p align="right"&gt;1640&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="97"&gt;         &lt;p align="right"&gt;36.5&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="97"&gt;         &lt;p align="right"&gt;1.6&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="99"&gt;         &lt;p align="right"&gt;1660&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="97"&gt;         &lt;p align="right"&gt;19.5&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="97"&gt;         &lt;p align="right"&gt;5.1&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="99"&gt;         &lt;p align="right"&gt;1680&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="97"&gt;         &lt;p align="right"&gt;8.4&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="97"&gt;         &lt;p align="right"&gt;13.3&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="99"&gt;         &lt;p align="right"&gt;1700&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" 
width="97"&gt;         &lt;p align="right"&gt;3.0&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="97"&gt;         &lt;p align="right"&gt;27.0&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="99"&gt;         &lt;p align="right"&gt;1720&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="97"&gt;         &lt;p align="right"&gt;0.9&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="97"&gt;         &lt;p align="right"&gt;46.0&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="99"&gt;         &lt;p align="right"&gt;1740&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="97"&gt;         &lt;p align="right"&gt;0.2&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="97"&gt;         &lt;p align="right"&gt;65.7&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;   &lt;/tbody&gt;&lt;/table&gt;  &lt;p&gt;For example, with T=1680 we get alpha&amp;lt;10% and beta&amp;lt;14%. But as said above, if our model fails the test we can no longer be sure whether it is the model or the prevention rate that is wrong. Again, the best way to avoid this uncertainty would be to use test groups. For example, we could exclude a random sample of 10% of all contracts with a churn score above 35% from prevention. Then we’re able to compare the behavior of the random sample (test group) with the customers that received prevention benefits. This test group enables us to measure the effectiveness of our prevention activity. And of course you could set up a statistical test in the same way we described here, to prove that the test group really supports a certain success rate.&lt;/p&gt;  &lt;p&gt;To wrap things up, depending on the products, processes, the way new customers are acquired etc., the real-life data mining problems may be much more difficult than presented in this small blog series. 
Anyway, the methods described here, like the Monte Carlo algorithm or the statistical hypothesis test, are the main building blocks for achieving a reliable understanding of how well the model performs. So, depending on the actual scenario, the above-mentioned methods may be combined, adjusted and repeated to get the desired results.&lt;/p&gt;  &lt;p&gt;This also ends this mini series about back testing. The basic methods are more or less ubiquitous and may also be used for several similar problems, for example to do the back testing for a financial risk model. By adjusting and combining the methods, a specific business requirement may be modeled into a reliable test. And as long as our model conforms to the test, we can have enough trust in our data mining models to use them as the basis for business decisions.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4879885044062498968-6002320867550255248?l=ms-olap.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ms-olap.blogspot.com/feeds/6002320867550255248/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://ms-olap.blogspot.com/2010/12/do-you-trust-your-data-mining-results.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/6002320867550255248'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/6002320867550255248'/><link rel='alternate' type='text/html' href='http://ms-olap.blogspot.com/2010/12/do-you-trust-your-data-mining-results.html' title='Do you trust your Data Mining results? 
– Part 3/3'/><author><name>Hilmar Buchta</name><uri>http://www.blogger.com/profile/07497529483535542290</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://2.bp.blogspot.com/_rzeWIeWVsxA/ScDkLmopJCI/AAAAAAAABAg/sdlcn6fz96M/S220/Hilmar+20090307_Klein.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh5.ggpht.com/-_MJiefk749w/Th3BQuItetI/AAAAAAAACOk/xX0Yg4FHpC4/s72-c/image_thumb31_thumb.png?imgmax=800' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4879885044062498968.post-533537400356653606</id><published>2010-10-10T13:09:00.001+02:00</published><updated>2011-07-13T18:02:40.445+02:00</updated><title type='text'>Do you trust your Data Mining results? – Part 2/3</title><content type='html'>&lt;p align="right"&gt;SQL Server 2005 | SQL Server 2008 | SQL Server 2008 R2&lt;/p&gt;  &lt;p&gt;While the first part of this post was more about the idea and interpreting the results of the test, this part shows how to implement the Monte Carlo test.&lt;/p&gt;  &lt;p&gt;First, we need a table with the predicted data mining probabilities. This is the output of the PredictProbability function from your mining result query. I’m using the same source data as in my previous post here. 
If you like you can easily create your own table and populate it with random probability values in order to test the code for the simulation below:&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;CREATE TABLE [dbo].[Mining_Result](      &lt;br /&gt;[CaseKey] [int] NOT NULL,       &lt;br /&gt;[PredictScore] [float] NULL       &lt;br /&gt;) ON [PRIMARY]&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;declare @i int=0 &lt;/font&gt;&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;while (@i&amp;lt;10000) begin &lt;/font&gt;&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;insert into Mining_Result(CaseKey, PredictScore)      &lt;br /&gt;values(@i, convert(float,CAST(CAST(newid() AS binary(4)) AS int))/2147483648.0/2+.5) &lt;/font&gt;&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;set @i=@i+1 &lt;/font&gt;&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;end&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;Don’t be confused by the convert(…cast…cast newid()…) expression. This is just my approach to calculate a random number within an SQL select statement.&lt;/p&gt;  &lt;p&gt;Next we need a table for storing our Mining results:&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;CREATE TABLE [dbo].[Mining_Histogram](      &lt;br /&gt;[NumCases] [int] NOT NULL,       &lt;br /&gt;[Count] [int] NULL,       &lt;br /&gt;[Perc] [float] NULL,       &lt;br /&gt;[RunningPerc] [float] NULL,       &lt;br /&gt;CONSTRAINT [PK_DM_Histogram] PRIMARY KEY CLUSTERED       &lt;br /&gt;(       &lt;br /&gt;[NumCases] ASC       &lt;br /&gt;)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]       &lt;br /&gt;) ON [PRIMARY]&lt;/font&gt;&lt;/p&gt;    &lt;p&gt;Then this is how we’re doing our Monte Carlo test:&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;truncate table Mining_Histogram; &lt;/font&gt;&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;declare @numtrials int = 10000;      &lt;br 
/&gt;declare @cnt int;       &lt;br /&gt;declare @lp int; &lt;/font&gt;&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;set @lp=0; &lt;/font&gt;&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;-- perform a monte carlo test: &lt;/font&gt;&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;while (@lp&amp;lt;@numtrials) begin &lt;/font&gt;&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;select @cnt=COUNT(*) from Mining_Result where PredictScore &amp;gt;      &lt;br /&gt;convert(float,CAST(CAST(newid() AS binary(4)) AS int))/2147483648.0/2+.5       &lt;br /&gt;if exists(select NumCases from Mining_Histogram Where NumCases=@cnt)       &lt;br /&gt;update Mining_Histogram set [Count]=[Count]+1 Where NumCases=@cnt       &lt;br /&gt;else       &lt;br /&gt;insert into Mining_Histogram(NumCases,[Count]) values (@cnt, 1)       &lt;br /&gt;set @lp=@lp+1; &lt;/font&gt;&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;end&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;I’m using the same trick for the random numbers as shown above. In this example, we’re doing 10,000 iterations. For each iteration we compute the number of cases for which the predicted score is higher than a random number. 
For example, if for a certain case the predicted score is 0.8, it is more likely that a random number between 0.0 and 1.0 is below the score than for a predicted score of 0.1.&lt;/p&gt;  &lt;p&gt;Next, we’re filling the gaps in our histogram table with zeros to make the histogram look nicer:&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;declare @min int;      &lt;br /&gt;declare @max int;       &lt;br /&gt;select @min=MIN(NumCases), @max=MAX(NumCases) from Mining_Histogram &lt;/font&gt;&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;set @lp=@min;      &lt;br /&gt;while (@lp&amp;lt;@max) begin       &lt;br /&gt;if not exists(select NumCases From Mining_Histogram Where NumCases=@lp)       &lt;br /&gt;insert into Mining_Histogram(NumCases,[Count]) values (@lp, 0);       &lt;br /&gt;set @lp=@lp+1       &lt;br /&gt;end&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;Finally we’re computing the row probability and the running total using this T-SQL:&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;declare @maxcount float;      &lt;br /&gt;select @maxcount=SUM([Count]) from Mining_Histogram;       &lt;br /&gt;update Mining_Histogram Set Perc=[Count]/@maxcount; &lt;/font&gt;&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;declare @CaseIdx int      &lt;br /&gt;declare @perc float       &lt;br /&gt;declare @RunningTotal float =0 &lt;/font&gt;&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;-- order by NumCases so the running total accumulates in ascending order      &lt;br /&gt;DECLARE rt_cursor CURSOR FOR select NumCases, Perc From Mining_Histogram order by NumCases      &lt;br /&gt;OPEN rt_cursor &lt;/font&gt;&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;FETCH NEXT FROM rt_cursor INTO @CaseIdx, @perc &lt;/font&gt;&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;WHILE @@FETCH_STATUS = 0      &lt;br /&gt;BEGIN       &lt;br /&gt;SET @RunningTotal = @RunningTotal + @perc       &lt;br /&gt;update Mining_Histogram set RunningPerc=@RunningTotal Where NumCases=@CaseIdx       &lt;br /&gt;FETCH NEXT FROM rt_cursor INTO @CaseIdx, @perc       &lt;br /&gt;END &lt;/font&gt;&lt;/p&gt; 
 &lt;p&gt;&lt;font face="Courier New"&gt;CLOSE rt_cursor      &lt;br /&gt;DEALLOCATE rt_cursor&lt;/font&gt;&lt;/p&gt;    &lt;p&gt;After running the simulation, this is what the plots of the result look like (using my own values). The first plot shows the value of the field NumCases on the x-axis and the value of the field Perc on the y-axis. The second plot has the same x-axis but shows the RunningPerc field on the y-axis: &lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-evXCaC3msWI/Th3BiAtcR0I/AAAAAAAACO4/zM9UdBc2Srs/s1600-h/image_thumb1%25255B2%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb1" border="0" alt="image_thumb1" src="http://lh4.ggpht.com/-GXM_03UILQ8/Th3Biz0fmUI/AAAAAAAACO8/B9xcmTHpSyw/image_thumb1_thumb.png?imgmax=800" width="379" height="201" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-oCWIl3Cuefc/Th3BjsW-C9I/AAAAAAAACPA/Eff2m65LPY0/s1600-h/image_thumb2%25255B2%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb2" border="0" alt="image_thumb2" src="http://lh6.ggpht.com/-at8rZbM73kE/Th3BkDWI8gI/AAAAAAAACPE/nNa-nzxxBmM/image_thumb2_thumb.png?imgmax=800" width="381" height="227" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;These two plots look very much the same as the plots from my last post (although I used C# code there to generate the histogram data).&lt;/p&gt;  &lt;p&gt;If you used the randomly generated scores from above for testing, you will notice the peak being around 5000 cases (instead of 2800 cases in my example).&lt;/p&gt;  &lt;p&gt;And if you like a smoother version of the density function (as all the teeth and bumps mainly result from the Monte Carlo 
approach), you could use this SQL query to compute a moving average:&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;declare @minrange int=0      &lt;br /&gt;declare @windowsize int = 50 &lt;/font&gt;&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;select @minrange=Min(NumCases) from Mining_Histogram &lt;/font&gt;&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;SELECT H.NumCases, AVG(H1.[Count]) [Count], AVG(H1.Perc) Perc      &lt;br /&gt;FROM Mining_Histogram H       &lt;br /&gt;left join Mining_Histogram H1 on H1.NumCases between H.NumCases-@windowsize and H.NumCases &lt;/font&gt;&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;where H.NumCases&amp;gt;@minrange+@windowsize &lt;/font&gt;&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;group by H.NumCases&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh4.ggpht.com/-dUGNnUDx8AM/Th3BnPP2ccI/AAAAAAAACPI/5GHfMiNHUvM/s1600-h/image_thumb4%25255B2%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb4" border="0" alt="image_thumb4" src="http://lh6.ggpht.com/-Yjhu6a-BIyY/Th3Bn-XRYLI/AAAAAAAACPM/Cc9rdQU5WqU/image_thumb4_thumb.png?imgmax=800" width="405" height="215" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;In order to do the histogram computation automatically together with the prediction query, I recommend putting the code in an SSIS script component. I would also use another type of random number generator there, which also allows you to set the seed for the random number generator. For my implementation I used an asynchronous script component that first loads all cases into memory (ArrayList collection), then performs the Monte Carlo test on the in-memory data and then writes the results back to the output buffer. 
This allows you to work with more scenarios and to log the progress during the loading and testing phase of the component.&lt;/p&gt;  &lt;p&gt;I’m planning to write a Books Online Community Technical article on this topic. This article will be more detailed regarding the implementation. I will post a link to this article in my blog then.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4879885044062498968-533537400356653606?l=ms-olap.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ms-olap.blogspot.com/feeds/533537400356653606/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://ms-olap.blogspot.com/2010/10/do-you-trust-your-data-mining-results.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/533537400356653606'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/533537400356653606'/><link rel='alternate' type='text/html' href='http://ms-olap.blogspot.com/2010/10/do-you-trust-your-data-mining-results.html' title='Do you trust your Data Mining results? 
– Part 2/3'/><author><name>Hilmar Buchta</name><uri>http://www.blogger.com/profile/07497529483535542290</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://2.bp.blogspot.com/_rzeWIeWVsxA/ScDkLmopJCI/AAAAAAAABAg/sdlcn6fz96M/S220/Hilmar+20090307_Klein.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh4.ggpht.com/-GXM_03UILQ8/Th3Biz0fmUI/AAAAAAAACO8/B9xcmTHpSyw/s72-c/image_thumb1_thumb.png?imgmax=800' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4879885044062498968.post-390765911967308161</id><published>2010-09-19T18:18:00.001+02:00</published><updated>2011-07-13T18:04:46.305+02:00</updated><title type='text'>Do you trust your Data Mining results? – Part 1/3</title><content type='html'>&lt;p align="right"&gt;SQL Server 2005 | SQL Server 2008 | SQL Server 2008 R2&lt;/p&gt;  &lt;p&gt;Data Mining has been built into SQL Server since version 2005 and it’s quite comfortable and wizard-driven to design your mining models. However, Data Mining is not much about the toolkit but more about data preparation and interpreting the results. Without a proper data preparation, the algorithms will fail in really predicting or clustering the data. And the same is true for the interpretation of the results. But before we can start interpreting the data, we have to trust the results. At the design time of each mining model we can use test case holdouts, lift charts and cross validation to see if the model is robust and meaningful. But most of our prediction models try to predict a future behavior based on the knowledge of today and the past. What if there are significant changes in the market that are not already trained into our models? Is our model still correct or are we missing an important variable?&lt;/p&gt;  &lt;p&gt;Today’s post is about implementing a back testing process to validate the mining results. 
For our example, we use a churn score prediction model. Think of a telecommunication company: Each customer has a 12-month contract that can be cancelled by the customer at the end of the period. If the contract is not cancelled, it’s continued for another 12 months. We want to predict how many customers are going to cancel their contracts during the next 3 months (the 3-month latency has to be built into our training set, but this is a different topic). To keep things simple in the first step, let’s assume the company is not going to use the mining results (no churn prevention) but just waits 3 months to compare the reality with the prediction. That’s what we’re doing during back testing.&lt;/p&gt;  &lt;p&gt;So let’s assume that we did a churn prediction 3 months ago. Our results are returned from our data mining model as a table like this:&lt;/p&gt;  &lt;table border="1" cellspacing="0" cellpadding="0" width="375"&gt;&lt;tbody&gt;     &lt;tr&gt;       &lt;td valign="top" width="84"&gt;         &lt;p align="center"&gt;&lt;strong&gt;CaseKey&lt;/strong&gt;&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="146"&gt;         &lt;p align="center"&gt;&lt;strong&gt;Churn              &lt;br /&gt;(&lt;/strong&gt;&lt;strong&gt;Churn Prediction)&lt;/strong&gt;&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="143"&gt;         &lt;p align="center"&gt;&lt;strong&gt;ChurnProbability              &lt;br /&gt;(Churn Probability %)&lt;/strong&gt;&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="84"&gt;         &lt;p align="right"&gt;1&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="146"&gt;         &lt;p align="right"&gt;true&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="143"&gt;         &lt;p align="right"&gt;87.4&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="84"&gt;         &lt;p align="right"&gt;2&lt;/p&gt;       &lt;/td&gt;    
    &lt;td valign="top" width="146"&gt;         &lt;p align="right"&gt;false&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="143"&gt;         &lt;p align="right"&gt;7.1&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="84"&gt;         &lt;p align="right"&gt;3&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="146"&gt;         &lt;p align="right"&gt;false&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="143"&gt;         &lt;p align="right"&gt;1.7&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="84"&gt;         &lt;p align="right"&gt;4&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="146"&gt;         &lt;p align="right"&gt;true&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="143"&gt;         &lt;p align="right"&gt;50.2&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="84"&gt;         &lt;p align="right"&gt;5&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="146"&gt;         &lt;p align="right"&gt;false&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="143"&gt;         &lt;p align="right"&gt;11.3&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="84"&gt;         &lt;p align="right"&gt;6&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="146"&gt;         &lt;p align="right"&gt;false&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="143"&gt;         &lt;p align="right"&gt;16.0&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="84"&gt;         &lt;p align="right"&gt;7&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="146"&gt;         &lt;p align="right"&gt;false&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="143"&gt;         &lt;p align="right"&gt;6.9&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td 
valign="top" width="84"&gt;         &lt;p align="right"&gt;8&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="146"&gt;         &lt;p align="right"&gt;false&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="143"&gt;         &lt;p align="right"&gt;1.8&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="84"&gt;         &lt;p align="right"&gt;9&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="146"&gt;         &lt;p align="right"&gt;false&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="143"&gt;         &lt;p align="right"&gt;2.6&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="84"&gt;         &lt;p align="right"&gt;10&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="146"&gt;         &lt;p align="right"&gt;false&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="143"&gt;         &lt;p align="right"&gt;18.7&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="84"&gt;         &lt;p align="right"&gt;…&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="146"&gt;         &lt;p align="right"&gt;…&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="143"&gt;         &lt;p align="right"&gt;…&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;   &lt;/tbody&gt;&lt;/table&gt;  &lt;p&gt;For my example I have 30,000 contracts (cases). Churn prediction = true means that this contract (case) is likely to be cancelled by our customer (predicted by the mining model). In my dataset this is true for 2010 cases. I left some more columns out here, but usually you are also looking at the support and other measures. &lt;/p&gt;  &lt;p&gt;Now, 3 months have passed and we want to check how good our initial data mining model was. As said above, we didn’t do anything to prevent our 2010 cases from cancelling their contracts. 
Now, looking at our CRM system reveals that actually 2730 customers of those 30,000 cancelled their contract. What does this mean? We expected 2010 to cancel but in reality it was 2730. Does this mean our model is wrong? Or can we still rely on the model?&lt;/p&gt;  &lt;p&gt;The key to answering this question is to compute how likely it is that&lt;/p&gt;  &lt;p&gt;a) Our model is correct and    &lt;br /&gt;b) We see 2730 customers cancel their contract&lt;/p&gt;  &lt;p&gt;Just to avoid confusion, we’re not looking at the error cases &lt;em&gt;within&lt;/em&gt; our prediction (as we would do with a Receiver Operating Characteristics analysis) but we try to validate the model itself.&lt;/p&gt;  &lt;p&gt;If the churn probability were the same for each case, we could use a binomial test to validate the model (see &lt;a href="http://ms-olap.blogspot.com/2008/06/binomial-distribution-for-kpi-status.html"&gt;one of my very first blog posts about this topic&lt;/a&gt;). Another way to do this computation is to run a Monte Carlo scenario generator against our data from above. Basically, the test works like this:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;Do N loops (scenarios)      &lt;ul&gt;       &lt;li&gt;Set the number of cancelled contracts x for this scenario to zero initially &lt;/li&gt;        &lt;li&gt;Look at each case          &lt;ul&gt;           &lt;li&gt;Compute a random number r and compare this number with the ChurnProbability p &lt;/li&gt;            &lt;li&gt;If r&amp;lt;p, count this case as cancelled (increment x) &lt;/li&gt;         &lt;/ul&gt;       &lt;/li&gt;        &lt;li&gt;Increment the number of occurrences of x cancellations &lt;/li&gt;     &lt;/ul&gt;   &lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;I’m showing an implementation of this approach in my next post but for today let’s just concentrate on interpreting the results. For my example I used an SSIS script component to actually perform the Monte Carlo test. 
I used 30,000 scenarios and ended up with the following result:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh4.ggpht.com/-1wefTFvX9Po/Th3CCOHK9sI/AAAAAAAACPQ/OSMlq2f-zNY/s1600-h/image_thumb3%25255B2%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb3" border="0" alt="image_thumb3" src="http://lh3.ggpht.com/-DPbS1bDh638/Th3CC1AuemI/AAAAAAAACPU/tQOe7R_tnMo/image_thumb3_thumb.png?imgmax=800" width="408" height="181" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;As you can see, most of my scenarios ended with approx. 2800 cancellations (peak in the chart). This might be the first surprise. Assuming the mining algorithm was right, there are still many more cancellations than the predicted churn column suggests. How can this be? Well, actually the predicted value follows a very simple rule: It switches at 50%. This is a strong simplification of the true distribution. 
So instead of looking at the predicted values, you should rather look at the expectancy value (the expected number of cancellations):&lt;/p&gt;  &lt;table border="0" cellspacing="0" cellpadding="2" width="709"&gt;&lt;tbody&gt;     &lt;tr&gt;       &lt;td valign="top" width="348"&gt;&lt;strong&gt;&lt;u&gt;Predicted value&lt;/u&gt;&lt;/strong&gt;&lt;/td&gt;        &lt;td valign="top" width="359"&gt;&lt;strong&gt;&lt;u&gt;Expectancy value&lt;/u&gt;&lt;/strong&gt;&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="348"&gt;select count(*) from Mining_Result where churn=1&lt;/td&gt;        &lt;td valign="top" width="359"&gt;select SUM(ChurnProbability) from Mining_Result&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="348"&gt;Result: 2010&lt;/td&gt;        &lt;td valign="top" width="359"&gt;Result: 2784&lt;/td&gt;     &lt;/tr&gt;   &lt;/tbody&gt;&lt;/table&gt;  &lt;p&gt;As you can see, the expectancy value matches our distribution histogram from above. In most situations, the expectancy value differs from the value count. This is highly dependent on the distribution of the probability values. For example, doing the same test with the bike buyer decision tree model in Adventure Works I got 9939 cases with a predicted value of 1 for the BikeBuyer variable. Here the expectancy value is about 9135, so in this case it is lower than the number of predicted cases.&lt;/p&gt;  &lt;p&gt;Back to our histogram from above. We can easily replace the number of cases by the percentage value of the total cases. This results in the probability density function. In order to proceed we have to use the aggregated density function (the cumulative distribution function). 
For our example, this function looks like this:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh5.ggpht.com/-qeHknkwR9FQ/Th3CDmvBidI/AAAAAAAACPY/nr3XK-MNX-U/s1600-h/image_thumb5%25255B2%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb5" border="0" alt="image_thumb5" src="http://lh5.ggpht.com/-cy1n6mjhq6k/Th3CEeZ-yPI/AAAAAAAACPc/tqay_Xy44so/image_thumb5_thumb.png?imgmax=800" width="415" height="208" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;This function tells us the probability of seeing fewer than a certain number of cancellations. As expected, the probability of seeing fewer than 30,000 cancellations is 100% (as we only have 30,000 customers who could cancel). On the other hand, the probability of seeing fewer than 0 cancellations is 0%. Again it may be a surprise to see that the probability of seeing fewer than 2600 cancellations is actually close to 0 (from the graph above). How does this look around our real number of 2730 cancelled contracts? 
Here is the extract from the table:&lt;/p&gt;  &lt;table border="1" cellspacing="0" cellpadding="0" width="399"&gt;&lt;tbody&gt;     &lt;tr&gt;       &lt;td valign="top" width="89"&gt;         &lt;p align="center"&gt;&lt;strong&gt;NumCases&lt;/strong&gt;&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="148"&gt;         &lt;p align="center"&gt;&lt;strong&gt;TotalProbability %&lt;/strong&gt;&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="160"&gt;         &lt;p align="center"&gt;&lt;strong&gt;1-TotalProbability %&lt;/strong&gt;&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="89"&gt;         &lt;p align="right"&gt;…&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="148"&gt;         &lt;p align="right"&gt;…&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="160"&gt;         &lt;p align="right"&gt;…&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="89"&gt;         &lt;p align="right"&gt;2726&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="148"&gt;         &lt;p align="right"&gt;7.9&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="160"&gt;         &lt;p align="right"&gt;92.1&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="89"&gt;         &lt;p align="right"&gt;2727&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="148"&gt;         &lt;p align="right"&gt;8.3&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="160"&gt;         &lt;p align="right"&gt;91.7&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="89"&gt;         &lt;p align="right"&gt;2728&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="148"&gt;         &lt;p align="right"&gt;8.7&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="160"&gt;         &lt;p align="right"&gt;91.3&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      
&lt;tr&gt;       &lt;td valign="top" width="89"&gt;         &lt;p align="right"&gt;2729&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="148"&gt;         &lt;p align="right"&gt;9.1&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="160"&gt;         &lt;p align="right"&gt;90.9&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="89"&gt;         &lt;p align="right"&gt;&lt;strong&gt;&lt;font color="#ff0000"&gt;2730&lt;/font&gt;&lt;/strong&gt;&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="148"&gt;         &lt;p align="right"&gt;&lt;strong&gt;&lt;font color="#ff0000"&gt;9.5&lt;/font&gt;&lt;/strong&gt;&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="160"&gt;         &lt;p align="right"&gt;&lt;strong&gt;&lt;font color="#ff0000"&gt;90.5&lt;/font&gt;&lt;/strong&gt;&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="89"&gt;         &lt;p align="right"&gt;2731&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="148"&gt;         &lt;p align="right"&gt;9.9&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="160"&gt;         &lt;p align="right"&gt;90.1&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="89"&gt;         &lt;p align="right"&gt;2732&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="148"&gt;         &lt;p align="right"&gt;10.3&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="160"&gt;         &lt;p align="right"&gt;89.7&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="89"&gt;         &lt;p align="right"&gt;2733&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="148"&gt;         &lt;p align="right"&gt;10.8&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="160"&gt;         &lt;p align="right"&gt;89.2&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" 
width="89"&gt;         &lt;p align="right"&gt;2734&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="148"&gt;         &lt;p align="right"&gt;11.2&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="160"&gt;         &lt;p align="right"&gt;88.8&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="89"&gt;         &lt;p align="right"&gt;…&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="148"&gt;         &lt;p align="right"&gt;…&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="160"&gt;         &lt;p align="right"&gt;…&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;   &lt;/tbody&gt;&lt;/table&gt;  &lt;p&gt;In this table, the total probability is the cumulative probability from our chart above, i.e. the probability of seeing fewer than NumCases cancellations, while 1 minus the total probability is the probability of seeing more than NumCases cancellations.&lt;/p&gt;  &lt;p&gt;From this table you can see that the probability of seeing more than 2730 cancellations is still about 90.5%. 
Now let’s look at the area between 2740 and 2920 cancellations (to reduce the number of lines I’m only showing every 20th row):&lt;/p&gt;  &lt;table border="1" cellspacing="0" cellpadding="0" width="400"&gt;&lt;tbody&gt;     &lt;tr&gt;       &lt;td valign="top" width="89"&gt;         &lt;p align="center"&gt;&lt;b&gt;NumCases&lt;/b&gt;&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="145"&gt;         &lt;p align="center"&gt;&lt;b&gt;TotalProbability %&lt;/b&gt;&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="164"&gt;         &lt;p align="center"&gt;&lt;b&gt;1-TotalProbability %&lt;/b&gt;&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="89"&gt;         &lt;p align="right"&gt;…&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="145"&gt;         &lt;p align="right"&gt;…&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="164"&gt;         &lt;p align="right"&gt;…&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="89"&gt;         &lt;p align="right"&gt;2740&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="145"&gt;         &lt;p align="right"&gt;14.4&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="164"&gt;         &lt;p align="right"&gt;85.6&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="89"&gt;         &lt;p align="right"&gt;2760&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="145"&gt;         &lt;p align="right"&gt;28.4&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="164"&gt;         &lt;p align="right"&gt;71.6&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="89"&gt;         &lt;p align="right"&gt;2780&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="145"&gt;         &lt;p align="right"&gt;46.5&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="164"&gt;         &lt;p 
align="right"&gt;53.5&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="89"&gt;         &lt;p align="right"&gt;2800&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="145"&gt;         &lt;p align="right"&gt;65.1&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="164"&gt;         &lt;p align="right"&gt;34.9&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="89"&gt;         &lt;p align="right"&gt;2820&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="145"&gt;         &lt;p align="right"&gt;80.7&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="164"&gt;         &lt;p align="right"&gt;19.3&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="89"&gt;         &lt;p align="right"&gt;2840&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="145"&gt;         &lt;p align="right"&gt;91.0&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="164"&gt;         &lt;p align="right"&gt;9.0&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="89"&gt;         &lt;p align="right"&gt;2860&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="145"&gt;         &lt;p align="right"&gt;96.6&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="164"&gt;         &lt;p align="right"&gt;3.4&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="89"&gt;         &lt;p align="right"&gt;2880&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="145"&gt;         &lt;p align="right"&gt;98.9&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="164"&gt;         &lt;p align="right"&gt;1.1&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="89"&gt;         &lt;p align="right"&gt;2900&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="145"&gt;         &lt;p 
align="right"&gt;99.7&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="164"&gt;         &lt;p align="right"&gt;0.3&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="89"&gt;         &lt;p align="right"&gt;2920&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="145"&gt;         &lt;p align="right"&gt;99.9&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="164"&gt;         &lt;p align="right"&gt;0.1&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="89"&gt;         &lt;p align="right"&gt;…&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="145"&gt;         &lt;p align="right"&gt;…&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="164"&gt;         &lt;p align="right"&gt;…&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;   &lt;/tbody&gt;&lt;/table&gt;  &lt;p&gt;While it is still likely (86%) that we see more than 2740 cancellations, this becomes less and less likely the higher the number of cancellations gets. Seeing more than 2900 cancellations is very unlikely (less than 1%). Of course, this only holds if the model is operating correctly.&lt;/p&gt;  &lt;p&gt;In order to make our back testing an easy procedure we want to define a simple threshold T. Our model passes the test as long as there are no more cancellations than T. If the model does not pass the test, we have to re-validate the variables and check the overall state of the model. We do not want to do this too often. Therefore, the probability of our model being correct and still not passing the test should be less than 10% (remember that we will run the mining prediction over and over again). Now we need to find a proper value for T.&lt;/p&gt;  &lt;p&gt;From our table above we can see that T is close to 2840 cancellations. 
We can query the correct value from our histogram table:&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;select MIN(numcases) from mining_histogram where 1-TotalProbability&amp;lt;0.1&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;The result is T=2838 for my example. So our 2730 cancellations from above are definitely below our test threshold T and therefore our model clearly passes the test.&lt;/p&gt;  &lt;p&gt;Now that we’ve set a threshold for the condition “model correct and cancellations&amp;gt;2838”, what about the situation in which our model is incorrect? In this case we would assume that the real probability of a customer cancelling the contract is higher than predicted by our model. This is of course only one assumption we could make. Depending on the conditions and the environment, the definition of the “wrong model” can be different. In any case we have to define a “wrong” or alternative model.&lt;/p&gt;  &lt;p&gt;For our example, this is what our model looks like with a 3% higher probability of cancellation (blue line).&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh3.ggpht.com/-45YQt29WNjY/Th3CFEZ1ArI/AAAAAAAACPg/fpclf4-gNJM/s1600-h/image_thumb9%25255B2%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb9" border="0" alt="image_thumb9" src="http://lh5.ggpht.com/-nRuR4IxhWdk/Th3CFzBk0WI/AAAAAAAACPk/Uohli9aOPJs/image_thumb9_thumb.png?imgmax=800" width="420" height="184" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;For this chart, we have to create a separate histogram table for our alternative (wrong) model and also calculate the results in our Monte Carlo process. Again, we can read the probability for the condition “model is wrong but passes the test” from our histogram table. 
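&lt;/p&gt;  &lt;p&gt;As a minimal sketch, this lookup could be written in the same style as the threshold query above. It assumes that the histogram of the alternative model is stored in a table called mining_histogram_alt (a hypothetical name used here only for illustration) with the same columns as mining_histogram:&lt;/p&gt;  &lt;pre&gt;-- Sketch only: mining_histogram_alt is an assumed table name for the
-- histogram of the alternative (wrong) model.
-- TotalProbability at the threshold T is the probability that the
-- alternative model stays within T and therefore passes the test.
select TotalProbability as beta
from mining_histogram_alt
where numcases = 2838&lt;/pre&gt;  &lt;p&gt;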
In our case it’s about 24.4%.&lt;/p&gt;  &lt;p&gt;If you’re not interested in a more statistical view of our test you can now skip to the conclusion below.&lt;/p&gt;  &lt;p&gt;Otherwise you probably already know that there are two possible mistakes we could make here:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;model is correct but fails the test, usually referred to as type 1 or alpha error (false positive) &lt;/li&gt;    &lt;li&gt;model is incorrect but passes the test, usually referred to as type 2 or beta error (false negative) &lt;/li&gt; &lt;/ul&gt;  &lt;table border="1" cellspacing="0" cellpadding="2" width="499"&gt;&lt;tbody&gt;     &lt;tr&gt;       &lt;td valign="top" width="164"&gt;&amp;nbsp;&lt;/td&gt;        &lt;td valign="top" width="167"&gt;         &lt;p align="center"&gt;&lt;strong&gt;Model is correct&lt;/strong&gt;&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="166"&gt;         &lt;p align="center"&gt;&lt;strong&gt;Model is incorrect              &lt;br /&gt;(alternative model is correct)&lt;/strong&gt;&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="164"&gt;         &lt;p align="center"&gt;&lt;strong&gt;Test is negative, meaning the m&lt;/strong&gt;&lt;strong&gt;odel passes the test&lt;/strong&gt;&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="168"&gt;&lt;font color="#008000"&gt;correct result(probability=1-alpha, so called specificity of the test)&lt;/font&gt;&lt;/td&gt;        &lt;td valign="top" width="166"&gt;&lt;font color="#ff0000"&gt;type 2 error / beta error&lt;/font&gt;&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="163"&gt;         &lt;p align="center"&gt;&lt;strong&gt;Test is positive, meaning the model does not pass the test&lt;/strong&gt;&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="169"&gt;&lt;font color="#ff0000"&gt;type 1 error / alpha error&lt;/font&gt;&lt;/td&gt;        &lt;td valign="top" width="166"&gt;&lt;font 
color="#008000"&gt;correct result (probability=1-beta, so called power or sensitivity of the test)&lt;/font&gt;&lt;/td&gt;     &lt;/tr&gt;   &lt;/tbody&gt;&lt;/table&gt;  &lt;p&gt;In order to get a better understanding of our test situation we can plot alpha and beta together into one chart:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh4.ggpht.com/-hQ4U5uUq7vo/Th3CGYQRrlI/AAAAAAAACPo/tRIW015kgU8/s1600-h/image_thumb13%25255B2%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb13" border="0" alt="image_thumb13" src="http://lh5.ggpht.com/-n3uAu60-gZY/Th3CHJl7HKI/AAAAAAAACPs/xgt1rCJGeNw/image_thumb13_thumb.png?imgmax=800" width="425" height="239" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;The green line shows the probability for more than N cancellations in our correct model. The red line shows the probability for less than N cancellations in our wrong model. 
The bigger our threshold gets (the more we get to the right side in the diagram) the&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;lower our alpha gets (lower risk for the type 1 error “model is correct but does not pass the test”) &lt;/li&gt;    &lt;li&gt;higher our beta gets (higher risk for type 2 error “model is incorrect but does pass the test”) &lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;Here are some sample values for different values of T&lt;/p&gt;  &lt;table border="1" cellspacing="0" cellpadding="0" width="302"&gt;&lt;tbody&gt;     &lt;tr&gt;       &lt;td valign="top" width="97"&gt;         &lt;p align="center"&gt;&lt;b&gt;T&lt;/b&gt;&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="106"&gt;         &lt;p align="center"&gt;&lt;b&gt;Alpha&lt;/b&gt;&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="97"&gt;         &lt;p align="center"&gt;&lt;b&gt;Beta&lt;/b&gt;&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="97"&gt;         &lt;p align="right"&gt;2780&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="106"&gt;         &lt;p align="right"&gt;53.5&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="97"&gt;         &lt;p align="right"&gt;1.8&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="97"&gt;         &lt;p align="right"&gt;2790&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="106"&gt;         &lt;p align="right"&gt;44.3&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="97"&gt;         &lt;p align="right"&gt;3.1&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="97"&gt;         &lt;p align="right"&gt;2800&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="106"&gt;         &lt;p align="right"&gt;34.9&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="97"&gt;         &lt;p align="right"&gt;5.2&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       
&lt;td valign="top" width="97"&gt;         &lt;p align="right"&gt;2810&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="106"&gt;         &lt;p align="right"&gt;26.5&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="97"&gt;         &lt;p align="right"&gt;8.4&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="97"&gt;         &lt;p align="right"&gt;2820&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="106"&gt;         &lt;p align="right"&gt;19.3&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="97"&gt;         &lt;p align="right"&gt;12.7&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="97"&gt;         &lt;p align="right"&gt;2830&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="106"&gt;         &lt;p align="right"&gt;13.5&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="97"&gt;         &lt;p align="right"&gt;18.6&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="97"&gt;         &lt;p align="right"&gt;2840&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="106"&gt;         &lt;p align="right"&gt;9.0&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="97"&gt;         &lt;p align="right"&gt;25.9&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="97"&gt;         &lt;p align="right"&gt;2850&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="106"&gt;         &lt;p align="right"&gt;5.7&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="97"&gt;         &lt;p align="right"&gt;34.1&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="97"&gt;         &lt;p align="right"&gt;2860&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="106"&gt;         &lt;p align="right"&gt;3.4&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="97"&gt;       
  &lt;p align="right"&gt;43.5&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="97"&gt;         &lt;p align="right"&gt;2870&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="106"&gt;         &lt;p align="right"&gt;1.9&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="97"&gt;         &lt;p align="right"&gt;53.0&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="97"&gt;         &lt;p align="right"&gt;2880&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="106"&gt;         &lt;p align="right"&gt;1.1&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="97"&gt;         &lt;p align="right"&gt;62.2&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;   &lt;/tbody&gt;&lt;/table&gt;  &lt;p&gt;If we want to keep alpha below 5%, you can see that beta will be over 35%. On the other hand, if we try to keep beta below 5%, alpha will be over 35%. The “best” value for T is not simply the one where alpha is almost equal to beta (here T=2826, alpha and beta approx. 16%) but depends on the error that you want to minimize. In our example above we demanded alpha&amp;lt;10%, which resulted in T=2838 and beta being about 25%.&lt;/p&gt;  &lt;p&gt;In our case the reason for keeping alpha low would be to avoid readjusting the model too often (causing costs). On the other hand, keeping beta low reduces the risk of working with a wrong model and potentially losing more customers. 
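&lt;/p&gt;  &lt;p&gt;To compare candidate thresholds, alpha and beta can be listed side by side. The following is only a sketch: it assumes a second table mining_histogram_alt (a hypothetical name) that holds the histogram of the alternative model with the same columns as mining_histogram, and that TotalProbability is stored as a fraction between 0 and 1, as in the threshold query above:&lt;/p&gt;  &lt;pre&gt;-- Sketch only: mining_histogram_alt is an assumed table name.
-- alpha = probability of more than T cancellations if the model is correct
-- beta  = probability of staying within T if the alternative model is correct
select h.numcases as T,
       1 - h.TotalProbability as alpha,
       a.TotalProbability as beta
from mining_histogram h
inner join mining_histogram_alt a on a.numcases = h.numcases
where h.numcases between 2780 and 2880
  and h.numcases % 10 = 0   -- every 10th row, as in the table above
order by h.numcases&lt;/pre&gt;  &lt;p&gt;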
Basically, this decision has to be made before actually defining the threshold value T.&lt;/p&gt;    &lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;For this example we defined a very simple test (number of cancellations &amp;lt; 2838) which satisfies these two criteria:&lt;/p&gt;  &lt;ol&gt;   &lt;li&gt;It is unlikely (&amp;lt;10%) that our model does not pass the test although it is correct &lt;/li&gt;    &lt;li&gt;It is unlikely (&amp;lt;25%) that our model passes the test although it is wrong (goes off by 3%) &lt;/li&gt; &lt;/ol&gt;  &lt;p&gt;While the 1st criterion means we’re not spending too much budget on fine-tuning the model, the 2nd criterion means we’re not losing more customers than expected.&lt;/p&gt;  &lt;p&gt;It should be stated that the above calculations are done on modified (randomized) data. In practice it can be more difficult to find a proper test for the model, and the tradeoff between the type 1 and type 2 errors can be much less favorable.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4879885044062498968-390765911967308161?l=ms-olap.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ms-olap.blogspot.com/feeds/390765911967308161/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://ms-olap.blogspot.com/2010/09/do-you-trust-your-data-mining-results.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/390765911967308161'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/390765911967308161'/><link rel='alternate' type='text/html' href='http://ms-olap.blogspot.com/2010/09/do-you-trust-your-data-mining-results.html' title='Do you trust your Data Mining results? 
– Part 1/3'/><author><name>Hilmar Buchta</name><uri>http://www.blogger.com/profile/07497529483535542290</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://2.bp.blogspot.com/_rzeWIeWVsxA/ScDkLmopJCI/AAAAAAAABAg/sdlcn6fz96M/S220/Hilmar+20090307_Klein.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh3.ggpht.com/-DPbS1bDh638/Th3CC1AuemI/AAAAAAAACPU/tQOe7R_tnMo/s72-c/image_thumb3_thumb.png?imgmax=800' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4879885044062498968.post-3320990618741620647</id><published>2010-09-17T17:25:00.001+02:00</published><updated>2010-09-17T17:25:45.291+02:00</updated><title type='text'>BI.Quality 2.0 released on Codeplex</title><content type='html'>&lt;p align="right"&gt;SQL Server 2005 | SQL Server 2008 | SQL Server 2008R2&lt;/p&gt;  &lt;p&gt;Today we released the new version of &lt;a href="http://biquality.codeplex.com/"&gt;BI.Quality on Codeplex&lt;/a&gt;. The new release contains many features from the wishlist (check the &lt;a href="http://biquality.codeplex.com/releases/view/44018"&gt;release notes&lt;/a&gt; for details):&lt;/p&gt;  &lt;li&gt;Consistent table interface within one test case e.g. to create tables via local files or query &lt;/li&gt;  &lt;li&gt;Unlimited number of queries within one test case&lt;/li&gt;  &lt;li&gt;New XmlTableWriter and XmlTableReader to export/import a table &lt;/li&gt;  &lt;li&gt;New CsvTableWriter and CsvTableReader to export/import a table &lt;/li&gt;  &lt;li&gt;Assert for query execution time &lt;/li&gt;  &lt;li&gt;Consistent delta within Asserts, absolute or relative &lt;/li&gt;  &lt;li&gt;Extended error handling, e.g. 
the AssertTable no longer breaks after the first failure.&lt;/li&gt;  &lt;li&gt;Extended example test suite with all best practice test cases based on Adventure Works DWH 2008 &lt;/li&gt;  &lt;li&gt;All changes are fully backward compatible with version 1.0.0 &lt;/li&gt;  &lt;li&gt;Complete refactoring of the codebase for better extensibility &lt;/li&gt;  &lt;li&gt;Improved error messages and self tests&lt;/li&gt;  &lt;p&gt;There are only minor changes between 1.9.7 and 2.0.0. The fixes include improved backward compatibility for release 1.0 test cases, although it is recommended to use the 2.0 syntax for developing new test cases.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4879885044062498968-3320990618741620647?l=ms-olap.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ms-olap.blogspot.com/feeds/3320990618741620647/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://ms-olap.blogspot.com/2010/09/biquality-20-released-on-codeplex.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/3320990618741620647'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/3320990618741620647'/><link rel='alternate' type='text/html' href='http://ms-olap.blogspot.com/2010/09/biquality-20-released-on-codeplex.html' title='BI.Quality 2.0 released on Codeplex'/><author><name>Hilmar Buchta</name><uri>http://www.blogger.com/profile/07497529483535542290</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='http://2.bp.blogspot.com/_rzeWIeWVsxA/ScDkLmopJCI/AAAAAAAABAg/sdlcn6fz96M/S220/Hilmar+20090307_Klein.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4879885044062498968.post-6230464489885371526</id><published>2010-08-28T16:00:00.001+02:00</published><updated>2011-07-13T18:08:23.965+02:00</updated><title type='text'>Data Mining on a small amount of data</title><content type='html'>&lt;p align="right"&gt;SQL Server 2005 | SQL Server 2008 | SQL Server 2008 R2&lt;/p&gt;  &lt;p&gt;Data Mining is usually associated with finding previously unknown patterns in a large amount of data. But even a small amount of data may contain patterns that are difficult to spot. In order to illustrate this, let’s look at the following situation. A customer with 80 stores wants to understand why some stores perform better than others. This is especially important before setting up new stores. It would be great to estimate the performance of a new store before building it. This would make it much easier to decide on future store locations. Also, it might be useful for optimizing the performance of the existing stores. 
In order to find out about this, the customer collects some data per store as shown in the following table:&lt;/p&gt;  &lt;table border="1" cellspacing="0" cellpadding="2" width="565"&gt;&lt;tbody&gt;     &lt;tr&gt;       &lt;td valign="top" width="198"&gt;&lt;strong&gt;Criteria&lt;/strong&gt;&lt;/td&gt;        &lt;td valign="top" width="365"&gt;&lt;strong&gt;Description&lt;/strong&gt;&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="198"&gt;Sales area&lt;/td&gt;        &lt;td valign="top" width="365"&gt;Area for sales (in square meters)&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="198"&gt;Total working hours per week&lt;/td&gt;        &lt;td valign="top" width="365"&gt;Total working hours for all staff members per week (avg over the last 3 months)&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="198"&gt;Total opening hours per week&lt;/td&gt;        &lt;td valign="top" width="365"&gt;Total hours that the store is open per week&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="198"&gt;Location&lt;/td&gt;        &lt;td valign="top" width="365"&gt;E.g. City center, City or Outskirts&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="198"&gt;Location type&lt;/td&gt;        &lt;td valign="top" width="365"&gt;E.g. Mall, Plaza or separated&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="198"&gt;Store interior status&lt;/td&gt;        &lt;td valign="top" width="365"&gt;Condition of the store, e.g. modern, average or old&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="198"&gt;Store age in months&lt;/td&gt;        &lt;td valign="top" width="365"&gt;How old is this store&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="198"&gt;Parking facilities&lt;/td&gt;        &lt;td valign="top" width="365"&gt;Is it easy to park near the store? E.g. 
values include good, average, bad&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="198"&gt;Average parking costs per hour&lt;/td&gt;        &lt;td valign="top" width="365"&gt;What are the average parking costs per hour? Zero means free parking&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="198"&gt;Number of competitors within 10 minutes walk distance&lt;/td&gt;        &lt;td valign="top" width="365"&gt;Number of competitors within 10 minutes walk distance&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="198"&gt;Number of competitors within 15 minutes driving distance&lt;/td&gt;        &lt;td valign="top" width="365"&gt;Number of competitors within 15 minutes driving distance&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="198"&gt;Buying Power&lt;/td&gt;        &lt;td valign="top" width="365"&gt;Buying power of the people who live near the store, ranging from very low to very high.&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="198"&gt;Sales amount per week&lt;/td&gt;        &lt;td valign="top" width="365"&gt;Average over the last three months&lt;/td&gt;     &lt;/tr&gt;   &lt;/tbody&gt;&lt;/table&gt;  &lt;p&gt;The data can be retrieved by querying the IT systems (for example HR and ERP), by using a survey (usually the staff in the store knows about the competitors and parking facilities around) or by using external market research data (for the Buying power). 
In my case, the data is just generated sample data:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh4.ggpht.com/-sgXVGVGzHDc/Th3CiKCiErI/AAAAAAAACPw/3KnOsKiNMtI/s1600-h/image_thumb2%25255B2%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb2" border="0" alt="image_thumb2" src="http://lh6.ggpht.com/-JJPyA7J6L5Q/Th3Cj6uBIFI/AAAAAAAACP0/WCCzgqXEdI8/image_thumb2_thumb.png?imgmax=800" width="580" height="256" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;After gathering all the data, things got more complicated than expected. How do you “score” each store? Which of the parameters are most relevant? Even though we have only 80 rows of data, it is not at all easy to see the dependencies.&lt;/p&gt;  &lt;p&gt;For this example I’m using the &lt;a href="http://www.microsoft.com/downloads/details.aspx?FamilyID=af070f2c-46f4-47b6-b7bf-48979b999aeb&amp;amp;displaylang=en"&gt;Microsoft Office Data Mining Add-In&lt;/a&gt; so we can do all the data analysis using Excel. Since it’s Excel everything should be very easy. We want to use the Microsoft Decision Tree algorithm (Icon “Classify” from the ribbon bar):&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh3.ggpht.com/-irzuutvA1bw/Th3ClO0x_1I/AAAAAAAACP4/yjjBWj5X7qI/s1600-h/image_thumb5%25255B2%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb5" border="0" alt="image_thumb5" src="http://lh6.ggpht.com/-XlXyrkhpD8w/Th3CmhOqCHI/AAAAAAAACP8/GB7RCy7nOZA/image_thumb5_thumb.png?imgmax=800" width="586" height="142" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;The process is pretty easy. We have to decide which attribute we want to predict (Sales amount per week) and the wizard does all the rest. 
Now, here is the complete resulting decision tree:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh4.ggpht.com/-SQeu6O0wg8E/Th3CnaPRzmI/AAAAAAAACQA/chxCEtQBovg/s1600-h/image_thumb3%25255B2%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb3" border="0" alt="image_thumb3" src="http://lh6.ggpht.com/-U4IvitHQWro/Th3CnwsFiiI/AAAAAAAACQE/4550lEMYP34/image_thumb3_thumb.png?imgmax=800" width="208" height="72" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;No branches? What did we do wrong? Well, here are a few steps we should have taken, before simply invoking the decision tree. The most important thing is the proper preparation of the data:&lt;/p&gt;    &lt;p&gt;&lt;strong&gt;1. Create relative measures, not absolute ones&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;The decision tree is capable of detecting rules like “if A then B” or “if A then not B” or even complicated combinations. However, dealing with continuous values is more difficult. The decision tree does not work quite well with rules like “if A is multiplied by 2, B is multiplied by 1.5” but tries to branch this as “if A is &amp;gt; 20 then B &amp;gt; 15”, “if A is &amp;gt;10, then B &amp;gt;7.5”. This might be especially true with input variables like our sales area or opening hours.&lt;/p&gt;  &lt;p&gt;To quickly analyze the relationship we can use Excel’s Scatter chart type. 
Let’s start with the store size:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh4.ggpht.com/-iS5JHjaH8U0/Th3Cofr9q_I/AAAAAAAACQI/pENDJs_cXVc/s1600-h/image_thumb7%25255B2%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb7" border="0" alt="image_thumb7" src="http://lh6.ggpht.com/-oafq6YW4-Fo/Th3CpXV_-4I/AAAAAAAACQM/DA-nbAHRz8U/image_thumb7_thumb.png?imgmax=800" width="447" height="249" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;From this chart it seems reasonable to calculate sales by square meter instead of taking the absolute sales amount as there seems to be a more or less linear relationship between the sales and the store size.&lt;/p&gt;  &lt;p&gt;Now let’s have a look at the influence of the opening hours. Again we’re using Excel’s Scatter chart:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh3.ggpht.com/-9XgZ26QjQWw/Th3Cp-Vm59I/AAAAAAAACQQ/d2AdNjD7N1E/s1600-h/image_thumb9%25255B2%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb9" border="0" alt="image_thumb9" src="http://lh3.ggpht.com/-gimDIMmU98o/Th3CqlhTNeI/AAAAAAAACQU/ts0NeQcdLgw/image_thumb9_thumb.png?imgmax=800" width="454" height="323" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;As the trend line shows, the relationship seems to be a little bit logarithmic. However, let’s assume it’s also linear. Therefore we’re going to create an additional column in our spreadsheet computing sales by square meter and opening hours. 
This is the formula for our new column “Relative Sales”:&lt;/p&gt;  &lt;p&gt;=[@[Sales Amount per week]]/[@[Sales Area]]/[@[Total opening hours per week]]&lt;/p&gt;  &lt;p&gt;Of course, you would also want to check the influence of other variables, for example the age of the store:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh3.ggpht.com/-8i8OTJPiO5g/Th3CrY1YYhI/AAAAAAAACQY/9ARNBsfccHE/s1600-h/image_thumb11%25255B2%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb11" border="0" alt="image_thumb11" src="http://lh3.ggpht.com/-pHue-0JjScA/Th3CsM75mQI/AAAAAAAACQc/FP68ThpeDZY/image_thumb11_thumb.png?imgmax=800" width="465" height="279" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;This one looks pretty scattered, so we’re just taking the age of the store as an input variable.&lt;/p&gt;  &lt;p&gt;But there is another relative measure we should create: the average number of sales persons in the store. We’re simply using this formula:&lt;/p&gt;  &lt;p&gt;=[@[Total working hours per week]]/[@[Total opening hours per week]]&lt;/p&gt;    &lt;p&gt;&lt;strong&gt;2. Make the input parameters discrete if possible&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;While it’s always a good idea to use discrete values for our input parameters, it almost becomes a must if you don’t have many rows of data. An example of a good discrete value is our location, as it can only take these values: City center, City or Outskirts. The fewer buckets, the better. 
If you’re not getting any results, try making the data simpler by choosing fewer buckets for your discrete values.&lt;/p&gt;  &lt;p&gt;But look at our newly created columns for the relative sales or the average staff:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh3.ggpht.com/-k8Uyq2jCTGg/Th3CtkifRwI/AAAAAAAACQg/LWreBoSdJDQ/s1600-h/image_thumb13%25255B2%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb13" border="0" alt="image_thumb13" src="http://lh4.ggpht.com/-9NPHA4CMGAc/Th3CvRO9fzI/AAAAAAAACQk/wGMub7EntIY/image_thumb13_thumb.png?imgmax=800" width="396" height="331" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;For our mining purposes these columns have too many distinct values, and although our decision tree will try to cluster them, we should do this in advance. To do so, we can either use Excel formulas or the functionality of the data mining add-in, the “Explore Data” wizard:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh4.ggpht.com/-dXO03mK1qT4/Th3CwEI28RI/AAAAAAAACQo/fenCCuayIHU/s1600-h/image_thumb14%25255B2%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb14" border="0" alt="image_thumb14" src="http://lh5.ggpht.com/-NUPyAfr3osI/Th3CwyTocoI/AAAAAAAACQs/Cxdo_uwexLg/image_thumb14_thumb.png?imgmax=800" width="244" height="153" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;After selecting the table and the column, the wizard analyzes the data and proposes some buckets as shown below:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh3.ggpht.com/-SwQzJbWp_7s/Th3CxoGPT_I/AAAAAAAACQw/oqRt3uIlQD0/s1600-h/image_thumb16%25255B2%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; 
padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb16" border="0" alt="image_thumb16" src="http://lh6.ggpht.com/-VPsaycLEMEQ/Th3CyRXvakI/AAAAAAAACQ0/0jOtYPZOltw/image_thumb16_thumb.png?imgmax=800" width="244" height="215" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;For my purpose, I reduce the number of buckets to three (be brave!). By clicking the “Add new Column” button the resulting values are added as an additional column to our table:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh5.ggpht.com/-VSLkO8qaQ08/Th3CzOQfvKI/AAAAAAAACQ4/nTff31c9e8U/s1600-h/image_thumb17%25255B2%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb17" border="0" alt="image_thumb17" src="http://lh6.ggpht.com/-M1PfzmEAjyI/Th3Cz08lLNI/AAAAAAAACQ8/0eilYNZPMJU/image_thumb17_thumb.png?imgmax=800" width="244" height="215" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;I’m doing the same for the average staff members (4 buckets), the total opening hours (4 buckets) the store age in months (5 buckets) and the parking costs (4 buckets):&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh4.ggpht.com/-SHZwN2LDOcc/Th3C04_XY2I/AAAAAAAACRA/RHDkFPIn5V4/s1600-h/image_thumb20%25255B2%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb20" border="0" alt="image_thumb20" src="http://lh5.ggpht.com/-yovZbpFxa7E/Th3C2Nhr4GI/AAAAAAAACRE/lJKzx0oGSHc/image_thumb20_thumb.png?imgmax=800" width="570" height="197" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;Our number of competitors is here 0-3 and 0-5, so we leave this data unchanged (not too many buckets).&lt;/p&gt;  &lt;p&gt;So let’s try again with our prepared data set. 
Now, our decision tree looks like this:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh4.ggpht.com/-15aV02D7UF0/Th3C2md89UI/AAAAAAAACRI/DCBYvO1O_RA/s1600-h/image_thumb21%25255B2%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb21" border="0" alt="image_thumb21" src="http://lh3.ggpht.com/-K9LSZSnK8KE/Th3C3dnAW4I/AAAAAAAACRM/mDpv5DjPT9E/image_thumb21_thumb.png?imgmax=800" width="244" height="95" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;As you can see there are only three influences identified by the mining algorithm here:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh5.ggpht.com/-Cs2TcjNWmIU/Th3C4FuX61I/AAAAAAAACRQ/9egiu2ZbThU/s1600-h/image_thumb22%25255B2%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb22" border="0" alt="image_thumb22" src="http://lh4.ggpht.com/-H53fuwaAoXs/Th3C48qfZTI/AAAAAAAACRY/7ePBAuOeDdI/image_thumb22_thumb.png?imgmax=800" width="244" height="141" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;So, if parking is free, then the parking facilities are important while if parking is not free, the opening hours are important. 
This is a good start for looking for new store locations.&lt;/p&gt;  &lt;p&gt;Another nice tool from Excel’s data mining add-in is the prediction calculator which can be found on the “Analyze” ribbon:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh5.ggpht.com/-hZhgMVtaD4A/Th3C5qa1vJI/AAAAAAAACRg/FKEe1MvS7d0/s1600-h/image_thumb23%25255B2%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb23" border="0" alt="image_thumb23" src="http://lh5.ggpht.com/-D5KIzgiWfXk/Th3C6e5CY6I/AAAAAAAACRo/Hzg6SHpGOtA/image_thumb23_thumb.png?imgmax=800" width="244" height="63" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;This one creates a ready to use input sheet in which you can enter the values of a potential new store and Excel immediately computes the likeliness for high sales:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-9_oRO4WyLpc/Th3C7On0FqI/AAAAAAAACRw/0BCvgI0FQF4/s1600-h/image_thumb24%25255B2%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb24" border="0" alt="image_thumb24" src="http://lh4.ggpht.com/-2e4-IUrXp6A/Th3C71HrL9I/AAAAAAAACR4/xUacVwGpPUQ/image_thumb24_thumb.png?imgmax=800" width="244" height="130" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;This calculator is based on a different mining model (Logistic Regression). 
You can also see the impact of the input values on your sales:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh5.ggpht.com/-xATfHVuax4s/Th3C8k9hAiI/AAAAAAAACSA/3rkyXbTMC3g/s1600-h/image_thumb25%25255B2%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb25" border="0" alt="image_thumb25" src="http://lh3.ggpht.com/-vAMnIT_4_80/Th3C9riTaMI/AAAAAAAACSI/MQXyqC_zvj4/image_thumb25_thumb.png?imgmax=800" width="196" height="244" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;In our case, smaller stores seem to perform better. The high value for opening hours below 41.5 may indicate that our computed column from above is not well designed. And while there are quite a lot of stores with a sales area of less than 57 square meters, there is only a single store that has already been open for 52 months: our first store, which is definitely special (always equipped in the most modern style, and only selected staff members are chosen to work there). You may want to take this store out of the data before doing the analysis.&lt;/p&gt;  &lt;p&gt;So, after the mining you have to review your results properly. 
And of course you should also use the other methods of verifying that your model works well (lift chart, case support as described in my last post, etc.).&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4879885044062498968-6230464489885371526?l=ms-olap.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ms-olap.blogspot.com/feeds/6230464489885371526/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://ms-olap.blogspot.com/2010/08/data-mining-on-small-amount-of-data.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/6230464489885371526'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/6230464489885371526'/><link rel='alternate' type='text/html' href='http://ms-olap.blogspot.com/2010/08/data-mining-on-small-amount-of-data.html' title='Data Mining on a small amount of data'/><author><name>Hilmar Buchta</name><uri>http://www.blogger.com/profile/07497529483535542290</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://2.bp.blogspot.com/_rzeWIeWVsxA/ScDkLmopJCI/AAAAAAAABAg/sdlcn6fz96M/S220/Hilmar+20090307_Klein.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh6.ggpht.com/-JJPyA7J6L5Q/Th3Cj6uBIFI/AAAAAAAACP0/WCCzgqXEdI8/s72-c/image_thumb2_thumb.png?imgmax=800' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4879885044062498968.post-1601811168249058231</id><published>2010-07-13T12:35:00.001+02:00</published><updated>2011-07-13T18:09:25.898+02:00</updated><title type='text'>How much support do you need for your Data Mining results?</title><content type='html'>&lt;p 
align="right"&gt;SQL Server 2005 | SQL Server 2008 | SQL Server 2008 R2&lt;/p&gt;  &lt;p&gt;When querying your SSAS mining model you may have noticed the prediction function PredictSupport, which is supported by most of the mining algorithms, for example by the decision tree. The mining algorithm not only predicts a certain variable and gives the prediction score but also tells us how many cases this decision was based on (case support). You might have the feeling that predictions with a lower support are not as reliable as predictions with a higher support. This post is about how to determine the needed support for a given model.&lt;/p&gt;  &lt;p&gt;To make this theoretical topic a little bit more practical, let’s look at actual data. For this post I’m using the targeted mailing decision tree model (TM Decision Tree) of the Adventure Works SSAS sample database. While you can do the same process described here with a data mining query result, I’m looking at the model itself instead. In order to do so I run the following query on my SSAS database:&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;SELECT FLATTENED [NODE_CAPTION]      &lt;br /&gt;,[NODE_DISTRIBUTION]       &lt;br /&gt;,[NODE_SUPPORT]       &lt;br /&gt;,[NODE_DESCRIPTION] &lt;/font&gt;&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;FROM [TM Decision Tree].content &lt;/font&gt;&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;WHERE IsDescendant('', '000000001') AND [CHILDREN_CARDINALITY]=0&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;I left a lot of interesting columns out here to save space for the query result. There are many more fields you can query (some depending on the mining algorithm).&lt;/p&gt;  &lt;p&gt;By selecting “children cardinality equals zero”, we only get the leaf nodes (the nodes without children) of our decision tree. 
And since we used the FLATTENED keyword, the nested NODE_DISTRIBUTION table is returned as multiple lines here (one line per possible value; try removing FLATTENED from the query above to see the difference). This is what the result looks like:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh3.ggpht.com/-LP_kWaqTrZU/Th3C4aXoF_I/AAAAAAAACRU/fLATi889HS4/s1600-h/image_thumb2%25255B2%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb2" border="0" alt="image_thumb2" src="http://lh3.ggpht.com/-gtKhE88wlas/Th3C5f4gDbI/AAAAAAAACRc/wkyDqGcENa0/image_thumb2_thumb.png?imgmax=800" width="528" height="292" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;Since our Bike Buyer attribute can have three states (Missing, 0=no bike buyer, 1=bike buyer) we get three lines per node in our flattened decision tree model query. For example, the first three lines belong to the node labeled “Yearly Income &amp;lt; 58000”, which can be found at the top of the lower half in the decision tree model viewer:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh5.ggpht.com/-FTuTpG8nm0g/Th3C6O8TIDI/AAAAAAAACRk/z9GQwq9rlGg/s1600-h/image_thumb3%25255B2%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb3" border="0" alt="image_thumb3" src="http://lh3.ggpht.com/-Tv5fbBkMF-Q/Th3C67liBcI/AAAAAAAACRs/FAsKPnzIE3Q/image_thumb3_thumb.png?imgmax=800" width="124" height="244" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;If you look at the details of this node in the mining model viewer, you will see the following values:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh3.ggpht.com/-I8RJXKi03yE/Th3C7R5QhLI/AAAAAAAACR0/BmkTQrYd1WY/s1600-h/image_thumb4%25255B2%25255D.png"&gt;&lt;img style="background-image: 
none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb4" border="0" alt="image_thumb4" src="http://lh6.ggpht.com/-2JUmcR4KDZE/Th3C8Lh8GyI/AAAAAAAACR8/VkLd_uXjSls/image_thumb4_thumb.png?imgmax=800" width="244" height="56" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;This is almost exactly what we have in our table above. For this node the mining algorithm found 1362+465=1827 cases that matched the conditions of the node. You can see the full list of conditions in the NODE_DESCRIPTION column (which I left out here). In my case this is&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;Number Cars Owned = 2 and Region not = 'Pacific' and Yearly Income &amp;lt; 58000&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;So of all the rows that matched these criteria, 1362 had a Bike Buyer variable value of 0 and 465 had a value of 1 during the training of the model. The probability is then computed from the number of cases, for example 100*465/1827 gives approx. 25.46%. Actually, the decision tree calculates the probability in a different way. For many cases this is very close to the quotient above, but there are differences. If you want full control over the method of calculation, you may want to calculate the probability score on your own (for the example above this means calculating 100*465/1827 instead of taking the probability provided by the mining algorithm).&lt;/p&gt;  &lt;p&gt;If we use this model for prediction, probabilities above 50% will be mapped to the positive result. For our example, we want to predict the “Bike Buyer” variable. For cases that fall into the node displayed above, the result will be 0 (as the probability for zero is greater than 50%), meaning that these contacts are unlikely to buy a bike.&lt;/p&gt;  &lt;p&gt;In order to proceed, I focus on the prediction value of Bike Buyer = 1. 
I copied the results from our query above to Excel and removed all lines for result 0 or missing. I also did some formatting of the table. These are the first rows of the resulting table. &lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh4.ggpht.com/-tUhYcq6b4JM/Th3C9d1JFCI/AAAAAAAACSE/RX70e5RECCo/s1600-h/image_thumb6%25255B2%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb6" border="0" alt="image_thumb6" src="http://lh3.ggpht.com/-wY7l1uLs4Wk/Th3C-jCDPLI/AAAAAAAACSM/k6K2X5Q4BPk/image_thumb6_thumb.png?imgmax=800" width="441" height="288" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;In order to decide whether this support is good or bad, we have to decide on which basis we want to make the decision:&lt;/p&gt;  &lt;ol&gt;   &lt;li&gt;Decision based on the predicted value &lt;/li&gt;    &lt;li&gt;Decision based on the predicted value’s probability &lt;/li&gt; &lt;/ol&gt;  &lt;p&gt;In the first case, we’re only using the predicted value from our mining. In our example it would be a prediction of the Bike Buyer variable as a value of true or false. Our concern would be that, with a high probability, our support actually favors the alternative decision. &lt;/p&gt;  &lt;p&gt;In the second case, we’re taking the real prediction probability for some further calculation, for example for back testing (I will discuss this in my next posts) or for calculating expected values of our prediction (for example expected costs).&lt;/p&gt;  &lt;p&gt;Let’s start with the first case here.&lt;/p&gt;    &lt;p&gt;&lt;strong&gt;&lt;u&gt;1. Decision based on the predicted value&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;This case is based on some rule to derive the prediction variable from the prediction score and optionally further attributes. SSAS does this on a very simple basis by changing the decision at 50%. 
But the rule can be very different. You may consider all rows with probability &amp;gt;30% as potential bike buyers or you may include your estimated revenue. You could also use the profit analysis that is built into SQL Server Data Mining to decide on the rule. But in any case, for your mining results you are only interested in the output of the rule (not in the “real” probability), in our case “true” or “false”.&lt;/p&gt;  &lt;p&gt;For our example, I’m using the “default” rule of 50%. In order to understand the effect of the support, let’s look at our concerns. First we start with a case having a probability higher than 50% (or whatever our threshold is), for example take a look at line 3 of the Excel table from above. This case has a probability of 61.28%, meaning that 65 out of 106 rows had the Bike Buyer variable set to 1. In this case, our concern is:&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;em&gt;The real probability for “Bike Buyer =1” is 50% or lower but still the random sample of data we used during training resulted in 65 positive rows.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;If this happened, our model would be making the wrong decision for all rows that match the node’s criteria. &lt;/p&gt;  &lt;p&gt;In order to deal with this concern statistically, I’m doing a simple and common trick here (maybe you should allow yourself some minutes to think about that). Let’s change the concern to&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;em&gt;The real probability for “Bike Buyer =1” is exactly 50% but still the random sample of data we used during training resulted in 65 or more positive rows.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;For this concern we can simply use Excel’s binomial function to compute the probability:&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;1-BINOM.DIST(65,106,0.5,TRUE)&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;I used the Excel 2010 function here. If you’re using Excel 2007 or before, the function name is BINOMDIST. 
The above function returns a value of 0.7%. So it is very unlikely that our model was trained toward the wrong decision for this node.&lt;/p&gt;  &lt;p&gt;For the lines with a probability of less than 50% our concerns are just the other way round (of course you could also take the same rules as above and look for “Bike Buyer=0”). For our example I reference the first line in the table above:&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;em&gt;The real probability for “Bike Buyer = 1” in this specific node is 50% or higher but we still see 465 positive rows in our 1827 cases.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;&lt;em&gt;Again, I’m transforming this to:&lt;/em&gt;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;em&gt;The real probability for “Bike Buyer = 1” in this specific node is exactly 50% but we still see 465 or fewer positive rows in our 1827 cases.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;In this case, we have to use this function to compute the probability:&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;BINOM.DIST(465,1827,0.5,TRUE)&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;Again the probability is very low (almost exactly 0%).&lt;/p&gt;  &lt;p&gt;Let’s add this calculation to the Excel table from above:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh3.ggpht.com/-frEpxWWqbPI/Th3C_8RikTI/AAAAAAAACSQ/nifIwgwtsOk/s1600-h/image_thumb1%25255B2%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb1" border="0" alt="image_thumb1" src="http://lh5.ggpht.com/-1k4xJJWAasQ/Th3DBWom-HI/AAAAAAAACSU/tO8QPk_zmIo/image_thumb1_thumb.png?imgmax=800" width="429" height="273" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;The formula for E7 is&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;=IF(C7&amp;lt;0.5,BINOM.DIST(B7,D7,0.5,TRUE),1-BINOM.DIST(B7,D7,0.5,TRUE))&lt;/font&gt;&lt;/p&gt;  
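&lt;p&gt;A possible variant of the E7 formula, sketched under two assumptions that are not part of the original workbook: the decision threshold is kept in a cell (here the hypothetical cell $B$3, containing 0.5) instead of being hard-coded, and the upper tail is meant to include the observed count itself. Since BINOM.DIST with the cumulative flag set to TRUE returns the probability of at most the given number of successes, 1-BINOM.DIST(B7,D7,0.5,TRUE) is the probability of strictly more than B7 positive rows; to get “B7 or more”, the count has to be reduced by one:&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;=IF(C7&amp;lt;$B$3,BINOM.DIST(B7,D7,$B$3,TRUE),1-BINOM.DIST(B7-1,D7,$B$3,TRUE))&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;The inclusive version can only give equal or slightly higher values for nodes above the threshold, so it is the more cautious choice when filtering.&lt;/p&gt;  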
&lt;p&gt;The formula in E7 is then copied to all lines below.&lt;/p&gt;  &lt;p&gt;How do we read this table? To keep things simple, low values in the last column are good (as in our two example values from above). The closer the predicted probability gets to 50% (or in other words, the weaker the prediction gets) the more support is needed to make it relevant. &lt;/p&gt;  &lt;p&gt;How do you work with this table? Let’s say you want to be 95% sure that the support is strong enough to prevent our mining system from being trained for the wrong response. In this case you would filter out nodes from the table above that have a value of 5% or above in the last column. This is the result:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh5.ggpht.com/-6V5UHZug1dc/Th3DCP1bfxI/AAAAAAAACSY/-aOAia4Jm3Q/s1600-h/image_thumb5%25255B2%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb5" border="0" alt="image_thumb5" src="http://lh4.ggpht.com/-86loGqmcI28/Th3DDTHVnwI/AAAAAAAACSc/9E80f2H4B_s/image_thumb5_thumb.png?imgmax=800" width="439" height="115" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;In this case we would have filtered out 608 of the 12939 cases used for the support (approx. 4.7%). What are you going to do with these cases? Well, this depends on your mining question. Typically you would consider these nodes to be bike buyers if you want to minimize the risk of losing potential customers, or you would consider them as not being bike buyers if your goal is to reduce costs (by excluding these cases from the mailing). In any case, you will have to ignore the response from the data mining process if you do not want to risk being misled.&lt;/p&gt;  &lt;p&gt;In our case it is ok to drop 608 cases, but what do you do if all cases have very poor support? In this case, you should review your model carefully. 
Seeing a lot of cases with a low support usually means that your model is “over trained”. For example, your decision tree model has too many nodes. Mining works best when the real world can be simplified. For the decision tree you might try to increase the required support or set a higher value for the complexity penalty parameter.&lt;/p&gt;  &lt;p&gt;The more you want to be sure that your case support is good enough, the more cases you will have to drop. For our example here are a few values:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh4.ggpht.com/-fVTdnDKIfaI/Th3DEZvlYlI/AAAAAAAACSg/FF_1YBHrId8/s1600-h/image_thumb7%25255B2%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb7" border="0" alt="image_thumb7" src="http://lh6.ggpht.com/-DI80mDnyoIw/Th3DEzSRp1I/AAAAAAAACSk/y4OjTbSEWaQ/image_thumb7_thumb.png?imgmax=800" width="455" height="55" /&gt;&lt;/a&gt; &lt;/p&gt;      &lt;p&gt;&lt;strong&gt;&lt;u&gt;2. Decision based on the predicted value’s probability&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;In this case we’re not so much interested in a simple decision (bike buyer true or false) but rather in the probability for each case. This also means we have to consider different concerns (although similar to the ones above). The whole process is very similar to the one above, and if you are not too interested in the details you can just skip to the conclusion.&lt;/p&gt;  &lt;p&gt;Basically, we only need to replace the 50% in the example above with an alternative model’s probability derived from the predicted probability. However, as described above, our concerns differ between the cases that favor a positive result and those that favor a negative one. So for our example, we’re considering a cut-off at 50% again. &lt;/p&gt;  &lt;p&gt;Let’s start with the lines having a probability higher than 50%. 
For example, let’s look at the 3rd line from the table above. The probability found during training was 61.28%, that is, 65 of the 106 rows had the Bike Buyer variable set to 1.&lt;/p&gt;  &lt;p&gt;In this situation our concern is that the real probability is less than 61.28% but still the random sample we used during training resulted in 65 positive rows. Ok, if the probability is still 61.279% this wouldn’t make much difference. Let’s try with a difference of 3%, so our concern (already transformed as described above) is:&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;em&gt;The real probability for “Bike Buyer = 1” in this specific node is exactly 58.28% but we still see 65 or more positive rows in our 106 cases.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;Again we’re using Excel’s binomial function to compute the probability:&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;1-BINOM.DIST(65,106,0.5828,TRUE)&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;The above function returns a value of 23.26%. Although it is below 50% it is still significant (compare this to the 0.7% from above).&lt;/p&gt;  &lt;p&gt;For the lines with a probability of less than 50% our concerns are just the other way round. For our example I use the first line from the table above as a reference again:&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;em&gt;The real probability for “Bike Buyer = 1” in this specific node is exactly 28.46% but we still see 465 or fewer positive rows in our 1827 cases.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;To compute this in Excel you have to use the following formula:&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;BINOM.DIST(465,1827,0.2846,TRUE)&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;In this case we get a result of 0.22%, meaning it is extremely unlikely.&lt;/p&gt;  &lt;p&gt;I added this calculation as an additional column to my Excel table from above. 
This is the result:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh5.ggpht.com/-S45StYnc_tE/Th3DGdSeseI/AAAAAAAACSo/o0hPApoWICo/s1600-h/image_thumb9%25255B2%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb9" border="0" alt="image_thumb9" src="http://lh6.ggpht.com/-l0GMBmEsfAQ/Th3DINBUTFI/AAAAAAAACSs/hbbLy5oFS2U/image_thumb9_thumb.png?imgmax=800" width="451" height="324" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;The formula for E7 is&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;=IF(C7&amp;lt;0.5,BINOM.DIST(B7,D7,C7+$B$4,TRUE),1-BINOM.DIST(B7,D7,C7-$B$4,TRUE))&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;This formula is then copied to all lines below.&lt;/p&gt;  &lt;p&gt;Again we can now filter the table for whatever level of certainty we would like to have. If we want to be 80% sure that our case support does not favor a model that is 3% off, we would filter all lines from the table above that have a value higher than 20% in the last column.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh5.ggpht.com/-Nyi85_5UW08/Th3DJZA2y1I/AAAAAAAACSw/BTb4dyVo-G8/s1600-h/image_thumb11%25255B2%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb11" border="0" alt="image_thumb11" src="http://lh3.ggpht.com/--v0iIjrhZps/Th3DLIz9cHI/AAAAAAAACS0/neeWMniSfQE/image_thumb11_thumb.png?imgmax=800" width="460" height="291" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;A lot of nodes remain in this case (a lot more than in our discrete example from above): In this case we would have to ignore about 19.4% of our predictions.&lt;/p&gt;  &lt;p&gt;Here are some sample results for the cases that need to be left out:&lt;/p&gt;  &lt;p&gt;&lt;a 
href="http://lh3.ggpht.com/-hoIrcPUk1f4/Th3DL8JWkbI/AAAAAAAACS8/B1fw59Knt9E/s1600-h/image_thumb13%25255B2%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb13" border="0" alt="image_thumb13" src="http://lh6.ggpht.com/-EQ6ch3YqY5s/Th3DMxjpGVI/AAAAAAAACTE/beOB5gJxXZM/image_thumb13_thumb.png?imgmax=800" width="463" height="134" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;As expected, the more you want to be sure the more cases you have to drop. Also the less tolerance you allow for your model, the more cases you have to drop. So it’s up to you to find the best mix of security and usability of the model.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4879885044062498968-1601811168249058231?l=ms-olap.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ms-olap.blogspot.com/feeds/1601811168249058231/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://ms-olap.blogspot.com/2010/07/how-much-support-do-you-need-for-your.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/1601811168249058231'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/1601811168249058231'/><link rel='alternate' type='text/html' href='http://ms-olap.blogspot.com/2010/07/how-much-support-do-you-need-for-your.html' title='How much support do you need for your Data Mining results?'/><author><name>Hilmar Buchta</name><uri>http://www.blogger.com/profile/07497529483535542290</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='http://2.bp.blogspot.com/_rzeWIeWVsxA/ScDkLmopJCI/AAAAAAAABAg/sdlcn6fz96M/S220/Hilmar+20090307_Klein.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh3.ggpht.com/-gtKhE88wlas/Th3C5f4gDbI/AAAAAAAACRc/wkyDqGcENa0/s72-c/image_thumb2_thumb.png?imgmax=800' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4879885044062498968.post-6771198205167411534</id><published>2010-07-11T11:17:00.001+02:00</published><updated>2011-07-13T18:10:07.735+02:00</updated><title type='text'>How to use an Excel 2003 file as a datasource for an SSAS cube</title><content type='html'>&lt;p align="right"&gt;SQL Server 2005 | SQL Server 2008 | SQL Server 2008R2&lt;/p&gt;  &lt;p&gt;Ok, usually you wouldn't want to use an Excel file directly as a datasource for an SSAS cube in real life scenarios. But just in case you'd like to set up a quick demo without bothering to create a new database using Microsoft SQL Server or maybe Microsoft Access, the ability to source your cube from an Excel file could be more than welcome. Just imagine you want to try some design ideas. While working with SQL Server databases you would end up with a large number of test databases or you would need to back up/restore your databases all the time to test different scenarios. With Excel as a source for your cubes you could put your test data right into your SSAS solution. In order to modify the datasource you can simply make a copy of your Excel file (for backing up the older version) or of your solution instead of caring about databases. And even if you don't have the databases installed you can use any of your testing solutions by just opening the solution as the data becomes part of the solution.&lt;/p&gt;  &lt;p&gt;Sounds good, doesn't it? But how can you do so? 
First, when trying to set Excel as an OLE-DB source you will notice that it just isn't there.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh5.ggpht.com/-5YuTOH__1YQ/Th3DL56ccKI/AAAAAAAACS4/Hh904LohTaI/s1600-h/image_thumb1%25255B2%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb1" border="0" alt="image_thumb1" src="http://lh4.ggpht.com/-puAwoysm-A8/Th3DM7SL-9I/AAAAAAAACTA/LGQ90j40mTw/image_thumb1_thumb.png?imgmax=800" width="432" height="253" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;What we can do, however, is use the Microsoft Jet 4.0 OLE DB Provider. So that's where we start. The next dialog asks us to provide the database file name. &lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh5.ggpht.com/-EruG7fYyz-w/Th3DNlepDnI/AAAAAAAACTI/hfIJT1iAhbw/s1600-h/image3_thumb1%25255B2%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image3_thumb1" border="0" alt="image3_thumb1" src="http://lh3.ggpht.com/-1G0G_T0oIBE/Th3DOSY8gQI/AAAAAAAACTM/29qGmg4bM9g/image3_thumb1_thumb.png?imgmax=800" width="435" height="450" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;If you click on the 'Browse...' button you will notice that the selection is limited to .mdb-files or .accdb-files as the Jet OLE-DB provider is usually used with Microsoft Access databases. 
So we just change our file type selection to 'All files' and pick our Excel file.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-I8Xe6Ywbn4g/Th3DPrtivXI/AAAAAAAACTQ/weP9PkzYS4o/s1600-h/image6_thumb2%25255B2%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image6_thumb2" border="0" alt="image6_thumb2" src="http://lh3.ggpht.com/-oNrSN3ggSZU/Th3DQt6HV5I/AAAAAAAACTU/S0U7hDCdO_Q/image6_thumb2_thumb.png?imgmax=800" width="437" height="321" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;Now, if you click on 'Test connection' you will get an error message like the one below:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh5.ggpht.com/-jpfCNdF20aE/Th3DRWQ-XTI/AAAAAAAACTY/oPxsOks_Wjw/s1600-h/image9_thumb3%25255B2%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image9_thumb3" border="0" alt="image9_thumb3" src="http://lh4.ggpht.com/-xOKMB2mL-eE/Th3DSPiGCuI/AAAAAAAACTc/JVQfagBln5A/image9_thumb3_thumb.png?imgmax=800" width="438" height="92" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;Our Jet OLE-DB provider still believes that we are connecting to a Microsoft Access database file and therefore it cannot connect. So here comes the really important step. 
We open our connection again, click on the 'Edit' button to edit the connection string and then we switch to the 'All' tab of the connection properties.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh4.ggpht.com/-qjN2YU0MhFg/Th3DSxb7xBI/AAAAAAAACTg/mAdVb6Sy-Ts/s1600-h/image12_thumb2%25255B2%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image12_thumb2" border="0" alt="image12_thumb2" src="http://lh5.ggpht.com/-U4cYeDwp3qY/Th3DUOifuNI/AAAAAAAACTk/7l5vKJAF424/image12_thumb2_thumb.png?imgmax=800" width="437" height="452" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;As shown in the screenshot we have to set the extended properties to 'Excel 8.0;HDR=Yes;IMEX=1;'.&lt;/p&gt;  &lt;p&gt;'Excel 8.0' identifies the Excel 97-2003 workbook format (.xls). I couldn't get Excel 2007 to connect properly using 'Excel 9.0', so I stayed with the Excel 2003 format here; as far as I know, .xlsx files need the newer ACE OLE DB provider rather than Jet. 'HDR=Yes' means that the first row of each sheet contains the column headers, and 'IMEX=1' asks the driver to read columns with mixed data types as text.&lt;/p&gt;  &lt;p&gt;After that, a click on 'Test connection' gives the desired result:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh5.ggpht.com/--Nvv-BOfvks/Th3DUuix5gI/AAAAAAAACTo/4weW9nLxkjU/s1600-h/image15_thumb2%25255B2%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image15_thumb2" border="0" alt="image15_thumb2" src="http://lh5.ggpht.com/-66SDy6Nm1mA/Th3DVmNtW5I/AAAAAAAACTs/x76-DAcMZxU/image15_thumb2_thumb.png?imgmax=800" width="444" height="84" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;Of course, we now need to build up our Excel file. Each &amp;quot;source table&amp;quot; sits on its own sheet. You can easily build up some time dimension or use Excel functions like RAND() to create random fact data or VLOOKUP(...) 
to link your tables with testing data to each other.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-VDQIsMSHdus/Th3DWYC7DuI/AAAAAAAACTw/A8hB_-fFuXM/s1600-h/image18_thumb2%25255B2%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image18_thumb2" border="0" alt="image18_thumb2" src="http://lh6.ggpht.com/-aR6L-_DZozI/Th3DXiPG1ZI/AAAAAAAACT0/4A01qhc1RZs/image18_thumb2_thumb.png?imgmax=800" width="447" height="396" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;Although this is not at all useful for real life situations (as we would extract the data from the Excel sheet using ETL tools or simply not store the source data in Excel at all), this might still be useful in order to set up a quick and dirty example solution and play around by modifying the source data (add columns, use different formats etc.) without the need to work on a 'real' database.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4879885044062498968-6771198205167411534?l=ms-olap.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ms-olap.blogspot.com/feeds/6771198205167411534/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://ms-olap.blogspot.com/2010/07/how-to-use-excel-2003-file-as.html#comment-form' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/6771198205167411534'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/6771198205167411534'/><link rel='alternate' type='text/html' href='http://ms-olap.blogspot.com/2010/07/how-to-use-excel-2003-file-as.html' title='How to use an Excel 2003 file as a datasource for an SSAS 
cube'/><author><name>Hilmar Buchta</name><uri>http://www.blogger.com/profile/07497529483535542290</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://2.bp.blogspot.com/_rzeWIeWVsxA/ScDkLmopJCI/AAAAAAAABAg/sdlcn6fz96M/S220/Hilmar+20090307_Klein.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh4.ggpht.com/-puAwoysm-A8/Th3DM7SL-9I/AAAAAAAACTA/LGQ90j40mTw/s72-c/image_thumb1_thumb.png?imgmax=800' height='72' width='72'/><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4879885044062498968.post-3972366339986346919</id><published>2010-06-12T11:51:00.001+02:00</published><updated>2010-06-12T11:51:16.825+02:00</updated><title type='text'>10 Tips for every SSAS developer</title><content type='html'>&lt;p align="right"&gt;SQL Server 2005 | SQL Server 2008 | SQL Server 2008 R2&lt;/p&gt;  &lt;p&gt;Within BIDS, our development environment for SSAS applications, the Cube Wizard does a great job. With the wizard you get all the dimensions, the cube and the proper links between the cube and the dimensions. It even takes care of role playing dimensions and names them accordingly. But although it does a great job, the created cube is not ready for the end users. Recently I saw some cubes that were still in exactly the state in which the wizard had left them. So today’s post is about the most essential steps to perform after the cube wizard has finished creating the cube. I don’t go into too much detail here. Some of the topics have been addressed in other posts. I may cover other topics in upcoming posts. So the following list is more like a checklist to verify your development.&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;u&gt;1. 
Make the cube user friendly / do not use technical names &lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;While the operational databases belong to a technical area, the OLAP solution has to be clear and easy to understand for the end users. Avoid technical names, use abbreviations only if they are commonly known and well documented and don’t prefix dimensions with ‘Dim’ or ‘D_’ or something like that (same with the fact-tables). The user wants to analyze sales by time, not FactSales by DimTime.&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;u&gt;2. Check all dimensions for their attributes&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;The cube wizard cannot know which attributes are important for a dimension. Check each attribute carefully. Delete attributes that are not needed at all. It does not make much sense to develop for future requirements. This only makes the cube harder to understand for the end users. Also, think about the usage of the attributes. For example, having an attribute “OptIn” for a CRM cube with members yes and no may not be ideal for usage in a pivot table as you would only see yes or no on the column/row. It is better to have “With OptIn” and “Without OptIn” as members because here, the meaning is immediately clear. Of course, numeric flags like gender being 0 or 1 also do not make much sense. Members should be clear and readable.&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;u&gt;3. Create appropriate attribute relationships and create meaningful hierarchies&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;Attribute relationship settings and hierarchies are the key to building a robust OLAP solution that enables query developers to write good MDX queries and also results in better performance of the cube. However, attribute relationships can be tricky, so make sure you fully understand this topic. 
You should also use tools like &lt;a href="http://bidshelper.codeplex.com/"&gt;BIDSHelper’s&lt;/a&gt; dimension health check to make sure that your attribute relationships really match the source data.     &lt;br /&gt;For further explanation see &lt;a href="http://ms-olap.blogspot.com/2008/10/attribute-relationship-example.html"&gt;http://ms-olap.blogspot.com/2008/10/attribute-relationship-example.html&lt;/a&gt; or &lt;a href="http://ms-olap.blogspot.com/2008/11/turning-non-natural-hierarchy-into.html"&gt;http://ms-olap.blogspot.com/2008/11/turning-non-natural-hierarchy-into.html&lt;/a&gt;     &lt;br /&gt;&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;u&gt;4. Provide properties as needed (not every attribute of your dimension is needed as an attribute hierarchy!)&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;Define only those fields of your dimension table as attributes that you need either for hierarchies, as a filter or on a pivot axis. Fields that are just informational (like the telephone number of an employee, for example) don’t make much sense as attribute hierarchies. Set the property “AttributeHierarchyEnabled” to false for these attributes. They are then shown grayed-out in the dimension designer. However, you can still use them as member properties in your OLAP tool, for example in Excel. It is important to understand how your attribute relationships are defined here. &lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;u&gt;5. Set the format for measures and calculations&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;The standard format for numbers is just a simple number format. This usually looks ugly in the cube browser or other frontend tools. It’s not a big deal to set the format for every measure. For example, you could use “#,##0.00” for numbers to get a nicely formatted representation with thousands separators and two decimal places. You should also do this for calculated measures and cube calculations.    
&lt;br /&gt;For further explanation see &lt;a href="http://ms-olap.blogspot.com/2009/11/how-to-define-excel-compliant-format.html"&gt;http://ms-olap.blogspot.com/2009/11/how-to-define-excel-compliant-format.html&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;u&gt;6. Set the proper dimension type. Especially, define a time dimension&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;Setting the dimension type of your time dimension to “time” makes it much easier for your OLAP client (like Excel) to provide special time dependent filtering options. Also, if you’re planning to use semi-additive measures, the time dimension must be marked accordingly.    &lt;br /&gt;Furthermore you also need to qualify the type of the time dimension’s attributes (using Years, HalfYears, Quarters, Months and Date). This is necessary for MDX functions like YTD(…), MTD(…) and ParallelPeriod(…) and for the “Add Time Intelligence” wizard.&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;u&gt;7. Define the default measure for the cube&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;The default measure is used when no measure is on an axis or in the slicer (WHERE clause) of the query. If not set, SSAS takes the first measure in the first measure group. This can easily change if new measures or measure groups are created in the cube. Queries that rely on the default measure (which is itself not good practice) will then return different results. That’s why it is important to set a default measure.&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;u&gt;8. Define the default member for dimensions/attributes which are not aggregatable&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;Similar to the default measure for the cube, each dimension has a default member that is used if the dimension is not explicitly present in the query. For example, think of a dimension ‘Scenario’ with members like Actual, Forecast, Midterm plan etc. 
It does not make sense to aggregate the different members so you will usually set the IsAggregatable property to false. In this case you don’t have an All-element and you should provide a default member (in our example it would usually be ‘Actual’). If not specified, the first member is used which could again lead to errors if new members are created or the member names change. Keep in mind that there are three ways to specify the default member: in the dimension itself, in the cube script (also useful for role-playing dimensions with different names) and in the security role definition.&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;u&gt;9. Create a drill-through action for every measure group and make this the default action&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;Using a drill-through action shows the detail data behind a given cell. However, the default drill-through method only shows the name information of the dimensions’ key attributes. Often enough we would see meaningless surrogate keys here which do not make sense for our end users. By defining a drill-through action the developer can set the displayed attributes for each dimension, making the result of the drill-through query much more readable. In addition, making this action the default action also works well with clients like Microsoft Excel. Here, a double-click on a cell triggers the default action.&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;u&gt;10. Properly deal with unrelated dimensions&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;By default, values on unrelated dimensions are shown as the All-member of the unrelated dimension. For example, in Adventure Works, measures from the Internet Sales measure group are not linked to a reseller (obviously). However, if you try to analyze internet sales by reseller (for example by reseller type) you will see the same value (the total of sales) for each reseller. This is confusing. 
The property to control this behavior is the IgnoreUnrelatedDimensions setting in the measure group properties. When this is set to ‘false’, values for unrelated dimensions are no longer shown. This is much easier to understand for the end users.    &lt;br /&gt;For further explanation see &lt;a href="http://ms-olap.blogspot.com/2010/04/properly-showing-values-for-unrelated.html"&gt;http://ms-olap.blogspot.com/2010/04/properly-showing-values-for-unrelated.html&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;u&gt;What’s next?&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;So that’s all? Of course not. First, all the tips from above do not prevent a bad architecture or cube structure. A good architecture is important for each Business Intelligence solution and there are many good books (for example Kimball’s “The Data Warehouse Toolkit”) out there about setting up such an infrastructure. In general, building an SSAS solution on top of a well made dimensional model instead of some OLTP systems is a good idea.&lt;/p&gt;  &lt;p&gt;And of course, the tips from above are not complete in any sense. If you asked someone else, you might get different tips. For example, it’s also important to set up the cube security or to define some time intelligence for easier analysis (like YTD-computations or year-over-year growth etc.). It is also a good practice to simplify complex cubes with many measure groups and dimensions using perspectives to make them easier to understand for the end user. And as many measures/calculations are not self-explanatory just from their name, it is a good idea to link an html-document with a specification or helpful text to the member. You can use a cube action for this.&lt;/p&gt;  &lt;p&gt;However, I guess you all have many more good ideas with best practices and most important tips. 
If you like, I would be very happy if you could share these tips with me and - who knows – maybe there will be a post like “25 tips for every SSAS developer” in the near future.&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;em&gt;By the way, this is the &lt;strong&gt;second anniversary&lt;/strong&gt; of my OLAP blog. It was quite an interesting time so far and I’m looking forward to more topics to write about. The main location of this blog is &lt;/em&gt;&lt;a href="http://ms-olap.blogspot.com"&gt;&lt;em&gt;http://ms-olap.blogspot.com&lt;/em&gt;&lt;/a&gt;&lt;em&gt;. The posts are mirrored to &lt;/em&gt;&lt;a title="http://oraylis-olap.spaces.live.com/" href="http://oraylis-olap.spaces.live.com/"&gt;&lt;em&gt;http://oraylis-olap.spaces.live.com/&lt;/em&gt;&lt;/a&gt;&lt;em&gt; and also reposted to the new blog system of ORAYLIS at &lt;/em&gt;&lt;a href="http://blog.oraylis.de"&gt;&lt;em&gt;http://blog.oraylis.de&lt;/em&gt;&lt;/a&gt;&lt;em&gt; (here you can find more interesting posts about Sharepoint, Business Intelligence in general and much more). 
My posts are also readable by using some blog aggregators like &lt;/em&gt;&lt;a title="http://www.ssas-info.com/" href="http://www.ssas-info.com/"&gt;&lt;em&gt;http://www.ssas-info.com/&lt;/em&gt;&lt;/a&gt;&lt;em&gt; which I really recommend to learn more about SSAS, MDX and OLAP.&lt;/em&gt;&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4879885044062498968-3972366339986346919?l=ms-olap.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ms-olap.blogspot.com/feeds/3972366339986346919/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://ms-olap.blogspot.com/2010/06/10-tips-for-every-ssas-developer.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/3972366339986346919'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/3972366339986346919'/><link rel='alternate' type='text/html' href='http://ms-olap.blogspot.com/2010/06/10-tips-for-every-ssas-developer.html' title='10 Tips for every SSAS developer'/><author><name>Hilmar Buchta</name><uri>http://www.blogger.com/profile/07497529483535542290</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://2.bp.blogspot.com/_rzeWIeWVsxA/ScDkLmopJCI/AAAAAAAABAg/sdlcn6fz96M/S220/Hilmar+20090307_Klein.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4879885044062498968.post-3643974540943478850</id><published>2010-05-24T14:38:00.001+02:00</published><updated>2011-07-13T18:11:27.637+02:00</updated><title type='text'>Self-Service BI, PowerPivot and the future of traditional BI (DWH, OLAP, MDX)</title><content type='html'>&lt;p align="right"&gt;SQL Server 2008R2 | Excel 
2010/PowerPivot&lt;/p&gt;  &lt;p&gt;Since I started using the recently released &lt;a href="http://www.powerpivot.com/"&gt;Microsoft PowerPivot Add-In for Excel 2010&lt;/a&gt; and reading the rumors about the future of traditional OLAP and MDX, some questions have come up about the big picture of a BI-environment including self-service functionality. Basically, the BI world of the past had accepted the idea of a central data warehouse having a meta data layer such as OLAP to perfectly present the information to the end users. What about this new player PowerPivot then? How does this fit into the picture? Is there still a future for things like OLAP, MDX, central data warehouses or do we only need to roll out self-service BI functionality to every desktop PC? Some people have asked me questions about my point of view here and although I’m not a Microsoft representative I’d like to share my personal opinion with you:&lt;/p&gt;  &lt;p align="center"&gt;&lt;strong&gt;Self-service BI tools are not and will never be a replacement for traditional BI-systems but a great enhancement for them.&lt;/strong&gt;&lt;/p&gt;  &lt;p align="center"&gt;&lt;a href="http://lh3.ggpht.com/-2L4ZI1jZTuM/Th3Dqeeg5QI/AAAAAAAACT4/8alhjdqUs28/s1600-h/image_thumb1%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb1" border="0" alt="image_thumb1" src="http://lh6.ggpht.com/-CAhHQZZt3X4/Th3Drq3CwWI/AAAAAAAACT8/ei2w0KDBYDg/image_thumb1_thumb.png?imgmax=800" width="644" height="324" /&gt;&lt;/a&gt; &lt;em&gt;Simple dashboard built using only Excel 2010 with PowerPivot. No centralized BI needed anymore?&lt;/em&gt;&lt;/p&gt;    &lt;p&gt;In other words, adding self-service BI functionality to your BI-system will increase the possibilities and the analytical power of the end users. 
But if you are cutting costs for centralized BI solutions in the belief that you can replace them with self-service tools, you’ll end up with even higher costs, as the work that used to be done only once in the centralized BI system will suddenly be performed redundantly, inconsistently, in an error-prone way and with much more working time needed. &lt;/p&gt;  &lt;p&gt;In order to understand this, let’s look at just some of the advantages a centralized BI-system can offer which cannot be replaced by a decentralized self-service BI tool like PowerPivot:&lt;/p&gt;  &lt;p&gt;&lt;u&gt;Combination of data from multiple sources in a consistent and time-saving way      &lt;br /&gt;&lt;/u&gt;If this is done user by user, it is very likely that different users will get different results. Also, this work is highly redundant. Imagine different departments going to IT in order to get “their” data exported, then trying to combine it into a single data store. Often enough, this job requires additional mapping tables (customers, articles etc. may have different ids in different systems). We are supporting customers with multiple ERP-systems (due to mergers) and the mapping can be quite complicated. But even if it is just a single source of information, mapping between different tables has to be done and requires skills and knowledge about the data models. For example, if you forget to consider a key field the result may differ significantly. Or think about the need to exclude rows with a certain status, because this means they’re cancelled. One end user might know about this, the other might not.     
&lt;br /&gt;This means that you will still need your sophisticated ETL processes, a proper front room model, slowly changing dimensions and all the stuff we know from our typical BI projects.&lt;/p&gt;  &lt;p&gt;&lt;u&gt;Consistent use of common calculations considering approved business rules      &lt;br /&gt;&lt;/u&gt;In many cases, our current ETL processes include complicated business rules for doing calculations and data mappings. Key measures and performance indicators have to be calculated in a consistent way. Business rules are the backbone of the company’s information system. If different departments are comparing apples and oranges, there is a lot of room for confusion and wrong decisions. Think of a simple example, like reporting the revenue. Are warranty adjustments considered? What about partial payments or commissions? What about discounts (for example staff discounts)? For all those aspects it has to be decided whether to include them in or exclude them from a certain key measure and also which relation to the time dimension is correct. Is revenue for a partial payment considered as a sale (full amount) at the date of purchase or are the real payments (cash flow) considered? If every user has to make these decisions it is very unlikely that everybody is doing the same calculations. Comparing results for different departments (think of a sales meeting for the different product managers) will then get very difficult.&lt;/p&gt;  &lt;p&gt;&lt;u&gt;Security can only be implemented in a central data store&lt;/u&gt;     &lt;br /&gt;Some real world scenarios are currently looking like this: The IT department exports data for other organizational structures, manually filtering out the data that is not intended for the recipient. This might work with very simple security structures, but with more security roles, user dependent security or more information recipients, this would lead to an enormous amount of work for exporting all the data. 
And if there are changes to the company’s security model, all the exports have to be reconsidered. Having a central OLAP solution makes it easy to define the security roles and access rights in a central place using the business view on the data. For example, in OLAP you can restrict a user to see the cost centers for which she or he is responsible – OLAP takes care of all related data (for example automatically filtering the cost facts to these cost centers). There is no need for a huge amount of data exports as users can retrieve the data and information they need and IT only has to make sure that the data is available and secured.&lt;/p&gt;  &lt;p&gt;&lt;u&gt;The need for management reporting      &lt;br /&gt;&lt;/u&gt;Management needs an overview of some or all business units. Having the information (especially the calculations, KPIs etc.) in a decentralized environment makes it very difficult to get this management reporting in a simple, time saving way. It is more likely that IT has to do special exports which are then processed by the controlling department to build the management reports. This could result in controlling spending all its time on data management rather than on information management and controlling. Also, in this scenario, the data from the management reporting will most likely differ from the data of the departments. Just imagine the CEO going to some product manager saying “Hey, your product profitability is –5%” and the product manager says “No, it isn’t. Look at MY report. Here it reads +3%” and actually neither of them could say which result was the right one…&lt;/p&gt;  &lt;p&gt;Then, after all, if the central BI environment is so important, do we really need decentralized self service BI? Well, not every user will need self-service BI but for some it can be a real time saver or give them a lot of analytical power. 
Here are just two important scenarios:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;The end user wants to analyze information by special properties of the data which are not present in the centralized data warehouse. Just think of product managers. Each product has a different target group, special conditions in the market and therefore potentially certain aspects that are different from product to product and therefore from product manager to product manager. This information might only be relevant for certain products. Having this in a central data warehouse would be confusing as the information has no meaning for most of the products. Allowing each department to cover the specific needs of its work, while the central information stays available in the data warehouse, is the best way here. &lt;/li&gt;    &lt;li&gt;The end user wants to combine the centrally provided information with other sources of data, for example information that has been purchased/acquired from external sources and which is not complete in terms of geography, time etc. Think of a marketing department planning a campaign. In order to do so, they want to analyze sales data in conjunction with external data for purchasing power. The external information was only purchased once for the region where the campaign is planned. This kind of data cannot be loaded into the central data warehouse. But with self-service BI it can be analyzed side-by-side with the centrally provided sales data.      &lt;br /&gt;Or think of one department trying to improve product quality by changing some of the parameters during the production cycle for some of the production batches. These changes are tracked in some other system (let’s say Excel) but not in the central DWH as they only apply to this single line of production. Self-service BI allows us to analyze changes in the parameters together with data from the central data warehouse side by side (for example quality control data in this case). 
&lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;So my opinion is that the traditional centralized BI systems really benefit from self-service BI functionality. However, self-service BI can never replace traditional BI. But what about technical aspects, especially the future of MDX? MDX is the query language for multidimensional databases (used by many vendors). In PowerPivot we can do a lot of the calculations using the new expression language called DAX. Will DAX be a replacement for MDX? Absolutely not! DAX is meant to bring analytical power to Excel users. It looks similar to Excel functions and in fact many Excel functions can be used. Its strength is simplicity. Although one could imagine extending the expression-based DAX language with query functionality (in MDX you can write both queries and expressions), this would also complicate the use of DAX, which is clearly not intended. Even with today’s Excel, many users only know about the operators +, –, / and * and the SUM function (advanced users know about SUMIF…). For end users, even power users, to be able to leverage the power of a self-service BI solution, the calculation functions have to be as simple as possible. This is the idea of DAX. However, for defining complex queries, building highly sophisticated business logic into calculations or KPIs, or implementing ease-of-use features like KPI trends, OLAP actions, drill through queries and navigation in hierarchies (DAX has no hierarchies), MDX has everything that’s needed. Including this functionality in DAX would only make it complicated and more difficult to understand and use. Of course, this is only my opinion, but I’m sure that MDX will still be used for what it is used today and DAX expressions will be used for self-service calculations and mappings that are intended to be done by non-technical people.&lt;/p&gt;  &lt;p&gt;When using client-side technology to create any kind of informational insight we have to monitor this process carefully. 
There has to be a process to maintain business requirements and implement changes to the central BI-system to avoid the negative effects mentioned above. With PowerPivot, it is also possible to monitor which workbooks have been used and which data sources have been queried (if the PowerPivot sheets are published to SharePoint 2010). I think this is also important for really understanding whether the self-service BI tool is used in the way it is intended to be used, or whether some analysis requirements that should have been part of the central BI solution are now starting to be solved in multiple departments redundantly. &lt;/p&gt;  &lt;p&gt;So again, we will see a co-operation between the centralized BI-system and self-service BI solutions, as well as between MDX and DAX. Self-service BI and DAX are not the fox in the chicken-house of traditional BI but they are extending the vision and scope of BI-systems of today and in the future.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4879885044062498968-3643974540943478850?l=ms-olap.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ms-olap.blogspot.com/feeds/3643974540943478850/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://ms-olap.blogspot.com/2010/05/self-service-bi-powerpivot-and-future.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/3643974540943478850'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/3643974540943478850'/><link rel='alternate' type='text/html' href='http://ms-olap.blogspot.com/2010/05/self-service-bi-powerpivot-and-future.html' title='Self-Service BI, PowerPivot and the future of traditional BI (DWH, OLAP, MDX)'/><author><name>Hilmar 
Buchta</name><uri>http://www.blogger.com/profile/07497529483535542290</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://2.bp.blogspot.com/_rzeWIeWVsxA/ScDkLmopJCI/AAAAAAAABAg/sdlcn6fz96M/S220/Hilmar+20090307_Klein.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh6.ggpht.com/-CAhHQZZt3X4/Th3Drq3CwWI/AAAAAAAACT8/ei2w0KDBYDg/s72-c/image_thumb1_thumb.png?imgmax=800' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4879885044062498968.post-3633355585291901528</id><published>2010-05-16T12:51:00.001+02:00</published><updated>2011-07-13T18:13:25.341+02:00</updated><title type='text'>Analyzing the number of visits per customer</title><content type='html'>&lt;p align="right"&gt;SQL Server 2008 | SQL Server 2008 R2&lt;/p&gt;  &lt;p&gt;OLAP is perfect for analyzing fact records that are mapped to dimensions. However, what can be done if the reference to the dimension changes depending on the selected period? I’m not talking about slowly changing dimensions here, but really about the period of selection.&lt;/p&gt;  &lt;p&gt;Think of the following example: A manager wants to analyze the number of visited customers for his sales force. He wants to see how many customers have been visited once, twice and so on during the last three months (or any other time period). &lt;/p&gt;  &lt;p&gt;In order to do so, we need a dimension “Visits” (with the number of visits, for example 0, 1, 2, …) but we cannot map the visits against this dimension because this mapping depends on the selected period. For example, one single customer was visited once in January and once in February. If the selected period is January, then we need to see one customer being visited once. 
If the selected period is January and February together, we would need to see the customer being visited twice.&lt;/p&gt;  &lt;p&gt;In SQL Server 2008 we can solve this by not linking our Visits dimension to the measure groups. The calculations are then performed by a cube script. For our example, we’re using a very simple data model:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-IxWCwhbfL1A/Th3D9LzUNeI/AAAAAAAACUA/c96XSc_kLFI/s1600-h/image_thumb1%25255B6%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb1" border="0" alt="image_thumb1" src="http://lh5.ggpht.com/-ib_WAaAmcRg/Th3D982UOZI/AAAAAAAACUE/mDtkAtXMHF0/image_thumb1_thumb%25255B1%25255D.png?imgmax=800" width="453" height="273" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;The fact table FactVisits contains one row per visit (linked to date and customer). Note that the Visits dimension is not linked to the fact table. The source table for the visit dimension looks like this.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-iBnJeZWSiIo/Th3D-l4KS-I/AAAAAAAACUI/R6YDFdbx5qY/s1600-h/image_thumb2%25255B6%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb2" border="0" alt="image_thumb2" src="http://lh5.ggpht.com/-c7-cvfwplV8/Th3D_lpxWjI/AAAAAAAACUM/Wk8BngK6drc/image_thumb2_thumb%25255B1%25255D.png?imgmax=800" width="220" height="244" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;For the cube, we need to calculate the entries for the Visits dimension. To do so, I created an additional measure (Customer Count) based on some other value I found in the source fact table (the value will be overwritten using the cube script later). 
Usually you would create an additional column in the data source view (value 0) to source this measure from. Here is the definition of my Customer Count measure:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh3.ggpht.com/-QZJObQEVvg0/Th3EAF195QI/AAAAAAAACUQ/azCJl-vNHXo/s1600-h/image_thumb31%25255B6%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb31" border="0" alt="image_thumb31" src="http://lh5.ggpht.com/-XMIhHNBcdxQ/Th3EAzt4KRI/AAAAAAAACUU/2uiNo-brGR8/image_thumb31_thumb%25255B1%25255D.png?imgmax=800" width="190" height="244" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;In order to calculate the number of customers for each visit count, I created the following cube script:&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;scope ([Visits].[Visit].[Visit],[Customer].[Customer].[Customer],[Measures].[Customer Count]);      &lt;br /&gt;this=iif(([Measures].[Visit Count])=CInt([Visits].[Visit].currentmember.properties(&amp;quot;KEY&amp;quot;)),1,NULL);       &lt;br /&gt;end scope;&lt;/font&gt; &lt;/p&gt;  &lt;p&gt;The script is computed on leaf level of the visits (visit count dimension). In the cube script, the reference to [Visits].[Visit].currentmember.properties(&amp;quot;KEY&amp;quot;) results in the VisitID (number of visits). If the number of visits (measure [Visit Count]) is equal to the number of visits found in the visit dimension, this is counted as one. Note, that I used NULL instead of 0. This makes it easier to analyze, which customers have been visited for a given number of times, as null values can be suppressed (see last screenshot of this post).&lt;/p&gt;  &lt;p&gt;So, let’s check the results up to this point. 
&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh5.ggpht.com/-37d0DI0fp20/Th3EBZ62PBI/AAAAAAAACUY/Os_ouHi9iiY/s1600-h/image_thumb4%25255B6%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb4" border="0" alt="image_thumb4" src="http://lh5.ggpht.com/-GDYdAF3Nh5c/Th3EB39sKUI/AAAAAAAACUc/6KOGMWPZ1Ps/image_thumb4_thumb%25255B1%25255D.png?imgmax=800" width="147" height="189" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;The calculation now works for every single line but currently the totals are still wrong. The correct value for the totals can be computed using a dynamic set (to allow Excel multi selects) like this:&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;CREATE HIDDEN DYNAMIC SET CURRENTCUBE.[DynaVisits] AS [Visits].[Visit].[Visit];      &lt;br /&gt;([Visits].[Visit].[All])=Sum(existing [DynaVisits]);&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;I also want to count more than ten visits on a special dimension element (ID 999 in the table above). 
This can be achieved with the following cube script:&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;scope ([Visits].[Visit].&amp;amp;[999],[Customer].[Customer].[Customer],[Measures].[Customer Count]);      &lt;br /&gt;this=iif(([Measures].[Visit Count])&amp;gt;10,1,NULL);       &lt;br /&gt;end scope;&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;Testing the results (here using Microsoft Excel) shows the correct calculation for the grand total as well as for other levels of the visit count hierarchy:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh5.ggpht.com/-V_F4I062aCs/Th3ECVzOMhI/AAAAAAAACUg/zjEykyaTTXE/s1600-h/image_thumb11%25255B6%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb11" border="0" alt="image_thumb11" src="http://lh3.ggpht.com/-r15ahVqk8oQ/Th3EDGYZIXI/AAAAAAAACUk/mLCHfVSUC0E/image_thumb11_thumb%25255B1%25255D.png?imgmax=800" width="293" height="424" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;The way our calculation works can best be observed when looking at a single customer.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh5.ggpht.com/-GrWF9Pbs6iI/Th3EDuw9yMI/AAAAAAAACUo/YCOez3adAOE/s1600-h/image_thumb111%25255B6%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb111" border="0" alt="image_thumb111" src="http://lh6.ggpht.com/-3drEb4pOiug/Th3EEUqz2zI/AAAAAAAACUs/NVaiDoPh8j0/image_thumb111_thumb%25255B1%25255D.png?imgmax=800" width="376" height="202" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;In this case we had two visits of our customer in 2008 and one visit in 2009 giving a total of three visits for 2008 and 2009. Note that the actual fact is shown as a dimensional property here. 
&lt;/p&gt;  &lt;p&gt;We can also apply multi select filters in Excel. Here is another sample screenshot filtering only customers with less than 4 visits during the last three months in 2009 (this is the ‘(multiple items)’ filter for the calendar). The data is now also analyzed by the sales representative (attribute of the customer). Note, that the grand totals are still correct:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh5.ggpht.com/-8pljJK393Jk/Th3EExfs12I/AAAAAAAACUw/gueQf5SeUn4/s1600-h/image_thumb3%25255B6%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb3" border="0" alt="image_thumb3" src="http://lh4.ggpht.com/-98o_Urp780Q/Th3EFlm4kcI/AAAAAAAACU0/oaXwVZTIUU0/image_thumb3_thumb%25255B1%25255D.png?imgmax=800" width="881" height="204" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;As mentioned before it is also possible to see which customers are listed here by including the customers in the pivot table. This is shown in the following screenshot (filter to visit count = Zero).&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh3.ggpht.com/-Tv4XvL0K1sU/Th3EG7QG_1I/AAAAAAAACU4/YjKEe1JNJVo/s1600-h/image_thumb8%25255B6%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb8" border="0" alt="image_thumb8" src="http://lh4.ggpht.com/-0l-EN5LXeyQ/Th3EHuuIyeI/AAAAAAAACU8/AYCdeTw8Cq8/image_thumb8_thumb%25255B1%25255D.png?imgmax=800" width="724" height="281" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;However, the drill through action would still return all customers (due to the definition of my measure) as the value is only based on a computation. 
So the drill through should be disabled here.&lt;/p&gt;  &lt;p&gt;In this case, I simply checked the measure in order to enable the drill through action:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh3.ggpht.com/-WYUVASSDvVE/Th3EIVhgzXI/AAAAAAAACVA/_XkHM705ql4/s1600-h/image_thumb17%25255B6%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb17" border="0" alt="image_thumb17" src="http://lh5.ggpht.com/-rrZtc_JQBfM/Th3EJELjMII/AAAAAAAACVI/-DeRSUgIhuQ/image_thumb17_thumb%25255B1%25255D.png?imgmax=800" width="515" height="369" /&gt;&lt;/a&gt;&lt;/p&gt;    &lt;p&gt;Of course, the example is still a very simplified one. Usually you would need to know the customer base, so you’re not counting new customers as not being visited during the last years. &lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4879885044062498968-3633355585291901528?l=ms-olap.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ms-olap.blogspot.com/feeds/3633355585291901528/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://ms-olap.blogspot.com/2010/05/analyzing-number-of-visits-per-customer.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/3633355585291901528'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/3633355585291901528'/><link rel='alternate' type='text/html' href='http://ms-olap.blogspot.com/2010/05/analyzing-number-of-visits-per-customer.html' title='Analyzing the number of visits per customer'/><author><name>Hilmar 
Buchta</name><uri>http://www.blogger.com/profile/07497529483535542290</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://2.bp.blogspot.com/_rzeWIeWVsxA/ScDkLmopJCI/AAAAAAAABAg/sdlcn6fz96M/S220/Hilmar+20090307_Klein.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh5.ggpht.com/-ib_WAaAmcRg/Th3D982UOZI/AAAAAAAACUE/mDtkAtXMHF0/s72-c/image_thumb1_thumb%25255B1%25255D.png?imgmax=800' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4879885044062498968.post-6319835958095706755</id><published>2010-04-25T12:22:00.001+02:00</published><updated>2011-07-13T18:13:45.048+02:00</updated><title type='text'>Pie chart with ‘others’ category (collected data)</title><content type='html'>&lt;p align="right"&gt;SQL Server 2005 | SQL Server 2008&lt;/p&gt;  &lt;p&gt;Pie charts with too many categories don’t make much sense. The following screenshot shows the order count from the AdventureWorks OLAP database by subcategory (no selection on date here for this example) as a pie chart:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh4.ggpht.com/-ZCZ9oH3qI5Y/Th3EI1YD9qI/AAAAAAAACVE/6vDQ262OwTY/s1600-h/image_thumb4%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb4" border="0" alt="image_thumb4" src="http://lh5.ggpht.com/-BGJGtfoLAtc/Th3EJ2TsfnI/AAAAAAAACVM/-Gy1bNP11PU/image_thumb4_thumb.png?imgmax=800" width="244" height="142" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;Fortunately, SQL Server 2008 Reporting Services adds a feature to collect all slices below a certain threshold (either as a fixed value or as a percentage) as shown below:&lt;/p&gt;  &lt;p&gt;&lt;a 
href="http://lh5.ggpht.com/-hu5i-6HVbSo/Th3EKt1VHgI/AAAAAAAACVQ/Sjz7YQYpqjs/s1600-h/image_thumb3%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb3" border="0" alt="image_thumb3" src="http://lh4.ggpht.com/-e4yQdspeQyo/Th3ELQ9dRJI/AAAAAAAACVU/qO0ruh96c7U/image_thumb3_thumb.png?imgmax=800" width="244" height="155" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;To get this result you have to check the custom attributes (properties) of the chart series (either by selecting the pie itself or by choosing the chart series in the property box picker)&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-iB1P_xnx2Mw/Th3EL8ApA9I/AAAAAAAACVY/LMC5KTATlvQ/s1600-h/image_thumb6%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb6" border="0" alt="image_thumb6" src="http://lh5.ggpht.com/-z6I4QCjnWbw/Th3EMj2F7UI/AAAAAAAACVc/Sy5j1kltxVg/image_thumb6_thumb.png?imgmax=800" width="243" height="244" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;In our example I collected all slices with a value of less than 3% to one single slice with the name ‘Other’. 
You can even show the other values as an exploded pie chart (although I think it’s more confusing).&lt;/p&gt;  &lt;p&gt;In cases where you want to show just a certain number of slices (instead of using a threshold) or if you are using Reporting Services 2005 which doesn’t support the collected slice, you may do the collection by MDX:&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;WITH      &lt;br /&gt;SET SelectedSubCategories AS       &lt;br /&gt;TopCount       &lt;br /&gt;(       &lt;br /&gt;Order       &lt;br /&gt;(       &lt;br /&gt;[Product].[Subcategory].[Subcategory]       &lt;br /&gt;,[Measures].[Order Count]       &lt;br /&gt;,DESC       &lt;br /&gt;)       &lt;br /&gt;,10       &lt;br /&gt;)       &lt;br /&gt;SET OtherSubCategories AS       &lt;br /&gt;[Product].[Subcategory].[Subcategory] - SelectedSubCategories       &lt;br /&gt;MEMBER [Product].[Subcategory].[Other] AS       &lt;br /&gt;Aggregate(OtherSubCategories)       &lt;br /&gt;SELECT       &lt;br /&gt;[Measures].[Order Count] ON 0       &lt;br /&gt;,NON EMPTY       &lt;br /&gt;{       &lt;br /&gt;SelectedSubCategories       &lt;br /&gt;,[Product].[Subcategory].[Other]       &lt;br /&gt;} ON 1       &lt;br /&gt;FROM [Adventure Works]       &lt;br /&gt;CELL PROPERTIES VALUE;&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;The idea is quite simple. First, you create a set with the number of slices you want to see (I called it SelectedSubCategories here). Then you can simply get all other categories using a set minus operation (I called it OtherSubCategories here). 
Finally you create the ‘Other’ member in the dimension as an aggregate of the last set.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh4.ggpht.com/-3CXAkiReJTU/Th3ENcNNfuI/AAAAAAAACVg/piC0e-mn4uM/s1600-h/image_thumb7%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb7" border="0" alt="image_thumb7" src="http://lh3.ggpht.com/-MyikAaCbfP8/Th3EOCnhGiI/AAAAAAAACVk/cmmHhuYR1Yc/image_thumb7_thumb.png?imgmax=800" width="244" height="155" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;Of course, you can even make the parameter for the number of slices a report parameter so the user can choose how many slices are shown in the diagram.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4879885044062498968-6319835958095706755?l=ms-olap.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ms-olap.blogspot.com/feeds/6319835958095706755/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://ms-olap.blogspot.com/2010/04/pie-chart-with-others-category.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/6319835958095706755'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/6319835958095706755'/><link rel='alternate' type='text/html' href='http://ms-olap.blogspot.com/2010/04/pie-chart-with-others-category.html' title='Pie chart with ‘others’ category (collected data)'/><author><name>Hilmar Buchta</name><uri>http://www.blogger.com/profile/07497529483535542290</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='http://2.bp.blogspot.com/_rzeWIeWVsxA/ScDkLmopJCI/AAAAAAAABAg/sdlcn6fz96M/S220/Hilmar+20090307_Klein.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh5.ggpht.com/-BGJGtfoLAtc/Th3EJ2TsfnI/AAAAAAAACVM/-Gy1bNP11PU/s72-c/image_thumb4_thumb.png?imgmax=800' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4879885044062498968.post-6197912472499436915</id><published>2010-04-04T19:59:00.001+02:00</published><updated>2011-07-13T18:15:37.531+02:00</updated><title type='text'>Properly showing values for unrelated dimensions (IgnoreUnrelatedDimensions)</title><content type='html'>&lt;p align="right"&gt;SQL Server 2005 | SQL Server 2008&lt;/p&gt;  &lt;p&gt;There have already been some posts about unrelated dimension handling, but I think, the topic is worth being visited again. So this is yet another unrelated dimension post (yaudp).&lt;/p&gt;  &lt;p&gt;A good summary about this topic can be found in &lt;a href="http://bennyaustin.wordpress.com/2009/06/25/ignoreunrelateddimensions/"&gt;this post&lt;/a&gt; by Benny Austin, however there are some drawbacks not being mentioned there and my post today is focused on those drawbacks and how to deal with them.&lt;/p&gt;  &lt;p&gt;If you are short on time here is a quick summary of the following text:&lt;/p&gt;  &lt;p&gt;Setting IgnoreUnrelatedDimensions to false can make cube browsing easier to understand, but&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;Take care with default values in unrelated dimensions: As the dimension is not related, no value will be shown. &lt;/li&gt;    &lt;li&gt;Take special care of dimensions with the IsAggregatable property set to false (no ‘All’ element): If such a dimension is unrelated, setting IgnoreUnrelatedDimensions to false has no influence for this dimension &lt;/li&gt;    &lt;li&gt;Role playing dimensions are considered as different dimensions. 
If one dimension role is related and another role is unrelated, the setting IgnoreUnrelatedDimensions=false will not show data for the unrelated role(s) of the dimension. &lt;/li&gt;    &lt;li&gt;Calculated measures associated with a fact table do not depend on the IgnoreUnrelatedDimensions setting of the measure group they have been assigned to but only on the setting of the measure groups that they are generated from. &lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;If you say ‘yes, I knew all this’ I suggest you stop reading here. If not, here is the detailed explanation of the above bullet points.&lt;/p&gt;    &lt;p&gt;&lt;strong&gt;&lt;em&gt;Setting IgnoreUnrelatedDimensions to false can make cube browsing easier to understand…&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;For my samples I’m not using Analysis Services (as we have to make some changes to the model) but a very simple OLAP model that looks like this:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh4.ggpht.com/-0oTQHt5eczk/Th3EZjGQ-xI/AAAAAAAACVo/ZTXZ5ifdS4k/s1600-h/image_thumb1%25255B6%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb1" border="0" alt="image_thumb1" src="http://lh6.ggpht.com/-YceNU-6zm_M/Th3Eae7LhZI/AAAAAAAACVs/aaL1lvFFXoU/image_thumb1_thumb%25255B1%25255D.png?imgmax=800" width="673" height="190" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;Think of the model as being used for a web shop. We want to sell the six products from the fabulous novel “The Deadline” by Tom DeMarco using a web shop and compare our sales with the number of web site visitors for a given day. 
In order to see some interesting effects of the IgnoreUnrelatedDimensions topic, we have a role playing time dimension DimDate for our web site orders (order date and delivery date).&lt;/p&gt;  &lt;p&gt;So, here is the dimension usage screenshot:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh5.ggpht.com/-LWf1aGShKAQ/Th3Ea7JY0hI/AAAAAAAACVw/6WjFDYQZ-cM/s1600-h/image_thumb3%25255B6%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb3" border="0" alt="image_thumb3" src="http://lh3.ggpht.com/-y2ytuxfe3BA/Th3EbhWTEQI/AAAAAAAACV0/FPZ0f2WM6hs/image_thumb3_thumb%25255B1%25255D.png?imgmax=800" width="499" height="156" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;As you can see, the measure group for our web site visits is not linked to delivery date or product. Therefore we get the following result if we want to analyze web visits by product:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh3.ggpht.com/-25V4IgPgniM/Th3EcY18IcI/AAAAAAAACV4/nZkAN_4YC7A/s1600-h/image_thumb8%25255B6%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb8" border="0" alt="image_thumb8" src="http://lh3.ggpht.com/-4UUR2LS9HVU/Th3Ec6-ugJI/AAAAAAAACV8/Q5o-Lw_8Jt4/image_thumb8_thumb%25255B1%25255D.png?imgmax=800" width="221" height="137" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;Most people would find this result confusing. 
If you are not absolutely aware of the dimension usage, seeing the same number of web visits for every product could lead to a wrong understanding.&lt;/p&gt;  &lt;p&gt;A common workaround for this behavior is to set the IgnoreUnrelatedDimensions property of the measure group Web Visits to false as shown below:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh4.ggpht.com/-jyXYLIWAmsU/Th3Edj301YI/AAAAAAAACWA/4QHziQZFPmo/s1600-h/image_thumb5%25255B6%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb5" border="0" alt="image_thumb5" src="http://lh3.ggpht.com/-AhUtrinREag/Th3EeAipvxI/AAAAAAAACWE/ehZPjMym3Xk/image_thumb5_thumb%25255B1%25255D.png?imgmax=800" width="299" height="167" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;This causes cell results from unrelated dimensions to disappear as shown in the following screenshot:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh3.ggpht.com/-5Jfw2bB5Wkg/Th3Eeka_ieI/AAAAAAAACWI/e1I8kmmdfDI/s1600-h/image_thumb9%25255B6%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb9" border="0" alt="image_thumb9" src="http://lh3.ggpht.com/-KOH_LOvsZFI/Th3EfQvM_3I/AAAAAAAACWM/ZTvM6lAEsRE/image_thumb9_thumb%25255B1%25255D.png?imgmax=800" width="211" height="120" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;This looks much better. So why shouldn’t we always use this?&lt;/p&gt;    &lt;p&gt;&lt;strong&gt;&lt;em&gt;Take care with default values in unrelated dimensions&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;The first problem with our design appears if we define a default member other than ‘All’ for the product dimension. 
I am a special fan of Quickerstill, so let’s make this the default product:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-wUbgcVdLXnM/Th3Ef97dzuI/AAAAAAAACWQ/SxXcCzEVoa0/s1600-h/image_thumb151%25255B6%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb151" border="0" alt="image_thumb151" src="http://lh5.ggpht.com/-r9LMK8g5u8Q/Th3EgoIF37I/AAAAAAAACWU/JHmXoRkHIaQ/image_thumb151_thumb%25255B1%25255D.png?imgmax=800" width="424" height="231" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;Please keep in mind that there are other methods of setting the default value (cube script, role security definition) that will have the same effect. Since our web visit measure group is not related to the product, we now get no value returned when we simply use the measure “Number of visits”:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh5.ggpht.com/-rr9DgON6oAE/Th3EhFR3c7I/AAAAAAAACWY/NXlnCJxb-wc/s1600-h/image_thumb161%25255B6%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb161" border="0" alt="image_thumb161" src="http://lh6.ggpht.com/-iILhhhvqmVQ/Th3Eh_lpzmI/AAAAAAAACWc/2Kzv0ZIfSrQ/image_thumb161_thumb%25255B1%25255D.png?imgmax=800" width="146" height="66" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;The same happens for the following MDX query:&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;select [Measures].[Number Of Visits] on 0      &lt;br /&gt;from [OLAPSample]&lt;/font&gt; &lt;/p&gt;  &lt;p&gt;The reason is that although we don’t have the product dimension in our pivot table, the default member still acts as a filter and as you can see from the previous screenshot of the pivot table, there is no value for ‘number of visits’ for 
the product members. The only way to get a value for the Number of visits is now to explicitly set the product dimension to its all member. I’m using MDX here in order not to get confused by side effects caused by the way the built-in pivot component generates the sums: &lt;/p&gt;  &lt;table border="0" cellspacing="0" cellpadding="2" width="955"&gt;&lt;tbody&gt;     &lt;tr&gt;       &lt;td valign="top" width="234"&gt;&lt;a href="http://lh4.ggpht.com/-5bYyEKMlouk/Th3EipjsgXI/AAAAAAAACWg/INYDUp4pLoQ/s1600-h/image_thumb17%25255B6%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb17" border="0" alt="image_thumb17" src="http://lh4.ggpht.com/--PSSBBz_nfw/Th3Ejaq5EOI/AAAAAAAACWk/l8BGd0JU5g8/image_thumb17_thumb%25255B1%25255D.png?imgmax=800" width="169" height="156" /&gt;&lt;/a&gt;&lt;/td&gt;        &lt;td valign="top" width="719"&gt;&lt;font face="Courier New"&gt;select [Measures].[Number Of Visits] on 0,            &lt;br /&gt;[Product].[Product].allmembers on 1             &lt;br /&gt;from [OLAPSample]&lt;/font&gt; &lt;/td&gt;     &lt;/tr&gt;   &lt;/tbody&gt;&lt;/table&gt;    &lt;p&gt;&lt;strong&gt;&lt;em&gt;Take special care of dimensions with the IsAggregatable property set to false…&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;After the example above, I would have expected that removing the ‘All’ member from the dimension would result in no values being shown in any case. 
Removing the ‘All’ member is simply done by setting IsAggregatable to false:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-RTQ8Bsd6RLc/Th3EjzG9AiI/AAAAAAAACWo/Wihext-AaNA/s1600-h/image_thumb91%25255B6%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb91" border="0" alt="image_thumb91" src="http://lh6.ggpht.com/-B56Q0GVs3eY/Th3Ekjb-fLI/AAAAAAAACWs/_wDtOwu2Wi4/image_thumb91_thumb%25255B1%25255D.png?imgmax=800" width="287" height="213" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;But now the query above results in the following: &lt;/p&gt;  &lt;table border="0" cellspacing="0" cellpadding="2" width="841"&gt;&lt;tbody&gt;     &lt;tr&gt;       &lt;td valign="top" width="238"&gt;&lt;a href="http://lh5.ggpht.com/-YW3DQ8XN0IY/Th3ElZ7ORUI/AAAAAAAACWw/3BR9nQ-YdXY/s1600-h/image_thumb10%25255B6%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb10" border="0" alt="image_thumb10" src="http://lh3.ggpht.com/-11Q5dq8rvN0/Th3El-NawfI/AAAAAAAACW0/Q8ZuBhYHpOg/image_thumb10_thumb%25255B1%25255D.png?imgmax=800" width="170" height="138" /&gt;&lt;/a&gt;&lt;/td&gt;        &lt;td valign="top" width="601"&gt;&lt;font face="Courier New"&gt;select [Measures].[Number Of Visits] on 0,            &lt;br /&gt;[Product].[Product].allmembers on 1             &lt;br /&gt;from [OLAPSample]&lt;/font&gt;&lt;/td&gt;     &lt;/tr&gt;   &lt;/tbody&gt;&lt;/table&gt;    &lt;p&gt;As expected, we don’t see the total any more. What I did not expect is that the result is exactly the same as if we had set IgnoreUnrelatedDimensions to true (the default). 
So in this case the setting for IgnoreUnrelatedDimensions is ignored.&lt;/p&gt;  &lt;p&gt;I would still doubt the need for the default member in many cases. It could be an indicator that the fact table isn’t properly modeled (if we have to exclude certain fact values ‘by default’) or the default member might be replaceable by calculated measures.&lt;/p&gt;  &lt;p&gt;Also, the problem with the default member only exists if the dimension is not related. In some cases you can fix this. For example, if you have a scenario dimension (actual, plan, forecast etc.) with different fact sources and with a default of ‘actual’ you can link all the fact sources to the matching dimension member (although your actual data source contains only one scenario).&lt;/p&gt;    &lt;p&gt;&lt;strong&gt;&lt;em&gt;Unrelated role playing dimensions…&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;In our sample we have two roles for the time dimension (order date and delivery date). However, web visits are only linked to one of them (order date). Therefore analyzing the web visits by the other (unlinked) role has the same effect as with any other unlinked dimension: The values are suppressed. 
For the following screenshot, we took the Year from the &lt;u&gt;delivery date&lt;/u&gt; dimension:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh3.ggpht.com/-UK_b9_0YU0Y/Th3EmhVTPGI/AAAAAAAACW4/kNRDTSVGzlY/s1600-h/image_thumb11%25255B6%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb11" border="0" alt="image_thumb11" src="http://lh5.ggpht.com/-RsmaCe2GFyo/Th3EneaeWpI/AAAAAAAACW8/8c5qbCCON2Q/image_thumb11_thumb%25255B1%25255D.png?imgmax=800" width="208" height="63" /&gt;&lt;/a&gt; &lt;/p&gt;    &lt;p&gt;&lt;strong&gt;&lt;em&gt;Calculated measures…&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;Regarding the IgnoreUnrelatedDimensions setting, calculated measures are always calculated with the settings of the measure groups used in the calculation. The ASSOCIATED_MEASURE_GROUP attribute has no influence here. For example, let’s simply “copy” the measure “Total Price” from the order measure group to the web visits measure group and vice versa:&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;CREATE MEMBER CURRENTCUBE.[Measures].[Total Price (2)]      &lt;br /&gt;AS [Measures].[Total Price],       &lt;br /&gt;FORMAT_STRING = &amp;quot;#,##0.00&amp;quot;,       &lt;br /&gt;VISIBLE = 1 , ASSOCIATED_MEASURE_GROUP = 'Fact Web Visits' ; &lt;/font&gt;&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;CREATE MEMBER CURRENTCUBE.[Measures].[Number Of Visits (2)]      &lt;br /&gt;AS [Measures].[Number Of Visits],       &lt;br /&gt;FORMAT_STRING = &amp;quot;#,##0&amp;quot;,       &lt;br /&gt;VISIBLE = 1 , ASSOCIATED_MEASURE_GROUP = 'Fact Order' ;&lt;/font&gt; &lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-Bjmk5LODrJs/Th3EoOX4gkI/AAAAAAAACXA/4_bHtcQfxF0/s1600-h/image_thumb16%25255B6%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; 
padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb16" border="0" alt="image_thumb16" src="http://lh4.ggpht.com/-GQwD2LFPeW0/Th3Eoxo_dcI/AAAAAAAACXE/sbA9p4F_ETk/image_thumb16_thumb%25255B1%25255D.png?imgmax=800" width="203" height="152" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;The following screenshot shows that each of our calculated measures (Total Price (2) and Number Of Visits (2)) behaves in exactly the same way as the original measure although they are part of a different measure group: &lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh5.ggpht.com/-RpxzXWKmQFs/Th3EpVTKcLI/AAAAAAAACXI/bo_sUWWvMVg/s1600-h/image_thumb15%25255B6%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb15" border="0" alt="image_thumb15" src="http://lh3.ggpht.com/-kPd8JoROG1s/Th3EqJB1sII/AAAAAAAACXM/oHU551cTfm8/image_thumb15_thumb%25255B1%25255D.png?imgmax=800" width="383" height="118" /&gt;&lt;/a&gt; &lt;/p&gt;    &lt;p&gt;&lt;strong&gt;&lt;em&gt;Conclusion&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;Setting IgnoreUnrelatedDimensions can be a good way of improving the readability of OLAP results. However, things get tricky with unrelated dimensions that have default members. In some cases you can get rid of the problems by changing the design (splitting fact tables, using calculated measures, relating unrelated dimensions if possible). 
Especially with non-aggregatable dimensions the behavior becomes strange as IgnoreUnrelatedDimensions doesn’t work at all in this case.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4879885044062498968-6197912472499436915?l=ms-olap.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ms-olap.blogspot.com/feeds/6197912472499436915/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://ms-olap.blogspot.com/2010/04/properly-showing-values-for-unrelated.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/6197912472499436915'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/6197912472499436915'/><link rel='alternate' type='text/html' href='http://ms-olap.blogspot.com/2010/04/properly-showing-values-for-unrelated.html' title='Properly showing values for unrelated dimensions (IgnoreUnrelatedDimensions)'/><author><name>Hilmar Buchta</name><uri>http://www.blogger.com/profile/07497529483535542290</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://2.bp.blogspot.com/_rzeWIeWVsxA/ScDkLmopJCI/AAAAAAAABAg/sdlcn6fz96M/S220/Hilmar+20090307_Klein.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh6.ggpht.com/-YceNU-6zm_M/Th3Eae7LhZI/AAAAAAAACVs/aaL1lvFFXoU/s72-c/image_thumb1_thumb%25255B1%25255D.png?imgmax=800' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4879885044062498968.post-3976626523379037033</id><published>2010-03-14T17:39:00.001+01:00</published><updated>2011-07-13T18:16:48.303+02:00</updated><title type='text'>Effects of attribute relationship settings on 
calculations</title><content type='html'>&lt;p align="right"&gt;SQL Server 2005 | SQL Server 2008&lt;/p&gt;  &lt;p&gt;   &lt;p&gt;This post is again about attribute relationships. I recently saw a very good presentation by Michael Mukovskiy, a colleague and friend of mine, regarding attribute relationships and their influence on calculated members.&lt;/p&gt;    &lt;p&gt;In order to keep things simple, I start with a very simple date dimension having the following attributes and relations:&lt;/p&gt;    &lt;p&gt;&lt;a href="http://lh4.ggpht.com/-IKw2G62sVdk/Th3EuH8arAI/AAAAAAAACXQ/VTT_qUrnasY/s1600-h/image_thumb2%25255B6%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb2" border="0" alt="image_thumb2" src="http://lh5.ggpht.com/-F-WdxSasnq8/Th3EvL7fMII/AAAAAAAACXU/0WUNKJUr0Qc/image_thumb2_thumb%25255B1%25255D.png?imgmax=800" width="497" height="40" /&gt;&lt;/a&gt; &lt;/p&gt;    &lt;p&gt;Let’s assume we also have a measure group “Sales” with a measure “Quantity”. For our cube we also want to display the percentage of the sales with respect to the year (e.g. January: 10%, February: 12% etc.). 
In order to do so, we need the quantity per year and for our simple example I just do the computation for this (the percentage can easily be computed then).&lt;/p&gt;    &lt;p&gt;To do so, we use the following cube script:&lt;/p&gt;    &lt;p&gt;&lt;font face="Courier New"&gt;CREATE MEMBER CURRENTCUBE.[Measures].QuantityFullYear        &lt;br /&gt;AS ([Measures].[Quantity], [Date].[Year].currentmember),         &lt;br /&gt;VISIBLE = 1 , ASSOCIATED_MEASURE_GROUP = 'Sales' ;&lt;/font&gt; &lt;/p&gt;    &lt;p&gt;Opening the cube browser, one can see something like this:&lt;/p&gt;    &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-JEioSVSfhks/Th3EvlL2a0I/AAAAAAAACXY/NviL8hN4jwA/s1600-h/image_thumb4%25255B6%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb4" border="0" alt="image_thumb4" src="http://lh6.ggpht.com/-JA37IAgqHNE/Th3EwPwUK3I/AAAAAAAACXc/x8gyvSmpdng/image_thumb4_thumb%25255B1%25255D.png?imgmax=800" width="314" height="212" /&gt;&lt;/a&gt; &lt;/p&gt;    &lt;p&gt;For this post it’s only important that the calculated member defined above computes correctly on every level of the date hierarchy: For each year we’re getting the total of the full year no matter what level in the date hierarchy we have on our axis. It’s sufficient to use [Date].[Year].currentmember in the calculation to do the trick. However, in order to understand the following example we have to look at the computation a little more closely. 
&lt;/p&gt;    &lt;p&gt;So let’s take a look at one of the query cells:&lt;/p&gt;    &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-NXZYJHVeI54/Th3Ew2a9s2I/AAAAAAAACXg/eLKYTJB4BWU/s1600-h/image_thumb6%25255B6%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb6" border="0" alt="image_thumb6" src="http://lh3.ggpht.com/-5iUCZ-GNdS8/Th3ExnGfAYI/AAAAAAAACXk/OX_cKBFNSfg/image_thumb6_thumb%25255B1%25255D.png?imgmax=800" width="314" height="60" /&gt;&lt;/a&gt; &lt;/p&gt;    &lt;p&gt;Before our computation takes place, May 1, 2009 is selected on the date hierarchy so this is the context of our calculation. Because of our attribute relationship, this also results in changes for the other date attributes as shown below:&lt;/p&gt;    &lt;p&gt;&lt;a href="http://lh5.ggpht.com/-F341xWTKt38/Th3EyEgHnlI/AAAAAAAACXo/FgNfIbc1i80/s1600-h/image_thumb7%25255B6%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb7" border="0" alt="image_thumb7" src="http://lh4.ggpht.com/-uT6e98kSC6k/Th3Ey8v_GjI/AAAAAAAACXs/SmDySk71VcA/image_thumb7_thumb%25255B1%25255D.png?imgmax=800" width="168" height="244" /&gt;&lt;/a&gt; &lt;/p&gt;    &lt;p&gt;You can verify this easily by defining calculated members that rely on each of the levels (e.g.&lt;font face="Courier New"&gt; member QuarterName As [Date].[Quarter].currentmember.name&lt;/font&gt;).&lt;/p&gt;    &lt;p&gt;It might be a little bit surprising that a simple calculation like &lt;/p&gt;    &lt;p&gt;&lt;font face="Courier New"&gt;(        &lt;br /&gt;[Measures].[Quantity],         &lt;br /&gt;[Date].[Year].currentmember         &lt;br /&gt;)&lt;/font&gt; &lt;/p&gt;    &lt;p&gt;really gives the full 
year’s value because we’re also in the context of a specific month (May) and quarter (Q2) and even day (1). So one could assume that we would have to write our calculation like this:&lt;/p&gt;    &lt;p&gt;&lt;font face="Courier New"&gt;(        &lt;br /&gt;[Measures].[Quantity],         &lt;br /&gt;&lt;/font&gt;&lt;font face="Courier New"&gt;[Date].[Year].currentmember,        &lt;br /&gt;[Date].[Quarter].[All],         &lt;br /&gt;[Date].[Month].[All],         &lt;br /&gt;[Date].[Day].[All]         &lt;br /&gt;)&lt;/font&gt; &lt;/p&gt;    &lt;p&gt;As we saw from our example above, this is not necessary (although it gives the same result). The reason for this is that the reference to a specific member in the Year attribute again changes the context for our computation and in this case this results in all attributes preceding the Year attribute (in our case: Quarter, Month and Day) being changed to All.&lt;/p&gt;    &lt;p&gt;In more detail, the following rules apply for single attribute context changes (in this example the context change happens for the Month attribute):&lt;/p&gt;    &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-OkArqkbKtY4/Th3EzbAPxUI/AAAAAAAACXw/rgvtmuMF_xk/s1600-h/image_thumb11%25255B6%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb11" border="0" alt="image_thumb11" src="http://lh5.ggpht.com/-TaKowwB3jRg/Th3E0NRHlmI/AAAAAAAACX0/-QF7shW6__w/image_thumb11_thumb%25255B1%25255D.png?imgmax=800" width="663" height="278" /&gt;&lt;/a&gt; &lt;/p&gt;    &lt;p&gt;As we can see from the first rule shown here, a context change to a specific member results in all attributes that precede the changed attribute in the attribute relationship being changed to All. So this does the trick for our computation. 
It doesn’t matter that we’re actually changing the year to the same value it had before as the context was 2009 and currentmember also gives 2009, so we’re changing from 2009 to 2009. It’s still an Any/All –&amp;gt; specific value change and therefore all preceding attributes are changed to All.&lt;/p&gt;    &lt;p&gt;Up till now, this was only the prerequisite for this post. So now, it’s getting more interesting. Let’s assume everybody’s happy with our cube and it is used for some months. Nobody really remembers how our calculated member is defined and everything works correctly.&lt;/p&gt;    &lt;p&gt;Then, one user would like to have a calendar week attribute included in the date dimension. Of course this is easily done and now our attribute relationship looks like this:&lt;/p&gt;    &lt;p&gt;&lt;a href="http://lh4.ggpht.com/-jAR2dOJUOJA/Th3E0kPgYlI/AAAAAAAACX4/QUlxV5TcvH8/s1600-h/image_thumb13%25255B6%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb13" border="0" alt="image_thumb13" src="http://lh5.ggpht.com/-o_ydNXyumdg/Th3E1RuaXdI/AAAAAAAACX8/Q6WzqarM8qk/image_thumb13_thumb%25255B1%25255D.png?imgmax=800" width="499" height="67" /&gt;&lt;/a&gt; &lt;/p&gt;    &lt;p&gt;Of course we could also model an attribute relationship between Week and Year (at least for some definitions of the calendar week…) and also define a hierarchy for this. But for our simple example let’s continue without.&lt;/p&gt;    &lt;p&gt;So, we only changed the date dimension and deployed our cube, expecting our calculated member from above to work properly after this change (hey, we didn’t touch it). 
So, let’s take a look at the pivot table we used above:&lt;/p&gt;    &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-6pSa0wmtuDk/Th3E10apMXI/AAAAAAAACYA/w6i9NdQdq_g/s1600-h/image_thumb19%25255B6%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb19" border="0" alt="image_thumb19" src="http://lh6.ggpht.com/-H-daSAF0QDU/Th3E2kzur4I/AAAAAAAACYE/DS0zgscKTyg/image_thumb19_thumb%25255B1%25255D.png?imgmax=800" width="313" height="214" /&gt;&lt;/a&gt; &lt;/p&gt;    &lt;p&gt;As you can clearly see, our calculated member still works fine for the year, quarter and month level but not for the day level of our hierarchy. In order to understand what went wrong here, let’s again take a look at a specific date, e.g. May 1, 2009. Instead of giving the full year’s value of 1501, we only get 21 here. The context change to this specific date also changes our week attribute (as it depends on the day). In my method for computing the calendar week, it computes to week 18. 
The following screenshot shows the calendar week together with the day: &lt;/p&gt;    &lt;p&gt;&lt;a href="http://lh4.ggpht.com/--L6KY9tr6hk/Th3E3VcNC1I/AAAAAAAACYI/Bg5RGAzB-tA/s1600-h/image_thumb21%25255B6%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb21" border="0" alt="image_thumb21" src="http://lh3.ggpht.com/-Li3p2dYzL-g/Th3E3-QKhJI/AAAAAAAACYM/LtLh6PXLypI/image_thumb21_thumb%25255B1%25255D.png?imgmax=800" width="283" height="490" /&gt;&lt;/a&gt; &lt;/p&gt;    &lt;p&gt;For the cell of May 1, 2009 our calculated member QuantityFullYear is computed in the following context:&lt;/p&gt;    &lt;table border="0" cellspacing="0" cellpadding="2" width="221"&gt;&lt;tbody&gt;       &lt;tr&gt;         &lt;td valign="top" width="111"&gt;&lt;strong&gt;Attribute&lt;/strong&gt;&lt;/td&gt;          &lt;td valign="top" width="108"&gt;&lt;strong&gt;Context&lt;/strong&gt;&lt;/td&gt;       &lt;/tr&gt;        &lt;tr&gt;         &lt;td valign="top" width="111"&gt;Day&lt;/td&gt;          &lt;td valign="top" width="108"&gt;May 1, 2009&lt;/td&gt;       &lt;/tr&gt;        &lt;tr&gt;         &lt;td valign="top" width="111"&gt;Month&lt;/td&gt;          &lt;td valign="top" width="108"&gt;May&lt;/td&gt;       &lt;/tr&gt;        &lt;tr&gt;         &lt;td valign="top" width="111"&gt;Quarter&lt;/td&gt;          &lt;td valign="top" width="108"&gt;Q2/09&lt;/td&gt;       &lt;/tr&gt;        &lt;tr&gt;         &lt;td valign="top" width="111"&gt;Year&lt;/td&gt;          &lt;td valign="top" width="108"&gt;2009&lt;/td&gt;       &lt;/tr&gt;        &lt;tr&gt;         &lt;td valign="top" width="111"&gt;Week&lt;/td&gt;          &lt;td valign="top" width="108"&gt;18&lt;/td&gt;       &lt;/tr&gt;     &lt;/tbody&gt;&lt;/table&gt;    &lt;p&gt;Now the expression ([Measures].[Quantity], [Date].[Year].currentmember) is evaluated. 
Since [Date].[Year].currentmember is now 2009 (based on the context we’re in), we have a context change like in rule 1 above (although it’s again changing to the same value 2009 –&amp;gt; 2009). This forces all attributes that precede the year attribute to change to All. But our week does not precede the year, as it is a kind of branch, as shown below (sometimes the visualization of BIDS Helper is easier to understand compared to the built-in functionality):&lt;/p&gt;    &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-LmMflvzePzM/Th3E42BlynI/AAAAAAAACYQ/SQT2-Bi9Va8/s1600-h/image_thumb22%25255B6%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb22" border="0" alt="image_thumb22" src="http://lh5.ggpht.com/-J6-jg1-acIE/Th3E5UoOYAI/AAAAAAAACYU/5ZMAiXcB-eA/image_thumb22_thumb%25255B1%25255D.png?imgmax=800" width="199" height="244" /&gt;&lt;/a&gt; &lt;/p&gt;    &lt;p&gt;So for our computation, the measure Quantity is evaluated in this context:&lt;/p&gt;    &lt;table border="0" cellspacing="0" cellpadding="2" width="221"&gt;&lt;tbody&gt;       &lt;tr&gt;         &lt;td valign="top" width="111"&gt;&lt;strong&gt;Attribute&lt;/strong&gt;&lt;/td&gt;          &lt;td valign="top" width="108"&gt;&lt;strong&gt;Context&lt;/strong&gt;&lt;/td&gt;       &lt;/tr&gt;        &lt;tr&gt;         &lt;td valign="top" width="111"&gt;Day&lt;/td&gt;          &lt;td valign="top" width="108"&gt;All&lt;/td&gt;       &lt;/tr&gt;        &lt;tr&gt;         &lt;td valign="top" width="111"&gt;Month&lt;/td&gt;          &lt;td valign="top" width="108"&gt;All&lt;/td&gt;       &lt;/tr&gt;        &lt;tr&gt;         &lt;td valign="top" width="111"&gt;Quarter&lt;/td&gt;          &lt;td valign="top" width="108"&gt;All&lt;/td&gt;       &lt;/tr&gt;        &lt;tr&gt;         &lt;td valign="top" width="111"&gt;Year&lt;/td&gt;          
&lt;td valign="top" width="108"&gt;2009&lt;/td&gt;       &lt;/tr&gt;        &lt;tr&gt;         &lt;td valign="top" width="111"&gt;Week&lt;/td&gt;          &lt;td valign="top" width="108"&gt;&lt;strong&gt;&lt;font color="#ff0000"&gt;18&lt;/font&gt;&lt;/strong&gt;&lt;/td&gt;       &lt;/tr&gt;     &lt;/tbody&gt;&lt;/table&gt;    &lt;p&gt;This means we’re only getting the aggregated quantity of week 18, which can easily be verified by looking at our last pivot including the week (4+2+6+0+0+3+6=21).&lt;/p&gt;    &lt;p&gt;Although we made no changes to the calculation itself, it doesn’t work properly anymore after our change to the attribute relationship. This is just another example that shows that you really need to take care of your attribute relationships and also need to fully understand the consequences for calculations. Even worse, such problems are hard to find, as our change to the cube happened in a totally different part so nobody expects the calculation to fail afterwards.&lt;/p&gt;    &lt;p&gt;I also recommend establishing test queries to assert the functionality of all computations. Such test queries can be run from the ETL process and check whether all computations are still working after loading data into the cube (kind of a unit test idea).&lt;/p&gt;    &lt;p&gt;For our problem with the computation, there are at least two possible solutions. In some cases you can simply create the missing attribute relationship. In our case we could create a relationship between the week of the year and the year attribute (assuming the definition of the calendar week allows doing so). 
Our attribute relationship for the date dimension would look like this then:&lt;/p&gt;    &lt;p&gt;&lt;a href="http://lh5.ggpht.com/-aT6NjeAiVT4/Th3E51dgUdI/AAAAAAAACYY/5_z8zjc0a8Y/s1600-h/image_thumb1%25255B6%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb1" border="0" alt="image_thumb1" src="http://lh6.ggpht.com/-mE90P8736XU/Th3E6m_eYUI/AAAAAAAACYc/wxUeg06t8vg/image_thumb1_thumb%25255B1%25255D.png?imgmax=800" width="497" height="89" /&gt;&lt;/a&gt; &lt;/p&gt;    &lt;p&gt;In this case, the relationship between the week and the year forces the week to its All member when referencing [Date].[Year].currentmember.&lt;/p&gt;    &lt;p&gt;If you cannot create such a relationship, you have to force the week to the All member manually in the calculation:&lt;/p&gt;    &lt;p&gt;&lt;font face="Courier New"&gt;CREATE MEMBER CURRENTCUBE.[Measures].QuantityFullYear        &lt;br /&gt;AS ([Measures].[Quantity], [Date].[Year].currentmember, &lt;strong&gt;&lt;font color="#ff0000"&gt;[Date].[Week].[All]&lt;/font&gt;&lt;/strong&gt;),         &lt;br /&gt;VISIBLE = 1 , ASSOCIATED_MEASURE_GROUP = 'Sales' ;&lt;/font&gt; &lt;/p&gt;    &lt;p&gt;Both solutions give the desired result on every hierarchy level of the date dimension:&lt;/p&gt;    &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-WP-lkljBYxg/Th3E7EEEUHI/AAAAAAAACYg/-gRzh4Qbqoc/s1600-h/image_thumb24%25255B6%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb24" border="0" alt="image_thumb24" src="http://lh4.ggpht.com/-dr3qD9qJCyk/Th3E729fgKI/AAAAAAAACYk/5FbdFmq3nrI/image_thumb24_thumb%25255B1%25255D.png?imgmax=800" width="316" height="216" 
/&gt;&lt;/a&gt;&lt;/p&gt;&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4879885044062498968-3976626523379037033?l=ms-olap.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ms-olap.blogspot.com/feeds/3976626523379037033/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://ms-olap.blogspot.com/2010/03/effects-of-attribute-relationship.html#comment-form' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/3976626523379037033'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/3976626523379037033'/><link rel='alternate' type='text/html' href='http://ms-olap.blogspot.com/2010/03/effects-of-attribute-relationship.html' title='Effects of attribute relationship settings on calculations'/><author><name>Hilmar Buchta</name><uri>http://www.blogger.com/profile/07497529483535542290</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://2.bp.blogspot.com/_rzeWIeWVsxA/ScDkLmopJCI/AAAAAAAABAg/sdlcn6fz96M/S220/Hilmar+20090307_Klein.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh5.ggpht.com/-F-WdxSasnq8/Th3EvL7fMII/AAAAAAAACXU/0WUNKJUr0Qc/s72-c/image_thumb2_thumb%25255B1%25255D.png?imgmax=800' height='72' width='72'/><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4879885044062498968.post-4700316317174143986</id><published>2010-03-13T10:02:00.001+01:00</published><updated>2011-07-14T09:15:12.363+02:00</updated><title type='text'>Data quality in BI projects</title><content type='html'>&lt;p align="right"&gt;SQL Server 2005 | SQL Server 2008 | SQL Server 2008 R2&lt;/p&gt;  &lt;p&gt;There is a new testing suite for BI 
solutions named BI.Quality available at CodePlex: &lt;a href="http://biquality.codeplex.com/"&gt;http://biquality.codeplex.com/&lt;/a&gt;. The suite is based on NUnit and supports quite a lot of different testing methods. What I like most is the ability to automatically compare result sets from different sources, for example SQL and MDX. This allows you to define test scenarios your system must meet after the OLAP cube has been loaded. Or think of migration projects where you can still run queries against the old system and compare the results with the new system. As everything can be automated, tests can be run on a regular basis (for example directly after every cube processing) to make sure that all facts are really complete and correct.&lt;/p&gt;  &lt;p&gt;Here are some screenshots:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh4.ggpht.com/-P4PbwQmZZ1g/Th6XdOHN9iI/AAAAAAAACYo/8de0VKJ_RcY/s1600-h/Screen1_thumb%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="Screen1_thumb" border="0" alt="Screen1_thumb" src="http://lh4.ggpht.com/-7i4wIQcoktQ/Th6XeAUFaOI/AAAAAAAACYs/OX5WxQ53b-o/Screen1_thumb_thumb.png?imgmax=800" width="244" height="133" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh4.ggpht.com/-aBIETJ3y4vE/Th6Xe4LyKYI/AAAAAAAACYw/T8yh8rDpfH8/s1600-h/Output_thumb%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="Output_thumb" border="0" alt="Output_thumb" src="http://lh4.ggpht.com/-FnIp5K59yc8/Th6Xf0_uqVI/AAAAAAAACY0/b63yuU4q8sQ/Output_thumb_thumb.png?imgmax=800" width="244" height="232" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;More info can be found at &lt;a 
href="http://biquality.codeplex.com/"&gt;http://biquality.codeplex.com/&lt;/a&gt;.&lt;/p&gt;  &lt;p&gt;Disclaimer: Although it is an open source project the current development is done by ORAYLIS (my employer).&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4879885044062498968-4700316317174143986?l=ms-olap.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ms-olap.blogspot.com/feeds/4700316317174143986/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://ms-olap.blogspot.com/2010/03/data-quality-in-bi-projects.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/4700316317174143986'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/4700316317174143986'/><link rel='alternate' type='text/html' href='http://ms-olap.blogspot.com/2010/03/data-quality-in-bi-projects.html' title='Data quality in BI projects'/><author><name>Hilmar Buchta</name><uri>http://www.blogger.com/profile/07497529483535542290</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://2.bp.blogspot.com/_rzeWIeWVsxA/ScDkLmopJCI/AAAAAAAABAg/sdlcn6fz96M/S220/Hilmar+20090307_Klein.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh4.ggpht.com/-7i4wIQcoktQ/Th6XeAUFaOI/AAAAAAAACYs/OX5WxQ53b-o/s72-c/Screen1_thumb_thumb.png?imgmax=800' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4879885044062498968.post-1502439202151076591</id><published>2010-02-14T17:06:00.002+01:00</published><updated>2011-07-14T09:17:58.897+02:00</updated><title type='text'>Solution for SSAS 2008 multi-selects in Excel (Dynamic 
Sets)</title><content type='html'>&lt;p align="right"&gt;SQL Server 2008&lt;/p&gt;  &lt;p align="left"&gt;Being able to select multiple dimension elements in an MDX client requires some care in designing calculations for the cube. However, with SQL Server 2005 it was really difficult to design calculations in a way that works well with Excel 2007. The reason is that Excel 2007 uses sub cubes for filtering, and sets in SQL Server 2005 did not reflect sub cubes. So this article is about multi-select friendly queries in SSAS 2008 using dynamic sets.&lt;/p&gt;  &lt;p align="left"&gt;There have been some posts about the problems around multi-selects. Of course you cannot use a CurrentMember reference in your calculation (if a set is in the where condition, there is no single current member), so you have to use sets in most cases. However, sets do not react to where conditions in an MDX statement by default either. You have to add the EXISTING keyword to get the desired result. To illustrate this, we’ll start with a very simple calculated measure. Our measure should just return the number of days selected by the filter on our date dimension. 
I’ll use the good old Adventure Works example database for my tests here.&lt;/p&gt;  &lt;p&gt;&lt;span style="font-family: courier new"&gt;with &lt;/span&gt;&lt;/p&gt;  &lt;p&gt;&lt;span style="font-family: courier new"&gt;MEMBER CountDays_TooSimple     &lt;br /&gt;AS DrillDownLevel([Date].[Calendar].[Date], [Date].[Calendar].[Date]).count &lt;/span&gt;&lt;/p&gt;  &lt;p&gt;&lt;span style="font-family: courier new"&gt;MEMBER CountDays_Using_Existing     &lt;br /&gt;AS DrillDownLevel(existing [Date].[Calendar].[Date], [Date].[Calendar].[Date]).count &lt;/span&gt;&lt;/p&gt;  &lt;p&gt;&lt;span style="font-family: courier new"&gt;select {CountDays_TooSimple,CountDays_Using_Existing} on 0 &lt;/span&gt;&lt;/p&gt;  &lt;p&gt;&lt;span style="font-family: courier new"&gt;from [Adventure Works] &lt;/span&gt;&lt;/p&gt;  &lt;p&gt;&lt;span style="font-family: courier new"&gt;where [Date].[Calendar].[Calendar Year].&amp;amp;[2003]&lt;/span&gt;&lt;/p&gt;  &lt;p&gt;The output looks like this:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-R26GFvjcDQg/Th6YBGfngpI/AAAAAAAACY4/_dhdvgMnl9k/s1600-h/image_thumb9%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb9" border="0" alt="image_thumb9" src="http://lh5.ggpht.com/-vqr7daqxiw0/Th6YCICaENI/AAAAAAAACY8/qaCrkyuP7VI/image_thumb9_thumb.png?imgmax=800" width="272" height="42" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;As you can clearly see, the set expression without the EXISTING keyword (used for the measure CountDays_TooSimple) did not react to the filter (resulting in all 1188 days that are stored in the Adventure Works database), while the calculation with EXISTING (used in the measure CountDays_Using_Existing) did. 
OK, this has nothing to do with multi-selects (as we only selected a single year), but it illustrates how we can use EXISTING to adjust sets to the given scope. The behavior is basically the same with multi-selects. So, if we change the where condition to January and February, we also get the right result for the measure that uses the EXISTING method:&lt;/p&gt;  &lt;p&gt;&lt;span style="font-family: courier new"&gt;…     &lt;br /&gt;where {[Date].[Calendar].[Month].&amp;amp;[2003]&amp;amp;[1],[Date].[Calendar].[Month].&amp;amp;[2003]&amp;amp;[2] }      &lt;br /&gt;&lt;/span&gt;&lt;span style="font-family: courier new"&gt;…&lt;/span&gt;&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh4.ggpht.com/-YoiURc_Xj58/Th6YCytKcyI/AAAAAAAACZA/_LNyAawmcJQ/s1600-h/image_thumb8%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb8" border="0" alt="image_thumb8" src="http://lh5.ggpht.com/-oFsAEpkN5G8/Th6YD97TE-I/AAAAAAAACZE/5AuhhuLjfvI/image_thumb8_thumb.png?imgmax=800" width="273" height="43" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;However, and this is where things get complicated, our measures still depend on how we implemented our filter. 
If we turn the filter into a sub cube, then the result looks different:&lt;/p&gt;  &lt;p&gt;&lt;span style="font-family: courier new"&gt;with &lt;/span&gt;&lt;/p&gt;  &lt;p&gt;&lt;span style="font-family: courier new"&gt;MEMBER CountDays_TooSimple     &lt;br /&gt;AS DrillDownLevel([Date].[Calendar].[Date], [Date].[Calendar].[Date]).count &lt;/span&gt;&lt;/p&gt;  &lt;p&gt;&lt;span style="font-family: courier new"&gt;MEMBER CountDays_Using_Existing     &lt;br /&gt;AS DrillDownLevel(existing [Date].[Calendar].[Date], [Date].[Calendar].[Date]).count &lt;/span&gt;&lt;/p&gt;  &lt;p&gt;&lt;span style="font-family: courier new"&gt;select {CountDays_TooSimple,CountDays_Using_Existing} on 0 &lt;/span&gt;&lt;/p&gt;  &lt;p&gt;&lt;span style="font-family: courier new"&gt;&lt;strong&gt;from (       &lt;br /&gt;select {[Date].[Calendar].[Month].&amp;amp;[2003]&amp;amp;[1],[Date].[Calendar].[Month].&amp;amp;[2003]&amp;amp;[2] } on 0        &lt;br /&gt;from [Adventure Works]        &lt;br /&gt;)&lt;/strong&gt;&lt;/span&gt;&lt;/p&gt;  &lt;p&gt;Now, even our calculation that used the EXISTING keyword to filter the set does not work anymore:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh5.ggpht.com/-mQmbbO3hCY4/Th6YEYl15_I/AAAAAAAACZI/4l5YnXBXcQA/s1600-h/image_thumb7%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb7" border="0" alt="image_thumb7" src="http://lh5.ggpht.com/-bzRoh_Qp6qc/Th6YFad1IFI/AAAAAAAACZM/9H0LL7AnY7Q/image_thumb7_thumb.png?imgmax=800" width="272" height="43" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;Since Excel 2007 uses sub cubes to filter queries, this behavior is exactly how Excel would return this calculation (see below for the complete example in Excel).&lt;/p&gt;  &lt;p&gt;So, how can this be solved? 
In SQL Server 2008 there is a new feature called dynamic named sets (or dynamic sets). The good thing about dynamic sets is that they also react to sub cube filters. So let’s look at the following calculation of the same measure:&lt;/p&gt;  &lt;p&gt;&lt;span style="font-family: courier new"&gt;with &lt;/span&gt;&lt;/p&gt;  &lt;p&gt;&lt;span style="font-family: courier new"&gt;&lt;strong&gt;DYNAMIC SET [CountDaysDynaSetDays] AS [Date].[Calendar].[Date]&lt;/strong&gt;&lt;/span&gt;&lt;/p&gt;  &lt;p&gt;&lt;span style="font-family: courier new"&gt;&lt;strong&gt;MEMBER CountDays_Dynamic       &lt;br /&gt;AS DrillDownLevel(CountDaysDynaSetDays, [Date].[Calendar].[Date]).count&lt;/strong&gt; &lt;/span&gt;&lt;/p&gt;  &lt;p&gt;&lt;span style="font-family: courier new"&gt;select {CountDays_Dynamic} on 0 &lt;/span&gt;&lt;/p&gt;  &lt;p&gt;&lt;span style="font-family: courier new"&gt;from (     &lt;br /&gt;select {[Date].[Calendar].[Month].&amp;amp;[2003]&amp;amp;[1],[Date].[Calendar].[Month].&amp;amp;[2003]&amp;amp;[2] } on 0      &lt;br /&gt;from [Adventure Works]      &lt;br /&gt;)&lt;/span&gt;&lt;/p&gt;      &lt;p&gt;At first glance, nothing has really changed here. We simply declare the reference to our set (a level here), [Date].[Calendar].[Date], as a dynamic set and use the set name in the calculation instead. 
For the result, I also included the two other measures from above in order to show the difference:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh4.ggpht.com/-zydmdIw-QVk/Th6YFz2eoHI/AAAAAAAACZQ/jmIKbLiM8QY/s1600-h/image_thumb6%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb6" border="0" alt="image_thumb6" src="http://lh4.ggpht.com/-ys_343zYqKo/Th6YGsOvhTI/AAAAAAAACZU/FdjVAW0BQoY/image_thumb6_thumb.png?imgmax=800" width="387" height="43" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;This shows how dynamic sets can be used to react to multi-selects provided as a sub cube. I didn’t show it here, but our calculation using the dynamic set (CountDays_Dynamic) also works fine with a simple where clause.&lt;/p&gt;  &lt;p&gt;In order to check if this calculation also works in Excel, let’s transfer all the measures above to a cube calculation. 
We simply add them at the end of the Adventure Works cube script:&lt;/p&gt;  &lt;p&gt;&lt;span style="font-family: courier new"&gt;CREATE MEMBER CURRENTCUBE.CountDays_TooSimple     &lt;br /&gt;AS DrillDownLevel([Date].[Calendar].[Date], [Date].[Calendar].[Date]).count ,      &lt;br /&gt;VISIBLE = 1 ; &lt;/span&gt;&lt;/p&gt;  &lt;p&gt;&lt;span style="font-family: courier new"&gt;CREATE MEMBER CURRENTCUBE.CountDays_Using_Existing     &lt;br /&gt;AS DrillDownLevel(existing [Date].[Calendar].[Date], [Date].[Calendar].[Date]).count ,      &lt;br /&gt;VISIBLE = 1 ; &lt;/span&gt;&lt;/p&gt;  &lt;p&gt;&lt;span style="font-family: courier new"&gt;CREATE DYNAMIC SET CURRENTCUBE.[CountDaysDynaSetDays] AS [Date].[Calendar].[Date] ;     &lt;br /&gt;CREATE MEMBER CURRENTCUBE.CountDays_Dynamic      &lt;br /&gt;AS DrillDownLevel(CountDaysDynaSetDays, [Date].[Calendar].[Date]).count ,      &lt;br /&gt;VISIBLE = 1 ; &lt;/span&gt;&lt;/p&gt;  &lt;p&gt;In the cube browser, everything looks as expected. For the following screenshot I again selected January and February 2003. 
Please note that the calculation using the EXISTING keyword also works fine here:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh3.ggpht.com/-OL5TG9g1rGk/Th6YHPoc-VI/AAAAAAAACZY/vPO3j7oUvfw/s1600-h/image_thumb11%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb11" border="0" alt="image_thumb11" src="http://lh4.ggpht.com/-5WfDu-wrqIw/Th6YH33CBAI/AAAAAAAACZc/O3F3gYFJrt8/image_thumb11_thumb.png?imgmax=800" width="367" height="76" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;Now let’s open Excel and try there:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh4.ggpht.com/-LxsH3uG04sM/Th6YIwoKBWI/AAAAAAAACZg/JL70ygoLUzA/s1600-h/image_thumb13%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb13" border="0" alt="image_thumb13" src="http://lh5.ggpht.com/-pIEjuyutAfo/Th6YJfudHhI/AAAAAAAACZk/PqNNJTpGLjI/image_thumb13_thumb.png?imgmax=800" width="461" height="105" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;As Excel uses the sub cube method, our second measure (CountDays_Using_Existing) fails here (giving the total number of days). 
But the calculation using the dynamic set still works fine in all scenarios.&lt;/p&gt;  &lt;p&gt;So, if you’re facing problems with multi selects you should also think of dynamic sets as one possible option to circumvent the problems.&lt;/p&gt;  &lt;p&gt;Also, dynamic named sets can also be a solution for performance optimizations (see &lt;a href="http://sqlblog.com/blogs/mosha/archive/2007/08/25/mdx-in-katmai-dynamic-named-sets.aspx"&gt;Mosha's blog&lt;/a&gt; for example).&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4879885044062498968-1502439202151076591?l=ms-olap.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ms-olap.blogspot.com/feeds/1502439202151076591/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://ms-olap.blogspot.com/2010/02/solution-von-ssas-2008-multi-selects-in.html#comment-form' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/1502439202151076591'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/1502439202151076591'/><link rel='alternate' type='text/html' href='http://ms-olap.blogspot.com/2010/02/solution-von-ssas-2008-multi-selects-in.html' title='Solution for SSAS 2008 multi-selects in Excel (Dynamic Sets)'/><author><name>Hilmar Buchta</name><uri>http://www.blogger.com/profile/07497529483535542290</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://2.bp.blogspot.com/_rzeWIeWVsxA/ScDkLmopJCI/AAAAAAAABAg/sdlcn6fz96M/S220/Hilmar+20090307_Klein.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh5.ggpht.com/-vqr7daqxiw0/Th6YCICaENI/AAAAAAAACY8/qaCrkyuP7VI/s72-c/image_thumb9_thumb.png?imgmax=800' 
height='72' width='72'/><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4879885044062498968.post-2401407368757666254</id><published>2010-01-30T10:22:00.001+01:00</published><updated>2011-07-14T09:19:40.231+02:00</updated><title type='text'>Do not set error configuration for KeyDuplicate to IgnoreError</title><content type='html'>&lt;p align="right"&gt;SQL Server 2005 | SQL Server 2008&lt;/p&gt;  &lt;p&gt;A few posts ago I wrote about the error message ‘&lt;a href="http://ms-olap.blogspot.com/2009/11/duplicate-attribute-key-has-been-found.html"&gt;A duplicate attribute key has been found when processing&lt;/a&gt;’. At the end of the post I suggested not setting the error configuration for KeyDuplicate to IgnoreError (unless you are in a prototyping scenario).&lt;/p&gt;  &lt;p&gt;Some people asked me about this (as it seems to be a nice and easy solution, similar to ‘on error resume next’ in VBA, which I also wouldn’t recommend). &lt;/p&gt;  &lt;p&gt;My main reason is that I definitely prefer getting a processing error to having wrong values in the cube, which could be the consequence.&lt;/p&gt;  &lt;p&gt;For example, let’s take a look at a date dimension:&lt;/p&gt;  &lt;table border="0" cellspacing="0" cellpadding="0"&gt;&lt;tbody&gt;     &lt;tr&gt;       &lt;td width="94"&gt;&lt;strong&gt;DateID&lt;/strong&gt;&lt;/td&gt;        &lt;td width="142"&gt;&lt;strong&gt;Month&lt;/strong&gt;&lt;/td&gt;        &lt;td width="142"&gt;&lt;strong&gt;Year&lt;/strong&gt;&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td&gt;1&lt;/td&gt;        &lt;td&gt;January&lt;/td&gt;        &lt;td&gt;2008&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td&gt;2&lt;/td&gt;        &lt;td&gt;February&lt;/td&gt;        &lt;td&gt;2008&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td&gt;3&lt;/td&gt;        &lt;td&gt;March&lt;/td&gt;        &lt;td&gt;2008&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td&gt;4&lt;/td&gt;        
&lt;td&gt;April&lt;/td&gt;        &lt;td&gt;2008&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td&gt;5&lt;/td&gt;        &lt;td&gt;May&lt;/td&gt;        &lt;td&gt;2008&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td&gt;6&lt;/td&gt;        &lt;td&gt;June&lt;/td&gt;        &lt;td&gt;2008&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td&gt;7&lt;/td&gt;        &lt;td&gt;July&lt;/td&gt;        &lt;td&gt;2008&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td&gt;8&lt;/td&gt;        &lt;td&gt;August&lt;/td&gt;        &lt;td&gt;2008&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td&gt;9&lt;/td&gt;        &lt;td&gt;September&lt;/td&gt;        &lt;td&gt;2008&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td&gt;10&lt;/td&gt;        &lt;td&gt;October&lt;/td&gt;        &lt;td&gt;2008&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td&gt;11&lt;/td&gt;        &lt;td&gt;November&lt;/td&gt;        &lt;td&gt;2008&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td&gt;12&lt;/td&gt;        &lt;td&gt;December&lt;/td&gt;        &lt;td&gt;2008&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td&gt;13&lt;/td&gt;        &lt;td&gt;January&lt;/td&gt;        &lt;td&gt;2009&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td&gt;14&lt;/td&gt;        &lt;td&gt;February&lt;/td&gt;        &lt;td&gt;2009&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td&gt;15&lt;/td&gt;        &lt;td&gt;March&lt;/td&gt;        &lt;td&gt;2009&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td&gt;16&lt;/td&gt;        &lt;td&gt;April&lt;/td&gt;        &lt;td&gt;2009&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td&gt;17&lt;/td&gt;        &lt;td&gt;May&lt;/td&gt;        &lt;td&gt;2009&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td&gt;18&lt;/td&gt;        &lt;td&gt;June&lt;/td&gt;        &lt;td&gt;2009&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td&gt;19&lt;/td&gt;        &lt;td&gt;July&lt;/td&gt;        &lt;td&gt;2009&lt;/td&gt; 
    &lt;/tr&gt;      &lt;tr&gt;       &lt;td&gt;…&lt;/td&gt;        &lt;td&gt;…&lt;/td&gt;        &lt;td&gt;…&lt;/td&gt;     &lt;/tr&gt;   &lt;/tbody&gt;&lt;/table&gt;  &lt;p&gt;In this simple dimension you can see that each month name appears more than once (once per year). This means that the following attribute relationship is &lt;strong&gt;wrong&lt;/strong&gt; (assuming that we used each of the columns above as the key for the respective attribute):&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh3.ggpht.com/-nP3mbx_qOMU/Th6YdCN4EQI/AAAAAAAACZo/1C1NJjpoUlA/s1600-h/image_thumb1%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb1" border="0" alt="image_thumb1" src="http://lh4.ggpht.com/-ioavOZtkrQ8/Th6Ydm6rTaI/AAAAAAAACZs/pydOqY9lLl4/image_thumb1_thumb.png?imgmax=800" width="277" height="40" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;Setting KeyDuplicate to ‘ReportAndStop’ causes SSAS to fail when processing the dimension, so we are instantly aware that something is wrong. 
Here is the setting in BIDS (it has to be set per dimension):&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-fH2eiFt2oXg/Th6YedNGk1I/AAAAAAAACZw/bN9D5WSTjcQ/s1600-h/image_thumb3%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb3" border="0" alt="image_thumb3" src="http://lh5.ggpht.com/-A0UrZ_Pu188/Th6Ye2mfjgI/AAAAAAAACZ0/Oafcoc8ay7U/image_thumb3_thumb.png?imgmax=800" width="342" height="268" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;And here is the error message if you do a ‘Process Full’ on the dimension:&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;Errors in the OLAP storage engine: A duplicate attribute key has been found when processing: Table: 'DimDate_x0024_', Column: 'Month', Value: 'April'. The attribute is 'Month'.&lt;/font&gt; &lt;/p&gt;  &lt;p&gt;With this error message you can fix the problem and get things right. However, if you set KeyDuplicate to ‘IgnoreError’ you will get no feedback from the system at all. 
However, if you take a look at the dimension you will find that each month is only associated with one of the years (in my case 2008).&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh4.ggpht.com/-y2vquycv4qE/Th6Yfq_bHOI/AAAAAAAACZ4/w3Q8uqTLnYM/s1600-h/image_thumb6%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb6" border="0" alt="image_thumb6" src="http://lh5.ggpht.com/-qmYdH5b3vig/Th6Ygalx0vI/AAAAAAAACZ8/fBzrFw_f43o/image_thumb6_thumb.png?imgmax=800" width="156" height="324" /&gt;&lt;/a&gt; &lt;/p&gt;      &lt;p&gt;A user might not realize this and although I have fact values for all years, my cube now only shows values for 2008:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-lUBSDl3VokM/Th6Yg2-kymI/AAAAAAAACaA/rtlc1gQEQFY/s1600-h/image_thumb5%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb5" border="0" alt="image_thumb5" src="http://lh5.ggpht.com/-1iRjEJzGQS0/Th6YhvsQ40I/AAAAAAAACaE/B7koSxTCKeg/image_thumb5_thumb.png?imgmax=800" width="185" height="78" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;The correct values in my case would be as follows:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh5.ggpht.com/-nb3p-Ndlmpw/Th6YiFoS2WI/AAAAAAAACaI/aiyCiatL6Yk/s1600-h/image_thumb7%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb7" border="0" alt="image_thumb7" src="http://lh5.ggpht.com/-dzS7CpSnnmM/Th6Yi3PnYCI/AAAAAAAACaM/V_S7o0CnmEg/image_thumb7_thumb.png?imgmax=800" width="201" height="104" /&gt;&lt;/a&gt; &lt;/p&gt;  
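&lt;p&gt;A violation like this can also be looked for in the source data before the dimension is processed. The following is only a minimal Python sketch and not part of SSAS: the function name find_relationship_violations and the sample rows are hypothetical, and it assumes the dimension rows are available as dictionaries. It reports every child key (here Month) that maps to more than one parent key (here Year):&lt;/p&gt;

```python
from collections import defaultdict


def find_relationship_violations(rows, child, parent):
    """Return {child_key: sorted parent keys} for child keys with several parents."""
    parents_by_child = defaultdict(set)
    for row in rows:
        parents_by_child[row[child]].add(row[parent])
    # A valid attribute relationship gives every child key exactly one parent key.
    return {
        key: sorted(parents)
        for key, parents in parents_by_child.items()
        if len(parents) != 1
    }


# Hypothetical sample rows mirroring a few lines of the date table above.
dim_date = [
    {"DateID": 1, "Month": "January", "Year": 2008},
    {"DateID": 2, "Month": "February", "Year": 2008},
    {"DateID": 13, "Month": "January", "Year": 2009},
]

violations = find_relationship_violations(dim_date, "Month", "Year")
print(violations)
```

&lt;p&gt;For these sample rows, January is reported with both 2008 and 2009, which is exactly the situation that breaks the Month to Year relationship when the month name alone is used as the key.&lt;/p&gt;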
&lt;p&gt;So in this situation we would have totally wrong values in the cube without even getting an error message, and this is my main reason for setting KeyDuplicate to ‘ReportAndStop’. In my trivial example, the error would be easily detected by the users, but as cubes get more complex you might not notice &lt;em&gt;some&lt;/em&gt; missing and &lt;em&gt;some&lt;/em&gt; wrong values. Another source of an attribute relationship violation could be a modeling error in an SCD-2 dimension. If, for example, you relied on the attribute relationship between product and product group and forgot to model this properly although the data source keeps historic changes, then you might see the sales on the wrong product group. Again, with the ‘ReportAndStop’ option you would immediately know that there is an issue with your dimension.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4879885044062498968-2401407368757666254?l=ms-olap.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ms-olap.blogspot.com/feeds/2401407368757666254/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://ms-olap.blogspot.com/2010/01/do-not-set-error-configuration-for.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/2401407368757666254'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/2401407368757666254'/><link rel='alternate' type='text/html' href='http://ms-olap.blogspot.com/2010/01/do-not-set-error-configuration-for.html' title='Do not set error configuration for KeyDuplicate to IgnoreError'/><author><name>Hilmar Buchta</name><uri>http://www.blogger.com/profile/07497529483535542290</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://2.bp.blogspot.com/_rzeWIeWVsxA/ScDkLmopJCI/AAAAAAAABAg/sdlcn6fz96M/S220/Hilmar+20090307_Klein.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh4.ggpht.com/-ioavOZtkrQ8/Th6Ydm6rTaI/AAAAAAAACZs/pydOqY9lLl4/s72-c/image_thumb1_thumb.png?imgmax=800' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4879885044062498968.post-5352490735531218436</id><published>2010-01-10T12:56:00.001+01:00</published><updated>2011-07-14T09:23:02.425+02:00</updated><title type='text'>Different granularity in a single dimension</title><content type='html'>&lt;p align="right"&gt;SQL Server 2005 | SQL Server 2008&lt;/p&gt;  &lt;p&gt;   &lt;p&gt;Handling different granularity (for example actual and plan values) can get a little bit complicated. Of course there are standard methods, like splitting up the less granular data in order to meet the finer granularity. Or you could use a parent-child structure, as this allows you to store data at different levels in the tree-like structure. Or you could supply ‘unknown’ elements to map the less granular information.&lt;/p&gt;    &lt;p&gt;For this post I want to show a different approach. Usually, for each dimension we link all fact tables that refer to this dimension to the same key in the dimension table (the dimension’s primary key). However, the data source view also allows us to link facts to different key columns in the same dimension table. &lt;/p&gt;    &lt;p&gt;In my simple scenario I have a time dimension (called DimDate) and two fact tables: Order and Order Plan. The orders are on a daily basis while the order plan is on a monthly basis. 
We want to link both fact tables to the same time dimension as shown below:&lt;/p&gt;    &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-Gm29k3cVc3s/Th6ZCyigfjI/AAAAAAAACaQ/2aA-0E8YzR8/s1600-h/image_thumb11%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb11" border="0" alt="image_thumb11" src="http://lh4.ggpht.com/-6LUBdbyx2oo/Th6ZEmw_QaI/AAAAAAAACaU/RxDHxy2muvE/image_thumb11_thumb.png?imgmax=800" width="313" height="386" /&gt;&lt;/a&gt; &lt;/p&gt;    &lt;p&gt;The link from the order fact table to the time dimension DimDate (marked as 1 in the sketch above) is the usual link from the fact table to the primary key of the DimDate table. The time dimension is at daily granularity and so are the order facts. But for the plan value fact table FactOrderPlan, the link to the time dimension is realized by using two key columns: year (as the year number, e.g. 2008) and month (as the month number, e.g. 11 for November), so the link in the data source view looks like this:&lt;/p&gt;    &lt;p&gt;&lt;a href="http://lh4.ggpht.com/-cZeEz5FVEjA/Th6ZFY9EcBI/AAAAAAAACaY/q7brS1iL24Q/s1600-h/image1_thumb%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image1_thumb" border="0" alt="image1_thumb" src="http://lh6.ggpht.com/-nA_0GkuSHtY/Th6ZG1TQdhI/AAAAAAAACac/Y4eqSkbd0eo/image1_thumb_thumb.png?imgmax=800" width="570" height="351" /&gt;&lt;/a&gt; &lt;/p&gt;    &lt;p&gt;For our cube it is important to specify the right granularity attribute. 
While the order table is linked to the Day (granularity attribute), we link the order plan fact table to the month attribute and define the proper key mappings for that.&lt;/p&gt;    &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-oMKPt242J1s/Th6ZIfTCmVI/AAAAAAAACag/lf90MyjwQBM/s1600-h/image_thumb1%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb1" border="0" alt="image_thumb1" src="http://lh6.ggpht.com/-_eNU3ZO9igs/Th6ZI3y4oEI/AAAAAAAACak/LNcKOj8yW6g/image_thumb1_thumb.png?imgmax=800" width="690" height="402" /&gt;&lt;/a&gt; &lt;/p&gt;    &lt;p&gt;Now, the dimension usage looks like this:&lt;/p&gt;    &lt;p&gt;&lt;a href="http://lh4.ggpht.com/-rAtB-UNR4hs/Th6ZJslbDJI/AAAAAAAACao/5VTRQHr6ins/s1600-h/image_thumb3%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb3" border="0" alt="image_thumb3" src="http://lh6.ggpht.com/-4rjbG1G1vqs/Th6ZK6gM4EI/AAAAAAAACas/vsvuTkznWj0/image_thumb3_thumb.png?imgmax=800" width="483" height="133" /&gt;&lt;/a&gt; &lt;/p&gt;    &lt;p&gt;We also defined proper attribute relationship for the time dimension accordingly to build up a year-&amp;gt;quarter-&amp;gt;month-&amp;gt;day hierarchy.&lt;/p&gt;    &lt;p&gt;So let’s take the first look at the cube created by this method:&lt;/p&gt;    &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-F8fjdlmi1sM/Th6ZL8TsTII/AAAAAAAACaw/opXXjMpv5RM/s1600-h/image_thumb5%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb5" border="0" alt="image_thumb5" 
src="http://lh6.ggpht.com/-7JfKDci6l-o/Th6ZMSy2DTI/AAAAAAAACa0/KVvllR_5U0M/image_thumb5_thumb.png?imgmax=800" width="320" height="288" /&gt;&lt;/a&gt; &lt;/p&gt;    &lt;p&gt;This first look is as expected. As long as we are at a common granularity level shared by both fact tables, we can see the values correctly. Also, the aggregation of both fact sources works fine (although they are at different granularity).&lt;/p&gt;    &lt;p&gt;Now, let’s drill down to the day level which is not present in our planning data: &lt;/p&gt;    &lt;p&gt;&lt;a href="http://lh5.ggpht.com/-bpVroPXXb1g/Th6ZNBISi7I/AAAAAAAACa4/7lXBwTMhWQU/s1600-h/image_thumb7%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb7" border="0" alt="image_thumb7" src="http://lh5.ggpht.com/-gYmwqNUP6h4/Th6ZN2Tn6bI/AAAAAAAACa8/1XIeUw3xZEg/image_thumb7_thumb.png?imgmax=800" width="319" height="492" /&gt;&lt;/a&gt; &lt;/p&gt;    &lt;p&gt;The behavior here is exactly the same as for other unrelated dimensions! The value of the nearest matching hierarchy is taken for the levels below. 
This behavior sometimes confuses users, but we can change it by setting the measure group property IgnoreUnrelatedDimensions to false: &lt;/p&gt;    &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-320qXVF7nEI/Th6ZOs5p--I/AAAAAAAACbA/LqEgRpguI7Y/s1600-h/image_thumb12%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb12" border="0" alt="image_thumb12" src="http://lh5.ggpht.com/-BVEcct_P5lU/Th6ZPetB2eI/AAAAAAAACbE/ak0754w4Re4/image_thumb12_thumb.png?imgmax=800" width="368" height="228" /&gt;&lt;/a&gt; &lt;/p&gt;    &lt;p&gt;Now, the planning values below the month granularity level have disappeared:&lt;/p&gt;    &lt;p&gt;&lt;a href="http://lh3.ggpht.com/-Tj7JkK_7YGI/Th6ZP1ExYaI/AAAAAAAACbI/mBlirhqrmOE/s1600-h/image_thumb10%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb10" border="0" alt="image_thumb10" src="http://lh6.ggpht.com/-_qk6_Obz77w/Th6ZRBbbZJI/AAAAAAAACbM/SHkYUEzjfR4/image_thumb10_thumb.png?imgmax=800" width="318" height="450" /&gt;&lt;/a&gt; &lt;/p&gt;    &lt;p&gt;However, if you want to create a calculated measure that is also based on the planning values, be aware that these values no longer exist at the day level. 
For example, let’s define a calculated measure PlanFulFillment using the following expression:&lt;/p&gt;    &lt;p&gt;&lt;font face="Courier New"&gt;[Measures].[Amount] / [Measures].[Amount Plan]&lt;/font&gt;&lt;/p&gt;    &lt;p&gt;At the day level, the measure Amount Plan does not exist, so this results in computation errors:&lt;/p&gt;    &lt;p&gt;&lt;a href="http://lh5.ggpht.com/-8Se97fLlpbI/Th6ZRtWM3SI/AAAAAAAACbQ/F5fqybW5yVg/s1600-h/image_thumb14%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb14" border="0" alt="image_thumb14" src="http://lh6.ggpht.com/-fJ5E7O9YIWI/Th6ZSR1qZpI/AAAAAAAACbU/PQi8h-bNLPg/image_thumb14_thumb.png?imgmax=800" width="393" height="126" /&gt;&lt;/a&gt; &lt;/p&gt;    &lt;p&gt;You could still set the non-empty behavior of the calculated measure (to “Amount Plan”) to make these error values disappear. However, if you want to refer to the monthly value, you can simply use the ValidMeasure MDX function, which is always helpful in conjunction with IgnoreUnrelatedDimensions=false. 
So after defining our calculated measure as [Measures].[Amount] / &lt;strong&gt;ValidMeasure&lt;/strong&gt;([Measures].[Amount Plan]) the result looks like this (at day level, the monthly values of the planning data are taken):&lt;/p&gt;    &lt;p&gt;&lt;a href="http://lh4.ggpht.com/-aGW94CV5nag/Th6ZTe2MyuI/AAAAAAAACbY/R6OsHlS97dU/s1600-h/image_thumb16%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb16" border="0" alt="image_thumb16" src="http://lh5.ggpht.com/-Aw79520cnGU/Th6ZUCLmWEI/AAAAAAAACbc/Dc20OR_b2Fo/image_thumb16_thumb.png?imgmax=800" width="393" height="129" /&gt;&lt;/a&gt;&lt;/p&gt;    &lt;p&gt;So, surprisingly enough (at least for me), everything behaves exactly as we wanted, which makes the approach a viable alternative in some scenarios. Again, please check your attribute relationships carefully and spend some time testing the results, as the approach can get dangerous for more complicated attribute structures.&lt;/p&gt;    &lt;p&gt;I also checked this design with more attributes and parallel hierarchies in the time dimension (for example calendar week) and more fact tables (for example production plan) and the aggregation was still correct. Having IgnoreUnrelatedDimensions set to false is helpful here to clearly see which fact is selected at the right granularity level. 
&lt;/p&gt;    &lt;p&gt;Following is an example with three fact tables (one at day level, one at month level and one at week level):&lt;/p&gt;    &lt;p&gt;&lt;a href="http://lh3.ggpht.com/-WMfP84AxJhQ/Th6ZUp6qKcI/AAAAAAAACbg/0ToIJla55aA/s1600-h/image_thumb2%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb2" border="0" alt="image_thumb2" src="http://lh5.ggpht.com/-bj0vH13Q2xg/Th6ZVVeVXwI/AAAAAAAACbk/ocOAqfkxw3E/image_thumb2_thumb.png?imgmax=800" width="662" height="136" /&gt;&lt;/a&gt;&lt;/p&gt;&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4879885044062498968-5352490735531218436?l=ms-olap.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ms-olap.blogspot.com/feeds/5352490735531218436/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://ms-olap.blogspot.com/2010/01/different-granularity-in-single.html#comment-form' title='11 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/5352490735531218436'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/5352490735531218436'/><link rel='alternate' type='text/html' href='http://ms-olap.blogspot.com/2010/01/different-granularity-in-single.html' title='Different granularity in a single dimension'/><author><name>Hilmar Buchta</name><uri>http://www.blogger.com/profile/07497529483535542290</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='http://2.bp.blogspot.com/_rzeWIeWVsxA/ScDkLmopJCI/AAAAAAAABAg/sdlcn6fz96M/S220/Hilmar+20090307_Klein.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh4.ggpht.com/-6LUBdbyx2oo/Th6ZEmw_QaI/AAAAAAAACaU/RxDHxy2muvE/s72-c/image_thumb11_thumb.png?imgmax=800' height='72' width='72'/><thr:total>11</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4879885044062498968.post-520893218200048742</id><published>2009-12-21T15:21:00.001+01:00</published><updated>2011-07-14T09:25:15.825+02:00</updated><title type='text'>A different approach to modeling units</title><content type='html'>&lt;p align="right"&gt;SQL Server 2005 | SQL Server 2008&lt;/p&gt;  &lt;p&gt;   &lt;p&gt;While units are built-in functionality in some OLAP databases, in SSAS we need to take care of them ourselves. Usually I model units as a dimension so that the facts are linked to the unit they belong to. However, values in different units usually must not be aggregated (like adding meters to liters), and therefore we would set the IsAggregatable property to false. The user first has to choose a unit before the result is displayed. In some cases this may not be that clear. For (local) currency, for example, the values are aggregatable as long as the currency is consistent, which may or may not depend on the currency unit. As long as the selection (no matter how the filtering was done) leads to a single currency, it would be possible to display the result in local currency. 
Whenever there is more than one currency involved, the result cannot be shown.&lt;/p&gt;    &lt;p&gt;Taking the usual approach with the non-aggregatable unit dimension you may find a situation like below:&lt;/p&gt;    &lt;p&gt;&lt;a href="http://lh5.ggpht.com/-jhvKHEkjT1I/Th6ZrtDEXvI/AAAAAAAACbo/bZ5MkStbrxI/s1600-h/image_thumb%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb" border="0" alt="image_thumb" src="http://lh6.ggpht.com/-ijK9kOxxIfg/Th6ZsZlqNlI/AAAAAAAACbs/xwIh2SVuMmY/image_thumb_thumb.png?imgmax=800" width="207" height="134" /&gt;&lt;/a&gt; &lt;/p&gt;    &lt;p&gt;For this example we have stores aggregated to some kind of store group with sales. As the currency is a non-aggregatable dimension you don’t see totals for the rows (just for the columns). Assuming we set our default member for the currency unit dimension to some currency not included (or to unknown like I did), you don’t see any value at all, if the currency unit is not included in the query:&lt;/p&gt;    &lt;p&gt;&lt;a href="http://lh3.ggpht.com/-2X_6Ztg8v0s/Th6ZtD1sowI/AAAAAAAACbw/whzWnpicZ8Y/s1600-h/image_thumb1%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb1" border="0" alt="image_thumb1" src="http://lh6.ggpht.com/-4I7wN5NCprM/Th6Ztui9hZI/AAAAAAAACb0/sf4geMBZjNU/image_thumb1_thumb.png?imgmax=800" width="108" height="37" /&gt;&lt;/a&gt; &lt;/p&gt;    &lt;p&gt;So everything’s fine with this approach. 
But usually we don’t analyze by currency and if we simply put it on a filter, we might miss values of the other currencies like in the following screenshot of the same sample cube (the value for ‘Other’ gives no hint that there might be other sales here too):&lt;/p&gt;    &lt;p&gt;&lt;a href="http://lh4.ggpht.com/-OYMChfmb814/Th6ZufUUgYI/AAAAAAAACb4/dTPmY-Fuk8Y/s1600-h/image_thumb2%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb2" border="0" alt="image_thumb2" src="http://lh3.ggpht.com/-wsn6dRS6cz4/Th6ZvMnwVMI/AAAAAAAACb8/WWWqeqxgP34/image_thumb2_thumb.png?imgmax=800" width="134" height="108" /&gt;&lt;/a&gt; &lt;/p&gt;    &lt;p&gt;So the idea could be to tell the cube to display the currency values as long as the displayed cell contains only one currency unit. For other cells we can only display a warning. 
The following screenshot shows the final result:&lt;/p&gt;    &lt;p&gt;&lt;a href="http://lh5.ggpht.com/-JFklX_a3OWE/Th6ZvmIIZHI/AAAAAAAACcA/zgVLLvVB-2E/s1600-h/image_thumb3%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb3" border="0" alt="image_thumb3" src="http://lh6.ggpht.com/-PSHeYmhlLvo/Th6ZwDa_NbI/AAAAAAAACcE/nmn6B4Xj0I0/image_thumb3_thumb.png?imgmax=800" width="142" height="109" /&gt;&lt;/a&gt; &lt;/p&gt;    &lt;p&gt;Please note that although the currency unit is not included in this pivot table on either an axis or the filter, the cube knows that certain cells contain only EUR values and can therefore be aggregated, while other cells consist of more than one local currency and therefore cannot be aggregated.&lt;/p&gt;    &lt;p&gt;To explain how this can be done, let’s first look at the model of our sample cube:&lt;/p&gt;    &lt;p&gt;&lt;a href="http://lh5.ggpht.com/-PgzcX_Oo68g/Th6Zw4drTaI/AAAAAAAACcI/nIos2vhp9Bs/s1600-h/image_thumb5%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb5" border="0" alt="image_thumb5" src="http://lh6.ggpht.com/-0P1fNhRuXq0/Th6ZxRbeWRI/AAAAAAAACcM/q_MCShfoWqc/image_thumb5_thumb.png?imgmax=800" width="513" height="276" /&gt;&lt;/a&gt; &lt;/p&gt;    &lt;p&gt;The currency unit is linked to the fact table using its primary key UnitID. For the cube, we include this key as a measure… hey, this sounds weird, why should we use the surrogate key as a measure?? 
Well, we even use it twice, once aggregated by the Min function and once aggregated by the Max function.&lt;/p&gt;    &lt;p&gt;&lt;a href="http://lh5.ggpht.com/-4DYnrUaZrec/Th6Zx58f4KI/AAAAAAAACcQ/cJF3qpjc9Xg/s1600-h/image_thumb6%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb6" border="0" alt="image_thumb6" src="http://lh5.ggpht.com/-rlo1EX7hOnk/Th6Zy-GjR-I/AAAAAAAACcU/6csRoEAFZs8/image_thumb6_thumb.png?imgmax=800" width="167" height="66" /&gt;&lt;/a&gt; &lt;/p&gt;    &lt;p&gt;Now, for any cell where “Minimum CurrencyID” equals “Maximum CurrencyID”, we can be sure that the cell contains only one currency unit. This can be used in a calculated member. I set the visibility of the original measure Amount to hidden and add a calculated member like this:&lt;/p&gt;    &lt;p&gt;&lt;font face="Courier New"&gt;CREATE MEMBER CURRENTCUBE.[Measures].Amount        &lt;br /&gt;AS iif([Measures].[Minimum CurrencyID]=[Measures].[Maximum CurrencyID],[Measures].[AmountLocal],NULL),         &lt;br /&gt;FORMAT_STRING = iif([Measures].[Minimum CurrencyID]=[Measures].[Maximum CurrencyID],         &lt;br /&gt;strtomember(&amp;quot;[Currency].[Unit].&amp;amp;[&amp;quot;+CStr([Measures].[Minimum CurrencyID])+&amp;quot;.]&amp;quot;).properties(&amp;quot;Format String&amp;quot;),         &lt;br /&gt;&amp;quot;&amp;quot;),         &lt;br /&gt;VISIBLE = 1 , ASSOCIATED_MEASURE_GROUP = 'Fact Sale' ; &lt;/font&gt;&lt;/p&gt;    &lt;p&gt;As you can see, we also take the format string (e.g. “$”#,##0.00) from the currency table to display each currency properly.&lt;/p&gt;    &lt;p&gt;For the above screenshot I modified this code a little bit and returned the value 0 instead of NULL. I also used the text “multiple units” as the default format string for this value 0. 
I prefer the NULL value, though, as it is “more” accurate.&lt;/p&gt;    &lt;p&gt;As long as you’re using Excel 2007 compliant format strings (as described in &lt;a href="http://ms-olap.blogspot.com/2009/11/how-to-define-excel-compliant-format.html"&gt;one of my previous posts&lt;/a&gt;), everything should display properly in Excel too, as shown in the following screenshot:&lt;/p&gt;    &lt;p&gt;&lt;a href="http://lh3.ggpht.com/-ZB_daWgvqLY/Th6Zz-YlugI/AAAAAAAACcY/ZXXNX7-qd3g/s1600-h/image_thumb11%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb11" border="0" alt="image_thumb11" src="http://lh6.ggpht.com/-T5aojrDq9xs/Th6Z0aUifiI/AAAAAAAACcc/BKdPseGXW3U/image_thumb11_thumb.png?imgmax=800" width="216" height="165" /&gt;&lt;/a&gt; &lt;/p&gt;    &lt;p&gt;Other clients can also leverage the formatting defined in our cube. For example, in Reporting Services you could use the above formats if you refer to the FORMATTED_VALUE property instead of the VALUE property. 
So instead of&lt;/p&gt;    &lt;p&gt;&lt;font face="Courier New"&gt;=Fields!Amount.Value&lt;/font&gt;&lt;/p&gt;    &lt;p&gt;we would use:&lt;/p&gt;    &lt;p&gt;&lt;font face="Courier New"&gt;=Fields!Amount.FormattedValue&lt;/font&gt;&lt;/p&gt;    &lt;p&gt;Here’s a simple report based on the sample cube:&lt;/p&gt;    &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-kEvHdHZoaMo/Th6Z1DjX4tI/AAAAAAAACcg/7hsSTHjOBeM/s1600-h/image_thumb8%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb8" border="0" alt="image_thumb8" src="http://lh5.ggpht.com/-UqaBWh4mE7A/Th6Z2Bj6bDI/AAAAAAAACck/wuzkVauBONQ/image_thumb8_thumb.png?imgmax=800" width="404" height="236" /&gt;&lt;/a&gt;&lt;/p&gt;&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4879885044062498968-520893218200048742?l=ms-olap.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ms-olap.blogspot.com/feeds/520893218200048742/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://ms-olap.blogspot.com/2009/12/different-approach-to-modeling-units.html#comment-form' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/520893218200048742'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4879885044062498968/posts/default/520893218200048742'/><link rel='alternate' type='text/html' href='http://ms-olap.blogspot.com/2009/12/different-approach-to-modeling-units.html' title='A different approach to modeling units'/><author><name>Hilmar Buchta</name><uri>http://www.blogger.com/profile/07497529483535542290</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://2.bp.blogspot.com/_rzeWIeWVsxA/ScDkLmopJCI/AAAAAAAABAg/sdlcn6fz96M/S220/Hilmar+20090307_Klein.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh6.ggpht.com/-ijK9kOxxIfg/Th6ZsZlqNlI/AAAAAAAACbs/xwIh2SVuMmY/s72-c/image_thumb_thumb.png?imgmax=800' height='72' width='72'/><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4879885044062498968.post-5937255148751675272</id><published>2009-11-29T12:04:00.001+01:00</published><updated>2011-07-14T09:32:17.918+02:00</updated><title type='text'>A duplicate attribute key has been found when processing…</title><content type='html'>&lt;p align="right"&gt;SQL Server 2005 | SQL Server 2008&lt;/p&gt;  &lt;p&gt;This post is about a common error message during dimension processing I’ve been asked about quite a few times so I thought it would be worth posting about it. The error message says that a duplicate attribute key has been found when processing as shown in the following screenshot for a test cube (I just processed one dimension here):&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh3.ggpht.com/-gF6zsL0Ml3A/Th6bV_xpt3I/AAAAAAAACco/QCcDUxEq-3o/s1600-h/image_thumb3%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb3" border="0" alt="image_thumb3" src="http://lh4.ggpht.com/-MIAoTKobEzk/Th6bWn6ba_I/AAAAAAAACcs/wbiVSfIoxG4/image_thumb3_thumb.png?imgmax=800" width="654" height="522" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;Here’s the full error message:&lt;/p&gt;  &lt;p&gt;&lt;font face="Courier New"&gt;&lt;strong&gt;Errors in the OLAP storage engine: A duplicate attribute key has been found when processing: Table: 'dbo_Product', Column: 'ProductGroup', Value: ''. 
The attribute is 'Product Group'.&lt;/strong&gt;&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;If you came to this article because you just ran into this problem, you probably don’t want to read much about the background but only want a solution. Unfortunately, I found at least three possible reasons for this error message:&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;u&gt;Reason 1 (likely)&lt;/u&gt;&lt;/strong&gt;: The most likely reason for this error is that you have NULL values in your attribute key column. If you simply created the attribute by dragging it from the source view, BIDS only sets the key column (name and value column default to the key column in this case), so for example if you have a column ‘Product Group’ in your source table and drag it to your dimension, the product group (text field) will automatically become the key for this attribute. The attribute is listed in the error message (in the example above it is ‘Product Group’).&lt;/p&gt;  &lt;p&gt;&lt;em&gt;Solution: Try to avoid those NULL values in your data source (for example by using a DSV query and the T-SQL coalesce function). When your source data is a data warehouse, it’s also good practice to avoid NULL values, as they complicate the queries to the data warehouse.&lt;/em&gt;&lt;/p&gt;  &lt;p&gt;&lt;u&gt;&lt;strong&gt;Reason 2 (likely)&lt;/strong&gt;&lt;/u&gt;: You defined an attribute relationship between two attributes of the dimension, but the data in your source tables violates the relationship. The error message gives you the name of the conflicting attribute (text part ‘The attribute is…’). The attribute has a relationship to another attribute, but for the value stated in the error message (‘Value: …’) there are at least two different values in the attribute that the relationship refers to. 
If you have &lt;a href="http://www.codeplex.com/bidshelper"&gt;BIDS Helper&lt;/a&gt; installed, you can also see the error details and all violating references when using the ‘Dimension Health Check’ function. &lt;/p&gt;  &lt;p&gt;&lt;em&gt;Solution: You may solve the error by making the key of the attribute unique. For example:      &lt;br /&gt;&lt;/em&gt;&lt;em&gt;Errors in the OLAP storage engine: A duplicate attribute key has been found when processing: Table: 'DimDate_x0024_', Column: 'Month', Value: 'April'. The attribute is 'Month'.      &lt;br /&gt;&lt;/em&gt;&lt;em&gt;In this example, the Month attribute violates an attribute relationship (maybe Month-&amp;gt;Year) for the month April, meaning that April appears for more than one year. By adding the year to the key of the Month attribute you would make the relationship unique again.&lt;/em&gt;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;u&gt;Reason 3 (not that likely)&lt;/u&gt;&lt;/strong&gt;: You have an attribute with separate key and name source fields. When you check the data, you see that keys appear more than once with different entries in their name columns (note that it’s not a problem if the key appears more than once as long as the name column is the same). In this case you will usually also see the key value in the error message, for example:     &lt;br /&gt;Errors in the OLAP storage engine: A duplicate attribute key has been found when processing: Table: 'dbo_Product2', Column: 'ProductCode', &lt;u&gt;Value: '1'&lt;/u&gt;. The attribute is 'Product Name'.     
&lt;br /&gt;This means that the attribute ‘Product Name’ uses the source column ‘ProductCode’ as the key and for the product code 1 there is more than one name.&lt;/p&gt;  &lt;p&gt;&lt;em&gt;Solution: Use a unique key column (unique with respect to the name column)&lt;/em&gt;&lt;/p&gt;    &lt;p&gt;&lt;strong&gt;&lt;u&gt;Long explanation Reason 1&lt;/u&gt;&lt;/strong&gt;: &lt;/p&gt;  &lt;p&gt;In this case our attribute is defined by only one source column (acting as key, name and value information) from the data source view. When processing a dimension, SSAS runs select distinct queries on the underlying source table, so a duplicated key should be impossible even if the key appears multiple times. Just think of a date dimension like the following one (just for years and months):&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-DD0snbsd59Q/Th6bXfOi6EI/AAAAAAAACcw/Hp3IdnS0imE/s1600-h/image_thumb15%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb15" border="0" alt="image_thumb15" src="http://lh4.ggpht.com/-R7Dxejk1GP8/Th6bYDjhS9I/AAAAAAAACc0/rT2MS8bYDLo/image_thumb15_thumb.png?imgmax=800" width="172" height="250" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;In this case the year (2009) appears in multiple rows. However, defining an attribute year (using the year column as the key) does not give a conflict, as it is queried using a distinct query (so 2009 only appears once). So again, how could we get a duplicate result when using a select distinct query? 
Here is what my product table looked like:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh4.ggpht.com/-yxFOKFU-ink/Th6bYyPG19I/AAAAAAAACc4/kK-s1KYA8LM/s1600-h/image_thumb16%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb16" border="0" alt="image_thumb16" src="http://lh3.ggpht.com/-Y-BQeXC-_TY/Th6bZVZ7tbI/AAAAAAAACc8/GZg55nyIkgc/image_thumb16_thumb.png?imgmax=800" width="275" height="133" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;As you can see, the ProductGroup column has one row with an empty string and another row with a NULL value. When SSAS queries this attribute during processing, it runs the following SQL query (which can be captured using the profiler):&lt;/p&gt;  &lt;p&gt;SELECT DISTINCT [dbo_Product].[ProductGroup] AS [dbo_ProductProductGroup0_0]    &lt;br /&gt;FROM [dbo].[Product] AS [dbo_Product] &lt;/p&gt;  &lt;p&gt;The result of the query looks like this:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh5.ggpht.com/-aUrHBALQZRk/Th6baHAyvQI/AAAAAAAACdA/AS1n2hklvfs/s1600-h/image_thumb%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb" border="0" alt="image_thumb" src="http://lh3.ggpht.com/-2IF3ymbghec/Th6basWa8BI/AAAAAAAACdE/phMToBYQ_fA/image_thumb_thumb.png?imgmax=800" width="196" height="100" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;Now, the default NULL processing for our dimension attribute is ‘Automatic’, meaning Zero (for numerical values) or Blank (for texts), so the NULL value above is converted to an empty string. The result set therefore has two rows with an empty string, and that causes the error. 
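&lt;/p&gt;  &lt;p&gt;The effect can be reproduced with a minimal T-SQL sketch. The table variable and its three sample rows are assumptions for illustration only, they are not the data from the screenshots. The first query mimics the ‘Automatic’ NULL processing by converting NULL to an empty string only after the distinct query has run. The second query applies the coalesce function before the distinct, as one could do in a DSV named query:&lt;/p&gt;  &lt;pre&gt;
```sql
-- Hypothetical sample data: one empty string and one NULL in ProductGroup
DECLARE @Product TABLE (ProductID int, ProductGroup nvarchar(50) NULL);
INSERT INTO @Product (ProductID, ProductGroup)
VALUES (1, N'Food'), (2, N''), (3, NULL);

-- DISTINCT treats '' and NULL as two different values; converting NULL
-- afterwards (as the Automatic NULL processing does) yields '' twice
SELECT ISNULL(d.ProductGroup, N'') AS ProductGroup
FROM (SELECT DISTINCT ProductGroup FROM @Product) AS d;

-- Converting NULL before DISTINCT runs leaves a single '' row
SELECT DISTINCT COALESCE(ProductGroup, N'') AS ProductGroup
FROM @Product;
```
&lt;/pre&gt;  &lt;p&gt;Only the second form keeps the attribute key unique, because the NULL is already gone when the distinct is evaluated.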
&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh4.ggpht.com/-G78OkQzUbQs/Th6bbrcrnwI/AAAAAAAACdI/WmiEibqT01Y/s1600-h/image_thumb17%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb17" border="0" alt="image_thumb17" src="http://lh6.ggpht.com/-B1DVLAku0Kw/Th6bcAC3bUI/AAAAAAAACdM/5FEegYFdJfk/image_thumb17_thumb.png?imgmax=800" width="362" height="344" /&gt;&lt;/a&gt; &lt;/p&gt;          &lt;p&gt;So the problem can be avoided if you don’t have NULL values in your column. This explains the first reason described above.&lt;/p&gt;    &lt;p&gt;&lt;strong&gt;&lt;u&gt;Long explanation Reason 2&lt;/u&gt;&lt;/strong&gt;: &lt;/p&gt;  &lt;p&gt;I blogged about attribute relationships before and you may want to read &lt;a href="http://ms-olap.blogspot.com/2008/11/turning-non-natural-hierarchy-into.html"&gt;this post&lt;/a&gt; about defining the key for attributes in an attribute relationship.&lt;/p&gt;    &lt;p&gt;&lt;strong&gt;&lt;u&gt;Long explanation Reason 3&lt;/u&gt;&lt;/strong&gt;: &lt;/p&gt;  &lt;p&gt;Let’s take a look at the following modified product table.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh5.ggpht.com/-736dIqM7SS4/Th6bc05DU2I/AAAAAAAACdQ/Bh-5Bl1liH4/s1600-h/image_thumb2%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb2" border="0" alt="image_thumb2" src="http://lh6.ggpht.com/-lTvJqq4oe8U/Th6bdpl9zoI/AAAAAAAACdU/k3DwOLqnAzE/image_thumb2_thumb.png?imgmax=800" width="346" height="93" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;The ProductID column is unique while the ProductCode is not. 
If we now define the ProductName attribute as follows we will also get a duplicate key error:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh3.ggpht.com/-SpiW5DN7594/Th6beGxIjXI/AAAAAAAACdY/3eCoZOqQnQQ/s1600-h/image_thumb4%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb4" border="0" alt="image_thumb4" src="http://lh6.ggpht.com/-fSCmud7SWDQ/Th6besgxufI/AAAAAAAACdc/wB0V5WNe25g/image_thumb4_thumb.png?imgmax=800" width="395" height="44" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;The reason here is that for the ProductCode 1 two names are found (and therefore the select distinct returns two lines with ProductCode 1), so ProductCode is not a good key
