Thursday, December 27, 2012

7 reasons why you should prefer SCD 1 over SCD 2+

dimensional modeling

When it comes to dimension design, a common question is how to deal with attributes that change over time. A frequently used pattern for such attributes was developed by Ralph Kimball in his concept of slowly changing dimensions (SCD). One aspect is handling source systems that simply overwrite their master data while you want to preserve past attribute values for your analytics. I’m not going into too much detail here, but will focus on the two most important types:

Type 1: changes are directly overwritten in the dimension table. Past values are not preserved; they are lost.

Type 2: past values for dimension data are kept using date from/to columns and/or a current-row indicator.

There are more types and also combinations of types (hybrid types). You can find much more about slowly changing dimensions here.
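To make the difference concrete, here is a minimal T-SQL sketch of both techniques, assuming a hypothetical DimCustomer table with a business key, validity columns and a current-row indicator (all names are invented for this illustration):

-- Type 1: overwrite the attribute in place; the old value is lost
UPDATE DimCustomer
SET    CustomerName = 'New Name'
WHERE  CustomerBK = 4711
AND    IsCurrent = 1;

-- Type 2: close the current row, then insert a new version of it
UPDATE DimCustomer
SET    ValidTo = '2012-12-26', IsCurrent = 0
WHERE  CustomerBK = 4711
AND    IsCurrent = 1;

INSERT INTO DimCustomer (CustomerBK, CustomerName, ValidFrom, ValidTo, IsCurrent)
VALUES (4711, 'New Name', '2012-12-27', '9999-12-31', 1);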

When talking to other BI architects, I frequently hear the opinion that type 2 should be used for almost every attribute. I don’t agree with this: I try to use SCD type 1 wherever possible and type 2 only if there is a real business requirement for it. To be clear, there’s nothing wrong with type 2, but it adds significant complexity to your BI solution and, in some cases, has other downsides to be aware of. This post collects 7 topics that you should know about when deciding on type 1, type 2 or any other method for dealing with historical values.

 

1. Corrections versus historically relevant changes

Usually, we don’t want corrections, like a typo in a customer’s name, to be kept historically. For example, if a customer’s name was corrected from ‘Miler Inc.’ to ‘Miller Inc.’, we don’t want to see this as two different customers in our reports:

Instead of

[Image: report showing ‘Miler Inc.’ and ‘Miller Inc.’ as two separate customers]

we want to see

[Image: report showing a single customer ‘Miller Inc.’]

However, if Miller Inc. consolidates with Hiller Inc. into “Miller & Hiller Inc.”, we might want to see them as separate customers, for example like this:

[Image: report showing ‘Miller Inc.’, ‘Hiller Inc.’ and ‘Miller & Hiller Inc.’ as separate customers]

So the problem with corrections is obvious: when data changes in the source system, we might not be able to distinguish corrections from changes that need to be preserved over time. One option could be to use a function like soundex and treat minor changes to such a field as a type 1 modification (applied to the most current type 2 entry) and all other changes as type 2 changes. For the example above, “Miler” and “Miller” both result in soundex code “M460”. But using soundex (or a similar function) does not guarantee the result we want, i.e. a reliable decision between corrections and modifications; it’s still just a guess.
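Since soundex is directly available in T-SQL, the heuristic can be sketched in a few lines (a guess, as said above, not a reliable rule):

-- Heuristic only: equal soundex codes => treat as a correction (type 1),
-- different codes => treat as a historically relevant change (type 2)
DECLARE @OldName nvarchar(100) = N'Miler Inc.';
DECLARE @NewName nvarchar(100) = N'Miller Inc.';

SELECT CASE WHEN SOUNDEX(@OldName) = SOUNDEX(@NewName)
            THEN 'SCD-1 (correction)'
            ELSE 'SCD-2 (historically relevant change)'
       END AS ChangeType;  -- both names produce the same code here, so this counts as a correction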

So it’s much more common to treat attributes/fields where we expect mostly corrections as SCD-1 and attributes that rather experience structural changes as SCD-2. Again, this is not guaranteed to be correct.

In many cases where we need SCD-2, a good approach is the following (see the detection sketch after this list):

  • If we only have a text field, make it SCD-1
  • If we have a key/value field (because the value is taken from another table), make it a
    • SCD-2 change, if the key changes
    • SCD-1 change, if the text for that key changes
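A minimal T-SQL sketch of this decision logic, comparing a staging row against the current dimension row (table and column names are made up for this example):

-- New key => SCD-2 change; same key but new text => SCD-1 change
SELECT s.ProductBK,
       CASE WHEN s.ProductGroupKey  <> d.ProductGroupKey  THEN 'SCD-2 (key changed)'
            WHEN s.ProductGroupName <> d.ProductGroupName THEN 'SCD-1 (text changed)'
            ELSE 'no change'
       END AS ChangeType
FROM StagingProduct AS s
INNER JOIN DimProduct AS d
        ON d.ProductBK = s.ProductBK
       AND d.IsCurrent = 1;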

In special cases, things might be more complicated. For example, if you have ZIP code and city, a change to the ZIP code alone is more likely a correction (an SCD-1 change), while a change of both ZIP code and city name more likely indicates a relocation and therefore results in an SCD-2 change.

Either way, we’re trying to guess the intention the user had when making the modification, just by looking at the result.

 

2. Comparing with data from the past

When the requirement is to compare current data with data from the past, it seems obvious that we need SCD-2 to preserve the correct dimension structures from the past. But is this really true? Actually, it depends on what you need the analysis for. For example, take a look at the following chart, showing the sales of a specific product group.

[Image: line chart of product group sales with a steep drop between September and October]

First, note that this chart breaks a lot of information design rules. The confusion is intensified by the faked continuity of a line chart (shame on me): the line obfuscates the real situation by suggesting a continuous descent through September, while actually the value drops abruptly. Now, what can you say when looking at this line chart?

  • The product group performed OK between January and September
  • September was a great month
  • Between September and October a catastrophe must have happened. Maybe a competitor launched better and cheaper products, or something caused bad publicity

And how would an end user react if you said that none of this is true? The descent in October is nothing but an SCD-2 change (a what?), as one major product was moved to a different product group. In fact nothing happened: no catastrophe, no competitor being better, just a change in the master data. This is typical for SCD-2 changes: results are not self-explanatory, and data plotted over time may show jerky leaps.

If the product hierarchy was maintained as SCD-1, maybe the chart would look like the green line below:

[Image: the same chart with an additional green SCD-1 line showing a steady decline and a recovery at year end]

I think you can clearly see the advantage. The green line (SCD-1) is free of SCD-2 leaps, so it’s much easier for the end users to understand. The interpretation is more like:

  • We had some problems between January and October
  • but we managed a turnaround in November and December

So, whenever you want to see trends, derive insights for future development or find outliers, the SCD-1 view is much easier to understand without additional explanation about master data changes.

 

3. Mixture of different concepts (type 1, type 2 …)

Maybe we don’t want all attributes to be type 2. The requirement could be that some hierarchies are type 2 while others are type 1. Again, think about the end users: mixing both concepts may be very confusing. In some aspects you may be able to compare with printed reports from the past, while in other aspects data is shown differently. However, making all attributes type 2 is also a problem (see ‘corrections’ above). One option might be to use perspectives: one perspective for the current type-1 view, one for the historical type-2 view. Another option would be to question whether you really need the type-2 view at all, especially in a mixed form.

 

4. Justification or reproduction of results from the past

The requirement for keeping dimensional data in its historical form (past view, type 2) often results from the need for justification. If we want to be able to reproduce results exactly as they were at some point in the past, we need to keep track of changes. Maybe someone asks why you hadn’t already seen the sales problems from the example above (green line) in April. With the SCD-2 version of the chart (the blue line from above), you could prove that the chart for April still looked fine, so obviously you’re not responsible for the mess (this doesn’t solve the problem, but it may be a good justification). Other reasons to reproduce a report with exactly the same values as in the past are compliance requirements (SOX, Basel II etc.). However, if you build your BI solution to justify yourself, keep in mind:

  • Keeping historical values for the dimensions is not sufficient in most cases
  • Fact data may have been corrected or may not have been available at the time the report was generated in the past

So, often enough, you also need to preserve the historical state of your fact data. Maybe you end up with a separate key date dimension to show the data exactly as it was at a given point in time. Besides the need for keeping the historical state of all attributes in all dimensions (even for corrections!), this may also raise the complexity and costs of your BI system significantly. So, if you don’t have these requirements, there is no reason to implement this historical view.
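For those who do face this requirement, one possible shape of such a key date solution is a snapshot fact table with an additional as-of date key, so that every historical report state is preserved. A hypothetical sketch (all names invented):

-- Hypothetical snapshot fact: one copy of the facts per reporting (as-of) date
CREATE TABLE FactSalesSnapshot (
    AsOfDateKey   int   NOT NULL,  -- key date: the state of the data as reported on this day
    DateKey       int   NOT NULL,
    ProductKey    int   NOT NULL,
    CustomerKey   int   NOT NULL,
    SalesAmount   money NOT NULL
);

-- "Show me the April report exactly as it looked back then"
SELECT DateKey, ProductKey, SUM(SalesAmount) AS SalesAmount
FROM FactSalesSnapshot
WHERE AsOfDateKey = 20120430
GROUP BY DateKey, ProductKey;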

 

5. What is the “real” time of the change?

If we have virtual structures that only exist in some master data system, are maintained only there and are not delimited in time (valid date from/to), SCD-2 detection usually works as expected. This might be true for typical controlling objects, like a cost center hierarchy or a profit center. In other cases, the time of the change in the master data system does not necessarily match the real time of the change.

Think of a senior sales manager. A very important customer was assigned to this sales manager at the beginning of the year. Then the sales manager wins a big deal with this customer. But the BI system doesn’t show the revenue associated with this sales manager. What went wrong? Usually, the sales manager wants to know in such a situation. Let’s assume the customer relationship department simply forgot to move the customer over to the new sales manager. After being notified, they immediately correct this error. But then? If the BI system has no valid from/to date information from the source system, it can only observe the date on which the change occurred. Data from the past will still be associated with the wrong sales manager, usually causing trouble and confusion. Of course, there is no simple solution for this case. If you don’t have a proper valid date range from the source system, this issue can only be solved by a manual correction. The point I want to make here is that SCD-2 alone might not be able to solve such situations, as it is impossible to guess the real date of a change just by observing the change in a master data table.

 

6. Persisted data in the BI system

Of course it’s normal to persist data in the data warehouse. But many smaller BI systems don’t need to do that, as all data is available in the source systems. This situation has some benefits: for example, it’s easy to include new subject areas for the full history (complete re-load), and there is no need to back up the data warehouse. If you are in such a comfortable situation and a requirement is to include SCD-2 while the source system simply overwrites its data, you can no longer take all of that data from the source system. Again, not much of a problem, but it adds some complexity to your BI system.

 

7. Testing

If the source system overwrites its data while the BI system maintains history in an SCD-2 style, it’s more difficult to compare the results of your source system with those of the BI system: the source system can only report data in its most current state, while your BI system reports some (or all) attributes from the historical perspective. Testing is very important, and in many cases you can even automate it (using a tool like BI.Quality). So, if you cannot reproduce the current state of the data from your BI system, testing becomes very difficult. For many situations my recommendation is therefore to use SCD-6 (1+2+3) instead of SCD-2, even if the SCD-1 view is needed for nothing but testing. Again, this adds complexity.
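A type-6 dimension essentially stores each attribute twice: once in its historical type-2 form and once as the always-updated current value, so the current view can be compared directly with the source system. A minimal sketch with invented names:

-- Hypothetical type-6 customer dimension: historical and current value per attribute
CREATE TABLE DimCustomer (
    CustomerKey         int IDENTITY(1,1) PRIMARY KEY,  -- surrogate key (one row per version)
    CustomerBK          int NOT NULL,                   -- business key from the source system
    SalesRegion         nvarchar(50) NOT NULL,          -- type 2: value as of ValidFrom/ValidTo
    SalesRegionCurrent  nvarchar(50) NOT NULL,          -- type 1: latest value, kept in sync on every change
    ValidFrom           date NOT NULL,
    ValidTo             date NOT NULL,
    IsCurrent           bit NOT NULL
);

-- Testing against the source system can then use the current view:
-- SELECT CustomerBK, SalesRegionCurrent FROM DimCustomer WHERE IsCurrent = 1;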

 

Conclusion

The concept of slowly changing dimensions belongs to the foundations of BI data modeling. However, keeping historical values using type 2 (SCD-2) may have negative side effects and raises the complexity of your BI system. So it’s good advice to handle historical changes carefully and to be fully aware of those side effects.

Sunday, December 16, 2012

Hierarchies with text fields (attribute relationship)

SQL Server 2005-2012

When setting up hierarchies for multidimensional SSAS, attribute relationships between the levels are highly recommended. There might be scenarios in which the source data breaks those attribute relationships; usually this can be solved by choosing a proper key, in some cases a combined key. I wrote about this topic in earlier posts (see here and here).

In general you are in a good situation if the attribute relationship is modeled as a 1:n relationship between each pair of corresponding levels in the relational model of the source system. In this case you can rely on the keys of those tables. For example, in AdventureWorks we find tables like Product, ProductSubcategory and ProductCategory for the product hierarchy. Each table has a foreign key relationship to the next level, so we can be absolutely sure that we have a true 1:n relation between products and their subcategory and between the subcategory and the category (using the keys from the relational database). Our attribute relationships can rely on those keys, and no matter how the content of the text fields in these tables changes, the attribute relationships for our hierarchies will still work.

However, and this is the topic of this post, you might be in a situation where the source system only delivers some text fields and you have no information about how these fields are linked in the source system. Maybe there isn’t any link at all and the text fields are nothing but additional columns at the same grain as your dimension key. If it’s a requirement to build a hierarchy from these text fields and you want to set up the attribute relationships correctly, changes in the source data might break your attribute relationships and therefore cause dimension processing to fail if you use the text fields as the attribute keys.

To illustrate this situation, let’s take a look at the following data table which is just an example of an interface file (maybe a csv-file):

[Image: sample rows with columns Office Id, City, State, Sales Territory and Manager]

The requirement is to create the following drill down path (hierarchy) based on this data:

Manager —> Sales Territory —> State —> City —> Office

Checking the existing rows of data, the attribute relationships can be set up along this drill-down path, as all of the following relationships are one-to-many:

  • Office Id to City
  • City to State
  • State to Sales Territory
  • Sales Territory to Manager

So you could be tempted to create the attribute relationships based on those text fields and to use the text field itself as the key for the given attribute. The dimension will process correctly and everything seems to be fine. But in this case, with no proper keys from the source system, we have no guarantee at all that the cardinality between those attributes will stay like this. For example, let’s assume the following three new offices are established:

[Image: three additional rows for offices 6, 7 and 8]

The first row is a second office in New York. This row does not break any of the relationships above. But office 7 is managed by Linda Randall instead of Barbara Moreland, so the relationship between manager and sales territory is no longer 1:n but n:m. And office 8 is newly founded in Florence, Kentucky, which is different from Florence, Alabama, and therefore breaks the relationship between city and state. Of our four attribute relationships from above, only two are still valid after inserting the rows for offices 6-8. It may be possible to process the dimension correctly for some time, but at any time rows may show up that break a relationship (offices 7 and 8 in the example above), causing the dimension processing to fail with a duplicate key error.

The actual error will be somewhat like this:

Errors in the OLAP storage engine: A duplicate attribute key has been found when processing: Table: 'dbo_Office', Column: 'City', Value: 'Florence'. The attribute is 'City'.

You will find more information about this kind of error and other possible causes for it here.

So, how can we prevent this from happening and immunize our hierarchy against changes in the data? The solution is quite simple: if we don’t get reliable keys from our source system, we have to compute them ourselves. One idea might be to use combined text fields, which would work, but for performance reasons we prefer integer keys for the attributes.

The easiest way to create these keys is to maintain a lookup table for each attribute, as shown below. For each attribute, we build a lookup key that contains all dependent attributes (from our hierarchy) and compute an artificial key (for example by using an identity column). In our example, for the city, we have to include city, state, sales territory and manager in the lookup. I’m using a simple text field in this example, but in general a hash code (like MD5) would be a better solution.

[Image: lookup table mapping the combined attribute text to an artificial integer key]

Whenever a lookup key is not found, a new row is inserted, thus resulting in a new key.
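A minimal T-SQL sketch of this lookup logic, with invented table and column names; the comment also notes the hash variant mentioned above:

-- Lookup table: the combined text of all dependent attributes maps to one surrogate key
CREATE TABLE CityLookup (
    CityKey    int IDENTITY(1,1) PRIMARY KEY,
    LookupKey  nvarchar(400) NOT NULL UNIQUE  -- e.g. 'Florence|Kentucky|Southeast|Linda Randall'
    -- alternatively store HASHBYTES('MD5', <combined text>) instead of the plain text
);

-- Insert a new row (and thus a new key) whenever the combination is not known yet
INSERT INTO CityLookup (LookupKey)
SELECT DISTINCT o.City + '|' + o.State + '|' + o.SalesTerritory + '|' + o.Manager
FROM Office AS o
WHERE NOT EXISTS (
    SELECT 1 FROM CityLookup AS l
    WHERE l.LookupKey = o.City + '|' + o.State + '|' + o.SalesTerritory + '|' + o.Manager
);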

Using this lookup table, we can enrich our original data with reliable keys as shown below:

[Image: the source data enriched with the computed integer keys]

And no matter how the data in our source table changes, we will still be able to create attribute keys that fulfill the 1:n requirement for attribute relationships.

OK, I hear you say that this was an easy case. What can I do if I have multiple parallel hierarchies instead of only one? How can the keys be computed in such a case? Well, maybe you have already guessed it: if there are no reliable keys from the source system, there is usually no workaround but to create separate attributes per hierarchy. For example, if the users also wanted a simplified hierarchy like

Office —> Sales Territory

we end up with a second attribute (column) for the sales territory. If you don’t create separate attributes, you might end up with duplicate entries confusing the users of the hierarchy. This happens because elements in our dimension/hierarchy are identified by key. With a single column, our full hierarchy from above would be correct, while the simplified hierarchy would look like this:

  • Southeast
    • Office 5
    • Office 8
  • Southeast
    • Office 7

The reason for this is that, using the hierarchy key, we end up with two different versions of “Southeast” (one for manager Barbara and one for Linda, in order to preserve the 1:n relationship to the manager). But of course, what the end user would expect is something like this:

  • Southeast
    • Office 5
    • Office 7
    • Office 8

This explains why we need separate attributes (and separate keys) for each of the hierarchies.

And again, all of the above is only necessary if we cannot fully rely on the 1:n relationship.

Sunday, August 26, 2012

One or two dimensions?

Dimensional modeling

This post is about a typical question in dimensional modeling: how to cluster the attributes for analysis into dimension tables in your data warehouse. Usually this is done within the ETL process of your BI system, so it might not look like a typical SSAS design question. However, it might come up in the context of your data source views and also in the context of usability (which makes it exactly a question of SSAS design).

In the elementary school of BI we learn that our master data tables do not necessarily have to be the dimensions and our transactional data tables do not necessarily have to be our fact tables; rather, the dimensional model regroups all the information stored in the columns of those tables into a flexible, yet easy to understand model of fact and dimension tables.

For example, a typical operational data structure for orders, with transactional tables for order header and order detail, might result in a single order fact table with some of the order header information being stored as dimensional information.

So, for any two attributes in your source tables, you have to decide whether to put them into the same dimension table or into different ones. Here are the two options:

 

Option 1: Same dimension table for both attributes

[Image: option 1, both attributes in a single dimension table]

Option 2: Separate dimension tables for the two attributes

[Image: option 2, a separate dimension table per attribute]

 

In order to give you some guidance for this decision, here are some aspects that favor one of the two options.

 

Aspects that favor one dimension table (option 1):

  • Both attributes logically belong to the same business entity, for example the customer, the region or the company
  • Both attributes have a strong relationship, like one to many (or ‘almost’ one to many)
  • Both attributes naturally appear in the same hierarchy (for example product group and product)
  • It should be possible to analyze facts by the most current combination of the two attributes (see below)

Aspects that favor separate dimension tables (option 2):

  • The number of existing combinations of both attributes is significantly higher than the number of rows for the attribute with the higher row count (see below)
  • There are fact tables that are only linked to one of the two attributes (for example planned revenue being linked to the product group, but not to the product, see below)
  • The association between attribute 1 and attribute 2 is rapidly changing over time but must be reflected in the model (historical view must be kept, see below)

 

While most of these aspects are self-explanatory, I’d like to add some more information on a few of them:

 

The number of existing combinations of both attributes is significantly higher than the number of rows for the attribute with the higher row count.

Although both attributes might appear to belong to the same logical entity, putting them into one dimension table might dramatically increase the number of rows of this table. For example: date and time. A table storing dates at day level contains about 3,650 rows for ten years (10 x 365). A table storing time at minute granularity contains only 1,440 rows (24 hours x 60 minutes). So keeping date and time separated results in two small dimension tables. But as every entry of the time table exists for every day, the combined table would contain 3,650 x 1,440 = 5,256,000 rows. While most BI systems are capable of handling dimensions of that size, it is still a very large dimension table.

 

There are fact tables that are only linked to one of the two attributes

In an earlier post I wrote about associating fact tables with different levels of granularity within the same dimension table. This is possible; however, it might be confusing, and it somewhat breaks the idea of a true star or snowflake layout. This is a hard decision for the dimensional architect and has to be considered very carefully, because in most scenarios where you face this situation, both attributes really belong to the same entity and may also form a hierarchy (one to many), but only one of the two fact tables has a detail level that is sufficient to link it to both attributes. For example, you may have actual and forecast revenue in different fact tables. The actual revenue is at day level, but the forecast is at month level. Now, day and month perfectly belong together and even form a good hierarchy, and separating day and month into different dimensions would be a strange solution. So consider for yourself whether you want to separate those attributes or use the technique described in that post to keep them together.

 

The association between attribute 1 and attribute 2 is rapidly changing over time but must be reflected in the model (historical view must be kept)

In the context of preserving the historical view in our data warehouse, we usually consider Kimball’s slowly changing dimension approach, especially slowly changing dimensions of type 2, 3 or combinations. However, separating our attributes into two different dimension tables automatically preserves the historical view when they are used from the same fact table (assuming the fact table is also linked to the date/time dimension). This is somewhat obvious but often ignored by dimensional architects. And while slowly changing dimensions of type 2 or 3 increase the number of rows in our dimension table and might not work for rapidly changing dimensions, separating the attributes keeps the individual dimension tables small.
Of course, the opposite is also true, which is explained as the next aspect:

 

It should be possible to analyze facts by the most current combination of the two attributes

In Kimball’s sense of slowly changing dimensions, this is a type 1 situation. For example, the requirement is to analyze customers (attribute 1) only by their latest city of residence (attribute 2). If, for any reason, customer and city are separated into different dimension tables (one named customer, the other named customer geography, for example), a relocation of the customer would require updating existing fact rows in order to get the present view (SCD type 1). Updating existing fact rows is usually something you want to avoid, if possible.
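Just to illustrate why this is unattractive, such a type-1 correction across separate dimension tables would mean something like the following (all names hypothetical):

-- Relocation of one customer: repoint all of their existing fact rows
-- to the dimension row of the new city
UPDATE f
SET    f.CustomerGeographyKey = 78   -- hypothetical surrogate key of the new city
FROM   FactSales AS f
WHERE  f.CustomerKey = 4711;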

Sunday, August 5, 2012

Semi Additive Measures in SQL Server Standard Edition

SQL Server 2008 | SQL Server 2008R2 | SQL Server 2012

A few years ago, Alex Whittles wrote a very detailed explanation of semi-additive measures in SQL Server Standard Edition. In today’s post I’m concentrating on some other ideas, especially on dynamic sets in SQL Server 2008. Semi-additive measures are measures that aggregate differently over time (or over other specific dimensions). For example, for stock values you want to sum them up by product, location, country etc., but not over time (there you may want to see the last value, the average, the minimum/maximum or whatever).

For SQL Server SSAS (multidimensional), semi-additive measures are a feature that is only available in the Enterprise Edition of the server. The following post shows methods for implementing such calculations on a Standard Edition. However, the built-in implementation of the Enterprise Edition performs better than the custom calculations shown here.

For my example, I’m using pretty much the same source data as for my DAX post about semi additive measures.

This is what the sample data looks like when using the measure StockLevel:

[Image: pivot table of StockLevel by product and month]

The time dimension is at day granularity. The stock values are associated with the last day of each month (in order to properly show the effect of the ‘last value’ semi-additive function).

Since we use Sum as the aggregation function here, the grand total for each row is correct, while the subtotals and the grand total for the columns make no sense.

This is a job for semi additive measures! In order to see different effects, I created three of them named

  • EP Average
  • EP LastValue
  • EP LastNonEmpty

The prefix EP stands for Enterprise Edition. For example, this is the definition of “EP LastNonEmpty”:

[Image: definition of the measure ‘EP LastNonEmpty’]

The following screen shot shows all of those measures side by side (for all products):

[Image: the three Enterprise measures side by side for all products]

The difference between ‘last value’ and ‘last non empty’ shows up in the totals. Last non empty takes the last non-empty value within the current context, not within the full date dimension. So the measure “EP LastNonEmpty” shows a total for 2011 but no total for April (as there is no value for April at all). The measure “EP LastValue”, on the other hand, gives no total for 2011, as there is no value for December 31, 2011.

Now let’s try to rebuild those measures.

 

Average over time (version 1)

First, we start with the average over time. If you have (or can create) a distinct count measure on the date key of the fact table, it is quite easy to compute the average over time. Let’s assume our distinct count measure is named [Date Key Distinct Count]; then the calculation for the average looks like this:

CREATE MEMBER CURRENTCUBE.[Measures].[STD Average]
AS [Measures].[Stocklevel] / [Measures].[Date Key Distinct Count];

We can check the results by putting both measures side by side:

[Image: STD Average and EP Average side by side]

Of course, the calculation also works if you filter for a specific product.

Average over time (version 2)

I just want to present a different approach in case you do not want to create a distinct count measure (which requires a separate measure group). We still need to find the number of days in a given date range. We can use a dynamic set to do so:

CREATE DYNAMIC SET CURRENTCUBE.[CurrentDateRange]
AS [Date].[Day].[Day], DISPLAY_FOLDER = 'Calculations';

CREATE MEMBER CURRENTCUBE.[Measures].[STD Average 2]
AS [Measures].[Stocklevel]
    / Count(NonEmpty(EXISTING CurrentDateRange, [Measures].[Stocklevel]));

The set needs to be dynamic in order to reflect a subcube context (for example if Microsoft Excel 2007 or later is used to query the cube). Also, we have to restrict the set to the dates that have values, as the average must only count existing date rows.

Comparing the three measures (Enterprise method and the two methods for the average) side by side shows that they are the same:

[Image: EP Average, STD Average and STD Average 2 side by side]

 

Last value

For the last value, I’m also using the approach with the dynamic set here. We use the same set, so the calculation is simply:

CREATE MEMBER CURRENTCUBE.[Measures].[STD LastValue]
AS ([Measures].[Stocklevel], tail(existing CurrentDateRange,1).item(0)) ;

The expression Tail(…, 1) returns the last item of the set. We evaluate our measure on this last item.

Before comparing the results with the enterprise measure, let’s quickly add the last one:

 

Last Non Empty

This formula is almost identical to the one used before, except that we wrap the set in a NonEmpty(…) function. Again we take the last item (in this case, the last item with data), so the calculation looks like this:

CREATE MEMBER CURRENTCUBE.[Measures].[STD LastNonEmpty]
AS ([Measures].[Stocklevel],
   tail(nonempty(existing CurrentDateRange, [Measures].[Stocklevel]),1).item(0)) ;

 

Here are our two last measures together with their corresponding Enterprise measures side by side:

[Image: STD LastValue and STD LastNonEmpty next to their Enterprise counterparts]

 

Of course you can also use the calculations above for other kinds of aggregates. For example, a minimum over time (sum over all other dimensions) would look like this:

CREATE MEMBER CURRENTCUBE.[Measures].[STD Min]
AS Min(existing CurrentDateRange, [Measures].[Stocklevel]);

And you can also use this kind of calculation for dimensions other than the time dimension.

Sunday, April 29, 2012

Some more Reporting Services charts

SQL Server 2008-2012

To finish my current Reporting Services mini series on this blog, here are some more charts I recently created using SSRS (this is just a list; I did not include titles, and the charts make no sense in this combination). Most charts here make use of the range chart type, which allows very flexible positioning of the bars.

[Images: gallery of SSRS example charts]

I’d also like to refer to my older blog post about using the map functionality of SSRS to display all kinds of content. The example below is taken from that post. It shows one level of a parking garage together with the utilization of each single parking space (based on data from a database). The layout and all elements like the barriers, the arrows and the text were created using Microsoft Visio.

[Image: parking garage level with the utilization of each parking space]

Monday, April 16, 2012

SSRS Custom Drawing (Code)

SQL Server 2005 - 2012

After my last blog post, I got a lot of requests for the code I used for the graphics. So this post just shows the source code for the graphics of my last post.

Both examples require that you add a reference to System.Drawing to your report, as shown below:

[Image: report properties, References tab, with System.Drawing added]

 

1. The deviation chart element

[Image: report table with a deviation chart in the last column]

Here is the code used for the last column’s image:

Function PaintAbw(ByVal width As Integer, ByVal height As Integer, ByVal min As Single, ByVal max As Single, ByVal middle As Single, ByVal cur As Single) As System.Drawing.Bitmap
    ' Draws a vertical deviation chart
    ' parameters
    '   width/height   width and height of the resulting image in pixel
    '   min/max        minimum and maximum value of the data being passed to the chart
    '   middle         position of the vertical reference line (usually zero or avg of the data)
    '                  data points below this value are plotted in red, above this value in blue
    '   cur            actual value to display

    Dim objBitmap As System.Drawing.Bitmap
    Dim objGraphic As System.Drawing.Graphics
    Dim myBrush As System.Drawing.Brush
    Dim x0 As Integer, x As Integer

    ' Initialize the graphic
    objBitmap = New System.Drawing.Bitmap(width, height)
    objGraphic = System.Drawing.Graphics.FromImage(objBitmap)
    objGraphic.SmoothingMode = System.Drawing.Drawing2D.SmoothingMode.AntiAlias
    objGraphic.FillRectangle(System.Drawing.Brushes.White, 0, 0, width, height)

    ' Draw the vertical line (the "middle" position)
    x0 = 8 + ((width - 18) * (middle - min)) \ (max - min)
    objGraphic.DrawLine(System.Drawing.Pens.Gray, x0, 0, x0, height)
    If cur < min Then cur = min
    If cur > max Then cur = max

    ' Draw the deviation (line and circle)
    x = 8 + ((width - 18) * (cur - min)) \ (max - min)
    objGraphic.DrawLine(System.Drawing.Pens.Gray, x0, height \ 2, x, height \ 2)
    If cur < middle Then
        myBrush = New System.Drawing.SolidBrush(System.Drawing.Color.FromArgb(215, 100, 100))
    Else
        myBrush = New System.Drawing.SolidBrush(System.Drawing.Color.FromArgb(100, 100, 215))
    End If
    objGraphic.FillPie(myBrush, x - 4, height \ 2 - 4, 8, 8, 0, 360)

    ' Return the image as type bitmap
    Return objBitmap
End Function

Function PaintAbwBmp(ByVal width As Integer, ByVal height As Integer, ByVal min As Single, ByVal max As Single, ByVal middle As Single, ByVal cur As Single) As Byte()
    ' Wrapper function for PaintAbw. This function is to be called from Reporting Services.
    ' For parameters see PaintAbw.

    Dim bmpImage As System.Drawing.Bitmap

    ' Get bitmap from PaintAbw
    bmpImage = PaintAbw(width, height, min, max, middle, cur)

    ' Convert this bitmap to a byte array of type BMP
    Dim stream As System.IO.MemoryStream = New IO.MemoryStream
    Dim bitmapBytes As Byte()
    bmpImage.Save(stream, System.Drawing.Imaging.ImageFormat.Bmp)
    bitmapBytes = stream.ToArray
    stream.Close()
    bmpImage.Dispose()
    Return bitmapBytes
End Function

Examples

PaintAbw(100, 20, -100.0, 100.0, 0.0, 50.0) → [image]
PaintAbw(100, 20, 0.0, 100.0, 0.0, 80.0) → [image]
PaintAbw(100, 20, -100.0, 100.0, 0.0, -80.0) → [image]

Usage within the report as the source for a bitmap

[Image: image properties dialog with the expression set as the image source]

Sample Expression:

=Code.PaintAbwBmp(100,20,Min(Fields!Value.Value, "DataSet1"),Max(Fields!Value.Value, "DataSet1"), Avg(Fields!Value.Value, "DataSet1"), Fields!Value.Value)

 

2. The KPI slider

[Image: report table with a KPI slider in the last column]

Here is the code used for the last column’s image:

Function PaintBox(ByVal width As Integer, ByVal height As Integer, ByVal level As Single) As System.Drawing.Bitmap
    ' Draws a range chart (blending from red to green)
    ' parameters
    '   width/height   width and height of the resulting image in pixels
    '   level          position of the slider
    '                  0: left, 1: right, 0.5: middle
    '                  make sure you scale this value to your needs

    Dim objBitmap As System.Drawing.Bitmap
    Dim objGraphic As System.Drawing.Graphics

    ' Initialize the graphic
    objBitmap = New System.Drawing.Bitmap(width, height)
    objGraphic = System.Drawing.Graphics.FromImage(objBitmap)
    objGraphic.SmoothingMode = System.Drawing.Drawing2D.SmoothingMode.AntiAlias
    objGraphic.FillRectangle(System.Drawing.Brushes.White, 0, 0, width, height)

    ' Create color gradient for the background
    Dim BrushRedYellow As New System.Drawing.Drawing2D.LinearGradientBrush(New System.Drawing.Rectangle(8, 0, (width - 16) \ 2, height), System.Drawing.Color.Red, System.Drawing.Color.Yellow, System.Drawing.Drawing2D.LinearGradientMode.Horizontal)
    Dim BrushYellowGreen As New System.Drawing.Drawing2D.LinearGradientBrush(New System.Drawing.Rectangle(7 + (width - 16) \ 2, 0, (width - 16) \ 2, height), System.Drawing.Color.Yellow, System.Drawing.Color.Green, System.Drawing.Drawing2D.LinearGradientMode.Horizontal)
    objGraphic.FillRectangle(BrushRedYellow, 8, 3, (width - 16) \ 2, height - 10)
    objGraphic.FillRectangle(BrushYellowGreen, 8 + (width - 16) \ 2, 3, (width - 16) \ 2, height - 10)

    ' Scale level properly
    If level > 1 Then level = 1
    If level < 0 Then level = 0

    ' Draw the arrow
    Dim Arrow(2) As System.Drawing.PointF
    Arrow(0) = New System.Drawing.PointF(8 + (width - 16) * level, height - 10)
    Arrow(1) = New System.Drawing.PointF(16 + (width - 16) * level, height - 2)
    Arrow(2) = New System.Drawing.PointF(0 + (width - 16) * level, height - 2)
    objGraphic.FillPolygon(System.Drawing.Brushes.Black, Arrow)

    ' Return the image as type bitmap
    Return objBitmap
End Function

Function PaintBoxBmp(ByVal width As Integer, ByVal height As Integer, ByVal level As Single) As Byte()
    ' Wrapper function for PaintBox. This function is to be called from Reporting Services.
    ' For parameters see PaintBox.
    Dim bmpImage As System.Drawing.Bitmap

    ' Get bitmap from PaintBox
    bmpImage = PaintBox(width, height, level)

    ' Convert this bitmap to a byte array of type BMP
    Dim stream As System.IO.MemoryStream = New IO.MemoryStream
    Dim bitmapBytes As Byte()
    bmpImage.Save(stream, System.Drawing.Imaging.ImageFormat.Bmp)
    bitmapBytes = stream.ToArray
    stream.Close()
    bmpImage.Dispose()
    Return bitmapBytes
End Function

 

Examples

PaintBox(100, 20, 0.0) → [image]
PaintBox(100, 20, 1.0) → [image]
PaintBox(100, 20, 0.5) → [image]

 

Sample Expression:

=Code.PaintBoxBmp(100,20,Fields!Sales_Amount.Value/Fields!Sales_Amount_Quota.Value)

Tuesday, April 10, 2012

SSRS Custom Drawing

SQL Server 2005 -  SQL Server 2012

In my last post I showed some tricks with Reporting Services charts. As I mentioned, the possibilities are almost endless. For example, Jason Thomas created an excellent step-by-step instruction for creating overlapping bar charts. So I can only encourage everybody to be creative with these chart properties. It’s really amazing what kinds of charts can be created this way.

But with the ability to use code behind a report, SSRS is even more powerful. In an earlier post, Jason Thomas described a method for rotating text in SSRS 2005. As SSRS 2005 has no property to rotate text, the solution was to write custom code that returns a bitmap as a byte array. This bitmap can then be placed on the report. Just imagine what this means: you can create a .NET drawing function, pass parameters from the report (maybe from a data set) to the function, and place the resulting image on the report! Being able to use the powerful System.Drawing functions gives us almost unlimited options for creating interesting graphical representations. This post shows just two examples.

In the following screenshot of a Reporting Services report I used this technique to create a custom KPI representation (the “sliders” in the last column).

[Image: report with custom KPI sliders in the last column]

In order to create this graphical element, I used two functions. The first function returns a System.Drawing.Bitmap; the second converts this bitmap to a byte array. I removed the actual drawing code here to focus on the idea. This is what the two functions look like:

Function PaintBox(ByVal width As Integer, ByVal height As Integer, ByVal level As Single) As System.Drawing.Bitmap
    Dim objBitmap As System.Drawing.Bitmap
    Dim objGraphic As System.Drawing.Graphics

    objBitmap = New System.Drawing.Bitmap(width, height)
    objGraphic = System.Drawing.Graphics.FromImage(objBitmap)

    ' … do the drawing here, for example objGraphic.DrawLine(System.Drawing.Pens.Gray, 0, 0, width, height)

    Return objBitmap
End Function

Function PaintBoxBmp(ByVal width As Integer, ByVal height As Integer, ByVal level As Single) As Byte()
    Dim bmpImage As System.Drawing.Bitmap

    bmpImage = PaintBox(width, height, level)
    Dim stream As System.IO.MemoryStream = New IO.MemoryStream
    Dim bitmapBytes As Byte()
    bmpImage.Save(stream, System.Drawing.Imaging.ImageFormat.Bmp)
    bitmapBytes = stream.ToArray
    stream.Close()
    bmpImage.Dispose()
    Return bitmapBytes
End Function

The report uses a table with an image in the last column:

[Image: table layout with an image in the last column]

In the image properties, the following settings have to be made:

  • Image Source: Database
  • MIME Type: image/bmp (you may use one of the other MIME types instead, as long as it matches the output format of the function PaintBoxBmp above).
  • Use this field: Expression
    =Code.PaintBoxBmp(100,20,Fields!Sales_Amount.Value/Fields!Sales_Amount_Quota.Value)

[Image: image properties dialog]

Also, the code above requires a reference to the System.Drawing assembly:

[Image: report properties, References tab, with System.Drawing added]

When writing such functions, it’s a good practice to wrap them in a custom .Net assembly.

If you use a custom .Net assembly…

  • … it’s easier to write the code. Nobody wants to write larger code blocks using the SSRS code editor, which offers no hints or syntax highlighting
  • … you may use C# to code the custom assembly
  • … you can use import statements to reference assemblies (like System.Drawing). With code behind a report, this doesn’t work yet, so you always have to use the fully qualified class name
  • … it’s easier to test and debug your code for example by sending the output to a Windows Forms test application

Here is another example of a graphical element (last column) created using the approach from above:

[Image: table with a per-row deviation chart in the last column]

These kinds of charts show which rows are above or below the average. There is one image (chart) per row; together they compose the effect of a single chart with a straight vertical line.

So with just a little bit of code behind the report it’s possible to create interesting graphical representations of data that are not provided by Reporting Services out of the box.

Sunday, March 18, 2012

SSRS bar charts tips and tricks

SQL Server 2008R2 | SQL Server 2012

This post shows some tricks with SSRS report charts that I stumbled over some weeks ago.

I start with a very simple bar chart showing some values along a full year (12 months):

[Image: simple bar chart with monthly values for a full year]

Here are some variants of this chart, together with some details on how each chart was created using SQL Server Reporting Services:

[Image: Highlight plan values]

Here the last three months contain the plan values. The shading of the bars is done by an expression. The vertical line is not the vertical axis but a line chart series. How can this line be truly vertical? As with xy scatter charts, you can specify each point’s x and y coordinates separately; the x coordinate goes into the category field of the series. So in this case there are two points: (9.5, 0) and (9.5, 1). This line is on the secondary vertical axis with min/max set to 0 and 1.

Since the labels on the horizontal axis are automatically shown as scalar values once one of the chart’s series contains x/y data, I had to create the labels for the horizontal axis using a second chart placed below the bar chart.

[Image: Another option for highlighting plan values]

In this chart I didn’t use a line to separate the plan values but instead inserted some space. One option to do so is to add an extra row of data with value zero. But by doing so, the gap between the two regions becomes too wide (in the example on the left, the gap is smaller than the width of a bar).

The gap can instead be created by correcting the category value for the data rows that correspond to the plan data.

In my case, I hard-coded the start of the plan data (usually it would depend on other data columns or report parameters). My category expression looks like this:

=Fields!Month.Value + iif(Fields!Month.Value > 9, 0.3, 0)

Note that for all bars that correspond to the plan data, a value of 0.3 is added to the horizontal position.

In order to get a proper placement of the labels, the labels are again created using a separate chart, as data labels on bars of height 0. To illustrate this, the following screenshot enlarges the bars and shows the axis of the second chart:

[Image: enlarged bars with the axis of the second chart]

[Image: Plan values as a stepped line]

The plan values are shown as a stepped line here. Bars that rise above the stepped line indicate an actual value better than plan. The stepped line is a simple basic series type. However, by default the steps would be located in the center of the bars, like this:

[Image: stepped line with steps centered on the bars]

In order to correct this, you can simply subtract 0.5 from the x coordinates of this series, so in my case the category field’s expression looks like this:

=Fields!Month.Value - 0.5

In order to get a proper stepped line for the last bar, the last row of data for the stepped line has to be duplicated (one data point more than needed for the bar chart).

[Image: Shaded areas in the chart background]

This effect is relatively simple. I used an area chart in the background for the months with the plan values. Again, the area chart’s x coordinates have to be corrected by 0.5 in order to fit the bar chart’s positions.

The formula for the background area chart returns blank for all months without the shading (don’t use zero, because that would create a linear slope between September and October in this example).

[Image: Colored horizontal axis]

This chart type was once proposed by Prof. Hichert for a visual separation of actual and plan values. Here, the horizontal axis is hidden. The colored axis is formed by two area chart series. In order to prevent overlapping with the bar chart, they have to be on the same vertical axis as the actual values, but with a negative value that has to be scaled depending on the maximum of the values for the bar chart.

[Image: Empty horizontal axis]

This is just a variation of the previous chart and again one of the examples of Prof. Hichert. Instead of using two different colors for the area chart forming the horizontal axis, here I’m using different fill colors.

These are just some of the endless possibilities for creating charts in Reporting Services.