Search This Blog

Loading...

Sunday, November 29, 2009

A duplicate attribute key has been found when processing…

SQL Server 2005 | SQL Server 2008

This post is about a common error message during dimension processing I’ve been asked about quite a few times so I thought it would be worth posting about it. The error message says that a duplicate attribute key has been found when processing as shown in the following screenshot for a test cube (I just processed one dimension here):

image_thumb3

Here’s the full error message:

Errors in the OLAP storage engine: A duplicate attribute key has been found when processing: Table: 'dbo_Product', Column: 'ProductGroup', Value: ''. The attribute is 'Product Group'.

When you got to this article because you just ran into this problem you probably don’t want to read much about the background but only want a solution. Unfortunately I found at least three possible reasons for this error message:

Reason 1 (likely): The most likely reason for that error is that you are having NULL values in your attribute key column.If you simply created the attribute by dragging it from the source view, BIDS only sets the key column (name and value column default to the key column in this case), so for example if you have a column ‘Product Group’ in your source table and drag it to your dimension, the product group (Text field) will automatically become the key for this attribute. The attribute is listed in the error message (in the example above it is ‘Product Group’).

Solution: Try avoiding those NULL values in your data source (for example by using a DSV query and the T-SQL coalesce-function). When your source data is a data warehouse it’s also a good practice to avoid null values as they complicate the queries to the data warehouse.

Reason 2 (likely): You defined an attribute relationship between two attributes of the dimension but the data in your source tables violates the relationship. The error message gives you the name of the conflicting attribute (text part ‘The attribute is…’). The attributes has a relationship to another attribute but for the value stated in the error message (‘Value: …’) there are at least two different values in the attribute that the relationship refers to. If you have BIDS Helper installed, you can also see the error details and all violating references when using the ‘Dimension Health Check’ function.

Solution: You may solve the error by making the key of the attribute unique. For example:
Errors in the OLAP storage engine: A duplicate attribute key has been found when processing: Table: 'DimDate_x0024_', Column: 'Month', Value: 'April'. The attribute is 'Month'.
In this example, the Month attribute violates an attribute relationship (maybe Month->Year) for the month April meaning that April appears for more than one year. By adding the year to the key of the month attribute you would make the relationsship unique again.

Reason 3 (not that likely): You have an attribute with separate key and name source fields. When you check the data, you see that keys are appearing more than once with different entries in their name columns (note that it’s not a problem if the key appears more than once if only the name column is the same). In this case you will usually also see the key value in the error message, for example:
Errors in the OLAP storage engine: A duplicate attribute key has been found when processing: Table: 'dbo_Product2', Column: 'ProductCode', Value: '1'. The attribute is 'Product Name'.
This means that the attribute ‘Product Name’ uses the source column ‘ProductCode’ as the key and for the product code 1 there is more than one name.

Solution: Use a unique key column (unique with respect to the name column)

Long explanation Reason 1:

In this case our attribute is only defined by one single source column (acting as key, name and value information) from the data source view. When processing a dimension, SSAS run select distinct queries on the underlying source table, so a duplicated key should be impossible even if the key appears multiple times. Just think of a date dimension like the following one (just for years and months):

image_thumb15

In this case the year (2009) appears in multiple rows. However, defining an attribute year (using the the year column as the key) does not give a conflict as it is queried using a distinct query (so 2009 only appears once). So again, how could we get a duplicate result when using a select distinct query? Here is how my product table looked like:

image_thumb16

As you can see the ProductGroup column has one row with an empty string and another row with a NULL value. When SSAS queries this attribute during processing it runs the following SQL query (that can be captured using the profiler):

SELECT DISTINCT [dbo_Product].[ProductGroup] AS [dbo_ProductProductGroup0_0]
FROM [dbo].[Product] AS [dbo_Product]

The result of the query looks like this:

image_thumb

Now, with the default NULL processing for our dimension attribute being set to ‘Automatic’ meaning Zero (for numerical values) or Blank (for texts) the NULL value above is converted to an empty string. So the result set has two lines with an empty string and that causes the error.

image_thumb17

So the problem can be avoided if you don’t have null values in your column. This explains the first reason described above.

Long explanation Reason 2:

I blogged about attribute relationship before and you may want to read this post about defining the key for attributes in an attribute relationship.

Long explanation Reason 3:

Let’s take a look at the following modified product table.

image_thumb2

The ProductID column is unique while the ProductCode is not. If we now define the ProductName attribute as follows we will also get a duplicate key error:

image_thumb4

The reason here is that for the ProductCode 1 two names are found (and therefore the select distinct returns two lines with ProductCode 1), so ProductCode is not a good key here. The problem would disappear if the ProductName for the third line would also be ‘A’ (like for the first line) or the ProductCode for the third line would be other than 1 or 2.

However, this reason occurs rather seldom because usually if we have a key and a name in our dimension, the source comes from some kind of master data table and therefore it should be unique. But for type 2 slowly changing dimensions you must not use the business key as key column (as there may be many rows with the same business key).

Another way to “solve” duplicate key errors (although not recommended) is to set the “KeyDuplicate” property for the error processing of the dimension to “IgnoreError” as shown below:

image_thumb18

However, this is definitely not recommended except for prototyping scenarios. This is clearly explained here http://technet.microsoft.com/en-us/library/bb630297.aspx.

46 comments:

  1. Very detailed and excellent explanation.
    My problem was with the third scenario.

    ReplyDelete
  2. Hi;

    Wha ti don't understand in your article, and many others, that, you talk about an attribute's key.
    This confuse me you can't iamgine how.
    I'm not really expert in SSAS, but i'm working on some cube, wich generate some errors realted to your article.
    To be more clear, u said "You have an ATTRIBUTE with separate KEY", for me, this is strange, what i know is: a key is an attribute in itself, a table can have a key, i never thought that an attribute can have a key.
    please, can u explain me this, maybe i don't know the use of every terms. and if it's possible, u can explain it by a scema.

    Thank you in advance.

    ReplyDelete
  3. If you look at the properties tab of each attribute you can find three properties relating to the underlying source: Key, Name and Value. Key can consist of multiple columns and defines the granularity of each attribute. If you just created the attribute by dragging it to the left pane, the default key is the source column and the default name is empty (meaning the same as the key). This has nothing to do with a primary key. If you dropped the column MonthName as an attribute, the default key for this column is the month name. But (as stated above) if you set key and name property you have to make shure that they key is really related to the name.
    Hope this helps.

    ReplyDelete
    Replies
    1. Just the explanation I needed.
      Thanks

      Delete
  4. i'm the same one who posted under these informations: May 26, 2010 5:03 PM - Anonymous
    Thank you for the fast answer,
    OK, now i can see that any diemnsion need a key attribute, which can be a collection of different keys (and this is new for me).
    I still confused on why we need define this? this is an example of why i'm confused:
    We have the table Employees (PK:int; Name:String; FirstName:String)
    i want create a dimenion Employee form this table, that means i need define a Key attribute (to don't have the error concerning Dupplicate key), so my question: why the PK attribute isn't sufficient? isn't this a way to differenciate between rows in the dimlension employee?
    i really can't see the need of all this, and, why SSAS will find dupplication if we don't define a collection of attributes to be used as a key.

    Thank u in advance (you really heplped in your first answer, to see the exostence of these properties.)

    ReplyDelete
  5. For your employee dimension you could go like this: Make the PK the dimension key attribute and set it to hidden, then take Name and FirstName as additional attributes (maybe also the concatenation). For the Name attribute only define Name as the key column, same with FirstName. In this case you will not get any difficulties.
    However, if you set the key property for the Name column to the PK and you have two rows with the same PK but different name you will get the error(how could that be? PK should be a unique surrogate key, but if it is a business key and you have something like valid from/to it could appear more than once).
    Multiple key fields for an attribute are mainly used for defining attribute relationships. I have another post about this here in my blog: http://ms-olap.blogspot.com/2008/11/turning-non-natural-hierarchy-into.html

    ReplyDelete
  6. Hi Again,
    Now i see better what you was saying:
    My problem was when i used the fact table as a dimension (Fact dimension using the ID as a filed to build the relashionship in Dimension Usage), and what made the problem (as i think)is: when i leaved some other fields (That can have dupplicate values) here is an example:

    Dim = Transactions (ID,FareCardNumber, FileName)
    Which is the Fact table in same time.

    ID=1, FareCardNumber=000000110000,FileName=Test1

    ID=2, FareCardNumber=000000110000,FileName=Test2

    ID=3, FareCardNumber=000000110000,FileName=Test3

    ID=4, FareCardNumber=000000111111,FileName=Test3


    Now my quetion is:
    i need the informations contained in FareCardNumber and FileName fields to be displayed in some reports, and, to be used as a Parameter (filter).

    As u can see too, this two fields, can have many values (That we can't expect).
    SO:
    Have i set their properties (ColumnKey to ID)--> that means (as i think), there will not be any calculated aggregation by Test3 for example.
    or
    Have i to put them, everyone in a Seprate dimension by creating that in the DSV (View)? --> how can i implement that without the expected values?
    Or
    Have i to use the concept of Kimball (Junk Diemnsions)? --> how can i implement that if i havent the expected values?
    or
    Another question of design rules (or best practice): is the fact that a field is used in a report as a filter or parameter make us choose it as a dimension?

    I hope that i was enough clear.
    And thank you very much for your last answrs.

    ReplyDelete
  7. In your case I would suggest to create a separate view for the dimension using select distinct (and leaving the ID out). This can be done in the DSV or persisted using SSIS and is more flexible than the fact dimension. For each attribute of this dimension try using the field as the key, e.g. Key column for FareCardNumber is FareCardNumber itself. By this, each number only apears once in your new dimension.
    Regarding your other question: Indead, if you want to filter your data by some information, this is a very strong sign that this information will be in a dimension. Usually your report will run some MDX and usually it will slice by your filter. This works best if it is a dimension (same with column or row headers).

    ReplyDelete
  8. Hi,

    I think I have a similar problem but I cant understand where the problem is. I'll try to use your example to make it easier to understand.My table looks like:
    product id product group product name
    1 A AAA
    2 A BBB
    3 B AAA
    4 B AAA

    I create a dimension with the product key as the primary key. I then add the product group which is fine but when i add the product name I get the duplicate key error.
    What I cant understand is why I dont get the same error when i add the product group that has duplicate records?
    I dont want to change the key columns (solution 3) because I want to be able to aggregate the data by product group and product name. If you set the key column=product id then product name "A" will appear twice and not once.
    Any ideas what the problem is?

    ReplyDelete
  9. Hi Fotios, If you choose key = [product name] for the dimension attribute "product name" and leave name/value empty, you will not have the duplicate key error on product name and each product will only appear once in the attribute members. The error would only happen on key=[product id] and name=[product name].
    If, for reasons of attribute relationship and hierarchies, you still need a combined key ([product id] + [product group]) you can create an additional hidden attribute, that is only used in the hierarchy.
    Hope this helps.
    Cheers,
    Hilmar

    ReplyDelete
    Replies
    1. Hi Hilmar,
      i have the same situation as Fotis mentioned in his thread.you are mentioning the default behaviour that each attribute has the key of it's name and name/value is always empty.

      but i can't able to understand the causes of this problem and why it's happening on some attribute and others are fine as metniod in the Fotis example.

      what you suggest regarding the keyduplicate error option which also working fine in my case as well as.what you recommend about this option.

      could you please explain in details with the help of Fotis example which exactly the my problem.

      thanks in advance.

      Cheers,
      Zaim Raza

      Delete
  10. I haven't tested that but I have found another solution that works and you may be intested to know. Since i know that the name and group will not be unique and they are not part of the primary key of the dimension I am setting in the dimension properties the ErrorConfiguration/KeyDuplicate errors value to ignore errors.
    By doing that it ignores the duplicate values and groups them based on the name and not the key.

    Thanks for the help and quick response!

    ReplyDelete
  11. Reson 2 was my problem

    ReplyDelete
  12. Try to find and remove the white space in the columns or set trimming to none in keycolumns properties section :)

    ReplyDelete
  13. I Had a problem with collation, the SQL uses (SQL_Latin1_general_CP1_CI_AI) and thinks "C"<>"Ç" and Analysis uses Latin1_General_CI_AI and thinks "C"="Ç", and my customer don´t want to change de SQL DB Collation to Latin1_General_CI_AI :(

    ReplyDelete
  14. No final solution, just workaround... at a view that casts the strings to right collation. Any sugestion????

    ReplyDelete
  15. Workaround solution, VIEW with collation casting. "FIELDNAME COLLATE XXXXXXX"

    ReplyDelete
  16. Hi

    Great post, you helped me, but there is a 4th scenario, which I had suffered from, that I think you should add, to make the list full.

    It's basically a similar case to #3, I had an attribute Units and Price, when there is a NULL in it, it's converted to 0, but there are some lines in the dimension table with the value of 0, so what I had to do to solve the issue, was to deal with the null in the DWH, by converting null to let's say -1, that way the 0 value, do not conflicts with the null converted to 0.

    ReplyDelete
  17. Thank you for your tip. I had the very same issue and had nulls sprayed all over my tables. I am somewhat new to this field, so still learning..Thanks!

    ReplyDelete
  18. Great article, I had another scenario that took me a long time to identify.

    Carriage return at the end of some records caused the same error message. Walking through the 'Long explanation Reason 1', and selecting the distinct field helped to see the hidden duplicate.

    Solution A: Combine the values by applying trimming in a "Named Query" in the Data Source View to eliminate white spaces.

    Solution B: To show both values as distinct items in dimension, under the attribute properties KeyColumns -> Trimming , change from "Right" to "None".

    ReplyDelete
  19. I just came across a different cause for this error which took me a few good hours to find.

    The following error popped up when processing a dimension:

    Errors in the OLAP storage engine: A duplicate attribute key has been found when processing: Table: 'dbo_vwDimSCDProperty', Column: 'InsurancePolicyNumber', Value: 'HOM108382271'. The attribute is 'Insurance Policy Number'.

    The root cause of the error was duplicate values with one of the values containing some non-displayable characters.

    The column is defined as CHAR(15). The value 'HOM108382271' from the error message is 12 characters long. The table has 2 rows starting with these 12 characters, but one has 3 spaces and the other has CR+LF+Space at the end. DISTINCT thinks they are different, but SSAS thinks they are duplicates!

    ReplyDelete
  20. Just want to thank you for taking the time to explain why SSAS is emitting that misleading error message.

    ReplyDelete
  21. Thank you for this very useful article and replies. A long struggle with an attribute key field ended in it containing non-ASCII characters (a ZIP code field loaded from a free format text source). The solution in this case was to set the 'Key Columns / InvalidXMLCharacters' property to 'Remove'.

    SSAS is really good, and the more you learn about it the better it gets, even though often it can cause some headaches!

    ReplyDelete
  22. Thank you ! Very helpful !

    ReplyDelete
  23. Thanks for the explaination. The first reason you site was my problem, and is the least sited on other blog sites. I'll be back.

    ReplyDelete
  24. Great explaination,
    helped me out!

    ReplyDelete
  25. Similar to Reason #1. If the data whose data type is string has some space-like character, we will run into this problem too. We have to remove or replace such characters with space

    ReplyDelete
  26. Hi Hilmar,
    Thank you so much for your post it is really helpful. I have a situation like I have one Fact table in which I have order key and package key. And according to business requirement one order key can have multiple packages. i.e. multiple package key and order key will be repeated but package key will be different. I am getting a duplicate attribute key has been found error. I am not sure what should i do here because i do not want to change the actual fact table. Could you please give any suggestions? it is urgent. Many thanks for your help.

    ReplyDelete
    Replies
    1. The duplicate key error means that you have an issue with a dimension table. So, without knowing all of the details, it sounds like you should have two separate dimension tables (one for orders, one for packages, each with the corresponding unique key) or put both entities in one dimension table (with a surrogate key that is generated per combination of order and package).
      I guess you have already chosen one of those two approaches. It is also possible that there is another reason for the error. Did you check the attribute that was mentioned in the error message? Be sure that the key of that attribute is really valid. You could also create a backup of the cube dimension and then clear all of the attribute relationships to see if the error is a false relationship. And if this doesn't work, you may want to sequentially take one by one attribute out in order to see, if a specific attribute causes the error.

      Delete
    2. Hi Hilmar,

      Thank you so much for your prompt reply. The other think i do not understand is that The cube is running and being processed on daily basis and doing the incremental processing. Actually, I have not developed the cube I am new to the department. When I deploy the cube on new machine and process from scratch i get these hierarchy duplication errors. What do you thing might be the problem? WHy it is running ok in schedule but not on the processing from scratch?

      In above problem what I have done in order key properties i made the collection of order key and packaged key. In the name column I have selected the order key. It has resolved the duplicate hierarchy issue. I am not sure did I apply the right approach for the issue I have mentioned above? Many thanks for your help

      Delete
    3. Having a key collection of order key and package key is the right approach if you combine orders and packages in a single dimension. This should work as desired.
      If a full process (from scratch) fails, but the daily loads run, you still have a serious issue in one of your dimensions. Usually, dimensions are processed in update mode in the daily load. But certain errors in the dimension structure can only be found in a full process mode for example rigid attribute relationship). So, you should make sure that your dimensions can be processed in full mode with no errors. To keep things simple, you only need to process the dimensions (measure groups don't have a side effect here and dimensions are usually faster in processing). If the dimensions process correctly, you don't have an issue with duplicate attributes.

      Delete
    4. Thank you so much Hilmar, very helpful feedback, Now I will process each dimension separately and make sure they are error free. I will let you know here. might ask few more ambiguities later.
      many thanks.

      Delete
    5. Hi,
      Thanks your post is very helpful. I am getting the following error
      Errors in the OLAP storage engine: The attribute key cannot be found when processing: Table: 'dim_Organisation', Column: 'Organisation_x0020_Key', Value: '34700'. The attribute is 'Organisation Key'

      I have checked all dimension tables and fact tables and also underlying database, and there is no organisation key 34700. I am unable to understand why it is generating this error. Any Idea?

      Delete
    6. The error message means that there is a fact record with that key but there is no matching dimension member. You should check your fact table or rows with missing dimension members. Usually the key handling is done in the ETL process to prevent such errors. If the dimension row is not missing, it's likely that the dimension is not processed to the most current state of the dimension table.

      Delete
  27. Hi Hilmar,

    Sorry I am bothering you again, I am getting this error now Warning Errors in the OLAP storage engine: The attribute key cannot be found when processing: Table: 'dim_Organisation', Column: 'Organisation_x0020_Key', Value: '34700'. The attribute is 'Organisation Key'.

    I have checked all dimensions and fact tables and underlying database and there is no organisation key '34700' and i am getting the above error. I am unable to understand, any idea what am I missing here?

    Many thanks

    ReplyDelete
  28. Thanks Hilmar for all your help. I am getting the error in dimension usage. there is one referenced relationship, reference dimension is organisation and intermediate dimension is ship to customer Address. And relationship is by organisation key in both dimensions. I am getting this error.

    ReferenceMeasureGroupDimension [SLBU].[Sales].[Organisation] : The intermediate granularity attribute 'Organisation Key' should have 'AttributeHierarchyEnabled' set to 'true'.

    I have checked the organisation key property AttributeHierarchyEnabled in both dimensions and it is set to true. I am unable to understand why it is saying set it to true. Many thanks.
    Regards,
    Mustafa

    ReplyDelete
    Replies
    1. I just checked again, but there is no other setting to be made but the AttributeHierarchyEnabled=true on the intermediate dimension's Attribute that is just as the key to the reference dimension. Could you please check your intermediate dimension again?

      Delete
    2. Hi Hilmar,
      Thank you so much for all your help. I have checked there were some issue with referenced dimension. Initially I have deleted the relationship and recreated. but i was getting the same error. Now I have deleted the cube and added the dimensions one by one and created the same relationship and it is working. I did not change anything in data source view. I am unable to understand why it was giving error of Organisation Key' should have 'AttributeHierarchyEnabled' set to 'true'. Is there anything i was missing here?

      Delete
  29. Very helpfull, but I 'd like to add a cause of the situation. In our database the problem was not the fact that it had duplicate values, that was as it was ment to be, but the translation of a value to a key gave this error because there were 'special' characters in some values. For example: the value 44RF0123 exists in de db and de value 44RF0123 followed bij the tab character (0x09). (Of cause an entry error). By processing the cube, the same key ('44RF0123') will be made bij 2 different values (one with and one without the accidental added tab character). Repairing the original values bij clearing the tab-characters solved the problem.

    Jack

    ReplyDelete
    Replies
    1. Hi Jack, excelent point. I already added some more cases in the newer blog post which you can find here:
      http://ms-olap.blogspot.de/2013/08/a-duplicate-attribute-key-has-been.html

      Delete
  30. Very nice, Third solution solved my problem

    ReplyDelete
  31. Thank you so much sir....................

    ReplyDelete