Search This Blog

Sunday, August 31, 2014

Create table as select (CTAS) with “not null” column

PDW v1 | PDW 2012 | APS

CTAS (Create Table As Select) is a common way on the Parallel Data Warehouse (PDW) to transform one table into another table for example for calculations or for ELT (Extract Load Transform) processes.

The general syntax is quite simple and explained in the PDW help file:

CREATE TABLE [ database_name . [ dbo ] . | dbo. ] table_name
        [ ( { column_name } [ ,...n ] ) ]
    WITH (
        DISTRIBUTION = { HASH( distribution_column_name ) | REPLICATE }
            [ , <CTAS_table_option> [ ,...n ] ] 
    AS <select_statement>

<CTAS_table_option> ::=
    | CLUSTERED INDEX ( { index_column_name [ ASC | DESC ] } [ ,...n ] )
    | PARTITION( partition_column_name RANGE [ LEFT | RIGHT
        FOR VALUES ( [ boundary_value [,...n] ] ) ) ]

<select_statement> ::=
    [ WITH <common_table_expression> [ ,...n ] ]
    SELECT <select_criteria>


For example, a simple CTAS statement to copy the contents of one table (FactSales) to another table (FactSalesTmp) may look like this:

create table FactSalesTmp
with (distribution = hash(DateKey))
as select *  from FactSales

As you can see from the CTAS syntax definition above it’s not possible specify column properties like NULL/NOT NULL or constraints. In the help file there is a note about this:

You cannot specify NULL | NOT NULL for the columns in the CTAS statement; the nullability property is derived from the columns and expressions in the SELECT results.

However, if you’re using CTAS to fill an intermediate stage table used for partition switching operations the switch out table has to have exactly the same definition as the target table and this also means that the NULL/NOT NULL setting on the column has to be identical. So how is the nullability property derived from the expressions in the select results? To show this behavior, I’m using a simple fact table with the following definition:

create table FactSales (
      DateKey int not null
    , ProductKey int not null
    , StoreKey int not null
    , Quantity int not null
    , Amount decimal (13,4) not null
    , Costs decimal(13,4)
) with (distribution = hash(DateKey))

Using the simple CTAS statement from above (the one copying FactSales to FactSalesTmp) preserves the nullability of all columns. As long as you refer to existing table columns, the nullability is preserved from that columns.

But what about calculations? Let’s try the following CTAS statement:

create table FactSalesTmp
with (distribution = hash(DateKey))
as select *
, Amount*0.8 AS StandardCosts
from FactSales

I simply added another column here using a simple calculation. You can check the resulting table structure by choosing the context menu ‘View Code’ in Data Tools or by running the following query:

select name, is_nullable from sys.columns where object_id=object_id('FactSalesTmp')


What you see, is that the calculation (StandardCosts) is understood to be nullable by the PDW. How can we mark this column as not null the CTAS statement?

Rewriting the CTAS from above using coalesce doesn’t solve the problem:

create table FactSalesTmp
with (distribution = hash(DateKey))
as select *
, coalesce(Amount*0.8,0) AS StandardCosts
from FactSales


However, using isnull does the trick:

create table FactSalesTmp
with (distribution = hash(DateKey))
as select *
, isnull(Amount*0.8,0) AS StandardCosts
from FactSales



The reason for this behavior is the different handling of the data type for coalesce and isnull. So, if you want to have an expression being marked as ‘not null’ in a CTAS statement, use the isnull-function.

Sunday, August 10, 2014

The 'KeyColumns' #0 has NullProcessing set to 'UnknownMember', but the dimension doesn't have UnknownMember set to 'Visible' or 'Hidden'

SQL Server 2005-2014


By default, SSAS provides an automatically created member with the name ‘unknown’ for each dimension. This member is intended to be the home for all facts that don’t fit to a real member (provided from the data source). In the example above, fact data that does not match any of the listed product categories could be mapped to the unknown-element.


I’m saying ‘could’ and not ‘is’ because the rules for mapping fact data to the unknown-element can be configured in the dimension properties.

But using this mechanism has certain drawbacks:

  • Processing time increases a lot if one row is encountered which has to be mapped to unknown
  • Only one text (for example ‘unknown’, can be configured) for all attributes in the dimension
  • Cases are hard to find since you don’t see this mapping in the underlying data warehouse tables


In a good data warehouse design, ETL takes care of the correct mapping of fact data to its dimensions by using surrogate keys. Each join is then an inner join. In order to do so, dimension tables usually contain a row for the unknown element. Frequently, the surrogate key –1 is used for this row.

But following this best practice results in the SSAS dimension showing two elements for ‘unknown’: The dimension entry and the automatically created entry.


So, why does SSAS dimension have this build-in unknown element by default? If we build almost all SSAS cubes based on a good data warehouse design where the dimensions maintain their own unknown element, there is no need for an automatically created unknown element anymore. But since SSAS cube’s wizard is intended to work with most types of data structures, the unknown element is there by default. Without having ETL enforced surrogate keys you just cannot be sure, that every fact row maps to its dimensions.

So, as explained above, we want to remove this default unknown element in almost all SSAS cube development projects. This can be easily done in the properties dialog of the dimension:


There are four available options for the unknown member:

visible The unknown-member of the dimension exists and is visible
hidden The unknown-member of the dimension exists and is hidden
none The unknown-member of the dimension does not exist
automatic null The unknown-member of the dimension exists und is visible, if there are violations of the referential integrity (fact keys not found in in dimension).

Again, if we take care of the surrogate keys in the ETL process, there is no need for a dimension unknown element at all. So, the best option is, to disable it (UnknownMember set to none).

However, because of other default settings, you’re getting the following error when trying to deploy your SSAS model afterwards:

The 'KeyColumns' #0 has NullProcessing set to 'UnknownMember', but the dimension doesn't have UnknownMember set to 'Visible' or 'Hidden'

If you’re getting this error for the first time, it might not be clear, where to fix it, especially since there are two changes that need to be made:


1. Adjusting the dimension key attribute

If you look at the dimension, you’ll notice the red hash-line below the key attribute of your dimension.


In order to fix this, you’ll need to open the properties of that attribute. Now navigate to the key columns setting and expand the view for each of the columns (since you may have more than one column bindings for the attribute’s key) as shown in the following screenshot:


Here you can set the NullProcessing option to “Error”. The default is “UnkownMember” but since we just disabled this, this causes the error.

Remember to do this for each key column of this attribute.


2. Adjust null processing in the dimension usage.

The second place to modify is the dimension mapping. Therefore open the cube and go to the dimension usage tab. You’ll notice the red hash-line at the attribute (in my example, the Product ID):


In order to fix this, click the button near to the attribute (the one labeled with “-“) to open this dialog:


Click advanced to edit the properties of the mapping:


In the lower part of the dialog, you can set the “Null Processing” from “UnknownMember” to “Error”.

After this change you should be able to deploy the cube again.