Search This Blog

Saturday, May 25, 2013

Using “cursors” in PDW

PDW v1/v2

Did I say cursor? Isn’t this an evil word? Shouldn’t we try as hard as possible to avoid them in database design and especially in a data warehouse?

Yes, sure. But there might be some patterns which make it useful to loop over a table, for example a configuration table, and do something with each line of the table. Since PDW currently doesn’t support cursors (why should it?), what can we do?

One option is to use the foreach-container in SSIS. It’s a good, reliable and easy way to implement loops.

However, if you need to, you can also do this using SQL. The following example shows a loop over all partitions of the FactSalesHeader table:

create table #FactSalesHeaderPartitions
SELECT sp.partition_number, prv.value AS boundary_value, lower( AS boundary_value_type, sp.rows,
row_number() over (order by sp.partition_number) AS RowNr
FROM sys.tables st JOIN sys.indexes si ON st.object_id = si.object_id AND si.index_id <2
join sys.partitions sp ON sp.object_id = st.object_id AND sp.index_id = si.index_id
join sys.partition_schemes ps ON ps.data_space_id = si.data_space_id
join sys.partition_range_values prv ON prv.function_id = ps.function_id
join sys.partition_parameters pp ON pp.function_id = ps.function_id
join sys.types sty ON sty.user_type_id = pp.user_type_id AND prv.boundary_id = sp.partition_number
where st.object_id = (select object_id from sys.objects WHERE name = 'FactSalesHeader')

declare @part_count int = (select count(*) from #FactSalesHeaderPartitions)

declare @i int = 1

while @i <= @part_count begin

    declare @partition int
    declare @boundary int
    select @partition=partition_numnber, @boundary=boundary_value 
      from #FactSalesHeaderPartitions where RowNr=@i

    -- do something with the partition, for example take it out, merge (ctas)
    -- it with new data and then switch the resulting partition into the
    -- fact table.
    set @i+=1

drop table #FactSalesHeaderPartitions

The idea is to put all the rows that you want to create a loop for in a temporary table, including a column with a row number. You can then create a simple while-loop starting with 1 and going up the the number of rows in the temporary table. Since the row number is also included in the temporary table, you can then select the correct values into your variables within the while loop.

Please note that the above SQL code for reading partitions uses PDW v1 AU 3.5 system tables. For other versions the table names may be different.

Sunday, May 19, 2013

Reminder for PASS SQLSaturday #230 (Germany)

Don’t miss the SQLSaturday on July 13, 2013. You can find more information about the location, registration and the agenda here:

Also take a look at the agenda of the Pre-Conference (July 12, 2013) with interesting speakers and presentations: Here is the link for the details and the registration:

Saturday, May 18, 2013

Connecting to PDW from PowerPivot

PDW v1/v2

When connecting to a PDW from PowerPivot, there might be some confusion about what to enter as the server name.

Confusion might even start at the very beginning when choosing the proper external data connection. After opening PowerPivot, you might assume to find the PDW connection behind the “From Database” ribbon icon. But you have to choose “From Other Sources” instead:


In the following dialog, choose “Microsoft SQL Server Parallel Data Warehouse” as the database type:


Next, a dialog opens to enter the connection details.

While you can choose any name you like for the connection name, the server name may be confusing as you usually enter a host name or IP address here. But for the PDW you have to enter the path to an IDS file containing the connection information:


How does such in IDS file look like? Let’s take a look at an example:

ProviderName=Microsoft SQL Server MPP OLE DB Provider

Be sure to enter the correct IP address and port (default is 17001 for the TDS and 17000 for the Sequelink, but TDS is much faster), as well as the correct user/password.