Search This Blog

Saturday, May 25, 2013

Using “cursors” in PDW

PDW v1/v2

Did I say cursor? Isn’t this an evil word? Shouldn’t we try as hard as possible to avoid them in database design and especially in a data warehouse?

Yes, sure. But there might be some patterns which make it useful to loop over a table, for example a configuration table, and do something with each line of the table. Since PDW currently doesn’t support cursors (why should it?), what can we do?

One option is to use the foreach-container in SSIS. It’s a good, reliable and easy way to implement loops.

However, if you need to, you can also do this using SQL. The following example shows a loop over all partitions of the FactSalesHeader table:

create table #FactSalesHeaderPartitions
SELECT sp.partition_number, prv.value AS boundary_value, lower( AS boundary_value_type, sp.rows,
row_number() over (order by sp.partition_number) AS RowNr
FROM sys.tables st JOIN sys.indexes si ON st.object_id = si.object_id AND si.index_id <2
join sys.partitions sp ON sp.object_id = st.object_id AND sp.index_id = si.index_id
join sys.partition_schemes ps ON ps.data_space_id = si.data_space_id
join sys.partition_range_values prv ON prv.function_id = ps.function_id
join sys.partition_parameters pp ON pp.function_id = ps.function_id
join sys.types sty ON sty.user_type_id = pp.user_type_id AND prv.boundary_id = sp.partition_number
where st.object_id = (select object_id from sys.objects WHERE name = 'FactSalesHeader')

declare @part_count int = (select count(*) from #FactSalesHeaderPartitions)

declare @i int = 1

while @i <= @part_count begin

    declare @partition int
    declare @boundary int
    select @partition=partition_numnber, @boundary=boundary_value 
      from #FactSalesHeaderPartitions where RowNr=@i

    -- do something with the partition, for example take it out, merge (ctas)
    -- it with new data and then switch the resulting partition into the
    -- fact table.
    set @i+=1

drop table #FactSalesHeaderPartitions

The idea is to put all the rows that you want to create a loop for in a temporary table, including a column with a row number. You can then create a simple while-loop starting with 1 and going up the the number of rows in the temporary table. Since the row number is also included in the temporary table, you can then select the correct values into your variables within the while loop.

Please note that the above SQL code for reading partitions uses PDW v1 AU 3.5 system tables. For other versions the table names may be different.

No comments:

Post a Comment