## Sunday, January 19, 2014

### T-SQL 2012 Window Functions in ELT (part 3) - calculating price elasticity

SQL Server 2012

To complete this short series about SQL Server 2012 window functions for ELT processing, I’d like to show an example of a calculation typically needed in retail: measuring the responsiveness of the sales quantity to changes in the unit price. This measure is called the price elasticity of demand. There are different methods for calculating the price elasticity. A simple one is to divide the percentage change of the quantity by the percentage change of the unit price. Assuming Q1 was the quantity before the price change, Q2 is the quantity after the price change, and P1 and P2 are the corresponding unit prices, the elasticity can be calculated using this formula:

E = ((Q2 − Q1) / Q1) / ((P2 − P1) / P1)

For example, if the unit price is raised by 5% and you see a change in the sales quantity of –10%, then the elasticity would compute to –10%/5% = –2. In most cases, the elasticity is a negative value, as a higher price usually results in a smaller amount of goods being sold. There are other methods of calculating the price elasticity (point elasticity, arc elasticity) but I don’t want to focus on this too much here.
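By the way, this arithmetic is easy to verify directly in T-SQL. The price and quantity literals below are hypothetical values matching the percentages from the example above:

```sql
-- Price raised by 5% (20.00 -> 21.00), quantity down 10% (1000 -> 900): elasticity = -2
declare @P1 float = 20.0, @P2 float = 21.0;
declare @Q1 float = 1000.0, @Q2 float = 900.0;

select ((@Q2 - @Q1) / @Q1) / ((@P2 - @P1) / @P1) as PriceElasticity; -- returns -2
```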

Some approaches compute the price elasticity by product only, ignoring the fact that in most cases the responsiveness of the sales quantity to a price change depends significantly on the unit price itself. If the product is much cheaper than the same product from the competitors, a price increase of 10% may have no effect at all (it’s still cheap). However, if the price increase leads to a price which is higher than the competitors’ price, you may see a totally different effect on the sales quantity. So, for this example, we want to compute the price elasticity by the unit price of each product.

For the calculation we basically need the sales quantity by price and product. Although the AdventureWorks sample database does not provide a good data source for this purpose, I will still show an example based on this data because it is widely available. Let’s try the reseller fact table here:

select
  ProductKey
, UnitPrice
, Sum(OrderQuantity) OrderQuantity
from [dbo].[FactResellerSales]
group by ProductKey, UnitPrice
order by ProductKey, UnitPrice

The result of the query shows the total quantity by price and product:

As you can see, the order quantity differs between unit prices. However, we cannot really compare the order quantities for different prices using the query from above. The reason is simple. Imagine you have a product with a price of 1.5 EUR being sold for 12 weeks at 1,000 pieces per week. Now, for only one week, you change the price to 1.2 EUR and observe a higher demand of 2,000 pieces. The query from above would give the following result:

| Product | Unit Price | Quantity |
| --- | --- | --- |
| My Product | 1.5 EUR | 12,000 |
| My Product | 1.2 EUR | 2,000 |

Just using the total quantity by unit price would lead to the wrong conclusion that lowering the price from 1.5 to 1.2 EUR results in fewer sales. So I think it’s clear that we need to normalize the results before being able to compare them. For example, if we normalize by the number of weeks here, we get the desired result:

| Product | Unit Price | Quantity By Week |
| --- | --- | --- |
| My Product | 1.5 EUR | 1,000 |
| My Product | 1.2 EUR | 2,000 |

Is time the only dimension that needs to be normalized? Usually not. Imagine you run a regional promotion where only 10 out of 100 stores sell the product at the price of 1.2 EUR. Again, the 90 stores selling at 1.5 EUR cannot be compared to the 10 stores selling at 1.2 EUR.

Usually we need to normalize our results by

• Number of days the product is being sold
• Number of stores that are selling the product

As a consequence, for our example based on the FactResellerSales table in the AdventureWorks database, I’m going to modify the query from above as shown below:

select
ProductKey
, UnitPrice
, Avg(OrderQuantity) OrderQuantity
from
(
select ProductKey, UnitPrice, Sum(convert(float,[OrderQuantity])) OrderQuantity
from [dbo].[FactResellerSales]
group by ProductKey, UnitPrice, ResellerKey, OrderDateKey
) SalesDetails
group by ProductKey, UnitPrice
order by ProductKey, UnitPrice

The inner query aggregates the original data by product and unit price and also by reseller and order date, where the latter two are important to separate the aggregates for the normalization. We don’t need ResellerKey or OrderDateKey in the output column list; we only need the separate rows. This allows us to compute an average quantity in the outer query. Here’s the result:

As you see, the values differ a lot from the results of the first query. For practical use, you would usually filter on a specific date range (for example, the last 12 months) and also group by other attributes (for example, region or even store if you have enough data). To keep things simple, I’m not going to do this here.
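Such a date filter could be added to the inner query, for example, like this (just a sketch; the date literals are a hypothetical 12-month window, relying on the yyyymmdd encoding of OrderDateKey in AdventureWorks):

```sql
select
  ProductKey
, UnitPrice
, Avg(OrderQuantity) OrderQuantity
from
(
    select ProductKey, UnitPrice, Sum(convert(float, [OrderQuantity])) OrderQuantity
    from [dbo].[FactResellerSales]
    where OrderDateKey >= 20080101 and OrderDateKey < 20090101 -- hypothetical 12-month window
    group by ProductKey, UnitPrice, ResellerKey, OrderDateKey
) SalesDetails
group by ProductKey, UnitPrice
order by ProductKey, UnitPrice
```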

In order to calculate the elasticity, we need to fetch the previous unit price and order quantity together with the current ones. This is where the window functions become helpful. To keep the results simple, I picked a single product here:

select
ProductKey
, UnitPrice
, Avg(OrderQuantity) OrderQuantity
, lag(UnitPrice,1) over (partition by ProductKey order by UnitPrice) PrevUnitPrice
, lag(Avg(OrderQuantity),1) over (partition by ProductKey order by UnitPrice) PrevOrderQuantity
from
(
select ProductKey, UnitPrice, Sum(convert(float,[OrderQuantity])) OrderQuantity
from [dbo].[FactResellerSales]
group by ProductKey, UnitPrice, ResellerKey, OrderDateKey
) SalesDetails
where ProductKey=470
group by ProductKey, UnitPrice
order by ProductKey, UnitPrice

The unit price and the order quantity can be used to plot the demand curve, which shows the relationship between price and sales quantity.

The higher the price, the lower the sales quantity in most cases. So the graph shows exactly what we expected. Usually the demand curve is not as linear as it is in my example here.

Let’s get back to the computation of the price elasticity. What you can see in the example from above is that a price change from 17.0955 to 18.995 results in a change of quantity from 44 down to 30.45. Using the formula from above, the elasticity computes as

((30.45 − 44) / 44) / ((18.995 − 17.0955) / 17.0955) ≈ −0.308 / 0.111 ≈ −2.77

In order to complete the computation of the price elasticity, I’m going to wrap the query from above into a CTE:

with SalesByPrice as (
select
ProductKey
, UnitPrice
, Avg(OrderQuantity) OrderQuantity
, lag(UnitPrice,1) over (partition by ProductKey order by UnitPrice) PrevUnitPrice
, lag(Avg(OrderQuantity),1) over (partition by ProductKey order by UnitPrice) PrevOrderQuantity
from
(
select ProductKey, UnitPrice, Sum(convert(float,[OrderQuantity])) OrderQuantity
from [dbo].[FactResellerSales]
group by ProductKey, UnitPrice, ResellerKey, OrderDateKey
) SalesDetails
where ProductKey=470
group by ProductKey, UnitPrice
)

select
ProductKey
, UnitPrice
, PrevUnitPrice
, OrderQuantity, PrevOrderQuantity
, case
when coalesce(PrevUnitPrice,0)<>0 then
((OrderQuantity-PrevOrderQuantity) / PrevOrderQuantity)
/
((UnitPrice-PrevUnitPrice) / PrevUnitPrice)
else null
end PriceElasticity
from SalesByPrice
where ProductKey=470
order by ProductKey, UnitPrice

The CASE expression performs the actual calculation of the price elasticity. Here is the result:

What you can see here is that for higher unit prices the sales quantity drops more significantly, resulting in a larger (negative) price elasticity.

To finish this post, some remarks on the price elasticity: although the computation is simple (especially if you’re using the window functions), for practical purposes the sales quantity is often corrected, for example for seasonal effects, promotions (uplift) or other effects. This gives a more realistic calculation of the price elasticity. You also need to make sure that you have enough sales data for each product to do this calculation; it doesn’t make much sense to group this calculation by store if you end up with only a few rows per price. And in order to plot this result, one option would be to fill the gaps in the unit price (maybe also using window functions) so that you can compare different products regarding their price elasticity on the same axis. Filling the gaps also allows you to compute the price for the optimal profit if you also have the unit costs available. But this would be a separate topic which can also be challenging.
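The gap-filling idea could be sketched like this (only a sketch; the price grid values are hypothetical and would in practice be derived from the data): build a common price grid per product and outer-join the observed quantities onto it, leaving NULL rows that could then be interpolated, e.g. with LAG/LEAD.

```sql
-- Hypothetical common price grid per product; observed quantities joined onto it
with PriceGrid as (
    select p.ProductKey, g.UnitPrice
    from (select distinct ProductKey from [dbo].[FactResellerSales]) p
    cross join (values (5.0), (10.0), (15.0), (20.0), (25.0)) g(UnitPrice)
)
select g.ProductKey, g.UnitPrice, s.OrderQuantity -- NULL where no sales at that price
from PriceGrid g
left join (
    select ProductKey, UnitPrice, Sum(convert(float, [OrderQuantity])) OrderQuantity
    from [dbo].[FactResellerSales]
    group by ProductKey, UnitPrice
) s on s.ProductKey = g.ProductKey and s.UnitPrice = g.UnitPrice
order by g.ProductKey, g.UnitPrice;
```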

## Saturday, January 4, 2014

### T-SQL 2012 Window Functions in ELT (part 2)

SQL Server 2012

In my last post I explained some of the capabilities of the SQL Server 2012 window functions. Window functions allow us to do computations involving other rows in a result set. Much like in my last post, I’m using the same sensor source for some more demonstrations of window functions and how they can help us in data transformation. Here is a sample of my sensor data:

As you can see, the sensors deliver their data approximately every 15 minutes. But imagine you need the data in exact 15-minute intervals, with values interpolated based on their distance to the 15-minute slot. Again, ETL (a script task, for example) or a cursor comes to mind, but let’s assume we want to do the computation in T-SQL. How could we do that?

Here comes a simple approach for such kinds of problems:

• Perform a union all of the existing data with a fixed interval time stamp data.
• Sort by timestamp and compute time difference to the row before/after
• Perform the interpolation

For the time stamp data I use a table like the following one, which contains the true 15-minute intervals:
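Such a table could be generated, for example, like this (a sketch; I’m assuming a table dbo.[Time] with a single datetime column [Time] and, for simplicity, only one hypothetical day of slots):

```sql
-- Sketch: fill dbo.[Time] with the 96 15-minute slots of a single (hypothetical) day
with N as (
    select top (96) row_number() over (order by (select null)) - 1 as n
    from sys.all_objects
)
select dateadd(minute, 15 * n, convert(datetime, '20140101')) as [Time]
into dbo.[Time]
from N;
```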

The union-operation brings both tables together:

select SensorID
, [DateTime]
, Temperature
, Throughput
, FailCount
, 'Orig' as [Type]
from [dbo].[SensorData]
union all
(
select distinct SD.SensorID, T.[Time], null,null,null, 'Gen' from [dbo].[SensorData] SD
cross join [dbo].[Time] T
)
order by SensorID, [DateTime]

I included an information if the data is from the original data set (‘Orig’) or from the generated time stamps (‘Gen’). The result looks like this:

For a more readable query, I load this into a temporary table (#tmp) by wrapping the statement from above with a

select * into #tmp
from ( << query from above >>) Mix

The next step is to compute the previous and next values for the generated rows of data. I load this into a temporary table (#tmp2) too, and also compute the time difference to the previous and next row in seconds:

select
*
, LAG([DateTime],1) over (partition by sensorid order by [DateTime]) TimeLag

, coalesce(DATEDIFF(second, LAG([DateTime],1) over (partition by sensorid order by [DateTime]), [DateTime]),0) SecondsLag
, coalesce(DATEDIFF(second, [DateTime], LEAD([DateTime],1) over (partition by sensorid order by [DateTime])),0) SecondsLead

, LAG([Temperature],1) over (partition by sensorid order by [DateTime]) TemperatureLag
, LEAD([Temperature],1) over (partition by sensorid order by [DateTime]) TemperatureLead
, LAG([Throughput],1) over (partition by sensorid order by [DateTime]) ThroughputLag
, LEAD([Throughput],1) over (partition by sensorid order by [DateTime]) ThroughputLead
, LAG([FailCount],1) over (partition by sensorid order by [DateTime]) FailCountLag
, LEAD([FailCount],1) over (partition by sensorid order by [DateTime]) FailCountLead
into #Tmp2
from #Tmp

The most important columns are shown here (only for the temperature):

The next task is to do the interpolation. I’m using simple linear blending based on the time difference, updating the records of type ‘Gen’ appropriately. On a PDW, this would not be a recommended operation; you would rather CTAS (create table as select, similar to select into) the result to a result table:
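The blending weights each neighboring reading by its distance in time to the slot; a quick check with hypothetical values:

```sql
-- Previous reading 20.0 taken 300 s before the slot, next reading 22.0 taken 600 s after it;
-- the interpolated value comes out closer to the nearer (earlier) reading
declare @TemperatureLag float = 20.0, @TemperatureLead float = 22.0;
declare @SecondsLag float = 300.0, @SecondsLead float = 600.0;

select (@TemperatureLag * @SecondsLead + @TemperatureLead * @SecondsLag)
     / (@SecondsLag + @SecondsLead) as Temperature; -- about 20.67
```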

update #Tmp2
set
Temperature =
case
when TemperatureLag IS NULL then TemperatureLead
when TemperatureLead IS NULL then TemperatureLag
else (TemperatureLag * SecondsLead + TemperatureLead * SecondsLag) / (SecondsLag + SecondsLead)
end

, ThroughPut =
case
when ThroughputLag IS NULL then ThroughputLead
when ThroughputLead IS NULL then ThroughputLag
else (ThroughputLag * SecondsLead + ThroughputLead * SecondsLag) / (SecondsLag + SecondsLead)
end