Time series analysis is widely used to forecast logistics, production or other business processes. Usually you want to know whether the time series contains a trend or seasonality, because this can support forecasting and planning. However, there are different ways to understand the word trend. It often refers to historical changes in the data, but for me a trend is not something that happened in the past (that is more of a historical drift); a trend implies a prediction of future behavior. In other words, a positive trend means that the growth is likely to continue.

Let’s illustrate this with a simple example:

Hmm, this clearly looks like there is a trend. In order to build up confidence, let’s add a linear regression for this graph:

Quite impressive. We could also train an auto ARIMA model and forecast our data:

We clearly see that even a sophisticated ARIMA model found a trend (drift) and forecast the series accordingly.

**Trend detected! Case solved?**

Talking about a trend in data always implies that the process is likely to follow the trend (at least for the near future). Up to now, we only did some basic chart analysis, and I wrote nothing about the source, driver or generator of the data. Without knowing the nature of our time series, we are on dangerous ground when it comes to forecasting. Please don’t be disappointed, but I used a so-called random walk model to plot the graph from above. In a random walk model, the value for the next period is calculated relative to the value of the current period. In my case, the difference between one point and the next is just a normally distributed **random number**. Random walk processes often look like they have a trend or even seasonality. But in this example, the chart may go in either direction from here; it’s purely random. In other words, it’s dangerous to speak of a trend in this case. So whenever we look at trend detection, we have to understand why the trend is likely to extend into the future.
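To make the random walk idea concrete, here is a minimal sketch in Python. It is not the code used for the chart above; the function name `random_walk`, the start value of 0 and the step standard deviation of 1 are assumptions chosen for illustration.

```python
import random


def random_walk(n_steps=100, seed=None):
    """Return a random walk that starts at 0 and takes n_steps steps.

    Each step is drawn from a normal distribution with mean 0 and
    standard deviation 1 (an assumption made for this sketch).
    """
    rng = random.Random(seed)
    walk = [0.0]
    for _ in range(n_steps):
        # Next value = current value + a normally distributed random number.
        walk.append(walk[-1] + rng.gauss(0.0, 1.0))
    return walk


walk = random_walk(100, seed=42)
print(f"{len(walk)} points, final value {walk[-1]:.1f}")
```

Running this with different seeds gives lines that drift up, drift down or hover around zero, although the generating process is identical every time.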

**Ok, no trend! Case solved?**

Obviously, since the chart was generated using a random number generator, we cannot really declare this a trend. But how likely is it that a random walk process produces such a drift? To find out, I plotted 1,000 random walk processes (each with 100 steps). The line from above is actually one of them. Here’s the result:

As you can see, most of them stay within a relatively narrow band indicated by the red lines (which are simply plotted from a square root function). Some went up to about +30 and some down to about –30, but those are a small minority.
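A square root band is what you would expect if the steps are independent with a standard deviation of 1: the variances of the steps add up, so the standard deviation after t steps is the square root of t. The following sketch checks how many simulated walks lie inside that band at a few points in time. The function name `share_inside_band`, the checkpoints and the unit step size are assumptions for this illustration, not details taken from the original simulation.

```python
import math
import random


def share_inside_band(n_walks=1000, n_steps=100, checkpoints=(25, 50, 100), seed=0):
    """Simulate random walks and return, for each checkpoint t, the share
    of walks whose value at step t lies within +/- sqrt(t)."""
    rng = random.Random(seed)
    inside = {t: 0 for t in checkpoints}
    for _ in range(n_walks):
        position = 0.0
        for step in range(1, n_steps + 1):
            position += rng.gauss(0.0, 1.0)  # assumed: standard-normal steps
            if step in inside and abs(position) <= math.sqrt(step):
                inside[step] += 1
    return {t: count / n_walks for t, count in inside.items()}


shares = share_inside_band()
for t, share in shares.items():
    print(f"step {t:3d}: {share:.1%} of walks within +/-{math.sqrt(t):.1f}")
```

Under these assumptions, roughly two thirds of the walks should fall inside the band at every checkpoint, which is the share a normal distribution puts within one standard deviation.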

If we take a look at the distribution of final values (the values we see after 100 steps), the histogram looks like this:

This looks pretty much like a normal distribution with a mean of 0 and a standard deviation of 10, which is the square root of 100 (the number of steps). This also explains the red lines in the chart above: they mark one standard deviation. What can we do with this information? First, we can look at the cumulative distribution (since I’m not interested in whether the trend is positive or negative, I used absolute values):

How do we read this chart? It gives us the percentage of cases that end outside a given tolerance from zero. For example, only 5% of all cases (50 out of 1,000) ended at a value bigger than 20.5 or smaller than -20.5 (the .95 confidence marker), and only 1% of all cases (10 out of 1,000) ended at a distance of more than 24.5 from zero (the .99 confidence marker). Our chart from above actually ended at 29.7, which puts it in the 0.1% tail (the .999 confidence marker). In other words, I used the single worst outlier (1 out of 1,000) of my set of lines to plot the chart at the beginning of this post.
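These thresholds can be reproduced approximately with a short simulation. The sketch below assumes standard-normal steps and uses a fixed seed of its own, so its empirical values will differ somewhat from the 20.5 and 24.5 quoted above, which come from a different sample. For comparison it also computes the thresholds of an exact normal distribution with a standard deviation of 10, which are about 19.6 and 25.8. The helper names `empirical_threshold` and `theoretical_threshold` are made up for this example.

```python
import random
from statistics import NormalDist

N_WALKS, N_STEPS = 1000, 100

rng = random.Random(1)  # fixed seed so the run is repeatable
# Absolute final value of each walk, sorted from smallest to largest.
abs_finals = sorted(
    abs(sum(rng.gauss(0.0, 1.0) for _ in range(N_STEPS)))
    for _ in range(N_WALKS)
)


def empirical_threshold(sorted_abs, confidence):
    """Distance from zero that roughly `confidence` of the walks stay below."""
    index = min(len(sorted_abs) - 1, int(confidence * len(sorted_abs)))
    return sorted_abs[index]


def theoretical_threshold(confidence, n_steps=N_STEPS):
    """Same threshold for a normal distribution with sd sqrt(n_steps)."""
    return NormalDist(0.0, n_steps ** 0.5).inv_cdf(0.5 + confidence / 2)


for confidence in (0.95, 0.99):
    emp = empirical_threshold(abs_finals, confidence)
    theo = theoretical_threshold(confidence)
    print(f"{confidence:.2f}: simulated {emp:.1f}, normal approximation {theo:.1f}")
```

With only 1,000 walks, the simulated thresholds in the far tail rest on a handful of lines, so deviations of a unit or more from the theoretical values are not surprising.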

**I’m … eh … confused …**

So, what does this all lead up to? Trend or no trend? Well, since we know that I used a random number generator, there’s frankly no reason to believe that the trend continues, and therefore it makes no sense to speak of a trend. However, if the chart had been generated from real-life data and we knew that the underlying process follows a random walk model, it would be extremely unlikely that the data contains no drift or trend: without one, only 1 out of 1,000 cases would behave in such an abnormal way. So, in that case I’d be pretty sure there is a trend in the data.
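One way to put a number on this reasoning for real data is to test whether the average step of the series differs from zero. The sketch below is one possible approach, not the method used in this post: it applies a simple z-test to the step-to-step differences and uses a normal approximation for the p-value. The function name `drift_test` and the example series (a walk simulated with an assumed drift of 0.3 per step) are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def drift_test(series):
    """Return (z, two-sided p-value) for the hypothesis 'the mean step is zero'.

    Assumes the steps are independent and uses a normal approximation.
    """
    steps = [b - a for a, b in zip(series, series[1:])]
    z = mean(steps) / (stdev(steps) / math.sqrt(len(steps)))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: a random walk with a true drift of 0.3 per step.
rng = random.Random(7)
series = [0.0]
for _ in range(100):
    series.append(series[-1] + rng.gauss(0.3, 1.0))

z, p = drift_test(series)
print(f"z = {z:.2f}, two-sided p-value = {p:.4f}")
```

For the chart in this post, an end value of 29.7 after 100 steps with a step standard deviation of about 1 gives z of roughly 2.97 and a two-sided p-value of roughly 0.003. That is the same order of magnitude as the 1 in 1,000 observed in the simulation.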

**Conclusion**

In order to detect a trend in a time series, make sure that you fully understand where the data comes from, how it is generated and what the characteristics of the time series are. Is it oscillating, is it a random walk process (randomness in the first derivative), or does the driver act on an even higher derivative (you could think of a randomly accelerating process)? Only if you understand the characteristics and the driver (the changing component) does it make sense to look for a trend and to analyze how likely it is that you can rely on the trend for future development.
