Uncategorized

TIME SERIES ANALYSIS USING REGRESSION – KNOWING TREND AND CYCLICITY

Time series analysis is one of the most used technique to know the trend and seasonality of any business. Or it is used to check and measure the dependent variable over changes is independent variable over a period of time. If you want to see the sales pattern of your company, economic parameters of a country, customer footfall in a branded store or even the pattern of attrition or employee churn rate in an organization.

Time Series Components

What are the different components that can be brought out of your data using time series analysis?

  1. Trend – Year on Year change, predictable but uncontrollable
  2. Seasonality – Within a year change, predictable but uncontrollable
  3. Cyclicity – Change across several years, unpredictable, can be known only after it has occurred
  4. Irregular or Random Variation – Change because of some unique thing occurring at a particular time, cannot be predicted

The most common way to see trend is by using the historical data and draw a graph. The zig zag lines of increasing and decreasing slope is the most common way to apprehend the pattern of data. But can you find out more from the graph? By this way you are going to miss the following: –

  1. The value of growth or decline, period to period, i.e. Trend
  2. The value of growth or decline within the period, i.e. Seasonality

Trend and Seasonality

Trend tells by how much the dependent variable will change from one period to another. Seasonality tells how much the dependent variable will change within the period. Take the example of temperature from time to time. Trend will show how the temperature is rising or decreasing from one period to another which may be year on year or month on month. Seasonality will tell you how the temperature increases or decreases within the period. If you are considering Yearly trend, then seasonality value can be day to day, month to month or quarter to quarter. For example, the temperature will be 2degree more as compared to last year is the trend. And the temperature changes which will be increasing and decreasing within the year shows the Seasonality.

Cyclicity

Cyclicity is the repetitive nature of the business in the market. Take the case of automobile industry, it has a growth of 4-5 years and then decline for 1-2 years. This continues in a repetitive cycle, but no one can predict the time of decline and growth. Many experts who have worked in the industry can have developed intuition on it but can not be confirmed by data and is unpredictable. On a broader sense, no one accurately predict when next economic depression is coming. Everybody knows that there will be an economic depression, but can not be predicted when and by how much it can affect the economy.

Irregular or Random Variation

Irregularity or Random variation is not any usual event, but some very strange and unpredictable act or event that disrupts the performance of your business or the dependent variable. Just imagine it is hot summer and sales of ice cream and cold colas sales will be near all time high. But suddenly there is disturbance in the weather and temperature goes down and summer is cut short. Think of all the companies selling ice creams and colas. You can-not predict such things as this is not under control.

Sale of Cosmetics Case

This blog will see a case of sales of cosmetics and use time series analysis technique to predict future sales. Similarly, many cases from other fields will be taken to understand how this technique can be implemented in those areas.

For more clarity, let us drive deep considering sales of products. The Marketing and Sales Head of a company wanted to build a model based on analytics to predict the number of sales of the product. He has to give the numbers to the Production head for planning of production for the next quarter. The case is clearly written in the Excel file. Click here to download the file.

Dummy Variable Treatment

You may note that the dummy variables have been used here which we will explain in short. The first thing to know is your data, the independent variable Region is a categorical variable. To use the same in regression, you have to use n-1 no of dummy variables. Here regions have 6 variables East, West, North, South, Central and North East. So, one no of less variable will be used i.e. 5. We will exclude North East as shown in the Regression tab of the excel file. The value 1 and 0 is used to show the presence or absence of that particular variable. In the following video you can see how the data is set before starting the regression for time series analysis and how the dummy variables are set up.

HOW DUMMY VARIABLES ARE PREPARED FOR REGRESSION

We will be using the solver technique to build the time series prediction model through regression. For a recap you may read this blog again and to revise the technique used and interpretation of the regression.

Into the Case

You can see from the excel file that the data is for the product cosmetics, from Quarter 1 of 2010 to Quarter 2 of 2015. The data is from 7 regional markets in the country. You can note that the value of unit cost and price have not changed in these years. Let’s neglect this as we are not into their pricing strategy now, but more interested in building time series model. We will come out with another blog in the future related to pricing strategy using analytics.

Here we will be using additive model which will factor in the trend and seasonality. Which will be as below –

Dependent Variable = Constant + Period no*Trend + Seasonality Index + Independent Variable*Independent Variable Index

Here we will find Yearly Trend i.e. changes year on year and Quarterly Seasonality. The following image and video here shows how to implement the solver technique in excel file after preparing the data.

ADDITIVE FORMULA IMPLIMENTATION BEFORE APPLICATION OF SOLVER FOR OPTIMIZATION
Go to Data and click Data Analysis. A new small window of Solver opens up as shown in the following video.
STEP 1
SOLVER IMPLEMENTATION

After using the solver technique, we get the result as shown in the Regression tab. Let us interpret the results.

Sales Interpretation from Time Series regression model

COMPONENTSVALUESINTERPRETATION
Constant673.78The Minimum sale of cosmetics will be about 674 units
Trend96.15The sales will increase by 96 units every year
East Region-51.93The sales will be less by 52 units in East region
West Region-186.79The sales in the West region will be less by 187 units
North Region133.75The sales in the North region will be more by 134 units
South Region385.64The sales in the South region will be more by 386 units
Central Region-196.50The sales in the Central region will be less by 197 units
Q149.59Sales in First quarter will be more than the average by 50 units
Q2176.86Sales in the Second quarter will be increase and will be more than the average by 177 unit
Q3197.14Sales in Third quarter will be the highest in the season and shall be more than the average by 197 units
Q4-423.59The sales in the last quarter will take a dip and will be lesser by 424 units than the average.
Discount24726.75If the Discount is 10%, sales will increase by 2473 units.

If the Discount is 20%, the sale will increase by 4946 units.
Display Promotion1710.97Sales will increase by 1711 units if the stores in any region use Display Promotion.
FINAL INTERPRETATION OF THE CASE RESULT

Result Interpretation

Trend – We can say that every year there is pattern of positive trend of increase in sales of cosmetics.

Seasonality – The sales is positive from Q1 and grows maximum at Q3 and then it decreases in Q4 and again raises in Q1.

Discount and Display promotion plays a very important part in the sales of the cosmetics.

So, to predict the sales of no of units in Q3 of 2015 we need to put the respective values in the model formed as below –

Unit sold = Constant + Trend*Year no + Seasonality index of the Quarter + Region index + Discount*Discount index +  Display Promotion

The Predicted values of sales in entire Q3 for all the regions are shown in the lower side of the Regression tab

Model’s Worthiness

Now it is also important to know of the model is valid and reliable. So, we have checked the same through data analysis method shown in the Correlation tab of the file. We see that the model is valid as Significant f value is very low. The R2 value is around 82%, which means that this regression model gives 82% accurate result as compared to considering average as predicted forecasted value. You can see from the graph between the Units Sold and the Forecasted values of the same period, that the forecasted line is almost overlapping the Actual Units sold. So the model seems valid.

But there is a problem in the p values of the all the independent values except Discount and Display promotion. These are dummy variables and are not giving much problems in the predicted values. However, you can remove these variables and do the regression yourself with rest of the independent variables with values less than 0.05 to check if R2 value improves. If you do it then let us know your findings in the comment or email us at shoutingdata@gmail.com.

We did the correlation test to check the strength of the relations between the dependent and independent variables in the same sheet. We see that Discount and Display promotion are collinear as they have high correlation. So, one of the variables can be removed from the model. But it is better to replace is with some other independent variable which is not related with either of the two variables to keep the model strong.

Conclusion

However, till the time the Marketing Manager finds out which variable can be considered and its value is present in their MIS system or not, till then he can use the current model built. If you think of any other variable that can be included here, then answer the same in the comment or email is. You may leave your suggestions or any needed improvement from your side in our blogs, in the comments. We will try our best to meet your expectations.