Deviation, standard deviation, and variance are terms that confuse many people. They are often misunderstood, and it is hard to find enough examples that distinguish the concepts. Standard deviation is the most important of the three to get clear, because more advanced analytics concepts build on it.

These are useful measures of **Dispersion**. Simpler measures of dispersion include the range, quartiles, and percentiles.

Let us take them one by one.

The working file gives the stock of material in a shop's store for each day of a particular month. Please download the working file from here.

## Understanding **Deviation**

Deviation is the difference between an individual data value and the central tendency of the data set. It is simply the difference between two numbers, so the result is positive, negative, or zero.

In the file, deviation is measured from the mean, so it is called **Deviation from the Mean**. If you study regression, you will also come across deviation from the predicted value, which we will cover in a future article.

Is deviation very useful? Only when one needs to see how individual values differ from the central tendency.

In the working file, the deviation tells how each day's stock differs from the average stock of the month. On the first day, the stock was 111 units above the monthly average. On the third day, it was 88 units below the average.
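The calculation of deviation from the mean can be sketched in a few lines of Python. The `stock` values here are hypothetical, chosen only for illustration, and are not the figures from the working file.

```python
from statistics import mean

# Hypothetical daily stock counts (NOT the values from the working file).
stock = [350, 240, 150, 260, 200]

# Central tendency: the arithmetic mean of the stock values.
avg = mean(stock)

# Deviation from the mean: positive means above average, negative means below.
deviations = [x - avg for x in stock]

for day, (x, d) in enumerate(zip(stock, deviations), start=1):
    print(f"Day {day}: stock={x}, deviation={d:+.1f}")
```

Note that deviations from the mean always add up to zero, because the positive and negative values cancel out. That is why they cannot simply be averaged to describe the spread.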

## Understanding **Variance**

Variance is the average of the squared deviations. First square each deviation, then add up the squares and divide by the total number of data points. The working file does the same calculation.
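As a minimal sketch of these steps, assuming hypothetical stock values rather than those in the working file, the variance can be computed by hand and compared with Python's `statistics.pvariance`, which also divides by the total number of data points:

```python
from statistics import pvariance

# Hypothetical daily stock counts (NOT the values from the working file).
stock = [350, 240, 150, 260, 200]

avg = sum(stock) / len(stock)

# Step 1: square each deviation from the mean.
squared_deviations = [(x - avg) ** 2 for x in stock]

# Step 2: sum the squares and divide by the number of data points.
variance = sum(squared_deviations) / len(stock)

print("Variance by hand:   ", variance)
print("statistics.pvariance:", pvariance(stock))
```

Squaring makes every term non-negative, so the values above and below the mean no longer cancel each other out.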

But what does variance signify? On its own, it is hard to interpret. It is just the average of the squared deviations, and its unit is the square of the data's unit, so it cannot be compared directly with the deviation or with the data. Here it serves as an intermediate value on the way to something more interpretable, the Standard Deviation: to find the standard deviation, we first have to find the variance.

## Understanding **Standard Deviation**

Standard deviation is the square root of the variance. It has the same unit as the data, so comparison becomes easy. It tells how far the values of the data set lie from the central tendency. Suppose your house is in the middle of a lane, with 15 houses to the right and 15 to the left. It then becomes easy to describe where anyone lives relative to your house. Similarly, standard deviation helps in determining the position of each data value relative to the mean, and from that one can judge whether the data set is highly dispersed before doing further work.
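The idea can be sketched as follows, again with hypothetical stock values rather than the working file's data. The `positions` list is an illustrative addition: it expresses each value's distance from the mean in units of standard deviation, much like counting houses from the middle of the lane.

```python
import math
from statistics import pstdev

# Hypothetical daily stock counts (NOT the values from the working file).
stock = [350, 240, 150, 260, 200]

avg = sum(stock) / len(stock)
variance = sum((x - avg) ** 2 for x in stock) / len(stock)

# Standard deviation: square root of the variance, in the same unit as the data.
std_dev = math.sqrt(variance)

# Position of each value relative to the mean, in standard deviations.
positions = [(x - avg) / std_dev for x in stock]

print(f"Standard deviation: {std_dev:.2f}")
for x, p in zip(stock, positions):
    print(f"stock={x}: {p:+.2f} standard deviations from the mean")
```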

In the working example, we divided the X-axis of the curve into three standard-deviation bands on each side of the mean. Now it becomes easy to see which data lies in which band and how many standard deviations it is from the mean.

To learn how to create the curve, **click here**.

We can see that most of the data, 64%, lies within one standard deviation of the mean, i.e. between 116.13 and 361.34. In other words, on 64% of the days the stock was close to the mean stock of the month, varying little and staying within roughly 116 to 361.
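A share like this can be computed by counting the values that fall between the mean minus one standard deviation and the mean plus one standard deviation. The sketch below uses hypothetical stock values, so its result will differ from the 64% found in the working file.

```python
from statistics import mean, pstdev

# Hypothetical daily stock counts (NOT the values from the working file).
stock = [350, 240, 150, 260, 200, 230, 250, 310, 170, 240]

avg = mean(stock)
std_dev = pstdev(stock)  # population standard deviation (divides by N)

# Bounds of the first standard deviation band around the mean.
lower = avg - std_dev
upper = avg + std_dev

# Values lying within one standard deviation of the mean.
within = [x for x in stock if lower <= x <= upper]
share = len(within) / len(stock)

print(f"Band: {lower:.2f} to {upper:.2f}")
print(f"Share of days within one standard deviation: {share:.0%}")
```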

Accordingly, the stores manager can work on reducing the deviation of the remaining 36% of the stock data in the following month.

## Understanding Dispersion

The spread of data around the central tendency is called dispersion. Some bell curves are narrow, and some are more spread out.

If the data is highly dispersed, then:

- The data set may not be reliable, because the central tendency represents highly dispersed data less well.
- It may help to remove the outliers and collect proper data, or to improve the process so that it produces better data.
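To illustrate why the mean alone can mislead, here is a small sketch comparing two hypothetical data sets (both invented for this example) that share the same mean but differ greatly in dispersion:

```python
from statistics import mean, pstdev

# Two hypothetical data sets with the same mean but different spread.
tight = [230, 240, 250, 235, 245]
spread = [100, 380, 240, 50, 430]

for name, data in (("tight", tight), ("spread", spread)):
    data_range = max(data) - min(data)  # simplest measure of dispersion
    print(
        f"{name}: mean={mean(data)}, "
        f"range={data_range}, std dev={pstdev(data):.2f}"
    )
```

Both sets average 240, but only in the first one is 240 a fair description of a typical value.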

To read more about normal distribution, **click here**.