When dealing with numerical data sets, the terms “standard error” (standard error of the sample mean) and “standard deviation” (standard deviation of the sample) are often confused. I set out to find the best definitions and examples, so that the difference between them becomes clear in a simple and intuitive way.
Disclaimer: This text is a compilation of articles I found on the Internet. All the credit goes to the rightful owners. Please check the list of references at the end of this article.
The SD quantifies scatter – estimates how much the values vary from the mean
Variance measures the spread of your results. On its own, the variance isn’t the most useful statistic; however, taking the square root of the variance gives you the standard deviation which indicates (estimates) how much the entire set of data deviates from the mean.
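As a minimal sketch of how the variance and the SD relate, the Python snippet below computes both by hand for a small sample and compares them with the standard library. The values are made up for illustration, and the n - 1 divisor (the usual sample estimate) is an assumption, since the text above does not say which divisor it has in mind.

```python
import math
import statistics

# Hypothetical measurements, made up for illustration.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mean = statistics.mean(values)

# Sample variance: sum of squared deviations from the mean,
# divided by n - 1 (an assumption, see the note above).
variance = sum((x - mean) ** 2 for x in values) / (len(values) - 1)

# The SD is the square root of the variance, which brings it
# back to the same units as the data.
sd = math.sqrt(variance)

print(f"mean = {mean:.3f}, variance = {variance:.3f}, SD = {sd:.3f}")

# Cross-check against the standard library's own estimates.
print(statistics.variance(values), statistics.stdev(values))
```

Because the variance is in squared units, the SD is usually the easier of the two to read next to the raw data.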
In other words, the standard deviation (SD) quantifies variability or scatter (a measure of how spread out the numbers are), and it is expressed in the same units as your data. If the data are sampled from a Gaussian distribution, then you expect about 68% of the values (a bit more than two thirds) to lie within one SD of the mean and about 95% to lie within two SD of the mean.
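The 68% and 95% figures can be checked with a quick simulation. The sketch below draws a Gaussian sample and counts how many values land within one and two SD of the sample mean. The mean of 50, the SD of 10, the sample size and the seed are all arbitrary choices for illustration, and `fraction_within` is a hypothetical helper, not part of any library.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical Gaussian sample: mean 50, SD 10, 100,000 values.
sample = [rng.gauss(50, 10) for _ in range(100_000)]

mean = statistics.mean(sample)
sd = statistics.stdev(sample)


def fraction_within(data, center, half_width):
    """Fraction of values in [center - half_width, center + half_width]."""
    inside = sum(1 for x in data if abs(x - center) <= half_width)
    return inside / len(data)


within_1sd = fraction_within(sample, mean, sd)
within_2sd = fraction_within(sample, mean, 2 * sd)

print(f"within 1 SD: {within_1sd:.3f}")
print(f"within 2 SD: {within_2sd:.3f}")
```

With a sample this large, the two fractions should come out close to 0.68 and 0.95; smaller samples will wander further from those values.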
Contrary to popular misconception, the standard deviation is a valid measure of variability regardless of the distribution. For most distributions met in practice, about 95% of observations fall within two standard deviations of the mean, though those outside may all be at one end. (The only guarantee that holds for every distribution with a finite variance is Chebyshev’s inequality: at least 75% of values lie within two SD of the mean.) We may choose a different summary statistic, however, when data have a skewed distribution.
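To see what "those outside may all be at one end" looks like, here is a small sketch using a right-skewed sample. The exponential distribution, the sample size and the seed are assumptions picked for illustration; any strongly skewed data would make the same point.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical right-skewed sample: exponential with rate 1.
sample = [rng.expovariate(1.0) for _ in range(100_000)]

mean = statistics.mean(sample)
sd = statistics.stdev(sample)
low, high = mean - 2 * sd, mean + 2 * sd

# Count the values that fall outside the 2 SD limits on each side.
below = sum(1 for x in sample if x < low)
above = sum(1 for x in sample if x > high)
within = 1 - (below + above) / len(sample)

print(f"2 SD limits: [{low:.2f}, {high:.2f}]")
print(f"fraction within limits: {within:.3f}")
print(f"outside below: {below}, outside above: {above}")
```

The lower limit is negative while every exponential value is non-negative, so nothing can fall below it: all the values outside the limits sit in the long right tail.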
This figure shows three sets of data, all with exactly the same mean and SD. The sample on the left is approximately Gaussian. The other two samples are far from Gaussian yet have precisely the same mean (100) and standard deviation (35).
This graph shows that interpreting the mean and SD can be misleading if you assume the data are Gaussian when they are not.
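The same effect is easy to reproduce with synthetic data. The sketch below does not recreate the samples from the figure; it builds two made-up samples (one roughly Gaussian, one bimodal), rescales both to mean 100 and SD 35, and then compares how many values fall within one SD of the mean. The helpers `rescale` and `fraction_within` are hypothetical names introduced for this example.

```python
import random
import statistics

rng = random.Random(2)  # fixed seed so the run is repeatable


def rescale(values, target_mean, target_sd):
    """Shift and stretch values so they have the given mean and sample SD."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [target_mean + target_sd * (x - m) / s for x in values]


def fraction_within(data, k):
    """Fraction of values within k sample SDs of the sample mean."""
    m = statistics.mean(data)
    s = statistics.stdev(data)
    return sum(1 for x in data if abs(x - m) <= k * s) / len(data)


# Roughly Gaussian sample.
gaussian_like = rescale([rng.gauss(0, 1) for _ in range(1000)], 100, 35)

# Bimodal sample: two tight clusters far apart.
two_clusters = [rng.gauss(-3, 0.5) for _ in range(500)]
two_clusters += [rng.gauss(3, 0.5) for _ in range(500)]
bimodal = rescale(two_clusters, 100, 35)

for name, data in [("gaussian-like", gaussian_like), ("bimodal", bimodal)]:
    print(
        f"{name}: mean = {statistics.mean(data):.1f}, "
        f"SD = {statistics.stdev(data):.1f}, "
        f"within 1 SD = {fraction_within(data, 1):.2f}"
    )
```

Both samples report the same mean and SD, yet the share of values within one SD differs noticeably, which is why the two summary numbers alone cannot tell you the shape of the data.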