Library Guides: Statistics: Describing Data using the Mean and Standard Deviation

How do the mean and standard deviation describe data?

The standard deviation measures how far the data points are spread out from the mean:

A large standard deviation indicates that the data points are far from the mean, and a small standard deviation indicates that they are clustered closely around the mean.
When deciding whether sample measurements support reliable inferences about the population, the standard deviation of those measurements is of crucial importance.
In finance, standard deviations are often used as a measure of the risk associated with price fluctuations of stocks, bonds, etc.
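The first point above can be illustrated with a short Python sketch. The two data sets below are hypothetical values chosen only for illustration: they share the same mean but differ greatly in spread, and the sample standard deviation reflects that difference.

```python
import statistics

# Two hypothetical data sets with the same mean but different spread.
tight = [48, 49, 50, 51, 52]
wide = [10, 30, 50, 70, 90]

mean_tight = statistics.mean(tight)
mean_wide = statistics.mean(wide)

# statistics.stdev computes the sample standard deviation (divides by n - 1).
sd_tight = statistics.stdev(tight)
sd_wide = statistics.stdev(wide)

print(f"tight: mean = {mean_tight}, s = {sd_tight:.2f}")
print(f"wide:  mean = {mean_wide}, s = {sd_wide:.2f}")
```

Both means are equal, but the second data set has a much larger standard deviation because its points lie farther from the mean.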

Chebyshev's rule gives a lower bound on the percentage of data points that lie within a given number of standard deviations of the mean, and it holds for any data set.

Chebyshev's Theorem

It is possible that very few of the measurements will fall within one standard deviation of the mean (consider a bimodal distribution whose two modes lie at opposite extremes).
At least \(\frac{3}{4}\) of the measurements will fall within two standard deviations of the mean.
At least \(\frac{8}{9}\) of the measurements will fall within three standard deviations of the mean.
Generally, for any number \(k\) greater than 1, at least \(1-\frac{1}{k^2}\) of the measurements will fall within \(k\) standard deviations of the mean.
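As a minimal sketch, the general bound \(1-\frac{1}{k^2}\) can be evaluated in Python. The function name `chebyshev_lower_bound` is an illustrative choice, not part of any library, and exact fractions are used so the results match the values stated in the theorem.

```python
from fractions import Fraction


def chebyshev_lower_bound(k):
    """Minimum fraction of the data within k standard deviations of the mean.

    Hypothetical helper for illustration; the bound is only informative for k > 1.
    """
    if k <= 1:
        raise ValueError("Chebyshev's bound requires k > 1")
    k = Fraction(k)
    return 1 - 1 / k**2


for k in (2, 3, 4):
    bound = chebyshev_lower_bound(k)
    print(f"k = {k}: at least {bound} ({float(bound):.1%}) of the data")
```

For \(k=2\) and \(k=3\) this reproduces the fractions \(\frac{3}{4}\) and \(\frac{8}{9}\) given above.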


Example: A sample of size \(n=50\) has mean \(\bar{x}=28\) and standard deviation \(s=3\). Without knowing anything else about the sample, what can be said about the number of observations that lie in the interval \((22,34)\)? What can be said about the number of observations that lie outside the interval?

Solution:

The interval \((22,34)\) is formed by going two standard deviations below and above the mean: \(28 \pm 2(3)\). By Chebyshev's Theorem, at least \(\frac{3}{4}\) of the data are within this interval. Since \(\frac{3}{4}\) of \(50\) is \(37.5\), at least 37.5 observations are in the interval. A count of observations must be a whole number, so we round up and conclude that at least 38 observations must lie inside the interval \((22,34)\).

If at least \(\frac{3}{4}\) of the observations lie inside the interval, then at most \(\frac{1}{4}\) of them lie outside. We conclude that at most 12 \((50-38=12)\) observations lie outside the interval \((22,34)\).
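The arithmetic in this solution can be checked with a short Python sketch. The variable names are illustrative, and the values come from the example above.

```python
import math
from fractions import Fraction

# Values from the example above.
n = 50            # sample size
x_bar = 28        # sample mean
s = 3             # sample standard deviation
low, high = 22, 34

# Number of standard deviations from the mean to each endpoint.
k = Fraction(high - x_bar, s)

# Chebyshev's lower bound on the fraction of data inside the interval.
bound = 1 - 1 / k**2

# A count must be a whole number, so round the minimum up.
min_inside = math.ceil(bound * n)
max_outside = n - min_inside

print(f"k = {k}, bound = {bound}")
print(f"at least {min_inside} inside, at most {max_outside} outside")
```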

There are more accurate ways of calculating the percentage or number of observations that lie within a given number of standard deviations of the mean. Chebyshev's Theorem gives only a conservative lower bound, and the empirical rule we'll introduce next is just an approximation.

If the histogram of a data set is approximately bell-shaped, we can approximate the percentage of data within one, two, or three standard deviations of the mean using the empirical rule.

Empirical Rule

Approximately 68% of the measurements will fall within one standard deviation of the mean.
Approximately 95% of the measurements will fall within two standard deviations of the mean.
Approximately 99.7% of the measurements will fall within three standard deviations of the mean.
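The percentages in the empirical rule are rounded values for an exactly normal (bell-shaped) distribution. As a sketch under that assumption, they can be reproduced with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). Real data that are only approximately bell-shaped will match these figures only roughly.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

# Probability of falling within k standard deviations of the mean.
within = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}

for k, p in within.items():
    print(f"within {k} standard deviation(s): {p:.2%}")
```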


Example: Heights of 18-year-old males have a bell-shaped distribution with mean \(69.6\) inches and standard deviation \(1.4\) inches. About what proportion of all such men are between 68.2 and 71 inches tall? What interval centered on the mean should contain about 95% of all such men?

Solution:

Since the interval \((68.2,71.0)\) extends one standard deviation on either side of the mean, by the empirical rule about 68% of all 18-year-old males have heights in this range.

By the empirical rule, about 95% of the measurements lie within two standard deviations of the mean:

\[\bar{x} \pm 2s = 69.6 \pm 2(1.4) = (66.8,\,72.4)\]

Therefore, about 95% of the men are between 66.8 inches and 72.4 inches tall.
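The intervals in this example can be recomputed with a small Python sketch. The helper `interval_around_mean` and the `EMPIRICAL_RULE` table are illustrative names, not library features, and the results are rounded to one decimal place to match the precision of the example.

```python
# Approximate coverage from the empirical rule, keyed by number of standard deviations.
EMPIRICAL_RULE = {1: 0.68, 2: 0.95, 3: 0.997}


def interval_around_mean(mean, sd, k, digits=1):
    """Return (mean - k*sd, mean + k*sd), rounded to `digits` decimals.

    Hypothetical helper for illustration.
    """
    return (round(mean - k * sd, digits), round(mean + k * sd, digits))


mean_height = 69.6  # inches, from the example above
sd_height = 1.4     # inches, from the example above

for k, coverage in EMPIRICAL_RULE.items():
    low, high = interval_around_mean(mean_height, sd_height, k)
    print(f"about {coverage:.1%} of heights between {low} and {high} inches")
```

The lines for one and two standard deviations correspond to the two answers above.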

Practice Questions - Empirical Rule

Statistics by Matthew Cheung. This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.