
Statistics

Confidence Interval

A confidence interval is a range of values, computed from sample data, that estimates a population parameter.

In practice, it is often impractical to collect every data value needed to calculate a population parameter such as the mean \(\mu\) or the standard deviation \(\sigma\). Instead, we can estimate these parameters from a sample using confidence intervals.

Confidence interval for a population mean with normal distribution:

\(\sigma\) is known:

\[\mu=\bar{x} \pm (z_{\alpha/2})\left( \frac{\sigma}{\sqrt{n}}\right)\]

\(\sigma\) is unknown:

\[\mu\approx \bar{x} \pm (z_{\alpha/2})\left( \frac{s}{\sqrt{n}}\right)\]

where \(z_{\alpha/2}\) is the z-value corresponding to an area \(\alpha/2\) in the upper tail of a standard normal distribution.
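The large-sample interval can be computed directly. The sketch below is a minimal Python version using only the standard library; the helper name `z_interval` is hypothetical, and it obtains \(z_{\alpha/2}\) from `statistics.NormalDist().inv_cdf` rather than from a z-table.

```python
from statistics import NormalDist

def z_interval(xbar, sd, n, conf=0.95):
    """Large-sample z confidence interval for a population mean.

    sd is sigma when it is known; otherwise the sample standard deviation s
    (acceptable when n is large, e.g. n >= 30).
    """
    alpha = 1 - conf
    # z_{alpha/2}: the z-value with area alpha/2 in the upper tail of N(0, 1)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sd / n ** 0.5
    return xbar - margin, xbar + margin

# Usage with the summary statistics from the worked example in this section
lo, hi = z_interval(83.2, 6.4, 100, 0.95)
print(f"({lo:.4f}, {hi:.4f})")
```

With \(\bar{x}=83.2\), \(s=6.4\) and \(n=100\), the result agrees with the hand calculation to four decimal places; the unrounded critical value is about 1.95996 rather than 1.960.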

Conditions Required:

  • A random sample is selected from the target population
  • The population distribution is approximately normal
  • The sample size \(n\) is large (e.g., \(n\geq 30\))

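The confidence level is the probability that an interval estimator encloses the population parameter, where the probability describes the procedure before the sample is drawn. For example, a 95% confidence level means that if we repeatedly drew samples of the same size and constructed an interval from each one, about 95% of those intervals would contain the population mean.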

Common Misinterpretations:

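It does not mean that there is a 95% chance that the population mean lies within one particular computed interval. Once an interval has been computed, it either contains \(\mu\) or it does not; the 95% describes how often the method succeeds.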

It also does not mean that the confidence interval contains 95% of the data values.
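The repeated-sampling interpretation can be checked by simulation. This sketch assumes a hypothetical normal population with \(\mu=50\) and \(\sigma=10\), draws 2000 samples of size 100, builds a 95% interval from each using \(s\), and counts how many of the intervals contain \(\mu\). The fraction should come out close to 0.95, even though each individual interval either contains \(\mu\) or misses it.

```python
import random
from statistics import NormalDist, fmean, stdev

random.seed(1)  # fixed seed so the run is reproducible
MU, SIGMA, N, REPS = 50.0, 10.0, 100, 2000  # hypothetical population and settings
z = NormalDist().inv_cdf(0.975)  # z_{0.025}, about 1.960

hits = 0
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    xbar, s = fmean(sample), stdev(sample)
    margin = z * s / N ** 0.5
    # Each single interval either contains MU or it does not
    if xbar - margin <= MU <= xbar + margin:
        hits += 1

coverage = hits / REPS
print(f"{coverage:.3f} of the {REPS} intervals contained mu")
```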

Commonly used confidence levels.

\[\begin{array}{cccc} \text{Confidence Level } 100(1-\alpha)\% & \alpha & \alpha/2 & z_{\alpha/2} \\ \hline 90\% & 0.10 & 0.05 & 1.645 \\ 95\% & 0.05 & 0.025 & 1.960 \\ 99\% & 0.01 & 0.005 & 2.575 \end{array}\]
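The critical values in this table can be reproduced from the inverse of the standard normal CDF. A short sketch using Python's `statistics.NormalDist`:

```python
from statistics import NormalDist

crit = {}
for conf in (0.90, 0.95, 0.99):
    alpha = 1 - conf
    # z_{alpha/2} leaves area alpha/2 in the upper tail of N(0, 1)
    crit[conf] = NormalDist().inv_cdf(1 - alpha / 2)
    print(f"{conf:.0%}  alpha={alpha:.2f}  alpha/2={alpha / 2:.3f}  z={crit[conf]:.4f}")
```

The computed 99% value is about 2.5758. Many tables round it to 2.576; this table uses 2.575, and the difference has a negligible effect on the intervals below.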

Example: A random sample of 100 observations from a normally distributed population has a mean of 83.2 and a standard deviation of 6.4. Find 95% and 99% confidence intervals for \(\mu\).

Solution:

Since the sample is large (\(n=100\)) and the population is normally distributed, we can calculate the confidence interval for the population mean using the z-table. Because \(\sigma\) is unknown, we use the given sample standard deviation (\(s=6.4\)) in its place.

95% confidence interval. \(z_{0.05/2}=1.960\)

\begin{align} \mu &\approx \bar{x} \pm (z_{0.05/2})\left( \frac{s}{\sqrt{n}}\right) \\ &\approx 83.2 \pm 1.960\left( \frac{6.4}{\sqrt{100}}\right) \\ &\approx 83.2 \pm 1.2544 \\ &\approx (81.9456,84.4544)\end{align}

We are 95% confident that the interval (81.9456, 84.4544) contains \(\mu\).

99% confidence interval. \(z_{0.01/2}=2.575\)

\begin{align} \mu &\approx \bar{x} \pm (z_{0.01/2})\left( \frac{s}{\sqrt{n}}\right) \\ &\approx 83.2 \pm 2.575\left( \frac{6.4}{\sqrt{100}}\right) \\ &\approx 83.2 \pm 1.648 \\ &\approx (81.552,84.848)\end{align}

We are 99% confident that the interval (81.552, 84.848) contains \(\mu\).
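The arithmetic in this example can be checked in a few lines. This sketch plugs in the sample summary from the problem and the critical values from the table of common confidence levels.

```python
import math

xbar, s, n = 83.2, 6.4, 100      # sample summary from the example
se = s / math.sqrt(n)            # standard error s / sqrt(n) = 0.64

intervals = {}
# Critical values z_{alpha/2} copied from the table of common confidence levels
for label, z in (("95%", 1.960), ("99%", 2.575)):
    margin = z * se
    intervals[label] = (xbar - margin, xbar + margin)
    print(f"{label}: {xbar} ± {margin:.4f} -> ({xbar - margin:.4f}, {xbar + margin:.4f})")
```

The 99% interval is wider than the 95% interval because a larger critical value is needed to raise the confidence level.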

Confidence Interval for Small Samples

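For small samples, the z-statistic is no longer accurate: the Central Limit Theorem no longer guarantees that the sampling distribution of \(\bar{x}\) is approximately normal, and the sample standard deviation \(s\) may be a poor estimate of \(\sigma\). Instead, when the population is approximately normal, we can use the t-statistic, whose distribution accounts for the extra variability introduced by estimating \(\sigma\) with \(s\).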

Confidence Interval for small samples:

If \(\sigma\) is known, you can still use the z-statistic. If \(\sigma\) is unknown, use:

\[\mu \approx \bar{x} \pm t_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right)\]

where \(t_{\alpha/2}\) is the t-value corresponding to an area \(\alpha/2\) in the upper tail of Student's t-distribution with \((n-1)\) degrees of freedom.
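Without a t-table, \(t_{\alpha/2}\) can be found numerically. The sketch below is a standard-library illustration, not a production routine: the helper names `t_pdf`, `upper_tail` and `t_crit` are hypothetical, the upper-tail area is computed with Simpson's rule on the t density, and the critical value is found by bisection. In practice a t-table or a statistics library (for example SciPy's `scipy.stats.t.ppf`) would be used.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail(x, df, steps=2000):
    """P(T > x) for x >= 0: 0.5 minus the Simpson's-rule integral from 0 to x."""
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def t_crit(area, df):
    """t-value with the given area in the upper tail, found by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if upper_tail(mid, df) > area:
            lo = mid  # too much area beyond mid, so the answer lies to the right
        else:
            hi = mid
    return (lo + hi) / 2

# alpha/2 = 0.01 with n - 1 = 12 degrees of freedom; compare with the t-table value
print(round(t_crit(0.01, 12), 3))
```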

Conditions to use t-statistic in confidence interval:

  • A random sample is selected from the target population
  • The population has a distribution that is approximately normal
  • \(\sigma\) is not given
  • Sample size is small (e.g., \(n < 30\))

Example: Suppose you have selected a random sample of \(n=13\) measurements from a distribution that is approximately normal. The sample reported \(\bar{x}=53.4\text{ g}\) and \(s=8.6\text{ g}\). Find the 98% confidence interval for \(\mu\).

Solution:

Since the sample is small (\(n=13\)) and \(\sigma\) is not given, we have to use the t-statistic.

98% confidence interval: degrees of freedom \(= n-1 = 12\), \(\alpha=0.02\), \(\alpha/2=0.01\), so \(t_{0.01} = 2.681\) from the t-table.

\begin{align} \mu &\approx \bar{x} \pm t_{0.02/2}\left(\frac{s}{\sqrt{n}}\right) \\ &\approx 53.4 \pm 2.681\left(\frac{8.6}{\sqrt{13}}\right) \\ &\approx 53.4 \pm 6.39475 \\ &\approx (47.00525,59.79475) \end{align}

We are 98% confident that the interval (47.00525 g, 59.79475 g) contains \(\mu\).
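The small-sample calculation follows the same pattern as the z-interval, with the t-table value in place of \(z_{\alpha/2}\). A sketch using the numbers from this example:

```python
import math

xbar, s, n = 53.4, 8.6, 13   # sample summary from the example (grams)
t_value = 2.681              # t_{0.01} with n - 1 = 12 degrees of freedom, from the t-table
margin = t_value * s / math.sqrt(n)
lo, hi = xbar - margin, xbar + margin
print(f"{xbar} ± {margin:.5f} -> ({lo:.5f} g, {hi:.5f} g)")
```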

Confidence Interval for Population Proportion

Sometimes your data are expressed as a proportion, or fraction, of successes \(\hat{p}\), which estimates the population proportion \(p\).

Sampling Distribution of \(\hat{p}\)

  1. The mean of the sampling distribution of \(\hat{p}\) is \(p\)
  2. The standard deviation of the sampling distribution of \(\hat{p}\) is \(\sqrt{pq/n}\); that is, \(\sigma_{\hat{p}}=\sqrt{pq/n}\), where \(q=1-p\)
  3.  For large samples, the sampling distribution of \(\hat{p}\) is approximately normal. A sample size is considered large if both \(n\hat{p}\geq15\) and \(n\hat{q}\geq15\)
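These three properties can be illustrated by simulation. The sketch assumes a hypothetical true proportion \(p=0.3\) and sample size \(n=100\) (so \(np=30\) and \(nq=70\)), draws 5000 samples, and compares the mean and standard deviation of the resulting \(\hat{p}\) values with \(p\) and \(\sqrt{pq/n}\).

```python
import math
import random
from statistics import fmean, pstdev

random.seed(2)                   # fixed seed for reproducibility
p, n, reps = 0.3, 100, 5000      # hypothetical true p, sample size, repetitions

# Each p-hat is the fraction of successes in n independent trials
p_hats = [sum(random.random() < p for _ in range(n)) / n for _ in range(reps)]

mean_hat, sd_hat = fmean(p_hats), pstdev(p_hats)
print(f"mean of p-hat: {mean_hat:.4f}  (theory: p = {p})")
print(f"sd of p-hat:   {sd_hat:.4f}  (theory: sqrt(pq/n) = {math.sqrt(p * (1 - p) / n):.4f})")
```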

Large-Sample Confidence Interval for p

\[p \approx \hat{p} \pm z_{\alpha/2}\sigma_{\hat{p}} = \hat{p} \pm z_{\alpha/2}\sqrt{\frac{pq}{n}} \approx \hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}\]

where \(\hat{p}=\frac{x}{n}\) and \(\hat{q}=1-\hat{p}\)

Conditions Required for a Valid Large-Sample Confidence Interval for p:

  • A random sample is selected from the target population
  • The sample size n is large (both \(n\hat{p}\geq15\) and \(n\hat{q}\geq15\)).
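The formula and the large-sample condition can be combined in one function. This is a minimal sketch: the name `proportion_ci` is hypothetical, it takes the number of successes \(x\) rather than \(\hat{p}\), and it raises an error when \(n\hat{p}\) or \(n\hat{q}\) is below 15.

```python
import math
from statistics import NormalDist

def proportion_ci(x, n, conf=0.95):
    """Large-sample confidence interval for a population proportion p.

    x is the number of successes in a random sample of size n.
    Raises ValueError when n*p_hat or n*q_hat is below 15.
    """
    p_hat = x / n
    q_hat = 1 - p_hat
    if n * p_hat < 15 or n * q_hat < 15:
        raise ValueError("sample too small: need n*p_hat >= 15 and n*q_hat >= 15")
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    margin = z * math.sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin

# Hypothetical data: 120 successes in a sample of 200
print(proportion_ci(120, 200))
```

Rejecting small samples outright, instead of returning a possibly misleading interval, is a design choice of this sketch.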

Example: A random sample of size \(n=196\) yielded \(\hat{p}=0.64\). Construct a 95% confidence interval for p.

Solution:

\(\hat{p}=0.64, \hat{q}=1-0.64=0.36, z_{0.05/2}=1.96\)

\begin{align} p &\approx \hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}\\ &\approx 0.64 \pm 1.96 \sqrt{\frac{(0.64)(0.36)}{196}} \\ &\approx 0.64 \pm 0.0672 \\ &\approx (0.5728,0.7072) \end{align}

We are 95% confident that the interval (0.5728, 0.7072) contains p.
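The example's arithmetic can be verified directly. The sketch also confirms the large-sample condition, which the example does not state: \(n\hat{p}=125.44\) and \(n\hat{q}=70.56\), both at least 15.

```python
import math

p_hat, n, z = 0.64, 196, 1.96    # values from the example
q_hat = 1 - p_hat
# n*p_hat = 125.44 and n*q_hat = 70.56, both >= 15, so the large-sample interval applies
assert n * p_hat >= 15 and n * q_hat >= 15
margin = z * math.sqrt(p_hat * q_hat / n)
lo, hi = p_hat - margin, p_hat + margin
print(f"{p_hat} ± {margin:.4f} -> ({lo:.4f}, {hi:.4f})")
```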


Statistics by Matthew Cheung. This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
