# Statistics

## Confidence Interval

A confidence interval is a range of values, computed from sample data, that estimates a population parameter.

For example, it is usually impractical to collect every data value needed to calculate the population mean $$\mu$$ or the population standard deviation $$\sigma$$. Instead, we can estimate these parameters from a sample using confidence intervals.

Confidence interval for a population mean with normal distribution:

$$\sigma$$ is known:

$\mu=\bar{x} \pm (z_{\alpha/2})\left( \frac{\sigma}{\sqrt{n}}\right)$

$$\sigma$$ is unknown:

$\mu\approx \bar{x} \pm (z_{\alpha/2})\left( \frac{s}{\sqrt{n}}\right)$

where $$z_{\alpha/2}$$ is the z-value corresponding to an area $$\alpha/2$$ in the tail of a standard normal distribution.

Conditions Required:

• A random sample is selected from the target population
• The distribution is approximately normal
• The sample size $$n$$ is large (e.g., $$n\geq 30$$)

The confidence level is the probability that an interval estimator encloses the population parameter, where the probability refers to the procedure used over repeated sampling. For example, at a 95% confidence level, about 95% of the intervals constructed this way from repeated random samples would contain the population mean.

Common Misinterpretations:

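It does not mean that there is a 95% chance that the population mean lies within one particular computed interval. Once an interval has been calculated, the fixed value $$\mu$$ either lies inside it or does not.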

It also does not mean that the confidence interval contains 95% of the data values.
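The long-run meaning of the confidence level can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the population values (`mu=50`, `sigma=10`), the sample size, the number of trials, and the function name `coverage` are hypothetical choices that do not come from the text. It repeatedly draws samples from a normal population with known $$\sigma$$, builds a 95% z-interval from each sample, and reports the fraction of intervals that contain $$\mu$$.

```python
import random
from statistics import NormalDist, fmean


def coverage(mu=50.0, sigma=10.0, n=40, level=0.95, trials=2000, seed=0):
    """Fraction of simulated z-intervals (sigma known) that contain mu.

    All default values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)                       # seeded for repeatability
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # z_{alpha/2}
    half_width = z * sigma / n ** 0.5               # margin of error
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = fmean(sample)
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / trials


print(coverage())
```

The printed fraction should land close to 0.95. Each individual interval either contains $$\mu$$ or misses it, and the 95% describes how often the procedure succeeds.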

Commonly used confidence levels:

| Confidence Level $$100(1-\alpha)\%$$ | $$\alpha$$ | $$\alpha/2$$ | $$z_{\alpha/2}$$ |
| --- | --- | --- | --- |
| 90% | 0.10 | 0.05 | 1.645 |
| 95% | 0.05 | 0.025 | 1.960 |
| 99% | 0.01 | 0.005 | 2.575 |
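The critical values $$z_{\alpha/2}$$ can also be computed instead of read from a table. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`. The helper name `z_critical` is a hypothetical name introduced here for illustration.

```python
from statistics import NormalDist


def z_critical(level):
    """Return z_{alpha/2} for a two-sided interval at the given confidence level."""
    alpha = 1 - level
    # The upper-tail area alpha/2 corresponds to the (1 - alpha/2) quantile.
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_critical(level):.3f}")
```

The computed 99% value is approximately 2.576. Many printed tables round it to 2.575, which is the value used in the examples here.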

Example: A random sample of 100 observations from a normally distributed population has a mean equal to 83.2 and a standard deviation equal to 6.4. Find the 95% and 99% confidence intervals for $$\mu$$.

Solution:

Since the sample is large ($$n=100$$) and the population is normally distributed, we can calculate the confidence interval for the population mean using the z-table. The population standard deviation $$\sigma$$ is unknown, so we use the sample standard deviation ($$s=6.4$$) in its place.

95% confidence interval. $$z_{0.05/2}=1.960$$

\begin{align} \mu &\approx \bar{x} \pm (z_{0.05/2})\left( \frac{s}{\sqrt{n}}\right) \\ &\approx 83.2 \pm 1.960\left( \frac{6.4}{\sqrt{100}}\right) \\ &\approx 83.2 \pm 1.2544 \\ &\approx (81.9456,84.4544)\end{align}

We are 95% confident that the interval (81.9456, 84.4544) contains $$\mu$$.

99% confidence interval. $$z_{0.01/2}=2.575$$

\begin{align} \mu &\approx \bar{x} \pm (z_{0.01/2})\left( \frac{s}{\sqrt{n}}\right) \\ &\approx 83.2 \pm 2.575\left( \frac{6.4}{\sqrt{100}}\right) \\ &\approx 83.2 \pm 1.648 \\ &\approx (81.552,84.848)\end{align}

We are 99% confident that the interval (81.552, 84.848) contains $$\mu$$. Note that the higher confidence level produces a wider interval.
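The arithmetic in this example can be checked with a short script. This is a sketch only: the function name `z_interval` is hypothetical, and the critical values are the table values used above.

```python
from math import sqrt


def z_interval(xbar, s, n, z):
    """Large-sample interval xbar +/- z * s / sqrt(n), returned as (lower, upper)."""
    margin = z * s / sqrt(n)
    return xbar - margin, xbar + margin


# Sample statistics from the example: xbar = 83.2, s = 6.4, n = 100.
for label, z in (("95%", 1.960), ("99%", 2.575)):
    lower, upper = z_interval(83.2, 6.4, 100, z)
    print(f"{label}: ({lower:.4f}, {upper:.4f})")
```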

## Confidence Interval for Small Samples

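For small samples, the z-statistic is no longer accurate when $$\sigma$$ is unknown, because the sample standard deviation $$s$$ is an unreliable estimate of $$\sigma$$ when it is based on only a few observations. In that case, the standardized sample mean follows a Student's t-distribution rather than a standard normal distribution, so we use the t-statistic instead.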

Confidence Interval for small samples:

If $$\sigma$$ is known, you can still use the z-statistic.

If $$\sigma$$ is unknown:

$\mu \approx \bar{x} \pm t_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right)$

where $$t_{\alpha/2}$$ is the t-value corresponding to an area $$\alpha/2$$ in the upper tail of the Student's t-distribution based on $$(n-1)$$ degrees of freedom.

Conditions to use t-statistic in confidence interval:

• A random sample is selected from the target population
• The population has a distribution that is approximately normal
• $$\sigma$$ is not given
• Sample size is small (e.g., $$n < 30$$)

Example: Suppose you have selected a random sample of $$n=13$$ measurements from a distribution that is approximately normal. The sample reported $$\bar{x}=53.4g$$ and $$s=8.6g$$. Find the 98% confidence interval.

Solution:

Since the sample is small ($$n=13$$) and $$\sigma$$ is not given, we have to use the t-statistic.

98% confidence interval: degrees of freedom $$= n-1 = 12$$, $$\alpha=0.02$$, $$\alpha/2=0.01$$, and $$t_{0.01} = 2.681$$ from the t-table.

\begin{align} \mu &\approx \bar{x} \pm t_{0.02/2}\left(\frac{s}{\sqrt{n}}\right) \\ &\approx 53.4 \pm 2.681\left(\frac{8.6}{\sqrt{13}}\right) \\ &\approx 53.4 \pm 6.39475 \\ &\approx (47.00525,59.79475) \end{align}

We are 98% confident that the interval (47.00525 g, 59.79475 g) contains $$\mu$$.
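To see how much the t-statistic widens the interval for a small sample, the sketch below recomputes this example and compares it with the interval that the z-statistic would have given. The t critical value is taken from the t-table as in the solution, since the Python standard library has no t-distribution quantile function. A library routine such as SciPy's `scipy.stats.t.ppf(0.99, df=12)` could supply it instead, assuming SciPy is available. The variable names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

xbar, s, n = 53.4, 8.6, 13   # sample statistics from the example (grams)
t_crit = 2.681               # t_{0.01} with 12 degrees of freedom, from a t-table
z_crit = NormalDist().inv_cdf(1 - 0.02 / 2)  # z_{0.01}, for comparison only

standard_error = s / sqrt(n)
t_margin = t_crit * standard_error
z_margin = z_crit * standard_error

print(f"t-interval: ({xbar - t_margin:.5f}, {xbar + t_margin:.5f})")
print(f"z-interval: ({xbar - z_margin:.5f}, {xbar + z_margin:.5f})")
```

The t-interval is wider than the z-interval. The extra width reflects the added uncertainty from estimating $$\sigma$$ with $$s$$ in a small sample.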

## Confidence Interval for Population Proportion

Sometimes your data is expressed as a proportion or fraction of successes in the sample, $$\hat{p}$$, which estimates the population proportion, $$p$$.

Sampling Distribution of $$\hat{p}$$:

• The mean of the sampling distribution of $$\hat{p}$$ is $$p$$
• The standard deviation of the sampling distribution of $$\hat{p}$$ is $$\sqrt{pq/n}$$; that is, $$\sigma_{\hat{p}}=\sqrt{pq/n}$$, where $$q=1-p$$
• For large samples, the sampling distribution of $$\hat{p}$$ is approximately normal. A sample size is considered large if both $$n\hat{p}\geq15$$ and $$n\hat{q}\geq15$$

Large-Sample Confidence Interval for p:

$p \approx \hat{p} \pm z_{\alpha/2}\sigma_{\hat{p}} = \hat{p} \pm z_{\alpha/2}\sqrt{\frac{pq}{n}} \approx \hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$

where $$\hat{p}=\frac{x}{n}$$, $$x$$ is the number of successes in the sample, and $$\hat{q}=1-\hat{p}$$.

Conditions Required for a Valid Large-Sample Confidence Interval for p:

• A random sample is selected from the target population
• The sample size $$n$$ is large (both $$n\hat{p}\geq15$$ and $$n\hat{q}\geq15$$)

Example: A random sample of size $$n=196$$ yielded $$\hat{p}=0.64$$. Construct a 95% confidence interval for p.

Solution:

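The sample is large, since $$n\hat{p}=196(0.64)=125.44\geq15$$ and $$n\hat{q}=196(0.36)=70.56\geq15$$.

$$\hat{p}=0.64, \hat{q}=1-0.64=0.36, z_{0.05/2}=1.96$$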

\begin{align} p &\approx \hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}\\ &\approx 0.64 \pm 1.96 \sqrt{\frac{(0.64)(0.36)}{196}} \\ &\approx 0.64 \pm 0.0672 \\ &\approx (0.5728,0.7072) \end{align}

We are 95% confident that the interval (0.5728, 0.7072) contains p.
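The large-sample check and the interval calculation can be combined in one function. The following is a minimal sketch under stated assumptions: the function name `proportion_interval` is hypothetical, and raising an error when the large-sample condition fails is a design choice made here, not something the text prescribes.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(p_hat, n, level=0.95):
    """Large-sample confidence interval for a population proportion p."""
    q_hat = 1 - p_hat
    # Large-sample condition from the text: n*p_hat >= 15 and n*q_hat >= 15.
    if n * p_hat < 15 or n * q_hat < 15:
        raise ValueError("sample too small for the normal approximation")
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # z_{alpha/2}
    margin = z * sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin


lower, upper = proportion_interval(0.64, 196)
print(f"({lower:.4f}, {upper:.4f})")
```

With the example's values, the result should agree with the worked solution to four decimal places.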