Library Guides: Statistics: Collecting Data

Data Collection

There are different ways you can collect data. One is to use data from a published source such as Statistics Canada.

A designed experiment is a data collection method where the researcher exerts full control over the characteristics of the experimental units sampled. These experiments typically involve a group of experimental units that are assigned the treatment and an untreated (or control) group.

An observational study is a data collection method where the experimental units sampled are observed in their natural setting. No attempt is made to control the characteristics of the experimental units sampled. (Examples include opinion polls and surveys.)

As discussed in previous pages, it is often very difficult to collect data from the entire population. Instead, we can collect a sample and apply inferential statistics.

A representative sample exhibits characteristics typical of those possessed by the target population.

There are different ways a representative sample can be collected to avoid biases.

A simple random sample of n experimental units is a sample selected from the population in such a way that every different sample of size n has an equal chance of selection.
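As a minimal sketch of the definition above, the following Python snippet draws a simple random sample without replacement. The list of student labels and the helper name simple_random_sample are hypothetical and exist only for illustration; random.sample from the standard library is assumed to be an acceptable way to give every subset of size n the same chance of selection.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n units without replacement so each subset of size n is equally likely."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(population, n)


# Hypothetical sampling frame: a list of 100 student labels
students = [f"student_{i}" for i in range(1, 101)]

sample = simple_random_sample(students, 10, seed=42)
print(sample)
```

Fixing the seed is only a convenience for reproducing the same draw; in a real study you would normally leave it unset or record it.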

There are also more complex random sampling designs:

A stratified random sample is typically used when the experimental units associated with the population can be separated into two or more groups of units, called strata. A random sample is then drawn from each stratum.
Cluster sampling first selects a random sample of natural groupings (clusters) of experimental units, then collects data from all experimental units within each selected cluster.
A systematic sample involves systematically selecting every kth experimental unit from a list of all experimental units.
Randomized response sampling is used for sensitive questions. Each respondent uses a random device, such as a coin flip, to determine how to answer, so that no individual's true answer is revealed while overall proportions can still be estimated.
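The first three designs above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a definitive implementation: the school names, the 60-unit population, and the function names are hypothetical, and the random starting point in the systematic sample is a common convention rather than something this guide specifies.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, n_per_stratum, rng):
    """Group units into strata, then draw a simple random sample from each stratum."""
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        k = min(n_per_stratum, len(members))
        sample.extend(rng.sample(members, k))
    return sample


def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters, then keep every unit in the chosen clusters."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]


def systematic_sample(units, k, rng):
    """Take every kth unit from the list, starting at a random position below k."""
    start = rng.randrange(k)
    return units[start::k]


rng = random.Random(0)

# Hypothetical population: 60 students spread evenly over three schools
schools = ["Business", "Engineering", "Health"]
units = [(i, schools[i % 3]) for i in range(60)]

# Treat each school as a stratum (or as a cluster, for the cluster design)
clusters = defaultdict(list)
for unit in units:
    clusters[unit[1]].append(unit)

strat = stratified_sample(units, lambda u: u[1], 5, rng)
clus = cluster_sample(clusters, 2, rng)
syst = systematic_sample(units, 10, rng)

print(len(strat), len(clus), len(syst))
```

Note the contrast: the stratified sample includes a few units from every school, while the cluster sample includes every unit from only some schools.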

Biases

No matter what type of sampling design you employ to collect the data for your study, be careful to avoid selection bias.

Selection bias results when a subset of experimental units in the population has little or no chance of being selected for the sample.

For example, suppose you are collecting data on Centennial students and you survey the next 10 students you meet. You may end up with a sample that contains no students from the School of Business.
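The situation in this example can be imitated with a small simulation. Everything here is hypothetical: the population of 120 students, the 90/30 split, and the idea that list order stands in for "who you happen to meet first" (for instance, because you are standing far from the business building). It is a sketch of how a convenience sample can shut out a subgroup, not a model of any real campus.

```python
import random

# Hypothetical population: 90 students from other schools are encountered
# first, followed by 30 School of Business students
population = [("other", i) for i in range(90)] + [("business", i) for i in range(30)]


def business_count(sample):
    """Count how many sampled students belong to the business school."""
    return sum(1 for school, _ in sample if school == "business")


# Convenience sample: simply the next 10 students you meet
convenience = population[:10]

# Simple random sample of the same size, for comparison
rng = random.Random(1)
random_draw = rng.sample(population, 10)

print("Business students in convenience sample:", business_count(convenience))
print("Business students in random sample:", business_count(random_draw))
```

In the convenience sample, business students have no chance of selection at all, which is exactly the condition that defines selection bias. In the random draw, every student has the same chance, although any single draw can still differ from the population proportions.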

Nonresponse bias is a type of selection bias that results when data are not obtained for every experimental unit in the sample.

For example, suppose a survey question asks for information that participants wish to keep private, or an answer that could cause them to be excluded from something. As a result, some participants choose not to respond.

Finally, even if your sample is representative of the population, there can still be measurement error.

Measurement error refers to inaccuracies in the values of the data collected. In surveys, the error may be due to ambiguous or leading questions or to the interviewer's effect on the respondent.

For example, a survey contains the following question: "It has been found that smoking causes cancer. Do you plan to smoke?" The way the question is asked leads the respondent towards a "No" response.

Statistics by Matthew Cheung. This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.