1.4 Concluding Remarks

A communication scientist wants to know whether children are sufficiently aware of the dangers of media use. On a media literacy scale from one to ten, an average score of 5.5 or higher is assumed to be sufficient.

Translated to the simple candy bag example, this means that the outcome in our sample does not have to equal the true population value. If twenty per cent of all candies in the population are yellow, we could very well draw a sample bag with fewer or more than twenty per cent yellow candies.

Average media literacy, then, can exceed 5.5 in our sample of children even if average media literacy is below 5.5 in the population, or the other way around. How we decide between these possibilities is discussed in later chapters.
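A small simulation can make this concrete. The sketch below is only an illustration: the population mean (5.3), the standard deviation (1.5), the sample size (30), and the normal shape of the scores are all assumptions chosen for the example, not values from the study. It counts how often a sample mean reaches 5.5 even though the assumed population mean is below 5.5.

```python
import random
import statistics

def share_of_samples_at_or_above(threshold, pop_mean, pop_sd,
                                 sample_size, n_samples, seed=1):
    """Return the fraction of simulated samples whose mean reaches the threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        # Draw one sample of scores and clip each score to the 1-10 scale.
        scores = [min(10.0, max(1.0, rng.gauss(pop_mean, pop_sd)))
                  for _ in range(sample_size)]
        if statistics.mean(scores) >= threshold:
            hits += 1
    return hits / n_samples

# All numbers below are hypothetical, chosen only for illustration.
share = share_of_samples_at_or_above(threshold=5.5, pop_mean=5.3, pop_sd=1.5,
                                     sample_size=30, n_samples=2000)
print(f"Share of samples with a mean of 5.5 or higher: {share:.2f}")
```

Under these assumptions, a noticeable share of samples looks sufficient although the population is not, which is why a single sample mean cannot be taken at face value.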

1.4.1 Sample characteristics as observations

Perhaps the most confusing aspect of sampling distributions is the fact that samples are our cases (units of analysis) and sample characteristics are our observations. We are accustomed to thinking of observations as measurements on empirical things such as people or candies. We perceive each person or each candy as a case and we observe a characteristic that may change across cases (a variable), for instance the colour or weight of a candy.

In a sampling distribution, however, we observe samples (cases) and measure a sample statistic as the (random) variable. Each sample adds one observation to the sampling distribution and its sample statistic value is the value added to the sampling distribution.
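The idea that each sample is a case can be sketched in code. The example below is a minimal illustration, assuming bags of ten candies, a population proportion of twenty per cent yellow, and independent draws; the function name `draw_sample_proportion` is made up for this sketch. Each entry in the resulting list is one observation: the proportion of yellow candies in one sample bag.

```python
import random
from collections import Counter

def draw_sample_proportion(rng, bag_size, p_yellow):
    """Draw one bag and return its proportion of yellow candies."""
    yellow = sum(1 for _ in range(bag_size) if rng.random() < p_yellow)
    return yellow / bag_size

rng = random.Random(42)
n_samples = 1000  # hypothetical number of sample bags

# One entry per sample: the sample is the case,
# its proportion of yellow candies is the observation.
sampling_distribution = [
    draw_sample_proportion(rng, bag_size=10, p_yellow=0.2)
    for _ in range(n_samples)
]

# Tabulate how often each sample proportion occurred.
counts = Counter(sampling_distribution)
for proportion in sorted(counts):
    print(f"{proportion:.1f}: {counts[proportion]} samples")
```

The printed table is an approximation of the sampling distribution: it lists values of the sample statistic, not colours of individual candies.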

1.4.2 Means at three levels

If we are dealing with the proportion of yellow candies in a sample (bag), the sample statistic is a proportion and we want to know the proportion of yellow candies in the population. The sampling distribution collects a large number of sample proportions. The mean of the proportions in the sampling distribution (expected value) equals the proportion of yellow candies in the population, because a sample proportion is an unbiased estimator of the population proportion.
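The claim about the expected value can be checked with a short calculation. The sketch below assumes that candies are drawn independently, so that the number of yellow candies in a bag follows a binomial distribution; the bag size of ten and the population proportion of 0.2 are example values. It weights every possible sample proportion by its probability and sums the results.

```python
from math import comb, isclose

def expected_sample_proportion(bag_size, p_yellow):
    """Probability-weighted average of all possible sample proportions."""
    total = 0.0
    for k in range(bag_size + 1):
        # Probability of drawing exactly k yellow candies (binomial model).
        prob_k = comb(bag_size, k) * p_yellow**k * (1 - p_yellow)**(bag_size - k)
        total += (k / bag_size) * prob_k
    return total

expected = expected_sample_proportion(bag_size=10, p_yellow=0.2)
# Compare with the population proportion, allowing for rounding error.
print(isclose(expected, 0.2))
```

Trying other bag sizes or proportions should give the same pattern: the expected value matches the population proportion, which is what unbiasedness means here.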

Things become a little confusing if we are interested in a sample mean, such as the average weight of candies in a sample bag. Now we have means at three levels: the population, the sampling distribution, and the sample.

Figure 1.6: What is the relation between the three distributions?

The sampling distribution, here, is a distribution of sample means, but the sampling distribution itself also has a mean, which is called the expected value or expectation of the sampling distribution. Don’t let this confuse you. The mean of the sampling distribution is the average of the average weight of candies in every possible sample bag. This mean of means has the same value as our first mean, namely the average weight of the candies in the population, because a sample mean is an unbiased estimator of the population mean.
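For a very small population, every possible sample bag can be listed, which makes the three levels visible at once. The sketch below is an illustration only: the five candy weights and the bag size of two are invented for the example, and bags are drawn without replacement.

```python
from itertools import combinations
from statistics import mean

# Hypothetical population of five candy weights (grams).
population_weights = [2.0, 2.5, 3.0, 3.5, 4.0]

# Level 1: the mean in the population.
population_mean = mean(population_weights)

# Level 2: the mean in each possible sample bag of two candies.
sample_means = [mean(bag) for bag in combinations(population_weights, 2)]

# Level 3: the mean of the sampling distribution (the mean of all sample means).
mean_of_means = mean(sample_means)

print("Population mean:", population_mean)
print("Number of possible sample bags:", len(sample_means))
print("Lowest and highest sample mean:", min(sample_means), max(sample_means))
print("Mean of the sampling distribution:", mean_of_means)
```

Individual sample means range from 2.25 to 3.75, yet their average coincides with the population mean of 3.0.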

Remember this: The population and the sample consist of the same type of observations. In the current example, we are dealing with a sample and a population of candies. In contrast, the sampling distribution is based on a different type of observation, namely samples, for example, sample bags of candies.

The sampling distribution is the crucial link between the sample and the population. On the one hand, the sampling distribution is connected to the population because the population value (the parameter), for example, the average weight of all candies, is equal to the mean of the sampling distribution. On the other hand, it is linked to the sample because it tells us which sample means we will find with what probabilities. We need the sampling distribution to make statements about the population based on our sample.