Part I: The Theory of Sampling

Research begins with some population which we wish to study. Sometimes the population is so small that we can simply study all of it: for example, all living former presidents of the United States, or the seventeen fourth-grade students in Mrs. Johnson's class. In such cases we can just collect data from each member of the population without any need for sampling, and our results will be as accurate a description of the population as our data-gathering method allows. In most cases, however, the population is so large that considerations of time or effort prevent us from examining every individual. In such cases we often turn to sampling.

Possible Sampling Errors

Since we will examine only a part of the population, we cannot be certain that our measurements on the sample give us exactly the same result that we would have gotten had we measured the entire population. For example, we may have asked a sample of 50 shoppers whether they had bought frozen pizza within the past week and found that 10 of them had. But we need to think about how accurately our sample reflects the buying habits of the population of all shoppers last week. We know realistically that while 20% of our sample bought pizza, the fraction of the whole population that bought pizza might be some number like 18% or 23%. Or even worse, we might have accidentally found the only 10 people in the country who bought pizza! While we can never be completely certain that sampling will give us an exactly right answer, there are ways to figure out how dependable our results are. To help our readers understand the dependability of results based on a sample, we report the size of error that may be expected.
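The variation from one sample to the next can be seen in a small simulation. The sketch below is only an illustration: it assumes a hypothetical population in which exactly 20% of shoppers bought frozen pizza (a figure we would not know in a real study), and the function name sample_percentages is made up for this example.

```python
import random

def sample_percentages(true_rate, sample_size, repeats, seed=0):
    """Draw `repeats` random samples and return the percentage of buyers in each.

    true_rate is an ASSUMED population value, used only for illustration.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    results = []
    for _ in range(repeats):
        # Each sampled shopper is a buyer with probability true_rate.
        buyers = sum(1 for _ in range(sample_size) if rng.random() < true_rate)
        results.append(100.0 * buyers / sample_size)
    return results

# 1000 imaginary surveys of 50 shoppers each
percentages = sample_percentages(true_rate=0.20, sample_size=50, repeats=1000)
print("lowest sample percentage: ", min(percentages))
print("highest sample percentage:", max(percentages))
print("average sample percentage:", sum(percentages) / len(percentages))
```

Individual samples of 50 land both above and below the assumed 20%, while their average stays close to it.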

In fact, we do more than report the possibility of error: we actually control its magnitude by choosing how many members of the population to sample. To design a sampling study, we first consider how much risk of error we will accept. We then use this information to calculate a sample size. Before we can consider what size the possible errors should be, we need to define two kinds of possible error.

The first kind of possible error is described by a value called precision, or sometimes tolerance. Precision reflects the understanding that our sample value may not be exactly the value for the population as a whole. Since findings are usually reported as a percentage (20% of the shoppers bought frozen pizza last week), precision allows us to report a range of values rather than a single number. For example, the value of 20% which we observed for our sample might be reported as a population value of between 16% and 24%. A shorthand way of writing this is 20% +- 4%, which is read as "twenty percent plus or minus four percent."

The second kind of error describes the possibility that the population value lies outside the range set by our precision figure. This kind of error is described by a measure called confidence. Confidence is a report of how certain we are that the population value is within our precision interval. For example, a confidence of 90% means that there is one chance in ten that our figure of 20% +- 4% is wrong (the 90% is the nine chances out of ten that the population value is within the stated range). Similarly, a 99% confidence would mean that there is only one chance in 100 that our sample value (or, more correctly, our sample value range, since the precision gives us a range of numbers) is misleading. Because confidence reports our certainty that we have avoided a large error, we usually want a sample size that will give us a confidence value of 90%, 95%, or even 99%. In statistical jargon, choosing a certain size sample to keep error within acceptable limits is called setting the precision and confidence limits.
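To see how precision and confidence interact, the following sketch computes the precision that goes with several confidence levels for the pizza example (10 buyers out of 50 shoppers). It assumes a simple random sample and uses the common normal approximation for a proportion, which is not necessarily the method developed later in this tutorial. The function name margin_of_error is our own.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(p, n, confidence):
    """Half-width of the range around p under the normal approximation.

    p          -- proportion observed in the sample (0.20 for 20%)
    n          -- number of people sampled
    confidence -- e.g. 0.90 for 90% confidence
    ASSUMPTION: simple random sampling from a large population.
    """
    # z is the number of standard errors that covers the central
    # `confidence` share of a normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sqrt(p * (1 - p) / n)

p, n = 10 / 50, 50
for confidence in (0.90, 0.95, 0.99):
    e = margin_of_error(p, n, confidence)
    print(f"{confidence:.0%} confidence: {p:.0%} +- {e:.1%}")
```

For a fixed sample size, asking for more confidence widens the reported range. Under this approximation, a sample of only 50 gives a range wider than the +- 4% used as an illustration above.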

Why not avoid the chance of reporting misleading data by setting a very tight precision such as +- 1% and a very high confidence like 99.9%? This would mean that there was only one chance in one thousand that the value obtained from our sample was more than one percentage point different from the value for the whole population. The answer is that, as we will see below, setting such a high confidence and such a small precision range is just another way of saying that we will have a very large sample size, often very nearly the entire population. In most research situations the precision and confidence have to be set to keep the study from being too expensive or from taking too long.
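The cost of a demanding target can be estimated with a rough calculation. The sketch below uses the widely used normal-approximation formula n = z^2 * p * (1 - p) / e^2, with p = 0.5 as a worst-case guess about the population. These are assumptions made for illustration: the formula ignores the size of the population itself and may differ from the calculation presented later in this tutorial. The function name required_sample_size is our own.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(precision, confidence, p=0.5):
    """Approximate sample size for a target precision and confidence.

    precision  -- half-width of the range, e.g. 0.04 for +- 4%
    confidence -- e.g. 0.95 for 95% confidence
    p          -- ASSUMED population proportion; 0.5 is the worst case
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z * z * p * (1 - p) / precision ** 2)

for precision, confidence in [(0.04, 0.90), (0.04, 0.95), (0.01, 0.999)]:
    n = required_sample_size(precision, confidence)
    print(f"+- {precision:.0%} at {confidence:.1%} confidence: about {n} people")
```

Under these assumptions, +- 4% at 90% confidence calls for a few hundred people, while +- 1% at 99.9% confidence calls for tens of thousands.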

The foregoing discussion may make the job of selecting a sample size seem difficult and fraught with mathematics. In practice, setting values for acceptable error is not all that difficult. In the next sections we will talk about how researchers decide on values for confidence and precision.




this page is at http://testbed.cis.drexel.edu/sample/errors.html