Other Sources of Error

Random sampling is not a magical talisman that protects an investigator from all errors. Rather, it is a way of predicting the likely effects of one particular kind of error: sampling error. As we said above, the error whose probability is expressed in the confidence and precision figures is the error of accidentally getting a sample which is not exactly representative of the population. There are a great many other sources of error in research against which random sampling gives no protection and whose likelihood it does not estimate.

One common source of error in this class is survey questions which, for one reason or another, do not measure what you wanted to measure. For example, students might lie in answering a question about academic honesty, or they might simply not remember accurately how many times they used the library in the last month. Another uncontrolled type of error is experimental error: a research assistant records the wrong time for a rat's performance in a maze, or a scale is miscalibrated and hence reports the wrong weights.
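
The miscalibrated scale can be sketched in a few lines of Python. This is only an illustration under assumed numbers: the population of weights, the 5 g offset, and the names `true_weights`, `SCALE_OFFSET` and `weigh` are all hypothetical. The sample is drawn at random, yet every reading is still wrong by the same amount.

```python
import random
from statistics import mean

# Hypothetical population of true weights in grams (illustrative values only).
true_weights = list(range(200, 300))

# Assumed calibration fault: the scale reads 5 g heavy on every weighing.
SCALE_OFFSET = 5


def weigh(true_weight):
    """Return the reading from the miscalibrated scale."""
    return true_weight + SCALE_OFFSET


rng = random.Random(0)  # fixed seed so the sketch is repeatable
sample = rng.sample(true_weights, 30)  # simple random sample, no replacement

readings = [weigh(w) for w in sample]

# Difference between what the scale reports and what the sampled items weigh.
measurement_bias = mean(readings) - mean(sample)
print(measurement_bias)
```

The printed difference is the 5 g offset (up to floating-point rounding), whichever items happen to be drawn. Drawing a larger random sample would not shrink it.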

In addition, random sampling is no protection against sampling a biased population. For example, a written questionnaire is biased toward those who are literate in the language of the questions. A phone survey is biased toward those who have phones and who are home to answer them.

If you think that the possible sources of error seem pervasive, you are correct. In research it is almost impossible to have a sampling design which is unassailably free from error. Good researchers are not those who avoid all error, but those who control sources of error whenever possible (by using random sampling, for example) and try to stay aware of the errors that remain. At least some of these possible errors can be tested for by appropriate statistical techniques.

The Sampling Frame

It is important to be clear about what population we are actually sampling. Suppose we distribute questionnaires to a random sample of patrons arriving at the library over a period of two weeks. We are in fact sampling uses, not users. That is to say, people who use the library more often are more likely to receive a questionnaire than infrequent users. Note that this is true even though patrons are not given a questionnaire if they have previously completed one. This is because each act of entering the library gives the person a chance of getting a questionnaire: the more visits, the more chances. And of course in such a situation we are ignoring those potential users who do not come to the library at all.
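
The "more visits, more chances" point can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that each visit carries the same independent probability of being handed a questionnaire; the function name `chance_of_questionnaire` and the 10% figure are hypothetical, not part of any real survey.

```python
def chance_of_questionnaire(visits, p_per_visit):
    """Probability that a patron is handed a questionnaire at least once.

    Assumes each visit is an independent draw with the same probability
    p_per_visit, and that a patron completes at most one questionnaire.
    """
    return 1.0 - (1.0 - p_per_visit) ** visits


# Compare patrons who come once, five times, and ten times in the two weeks.
for visits in (1, 5, 10):
    print(visits, round(chance_of_questionnaire(visits, 0.1), 3))
```

Under these assumptions a patron who visits ten times has roughly a 65% chance of being sampled, against 10% for a patron who visits once, and a person who never visits has no chance at all.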

Similarly, if we sample the cards in a card catalogue, we give books with more authors or a larger number of subject headings a greater chance of being selected. Limiting selection to main entry cards helps somewhat if our interest is in sampling bibliographic entities. But if we are trying to sample physical volumes, this still discriminates against books with multiple copies represented on a single card and against multi-volume works and serials.
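
The card catalogue effect can be sketched the same way. The two books and their card counts below are invented for illustration, and `selection_chances` is a hypothetical helper: it gives each book's chance of being picked when one card is drawn at random from all the cards.

```python
from fractions import Fraction


def selection_chances(cards_per_book):
    """Chance that one randomly drawn card belongs to each book."""
    total_cards = sum(cards_per_book.values())
    return {
        title: Fraction(count, total_cards)
        for title, count in cards_per_book.items()
    }


# Hypothetical catalogue: cards filed per book (authors plus subject headings).
all_cards = {"single-author monograph": 3, "three-author handbook": 7}

# The same catalogue restricted to main entry cards: one card per book.
main_entry_only = {title: 1 for title in all_cards}

print(selection_chances(all_cards))
print(selection_chances(main_entry_only))
```

With every card in the frame the handbook is drawn 7 times in 10. With main entry cards only, the two books are equally likely, although this does nothing for multiple copies or multi-volume works filed on a single card.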

At times there is no easy way to overcome sampling frame problems. Sometimes we may redefine the population. For example, we may use the card catalogue to sample single-volume monographs, perhaps using some other source to sample other works.


this page is at http://testbed.cis.drexel.edu/sample/errorsource.html