Example 1: Sampling Work by Selecting Times

Suppose that a library director wants to study how she uses her time throughout a period of several weeks. She will do this by selecting random work times and recording what she is doing at that moment. She wants to know the percentage of time she spends on each task category with an error of plus or minus 5%. A confidence of 90% seems sufficient. She thinks that each important aspect of her job will be 20% or less of her day and so will use this figure to estimate the variance. The sample size program gives a size of 174. She thinks that a study carried out over two weeks would be representative and a sample of 174 would mean interrupting work about every 40 minutes. This seems like an acceptable level of inconvenience to live with for two weeks. The sample design will have four categories, first selecting one of two weeks, then one of five work days, next an hour between 7 am and 6 pm, and finally a minute between 0 and 59.
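The sample size program's own formula is not given here, so the following is only a sketch of how a figure like 174 can arise. It assumes the usual normal-approximation formula for a proportion, n = z^2 * p * (1 - p) / e^2, with z = 1.65 for 90% confidence and the result rounded to the nearest whole observation. The function name `sample_size` and its parameters are hypothetical.

```python
# Sketch only: the original sample size program's method is not documented.
# Assumptions: normal-approximation formula, z = 1.65 for 90% confidence,
# and rounding to the nearest whole observation.

def sample_size(precision, proportion, z=1.65):
    """Return the number of observations needed for +/- precision."""
    variance = proportion * (1.0 - proportion)  # p(1 - p), largest at p = 0.5
    return round(z ** 2 * variance / precision ** 2)


# Plus or minus 5%, with each task assumed to take 20% or less of the day.
print(sample_size(precision=0.05, proportion=0.20))
```

Under these assumptions the sketch reproduces the size of 174 quoted above. A program that always rounds up, or that uses z = 1.645, could differ by an observation or two.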

The director knows that she does not in fact work from 7am to 7pm (strictly, 6:59pm) every day, so some of the sample times will simply be skipped. The sample drawn should therefore be increased, perhaps to 200 observations. A supplemental sample cannot be used, since one would not know how big a supplement to draw until the study period was over.

To collect the data, the library director draws a random sample of 200 according to the design:

	Category 1 -- Week  numbered from 1 to 2
	Category 2 -- Day named Mon  Tue  Wed  Thur  Fri
	Category 3 -- Hour named 7am  8am  9am  10am 11am
	      12noon  1pm  2pm  3pm  4pm  5pm  6pm
	Category 4 -- Minute numbered from 0 to 59

She has the program print a sorted list of sample times:
Week	1	Day	Tue	Hour	5pm	Minute	45	
Week	1	Day	Wed	Hour	11am	Minute	26	
Week	1	Day	Wed	Hour	2pm	Minute	13	
Week	1	Day	Thur	Hour	9am	Minute	12	
Week	2	Day	Mon	Hour	12noon	Minute	43	
Week	2	Day	Tue	Hour	9am	Minute	46	
Week	2	Day	Tue	Hour	1pm	Minute	27	
Week	2	Day	Wed	Hour	12noon	Minute	5	
Week	2	Day	Thur	Hour	9am	Minute	22	
Week	2	Day	Fri	Hour	4pm	Minute	25	
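A design and listing of this kind could be produced along the following lines. This is a minimal sketch, not the sampling program itself: the names `draw_sample_times` and `format_time` are hypothetical, and it assumes that times are drawn without replacement, so that no minute is selected twice.

```python
import random

# Category values taken from the design above.
WEEKS = [1, 2]
DAYS = ["Mon", "Tue", "Wed", "Thur", "Fri"]
HOURS = ["7am", "8am", "9am", "10am", "11am", "12noon",
         "1pm", "2pm", "3pm", "4pm", "5pm", "6pm"]
MINUTES = list(range(60))


def draw_sample_times(n, seed=None):
    """Draw n distinct sample times, returned in chronological order.

    Each time is a tuple of indexes (week, day, hour, minute) into the
    category lists above.
    """
    rng = random.Random(seed)
    total = len(WEEKS) * len(DAYS) * len(HOURS) * len(MINUTES)
    # Assumption: sampling without replacement from every minute in the study.
    slots = rng.sample(range(total), n)
    times = []
    for slot in sorted(slots):  # sorting the slot numbers sorts the times
        slot, minute = divmod(slot, len(MINUTES))
        slot, hour = divmod(slot, len(HOURS))
        week, day = divmod(slot, len(DAYS))
        times.append((week, day, hour, minute))
    return times


def format_time(t):
    """Format one sample time in the layout of the listing above."""
    week, day, hour, minute = t
    return "Week\t{}\tDay\t{}\tHour\t{}\tMinute\t{}".format(
        WEEKS[week], DAYS[day], HOURS[hour], MINUTES[minute])


for t in draw_sample_times(200, seed=1)[:10]:
    print(format_time(t))
```

Numbering every minute of the study period and decoding the number into week, day, hour and minute gives each combination of categories the same chance of selection, and sorting the numbers puts the times in the order in which the alarm must be set.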

She then uses a pocket calculator with a built-in alarm clock. She sets the alarm to the next time on the list and, when it goes off, writes down what she is doing and resets the alarm for the following time.

She was initially a bit worried that knowing the time for which she was setting the alarm might affect her behavior. She considered having her secretary keep track of the alarm and inform her when it was time to record an observation. As a test she drew a sample of 15 times over just one day (that is, using just the hour and minute categories above) and tried keeping track of her work. This test made her feel confident that setting the alarm was not going to affect her behavior.

There is a hidden moral in the example above. When the library director originally designed the study she chose a precision value of 3%, giving a sample size of 484. At the time, the precision seemed reasonable; for a task that was recorded at 20% of the total effort, the precision interval would be 17% to 23%. Somehow that seemed more accurate than the interval of 15% to 25% that was actually used. But a sample size of 484 meant interrupting work on average every 15 minutes. She could not see how she could do the study and still get her work done! Of course she could have extended the time period for the study from two weeks to several months. But such a lengthy study would have been a great burden, not to mention that the longer time period would increase the likelihood of changes in the work pattern over time. Besides, there was a deadline by which the answer was needed.

Should the investigator have abandoned her study because the sample size she needed was too large? In a strict interpretation of statistical sampling, one would argue that the precision and confidence are set according to the investigator's best judgement of how much accuracy is necessary for a meaningful result. At this point, the argument would continue, the fact that the required sample size of 484 cannot conveniently be obtained is an indicator that this particular study cannot be done, or at least cannot be done in the manner originally conceived. In fact there have been scientific sampling studies which have not been done because they required sample sizes which would have been impractical or would have inconvenienced the subjects.

If we are very sure of our need for specific levels of confidence and precision we should probably not compromise on them. Many investigators, however, begin with values of confidence and precision which they have not completely evaluated. The investigators examine the implications of different sample sizes and ask themselves questions like, "Would data that was only this accurate be useful?" and "Would results at these confidence and precision values be convincing?"
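One way to examine those implications is to turn the calculation around: for each sample size that seems feasible, work out the precision it would give and how often work would be interrupted. The sketch below does this for the two-week study in this example. It rests on assumptions rather than on the sample size program itself: a normal approximation with z = 1.65 for 90% confidence, a task proportion of 20%, and a study period of two weeks of five twelve-hour days. The function names are hypothetical.

```python
import math

# Two weeks, five days a week, twelve hours a day (7am to 6:59pm).
STUDY_MINUTES = 2 * 5 * 12 * 60


def precision_for(n, proportion=0.20, z=1.65):
    """Half-width of the interval given by n observations (assumed formula)."""
    return z * math.sqrt(proportion * (1.0 - proportion) / n)


def minutes_between(n, study_minutes=STUDY_MINUTES):
    """Average spacing between interruptions over the study period."""
    return study_minutes / n


for n in (100, 174, 300, 484):
    print("n={:4d}  precision=+/-{:.1%}  one interruption per {:.0f} min".format(
        n, precision_for(n), minutes_between(n)))
```

Laid out this way, the trade-off the director faced is visible at a glance: each gain in precision is paid for in more frequent interruptions.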


this page is at http://testbed.cis.drexel.edu/sample/example1.html