Critical Elements in Weeding An Elementary School Collection:

A Random Sampling Approach


Diane M. Oesau

M. Carl Drott


The statistical method of random sampling was used to examine characteristics such as age, circulation, and physical condition of books in a sample portion of an elementary school collection in order to draw conclusions about the collection as a whole before a weeding plan was established. The method, analysis procedures, and findings used to determine criteria for the de-acquisition of seldom-used or out-of-date materials are explained.


There is little disagreement about the need for conscientious weeding of unused materials from any school library collection. Media specialists often point to the need for more efficient and aesthetic use of limited available space. Out-dated information must be removed and subject areas deserving a greater percentage of acquisition funds identified. In addition, weeding studies indicate that enhanced browsability occurs as a result of weeding and that this is a factor in increasing circulation (Slote, 1982). If collections in schools are expected to meet the changing information requirements of a dynamic curriculum, then a systematic approach to weeding must become a part of overall collection development strategies.

Although there appears to be consensus on the need to weed, there is none on how best to do it, or what evidence should be collected in preparation to weed. Many collection evaluation approaches are not well-suited to school media centers. They often take on an entire collection at once, focus on examination of each individual title, and require that a knowledgeable professional make subjective judgements about each item's potential utility to users.

For some media specialists, a weeding policy means tossing out volumes when they begin to crumble, or creating space by removing the Nth duplicate copy of a single title. Many weeding strategies cannot be characterized in terms of the objective results that are needed to justify essential budget requests for additional materials funds. School library media specialists need to know what constitutes "usefulness" in their collections. Are materials meeting the curricular and recreational needs of students? Are materials circulating? If so, how frequently? Does the information in the books available reflect current thought? What is the condition, age and utility of the collection? These are the types of evaluative questions media specialists must ask before embarking on any comprehensive weeding plan. A recent monograph by Doll and Barron (1991) describes, in greater detail than this article, weeding policy considerations and the use of random sampling.

This article describes how a statistical method known as random sampling was used to examine characteristics such as age, circulation, and physical condition of books in a school collection in order to draw conclusions about the collection as a whole before a weeding plan was established. It shows how a media specialist employed this technique. Findings were used to determine initial cut-off criteria for the de-acquisition of seldom-used and out-of-date materials, and to develop criteria and an approach for future collection development. The data collected by sampling was used to explain the need for weeding to the principal and district administration. The fact that the weeding plan was based on objective data allowed the media specialist to make a strong case for the need for special funding to undertake the task of weeding unused items and replacing out-of-date materials.


Sampling is a research technique which assumes that values or characteristics that are true for a statistically-drawn portion of a population hold true for the population as a whole. Sampling is often used when the entire population is too big, or too costly, to examine in its entirety. It is a mathematical way of reducing the task of finding out what the current collection is "really like" to a manageable size.

Statistical theory supports the concept of sampling. It allows the researcher, in effect, to control how the sample is to be drawn. By choosing values specific to the study at hand, the researcher estimates how many items in the sample will be needed to represent the entire collection, and how reliable the results will be. This "tailor-made" aspect makes sampling a useful tool for media specialists who must implement an evaluation method within the time and staffing constraints of a busy school library media center schedule.

A possible semantic misunderstanding may need to be cleared up. The word "random" often connotes something haphazard or chaotic. Random sampling is neither of these. Sampling is said to be random only in that it affords every item in the population an equal opportunity for selection into the sample. In other words, every book in the monograph collection will have the same chance of being chosen as part of the group to represent the entire collection.

Samples are designed in terms of their risk of error. Random sampling allows the researcher to choose a sample which keeps the possibility of error within acceptable limits. The researcher controls the magnitude of error by choosing how many members of the population to sample. A sampling study considers how much error to accept and then calculates a sample size based on these error parameters.


The elementary school that was the focus of this study is located in a suburb of a major metropolitan area. Three-hundred fifty-seven students are enrolled in grades kindergarten through six. The school also serves four Special Education classes: three classes of trainable mentally retarded students, and one class of emotionally-disturbed students ranging in age from 8 to 12 years. The school library media center collection is expected to support curricular and recreational use on the part of all students and teachers.

The media center's monograph collection contains approximately 12,000 volumes representing 8,500 unique titles. The circulating collection is divided into fiction and nonfiction sections, as well as a section designated "easy". Nonfiction titles are classified and organized according to the Dewey Decimal system. The circulation system is not automated at the present time. The school employs one professional Library Media Specialist, who is supported by a group of 12 adult volunteers. Four different individuals have held this position since the school opened in 1968. As far as the present media specialist knows, the collection has never been thoroughly or systematically weeded.

There is a unique history to this collection which makes evaluation and weeding especially difficult. In the late 1970's and early 1980's, six elementary schools throughout the district closed due to declining enrollments and shifts in budgetary priorities. Book collections from these schools were re-distributed to other schools. Naturally, some of these orphaned volumes found their way into the collection discussed here. However, no adequate records were kept as to which volumes were added and when. Thus the contents of some sections of the collection may reflect the vagaries of apportionment rather than considered selection decisions.

Though the overall physical condition of the books appears to be good, some materials are obviously dated. A cursory examination of titles in the nonfiction section - books about ethnic groups in America, energy, space exploration and computer technology - reveals terms that are well out of date. The copyright dates for these books, some from the 1950's and 1960's, support the assumption of datedness. Nevertheless, nonfiction materials are used conscientiously for projects and other class work, as well as for leisure browsing. Most bookshelves are tightly packed, making it difficult for students to browse effectively. This crowded condition may be affecting collection use.

The media specialist believes that age may be a factor in material use. She wants to look at the age and circulation histories of books in the sample to see how these characteristics relate to each other, and to the physical condition of the collection. Based on this information she will develop a weeding strategy which defines the characteristics of materials to be considered for removal from the collection. Since the collection is too large to examine in its entirety within a reasonable period of time, a random sample of titles can be used to efficiently describe collection characteristics. For this particular study, it was decided to limit the analysis to that portion of the collection that is allowed to circulate. The random sample will be drawn using the shelf list catalog. Reference, nonprint, and periodical materials will not be included.


Preparation for the actual drawing of the sample involves two steps: calculating the sample size and organizing a set of random numbers so that they can correspond to specific items in the collection. A method for performing these steps manually has previously appeared in the literature and will not be repeated here (Drott, 1969). In this case the media specialist had access to a computer program which automates the process at a considerable saving in time and effort.

To prepare to calculate the sample size it was necessary to consider how the information gathered about the collection was going to be used. The first user was the media specialist herself. She wanted to know how old the materials in various parts of the collection were, how recently the materials had been used, and what the overall condition of the collection was.

The media specialist is very familiar with the collection and unlikely to be misled by a non-representative sample. What the media specialist does need is some quantitative estimates. If a weeding rule were to eliminate books more than twenty years old that hadn't circulated in the last five years, how many books would that be? If it was a goal to repair one quarter of the books that are in poor condition, how many books would have to be repaired?

A second important group of constituencies for accurate information about the collection is the administration, the school board, and ultimately the community. In the first place, any discarding of library materials is likely to be disturbing to those who are not used to thinking about media centers beyond the physical presence of volumes on the shelves. To such a view, any reduction in quantity must also be a diminution in quality. Members of the community who are involved in planning and decision making in their daily work routinely expect that proposals will be supported by objective information in addition to individual professional judgement. A proposal for media center collection restructuring which can present both of these aspects is likely to be more attractive than one that is less broadly supported.

A computer program which calculates the accuracy of the results for any given sample size was used (Drott, 1992). After examining various alternatives a sample size of 432 was decided upon. With this sample size it is fairly certain that the error caused by using a sample rather than looking at the whole collection is less than 4%. For example, 32% of the books in the sample were found to be specifically designated as "easy reading." For the selected sample size there is less than one chance in twenty that the fraction of "easy reading" books in the entire collection is less than 28% or more than 36%.
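As a rough check on the interval quoted above, the usual normal approximation for a sample proportion can be applied. This is a minimal sketch under that assumption; it is not necessarily the calculation performed by the cited program, and the function names are illustrative only.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Half-width of an approximate 95% confidence interval for a proportion.

    Uses the normal approximation; z=1.96 corresponds to one chance in twenty.
    """
    return z * math.sqrt(p * (1.0 - p) / n)

def confidence_interval(p, n, z=1.96):
    """Return (low, high) bounds around the sample proportion p."""
    moe = margin_of_error(p, n, z)
    return (p - moe, p + moe)

# 32% of the 432 sampled books were designated "easy reading".
low, high = confidence_interval(0.32, 432)
print(f"{low:.0%} to {high:.0%}")  # prints: 28% to 36%
```

Under this approximation the half-width for the "easy reading" figure comes out at a little over four percentage points, which rounds to the 28% and 36% bounds given in the text.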


If the purpose of our study is to collect data from a sample of books, then our first problem is to find a way to pick the books for our sample. Experience with random sampling has shown that going directly to the shelves is very difficult. Fat books are easier to select than thin books. Books on mid-level shelves catch the eye more readily than books on high or low shelves. Books that are in circulation don't have a chance of being selected. As a result, most collection sampling projects pick books from a card file, preferably the shelf list.

The easiest way to select from a shelf list is to select specific cards by measuring. The mechanics of this process are described below. In order to select specific cards, the random sampling program was instructed that for each card to be picked it should select a drawer, then a number of inches, and finally a number of sixteenths of an inch to complete the measurement. Formally this could be specified as:

Category 1--drawers numbered 1 to 20 (the number of drawers in the shelf list)

Category 2--inches numbered 0 to 15 (the depth in inches of the deepest drawer)

Category 3--sixteenths numbered 0 to 15 (completion of the depth measurement)

The result is a list of 432 sample points such as these:

Drawer 1 inch 12 sixteenths 8
Drawer 4 inch 7 sixteenths 12
Drawer 9 inch 6 sixteenths 10

For convenience in drawing the sample, the sample points were sorted by the computer.
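The three-category specification above can be sketched in a few lines of code. This is an illustration only, not the cited sampling program: the function name and the `seed` parameter are hypothetical, and drawing distinct points is an assumption made here so that no measurement is repeated.

```python
import random

SIXTEENTHS_PER_INCH = 16

def draw_sample_points(n_points, n_drawers=20, max_inches=15, seed=None):
    """Return n_points distinct (drawer, inch, sixteenth) tuples, sorted."""
    possible = n_drawers * (max_inches + 1) * SIXTEENTHS_PER_INCH
    if n_points > possible:
        raise ValueError("more sample points requested than distinct positions")
    rng = random.Random(seed)
    points = set()
    while len(points) < n_points:
        drawer = rng.randint(1, n_drawers)                   # Category 1
        inch = rng.randint(0, max_inches)                    # Category 2
        sixteenth = rng.randint(0, SIXTEENTHS_PER_INCH - 1)  # Category 3
        points.add((drawer, inch, sixteenth))
    # Sorting by drawer, then depth, lets each drawer be measured in one pass.
    return sorted(points)

points = draw_sample_points(432, seed=1)
for drawer, inch, sixteenth in points[:3]:
    print(f"Drawer {drawer} inch {inch} sixteenths {sixteenth}")
```

Sorting the tuples reproduces the convenience noted in the text: all measurements for a drawer appear together, front to back.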


A true measurement of shelf list cards is necessary to preserve the integrity of the sample. Before measuring, all paper clips, rubber bands and guide cards were removed from the drawer and replaced with self-stick note papers for easy re-filing. The first drawer was placed on a table and the metal rod removed. The volunteer drawing the sample laid a ruler along the top of the cards, starting with the first card rather than the front of the drawer, and measured in to the first number given in the sample description list. The corner of that card was tilted slightly but not removed. (Removing it would have skewed all subsequent measurements for this drawer.) When all possible measurements for a drawer had been made, the tilted cards were banded together. Then the metal rod was replaced and the drawer was marked "done." If a particular media center were to prefer not to remove the shelf list cards, the information from the selected cards could be transcribed at the drawer.

It is important to note here that not all measurements given in the sample may be possible for all drawers in the catalog. Some drawers may not contain as many cards as others, or some cards may be for non-monographic or non-circulating materials. Measurements in the random sample list that did not apply to any given drawer were simply crossed out. The program was later used to generate a supplemental list of numbers as needed, never replicating a number given in the initial list. In this study the task of measuring was performed by the media specialist and two part-time adult volunteers. It took approximately two hours to complete the sample of 432 cards.

Once the shelf list sample was identified, the corresponding volumes were pulled from the shelves. Six student volunteers performed this phase of the project. They worked on the task for no more than 30 minutes a day, during the recess half of their lunch hour. They were rewarded with a pizza party when the data collection was completed three weeks later.

A book cart of supplies was set up so that the task of pulling the books from the shelves could be simplified. The cart contained: sharpened pencils, blank bibliographic tally sheets, a folder for completed tallies, pulled shelf list cards banded together in groups of five, and two small boxes labelled DONE and SNAGS. ("Snags" were cards representing books that are not on the shelves. These were later searched in the circulation file, repair station and temporary classroom collections of media center materials.) Students were directed to pull and tally no more than five books at a time. There were good reasons for this. Students were less likely to become bored or overwhelmed with the task ahead of them, and they were better able to concentrate and remain alert to "snags" and other irregularities. They could see the entire task completed - from locating books to recording data to replacing books on the return cart - at least once each day. As they watched the DONE box fill up they gained a sense of accomplishment. For this reason, the media specialist enlisted students who might benefit from participation in a project such as this: students in need of self-esteem building, involvement in a team effort, or simply reinforcement of library skills.

The data collected from each book in the sample included the following items:

Call number
Title
Author
Copyright date
Last two circulation dates
Physical condition

The first three items uniquely identified each volume in the sample. Copyright dates revealed the age of the books. The last two circulation dates, when held against the date of data collection, determined an overall circulation history for the sample. The physical condition of the collection at large was judged by the condition of the sample.


The data for each book was written onto bibliographic tally sheets by the student volunteers and then transcribed into a computer spreadsheet to simplify the data analysis. A spreadsheet is simply a program that helps in building tables. When a new spreadsheet is started it looks like a blank page that has been ruled into rows and columns. For this study, each row of the spreadsheet represented the information on one book. The first column had the call number, the second column the book title, the third column the author, the fourth the copyright date, and so on for all of the information collected. Information on the next book began on the next line of the table.

For this study, one of the most powerful features of the spreadsheet was its ability to sort the data in different ways. The first thing that the media specialist did was to sort the data by call number. This sorting placed all the books with Dewey call numbers (nonfiction) at the top of the list, followed by all of the "Easy" books whose call numbers started with "E" and then the fiction whose call numbers started with "F." With the data in this order, it was a simple matter to count the number of books in nonfiction, easy, and fiction (the counting is simple because the rows of the table are numbered) and to produce Table I.

Table I
Distribution of Sample by Broad Classification
Classification Percentage of sample
Nonfiction 48%
Fiction 20%
Easy 32%
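The sort-and-count step behind this table can also be expressed directly in code. The sketch below assumes, as the text describes, that nonfiction call numbers begin with a Dewey digit, easy books with "E" and fiction with "F"; the sample call numbers are invented for illustration.

```python
from collections import Counter

def broad_class(call_number):
    """Classify a call number as Nonfiction, Easy, or Fiction by its first character."""
    first = call_number.strip()[0].upper()
    if first.isdigit():
        return "Nonfiction"
    if first == "E":
        return "Easy"
    if first == "F":
        return "Fiction"
    return "Other"

def percentage_by_class(call_numbers):
    """Return the rounded percentage of the sample in each broad class."""
    counts = Counter(broad_class(c) for c in call_numbers)
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}

# Hypothetical call numbers, not taken from the study.
sample = ["523.4 ASI", "E SEU", "F CLE", "398.2 GRI", "E KEA"]
print(percentage_by_class(sample))
```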

The distribution of sampled books by broad classification offers an initial look at the distribution of books in the collection. As Table I indicates, approximately half of the collection is made up of nonfiction and one-fifth is fiction above the Easy level. The remainder are the easy reading books. This was the first time that the media specialist had ever had confirmation of her own personal impressions of the composition of the collection and the first time that she had a specific description that she could share with the principal and the teachers.

Since the collection had never been systematically evaluated or weeded, the media specialist was also curious about where the subject emphasis in the nonfiction section had been placed over the years. Since the first part of the spreadsheet was already sorted by Dewey Classification, it was a simple matter to count the number of books in each category and create Table II.

Table II
Proportion of Nonfiction Titles by Dewey Classification
Dewey Classification Proportion of Titles
000 2%
100 2%
200 3%
300 18%
400 2%
500 27%
600 9%
700 14%
800 4%
900 19%
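Grouping nonfiction call numbers into Dewey main classes amounts to truncating the class number to its hundreds. A minimal sketch, assuming each call number begins with a three-digit Dewey number (the example call numbers are hypothetical):

```python
from collections import Counter

def dewey_hundreds(call_number):
    """Map a call number such as '523.4 ASI' to its main class label, '500'."""
    return f"{int(call_number[:3]) // 100 * 100:03d}"

def proportion_by_dewey(call_numbers):
    """Return the fraction of nonfiction titles in each Dewey main class."""
    counts = Counter(dewey_hundreds(c) for c in call_numbers)
    total = sum(counts.values())
    return {cls: counts[cls] / total for cls in sorted(counts)}

# Hypothetical nonfiction call numbers, not taken from the study.
print(proportion_by_dewey(["523.4 ASI", "567.9 DIN", "398.2 GRI", "973 USA"]))
```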

This table indicates that the collection emphasis is in pure science (27%) and history (19%). The titles in these two areas combined account for almost half (46%) of all nonfiction titles in the sample. In contrast, titles in technology make up less than one-tenth of nonfiction books examined. This raises questions for the media specialist regarding curriculum support. She must ensure that acquisition priorities match existing curricular emphases. This information not only lets her focus her attention on subject areas that seem underrepresented in the collection, it gives her an objective basis for opening discussions with teachers about how these underrepresented areas could be better served.

The quantitative data does not have to stand by itself in obtaining a better picture of the collection. The media specialist scanned the titles of the sample books in the Dewey 300's. It became apparent to her that most of the books in the sample which are classified in the 300's are fairy tales or folk tales. Based on her knowledge of the social science curriculum she knew that there is a need in this classification for materials that support the requirement that students find out more about their communities and those who work in them, e.g., police, fire and postal employees and their services, as well as the changing characteristics of their families. Because so little of this social science material appeared in the sample, the media specialist will consult with the faculty to further evaluate this part of the collection. In this case, the statistics of the sample were combined with professional judgement and a knowledge of the curriculum to identify a possible weakness of the collection.


The spreadsheet was used to examine the ages of the books in the sample by first sorting on the copyright date column. The publication dates were divided into ten year groups and the number of books in each group was counted. The results are shown in Table III.

Table III
Collection by Publication Date
Before 1951 10%
1951-60 17%
1961-70 45%
1971-80 22%
1981 and after 6%
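The ten-year grouping of copyright dates can be sketched as follows. The group boundaries mirror the rows of the table; the list of years is invented for illustration.

```python
from collections import Counter

# (last year in group, label), in chronological order.
GROUPS = [
    (1950, "Before 1951"),
    (1960, "1951-60"),
    (1970, "1961-70"),
    (1980, "1971-80"),
]

def publication_group(year):
    """Return the ten-year group label for a copyright year."""
    for last_year, label in GROUPS:
        if year <= last_year:
            return label
    return "1981 and after"

def group_percentages(years):
    """Return the percentage of books falling in each publication group."""
    counts = Counter(publication_group(y) for y in years)
    return {label: 100 * n / len(years) for label, n in counts.items()}

# Hypothetical copyright years, not taken from the study.
print(group_percentages([1948, 1965, 1965, 1985]))
```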

Forty-five percent of the collection was published in the 1960's, with 27% published before 1961 and 28% published within the last twenty years. One problem with looking at data on the age of materials in the collection is that it is hard to know what is good and what is bad. The large fraction of older materials may simply reflect the effects of the library mergers discussed above. Further, just being twenty years old is not necessarily bad. This is especially true for fiction. The indication that only 6% of the collection is from the last decade is of some concern, stirring memories of tight acquisition budgets and rapidly increasing prices. But age of material should not be used to view the collection without a consideration of how well student needs are met.

To take a different look at the age of the collection, a table was prepared which looked only at the nonfiction materials. The Dewey books were sorted first by class and then by date within each class. The percentage of books in each class that were less than twenty years old was recorded in Table IV.

Table IV
Fraction of Dewey Collection Less than 20 Years Old
Class % Within 20 years
000 100%
100 75%
200 33%
400 33%
500 27%
600 35%
700 33%
800 25%
900 17%
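The per-class currency figures come from sorting by class and then by date. An equivalent calculation is sketched below; the `collection_year` parameter and the sample records are assumptions for illustration, since the text does not state the exact date of data collection.

```python
from collections import defaultdict

def recent_fraction_by_class(books, collection_year, max_age=20):
    """Fraction of books in each Dewey main class that are under max_age years old.

    books is an iterable of (call_number, copyright_year) pairs whose call
    numbers begin with a three-digit Dewey number.
    """
    totals = defaultdict(int)
    recent = defaultdict(int)
    for call_number, copyright_year in books:
        dewey_class = int(call_number[:3]) // 100 * 100
        totals[dewey_class] += 1
        if collection_year - copyright_year < max_age:
            recent[dewey_class] += 1
    return {cls: recent[cls] / totals[cls] for cls in sorted(totals)}

# Hypothetical records, not taken from the study.
books = [("523.4 ASI", 1958), ("567.9 DIN", 1985), ("973 USA", 1962), ("952 JAP", 1966)]
print(recent_fraction_by_class(books, collection_year=1991))
```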

This table raises real concern about the ability of the nonfiction portion of the collection to meet educational needs. Of books in the Physical Sciences (Dewey class 500), only 27% were published since 1970. These include books about atomic energy, dinosaurs, and the planets. Thirty-five percent of books in the Applied Sciences (class 600) - books on health and computer technology - are 20 years old or younger. In the area of History and Geography (900), covering countries like Japan and the Soviet Union, and collected biographies of important world figures, only 17% are that current. Most of the sampled titles in the Social Science subject area are classed as folk tales and fairy tales: 398. Age is not believed to be an imperative criterion for determining the usefulness of these books. However, some titles in the Dewey 300's are not of this genre - books on such timely subjects as communication, transportation, waste recycling and the military - and this must be considered in evaluating this class. Overall, this data strongly suggests that, in addition to weeding the collection, it will be necessary to seek special funding to obtain more current material in many nonfiction areas.


Examination of the circulation histories of sampled books reveals to the media specialist how recently these materials have been borrowed. The study of circulation took advantage of the ability of the spreadsheet to calculate using dates. A new column was created called "Years Since Last Circulation." In this column the number of days since the last circulation was calculated by subtracting the circulation date from the date of data collection. This result was divided by 365 days to give the number of years and the result was rounded up to the next higher whole year. Thus an entry of 2 years in this new column means the last circulation occurred more than one year ago but less than two years. This column was then sorted and a count of the number of items which fell into each of the groups was recorded in Table V.

Table V
Time Since Last Circulation
Less than 1 year 35%
1 to 3 years 19%
3 to 5 years 13%
5 to 10 years 17%
More than 10 years 17%
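The "Years Since Last Circulation" column described above follows a simple recipe: subtract the dates, divide by 365, and round up. The sketch below reproduces that recipe and the grouping used in the table; the function names and the example dates are illustrative assumptions.

```python
import math
from datetime import date

def years_since_last_circulation(last_circulated, collected_on):
    """Days between the two dates, divided by 365 and rounded up to a whole year."""
    days = (collected_on - last_circulated).days
    return math.ceil(days / 365)

def circulation_group(years):
    """Map a rounded-up year count to the groups used in the table."""
    if years <= 1:
        return "Less than 1 year"
    if years <= 3:
        return "1 to 3 years"
    if years <= 5:
        return "3 to 5 years"
    if years <= 10:
        return "5 to 10 years"
    return "More than 10 years"

# Hypothetical dates: a book last circulated in January 1989, examined in June 1991.
years = years_since_last_circulation(date(1989, 1, 1), date(1991, 6, 1))
print(years, circulation_group(years))  # prints: 3 1 to 3 years
```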

This table shows that over one-third of the sample has circulated within the last year. By adding the first three percentages, it can be seen that two-thirds of the sample has circulated within the last five years. The sample shows that the collection is both heavily used and widely used. If two-thirds of the collection has been used in the last five years, the students, and the teachers who evaluate what the students have learned, must be finding the collection to be broadly useful. This should, of course, be tempered by the earlier finding that high school students were relatively insensitive to the age of the materials which they were using (Mancall, 1979). Any weeding plan must recognize this fundamental usefulness and use of the collection.


The question about the condition of the collection has been postponed until the other aspects of the collection were examined. It makes more sense to talk about repairs to books after we have some idea of what part of the collection will be removed by weeding. The analysis above suggests that a weeding cut-off of materials published before 1961 might be appropriate. To see how such a cut-off would affect the need for book repairs the pie chart "Condition of Collection by Age" was created. This chart shows that 90% (68% published in 1961 or after plus 22% published before 1961) of the collection is in good condition while 10% (6% published in 1961 or after plus 4% published before 1961) is in need of repair or replacement.

As one might expect, the older books account for a disproportionate number of the books in poor condition. That is, books published before 1961 are 26% (22% good plus 4% poor) of the collection but they account for 40% (4% out of 10%) of all books in poor condition. The approximate size of the repair problem now becomes apparent. If all books published before 1961 were eliminated and all published after that date were kept, a fairly simplistic assumption, then 6% of the collection will need repairs. Assuming that the circulating collection is about 9,000 volumes, 540 books will need to be sent out for repair. (Poor condition was defined as requiring repairs beyond in-library mending.) Rather than discarding all material prior to 1961, it is likely that some fraction will be saved while on the other hand some post-1960 materials in poor condition will be discarded. Balancing these two effects might suggest that a repair budget aimed at bringing all of the retained portion up to serviceable condition should provide for the repair of 500 to 600 volumes. Given the labor involved in locating worn materials and preparing them to be sent out, this cost should probably be spread over several years.
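The arithmetic in this paragraph can be laid out step by step. The percentages are those read from the chart as reported in the text, and the 9,000-volume figure is the text's own approximation; the variable names are illustrative.

```python
# Percentages of the sample, keyed by (publication period, condition).
condition_pct = {
    ("1961 or after", "good"): 68,
    ("before 1961", "good"): 22,
    ("1961 or after", "poor"): 6,
    ("before 1961", "poor"): 4,
}
CIRCULATING_VOLUMES = 9000  # approximate size assumed in the text

# Share of the collection published before 1961 (good plus poor).
older_share = sum(pct for (age, _), pct in condition_pct.items() if age == "before 1961")

# Share of all poor-condition books that were published before 1961.
poor_total = sum(pct for (_, cond), pct in condition_pct.items() if cond == "poor")
older_share_of_poor = condition_pct[("before 1961", "poor")] / poor_total

# Volumes needing repair if only books from 1961 or after are kept.
repairs_needed = condition_pct[("1961 or after", "poor")] * CIRCULATING_VOLUMES // 100

print(older_share, older_share_of_poor, repairs_needed)  # prints: 26 0.4 540
```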

In summary, the collection is mostly in good physical condition. Circulation figures show that much of the collection has been recently used. There is cause for concern about the age of some parts of the nonfiction collection and in a broader sense about the age of the collection in general. Between 10% and 20% of the collection is both old and not recently used, the exact figure depending on how "old" and "not recent" are specified. These volumes are good candidates for weeding.


1. Establish a Special Task Force of teachers and administrators to evaluate collection areas identified as special problems.
These areas include the Dewey 500's and 600's and part of the 300's as identified above. The task force needs to evaluate materials for currency and relevance to the present curriculum. Most important, the task force should work with the media specialist in assembling a proposal to the School Board for special funding to update critical collection areas. This proposal should include both quantitative and qualitative assessments of the current collection areas.

2. Determine an initial cut-off point for weeding considerations and establish a weeding plan.
Identify, as candidates for weeding, all books over 20 years of age that have not circulated for over 10 years. Selecting this initial cut point is a reasonable first start for bringing this collection into line with reality. Data collected in this study indicates that books bearing these characteristics constitute 16% of the random sample. Given our confidence in the accuracy of the sample, roughly the same percentage of books will be identified in the entire collection. This is a number that can be justified to administrators and to the school community. Adult volunteers can perform this task quickly and easily over a reasonable period of time. Of the titles weeded it is predicted that 57% will be non-fiction, 31% fiction and about 11% will be drawn from the easy fiction.

3. Weed!
Outdated books, especially those in subject areas where currency is critical, must be replaced with accurate, up-to-date information. Removal of these volumes will make available much needed shelf space, thereby improving the students' browsing capability.

4. Once candidates have been identified and pulled from the shelves, call upon classroom teachers to examine the books to determine if the information contained in them needs to be replaced with more recent books, or if the subject is no longer covered in the present curriculum.
Teachers and the media specialist together can begin to select titles which support current curriculum implementation.

5. Discard obsolete materials according to district policy guidelines.
At this point the media specialist involved in this study is unaware of any existing guidelines at the district level. She intends to suggest the development of such guidelines with her current supervisor and principal, in concert with building-level teaching colleagues.

6. Plan for ongoing collection evaluation and development.
Consult regularly with classroom teachers about curriculum changes and needs. Attend grade-level meetings. Learn what kinds of materials teachers prefer to have available in the Library Media Center: print or nonprint formats, subject-specific periodicals, games or other realia, etc.

7. Look into the availability of inter-library loan services, both intra-district and through the local public library network.
Use ILL to meet the need for current materials in subjects that are not well represented in the collection at this time.
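The initial cut-off rule in recommendation 2 (over 20 years of age and not circulated for over 10 years) is simple enough to state as a filter. The sketch below is illustrative only: the `Book` record, its field names, and the sample data are hypothetical, and the thresholds are left as parameters so that other cut-offs can be tried.

```python
from dataclasses import dataclass

@dataclass
class Book:
    call_number: str
    copyright_year: int
    years_since_last_circulation: int

def is_weeding_candidate(book, current_year, min_age=20, min_idle_years=10):
    """True if the book is older than min_age and idle for more than min_idle_years."""
    age = current_year - book.copyright_year
    return age > min_age and book.years_since_last_circulation > min_idle_years

def candidate_fraction(books, current_year):
    """Fraction of the sample that meets the cut-off rule."""
    hits = [b for b in books if is_weeding_candidate(b, current_year)]
    return len(hits) / len(books)

# Hypothetical sample records, not taken from the study.
sample = [
    Book("523.4 ASI", 1958, 12),  # old and long idle: candidate
    Book("F CLE", 1955, 1),       # old but recently used: keep
    Book("E SEU", 1985, 2),       # recent and used: keep
    Book("973 USA", 1962, 15),    # old and long idle: candidate
]
print(candidate_fraction(sample, current_year=1991))  # prints: 0.5
```

Because the rule requires both conditions, an older title that still circulates is never flagged, which matches the article's point that age alone is not a sufficient criterion.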

Weeding is not an activity performed in isolation. It is a professional response to the changing needs of users as reflected by the collection. As such, it is incumbent upon school library media specialists to become as knowledgeable about the approaches to the deselection of materials as they are about their selection. A systematic approach, such as the one suggested in this study, can provide the necessary evidence for aggressive weeding action.


1. Doll, Carol A. and Pamela Petrick Barron (1991) Collection Analysis for the School Library Media Center: A Practical Approach. Chicago: American Library Association.

2. Drott, M. Carl (1969) "Random Sampling: a Tool for Library Research." College and Research Libraries v 30 n 2 p119-125.

3. Drott, M. Carl (1992) Dr. Drott's Random Sampler: A Manual and Program for Automated Random Sampling in Libraries, in preparation.

4. Mancall, Jacqueline C. and M. Carl Drott (1979) "Materials Used by High School Students in Preparing Independent Study Projects: A Bibliometric Approach." Library Research 1 p223-236.

5. Slote, Stanley (1982) Weeding Library Collections. 2nd ed. Littleton, CO: Libraries Unlimited.


1. Accession or acquisition numbers can indicate when materials were purchased and integrated into a collection. Since the media specialist wants to determine the age of the materials themselves, copyright dates will serve this purpose. The problem with copyright dates is that they date the edition, not the work. With nonfiction this is not much of a problem, but for fiction materials, especially classics, copyright dates may be far different from the date of the work.

2. Obviously the dates collected were due dates -- two weeks later than circulation dates. Since the concern here is for measuring in terms of years this difference has no practical effect.
