Definition of Collections, Standards and Procedures for Retrospective Digitisation

Trix Bakker

This paper is the result of (literature) research and of expertise acquired in digitisation projects at the Koninklijke Bibliotheek (KB). The project described is part of the DELTA project, a joint project of Dutch university libraries and Pica, which focuses on integrating existing and new local and central services into one integrated end-user service. The aim of this (sub)project is to establish a common working procedure within and between libraries. This should improve interoperability and the exchange of expertise, and provide the basis for combining dispersed collections in a virtual digital library. During the project, selection criteria were developed, five scholarly core journals of international reputation and use were selected, an organisation model was developed and costs were estimated. It was decided to work according to a simple hybrid model, i.e. microfilming and digitisation of articles. The backfiles will be scanned at 400-600 dpi and made available in PDF, and the tables of contents in HTML. The articles will be catalogued in the central Pica database and will be accessible via the Online Contents (OLC) database and PiCarta, an integrated, multimaterial database offering access to online resources and electronic documents. Copyright will be cleared individually with the Dutch publishers. Part of the project is to develop draft licence agreements with different kinds of publishers. Archiving will be integrated into the DNEP (Deposit of Dutch Electronic Publications) service of the Koninklijke Bibliotheek.

THE DELTA PROJECT

A group of Dutch university libraries, together with Pica, the centre for library automation and online information services, decided last year to develop a new generation of integrated services for end users. These services should enable end users to access and use a Virtual Research Library. For this purpose the partners established a consortium called DELTA (Dutch Electronic Library Technology Association). They also started a joint three-year project (1998-2001) to extend end-user services. The project will focus on integrating existing and new local and central services into one integrated end-user service. The libraries will, as always, concentrate on content, while Pica will concentrate on the technical infrastructure.

The primary goal of DELTA is to implement a Virtual Research Library through an integrated package of end-user services. These services give access to resources selected and offered by research libraries, using state-of-the-art web technology combined with existing library infrastructures.

To reach this goal the project will focus on:

The DELTA project as a whole consists of 16 subprojects. This paper deals with two subprojects on retrospective digitisation (Selection, and Standards and procedures). These proved to be so intertwined that they were combined during the development phase. Selection depends on standards and procedures in order to make judgements about the format and nature of the proposed digital product, and about how it will be described, delivered and archived. ‘Archiving’ and ‘publishing’ are not explicitly mentioned in the overall DELTA project plan, but they will also be given attention during the project. Digital storage and retrieval technologies and industries are not, and for the foreseeable future will not be, sufficiently stable to ensure adequate preservation and conservation of electronic data. This should of course not prevent libraries from starting and continuing digitisation projects. As long as open, flexible standards are used, the information can be reformatted later and adapted to new technologies as they emerge. Before describing the development of a common working procedure for retrospective digitisation, however, I will give a short impression of current preservation programmes in the Netherlands.

The Current Situation

Although many libraries are digitising small parts of their collections, wide access to complete digitised items is still far away. Libraries also differ significantly in their level of expertise and experience. This is changing, however, especially through Metamorfoze [1]. Metamorfoze is an initiative of the Netherlands Ministry of Education, Culture and Science, co-ordinated by the National Preservation Office of the Netherlands (BCB) at the Koninklijke Bibliotheek. It is a national programme for the preservation of library material held in libraries with a preservation function. It focuses on manuscripts, books, newspapers and periodicals of Dutch origin from the period 1840-1950. This material, an important part of our national cultural heritage, is threatened by the internal decay of paper. Libraries in the Netherlands are themselves responsible for the preservation of their collections, but a considerable part of the costs is subsidised by the Ministry through the Metamorfoze programme.

Metamorfoze will provide for a threefold approach in the years 1997-2000:

The focal points in the first four years of Metamorfoze are registration, filming and reliable storage. Every document to be preserved is catalogued in the shared automated cataloguing system of Pica. The original document is then transferred to microfilm. After filming, the original is stored in acid-free sleeves or boxes under optimum conditions. To prevent a printed document from being filmed more than once, the microfilms are registered centrally in the EROMM database [2]. EROMM is a European system for the input and retrieval of descriptions of microforms, managed by the University Library of Göttingen, Germany. The next logical step is to improve access to these collections by digitising them, preferably from the microfilms already produced. Within the scope of the project, the intention is to work in collaboration with other libraries. Together they would digitise parts of the literary collections, or of the Dutch book production 1840-1950, to improve access for users.
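The duplicate check that a central register makes possible can be sketched as follows. This is a minimal Python illustration of the idea, not the EROMM interface. The class `MicroformRegister`, the function `select_for_filming` and the use of a plain string identifier as the key are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MicroformRegister:
    """Hypothetical stand-in for a central register such as EROMM.

    Records map a bibliographic identifier (an assumption for this sketch)
    to the library that holds the master microfilm.
    """
    records: dict[str, str] = field(default_factory=dict)

    def is_filmed(self, identifier: str) -> bool:
        return identifier in self.records

    def register(self, identifier: str, holding_library: str) -> None:
        self.records[identifier] = holding_library


def select_for_filming(candidates: list[str], register: MicroformRegister,
                       library: str) -> list[str]:
    """Return the candidates that still need filming and record them.

    Titles already in the register are skipped, so no document is filmed
    twice; a title listed twice in `candidates` is also filmed only once.
    """
    to_film = []
    for identifier in candidates:
        if register.is_filmed(identifier):
            continue  # a master microfilm already exists elsewhere
        register.register(identifier, library)
        to_film.append(identifier)
    return to_film
```

For example, if `"id-001"` is already registered by another library, `select_for_filming(["id-001", "id-002"], reg, "KB")` returns only `["id-002"]`.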

The report of the project Digitisation of Microfilms [3], carried out within the framework of Metamorfoze, recently became available. This summer a comparative study of direct scanning versus the use of colour microfilm will be completed. The DELTA project can build on the results of these studies, and on best practices and guidelines from other projects in the field.

The Ministry also initiated an investigation into the digital infrastructure of the Dutch cultural heritage. This research was carried out by the Scientific Technical Council (WTR), the main advisory board of SURF (Foundation for University Computing Facilities). It resulted in the report Alles uit de kast [= Everything out of the closet] (1998) [4], which outlines a national plan for the digitisation of cultural heritage collections. These studies form the basis for setting up a joint programme for a virtual library.

DELTA SUBPROJECT 1 & 2: DEFINITION OF COLLECTIONS, STANDARDS AND PROCEDURES FOR RETROSPECTIVE DIGITISATION

At this moment most of the larger libraries in the Netherlands are starting to digitise parts of their collections. Special collections and image collections are the most popular material to make available on the Internet. There is as yet no collaboration between libraries in digitisation, and as a consequence a common or co-ordinated approach is lacking. The main aim of this (sub)project was to develop a working procedure for retrospective digitisation that can be used within each library and between libraries. This means defining standards and logistic procedures for the digitisation of the collections of the DELTA libraries. A common approach to digitisation will improve the efficiency of production and maintenance, and make it easier to use each other's digitised collections. It will also improve interoperability and the exchange of expertise, and provide the basis for combining dispersed collections in a virtual digital library. Studying best practices in the field can help to find the best way to digitise collections. The Koninklijke Bibliotheek has drawn up a document describing its procedures and guidelines for digitisation, which will be used for DELTA as well [5]. It provides details on hardware and software, formats, compression ratios, storage, etc. Before any decision can be taken on technical standards and techniques, however, the collections to be digitised have to be selected. In the first phase of the project, selection criteria were developed. Some criteria are of course the same as for the selection of printed documents: scientific relevance, scholarly quality, frequency of use, suitability for education and research, fit with the collection profile, etc. Selection for digitisation, however, has to be framed in a much broader context than the local collections.
It must be seen in a more holistic, co-operative way, such as selection by discipline (the core literature, based on priorities set by many scholars in that discipline), geography (country or region), genre, chronological period, agency of publication, language, and various combinations of these. This project can be seen as a national co-operative plan or model for selection of documents for preservation and for provision of enhanced access through digital conversion.

Assumptions and Preconditions

It is assumed that all DELTA libraries will participate in the selection. Other academic libraries can be invited to participate, for instance in case of missing items.

It is also assumed that the potential user group will be substantial and the use of the material will increase, because digitisation

Within the Virtual Library Netherlands, access is commonly provided via the existing access tools. These are the Netherlands Central Catalogue (NCC), the Online Contents (OLC) database [6] and PiCarta [7], an integrated, multimaterial database offering access to online resources and electronic documents. The same means of access is assumed for this project.

It is a precondition that, where the selected materials are under copyright, permission can be obtained from the publisher. If permission is secured, the work can move ahead. If permission is not forthcoming for copyrighted sources, however, the materials cannot be reproduced and the focus of the project must change.

The Selection Process

Selection for digitisation is much more complicated than selecting printed material, CD-ROMs or online databases. Converting textual, visual and numeric information to electronic form involves every stage from preparation and conversion to presentation and archiving. It encompasses a range of procedures and technologies with widely varying implications and costs. It is therefore very important to know why, what and how you want to digitise.

Why:

What:

How:

The project group suggested to the DELTA Steering Committee three options for a digitisation project that could be of interest to the research community:

  1. to digitise a number of complete runs of historically interesting scholarly Dutch journals, with JSTOR as a model [8];

  2. to digitise text corpora, for example interesting material from the latter part of the Golden Age, such as documents of well-known scholars in the Dutch Republic: Christiaan Huygens (1629-1695), Herman Boerhaave (1668-1738), etc.;

  3. to digitise Dutch bibliographies to enrich the indexing for retrieval.

On 10 November 1998 the Steering Committee opted for the first option, i.e. to select a few authoritative Dutch journals for digitisation “à la JSTOR” that could also be of interest to JSTOR.

JSTOR AS MODEL?

Rather than actually contacting JSTOR, we studied the JSTOR project [9] to see whether it could serve as a guide. As a result, it was decided, for several reasons, not to follow its procedures. The overall aim of JSTOR is to serve the academic community by building a reliable and comprehensive electronic archive of core scholarly journals. This reduces libraries' long-term capital and operating costs for the storage and care of journal collections. It also addresses preservation issues, such as mutilated pages and long-term deterioration of paper copies. Finally, it improves access to the literature by assisting publishers in making the transition to electronic modes of publication. Several aspects would all be different in the Netherlands: the way JSTOR works with publishers, the argument of saving space (and the capital costs associated with that space), its choices of technology, and its organisation:

THE PROCEDURE

  1. Identifying appropriate criteria for selection of journals of potential national interest.

  2. Making an inventory of scholarly core journals, which meet these criteria.

  3. Contacting the publisher(s) and reaching an agreement that is of interest to both parties.

ad 1) Criteria for the selection of journals of Dutch origin:

1. Criteria concerning content:

2. External factors:

ad 2) On the basis of these criteria, subject specialists of the participating libraries drew up a list of titles. From this list, five titles were selected by the project group and approved by the Steering Committee:

  1. Oud-Holland. – Vol. 1 (1883)-… Rijksbureau voor Kunsthistorische Documentatie, 1986-… 4 x per year.
  2. Sociologia neerlandica. – Vol. 1 (1962/63)-… Van Gorcum, 1989-… 2 x per year.
  3. Bijdragen tot de taal-, land- en volkenkunde. – Dl. 1 (1853)-… KITLV, 1989-… 4 x per year.
  4. Economische Statistische Berichten. – Vol. 1 (1916)-… Stichting het Nederlandsch Economisch Instituut, 1959-… weekly.
  5. De economist. – Vol. 1 (1853)-… Stenfert Kroese, 1975-… 4 x per year. Online resource since 1997: Kluwer Academic Publishers.

These titles sound very Dutch, but at least some of the articles are in English or other languages.

Simple Hybrid Model

On the basis of a cost-benefit analysis of the relationship between functionality, demand and expense, we decided to start with a simple hybrid model. This model combines preservation-standard microfilms with scanned articles for digital access. In this way we hope to make the project more acceptable to funding agencies.

The amount of physical preparation and control work that is needed for every digital project is rather large. Most of the costs occur before the item is laid on the scanner. Part of that cost is the physical preparation of, research into, and description of the item.

The Workflow:

The advantage of this approach is its high feasibility. The disadvantage, however, is that the result offers limited added value compared with the printed version. Every authorised user can print the articles (s)he needs from wherever her/his workplace is located, but the articles themselves cannot be (cross-)searched. Only the tables of contents, via Online Contents, can be searched.

Estimate of the Costs, Independent of the Journals Selected

Microfilming:
Preparation
→ microfilming
→ cost of microfilm
→ quality control = ƒ 0,70 per photo (one double page) [11].

Digitising:
Digitising, including preparation, at ca. 400/600 dpi
→ keying of contents
→ cost of CD-ROM
→ quality control
→ storage on server = ƒ 2,00 per image.

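Microfilming and scanning together thus cost ƒ 2,35 per page: half of ƒ 0,70 per double page, plus ƒ 2,00 per image.
The variable cost is the description of the articles, at ƒ 2,10 per article, spread over very different numbers of pages. Some articles are 30 pages long while others are only 3 pages (ƒ 2,10 : 30 = ƒ 0,07 per page; ƒ 2,10 : 3 = ƒ 0,70 per page).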
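As a worked check of the per-page figures, the following Python sketch combines the unit costs quoted above. Amounts are written with decimal points, so ƒ 0,70 becomes `Decimal("0.70")`. The sketch assumes, as the estimate does, that the cost of one microfilm frame is split evenly over the two pages it holds. The function names are hypothetical.

```python
from decimal import Decimal

# Unit costs from the estimate, in Dutch guilders.
FILM_PER_DOUBLE_PAGE = Decimal("0.70")    # one microfilm photo holds two pages
SCAN_PER_PAGE = Decimal("2.00")           # one image per page
DESCRIPTION_PER_ARTICLE = Decimal("2.10") # cataloguing one article


def film_and_scan_per_page() -> Decimal:
    """Fixed part: half a microfilm frame plus one scanned image (0.35 + 2.00)."""
    return FILM_PER_DOUBLE_PAGE / 2 + SCAN_PER_PAGE


def cost_per_page(article_pages: int) -> Decimal:
    """Fixed part plus the article description spread over its pages."""
    if article_pages < 1:
        raise ValueError("an article has at least one page")
    return film_and_scan_per_page() + DESCRIPTION_PER_ARTICLE / article_pages


def cost_of_article(article_pages: int) -> Decimal:
    """Total for one article, computed without re-multiplying a rounded quotient."""
    if article_pages < 1:
        raise ValueError("an article has at least one page")
    return film_and_scan_per_page() * article_pages + DESCRIPTION_PER_ARTICLE
```

For a 30-page article this gives ƒ 2,42 per page, and for a 3-page article ƒ 3,05 per page. The description therefore adds roughly 3% to 30% to the basic ƒ 2,35.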

Other costs:
Copyright clearance
[Acquisition of server ca. ƒ 200.000,-]
Project management: 15-20%
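The server and storage costs depend on how large the page images are. The stored size of one scanned page can be roughly estimated from the page dimensions, the resolution (400 or 600 dpi) and the bit depth. In the Python sketch below, the page size, bit depth and compression ratio are illustrative assumptions, not figures from the KB guidelines. The function names are hypothetical.

```python
import math

MM_PER_INCH = 25.4


def page_image_bytes(width_mm: float, height_mm: float, dpi: int,
                     bits_per_pixel: int = 1,
                     compression_ratio: float = 1.0) -> int:
    """Approximate stored size of one scanned page, in bytes.

    bits_per_pixel=1 is a bitonal (black-and-white) scan, 8 is greyscale.
    compression_ratio is an assumed average (10 means 10:1); the real
    ratio depends on the file format and on the page content.
    """
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    raw_bytes = math.ceil(width_px * height_px * bits_per_pixel / 8)
    return math.ceil(raw_bytes / compression_ratio)


def collection_gigabytes(pages: int, bytes_per_page: int) -> float:
    """Total storage for a run of pages, in gigabytes (10**9 bytes)."""
    return pages * bytes_per_page / 1e9
```

Take a hypothetical 170 × 240 mm journal page scanned bitonally at 600 dpi. The call `page_image_bytes(170, 240, 600)` gives 2,845,838 bytes, about 2.8 MB before compression. This shows why the choice between 400 and 600 dpi, and the compression used, matter for the server budget.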

Estimated number of pages to be filmed/scanned and costs of the journals

1. Oud-Holland.

2. Sociologia neerlandica.

3. Bijdragen tot de taal-, land- en volkenkunde.

4. Economische Statistische Berichten.

5. De economist.

Total costs: ca. ƒ 675.000,-
BTW (VAT) 17,5%: ƒ 118.125,-
Project management ca. 15% (of costs incl. BTW): ƒ 118.969,-
Total, 5 journals: ƒ 912.094,-
  
Without Economische Statistische Berichten: ca. ƒ 450.000,-
BTW (VAT) 17,5%: ƒ 78.750,-
Project management ca. 15% (of costs incl. BTW): ƒ 79.313,-
Total, 4 journals: ƒ 608.063,-
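The budget totals follow from a short calculation. BTW at 17,5% is applied to the base costs, and project management is then taken as 15% of the costs including BTW. This is the reading under which ƒ 118.969 and ƒ 912.094 follow from ƒ 675.000. Below is a minimal Python sketch, with hypothetical function names and amounts rounded to whole guilders.

```python
from decimal import Decimal, ROUND_HALF_UP

VAT_RATE = Decimal("0.175")                # BTW 17,5%
PROJECT_MANAGEMENT_RATE = Decimal("0.15")  # ca. 15% of costs incl. BTW


def whole_guilders(amount: Decimal) -> Decimal:
    """Round to whole guilders, halves rounded up."""
    return amount.quantize(Decimal("1"), rounding=ROUND_HALF_UP)


def budget(base_costs: Decimal) -> dict[str, Decimal]:
    """Add BTW to the base costs, then project management on top of both."""
    vat = whole_guilders(base_costs * VAT_RATE)
    management = whole_guilders((base_costs + vat) * PROJECT_MANAGEMENT_RATE)
    return {
        "costs": base_costs,
        "vat": vat,
        "project_management": management,
        "total": base_costs + vat + management,
    }
```

Applied to the four-journal base of ƒ 450.000, the same rule gives ƒ 78.750 BTW, ƒ 79.313 project management and a total of ƒ 608.063.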

It is assumed that the co-ordinating bureau will raise the funds for digitising the journals. Financing is expected to be split 50-50 between the Dutch government and the participating libraries.

Organisation

A central co-ordinating bureau will be established to offer technical, organisational and legal expertise. The bureau will draft the guidelines and contracts and will offer support throughout the whole process. The Koninklijke Bibliotheek is willing to take on this role.

Each of the DELTA partners will be responsible for microfilming and digitising one journal according to the drafted criteria and guidelines. Each partner has to decide whether to film and/or digitise in-house or to contract the work out to a commercial firm. Each partner is responsible for requesting quotations from firms, co-ordinating the work and checking the files. Each DELTA partner will also be responsible for archiving its own share; this will be linked to the DNEP service of the Koninklijke Bibliotheek.

SCHEME OF THE ORGANISATION:

Recommendations

To gain better insight into the whole process and the real costs, it is recommended to start with a pilot involving one journal from one publisher. The DELTA Steering Committee will decide which of the five selected journals this will be, which library will carry it out and how the costs will be shared. The Koninklijke Bibliotheek is willing to start with Oud-Holland.

To Conclude

This paper is not only a description of the development of a project plan, but can also be seen as an example of the rather complicated process of selection for digitisation. A collection manager can no longer be seen as solely responsible for the collections and the process of collection management. (S)he must at least have some knowledge of technical standards and procedures for digitisation. (S)he also has to be concerned with matters such as copyright and licensing, the potential costs of gaining the necessary permissions, etc. Or, to quote Demas: “Through an intellectual sound process involving hundreds of scholars in each discipline, we can identify a critical mass of content which is clearly of enduring significance to scholarship. We can then work with our colleagues in preservation, access, technology, law, and publishing to find and implement the best format in which to deliver and preserve these qualitatively selected bodies of literature.” [12]

REFERENCES

1. English leaflet: http://www.konbib.nl/metamorfoze/publiciteit/folder_en.htm
2. EROMM database website: http://www.brzn.de/eromm/gbvero-e.htm
3. Ligthart, Anita. Digitalisering van microvormen: onderzoek naar de verschillende aspecten van de conversie van microvormen naar een digitale omgeving: een omgevingsverkenning [= Digitisation of microforms: a study of the various aspects of the conversion of microforms to a digital environment: an exploratory survey]. Den Haag: Koninklijke Bibliotheek, 1999. 35 p.
4. Adriaans, W. [et al.] red. Alles uit de kast: op weg naar een nationaal investeringsprogramma digitale infrastructuur cultureel erfgoed [= Everything out of the closet: towards a national investment programme for the digital infrastructure of cultural heritage]. Wetenschappelijke Technische Raad SURF, September 1998, 79 p.
5. Digital Collections KB: Policy Plan for Digitisation of Collections from the Koninklijke Bibliotheek (http://www.konbib.nl/kb/sbo/digi/digplnen.htm)
6. The Dutch union catalogue and Online Contents database: http://www.pica.nl/en/products/nccibl/index.shtml
7. PiCarta: http://www.pica.nl/en/products/picarta/index.shtml
8. JSTOR is a not-for-profit organisation established in 1995 with the assistance of The Andrew W. Mellon Foundation, with the objective of helping libraries address growing and persistent space problems. JSTOR is also the Journal Storage Project, created to provide access to the full text of articles from 117 titles, primarily in the social sciences and the humanities, by the end of 1999. JSTOR is beginning to archive titles in the General Science and Ecology/Botany clusters, owing to the importance of the historical literature. Over the next three years JSTOR plans to establish collections in Business, Medicine, Education and Art History.
9. The JSTOR web site (http://www.jstor.org/) and articles about JSTOR.
10. The journals will, however, be microfilmed from cover to cover for preservation/conservation reasons. This also provides the opportunity to digitise the remaining items at a later stage, should extra funding become available.
11. One Dutch guilder (ƒ 1,00) is about half a dollar.
12. Demas, Samuel. What will collection development do? Collection management, 22 (1998) 3/4, p. 158.

Selection of Collections

Adriaans, W. [et al.] red. Alles uit de kast: op weg naar een nationaal investeringsprogramma digitale infrastructuur cultureel erfgoed. Wetenschappelijke Technische Raad SURF, September 1998, 79 p.

Bowen, William G. JSTOR and the economics of scholarly communication. The Andrew W. Mellon Foundation journal storage project. (http://www.mellon.org/jsesc.html)

De Gennaro, R. JSTOR: The Andrew W. Mellon Foundation's journal storage project. Publications of Essen University Library 21 (1997), pp. 223-230.

Demas, Samuel. What will collection development do? Collection management, 22 (1998) 3/4, pp. 151-159. Also published in: Going digital: strategies for access, preservation, and conversion of collections to a digital format. Haworth Press, Inc., 1998, pp. 151-159.

Guthrie, Kevin M. JSTOR: the development of a cost-driven, value-based pricing model. Scholarly communication and technology. Conference organized by The Andrew W. Mellon Foundation at Emory University, April 24-25, 1997. (http://www.arl.org/scomm/scat/guthries.html)

Guthrie, Kevin M. JSTOR: from project to independent organization. D-Lib magazine, July/August 1997. (http://www.dlib.org/dlib/july97/07guthrie.html)

Hazen, Dan, Jeffrey Horrell, Jan Merrill-Oldham. Selecting research collections for digitization. Washington, D.C.: Council on Library and Information Resources, August 1998. 19 p.

Ligthart, Anita. Digitalisering van microvormen: onderzoek naar de verschillende aspecten van de conversie van microvormen naar een digitale omgeving: een omgevingsverkenning. Den Haag: Koninklijke Bibliotheek, 1999. 35 p.

JSTORnews (http://www.jstor.org/news/).

Ostrow, Stephen E. What to digitise: the questions to ask. Digitizing historical pictorial collections for the Internet. Washington, D.C.: Council on Library and Information Resources, 1998, pp. 26-27.

Parry, David. Virtually new, creating the digital collection: a review of digitalisation projects in local authority libraries & archives. Final report to the library & information commission. London: Library and Information Commission, 1998. 129 p. (http://www.ukoln.ac.uk/services/lic/digitisation/)

Selecting library and archive collections for digital reformatting: proceedings from an RLG symposium held November 5-6, 1995 in Washington, DC. Mountain View, Calif.: Research Libraries Group, 1996. x, 170 p.

Sully, Sarah E. JSTOR: an IP practitioner’s perspective. D-Lib magazine, January 1997. (http://www.dlib.org/dlib/january97/01sully.html)

Standards and Procedures

Arms, Caroline R. Historical collections for the National Digital Library: lessons and challenges at the Library of Congress. D-Lib Magazine, April 1996 (http://www.dlib.org/dlib/april96/loc/04c-arms.html) and May 1996. (http://www.dlib.org/dlib/may96/loc/05c-arms.html).

Digitalisierung von Archiv- und Bibliotheksgut. Deutsche Forschungsgemeinschaft: Projekt der Landesarchivdirektion Baden-Württemberg. (http://www.lad-bw.de/digpro/digpro.htm)

Kenney, Anne R. and Stephen Chapman. Digital resolution requirements for replacing text-based material: methods for benchmarking image quality: tutorial. Washington: The Commission on Preservation and Access, 1995. 22 p.

The making of America II testbed project white paper. Version 2.0 (September 15, 1998). (http://sunsite.berkeley.edu/moa2/wp-v2.html)

McClung, Patricia A. (ed.). RLG digital image access project: proceedings from an RLG symposium held March 31 and April 1, 1995 Palo Alto, California. Mountain View: Research Libraries Group, 1995. viii, 104 p.

Quality Review of Document Images. Internal Training Guide, National Digital Library Program, Library of Congress, September 1996.

Reilly, James M. and Franciska S. Frey. Recommendations for the evaluation of digital images produced from photographic, microphotographic, and various paper formats. Report to the Library of Congress, National Digital Library Project. May 1996. (http://lcweb2.loc.gov/ammem/lpireprt.pdf)

Task Force on archiving of digital information. Preserving digital information: final report and recommendations. Washington, D.C.: Commission on Preservation and Access. (http://www.rlg.org/ArchTF/)

Technical notes on formats for digital reproductions. American Memory. (http://lcweb2.loc.gov/ammem/award/html/technical_notes1.html)






Trix Bakker
Koninklijke Bibliotheek
National Library of the Netherlands
PO Box 90407
2509 LK The Hague, Netherlands
trix.bakker@konbib.nl




LIBER Quarterly, Volume 9 (1999), 305-322, No. 3