Definition of Collections, Standards and Procedures for Retrospective Digitisation

and started a joint project, with a window of 3 years (1998-2001), for extension of enduser services. The project will focus on integrating existing and new local and central services into one integrated enduser service. The libraries will, as always, concentrate on content, while Pica will concentrate on the technical infrastructure.
The primary goal of DELTA is the implementation of a Virtual Research Library via an integrated package of enduser services for the use of resources selected and offered by research libraries, using state-of-the-art web technology combined with existing library infrastructures.
To reach this goal the project will focus on:
• ongoing co-operative selection of relevant content (for digitising, licensing etc.);
• ongoing monitoring of user behaviour to improve services;
• development of an integrated enduser-oriented information search, order and access facility;
• development of an underlying integrated technical infrastructure;
• development of a clear, multilingual graphical enduser interface.
The DELTA project as a whole consists of 16 subprojects. This paper deals with two subprojects on retrospective digitisation (Selection, and Standards and procedures), which proved to be so intertwined that they were combined during the development phase. Selection depends on standards and procedures for judgements about the format and nature of the proposed digital product and about how it will be described, delivered and archived. The items 'archiving' and 'publishing' are not explicitly mentioned in the overall DELTA project plan, but will also be given attention during the project. Digital storage and retrieval technologies and industries are not, and for the foreseeable future will not be, sufficiently stable to ensure adequate preservation and conservation of electronic data. This should of course not prevent libraries from starting and continuing digitisation projects. As long as open, flexible standards are used, the information can be reformatted later and adapted to newly emerging technologies. However, before I describe the development of a common working procedure for retrospective digitisation, I will give a brief impression of current preservation programmes in the Netherlands.

The Current Situation
Although many libraries are digitising small parts of their collections, wide access to complete digitised items is still far away. Levels of expertise and experience differ significantly between libraries. However, this is changing, especially through Metamorfoze, an initiative of the Netherlands Ministry of Education, Culture and Science, co-ordinated by the National Preservation Office of the Netherlands (BCB) at the Koninklijke Bibliotheek. Metamorfoze 1 is a national programme for the preservation of library material, which focuses on manuscripts, books, newspapers and periodicals of Dutch origin from the 1840-1950 period in libraries with a preservation function. This material, which is an important part of our national cultural heritage, is threatened by the internal decay of paper. Libraries in the Netherlands are themselves responsible for the preservation of their collections, but a considerable part of the costs is subsidised by the Ministry through the Metamorfoze programme.
Metamorfoze will take a threefold approach in the years 1997-2000:
• Literary collections by and about Dutch authors: 20% of all literary collections.
• Dutch book production of the years 1870-1899, because paper from this period is in the worst condition of all: a total of some 66,000 books.
• Newspapers: national dailies from the post-1869 period and the Dutch newspapers from World War II.
The focal points in the first four years of Metamorfoze are registration, filming and reliable storage. Every document to be preserved is catalogued in the shared automated cataloguing system of Pica. The original document is then transferred to microfilm. After filming, the original is stored in acid-free sleeves or boxes under optimum conditions. To prevent a printed document from being filmed more than once, the microfilms are registered centrally in the EROMM database 2, a European system for the input and retrieval of descriptions of microforms, managed by the University Library of Göttingen, Germany. The next logical step is to improve access to these collections by digitising them, preferably making use of the microfilms already produced. Within the scope of the project the intention is to digitise, in collaboration with other libraries, parts of the literary collections or of the Dutch book production of 1840-1950, in order to improve access for the user.
The report of the project Digitisation of Microfilms 3, which was carried out in the framework of Metamorfoze, recently became available, and this summer a comparative study of direct scanning and the use of colour microfilm will be finished. The DELTA project can build on the results of these studies, and on best practices and guidelines from other projects in the field.
The Ministry also gave the impetus to investigate the digital infrastructure of the Dutch cultural heritage. This research was carried out by the Scientific Technical Council (WTR), the main advisory board of SURF (Foundation for University Computing Facilities), and resulted in the report Alles uit de kast [= Everything out of the closet] (1998) 4. This report gives an outline of a national plan for the digitisation of cultural heritage collections. These studies form the basis for setting up a joint programme for a virtual library.

DELTA SUBPROJECT 1 & 2: DEFINITION OF COLLECTIONS, STANDARDS AND PROCEDURES FOR RETROSPECTIVE DIGITISATION
At this moment most of the larger libraries in the Netherlands are starting to digitise parts of their collections. Special collections and image collections are the most popular material to make available on the Internet. Libraries do not yet collaborate in digitisation. As a consequence, a common or co-ordinated approach to digitisation is lacking. The main aim of this (sub)project was to develop a working procedure for retrospective digitisation that can be used in each library and between libraries. This means defining standards and logistic procedures for digitisation of the collections of the DELTA libraries. A common approach will improve the efficiency of production and maintenance and the mutual use of each other's digitised collections. It will also improve interoperability and the exchange of expertise, and it will provide the basis for combining dispersed collections in a virtual digital library. Studying best practices in the field can help to find the best way to digitise collections. The Koninklijke Bibliotheek has drawn up a document describing its guidelines and procedures for digitisation, which will be used for DELTA as well 5. It provides details on hardware and software, formats, compression ratios, storage etc.
However, before any decision can be taken with regard to technical standards and techniques, collections have to be selected for digitisation. In the first phase of the project selection criteria were developed. Of course, some criteria are the same as for the selection of printed documents (scientific relevance, scholarly quality, frequency of use, suitability for education and research, fit with the collection profile etc.), but selection for digitisation has to be framed in a much broader context than the local collections. It must be seen in a more holistic, co-operative way, such as selection by discipline (the core literature, based on priorities set by many scholars in that discipline), geography (country or region), genre, chronological period, agency of publication, language, and various combinations of these. This project can be seen as a national co-operative plan or model for the selection of documents for preservation and for the provision of enhanced access through digital conversion.

Assumptions and Preconditions
It is assumed that all DELTA-libraries will participate in Selection; other academic libraries can be invited to participate, in case of missing items, etc.
It is also assumed that the potential user group will be substantial and that use of the material will increase, because digitisation:
• offers broader access, which can facilitate more active scholarship;
• can enhance the intellectual value of the material;
• can stimulate the use of older material in education and research;
• offers better search facilities.
Within the Virtual Library Netherlands the common way to provide access is via the existing access tools of the Netherlands Central Catalogue (NCC), the Online Contents (OLC) database 6, and PiCarta 7, an integrated, multimaterial database offering access to online resources and electronic documents. The same means of access are assumed and will be used for this project.
It is a precondition that, where the selected materials are in copyright, the rights can be cleared with the publisher. If permissions can be secured, the work can move ahead. If permissions are not forthcoming for copyrighted sources, however, the materials cannot be reproduced and the focus of the project must change.

The Selection Process
Selection for digitisation is much more complicated than selecting printed material, CD-ROMs or online databases. The conversion of textual, visual, and numeric information to electronic form, from preparation and conversion to presentation and archiving, encompasses a range of procedures and technologies with widely varying implications and costs. It is therefore very important to know why, what and how you want to digitise.

Why:
• Digital surrogates can provide improved access to information. They can make the remote accessible and the hard-to-see visible, unite a dispersed collection into a coherent whole, and bring together research materials that are scattered world-wide.
• Digital resources can lead to new scholarly use. The richness of special collections as research tools lies, among other things, in the representation of an event or phenomenon in many different formats. Co-ordinated collection building is very important, because only a critical mass of research resources provides added value.
• Digital resources can better direct students to scholarly resources and aid in developing information literacy.
• Flexibility or plasticity of the digitised material. This can, however, cause problems with the authenticity and integrity of the material. Marking images and time-stamping them can be seen as a possible solution.

What:
• Decisions about what to digitise need to address the intellectual value of the original sources:
• Will digitisation enhance the intellectual value? Scholarship can be facilitated when texts are made fully searchable by rekeying them or by employing OCR software. Digitised ephemera from different repositories, for example, will facilitate combinations and comparisons that are otherwise made impossible by the fragility, value and dispersion of the original images.
• Copyright assessments play a defining role with regard to digitising projects:
• Are the materials protected by copyright?
• Must the rights be negotiated with the copyright holder?
• What about distribution rights for the digital products?
• What are the potential costs of gaining those permissions?
• The physical nature, size and condition of source materials affect the characteristics of the desired product. Decisions to digitise must therefore address whether available means of conversion can satisfy expectations for the result.
• Which qualities are considered essential: high-resolution copies, accurate rendition of colours etc.?
• Can the original sources withstand the digitisation process?
• Will digitisation increase the utility?
• Will it enable new kinds of teaching or research?
• The types and levels of use can help to shape priorities, but intensive use does not automatically make a collection a good candidate for digitising:
• Will improved access lead to higher use of digitised material?
• Will digitisation, or the broader access it provides, create a new audience?
• The way the source materials are used:
• Will digitised material permit new kinds of use and more sophisticated types of analysis?
• What approach to digitisation will facilitate use, increase utility, and enable new kinds of teaching or research?

How:
• The technology to be used depends upon the desired resolution of the copies, the nature of the source materials (colour, black and white, or shades of grey), and the future use of the images; all three bear on the parameters chosen for image capture.
• Some projects may be done most economically if they are contracted out. This means that technical conditions, performance expectations, and handling guidelines must be specified in the agreements with external vendors, which must also define ownership and distribution rights for the digital products.
• Beforehand it has to be decided how the digital file will be described (metadata), delivered (immediate access, remote access, authorisation, pricing and billing procedures) and retained (long-term preservation intentions).
The project group suggested to the DELTA Steering Committee three options for a digitisation project that could be of interest to the research community:
1. to digitise a number of complete runs of historically interesting scholarly Dutch journals, with JSTOR as model 8;
2. to digitise text corpora, for example interesting material from the second part of the Golden Age: documents of well-known scholars in the Dutch Republic, such as Christiaan Huygens (1629-1695), Herman Boerhaave (1668-1738), etc.;
3. to digitise Dutch bibliographies to enrich the indexing for retrieval.
On 10 November 1998 the Steering Committee opted for the first option, i.e. to select a few authoritative Dutch journals for digitisation "à la JSTOR", journals which may also be of interest to JSTOR.

JSTOR AS MODEL?
Rather than contacting JSTOR directly, we studied the JSTOR project 9 to see whether it could serve as a guide. As a result it was decided, for several reasons, not to follow its procedures. The overall aim of JSTOR is to serve the academic community by building a reliable and comprehensive electronic archive of core scholarly journals, thereby reducing the long-term capital and operating costs of libraries (storage and care of journal collections), addressing preservation issues (such as mutilated pages and long-term deterioration of paper copy), and improving access to the literature (by assisting publishers in making the transition to electronic modes of publication). The way JSTOR works with publishers, the argument of saving space (and the capital costs associated with that space), its choices of technology, and its organisation will all be different in the Netherlands:
• Publishers: JSTOR obtains the physical copies of the titles directly from the publishers; libraries have just a complementary role in the case of missing or damaged issues. In the Netherlands all the copies will be obtained from the libraries.
• Archive: JSTOR's backfile archive consists of the complete runs of journals, digitised cover to cover. In the Netherlands we opted to digitise only the articles, for reasons of feasibility 10.
• Copyright: the publisher holds the copyright of the digitised images, while JSTOR has a permanent but non-exclusive licence on the digitised archive to make it available to the scholarly community. The academic institutions are offered site licences permitting access to the archive for faculty, staff and students registered with the institutions. In the Netherlands the publisher has no rights in articles digitised by a third party. It has to be explored whether or not the publisher holds all the copyrights in the printed material. The publisher could play the role of clearing house. A pilot study has to establish whether the publisher is willing, against payment of a certain amount per article, to give protection against copyright claims on the digitised articles.
• Technology: JSTOR's choices of technology have flowed directly from its mission. It has chosen to scan the backfiles, combine the scanned images with an OCR-created text file, and make them centrally available to libraries and scholars via the Internet, with a few copies at mirror sites. The images, scanned at 600 dpi as faithful replications of previously published materials, are used for display and for printing, and the text files are used for searching. This is an expensive enterprise and is not financially feasible in the Netherlands, because there are only 15 academic libraries, whereas nearly 300 institutes and libraries participate in JSTOR.
• Organisation: JSTOR is a non-profit organisation with a production staff of 21 members who ensure that the addition of new material continues in a timely and accurate fashion. In the Netherlands the choice is for a central, co-ordinating bureau offering technical, organisational and legal expertise. The co-ordinating bureau drafts the guidelines and contracts, and offers support during the whole process. The participating libraries are responsible for the material they want to digitise. This means that all the work is done by the libraries themselves.

THE PROCEDURE
1. Identifying appropriate criteria for selection of journals of potential national interest.
2. Making an inventory of scholarly core journals which meet these criteria.
3. Contacting the publisher(s) to reach an agreement that is in the interest of both parties.
ad 1) Criteria for selection of journals of Dutch origin:

1. Criteria concerning content:
• international reputation and use,
• preferably polyglot or in another language,
• relatively high use, broad user group,
• no reference journals, annuals or annual reports.

2. External factors:
• older than 25 years and still running,
• no reprint available,
• different types of journals (illustrated, colour and/or black and white, only text),
• different kinds of publishers (big commercial publisher, university press, scholarly associations).

Online resource since 1997-...Kluwer Academic Publishers.
These titles sound very Dutch, but at least some of the articles are in English or in another language.
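
The checkable part of the selection criteria can be expressed as a simple filter. The following is only a minimal sketch: the `Journal` fields, the kind labels, the reference year 1998 and the sample titles are all assumptions made for illustration, not project data. Qualitative criteria such as international reputation and relatively high use still need human judgement and are left out; the language preference is treated as a soft criterion used only for ordering.

```python
from dataclasses import dataclass

# Journal types excluded by the content criteria (labels are illustrative).
EXCLUDED_KINDS = {"reference journal", "annual", "annual report"}


@dataclass
class Journal:
    title: str
    first_year: int          # year of first publication
    still_running: bool
    reprint_available: bool
    kind: str                # e.g. "scholarly journal", "annual"
    languages: tuple         # language codes of the articles, e.g. ("nl", "en")


def is_candidate(journal, reference_year=1998, min_age=25):
    """Hard criteria: older than min_age years, still running,
    no reprint available, and not an excluded type of publication."""
    older_than_min = reference_year - journal.first_year > min_age
    return (
        older_than_min
        and journal.still_running
        and not journal.reprint_available
        and journal.kind not in EXCLUDED_KINDS
    )


def is_preferred(journal):
    """Soft criterion: polyglot, or published in a language other than Dutch."""
    return len(journal.languages) > 1 or "nl" not in journal.languages


# Hypothetical sample records, invented for this sketch.
candidates = [
    Journal("Example Journal A", 1900, True, False, "scholarly journal", ("nl", "en")),
    Journal("Example Yearbook B", 1900, True, False, "annual", ("nl",)),
    Journal("Example Journal C", 1985, True, False, "scholarly journal", ("en",)),
]

# Keep journals that pass the hard criteria; list preferred ones first.
shortlist = sorted(
    (j for j in candidates if is_candidate(j)),
    key=lambda j: not is_preferred(j),
)
```

A filter like this only narrows the inventory; the final choice among the remaining titles, including the spread over journal types and kinds of publishers, remains an editorial decision.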

Simple Hybrid Model
On the basis of a cost-benefit analysis of the relationship between functionality, demand, and expense, we decided to start with a simple hybrid model: creating preservation-standard microfilms and scanning articles for digital access purposes. In this way we hope to make the project more acceptable to funding agencies.
The amount of physical preparation and control work needed for every digital project is rather large. Most of the costs occur before the item is laid on the scanner. Part of that cost is the physical preparation of, research into, and description of the item.
The Workflow:
• the journal is taken out of stock;
• it is checked for completeness, and missing items or pages are supplied (if necessary from another library);
• it is registered for digitisation in the catalogue (optional);
• technical conditions, performance expectations, and handling guidelines for contractors (400/600 dpi, depending on the format and typeface of the journal) are specified;
• the journal is microfilmed, the microfilm serving as an intermediate medium (filming is done in-house or by a commercial business, cover to cover because of the preservation purpose, and to archival quality);
• a copy is made of the microfilm master; the copy is used for digitising, while the master film is stored;
• the microfilm is digitised (in-house or contracted out to a commercial business);
• only the articles are digitised (no editorials, reviews, advertisements, news items etc.);
• the contents are keyed for indexing purposes;
• the table of contents file (in HTML or XML format) and the page images (in PDF format) are written to CD-ROM (in case of outsourcing);
• the files are stored on the local servers of the participating libraries and on the server of the Koninklijke Bibliotheek in the context of its DNEP (Deposit of Dutch Electronic Publications) service;
• the articles are registered in the central catalogue and the OLC database, and are accessible via OLC and PiCarta.
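
The step that produces the table of contents file can be illustrated with a small sketch. It assumes the XML variant and invents its own element names (`issue`, `article`, `title`, `author`, `pages`, `file`), record fields and sample data; the project's actual file layout is not specified here. The sketch keeps only items keyed as articles, in line with the workflow, and links each entry to its PDF page-image file.

```python
import xml.etree.ElementTree as ET


def build_toc(journal_title, volume, year, items):
    """Return an XML table of contents listing only the articles of one issue.

    Each item is a dict with the keys: type, title, author,
    first_page, last_page and pdf (file name of the page images).
    """
    root = ET.Element(
        "issue",
        {"journal": journal_title, "volume": str(volume), "year": str(year)},
    )
    for item in items:
        # Editorials, reviews, advertisements, news items etc. are skipped.
        if item["type"] != "article":
            continue
        article = ET.SubElement(root, "article")
        ET.SubElement(article, "title").text = item["title"]
        ET.SubElement(article, "author").text = item["author"]
        ET.SubElement(article, "pages").text = (
            f'{item["first_page"]}-{item["last_page"]}'
        )
        ET.SubElement(article, "file").text = item["pdf"]
    return ET.tostring(root, encoding="unicode")


# Hypothetical keyed contents of one issue, invented for this sketch.
sample_items = [
    {"type": "article", "title": "An example article", "author": "A. Author",
     "first_page": 1, "last_page": 14, "pdf": "v12_p001.pdf"},
    {"type": "review", "title": "An example review", "author": "B. Reviewer",
     "first_page": 15, "last_page": 16, "pdf": "v12_p015.pdf"},
]

toc_xml = build_toc("Example Journal", 12, 1950, sample_items)
```

Because the keyed contents are the only searchable part of the result in this model, the same records could in principle also feed the registration of the articles in the central catalogue and the OLC database.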
The advantage of a project like this is its high feasibility. The disadvantage, however, is that the result offers limited added value compared with the printed version: every authorised user can print the articles he or she needs, wherever his or her workplace is located, but the articles themselves cannot be (cross-)searched. Only the online contents can be searched. It is assumed that the co-ordinating bureau will raise the funds for digitising the journal. Financing is expected to be split 50-50 between the Dutch government and the participating libraries.

Organisation
A central co-ordinating bureau will be established to offer technical, organisational and legal expertise. The bureau will draft the guidelines and contracts, and will offer support during the whole process. The Koninklijke Bibliotheek is willing to take on this role.
Each of the DELTA partners will be responsible for the microfilming and digitising of a journal according to the drafted criteria and guidelines. Each partner has to decide whether to film and/or digitise in-house or to contract the work out to a commercial business. Each partner is responsible for requesting quotations from commercial firms, for co-ordinating the work and for checking the files. Each partner will also be responsible for archiving its own share; this will be linked up with the DNEP service of the Koninklijke Bibliotheek.

Recommendations
To gain better insight into the whole process and the real costs, it is recommended to start with a pilot involving one journal from one publisher. The DELTA Steering Committee will decide which of the 5 selected journals it will be, which library will carry out the pilot, and how the costs will be shared. The Koninklijke Bibliotheek is willing to start with Oud-Holland.

To Conclude
This paper is not only a description of the development of a project plan, but can also be seen as an example of the rather complicated process of selection for digitisation. A collection manager can no longer be seen as solely responsible for the collections and the process of collection management, but must at least have some knowledge of technical standards and procedures for digitisation, and has to be concerned with matters such as copyright and licensing, the potential costs of gaining the necessary permissions, etc. Or, to quote Demas: "Through an intellectual sound process involving hundreds of scholars in each discipline, we can identify a critical mass of content which is clearly of enduring significance to scholarship. We can then work with our colleagues in preservation, access, technology, law, and publishing to find and implement the best format in which to deliver and preserve these qualitatively selected bodies of literature."

[Figure; surviving labels: Co-ordinating bureau, Advice, KB, Maastricht, Bijdragen TLV]
Costs, Independent of the Journals Selected