The European Digital Information Landscape:
How Can LIBER Contribute?

Paul Ayris
Chair, LIBER Access Division; Library Services, University College London
Gower Street, London WC1E 6BT, UK
p.ayris@ucl.ac.uk
Abstract

This paper looks at a snapshot of the current state of digitisation in the information landscape. It then looks at what LIBER can contribute to that landscape through portal development, funding, identifying and documenting best practice, lobbying at a European level, and managing the transition from paper to digital delivery, including the issue of digital preservation. The paper ends by trying to identify how the user will use the digitised resources which are increasingly being made available by libraries.

Key Words:
digitisation; Europeana; portal development; LIBER; information landscape

Digitisation in the Information Landscape

‘… Libraries have been moving from smaller digitization projects to mass digitization projects that will eventually make available whole collections, including millions of books. Funding agencies are supporting research and demonstration projects that aid libraries and cultural heritage institutions in better understanding digitization processes … All of this has taken place without a coherent body of policy to guide decision making …’ Digitization Policy Workshop, Chicago, April 2006.

This conclusion, from the Digitization Policy Workshop in Chicago in 2006, makes sobering reading. Has the move to mass digitisation of content resulted in the creation of a digital dustbin? How do users retrieve content and how are the relationships between different editions of printed works, or between text and related images in different repositories, expressed? What can LIBER do to help?

Portals

The Europeana portal is due to be launched in November 2008. Europeana — the European digital library, museum and archive — is a two-year project that began in July 2007. It will produce a prototype website giving users direct access to some two million digital objects, including film material, photos, paintings, sounds, maps, manuscripts, books, newspapers and archival papers. The digital content will be selected from that which is already digitised and available in Europe's museums, libraries, archives and audio-visual collections. The prototype aims to have representative content from all four of these cultural heritage domains, and also to have a broad range of content from across Europe.

Europeana wishes to work with designated national and/or domain-specific portals. It will harvest metadata from these portals and surface it in Europeana. Europeana will therefore present a single public interface to the user, directing the enquirer to the local repository where the desired digital object is stored. Domains contributing to Europeana need to:

For European national libraries, there is TEL (The European Library), a free service that offers access to the resources of the 48 national libraries of Europe in twenty languages. Resources can be either digital or bibliographical (books, posters, maps, sound recordings, videos, etc.). Currently The European Library gives access to 150 million entries across Europe, and the number of referenced digital collections is constantly increasing. Quality and reliability are guaranteed by the 48 collaborating national libraries of Europe. The libraries participating in The European Library are all members of the Conference of European National Librarians (CENL), a foundation which aims to increase and reinforce the role of national libraries in Europe. Members of CENL are the national librarians of the Council of Europe Member States.

European research libraries have no portal comparable to TEL from which metadata for digitised content can be harvested. There is an opening here for LIBER: it could create an aggregation service for member libraries that harvests the metadata for materials digitised in each member library and offers it to Europeana automatically, so that searchers of Europeana would discover the materials which LIBER member libraries have digitised.

LIBER already has expertise in this aggregation work through its work on the DART-Europe portal. DART-Europe is a partnership of research libraries and library consortia who are working together to improve global access to European research theses. DART-Europe is endorsed by LIBER as part of the work of the LIBER Access Division, and it is the European Working Group of the Networked Digital Library of Theses and Dissertations (NDLTD).

The DART-Europe architecture is helpful in determining what a LIBER-sponsored architecture for the discovery of digitised materials from LIBER member libraries should look like. The DART-Europe portal, represented schematically in Figure 1 below, comprises three layers. The top layer consists of the local or national partners who make available the full text of research theses. The middle layer is the local or national platform which houses the full text of the stored research theses, together with their metadata in simple Dublin Core. The storage platforms comprise institutional, regional and national repositories. The third layer is the DART-Europe portal itself, the aggregation service provided by the LIBER Access Division. The portal harvests the metadata from the storage layer and presents it to the user through a unified front end. Further details on the development of the portal are available in the article by Martin Moyle.


Figure 1: DART-Europe architecture.
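The middle layer of Figure 1 stores metadata in simple Dublin Core. As a rough illustration only (not the DART-Europe implementation), the sketch below builds a minimal oai_dc record for a thesis with Python's standard `xml.etree.ElementTree`. The namespace URIs are the standard OAI-PMH oai_dc and Dublin Core element namespaces; the function name `build_oai_dc` and every field value are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

# Standard namespaces for OAI-PMH's oai_dc format and the Dublin Core elements.
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)

# The fifteen elements of simple (unqualified) Dublin Core.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def build_oai_dc(fields: dict[str, list[str]]) -> ET.Element:
    """Return an <oai_dc:dc> element; every DC element is repeatable, so values are lists."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a simple Dublin Core element: {name}")
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return root


# Hypothetical thesis record: every value below is an invented placeholder.
record = build_oai_dc({
    "title": ["Travellers' Accounts of the Alps, 1750-1850"],
    "creator": ["Doe, Jane"],
    "type": ["Text"],
    "date": ["2007"],
    "identifier": ["http://repository.example.org/thesis/123"],
})
print(ET.tostring(record, encoding="unicode"))
```

Restricting names to the fifteen simple Dublin Core elements reflects the deliberately minimal metadata that makes cross-repository harvesting feasible.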

By extension, it is possible to extrapolate what an aggregator service which LIBER might construct to feed Europeana could look like. Whilst the underlying technology platform for the aggregation service might differ, the set of roles and relationships for European digitisation activity is, in essence, no different from the pattern which governs the development of the DART-Europe portal. A possible architecture for surfacing digitised materials from those of Europe's research libraries that are members of LIBER is given in Figure 2.

In Figure 2 below, the local and national platforms, the first two layers of the architecture, can remain the same as in the DART-Europe diagram. The DART-Europe portal of Figure 1 is replaced by the LIBER aggregation service, a dark aggregation of metadata that is not itself available for public consultation. Instead, this aggregation feeds the Europeana portal, which is the public space for resource discovery and retrieval by users wishing to find content. As in Figure 1, the OAI-PMH protocol is the linking technology which secures these relationships and delivers metadata records to the Europeana portal.


Figure 2: Possible architecture for a LIBER aggregation service.
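To make the harvesting step in Figure 2 more concrete, the sketch below shows how a dark aggregator might page through an OAI-PMH ListRecords sequence using resumption tokens. It is a minimal illustration under stated assumptions, not the aggregator UCL is to build: the endpoint URL, the fake response pages and the injected `fetch` parameter are hypothetical, only the oai_dc metadata format is requested, and error handling is reduced to treating a response without ListRecords as empty.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def harvest(base_url: str, fetch: Callable[[str], str]) -> Iterator[dict]:
    """Yield {'identifier', 'titles'} for each record in a ListRecords sequence.

    `fetch` maps a request URL to the XML response body. In practice it would
    wrap an HTTP client; here it is injected so the loop runs offline.
    """
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    while True:
        root = ET.fromstring(fetch(f"{base_url}?{urlencode(params)}"))
        page = root.find(f"{OAI}ListRecords")
        if page is None:  # e.g. an <error code="noRecordsMatch"/> response
            return
        for record in page.findall(f"{OAI}record"):
            header = record.find(f"{OAI}header")
            if header.get("status") == "deleted":
                continue  # deleted records carry no metadata
            yield {
                "identifier": header.findtext(f"{OAI}identifier"),
                "titles": [t.text for t in record.iter(f"{DC}title")],
            }
        token = (page.findtext(f"{OAI}resumptionToken") or "").strip()
        if not token:  # an absent or empty token marks the last page
            return
        # resumptionToken is an exclusive argument: send it with the verb only.
        params = {"verb": "ListRecords", "resumptionToken": token}


# --- Offline demonstration with two hypothetical response pages ---
def fake_page(records: list[tuple[str, str]], token: str = "") -> str:
    body = "".join(
        f"<record><header><identifier>{oai_id}</identifier></header><metadata>"
        f'<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" '
        f'xmlns:dc="http://purl.org/dc/elements/1.1/"><dc:title>{title}</dc:title>'
        f"</oai_dc:dc></metadata></record>"
        for oai_id, title in records
    )
    return (f'<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>'
            f"{body}<resumptionToken>{token}</resumptionToken></ListRecords></OAI-PMH>")


BASE = "http://repository.example.org/oai"  # hypothetical endpoint
PAGES = {
    f"{BASE}?verb=ListRecords&metadataPrefix=oai_dc":
        fake_page([("oai:example:1", "Alpine Journeys")], token="page2"),
    f"{BASE}?verb=ListRecords&resumptionToken=page2":
        fake_page([("oai:example:2", "Letters from the Grand Tour")]),
}
records = list(harvest(BASE, PAGES.__getitem__))
```

Against a live repository, `fetch` could be `lambda url: urllib.request.urlopen(url).read().decode("utf-8")`, and OAI-PMH's `from` argument would let the aggregator re-harvest only new or changed records before passing them on to Europeana.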
Why a Portal?

Portal technology is not the only technology which can be used to deliver digitised content to the user. Google itself crawls the web and provides a front-end search interface with comprehensive retrieval. The OCLC report “College Students’ Perceptions of Libraries and Information Resources”[1], a subset of OCLC's larger Perceptions report, shows that 89% of college students’ information searches begin with a search engine, with Google the overwhelming favourite (68%). Google is certainly a popular choice with students and with general users. However, there are issues with this approach, which is essentially one of ‘pile them high, sell them cheap’. Search engines return so many results that it is often difficult to find the correct or best entry among them. A search on the single word ‘Alps’ currently retrieves over 29,000,000 hits in Google in a mixture of material types and file formats. The phrase ‘Alps mountains’ retrieves just under 400,000 hits. In Google Scholar, the retrieval is 268,000 entries, mainly text-based articles. The scale of the retrieval and the jumbled order in which the results are presented to the end-user make such search engines confusing information spaces.

In terms of e-book retrieval, for example, what such an approach lacks is refinement. It would be useful for such searches to FRBRise the results, that is, to apply the Functional Requirements for Bibliographic Records model so that related items are grouped together. In terms of digitised content, it would be useful to bring together successive editions of a work, so that the user could tell which edition he/she was looking at. De-duplication of results would surely be essential: there is currently no coordination of digitisation work across continents, or even within individual countries, so the same work may well be digitised several times. In this scenario, what international mass digitisation needs are registries of persistent identifiers, and knowledge bases which store this information and can be invoked as part of resource discovery and retrieval.
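As a rough sketch of what FRBRising and de-duplicating results might involve, the example below groups search hits by a normalised author-and-title key (a crude stand-in for a FRBR work) and then by edition, so that repeated digitisations of one edition collapse into a single entry. It is an illustration under explicit assumptions: the `Hit` fields, the normalisation rule and the sample records are hypothetical, and a real service would rely on authority control and on the registries of persistent identifiers described above rather than on string matching.

```python
import re
import unicodedata
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One search result for a digitised book (all fields hypothetical)."""
    title: str
    author: str
    edition: str   # edition statement as catalogued
    source: str    # repository holding this digitised copy


def normalise(text: str) -> str:
    """Fold case, strip accents and collapse punctuation so variant headings compare equal."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return re.sub(r"[^a-z0-9]+", " ", no_accents.lower()).strip()


def frbrise(hits: list[Hit]) -> dict[tuple[str, str], dict[str, list[str]]]:
    """Group hits by work (normalised author + title), then by edition.

    Several sources under one edition signal that the same edition has been
    digitised more than once, so the user can be shown one entry per edition.
    """
    works: defaultdict = defaultdict(lambda: defaultdict(list))
    for hit in hits:
        work = (normalise(hit.author), normalise(hit.title))
        works[work][hit.edition].append(hit.source)
    return {work: dict(editions) for work, editions in works.items()}


hits = [
    Hit("A Tour of the Alps", "Smith, J.", "1st ed., 1790", "Library A"),
    Hit("A tour of the Alps.", "Smith, J", "1st ed., 1790", "Library B"),
    Hit("A Tour of the Alps", "Smith, J.", "2nd ed., 1795", "Library A"),
]
grouped = frbrise(hits)
# One work, two editions; the first edition has been digitised twice.
```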

Google and Google Scholar may provide quick solutions which retrieve large amounts of content, but current research is beginning to show that users find such mass retrieval confusing. The ‘Google Generation’ Report from UCL's School of Library, Archive and Information Studies has identified trends in information seeking behaviour which are challenging.[2] The research was undertaken by the CIBER Research Group at UCL (University College London). The Report found that the traits commonly associated with the ‘Google Generation’ are in fact common to all age groups. Young people:

The Report has received wide publicity and the findings are serious. In a Higher Education environment, the cultivation of critical awareness and the development of skills which will allow the user to make judgements about the academic value of individual pieces of information can rightly be said to be part of the core skills that students are expected to attain. If the ‘Google Generation’ Report is correct, then university libraries are faced with a challenge in embedding training in key skills into the curriculum.

One of the things which libraries can do to improve the tools available to the end-user is to embrace portal technology. Portals can, by the nature of the selection and description of the content to which they give access, provide access to quality-assured resources which will help the end-user to obtain the best possible information. Portals will not replace search engines such as Google as a means of resource discovery. Rather, they can sit alongside such search engines and act as a tool which filters the mass of content available on the web, identifies the best, and presents it to the end-user in the form of academically accurate resource descriptions and content.

Bidding for Funds

One of the tasks which LIBER has recently taken on itself is the identification of funding streams for digitisation activity. The business model to support digitisation activity in European research libraries is not straightforward. The majority of such libraries do not currently see the digitisation of paper content as a core activity. Consequently, library budgets may well not have a separate budget head which would fund substantial digitisation work.

There are various ways of addressing this issue. One way is to seek project funding from external funders to support such work. LIBER has recently submitted a bid to the eContentplus funding programme to digitise analogue library materials on the theme of European travel and tourism. Whether or not the bid is successful, the model which LIBER has helped to establish by participating in it may well set the trend for similar bids.

The model for this bid, Europeana Travel, is as follows. It represents a partnership between European national libraries (represented by CENL) and European research libraries (represented by LIBER). The project is also supported by the EDL Foundation. The consortium's 19 members include 17 library members providing content from 16 European countries, drawn from both the CENL and LIBER membership across the whole of Europe.

At the time of writing, the units of measurement have not been standardised; however, Figure 3 contains no double counting. The 15,879 books (comprising perhaps 3,000,000 pages) are therefore separate from the 193,650 pages of material also listed.


Figure 3: Table showing initial assessment of the units of output in Europeana Travel.
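Because Figure 3 contains no double counting, the two page counts can simply be added, and the book estimate implies an average book length. A worked check, bearing in mind that the 3,000,000-page figure for the books is itself only an estimate:

```latex
\begin{aligned}
\frac{3\,000\,000\ \text{pages}}{15\,879\ \text{books}} &\approx 189\ \text{pages per book} \\
3\,000\,000 + 193\,650 &= 3\,193\,650 \approx 3.2\ \text{million pages in total}
\end{aligned}
```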

How will the materials contributed by LIBER member libraries be discovered and retrieved by users? If the bid is funded, LIBER will construct the aggregator outlined earlier in this paper. The aggregator, to be developed by UCL (University College London), will harvest metadata from contributing LIBER libraries via the OAI-PMH protocol. It is to be a dark aggregator, making the harvested metadata available to the Europeana portal rather than directly to the public. In this way, the European user will be able to discover and retrieve the holdings of Europe's research libraries alongside the holdings of national libraries in the same search. The project would begin to build a critical mass of content in the area of travel and tourism, which will in turn support the development of a pan-European portal.

The richness of content types is one of the impressive features of Europeana Travel. European libraries contain a mass of content which is extremely difficult for the general user to access. The European citizen cannot travel between vast numbers of libraries to view content. Through digitisation, these unseen materials will be made available to the public for the public good, transforming their experience of travel and tourism in Europe.

Identifying Best Practice

In order to help LIBER members, the LIBER Access Division is supporting a series of activity lines which are designed to address core requirements from research libraries for European digitisation activity.

Digitisation Road Map

One of these activity lines is the construction of a Digitisation Road Map, which has four objectives:

  1. the construction of a task force to investigate the best way to create a European registry for digital library material, with EU funding

  2. the preparation of a follow-up Workshop on Digitisation to the Copenhagen Workshop in October 2007[3]

  3. the identification of external funding for European digitisation activity

  4. the encouragement of European research libraries to contribute to European digitisation activities.

Action is currently underway in activity lines 2–4. Actions 3 and 4 are covered by the Europeana Travel bid outlined above. Actions 1 and 2 stem from a very successful Digitisation Workshop in Copenhagen on 24–26 October 2007, organised by LIBER and EBLIDA. In the course of the three days, the participants reached a number of conclusions and wished to make a significant number of recommendations to the European Commission.[4] The recommendations, which are reproduced in full and discussed in my article listed in [note 4], fall into seven categories:

  1. towards a Vision for European Digitisation activity

  2. content issues

  3. discovery

  4. copyright and intellectual property rights

  5. standards and policies: metadata

  6. standards and policies: business models

  7. digital preservation.

Lobbying: Discussion with the European Commission

On 2 June, members of EBLIDA and LIBER met representatives of the European Commission. In a meeting lasting just under one and a half hours, all seven areas outlined above, and described in some detail in the Appendix to my article on the EBLIDA-LIBER Digitisation Workshop, were covered in a wide-ranging discussion.

In terms of a Vision for European digitisation activity, the LIBER-EBLIDA delegation stressed that Europe's research libraries hold materials of interest to the European citizen. Such materials do not duplicate the holdings of Europe's national libraries; rather, they complement them. Holdings of manuscripts, archives and photographs, and to a lesser extent printed books, are unique. It is therefore quite right that the metadata for these holdings (once the materials have been digitised) should be added to the Europeana portal. The LIBER-EBLIDA delegation made the point that research libraries in Europe have a vital role to play in opening up digitised content to the European citizen and that this role should be explicitly recognised in the future development of Europeana. The value that LIBER and EBLIDA could add to European digitisation activity lies in co-ordination: currently there is no agreed scope for such activity across Europe, and no real co-ordination of it.

One of the points made by participants at the Workshop was the need for a European collections strategy for digitisation, an activity which LIBER and EBLIDA could encourage. Formal selection criteria for material to be digitised are needed, as these would help the task of co-ordination across Europe. There have been attempts to draw up such selection criteria in the past, including a set produced by the present author.[5] Such criteria tend not to be widely adopted by libraries undertaking digitisation, for a number of reasons. First, digitisation is usually not seen as a core activity to be funded from the baseline budget. Where digitisation does happen, it is usually covered by project funding, which can be tied to particular themes or objectives. Second, there is currently no mechanism in Europe for collaboration and co-ordination in digitisation activity in libraries. The LIBER Europeana Travel bid is therefore an interesting example of the kind of joint working that could take place across Europe.

The June meeting between LIBER-EBLIDA and the Commission also discussed digital preservation. In some senses, this was one of the liveliest discussions of the day with many interesting points being made by both sides. LIBER and EBLIDA pointed to a number of national activities across Europe in this regard:

Recommendations 24 and 25, which LIBER and EBLIDA have made to the Commission, make the point that LIBER would be happy to contribute to the dialogue in Europe over digital preservation. What is the future model for European digital preservation services? A road map is needed to help identify the way forward, and trust is a key issue in charting that route. The Commission representatives strongly emphasised the need for Europe's libraries, galleries, archives and museums to make practical efforts at digital preservation. Studies and reports are all well and good, but at the end of the day what is needed is action to achieve sustainable digital preservation of content.

This is a challenge that LIBER must take up. Part of the answer lies in the work of the Blue Ribbon Task Force on economically sustainable digital preservation, in which the chair of the Access Division is involved. The Task Force is funded by the National Science Foundation in the USA together with other funders, and its members, drawn from the US and from the LIFE project and the Digital Curation Centre in the UK, are meeting throughout 2008 and 2009 to determine what needs to be in place for economically sustainable digital preservation to take place. This includes business and costing models, infrastructure, decision-making processes, and lobbying decision makers with the right case for allocating sufficient resources to practical digital preservation. Outside national libraries, there is no doubt that making digital preservation a high institutional priority is a challenge in the library sector. The Blue Ribbon Task Force is therefore looking at digital preservation activities in a number of sectors: film, television, banking, aeronautics, libraries, the environmental industries, and commercial providers. The first Report from the Blue Ribbon Task Force is expected in December 2008.[7]

LIBER-EBLIDA Joint Expert Group on Digitisation and Online Access (JEGDO)

One of the needs in European digitisation activity is guidance on the standards and good practice to be followed in this work. LIBER and EBLIDA have together established a Joint Expert Group on Digitisation and Online Access (JEGDO). One of the pieces of work which this Expert Group is undertaking is the compilation of a checklist to help those digitising materials to know what the accepted guidance and standards are. The checklist is in the early stages of development, but it is intended that the document will provide guidance in the following areas:

  1. selection criteria to support a European collections strategy

  2. technical standards — for digitisation, for identification via registries/persistent identifiers etc.

  3. copyright and IPR issues

  4. costing models

  5. metadata standards

  6. issues for search and retrieval

    1. EROMM (European Register of Microform and Digital Masters), RDM (Register of Digital Masters), TEL (The European Library), EDL (European Digital Library), MICHAEL, Europeana[8]

    2. need for a European portal to unite disparate resources

    3. OAI-compatibility

  7. business models: open access/public+private partnerships

  8. arrangements for digital preservation

Digital Preservation and LIFE

LIBER, through the LIFE project, is addressing the issue of how to cost digital preservation. Using a lifecycle methodology, the LIFE team has identified formulae which will help individuals or institutions calculate the costs of acquiring or creating digital assets and making them available over the long term, with digital preservation as an explicit part of the lifecycle. The lifecycle model (version 2) in Figure 4 reflects a lifecycle formula which the LIFE team has developed on the basis of a number of case studies.


Figure 4: LIFE model (version 2).
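The lifecycle formula expresses the total cost of a collection over a period T as a sum of stage costs. The sketch below assumes the six stages we understand the version 2 model to use (Creation, Acquisition, Ingest, Bit-stream Preservation, Content Preservation and Access); these stage names should be checked against the LIFE 2 Report, and the class name and figures are hypothetical rather than taken from any LIFE case study. Creation defaults to zero because, as the notes observe, it may fall outside the costing institution's view.

```python
from dataclasses import dataclass, fields


@dataclass
class LifecycleCost:
    """Costs for one collection over a period T, one field per assumed LIFE v2 stage."""
    creation: float = 0.0  # optional: creation may happen outside the costing institution
    acquisition: float = 0.0
    ingest: float = 0.0
    bitstream_preservation: float = 0.0
    content_preservation: float = 0.0
    access: float = 0.0

    def total(self) -> float:
        """L_T: the sum of the stage costs for the period."""
        return sum(getattr(self, f.name) for f in fields(self))

    def costliest_stage(self) -> str:
        """Name of the stage where most cost is incurred."""
        return max(fields(self), key=lambda f: getattr(self, f.name)).name


# Invented figures, in an arbitrary currency, purely to show the arithmetic.
example = LifecycleCost(acquisition=120.0, ingest=40.0,
                        bitstream_preservation=15.0, content_preservation=25.0,
                        access=60.0)
print(example.total(), example.costliest_stage())
```

Keeping each stage as a separate field mirrors the activity-based approach: the same breakdown shows both the total and where the costs concentrate.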

Using activity-based costing, the LIFE team has created a number of case studies which put real costs to long-term digital preservation. LIFE Phase 1 looked at three case studies: commercially procured electronic journals, voluntarily deposited electronic publications at the British Library, and web archiving, also at the British Library.[9]

In LIFE Phase 2, a further series of case studies teased out real-life costs for digital preservation and lifecycle management, looking at the costs of preserving materials in institutional repositories, of a centralised archiving service for repository content, and of analogue material which has been digitised. The last of these compared the costs of analogue and digital preservation of the same materials, using the Burney Papers at the British Library.

The Burney case study is important because it used the LIFE lifecycle model to cost digital preservation for digitised materials. The Burney collection is a collection of newspapers purchased by the British Museum in 1818, for £13,500, from the estate of the Reverend Dr Charles Burney. It comprises over 1,100 volumes of the earliest-known newspapers in the history of printing, which together amount to close to 1,000,000 pages of text from the 17th and 18th centuries. Because of its age and rarity, the collection has been managed through its analogue lifecycle by The British Library's curatorial and collection care staff. At various points in the collection's history, decisions have been taken to extend its life and to widen its access for research and other use. The two main decisions of interest to LIFE were the decisions to microfilm and to digitise the collection. Both of these actions to preserve the originals formed part of the digital lifecycle for this case study. It is important to be clear that, even though Burney is an analogue collection of newspapers, it was the digitised Burney content that was compared with the analogue legal deposit of newspapers for the purposes of costing and analysis.

A headline conclusion of LIFE Phase 2 in this case study is that the LIFE model can be used to evaluate the costs associated with both analogue and digital objects. This is a useful step forward and opens the door to many such comparisons being made in the future. The costs of digitisation and collection management are well known to The British Library through its day-to-day activities. Staff costs are predictable and the timeframes required to undertake tasks are recorded by managers at operational levels. Less known are the costs of digital preservation and long-term digital object management. However, as digital libraries develop, the LIFE team expects this part of the cost comparison to become more accurate.

By taking the total costs incurred in both lifecycles, LIFE was able to determine not only the areas where most costs are incurred but also the per-object lifecycle cost, which may inform future library collection management decisions. No content creation costs, apart from the original purchase price, were incurred for the analogue Burney Papers; today, such a collection would be acquired by The British Library under legal deposit legislation. This posed a considerable challenge for the LIFE model and is discussed in detail in the LIFE 2 Report.[11] Eventually, the LIFE team concluded that the best comparison would be the digital object cost minus the creation costs versus the equivalent analogue object cost, which gives the per-entity cost in Figure 5. Costs are taken for Year 1.


Figure 5: Total per entity cost minus creation cost (Year 1).
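The comparison behind Figure 5 divides each lifecycle's cost, less any creation cost, by the number of entities in the collection. A minimal sketch of that arithmetic follows; the function name `per_entity_cost` and all the figures are hypothetical, are not the LIFE results, and say nothing about which lifecycle is actually cheaper.

```python
def per_entity_cost(total_cost: float, creation_cost: float, entities: int) -> float:
    """Per-entity lifecycle cost with the one-off creation cost removed."""
    if entities <= 0:
        raise ValueError("entities must be a positive count")
    if not 0 <= creation_cost <= total_cost:
        raise ValueError("creation cost must lie between 0 and the total cost")
    return (total_cost - creation_cost) / entities


# Hypothetical Year 1 figures for a collection of one million entities.
digital = per_entity_cost(total_cost=500_000, creation_cost=300_000, entities=1_000_000)
# The analogue lifecycle has no creation cost to remove.
analogue = per_entity_cost(total_cost=350_000, creation_cost=0, entities=1_000_000)
print(f"digital {digital:.2f}, analogue {analogue:.2f} per entity")
```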

It would be too simplistic to say that digital lifecycle curation is cheaper than analogue curation: further work on other collections needs to be undertaken. However, LIBER has started this pivotal work and has established an approach by which a comparison of lifecycle costs for analogue and digital lifecycles, including preservation, can be made.

The LIFE team made a further comparison using 5-year costs; these are given in Figure 6.


Figure 6: Per entity costs — 5-year totals, minus creation costs.
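Extending the Year 1 comparison to the 5-year totals in Figure 6 means accumulating recurring costs year by year. The sketch below assumes that creation falls entirely within Year 1; the function name `cumulative_per_entity` and the figures are hypothetical. It illustrates only that recurring costs accumulate, so the ranking at Year 1 need not hold at Year 5.

```python
from itertools import accumulate


def cumulative_per_entity(annual_costs: list[float], creation_cost: float,
                          entities: int) -> list[float]:
    """Per-entity cost accumulated to the end of each year, minus the one-off creation cost.

    The creation cost is assumed to fall wholly within Year 1.
    """
    if entities <= 0:
        raise ValueError("entities must be a positive count")
    return [(running - creation_cost) / entities for running in accumulate(annual_costs)]


# Hypothetical annual costs: a large first year, then recurring costs.
digital = cumulative_per_entity([500_000, 50_000, 50_000, 50_000, 50_000],
                                creation_cost=300_000, entities=1_000_000)
analogue = cumulative_per_entity([350_000, 60_000, 60_000, 60_000, 60_000],
                                 creation_cost=0, entities=1_000_000)
five_year_digital, five_year_analogue = digital[-1], analogue[-1]
```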
Identifying the Needs of the User

One of the decisive outcomes of the ongoing digital revolution is the empowerment of the user, putting him/her at the centre of the information landscape. In a university setting, what does that landscape look like? A graphical representation of the landscape is given in Figure 7.


Figure 7: The information landscape for a European student/researcher.

The user at the centre of this European information landscape is called Charlie. Charlie may be a taught-course student, a PhD student or an academic member of staff. Whatever his/her role, Charlie will have information requirements, and what they are will depend on that role within the university.

If Charlie is an undergraduate, he/she will want to pay fees or book a room or flat in a hall of residence, and will want to do this electronically. He/she will also want to look at administrative information from the library, such as the number of books on loan, and to pay any fines electronically by credit card.

If Charlie is a research staff member, he/she will want to undertake research collaborations. If he/she is a scientist, or works in the fields of technology or medicine, he/she will want to look at primary data — e.g. astronomical data from observations of celestial bodies or readings from a sensor looking at vibration control in aircraft engines over many months of testing.

Whoever Charlie is, he/she will want to look at information resources. These could be local paper resources recorded in the library catalogue, or digital content available locally or via the Internet. Amongst this digital content will be resources which have been digitised from analogue originals. In Figure 7, materials in categories 3, 4, 5, 6 and 7 could well have been digitised from analogue formats.

The conclusion from this overview of the information landscape is that digitised material forms an important part of the provision in five out of seven global categories. As such, it is clear that the digitisation of content is of major importance for the European user and, by extension, for the European research library.

What can LIBER offer in this environment? The traditional model for the research library can be said to be the British Museum Library in London.


Figure 8: Reading Room in the British Museum.

Here, the nineteenth-century Reading Room in the British Museum offers the traditional model of what a library should look like. The user is drawn into the library space at times when the library is open, and the library acts as the main content provider in this space: it pulls the user in. In a global information environment, however, the library is just one content provider among many, and it must therefore push information out to where the user is. At UCL (University College London), to cite just one example, researchers in science, technology and medicine (STM) rarely set foot in physical library space. Digital material is pushed to their desktops 24/7, wherever they are.

At the beginning of the twenty-first century, European research libraries are faced with a great challenge. How can they respond? Is the mass digitisation of content from libraries, pushed out to the user over the network, one of the traits of a twenty-first century library and information service? Does this not help to re-define what a European research library is?

Conclusions

This paper has attempted to paint a picture of current European digitisation activity, and to place this work in the context of current developments in the information landscape. There is little doubt that the user now lies at the centre of this landscape, and that libraries are simply one provider of content which he/she will want to use.

In this landscape, LIBER is attempting to help the European user and European research libraries in a number of ways. LIBER is working with Europeana to help libraries surface content through this pan-European portal. LIBER is actively bidding for substantial funding from the EU to help European research libraries digitise their content. LIBER, with EBLIDA, is helping to identify good digitisation practice and is lobbying the Commission to embed the views of European research libraries in the EU's plans. In projects such as LIFE, LIBER is also attempting to determine the true costs of long-term digital preservation, and it has used the outputs of digitisation activity as one of its case studies.

LIBER, as a membership organisation, has an important role to play in supporting European research libraries in this digital age. The focus of LIBER's attention is both the library and the European researcher and citizen. It is an exciting time for all stakeholders as the new information landscape emerges.

Websites Referred to in the Text

ABES, http://www.abes.fr/abes/en/index.html

CENL, Conference of European National Librarians, http://www.cenl.org/

DART Europe, http://www.dart-europe.eu

Digitization Policy Workshop, Chicago, April 2006, http://www.ala.org/ala/washoff/oitp/dig_pol_w_participants.pdf

e-Depot, Netherlands, http://www.kb.nl/dnp/e-depot/e-depot-en.html

Europeana, http://www.europeana.eu/about.php

LIFE, Life Cycle Information for E-Literature, http://www.life.ac.uk/

NDLTD, Networked Digital Library of Theses and Dissertations, http://www.ndltd.org/

PLANETS, http://www.planets-project.eu/

Recommendations to the European Commission from the LIBER-EBLIDA Digitisation Workshop in Copenhagen in October 2007, reproduced in http://liber.library.uu.nl/publish/articles/000222/article.pdf

TEL, The European Library, http://www.theeuropeanlibrary.org/portal/index.html


Notes

See http://www.bl.uk/news/2008/pressrelease20080116.html, entitled Information Behaviour of the Researcher of the Future.

See Paul Ayris, ‘LIBER-EBLIDA Digitisation Workshop’, LIBER Quarterly 18 (2008) 1, pp. 4–19, http://liber.library.uu.nl/publish/articles/000222/article.pdf

These selection criteria are reproduced in the author's article on the ‘LIBER-EBLIDA Digitisation Workshop’, see [note 4] for details.

Phase 2 of the LIFE project is reporting in the summer of 2008 and one of the case studies is looking at the use of the LIFE costing formulae to calculate the costs of the long-term digital preservation of digitised materials, using the Burney Newspapers in the British Library as the basis for the case study. For the LIFE 2 Report, see http://eprints.ucl.ac.uk/11758/

This stage may be beyond the scope of some costing activities. Creation may occur outside the view of the costing institution. It should therefore be considered to be optional. Where considered within scope, elements will need to be tailored to the specific lifecycle case in question.