Research article
The Status of Digitisation in Europe
Extensive Summary of the Second LIBER-EBLIDA Workshop on the Digitisation of Library Materials in Europe

Paul Ayris
LIBER Vice-President
Library Services, University College London, Gower Street, London WC1E 6BT, UK
p.ayris@ucl.ac.uk
Abstract

The 2nd LIBER-EBLIDA Digitisation Workshop took place in The Hague from 19 to 21 October 2009 and was hosted by Bas Savenije, National Librarian of the Netherlands. In six sessions, the workshop considered a wide range of issues, from business models to persistent identifiers. The papers were all of high quality, marking out the LIBER-EBLIDA Digitisation Workshop as the meeting at which to discuss pan-European digitisation issues. Four break-out groups helped to revise the LIBER Road Map for pan-European digitisation. Assessing the impact of the workshop, this paper identifies six top-level themes and questions to emerge from the three days of activity, which are summarised in its conclusions.

Key Words:
Digitisation; library materials; LIBER; EBLIDA

Introduction

The second LIBER-EBLIDA Digitisation Workshop was held in The Hague on 19–21 October 2009. The programme and presentations from the event are stored on the LIBER website and are available for download/viewing. The purpose of this article is to summarise the many discussions in the meeting and to attempt to point to their significance in the wider framework of global digitisation activity.

The attendance list for the workshop records 109 participants from 24 countries, compared with 95 participants from 23 countries in 2007.

The purpose of the LIBER-EBLIDA workshops is fourfold:

It is, therefore, against these four criteria that the success of the 2nd LIBER-EBLIDA Digitisation Workshop will be judged.

Programme

The Programme was thematic and the following areas were addressed by the 2009 Workshop:

The Workshop ended with a number of break-out sessions which reflected on a number of these themes:

International Activity

Paul Ayris gave an overview on progress in the LIBER Digitisation Road Map since the 2007 workshop and his talk covered four areas:

In terms of content, LIBER had successfully bid for a €2.8 million project to digitise materials on the themes of travel and exploration. This project was launched in Tallinn, Estonia, on 11 May 2009 and has nineteen partners from across Europe. The objectives of the project are:

Within the project, an ambitious programme of content provision is being followed, and the types of material being delivered are listed in Figure 1.


Figure 1: Content Provision to Europeana.

In terms of aggregating content into Europeana, LIBER is working with Europeana to establish a European aggregating service, for which EU funding will be sought, to allow Europe's research libraries to provide metadata for digital content, to be surfaced in Europeana. The architecture of this scheme has been identified and is given in Figure 2.


Figure 2: Architecture for a pan-European aggregator into Europeana for Europe's research libraries.
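Aggregation of this kind is, at its core, a metadata-harvesting pipeline from library repositories into an aggregator. As a hedged illustration only (Europeana ingests metadata harvested via OAI-PMH, but the endpoint, set name and record handling below are entirely hypothetical), a minimal harvesting loop such as an aggregator might run could look like this:

```python
# Minimal OAI-PMH harvesting loop -- a sketch of how an aggregator might
# collect Dublin Core metadata from a library repository before mapping it
# to the Europeana ingestion format. Endpoint and set are hypothetical.
import requests
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
BASE_URL = "https://repository.example-library.eu/oai"  # hypothetical endpoint

def harvest(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Yield <record> elements, following resumption tokens until exhausted."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    while True:
        root = ET.fromstring(requests.get(base_url, params=params, timeout=60).content)
        for record in root.iter(OAI + "record"):
            yield record
        token = root.find(f"{OAI}ListRecords/{OAI}resumptionToken")
        if token is None or not (token.text or "").strip():
            break
        # Subsequent requests carry only the verb and the resumption token.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}

if __name__ == "__main__":
    for rec in harvest(BASE_URL, set_spec="digitised-maps"):  # hypothetical set
        header = rec.find(OAI + "header")
        print(header.findtext(OAI + "identifier"))
```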

LIBER has made some progress in advocating, along with partner organisations in Europe, academic-friendly copyright frameworks. The European Commissioner, Viviane Reding, has stated:

‘We should create a modern set of European rules that encourage the digitisation of books. More than 90% of books in Europe's national libraries are no longer commercially available, because they are either out of print or orphan works (which means that nobody can be identified to give permission to use the work digitally). The creation of a Europe-wide public registry for such works could stimulate private investment in digitisation, while ensuring that authors get fair remuneration also in the digital world. This would also help to end the present, rather ideological debate about “Google books”. I do understand the fears of many publishers and libraries facing the market power of Google. But I also share the frustrations of many internet companies which would like to offer interesting business models in this field, but cannot do so because of the fragmented regulatory system in Europe. I am experiencing myself such frustrations in the context of the development of Europeana, Europe's digital library. Let us be very clear: if we do not reform our European copyright rules on orphan works and libraries swiftly, digitisation and the development of attractive content offers will not take place in Europe, but on the other side of the Atlantic. Only a modern set of consumer-friendly rules will enable Europe's content to play a strong part in the digitisation efforts that have already started all around the globe.’[1]

LIBER's statement to the EU Google Books hearing on 7 September calls for a similar pan-European framework:

‘[The EU] can, through its legislative powers, draw up legislation to be adopted in the member states which creates a framework for copyright laws and Directives which will reflect the advances made in the US Google Book Settlement and give the European researcher and learner the same advantages as the US user.’[2]

The final area of the road map which Ayris illustrated was the work which LIBER has been undertaking on costing digital curation.[3] LIFE Phase 2 took the digital costing formulae produced by the LIFE project and applied them in a set of case studies. One of these case studies was the Burney Newspaper collection in The British Library. This is an important study because analogue and digital equivalents exist, comprising over 1,100 volumes of the earliest-known newspapers in the history of printing. The headline conclusion of LIFE 2 is that the same lifecycle model can be used to cost the curation of analogue and digital materials. Figure 3 compares analogue and digital preservation costs in Year 1. It would be too simplistic to say that digital curation is more cost-effective than analogue curation; rather, this LIBER digitisation case study has established an approach which allows digital and analogue costs to be compared. LIFE Phase 3, which will be completed in the summer of 2010, will develop an open source costing tool to allow such costings and comparisons to take place.


Figure 3: Curation costs for the Burney Newspapers.
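The LIFE formulae themselves are published by the project; the sketch below is only a schematic illustration of the underlying idea, with invented stage names and figures, that analogue and digital curation can be costed with the same lifecycle model and the totals then compared directly:

```python
# Schematic lifecycle costing in the spirit of the LIFE model: the same set of
# lifecycle stages is costed for the analogue and the digital instance of a
# collection, so the two totals become directly comparable. All stage names
# and figures are illustrative, not LIFE's published values.

STAGES = ["creation/purchase", "acquisition", "ingest", "metadata",
          "access", "storage", "preservation"]

def lifecycle_cost(costs_per_stage, years, annual_stages=("access", "storage", "preservation")):
    """Total cost over `years`: one-off stages counted once, recurring stages per year."""
    total = 0.0
    for stage in STAGES:
        cost = costs_per_stage.get(stage, 0.0)
        total += cost * years if stage in annual_stages else cost
    return total

# Hypothetical per-stage costs (EUR) for one collection, analogue vs digital.
analogue = {"acquisition": 20_000, "metadata": 15_000, "access": 8_000,
            "storage": 12_000, "preservation": 5_000}
digital = {"ingest": 30_000, "metadata": 10_000, "access": 2_000,
           "storage": 4_000, "preservation": 9_000}

for years in (1, 10):
    print(f"{years:>2} yr  analogue EUR {lifecycle_cost(analogue, years):>10,.0f}"
          f"   digital EUR {lifecycle_cost(digital, years):>10,.0f}")
```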
Public-Private Partnerships

Two speakers, Javier Hernandez-Ros (European Commission) and Ben White (British Library), spoke about public-private partnerships as a way of progressing digitisation activity. The EU representative noted that, amongst national libraries in the EU, only 2–3% of collections had been digitised (2008). Europe's copyright framework needs adaptation to the digital age (e.g. orphan works). Moreover, public resources are not sufficient to cover the costs of digitisation at the required speed, and so public-private partnerships can help.

Ros gave a number of examples of such partnerships:

So important is this subject that the Commission has issued guidance[4] in this area. The key Recommendations are as follows:

Ros stressed that the question is not whether to pursue public-private partnerships, but how to pursue them.

Ben White, Head of Intellectual Property at the British Library (see Figure 4), categorised the nature of public-private partnerships as:


Figure 4: Microsoft — British Library Partnership.

He also made the point that there are public-public partnerships too, e.g. JISC/HEFCE, Europeana.

White saw the motives as:

In terms of licensing, White advocated the following best-practice principles, which standard contracts should emphasise:

Europeana: a view from the US

Ricky Erway from OCLC gave an insightful overview of the Europeana portal as seen from the US, which was welcomed by members of the Europeana team at the workshop.[5] Erway wondered whether Europeana could in fact compete with Google. She suggested there might be a problem with participation, in that prospective members were sometimes unclear about how to get involved. Access was a challenge, because it was easy to get lost on the Europeana site and difficult to know what was there. Erway advocated that Europeana should become more involved with social networking tools. She queried how sustainable Europeana was, given its ongoing running costs, noting that the US development American Memory had tried commercialisation, but that this was not a success.

In terms of presentation, Erway wondered whether the branding was right, suggesting that ‘World Library’ might be a better description. She advocated partnerships between Europeana and other bodies, whereby Europeana could offer material to other sites, such as OCLC WorldCat. She suggested that Europeana might need to harvest from partner sites every six months, to keep the materials in the portal in step with the partners' local files. Text mining is certainly an area that Europeana should investigate, as there will be a demand for such a value-added service.

Financial Aspects

Silvia Gstrein from the University of Innsbruck presented an interesting model of E-Books on Demand (EOD), with participating libraries in ten European countries (Figure 5). The delivery time is an average of one week. The service currently has 1900 customers and the average price of an order is around €50. The service has an active development programme, which will certainly raise its profile in Europe. Deliverable 5.7 of the EuropeanaConnect project is to embed the delivery of EOD e-books into Europeana. The first books will be delivered before May 2010.


Figure 5: Participating members of EOD.

John Hanley gave an impressive overview of Google Editions, which aims to bring e-texts to the global internet user market of 1.8 billion users. The model is this:

  1. Consumers come across a book on Google.com.

  2. Consumers browse a book preview, and purchase via Google or one of Google's retail partners.

  3. Consumers pay via a Google account, or through an online retailer.

  4. Once they have bought a Google Edition, it lives in their online bookshelf.

  5. The consumers can read the book from their browser on any web-enabled device, at any time.

Hanley outlined a number of business models:

Hanley stressed that the future for such a programme may well be an institutional subscription model, and that such a publishing programme was certainly aimed at national groupings.

Frédéric Martin talked about Gallica at the French National Library in terms of defining economic models. In terms of costs, Martin presented Figure 6.


Figure 6: Cost distribution in Gallica.

The 49% preservation cost is large because of a high initial investment in storage equipment. Martin suggested a number of issues which Gallica would have to address in terms of a Return on Investment model — see Figure 7.


Figure 7: Issues for Return-on-Investment.

Rémi Gimazane of the Ministère de la Culture et de la Communication looked at public sources for financing the digitisation of library materials in France, where €3,000,000 in grants is given out each year, with the expectation that such sums will cover 50% of project costs.[6]

A new strategy was drawn up in 2005 for the large-scale digitisation of printed books to establish a French contribution to the European digital library. The necessary funding was achieved through the creation of a public-private partnership between the Bibliothèque nationale de France (BnF) and the Centre national du livre (CNL),[7] which is a public institution whose goal is to re-distribute resources to many actors in the book sector in order to improve the diffusion of quality works.

The revenue of the CNL is based on the proceeds of two taxes:

As a result of the extension of the second income stream listed above, the Digital Policy Commission of the CNL is entrusted with re-distributing revenues of around €8 million a year. The Bibliothèque nationale de France is now eligible for funding. Since 2007, the Digital Policy Commission of the CNL has redistributed more than €25 million to digitisation projects. BnF projects are generally fully supported by the Commission (the subsidies cover 100% of the costs), while publisher projects are granted between 30% and 70% of their costs, depending on the nature of the project. The BnF has thus been granted about €20 million since 2007 for its book digitisation projects. Until 2009, the Digital Policy Commission of the CNL financed only projects from the BnF or from publishers. From 2009 on, however, the BnF will actively assist other libraries in their digitisation projects: the digitisation programmes of the BnF are now open to partnerships along thematic guidelines.

User Needs

The workshop looked in some detail at the needs of users in the digitisation arena. Roswitha Poll from Münster reported on the NUMERIC project, an EU-funded project which finished in May 2009:[8]

A representative sample of 1,539 institutions was surveyed across Europe, and 788 responded (a response rate of 51%). For all institutions surveyed, the main source of funding was their own resources (62.1%) (Figure 8).


Figure 8: Sources of funding for libraries.

A variety of digitisation costs was reported. The survey looked at estimated median costs per digitised item for future digitisation activity, and these are given in Figure 9. Institutions of all types were asked how much of their collection had been digitised and how much remained to be done. Of all institutions (616 valid responses), 30.2% said there was no need for digitisation, 19.3% said their digitisation was completed, and 50.5% said they still had digitisation outstanding. It was noted, however, that answers to this question may have been skewed by a lack of clarity about what ‘digitised’ means:


Figure 9: Median costs per item for future digitisation projects.

Stuart Dempster from the Strategic Content Alliance looked at how digitised collections can meet user needs. Knowledge about your audiences can provide evidence to help you answer the following questions:

Dempster made the point that the Library is ‘of the web’ and not ‘on the web’ … moving from ‘building digital libraries’ to ‘digital libraries supporting diffusion’ of content.

The development of the BBC iPlayer is a good example. The iPlayer was originally launched in 2007; it is a service which allows users to watch BBC television programmes from the last week over the internet. The aim of the re-design was to launch a version of the iPlayer that integrated the delivery of on-demand TV and radio. The re-design relied on user engagement to test design concepts and usability; to check that existing users would not react negatively; and to find out how perceptions of the design might affect audiences' attitudes towards the BBC as an organisation and content provider in the 21st-century digital world. The user research concentrated on the following:

Dempster identified five phases in audience research (Figure 10):

  1. Target audience: describe and define the target audience

  2. Plan: plan your research

  3. Research: collect the data

  4. Analyse: model your audience

  5. Apply: exploit the evidence.


Figure 10: Stages of audience research.

Audience research does not need to be perfect to be useful. Even a small audience research project is worthwhile. Audience research should be seen as an ongoing process. Many techniques can be implemented quite cheaply or adapted to a shoestring budget. Audience research should be done with commitment and support from senior management.

Catherine Lupovici talked about the user needs for metadata. She analysed the forms of metadata required as follows:

Lupovici stressed that the main message of her report for research libraries was that the future is now, not ten years away, and that they have no option but to understand and design systems around the actual behaviour of today's virtual scholar.[9]

How will metadata be used in the digital transition? Lupovici suggested the following:

Lupovici felt that the future was one of opportunities for research libraries. She noted the trend to break structured, packaged content into its semantic components so that they can be exploited in the web of data. She highlighted that convergence with the technologies being tried out by the open access movement is already strong in some niches of the research library user community, and that OAI-ORE (Object Reuse and Exchange) can facilitate instant publication on the web. Finally, she stressed the use of professional, structured metadata to contribute knowledge to the emerging semantic web: exposing the metadata in the web of data using the emerging global models, and sharing it beyond the library community, creating opportunities for its use and re-use in many ways.
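As a purely illustrative sketch of what ‘exposing the metadata in the web of data’ can mean in practice (the item URI and values below are invented, rdflib is simply one convenient library for the job, and the Europeana Data Model class is used only as an example of an emerging global model), a catalogue record can be re-expressed as RDF so that it can be linked and re-used outside the catalogue:

```python
# Illustrative only: re-expressing a catalogue record as RDF so that it can be
# linked and re-used outside the library catalogue. Identifiers are invented.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, RDF

EDM = Namespace("http://www.europeana.eu/schemas/edm/")  # Europeana Data Model namespace

g = Graph()
item = URIRef("https://library.example.eu/id/item/12345")  # hypothetical item URI

g.add((item, RDF.type, EDM.ProvidedCHO))
g.add((item, DCTERMS.title, Literal("Travels through the Low Countries", lang="en")))
g.add((item, DCTERMS.creator, Literal("Anonymous")))
g.add((item, DCTERMS.issued, Literal("1672")))
g.add((item, DCTERMS.isPartOf, URIRef("https://library.example.eu/id/collection/travel")))

print(g.serialize(format="turtle"))
```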

Public Libraries

There were three papers on the offerings of public libraries — by Koen Calis and Jan Brackmann on ‘Cabriology’, the Bruges Aquabrowser experience; by Magali Haettiger on the digitisation project with Google in the public library in Lyon; and by Erland Nielsen on cross-domain activity in Denmark.

‘Cabriology’ is a successful attempt to take the public library beyond the library walls, to make content interesting to those who do not use traditional library services, to link digital catalogues and services together into one interface, and to create a presence for digital information at a civic level (see Figure 11).[11]


Figure 11: Successful advertising by the Bruges public library: ‘This man was looking for a CD from the Red Hot Chili Peppers. And found much more …’.

Haettiger talked of the incipient partnership between the Bibliothèque municipale de Lyon[12] and Google. The library has one of the greatest collections of historical texts, manuscripts and archives in the whole of France. Working with printed material from the sixteenth century to 1869, the aim of the project is to create 500,000 digitised texts as part of the European Google Books project.

Erland Nielsen, Director of The Royal Library in Denmark, spoke about Pearls of Culture, a cross-domain finding tool for Denmark. Pearls of Culture[13] is a national web portal for retro-digitised resources regarded as a new part of the National Bibliography, created by The Royal Library and opened on 16 April 2009 (see Figure 12). The aim of the portal is to create a:


Figure 12: Pearls of Culture.

The selection criteria are as follows:

The metadata record has eleven fields (an illustrative example record follows the list):

  1. institution responsible

  2. types of material

  3. subject areas or fields

  4. size of collection

  5. presentation format

  6. preservation format

  7. searchable where?

  8. time of digitisation

  9. rights of use

  10. access

  11. contact person
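A record built on these eleven fields might look like the following sketch; every value is invented for illustration and does not describe any actual collection in the portal:

```python
# Hypothetical record: the keys mirror the eleven fields listed above, and
# all values are invented for illustration only.
example_record = {
    "institution_responsible": "Example State Library",
    "types_of_material": ["printed books", "maps"],
    "subject_areas": ["topography", "local history"],
    "size_of_collection": "c. 1,200 volumes",
    "presentation_format": "JPEG, page-turning viewer",
    "preservation_format": "TIFF, 400 dpi",
    "searchable_where": "local catalogue and the portal's own search",
    "time_of_digitisation": "2006-2008",
    "rights_of_use": "public domain",
    "access": "open, no registration required",
    "contact_person": "digitisation@example.org",  # invented contact address
}

for field, value in example_record.items():
    print(f"{field}: {value}")
```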

As of 15 October 2009, the portal held records for nearly 200 e-collections, with 11,000 visits from 6,500 visitors. 48.62% of all accesses were from search engines. Possible future developments include:

Cross-domain Aspects

Four speakers spoke on the cross-domain aspects of the aggregation of digitised content. Jill Cousins spoke about the work of Europeana. One of the main reasons for aggregating content is because the landscape is so diverse for the user (Figure 13).


Figure 13: Routes to content for the European user.

Most of the time, users are starting from Google or Wikipedia. Sometimes they go directly to an individual content provider, usually if they are looking for something specific. If a user searches, say, for the Berlin Wall, it is very unlikely that an individual content provider's AV content will appear high in Google.

Cousins looked at the cost-effectiveness of the Europeana Office to deliver pan-European aggregation across domains (Figure 14).


Figure 14: Costs of the Europeana Office in 2012.

Averaged across 27 countries, the cost works out at €146,000 per country. Cousins also painted a picture of what current Europeana aggregation activity looks like for libraries (Figure 15).


Figure 15: TEL as the library aggregator for Europeana.

TEL (The European Library) currently aggregates the content of Europe's national libraries into Europeana. Plans are underway, however, subject to successful European funding, for all LIBER research libraries to have their digital content aggregated into Europeana too.

Cousins finished her talk by highlighting the top-level issues around aggregation.

Alastair Dunning of the JISC gave a thought-provoking talk on why digitisation activity was needed and illustrated the findings of a JISC report Usage and Impact Study of JISC-funded Phase 1 Digitisation Projects & the Toolkit for the Impact of Digitised Scholarly Resources (TIDSR).[14]

One of the recordings from the British Library's sound resource is an interview with the English sculptor, Elizabeth Frink. In this clip she answers questions about her bird sculptures.[15] However, do people really use such resources? Here Goldsmiths' College lecturer Rose Sinclair explains why she uses the Elizabeth Frink interviews in her university seminars.[16] Dunning's conclusion is that the community needs to do more of this, using

Jef Malliet talked about preparing cross-domain data for the semantic web. Construction of Erfgoedplus.be[17] began in 2005. The objectives were to:

‘Erfgoedplus’ has been online since May 2009. It covers two provinces, Limburg and Vlaams-Brabant. It houses around 145 collections and about 40,000 artefacts, mostly from museums or churches. Coverage is planned to expand to immovable heritage and to published and unpublished documents over the next two years. Erfgoedplus is a content partner for Belgium in EuropeanaLocal and a partner in the Europeana v1.0 Thematic Network.

Hildelies Balk spoke of the role of IMPACT[18] as a Centre of Competence under the European i2010 vision. In May 2008, the project was submitted as a successful proposal in answer to the first call of the FP 7 ICT Work Programme 2007.4.1 Digital Libraries and technology-enhanced learning. Balk outlined the main challenges to pan-European digitisation as:

IMPACT has a focus on historical printed texts before 1900. IMPACT aims to significantly improve mass digitisation of historical printed texts by:

In terms of innovation, IMPACT is:

Towards a sustainable Centre of Competence, IMPACT will aim to achieve the following in 2010–11:

In 2012, the vision is that IMPACT will be a sustainable Centre of Competence for mass digitisation of historical printed text in Europe:

Around this Centre, a bigger community has formed, with added expertise from digitisation suppliers, research institutes, libraries and archives across Europe. This will contribute to the ultimate aim: all of Europe's historical text digitised in a form that is accessible, on a par with born-digital documents.

Access to Digitised Materials

A highlight of the workshop was the talk by Dan Burnstone of ProQuest, who spoke of Digitising Early European Books (Figures 16 and 17). Burnstone outlined the objectives of the Programme:


Figure 16: Early European Books.

Figure 17: Measurement of Luther's Psalms of David (1598).

In terms of the collaborative model:

Burnstone analysed progress in the Royal Library in Denmark:

A full rollout is planned for 2010–11 on a subscription basis.

Burnstone, ProQuest and LIBER feel that a viable model has been established to create a resource with huge benefit to scholars, students and beyond. It creates value that does not pre-exist in the analogue constituents. If this is not open access, then it is at least ‘opening up access’. LIBER is delighted to be associated with this project from ProQuest and the vision is to include all imprints from across the whole of Europe between 1475 and 1700 into this important database in the coming years. It is a compelling vision and will be a magnificent tool for European scholarship.

Ralf Goebel and Sebastian Meyer gave an incisive paper on a viewer from the Deutsche Forschungsgemeinschaft (DFG) (Figure 18).[19] The DFG viewer is based at the Saxon State Library. Using OAI-PMH, the viewer interrogates METS metadata and draws down the resulting digitised image. The DFG viewer needs to know which digitised book is to be summoned and where the book is located.[20]


Figure 18: The DFG viewer in action.
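As a rough sketch of the mechanism described above (only the OAI-PMH GetRecord verb, the METS metadata prefix and the METS/xlink namespaces are standard; the repository address and record identifier are hypothetical), fetching one METS record and listing the image links it points to might look like this:

```python
# Sketch of a DFG-viewer-style workflow: fetch a METS record over OAI-PMH and
# list the image URLs it points to. Endpoint and identifier are invented.
import requests
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "mets": "http://www.loc.gov/METS/",
    "xlink": "http://www.w3.org/1999/xlink",
}

def image_urls(base_url, identifier):
    """Return the file URLs recorded in the METS fileSec of one digitised book."""
    response = requests.get(base_url, params={
        "verb": "GetRecord",
        "metadataPrefix": "mets",
        "identifier": identifier,
    }, timeout=60)
    root = ET.fromstring(response.content)
    return [
        flocat.get("{http://www.w3.org/1999/xlink}href")
        for flocat in root.findall(".//mets:FLocat", NS)
    ]

if __name__ == "__main__":
    urls = image_urls("https://digital.example-library.de/oai",   # hypothetical endpoint
                      "oai:example-library.de:book-4711")         # hypothetical identifier
    print(f"{len(urls)} page images found")
```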

Kai Stalmann gave a talk on increasing access to biodiversity literature in Europe through the Biodiversity Heritage Library (BHL). BHL aims to facilitate open access to taxonomic literature. It is a multilingual access point providing material to Europeana. Issues were identified for the future development of the BHL database:

Dorothea Sommer from Halle spoke about persistent identifiers, describing the URN Granular project of the German National Library and the University and State Library Sachsen-Anhalt, Halle.[21] When digitisation was in its infancy, the issue of citability of digital resources was frequently underestimated. But it is exactly citability that makes internet-based digitised sources viable for academic writing. Different from previous secondary formats, like microfilm or paper print-outs, an internet resource is not just a copy of the original which can be treated and hence quoted like the original, but rather an independent object in a dynamic integral research space … When a copy is online, it needs a unique address so that other documents or databases can link to it.[22]

The paper posited some solutions:

Sommer described the objectives of URN Granular as providing a ‘long-term, reliable, sustainable opportunity to address/quote not only the digital work as a whole, but the individual pages/units/entities within a digital work’. The definition of a URN-Object is: ‘Within the framework of URN management a digital object is a unit that a URN can be assigned to. This unit refers currently to a static publication, like an online resource in monographic form. […] The smallest resource of a digital object is accessible via a common tool of access on the internet such as a URL.’[23]
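The mechanics can be pictured with a small, purely hypothetical sketch: a work-level URN is extended with a per-page suffix, and each granular URN is resolved through a resolver service into a citable link. The URN values, the suffix pattern and the resolver address below are invented for illustration and do not reproduce the project's actual scheme:

```python
# Hypothetical illustration of page-level ("granular") persistent identifiers:
# a work-level URN is extended with a page suffix, and every resulting URN is
# turned into a resolvable, citable link. All values are invented.
WORK_URN = "urn:nbn:de:example-1234567"          # invented work-level URN
RESOLVER = "https://nbn-resolving.example.org/"  # hypothetical resolver endpoint

def page_urn(work_urn: str, page: int) -> str:
    """Derive an invented page-level URN from the work-level URN."""
    return f"{work_urn}-p{page:04d}"

def citation_link(urn: str) -> str:
    """A citable URL: the resolver redirects the URN to its current location."""
    return RESOLVER + urn

for page in (1, 2, 3):
    urn = page_urn(WORK_URN, page)
    print(f"{urn}  ->  {citation_link(urn)}")
```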

The architectural principles of Uniform Resource Name Resolution (RFC 2276, 1998) are described in detail in Sommer's article in this issue of LIBER Quarterly.

Break-out Sessions on Improving the Digitisation Road Map

Four break-out sessions identified areas for discussion which would help LIBER to re-draw its Digitisation Road Map.

Metadata, Interoperability, Persistent IDs, Standards

This group made seven recommendations. There needs to be a minimum set of DC-based metadata standards, based on usage in Europeana. Best practice toolkits need to be developed for mapping, exporting and protocols. Centres of competence should be developed to cover all areas of digitisation activity. Regarding persistent identifiers, there is a need to identify chosen solutions (e.g., URN, DOI, ARK, PURL) and to get libraries using them. Duplication of metadata in library catalogues needs to be addressed. Finally, there is a perceived need for the development of an open source reader and for the definition of minimum standards for interoperability:

Funding and Commercial Partnerships

The group identified strengths and weaknesses in gaining private funding:

A number of suggestions were made:

User Needs

The break-out group on user needs had a lot to say. In terms of business models, they felt that digitisation-on-demand was a suitable business model as it met a known need. Digitisation had to move beyond the local silo towards large aggregations. However, in terms of access the group felt that there should be multiple routes to content. Use cases may be helpful in raising expectations on what digitised content can offer users. In terms of discovery, the group noted that different sorts of resources may well require different levels of metadata — e.g., image collections may require rich metadata to make them discoverable on the web. The group felt nonetheless that browsing was more important than searching as a means of discovery. The use of social networking tools such as Flickr was advocated as a means of building virtual communities of users. The group questioned how Europeana would help the specialist. Persistent IDs were seen to be vital in enabling discovery. The group identified three actions:

Cross Domain Aspects

The group noted that aggregation drives standardisation, but that real standards should be agreed upon. Aggregation means records can be enhanced, but:

The group asked how important Europeana is, as many searches will originate in places such as Wikipedia and Flickr. Does Europeana need a collecting policy? The group noted that Google has no collecting policy. The group was unsure how Europeana can interact with emerging European infrastructures for data. In terms of aggregation, the group recognised that different sectors have different views: museums, for example, regard catalogue records as digital objects, a view which libraries do not share. Different domains have different standards, and so we need to know what the benefits of cross-domain collaboration are. Will end-users understand everything they see? Domain silos need to be broken down. We need models of good practice for cross-domain activity. We need structures which can speak with authority about these issues, to co-ordinate activity.

Conclusions

A number of high-level conclusions can be drawn from the workshop, which will help shape the future map for digitisation activity in Europe.

First, the high quality of the speakers and the very wide range of topics discussed indicate that the LIBER-EBLIDA Digitisation Workshop has established itself as the event for discussing European issues regarding the digitisation of library content. As a result of all the papers and break-out groups, the LIBER Digitisation Road Map has been re-drawn.[24]

Second, the importance of Europeana as a pan-European cross-domain aggregator of content emerged as a lively theme throughout the workshop. It is not yet clear whether Europeana can compete with Google, but it is certainly a part of the European information landscape.

Third, the workshop included presentations on new services which seem destined to become embedded into the European information landscape — notably Early European Books from ProQuest, working in partnership with LIBER, and E-Books on Demand (EOD).

Fourth, the need for specific identification of digital objects, down to chapter and section level, underlined the need for persistent identifiers. There needs to be further discussion across Europe about this issue.

Fifth, a number of financial and business models were discussed during the meeting. What is abundantly clear is that libraries do not have sufficient resources locally to meet the costs of digitisation. This is why financial and business models are necessary. The various organs of the European Commission need to think hard about this and consider whether more European money could be made available for the digitisation of content.

Sixth, and finally, a number of papers emphasised the end-user experience. Clearly, it is not enough simply to digitise materials. Libraries should consider the impact which this digitisation activity has, assess user needs, and (using social networking tools) help to create genuine virtual communities of users who engage and interact with digitised texts.

Websites Referred to in the Text

Europeana, http://www.europeana.eu/portal/

LIBER Road Map on Digitisation, version 31 October 2009, www.libereurope.eu/files/LIBER-Digitisation-Roadmap-2009.pdf

Second LIBER-EBLIDA Workshop on Digitisation, programme and presentations, http://www.libereurope.eu/files/Digitisation%20Programme%20Online-final.pdf


Notes

See an extended version of Erway's presentation in her article in this issue of LIBER Quarterly.

See Poll's article elsewhere in this issue of LIBER Quarterly.

See Calis' article elsewhere in this issue of LIBER Quarterly.

See their article elsewhere in this issue of LIBER Quarterly.

See Sommer's article elsewhere in this issue of LIBER Quarterly.