How do Libraries Find their Way onto the Semantic Web?

The conference

Timo Borst et al.
The event 'Semantic Web in Libraries - SWIB09' was conceived as a 'conference for innovative librarians' and took place for the first time on 24 and 25 November 2009 in Cologne. Aiming to familiarise libraries with the semantic web, the first day offered an introduction and several practical examples illustrating the potential of semantic web applications. The second day revolved around practical examples demonstrating the challenges and possible solutions for libraries. The conference was hosted by the University Library Centre of North Rhine-Westphalia (hbz) and the German National Library of Economics (ZBW) and attracted international speakers and participants. For capacity reasons, the number of participants was limited to one hundred on the first day and to sixty on the second. A significantly higher number of registrations showed the great need for further exchanges on this topic.

What is the Semantic Web?
Jakob Voss, software developer at the Head Office of the Common Library Network (VZG/GBV), opened the conference with a brief presentation on the semantic web. With reference to conventional bibliographic catalogue entries, he outlined the change from a web of linked documents to a web of linked data that characterises the semantic web. Various institutions in the academic environment already publish their data as linked data. Voss pointed out that libraries currently do not use all the opportunities offered by the web, hiding the contents of their databases in the 'deep web'. A lack of stable URLs often hindered linking to an entry in a library catalogue by traditional means.
Linked data, on the other hand, offer new opportunities for the division of labour, said Voss. It will be possible for libraries to link bibliographic data of a publication provided by its publisher to other data, such as the name authority file (PND) or the exact location in the library. In this way data published by other persons or libraries about a particular publication can be re-used. The library is able to present selected information to its users on a website customised for their needs. The benefits of such scenarios of use for linked data are obvious. Because the web itself is the database, the need for specialist software and data conversions will diminish. The laborious collection of data in proprietary databases and the duplication of effort in similar institutions will become redundant, because the relevant information will be gathered from various sources. The library itself will publish only those data that are not yet available elsewhere.

Why Should Libraries Engage in the Semantic Web?
According to Stefan Gradmann, Professor of Knowledge Management at the Humboldt University in Berlin (HU Berlin), libraries should engage in the semantic web in order to remain visible at all in the future knowledge society. In addition, libraries could develop attractive and innovative services for their users, such as improved search functionalities, by building on semantic web technologies. Libraries in particular could also make specific contributions to the semantic web. In this context Gradmann reminded the audience that Tim Berners-Lee, the inventor of the web and a pioneer of the semantic web, had named 'Catalogs on the web' a killer application of the semantic web as far back as 2000. In contrast to many other database providers, libraries possess reliable metadata and context data (such as name or corporate body authority files, classifications and thesauri). These are highly attractive for linking with other data. For the transfer onto the semantic web a suitable standard is already available in SKOS, which was developed specifically for classifications and thesauri. Libraries could also contribute their experience to aspects that have received too little attention so far, such as how to deal with changes of meaning or the long-term availability of information and knowledge on the semantic web.
In agreement with Gradmann, Dr. Bernhard Haslhofer, research assistant at the Department of Distributed and Multimedia Systems of the University of Vienna, saw great potential in incorporating the classical mechanisms of information organisation used by libraries (identifiers, metadata, controlled vocabularies) into the semantic web as 'open linked data'. 'Linked data' is an attempt to continue these mechanisms on the web. Libraries might even be able to contribute a building block to a free and community-driven alternative to commercial applications such as Google Books. A necessary prerequisite for this would be that libraries no longer keep their data in closed data silos, but provide them freely on the internet under an appropriate licence.
In his lecture Jürgen Kett (German National Library, DNB) asserted that universities, libraries, museums and archives are integral parts of the semantic web, as they give it the necessary reliability and stability.

What is Happening in Other Libraries?
Several contributions showed that linked data now provide a suitable technology for turning the semantic web into a reality for libraries. Ed Summers, software developer at the Library of Congress (LoC) and responsible for the LoC Subject Headings, which represent the largest linked data stocks in libraries so far, referred to the many linkage options and applications of linked data. Headings from different vocabularies could be correlated to provide the basis for better search options. Press clippings from a project such as 'Chronicling America: Historic American Newspapers' could be linked to pictures on 'Flickr' and geographic coordinates, thus illustrating and locating the event. Anders Söderbäck from the Swedish National Library showed how a classical library catalogue could be transferred to linked open data and subsequently to a modern web application and demonstrated what part semantic web technologies could play in this process. Felix Ostrowski, IT developer at the hbz, delineated a model of deep integration of ontologies directly into the programming language, using repository software as his example for 'linked applications'. André Hagenbruch of the University Library Bochum presented a project for a linked data-based library portal intended to integrate not only bibliographic data, but also administrative information such as personnel data or opening hours. She demonstrated the integration of DBpedia data into browser, search and portal interfaces, calling it an important node for linked data because of the great number of data and links to other linked data sets.
Elena Semanova, researcher and independent ontology expert, offered insight into another pillar of the semantic web: ontologies, their connection with natural language and their possible uses.

What Could be the First Steps Towards the Semantic Web?
Anders Söderbäck said that it could be very useful for linked data projects to assume a different perspective and regard the library's catalogue as a network. A fundamental step in the direction of the semantic web would then be the legally safeguarded publication of their data as linked data. Joachim Neubert, IT developer at the ZBW (German National Library of Economics), summarised practical experiences from the linked data projects of the ZBW and gave precise tips on how to start on the semantic web. He recommended picking a manageable field and dataset. Licensing and attribution issues should be addressed early. Data storage and maintenance as well as the library's work routines should remain unaltered as far as possible.
As Jakob Voss described in his opening lecture, the use of stable HTTP URIs (URLs) for catalogue entries and authority files is a basic rule that libraries should adhere to when publishing linked data. According to the rules for linked data as laid down by Tim Berners-Lee, the next step should be to offer relevant information at these URIs, expressed in the semantic web standards RDF (Resource Description Framework) or SPARQL. For this, ontologies could be used that describe what kinds of objects there are, their attributes and the relations between them. The last step would be linkage to external URIs in order to open up new resources.
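The three steps described above (stable URI, RDF description, external link) can be illustrated with a minimal Python sketch. It is not taken from any of the lectures: the base URI, the record identifier, the title and the external target are hypothetical placeholders, and the statements are written by hand as N-Triples lines rather than with an RDF library. Only the Dublin Core `title` property and `owl:sameAs` are real vocabulary terms.

```python
# Hypothetical base URI for a library catalogue (an assumption for illustration).
BASE = "http://example.org/catalogue/record/"

DCTERMS_TITLE = "http://purl.org/dc/terms/title"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def record_uri(record_id: str) -> str:
    """Step 1: derive a stable HTTP URI from a persistent record identifier."""
    return BASE + record_id


def escape_literal(text: str) -> str:
    """Escape backslashes, quotes and newlines for an N-Triples literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )


def triple(subject: str, predicate: str, obj: str, literal: bool = False) -> str:
    """Serialise one statement as a single N-Triples line."""
    value = f'"{escape_literal(obj)}"' if literal else f"<{obj}>"
    return f"<{subject}> <{predicate}> {value} ."


def describe(record_id: str, title: str, external_uri: str) -> list[str]:
    """Steps 2 and 3: describe the record in RDF and link it to an external URI."""
    uri = record_uri(record_id)
    return [
        triple(uri, DCTERMS_TITLE, title, literal=True),  # step 2: description
        triple(uri, OWL_SAME_AS, external_uri),           # step 3: external link
    ]


# Hypothetical record, title and external resource.
lines = describe("12345", "An Example Title", "http://example.com/other/12345")
print("\n".join(lines))
```

In practice a library would more likely generate such statements with an RDF toolkit and serve them from the record URI itself; the hand-written serialisation here only keeps the sketch free of dependencies.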
When publishing linked data, libraries should apply open standards already available and help to develop these further, instead of relying on in-house developments. Several contributors stressed the importance of this rule. Gradmann mentioned that Europeana switched from a planned in-house development to the re-use of already existing standards. For ontologies, several quasi-standards are available in SKOS, Dublin Core, FRBR, FOAF (Friend of a Friend, an ontology for the description of persons and social networks) or OAI-ORE (Open Archives Initiative - Object Reuse and Exchange). The 'Bibliographic Ontology' (bibo) presented by Jakob Voss in another lecture could develop into another standard. Presently available as version 1.3, this ontology is based on existing vocabularies such as Dublin Core or FOAF and is in continuous development through a community of currently around 150 people. It can be used to describe types of documents, compilations and events, as well as the publication status. Since at present there is no other ontology for bibliographic data in RDF as extensive, well documented and openly discussed, there was no alternative to 'bibo' for transferring bibliographic data onto the semantic web, according to Voss. In this context he asked a provocative question: will 'bibo' supersede bibliographic data formats? Svensson contended in his lecture that 'bibo' concentrated too much on citation management and therefore put RDA (Resource Description and Access) at the centre of the DNB's planning. His conclusion was that there is no optimal ontology for library data.
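How existing vocabularies can be combined in a single description may be easier to see in a small example. The following Python sketch is an illustration only, not something shown at the conference: it assembles a Turtle description of one book using the class `bibo:Book`, the Dublin Core terms `title` and `creator`, and the FOAF terms `Person` and `name`. The document and author URIs, the title and the name are hypothetical, and the function assumes its text arguments contain no characters that need escaping.

```python
# Namespaces of the three vocabularies combined in this sketch.
PREFIXES = {
    "bibo": "http://purl.org/ontology/bibo/",
    "dcterms": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def to_turtle(doc_uri: str, title: str, author_uri: str, author_name: str) -> str:
    """Build a Turtle description of a book and its author.

    Assumes title and author_name contain no quotes or backslashes.
    """
    header = [f"@prefix {prefix}: <{ns}> ." for prefix, ns in PREFIXES.items()]
    body = [
        f"<{doc_uri}> a bibo:Book ;",            # document type from bibo
        f'    dcterms:title "{title}" ;',        # title from Dublin Core
        f"    dcterms:creator <{author_uri}> .", # link to the author resource
        "",
        f"<{author_uri}> a foaf:Person ;",       # author described with FOAF
        f'    foaf:name "{author_name}" .',
    ]
    return "\n".join(header + [""] + body)


# Hypothetical URIs and values, chosen only for illustration.
turtle = to_turtle(
    "http://example.org/catalogue/record/12345",
    "An Example Title",
    "http://example.org/authority/person/678",
    "Jane Example",
)
print(turtle)
```

Re-using terms from established vocabularies in this way means that generic linked data tools can interpret the description without knowing anything about the library that published it.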
Neubert cited a number of practical tips, tools and tutorials for the creation of linked data. He made recommendations on how to guarantee the persistence of URIs and how to design them practically. He also reported his positive experience with mailing lists, where the linked data community could be invited to give feedback on one's own applications.
Finally, libraries should take great care to publish their data with the least possible restrictions imposed by licences in order to ensure the widest re-usability. This point was made most emphatically by Patrick Danowski, Emerging Technologies Librarian at CERN, in his passionate plea for 'free data': linked data can only unfold the whole power of their network effects if the data are published under a free licence. Danowski recommended posting them as 'public domain' data.

What Next?
During the course of SWIB09 it became evident that many questions and problems remain. Some of these were summarised by Söderbäck: the present library system and the prevalent bibliographic mindset are both just as unsuited to linked data as the current legal system. But Söderbäck also entertained hopes that 'All this will change …'.
Jürgen Kett moderated a final discussion round gathering ideas on 'What next?'. One suggestion among others was to install a communication platform within the framework of the semantic web activities of the W3C in order to continue the exchange of information and ideas on linked data projects. Many also expressed a need for a repeat conference in 2010.

Fig. 1: Ed Summers of the Library of Congress.