Oury Clément (Bibliothèque nationale de France)
An increasing part of our intellectual and cultural production has moved online. The web has become one of the main publishing channels, and sometimes the main one, for all kinds of documents that were previously available on physical media: newspapers, books, governmental publications, audiovisual materials, maps and software. It has also fostered novel ways of publishing content, such as blogs and social networks. But the web is a volatile space where content may quickly disappear. As documents on the web constitute a growing part of our cultural heritage, and as the memory of online publications is at risk, heritage institutions have over the past dozen years developed principles, methods and tools to archive this transient medium.
National and university libraries, national or local archives, and research laboratories seeking to preserve web content mostly rely on ‘crawling’ technologies to harvest and store websites, thereby generating ‘web archives’. They use dedicated software, called robots or crawlers, which browses the web to identify and retrieve web documents automatically, according to the crawling policy defined by the institution.
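The crawling principle described above can be illustrated with a minimal sketch. It is not the software used by any particular institution: the `crawl` function, the `fetch` callable, the `allowed_hosts` and `max_depth` policy parameters and the in-memory `TOY_WEB` are all assumptions introduced for illustration, and real crawlers add many further concerns (politeness delays, robots.txt handling, embedded resources, archival file formats).

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag found in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seeds, fetch, allowed_hosts, max_depth):
    """Breadth-first harvest starting from the seed URLs.

    `fetch` is a hypothetical callable returning the HTML of a URL, or None
    when the page is unavailable. The crawling policy is reduced here to two
    assumed rules: a set of in-scope hosts and a maximum link depth.
    """
    archive = {}                      # URL -> harvested content
    seen = set(seeds)                 # URLs already queued
    queue = deque((url, 0) for url in seeds)
    while queue:
        url, depth = queue.popleft()
        if urlparse(url).hostname not in allowed_hosts:
            continue                  # out of scope for this policy
        html = fetch(url)
        if html is None:
            continue                  # page has disappeared or is unreachable
        archive[url] = html
        if depth >= max_depth:
            continue                  # store the page but follow no links
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            target = urljoin(url, href)   # resolve relative links
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return archive


# A toy, in-memory "web" standing in for real HTTP requests.
TOY_WEB = {
    "http://example.org/": '<a href="/news">News</a> <a href="http://other.example/">Out</a>',
    "http://example.org/news": '<a href="/news/2010">Archive</a>',
    "http://example.org/news/2010": "<p>Old news</p>",
    "http://other.example/": "<p>Out of scope</p>",
}

archive = crawl(["http://example.org/"], TOY_WEB.get, {"example.org"}, max_depth=1)
```

With this policy the archive holds the seed page and the page one link away on the same host. The deeper page and the external site are left out, which shows that what a web archive contains depends on the crawling policy as much as on the original website.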
Even though the crawling tools are mature, these institutions still lack a conceptual framework for giving access to and exploiting this novel kind of document. The first question that remains unclear relates to their very nature: are web archives copies of web publications? Can they be assimilated to traditional archives? Or are they completely different documents that need specific tools and methods of analysis? The second question derives from the first: what kind of tools, services or information (e.g. metadata) should heritage institutions provide to researchers to help them adequately understand and use web archives? In particular, is current diplomatics relevant to them? For example, how can the traditional criticism of sources (internal and external) be applied to documents generated by automated crawling software? Another example is the use of web archives in a legal context: in what cases may web archives be used in lawsuits (as has already occurred in France, for example), and with what kind of precautions?
This presentation will be based on the experiences (in terms of collections, services and user requirements) of the member institutions of the International Internet Preservation Consortium (IIPC), which groups more than forty heritage or research organizations (libraries, archives, not-for-profit foundations) worldwide. It will draw on current reflections and research on diplomatics to see how far they can be applied to web archives, and to identify where the experience gained by heritage professionals and researchers in diplomatics can help web archiving institutions develop concepts, procedures, methods and tools for using web archives.
This proposal is intended for individual papers or for a traditional session.