Corpus study, contemporary diplomatics and digital technologies: some of the first results of the Mémoloi project

Florence Clavaud (École nationale des chartes, Paris; EA 3624, Centre Jean-Mabillon), Brunilde Renouf (École nationale des chartes, EA 3624), Christine Nougaret (École nationale des chartes, EA 3624)
Within the framework of Mémoloi, a research project on the making and evolution of French legislation on cultural heritage from 1789 to the present day, conducted with several partners including the Centre national de la recherche scientifique (CECOJI "Law on cultural heritage" research team), the École nationale des chartes (Centre Jean-Mabillon research team) has begun to digitize an archival corpus that should help to understand this history. Our paper presents the problems that this research topic raises and some of the avenues that we have been investigating since November 2012.
The digital corpus will bring together virtually, through descriptions and images, a selection of public archival items kept in various fonds and collections in several repositories. It will eventually include more than ten thousand handwritten, typed or printed units (departmental notes, correspondence, reports, drafts and bills; parliamentary proceedings, amendments and debates; acts, decrees, decisions, etc.).
Current work concentrates on a subset of this corpus, which is not yet entirely assembled and concerns the legislation on historical monuments.
The main objective during this first stage of the project is to understand, through all the remaining witnesses, how these acts, decrees and decisions have been made and transmitted.
First, we worked to identify the archival provenance of the selected items and their documentary context. The results allowed us to enrich the existing descriptive metadata, from which we could design and begin a structured, searchable and browsable finding aid based on the XML/EAD 3 standard.
In addition, in order to consolidate our knowledge about the authors of the texts, and more widely about the entities (be they corporate bodies or persons) which played a role in their making and transmission, we decided to create authority records about them and to encode them in XML/EAC-CPF. These records will be integrated into the digital corpus and will be managed and processed like its other components.
Furthermore, we started to create vocabularies, including accurate definitions, historical notes and scope notes, for the document types, document statuses, interdocumentary relations (whether established through explicit references in the texts or implicit, i.e. genetic or legal), and the entities' roles. We intend to encode these vocabularies in XML/SKOS and align them with other vocabularies.
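To make the SKOS encoding more concrete, here is a minimal sketch in Python (standard library only) of how one vocabulary entry with a definition, a scope note and an alignment link might be serialized as RDF/XML. The URIs under `http://example.org/`, the concept "draft bill" and its definition are placeholders invented for illustration; they are not taken from the project's actual vocabularies.

```python
import xml.etree.ElementTree as ET

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Ask ElementTree to use the conventional prefixes when serializing.
ET.register_namespace("skos", SKOS)
ET.register_namespace("rdf", RDF)


def make_concept(uri, label, definition, scope_note=None, exact_match=None, lang="en"):
    """Build one skos:Concept element with optional scope note and alignment."""
    concept = ET.Element(f"{{{SKOS}}}Concept", {f"{{{RDF}}}about": uri})
    for tag, text in (("prefLabel", label),
                      ("definition", definition),
                      ("scopeNote", scope_note)):
        if text:
            child = ET.SubElement(concept, f"{{{SKOS}}}{tag}", {XML_LANG: lang})
            child.text = text
    if exact_match:
        # skos:exactMatch points to the equivalent concept in another vocabulary.
        ET.SubElement(concept, f"{{{SKOS}}}exactMatch", {f"{{{RDF}}}resource": exact_match})
    return concept


# Hypothetical entry: every value below is an invented placeholder.
root = ET.Element(f"{{{RDF}}}RDF")
root.append(make_concept(
    uri="http://example.org/memoloi/doctype/draft-bill",
    label="draft bill",
    definition="Preparatory version of a bill, prior to its formal introduction.",
    scope_note="Placeholder scope note.",
    exact_match="http://example.org/other-vocabulary/draft-bill",
))

skos_xml = ET.tostring(root, encoding="unicode")
print(skos_xml)
```

The same function could be called once per document type, status, relation or role; alignment with another vocabulary is reduced here to a single `skos:exactMatch` link.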
Finally, the researchers will need to process the text of many of these archival items (they would like to query these texts accurately, to analyze their content and structure, eventually to release a digital edition; later on, to extract the legal terms from them and study how the underlying legal concepts appeared, spread and evolved). In order to reach this goal, we chose to define a markup model based on the XML/TEI standard, from which we will create the elements needed, with a special interest in diplomatic discourse, in external features such as added notes, and in some semantic components (links, named entities, legal terms).
Each of these tasks uses the concepts and methods of diplomatics applied to contemporary documents, and our knowledge of the complex French institutional and administrative history, and we hope they will either enhance them or provide them with new tools.
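As an illustration of the kind of processing such markup would allow, the following Python sketch counts legal terms and collects marginal notes in a small TEI fragment. The fragment is invented, and the choice of `<term type="legal">`, `<orgName>` and `<note place="margin">` is only an assumption about what a markup model of this kind could look like, not the model actually defined by the project.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented sample text; the tagging choices are assumptions for illustration.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <p>The <orgName ref="#ministry">Ministry</orgName> may order the
        <term type="legal">listing</term> of a building.
        <note place="margin">See the earlier draft.</note></p>
      <p>Once <term type="legal">Listing</term> is decided, the
        <term type="legal">easement</term> applies.</p>
    </body>
  </text>
</TEI>"""


def normalized_text(element):
    """Return the element's full text, lower-cased, with whitespace collapsed."""
    return " ".join("".join(element.itertext()).split()).lower()


def legal_terms(xml_string):
    """Count the occurrences of each term tagged as legal."""
    root = ET.fromstring(xml_string)
    terms = root.findall(".//tei:term[@type='legal']", TEI_NS)
    return Counter(normalized_text(term) for term in terms)


def marginal_notes(xml_string):
    """Collect the text of notes written in the margin."""
    root = ET.fromstring(xml_string)
    notes = root.findall(".//tei:note[@place='margin']", TEI_NS)
    return [" ".join("".join(note.itertext()).split()) for note in notes]


term_counts = legal_terms(SAMPLE)
notes = marginal_notes(SAMPLE)
print(term_counts.most_common())
print(notes)
```

Counting normalized forms is only a first step; studying how a legal concept appears and evolves would also require dating each document and linking term variants to a shared concept.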
Before the end of this first stage (in April 2014), we plan:
1. to move forward on all these tasks, and in particular to encode manually a small but representative and consistent selection of the texts related to historical monuments;
2. to use Semantic Web technologies to build a conceptual model of these documents and entities, their digital surrogates, their properties and relations; in other words, to design a documentary ontology adapted to the corpus. We will take into account several other models and initiatives, such as the International Council on Archives (ICA) project for an ontology of archives;
3. to generate RDF triples, conforming to this ontology, from the existing XML metadata, texts and authority records, then define a user interface showing the digital corpus as a graph.
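The third step can be sketched as follows: a short Python example (standard library only) that reads a simplified EAC-CPF authority record and emits triples in N-Triples syntax. The record content, the base URI `http://example.org/memoloi/` and the ontology class names are hypothetical, since the ontology itself is still to be designed; only the EAC-CPF, RDF and RDFS namespaces are real.

```python
import xml.etree.ElementTree as ET

EAC = {"eac": "urn:isbn:1-931666-33-4"}   # EAC-CPF namespace
BASE = "http://example.org/memoloi/"      # hypothetical base URI
ONTO = BASE + "ontology#"                 # hypothetical ontology namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Simplified, invented authority record.
RECORD = """<eac-cpf xmlns="urn:isbn:1-931666-33-4">
  <control><recordId>agent-001</recordId></control>
  <cpfDescription>
    <identity>
      <entityType>corporateBody</entityType>
      <nameEntry><part>Example Commission</part></nameEntry>
    </identity>
  </cpfDescription>
</eac-cpf>"""


def literal(text):
    """Format a plain string as an N-Triples literal (escape backslash and quote)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def record_to_ntriples(xml_string):
    """Turn one authority record into a list of N-Triples lines."""
    root = ET.fromstring(xml_string)
    identity = "eac:cpfDescription/eac:identity/"
    record_id = root.findtext("eac:control/eac:recordId", namespaces=EAC).strip()
    entity_type = root.findtext(identity + "eac:entityType", namespaces=EAC).strip()
    name = root.findtext(identity + "eac:nameEntry/eac:part", namespaces=EAC).strip()

    subject = f"<{BASE}agent/{record_id}>"
    # Map "corporateBody" to the hypothetical class "CorporateBody".
    class_name = entity_type[0].upper() + entity_type[1:]
    return [
        f"{subject} <{RDF_TYPE}> <{ONTO}{class_name}> .",
        f"{subject} <{RDFS_LABEL}> {literal(name)} .",
    ]


triples = record_to_ntriples(RECORD)
for line in triples:
    print(line)
```

Each line is one edge of the graph (subject, property, object), which is what a graph-oriented user interface would ultimately display; a dedicated RDF library would be preferable to string formatting in a production setting.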
If the project obtains enough funding to continue, the next step will be to build tools for it. It could also be connected with other recent initiatives (such as the authority records project at the French Archives nationales, or born-digital legislation and its management in French government departments).

