Do we really need an OCR for medieval charters? On the limits and advantages of Digital Palaeography and Diplomatics.

Benedikt Hotz, Benjamin Schönfeld (Ludwig-Maximilians-Universität München)
Attaching the prefix “Digital” to a “traditional” discipline such as palaeography or diplomatics has not only become very popular over the last few years, but has also had interesting effects on how historians perceive the respective research projects. Computers are occasionally treated as a kind of magic bullet wherever “traditional” methods have failed in the past or have reached their natural limits.
This inevitably leads to an overestimation or misunderstanding of what “historical computing” can actually do, and of what it can reasonably be expected to do. We have experienced this when presenting our project “Schrift und Zeichen” to the public, and other scholars involved in digital humanities have reported similar issues. Hopes for OCR programs that reliably handle handwritten texts, or for fully automated writer identification, are just two examples of what is expected of such projects. However, there are both technical and diplomatic/palaeographical limits that make it necessary to reconsider our expectations of information technology in historical research. Drawing on our experience with papal charters, some of the issues to be discussed here are:

  • The amount, availability and especially quality of digital images of charters, and the statistical problems that result from them
  • The large number of different writing styles and scribal hands, and other palaeographical problems such as linked (cursive) script. This is a distinctive issue for all palaeographical work on charters and administrative script in general, especially when compared with projects dealing with the more canonized, artificial script of books.
  • The use of a potential OCR programme optimized for early or high medieval script: in most cases an experienced editor will handle these texts far faster, given the time needed to build such a programme relative to the amount of preserved sources. Further, the technical obstacles mentioned above are considerably higher, if not insurmountable, for the individual, cursive script of the Renaissance and Enlightenment eras. Considering the amount of preserved texts, however, these are precisely the centuries for which OCR would be most needed to handle the sources even at a basic level.

Once such limits have been drawn for digital diplomatics, it becomes necessary to outline which advantages these technologies actually offer and where they can substantially assist the diplomatist or palaeographer. Or, plainly: what can computers really do that human scholars cannot?

