Korkiakangas, Timo: Challenges for the Linguistic Annotation of an Early Medieval Charter Corpus

Posted by AA in Linguistical Statistics, Paper, Proposals, Thursday

My Ph. D. research deals with the Latin noun declension system in a corpus of ca. 500 early medieval Tuscan private charters, digitised from three diplomatic editions. I annotate the charters morphosyntactically in the Perseus Treebank online environment.

A crucial point for any linguistic study of medieval charters is the ability to distinguish the less formulaic, "freer" parts of a charter (e.g. the improvised parts of the disposition) from the entirely formulaic ones, because these parts present completely different linguistic realities. The very formulaicity of charter language led scribes to produce errors of contamination, whereas the "freer" parts are likely to reflect the spoken language more closely, together with the problems involved in transcribing it in written form.

I am currently looking for ways to tag the different parts of the charters according to a textometrics-based estimate of their relative formulaicity. I also hope to exploit the results that a colleague of mine will obtain by processing the same charter material with similarity-detection methods based on diffusion maps. A more detailed study of the formulae will only be possible once all word forms written in non-classical orthography have been normalised, a task that may become feasible when every word in the corpus has finally been provided with its lemma and morphological analysis.
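The proposal does not say which textometric measure will be used, so the following is only a minimal sketch of one possible proxy for relative formulaicity: the share of a passage's word trigrams that also occur in at least one other charter. The function names, the trigram size and the `min_docs` threshold are hypothetical choices, and the toy token lists are invented for illustration, not taken from the Tuscan corpus.

```python
from collections import defaultdict


def ngrams(tokens, n):
    """Return the contiguous word n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def document_frequencies(charters, n=3):
    """Map each n-gram to the number of distinct charters that contain it."""
    df = defaultdict(int)
    for tokens in charters:
        # set() so that a formula repeated inside one charter counts only once
        for gram in set(ngrams(tokens, n)):
            df[gram] += 1
    return df


def formulaicity(segment, df, n=3, min_docs=2):
    """Share (0.0-1.0) of the segment's n-grams found in >= min_docs charters.

    With min_docs=2 and a segment taken from the corpus itself, an n-gram
    counts as formulaic only if at least one *other* charter shares it.
    """
    grams = ngrams(segment, n)
    if not grams:
        return 0.0
    shared = sum(1 for g in grams if df.get(g, 0) >= min_docs)
    return shared / len(grams)


# Invented toy "charters", already tokenised and lower-cased.
charters = [
    "in dei nomine ego petrus vendo tibi terram".split(),
    "in dei nomine ego ursus vendo tibi casam".split(),
    "ego petrus hoc scripsi manu mea".split(),
]
df = document_frequencies(charters)

for tokens in charters:
    print(f"{formulaicity(tokens, df):.2f}", " ".join(tokens))
```

Because the score relies on exact token matches, a spelling variant such as a non-classical form of the same word breaks an otherwise shared n-gram. This is one concrete reason why normalising the orthography, as noted above, would be a precondition for any reliable formula detection of this kind.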

Timo Korkiakangas
Ph. D. Student
Classical Philology
University of Helsinki, Finland
eMail: timo.korkiakangas@helsinki.fi
