The Algorithms That Automatically Date Medieval Manuscripts
Around a million medieval documents have no date, making their historical significance difficult to quantify. But automated computer techniques look set to revolutionise the work for historians.
An important aspect of any society is the way it keeps records of property and land transactions so that ownership can be properly established and disputes resolved. In medieval Britain, this process was largely carried out by religious or royal institutions which recorded transactions in documents, written in Latin, called charters.
Today, more than a million charters survive, either as originals or, more often, as ancient copies. They provide a remarkable insight into the pressures at work in medieval politics, economics and society between the tenth and fourteenth centuries in England.
For example, historians can use these documents to study the rise and fall of military and religious organisations. One such case is the Order of the Hospital of Saint John of Jerusalem, a religious and military organisation set up after the western conquest of Jerusalem in the 11th century (the First Crusade).
Historians say the charters clearly show how the organisation became militarised in response to the call for a Second Crusade in 1145, triggered by Muslim forces recapturing various towns in the region.
Clearly, these documents have huge historical value, but there is a problem: most charters are not dated, particularly those from the period of Norman rule between 1066 and 1307.
The problem for historians is to find some way of time-ordering these documents. But it is no easy task.
Today, Gelila Tilahun and colleagues at the University of Toronto discuss this challenge and outline the new statistical techniques they use to tackle it.
Their approach is to use a subset of some 10,000 charters that are dated and to look for changes in language over time that could be used to date other documents. For example, Tilahun and co say that the phrase “amicorum meorum vivorum et mortuorum”, which means “of my friends living and dead”, was popular between the years 1150 and 1240 but not at other times. And the phrase “Francis et Anglicis”, which is a form of address meaning “to French and English”, was phased out when England lost Normandy to the French in 1204.
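To see how far phrase-spotting alone can go, here is a minimal Python sketch that intersects the date ranges implied by the two phrases just mentioned. It is an illustration only, not the authors' code: the function name `date_window` is hypothetical, the sample sentence is made up, and the 1066 start year for “Francis et Anglicis” is an assumption, since only the 1204 end point is given above.

```python
# Date ranges taken from the two phrases discussed above.
# The 1066 start year is an assumption (the Norman Conquest); only the
# 1204 end point comes from the text.
PHRASE_RANGES = {
    "amicorum meorum vivorum et mortuorum": (1150, 1240),
    "francis et anglicis": (1066, 1204),
}

def date_window(text, phrase_ranges=PHRASE_RANGES):
    """Return the (earliest, latest) years consistent with every known
    phrase found in `text`, or None if no phrase matches or the
    matching ranges contradict each other."""
    # Normalise case and whitespace so line breaks do not hide a phrase.
    normalised = " ".join(text.lower().split())
    earliest, latest = None, None
    for phrase, (start, end) in phrase_ranges.items():
        if phrase in normalised:
            earliest = start if earliest is None else max(earliest, start)
            latest = end if latest is None else min(latest, end)
    if earliest is None or earliest > latest:
        return None
    return (earliest, latest)

# Made-up sample wording, for illustration only.
sample = ("Omnibus Francis et Anglicis salutem. Pro animabus "
          "amicorum meorum vivorum et mortuorum.")
window = date_window(sample)
```

A charter containing both phrases is narrowed to the overlap of the two ranges, while a charter containing neither gets no estimate at all. That second case is where phrase-spotting runs out of road.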
However, the statistical approach is much more rigorous than simply looking for common phrases. Tilahun and co’s computer search looks for patterns in the distribution of words occurring once, twice, three times and so on. “Our goal is to develop algorithms to help automate the process of estimating the dates of undated charters through purely computational means,” they say.
This approach reveals various patterns, which they then test by attempting to date individual documents in the dated set. They say the best approach is one known as the maximum prevalence technique. This is a statistical technique that gives a most probable date by comparing the set of words in the document with the distribution in the training set.
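The idea of picking the date at which a document's vocabulary is most "at home" can be sketched in a few lines of Python. This is a simplified illustration in the spirit of the description above, not the authors' implementation: the Gaussian kernel, the 20-year bandwidth, the additive smoothing, the single-word features and the toy training set are all assumptions, and the names `prevalence` and `estimate_date` are hypothetical.

```python
import math

# Toy training set of (year, set of words) pairs, invented for illustration.
TRAINING = [
    (1160, {"amicorum", "vivorum", "francis", "anglicis"}),
    (1180, {"amicorum", "vivorum", "francis", "anglicis"}),
    (1200, {"amicorum", "francis", "anglicis"}),
    (1230, {"amicorum", "vivorum"}),
    (1260, {"sciant", "presentes"}),
    (1290, {"sciant", "presentes"}),
]

def prevalence(word, year, training, bandwidth=20.0, smoothing=0.5):
    """Kernel-weighted share of dated charters near `year` containing `word`."""
    num = den = 0.0
    for doc_year, words in training:
        u = (year - doc_year) / bandwidth
        weight = math.exp(-0.5 * u * u)  # Gaussian kernel (an assumption)
        den += weight
        if word in words:
            num += weight
    # Additive smoothing keeps the estimate strictly between 0 and 1.
    return (num + smoothing) / (den + 2 * smoothing)

def estimate_date(doc_words, training, years, bandwidth=20.0):
    """Return the candidate year with the highest log-likelihood score."""
    vocab = set().union(*(words for _, words in training))
    best_year, best_score = None, -math.inf
    for year in years:
        score = 0.0
        for word in vocab:
            p = prevalence(word, year, training, bandwidth)
            # Reward words that are present, penalise expected words that are absent.
            score += math.log(p) if word in doc_words else math.log(1.0 - p)
        if score > best_score:
            best_year, best_score = year, score
    return best_year

candidate_years = range(1150, 1301, 10)
early = estimate_date({"amicorum", "vivorum", "francis", "anglicis"},
                      TRAINING, candidate_years)
late = estimate_date({"sciant", "presentes"}, TRAINING, candidate_years)
```

Each candidate year is scored by how well the words present in, and absent from, the charter match the dated charters written around that time, and the highest-scoring year is returned. The bandwidth controls how wide that neighbourhood is: too narrow and the estimate chases individual charters, too wide and changes in language are blurred away.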
Tilahun and co say their approach also has other applications. For example, the same technique could be used to work out authorship and to weed out forgeries, of which there are known to be a substantial number.
So how well does it work in practice? These guys finish their paper with a fascinating anecdote about a medieval English charter that was discovered in a drawer at the library of Brock University near Niagara Falls.
The charter lacked a date, so various historians attempted to work out when it was written. The first estimates pointed to the 14th century but these were later revised to the 13th century. Eventually, by comparing the charter to other records, one academic pinned it down to a date between 1235 and 1245.
Inspired by the media interest in this charter, Tilahun and co ran the document through their automated maximum prevalence procedure. “The date estimate we obtained was 1246,” they say, with just a little hint of pride. Not bad!
Ref: arxiv.org/abs/1301.2405: Dating Medieval English Charters