What is the minimum training data size to reliably identify writers in medieval manuscripts?
Cilia N. D.; De Stefano C.; Fontanella F.; Molinara M.; Scotto di Freca A.
2020-01-01
Abstract
One of the most important research topics in palaeography is the identification of the different scribes who took part in writing a medieval book. Using traditional palaeographic tools, a palaeographer spends a great deal of time reading, measuring and comparing thousands of letters or graphic signs. The aim is to evaluate characteristics such as the height or width of letters, the distance between characters, the angles of inclination, and the number and type of abbreviations, which together allow a reliable identification of the scribes who contributed to the production of a given manuscript. Despite growing scientific interest in recent years in computer techniques applied to palaeographic research, researchers have not yet reached general agreement either on the effectiveness of automatic analysis tools or on the features that should be considered in such an analysis. However, in the context of a highly standardized school, some basic page layout features can be very useful for automatically identifying the presence of different hands. The aim of our study is therefore to verify whether the amount of data a palaeographer must analyse manually can be substantially reduced, by answering the following question: what is the minimum size of the training set that allows a classification system to identify the different scribal hands reliably? To this end, we considered two well-known and highly efficient classification techniques, progressively varying the size of the training set and comparing the corresponding classification results. To improve classification reliability, we also introduced a multi-expert classification architecture, which makes a reject option easy to implement.
The experiments were performed on two large sets of digital images extracted from two entire 12th-century Bibles. The results show that, using only a few pages of these Bibles as a training set, the scribal hands in the remaining pages can be identified automatically with great reliability.
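The experimental protocol described in the abstract (growing the training set step by step, and rejecting a page when the experts disagree) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration. The abstract does not name the two classifiers, the layout features or the combining rule, so the sketch uses synthetic two-dimensional "page" features, a nearest-centroid and a 1-nearest-neighbour classifier as stand-in experts, and a simple agreement rule for rejection.

```python
import random
from statistics import mean

def make_pages(n_per_scribe, seed=0):
    """Build synthetic (features, scribe) pairs. Illustrative data only."""
    rng = random.Random(seed)
    # Hypothetical layout features per scribe: (rows per column, line spacing).
    centres = {"A": (30.0, 2.0), "B": (32.0, 2.6), "C": (34.0, 2.1)}
    pages = []
    for scribe, (rows, spacing) in centres.items():
        for _ in range(n_per_scribe):
            features = (rng.gauss(rows, 0.4), rng.gauss(spacing, 0.15))
            pages.append((features, scribe))
    rng.shuffle(pages)
    return pages

def dist2(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_centroid(train):
    """Stand-in expert 1: assign the scribe whose mean feature vector is closest."""
    groups = {}
    for x, y in train:
        groups.setdefault(y, []).append(x)
    centroids = {y: tuple(mean(col) for col in zip(*xs)) for y, xs in groups.items()}
    return lambda x: min(centroids, key=lambda y: dist2(x, centroids[y]))

def one_nn(train):
    """Stand-in expert 2: assign the scribe of the closest training page."""
    return lambda x: min(train, key=lambda p: dist2(x, p[0]))[1]

def evaluate(train, test):
    """Return (accuracy on accepted pages, reject rate) for the two-expert system."""
    experts = [nearest_centroid(train), one_nn(train)]
    accepted = correct = 0
    for x, y in test:
        votes = {expert(x) for expert in experts}
        if len(votes) == 1:  # experts agree: accept; otherwise reject the page
            accepted += 1
            correct += votes.pop() == y
    reject_rate = 1 - accepted / len(test)
    accuracy = correct / accepted if accepted else 0.0
    return accuracy, reject_rate

def learning_curve(pages, sizes, n_test=90):
    """Evaluate on a fixed test set while the training set grows."""
    test, pool = pages[:n_test], pages[n_test:]
    return {n: evaluate(pool[:n], test) for n in sizes}

for n, (acc, rej) in learning_curve(make_pages(80), [3, 6, 15, 30, 60]).items():
    print(f"train pages={n:3d}  accuracy on accepted={acc:.2f}  reject rate={rej:.2f}")
```

Reading the output from the smallest training size upwards shows the point at which accuracy on the accepted pages stops improving, which is the kind of "minimum training size" the study asks about. Rejected pages are the ones that would be handed back to the palaeographer for manual analysis.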