Rejecting both segmentation and classification errors in handwritten form processing

DE STEFANO, Claudio;FONTANELLA, Francesco;SCOTTO DI FRECA, Alessandra
2014-01-01

Abstract

Commercially available form processing systems include a verification step in which a human operator checks the output of the system to ensure 100% accuracy. To reduce the time and cost of this stage, the OCR engine incorporated into the system provides a reliability measure for each classification, which is used to implement a reject option: only rejected samples are passed to the verification stage. Most strategies for designing such a reject option assume that the source of classification errors lies within the OCR engine. This assumption becomes less reasonable as forms become less structured, as in the case where boxes are provided for the entire data field rather than for isolated characters. Under these circumstances, we investigate to what extent the reliability measure provided by an OCR engine designed for boxed isolated characters can be used to detect both segmentation and classification errors. The experimental results, obtained on a large data set of forms currently in use by a large organization, show that the proposed method successfully achieves its aim. It gives the system manager a tool for planning system enhancements as the volume of forms containing less constrained data fields increases.
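As an illustration of the reject option described in the abstract, the following is a minimal sketch assuming a simple threshold rule on the reliability measure. The abstract does not specify the rule the authors use; the names `Recognition`, `apply_reject_option`, `field_is_rejected`, the assumed reliability range of [0, 1], and the threshold value are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Recognition:
    """One OCR output: a class label and its reliability (assumed in [0, 1])."""
    label: str
    reliability: float


def apply_reject_option(
    results: List[Recognition], threshold: float
) -> Tuple[List[Recognition], List[Recognition]]:
    """Split OCR outputs into accepted and rejected samples.

    A sample is rejected when its reliability falls below the threshold;
    only rejected samples would go to the human verification stage.
    """
    accepted: List[Recognition] = []
    rejected: List[Recognition] = []
    for result in results:
        if result.reliability >= threshold:
            accepted.append(result)
        else:
            rejected.append(result)
    return accepted, rejected


def field_is_rejected(chars: List[Recognition], threshold: float) -> bool:
    """Assumed field-level rule: reject the whole data field if any
    character in it has a reliability below the threshold."""
    return any(c.reliability < threshold for c in chars)


# Hypothetical outputs for three characters of one data field.
field = [
    Recognition("A", 0.97),
    Recognition("8", 0.42),  # low reliability, e.g. a badly segmented character
    Recognition("C", 0.88),
]
accepted, rejected = apply_reject_option(field, threshold=0.8)
```

With these hypothetical values, the low-reliability character is the only one routed to verification. Raising the threshold sends more samples to the operator and lowers the error rate among accepted samples, which is the trade-off a reject option is meant to control.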
2014
978-1-4799-4335-7

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11580/36383
Citations (Scopus): 5