Rejecting both segmentation and classification errors in handwritten form processing
DE STEFANO, Claudio;FONTANELLA, Francesco;SCOTTO DI FRECA, Alessandra
2014-01-01
Abstract
Commercially available form processing systems include a verification step in which a human operator checks the system's output to ensure 100% accuracy. To reduce the time and cost of this step, the OCR engine incorporated into the system provides a reliability measure for each classification, which is used to implement a reject option: only rejected samples are passed to the verification stage. Most strategies for designing such a reject option assume that the sources of classification errors lie within the OCR engine. This assumption becomes less reasonable as forms become less structured, as in the case where boxes are provided for the entire data field rather than for isolated characters. Under these circumstances, we investigate to what extent the reliability measure provided by an OCR engine designed for boxed isolated characters can be used to detect both segmentation and classification errors. The experimental results, obtained on a large data set of forms currently in use by a large organization, show that the proposed method successfully achieves its aim. It offers the system manager a powerful tool for planning system enhancements as the volume of forms containing less constrained data fields increases.
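The reject option described in the abstract can be illustrated with a minimal sketch. This is not the authors' method: the names `CharResult` and `route_field`, the assumption that reliability lies in [0, 1], the threshold of 0.9, and the rule of rejecting a field when its least reliable character falls below the threshold are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class CharResult:
    """One character hypothesis returned by a (hypothetical) OCR engine."""
    label: str
    reliability: float  # assumed to lie in [0, 1]; higher means more reliable


def route_field(chars, threshold=0.9):
    """Decide whether a data field is accepted or sent to human verification.

    Illustrative rule only: the field is rejected when it is empty or when
    its least reliable character falls below the threshold. A low score may
    stem from a classification error or from a badly segmented character.
    """
    text = "".join(c.label for c in chars)
    if not chars or min(c.reliability for c in chars) < threshold:
        return "reject", text
    return "accept", text
```

With this rule, a field whose characters all score at least the threshold bypasses the operator, while a single low-reliability character is enough to route the whole field to verification. The threshold would in practice be tuned on held-out forms to trade verification cost against residual error.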