First OCR tests completed

The ZEDHIA project team is enthusiastic about the low error rate of the first full text results.


The OCR (optical character recognition) tests carried out in mid-May on four selected Compass volumes from 1882, 1906, 1928 and 1930 produced unexpectedly good results. Treventus Mechatronics GmbH and the ZEDHIA project team verified the generated full text independently of each other and both concluded that, given today's state of the art, no better result could have been achieved with an automatic process. "The surprisingly high quality of the OCR results is a big step forward for our project because the quality of the contents required for ZEDHIA can now be ensured more easily and more quickly", says Mag. Nikolaus Futter, managing director of Compass-Verlag GmbH.

Several factors keep the full text from being completely free of errors: special characters that cannot be converted automatically because they are not available in Unicode, poor print quality and the use of Fraktur fonts are only a few of the reasons why today's OCR software cannot avoid a certain error rate.

From now on, the ZEDHIA project team will work intensively on improving the full text, which will also serve the search function of the ZEDHIA portal. Post-correction will be rule-based and systematic, combining manual correction with the use of algorithms.
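
To illustrate what rule-based post-correction of OCR text can look like, here is a minimal Python sketch. It is not the project's actual procedure: the rule list `RULES` and the function `post_correct` are hypothetical names, and the rules themselves (normalizing the Fraktur long s, fixing digits misread inside words, collapsing repeated spaces) are assumed examples of typical OCR confusions.

```python
import re

# Hypothetical correction rules as (pattern, replacement) pairs.
# These are illustrative assumptions, not the ZEDHIA rule set.
RULES = [
    # Normalize the long s that is common in Fraktur print.
    (re.compile(r"ſ"), "s"),
    # A digit 0 between two ASCII letters is assumed to be a misread letter o.
    (re.compile(r"(?<=[A-Za-z])0(?=[A-Za-z])"), "o"),
    # A digit 1 after a letter and before a lowercase letter is assumed to be an l.
    (re.compile(r"(?<=[A-Za-z])1(?=[a-z])"), "l"),
    # Collapse runs of spaces or tabs into a single space.
    (re.compile(r"[ \t]{2,}"), " "),
]


def post_correct(text):
    """Apply every rule in order; return the corrected text and per-rule hit counts."""
    counts = []
    for pattern, replacement in RULES:
        text, hits = pattern.subn(replacement, text)
        counts.append(hits)
    return text, counts


corrected, counts = post_correct("Geſellſchaft  W0lle Ka1k 1882")
print(corrected)
print(counts)
```

The hit counts show which rules fire most often, which could help decide where manual correction is still needed. Numbers such as years are left untouched because the digit rules only apply when a letter stands directly in front of the digit.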