Morph: Learning to learn: An Adaptive Reading System using a High-Performance Morphed-Image Correlator
08 / 2006 - unknown
Current methods for handwriting recognition are unsuitable for use in massive collections of historical documents. All statistical techniques require large numbers of word images labeled with their ASCII ground truth. This manual labeling of image sections must be repeated for each document type and historical period because of the extraordinary variation in writing styles. Since optical character recognition of unconstrained-style handwritten documents is not possible, the digitization of large and important document collections is in a state of deadlock. Current computing power, notably the availability of the Blue Gene supercomputer, allows for a new way of combining machine-learning technology with non-statistical brute-force matching methods. Using high-performance computing, it will be possible to learn to identify similarities in text passages. Using a bootstrapping approach with limited human intervention, relevant keywords and phrases in the text can be learned. Subsequently, adapted information-retrieval (IR) techniques can be used to search a large handwritten document collection. Single-processor experiments yield promising results, but the experimentation process takes too much time on one processor. In addition, the manual construction of optimal processing recipes for a given problem is cumbersome. High-performance computing will help, provided that principled approaches for optimizing the processing pipeline exist.
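
To make the idea of non-statistical brute-force matching concrete, the following is a minimal sketch, not the project's actual correlator. It assumes word images are small grayscale grids of equal size and scores a query against every labeled candidate with a plain normalized correlation; the morphing step named in the title is left out. The function names `normalized_correlation` and `rank_matches` and the toy data are hypothetical, introduced only for illustration.

```python
from math import sqrt


def normalized_correlation(a, b):
    """Normalized correlation of two equally sized grayscale images (lists of rows)."""
    xs = [p for row in a for p in row]
    ys = [p for row in b for p in row]
    if len(xs) != len(ys):
        raise ValueError("images must have the same number of pixels")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sqrt(sum((x - mean_x) ** 2 for x in xs) *
               sum((y - mean_y) ** 2 for y in ys))
    # A constant image has no contrast to correlate; treat it as "no match".
    return num / den if den else 0.0


def rank_matches(query, candidates):
    """Brute force: score every labeled candidate against the query, best first."""
    scored = [(normalized_correlation(query, image), label)
              for label, image in candidates.items()]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    # Toy 2x2 "word images" (assumed data, not from the project).
    query = [[0, 1], [1, 0]]
    candidates = {
        "same": [[0, 1], [1, 0]],
        "inverse": [[1, 0], [0, 1]],
        "flat": [[1, 1], [1, 1]],
    }
    for score, label in rank_matches(query, candidates):
        print(f"{label}: {score:+.2f}")
```

Every query is compared with every candidate, so the cost grows with the product of collection size and image size. Because each comparison is independent of the others, this kind of workload spreads naturally across many processors, which is why the single-processor bottleneck described above matters.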