
Typically, handcrafted features are extracted from images and then passed to a machine learning algorithm, which learns a specific model. Such features are generally difficult to design and adapt poorly from one data set to another. This thesis investigates the use of machine learning approaches to generate these features instead of relying on handcrafted algorithms. It attempts to answer two questions: do such learned features outperform regular handcrafted features, and what are the prerequisites and difficulties of learning them? More specifically, it investigates the use of Restricted Boltzmann Machines (RBMs) and Convolutional RBMs (CRBMs) to learn features from images. To support the experiments, a complete machine learning framework has been implemented on top of an optimized matrix computation backend. Two tasks have been defined to develop the systems.
Sudoku Recognition. This first experiment used a Deep Belief Network (DBN), trained in an unsupervised manner, to recognize Sudoku images taken from Swiss newspapers. The goal was to study the impact of unsupervised pretraining with RBMs, including an analysis of the capability to learn features from mixed printed and handwritten inputs.
Handwritten Keyword Spotting. A complete model for keyword spotting is designed. This model uses features extracted by a CRBM trained in a purely unsupervised manner. These features are then passed to a Dynamic Time Warping (DTW) algorithm or to a Hidden Markov Model (HMM) to perform keyword spotting. The learned features have been shown to significantly outperform state-of-the-art handcrafted features in most configurations.
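To illustrate how a sequence of feature vectors can be matched against a keyword template with DTW, the following is a minimal sketch and not the implementation used in this thesis. It assumes that each word image has already been converted into a sequence of fixed-size feature vectors (for instance, one vector per sliding-window position). The function name `dtw_distance`, the Euclidean local cost, and the normalization by the summed sequence lengths are illustrative assumptions.

```python
import numpy as np


def dtw_distance(a, b):
    """Length-normalized DTW distance between two feature sequences.

    a, b: array-likes of shape (n, d) and (m, d), one feature vector per row.
    Hypothetical helper: Euclidean local cost and division by (n + m)
    are assumptions made for this sketch.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)

    # Pairwise Euclidean distances between all frames of a and b.
    cost = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)

    # acc[i, j] is the cheapest cost of aligning a[:i] with b[:j].
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j],      # a advances, b repeats a frame
                acc[i, j - 1],      # b advances, a repeats a frame
                acc[i - 1, j - 1],  # both advance
            )

    # Normalize so that sequences of different lengths stay comparable.
    return acc[n, m] / (n + m)


# Toy usage: the same pattern at a different speed aligns at zero cost.
template = [[0.0], [1.0], [2.0]]
stretched = [[0.0], [1.0], [1.0], [2.0]]
print(dtw_distance(template, stretched))
```

In a spotting setting, a candidate word would be accepted as an occurrence of the keyword when its distance to the template falls below a threshold; how that threshold is chosen is outside the scope of this sketch.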
