Abstract:
State-of-the-art speech recognition and language processing systems widely use data-driven methods. These methods require large transcribed speech and annotated text corpora, and the success of such systems depends greatly on the amount of training data. The need for transcribed speech makes transcription an important component of every system employing statistical methods. Manual transcription is an expensive and slow task; computers can do the same task much faster but with more errors. Computer Aided Transcription combines these two approaches. The output lattices of an ASR engine, which contain hypotheses about the utterances to be transcribed, are transformed into letter-based, deterministic, weighted finite-state acceptors. These transformed lattices are combined with a letter-based N-gram language model trained on a text corpus similar in content to the speech data. The combined model is used as the language model of the open-source graphical text entry application Dasher, developed at the University of Cambridge. Lattice expansion methods are used to improve the performance of the combined model. It is shown that combining the models at the letter level outperforms both a letter-based N-gram model used as the only language model and a model built by combining the transformed lattices and the letter-based N-gram model at the sentence level.
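To illustrate the letter-level combination idea described above, the following is a minimal sketch, not the thesis implementation: it assumes the combination can be approximated by linearly interpolating a next-letter distribution derived from the lattice acceptor with one from a letter N-gram, as a Dasher-style language model would consume it. All names here (`LetterNgram`, `combine_at_letter_level`, `lattice_probs`) are hypothetical, and the lattice-derived probabilities are stubbed in rather than read from a real weighted finite-state acceptor.

```python
# Hypothetical sketch of letter-level model combination (not the thesis code).
# The lattice-derived distribution would come from walking the letter-based,
# deterministic, weighted finite-state acceptor; here it is a plain dict.

from collections import defaultdict

ALPHABET = "abcdefghijklmnopqrstuvwxyz "


class LetterNgram:
    """Toy letter bigram model with add-one smoothing (assumption)."""

    def __init__(self, corpus):
        self.counts = defaultdict(lambda: defaultdict(int))
        for text in corpus:
            # Pair each letter with its predecessor (space as sentence start).
            for prev, cur in zip(" " + text, text):
                self.counts[prev][cur] += 1

    def next_letter_probs(self, context):
        prev = context[-1] if context else " "
        row = self.counts[prev]
        total = sum(row.values()) + len(ALPHABET)
        return {c: (row[c] + 1) / total for c in ALPHABET}


def combine_at_letter_level(lattice_probs, ngram_probs, weight=0.7):
    """Interpolate lattice-derived and N-gram next-letter distributions."""
    combined = {
        c: weight * lattice_probs.get(c, 0.0) + (1 - weight) * ngram_probs.get(c, 0.0)
        for c in ALPHABET
    }
    norm = sum(combined.values()) or 1.0
    return {c: p / norm for c, p in combined.items()}


if __name__ == "__main__":
    ngram = LetterNgram(["the quick brown fox", "the lazy dog"])
    # Pretend the lattice strongly predicts 'h' after the context "t".
    lattice_probs = {"h": 0.9, "o": 0.1}
    probs = combine_at_letter_level(lattice_probs, ngram.next_letter_probs("t"))
    print(max(probs, key=probs.get))  # most likely next letter under the mix
```

In this sketch the interpolation weight is fixed; the actual combination, lattice expansion, and integration with Dasher follow the methods described in the body of the work.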