Abstract:
Recent developments in artificial neural networks have brought significant improvements in Automatic Speech Recognition (ASR). However, the performance of neural network based models largely depends on the availability of large amounts of data and computational resources. When only a limited amount of in-domain data is available, acoustic and language model adaptation methods are used. These methods exploit large amounts of out-of-domain data as well as the limited in-domain data while learning the parameters of the model. This work explores different adaptation methods in neural network based ASR systems developed for spoken lecture processing in English and in Turkish. We mainly investigate speaker adaptation, acoustic condition adaptation, and the effect of applying both adaptations together with a limited amount of spoken lecture data. We show that building a source model with out-of-domain data and adapting this model with limited in-domain data improves performance both in hybrid acoustic model based ASR systems and in end-to-end ASR systems.