Abstract:
This thesis is motivated by the challenge of the high computational demand of the dynamic time warping algorithm, which is widely used for template-based keyword search tasks. The dynamic time warping algorithm suffers from the heavy computation of the cost matrix during the search process. To address this problem, we present a pre-filtering step for the algorithm. We use phonetic posteriorgrams to represent the audio data and generate average posteriorgrams to represent the given text queries. The posteriorgram of the document is divided into segments, or submatrices, which are converted into supervectors that are used to identify the possible candidates for query matching. Cosine distance is used as the distance measure to quantify the similarity between the document supervectors and the query supervectors. Filtering is performed according to a threshold value, which specifies the amount of filtering. A second aim is to improve the performance of a large vocabulary continuous speech recognition based keyword search system using the proposed approach for posteriorgram-based keyword search. Experimental results indicate that the proposed method reduces the computational complexity of the dynamic time warping algorithm without a significant loss in performance and that, when combined with the baseline large vocabulary continuous speech recognition based keyword search system, it improves the performance for both in-vocabulary and out-of-vocabulary queries.
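
The pre-filtering step summarized above can be illustrated with a minimal sketch. It is not the thesis implementation: the function names (`segment_supervectors`, `cosine_distance`, `prefilter`), the `hop` parameter, and the choice of a fixed segment length equal to the query length are all assumptions made here for illustration. Posteriorgrams are assumed to be NumPy arrays of shape (frames, phones), and the query is assumed to be already available as an average posteriorgram.

```python
import numpy as np

def segment_supervectors(post, seg_len, hop):
    """Slice a (frames x phones) posteriorgram into windows of seg_len
    frames and flatten each window into one supervector (assumed scheme)."""
    starts = list(range(0, post.shape[0] - seg_len + 1, hop))
    vecs = [post[s:s + seg_len].ravel() for s in starts]
    return vecs, starts

def cosine_distance(vectors, query):
    """Cosine distance (1 - cosine similarity) of each row to the query."""
    num = vectors @ query
    den = np.linalg.norm(vectors, axis=1) * np.linalg.norm(query)
    return 1.0 - num / np.maximum(den, 1e-12)  # guard against zero norms

def prefilter(doc_post, query_post, threshold, hop=1):
    """Return start frames of document segments whose cosine distance to
    the query supervector is at most the threshold. Only these candidates
    would then be passed on to dynamic time warping."""
    seg_len = query_post.shape[0]
    vecs, starts = segment_supervectors(doc_post, seg_len, hop)
    if not vecs:  # document shorter than the query: nothing to keep
        return []
    dist = cosine_distance(np.array(vecs), query_post.ravel())
    return [s for s, d in zip(starts, dist) if d <= threshold]
```

In this sketch a lower threshold discards more segments, so it trades a smaller dynamic time warping workload against the risk of dropping a true match.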