Özet:
The aim of keyword search (KWS) is to locate written queries in large amount of audio data such as archived news broadcasts, audio/video lectures, recorded customer call-center data or conversational speech. State of the art KWS approaches are based on indexing automatic speech recognition (ASR) lattices. However, for languages having only a limited amount of transcribed audio, the ASR performance decreases which in turn reduces the KWS performance. Another problem with ASR based KWS systems is searching for out-of-vocabulary (OOV) keywords which are not covered by the ASR vocabulary. One common approach is expanding the keyword using a confusion model (CM) and searching for similar words along with the original. In this work, the KWS index is generated using symbolic representations of the data instead of ASR lattices. These symbols are obtained by encoding the search data posteriorgram which is generated using the deep neural network (DNN) output of the ASR system. In the experiments performed on the low resource language datasets of the IARPA Babel Program, we show that when combined with existing ASR lattice based KWS systems, the proposed system improves the KWS performance measured in terms of term weighted value (TWV), especially for OOV queries. In order to handle OOV queries, a discriminative approach for training the CM is also introduced which directly aims at maximizing the TWV for OOV queries. We explore the in uence of discriminative training on both an existing ASR lattice based system and the symbolic index based system under low resource settings.