dc.description.abstract |
We designed a voice-driven keyword spotter. To improve the system's performance, we used synthetically generated voice inputs in addition to natural voice inputs, and we used approximate string matching instead of exact string matching. Classical keyword spotters are mostly text-driven; our spotter, in contrast, takes its input in the form of voice. Different people may pronounce the same keyword in different ways, because factors such as gender, age, nationality, intonation, accent, emotional state, environment and noise play an important role in pronunciation. Even samples of a keyword taken from the same person at different times may differ. Therefore, driving the keyword spotter with voice instead of text gives us a source of variety, and this variety increases the probability of spotting the keyword. Classical keyword spotters are also mostly language-dependent. In our spotter, multiple phoneme recognizers trained on different languages may be used in cooperation. We believe that this ability is highly likely to make our spotter language-independent. Even if a phoneme recognizer for only one language is used, it will make similar errors on both the input side and the search-database side, so the system may still be language-independent to some extent. Because we take the input in voice format, we can collect many samples of the keyword and produce appropriate transformations of them. This ability alleviates speaker dependency. |
|
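The abstract does not say which approximate string matching measure is used. A minimal sketch, assuming each voice sample of the keyword and the searched utterance have already been turned into phoneme symbol lists by a phoneme recognizer, is to use semi-global edit distance (Sellers' algorithm), which lets a keyword match start and end anywhere in the utterance. The function names `best_substring_distance` and `spot`, and the length-normalised `threshold` of 0.25, are hypothetical illustrations and not the authors' method. Testing several keyword samples and keeping the best score is one way to exploit the pronunciation variety described above.

```python
from typing import List, Sequence, Tuple


def best_substring_distance(pattern: Sequence[str], text: Sequence[str]) -> Tuple[int, int]:
    """Return (cost, end) of the best approximate occurrence of pattern in text.

    Semi-global edit distance (Sellers' algorithm): the first DP row is all
    zeros, so a match may begin at any position of text, and taking the
    minimum of the last row lets it end anywhere. `end` is the length of the
    text prefix in which the best match ends.
    """
    m, n = len(pattern), len(text)
    prev = [0] * (n + 1)  # empty pattern matches anywhere at cost 0
    for i in range(1, m + 1):
        curr = [i] + [0] * n  # column 0: all i pattern symbols are unmatched
        for j in range(1, n + 1):
            sub = 0 if pattern[i - 1] == text[j - 1] else 1
            curr[j] = min(
                prev[j - 1] + sub,  # match or substitution
                prev[j] + 1,        # pattern phoneme missing from the text
                curr[j - 1] + 1,    # extra phoneme in the text
            )
        prev = curr
    cost = min(prev)
    return cost, prev.index(cost)


def spot(keyword_samples: List[List[str]], utterance: List[str],
         threshold: float = 0.25) -> bool:
    """Hypothetical spotter: report the keyword if ANY phoneme sample of it
    matches the utterance with a length-normalised edit cost <= threshold."""
    samples = [s for s in keyword_samples if s]
    if not samples:
        return False
    best = min(best_substring_distance(s, utterance)[0] / len(s) for s in samples)
    return best <= threshold


# Usage: one sample alone misses a variant pronunciation; two samples catch it.
print(spot([["k", "a", "t"]], ["a", "k", "o", "t"]))                   # False
print(spot([["k", "a", "t"], ["k", "o", "t"]], ["a", "k", "o", "t"]))  # True
```

Normalising the cost by the sample length keeps the threshold comparable across keywords of different lengths. With a single sample, the utterance above needs one substitution out of three phonemes (cost 1/3), which exceeds the threshold. Adding a second sample that was pronounced differently yields an exact match.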