Abstract:
Speaker verification is one of the most challenging branches of biometric authentication. Covering a wide spectrum from security services to law enforcement, speaker verification systems are employed in phone banking, forensic audio analysis and access control applications. An important observation is that verification accuracy depends heavily on the amount of available data and is easily affected by acoustic variations. This study investigates the effects of data duration, model size and session variability on text-independent speaker verification performance. We implement GMM/UBM and SVM supervector classifiers to represent speaker characteristics and compare their results for various training and testing durations as well as model complexities. The influence of speaker adaptation methods and kernel function selection on verification accuracy is examined. A minority oversampling scheme is used to address the class imbalance problem in SVMs. We also explore how session variability affects error rates and apply the Nuisance Attribute Projection (NAP) method to reduce acoustic mismatches between the training and test samples. Working on the CSLU Speaker Recognition Dataset, we present a comparative evaluation of speaker verification systems under limited and extensive data conditions.