A Novel automatic voice recognition system based on text-independent in noisy environment

Motaz Hamza, Touraj Khodadadi, Sellappan Palaniappan


An automatic voice recognition system aims to limit fraudulent access to sensitive areas such as laboratories. The primary objective of this paper is to increase the accuracy of voice recognition with the Microsoft Research (MSR) Identity Toolbox in a noisy environment. The proposed system asks the user to speak into a microphone and then matches the unknown voice against the human voices stored in a database using a statistical model, in order to grant or deny access to the system. Voice recognition is done in two steps: training and testing. During training, a Universal Background Model (UBM) and Gaussian Mixture Models (GMMs), jointly referred to as GMM-UBM models, are calculated from different sentences spoken by the speaker(s) whose voices are recorded as training data. Testing of a voice signal in a noisy environment is then done by calculating the log-likelihood ratio of the GMM-UBM models in order to classify the user's voice. Before testing, noise-addition and de-noising methods are applied. We also investigate different MFCC features of the voice to determine the best feature and the noise-filtering algorithm that most improve the performance of the automatic voice recognition system.
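The log-likelihood ratio test described in the abstract can be illustrated with a minimal sketch. It assumes diagonal-covariance mixtures and uses hand-picked two-dimensional toy parameters in place of models trained on MFCC features. The names `llr_score`, `verify`, `UBM`, and `SPEAKER`, and the threshold of 0.0, are hypothetical and are not taken from the MSR Identity Toolbox.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, gmm):
    """Log p(x | gmm) for a mixture given as (weight, mean, var) triples."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)  # log-sum-exp trick for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def llr_score(frames, speaker_gmm, ubm):
    """Average per-frame log-likelihood ratio of the speaker model vs. the UBM."""
    total = sum(
        gmm_log_likelihood(x, speaker_gmm) - gmm_log_likelihood(x, ubm)
        for x in frames
    )
    return total / len(frames)


def verify(frames, speaker_gmm, ubm, threshold=0.0):
    """Accept the claimed identity when the score reaches the threshold.

    The threshold value is an assumption; in practice it is tuned on held-out data.
    """
    return llr_score(frames, speaker_gmm, ubm) >= threshold


# Toy 2-D models (assumed values, for illustration only).
UBM = [(0.5, (0.0, 0.0), (1.0, 1.0)), (0.5, (4.0, 4.0), (1.0, 1.0))]
SPEAKER = [(0.5, (1.0, 1.0), (1.0, 1.0)), (0.5, (4.0, 4.0), (1.0, 1.0))]

if __name__ == "__main__":
    claimed = [(1.0, 1.0), (1.2, 0.8)]       # frames close to the speaker model
    impostor = [(-1.0, -1.0), (-0.8, -1.2)]  # frames better explained by the UBM
    print(verify(claimed, SPEAKER, UBM), verify(impostor, SPEAKER, UBM))
```

A positive score means the frames are better explained by the speaker model than by the background model. Averaging over frames keeps the score comparable across utterances of different lengths.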


Automatic Voice Recognition (AVR), Microsoft Research (MSR) Identity Toolbox, Gaussian Mixture Model (GMM), Universal Background Model (UBM), Mel Frequency Cepstral Coefficients (MFCCs)


DOI: http://doi.org/10.11591/ijece.v10i4.pp%25p

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.