High Level Speaker Specific Features as an Efficiency Enhancing Parameters in Speaker Recognition System

Satyanand Singh


In this paper, I present high-level speaker specific feature extraction considering intonation, linguistics rhythm, linguistics stress, prosodic features directly from speech signals. I assume that the rhythm is related to language units such as syllables and appears as changes in measurable parameters such as fundamental frequency (  ), duration, and energy. In this work, the syllable type features are selected as the basic unit for expressing the prosodic features. The approximate segmentation of continuous speech to syllable units is achieved by automatically locating the vowel starting point. The knowledge of high-level speaker’s specific speakers is used as a reference for extracting the prosodic features of the speech signal. High-level speaker-specific features extracted using this method may be useful in applications such as speaker recognition where explicit phoneme/syllable boundaries are not readily available. The efficiency of the particular characteristics of the specific features used for automatic speaker recognition was evaluated on TIMIT and HTIMIT corpora initially sampled in the TIMIT at 16 kHz to 8 kHz. In summary, the experiment, the basic discriminating system, and the HMM system are formed on TIMIT corpus with a set of 48 phonemes. Proposed ASR system shows 1.99%, 2.10%,  2.16%  and  2.19 % of efficiency improvements compared to traditional ASR system for and of 16KHz TIMIT utterances.


Automatic Speaker Recognition (ASR); Gaussian Mixer Model (GMM); Deep neural networks (DNN); Mel-frequency Cepstral Coefficients (MFCC); Confidence Measure (CM)

Full Text:


DOI: http://doi.org/10.11591/ijece.v9i4.pp2443-2450
Total views : 150 times

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

ISSN 2088-8708, e-ISSN 2722-2578