High-Level Speaker-Specific Features as Efficiency-Enhancing Parameters in Speaker Recognition Systems

Satyanand Singh, Pragya Singh


In this paper, we present a method for extracting high-level speaker-specific prosodic features, capturing intonation, linguistic rhythm, and linguistic stress, directly from the speech signal. We assume that rhythm is tied to linguistic units such as syllables and appears as changes in measurable parameters such as fundamental frequency (F0), duration, and energy. In this work, syllable-like units are selected as the basic units for representing prosodic features. Continuous speech is approximately segmented into syllable-like units by automatically locating vowel onset points. Knowledge of these high-level speaker-specific cues is then used as a reference for extracting the prosodic features of the speech signal. High-level speaker-specific features extracted in this way may be useful in applications such as speaker recognition, where explicit phoneme or syllable boundaries are not readily available. The effectiveness of these features for automatic speaker recognition was evaluated on the TIMIT and HTIMIT corpora, with the TIMIT utterances, originally sampled at 16 kHz, downsampled to 8 kHz. In the experiments, both the baseline discriminative system and the HMM system are trained on the TIMIT corpus using a set of 48 phonemes. The proposed ASR system shows efficiency improvements of 1.99%, 2.10%, 2.16%, and 2.19% over a traditional ASR system across four test conditions on 16 kHz TIMIT utterances.
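The abstract does not spell out the segmentation or feature-extraction algorithm, so the following is only a rough sketch of the general idea under stated assumptions. It uses a simple energy-threshold rule as a stand-in for vowel onset point detection (this is not the authors' method) and autocorrelation pitch estimation, then computes per-unit onset time, duration, mean energy, and F0 on an 8 kHz signal. The function names, the 30 ms / 10 ms framing, the 20 dB relative energy threshold, the 60 to 400 Hz pitch range, and the 0.3 voicing threshold are all hypothetical, illustrative choices.

```python
import numpy as np

FS = 8000  # assumed sampling rate after downsampling TIMIT from 16 kHz


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (one per row); assumes len(x) >= frame_len."""
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def log_energy(frames):
    """Per-frame log-energy in dB; the small floor avoids log(0) on silent frames."""
    return 10.0 * np.log10(np.sum(frames ** 2, axis=1) + 1e-10)


def syllable_segments(energy_db, rel_threshold_db=20.0):
    """Hypothetical stand-in for vowel-onset-based segmentation: return (start, end)
    frame ranges (end exclusive) where energy stays within rel_threshold_db of the peak."""
    thr = energy_db.max() - rel_threshold_db
    above = (energy_db >= thr).astype(int)
    # +1 marks a rise (onset), -1 a fall; zero padding closes runs at the signal edges
    edges = np.diff(np.concatenate(([0], above, [0])))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    return list(zip(starts, ends))


def estimate_f0(frame, fs=FS, fmin=60.0, fmax=400.0, voicing=0.3):
    """Autocorrelation F0 estimate for one frame; returns 0.0 when judged unvoiced."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]  # lags 0..N-1
    if ac[0] <= 0:
        return 0.0
    ac = ac / ac[0]
    lo = int(fs / fmax)
    hi = min(int(fs / fmin), len(ac) - 1)
    lag = lo + int(np.argmax(ac[lo:hi + 1]))
    return fs / lag if ac[lag] >= voicing else 0.0


def prosodic_features(x, fs=FS, frame_ms=30, hop_ms=10):
    """Per syllable-like unit: onset time, approximate duration, mean energy, median F0."""
    frame_len = int(fs * frame_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    frames = frame_signal(x, frame_len, hop)
    energy = log_energy(frames)
    feats = []
    for start, end in syllable_segments(energy):
        f0s = [estimate_f0(f, fs) for f in frames[start:end]]
        voiced = [f for f in f0s if f > 0]
        feats.append({
            "onset_s": float(start * hop / fs),
            "duration_s": float((end - start) * hop / fs),
            "mean_energy_db": float(energy[start:end].mean()),
            "median_f0_hz": float(np.median(voiced)) if voiced else 0.0,
        })
    return feats
```

Usage: passing a signal made of two tones separated by silence (for example 150 Hz and 200 Hz, 0.3 s each) to `prosodic_features` should yield two units whose median F0 values lie close to 150 Hz and 200 Hz. The energy threshold is used here only because it keeps the sketch short; a real vowel onset point detector would typically rely on spectral or excitation-source evidence rather than energy alone.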


Keywords: Automatic Speaker Recognition (ASR); Gaussian Mixture Model (GMM); Deep Neural Networks (DNN); Mel-Frequency Cepstral Coefficients (MFCC); Confidence Measure (CM)

DOI: http://doi.org/10.11591/ijece.v10i4.pp%25p

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.