Energy distribution in formant bands for Arabic vowels

ABSTRACT


INTRODUCTION
The production of a vowel is characterized by a maximal opening of the vocal tract, without constriction, noise production or silence. It is always accompanied by a periodic vibration of the vocal cords, which characterizes voiced sounds [1]. Standard Arabic has six vowels: three short (/a/, /i/, /u/) and three long (/a:/, /i:/, /u:/) [2]. These vowels differ from those of other languages (e.g., English, Spanish) in number and vocalic quantity.
The characterization of vowels can be performed in terms of time, frequency and energy. Kimiko Tsukada (2009) studied the temporal characterization of vowels in a comparative study of long and short vowels in Standard Arabic, Japanese and Thai. The study reported that long vowels last about twice as long as short vowels, and that the ratio between short- and long-vowel durations differs significantly across the three languages [3]. Alghamdi (1998) compared long and short vowels in terms of frequency for several Arabic dialects (Egyptian, Sudanese, Saudi). He found that, for the dialects studied, the F1 and F2 formants of long vowels differ from those of short vowels, and that the vowel triangle formed by the long vowels encloses that of the short ones [2], [4]. Alotaibi and Hussein (2009) analyzed the formants of Standard Arabic vowels. They confirmed that the F1 and F2 values of long vowels are peripheral to those of short vowels, and showed that F1 and F2 help to classify the vowels: /a/ and /a:/ are characterized by a high F1 (F1 > 500 Hz), and /i/ and /i:/ by a high F2 (F2 > 1500 Hz) [5]. Sawusch (1996) investigated the effects of duration on vowel perception in normal American-English speakers and concluded that vowel duration was not a strong perceptual cue to vowel identity, but was used by listeners when other sources of information were distorted [6]. Mohammad Abuoudeh and Olivier Crouzet reported that vowel length systematically influences locus equation data, and that variations in vowel length are associated with modifications of the spectral configuration [7].
In this work, we carry out an acoustic study of Arabic long vowels compared with short ones. The parameters studied are the vowel production time, the formants, and the variation of the energy contained in the F1 and F2 bands. The paper is organized as follows: we first describe the methods, tools and experiments, then present and discuss the results, and finally draw conclusions.

METHOD
2.1. Corpus
We constructed a corpus of Arabic short and long vowels. Five Moroccan speakers (three male and two female) were asked to pronounce CV syllables (C: consonant, V: vowel) containing short and long vowels. We chose isolated syllables rather than words to reduce the influence of neighboring phonemes on the vowel studied; this also allows the vowel to be lengthened freely so that its behavior can be examined. For the consonant C associated with the vowel V, we chose /A/ (ء, hamza) because its production induces minimal stress on the vocal tract. Table 1 shows the syllables of the corpus.

Formant extraction method
The recordings were made in a noise-isolated room with the speech analysis software "Praat", at a sampling frequency of 22050 Hz. Praat was also used to segment each vowel and measure its duration. The first four formants were extracted by linear predictive coding (LPC). Figure 1 shows the preprocessing applied to the speech signal before formant extraction. All coefficients were computed from the pre-emphasized speech signal using 512-point Hamming-windowed frames, from which the linear prediction coefficients were then calculated. The LPC model yields a smoothed spectral envelope whose peaks correspond to the formants.

Formant band energy
The speech signal, sampled at 22050 Hz, is divided into 11.6 ms segments (about 256 samples) with an overlap of 9.6 ms, i.e. a frame shift of 2 ms. Each segment is multiplied by a Hanning window and zero-padded, and a 512-point fast Fourier transform (FFT) is computed. The magnitude spectrum of each frame is smoothed by a 20-point moving average taken along the time index n. From the smoothed spectrum X(n,k), the peaks in two formant bands (250-850 Hz and 750-2300 Hz) are selected as
$$E_b(n) = \max_{k_b^{l} \le k \le k_b^{u}} X(n,k),$$
where the formant index b denotes the first or second formant (F1 or F2), and the frequency index k ranges between the DFT indices $k_b^{l}$ and $k_b^{u}$ representing the lower and upper boundaries of each formant band. Then, for each frame, the normalized band energy is calculated as
$$E_{bn}(n) = \frac{E_b(n)}{E_T(n)},$$
where $E_{bn}(n)$ is the normalized energy of formant band b in frame n, $E_T(n)$ is the overall energy in frame n, and $E_b(n)$ is the energy of formant band b in frame n.
Figure 2 shows spectrograms of short and long vowels. Both short and long vowels remain voiced even as the production duration of long vowels increases. To minimize the effect of intra-speaker variability, we calculated the average of each formant for each speaker and then averaged these values across the five speakers. The same procedure was used to calculate the energy variation in the F1 and F2 bands. Figure 3 shows that long vowels last about twice as long as short ones, which is consistent with the results of Alghamdi [2], Tsukada [3] and Alotaibi [5].
Figure 4 summarizes the variation of the energy contained in the F1 band as the production duration of the three vowels increases. For /a/ and /i/, the F1 band energy increases with the duration of vowel production. The energy in this band is also lower for /a/ and higher for /i/, which can be explained by the fact that the place of articulation of /i/ is at the back of the vocal tract (near the vocal folds). For /u/, the behavior is the opposite: the F1 band energy decreases as the production duration increases, owing to the limited space between the back of the tongue and the palate when /u/ is sustained longer.
Figure 5 shows the variation of energy in the F2 band. The production duration of /i/ has no effect on this energy: no difference is observed between the short and the long vowel. For /a/ and /u/, the F2 band energy drops rapidly when the long vowels /a:/ and /u:/ are produced, and then remains constant as their duration increases further. The energy in this band is also higher for /a/, owing to its place of articulation at the back of the tongue: the F2 band energy depends on the area between the teeth and the place of articulation, and the expansion of this region during the production of /a/ leads to higher energy in the F2 band.
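The band-energy computation described at the start of this section, together with the two-stage averaging (per speaker, then across speakers), can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions: it takes E_b(n) as the largest smoothed magnitude inside band b and, because the text does not say how the overall frame energy is computed, takes E_T(n) as the sum of the smoothed magnitudes over all FFT bins. Both choices and all function names are hypothetical. The moving average zero-pads at the signal edges.

```python
import numpy as np

FS = 22050                       # sampling rate used in the study (Hz)
FRAME = round(0.0116 * FS)       # 11.6 ms segment -> 256 samples
HOP = round(0.0020 * FS)         # 11.6 ms - 9.6 ms overlap = 2 ms shift -> 44 samples
NFFT = 512                       # zero-padded FFT length
SMOOTH = 20                      # moving-average length along the time index n
BANDS = {"F1": (250.0, 850.0), "F2": (750.0, 2300.0)}  # formant bands (Hz)


def smoothed_spectrogram(x, fs=FS, frame=FRAME, hop=HOP, nfft=NFFT, smooth=SMOOTH):
    """Return X(n, k): Hann-windowed, zero-padded FFT magnitudes smoothed over time."""
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - frame) // hop
    if n_frames < smooth:
        raise ValueError("signal too short for the moving average")
    win = np.hanning(frame)
    frames = np.stack([x[i * hop:i * hop + frame] * win for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, n=nfft, axis=1))  # rfft zero-pads each frame to nfft
    kernel = np.ones(smooth) / smooth
    X = np.empty_like(mag)
    for k in range(mag.shape[1]):                      # smooth each bin k along n
        X[:, k] = np.convolve(mag[:, k], kernel, mode="same")
    return X


def normalized_band_energy(X, fs=FS, nfft=NFFT, bands=BANDS):
    """E_bn(n) = E_b(n) / E_T(n) for each band b (E_T definition is an assumption)."""
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)          # centre frequency of each bin k
    total = np.maximum(X.sum(axis=1), np.finfo(float).tiny)  # assumed E_T(n)
    out = {}
    for name, (lo, hi) in bands.items():
        in_band = (freqs >= lo) & (freqs <= hi)        # bins between k_b^l and k_b^u
        out[name] = X[:, in_band].max(axis=1) / total  # E_b(n): peak inside the band
    return out


def speaker_then_grand_mean(values_by_speaker):
    """Average per speaker first, then across speakers (hypothetical helper)."""
    per_speaker = [float(np.mean(v)) for v in values_by_speaker.values()]
    return float(np.mean(per_speaker))
```

For a segmented vowel, `e = normalized_band_energy(smoothed_spectrogram(samples))` gives one value per frame in `e["F1"]` and `e["F2"]`; averaging these per speaker and then across speakers with `speaker_then_grand_mean` mirrors the procedure used to reduce intra-speaker variability.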

CONCLUSION
This study compares long and short Arabic vowels in terms of production duration and energy distribution in the F1 and F2 bands. The results show that long vowels remain voiced even when their production duration increases. In terms of production duration, long vowels are about twice as long as short vowels. For each vowel (/u/, /a/ or /i/), the energies contained in the F1 and F2 bands vary when long vowels are produced. As the production duration of long vowels increases, the F2 band energy remains constant for all vowels, while the F1 band energy increases or decreases depending on the vowel produced.