Accuracy Analysis of Latin-to-Balinese Script Transliteration Method

ABSTRACT


INTRODUCTION
Balinese script writing, part of Bali's cultural heritage, is in danger of extinction. As reported by the Indonesia News Agency [1], the Governor of Bali, Made Mangku Pastika, said that although Balinese language education has been included in the school curriculum, fewer and fewer people are fluent in Balinese. For convenience, daily communication in Bali tends to use the national language, Indonesian, rather than the local language. The long-standing decline in the number of users and speakers of Balinese has raised concern that the language is threatened with extinction.
The decreasing use of the Balinese language strongly affects the use of its script, Balinese script writing, which is the concern of this research. Stern [2] said that "As with endangered species, the goal with languages should not be to wait until there are only a few remaining survivors and then place them under protection (i.e. make recordings of the last speakers). Rather, we should take the more sustainable path of preserving the diverse natural habitats where minority languages are spoken. This means taking a cultural, political and even economic approach to saving languages ... and starting as soon as possible." To preserve the Balinese language, and more specifically Balinese script writing, this research takes another approach: a technological one. It focuses on the accuracy analysis of a Latin-to-Balinese script transliteration method on a mobile application, since such an application makes transliteration learning readily available on a mobile device. Transliteration itself is the conversion of a text from one script to another [3]. The analysis uses an existing method from the few studies in this area, namely the mobile application Transliterasi Aksara Bali (Balinese Script Transliteration) [4]. The analysis was based on the Balinese script writing rules and examples by Sudewa [5], who served on the script committee related to the proposal by Everson and Suatjana [6] for encoding the Balinese script in ISO [7]. There are few references in this research area. Sudana et al. [8] worked on the same object, Balinese script, but their work is an Augmented-Reality-based learning media application focused on learning how to write Balinese script. In the Latin-to-Balinese script transliteration area, Sartini et al. [9] developed a text-to-digital-image converter method, whose output Balinese script is represented by pre-collected images captured from the display of the Bali Simbar font [10] in a word processor. Arimbawa et al. [11] developed a Latin-to-Balinese script transliteration method whose output Balinese script pattern was configured to be written by a robotic system.
The complex behaviours of Balinese script demand complex rendering, including: 1) reordering and splitting; 2) varying placement and shape of diacritics; 3) contextual shaping; and 4) complex ligature construction. Table 2 no. 1-3 shows the complex rendering of several Balinese syllables: 1) ba (U+1B29 Balinese letter ba); 2) be, from ba + pangangge suara e (U+1B3E Balinese vowel sign taling). Taling is placed to the left of the syllable, so it appears as if it were written first and followed by ba [14]; in fact, it comes after ba in the writing (logical) order. This Balinese complex behaviour is called reordering; and 3) bo, from ba + pangangge suara o (U+1B40 Balinese vowel sign taling tedung). The separated taling and tedung are written before and after the syllable, respectively. This Balinese script complex behaviour is called reordering and character splitting; taling tedung is also an example of a character that has several separated glyphs. Table 2 no. 4-7 shows the varying placement and shape of diacritics: 4) di, from da (U+1B24 Balinese letter da) + pangangge suara i (U+1B36 Balinese vowel sign ulu); 5) ding, from da + i + pangangge tengenan ng (U+1B02 Balinese sign cecek). Ulu in di is centred above da, while ulu in ding is slightly shifted by cecek; 6) dě, from da + pangangge suara ě (U+1B42 Balinese vowel sign pepet); and 7) děr, from da + ě + pangangge tengenan r (U+1B03 Balinese sign surang). Pepet in dě is centred above da, while pepet in děr is not only slightly shifted by surang but also made smaller, so that the combined width of pepet and surang equals the width of da below them. Table 2 no. 8-10 shows several glyph forms of the gantungan of the Balinese syllable ra (U+1B2D Balinese letter ra), also known as cakra or guwung: 8) kra, from ka (U+1B13 Balinese letter ka) + gantungan ra; 9) skra, from sa (U+1B32 Balinese letter sa) + ka + gantungan ra; and 10) krya, from ka + gantungan rya, which combines gantungan ra (the third cakra) and gantungan ya (see Table 2 no. 12). The cakra glyph in kra (the first cakra) is narrower than the cakra glyph in skra (the second cakra). Besides, the end of the first cakra is written below ka, while the end of the second cakra is written beside ka. This Balinese script complex behaviour shows some
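The reordering and splitting described above happen only at display time; the stored text keeps the writing order. The following minimal sketch, using only Python's standard `unicodedata` module, lists the code points of be and bo in stored order. The constant and function names are our own, and the visual reordering itself is performed by a font's shaping engine, which is not modelled here.

```python
import unicodedata

# Code points named in the text (Unicode Balinese block).
BA = "\u1B29"             # BALINESE LETTER BA
TALING = "\u1B3E"         # BALINESE VOWEL SIGN TALING (pangangge suara e)
TEDUNG = "\u1B35"         # BALINESE VOWEL SIGN TEDUNG
TALING_TEDUNG = "\u1B40"  # BALINESE VOWEL SIGN TALING TEDUNG (pangangge suara o)


def code_points(text: str) -> list:
    """Describe each character of `text` in stored (logical) order."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


# Stored order is consonant first, vowel sign second, even though the
# shaping engine draws taling to the LEFT of ba when rendering "be".
be = BA + TALING
bo = BA + TALING_TEDUNG

for label, syllable in (("be", be), ("bo", bo)):
    print(label, code_points(syllable))

# Taling tedung is a single code point, but its canonical decomposition
# (NFD) is taling + tedung, matching the two glyphs drawn around ba.
print("bo (NFD)", code_points(unicodedata.normalize("NFD", bo)))
```

The NFD output shows why bo is described as both reordered and split: one stored vowel sign corresponds to two glyphs placed on either side of the consonant.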

RESEARCH METHOD
The accuracy analysis of the Latin-to-Balinese script transliteration method was based on the Balinese script writing rules and examples (cases) by Sudewa [5] (Table 3), some of which refer to [14] and [15]. The analysis was conducted on the mobile application Transliterasi Aksara Bali [4].
Some rules cannot be tested independently without examples, such as the appended forms of the eighteen basic syllables (cases 1-18), because the provided examples are limited (cases 19-25). Table 4 shows the given sentence for the sixteenth case type in Table 3, word boundaries and line break rules, together with its transliteration.
Table 4. Provided sentence and its transliteration

RESULTS AND ANALYSIS
The accuracy analysis experiment on the existing method Transliterasi Aksara Bali version 0.0.2 [4] was conducted on a Windows 7 64-bit operating system, on an Intel Core M-5Y71 CPU @ 1.20 GHz platform with 8 GB RAM. Hereafter, this method is referred to by its abbreviation, TAB. Since TAB runs on the Android mobile platform, testing used the Android Emulator with the Nexus 5 platform and Android 7.1.1 (Nougat) 32-bit. Table 5 shows the accuracy results, where column Case refers to a rule or example of Table 3, and column Result shows whether the result is correct (check mark) or incorrect (cross). Figure 1 (left) shows sentence transliteration differences by TAB, while Figure 1 (right) shows the sentence modified, using spellings that are uncommon in writing, to obtain a more precise result. Figure 1 also shows the characters ě and ţ replaced by e and t, respectively, because there is no way to enter these characters on the mobile virtual keyboard. Note that TAB provides an additional key for the character ś. Table 5 shows that TAB passed 103 of 151 cases (68.2%). The following subsections discuss sixteen analyses for accuracy improvement, one for each of the sixteen case types in Table 3.
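The reported accuracy is the share of passed cases in Table 5. A minimal sketch, assuming the Result column is available as a mapping from case ID to a pass/fail flag; the case IDs generated below are placeholders that only reproduce the overall 103-of-151 count, not the actual Table 5 entries.

```python
def accuracy_percent(results: dict) -> float:
    """Share of cases marked correct, in percent, rounded to one decimal."""
    if not results:
        raise ValueError("at least one case is required")
    passed = sum(1 for ok in results.values() if ok)
    return round(100 * passed / len(results), 1)


# Placeholder case IDs: 103 passing and 48 failing cases.
results = {f"case-{i}": i < 103 for i in range(151)}
print(accuracy_percent(results))  # 68.2
```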
Case 6.1 shows the dual transliteration of syllable a as a counterpart of syllable ha. TAB transliterated syllable a in the same way as the independent vowel a at case 6.35 (see next Section 4.3). However, syllable a can be transliterated either as syllable ha or as the independent vowel a, depending on the word. If the word is a special word, for example Akśara (letter) at case 6.45 (see next Section 4.3), syllable a is transliterated as the independent vowel a. Otherwise, it is transliterated as syllable ha, for example in the word Angklung (a musical instrument) at case 6.31 (see next Section 4.2). TAB's transliteration algorithm can be improved to handle those special words by searching a dictionary data structure, which gives an average time complexity of O(1) regardless of the number of words stored [16]. If a special word is found, it is simply transliterated using the independent vowel. As a note, the authors previously used a dictionary data structure for a biometric data discriminator in [17]-[20]. Unfortunately, no research has yet established the precise list of those special words, which affect the accuracy of transliteration systems in general.
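The dictionary-based improvement can be sketched in Python with a built-in `frozenset` playing the role of the dictionary data structure. The word list and the function name `word_initial_a` are hypothetical: only Akśara is named as a special word, and a complete list would still have to be researched.

```python
import unicodedata

AKARA = "\u1B05"  # BALINESE LETTER AKARA (independent vowel a)
HA = "\u1B33"     # BALINESE LETTER HA

# Hypothetical word list: Akśara is the only special word named in the text.
SPECIAL_A_WORDS = frozenset({"ak\u015Bara"})  # "akśara"


def word_initial_a(word: str) -> str:
    """Choose the Balinese letter for a word-initial syllable a."""
    key = unicodedata.normalize("NFC", word).lower()
    # frozenset membership is a hash lookup: O(1) on average.
    if key in SPECIAL_A_WORDS:
        return AKARA  # special word: independent vowel a
    return HA         # default: written like syllable ha


print(word_initial_a("Ak\u015Bara") == AKARA)  # True
print(word_initial_a("Angklung") == HA)        # True
```

Normalizing to NFC before the lookup is a design choice of this sketch, so that a word typed with a combining accent still matches the stored key.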
On case 6.26, the word Kādep uses the vowel ā, written with the tedung ligature (see next Section 4.14). In addition, its vowel e should be written as ě (giving Kāděp), since the vowel sign pepet was used in the transliteration result (like the vowel ě in the word Jěro at case 6.27). As mentioned previously, the vowel ě cannot be entered on the mobile application. TAB transliterated the vowel ě incorrectly, but it provides the replacement vowel é. The vowel e cannot serve as the replacement, since it already represents another sound, like the vowel e in the word Sela (yam) at case 6.30. The word Kāděp and its variations (Kādep, Kaděp, and Kadep) have one meaning and should have the same transliteration. They represent another kind of special word whose precise list has not yet been researched, and the transliteration algorithm can be improved for them in the same way as for the special words in previous Section 4.1. On case 6.31, TAB failed to transliterate the vowel A of the word Angklung because it used an independent vowel incorrectly. Case 6.31 is the same as syllable a at case 6.1 (see Section 4.1): the word Angklung needs to be modified to Hangklung, which is uncommon in writing but correct for transliteration. On case 6.32, TAB failed to transliterate the vowel sign taling repa (ai) of the word Daitya, but it provides the replacement vowel ê, which is also used for the independent vowel airsania (ai) and the word Airlangga (a Javanese king) at cases 6.42 and 6.49, respectively (see next Section 4.3). This relates to the diphthong ai being pronounced as the long vowel ê. Not all words containing the vowel ai should be transliterated this way. The word Daitya and its variation (Dêtya) have one meaning and should have the same transliteration. They represent another kind of special word whose precise list has not yet been researched, and the algorithm improvement for them is the same as for the special words in previous Section 4.1.
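Mapping every spelling variant to a single form before transliteration is one way to guarantee that variants get the same transliteration. A minimal sketch, assuming a hand-built variant table; the entries are taken only from the word pairs discussed above, and the choice of target spelling (the form identified as correct, or the spelling with TAB's replacement vowel ê) is our assumption. `canonical_spelling` is a hypothetical name.

```python
import unicodedata

# Hypothetical variant table built only from word pairs discussed above.
CANONICAL = {
    "kādep": "kāděp",
    "kaděp": "kāděp",
    "kadep": "kāděp",
    "angklung": "hangklung",
    "daitya": "dêtya",  # spelling with TAB's replacement vowel ê
}


def canonical_spelling(word: str) -> str:
    """Map a spelling variant to one form before transliteration."""
    nfc = unicodedata.normalize("NFC", word)
    target = CANONICAL.get(nfc.lower())
    if target is None:
        return nfc  # not a known variant: leave unchanged
    # Keep the capitalization of the first letter of the input.
    return target[0].upper() + target[1:] if nfc[0].isupper() else target


for w in ("Kadep", "Angklung", "Daitya", "Sela"):
    print(w, "->", canonical_spelling(w))
```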
On cases 6.42 and 6.49, the independent vowel airsania (ai) and the word Airlangga were transliterated with the same mechanism as the word Daitya at case 6.32 (see previous Section 4.2, which covers the analysis). On case 6.42, the vowel ê was used for the independently written vowel airsania (ai) to obtain the correct transliteration. Case 6.44 is basically the same as case 6.42, but for the independently written vowel au, which should use the vowel ô (since the diphthong au is pronounced as the long vowel ô) for correct transliteration.

Syllable-vowel sign combination
An illegal syllable-vowel sign combination occurs when the syllable ra or la takes the vowel ě or ö. In that case, ra and la must use their regular forms, ra repa and la lenga, respectively (cases 6.52-6.57). TAB transliterated all of these cases incorrectly, namely rě, lě, lö, and the words Talěr (also) and Kěrěng (frequently) at cases 6.52, 6.53, 6.54, 6.55, 6.56 (also in Table 4), and 6.57, respectively. The main cause is the inability to enter the vowel ě on the mobile application, as with the word Kāděp (sold) at case 6.26 (see previous Section 4.2). However, TAB provides the replacement vowel é, and the following analysis is based on it. On case 6.56, lé in the word Talér was transliterated incorrectly with the illegal combination of syllable la and the vowel sign pepet (which kills the inherent sound a of la and replaces it with the sound é), instead of the regular form la lenga. The word Talěr and its variation (Taler) have one meaning and should have the same transliteration. They represent another kind of special word whose precise list has not yet been researched, and the transliteration algorithm can be improved for them in the same way as for the special words in previous Section 4.1. Case 6.57 is basically the same as case 6.56, but for a different word, with the syllable ré.
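The rule can be sketched as a post-processing pass over Unicode Balinese output that replaces the illegal sequence with the regular form. This assumes that a stand-alone ra/la + ě (or ö) is encoded with the independent letters RA REPA and LA LENGA (or their TEDUNG forms); that mapping, and the name `fix_illegal`, are our assumptions rather than TAB's documented behaviour. A ra/la written as a gantungan (after adeg-adeg) is left alone.

```python
import re

RA, LA = "\u1B2D", "\u1B2E"              # BALINESE LETTER RA / LA
PEPET, PEPET_TEDUNG = "\u1B42", "\u1B43"  # vowel signs for ě / ö
ADEG_ADEG = "\u1B44"                      # BALINESE ADEG ADEG

# Assumed regular forms for syllable ra/la + ě/ö.
REGULAR_FORM = {
    RA + PEPET: "\u1B0B",         # BALINESE LETTER RA REPA
    RA + PEPET_TEDUNG: "\u1B0C",  # BALINESE LETTER RA REPA TEDUNG
    LA + PEPET: "\u1B0D",         # BALINESE LETTER LA LENGA
    LA + PEPET_TEDUNG: "\u1B0E",  # BALINESE LETTER LA LENGA TEDUNG
}

# ra/la + pepet (tedung), unless ra/la is subjoined after adeg-adeg.
ILLEGAL = re.compile(f"(?<!{ADEG_ADEG})[{RA}{LA}][{PEPET}{PEPET_TEDUNG}]")


def fix_illegal(text: str) -> str:
    """Replace illegal ra/la + pepet sequences with the regular forms."""
    return ILLEGAL.sub(lambda m: REGULAR_FORM[m.group(0)], text)


# Talěr as produced with the illegal combination: ta + la + pepet + surang.
print([f"U+{ord(c):04X}" for c in fix_illegal("\u1B22\u1B2E\u1B42\u1B03")])
```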
On case 6.62, the word Briag should be transliterated by stacking cakra and nania together. Writing ya instead of ia (Bryag) gave the correct transliteration. The word Briag and its variation (Bryag) have one meaning and should have the same transliteration. They represent another kind of special word whose precise list has not yet been researched, and the transliteration algorithm can be improved for them in the same way as for the special words in previous Section 4.1.
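In Unicode Balinese text, a gantungan is formed by placing adeg-adeg (U+1B44) before a consonant letter, so stacking cakra and nania under ba amounts to two adeg-adeg + letter pairs. A minimal sketch of that encoding for Bryag; the letter table is a hypothetical subset, and whether the stack is drawn correctly depends on the font.

```python
ADEG_ADEG = "\u1B44"  # BALINESE ADEG ADEG
# Hypothetical subset of consonant letters needed for the example.
LETTER = {"b": "\u1B29", "r": "\u1B2D", "y": "\u1B2C", "g": "\u1B15"}


def stack(consonants: str) -> str:
    """First letter in full form, each following consonant as a gantungan."""
    return ADEG_ADEG.join(LETTER[c] for c in consonants)


def final(consonant: str) -> str:
    """A syllable-final consonant killed with adeg-adeg."""
    return LETTER[consonant] + ADEG_ADEG


# Bryag = brya (ba + cakra + nania, inherent a) + final g
bryag = stack("bry") + final("g")
print([f"U+{ord(ch):04X}" for ch in bryag])
```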
On cases 6.63 and 6.66-6.68, TAB provides replacement syllables, namely ṇa, ṭa (syllable ta tawa), śa (syllable sa saga), and ṣa (syllable sa sapa), respectively. Using these replacements solves all incorrectly transliterated words related to akśara şwalalita. Gaņitri and its variation (Ganitri), Jaţayu and its variation (Jatayu), Bhiśama and its variation (Bhisama), and Şiwa and its variation (Siwa) each have one meaning and should have the same transliteration. They represent another kind of special word whose precise list has not yet been researched, and the transliteration algorithm can be improved for them in the same way as for the special words in previous Section 4.1.
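A minimal sketch of the replacement keys as a lookup from a Latin consonant to a Balinese letter. The śa, ṣa, and ṭa pairings follow the text; pairing ṇa with NA RAMBAT is our assumption, since the text does not name that letter. Normalizing the key to NFC first lets a key typed as a base letter plus combining mark match the precomposed entry.

```python
import unicodedata

# Replacement consonants -> Balinese letters (keys are precomposed, NFC).
REPLACEMENT_LETTER = {
    "\u1E47": "\u1B21",  # ṇ -> BALINESE LETTER NA RAMBAT (assumed)
    "\u1E6D": "\u1B23",  # ṭ -> BALINESE LETTER TA TAWA, as paired in the text
    "\u015B": "\u1B30",  # ś -> BALINESE LETTER SA SAGA
    "\u1E63": "\u1B31",  # ṣ -> BALINESE LETTER SA SAPA
}


def replacement_letter(key: str) -> str:
    """Return the Balinese letter for a replacement consonant key."""
    return REPLACEMENT_LETTER[unicodedata.normalize("NFC", key)]


# "s" + combining acute and "s" + combining dot below, as a keyboard may send them.
for key in ("s\u0301", "s\u0323"):
    print(unicodedata.name(replacement_letter(key)))
```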

Sound killers
Four sound killers (pangangge tengenan), namely cecek, surang, bisah, and adeg-adeg, are used to end the sound of a syllable and represent the consonants ng, r, h, and all other consonants, respectively. Adeg-adeg is the default sound killer: it is appended after the consonant letter for every final consonant other than ng, r, and h (that is, other than nga, ra, and ha). TAB transliterated all provided examples (cases 6.79-6.83) correctly, except the word Karņa (ear) at case 6.82.
On case 6.82, no surang was used for the consonant r, as also happened with the word Airlangga (a Javanese king) at case 6.49 (see previous Section 4.3), Talěr (also) at case 6.56 (see previous Section 4.4) and in Table 4, Partha (a man's name) at case 6.73 (see previous Section 4.5), and dīrgha (long vowels) in Table 4. TAB is inconsistent: it used surang for the consonant r only at the end of the word Talěr, although surang can appear anywhere in a word.
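The choice of sound killer can be sketched as a small lookup that applies wherever a consonant closes a syllable, not only at the end of a word. The letter table is a hypothetical subset, and writing ṇa with NA RAMBAT is our assumption.

```python
CECEK = "\u1B02"      # BALINESE SIGN CECEK  -> final ng
SURANG = "\u1B03"     # BALINESE SIGN SURANG -> final r
BISAH = "\u1B04"      # BALINESE SIGN BISAH  -> final h
ADEG_ADEG = "\u1B44"  # BALINESE ADEG ADEG   -> any other final consonant

# Hypothetical subset of consonant letters, enough for the examples below.
LETTER = {"k": "\u1B13", "n": "\u1B26", "\u1E47": "\u1B21", "s": "\u1B32"}

SIGN_FOR_FINAL = {"ng": CECEK, "r": SURANG, "h": BISAH}


def coda(consonant: str) -> str:
    """Signs that end a syllable with `consonant`, anywhere in a word."""
    if consonant in SIGN_FOR_FINAL:
        return SIGN_FOR_FINAL[consonant]
    return LETTER[consonant] + ADEG_ADEG  # default sound killer


# Karṇa: ka + final r (surang, word-internal) + ṇa (NA RAMBAT, assumed).
karna = LETTER["k"] + coda("r") + LETTER["\u1E47"]
print([f"U+{ord(ch):04X}" for ch in karna])
```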

Miscellaneous signs
Two miscellaneous signs, ulu candra and ulu ricem, are kinds of sound killers (see previous Section 4.7) used to write Sanskrit words. They end the sound of a syllable and represent the consonants ng and m, respectively (their counterparts in previous Section 4.7 are cecek and adeg-adeg, respectively). The appended forms of these signs cannot be tested independently, as described in the Research Method Section. TAB transliterated all provided examples incorrectly, namely the words Mang (a holy letter) and Siddham (perfect) at cases 6.84 and 6.85, respectively.
On case 6.84, TAB provides the replacement ṁ for the consonant cluster ng of the word Mang. The word Mang represents another kind of special word whose precise list has not yet been researched, and the transliteration algorithm can be improved for it in the same way as for the special words in previous Section 4.1.
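Because ulu candra and ulu ricem replace cecek and adeg-adeg only in Sanskrit words, the choice again depends on a special-word lookup. A minimal sketch, assuming a hypothetical set that contains only the two examples named in the text; `nasal_coda` is a hypothetical name.

```python
MA = "\u1B2B"          # BALINESE LETTER MA
ULU_RICEM = "\u1B00"   # BALINESE SIGN ULU RICEM  -> final m in Sanskrit words
ULU_CANDRA = "\u1B01"  # BALINESE SIGN ULU CANDRA -> final ng in Sanskrit words
CECEK = "\u1B02"       # BALINESE SIGN CECEK      -> final ng elsewhere
ADEG_ADEG = "\u1B44"   # BALINESE ADEG ADEG

# Hypothetical list; Mang and Siddham are the only examples in the text.
SANSKRIT_WORDS = frozenset({"mang", "siddham"})


def nasal_coda(word: str, nasal: str) -> str:
    """Sound killer for a final ng or m, depending on the word."""
    sanskrit = word.lower() in SANSKRIT_WORDS
    if nasal == "ng":
        return ULU_CANDRA if sanskrit else CECEK
    if nasal == "m":
        return ULU_RICEM if sanskrit else MA + ADEG_ADEG
    raise ValueError(f"not a nasal handled here: {nasal!r}")


mang = MA + nasal_coda("Mang", "ng")
print([f"U+{ord(ch):04X}" for ch in mang])
```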

Holy symbol
When the independent vowel au kara (see previous Section 4.3) meets the sound killer ulu candra (see previous Section 4.8), the Romanization is Om. Om is known as Ongkara, a holy symbol (akśara modre). It is used almost everywhere in texts, as it is the symbol of God Himself. TAB transliterated all provided examples (cases 6.86-6.87) incorrectly, namely the phrases Om Swastiastu (May God bless you) and Om Şanti, Şanti, Şanti, Om (May peace be everywhere) at cases 6.86 and 6.87, respectively.

Miscellaneous syllables
Two miscellaneous syllables borrowed from Javanese occur only very rarely in Balinese script (cases 6.88-6.89). Their appended forms cannot be tested independently, as described previously. On case 6.88, TAB transliterated the syllable cha incorrectly. Syllable cha has no regular form; it is always paired with the normal form of syllable ca. On case 6.89, TAB transliterated the syllable kha correctly.

Punctuation
TAB transliterated all punctuation marks correctly (cases 6.100-6.107). The comma, period, less-than, period-0-period, greater-than, double greater-than, and colon signs were correctly transliterated into carik, carik pareren, panten, pasalinan, pamada, carik agung, and carik pamungkah, respectively. Double quotes (case 6.107) keep the same sign ("). Panten is used at the beginning of a letter, a story, or a verse, while pasalinan is used at its end. Pamada is used at the beginning of a religious text, while carik agung is used at its end.
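The input conventions listed above can be sketched as a token-based replacement. The pairing of the Latin inputs with Unicode code points follows our reading of the Unicode Balinese block (we assume panten is the sign Unicode names PANTI); pasalinan (".0.") and carik agung (">>") are matched first so they are not split, but they are left unchanged because this sketch does not assume an encoding for them. Double quotes are untouched, as in the text.

```python
import re

# Latin input -> Balinese punctuation (Unicode names in comments).
SIGN = {
    ",": "\u1B5E",  # carik (BALINESE CARIK SIKI)
    ".": "\u1B5F",  # carik pareren (BALINESE CARIK PAREREN)
    "<": "\u1B5A",  # panten (BALINESE PANTI, assumed to be the same sign)
    ">": "\u1B5B",  # pamada (BALINESE PAMADA)
    ":": "\u1B5D",  # carik pamungkah (BALINESE CARIK PAMUNGKAH)
}
# Multi-character inputs come first in the alternation so they win.
TOKEN = re.compile(r">>|\.0\.|[,.<>:]")


def convert_punctuation(text: str) -> str:
    """Replace single-sign punctuation; keep unmapped tokens as typed."""
    return TOKEN.sub(lambda m: SIGN.get(m.group(0), m.group(0)), text)


print([f"U+{ord(ch):04X}" for ch in convert_punctuation("<,.:>")])
```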

Some variations of usage
The variations of usage include: 1) incorrect combinations of the independent vowel a kara (see previous Section 4.3) with vowel signs (see previous Section 4.2); 2) the special use of syllable pa kapal (see previous Section 4.6), which is never attached to suku or suku ilut (see previous Section 4.2); 3) the Romanization of the inherent sound; and 4) the use of pangangge akśara (see previous Section 4.1).
In the third variation of usage, a stand-alone syllable has an inherent sound that is always Romanized as a, although Balinese speakers commonly pronounce a word-final a as ě. On case 6.116, TAB transliterated the word sekalě (real) incorrectly, because its transliteration differs from that of sekala. Sekala and its variations (Sekalě, Sěkala, and Sěkalě) have one meaning and should have the same transliteration. In this case, they represent another kind of special word whose precise list has not yet been researched, and the transliteration algorithm can be improved for them in the same way as for the special words in previous Section 4.1.
Int J Elec & Comp Eng ISSN: 2088-8708
In the fourth variation of usage, TAB transliterated several provided examples of cases 6.117-6.124 incorrectly, namely the words Sukśma (thank you), Kśatria (warrior), Smerti (books of Veda), Utama (primary), and Dharma (religion) at cases 6.118, 6.119, 6.121, 6.122, and 6.123, respectively. On case 6.119, TAB transliterated the consonant ś of the word Kśatria incorrectly, as with the word Sukśma at case 6.118. In addition, the vowel cluster ia needs to be written iya for correct transliteration. Kśatria and its variations (Ksatria, Kśatriya, and Ksatriya) have one meaning and should have the same transliteration. In this case, they represent another kind of special word whose precise list has not yet been researched, and the transliteration algorithm can be improved for them in the same way as for the special words in previous Section 4.1. On case 6.122, TAB transliterated the word Utama incorrectly; it needs to be written as Utthama for correct transliteration. Utama and its variation (Utthama) have one meaning and should have the same transliteration. In this case, they represent another kind of special word whose precise list has not yet been researched, and the transliteration algorithm can be improved for them in the same way as for the special words in previous Section 4.1.

Ligatures
A ligature, in which two glyphs are drawn with one pen stroke, is a desirable but not mandatory form. Tedung forms ligatures with certain syllables. TAB transliterated the tedung ligatures in cases 6.125-6.146 correctly.

Abbreviations
Three different schemes can be used for abbreviations in Balinese: 1) the scheme endorsed by the government to abbreviate government institutions, which follows the way the abbreviation is pronounced in Indonesian; 2) the scheme used by Simpen [15], which uses the first syllable with all the vowel signs attached to it, or, if the word begins with an independent vowel, the independent vowel itself; and 3) a less commonly used but shortest scheme, which uses only the bare syllable (without vowel signs) or the independent vowel.
On cases 6.147-6.150, TAB transliterated incorrectly the phrase Bank Pembangunan Daerah Bali (Development Bank of Bali Province) and all three of its abbreviation schemes. The Latin abbreviation of the phrase is BPD Bali, and its three abbreviation schemes give Be Pe De Bali, Ba Pe Da Bali, and Ba Pa Da Bali at cases 6.148, 6.149, and 6.150, respectively. On case 6.147, the words Bank and Pembangunan need to be written as Bang and Pěmbangunan, respectively, for correct transliteration. Bang is used because the foreign cluster nk is pronounced the same as ng. The word Bank represents another kind of special word whose precise list has not yet been researched, and the transliteration algorithm can be improved for it in the same way as for the special words in previous Section 4.1.
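The three schemes can be sketched on the Latin side as follows, reproducing the three abbreviations of Bank Pembangunan Daerah. The Indonesian letter-name table is a hypothetical partial table covering only this example, the vowel set is an assumption, and `abbreviate` is a hypothetical name; the result would still have to be transliterated.

```python
import re

# Partial table of Indonesian letter names, only what the example needs.
INDONESIAN_LETTER_NAME = {"B": "Be", "D": "De", "P": "Pe"}
VOWELS = "aeiouāīūěéêôö"
# Leading consonants (possibly none) followed by the first vowel.
FIRST_SYLLABLE = re.compile(rf"([^{VOWELS}]*)([{VOWELS}])", re.IGNORECASE)


def abbreviate(words: list, scheme: int) -> str:
    parts = []
    for word in words:
        m = FIRST_SYLLABLE.match(word)
        if m is None:
            raise ValueError(f"no vowel in {word!r}")
        onset, vowel = m.groups()
        if scheme == 1:    # pronounced as in Indonesian
            parts.append(INDONESIAN_LETTER_NAME[word[0].upper()])
        elif scheme == 2:  # first syllable with its vowel (Simpen)
            parts.append(onset + vowel)
        elif scheme == 3:  # bare syllable with inherent a, or the vowel
            parts.append(onset + "a" if onset else vowel)
        else:
            raise ValueError("scheme must be 1, 2, or 3")
    return " ".join(parts)


words = ["Bank", "Pembangunan", "Daerah"]
for scheme in (1, 2, 3):
    print(abbreviate(words, scheme), "Bali")
```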

Word boundaries and line break rules
There are no spaces between words in Balinese script. In the old days of writing on dried palm leaves, space was scarce, and it was common practice to break a sentence anywhere. Modern writing applies several rules: 1) no line break is allowed between a syllable and any of its signs; and 2) no line break is allowed just before a colon, comma, or full stop. On case 6.151, TAB transliterated several words of the sentence incorrectly, namely Akeh (many), akśara (alphabet), luir (i.e.), ipun (a pronoun referring to the previous word akśara), suara (vowel), wianjana (consonant), talěr (also), dados (become), madrěwe (have), kawāśţanin (called), hrěswa (short vowel), and dīrgha (long vowel).
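The two modern rules can be sketched as a function that lists the positions where a line may break in a Unicode Balinese string. This is a simplified model under our own assumptions (base letters limited to U+1B05..U+1B33; digits, Sasak letters, and zero-width joiners ignored), not a full implementation of Unicode line breaking.

```python
ADEG_ADEG = "\u1B44"  # BALINESE ADEG ADEG


def is_base_letter(ch: str) -> bool:
    """Independent vowels and consonants, U+1B05..U+1B33."""
    return "\u1B05" <= ch <= "\u1B33"


def break_opportunities(text: str) -> list:
    """Indices where a line may break inside a Balinese string.

    Rule 1: a break may only come before a base letter that starts a new
    syllable, so vowel signs, sound killers and gantungan (adeg-adeg +
    letter) stay with their syllable.
    Rule 2: punctuation is never a base letter, so no break is offered
    just before carik siki, carik pareren or carik pamungkah.
    """
    return [
        i
        for i in range(1, len(text))
        if is_base_letter(text[i]) and text[i - 1] != ADEG_ADEG
    ]


# be + di: the only break is between the two syllables.
print(break_opportunities("\u1B29\u1B3E\u1B24\u1B36"))  # [2]
```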
In the word wianjana, the semivowel ia was transliterated incorrectly unless it was written as ya (see previous Section 4.5), as shown in Table 4. In addition, the cluster nj was transliterated incorrectly by attaching gantungan ja to syllable na instead of to syllable nga, because in this combination syllable na assimilates into syllable nga [21]. TAB provides the replacement consonant ň for the n of the cluster nj, which gives the correct transliteration. The word wianjana and its variations (wyanjana, wiaňjana, and wyaňjana) have one meaning and should have the same transliteration. They represent another kind of special word whose precise list has not yet been researched, and the transliteration algorithm can be improved for them in the same way as for the special words in previous Section 4.1.

CONCLUSION AND FUTURE WORK
The accuracy analysis of the Latin-to-Balinese script transliteration method in an existing mobile application, Transliterasi Aksara Bali (TAB), showed that TAB correctly transliterated 103 of 151 cases (68%), based on the rules and examples of Sudewa [5]. A Latin-to-Balinese script transliteration method can be improved significantly by handling the thirteen kinds of special words identified in the test cases during the experiment. This research contributes an accuracy analysis and the resulting recommendations for the future development of Latin-to-Balinese script transliteration methods. To the authors' knowledge, no such analysis has previously been reported in this research area.