A systematic review of text classification research based on deep learning models in Arabic language

Received Apr 10, 2020 Revised Jun 6, 2020 Accepted Jun 17, 2020

Classifying or categorizing texts is the process by which documents are sorted into groups by subject, title, author, etc. This paper undertakes a systematic review of the latest research in the field of the classification of Arabic texts. Several machine learning techniques can be used for text classification, but we have focused only on the recent trend of neural network algorithms. In this paper, the concept of classifying texts and the classification process are reviewed, and deep learning techniques for classification and their types are discussed. Neural networks of various types, namely RNN, CNN, FFNN, and LSTM, are identified as the subject of study. Through systematic study, 12 research papers related to the classification of Arabic texts using neural networks were obtained; for each paper, the methodology and the accuracy ratio for each type of neural network were determined. The evaluation criteria used with the different neural network types, and how they play a large role in the highly accurate classification of Arabic texts, are discussed. Our results provide findings on how deep learning models can be used to improve text classification research in the Arabic language.


INTRODUCTION
The classification of texts is a method of searching for data among large collections and sorting them into groups for easy reference [1][2][3][4][5]. Internet pages, books [6], magazines, and social media [7][8][9][10][11][12][13][14][15][16][17][18][19][20] have become a rich source of information that needs to be categorized and organized for easy reference [21][22][23][24][25]. There has been a lot of research in this field, but most of it involves the classification of texts in English, Spanish, and other languages. There is a lack of research on Arabic text classification, and the techniques and algorithms used on English texts do not fit Arabic texts, because the Arabic language has certain characteristics where the structure of the word is concerned [26]. The main aim of this study was to review Arabic text classification based on neural networks. This research therefore summarized the neural network techniques that have been used to classify Arabic texts and determined which ones were more efficient and accurate; finding the gap in the current literature was another important aim. We focused on the use of neural networks in the classification of Arabic texts. We first discuss the concept of text classification and the types of classification, then present neural networks, their types, and the classification processes for each type. Following the systematic review method, the research questions were identified first, then the search strategy was developed to retrieve relevant topics more accurately. Subsequently, quality assessment criteria were developed to assess the quality of the research papers in order to extract the important ones. In the discussion section, all research questions are answered in detail.
In order to achieve the main aim of this study, we defined our research questions as follows:
- RQ1: Which corpora were mainly used for Arabic text classification? Were they open source or created by the author?
- RQ2: Which countries focused on publishing research on ATC?
- RQ3: Which databases had more publications?
- RQ4: During which years was there more focus on ATC?
- RQ5: What were the types of neural network algorithms used to classify Arabic texts? And which one was most used?
- RQ6: What were the efficiency measures used?
- RQ7: Did the author compare NN with other techniques? And which ones were the most frequently compared with NN?
- RQ8: Which type of neural network algorithm proved most efficient?

LITERATURE REVIEW

Definition of TC
The text classification process is the automated assignment of a set of documents to specific groups based on the content of the text itself, through the use of certain technologies and algorithms [24,[27][28][29][30]. Elhassan and Ahmed [30] also defined the classification of texts as a method of searching for data among large collections and sorting them into groups for easy reference. According to [31], text classification is the process of organizing documents according to a known, pre-existing structure of specific categories suited to that type of text. Others noted a slight difference between text categorization and classification: categorization sorts texts according to their content, whereas classification places them in groups suited to their content according to author, subject, title, language, and other attributes [32]. Research on text classification has increased significantly due to the enormous amount of data available from many sources, including Internet pages, e-mail messages, news pages, texts circulated through social media, reports, and journal articles; such research aims to make the best possible use of all these data by classifying them [33].

Arabic text classification
Arabic is one of the six official languages of the United Nations. Its use is spread across many countries: it is the common language of the Arab world, and some non-Arab Islamic countries also use Arabic because it is the language of the Qur'an. The Arabic language consists of 28 letters, including the hamza and the elongation (madd) forms. Each letter takes different shapes depending on its position in the word, that is, whether it appears at the beginning, in the middle, or at the end of the word [34]. There are no upper- and lower-case letters in Arabic, unlike English and other languages [31]. The morphology of the Arabic language is not easy; it can be complicated, as it has root words, prefixes, and suffixes. Moreover, unlike other languages, its words do not follow sequential forms: word structures differ completely depending on position in the sentence and on meaning [31]. As mentioned above, text classification research has become widespread, but most of it covers other languages, such as English, and little has focused on Arabic. Many techniques and algorithms created to classify English texts give excellent results and high accuracy due to the nature of the language, its letters, and its words. When applied to Arabic, however, these algorithms do not achieve similar accuracy and validity. Appropriate algorithms have therefore been created to handle word formation in Arabic, but they have not yet demonstrated fully reliable merit for the classification process [34].

Classification technique
There are two basic methods of classifying texts. The first is the grammar- or linguistics-based method, which depends on preparing certain rules and applying them in expert systems in order to classify the texts. The second feeds the text-extrapolation process with a set of documents, called training documents, that have previously been classified into specific categories [30]. Both methods are explained below.

Manual and statistical techniques
Manual text classification (TC) is done by writing specific queries manually for each category, for example sports, nutrition, clothing, and health. Text entered into a specific search engine is then categorized based on the predefined queries. However, this method works only on small texts, not on large ones or on many documents. The accuracy of this approach depends on the validity of the queries and the skill behind the query design. Expert systems have been created that prepare such rules and queries. An example is Construe-TIS, created for Reuters News, which was able to apply 674 categories and to recognize more than 170,000 companies [32]. This classification was not based on individual words but on concepts derived from the actual text, as shown in Figure 1. Techniques that rely on mathematical rules and principles are called statistical text classification. These techniques are suitable for small data sets and include "frequentist procedures, Bayesian procedures, and binary and multiclass procedures" [34].

Machine learning techniques
Due to the large amounts of information produced during the past two decades, there emerged a need for technical methods of classifying huge volumes of text, as statistical and manual methods were no longer sufficient in this field [35][36][37][38][39][40][41][42][43][44][45][46][47][48]. Machine learning classification techniques therefore appeared, which aim to classify unstructured texts and documents based on algorithms designed for this purpose. Machine learning techniques can be divided into groups as shown in Figure 2 [34]. In supervised learning, different kinds of data are recognized, whether visual, audio, or text. The outputs are compared with the expected ideal results through a process called backpropagation, in which errors are propagated from the output layers back toward the input layers and corrected and minimized to achieve better accuracy. In unsupervised learning, the learning takes place during processing itself. There are no labeled data for comparison; the network analyzes and processes the data, then builds a function to determine the error rate and reduce it to obtain high accuracy. Semi-supervised learning combines the previous two techniques. Researchers found that even a small amount of unlabeled data could contribute to improving the accuracy of the learning process during the data classification stage, so a small percentage of such data is used in semi-supervised learning.

Neural network
Neural network algorithms give very accurate results in the area of NLP and support deep learning very significantly, as they solve some errors and problems related to the variance of data resulting from the deep learning process [36][37][38][39]. Neural networks are used in the classification of texts to address linear and non-linear problems. The backpropagation model classifies texts using a group of nodes that form a mathematical model of biological neurons. These self-learning networks are used to identify systems and patterns and in text classification and image processing [40]. Many consider neural networks to be units connected to and intertwined with each other similarly to biological neurons: through this entanglement, one neuron receives inputs and another transmits the resulting outputs. The design of an ANN is similar in structure to that of the brain and its neural network. All neural network architectures share a basic principle: every neuron in the network receives inputs, processes them, and sends outputs. Each neuron is linked to at least one other neuron, and each connection has a specific numerical weight, called the weight factor, which reflects the importance of the communication between neurons in the network [41]. Some popular primary types of neural networks follow.

Feedforward neural network (FNN)
A feedforward neural network is one whose cells are connected in a forward direction and contain no cycles, unlike some other neural networks. The single-layer perceptron (SLP) is the simplest form of this type, since the inputs are connected directly to the output. In multilayer variants, the inputs pass through several layers of transformation, which makes such networks very suitable for categorizing texts [42]. The structure of the feedforward neural network is shown in Figure 3.
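To make the idea concrete, a minimal forward pass for a tiny feedforward network (one hidden layer, pure Python, with made-up weights; a sketch only, not the architecture of any reviewed paper) could look like this:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def feedforward(x, W1, b1, W2, b2):
    """Forward pass: h = sigmoid(W1·x + b1), y = sigmoid(W2·h + b2).
    Information flows strictly from inputs to outputs, with no cycles."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    return [sigmoid(sum(w * hi for w, hi in zip(row, h)) + b)
            for row, b in zip(W2, b2)]

# Hypothetical weights: 3 input features (e.g. term frequencies),
# 2 hidden units, 1 output score per category.
W1 = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
b1 = [0.0, 0.1]
W2 = [[1.0, -1.0]]
b2 = [0.0]
score = feedforward([1.0, 0.0, 1.0], W1, b1, W2, b2)
```

In a trained classifier the weights would be learned by backpropagation, and the output score compared against the other categories.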

Convolutional neural network (CNN)
Convolutional neural networks are inspired by the multi-layer perceptron (MLP) and are designed to extract the spatial structure in image data and the positions of objects in the image [43]. The same principle has been used to classify texts as one-dimensional sequences of words, through interaction with neighboring neurons. The name "convolutional" comes from the convolution (folding) operation applied between neurons during the classification process. The structure of the convolutional neural network is shown in Figure 4.
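As a rough sketch of this one-dimensional idea (hypothetical filter values, pure Python; real text CNNs convolve filters over word-embedding matrices), a convolution followed by max pooling can be written as:

```python
def conv1d_max(seq, kernel):
    """Slide a filter of width len(kernel) along a 1-D sequence,
    then max-pool over the resulting feature map."""
    k = len(kernel)
    feature_map = [sum(w * x for w, x in zip(kernel, seq[i:i + k]))
                   for i in range(len(seq) - k + 1)]
    return max(feature_map)

# Toy "sentence": each number stands in for a word feature.
sentence = [0.0, 1.0, 2.0, 1.0, 0.0]
kernel = [0.5, 1.0, 0.5]  # one filter spanning 3 neighboring words
feature = conv1d_max(sentence, kernel)  # strongest local pattern match
```

Each filter thus detects one local pattern of neighboring words, regardless of where in the sentence it occurs.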

Recurrent neural network (RNN)
In this type of network, processing is continuous and sequential, since the outputs of the previous step are fed in as the inputs of the next step. By contrast, in traditional neural networks, the inputs and outputs are separate units. Sometimes there is a need to predict the desired word, and therefore to refer back to the preceding words; this is what recurrent neural networks do. A hidden layer carries this state forward and completes the process [44]. An example of a recurrent neural network is shown in Figure 5.
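The recurrence described above can be sketched in a few lines (scalar state and made-up weights for readability; a real model would use weight matrices over word vectors):

```python
import math

def rnn_states(inputs, w_x, w_h, h0=0.0):
    """Each hidden state mixes the current input with the previous
    state: h_t = tanh(w_x * x_t + w_h * h_{t-1})."""
    h, states = h0, []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h)
        states.append(h)
    return states

# The state after the last word summarizes the whole sequence so far.
states = rnn_states([1.0, 0.0, 1.0], w_x=0.8, w_h=0.5)
```

Note that the second state is nonzero even though its input is zero: the previous words "leak through" the recurrent weight, which is exactly what lets the network refer back to earlier words.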

Long short-term memory (LSTM)
Long short-term memory networks are a specific recurrent neural network (RNN) structure designed for modeling long-range time series, and they are observed to operate more precisely than standard RNNs. The neurons of this network are made up of units containing gates; in this way, the network can control the flow of the inputs that lead to the final output. Thus, only a few inputs may participate in the output, so the error rate is reduced [45]. LSTM contains units called memory blocks, located in the hidden layer. These blocks have self-connections in memory cells that record the temporal state of the network during operation. They also contain gates that control the flow of inputs and outputs: the input gate controls the activation for entering information into the memory, while the output gate controls the output after the activation that takes place at the input gate. A third gate, called the forget gate, which addresses the weakness of LSTM in determining flows for specific units, has also been added [46]. Figure 6 shows the structure of the LSTM RNN.
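The gating just described can be sketched as a single scalar LSTM step (the weights are arbitrary placeholders; a real layer uses vectors and learned matrices):

```python
import math

def lstm_step(x, h_prev, c_prev, p):
    """One LSTM step. p maps gate name -> (w_x, w_h, b) weights."""
    sig = lambda z: 1.0 / (1.0 + math.exp(-z))
    pre = lambda name: p[name][0] * x + p[name][1] * h_prev + p[name][2]
    i = sig(pre("input"))       # input gate: how much new content enters
    f = sig(pre("forget"))      # forget gate: how much old memory survives
    o = sig(pre("output"))      # output gate: how much memory is exposed
    g = math.tanh(pre("cell"))  # candidate memory content
    c = f * c_prev + i * g      # updated memory cell
    h = o * math.tanh(c)        # new hidden state
    return h, c

params = {name: (0.5, 0.5, 0.0) for name in ("input", "forget", "output", "cell")}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=0.0, p=params)
```

Because the cell update is additive (f * c_prev + i * g), information can survive many steps without vanishing, which is what makes LSTM suited to long-range dependencies.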

Arabic Text classification process
There are three main stages in the classification of texts, which depend on the type of neural network used, as shown in Figure 7.

Data pre-processing
At this stage, each word is filtered and returned to its original root by removing all the attached marks and affixes, for instance the hamzah, the Ta Marbouta (ة), and the tashkeel, as well as numbers and punctuation marks. Connective words, such as (لذلك، بالنسبة ل), are also removed. Prefixes and suffixes are removed, and all words of the same root are grouped. This stage aims to improve the precision of the classification process and save time [34]. After removing all of the appendages, the word is returned to its root by one of three methods: "the root-based stemmer, the light stemmer, and the statistical stemmer." Then the document is classified based on the vectors built for each document [30].
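A minimal sketch of this normalization step might look as follows; the Unicode ranges and letter mappings below are our own illustrative choices, not the exact pipeline of any reviewed paper, and a full pipeline would follow this with a root-based or light stemmer:

```python
import re

TASHKEEL = re.compile(r"[\u064B-\u0652]")  # common Arabic diacritic marks
NOISE = re.compile(r"[0-9\u0660-\u0669.,;:!?\u060C\u061B\u061F]")  # digits, punctuation

def normalize(text):
    """Strip tashkeel, digits, and punctuation; unify hamza-carrying
    alef forms to bare alef and ta marbuta to ha."""
    text = TASHKEEL.sub("", text)
    text = NOISE.sub(" ", text)
    text = re.sub("[\u0622\u0623\u0625]", "\u0627", text)  # آ / أ / إ -> ا
    text = text.replace("\u0629", "\u0647")                # ة -> ه
    return " ".join(text.split())
```

For example, normalize("مَدْرَسَةٌ") returns "مدرسه": the tashkeel is stripped and the ta marbuta becomes ha.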

Text classification
The training phase takes place here; a specific algorithm is applied to the words obtained from the previous stage in order to complete the classification process. Several comparisons can also be made here by using a different algorithm to classify the text and then choosing the best in terms of performance and the result of accuracy in classification [30].

Evaluation
At this stage, the effectiveness of the classification is evaluated. Several techniques are used, but the most famous and frequently used measures in text classification are "F1, precision and recall" [30].

RESEARCH METHOD

Figure 8 demonstrates the key phases we followed. Generally, the research questions were identified first; then we defined the search strategy and the keywords used in our search. Subsequently, we defined the quality assessment questions. After that, the papers were extracted. In the last step, we critically analyzed the chosen papers.
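The three evaluation measures named for the evaluation stage can be computed from per-category confusion counts as in this small sketch (the counts below are invented for illustration):

```python
def precision_recall_f1(tp, fp, fn):
    """P = TP/(TP+FP), R = TP/(TP+FN), F1 = harmonic mean of P and R."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical counts for one category:
# 8 correct assignments, 2 wrong assignments, 4 missed documents.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=4)  # p = 0.8, r ≈ 0.667
```

Accuracy, by contrast, is the fraction of all documents assigned to the correct category, which is why the reviewed studies often report it alongside these three.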

Data source
This study began in January 2020; we included most research from 2016 to 2019 and some earlier work. The databases used were IEEE, Science Direct, Springer, and ACM. Figure 9 shows the data sources and the number of articles used from each.

Search strategy
The keywords we used to collect the previous studies were as follows:
- "Arabic text" and "classification" and "neural networks"
- "Arabic text" and "classification" and "deep learning"
- "Arabic script" and "classification" and "deep learning"
- "Arabic script" and "tagging" and "deep learning"
- "Arabic script" and "categorization" and "deep learning"

Selection criteria

In the selection criteria step, inclusion and exclusion criteria were set to ensure that the research included in this study was valuable and relevant and would lead us to our main aim.

Inclusion criteria

- Must be a text classification study
- Must address Arabic text classification only
- Must use a neural network for text classification
- Must report the model and its performance measures, for instance accuracy or another metric
- Must include a summary of the corpus used

Exclusion criteria

- Papers not on Arabic text classification
- Papers that did not report accuracy
- Papers not published in a journal or conference
- Papers that did not use NN

Quality assessment
In this part, we designed quality assessment questions to make a checklist for the research and ensure that it would satisfy the aim of this systematic review. The questions are as follows:
Q1: Was the corpus identified and described well?
Q2: Was the split of the corpus between training and testing identified well?
Q3: Were the text classification (model/framework) steps described clearly?
Q4: Did the authors make comparisons with techniques other than the NN they used?
Q5: Was the performance of the model identified clearly?
Scale: A three-point scale was used in this assessment. If a paper addressed the question exactly, it was graded 1; if it did not address it, it was graded 0; if it answered the question partially, it was graded 0.5. Table 1 shows the result of evaluating the research using the designed assessment questions.

a. RQ1: Which corpora were mainly used for Arabic text classification? Were they open source or created by the author?
From Figure 10, it is clear that the documents used in the process of classifying Arabic texts were mostly open-source corpora and news sites, with 4 sources in each case. Some also used books: two research papers applied the classification process to books. One research paper classified guest reviews of a hotel, which could be considered not open source. That is, the number of open-source resources in the research papers under study was 11, and the number of non-open-source ones was only one, as shown in Table 2.
b. RQ2: Which countries focused on publishing research on ATC?
Many researchers from Arab countries were interested in studying and publishing research related to the classification of Arabic texts, and Jordan was at the forefront. After filtering, we obtained three research papers by researchers from Jordan, followed by Morocco, the United Arab Emirates, and the United States of America, with two research papers from each country. The countries with the fewest publications in the period covered by this review were Algeria, Saudi Arabia, and Tunisia, with one research paper from each. It is clear that the Arab countries have started to study and publish on the classification of Arabic texts in order to use the language in almost all fields. These countries have also started to use Artificial Intelligence techniques in almost all fields; therefore, the Arabic language should make a contribution in the fields of Artificial Intelligence and Machine Learning. Figure 11 illustrates the number of studied articles in each country.
c. RQ3: Which databases had more publications?
According to the chart in Figure 12, a number of well-known scientific databases published research related to the classification of Arabic texts. IEEE was at the forefront: 60 research papers were retrieved from IEEE, 40 from Springer, and 17 from ACM.
d. RQ4: During which years was there more focus on ATC?
Based on the graph illustrated in Figure 13, the period from 2017 to 2019 witnessed a significant development in the field of Arabic text classification research. Six research papers were distributed between 2018 and 2019, that is, three research papers for each year. In the period between 2013 and 2016, only four research papers were studied, distributed as follows: one article in 2015 and three in 2016. Only two research papers were obtained between 2009 and 2012.
From the data we have, it is clear that research interest in the classification of Arabic texts is starting to increase due to the importance of the Arabic language, its use in more than one field, and its use in many Arab and Islamic countries.
e. RQ5: What were the types of neural network algorithms used to classify Arabic texts? And which one was most used?
Long short-term memory (LSTM) networks were used in [47][48][49][50]. Convolutional neural networks (CNN) were used to classify Arabic text in [50][51][52][53]. Recurrent neural networks (RNN) were used in only one study [52]; in that study, the author also built ensemble methods combining RNN with CNN in order to improve efficiency, and another model was built using a special type of RNN, the BiRNN. Three-layer feedforward NNs were used twice, in [40] and [54], while the multilayer perceptron (MLP) was used in [55] and [56]. A backpropagation autoencoder was used just once, in [57]. Figure 14 shows the types of NN and how many times each was used across the reviewed papers, and Table 3 lists the references for each type of neural network found in each paper. The papers did not state the reason behind choosing these types of NN; we believe the reason should be stated.
f. RQ6: What were the efficiency measures used?
The efficiency measure mainly used in these studies was accuracy; the others included recall, F-measure, and precision, as shown in Figure 15. Accuracy was used in 8 studies in our systematic review, as shown in Table 4, but not all of these studies defined it well. Accuracy was identified well, with the definition or equation stated clearly, in only four studies: [48][49][50][57].
The recall efficiency measure came in second position: it was used in five studies. Precision and F-measure were the least used, appearing in only 4 studies each. Table 4 shows the efficiency measures used in each paper:
- Precision: [40], [54], [56], [57]
- F-measure: [54], [55], [56], [57]
- Recall: [32], [40], [54], [56], [57]
g. RQ7: Did the authors use other techniques to compare NN with? And which were most often compared with NN?
Many machine learning algorithms, some supervised and some unsupervised, have been compared with the efficiency of neural networks in classifying Arabic text, as shown in Figure 16. The algorithms that have been used in Arabic text classification, based on our systematic review, are Support Vector Machine, Logistic Regression, Naïve Bayes, KNN, Decision Tree, Random Forest, and XGBoost. As is clear from Table 5, SVM and Naïve Bayes were the algorithms most often compared with NN in past research, followed by Decision Trees, which were used twice; the other techniques were each used only once.
h. RQ8: Which type of neural network algorithm proved most efficient?
After the deep systematic review, answering this question was not easy, nor was the answer comprehensive. We cannot generalize one type of NN as the best for Arabic text classification, for several reasons discovered in this review. The corpora used in these studies were all different; no two studies used the same corpus. Some were open source, and others had been created by the authors. Another reason was that the neural networks used differed between studies, and even where researchers used the same type of NN, other information was missing: the researchers did not indicate in detail the parameters used in these networks, although such algorithms are usually tuned by changing parameters.
This made it difficult to compare or to make sharp decisions about which neural networks were the best. Clear findings were possible only where more than one neural network was used within the same study, and we report those internal comparisons as stated in each study; in general, however, as discussed before, they cannot be generalized. In [55], MLP did not achieve high accuracy, reaching only around 50% in two experiments. The CNN result was better than