A powerful comparison of deep learning frameworks for Arabic sentiment analysis

Received Oct 9, 2019; Revised Aug 11, 2020; Accepted Aug 23, 2020

Deep learning (DL) is a subdomain of machine learning (ML) built on algorithms inspired by brain function, known as artificial neural networks (ANNs). Recently, DL approaches have achieved major results across various Arabic natural language processing (ANLP) tasks, especially in Arabic sentiment analysis (ASA). When working on ASA, researchers can choose among various DL libraries, but they often do so without justifying their choice, or they pick libraries based on their familiarity with a particular programming language. This work focuses on the Java and Python programming languages because both offer a large set of DL libraries that are useful in the ASA domain. The paper presents a comparative analysis of Python and Java libraries in order to identify the most relevant and robust DL libraries for ASA. Through this comparative analysis, we find that the Python frameworks TensorFlow, Theano, and Keras are very popular and widely used in this research domain.


INTRODUCTION
Arabic sentiment analysis, or opinion mining, aims to determine the sentiment polarity (positive, negative, or neutral) expressed by a writer. A large variety of opinions are expressed in posts on social media platforms such as Twitter, YouTube, Instagram, and Facebook. This field of research has recently attracted increasing attention [1,2], especially for English. Although Arabic is deemed to be among the most used languages on social media platforms, only a few works have addressed ASA so far.
A range of machine learning models power natural language processing (NLP) applications. Recently, DL approaches have achieved high performance across various NLP tasks [3]. In particular, DL has produced good results in the sentiment analysis domain [4][5][6][7] and is the state of the art in several languages [8][9][10], while the state-of-the-art accuracy for ASA still needs improvement.
DL is a part of the ML field concerned with a collection of algorithms inspired by brain function, known as artificial neural networks (ANNs). It is an ML method that teaches computers to do what comes naturally to humans by building an architecture made up of an input layer and an output layer with several hidden layers (encoders) between them. Numerous architectures are used in DL, such as recurrent neural networks (RNN), deep neural networks (DNN), convolutional neural networks (CNN), and long short-term memory (LSTM) networks. As part of our Arabic sentiment analysis research project, we carry out an in-depth comparative evaluation of the programming languages that are richest in ASA libraries, and we compare these tools to identify the most useful ones.
Recently, the NLP community has witnessed numerous breakthroughs due to the application of DL. The latter has brought notable improvements to sentiment analysis in English. However, less research has been done on applying DL to ASA. Owing to its complexity and its morphological and syntactic richness, Arabic is deemed to be among the most difficult languages, and it has a limited number of DL libraries compared to other widely used languages like English.
Choosing the most useful libraries is complicated and requires an in-depth analysis. For this reason, we compare them on several levels. In this work, we compare the most powerful DL libraries for ANLP in Python and Java in order to select the group of libraries best suited to our purposes in the ASA domain.
The rest of this article is organized as follows: Section 2 presents the main Java and Python DL libraries that support the Arabic language. Section 3 presents a detailed comparative study and draws a conclusion about the most useful deep learning libraries in the field of ASA. The results are discussed in detail in Section 4, and Section 5 concludes the work.
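To make the layered architecture described above concrete, the following is a minimal sketch in plain Python of a forward pass through one hidden layer, ending in a three-way polarity decision. It is an illustration only and is not taken from any of the libraries compared in this paper: the two input features (assumed here to be counts of positive and negative words), the names dense, relu, softmax, predict, and PARAMS, and the hand-picked weights are all hypothetical.

```python
import math

LABELS = ["negative", "neutral", "positive"]


def dense(inputs, weights, biases):
    """Fully connected layer: out[j] = sum_i inputs[i] * weights[i][j] + biases[j]."""
    return [
        sum(x * w_row[j] for x, w_row in zip(inputs, weights)) + biases[j]
        for j in range(len(biases))
    ]


def relu(values):
    """Hidden-layer activation: negative values are clipped to zero."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def predict(features, params):
    """Input layer -> hidden layer -> output layer -> polarity label."""
    hidden = relu(dense(features, params["w1"], params["b1"]))
    scores = dense(hidden, params["w2"], params["b2"])
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs


# Hypothetical, hand-picked parameters (not trained):
# 2 input features -> 2 hidden units -> 3 polarity classes.
PARAMS = {
    "w1": [[1.0, 0.0], [0.0, 1.0]],
    "b1": [0.0, 0.0],
    "w2": [[-1.0, 0.0, 1.0], [1.0, 0.0, -1.0]],
    "b2": [0.0, 0.1, 0.0],
}
```

In a real ASA system the weights would be learned from labeled Arabic text, and the input would typically be word embeddings rather than two hand-made counts; the sketch only shows how data flows from the input layer to the output layer.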

ARABIC SENTIMENT ANALYSIS LIBRARIES FOR DL
This section presents four open-source libraries for Deep Learning, namely Theano, TensorFlow, Keras, and Deeplearning4j.

Arabic supporting Python libraries for DL
Nowadays, DL is the hottest trend in ML and AI. We selected some of the best Python libraries that support the Arabic language and that are deemed the most powerful DL libraries for ASA.
TensorFlow: it is an open-source software library for dataflow and differentiable programming across a range of tasks. It is a symbolic math library and is also used for ML applications such as neural networks. It comes with robust support for ML and DL, and its flexible numerical computation core is used across many other scientific fields. TensorFlow can run on multiple CPUs and GPUs (with optional CUDA and SYCL extensions for general-purpose computing on graphics processing units). It is available on 64-bit Windows, Linux, and mobile computing platforms, including iOS and Android. To make TensorFlow programs easier to understand, debug, and optimize, it includes a suite of visualization tools named TensorBoard [11]. Figure 1 presents the architecture of the TensorFlow Python library.
Theano [12,13]: it is a cross-platform open-source tool that lets researchers efficiently evaluate mathematical expressions, including those involving multi-dimensional arrays. It is a mathematical library, but it was initially created to facilitate research in the DL field. Several packages have been built on top of Theano, such as Keras, Pylearn2, Blocks, and Lasagne [14]. Theano provides most of NumPy's functionality and adds automatic symbolic differentiation, transparent use of a GPU, and the computation of derivatives for functions with one or many inputs. Figure 2 presents the Theano architecture.
Keras: it supports recurrent networks and convolutional networks, as well as combinations of the two. Figure 3 shows the Keras architecture.
Based on the literature, we found that these libraries are widely used in the ASA field, and many authors recommend them in their works. Table 1 compares several valuable deep learning libraries used in Arabic sentiment analysis.
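The dataflow and automatic differentiation ideas mentioned for TensorFlow and Theano can be illustrated with a small, library-independent sketch. The code below records each operation as a node in a graph and then walks the graph backwards to obtain derivatives (reverse-mode automatic differentiation). It is a simplification under stated assumptions: the names Node and backward are hypothetical, values are computed eagerly as the graph is built, and it does not reproduce how Theano or TensorFlow build, optimize, or compile their graphs.

```python
class Node:
    """A node in a tiny computational graph: a value plus links to its inputs."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (parent_node, local_derivative)
        self.grad = 0.0

    def __add__(self, other):
        # d(a + b)/da = 1 and d(a + b)/db = 1
        return Node(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        # d(a * b)/da = b and d(a * b)/db = a
        return Node(self.value * other.value,
                    ((self, other.value), (other, self.value)))


def backward(output):
    """Accumulate d(output)/d(node) into node.grad for every node in the graph."""
    order, seen = [], set()

    def visit(node):
        # Depth-first traversal: parents are appended before the node itself.
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

    visit(output)
    output.grad = 1.0
    # Process the output first, then its inputs, applying the chain rule.
    for node in reversed(order):
        for parent, local_derivative in node.parents:
            parent.grad += local_derivative * node.grad


# Example graph: y = w * x + b
x = Node(3.0)
w = Node(2.0)
b = Node(1.0)
y = w * x + b
backward(y)
```

After the call to backward, w.grad holds the derivative of y with respect to w (the value of x), x.grad holds the value of w, and b.grad is 1. Training a neural network applies the same chain-rule bookkeeping to a much larger graph.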

Arabic supporting Java toolkits for DL
We chose the Deeplearning4j library because it is considered the most useful Java deep learning library for Arabic sentiment analysis.
Deeplearning4j: it is released under the Apache License 2.0 and was created mainly by a machine learning group headquartered in Tokyo and San Francisco and led by Adam Gibson. It is a free, cross-platform tool designed to integrate with Spark, Hadoop, and other Java-based distributed software. It is designed for Java and the Java Virtual Machine and is a computing framework with broad support for deep learning algorithms. It provides pre-trained models and supports CUDA, and it can be further accelerated with cuDNN. The library also offers GPU support for distributed training, and users can select native CPUs or GPUs for the backend linear algebra operations. Figure 4 shows the Deeplearning4j architecture.

COMPARATIVE EVALUATION OF DL TOOLS
In this part, we present our in-depth comparative study on several levels. First, we present the essential comparison criteria and the most important application areas. Then, we rely on GitHub statistics and Google Trends history to identify the most powerful DL libraries that satisfy our specific requirements.

Comparison of the most famous open-source DL libraries
The comparison of these DL tools can rely on different criteria. Table 2 shows a variety of parameters adopted in this comparative study:

Comparison by supported treatments in DL and NLP fields
In Tables 3 and 4, we highlight the main tasks supported by the DL tools TensorFlow, Keras, Theano, and Deeplearning4j. We also compare them to identify the library that covers the largest number of tasks. According to this benchmarking study, the Python libraries TensorFlow, Keras, and Theano support almost the same tasks in ANLP and deep learning. Compared to the Deeplearning4j Java library, these three Python libraries support a larger number of applications.

Comparative evaluation of the open software tools focused on the forks, stars, commits, and contributors received by the GitHub community
GitHub figures change constantly, so we state the date on which this information was consulted (17/06/2020). Table 5 shows the GitHub results. Based on these results, we deduce that TensorFlow is the most used, followed by Keras and finally by Theano and Deeplearning4j.

Comparison of libraries according to Google Trends and GitHub pull request history
In Figures 5 and 6, we highlight the results of Google Trends and the GitHub pull request history. According to the Google Trends history, we can conclude that the Keras and TensorFlow libraries are very popular within the user community. According to the GitHub pull request history, TensorFlow has been the most commonly used of the four libraries in recent years.

RESULTS AND DISCUSSION
In this comparative evaluation, we selected the Java and Python programming languages because they are very popular and have many useful DL libraries used in ASA. It is very difficult to conclude that one tool is better than the others, as all of these tools are valuable and very popular in these fields. The following part presents our conclusions:
Theano: it is great for creating networks from available components and reusing pre-trained networks, but it is more challenging when it comes to building perfect solutions. The main disadvantage of this library is its frequently long compile times when creating huge models.
TensorFlow: it was built to replace Theano. These two tools are, in fact, quite similar. As in Theano, a neural network is declared as a computational graph, which is optimized during compilation. TensorFlow has a faster compile time than Theano, but it is slower than other libraries. A significant new feature is the implementation of data parallelism, which is similar to the Iterative MapReduce of Deeplearning4j.
Keras: it is easy to use, offers actionable feedback upon user error, and permits fast prototyping. The Keras library sits atop Theano and TensorFlow. Deeplearning4j relies on Keras as its Python API and imports models from Keras and, through Keras, from TensorFlow and Theano. Keras programs are generally shorter than the equivalent Theano and TensorFlow programs.
Deeplearning4j: it provides great solutions for beginners interested in exploring deep neural networks and is appropriate for educational and training purposes. On the other hand, TensorFlow and Theano are more suitable for experienced users who need much more control over network architectures.
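The data parallelism mentioned for TensorFlow can be sketched independently of any library. In the simplified example below, a batch is split into equal shards, each shard yields its own gradient (in a real system, on a separate device or machine), and the averaged gradient drives a single parameter update. The one-parameter model y = weight * x, the function names, and the DATA values are hypothetical and serve only as an illustration; the code does not reproduce the TensorFlow or Deeplearning4j implementations.

```python
def mean_gradient(weight, samples):
    """Gradient of the mean squared error of y = weight * x over the samples."""
    return sum(2.0 * (weight * x - y) * x for x, y in samples) / len(samples)


def split(samples, workers):
    """Split the samples into equal shards; assumes len(samples) is divisible by workers."""
    size = len(samples) // workers
    return [samples[i * size:(i + 1) * size] for i in range(workers)]


def data_parallel_step(weight, samples, workers, learning_rate):
    """One gradient step in which every worker handles its own shard."""
    shards = split(samples, workers)
    # Each gradient below could be computed on a different device in parallel.
    gradients = [mean_gradient(weight, shard) for shard in shards]
    averaged = sum(gradients) / len(gradients)
    return weight - learning_rate * averaged


# Hypothetical toy data that follows y = 2 * x.
DATA = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
```

With equal-sized shards, the averaged gradient equals the gradient computed on the whole batch, so the parallel update matches the single-machine update while the work is shared among the workers.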
In summary, each DL tool is characterized by its benefits in ANLP tasks. Although they share almost the same characteristics and advantages, TensorFlow outperforms the other libraries according to Google Trends and the GitHub pull request history. However, for Arabic sentiment analysis specifically, the literature and various notable works in the field of ASA show that TensorFlow, Theano, and Keras are very popular and widely used in this research domain.

CONCLUSION
In this work, we described a variety of Python and Java DL tools that are deemed most helpful in ANLP. We also sought to identify the most powerful and useful DL libraries for the ASA field, and we compared the libraries along several dimensions. In conclusion, every DL library is characterized by its own advantages and benefits in ANLP tasks. We chose this set of libraries because they have achieved strong results in the ASA task.