Security and privacy recommendation of mobile app for Arabic speaking

ABSTRACT


INTRODUCTION
Currently, the digital world has millions of mobile applications that directly or indirectly access our personal information, such as location, name, and photos. The growing number and diversity of apps has been accompanied by a parallel rise in threats, security risks, and privacy issues that can affect users' data privacy. To evaluate the level of security and privacy, users' reviews can be used: the experiences users report help identify to what extent these applications can be trusted in terms of security and privacy. In addition, some mobile apps provide information about their privacy and security level that can be used as a helper index in the evaluation [1], [2]. On the other hand, the tricky matter facing us is the Arabic language itself: Arabic is a Semitic language with a complicated morphology that differs significantly from the other popular languages. Thus, to satisfy the large number of Arab users in terms of security and privacy on mobile apps, we must take the special features of the Arabic language into consideration.
The Arab states have around 500 million people, who use Arabic in different dialects besides the classical one, for instance the dialects of the Middle East and of the Maghreb, the Bedouin dialects, and the dialects of city and village dwellers. In this case study, we had to deal with reviews written in different Arabic dialects. In many cases, one orthographic word in Arabic comprises several semantic and syntactic words. In addition, classical Arabic has two types of morphology: root morphemes and affix morphemes [1]. In this context, our work is to develop a privacy and security awareness recommender, in particular for Arabic users. The recommender system is able to classify a large number of users' reviews written in Arabic dialects and then determine the level of security and privacy of apps [3], [4].
ISSN: 2088-8708 Int J Elec & Comp Eng, Vol. 12, No. 5, October 2022: 5191-5203
The rest of the article is structured as follows. The second section briefly presents the specificities of the Arabic language and provides an overview of related work on recommender systems in terms of security and privacy. The third section presents the results of the survey we conducted in order to design a high-performance recommender, as well as the design assumptions adopted to establish the proposed system. The detailed methods for implementing the cluster selection mechanism and the trust system evaluation are provided in the fourth section. The fifth section analyses simulation results, highlighting the improvements achieved by the proposed protocol compared with state-of-the-art techniques. Finally, we conclude the paper and highlight future work based on the paper's contributions.

RELATED WORK
There have been many studies dealing with security and privacy using a recommender system or a survey. However, several issues remain with app permissions and with users' awareness of security and privacy. In this section, we present an overview of the characteristics of the Arabic language and present survey results that support our work.

The Arabic language: overview
The Arabic language is one of the most popular languages around the world and is commonly used on the internet and social media.It is considered one of the top six languages worldwide.Over 200 million people are native Arabic speakers, distributed over 20 countries [5].
Many researchers, such as Perrin (2015), agree that the characteristics of Arabic, unlike those of English, make it complex to develop corpora and classifier tools for it. In daily life, as on social media, Arabic appears in three forms: i) classical Arabic, the language of the Holy Quran; ii) modern standard Arabic (MSA), the formal Arabic used for professional purposes such as books, media, and education, which is easy to understand for Arabs from all regions; and iii) dialectal Arabic (DA), the local dialects of Arabs, which differ by geographical region and can be grouped into four regions: i) Sudan and Egypt; ii) Lebanon, Syria, Jordan, and Palestine; iii) the Gulf (Iraq, Kingdom of Saudi Arabia (KSA), United Arab Emirates (UAE), Kuwait, Qatar, Bahrain, and Yemen); and iv) Libya, Tunisia, Algeria, and Morocco [1]-[6].

Recommender systems
The content of social networks is increasing steadily, with a large amount of information such as data, images, videos, and documents shared on these networks, which can be noisy and heterogeneous. This continuous, massive growth in data needs to be organized so that users can easily extract the information they need. Recommender systems can meet this demand.
The field of recommender systems has its origins in the mid-1990s. A recommender system is an information filtering system that aims to solve the problem of information overload by suggesting useful information to targeted users [7]-[9]. Such systems are becoming increasingly important for e-commerce and social media sites, where they help users decide which products to buy and which businesses to patronize.
Recommender systems (RSs) are built and developed based on users' textual reviews, ratings and comparative opinions [10].There are four different approaches used in developing RSs, including content-based (CB) filtering, collaborative filtering (CF), hybrid-based (HB) filtering, and knowledge-based (KB) filtering.When using a CF or a HB filtering approach, RSs must gather information regarding the user in order to develop recommendations [9]- [11].

An Android permission control recommender system based on crowdsourcing
Mobile applications may concern users because of data security and privacy risks: apps request many permissions that users do not fully understand, and they do not disclose all information about the purposes of these permissions. Rustgi and Fung worked to improve a recommender system (DroidNet) that shows app permissions to users, so that they can decide whether to install an app after seeing recommendations about it [9]. This technique can reduce users' concerns about security and privacy. Moreover, DroidNet has one database that gathers all the user's permissions from the mobile device and another database that is online. The key point is the link between the two databases, which keeps them immediately up to date. DroidNet's recommendations are also supported by expert users who deal with apps. In sum, this work shows that DroidNet is an effective recommender system that gives recommendations based on expert users and its databases [12]-[14].

SURVEY
The aim of the survey is to discover users' level of security and privacy awareness and whether they have enough knowledge about security risks and privacy in mobile apps. Further, we attempt to gather words relevant to the description of security and privacy, which will be used in the recommender system. Such a system can help users learn the level of security and privacy of an app, especially those who prefer not to read reviews before downloading an app.

Result of survey
The survey findings illustrate users' awareness of security and privacy and the words collected through the survey. The survey also investigates how users deal with apps' privacy policies and with third parties, which reveals users' level of knowledge about security and privacy. First, we designed the questionnaire online using Google Drive. Then, it was distributed to many Arabic speakers. In total, 827 participants from 17 countries, such as Saudi Arabia, Kuwait, Oman, Iraq, the United Kingdom (UK), and Turkey, responded to the survey through the internet.
As shown in Figure 1, 250 participants read the privacy policy of apps, while 577 participants do not care about it. This shows that users who download apps still do not realize the significance of security and privacy. Other studies found similar results, such as the Universiti Sains Malaysia BYOD experiment (2017), which found that more than 50% of sharers do not read the privacy guidelines [15].

Figure 1. Users' lack of awareness about privacy policy when downloading an app
As shown in Figure 2, 617 participants prefer to use a recommender system before installing an app. These answers demonstrate that users realize the significance of a recommendation system, even though 210 participants prefer not to use one. Many existing recommender systems focus on detecting permissions and showing them to the user; they can also show the level of security and privacy of apps. Therefore, alongside this survey, we built a recommender system based on users' reviews. The answers to this question show how important a recommender system can be, as it can reduce security risks and help users make the correct decision about installing apps [12], [16], [17].
The question shown in Figure 3 is significant because it allows us to gather expressions and words about security and privacy awareness that can feed our recommender system's dictionary. However, only 143 participants provided comments, while 684 participants did not give their views on any app after downloading it. Of these comments, 127 were written by participants who answered 'yes' and 16 by participants who answered 'no'. Of all these comments, 25 are useful and usable in the dictionary. They include many words (lexical and semantic) that convey the meaning of security and privacy awareness, as shown in Table 1 [18], [19].
Figure 4 illustrates the number of participants who provided their information when downloading an app. Around 300 participants provide their email, location, and mobile number when they want to install an app, while four participants provided everything to download an app. Figure 5 shows that around 61.4% of participants chose the factor 'security and privacy' as influencing their decision before downloading an app. The factor 'quality of app' affected 58.8% of participants' decisions, slightly fewer than security and privacy, which suggests that users are aware of the need to protect their sensitive data. Moreover, 'advertising' and 'ratings' can influence users' decisions to download an app (42.6% and 44.7%, respectively), and 17% of participants are influenced by error messages (bug reports). This question clearly shows the significance of security and privacy in users' awareness.

Survey discussion
The survey has two significant parts: i) gathering words for the recommender system's dictionary and ii) discovering users' level of security and privacy awareness from their answers. First, Figure 3 and Table 1 show that participants pay little attention to reviews, even though we believe reviews include sensitive words about security and privacy. Few participants provided words about security and privacy, which is a rather weak result. In addition, Figure 4 shows the information that participants provide to apps before downloading them. Many users therefore still struggle with apps that ask them to enter their information before installation, and these requests sometimes extend to sensitive data that is then saved, such as credit card information. These answers reflect users' low awareness.
Second, Figures 1 to 4 involve specific questions about security and privacy, which we can use to determine users' level of awareness. Figure 1 illustrates users' unawareness of privacy policies; nevertheless, users know that the handling of their data is a serious issue, so they choose to use a recommender system to avoid leaking their data, or at least to reduce threats, as shown in Figure 2. Moreover, Figure 5 shows that security and privacy are a priority for users.

RECOMMENDER SYSTEM FOR SECURITY AND PRIVACY
The recommender system acts as a reader: it reads users' reviews and helps users pay attention to the level of security and privacy of apps before they download them. This allows users to make an informed decision about whether to install an app. The recommender system can also help users effectively avoid what is considered a threat to or violation of their security and privacy, or "unexpected data collection practices" [17], [20].

Users' reviews
We selected users' reviews from Google Play, which allowed us to gather many words that reflect the level of security and privacy. Building a recommender system requires a large number of words (lexical or semantic). We therefore gathered 1,354 comments relevant to security and privacy from four groups of apps (21 games, 16 education apps, 20 shopping apps, and 10 social media apps). The aim of this work is to fully understand the context of users' reviews and collect each word relevant to security and privacy. In this way, we can enlarge the vocabulary (lexical and semantic) of the recommender system's dictionary, which helps it classify and evaluate each review of an app.

Dictionary
The dictionary includes lexical and semantic words. We classify the words according to their relevance to security and privacy. In addition, we attempt to place each word as correctly as possible in the dictionary, depending on whether it is closer to a security expression or to a privacy expression. After that, we assign each word a weight level, where five is strong and one is weak, as shown in Tables 2 and 3.
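Tables 2 and 3 are not reproduced in this section, so the following Python sketch only illustrates one possible shape for the weighted dictionary and for the term-matching labelling described later. It assumes that each entry (a single word or a multi-word expression) maps to a weight from 1 to 5, and that a review's class is the highest weight among the entries it contains, with 0 meaning 'not categorized'. The Arabic entries, their weights, and the function name label_review are hypothetical placeholders, not the authors' data or code.

```python
# Hypothetical excerpt of the weighted keyword dictionaries (the real entries
# are in Tables 2 and 3); weights run from 1 (weak) to 5 (strong).
SECURITY_KEYWORDS: dict[str, int] = {
    "اختراق": 5,    # "hacking / breach"
    "فيروس": 4,     # "virus"
    "غير آمن": 3,   # "not safe" (multi-word entry)
    "أمان": 2,      # "security"
}
PRIVACY_KEYWORDS: dict[str, int] = {
    "تجسس": 5,           # "spying"
    "سرقة البيانات": 4,  # "data theft" (multi-word entry)
    "خصوصية": 2,         # "privacy"
}


def label_review(review: str, keywords: dict[str, int]) -> int:
    """Term matching: return the weight of the strongest dictionary entry
    contained in the review, or 0 ('not categorized') if none matches.

    Assumption: the class of a review equals the highest matched weight.
    """
    matched_weights = [weight for term, weight in keywords.items() if term in review]
    return max(matched_weights, default=0)


if __name__ == "__main__":
    print(label_review("التطبيق فيه فيروس ولا يوجد أمان", SECURITY_KEYWORDS))  # → 4
    print(label_review("أخاف من سرقة البيانات", PRIVACY_KEYWORDS))            # → 4
    print(label_review("تطبيق رائع", SECURITY_KEYWORDS))                      # → 0
```

Plain substring matching also catches a keyword carrying an attached prefix, such as the article in 'الأمان' ('the security'), which suits Arabic morphology; in practice, the dictionary and the reviews would first go through the same normalization.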

Recommender system's diagram
"A recommender system is an algorithm whose aim is to provide the most relevant information to a user by discovering patterns in a dataset." The mechanism of our recommender system (dictionary and recommendation system) is presented in Figure 6. The diagram describes the steps we took to build our learning model and its use by the recommendation application. These steps are briefly described as follows:
− Reviews collection process: reviews are collected from the Google Play Store platform using the open-source Google Play Scraper. The collected data set contains 954,684 reviews obtained from 2,816 apps; for each app, we chose the last 500 reviews written in Arabic, if there are any. This data set is then cleaned and processed to create the classifier's training features.
− Pre-processing of reviews: this stage prepares the reviews to create the training data. It mainly comprises the following processing: i) tokenization: tokenizing a review amounts to separating it into tokens, that is, into distinct words or symbols, so that each review yields a vector of tokens; ii) lexical standardization: in Arabic, some characters can be written in several ways, and this step standardizes them; iii) English-word removal: in addition to Arabic words, some reviews contain words written in Latin characters, which a dedicated function removes; and iv) stop-word removal: stop words are so common that it is unnecessary to index them or use them in learning.
The LinearSVC classification algorithm is applied to build two models, one for security and the other for privacy. Each model is exported into the web application, which collects the latest online reviews and applies the model to decide which class each review belongs to. We define the representative reviews of a category as those whose content contains predefined keywords of that category; the remaining sentences are called 'not categorized reviews' (class 0). Nevertheless, the representative reviews contain some erroneous reviews, which we eliminate. As similar words tend to appear in similar contexts, we compute the similarity using contextual information. This step is done twice, once for the security classes and once for the privacy classes, as shown in Figures 7 and 8, respectively.
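The four pre-processing operations can be sketched in Python as follows. The normalization rules (alef variants, alef maqsura, ta marbuta, diacritics and tatweel) and the tiny stop-word list are assumptions chosen for illustration, since the paper does not list its rules. Unlike the order given above, this sketch standardizes characters before tokenizing, because Python's \w does not match Arabic diacritics and would otherwise split a vowelled word in two.

```python
import re

# Assumed normalization rules for common orthographic variants.
_DIACRITICS = re.compile("[\u064B-\u0652\u0640]")  # harakat, shadda, sukun, tatweel
_ALEF_VARIANTS = re.compile("[إأآ]")
_LATIN = re.compile(r"[A-Za-z]")
_TOKEN = re.compile(r"\w+")


def normalize(text: str) -> str:
    """Lexical standardization: map variant spellings to a single form."""
    text = _DIACRITICS.sub("", text)
    text = _ALEF_VARIANTS.sub("ا", text)  # hamza/madda alef -> bare alef
    text = text.replace("ى", "ي")          # alef maqsura -> ya
    text = text.replace("ة", "ه")          # ta marbuta -> ha
    return text


# Hypothetical, deliberately tiny stop-word list, stored in normalized form.
STOP_WORDS = {normalize(w) for w in ["في", "من", "على", "إلى", "عن", "هذا", "هذه", "و"]}


def preprocess(review: str) -> list[str]:
    """Return the cleaned token list of one review."""
    tokens = _TOKEN.findall(normalize(review))           # tokenization
    tokens = [t for t in tokens if not _LATIN.search(t)]  # drop Latin-script words
    return [t for t in tokens if t not in STOP_WORDS]     # drop stop words


if __name__ == "__main__":
    print(preprocess("التطبيق ممتاز جدا app لكن فيه إعلانات كثيرة على الشاشة"))
```

A dialect-aware stop-word list and richer rules (for example, for hamza on waw or ya) would be needed on real Google Play reviews.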

Unbalanced data
Learning from imbalanced data is a difficult task, since most learning systems are not prepared to cope with a large difference between the numbers of cases belonging to each class. Researchers have reported difficulties in learning from imbalanced data sets in several domains [20]. To overcome these difficulties, the literature proposes two main solutions: one adapts the learning algorithms, and the other modifies the size of the data in order to balance it. We opted for the second strategy, which generally offers three alternatives:
− Under-sampling: this method balances the data set by eliminating examples of the majority class.
− Over-sampling: this method replicates examples of the minority class in order to achieve a more balanced distribution, either by duplicating data or by generating new synthetic data, for example with the synthetic minority oversampling technique (SMOTE).
− A combination of over- and under-sampling.
Security and privacy recommendation of mobile app for Arabic speaking (Hameed Hussain Almubarak)
Given the difficulty of applying over-sampling, we opted for under-sampling, balancing the data according to the minority class and running performance tests to find the ideal size for the other classes. Technically, we worked with the imbalanced-learn application programming interface (API), which provides the necessary methods to perform under-sampling in several ways; we opted for the random under-sampling technique, fixing the sample size of each class according to the test performance.
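Random under-sampling with a fixed target size per class can be sketched in plain Python as below. This is a stand-in written for illustration, not the authors' code; with imbalanced-learn, the corresponding call would be RandomUnderSampler(sampling_strategy={...}, random_state=0) followed by fit_resample(X, y). The function name, the toy reviews, and the class sizes are hypothetical.

```python
import random
from collections import Counter, defaultdict


def random_under_sample(texts, labels, target_sizes, seed=0):
    """Keep a random subset of at most target_sizes[c] examples of each class c.

    Classes missing from target_sizes are kept in full, as a per-class
    sampling strategy leaves untargeted classes untouched.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for text, label in zip(texts, labels):
        by_class[label].append(text)

    sampled_texts, sampled_labels = [], []
    for label, items in by_class.items():
        size = min(target_sizes.get(label, len(items)), len(items))
        for text in rng.sample(items, size):
            sampled_texts.append(text)
            sampled_labels.append(label)
    return sampled_texts, sampled_labels


if __name__ == "__main__":
    # Toy, hypothetical data: class 0 ('not categorized') dominates.
    texts = [f"review {i}" for i in range(100)]
    labels = [0] * 80 + [1] * 15 + [5] * 5
    X_bal, y_bal = random_under_sample(texts, labels, {0: 10, 1: 10})
    print(Counter(y_bal))  # class 0: 10, class 1: 10, class 5: 5
```

Fixing the seed makes each candidate class size reproducible, which matters when the sizes are tuned against test performance as described above.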

Feature extraction
The reviews must be parsed to extract words, a step known as tokenization. Then, the words need to be encoded as integers or floating-point values for use as input to a machine-learning algorithm, a step known as feature extraction (or vectorization). For this step, we used the 'TfidfVectorizer' class, which converts a collection of raw documents to a matrix of TF-IDF features. The term frequency/inverse document frequency (TF-IDF) model learns a vocabulary from all of the documents, then models each document by calculating, for each of its words, a numerical statistic that reflects how important the word is to the document.
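For reference, with scikit-learn's default settings (smooth_idf=True, norm='l2', sublinear_tf=False), the TF-IDF weight of a term t in a document d, among n documents, is computed as follows, where tf(t, d) is the raw count of t in d, df(t) is the number of documents containing t, and v_d is the resulting vector of document d:

```latex
\mathrm{idf}(t) = \ln\frac{1 + n}{1 + \mathrm{df}(t)} + 1,
\qquad
\mathrm{tfidf}(t, d) = \mathrm{tf}(t, d)\cdot \mathrm{idf}(t),
\qquad
\mathbf{v}_d \leftarrow \frac{\mathbf{v}_d}{\lVert \mathbf{v}_d \rVert_2}
```

For example, with n = 3 reviews, a term occurring in only one of them gets idf = ln(4/2) + 1 ≈ 1.69, whereas a term occurring in all three gets idf = ln(4/4) + 1 = 1, so rarer terms weigh more before each review vector is scaled to unit length.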
Note that this class offers options to limit the number of features: 'max_features' keeps only the most frequent terms across the corpus, 'min_df' ignores terms whose document frequency is strictly lower than the given threshold, 'max_df' ignores terms whose document frequency is strictly higher than the given threshold, and 'stop_words' ignores stop words. Given the specificity of the Arabic language and the presence of two- and three-word entries in the keyword dictionary, we set the parameter 'ngram_range' to the tuple (1, 3), the lower and upper boundaries of the range of n-values for the n-grams to be extracted, so that all values of n such that 1 ≤ n ≤ 3 are used.
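A TfidfVectorizer configured as described can be sketched as follows. The paper does not report the values it used for max_features, min_df, or max_df, so the values below are merely scikit-learn's defaults written out as placeholders, and the three toy reviews are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy reviews, assumed to be already pre-processed.
reviews = [
    "التطبيق يسرق البيانات الشخصية",
    "التطبيق امن وممتاز",
    "يسرق البيانات الشخصية بدون اذن",
]

vectorizer = TfidfVectorizer(
    ngram_range=(1, 3),  # unigrams, bigrams and trigrams, as in the text
    min_df=1,            # placeholder: keep terms found in at least one review
    max_df=1.0,          # placeholder: keep terms whatever their document frequency
    max_features=None,   # placeholder: no cap on the vocabulary size
    stop_words=None,     # a custom Arabic stop-word list could be passed here
)
X = vectorizer.fit_transform(reviews)  # sparse matrix, one row per review

print(X.shape)
print("يسرق البيانات الشخصية" in vectorizer.vocabulary_)
```

Because the trigram "يسرق البيانات الشخصية" ('steals personal data') becomes a feature of its own, a three-word dictionary entry can be matched as a unit rather than as three unrelated words.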
This method represents the reviews by n-grams. An n-gram is a sequence of n consecutive words (in our case). The text is split into sequences of n words by moving a window forward one word at a time. This technique has several advantages: the n-grams automatically capture the roots of the most frequent words without a dedicated step for searching lexical roots, and the approach is independent of the language. Spaces are taken into account, since ignoring them introduces noise.
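The sliding-window extraction of word n-grams can be written in a few lines of Python. The helper name word_ngrams and the three-word example ('there is no security') are hypothetical; applied to a token list, the helper yields the same set of 1- to 3-grams that ngram_range=(1, 3) considers for that list.

```python
def word_ngrams(tokens: list[str], n_min: int = 1, n_max: int = 3) -> list[str]:
    """Return every word n-gram with n_min <= n <= n_max, produced by sliding
    a window of n words forward one word at a time."""
    grams = []
    for n in range(n_min, n_max + 1):
        for start in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[start:start + n]))
    return grams


if __name__ == "__main__":
    # 3 unigrams + 2 bigrams + 1 trigram = 6 n-grams
    print(word_ngrams(["لا", "يوجد", "امان"]))
```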

Classification
The classification of texts requires choosing a learning technique (or classifier). Some of the most commonly used learning methods are naive Bayes, support vector machines, k-nearest neighbors, and decision trees. Usually, the choice of classifier depends on the end goal. If the end goal is, for example, to provide an explanation or a rationale to be presented to a decision-maker or an expert, then methods that produce understandable models, such as decision trees, are preferred. Nevertheless, there is no substitute for experiments to find out which classifier suits which situation. In our case, we tested three learning techniques:
− Random Forest Classifier (random forest classifier): 'Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error for forests converges a.s. to a limit as the number of trees in the forest becomes large' [21], [22].
− Linear SVC (linear support vector classification): a faster implementation of support vector classification (SVC) for the case of a linear kernel. LinearSVC implements the 'one-vs-the-rest' multiclass strategy.
− Multinomial NB (naive Bayes classifier for multinomial models): this implements the naive Bayes algorithm for multinomially distributed data and is one of the two classic naive Bayes variants used in text classification (where the data are typically represented as word vector counts, although TF-IDF vectors are also known to work well in practice).
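The comparison of the three classifiers can be sketched with scikit-learn pipelines that share the same TF-IDF features. The eight training reviews and two test reviews are hypothetical toy data with only two classes (1 = security-related, 0 = not categorized), so the printed accuracies illustrate the procedure only and say nothing about the scores reported in Table 4.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy data: 1 = security-related review, 0 = not categorized.
train_texts = [
    "تم اختراق حسابي بعد التحديث",
    "التطبيق فيه فيروس خطير",
    "اختراق البيانات في هذا التطبيق",
    "فيروس في التطبيق وسرقة الحساب",
    "لعبة ممتعة جدا",
    "التصميم جميل والالوان رائعة",
    "لعبة رائعة وممتعة",
    "التصميم سهل وجميل",
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]
test_texts = ["حسابي تعرض للاختراق", "لعبة جميلة"]
test_labels = [1, 0]

classifiers = {
    "Random Forest Classifier": RandomForestClassifier(n_estimators=100, random_state=0),
    "Linear SVC": LinearSVC(),
    "Multinomial NB": MultinomialNB(),
}

scores = {}
for name, classifier in classifiers.items():
    # Every classifier receives the same TF-IDF features (1- to 3-grams).
    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 3)), classifier)
    model.fit(train_texts, train_labels)
    scores[name] = accuracy_score(test_labels, model.predict(test_texts))

for name, score in scores.items():
    print(f"{name}: accuracy = {score:.2f}")
```

Wrapping the vectorizer and the classifier in one pipeline ensures the vocabulary is learned only from the training reviews, which keeps the comparison fair across classifiers.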

TEST AND EVALUATION OF TRAINING PROCESS
The experimental evaluation of classifiers is the last step of the training process. It usually attempts to assess the effectiveness of a classifier, namely its ability to make correct classification decisions. There are numerous measures for this, each highlighting a particular property of the system. We retained the following, most widely used measures: i) recall, which is synonymous with the true acceptance rate; ii) precision, which measures the rate of correct answers among positive answers; iii) the F1-score, which synthesizes the first two; and iv) accuracy, which represents the proportion of correctly predicted data out of all the data. Consider the following notations [23]:
− TP (true positive): the number of documents correctly attributed to a class,
− FN (false negative): the number of documents belonging to a class that were incorrectly rejected from it,
− FP (false positive): the number of documents incorrectly attributed to a class, and
− TN (true negative): the number of documents correctly rejected from a class.

$$\mathrm{recall} = \frac{TP}{TP + FN}$$
$$\mathrm{precision} = \frac{TP}{TP + FP}$$
These equations compute the recall and the precision [24], [25]. An experimental comparative study of these three classifiers was carried out, and the performance scores of the security and privacy models are given in Table 4. Based on these comparisons, the model based on LinearSVC was generated and exported to the application (website).
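The four measures can be computed directly from the counts TP, FP, FN, and TN. The function name and the counts in the example are hypothetical (they are not taken from Table 4); the F1-score is the harmonic mean of precision and recall, and accuracy is (TP + TN) divided by the total number of documents.

```python
def classification_scores(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Recall, precision, F1-score and accuracy from the four counts of one class."""
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"recall": recall, "precision": precision, "f1": f1, "accuracy": accuracy}


if __name__ == "__main__":
    # Hypothetical counts for one class, not taken from Table 4.
    scores = classification_scores(tp=40, fp=10, fn=20, tn=130)
    for name, value in scores.items():
        print(f"{name}: {value:.3f}")
    # prints recall: 0.667, precision: 0.800, f1: 0.727, accuracy: 0.850 (one per line)
```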

Challenges
Several challenges were met throughout this work. The first concerns the collection of reviews and the extraction of the keywords constituting the dictionary. We therefore wrote a Python script using Google Play Scraper to extract reviews from the Google Play Store, which allowed us to automate this collection. After studying the result, we noticed some problems; for instance, reviews are generally written in very varied dialects depending on the region, with no respect for the lexical or grammatical rules of the Arabic language, which forced us to rule out any sort of classic pre-processing on these reviews. The most delicate step was the labelling of the reviews: this step is crucial to learning and is usually carried out by an expert, requiring a huge amount of time, so we chose to automate it using the term-matching technique.
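The collection script itself is not included in the paper. The sketch below assumes the third-party google-play-scraper package, whose reviews() function returns a list of dictionaries (with keys such as 'content' and 'at') together with a continuation token. The country code "sa", the download size of 2,000 reviews, the 50% Arabic-letter threshold, and all function names are assumptions. Only fetch_reviews needs the package and network access; the filtering helpers are plain Python.

```python
import re

ARABIC_LETTER = re.compile(r"[\u0600-\u06FF]")  # basic Arabic Unicode block


def is_arabic(text: str, min_ratio: float = 0.5) -> bool:
    """Heuristic: a review counts as Arabic when at least min_ratio of its
    letters belong to the Arabic block (assumed threshold)."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return False
    arabic = sum(1 for c in letters if ARABIC_LETTER.match(c))
    return arabic / len(letters) >= min_ratio


def latest_arabic_reviews(raw_reviews: list[dict], limit: int = 500) -> list[dict]:
    """Keep the `limit` most recent Arabic reviews (dicts with 'content' and 'at')."""
    arabic = [r for r in raw_reviews if r.get("content") and is_arabic(r["content"])]
    arabic.sort(key=lambda r: r["at"], reverse=True)
    return arabic[:limit]


def fetch_reviews(app_id: str, count: int = 2000) -> list[dict]:
    """Download recent reviews of one app (needs google-play-scraper and network)."""
    from google_play_scraper import Sort, reviews  # imported lazily on purpose

    result, _continuation_token = reviews(
        app_id, lang="ar", country="sa", sort=Sort.NEWEST, count=count
    )
    return result
```

A typical call would be latest_arabic_reviews(fetch_reviews(app_id)) for each app identifier, with the app list itself left to the collection campaign.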
The second challenge concerned understanding the machine-learning world, with all the details of supervised and unsupervised learning, classification algorithms, working with unbalanced data, evaluating a classifier, and generating a model that we could integrate into our recommender system. The third challenge was the creation of the website, which should showcase our machine-learning model and take advantage of the results obtained. We chose a Python web-development framework capable of using the generated models natively; after comparing Django and Flask, we chose the latter.
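Exporting a trained model to the web application can be sketched with joblib, which scikit-learn installs as a dependency. Saving the whole pipeline keeps the TF-IDF vocabulary and the LinearSVC weights together, so the website applies exactly the features the model was trained on. The file name, the toy data, and the idea of loading the model once when the Flask application starts are assumptions for illustration.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy training data for the security model.
texts = ["تم اختراق حسابي", "فيروس في التطبيق", "لعبة ممتعة", "تصميم جميل"]
labels = [1, 1, 0, 0]

security_model = make_pipeline(TfidfVectorizer(ngram_range=(1, 3)), LinearSVC())
security_model.fit(texts, labels)

# Offline: save the whole pipeline (vectorizer + classifier) to one file.
model_path = os.path.join(tempfile.mkdtemp(), "security_model.joblib")  # hypothetical name
joblib.dump(security_model, model_path)

# In the web application (e.g., when the Flask app starts): load the model
# once and reuse it for every incoming request.
loaded_model = joblib.load(model_path)
print(loaded_model.predict(["اختراق الحساب"]))
```

The same pattern would apply to the privacy model, stored in a second file.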

RECOMMENDER SYSTEM'S WEB SITE
A website serves as the interface of the recommender system, allowing users to check the level of security and privacy of an app. The website has two parts, as shown in Figure 9: i) a keyword field, where the user enters the main word of the app to search for; and ii) a search button, which runs the search engine to find all apps related to that word. Users can search for any application to make sure it is safe and will protect their data. Here, we ran a simple experiment with the search engine on the social media category. First, we selected the IMO app to check its level of security and privacy. In the second step, the recommender system gathers all applications linked to this keyword (IMO), as shown in Figure 10. The mean sums the reviews found in the classes and divides the total by the number of these classes. Here is a simple example, to clarify:

Figure 7. Number of users for the security class
Figure 8.

Figure 9. Interface of the recommender system

Figure 10.Result of the recommender system's search

Figure 11.The recommender's result for social media

Table 4. Result of experiment