Security techniques for intelligent spam sensing and anomaly detection in online social platforms

Received Oct 23, 2018; Revised Aug 21, 2019; Accepted Aug 30, 2019
Recent advances in communication and mobile technologies have made it easier for most people worldwide to access and share information. Among the most powerful information-spreading platforms are Online Social Networks (OSNs), which allow Internet-connected users to share information such as instant messages, tweets, photos, and videos. In addition, many governmental and private institutions use OSNs such as Twitter for official announcements. Consequently, there is a pressing need to provide an adequate level of security for OSN users. However, this is challenging because of the different protocols and the variety of mobile apps used to access OSNs. Traditional security techniques therefore fail to provide the needed security and privacy, and more intelligence is required. Computational intelligence adds high-speed computation, fault tolerance, adaptability, and error resilience when used to ensure security in OSN apps. This research provides a comprehensive survey of related work and investigates the application of artificial neural networks to intrusion detection systems and spam filtering for OSNs. In addition, we use the concept of social graphs and weighted cliques to detect suspicious behavior of certain online groups and to prevent further planned actions, such as cyber/terrorist attacks, before they happen.


CURRENT MACHINE LEARNING SOLUTIONS FOR INTRUSION AND SPAM DETECTION OVER OSNs
Today, Facebook, Twitter, Instagram, Google+, Snapchat, and other social media networks shape our social lives by letting us stay effortlessly connected with friends and family. Nevertheless, OSN users need to make a conscious decision about every piece of personal information they share because of the aforementioned threats and intrusions. In addition, every social network has its own privacy and security settings that govern the online experience and protect users' information [26]. These settings are adjusted through collaboration between academic researchers, security companies, and delegates of each social media network to cope with current security and privacy threats. Therefore, the security and privacy settings must be updated and enhanced on a regular basis.
The operators of social networks have improved authentication by adding new options to prevent possible threats and attacks. These include two- or three-factor authentication, such as using mobile phone numbers to verify accounts. In addition, tokens or one-time passwords are sent to users to verify their social network accounts [27].
On the other hand, social network operators allow users to configure their own privacy settings, such as limiting who can see their profiles, limiting who can contact them, and blocking certain users. To automate future protection, data analysis and classification techniques are required [28]. Further protection can include options for reporting abuse, spam messages, and privacy policy violations. Moreover, efficient implementations of cryptographic algorithms can be used to provide confidentiality for users' data [29]. The authors in [30] proposed an efficient programmable implementation of elliptic curve cryptography, which is considered a very secure cryptosystem. In a similar context, the authors in [31] proposed a scalable crypto processor that can provide confidentiality for different applications with different operand sizes.
Information security companies also play a significant role in providing better protection by developing security tools such as the ZoneAlarm SocialGuard software, which offers a high level of protection from strangers and dangerous links on Facebook. Several researchers have investigated new solutions for intrusion detection and threat prevention in social networks. Stringhini et al. [32] developed a technique to detect spammers in social networks and showed that it is possible to automatically identify the accounts that spammers use. During their study, the research team collaborated with Twitter and, using their technique, detected and deleted around 16,000 spammer profiles. Gao et al. [33] proposed an online spam-filtering system that inspects messages sent by users in real time before they reach the recipients. The authors suggested reconstructing spam messages and classifying them into campaigns, so that messages are examined as campaigns rather than individually. Fake news is also considered spam data and, unfortunately, it can spread over OSNs very quickly. Many previous works have discussed the wide spread of fake news over social networks and proposed fake news detection solutions that apply machine learning techniques to OSNs.
The authors in [34] identified two opposite ways to detect fake news: human intervention and algorithms. The first approach depends on people to flag fake news, including users and fact checkers from media organizations such as the Washington Post, Snopes.com, and the French newspaper Le Monde, whose specialized fact-checking unit developed the web extension Decodex. The second approach uses algorithms to validate information sources and identify fake content. In their opinion, this approach has not yet gained the robustness needed to accurately verify which information is false and which is not.
Machine learning has been used to detect fake news based on news content and social context features [35]. The existing Boolean crowdsourcing algorithms work well when classifying posts whose number of social interactions is above a certain threshold, but their performance may drop when the social interactions fall below that threshold. For this reason, the authors proposed using content-based methods as well. The paper combined content-based and social-based approaches by computing a score and classifying posts whose score exceeds a threshold λ. The score depends only on social interactions, i.e., the number of likes and shares on Facebook, retweets and follows on Twitter, etc.
Social spammers change their spamming strategies to trick deployed anti-spamming systems, which creates the need for more efficient anti-spamming techniques to protect social network users. The authors in [36] indicated that social bots can be used to populate social systems. In most cases, social bots serve useful purposes, but in other cases they can be very harmful by deceiving OSN users. For example, they can be used to influence elections, tamper with the stock market, and spread fake news to serve a certain agenda. The authors also mentioned that many systems have been proposed to detect social bots, including crowdsourcing strategies, feature-based supervised learning, and hybrid systems.

Systems that depend on machine learning techniques are arguably the best candidates for ensuring spam-free social networks [37]. Machine learning techniques rely on eliciting knowledge from previously sent spam items and then using the acquired information to predict the behavior of newly received spam and classify it. The authors in [38] developed an efficient classifier that predicts and detects spammers' actions using feature relevance analysis on social networks. Zheng et al. [19] proposed an effective spammer detection system based on a supervised machine learning solution. This system considers users' content and behavior features and feeds them into a support vector machine (SVM) for spammer classification. According to the experiments, the system showed excellent performance. Suganya and Hemalatha [39] combined users' content and behavior features with machine learning to implement a spam classification method, and their experiments showed interesting results. Hwa et al. [40] focused on spam sent using sets of thousands of fake accounts. The authors provided a machine learning pipeline that classifies fake accounts into clusters according to their actors.
Fahim, Mutahira, and Naseem [41] examined the reasons behind the behavior of Facebook spammers. They also proposed a methodology to filter Facebook spam that uses an Artificial Neural Network to detect each unusual action that may lead to sharing or posting spam by studying the behavior of all friends. Assuming that social networks make our social lives simpler without worries about privacy and security, the authors of [42] discussed the major role that machine learning techniques play in OSN privacy. They focused especially on Artificial Neural Networks and Genetic Algorithms, as both show additional intelligence and more accurate prediction.

ARTIFICIAL NEURAL NETWORK
The Artificial Neural Network (ANN) is a branch of Artificial Intelligence (AI) that is based on the neural structure of the human brain [43]. An ANN aims to convert a specific input into a meaningful output using hidden artificial neurons, which are the main processing elements of an ANN and can be used to develop many applications [44]. Each neuron is programmed to perform a specific operation, according to which data flow from one neuron to another. These neurons are organized in layers through which the input data move until the output is produced. In other words, the output is an emergent result of the operations performed by every neuron that the data reach [45]. Building an ANN model to solve a specific problem requires extensive knowledge of the problem, of ANNs themselves, and of the working plan [45].
Each ANN model has an input (the data to be processed), an output (the resulting information), neurons, and weights for them. The model also contains the operations (mathematical functions) that determine which neurons need to be activated [46]. A high weight on a neuron indicates strong data to be operated on. By setting the weight of every neuron with an algorithm implemented specifically for this purpose, the network produces the output for a specific input [46]. Almost all ANNs have the same structure, illustrated in Figure 1 and the figure that follows it. Artificial Neural Networks are more than artificial neurons grouped into layers and connected through communication lines. According to Figure 1, there are three kinds of neurons: neurons that receive input from the real world, neurons that send output to a secondary processing and control system, and neurons that are hidden from view. The neurons are distributed into several layers. Each neuron in the hidden layer receives input from all input neurons and, after performing its function, sends output to all output neurons. In addition, there are three types of communication lines that connect neurons together.
Some connections cause the summing mechanism of the next neuron to add, while others cause it to subtract. Some ANNs have a third type of connection, called feedback connection lines, which route the output of the output layer back to the hidden layer, as can be seen in Figure 3.
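The layered structure described above can be illustrated with a minimal sketch of one forward pass through a feed-forward network (input, hidden, and output layers, without feedback connections). The sigmoid activation, the layer sizes, and all weight values are assumptions chosen for illustration; the source does not specify them.

```python
import math

def sigmoid(x):
    """Squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, hidden_weights, output_weights):
    """Run one forward pass: input layer -> hidden layer -> output layer.

    hidden_weights[j][i] is the weight from input neuron i to hidden neuron j.
    output_weights[k][j] is the weight from hidden neuron j to output neuron k.
    A positive weight makes the next neuron's sum add; a negative one subtracts.
    """
    hidden = [sigmoid(sum(w * x for w, x in zip(row, inputs)))
              for row in hidden_weights]
    return [sigmoid(sum(w * h for w, h in zip(row, hidden)))
            for row in output_weights]

# Hypothetical weights: 2 input neurons, 2 hidden neurons, 1 output neuron.
hidden_weights = [[0.5, -0.4], [0.3, 0.8]]
output_weights = [[1.0, -1.0]]

outputs = forward([1.0, 0.0], hidden_weights, output_weights)
print(outputs)
```

Every hidden neuron here receives all inputs and feeds every output neuron, which matches the fully connected arrangement in the text. Training would adjust the weight values, which this sketch keeps fixed.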
After an ANN has been structured for a specific application, the network begins to learn. Training follows one of two approaches. The first is supervised training, in which we supply the network with the desired output, either by rating the performance of the network or by providing the output along with its input. In the second approach, unsupervised training, no outside help is provided, and the network must make sense of the input according to specific characteristics. Artificial Neural Networks have been applied successfully in several real-world fields, for example finance (e.g., credit rating), medicine (e.g., patient diagnosis), industry (e.g., process and quality control), and science (e.g., character recognition) [47]. Moreover, ANNs can be applied in education (e.g., teaching neural networks), energy (e.g., electrical load and demand forecasting), and other miscellaneous fields [48].

Artificial Neural Networks have recently been used in intrusion detection systems, threat prevention systems, and spam prediction systems. However, little work has been done on using them for security concerns in OSNs. We believe that it is interesting and valuable to study the applicability of ANN-based security applications for ensuring the required security and privacy in OSNs. In the field of OSN security and privacy, ANNs are utilized in two main ways: distinguishing normal accounts from spam accounts and designing detection features [49]. In both cases, ANN-based security systems need to be updated frequently.

ARTIFICIAL NEURAL NETWORK INTRUSION DETECTION AND SPAM FILTERING OVER SOCIAL NETWORKS
According to [50], detecting spam emails and detecting spam posts on social networks can benefit from the same techniques because of the striking similarities between them. The malicious community on Online Social Networks, represented by spammers, is becoming more dangerous. A proposed but not yet fully explored strategy is to structure an Artificial Neural Network for spam detection over social networks. In general, approximating specific functions with an ANN involves difficulties in setting up its structure, deciding on the number of hidden nodes, and dealing with complex parameters such as connection weights and learning rates.
To overcome such difficulties, ANNs are applied together with a Genetic Algorithm (GA) to enhance the performance of spam detection and classification [51]. The authors in [51] combined ANN and GA into a new hybrid algorithm that outperforms the conventional ANN. Given its improvement in spam detection accuracy, the proposed hybrid algorithm can be implemented to detect spam messages on OSNs. The authors of [52] proposed a system that focuses on the main body of the message and checks it word by word using an ANN. Each word in the message is given a specific weight based on its probability of being a spam word. According to these weights, the message is blacklisted or whitelisted. Blacklisted messages are those sent from domains associated with spammers, whereas whitelisted messages are those sent from trusted domains.
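The word-weighting idea attributed to [52] can be sketched as follows. This is not the ANN from [52]: it only illustrates the final step, in which per-word weights are combined into a message score and compared with a threshold. The weight table, the neutral default weight, the averaging rule, and the threshold value are all hypothetical.

```python
import re

# Hypothetical per-word spam weights in [0, 1]. In a real system these
# would be produced by a trained model rather than written by hand.
SPAM_WEIGHTS = {
    "free": 0.9,
    "winner": 0.95,
    "click": 0.8,
    "meeting": 0.1,
    "thanks": 0.05,
}
DEFAULT_WEIGHT = 0.5  # assumed neutral weight for words not in the table
THRESHOLD = 0.6       # assumed decision threshold

def message_score(text):
    """Average the spam weights of the words in a message."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return DEFAULT_WEIGHT
    return sum(SPAM_WEIGHTS.get(w, DEFAULT_WEIGHT) for w in words) / len(words)

def classify(text):
    """Blacklist a message whose average word weight exceeds the threshold."""
    return "blacklist" if message_score(text) > THRESHOLD else "whitelist"

print(classify("Click free winner"))
print(classify("Thanks, meeting"))
```

Averaging keeps the score independent of message length. Other aggregation rules, such as taking the maximum word weight, would be equally plausible under these assumptions.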
In addition to distinguishing legitimate (ham) messages from spam messages, this research develops a technique that uses Optical Character Recognition (OCR) tools to extract spam messages embedded in images. Both text spam detection and detection of spam in text extracted from images are important for social networks, because the same types of spam are spread through them. Applying such a system in OSN spam detection suites would help reduce wasted time and memory and would protect personal data from being harmed by the spread of spam.
The work in [53] considered the task of Text Classification (TC) of spam messages. The authors proposed an anti-spam filtering system that uses an ANN for multilayer protection and a Genetic Algorithm to train the protection system. The system showed a high level of accuracy in distinguishing ham messages from spam messages. From this research, we infer that the subject and body fields always contain specific indications that ease the process of distinguishing ham from spam. In addition, this system showed that 15-30 hidden neurons are enough to process and classify easy messages. In social media, this system could be used to detect spam messages if the problem of long detection time were solved.
Another hybrid ANN was proposed in [54]. The authors used Radial Basis Function Neural Networks (RBFNN) together with Particle Swarm Optimization (PSO) to reach better accuracy and effectiveness. This method is well suited to OSNs because it uses an improved network architecture and learning algorithm. In [55], the authors proposed another multilayer ANN method, which they called "antidote." This method is distinctive because it is designed to serve each user individually: it uses the user's chosen parameters to set up an appropriate multilayer ANN, according to which messages are classified as spam or legitimate. This system can be applied to social media security suites because of its flexibility and short learning time.
ANNs have great potential in OSN intrusion detection; unfortunately, they have not been fully investigated in the literature. In general, intrusion detection systems (IDSs) can catch misuse and stop it from causing damage. For the same reasons, ANN-based intrusion detection methods can be applied, with some modifications, to social network platforms. Al-Jarrah and Arafat [56] used a Time Delay Dynamic Artificial Neural Network (TDDNN) to identify each attack behavior. They designed their system to generate alerts when the ANN classifier recognizes an attack. Producing the attack features takes a short and constant time, from recognizing the presence of the attack to generating the alert. Because of its fast intrusion recognition, this system is well matched to the security and privacy needs of social networks, especially because OSN attackers are very aggressive.
Qiu and Shan [57] used a multiply swarm optimization-back propagation (MPSO-BP) neural network for their proposed intrusion detection model. The PSO algorithm is used to optimize the parameters of the back propagation ANN. The proposed model showed improved intrusion detection rates in comparison with the PSO-BP neural network and the BP neural network. This model is suitable for social network platforms because it can handle significant amounts of data simultaneously, in addition to its independent learning and regular database updates.
In [58], the authors proposed an Anomaly Detection System based on a supervised back propagation ANN. Their system aims to catch all attempted anomalies and keep all data completely safe, and it concentrates on hierarchical anomaly IDS. The proposed system showed higher accuracy, efficiency, and performance because it uses only 17 KDD 99 features. The feature reduction technique they followed is appropriate for social network platforms, as accuracy reaches 98% and training and testing time is reduced to a minimum.
In [59], the authors compared the accuracy achieved by applying several methods to an Anomaly Detection System. The methods they studied are a Genetic Algorithm with Artificial Neural Network (GA-ANN) classifier that used 18 features, Modified Mutual Information Feature Selection (MMIFS) with 24 features, Linear Correlation Feature Selection (LCFS) with 21 features, and Forward Feature Selection (FFS) with 31 features. According to their study, the GA-ANN classifier raised anomaly detection accuracy to a maximum of 99%, making it an excellent candidate for social network platforms.

DETECTION OF MALICIOUS ACTIVITIES AND COMMUNITIES' BEHAVIOR OVER OSNs
In general, OSN malicious communities share many of the following distinguishing observations:
a. Social media spammers are the most perspicacious among all kinds of spammers.
b. 40% of all social media accounts are marked as spam accounts.
c. Nowadays, most malicious content is sent and shared by automated spamming tools. Such tools send spam efficiently, especially when targeting groups of users.
d. Spam accounts tend to be connected (as friends or by following each other), because they usually send follow and friendship requests with no specific consideration of the quality of the accounts they contact.
e. Spamming accounts tend to accept all friend requests they receive and follow back all accounts that follow them.
f. When there is a specific inner relationship between spamming accounts over social networks, they can be exposed easily.
g. Spamming accounts share topics that attract the targeted victims, and these topics are usually similar across most spamming accounts. One interesting topic is faking celebrities' accounts.
h. Malicious accounts tend to stay active for protracted periods and keep causing damage as long as they are active.
i. Spammers search all popular accounts to reach the private information of their followers or friends and use it in their crimes.

Clique-based detection methodology
An OSN user is usually represented by an account or a profile. The profile describes the user's social attributes, which include his or her name, list of contacts, and hobbies or interests. There are two types of relationships between accounts: one-sided, as in Twitter, or two-sided, as in Facebook friendships. As mentioned earlier, OSN users can share videos, photos, locations, and even more personal information such as birthdays and phone numbers. It is worth pointing out that, even though Facebook and Twitter are among the most popular OSNs, there are other online social platforms such as Google+ and LinkedIn. Such networks help users from different geographical areas to stay connected. Moreover, they allow their users to establish new relations with people all over the world who have similar interests, such as the same profession or hobbies.
Online social networks can be represented as a social graph. Assume that the group of users within a certain social network consists of three users, represented as nodes {A, B, C} on that network. The relationships between these users are represented by edges. This social graph can be used to identify "ties" between the users (nodes). These ties can reveal common details and interests of the group members, such as their gender, personal interests, sports, education level, and many other important details. Technically speaking, groups over online social networks in which all users are friends with one another are sometimes called cliques. Figure 4 illustrates the concept of cliques. Figure 5 shows the same users within the same group represented as nodes on the social graph, but they do not form a clique. As can be seen from the figure, the users C, D, and E are not friends with each other. Users of online social networks spend a good amount of their time socializing with friends and with people who share their interests. There is a high possibility that some malicious behavior, thoughts, or posts might take place within these socialization activities. This is where social graphs play an important role: they can be used to build a trustworthiness model based on the different social activities that take place between users within the same group [60]. It is believed that social graphs can be constructed, especially between cliques, to capture different types of social activities. These activities range from simply retweeting to posting new tweets [61]. Activities to be monitored and captured also include sharing false information and viral images and videos on Facebook and other social platforms.
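As a minimal sketch of the clique notion described above, a social graph can be stored as a mapping from each user to the set of that user's friends, and a group is a clique when every pair of its members is connected. The friendship edges below are hypothetical and are not taken from Figure 4 or Figure 5; they are chosen only so that {A, B, C} forms a clique while {C, D, E} does not.

```python
from itertools import combinations

def build_graph(edges):
    """Build an undirected social graph as {user: set of friends}."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph

def is_clique(graph, users):
    """Return True if every pair of the given users are friends."""
    return all(v in graph.get(u, set()) for u, v in combinations(users, 2))

# Hypothetical two-sided friendships between five users.
edges = [("A", "B"), ("A", "C"), ("B", "C"),
         ("A", "D"), ("A", "E"), ("B", "D")]
graph = build_graph(edges)

print(is_clique(graph, ["A", "B", "C"]))  # all three pairs are friends
print(is_clique(graph, ["C", "D", "E"]))  # none of these pairs are friends
```

A one-sided relationship, as in Twitter following, would need a directed graph instead; this sketch assumes two-sided friendships.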
Using social graphs, we can propose many levels of trustworthiness. Malicious activities can then be detected based on their level of trustworthiness. In other words, the users and their social relationships and activities over the online social networks are grouped as distinguished entities. By measuring how trustworthy each social activity is, we can distinguish legitimate activities from malicious activities within the clique network. Figure 6 shows the different methods used to detect malicious activities over online social networks, including the social graph. The second detection approach uses machine learning techniques, which can be classified into supervised and unsupervised methods. The difference between the two categories is that supervised methods use a prepared (labeled) set of data to train the model before prediction, whereas unsupervised methods use no labeled training data. Examples of the first category include regression models, support vector machines (SVMs), and decision tree models. Examples of unsupervised learning include clustering algorithms and hidden Markov models. Another machine learning category combines the two above and is therefore called semi-supervised learning.
The third technique for detecting malicious activities is manual verification. The content should be analyzed to determine whether it is fake or legitimate; therefore, it is the task of OSN users to double-check content before spreading it to other users. As part of the manual verification process, some online social platforms allow their users to report content that violates their privacy rules. Such content might include malware links, spam, and aggressive or abusive content. Moreover, many social networks ask users for feedback to enhance their privacy policies.

Results and discussions
For each entity in the social graph, a trustworthiness score is calculated. If that score is below a certain threshold, we can predict with high confidence that the activities associated with this entity are malicious and cannot be trusted. One main property of the social graph is the diameter, defined as the greatest distance between any two nodes on the graph. To find the diameter, we first find the shortest path between every pair of nodes in the social graph and then take the maximum length over all these shortest paths. Let len(x, y) denote the length of the shortest path between two nodes x and y on the graph. Then the diameter, diam, is:
$$\mathrm{diam} = \max_{x, y} \mathrm{len}(x, y)$$
where the maximum is taken over all pairs of nodes on the social graph.
Cliques are widely used in graph-based analysis, including social networks, to understand connections and trust relations between graph nodes. As an algorithm, clique finding computes the maximum number of nodes in the graph such that every node in the clique is a friend of all other nodes in the clique. Figure 7 shows the total and weighted clique strength.
Figure 7. Comparison between total and normalized/weighted clique strengths
It is clear that extracting clique relations in OSNs captures the properties of groups and how they interact with each other. Larger cliques may contain smaller cliques, and the researcher's focus should be on finding the maximum clique size. Our main contribution is normalizing clique strengths to give a better assessment of users' trustworthiness and of users' activities and interactions within a clique.
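The diameter definition and the idea of total versus normalized clique strength can be sketched as follows. The diameter is computed with breadth-first search on an unweighted, connected graph. The source does not give the normalization formula, so two assumptions are made here: total strength is the sum of the edge weights inside the clique, and normalized strength is that sum divided by the number of edges in the clique. The graph and the edge weights are hypothetical.

```python
from collections import deque
from itertools import combinations

def shortest_path_lengths(graph, source):
    """Breadth-first search: hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def diameter(graph):
    """Greatest shortest-path length over all node pairs (connected graph)."""
    return max(max(shortest_path_lengths(graph, n).values()) for n in graph)

def clique_strengths(weights, clique):
    """Return (total, normalized) strength of a clique.

    Assumption: total is the sum of edge weights inside the clique and
    normalized is the total divided by the number of edges in the clique.
    """
    pairs = list(combinations(sorted(clique), 2))
    total = sum(weights[frozenset(p)] for p in pairs)
    normalized = total / len(pairs) if pairs else 0.0
    return total, normalized

# Hypothetical social graph: {A, B, C} form a clique, D is a friend of C only.
graph = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
# Hypothetical interaction weights per friendship edge.
weights = {
    frozenset(("A", "B")): 3.0,
    frozenset(("A", "C")): 1.0,
    frozenset(("B", "C")): 2.0,
    frozenset(("C", "D")): 4.0,
}

print(diameter(graph))
print(clique_strengths(weights, {"A", "B", "C"}))
```

Dividing by the number of edges makes cliques of different sizes comparable, which is one plausible reading of "normalized" strength; the normalization actually used for Figure 7 may differ.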

CONCLUSION
In this paper, we addressed the important issue of security and privacy in Online Social Networks. Securing information in OSNs is complicated by the heterogeneous web applications and protocols used and by the variety of mobile platforms, such as Android and iOS, used to access OSNs. We investigated the need to apply Computational Intelligence techniques to OSN security because traditional security techniques are not efficient enough to provide complete protection against recent cyber-attacks [63]. This research considered Artificial Neural Networks and Machine Learning and provided a comprehensive study and analysis of related work on existing methods for spam and fake news detection. Moreover, we investigated the application of ANNs to intrusion detection systems and spam filtering for OSN platforms.
Finally, we addressed how online social networks can be structured into social graphs, with users represented by nodes on the graph. Moreover, based on the observation that a community or group over an OSN can form an entity together with its social activities, we proposed the trustworthiness principle to identify communities/entities with malicious activities. Additionally, we proposed a new approach that uses the concept of weighted cliques to detect malicious behavior of sub-communities over OSNs. The proposed methodology computes the overall weight of the clique from its individual edges, and it can be used to identify suspicious behavior of certain online groups and to prevent further planned actions, such as cyber/terrorist attacks, before they happen.