Detecting the magnitude of depression in Twitter users using sentiment analysis

ABSTRACT


INTRODUCTION
The growing use of social media by an ever larger section of society greatly increases the possibility of using the internet as a tool to explore the world and express one's opinions. According to a survey done in 2014, 74 percent¹ of people use social networking sites in one way or another. Social networking sites allow users to communicate over the internet and share their views, ideas, and thoughts. Determining what others think has always been an essential part of our information-gathering behavior. With the growing availability of opinion-rich sources such as social networking sites and personal blogs, new opportunities have arisen, as users can now use this information to explore and understand the opinions of others. One of the most widely used social networking sites is Twitter. With its 300 million users², Twitter has become such a large platform for expressing one's views that researchers have begun to use it as a source of data for studying mental health problems. The textual information posted on Twitter is therefore well suited to sentiment analysis.
The sentiments expressed in tweets give an idea of the deeper emotions of the users. Thus, feelings with a negative meaning expressed in tweets may indicate a negative emotion. Sentiment analysis of Twitter data is the extraction of emotions and opinions from the tweets posted by users. It has attracted much attention in recent years because of its significance in various applications: it analyzes such emotions and opinions systematically and offers a robust technology for a number of problem domains, ranging from business intelligence and fake news detection to analyzing a user's emotional state, depression, or stress. Tweets contain specific keywords and sentences, which are analyzed to identify users who are at high risk of depression. Such users display a certain kind of attitude on social media, and this attitude can be analyzed through sentiment analysis. If a user's sentiment intensity stays at low levels, this can indicate that the user has some degree of mental health problem, in other words, depression. In this respect, sentiment analysis is very powerful, especially when the users have been diagnosed with a certain level of depression. Twitter is used as the source of data in this research because the data is publicly available and because the character limit on tweets pushes users to express themselves briefly, using only the most relevant words. The data was extracted using the Twitter API provided to Twitter developer accounts. Python was used to parse the JSON data extracted from Twitter, merge the selected data, and output the formatted data as a CSV file. R was used for base emotion analysis, sentiment calculation, and depression score calculation. The data was visualized using ggplot in R, which provided a better understanding of the final data.
Reece et al. [1] developed computational models to predict the development of depression and Post-Traumatic Stress Disorder (PTSD) in Twitter users. The authors collected Twitter data and details of depression history from 204 individuals (105 depressed and 99 healthy) and extracted predictive features measuring affect, linguistic style, and context from the participants' tweets (N=279,951). They then used supervised learning algorithms with all these features to build models. The resulting models successfully differentiated between users who had posted healthy content and users whose content indicated depression. The models gave results even when the analysis was confined to tweets posted before the first diagnosis of depression. The authors replicated the predictive results with an independent sample of users diagnosed with PTSD. Their methods suggest a data-driven predictive approach for the early detection of depression. Singh et al. [2] proposed a lexicon-based approach for analyzing the sentiments of comments posted on social media. The approach focused mainly on subjective data acquired from Facebook. The authors' primary focus was on classifying comments as positive, negative, or neutral, which gave a correct measure of the likes and dislikes associated with the posts. The proposed work also helps in securing social media sites such as Facebook to a certain extent.
Singh and Wang [3] focused on predicting depression from the tweets of a Twitter user. They created their own dataset by extracting tweets from different Twitter pages and labeled those tweets using the polarity score obtained from the Python package TextBlob. They then constructed several deep learning models, namely RNN, CNN, and GRU, to make predictions on the dataset. For all the models, they examined the impact of character-based versus word-based models and of pretrained versus learned embeddings. They found that the best-performing models were the word-based GRU, with 98 percent accuracy, and the word-based CNN, with 97 percent accuracy.
Akay et al. [4] proposed a three-step method in which a weighted network model was used to represent the activities of the users, while a network clustering model and a module analysis model were used to characterize the users' interactions over the web and to extract further information from their posts. In addition, the topological properties of the network reflect the users' activities, such as the topic and timing of posts, while the weighted edges reflect the semantic content of the posts and the similarities between them.
Ishtiaq [5] focused on Parts of Speech (POS) tagging, in which the parts of speech were ranked according to their influence in describing sentiment. Sentiment influencers were devised for the parts of speech and ranked according to how they influence the detection of any kind of sentiment. The work showed that good results can be obtained if the POS are ranked appropriately. The author mainly focused on sentiment extraction in the form of positive and negative tweets using a rule-based scoring mechanism whose objective was to rank the sentiment influencers. For this purpose, nouns, verbs, adverbs, and adjectives were considered the main sentiment influencers. These influencers were assigned scores and ranked after a thorough study of the concepts involved. The proposed model was applied to the STSGold dataset, and a confusion matrix was created for three classes, namely positive, negative, and neutral. Most of the tweets were correctly classified by the proposed model. The performance of the model was also measured in terms of precision and recall, which were calculated from the confusion matrix.
Nadeem et al. [6] examined the ability of social media to predict Major Depressive Disorder (MDD) in online users, even before its onset. They used a crowdsourced method to compile a list of Twitter users who are likely to be diagnosed with depression. They also applied a bag-of-words approach to tweets collected over a whole year and then leveraged various statistical classifiers to estimate the risk of depression. They used Decision Trees, a Linear Vector Classifier, and the Naive Bayes algorithm. Devitt et al. [7] provide a basic understanding of sentiment polarity identification. They use a cohesion-based text representation method to compare news stories and group the texts based on polarity. Here polarity direction and intensity are used as evaluation metrics relative to human decisions. They refer to the work of Pang and Visvanathan, who identified polarity based on the presence or absence of particular texts. Their work also relies on polarity measurement based on a lexicon of pre-tagged positive and negative terms, which were used as relative positive and negative emotion descriptors. The intensity was measured based on average human ratings for a specific text, and the relationship between the average rating scale and the real intensity displayed in the text was not always linear. Over-reliance on a hand-coded lexicon also had its drawbacks, as such lexicons can be inconsistent and prone to errors. The use of multiple lexicons from different sources was therefore suggested, so that together they would improve the base polarity values and give a better match with human judgments.
Klenner et al. [8] used an affect lexicon that provided the prior polarity of words. A chunker was then used to determine the words that were relevant for compositional, phrase-level polarity determination. Not all the words in a sentence were used; only specific phrases were used to determine the final polarity. The main problem in identifying the words concerned neutral words and their compositions: a composed phrase could form a positive or a negative term, which the method could overlook. Sentiment analysis using lexicon-based sentiment composition provided a better way to understand polarity and to evaluate sentiment given prior polarity.
Neuman et al. [9] provide a method to screen for depression-related words through proactive metaphorical screening. This method extracts metaphorical relations in which the term depression, or relevant subsets of it, is used and extracts the conceptual domains relating to it. A lexicon was generated from these results and used to automatically evaluate the level of depression in texts, or whether a whole text dealt with depression or not. Although this was a novel approach, the domain-level analysis would prove costly, and an apt measure of depression may not be achieved. The level of depression was also not captured by this method, as it was based on the frequency of words in the texts, which could at times give inconsistent results.
In 2014, Ireland and Mehl [10] discussed the idea, also reported by other authors, that depressed people are in a state of greater self-reflection and tend to focus on and talk about themselves frequently, even during a conversation. Following this idea, first-person singular pronouns were evaluated in text transcripts, and a correlation was indeed found between depression and more frequent use of first-person singular pronouns. Notably, this effect was common across all demographics, including age and gender. However, the use of first-person singular pronouns could only indicate, to an extent, that the person was feeling depressed, i.e., that the sentence had negative polarity; the level or intensity of the polarity was not discussed.
Berengueres and Castro [11] introduced an author sentiment benchmark of emoji and compared it with a reader sentiment benchmark. They found that readers and writers seem to broadly agree on the sentiment of emoji (84 percent), even though the readers' source of comments was tweets and the writers' source of comments was an anonymous work-happiness monitoring app in which comments referred to work suggestions. The largest disagreement between readers and writers occurred for negative emoji (sad face and pouting face). For this group of emoji, the authors report feeling 26 percent worse than what readers perceive, in terms of standard deviation. Emoji use was not found to be correlated with author moodiness. It was found to be correlated with more happiness within users and between users. Their research seems to improve the accuracy of estimating the sentiment of authors when emoji are used.
In The Evolution of Sentiment Analysis [12], the authors presented a computer-assisted literature review, combining automated text clustering with manual qualitative analysis, and a bibliometric study of 6,996 papers on sentiment analysis. They investigated the history of sentiment analysis, evaluated its impact and trends through a citation and bibliometric study, delimited the sentiment analysis communities by finding the most popular publication venues, discovered which research topics have been investigated, and reviewed the most cited original works and literature reviews in the field.
According to Edwards and Holtzman [13], their findings have clear implications for theory, identification, assessment, and treatment. In terms of theory, the notion that self-focus is central to depression receives some modest support. In terms of identification and assessment, they conclude that first-person singular pronouns are indeed a modest linguistic marker of depression. In terms of treatment, their meta-analysis provides confidence in the estimate of the effect size, so that cutting-edge research on linguistic strategies for coping with depression is better informed. Rosa et al. [14] found that sentiment analysis can help monitor a person's mood. People with symptoms of depression show similar behavior, which can be expressed in the phrases they post on social networks. This information helps identify users who have potential psychological disturbances. The solution proposed in their research addresses all these considerations. It states that a recommender system based on the person's mood can be implemented to improve the emotional state of users identified as having symptoms of depression.
Zucco et al. [15] presented a preliminary design of an integrated multimodal system based on Sentiment Analysis (SA) and Affective Computing (AC) methodologies for monitoring depression conditions. Specifically, the paper described the main steps of the SA and AC analysis pipelines and discussed the main challenges in the design and implementation of such a system. Future work will present the final system and the results of testing and validation.
Gratch et al. [16] conducted several clinical interviews designed to support the diagnosis of psychological conditions. These interviews were conducted by humans as well as by an agent named Ellie. This data has been transcribed and is being used in this research. The transcribed data provides a list of words with their term frequency, which tells us which words the users used most. The authors also used multiple clinical methods, such as sensors, the patients' respiration, and electrocardiograms (ECG), to determine the emotional state of the person. Additionally, the automated agents were able to generate logs of the user's speech and provide real-time recognition of the user's voice pattern and use of repetitive words. It was found that people displayed emotions more intensely when interacting with a computer than with a human. The transcripts generated through these interviews will be used in the current research. Although this data has been used and investigated in the past, a new method of identifying the words could avoid the major hindrances that prevent measurement of the user's emotional intensity.
Salimath et al. [17] proposed a method for sentiment analysis of conversations with 189 participants. The dataset consisted of voice recordings, facial expressions, and text. They mainly focused on classifying the patients' conversations as positive, negative, or neutral, thus providing statistics of the likes and dislikes related to a discussion.

RESEARCH METHOD
A common thread in much of the existing research on depression is that the researchers drew on limited sources, such as online forums, and did not use every word in the text. This resulted in incomplete knowledge and less accurate detection of depression in patients. To overcome this, appropriate text mining techniques are needed that can analyze all the words in a sentence and screen them for the presence of depression.
The research problem investigated in this paper is: "Does there exist a method to evaluate the magnitude of depression in a person based on the emotional integrity of their tweets?" To answer this question, different emotions are evaluated and a weighted score is calculated for each tweet. These scores are averaged over a single day, thereby generating the final score, or magnitude. The correlation between the tweets and the magnitude is also explored in order to understand depression better. The primary goal of this paper is to provide a system that can check for depression in Twitter users based on their tweets from the past few months. In this section we discuss how the data was produced and how the depression levels of the users are measured.

Building corpus
In order to answer the research question, we needed to fetch data for specific users who could be identified as depressed. A few keywords were therefore used to fetch a list of users who had used tags such as #abuse, #anxiety, #addict, #addiction, and #bullying. Using these tags, a list of users was curated, and the Twitter API was used to fetch their data from the site. Twitter places a limit of 3200 tweets per user, including retweets, videos, etc. Twitter data was compiled for 52 users, with each user's data averaging around 2000 tweets. The data was further segregated by day so that the variations in depression could be understood.
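The per-day segregation described above could be sketched roughly as follows. This is an illustrative Python sketch only: the record keys `date` and `text` and the ISO 8601 date strings are assumptions made for the example, not the format returned by the Twitter API or the structure used in this work.

```python
from collections import defaultdict
from datetime import datetime


def group_tweets_by_day(tweets):
    """Group tweet texts by calendar day.

    `tweets` is an iterable of dicts with the hypothetical keys
    "date" (ISO 8601 string) and "text".
    Returns a dict mapping each date to its tweet texts, in date order.
    """
    by_day = defaultdict(list)
    for tweet in tweets:
        day = datetime.fromisoformat(tweet["date"]).date()
        by_day[day].append(tweet["text"])
    return dict(sorted(by_day.items()))


# Made-up sample records, for illustration only.
sample = [
    {"date": "2019-03-02T08:15:00", "text": "second day, first tweet"},
    {"date": "2019-03-01T23:50:00", "text": "first day, only tweet"},
    {"date": "2019-03-02T21:05:00", "text": "second day, second tweet"},
]
daily = group_tweets_by_day(sample)
```

Each key of `daily` then corresponds to one day of tweets for a user, which is the unit on which the later score calculation operates.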

Data pre-processing
The extracted data was raw data in JSON format and was incompatible with our program, so data preprocessing was implemented in three stages.

File Conversion: The JSON data in UTF-8 encoding was first converted to ASCII, so as to remove all Unicode special symbols, which are of no use for depression analysis. This data was then parsed, and only the keys tweet-text, tweet-date, user, tweet-type, and location were selected. The parsed data was then saved as a comma-separated values (CSV) file for easy analysis.

Data Cleaning: The output of the previous step was ASCII-encoded but contained a lot of gibberish, because the Unicode special symbols in the tweets had been converted to ASCII. Symbols such as exclamation marks, punctuation, digits, and special characters were removed. After these were removed, some tweets produced null values, and these tweets were omitted. Text formatting was also removed during this step, and all words were converted to lower case.

Stop Word Removal: The TM library was used for this purpose. The TM library provides a dictionary of stop words, which are the most frequently used words. As the deletion of all stop words would render the depression analysis useless, a join was performed on the stop word dictionary and a list of positive words curated by Minqing Hu and Bing Liu (http://www.cs.uic.edu/liub/FBS/sentimentanalysis.html). The join removed only the positive stop words and retained the others, which could be further used for depression analysis.
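The three preprocessing stages could be sketched in Python as shown below. This is a minimal sketch under several assumptions rather than the implementation used in this work: the input field names `text`, `date`, and `user` are hypothetical, the two word lists are tiny placeholders standing in for the stop word dictionary and the Hu and Liu positive word list, and non-ASCII characters are simply dropped instead of being converted and cleaned up in a later pass.

```python
import csv
import io
import json
import re


def to_ascii(text):
    """Drop every character that has no ASCII representation."""
    return text.encode("ascii", errors="ignore").decode("ascii")


def clean_text(text):
    """Lower-case the text and strip digits, punctuation and symbols."""
    text = to_ascii(text).lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    return " ".join(text.split())


def preprocess(json_lines, stop_words, positive_words):
    """Parse one JSON tweet per line; tweets left empty are omitted."""
    # Only stop words that also occur in the positive list are removed.
    drop = set(stop_words) & set(positive_words)
    rows = []
    for line in json_lines:
        record = json.loads(line)
        cleaned = clean_text(record.get("text", ""))
        tokens = [t for t in cleaned.split() if t not in drop]
        if not tokens:
            continue
        rows.append({
            "user": record.get("user", ""),
            "tweet-date": record.get("date", ""),
            "tweet-text": " ".join(tokens),
        })
    return rows


def to_csv(rows):
    """Serialize the cleaned rows as CSV text."""
    buffer = io.StringIO()
    fields = ["user", "tweet-date", "tweet-text"]
    writer = csv.DictWriter(buffer, fieldnames=fields)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder word lists and made-up tweets, for illustration only.
STOP_WORDS = {"the", "is", "like", "right", "well"}
POSITIVE_WORDS = {"like", "right", "well", "good"}
raw = [
    json.dumps({"user": "u1", "date": "2019-03-01",
                "text": "I feel like nothing is right!!! \u2639 #anxiety 2019"}),
    json.dumps({"user": "u1", "date": "2019-03-02", "text": "!!! 123"}),
]
rows = preprocess(raw, STOP_WORDS, POSITIVE_WORDS)
csv_text = to_csv(rows)
```

In this toy run the second tweet becomes empty after cleaning and is omitted, while the stop word "is" is retained because it does not appear in the positive list.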

Depression magnitude calculation
Base emotion calculation: Using the syuzhet package, base emotions were calculated for each tweet, yielding a list of eight emotions: anger, anticipation, disgust, fear, joy, sadness, surprise, and trust. The resulting list showed how strongly each of these emotions was represented in the tweet. The basic sentiment of the tweet, that is, whether it was positive or negative, was also analyzed.

Sentiment score evaluation: The tweets were tokenized as bigrams and trigrams to further analyze their sentiment scores. Three different lexicons were used for sentiment analysis of the tokenized words, namely AFINN, BING, and NRC. Each of the three provides a sentiment score of a different magnitude. As the tweets were grouped into chunks by date, each day represents a batch of tweets, whose size varies considerably from user to user. Because each lexicon provides its final sentiment score on a different scale, which makes the scores difficult to analyze, the values were normalized to the range -1 to +1 and then averaged to calculate a final sentiment score for a batch of tweets for a specific user. This score reflected the sentiment of the day-to-day tweets for the user, but this did not prove that a user had been depressed during that time.
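The normalization and averaging step can be illustrated with a small Python sketch. The exact normalization is not specified here, so the sketch assumes max-absolute scaling per lexicon, which is only one of several ways to map scores into the range -1 to +1. The function names and the raw daily scores are hypothetical, and Python stands in for the R code used in the actual analysis.

```python
def max_abs_scale(values):
    """Scale a list of scores into [-1, 1] by its largest absolute value."""
    peak = max((abs(v) for v in values), default=0)
    if peak == 0:
        return [0.0 for _ in values]
    return [v / peak for v in values]


def daily_magnitude(scores_by_lexicon):
    """Average the scaled per-day scores across all lexicons.

    `scores_by_lexicon` maps a lexicon name to its raw daily scores;
    every list must cover the same days in the same order.
    """
    scaled = [max_abs_scale(v) for v in scores_by_lexicon.values()]
    return [sum(day) / len(day) for day in zip(*scaled)]


# Made-up raw daily scores for three days, for illustration only.
raw_scores = {
    "afinn": [-10, 5, 0],
    "bing": [-4, 2, -2],
    "nrc": [-2, 2, 1],
}
magnitude = daily_magnitude(raw_scores)
```

Scaling each lexicon separately before averaging prevents the lexicon with the widest raw range from dominating the combined score.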
Depression as a state can have long-term effects, so the daily values by themselves cannot be taken to represent depression in a user. The values can, however, indicate whether there is a chance that the user was depressed during a given period. To use the generated magnitudes properly, the data needed to be visualized. As each value represented a day of tweets, the periods in which the magnitude remained negative for a considerable length of time were identified.
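Finding periods in which the magnitude stays negative could be sketched as follows. This Python sketch is illustrative only: the threshold `min_days` is a hypothetical parameter, since no fixed number of days is given for what counts as a considerable period, and the sample magnitudes are made up.

```python
def negative_runs(magnitudes, min_days=7):
    """Return (start, end) index pairs of runs of negative daily values.

    Only runs lasting at least `min_days` consecutive days are kept.
    Indices are zero-based positions in `magnitudes`, end inclusive.
    """
    runs = []
    start = None
    for i, value in enumerate(magnitudes):
        if value < 0:
            if start is None:
                start = i  # a new negative run begins here
        else:
            if start is not None and i - start >= min_days:
                runs.append((start, i - 1))
            start = None
    # Close a run that extends to the last day.
    if start is not None and len(magnitudes) - start >= min_days:
        runs.append((start, len(magnitudes) - 1))
    return runs


# Made-up daily magnitudes, for illustration only.
sample = [0.2, -0.1, -0.3, -0.2, 0.1, -0.4, 0.3, -0.1, -0.2]
long_runs = negative_runs(sample, min_days=3)
short_runs = negative_runs(sample, min_days=2)
```

A larger `min_days` filters out isolated bad days, in keeping with the point that a single negative value does not by itself indicate depression.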

RESULTS AND ANALYSIS
We take the case of a user "AADowd". From the user's tweets, it was clearly visible that there had been multiple instances in which a negative magnitude was recorded for a considerable period. From the figure, it is visible that the user had been depressed from the 32nd day to the 44th day, as these days show a long sequence of negative magnitudes. When the tweets were checked, it was clear that the user was in fact depressed, as throughout these days the user continuously posted tweets about the loss of lives during war, the ill effects of war on the user's family, the inability of the government to support such families, political corruption, etc. During this period the user also remained at a single location for a long time, which could be assumed to be a house or a hospital. The timing of the tweets differed from that in other periods: tweets in this period were posted consistently and during specific hours of the day. Similarly, the data also correlated for the period from the 70th day to the 84th day, during which the user either posted many negative tweets or retweeted negative tweets by others. A graph has been provided which shows the engagement of this user on Twitter during different hours of the day as well as on different days. For all other posts, the engagement pattern was very different from the pattern during the depression intervals. This provides further evidence that the user was, in fact, suffering from some sort of depression, and the magnitudes could clearly be correlated with the tweets at different points in time.

CONCLUSION
The primary aim of this research was to identify a means of calculating the level of depression in a Twitter user based on their tweets. To evaluate the score, or magnitude, generated by the algorithm, it was important to correlate it with the tweets and to check manually whether the scores were correct and provided a factual understanding of depression in the user. The outcome shows that the phases identified by the algorithm do indeed correlate with the user's tweets, so it can be said that the user was depressed during those periods. It was also found that when the user was depressed, they tweeted during specific hours of the day and on specific days of the week, which was unusual, as all their other tweets were posted at different times of the day and on different days of the week. The help of a psychologist was also sought, and their response confirmed that the depression levels do correlate with the tweets and that there was some sense of depression in the user during that period. The research question of this paper was thereby answered: a means of evaluating depression on the basis of the emotional integrity of a person's tweets can be formulated, correlated with the data, and studied.
The limiting factors for such systems are manual intervention and clinical assessment. It is important that the data, i.e., the tweets, correlate with the final depression scores calculated, and a complete medical evaluation remains the preferred method for understanding the mental health of a patient. In terms of usage, this algorithm can therefore serve as an aid for medical observers in understanding patients undergoing depression therapy at rehab centres. It can help medical professionals understand a person's behavior based on their online activity.
In terms of future work, the current system can be extended to check all of a user's social media activity and assess their mental health based on their activity on different platforms. The implementation can also be used to check the online activity of a person in a closed environment, such as a rehab clinic, where user behavior can be monitored full time. Such a system can scrape keywords from the user's online activity and enable medical professionals to better assess the state of the patient.
The manual intervention needed to validate the tweets can also be automated, and a machine learning model can be developed to help corroborate the existence of signs of depression based on the calculated magnitude. This would make the system completely autonomous and ease its implementation in specific environments such as hospitals, rehabilitation clinics, and offices. The algorithm can also be incorporated into a machine learning model so that the machine learns to categorize tweets and evaluate them properly based on magnitude. Such learning can help us understand in real time whether a particular user is currently depressed or has been depressed for a period of time, so that the necessary further steps can be taken.