Classification of instagram fake users using supervised machine learning algorithms
Abstract
On Instagram, the number of followers is a common success indicator. Hence, followers selling services become a huge part of the market. Influencers become bombarded with fake followers and this causes a business owner to pay more than they should for a brand endorsement. Identifying fake followers becomes important to determine the authenticity of an influencer. This research aims to identify fake users' behavior, and proposes supervised machine learning models to classify authentic and fake users. The dataset contains fake users bought from various sources, and authentic users. There are 17 features used, based on these sources: 6 metadata, 3 media info, 2 engagement, 2 media tags, 4 media similarity. Five machine learning algorithms will be tested. Three different approaches of classification are proposed, i.e. classification to 2-classes and 4-classes, and classification with metadata. Random forest algorithm produces the highest accuracy for the 2-classes (authentic, fake) and 4-classes (authentic, active fake user, inactive fake user, spammer) classification, with accuracy up to 91.76%. The result also shows that the five metadata variables, i.e. number of posts, followers, biography length, following, and link availability are the biggest predictors for the users class. Additionally, descriptive statistics results reveal noticeable differences between fake and authentic users.
Keywords
Social media;Machine learning;Paid follower;Fake user;Classification algorithm;
Full Text:
PDFDOI: http://doi.org/10.11591/ijece.v10i3.pp2763-2772
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
International Journal of Electrical and Computer Engineering (IJECE)
p-ISSN 2088-8708, e-ISSN 2722-2578
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).