Assessment of the main features of the model of dissemination of information in social networks

ABSTRACT


INTRODUCTION
The study of information dissemination processes becomes more important every year. There are two main reasons. First, information as such plays a central role in modern society. Second, technological progress, including improved means of communication that now cover almost the entire globe, has made it important to understand how a given piece of information is distributed [1]. Analyzing these processes allows us to predict how particular groups of people will react to particular information and, therefore, to develop strategies for working effectively with an audience and reaching wider coverage.
Social networks, which appeared quite recently, are visited more and more every day, and people can spend a large amount of their free time there. As a result, most of the information that people previously consumed from other sources is now received from social networks [2]. The applied value of an information dissemination model therefore spans many areas at once: from creating effective marketing strategies for developing news sources, to analyzing the accompanying business processes and the systems of communication between people and, consequently, the acceptance of particular opinions in general [3]. Several important factors affect the suitability of information dissemination models [4]. For example, many of the parameters described in such models can be qualitative rather than quantitative, which makes them difficult to use, and they are hard to formalize because of their subjective nature. In addition, processes in social networks, being only a part of the processes on the internet, are highly "impulsive", which also complicates the analysis and can ultimately lead to a divergence between simulated and real data.
The processes of information dissemination are quite similar to the spread of epidemics [5]. A unit of information can be imagined as a virus that infects more and more people over time through their communication with each other; the virus has a certain life span, some group of people is immune, and so on. Such parallels can be extended much further, but for a more substantive description they should be considered in the context of existing methods [6]. To date, several methods describe these processes. Models built to analyze the dissemination of information are based on susceptible-infected-removed (SIR) models of epidemics because of the similarity of the two processes [7]. However, determining the parameters of an information dissemination model is a complex problem. First, it requires reliable data on the infection rate and on the spread of information in time and space. Second, the model itself can have many parameters that need to be adjusted for a specific epidemic.
Several attempts have been made to study the dissemination of information using traditional epidemic models such as the susceptible-infected model and the susceptible-infected-recovered model. In [8]-[10], epidemic models were proposed to study spreading processes in various social networks. Wang et al. [11] propose an iterative algorithm for studying an identifiable system and a method for estimating its identifiable parameters. The least squares method (LSM), applied to a finite set of observations, gives the initial parameter estimates, after which the authors test the proposed algorithm. Chen et al. [12] use the method of moments to estimate the parameters and develop a numerical algorithm to compute them; the paper also presents experimental results demonstrating the effectiveness of the proposed method on real datasets. Stolfi et al. [13] developed numerical tools to accurately calculate the steady-state infection probability and influential thresholds, providing a basis for evaluating the dissemination strategy. In [14]-[17], the authors propose the least squares method with second-order centering to estimate the parameters that determine the model, discuss open problems and future research directions in this area, and use simulations to test their models and compare them with other models.

METHOD
The main purpose of information dissemination analysis is to describe the dissemination process. In this study, an epidemic model was chosen to model information dissemination [18]. Epidemic models remain in use for this purpose because the process of information dissemination can be compared to an epidemic, especially on social media. Because there is no distance between agents, information spreads very quickly (provided that it is new and of interest): dissemination begins in small groups and moves to larger ones until it reaches a peak and starts to decline. The advantages of the model include its parametric simplicity and the transparency of its solution.
The deterministic SIR epidemic model describes how an epidemic is transmitted from one individual (agent) to another. The process has a decay parameter. An agent can be in one of three states: susceptible, infected, or immune. The number of agents in the network can be expressed as N = S(t) + I(t) + R(t) (1), where S(t) is the number of information-receptive agents, I(t) is the number of informed agents, R(t) is the number of unreceptive agents, and N is the total number of agents. The unreceptive state can be interpreted as a loss of interest in the news and an unwillingness to spread it further [19]. The model uses the following parameters: β is the average awareness rate and γ is the constant average rate of "recovery" per unit of time.
The model can be represented as a system of differential equations [20].
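For reference, the classical deterministic SIR system can be written in one common normalization as shown below. This is the textbook form, given here as an assumption, since the exact form used in the study is not reproduced in the text. Here S(t), I(t), and R(t) are the numbers of receptive, informed, and unreceptive agents, N is the total number of agents, β is the awareness rate, and γ is the "recovery" rate (the names used in the Conclusion).

```latex
\begin{aligned}
\frac{dS}{dt} &= -\frac{\beta\, S(t)\, I(t)}{N},\\
\frac{dI}{dt} &= \frac{\beta\, S(t)\, I(t)}{N} - \gamma\, I(t),\\
\frac{dR}{dt} &= \gamma\, I(t),
\end{aligned}
\qquad S(t) + I(t) + R(t) = N .
```

The three right-hand sides sum to zero, so the total number of agents N is conserved over time.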
Int J Elec & Comp Eng ISSN: 2088-8708
There are various methods for estimating parameters in epidemic models [21]. In this work, the states of agents are described by real data on three current topics from the VK social network, based on a detailed analysis. To estimate the parameters, the authors used a geometric approach. Using a dataset obtained from various news channels of the social network, tangents were drawn to the graph of each function to determine its slope; then, using the system of equations and the initial data, the unknown parameters were estimated, namely the average rate of agent awareness and the average rate of "recovery". The dataset consists of the following fields: likes, reposts, the sum of likes and reposts, views, subscribed, and unsubscribed. Thus, from the system (3) we obtain formulas for the parameters, expressed through two observed quantities at time t: views-subscribed, and the sum of likes and reposts. Information propagation models can be implemented using various methods and approaches, such as Cox-Ingersoll-Ross (CIR) models, random walk models, and percolation models. Depending on the goals and parameters set, an appropriate method can be chosen and implemented with software tools. In this work, the information dissemination model with given parameters is built in the SiminTech program [22] using functional block programming (Figure 1).
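The geometric approach described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the mass-action SIR form dS/dt = -βSI/N and dI/dt = βSI/N - γI, approximates the tangent slopes by central differences, and runs on synthetic data generated from hypothetical values β = 0.5 and γ = 0.2. All function names are hypothetical.

```python
def slope(ts, ys, k):
    """Approximate the tangent slope at interior index k by a central difference."""
    return (ys[k + 1] - ys[k - 1]) / (ts[k + 1] - ts[k - 1])


def estimate_beta_gamma(ts, s, i, n_total, k):
    """Solve the assumed SIR equations for beta and gamma at sample k."""
    ds = slope(ts, s, k)
    di = slope(ts, i, k)
    # From dS/dt = -beta * S * I / N
    beta = -n_total * ds / (s[k] * i[k])
    # From dI/dt = beta * S * I / N - gamma * I
    gamma = (beta * s[k] * i[k] / n_total - di) / i[k]
    return beta, gamma


def synthetic_daily_series(beta, gamma, s0, i0, days, h=0.001):
    """Generate daily S and I samples with small Euler steps (synthetic data only)."""
    n_total = s0 + i0  # assumes no unreceptive agents at t = 0
    s, i = float(s0), float(i0)
    ts, s_series, i_series = [0.0], [s], [i]
    steps_per_day = round(1 / h)
    for day in range(1, days + 1):
        for _ in range(steps_per_day):
            new_informed = beta * s * i / n_total
            s, i = s - h * new_informed, i + h * (new_informed - gamma * i)
        ts.append(float(day))
        s_series.append(s)
        i_series.append(i)
    return ts, s_series, i_series, n_total


# Hypothetical scenario: 990 receptive and 10 informed agents, two weeks of daily data.
ts, s_series, i_series, n_total = synthetic_daily_series(0.5, 0.2, 990, 10, days=14)
beta_hat, gamma_hat = estimate_beta_gamma(ts, s_series, i_series, n_total, k=3)
print(f"beta ~ {beta_hat:.3f}, gamma ~ {gamma_hat:.3f}")
```

With daily samples the central difference is only a rough stand-in for a hand-drawn tangent, so the recovered values are approximate; averaging the estimates over several days would be a natural refinement.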
Numerical integration was performed by the 4th-order Runge-Kutta method [23] with a fixed step of 0.001 day. Thus, knowing the initial number of information-receptive agents, the initial number of informed agents, and the distribution coefficients, we can simulate information dissemination. To evaluate the main features of the model, the authors used hierarchical cluster analysis with the construction of dendrograms. In this paper, we consider a hierarchical agglomerative algorithm. Before clustering starts, every object is treated as a separate cluster (one element per cluster), and clusters are merged as the algorithm runs. First, the pair of nearest multidimensional elements is selected and combined into a cluster, so the number of clusters becomes n-1. The procedure is then repeated: either two elements are combined again, or an element is added to the nearest existing cluster. This continues until all clusters are united into a single cluster containing all elements. The merging can be stopped at any stage once the desired number of clusters is obtained. As a result of the analysis, our study revealed clusters (branches) for the three current topics.
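The numerical integration step can be illustrated with a short plain-Python sketch of the classical 4th-order Runge-Kutta scheme with a fixed step of 0.001 day. It is not the SiminTech block diagram used in the study. The right-hand side assumes the mass-action SIR form, and the values of β, γ, and the initial numbers of agents are hypothetical.

```python
def sir_rhs(state, beta, gamma, n_total):
    """Right-hand side of the assumed SIR system for state = (S, I, R)."""
    s, i, _ = state
    new_informed = beta * s * i / n_total
    return (-new_informed, new_informed - gamma * i, gamma * i)


def rk4_step(state, h, beta, gamma, n_total):
    """Advance the state by one classical 4th-order Runge-Kutta step of size h."""
    def shifted(base, k, factor):
        return tuple(x + factor * dx for x, dx in zip(base, k))

    k1 = sir_rhs(state, beta, gamma, n_total)
    k2 = sir_rhs(shifted(state, k1, 0.5 * h), beta, gamma, n_total)
    k3 = sir_rhs(shifted(state, k2, 0.5 * h), beta, gamma, n_total)
    k4 = sir_rhs(shifted(state, k3, h), beta, gamma, n_total)
    return tuple(
        x + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for x, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(s0, i0, r0, beta, gamma, days, h=0.001):
    """Integrate the SIR system over `days` with a fixed step h (in days)."""
    n_total = s0 + i0 + r0
    state = (float(s0), float(i0), float(r0))
    trajectory = [state]
    for _ in range(round(days / h)):
        state = rk4_step(state, h, beta, gamma, n_total)
        trajectory.append(state)
    return trajectory


# Hypothetical scenario: two weeks, 990 receptive and 10 informed agents.
trajectory = simulate(990, 10, 0, beta=0.5, gamma=0.2, days=14)
peak_informed = max(state[1] for state in trajectory)
print(f"peak number of informed agents ~ {peak_informed:.1f}")
```

A fixed step keeps the sketch close to the procedure named in the text; an adaptive solver would work equally well for a system this small.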

Data analysis
Int J Elec & Comp Eng, Vol. 13, No. 6, December 2023: 6729-6736
This paper considers the social network "VK", as it is the largest and most frequently visited site on the Kazakhstan segment of the Internet. The communities selected for study cover current news related to politics, news related to information technology, and current news from the field of travel. The study period was two calendar weeks, since this is the minimum period needed to fully register the outflow and growth of subscribers. For each day, average values of the model inputs were obtained: the number of likes, reposts, and views, and the number of subscribed and unsubscribed agents. The data were collected and adapted to the dissemination model. They were systematized in Excel tables, as Excel is the most convenient software for such operations among tools that require no special training, and data from such tables are easy to use in other programs. The model was implemented in practice with two programs: SiminTech for modeling the process of information dissemination and Statistica Soft [24] for assessing the main features of the model. Based on data from a publication related to information technology, and using the geometric approach with an initial number of information-receptive agents and an initial number of informed agents, we simulated information dissemination and obtained the main parameters of the model (Figure 2).

Figure 2. Information dissemination modeling
However, there are some discrepancies between the simulation results and the real social network data. They arise because the model has too few parameters for a complete description of the processes. The study of information dissemination in social networks is an important task in the modern information society, since it makes it possible to identify the patterns and principles that guide users when they distribute information. Such studies usually rely on network analysis methods, statistical methods, and machine learning [25]. One of the statistical methods is the hierarchical tree. Ward's method was used, in which the distance between clusters is based on the sum of squared distances between objects and the cluster center (Figure 3).
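The agglomerative procedure with Ward's criterion can be sketched in plain Python as follows. The sketch uses the standard formulation of Ward's method, in which the pair of clusters merged at each step is the one whose union gives the smallest increase in the within-cluster sum of squares; details may differ from the Statistica implementation used in the study. The data points and function names are hypothetical.

```python
def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def ward_cost(a, b):
    """Increase in within-cluster sum of squares caused by merging clusters a and b."""
    ca, cb = centroid(a), centroid(b)
    squared_distance = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * squared_distance


def agglomerate(points, n_clusters=1):
    """Merge clusters bottom-up until n_clusters remain; return clusters and merge log."""
    clusters = [[tuple(p)] for p in points]  # every object starts as its own cluster
    merges = []
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_cost(clusters[i], clusters[j])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        cost, i, j = best
        merges.append((clusters[i], clusters[j], cost))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters, merges


# Hypothetical two-feature observations forming three well-separated groups.
observations = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (20, 0), (21, 0)]
clusters, merges = agglomerate(observations, n_clusters=3)
for cluster in clusters:
    print(sorted(cluster))
```

The merge log records the cost of each union, which is the quantity a dendrogram plots on its height axis; stopping at `n_clusters=3` corresponds to cutting the tree at three branches.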
Table 1 shows the values for the selected topics discussed in social networks, between groups (between CC) and within groups (within CC) [26]. In the analysis of variance, the three topics considered for the model parameters were selected for the large distance between classes and the small distance between features within a class. The results of the analysis of variance for the three classes show good classification quality: the significance level is below 5% everywhere.
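The between-group and within-group quantities can be illustrated with a short sketch, assuming they correspond to the usual one-way analysis of variance sums of squares (an assumption; the text does not spell out the computation). The numbers below are hypothetical and are not the values of Table 1. The p-value is not computed, because it requires the F distribution, which the Python standard library does not provide.

```python
def anova_one_way(groups):
    """Return between-group SS, within-group SS, and the F statistic."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Spread of the group means around the grand mean, weighted by group size.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Spread of the observations around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_statistic = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f_statistic


# Hypothetical values of one feature for three topic clusters.
topic_groups = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
ss_between, ss_within, f_statistic = anova_one_way(topic_groups)
print(f"between SS = {ss_between}, within SS = {ss_within}, F = {f_statistic}")
```

A large between-group sum of squares relative to the within-group one gives a large F statistic, which is what a significance level below 5% reflects.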
Potential applications of model parameterization include more effective development of marketing and advertising strategies in social networks, as well as analysis of the impact of information on public opinion and decision-making. Determining the main parameters of the information dissemination model can also be useful for developing more accurate and efficient algorithms for detecting and combating fake news in social networks. Evaluating the main features of the model helps to determine the most effective methods of communication and to improve the dissemination of information.

CONCLUSION
In this article, we considered the classic SIR epidemic model and adapted it to the problem of disseminating information in social networks by introducing the parameters β and γ, representing the rate of agent awareness and the rate of "recovery", respectively. Data were collected and systematized, and the factors that influence the dissemination of information were formulated. The main parameters of the model were determined using a geometric approach. The results show that the classical epidemic model can be applied to the problem of disseminating information in social networks. However, there are some discrepancies between the simulation results and real data, because the model has too few parameters for a full description of the processes. Further, a hierarchical classifier in Statistica Soft was used to evaluate the applicability of the epidemic model to the problem of information dissemination.
SIR models provide insight into the coverage and quantitative distribution of information (how many agents received the information in total) but not into the channels through which it is distributed. The model is well suited for a preliminary calculation of the coverage of network agents. In the future, we plan to use the model to investigate the parameters that affect the reach of the social network audience, for example, the time of publication and the use of virtual marketing in different communities. Although several works exist, research on information dissemination remains relevant and needs further development.

Figure 1. Functional block representation of the model

Figure 3. Dendrogram of clusters obtained using 3 hot topics in social networks: Ward's method, Euclidean distance

Table 1. Analysis of the variance of the topics covered