Forecasting of stock trading prices using time series information based on big data

Received Jul 3, 2020 Revised Sep 27, 2020 Accepted Oct 8, 2020

Big data is a large set of structured or unstructured data that cannot be collected, stored, managed, and analyzed with existing database management tools; the term also refers to the techniques for extracting value from such data and interpreting the results. Big data has three characteristics: the size of the data (volume), the speed of data generation (velocity), and the variety of information forms (variety). Time series data are obtained by collecting and recording data generated over the flow of time. When analysis of such time series data reveals the characteristics the data imply, those features help us understand and analyze the series. The concept of distance is the simplest and most obvious way to deal with similarities between objects, and the most commonly used and widely known distance measure is the Euclidean distance. This study analyzes the similarity of stock price flows using 793,800 closing prices of 1,323 companies in Korea. Visual Studio and Excel were used as analysis tools to calculate the Euclidean distance. We selected "000100" as the target domestic company and prepared the data for big data analysis. As a result of the analysis, the company with the shortest Euclidean distance is the one with code "143860", with a calculated value of 11.147. Based on these results, the limitations of the study and its theoretical implications are discussed.


INTRODUCTION
Recently, due to the proliferation of mobile devices and the introduction of web services, not only online structured data but also unstructured data is increasing rapidly, and it is used in various ways across many fields [1]. For big data, an annual average growth rate of 23.1% is expected in the global market from 2014 to 2019, and an annual average growth rate of 26.4% is expected in the domestic market from 2014 to 2018 [1, 2].
In particular, the emergence of social media in the field of big data has been an opportunity for the rapid spread and accumulation of unstructured data from individuals and organizations, regardless of time and place. In fact, about 70% of recently generated digital data comes from social media where users generate content, including e-mail [2][3][4]. A key advantage of this is that a diversified contemporary society can be predicted more accurately and personalized information can be provided to individuals. In order to extract meaningful information from the large amount of unstructured data generated in social media, interest in big data technology is increasing in various fields, and continuous discussions are being held on how to manage and analyze big data effectively [5, 6]. Generally, big data refers to data whose volume exceeds the range that can be stored, managed, and analyzed by existing database software. However, it is difficult to define big data by volume alone. Big data describes large-scale data that includes not only structured data but also unstructured data types such as text, image, video, and voice. Big data generated in various environments is large compared to general data, and its creation speed is very fast [7, 8]. Big data is said to have three characteristics: the volume of data, the velocity of data creation, and the variety of information types [9, 10]. These three aspects are collectively called the "3Vs", and recently a fourth "V", value, has been added to define "4V" big data. Scholars have also called big data the oil of the 21st century: efficient refining of crude oil can produce high-value-added raw materials like gasoline.
Therefore, extracting valuable information from large amounts of data can likewise be profitable. Big data can be used to solve various problems in the typical enterprise, and its analysis helps companies operate and manage their businesses. Integrating big data technology into existing data management and analysis systems refers to the techniques used to gain insight from amounts of data too large to handle otherwise. Google is the most notable company working with big data.
Today, the emergence of big data brings a variety of changes to the human way of life. The development of computers and information and communication technology (ICT) has made it possible to analyze big data. In addition, the importance of big data as a core resource and tool is emerging in various fields such as industry, public administration, medicine, and science, especially in developed countries [11]. However, one problem that continues to be mentioned alongside the positive prospects of big data concerns the invasion of personal privacy and the protection of personal information.
In the big data era, digital data such as location information, search patterns, and access records is constantly generated through various smart devices. In addition, even for data created and released at the will of the individual, the possibility of personal information infringement continues to increase as such information is used or abused in unintended ways [12][13][14]. Because of these problems, research on both big data analysis and big data security has been actively conducted in certain fields.
Time series data are obtained by collecting and recording data generated over the flow of time. Such time series data occur not only in science but also in various fields such as economics and healthcare. When analysis of these time series data reveals the characteristics the data imply, those features help us understand and analyze the series. In particular, the problem of finding meaningful features in time series data collected in the past and using them to predict future data changes has long been of interest to many researchers. The concept of distance is the simplest and most obvious way to deal with similarities between objects. Widely used distance measures include the Euclidean, Minkowski, Manhattan, Mahalanobis, Chebyshev, and Hamming distances, of which the Euclidean distance is the most common.
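As an illustration, the Euclidean distance between two equal-length closing-price series can be computed as in the minimal Python sketch below; the price values are made up for demonstration and are not data from this study:

```python
import math

def euclidean_distance(series_a, series_b):
    """Euclidean distance between two equal-length time series."""
    if len(series_a) != len(series_b):
        raise ValueError("series must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(series_a, series_b)))

# Two hypothetical daily closing-price series
prices_a = [100.0, 101.5, 103.0, 102.0]
prices_b = [100.5, 101.0, 102.5, 103.0]
print(euclidean_distance(prices_a, prices_b))  # smaller values mean more similar price flows
```

The same function applies unchanged whether the series hold 4 prices or, as in this study, 600 closing prices per company.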

RESEARCH METHOD
Big data is a set of data that exceeds the ability of common database management tools to capture, store, manage, and analyze. Recently, due to the spread of mobile devices and the introduction of web services, the amount of online data has been increasing rapidly and is being used in various fields. In particular, the advent of social media has triggered a rapid increase in the amount of accumulated unstructured data. In order to extract meaningful information from these unstructured data, interest in big data technology is increasing in various fields [3, 4]. A key advantage of this is that a diversified contemporary society can be predicted more accurately and personalized information can be provided to individuals. Scholars have called big data the oil of the 21st century: just as efficient refining of crude oil produces high-value-added raw materials like gasoline, extracting valuable information from large amounts of data can be profitable. Big data can be used to solve various problems in the typical enterprise, and its analysis helps companies operate and manage their businesses.
Analysis and forecasting of the stock market have long been recognized as very important research topics, not only in economics but also in mathematics, statistics, and computation. Recently, with the development of financial engineering, research on predicting and using stock prices through scientific methods has been greatly activated. Stock price prediction algorithms are classified into three types: mathematical prediction, statistical prediction, and artificial intelligence prediction. Recently, to compensate for the weaknesses of financial engineering systems, patterns are also extracted from SNS or news and applied to stock price prediction [15, 16]. First, mathematical prediction is a technique that predicts future values quantified based on a mathematical model, in order to make investment decisions such as building a portfolio or trading. The Black-Scholes model, published by Fischer Black and Myron Scholes in 1973, became the basis for all options trading, and various techniques have since emerged. Representative examples include the filtration (percolation) method, which studies how the price moves under trading orders within a limited transaction price range, and the wavelet transform, which is used to analyze the movement of time series data and predict the association between past data and future motion.
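As context for the mathematical-prediction approach mentioned above, a minimal sketch of the standard Black-Scholes formula for a European call option is shown below. The parameter values are illustrative only and are not taken from this study:

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_scholes_call(s, k, r, sigma, t):
    """Black-Scholes price of a European call option.
    s: spot price, k: strike price, r: risk-free rate,
    sigma: annual volatility, t: time to maturity in years."""
    d1 = (math.log(s / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return s * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)

# Illustrative at-the-money call: 5% rate, 20% volatility, one year to maturity
print(round(black_scholes_call(100, 100, 0.05, 0.2, 1.0), 4))
```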
Moving average analysis computes the arithmetic average of stock prices within a certain period and expresses it as the average stock price. The Monte Carlo simulation method statistically obtains a probability distribution of outcomes by generating a large number of random numbers. Statistical forecasting is an approach that predicts the future based on historical stock market data. AI-based stock price prediction, which began in the late 1980s, finds optimized parameters applicable to predictive models. SVM, ANN, and GA are widely used in classification and regression analysis, and are commonly applied to find optimal patterns or weighting variables of predictive models using neural networks or genetic algorithms.
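The two statistical techniques above can be sketched as follows (a minimal Python illustration; the drift and volatility figures in the Monte Carlo example are assumptions chosen for demonstration, not estimates from this study):

```python
import math
import random

def moving_average(prices, window):
    """Simple moving average: arithmetic mean over a sliding window."""
    if not 0 < window <= len(prices):
        raise ValueError("window must be between 1 and len(prices)")
    return [sum(prices[i:i + window]) / window
            for i in range(len(prices) - window + 1)]

def monte_carlo_terminal_prices(s0, mu, sigma, days, n_paths, seed=42):
    """Simulate terminal prices with daily geometric Brownian motion steps,
    drawing a large number of random normal shocks."""
    rng = random.Random(seed)
    dt = 1.0 / 252  # one trading day expressed in years
    prices = []
    for _ in range(n_paths):
        p = s0
        for _ in range(days):
            p *= math.exp((mu - 0.5 * sigma ** 2) * dt
                          + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0))
        prices.append(p)
    return prices

print(moving_average([10, 11, 12, 13, 14], 3))  # [11.0, 12.0, 13.0]
```

The list returned by `monte_carlo_terminal_prices` approximates the probability distribution of the future price, which is exactly the kind of stochastic distribution the Monte Carlo method is used to obtain.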
Prediction using SNS or news is a method of extracting meaningful features from documents through text mining after collecting text data. The news is classified as good or bad for the stock price, and the classification result is then used to attempt simulated investment and to predict price fluctuations. Bollen [16] predicted the rise and fall of the Dow Jones Industrial Average (DJIA) by measuring six mood dimensions (calm, alert, sure, vital, kind, happy) detected on Twitter. Schumaker [15] proposed AZFinText, a machine learning system that derives stock price predictors from the news, and conducted experiments simulating trading.
The concept of distance is the simplest and most obvious way to deal with the similarity between objects in a specific coordinate system or space. The k-nearest neighbor algorithm, used for classification learning, is a very simple and efficient nonparametric method proposed by Hart in 1968. It is a very intuitive method: based on the similarity between samples, it finds the k individuals in the training dataset nearest to a given entity and assigns the entity to the most frequent group among those k neighbors. There are many ways to measure similarity within k-nearest neighbors, including the Euclidean, Minkowski, Manhattan, Mahalanobis, Chebyshev, and Hamming distances; among these, the Euclidean distance is the most commonly used and widely known [17, 18]. In general, one dimension is a line, two dimensions form the coordinate plane, and three dimensions form space. Based on this methodology, Figure 1 shows three equations for measuring the distance between entities [19][20][21][22]. We analyze the similarity between entities in the collected raw data using these equations.
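To make the distance measures and the nearest-neighbor idea concrete, the following Python sketch computes several Minkowski-family distances and finds the company whose price series is nearest to a reference company. The company codes and price values are made up for illustration; only the code "000100" mirrors the reference company used in this study:

```python
import math

def minkowski(a, b, p):
    """Minkowski distance; p=1 gives Manhattan, p=2 gives Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def chebyshev(a, b):
    """Chebyshev distance: the largest coordinate-wise difference."""
    return max(abs(x - y) for x, y in zip(a, b))

def nearest_company(closing_prices, reference_code):
    """Return the company code whose series has the smallest
    Euclidean distance to the reference company's series."""
    ref = closing_prices[reference_code]
    others = (code for code in closing_prices if code != reference_code)
    return min(others, key=lambda code: minkowski(ref, closing_prices[code], 2))

# Hypothetical closing-price series keyed by company code
data = {
    "000100": [100.0, 102.0, 101.0],
    "143860": [100.5, 101.5, 101.5],
    "999999": [50.0, 48.0, 52.0],
}
print(nearest_company(data, "000100"))  # "143860" is closest in this toy data
```

With k = 1 this is exactly the nearest-neighbor search performed in the study: the reference series is compared against every other company's series and the smallest distance is kept.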

CONCLUSION
Finding meaning through big data analysis is the ultimate goal of all researchers. Things that were impossible in the past are now possible; today, in the big data era, the future can be forecast through data. Based on previous studies, similarity was analyzed through stock trading prices using the simple and clear Euclidean distance. This study analyzed the similarity of stock price flows using 793,800 closing prices of 1,323 companies in Korea. The Euclidean distance serves as a method of classifying similar companies by the flow of stock prices between companies. We calculated the Euclidean distance after coding in Visual Studio as a convenient big data analysis tool. First, we selected "000100" as the target domestic company and prepared the data for big data analysis. Next, the Euclidean distances for the 1,323 companies were calculated relative to the reference company using Visual Studio. As a result of the analysis, the company with the shortest Euclidean distance is the one with code "143860", with a calculated value of 11.147. The Euclidean distance here is interpreted as indicating similar stock price flows in the past. It cannot be said that these two companies will show the same share price trend in the future; however, it can be said that their stock price flows appear similar. Finally, Figures 2 and 3 show the