Earthquake trend prediction using long short-term memory RNN

Received Sep 25, 2017 Revised Oct 15, 2018 Accepted Nov 30, 2018

The prediction of natural calamities such as earthquakes has long been an area of interest, but accurate earthquake forecasting has evaded scientists, even leading some to deem it intrinsically impossible. In this paper, an attempt is made to forecast earthquake trends using data from a series of past earthquakes. A type of recurrent neural network called Long Short-Term Memory (LSTM) is used to model the sequence of earthquakes. The trained model is then used to predict the future trend of earthquakes. An ordinary Feed Forward Neural Network (FFNN) solution to the same problem was built for comparison. The LSTM neural network was found to outperform the FFNN: its R^2 score is better than the FFNN's by 59%.


INTRODUCTION
Earthquakes are a natural hazard that can cause extensive damage and loss of human life. They do not follow any set pattern of occurrence, and thus predicting their trend has always been an important area of research. A fault, in geology, is a fracture in rock across which there has been significant rock mass movement. Faults are created by the action of plate tectonic forces. The faults across which plates move rapidly are called active faults, and the energy released by this movement is the most common cause of earthquakes. This energy travels to the surface of the Earth as waves. There are three kinds of seismic waves. S and P waves penetrate the interior of the Earth and are hence called body waves [1]. The third kind, and the most destructive, are surface waves, which are similar to water waves and travel across the surface of the Earth.
Because of the destructive potential of earthquakes, humankind has long been searching for an earthquake trend prediction method. Predicting an earthquake implies stating the exact time, magnitude and location of a coming earthquake. Prediction models fall under short-term prediction (time scale of less than 1 year), intermediate-term prediction (1 to 10 years) or long-term prediction (10 to 100 years) [2], [3]. Great effort has been made by the scientific community, but owing to the intrinsically random nature of the phenomenon, no valid and reliable method has yet been found. Moreover, earthquake generation is not a cyclical process, because of the variation in rupture area and earthquake-mediated interactions along other faults. This means that the time between events can be extremely irregular. Consequently, predicting the time, or a relatively close time interval, of an oncoming large earthquake is still a difficult task. Although considerable research is devoted to the science of short-term earthquake forecasting, the standardisation of operational procedures is at a nascent stage of development. The problem is challenging because large earthquakes cannot be reliably predicted for specific regions over time scales shorter than decades. There are two general approaches to predicting earthquakes: precursor-based and trend-based. Precursors are anomalous phenomena that might signal an impending earthquake [4], such as radon gas emissions, unusual animal behaviour and electromagnetic anomalies. Trend-based methods involve identifying patterns of seismicity that precede an earthquake. In this paper, a trend-based approach is adopted and an LSTM neural network is used, together with statistical techniques, to capture the trend.
The relationship between the maximum earthquake affecting coefficient and site and basement conditions was studied, and a model for earthquake magnitude prediction using an artificial neural network in the northern Red Sea area was proposed [5]. A multilayer neural network using compressed data for precursor detection in electromagnetic wave observation was proposed [6]. A time-series approach was applied to seismic events that occurred in Greece [7]. The relationship between radon and earthquakes was studied using an artificial neural network model [8]. The relationship between radon concentration and environmental parameters for earthquake prediction was modelled using an ANN in the region of Thailand [9]. A neural network was studied for classification after analysing electric field data and seismicity collected from different stations, and the results were reported to be fairly accurate [10]. Seismic damage identification was investigated using a PCA-compressed response function and artificial neural networks [11]. Earthquake damage prediction and reliability analysis were carried out using fuzzy sets [12]. The variation of Total Electron Content (TEC), treated as an anomaly indicating an earthquake a few days or hours in advance, was used to build a model [13]. A recursive sample-entropy technique was used for earthquake forecasting, with earth data based on the VAN method used for the modelling [14]. Models based on measurements of elastic and electromagnetic waves were developed to predict earthquakes and tsunamis [15]. Earthquake hazard assessment was performed using the EaHaAsTo tool for visualization [16]. The threshold energy leading to seismic activity was determined [17].

RESEARCH METHOD
An artificial neural network is a mathematical model that mimics the biological neurons in the brain. A neural network consists of input, hidden and output layers. These layers have nodes which are interconnected through links. Each link has an associated numeric weight which determines how much the input contributes to and affects the result. The weights can be modified by a process called learning, which is governed by a learning rule [18]. In this paper we compare the structures of a Feed Forward Neural Network (FFNN) and a Recurrent Neural Network (RNN) on time-series data.
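The role of the link weights can be illustrated with a minimal sketch of one fully connected layer. This is not the paper's implementation: the function names, the bias terms and the choice of a sigmoid activation here are assumptions made for illustration.

```python
import math


def sigmoid(z):
    """Logistic function, squashing any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def dense_forward(inputs, weights, biases):
    """Forward pass of one fully connected layer.

    weights[j][i] is the weight on the link from input i to node j
    (hypothetical layout); biases[j] is an assumed per-node offset.
    """
    outputs = []
    for node_weights, bias in zip(weights, biases):
        # Each node sums its inputs, scaled by the weight on each link.
        z = sum(w * x for w, x in zip(node_weights, inputs)) + bias
        outputs.append(sigmoid(z))
    return outputs


# Two nodes looking at the same two inputs: the second node weights
# the inputs more heavily, so its activation is larger.
activations = dense_forward([1.0, 2.0], [[0.0, 0.0], [0.5, 0.5]], [0.0, 0.0])
```

A node whose weights are all zero ignores its inputs and outputs 0.5; learning adjusts the weights so that the useful inputs have more influence on the result.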

Feed forward neural network
Feedforward networks are acyclic networks, usually arranged in layers, where each neuron receives inputs only from the immediately preceding layer. The architecture of an FFNN with 2 hidden layers is shown in Figure 1. Because such a network has no memory of earlier inputs, sequence data is commonly modelled by supplying a fixed-size window of context about each point of interest [19]. This approach is followed here by concatenating a fixed number (fourteen) of past earthquake records and giving them as input to the FFNN, with the next earthquake as the target. The model uses two hidden layers with 20 nodes and 60 nodes respectively. All nodes use the sigmoid function as their activation. The learning rule used is RMSprop. The model is trained for 1000 epochs on the dataset. These attributes were selected using a grid search, which chose the best architecture based on the error rate.
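The windowing step described above can be sketched as follows. This is a minimal illustration under assumptions: the function name `make_windows` and the two-feature toy records are hypothetical, and the paper's actual preprocessing code is not given.

```python
def make_windows(events, window=14):
    """Build (input, target) pairs from a time-ordered list of events.

    Each event is a list of numeric features. Each input is the
    concatenation of `window` consecutive events; the target is the
    event that immediately follows them.
    """
    inputs, targets = [], []
    for start in range(len(events) - window):
        past = events[start:start + window]
        # Flatten the window into one long feature vector for the FFNN.
        inputs.append([value for event in past for value in event])
        targets.append(events[start + window])
    return inputs, targets


# Toy data: 20 events with 2 made-up features each.
toy_events = [[float(i), float(i) * 2.0] for i in range(20)]
X, y = make_windows(toy_events)
```

With 20 events and a window of 14, this yields 6 training pairs, each input holding 14 x 2 = 28 values.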

Long short-term memory
Long Short-Term Memory (LSTM) is a Recurrent Neural Network (RNN) architecture proposed by Sepp Hochreiter and Jürgen Schmidhuber in 1997 [20]. RNNs can capture the dynamics of sequences via cycles in the network. However, RNNs can suffer from the vanishing and exploding gradients problem, in which gradients are either squashed to zero or increase without bound during backpropagation through a large number of time steps. LSTM was introduced primarily to overcome the problem of vanishing gradients. It has a chain-like structure, with three or four neural network layers, or "gates", which are implemented using the logistic function.
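To make the gate mechanism concrete, the following is a minimal sketch of one time step of a single-unit LSTM cell with scalar input and state. It follows the commonly used formulation with input, forget and output gates; the forget gate comes from later formulations rather than the original 1997 design, and the parameter layout and names are assumptions for illustration, not the model used in this study.

```python
import math


def sigmoid(z):
    """Logistic function used by every gate."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One time step of a single-unit LSTM cell.

    params maps a gate name to a (w_x, w_h, b) tuple (hypothetical
    layout). Returns the new hidden value h and cell state c.
    """
    def pre_activation(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    i = sigmoid(pre_activation("input"))         # how much new input to admit
    f = sigmoid(pre_activation("forget"))        # how much old state to keep
    o = sigmoid(pre_activation("output"))        # how much state to expose
    g = math.tanh(pre_activation("candidate"))   # proposed new content

    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c


# Input and output gates driven closed, forget gate driven open:
# the cell state is carried over almost unchanged and nothing is emitted.
closed = {
    "input": (0.0, 0.0, -50.0),
    "forget": (0.0, 0.0, 50.0),
    "output": (0.0, 0.0, -50.0),
    "candidate": (1.0, 0.0, 0.0),
}
h, c = lstm_step(1.0, 0.0, 0.7, closed)
```

In this configuration the stored value 0.7 neither grows nor shrinks from one step to the next, which is the behaviour that lets gradients survive across many time steps.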
The forward and backward passes in LSTMs are described in [19]. In the forward pass, the LSTM can learn when to let activation into the internal state. As long as the input gate takes the value zero, no activation can get in. Similarly, the output gate learns when to let the value out. When both gates are closed, the activation is trapped in the memory cell, neither growing nor shrinking. In the backward pass, the constant error carousel enables the gradient to propagate back across many time steps, neither exploding nor vanishing. Figure 2 shows a chunk of neural network, A, which looks at some input x_t and outputs a value h_t. A loop allows information to be passed from one step of the network to the next.

The LSTM RNN model proposed in this study includes two hidden layers of 40 LSTM cells each. Backpropagation through time is limited to 15 steps. A dropout layer is included between the 2 hidden layers for regularisation [21]. It randomly excludes 30% of the activations of the previous layer from propagating, to prevent overfitting. The Root Mean Square (RMS) loss is minimised using the Adagrad algorithm, which effectively uses a larger learning rate for sparser parameters and a smaller learning rate for less sparse ones. This strategy often improves convergence over standard stochastic gradient descent in settings where data is sparse [22]. The initial learning rate is taken to be 7 and is exponentially decreased when the RMS loss does not improve for more than 10 epochs. Training was stopped once the loss started to fluctuate despite a very low learning rate, which occurred after 1600 epochs. These parameters were selected after trying out other architectures.
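The per-parameter behaviour of Adagrad can be seen in a short sketch of its update rule. This is an illustration under assumptions: the function name, the `eps` constant and the toy gradients are hypothetical, and the learning rate of 0.1 is chosen only for readability, not taken from the study.

```python
import math


def adagrad_step(params, grads, accum, lr, eps=1e-8):
    """Return updated parameters and squared-gradient accumulators.

    Each parameter's step is divided by the square root of the sum of
    its own past squared gradients, so rarely updated parameters keep
    a larger effective learning rate.
    """
    new_params, new_accum = [], []
    for p, g, a in zip(params, grads, accum):
        a = a + g * g
        new_accum.append(a)
        new_params.append(p - lr * g / (math.sqrt(a) + eps))
    return new_params, new_accum


# Parameter 0 receives a gradient at every step; parameter 1 only at
# the second step (a "sparse" parameter).
params, accum = [1.0, 1.0], [0.0, 0.0]
params, accum = adagrad_step(params, [1.0, 0.0], accum, lr=0.1)
before = list(params)
params, accum = adagrad_step(params, [1.0, 1.0], accum, lr=0.1)
steps = [b - p for b, p in zip(before, params)]
```

On the second update both parameters see the same gradient, yet the sparse parameter moves further (about 0.1 against about 0.071), because its accumulator is smaller.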

RESULTS AND ANALYSIS

Data exploration
The number of earthquakes recorded in various regions is shown in Figure 3. The Afghanistan-Tajikistan region recorded the highest number, close to 5000 earthquakes. The Lakshadweep region recorded the lowest number of earthquakes. The distribution of earthquake counts by recorded magnitude is shown in Figure 4. Earthquakes of magnitude 3.4 on the Richter scale were the most frequent, at close to 45 occurrences, whereas earthquakes in the higher range, close to 7.9, are rare. Figure 5 shows the average magnitude of earthquakes in each region. The western Xizang-India border has the highest average magnitude, whereas Thailand and North-Eastern India have the lowest. Figure 6 shows the average depth of each earthquake recorded, in kilometres. An earthquake of 140 km is the ...

Figure 7 shows the Mean Square Error (MSE) after every epoch for the FFNN. As training proceeds, the error becomes static at a high value, showing the model's inability to model sequence data. Figure 8 shows the R^2 score for each variable used for prediction; the scores are negative, which indicates that the FFNN is not able to capture the chaotic nature of the attributes. Figure 9 shows the original data set plotted on a map. Figure 10 shows the 6000 earthquakes predicted by the FFNN plotted on the map; the predictions are clustered in one region, indicating a weak prediction trend. Figure 11 shows the region-wise distribution of the earthquakes predicted by the FFNN. The data is skewed, and it shows that the Xizang region will have more than 7000 earthquakes in the prediction trend.

Figure 11. Region-wise distribution of earthquakes predicted by the FFNN

Figure 12 shows the MSE after every epoch for the LSTM. The MSE converges at a point lower than that of the FFNN model, which shows that the model is converging and that the error rate is low as well. Figure 13 shows the R^2 score of every variable on the test set. The scores are all positive.
There is a drastic improvement in the score for Timestamp, and this pulls the overall score up to -0.252, which is 59% more than the score for the FFNN but is still negative. This indicates that explicitly capturing sequence information by using LSTMs leads to better results than traditional neural networks, even though the overall score remains negative.

The R^2 score (coefficient of determination) is used to evaluate our models. The best possible score is 1.0, and the score can be negative, because a model can be arbitrarily worse than the baseline. A constant model that always predicts the expected value of y, disregarding the input features, would get an R^2 score of 0.0. The R^2 score is preferred here over the MSE because it is normalised against this constant baseline and has a fixed upper bound of 1.0, whereas the MSE is not scaled to any particular range, so R^2 can be interpreted more easily. Table 1 gives the comparison of the R^2 scores between the Feed Forward Neural Network and the Long Short-Term Memory RNN.

By plotting the earthquakes predicted from the data set on a map across epochs, we can get a feel for how the data is spread. Figure 14 shows the 6000 future earthquakes predicted by the LSTM on the map; the spread-out points show that the LSTM is better than the FFNN at prediction. Figure 15 shows the region-wise distribution of the earthquakes predicted by the LSTM network. The Nepal region is predicted to have the highest number of earthquakes in the future.

Table 2 compares the future earthquakes predicted by both models against the original dataset. The FFNN predicts a large number of earthquakes in Xizang and no earthquakes at all for Southern India, which indicates that the model is weak. The LSTM shows a changing earthquake trend in the future, with many more earthquakes along Nepal and South India.
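The properties of the R^2 score described above (1.0 for a perfect fit, 0.0 for a constant mean predictor, negative for anything worse) follow directly from its definition, R^2 = 1 - SS_res / SS_tot. The sketch below uses a hypothetical helper name, `r_squared`, and toy values; it is not the evaluation code used in this study.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Undefined when all true values are identical (SS_tot is zero),
    a case this sketch does not handle.
    """
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0]
perfect = r_squared(y_true, [1.0, 2.0, 3.0])   # predictions match exactly
baseline = r_squared(y_true, [2.0, 2.0, 2.0])  # always predicts the mean
reversed_fit = r_squared(y_true, [3.0, 2.0, 1.0])  # worse than the mean
```

Here the perfect predictor scores 1.0, the mean predictor scores 0.0 and the reversed predictor scores -3.0, which is why a negative overall score means the model does worse than simply predicting the average.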