Crowd recognition system based on optical flow along with SVM classifier

ABSTRACT


INTRODUCTION
Population density is much higher in urban areas than in rural areas, so abnormal activities such as illegal gatherings and group fights occur more often. These activities can endanger public security. To detect them, crowd behavior prediction has become an important topic in video monitoring systems and has great value for study and research. It is very difficult to analyze individual crowd members and decide whether their behavior is normal or abnormal. Predicting the outcome requires handling several factors: the number of crowd objects, their motion, their segregation, the elimination of background disturbance, and the variance among crowd objects. Several scholars have made progress in this area. To describe crowd behavior, Wang et al. [1] implemented an improved spatial-temporal feature with adaptive size. Kratz et al. [2] used a gradient-based spatial model to define the motion of objects in scenes and recognize the type of local crowd behavior. Xu et al. [3] implemented the LBP-TOP algorithm to extract crowd behavior; this algorithm builds an LDA model through training and uses that model to decide the local crowd behavior.
The models discussed above mainly rely on spatial-temporal, gradient or trained texture features to determine local crowd behavior. All of them lack information on the global characteristics of the crowd, and hence the complete information of the crowd cannot be obtained. It is therefore important to understand the crowd behavior of the complete scene without knowing the individual actions. Recently, several models have been proposed that are based on the Hidden Markov Model (HMM) [4,5], Lagrangian Coherent Structures (LCS) [6], the Social Force Model (SFM) [7], MAPRM [8], Chaotic Invariants (CI) [9] and Kinematic Features (KF) [10]. These models perform better at detecting crowd behavior on some benchmarks [7,9], but they may fail on low-resolution video or when objects move too fast or too slowly. In [11], a behavior recognition method based on Violence Flows is proposed, which starts from the computation of optical flow between adjacent frames. This method suits a chosen video set well but needs further enhancement to give more accurate results.
In this paper, we demonstrate an approach based on optical flow patterns, with additional processing applied to the flow in order to detect crowd behavior. To extract the regions of motion activity in a crowd scene, we use a motion heat map. The heat map reduces processing time and improves the result, which is important in real-time applications. The hot regions of the scene readily yield points of interest. These points of interest bound the hot areas of the scene, and optical flow is computed over the marked area. The optical flow information forms a multi-model optical flow pattern over time that depends on the behavior of the crowd. Variations in motion are estimated to distinguish potential abnormal events. We assume a crowd environment with no restriction on the number of people. To analyze the optical flow model, we apply the proposed approach to the standard UMN dataset.
The paper is organized as follows. Section 2 presents a brief review of related work. Section 3 describes the proposed method, which we compare with Sparse Reconstruction Cost (SRC) [1], Chaotic Invariants (CI) [12], the Social Force Model (SF) [4] and the force field model (FF) [13] on the UMN dataset. Section 4 evaluates the outcome and analyzes the results. The last section concludes the paper.

RELATED WORK
Liu et al. [14] proposed agent-based motion models (AMMs), an approach that uses multiple exemplars to detect crowd behavior from the trajectories of a captured scene. They proposed an optimization algorithm, based on KL-divergence and the Extended Kalman Smoother, that correlates trajectory data with exemplar AMMs. To describe crowd motion, they introduced a novel individual feature along with a holistic feature, both based on the proposed correlation measurement. Their results indicate that the approach is well suited to recognizing real-world crowded scenes.
In [15], Bera et al. proposed an algorithm for low- to medium-density crowds that uses a trajectory behavior learning process. The algorithm combines a tracking algorithm, a non-linear pedestrian motion model and Bayesian learning techniques to compute the behavior of each pedestrian in the scene individually. These combined features are used to identify the different motion patterns of pedestrians and to segment the trajectories. The algorithm was evaluated on the PETS 2016 ARENA dataset. It is suitable for both indoor and outdoor crowd scenes and is reported to perform well.
In [16], the authors proposed a novel algorithm for detecting abnormal crowd behavior based on an acceleration feature. Unlike previous work that explores only local features, this algorithm explores a global feature of the current and previous behavioral states. The new global acceleration feature is based on the invariance of three consecutive grayscale frames; it can match pixels and capture speed changes precisely. A detection algorithm is then designed using the acceleration computation together with a foreground extraction step. The algorithm is robust because it does not depend on detection and segmentation, and its decision is based on threshold analysis. Results computed on several datasets are promising.
Crowd behavior detection is an actively researched area in which behavioral features are used to differentiate between normal and abnormal activities. In [17], the various types of features used to capture crowd behavior are summarized. These features encode crowd behavior and may be local or global, and temporal, spatial or both. They include position, motion by optical flow [18], foreground object size [19,20], gradient [21], histograms of direction, motion, texture or field [22], texture and spatio-temporal features. A latent Dirichlet allocation model is introduced in [23] for crowded scenes, using motion features of direction, velocity and position. A Markov random field model is proposed in [12] for detecting abnormal activities: optical flow features are extracted at each frame and detection is based on a probability matrix. These models are well suited to several scenes. In [24], trajectories are extracted from normal event videos and compared with abnormal videos, so that abnormal activities can be differentiated from normal ones. In [25], the authors build a model based on a mixture of dynamic textures, in which outliers are labelled as anomalies. In [26], a conditional function is constructed from direction, magnitude and position features, and a Bayesian framework is proposed for escape detection by modelling crowd motion in both the absence and presence of escape events.
Several spatio-temporal approaches have also been implemented. For anomaly detection, a block-wise spatio-temporal approach applies K-nearest neighbors to implement a co-occurrence model [13-27]. A crowd motion pattern that uses spatio-temporal features is proposed in [28] for crowd behavior detection. In [29], the authors consider both temporal and spatial frameworks and propose a new region-based feature that describes appearance and motion in both. The spatial and temporal features are combined to generate local motion pattern behavior [22], even in extremely crowded scenarios. An HMM is used to determine local motion patterns based on a 3D Gaussian distribution. A spatio-temporal model is also implemented in [30], where abnormalities in the speed, size and direction of objects are detected. A video surveillance system for abnormal visual detection and recognition in crowds is developed in [31][32].

PROPOSED SYSTEM

Motion heat map
A motion heat map is a two-dimensional (2D) graphical representation of an image, a histogram that shows where most motion activity occurs. Let H and I denote the heat map and the frame intensity, respectively:

$$H_n(i, j) = H_{n-1}(i, j) + I_n(i, j) \tag{1}$$

where n and (n-1) are the current and previous frame numbers, and i and j are the coordinates (line and column) of pixel (i, j) of a frame. The heat map is used only to find the areas where motion activity occurs. This reduces processing time and improves the quality of the result.
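As an illustration of the accumulation in (1), the following is a minimal sketch rather than the authors' implementation. It assumes that I_n is the absolute difference between consecutive grayscale frames, so that static pixels contribute nothing; the function names and the `fraction` parameter used to mark hot regions are hypothetical.

```python
import numpy as np

def update_heat_map(prev_heat, intensity):
    """One step of H_n(i, j) = H_{n-1}(i, j) + I_n(i, j)."""
    return prev_heat + np.asarray(intensity, dtype=np.float64)

def motion_heat_map(frames):
    """Accumulate a heat map over a sequence of 2D grayscale frames.

    Assumption: I_n is the absolute difference between consecutive
    frames, so pixels that never change stay at zero.
    """
    frames = np.asarray(frames, dtype=np.float64)
    heat = np.zeros(frames.shape[1:], dtype=np.float64)
    for prev, curr in zip(frames[:-1], frames[1:]):
        heat = update_heat_map(heat, np.abs(curr - prev))
    return heat

def hot_region_mask(heat, fraction=0.5):
    """Mark pixels whose accumulated activity reaches a fraction of the peak."""
    peak = heat.max()
    if peak == 0:
        return np.zeros(heat.shape, dtype=bool)
    return heat >= fraction * peak
```

Restricting later steps to the pixels selected by `hot_region_mask` is one way to obtain the reduction in processing time described above.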

Points of interest extraction
Once the heat map is generated, we extract the points of interest. Moravec's corner detector is a simple algorithm that was commonly used for this purpose, but it is now outdated. Its response is not invariant to direction, so it sometimes fails to locate corners accurately, and it is sensitive to noise. Its computational cost is minimal, but these limitations restrict its use.
The other way of extracting points of interest is the Harris corner detector, which is implemented in the current paper. It is more computationally demanding but overcomes the limitations of Moravec's detector. It is invariant to direction and to illumination variation, and it is based entirely on the local auto-correlation function of a signal, which measures the local changes of the signal when a patch is shifted by a small amount in various directions. Given a point (x, y) and a shift (∆x, ∆y), the local auto-correlation function is defined as:

$$c(x, y) = \sum_{w} \left[ I(x_i, y_i) - I(x_i + \Delta x, y_i + \Delta y) \right]^2 \tag{3}$$

where I denotes the image function and (x_i, y_i) are the points in the smooth circular window w centered on (x, y). The shifted image is approximated by a Taylor expansion truncated to the first-order terms as

$$I(x_i + \Delta x, y_i + \Delta y) \approx I(x_i, y_i) + \begin{bmatrix} I_x(x_i, y_i) & I_y(x_i, y_i) \end{bmatrix} \begin{bmatrix} \Delta x \\ \Delta y \end{bmatrix} \tag{4}$$

where I_x(.,.) and I_y(.,.) denote the partial derivatives in x and y, respectively. Substituting the right-hand side of (4) into (3) yields:

$$c(x, y) = \begin{bmatrix} \Delta x & \Delta y \end{bmatrix} M(x, y) \begin{bmatrix} \Delta x \\ \Delta y \end{bmatrix}, \qquad M(x, y) = \begin{bmatrix} \sum_{w} I_x^2 & \sum_{w} I_x I_y \\ \sum_{w} I_x I_y & \sum_{w} I_y^2 \end{bmatrix}$$

Let λ1 and λ2 be the eigenvalues of the matrix M(x, y). These eigenvalues describe the point of interest. If both λ1 and λ2 are small, no point of interest exists and the local auto-correlation function is flat. If λ1 is high and λ2 is low, an edge is found and the local auto-correlation function is ridge-shaped. If both λ1 and λ2 are high, a point of interest is found and the local auto-correlation function is sharply peaked: a shift in any direction produces a significant increase in c(x, y).
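The eigenvalue test described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's code: a uniform square window replaces the smooth circular window w, derivatives are taken with `np.gradient`, and the `threshold` separating "small" from "high" eigenvalues is a hypothetical tuning parameter.

```python
import numpy as np

def box_sum(a, radius):
    """Sum of `a` over a (2*radius+1) x (2*radius+1) window, zero padded."""
    padded = np.pad(a, radius, mode="constant")
    out = np.zeros(a.shape, dtype=np.float64)
    size = 2 * radius + 1
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + a.shape[0], dx:dx + a.shape[1]]
    return out

def structure_tensor_eigenvalues(image, radius=2):
    """Return (lam1, lam2), lam1 >= lam2, of M(x, y) at every pixel."""
    image = np.asarray(image, dtype=np.float64)
    iy, ix = np.gradient(image)  # derivatives along rows (y) and columns (x)
    sxx = box_sum(ix * ix, radius)
    syy = box_sum(iy * iy, radius)
    sxy = box_sum(ix * iy, radius)
    # Closed-form eigenvalues of the symmetric matrix [[sxx, sxy], [sxy, syy]].
    half_trace = 0.5 * (sxx + syy)
    root = np.sqrt(0.25 * (sxx - syy) ** 2 + sxy ** 2)
    return half_trace + root, half_trace - root

def classify(lam1, lam2, threshold):
    """Label each pixel 'flat', 'edge' or 'corner' from its eigenvalues."""
    labels = np.full(lam1.shape, "flat", dtype=object)
    labels[(lam1 > threshold) & (lam2 <= threshold)] = "edge"
    labels[(lam1 > threshold) & (lam2 > threshold)] = "corner"
    return labels
```

On a synthetic image containing a bright square, pixels at the square's corners are labelled `corner`, pixels along its sides `edge`, and interior or background pixels `flat`, which mirrors the three cases listed above.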

Optical flow model
This section describes the optical flow model used to analyze crowd behavior. The intensity parameter is not used when estimating optical flow; instead, we use alpha parameters to improve the result and enhance performance. Figure 1 shows normal crowd movement with a varying background. This background variation complicates both crowd pattern analysis and human detection, and the optical flow model is designed to eliminate the problem. Consider an image P with background B and foreground F. This image is simply a frame taken from the surveillance video, and the number of frames depends on the video duration.

Figure 1. Example of video frames in UMN dataset
At any time t, the image can be represented as I(p, q, t), where p and q are the pixel coordinates, and the image model is:

$$\mathcal{F}(p, q, t)\,\alpha(p, q, t) + \mathcal{B}(p, q, t)\,\bigl(1 - \alpha(p, q, t)\bigr) = I(p, q, t)$$

This model is useful for estimating the parameters of the image sequence. We assume a frame rate of 25 frames per second, i.e. (t ∈ …). The model assumes a stationary background. Moving objects are represented by higher brightness, so each background pixel is estimated by the lowest image brightness at that position:

$$b_1(p, q) = \min\bigl(I(p, q, 1), I(p, q, 2), \ldots, I(p, q, t)\bigr)$$

Here b1 is the extracted background pixel, which does not depend on the time interval. Using the value of b1, we can estimate the foreground image f1. If the number of frames varies, the frame intensity also varies, which affects both the image background and the foreground. This variation in intensity is computed in (11).
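The background estimate b1 and the image model can be sketched as below. This is a minimal sketch under assumptions: the definition of f1 is not recoverable from the text, so `estimate_foreground` uses a hypothetical choice (brightness above the background estimate, clipped at zero), and all function names are illustrative.

```python
import numpy as np

def estimate_background(frames):
    """b1(p, q): per-pixel minimum brightness over all frames."""
    return np.asarray(frames, dtype=np.float64).min(axis=0)

def estimate_foreground(frame, background):
    """Hypothetical f1: brightness above the background estimate."""
    diff = np.asarray(frame, dtype=np.float64) - background
    return np.clip(diff, 0.0, None)

def composite(foreground, background, alpha):
    """Image model: I = F * alpha + B * (1 - alpha)."""
    return foreground * alpha + background * (1.0 - alpha)
```

With a stationary background and brighter moving objects, the per-pixel minimum recovers the background as long as every pixel is uncovered in at least one frame.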
β denotes the intensity transformation. We now need to compute the optical flow of the humans or other moving objects in the crowd scene. In our work, we assume that the total space derivative and time derivative are stored in β, and each particle in a frame is denoted by s. We then apply a Taylor expansion to the input frame sequence at the image position (p, q, t), which gives (12), in which the p-subscripted term denotes the first-order derivatives with respect to the time and space data vector. The optical flow estimate is then given by (13). A complete flow diagram of the proposed model is shown in Figure 2.

First, the video sequence is selected and converted into frames. Each frame is taken as input at a specific time interval, and pre-processing is applied: the frames are converted to grayscale, and then noise removal and enhancement are performed.

The experimental results are shown in Figures 3 and 4. Figure 6 shows the point at which the abnormal activity starts. Next, we compare the proposed algorithm with the ground truth and the SFM model. We took 42 frames of the crowd sample, 44 frames of the courtyard sample and 36 frames of the corridor sample. Figures 7, 8 and 9 show the frame-by-frame comparison of the proposed approach with the ground truth and the SFM model.

These graphs show that our approach gives better results than the existing SFM algorithm. The ROC curve is a common technique for visualizing classifier performance. Here we use an SVM classifier, which is binary: there are only two classes, and a crowd is classified as either normal or abnormal. The SVM classifier has a success rate of 97%, which is better than the earlier SFM method. ROC curves for the different UMN dataset samples are shown in Figures 10, 11 and 12. Table 1 indicates that the proposed method performs better than the other methods.

Table 1. Comparison of AUC with state-of-the-art methods

Method                     AUC (%)
SFM [33]                   94.9
Chaotic invariants [34]    99.4
Sparse recons. [35]        99.6
Local statistics [36]      99.5
MDT [37]                   99.5
HOG+HOS [38]               97.02
LMVD [39]                  98.61
Our proposed method        99.71
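To make the ROC and AUC figures concrete, the sketch below computes an AUC from per-frame scores. It is an illustration under assumptions, not the evaluation code behind Table 1: a plain threshold on a per-frame optical flow score stands in for the SVM decision, and the AUC is computed with the rank (pairwise comparison) formulation. The function names and the example scores are hypothetical.

```python
import numpy as np

def classify_by_threshold(flow_scores, threshold):
    """Label a frame abnormal (1) when its flow score exceeds the threshold."""
    scores = np.asarray(flow_scores, dtype=np.float64)
    return (scores > threshold).astype(int)

def roc_auc(labels, scores):
    """AUC as the probability that an abnormal frame outscores a normal one.

    Ties between an abnormal and a normal frame count as half a win.
    """
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=np.float64)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one normal and one abnormal frame")
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return wins / (pos.size * neg.size)
```

For ground-truth labels `[0, 0, 1, 1]` and scores `[0.1, 0.4, 0.35, 0.8]`, three of the four abnormal/normal pairs are ranked correctly, so `roc_auc` returns 0.75; an AUC reported as 99.71 in Table 1 corresponds to 0.9971 on this scale.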

CONCLUSION
To increase the accuracy of abnormal crowd behavior detection, we proposed an optical flow method. First, we pre-process the image frames extracted from the video sample. Each frame is then used to build a motion heat map, which extracts the main regions of motion. Points of interest are extracted from those regions, which reduces the computational complexity. Once the points of interest are available, we compute the optical flow model, which describes the flow of motion of the crowd. After analyzing the optical flow model, we set a threshold, which is passed to the SVM classifier. If the optical flow value is less than the threshold, the classifier decides that the frame is normal; if it is greater than the threshold, the decision is abnormal. This classification achieves an AUC of 99.71% (Table 1).


Int J Elec & Comp Eng, Vol. 9, No. 4, August 2019: 2451-2459, ISSN: 2088-8708. Crowd recognition system based on optical flow along with SVM classifier (Shreedarshan.K)


Figure 3. Optical flow of the video sequence

Figure 4. Streak line flow of the video sequence
Figure 5. Normal frames
Figure 6. Abnormal frames

Figure 7. Comparison of the proposed algorithm with the ground truth and the existing algorithm (SFM) for the crowd dataset

Figure 8. Comparison of the proposed algorithm with the ground truth and the existing algorithm (SFM) for the courtyard dataset