Extraction of image resampling using correlation aware convolution neural networks for image tampering detection

ABSTRACT


INTRODUCTION
Advances in technology and the availability of image editing software that adopts artificial intelligence techniques make tampering detection challenging, because a tampered image can look very similar to the original image. An image can be tampered with through different means, broadly classified as content-preserving and content-changing [1]. Primary tampering attacks such as splicing, copy-clone, and object removal change the semantic representation of an image. In contrast, secondary tampering attacks such as compression, blurring, and contrast enhancement are of less concern because they do not change the meaning or structure of an image. Thus, this work focuses on detecting primary tampering attacks and on improving the accuracy with which tampered regions are localized at the pixel level.
State-of-the-art tampering detection methodologies have mainly focused on identifying whether an image has been tampered with or not [2], [3]. In [4], [5], the tampered region is localized at the pixel level. The methods in [6], [7] focused on localizing tampering at the patch level and added noise in the frequency domain [8], [9] of joint photographic experts group (JPEG) compressed images to improve resampling detection performance. Recently, a number of deep learning-based tampering detection methods [10]-[12] using convolutional neural networks (CNN) [13]-[15], long short-term memory (LSTM), and stacked auto-encoders (SAE) [16] have been presented. In media crime scene investigation, the majority of state-of-the-art tampering detection methodologies focus on detecting only certain types of tampering, such as splicing [17] and copy-clone [18], [19]. As a result, these methodologies cannot be used for detecting hybrid tampering. This paper aims at detecting hybrid tampering attacks and segmenting tampered regions by employing an improved convolution neural network [4].
ISSN: 2088-8708. Int J Elec & Comp Eng, Vol. 12, No. 3, June 2022: 3033-3043
Segmentation of tampered regions is a challenging task. Recently, CNN-based semantic segmentation methodologies [20], [21] have attracted wide attention. The work in [21] used a fully connected CNN for analyzing region shape and object content by extracting feature sets at different levels in a hierarchical manner. The CNN-based framework works very well in object detection [19] and segmentation [20], [21] because it learns and better understands the content of different segments. Unlike object segmentation, tampered segments may be objects copied from different regions of an image or objects that have been removed. A well-made tampered image closely resembles the authentic image [14]. Although a convolution neural network produces spatial maps for different segments of multimedia content, it generalizes very poorly across the different artifacts induced by different tampering methodologies. As a result, tampering region segmentation using a standard convolution neural network may not produce a good result. The work in [4] carried out a comparative analysis of various existing tampering region segmentation methodologies [20], [21] and showed that they do not perform well for object removal and copy-move tampering [22]-[25]. Image forgeries create certain artifacts, such as compression and resampling artifacts, which can be better learned using resampling features [6], [26]. Because of interpolation, resampling introduces periodic correlations between pixels. CNN-based tampering detection methodologies show good translational invariance in producing spatial maps across different segments of multimedia content, and certain artifacts are learned well using resampling feature sets [27], which can be utilized to locate tampered segments [28], [29]. From this extensive review, it can be seen that detecting resampling features of hybrid attacks within copy-clone attacks is a challenging task. The existing tampering detection methods [3], [18], [30] give poor results when a tampered image is noisy and also fail to detect tampered segments in small, smooth regions.
The major challenges of tampering detection are as follows. Detecting multiple copy-clone tampering within an image and distinguishing the source region from the tampered region is challenging. Detecting tampering in a small, smooth region is very difficult [31], [32]. Extracting resampling feature correlation along the horizontal and vertical directions using the standard CNN model is challenging. Extracting resampling features when the image is extremely noisy remains an open problem. It is extremely difficult to detect tampering when different types of tampering operation such as scaling, rotation, and compression are performed within a copy-clone attack [33]-[35].
The research hypothesis is that state-of-the-art tampering detection methodologies [36] using deep learning techniques are effective in detecting various types of tampering attacks. However, existing models predominantly achieve poor results when hybrid attacks are introduced into an image; for example, when a copy-clone attack is transformed by rotation, scaling, and compression. This is because existing models fail to learn the correlation among neighboring pixels. The working hypothesis is that effectively learning neighboring pixels and their correlation [37] yields effective resampling feature extraction for detecting a hybrid attack.
To overcome these research issues, this paper presents an improved CNN architecture, namely the correlation aware convolution neural network (CA-CNN) model, for extracting resampling features. To detect tampering in small and smooth segments, the image is resized [37], [38]. Then, even in a noisy environment, the resampling features can be extracted with good correlation using the CA-CNN architecture. Finally, these features are trained on different images, and a descriptor is constructed for detecting image tampering.
The contributions of this research work are as follows. This paper presents a correlation-aware convolution neural network for detecting tampering in an image. The CA-CNN model can exploit resampling feature correlation along the horizontal and vertical directions by introducing a correlation layer. The CA-CNN can detect multiple tampered regions within an image in a noisy environment and under different kinds of tampering operations such as scaling, rotation, and compression. Better tampering detection performance is achieved when the CA-CNN model is trained on diverse tampering datasets such as Coverage, media integration and communication center (MICC), and copy move forgery detection (CoMoFoD); no prior methodology has been evaluated on all these datasets together. The CA-CNN-based tampering detection method achieves better recall, precision, and F1-score than existing tampering detection methodologies.

RESAMPLING FEATURE-BASED TAMPERING DETECTION USING CORRELATION AWARE CONVOLUTION NEURAL NETWORK
This section presents the image tampering detection methodology using resampling features and convolution neural networks. First, preprocessing and resampling feature extraction for tampering detection are presented. Second, the extracted features are trained using the CA-CNN model shown in Figure 1 to detect whether an image has been tampered with and to segment the tampered region. The step-by-step process of the proposed resampling feature-based tampering detection using CA-CNN is shown in Algorithm 1.
Step 2. Load image.
Step 4. Extract resampling features with different dimensions using a noise-invariant layer.
Step 5. Extract high-level features in each patch in both horizontal and vertical directions.
Step 6. Common features among the horizontal and vertical directions are cumulated and aggregated.
Step 7. Aggregated features are fed into the SoftMax layer to classify whether the image is tampered with or not.
Step 8. Segment the tampered region.

Preprocessing and resampling feature detection and extraction
In general, images are tampered with using operations such as object removal, splicing, and copy-move. This tampering affects the statistical features along the edges of the forged segments. In [29], a resampling detection method is presented that uses affine transformation and the Laplacian operator to extract the resampling features of the respective patches. This work uses a similar methodology for extracting resampling features from a given image. First, the image is segmented into 64 non-overlapping patches (i.e., an 8*8 grid). For an image of size 512*512, each patch is then 64*64. Next, the Laplacian operator is used to produce the magnitude of the linear prediction error for the different patches [13]. Because periodic correlations exist among resampled signals, this work uses affine transformation to accumulate the errors over different angles of projection. Finally, the fast Fourier transform (FFT) is applied to identify the periodic characteristics of the resampling signal. Generally, the resampling feature sets can identify different kinds of resampling such as rotation, up- or down-sampling, and JPEG thresholding. To achieve a good tradeoff between increasing accuracy and reducing computational complexity, the image is resized to 512*512, which may induce certain artifacts such as up- or down-sampling and image quality variations. The work in [13] showed that the resampling feature can be utilized for classifying these artifacts. Resampling feature sets have previously been used for classifying patches; in this work, however, they are used for localization at the pixel level. To obtain a higher number of features, it is important to choose the patch size carefully. A resampling signal is established more easily in a larger patch because it contains a higher number of repeated features; however, small tampered segments then become difficult to localize. Existing resampling-based tampering detection methodologies extracted resampling features considering a block size of 8*8. In this work, however, the patch size is set to 32*32 to obtain more useful information. The main reason for using the resampling feature within patches is to establish the nature of the local artifacts caused by different tampering operations.
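The patch-level feature described above can be illustrated with a minimal sketch in plain Python. It is a simplification under stated assumptions, not the authors' implementation: the function names are hypothetical, the Laplacian uses the 4-neighbour stencil, only a single projection (column sums) stands in for the accumulation over several projection angles, and a naive DFT replaces an optimized FFT.

```python
import cmath

def split_into_patches(image, patch):
    """Split a 2-D list (H x W) into non-overlapping patch x patch blocks, row by row."""
    h, w = len(image), len(image[0])
    return [
        [row[c:c + patch] for row in image[r:r + patch]]
        for r in range(0, h - patch + 1, patch)
        for c in range(0, w - patch + 1, patch)
    ]

def laplacian_magnitude(p):
    """Absolute 4-neighbour Laplacian on interior pixels (border pixels stay 0)."""
    n, m = len(p), len(p[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            out[i][j] = abs(p[i - 1][j] + p[i + 1][j] + p[i][j - 1] + p[i][j + 1]
                            - 4 * p[i][j])
    return out

def column_projection(p):
    """Assumed stand-in for one projection angle: sum each column."""
    return [sum(row[j] for row in p) for j in range(len(p[0]))]

def dft_magnitude(signal):
    """Naive discrete Fourier transform magnitude; peaks indicate periodicity."""
    n = len(signal)
    return [
        abs(sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
        for k in range(n)
    ]

def resampling_feature(patch):
    """Laplacian error magnitude -> projection -> frequency spectrum."""
    return dft_magnitude(column_projection(laplacian_magnitude(patch)))
```

For a 512*512 image, `split_into_patches(image, 64)` returns the 64 patches of the 8*8 grid, and `resampling_feature` maps each patch to a 64-value spectrum.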
The outcome of the CNN mainly depends on how the patches are organized. The patches can be ordered in either the vertical or the horizontal direction; however, either ordering alone fails to capture relevant local feature information. If the patches are arranged in the vertical direction, patches that are horizontal neighbors are separated by a complete column of patches. Thus, processing takes a lot of time and the CNN fails to establish a good correlation among these patches. Similarly, traversing in the horizontal direction over the rows results in the same problem [19], [20]. Thus, to establish a good correlation across both directions, this work introduces an additional layer, namely the correlation layer.
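The separation caused by a single ordering can be made concrete with a small sketch. It assumes the 8*8 patch grid from the preprocessing step and a hypothetical helper, `sequence_gap`, that measures how far apart two spatially adjacent patches end up once the grid is flattened into a sequence.

```python
def sequence_gap(grid_size, order, direction):
    """Distance in the flattened sequence between a patch and its spatial neighbour.

    order: "row" (horizontal traversal) or "column" (vertical traversal).
    direction: "horizontal" or "vertical" neighbour.
    """
    def index(r, c):
        # Position of patch (r, c) in the flattened sequence.
        return r * grid_size + c if order == "row" else c * grid_size + r

    if direction == "horizontal":
        return abs(index(0, 1) - index(0, 0))
    return abs(index(1, 0) - index(0, 0))

for order in ("row", "column"):
    for direction in ("horizontal", "vertical"):
        print(order, direction, sequence_gap(8, order, direction))
```

With row ordering, horizontal neighbours are adjacent in the sequence but vertical neighbours are 8 positions apart; column ordering reverses this, which is why neither ordering alone keeps both kinds of neighbour close.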

Correlation aware-CNN based tampering detection methodology
In this work, a deep learning methodology is used for detecting resampling features; here, tampering detection is treated as a pattern classification problem. The architecture of the correlation aware-CNN (CA-CNN) for tampering detection is shown in Figure 1. The CA-CNN tampering detection methodology is composed of three layers. In the first, resampling features with different dimensions are captured using a noise-invariant layer; here the variance of neighboring pixels along the vertical and horizontal directions is captured. In the second, high-level features of the tampered segment are extracted in both the horizontal and vertical directions; to capture the association between the two directions, the vertical and horizontal features are correlated and aggregated. Lastly, the aggregated features are given as input to the SoftMax/sigmoid layer. The SoftMax layer is suited to multi-class tampering classification problems, and the sigmoid can be used for binary tampering classification. More details of the CA-CNN architecture for tampering detection are discussed below.

Noise elimination
Resampling feature detection is challenging because it generally relies on, or is affected by, the content of an image. However, recent work has shown that the resampling feature can be obtained from the redundant features of the spatial domain and does not depend on the content of an image. In this work, the noise is modeled as the residual between a pixel and its estimate obtained by interpolating its adjacent pixels. To model it, a new convolution layer is introduced; this layer is the first layer and is known as the noise-invariant layer. From Figure 1 it can be seen that two high-pass filters are selected as convolution kernels to reduce training overhead. These filters capture the variance of neighboring pixels in the horizontal and vertical streams. For example, an image of 256*256 pixels is initially convolved with 3*1 and 1*3 filters with padding and a stride of 1. This filter setting aids in learning noise features using the correlation among local pixels. Thus, the noise-invariant layer provides a noise map of prediction residuals of size 256*256*1.
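The residual computed by such a layer can be sketched as follows. The text only states that 3*1 and 1*3 high-pass filters are used with padding and a stride of 1; the kernel weights below (a pixel minus the mean of its two neighbours) and the use of zero padding are assumptions made for illustration, and `residual_map` is a hypothetical name.

```python
KERNEL = (-0.5, 1.0, -0.5)  # assumed high-pass weights: pixel minus mean of two neighbours

def residual_map(image, direction):
    """Apply the 3-tap high-pass filter along rows ("horizontal") or columns ("vertical").

    Zero padding of one pixel and stride 1 keep the output the same size as the input.
    """
    h, w = len(image), len(image[0])

    def pixel(r, c):
        # Zero padding outside the image (an assumption).
        return image[r][c] if 0 <= r < h and 0 <= c < w else 0.0

    out = []
    for r in range(h):
        row = []
        for c in range(w):
            if direction == "horizontal":   # 1*3 filter
                taps = (pixel(r, c - 1), pixel(r, c), pixel(r, c + 1))
            else:                           # 3*1 filter
                taps = (pixel(r - 1, c), pixel(r, c), pixel(r + 1, c))
            row.append(sum(k * t for k, t in zip(KERNEL, taps)))
        out.append(row)
    return out
```

On a 256*256 input, each call returns a 256*256 residual map. Wherever a pixel equals the average of its two neighbours along the filter direction, as in a linearly interpolated region, the residual is zero, so periodic interpolation leaves a periodic pattern in the map.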

Bidirectional sequence feature extraction
In this section, the high-level resampling features are extracted from the noise map obtained through the noise-invariant layer. Existing methods predominantly focus on extracting correlation features in one particular direction; thus, resampling feature detection performance is degraded. To address this, in this work the resampling features are extracted along both the horizontal and the vertical directions. These features are extracted independently, and the feature weights of the two directions are not shared. From Figure 1, it can be seen that both the vertical and horizontal sequences have five identical clusters. Each cluster is composed of four layers: batch normalization, convolution, pooling, and activation. The last cluster additionally receives a supplementary resampling feature obtained through the correlated sequence. Lastly, the features extracted through the different sequences are aggregated.

Correlation sequence feature extraction
This section aims at modeling better decision making (i.e., linear fusion) for extracting resampling behavior by merging bidirectional features. Thus, this paper presents an efficient correlation feature extraction method based on a correlated sequence composed of four distinct clusters. The first cluster is composed of batch normalization, convolution, and activation layers. The other clusters additionally contain a pooling layer. The features obtained from the first cluster of both sequences are aggregated and passed through 1*1 convolution kernels with stride 1 to obtain a linear feature fusion. The other three clusters are used for extracting high-level feature representations of the cumulated features. Lastly, the outcome (i.e., feature map) obtained through the cumulated sequence is interpolated back into the vertical and horizontal sequences. The correlation feature learning method extracts better features without affecting the feature extraction performance of either sequence.
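The linear fusion step can be illustrated with a short sketch: a 1*1 convolution with stride 1 over the stacked horizontal and vertical feature maps is a per-pixel weighted sum across channels. The function name `fuse_1x1`, the single output channel, and the example weights are assumptions for illustration; in the network the weights would be learned.

```python
def fuse_1x1(horizontal, vertical, weights, bias=0.0):
    """Fuse two lists of feature maps with a 1*1 convolution (one output channel).

    horizontal, vertical: lists of H x W feature maps from the two sequences.
    weights: one weight per stacked input channel.
    """
    channels = horizontal + vertical          # stack the two sequences channel-wise
    assert len(weights) == len(channels)
    h, w = len(channels[0]), len(channels[0][0])
    return [
        [bias + sum(wt * ch[r][c] for wt, ch in zip(weights, channels))
         for c in range(w)]
        for r in range(h)
    ]

fused = fuse_1x1(
    horizontal=[[[1.0, 2.0], [3.0, 4.0]]],
    vertical=[[[10.0, 20.0], [30.0, 40.0]]],
    weights=[0.5, 0.5],
)
print(fused)  # [[5.5, 11.0], [16.5, 22.0]]
```

Because the kernel covers a single pixel, the fusion mixes the two directions without blurring spatial detail, which matches the description of it as a linear feature fusion.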

Classification
Here we present a fully connected layer using the sigmoid/SoftMax function that takes the final feature extracted from the previous layer as its input. The classifier yields the probability that a given feature belongs to each category, and the most probable category is the outcome of the CNN classifier. This is expressed through (1) and (2):

$$P(y = 1 \mid x) = \frac{1}{1 + e^{-x}} \tag{1}$$

$$P(y = i \mid x) = \frac{e^{x_i}}{\sum_{j} e^{x_j}} \tag{2}$$

where (1) is the sigmoid function applied to binary tampering detection classification problems, $x$ represents the outcome of the neuron of the fully connected layer, and $P(y = 1 \mid x)$ represents the probability that $x$ falls into the positive cluster. Equation (2) defines the SoftMax function for multiple tampering detection, where $x_i$ represents the outcome of the $i$-th neuron of the fully connected layer and $P(y = i \mid x)$ represents the probability that $x$ falls into the $i$-th cluster.
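The sigmoid and SoftMax outputs described above can be sketched in a few lines of Python. This is an illustrative sketch using the standard definitions of both functions; the function names and the example scores are assumptions, and the max-subtraction in `softmax` is a common numerical-stability device rather than something the paper specifies.

```python
import math

def sigmoid(x):
    """Binary case: probability of the positive class from one neuron output."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)                 # avoids overflow for large negative x
    return e / (1.0 + e)

def softmax(xs):
    """Multi-class case: probabilities over the fully connected layer outputs."""
    m = max(xs)                     # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

scores = [1.0, 2.0, 3.0]            # hypothetical neuron outputs
probs = softmax(scores)
predicted = probs.index(max(probs)) # the most probable category is the classifier outcome
```

The predicted category is the index with the highest probability, which corresponds to the "most probable category" in the description above.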

Convolution layer
The convolution layer is used for extracting features as described in (3):

$$F_j^{(n)} = \sum_{i} F_i^{(n-1)} * W_{ij}^{(n)} + B_j^{(n)} \tag{3}$$

where $*$ denotes the two-dimensional convolution, $W_{ij}^{(n)}$ represents the $i$-th channel of the $j$-th convolution kernel within the $n$-th layer, $F_i^{(n-1)}$ defines the $i$-th feature map extracted within the $(n-1)$-th layer, $F_j^{(n)}$ represents the $j$-th feature map constructed within the $n$-th layer, and $B_j^{(n)}$ defines the $j$-th bias term of the $n$-th layer. Here we use three convolution layers with kernel sizes of 1*1, 3*3, and 5*5, with the stride fixed to 1.
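A direct, unoptimized rendering of this layer in plain Python is sketched below: each output feature map is the sum over input channels of the input map filtered by the matching kernel channel, plus a bias. The function name `conv2d`, the zero padding that keeps the output size, and the use of cross-correlation (no kernel flip, as is usual in CNN frameworks) are assumptions for illustration.

```python
def conv2d(feature_maps, kernels, biases):
    """Multi-channel 2-D convolution layer with stride 1.

    feature_maps: C_in maps of size H x W (previous layer).
    kernels: C_out kernels, each with C_in channels of size k x k (k odd).
    biases: C_out bias terms.
    Zero padding (assumed) keeps each output map at H x W.
    """
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    k = len(kernels[0][0])
    pad = k // 2
    out = []
    for kernel, bias in zip(kernels, biases):
        fmap = [[bias] * w for _ in range(h)]
        for channel, weights in zip(feature_maps, kernel):   # sum over input channels
            for r in range(h):
                for c in range(w):
                    acc = 0.0
                    for dr in range(k):
                        for dc in range(k):
                            rr, cc = r + dr - pad, c + dc - pad
                            if 0 <= rr < h and 0 <= cc < w:
                                acc += weights[dr][dc] * channel[rr][cc]
                    fmap[r][c] += acc
        out.append(fmap)
    return out
```

The same function covers the 1*1, 3*3, and 5*5 kernel sizes mentioned above, since the padding is derived from the kernel size.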

CNN batch normalization
During training, the feature maps computed by the convolution layer must be normalized to compensate for variations of the data distribution in the middle layers. To do so, a batch normalization layer is introduced between the convolution and activation layers. The batch normalization process is mathematically represented using (4) to (7). The mean of all data within the batch is computed using (4):

$$\mu = \frac{1}{m} \sum_{i=1}^{m} x_i \tag{4}$$

where $\mu$ depicts the mean, $m$ represents the feature size considered within the batch, and $x_i$ defines the $i$-th data within the batch. Similarly, the variance of all features within the batch is computed using (5):

$$\sigma^2 = \frac{1}{m} \sum_{i=1}^{m} (x_i - \mu)^2 \tag{5}$$

where $\sigma^2$ depicts the variance. Then every feature is normalized to generate a new set of features $\hat{x}_i$ with variance and mean set to 1 and 0, respectively. The $\hat{x}_i$ is computed as (6):

$$\hat{x}_i = \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}} \tag{6}$$

where $\epsilon$ is a small floating-point number greater than 0, which prevents division-by-zero errors. The optimized feature is defined using (7):

$$y_i = \gamma \hat{x}_i + \beta \tag{7}$$

where $\gamma$ and $\beta$ are parameters learned by the CNN, and $y_i$ represents the $i$-th outcome of the batch normalization layer. The optimized feature is then passed to a non-linear activation function for better feature representation. By introducing the batch normalization layer, large shifts in one layer caused by small changes in an adjacent layer are suppressed.
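The four batch normalization steps (batch mean, batch variance, normalization, then scale and shift) can be sketched as a single function. This is a minimal illustration using the standard batch normalization definition over a flat list of values; the function name, the default values of `gamma`, `beta`, and `eps`, and the example batch are assumptions.

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a batch of scalar features, then apply the learned scale and shift."""
    m = len(batch)
    mu = sum(batch) / m                                   # batch mean
    var = sum((x - mu) ** 2 for x in batch) / m           # batch variance
    x_hat = [(x - mu) / math.sqrt(var + eps) for x in batch]  # zero mean, unit variance
    return [gamma * v + beta for v in x_hat]              # scale and shift

normalized = batch_norm([1.0, 2.0, 3.0, 4.0])
```

With `gamma = 1` and `beta = 0`, the output has mean 0 and variance very close to 1; the small `eps` term keeps the division defined even when every value in the batch is identical.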

CNN activation
Here the activation layer is composed of a non-linear function. To improve tampering detection accuracy, the resampling feature sets extracted by the convolution layer are transformed into a different space. Generally, the rectified linear unit (ReLU), sigmoid, and TanH are used in the activation layer. TanH is preferred over sigmoid in most applications because its output is centered on zero. ReLU is much faster than TanH, but its training accuracy is poor when the learning rate is large. TanH, on the other hand, increases smoothly with the feature value and thus gives effective results for features with large variance. As a result, in this work the TanH function is used in the activation layer.

Pooling layer
In this layer, the feature maps are down-sampled to reduce their size. Further, pooling captures hierarchical patterns by accumulating the observation windows of successive convolution layers. Here we use the average pooling (AP) and max pooling (MP) functions. In AP, each feature map is down-sampled to a single value by averaging, and AP replaces the fully connected layer to reduce the number of model parameters. Note that AP is used only for the last pooling layer of both the vertical and horizontal sequences. MP outputs the maximum value of every input window and is applied to all pooling layers except the fifth layer of both the horizontal and vertical sequences. The kernel size of max pooling is set to 3*3 with a stride of 2 to capture the pattern of the pixels adjacent to each pixel. The proposed tampering detection using the CA-CNN framework achieves a better detection and segmentation outcome than the traditional CNN-based tampering detection methodology, as shown experimentally below.
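The two pooling operations can be sketched as follows. The 3*3 window and stride of 2 follow the text; the absence of padding in `max_pool` and the function names are assumptions for illustration.

```python
def max_pool(fmap, size=3, stride=2):
    """Max pooling with a size x size window and the given stride (no padding assumed)."""
    h, w = len(fmap), len(fmap[0])
    return [
        [max(fmap[r + dr][c + dc] for dr in range(size) for dc in range(size))
         for c in range(0, w - size + 1, stride)]
        for r in range(0, h - size + 1, stride)
    ]

def global_average_pool(fmap):
    """Average pooling that reduces a whole feature map to a single value."""
    values = [v for row in fmap for v in row]
    return sum(values) / len(values)

fmap = [[float(r * 5 + c) for c in range(5)] for r in range(5)]
print(max_pool(fmap))             # [[12.0, 14.0], [22.0, 24.0]]
print(global_average_pool(fmap))  # 12.0
```

Because the stride (2) is smaller than the window (3), neighbouring windows overlap by one pixel, so each pooled value still reflects the pixels adjacent to its centre.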

RESULTS AND DISCUSSION
Here experiments are carried out to evaluate the tampering detection performance of the proposed CA-CNN method and of existing CNN-based tampering detection methodologies on different datasets. Performance is evaluated using the MICC-F600, Coverage, and CoMoFoD datasets. These datasets are widely used by recent tampering detection methods for validating performance.
The CA-CNN model is implemented using Python, C++, and MATLAB libraries. The performance of CA-CNN and the existing tampering detection methods is evaluated in terms of the true positive rate (TPR) (i.e., recall), F1-score, and false positive rate (FPR). To verify the performance of the proposed CA-CNN-based image forensics, the experimental results are compared with existing tampering detection methodologies [1], [8], [9], [26], [30] under forgeries including copying and translation, scaling, rotation, and compression.
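For reference, the evaluation metrics named above can be computed from binary predictions and ground truth as in the following sketch, which uses the standard definitions of TPR, FPR, precision, and F1-score. The function name and the toy labels are hypothetical; 1 marks a tampered pixel (or image) and 0 an authentic one.

```python
def detection_metrics(pred, truth):
    """Compute TPR (recall), FPR, precision, and F1-score from binary labels."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"tpr": recall, "fpr": fpr, "precision": precision, "f1": f1}

metrics = detection_metrics(pred=[1, 1, 0, 0, 1], truth=[1, 0, 0, 1, 1])
```

In this toy case there are two true positives, one false positive, one false negative, and one true negative, giving a TPR of 2/3 and an FPR of 1/2.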

Performance evaluation on MICC dataset
Here the experiment is conducted using the MICC-F600 dataset. The dataset is composed of 440 original images and 160 tampered images. The tampering segmentation outcomes achieved using the proposed CA-CNN and the existing tampering detection model are shown in Figure 2. Figure 2(a) shows the original image, Figure 2(b) shows the respective ground truth of the tampered region, and the segmentation outcomes achieved using the existing and CA-CNN tampering models are shown in Figures 2(c) and 2(d), respectively. Further, the accuracy of the proposed CA-CNN-based tampering detection method compared with the existing tampering detection method is shown in Table 1. From Figure 2 it can be seen that the proposed CA-CNN model achieves better tampering region segmentation outcomes than the existing models. The results show that the proposed CA-CNN-based tampering detection method outperforms the existing tampering detection method in terms of recall/TPR, FPR, and F1-score on the MICC-F600 dataset. Thus, the proposed CA-CNN-based tampering detection method is robust in detecting forged segments under rotation and scaling.

Performance evaluation on coverage dataset
Here the experiment is carried out using the Coverage dataset. The dataset is very challenging: it contains 100 copy-move tampered images and the corresponding original images with similar but genuine objects. The tampering segmentation outcomes achieved using the proposed CA-CNN and the existing tampering detection models are shown in Figures 3 and 4. Further, the accuracy of the proposed CA-CNN-based tampering detection method compared with the existing tampering detection methods is shown in Table 2. From Figure 3 we can see that CA-CNN achieves better tampering segmentation outcomes for all images except image 3. Figure 4(a) shows the original image, Figure 4(b) shows the respective ground truth of the tampered region, and the segmentation outcomes achieved using BusterNet, STRDNet (source/target region distinguishment network), and the CA-CNN tampering model are shown in Figures 4(c), 4(d), and 4(e), respectively. Similarly, in Figure 4 we can see that CA-CNN achieves very good tampering segmentation outcomes for image 1 and less accurate outcomes for image 2. Overall, the proposed CA-CNN model achieves better tampering region segmentation outcomes than the existing models. From Table 2, the proposed CA-CNN-based method achieves accuracy comparable to the existing methods (0.8563 against 0.8488 to 0.8581) and a higher F1-score (0.7456 against 0.464 for BusterNet and 0.677 for STRDNet) on the Coverage dataset.

Method               Accuracy   F1-score
[9]                  0.8581     -
Base-Ada-Atten [9]   0.8542     -
AR-Net [8]           0.8488     -
BusterNet [1]        -          0.464
STRDNet [26]         -          0.677
CA-CNN               0.8563     0.7456

Performance evaluation on CoMoFoD dataset
Here the experiment is carried out using the CoMoFoD dataset. The dataset contains 200 base tampered images. To hide the traces of manipulation, each base image undergoes 25 post-processing methods. Further, the accuracy of the proposed CA-CNN-based tampering detection method compared with the existing tampering detection method is shown in Table 3. From Figure 5 we can see that CA-CNN achieves better tampering segmentation outcomes for all images except image 1. Overall, the proposed CA-CNN model achieves better tampering region segmentation outcomes than the existing models. The results show that the proposed CA-CNN-based tampering detection method outperforms the existing tampering detection method in terms of recall/TPR, precision, and F1-score on the CoMoFoD dataset.

CONCLUSION
This paper presented a robust tampering detection method using the correlation-aware CNN model. The CA-CNN-based tampering detection methodology can effectively classify forged and non-forged segments and can semantically segment the forged region. The CA-CNN model retains spatial features by using resampling features among different patches and establishes correlation among tampered and non-tampered patches by employing correlation layers. These resampling features are then aggregated to eliminate spatial dependencies, and a descriptor is built for the whole image. Experiments were conducted on the standard MICC-F600, D0, Coverage, and CoMoFoD datasets, which include different copy-clone, scaling, rotation, and compression operations. The results show that the CA-CNN-based tampering detection model achieves a better true positive rate, false positive rate, F1-score, and accuracy than the existing tampering detection models. Future work will evaluate the accuracy of the proposed tampering model at the pixel level, carry out a comparative analysis against existing tampering detection methodologies, evaluate the model on more diverse datasets, and improve tampering detection and segmentation performance.

Figure 1. The architecture of CA-CNN for image tampering detection

Table 1. Comparative analysis of proposed CA-CNN-based tampering detection method over existing tampering detection method for MICC-F600 dataset

Table 2. Comparative analysis of proposed CA-CNN-based tampering detection method over existing tampering detection method for Coverage dataset

Table 3. Comparative analysis of proposed CA-CNN-based tampering detection method over existing tampering detection method for CoMoFoD dataset