A dilution-based defense method against poisoning attacks on deep learning systems

ABSTRACT


INTRODUCTION
In conjunction with recent remarkable achievements in the field of deep learning (DL), there is also active research on adversarial attacks targeting DL systems and models [1]-[5]. As the threat to DL models increases, ensuring the security and stability of artificial intelligence systems against poisoning attacks becomes essential [6]-[9]. For example, the generative pre-trained transformer 3 (GPT-3) model, on which ChatGPT is based, was built from data collected from the internet, and contaminated data had to be filtered out before the collected data could be used for training [10]. During the model training phase, poisoning attacks introduce contaminated data into the training dataset, resulting in a flawed model. Therefore, defending the training data against such attacks is crucial for the model's accuracy and reliability. Defense methods against poisoning attacks can be broadly categorized into two approaches: enhancing the model's robustness, or detecting and removing contaminated data [11]-[17]. The detection approach determines whether each training sample is normal and removes abnormal samples before training [16], [17]. However, with the continuous emergence of new state-of-the-art attacks, it remains difficult to perfectly distinguish normal from contaminated data using detection methods alone. Therefore, in this study, a dilution-based defense technique is proposed under the assumption that contaminated data cannot be perfectly differentiated by a detection method. The aim of our dilution-based defense method is to weaken the attack components of contaminated data by increasing the amount of clean data during the training phase.
To the best of our knowledge, no prior work defends against poisoning attacks by increasing the amount of training data to dilute the contaminated components. However, there is related research that defends against poisoning attacks by building attack-resistant models through augmentation of the training data [12]-[15]. To distinguish between these data augmentation techniques and our proposed dilution-based defense, we divide the defense timeline into two stages: before the poisoning attack (pre-attack) and after the poisoning attack (post-attack) [11]. In the pre-attack stage, clean data can be used because no poisoning attack has occurred yet, so it can be exploited to augment the data and strengthen the model [11]-[15]. In the post-attack stage, in contrast, clean data cannot be relied upon, and it is difficult to distinguish normal data from contaminated data. Data augmentation techniques can still be applied in the post-attack stage; however, because contaminated data cannot be perfectly distinguished, augmenting the collected (possibly poisoned) data is expected to be less effective than adding clean data, especially when the proportion of contaminated data in the training set is high. To verify this, we conducted experiments comparing our proposed method with an existing method.
The main contributions of this paper are as follows. First, to the best of our knowledge, this is the first study to propose a dilution-based defense mechanism against poisoning attacks on DL systems. Specifically, we duplicate innocuous clean data in the training dataset and then build a DL model on it. As a result, our proposed method lowers the influence of contaminated data included in the training dataset and thus significantly reduces the impact of data poisoned by adversarial attackers in transfer learning environments. Second, we demonstrated the effectiveness of our dilution-based defense method against poisoning attacks through extensive experiments. According to the experimental results, our dilution-based defense method increased the classification accuracy of a DL model by at most 9.7%p compared to a DL model with no defense mechanism, and by 20.9%p compared to a DL model with an existing defense method (Cutmix data augmentation). Furthermore, the attack success rate (ASR) of a backdoor attack decreased by 33.5%p.
The rest of this paper is organized as follows: in section 2, we overview the background knowledge and introduce existing studies. In section 3, we design our proposed method based on an analysis of general poisoning attacks. In section 4, we conduct extensive experiments and analyze the results. Finally, we conclude with future research directions in section 5.

RELATED WORKS

Poisoning attacks
Poisoning attacks occur during the transfer learning process when training data are collected from outside sources that cannot be completely trusted [18], [19]. If the collected dataset contains contaminated data (i.e., a poisoned dataset), the DL model trained on it is also contaminated and thus behaves abnormally. We explain the four representative poisoning attack techniques considered in this study as follows; Figure 1 shows examples of the four types. Figure 1(a) is a dirty-label poisoning attack that changes the label, which is the simplest poisoning attack. Figure 1(b) is a clean-label poisoning attack, while Figures 1(c) and 1(d) are examples of backdoor attacks that apply the dirty-label and clean-label settings, respectively.
Poisoning attacks can be classified into dirty-label attacks and clean-label attacks depending on whether the label of a poisoned sample is falsified. First, the dirty-label attack is one in which the attacker changes the label of training data to reduce the accuracy of the model, as shown in Figure 1(a) [19]. Second, the clean-label attack generates adversarial examples by adding perturbations to existing training images without changing the labels, as shown in Figure 1(b) [19], [20]. The clean-label attack is called an invisible attack because human eyes can hardly detect the changes in the poisoned adversarial images, and the original label is kept as it is [19].
The concepts of the dirty-label attack and the clean-label attack can also be extended to backdoor attacks. The backdoor attack is a special type of poisoning attack that inserts a trigger inducing a specific behavior into the training data, as shown in Figures 1(c) and 1(d) [19]. This attack forces a DL model to perform specific behaviors, such as misclassifying inputs containing the trigger, according to the attacker's intention [19]. The clean-label backdoor attack, as shown in Figure 1(d), inserts perturbations into the training data while maintaining the original labels [21]. In subsection 4.1, clean-label poisoning attacks and clean-label backdoor attacks are employed as attack methods.
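To make the two attack styles concrete, the following is a minimal NumPy sketch (our illustration, not taken from the paper; variable names and the trigger shape are assumptions) of how a dirty-label poisoned sample and a backdoor-style sample are typically constructed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a CIFAR-10-like sample: 32x32 RGB image in [0, 1].
image = rng.random((32, 32, 3)).astype(np.float32)
label = 3          # original (true) class
target_label = 5   # class the attacker wants the model to predict

# Dirty-label poisoning: keep the image, falsify the label.
dirty_image, dirty_label = image.copy(), target_label

# Backdoor-style poisoning: stamp a small trigger patch in a corner.
# A dirty-label backdoor also changes the label (Figure 1(c) style);
# a clean-label backdoor keeps the label and perturbs the image (Figure 1(d) style).
backdoor_image = image.copy()
backdoor_image[-4:, -4:, :] = 1.0          # 4x4 white trigger patch
backdoor_label_dirty = target_label
backdoor_label_clean = label               # label unchanged

print(dirty_label, backdoor_label_dirty, backdoor_label_clean)
```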

Existing defense methods
Existing defense methods against poisoning attacks are based on data augmentation (DA) techniques as follows. First, Borgnia et al. [12] proposed a method to defend against poisoning attacks by enhancing the robustness of the model using the Cutmix DA technique [13]; Cutmix has been used as an augmentation technique to defend against poisoning attacks [12], [13]. Figure 2 shows a poisoning attack defense method that applies a conventional data augmentation technique. Specifically, Borgnia et al. [12] generated an augmented dataset from the authenticated dataset using the Cutmix technique and trained the model on it; both datasets contain the same number of samples, 50,000 [10], [20]. As a result, models trained with such methods have shown lower success rates for poisoning attacks and higher classification accuracy [20]. In addition, Veldanda et al. [14] proposed a data augmentation technique that adds noise to the training data during the pre-processing stage to defend against BadNets that attackers may plant in data downloaded from the internet. Qiu et al. [15] used 71 data augmentation techniques to transform images in the training data during both the training and inference phases. As a result, they showed that this technique effectively mitigated eight types of backdoor attacks and demonstrated superior performance compared to five existing defense methods.
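For reference, the following is a minimal NumPy sketch of the CutMix operation underlying this class of defenses (our illustration, not the implementation from [12] or [13]; names and parameters are assumptions): a rectangular patch from one training image is pasted into another, and the two labels are mixed in proportion to the retained area.

```python
import numpy as np

rng = np.random.default_rng(0)

def cutmix(img_a, label_a, img_b, label_b, alpha=1.0):
    """Paste a random patch of img_b into img_a and mix the one-hot labels."""
    h, w, _ = img_a.shape
    lam = rng.beta(alpha, alpha)                  # sampled mixing ratio
    cut_h, cut_w = int(h * np.sqrt(1 - lam)), int(w * np.sqrt(1 - lam))
    cy, cx = rng.integers(h), rng.integers(w)     # patch center
    y1, y2 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    x1, x2 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)

    mixed = img_a.copy()
    mixed[y1:y2, x1:x2, :] = img_b[y1:y2, x1:x2, :]
    lam_adj = 1 - ((y2 - y1) * (x2 - x1)) / (h * w)  # fraction kept from img_a
    mixed_label = lam_adj * label_a + (1 - lam_adj) * label_b
    return mixed, mixed_label

# Two hypothetical CIFAR-10-like samples with one-hot labels.
img_a, img_b = rng.random((32, 32, 3)), rng.random((32, 32, 3))
lab_a, lab_b = np.eye(10)[3], np.eye(10)[5]
aug_img, aug_lab = cutmix(img_a, lab_a, img_b, lab_b)
print(aug_img.shape, aug_lab)
```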
Most DA techniques focus on modifying images before training, at stage 1, to remove any adversarial components in the training data. However, applying DA techniques after a poisoning attack launched at stage 2 has two limitations. First, data collected from outside the system cannot be entirely trusted, so applying augmentation techniques to these data may not be very effective for defense. Second, as the proportion of contaminated data in the training data increases, the defense effectiveness decreases because the risk of the trained model increases; details are described in subsection 3.2 [12], [14].

Our approach: dilution-based defense method
We propose a dilution-based defense mechanism against poisoning attacks on training data collected from outside the system, specifically at stage 2, even if the data contain poisoned samples. We consider the following attack scenario. As shown in Figure 3, we assume that a DL model is trained with training data collected from outside the system after a poisoning attack launched at stage 2, and that the ratio of contaminated data is unknown. Since, by our earlier assumption, no classifier can perfectly identify the contaminated data, a detection technique alone cannot defend the DL model. Therefore, to reduce the impact of contaminated data, we generate additional clean data D_clean and add it to the collected data. Through this dilution, we expect the success rate of poisoning attacks to decrease.
The design of our dilution-based defense mechanism is illustrated in Figure 3. To reduce the proportion of poisoned data in the newly collected data, we generate a clean dataset D_clean using various techniques, for example by simply copying existing clean data or by using a deep convolutional generative adversarial network (DCGAN). After that, we add D_clean to the collected data and train a DL model on their union.
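The following is a minimal sketch of the dilution step under our reading of Figure 3 (variable names, shapes, and sizes are our assumptions, not the paper's): clean samples are duplicated m times and concatenated with the possibly poisoned collected data before training.

```python
import numpy as np

def dilute(collected_x, collected_y, clean_x, clean_y, m):
    """Append m copies of the clean data to the (possibly poisoned) collected data."""
    diluted_x = np.concatenate([collected_x] + [clean_x] * m, axis=0)
    diluted_y = np.concatenate([collected_y] + [clean_y] * m, axis=0)
    perm = np.random.permutation(len(diluted_x))   # shuffle before training
    return diluted_x[perm], diluted_y[perm]

# Hypothetical sizes: 1,000 collected samples, 1,000 held clean samples, m = 3.
rng = np.random.default_rng(0)
collected_x, collected_y = rng.random((1000, 32, 32, 3)), rng.integers(0, 10, 1000)
clean_x, clean_y = rng.random((1000, 32, 32, 3)), rng.integers(0, 10, 1000)
train_x, train_y = dilute(collected_x, collected_y, clean_x, clean_y, m=3)
print(train_x.shape)   # (4000, 32, 32, 3); the poisoning rate is divided by m + 1 = 4
```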
The expected benefits of the dilution-based defense mechanism are as follows. First, if the dilution defense is applied at stage 2, after a poisoning attack has been launched, the effectiveness of the poisoning attack is expected to decrease, resulting in an increase in the model's classification accuracy. In addition, since conventional data augmentation techniques do not focus on the amount of data, the dilution-based defense technique is expected to perform better as the amount of contaminated data increases.

Simple analysis of decreasing attack risk by the proposed method
The reason why the accuracy increases under the dilution-based defense method can be predicted by examining the change in the risk R. In poisoning attacks, the overall risk R can be expressed as in (1):

R = R_clean + λ (|D_p| / |D|) R_adv   (1)

where R_clean is the risk on normal data, R_adv is the risk induced by poisoning attacks, λ is a non-negative hyperparameter, D is the total training dataset, D_p ⊂ D is the poisoned subset, and |D_p| / |D| is the poisoning rate [19]. Since, by our assumption, the contaminated data cannot be detected, the detection risk term is eliminated [19].
When we apply the dilution-based defense method proposed in this paper, R_adv is maintained while R_clean and the poisoning rate |D_p| / |D| decrease. This is because, as the amount of normal data increases, R_clean converges to a lower minimum through the optimization process during training. Additionally, since the overall amount of data |D| increases, the poisoning rate |D_p| / |D| decreases proportionally. However, since the number of contaminated samples |D_p| remains unchanged, R_adv is maintained. Therefore, with our dilution-based defense method, the overall risk R decreases in the DL training stage and the classification accuracy improves.
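As a quick numerical illustration of (1) under assumed toy values (ours, not from the paper): suppose |D| = 1,000 collected samples of which |D_p| = 200 are poisoned, λ = 1, and R_adv = 1. Adding m copies of 1,000 clean samples leaves |D_p| unchanged but grows |D|, so the poisoning-rate term, and with it the attack portion of the risk, shrinks roughly as 1/(m + 1).

```python
# Assumed toy values for the terms in (1); R_clean is held fixed here,
# although in practice it also decreases as more clean data are added.
lam, r_clean, r_adv = 1.0, 0.10, 1.0
d_p, d_collected, d_clean = 200, 1_000, 1_000

for m in range(0, 4):
    d_total = d_collected + m * d_clean        # |D| after adding m clean copies
    poisoning_rate = d_p / d_total             # |D_p| / |D|
    risk = r_clean + lam * poisoning_rate * r_adv
    print(f"m={m}: poisoning rate={poisoning_rate:.3f}, overall risk R={risk:.3f}")
# m=0 gives R=0.300; m=3 gives R=0.150: the attack term falls as |D| grows.
```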

Experimental purpose and setup
The main experimental purpose is to verify the effectiveness of our proposed dilution-based defense method against poisoning attacks and to compare its performance with an existing method. For performance comparison, we use classification accuracy and the attack success rate (ASR) as evaluation metrics. Classification accuracy is the ratio of correctly classified normal and abnormal data, while ASR is the ratio of successful attacks out of the total number of poisoning attack trials against the target model. Thus, higher classification accuracy indicates better defense performance, while a lower ASR indicates a better defense effect. For clean-label poisoning attacks, classification accuracy was measured. For clean-label backdoor attacks, the classification accuracy remains almost constant, so ASR was used.
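Both metrics can be computed directly from model predictions; the snippet below is a minimal illustration with made-up prediction arrays (names and values are our assumptions).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical test-set labels and predictions for classification accuracy.
y_true = rng.integers(0, 10, size=1000)
y_pred = np.where(rng.random(1000) < 0.9, y_true, rng.integers(0, 10, size=1000))
accuracy = np.mean(y_pred == y_true)

# Hypothetical predictions on triggered inputs for ASR: an attack "succeeds"
# when a triggered input is classified as the attacker's target class.
target_class = 5
pred_on_triggered = rng.integers(0, 10, size=400)
asr = np.mean(pred_on_triggered == target_class)

print(f"classification accuracy = {accuracy:.3f}, ASR = {asr:.3f}")
```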
The experiment was designed according to these objectives as follows. First, to verify the effectiveness of the dilution defense method, we measured the changes in accuracy according to the poisoning rate and the dilution rate. Second, to compare with the existing DA-based defense method using Cutmix (DA-Cutmix), we applied the DA-Cutmix method at stage 1 and compared its performance with our dilution-based defense method. For the implementation, we used an Anaconda virtual environment based on Python 3.9 and the TensorFlow 2.10 framework with the Adversarial Robustness Toolbox, and the experiments were run on an Intel Core i9-12900K CPU and a GeForce RTX 3090 GPU with 24 GB of memory [22]-[24]. The specific experimental setup is described as follows.
− Target DL model and dataset: to construct the attack target DL model, we used a ResNet model trained on the CIFAR-10 dataset, which is commonly used in poisoning attack and defense research [25], [26]. The CIFAR-10 dataset consists of 32×32-pixel color images in 10 classes such as airplanes, birds, and horses, with 50,000 training images and 10,000 test images. The ResNet model, a convolutional neural network (CNN) that uses residual blocks to reduce information loss during training, has shown high performance in image recognition [26]. To align with an existing method and experimental setup, we use ResNet-50 with 0.47 million parameters [12], [26].
− Poisoning attack methods: to taint the target model, we used clean-label poisoning and clean-label backdoor attacks, as shown in Figure 1 [20], [21]. We created collected datasets containing contaminated data generated with the two poisoning attack techniques and trained the contaminated model on them at various poisoning rates. For clean-label poisoning attacks, we used poisoning rates of 0%, 20%, 40%, 60%, 80%, and 100%, and the experiment is evaluated by the average accuracy over the 10 classes. The experiment with the clean-label backdoor attack (a targeted attack) is evaluated by the ASR for the one targeted class.
− Constructing training dataset for evaluation: to measure the performance of the dilution-based defense technique for each additional data ratio, we constructed the training data as follows (see the sketch after this list). The number of duplications of D_clean in our dilution method is denoted by m, and the proportion of contaminated data in the newly collected data is indicated by the poisoning rate; |D_clean| = 1,000. For the clean-label poisoning attack, we duplicated D_clean up to 9,000 samples by varying m from 1 to 9. For the clean-label backdoor attack, we duplicated D_clean up to 20,000 samples by varying m from 1 to 20.
− Comparison of performance with DA-Cutmix: to compare against our proposed dilution defense technique, we measured the performance of an existing DA technique. As shown in Figure 2, we applied the Cutmix DA technique at stage 1 to generate a defended model and measured the changes in its accuracy when it is subjected to a poisoning attack [12]. We then compared the performance of this data augmentation technique with that of our dilution defense technique [12].
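The sketch below illustrates, under our assumptions about the setup described above (pool sizes, names, and the collected-set size are ours), how a collected set with a given poisoning rate is combined with m duplicates of 1,000 clean samples to form one training configuration of the evaluation grid.

```python
import numpy as np

rng = np.random.default_rng(0)

def build_training_set(clean_pool_x, clean_pool_y, poisoned_x, poisoned_y,
                       poisoning_rate, m, collected_size=1000, clean_unit=1000):
    """Mix poisoned and benign samples at `poisoning_rate`, then add m clean duplicates."""
    n_poison = int(collected_size * poisoning_rate)
    n_benign = collected_size - n_poison
    collected_x = np.concatenate([poisoned_x[:n_poison], clean_pool_x[:n_benign]])
    collected_y = np.concatenate([poisoned_y[:n_poison], clean_pool_y[:n_benign]])
    clean_x, clean_y = clean_pool_x[:clean_unit], clean_pool_y[:clean_unit]
    train_x = np.concatenate([collected_x] + [clean_x] * m)
    train_y = np.concatenate([collected_y] + [clean_y] * m)
    return train_x, train_y

# Hypothetical pools standing in for CIFAR-10 data and poisoned samples.
clean_pool_x = rng.random((2000, 32, 32, 3)); clean_pool_y = rng.integers(0, 10, 2000)
poisoned_x = rng.random((1000, 32, 32, 3));   poisoned_y = rng.integers(0, 10, 1000)

for m in (0, 1, 3, 9):
    x, y = build_training_set(clean_pool_x, clean_pool_y, poisoned_x, poisoned_y,
                              poisoning_rate=1.0, m=m)
    print(f"m={m}: training set size = {len(x)}")
```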

Experimental results and analysis
We now explain the three experimental results and analyze them as follows. First, as m increases, our proposed dilution-based defense method better protects the target DL model against clean-label poisoning attacks at various poisoning rates, as shown in Figure 4. Specifically, Figure 4(a) shows the changes in the classification accuracy of the target DL model according to m, and Figure 4(b) shows the changes in loss according to m. In particular, with m = 0 (i.e., no defense), the classification accuracy dropped to 81.4% under a clean-label poisoning attack with a poisoning rate of 100%. With our dilution defense technique, the classification accuracy increased by 9.7%p and reached 91.1%. In addition, as shown in Figure 4(a), no clear increase in classification accuracy was observed after m = 3, which suggests that there is an optimal m for a given poisoning rate.
Second, our dilution-based defense technique outperformed an existing defense method (Cutmix data augmentation; DA-Cutmix) in terms of classification accuracy. Before presenting the results, we explain Table 1. To compare the three target DL models in Table 1(a) to (c) in various ways, we considered two attack cases. In Attack case 1, the attacker uses the same poisoned dataset to contaminate the three target models, whereas in Attack case 2 an advanced white-box attacker uses different poisoned datasets. The results are as follows. For Attack case 1, as shown in Table 2, the classification accuracy decreases as the contamination rate increases. Specifically, at a poisoning rate of 100%, the no-defense model (a) and the existing method (b) showed a significant reduction in accuracy of around 20%p compared to a poisoning rate of 0%, whereas our proposed method shows a decrease of less than 1%p. For Attack case 2, as shown in Table 3, there is no significant difference at a poisoning rate of 20%. However, as the poisoning rate grows to 100%, the performance of DA-Cutmix decreases significantly, while our dilution defense method (c) maintains a classification accuracy similar to that at a poisoning rate of 0%.

Third, our proposed dilution-based defense method better prevented clean-label backdoor attacks as m grows, as shown in Figure 5. Specifically, 148 attacks were successful when m = 0 (no dilution defense). However, as the clean data were added to the training data (as m grows), the ASR clearly dropped to 40.5% (when m = 12; the number of added data = 12,000); thus, thanks to our defense method, around 33.5%p of attacks were prevented. Meanwhile, although our dilution-based method is very effective against clean-label backdoor attacks due to their targeted nature, it requires more additional clean data than for clean-label poisoning attacks. This is because the dilution defense method reduces both the normal and the attack components in the data.

CONCLUSION
In this paper, under the assumption that no technique can perfectly detect poisoning attacks, we proposed a dilution-based defense method against poisoning attacks, a novel defense mechanism that complements existing detection methods. Our dilution-based defense method adds clean data to the training data in order to reduce the impact of poisoned data in the post-poisoning phase. Our experimental results demonstrate its validity and effectiveness in defending against poisoning attacks. Specifically, applying the dilution defense increased the classification accuracy of a DL model by 9.7%p under a poisoning attack and decreased the ASR of a backdoor attack by 33.5%p. In addition, the defense performance of our proposed method is up to 20.9%p better than that of an existing data augmentation method. Consequently, the results show that our dilution-based defense method is very effective against both poisoning attacks and backdoor attacks.
Our future research directions are as follows. First, a mere increase in the amount of training data leads to higher computing costs, so it is essential to study methods for minimizing the additional data required for training. Second, to further improve the classification performance of DL models, we will study how to weaken the attack components of transferred data while maintaining their benign features during the dilution process.

Figure 3. Dilution-based defense method

Figure 4. (a) Classification accuracy and (b) loss depending on m (the number of duplications of D_clean) in experiments on clean-label poisoning attacks with various poisoning rates

Figure 5. Attack success rate (ASR, left y-axis) and classification accuracy (right y-axis) depending on m (the number of added data; x-axis) in clean-label backdoor attacks

Table 1. Experimental dataset setting and two attack cases (attack case 1 and attack case 2)

Table 2. Attack case 1: comparison of classification accuracy in clean-label poisoning attack using the same attack dataset

Table 3. Attack case 2: comparison of classification accuracy in clean-label poisoning attack using white-box attacks (average of 3 runs for each poisoning rate |D_p| / |D|)