Adversarial Learning Algorithm
An Adversarial Learning Algorithm is a machine learning algorithm that can be implemented by an adversarial learning system to solve an adversarial learning task by training models to be robust against adversarial opponents.
- AKA: Adversarial Training Algorithm, Robust Optimization Algorithm, Adversarial Defense Algorithm.
- Context:
- It can (typically) handle Adversarial Examples that the attacker has intentionally designed to cause the model to make a mistake.
- It can generate adversarial examples during training using methods like the Fast Gradient Sign Method (FGSM) or Projected Gradient Descent (PGD); a minimal sketch of this step appears after this list.
- It can iteratively update model parameters to reduce performance degradation under adversarial perturbations.
- It can integrate with supervised or self-supervised training objectives to enhance both clean and adversarial accuracy.
- It can include adversarial loss terms that penalize vulnerability to specific types of perturbations.
- It can be applied across domains including image recognition, natural language understanding, speech processing, and cybersecurity.
- It can serve as the core mechanism in black-box or white-box threat model defenses.
- It can include ensemble strategies or certified robustness guarantees under specific mathematical assumptions.
- ...
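As a concrete illustration of the context items above, the following minimal sketch shows FGSM-based adversarial example generation and one adversarial training step. It assumes a PyTorch classifier with inputs scaled to [0, 1]; the names `model`, `loader`, `optimizer`, and `epsilon` are hypothetical placeholders, not taken from any specific repository.

```python
# Minimal FGSM adversarial training sketch (assumes PyTorch, inputs in [0, 1]).
import torch
import torch.nn.functional as F

def fgsm_example(model, x, y, epsilon):
    """Craft an FGSM example: x_adv = x + epsilon * sign(grad_x loss)."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad = torch.autograd.grad(loss, x_adv)[0]
    return (x_adv + epsilon * grad.sign()).clamp(0.0, 1.0).detach()

def adversarial_training_epoch(model, loader, optimizer, epsilon=0.03):
    """One epoch whose loss mixes clean and FGSM-perturbed examples."""
    model.train()
    for x, y in loader:
        x_adv = fgsm_example(model, x, y, epsilon)
        optimizer.zero_grad()
        # The adversarial loss term penalizes vulnerability to the perturbation,
        # while the clean term helps preserve standard accuracy.
        loss = 0.5 * F.cross_entropy(model(x), y) + 0.5 * F.cross_entropy(model(x_adv), y)
        loss.backward()
        optimizer.step()
```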
- Example(s):
- The original adversarial training method introduced in "Explaining and Harnessing Adversarial Examples" using FGSM.
- Code reference: https://github.com/anishathalye/obfuscated-gradients
- PGD adversarial training implemented in the Madry Lab's benchmark for robust models; a simplified PGD sketch appears after this list.
- Official implementation: https://github.com/MadryLab/mnist_challenge
- TRadeoff-inspired Adversarial DEfense via Surrogate-loss minimization (TRADES), a method and codebase for robust generalization under adversarial attack.
- Code repository: https://github.com/yaodongyu/TRADES
- ...
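As a companion to the FGSM sketch above, the following illustrates the multi-step PGD inner maximization referenced in the PGD adversarial training example. It is a simplified sketch under the same PyTorch and [0, 1]-input assumptions, with function and parameter names that are illustrative rather than taken from the Madry Lab codebase.

```python
# Illustrative PGD attack for an l_inf ball (assumes PyTorch, inputs in [0, 1]).
import torch
import torch.nn.functional as F

def pgd_example(model, x, y, epsilon=0.03, alpha=0.007, steps=10):
    """Multi-step PGD: repeated signed-gradient steps projected into the epsilon-ball."""
    # Random start inside the l_inf ball, as commonly used in PGD adversarial training.
    x_adv = (x + torch.empty_like(x).uniform_(-epsilon, epsilon)).clamp(0.0, 1.0)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv + alpha * grad.sign()
        # Project back onto the epsilon-ball around x and the valid input range.
        x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon).clamp(0.0, 1.0)
    return x_adv.detach()
```

In a training loop, `pgd_example` would replace `fgsm_example` in the earlier sketch to obtain PGD-style adversarial training.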
- Counter-Example(s):
- Standard Backpropagation Algorithm, which optimizes on clean data and lacks adversarial considerations.
- Data Augmentation Algorithm, which improves generalization but does not specifically enhance robustness.
- Dropout Regularization Algorithm, which prevents overfitting but does not defend against adversarial perturbations.
- ...
- See: ANTIDOTE Algorithm, Evasion Attack, Poisoning Attack, GAN Training Algorithm, Adversarial Perturbation, Fast Gradient Sign Method, Projected Gradient Descent, Robust Machine Learning, Obfuscated Gradient.
References
2019a
- (Zhang et al., 2019) ⇒ Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric Xing, Laurent El Ghaoui, & Michael I. Jordan. (2019). "Theoretically Principled Trade-off between Robustness and Accuracy". arXiv Preprint.
- QUOTE: "We identify a trade-off between robustness and accuracy that serves as a guiding principle in the design of defenses against adversarial examples. ... Inspired by our theoretical analysis, we also design a new defense method, TRADES, to trade adversarial robustness off against accuracy. Our proposed adversarial learning algorithm performs well experimentally in real-world datasets. The methodology is the foundation of our entry to the NeurIPS 2018 Adversarial Vision Challenge in which we won the 1st place out of ~2,000 submissions, surpassing the runner-up approach by $11.41\%$ in terms of mean $\ell_2$ perturbation distance."
2019b
- (TRADES, 2019) ⇒ Yaodong Yu, Hongyang Zhang, & TRADES Contributors. (2019). "TRADES: A Benchmark and Code for Adversarial Robustness".
- QUOTE: "TRADES is an open-source adversarial learning algorithm implementation that provides a theoretically justified framework for balancing the trade-off between adversarial robustness and classification accuracy. The repository includes code, benchmarks, and pretrained models for robust deep learning."
2019c
- (MadryLab, 2019) ⇒ Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, Adrian Vladu, & MadryLab Contributors. (2019). "MadryLab MNIST Adversarial Challenge".
- QUOTE: "The MadryLab MNIST Adversarial Challenge provides a standardized benchmark and codebase for evaluating adversarial learning algorithms on the MNIST dataset, encouraging the development and testing of robust deep neural networks against adversarial attacks."
2019d
- (Athalye et al., 2019) ⇒ Anish Athalye, Nicholas Carlini, David Wagner, & Obfuscated Gradients Contributors. (2019). "Obfuscated Gradients: Analyzing and Defeating Gradient Obfuscation Defenses".
- QUOTE: "This repository provides code and analysis for identifying and circumventing obfuscated gradient defenses in adversarial learning algorithms. It demonstrates that many proposed defenses can be broken using carefully designed attacks, emphasizing the need for rigorous evaluation of adversarial robustness."
2017a
- (Madry et al., 2017) ⇒ Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, & Adrian Vladu. (2017). "Towards Deep Learning Models Resistant to Adversarial Attacks". arXiv Preprint.
- QUOTE: We study the adversarial robustness of neural networks through the lens of robust optimization. ... Our approach specifies a concrete security guarantee that would protect against any adversary. These methods let us train networks with significantly improved resistance to a wide range of adversarial attacks. ... The code and pre-trained models for our adversarial learning algorithm are available for reproducibility and benchmarking.
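- In simplified notation (paraphrased from the paper), the robust optimization view corresponds to the saddle-point problem $$\min_{\theta}\; \mathbb{E}_{(x,y)\sim\mathcal{D}}\Big[\max_{\delta \in \mathcal{S}} L(\theta, x + \delta, y)\Big],$$ where $\mathcal{S}$ is the set of allowed perturbations (for example, an $\ell_\infty$ ball of radius $\epsilon$) and the inner maximization is approximated with projected gradient descent during training.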
2017b
- (Wikipedia, 2017) ⇒ https://en.wikipedia.org/wiki/Adversarial_machine_learning Retrieved:2017-12-9.
- Adversarial machine learning is a research field that lies at the intersection of machine learning and computer security. It aims to enable the safe adoption of machine learning techniques in adversarial settings like spam filtering, malware detection and biometric recognition.
The problem arises from the fact that machine learning techniques were originally designed for stationary environments in which the training and test data are assumed to be generated from the same (although possibly unknown) distribution. In the presence of intelligent and adaptive adversaries, however, this working hypothesis is likely to be violated to at least some degree (depending on the adversary). In fact, a malicious adversary can carefully manipulate the input data exploiting specific vulnerabilities of learning algorithms to compromise the whole system security.
Examples include: attacks in spam filtering, where spam messages are obfuscated through misspelling of bad words or insertion of good words;[1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] attacks in computer security, e.g., to obfuscate malware code within network packets [13] or mislead signature detection;[14] attacks in biometric recognition, where fake biometric traits may be exploited to impersonate a legitimate user (biometric spoofing) [15] or to compromise users’ template galleries that are adaptively updated over time.[16] [17]
2017c
- (OpenAI, 2017) ⇒ https://blog.openai.com/adversarial-example-research/
- QUOTE: Adversarial examples are inputs to machine learning models that an attacker has intentionally designed to cause the model to make a mistake; they’re like optical illusions for machines. In this post we’ll show how adversarial examples work across different mediums, and will discuss why securing systems against them can be difficult.
2014
- (Goodfellow et al., 2014) ⇒ Ian J. Goodfellow, Jonathon Shlens, & Christian Szegedy. (2014). "Explaining and Harnessing Adversarial Examples". arXiv Preprint.
- QUOTE: We argue that the primary cause of neural networks' vulnerability to adversarial perturbation is their linear nature. ... This view yields a simple and fast method of generating adversarial examples. Using this approach to provide examples for adversarial training, we reduce the test set error of a maxout network on the MNIST dataset. ... Our adversarial learning algorithm provides a foundation for robust model training and evaluation.
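- In simplified notation (paraphrased from the paper), the fast gradient sign method perturbs an input along the sign of the loss gradient: $$x_{\text{adv}} = x + \epsilon \, \operatorname{sign}\big(\nabla_{x} J(\theta, x, y)\big),$$ where $J$ is the training loss and $\epsilon$ bounds the $\ell_\infty$ magnitude of the perturbation.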
2016
- https://mascherari.press/introduction-to-adversarial-machine-learning/
- QUOTE: Adversarial machine learning is a research field that lies at the intersection of machine learning and computer security. All machine learning algorithms and methods are vulnerable to many kinds of threat models. … At the highest level, attacks on machine learning systems can be classified into one of two types: Evasion attacks and Poisoning attacks.
2011
- (Huang et al., 2011) ⇒ Ling Huang, Anthony D. Joseph, Blaine Nelson, Benjamin I.P. Rubinstein, and J. D. Tygar. (2011). “Adversarial Machine Learning.” In: Proceedings of the 4th ACM workshop on Security and artificial intelligence. ISBN:978-1-4503-1003-1 doi:10.1145/2046684.2046692
- QUOTE: In this paper (expanded from an invited talk at AISEC 2010), we discuss an emerging field of study: adversarial machine learning --- the study of effective machine learning techniques against an adversarial opponent. In this paper, we: give a taxonomy for classifying attacks against online machine learning algorithms; discuss application-specific factors that limit an adversary's capabilities; introduce two models for modeling an adversary's capabilities; explore the limits of an adversary's knowledge about the algorithm, feature space, training, and input data; explore vulnerabilities in machine learning algorithms; discuss countermeasures against attacks; introduce the evasion challenge; and discuss privacy-preserving learning techniques.
2010
- (Laskov & Lippmann, 2010) ⇒ Pavel Laskov, and Richard Lippmann. (2010). “Machine Learning in Adversarial Environments.” Machine learning 81, no. 2
2005
- (Lowd & Meek, 2005) ⇒ Daniel Lowd, and Christopher Meek. (2005). “Adversarial Learning.” In: Proceedings of the eleventh ACM SIGKDD International Conference on Knowledge discovery in data mining (KDD-2005)
- ↑ N. Dalvi, P. Domingos, Mausam, S. Sanghai, and D. Verma. “Adversarial classification”. In Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pages 99–108, Seattle, 2004.
- ↑ D. Lowd and C. Meek. “Adversarial learning”. In A. Press, editor, Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pages 641–647, Chicago, IL., 2005.
- ↑ B. Biggio, I. Corona, G. Fumera, G. Giacinto, and F. Roli. “Bagging classifiers for fighting poisoning attacks in adversarial classification tasks”. In C. Sansone, J. Kittler, and F. Roli, editors, 10th International Workshop on Multiple Classifier Systems (MCS), volume 6713 of Lecture Notes in Computer Science, pages 350–359. Springer-Verlag, 2011.
- ↑ B. Biggio, G. Fumera, and F. Roli. “Adversarial pattern classification using multiple classifiers and randomisation”. In 12th Joint IAPR International Workshop on Structural and Syntactic Pattern Recognition (SSPR 2008), volume 5342 of Lecture Notes in Computer Science, pages 500–509, Orlando, Florida, USA, 2008. Springer-Verlag.
- ↑ B. Biggio, G. Fumera, and F. Roli. “Multiple classifier systems for robust classifier design in adversarial environments”. International Journal of Machine Learning and Cybernetics, 1(1):27–41, 2010.
- ↑ M. Bruckner, C. Kanzow, and T. Scheffer. “Static prediction games for adversarial learning problems”. J. Mach. Learn. Res., 13:2617–2654, 2012.
- ↑ M. Bruckner and T. Scheffer. “Nash equilibria of static prediction games”. In Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems 22, pages 171–179. 2009.
- ↑ M. Bruckner and T. Scheffer. “Stackelberg games for adversarial prediction problems". In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’11, pages 547–555, New York, NY, USA, 2011. ACM.
- ↑ A. Globerson and S. T. Roweis. “Nightmare at test time: robust learning by feature deletion”. In W. W. Cohen and A. Moore, editors, Proceedings of the 23rd International Conference on Machine Learning, volume 148, pages 353–360. ACM, 2006.
- ↑ A. Kolcz and C. H. Teo. “Feature weighting for improved classifier robustness”. In Sixth Conference on Email and Anti-Spam (CEAS), Mountain View, CA, USA, 2009.
- ↑ B. Nelson, M. Barreno, F. J. Chi, A. D. Joseph, B. I. P. Rubinstein, U. Saini, C. Sutton, J. D. Tygar, and K. Xia. “Exploiting machine learning to subvert your spam filter”. In LEET’08: Proceedings of the 1st Usenix Workshop on Large-Scale Exploits and Emergent Threats, pages 1–9, Berkeley, CA, USA, 2008. USENIX Association.
- ↑ G. L. Wittel and S. F. Wu. “On attacking statistical spam filters”. In First Conference on Email and Anti-Spam (CEAS), Microsoft Research Silicon Valley, Mountain View, California, 2004.
- ↑ P. Fogla, M. Sharif, R. Perdisci, O. Kolesnikov, and W. Lee. Polymorphic blending attacks. In USENIX- SS’06: Proc. of the 15th Conf. on USENIX Security Symp., CA, USA, 2006. USENIX Association.
- ↑ J. Newsome, B. Karp, and D. Song. Paragraph: Thwarting signature learning by training maliciously. In Recent Advances in Intrusion Detection, LNCS, pages 81–105. Springer, 2006.
- ↑ R. N. Rodrigues, L. L. Ling, and V. Govindaraju. “Robustness of multimodal biometric fusion methods against spoof attacks". J. Vis. Lang. Comput., 20(3):169–179, 2009.
- ↑ B. Biggio, L. Didaci, G. Fumera, and F. Roli. “Poisoning attacks to compromise face templates.” In 6th IAPR Int’l Conf. on Biometrics (ICB 2013), pages 1–7, Madrid, Spain, 2013.
- ↑ M. Torkamani and D. Lowd “Convex Adversarial Collective Classification”. In: Proceedings of the 30th International Conference on Machine Learning (pp. 642-650), Atlanta, GA., 2013.