Abstract
Safeguarding GenAI models against threats and aligning them with security requirements is imperative yet challenging. This chapter provides an overview of the security landscape for generative models. It begins by elucidating common vulnerabilities and attack vectors, including adversarial attacks, model inversion, backdoors, data extraction, and algorithmic bias. The practical implications of these threats are discussed across domains such as finance, healthcare, and content creation. The narrative then shifts to mitigation strategies and emerging security paradigms: differential privacy, blockchain-based provenance, quantum-resistant algorithms, and human-guided reinforcement learning are analyzed as techniques for hardening generative models. Broader ethical concerns surrounding transparency, accountability, deepfakes, and model interpretability are also addressed. The chapter aims to establish a conceptual foundation encompassing both the technical and ethical dimensions of security for generative AI, highlighting open challenges and laying the groundwork for robust, trustworthy, and human-centric solutions. This multifaceted perspective, spanning vulnerabilities, implications, and solutions, is intended to further discourse on securing society’s growing reliance on generative models. Frontier model security is discussed using the approach proposed by Anthropic.
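As a concrete illustration of one mitigation technique named above, the short Python sketch below shows the Laplace mechanism, a textbook building block of differential privacy: noise drawn from a Laplace distribution with scale sensitivity/ε is added to a numeric query result before release. This is a minimal sketch for orientation only; the function name and parameters are illustrative assumptions, not code from the chapter.

```python
# Minimal sketch of the Laplace mechanism for differential privacy.
# Names and parameters are illustrative assumptions, not from the chapter.
import numpy as np

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float) -> float:
    """Return a differentially private estimate of a numeric query result.

    Noise scale = sensitivity / epsilon: a smaller epsilon gives stronger
    privacy at the cost of a noisier released value.
    """
    scale = sensitivity / epsilon
    return true_value + np.random.laplace(loc=0.0, scale=scale)

# Example: privately release a count of 120 records.
# A counting query has sensitivity 1 (adding or removing one record changes it by at most 1).
private_count = laplace_mechanism(120, sensitivity=1.0, epsilon=0.5)
print(private_count)
```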
References
Adams, N. (2023, March 23). Model inversion attacks | A new AI security risk. Michalsons. Retrieved August 28, 2023, from https://www.michalsons.com/blog/model-inversion-attacks-a-new-ai-security-risk/64427
Anthropic. (2023a, July 25). Frontier model security. Anthropic. Retrieved November 26, 2023, from https://www.anthropic.com/index/frontier-model-security
Anthropic. (2023b, October 5). Decomposing language models into understandable components. Anthropic. Retrieved October 10, 2023, from https://www.anthropic.com/index/decomposing-language-models-into-understandable-components
Bansemer, J., & Lohn, A. (2023, July 6). Securing AI makes for safer AI. Center for Security and Emerging Technology. Retrieved August 29, 2023, from https://cset.georgetown.edu/article/securing-ai-makes-for-safer-ai/
Brownlee, J. (2018, December 7). A gentle introduction to early stopping to avoid overtraining neural networks. Machine Learning Mastery. Retrieved August 29, 2023, from https://machinelearningmastery.com/early-stopping-to-avoid-overtraining-neural-network-models/
Datascientest. (2023, March 9). SHapley Additive exPlanations or SHAP: What is it? DataScientest.com. Retrieved August 29, 2023, from https://datascientest.com/en/shap-what-is-it
Dickson, B. (2022, May 23). Machine learning has a backdoor problem. TechTalks. Retrieved August 29, 2023, from https://bdtechtalks.com/2022/05/23/machine-learning-undetectable-backdoors/
Dickson, B. (2023, January 16). What is reinforcement learning from human feedback (RLHF)? TechTalks. Retrieved August 29, 2023, from https://bdtechtalks.com/2023/01/16/what-is-rlhf/
Duffin, M. (2023, August 12). Machine unlearning: The critical art of teaching AI to forget. VentureBeat. Retrieved October 7, 2023, from https://venturebeat.com/ai/machine-unlearning-the-critical-art-of-teaching-ai-to-forget/
Gupta, A. (2020, October 12). Global model interpretability techniques for Black Box models. Analytics Vidhya. Retrieved August 29, 2023, from https://www.analyticsvidhya.com/blog/2020/10/global-model-interpretability-techniques-for-black-box-models/
Irolla, P. (2019, September 19). Demystifying the membership inference attack. Medium (Disaitek). Retrieved August 29, 2023, from https://medium.com/disaitek/demystifying-the-membership-inference-attack-e33e510a0c39
Kan, M., & Ozalp, H. (2023, November 9). OpenAI Blames ChatGPT Outages on DDoS Attacks. PCMag. Retrieved November 23, 2023, from https://www.pcmag.com/news/openai-blames-chatgpt-outages-on-ddos-attacks
Nagpal, A., & Guide, S. (2022, January 5). L1 and L2 regularization methods, explained. Built In. Retrieved August 29, 2023, from https://builtin.com/data-science/l2-regularization
Nguyen, A. (2019, July). Understanding differential privacy. Towards Data Science. Retrieved August 28, 2023, from https://towardsdatascience.com/understanding-differential-privacy-85ce191e198a
NIST. (2022, February 3). NIST Special Publication (SP) 800-218, Secure Software Development Framework (SSDF) Version 1.1: Recommendations for mitigating the risk of software vulnerabilities. NIST Computer Security Resource Center. Retrieved November 26, 2023, from https://csrc.nist.gov/pubs/sp/800/218/final
Noone, R. (2023, July 28). Researchers discover new vulnerability in large language models. Carnegie Mellon University. Retrieved August 28, 2023, from https://www.cmu.edu/news/stories/archives/2023/july/researchers-discover-new-vulnerability-in-large-language-models
O’Connor, R. (2023, August 1). How reinforcement learning from AI feedback works. AssemblyAI. Retrieved October 10, 2023, from https://www.assemblyai.com/blog/how-reinforcement-learning-from-ai-feedback-works/
Olah, C. (2022, June 27). Mechanistic interpretability, variables, and the importance of interpretable bases. Transformer Circuits Thread. Retrieved August 29, 2023, from https://transformer-circuits.pub/2022/mech-interp-essay/index.html
OWASP. (2023). OWASP top 10 for large language model applications. OWASP Foundation. Retrieved August 29, 2023, from https://owasp.org/www-project-top-10-for-large-language-model-applications/
Ribeiro, M. T. (2016, April 2). LIME: Local interpretable model-agnostic explanations. Retrieved August 29, 2023, from https://homes.cs.washington.edu/~marcotcr/blog/lime/
Sample, I., & Gregory, S. (2020, January 13). What are deepfakes – and how can you spot them? The Guardian. Retrieved August 29, 2023, from https://www.theguardian.com/technology/2020/jan/13/what-are-deepfakes-and-how-can-you-spot-them
Sanzeri, S., & Danise, A. (2023, June 23). The quantum threat to AI language models like ChatGPT. Forbes. Retrieved August 29, 2023, from https://www.forbes.com/sites/forbestechcouncil/2023/06/23/the-quantum-threat-to-ai-language-models-like-chatgpt/
Secureworks. (2023, June 27). Unravelling the attack surface of AI systems. Secureworks. Retrieved August 29, 2023, from https://www.secureworks.com/blog/unravelling-the-attack-surface-of-ai-systems
Tomorrow.bio. (2023, September 21). Preventing Bias in AI Models with Constitutional AI. Tomorrow Bio. Retrieved October 10, 2023, from https://www.tomorrow.bio/post/preventing-bias-in-ai-models-with-constitutional-ai-2023-09-5160899464-futurism
van Heeswijk, W. (2022, November 29). Proximal policy optimization (PPO) explained. Towards Data Science. Retrieved August 29, 2023, from https://towardsdatascience.com/proximal-policy-optimization-ppo-explained-abed1952457b
Wolford, B. (2021). Everything you need to know about the “Right to be forgotten.” GDPR.eu. Retrieved October 7, 2023, from https://gdpr.eu/right-to-be-forgotten/
Wunderwuzzi. (2020, November 10). Machine learning attack series: repudiation threat and auditing · Embrace the red. Embrace The Red. Retrieved August 29, 2023, from https://embracethered.com/blog/posts/2020/husky-ai-repudiation-threat-deny-action-machine-learning/
Yadav, H. (2022, July 4). Dropout in neural networks. Towards Data Science. Retrieved August 29, 2023, from https://towardsdatascience.com/dropout-in-neural-networks-47a162d621d9
Yasar, K. (2022). What is a generative adversarial network (GAN)? TechTarget. Retrieved August 29, 2023, from https://www.techtarget.com/searchenterpriseai/definition/generative-adversarial-network-GAN
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Huang, K., Goertzel, B., Wu, D., Xie, A. (2024). GenAI Model Security. In: Huang, K., Wang, Y., Goertzel, B., Li, Y., Wright, S., Ponnapalli, J. (eds) Generative AI Security. Future of Business and Finance. Springer, Cham. https://doi.org/10.1007/978-3-031-54252-7_6
DOI: https://doi.org/10.1007/978-3-031-54252-7_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-54251-0
Online ISBN: 978-3-031-54252-7
eBook Packages: Business and Management (R0)