Abstract
Safeguarding GenAI models against threats and aligning them with security requirements is imperative yet challenging. This chapter provides an overview of the security landscape for generative models. It begins by elucidating common vulnerabilities and attack vectors, including adversarial attacks, model inversion, backdoors, data extraction, and algorithmic bias. The practical implications of these threats are discussed across domains such as finance, healthcare, and content creation. The narrative then shifts to mitigation strategies and emerging security paradigms: differential privacy, blockchain-based provenance, quantum-resistant algorithms, and human-guided reinforcement learning are analyzed as techniques for hardening generative models. Broader ethical concerns surrounding transparency, accountability, deepfakes, and model interpretability are also addressed. The chapter aims to establish a conceptual foundation encompassing both the technical and ethical dimensions of security for generative AI, highlighting open challenges and laying the groundwork for robust, trustworthy, and human-centric solutions. This multifaceted perspective, spanning vulnerabilities, implications, and solutions, is intended to further discourse on securing society’s growing reliance on generative models. Frontier model security is discussed using the approach proposed by Anthropic.
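As a concrete illustration of one mitigation technique named above, the short Python sketch below shows the Laplace mechanism, a textbook building block of differential privacy: noise drawn from a Laplace distribution with scale sensitivity/ε is added to a numeric query result before release. This is a minimal sketch for orientation only; the function name and parameters are illustrative assumptions, not code from the chapter.

```python
# Minimal sketch of the Laplace mechanism for differential privacy.
# Names and parameters are illustrative assumptions, not from the chapter.
import numpy as np

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float) -> float:
    """Return a differentially private estimate of a numeric query result.

    Noise scale = sensitivity / epsilon: a smaller epsilon gives stronger
    privacy at the cost of a noisier released value.
    """
    scale = sensitivity / epsilon
    return true_value + np.random.laplace(loc=0.0, scale=scale)

# Example: privately release a count of 120 records.
# A counting query has sensitivity 1 (adding or removing one record changes it by at most 1).
private_count = laplace_mechanism(120, sensitivity=1.0, epsilon=0.5)
print(private_count)
```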
References
Adams, N. (2023, March 23). Model inversion attacks | A new AI security risk. Michalsons. Retrieved August 28, 2023, from https://www.michalsons.com/blog/model-inversion-attacks-a-new-ai-security-risk/64427
Anthropic. (2023a, July 25). Frontier model security. Anthropic. Retrieved November 26, 2023, from https://www.anthropic.com/index/frontier-model-security
Anthropic. (2023b, October 5). Decomposing language models into understandable components. Anthropic. Retrieved October 10, 2023, from https://www.anthropic.com/index/decomposing-language-models-into-understandable-components
Bansemer, J., & Lohn, A. (2023, July 6). Securing AI makes for safer AI. Center for Security and Emerging Technology. Retrieved August 29, 2023, from https://cset.georgetown.edu/article/securing-ai-makes-for-safer-ai/
Brownlee, J. (2018, December 7). A gentle introduction to early stopping to avoid overtraining neural networks. Machine Learning Mastery. Retrieved August 29, 2023, from https://machinelearningmastery.com/early-stopping-to-avoid-overtraining-neural-network-models/
Datascientest. (2023, March 9). SHapley Additive exPlanations or SHAP: What is it? DataScientest.com. Retrieved August 29, 2023, from https://datascientest.com/en/shap-what-is-it
Dickson, B. (2022, May 23). Machine learning has a backdoor problem. TechTalks. Retrieved August 29, 2023, from https://bdtechtalks.com/2022/05/23/machine-learning-undetectable-backdoors/
Dickson, B. (2023, January 16). What is reinforcement learning from human feedback (RLHF)? TechTalks. Retrieved August 29, 2023, from https://bdtechtalks.com/2023/01/16/what-is-rlhf/
Duffin, M. (2023, August 12). Machine unlearning: The critical art of teaching AI to forget. VentureBeat. Retrieved October 7, 2023, from https://venturebeat.com/ai/machine-unlearning-the-critical-art-of-teaching-ai-to-forget/
Gupta, A. (2020, October 12). Global model interpretability techniques for Black Box models. Analytics Vidhya. Retrieved August 29, 2023, from https://www.analyticsvidhya.com/blog/2020/10/global-model-interpretability-techniques-for-black-box-models/
Irolla, P. (2019, September 19). Demystifying the membership inference attack. Medium (Disaitek). Retrieved August 29, 2023, from https://medium.com/disaitek/demystifying-the-membership-inference-attack-e33e510a0c39
Kan, M., & Ozalp, H. (2023, November 9). OpenAI Blames ChatGPT Outages on DDoS Attacks. PCMag. Retrieved November 23, 2023, from https://www.pcmag.com/news/openai-blames-chatgpt-outages-on-ddos-attacks
Nagpal, A., & Guide, S. (2022, January 5). L1 and L2 regularization methods, explained. Built In. Retrieved August 29, 2023, from https://builtin.com/data-science/l2-regularization
Nguyen, A. (2019, July). Understanding differential privacy. Towards Data Science. Retrieved August 28, 2023, from https://towardsdatascience.com/understanding-differential-privacy-85ce191e198a
NIST. (2022, February 3). NIST Special Publication (SP) 800-218, Secure Software Development Framework (SSDF) Version 1.1: Recommendations for mitigating the risk of software vulnerabilities. NIST Computer Security Resource Center. Retrieved November 26, 2023, from https://csrc.nist.gov/pubs/sp/800/218/final
Noone, R. (2023, July 28). Researchers discover new vulnerability in large language models. Carnegie Mellon University. Retrieved August 28, 2023, from https://www.cmu.edu/news/stories/archives/2023/july/researchers-discover-new-vulnerability-in-large-language-models
O’Connor, R. (2023, August 1). How reinforcement learning from AI feedback works. AssemblyAI. Retrieved October 10, 2023, from https://www.assemblyai.com/blog/how-reinforcement-learning-from-ai-feedback-works/
Olah, C. (2022, June 27). Mechanistic interpretability, variables, and the importance of interpretable bases. Transformer Circuits Thread. Retrieved August 29, 2023, from https://transformer-circuits.pub/2022/mech-interp-essay/index.html
OWASP. (2023). OWASP top 10 for large language model applications. OWASP Foundation. Retrieved August 29, 2023, from https://owasp.org/www-project-top-10-for-large-language-model-applications/
Ribeiro, M. T. (2016, April 2). LIME: Local interpretable model-agnostic explanations. Retrieved August 29, 2023, from https://homes.cs.washington.edu/~marcotcr/blog/lime/
Sample, I., & Gregory, S. (2020, January 13). What are deepfakes – and how can you spot them? The Guardian. Retrieved August 29, 2023, from https://www.theguardian.com/technology/2020/jan/13/what-are-deepfakes-and-how-can-you-spot-them
Sanzeri, S., & Danise, A. (2023, June 23). The quantum threat to AI language models like ChatGPT. Forbes. Retrieved August 29, 2023, from https://www.forbes.com/sites/forbestechcouncil/2023/06/23/the-quantum-threat-to-ai-language-models-like-chatgpt/
Secureworks. (2023, June 27). Unravelling the attack surface of AI systems. Secureworks. Retrieved August 29, 2023, from https://www.secureworks.com/blog/unravelling-the-attack-surface-of-ai-systems
Tomorrow.bio. (2023, September 21). Preventing Bias in AI Models with Constitutional AI. Tomorrow Bio. Retrieved October 10, 2023, from https://www.tomorrow.bio/post/preventing-bias-in-ai-models-with-constitutional-ai-2023-09-5160899464-futurism
van Heeswijk, W. (2022, November 29). Proximal policy optimization (PPO) explained. Towards Data Science. Retrieved August 29, 2023, from https://towardsdatascience.com/proximal-policy-optimization-ppo-explained-abed1952457b
Wolford, B. (2021). Everything you need to know about the “Right to be forgotten.” GDPR.eu. Retrieved October 7, 2023, from https://gdpr.eu/right-to-be-forgotten/
Wunderwuzzi. (2020, November 10). Machine learning attack series: repudiation threat and auditing · Embrace the red. Embrace The Red. Retrieved August 29, 2023, from https://embracethered.com/blog/posts/2020/husky-ai-repudiation-threat-deny-action-machine-learning/
Yadav, H. (2022, July 4). Dropout in neural networks. Towards Data Science. Retrieved August 29, 2023, from https://towardsdatascience.com/dropout-in-neural-networks-47a162d621d9
Yasar, K. (2022). What is a generative adversarial network (GAN)? TechTarget. Retrieved August 29, 2023, from https://www.techtarget.com/searchenterpriseai/definition/generative-adversarial-network-GAN
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Huang, K., Goertzel, B., Wu, D., Xie, A. (2024). GenAI Model Security. In: Huang, K., Wang, Y., Goertzel, B., Li, Y., Wright, S., Ponnapalli, J. (eds) Generative AI Security. Future of Business and Finance. Springer, Cham. https://doi.org/10.1007/978-3-031-54252-7_6
DOI: https://doi.org/10.1007/978-3-031-54252-7_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-54251-0
Online ISBN: 978-3-031-54252-7
eBook Packages: Business and Management (R0)