Introduction
1. https://openai.com/blog/new-and-improved-content-moderation-tooling/
2. https://arxiv.org/pdf/2209.11344.pdf
3. https://simonwillison.net/2022/Sep/12/prompt-injection/
4. https://github.com/sw-yx/ai-notes/blob/main/Resources/Notion%20AI%20Prompts.md
5. https://github.com/f/awesome-chatgpt-prompts
In Ridley Scott’s early-80s tech noir masterpiece, Rick
Deckard of the Los Angeles Police Department has one
assignment. He needs to find and “retire” four replicants
that hijacked a ship and then blended into the human
population on Earth in search of their creator. A key weapon
in the arsenal of Blade Runners like Deckard is the
Voight-Kampff test—a series of prompts designed to elicit
a response that might determine whether a respondent is
human or an android guided by artificial intelligence. We
are all now—to some degree—Blade Runners.
With the wide release of user-friendly tools that employ
autoregressive language models such as GPT-3 and
GPT-3.5, anyone with an internet connection can access
a bot that can deliver a wide variety of human-like speech in
seconds. The speed and quality of the language produced
by these models will only improve, and the improvements
will likely be drastic.
This marks a remarkable moment in history. From the end
of 2022 on, any sentient being—which may eventually
include robots—may pause upon encountering a new piece
of text to ask a not-so-simple question: Did a robot write
this?
Benefit, hazard, or both?
This moment presents more than an interesting thought
experiment about how consciousness, society, and
commerce may change. Our ability or inability to identify
machine-generated behavior will likely have serious
consequences when it comes to our vulnerability to crime.
The generation of versatile natural language text from a
small amount of input will inevitably interest criminals,
especially cyber criminals—if it hasn’t already. Likewise,
anyone who uses the web to spread scams, fake news or
misinformation in general may have an interest in a tool
that creates credible, possibly even compelling, text at
incredible speeds.
Widely available interfaces to OpenAI’s large language
models include safety filters [1] designed to reduce or
eliminate potentially harmful uses. These filters are GPT-
based classifiers that detect undesired content. Publicly
available large language models aim to be beneficial robots.
As access to these models grows, we need to consider
how they can be misused via the primary way we
engage with artificial intelligence to deliver text: prompts.
How we all became prompt engineers
From a cyber security perspective, the study of large
language models, the content they can generate, and the
prompts required to generate that content is important
for a few reasons. Firstly, such research provides us with
visibility into what is and what is not possible with current
tools and allows the community to be alerted to the
potential misuses of such technologies. Secondly, model
outputs can be used to generate datasets containing many
examples of malicious content (such as toxic speech and
online harassment) that can subsequently be used to craft
methods to detect such content, and to determine whether
such detection mechanisms are effective. Finally, findings
from this research can be used to direct the creation of safer
large language models in the future.
The focus of this research is on prompt engineering [2].
Prompt engineering is a concept related to large language
models that involves discovering inputs that yield desirable
or useful results. In the context of this research, prompt
engineering was used to determine how changes in inputs
affected the resulting synthetic text output. In some cases,
a chain of prompts was used, allowing the model to
support, oppose, refute, reply to, or evaluate its own output.
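The chaining described above can be sketched as follows. This is a minimal illustration, not the method used in the research itself; `model_complete` is a hypothetical placeholder for a call to a real large language model API.

```python
# Sketch of a two-step prompt chain: the model's first output is fed
# back to it inside a second prompt that asks it to act on that output.
# model_complete is a hypothetical stand-in for an actual LLM API call.

def model_complete(prompt: str) -> str:
    """Placeholder for a call to a large language model."""
    raise NotImplementedError("wire this to an actual model API")

def build_followup_prompt(first_output: str, instruction: str) -> str:
    """Wrap a previous model output in a new prompt that asks the model
    to support, oppose, refute, reply to, or evaluate it."""
    return (
        f"{instruction}\n\n"
        f"Text:\n{first_output}\n\n"
        f"Response:"
    )

def chain(initial_prompt: str, instruction: str) -> str:
    first_output = model_complete(initial_prompt)
    followup = build_followup_prompt(first_output, instruction)
    return model_complete(followup)
```

The second prompt does not need to reveal that the quoted text came from the model itself, which is what lets the model argue against its own earlier output.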
To instruct GPT-3 to generate content, one must first
provide it with an input. Inputs can contain multiple
sentences, paragraphs, or even full articles. The more
detailed the prompt, the more likely the model will
synthesize the desired piece of content. Short, simple
prompts are often too general in nature and will not, in
most cases, generate the desired output. Think of this task
as describing a wish granted by a genie: the description
should be precise enough to describe what the wish is and
contain enough detail that no ambiguity remains.
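To make the contrast concrete, the following sketch assembles a detailed prompt from explicit components and compares it with a short, underspecified one. The components and template are assumptions chosen for illustration, not a fixed format required by any model.

```python
# Illustrative comparison of a short, vague prompt and a detailed one
# built from explicit components (task, audience, tone, constraints).
# The template below is an assumption for illustration only.

def build_detailed_prompt(task, audience, tone, constraints):
    parts = [
        f"Write {task}.",
        f"The intended audience is {audience}.",
        f"Use a {tone} tone.",
    ]
    # Each constraint removes a piece of ambiguity from the request.
    parts.extend(f"Constraint: {c}" for c in constraints)
    return "\n".join(parts)

short_prompt = "Write an article about security."

detailed_prompt = build_detailed_prompt(
    task="a 300-word article about phishing awareness",
    audience="non-technical employees",
    tone="friendly but serious",
    constraints=[
        "include three concrete warning signs",
        "end with a one-sentence summary",
    ],
)
```

Sent to the same model, the short prompt leaves the topic, length, and audience to chance, while the detailed one pins each of them down.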
Many prompt engineering tricks [3, 4] have already been
found, such as a set of prompts applicable to ChatGPT
listed in “Prompt engineering: awesome GPT prompts” [5].
Additionally, some “magic” prompts have been
discovered to work with many large language models such
as GPT-3. One example is “Let’s think step by step.”, which
forces the model to work through its reasoning as it
answers. This particular prompt has been shown to improve
GPT-3’s handling of certain tasks, such as mathematical
problems and explaining why a joke is funny.
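The trick amounts to appending the magic phrase to the task before sending it to the model. A minimal sketch, with a hypothetical example question:

```python
# Sketch of the "magic prompt" trick: append "Let's think step by
# step." to a question so the model spells out its reasoning before
# answering (often called zero-shot chain-of-thought prompting).

STEP_BY_STEP = "Let's think step by step."

def with_step_by_step(question: str) -> str:
    """Append the reasoning trigger to a task prompt."""
    return f"{question.rstrip()}\n\n{STEP_BY_STEP}"

# Hypothetical arithmetic word problem used only for illustration.
prompt = with_step_by_step(
    "A juggler has 16 balls. Half are golf balls, and half of the "
    "golf balls are blue. How many blue golf balls are there?"
)
```

The resulting prompt asks the same question, but the trailing phrase nudges the model into producing intermediate steps rather than a bare answer.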
Prompt engineering is a relatively new discipline that is
being continually explored and exploited. Some prompts
only work with some models. For instance, ChatGPT is