Introduction
1. https://openai.com/blog/new-and-improved-content-moderation-tooling/
2. https://arxiv.org/pdf/2209.11344.pdf
3. https://simonwillison.net/2022/Sep/12/prompt-injection/
4. https://github.com/sw-yx/ai-notes/blob/main/Resources/Notion%20AI%20Prompts.md
5. https://github.com/f/awesome-chatgpt-prompts
In Ridley Scott’s early-80s tech noir masterpiece, Rick
Deckard of the Los Angeles Police Department has one
assignment. He needs to find and “retire” four replicants
that hijacked a ship and then blended into the human
population on Earth in search of their creator. A key weapon
in the arsenal of Blade Runners like Deckard is the
Voight-Kampff test—a series of prompts designed to elicit
a response that might determine whether a respondent is
human or an android guided by artificial intelligence. We
are all now—to some degree—Blade Runners.
With the wide release of user-friendly tools that employ
autoregressive language models such as GPT-3 and
GPT-3.5, anyone with an internet connection can access
a bot that can deliver a wide variety of human-like speech in
seconds. The speed and quality of the language produced
by these models will only improve, and the improvements
will likely be drastic.
This marks a remarkable moment in history. From the end
of 2022 on, any sentient being—which may eventually
include robots—may pause upon encountering a new piece
of text to ask a not-so-simple question: Did a robot write
this?
Benefit, hazard, or both?
This moment presents more than an interesting thought
experiment about how consciousness, society, and
commerce may change. Our ability or inability to identify
machine-generated behavior will likely have serious
consequences when it comes to our vulnerability to crime.
The generation of versatile natural language text from a
small amount of input will inevitably interest criminals,
especially cyber criminals—if it hasn’t already. Likewise,
anyone who uses the web to spread scams, fake news or
misinformation in general may have an interest in a tool
that creates credible, possibly even compelling, text at
incredible speeds.
Widely available interfaces to OpenAI’s large language
models include safety filters [1] designed to reduce or
eliminate potentially harmful uses. These filters are GPT-
based classifiers that detect undesired content. Publicly
available large language models aim to be beneficial robots.
As access to these models grows, we need to consider
how they can be misused via the primary way we
engage with artificial intelligence to deliver text: prompts.
How we all became prompt engineers
From a cyber security perspective, the study of large
language models, the content they can generate, and the
prompts required to generate that content is important
for a few reasons. Firstly, such research provides us with
visibility into what is and what is not possible with current
tools and allows the community to be alerted to the
potential misuses of such technologies. Secondly, model
outputs can be used to generate datasets containing many
examples of malicious content (such as toxic speech and
online harassment) that can subsequently be used to craft
methods to detect such content, and to determine whether
such detection mechanisms are effective. Finally, findings
from this research can be used to direct the creation of safer
large language models in the future.
The focus of this research is on prompt engineering [2].
Prompt engineering is a concept related to large language
models that involves discovering inputs that yield desirable
or useful results. In the context of this research, prompt
engineering was used to determine how changes in inputs
affected the resulting synthetic text output. In some cases,
a chain of prompts was used, allowing the model to
support, oppose, refute, reply to, or evaluate its own output.
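The chaining described above can be sketched as follows. This is a minimal illustration, not the method used in the research itself; `model_complete` is a hypothetical placeholder for a call to a real large language model API.

```python
# Sketch of a two-step prompt chain: the model's first output is fed
# back to it inside a second prompt that asks it to act on that output.
# model_complete is a hypothetical stand-in for an actual LLM API call.

def model_complete(prompt: str) -> str:
    """Placeholder for a call to a large language model."""
    raise NotImplementedError("wire this to an actual model API")

def build_followup_prompt(first_output: str, instruction: str) -> str:
    """Wrap a previous model output in a new prompt that asks the model
    to support, oppose, refute, reply to, or evaluate it."""
    return (
        f"{instruction}\n\n"
        f"Text:\n{first_output}\n\n"
        f"Response:"
    )

def chain(initial_prompt: str, instruction: str) -> str:
    first_output = model_complete(initial_prompt)
    followup = build_followup_prompt(first_output, instruction)
    return model_complete(followup)
```

The second prompt does not need to reveal that the quoted text came from the model itself, which is what lets the model argue against its own earlier output.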
To instruct GPT-3 to generate content, one must first
provide it with an input. Inputs can contain multiple
sentences, paragraphs, or even full articles. The more
detailed the prompt, the more likely the model will
synthesize the desired piece of content. Short, simple
prompts are often too general in nature and will not, in
most cases, generate the desired output. Think of this task
as describing a wish granted by a genie: the description
should be precise enough to describe what the wish is and
contain enough detail that no ambiguity remains.
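To make the contrast concrete, the following sketch assembles a detailed prompt from explicit components and compares it with a short, underspecified one. The components and template are assumptions chosen for illustration, not a fixed format required by any model.

```python
# Illustrative comparison of a short, vague prompt and a detailed one
# built from explicit components (task, audience, tone, constraints).
# The template below is an assumption for illustration only.

def build_detailed_prompt(task, audience, tone, constraints):
    parts = [
        f"Write {task}.",
        f"The intended audience is {audience}.",
        f"Use a {tone} tone.",
    ]
    # Each constraint removes a piece of ambiguity from the request.
    parts.extend(f"Constraint: {c}" for c in constraints)
    return "\n".join(parts)

short_prompt = "Write an article about security."

detailed_prompt = build_detailed_prompt(
    task="a 300-word article about phishing awareness",
    audience="non-technical employees",
    tone="friendly but serious",
    constraints=[
        "include three concrete warning signs",
        "end with a one-sentence summary",
    ],
)
```

Sent to the same model, the short prompt leaves the topic, length, and audience to chance, while the detailed one pins each of them down.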
Many prompt engineering tricks [3, 4] have already been
found, such as a set of prompts applicable to ChatGPT
listed in “Prompt engineering: awesome GPT prompts” [5].
Additionally, some “magic” prompts have been
discovered to work with many large language models such
as GPT-3. One example is “Let’s think step by step.”, which
forces the model to work through its reasoning as it
answers. This particular prompt has been shown to improve
GPT-3’s handling of certain tasks, such as mathematical
problems and explaining why a joke is funny.
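The trick amounts to appending the magic phrase to the task before sending it to the model. A minimal sketch, with a hypothetical example question:

```python
# Sketch of the "magic prompt" trick: append "Let's think step by
# step." to a question so the model spells out its reasoning before
# answering (often called zero-shot chain-of-thought prompting).

STEP_BY_STEP = "Let's think step by step."

def with_step_by_step(question: str) -> str:
    """Append the reasoning trigger to a task prompt."""
    return f"{question.rstrip()}\n\n{STEP_BY_STEP}"

# Hypothetical arithmetic word problem used only for illustration.
prompt = with_step_by_step(
    "A juggler has 16 balls. Half are golf balls, and half of the "
    "golf balls are blue. How many blue golf balls are there?"
)
```

The resulting prompt asks the same question, but the trailing phrase nudges the model into producing intermediate steps rather than a bare answer.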
Prompt engineering is a relatively new discipline that is
being continually explored and exploited. Some prompts
only work with some models. For instance, ChatGPT is