Enhancing Security of AI-Based Code Synthesis with GitHub
Copilot via Cheap and Efficient Prompt-Engineering
Jakub Res
iresj@fit.vut.cz
Brno University of Technology,
Faculty of Information Technology
Czech Republic
Aleš Smrčka
smrcka@fit.vut.cz
Brno University of Technology,
Faculty of Information Technology
Czech Republic
Ivan Homoliak
ihomoliak@fit.vut.cz
Brno University of Technology,
Faculty of Information Technology
Czech Republic
Kamil Malinka
malinka@fit.vut.cz
Brno University of Technology,
Faculty of Information Technology
Czech Republic
Martin Perešíni
iperesini@fit.vut.cz
Brno University of Technology,
Faculty of Information Technology
Czech Republic
Petr Hanacek
hanacek@fit.vut.cz
Brno University of Technology,
Faculty of Information Technology
Czech Republic
ABSTRACT
AI assistants for coding are on the rise. However, one of the reasons developers and companies avoid harnessing their full potential is the questionable security of the generated code. This paper first reviews the current state-of-the-art and identifies areas for improvement on this issue. Then, we propose a systematic approach based on prompt-altering methods to achieve better code security of (even proprietary black-box) AI-based code generators such as GitHub Copilot, while minimizing the complexity of the application from the user point-of-view, the computational resources, and operational costs. In sum, we propose and evaluate three prompt-altering methods: (1) scenario-specific, (2) iterative, and (3) general clause, and we discuss their combination. Contrary to an audit of code security, the latter two of the proposed methods require no expert knowledge from the user. We assess the effectiveness of the proposed methods on GitHub Copilot using the OpenVPN project in realistic scenarios, and we demonstrate that the proposed methods reduce the number of insecure generated code samples by up to 16% and increase the number of secure code samples by up to 8%. Since our approach does not require access to the internals of the AI models, it can in general be applied to any AI-based code synthesizer, not only GitHub Copilot.
1 INTRODUCTION
With the release of ChatGPT [1], public attention shifted towards AI assistant tools. These assistants are proficient in many areas, including software engineering and coding. The advent of AI coding assistants means transitioning from intelligent code-completion tools to code-generating tools. Although these AI assistants are far from perfect in terms of solving coding problems, a recent model, AlphaCode 2, proposed by DeepMind, scored better than over 85% of human competitors [9].
According to Liang et al. [11], in a survey with 410 GitHub users' responses, 70% of respondents who had experience with GitHub Copilot utilize it at least once a month, while 46% utilize the AI assistant daily. The most frequent reasons developers gave for using AI assistants were fewer keystrokes to write code and faster coding.
Due to the rapidly rising popularity of AI assistants, researchers started to focus on studying the quality of the synthesized code and ways of improving it (see Sec. 5.2). While observing validity or correctness, many studies overlook the crucial aspect of code: security.

Fig. 1: Example of a security issue generated by AI. The scenario comes from the dataset proposed in [17].
In the motivating example, the AI assistant was tasked with generating a code snippet to fill a gap in the context of a C program. Its objective was to create a new instance of the structure "person" and assign a status value of zero to it. Although the AI assistant provided reasonable code (see Fig. 1), the snippet contains CWE-476 [25] (the malloc function could fail to allocate memory, thus resulting in a NULL pointer dereference).
In this research, we aim to study various ways of improving the security of code generated by any proprietary Large Language Model (LLM), and we demonstrate our approach on the well-known GitHub Copilot [6].
There exist a few categories of techniques for improving the code synthesis of AI models, such as output optimization, model fine-tuning, and prompt engineering, and each of them has some pros and cons. In this work, we focus on efficiency, generality, and low costs, and therefore prompt engineering is the most suitable technique for us. While the literature on prompt engineering is mostly general [14, 31, 5, 4], we are more specific and determine four approaches to it, which we further investigate: (1) scenario-specific information and warning providing, (2) iterative security-specific prompting, (3) general alignment shifting using an inception prompt (i.e., general clause), (4) cooperative agents system. In particular, we experiment with the former three approaches, which are orthogonal in their principles.
Contributions. The contributions of our paper are as follows:
(1) We reviewed the literature and identified three different areas of code synthesis improvements of LLMs, involving
(Fig. 1 snippet)
person *newPerson = (person *)malloc(sizeof(person));
newPerson->status = 0;
arXiv:2403.12671v1 [cs.CR] 19 Mar 2024