Angora (S&P 2018), CSDN Library resource


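The title, "Angora: Efficient Fuzzing by Principled Search", and the description concern fuzzing in software testing, specifically methods for making fuzzing more efficient and more effective. The main points of the document are as follows.

1. Fuzzing. Fuzzing is a technique for finding software bugs. It automatically generates large numbers of inputs, either randomly or by mutation, and feeds them to the target program to expose abnormal behavior or crashes. It is a common security-testing method and can reveal memory corruption, resource leaks, logic errors and other defects.

2. Limitations of existing fuzzers. Fuzzers based on symbolic execution produce high-quality inputs but run slowly, while fuzzers based on random mutation run fast but have difficulty producing high-quality inputs. Symbolic execution solves path constraints more precisely but at low efficiency; random mutation relies on large numbers of randomly generated test cases and struggles to trigger complex program paths.

3. Design goals. Angora is a new mutation-based fuzzer whose main goal is to increase branch coverage, and thereby find more bugs, by solving path constraints without symbolic execution while still running efficiently.

4. Key techniques:
- Scalable byte-level taint tracking: tracks which input bytes flow into each path constraint, so that Angora mutates only those bytes.
- Context-sensitive branch count: adds calling context to branch coverage, so that executions of the same branch in different contexts are distinguished.
- Search based on gradient descent: treats a path constraint as a function over the input and solves it with gradient descent.
- Input length exploration: detects when the length of the input may affect a path constraint and lengthens the input accordingly.

5. Results. On the LAVA-M data set, Angora found almost all the injected bugs and more bugs than any other fuzzer it was compared with; in the program who it found eight times as many bugs as the second-best fuzzer. It also found 103 bugs that the LAVA authors injected but could not trigger. On eight popular, mature open-source programs, Angora found 6, 52, 29, 40 and 48 new bugs in file, jhead, nm, objdump and size, respectively.

6. Comparison with other fuzzers. Table 1 compares Angora with AFL, FUZZER, SES, VUzzer and Steelix on LAVA-M; Angora found more bugs than every one of them in each program.

7. Coverage measurement and evaluation. Fuzzers are commonly evaluated by coverage, that is, by how many branches the generated inputs reach. The paper measures Angora's coverage and evaluates how each key technique contributes to its performance.

In summary, Angora's advances in path-constraint solving and input generation offer a new way to improve the efficiency and bug-finding ability of fuzzers.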


To appear in the 39th IEEE Symposium on Security and Privacy, May 21–23, 2018, San Francisco, CA, USA

Angora: Efficient Fuzzing by Principled Search

Peng Chen

ShanghaiTech University

chenpeng@shanghaitech.edu.cn

Hao Chen

University of California, Davis

chen@ucdavis.edu

Abstract: Fuzzing is a popular technique for finding software bugs. However, the performance of the state-of-the-art fuzzers leaves a lot to be desired. Fuzzers based on symbolic execution produce quality inputs but run slowly, while fuzzers based on random mutation run fast but have difficulty producing quality inputs. We propose Angora, a new mutation-based fuzzer that outperforms the state-of-the-art fuzzers by a wide margin. The main goal of Angora is to increase branch coverage by solving path constraints without symbolic execution. To solve path constraints efficiently, we introduce several key techniques: scalable byte-level taint tracking, context-sensitive branch count, search based on gradient descent, and input length exploration. On the LAVA-M data set, Angora found almost all the injected bugs, found more bugs than any other fuzzer that we compared with, and found eight times as many bugs as the second-best fuzzer in the program who. Angora also found 103 bugs that the LAVA authors injected but could not trigger. We also tested Angora on eight popular, mature open source programs. Angora found 6, 52, 29, 40 and 48 new bugs in file, jhead, nm, objdump and size, respectively. We measured the coverage of Angora and evaluated how its key techniques contribute to its impressive performance.

1. Introduction

Fuzzing is a popular technique for finding software bugs. Coverage-based fuzzers face the key challenge of how to create inputs to explore program states. Some fuzzers use symbolic execution to solve path constraints [5, 8], but symbolic execution is slow and cannot solve many types of constraints efficiently [6]. To avoid these problems, AFL uses no symbolic execution or any heavyweight program analysis [1]. It instruments the program to observe which inputs explore new program branches, and keeps these inputs as seeds for further mutation. AFL incurs low overhead on program execution, but most of the inputs that it creates are ineffective (i.e., they fail to explore new program states) because it blindly mutates the input without taking advantage of the data flow in the program. Several fuzzers added heuristics to AFL to solve simple predicates, such as “magic bytes” [25, 19], but they cannot solve other path constraints.

TABLE 1: Bugs found on the LAVA-M data set by different fuzzers. "Listed bugs" is the number of bugs listed by the LAVA authors; the remaining columns give the bugs found by each fuzzer. Note that Angora found more bugs than the LAVA authors listed.

Program   Listed bugs   Angora   AFL   FUZZER   SES   VUzzer   Steelix
uniq               28       29     9        7     0       27         7
base64             44       48     0        7     9       17        43
md5sum             57       57     0        2     0     Fail        28
who              2136     1541     1        0    18       50       194

We designed and implemented a fuzzer, called Angora¹, that explores the states of a program by solving path constraints without using symbolic execution. Angora tracks the unexplored branches and tries to solve the path constraints on these branches. We introduced the following techniques to solve path constraints efficiently.

• Context-sensitive branch coverage. AFL uses context-insensitive branch coverage to approximate program states. Our experience shows that adding context to branch coverage allows Angora to explore program states more pervasively (Section 3.2).

• Scalable byte-level taint tracking. Most path constraints depend on only a few bytes in the input. By tracking which input bytes flow into each path constraint, Angora mutates only these bytes instead of the entire input, therefore reducing the space of exploration substantially (Section 3.3).

• Search based on gradient descent. When mutating the input to satisfy a path constraint, Angora avoids symbolic execution, which is expensive and cannot solve many types of constraints. Instead, Angora uses the gradient descent algorithm popular in machine learning to solve path constraints (Section 3.4).

• Type and shape inference. Many bytes in the input are used collectively as a single value in the program, e.g., a group of four bytes in the input used as a 32-bit signed integer in the program. To allow gradient descent to search efficiently, Angora locates such groups and infers their types (Section 3.5).

• Input length exploration. A program may explore certain states only when the length of the input exceeds some threshold, but neither symbolic execution nor gradient descent can tell the fuzzer when to increase the length of the input. Angora detects when the length of the input may affect a path constraint and then increases the input length adequately (Section 3.6).

1. The Angora rabbit has longer, denser hair than American Fuzzy Lop. We name our fuzzer Angora to signify that it has better program coverage than AFL while crediting AFL for its inspiration.

arXiv:1803.01307v1 [cs.CR] 4 Mar 2018
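The byte-level taint tracking idea can be illustrated with a toy model. The sketch below is a minimal Python illustration, not Angora's instrumentation: it assumes a hypothetical `Tainted` wrapper that carries, next to each value, the set of input byte offsets the value was computed from, and a hypothetical `toy_program` with a single conditional. Reading the taint set recorded at that conditional tells the fuzzer which bytes are worth mutating.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tainted:
    """An integer plus the set of input byte offsets that flowed into it (a toy taint label)."""
    value: int
    taint: frozenset = frozenset()

    def __add__(self, other: "Tainted") -> "Tainted":
        # The result depends on every byte that either operand depended on.
        return Tainted(self.value + other.value, self.taint | other.taint)


def load_u8(data: bytes, offset: int) -> Tainted:
    """Read one byte; it is tainted by exactly its own offset."""
    return Tainted(data[offset], frozenset({offset}))


def load_i32_le(data: bytes, offset: int) -> Tainted:
    """Read four bytes as a little-endian signed 32-bit integer, tainted by all four offsets."""
    value = int.from_bytes(data[offset:offset + 4], "little", signed=True)
    return Tainted(value, frozenset(range(offset, offset + 4)))


def toy_program(data: bytes, branch_log: list) -> bool:
    """Hypothetical target: one conditional whose predicate reads byte 0 and bytes 4..7."""
    lhs = load_i32_le(data, 4) + load_u8(data, 0)
    branch_log.append(("cond_1", lhs.taint))  # record which bytes reach this predicate
    return lhs.value == 0x1234


def bytes_to_mutate(data: bytes) -> frozenset:
    """Run the toy program once and return the offsets that flow into its conditional."""
    log: list = []
    toy_program(data, log)
    return log[0][1]


if __name__ == "__main__":
    print(sorted(bytes_to_mutate(bytes(64))))  # [0, 4, 5, 6, 7]
```

Only 5 of the 64 input bytes reach the predicate, so a fuzzer guided by this information searches a much smaller space. Real taint trackers operate on the compiled program and must also propagate labels through memory and library calls, which this model ignores.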

Angora outperformed state-of-the-art fuzzers substantially. Table 1 compares the bugs found by Angora with other fuzzers on the LAVA-M data set [9]. In each program in the data set, Angora found more bugs than any other fuzzer. In particular, in who Angora found 1541 bugs, eight times as many as the second-best fuzzer, Steelix. Moreover, Angora found 103 bugs that the LAVA authors injected but could not trigger. We also tested Angora on eight popular, mature open source programs. Angora found 6, 52, 29, 40 and 48 new bugs in file, jhead, nm, objdump and size, respectively (Table 5). We measured the coverage of Angora and evaluated how its key techniques contribute to its impressive performance.
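The "eight times" figure follows directly from the who row of Table 1, where Angora found 1541 bugs and Steelix, the next best fuzzer, found 194:

```latex
\[
  \frac{1541}{194} \approx 7.94 \approx 8
\]
```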

2. Background: American Fuzzy Lop (AFL)

Fuzzing is an automated testing technique to find bugs. American Fuzzy Lop (AFL) [1] is a state-of-the-art mutation-based graybox fuzzer. AFL employs lightweight compile-time instrumentation and genetic algorithms to automatically discover test cases that likely trigger new internal states in the targeted program. As a coverage-based fuzzer, AFL generates inputs to traverse different paths in the program to trigger bugs.

2.1. Branch coverage

AFL measures a path by a set of branches. During each run, AFL counts how many times each branch executes. It represents a branch as a tuple $(l_{\text{prev}}, l_{\text{cur}})$, where $l_{\text{prev}}$ and $l_{\text{cur}}$ are the IDs of the basic blocks before and after the conditional statement, respectively. AFL gets the branch coverage information by using lightweight instrumentation. The instrumentation is injected at each branch point at compile time. For each run, AFL allocates a path trace table to count how many times each branch of every conditional statement executes. The index to the table is the hash of a branch, $h(l_{\text{prev}}, l_{\text{cur}})$, where $h$ is a hash function.
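To make $h(l_{\text{prev}}, l_{\text{cur}})$ concrete, the sketch below follows the scheme described in AFL's technical documentation: each basic block gets a random compile-time ID, and the table index is the XOR of the current ID with the previous ID shifted right by one bit, so that the edges A→B and B→A land in different slots. The 64 KiB table size and the helper names `edge_index` and `record_path` are assumptions made for illustration.

```python
MAP_SIZE = 1 << 16  # assumed path trace table size (AFL's default is 64 KiB)


def edge_index(prev_block: int, cur_block: int) -> int:
    """h(l_prev, l_cur): XOR of the current block ID with the previous ID shifted right by one."""
    return (cur_block ^ (prev_block >> 1)) % MAP_SIZE


def record_path(blocks: list) -> bytearray:
    """Build the path trace table for one run, given the sequence of basic-block IDs it visited."""
    trace = bytearray(MAP_SIZE)
    prev = 0  # no predecessor before the first block
    for cur in blocks:
        idx = edge_index(prev, cur)
        trace[idx] = (trace[idx] + 1) & 0xFF  # 8-bit hit counter; wraps around in this sketch
        prev = cur
    return trace


if __name__ == "__main__":
    A, B = 0x1111, 0x2222  # hypothetical block IDs
    t = record_path([A, B, A, B])
    print(t[edge_index(A, B)], t[edge_index(B, A)])  # 2 1
```

Without the shift, `A ^ B == B ^ A` and the two directions of a loop would collide in one counter.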

AFL also keeps a global branch coverage table across different runs. Each entry contains an 8-bit vector that records how many times the branch executes in different runs. Each bit in this vector $b$ represents a range: $b_0, \ldots, b_7$ represent the ranges $[1], [2], [3], [4, 7], [8, 15], [16, 31], [32, 127], [128, \infty)$, respectively. For example, if $b_3$ is set, then there exists a run where this branch executed between 4 and 7 times, inclusive.

AFL compares the path trace table and branch coverage table to determine, heuristically, whether a new input triggers a new internal state of the program. An input triggers a new internal state if either of the following happens:

• The program executes a new branch, i.e., the path trace table has an entry for this branch but the branch coverage table has no entry for this branch.

• There exists a branch where the number of times, n, this branch executed in the current run differs from the counts in all previous runs. AFL determines this approximately by examining whether the bit representing the range of n was set in the corresponding bit vector in the branch coverage table.
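The range bucketing and the two novelty conditions above can be sketched as follows. This is a simplified model that assumes dictionaries keyed by branch hash rather than AFL's fixed-size byte arrays; `bucket_bit` and `has_new_state` are illustrative names, not AFL functions.

```python
from bisect import bisect_right
from typing import Dict

# Lower bounds of the ranges [1], [2], [3], [4,7], [8,15], [16,31], [32,127], [128,inf).
RANGE_LOWER_BOUNDS = [1, 2, 3, 4, 8, 16, 32, 128]


def bucket_bit(count: int) -> int:
    """Return i such that the execution count falls in the range represented by b_i."""
    if count < 1:
        raise ValueError("only branches that executed at least once are bucketed")
    return bisect_right(RANGE_LOWER_BOUNDS, count) - 1


def has_new_state(trace: Dict[int, int], coverage: Dict[int, int]) -> bool:
    """Compare one run's path trace table with the global branch coverage table.

    trace:    branch hash -> number of times the branch executed in this run
    coverage: branch hash -> 8-bit vector of ranges seen in earlier runs (updated in place)
    """
    new_state = False
    for branch, count in trace.items():
        if count == 0:
            continue
        bit = 1 << bucket_bit(count)
        seen = coverage.get(branch, 0)  # 0 means the branch has no entry yet
        if (seen & bit) == 0:  # a new branch, or a count range not seen before
            new_state = True
        coverage[branch] = seen | bit
    return new_state
```

For example, once a branch has executed once in some run, a later run that executes it 5 times is novel (5 falls in $[4, 7]$), but a run after that executing it 6 times is not.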

2.2. Mutation strategies

AFL applies the following mutations to the input at random [3].

• Bit or byte flips.

• Attempts to set “interesting” bytes, words, or dwords.

• Addition or subtraction of small integers to bytes, words, or dwords.

• Completely random single-byte sets.

• Block deletion, block duplication via overwrite or insertion, or block memset.

• Splicing two distinct input files at a random location.
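A few of these operators are sketched below in Python. The helper names are hypothetical; the list of interesting 8-bit values and the maximum arithmetic delta of 35 are meant to mirror AFL's defaults but should be treated as assumptions. Block duplication and memset are omitted, and `splice` uses a single random cut point, whereas AFL is more selective about where it cuts.

```python
import random

# AFL-style "interesting" 8-bit values: boundary cases that often trip comparisons (assumed list).
INTERESTING_8 = [-128, -1, 0, 1, 16, 32, 64, 100, 127]
ARITH_MAX = 35  # largest delta for add/subtract mutations (assumed)


def flip_bit(data: bytearray, rng: random.Random) -> None:
    """Flip one randomly chosen bit."""
    pos = rng.randrange(len(data) * 8)
    data[pos // 8] ^= 0x80 >> (pos % 8)


def set_interesting_byte(data: bytearray, rng: random.Random) -> None:
    """Overwrite one byte with an interesting value, stored as its two's-complement byte."""
    data[rng.randrange(len(data))] = rng.choice(INTERESTING_8) & 0xFF


def add_sub_byte(data: bytearray, rng: random.Random) -> None:
    """Add or subtract a small integer to one byte, wrapping modulo 256."""
    i = rng.randrange(len(data))
    delta = rng.randint(1, ARITH_MAX) * rng.choice((1, -1))
    data[i] = (data[i] + delta) & 0xFF


def random_byte(data: bytearray, rng: random.Random) -> None:
    """Set one byte to a completely random value."""
    data[rng.randrange(len(data))] = rng.randrange(256)


def delete_block(data: bytearray, rng: random.Random) -> None:
    """Delete a random block, always leaving at least one byte."""
    if len(data) < 2:
        return
    length = rng.randint(1, len(data) - 1)
    start = rng.randrange(len(data) - length + 1)
    del data[start:start + length]


def splice(a: bytes, b: bytes, rng: random.Random) -> bytes:
    """Join the head of `a` and the tail of `b` at one random cut point (simplified)."""
    shortest = min(len(a), len(b))
    if shortest < 2:
        return bytes(a)
    cut = rng.randrange(1, shortest)
    return a[:cut] + b[cut:]


def havoc(seed: bytes, rng: random.Random, rounds: int = 4) -> bytes:
    """Apply a few randomly chosen byte-level mutations in sequence."""
    data = bytearray(seed)
    ops = [flip_bit, set_interesting_byte, add_sub_byte, random_byte, delete_block]
    for _ in range(rounds):
        if not data:
            break
        rng.choice(ops)(data, rng)
    return bytes(data)
```

Passing an explicit `random.Random` instance keeps runs reproducible for a given seed, which is convenient when replaying a crash.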

3. Design

3.1. Overview

AFL and other similar fuzzers use branch coverage as the metric. However, they fail to consider the call context when calculating branch coverage. Our experience shows that without context, branch coverage would fail to explore program states adequately. Therefore, we propose context-sensitive branch coverage as the metric of coverage (Section 3.2).

Algorithm 1 shows Angora’s two stages: instrumentation and the fuzzing loop. During each iteration of the fuzzing loop, Angora selects an unexplored branch and searches for an input that explores this branch. We introduce the following key techniques to find the input efficiently.

• For most conditional statements, the predicate is influenced by only a few bytes in the input, so it would be unproductive to mutate the entire input. Therefore, when exploring a branch, Angora determines which input bytes flow into the corresponding predicate and focuses on mutating these bytes only (Section 3.3).

• After determining which input bytes to mutate, Angora needs to decide how to mutate them. Using random or heuristics-based mutations is unlikely to find satisfactory values efficiently. Instead, we view the path constraint on a branch as a constraint on a blackbox function over the input, and we adapt the gradient descent algorithm for solving the constraint (Section 3.4).

• During gradient descent, we evaluate the blackbox function over its arguments, where some arguments consist of multiple bytes. For example, when four consecutive bytes in the input that are always used together as an integer flow into a conditional statement, we ought to consider these four bytes as a single argument to the function instead of as four independent arguments. To achieve this goal, we need to infer which bytes in the input are used collectively as a single value and what the type of the value is (Section 3.5).

• It would be inadequate to only mutate bytes in the input. Some bugs are triggered only after the input is longer than a threshold, but this creates a dilemma in deciding the length of the input. If the input is too short, it may not trigger certain bugs. But if the input is too long, the program may run too slowly. Most fuzzers change the length of inputs using ad hoc approaches. By contrast, Angora instruments the program with code that detects when a longer input may explore new branches and that determines the minimum required length (Section 3.6).

Algorithm 1 Angora’s fuzzing loop. Each while loop has a budget (maximum allowed number of iterations).

function FUZZ(program, seeds)
    Instrument program in two versions: program_nt (no taint tracking) and program_t (with taint tracking).
    branches ← empty hash table    ▷ Key: an unexplored branch b. Value: the input that explored b’s sibling branch.
    for all input ∈ seeds do
        path ← Run program_t(input)
        for all unexplored branch b on path do
            branches[b] ← input
        end for
    end for
    while branches ≠ ∅ do
        Select b from branches
        while b is still unexplored do
            Mutate branches[b] to get a new input input′ (Algorithm 5)
            Run program_nt(input′)
            if input′ explored new branches then
                path′ ← Run program_t(input′)
                for all unexplored branch b′ on path′ do
                    branches[b′] ← input′
                end for
            end if
            if b was explored then
                branches ← branches − {b}
            end if
        end while
    end while
end function
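Algorithm 1 can be approximated in Python as below. This is a sketch under several assumptions: a branch is modeled as a hypothetical `(condition_id, direction)` pair; the two instrumented builds are passed in as callables `run_tainted` and `run_fast` that return the set of branches a run took; and `mutate` stands in for Algorithm 5, here a toy that simply enumerates the values of one byte. Unlike Algorithm 1, the sketch drops a branch once its budget is spent, so that the outer loop always terminates.

```python
from typing import Callable, Dict, Iterable, Set, Tuple

Branch = Tuple[int, bool]  # hypothetical: (conditional-statement id, direction taken)


def fuzz(
    seeds: Iterable[bytes],
    run_tainted: Callable[[bytes], Set[Branch]],
    run_fast: Callable[[bytes], Set[Branch]],
    mutate: Callable[[bytes, Branch], bytes],
    budget_per_branch: int = 1000,
) -> Set[Branch]:
    covered: Set[Branch] = set()
    # Key: an unexplored branch. Value: the input that explored its sibling branch.
    branches: Dict[Branch, bytes] = {}

    def record(inp: bytes, path: Set[Branch]) -> None:
        covered.update(path)
        for cond, taken in path:
            sibling = (cond, not taken)
            if sibling not in covered and sibling not in branches:
                branches[sibling] = inp
        for b in [b for b in branches if b in covered]:
            del branches[b]  # explored now, so no longer a target

    for seed in seeds:
        record(seed, run_tainted(seed))

    while branches:
        target, base = next(iter(branches.items()))  # "Select b from branches"
        for _ in range(budget_per_branch):
            candidate = mutate(base, target)
            path = run_fast(candidate)  # cheap run without taint tracking
            if not path <= covered:  # explored new branches: rerun with taint tracking
                record(candidate, run_tainted(candidate))
            if target in covered:
                break
        branches.pop(target, None)  # explored, or budget exhausted (a simplification)
    return covered


def toy_program(data: bytes) -> Set[Branch]:
    """Hypothetical target: condition 1 is only reached when condition 0 is true."""
    path = {(0, data[0] == 7)}
    if data[0] == 7:
        path.add((1, data[1] > 200))
    return path


def make_enumerating_mutator() -> Callable[[bytes, Branch], bytes]:
    """Toy stand-in for Algorithm 5: assumes byte i feeds condition i and tries its values in order."""
    tries: Dict[Branch, int] = {}

    def mutate(base: bytes, target: Branch) -> bytes:
        n = tries.get(target, 0)
        tries[target] = n + 1
        data = bytearray(base)
        data[target[0]] = n % 256
        return bytes(data)

    return mutate


if __name__ == "__main__":
    result = fuzz([b"\x00\x00"], toy_program, toy_program, make_enumerating_mutator(), 300)
    print(sorted(result))  # [(0, False), (0, True), (1, False), (1, True)]
```

The nested structure matters: condition 1 only becomes a target after the input that satisfied condition 0 is recorded, and that input then serves as the base for the next search.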

Figure 1 shows a diagram of the steps in fuzzing a conditional statement. The program in Figure 2 demonstrates these steps in action.

• Byte-level taint tracking: When fuzzing the conditional statement on Line 2, Angora uses byte-level taint tracking to determine that bytes 1024–1031 flow into this expression, so it mutates these bytes only.

• Search algorithm based on gradient descent: Angora needs to find inputs that run each of the two branches of the conditional statement on Line 2. Angora treats the expression in the conditional statement as a function $f(x)$ over the input $x$, and uses gradient descent to find two inputs $x$ and $x'$ such that $f(x) > 0$ and $f(x') \le 0$.

• Shape and type inference: $f(x)$ is a function over the vector $x$. During gradient descent, Angora computes the partial derivative of $f$ with respect to each component of $x$ separately, so it must determine each component and its type. On Line 2, Angora determines that $x$ consists of two components, each consisting of four bytes in the input and having the type 32-bit signed integer.

• Input length exploration: main will not call foo unless the input has at least 1032 bytes. Instead of blindly trying longer inputs, we instrument common functions that read from input and determine if longer input would explore new states. For example, if the initial input is shorter than 1024 bytes, then the conditional statement on Line 12 will execute the true branch. Since the return value of fread is compared with 1024, Angora knows that only inputs at least 1024 bytes long will explore the false branch. Similarly, the instrumentation on Lines 16 and 19 instructs Angora to extend the input to at least 1032 bytes to execute the function foo.
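The gradient-descent search over the two int32 components can be sketched as follows. The condition is hypothetical, since Figure 2 is not reproduced here: assume the branch is `if (a + 3*b >= 1000000)`, where `a` and `b` are the little-endian int32 values at bytes 1024–1031, rewritten as a distance `f(x) = 1000000 - (a + 3b)` that must become ≤ 0. Partial derivatives are estimated by finite differences because f is treated as a black box, and the step size is chosen by doubling; Angora's own search differs in its details.

```python
import struct

INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1
OFFSET, SHAPE = 1024, "<2i"  # assumed inference result: two little-endian int32s at bytes 1024-1031


def clamp_i32(v: int) -> int:
    return max(INT32_MIN, min(INT32_MAX, v))


def partial_derivatives(f, x):
    """Finite-difference estimate of df/dx_i with step 1, treating f as a black box."""
    fx = f(x)
    return [f(x[:i] + [x[i] + 1] + x[i + 1:]) - fx for i in range(len(x))]


def descend(f, x, max_iters=50):
    """Gradient descent with a doubling line search; returns x with f(x) <= 0, or None."""
    for _ in range(max_iters):
        fx = f(x)
        if fx <= 0:
            return x
        grad = partial_derivatives(f, x)
        if not any(grad):
            return None  # flat region: a real fuzzer would restart elsewhere (not modeled)
        best, best_f, step = None, fx, 1
        while True:
            cand = [clamp_i32(xi - step * gi) for xi, gi in zip(x, grad)]
            fc = f(cand)
            if fc >= best_f:
                break  # overshot or stalled: keep the previous best step
            best, best_f = cand, fc
            if fc <= 0:
                break  # constraint satisfied
            step *= 2
        if best is None:
            return None
        x = best
    return None


def predicate_distance(x):
    """Hypothetical branch `if (a + 3*b >= 1000000)` rewritten so that it is taken when f(x) <= 0."""
    a, b = x
    return 1_000_000 - (a + 3 * b)


def solve_branch(seed: bytes):
    """Decode the inferred components, search, and write the solution back into the input."""
    x = list(struct.unpack_from(SHAPE, seed, OFFSET))
    solution = descend(predicate_distance, x)
    if solution is None:
        return None
    out = bytearray(seed)
    struct.pack_into(SHAPE, out, OFFSET, *solution)
    return bytes(out)


if __name__ == "__main__":
    result = solve_branch(bytes(1032))
    print(struct.unpack_from(SHAPE, result, OFFSET))  # (131072, 393216)
```

Treating bytes 1024–1031 as two int32 values rather than eight independent bytes is what lets one gradient step move `a` and `b` by large amounts at once; searching byte by byte would have to coordinate carries across bytes.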

3.2. Context-sensitive branch count

Section 2 describes AFL’s branch coverage table. Its design has several advantages. First, it is space efficient: the number of branches is linear in the size of the program. Second, using ranges to count branch execution provides good heuristics on whether a different execution count indicates a new internal state of the program. When the execution count is small (e.g., less than four), any change in the count is significant. However, when the execution count is large (e.g., greater than 32), a change has to be large enough to be considered significant.

But this design has a limitation. Because AFL’s branches are context-insensitive, they fail to distinguish the executions of the same branch in different contexts, which may cause AFL to overlook new internal states of the program. Figure 3 illustrates this problem. Consider the coverage of the branch on Line 3. During the first run, the program takes the input “10”. When it calls f() on Line 19, it executes the true branch on Line 4. Later, when it calls f() on Line 21, it executes the false branch on Line 10. Since AFL’s definition of branch is context-insensitive, it thinks that both branches have executed. Later, when the program takes a new input “01”, AFL thinks that this input triggers no new internal state, since both the branches on Lines 4 and 10 executed in the previous run. But in fact this new input triggers a new internal state, as it will cause a crash on Line 6 when input[2]==1.
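The Figure 3 scenario can be reproduced with a toy model. Figure 3's code is not included in this extract, so the model below is an assumption: f() is called once from Line 19 on the first input character and once from Line 21 on the second, and its branch on Line 3 is true when that character is '1'. Branch counts are reduced to sets of covered branches, and the calling context is represented by a call-site label, purely for brevity.

```python
from typing import List, Set, Tuple

Event = Tuple[str, str, bool]  # (calling context, branch, direction taken)


def run(inp: str) -> List[Event]:
    """Hypothetical model of Figure 3: f() is called from Line 19 on inp[0], then from Line 21 on inp[1]."""
    events = []
    for call_site, ch in (("call@19", inp[0]), ("call@21", inp[1])):
        taken = ch == "1"  # assumption: the branch on Line 3 is true when f()'s character is '1'
        events.append((call_site, "branch@3", taken))
    return events


def context_insensitive(events: List[Event]) -> Set[tuple]:
    """AFL-style view: the calling context is discarded."""
    return {(branch, taken) for _, branch, taken in events}


def context_sensitive(events: List[Event]) -> Set[tuple]:
    """Context-sensitive view: the same branch under different contexts counts separately."""
    return {(ctx, branch, taken) for ctx, branch, taken in events}


def is_new(view, seen: set, inp: str) -> bool:
    """Report whether this input covers anything not in `seen`, then add its coverage to `seen`."""
    cov = view(run(inp))
    new = not cov <= seen
    seen |= cov
    return new


if __name__ == "__main__":
    seen_insensitive, seen_sensitive = set(), set()
    for inp in ("10", "01"):
        print(inp, is_new(context_insensitive, seen_insensitive, inp),
              is_new(context_sensitive, seen_sensitive, inp))
    # 10 True True
    # 01 False True
```

Under the context-insensitive view, “01” looks like nothing new and would be discarded; under the context-sensitive view it is kept as a seed, which is what allows the later mutation that reaches the crash.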

We incorporate context into the definition of branches. We define a branch as a tuple $(l_{\text{prev}}, l_{\text{cur}}, \text{context})$, where
