Simulation of Built-in PHP Features for Precise Static Code Analysis
Johannes Dahse
Horst Görtz Institute for IT-Security (HGI)
Ruhr-University Bochum, Germany
johannes.dahse@rub.de
Thorsten Holz
Horst Görtz Institute for IT-Security (HGI)
Ruhr-University Bochum, Germany
thorsten.holz@rub.de
Abstract—The World Wide Web grew rapidly during the last
decades and is used by millions of people every day for online
shopping, banking, networking, and other activities. Many of
these websites are developed with PHP, the most popular scripting
language on the Web. However, PHP code is prone to different
types of critical security vulnerabilities that can lead to data
leakage, server compromise, or attacks against an application’s
users. This problem can be addressed by analyzing the source
code of the application for security vulnerabilities before the
application is deployed on a web server. In this paper, we present
a novel approach for the precise static analysis of PHP code to
detect security vulnerabilities in web applications. Unlike previous
work in this area, we perform a comprehensive configuration and
simulation of over 900 PHP built-in features, which allows us to
precisely model the highly dynamic PHP language. By performing
an intra- and inter-procedural data flow analysis and by creating
block and function summaries, we are able to efficiently perform
a backward-directed taint analysis for 20 different types of
vulnerabilities. Furthermore, string analysis enables us to validate
sanitization in a context-sensitive manner. Our method is the
first to perform fine-grained analysis of the interaction between
different types of sanitization, encoding, sources, sinks, markup
contexts, and PHP settings. We implemented a prototype of our
approach in a tool called RIPS. Our evaluation shows that RIPS
is capable of finding severe vulnerabilities in popular real-world
applications: we reported 73 previously unknown vulnerabilities
in five well-known PHP applications such as phpBB, osCommerce,
and the conference management software HotCRP.
I. INTRODUCTION
According to W3Techs, PHP is the most popular server-side
programming language of all recognized websites with
a share of 81.4% [40]. Many well-known websites such as
Facebook and Wikipedia as well as the most commonly used
content management systems [39] are written in PHP. Due
to its weak and dynamic typing and a large
number of built-in features, the language is easy to learn
for beginners. However, PHP has a large number of complex
language characteristics that lead to many intricacies in
practice. As a result, PHP applications are prone to software
vulnerabilities: in the MITRE CVE database [26], about 29%
of all security vulnerabilities found in computer software are
related to PHP. The wide distribution of PHP and the large
number of PHP-related vulnerabilities lead to a high interest
in finding and patching security vulnerabilities in PHP source
code (e. g., [2, 18, 34, 36, 41, 43]).
Detection of Taint-Style Vulnerabilities: A security
vulnerability occurs when data supplied by the user is used
in critical operations of the application and is not sanitized
sufficiently. An attacker might be able to exploit this flaw by
injecting malicious input that changes the behavior or result
of this operation [33]. These kinds of vulnerabilities are called
taint-style vulnerabilities because untrusted sources such as
user-supplied data are considered as tainted and literally flow
into vulnerable parts of the program (referred to as sensitive
sinks) [5, 24, 29, 35].
Given that large applications can have many thousands of
lines of code and that review time is limited by cost, a manual
source code review might be incomplete and inefficient.
Static Code Analysis (SCA) tools can help code reviewers
to minimize the time and costs of a review and, to a limited
degree, convey security expertise to the user by encapsulating
it in the tool. They automate the process of finding security
vulnerabilities in source code by using taint analysis [35].
Here, the data flow between sources and sinks is modeled and
analyzed for sanitization, which is a hard problem, especially
for highly dynamic languages such as PHP.
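To make the source, sanitizer, and sink terminology concrete, the following is a deliberately tiny sketch of a backward-directed taint check over straight-line assignments. It is not how RIPS represents data flow: the assignment map, the `isTainted()` helper, and the per-vulnerability `SANITIZERS` table are hypothetical simplifications that ignore markup contexts, arrays, and inter-procedural flow.

```php
<?php
// Hypothetical sanitizer table per vulnerability type (illustrative only).
const SANITIZERS = [
    'sqli' => ['intval'],
    'xss'  => ['intval', 'htmlentities'],
];

/**
 * Toy backward taint check. $assign maps a variable name to
 * [function applied or null, operand], i.e. $var = func($operand).
 * Operands starting with '$_' (e.g. "$_GET['id']") are treated as sources.
 */
function isTainted(string $var, array $assign, string $vulnType): bool
{
    $seen = [];
    while (!isset($seen[$var])) {
        $seen[$var] = true;
        if (strncmp($var, '$_', 2) === 0) {
            return true;              // reached user input without sanitization
        }
        if (!isset($assign[$var])) {
            return false;             // unknown origin: treated as untainted here
        }
        [$func, $operand] = $assign[$var];
        if ($func !== null && in_array($func, SANITIZERS[$vulnType], true)) {
            return false;             // sanitized for this vulnerability type
        }
        $var = $operand;              // continue one step further back
    }
    return false;                     // cyclic assignments without a source
}

// $a = $_GET['id']; $b = trim($a); $c = intval($b); $h = htmlentities($a);
$assign = [
    '$a' => [null, "\$_GET['id']"],
    '$b' => ['trim', '$a'],
    '$c' => ['intval', '$b'],
    '$h' => ['htmlentities', '$a'],
];

var_dump(isTainted('$b', $assign, 'sqli')); // bool(true): trim() does not sanitize
var_dump(isTainted('$c', $assign, 'sqli')); // bool(false): intval() does
var_dump(isTainted('$h', $assign, 'xss'));  // bool(false)
var_dump(isTainted('$h', $assign, 'sqli')); // bool(true): not an SQL sanitizer in this toy table
```

The per-type table is the important design point: whether a function counts as sanitization depends on the sink it flows into, which a single global list of "safe" functions cannot express.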
Current Approaches: Recent work in this area focused
on the detection of only a limited number of vulnerability
types such as Cross-Site Scripting (XSS) and SQL injection
(SQLi) vulnerabilities [18, 41, 42] or the analysis of sani-
tization routines [13]. Furthermore, existing approaches are
typically imprecise in the sense that some language features
such as built-in sanitization or string manipulation functions
and markup contexts are not modeled accurately. As a result,
certain types of vulnerabilities and sanitization cannot be found
by such approaches. For example, Saner [2] relies on manually
generated test cases, which implies that it can only detect
the vulnerabilities encoded within the tool. Furthermore, other
approaches such as the one presented by Xie and Aiken [43]
do not model built-in functions and thus miss important attack
and defense vectors. Commercial tools that support the PHP
language focus on the detection of vulnerabilities in three or
more programming languages. Consequently, these tools build a
more generic model and miss many PHP-specific
vulnerabilities and characteristics [23].
Permission to freely reproduce all or part of this paper for noncommercial
purposes is granted provided that copies bear this notice and the full citation
on the first page. Reproduction for commercial purposes is strictly prohibited
without the prior written consent of the Internet Society, the first-named author
(for reproduction of an entire paper only), and the author’s employer if the
paper was prepared within the scope of employment.
NDSS ’14, 23-26 February 2014, San Diego, CA, USA
Copyright 2014 Internet Society, ISBN 1-891562-35-5
http://dx.doi.org/OETT
Our Approach: In this paper, we introduce a novel
approach for the precise static analysis of PHP code. Based
on the insight that prior work missed vulnerabilities due to
not precisely modeling the specifics of the PHP language, we
perform a comprehensive analysis and simulation of built-in
language features such as 952 PHP built-in functions with
respect to the called arguments. This allows us to accurately
analyze the data flow, to detect various sources and sinks,
and to analyze sanitization in a more comprehensive way
compared to prior work in this area. As a result, we find more
security vulnerabilities with higher accuracy. More specifically,
we perform an intra- and inter-procedural data flow analysis
to create summaries of the data flow within the application to
detect taint-style vulnerabilities very efficiently. We perform
context-sensitive string analysis to refine our taint analysis
results based on the current markup context, source type,
and PHP configuration. Our approach can be generalized to other
languages by modelling their (less diverse) built-in features,
while the analysis algorithms remain the same.
We implemented our approach for PHP in a tool called
RIPS and evaluated it by analyzing popular and complex real-
world applications such as phpBB, HotCRP, and osCommerce.
In total, we analyzed 1 390 files with almost half a million lines
of PHP code. We found that on average, every 4th line of
code required taint analysis. Overall, we detected and reported
73 previously unknown vulnerabilities, for example
three SQL injection vulnerabilities in HotCRP and several XSS
vulnerabilities in osCommerce. We also analyzed several web
applications that were used during the evaluation of prior work
in this area and found that RIPS outperforms existing tools.
In summary, the contributions of this paper are as follows:
• We demonstrate that a precise modeling of the com-
plex characteristics of the PHP language is essential
to detect weak sanitization and to find security vul-
nerabilities in modern PHP applications. To this end,
we are the first to support the detection of 20 different
types of security vulnerabilities.
• We introduce the algorithms of our tool, which specifically
focuses on the intricacies of the PHP language.
The tool is the first to perform a fine-grained analysis
of a large number of PHP built-in features. It performs
string analysis for context-sensitive vulnerability con-
firmation of 45 different markup contexts with respect
to the interaction of sink, source type, sanitization,
encoding, and PHP configuration.
• We implemented a prototype of our approach in a tool
called RIPS. We evaluate our approach on large, real-world
applications and demonstrate that RIPS is capable of finding
several severe, previously known and unknown vulnerabilities.
Furthermore, we compare our
results to previous work in this area and demonstrate
that RIPS outperforms state-of-the-art tools.
II. TECHNICAL BACKGROUND
In contrast to prior work, we include edge cases of complex
taint-style vulnerabilities in our analysis. We thus first provide
a brief overview of such vulnerabilities and then examine some
specific features and characteristics of the PHP language to
illustrate the difficulties when performing PHP code analysis.
A. Taint-style Vulnerabilities
In the following, we examine the concept of two common
taint-style vulnerabilities where tainted data flows into
a sensitive sink. More specifically, we focus on sanitization
approaches and weaknesses in different scenarios that have to
be identified by our tool in a precise manner.
1) SQL Injection: Web applications are often connected to
a database that stores sensitive data like passwords or credit
card numbers. If a web application dynamically generates a
SQL query with unsanitized user input, an attacker can poten-
tially inject her own SQL syntax to modify the query. This
type of vulnerability is well known and called SQL injection
(SQLi) [10]. Depending on the environment, the attacker can
potentially extract sensitive data from the database, modify
data, or compromise the web server.
To patch such a vulnerability, the user input must be
sanitized before it is embedded into the query. For example,
all quotes within a quoted string must be escaped so that the
input cannot break out of it. For MySQL, the PHP built-in function
mysql_real_escape_string() adds a preceding back-
slash to every single quote, double quote, and backslash to
neutralize their syntactical effect.
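The effect of escaping in a quoted context can be sketched as follows. Because mysql_real_escape_string() needs an open MySQL connection and was removed in PHP 7.0, the sketch substitutes addslashes(), which also prefixes single quotes, double quotes, and backslashes (plus NUL bytes) with a backslash. It is only a stand-in for illustration, not a safe replacement (for example with multi-byte character sets such as GBK), and the payload is a hypothetical attacker input.

```php
<?php
// Hypothetical attacker input that tries to close the string literal.
$payload = "x' OR '1'='1";

// Stand-in for mysql_real_escape_string(): prefixes ', ", \ (and NUL) with a backslash.
$escaped = addslashes($payload);

// In a quoted context the escaped quotes stay inside the string literal.
$query = "SELECT data FROM users WHERE name = '$escaped'";
echo $query, "\n";
// SELECT data FROM users WHERE name = 'x\' OR \'1\'=\'1'
```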
However, if the user input is not embedded into quotes
within the SQL query, no quotes are required for evasion
(see Listing 1). In this case, escaping is not sufficient for
sanitization and the application is still vulnerable. To find such
vulnerabilities, we not only have to model sanitization routines,
but also consider if they are applied to the right markup
context. A complementary way to prevent SQLi vulnerabilities
is to use prepared statements [37].
$id = mysql_real_escape_string($_GET['id']);
mysql_query("SELECT data FROM users WHERE id = $id");
Listing 1: Insufficient sanitization of a SQL query.
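Listing 1 can be replayed to show why escaping does not help in an unquoted, numeric context. The sketch uses addslashes() as a stand-in for mysql_real_escape_string() (which needs a live connection and was removed in PHP 7.0) and a hypothetical payload. The integer cast is one possible context-appropriate sanitizer, not the only one; prepared statements avoid the problem altogether.

```php
<?php
// Hypothetical payload that needs no quotes at all.
$payload = '1 OR 1=1';
$id = addslashes($payload);     // stand-in for mysql_real_escape_string()

// Unquoted context as in Listing 1: nothing was escaped, the condition is injected.
$vulnerable = "SELECT data FROM users WHERE id = $id";

// Numeric context: an integer cast keeps only the leading number.
$castId = (int)$payload;
$fixed = "SELECT data FROM users WHERE id = $castId";

echo $vulnerable, "\n"; // SELECT data FROM users WHERE id = 1 OR 1=1
echo $fixed, "\n";      // SELECT data FROM users WHERE id = 1
```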
2) Cross-Site Scripting: Cross-Site Scripting (XSS) [21]
is the most common security vulnerability in web applica-
tions [34]. It occurs when user input is reflected to the HTML
result of the application in an unsanitized way. It is then
possible to inject HTML markup into the response page that
is rendered by the client’s browser. An attacker can abuse
this behavior by embedding malicious code into the response
that for example locally defaces the web site or steals cookie
information.
To patch such a vulnerability, the output has to be validated.
Meta characters like < and > as well as quotes must be replaced
by their corresponding HTML entities (e. g., &lt; and &gt;).
The characters will still be displayed by the browser, but
not rendered as HTML markup. In PHP, the built-in function
htmlentities() can be used for output validation. As with
SQL queries, however, it is important to adjust sanitization to
the context of the HTML markup [16].
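A minimal illustration of output encoding with htmlentities(). The flags are passed explicitly because PHP 8.1 changed the default from ENT_COMPAT | ENT_HTML401 to ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401; the payload is hypothetical.

```php
<?php
$payload = '<script>alert("xss")</script>';
$encoded = htmlentities($payload, ENT_QUOTES | ENT_HTML401);

// The browser displays these characters but does not parse them as markup.
echo $encoded, "\n";
// &lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;
```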
Listing 2 depicts a snippet of an application that is vulner-
able to XSS. Although sanitization is applied, the context of
the injection still allows an attacker to break the markup and
inject Javascript code. The function htmlentities() only
sanitizes the characters < and > as well as double quotes by
default. Note that the function does not sanitize single quotes.
Thus, an attacker can break out of the single-quoted href attribute
and inject an event handler that is attached to the link tag (e. g.,
' onmouseover='alert(1)). To encode single quotes to
the HTML entity &#039;, the parameter ENT_QUOTES must
be added to the function htmlentities().
$page = htmlentities($_GET['page']);
echo "<a href='$page'>click</a>";
Listing 2: Insufficient sanitization with htmlentities().
In our example, however, the application would still be vul-
nerable. Instead of breaking the markup, an attacker can abuse
the diversity of web browsers and inject a Javascript protocol
handler into the link (e. g., javascript:alert(1)). This
injection does not need any meta characters that are encoded
by htmlentities().
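Both weaknesses of Listing 2 can be reproduced directly. The sketch passes ENT_COMPAT | ENT_HTML401 explicitly, since that was the default flag set when the paper was written (PHP 8.1 switched the default to include ENT_QUOTES). The payloads are the ones from the text; the `isSafeHref()` allowlist at the end is a hypothetical mitigation for the protocol-handler case, not part of RIPS.

```php
<?php
// Attribute break-out: single quotes survive the pre-PHP-8.1 default flags.
$breakout = "' onmouseover='alert(1)";
$compat = htmlentities($breakout, ENT_COMPAT | ENT_HTML401);
$quotes = htmlentities($breakout, ENT_QUOTES | ENT_HTML401);
echo "<a href='$compat'>click</a>\n"; // <a href='' onmouseover='alert(1)'>click</a>
echo "<a href='$quotes'>click</a>\n"; // single quotes encoded as &#039;

// Protocol handler: the payload contains no character that htmlentities() changes.
$scheme = 'javascript:alert(1)';
$encoded = htmlentities($scheme, ENT_QUOTES | ENT_HTML401);
var_dump($encoded === $scheme); // bool(true)

// Hypothetical allowlist: accept only absolute http(s) URLs for href values.
function isSafeHref(string $url): bool
{
    return preg_match('~^https?://~i', $url) === 1;
}

var_dump(isSafeHref($scheme));                // bool(false)
var_dump(isSafeHref('https://example.com/')); // bool(true)
```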
Note that there are several other scopes that need to be
considered when using sanitization. For example, when user
input is used within style and script tags, or within event-handler
attributes, additional sanitization is required. Previous
work did not take the different scopes and their intrinsic
behaviors into account.
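The event-handler scope shows why HTML encoding alone is not enough there: the browser decodes character references in an attribute value before passing it to the JavaScript engine. This sketch approximates that decoding step with html_entity_decode(); the `go()` function and the payload are hypothetical.

```php
<?php
$payload = "');alert(1);//";
$encoded = htmlentities($payload, ENT_QUOTES | ENT_HTML401); // &#039;);alert(1);//

// Markup that ends up in the response.
$html = "<a href=\"#\" onclick=\"go('$encoded')\">click</a>";

// What the JavaScript engine receives after the browser decodes the attribute.
$js = html_entity_decode("go('$encoded')", ENT_QUOTES | ENT_HTML401);

echo $html, "\n";
echo $js, "\n"; // go('');alert(1);//')
```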
B. Intricacies of the PHP language
PHP is the fastest growing and most popular scripting language
for web applications. It is a highly dynamic language
with lots of complicated semantics [3] that are frequently used
by modern web applications [12]. In this section, we introduce
the most important language features our tool has to model
precisely in order to correctly identify the flow of tainted data
into sensitive sinks. In particular, the flow of tainted strings is
of interest for taint-style vulnerabilities.
1) Dynamic and Weak Typing: PHP is a dynamically typed
language and does not require an explicit declaration of vari-
ables. The variable type is inferred on the first assignment at
runtime. Additionally, PHP is a weakly typed language and its
variables are not bound to a specific data type. Thus, data types
can be mixed with other data types at runtime. In Listing 3 the
string 'test' is evaluated to 0 to fit the mathematical operation
and added to 1. The integer result is stored in the variable
$var2 whose previous data type was string.
$var1 = 1; $var2 = 'test';
$var2 = $var1 + $var2; // 1
Listing 3: Addition of a string and an integer in PHP.
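The type change can be observed with gettype(). This sketch uses a numeric string instead of 'test', because arithmetic on non-numeric strings triggers version-dependent warnings or errors in newer PHP releases. The loose comparison of two numeric strings in scientific notation is a related weak-typing pitfall that an analysis may need to model.

```php
<?php
$var1 = 1;
$var2 = '41';                  // string
echo gettype($var2), "\n";     // string
$var2 = $var1 + $var2;         // numeric string converted for the addition
echo gettype($var2), ' ', $var2, "\n"; // integer 42

// == compares two numeric strings as numbers; both equal 0 here.
var_dump('0e123' == '0e456');  // bool(true)
var_dump('0e123' === '0e456'); // bool(false)
```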
2) Variable Variables: Variables are usually introduced
with the dollar character followed by an alphanumeric, case-
sensitive name. However, in PHP the name can also be an
expression, for example retrieved from another variable or the
return value of a function call that is only known at runtime
(see Listing 4). This makes it extremely difficult to analyze
the PHP language statically.
$name = "x"; $x = "test";
echo $$name; // test
$y = ${getVar()};
Listing 4: Variable variables in PHP.
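Variable variables can also be written, and the name can be any expression, so a static analyzer has to resolve the name string before it knows which variable is affected. A small sketch with hypothetical variable names:

```php
<?php
$name = 'x';
$x = 'test';

$copy = $$name;             // reads $x
$$name = 'changed';         // writes $x
${'na' . 'me'} = 'y';       // name computed from an expression: writes $name

echo $copy, ' ', $x, ' ', $name, "\n"; // test changed y
```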
3) Dynamic Arrays: Arrays are hash-tables that map num-
bers or strings (referred to as keys) to values. The key name
can be omitted when initializing an array and generated at
runtime (see Listing 5). Furthermore, keys and values can be
dynamic, as well as the array name itself. When performing
a static analysis, it is a challenge to precisely model such a
dynamic array structure and the dynamic access to it.
$var = 6;
$arr = array('a', "4" => $var, 'foo' => 'c', 'd');
$arr[] = 'e';
// Array ([0] => a [4] => 6 [foo] => c [5] => d [6] => e)
print $arr[$var]; // e
Listing 5: Dynamically generated key names in an array.
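Key normalization adds to the difficulty: PHP casts some key types, so the key written in the source may differ from the key that is stored. A short sketch of the rules involved, using a hypothetical array `$a`:

```php
<?php
$a = [];
$a['4']  = 'a';   // decimal integer string: stored under int key 4
$a['04'] = 'b';   // leading zero: stays the string key '04'
$a[true] = 'c';   // bool true: stored under int key 1
$a[]     = 'd';   // next index is the largest int key + 1, i.e. 5

print_r(array_keys($a)); // keys: 4, '04', 1, 5
```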
4) Dynamic Constants: In PHP, it is possible to define
constant scalar values as in other programming languages like
C. However, the constant name can be dynamically defined by
the built-in function define() and dynamically accessed by
the built-in function constant(). Although a constant may
not change once it is defined, it is possible to define constants
conditionally in the program flow or to generate them dynamically
from user input.
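Both dynamic aspects can be sketched in a few lines. The constant names and the `$suffix` value are hypothetical; an analyzer has to resolve the concatenated name before it knows which constant is defined or read.

```php
<?php
// Constant name assembled at runtime.
$suffix = 'PATH';
define('APP_' . $suffix, '/var/www');
echo constant('APP_' . $suffix), "\n"; // /var/www

// Conditional definition: the value depends on the program flow.
if (!defined('DEBUG_MODE')) {
    define('DEBUG_MODE', false);
}
var_dump(DEBUG_MODE); // bool(false)
```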
5) Dynamic Functions: Several functions with the same
name can be defined conditionally by the developer. Thus, a
totally different function may be called depending on the program
flow. It is also possible to define a function B() within
another function A(); B() only becomes available once A() has
been executed. Furthermore, the built-in functions func_get_arg()
and func_get_args() allow a function to fetch the arguments of
its call dynamically, for example by index.
1 $name = 'step' .(int)$_GET['id'];
2 $name();
3 $arr = array(1); array_walk($arr, $name);
Listing 6: Dynamically built and executed function name.
Listing 6 illustrates two different ways to call a
function dynamically (Reflection). The function name is built
dynamically in line 1 and is only known at runtime. It is
called in line 2 by appending parentheses to the variable $name
and used in line 3 as a callback function. The built-in function
create_function() dynamically creates function code.
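The dynamic call styles, a conditional declaration, and dynamic argument access can be combined in one runnable sketch. The functions `step1()`, `step2()`, `greet()`, and `sum()` are hypothetical, and `$id` stands in for `$_GET['id']` from Listing 6.

```php
<?php
function step1(): string { return 'one'; }
function step2(): string { return 'two'; }

$id = '2';                          // stands in for $_GET['id']
$name = 'step' . (int)$id;          // function name only known at runtime
echo $name(), "\n";                 // two
echo call_user_func($name), "\n";   // two

// Conditionally declared functions: which body exists depends on the flow.
if ($id === '2') {
    function greet(): string { return 'A'; }
} else {
    function greet(): string { return 'B'; }
}
echo greet(), "\n";                 // A

// Arguments fetched dynamically instead of through declared parameters.
function sum(): int
{
    return array_sum(func_get_args());
}
echo sum(1, 2, 3), "\n";            // 6
```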
6) Dynamic Code: The eval operator and the built-in
function assert() directly evaluate PHP code that
is passed to them as a string. Other functions such
as preg_replace() allow the execution of dynamic PHP
code when used with certain modifiers. Dynamically generated
code is very challenging to analyze if the executed PHP code is
only known at runtime and cannot be reconstructed statically.
Furthermore, it introduces critical security vulnerabilities.
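This sketch shows why such code is opaque to a purely syntactic analysis: the evaluated statement exists only as a string. `$op` is a hypothetical configuration value; if it were user-controlled, an attacker could inject arbitrary PHP code. Since the paper was written, PHP 7.0 removed the /e modifier of preg_replace() and PHP 8.0 stopped evaluating string arguments of assert(), whereas eval is still available.

```php
<?php
// Hypothetical value, e.g. read from a configuration file.
$op = '+';

// The executed statement is only known once the string is assembled.
$code = 'return 2 ' . $op . ' 3;';
$result = eval($code);

echo $code, ' => ', $result, "\n"; // return 2 + 3; => 5
```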
7) Dynamic Includes: The code of large PHP projects is
often split into several files and directories. At runtime, the
code can be merged and executed conditionally. The PHP
operator include opens a specified file, evaluates its PHP
code, and returns to the code after the include operator. It
can be used as an expression within any other expression. Furthermore,
the file name of an inclusion can be built dynamically
which implies that it is challenging to reconstruct it statically
in complex applications. During static analysis it is crucial to
resolve all file inclusions to analyze the PHP code correctly.
Additionally, tainted data within the file name leads to a File
Inclusion vulnerability.
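A self-contained sketch of a dynamic inclusion: two small files are written to the system temporary directory, and the file that is included is selected by a runtime value. The file names, the `$lang` value, and the allowlist check are hypothetical; restricting the value to known names is one common way to keep user input out of the path.

```php
<?php
// Create two small "language" files so the example does not depend on a project layout.
$dir = sys_get_temp_dir() . '/include_demo_' . getmypid();
if (!is_dir($dir)) {
    mkdir($dir);
}
file_put_contents("$dir/en.php", "<?php return 'Hello';");
file_put_contents("$dir/de.php", "<?php return 'Hallo';");

// The file name is built at runtime, e.g. from a request parameter.
$lang = 'de';                                  // stands in for user input
$greeting = include "$dir/$lang.php";          // include used as an expression
echo $greeting, "\n";                          // Hallo

// Hypothetical allowlist: unknown values fall back to a fixed file.
$safeLang = in_array($lang, ['en', 'de'], true) ? $lang : 'en';
$checked = include "$dir/$safeLang.php";
echo $checked, "\n";                           // Hallo

// Clean up the temporary files.
unlink("$dir/en.php");
unlink("$dir/de.php");
rmdir($dir);
```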