Simulation of Built-in PHP Features for Precise Static Code Analysis

Johannes Dahse
Horst Görtz Institute for IT-Security (HGI)
Ruhr-University Bochum, Germany
johannes.dahse@rub.de

Thorsten Holz
Horst Görtz Institute for IT-Security (HGI)
Ruhr-University Bochum, Germany
thorsten.holz@rub.de

Abstract: The World Wide Web has grown rapidly over the last decades and is used by millions of people every day for online shopping, banking, networking, and other activities. Many of these websites are developed with PHP, the most popular scripting language on the Web. However, PHP code is prone to different types of critical security vulnerabilities that can lead to data leakage, server compromise, or attacks against an application's users. This problem can be addressed by analyzing the source code of the application for security vulnerabilities before the application is deployed on a web server. In this paper, we present a novel approach for the precise static analysis of PHP code to detect security vulnerabilities in web applications. Whereas previous work in this area dismissed PHP's built-in features, we comprehensively configure and simulate over 900 of them, which allows us to precisely model the highly dynamic PHP language. By performing an intra- and inter-procedural data flow analysis and by creating block and function summaries, we are able to efficiently perform a backward-directed taint analysis for 20 different types of vulnerabilities. Furthermore, string analysis enables us to validate sanitization in a context-sensitive manner. Our method is the first to perform a fine-grained analysis of the interaction between different types of sanitization, encoding, sources, sinks, markup contexts, and PHP settings. We implemented a prototype of our approach in a tool called RIPS. Our evaluation shows that RIPS is capable of finding severe vulnerabilities in popular real-world applications: we reported 73 previously unknown vulnerabilities in five well-known PHP applications such as phpBB, osCommerce, and the conference management software HotCRP.

I. INTRODUCTION

According to W3Techs, PHP is the most popular server-side programming language of all recognized websites, with a share of 81.4% [40]. Many well-known websites such as Facebook and Wikipedia, as well as the most commonly used content management systems [39], are written in PHP. Due to its weakly and dynamically typed syntax and its large number of built-in features, the language is easy to learn for beginners. However, PHP has many complex language characteristics that lead to intricacies in practice. As a result, PHP applications are prone to software vulnerabilities: in the MITRE CVE database [26], about 29% of all security vulnerabilities found in computer software are related to PHP. The wide distribution of PHP and the large number of PHP-related vulnerabilities have led to a strong interest in finding and patching security vulnerabilities in PHP source code (e.g., [2, 18, 34, 36, 41, 43]).

Detection of Taint-Style Vulnerabilities: A security vulnerability occurs when data supplied by the user is used in critical operations of the application without being sufficiently sanitized. An attacker might be able to exploit this flaw by injecting malicious input that changes the behavior or result of this operation [33]. These kinds of vulnerabilities are called taint-style vulnerabilities because untrusted sources such as user-supplied data are considered tainted and literally flow into vulnerable parts of the program (referred to as sensitive sinks) [5, 24, 29, 35].

Given that large applications can have many thousands of lines of code and that review time is limited by cost, a manual source code review might be incomplete and inefficient. Static Code Analysis (SCA) tools can help code reviewers minimize the time and cost of a review and, by encapsulating security knowledge, convey this expertise to the user to a limited degree. They automate the process of finding security vulnerabilities in source code by using taint analysis [35]. Here, the data flow between sources and sinks is modeled and analyzed for sanitization, which is a hard problem, especially for highly dynamic languages such as PHP.
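The source/sink/sanitizer terminology can be made concrete with a minimal sketch of a backward-directed taint check over straight-line PHP assignments. It is written in Python, and the `SOURCES` and `SANITIZERS` tables, the `Assign` record, and `is_tainted` are hypothetical illustrations rather than RIPS internals. The sketch ignores control flow and arrays, and it treats sanitizers as context-free, which is exactly the simplification the following sections argue against.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical, deliberately tiny configuration; a real tool models hundreds of built-ins.
SOURCES = {"$_GET", "$_POST", "$_COOKIE"}
SANITIZERS = {"intval", "htmlentities"}  # treated as context-free here (a simplification)


@dataclass
class Assign:
    """One assignment `target = func(operands...)`; func is None for copies/concatenation."""
    target: str
    func: Optional[str]
    operands: List[str] = field(default_factory=list)


def is_tainted(var: str, stmts: List[Assign], upto: int) -> bool:
    """Backward search: is `var`, as visible just before statement `upto`, tainted?"""
    if var in SOURCES:
        return True
    # Walk backwards to the most recent assignment of `var` before `upto`.
    for i in range(upto - 1, -1, -1):
        stmt = stmts[i]
        if stmt.target == var:
            if stmt.func in SANITIZERS:
                return False
            # Tainted if any operand was tainted at that point in the program.
            return any(is_tainted(op, stmts, i) for op in stmt.operands)
    return False  # never assigned in this snippet: assume untainted


# $a = $_GET['x'];  $b = 'id=' . $a;  $c = intval($a);
prog = [
    Assign("$a", None, ["$_GET"]),
    Assign("$b", None, ["$a"]),
    Assign("$c", "intval", ["$a"]),
]
print(is_tainted("$b", prog, len(prog)))  # True: $_GET flows into $b unsanitized
print(is_tainted("$c", prog, len(prog)))  # False: intval() removes the taint
```

Walking backwards from the sink argument means only the assignments that actually reach the sink are visited, which is the efficiency argument behind a backward-directed analysis.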

Current Approaches: Recent work in this area has focused on the detection of only a limited number of vulnerability types, such as Cross-Site Scripting (XSS) and SQL injection (SQLi) vulnerabilities [18, 41, 42], or on the analysis of sanitization routines [13]. Furthermore, existing approaches are typically imprecise in the sense that some language features, such as built-in sanitization or string manipulation functions and markup contexts, are not modeled accurately. As a result, such approaches miss certain types of vulnerabilities and cannot correctly assess certain kinds of sanitization. For example, Saner [2] relies on manually generated test cases, which implies that it can only detect the vulnerabilities encoded within the tool. Other approaches, such as the one presented by Xie and Aiken [43], do not model built-in functions and thus miss important attack and defense vectors. Commercial tools that support the PHP language typically target three or more programming languages. Consequently, these tools build a more generic model and miss many PHP-specific vulnerabilities and characteristics [23].

Permission to freely reproduce all or part of this paper for noncommercial

purposes is granted provided that copies bear this notice and the full citation

onthe ﬁrstpage. Reproductionfor commercialpurposes isstrictly prohibited

withoutthepriorwrittenconsentoftheInternetSociety,theﬁrst-namedauthor

(for reproduction of an entire paper only), and the author’s employer if the

paper was prepared within the scope of employment.

NDSS ’14, 23-26 February 2014, San Diego, CA, USA

http://dx.doi.org/OETT

Our Approach: In this paper, we introduce a novel approach for the precise static analysis of PHP code. Based on the insight that prior work missed vulnerabilities because it did not precisely model the specifics of the PHP language, we perform a comprehensive analysis and simulation of built-in language features, such as 952 PHP built-in functions, with respect to the arguments they are called with. This allows us to accurately analyze the data flow, to detect various sources and sinks, and to analyze sanitization more comprehensively than prior work in this area. As a result, we find more security vulnerabilities with higher accuracy. More specifically, we perform an intra- and inter-procedural data flow analysis and create summaries of the data flow within the application, which allows us to detect taint-style vulnerabilities very efficiently. We perform context-sensitive string analysis to refine our taint analysis results based on the current markup context, source type, and PHP configuration. Our approach can be generalized to other languages by modeling their (less diverse) built-in features, while the analysis algorithms remain the same.

We implemented our approach for PHP in a tool called RIPS and evaluated it by analyzing popular and complex real-world applications such as phpBB, HotCRP, and osCommerce. In total, we analyzed 1,390 files with almost half a million lines of PHP code. We found that, on average, every fourth line of code required taint analysis. Overall, we detected and reported 73 previously unknown vulnerabilities, for example three SQL injection vulnerabilities in HotCRP and several XSS vulnerabilities in osCommerce. We also analyzed several web applications that were used during the evaluation of prior work in this area and found that RIPS outperforms existing tools.

In summary, the contributions of this paper are as follows:

• We demonstrate that precisely modeling the complex characteristics of the PHP language is essential to detect weak sanitization and to find security vulnerabilities in modern PHP applications. To this end, we are the first to support the detection of 20 different types of security vulnerabilities.

• We introduce the algorithms of our tool, which specifically focuses on the characteristics of the PHP language. The tool is the first to perform a fine-grained analysis of a large number of PHP built-in features. It performs string analysis for context-sensitive vulnerability confirmation in 45 different markup contexts, taking into account the interaction of sink, source type, sanitization, encoding, and PHP configuration.

• We implemented a prototype of our approach in a tool called RIPS. We evaluate our approach on large, real-world applications and demonstrate that RIPS is capable of finding several severe, previously known and unknown vulnerabilities. Furthermore, we compare our results to previous work in this area and demonstrate that RIPS outperforms state-of-the-art tools.

II. TECHNICAL BACKGROUND

In contrast to prior work, we include edge cases of complex taint-style vulnerabilities in our analysis. We thus first provide a brief overview of such vulnerabilities and then examine some specific features and characteristics of the PHP language to illustrate the difficulties of PHP code analysis.

A. Taint-style Vulnerabilities

In the following, we examine the concept of two common taint-style vulnerabilities in which tainted data flows into a sensitive sink. More specifically, we focus on sanitization approaches and their weaknesses in different scenarios that our tool has to identify precisely.

1) SQL Injection: Web applications are often connected to a database that stores sensitive data such as passwords or credit card numbers. If a web application dynamically generates a SQL query with unsanitized user input, an attacker can potentially inject her own SQL syntax to modify the query. This type of vulnerability is well known and is called SQL injection (SQLi) [10]. Depending on the environment, the attacker can potentially extract sensitive data from the database, modify data, or compromise the web server.

To patch such a vulnerability, the user input must be sanitized before it is embedded into the query. For example, all quotes must be escaped within a quoted string so that the input cannot break out of the string. For MySQL, the PHP built-in function mysql_real_escape_string() adds a preceding backslash to every single quote, double quote, and backslash (and also escapes NUL, line-break, and Ctrl-Z characters) to neutralize their syntactical effect.
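To reason about a sanitizer, an analyzer has to know exactly which characters it neutralizes. The following Python sketch models mysql_real_escape_string() as a character mapping that follows the MySQL C library's escaping (for example, a newline becomes backslash plus n). `mysql_escape_model` is a hypothetical name, and the sketch assumes a single-byte character set; the real function consults the connection's character set, which matters for multibyte encodings.

```python
# Characters rewritten by MySQL's escaping routine, as listed in the PHP manual for
# mysql_real_escape_string(): NUL, \n, \r, backslash, ', " and Ctrl-Z (\x1a).
_MYSQL_ESCAPES = {
    "\x00": "\\0",
    "\n": "\\n",
    "\r": "\\r",
    "\\": "\\\\",
    "'": "\\'",
    '"': '\\"',
    "\x1a": "\\Z",
}


def mysql_escape_model(value: str) -> str:
    """Model of mysql_real_escape_string(), assuming a single-byte character set."""
    return "".join(_MYSQL_ESCAPES.get(ch, ch) for ch in value)


print(mysql_escape_model("O'Reilly"))     # O\'Reilly
print(mysql_escape_model("' OR '1'='1"))  # \' OR \'1\'=\'1
```

An input without any of these characters passes through unchanged, which is why the context in which the escaped value is embedded decides whether escaping is enough.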

However, if the user input is not embedded into quotes within the SQL query, no quote is needed to break out of the value (see Listing 1); an id such as 1 OR 1=1 contains no character that is escaped. In this case, escaping is not sufficient for sanitization, and the application is still vulnerable. To find such vulnerabilities, we not only have to model sanitization routines but also have to consider whether they are applied to the right markup context. A complementary way to prevent SQLi vulnerabilities is to use prepared statements [37].

    $id = mysql_real_escape_string($_GET['id']);
    mysql_query("SELECT data FROM users WHERE id = $id");

Listing 1: Insufficient sanitization of a SQL query.
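Whether escaping suffices depends on the position of the tainted value in the query string. A minimal Python sketch of this context check scans the query prefix before the injection point and reports whether a quoted literal is open there. `enclosing_quote` and `escaping_suffices` are hypothetical names, and the lexer is simplified by assumption: it ignores comments, backtick identifiers, and MySQL's NO_BACKSLASH_ESCAPES mode.

```python
def enclosing_quote(prefix: str) -> str:
    """Return the quote character open at the end of an SQL prefix, or '' if none."""
    quote = ""
    i = 0
    while i < len(prefix):
        ch = prefix[i]
        if quote:
            if ch == "\\":
                i += 1          # skip the escaped character
            elif ch == quote:
                quote = ""      # literal closed
        elif ch in ("'", '"'):
            quote = ch          # literal opened
        i += 1
    return quote


def escaping_suffices(prefix: str) -> bool:
    """Quote escaping only protects a tainted value that lands inside a quoted literal."""
    return enclosing_quote(prefix) != ""


# Listing 1: the escaped value is concatenated without surrounding quotes.
print(escaping_suffices("SELECT data FROM users WHERE id = "))   # False
print(escaping_suffices("SELECT data FROM users WHERE id = '"))  # True
```

For the unquoted context of Listing 1, a numeric cast such as (int) or a prepared statement is what closes the gap, not more escaping.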

2) Cross-Site Scripting: Cross-Site Scripting (XSS) [21] is the most common security vulnerability in web applications [34]. It occurs when user input is reflected in the HTML response of the application in an unsanitized way. It is then possible to inject HTML markup into the response page, which is rendered by the client's browser. An attacker can abuse this behavior by embedding malicious code into the response that, for example, defaces the website locally or steals cookie information.

To patch such a vulnerability, the output has to be encoded. Meta characters such as < and > as well as quotes must be replaced by their corresponding HTML entities (e.g., &lt; and &gt;). The characters are still displayed by the browser but are not rendered as HTML markup. In PHP, the built-in function htmlentities() can be used for output encoding. As with SQL queries, however, it is important to adjust sanitization to the context of the HTML markup [16].
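An analyzer can judge htmlentities() in context only if it knows which characters each flag setting encodes. This Python sketch models the function for the five ASCII characters that matter for markup; the real function also converts every other character that has a named entity. `html_encode_model` and the flag constants are hypothetical stand-ins named after PHP's flags (their numeric values are this sketch's own). The default models the ENT_COMPAT behavior described here; since PHP 8.1, the default flags include ENT_QUOTES.

```python
# Bit layout chosen for this sketch; the names follow PHP's flags.
_SINGLE = 1
_DOUBLE = 2
ENT_NOQUOTES = 0
ENT_COMPAT = _DOUBLE             # encode double quotes only
ENT_QUOTES = _SINGLE | _DOUBLE   # encode double and single quotes


def html_encode_model(text: str, flags: int = ENT_COMPAT) -> str:
    """Model of htmlentities() restricted to the markup-relevant ASCII characters."""
    out = []
    for ch in text:
        if ch == "&":
            out.append("&amp;")
        elif ch == "<":
            out.append("&lt;")
        elif ch == ">":
            out.append("&gt;")
        elif ch == '"' and flags & _DOUBLE:
            out.append("&quot;")
        elif ch == "'" and flags & _SINGLE:
            out.append("&#039;")
        else:
            out.append(ch)
    return "".join(out)


payload = "' onmouseover='alert(1)"
print(html_encode_model(payload))              # unchanged: single quotes survive
print(html_encode_model(payload, ENT_QUOTES))  # &#039; onmouseover=&#039;alert(1)
```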

Listing 2 depicts a snippet of an application that is vulnerable to XSS. Although sanitization is applied, the context of the injection still allows an attacker to break the markup and inject JavaScript code. By default, htmlentities() converts <, >, and & as well as double quotes (and all other characters that have an HTML entity equivalent), but it does not encode single quotes. Thus, an attacker can break out of the single-quoted href attribute and inject an event handler that is attached to the link tag (e.g., ' onmouseover='alert(1)). To encode single quotes as the HTML entity &#039;, the flag ENT_QUOTES must be passed as the second argument of htmlentities().

    $page = htmlentities($_GET['page']);
    echo "<a href='$page'>click</a>";

Listing 2: Insufficient sanitization with htmlentities().

In our example, however, the application would still be vulnerable even with ENT_QUOTES. Instead of breaking the markup, an attacker can abuse the fact that browsers execute javascript: URLs and inject a JavaScript protocol handler into the link (e.g., javascript:alert(1)). This injection does not need any of the meta characters that are encoded by htmlentities().
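The two attacks on Listing 2 correspond to two separate checks an analyzer has to perform for a single-quoted href attribute: can the value terminate the attribute, and does the decoded value start with a script scheme? This Python sketch is a simplified model, not RIPS's implementation. The encoders use Python's html.escape() as a stand-in for htmlentities() on ASCII input, all function names are hypothetical, and only the javascript: scheme is flagged.

```python
import html


def encode_ent_compat(value: str) -> str:
    """Like htmlentities() with ENT_COMPAT for ASCII input: &, <, > and " only."""
    return html.escape(value, quote=False).replace('"', "&quot;")


def encode_ent_quotes(value: str) -> str:
    """Also encodes ' (Python emits &#x27;, equivalent to PHP's &#039;)."""
    return html.escape(value, quote=True)


def breaks_single_quoted_attr(encoded: str) -> bool:
    """A raw ' would terminate href='...' and let the attacker add attributes."""
    return "'" in encoded


def is_script_url(encoded: str) -> bool:
    """Does the attribute value, as the browser sees it, use the javascript: scheme?"""
    url = html.unescape(encoded)                           # entities are decoded first
    url = url.strip("".join(chr(c) for c in range(0x21)))  # trim C0 controls and space
    for ch in "\t\n\r":
        url = url.replace(ch, "")                          # URL parsers drop these
    scheme, sep, _ = url.partition(":")
    return bool(sep) and scheme.lower() == "javascript"


def href_context_safe(encoded: str) -> bool:
    return not breaks_single_quoted_attr(encoded) and not is_script_url(encoded)


print(href_context_safe(encode_ent_compat("' onmouseover='alert(1)")))  # False
print(href_context_safe(encode_ent_quotes("' onmouseover='alert(1)")))  # True
print(href_context_safe(encode_ent_quotes("javascript:alert(1)")))      # False
```

A production check would more likely allowlist safe schemes (for example http and https) than denylist javascript:, because a denylist is easy to get incomplete.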

Note that there are several other scopes that need to be considered when sanitizing. For example, when user input is used within style and script tags, or within event handler attributes, additional sanitization is required. Previous work failed to take the different scopes and their intrinsic behaviors into account.

B. Intricacies of the PHP language

PHP is the fastest-growing and most popular scripting language for web applications. It is a highly dynamic language with many complicated semantics [3] that are frequently used by modern web applications [12]. In this section, we introduce the most important language features that our tool has to model precisely in order to correctly identify the flow of tainted data into sensitive sinks. In particular, the flow of tainted strings is of interest for taint-style vulnerabilities.

1) Dynamic and Weak Typing: PHP is a dynamically typed language and does not require an explicit declaration of variables. The variable type is inferred at runtime on the first assignment. Additionally, PHP is a weakly typed language, and its variables are not bound to a specific data type. Thus, data types can be mixed with other data types at runtime. In Listing 3, the string 'test' is converted to 0 to fit the arithmetic operation and added to 1. The integer result is stored in the variable $var2, whose previous data type was string.

    $var1 = 1; $var2 = 'test';
    $var2 = $var1 + $var2; // 1

Listing 3: Addition of a string and an integer in PHP.
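Weak typing matters for taint analysis because arithmetic turns a tainted string into a number, which can no longer carry SQL or HTML syntax. The Python sketch below models the conversion rule behind Listing 3: a string contributes its leading numeric prefix, or 0 if it has none. `php5_to_number` and `php5_add` are hypothetical names, and the model reflects the PHP 5-era semantics described here; newer PHP versions warn about such operands, and PHP 8 throws a TypeError for a non-numeric string such as 'test'.

```python
import re

# Leading whitespace, optional sign, digits with optional fraction, optional exponent.
_NUMERIC_PREFIX = re.compile(
    r"[ \t\n\r\x0b\x0c]*[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)([eE][+-]?[0-9]+)?"
)


def php5_to_number(value: str):
    """PHP 5-style string-to-number conversion: use the leading numeric prefix, else 0."""
    match = _NUMERIC_PREFIX.match(value)
    if match is None:
        return 0
    text = match.group(0).strip()
    if "." in text or "e" in text.lower():
        return float(text)
    return int(text)


def php5_add(left, right):
    """Model of `+` on int/float/str operands, without PHP's notices or warnings."""
    def to_number(operand):
        return php5_to_number(operand) if isinstance(operand, str) else operand
    return to_number(left) + to_number(right)


print(php5_add(1, "test"))      # 1, as in Listing 3
print(php5_add(1, "2 apples"))  # 3
```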

2) Variable Variables: Variables are usually introduced with the dollar character followed by an alphanumeric, case-sensitive name. However, in PHP the name can also be an expression, for example a value retrieved from another variable or the return value of a function call that is only known at runtime (see Listing 4). This makes it extremely difficult to analyze PHP code statically.

    $name = "x"; $x = "test";
    echo $$name; // test
    $y = ${getVar()};

Listing 4: Variable variables in PHP.
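A static analyzer can resolve a variable variable exactly only when the inner name is a known string; otherwise it has to assume that any variable in scope may be read, so that tainted values stay reachable. This Python sketch illustrates that rule. The abstract environment (a dict from variable names, without the $, to string values or an `UNKNOWN` marker) and `resolve_var_var` are hypothetical simplifications.

```python
UNKNOWN = object()  # marker for a value that is only known at runtime


def resolve_var_var(env: dict, name_var: str) -> set:
    """Possible values of $$name_var; env values are strings or UNKNOWN.

    If $name_var holds a known string, the lookup is exact; otherwise any
    variable in scope could be meant, so all values are returned (a sound
    over-approximation).
    """
    inner = env.get(name_var, UNKNOWN)
    if isinstance(inner, str):
        return {env.get(inner, UNKNOWN)}
    return set(env.values()) | {UNKNOWN}


env = {"name": "x", "x": "test"}              # $name = "x"; $x = "test";
print(resolve_var_var(env, "name"))           # {'test'}
env["dyn"] = UNKNOWN                          # $dyn = getVar();
print("test" in resolve_var_var(env, "dyn"))  # True
```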

3) Dynamic Arrays: Arrays are hash tables that map numbers or strings (referred to as keys) to values. The key can be omitted when initializing an array, in which case it is generated at runtime (see Listing 5). Furthermore, keys and values can be dynamic, as can the array name itself. When performing static analysis, it is a challenge to precisely model such a dynamic array structure and the dynamic access to it.

    $var = 6;
    $arr = array('a', "4" => $var, 'foo' => 'c', 'd');
    $arr[] = 'e';
    // Array ([0] => a [4] => 6 [foo] => c [5] => d [6] => e)
    print $arr[$var]; // e

Listing 5: Dynamically generated key names in an array.
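To decide whether $arr[$var] in Listing 5 reads a tainted element, an analyzer has to reproduce PHP's key assignment: integer-like string keys become integers, and each key-less element receives the largest integer key so far plus one. The following Python sketch models this; `PhpArrayModel` and `array_literal` are hypothetical names, only string and integer keys are handled, and negative integer keys are not modeled precisely.

```python
import re


def normalize_key(key):
    """PHP casts decimal integer strings such as "4" (but not "04" or "4.0") to int keys."""
    if isinstance(key, str) and re.fullmatch(r"-?(0|[1-9][0-9]*)", key) and key != "-0":
        return int(key)
    return key


class PhpArrayModel:
    """Ordered map with PHP's next-free-index rule (exact for non-negative int keys)."""

    def __init__(self):
        self.data = {}        # Python dicts keep insertion order, like PHP arrays
        self.next_index = 0

    def set(self, key, value):          # $arr[key] = value
        key = normalize_key(key)
        if isinstance(key, int):
            self.next_index = max(self.next_index, key + 1)
        self.data[key] = value

    def append(self, value):            # $arr[] = value
        self.set(self.next_index, value)

    def get(self, key):                 # $arr[key]
        return self.data[normalize_key(key)]


def array_literal(*items):
    """array(...): plain items get the next index; (key, value) tuples are explicit."""
    arr = PhpArrayModel()
    for item in items:
        if isinstance(item, tuple):
            arr.set(*item)
        else:
            arr.append(item)
    return arr


# Replaying Listing 5
var = 6
arr = array_literal("a", ("4", var), ("foo", "c"), "d")
arr.append("e")
print(arr.data)      # {0: 'a', 4: 6, 'foo': 'c', 5: 'd', 6: 'e'}
print(arr.get(var))  # e
```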

4) Dynamic Constants: In PHP, it is possible to define constant scalar values as in other programming languages like C. However, the constant name can be defined dynamically with the built-in function define() and accessed dynamically with the built-in function constant(). Although a constant cannot change once it is defined, it is possible to define constants conditionally in the program flow or to generate them dynamically from user input.
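Because constants can be defined on different branches, a flow-insensitive analyzer may collect every define() it sees and treat each value as possible. This Python sketch shows that idea; `ConstantModel` is a hypothetical name, values are symbolic strings (here "$_GET['mode']" stands for a tainted value), and a name of None stands for a name that is only known at runtime.

```python
class ConstantModel:
    """Flow-insensitive model of define()/constant()."""

    def __init__(self):
        self.possible = {}          # constant name -> set of possible values
        self.unknown_names = set()  # values defined under statically unknown names

    def define(self, name, value):
        """Model define(name, value); every call on any path adds a possible value."""
        if name is None:
            self.unknown_names.add(value)
        else:
            self.possible.setdefault(name, set()).add(value)

    def lookup(self, name):
        """Possible results of constant(name), or of a plain NAME when name is known."""
        if name is None:
            result = set(self.unknown_names)
            for values in self.possible.values():
                result |= values
            return result
        # A define() under an unknown name might also have created `name`.
        return self.possible.get(name, set()) | self.unknown_names


consts = ConstantModel()
consts.define("MODE", "safe")           # if (...) define('MODE', 'safe');
consts.define("MODE", "$_GET['mode']")  # else define('MODE', $_GET['mode']);
print(sorted(consts.lookup("MODE")))    # ["$_GET['mode']", 'safe']
```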

5) Dynamic Functions: Several functions with the same name can be defined conditionally by the developer. Thus, a totally different function may be called depending on the program flow. It is also possible to define a function B() within another function A(); B() then only becomes available once A() has been called. Further, the built-in functions func_get_arg() and func_get_args() allow a function to fetch the arguments of its call dynamically (func_get_arg() by index).

    1 $name = 'step' . (int)$_GET['id'];
    2 $name();
    3 $arr = array(1); array_walk($arr, $name);

Listing 6: Dynamically built and executed function name.

Listing 6 illustrates two different possibilities to call a function dynamically (a form of reflection). The function name is built dynamically in line 1 and is only known at runtime. It is called in line 2 by appending parentheses to the variable $name and is used in line 3 as a callback function. In addition, the built-in function create_function() creates a function at runtime from code passed as strings.
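For Listing 6, an analyzer cannot know the callee, but it can bound it: the (int) cast restricts the dynamic suffix to a decimal integer, so only defined functions of that shape are candidates, and each candidate has to be analyzed as a possible target of both line 2 and line 3. This Python sketch shows the idea; `resolve_dynamic_call` and the function list are hypothetical, and the match is case-insensitive because PHP function names are.

```python
import re


def resolve_dynamic_call(prefix, defined_functions):
    """Candidate callees for $name() where $name = prefix . (int)$input.

    An (int) cast always yields a decimal integer, so the unknown suffix must
    match -?(0|[1-9][0-9]*). PHP function names are case-insensitive.
    """
    suffix = r"-?(0|[1-9][0-9]*)"
    pattern = re.compile(re.escape(prefix.lower()) + suffix)
    return sorted(
        (f for f in defined_functions if pattern.fullmatch(f.lower())),
        key=str.lower,
    )


functions = ["step1", "Step2", "step01", "stepper", "step_admin", "main"]
print(resolve_dynamic_call("step", functions))  # ['step1', 'Step2']
```

Note that step01 is excluded: no integer cast produces a leading zero, so this string-level knowledge about (int) directly shrinks the set of functions to analyze.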

6) Dynamic Code: The eval language construct and the built-in function assert() directly evaluate PHP code that is passed as a string in their first argument. Other functions, such as preg_replace(), allow the execution of dynamic PHP code when used with certain modifiers (e.g., the e modifier). Dynamically generated code is very challenging to analyze if the executed PHP code is only known at runtime and cannot be reconstructed statically. Furthermore, if user input reaches such code, it introduces critical code injection vulnerabilities.
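One way to capture such sinks is a configuration that states, per built-in, which arguments are evaluated as code and under which condition. The Python sketch below is a hypothetical table in that spirit, not RIPS's configuration; version differences are simplified (for example, the preg_replace() e modifier was deprecated in PHP 5.5 and removed in PHP 7.0). An unknown pattern is treated conservatively as possibly carrying the modifier.

```python
_BRACKETS = {"(": ")", "[": "]", "{": "}", "<": ">"}


def regex_modifiers(pattern: str) -> str:
    """Modifier letters after the closing delimiter of a PCRE pattern like '/a/ie'."""
    pattern = pattern.lstrip()
    if not pattern:
        return ""
    closing = _BRACKETS.get(pattern[0], pattern[0])
    end = pattern.rfind(closing)
    return pattern[end + 1:] if end > 0 else ""


def code_sink_args(func: str, args: list) -> list:
    """0-based indices of arguments that may be evaluated as PHP code ([] if none).

    `args` holds string literals, or None for values only known at runtime.
    """
    func = func.lower()  # PHP function names are case-insensitive
    if func in ("eval", "assert"):
        return [0]
    if func == "create_function":
        return [0, 1]    # argument list and body are both compiled as code
    if func == "preg_replace" and args:
        pattern = args[0]
        if pattern is None or "e" in regex_modifiers(pattern):
            # With e, the replacement is evaluated after backreferences from
            # the subject are substituted in, so both can inject code.
            return [1, 2]
    return []


print(code_sink_args("preg_replace", ["/(.*)/e", "strtoupper('\\1')", None]))  # [1, 2]
print(code_sink_args("preg_replace", ["/(.*)/i", "x", None]))                 # []
```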

7) Dynamic Includes: The code of large PHP projects is often split into several files and directories. At runtime, the code can be merged and executed conditionally. The PHP operator include opens a specified file, evaluates its PHP code, and returns to the code after the include operator. It can be used as an expression within any other expression. Furthermore, the file name of an inclusion can be built dynamically, which makes it challenging to reconstruct statically in complex applications. During static analysis, it is crucial to resolve all file inclusions to analyze the PHP code correctly. Additionally, tainted data within the file name leads to a File Inclusion vulnerability.
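A dynamic include can be approximated by constant-folding the known parts of the file name and matching the result against the project's files, while separately flagging tainted parts. This Python sketch is one such heuristic, assumed for illustration rather than taken from RIPS; `Part` and `resolve_include` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Part:
    """One operand of a concatenated include path."""
    text: str = ""          # literal text, when statically known
    known: bool = True      # False if the value is only known at runtime
    tainted: bool = False   # True if the value derives from user input


def resolve_include(parts: List[Part], project_files: List[str]) -> Tuple[List[str], bool]:
    """Return (candidate files, File Inclusion flag) for `include <parts joined>`.

    Unknown operands become a wildcard matched against the project's files.
    A tainted operand is reported regardless of the matches, because an
    attacker may reach files outside the list (e.g. with ../ sequences).
    """
    vulnerable = any(p.tainted for p in parts)
    regex = "".join(re.escape(p.text) if p.known else ".*" for p in parts)
    candidates = sorted(f for f in project_files if re.fullmatch(regex, f))
    return candidates, vulnerable


files = ["lang/en.php", "lang/de.php", "admin/config.php", "index.php"]
# include 'lang/' . $_GET['l'] . '.php';
parts = [Part("lang/"), Part(known=False, tainted=True), Part(".php")]
print(resolve_include(parts, files))  # (['lang/de.php', 'lang/en.php'], True)
```

The candidate list lets the analysis continue into the files that are plausibly included, while the flag reports the vulnerability itself.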
