PeerSorter:
Classifying Generic P2P Traffic in Real-time
Jie He, Yuexiang Yang, Xiaolei Wang
College of Computer
National University of Defense Technology
Changsha, China
{hejie, yyx, wangxiaolei}@nudt.edu.cn
Yingzhi Zeng, Chuan Tang
Information Center
National University of Defense Technology
Changsha, China
{zengyingzhi, tangchuan}@nudt.edu.cn
Abstract—The rapid development of Peer-to-Peer (P2P)
technology brings challenges to quality of service (QoS), network
planning and access control. An accurate classification of P2P
traffic is vital for addressing those challenges. Traditional port-
based and payload-based methods fail to cope with emerging port
disguise and payload encryption techniques. In this paper, we
present PeerSorter, a system for the classification of generic P2P
traffic in real-time. PeerSorter has four main characteristics.
First, it can accurately classify nearly all kinds of legitimate
P2P applications as well as various P2P botnets, by building
application profiles of their significant network activity patterns.
Second, PeerSorter is capable of real-time processing, because
of its simple mechanism and small classification time
windows. Third, PeerSorter can be readily extended by
adding profiles of new P2P applications. Finally, PeerSorter
works well even when the classification target is
running alongside other bandwidth consumers (including other P2P
applications) at the same time. We evaluate the performance of
PeerSorter on traffic datasets of a large variety of P2P
applications, including two popular P2P botnets. The
experimental results demonstrate that we can classify all the
considered types of P2P traffic with an average true positive rate
of 97.83% and an average false positive rate below 0.04% within
2 minutes.
Keywords—traffic classification; peer to peer; real-time; botnet
I. INTRODUCTION
P2P technology has been widely applied in file sharing,
media streaming, instant messaging and other fields since its
emergence in the 1990s. As P2P networks become increasingly
complex and the volume of their traffic grows rapidly, network
administrators are constantly demanding efficient and
dependable engines to resolve emerging issues with bandwidth,
security and management [1]. For better management and
control of P2P traffic, it is vital to accurately identify and
classify the P2P applications generating the traffic in real-time.
In addition, reliable classification of P2P traffic improves
the accuracy of network-based intrusion detection systems (IDSs)
[2]. Furthermore, P2P traffic classification makes it feasible to
detect P2P malware.
A number of methods have been proposed to classify P2P
traffic, including port-based, payload-based and in-the-dark
methods [3]. The traditional port-based method [4] inspects the
port number in packet headers and identifies P2P applications
according to the standard port numbers assigned by IANA.
However, this approach has become unreliable with the
emergence of port disguise techniques [5]. The payload-based
method, which is also known as deep packet inspection (DPI)
method, examines the payload of packets and extracts
distinctive signatures from each P2P application [6]–[9]. This
kind of method is also becoming less effective as more and
more P2P applications adopt payload
encryption [3]. The in-the-dark method [10]–[15] identifies P2P
traffic based on statistical features extracted from
transport layer data or host network behaviors. This kind of
method seems very promising for detecting unknown and
encrypted P2P traffic accurately without inspecting the port
number or payload content.
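The contrast between the three families of methods can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of any cited system or of PeerSorter: the port table, the `Flow` record and the two features computed by `flow_features` are assumptions made for this sketch. The only protocol fact relied on is that a plaintext BitTorrent handshake begins with the byte 19 followed by the string "BitTorrent protocol".

```python
from typing import Iterable, NamedTuple, Optional

# Illustrative port table (assumption): default ports commonly
# associated with these clients, not an authoritative registry.
DEFAULT_PORTS = {6881: "BitTorrent", 4662: "eMule"}

# A plaintext BitTorrent handshake starts with a length byte (19)
# followed by the protocol string.
BITTORRENT_HANDSHAKE = b"\x13BitTorrent protocol"


def classify_by_port(dst_port: int) -> Optional[str]:
    """Port-based method: look the port up in a static table.
    Returns None as soon as a client picks a non-default port."""
    return DEFAULT_PORTS.get(dst_port)


def classify_by_payload(payload: bytes) -> Optional[str]:
    """Payload-based (DPI) method: match a known byte signature.
    Returns None when the payload is encrypted or obfuscated."""
    if payload.startswith(BITTORRENT_HANDSHAKE):
        return "BitTorrent"
    return None


class Flow(NamedTuple):
    """Hypothetical flow record holding no port or payload data."""
    remote_ip: str
    packets: int
    byte_count: int


def flow_features(flows: Iterable[Flow]) -> dict:
    """In-the-dark style: derive simple statistics from flow records.
    The two features here are examples chosen for this sketch only."""
    flows = list(flows)
    total_packets = sum(f.packets for f in flows)
    total_bytes = sum(f.byte_count for f in flows)
    return {
        "distinct_peers": len({f.remote_ip for f in flows}),
        "mean_bytes_per_packet": (
            total_bytes / total_packets if total_packets else 0.0
        ),
    }
```

A client that listens on a random port and encrypts its handshake returns `None` from both `classify_by_port` and `classify_by_payload`, whereas `flow_features` still yields usable values because it never reads the port or the payload.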
Unfortunately, most of these studies focus on
distinguishing P2P traffic from non-P2P traffic. Only a small
fraction of them address the problem of fine-grained P2P traffic
classification, and even those limit their classification to a few
types of P2P applications. To the best of our knowledge, there
are very few systems dedicated to generic classification of a
large variety of P2P traffic and there are even fewer systems
that can classify various legitimate P2P applications as well as
P2P botnets simultaneously. Moreover, many works to date
have focused on improving the accuracy of traffic classifiers by
employing complicated features and sophisticated algorithms,
without considering their efficiency. As a result, most of these
classifiers are incapable of real-time processing due to their
complexity and long detection time windows.
In this paper we propose PeerSorter, a real-time and generic
P2P traffic classification system, which relies exclusively on several
basic properties of flow records with no access to port numbers
or payload content of individual packets. In contrast to
contemporary works, PeerSorter has the following four
characteristics. First, based on the most fundamental properties of P2P
protocols, PeerSorter is able to accurately classify the traffic
generated by a variety of P2P applications, including common
file-sharing applications such as BitTorrent, eMule and Thunder,
P2P-TV platforms such as PPTV and CNTV, P2P VoIP
applications such as Skype, and even P2P botnets such as
Storm [20] and Waledac [21], even if their traffic is encrypted.
Theoretically, nearly all kinds of benign P2P applications and
P2P malware can be classified by PeerSorter.
Second, different from previous in-the-dark approaches,
This work was supported by the National Natural Science Foundation of
China under Grants No. 61170286 and No. 61202486.
2014 IEEE 17th International Conference on Computational Science and Engineering
978-1-4799-7981-3/14 $31.00 © 2014 IEEE
DOI 10.1109/CSE.2014.134
605