Citation: Dai, X.; Cheng, G.; Yu, Z.; Zhu, R.; Yuan, Y. MSLCFinder: An Algorithm in Limited Resources Environment for Finding Top-k Elephant Flows. Appl. Sci. 2023, 13, 575. https://doi.org/10.3390/app13010575
Academic Editor: Rubén Usamentiaga
Received: 30 November 2022
Revised: 24 December 2022
Accepted: 27 December 2022
Published: 31 December 2022
Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Applied Sciences
Article
MSLCFinder: An Algorithm in Limited Resources Environment
for Finding Top-k Elephant Flows
Xianlong Dai 1,2,3, Guang Cheng 1,2,3,*, Ziyang Yu 1,2,3, Ruixing Zhu 1,2,3 and Yali Yuan 1,2,3
1 School of Cyber Science and Engineering, Southeast University, Nanjing 211189, China
2 Jiangsu Province Engineering Research Center of Security for Ubiquitous Network, Nanjing 211189, China
3 Purple Mountain Laboratories, Nanjing 211189, China
* Correspondence: chengguang@seu.edu.cn
Featured Application: The results of this paper can be applied in fields related to network measurement, especially network traffic sampling, network traffic measurement, and finding top-k elephant flows.
Abstract: Encrypted traffic accounts for 95% of the total traffic in backbone networks with Tbps bandwidth. As network traffic becomes increasingly encrypted and link rates rise in modern networks, measuring encrypted traffic relies more and more on collecting and analyzing massive volumes of traffic data, which cannot be separated from the support of high-speed network traffic measurement technology. Finding top-k elephant flows is a critical task with many applications in congestion control, anomaly detection, and traffic engineering. As a result, designing accurate and fast algorithms for the online identification of elephant flows is becoming increasingly challenging. Existing methods either use large counters, i.e., 20 bits, to prevent overflow when recording flow sizes, or incur significant space overhead to measure the sizes of all flows. In this paper, we therefore adopt a novel strategy, called count-with-uth-level-sampling, to find top-k elephant flows in resource-limited environments. The proposed algorithm, called MSLCFinder, combines lightweight counters with uth-level multi-sampling and requires only small, constant processing overhead for millions of flows. Experimental results show that MSLCFinder achieves more than 97% precision with extremely limited hardware resources. Compared to the state of the art, our method performs the statistics and filtering of millions of data streams with less memory.
Keywords: network measurement; top-k finding; elephant flow; MSLCFinder; data flow; traffic sampling; network security; massive traffic
1. Introduction
Since the birth of the ARPANET in the late 1960s, and through more than five decades of development in network technologies, the Internet has achieved great success. According to the report [1], the number of Internet users reached 5.07 billion in October 2022, an increase of 3.5 percent over the preceding 12 months; these 171 million new users have taken global Internet penetration to 63.5%. Global mobile users have reached 5.48 billion, so that 68.6% of all the people on Earth now use some form of mobile phone, and smartphones account for almost four in five of the mobile handsets in use today. The scale of global Internet users is unprecedented, existing network traffic is characterized by encryption, and network links have entered the high-speed era. Line rates in modern high-speed networks have reached hundreds of Gbps or even multiple Tbps. Encrypted traffic accounts for 95% of the total traffic in the backbone network environment with Tbps bandwidth [2]. Built on this preeminent network infrastructure, many different types of networks have mushroomed in our lives, such as IoT, the Internet of Vehicles, 5G, cloud computing, blockchain, and satellite networks. However, the network forms are diversified