Understanding Promotion-as-a-Service on GitHub
Kun Du
Tsinghua University
dk15@tsinghua.edu.cn
Hao Yang
Tsinghua University
h-yang@tsinghua.edu.cn
Yubao Zhang
University of Delaware
ybzhang@udel.edu
Haixin Duan*
Tsinghua University
QI-ANXIN Group
duanhx@tsinghua.edu.cn
Haining Wang
Virginia Tech
hnw@vt.edu
Shuang Hao
University of Texas at Dallas
shao@utdallas.edu
Zhou Li
University of California, Irvine
zhou.li@uci.edu
Min Yang
Fudan University
m_yang@fudan.edu.cn
ABSTRACT
As the world’s leading software development platform, GitHub has become a social networking site for programmers and recruiters who leverage its social features, such as star and fork, for career and business development. However, in this paper, we found a group of GitHub accounts, called “promoters”, that provide promotion services on GitHub by performing paid star and fork operations on specified repositories. We also uncovered a stealthy way of tampering with historical commits, through which these promoters are able to fake commits retroactively. By exploiting such a promotion service, any GitHub user can pretend to be a skillful developer with high influence.
To understand promotion services on GitHub, we first investigated the underground promotion market of GitHub and identified 1,023 suspected promotion accounts from the market. Then, we developed an SVM (Support Vector Machine) classifier to detect promotion accounts among all active users extracted from GH Archive from 2015 to 2019. In total, we detected 63,872 suspected promotion accounts. We further analyzed these suspected promotion accounts, showing that (1) a hidden functionality in GitHub is abused to boost the reputation of an account by forging historical commits and (2) a group of small businesses exploit GitHub promotion services to promote their products. We estimated that suspicious promoters could have made a profit of $3.41 million and $4.37 million in 2018 and 2019, respectively.
CCS CONCEPTS
• Security and privacy → Network security.
KEYWORDS
GitHub, Promoter Detection, Promotion-as-a-Service
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
ACSAC 2020, December 7–11, 2020, Austin, USA
© 2020 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-8858-0/20/12...$15.00
https://doi.org/10.1145/3427228.3427258
ACM Reference Format:
Kun Du, Hao Yang, Yubao Zhang, Haixin Duan*, Haining Wang, Shuang
Hao, Zhou Li, and Min Yang. 2020. Understanding Promotion-as-a-Service
on GitHub. In Annual Computer Security Applications Conference (ACSAC
2020), December 7–11, 2020, Austin, USA. ACM, New York, NY, USA, 14 pages.
https://doi.org/10.1145/3427228.3427258
1 INTRODUCTION
GitHub was founded in 2008 and has become the most important code management and sharing website. According to the 2019 GitHub Report [18], there are more than 40 million developers, more than 44 million repositories created, and 2.9 million organizations on GitHub. In addition to serving as a code repository, GitHub integrates several features for online socialization, resembling those of Facebook and Twitter. In fact, developers can watch, star, and fork the repositories of others, which introduces social interaction. By watching a repository, developers receive notifications when new pull requests and issues are created. Starring a repository means a developer is interested in the project and intends to follow it over time. Forking a repository enables other developers to build their own repositories based on the current one, so the code in that repository can be reused effectively. These functions encourage the contribution of high-quality code and lower the barrier to developing new open-source projects.
GitHub’s Impact on Job Recruiting.
Due to its prominent role in the software community, the number of stars, watches, and forks attached to a GitHub user or repository has been considered a strong indicator of coding skill, and it is used as a metric when screening job applicants. For example, Devskiller, a developer screening and online interview platform, states that “Stars and forks are a sign of good, usable code” and “good code is forked and starred a lot, so pay attention to these elements” [9]. On Zhaopin [50], the most popular online recruitment service provider in China, many job advertisements related to software development require applicants to have more than a certain number of stars on their own repositories. The most common requirement is to own a repository with at least 100 GitHub stars.
GitHub Abuse.
Given such requirements, some developers attempt to manipulate the social statistics of their own GitHub
accounts by purchasing stars and forks, which in turn fuels the underground “Promotion-as-a-Service” business for GitHub that sells stars and forks for profit. In fact, since 2018, there have been scattered reports about such fraudulent activities [32, 52]. In 2019, even SK Telecom, the biggest mobile service provider in Korea, was reported to have abused GitHub stars by giving free drinks to accounts that starred a specified repository [21]. Although Promotion-as-a-Service on GitHub is another type of fraud and is not allowed in most instances, so far there has been no systematic study of this issue, let alone a deep understanding of the problem’s scale and the fraudsters’ strategies.
Our Studies.
In this paper, we performed the first large-scale measurement and analysis of Promotion-as-a-Service on GitHub. First, we crawled GitHub logs from 2015 to 2019 from GH Archive [14], a project that records public user events on GitHub. The log files consist of more than 20 event types, such as commits, forks, watches, tickets, comments, and member changes. These events are aggregated into hourly buckets. The total size of the log files is 4.79TB.
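GH Archive distributes these hourly buckets as gzipped JSON-lines files. The sketch below is a minimal illustration, not the authors’ crawler. It assumes the public URL pattern `https://data.gharchive.org/YYYY-MM-DD-H.json.gz` (hour not zero-padded) and the event fields `type`, `actor.login`, and `repo.name`; the helper names `hourly_urls` and `count_star_fork_events` are hypothetical. Note that GitHub reports starring a repository as a `WatchEvent`.

```python
import gzip
import io
import json
from collections import Counter
from datetime import datetime, timedelta

# Assumed GH Archive layout: one gzipped JSON-lines file per hour.
BASE_URL = "https://data.gharchive.org"


def hourly_urls(start: datetime, end: datetime):
    """Yield one archive URL for every hour in [start, end)."""
    t = start.replace(minute=0, second=0, microsecond=0)
    while t < end:
        # The hour suffix is not zero-padded, e.g. 2015-01-01-0.json.gz.
        yield f"{BASE_URL}/{t:%Y-%m-%d}-{t.hour}.json.gz"
        t += timedelta(hours=1)


def count_star_fork_events(gz_bytes: bytes) -> Counter:
    """Count star and fork events per (actor login, event type) in one hourly file."""
    counts = Counter()
    with gzip.open(io.BytesIO(gz_bytes), "rt", encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue  # tolerate blank lines
            event = json.loads(line)
            # Starring a repository is recorded as a WatchEvent.
            if event["type"] in ("WatchEvent", "ForkEvent"):
                counts[(event["actor"]["login"], event["type"])] += 1
    return counts
```

In practice each URL would be downloaded (for example with `urllib.request.urlopen`) and its bytes passed to `count_star_fork_events`; keeping parsing separate from downloading makes the parser easy to test offline.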
Our first task is to identify activities related to this type of fraud. Although the problem is similar to crowdturfing attacks, in which human workers are paid to commit fraudulent online activities for a buyer, existing detection systems like those used on social networks [36, 42, 49, 51] cannot be directly applied because user activities on GitHub are far more complicated than just “post,” “like,” “follow,” and “comment”. GitHub promoters have more options to conceal their promotion tracks, such as forking, watching, or opening issues in a popular repository, or even faking updates to their own repositories. As such, we decided to build a new detection system tailored to this problem. To obtain ground-truth datasets, we created a repository on GitHub with only a few script files and then ordered 1,023 stars and forks by taking advantage of GitHub promotion services. Tracing back from these paid stars and forks, we identified a list of promotion accounts. The activity histories of these promotion accounts were also extracted from the log files that we crawled from GH Archive.
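The trace-back step can be sketched as two filters over parsed GH Archive events: first collect the accounts that starred or forked the bait repository, then gather every event those accounts produced. This is an illustrative sketch under stated assumptions: the bait repository name and the helpers `promotion_accounts` and `activity_histories` are hypothetical, and events are assumed to carry `type`, `actor.login`, `repo.name`, and an ISO 8601 `created_at` field.

```python
from collections import defaultdict

# Stars appear as WatchEvent in GitHub event data.
STAR_OR_FORK = ("WatchEvent", "ForkEvent")


def promotion_accounts(events, bait_repo):
    """Logins that starred or forked the (hypothetical) bait repository."""
    return {
        e["actor"]["login"]
        for e in events
        if e["repo"]["name"] == bait_repo and e["type"] in STAR_OR_FORK
    }


def activity_histories(events, accounts):
    """All events of the given accounts, each list sorted by created_at."""
    history = defaultdict(list)
    for e in events:
        login = e["actor"]["login"]
        if login in accounts:
            history[login].append((e["created_at"], e["type"], e["repo"]["name"]))
    # ISO 8601 UTC timestamps in the same format sort correctly as strings.
    return {login: sorted(items) for login, items in history.items()}
```

Filtering first and then re-scanning keeps memory bounded by the number of promotion accounts rather than by the full event log.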
After a pilot analysis, we trained an SVM (Support Vector Machine) classifier using the data related to these promotion accounts and reputable GitHub accounts that we carefully sampled from normal GitHub accounts. We applied the SVM classifier to all of the accounts extracted from the log files and detected 63,872 suspected promotion accounts. We checked their homepages on GitHub and found that a large fraction of suspected promotion accounts had not yet been banned by GitHub at the time of our study.
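This section does not specify the classifier’s features or kernel, so the following is only a dependency-free sketch of a linear SVM trained with the Pegasos stochastic sub-gradient method, not the authors’ model. The two toy features (share of an account’s events that are stars or forks, and share that are pushes to its own repositories) and the names `train_linear_svm` and `predict` are hypothetical; a real system would more likely use an SVM library.

```python
import random


def _augment(x):
    # Append a constant feature so the bias is learned as an ordinary weight.
    return list(x) + [1.0]


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style stochastic sub-gradient training of a linear soft-margin SVM.

    X: list of feature vectors; y: labels in {-1, +1} (+1 = promotion account).
    Returns the weight vector, whose last entry acts as the bias.
    """
    rng = random.Random(seed)
    data = [(_augment(x), label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            # Shrink w (gradient step on the L2 regularizer) ...
            w = [(1.0 - eta * lam) * wi for wi in w]
            # ... and add the hinge-loss sub-gradient when the margin is violated.
            if margin < 1.0:
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 (suspected promoter) or -1 (normal account)."""
    score = sum(wi * xi for wi, xi in zip(w, _augment(x)))
    return 1 if score >= 0 else -1
```

A linear model keeps the example short and its weights interpretable; a kernel SVM could capture non-linear feature interactions at a higher computational cost.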
Next, we conducted a comprehensive analysis of these suspected promotion accounts to understand how they operate and make a profit. We analyzed the organization distribution of these suspected accounts, clustering them into groups to understand their topological structure and relations. Then we checked whether the suspected promotion accounts had been banned by GitHub itself, and found that most of them had not been detected yet. We further analyzed the characteristics of the fake stars and forks, the profiles, and the registration times of these suspected accounts, revealing more intrinsic characteristics of these suspected promotion accounts.
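The clustering method is not described in this section. One plausible approach, shown here purely as an assumption, links two accounts whenever they starred or forked at least `min_shared` of the same repositories and takes connected components with a union-find structure. The function `cluster_accounts` and the threshold are hypothetical.

```python
from itertools import combinations


def cluster_accounts(starred, min_shared=2):
    """Group accounts that co-starred at least `min_shared` repositories.

    starred: dict mapping login -> set of repositories it starred or forked.
    Returns connected components, largest first.
    """
    parent = {a: a for a in starred}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    # Pairwise comparison is O(n^2); fine for a sketch, not for 63,872 accounts.
    for a, b in combinations(starred, 2):
        if len(starred[a] & starred[b]) >= min_shared:
            parent[find(a)] = find(b)

    groups = {}
    for a in starred:
        groups.setdefault(find(a), set()).add(a)
    return sorted(groups.values(), key=lambda g: (-len(g), sorted(g)))
```

At scale, an inverted index from repository to accounts would avoid comparing every pair.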
Moreover, we identified different features between normal and suspicious promoted repositories. Regarding the business and operational models of this GitHub fraud, there are two main observations. First, we witnessed that a hidden functionality in GitHub can be abused to boost the reputation of a promotion account by forging the time and frequency of historical commits. Second, we observed that some software companies published parts of their products’ source code or instructions on GitHub and paid promotion services to boost these repositories, targeting the GitHub trending list, in order to attract potential customers to purchase their products.
Contributions. We summarize our main contributions below:
(1) We performed the first comprehensive study on GitHub promotion and uncovered the strategies used by suspicious promoters. We found that the stars and forks of a repository are not trustworthy indicators of a developer’s coding skill, due to the use of fraudulent promotion services. Based on the GitHub log data in GH Archive, we estimate that suspicious promoters made a profit of about $3.41 million and $4.37 million in 2018 and 2019, respectively.
(2) We conducted a large-scale measurement of more than 40 million GitHub accounts by examining the data from 2015 to 2019. We developed an SVM classifier and trained it on publicly available historical user activity data. We evaluated our classifier using the F1-measure, achieving an accuracy of 99.1% on the ground-truth dataset. We identified 63,872 suspected promotion accounts among GitHub accounts.
(3) We shed new light on how this promotion service is operated. We also disclosed a hidden functionality of GitHub that allows a user to pretend to be a skillful developer retroactively. We reported this type of abuse to GitHub, and they indicated that they would pass our request on to the right team for remediation.
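The F1-measure mentioned in contribution (2) is the harmonic mean of precision and recall over the positive (promotion) class, which is a different quantity from plain accuracy. A minimal sketch of the computation, with the hypothetical helper name `precision_recall_f1`:

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Harmonic mean; defined as 0 when both precision and recall are 0.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For example, with labels `[1, 1, 1, 0, 0]` and predictions `[1, 1, 0, 1, 0]` there are two true positives, one false positive, and one false negative, so precision, recall, and F1 are all 2/3.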
The rest of this paper is organized as follows. Section 2 briefly describes the background of GitHub. Section 3 elaborates on how we processed data and built an SVM classifier to detect promotion accounts. Section 4 presents the large-scale measurement study to uncover the characteristics of suspected promotion accounts. Section 5 demonstrates how promoters help their clients forge a hard-working account by tampering with historical commits and how small businesses exploit GitHub to promote their products. Section 6 discusses related issues and possible countermeasures. Section 7 surveys related work, and finally, Section 8 concludes our work.
2 BACKGROUND
GitHub provides a web-based service for code hosting and version control built on Git. Therefore, it offers all the functions of Git (i.e., distributed version control and source code management) as well as its own features, including access control and collaboration features [6]. GitHub offers different plans for enterprise and individual users. In general, an individual developer prefers to use a free account to host open-source or private repositories. GitHub was reported to have more than 40 million developers as of December 2019 [18].
Since GitHub has social-networking functions, e.g., starring and forking, it facilitates social interactions among developers. The number of stars and forks of a repository indicates its owner’s skills to some extent. During job screenings, a candidate could
be evaluated in part based on the number of stars and forks of her repositories. Therefore, promotion services have emerged and been exploited by developers to promote their repositories with paid stars and forks, especially for job screenings. In this section, we explain how stars and forks work on GitHub and how GitHub promotion services operate and make a profit. At the end, we also discuss the problem scope of this work.

Figure 1: Implications of watch, star, and fork
2.1 How Stars and Forks Work on GitHub
Generally speaking, starring a repository is considered a technical endorsement of the repository. Therefore, the code quality of a repository is expected to correlate positively with its number of stars: developers can obtain a considerable number of stars by writing code of superior quality [13]. Starring a repository also helps a GitHub user keep track of changes. Figure 1 shows the user interface of these features.
Forking a repository is similar to creating a copy, which allows developers to modify code without directly changing the original repository. After forking a repository, developers can either propose changes to the original repository for later review or create a new project based on it. Therefore, the number of forks can indicate the popularity and reusability of a repository.
The focus of this work is on the stars and forks of a repository, because these two operations are abused by dishonest developers to increase their career prospects in software development [32, 52]. Other GitHub social-networking functions such as “watch” and “follow” are not considered because, to the best of our knowledge, they have not been used as factors in job screenings so far.
2.2 How GitHub Promotion Operates
We observed GitHub promotion services through search engines, public websites, web blogs, online shops, and instant messaging (IM) tools including Telegram, QQ, and WeChat (the last two are mainly used in China). The underground market on the Darknet was also included for this purpose (we examined Dream Market, the most famous Darknet market). Here we focus on the entities behind promotions and make the following observations.
First, a small number of merchants sell GitHub accounts on the Darknet. One of them claimed that the accounts for sale could be successfully logged into and even offered a lifetime warranty. The price is about $2.07 per account, which implies that it is not hard for promotion service providers to set up enough promotion accounts and operate on GitHub for a long time.
Second, in addition to individual sellers, we also discovered a few websites dedicated to GitHub promotion services. One of them is called GitStar [23], which serves as a platform for users to exchange their stars and forks. We inspected the website and found some interesting characteristics, described below:
(1) The website is served not on the standard HTTP port 80 but on port 88. We speculate that the website owner attempted to avoid public attention. Moreover, we queried the passive DNS records of the domain name gitstar.top provided by Farsight [12] and found that it had pointed to various IP addresses over time. This suggests that the site migrates more often than normal ones.
(2) The website employs web cloaking. We received no response when we visited the website from IP addresses outside China, which suggests that its main customers are located in China.
(3) During the registration process, the website requires the same username as the one used on GitHub and verifies ownership by asking the registrant to star a famous repository [19] on GitHub. After registration, developers can post repositories that solicit stars and forks. GitStar acts as a bulletin board that lists all of these repositories. All members of GitStar can star and fork the listed repositories, regardless of the repository’s content or quality. The publisher is supposed to return the favor to other repositories when receiving stars or forks. If not, the website records it as an “owe.” All owe information is public, and members of the website can decide whether a repository is worth starring or forking by checking the publisher’s owe information.
(4) We also observed that GitStar leverages the GitHub API to query an account’s star and fork information. During the registration process, GitStar can check the ownership of an account by examining whether the account indeed starred the required repository. During operation, if user A has finished a star operation on user B’s repository, GitStar is able to validate whether user B performs the same on A’s repository through the GitHub API. This is another abuse of the GitHub API in this type of blackhat promotion.
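The checks in items (3) and (4) can be modeled as set lookups plus a reciprocity ledger. GitStar’s real implementation is unknown, so this is a hypothetical sketch: it assumes the star data comes from the GitHub REST endpoint `GET /users/{username}/starred` (which returns repository objects with a `full_name` field and is paginated), and the “owe” rule below (one owed star per unreturned star from a partner) is an assumption for illustration.

```python
from collections import Counter


def starred_set(api_response):
    """Repository full names from a parsed GET /users/{username}/starred page."""
    return {repo["full_name"] for repo in api_response}


def verify_registration(starred, required_repo):
    """Ownership check: the registrant must have starred the required repository."""
    return required_repo in starred


def owe_ledger(star_ops, owner_of):
    """Hypothetical reciprocity ledger.

    star_ops: iterable of (giver_login, repo_full_name) star operations.
    owner_of: maps repo_full_name -> owner login.
    A user owes a partner one star for every star received from that partner
    beyond the stars the user gave back to the partner's repositories.
    """
    given = Counter()  # (giver, receiver) -> number of stars
    for giver, repo in star_ops:
        receiver = owner_of[repo]
        if receiver != giver:
            given[(giver, receiver)] += 1
    owes = Counter()
    for (giver, receiver), n in given.items():
        returned = given.get((receiver, giver), 0)
        if n > returned:
            owes[receiver] += n - returned
    return dict(owes)
```

Because the ledger is computed only from public star data, any member can recompute it, which matches the observation that all owe information is open.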
Third, there are also a number of IM groups through which GitHub users can exchange stars and forks for free or for profit. We identified more than 20 groups by searching “GitHub star each other” (translated from Chinese) in QQ and WeChat (IM tools similar to WhatsApp). The largest group has more than 1,020 members and charges a “membership” fee. To investigate the market, we paid $1.49 to join the group. The owner of this group can earn more than $1,520 just by collecting “membership” fees. We also joined three other IM groups for more information and comparison. After monitoring these groups for more than one year, we found that, on average, about 20 repositories asked for promotion every day. About 20 to 30 members in the chat group actively confirmed that they had given stars or forks to the promoted repositories. We also contacted the users in these groups who operated promotion services and found that it costs $0.40 to purchase one star and $0.50 to purchase one fork.
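To put these prices in perspective, the snippet below computes what the common recruiting requirement of a 100-star repository would cost at the quoted rate, and the fee income of a 1,020-member group, assuming every member pays the $1.49 fee once. It is illustrative arithmetic only; the function names are hypothetical, and amounts are kept in integer cents to avoid floating-point rounding.

```python
# Prices reported in the IM groups, in integer cents.
STAR_PRICE_CENTS = 40        # $0.40 per star
FORK_PRICE_CENTS = 50        # $0.50 per fork
MEMBERSHIP_FEE_CENTS = 149   # $1.49 one-time group membership fee


def order_cost_cents(stars, forks=0):
    """Cost of buying the given numbers of stars and forks."""
    return stars * STAR_PRICE_CENTS + forks * FORK_PRICE_CENTS


def membership_revenue_cents(members):
    """Fees a group owner collects from the given number of paying members."""
    return members * MEMBERSHIP_FEE_CENTS


def dollars(cents):
    """Format an integer number of cents as a dollar string."""
    return f"${cents // 100:,}.{cents % 100:02d}"
```

Here `dollars(order_cost_cents(100))` gives `$40.00`, and `dollars(membership_revenue_cents(1020))` gives `$1,519.80`, which exceeds the roughly $1,520 reported above once the group has more than 1,020 members.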
Fourth, there are a few online shops selling GitHub stars and forks (e.g., the websites illustrated in [15] and [35]). We found that online shops in different locations preferred different charging modes. In China, operators preferred online third-party payment like Alipay or WeChat Pay, although this type could be tracked and