# AVClass
AVClass is a malware labeling tool.
You give it as input the AV labels for a large number of
malware samples (e.g., VirusTotal JSON reports) and it outputs, for each sample,
the most likely family name that it can extract from the AV labels.
It can also output a ranking of all alternative names it found for each sample.
The design and evaluation of AVClass is detailed in our
[RAID 2016 paper](https://software.imdea.org/~juanca/papers/avclass_raid16.pdf):
> Marcos Sebastián, Richard Rivera, Platon Kotzias, and Juan Caballero.
AVClass: A Tool for Massive Malware Labeling.
In Proceedings of the International Symposium on Research in
Attacks, Intrusions and Defenses,
September 2016.
In a nutshell, AVClass comprises two phases:
preparation (optional) and labeling.
Code for both is included,
but most users will only be interested in the labeling, which outputs the
family name for the samples.
The preparation produces a list of aliases and generic tokens
used by the labeling.
If you use our default aliases and generic tokens lists,
you do not need to run the preparation.
## Labeling
The labeler takes as input
a JSON file with the AV labels of malware samples (-vt or -lb options),
a file with generic tokens (-gen option),
and a file with aliases (-alias option).
It outputs the most likely family name for each sample.
If you do not provide alias or generic tokens files,
the default ones in the *data* folder are used.
```shell
$./avclass_labeler.py -lb ../examples/malheurReference_lb.json -v > malheurReference.labels
```
The above command labels the samples whose AV labels are in the
*../examples/malheurReference_lb.json* file.
It prints the results to stdout,
which we redirect to the *malheurReference.labels* file.
The output looks like this:
```
aca2d12934935b070df8f50e06a20539 adrotator
67d15459e1f85898851148511c86d88d adultbrowser
```
which means sample aca2d12934935b070df8f50e06a20539 is most likely
from the *adrotator* family and
67d15459e1f85898851148511c86d88d from the *adultbrowser* family.
The verbose (-v) option makes it output an extra
*malheurReference_lb.verbose* file
with all families extracted for each sample, ranked by the number of AV
engines that use that family.
The file looks like this:
```
aca2d12934935b070df8f50e06a20539 [(u'adrotator', 8), (u'zlob', 2)]
ee90a64fcfaa54a314a7b5bfe9b57357 [(u'swizzor', 19)]
f465a2c1b852373c72a1ccd161fbe94c SINGLETON:f465a2c1b852373c72a1ccd161fbe94c
```
which means that for sample aca2d12934935b070df8f50e06a20539
there are 8 AV engines assigning *adrotator* as the family and
another 2 assigning *zlob*.
Thus, *adrotator* is the most likely family.
On the other hand, for ee90a64fcfaa54a314a7b5bfe9b57357 there are 19 AV
engines assigning *swizzor* as family,
and no other family was found.
The last line means that for sample f465a2c1b852373c72a1ccd161fbe94c
no family name was found in the AV labels.
Thus, the sample is placed by itself in a singleton cluster,
with the name of the cluster being the sample's hash.
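If you want to post-process the verbose file in Python, a small parser along these lines may help. It is a sketch that assumes each line holds a hash, whitespace, and then either a Python-style list of (family, count) tuples or a `SINGLETON:<hash>` marker, as in the example above; the function name `parse_verbose_line` is hypothetical and not part of AVClass.

```python
import ast


def parse_verbose_line(line):
    """Parse one line of a .verbose file into (hash, [(family, count), ...]).

    Hypothetical helper; assumes the line layout shown in the example above.
    """
    sample_hash, rest = line.strip().split(None, 1)
    if rest.startswith("SINGLETON:"):
        return sample_hash, []  # no family name was found for this sample
    # literal_eval safely evaluates the list literal; Python 3 accepts u'' prefixes
    ranking = ast.literal_eval(rest)
    return sample_hash, [(str(fam), int(count)) for fam, count in ranking]


h, ranking = parse_verbose_line(
    "aca2d12934935b070df8f50e06a20539 [(u'adrotator', 8), (u'zlob', 2)]")
print(h, ranking[0])  # most likely family is the first entry
```

Since the ranking is already sorted, the first tuple is the family that ends up in the *.labels* file.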
Note that the sum of these per-family counts may not equal the number
of AV engines with a label for that sample in the input file,
because the labels of some AV engines may only include generic tokens
that are removed by AVClass.
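The per-engine counting described above can be illustrated with a simplified sketch. The generic-token set, the alias map, the minimum token length of 4 and the tokenization rule are all illustrative assumptions, not AVClass's actual data files or normalization code, which is more involved. Each engine votes at most once per family, and an engine whose label contains only generic tokens contributes nothing, which is why the counts can fall short of the number of labeled engines.

```python
import re
from collections import Counter

# Hypothetical lists for illustration only; AVClass reads its real
# generic tokens and aliases from the files in the data folder.
GENERIC_TOKENS = {"trojan", "win32", "generic", "adware", "malware"}
ALIASES = {"zlobber": "zlob"}


def family_ranking(av_labels):
    """Rank candidate families by how many AV engines use them.

    av_labels: list of (engine, label) pairs for one sample.
    """
    counts = Counter()
    for _engine, label in av_labels:
        tokens = re.split(r"[^0-9a-z]+", label.lower())
        families = set()
        for tok in tokens:
            # Assumed filter: drop short, purely numeric and generic tokens
            if len(tok) < 4 or tok.isdigit() or tok in GENERIC_TOKENS:
                continue
            families.add(ALIASES.get(tok, tok))  # map aliases to one name
        counts.update(families)  # one vote per engine per family
    return counts.most_common()


def label_sample(sample_hash, av_labels):
    """Return the top family, or a SINGLETON name if none was found."""
    ranking = family_ranking(av_labels)
    return ranking[0][0] if ranking else "SINGLETON:" + sample_hash


labels = [("A", "Trojan.Win32.Adrotator.a"), ("B", "Adware.Adrotator"),
          ("C", "Win32/Zlobber.B"), ("D", "Generic.Malware")]
print(family_ranking(labels))  # engine D adds nothing: only generic tokens
```

Here four engines label the sample, but the ranking accounts for only three of them, because engine D's label is entirely generic.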
## Input JSON format
AVClass supports three input JSON formats:
1. VirusTotal v2 API JSON reports (*-vt file*),
where each line in the input *file* should be the full JSON of a
VirusTotal v2 API response to the */file/report* endpoint,
e.g., obtained by querying https://www.virustotal.com/vtapi/v2/file/report?apikey={apikey}&resource={hash}.
There is an example VirusTotal v2 input file in *examples/vtv2_sample.json*.
2. VirusTotal v3 API JSON reports (*-vt file -vt3*),
where each line in the input *file* should be the full JSON of a VirusTotal API version 3 response with a *File* object report,
e.g., obtained by querying https://www.virustotal.com/api/v3/files/{hash}.
There is an example VirusTotal v3 input file in *examples/vtv3_sample.json*.
3. Simplified JSON (*-lb file*),
where each line in *file* should be a JSON
with (at least) these fields:
{md5, sha1, sha256, av_labels}.
There is an example of such input file in *examples/malheurReference_lb.json*
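All three formats boil down to a sample hash plus a list of (engine, label) pairs. The sketch below shows one way to normalize them. The VirusTotal field names (`scans` for v2, `data.attributes.last_analysis_results` for v3) follow VirusTotal's public report layouts; treating `av_labels` in the simplified format as a list of `[engine, label]` pairs, and the function names, are assumptions rather than code taken from AVClass.

```python
import json


def extract_av_labels(record, fmt):
    """Return (sha256, [(engine, label), ...]) from one parsed JSON line.

    fmt is "vt2", "vt3" or "lb" (hypothetical format names).
    """
    if fmt == "vt2":
        scans = record.get("scans", {})
        # Keep only engines that detected the sample and gave a label
        labels = [(av, r["result"]) for av, r in scans.items()
                  if r.get("detected") and r.get("result")]
        return record["sha256"], labels
    if fmt == "vt3":
        attrs = record["data"]["attributes"]
        results = attrs.get("last_analysis_results", {})
        # Undetected engines report a null result
        labels = [(av, r["result"]) for av, r in results.items()
                  if r.get("result")]
        return attrs["sha256"], labels
    if fmt == "lb":
        # Assumed layout: av_labels is a list of [engine, label] pairs
        return record["sha256"], [tuple(pair) for pair in record["av_labels"]]
    raise ValueError("unknown format: " + fmt)


def read_reports(path, fmt):
    """Yield normalized records from a file with one JSON object per line."""
    with open(path) as fh:
        for line in fh:
            if line.strip():
                yield extract_av_labels(json.loads(line), fmt)
```

Normalizing early like this keeps the rest of the labeling logic independent of the input format.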
**Multiple input files**
AVClass can process multiple input files, writing all results to the same output files
(if you want separate results, process each input file separately).
It is possible to provide the -vt and -lb input options multiple times.
```shell
$./avclass_labeler.py -vt <file1> -vt <file2>
```
```shell
$./avclass_labeler.py -lb <file1> -lb <file2>
```
There are also -vtdir and -lbdir options that can be used to provide
an input directory where all files are VT (-vtdir) or simplified (-lbdir) JSON reports:
```shell
$./avclass_labeler.py -vtdir <directory>
```
It is also possible to combine -vt with -vtdir and -lb with -lbdir,
but you cannot combine input files of different formats. Thus, this command works:
```shell
$./avclass_labeler.py -vt <file> -vtdir <directory>
```
But this one throws an error:
```shell
$./avclass_labeler.py -vt <file1> -lb <file2>
```
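The rule that VT and simplified inputs cannot be mixed could be enforced with `argparse` as sketched below. This is a hypothetical re-creation for illustration only, not AVClass's actual argument handling: it accepts repeated -vt/-lb/-vtdir/-lbdir options, expands each directory into the files it contains, and rejects mixed formats.

```python
import argparse
import os


def build_parser():
    # Hypothetical re-creation of the input options; not AVClass's own code.
    parser = argparse.ArgumentParser(prog="avclass_labeler.py")
    # action="append" lets each option be given several times
    parser.add_argument("-vt", action="append", default=[])
    parser.add_argument("-lb", action="append", default=[])
    parser.add_argument("-vtdir", action="append", default=[])
    parser.add_argument("-lbdir", action="append", default=[])
    parser.add_argument("-vt3", action="store_true")
    return parser


def collect_inputs(argv):
    """Return (format, list_of_files); exit with an error on mixed formats."""
    parser = build_parser()
    args = parser.parse_args(argv)
    uses_vt = bool(args.vt or args.vtdir)
    uses_lb = bool(args.lb or args.lbdir)
    if uses_vt and uses_lb:
        parser.error("cannot combine VT (-vt/-vtdir) and simplified (-lb/-lbdir) inputs")
    if not (uses_vt or uses_lb):
        parser.error("no input files given")
    fmt = ("vt3" if args.vt3 else "vt2") if uses_vt else "lb"
    files = args.vt + args.lb
    # Every file inside a -vtdir/-lbdir directory is treated as an input file
    for directory in args.vtdir + args.lbdir:
        files.extend(os.path.join(directory, name)
                     for name in sorted(os.listdir(directory)))
    return fmt, files


print(collect_inputs(["-vt", "a.json", "-vt", "b.json"]))
```

`parser.error` prints a usage message and exits with status 2, which mirrors the command-line failure for mixed inputs.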
## Labeling: Family Ranking
AVClass has a -fam option to output a file with a ranking of the
families assigned to the input samples.
```shell
$./avclass_labeler.py -lb ../examples/malheurReference_lb.json -v -fam > malheurReference.labels
```
will produce a file called *malheurReference_lb.families* with two columns:
```
virut 441
allaple 301
podnuha 300
```
indicating that 441 samples were classified in the virut family,
301 as allaple, and 300 as podnuha.
This option is very similar to using the following shell command:
```shell
$cut -f 2 malheurReference.labels | sort | uniq -c | sort -nr
```
The main difference is that, with the -fam option, all SINGLETON samples
(i.e., those for which no family name was found)
are grouped into a single fake *SINGLETONS* family,
while the shell command would count each singleton as a separate family.
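The -fam aggregation can be approximated in Python as follows. This sketch assumes the *.labels* file has the hash in the first column and the family (or `SINGLETON:<hash>`) in the second, as in the examples above; any extra columns are ignored. `count_families` is a hypothetical helper.

```python
from collections import Counter


def count_families(label_lines):
    """Count samples per family, folding singletons into one pseudo-family."""
    counts = Counter()
    for line in label_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        family = fields[1]
        if family.startswith("SINGLETON:"):
            family = "SINGLETONS"  # same grouping as the -fam option
        counts[family] += 1
    return counts.most_common()


with_singletons = ["h1 virut", "h2 virut", "h3 allaple",
                   "h4 SINGLETON:h4", "h5 SINGLETON:h5"]
for family, total in count_families(with_singletons):
    print(family, total)
```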
## Labeling: PUP Classification
AVClass also has a -pup option to classify a sample as
Potentially Unwanted Program (PUP) or malware.
This classification looks for PUP-related keywords
(e.g., pup, pua, unwanted, adware) in the AV labels and was proposed in our
[CCS 2015 paper](https://software.imdea.org/~juanca/papers/malsign_ccs15.pdf):
> Platon Kotzias, Srdjan Matic, Richard Rivera, and Juan Caballero.
Certified PUP: Abuse in Authenticode Code Signing.
In Proceedings of the 22nd ACM Conference on Computer and Communications Security, Denver, CO, October 2015.
```shell
$./avclass_labeler.py -lb ../examples/malheurReference_lb.json -v -pup > malheurReference.labels
```
With the -pup option the output of the *malheurReference.labels* file
looks like this:
```
aca2d12934935b070df8f50e06a20539 adrotator 1
67d15459e1f85898851148511c86d88d adultbrowser 0
```
The digit at the end is a Boolean flag that
indicates sample aca2d12934935b070df8f50e06a20539 is
(likely) PUP, but sample 67d15459e1f85898851148511c86d88d is (likely) not.
In our experience the PUP classification is conservative,
i.e., if it says the sample is PUP, it most likely is.
However, if it says that it is not PUP, it could still be PUP if the AV labels
do not contain PUP-related keywords.
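A keyword-based PUP check might look like the sketch below. The keyword set holds only the examples named above, and flagging a sample when at least half of its labels contain a keyword is an assumed threshold; AVClass's own rule, keyword list and choice of engines may differ.

```python
import re

# Keywords from the examples in the text; the full list is an assumption.
PUP_KEYWORDS = {"pup", "pua", "unwanted", "adware"}


def is_pup(av_labels, threshold=0.5):
    """Flag a sample as PUP if enough of its AV labels contain a PUP keyword.

    av_labels: list of (engine, label) pairs.
    threshold: assumed minimum fraction of labels that must match.
    """
    if not av_labels:
        return False  # no labels, nothing to judge from
    matches = 0
    for _engine, label in av_labels:
        tokens = set(re.split(r"[^0-9a-z]+", label.lower()))
        if tokens & PUP_KEYWORDS:
            matches += 1
    return matches / len(av_labels) >= threshold
```

Raising the threshold makes the check more conservative, at the cost of missing PUP samples whose labels rarely use such keywords.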
Note that it is possible that some samples from a family get
the PUP flag while other samples from the same family do not
because the PUP-related keywords may not appear in the labels of
all samples from the same family.
To address this issue, you can combine the -pup option with the -fam option.
This combination will add into the families file the classification of the
family as malware or PUP, based on a majority vote among the samples in a
family.
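The per-family majority vote can be sketched as follows, producing rows shaped like the five columns of the families file. Resolving ties as malware and the literal `pup`/`malware` type strings are assumptions; the real output may use different values. `family_types` is a hypothetical helper.

```python
from collections import defaultdict


def family_types(samples):
    """Aggregate per-sample PUP flags into per-family rows.

    samples: iterable of (family, pup_flag) with pup_flag 0 or 1.
    Returns (family, total, malware, pup, famtype) rows, largest first.
    """
    stats = defaultdict(lambda: [0, 0])  # family -> [malware, pup]
    for family, pup_flag in samples:
        stats[family][1 if pup_flag else 0] += 1
    rows = []
    for family, (malware, pup) in stats.items():
        # Majority vote; ties count as malware (assumption)
        famtype = "pup" if pup > malware else "malware"
        rows.append((family, malware + pup, malware, pup, famtype))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows
```

For instance, a family with two PUP-flagged samples and one malware sample would be typed as PUP.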
```shell
$./avclass_labeler.py -lb ../examples/malheurReference_lb.json -v -pup -fam > malheurReference.labels
```
will produce a file called *malheurReference_lb.families* with five columns:
```
# Family Total Malware PUP FamTyp
```