# ImbalancedMLC
The influence of an imbalanced dataset on the efficacy of classification models has been studied thoroughly and extensively for single-label classification (SLC) problems, in which the model predicts the presence of a single class in a given input instance. In the domain of multi-label classification (MLC), a single input instance may have multiple classes associated with it. As a result of this intrinsic label co-occurrence, many of the dataset imbalance remedies applied to single-label classification problems are ineffective, and possibly detrimental, when applied to a multi-label dataset (MLD). For example, one of the most common SLC imbalance remedies is random oversampling: randomly duplicating instances whose associated labels are uncommon. If we apply this naive approach to an MLD, we will likely also duplicate the common labels that are simultaneously present in those instances, possibly even increasing the severity of the dataset imbalance.
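A small, purely illustrative sketch of this failure mode (the data here is made up): duplicating the instances that carry a rare label also duplicates every common label that co-occurs with it.

```python
# Toy illustration (not from the repo): naive random oversampling on a
# multi-label dataset duplicates co-occurring common labels too.
import numpy as np

# 6 instances x 3 labels; label 2 is rare but always co-occurs with label 0.
Y = np.array([
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 1],   # rare label 2 co-occurs with common label 0
    [0, 1, 0],
    [1, 1, 0],
    [1, 0, 1],   # rare label 2 co-occurs with common label 0
])

print("before:", Y.sum(axis=0))        # [5 3 2]

# Naively duplicate every instance containing the rare label, twice.
rare_rows = Y[Y[:, 2] == 1]
Y_over = np.vstack([Y, rare_rows, rare_rows])

print("after: ", Y_over.sum(axis=0))   # [9 3 6] -- common label 0 grew too
```

Here the rare label 2 always co-occurs with the common label 0, so oversampling it widens the gap between labels 0 and 1.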
This is an empirical analysis of the efficacy of several state-of-the-art methods on imbalanced multi-label classification datasets.
# Experimental Setup
In our experiments we use the 2017 Pascal VOC image database with a custom 90%/10% train/validation split,
obtained by manually repartitioning the provided Pascal VOC train/val set. This results in 10337 training and 1153 validation images.
The label distribution is shown in Figure 1; the counts include multiple occurrences of a single label within one input instance.
<img src="images/dist.png" width="500">
Our baseline model consists of an XceptionNet architecture with a final fully-connected layer with 20 outputs,
one for each label in the Pascal VOC label set. The network is initialized with ImageNet weights.
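A minimal Keras sketch of this baseline; the pooling layer, input resolution, and sigmoid head are our assumptions (the choice of output activation is discussed below), not details taken from the repo.

```python
# Minimal baseline sketch; input size and pooling layer are assumptions.
from keras.applications import Xception
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model

base = Xception(weights="imagenet", include_top=False,
                input_shape=(299, 299, 3))
x = GlobalAveragePooling2D()(base.output)
outputs = Dense(20, activation="sigmoid")(x)  # one output per VOC label
model = Model(inputs=base.input, outputs=outputs)
```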
During training we apply several image augmentations to the input images, including random flips, rotations,
shears, crops, Gaussian blur, and contrast normalization. We also apply pixel-wise mean subtraction.
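One way to assemble such a pipeline is with the imgaug library; the library choice and all parameter ranges below are our assumptions for illustration.

```python
# Hypothetical augmentation pipeline; parameter ranges are illustrative.
import imgaug.augmenters as iaa

augmenter = iaa.Sequential([
    iaa.Fliplr(0.5),                      # random horizontal flips
    iaa.Affine(rotate=(-15, 15),          # random rotations
               shear=(-10, 10)),          # random shears
    iaa.Crop(percent=(0, 0.1)),           # random crops
    iaa.GaussianBlur(sigma=(0.0, 1.0)),   # Gaussian blur
    iaa.LinearContrast((0.8, 1.2)),       # contrast normalization
])

# augmented = augmenter.augment_images(batch)  # batch of HxWxC uint8 arrays
# Pixel-wise mean subtraction is applied separately, after augmentation.
```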
We use a batch size of 24 and stochastic gradient descent with Nesterov momentum of 0.9 at an initial learning rate of 0.01.
We train for 500 epochs, processing 100 batches per epoch. Our baseline model uses categorical cross-entropy as its loss function.
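Put together, the training configuration might look like the following Keras sketch; `train_gen` and `val_gen` are stand-ins for the repo's actual data pipeline.

```python
# Training-loop sketch matching the hyperparameters above.
from keras.optimizers import SGD

model.compile(optimizer=SGD(lr=0.01, momentum=0.9, nesterov=True),
              loss="categorical_crossentropy")
model.fit_generator(train_gen,            # yields batches of 24 images
                    steps_per_epoch=100,  # 100 batches per epoch
                    epochs=500,
                    validation_data=val_gen)
```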
To tackle the multi-label problem, we may apply either a sigmoid activation to each element of the output vector or a two-dimensional softmax over a (present, absent) pair of logits for each label; both options are sketched below.
All network features are implemented with Keras.
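The two formulations are equivalent in expressive power. A sketch of both heads, built on the pooled features `x` from the baseline sketch above (layer names and shapes are ours):

```python
# Sketch of the two output formulations; names and shapes are ours.
from keras.layers import Dense, Reshape, Softmax

# Option 1: one sigmoid per label -> 20 independent probabilities.
sigmoid_head = Dense(20, activation="sigmoid")(x)

# Option 2: two logits per label, softmax over each (absent, present) pair.
logits = Reshape((20, 2))(Dense(20 * 2)(x))
softmax_head = Softmax(axis=-1)(logits)   # shape (batch, 20, 2)
```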
# Methods
Below are some of the publicly available methods used in the experiments.
## Loss Functions
### Crossentropy
### Balanced Crossentropy
### Weighted Crossentropy
### Focal Loss (https://arxiv.org/abs/1708.02002)
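The cited paper defines FL(p_t) = -α_t (1 - p_t)^γ log(p_t), which down-weights well-classified examples so training focuses on hard ones. A minimal Keras backend sketch for per-label sigmoid outputs (the implementation details are ours; α and γ defaults follow the paper):

```python
# Focal loss sketch for per-label sigmoid outputs (Lin et al., 2017).
import keras.backend as K

def focal_loss(gamma=2.0, alpha=0.25):
    def loss(y_true, y_pred):
        eps = K.epsilon()
        y_pred = K.clip(y_pred, eps, 1.0 - eps)
        # p_t is the predicted probability of the true class of each label.
        p_t = y_true * y_pred + (1.0 - y_true) * (1.0 - y_pred)
        alpha_t = y_true * alpha + (1.0 - y_true) * (1.0 - alpha)
        return K.mean(-alpha_t * K.pow(1.0 - p_t, gamma) * K.log(p_t))
    return loss
```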
### Dice Loss
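Dice loss optimizes the soft Dice coefficient, 2|Y ∩ Ŷ| / (|Y| + |Ŷ|), between the binary label vector and the predicted probabilities. A minimal sketch (the smoothing term is a common convention, not necessarily the repo's):

```python
# Soft Dice loss sketch over the 20-dimensional label vector.
import keras.backend as K

def dice_loss(smooth=1.0):
    def loss(y_true, y_pred):
        intersection = K.sum(y_true * y_pred, axis=-1)
        denom = K.sum(y_true, axis=-1) + K.sum(y_pred, axis=-1)
        return 1.0 - (2.0 * intersection + smooth) / (denom + smooth)
    return loss
```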
## Sampling Algorithms
### ML-ROS (https://arxiv.org/abs/1802.05033)
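ML-ROS (multi-label random oversampling) clones instances that carry minority labels: a label is a minority label when its imbalance ratio IRLbl (count of the most frequent label divided by its own count) exceeds the dataset mean, MeanIR. A simplified numpy sketch of the idea; the paper's full algorithm interleaves cloning across minority labels and recomputes IRLbl as it goes.

```python
# Simplified ML-ROS sketch: returns indices of instances to duplicate.
# Y is an (n_samples, n_labels) binary matrix; assumes every label occurs.
import numpy as np

def ml_ros(Y, pct=10, seed=0):
    rng = np.random.default_rng(seed)
    counts = Y.sum(axis=0).astype(float)
    irlbl = counts.max() / counts            # per-label imbalance ratio
    minority = np.where(irlbl > irlbl.mean())[0]
    n_clones = int(len(Y) * pct / 100)       # e.g. 10% extra samples
    clones = []
    for _ in range(n_clones):
        label = rng.choice(minority)                 # pick a minority label
        bearers = np.where(Y[:, label] == 1)[0]
        clones.append(int(rng.choice(bearers)))      # clone one bearer
    return clones
```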
### REMEDIAL (https://arxiv.org/abs/1802.05033)
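REMEDIAL takes the opposite approach: instead of resampling, it edits instances whose labels co-occur atypically often. Instances whose SCUMBLE score (one minus the ratio of geometric to arithmetic mean of the active labels' IRLbl values) exceeds the dataset mean are split into two instances, one keeping only the majority labels and one keeping only the minority labels. A simplified sketch following our reading of the cited paper:

```python
# Simplified REMEDIAL decoupling sketch; assumes every instance has at
# least one active label and every label occurs at least once.
import numpy as np

def remedial(Y):
    counts = Y.sum(axis=0).astype(float)
    irlbl = counts.max() / counts
    mean_ir = irlbl.mean()
    # SCUMBLE per instance: 1 - geometric mean / arithmetic mean of IRLbl.
    scumble = []
    for row in Y:
        active = irlbl[row == 1]
        geo = np.exp(np.log(active).mean())
        scumble.append(1.0 - geo / active.mean())
    thresh = float(np.mean(scumble))
    out = []
    for row, s in zip(Y, scumble):
        if s <= thresh:
            out.append(row)                      # leave instance untouched
        else:
            out.append(row * (irlbl <= mean_ir)) # copy with majority labels
            out.append(row * (irlbl > mean_ir))  # copy with minority labels
    return np.array(out)
```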