# ImageNet Downloader
This is an ImageNet dataset downloader. **You can create new datasets from subsets of ImageNet by specifying how many
classes you need and how many images per class you need.**
It works by using the image URLs provided by the ImageNet API.
[In this blog post](https://towardsdatascience.com/how-to-scrape-the-imagenet-f309e02de1f4) I describe in more detail how and why I wrote the tool. The post also includes a short analysis of the current state of the ImageNet URLs.
This software is written in Python 3.
## Usage
The following command randomly selects 100 ImageNet classes that have at least 200 images each and starts downloading:
```
python ./downloader.py \
-data_root /data_root_folder/imagenet \
-number_of_classes 100 \
-images_per_class 200
```
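The selection step described above can be sketched roughly as follows. This is a minimal illustration, not the code in `downloader.py`: the function name `pick_classes`, the `url_count` field, and the toy counts are all assumptions made for the example.

```python
import random


def pick_classes(class_info, number_of_classes, images_per_class, seed=None):
    """Randomly pick class IDs that have at least `images_per_class` URLs.

    `class_info` is a hypothetical mapping of class ID -> {"url_count": int}.
    """
    eligible = [
        wnid
        for wnid, info in class_info.items()
        if info["url_count"] >= images_per_class
    ]
    if len(eligible) < number_of_classes:
        raise ValueError(
            f"only {len(eligible)} classes have >= {images_per_class} images"
        )
    rng = random.Random(seed)  # seeded so the example is repeatable
    return rng.sample(eligible, number_of_classes)


# Toy data: the counts below are invented for illustration only.
class_info = {
    "n09858165": {"url_count": 1200},
    "n01539573": {"url_count": 150},
    "n03405111": {"url_count": 640},
}

selected = pick_classes(class_info, number_of_classes=2, images_per_class=200, seed=0)
print(sorted(selected))
```

Filtering before sampling means a class with too few URLs can never be chosen, so every selected class can in principle reach the requested image count (dead URLs aside).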
The following command downloads 500 images from each of the selected classes:
```
python ./downloader.py \
-data_root /data_root_folder/imagenet \
-use_class_list True \
-class_list n09858165 n01539573 n03405111 \
-images_per_class 500
```
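For readers wondering how a multi-valued flag such as `-class_list` and a textual boolean such as `-use_class_list True` can be handled, here is a hedged sketch using `argparse`. The flag names come from the commands above; the helper `str_to_bool`, the defaults, and the overall parser layout are assumptions and may differ from what `downloader.py` actually does.

```python
import argparse


def str_to_bool(value):
    """Interpret 'True'/'true'/'1'/'yes' as True; anything else as False."""
    return str(value).lower() in ("true", "1", "yes")


def build_parser():
    # Hypothetical parser mirroring the flags shown in the usage examples.
    parser = argparse.ArgumentParser(description="ImageNet subset downloader (sketch)")
    parser.add_argument("-data_root", required=True)
    # type=bool would treat any non-empty string as True, hence the helper.
    parser.add_argument("-use_class_list", type=str_to_bool, default=False)
    # nargs="*" collects every following value until the next flag.
    parser.add_argument("-class_list", nargs="*", default=[])
    parser.add_argument("-images_per_class", type=int, default=10)
    return parser


args = build_parser().parse_args(
    [
        "-data_root", "/data_root_folder/imagenet",
        "-use_class_list", "True",
        "-class_list", "n09858165", "n01539573", "n03405111",
        "-images_per_class", "500",
    ]
)
print(args.use_class_list, args.class_list, args.images_per_class)
```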
You can find the class list in [this csv](https://github.com/mf1024/ImageNet-datasets-downloader/blob/master/classes_in_imagenet.csv), where I list every class that appears in ImageNet along with its total number of URLs and its number of Flickr URLs.
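If you want to build a `-class_list` value from that CSV, something along these lines may help. The column names (`wnid`, `class_name`, `total_urls`, `flickr_urls`) and the sample rows are assumptions made for this sketch; check the header of the real file and adjust accordingly.

```python
import csv
import io

# Invented sample with hypothetical column names and counts.
SAMPLE_CSV = """wnid,class_name,total_urls,flickr_urls
n09858165,example_class_a,1300,900
n01539573,example_class_b,1100,300
n03405111,example_class_c,1500,1200
"""


def classes_with_enough_flickr_urls(csv_file, minimum):
    """Return class IDs whose Flickr URL count is at least `minimum`."""
    reader = csv.DictReader(csv_file)
    return [row["wnid"] for row in reader if int(row["flickr_urls"]) >= minimum]


class_list = classes_with_enough_flickr_urls(io.StringIO(SAMPLE_CSV), minimum=500)
# Space-separated, ready to paste after -class_list
print(" ".join(class_list))
```

To run it against the real file, replace `io.StringIO(SAMPLE_CSV)` with `open("classes_in_imagenet.csv", newline="")`.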
## Multiprocessing workers
I've implemented parallel request processing and added a **multiprocessing_workers parameter**, which defaults to 8. You can raise it, but I haven't yet tested the limits of Flickr's allowed bandwidth myself, so use it with care. If you want maximum speed, you will have to find the limits yourself.
You can do something like this:
```
python ./downloader.py \
-data_root /data_root_folder/imagenet \
-number_of_classes 1000 \
-images_per_class 500 \
-multiprocessing_workers 24
```
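The idea behind the worker count can be sketched as a pool that fans URLs out to a fixed number of workers. This is only an illustration: it uses `multiprocessing.pool.ThreadPool` (threads, not the processes the tool uses) to keep the example simple, and `fetch_image` is a hypothetical stand-in that performs no network request.

```python
from multiprocessing.pool import ThreadPool


def fetch_image(url):
    """Stand-in for a real download: returns the URL and a fake byte count."""
    # A real implementation would issue an HTTP request here and handle
    # timeouts, dead links, and non-image responses.
    return url, len(url)


def download_all(urls, workers=8):
    """Process `urls` with `workers` concurrent workers, preserving order."""
    with ThreadPool(processes=workers) as pool:
        return pool.map(fetch_image, urls)


# Placeholder URLs on a reserved example domain.
urls = [f"https://example.com/image_{i}.jpg" for i in range(20)]
results = download_all(urls, workers=4)
print(len(results), "items processed")
```

More workers means more requests in flight at once, which is why a higher setting raises throughput and also the risk of hitting the host's rate limits.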