'''
This code is a data preprocessing script that is used for creating a dataset to train, validate and test an image classification model. Specifically, it creates a dataset of fruit images, including Apples, Bananas, Grapes, Mangos, and Strawberries.
The script first defines the class names and takes the number of images per class from the file count of the first class directory (Apple), so it assumes every class holds the same number of images. It then creates three directories, one per dataset split (training, validation, and testing), if they do not already exist.
The script then creates subdirectories in each of the three main directories for each fruit class. This step creates a hierarchical structure for the dataset, where each fruit class has a folder in each dataset split directory.
Next, the script collects all the image paths for each fruit class, storing them in one list per class. It later shuffles each list in place so that the split does not depend on the order in which the files were listed.
The script then calculates the size of each dataset split from the total number of images and predefined ratios: 97% for training, 2% for validation, and 1% for testing. The same ratios are applied to the per-class image count to decide how many images of each class go to each split.
The script then builds three lists, one per dataset split, of (old path, new path) tuples for each image. Each new path joins the split directory, the class folder name, and the image filename.
Finally, the script moves each image from its old path to its new path using the os.rename() function. It then removes the original class directories with os.rmdir(), which only succeeds on empty directories, so every image must have been moved by that point.
The script outputs the total size of the data, as well as the size of each dataset split. Once the script finishes running, the dataset is ready to use for training, validating, and testing an image classification model.
'''
import os
import numpy as np
from glob import glob
from tqdm import tqdm
# Define class names and number of images per class
class_names = ['Apple', 'Banana', 'Grape', 'Mango', 'Strawberry']
n_images_per_class = len(os.listdir(f"./{class_names[0]}"))
# Define train, valid, and test directories and create them if they don't exist
train_dir = "./train"
valid_dir = "./valid"
test_dir = "./test"
for directory in [train_dir, valid_dir, test_dir]:
    if not os.path.exists(directory):
        os.makedirs(directory)
# Create subdirectories for each class in train, valid, and test directories
for name in class_names:
    for directory in [train_dir, valid_dir, test_dir]:
        class_path = os.path.join(directory, name)
        if not os.path.exists(class_path):
            os.makedirs(class_path)
# Collect all image paths for each class
all_class_paths = [glob(f"./{name}/*") for name in class_names]
# Define training, validation, and testing size
total_size = sum([len(paths) for paths in all_class_paths])
train_ratio = 0.97
valid_ratio = 0.02
test_ratio = 0.01
train_size = int(total_size * train_ratio)
valid_size = int(total_size * valid_ratio)
test_size = int(total_size * test_ratio)
train_images_per_class = int(n_images_per_class * train_ratio)
valid_images_per_class = int(n_images_per_class * valid_ratio)
test_images_per_class = int(n_images_per_class * test_ratio)
print("Total Data Size : {}".format(total_size))
print("Training Size : {}".format(train_size))
print("Validation Size : {}".format(valid_size))
print("Testing Size : {}\n".format(test_size))
# Shuffle image paths for each class
for paths in all_class_paths:
    np.random.shuffle(paths)
# Define lists of (old_path, new_path) tuples for training, validation, and testing images.
# os.path.dirname/basename are used instead of path.split('/') so this also works with Windows separators.
train_images = [(path, os.path.join(train_dir, os.path.basename(os.path.dirname(path)), os.path.basename(path)))
                for paths in all_class_paths for path in paths[:train_images_per_class]]
valid_images = [(path, os.path.join(valid_dir, os.path.basename(os.path.dirname(path)), os.path.basename(path)))
                for paths in all_class_paths for path in paths[train_images_per_class: train_images_per_class + valid_images_per_class]]
test_images = [(path, os.path.join(test_dir, os.path.basename(os.path.dirname(path)), os.path.basename(path)))
               for paths in all_class_paths for path in paths[train_images_per_class + valid_images_per_class: train_images_per_class + valid_images_per_class + test_images_per_class]]
# Move images to their new directories
for images, data_type in [(train_images, "Training"), (valid_images, "Validation"), (test_images, "Testing")]:
    for (old_path, new_path) in tqdm(images, desc=data_type + " Data"):
        os.rename(old_path, new_path)
# Remove the old class directories (os.rmdir() raises OSError if a directory is not empty)
for directory in class_names:
    os.rmdir("./" + directory)
# Print confirmation message
print("ALL DONE!!")
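The script above truncates each per-class count with int(), so for class sizes that the ratios do not divide evenly, a few images could stay behind and os.rmdir() would then fail. The following is a minimal sketch of one way to avoid that, not part of the original script: the helper name `split_counts` and its integer-percentage parameters are hypothetical, and the test split simply takes whatever remains.

```python
def split_counts(n_items, train_pct=97, valid_pct=2):
    """Return (train, valid, test) counts that always sum to n_items.

    Percentages are integers to avoid floating-point rounding;
    the test split receives the remainder so no file is left over.
    """
    n_train = n_items * train_pct // 100
    n_valid = n_items * valid_pct // 100
    n_test = n_items - n_train - n_valid
    return n_train, n_valid, n_test


# Hypothetical usage: slice one shuffled list of paths into three parts.
paths = [f"./Apple/Apple ({i}).jpeg" for i in range(1999)]
n_train, n_valid, n_test = split_counts(len(paths))
train_part = paths[:n_train]
valid_part = paths[n_train:n_train + n_valid]
test_part = paths[n_train + n_valid:]
```

Calling the helper once per class, with that class's own path count, would also drop the assumption that every class has as many images as the first one.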
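A rich, sweet fruit image dataset for computer vision
Data description: The fruit classification dataset is a collection of images of various fruits for training and testing computer vision models. It covers five types of fruit: apple, banana, grape, mango, and strawberry. Each class contains 2,000 images, for a total of 10,000 images. The images vary in shape, size, and color, and were captured under different lighting conditions. The dataset can be used to train and test models for tasks such as object detection, image classification, and segmentation. It suits research projects such as developing and testing new image classification algorithms or benchmarking existing ones, and it can also be used to train machine learning models for real-world applications such as fruit grading and sorting in agriculture. Overall, the fruit classification dataset is a useful resource for computer vision researchers and developers, and its availability should help advance new algorithms and techniques for image analysis and classification.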
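With 2,000 images per class and the script's 97/2/1 ratios, each class should end up with 1,940 training, 40 validation, and 20 testing images. The sketch below is one way to check the resulting directory tree after the script has run. The function name `count_split_images` is hypothetical, and it assumes the `train`/`valid`/`test` layout with one subfolder per class that the script creates.

```python
import os


def count_split_images(root, splits=("train", "valid", "test")):
    """Count the files in every <root>/<split>/<class> directory.

    Returns a dict such as {"train": {"Apple": 1940, ...}, ...}.
    Split directories that do not exist are skipped.
    """
    counts = {}
    for split in splits:
        split_dir = os.path.join(root, split)
        if not os.path.isdir(split_dir):
            continue
        counts[split] = {
            name: len(os.listdir(os.path.join(split_dir, name)))
            for name in sorted(os.listdir(split_dir))
            if os.path.isdir(os.path.join(split_dir, name))
        }
    return counts
```

Running `count_split_images(".")` from the dataset root and comparing the result with the expected per-class counts gives a quick check before training starts.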