Python source code + project notes for solving the 3D online bin packing problem with DQN deep reinforcement learning (.zip)

5 stars (rated above 95% of resources) · 193 views · uploaded 2023-08-30 13:49:02 · 5.64 MB ZIP

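[Resource description] Python source code + project notes for solving the 3D online bin packing problem with DQN deep reinforcement learning (.zip)

# Problem description

During distribution, a logistics company has to load packed boxes into the cargo compartment of a truck. To improve efficiency, the compartment should be filled as fully as possible. A 100% fill would be optimal, but a load that fills 85% of the compartment is usually regarded as well optimized. Let the compartment be a cuboid with length, width, and height L, W, H. There are n boxes, also cuboids, and box i has length, width, and height l_i, w_i, h_i (the total volume of the n boxes is far larger than the volume of the compartment). The following assumptions and requirements apply:

1. The cuboid compartment has 8 corners. The lower corner next to the driver's cab has coordinates (0,0,0). The compartment has 6 faces: the four long faces and the face next to the cab are closed, and only one face is open, which workers use to carry boxes in.
2. The coordinates of every box in the compartment must be computed, that is, after a box is placed, the coordinates of the box corner that corresponds to the compartment's (0,0,0) corner. The fill rate of the compartment must also be computed.

# Runtime environment

| Host | Memory | GPU | IDE | Python | torch |
|------|--------|-----|-----|--------|-------|
| CPU: 12th Gen Intel(R) Core(TM) i7-12700H 2.30 GHz | 6GB RAM | NVIDIA GeForce RTX 3050 | PyCharm 2022.2.1 | Python 3.8 | 1.13.0 |

# Approach

(1) When a box arrives, choose a placement point by policy, based on the space actually left in the compartment.
(2) When placing the box, try all 6 orientations, evaluate each one, and place the box at the selected corner point in the orientation with the highest evaluation value.
(3) Repeat these steps until all boxes have been placed.

# Model

A coordinate system is set up inside the compartment. The lower corner next to the driver's cab is the origin (0,0,0). The compartment's length, width, and height edges that meet at the origin define the x, y, and z axes, and L, W, H are the compartment's length, width, and height.

A box has six orientations: resting it on its length-width, length-height, or width-height face gives three, and rotating each of those by 90° gives the other three.

# Core

# Box placement strategy

The algorithm uses corner points as the candidate positions for boxes inside the compartment, and searches for newly created corner points each time a box is placed. When the first box is loaded, the origin is assumed to be the only corner point. Placing a box creates new corner points, and placing the next box creates more. A list of placeable points is maintained that holds every candidate position available when box i arrives. After each placement the list is updated, and the distance from the placed box to the top of the compartment is recorded for later use in the reward function.

# DQN

(1) Set the hyperparameters, including the ε used by ε-greedy, the discount factor γ, the target network update frequency, and the capacity of the experience replay buffer.
(2) Because the given box data set is small, the training data is enlarged by shuffling the given boxes, generating them in random order, and saving the result as training data for the network.
(3) The reward value R is built from two quantities: the sum of the areas of the two largest remaining rectangles in the x-y plane (illustrated in a figure in the source) and the distance from the box to the top of the compartment.

[Note] This is an individual graduation project that scored 95 in the defense review. According to the uploader, all of the code has been debugged and tested and runs as provided. It is suitable for beginners and for more advanced study. The resource is aimed at students, teachers, and practitioners in computer science, communications, artificial intelligence, automation, and related fields, and can also serve as a course project, a major assignment, or a graduation project. Readers with a strong foundation can modify it to implement other functionality.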
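The six orientations and the corner-point bookkeeping from the model and placement-strategy sections can be sketched as follows. This is a minimal illustration, not the project's `container.py`. The function names, the compartment height of 220, and the "take the first orientation that fits" rule are all assumptions, the last standing in for the learned evaluation. Overlap checks between boxes are left out for brevity.

```python
from itertools import permutations

# Hypothetical compartment size (L, W, H). Only 587 x 233 is hinted at by the
# network input shape in the code preview; H = 220 is an assumption.
CONTAINER = (587, 233, 220)


def orientations(l, w, h):
    """Return the distinct axis-aligned orientations of an l x w x h box."""
    return sorted(set(permutations((l, w, h))))


def fits(origin, size, container=CONTAINER):
    """True if a box of `size` placed at `origin` stays inside the compartment."""
    return all(o >= 0 and o + s <= c for o, s, c in zip(origin, size, container))


def place(points, placed, origin, size):
    """Record the box and return the updated list of corner points."""
    x, y, z = origin
    dx, dy, dz = size
    placed.append((origin, size))
    # Each placed box contributes three new candidate corner points.
    new_points = [(x + dx, y, z), (x, y + dy, z), (x, y, z + dz)]
    remaining = [p for p in points if p != origin]  # the used point is consumed
    return remaining + [p for p in new_points if p not in remaining]


def fill_rate(placed, container=CONTAINER):
    """Fraction of the compartment volume occupied by the placed boxes."""
    L, W, H = container
    used = sum(dx * dy * dz for _, (dx, dy, dz) in placed)
    return used / (L * W * H)


# Start with the origin as the only corner point and place one box.
points, placed = [(0, 0, 0)], []
box = (91, 54, 45)
pose = next(o for o in orientations(*box) if fits(points[0], o))
points = place(points, placed, points[0], pose)
```

A box with three different edge lengths yields six orientations, and a box with two equal edges yields only three, which is why the sketch deduplicates with `set`.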


Python source code + project notes for solving the 3D online bin packing problem with DQN deep reinforcement learning (.zip) (10 files)

eval.py 2KB

项目说明.md (project notes) 5KB

draw.py 2KB

data.py 4KB

cnn.pth 6.06MB

images

图片2.png 3KB

fig1 1B

图片1.png 91KB

train.py 13KB

container.py 6KB

```python
# -*- coding: utf-8 -*-
import torch
import torch.nn as nn
from torch.autograd import Variable
import torch.nn.functional as F
import numpy as np
import random
import copy
import time
from typing import List

# import draw
from container import *
from data import *
# from draw import *

# 1. Define some hyperparameters
EPSILON = 0.9                      # probability of the greedy choice in epsilon-greedy
BATCH_SIZE = 16                    # batch size sampled from the replay buffer
LR = 0.0001                        # learning rate
GAMMA = 0.9                        # discount factor
TARGET_NETWORK_REPLACE_FREQ = 100  # how often the target network is updated
MEMORY_CAPACITY = 2000             # capacity of the experience replay buffer
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# 2. Randomly generate box data
# Each entry is (dim, dim, dim, count).
solution = [
    [(91, 54, 45, 32), (105, 77, 72, 24), (79, 78, 48, 30)],
    [(108, 76, 30, 24), (110, 43, 25, 7), (92, 81, 55, 22), (81, 33, 28, 13),
     (120, 99, 73, 15)],
    [(88, 54, 39, 16), (94, 54, 36, 14), (87, 77, 43, 20), (100, 80, 72, 16),
     (83, 40, 36, 6), (91, 54, 22, 15), (109, 58, 54, 17), (94, 55, 30, 9)],
    [(86, 84, 45, 18), (81, 45, 34, 19), (70, 54, 37, 13), (71, 61, 52, 16),
     (78, 73, 40, 10), (69, 63, 46, 13), (72, 67, 56, 10), (75, 75, 36, 8),
     (94, 88, 50, 12), (65, 51, 50, 13)],
    [(108, 76, 30, 12), (110, 43, 25, 12), (92, 81, 55, 6), (81, 33, 28, 9),
     (120, 99, 73, 5), (111, 70, 48, 12), (98, 72, 46, 9), (95, 66, 31, 10),
     (85, 84, 30, 8), (71, 32, 25, 3), (36, 34, 25, 10), (97, 67, 62, 7),
     (33, 25, 23, 7), (95, 27, 26, 10), (94, 81, 44, 9)],
]


def random_generate():
    """Return the boxes of one randomly chosen data set in random arrival order."""
    idx = np.random.randint(0, len(solution))
    # Work on a copy: the loop below consumes box_list, and mutating the global
    # `solution` would leave this data set empty for every later call.
    box_list = copy.deepcopy(solution[idx])
    gen_box_order = []
    while True:
        index = np.random.randint(0, len(box_list))
        # Decrement the remaining count of the chosen box type.
        box_list[index] = (box_list[index][0], box_list[index][1],
                           box_list[index][2], box_list[index][3] - 1)
        gen_box_order.append((box_list[index][0], box_list[index][1],
                              box_list[index][2]))
        if box_list[index][3] == 0:
            box_list.pop(index)
        if len(box_list) == 0:
            break
    return gen_box_order


def normalization(data):
    """Min-max scale data to [0, 1]."""
    _range = np.max(data) - np.min(data)
    return (data - np.min(data)) / _range


def standardization(data):
    """Shift to zero mean and scale to unit standard deviation."""
    mu = np.mean(data)
    sigma = np.std(data)
    return (data - mu) / sigma


# 3. Define the network used for both the target net and the training net
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        # Conv block 1, input [2, 587, 233]
        self.conv1 = nn.Sequential(
            nn.Conv2d(
                in_channels=2,    # number of input channels
                out_channels=16,  # number of output channels
                kernel_size=3,    # 3x3 convolution kernel
                stride=1,         # slide the kernel one pixel at a time
                padding=1,        # zero-pad the border so the size is kept
            ),                    # -> [16, 587, 233]
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2)  # -> [16, 293, 116]
        )
        # Conv block 2
        self.conv2 = nn.Sequential(
            nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3,
                      stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2)  # -> [32, 146, 58]
        )
        # Conv block 3
        self.conv3 = nn.Sequential(
            nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3,
                      stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2)  # -> [64, 73, 29]
        )
        # Conv block 4
        self.conv4 = nn.Sequential(
            nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3,
                      stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2)  # -> [128, 36, 14]
        )
        # Conv block 5
        self.conv5 = nn.Sequential(
            nn.Conv2d(in_channels=128, out_channels=256, kernel_size=3,
                      stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2)  # -> [256, 18, 7]
        )
        # Conv block 6
        self.conv6 = nn.Sequential(
            nn.Conv2d(in_channels=256, out_channels=512, kernel_size=3,
                      stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2)  # -> [512, 9, 3]
        )
        # Output layer: one scalar value per input
        self.output = nn.Linear(in_features=512 * 9 * 3, out_features=1)

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.conv3(x)
        x = self.conv4(x)
        x = self.conv5(x)
        x = self.conv6(x)          # [batch, 512, 9, 3]
        x = x.view(x.size(0), -1)  # keep batch, flatten the rest: [batch, 512*9*3]
        output = self.output(x)    # [batch, 1]
        return output


class DQN(object):
    def __init__(self):
        # ----------- Define 2 networks (target and training) ------#
        self.eval_net, self.target_net = CNN(), CNN()
        # Define counter, memory size and loss function
        self.learn_step_counter = 0  # counts the steps of the learning process
        self.memory: List = [None] * MEMORY_CAPACITY
        self.memory_counter = 0      # counter used for the experience replay buffer
        # ------- Define the optimizer ------#
        self.optimizer = torch.optim.Adam(self.eval_net.parameters(), lr=LR)
        # ------ Define the loss function -----#
        self.loss_func = nn.MSELoss()

    def choose_action(self, state: Container, cargo):
        # Among the feasible points, take the one with the largest value
        is_encase, inputs, points, poses = state.encase(cargo)
        # torch.set_printoptions(profile="full")
        # print(inputs)
        if not is_encase:
            # Nothing fits. Note: three values are returned here, two on success.
            return is_encase, is_encase, is_encase
        if np.random.uniform() < EPSILON:  # greedy
            data = inputs.data.cpu().numpy()
            data = normalization(data)
            data = standardization(data)
            inputs = torch.tensor(data)
            with torch.no_grad():
                actions_value = self.target_net.forward(inputs)
            # Index of the best-scoring candidate along the batch dimension
            action = torch.max(actions_value, 0)[1].data.numpy()
            action = action[0]
            point = points[action]
            pose = poses[action]
        else:  # explore: pick a random candidate
            action = np.random.randint(0, high=len(points))
            point = points[action]
            pose = poses[action]
        return point, pose

    def store_transition(self, s: torch.Tensor, a: Cargo, r, s_: Container, a_: Cargo):
        transition = [s, a.matrix(), r, s_, a_]
        # If the buffer is full, overwrite the oldest entry via the ring index
        index = self.memory_counter % MEMORY_CAPACITY
        self.memory[index] = transition
        self.memory_counter += 1

    def learn(self):
        # Update the target network every fixed number of steps
        if self.learn_step_counter % TARGET_NETWORK_REPLACE_FREQ == 0:
            # Assign the parameters of eval_net to target_net
            self.target_net.load_sta
```
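The hyperparameters in the preview drive three mechanisms: ε-greedy selection, a ring-buffer replay memory, and a periodic target-network sync. The following is a torch-free sketch of that control flow only. The names `RingReplayBuffer`, `choose_index`, and `should_sync_target` are hypothetical and do not come from the project. As in the preview, `EPSILON` is treated as the probability of the greedy choice.

```python
import random

EPSILON = 0.9                      # probability of taking the greedy action
MEMORY_CAPACITY = 2000             # replay buffer size
TARGET_NETWORK_REPLACE_FREQ = 100  # learning steps between target syncs


class RingReplayBuffer:
    """Fixed-size buffer that overwrites the oldest transition when full."""

    def __init__(self, capacity=MEMORY_CAPACITY):
        self.capacity = capacity
        self.memory = [None] * capacity
        self.counter = 0

    def store(self, transition):
        # The modulo turns the ever-growing counter into a ring index.
        self.memory[self.counter % self.capacity] = transition
        self.counter += 1

    def sample(self, batch_size, rng=random):
        # Only sample from slots that have actually been written.
        filled = min(self.counter, self.capacity)
        return rng.sample(self.memory[:filled], min(batch_size, filled))


def choose_index(q_values, epsilon=EPSILON, rng=random):
    """Pick the best-scoring candidate with probability epsilon, else a random one."""
    if rng.random() < epsilon:
        return max(range(len(q_values)), key=q_values.__getitem__)
    return rng.randrange(len(q_values))


def should_sync_target(learn_step, freq=TARGET_NETWORK_REPLACE_FREQ):
    """True on the learning steps where target weights are copied from the eval net."""
    return learn_step % freq == 0


# Small demonstration with a capacity of 3: entries 0 and 1 get overwritten.
buf = RingReplayBuffer(capacity=3)
for i in range(5):
    buf.store(i)
```

In a torch implementation the sync step would typically copy weights with `target_net.load_state_dict(eval_net.state_dict())`, which is the standard `nn.Module` API for this purpose.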


小瓶子770

2023-10-09

Could you share some of the finer details of the code?

Make程序设计
Uploader
2023-10-10

Thanks for the support. The project notes explain things fairly clearly, and the code has comments.
m0_74812855

2024-04-14

A real treasure of a resource from a great uploader! Thank you!
Mr.杨295

2024-04-02

The resource is detailed and comprehensive, matches its description, and was useful to me. It has some practical value.
fagege001

2023-10-25

A good resource. It was very instructive, gave me new ideas, and I learned a lot from it.