人工智能实验五.zip (AI Experiment 5) resource - CSDN Wenku

8 files in total

png: 4

txt: 2

py: 1

Points required: 5, 162 views, uploaded 2024-04-18 16:15:48, 111KB ZIP

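Artificial Intelligence (AI) is a relatively new field of science and technology devoted to researching and developing theories, methods, technologies and application systems that simulate, extend and expand human intelligence. Its goal is to give computers human-like intelligent capabilities, including perception, understanding, judgment, reasoning, learning, recognition, generation and interaction, so that they can perform a wide range of tasks and, in some respects, even exceed human performance. AI is applied across many fields, including computer science, finance and trade, medicine, diagnosis, heavy industry, transportation, telecommunications, online and telephone services, law, scientific discovery, toys and games, and music. Concrete applications include speech recognition, image recognition, natural language processing, intelligent interaction, autonomous driving and healthcare. For example, at the Winter Olympics AI was used in the smart construction of venues to provide precisely located, on-demand guidance services; in ID-photo checking, AI performs face recognition with high accuracy; and in enterprise management, AI platforms provide unified management of various cloud resources.

The main advantages of AI are:
Efficiency: AI can process large amounts of data and tasks in a short time, significantly improving efficiency and productivity.
Reliability: compared with humans, AI can perform tasks faster and more accurately, and it is not affected by fatigue or emotion, which makes task execution more reliable.
Personalized services: by analyzing large amounts of user data, AI can offer personalized services and recommendations, improving user experience and satisfaction.
Autonomous learning: with techniques such as machine learning and deep learning, AI can learn and optimize its models on its own and keep improving its performance.

However, AI also has drawbacks:
Data bias: if the training dataset is biased, the resulting model can inherit that bias and make systematic errors, which hurts its performance.
Privacy: processing and analyzing large amounts of data can raise user-privacy issues, such as leaks of personal information.

Overall, AI is a field that keeps developing, and as the technology improves, its range of applications and its advantages keep growing. At the same time, its challenges and problems need attention and solutions to ensure healthy, sustainable development.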


人工智能实验五.zip (8 files)

content

hw5.py 8KB

image.png 73KB

test_with_label.txt 7KB

image-3.png 10KB

image-2.png 13KB

requirements.txt 76B

image-1.png 17KB

README.md 13KB

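# Multimodal Sentiment Analysis

###### Name: 李欣然  Student ID: 10215501425

###### GitHub: https://github.com/ranwan20/ModelAI (P.S. The 'bert-base-uncased' model files were too large to upload to GitHub, so the model is included in the zip archive sent by email.)

#### Experiment Requirements

Design a multimodal fusion model. Split a validation set off the training set yourself and tune the hyperparameters. Predict the sentiment labels for the test set (test_without_label.txt).

#### Experiment Process

The core of this experiment is a multimodal fusion model named `Model`. It consists of a text model and an image model whose outputs are concatenated and then classified. The `__init__` method defines the following components:

- `txt_model`: a pretrained BERT model, loaded from bert-base-uncased with `BertModel.from_pretrained`.
- `img_model`: a pretrained ResNet-18 model, loaded with `torchvision.models.resnet18(pretrained=True)`.
- `linear1`: a linear layer that maps the text model's output dimension (768) to 128.
- `linear2`: a linear layer that maps the image model's output dimension (1000) to 128.
- `fc`: a linear layer that maps the concatenated text and image features (128 + 128 = 256) to 3 dimensions for classification.
- `relu`: the ReLU activation function.

The `forward` method defines how data flows through the model:

1. The image is passed through the image model (`img_model`) to obtain the image features (`img_out`).
2. The image features go through the linear layer `linear2` and the ReLU activation `relu`, giving the processed image features (`img_out`).
3. The text is passed through the text model (`txt_model`) to obtain the text features (`txt_out`); the `last_hidden_state` attribute of the BERT output holds the hidden states of the last layer.
4. The slice `txt_out.last_hidden_state[:,0,:]` keeps the representation of the first token ([CLS]).
5. `view` reshapes the text features to `(batch_size, -1)`, where `batch_size` is the input batch size. The slice already yields `(batch_size, 768)`, so this only guarantees a 2-D shape, and the result must be assigned back because `view` does not modify the tensor in place.
6. The text features go through `linear1` and `relu`, giving the processed text features (`txt_out`).
7. The text and image features are concatenated along the last dimension and classified by the linear layer `fc`.
8. Finally, the classification result (`out`) is returned.

The model thus combines text and image features and learns, through training, how to combine them effectively for the classification task.

```python
import torch
import torch.nn as nn
import torchvision
from transformers import BertModel


class Model(nn.Module):
    """Late fusion: BERT [CLS] features + ResNet-18 features -> 3 sentiment classes."""

    def __init__(self):
        super().__init__()
        # Local copy of bert-base-uncased (shipped in the email archive, not on GitHub)
        self.txt_model = BertModel.from_pretrained('./bert-base-uncased')
        # torchvision >= 0.13 prefers weights=torchvision.models.ResNet18_Weights.DEFAULT
        self.img_model = torchvision.models.resnet18(pretrained=True)
        self.linear1 = nn.Linear(768, 128)   # text features: 768 -> 128
        self.linear2 = nn.Linear(1000, 128)  # image features: 1000 -> 128
        self.fc = nn.Linear(256, 3)          # concatenated 128 + 128 -> 3 classes
        self.relu = nn.ReLU()

    def forward(self, input_ids, attention_mask, image):
        # Image branch
        img_out = self.img_model(image)
        img_out = self.linear2(img_out)
        img_out = self.relu(img_out)
        # Text branch: hidden state of the first ([CLS]) token
        txt_out = self.txt_model(input_ids=input_ids, attention_mask=attention_mask)
        txt_out = txt_out.last_hidden_state[:, 0, :]
        txt_out = txt_out.view(txt_out.shape[0], -1)  # (batch_size, 768); view is not in-place
        txt_out = self.linear1(txt_out)
        txt_out = self.relu(txt_out)
        # Fuse by concatenation, then classify
        out = torch.cat((txt_out, img_out), dim=-1)
        out = self.fc(out)
        return out
```

`txtonlyModel` and `imgonlyModel` use only the text or only the image input, respectively, and are used to obtain the ablation results.

```python
class txtonlyModel(nn.Module):
    """Text-only ablation: the image argument is accepted but ignored."""

    def __init__(self):
        super().__init__()
        self.txt_model = BertModel.from_pretrained('bert-base-uncased')
        self.linear = nn.Linear(768, 256)
        self.fc = nn.Linear(256, 3)
        self.relu = nn.ReLU()

    def forward(self, input_ids, attention_mask, image):
        txt_out = self.txt_model(input_ids=input_ids, attention_mask=attention_mask)
        txt_out = txt_out.last_hidden_state[:, 0, :]
        txt_out = txt_out.view(txt_out.shape[0], -1)  # assign the reshaped tensor
        txt_out = self.linear(txt_out)
        txt_out = self.relu(txt_out)
        out = self.fc(txt_out)
        return out
```

```python
class imgonlyModel(nn.Module):
    """Image-only ablation: input_ids and attention_mask are accepted but ignored."""

    def __init__(self):
        super().__init__()
        self.img_model = torchvision.models.resnet18(pretrained=True)
        self.linear = nn.Linear(1000, 256)
        self.fc = nn.Linear(256, 3)
        self.relu = nn.ReLU()

    def forward(self, input_ids, attention_mask, image):
        img_out = self.img_model(image)
        img_out = self.linear(img_out)
        img_out = self.relu(img_out)
        out = self.fc(img_out)
        return out
```

The `txt_` function tokenizes the text and returns `input_ids` and `attention_mask`. `input_ids` is the tensor representation of the encoded text, and `attention_mask` indicates which tokens are real (1) and which are padding (0).

```python
def txt_(txt, token):
    """Tokenize a list of strings with the Hugging Face tokenizer `token`."""
    result = token.batch_encode_plus(batch_text_or_text_pairs=txt,
                                     truncation=True,
                                     padding='max_length',
                                     max_length=32,
                                     return_tensors='pt')
    input_ids = result['input_ids']            # shape (num_texts, 32)
    attention_mask = result['attention_mask']  # 1 = real token, 0 = padding
    return input_ids, attention_mask
```

`MultimodalDataset` bundles the images, descriptions and labels and provides convenient indexed access: indexing returns the image, description and label together with the encoded text and its attention mask.

```python
from torch.utils.data import Dataset


class MultimodalDataset(Dataset):
    def __init__(self, images, descriptions, tags, token):
        self.images = images
        self.descriptions = descriptions
        self.tags = tags
        # Tokenize all descriptions once, up front
        self.input_ids, self.attention_masks = txt_(self.descriptions, token)

    def __len__(self):
        return len(self.descriptions)

    def __getitem__(self, idx):
        img = self.images[idx]
        des = self.descriptions[idx]
        tag = self.tags[idx]
        input_id = self.input_ids[idx]
        attention_mask = self.attention_masks[idx]
        return img, des, tag, input_id, attention_mask
```

`train_process` trains the given model with the given optimizer and data loaders. At the end of each epoch it prints the loss of the last training batch, the training accuracy and the validation accuracy.

```python
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")


def train_process(model, epoch_num, optimizer, train_dataloader, valid_dataloader, train_count, valid_count):
    Loss_C = nn.CrossEntropyLoss()
    train_acc = []
    valid_acc = []
    for epoch in range(epoch_num):
        train_cor_count = 0
        valid_cor_count = 0
        model.train()  # training mode (dropout active)
        for img, des, target, idx, mask in train_dataloader:
            img, mask, idx, target = img.to(device), mask.to(device), idx.to(device), target.to(device)
            output = model(idx, mask, img)
            optimizer.zero_grad()
            loss = Loss_C(output, target)
            loss.backward()
            optimizer.step()
            pred = output.argmax(dim=1)
            train_cor_count += int(pred.eq(target).sum())
        train_acc.append(train_cor_count / train_count)

        model.eval()  # evaluation mode (dropout off)
        with torch.no_grad():  # no gradients needed for validation
            for img, des, target, idx, mask in valid_dataloader:
                img, mask, idx, target = img.to(device), mask.to(device), idx.to(device), target.to(device)
                output = model(idx, mask, img)
                pred = output.argmax(dim=1)
                valid_cor_count += int(pred.eq(target).sum())
        valid_acc.append(valid_cor_count / valid_count)
        # Train_Loss is the loss of the last training batch in this epoch
        print('Train Epoch: {}, Train_Loss: {:.4f}, Train Accuracy: {:.4f}, Valid Accuracy: {:.4f}'.format(
            epoch + 1, loss.item(), train_cor_count / train_count, valid_cor_count / valid_count))
    return train_acc, valid_acc
```

The main function parses the command-line arguments and selects the appropriate model for training and prediction. It reads the training data (images, descriptions and labels) and preprocesses the description text. It then splits the data into training and validation sets, creates the corresponding data loaders, trains the selected model with the optimizer, and evaluates the model on the validation set after every epoch. Finally, it reads the list of guids from the test data and uses…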
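To make the effect of `truncation=True, padding='max_length', max_length=32` in `txt_` concrete, here is a minimal pure-Python sketch of the `input_ids` / `attention_mask` pair produced for one text. It assumes the text has already been split into WordPiece ids (the ids below are placeholders) and uses the bert-base-uncased special-token ids ([CLS] = 101, [SEP] = 102, [PAD] = 0). `pad_and_mask` is a hypothetical helper for illustration only, not part of hw5.py or of the transformers API.

```python
# Hypothetical illustration of BERT-style truncation + max_length padding.
# Special-token ids assume the bert-base-uncased vocabulary.
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0


def pad_and_mask(token_ids, max_length=32):
    """Return (input_ids, attention_mask) for one already-tokenized text."""
    # Reserve two positions for [CLS] and [SEP], then truncate the content.
    body = token_ids[: max_length - 2]
    input_ids = [CLS_ID] + body + [SEP_ID]
    attention_mask = [1] * len(input_ids)
    # Pad up to max_length; padded positions get attention mask 0.
    n_pad = max_length - len(input_ids)
    input_ids += [PAD_ID] * n_pad
    attention_mask += [0] * n_pad
    return input_ids, attention_mask


short_ids, short_mask = pad_and_mask([2023, 2003, 2307])      # short text
long_ids, long_mask = pad_and_mask(list(range(1000, 1050)))   # 50 tokens, too long
print(len(short_ids), sum(short_mask))               # 32 5
print(len(long_ids), sum(long_mask), long_ids[-1])   # 32 32 102
```

Every text ends up exactly 32 ids long, so the tokenizer output can be stacked into a single tensor; the attention mask is what lets BERT ignore the padded positions.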
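The layer sizes in `Model`, `txtonlyModel` and `imgonlyModel` can be checked with a little arithmetic: `nn.Linear(in, out)` holds `in * out` weights plus `out` biases, and concatenating two 128-dim vectors gives the 256 inputs of `fc`. The sketch below counts only the trainable parameters of these small heads; the BERT and ResNet-18 backbones are deliberately excluded. `linear_params` is a hypothetical helper written for this illustration.

```python
def linear_params(in_features, out_features):
    """Parameter count of a fully connected layer: weights plus biases."""
    return in_features * out_features + out_features


BERT_CLS_DIM = 768     # hidden size of the BERT [CLS] vector fed to linear1
RESNET_OUT_DIM = 1000  # ResNet-18 output (ImageNet logits) fed to linear2
NUM_CLASSES = 3

heads = {
    "Model (fusion)": linear_params(BERT_CLS_DIM, 128)
    + linear_params(RESNET_OUT_DIM, 128)
    + linear_params(128 + 128, NUM_CLASSES),
    "txtonlyModel": linear_params(BERT_CLS_DIM, 256) + linear_params(256, NUM_CLASSES),
    "imgonlyModel": linear_params(RESNET_OUT_DIM, 256) + linear_params(256, NUM_CLASSES),
}
for name, count in heads.items():
    print(f"{name}: {count} head parameters")
# Model (fusion): 227331 head parameters
# txtonlyModel: 197635 head parameters
# imgonlyModel: 257027 head parameters
```

All three heads are in the same range (roughly 0.2 to 0.26 million parameters) and tiny compared with the backbones, so differences between the fusion and ablation results are likely to reflect which modality is used rather than head capacity.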
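The requirement to split a validation set off the training set can be sketched in plain Python as a seeded shuffle followed by a cut. The README does not state the split ratio or seed, so `valid_ratio=0.1` and `seed=42` below are assumptions, and `train_valid_split` is a hypothetical helper. Inside a PyTorch pipeline, `torch.utils.data.random_split` with a seeded `torch.Generator` achieves the same effect.

```python
import random


def train_valid_split(samples, valid_ratio=0.1, seed=42):
    """Shuffle a copy of `samples` and split it into (train, valid) lists."""
    shuffled = list(samples)               # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split
    n_valid = int(len(shuffled) * valid_ratio)
    return shuffled[n_valid:], shuffled[:n_valid]


# Hypothetical (guid, label_index) pairs standing in for the training file.
samples = [(guid, guid % 3) for guid in range(1, 101)]
train, valid = train_valid_split(samples)
print(len(train), len(valid))  # 90 10
```

Fixing the seed keeps the validation set identical across runs, so hyperparameter settings are compared on the same held-out samples.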
