Assignment Guide: Developing an LLM-Based Text Classification System (Advanced Machine Learning resource, CSDN Library)

Uploaded 2024-11-19 12:34:27, 122 KB PDF

158736 - Advanced Machine Learning

Assignment 2

Introduction

In this assignment, you will develop a simple text classification system using LLMs. You will use the dataset provided in this GitHub repository. The corresponding research paper can be found here (note that you need to access the paper through the Massey network; the paper has also been added to Stream for your convenience).

Read the paper to get an understanding of the dataset. You do not need to understand the full context of the paper, as we are not going to replicate its method. However, you should notice that it is a fine example of a text classification system based on traditional machine learning techniques, where greater emphasis has been placed on input feature engineering and on producing a pipelined system.

The dataset contains 5000 questions asked by (potential) travellers. Each question has been classified into a coarse-grain class and a fine-grain class. For this assignment, we will only consider the coarse-grain classes, which are as follows:

TTD: things to do
TGU: travel guide
ACM: accommodation
TRS: transport
WTH: weather
FOD: food
ENT: entertainment
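
The seven coarse-grain codes can be kept in a small lookup table so that labels are checked before any training or prompting. The following is only a minimal sketch, assuming labels arrive as plain strings; the names `COARSE_LABELS` and `normalize_label` are hypothetical helpers and are not part of the assignment.

```python
# Mapping from the coarse-grain codes listed in the assignment to their descriptions.
COARSE_LABELS = {
    "TTD": "things to do",
    "TGU": "travel guide",
    "ACM": "accommodation",
    "TRS": "transport",
    "WTH": "weather",
    "FOD": "food",
    "ENT": "entertainment",
}


def normalize_label(raw: str) -> str:
    """Return the stripped, upper-cased code, or raise if it is not one of the seven classes.

    Hypothetical helper: the assignment does not say whether the raw labels
    need cleaning, so this only guards against whitespace and case differences.
    """
    code = raw.strip().upper()
    if code not in COARSE_LABELS:
        raise ValueError(f"Unknown coarse-grain class: {raw!r}")
    return code
```

Validating labels up front makes it easier to spot rows where a model later outputs something outside the seven expected codes.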

General Instructions

As in Assignment 1, your code should run in a Colab notebook. Make sure that anyone can access your notebook, and add clear instructions and comments to the code.

To answer the questions, you need to refer to external sources such as research papers. All sources you refer to must be cited in the text, and a corresponding bibliography should be added at the bottom of the document. You may use a bibliography style of your choice (e.g. APA, IEEE), but make sure your referencing is consistent throughout the document.

Data Preparation

Download the data CSV file and remove the fine-grain column. Use 4000 samples for training, 700 for testing and 300 for validation.
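
One possible way to carry out this step is sketched below using only the Python standard library. The assignment does not say how the samples should be chosen, so a seeded random shuffle is an assumption here; the column name passed to `drop_column` and both function names are also hypothetical, since the actual CSV header is not given in this document.

```python
import random


def drop_column(rows, column):
    """Return copies of the row dicts without the given column."""
    return [{k: v for k, v in row.items() if k != column} for row in rows]


def split_rows(rows, n_train=4000, n_test=700, n_val=300, seed=42):
    """Shuffle rows reproducibly and cut them into train/test/validation lists.

    The sizes default to the 4000/700/300 split requested in the assignment.
    The random shuffle and the seed value are assumptions, not requirements.
    """
    if len(rows) < n_train + n_test + n_val:
        raise ValueError("Not enough rows for the requested split sizes")
    shuffled = list(rows)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    val = shuffled[n_train + n_test:n_train + n_test + n_val]
    return train, test, val
```

The `rows` argument is expected to be a list of dictionaries, for example the result of `list(csv.DictReader(f))` on the downloaded file. Fixing the seed means the same three subsets are produced every time the notebook is rerun.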

Convert the dataset into the instruction format. Upload this dataset to Google Drive as CSV files. Make sure you use the following folder structure:

/content/gdrive/MyDrive/Massey-158736/assignment-2/

It is extremely important that you have this structure, so that we can easily run the code.
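
The conversion and upload steps could look roughly like the sketch below. The exact instruction format is not specified in this part of the assignment, so the three-column layout (`instruction`, `input`, `output`), the prompt wording in `INSTRUCTION`, and the function names are all assumptions made for illustration. Only the folder path comes from the assignment itself.

```python
import csv
import os

# Folder required by the assignment (reachable once Google Drive is mounted in Colab).
DATA_DIR = "/content/gdrive/MyDrive/Massey-158736/assignment-2/"

# Assumed prompt wording; the assignment does not prescribe one.
INSTRUCTION = (
    "Classify the travel question into one of these classes: "
    "TTD, TGU, ACM, TRS, WTH, FOD, ENT."
)


def to_instruction_record(question, label):
    """Wrap one question/label pair in an assumed instruction/input/output layout."""
    return {
        "instruction": INSTRUCTION,
        "input": question.strip(),
        "output": label.strip(),
    }


def write_records(records, path):
    """Write instruction records to a CSV file, creating the parent folder if needed."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["instruction", "input", "output"])
        writer.writeheader()
        writer.writerows(records)
```

In a Colab notebook, Drive would first be mounted with `drive.mount("/content/gdrive")` from `google.colab`, after which a call such as `write_records(train_records, os.path.join(DATA_DIR, "train.csv"))` would place the file in the required folder. The file name `train.csv` is a placeholder, as the expected file names appear only in the image that is missing from this copy.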

Once the data has been uploaded, your directory should look similar to the image below:
