Nankai University, College of Computer Science: Natural Language Processing course project, with Python code and datasets - CSDN Wenku

125 files in total

txt: 67

py: 24

sh: 14

Tags: natural language processing · course resources · python · datasets

5 stars · rated above 95% of resources · 191 views · uploaded 2023-12-20 16:44:39 · 203.16MB ZIP

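Package contents: Nankai University, College of Computer Science, Natural Language Processing course project, with Python code and datasets (125 files)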

train.csv 810KB

test.csv 231KB

whole_data.json 17.14MB

train_wiki.json 13.64MB

val_wiki.json 3.37MB

val_semeval.json 2.1MB

pubmed_unsupervised.json 1.04MB

val_nyt.json 1008KB

val_pubmed.json 422KB

pid2name.json 78KB

categories.json 2KB

README.md 850B

README.md 529B

README.md 358B

README.md 287B

README.md 178B

README.md 163B

README.md 160B

README.md 140B

README.md 129B

train.dynamicFewShot.py 41KB

train.DNNC.py 40KB

train.entailment.py 37KB

train.Hybrid.py 35KB

train.DNNC.py 35KB

train.protonet.py 34KB

train.entailment.py 31KB

train_text_classifier.py 29KB

class_oriented_document_representations.py 11KB

static_representations.py 7KB

data_split.py 5KB

document_class_alignment.py 5KB

prepare_text_classifer_training.py 3KB

preprocessing_utils.py 3KB

split_data.py 3KB

utils.py 2KB

evaluate.py 2KB

compute_mean_std.py 2KB

read_data.py 1KB

train.dynamicFewShot.commands.sh 2KB

train.entailment.commands.sh 2KB

train.protonet.command.sh 2KB

train.Hybrid.commands.sh 2KB

train.DNNC.commands.sh 2KB

train.entailment.commands.sh 2KB

train.Hybrid.commands.sh 2KB

train.DNNC.commands.sh 2KB

run.sh 793B

run_train_text_classifier.sh 737B

run_data_preprocess.sh 84B

run_data_preprocess.sh 49B

dataset.txt 163.22MB

dataset.txt 130.48MB

dataset.txt 47.83MB

dataset.txt 30.22MB

dataset.txt 27.29MB

dataset.txt 26.13MB

train.txt 1.56MB

labels.txt 1.22MB

total_train.txt 689KB

labels.txt 234KB

total_test.txt 231KB

test.txt 169KB

train.txt 159KB

test.txt 127KB

test.txt 124KB

test.txt 123KB

test.txt 122KB

total_dev.txt 120KB

test.txt 98KB

dev.txt 88KB

train.txt 82KB

labels.txt 74KB

dev.txt 65KB

labels.txt 62KB

dev.txt 62KB

dev.txt 61KB

test.txt 57KB

dev.txt 49KB

labels.txt 35KB

test.txt 35KB

test.txt 30KB

dev.txt 30KB

test.txt 28KB

test.txt 27KB

labels.txt 26KB

test.txt 24KB

125 entries in total

This folder contains the datasets used for X-Class. Due to size constraints, we uploaded the data to Google Drive; the download link is [here](https://drive.google.com/drive/folders/1w0g3c0z9eoV-IYHCcA54tBKiNTYJy-3J?usp=sharing). After downloading, unzip the archive with `unzip -o`.

## Data format

We also describe the dataset format so that new datasets can be added. All files for a dataset should be placed in a folder named after the dataset, inside this directory. The files to include are:

- dataset.txt - A text file containing the documents, one per line. We use BERT's tokenizer for tokenization.
- classes.txt - A text file containing the class names, one per line.
- labels.txt - A text file containing the class (index) of each document in `dataset.txt`, one label per line.

All files must use exactly these names.
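The README specifies the folder layout only in prose. As a minimal sketch, assuming UTF-8 files and that each entry in labels.txt is a 0-based index into classes.txt, the Python helpers below write one dataset folder in this layout and load it back, checking that the three files agree. `TextClassDataset`, `write_dataset`, `load_dataset`, and the `toy_news` folder are hypothetical names for illustration, not part of the X-Class code, and BERT tokenization is left out.

```python
import tempfile
from pathlib import Path
from typing import List, NamedTuple, Sequence


class TextClassDataset(NamedTuple):
    """One dataset folder held in memory (hypothetical helper type)."""
    documents: List[str]
    class_names: List[str]
    labels: List[int]


def _read_lines(path: Path) -> List[str]:
    # One item per line; drop only the line terminator so document text is kept as-is.
    with path.open(encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def write_dataset(folder: Path, documents: Sequence[str],
                  class_names: Sequence[str], labels: Sequence[int]) -> None:
    """Hypothetical helper: write dataset.txt, classes.txt and labels.txt."""
    for item in [*documents, *class_names]:
        # A newline inside a document would split it across two lines and break alignment.
        if "\n" in item or "\r" in item:
            raise ValueError("documents and class names must fit on a single line")
    folder.mkdir(parents=True, exist_ok=True)
    for name, items in (("dataset.txt", documents),
                        ("classes.txt", class_names),
                        ("labels.txt", [str(label) for label in labels])):
        (folder / name).write_text("".join(f"{item}\n" for item in items), encoding="utf-8")


def load_dataset(folder: Path) -> TextClassDataset:
    """Hypothetical helper: read the three files and check that they agree."""
    documents = _read_lines(folder / "dataset.txt")
    class_names = _read_lines(folder / "classes.txt")
    labels = [int(x) for x in _read_lines(folder / "labels.txt")]
    if len(labels) != len(documents):
        raise ValueError(f"{len(documents)} documents but {len(labels)} labels")
    for line_no, label in enumerate(labels, start=1):
        # Assumption: labels are 0-based indices into classes.txt.
        if not 0 <= label < len(class_names):
            raise ValueError(f"labels.txt line {line_no}: {label} is not a valid class index")
    return TextClassDataset(documents, class_names, labels)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp) / "toy_news"  # hypothetical dataset name
        write_dataset(folder,
                      ["the home team won the final", "stocks fell sharply today"],
                      ["sports", "business"],
                      [0, 1])
        ds = load_dataset(folder)
        print(ds.class_names[ds.labels[1]])  # business
```

Checking the line counts when loading catches the most common mistake with this format: an extra or missing line in labels.txt misaligns every label after it.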
