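Machine-learning-based news headline classification system: source code + dataset + model (high-scoring graduation project).zip - CSDN Library resource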

62 files in total

html: 16

css: 13

txt: 10

Tags: graduation project, machine learning, dataset, artificial intelligence

2 views, uploaded 2024-03-25 11:36:45, 10.93MB ZIP


基于机器学习的新闻标题分类系统源码+数据集+训练好的模型+项目操作说明-本科毕设项目.zip (62 files)

主-main

Bachelor_Graduation.sql 2KB

main.py 69B

app

__init__.py 276B

tables.py 2KB

templates

user_info.html 1KB

keywords.html 540B

show_admin.html 2KB

clean.html 474B

admin_info.html 1KB

classify.html 1KB

emotional.html 855B

detect_admin.html 489B

404.html 51KB

show_user.html 2KB

admin.html 7KB

index.html 499B

detect_user.html 793B

user.html 11KB

vector.html 487B

view.py 13KB

pipeline.py 2KB

static

admin.js 2KB

user.js 2KB

404.js 421B

jquery.simplePagination.js 11KB

hl-all.js 13KB

echarts.min.js 993KB

news.js 3KB

classify.js 2KB

css

style.css 2KB

404.css 867B

admin.css 4KB

hl.css 2KB

classify.css 8KB

clean.css 880B

detect.css 1KB

keywords.css 880B

user.css 7KB

vector.css 880B

emotional.css 1KB

simplePagination.css 6KB

config.py 244B

filter.py 2KB

data

dev.txt 2.68MB

test.word 2.47MB

taskgline02.pdf 112KB

test_with_label.word 2.68MB

哈工大停用词表.txt 5KB

sensitive_words.txt 234KB

id2tag.txt 233B

中文停用词表.txt 5KB

四川大学机器智能实验室停用词库.txt 7KB

vocab.txt 1.49MB

train.txt 11.73MB

.idea

Bachelor_Graduation.iml 284B

modules.xml 290B

preprocess.ipynb 27KB

使用说明.txt 1KB

requirements.txt 5KB

.gitignore 2KB

Corpus for Chinese News Headline Categorization

1 Task Denition

This task aims to evaluate automatic classification techniques for very short texts, i.e., Chinese news headlines. Each news headline (i.e., news title) is required to be classified into one or more predefined categories. With the rise of the Internet and social media, the text data on the web is growing exponentially. Having humans analyze all of that data is impractical, whereas machine learning techniques are well suited to this kind of task. After all, human brain capacity is too limited and precious to spend on tedious and non-obvious phenomena.

Formally, the task is defined as follows: given a news headline $x = (x_1, x_2, \ldots, x_n)$, where $x_j$ represents the $j$th word in $x$, the objective is to find its possible category or label $c \in C$. More specifically, we need to find a function that predicts which category $x$ belongs to:

$$c^{*} = \arg\max_{c \in C} f(x; \theta_c), \qquad (1)$$

where $\theta$ is the parameter for the function.
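The arg max decision rule in Eq. (1) can be illustrated with a minimal sketch. The scoring function here is a simple sum of per-category word weights; this is an assumption chosen for illustration, not the model used in the project or the paper. The names `THETA`, `score`, and `predict` and all weight values are hypothetical.

```python
from typing import Dict, List

# Hypothetical per-category word weights standing in for the parameters theta.
# The values are made up purely for illustration.
THETA: Dict[str, Dict[str, float]] = {
    "sports": {"球队": 2.0, "比赛": 1.5},
    "finance": {"股市": 2.0, "银行": 1.5},
}


def score(x: List[str], weights: Dict[str, float]) -> float:
    """f(x; theta_c): sum the weights of the words in the segmented headline x."""
    return sum(weights.get(word, 0.0) for word in x)


def predict(x: List[str], theta: Dict[str, Dict[str, float]]) -> str:
    """Return the category c in C that maximizes f(x; theta_c)."""
    return max(theta, key=lambda c: score(x, theta[c]))


headline = ["球队", "赢得", "比赛"]  # an already segmented, made-up headline
predicted = predict(headline, THETA)
```

Any model that assigns one score per category (for example a softmax classifier) fits the same pattern: compute the scores, then take the category with the largest one.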

2 Data

We collected news headlines (titles) from several Chinese news websites, such as toutiao, sina, and so on.

There are 18 categories in total. The detailed information of each category is shown in Table 1. All the sentences are segmented using the Python Chinese segmentation tool jieba.

Some samples from the training dataset are shown in Table 2.

Length. Length statistics are also given in Figure 1, which shows that most titles contain fewer than 40 characters, with a mean of 21.05. Measured in words, titles are even shorter: most contain fewer than 20 words, with a mean of 12.07.
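As a rough sketch of how such length statistics could be computed, the snippet below takes titles that are already segmented and reports the mean character count and mean word count. It assumes each title is a single string with words separated by spaces; that layout is an assumption, since the file format is not described here. The function name `length_stats` and the sample titles are hypothetical.

```python
from statistics import mean
from typing import List, Tuple


def length_stats(segmented_titles: List[str]) -> Tuple[float, float]:
    """Return (mean character count, mean word count) over the given titles.

    Assumes words within a title are separated by whitespace, so the
    character count is taken after removing the separating spaces.
    """
    char_lengths = [len("".join(title.split())) for title in segmented_titles]
    word_lengths = [len(title.split()) for title in segmented_titles]
    return mean(char_lengths), mean(word_lengths)


titles = ["球队 赢得 比赛", "股市 今日 大涨"]  # made-up sample titles
mean_chars, mean_words = length_stats(titles)
```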

The dataset is released on GitHub along with demonstration code implemented in TensorFlow [Abadi et al., 2015].

3 Evaluation

We use the macro-averaged precision, recall and F1 to evaluate the performance.
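A minimal sketch of macro-averaged precision, recall and F1 follows. It assumes single-label predictions and takes macro F1 as the mean of the per-category F1 scores; that is one common convention, and the source does not say which one the organizers used. The function name `macro_prf` is hypothetical.

```python
from typing import List, Tuple


def macro_prf(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Macro-averaged precision, recall and F1 over the labels present in gold."""
    labels = sorted(set(gold))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        pairs = list(zip(gold, pred))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        # Guard against empty denominators by scoring such cases as 0.0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    # Each category contributes equally, regardless of how many examples it has.
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


gold = ["a", "a", "b", "b"]  # made-up labels
pred = ["a", "b", "b", "b"]
macro_p, macro_r, macro_f1 = macro_prf(gold, pred)
```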

Category        Train   Dev    Test
entertainment   10000   2000   2000
sports          10000   2000   2000
car             10000   2000   2000
society         10000   2000   2000
tech            10000   2000   2000
world           10000   2000   2000
finance         10000   2000   2000
game            10000   2000   2000
travel          10000   2000   2000
military        10000   2000   2000
history         10000   2000   2000
baby            10000   2000   2000
fashion         10000   2000   2000
food            10000   2000   2000
discovery        4000   2000   2000
story            4000   2000   2000
regimen          4000   2000   2000
essay            4000   2000   2000

Table 1: The information of categories.

The Macro Avg. is defined as follows:

$$\mathrm{Macro\_avg} = \frac{1}{m} \sum_{i=1}^{m} \rho_i$$

And the Micro Avg. is defined as:

$$\mathrm{Micro\_avg} = \frac{1}{N} \sum_{i=1}^{m} w_i \rho_i$$

where $m$ denotes the number of classes, which is 18 for this dataset, $\rho_i$ is the accuracy of the $i$th category, $w_i$ represents how many test examples reside in the $i$th category, and $N$ is the total number of examples in the test set.
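The two averages can be sketched in a few lines of Python. This assumes that Macro Avg. is the unweighted mean of the per-category accuracies and that Micro Avg. weights each category's accuracy by its share of the test examples, which is how the symbol definitions above read. The function names and the sample numbers are hypothetical.

```python
from typing import List


def macro_avg(rho: List[float]) -> float:
    """Unweighted mean of the per-category accuracies rho_i."""
    return sum(rho) / len(rho)


def micro_avg(rho: List[float], w: List[int]) -> float:
    """Mean of rho_i weighted by w_i, the number of test examples per category."""
    n = sum(w)  # N, the total number of test examples
    return sum(w_i * rho_i for w_i, rho_i in zip(w, rho)) / n


rho = [0.5, 1.0]  # made-up per-category accuracies
w = [1, 3]        # made-up per-category test counts
macro = macro_avg(rho)
micro = micro_avg(rho, w)
```

Under these definitions the two averages coincide whenever every category has the same number of test examples, as is the case for the 2000-per-category test split listed in Table 1.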

4 Baseline Implementations

As a branch of machine learning, Deep Learning (DL) has gained much attention in recent years due to its prominent achievements in several domains such as computer vision and natural language processing.


Uploader: 盈梓's blog
Followers: 9573
Resources: 2310
