MovieLens-Recommendation-Python
A movie recommender implemented with collaborative filtering.
MovieLens-Recommender
MovieLens-Recommender is a pure Python implementation of Collaborative Filtering. It contains User-Based Collaborative Filtering (UserCF) and Item-Based Collaborative Filtering (ItemCF). For comparison, Random-Based Recommendation and Most-Popular-Based Recommendation are also included. The well-known Latent Factor Model (LFM) is included in this repo, too.
The built-in datasets are Movielens-1M and Movielens-100k, but you can also use your own custom datasets.
Besides, there are two models named UserCF-IIF and ItemCF-IUF, which improve on UserCF and ItemCF. They reduce the influence of very active users and very popular items.
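To make the IUF idea concrete, here is a minimal sketch of an item-item similarity with an inverse-user-frequency penalty, following the formulation commonly given for ItemCF-IUF: each co-rating is weighted by 1 / log(1 + number of items the user rated). The function name `item_similarity_iuf` and the `user -> set of items` layout are assumptions made for illustration, not necessarily what this repo's `ItemCF.py` does.

```python
import math
from collections import defaultdict


def item_similarity_iuf(train):
    """Item-item similarity with an inverse-user-frequency penalty.

    `train` maps user -> set of items (hypothetical layout).
    """
    co_rated = defaultdict(lambda: defaultdict(float))
    item_count = defaultdict(int)
    for user, items in train.items():
        if not items:
            continue
        # A user who rated many items contributes less to each item pair.
        weight = 1.0 / math.log(1.0 + len(items))
        for i in items:
            item_count[i] += 1
            for j in items:
                if i != j:
                    co_rated[i][j] += weight
    # Normalize by the geometric mean of the two items' rating counts.
    return {
        i: {
            j: w / math.sqrt(item_count[i] * item_count[j])
            for j, w in related.items()
        }
        for i, related in co_rated.items()
    }
```

Replacing `weight` with `1.0` gives the plain co-occurrence similarity, so the penalty is the only difference between the two variants in this sketch.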
Overview
The book 《推荐系统实践》 (Recommender System Practice) by Xiang Liang is a wonderful introduction for people who don't know much about recommendation systems. But the book only offers function-level implementations of Collaborative Filtering. A well-structured project that also covers dataset building and model validation is still needed.
So I made the MovieLens-Recommender project, which is a pure Python implementation of Collaborative Filtering based on the ideas of the book.
This repository is based on MovieLens-RecSys, which is also a good implementation of Collaborative Filtering, but it runs very slowly.
Besides, Surprise is a very popular Python scikit for building and analyzing recommender systems. So I combined the advantages of these two projects, and here comes MovieLens-Recommender.
My Recommendation System contains four steps:
Create trainset and testset
Train a recommender model
Give recommendations
Evaluate results
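The first step above can be sketched as a random split controlled by `test_size`. This is only an illustration in plain Python: the function name `train_test_split`, the `(user, item, rating)` tuple input, and the nested-dict output are assumptions, and the repo's `dataloader.py` may organize things differently.

```python
import random


def train_test_split(ratings, test_size=0.1, seed=0):
    """Randomly assign each (user, item, rating) tuple to train or test.

    Returns two dicts shaped as user -> {item: rating}.
    """
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    trainset, testset = {}, {}
    for user, item, rating in ratings:
        target = testset if rng.random() < test_size else trainset
        target.setdefault(user, {})[item] = rating
    return trainset, testset
```

Each rating lands in the test set with probability `test_size`, so the realized test fraction is only approximately equal to it.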
At the end of a recommendation run, four metrics are reported to measure the model:
Precision
Recall
Coverage
Popularity
No Python extensions (e.g. NumPy/pandas) are needed!
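For reference, the four metrics can be computed as in the sketch below, which uses only the standard library. It assumes the usual top-N definitions (precision and recall from hit counts, coverage as the share of catalog items ever recommended, popularity as the mean of log(1 + rating count) over recommended items). The function name `evaluate` and its argument layout are hypothetical and not taken from this repo.

```python
import math


def evaluate(recommendations, testset, item_popularity):
    """Compute precision, recall, coverage and popularity.

    recommendations: user -> list of recommended items
    testset:         user -> set of items the user actually rated
    item_popularity: item -> number of ratings in the training set
    """
    hits = rec_total = test_total = 0
    recommended_items = set()
    popularity_sum = 0.0
    for user, rec_items in recommendations.items():
        relevant = testset.get(user, set())
        hits += sum(1 for item in rec_items if item in relevant)
        rec_total += len(rec_items)
        test_total += len(relevant)
        recommended_items.update(rec_items)
        popularity_sum += sum(
            math.log(1 + item_popularity.get(item, 0)) for item in rec_items
        )
    return {
        "precision": hits / rec_total,
        "recall": hits / test_total,
        "coverage": len(recommended_items) / len(item_popularity),
        "popularity": popularity_sum / rec_total,
    }
```

Higher precision and recall mean more accurate lists, higher coverage means the long tail is reached, and lower popularity means the lists lean less on blockbuster titles.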
Getting started
1. Download
Git is awesome~
git clone https://github.com/fuxuemingzhu/MovieLens-Recommender.git
Movielens-1M and Movielens-100k datasets are under the data/ folder.
2. Run
The configuration lives in main.py. Please choose the dataset and model you want to use and set a proper test_size. The default values in main.py are shown below:
dataset_name = 'ml-100k'
# dataset_name = 'ml-1m'
# model_type = 'UserCF'
# model_type = 'UserCF-IIF'
# model_type = 'ItemCF'
# model_type = 'Random'
# model_type = 'MostPopular'
model_type = 'ItemCF-IUF'
# model_type = 'LFM'
test_size = 0.1
Then run python main.py from your command line. A recommendation model will be built on the dataset you chose above.
Note: my code has only been tested on Python 3, so Python 3 is preferred.
python main.py
# python3 main.py
If you are using Linux, this command will redirect all output to a file.
python main.py > run.log 2>&1 &
# python3 main.py > run.log 2>&1 &
This command runs in the background. You can wait for the result, or use tail -f run.log to watch the output in real time.
All models are saved to the model/ folder, so your next run with the same settings will take less time.
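The caching behavior described above can be sketched with `pickle` from the standard library: load a saved object if it exists, otherwise train and save it. The helper name `load_or_train` and its arguments are hypothetical and only illustrate the pattern. The repo's own save and load helpers may look different.

```python
import os
import pickle


def load_or_train(path, train_fn):
    """Return the object pickled at `path`, training and saving it if absent."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    model = train_fn()
    # Create the target folder (e.g. model/) on first use.
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(model, f)
    return model
```

Because the cache is keyed only by file name, a saved file should be deleted (or the name changed) whenever the dataset, model type, or test_size changes.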
3. Output
Here is an example run of the ItemCF model trained on ml-1m with test_size = 0.10. No matter which model is chosen, the output log will look like this.
**********************************************************************
This is ItemCF model trained on ml-1m with test_size = 0.10
**********************************************************************
ItemBasedCF start...
No model saved before.
Train a new model...
counting movies number and popularity...
counting movies number and popularity success.
total movie number = 3693
generate items co-rated similarity matrix...
steps(0), 0.00 seconds have spent..
steps(1000), 18.50 seconds have spent..
steps(2000), 46.39 seconds have spent..
steps(3000), 63.52 seconds have spent..
steps(4000), 87.37 seconds have spent..
steps(5000), 111.83 seconds have spent..
steps(6000), 132.71 seconds have spent..
generate items co-rated similarity matrix success.
total step number is 6040
total 133.61 seconds have spent
calculate item-item similarity matrix...
steps(0), 0.00 seconds have spent..
steps(1000), 1.77 seconds have spent..
steps(2000), 3.47 seconds have spent..
steps(3000), 5.01 seconds have spent..
calculate item-item similarity matrix success.
total step number is 3693
total 5.67 seconds have spent
Train a new model success.
The new model has saved success.
recommend for userid = 1:
['1196', '364', '1265', '318', '2081', '1282', '1198', '2716', '1', '2096']
recommend for userid = 100:
['2916', '1580', '457', '1240', '589', '1291', '780', '1036', '1610', '1214']
recommend for userid = 233:
['1022', '594', '1282', '2087', '2078', '1196', '608', '2081', '593', '1393']
recommend for userid = 666:
['296', '1704', '593', '356', '1196', '589', '1580', '50', '1393', '1']
recommend for userid = 888:
['2916', '457', '480', '2628', '1265', '1610', '1198', '1573', '2762', '1527']
Test recommendation system start...
steps(0), 0.10 seconds have spent..
steps(1000), 291.42 seconds have spent..
steps(2000), 627.60 seconds have spent..
steps(3000), 898.21 seconds have spent..
steps(4000), 1219.94 seconds have spent..
steps(5000), 1523.29 seconds have spent..
steps(6000), 1817.46 seconds have spent..
Test recommendation system success.
total step number is 6040
total 1829.26 seconds have spent
precision=0.1900 recall=0.1147 coverage=0.1673 popularity=7.3911
total Main Function step number is 0
total 1972.49 seconds have spent
Benchmarks
Here are the models' benchmarks on Precision, Recall, Coverage, and Popularity. The test_size is 0.1.
These results are nearly the same as those in Xiang Liang's book, which suggests that the implementation is correct.
Movielens 1M:

| Movielens 1M | Precision | Recall | Coverage | Popularity |
| --- | --- | --- | --- | --- |
| UserCF | 19.84% | 11.97% | 28.16% | 7.2023 |
| ItemCF | 19.00% | 11.47% | 16.73% | 7.3911 |
| UserCF-IIF | 19.77% | 11.93% | 29.62% | 7.1660 |
| ItemCF-IUF | 18.71% | 11.29% | 15.03% | 7.4748 |
| LFM | / | / | / | / |
| Random | 0.54% | 0.33% | 100.00% | 4.4075 |
| Most Popular | 10.59% | 6.39% | 2.76% | 7.7462 |
Movielens 100k:

| Movielens 100k | Precision | Recall | Coverage | Popularity |
| --- | --- | --- | --- | --- |
| UserCF | 19.69% | 18.50% | 22.20% | 5.4928 |
| ItemCF | 17.89% | 16.80% | 13.23% | 5.6202 |
| UserCF-IIF | 19.57% | 18.38% | 22.74% | 5.4716 |
| ItemCF-IUF | 20.38% | 12.30% | 17.30% | 7.3643 |
| LFM | 20.29% | 19.06% | 27.41% | 4.9983 |
| Random | 0.82% | 0.77% | 99.64% | 3.0332 |
| Most Popular | 10.54% | 9.90% | 4.07% | 5.9578 |
Notice
UserCF is faster than ItemCF. Using ml-100k instead of ml-1m will speed up the prediction process.
Calculating the similarity matrix is quite slow. Please wait patiently for the result.
LFM generates negative samples while running. As the ratio of negative to positive samples grows, performance improves.
LFM has more parameters to tune, and I haven't spent much time on that. I believe you can do much better!
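The negative sampling mentioned for LFM can be sketched as follows. The sketch assumes the common strategy of drawing unrated items with probability proportional to their popularity, up to `ratio` negatives per positive. The function name `sample_negatives`, the `item_pool` layout, and the draw cap are assumptions for illustration, and `LFM.py` in this repo may sample differently.

```python
import random


def sample_negatives(user_items, item_pool, ratio=1, seed=0):
    """Label rated items 1 and sampled unrated items 0.

    user_items: set of items the user rated (positives)
    item_pool:  list of item ids with one entry per rating, so popular
                items are drawn more often
    ratio:      target number of negatives per positive
    """
    rng = random.Random(seed)
    samples = {item: 1 for item in user_items}
    wanted = int(len(user_items) * ratio)
    negatives = 0
    # Cap the number of draws so the loop ends even if few unrated items exist.
    for _ in range(3 * wanted + 10):
        if negatives >= wanted:
            break
        item = rng.choice(item_pool)
        if item in samples:
            continue  # already a positive or an already-chosen negative
        samples[item] = 0
        negatives += 1
    return samples
```

Raising `ratio` yields more zero-labeled samples per user, which is the knob the note above refers to. It also makes each training pass proportionally slower.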
License
Apache License.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
https://gitee.com/quxiaolong2020/movie-lens-recommendation-python